5 May, 2024

Adding labels to the assembler.

Part 3: Adding labels to the assembler

In Part 2 of this series, we implemented a main loop and an assembler, and ended up with just enough of an Owl-2820 CPU to run the code to generate the Fibonacci sequence that we first saw in Part 0.

In this part, we’re going to improve the assembler so that it supports labels.

Labels

In assembly language, a label is a mechanism for giving a meaningful name to an address in memory. For example, in this fragment of code, fib_loop is a label, and the bltu instruction uses it as a branch target.

fib_loop:
        mv      a3, a2                      ; tmp = current
        addi    a1, a1, -1                  ; n = n - 1
        add     a2, a0, a2                  ; current = current + prev
        mv      a0, a3                      ; previous = tmp
        bltu    s4, a1, fib_loop            ; if n > 1 go to fib_loop

If we didn’t have labels, then we would have to use a numeric value to represent the address. This numeric value could be an absolute value, or it could be an offset relative to the current address. Put in terms of labels, an offset is the address of the label minus the address of the instruction that refers to it, so a label that appears earlier in the code gives a negative offset.

Without labels, the same code would look like this, as the Owl-2820 CPU’s bltu instruction uses an offset rather than an absolute address.

        mv      a3, a2                      ; tmp = current
        addi    a1, a1, -1                  ; n = n - 1
        add     a2, a0, a2                  ; current = current + prev
        mv      a0, a3                      ; previous = tmp
        bltu    s4, a1, -16                 ; if n > 1 go to the address 16 bytes before this one
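
The arithmetic behind that -16 can be written down directly. The sketch below is illustrative only: OffsetBetween() is a hypothetical helper rather than part of the assembler, and the addresses assume that the first mv of the fragment sits at address 0 and that every instruction occupies 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the signed distance, in bytes, from the instruction at
// `from` to the labelled address `to`.
constexpr int32_t OffsetBetween(uint32_t from, uint32_t to)
{
    return static_cast<int32_t>(to - from);
}

// Assumed layout: the first `mv` (where fib_loop points) is at address 0, so
// the `bltu`, four instructions later, is at address 16.
constexpr uint32_t fibLoopAddress = 0;
constexpr uint32_t bltuAddress = 4 * 4;

static_assert(OffsetBetween(bltuAddress, fibLoopAddress) == -16);
```

A label that lies after the referring instruction gives a positive offset in exactly the same way.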

Calculating offsets

For each assembly language instruction that refers to a label, we need to calculate an offset from the address of the instruction to the address of the label.

The assembler from Part 2 did not support this, so I hand-calculated the offsets then hard-coded them at the top of the Assemble() function.

std::vector<uint32_t> Assemble()
{
    Assembler a;

    // Offsets to labels.
    const int32_t fib = 24;
    const int32_t print_loop1 = -24;
    const int32_t print_loop2 = -60;
    const int32_t printf = 0; // No value, because we're going to cheat.
    const int32_t done = 48;
    const int32_t fib_loop = -16;

    // ...
}

Calculating the offsets by hand was not a problem for this tiny, one-off program. However, it would rapidly become very tiresome if we had to recalculate offsets manually whenever we made a code change.

The problem becomes worse when there are multiple references to the same label. This can happen very easily. In the example above, print_loop1 and print_loop2 are both offsets that refer to the print_loop label. If we needed to refer to the print_loop label from elsewhere in the code then we would need to add yet another offset, print_loop3, as offsets are always relative to the program counter, pc.
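
To see why each reference needs its own offset, here is a small sketch that recomputes print_loop1 and print_loop2. The byte addresses come from counting instructions in the Fibonacci program, assuming that the program starts at address 0 and that every instruction is 4 bytes long. That puts print_loop at address 28, the bltu that branches to it at 52, and the j that jumps to it at 88. OffsetsTo() is a hypothetical helper, not part of the assembler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: the offset needed at each referring instruction in
// order to reach the same label.
std::vector<int32_t> OffsetsTo(uint32_t labelAddress, const std::vector<uint32_t>& referrers)
{
    std::vector<int32_t> offsets;
    for (const uint32_t referrer : referrers)
    {
        // Offset = address of the label - address of the referring instruction.
        offsets.push_back(static_cast<int32_t>(labelAddress - referrer));
    }
    return offsets;
}

// Assumed addresses: print_loop is at 28, and is referred to from 52 and 88.
// OffsetsTo(28, {52, 88}) yields {-24, -60}, i.e., print_loop1 and print_loop2.
```

One label, two referring instructions, two different offsets.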

Similarly, given a label and an offset that refers to it, if we were to insert or remove lines of code between the two then the offset would need to be recalculated.

Fortunately, offset calculation is something that can be done automatically.

Adapting the assembler to use labels

Let’s adapt the Assembler to do the work. To recap, our assembler emits Owl-2820 machine code. It is not an interpreter that runs it.

We will modify the assembler by introducing a Label type to represent labels. Rather than going into the implementation details now, I’ll show you what the Assemble() function looks like after it has been updated to use labels.

The only difference between the code below and the Assemble() function from the previous post is that this new version uses labels rather than hand-calculated offsets. Both versions use the assembler to emit the same Owl-2820 machine code, which generates the Fibonacci sequence when run.

std::vector<uint32_t> Assemble()
{
    Assembler a;

    // clang-format off
// main:
    a.Li(s0, 0);                // li   s0, 0                   ; i = 0
    a.Li(s2, 2);                // li   s2, 2                   ; s2 = 2
    a.Lui(a0, 1);               // lui  a0, %hi(format_str)
    a.Addi(s1, a0, -548);       // addi s1, a0, %lo(format_str) ; s1 = the address of the printf format string
    a.Li(s3, 48);               // li   s3, 48                  ; s3 = 48
    a.Li(s4, 1);                // li   s4, 1                   ; s4 = 1
    Label fib = a.MakeLabel();
    a.J(fib);                   // j    fib                     ; go to fib
// print_loop:
    Label print_loop = a.MakeLabel();
    a.BindLabel(print_loop);
    a.Mv(a0, s1);               // mv   a0, s1                  ; arg0 = the address of the printf format string
    a.Mv(a1, s0);               // mv   a1, s0                  ; arg1 = i (arg2 contains current)
    Label printf = a.MakeLabel();
    a.Call(printf);             // call printf                  ; call printf
    a.Addi(s0, s0, 1);          // addi s0, s0, 1               ; i = i + 1
    Label done = a.MakeLabel();
    a.Beq(s0, s3, done);        // beq  s0, s3, done            ; if i == 48 go to done
// fib:
    a.BindLabel(fib);
    a.Mv(a2, s0);               // mv   a2, s0                  ; current = i
    a.Bltu(s0, s2, print_loop); // bltu s0, s2, print_loop      ; if i < 2 go to print_loop
    a.Li(a0, 0);                // li   a0, 0                   ; previous = 0
    a.Li(a2, 1);                // li   a2, 1                   ; current = 1
    a.Mv(a1, s0);               // mv   a1, s0                  ; n = i
// fib_loop:
    Label fib_loop = a.MakeLabel();
    a.BindLabel(fib_loop);
    a.Mv(a3, a2);               // mv   a3, a2                  ; tmp = current
    a.Addi(a1, a1, -1);         // addi a1, a1, -1              ; n = n - 1
    a.Add(a2, a0, a2);          // add  a2, a0, a2              ; current = current + prev
    a.Mv(a0, a3);               // mv   a0, a3                  ; previous = tmp
    a.Bltu(s4, a1, fib_loop);   // bltu s4, a1, fib_loop        ; if n > 1 go to fib_loop
    a.J(print_loop);            // j    print_loop              ; go to print_loop
// done:
    a.BindLabel(done);
    a.Li(a0, 0);                // li   a0, 0                   ; set the return value of main() to 0

    // Emit an illegal instruction so that we have something to stop us.
    a.Emit(0);

    // Bind `printf` so that returning the code doesn't error.
    a.BindLabel(printf);
    // clang-format on

    return a.Code();
}

How labels are used

Before we move on to the implementation, we need to understand the problems that we need to solve. In the code listing above, there are two scenarios in which labels are used:

  1. a label whose address is known
  2. a label whose address is not known

Let’s take a look at each of these scenarios in turn, using excerpts from the code above. We’ll see that each scenario requires slightly different treatment.

A label whose address is known

Here’s an excerpt from the listing in the previous section. It illustrates the first scenario, in which a label is used after its address is already known. It shows the fib_loop label being created, bound to an address, then used in the bltu instruction.

Here, the label fib_loop is used by the bltu instruction after the label’s address is known.

    Label fib_loop = a.MakeLabel(); // [1] Create a label, `fib_loop`.
    a.BindLabel(fib_loop);          // [2] Bind `fib_loop` to the address of the next instruction,
    a.Mv(a3, a2);                   //     i.e., the address of this `mv` instruction.
    a.Addi(a1, a1, -1);
    a.Add(a2, a0, a2);
    a.Mv(a0, a3);
    a.Bltu(s4, a1, fib_loop);       // [3] Emit an instruction that uses the `fib_loop` label.
  1. The MakeLabel() member function creates a label, fib_loop. Each label created by the assembler maps to an address in the code, but when a label is first created that address is not yet known.

  2. The assembler keeps track of the address where the next instruction will be written. When the BindLabel() member function is called, it assigns that address to the label, fib_loop.

  3. The assembler has an overload for Bltu() that takes a Label instead of a sign-extended 12-bit offset. When this overload is invoked, it calculates the offset from the address of the bltu instruction to the known address that is bound to the fib_loop label, and emits the bltu instruction using that offset.

In this example, the offset is -16, because the fib_loop label has been bound to the address of the mv a3, a2 instruction, which appears four 32-bit instructions earlier in the code.

At the point where the label fib_loop is used by the bltu instruction its address is already known, so the offset can be calculated immediately.

A label whose address is unknown

Here’s another excerpt from the listing in the previous section. It illustrates the second scenario, in which a label is used before its address is known. It shows the fib label being created, then used by the j instruction, and only later being bound to an address.

Here, the label fib is used by the j instruction before the label’s address is known.

    Label fib = a.MakeLabel(); // [1] Create a label, `fib`.
    a.J(fib);                  // [2] Emit an instruction that uses the `fib` label.
// print_loop:
    Label print_loop = a.MakeLabel();
    a.BindLabel(print_loop);
    a.Mv(a0, s1);
    a.Mv(a1, s0);
    Label printf = a.MakeLabel();
    a.Call(printf);
    a.Addi(s0, s0, 1);
    Label done = a.MakeLabel();
    a.Beq(s0, s3, done);
// fib:
    a.BindLabel(fib);          // [3] Bind `fib` to the address of the next instruction,
    a.Mv(a2, s0);              //     i.e., the address of this `mv` instruction.
  1. The MakeLabel() member function creates a label. Its address is unknown.

  2. The assembler has an overload for J() that takes a label instead of a sign-extended 20-bit offset. However, at this point the address of the label is not known, so the assembler emits the instruction with a zero offset. It then makes a note that it needs to come back and fix up the instruction at this address with a sign-extended 20-bit offset once the address of the label is known.

  3. When BindLabel() is called, the address of the next instruction is assigned to the label. The address of the label is now known, so the assembler looks for any previously recorded fix-ups associated with the label and resolves them with the correctly calculated offsets.

In this example, the offset is 24 because the fib label appears six 32-bit instructions later than the point where it was used by the j instruction.

When the assembler emits an instruction, that instruction is written to an address in memory. When the assembler emits an instruction that uses a label, then that label is substituted with an offset if the label’s address is known.

However, if the label’s address is not yet known then the assembler can’t resolve it. Instead, it emits the instruction with a zero offset and associates a fix-up entry with the label. The fix-up contains the address of the instruction whose offset needs to be fixed up, along with the type of offset appropriate to the instruction, such as a sign-extended 12-bit offset for a branch instruction. When the label is finally bound to an address, then the assembler resolves all the fix-ups associated with that label.
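
Both scenarios can be modelled in a few lines before looking at the real thing. The following is a deliberately simplified toy, not the Owl-2820 assembler: ToyAssembler, Jump(), Other(), Bind() and ToyExample() are all hypothetical names, each "instruction" is stored as nothing more than its offset, and there is only one kind of fix-up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// A toy model: one offset per emitted instruction, 4 bytes per instruction.
class ToyAssembler
{
    std::vector<int32_t> offsets_;                // One entry per instruction.
    std::vector<std::optional<uint32_t>> labels_; // Address by label id, once bound.
    std::multimap<size_t, uint32_t> fixups_;      // Label id -> address awaiting an offset.

    uint32_t Current() const
    {
        return static_cast<uint32_t>(offsets_.size() * 4);
    }

public:
    size_t MakeLabel()
    {
        labels_.push_back(std::nullopt); // The address is unknown for now.
        return labels_.size() - 1;
    }

    void Other() // Emit an instruction that does not refer to a label.
    {
        offsets_.push_back(0);
    }

    void Jump(size_t label) // Emit an instruction that refers to a label.
    {
        if (labels_[label])
        {
            // Known address: calculate the offset immediately.
            offsets_.push_back(static_cast<int32_t>(*labels_[label] - Current()));
        }
        else
        {
            // Unknown address: note a fix-up, then emit a zero offset.
            fixups_.emplace(label, Current());
            offsets_.push_back(0);
        }
    }

    void Bind(size_t label)
    {
        const uint32_t address = Current();
        labels_[label] = address;

        // Resolve every fix-up that was waiting for this label.
        const auto [first, last] = fixups_.equal_range(label);
        for (auto it = first; it != last; ++it)
        {
            offsets_[it->second / 4] = static_cast<int32_t>(address - it->second);
        }
        fixups_.erase(first, last);
    }

    const std::vector<int32_t>& Offsets() const
    {
        return offsets_;
    }
};

// A forward jump to `fib`, followed by a backward jump to `loop`.
std::vector<int32_t> ToyExample()
{
    ToyAssembler a;
    const auto fib = a.MakeLabel();
    a.Jump(fib); // Address 0: `fib` is not bound yet, so this needs a fix-up.
    for (int i = 0; i < 5; ++i)
    {
        a.Other(); // Addresses 4 to 20.
    }
    a.Bind(fib); // `fib` is address 24, so the jump at address 0 becomes +24.

    const auto loop = a.MakeLabel();
    a.Bind(loop); // `loop` is also address 24.
    for (int i = 0; i < 4; ++i)
    {
        a.Other(); // Addresses 24 to 36.
    }
    a.Jump(loop); // Address 40: `loop` is known, so the offset is -16.
    return a.Offsets();
}
```

The forward jump ends up with an offset of 24 only once Bind() has run, whereas the backward jump gets its offset of -16 at the moment it is emitted.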

Implementing labels in the assembler

The assembler has been extended to cater for a new Label type. Instances of this type are created by the assembler using MakeLabel(), and are bound to an address using BindLabel().

In addition, the code-emitting member functions that take an offset as one of their parameters gain overloads that take a Label instead of an offset, as shown below.

class Assembler
{
public:
    //...
    void Beq(uint32_t r0, uint32_t r1, int32_t offs12);
    void Beq(uint32_t r0, uint32_t r1, Label label);

    void Bltu(uint32_t r0, uint32_t r1, int32_t offs12);
    void Bltu(uint32_t r0, uint32_t r1, Label label);

    void Call(int32_t offs20);
    void Call(Label label);

    void J(int32_t offs20);
    void J(Label label);
    // ...

The Label type

From the caller’s point of view, Label is opaque. It is simply a handle to be used with the assembler in place of an address or offset.

class Label
{
public:
    using Id = size_t;

    explicit Label(Id id) : id_{id}
    {
    }

    Id GetId() const
    {
        return id_;
    }

private:
    Id id_; // [1] The label's id.
};
  1. The label id, id_, is an index, initialized and used by the assembler.

Assembler member variables

The assembler has some new member variables for dealing with labels and fix-ups.

class Assembler
{
private:
    static constexpr uint32_t badAddress = static_cast<uint32_t>(-1); // [1]

    uint32_t current_{};                     // [2] The current write position.
    std::vector<uint32_t> labels_;           // [3] Addresses by label id.
    std::multimap<Label::Id, Fixup> fixups_; // [4] Fixups by label id.
  1. badAddress is a constant that indicates that an address is not yet known.
  2. current_ is the address of the next instruction to be emitted by the assembler.
  3. labels_ is a mapping from label ids to addresses.
  4. fixups_ contains the fix-ups (if any) associated with a given label id.

Assembler member functions

The assembler gains some new member functions. Here are the most notable.

MakeLabel()

MakeLabel() creates a new Label.

    Label MakeLabel()
    {
        auto labelId = labels_.size(); // [1]
        labels_.push_back(badAddress); // [2]
        return Label(labelId);         // [3]
    }
  1. It assigns a new label id. This is nothing more than an index into labels_.
  2. It adds a new entry to labels_ and sets its value to badAddress to indicate that the address for the new label id is currently unknown.
  3. It then packages the new label id into a new Label and returns that to the caller.

BindLabel()

BindLabel() binds the current address to a Label.

    void BindLabel(Label label)
    {
        // [1] Give the label an address.
        const auto id = label.GetId();
        const auto address = Current();
        labels_[id] = address;

        // [2] Find all the fixups for this label id.
        if (const auto fixupsForId = fixups_.equal_range(id); fixupsForId.first != fixupsForId.second)
        {
            // [3] Resolve the fixups.
            for (auto [_, fixup] : std::ranges::subrange(fixupsForId.first, fixupsForId.second))
            {
                const auto offset = address - fixup.target;
                if (fixup.type == FixupType::offs12)
                {
                    ResolveFixup<FixupType::offs12>(fixup.target, offset); // [4] Resolve fix-up.
                }
                else if (fixup.type == FixupType::offs20)
                {
                    ResolveFixup<FixupType::offs20>(fixup.target, offset); // [4] Resolve fix-up.
                }
            }

            // [5] We don't need these fixups any more.
            fixups_.erase(id);
        }
    }
  1. It retrieves the label id from the label and sets the entry at that index in labels_ to the current address.
  2. It then checks if there are any fix-ups associated with the label id.
  3. If there are any fix-ups, then it resolves them.
  4. Fix-ups are resolved by calling the member function template ResolveFixup().
  5. Finally, the fix-ups for the label id are erased as they’re no longer needed.
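
One detail of step 2 is worth spelling out. std::multimap::equal_range() returns a pair of iterators, and when no entry has the requested key the two iterators are equal to each other but are not necessarily equal to end(). Comparing the first iterator with the second is therefore the reliable test for "no fix-ups". In either case, iterating over an empty range does nothing, so the resolving loop is safe. A minimal sketch, using hypothetical helpers and plain ints in place of Fixup:

```cpp
#include <cassert>
#include <map>

// Hypothetical sample data: entries for keys 1 and 3, but none for 2 or 4.
std::multimap<int, int> SampleEntries()
{
    return {{1, 10}, {3, 30}, {3, 31}};
}

// True if `m` holds at least one entry for `key`.
bool HasEntries(const std::multimap<int, int>& m, int key)
{
    const auto [first, last] = m.equal_range(key);
    return first != last;
}

// True if the first iterator returned by equal_range() is not end().
bool FirstIsNotEnd(const std::multimap<int, int>& m, int key)
{
    return m.equal_range(key).first != m.end();
}
```

With this sample data, HasEntries() is false for key 2, yet FirstIsNotEnd() is true for key 2 because the first iterator points at the entries for key 3.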

AddFixup()

Member function template AddFixup() is called when a code-emitting function attempts to use a Label whose address is not known. For example, here’s how it might be used by Bltu().

void Bltu(uint32_t r0, uint32_t r1, Label label)
{
    if (const auto addr = AddressOf(label); addr.has_value()) // [1]
    {
        // [2]
        const int32_t offs12 = *addr - Current();
        Emit(encode::opc(Opcode::Bltu) | encode::r0(r0) | encode::r1(r1) | encode::offs12(offs12));
    }
    else
    {
        // [3]
        AddFixup<FixupType::offs12>(label);
        Emit(encode::opc(Opcode::Bltu) | encode::r0(r0) | encode::r1(r1) | encode::offs12(0));
    }
}
  1. Check if there is an address associated with this label.
  2. The label has an address, so subtract the address of the current instruction to get an offset, then use the offset when emitting a bltu instruction.
  3. The label does not have an address, so call AddFixup() so that BindLabel() can supply one later, then emit a bltu instruction whose offset is zero.

The implementation of AddFixup() is straightforward.

    template<FixupType type>
    void AddFixup(Label label)
    {
        fixups_.emplace(label.GetId(), Fixup{.target = Current(), .type = type}); // [1]
    }
  1. Add a mapping from the label id to a Fixup whose target is the current address and whose type is the offset type used by the current instruction.

ResolveFixup()

Member function template ResolveFixup() takes an address and an offset, and patches the instruction at that address so that it uses the offset, encoded according to the fix-up type.

    template<FixupType type>
    void ResolveFixup(uint32_t addr, int32_t offset)
    {
        auto existing = code_[addr / 4];              // [1] Get the existing instruction.
        if constexpr (type == FixupType::offs12)      // [2] Used when `FixupType` is `offs12`.
        {
            existing = (existing & 0x000fffff) | encode::offs12(offset);
        }
        else if constexpr (type == FixupType::offs20) // [3] Used when `FixupType` is `offs20`.
        {
            existing = (existing & 0x00000fff) | encode::offs20(offset);
        }
        code_[addr / 4] = existing;                   // [4] Update the code.
    }
  1. It reads the existing instruction from the emitted code.
  2. If the fix-up is for an instruction that uses a sign-extended 12-bit offset, then it masks out the 12 bits occupied by the offset and re-encodes a new offset.
  3. If the fix-up is for an instruction that uses a sign-extended 20-bit offset, then it masks out the 20 bits occupied by the offset and re-encodes a new offset.
  4. It updates the emitted code by overwriting it with the modified instruction.
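
The mask-then-or pattern in steps 2 and 3 can be tried out in isolation. The sketch below assumes, as the mask 0x000fffff suggests, that the 12-bit field occupies the top 12 bits of the instruction. EncodeTop12() is a hypothetical stand-in for encode::offs12() that simply shifts the low 12 bits of the offset into place, and the real encoder may do more than that. PatchTop12() is likewise a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for encode::offs12(): places the low 12 bits of
// `offset` into bits 20..31. The real encoder may differ.
constexpr uint32_t EncodeTop12(int32_t offset)
{
    return (static_cast<uint32_t>(offset) & 0xfffu) << 20;
}

// Replace the top 12 bits of `instruction`, leaving the low 20 bits alone.
constexpr uint32_t PatchTop12(uint32_t instruction, int32_t offset)
{
    return (instruction & 0x000fffffu) | EncodeTop12(offset);
}

// A zero placeholder is replaced by the new offset.
static_assert(PatchTop12(0x00012345u, 24) == 0x01812345u);
// Whatever was in the field before is discarded, not merged.
static_assert(PatchTop12(0xabc12345u, 24) == 0x01812345u);
// Negative offsets keep only their low 12 bits.
static_assert(PatchTop12(0x00012345u, -16) == 0xff012345u);
```

Masking before or-ing matters because the fix-up overwrites the field rather than adding to it, so the result does not depend on the placeholder that was emitted earlier.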

The new Assembler

Here are the new versions of the Assembler class and the Assemble() function that uses it.

It differs in a few small ways from the explanations above, mostly because I’ve taken the opportunity to use templates where there are obvious patterns, e.g., for the branch instructions. Also, I’ve modified Assembler::Code() so that it throws an exception if any fix-ups remain unresolved, i.e., if a label has been used but never bound.

You can see the results of running this on Compiler Explorer.

class Assembler
{
    static constexpr uint32_t badAddress = static_cast<uint32_t>(-1);

    enum class FixupType
    {
        offs12,
        offs20
    };

    struct Fixup
    {
        uint32_t target; // The address that contains the data to be fixed up.
        FixupType type;  // The type of the data that needs to be fixed up.
    };

    std::vector<uint32_t> code_;
    uint32_t current_{};
    std::vector<uint32_t> labels_;
    std::multimap<Label::Id, Fixup> fixups_;

    uint32_t Current() const
    {
        return current_;
    }

    std::optional<uint32_t> AddressOf(Label label)
    {
        if (auto result = labels_[label.GetId()]; result != badAddress)
        {
            return result;
        }
        return {};
    }

    template<FixupType type>
    void ResolveFixup(uint32_t addr, int32_t offset)
    {
        auto existing = code_[addr / 4];
        if constexpr (type == FixupType::offs12)
        {
            existing = (existing & 0x000fffff) | encode::offs12(offset);
        }
        else if constexpr (type == FixupType::offs20)
        {
            existing = (existing & 0x00000fff) | encode::offs20(offset);
        }
        code_[addr / 4] = existing;
    }

    template<FixupType type>
    void AddFixup(Label label)
    {
        fixups_.emplace(label.GetId(), Fixup{.target = Current(), .type = type});
    }

public:
    void BindLabel(Label label)
    {
        const auto id = label.GetId();
        const auto address = Current();
        labels_[id] = address;

        // Find all the fixups for this label id.
        if (const auto fixupsForId = fixups_.equal_range(id); fixupsForId.first != fixupsForId.second)
        {
            // Resolve the fixups.
            for (auto [_, fixup] : std::ranges::subrange(fixupsForId.first, fixupsForId.second))
            {
                const auto offset = address - fixup.target;
                if (fixup.type == FixupType::offs12)
                {
                    ResolveFixup<FixupType::offs12>(fixup.target, offset);
                }
                else if (fixup.type == FixupType::offs20)
                {
                    ResolveFixup<FixupType::offs20>(fixup.target, offset);
                }
            }

            // We don't need these fixups any more.
            fixups_.erase(id);
        }
    }

    Label MakeLabel()
    {
        auto labelId = labels_.size();
        labels_.push_back(badAddress);
        return Label(labelId);
    }

    const std::vector<uint32_t>& Code() const
    {
        if (fixups_.empty())
        {
            return code_;
        }
        throw std::runtime_error("There are unbound labels.");
    }

    void Emit(uint32_t u)
    {
        code_.push_back(u);
        current_ += 4; // 4 bytes per instruction.
    }

    void Add(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        // add r0, r1, r2
        Emit(encode::opc(Opcode::Add) | encode::r0(r0) | encode::r1(r1) | encode::r2(r2));
    }

    void Addi(uint32_t r0, uint32_t r1, int32_t imm12)
    {
        // addi r0, r1, imm12
        Emit(encode::opc(Opcode::Addi) | encode::r0(r0) | encode::r1(r1) | encode::imm12(imm12));
    }

    template<Opcode opcode>
    void Branch(uint32_t r0, uint32_t r1, int32_t offs12)
    {
        Emit(encode::opc(opcode) | encode::r0(r0) | encode::r1(r1) | encode::offs12(offs12));
    }

    template<Opcode opcode>
    void Branch(uint32_t r0, uint32_t r1, Label label)
    {
        if (const auto addr = AddressOf(label); addr.has_value())
        {
            Branch<opcode>(r0, r1, *addr - Current());
        }
        else
        {
            AddFixup<FixupType::offs12>(label);
            Branch<opcode>(r0, r1, 0);
        }
    }

    void Beq(uint32_t r0, uint32_t r1, int32_t offs12)
    {
        Branch<Opcode::Beq>(r0, r1, offs12);
    }

    void Beq(uint32_t r0, uint32_t r1, Label label)
    {
        Branch<Opcode::Beq>(r0, r1, label);
    }

    void Bltu(uint32_t r0, uint32_t r1, int32_t offs12)
    {
        Branch<Opcode::Bltu>(r0, r1, offs12);
    }

    void Bltu(uint32_t r0, uint32_t r1, Label label)
    {
        Branch<Opcode::Bltu>(r0, r1, label);
    }

    template<Opcode opcode>
    void Jump(int32_t offs20)
    {
        Emit(encode::opc(opcode) | encode::offs20(offs20));
    }

    template<Opcode opcode>
    void Jump(Label label)
    {
        if (const auto addr = AddressOf(label); addr.has_value())
        {
            Jump<opcode>(*addr - Current());
        }
        else
        {
            AddFixup<FixupType::offs20>(label);
            Jump<opcode>(0);
        }
    }

    void Call(int32_t offs20)
    {
        Jump<Opcode::Call>(offs20);
    }

    void Call(Label label)
    {
        Jump<Opcode::Call>(label);
    }

    void J(int32_t offs20)
    {
        Jump<Opcode::J>(offs20);
    }

    void J(Label label)
    {
        Jump<Opcode::J>(label);
    }

    void Li(uint32_t r0, int32_t imm12)
    {
        // li r0, imm12
        Emit(encode::opc(Opcode::Li) | encode::r0(r0) | encode::imm12(imm12));
    }

    void Lui(uint32_t r0, uint32_t uimm20)
    {
        // lui r0, uimm20
        Emit(encode::opc(Opcode::Lui) | encode::r0(r0) | encode::uimm20(uimm20));
    }

    void Mv(uint32_t r0, uint32_t r1)
    {
        // mv r0, r1
        Emit(encode::opc(Opcode::Mv) | encode::r0(r0) | encode::r1(r1));
    }
};

std::vector<uint32_t> Assemble()
{
    Assembler a;

    // clang-format off
// main:
    a.Li(s0, 0);                // li   s0, 0                   ; i = 0
    a.Li(s2, 2);                // li   s2, 2                   ; s2 = 2
    a.Lui(a0, 1);               // lui  a0, %hi(format_str)
    a.Addi(s1, a0, -548);       // addi s1, a0, %lo(format_str) ; s1 = the address of the printf format string
    a.Li(s3, 48);               // li   s3, 48                  ; s3 = 48
    a.Li(s4, 1);                // li   s4, 1                   ; s4 = 1
    Label fib = a.MakeLabel();
    a.J(fib);                   // j    fib                     ; go to fib
// print_loop:
    Label print_loop = a.MakeLabel();
    a.BindLabel(print_loop);
    a.Mv(a0, s1);               // mv   a0, s1                  ; arg0 = the address of the printf format string
    a.Mv(a1, s0);               // mv   a1, s0                  ; arg1 = i (arg2 contains current)
    Label printf = a.MakeLabel();
    a.Call(printf);             // call printf                  ; call printf
    a.Addi(s0, s0, 1);          // addi s0, s0, 1               ; i = i + 1
    Label done = a.MakeLabel();
    a.Beq(s0, s3, done);        // beq  s0, s3, done            ; if i == 48 go to done
// fib:
    a.BindLabel(fib);
    a.Mv(a2, s0);               // mv   a2, s0                  ; current = i
    a.Bltu(s0, s2, print_loop); // bltu s0, s2, print_loop      ; if i < 2 go to print_loop
    a.Li(a0, 0);                // li   a0, 0                   ; previous = 0
    a.Li(a2, 1);                // li   a2, 1                   ; current = 1
    a.Mv(a1, s0);               // mv   a1, s0                  ; n = i
// fib_loop:
    Label fib_loop = a.MakeLabel();
    a.BindLabel(fib_loop);
    a.Mv(a3, a2);               // mv   a3, a2                  ; tmp = current
    a.Addi(a1, a1, -1);         // addi a1, a1, -1              ; n = n - 1
    a.Add(a2, a0, a2);          // add  a2, a0, a2              ; current = current + prev
    a.Mv(a0, a3);               // mv   a0, a3                  ; previous = tmp
    a.Bltu(s4, a1, fib_loop);   // bltu s4, a1, fib_loop        ; if n > 1 go to fib_loop
    a.J(print_loop);            // j    print_loop              ; go to print_loop
// done:
    a.BindLabel(done);
    a.Li(a0, 0);                // li   a0, 0                   ; set the return value of main() to 0

    // Emit an illegal instruction so that we have something to stop us.
    a.Emit(0);

    // Bind `printf` so that returning the code doesn't error.
    a.BindLabel(printf);
    // clang-format on

    return a.Code();
}

Conclusion

I hadn’t originally planned to write a post describing how to add labels to the assembler. However, I felt that it was worthwhile doing so because the effort involved in writing code with manually calculated offsets does not scale, and it became obvious that the additional code needed to be explained.

As before, you can view and run the entire code on Compiler Explorer.