5 July, 2024

Part 5 Implementing memory

Until now, the implementation of the Owl-2820 CPU has had no need to access memory beyond fetching the next instruction. However, all but the most trivial programs need to read values from memory so that they can operate on them and then, having operated on them, write the results back to memory.

In this post, we’re going to extend the Owl-2820 CPU with instructions that allow it to read from and write to memory.

Accessing memory

Let’s write a program that requires memory access, then we’ll update our implementation of the Owl-2820 CPU so that it can run the program.

Fibonacci with a lookup table in C

Let’s do this by rewriting the code that prints the first forty-eight Fibonacci numbers so that it uses a memory-based lookup table rather than repeated addition. Here’s what that looks like in C.

#include <stdint.h>
#include <stdio.h>

static uint32_t lut[48] = {
    0,          // fib(0)
    1,          // fib(1)
    1,          // fib(2)
    2,          // fib(3)
    3,          // fib(4)
    5,          // fib(5)
    8,          // fib(6)
    13,         // fib(7)
    // --- snip ---
    1836311903, // fib(46)
    2971215073, // fib(47)
};

static uint32_t fib(uint32_t n)
{
    if (n < 48)
    {
        return lut[n];
    }
    return -1;
}

int main(void)
{
    for (uint32_t i = 0; i < 48; i++)
    {
        uint32_t result = fib(i);
        printf("fib(%u) = %u\n", i, result);
    }
    return 0;
}

Fibonacci with a lookup table in assembly language

Given that the Owl-2820 CPU is based on RISC-V, the easiest way to turn the C into assembly language is to compile it for a RISC-V target, such as this code generated by gcc on Compiler Explorer, then tweak it manually so that it will run on the Owl.

Here’s what the resulting code looks like when translated to Owl-2820 assembly language. I’ve replaced the call to printf() with a call to a print_fib subroutine similar to the one from part 4, and I’ve added the start entry point which calls main so that main has something to return to, but overall it is essentially the same code that gcc generated.

This code makes use of the lw (load word) and sw (store word) memory access instructions which we haven’t seen previously.

start:
    call main
    .word 0                         ; emit an illegal instruction to stop the VM

main:
    addi    sp, sp, -32             ; reserve space on the stack

    ; save registers s0 - s3 and the return address register ra onto the stack
    sw      s0, 24(sp)
    sw      s1, 20(sp)
    sw      s2, 16(sp)
    sw      s3, 12(sp)
    sw      ra, 28(sp)

    ; s1 = the address of the start of the lookup table
    lui     s1, %hi(lut)
    addi    s1, s1, %lo(lut)

    li      s0, 0                   ; i = 0
    li      s2, 48                  ; s2 = 48

print_loop:
    lw      a1, 0(s1)               ; arg1 = *s1
    mv      a0, s0                  ; arg0 = i
    addi    s0, s0, 1               ; i++
    call    print_fib
    addi    s1, s1, 4               ; bump s1 to the next address in the lookup table
    bltu    s0, s2, print_loop      ; if i < 48 go to print_loop
        
    ; restore the return address register ra and registers s0 - s3 from the stack
    lw      ra, 28(sp)
    lw      s0, 24(sp)
    lw      s1, 20(sp)
    lw      s2, 16(sp)
    lw      s3, 12(sp)

    li      a0, 0                   ; set the return value to zero

    addi    sp, sp, 32              ; free the reserved space back to the stack
    ret                             ; return from main

print_fib:
    ecall                           ; invoke hard-coded print_fib
    ret                             ; return from print_fib

lut:
    .word 0                         ; fib(0)
    .word 1                         ; fib(1)
    .word 1                         ; fib(2)
    .word 2                         ; fib(3)
    .word 3                         ; fib(4)
    .word 5                         ; fib(5)
    .word 8                         ; fib(6)
    .word 13                        ; fib(7)
    ; --- snip ---
    .word 1836311903                ; fib(46)
    .word 2971215073                ; fib(47)

Memory access in the stack frame

The original listing of Fibonacci from part 0 had a stack frame. I removed it at the time because it wasn’t necessary for generating the Fibonacci sequence.

This time we’re retaining it because, unlike some instruction set architectures, the Owl-2820 CPU has no dedicated push and pop instructions that operate directly on the stack to save and restore registers. It therefore needs to use its sw and lw instructions to do this.

The code to set up the stack frame consists primarily of sw (store word) memory access instructions that store register values into the stack at memory addresses relative to the stack pointer, sp.

main:
    addi    sp, sp, -32             ; reserve space on the stack

    ; save registers s0 - s3 and the return address register ra onto the stack
    sw      s0, 24(sp)
    sw      s1, 20(sp)
    sw      s2, 16(sp)
    sw      s3, 12(sp)
    sw      ra, 28(sp)

Similarly, the code to tear down the stack frame consists primarily of lw (load word) instructions which it uses to restore register values from the stack at memory addresses relative to the stack pointer, sp.

    ; restore the return address register ra and registers s0 - s3 from the stack
    lw      ra, 28(sp)
    lw      s0, 24(sp)
    lw      s1, 20(sp)
    lw      s2, 16(sp)
    lw      s3, 12(sp)

    li      a0, 0                   ; set the return value to zero

    addi    sp, sp, 32              ; free the reserved space back to the stack
    ret                             ; return from main

The reason that these particular registers are saved on the stack is that, according to the Owl-2820 calling convention described in part 4, they are all callee-saved registers, meaning that it is the callee, not the caller, that is responsible for preserving their values.

As a reminder, the ret instruction, which we also introduced in part 4, doesn’t pop the return address from the stack. Instead, it jumps to the address in the return address register, ra, which was saved to the stack by an sw instruction on entry to main and restored from the stack by an lw instruction on exit from main.

Memory access in the Fibonacci lookup table

The lookup table itself is implemented by using the s1 register as a pointer to its current entry.

Here’s the code to initialize s1 with the address of the lookup table. Don’t worry what %hi and %lo mean for now as I’ll explain those shortly.

    ; s1 = the address of the start of the lookup table
    lui     s1, %hi(lut)            ; load the upper 20 bits of the lookup table's address into the
                                    ; upper 20 bits of s1
    addi    s1, s1, %lo(lut)        ; combine that with the lower 12 bits to get the full 32-bit
                                    ; address

Here’s the code that uses the lookup table.

print_loop:
    lw      a1, 0(s1)               ; arg1 = *s1
    mv      a0, s0                  ; arg0 = i
    addi    s0, s0, 1               ; i++
    call    print_fib
    addi    s1, s1, 4               ; bump s1 to the next address in the lookup table
    bltu    s0, s2, print_loop      ; if i < 48 go to print_loop

The lw a1, 0(s1) instruction reads the word at the address pointed to by s1 and puts it into the a1 register so that it can be used as an argument to print_fib.

After printing, it increments s1 by 4 bytes (the size of a 32-bit word) so that it points to the address of the next value, and loops around to print the next value.

Load and store instructions

The Owl-2820 assembly language for this implementation of Fibonacci introduced two new instructions, lw and sw. These load (read) a 32-bit word value from memory into a register and store (write) a 32-bit word value from a register into memory, respectively.

The Owl-2820 CPU also supports reading and writing of byte values and 16-bit halfword values. The following table shows all the Owl-2820 instructions that read from and write to memory.

The Owl-2820 CPU is little-endian, meaning that values in memory that are larger than a byte are stored with the least-significant byte first.

Instruction	Description
`lb r0, imm12(r1)`	Loads a byte from address `imm12(r1)` and sign-extends it into `r0`
`lbu r0, imm12(r1)`	Loads a byte from address `imm12(r1)` and zero-extends it into `r0`
`lh r0, imm12(r1)`	Loads a little-endian halfword from address `imm12(r1)` and sign-extends it into `r0`
`lhu r0, imm12(r1)`	Loads a little-endian halfword from address `imm12(r1)` and zero-extends it into `r0`
`lw r0, imm12(r1)`	Loads a little-endian word from address `imm12(r1)` into `r0`
`sb r0, imm12(r1)`	Stores the lowest byte of `r0` into address `imm12(r1)`
`sh r0, imm12(r1)`	Stores the lower halfword of `r0` into address `imm12(r1)` in little-endian order
`sw r0, imm12(r1)`	Stores the word in `r0` into address `imm12(r1)` in little-endian order
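
To make the difference between the sign-extending and zero-extending loads concrete, here is a minimal sketch in C++. The helper names SignExtend8 and ZeroExtend8 are invented for this illustration and are not part of the Owl-2820 source; they only model what happens to the loaded byte once it reaches the register.

```cpp
#include <cstdint>

// Hypothetical helpers, named for this illustration only.

// What lb does to the byte it loads: bit 7 is copied into bits 8 to 31.
constexpr uint32_t SignExtend8(uint8_t byte)
{
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(byte)));
}

// What lbu does to the byte it loads: bits 8 to 31 are cleared.
constexpr uint32_t ZeroExtend8(uint8_t byte)
{
    return static_cast<uint32_t>(byte);
}

// 0x80 has its top bit set, so the two loads disagree.
static_assert(SignExtend8(0x80) == 0xffffff80);
static_assert(ZeroExtend8(0x80) == 0x00000080);

// 0x7f has its top bit clear, so the two loads agree.
static_assert(SignExtend8(0x7f) == 0x0000007f);
static_assert(ZeroExtend8(0x7f) == 0x0000007f);
```

The same reasoning applies to lh and lhu, with bit 15 taking the place of bit 7.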

Directives

The assembly language also introduced a few assembly language directives and expressions. These are not Owl-2820 instructions, but are used to tell the assembler to do something.

Directive	Description
`.word u32`	Emits the 32-bit unsigned word `u32` into the code generated by the assembler
`%lo(label)`	Calculates the lower 12 bits of the absolute address of label `label`
`%hi(label)`	Calculates the upper 20 bits of the absolute address of label `label`

The .word directive tells the assembler to emit a word at the current address rather than emitting an instruction. This allows us to write values in the Fibonacci lookup table directly into the code generated by the assembler.

lut:
    .word 0                         ; fib(0)
    .word 1                         ; fib(1)
    .word 1                         ; fib(2)
    .word 2                         ; fib(3)
    .word 3                         ; fib(4)
    .word 5                         ; fib(5)
    .word 8                         ; fib(6)
    .word 13                        ; fib(7)
    ; --- snip ---
    .word 1836311903                ; fib(46)
    .word 2971215073                ; fib(47)

The %hi and %lo expressions work with the lui and addi instructions so that we can get the address of the lookup table. We need to do this because the Owl-2820 CPU provides no way of loading a 32-bit immediate value in a single instruction, as instructions themselves occupy 32 bits.

    ; s1 = the address of the start of the lookup table
    lui     s1, %hi(lut)            ; load the upper 20 bits of the lookup table's address into the
                                    ; upper 20 bits of s1
    addi    s1, s1, %lo(lut)        ; combine that with the lower 12 bits to get the full 32-bit
                                    ; address
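
One subtlety is worth a worked example. In RISC-V, addi sign-extends its 12-bit immediate, so when bit 11 of the address is set the low part is negative and %hi has to round up to compensate. The sketch below assumes that the Owl-2820 assembler follows the same convention, which this post does not confirm; the helper names Hi, Lo, and LuiAddi are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers that mirror the RISC-V definitions of %hi and %lo.
// Whether the Owl-2820 assembler uses exactly this rounding is an assumption.

// %hi: the upper 20 bits, rounded up when bit 11 is set so that the
// sign-extended low part added by addi lands on the right address.
constexpr uint32_t Hi(uint32_t addr)
{
    return (addr + 0x800) >> 12;
}

// %lo: the lower 12 bits, interpreted as a signed value in [-2048, 2047].
constexpr int32_t Lo(uint32_t addr)
{
    const int32_t low = static_cast<int32_t>(addr & 0xfff);
    return (low >= 0x800) ? low - 0x1000 : low;
}

// Models lui rd, %hi(addr) followed by addi rd, rd, %lo(addr).
constexpr uint32_t LuiAddi(uint32_t addr)
{
    const uint32_t upper = Hi(addr) << 12;          // lui
    return upper + static_cast<uint32_t>(Lo(addr)); // addi
}

// Bit 11 clear: the address splits exactly as you would expect.
static_assert(Hi(0x12345678) == 0x12345 && Lo(0x12345678) == 0x678);

// Bit 11 set: the low part is negative, so the high part is rounded up.
static_assert(Hi(0x00001800) == 0x2 && Lo(0x00001800) == -2048);

// Either way, the pair of instructions reconstructs the full address.
static_assert(LuiAddi(0x12345678) == 0x12345678);
static_assert(LuiAddi(0x00001800) == 0x00001800);
```

For a small program loaded at the start of a 4K image, every address fits below 0x800, so %hi is zero and the rounding never comes into play.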

Implementing the memory access instructions

The next few subsections describe the Owl-2820 memory access instructions using the format introduced in part 1.

The memory accesses for these instructions are implemented in terms of ReadN() and WriteN() functions, where N is the size of the operation in bits, e.g., Read8() reads 8 bits. These functions are explained in detail in the next section, Implementing memory.

lb - load byte

Format

lb r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Lb` : 7 bits

Description

Loads a sign-extended byte value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const int32_t signExtendedByte = static_cast<int8_t>(Read8(memory, addr));
    x[r0(ins)] = signExtendedByte;
    x[0] = 0; // Ensure x0 is always zero.
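
The implementation above relies on the r0(), r1(), and imm12() decoding helpers from earlier posts. As a reminder of what they need to do, here is a sketch derived purely from the encoding table above. The names DecodeR0, DecodeR1, DecodeImm12, and Encode are hypothetical, and the opcode field is left as zero because its numeric value isn't given here.

```cpp
#include <cstdint>

// Hypothetical decoders, derived from the encoding table above.

// r0 occupies bits [11:7].
constexpr uint32_t DecodeR0(uint32_t ins) { return (ins >> 7) & 0x1f; }

// r1 occupies bits [16:12].
constexpr uint32_t DecodeR1(uint32_t ins) { return (ins >> 12) & 0x1f; }

// imm12 occupies bits [31:20]. An arithmetic right shift of the signed value
// copies bit 31 downwards, which sign-extends the immediate to 32 bits.
constexpr uint32_t DecodeImm12(uint32_t ins)
{
    return static_cast<uint32_t>(static_cast<int32_t>(ins) >> 20);
}

// Builds an instruction word with the opcode field left as zero.
constexpr uint32_t Encode(uint32_t r0, uint32_t r1, int32_t imm12)
{
    return (static_cast<uint32_t>(imm12) << 20) | ((r1 & 0x1f) << 12) | ((r0 & 0x1f) << 7);
}

// A load into register 5 from offset -4 relative to register 2.
constexpr uint32_t ins = Encode(5, 2, -4);
static_assert(DecodeR0(ins) == 5);
static_assert(DecodeR1(ins) == 2);

// The offset comes back as 0xfffffffc, so adding it to the base register
// with ordinary unsigned arithmetic subtracts four.
static_assert(DecodeImm12(ins) == 0xfffffffc);
static_assert(0x1000 + DecodeImm12(ins) == 0x0ffc);
```

This is why the address calculation can be a plain unsigned addition: the wrap-around of 32-bit arithmetic takes care of negative offsets.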

lbu - load byte unsigned

Format

lbu r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Lbu` : 7 bits

Description

Loads a zero-extended byte value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const uint32_t zeroExtendedByte = static_cast<uint8_t>(Read8(memory, addr));
    x[r0(ins)] = zeroExtendedByte;
    x[0] = 0; // Ensure x0 is always zero.

lh - load halfword

Format

lh r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Lh` : 7 bits

Description

Loads a sign-extended 16-bit halfword value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const int32_t signExtendedHalfWord = static_cast<int16_t>(Read16(memory, addr));
    x[r0(ins)] = signExtendedHalfWord;
    x[0] = 0; // Ensure x0 is always zero.

lhu - load halfword unsigned

Format

lhu r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Lhu` : 7 bits

Description

Loads a zero-extended 16-bit halfword value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const uint32_t zeroExtendedHalfWord = static_cast<uint16_t>(Read16(memory, addr));
    x[r0(ins)] = zeroExtendedHalfWord;
    x[0] = 0; // Ensure x0 is always zero.

lw - load word

Format

lw r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Lw` : 7 bits

Description

Loads a 32-bit word value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    x[r0(ins)] = Read32(memory, addr);
    x[0] = 0; // Ensure x0 is always zero.

sb - store byte

Format

sb r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Sb` : 7 bits

Description

Stores the value in the lowest byte of register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const std::byte byte = static_cast<std::byte>(x[r0(ins)]);
    Write8(memory, addr, byte);

sh - store halfword

Format

sh r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Sh` : 7 bits

Description

Stores the value in the lowest 16 bits of register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12. The value is written into memory in little-endian order, i.e., least significant byte first.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const uint16_t halfWord = static_cast<uint16_t>(x[r0(ins)]);
    Write16(memory, addr, halfWord);

sw - store word

Format

sw r0, imm12(r1)

Encoding

imm12 `[31:20]`	unused `[19:17]`	r1 `[16:12]`	r0 `[11:7]`	opcode `[6:0]`
12 bits	3 bits	5 bits	5 bits	`Sw` : 7 bits

Description

Stores the 32-bit value in register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12. The value is written into memory in little-endian order, i.e., least significant byte first.

Possible implementation

    const uint32_t addr = x[r1(ins)] + imm12(ins);
    const uint32_t word = x[r0(ins)];
    Write32(memory, addr, word);

Implementing memory

The memory access instructions in the previous section were implemented in terms of the following ReadN() and WriteN() memory access functions.

Function	Description
`Read8()`	Reads a byte from memory
`Read16()`	Reads a 16-bit halfword from memory in little-endian order
`Read32()`	Reads a 32-bit word from memory in little-endian order
`Write8()`	Writes a byte into memory
`Write16()`	Writes a 16-bit halfword into memory in little-endian order
`Write32()`	Writes a 32-bit word into memory in little-endian order

All of these functions operate on byte boundaries, so memory can be thought of as a directly addressable sequence of contiguous bytes. In C++, we can represent this with a std::span of bytes.

    using Memory = std::span<std::byte>;

Reading and writing bytes

Reading and writing bytes is simple. Our memory implementation is a std::span of bytes, so we index it with the Owl-2820 VM’s memory address and read or write the byte value as required.

std::byte Read8(const Memory memory, uint32_t addr)
{
    return memory[addr];
}

void Write8(Memory memory, uint32_t addr, std::byte byte)
{
    memory[addr] = byte;
}

Reading and writing larger values

Reading and writing 16-bit halfword and 32-bit word values is a little more complicated for two reasons:

The Owl-2820 CPU is little-endian whereas the host CPU may be big-endian.
The Owl-2820 CPU allows unaligned memory access whereas the host CPU may not.

Swapping bytes

If the host CPU is little-endian, then any data read from Owl-2820 memory is already in little-endian order and does not need to be reversed. However, if the host CPU is big-endian, then any data read from Owl-2820 memory in little-endian order needs to be reversed to the native order, otherwise the value will be incorrect.

Given that the Owl-2820 CPU can read and write both 16-bit halfwords and 32-bit words, we can solve this problem by supplying two overloaded functions that will simply return the value on a little-endian host machine, and swap the byte order of the value on a big-endian host machine.

Both overloads contain if constexpr statements that are evaluated at compile time which means that the compiler will generate the appropriate code to leave the byte order unchanged when the host machine is little-endian, and to reverse the byte order when the host machine is big-endian.

Bytes are swapped by a combination of masking and rotating which most major compilers will detect and replace with a single instruction, at least where the underlying instruction set architecture supports it. If you’re interested in other such bit-twiddling tricks then I recommend Hacker’s Delight by Henry S. Warren, as the book is full of them.

Here’s the implementation of the 32-bit version of the overloaded function. The comments in the code contain a worked example showing how the hex number 0x12345678 can have its byte order reversed to 0x78563412. The 16-bit implementation of the function is similar, albeit simpler.

uint32_t AsLE(uint32_t word)
{
    if constexpr (std::endian::native == std::endian::little)
    {
        // Nothing to do. We're on a little-endian platform, and Owl-2820 is little-endian.
        return word;
    }

    if constexpr (std::endian::native == std::endian::big)
    {
        // We're on a big-endian platform so reverse the byte order as Owl-2820 is little-endian.
        // The code may look like it takes several instructions, but MSVC, gcc, and clang are clever
        // enough to figure out what is going on and will generate the equivalent of `bswap` (x86)
        // or `rev` (ARM).
        //
        // Example (all values are hex).
        //
        // std::rotr(word & 00ff00ff, 8)
        //      12345678 & 00ff00ff             == 00340078
        //      00340078 rotr 8                 == 78003400
        //
        // std::rotl(word, 8) & 00ff00ff
        //      12345678 rotl 8                 == 34567812
        //      34567812 & 00ff00ff             == 00560012
        //
        // 78003400 | 00560012                  == 78563412
        //
        return std::rotr(word & 0x00ff00ff, 8) | (std::rotl(word, 8) & 0x00ff00ff);
    }
}

Handling unaligned memory access

Some instruction set architectures do not support unaligned memory access. On such architectures, a 16-bit halfword can only be read from or written to a memory address that is a multiple of two. Likewise, a 32-bit word can only be read from or written to a memory address that is a multiple of four. Any attempt to perform an unaligned memory access will result in a fault often known as a bus error.

The Owl-2820 CPU supports unaligned memory access, so to support the scenario where the host machine does not, we need to provide a mechanism to translate an unaligned memory access into an aligned one. This turns out to be straightforward.

When reading, we can do a bytewise memcpy() of the bytes from Owl-2820 memory into a variable that is suitably aligned. For example, here’s how we can read a potentially unaligned 16-bit value from Owl-2820 memory into a uint16_t which will be aligned to a 16-bit boundary on an architecture that requires it.

    // We do the equivalent of a memcpy from the VM's memory into a uint16_t variable which we can
    // expect the compiler to align appropriately for the host platform. On a platform which
    // supports unaligned reads, most compilers will detect what we're doing and optimize it away.
    uint16_t v;
    std::ranges::copy_n(memory.data() + addr, sizeof(v), reinterpret_cast<std::byte*>(&v));

Similarly, when writing, we can do a memcpy() from an aligned value into Owl-2820 memory.

    // We do the equivalent of a memcpy from an aligned uint16_t value into the VM's memory.
    const uint16_t v = valueToCopy;
    std::ranges::copy_n(reinterpret_cast<const std::byte*>(&v), sizeof(v), memory.data() + addr);

Note that as this is C++20 I’ve spelled memcpy() as std::ranges::copy_n().

Putting it all together

Putting it all together, to read a 16-bit halfword or a 32-bit word from memory, we first deal with unaligned memory access by copying the bytes from Owl-2820 memory into a suitably aligned value, then, if necessary, we swap the bytes to convert them from little-endian order to native order.

uint32_t Read32(const Memory memory, uint32_t addr)
{
    // Owl-2820 is permissive about unaligned memory accesses. This may not be the case for
    // the host platform, so we do the equivalent of a memcpy from the VM's memory before
    // trying to interpret the value. Most compilers will detect what we're doing and
    // optimize it away.
    uint32_t v;
    std::ranges::copy_n(memory.data() + addr, sizeof(uint32_t), reinterpret_cast<std::byte*>(&v));

    // Owl-2820 is little-endian, so swap the byte order if necessary.
    return AsLE(v);
}

To write a 16-bit halfword or a 32-bit word to memory, we first swap the value from native order into little-endian order if necessary, then copy the underlying bytes from the aligned value into Owl-2820 memory.

void Write32(Memory memory, uint32_t addr, uint32_t word)
{
    // Owl-2820 is little-endian, so swap the byte order if necessary.
    const uint32_t v = AsLE(word);

    // Owl-2820 is permissive about unaligned memory accesses. This may not be the case for
    // the host platform, so we do the equivalent of a memcpy when writing the value to the
    // VM's memory. Most compilers will detect what we're doing and optimize it away.
    std::ranges::copy_n(reinterpret_cast<const std::byte*>(&v), sizeof(uint32_t),
                        memory.data() + addr);
}

Rewriting Run()

The original implementation of Run() performed no memory access other than fetching the next instruction from memory. We implemented that by supplying Run() with a pointer to some 32-bit values representing the Owl-2820 instructions, then by using the program counter, pc, to derive an index into those instructions as shown in the code fragment below.

void Run(const uint32_t* code)
{
    // ...
    while (!done)
    {
        // Fetch a 32-bit word from memory at the address pointed to by the program counter.
        pc = nextPc;
        nextPc += wordSize;
        const uint32_t ins = code[pc / wordSize];

Changes to Run()

The revised implementation of Run() needs to support not only fetching the next instruction from memory as before, but also reading from and writing to byte-addressable memory.

It enables this by changing Run()’s signature to take a std::span<uint32_t> which unifies both code and data into a word-aligned memory image.

void Run(std::span<uint32_t> image)
{

To illustrate how it is used, here’s how Run() is invoked from main().

    // Create a 4K memory image.
    constexpr size_t memorySize = 4096;
    std::vector<uint32_t> image(memorySize / sizeof(uint32_t));

    // Assemble our program and copy it to the start of the memory image.
    auto code = Assemble();
    std::ranges::copy(code, image.begin());

    Run(image);

Two views of memory

When Run() fetches an instruction, it does so via code, a read-only, word-addressable view of the image, whereas memory access instructions such as lb and sb operate on memory, a read/write, byte-addressable view of the image exposed by std::as_writable_bytes().

void Run(std::span<uint32_t> image)
{
    // ...
    // Get a read-only, word-addressable view of the image for fetching instructions.
    const auto code = image;

    // Get a read/write, byte-addressable view of the image for memory accesses.
    auto memory = std::as_writable_bytes(image);

Initializing the stack pointer

Our new implementation of Fibonacci uses the stack, so Run() needs to initialize the stack pointer, sp, to a sensible value. It sets it to the end of memory as the stack grows downwards in memory.

    // ...
    // Set the stack pointer to the end of memory.
    x[sp] = uint32_t(memory.size());
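
It may look odd that sp starts one byte past the last valid address, but nothing is ever stored there. The following arithmetic sketch traces the first few stack operations of main, assuming the 4K image created earlier and assuming that call leaves sp untouched, as it does in RISC-V. The constant names are invented for this illustration.

```cpp
#include <cstdint>

// Assumes the 4K memory image created in main().
constexpr uint32_t memorySize = 4096;

// Run() starts with sp one byte past the last valid address.
constexpr uint32_t initialSp = memorySize;

// addi sp, sp, -32 reserves the stack frame for the Owl-2820 main routine.
constexpr uint32_t frameSp = initialSp - 32;

// sw ra, 28(sp) stores four bytes starting at sp + 28.
constexpr uint32_t raAddr = frameSp + 28;

static_assert(frameSp == 4064);
static_assert(raAddr == 4092);

// The last byte written is the last byte of memory, so the store is in range.
static_assert(raAddr + sizeof(uint32_t) - 1 == memorySize - 1);
```

In other words, the highest slot in the frame occupies the final word of the image, and every other slot sits below it.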

Fetching an instruction

The code that fetches the next instruction from memory needs to take account of byte order where it didn’t previously. It does this by using the 32-bit overload of AsLE() to read the value in little-endian order.

    // ...
    while (!done)
    {
        // Fetch a 32-bit word from the address pointed to by the program counter.
        pc = nextPc;
        nextPc += wordSize;
        const uint32_t ins = AsLE(code[pc / wordSize]);

Implementing memory access instructions

Finally, Run() needs implementations for the memory access instructions. The implementations of these instructions are the same as those shown in previous sections, and use the ReadN() and WriteN() functions to operate on memory, the byte-addressable view of the image.

        // Decode the word to extract the opcode.
        const Opcode opcode = Opcode(ins & 0x7f);

        // Dispatch it and execute it.
        switch (opcode)
        {
        // ...
        case Opcode::Lb: {
            // r0 <- sext(memory8(r1 + imm12))
            const uint32_t addr = x[r1(ins)] + imm12(ins);
            const int32_t signExtendedByte = static_cast<int8_t>(Read8(memory, addr));
            x[r0(ins)] = signExtendedByte;
            x[0] = 0; // Ensure x0 is always zero.
            break;
        }

        // ...

        case Opcode::Sw: {
            // memory32(r1 + imm12) <- r0
            const uint32_t addr = x[r1(ins)] + imm12(ins);
            const uint32_t word = x[r0(ins)];
            Write32(memory, addr, word);
            break;
        }
        // ...

Summary

In this post, we added memory access to the Owl-2820 CPU and demonstrated it using an implementation of Fibonacci that uses a lookup table. We also built upon what we learned about calling conventions in the previous post, by having our program set up and tear down a stack frame. Finally, we unified code and data into one memory image which we supplied to Run().

Full code

Here’s a link to the full code for this post on Compiler Explorer.