Part 5 Implementing memory
Until now, the implementation of the Owl-2820 CPU has had no need to access memory beyond fetching the next instruction. However, all but the most trivial programs need to read values from memory so that they can operate on them, and then having operated on them, write the results back to memory.
In this post, we’re going to extend the Owl-2820 CPU with instructions that allow it to read from and write to memory.
Accessing memory
Let’s write a program that requires memory access, then we’ll update our implementation of the Owl-2820 CPU so that it can run the program.
Fibonacci with a lookup table in C
Let’s do this by rewriting the code that prints the first forty-eight Fibonacci numbers so that it uses a memory-based lookup table rather than repeated addition. Here’s what that looks like in C.
#include <stdint.h>
#include <stdio.h>
static uint32_t lut[48] = {
0, // fib(0)
1, // fib(1)
1, // fib(2)
2, // fib(3)
3, // fib(4)
5, // fib(5)
8, // fib(6)
13, // fib(7)
// --- snip ---
1836311903, // fib(46)
2971215073, // fib(47)
};
static uint32_t fib(uint32_t n)
{
if (n < 48)
{
return lut[n];
}
return -1;
}
int main(void)
{
for (uint32_t i = 0; i < 48; i++)
{
uint32_t result = fib(i);
printf("fib(%u) = %u\n", i, result);
}
return 0;
}
Fibonacci with a lookup table in assembly language
Given that the Owl-2820 CPU is based on RISC-V, the easiest way to turn the C into assembly language is to compile it for a RISC-V target, such as this code generated by gcc on Compiler Explorer, then tweak it manually so that it will run on the Owl.
Here’s what the resulting code looks like when translated to Owl-2820 assembly language. I’ve replaced the call to printf() with a call to a print_fib subroutine similar to the one from part 4, and I’ve added the start entry point which calls main so that main has something to return to, but overall it is essentially the same code that gcc generated.
This code makes use of the lw (load word) and sw (store word) memory access instructions which we haven’t seen previously.
start:
call main
.word 0 ; emit an illegal instruction to stop the VM
main:
addi sp, sp, -32 ; reserve space on the stack
; save registers s0 - s3 and the return address register ra onto the stack
sw s0, 24(sp)
sw s1, 20(sp)
sw s2, 16(sp)
sw s3, 12(sp)
sw ra, 28(sp)
; s1 = the address of the start of the lookup table
lui s1, %hi(lut)
addi s1, s1, %lo(lut)
li s0, 0 ; i = 0
li s2, 48 ; s2 = 48
print_loop:
lw a1, 0(s1) ; arg1 = *s1
mv a0, s0 ; arg0 = i
addi s0, s0, 1 ; i++
call print_fib
addi s1, s1, 4 ; bump s1 to the next address in the lookup table
bltu s0, s2, print_loop ; if i < 48 go to print_loop
; restore the return address register ra and registers s0 - s3 from the stack
lw ra, 28(sp)
lw s0, 24(sp)
lw s1, 20(sp)
lw s2, 16(sp)
lw s3, 12(sp)
li a0, 0 ; set the return value to zero
addi sp, sp, 32 ; free the reserved space back to the stack
ret ; return from main
print_fib:
ecall ; invoke hard-coded print_fib
ret ; return from print_fib
lut:
.word 0 ; fib(0)
.word 1 ; fib(1)
.word 1 ; fib(2)
.word 2 ; fib(3)
.word 3 ; fib(4)
.word 5 ; fib(5)
.word 8 ; fib(6)
.word 13 ; fib(7)
; --- snip ---
.word 1836311903 ; fib(46)
.word 2971215073 ; fib(47)
Memory access in the stack frame
The original listing of Fibonacci from part 0 had a stack frame. I removed it at the time because it wasn’t necessary for generating the Fibonacci sequence.
This time we’re retaining it because, unlike some instruction set architectures which have dedicated push and pop instructions that operate directly on the stack to save and restore registers, the Owl-2820 CPU does not, and therefore needs to use its sw and lw instructions to do this.
The code to set up the stack frame consists primarily of sw (store word) memory access instructions that store register values into the stack at memory addresses relative to the stack pointer, sp.
main:
addi sp, sp, -32 ; reserve space on the stack
; save registers s0 - s3 and the return address register ra onto the stack
sw s0, 24(sp)
sw s1, 20(sp)
sw s2, 16(sp)
sw s3, 12(sp)
sw ra, 28(sp)
Similarly, the code to tear down the stack frame consists primarily of lw (load word) instructions which it uses to restore register values from the stack at memory addresses relative to the stack pointer, sp.
; restore the return address register ra and registers s0 - s3 from the stack
lw ra, 28(sp)
lw s0, 24(sp)
lw s1, 20(sp)
lw s2, 16(sp)
lw s3, 12(sp)
li a0, 0 ; set the return value to zero
addi sp, sp, 32 ; free the reserved space back to the stack
ret ; return from main
These particular registers are saved on the stack because, according to the Owl-2820 calling convention described in part 4, they are all callee-saved registers, meaning that the callee, not the caller, is responsible for preserving their values.
As a reminder, the ret instruction, which we also introduced in part 4, doesn’t pop the return address from the stack. Instead, it jumps to the address in the return address register, ra, which was saved to the stack by an sw instruction on entry to main and restored from the stack by an lw instruction on exit from main.
Memory access in the Fibonacci lookup table
The lookup table itself is implemented by using the s1 register as a pointer to its current entry.
Here’s the code to initialize s1 with the address of the lookup table. Don’t worry what %hi and %lo mean for now as I’ll explain those shortly.
; s1 = the address of the start of the lookup table
lui s1, %hi(lut) ; load the upper 20 bits of the lookup table's address into the
; upper 20 bits of s1
addi s1, s1, %lo(lut) ; combine that with the lower 12 bits to get the full 32-bit
; address
Here’s the code that uses the lookup table.
print_loop:
lw a1, 0(s1) ; arg1 = *s1
mv a0, s0 ; arg0 = i
addi s0, s0, 1 ; i++
call print_fib
addi s1, s1, 4 ; bump s1 to the next address in the lookup table
bltu s0, s2, print_loop ; if i < 48 go to print_loop
The lw a1, 0(s1) instruction reads the word at the address pointed to by s1 and puts it into the a1 register so that it can be used as an argument to print_fib.
After printing, it increments s1 by 4 bytes (the size of a 32-bit word) so that it points to the address of the next value, and loops around to print the next value.
Load and store instructions
The Owl-2820 assembly language for this implementation of Fibonacci introduced two new instructions, lw and sw. These load (read) a 32-bit word value from memory into a register and store (write) a 32-bit word value from a register into memory, respectively.
The Owl-2820 CPU also supports reading and writing of byte values and 16-bit halfword values. The following table shows all the Owl-2820 instructions that read from and write to memory.
The Owl-2820 CPU is little-endian, meaning that values in memory that are larger than a byte are stored with the least-significant byte first.
Instruction | Description |
---|---|
lb r0, imm12(r1) | Loads a byte from address imm12(r1) and sign-extends it into r0 |
lbu r0, imm12(r1) | Loads a byte from address imm12(r1) and zero-extends it into r0 |
lh r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and sign-extends it into r0 |
lhu r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and zero-extends it into r0 |
lw r0, imm12(r1) | Loads a little-endian word from address imm12(r1) into r0 |
sb r0, imm12(r1) | Stores the lowest byte of r0 into address imm12(r1) |
sh r0, imm12(r1) | Stores the lower halfword of r0 into address imm12(r1) in little-endian order |
sw r0, imm12(r1) | Stores the word in r0 into address imm12(r1) in little-endian order |
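To make the difference between the sign-extending and zero-extending loads concrete, here’s a small C++ sketch. The helper names SignExtend8, ZeroExtend8 and LittleEndian16 are made up for illustration and are not part of the Owl-2820 code. They only model what happens to a value once its bytes have been fetched from memory.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only, not part of the Owl-2820 VM.

// Models lb: bit 7 of the loaded byte is copied into bits 8..31.
constexpr uint32_t SignExtend8(uint8_t byte)
{
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(byte)));
}

// Models lbu: the upper 24 bits of the result are zero.
constexpr uint32_t ZeroExtend8(uint8_t byte)
{
    return byte;
}

// Models how lh and lhu assemble a halfword: the byte at the lower address is the
// least-significant byte.
constexpr uint16_t LittleEndian16(uint8_t byteAtAddr, uint8_t byteAtAddrPlus1)
{
    return static_cast<uint16_t>(byteAtAddr | (byteAtAddrPlus1 << 8));
}

static_assert(SignExtend8(0x80) == 0xffffff80u); // lb sees 0x80 as -128
static_assert(ZeroExtend8(0x80) == 0x00000080u); // lbu sees 0x80 as 128
static_assert(SignExtend8(0x7f) == 0x0000007fu); // bit 7 clear: both loads agree
static_assert(LittleEndian16(0x34, 0x12) == 0x1234);
```

The two flavors of load only differ when the top bit of the loaded value is set, which is why a byte such as 0x7f comes out the same either way.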
Directives
The assembly language also introduced a few assembly language directives and expressions. These are not Owl-2820 instructions, but are used to tell the assembler to do something.
Directive | Description |
---|---|
.word u32 | Emits the 32-bit unsigned word u32 into the code generated by the assembler |
%lo(label) | Calculates the lower 12 bits of the absolute address of label |
%hi(label) | Calculates the upper 20 bits of the absolute address of label |
The .word directive tells the assembler to emit a word at the current address rather than emitting an instruction. This allows us to write values in the Fibonacci lookup table directly into the code generated by the assembler.
lut:
.word 0 ; fib(0)
.word 1 ; fib(1)
.word 1 ; fib(2)
.word 2 ; fib(3)
.word 3 ; fib(4)
.word 5 ; fib(5)
.word 8 ; fib(6)
.word 13 ; fib(7)
; --- snip ---
.word 1836311903 ; fib(46)
.word 2971215073 ; fib(47)
The %hi and %lo expressions work with the lui and addi instructions so that we can get the address of the lookup table. We need to do this because the Owl-2820 CPU provides no way of loading a 32-bit immediate value in a single instruction, as instructions themselves occupy only 32 bits.
; s1 = the address of the start of the lookup table
lui s1, %hi(lut) ; load the upper 20 bits of the lookup table's address into the
; upper 20 bits of s1
addi s1, s1, %lo(lut) ; combine that with the lower 12 bits to get the full 32-bit
; address
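There is one subtlety worth sketching. In RISC-V toolchains, addi sign-extends its 12-bit immediate, so %hi rounds the upper 20 bits up by one whenever bit 11 of the address is set, and the negative %lo value then brings the sum back down to the right address. The sketch below assumes the Owl-2820 assembler follows the same convention, which this post doesn’t spell out. Hi20, Lo12 and LuiThenAddi are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers modelling %hi and %lo under the RISC-V convention.
// Whether the Owl-2820 assembler does exactly this is an assumption.

// Upper 20 bits of the address, rounded up when bit 11 is set.
constexpr uint32_t Hi20(uint32_t addr)
{
    return (addr + 0x800u) >> 12;
}

// Lower 12 bits of the address, sign-extended the way addi treats its immediate.
constexpr int32_t Lo12(uint32_t addr)
{
    return static_cast<int32_t>(addr << 20) >> 20;
}

// Models `lui rd, hi` followed by `addi rd, rd, lo`.
constexpr uint32_t LuiThenAddi(uint32_t hi, int32_t lo)
{
    return (hi << 12) + static_cast<uint32_t>(lo);
}

// Bit 11 clear: %hi is simply the top 20 bits.
static_assert(Hi20(0x12345678u) == 0x12345u && Lo12(0x12345678u) == 0x678);

// Bit 11 set: %lo is negative, so %hi is one higher than the top 20 bits.
static_assert(Hi20(0x00001800u) == 0x2u && Lo12(0x00001800u) == -2048);
static_assert(LuiThenAddi(Hi20(0x00001800u), Lo12(0x00001800u)) == 0x00001800u);
```

For a small program whose lookup table sits below address 0x800 the adjustment never kicks in, so the simpler "upper 20 bits, lower 12 bits" description gives the same result.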
Implementing the memory access instructions
The next few subsections describe the Owl-2820 memory access instructions using the format introduced in part 1.
The memory accesses for these instructions are implemented in terms of ReadN() and WriteN() functions, where N is the size of the operation in bits, e.g., Read8() reads 8 bits. These functions are explained in detail in the next section, Implementing memory.
lb - load byte
Format
lb r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Lb : 7 bits |
Description
Loads a sign-extended byte value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const int32_t signExtendedByte = static_cast<int8_t>(Read8(memory, addr));
x[r0(ins)] = signExtendedByte;
x[0] = 0; // Ensure x0 is always zero.
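The implementation above relies on the r0(), r1() and imm12() decoding helpers from earlier parts of this series. As a rough sketch of how such helpers could follow from the encoding table, here’s one way to extract each field. The names FieldOpcode, FieldR0, FieldR1, FieldImm12 and EncodeLoadStore are hypothetical, and the opcode value 0x2a is a placeholder rather than a real Owl-2820 opcode.

```cpp
#include <cstdint>

// Hypothetical field extractors derived from the encoding table. The VM's own
// r0(), r1() and imm12() helpers may be written differently.
constexpr uint32_t FieldOpcode(uint32_t ins) { return ins & 0x7f; }         // bits [6:0]
constexpr uint32_t FieldR0(uint32_t ins) { return (ins >> 7) & 0x1f; }      // bits [11:7]
constexpr uint32_t FieldR1(uint32_t ins) { return (ins >> 12) & 0x1f; }     // bits [16:12]

// Bits [31:20]. The arithmetic right shift of a signed value sign-extends the field.
constexpr int32_t FieldImm12(uint32_t ins)
{
    return static_cast<int32_t>(ins) >> 20;
}

// Hypothetical encoder, the inverse of the extractors above.
constexpr uint32_t EncodeLoadStore(uint32_t opcode, uint32_t r0, uint32_t r1, int32_t imm12)
{
    return ((static_cast<uint32_t>(imm12) & 0xfffu) << 20) | ((r1 & 0x1fu) << 12) |
           ((r0 & 0x1fu) << 7) | (opcode & 0x7fu);
}

// Placeholder opcode 0x2a, destination register 11, base register 9, offset -4.
constexpr uint32_t exampleIns = EncodeLoadStore(0x2a, 11, 9, -4);
static_assert(FieldOpcode(exampleIns) == 0x2a);
static_assert(FieldR0(exampleIns) == 11);
static_assert(FieldR1(exampleIns) == 9);
static_assert(FieldImm12(exampleIns) == -4);
```

Because the immediate occupies the top 12 bits of the instruction, a single arithmetic shift both extracts and sign-extends it, so no separate masking step is needed.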
lbu - load byte unsigned
Format
lbu r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Lbu : 7 bits |
Description
Loads a zero-extended byte value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const uint32_t zeroExtendedByte = static_cast<uint8_t>(Read8(memory, addr));
x[r0(ins)] = zeroExtendedByte;
x[0] = 0; // Ensure x0 is always zero.
lh - load halfword
Format
lh r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Lh : 7 bits |
Description
Loads a sign-extended 16-bit halfword value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const int32_t signExtendedHalfWord = static_cast<int16_t>(Read16(memory, addr));
x[r0(ins)] = signExtendedHalfWord;
x[0] = 0; // Ensure x0 is always zero.
lhu - load halfword unsigned
Format
lhu r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Lhu : 7 bits |
Description
Loads a zero-extended 16-bit halfword value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const uint32_t zeroExtendedHalfWord = static_cast<uint16_t>(Read16(memory, addr));
x[r0(ins)] = zeroExtendedHalfWord;
x[0] = 0; // Ensure x0 is always zero.
lw - load word
Format
lw r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Lw : 7 bits |
Description
Loads a 32-bit word value from the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. The value is read from memory in little-endian order, i.e., least significant byte first.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
x[r0(ins)] = Read32(memory, addr);
x[0] = 0; // Ensure x0 is always zero.
sb - store byte
Format
sb r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Sb : 7 bits |
Description
Stores the value in the lowest byte of register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const std::byte byte = static_cast<std::byte>(x[r0(ins)]);
Write8(memory, addr, byte);
sh - store halfword
Format
sh r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Sh : 7 bits |
Description
Stores the value in the lowest 16 bits of register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12. The value is written into memory in little-endian order, i.e., least significant byte first.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const uint16_t halfWord = static_cast<uint16_t>(x[r0(ins)]);
Write16(memory, addr, halfWord);
sw - store word
Format
sw r0, imm12(r1)
Encoding
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | Sw : 7 bits |
Description
Stores the 32-bit value in register r0 into the memory address pointed to by register r1 plus the sign-extended 12-bit immediate value imm12. The value is written into memory in little-endian order, i.e., least significant byte first.
Possible implementation
const uint32_t addr = x[r1(ins)] + imm12(ins);
const uint32_t word = x[r0(ins)];
Write32(memory, addr, word);
Implementing memory
The memory access instructions in the previous section were implemented in terms of the following ReadN() and WriteN() memory access functions.
Function | Description |
---|---|
Read8() | Reads a byte from memory |
Read16() | Reads a 16-bit halfword from memory in little-endian order |
Read32() | Reads a 32-bit word from memory in little-endian order |
Write8() | Writes a byte into memory |
Write16() | Writes a 16-bit halfword into memory in little-endian order |
Write32() | Writes a 32-bit word into memory in little-endian order |
All of these functions operate on byte boundaries, so memory can be thought of as a directly addressable sequence of contiguous bytes. In C++, we can represent this with a std::span of bytes.
using Memory = std::span<std::byte>;
Reading and writing bytes
Reading and writing bytes is simple. Our memory implementation is a std::span of bytes, so we index it with the Owl-2820 VM’s memory address and read or write the byte value as required.
std::byte Read8(const Memory memory, uint32_t addr)
{
return memory[addr];
}
void Write8(Memory memory, uint32_t addr, std::byte byte)
{
memory[addr] = byte;
}
Reading and writing larger values
Reading and writing 16-bit halfword and 32-bit word values is a little more complicated for two reasons:
- The Owl-2820 CPU is little-endian whereas the host CPU may be big-endian.
- The Owl-2820 CPU allows unaligned memory access whereas the host CPU may not.
Swapping bytes
If the host CPU is little-endian, then any data read from Owl-2820 memory is already in little-endian order and does not need to be reversed. However, if the host CPU is big-endian, then any data read from Owl-2820 memory in little-endian order needs to be reversed to the native order, otherwise the value will be incorrect.
Given that the Owl-2820 CPU can read and write both 16-bit halfwords and 32-bit words, we can solve this problem by supplying two overloaded functions that will simply return the value on a little-endian host machine, and swap the byte order of the value on a big-endian host machine.
Both overloads contain if constexpr statements that are evaluated at compile time, which means that the compiler will generate the appropriate code to leave the byte order unchanged when the host machine is little-endian, and to reverse the byte order when the host machine is big-endian.
Bytes are swapped by a combination of masking and rotating which most major compilers will detect and replace with a single instruction, at least where the underlying instruction set architecture supports it. If you’re interested in other such bit-twiddling tricks then I recommend Hacker’s Delight by Henry S. Warren, as the book is full of them.
Here’s the implementation of the 32-bit version of the overloaded function. The comments in the code contain a worked example showing how the hex number 0x12345678 can have its byte order reversed to 0x78563412. The 16-bit implementation of the function is similar, albeit simpler.
uint32_t AsLE(uint32_t word)
{
if constexpr (std::endian::native == std::endian::little)
{
// Nothing to do. We're on a little-endian platform, and Owl-2820 is little-endian.
return word;
}
if constexpr (std::endian::native == std::endian::big)
{
// We're on a big-endian platform so reverse the byte order as Owl-2820 is little-endian.
// The code may look like it takes several instructions, but MSVC, gcc, and clang are clever
// enough to figure out what is going on and will generate the equivalent of `bswap` (x86)
// or `rev` (ARM).
//
// Example (all values are hex).
//
// std::rotr(word & 00ff00ff, 8)
// 12345678 & 00ff00ff == 00340078
// 00340078 rotr 8 == 78003400
//
// std::rotl(word, 8) & 00ff00ff
// 12345678 rotl 8 == 34567812
// 34567812 & 00ff00ff == 00560012
//
// 78003400 | 00560012 == 78563412
//
return std::rotr(word & 0x00ff00ff, 8) | (std::rotl(word, 8) & 0x00ff00ff);
}
}
Handling unaligned memory access
Some instruction set architectures do not support unaligned memory access. On such architectures, a 16-bit halfword can only be read from or written to a memory address that is a multiple of two. Likewise, a 32-bit word can only be read from or written to a memory address that is a multiple of four. Any attempt to perform an unaligned memory access will result in a fault often known as a bus error.
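As a quick illustration of the rule, an address is aligned for an access of a given size when it is a multiple of that size. IsAligned is a hypothetical helper used only to show the arithmetic. The Owl-2820 VM itself doesn’t need such a check because it permits unaligned accesses.

```cpp
#include <cstdint>

// Hypothetical helper: true if addr is a multiple of the access size in bytes.
constexpr bool IsAligned(uint32_t addr, uint32_t sizeInBytes)
{
    return addr % sizeInBytes == 0;
}

static_assert(IsAligned(0x1000, 4));  // word access at a multiple of four
static_assert(!IsAligned(0x1002, 4)); // would fault on a strict-alignment host
static_assert(IsAligned(0x1002, 2));  // but is fine for a halfword
static_assert(!IsAligned(0x1003, 2)); // odd addresses are unaligned for halfwords
```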
The Owl-2820 CPU supports unaligned memory access so to support the scenario where the host machine does not, we need to provide a mechanism to translate an unaligned memory access into an aligned one. This turns out to be straightforward.
When reading, we can do a bytewise memcpy() of the bytes from Owl-2820 memory into a variable that is suitably aligned. For example, here’s how we can read a potentially unaligned 16-bit value from Owl-2820 memory into a uint16_t which will be aligned to a 16-bit boundary on an architecture that requires it.
// We do the equivalent of a memcpy from the VM's memory into a uint16_t variable which we can
// expect the compiler to align appropriately for the host platform. On a platform which
// supports unaligned reads, most compilers will detect what we're doing and optimize it away.
uint16_t v;
std::ranges::copy_n(memory.data() + addr, sizeof(v), reinterpret_cast<std::byte*>(&v));
Similarly, when writing, we can do a memcpy() from an aligned value into Owl-2820 memory.
// We do the equivalent of a memcpy from an aligned uint16_t value into the VM's memory.
const uint16_t v = valueToCopy;
std::ranges::copy_n(reinterpret_cast<const std::byte*>(&v), sizeof(v), memory.data() + addr);
Note that as this is C++20 I’ve spelled memcpy() as std::ranges::copy_n().
Putting it all together
Putting it all together, to read a 16-bit halfword or a 32-bit word from memory, we first deal with unaligned memory access by copying the bytes from Owl-2820 memory into a suitably aligned value, then, if necessary, we swap the bytes to convert them from little-endian order to native order.
uint32_t Read32(const Memory memory, uint32_t addr)
{
// Owl-2820 is permissive about unaligned memory accesses. This may not be the case for
// the host platform, so we do the equivalent of a memcpy from the VM's memory before
// trying to interpret the value. Most compilers will detect what we're doing and
// optimize it away.
uint32_t v;
std::ranges::copy_n(memory.data() + addr, sizeof(uint32_t), reinterpret_cast<std::byte*>(&v));
// Owl-2820 is little-endian, so swap the byte order if necessary.
return AsLE(v);
}
To write a 16-bit halfword or a 32-bit word to memory, we first swap the value from native order into little-endian order if necessary, then copy the underlying bytes from the aligned value into Owl-2820 memory.
void Write32(Memory memory, uint32_t addr, uint32_t word)
{
// Owl-2820 is little-endian, so swap the byte order if necessary.
const uint32_t v = AsLE(word);
// Owl-2820 is permissive about unaligned memory accesses. This may not be the case for
// the host platform, so we do the equivalent of a memcpy when writing the value to the
// VM's memory. Most compilers will detect what we're doing and optimize it away.
std::ranges::copy_n(reinterpret_cast<const std::byte*>(&v), sizeof(uint32_t),
memory.data() + addr);
}
Rewriting Run()
The original implementation of Run() performed no memory access other than fetching the next instruction from memory. We implemented that by supplying Run() with a pointer to some 32-bit values representing the Owl-2820 instructions, then by using the program counter, pc, to derive an index into those instructions as shown in the code fragment below.
void Run(const uint32_t* code)
{
// ...
while (!done)
{
// Fetch a 32-bit word from memory at the address pointed to by the program counter.
pc = nextPc;
nextPc += wordSize;
const uint32_t ins = code[pc / wordSize];
Changes to Run()
The revised implementation of Run() needs to support not only fetching the next instruction from memory as before, but also reading from and writing to byte-addressable memory.
It enables this by changing Run()’s signature to take a std::span<uint32_t> which unifies both code and data into a word-aligned memory image.
void Run(std::span<uint32_t> image)
{
To illustrate how it is used, here’s how Run() is invoked from main().
// Create a 4K memory image.
constexpr size_t memorySize = 4096;
std::vector<uint32_t> image(memorySize / sizeof(uint32_t));
// Assemble our program and copy it to the start of the memory image.
auto code = Assemble();
std::ranges::copy(code, image.begin());
Run(image);
Two views of memory
When Run() fetches an instruction, it does so via code, a read-only, word-addressable view of the image, whereas memory access instructions such as lb and sb operate on memory, a read/write, byte-addressable view of the image exposed by std::as_writable_bytes().
void Run(std::span<uint32_t> image)
{
// ...
// Get a read-only, word-addressable view of the image for fetching instructions.
const auto code = image;
// Get a read/write, byte-addressable view of the image for memory accesses.
auto memory = std::as_writable_bytes(image);
Initializing the stack pointer
Our new implementation of Fibonacci uses the stack, so Run() needs to initialize the stack pointer, sp, to a sensible value. It sets it to the end of memory as the stack grows downwards in memory.
// ...
// Set the stack pointer to the end of memory.
x[sp] = uint32_t(memory.size());
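To tie this back to the stack frame in main, here’s the arithmetic worked through as a sketch. It assumes the 4K image created in main() and that sp still holds its initial value on entry to the assembly language main, which is the case here because start doesn’t adjust it. The constant and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed values: a 4K image with sp initialized to the end of memory.
constexpr uint32_t memorySizeInBytes = 4096;
constexpr uint32_t initialSp = memorySizeInBytes;

// addi sp, sp, -32 reserves a 32-byte frame just below the end of memory.
constexpr uint32_t frameSp = initialSp - 32;

// Address of a stack slot such as 28(sp).
constexpr uint32_t SlotAddress(uint32_t sp, int32_t offset)
{
    return sp + static_cast<uint32_t>(offset);
}

static_assert(frameSp == 4064);
static_assert(SlotAddress(frameSp, 28) == 4092); // sw ra, 28(sp)
static_assert(SlotAddress(frameSp, 24) == 4088); // sw s0, 24(sp)
static_assert(SlotAddress(frameSp, 12) == 4076); // sw s3, 12(sp)

// The saved ra occupies the last four bytes of the image.
static_assert(SlotAddress(frameSp, 28) + 4 == memorySizeInBytes);
```

Starting sp at the size of the image, one past the last valid byte, works because the program always reserves space by decrementing sp before storing anything relative to it.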
Fetching an instruction
The code that fetches the next instruction from memory needs to take account of byte order where it didn’t previously. It does this by using the 32-bit overload of AsLE() to read the value in little-endian order.
// ...
while (!done)
{
// Fetch a 32-bit word from the address pointed to by the program counter.
pc = nextPc;
nextPc += wordSize;
const uint32_t ins = AsLE(code[pc / wordSize]);
Implementing memory access instructions
Finally, Run() needs implementations for the memory access instructions. The implementations of these instructions are the same as those shown in previous sections, and use the ReadN() and WriteN() functions to operate on memory, the byte-addressable view of the image.
// Decode the word to extract the opcode.
const Opcode opcode = Opcode(ins & 0x7f);
// Dispatch it and execute it.
switch (opcode)
{
// ...
case Opcode::Lb: {
// r0 <- sext(memory8(r1 + imm12))
const uint32_t addr = x[r1(ins)] + imm12(ins);
const int32_t signExtendedByte = static_cast<int8_t>(Read8(memory, addr));
x[r0(ins)] = signExtendedByte;
x[0] = 0; // Ensure x0 is always zero.
break;
}
// ...
case Opcode::Sw: {
// memory32(r1 + imm12) <- r0
const uint32_t addr = x[r1(ins)] + imm12(ins);
const uint32_t word = x[r0(ins)];
Write32(memory, addr, word);
break;
}
// ...
Summary
In this post, we added memory access to the Owl-2820 CPU and demonstrated it using an implementation of Fibonacci that uses a lookup table. We also built upon what we learned about calling conventions in the previous post, by having our program set up and tear down a stack frame. Finally, we unified code and data into one memory image which we supplied to Run().
Full code
Here’s a link to the full code for this post on Compiler Explorer.