17 March, 2024

In this series we’re going to implement a virtual machine from scratch.

Let’s Build a Virtual Machine: Part 0 - Introduction

In this series we’re going to implement a virtual machine. We’ll start by creating a virtual CPU in software, and implementing its instruction set. We’ll write assembly language programs for our virtual CPU and run those programs on a virtual machine that we create. And over time we will build things up so that we can write programs for our virtual machine using languages such as C, C++, Rust or Swift.

Let’s see where that journey takes us.

Prerequisites

There aren’t many prerequisites for this series. My assumption is that you have at least a passing acquaintance with a language such as C, C++, Rust, or Swift, and that you’re reasonably familiar with ideas such as bits, bytes, addresses, and registers.

The initial implementation will be written in C++, largely because that’s what I use for my day job. Other fine languages are also available. If I use a C++ specific feature in the code then I will try my best to explain it.

Owls say what?

My writing will be incredibly verbose if I have to say things like “the virtual machine that we’re implementing” or “the virtual CPU that’s at the heart of the virtual machine that we’re implementing”.

Instead, I’m going to give this project a name. I’ll call it Owl, partly as an homage to the logo used on the BBC Micro, and also because it gives me an excuse to refer to the virtual CPU as the 2820 (two-eight-two-oh).

Typically I’ll describe the project as a whole as Owl, but if I need to make a distinction then I might say “the Owl VM” or “the Owl CPU”.

Building a minimal virtual CPU

If we’re going to implement a virtual CPU, then we’ll need to think about how we could run programs on it. Let’s start small, and consider how to implement enough of a CPU to run a program that generates the Fibonacci sequence for unsigned 32-bit integers.

To do that, let’s take a look at what a real CPU would do, by writing a program in C, then looking at the assembly language that it generates.

Fibonacci in C

Here’s a C program that displays numbers in the Fibonacci sequence, starting with fib(0) and ending with the prime number fib(47). It stops after displaying fib(47) because calculating fib(48) would overflow an unsigned 32-bit integer.

#include <stdint.h>
#include <stdio.h>

static uint32_t fib(uint32_t n)
{
    if (n < 2)
    {
        return n;
    }
    uint32_t previous = 0;
    uint32_t current = 1;
    for (; n > 1; --n)
    {
        uint32_t next = current + previous;
        previous = current;
        current = next;
    }
    return current;
}

int main(void)
{
    for (uint32_t i = 0; i < 48; i++)
    {
        uint32_t result = fib(i);
        printf("fib(%u) = %u\n", i, result);
    }
    return 0;
}

I chose this implementation because it’s simple to understand, even when compiled into assembly language. We’ll see what that looks like a little later.

Here’s what it outputs (abbreviated).

fib(0) = 0
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
...
fib(46) = 1836311903
fib(47) = 2971215073

Fibonacci in assembly language

Here’s a link to our program in Compiler Explorer.

If you click the link then you’ll see the C program in the panel on the left, and the equivalent assembly language in the panel on the right. What you’re looking at is RISC-V assembly language, specifically the 32-bit base integer instruction set known as RV32I.

There are many different instruction set architectures, including x64 and ARM, but I’ve picked RV32I as it’s far smaller and easier to explain than either of those.

Here’s a walkthrough of the assembly language that the compiler generated. I’ve commented it and changed the names of some of the labels to make things clearer. If you don’t know assembly language then it may still look very complicated, but I can promise you that it isn’t, and it won’t be long before we can write a program that can run this.

main:
        ; Enter main(), saving the registers that it uses onto the stack.
        addi    sp, sp, -32
        sw      ra, 28(sp)
        sw      s0, 24(sp)
        sw      s1, 20(sp)
        sw      s2, 16(sp)
        sw      s3, 12(sp)
        sw      s4, 8(sp)
        ; ------------------------------------------------------------------------------------------
        li      s0, 0                       ; i = 0
        li      s2, 2                       ; s2 = 2
        lui     a0, %hi(format_str)
        addi    s1, a0, %lo(format_str)     ; s1 = the address of the printf format string
        li      s3, 48                      ; s3 = 48
        li      s4, 1                       ; s4 = 1
        j       fib                         ; go to fib
print_loop:
        mv      a0, s1                      ; arg0 = the address of the printf format string
        mv      a1, s0                      ; arg1 = i
                                            ; arg2 is already set to current
        call    printf                      ; call printf
        addi    s0, s0, 1                   ; i = i + 1
        beq     s0, s3, done                ; if i == 48 go to done
fib:
        mv      a2, s0                      ; current = i
        bltu    s0, s2, print_loop          ; if i < 2 go to print_loop
        li      a0, 0                       ; previous = 0
        li      a2, 1                       ; current = 1
        mv      a1, s0                      ; n = i
fib_loop:
        mv      a3, a2                      ; tmp = current
        addi    a1, a1, -1                  ; n = n - 1
        add     a2, a0, a2                  ; current = current + prev
        mv      a0, a3                      ; previous = tmp
        bltu    s4, a1, fib_loop            ; if n > 1 go to fib_loop
        j       print_loop                  ; go to print_loop
done:
        li      a0, 0                       ; set the return value of main() to 0
        ; ------------------------------------------------------------------------------------------
        ; Restore registers from the stack and return from main().
        lw      ra, 28(sp)
        lw      s0, 24(sp)
        lw      s1, 20(sp)
        lw      s2, 16(sp)
        lw      s3, 12(sp)
        lw      s4, 8(sp)
        addi    sp, sp, 32
        ret                                 ; return
format_str:
        .asciz  "fib(%u) = %u\n"            ; the printf format string, terminated by '\0'

Registers

To understand the assembly language output better, we need to understand how RV32I names and uses registers.

RV32I has 32 integer registers, named x0 - x31, which are all 32-bit registers. The first of these, x0, always contains zero, and writing to it has no effect. The remaining registers are theoretically general purpose registers, but in practice certain conventions apply. The RISC-V integer register naming convention documentation on GitHub contains a table with the names of the integer registers and how they’re used. I’ve reproduced it in part here.

Register Name Description
x0 zero always contains zero (writing to x0 has no effect)
x1 ra return address
x2 sp stack pointer
x3 gp global pointer
x4 tp thread pointer
x5 - x7 t0 - t2 temporary registers
x8 - x9 s0 - s1 callee-saved registers
x10 - x17 a0 - a7 argument registers
x18 - x27 s2 - s11 callee-saved registers
x28 - x31 t3 - t6 temporary registers

The stack frame

Now we know that sp is the stack pointer, ra is the return address register, and s0 thru s11 are callee-saved registers, we can get a better understanding of the code in the listing above that comes before the first horizontal line, and the code in the listing above that comes after the second horizontal line.

These pieces of code are compiler-generated boilerplate that set up the stack frame on entry to main() and tear it down on exit from main().

Here’s the first piece of code, before the first horizontal line in the listing above:

main:
        ; Enter main(), saving the registers that it uses onto the stack.
        addi    sp, sp, -32
        sw      ra, 28(sp)
        sw      s0, 24(sp)
        sw      s1, 20(sp)
        sw      s2, 16(sp)
        sw      s3, 12(sp)
        sw      s4, 8(sp)
        ; ------------------------------------------------------------------------------------------

It sets up the stack frame by reserving some space in the stack, then saving the return address ra and the callee-saved registers s0 thru s4 into that space.

To save registers to the stack, it uses the sw (store word) instruction. For example, sw ra, 28(sp) means store the value in register ra into memory at the address pointed to the value in register sp plus 28 bytes.

To visualize this, think of memory as an array of 32-bit integers, and registers sp and ra as 32-bit integer variables.

mem[sp + 28/4] = ra; // sw ra, 28(sp)

The fact that it saves s0 thru s4 onto the stack is a good indication that those registers are used in main()’s body, and therefore need to be saved onto the stack to be restored later when main() returns.

Here’s the second piece of code, after the second horizontal line in the listing above:

        ; ------------------------------------------------------------------------------------------
        ; Restore registers from the stack and return from main().
        lw      ra, 28(sp)
        lw      s0, 24(sp)
        lw      s1, 20(sp)
        lw      s2, 16(sp)
        lw      s3, 12(sp)
        lw      s4, 8(sp)
        addi    sp, sp, 32
        ret                                 ; return

It tears down the stack frame by restoring the return address register ra and the caller saved registers s0 thru s4 from where they were previously saved on the stack, before restoring the previously reserved space on the stack and returning to the caller via the ret instruction.

To restore registers from the stack, it uses the lw (load word) instruction. For example, lw s4, 8(sp) means load the value in memory at the address pointed to by the value in register sp plus 8 bytes into register s4.

To visualize this, use the same trick as before and think of memory as an array of 32-bit integers, and registers sp and s4 as 32-bit integer variables.

s4 = mem[sp + 8/4]; // lw s4, 8(sp)

We don’t need to worry about these pieces of code any further, as they’re essentially boilerplate that the compiler generates at the start and end of each function. They won’t help us to generate the Fibonacci sequence.

However, there is one important observation to make. Our original program had two functions, main() and fib(), but there is no stack frame setup and teardown code for fib() as it has been inlined by the compiler. In other words, fib() doesn’t exist as a separate function in the generated assembly language, so there’s no need for a stack frame.

The Fibonacci code

Let’s look at main()’s body, which is the code between the two horizontal lines in the earlier listing. Here’s where we can find the code that we would like to run on our CPU. This is the code that displays the Fibonacci numbers from fib(0) to fib(47).

        li      s0, 0                       ; i = 0
        li      s2, 2                       ; s2 = 2
        lui     a0, %hi(format_str)
        addi    s1, a0, %lo(format_str)     ; s1 = the address of the printf format string
        li      s3, 48                      ; s3 = 48
        li      s4, 1                       ; s4 = 1
        j       fib                         ; go to fib
print_loop:
        mv      a0, s1                      ; arg0 = the address of the printf format string
        mv      a1, s0                      ; arg1 = i
                                            ; arg2 is already set to current
        call    printf                      ; call printf
        addi    s0, s0, 1                   ; i = i + 1
        beq     s0, s3, done                ; if i == 48 go to done
fib:
        mv      a2, s0                      ; current = i
        bltu    s0, s2, print_loop          ; if i < 2 go to print_loop
        li      a0, 0                       ; previous = 0
        li      a2, 1                       ; current = 1
        mv      a1, s0                      ; n = i
fib_loop:
        mv      a3, a2                      ; tmp = current
        addi    a1, a1, -1                  ; n = n - 1
        add     a2, a0, a2                  ; current = current + prev
        mv      a0, a3                      ; previous = tmp
        bltu    s4, a1, fib_loop            ; if n > 1 go to fib_loop
        j       print_loop                  ; go to print_loop
done:
        li      a0, 0                       ; set the return value of main() to 0

There are twenty-four instructions here. It may feel like a lot, but each instruction takes the form of an opcode, followed by zero or more operands. If we count the opcodes, then we’ll find that the code in main()’s body uses just nine of them.

Here they are.

Opcode Meaning Description
add Add registers adds the values in two registers and stores the result into another register
addi Add Immediate adds an immediate value and a register and stores the result in another register
beq Branch if EQual compares the content of two registers and jumps to a different location if they’re equal
bltu Branch if Less Than (Unsigned) compares the unsigned values in two registers and jumps to a different location if the first is less than the second
call Call calls a function
li Load Immediate loads an immediate value into a register
lui Load Upper Immediate loads an immediate value into the upper bits of a register
j Jump jumps to a location
mv Move copies the value in one register into another register

Modern CPUs can have hundreds of opcodes. Writing an emulator for one of them would be quite daunting. Fortunately, if we want our CPU to run the code in main()’s body then we’ll only need to implement the nine opcodes in the table above.

In the next post in this series, we’ll take a look at how we can do that.