31 August, 2024

Loading and running a compiled image on an Owl-2820 VM.

Part 9 Running a compiled image on an Owl-2820 VM

At the end of the previous post we got as far as being able to compile a C program into an image that contains RISC-V instructions. Now let’s see what it takes to load that image and run it on the Owl-2820 VM.

Here are a couple of options:

  1. Decode the instructions in the image from RISC-V instruction encoding and re-encode them into Owl-2820 instruction encoding, then run the resulting image on the Owl-2820 VM.

  2. Modify the Owl-2820 VM to decode RISC-V instruction encoding and run the image directly.

Let’s start with option 1 as we already have one piece of the puzzle. We know how to encode instructions for the Owl-2820 by using its assembler. For example, if we want to encode an Add instruction then we can do so like this:

    Assembler a;
    //...
    a.Add(r0, r1, r1);

The other piece of the puzzle is that we need a way of decoding RISC-V RV32I instructions into opcodes and their operands, then dispatching them to the Owl-2820 assembler. We need a RISC-V dispatcher.

A RISC-V dispatcher

Decoding RISC-V’s instruction encoding is a little trickier than decoding Owl-2820’s instruction encoding, but not horribly so.

Creating a dispatcher from a table

As I’ve written one or two RISC-V instruction set emulators before, I’m going to play the “here’s one I made earlier” card and use a slightly hacky Python program that I wrote that uses a table to generate C++ or Rust source code for a RISC-V instruction dispatcher.

The program, make_dispatcher.py, has a simple command line.

usage: make_dispatcher.py [-h] [-c] [-f] [-m] {c++,rust}

Generate a RISC-V instruction dispatcher for the RV32I base ISA plus extensions.

positional arguments:
  {c++,rust}  Target language for code generation.

options:
  -h, --help  show this help message and exit
  -c          Enable the 'C' extension
  -f          Enable the 'F' extension
  -m          Enable the 'M' extension

Here’s how to use it to generate the C++ source code for an RV32I dispatcher.

python make_dispatcher.py c++

The output is heavily templated C++, because my last RISC-V emulator was motivated by an attempt to learn more about C++ concepts.

// This code was generated by `.\make_dispatcher.py c++`. Do not edit.

// A dispatcher for RV32I instructions. BYO handler.
template<typename Handler>
    requires IsRv32iInstructionHandler<Handler>
struct Rv32iDispatcher : public Handler
{
    using Item = typename Handler::Item;

    // Decodes the input word to an RV32I instruction and dispatches it to a handler.
    // clang-format off
    auto Dispatch(u32 code) -> Item
    {
        Handler& self = static_cast<Handler&>(*this);
        Instruction c(code);

        switch (code) {
            case 0x00000073: return self.Ecall();
            case 0x00100073: return self.Ebreak();
        }
        switch (code & 0xfe00707f) {
            case 0x00000033: return self.Add(c.Rd(), c.Rs1(), c.Rs2());
            case 0x40000033: return self.Sub(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00001033: return self.Sll(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00002033: return self.Slt(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00003033: return self.Sltu(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00004033: return self.Xor(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00005033: return self.Srl(c.Rd(), c.Rs1(), c.Rs2());
            case 0x40005033: return self.Sra(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00006033: return self.Or(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00007033: return self.And(c.Rd(), c.Rs1(), c.Rs2());
            case 0x00001013: return self.Slli(c.Rd(), c.Rs1(), c.Shamtw());
            case 0x00005013: return self.Srli(c.Rd(), c.Rs1(), c.Shamtw());
            case 0x40005013: return self.Srai(c.Rd(), c.Rs1(), c.Shamtw());
        }
        switch (code & 0x0000707f) {
            case 0x00000063: return self.Beq(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00001063: return self.Bne(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00004063: return self.Blt(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00005063: return self.Bge(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00006063: return self.Bltu(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00007063: return self.Bgeu(c.Rs1(), c.Rs2(), c.Bimmediate());
            case 0x00000067: return self.Jalr(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00000013: return self.Addi(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00002013: return self.Slti(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00003013: return self.Sltiu(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00004013: return self.Xori(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00006013: return self.Ori(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00007013: return self.Andi(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00000003: return self.Lb(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00001003: return self.Lh(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00002003: return self.Lw(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00004003: return self.Lbu(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00005003: return self.Lhu(c.Rd(), c.Rs1(), c.Iimmediate());
            case 0x00000023: return self.Sb(c.Rs1(), c.Rs2(), c.Simmediate());
            case 0x00001023: return self.Sh(c.Rs1(), c.Rs2(), c.Simmediate());
            case 0x00002023: return self.Sw(c.Rs1(), c.Rs2(), c.Simmediate());
            case 0x0000000f: return self.Fence(c.Fm(), c.Rd(), c.Rs1());
        }
        switch (code & 0x0000007f) {
            case 0x0000006f: return self.Jal(c.Rd(), c.Jimmediate());
            case 0x00000037: return self.Lui(c.Rd(), c.Uimmediate());
            case 0x00000017: return self.Auipc(c.Rd(), c.Uimmediate());
        }
        return self.Illegal(code);
    }
    // clang-format on
};

// End of auto-generated code.

Dispatching RISC-V instructions to the Owl-2820 assembler

Fortunately, it’s easy to open an editor and make a few simple substitutions to the generated code that will give us a function that decodes a RISC-V instruction and dispatches it to the Owl-2820 assembler. This in turn encodes the instruction for Owl-2820.

Making those substitutions gives us the Transcode() function.

void Transcode(Assembler& a, uint32_t code)
{
    DecodeRv32 rv(code);

    // clang-format off
    switch (code) {
        case 0x00000073: return a.Ecall();
        case 0x00100073: return a.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return a.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return a.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00001033: return a.Sll(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00002033: return a.Slt(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00003033: return a.Sltu(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00004033: return a.Xor(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00005033: return a.Srl(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40005033: return a.Sra(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00006033: return a.Or(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00007033: return a.And(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x00001013: return a.Slli(rv.Rd(), rv.Rs1(), rv.Shamtw());
        case 0x00005013: return a.Srli(rv.Rd(), rv.Rs1(), rv.Shamtw());
        case 0x40005013: return a.Srai(rv.Rd(), rv.Rs1(), rv.Shamtw());
    }
    switch (code & 0x0000707f) {
        case 0x00000063: return a.Beq(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00001063: return a.Bne(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00004063: return a.Blt(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00005063: return a.Bge(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00006063: return a.Bltu(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00007063: return a.Bgeu(rv.Rs1(), rv.Rs2(), rv.Bimmediate());
        case 0x00000067: return a.Jalr(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00000013: return a.Addi(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00002013: return a.Slti(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00003013: return a.Sltiu(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00004013: return a.Xori(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00006013: return a.Ori(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00007013: return a.Andi(rv.Rd(), rv.Rs1(), rv.Iimmediate());
        case 0x00000003: return a.Lb(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00001003: return a.Lh(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00002003: return a.Lw(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00004003: return a.Lbu(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00005003: return a.Lhu(rv.Rd(), rv.Iimmediate(), rv.Rs1());
        case 0x00000023: return a.Sb(rv.Rs1(), rv.Simmediate(), rv.Rs2());
        case 0x00001023: return a.Sh(rv.Rs1(), rv.Simmediate(), rv.Rs2());
        case 0x00002023: return a.Sw(rv.Rs1(), rv.Simmediate(), rv.Rs2());
        case 0x0000000f: return a.Fence();
    }
    switch (code & 0x0000007f) {
        case 0x0000006f: return a.Jal(rv.Rd(), rv.Jimmediate());
        case 0x00000037: return a.Lui(rv.Rd(), rv.Uimmediate());
        case 0x00000017: return a.Auipc(rv.Rd(), rv.Uimmediate());
    }
    // clang-format on
    return a.Illegal(code);
}

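There’s very little to explain here. The various switch statements are used to determine the instruction’s opcode. When a match is found, the instruction’s operands are decoded by member functions of the DecodeRv32 helper class and passed as arguments to the assembler, which re-encodes the instruction for Owl-2820. The substitutions aren’t purely textual, though: for Jalr, the loads, and the stores, the immediate is passed as the second argument rather than the third, and Fence is passed no operands at all.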
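
To make the mask-and-match idea concrete, here’s a small sketch that classifies a handful of instruction words. Kind and Classify() are made-up names for illustration, but the masks and match values are the ones used in Transcode(). Each switch keeps only the bits that identify an instruction of that format, so operand bits can’t affect the match.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
enum class Kind { Ecall, Add, Sub, Addi, Jal, Illegal };

constexpr Kind Classify(uint32_t code)
{
    // Most specific first: every bit of the word must match.
    if (code == 0x00000073) { return Kind::Ecall; }

    // R-type: keep opcode (bits 6:0), funct3 (bits 14:12) and funct7 (bits 31:25).
    switch (code & 0xfe00707f) {
        case 0x00000033: return Kind::Add;
        case 0x40000033: return Kind::Sub;
    }
    // I-type: keep opcode and funct3 only, as bits 31:20 hold the immediate.
    switch (code & 0x0000707f) {
        case 0x00000013: return Kind::Addi;
    }
    // J-type: keep the opcode only.
    switch (code & 0x0000007f) {
        case 0x0000006f: return Kind::Jal;
    }
    return Kind::Illegal;
}

static_assert(Classify(0x00000513) == Kind::Addi);    // addi x10, x0, 0
static_assert(Classify(0x002081b3) == Kind::Add);     // add x3, x1, x2
static_assert(Classify(0x402081b3) == Kind::Sub);     // sub x3, x1, x2
static_assert(Classify(0x000000ef) == Kind::Jal);     // jal x1, 0
static_assert(Classify(0x00000073) == Kind::Ecall);
static_assert(Classify(0x00000000) == Kind::Illegal);
```

Note that add and sub differ only in funct7, which is why the R-type mask has to include bits 31:25.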

Decoding operands

DecodeRv32 is a helper class for decoding operands from bitfields in a RISC-V instruction. It’s akin to the decoder functions in the Owl-2820 VM’s decode namespace, but for RISC-V rather than Owl.

class DecodeRv32
{
    uint32_t ins_{};

public:
    DecodeRv32(uint32_t ins) : ins_{ins} {}

    uint32_t Rd() const { return (ins_ >> 7) & 0x1f; }
    uint32_t Rs1() const { return (ins_ >> 15) & 0x1f; }
    uint32_t Rs2() const { return (ins_ >> 20) & 0x1f; }
    uint32_t Shamtw() const { return (ins_ >> 20) & 0x1f; }

    uint32_t Bimmediate() const
    {
        auto imm12 = static_cast<int32_t>(ins_ & 0x80000000) >> 19;     // ins[31] -> sext(imm[12])
        auto imm11 = static_cast<int32_t>((ins_ & 0x00000080) << 4);    // ins[7] -> imm[11]
        auto imm10_5 = static_cast<int32_t>((ins_ & 0x7e000000) >> 20); // ins[30:25] -> imm[10:5]
        auto imm4_1 = static_cast<int32_t>((ins_ & 0x00000f00) >> 7);   // ins[11:8]  -> imm[4:1]
        return static_cast<uint32_t>(imm12 | imm11 | imm10_5 | imm4_1);
    }

    uint32_t Iimmediate() const
    {
        return static_cast<uint32_t>(
                (static_cast<int32_t>(ins_) >> 20)); // ins[31:20] -> sext(imm[11:0])
    }

    uint32_t Simmediate() const
    {
        auto imm11_5 =
                (static_cast<int32_t>(ins_ & 0xfe000000)) >> 20; // ins[31:25] -> sext(imm[11:5])
        auto imm4_0 = static_cast<int32_t>((ins_ & 0x00000f80) >> 7); // ins[11:7]  -> imm[4:0]
        return static_cast<uint32_t>(imm11_5 | imm4_0);
    }

    uint32_t Jimmediate() const
    {
        auto imm20 = static_cast<int32_t>(ins_ & 0x80000000) >> 11;     // ins[31] -> sext(imm[20])
        auto imm19_12 = static_cast<int32_t>(ins_ & 0x000ff000);        // ins[19:12] -> imm[19:12]
        auto imm11 = static_cast<int32_t>((ins_ & 0x00100000) >> 9);    // ins[20] -> imm[11]
        auto imm10_1 = static_cast<int32_t>((ins_ & 0x7fe00000) >> 20); // ins[30:21] -> imm[10:1]
        return static_cast<uint32_t>(imm20 | imm19_12 | imm11 | imm10_1);
    }

    uint32_t Uimmediate() const
    {
        return ins_ & 0xfffff000; // ins[31:12] -> imm[31:12]
    }
};

Decoding registers

In RV32I, an instruction can have up to three register operands, each with its own fixed position in the instruction:

  1. The destination register rd.
  2. The source register rs1.
  3. The source register rs2.

These operands are decoded by the Rd(), Rs1(), and Rs2() member functions, and are approximately equivalent in usage to Owl-2820’s r0, r1 and r2 fields. They are very easy to decode, as each one needs only a single shift and a single mask.
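
As a quick check, here’s a sketch that builds an R-type instruction by hand and pulls the register numbers back out. The free functions mirror the member functions of DecodeRv32, while EncodeR() is a hypothetical helper that exists only for this example.

```cpp
#include <cstdint>

// Free-function versions of DecodeRv32's Rd(), Rs1() and Rs2().
constexpr uint32_t Rd(uint32_t ins) { return (ins >> 7) & 0x1f; }
constexpr uint32_t Rs1(uint32_t ins) { return (ins >> 15) & 0x1f; }
constexpr uint32_t Rs2(uint32_t ins) { return (ins >> 20) & 0x1f; }

// Hypothetical helper: packs the fields of an R-type instruction into a word.
constexpr uint32_t EncodeR(uint32_t funct7, uint32_t rs2, uint32_t rs1, uint32_t funct3,
                           uint32_t rd, uint32_t opcode)
{
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}

// add x3, x1, x2
constexpr uint32_t addX3X1X2 = EncodeR(0x00, 2, 1, 0x0, 3, 0x33);
static_assert(addX3X1X2 == 0x002081b3);
static_assert(Rd(addX3X1X2) == 3);
static_assert(Rs1(addX3X1X2) == 1);
static_assert(Rs2(addX3X1X2) == 2);
```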

Decoding immediate values

Things become more complicated when decoding immediate values. For Owl-2820, immediate values are always encoded as a sequence of contiguous bits starting from the top-most bit of the instruction. This makes decoding and sign-extension easy to implement in software.

However, RISC-V instructions are designed to be decoded in hardware, where there is no need to keep bits consecutive. As we’ve seen before, the RISC-V encoding requires more work to decode in software. To illustrate this, compare the implementation of Jimmediate() in the example above with its Owl-2820 equivalent, offs20(), below.

    uint32_t offs20(const uint32_t ins)
    {
        return static_cast<uint32_t>(static_cast<int32_t>(ins & 0xfffff000) >> 11);
    }
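
To see the scattered bits come back together, here’s a sketch that runs the same shifts and masks over a few real instruction words. The functions are free-function copies of the DecodeRv32 members, changed to return int32_t so that negative offsets are easier to read. Two of the words come from the image bytes listed later in this post; the other two are encoded by hand. The code relies on right-shifting a negative value being an arithmetic shift, which is only guaranteed from C++20 onwards.

```cpp
#include <cstdint>

constexpr int32_t IImmediate(uint32_t ins)
{
    return static_cast<int32_t>(ins) >> 20; // ins[31:20] -> sext(imm[11:0])
}

constexpr int32_t BImmediate(uint32_t ins)
{
    auto imm12 = static_cast<int32_t>(ins & 0x80000000) >> 19;     // ins[31] -> sext(imm[12])
    auto imm11 = static_cast<int32_t>((ins & 0x00000080) << 4);    // ins[7] -> imm[11]
    auto imm10_5 = static_cast<int32_t>((ins & 0x7e000000) >> 20); // ins[30:25] -> imm[10:5]
    auto imm4_1 = static_cast<int32_t>((ins & 0x00000f00) >> 7);   // ins[11:8] -> imm[4:1]
    return imm12 | imm11 | imm10_5 | imm4_1;
}

constexpr int32_t JImmediate(uint32_t ins)
{
    auto imm20 = static_cast<int32_t>(ins & 0x80000000) >> 11;     // ins[31] -> sext(imm[20])
    auto imm19_12 = static_cast<int32_t>(ins & 0x000ff000);        // ins[19:12] -> imm[19:12]
    auto imm11 = static_cast<int32_t>((ins & 0x00100000) >> 9);    // ins[20] -> imm[11]
    auto imm10_1 = static_cast<int32_t>((ins & 0x7fe00000) >> 20); // ins[30:21] -> imm[10:1]
    return imm20 | imm19_12 | imm11 | imm10_1;
}

static_assert(IImmediate(0x00160613) == 1);   // bytes 0x13, 0x06, 0x16, 0x00 from the image
static_assert(IImmediate(0xfff00513) == -1);  // addi x10, x0, -1
static_assert(BImmediate(0xff0768e3) == -16); // bytes 0xE3, 0x68, 0x07, 0xFF from the image
static_assert(JImmediate(0xffdff06f) == -4);  // jal x0, -4
```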

Running the program

Now we have everything that we need to load the RV32I image and run it on the Owl-2820 VM. We can do that in a few simple steps:

  • Load the RV32I image.
  • Convert it to Owl-2820 encoding.
  • Run the image on the Owl VM.

Loading the RV32I image

Firstly, we load the RV32I image from part 8. This is done by LoadRv32iImage(), which stores the image as an array of bytes and returns it as a span of 32-bit words.

std::span<uint32_t> LoadRv32iImage()
{
    // The RISC-V binary image from: https://badlydrawnrod.github.io/posts/2024/08/20/lbavm-008/
    // clang-format off
    alignas(alignof(uint32_t)) static uint8_t image[] = {
        0x13, 0x05, 0x00, 0x00, 0x93, 0x05, 0x00, 0x00, 0x13, 0x06, 0x00, 0x00, ... // etc
        0x13, 0x85, 0x08, 0x00, 0xE3, 0x68, 0x07, 0xFF, 0x93, 0x08, 0x10, 0x00, ... // etc
        // ...
        // ... etc
        // ...
        0x73, 0x00, 0x00, 0x00, 0x13, 0x06, 0x16, 0x00, 0xE3, 0x14, 0xF6, 0xFC, ... // etc
        0x67, 0x80, 0x00, 0x00
    };
    // clang-format on

    uint32_t* imageBegin = reinterpret_cast<uint32_t*>(image);
    uint32_t* imageEnd = imageBegin + sizeof(image) / sizeof(uint32_t);
    return std::span<uint32_t>(imageBegin, imageEnd);
}

Converting the image to Owl-2820 encoding

Then we convert the image to Owl-2820 encoding with Rv32iToOwl(). This uses the assembler with Transcode() to change each instruction from RISC-V instruction encoding into Owl-2820 instruction encoding.

std::vector<uint32_t> Rv32iToOwl(std::span<uint32_t> image)
{
    Assembler a;
    for (auto code : image)
    {
        Transcode(a, code);
    }
    return a.Code();
}

Running the converted image on the Owl-2820 VM

Finally, we can run the resulting image on the Owl-2820 VM.

int main()
{
    try
    {
        // Create a 4K memory image.
        constexpr size_t memorySize = 4096;
        std::vector<uint32_t> image(memorySize / sizeof(uint32_t));

        auto rv32iImage = LoadRv32iImage();

        // Transcode it to Owl-2820 and copy the result into our VM image.
        auto owlImage = Rv32iToOwl(rv32iImage);
        std::ranges::copy(owlImage, image.begin());

        Run(image);
    }
    catch (const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
}

Summary

Over the course of the last two posts, we have learned how to cross-compile C for RISC-V, and how to load and run the resulting image on our Owl-2820 VM.

Show me the code

Here’s a link to the fully runnable code for this post on Compiler Explorer.