5 October, 2024

Expressing the idea of an instruction handler using C++ concepts.

Part 11: Revisiting the instruction handler

In part 10, "Running RISC-V instructions directly", we ran RISC-V code directly on the Owl VM, then ran benchmarks showing that the VM is faster when it uses the Owl-2820 instruction encoding than when it decodes RISC-V instructions directly.

To do that, we used three different dispatcher functions.

  1. Transcode() dispatches RISC-V encoded instructions to the Owl-2820 Assembler.
  2. DispatchRv32i() dispatches RISC-V encoded instructions to the OwlCpu.
  3. DispatchOwl() dispatches Owl-2820 encoded instructions to the OwlCpu.

Each dispatcher function decodes instructions into opcodes and operands, then dispatches them to a class that fulfils the role of an instruction handler.

In part 10, I described instruction handlers as a protocol implemented in terms of handler functions for Owl-2820 instructions. In this post, we'll see how to take the representation of an instruction handler from mere convention to a representation in C++ that the compiler can enforce.

Instruction handlers by convention

Let’s review how instruction handlers are used by the dispatcher functions.

Transcode() dispatches RISC-V encoded instructions to the Owl-2820 Assembler. Here, the Assembler fulfils the role of the instruction handler by assembling instructions in Owl-2820 encoding.

void Transcode(Assembler& a, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return a.Ecall();
        case 0x00100073: return a.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return a.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return a.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

Similarly, DispatchRv32i() dispatches RISC-V encoded instructions to the OwlCpu. Here, OwlCpu fulfils the role of the instruction handler by executing the instructions.

void DispatchRv32i(OwlCpu& cpu, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return cpu.Ecall();
        case 0x00100073: return cpu.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return cpu.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return cpu.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

If we were to implement a disassembler, we could write DisassembleRv32i(), which dispatches RISC-V encoded instructions to a Disassembler class. Here, Disassembler fulfils the role of the instruction handler by disassembling the instructions and writing them to stdout.

void DisassembleRv32i(Disassembler& d, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return d.Ecall();
        case 0x00100073: return d.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return d.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return d.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

The bodies of these dispatcher functions are almost identical. Apart from the function name, the only thing that differs from one to the next is the type of the instruction handler: Transcode() dispatches to Assembler, DispatchRv32i() dispatches to OwlCpu, and DisassembleRv32i() dispatches to Disassembler.

Although I’ve been describing Assembler, Disassembler and OwlCpu as instruction handlers, until now the notion of an instruction handler has existed by convention only. In other words, it is not something that the compiler can enforce on our behalf.

If, however, we were to make this more than a convention by encoding it in C++ as an InstructionHandler, then not only could the compiler enforce it for us, but we also wouldn’t have to write the same dispatcher function over and over for each instruction handler.

How do we go from this idea of “instruction handler”, implemented by convention only, to a C++ InstructionHandler?

Instruction handlers as C++ interfaces

We could specify InstructionHandler as a C++ interface, i.e., a class whose members are all pure virtual functions.

class InstructionHandler
{
public:
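    virtual ~InstructionHandler() = default; // Safe deletion through a base pointer.
    virtual void Ecall() = 0;                // Pure virtual functions must be declared virtual.
    virtual void Ebreak() = 0;
    virtual void Add(uint32_t r0, uint32_t r1, uint32_t r2) = 0;
    virtual void Sub(uint32_t r0, uint32_t r1, uint32_t r2) = 0;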
    // ...

Each instruction handler class would derive from the InstructionHandler interface and override its virtual functions. For example, here’s how the Assembler would do it.

class Assembler : public InstructionHandler
{
public:
    void Ecall() override;
    void Ebreak() override;
    void Add(uint32_t r0, uint32_t r1, uint32_t r2) override;
    void Sub(uint32_t r0, uint32_t r1, uint32_t r2) override;
    // ...

Given this InstructionHandler interface, we could implement a single dispatcher function for RISC-V encoded instructions, like this.

void DispatchRv32i(InstructionHandler& handler, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return handler.Ecall();
        case 0x00100073: return handler.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return handler.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return handler.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...
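To make the interface-based approach concrete, here is a small, self-contained sketch that compiles on its own. It cuts the interface down to four instructions, and the rest is assumed for illustration: DecodeRv32 is a minimal stand-in that extracts rd, rs1 and rs2 from the standard RISC-V R-type field positions, and TinyCpu is a hypothetical handler, not the real OwlCpu.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DecodeRv32, assuming the standard RISC-V R-type
// layout: rd in bits 7..11, rs1 in bits 15..19, rs2 in bits 20..24.
struct DecodeRv32
{
    uint32_t code;
    explicit DecodeRv32(uint32_t c) : code(c) {}
    uint32_t Rd() const { return (code >> 7) & 0x1f; }
    uint32_t Rs1() const { return (code >> 15) & 0x1f; }
    uint32_t Rs2() const { return (code >> 20) & 0x1f; }
};

// A cut-down InstructionHandler interface with just four instructions.
class InstructionHandler
{
public:
    virtual ~InstructionHandler() = default;
    virtual void Ecall() = 0;
    virtual void Ebreak() = 0;
    virtual void Add(uint32_t r0, uint32_t r1, uint32_t r2) = 0;
    virtual void Sub(uint32_t r0, uint32_t r1, uint32_t r2) = 0;
};

// Hypothetical handler that executes instructions on a tiny register file.
class TinyCpu : public InstructionHandler
{
public:
    uint32_t x[32] = {};
    bool done = false;

    void Ecall() override {}
    void Ebreak() override { done = true; }
    void Add(uint32_t r0, uint32_t r1, uint32_t r2) override
    {
        x[r0] = x[r1] + x[r2];
        x[0] = 0; // x0 is hardwired to zero
    }
    void Sub(uint32_t r0, uint32_t r1, uint32_t r2) override
    {
        x[r0] = x[r1] - x[r2];
        x[0] = 0;
    }
};

// One dispatcher for every class derived from InstructionHandler.
// Each handler.Xxx() call is a virtual call through the vtable.
void DispatchRv32i(InstructionHandler& handler, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return handler.Ecall();
        case 0x00100073: return handler.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return handler.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return handler.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
    }
    // Other instructions are omitted from this sketch.
}
```

For example, dispatching 0x003100b3 (add x1, x2, x3) to a TinyCpu whose x2 and x3 hold 5 and 7 leaves 12 in x1.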

Similarly, we could implement a single dispatcher function for Owl-2820 encoded instructions.

void DispatchOwl(InstructionHandler& handler, uint32_t code)
{
    using namespace decode;

    const Opcode opcode = Opcode(code & 0x7f);

    switch (opcode)
    {
        case Opcode::Ecall: return handler.Ecall();
        case Opcode::Ebreak: return handler.Ebreak();
        case Opcode::Add: return handler.Add(r0(code), r1(code), r2(code));
        case Opcode::Sub: return handler.Sub(r0(code), r1(code), r2(code));
        // ...

At first glance, an InstructionHandler interface is a promising approach that is easy to implement. But what are its trade-offs?

Notational convenience vs run-time performance

The first trade-off has to do with the mechanism by which interfaces are implemented in C++.

We used a pure virtual base class to specify the InstructionHandler interface, then overrode its member functions in the derived classes. However, virtual functions carry a small performance penalty, because each call is an indirect call through the object's vtable. That is fractionally slower than a direct call and, more importantly, usually prevents the compiler from inlining the function body.

For some applications, and indeed for the Assembler and Disassembler classes, the extra cost of invoking a virtual function is small in proportion to the cost of the function itself.

However, each member function of the OwlCpu class is tiny, performing only a handful of native instructions. As a result, the cost of an indirect call is proportionately much higher than it is for functions that do more work, such as those in the Disassembler. This is compounded by how often the functions are called: we might run the Assembler or Disassembler once over a program, but when the VM is running, each instruction may be executed many times over in a hot loop, so the overhead of indirect calls adds up to something measurable.

Admittedly, modern C++ compilers are remarkably good at devirtualization, that is, replacing indirect calls with direct calls or inlined code when they detect that it is possible, but there are plenty of circumstances in which they won’t be able to do this.
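One situation in which devirtualization is straightforward is when the static type of the object is a class marked final, because the compiler then knows exactly which override will run. The sketch below, using a hypothetical TinyCpu and a one-function interface, contrasts a call through a base-class reference with a call through a reference to a final class. Whether the second call actually becomes a direct or inlined call is up to the optimizer; mainstream compilers typically do this when optimizing, but the language doesn't require it.

```cpp
#include <cassert>
#include <cstdint>

class InstructionHandler
{
public:
    virtual ~InstructionHandler() = default;
    virtual void Add(uint32_t r0, uint32_t r1, uint32_t r2) = 0;
};

// 'final' means no class can derive from TinyCpu, so no further override of
// Add() can exist beyond TinyCpu::Add.
class TinyCpu final : public InstructionHandler
{
public:
    uint32_t x[32] = {};

    void Add(uint32_t r0, uint32_t r1, uint32_t r2) override
    {
        x[r0] = x[r1] + x[r2];
        x[0] = 0;
    }
};

// Static type is the base class: unless the compiler can prove the dynamic
// type, this is an indirect call through the vtable.
void RunViaBase(InstructionHandler& h) { h.Add(1, 2, 3); }

// Static type is a final class: the target is known at compile time, so the
// compiler may call, or inline, TinyCpu::Add directly.
void RunViaFinal(TinyCpu& cpu) { cpu.Add(1, 2, 3); }
```

Both functions leave the same result in x1; only the mechanism of the call differs. The catch for a VM is that a dispatcher written against InstructionHandler& only ever sees the base type, which is the first case.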

In short, if we implement InstructionHandler in terms of an interface with virtual member functions, then in some circumstances there will be a small but measurable hit on performance.

Notational convenience vs flexibility

The second trade-off is less obvious. If we use an interface then we can’t change the return types of the member functions without changing the interface.

Let’s illustrate this by having another look at how DisassembleRv32i() invokes the Disassembler.

void DisassembleRv32i(Disassembler& d, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return d.Ecall();
        case 0x00100073: return d.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return d.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return d.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

Each member function of the Disassembler returns void. The assumption here is that the Disassembler has been pre-configured to write its output somewhere, or perhaps it is simply hard-coded to write to stdout.

This is not flexible. What if we wanted to change the return type? For example, we might want our Disassembler to return a std::string rather than writing to stdout. Or we might want to separate disassembly from formatting, and have the Disassembler return a struct whose members consist of the opcode and its operands.

If we wanted to change the return type of the Disassembler to std::string, then we might try writing something like the example below. The only thing that has changed from the code above is the return type, which has changed from void to std::string.

std::string DisassembleRv32i(Disassembler& d, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return d.Ecall();
        case 0x00100073: return d.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return d.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return d.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

But if we do that, then the Disassembler would have to implement a different interface, because all the virtual member functions of the InstructionHandler interface return void, not std::string. This variation in return type is not something that a single InstructionHandler interface can express.

In other words, the second trade-off is that the notational convenience of an interface prevents us from changing the return type. What we gain in convenience, we lose in flexibility.

Why an interface may not be right

Now that we understand some of its trade-offs, we can see that choosing a C++ interface for the InstructionHandler is a decision in favour of notational convenience over both performance and flexibility.

In many circumstances this would be a good choice. The small but measurable run-time overhead of virtual dispatch, paid on every one of the millions of Owl-2820 instructions executed per second, may be acceptable on a desktop CPU. However, it makes it less likely that we could run an Owl-2820 VM in a compute-constrained environment such as a Raspberry Pi Pico.

Let’s look at a different approach which doesn’t have this performance overhead.

Instruction handlers as C++ concepts

Specifying an InstructionHandler using an interface is an example of run-time polymorphism. But what if we could use compile-time polymorphism instead? In C++, that usually means templates. Rather than using simple, unconstrained templates, though, let’s look at how we could use C++ concepts, which are named constraints on template arguments, introduced in C++20.

Our idea of an instruction handler has two parts to it.

  1. We want to specify the member functions of an instruction handler, and the types of their parameters.
  2. We want to specify the return type of the member functions required by an instruction handler.

We were able to specify (1) with an interface, but not (2).

However, we can specify both if we implement InstructionHandler as a C++ concept.

Using the InstructionHandler concept in dispatcher functions

Using an InstructionHandler concept will enable us to write a function template for DispatchRv32i() that works with any type for which the requirements of the InstructionHandler concept are satisfied.

In other words, we can write DispatchRv32i() as a function template like this.

template <typename T>
    requires InstructionHandler<T>
typename T::Item DispatchRv32i(T& handler, uint32_t code)
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return handler.Ecall();
        case 0x00100073: return handler.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return handler.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return handler.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

Or, more succinctly, using C++20's abbreviated function template syntax, like this.

auto DispatchRv32i(InstructionHandler auto& handler, uint32_t code) -> auto
{
    DecodeRv32 rv(code);

    switch (code) {
        case 0x00000073: return handler.Ecall();
        case 0x00100073: return handler.Ebreak();
    }
    switch (code & 0xfe00707f) {
        case 0x00000033: return handler.Add(rv.Rd(), rv.Rs1(), rv.Rs2());
        case 0x40000033: return handler.Sub(rv.Rd(), rv.Rs1(), rv.Rs2());
        // ...

What’s more, we can do the same for DispatchOwl().

auto DispatchOwl(InstructionHandler auto& handler, uint32_t code) -> auto
{
    using namespace decode;

    const Opcode opcode = Opcode(code & 0x7f);

    switch (opcode)
    {
        case Opcode::Ecall: return handler.Ecall();
        case Opcode::Ebreak: return handler.Ebreak();
        case Opcode::Add: return handler.Add(r0(code), r1(code), r2(code));
        case Opcode::Sub: return handler.Sub(r0(code), r1(code), r2(code));
        // ...

Specifying the InstructionHandler concept

Now that we’ve seen how to use the InstructionHandler concept from dispatcher functions, let’s look at how we can specify the concept itself in C++ using a requires-expression.

#include <concepts> // std::same_as
#include <cstdint>  // uint32_t, int32_t

template<typename T>
concept InstructionHandler =
        requires(T t, uint32_t r0, uint32_t r1, uint32_t r2, uint32_t shift, int32_t offs12,
                 int32_t imm12, int32_t offs20, uint32_t uimm20, uint32_t ins) {
            //
            typename T::Item;
            //
            { t.Ecall() } -> std::same_as<typename T::Item>;
            { t.Ebreak() } -> std::same_as<typename T::Item>;
            //
            { t.Add(r0, r1, r2) } -> std::same_as<typename T::Item>;
            { t.Sub(r0, r1, r2) } -> std::same_as<typename T::Item>;
            { t.Sll(r0, r1, r2) } -> std::same_as<typename T::Item>;
            // ...

You can read this as saying that, for a class T to satisfy the InstructionHandler concept, T must declare an associated type, T::Item, and every expression in braces on the left must compile. Furthermore, the type of each expression must be exactly T::Item, because std::same_as does not allow conversions.

Enforcing the InstructionHandler concept

With the interface-based approach there was an obvious, direct relationship between each derived class and InstructionHandler: we wrote classes by deriving them from InstructionHandler and overriding its virtual member functions. If a class failed to override all of the pure virtual functions, it would remain abstract, so any attempt to create an instance of it would fail to compile. Similarly, marking a member function override when it didn’t match a base class function would be a compilation error.

class OwlCpu : public InstructionHandler
{
public:
    void Ecall() override;
    // ...

Unfortunately, the relationship between a C++ concept and a class that meets that concept’s requirements is less obvious.

We can, however, make it clearer and enforce the relationship between a class and the InstructionHandler concept that it implements, by placing a static_assert() immediately after the class declaration, as shown below for OwlCpu.

class OwlCpu
{
    // ...
};

static_assert(InstructionHandler<OwlCpu>); // If OwlCpu doesn't meet the requirements of the concept
                                           // then compilation will fail _here_.

This causes compilation to fail if the class doesn’t meet the requirements of the concept. Perhaps more importantly, it fails on the line containing the static_assert(), right next to the class definition, which is a huge aid to diagnosing problems. Without the static_assert(), the error would only surface where the class is first used with a constrained template such as DispatchRv32i(), possibly far away from the code that caused it, and the problem would be far less obvious.

Modifying instruction handlers

Finally, we get to the implementation. We know how to specify the InstructionHandler concept and how to enforce it, so let’s modify our convention-based instruction handlers, Assembler, Disassembler, and OwlCpu, to work with it.

We can express Assembler, Disassembler and OwlCpu as InstructionHandlers by giving each of them an associated Item type, and by implementing member functions that match the InstructionHandler concept’s requirements. We can also use a static_assert() after the class to cause compilation to fail if the class doesn’t meet the requirements of the concept.

Modifying Assembler

Here’s how we can modify Assembler to work with the InstructionHandler concept.

The associated Item type is aliased to void, so for the Assembler each member function required by the InstructionHandler concept must return void. The static_assert() after the class means that we’ll get a compilation error if Assembler doesn’t meet all the requirements of the InstructionHandler concept.

class Assembler
{
public:
    using Item = void;

    void Ecall()
    {
        Emit(encode::opc(Opcode::Ecall));
    }

    void Ebreak()
    {
        Emit(encode::opc(Opcode::Ebreak));
    }

    void Add(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        Emit(encode::opc(Opcode::Add) | encode::r0(r0) | encode::r1(r1) | encode::r2(r2));
    }

    void Sub(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        Emit(encode::opc(Opcode::Sub) | encode::r0(r0) | encode::r1(r1) | encode::r2(r2));
    }

    // ...
};

static_assert(InstructionHandler<Assembler>);

Modifying OwlCpu

Here’s how we can modify OwlCpu.

As with Assembler, the associated Item type is aliased to void, so for OwlCpu each member function required by the InstructionHandler concept must return void. Again, the static_assert() after the class will cause a compilation error if OwlCpu doesn’t meet the requirements of InstructionHandler.

class OwlCpu
{
public:
    using Item = void;

    void Ecall()
    {
        const auto syscall = Syscall(x[a7]);
        switch (syscall)
        {
            // ...
        }
    }

    void Ebreak()
    {
        done = true;
    }

    void Add(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        // r0 <- r1 + r2
        x[r0] = x[r1] + x[r2];
        x[0] = 0; // Ensure x0 is always zero.
    }

    void Sub(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        // r0 <- r1 - r2
        x[r0] = x[r1] - x[r2];
        x[0] = 0; // Ensure x0 is always zero.
    }

    // ...
};

static_assert(InstructionHandler<OwlCpu>);
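The x[0] = 0; line in each arithmetic handler is what keeps x0 hardwired to zero: the result is written unconditionally, and any write that targeted x0 is undone immediately afterwards. A cut-down, hypothetical OwlCpuSketch class makes that behaviour easy to check in isolation; the 32-entry register file of uint32_t is an assumption based on the RV32I register set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, cut-down register file in the style of OwlCpu.
class OwlCpuSketch
{
public:
    using Item = void;

    uint32_t x[32] = {}; // x0..x31, all initially zero

    void Add(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        x[r0] = x[r1] + x[r2]; // r0 <- r1 + r2
        x[0] = 0;              // undo any write to x0
    }

    void Sub(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        x[r0] = x[r1] - x[r2]; // r0 <- r1 - r2, wrapping modulo 2^32
        x[0] = 0;
    }
};
```

After cpu.Add(0, 1, 2), x[0] is still zero, and Sub() wraps as unsigned arithmetic does, so 3 - 5 yields 0xfffffffe.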

Modifying Disassembler

Finally, here’s how we can modify Disassembler.

In this case, the associated Item type is aliased to std::string, so each member function of Disassembler that is required by the InstructionHandler concept must also return std::string.

class Disassembler
{
public:
    using Item = std::string;

    std::string Ecall()
    {
        return "ecall";
    }

    std::string Ebreak()
    {
        return "ebreak";
    }

    std::string Add(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        return std::format("add {}, {}, {}", regnames[r0], regnames[r1], regnames[r2]);
    }

    std::string Sub(uint32_t r0, uint32_t r1, uint32_t r2)
    {
        return std::format("sub {}, {}, {}", regnames[r0], regnames[r1], regnames[r2]);
    }

    // ...
};

static_assert(InstructionHandler<Disassembler>);

Summary

This post has largely been an exploration of the trade-offs between C++ interfaces and concepts when representing InstructionHandler in C++.

Given the choice between using run-time polymorphism in the form of C++ interfaces, and compile-time polymorphism in the form of C++ concepts, I’ve chosen to take the compile-time route, knowing that the small performance hit of virtual dispatch in the interface-based approach would have been multiplied many times over when running thousands or millions of instructions in the Owl-2820 VM.

I have plans for the Owl-2820 VM that may involve running it in compute-constrained environments, and I would like to get as much performance out of it as possible, rather than prematurely pessimizing it by choosing virtual dispatch.

Show me the code

Until now, I’ve provided the code as a single source file on Compiler Explorer. However, for this post I’ve broken it up into multiple files and made it build with CMake.

You can find the code for this post, along with build instructions, in the posts/lbavm-011 branch of the Owl CPU repo on GitHub.