30 July, 2024

Implementing the remaining instructions in the Owl-2820 instruction set.

Part 7 Implementing the Owl-2820 instruction set

Over the last couple of posts we’ve added memory to the Owl-2820 VM and we’ve seen how we can implement system calls.

However, we haven’t implemented all of the instructions in the Owl-2820 instruction set, as we’ve limited ourselves to just enough instructions to be able to run programs that implement Fibonacci. The Owl-2820 VM needs to be able to do more than run Fibonacci, so in this post we’re going to implement the remaining instructions in the Owl-2820 instruction set.

Recap

We implemented these instructions in part 1.

Instruction Description
add r0, r1, r2 adds the values in registers r1 and r2 and stores the result into r0
addi r0, r1, imm12 adds the value in register r1 to an immediate value and stores the result into r0
beq r0, r1, offs12 compares the values in registers r0 and r1 and branches to offs12 if they’re equal
bltu r0, r1, offs12 compares the values in registers r0 and r1 and branches to offs12 if r0 is less than r1 (unsigned comparison)
j offs20 jumps to offs20
li r0, imm12 loads an immediate value into register r0
lui r0, uimm20 loads an immediate value into the upper bits of register r0
mv r0, r1 copies the value in r1 into r0

We implemented call, ret and ecall in part 4.

Instruction Description
call offs20 calls a subroutine at offs20, saving the return address in the return address register, ra
ret returns from a subroutine by jumping to the address in the return address register, ra
ecall invokes a system call

We added memory access instructions in part 5.

Instruction Description
lb r0, imm12(r1) Loads a byte from address imm12(r1) and sign-extends it into r0
lbu r0, imm12(r1) Loads a byte from address imm12(r1) and zero-extends it into r0
lh r0, imm12(r1) Loads a little-endian halfword from address imm12(r1) and sign-extends it into r0
lhu r0, imm12(r1) Loads a little-endian halfword from address imm12(r1) and zero-extends it into r0
lw r0, imm12(r1) Loads a little-endian word from address imm12(r1) into r0
sb r0, imm12(r1) Stores the lowest byte of r0 into address imm12(r1)
sh r0, imm12(r1) Stores the lower halfword of r0 into address imm12(r1) in little-endian order
sw r0, imm12(r1) Stores the word in r0 into address imm12(r1) in little-endian order

The rest of the Owl

We learned in part 1 that the Owl-2820 CPU is based on RISC-V, specifically the RV32I base integer instruction set, which has forty unique instructions. We’ve implemented nineteen, which is less than half that number. In fact, some of what we’ve implemented doesn’t actually have an RV32I equivalent. For example, you may be surprised to learn that j, li, mv, call and ret are not really RISC-V instructions, but are instructions that I’ve added to the Owl-2820 instruction set to enhance decoding performance.

Unlike previously, I am not going to go into the implementation of each and every instruction, otherwise this post would become very long. In most cases, we’ve already seen how at least one instruction in each group is implemented, and the remainder of the instructions in the group tend to be fairly similar to each other.

For example, instructions in the Branch instructions group differ only in the branch condition, such as less-than or greater-than. Similarly, arithmetic instructions in the Register-register instructions group differ only in the operator, such as plus or minus.
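To make that point concrete, here is a minimal sketch of how one helper per group could cover every instruction in it. The Cpu struct, the helper names and the hard-wired zero register are assumptions made for illustration, not the code from the series.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU state, for illustration only.
struct Cpu {
    uint32_t x[32] = {};   // integer registers
    uint32_t pc = 0;       // address of the current instruction
    uint32_t next_pc = 0;  // address of the instruction to run next
};

// Every register-register instruction has this shape; only `op` differs.
template <typename Op>
void alu(Cpu& cpu, uint32_t r0, uint32_t r1, uint32_t r2, Op op) {
    assert(r0 < 32 && r1 < 32 && r2 < 32);
    cpu.x[r0] = op(cpu.x[r1], cpu.x[r2]);
    cpu.x[0] = 0;  // assumption: register 0 always reads as zero, as in RV32I
}

// Every conditional branch has this shape; only `cond` differs.
template <typename Cond>
void branch(Cpu& cpu, uint32_t r0, uint32_t r1, int32_t offset_bytes, Cond cond) {
    assert(r0 < 32 && r1 < 32);
    if (cond(cpu.x[r0], cpu.x[r1])) {
        cpu.next_pc = cpu.pc + static_cast<uint32_t>(offset_bytes);
    }
}

// Reinterprets a register value as signed (two's complement assumed).
inline int32_t as_signed(uint32_t v) { return static_cast<int32_t>(v); }

inline void add(Cpu& c, uint32_t r0, uint32_t r1, uint32_t r2) {
    alu(c, r0, r1, r2, [](uint32_t a, uint32_t b) { return a + b; });
}

inline void sub(Cpu& c, uint32_t r0, uint32_t r1, uint32_t r2) {
    alu(c, r0, r1, r2, [](uint32_t a, uint32_t b) { return a - b; });
}

inline void blt(Cpu& c, uint32_t r0, uint32_t r1, int32_t offset_bytes) {
    branch(c, r0, r1, offset_bytes,
           [](uint32_t a, uint32_t b) { return as_signed(a) < as_signed(b); });
}

inline void bgeu(Cpu& c, uint32_t r0, uint32_t r1, int32_t offset_bytes) {
    branch(c, r0, r1, offset_bytes, [](uint32_t a, uint32_t b) { return a >= b; });
}
```

Adding xor or bne in this style is a one-line lambda, which is why the remaining instructions need so little explanation.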

The full Owl-2820 instruction set

Here’s an overview of the full Owl-2820 instruction set. It is essentially the RV32I instruction set, translated into Owl-2820 terminology, plus some additional Owl-only instructions.

System instructions

The instructions in this group can be broadly categorised as system instructions.

Instruction Description
ecall invokes a system call
ebreak triggers a breakpoint

Instructions in this group have the following encoding.

unused [31:7] opcode [6:0]
25 bits 7 bits

Register-register instructions

Instructions in this group perform an operation on the contents of registers r1 and r2, storing the result into register r0.

Instruction Description
add r0, r1, r2 adds the values in r1 and r2 and stores the result into r0
sub r0, r1, r2 subtracts the value in r2 from r1 and stores the result in r0
sll r0, r1, r2 does a shift left of r1 by r2 positions and stores the result in r0
slt r0, r1, r2 sets r0 to 1 if r1 < r2 otherwise sets it to 0 (signed comparison)
sltu r0, r1, r2 sets r0 to 1 if r1 < r2 otherwise sets it to 0 (unsigned comparison)
xor r0, r1, r2 bitwise XORs r1 with r2 and stores the result in r0
srl r0, r1, r2 does a zero-extending shift right of r1 by r2 positions and stores the result in r0
sra r0, r1, r2 does a sign-extending shift right of r1 by r2 positions and stores the result in r0
or r0, r1, r2 bitwise ORs r1 with r2 and stores the result in r0
and r0, r1, r2 bitwise ANDs r1 with r2 and stores the result in r0

Instructions in this group have the following encoding.

unused [31:22] r2 [21:17] r1 [16:12] r0 [11:7] opcode [6:0]
10 bits 5 bits 5 bits 5 bits 7 bits
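As a sketch of what this layout means for a software decoder, each field can be pulled out with one shift and one mask. The function names are hypothetical, and encode_rrr is an illustrative helper of the sort an assembler might use.

```cpp
#include <cassert>
#include <cstdint>

// Field extractors that follow the bit positions in the table above.
constexpr uint32_t opcode_of(uint32_t ins) { return ins & 0x7fu; }
constexpr uint32_t r0_of(uint32_t ins) { return (ins >> 7) & 0x1fu; }
constexpr uint32_t r1_of(uint32_t ins) { return (ins >> 12) & 0x1fu; }
constexpr uint32_t r2_of(uint32_t ins) { return (ins >> 17) & 0x1fu; }

// Packs the fields into a 32-bit instruction word, leaving bits 31:22 as zero.
inline uint32_t encode_rrr(uint32_t opcode, uint32_t r0, uint32_t r1, uint32_t r2) {
    assert(opcode < 128 && r0 < 32 && r1 < 32 && r2 < 32);
    return (r2 << 17) | (r1 << 12) | (r0 << 7) | opcode;
}
```

The opcode value passed to encode_rrr is whatever the VM assigns to the instruction; the sketch makes no claim about the actual opcode numbers.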

Immediate shift instructions

These instructions shift the contents of register r1 by an immediate shift value, storing the result into register r0.

Instruction Description
slli r0, r1, shift does a shift left of r1 by shift positions and stores the result in r0
srli r0, r1, shift does a zero-extending shift right of r1 by shift positions and stores the result in r0
srai r0, r1, shift does a sign-extending shift right of r1 by shift positions and stores the result in r0

Instructions in this group have the following encoding.

unused [31:22] shift [21:17] r1 [16:12] r0 [11:7] opcode [6:0]
10 bits 5 bits 5 bits 5 bits 7 bits
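The difference between a zero-extending and a sign-extending right shift is easy to get wrong, so here is a hedged sketch of the three shifts. The function names are hypothetical, and srai is written without relying on the compiler's behaviour for right shifts of negative signed values.

```cpp
#include <cstdint>

// The shift amount occupies bits 21:17, so it is always in the range 0 to 31.
constexpr uint32_t shift_of(uint32_t ins) { return (ins >> 17) & 0x1fu; }

// Shift left: zeros enter from the right.
inline uint32_t slli(uint32_t value, uint32_t shift) { return value << (shift & 31u); }

// Zero-extending (logical) shift right: zeros enter from the left.
inline uint32_t srli(uint32_t value, uint32_t shift) { return value >> (shift & 31u); }

// Sign-extending (arithmetic) shift right: copies of the sign bit enter from the left.
inline uint32_t srai(uint32_t value, uint32_t shift) {
    shift &= 31u;
    const uint32_t logical = value >> shift;
    // Mask of the vacated upper bits; empty when shift is zero.
    const uint32_t vacated = ~(0xffffffffu >> shift);
    return (value & 0x80000000u) ? (logical | vacated) : logical;
}
```

For example, shifting 0x80000000 right by four gives 0x08000000 with srli but 0xf8000000 with srai. The register forms sll, srl and sra can reuse the same helpers, taking the shift amount from the low five bits of r2 as RV32I does.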

Branch instructions

The conditional branch instructions perform a comparison operation on the contents of registers r0 and r1. If the comparison evaluates to true then they take the branch, offs12, which is a sign-extended 12-bit offset in multiples of two bytes relative to the address of the branch instruction.

Instruction Description
beq r0, r1, offs12 branches to offs12 if r0 == r1
bne r0, r1, offs12 branches to offs12 if r0 != r1
blt r0, r1, offs12 branches to offs12 if r0 < r1 (signed comparison)
bge r0, r1, offs12 branches to offs12 if r0 >= r1 (signed comparison)
bltu r0, r1, offs12 branches to offs12 if r0 < r1 (unsigned comparison)
bgeu r0, r1, offs12 branches to offs12 if r0 >= r1 (unsigned comparison)

Instructions in this group have the following encoding.

offs12 [31:20] unused [19:17] r1 [16:12] r0 [11:7] opcode [6:0]
12 bits 3 bits 5 bits 5 bits 7 bits
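Here is a small sketch of how a branch target could be computed from this encoding. It follows my reading of the description above: offs12 is sign-extended, counted in units of two bytes, and relative to the address of the branch itself. The function names are hypothetical.

```cpp
#include <cstdint>

// Extracts offs12 from bits 31:20 and sign-extends it from 12 bits.
inline int32_t offs12_of(uint32_t ins) {
    const uint32_t field = ins >> 20;                      // 0x000 to 0xfff
    return static_cast<int32_t>(field ^ 0x800u) - 0x800;   // -2048 to 2047
}

// Address to continue from when the branch at address `pc` is taken.
inline uint32_t branch_target(uint32_t pc, uint32_t ins) {
    const int32_t offset_bytes = offs12_of(ins) * 2;  // offs12 counts two-byte units
    return pc + static_cast<uint32_t>(offset_bytes);
}
```

With a branch at address 0x100, an offs12 of 4 gives a target of 0x108 and an offs12 of -2 gives 0xfc.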

Register-immediate instructions

Instructions in this group perform an operation on the contents of register r1 and a sign-extended 12-bit immediate value, imm12, storing the result into register r0.

Instruction Description
addi r0, r1, imm12 adds the value in register r1 to an immediate value and stores the result into r0
slti r0, r1, imm12 sets r0 to 1 if r1 < imm12 (signed comparison) otherwise sets r0 to 0
sltiu r0, r1, imm12 sets r0 to 1 if r1 < imm12 (unsigned comparison) otherwise sets r0 to 0
xori r0, r1, imm12 bitwise XORs r1 with imm12 and stores the result in r0
ori r0, r1, imm12 bitwise ORs r1 with imm12 and stores the result in r0
andi r0, r1, imm12 bitwise ANDs r1 with imm12 and stores the result in r0

Instructions in this group have the following encoding.

imm12 [31:20] unused [19:17] r1 [16:12] r0 [11:7] opcode [6:0]
12 bits 3 bits 5 bits 5 bits 7 bits
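One subtlety in this group is that imm12 is sign-extended even for the unsigned comparison. The sketch below, with hypothetical function names, shows the instructions as plain functions of the value in r1 and the decoded immediate.

```cpp
#include <cstdint>

// Extracts imm12 from bits 31:20 and sign-extends it from 12 bits.
inline int32_t imm12_of(uint32_t ins) {
    return static_cast<int32_t>((ins >> 20) ^ 0x800u) - 0x800;
}

inline uint32_t addi(uint32_t a, int32_t imm) { return a + static_cast<uint32_t>(imm); }
inline uint32_t xori(uint32_t a, int32_t imm) { return a ^ static_cast<uint32_t>(imm); }
inline uint32_t ori(uint32_t a, int32_t imm) { return a | static_cast<uint32_t>(imm); }
inline uint32_t andi(uint32_t a, int32_t imm) { return a & static_cast<uint32_t>(imm); }

// Signed comparison: r1 is reinterpreted as a signed value (two's complement assumed).
inline uint32_t slti(uint32_t a, int32_t imm) {
    return static_cast<int32_t>(a) < imm ? 1u : 0u;
}

// Unsigned comparison: the sign-extended immediate is reinterpreted as unsigned.
inline uint32_t sltiu(uint32_t a, int32_t imm) {
    return a < static_cast<uint32_t>(imm) ? 1u : 0u;
}
```

So slti with r1 equal to 5 and an immediate of -1 yields 0, whereas sltiu yields 1, because -1 becomes 0xffffffff when treated as unsigned.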

Load instructions

These instructions load a value from memory starting at the address pointed to by the contents of register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. Memory is read from in little-endian order, i.e., values larger than a byte are read from memory with the least-significant byte first.

Instruction Description
lb r0, imm12(r1) Loads a byte from address imm12(r1) and sign-extends it into r0
lbu r0, imm12(r1) Loads a byte from address imm12(r1) and zero-extends it into r0
lh r0, imm12(r1) Loads a little-endian halfword from address imm12(r1) and sign-extends it into r0
lhu r0, imm12(r1) Loads a little-endian halfword from address imm12(r1) and zero-extends it into r0
lw r0, imm12(r1) Loads a little-endian word from address imm12(r1) into r0

Instructions in this group have the following encoding.

imm12 [31:20] unused [19:17] r1 [16:12] r0 [11:7] opcode [6:0]
12 bits 3 bits 5 bits 5 bits 7 bits
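As an illustration of the little-endian reads and the two kinds of extension, here is a minimal sketch. It assumes memory is a simple byte vector and uses hypothetical function names; the VM's own memory type may well differ.

```cpp
#include <cstdint>
#include <vector>

using Memory = std::vector<uint8_t>;  // assumption: memory is a flat byte array

// Address = contents of r1 plus the sign-extended imm12.
inline uint32_t effective_address(uint32_t base, int32_t imm12) {
    return base + static_cast<uint32_t>(imm12);
}

// Little-endian reads; at() throws std::out_of_range for a bad address.
inline uint32_t read8(const Memory& m, uint32_t a) { return m.at(a); }
inline uint32_t read16(const Memory& m, uint32_t a) {
    return read8(m, a) | (read8(m, a + 1) << 8);
}
inline uint32_t read32(const Memory& m, uint32_t a) {
    return read16(m, a) | (read16(m, a + 2) << 16);
}

inline uint32_t lbu(const Memory& m, uint32_t base, int32_t imm12) {
    return read8(m, effective_address(base, imm12));
}
inline uint32_t lb(const Memory& m, uint32_t base, int32_t imm12) {
    return (lbu(m, base, imm12) ^ 0x80u) - 0x80u;  // sign-extend from 8 bits
}
inline uint32_t lhu(const Memory& m, uint32_t base, int32_t imm12) {
    return read16(m, effective_address(base, imm12));
}
inline uint32_t lh(const Memory& m, uint32_t base, int32_t imm12) {
    return (lhu(m, base, imm12) ^ 0x8000u) - 0x8000u;  // sign-extend from 16 bits
}
inline uint32_t lw(const Memory& m, uint32_t base, int32_t imm12) {
    return read32(m, effective_address(base, imm12));
}
```

Given the bytes 0x80, 0xff, 0x34, 0x12 at address 0, lbu reads 0x80 where lb reads 0xffffff80, and lw reads 0x1234ff80.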

Store instructions

Instructions in this group store a value from register r0 into memory starting at the address pointed to by the contents of register r1 plus the sign-extended 12-bit immediate value imm12. Memory is written to in little-endian order, i.e., values larger than a byte are written to memory with the least-significant byte first.

Instruction Description
sb r0, imm12(r1) Stores the lowest byte of r0 into address imm12(r1)
sh r0, imm12(r1) Stores the lower halfword of r0 into address imm12(r1) in little-endian order
sw r0, imm12(r1) Stores the word in r0 into address imm12(r1) in little-endian order

Instructions in this group have the following encoding.

imm12 [31:20] unused [19:17] r1 [16:12] r0 [11:7] opcode [6:0]
12 bits 3 bits 5 bits 5 bits 7 bits
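The stores mirror the loads. In the same hedged spirit, this sketch assumes a flat byte vector for memory and uses hypothetical function names.

```cpp
#include <cstdint>
#include <vector>

using Memory = std::vector<uint8_t>;  // assumption: memory is a flat byte array

// Address = contents of r1 plus the sign-extended imm12.
inline uint32_t effective_address(uint32_t base, int32_t imm12) {
    return base + static_cast<uint32_t>(imm12);
}

// Little-endian writes, least-significant byte first; at() throws on a bad address.
inline void write8(Memory& m, uint32_t a, uint32_t v) {
    m.at(a) = static_cast<uint8_t>(v & 0xffu);
}
inline void write16(Memory& m, uint32_t a, uint32_t v) {
    write8(m, a, v);
    write8(m, a + 1, v >> 8);
}
inline void write32(Memory& m, uint32_t a, uint32_t v) {
    write16(m, a, v);
    write16(m, a + 2, v >> 16);
}

// value is the contents of r0, base is the contents of r1.
inline void sb(Memory& m, uint32_t value, uint32_t base, int32_t imm12) {
    write8(m, effective_address(base, imm12), value);
}
inline void sh(Memory& m, uint32_t value, uint32_t base, int32_t imm12) {
    write16(m, effective_address(base, imm12), value);
}
inline void sw(Memory& m, uint32_t value, uint32_t base, int32_t imm12) {
    write32(m, effective_address(base, imm12), value);
}
```

Storing the word 0x12345678 at address 4 with sw leaves the bytes 0x78, 0x56, 0x34, 0x12 at addresses 4 to 7.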

Memory ordering instructions

The fence instruction is used to ensure memory ordering. It currently has no effect on the Owl-2820 CPU and is effectively a no-operation (nop).

Instruction Description
fence Enforces memory ordering.

This instruction has the following encoding.

unused [31:7] opcode [6:0]
25 bits 7 bits

On the subject of nop instructions, I was recently reminded that when I was a teenager, I thought that FA would be an ideal encoding for a nop instruction. However amusing this would have been to my much younger self, it is not practical as it adds unnecessary complexity to decoding.

Subroutine call instructions

These instructions are used for calling subroutines.

Instruction Description
jalr r0, offs12(r1) calls a subroutine at offs12 plus r1, saving the return address in r0
jal r0, offs20 calls a subroutine at offs20, saving the return address in r0

The jalr instruction has the following encoding.

offs12 [31:20] unused [19:17] r1 [16:12] r0 [11:7] opcode [6:0]
12 bits 3 bits 5 bits 5 bits 7 bits

The jal instruction has the following encoding.

offs20 [31:12] r0 [11:7] opcode [6:0]
20 bits 5 bits 7 bits
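Here is a hedged sketch of what the two instructions do to the CPU state. The Cpu struct and the function names are hypothetical, offsets are passed in as already-decoded byte offsets, and clearing the low bit of the jalr target is the RV32I behaviour, which I am assuming carries over.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU state, for illustration only.
struct Cpu {
    uint32_t x[32] = {};   // integer registers
    uint32_t pc = 0;       // address of the current instruction
    uint32_t next_pc = 0;  // address of the instruction to run next
};

// jal r0, offs20: save the return address in r0, then jump relative to pc.
inline void jal(Cpu& cpu, uint32_t r0, int32_t offset_bytes) {
    assert(r0 < 32);
    cpu.x[r0] = cpu.pc + 4;  // instructions are 32 bits wide
    cpu.next_pc = cpu.pc + static_cast<uint32_t>(offset_bytes);
    cpu.x[0] = 0;            // assumption: register 0 always reads as zero
}

// jalr r0, offs12(r1): save the return address in r0, then jump to r1 + offs12.
inline void jalr(Cpu& cpu, uint32_t r0, int32_t offs12, uint32_t r1) {
    assert(r0 < 32 && r1 < 32);
    // Compute the target first, in case r0 and r1 are the same register.
    const uint32_t target = (cpu.x[r1] + static_cast<uint32_t>(offs12)) & ~1u;
    cpu.x[r0] = cpu.pc + 4;
    cpu.next_pc = target;
    cpu.x[0] = 0;
}
```

Using register 0 as the destination discards the return address, which turns a call into a plain jump.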

Miscellaneous instructions

We’ve already seen lui in action when combined with addi to put 32-bit immediate values into a register. We can do something similar with auipc which combines with addi to put a 32-bit address into a register. In the latter case the address is an absolute address that is calculated relative to the value in the program counter, pc.

Both these combinations mean that jalr can call a subroutine anywhere in Owl’s 32-bit address space. This would not be possible otherwise, as the Owl-2820 instruction encoding cannot specify a 32-bit address in a single instruction, because instructions themselves occupy 32 bits.

Instruction Description
lui r0, uimm20 loads an immediate value into the upper bits of r0, setting its lower 12 bits to zero
auipc r0, uimm20 adds uimm20 to the upper 20 bits of pc, storing the result in r0

The lui and auipc instructions have the following encoding.

uimm20 [31:12] r0 [11:7] opcode [6:0]
20 bits 5 bits 7 bits
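Splitting a 32-bit value across lui (or auipc) and addi needs a little care, because addi sign-extends its imm12. The following sketch shows one common way of doing the split; the HiLo struct and the function names are hypothetical.

```cpp
#include <cstdint>

// The two immediates needed to build a 32-bit value.
struct HiLo {
    uint32_t uimm20;  // for lui or auipc
    int32_t imm12;    // for the following addi
};

inline HiLo split(uint32_t value) {
    // addi sign-extends imm12, so when bit 11 of the value is set the low part
    // is negative and the high part must be rounded up to compensate.
    const uint32_t hi = ((value + 0x800u) >> 12) & 0xfffffu;
    const int32_t lo = static_cast<int32_t>((value & 0xfffu) ^ 0x800u) - 0x800;
    return {hi, lo};
}

// lui: uimm20 goes into the upper 20 bits; the lower 12 bits are zero.
inline uint32_t lui(uint32_t uimm20) { return uimm20 << 12; }

// auipc: the same upper immediate, added to the program counter.
inline uint32_t auipc(uint32_t pc, uint32_t uimm20) { return pc + (uimm20 << 12); }

inline uint32_t addi(uint32_t a, int32_t imm12) { return a + static_cast<uint32_t>(imm12); }
```

For example, 0x12345fff splits into a uimm20 of 0x12346 and an imm12 of -1, and lui followed by addi reassembles it. For auipc the value to split is the distance from pc to the target address.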

Owl-2820 only instructions

The instructions in this group are specific to the Owl-2820.

Each of them could have been implemented with existing RV32I instructions, but this would be slower to decode and dispatch. You might recall that we compared RISC-V encoding with Owl encoding in part 1 and learned that an encoding that is simple to decode in hardware may need multiple steps to decode in software.
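As a reminder of what "multiple steps" looks like, here is a sketch that decodes a branch offset both ways. The RV32I side follows the B-type format in the RISC-V specification; the Owl side follows the branch encoding shown earlier in this post. The function names are hypothetical.

```cpp
#include <cstdint>

// RV32I B-type: the 13-bit byte offset is scattered across the instruction word.
inline int32_t rv32i_branch_offset(uint32_t ins) {
    const uint32_t imm = (((ins >> 31) & 0x1u) << 12)   // imm[12]   from bit 31
                       | (((ins >> 7) & 0x1u) << 11)    // imm[11]   from bit 7
                       | (((ins >> 25) & 0x3fu) << 5)   // imm[10:5] from bits 30:25
                       | (((ins >> 8) & 0xfu) << 1);    // imm[4:1]  from bits 11:8
    return static_cast<int32_t>(imm ^ 0x1000u) - 0x1000;  // sign-extend from 13 bits
}

// Owl-2820: offs12 is one contiguous field in bits 31:20, in two-byte units.
inline int32_t owl_branch_offset(uint32_t ins) {
    const int32_t offs12 = static_cast<int32_t>((ins >> 20) ^ 0x800u) - 0x800;
    return offs12 * 2;  // convert to a byte offset
}
```

Both functions return a byte offset, but the RV32I version gathers four separate bit fields to get there.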

For instance, the call instruction could have been implemented as jal ra, offs20, but in my experience the extra work needed to decode it is measurable, so I gave it its own opcode. The same goes for j, which could have been implemented as jal zero, offs20, and ret, which could have been implemented as jalr zero, 0(ra). Similarly, li and mv could have been implemented as addi r0, zero, imm12 and add r0, r1, zero respectively.
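The equivalences above can be written down as a small rewriting function. This is purely illustrative: the Op and Ins types are hypothetical, and the register numbers for zero and ra are assumed to follow the RISC-V convention.

```cpp
#include <cstdint>

// A hypothetical decoded instruction, for illustration only.
enum class Op { Jal, Jalr, Addi, Add, J, Call, Ret, Li, Mv };

struct Ins {
    Op op;
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    int32_t imm;
};

constexpr uint32_t zero = 0;  // assumption: register 0 is the zero register
constexpr uint32_t ra = 1;    // assumption: register 1 is the return address register

// Rewrites an Owl-only instruction as the RV32I-style instruction it stands for.
inline Ins expand(const Ins& i) {
    switch (i.op) {
    case Op::J:    return {Op::Jal, zero, 0, 0, i.imm};   // jal zero, offs20
    case Op::Call: return {Op::Jal, ra, 0, 0, i.imm};     // jal ra, offs20
    case Op::Ret:  return {Op::Jalr, zero, ra, 0, 0};     // jalr zero, 0(ra)
    case Op::Li:   return {Op::Addi, i.r0, zero, 0, i.imm};  // addi r0, zero, imm12
    case Op::Mv:   return {Op::Add, i.r0, i.r1, zero, 0};    // add r0, r1, zero
    default:       return i;                               // already a base instruction
    }
}
```

Giving each of these its own opcode means the VM can dispatch straight to a handler that already knows which registers are involved, rather than decoding them each time.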

Another consideration is simply one of readability. For example, part 4 of this series is much easier to understand in terms of call and ret than it would have been otherwise.

Instruction Description
j offs20 jumps to offs20
call offs20 calls a subroutine at offs20, saving the return address in the return address register, ra
ret returns from a subroutine by jumping to the address in the return address register, ra
li r0, imm12 loads an immediate value into register r0
mv r0, r1 copies the value in r1 into r0

The j and call instructions have the following encoding.

offs20 [31:12] unused [11:7] opcode [6:0]
20 bits 5 bits 7 bits

The ret instruction has the following encoding.

unused [31:7] opcode [6:0]
25 bits 7 bits

The li instruction has the following encoding.

imm12 [31:20] unused [19:12] r0 [11:7] opcode [6:0]
12 bits 8 bits 5 bits 7 bits

The mv instruction has the following encoding.

unused [31:17] r1 [16:12] r0 [11:7] opcode [6:0]
15 bits 5 bits 5 bits 7 bits

Show me the code

I haven’t walked through the implementation in this post because there is not much to explain. Individual Owl-2820 instructions do very little, and can be implemented in a few lines of code each.

You can find the full code for this post on Compiler Explorer.

Summary

In this post we implemented the rest of the Owl-2820 instruction set. This was straightforward, because individual instructions can be implemented in no more than half a dozen lines of code.

In theory, since Owl-2820’s instruction set is based on RV32I, we have also implemented an emulator for the RV32I instruction set. In practice we’re not quite there, because Owl-2820 instructions have a different encoding to RV32I as this simplifies decoding them in software.

This is limiting, because it means that any programs created for our Owl-2820 VM must be written in Owl-2820 assembly language using our rudimentary assembler.

Coming soon

In the short term it could be interesting to develop a few small programs in Owl-2820 assembly language, and if you’re so inclined then I encourage you to try. However, after a while this would become tedious and repetitive.

What we really need is a way of writing Owl-2820 programs in a higher level language.

Ideally, we would write programs for our Owl-2820 VM in a language such as C, C++, Rust or Swift, then compile them for an RV32I target and run them on our Owl-2820 VM.

Doing this will be the theme for the next few posts.