Implementing the remaining instructions in the Owl-2820 instruction set.
Part 7 Implementing the Owl-2820 instruction set
Over the last couple of posts we’ve added memory to the Owl-2820 VM and we’ve seen how we can implement system calls.
However, we haven’t implemented all of the instructions in the Owl-2820 instruction set, as we’ve limited ourselves to just enough instructions to be able to run programs that implement Fibonacci. The Owl-2820 VM needs to be able to do more than run Fibonacci, so in this post we’re going to implement the remaining instructions in the Owl-2820 instruction set.
Recap
We implemented these instructions in part 1.
Instruction | Description |
---|---|
add r0, r1, r2 | adds the values in registers r1 and r2 and stores the result into r0 |
addi r0, r1, imm12 | adds the value in register r1 to an immediate value and stores the result into r0 |
beq r0, r1, offs12 | compares the values in registers r0 and r1 and branches to offs12 if they’re equal |
bltu r0, r1, offs12 | compares the values in registers r0 and r1 and branches to offs12 if r0 is less than r1 |
j offs20 | jumps to offs20 |
li r0, imm12 | loads an immediate value into register r0 |
lui r0, uimm20 | loads an immediate value into the upper bits of register r0 |
mv r0, r1 | copies the value in r1 into r0 |
We implemented call, ret and ecall in part 4.
Instruction | Description |
---|---|
call offs20 | calls a subroutine at offs20, saving the return address in the return address register, ra |
ret | returns from a subroutine by jumping to the address in the return address register, ra |
ecall | invokes a system call |
We added memory access instructions in part 5.
Instruction | Description |
---|---|
lb r0, imm12(r1) | Loads a byte from address imm12(r1) and sign-extends it into r0 |
lbu r0, imm12(r1) | Loads a byte from address imm12(r1) and zero-extends it into r0 |
lh r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and sign-extends it into r0 |
lhu r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and zero-extends it into r0 |
lw r0, imm12(r1) | Loads a little-endian word from address imm12(r1) into r0 |
sb r0, imm12(r1) | Stores the lowest byte of r0 into address imm12(r1) |
sh r0, imm12(r1) | Stores the lower halfword of r0 into address imm12(r1) in little-endian order |
sw r0, imm12(r1) | Stores the word in r0 into address imm12(r1) in little-endian order |
The rest of the Owl
We learned in part 1 that the Owl-2820 CPU is based on RISC-V, specifically the RV32I base integer instruction set which has forty unique instructions. We’ve implemented nineteen - less than half that number. In fact, some of what we’ve implemented doesn’t actually have an RV32I equivalent. For example, you may be surprised to learn that j, li, mv, call and ret are not really RISC-V instructions, but are instructions that I’ve added to the Owl-2820 instruction set to enhance decoding performance.
Unlike in previous posts, I am not going to go into the implementation of each and every instruction, otherwise this post would become very long. In most cases, we’ve already seen how at least one instruction in each group is implemented, and the remaining instructions in the group tend to be fairly similar to each other.
For example, instructions in the Branch instructions group differ only in the branch condition, such as less-than or greater-than. Similarly, arithmetic instructions in the Register-register instructions group differ only in the operator, such as plus or minus.
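To illustrate how little varies within a group, here is a minimal sketch. It is not the code from this series: the Registers alias and the reg_reg, branch and signed_less helpers are hypothetical names, the zero register is not given special treatment, and the branch offset is assumed to have already been converted to bytes.

```cpp
#include <array>
#include <cstdint>
#include <functional>

// Hypothetical register file: 32 registers of 32 bits each.
using Registers = std::array<uint32_t, 32>;

// Every register-register instruction is the same read-operate-write
// sequence; only the operator differs.
template <typename Op>
void reg_reg(Registers& x, uint32_t r0, uint32_t r1, uint32_t r2, Op op)
{
    x[r0] = op(x[r1], x[r2]);
}

// Every conditional branch is the same compare-and-select sequence; only
// the condition differs. Returns the next value of the program counter.
template <typename Cond>
uint32_t branch(const Registers& x, uint32_t pc, uint32_t r0, uint32_t r1,
                int32_t offset_bytes, Cond cond)
{
    const uint32_t fall_through = pc + 4; // instructions occupy 32 bits
    return cond(x[r0], x[r1]) ? pc + static_cast<uint32_t>(offset_bytes)
                              : fall_through;
}

// Signed comparison for blt: reinterpret both operands as two's complement.
inline bool signed_less(uint32_t a, uint32_t b)
{
    return static_cast<int32_t>(a) < static_cast<int32_t>(b);
}
```

With helpers like these, add becomes reg_reg(x, r0, r1, r2, std::plus<uint32_t>{}) and sub swaps in std::minus<uint32_t>{}, while blt and bltu differ only in passing signed_less or std::less<uint32_t>{} to branch.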
The full Owl-2820 instruction set
Here’s an overview of the full Owl-2820 instruction set. It is essentially the RV32I instruction set, translated into Owl-2820 terminology, plus some additional Owl-only instructions.
System instructions
The instructions in this group can be broadly categorised as system instructions.
Instruction | Description |
---|---|
ecall | invokes a system call |
ebreak | breakpoint |
Instructions in this group have the following encoding.
unused [31:7] | opcode [6:0] |
---|---|
25 bits | 7 bits |
Register-register instructions
Instructions in this group perform an operation on the contents of registers r1 and r2, storing the result into register r0.
Instruction | Description |
---|---|
add r0, r1, r2 | adds the values in r1 and r2 and stores the result into r0 |
sub r0, r1, r2 | subtracts the value in r2 from r1 and stores the result in r0 |
sll r0, r1, r2 | does a shift left of r1 by r2 positions and stores the result in r0 |
slt r0, r1, r2 | sets r0 to 1 if r1 < r2 otherwise sets it to 0 (signed comparison) |
sltu r0, r1, r2 | sets r0 to 1 if r1 < r2 otherwise sets it to 0 (unsigned comparison) |
xor r0, r1, r2 | bitwise XORs r1 with r2 and stores the result in r0 |
srl r0, r1, r2 | does a zero-extending shift right of r1 by r2 positions and stores the result in r0 |
sra r0, r1, r2 | does a sign-extending shift right of r1 by r2 positions and stores the result in r0 |
or r0, r1, r2 | bitwise ORs r1 with r2 and stores the result in r0 |
and r0, r1, r2 | bitwise ANDs r1 with r2 and stores the result in r0 |
Instructions in this group have the following encoding.
unused [31:22] | r2 [21:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
10 bits | 5 bits | 5 bits | 5 bits | 7 bits |
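The bit positions in this table translate directly into shifts and masks. The following is a sketch under assumptions rather than the series' own decoder: RegRegFields, decode_reg_reg and encode_reg_reg are hypothetical names, and any opcode value passed in is a placeholder, not a real Owl-2820 opcode.

```cpp
#include <cstdint>

// Hypothetical container for the fields of a register-register instruction.
struct RegRegFields {
    uint32_t opcode;
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
};

// Extracts each field with a shift and a mask, following the table above.
constexpr RegRegFields decode_reg_reg(uint32_t ins)
{
    return {
        ins & 0x7fu,         // opcode [6:0]
        (ins >> 7) & 0x1fu,  // r0 [11:7]
        (ins >> 12) & 0x1fu, // r1 [16:12]
        (ins >> 17) & 0x1fu, // r2 [21:17]
    };
}

// The inverse operation, as an assembler might perform it. Bits [31:22]
// are unused and left as zero.
constexpr uint32_t encode_reg_reg(uint32_t opcode, uint32_t r0, uint32_t r1,
                                  uint32_t r2)
{
    return ((r2 & 0x1fu) << 17) | ((r1 & 0x1fu) << 12) | ((r0 & 0x1fu) << 7) |
           (opcode & 0x7fu);
}
```

Because every field sits at a fixed position and is contiguous, decoding needs one shift and one mask per field, which is the property that makes this encoding cheap to decode in software.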
Immediate shift instructions
These instructions shift the contents of register r1 by an immediate shift value, storing the result into register r0.
Instruction | Description |
---|---|
slli r0, r1, shift | does a shift left of r1 by shift positions and stores the result in r0 |
srli r0, r1, shift | does a zero-extending shift right of r1 by shift positions and stores the result in r0 |
srai r0, r1, shift | does a sign-extending shift right of r1 by shift positions and stores the result in r0 |
Instructions in this group have the following encoding.
unused [31:22] | shift [21:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
10 bits | 5 bits | 5 bits | 5 bits | 7 bits |
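The difference between a zero-extending and a sign-extending shift right is easiest to see in code. This is an illustrative sketch with hypothetical function names. Masking the shift amount to five bits is an assumption borrowed from RV32I; it matches the 5-bit shift field above.

```cpp
#include <cstdint>

// Shift left: vacated low bits are filled with zeros.
constexpr uint32_t shift_left(uint32_t value, uint32_t shift)
{
    return value << (shift & 31u);
}

// Zero-extending (logical) shift right: vacated high bits are filled
// with zeros.
constexpr uint32_t shift_right_logical(uint32_t value, uint32_t shift)
{
    return value >> (shift & 31u);
}

// Sign-extending (arithmetic) shift right: vacated high bits are filled
// with copies of the original sign bit. Written with unsigned arithmetic
// only, so it does not rely on how the compiler shifts negative numbers.
constexpr uint32_t shift_right_arithmetic(uint32_t value, uint32_t shift)
{
    const uint32_t s = shift & 31u;
    const uint32_t fill = (value >> 31) ? ~(0xffffffffu >> s) : 0u;
    return (value >> s) | fill;
}
```

For example, shifting 0x80000000 right by four positions gives 0x08000000 with the logical shift and 0xf8000000 with the arithmetic shift.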
Branch instructions
The conditional branch instructions perform a comparison operation on the contents of registers r0 and r1. If the comparison evaluates to true then they take the branch, offs12, which is a sign-extended 12-bit offset in multiples of two bytes relative to the address of the branch instruction.
Instruction | Description |
---|---|
beq r0, r1, offs12 | branches to offs12 if r0 == r1 |
bne r0, r1, offs12 | branches to offs12 if r0 != r1 |
blt r0, r1, offs12 | branches to offs12 if r0 < r1 (signed comparison) |
bge r0, r1, offs12 | branches to offs12 if r0 >= r1 (signed comparison) |
bltu r0, r1, offs12 | branches to offs12 if r0 < r1 (unsigned comparison) |
bgeu r0, r1, offs12 | branches to offs12 if r0 >= r1 (unsigned comparison) |
Instructions in this group have the following encoding.
offs12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | 7 bits |
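As a sketch of how a branch target could be computed from this encoding, the functions below extract offs12, sign-extend it, and scale it by two bytes relative to the address of the branch. The names decode_offs12 and branch_target are hypothetical, and the scaling follows the description at the start of this section rather than any code shown in the series.

```cpp
#include <cstdint>

// Extracts offs12 from bits [31:20] and sign-extends it to 32 bits.
// XOR-ing with the sign bit and then subtracting it maps 0x800..0xfff
// onto -2048..-1 and leaves 0x000..0x7ff unchanged.
constexpr int32_t decode_offs12(uint32_t ins)
{
    return static_cast<int32_t>((ins >> 20) ^ 0x800u) - 0x800;
}

// The offset counts multiples of two bytes relative to the address of
// the branch instruction itself.
constexpr uint32_t branch_target(uint32_t pc, uint32_t ins)
{
    return pc + static_cast<uint32_t>(decode_offs12(ins) * 2);
}
```

An instruction at address 0x100 whose offs12 field holds 0xffe (that is, -2) would branch back four bytes to 0xfc, the previous instruction.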
Register-immediate instructions
Instructions in this group perform an operation on the contents of register r1 and a sign-extended 12-bit immediate value, imm12, storing the result into register r0.
Instruction | Description |
---|---|
addi r0, r1, imm12 | adds the value in register r1 to an immediate value and stores the result into r0 |
slti r0, r1, imm12 | sets r0 to 1 if r1 < imm12 (signed comparison) otherwise sets r0 to 0 |
sltiu r0, r1, imm12 | sets r0 to 1 if r1 < imm12 (unsigned comparison) otherwise sets r0 to 0 |
xori r0, r1, imm12 | bitwise XORs r1 with imm12 and stores the result in r0 |
ori r0, r1, imm12 | bitwise ORs r1 with imm12 and stores the result in r0 |
andi r0, r1, imm12 | bitwise ANDs r1 with imm12 and stores the result in r0 |
Instructions in this group have the following encoding.
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | 7 bits |
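One subtlety in this group is that imm12 is sign-extended even for the unsigned comparison. The sketch below uses hypothetical function names and assumes that sltiu behaves as it does in RV32I, where the sign-extended immediate is reinterpreted as an unsigned value before comparing.

```cpp
#include <cstdint>

// slti: both operands are treated as signed two's complement values.
constexpr uint32_t set_less_than_imm(uint32_t r1, int32_t imm12)
{
    return static_cast<int32_t>(r1) < imm12 ? 1u : 0u;
}

// sltiu: the immediate has already been sign-extended, and is then
// reinterpreted as unsigned, so an immediate of -1 compares as 0xffffffff.
constexpr uint32_t set_less_than_imm_unsigned(uint32_t r1, int32_t imm12)
{
    return r1 < static_cast<uint32_t>(imm12) ? 1u : 0u;
}
```

So with 5 in r1 and an immediate of -1, the signed comparison yields 0 while the unsigned one yields 1, because 5 is less than 0xffffffff.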
Load instructions
These instructions load a value from memory starting at the address pointed to by the contents of register r1 plus the sign-extended 12-bit immediate value imm12 into register r0. Memory is read from in little-endian order, i.e., values larger than a byte are read from memory with the least-significant byte first.
Instruction | Description |
---|---|
lb r0, imm12(r1) | Loads a byte from address imm12(r1) and sign-extends it into r0 |
lbu r0, imm12(r1) | Loads a byte from address imm12(r1) and zero-extends it into r0 |
lh r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and sign-extends it into r0 |
lhu r0, imm12(r1) | Loads a little-endian halfword from address imm12(r1) and zero-extends it into r0 |
lw r0, imm12(r1) | Loads a little-endian word from address imm12(r1) into r0 |
Instructions in this group have the following encoding.
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | 7 bits |
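The load instructions differ only in how many bytes they read and whether the result is sign-extended or zero-extended. Here is a minimal sketch of that idea, assuming memory is a plain byte vector; load_le, sign_extend and the per-instruction functions are hypothetical names, not the memory interface from part 5.

```cpp
#include <cstdint>
#include <vector>

using Memory = std::vector<uint8_t>;

// Reads `bytes` bytes starting at addr, least-significant byte first.
// Memory::at throws std::out_of_range for addresses outside memory.
inline uint32_t load_le(const Memory& mem, uint32_t addr, unsigned bytes)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < bytes; ++i) {
        value |= static_cast<uint32_t>(mem.at(addr + i)) << (8 * i);
    }
    return value;
}

// Sign-extends the low `bits` bits of value to 32 bits.
inline uint32_t sign_extend(uint32_t value, unsigned bits)
{
    const uint32_t sign = 1u << (bits - 1);
    return (value ^ sign) - sign;
}

// The effective address is the base register plus the sign-extended imm12.
inline uint32_t address(uint32_t base, int32_t imm12)
{
    return base + static_cast<uint32_t>(imm12);
}

inline uint32_t lb(const Memory& m, uint32_t base, int32_t imm12)
{
    return sign_extend(load_le(m, address(base, imm12), 1), 8);
}

inline uint32_t lhu(const Memory& m, uint32_t base, int32_t imm12)
{
    return load_le(m, address(base, imm12), 2);
}

inline uint32_t lh(const Memory& m, uint32_t base, int32_t imm12)
{
    return sign_extend(lhu(m, base, imm12), 16);
}

inline uint32_t lw(const Memory& m, uint32_t base, int32_t imm12)
{
    return load_le(m, address(base, imm12), 4);
}
```

Given memory containing the bytes 0x34, 0x12, 0x01, 0x80, loading a halfword from address 2 yields 0x8001 with lhu and 0xffff8001 with lh.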
Store instructions
Instructions in this group store a value from register r0 into memory starting at the address pointed to by the contents of register r1 plus the sign-extended 12-bit immediate value imm12. Memory is written to in little-endian order, i.e., values larger than a byte are written to memory with the least-significant byte first.
Instruction | Description |
---|---|
sb r0, imm12(r1) | Stores the lowest byte of r0 into address imm12(r1) |
sh r0, imm12(r1) | Stores the lower halfword of r0 into address imm12(r1) in little-endian order |
sw r0, imm12(r1) | Stores the word in r0 into address imm12(r1) in little-endian order |
Instructions in this group have the following encoding.
imm12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | 7 bits |
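Storing in little-endian order means peeling bytes off the low end of the register and writing them to ascending addresses. The sketch below assumes memory is a plain byte vector; store_le is a hypothetical helper, not the memory interface from part 5.

```cpp
#include <cstdint>
#include <vector>

// Writes the low `bytes` bytes of value to memory starting at addr,
// least-significant byte first. std::vector::at throws std::out_of_range
// for addresses outside memory.
inline void store_le(std::vector<uint8_t>& mem, uint32_t addr, uint32_t value,
                     unsigned bytes)
{
    for (unsigned i = 0; i < bytes; ++i) {
        mem.at(addr + i) = static_cast<uint8_t>(value >> (8 * i));
    }
}
```

Under this sketch sb, sh and sw would call store_le with 1, 2 and 4 bytes respectively, after computing the address from r1 and imm12. Storing the word 0x12345678 at address 4 leaves the bytes 0x78, 0x56, 0x34, 0x12 at addresses 4 to 7.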
Memory ordering instructions
The fence instruction is used to ensure memory ordering. It currently has no effect on the Owl-2820 CPU and is effectively a no-operation (nop).
Instruction | Description |
---|---|
fence | Enforces memory ordering. |
This instruction has the following encoding.
unused [31:7] | opcode [6:0] |
---|---|
25 bits | 7 bits |
On the subject of nop instructions, I was recently reminded that when I was a teenager, I thought that FA would be an ideal encoding for a nop instruction. However amusing this would have been to my much younger self, it is not practical as it adds unnecessary complexity to decoding.
Subroutine call instructions
These instructions are used for calling subroutines.
Instruction | Description |
---|---|
jalr r0, offs12(r1) | calls a subroutine at offs12 plus r1, saving the return address in r0 |
jal r0, offs20 | calls a subroutine at offs20, saving the return address in r0 |
The jalr instruction has the following encoding.
offs12 [31:20] | unused [19:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|---|
12 bits | 3 bits | 5 bits | 5 bits | 7 bits |
The jal instruction has the following encoding.
offs20 [31:12] | r0 [11:7] | opcode [6:0] |
---|---|---|
20 bits | 5 bits | 7 bits |
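The semantics of the two instructions can be sketched as follows. Several things here are assumptions rather than facts from this series: the Cpu struct and function names are hypothetical, register 0 is assumed to be the hard-wired zero register (as in RISC-V), offs20 is assumed to be scaled by two bytes in the same way as branch offsets, and offs12 in jalr is assumed to be a plain byte offset.

```cpp
#include <array>
#include <cstdint>

// Hypothetical CPU state: a program counter and 32 registers.
struct Cpu {
    uint32_t pc = 0;
    std::array<uint32_t, 32> x{};
};

// jal r0, offs20: jump relative to pc, saving the return address in r0.
inline void jal(Cpu& cpu, uint32_t r0, int32_t offs20)
{
    const uint32_t return_address = cpu.pc + 4; // the next instruction
    cpu.pc += static_cast<uint32_t>(offs20 * 2);
    if (r0 != 0) { // assumption: writes to the zero register are discarded
        cpu.x[r0] = return_address;
    }
}

// jalr r0, offs12(r1): jump to r1 plus offs12, saving the return address
// in r0. The target is computed before r0 is written so that the
// instruction still works when r0 and r1 are the same register.
inline void jalr(Cpu& cpu, uint32_t r0, uint32_t r1, int32_t offs12)
{
    const uint32_t return_address = cpu.pc + 4;
    cpu.pc = cpu.x[r1] + static_cast<uint32_t>(offs12);
    if (r0 != 0) {
        cpu.x[r0] = return_address;
    }
}
```

Under these assumptions, a jal that saves into register 1 followed later by a jalr through register 1 with a zero offset resumes execution at the instruction after the original jal, which is the call-and-return pattern.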
Miscellaneous instructions
We’ve already seen lui in action when combined with addi to put 32-bit immediate values into a register. We can do something similar with auipc which combines with addi to put a 32-bit address into a register. In the latter case the address is an absolute address that is calculated relative to the value in the program counter, pc.
Both these combinations mean that jalr can call a subroutine anywhere in Owl’s 32-bit address space. This would not be possible otherwise, as the Owl-2820 instruction encoding cannot specify a 32-bit address in a single instruction, because instructions themselves occupy 32 bits.
Instruction | Description |
---|---|
lui r0, uimm20 | loads an immediate value into the upper bits of r0, setting its lower 12 bits to zero |
auipc r0, uimm20 | adds uimm20 to the upper 20 bits of pc, storing the result in r0 |
The lui and auipc instructions have the following encoding.
uimm20 [31:12] | r0 [11:7] | opcode [6:0] |
---|---|---|
20 bits | 5 bits | 7 bits |
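Splitting a 32-bit value across lui and addi has one wrinkle: addi sign-extends its 12-bit immediate, so when bit 11 of the value is set the upper part must be rounded up to compensate. The sketch below models the three instructions as pure functions; all names are hypothetical, and split is the kind of calculation an assembler could perform, not code from this series.

```cpp
#include <cstdint>

// lui: uimm20 goes into the upper 20 bits, the lower 12 bits are zero.
constexpr uint32_t lui(uint32_t uimm20)
{
    return uimm20 << 12;
}

// auipc: the same shifted value, added to the program counter.
constexpr uint32_t auipc(uint32_t pc, uint32_t uimm20)
{
    return pc + (uimm20 << 12);
}

// addi: adds a sign-extended 12-bit immediate.
constexpr uint32_t addi(uint32_t r1, int32_t imm12)
{
    return r1 + static_cast<uint32_t>(imm12);
}

struct Split {
    uint32_t upper; // uimm20 operand for lui
    int32_t lower;  // imm12 operand for addi
};

// Splits a 32-bit value so that addi(lui(upper), lower) == value.
// Adding 0x800 before shifting rounds the upper part up whenever the
// lower part will be negative after sign extension.
constexpr Split split(uint32_t value)
{
    return {
        ((value + 0x800u) >> 12) & 0xfffffu,
        static_cast<int32_t>((value & 0xfffu) ^ 0x800u) - 0x800,
    };
}
```

For example, 0x12345fff splits into an upper part of 0x12346 and a lower part of -1: lui produces 0x12346000 and addi then subtracts one.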
Owl-2820 only instructions
The instructions in this group are specific to the Owl-2820.
Each of them could have been implemented with existing RV32I instructions, but this would be slower to decode and dispatch. You might recall that we compared RISC-V encoding with Owl encoding in part 1 and learned that an encoding that is simple to decode in hardware may need multiple steps to decode in software.
For instance, the call instruction could have been implemented as jal ra, offs20, but in my experience the extra work needed to decode it is measurable, so I gave it its own opcode. The same goes for j which could have been implemented as jal zero, offs20, and ret which could have been implemented as jalr zero, 0(ra). Similarly, li and mv could have been implemented as addi r0, zero, imm12 and add r0, r1, zero respectively.
Another consideration is simply one of readability. For example, part 4 of this series is much easier to understand in terms of call and ret than it would have been otherwise.
Instruction | Description |
---|---|
j offs20 | jumps to offs20 |
call offs20 | calls a subroutine at offs20, saving the return address in the return address register, ra |
ret | returns from a subroutine by jumping to the address in the return address register, ra |
li r0, imm12 | loads an immediate value into register r0 |
mv r0, r1 | copies the value in r1 into r0 |
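The equivalences between the Owl-only instructions and their RV32I counterparts can be checked with a small model. This is a sketch under assumptions: Regs and the function names are hypothetical, and register 0 is assumed to be the zero register, which always reads as zero and ignores writes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical register file in which register 0 is the zero register.
struct Regs {
    std::array<uint32_t, 32> x{};

    uint32_t get(uint32_t r) const { return x[r]; }

    void set(uint32_t r, uint32_t value)
    {
        if (r != 0) { // writes to the zero register are discarded
            x[r] = value;
        }
    }
};

constexpr uint32_t zero_reg = 0;

// RV32I-style instructions.
inline void addi(Regs& regs, uint32_t r0, uint32_t r1, int32_t imm12)
{
    regs.set(r0, regs.get(r1) + static_cast<uint32_t>(imm12));
}

inline void add(Regs& regs, uint32_t r0, uint32_t r1, uint32_t r2)
{
    regs.set(r0, regs.get(r1) + regs.get(r2));
}

// Owl-only instructions: same result, but with one fewer register to
// decode and read.
inline void li(Regs& regs, uint32_t r0, int32_t imm12)
{
    regs.set(r0, static_cast<uint32_t>(imm12));
}

inline void mv(Regs& regs, uint32_t r0, uint32_t r1)
{
    regs.set(r0, regs.get(r1));
}
```

In this model li(regs, 5, -7) leaves the registers in the same state as addi(regs, 5, zero_reg, -7), and mv(regs, 6, 5) matches add(regs, 6, 5, zero_reg).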
The j and call instructions have the following encoding.
offs20 [31:12] | unused [11:7] | opcode [6:0] |
---|---|---|
20 bits | 5 bits | 7 bits |
The ret instruction has the following encoding.
unused [31:7] | opcode [6:0] |
---|---|
25 bits | 7 bits |
The li instruction has the following encoding.
imm12 [31:20] | unused [19:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|
12 bits | 8 bits | 5 bits | 7 bits |
The mv instruction has the following encoding.
unused [31:17] | r1 [16:12] | r0 [11:7] | opcode [6:0] |
---|---|---|---|
15 bits | 5 bits | 5 bits | 7 bits |
Show me the code
I haven’t walked through the implementation in this post because there is not much to explain. Individual Owl-2820 instructions do very little, and can be implemented in a few lines of code each.
You can find the full code for this post on Compiler Explorer.
Summary
In this post we implemented the rest of the Owl-2820 instruction set. This was straightforward, because individual instructions can be implemented in no more than half a dozen lines of code.
In theory, since Owl-2820’s instruction set is based on RV32I, we have also implemented an emulator for the RV32I instruction set. In practice we’re not quite there, because Owl-2820 instructions have a different encoding to RV32I as this simplifies decoding them in software.
This is limiting, because it means that any programs created for our Owl-2820 VM must be written in Owl-2820 assembly language using our rudimentary assembler.
Coming soon
In the short term it could be interesting to develop a few small programs in Owl-2820 assembly language, and if you’re so inclined then I encourage you to try. However, after a while this would become tedious and repetitive.
What we really need is a way of writing Owl-2820 programs in a higher level language.
Ideally, we would write programs for our Owl-2820 VM in a language such as C, C++, Rust or Swift, then compile them for an RV32I target and run them on our Owl-2820 VM.
Doing this will be the theme for the next few posts.