In which we implement call and return, then discuss the need for calling conventions.
Part 4 Call, return, and calling conventions
In this post, we’re going to implement call and return for the Owl-2820 CPU, then we’ll discuss the related topic of calling conventions.
Call and return
Most instruction set architectures have the ability to implement subroutines through some form of call and return, in which call transfers control to a subroutine, and return transfers control to the return address, which is the address of the next instruction after the call.
There are two common ways of implementing this: by using the stack, and by using a register.
When using the stack to hold the return address, call is implemented by pushing the return address onto the stack, then transferring control to the subroutine. Conversely, return is implemented by popping the return address from the stack then transferring control back to that address.
When using a register to hold the return address, call is implemented by saving the return address into a so-called link register, then transferring control to the subroutine. Return is implemented by transferring control back to the address that was saved in the link register.
The Owl-2820 CPU is based on RISC-V, so we will implement its call instruction, call, and its return instruction, ret, using a link register.
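To make the difference between the two approaches concrete, here’s a minimal sketch of both on a toy machine. The ToyCpu type, its members, and the fixed 4-byte instruction size are assumptions made up for this illustration; none of them are part of the Owl-2820 code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A toy machine used only to contrast the two approaches (hypothetical, not Owl-2820 code).
struct ToyCpu
{
    uint32_t pc = 0;             // address of the current instruction
    uint32_t link = 0;           // link register (used by the register approach)
    std::vector<uint32_t> stack; // return address stack (used by the stack approach)

    static constexpr uint32_t kInsSize = 4; // assumption: fixed-size 4-byte instructions

    // Stack approach: push the return address, then transfer control.
    void CallViaStack(uint32_t target)
    {
        stack.push_back(pc + kInsSize);
        pc = target;
    }

    // Stack approach: pop the return address, then transfer control back to it.
    void RetViaStack()
    {
        assert(!stack.empty());
        pc = stack.back();
        stack.pop_back();
    }

    // Link register approach: save the return address in a register, then transfer control.
    void CallViaLink(uint32_t target)
    {
        link = pc + kInsSize;
        pc = target;
    }

    // Link register approach: transfer control back to the saved address.
    void RetViaLink()
    {
        pc = link;
    }
};
```

One consequence of the link register approach is that a nested call overwrites the link register, so a subroutine that itself calls another subroutine has to save the old value somewhere first, typically on the stack.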
Implementing call and ret
Here’s the code that implements call. It saves the address of the next instruction, nextPc, into the designated return address register, x[ra], then it sets nextPc to the address in the program counter, pc, plus a sign-extended 20-bit offset, offs20.
In this context, ra is a symbolic register name that evaluates to 1. In other words, x[ra] is simply another name for x[1] in our register array.
case Opcode::Call: {
// ra <- pc + 4, pc <- pc + offs20
x[ra] = nextPc; // Save the return address in `ra`.
nextPc = pc + offs20(ins); // Make the call.
break;
}
Similarly, here’s the code for ret. It copies the previously saved value from the designated return address register, x[ra], into nextPc.
case Opcode::Ret: {
// pc <- ra
nextPc = x[ra]; // Return to `ra`.
break;
}
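To see the two cases working together, here’s a small standalone sketch of a call followed by a ret. It isn’t the real emulator: MiniCpu and SignExtend20 are hypothetical names, SignExtend20 is only an assumed stand-in for whatever offs20(ins) does when it decodes the instruction, and the offset is treated as a plain byte offset for the sake of the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t ra = 1; // symbolic name for x[1], as in the text

// Assumed stand-in for the decoder: sign-extends a raw 20-bit field to 32 bits.
constexpr int32_t SignExtend20(uint32_t field)
{
    const uint32_t value = field & 0xFFFFFu;
    // If bit 19 (the sign bit of the field) is set, fill the upper 12 bits with ones.
    return static_cast<int32_t>((value & 0x80000u) ? (value | 0xFFF00000u) : value);
}

// A cut-down CPU with just enough state to show call and ret (hypothetical).
struct MiniCpu
{
    uint32_t x[32] = {};
    uint32_t pc = 0;
    uint32_t nextPc = 0;

    // Mirrors the Call case: ra <- pc + 4, pc <- pc + offset.
    void Call(uint32_t rawOffs20)
    {
        nextPc = pc + 4; // address of the instruction after the call
        x[ra] = nextPc;  // save the return address in ra
        nextPc = pc + static_cast<uint32_t>(SignExtend20(rawOffs20));
        pc = nextPc;     // make the call
    }

    // Mirrors the Ret case: pc <- ra.
    void Ret()
    {
        nextPc = x[ra];
        pc = nextPc;
    }
};
```

For example, a call at address 0x100 with a raw offset field of 0xFFFF0 (that is, -16 once sign-extended) sets ra to 0x104 and transfers control to 0xF0. The matching ret then transfers control back to 0x104.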
Rewriting printf() as a subroutine
In Part 2 of this series, I took a liberty with the fact that there was only one call instruction in the entire assembly language program and hard-coded the Call opcode to the rather specific printf() implementation shown below.
case Opcode::Call: {
// Do a hard-coded printf().
std::cout << std::format("fib({}) = {}\n", x[a1], x[a2]);
break;
}
Now that we have proper implementations for call and ret, we are much closer to being able to implement printf() as a subroutine.
In order to do so, we still need to solve the problem of actually writing some output, so in the short term we’ll move the original implementation of printf() from the Call opcode to a different opcode, Ecall, which represents a system call.
case Opcode::Ecall: {
// Do a hard-coded printf().
std::cout << std::format("fib({}) = {}\n", x[a1], x[a2]);
break;
}
Having done this, we can finally implement printf() as a subroutine. The new implementation of printf() will invoke ecall to perform the hard-coded output previously invoked by call, before returning to the caller via ret.
We’ll also take the opportunity to rename the subroutine from printf() to print_fib() as that’s a more accurate description of what it does. Here’s the implementation.
// print_fib:
Label print_fib = a.MakeLabel();
a.BindLabel(print_fib);
a.Ecall(); // ecall ; invoke hard-coded print_fib
a.Ret(); // ret ; return to the caller
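To trace the control flow this produces, here’s a toy interpreter that understands just enough to run a call into print_fib and back. It’s only a sketch: the Op, Ins, Run, and Demo names are invented for this illustration, instructions are stored as small structs rather than encoded words, and the output is collected in a string instead of being written to std::cout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class Op { Call, Ecall, Ret, Halt };

// A pre-decoded instruction (hypothetical; the real emulator decodes 32-bit words).
struct Ins
{
    Op op;
    int32_t offset = 0; // byte offset, used only by Call
};

// Runs the program and returns everything "printed" by Ecall.
std::string Run(const std::vector<Ins>& code, uint32_t a1, uint32_t a2)
{
    std::string out;
    uint32_t ra = 0;
    uint32_t pc = 0;
    while (pc / 4 < code.size())
    {
        const Ins& ins = code[pc / 4];
        uint32_t nextPc = pc + 4;
        switch (ins.op)
        {
        case Op::Call:
            ra = nextPc; // save the return address
            nextPc = pc + static_cast<uint32_t>(ins.offset);
            break;
        case Op::Ecall:
            // The hard-coded output, built without std::format to keep the sketch portable.
            out += "fib(" + std::to_string(a1) + ") = " + std::to_string(a2) + "\n";
            break;
        case Op::Ret:
            nextPc = ra; // return to the caller
            break;
        case Op::Halt:
            return out;
        }
        pc = nextPc;
    }
    return out;
}

std::string Demo()
{
    const std::vector<Ins> code = {
        {Op::Call, 8}, // address 0:  call print_fib
        {Op::Halt},    // address 4:  stop
        {Op::Ecall},   // address 8:  print_fib: ecall
        {Op::Ret},     // address 12: ret
    };
    return Run(code, 10, 55); // a1 = i, a2 = current
}
```

Here the call at address 0 saves 4 in ra and jumps to address 8, the ecall produces the output, and the ret resumes execution at address 4, which is the instruction immediately after the call.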
And here’s the updated implementation of the Assemble() function. It now genuinely calls printf(), admittedly using its new name, print_fib().
std::vector<uint32_t> Assemble()
{
Assembler a;
// clang-format off
// main:
a.Li(s0, 0); // li s0, 0 ; i = 0
a.Li(s2, 2); // li s2, 2 ; s2 = 2
a.Li(s3, 48); // li s3, 48 ; s3 = 48
a.Li(s4, 1); // li s4, 1 ; s4 = 1
Label fib = a.MakeLabel();
a.J(fib); // j fib ; go to fib
// print_fib:
Label print_fib = a.MakeLabel();
a.BindLabel(print_fib);
a.Ecall(); // ecall ; invoke hard-coded print_fib
a.Ret(); // ret ; return to the caller
// print_loop:
Label print_loop = a.MakeLabel();
a.BindLabel(print_loop);
a.Mv(a0, s1); // mv a0, s1 ; arg0 = the address of the printf format string
a.Mv(a1, s0); // mv a1, s0 ; arg1 = i (arg2 contains current)
a.Call(print_fib); // call print_fib ; call print_fib
a.Addi(s0, s0, 1); // addi s0, s0, 1 ; i = i + 1
Label done = a.MakeLabel();
a.Beq(s0, s3, done); // beq s0, s3, done ; if i == 48 go to done
// fib:
a.BindLabel(fib);
a.Mv(a2, s0); // mv a2, s0 ; current = i
a.Bltu(s0, s2, print_loop); // bltu s0, s2, print_loop ; if i < 2 go to print_loop
a.Li(a0, 0); // li a0, 0 ; previous = 0
a.Li(a2, 1); // li a2, 1 ; current = 1
a.Mv(a1, s0); // mv a1, s0 ; n = i
// fib_loop:
Label fib_loop = a.MakeLabel();
a.BindLabel(fib_loop);
a.Mv(a3, a2); // mv a3, a2 ; tmp = current
a.Addi(a1, a1, -1); // addi a1, a1, -1 ; n = n - 1
a.Add(a2, a0, a2); // add a2, a0, a2 ; current = current + prev
a.Mv(a0, a3); // mv a0, a3 ; previous = tmp
a.Bltu(s4, a1, fib_loop); // bltu s4, a1, fib_loop ; if n > 1 go to fib_loop
a.J(print_loop); // j print_loop ; go to print_loop
// done:
a.BindLabel(done);
a.Li(a0, 0); // li a0, 0 ; set the return value of main() to 0
// Emit an illegal instruction so that we have something to stop us.
a.Emit(0);
// clang-format on
return a.Code();
}
You can view and run this intermediate version of the code on Compiler Explorer.
At this point, I’m somewhat grateful that we added labels to the assembler in Part 3, as changing the offsets manually would have been painful. As it happens, the labels took care of everything and we didn’t even have to think about it.
However, there’s still something that needs to be explained, namely, how did we decide on how to pass arguments to print_fib()? The answer is that we used a calling convention, and that’s the topic of the next section.
Calling conventions
If the call to print_fib() was written in C++, then it might look something like this. It’s a function that takes three arguments.
print_fib("fib(%d) = %d\n", i, current);
Here’s how that might translate into assembly language.
mv a0, s1 ; a0 = the address of the printf format string
mv a1, s0 ; a1 = i
; a2 already contains 'current'
call print_fib ; call print_fib
In the assembly language code above, you may have noticed that the arguments to print_fib() are passed to it in registers a0, a1, and a2.
These registers are used as follows:
- a0 holds the address of the printf() format string. It’s actually unused, because print_fib() doesn’t take a format string, but I’ve left it like this as it leaves the option open to reintroduce printf() at a later date.
- a1 holds i, the number whose Fibonacci value is being printed
- a2 holds current, the Fibonacci value itself
This arrangement of registers was no accident. How arguments are passed to a subroutine is part of what is known as a calling convention.
A calling convention describes the relationship between a called subroutine, often referred to as the callee, and the code that calls it, known as the caller.
Amongst other things, a calling convention describes the following:
- how the caller passes arguments to the callee, in registers and on the stack
- how the callee returns the result to the caller
- which registers have their values preserved by the callee
- which registers do not have their values preserved by the callee and therefore need to be preserved by the caller
- which registers are temporary and therefore are preserved by neither caller nor callee
- how the caller passes the return address to the callee
The Owl-2820 calling convention
In Part 1 of this series I introduced the Owl-2820 registers, and stated that they’d have the same names and usages as RV32I.
Register | Name | Description |
---|---|---|
x0 | zero | always contains zero (writing to x0 has no effect) |
x1 | ra | return address |
x2 | sp | stack pointer |
x3 | gp | global pointer |
x4 | tp | thread pointer |
x5 - x7 | t0 - t2 | temporary registers |
x8 - x9 | s0 - s1 | callee-saved registers |
x10 - x17 | a0 - a7 | argument registers |
x18 - x27 | s2 - s11 | callee-saved registers |
x28 - x31 | t3 - t6 | temporary registers |
If we turn this table around, and use the symbolic register names, then we can see some aspects of how the Owl-2820 calling convention works.
Registers | Calling Convention Usage | Who saves? |
---|---|---|
a0 - a1 | argument registers, used to pass arguments from the caller to the callee, and to return values from the callee back to the caller | caller |
a2 - a7 | argument registers, used to pass arguments from the caller to the callee | caller |
s0 | frame pointer | callee |
s1 - s11 | callee-saved registers | callee |
t0 - t6 | temporary registers, not preserved by the callee, so must be preserved by the caller if used after the call | caller |
ra | return address | caller |
In particular, we can see that registers a0 through a7 are used to pass arguments from the caller to the called subroutine, and we can see that registers a0 and sometimes a1 are used to return values from the called subroutine back to the caller.
We can also see that register ra is used to hold the return address.
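As an aside, the mapping from register numbers to symbolic names and roles in the register table above is regular enough to write down as a couple of small functions. This is just a sketch of that table in code: Role, RoleOf, and AbiName are hypothetical names invented here and aren’t part of the Owl-2820 sources.

```cpp
#include <cassert>
#include <string>

// The roles listed in the register table (hypothetical enum for illustration).
enum class Role
{
    Zero, ReturnAddress, StackPointer, GlobalPointer, ThreadPointer,
    Temporary, CalleeSaved, Argument
};

// Returns the role of register x<reg>, following the register table.
Role RoleOf(int reg)
{
    assert(reg >= 0 && reg < 32);
    switch (reg)
    {
    case 0: return Role::Zero;
    case 1: return Role::ReturnAddress;
    case 2: return Role::StackPointer;
    case 3: return Role::GlobalPointer;
    case 4: return Role::ThreadPointer;
    default: break;
    }
    if (reg <= 7) return Role::Temporary;    // x5 - x7   = t0 - t2
    if (reg <= 9) return Role::CalleeSaved;  // x8 - x9   = s0 - s1
    if (reg <= 17) return Role::Argument;    // x10 - x17 = a0 - a7
    if (reg <= 27) return Role::CalleeSaved; // x18 - x27 = s2 - s11
    return Role::Temporary;                  // x28 - x31 = t3 - t6
}

// Returns the symbolic name of register x<reg>, for example "a0" for x10.
std::string AbiName(int reg)
{
    assert(reg >= 0 && reg < 32);
    static const char* const fixed[] = {"zero", "ra", "sp", "gp", "tp"};
    if (reg <= 4) return fixed[reg];
    if (reg <= 7) return "t" + std::to_string(reg - 5);
    if (reg <= 9) return "s" + std::to_string(reg - 8);
    if (reg <= 17) return "a" + std::to_string(reg - 10);
    if (reg <= 27) return "s" + std::to_string(reg - 16);
    return "t" + std::to_string(reg - 25);
}
```

For instance, AbiName(10) gives "a0" and RoleOf(10) gives Role::Argument, which matches the x10 - x17 row of the table.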
Example
Let’s look at that in practice, first by seeing what the generated code looks like for a subroutine and its caller, then by stepping through it one instruction at a time.
A subroutine
Let’s assume that we have a C++ compiler that compiles to Owl-2820 assembly language, and we have the following function, add3(), which takes three ints in parameters a, b, and c, adds them together, then returns the result. What assembly language code will be generated by the compiler when it compiles this code?
int add3(int a, int b, int c)
{
return a + b + c;
}
For the compiler to generate the code for this function, it needs to know how the caller has passed a, b, and c to it so that it can use them. Likewise, it needs to know how to return the result to the caller.
It might generate something like this, using the calling convention that the first three arguments, corresponding to parameters a, b, and c, are passed to it in registers a0, a1, and a2 respectively, and that the result is returned in register a0.
add3:
; return a + b + c
add a0, a0, a1 ; a += b
add a0, a0, a2 ; a += c
ret
Calling the subroutine
Now, let’s say that we want to call add3() then print the result by calling a function print_int(). Again, what assembly language code will be generated by the compiler when it compiles this code?
int result = add3(1, 2, 3);
print_int(result);
For the compiler to generate this code, it needs to know how to pass each of the arguments to the add3() function. It also needs to know how to get the return value from add3() and pass that return value to print_int().
It might generate something like this, again using the calling convention that the first three arguments to add3() will be passed in registers a0, a1, and a2, and that the result will be returned in register a0.
; result = add3(1, 2, 3)
li a0, 1 ; arg0 (a) = 1
li a1, 2 ; arg1 (b) = 2
li a2, 3 ; arg2 (c) = 3
call add3 ; result = add3(a, b, c)
It would then go on to generate the following code, knowing that the return value from add3() is in register a0, which will be passed as the first and only argument to print_int().
; print_int(result)
call print_int ; print_int(result)
Stepping through
Let’s step through this in detail.
The compiler knows that add3() has parameters int a, int b, and int c, so it generates code to pass it three integer arguments. It places these arguments, 1, 2, and 3, into registers a0, a1, and a2 respectively.
li a0, 1 ; arg0 (a) = 1
li a1, 2 ; arg1 (b) = 2
li a2, 3 ; arg2 (c) = 3
It calls add3(). Let’s step into that.
call add3
The compiler knows that the first argument to add3() was passed in register a0 and that the second argument to add3() was passed in register a1. These correspond to parameters a and b, so it adds them to get an intermediate result.
add a0, a0, a1 ; a += b
Now register a0 holds the result of a + b. The compiler knows that the third argument to add3() was passed in register a2. This corresponds to parameter c, so it adds that to the intermediate result in a0 to get a + b + c.
add a0, a0, a2 ; a += c
Then it returns to the caller. The return value is in register a0.
ret
At this point a0 contains the final result of a + b + c, and a1 and a2 are unchanged from before the call.
Function add3() required three arguments, so these were passed to it in the first three argument registers, a0, a1, and a2. A function with two arguments would have had its arguments passed to it in registers a0 and a1, and a function with one argument, such as print_int(), will have its argument passed to it in register a0.
Rather conveniently, a0 already contains the result of calling add3(), so there’s nothing to do other than to simply call print_int(), which will take the value in a0 and print it to stdout.
call print_int ; print_int(result)
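The whole walkthrough can be mimicked in a few lines of C++ by modelling the three argument registers as a struct and writing each subroutine so that it only talks to its caller through those registers. This is purely illustrative: Regs, Add3, PrintInt, and CallerDemo are made-up names, and PrintInt returns the text rather than writing to stdout so that the result is easy to check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Just the argument registers used in this example.
struct Regs
{
    int32_t a0 = 0;
    int32_t a1 = 0;
    int32_t a2 = 0;
};

// add3: the arguments arrive in a0, a1, and a2; the result is left in a0.
void Add3(Regs& r)
{
    r.a0 = r.a0 + r.a1; // add a0, a0, a1 ; a += b
    r.a0 = r.a0 + r.a2; // add a0, a0, a2 ; a += c
}                       // ret

// print_int: hypothetical stand-in that takes its one argument in a0.
std::string PrintInt(const Regs& r)
{
    return std::to_string(r.a0);
}

std::string CallerDemo()
{
    Regs r;
    r.a0 = 1;           // li a0, 1 ; arg0 (a) = 1
    r.a1 = 2;           // li a1, 2 ; arg1 (b) = 2
    r.a2 = 3;           // li a2, 3 ; arg2 (c) = 3
    Add3(r);            // call add3
    return PrintInt(r); // call print_int ; a0 already holds the result
}
```

Neither function knows anything about the other beyond the convention itself: arguments go in starting at a0, and the result comes back in a0.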
Why have a calling convention?
In theory, calling conventions are not mandatory. Without them, though, there would be an almost unlimited number of ways of passing arguments to a subroutine and returning results, and just as many ways in which the caller and callee could decide how registers are preserved across a call and by whom.
This would make it quite difficult for a compiler to generate code, because it would have to know the calling convention for each and every subroutine, and wouldn’t be able to make any assumptions about which registers are preserved across a call.
In practice most architectures have no more than a handful of calling conventions.
Summary
In this post, we looked at how to implement call and ret for the Owl-2820 CPU, then we discovered that being able to call a subroutine and return from it led us to questions about how to pass parameters and return values.
We answered those questions when we learned how a calling convention describes the agreed relationship between a called subroutine and its caller.