11 June, 2024

In which we implement call and return, then discuss the need for calling conventions.

Part 4 Call, return, and calling conventions

In this post, we’re going to implement call and return for the Owl-2820 CPU, then we’ll discuss the related topic of calling conventions.

Call and return

Most instruction set architectures have the ability to implement subroutines through some form of call and return, in which call transfers control to a subroutine, and return transfers control to the return address, which is the address of the next instruction after the call.

There are two common ways of implementing this: by using the stack, and by using a register.

When using the stack to hold the return address, call is implemented by pushing the return address onto the stack, then transferring control to the subroutine. Conversely, return is implemented by popping the return address from the stack then transferring control back to that address.

When using a register to hold the return address, call is implemented by saving the return address into a so-called link register, then transferring control to the subroutine. Return is implemented by transferring control back to the address that was saved in the link register.
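To make the difference concrete, here's a minimal sketch of both schemes side by side. The Machine struct and the four function names are hypothetical and exist only for this illustration; they are not part of the Owl-2820 code. It assumes fixed 4-byte instructions, so the return address is pc + 4.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical machine state, for illustration only.
struct Machine
{
    uint32_t pc = 0;             // address of the current instruction
    uint32_t linkRegister = 0;   // used by the link-register scheme
    std::vector<uint32_t> stack; // used by the stack scheme
};

// Assumption: every instruction is 4 bytes long.
constexpr uint32_t kInstructionSize = 4;

// Stack scheme: push the return address, then jump to the subroutine.
void CallViaStack(Machine& m, uint32_t target)
{
    m.stack.push_back(m.pc + kInstructionSize);
    m.pc = target;
}

// Stack scheme: pop the return address, then jump back to it.
void ReturnViaStack(Machine& m)
{
    assert(!m.stack.empty());
    m.pc = m.stack.back();
    m.stack.pop_back();
}

// Link-register scheme: save the return address in a register, then jump.
void CallViaLink(Machine& m, uint32_t target)
{
    m.linkRegister = m.pc + kInstructionSize;
    m.pc = target;
}

// Link-register scheme: jump back to the address held in the register.
void ReturnViaLink(Machine& m)
{
    m.pc = m.linkRegister;
}
```

One consequence of the link-register scheme is visible in the sketch: a second CallViaLink() overwrites linkRegister, so a subroutine that calls another subroutine has to save the link register somewhere before making the call.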

The Owl-2820 CPU is based on RISC-V, so we will implement its call and ret instructions using a link register.

Implementing call and ret

Here’s the code that implements call. It saves the address of the next instruction, nextPc, into the designated return address register, x[ra], then it sets nextPc to the address in the program counter, pc, plus a sign-extended 20-bit offset, offs20.

In this context, ra is a symbolic register name that evaluates to 1. In other words, x[ra] is simply another name for x[1] in our register array.

    case Opcode::Call: {
        // ra <- pc + 4, pc <- pc + offs20
        x[ra] = nextPc;            // Save the return address in `ra`.
        nextPc = pc + offs20(ins); // Make the call.
        break;
    }
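The sign extension is what allows call to reach subroutines at lower addresses as well as higher ones. Here's a sketch of how a 20-bit field could be sign-extended. SignExtend() is a hypothetical helper, not the actual offs20() from the Owl-2820 code, which also has to extract the field from the instruction word using an encoding that isn't shown here.

```cpp
#include <cstdint>

// Sign-extends the low `bits` bits of `value` to a 32-bit signed integer.
// Hypothetical helper for illustration. Precondition: 1 <= bits <= 32.
constexpr int32_t SignExtend(uint32_t value, unsigned bits)
{
    const uint32_t mask = (bits < 32) ? ((1u << bits) - 1u) : ~0u;
    const uint32_t signBit = 1u << (bits - 1);
    const uint32_t field = value & mask;
    // Flipping the sign bit and then subtracting it maps the upper half of
    // the unsigned range onto the negative numbers.
    return static_cast<int32_t>((field ^ signBit) - signBit);
}
```

With 20 bits, the all-ones field 0xFFFFF becomes -1, so a call with that offset would transfer control backwards rather than a long way forwards.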

Similarly, here’s the code for ret. It copies the previously saved value from the designated return address register, x[ra], into nextPc.

    case Opcode::Ret: {
        // pc <- ra
        nextPc = x[ra]; // Return to `ra`.
        break;
    }
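To see how the two cases work together, here's a cut-down sketch of a single execution step. The Cpu and Instruction structs and the Step() function are assumptions made for this example; the real run loop decodes a 32-bit instruction word and handles many more opcodes.

```cpp
#include <cstdint>

enum class Opcode { Call, Ret, Other };

// Hypothetical pre-decoded instruction: an opcode plus a signed offset.
struct Instruction
{
    Opcode opcode;
    int32_t offset;
};

// Hypothetical CPU state: a program counter and 32 registers.
struct Cpu
{
    uint32_t pc = 0;
    uint32_t x[32] = {};
};

constexpr unsigned ra = 1; // the return address register is x[1]

// Executes one instruction and advances the program counter.
void Step(Cpu& cpu, const Instruction& ins)
{
    uint32_t nextPc = cpu.pc + 4; // default: fall through to the next instruction
    switch (ins.opcode)
    {
    case Opcode::Call:
        cpu.x[ra] = nextPc;                                  // save the return address
        nextPc = cpu.pc + static_cast<uint32_t>(ins.offset); // jump relative to pc
        break;
    case Opcode::Ret:
        nextPc = cpu.x[ra]; // jump back to the saved return address
        break;
    case Opcode::Other:
        break; // stands in for every other instruction
    }
    cpu.pc = nextPc;
}
```

Because nextPc is computed before the switch, the Call case can save it as the return address without any extra arithmetic.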

Rewriting printf() as a subroutine

In Part 2 of this series, I took a liberty with the fact that there was only one call instruction in the entire assembly language program and hard-coded the Call opcode to the rather specific printf() implementation shown below.

    case Opcode::Call: {
        // Do a hard-coded printf().
        std::cout << std::format("fib({}) = {}\n", x[a1], x[a2]);
        break;
    }

Now that we have proper implementations for call and ret, we are much closer to being able to implement printf() as a subroutine.

In order to do so, we still need to solve the problem of actually writing some output, so in the short term we’ll move the original implementation of printf() from the Call opcode to a different opcode, Ecall, which represents a system call.

    case Opcode::Ecall: {
        // Do a hard-coded printf().
        std::cout << std::format("fib({}) = {}\n", x[a1], x[a2]);
        break;
    }

Having done this, we can finally implement printf() as a subroutine. The new implementation of printf() will invoke ecall to perform the hard-coded output previously invoked by call, before returning to the caller via ret.

We’ll also take the opportunity to rename the subroutine from printf() to print_fib() as that’s a more accurate description of what it does. Here’s the implementation.

// print_fib:
    Label print_fib = a.MakeLabel();
    a.BindLabel(print_fib);
    a.Ecall();                  // ecall                        ; invoke hard-coded print_fib
    a.Ret();                    // ret                          ; return to the caller

And here’s the updated implementation of the Assemble() function. It now genuinely calls printf(), admittedly using its new name, print_fib().

std::vector<uint32_t> Assemble()
{
    Assembler a;

    // clang-format off
// main:
    a.Li(s0, 0);                // li   s0, 0                   ; i = 0
    a.Li(s2, 2);                // li   s2, 2                   ; s2 = 2
    a.Li(s3, 48);               // li   s3, 48                  ; s3 = 48
    a.Li(s4, 1);                // li   s4, 1                   ; s4 = 1
    Label fib = a.MakeLabel();
    a.J(fib);                   // j    fib                     ; go to fib
// print_fib:
    Label print_fib = a.MakeLabel();
    a.BindLabel(print_fib);
    a.Ecall();                  // ecall                        ; invoke hard-coded print_fib
    a.Ret();                    // ret                          ; return to the caller
// print_loop:
    Label print_loop = a.MakeLabel();
    a.BindLabel(print_loop);
    a.Mv(a0, s1);               // mv   a0, s1                  ; arg0 = the address of the printf format string
    a.Mv(a1, s0);               // mv   a1, s0                  ; arg1 = i (arg2 contains current)
    a.Call(print_fib);          // call print_fib               ; call print_fib
    a.Addi(s0, s0, 1);          // addi s0, s0, 1               ; i = i + 1
    Label done = a.MakeLabel();
    a.Beq(s0, s3, done);        // beq  s0, s3, done            ; if i == 48 go to done
// fib:
    a.BindLabel(fib);
    a.Mv(a2, s0);               // mv   a2, s0                  ; current = i
    a.Bltu(s0, s2, print_loop); // bltu s0, s2, print_loop      ; if i < 2 go to print_loop
    a.Li(a0, 0);                // li   a0, 0                   ; previous = 0
    a.Li(a2, 1);                // li   a2, 1                   ; current = 1
    a.Mv(a1, s0);               // mv   a1, s0                  ; n = i
// fib_loop:
    Label fib_loop = a.MakeLabel();
    a.BindLabel(fib_loop);
    a.Mv(a3, a2);               // mv   a3, a2                  ; tmp = current
    a.Addi(a1, a1, -1);         // addi a1, a1, -1              ; n = n - 1
    a.Add(a2, a0, a2);          // add  a2, a0, a2              ; current = current + prev
    a.Mv(a0, a3);               // mv   a0, a3                  ; previous = tmp
    a.Bltu(s4, a1, fib_loop);   // bltu s4, a1, fib_loop        ; if n > 1 go to fib_loop
    a.J(print_loop);            // j    print_loop              ; go to print_loop
// done:
    a.BindLabel(done);
    a.Li(a0, 0);                // li   a0, 0                   ; set the return value of main() to 0

    // Emit an illegal instruction so that we have something to stop us.
    a.Emit(0);

    // clang-format on

    return a.Code();
}

You can view and run this intermediate version of the code on Compiler Explorer.

At this point, I’m somewhat grateful that we added labels to the assembler in Part 3, as changing the offsets manually would have been painful. As it happens, the labels took care of everything and we didn’t even have to think about it.

However, there’s still something that needs to be explained, namely, how did we decide on how to pass arguments to print_fib()? The answer is that we used a calling convention, and that’s the topic of the next section.

Calling conventions

If the call to print_fib() were written in C++, then it might look something like this. It’s a call to a function that takes three arguments.

    print_fib("fib(%d) = %d\n", i, current);

Here’s how that might translate into assembly language.

    mv   a0, s1                  ; a0 = the address of the printf format string
    mv   a1, s0                  ; a1 = i
                                 ; a2 already contains 'current'
    call print_fib               ; call print_fib

In the assembly language code above, you may have noticed that the arguments to print_fib() are passed to it in registers a0, a1, and a2.

These registers are used as follows:

a0 holds the address of the printf() format string. It’s actually unused, because print_fib() doesn’t take a format string, but I’ve left it like this as it leaves the option open to reintroduce printf() at a later date.
a1 holds i, the number whose Fibonacci value is being printed
a2 holds current, the Fibonacci value itself

This arrangement of registers was no accident. How arguments are passed to a subroutine is part of what is known as a calling convention.

A calling convention describes the relationship between a called subroutine, often referred to as the callee, and the code that calls it, known as the caller.

Amongst other things, a calling convention describes the following:

how the caller passes arguments to the callee, in registers and on the stack
how the callee returns the result to the caller
which registers have their values preserved by the callee
which registers do not have their values preserved by the callee and therefore need to be preserved by the caller
which registers are temporary and therefore are preserved by neither caller nor callee
how the caller passes the return address to the callee

The Owl-2820 calling convention

In Part 1 of this series I introduced the Owl-2820 registers, and stated that they’d have the same names and usages as RV32I.

Register	Name	Description
x0	zero	always contains zero (writing to x0 has no effect)
x1	ra	return address
x2	sp	stack pointer
x3	gp	global pointer
x4	tp	thread pointer
x5 - x7	t0 - t2	temporary registers
x8 - x9	s0 - s1	callee-saved registers
x10 - x17	a0 - a7	argument registers
x18 - x27	s2 - s11	callee-saved registers
x28 - x31	t3 - t6	temporary registers

If we turn this table around, and use the symbolic register names, then we can see some aspects of how the Owl-2820 calling convention works.

Registers	Calling Convention Usage	Who saves?
a0 - a1	argument registers, used to pass arguments from the caller to the callee, and to return values from the callee back to the caller	caller
a2 - a7	argument registers, used to pass arguments from the caller to the callee	caller
s0	frame pointer	callee
s1 - s11	callee-saved registers	callee
t0 - t6	temporary registers, not preserved by the callee, so must be preserved by the caller if used after the call	caller
ra	return address	caller

In particular, we can see that registers a0 through a7 are used to pass arguments from the caller to the called subroutine, and we can see that registers a0 and sometimes a1 are used to return values from the called subroutine back to the caller.

We can also see that register ra is used to hold the return address.
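The register tables above can be sketched in code as an enumeration of symbolic names plus a few helpers that classify each register by its role. The helper names IsArgument(), IsSaved(), and IsTemporary() are made up for this example and may not match anything in the Owl-2820 source.

```cpp
#include <cstdint>

// Symbolic register names and the x register numbers they stand for.
enum Reg : uint32_t
{
    zero = 0, ra = 1, sp = 2, gp = 3, tp = 4,
    t0 = 5, t1, t2,
    s0 = 8, s1,
    a0 = 10, a1, a2, a3, a4, a5, a6, a7,
    s2 = 18, s3, s4, s5, s6, s7, s8, s9, s10, s11,
    t3 = 28, t4, t5, t6
};

// True for a0 - a7, which carry arguments and return values.
constexpr bool IsArgument(Reg r)
{
    return r >= a0 && r <= a7;
}

// True for s0 - s11, whose values the callee must preserve.
constexpr bool IsSaved(Reg r)
{
    return r == s0 || r == s1 || (r >= s2 && r <= s11);
}

// True for t0 - t6, which the callee is free to overwrite.
constexpr bool IsTemporary(Reg r)
{
    return (r >= t0 && r <= t2) || (r >= t3 && r <= t6);
}
```

Note that the saved and temporary registers are each split across two ranges of register numbers, which is why the helpers test two ranges rather than one.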

Example

Let’s look at that in practice, first by seeing what the generated code looks like for a subroutine and its caller, then by stepping through it one instruction at a time.

A subroutine

Let’s assume that we have a C++ compiler that compiles to Owl-2820 assembly language, and we have the following function, add3(), which takes three ints in parameters a, b, and c, adds them together, then returns the result. What assembly language code will be generated by the compiler when it compiles this code?

int add3(int a, int b, int c)
{
    return a + b + c;
}

For the compiler to generate the code for this function, it needs to know how the caller has passed a, b, and c to it so that it can use them. Likewise, it needs to know how to return the result to the caller.

It might generate something like this, using the calling convention that the first three arguments, corresponding to parameters a, b, and c, are passed to it in registers a0, a1, and a2 respectively, and that the result is returned in register a0.

add3:
        ; return a + b + c
        add     a0, a0, a1              ; a += b
        add     a0, a0, a2              ; a += c
        ret

Calling the subroutine

Now, let’s say that we want to call add3() then print the result by calling a function print_int(). Again, what assembly language code will be generated by the compiler when it compiles this code?

    int result = add3(1, 2, 3);
    print_int(result);

For the compiler to generate this code, it needs to know how to pass each of the arguments to the add3() function. It also needs to know how to get the return value from add3() and pass that return value to print_int().

It might generate something like this, again using the calling convention that the first three arguments to add3() will be passed in registers a0, a1, and a2, and that the result will be returned in register a0.

        ; result = add3(1, 2, 3)
        li      a0, 1               ; arg0 (a) = 1
        li      a1, 2               ; arg1 (b) = 2
        li      a2, 3               ; arg2 (c) = 3
        call    add3                ; result = add3(a, b, c)

It would then go on to generate the following code, knowing that the return value from add3() is in register a0, which will be passed as the first and only argument to print_int().

        ; print_int(result)
        call    print_int           ; print_int(result)

Stepping through

Let’s step through this in detail.

The compiler knows that add3() has parameters int a, int b, and int c, so it generates code to pass it three integer arguments. It places these arguments, 1, 2, and 3, into registers a0, a1, and a2 respectively.

        li      a0, 1               ; arg0 (a) = 1
        li      a1, 2               ; arg1 (b) = 2
        li      a2, 3               ; arg2 (c) = 3

It calls add3(). Let’s step into that.

        call    add3

The compiler knows that the first argument to add3() was passed in register a0 and that the second argument to add3() was passed in register a1. These correspond to parameters a and b so it adds them to get an intermediate result.

        add     a0, a0, a1              ; a += b

Now register a0 holds the result of a + b. The compiler knows that the third argument to add3() was passed in register a2. This corresponds to parameter c, so it adds that to the intermediate result in a0 to get a + b + c.

        add     a0, a0, a2              ; a += c

Then it returns to the caller. The return value is in register a0.

        ret

At this point a0 contains the final result of a + b + c, and a1 and a2 are unchanged from before the call.

Function add3() required three arguments, so these were passed to it in the first three argument registers, a0, a1, and a2. A function with two arguments would have had its arguments passed to it in registers a0 and a1, and a function with one argument, such as print_int(), will have its argument passed to it in register a0.

Rather conveniently, a0 already contains the result of calling add3(), so there’s nothing to do other than to simply call print_int(), which will take the value in a0 and print it to stdout.

        call    print_int           ; print_int(result)
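The walk-through above can be modelled in a few lines of C++ that operate directly on a register array. This is only a sketch: the Registers struct and the functions Add3() and CallAdd3() are hypothetical, and each assignment stands in for the assembly language instruction named in its comment.

```cpp
#include <cstdint>

// The argument registers used in this example.
enum Reg { a0 = 10, a1 = 11, a2 = 12 };

// Hypothetical register file: 32 registers, all initially zero.
struct Registers
{
    uint32_t x[32] = {};
};

// Models the body of add3: arguments arrive in a0, a1, and a2,
// and the result is left in a0.
void Add3(Registers& r)
{
    r.x[a0] = r.x[a0] + r.x[a1]; // add a0, a0, a1
    r.x[a0] = r.x[a0] + r.x[a2]; // add a0, a0, a2
}

// Models the caller: load the arguments, make the call, read the result.
uint32_t CallAdd3(Registers& r, uint32_t a, uint32_t b, uint32_t c)
{
    r.x[a0] = a; // li a0, <a>
    r.x[a1] = b; // li a1, <b>
    r.x[a2] = c; // li a2, <c>
    Add3(r);     // call add3
    return r.x[a0];
}
```

Neither function mentions the other's variable names. The only thing they share is the agreement about which registers hold the arguments and the result, and that agreement is the calling convention.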

Why have a calling convention?

In theory, calling conventions are not mandatory. However, if there were no calling conventions then there would be an almost unlimited number of ways of passing arguments to a subroutine and returning results, and an equally unlimited number of ways in which the caller and callee could decide how registers are preserved across a call and by whom.

This would make it quite difficult for a compiler to generate code, because it would have to know the calling convention for each and every subroutine, and wouldn’t be able to make any assumptions about which registers are preserved across a call.

In practice most architectures have no more than a handful of calling conventions.

Summary

In this post, we looked at how to implement call and ret for the Owl-2820 CPU, then we discovered that being able to call a subroutine and return from it led us to questions about how to pass parameters and return values.

We answered those questions when we learned how a calling convention describes the agreed relationship between a called subroutine and its caller.