# Call versus branch-and-link

**Source:** https://muforth.dev/call-versus-branch-and-link/


---

### [What’s the big difference?](#whats-the-big-difference)

Years ago, while pondering the “nature” of threaded code, something struck me: in a threaded code system the *called* routine decides whether or not to push a return address onto the call stack.

This is in contrast to native code running on “traditional” architectures with “call” instructions. Call instructions *always* push the return address, even when it’s unnecessary. (It’s unnecessary when the code being called is a *leaf* routine – one that makes no further calls. But the caller doesn’t – and can’t! – know this.)

RISC architects realized that always pushing the return address was inefficient – doing so creates unnecessary memory traffic – and instead of call instructions specified “branch and link” instructions.

What’s the difference?

Operationally, a call instruction does two things:

* pushes onto the stack the address of the following instruction
* jumps to the destination

A branch-and-link instruction – maybe it should be called link-and-branch? – also does two things:

* captures in a register (the “link” register) the address of the following instruction
* jumps to the destination

If the *called* code is a leaf routine, it doesn’t need to push the link register onto the stack. It is careful, however, to preserve its value; then, at exit, it returns to its caller by jumping to the address in the link register.

If the called code *will* be making calls to other code, on entry it pushes the link register (to preserve it), and on exit it *first* pops and restores it, and then returns to its caller as above, by jumping to the address in the register.
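The difference in memory traffic can be sketched with a toy model. This is a minimal Python sketch, not a real instruction set: the `Machine` class, the routine names, and the return addresses 101 and 201 are all made up for illustration. It runs the same two-level call chain (a non-leaf routine that calls a leaf) under both conventions and counts return-stack reads and writes.

```python
class Machine:
    """Toy machine that only counts return-stack memory traffic."""
    def __init__(self):
        self.stack = []     # return stack, lives in memory
        self.lr = None      # link register
        self.mem_ops = 0    # stack reads plus writes

    def push(self, value):
        self.stack.append(value)
        self.mem_ops += 1

    def pop(self):
        self.mem_ops += 1
        return self.stack.pop()

# Call/return style: the calling instruction always pushes.
def call(m, return_addr, routine):
    m.push(return_addr)     # call: push the following address, then jump
    routine(m)
    return m.pop()          # ret: pop the address and jump to it

def leaf_c(m):
    pass                    # makes no calls, yet a push and pop were paid for

def middle_c(m):
    call(m, 201, leaf_c)

# Branch-and-link style: the callee decides whether to push.
def branch_and_link(m, return_addr, routine):
    m.lr = return_addr      # capture the following address in a register
    return routine(m)       # the routine reports the address it returns to

def leaf_r(m):
    return m.lr             # leaf: just jump to the link register

def middle_r(m):
    m.push(m.lr)            # non-leaf: save the link register on entry
    branch_and_link(m, 201, leaf_r)
    m.lr = m.pop()          # restore it on exit
    return m.lr

cisc, risc = Machine(), Machine()
assert call(cisc, 101, middle_c) == 101
assert branch_and_link(risc, 101, middle_r) == 101
print(cisc.mem_ops, risc.mem_ops)   # prints "4 2"
```

Both versions return to the same address; the only saving in the branch-and-link version comes from the leaf routine, which never touches the stack.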

### [Not created equal](#not-created-equal)

There are two variations of branch-and-link:

* one that always uses the same link register (eg, ARM)
* one that allows specification of the link register (RISC-V, and probably others – I’m not a RISC expert. ;-)

The latter is much more powerful. If all your code does is “normal” calls and returns, the difference is unimportant. But there is a curious class of uses for which the difference is important: when using a branch-and-link to capture the following address for a purpose *other* than to return to it later.

Say whaaaaat? Why would you ever do this?

### [Creative misuse](#creative-misuse)

I mentioned in my discussion of [threaded code](/threaded-code/) that one can “misuse” a call instruction to capture the following address. This is sometimes useful when writing the “runtime” behaviors of Forth words. But using call in this way is inefficient: you capture the following address by pushing it, but then immediately pop it and do something with it.

Misusing branch-and-link instructions to do the same thing is much more efficient. The address is captured in a register, and the code moves on. No push and pop. (There *is* a jump involved, so perhaps a pipeline refill occurs.)

And if, for some reason, we want to do this twice, in immediate succession, we simply specify a different link register in each of the branch-and-links. (Hmm, and now we are perhaps doing *two* pipeline refills in quick succession...)

But again, why?

In a non-threaded implementation of Forth’s create/does words, this is *precisely* what happens. I’ll explain how this works by first showing how it works in an ITC (indirect-threaded code) system, and then replacing some pointers with jump-and-link instructions.

We’ll look at how three kinds of Forth words are represented: colon words, variables, and “incrementers” (which are going to be defined via create/does).

Our example colon word was defined in Forth like this:

```
  : 4*   2* 2* ;
```

and its compiled form looks like this:

```
      4*
  ~~~~~~~~~~~
  addr of NEST    ( code field)
  addr of 2*      ( parameter field)
  addr of 2*
  addr of UNNEST
```

Our variable looks like this:

```
     radix
  ~~~~~~~~~~~
  addr of VARIABLE    (code field)
  2                   (parameter field)
```

And our create’d word – the most complicated of our examples – was created by the following Forth code:

```
  : incr   create ,  does> @ + ;
  4 incr cell+
```

To further complicate this example – which is really the *key* to my argument – we will assume a tethered cross-compiled Forth, which means that the words `create , does>` execute on the host machine, and the target contains only the *runtime* pieces. `cell+` and `incr` look like this:

```
     cell+
  +----------+
  |   addr   |---+   ( code field)
  +----------+   |
  4              |   ( parameter field)
                 |
     incr        |
  ~~~~~~~~~~~    |
  jal DODOES   <-+
  addr of @
  addr of +
  addr of UNNEST
```

```
  DODOES:  Push IP onto R stack
           Push pfa onto D stack
           Set IP to the address captured by the jal
           Execute NEXT

Two notes about `incr`. It is *not* a normal Forth word with a code field and a parameter field. It is like the built-in “runtimes” for colon words or variables. In this case – unlike with VARIABLE or NEST – we want to express the runtime in *Forth* code rather than machine code. But runtimes have to *start* with machine code. And a `does>` runtime has to do two things: nest the execution of Forth (just as we would do if we were calling a colon word), and push the pfa of the create’d word (`cell+` in our example).

Since this behavior is common to all `does>` runtimes, we compile it *once* and `jal` to it.

Thus, `incr` *begins* with `jal DODOES`. This is the *one* place in an ITC system where *code* specific to an architecture has to be compiled into a word’s body. The Forth code in a parent (defining) word that specifies the execution behavior of child words – ie, the code following `does>` – has to be *prefixed* by machine code of some kind so that the code field in child words can point to it. Code fields *always* point to machine code.
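To make the ITC layout concrete, here is a rough Python model of the three examples. It is a sketch under assumptions, not muforth's implementation: the `VM` class and its method names are invented, memory cells hold either integers or Python callables standing in for machine code, and DODOES is assumed to load IP from the address that the `jal` captured (modeled as `vm.link`).

```python
class VM:
    """Toy ITC machine: cells are ints or callables (stand-ins for machine code)."""
    def __init__(self):
        self.mem, self.d, self.r = [], [], []   # memory, data stack, return stack
        self.ip = self.w = self.link = None

    def comma(self, cell):
        self.mem.append(cell)
        return len(self.mem) - 1                # address of the new cell

    def primitive(self, fn):
        cfa = self.comma(len(self.mem) + 1)     # code field points at next cell
        self.comma(fn)
        return cfa

    def word(self, code_addr, *body):
        cfa = self.comma(code_addr)             # code field
        for cell in body:
            self.comma(cell)                    # parameter field
        return cfa

    def run(self, cfa):
        self.ip = None
        while True:
            self.w = cfa
            self.mem[self.mem[cfa]](self)       # jump through the code field
            if self.ip is None:
                return
            cfa = self.mem[self.ip]             # NEXT: fetch the next cfa
            self.ip += 1

def nest(vm):     vm.r.append(vm.ip); vm.ip = vm.w + 1
def unnest(vm):   vm.ip = vm.r.pop()
def variable(vm): vm.d.append(vm.w + 1)         # push the pfa
def dodoes(vm):
    vm.r.append(vm.ip)                          # push IP onto R stack
    vm.d.append(vm.w + 1)                       # push pfa onto D stack
    vm.ip = vm.link                             # resume at the cell after the jal

vm = VM()
NEST, VARIABLE = vm.comma(nest), vm.comma(variable)
UNNEST = vm.primitive(unnest)
TWO_STAR = vm.primitive(lambda m: m.d.append(m.d.pop() * 2))
FETCH = vm.primitive(lambda m: m.d.append(m.mem[m.d.pop()]))
PLUS = vm.primitive(lambda m: m.d.append(m.d.pop() + m.d.pop()))

FOUR_STAR = vm.word(NEST, TWO_STAR, TWO_STAR, UNNEST)
RADIX = vm.word(VARIABLE, 2)

INCR = len(vm.mem)                              # address of the "jal DODOES" cell
def jal_dodoes(m):
    m.link = INCR + 1                           # capture the following address
    dodoes(m)
vm.comma(jal_dodoes); vm.comma(FETCH); vm.comma(PLUS); vm.comma(UNNEST)
CELL_PLUS = vm.word(INCR, 4)                    # code field points into incr

vm.d = [3];  vm.run(FOUR_STAR)                  # vm.d is now [12]
vm.d = [10]; vm.run(CELL_PLUS)                  # vm.d is now [14]
```

Note that the code field of `cell+` holds `INCR`, the address of a machine-code cell, which is the one spot where a callable sits in front of a list of pointers.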

Now for the point of this exercise. Let’s convert these three examples to “native” code, using jump-and-link instructions in the bodies of colon words instead of lists of pointers. Note that we use two different link registers: `w` in code fields, and `ra` in colon word bodies.

```
      4*
  ~~~~~~~~~~~
  jal  w NEST    (code field)
  jal ra 2*      (parameter field)
  jal ra 2*
  jal ra UNNEST
```

NEST in this world changes slightly from the ITC version. Notice that the `jal w NEST` captures the pfa – which is the address of the first call to `2*` – in `w`. NEST now looks like this:

```
  NEST:  Push RA onto R stack
         jr 0(w)
```

That `jr 0(w)` means “return to the body of the colon word and start executing the code there”.

Our variable looks like this:

```
     radix
  ~~~~~~~~~~~
  jal w VARIABLE    (code field)
  2                 (parameter field)
```

Again, the `jal w VARIABLE` captures the pfa in `w`.

Since there is one more level of “nesting” in `does>` words, we need a *third* link register. Let’s use `x`.

```
     cell+
  ~~~~~~~~~~~
  jal w INCR   (code field)
  4            (parameter field)
```

```
  INCR:  jal  x DODOES
         jal ra @
         jal ra +
         jal ra UNNEST
```

```
  DODOES:  Push RA onto R stack
           Push W (pfa) onto D stack
           jr 0(x)
```

The `jal w INCR` in `cell+` captures the pfa in `w`. The `jal x DODOES` in INCR captures the address of the Forth code that will be executed after W is pushed onto the D stack. DODOES pushes onto both stacks, and then “returns” to the body of INCR.
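The same three words can be traced in a small Python model of the “native” version. Again this is only a sketch with invented names (`CPU`, `define`, `jr`): a memory cell is either a `("jal", link, target)` tuple, a data value, or a Python callable standing in for a machine-code routine that ends by jumping somewhere. Primitives such as `2*` are assumed to be leaf routines that return through `ra`, and a top-level run starts with `ra` set to `None` so that the final return halts the loop.

```python
class CPU:
    """Toy machine with three link registers: w, x and ra."""
    def __init__(self):
        self.mem, self.d, self.r = [], [], []   # memory, data stack, return stack
        self.reg = {"w": None, "x": None, "ra": None}
        self.pc = None

    def emit(self, cell):
        self.mem.append(cell)
        return len(self.mem) - 1

    def run(self, addr):
        self.reg["ra"] = None                   # returning to None halts
        self.pc = addr
        while self.pc is not None:
            cell = self.mem[self.pc]
            if callable(cell):
                cell(self)                      # routine sets pc itself
            else:
                _, link, target = cell          # ("jal", link, target)
                self.reg[link] = self.pc + 1    # capture the following address
                self.pc = target

def jr(cpu, reg):  cpu.pc = cpu.reg[reg]

def nest(cpu):     cpu.r.append(cpu.reg["ra"]); jr(cpu, "w")
def unnest(cpu):   cpu.pc = cpu.r.pop()         # return to the saved RA
def variable(cpu): cpu.d.append(cpu.reg["w"]); jr(cpu, "ra")
def dodoes(cpu):
    cpu.r.append(cpu.reg["ra"])                 # push RA onto R stack
    cpu.d.append(cpu.reg["w"])                  # push W (pfa) onto D stack
    jr(cpu, "x")                                # "return" into the body of INCR
def two_star(cpu): cpu.d.append(cpu.d.pop() * 2); jr(cpu, "ra")
def fetch(cpu):    cpu.d.append(cpu.mem[cpu.d.pop()]); jr(cpu, "ra")
def plus(cpu):     cpu.d.append(cpu.d.pop() + cpu.d.pop()); jr(cpu, "ra")

def define(cpu, *cells):
    addr = len(cpu.mem)
    for cell in cells:
        cpu.emit(cell)
    return addr

cpu = CPU()
NEST, UNNEST, VARIABLE, DODOES = (cpu.emit(f) for f in (nest, unnest, variable, dodoes))
TWO_STAR, FETCH, PLUS = (cpu.emit(f) for f in (two_star, fetch, plus))

FOUR_STAR = define(cpu, ("jal", "w", NEST), ("jal", "ra", TWO_STAR),
                   ("jal", "ra", TWO_STAR), ("jal", "ra", UNNEST))
RADIX = define(cpu, ("jal", "w", VARIABLE), 2)
INCR = define(cpu, ("jal", "x", DODOES), ("jal", "ra", FETCH),
              ("jal", "ra", PLUS), ("jal", "ra", UNNEST))
CELL_PLUS = define(cpu, ("jal", "w", INCR), 4)

# A hypothetical colon word that calls both: cell+ then 4*.
DEMO = define(cpu, ("jal", "w", NEST), ("jal", "ra", CELL_PLUS),
              ("jal", "ra", FOUR_STAR), ("jal", "ra", UNNEST))

cpu.d = [1]
cpu.run(DEMO)                                   # cpu.d is now [20]
```

Running `DEMO` exercises the chain described above: `jal ra` into `cell+`, `jal w` into INCR, and `jal x` into DODOES each leave a different address in a different register, and none of them is disturbed before DODOES uses all three.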

It’s convoluted, and it’s possibly inefficient – because we are executing *three* `jal` instructions in quick succession, and probably causing a series of pipeline refills – but it’s very little code, and it’s elegant in a twisted way. ;-)

---

Last edited 2017 May 01