# Call versus branch-and-link

**Source:** https://muforth.dev/call-versus-branch-and-link/

---

### [What’s the big difference?](#whats-the-big-difference)

Years ago, while pondering the “nature” of threaded code something struck me: In a threaded code system the *called* routine decides whether or not to push a return address onto the call stack.

This is in contrast to native code running on “traditional” architectures with “call” instructions. Call instructions *always* push the return address, even when it’s unnecessary. (It’s unnecessary when the code being called is a *leaf* routine – one that makes no further calls. But the caller doesn’t – and can’t! – know this.)

RISC architects realized that always pushing the return address was inefficient – doing so creates unnecessary memory traffic – and instead of call instructions specified “branch and link” instructions.

What’s the difference?

Operationally, a call instruction does two things:

* pushes onto the stack the address of the following instruction
* jumps to the destination

A branch-and-link instruction – maybe it should be called link-and-branch? – also does two things:

* captures in a register (the “link” register) the address of the following instruction
* jumps to the destination

If the *called* code is a leaf routine, it doesn’t need to push the link register onto the stack. It is careful, however, to preserve its value; then, at exit, it returns to its caller by jumping to the address in the link register.

If the called code *will* be making calls to other code, on entry it pushes the link register (to preserve it), and on exit it *first* pops and restores it, and then returns to its caller as above, by jumping to the address in the register.
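The difference in memory traffic can be sketched with a toy model. This is Python rather than machine code, and every name in it (`Machine`, `call_style`, `branch_and_link_style`, `mem_accesses`) is made up for illustration. It only counts pushes and pops on a stack that stands in for memory; it does not model any real architecture.

```python
class Machine:
    """Hypothetical machine: a return stack in memory plus one link register."""

    def __init__(self):
        self.stack = []        # return stack (lives in memory)
        self.link = None       # link register
        self.mem_accesses = 0  # number of pushes and pops performed

    def push(self, value):
        self.stack.append(value)
        self.mem_accesses += 1

    def pop(self):
        self.mem_accesses += 1
        return self.stack.pop()


def call_style(m, return_addr):
    """call/return: the call always pushes, the return always pops."""
    m.push(return_addr)        # call: push address of following instruction
    # ... body of the called routine would run here ...
    return m.pop()             # return: pop the address and jump to it


def branch_and_link_style(m, return_addr, is_leaf):
    """branch-and-link: the callee decides whether to touch the stack."""
    m.link = return_addr       # capture following address in the link register
    if not is_leaf:
        m.push(m.link)         # non-leaf: preserve link before calling others
        m.link = "clobbered"   # a nested branch-and-link overwrites the register
        m.link = m.pop()       # on exit: pop and restore the link register
    return m.link              # return: jump to the address in the register
```

In this model a leaf routine costs two stack accesses when reached with `call_style` and none when reached with `branch_and_link_style(..., is_leaf=True)`; a non-leaf routine costs two either way.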
### [Not created equal](#not-created-equal)

There are two variations of branch-and-link:

* one that always uses the same link register (eg, ARM)
* one that allows specification of the link register (RISC-V, and probably others – I’m not a RISC expert. ;-)

The latter is much more powerful. If all your code does is “normal” calls and returns, the difference is unimportant. But there is a curious class of uses for which the difference is important: when using a branch-and-link to capture the following address for a purpose *other* than to return to it later.

Say whaaaaat? Why would you ever do this?

### [Creative misuse](#creative-misuse)

I mentioned in my discussion of [threaded code](/threaded-code/) that one can “misuse” a call instruction to capture the following address. This is sometimes useful when writing the “runtime” behaviors of Forth words. But using call in this way is inefficient: you capture the following address by pushing it, but then immediately pop it and do something with it.

Misusing branch-and-link instructions to do the same thing is much more efficient. The address is captured in a register, and the code moves on. No push and pop. (There *is* a jump involved, so perhaps a pipeline refill occurs.)

And if, for some reason, we want to do this twice, in immediate succession, we simply specify a different link register in each of the branch-and-links. (Hmm, and now we are perhaps doing *two* pipeline refills in quick succession...)

But again, why?

In a non-threaded implementation of Forth’s create/does words, this is *precisely* what happens.

I’ll explain how this works by first showing how it works in an ITC (indirect-threaded code) system, and then replacing some pointers with jump-and-link instructions.

We’ll look at how three kinds of Forth words are represented: colon words, variables, and “incrementers” (which are going to be defined via create/does).
Our example colon word was defined in Forth like this:

```
: 4* 2* 2* ;
```

and its compiled form looks like this:

```
4*
~~~~~~~~~~~
addr of NEST      ( code field)
addr of 2*        ( parameter field)
addr of 2*
addr of UNNEST
```

Our variable looks like this:

```
radix
~~~~~~~~~~~
addr of VARIABLE  (code field)
2                 (parameter field)
```

And our create’d word – the most complicated of our examples – was created by the following Forth code:

```
: incr create , does> @ + ;
4 incr cell+
```

To further complicate this example – which is really the *key* to my argument – we will assume a tethered cross-compiled Forth, which means that the words `create , does>` execute on the host machine, and the target contains only the *runtime* pieces.

`cell+` and `incr` look like this:

```
cell+
+----------+
|   addr   |---+   ( code field)
+----------+   |
     4         |   ( parameter field)
               |
incr           |
~~~~~~~~~~~    |
jal DODOES   <-+
addr of @
addr of +
addr of UNNEST
```

```
DODOES:
  Push IP onto R stack
  Push pfa onto D stack
  Execute NEXT
```

Two notes about `incr`. It is *not* a normal Forth word with a code field and a parameter field. It is like the built-in “runtimes” for colon words or variables. In this case – unlike with VARIABLE or NEST – we want to express the runtime in *Forth* code rather than machine code.

But runtimes have to *start* with machine code. And a `does>` runtime has to do two things: nest the execution of Forth (just as we would do if we were calling a colon word), and push the pfa of the create’d word (`cell+` in our example.) Since this behavior is common to all `does>` runtimes, we compile it *once* and `jal` to it.

Thus, `incr` *begins* with `jal DODOES`. This is the *one* place in an ITC system where *code* specific to an architecture has to be compiled into a word’s body.
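The ITC layouts above can be exercised with a small Python model. This is a sketch under loud assumptions, not the muforth implementation: memory is a Python list, “machine code” is a Python function stored in a cell, and the `link` argument stands in for the address that a `jal` located at that cell would capture. Storing `dodoes` directly in the first cell of `incr` stands in for `jal DODOES`. The names `VM`, `comma`, `run`, `execute` and `primitive` are hypothetical.

```python
class VM:
    def __init__(self):
        self.mem = []    # toy memory: one Python object per cell
        self.D = []      # data stack
        self.R = []      # return stack
        self.ip = None   # Forth instruction pointer (None means halted)

    def comma(self, cell):
        """Append one cell to memory and return its address."""
        self.mem.append(cell)
        return len(self.mem) - 1

    def execute(self, w):
        code_addr = self.mem[w]            # code field points to machine code
        self.mem[code_addr](self, w + 1, code_addr + 1)   # (vm, pfa, link)

    def run(self, cfa):
        """Execute one word; the loop below plays the role of NEXT."""
        self.ip = None
        self.execute(cfa)
        while self.ip is not None:
            w = self.mem[self.ip]
            self.ip += 1
            self.execute(w)


def nest(vm, pfa, link):      vm.R.append(vm.ip); vm.ip = pfa
def unnest(vm, pfa, link):    vm.ip = vm.R.pop()
def variable(vm, pfa, link):  vm.D.append(pfa)
def two_star(vm, pfa, link):  vm.D.append(vm.D.pop() * 2)
def fetch(vm, pfa, link):     vm.D.append(vm.mem[vm.D.pop()])
def plus(vm, pfa, link):      vm.D.append(vm.D.pop() + vm.D.pop())

def dodoes(vm, pfa, link):
    vm.R.append(vm.ip)        # Push IP onto R stack
    vm.D.append(pfa)          # Push pfa onto D stack
    vm.ip = link              # continue with the thread that follows the jal


def primitive(vm, fn):
    """Lay down machine code, then a code field pointing at it."""
    return vm.comma(vm.comma(fn))


vm = VM()
NEST, VARIABLE = vm.comma(nest), vm.comma(variable)
TWO_STAR, FETCH = primitive(vm, two_star), primitive(vm, fetch)
PLUS, UNNEST = primitive(vm, plus), primitive(vm, unnest)

FOUR_STAR = vm.comma(NEST)                 # : 4* 2* 2* ;
for cfa in (TWO_STAR, TWO_STAR, UNNEST):
    vm.comma(cfa)

RADIX = vm.comma(VARIABLE)                 # variable radix, holding 2
vm.comma(2)

INCR = vm.comma(dodoes)                    # stands in for `jal DODOES`
for cfa in (FETCH, PLUS, UNNEST):          # ... followed by  @ + ;
    vm.comma(cfa)

CELL_PLUS = vm.comma(INCR)                 # 4 incr cell+
vm.comma(4)
```

With 10 on the data stack, `vm.run(CELL_PLUS)` leaves 14: the code field of `cell+` leads to the first cell of `incr`, `dodoes` pushes the pfa (the cell holding 4), and the thread `@ + UNNEST` finishes the job.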
The Forth code in a parent (defining) word that specifies the execution behavior of child words – ie, the code following `does>` – has to be *prefixed* by machine code of some kind so that the code field in child words can point to it. Code fields *always* point to machine code.

Now for the point of this exercise. Let’s convert these three examples to “native” code, using jump-and-link instructions in the bodies of colon words instead of lists of pointers.

Note that we use two different link registers: `w` in code fields, and `ra` in colon word bodies.

```
4*
~~~~~~~~~~~
jal w NEST        (code field)
jal ra 2*         (parameter field)
jal ra 2*
jal ra UNNEST
```

NEST in this world changes slightly from the ITC version. Notice that the `jal w NEST` captures the pfa – which is the address of the first call to `2*` – in `w`. NEST now looks like this:

```
NEST:
  Push RA onto R stack
  jr 0(w)
```

That `jr 0(w)` means “return to the body of the colon word and start executing the code there”.

Our variable looks like this:

```
radix
~~~~~~~~~~~
jal w VARIABLE    (code field)
2                 (parameter field)
```

Again, the `jal w VARIABLE` captures the pfa in `w`.

Since there is one more level of “nesting” in `does>` words, we need a *third* link register. Let’s use `x`.

```
cell+
~~~~~~~~~~~
jal w INCR        (code field)
4                 (parameter field)
```

```
INCR:
  jal x DODOES
  jal ra @
  jal ra +
  jal ra UNNEST
```

```
DODOES:
  Push RA onto R stack
  Push W (pfa) onto D stack
  jr 0(x)
```

The `jal w INCR` in `cell+` captures the pfa in `w`. The `jal x DODOES` in INCR captures the address of the Forth code that will be executed after W is pushed onto D stack. DODOES pushes both stacks, and then “returns” to the body of INCR.

It’s convoluted, and it’s possibly inefficient – because we are executing *three* `jal` instructions in quick succession, and probably causing a series of pipeline refills – but it’s very little code, and it’s elegant in a twisted way.
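The native layouts can be traced with a toy simulator, again in Python and again only a sketch. The tuple-based instruction format, the `assemble` and `run` helpers, and the `HALT` label are invented for illustration. The bodies of `UNNEST`, `VARIABLE`, and the primitives `2*`, `@`, `+` are not given in the text; the versions here (pop `ra` and jump for UNNEST, `jr ra` to leave a leaf) are assumptions. NEST, DODOES, and the word layouts follow the listings above.

```python
PROGRAM = [
    "NEST:",     ("push_r", "ra"), ("jr", "w"),
    "UNNEST:",   ("pop_r", "ra"), ("jr", "ra"),            # assumed body
    "VARIABLE:", ("push_d", "w"), ("jr", "ra"),            # assumed body
    "DODOES:",   ("push_r", "ra"), ("push_d", "w"), ("jr", "x"),
    "2*:",       ("op", lambda d: d.append(d.pop() * 2)), ("jr", "ra"),
    "@:",        ("fetch",), ("jr", "ra"),
    "+:",        ("op", lambda d: d.append(d.pop() + d.pop())), ("jr", "ra"),
    "4*:",       ("jal", "w", "NEST"), ("jal", "ra", "2*"),
                 ("jal", "ra", "2*"), ("jal", "ra", "UNNEST"),
    "radix:",    ("jal", "w", "VARIABLE"), 2,
    "INCR:",     ("jal", "x", "DODOES"), ("jal", "ra", "@"),
                 ("jal", "ra", "+"), ("jal", "ra", "UNNEST"),
    "cell+:",    ("jal", "w", "INCR"), 4,
    "HALT:",     ("halt",),
]


def assemble(program):
    """Strings ending in ':' are labels; everything else occupies one cell."""
    labels, mem = {}, []
    for item in program:
        if isinstance(item, str):
            labels[item.rstrip(":")] = len(mem)
        else:
            mem.append(item)
    return labels, mem


LABELS, MEM = assemble(PROGRAM)


def run(word, data_stack):
    """Run `word` as if called with `jal ra word`; return the data stack."""
    reg = {"w": None, "x": None, "ra": LABELS["HALT"]}
    d, r = list(data_stack), []
    pc = LABELS[word]
    while True:
        ins = MEM[pc]
        op = ins[0]
        if op == "halt":
            return d
        if op == "jal":                  # capture following address, then jump
            reg[ins[1]] = pc + 1
            pc = LABELS[ins[2]]
        elif op == "jr":                 # jump to the address in a register
            pc = reg[ins[1]]
        else:
            if op == "push_r":
                r.append(reg[ins[1]])
            elif op == "pop_r":
                reg[ins[1]] = r.pop()
            elif op == "push_d":
                d.append(reg[ins[1]])
            elif op == "fetch":
                d.append(MEM[d.pop()])
            elif op == "op":
                ins[1](d)
            pc += 1
```

Calling `run("cell+", [10])` returns `[14]` in this model: `w` ends up holding the address of the cell containing 4, `x` the address of the `jal ra @` in INCR, and `ra` the caller’s return address, which DODOES saves on the R stack and UNNEST later restores.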
;-)

---

(last edited 2017 May 01)