refrences

2026-02-19 16:16:24 -05:00
parent 3ce2977f01
commit 2d43f1711c
90 changed files with 30482 additions and 1 deletions

@@ -0,0 +1,184 @@
# call versus branch-and-link
**Source:** https://muforth.dev/call-versus-branch-and-link/
---
### [What’s the big difference?](#whats-the-big-difference)
Years ago, while pondering the “nature” of threaded code, something struck me: in a threaded code system the *called* routine decides whether or not to push a return address onto the call stack.
This is in contrast to native code running on “traditional” architectures with “call” instructions. Call instructions *always* push the return address, even when it’s unnecessary. (It’s unnecessary when the code being called is a *leaf* routine, one that makes no further calls. But the caller doesn’t, and can’t, know this.)
RISC architects realized that always pushing the return address was inefficient (doing so creates unnecessary memory traffic) and, instead of call instructions, specified “branch and link” instructions.
What’s the difference?
Operationally, a call instruction does two things:
* pushes onto the stack the address of the following instruction
* jumps to the destination
A branch-and-link instruction (maybe it should be called link-and-branch?) also does two things:
* captures in a register (the “link” register) the address of the following instruction
* jumps to the destination
If the *called* code is a leaf routine, it doesn’t need to push the link register onto the stack. It is careful, however, to preserve its value; then, at exit, it returns to its caller by jumping to the address in the link register.
If the called code *will* be making calls to other code, on entry it pushes the link register (to preserve it), and on exit it *first* pops and restores it, and then returns to its caller as above, by jumping to the address in the register.
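The difference in memory traffic can be sketched with a toy model in Python. This is only an illustration under stated assumptions: `Machine`, `call_style`, and `branch_and_link_style` are hypothetical names, the "stack" is a Python list standing in for memory, and no real instruction set is being modeled.

```python
# Toy model, not a real ISA: every name here is hypothetical.
class Machine:
    def __init__(self):
        self.stack = []       # return stack, which lives in memory
        self.link = None      # the link register
        self.mem_writes = 0   # counts pushes, i.e. memory traffic

    def push(self, value):
        self.stack.append(value)
        self.mem_writes += 1

    def pop(self):
        return self.stack.pop()


def call_style(m, return_addr):
    """Traditional call: the instruction itself always pushes."""
    m.push(return_addr)       # done by the call instruction, leaf or not
    # ... the routine body would run here ...
    return m.pop()            # ret pops the address and jumps to it


def branch_and_link_style(m, return_addr, is_leaf):
    """Branch-and-link: the callee decides whether to spill the link register."""
    m.link = return_addr      # done by the branch-and-link instruction
    if not is_leaf:
        m.push(m.link)        # entry: preserve link before calling anything else
        m.link = "inner return address"   # an inner branch-and-link overwrites it
        m.link = m.pop()      # exit: pop and restore first
    return m.link             # return by jumping to the address in link


leaf_call = Machine()
call_style(leaf_call, 0x100)

leaf_bl = Machine()
branch_and_link_style(leaf_bl, 0x100, is_leaf=True)

print(leaf_call.mem_writes, leaf_bl.mem_writes)
```

In this model a leaf routine reached by a call costs one stack write, while the same leaf reached by branch-and-link costs none; a non-leaf routine costs one write either way.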
### [Not created equal](#not-created-equal)
There are two variations of branch-and-link:
* one that always uses the same link register (eg, ARM)
* one that allows specification of the link register (RISC-V, and probably others; I’m not a RISC expert. ;-)
The latter is much more powerful. If all your code does is “normal” calls and returns, the difference is unimportant. But there is a curious class of uses for which the difference is important: when using a branch-and-link to capture the following address for a purpose *other* than to return to it later.
Say whaaaaat? Why would you ever do this?
### [Creative misuse](#creative-misuse)
I mentioned in my discussion of [threaded code](/threaded-code/) that one can “misuse” a call instruction to capture the following address. This is sometimes useful when writing the “runtime” behaviors of Forth words. But using call in this way is inefficient: you capture the following address by pushing it, but then immediately pop it and do something with it.
Misusing branch-and-link instructions to do the same thing is much more efficient. The address is captured in a register, and the code moves on. No push and pop. (There *is* a jump involved, so perhaps a pipeline refill occurs.)
And if, for some reason, we want to do this twice, in immediate succession, we simply specify a different link register in each of the branch-and-links. (Hmm, and now we are perhaps doing *two* pipeline refills in quick succession...)
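A rough Python sketch of the two ways of capturing the following address, and of why a selectable link register matters when capturing twice in a row. The register names `w`, `x`, and `lr`, the addresses, and the `traffic` counter are all assumptions made for illustration; pipeline effects are not modeled.

```python
# Hypothetical register names and addresses, for illustration only.
def capture_with_call(stack, following_addr, traffic):
    """Misused call: the address goes to memory and comes straight back."""
    stack.append(following_addr)   # the call instruction pushes
    traffic["writes"] += 1
    addr = stack.pop()             # the target immediately pops it
    traffic["reads"] += 1
    return addr


def capture_with_jal(regs, link, following_addr):
    """Misused branch-and-link: the address just lands in a register."""
    regs[link] = following_addr


traffic = {"writes": 0, "reads": 0}
captured = capture_with_call([], 0x100, traffic)

# Selectable link register: two captures in a row, both survive.
selectable = {}
capture_with_jal(selectable, "w", 0x100)
capture_with_jal(selectable, "x", 0x200)

# Fixed link register: the second capture overwrites the first.
fixed = {}
capture_with_jal(fixed, "lr", 0x100)
capture_with_jal(fixed, "lr", 0x200)
```

With a single fixed link register, the first captured address would have to be copied elsewhere before the second branch-and-link; with a selectable one, no extra move is needed.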
But again, why?
In a non-threaded implementation of Forth’s create/does words, this is *precisely* what happens. I’ll explain how this works by first showing how it works in an ITC (indirect-threaded code) system, and then replacing some pointers with jump-and-link instructions.
We’ll look at how three kinds of Forth words are represented: colon words, variables, and “incrementers” (which are going to be defined via create/does).
Our example colon word was defined in Forth like this:
```
: 4* 2* 2* ;
```
and its compiled form looks like this:
```
4*
~~~~~~~~~~~
addr of NEST ( code field)
addr of 2* ( parameter field)
addr of 2*
addr of UNNEST
```
Our variable looks like this:
```
radix
~~~~~~~~~~~
addr of VARIABLE (code field)
2 (parameter field)
```
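To make the two layouts above concrete, here is a minimal ITC sketch in Python. It rests on several assumptions: memory is a flat Python list whose indices play the role of addresses, Python functions stand in for the machine-code routines that code fields point to, and the names `ITC`, `emit`, and `run` are hypothetical. The behavior of NEST, UNNEST, and VARIABLE shown here is the conventional one, not something taken from muforth's sources.

```python
class ITC:
    def __init__(self):
        self.mem = []     # flat cell memory; a list index is an "address"
        self.d = []       # data stack
        self.r = []       # return stack
        self.ip = None    # next cell to fetch in the current colon body
        self.w = None     # code field address of the word being executed

    def emit(self, *cells):
        """Append cells to memory and return the address of the first one."""
        addr = len(self.mem)
        self.mem.extend(cells)
        return addr

    def run(self, cfa):
        """Execute one word, looping through NEXT until it returns."""
        self.ip = None
        self.w = cfa
        self.mem[cfa](self)              # jump through the code field
        while self.ip is not None:       # NEXT
            self.w = self.mem[self.ip]
            self.ip += 1
            self.mem[self.w](self)


def nest(vm):                            # runtime of colon words
    vm.r.append(vm.ip)
    vm.ip = vm.w + 1                     # parameter field follows the code field

def unnest(vm):
    vm.ip = vm.r.pop()

def variable(vm):                        # runtime of variables: push the pfa
    vm.d.append(vm.w + 1)

def two_star(vm):                        # assumed primitive for 2*
    vm.d.append(vm.d.pop() * 2)


vm = ITC()
UNNEST = vm.emit(unnest)
TWO_STAR = vm.emit(two_star)
FOUR_STAR = vm.emit(nest, TWO_STAR, TWO_STAR, UNNEST)   # : 4*  2* 2* ;
RADIX = vm.emit(variable, 2)                            # variable holding 2

vm.d.append(3)
vm.run(FOUR_STAR)      # leaves 12 on the data stack
vm.run(RADIX)          # leaves the address of radix's parameter field
```

Note that it is the called word's code field (NEST versus VARIABLE versus a primitive) that decides whether anything is pushed onto the return stack.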
And our created word, the most complicated of our examples, was created by the following Forth code:
```
: incr create , does> @ + ;
4 incr cell+
```
To further complicate this example (which is really the *key* to my argument) we will assume a tethered cross-compiled Forth, which means that the words `create , does>` execute on the host machine, and the target contains only the *runtime* pieces. `cell+` and `incr` look like this:
```
cell+
+----------+
|   addr   |---+    ( code field)
+----------+   |
     4         |    ( parameter field)
               |
incr           |
~~~~~~~~~~~    |
jal DODOES  <--+
addr of @
addr of +
addr of UNNEST
```
```
DODOES: Push IP onto R stack
Push pfa onto D stack
Execute NEXT
```
Two notes about `incr`. It is *not* a normal Forth word with a code field and a parameter field. It is like the built-in “runtimes” for colon words or variables. In this case, unlike with VARIABLE or NEST, we want to express the runtime in *Forth* code rather than machine code. But runtimes have to *start* with machine code. And a `does>` runtime has to do two things: nest the execution of Forth (just as we would do if we were calling a colon word), and push the pfa of the created word (`cell+` in our example).
Since this behavior is common to all `does>` runtimes, we compile it *once* and `jal` to it.
Thus, `incr` *begins* with `jal DODOES`. This is the *one* place in an ITC system where *code* specific to an architecture has to be compiled into a word’s body. The Forth code in a parent (defining) word that specifies the execution behavior of child words (ie, the code following `does>`) has to be *prefixed* by machine code of some kind so that the code field in child words can point to it. Code fields *always* point to machine code.
Now for the point of this exercise. Let’s convert these three examples to “native” code, using jump-and-link instructions in the bodies of colon words instead of lists of pointers. Note that we use two different link registers: `w` in code fields, and `ra` in colon word bodies.
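The ITC `does>` mechanism can be sketched in Python as follows. This is a hedged illustration: memory is a Python list, Python functions stand in for machine code, and `ITC`, `emit`, `run`, `jal_dodoes`, and the `link` parameter are hypothetical names. In particular, the `jal DODOES` stub is modeled as a closure that remembers the address of the threaded code following it, which is what the link register would hold.

```python
class ITC:
    def __init__(self):
        self.mem, self.d, self.r = [], [], []
        self.ip = None    # next cell to fetch in the current threaded body
        self.w = None     # code field address of the word being executed

    def emit(self, *cells):
        addr = len(self.mem)
        self.mem.extend(cells)
        return addr

    def run(self, cfa):
        self.ip = None
        self.w = cfa
        self.mem[cfa](self)              # jump through the code field
        while self.ip is not None:       # NEXT
            self.w = self.mem[self.ip]
            self.ip += 1
            self.mem[self.w](self)


def unnest(vm):
    vm.ip = vm.r.pop()

def fetch(vm):                           # assumed primitive for @
    vm.d.append(vm.mem[vm.d.pop()])

def plus(vm):                            # assumed primitive for +
    vm.d.append(vm.d.pop() + vm.d.pop())

def dodoes(vm, link):
    vm.r.append(vm.ip)                   # Push IP onto R stack
    vm.d.append(vm.w + 1)                # Push pfa onto D stack
    vm.ip = link                         # Execute NEXT, in the code after the jal

def jal_dodoes(following_addr):
    """Stand-in for the machine-code prefix `jal DODOES`."""
    return lambda vm: dodoes(vm, link=following_addr)


vm = ITC()
FETCH, PLUS, UNNEST = vm.emit(fetch), vm.emit(plus), vm.emit(unnest)
incr_body = vm.emit(FETCH, PLUS, UNNEST)     # the Forth code after does>
INCR = jal_dodoes(incr_body)                 # machine-code prefix of incr
CELL_PLUS = vm.emit(INCR, 4)                 # 4 incr cell+

vm.d.append(10)
vm.run(CELL_PLUS)                            # 10 cell+  leaves 14
```

The code field of `cell+` holds machine code (the stub), never a pointer into threaded code directly, which matches the rule that code fields always point to machine code.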
```
4*
~~~~~~~~~~~
jal w NEST (code field)
jal ra 2* (parameter field)
jal ra 2*
jal ra UNNEST
```
NEST in this world changes slightly from the ITC version. Notice that the `jal w NEST` captures the pfa (which is the address of the first call to `2*`) in `w`. NEST now looks like this:
```
NEST: Push RA onto R stack
jr 0(w)
```
That `jr 0(w)` means “return to the body of the colon word and start executing the code there”.
Our variable looks like this:
```
radix
~~~~~~~~~~~
jal w VARIABLE (code field)
2 (parameter field)
```
Again, the `jal w VARIABLE` captures the pfa in `w`.
Since there is one more level of “nesting” in `does>` words, we need a *third* link register. Let’s use `x`.
```
cell+
~~~~~~~~~~~
jal w INCR (code field)
4 (parameter field)
```
```
INCR: jal x DODOES
jal ra @
jal ra +
jal ra UNNEST
```
```
DODOES: Push RA onto R stack
Push W (pfa) onto D stack
jr 0(x)
```
The `jal w INCR` in `cell+` captures the pfa in `w`. The `jal x DODOES` in INCR captures the address of the Forth code that will be executed after W is pushed onto D stack. DODOES pushes both stacks, and then “returns” to the body of INCR.
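The whole native scheme, with its three link registers, can be exercised in a small Python simulation. The assumptions are significant: memory is a Python list, each `jal` is a tuple `("jal", link, target)`, each machine-code routine (NEST, UNNEST, VARIABLE, DODOES, and the primitives) is a single Python function occupying one cell, and `ra = None` is used as a hypothetical "return to the outside world" marker that halts the loop. The bodies of `2*`, `@`, and `+` are assumed.

```python
class Native:
    def __init__(self):
        self.mem, self.d, self.r = [], [], []
        self.reg = {"w": None, "ra": None, "x": None}
        self.pc = None

    def emit(self, *cells):
        addr = len(self.mem)
        self.mem.extend(cells)
        return addr

    def run(self, entry):
        self.reg["ra"] = None             # None: returning here halts
        self.pc = entry
        while self.pc is not None:
            op = self.mem[self.pc]
            self.pc += 1
            if callable(op):
                op(self)                  # a machine-code routine; it sets pc
            else:
                _, link, target = op      # ("jal", link, target)
                self.reg[link] = self.pc  # capture the following address
                self.pc = target


def ret(m):                 m.pc = m.reg["ra"]
def nest(m):                m.r.append(m.reg["ra"]); m.pc = m.reg["w"]   # jr 0(w)
def unnest(m):              m.pc = m.r.pop()
def variable(m):            m.d.append(m.reg["w"]); ret(m)               # push pfa
def two_star(m):            m.d.append(m.d.pop() * 2); ret(m)
def fetch(m):               m.d.append(m.mem[m.d.pop()]); ret(m)
def plus(m):                m.d.append(m.d.pop() + m.d.pop()); ret(m)

def dodoes(m):
    m.r.append(m.reg["ra"])               # Push RA onto R stack
    m.d.append(m.reg["w"])                # Push W (pfa) onto D stack
    m.pc = m.reg["x"]                     # jr 0(x)


m = Native()
NEST, UNNEST, VARIABLE, DODOES = (m.emit(f) for f in (nest, unnest, variable, dodoes))
TWO_STAR, FETCH, PLUS = (m.emit(f) for f in (two_star, fetch, plus))

FOUR_STAR = m.emit(("jal", "w", NEST), ("jal", "ra", TWO_STAR),
                   ("jal", "ra", TWO_STAR), ("jal", "ra", UNNEST))
RADIX = m.emit(("jal", "w", VARIABLE), 2)
INCR = m.emit(("jal", "x", DODOES), ("jal", "ra", FETCH),
              ("jal", "ra", PLUS), ("jal", "ra", UNNEST))
CELL_PLUS = m.emit(("jal", "w", INCR), 4)

m.d.append(3);  m.run(FOUR_STAR)          # 3 4*      leaves 12
m.d.append(10); m.run(CELL_PLUS)          # 10 cell+  leaves 14
m.run(RADIX)                              # leaves the address of radix's 2
```

Running `cell+` executes three `jal`s back to back (`w`, then `x`, then `ra`), and none of them touches the return stack until DODOES decides to push.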
It’s convoluted, and it’s possibly inefficient (because we are executing *three* `jal` instructions in quick succession, and probably causing a series of pipeline refills), but it’s very little code, and it’s elegant in a twisted way. ;-)
---
Send feedback on this page (last edited 2017 May 01)