# call versus branch-and-link

**Source:** https://muforth.dev/call-versus-branch-and-link/

---

### What’s the big difference?

Years ago, while pondering the “nature” of threaded code, something struck me: in a threaded-code system, the *called* routine decides whether or not to push a return address onto the call stack.

This is in contrast to native code running on “traditional” architectures with “call” instructions. Call instructions *always* push the return address, even when it’s unnecessary. (It’s unnecessary when the code being called is a *leaf* routine – one that makes no further calls. But the caller doesn’t – and can’t! – know this.)

RISC architects realized that always pushing the return address was inefficient – doing so creates unnecessary memory traffic – and, instead of call instructions, specified “branch-and-link” instructions.

What’s the difference?

Operationally, a call instruction does two things:

* pushes onto the stack the address of the following instruction
* jumps to the destination
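The two steps can be modelled in a few lines of Python. This is a hypothetical toy machine, not any real ISA. Addresses count instruction slots, so "the following instruction" is `pc + 1`. `CallMachine`, `call` and `ret` are names invented for illustration. Note that `call` pays for a memory write even though the callee here is a leaf.

```python
class CallMachine:
    """Toy model (hypothetical): call always pushes onto a memory stack."""

    def __init__(self):
        self.stack = []          # the return stack, held in memory
        self.memory_writes = 0   # how many times call touched memory

    def call(self, pc, target):
        """Push the address of the following instruction, then jump."""
        self.stack.append(pc + 1)
        self.memory_writes += 1
        return target            # the new pc

    def ret(self):
        """Pop the return address and jump to it."""
        return self.stack.pop()


m = CallMachine()
pc = m.call(pc=100, target=500)  # the caller, at 100, calls a leaf at 500
# ...the leaf at 500 does its work and makes no further calls...
pc = m.ret()
print(pc, m.memory_writes)       # 101 1
```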

A branch-and-link instruction – maybe it should be called link-and-branch? – also does two things:

* captures in a register (the “link” register) the address of the following instruction
* jumps to the destination

If the *called* code is a leaf routine, it doesn’t need to push the link register onto the stack. It is careful, however, to preserve its value; then, at exit, it returns to its caller by jumping to the address in the link register.

If the called code *will* be making calls to other code, on entry it pushes the link register (to preserve it, since its own branch-and-links will overwrite it), and on exit it *first* pops and restores it, and then returns to its caller as above, by jumping to the address in the register.
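Here is the same kind of toy model for branch-and-link. It is again hypothetical: addresses are counted in instruction slots, and `LinkMachine`, `bl`, `push_lr`, `pop_lr`, `call_leaf` and `call_non_leaf` are invented names. A leaf returns through the link register without touching memory. A non-leaf saves and restores the link register around its own nested branch-and-link.

```python
class LinkMachine:
    """Toy model (hypothetical) of branch-and-link with one link register."""

    def __init__(self):
        self.lr = None           # the link register
        self.stack = []          # memory stack, used only when needed
        self.memory_ops = 0      # pushes plus pops

    def bl(self, pc, target):
        """Capture the following address in the link register, then jump."""
        self.lr = pc + 1
        return target

    def push_lr(self):
        self.stack.append(self.lr)
        self.memory_ops += 1

    def pop_lr(self):
        self.lr = self.stack.pop()
        self.memory_ops += 1

    def ret(self):
        """Return by jumping to the address in the link register."""
        return self.lr


def call_leaf():
    m = LinkMachine()
    m.bl(pc=100, target=500)     # caller at 100 -> leaf at 500
    back = m.ret()               # the leaf leaves lr alone and returns
    return back, m.memory_ops


def call_non_leaf():
    m = LinkMachine()
    m.bl(pc=100, target=600)     # caller at 100 -> non-leaf at 600
    m.push_lr()                  # entry: save lr, the next bl clobbers it
    m.bl(pc=601, target=500)     # nested call to the leaf; lr is now 602
    inner = m.ret()              # the leaf returns to 602
    m.pop_lr()                   # exit: first restore the caller's lr...
    back = m.ret()               # ...then return through it
    return inner, back, m.memory_ops


print(call_leaf())               # (101, 0)
print(call_non_leaf())           # (602, 101, 2)
```

Only the non-leaf pays for stack traffic, and the decision is made by the callee, just as in a threaded system.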

### Not created equal

There are two variations of branch-and-link:

* one that always uses the same link register (eg, ARM)
* one that allows specification of the link register (RISC-V, and probably others – I’m not a RISC expert. ;-)

The latter is much more powerful. If all your code does is “normal” calls and returns, the difference is unimportant. But there is a curious class of uses for which the difference is important: when using a branch-and-link to capture the following address for a purpose *other* than to return to it later.

Say whaaaaat? Why would you ever do this?

### Creative misuse

I mentioned in my discussion of [threaded code](https://muforth.dev/threaded-code/) that one can “misuse” a call instruction to capture the following address. This is sometimes useful when writing the “runtime” behaviors of Forth words. But using call in this way is inefficient: you capture the following address by pushing it, but then immediately pop it and do something with it.

Misusing branch-and-link instructions to do the same thing is much more efficient. The address is captured in a register, and the code moves on. No push and pop. (There *is* a jump involved, so perhaps a pipeline refill occurs.)
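A sketch of the cost difference, under the same toy assumptions. The helpers are hypothetical, addresses are in instruction slots, and the jump itself is ignored. The second element of each result counts memory operations. The misused call costs a push and a pop. The misused branch-and-link leaves the address in a register (`w` here, an arbitrary choice).

```python
def capture_with_call(pc):
    """Misused call: push the following address, then pop it straight off."""
    stack, memory_ops = [], 0
    stack.append(pc + 1)         # call pushes the "return" address...
    memory_ops += 1
    captured = stack.pop()       # ...and the called code pops it at once
    memory_ops += 1
    return captured, memory_ops


def capture_with_jal(pc, regs):
    """Misused branch-and-link: the address lands directly in register w."""
    regs["w"] = pc + 1           # no stack traffic at all
    return regs["w"], 0


print(capture_with_call(200))     # (201, 2)
print(capture_with_jal(200, {}))  # (201, 0)
```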

And if, for some reason, we want to do this twice, in immediate succession, we simply specify a different link register in each of the branch-and-links. (Hmm, and now we are perhaps doing *two* pipeline refills in quick succession...)
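The following toy sketch shows why a selectable link register matters when two captures happen back to back. The `jal` and `bl` helpers are hypothetical, and the register file is modelled as a dict. A fixed-register `bl` is just `jal` hardwired to one register, so the second capture overwrites the first. Naming a different register each time keeps both.

```python
def jal(regs, rd, pc, target):
    """RISC-V-style jump-and-link: write the following address into rd."""
    regs[rd] = pc + 1
    return target                # the new pc


def bl(regs, pc, target):
    """Fixed-register branch-and-link: always writes the one link register."""
    return jal(regs, "lr", pc, target)


fixed = {}
pc = bl(fixed, 300, target=400)  # first capture: lr = 301
pc = bl(fixed, pc, target=500)   # second capture: lr = 401, 301 is lost
print(fixed)                     # {'lr': 401}

flexible = {}
pc = jal(flexible, "w", 300, target=400)  # first capture goes to w
pc = jal(flexible, "x", pc, target=500)   # second goes to x; w is untouched
print(flexible)                  # {'w': 301, 'x': 401}
```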

But again, why?

In a non-threaded implementation of Forth’s create/does words, this is *precisely* what happens. I’ll explain by first showing how it works in an ITC (indirect-threaded code) system, and then replacing some pointers with jump-and-link instructions.

We’ll look at how three kinds of Forth words are represented: colon words, variables, and “incrementers” (which are going to be defined via create/does).

Our example colon word was defined in Forth like this:

```
: 4* 2* 2* ;
```

and its compiled form looks like this:

```
4*
~~~~~~~~~~~
addr of NEST      ( code field)
addr of 2*        ( parameter field)
addr of 2*
addr of UNNEST
```

Our variable looks like this:

```
radix
~~~~~~~~~~~
addr of VARIABLE  (code field)
2                 (parameter field)
```
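To make these ITC layouts concrete, here is a minimal Python sketch of an indirect-threaded inner interpreter. It is a model built on assumptions, not muforth's implementation:

* "machine code" is represented by Python functions stored directly in code fields;
* memory is a list of cells;
* `ITC`, `comma`, `run` and `TWO_STAR` are invented names.

It also illustrates the opening observation. Only NEST pushes onto the return stack, so it is the *called* word that decides.

```python
class ITC:
    """Toy indirect-threaded Forth (hypothetical): code fields hold functions."""

    def __init__(self):
        self.mem = []    # dictionary space, one list element per cell
        self.ds = []     # data stack
        self.rs = []     # return stack
        self.ip = None   # Forth instruction pointer
        self.w = None    # address of the code field being executed

    def comma(self, value):
        """Compile one cell (like Forth's ,) and return its address."""
        self.mem.append(value)
        return len(self.mem) - 1

    def run(self, cfa):
        """Execute the word whose code field is at cfa, then return."""
        self.ip = None               # a saved IP of None means "back to Python"
        self.w = cfa
        self.mem[cfa](self)          # jump through the code field
        while self.ip is not None:   # NEXT, repeated
            self.w = self.mem[self.ip]
            self.ip += 1
            self.mem[self.w](self)


# "Machine code" that code fields point to
def NEST(vm):         # colon-word runtime: the callee pushes IP itself
    vm.rs.append(vm.ip)
    vm.ip = vm.w + 1  # the pfa: first cell of the body

def UNNEST(vm):
    vm.ip = vm.rs.pop()

def VARIABLE(vm):     # variable runtime: push the pfa, no return address
    vm.ds.append(vm.w + 1)

def TWO_STAR(vm):     # a primitive: its code field points at its own code
    vm.ds.append(vm.ds.pop() * 2)


vm = ITC()
two_star = vm.comma(TWO_STAR)
unnest = vm.comma(UNNEST)

four_star = vm.comma(NEST)      # : 4* 2* 2* ;   code field
vm.comma(two_star)              # parameter field
vm.comma(two_star)
vm.comma(unnest)

radix = vm.comma(VARIABLE)      # radix          code field
vm.comma(2)                     # parameter field

vm.ds.append(5)
vm.run(four_star)
print(vm.ds)                    # [20]
vm.run(radix)
print(vm.mem[vm.ds.pop()])      # 2
```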

And our create’d word – the most complicated of our examples – was created by the following Forth code:

```
: incr create , does> @ + ;
4 incr cell+
```
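Conceptually, `create ... does>` pairs per-child data (the parameter field) with behavior shared by every child. The Python sketch below is an analogy only, and the names `incr_runtime` and `execute` are hypothetical. The child is a pair of (shared runtime, parameter field). The runtime receives the child's pfa, which is what a Forth `does>` runtime finds on the data stack.

```python
def incr_runtime(pfa, stack):
    """The code after does>:  @ +   ( x -- x+n ), given the child's pfa."""
    n = pfa[0]                       # @  fetch the stored increment
    stack.append(stack.pop() + n)    # +  add it to the value on the stack


def incr(n):
    """The defining word: 'create ,' builds a child whose cell holds n."""
    pfa = [n]                        # the child's own parameter field
    return (incr_runtime, pfa)       # "code field" points at shared code


def execute(word, stack):
    runtime, pfa = word
    runtime(pfa, stack)              # the shared runtime gets this child's pfa


cell_plus = incr(4)                  # 4 incr cell+
stack = [100]
execute(cell_plus, stack)
print(stack)                         # [104]
```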

To further complicate this example – which is really the *key* to my argument – we will assume a tethered cross-compiled Forth, which means that the words `create`, `,` and `does>` execute on the host machine, and the target contains only the *runtime* pieces. `cell+` and `incr` look like this:

```
cell+
+----------+
|   addr   |---+   ( code field)
+----------+   |
     4         |   ( parameter field)
               |
incr           |
~~~~~~~~~~~    |
jal DODOES   <-+
addr of @
addr of +
addr of UNNEST
```

```
DODOES: Push IP onto R stack
        Set IP from link register ( addr after jal DODOES)
        Push pfa onto D stack
        Execute NEXT
```

Two notes about `incr`. It is *not* a normal Forth word with a code field and a parameter field. It is like the built-in “runtimes” for colon words or variables. In this case – unlike with VARIABLE or NEST – we want to express the runtime in *Forth* code rather than machine code. But runtimes have to *start* with machine code. And a `does>` runtime has to do two things: nest the execution of Forth (just as we would do if we were calling a colon word), and push the pfa of the create’d word (`cell+` in our example).

Since this behavior is common to all `does>` runtimes, we compile it *once* and `jal` to it.

Thus, `incr` *begins* with `jal DODOES`. This is the *one* place in an ITC system where *code* specific to an architecture has to be compiled into a word’s body. The Forth code in a parent (defining) word that specifies the execution behavior of child words – ie, the code following `does>` – has to be *prefixed* by machine code of some kind so that the code field in child words can point to it. Code fields *always* point to machine code.
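The ITC scheme for `cell+` and `incr` can be run on a toy interpreter like the one sketched for `4*`. This sketch is hypothetical, with Python functions standing in for machine code.

* `jal_dodoes(after)` models the `jal DODOES` compiled at the start of `incr`. The "link" value it hands to DODOES is the address of the Forth code that follows it.
* DODOES pushes IP, sets IP to that address, and pushes the pfa. NEXT then continues with `@`.
* `cell+`'s code field holds the same function object as that first cell of `incr`. This stands in for "points to the machine code".

```python
class ITC:
    """Toy indirect-threaded Forth (hypothetical): code fields hold functions."""

    def __init__(self):
        self.mem, self.ds, self.rs = [], [], []
        self.ip = None   # Forth instruction pointer
        self.w = None    # address of the code field being executed

    def comma(self, value):
        self.mem.append(value)
        return len(self.mem) - 1

    def run(self, cfa):
        self.ip = None               # a saved IP of None means "back to Python"
        self.w = cfa
        self.mem[cfa](self)
        while self.ip is not None:   # NEXT, repeated
            self.w = self.mem[self.ip]
            self.ip += 1
            self.mem[self.w](self)


def UNNEST(vm):
    vm.ip = vm.rs.pop()

def FETCH(vm):                       # @
    vm.ds.append(vm.mem[vm.ds.pop()])

def PLUS(vm):                        # +
    vm.ds.append(vm.ds.pop() + vm.ds.pop())

def DODOES(vm, link):
    vm.rs.append(vm.ip)              # Push IP onto R stack
    vm.ip = link                     # IP <- address captured by the jal
    vm.ds.append(vm.w + 1)           # Push pfa onto D stack
    # returning lets run() carry on with NEXT

def jal_dodoes(after):
    """Models 'jal DODOES': the link value is the address of the next cell."""
    return lambda vm: DODOES(vm, link=after)


vm = ITC()
fetch = vm.comma(FETCH)
plus = vm.comma(PLUS)
unnest = vm.comma(UNNEST)

incr = len(vm.mem)                    # incr: the does> runtime
vm.comma(jal_dodoes(after=incr + 1))  # jal DODOES (machine code)
vm.comma(fetch)                       # addr of @
vm.comma(plus)                        # addr of +
vm.comma(unnest)                      # addr of UNNEST

cell_plus = vm.comma(vm.mem[incr])    # 4 incr cell+: code field -> incr
vm.comma(4)                           # parameter field

vm.ds.append(100)
vm.run(cell_plus)
print(vm.ds)                          # [104]
```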

Now for the point of this exercise. Let’s convert these three examples to “native” code, using jump-and-link instructions in the bodies of colon words instead of lists of pointers. Note that we use two different link registers: `w` in code fields, and `ra` in colon word bodies.

```
4*
~~~~~~~~~~~
jal w NEST       (code field)
jal ra 2*        (parameter field)
jal ra 2*
jal ra UNNEST
```

NEST in this world changes slightly from the ITC version. Notice that the `jal w NEST` captures the pfa – which is the address of the first call to `2*` – in `w`. NEST now looks like this:

```
NEST: Push RA onto R stack
      jr 0(w)
```

That `jr 0(w)` means “return to the body of the colon word and start executing the code there”.
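The native `4*` can be traced on a tiny simulated machine. Everything here is an assumption for illustration:

* instructions are Python tuples, and addresses count instruction slots;
* `("2*",)` is a made-up single instruction;
* UNNEST, which the text doesn't show in native form, is assumed to pop RA and `jr` through it.

```python
def run(mem, entry, ds):
    """Execute from entry until 'halt'; return the R stack."""
    regs = {"ra": None, "w": None}
    rstack = []
    pc = entry
    while True:
        op, *args = mem[pc]
        next_pc = pc + 1
        if op == "jal":                   # jal rd, target
            rd, target = args
            regs[rd] = pc + 1             # link: address of the next cell
            next_pc = target
        elif op == "jr":                  # jr 0(rs)
            next_pc = regs[args[0]]
        elif op == "push_ra":             # Push RA onto R stack
            rstack.append(regs["ra"])
        elif op == "pop_ra":
            regs["ra"] = rstack.pop()
        elif op == "2*":
            ds.append(ds.pop() * 2)
        elif op == "halt":
            return rstack
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc}")
        pc = next_pc


mem = []

def emit(*op):
    """Assemble one instruction and return its address."""
    mem.append(op)
    return len(mem) - 1

NEST = emit("push_ra")                # NEST: Push RA onto R stack
emit("jr", "w")                       #       jr 0(w)
UNNEST = emit("pop_ra")               # assumed: restore RA...
emit("jr", "ra")                      # ...and return through it
TWO_STAR = emit("2*")                 # a leaf: do the work, return via ra
emit("jr", "ra")

FOUR_STAR = emit("jal", "w", NEST)    # code field
emit("jal", "ra", TWO_STAR)           # parameter field
emit("jal", "ra", TWO_STAR)
emit("jal", "ra", UNNEST)

MAIN = emit("jal", "ra", FOUR_STAR)   # a caller: "execute 4*"
emit("halt")

ds = [3]
rstack = run(mem, MAIN, ds)
print(ds, rstack)                     # [12] []
```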

Our variable looks like this:

```
radix
~~~~~~~~~~~
jal w VARIABLE   (code field)
2                (parameter field)
```

Again, the `jal w VARIABLE` captures the pfa in `w`.

Since there is one more level of “nesting” in `does>` words, we need a *third* link register. Let’s use `x`.

```
cell+
~~~~~~~~~~~
jal w INCR       (code field)
4                (parameter field)
```

```
INCR: jal x DODOES
      jal ra @
      jal ra +
      jal ra UNNEST
```

```
DODOES: Push RA onto R stack
        Push W (pfa) onto D stack
        jr 0(x)
```

The `jal w INCR` in `cell+` captures the pfa in `w`. The `jal x DODOES` in INCR captures the address of the Forth code that will be executed after W is pushed onto the D stack. DODOES pushes onto both stacks, and then “returns” to the body of INCR.
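Putting the native pieces together on the same kind of toy machine shows all three link registers at work. As before, the encoding is hypothetical:

* VARIABLE, which the text doesn't show natively, is assumed to push W and return through `ra`;
* UNNEST is assumed to pop RA and return through it;
* data cells sit inline after each code field but are never executed.

```python
def run(mem, entry, ds):
    """Execute from entry until 'halt'; return the R stack."""
    regs = {"ra": None, "w": None, "x": None}
    rstack = []
    pc = entry
    while True:
        op, *args = mem[pc]
        next_pc = pc + 1
        if op == "jal":                   # jal rd, target
            rd, target = args
            regs[rd] = pc + 1             # link: address of the next cell
            next_pc = target
        elif op == "jr":                  # jr 0(rs)
            next_pc = regs[args[0]]
        elif op == "push_ra":             # Push RA onto R stack
            rstack.append(regs["ra"])
        elif op == "pop_ra":
            regs["ra"] = rstack.pop()
        elif op == "push_w":              # Push W (pfa) onto D stack
            ds.append(regs["w"])
        elif op == "fetch":               # @
            ds.append(mem[ds.pop()])
        elif op == "plus":                # +
            ds.append(ds.pop() + ds.pop())
        elif op == "halt":
            return rstack
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc}")
        pc = next_pc


mem = []

def emit(*op):
    mem.append(op)                        # an instruction, as a tuple
    return len(mem) - 1

def data(value):
    mem.append(value)                     # a data cell; never executed
    return len(mem) - 1

VARIABLE = emit("push_w")                 # assumed native VARIABLE
emit("jr", "ra")
DODOES = emit("push_ra")                  # Push RA onto R stack
emit("push_w")                            # Push W (pfa) onto D stack
emit("jr", "x")                           # jr 0(x)
UNNEST = emit("pop_ra")                   # assumed native UNNEST
emit("jr", "ra")
FETCH = emit("fetch")
emit("jr", "ra")
PLUS = emit("plus")
emit("jr", "ra")

INCR = emit("jal", "x", DODOES)           # INCR: jal x DODOES
emit("jal", "ra", FETCH)                  #       jal ra @
emit("jal", "ra", PLUS)                   #       jal ra +
emit("jal", "ra", UNNEST)                 #       jal ra UNNEST

CELL_PLUS = emit("jal", "w", INCR)        # cell+: code field
data(4)                                   #        parameter field
RADIX = emit("jal", "w", VARIABLE)        # radix: code field
data(2)                                   #        parameter field

MAIN = emit("jal", "ra", CELL_PLUS)       # jal #1 of the three: ra, w, x
emit("jal", "ra", RADIX)
emit("halt")

ds = [100]
rstack = run(mem, MAIN, ds)
print(ds[0], mem[ds[1]], rstack)          # 104 2 []
```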

It’s convoluted, and it’s possibly inefficient – because we are executing *three* `jal` instructions in quick succession, and probably causing a series of pipeline refills – but it’s very little code, and it’s elegant in a twisted way. ;-)

---

Page last edited 2017 May 01.