references

2026-02-19 16:16:24 -05:00
parent 3ce2977f01
commit 2d43f1711c
90 changed files with 30482 additions and 1 deletions
--- a/references/ForthNotes/call_versus_branch-and-link.md
+++ b/references/ForthNotes/call_versus_branch-and-link.md
@@ -0,0 +1,184 @@
+# call versus branch-and-link
+
+**Source:** https://muforth.dev/call-versus-branch-and-link/
+
+
+# Call versus branch-and-link
+
+---
+
+### [What’s the big difference?](#whats-the-big-difference)
+
+Years ago, while pondering the “nature” of threaded code something struck me: In a threaded code system the *called* routine decides whether or not to push a return address onto the call stack.
+
+This is in contrast to native code running on “traditional” architectures with “call” instructions. Call instructions *always* push the return address, even when it’s unnecessary. (It’s unnecessary when the code being called is a *leaf* routine – one that makes no further calls. But the caller doesn’t – and can’t! – know this.)
+
+RISC architects realized that always pushing the return address was inefficient – doing so creates unnecessary memory traffic – and instead of call instructions specified “branch and link” instructions.
+
+What’s the difference?
+
+Operationally, a call instruction does two things:
+
+* pushes onto the stack the address of the following instruction
+* jumps to the destination
+
+A branch-and-link instruction – maybe it should be called link-and-branch? – also does two things:
+
+* captures in a register (the “link” register) the address of the following instruction
+* jumps to the destination
+
+If the *called* code is a leaf routine, it doesn’t need to push the link register onto the stack. It is careful, however, to preserve its value; then, at exit, it returns to its caller by jumping to the address in the link register.
+
+If the called code *will* be making calls to other code, on entry it pushes the link register (to preserve it), and on exit it *first* pops and restores it, and then returns to its caller as above, by jumping to the address in the register.
+
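To put rough numbers on the difference, here is a minimal sketch in Python that counts stack accesses under the two conventions. The call graph and the function names are made up for illustration; the only rules modelled are the ones described above: a call instruction always pushes (and the matching return always pops), while with branch-and-link only a non-leaf routine pushes and pops the link register.

```python
# Hypothetical call graph: each routine maps to the routines it calls.
CALL_GRAPH = {
    "main": ["a", "b"],
    "a": ["leaf1", "leaf2"],
    "b": ["leaf1"],
    "leaf1": [],
    "leaf2": [],
}

def traffic_with_call(graph, routine):
    """Stack accesses below `routine` when every call pushes and every return pops."""
    total = 0
    for callee in graph[routine]:
        total += 2  # push by the call instruction, pop by the return
        total += traffic_with_call(graph, callee)
    return total

def traffic_with_branch_and_link(graph, routine):
    """Stack accesses below `routine` when only non-leaf callees save the link register."""
    total = 0
    for callee in graph[routine]:
        if graph[callee]:  # callee makes further calls: push link on entry, pop on exit
            total += 2
        total += traffic_with_branch_and_link(graph, callee)
    return total

CALL_TRAFFIC = traffic_with_call(CALL_GRAPH, "main")
LINK_TRAFFIC = traffic_with_branch_and_link(CALL_GRAPH, "main")
```

For this particular graph the call convention costs 10 stack accesses and branch-and-link costs 4, because three of the five calls land in leaf routines. Real savings depend entirely on how leaf-heavy the code is.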
+### [Not created equal](#not-created-equal)
+
+There are two variations of branch-and-link:
+
+* one that always uses the same link register (eg, ARM)
+* one that allows specification of the link register (RISC-V, and probably others – I’m not a RISC expert. ;-)
+
+The latter is much more powerful. If all your code does is “normal” calls and returns, the difference is unimportant. But there is a curious class of uses for which the difference is important: when using a branch-and-link to capture the following address for a purpose *other* than to return to it later.
+
+Say whaaaaat? Why would you ever do this?
+
+### [Creative misuse](#creative-misuse)
+
+I mentioned in my discussion of [threaded code](/threaded-code/) that one can “misuse” a call instruction to capture the following address. This is sometimes useful when writing the “runtime” behaviors of Forth words. But using call in this way is inefficient: you capture the following address by pushing it, but then immediately pop it and do something with it.
+
+Misusing branch-and-link instructions to do the same thing is much more efficient. The address is captured in a register, and the code moves on. No push and pop. (There *is* a jump involved, so perhaps a pipeline refill occurs.)
+
+And if, for some reason, we want to do this twice, in immediate succession, we simply specify a different link register in each of the branch-and-links. (Hmm, and now we are perhaps doing *two* pipeline refills in quick succession...)
+
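A small Python model may help show why the choice of link register matters here. The `jal` function, the register names, the addresses, and the 4-byte instruction size are all assumptions made for the sketch; it only models "store the following address in `rd`, then jump".

```python
INSTRUCTION_SIZE = 4  # assumed fixed-width instructions

def jal(regs, pc, rd, target):
    """Model a jump-and-link: rd receives the following address; returns the new pc."""
    regs[rd] = pc + INSTRUCTION_SIZE
    return target

def capture_twice(first_rd, second_rd):
    """Execute two jump-and-links in immediate succession (addresses are hypothetical)."""
    regs = {}
    pc = 0x100
    pc = jal(regs, pc, first_rd, 0x200)   # first capture
    pc = jal(regs, pc, second_rd, 0x300)  # second capture, executed at the first target
    return regs, pc
```

`capture_twice("w", "x")` leaves both captured addresses available (`0x104` in `w`, `0x204` in `x`), whereas `capture_twice("lr", "lr")`, which models a single fixed link register, keeps only the second one.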
+But again, why?
+
+In a non-threaded implementation of Forth’s create/does words, this is *precisely* what happens. I’ll explain how this works by first showing how it works in an ITC (indirect-threaded code) system, and then replacing some pointers with jump-and-link instructions.
+
+We’ll look at how three kinds of Forth words are represented: colon words, variables, and “incrementers” (which are going to be defined via create/does).
+
+Our example colon word was defined in Forth like this:
+
+```
+  : 4*   2* 2* ;
+```
+
+and its compiled form looks like this:
+
+```
+      4*
+  ~~~~~~~~~~~
+  addr of NEST    ( code field)
+  addr of 2*      ( parameter field)
+  addr of 2*
+  addr of UNNEST
+```
+
+Our variable looks like this:
+
+```
+     radix
+  ~~~~~~~~~~~
+  addr of VARIABLE    (code field)
+  2                   (parameter field)
+```
+
+And our create’d word – the most complicated of our examples – was created by the following Forth code:
+
+```
+  : incr   create ,  does> @ + ;
+  4 incr cell+
+```
+
+To further complicate this example – which is really the *key* to my argument – we will assume a tethered cross-compiled Forth, which means that the words `create , does>` execute on the host machine, and the target contains only the *runtime* pieces. `cell+` and `incr` look like this:
+
+```
+     cell+
+  +----------+
+  |   addr   |---+   ( code field)
+  +----------+   |
+  4              |   ( parameter field)
+                 |
+     incr        |
+  ~~~~~~~~~~~    |
+  jal DODOES   <-+
+  addr of @
+  addr of +
+  addr of UNNEST
+```
+
+```
+  DODOES:  Push IP onto R stack
+           Push pfa onto D stack
+           Execute NEXT
+```
+
+Two notes about `incr`. It is *not* a normal Forth word with a code field and a parameter field. It is like the built-in “runtimes” for colon words or variables. In this case – unlike with VARIABLE or NEST – we want to express the runtime in *Forth* code rather than machine code. But runtimes have to *start* with machine code. And a `does>` runtime has to do two things: nest the execution of Forth (just as we would do if we were calling a colon word), and push the pfa of the create’d word (`cell+` in our example.)
+
+Since this behavior is common to all `does>` runtimes, we compile it *once* and `jal` to it.
+
+Thus, `incr` *begins* with `jal DODOES`. This is the *one* place in an ITC system where *code* specific to an architecture has to be compiled into a word’s body. The Forth code in a parent (defining) word that specifies the execution behavior of child words – ie, the code following `does>` – has to be *prefixed* by machine code of some kind so that the code field in child words can point to it. Code fields *always* point to machine code.
+
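The ITC layouts above can be exercised with a toy simulator. This is a sketch under stated assumptions, not muforth's implementation: memory is a Python list, "machine code" is a Python function stored in a cell, the `jal DODOES` cell is the tuple `("jal", "DODOES")`, and the `ITC` class, `build`, the `code:` labels, the `boot` thread, and the `BYE` word are all invented here to make the example self-contained.

```python
class ITC:
    """Toy indirect-threaded machine; every name here is illustrative."""

    def __init__(self):
        self.mem, self.labels = [], {}
        self.D, self.R = [], []              # data stack and return stack
        self.ip = self.w = self.link = 0
        self.running = False

    def put(self, cell, label=None):
        if label is not None:
            self.labels[label] = len(self.mem)
        self.mem.append(cell)

    def code_word(self, name, fn):
        self.put(fn, "code:" + name)                 # the machine code itself
        self.put(self.labels["code:" + name], name)  # code field pointing at it

    def jump(self, addr):
        cell = self.mem[addr]
        if callable(cell):
            cell(self)                       # run "machine code"
        else:                                # ("jal", target): capture the following address
            self.link = addr + 1
            self.jump(self.labels[cell[1]])

    def run(self, word, stack):
        self.D, self.R = list(stack), []
        boot = self.labels["boot"]
        self.mem[boot] = self.labels[word]   # two-cell thread: word, then BYE
        self.ip, self.running = boot, True
        while self.running:                  # NEXT
            self.w = self.mem[self.ip]       # w = address of the code field
            self.ip += 1
            self.jump(self.mem[self.w])      # code fields always point to machine code
        return self.D

def nest(m):
    m.R.append(m.ip)
    m.ip = m.w + 1                           # start executing the parameter field

def unnest(m):
    m.ip = m.R.pop()

def variable(m):
    m.D.append(m.w + 1)                      # push the pfa

def dodoes(m):
    m.R.append(m.ip)                         # push IP onto R stack
    m.D.append(m.w + 1)                      # push pfa onto D stack
    m.ip = m.link                            # continue with the Forth code after the jal

def fetch(m):
    m.D.append(m.mem[m.D.pop()])

def plus(m):
    m.D.append(m.D.pop() + m.D.pop())

def two_star(m):
    m.D.append(m.D.pop() * 2)

def bye(m):
    m.running = False

def build():
    m = ITC()
    for name, fn in [("NEST", nest), ("VARIABLE", variable), ("DODOES", dodoes)]:
        m.put(fn, name)
    for name, fn in [("2*", two_star), ("@", fetch), ("+", plus),
                     ("UNNEST", unnest), ("BYE", bye)]:
        m.code_word(name, fn)
    a = m.labels
    m.put(a["NEST"], "4*")                   # code field, then parameter field
    for name in ("2*", "2*", "UNNEST"):
        m.put(a[name])
    m.put(a["VARIABLE"], "radix")
    m.put(2)
    m.put(("jal", "DODOES"), "incr")         # the one piece of machine code in a body
    for name in ("@", "+", "UNNEST"):
        m.put(a[name])
    m.put(a["incr"], "cell+")                # child's code field points into incr
    m.put(4)
    m.put(0, "boot")
    m.put(a["BYE"])
    return m
```

With this model, `build().run("4*", [3])` returns `[12]` and `build().run("cell+", [10])` returns `[14]`. Note how the code field of `cell+` holds the address of the `("jal", "DODOES")` cell rather than the address of a built-in runtime.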
+Now for the point of this exercise. Let’s convert these three examples to “native” code, using jump-and-link instructions in the bodies of colon words instead of lists of pointers. Note that we use two different link registers: `w` in code fields, and `ra` in colon word bodies.
+
+```
+      4*
+  ~~~~~~~~~~~
+  jal  w NEST    (code field)
+  jal ra 2*      (parameter field)
+  jal ra 2*
+  jal ra UNNEST
+```
+
+NEST in this world changes slightly from the ITC version. Notice that the `jal w NEST` captures the pfa – which is the address of the first call to `2*` – in `w`. NEST now looks like this:
+
+```
+  NEST:  Push RA onto R stack
+         jr 0(w)
+```
+
+That `jr 0(w)` means “return to the body of the colon word and start executing the code there”.
+
+Our variable looks like this:
+
+```
+     radix
+  ~~~~~~~~~~~
+  jal w VARIABLE    (code field)
+  2                 (parameter field)
+```
+
+Again, the `jal w VARIABLE` captures the pfa in `w`.
+
+Since there is one more level of “nesting” in `does>` words, we need a *third* link register. Let’s use `x`.
+
+```
+     cell+
+  ~~~~~~~~~~~
+  jal w INCR   (code field)
+  4            (parameter field)
+```
+
+```
+  INCR:  jal  x DODOES
+         jal ra @
+         jal ra +
+         jal ra UNNEST
+```
+
+```
+  DODOES:  Push RA onto R stack
+           Push W (pfa) onto D stack
+           jr 0(x)
+```
+
+The `jal w INCR` in `cell+` captures the pfa in `w`. The `jal x DODOES` in INCR captures the address of the Forth code that will be executed after W is pushed onto the D stack. DODOES pushes onto both stacks, and then “returns” to the body of INCR.
+
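The native layouts can be checked the same way with a toy machine that has three link registers. Again this is only a sketch: the `Native` class, `build`, the `HALT` sentinel, and the tuple encoding `("jal", rd, target)` are assumptions made for illustration, and the primitives (`2*`, `@`, `+`, VARIABLE) are assumed to return by jumping to `ra`, which the text above does not spell out.

```python
HALT = -1  # sentinel return address for the outermost caller (an assumption of this model)

class Native:
    """Toy machine with selectable link registers; all names are illustrative."""

    def __init__(self):
        self.mem, self.labels = [], {}
        self.reg = {"ra": HALT, "w": 0, "x": 0}
        self.D, self.R = [], []              # data stack and return stack
        self.pc = HALT

    def put(self, cell, label=None):
        if label is not None:
            self.labels[label] = len(self.mem)
        self.mem.append(cell)

    def run(self, word, stack):
        self.D, self.R = list(stack), []
        self.reg["ra"] = HALT                # as if the caller had done `jal ra word`
        self.pc = self.labels[word]
        while self.pc != HALT:
            cell = self.mem[self.pc]
            if callable(cell):
                cell(self)                   # primitive "machine code" sets pc itself
            else:
                _, rd, target = cell         # ("jal", rd, target)
                self.reg[rd] = self.pc + 1   # capture the following address in rd
                self.pc = self.labels[target]
        return self.D

def nest(m):
    m.R.append(m.reg["ra"])                  # push RA onto R stack
    m.pc = m.reg["w"]                        # jr 0(w): run the colon word's body

def unnest(m):
    m.reg["ra"] = m.R.pop()
    m.pc = m.reg["ra"]

def dodoes(m):
    m.R.append(m.reg["ra"])                  # push RA onto R stack
    m.D.append(m.reg["w"])                   # push W (pfa) onto D stack
    m.pc = m.reg["x"]                        # jr 0(x): "return" into the body of INCR

def variable(m):
    m.D.append(m.reg["w"])                   # w already holds the pfa
    m.pc = m.reg["ra"]

def fetch(m):
    m.D.append(m.mem[m.D.pop()])
    m.pc = m.reg["ra"]

def plus(m):
    m.D.append(m.D.pop() + m.D.pop())
    m.pc = m.reg["ra"]

def two_star(m):
    m.D.append(m.D.pop() * 2)
    m.pc = m.reg["ra"]

def build():
    m = Native()
    for name, fn in [("NEST", nest), ("UNNEST", unnest), ("DODOES", dodoes),
                     ("VARIABLE", variable), ("@", fetch), ("+", plus), ("2*", two_star)]:
        m.put(fn, name)
    m.put(("jal", "w", "NEST"), "4*")        # code field
    m.put(("jal", "ra", "2*"))               # parameter field
    m.put(("jal", "ra", "2*"))
    m.put(("jal", "ra", "UNNEST"))
    m.put(("jal", "w", "VARIABLE"), "radix")
    m.put(2)
    m.put(("jal", "w", "INCR"), "cell+")
    m.put(4)
    m.put(("jal", "x", "DODOES"), "INCR")
    m.put(("jal", "ra", "@"))
    m.put(("jal", "ra", "+"))
    m.put(("jal", "ra", "UNNEST"))
    return m
```

Running `build().run("cell+", [10])` returns `[14]` after executing the `jal w`, `jal x`, and `jal ra` instructions described above, and `build().run("4*", [3])` returns `[12]`. If `w`, `x`, and `ra` were collapsed into one register, DODOES would have lost either the pfa or the caller's return address by the time it ran.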
+It’s convoluted, and it’s possibly inefficient – because we are executing *three* `jal` instructions in quick succession, and probably causing a series of pipeline refills – but it’s very little code, and it’s elegant in a twisted way. ;-)
+
+---
+
+Last edited 2017 May 01