2026-02-20 21:19:59 -05:00
parent e630590065
commit b3984970a8
3 changed files with 126 additions and 13 deletions

@@ -0,0 +1,86 @@
# Advanced Source-Less Programming & JIT Architecture: A Hardcore Technical Study
This document contains a deep-dive technical extraction of the mechanics, JIT compiler optimizations, and paradigms presented by Timothy Lottes and Onat Türkçüoğlu. These notes go beyond high-level theory, detailing the x86-64 code-generation rules, state-tracking mechanisms, and memory layouts needed to implement a source-less Forth environment with minimal runtime overhead.
---
## 1. The Lottes "x68" Paradigm: Editor as the OS
Lottes's approach fundamentally transforms the editor into a live, dynamic linker and machine-code orchestrator.
### 1.1 The Lexical Grid and 32-Bit Instruction Granularity
In x68, the runtime contains *no parsing logic*. Code is a flat array of 32-bit tokens (4-bit tag, 28-bit payload).
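
As a minimal sketch of the token layout described above: the notes give only the field widths (4-bit tag, 28-bit payload), so placing the tag in the top four bits is an assumption made here for illustration, as are the helper names `pack_token` and `unpack_token`.

```python
TAG_BITS = 4
PAYLOAD_BITS = 28
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_token(tag: int, payload: int) -> int:
    """Pack a 4-bit tag and a 28-bit payload into one 32-bit token.

    Assumption: the tag occupies the high 4 bits. The source notes do not
    say which end of the word holds the tag.
    """
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 4 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 28 bits")
    return (tag << PAYLOAD_BITS) | payload


def unpack_token(token: int) -> tuple[int, int]:
    """Split a 32-bit token back into (tag, payload)."""
    return token >> PAYLOAD_BITS, token & PAYLOAD_MASK
```

Because both fields sit at fixed bit positions, the editor and the JIT can classify a token with a shift and a mask, which is what lets the runtime avoid parsing altogether.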
To make the x86-64 architecture fit this visual editor grid, Lottes forces all generated machine code onto 32-bit boundaries:
* **Instruction Padding:** Native instructions shorter than 4 bytes are padded.
    * *Example:* `RET` (`C3`) becomes `C3 90 90 90` (three `NOP`s).
    * *Example:* `MOV` or `ADD` can take ignored segment overrides (such as the `3E` DS prefix) or redundant `REX` prefixes to reach exactly 4 bytes.
* **Auto-Relinking:** The editor implicitly acts as a linker. Because every instruction occupies a whole number of 32-bit slots, the 32-bit RIP-relative offsets of `CALL` (`E8`) and `JMP` (`E9`) can be kept aligned to slot boundaries (for example, padding the 5-byte `CALL rel32` to 8 bytes leaves `rel32` filling the second slot). When the user inserts or deletes a token, the editor immediately recalculates and rewrites the raw relative offsets of all affected branch instructions.
* **Shorthand Assembly UI:** The editor can decode these 32-bit blocks and display human-readable macro-assembly, e.g., mapping `add rcx, qword ptr [rdx + 0x8]` to the visual string `h + at i08`.
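
The padding and relinking rules can be sketched roughly as follows. The helper names `pad_to_slot` and `call_rel32` are hypothetical, and trailing-`NOP` padding is used only because it is the simplest case named in the notes (the prefix-based padding is not modelled here).

```python
import struct

NOP = 0x90
SLOT = 4  # bytes per 32-bit editor cell


def pad_to_slot(insn: bytes) -> bytes:
    """Pad an instruction with trailing NOPs to the next multiple of 4 bytes."""
    pad = (-len(insn)) % SLOT
    return insn + bytes([NOP]) * pad


def call_rel32(call_addr: int, target_addr: int) -> bytes:
    """Encode CALL rel32 (opcode E8).

    The displacement is measured from the end of the 5-byte instruction,
    which is why the editor must recompute it whenever code moves.
    """
    rel = target_addr - (call_addr + 5)
    return b"\xE8" + struct.pack("<i", rel)


# Relinking: inserting one 4-byte token between caller and callee moves the
# target by 4, so the stored displacement must grow by 4 as well.
before = call_rel32(0x00, 0x10)
after = call_rel32(0x00, 0x14)
```

Since every insertion or deletion shifts addresses by a multiple of 4, the editor only has to add or subtract that shift from each displacement that crosses the edit point.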
### 1.2 ColorForth Semantic Tags & The State Machine
The 4-bit color tag dictates how the editor/JIT interprets the 28-bit payload:
* **White (Ignored):** Comments, formatting, or skipped words.
* **Yellow (Immediate Execution):**
    * If a number: Append it to the data stack *during edit/compile time*.
    * If a word: Look it up in the dictionary and execute its associated code *immediately*.
* **Red (Define):** Sets a word in the dictionary to point to the current compilation address (or TOS).
* **Green (Compile):**
    * If a number: Emits machine code to push that number (e.g., `mov rax, imm`).
    * If a word: Looks it up in the *Macro* dictionary; if found, calls it (code generation). Otherwise, looks it up in the *Forth* dictionary and emits a `CALL` to it.
* **Cyan/Blue (Defer Execution):** Looks up a word in the macro dictionary and appends a call to it. Used for macros that generate other macros.
* **Magenta (Variable/Pointer):** Sets the dictionary value to point to the *next source token* in memory.
* **The Transition Trigger:** A transition from Yellow (Execution) to Green (Compilation) causes the JIT to pop the current Top of Stack and emit a native machine-code instruction that pushes that value at run time, i.e., "turning a computed number back into a program".
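
The Yellow/Green rules and the transition trigger can be sketched as a small symbolic compiler. This is an illustration under assumptions, not the x68 implementation: colors are plain strings, emitted code is a list of `("lit", n)` / `("call", name)` tuples rather than machine code, the macro dictionary lookup for Green words is omitted, and White tokens are assumed not to affect the transition.

```python
def compile_tokens(tokens, immediate_words):
    """Process (color, value) tokens; return (emitted_code, compile_time_stack)."""
    stack = []   # edit/compile-time data stack
    code = []    # symbolic stand-in for emitted machine code
    prev = None  # color of the previous non-white token
    for color, value in tokens:
        if color == "white":
            continue  # comments and formatting are skipped
        # Transition trigger: Yellow -> Green turns the computed TOS
        # into an emitted literal push.
        if color == "green" and prev == "yellow" and stack:
            code.append(("lit", stack.pop()))
        if color == "yellow":
            if isinstance(value, int):
                stack.append(value)            # number: push at compile time
            else:
                immediate_words[value](stack)  # word: execute immediately
        elif color == "green":
            if isinstance(value, int):
                code.append(("lit", value))    # number: emit a push
            else:
                code.append(("call", value))   # word: emit a call
        prev = color
    return code, stack


words = {"*": lambda s: s.append(s.pop() * s.pop())}
program = [
    ("yellow", 2), ("yellow", 3), ("yellow", "*"),  # computed while compiling
    ("green", "dup"), ("green", 7),
]
emitted, leftover = compile_tokens(program, words)
```

Here the multiplication runs inside the compiler, and only its result reaches the emitted program as a literal.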
### 1.3 The 5-Byte Folded Interpreter
To eliminate the pipeline stalls (branch mispredictions) caused by funneling every dispatch through one shared `NEXT` routine in a threaded-code interpreter, Lottes suggests embedding a micro-interpreter at the *end of every word*:
1. **`LODSD` (1 byte, `AD`):** Loads the next 32-bit token from `[RSI]` (the instruction pointer) into `EAX`, zero-extending into `RAX`, and advances `RSI` by 4 (with the direction flag clear).
2. **Lookup (2 bytes):** Uses a highly optimized hash or direct mapping to translate the token payload to a memory address.
3. **Jump (2 bytes):** Performs an indirect jump (e.g., `JMP RAX`, `FF E0`).
*Result:* Every word ending has its own indirect-jump site, and therefore its own branch-predictor entry, reducing the average stall per dispatch from ~16 cycles to near 0 when the prediction is correct.
---
## 2. Onat's VAMP / KYRA: High-Performance Macro-Assembler
Onat's implementation provides a masterclass in eliminating the Forth data stack and leveraging x86-64 hardware registers optimally.
### 2.1 The 2-Register Stack & JIT State Tracking
Traditional Forth maintains a data stack in RAM, requiring constant memory loads/stores. Onat eliminates this:
* **The Stack is `RAX` and `RDX`.** No memory is used for parameter passing.
* **The 1-Bit JIT Optimizer:** The JIT compiler maintains a single bit of state: `is_rax_tos` (Is RAX currently the Top of Stack?).
* **Smart Compilation:**
    * If the user types a Cyan number (Immediate), the JIT checks `is_rax_tos`. If true, it emits `mov rax, imm`. If false, it emits `mov rdx, imm`.
    * Before compiling a `CALL`, the JIT knows which register the target function expects the TOS to be in. If the current JIT state does not match the target's expectation, it automatically emits the 3-byte `xchg rax, rdx` (`48 87 C2`) instruction *before* the call.
    * This makes operations like `SWAP` virtually free: they often just flip the compiler's internal `is_rax_tos` boolean without emitting any machine code.
* **Function Prologue/Epilogue:** Functions do not manually push to or pop from a return stack in memory; they rely on the native x86 `call` and `ret` instructions, with `RSP` used purely as a call stack.
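
A rough sketch of the one-bit state tracking for `SWAP` and `CALL` follows. The class name `TwoRegJit`, the `expects_rax_tos` parameter, the initial state (`RAX` as TOS), and the assumption that a call leaves the flag unchanged are all hypothetical; literal emission is left out.

```python
import struct

XCHG_RAX_RDX = bytes([0x48, 0x87, 0xC2])  # the 3-byte encoding cited above


class TwoRegJit:
    def __init__(self, base_addr: int = 0):
        self.base_addr = base_addr
        self.code = bytearray()
        self.is_rax_tos = True  # assumption: RAX starts as top of stack

    def swap(self) -> None:
        """SWAP costs no machine code: only the compiler's view changes."""
        self.is_rax_tos = not self.is_rax_tos

    def call(self, target_addr: int, expects_rax_tos: bool) -> None:
        """Emit CALL rel32, fixing up the register order first if needed."""
        if self.is_rax_tos != expects_rax_tos:
            self.code += XCHG_RAX_RDX
            self.is_rax_tos = expects_rax_tos
        call_addr = self.base_addr + len(self.code)
        rel = target_addr - (call_addr + 5)  # relative to end of the CALL
        self.code += b"\xE8" + struct.pack("<i", rel)


jit = TwoRegJit()
jit.swap()                                    # emits nothing
jit.call(0x100, expects_rax_tos=True)         # mismatch: xchg, then call
```

The exchange is paid for only at a call boundary where the two sides disagree; a pair of `SWAP`s in straight-line code cancels out without any emitted bytes.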
### 2.2 Global Preemptive Scatter (The "Tape Drive")
Because the data stack is limited to two items, deep context cannot be passed on the stack.
* **Global Single-Register Base:** A single x86 register (e.g., `R12` or `R15`) is dedicated globally as the base pointer for all application memory (giving "gigabytes of state").
* **Colors map to memory operations:**
    * **Green Tag (Read):** Emits `mov REG, [base_ptr + token_offset]`.
    * **Red Tag (Write):** Emits `mov [base_ptr + token_offset], REG`.
* **FFI (Foreign Function Interface):** To call complex OS APIs (like Vulkan's `VkImageCreateInfo`), VAMP does not use C-struct bindings. It manually calculates byte offsets from the global base, emits instructions to write the struct data inline, aligns `RSP` for the OS calling convention, and calls the dynamic library pointer.
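
The Green (read) and Red (write) operations can be sketched as byte emitters. This assumes `R12` as the global base and a 32-bit displacement for every access; the function names are hypothetical, and only `RAX` and `RDX` are supported because they form the two-register stack.

```python
import struct

REG_FIELD = {"rax": 0, "rdx": 2}  # ModRM.reg encodings


def _modrm_sib_r12_disp32(reg: str) -> bytes:
    # mod=10 (disp32), reg=<reg>, rm=100 (SIB follows); R12 as a base always
    # needs a SIB byte: scale=00, index=100 (none), base=100 (+REX.B = R12).
    modrm = 0x80 | (REG_FIELD[reg] << 3) | 0x04
    return bytes([modrm, 0x24])


def emit_load(reg: str, offset: int) -> bytes:
    """Green tag: mov REG, [r12 + offset]  (REX.W+B = 49, opcode 8B)."""
    return b"\x49\x8B" + _modrm_sib_r12_disp32(reg) + struct.pack("<i", offset)


def emit_store(reg: str, offset: int) -> bytes:
    """Red tag: mov [r12 + offset], REG  (REX.W+B = 49, opcode 89)."""
    return b"\x49\x89" + _modrm_sib_r12_disp32(reg) + struct.pack("<i", offset)
```

Each access is a fixed 8-byte instruction under these assumptions, so a token's payload maps directly onto the displacement field.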
### 2.3 Lexical Syntax and Color Semantics
Onat uses a 24-bit dictionary index + 8-bit color tag. The semantics map directly to JIT actions:
* **Magenta Pipe (`|`):** Defines the boundary of a function. The JIT encounters this, emits a `RET` (`C3`) to close the previous function, and records the current instruction pointer as the start address of the new function.
* **White (Call):** Emits a relative `CALL` to the target. (If jumping to a dynamic address already in a register, it optimizes to `JMP RAX`).
* **Yellow (Macro):** Executes the attached code *during JIT compilation*. Used for compiler directives, setting layouts, or emitting specialized instructions like `LOCK` prefixes.
* **Blue (Comment):** Ignored by the JIT entirely.
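
A minimal sketch of how the Magenta pipe could drive function boundaries. Several things here are assumptions for illustration: tokens are `(color, value)` pairs, the Magenta token's value is treated as the function name, a `RET` is emitted only when a function is actually open, and every non-comment token emits a single placeholder `NOP` in place of real code generation.

```python
RET = 0xC3
NOP = 0x90  # placeholder for whatever a real token would compile to


def compile_stream(tokens):
    """Return (code_bytes, {function_name: start_offset})."""
    code = bytearray()
    functions = {}
    open_function = False
    for color, value in tokens:
        if color == "magenta":
            if open_function:
                code.append(RET)          # close the previous function
            functions[value] = len(code)  # the new function starts here
            open_function = True
        elif color == "blue":
            continue                      # comments emit nothing
        else:
            code.append(NOP)              # stand-in for real codegen
    if open_function:
        code.append(RET)                  # close the final function
    return bytes(code), functions


stream = [
    ("magenta", "a"), ("white", "x"), ("blue", "note"),
    ("magenta", "b"), ("white", "y"), ("white", "z"),
]
code, functions = compile_stream(stream)
```

The dictionary built here is what a later White (call) token would consult to compute its relative `CALL` target.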
### 2.4 Control Flow without ASTs
VAMP abandons standard `IF/ELSE/THEN` parsing trees in favor of assembly-level basic blocks and lambdas.
* **Lambdas `{ }`:** Defining a lambda simply compiles the block of code elsewhere and leaves its executable memory address on the stack (`RAX` or `RDX`).
* **Conditionals via Global State:**
    1. A comparison (e.g., `>`) is executed.
    2. The result is written to a dedicated global variable (e.g., `condition` using a Red tag).
    3. The conditional jump word reads the `condition` variable, consumes the lambda's address from the stack, and emits `CMP condition, 0` followed by `JZ lambda_address`.
* **Basic Blocks `[ ]`:** These constrain the scope of assembly generation. If a conditional within a block passes, execution falls through. If it fails, it jumps to the nearest closing `]`.
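
The `CMP condition, 0` / `JZ lambda_address` pair from the conditional steps above can be sketched at the byte level. This assumes the `condition` variable is a 64-bit slot at `[r12 + condition_offset]` (with `R12` as the global base) and that the near form of `JZ` (`0F 84 rel32`) is always used; `emit_cond_jump` is a hypothetical name.

```python
import struct


def emit_cond_jump(code_addr: int, condition_offset: int, lambda_addr: int) -> bytes:
    """Emit `cmp qword ptr [r12 + condition_offset], 0` then `jz lambda_addr`.

    code_addr is the address at which the first emitted byte will live.
    """
    # 49 83 /7 ib with ModRM=BC (mod=10, reg=111, rm=100) and SIB=24 (base R12)
    cmp_insn = b"\x49\x83\xBC\x24" + struct.pack("<i", condition_offset) + b"\x00"
    jz_len = 6  # 0F 84 + rel32
    jz_end = code_addr + len(cmp_insn) + jz_len
    jz_insn = b"\x0F\x84" + struct.pack("<i", lambda_addr - jz_end)
    return cmp_insn + jz_insn
```

The lambda's address is consumed at compile time to compute the displacement, so nothing about the branch target remains on the run-time stack.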
### 2.5 Live Debugging via Instruction Injection
The most powerful UX feature of VAMP is its real-time data flow visualization.
* The editor tracks the user's cursor position.
* During JIT compilation, if the `compiler_instruction_ptr` equals the `editor_cursor_ptr`, the JIT injects a debug macro.
* This macro emits instructions to copy the current state of `RAX` and `RDX` (the entire data stack) into a global circular buffer.
* The UI reads this buffer and displays the runtime state of the program at the cursor's location, acting as an instant, near-zero-cost `printf` (the only added work is the injected register copies).
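
The injection mechanism can be simulated at a high level. In this sketch, compiled instructions are Python callables over a register dictionary rather than machine code, a bounded `deque` stands in for the global circular buffer, and the names `jit_with_probe` and `run` are hypothetical.

```python
from collections import deque


def jit_with_probe(ops, cursor_index, trace):
    """Compile ops in order, injecting a probe where the cursor sits."""
    compiled = []
    for i, op in enumerate(ops):
        if i == cursor_index:
            # Debug macro: snapshot the whole two-register data stack.
            compiled.append(lambda regs: trace.append((regs["rax"], regs["rdx"])))
        compiled.append(op)
    return compiled


def run(compiled, regs):
    for step in compiled:
        step(regs)
    return regs


trace = deque(maxlen=4)  # oldest samples are overwritten, like a ring buffer
ops = [
    lambda r: r.update(rax=2),
    lambda r: r.update(rdx=5),
    lambda r: r.update(rax=r["rax"] * r["rdx"]),
]
program = jit_with_probe(ops, cursor_index=2, trace=trace)
final = run(program, {"rax": 0, "rdx": 0})
```

Moving the cursor only changes where the probe lands on the next recompile; the program's own instructions are untouched.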