forth_bootslop/references/Video_Breakdowns.md
2026-02-20 21:19:59 -05:00

Advanced Source-Less Programming & JIT Architecture: A Hardcore Technical Study

This document is a deep-dive technical extraction of the mechanics, JIT compiler optimizations, and paradigms presented by Timothy Lottes and Onat Türkçüoğlu. These notes go beyond high-level theory, detailing the x86-64 assembly generation rules, state-tracking mechanisms, and memory layouts required to implement a zero-overhead, source-less Forth environment.


1. The Lottes "x68" Paradigm: Editor as the OS

Lottes's approach fundamentally transforms the editor into a live, dynamic linker and machine-code orchestrator.

1.1 The Lexical Grid and 32-Bit Instruction Granularity

In x68, the runtime contains no parsing logic. Code is a flat array of 32-bit tokens (4-bit tag, 28-bit payload). To make the x86-64 architecture fit this visual editor grid, Lottes forces all generated machine code to 32-bit boundaries:

  • Instruction Padding: Native instructions that are smaller than 4 bytes are padded.
    • Example: RET (0xC3) becomes C3 90 90 90 (using three NOPs).
    • Example: MOV or ADD can use ignored segment overrides (like the 3E DS prefix) or unnecessary REX prefixes to reach exactly 4 bytes.
  • Auto-Relinking: The editor implicitly acts as a linker. Because every instruction is 32 bits, 32-bit RIP-relative offsets for CALL (E8) and JMP (E9) are perfectly aligned. When the user inserts or deletes a token in the editor, the editor instantly recalculates and updates the raw binary relative offsets for all branch instructions.
  • Shorthand Assembly UI: The editor can decode these 32-bit blocks and display human-readable macro-assembly, e.g., mapping add rcx, qword ptr [rdx + 0x8] to the visual string h + at i08.
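
The token layout and padding rule above can be sketched in a few lines of Python. This is only an illustration: the helper names (`pack_token`, `unpack_token`, `pad_to_cell`) are hypothetical, and placing the tag in the top 4 bits is an assumption, since the notes do not say which end of the word holds it.

```python
NOP = 0x90  # single-byte x86 NOP

TAG_BITS = 4
PAYLOAD_BITS = 28
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_token(tag, payload):
    """Pack a 4-bit tag and a 28-bit payload into one 32-bit token.

    Assumption: the tag occupies the top 4 bits of the word.
    """
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 4 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 28 bits")
    return (tag << PAYLOAD_BITS) | payload


def unpack_token(token):
    """Split a 32-bit token back into (tag, payload)."""
    return token >> PAYLOAD_BITS, token & PAYLOAD_MASK


def pad_to_cell(code, cell=4):
    """Append NOPs until the instruction length is a multiple of `cell` bytes."""
    shortfall = -len(code) % cell
    return code + bytes([NOP]) * shortfall
```

For example, `pad_to_cell(b"\xC3")` yields the bytes `C3 90 90 90`, matching the RET example above. Padding with prefixes instead of NOPs, as described for MOV and ADD, would need per-instruction knowledge that this sketch leaves out.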

1.2 ColorForth Semantic Tags & The State Machine

The 4-bit color tag dictates how the editor/JIT interprets the 28-bit payload:

  • White (Ignored): Comments, formatting, or skipped words.
  • Yellow (Immediate Execution):
    • If a number: Append it to the data stack during edit/compile time.
    • If a word: Look it up in the dictionary and execute its associated code immediately.
  • Red (Define): Sets a word in the dictionary to point to the current compilation address (or TOS).
  • Green (Compile):
    • If a number: Emits machine code to push that number (e.g., mov rax, imm).
    • If a word: Looks it up in the Macro dictionary; if found, calls it (code generation). Otherwise, looks it up in the Forth dictionary and emits a CALL to it.
  • Cyan/Blue (Defer Execution): Looks up a word in the macro dictionary and appends a call to it. Used for macros that generate other macros.
  • Magenta (Variable/Pointer): Sets the dictionary value to point to the next source token in memory.
  • The Transition Trigger: A transition from Yellow (Execution) to Green (Compilation) causes the JIT to pop the current Top of Stack and emit a native machine-code instruction to push that value. (i.e., "Turning a computed number back into a program").
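
A minimal sketch of the Yellow and Green rules, including the transition trigger, might look like the following. It emits symbolic strings rather than machine code, handles only numbers plus one hypothetical immediate word (`+`), and the names `compile_tokens` and `IMMEDIATE_WORDS` are illustrative, not taken from the source.

```python
YELLOW, GREEN = "yellow", "green"

# Hypothetical immediate words, executed at edit/compile time.
IMMEDIATE_WORDS = {"+": lambda s: s.append(s.pop() + s.pop())}


def compile_tokens(tokens):
    """tokens is a list of (tag, value) pairs; returns (emitted, stack)."""
    stack = []      # edit-time data stack
    emitted = []    # symbolic stand-in for generated machine code
    prev_tag = None
    for tag, value in tokens:
        # Transition trigger: Yellow -> Green turns the computed
        # top-of-stack value into a literal push in the program.
        if prev_tag == YELLOW and tag == GREEN and stack:
            emitted.append(f"push-literal {stack.pop()}")
        if tag == YELLOW:
            if isinstance(value, int):
                stack.append(value)           # number: push at edit time
            else:
                IMMEDIATE_WORDS[value](stack)  # word: execute immediately
        elif tag == GREEN:
            emitted.append(f"push-literal {value}")  # number: compile a push
        prev_tag = tag
    return emitted, stack
```

Feeding it Yellow `2`, Yellow `3`, Yellow `+`, then Green `10` leaves the stack empty and emits a push of `5` followed by a push of `10`: the sum was computed at edit time and folded into the program.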

1.3 The 5-Byte Folded Interpreter

To eliminate the pipeline stalls caused by branch misprediction at the single, shared NEXT dispatch jump of a standard threaded-code interpreter, Lottes suggests embedding a micro-interpreter at the end of every word:

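  1. LODSD (1 byte, or 2 bytes with a redundant REX prefix): Loads the next 32-bit token from the address in RSI (the interpreter's instruction pointer) into EAX, zero-extending into RAX, and advances RSI by 4 (assuming the direction flag is clear).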
  2. Lookup (2 bytes): Uses a highly optimized hash or direct mapping to translate the token payload to a memory address.
  3. Jump (2 bytes): Emits an indirect jump (e.g., JMP RAX).

Result: Every word transition has its own dedicated branch predictor slot in the CPU hardware, reducing average clock stalls from ~16 to near 0.
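
The effect of giving each word its own dispatch jump can be illustrated with a toy model. The sketch below assumes a deliberately simple predictor that remembers only the last target seen at each jump site; real predictors use branch history and behave better on shared dispatch than this model suggests. The function names are hypothetical.

```python
def count_mispredicts(trace):
    """trace is a list of (jump_site, target) pairs.

    Toy predictor: each site predicts the target it jumped to last time.
    """
    last_target = {}
    misses = 0
    for site, target in trace:
        if last_target.get(site) != target:
            misses += 1
        last_target[site] = target
    return misses


def dispatch_trace(program, repeats, folded):
    """Build the dispatch trace for running `program` in a loop.

    folded=False: every transition goes through one shared NEXT site.
    folded=True:  each word ends in its own indirect jump.
    """
    seq = program * repeats
    trace = []
    for current, nxt in zip(seq, seq[1:]):
        site = current if folded else "NEXT"
        trace.append((site, nxt))
    return trace
```

Running the three-word loop `["A", "B", "C"]` 100 times gives 299 transitions. Under this model the shared site mispredicts all 299, while the folded version mispredicts only the first jump from each of the three words.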

2. Onat's VAMP / KYRA: High-Performance Macro-Assembler

Onat's implementation provides a masterclass in eliminating the Forth data stack and leveraging x86-64 hardware registers optimally.

2.1 The 2-Register Stack & JIT State Tracking

Traditional Forth maintains a data stack in RAM, requiring constant memory loads/stores. Onat eliminates this:

  • The Stack is RAX and RDX. No memory is used for parameter passing.
  • The 1-Bit JIT Optimizer: The JIT compiler maintains a single bit of state: is_rax_tos (Is RAX currently the Top of Stack?).
  • Smart Compilation:
    • If the user types a Cyan number (Immediate), the JIT checks is_rax_tos. If true, it emits mov rax, imm. If false, it emits mov rdx, imm.
    • Before compiling a CALL, the JIT knows which register the target function expects the TOS to be in. If the current JIT state mismatches the target's expectation, it automatically emits the 3-byte xchg rax, rdx (48 87 C2) instruction before the call.
    • This makes operations like SWAP virtually free: they often just flip the compiler's internal is_rax_tos boolean without emitting any machine code.
  • Function Prologue/Epilogue: Functions do not manually push to or pop from a return stack in memory; they rely on the native x86 call and ret instructions, with RSP serving purely as the call stack.
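
The one-bit state tracking can be sketched as follows. The class name `TwoRegJit`, the initial state (RAX as top of stack), and the choice to leave the bit at the callee's expected value after a call are all assumptions for illustration; only the `48 87 C2` encoding and the SWAP behaviour come from the notes above.

```python
XCHG_RAX_RDX = bytes.fromhex("4887C2")  # encoding quoted in the notes above
CALL_REL32 = 0xE8


class TwoRegJit:
    def __init__(self):
        self.code = bytearray()
        self.is_rax_tos = True  # assumption: RAX starts as top of stack

    def swap(self):
        """SWAP emits nothing; it only flips the compile-time bit."""
        self.is_rax_tos = not self.is_rax_tos

    def call(self, rel32, expects_rax_tos=True):
        """Emit CALL rel32, reconciling the registers first if needed."""
        if self.is_rax_tos != expects_rax_tos:
            self.code += XCHG_RAX_RDX
            self.is_rax_tos = expects_rax_tos
        self.code.append(CALL_REL32)
        self.code += rel32.to_bytes(4, "little", signed=True)
```

In this sketch a SWAP followed by a call into a function expecting RAX on top costs one `xchg`, while two SWAPs in a row cancel out and cost nothing.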

2.2 Global Preemptive Scatter (The "Tape Drive")

Because the data stack is limited to two items, deep context cannot be passed on the stack. State lives in memory addressed from a global base instead:

  • Global Single-Register Base: A single x86 register (e.g., R12 or R15) is dedicated globally as the base pointer for all application memory (giving "gigabytes of state").
  • Colors map to memory operations:
    • Green Tag (Read): Emits mov REG, [base_ptr + token_offset].
    • Red Tag (Write): Emits mov [base_ptr + token_offset], REG.
  • FFI (Foreign Function Interface): To call complex OS APIs (like Vulkan's VkImageCreateInfo), VAMP does not use C-struct bindings. It manually calculates byte-offsets from the global base, emits instructions to write the struct data inline, aligns RSP for the OS calling convention, and calls the dynamic library pointer.
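
The base-plus-offset addressing can be mimicked in Python with a flat `bytearray` standing in for the memory behind the base register. The field names and offsets in `LAYOUT` are hypothetical and are not the real VkImageCreateInfo layout; the point is only that every read and write is a fixed offset from one base, with no struct binding in between.

```python
import struct

# Stands in for the region addressed from the global base register.
memory = bytearray(4096)

# Hypothetical layout: name -> (byte offset from base, struct format).
LAYOUT = {
    "width":  (0x100, "<I"),
    "height": (0x104, "<I"),
    "flags":  (0x108, "<Q"),
}


def store(name, value):
    """Red-tag analogue: mov [base_ptr + token_offset], REG."""
    offset, fmt = LAYOUT[name]
    struct.pack_into(fmt, memory, offset, value)


def load(name):
    """Green-tag analogue: mov REG, [base_ptr + token_offset]."""
    offset, fmt = LAYOUT[name]
    return struct.unpack_from(fmt, memory, offset)[0]
```

Writing adjacent fields this way builds a contiguous record in place, so a pointer to `base + 0x100` could be handed to a foreign function expecting that layout.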

2.3 Lexical Syntax and Color Semantics

Onat uses a 24-bit dictionary index + 8-bit color tag. The semantics map directly to JIT actions:

  • Magenta Pipe (|): Defines the boundary of a function. The JIT encounters this, emits a RET (C3) to close the previous function, and records the current instruction pointer as the start address of the new function.
  • White (Call): Emits a relative CALL to the target. (If jumping to a dynamic address already in a register, it optimizes to JMP RAX).
  • Yellow (Macro): Executes the attached code during JIT compilation. Used for compiler directives, setting layouts, or emitting specialized instructions like LOCK prefixes.
  • Blue (Comment): Ignored by the JIT entirely.
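
As a rough sketch of the token split and the Magenta pipe rule: the code below assumes the 8-bit tag sits in the low byte of the token (the notes do not specify), and `FunctionTable` is a hypothetical name for whatever structure records function start addresses.

```python
RET = 0xC3


def unpack(token):
    """Split a 32-bit token into (tag, dictionary_index).

    Assumption: tag in the low 8 bits, index in the high 24 bits.
    """
    return token & 0xFF, token >> 8


class FunctionTable:
    def __init__(self):
        self.code = bytearray()
        self.starts = {}    # dictionary index -> start offset in self.code
        self.open = False   # is a function body currently being compiled?

    def begin_function(self, index):
        """Magenta pipe: close the previous function, then start a new one."""
        if self.open:
            self.code.append(RET)
        self.starts[index] = len(self.code)
        self.open = True

    def emit(self, raw):
        """Append already-encoded instruction bytes to the current function."""
        self.code += raw
```

Because the RET is emitted by the next boundary rather than by the function itself, the last function in a buffer would still need closing by some end-of-source step, which this sketch omits.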

2.4 Control Flow without ASTs

VAMP abandons standard IF/ELSE/THEN parsing trees in favor of assembly-level basic blocks and lambdas.

  • Lambdas { }: Defining a lambda simply compiles the block of code elsewhere and leaves its executable memory address on the stack (RAX or RDX).
  • Conditionals via Global State:
    1. A comparison (e.g., >) is executed.
    2. The result is written to a dedicated global variable (e.g., condition using a Red tag).
    3. The conditional jump word reads the condition variable, consumes the lambda's address from the stack, and emits CMP condition, 0 followed by JZ lambda_address.
  • Basic Blocks [ ]: These constrain the scope of assembly generation. If a conditional within a block passes, execution falls through. If it fails, it jumps to the nearest closing ].
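
One way to read the lambda and conditional steps above is as compile-time bookkeeping: the lambda's address is known when the jump is emitted. The sketch below follows that reading, which is an assumption, and uses symbolic instruction strings. `BlockCompiler`, `out_of_line`, and the `lambda+N` label form are hypothetical.

```python
class BlockCompiler:
    def __init__(self):
        self.listing = []        # main instruction stream (symbolic)
        self.out_of_line = []    # where lambda bodies are compiled
        self.compile_stack = []  # compile-time values such as lambda addresses

    def define_lambda(self, body):
        """Compile `body` elsewhere and leave its address on the stack."""
        address = len(self.out_of_line)
        self.out_of_line.extend(body)
        self.compile_stack.append(address)

    def jump_if_zero(self, slot="condition"):
        """Consume a lambda address; emit the compare and conditional jump."""
        target = self.compile_stack.pop()
        self.listing.append(f"cmp qword [base + {slot}], 0")
        self.listing.append(f"jz lambda+{target}")
```

No tree is built at any point: the only state carried between the lambda definition and the jump is one address on a stack.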

2.5 Live Debugging via Instruction Injection

The most powerful UX feature of VAMP is its real-time data flow visualization.

  • The editor tracks the user's cursor position.
  • During JIT compilation, if the compiler_instruction_ptr equals the editor_cursor_ptr, the JIT injects a debug macro.
  • This macro emits instructions to copy the current state of RAX and RDX (the entire data stack) into a global circular buffer.
  • The UI reads this buffer, instantly displaying the exact runtime state of the program at the cursor's location, acting as an instant, zero-cost printf.
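
The cursor-triggered injection and the circular buffer can be sketched as below. Tokens are plain strings, the injected macro is represented by a `"probe"` marker, and inserting it before the token under the cursor is an assumption; `DebugRing` and `compile_with_probe` are hypothetical names.

```python
class DebugRing:
    """Fixed-size circular buffer of (rax, rdx) snapshots."""

    def __init__(self, slots=4):
        self.slots = [None] * slots
        self.head = 0  # total number of snapshots recorded so far

    def record(self, rax, rdx):
        self.slots[self.head % len(self.slots)] = (rax, rdx)
        self.head += 1

    def latest(self):
        """Most recent snapshot, or None if nothing was recorded."""
        if self.head == 0:
            return None
        return self.slots[(self.head - 1) % len(self.slots)]


def compile_with_probe(tokens, cursor_index):
    """Copy the token stream, injecting a probe where the cursor sits."""
    out = []
    for i, tok in enumerate(tokens):
        if i == cursor_index:
            out.append("probe")  # stands in for the injected debug macro
        out.append(tok)
    return out
```

Moving the cursor only changes where the probe lands on the next recompile, so the program itself never carries permanent logging code.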