forth_bootslop/references/kyra_in-depth.md
2026-02-20 21:38:52 -05:00


# In-Depth Analysis: Metaprogramming KYRA in KYRA (Onat Türkçüoğlu)
This document provides a comprehensive synthesis of the "Metaprogramming KYRA in KYRA" presentation given by Onat Türkçüoğlu at the Silicon Valley Forth Interest Group (SVFIG) on April 26, 2025. It integrates insights from the video transcript and the extensive OCR analysis of his visual editor.
This presentation is Onat's most explicit low-level deep dive into his binary-encoded compiler (KYRA), and it serves as the primary mechanical blueprint for our `bootslop` project.
---
## 1. Performance and "Runtime-Opinionated" Languages
Onat's primary critique of traditional Forth (and languages like C or Rust) is that they are "runtime opinionated." Standard Forth dictates a memory-based data stack and return stack. This makes it fundamentally incompatible with environments like GPU compute shaders.
* **Compilation Speed:** KYRA compiles its entire program (including a custom editor, Vulkan renderers, and FFmpeg integrations) in **8.24 milliseconds** natively on Windows/Linux.
* **The 2-Item Hardware Stack:** To achieve hardware locality and GPU compatibility, KYRA strictly restricts the data stack to exactly two CPU registers: **`RAX` (Top of Stack)** and **`RDX` (Next on Stack)**.
* **Zero Stack Overhead:** By having no memory data stack, KYRA eliminates the push/pop overhead that plagues standard Forth implementations.
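
To make the two-register model concrete, here is a minimal C simulation (not KYRA's code). It assumes, following the Cyan (Literal) rule in section 2, that a literal simply overwrites `RAX`, and it introduces a hypothetical `add` word that combines both registers into `RAX`. There is no array and no stack pointer, so keeping a third live value would mean storing it to a global.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the entire data stack is two registers. */
typedef struct {
    uint64_t rax; /* top of stack */
    uint64_t rdx; /* next on stack */
} RegStack;

/* Cyan literal: mov rax, imm (overwrites TOS; RDX is untouched). */
static void lit(RegStack *s, uint64_t imm) { s->rax = imm; }

/* xchg rax, rdx: swap the two stack slots. */
static void swap(RegStack *s) {
    uint64_t t = s->rax;
    s->rax = s->rdx;
    s->rdx = t;
}

/* Hypothetical binary word: combine both registers, result in RAX. */
static void add(RegStack *s) { s->rax += s->rdx; }
```

Pushing two literals therefore takes an explicit swap between them: `lit 5`, `xchg`, `lit 7` leaves `RAX = 7` and `RDX = 5`, and the hypothetical `add` then leaves 12 in `RAX`.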
## 2. The Mechanics of the KYRA Emitter
KYRA is not an interpreter; it is a high-level macro assembler that generates direct x86-64 machine code via JIT compilation.
### The `xchg` Trick (The Magenta Pipe `|`)
* Because the stack is just `RAX` and `RDX`, ensuring `RAX` is the active "Top of Stack" before executing a word is vital.
* The `xchg rax, rdx` instruction encodes to just 2 bytes, `48 92`: a `REX.W` prefix (`48`) followed by the opcode (`92`).
* **Definitions:** There are no `begin` or `end` words. A magenta pipe token (`|`) implicitly signals the start of a new definition. The JIT reacts to this by:
1. Emitting a `RET` (`C3`) to close the *previous* definition.
2. Emitting `48 92` (`xchg rax, rdx`) to ensure proper stack alignment for the *new* definition.
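
A sketch of how a `bootslop` emitter might react to the magenta pipe, assuming a flat code buffer; `CodeBuf`, `emit8`, and `emit_definition_boundary` are hypothetical names. The first definition has nothing to close, so its `RET` is skipped, and the returned offset is where the new word's entry point (its `xchg`) lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer for the JIT output. */
typedef struct {
    uint8_t bytes[4096];
    size_t len;
} CodeBuf;

static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

/* Magenta '|': close the previous definition and open the next one.
   Returns the offset of the new definition's entry point. */
static size_t emit_definition_boundary(CodeBuf *cb, int has_previous) {
    if (has_previous)
        emit8(cb, 0xC3);        /* ret: end of the previous word */
    size_t entry = cb->len;
    emit8(cb, 0x48);            /* REX.W prefix */
    emit8(cb, 0x92);            /* xchg rax, rdx */
    return entry;
}
```

Two boundaries in a row produce `48 92 C3 48 92`, with the second definition's entry at offset 3; in practice the previous word's body sits between its `xchg` and the next `RET`.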
### Color Semantics and Code Generation (From Transcript & OCR)
* **Magenta (`|`):** Definition boundary (`RET` + `xchg rax, rdx`).
* **White (Call):** A call whose target is resolved at compile time. Emits a direct `CALL` instruction, or a `JMP RAX` (`FF E0`) when optimizing a tail call.
* **Green (Load):** Emits a read from memory: `mov rax, [global_offset]`.
* **Red (Store):** Emits a write to memory: `mov [global_offset], rax`.
* **Yellow (Execute/Immediate):** A highly overloaded color used for runtime execution, immediate invocation of lambdas, or prefix accessors (like struct member reading).
* **Cyan (Literal):** Compiles an immediate value load: `mov rax, imm`.
* **Blue (Comment):** Stored directly in the token payload (3 characters per 24-bit payload) without polluting the global dictionary.
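
The encodings behind several of these colors can be sketched as small C emitters. This is an illustration under stated assumptions, not KYRA's emitter: the talk does not name the memory base register (see section 3), so `RBX` is assumed here, globals are addressed as `[RBX + disp32]`, and all function names are hypothetical. Immediates and displacements are written little-endian, as x86-64 requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t bytes[4096]; size_t len; } CodeBuf;

static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

/* Little-endian 32- and 64-bit fields. */
static void emit32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}
static void emit64(CodeBuf *cb, uint64_t v) {
    for (int i = 0; i < 8; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* Cyan literal: mov rax, imm64 (REX.W B8). */
static void emit_literal(CodeBuf *cb, uint64_t imm) {
    emit8(cb, 0x48); emit8(cb, 0xB8); emit64(cb, imm);
}

/* Green load: mov rax, [rbx + disp32] (RBX as base is an assumption). */
static void emit_load(CodeBuf *cb, int32_t offset) {
    emit8(cb, 0x48); emit8(cb, 0x8B); emit8(cb, 0x83); emit32(cb, (uint32_t)offset);
}

/* Red store: mov [rbx + disp32], rax. */
static void emit_store(CodeBuf *cb, int32_t offset) {
    emit8(cb, 0x48); emit8(cb, 0x89); emit8(cb, 0x83); emit32(cb, (uint32_t)offset);
}

/* White call: call rel32, displacement measured from the end of the
   5-byte instruction to the target offset in the same buffer. */
static void emit_call(CodeBuf *cb, size_t target) {
    size_t end = cb->len + 5;
    emit8(cb, 0xE8);
    emit32(cb, (uint32_t)(int32_t)((int64_t)target - (int64_t)end));
}

/* White tail call through RAX: jmp rax. */
static void emit_jmp_rax(CodeBuf *cb) { emit8(cb, 0xFF); emit8(cb, 0xE0); }
```

A signed 32-bit displacement reaches roughly ±2 GiB around the base register, which is in line with the "gigabytes of state" described in section 3.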
## 3. Global Memory vs. Local Variables
Onat strongly critiques the conventional wisdom of avoiding global variables, specifically calling out Rust for forcing developers to pass state through 30 layers of function calls.
* **Implicit Register Passing:** For passing transient state (like the active UI element's `slot ID`), he implicitly passes the value in a dedicated register (e.g., `R12D`) across functions, completely bypassing any need to push it to a stack.
* **Single-Register Memory Base:** He dedicates a single CPU register to act as the base pointer for all program memory. This gives instant `[BASE_REG + offset]` access to "gigabytes of state."
* **The "Tape Drive" in Practice:** Instead of a stack, data needed for complex API calls (like Vulkan initialization) is pre-scattered into these known global offsets using Red (Store) words, and then passed via a single pointer.
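
A C model of the "tape drive" pattern, assuming a byte arena stands in for the memory behind the base register. The offsets, the `store`/`load` helpers, and the `area_from_block` consumer (a stand-in for an API entry point such as a Vulkan create call) are all hypothetical. Values are scattered to fixed offsets first; the consumer then receives a single pointer and reads its fields at known positions relative to it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical arena standing in for the memory behind the base register. */
static uint8_t arena[1 << 16];

/* Hypothetical fixed offsets, laid out back to back like a C struct. */
enum { OFF_WIDTH = 0x100, OFF_HEIGHT = 0x108, OFF_FORMAT = 0x110 };

/* Red (Store): models mov [BASE + off], rax as an 8-byte write. */
static void store(uint32_t off, uint64_t rax) {
    assert(off + 8 <= sizeof arena);
    memcpy(arena + off, &rax, sizeof rax);
}

/* Green (Load): models mov rax, [BASE + off]. */
static uint64_t load(uint32_t off) {
    uint64_t rax;
    assert(off + 8 <= sizeof arena);
    memcpy(&rax, arena + off, sizeof rax);
    return rax;
}

/* Hypothetical consumer: gets one pointer to the scattered block and reads
   width at +0 and height at +8, matching the offsets above. */
static uint64_t area_from_block(const uint8_t *block) {
    uint64_t width, height;
    memcpy(&width, block + 0, sizeof width);
    memcpy(&height, block + 8, sizeof height);
    return width * height;
}
```

The design choice is that no argument marshalling happens at call time: the Red words already put every field where the consumer expects it.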
## 4. Dictionary Management and The "Deck"
Unlike text-based Forths, which find words by name (typically by walking a linked list or hashing), KYRA uses a pure binary index map.
* **24-Bit Indices:** Words are stored as 24-bit indices pointing to 8-byte cells. (Onat notes his next iteration moves to 32-bit indices + a separate 1-byte tag array, exactly matching Lottes's `x68` annotation model).
* **Visual Organization (The "Scrolls"):** The dictionary is explicitly organized by the programmer into 16-word horizontal "scrolls" (e.g., one scroll for "Vulkan API", another for "Math").
* **IP Protection:** Because the dictionary mapping is separate from the source array, you can ship the binary source indices without the dictionary symbols, effectively stripping the symbols while retaining the executable structure.
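
A sketch of the token and dictionary layout described above. The talk gives the 24-bit payload, the 8-byte cells, and the three-characters-per-comment packing; the 8-bit color tag in the top byte of a 32-bit token, the tag values, and all names here are assumptions. Because a call token holds only an index, its meaning comes from the cell array, and only a separate name table (not shown) would need to be withheld to strip symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; the talk does not give KYRA's numbering. */
enum Color {
    COLOR_DEFINE = 1, /* magenta */
    COLOR_CALL,       /* white */
    COLOR_LOAD,       /* green */
    COLOR_STORE,      /* red */
    COLOR_EXEC,       /* yellow */
    COLOR_LITERAL,    /* cyan */
    COLOR_COMMENT     /* blue */
};

#define PAYLOAD_MASK 0xFFFFFFu /* 24-bit payload */

/* Assumed layout: color tag in the top byte, payload in the low 24 bits. */
static uint32_t make_token(enum Color tag, uint32_t payload) {
    assert(payload <= PAYLOAD_MASK);
    return ((uint32_t)tag << 24) | payload;
}

static enum Color token_tag(uint32_t tok) { return (enum Color)(tok >> 24); }
static uint32_t token_payload(uint32_t tok) { return tok & PAYLOAD_MASK; }

/* Blue comment: three 8-bit characters packed into one 24-bit payload,
   so comment text never needs a dictionary entry. */
static uint32_t pack_comment3(const char c[3]) {
    uint32_t payload = (uint32_t)(uint8_t)c[0]
                     | (uint32_t)(uint8_t)c[1] << 8
                     | (uint32_t)(uint8_t)c[2] << 16;
    return make_token(COLOR_COMMENT, payload);
}

/* A call token's payload indexes an array of 8-byte cells (for example,
   code addresses); no name lookup is involved. */
static uint64_t resolve(const uint64_t *cells, uint32_t tok) {
    return cells[token_payload(tok)];
}
```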
## 5. Control Flow: Basic Blocks `[ ]` and Lambdas `{ }`
KYRA eliminates standard Abstract Syntax Trees (ASTs) and `if/else/then` branching.
* **Basic Blocks `[ ]`:** These visually constrain the assembly output. They provide implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope.
* **Lambdas `{ }`:** A lambda (colored Yellow `{`) does not execute inline. The JIT compiles the block of code elsewhere in the arena and leaves its executable memory address in `RAX`.
* **Conditionals:** To perform an `IF`:
1. Evaluate a condition (e.g., `luma > 0.6`).
2. Write the boolean result to a dedicated global `condition` variable.
3. Define a lambda block containing the "true" branch (leaving its address in `RAX`).
4. Call an execution word that reads the `condition` variable: it emits `cmp condition, 0` followed by a `jz` (jump if zero) that, at runtime, skips the call through the lambda address held in `RAX`.
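
The emission side of step 4 can be sketched as follows, assuming `RBX` is the memory base, the `condition` global lives at a fixed offset, and the lambda's address is already in `RAX`; `emit_if_lambda` is a hypothetical name. The `jz` target is unknown when the jump is emitted, so a placeholder byte is backpatched once the skipped code has been written, the same kind of local offset resolution that `[ ]` blocks provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t bytes[4096]; size_t len; } CodeBuf;

static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

static void emit32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* Hypothetical "execute-if" word. Emits:
     cmp qword [rbx + cond_off], 0
     jz  skip
     call rax          ; lambda address left in RAX by '{ }'
   skip:                                                          */
static void emit_if_lambda(CodeBuf *cb, int32_t cond_off) {
    emit8(cb, 0x48); emit8(cb, 0x83); emit8(cb, 0xBB); /* cmp qword [rbx+disp32], imm8 */
    emit32(cb, (uint32_t)cond_off);
    emit8(cb, 0x00);                                   /* imm8 = 0 */

    emit8(cb, 0x74);                                   /* jz rel8 */
    size_t patch = cb->len;
    emit8(cb, 0x00);                                   /* placeholder */

    emit8(cb, 0xFF); emit8(cb, 0xD0);                  /* call rax */

    /* Backpatch: rel8 counts from the byte after the jz instruction. */
    ptrdiff_t rel = (ptrdiff_t)cb->len - (ptrdiff_t)(patch + 1);
    assert(rel >= -128 && rel <= 127);
    cb->bytes[patch] = (uint8_t)(int8_t)rel;
}
```

For a condition at offset `0x40` this yields 12 bytes: `48 83 BB 40 00 00 00 00` (`cmp`), `74 02` (`jz` over the next 2 bytes), and `FF D0` (`call rax`).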
## 6. FFI: Bridging to C and Vulkan (WinAPI equivalent)
Dealing with OS APIs and C libraries (like Vulkan and FFmpeg) requires satisfying the C Application Binary Interface (ABI).
* **RSP Usage:** The hardware stack pointer (`RSP`) is used exclusively for the call stack (return addresses). Since no data buffers live on the hardware stack, the classic stack buffer overflow that overwrites a return address has nothing to overflow.
* **The FFI Dance:** When calling external C functions, Onat's macros save `RSP` into a temporary variable, align `RSP` to 16 bytes (required at the point of a `CALL` by both the Windows x64 and System V AMD64 calling conventions), execute the `CALL`, and then restore `RSP`.
* *(Note for Bootslop: We saw `CCALL1`, `CCALL2`, etc., in the OCR, confirming he uses specialized macro words to map the `RAX`/`RDX` stack and global variables into the Windows x64 argument registers `RCX`, `RDX`, `R8`, `R9` before triggering the OS call).*
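
A hypothetical two-argument `CCALL` emitter for the Windows x64 convention, modeled on the steps above. The argument mapping (`RAX` into `RCX`, with `RDX` already in place as the second argument), the `RBX` memory base, and the save-slot offset are assumptions, not details confirmed by the OCR. Windows x64 also requires the caller to reserve 32 bytes of shadow space, which the `sub` provides; a System V (Linux) variant would move arguments into `RDI`/`RSI` instead and skip the shadow space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t bytes[4096]; size_t len; } CodeBuf;

static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}
static void emit32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}
static void emit64(CodeBuf *cb, uint64_t v) {
    for (int i = 0; i < 8; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* Hypothetical CCALL2 (Windows x64): TOS -> RCX, NOS stays in RDX. */
static void emit_ccall2(CodeBuf *cb, uint64_t fn_addr, int32_t rsp_save_off) {
    emit8(cb, 0x48); emit8(cb, 0x89); emit8(cb, 0xC1);             /* mov rcx, rax */
    emit8(cb, 0x48); emit8(cb, 0x89); emit8(cb, 0xA3);             /* mov [rbx+disp32], rsp */
    emit32(cb, (uint32_t)rsp_save_off);
    emit8(cb, 0x48); emit8(cb, 0x83); emit8(cb, 0xE4); emit8(cb, 0xF0); /* and rsp, -16 */
    emit8(cb, 0x48); emit8(cb, 0x83); emit8(cb, 0xEC); emit8(cb, 0x20); /* sub rsp, 32 (shadow space) */
    emit8(cb, 0x48); emit8(cb, 0xB8); emit64(cb, fn_addr);         /* mov rax, fn_addr */
    emit8(cb, 0xFF); emit8(cb, 0xD0);                              /* call rax */
    emit8(cb, 0x48); emit8(cb, 0x8B); emit8(cb, 0xA3);             /* mov rsp, [rbx+disp32] */
    emit32(cb, (uint32_t)rsp_save_off);
    /* The C function's return value is already in RAX, the new TOS. */
}
```

Because `RBX` is callee-saved in both conventions, the saved `RSP` can be reloaded from `[RBX + offset]` after the call, and the C return value is left in `RAX` as the new top of stack.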
## 7. Development Workflow
* **Bug Triage over Asserts:** There are no unit tests or assertions. Bugs are found by commenting out blocks of code (disabling them) and recompiling. Because a full compile takes about 8 ms, bisecting toward the crash point this way is faster than writing tests.
* **Free Printf / Data Flow:** By hovering over a word in the editor, the system automatically injects code to record `RAX` and `RDX` at that exact execution step, allowing the programmer to step through the data flow visually without running traditional debuggers.
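
A sketch of the probe the editor might splice in after the hovered word, assuming the `RBX` memory base and a hypothetical trace slot; the talk does not describe the injected code itself. Two `mov` stores copy `RAX` and `RDX` to memory without changing either register or the flags, so the surrounding code is unaffected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t bytes[4096]; size_t len; } CodeBuf;

static void emit8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

static void emit32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++) emit8(cb, (uint8_t)(v >> (8 * i)));
}

/* Hypothetical probe: record both stack registers into a trace slot.
     mov [rbx + slot],     rax   ; 48 89 83 disp32
     mov [rbx + slot + 8], rdx   ; 48 89 93 disp32 */
static void emit_probe(CodeBuf *cb, int32_t slot) {
    emit8(cb, 0x48); emit8(cb, 0x89); emit8(cb, 0x83);
    emit32(cb, (uint32_t)slot);
    emit8(cb, 0x48); emit8(cb, 0x89); emit8(cb, 0x93);
    emit32(cb, (uint32_t)(slot + 8));
}
```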
---
### Conclusion for `bootslop`
The "Metaprogramming KYRA" talk confirms that our 2-register stack and "preemptive scatter" global memory model in `attempt_1/main.c` are the right path.
The next major hurdles for `bootslop` will be:
1. Implementing the `xchg rax, rdx` definition boundary logic.
2. Creating an FFI bridge (like Onat's `CCALL`) that aligns `RSP` to 16 bytes and maps the stack registers and globals into the Windows x64 argument registers, allowing our minimal Forth to summon full OS windows and graphics.
3. Transitioning dictionary definitions from string-parsing to direct array index resolution.