diff --git a/references/blog_in-depth.md b/references/blog_in-depth.md new file mode 100644 index 0000000..a544f44 --- /dev/null +++ b/references/blog_in-depth.md @@ -0,0 +1,97 @@ +# In-Depth Analysis: Timothy Lottes's Development Blogs (2007 - 2016) + +This document synthesizes the architectural paradigms, implementation details, and philosophical shifts documented in Timothy Lottes's blogs over a decade of building minimal, high-performance Forth-like operating environments. This knowledge is crucial for understanding the "Lottes/Onat Paradigm" and successfully implementing the `bootslop` project. + +--- + +## 1. The Core Philosophy: "Vintage Programming" + +Lottes advocates for returning to a "stone-age" development methodology reminiscent of the Commodore 64 or HP48, but applied to modern x86-64 hardware and GPUs. + +* **Rejection of Modern Complexity:** He explicitly rejects the "NO" of modern operating systems—compilers, linkers, debuggers, memory protection, paging, and bloated ABIs. He aims for an environment that says "YES" to direct hardware access. +* **The OS IS the Editor:** The system boots directly into a visual editor. This editor functions simultaneously as an IDE, assembler, disassembler, hex editor, debugger, and live-coding environment. +* **Instant Iteration:** The primary goal is a sub-5ms edit-compile-run loop. Debugging is done via instant visual feedback and "printf" style memory peeking within the editor itself, rendering traditional debuggers obsolete. +* **Extreme Minimalism:** His compilers and core runtimes often fit within 1.5KB to 4KB (e.g., the 1536-byte bootloader/interpreter project). + +## 2. The Evolution to "Source-Less" Programming + +The most critical architectural shift in Lottes's work is the move from text-based source files (like his 2014 "A" language) to **Source-Less Programming** (2015). + +### Why Source-Less? +Parsing text (lexical analysis, string hashing, AST generation) is slow and complex. 
In a source-less model, the "source code" *is* the binary executable image (or a direct structured representation of it). + +### The Architecture of Source-Less (x68) +1. **32-Bit Granularity:** Every token in the system is exactly 32 bits (4 bytes). + * To accommodate variable-length x86-64 instructions, Lottes invented "x68". + * **Padding:** Standard x86 instructions are padded to exactly 32 bits (or multiples of 32 bits) using ignored segment override prefixes (like `2E` or `3E`) and multi-byte NOPs. + * Example: A `RET` instruction (`C3`) becomes `C3 90 90 90`. + * *Why?* This keeps immediate values (like 32-bit addresses or constants) 32-bit aligned, drastically simplifying the editor and the assembler. + +2. **The Token Types:** A 32-bit word in memory represents one of four things: + * **DAT (Data):** Hexadecimal data or an immediate value. + * **OP (Opcode):** A padded 32-bit x86-64 machine instruction. + * **ABS (Absolute Address):** A direct 32-bit memory pointer. + * **REL (Relative Address):** An `[RIP + imm32]` relative offset used for branching. + +3. **The Annotation Overlay (The "Shadow" Memory):** + * Because raw 32-bit hex values are unreadable to humans, the editor maintains a *parallel array* of 64-bit annotations for every 32-bit token. + * **Annotation Layout (64-bit):** + * `Tag` (4 to 8 bits): Defines how the editor should display and treat the 32-bit value (e.g., display as a signed int, an opcode name, a relative address, or a specific color). + * `Label / Name`: A short string (e.g., 5 to 8 characters, often compressed using 6-bit or 7-bit encodings to fit) that acts as the human-readable name for the memory address. + * *The Magic:* The editor reads the binary array and the annotation array. It uses the tags to dynamically format the screen. There is **zero string parsing** at runtime. + +4. 
**Edit-Time Relinking (The Visual Linker):** + * When you insert or delete a token in the editor, all tokens tagged as `ABS` or `REL` (addresses) are automatically recalculated and updated in real-time. The editor *is* the linker. + +5. **Live State vs. Edit State:** + * Memory is split into two regions: the live running program and the edit buffer. + * When edits are made and confirmed (e.g., hitting ESC or Enter), the editor atomically swaps or patches the live image with the edited image. + +## 3. Language Paradigms: "Ear" and "Toe" + +In his "Random Holiday 2015" post, Lottes solidifies the specific DSLs used within this source-less framework: + +* **"Toe" (The Low-Level Assembler):** This is the subset of x86-64 with 32-bit padded opcodes. It is heavily macro-driven to assemble machine code. +* **"Ear" (The High-Level Macro/Forth Language):** A zero-operand, Forth-like language embedded directly into the binary form. + * Instead of a traditional Forth interpreter searching a dictionary at runtime, the dictionary is resolved at *edit-time* or *import-time*. + * A token is just an index or a direct `CALL` instruction to the compiled word. + +### The 2-Item Stack (Implicit Registers) +While early experiments used a traditional Forth data stack in memory, Lottes's later architectures (and Onat's derived work) map the stack directly to hardware registers to eliminate memory overhead: +* `RAX` = Top of Stack (TOS) +* `RBX` (or `RDX` in Onat's VAMP) = Second item on stack (NOS) +* **The xchg Trick:** Swapping the two stack items is often handled by `xchg rax, rbx` (or `rdx`), which compiles to a tiny 2-3 byte instruction and touches only registers, so the swap never reads or writes data memory. + +## 4. Bootstrapping "The Chicken Without an Egg" + +How do you build a system that requires a custom binary editor to write code, when you don't have the editor yet? + +1. 
**C Prototype First:** Lottes explicitly states he builds the first iteration of the visual editor and virtual machine in C (using WinAPI or standard libraries). This allows rapid iteration of the visual layout and the memory arena logic. +2. **Hand-Assembling Bootstraps:** He uses standard assemblers (like NASM) or hexadecimal byte-banging (using tools like `objdump -d`) to figure out the exact padded 32-bit opcode bytes. +3. **Embed Opcode Definitions:** The C prototype includes hardcoded arrays of bytes that represent the base opcodes (e.g., `MOV`, `ADD`, `CALL`, `RET`). +4. **Self-Hosting:** Once the C editor is stable and can generate binary code into an arena, he rewrites the editor *inside* the custom language within the C editor, eventually discarding the C host. + +## 5. UI and Visual Design + +The UI is not an afterthought; it is integral to the architecture. + +* **The Grid:** The editor displays memory as a strict grid. Typical layout: 8 tokens per line (fitting half a 64-byte cache line). +* **Two Rows per Token:** + * Top Row: The Annotation (Label/Name), color-coded. + * Bottom Row: The 32-bit Data (Hex value, or a resolved symbol name if tagged as an address). +* **Colors (ColorForth Inspired):** + * Colors dictate semantic meaning (e.g., Red = Define, Green = Compile, Yellow = Execute/Immediate, White/Grey = Comment/Format). This visual syntax replaces traditional language keywords. +* **Pixel-Perfect Fonts:** Lottes builds custom, fixed-width raster fonts (e.g., 6x11 or 8x8) to ensure perfect readability without anti-aliasing blurring, often treating specific characters (like `_`, `-`, `=`) as line-drawing characters to structure the UI. + +## Summary for the `bootslop` Implementation + +Our current `attempt_1/main.c` is perfectly aligned with Phase 1 of the Lottes bootstrapping process: +1. We have a C-based WinAPI editor. +2. We have a token array (`tape_arena`) and an annotation array (`anno_arena`). +3. 
We have 32-bit tokens packed with a 4-bit semantic tag and a 28-bit payload. +4. We have a basic JIT emitter targeting a 2-register (`RAX`/`RDX`) virtual machine. + +**Next Immediate Priorities based on Lottes's path:** +* Move away from string-based dictionary lookups at runtime to **Edit-Time Relinking** (resolving addresses when the token is typed or modified in the UI). +* Implement the **Padding Strategy** for the x86-64 JIT emitter to ensure all emitted logical blocks align cleanly, paving the way for 1:1 token-to-machine-code mapping. +* Refine the Editor Grid to show the two-row (Annotation / Data) layout clearly. \ No newline at end of file diff --git a/references/forth_day_2020_in-depth.md b/references/forth_day_2020_in-depth.md new file mode 100644 index 0000000..895c738 --- /dev/null +++ b/references/forth_day_2020_in-depth.md @@ -0,0 +1,58 @@ +# In-Depth Analysis: Onat's Forth Day 2020 Presentation + +This document provides an exhaustive breakdown of the technical specifics, screen visuals, and mechanical explanations from Onat Türkçüoğlu's "Preview of x64 & ColorForth & SPIR V" presentation at Forth Day 2020, synthesizing both the video transcript and the OCR analysis of the editor's visual state. + +--- + +## 1. The Environment and Editor UI + +Onat introduces a custom 3-pane UI built entirely from scratch in C and Vulkan. This editor serves as the primary IDE, compiler, and visual debugger. + +### Visual Layout (from OCR & Video) +* **The Three Panes:** Left/center panes display the block-based, colorized Forth/macro tokens. The right pane displays live x86-64 assembly output (or SPIR-V binary data) that updates instantly as the user edits the source. +* **Color Semantics (Observed in OCR):** + * **Cyan:** Low-level x86-64 opcodes or API functions (`mov`, `jmp`, `xorpd`, `CCALL1`, `ide_syscmd`). + * **Yellow:** Line numbers, specific execution tokens, or immediate jump labels/blocks. 
+ * **Magenta:** High-level struct definitions, bitwise layouts, and basic block delineations (`Structs`, `vars`, `bits`). + * **Red:** Literal numbers (`32`, `64`), format strings, or specific SPIR-V instruction IDs and properties. + * **Orange/Green:** UI and control flow modifiers. +* **State Tracking:** The editor treats code blocks as tracked state objects, which allows for native, robust Undo/Redo operations without relying on a traditional text file format. + +## 2. O(1) Dictionary Lookup & "Compile-Time Call Graph" + +Traditional Forth systems (and even Lottes's early systems) relied on hashing strings or linear searches to resolve words. Onat eliminated this overhead entirely. + +* **Source Memory Mapping:** Instead of hashing, the compiler allocates an extra 4 bytes per character in the visual block to store the *exact source memory location* of the currently compiled word. +* **Instant Resolution:** Because the token itself points to its origin, "Jump to Definition" is instantaneous. +* **Execution Tracing:** He demonstrates a command that instantly numbers every occurrence of a word across the codebase in the exact chronological order of execution. This provides a "compile-time call graph" without actually running the program, allowing the programmer to visualize the data flow statically. + +## 3. The High-Level x64 Macro Assembler + +The core of the system is not a traditional Forth interpreter, but a high-level macro assembler that compiles words directly into x64 machine code. + +* **Syntax & Abstraction:** + * The syntax is designed to be readable and fluid: `AX to BX` or `CX + offset`. + * A "direction register" macro allows toggling the flow of data. For instance, `from AX to BX register, let's move an unsigned` emits a 32-bit `mov ebx, eax`. + * Modifiers like `long` change the emission to a 64-bit `mov rbx, rax`. 
+* **Low-Level Control (OCR Insights):** The OCR reveals exact x64 instructions embedded in the blocks: + * `xorpd xm15, xm15` and `movups [rsi], xm15` show direct, native access to SSE/AVX registers for vectorized operations. + * Macros like `PUSH2 rsi, rdi` and `POP2 rsi, rdi` are used instead of traditional C-style prologues/epilogues, maintaining tight control over the stack pointer and register preservation. + * **C-ABI Integration:** The OCR shows words like `CCALL1 ide_p` and `CCALL3 ide_syscmd`. This indicates a custom FFI (Foreign Function Interface) macro set (`CCALL0`, `CCALL1`, `CCALL2`, `CCALL3`) designed to automatically align the stack (`RSP` to 16 bytes) and map registers to the C-ABI (e.g., `RCX`, `RDX`, `R8`, `R9` on Windows) to call out to the C-based host/Vulkan engine. + +## 4. SPIR-V Generation + +A significant portion of the presentation focuses on using this same macro-assembler foundation to generate SPIR-V (the intermediate representation for Vulkan compute/graphics shaders) entirely from scratch, replacing massive compiler toolchains like `glslang`. + +* **x64 vs. SPIR-V Complexity:** Onat notes that x64 assembly was actually *less* complicated to generate than SPIR-V. + * x64 is a flat, linear instruction stream. + * SPIR-V is strictly structured. It requires rigid sections for Capabilities, Extensions, Memory Models, Entry Points, Execution Modes, Types, and Function Definitions before any actual logic can be emitted. +* **SPIR-V Macros (OCR Insights):** The OCR captures the exact implementation of the SPIR-V generator: + * Words like `opTypeInt 32`, `opTypeVector 4`, `opTypeFloat` map directly to the SPIR-V specification binary IDs. + * Memory addresses and types are explicitly laid out: `PhysicalStorageBuffer64`. + * This proves that the "sourceless" environment scales perfectly from raw CPU machine code to structured GPU bytecodes by just changing the underlying byte-emission macros. + +## 5. 
Key Takeaways for the `bootslop` Implementation + +1. **Immediate x64 Access:** The system shouldn't hide the CPU. It should expose it via macros (like `CCALL`) that handle the tedious parts of the ABI while letting the programmer write `movups` if they want to. +2. **Visual Over Text:** The implementation of 4 extra bytes per character to store "source location" reinforces that the visual grid *is* the data structure. It's not text being parsed; it's a spatial array of tokens pointing to each other. +3. **The FFI Bridge:** We will need a macro pattern equivalent to `CCALL` in our JIT emitter to talk to WinAPI functions without trashing the 2-item (`RAX`/`RDX`) stack or violating the 16-byte `RSP` alignment required by Windows. \ No newline at end of file diff --git a/references/kyra_in-depth.md b/references/kyra_in-depth.md new file mode 100644 index 0000000..31cfaf9 --- /dev/null +++ b/references/kyra_in-depth.md @@ -0,0 +1,86 @@ +# In-Depth Analysis: Metaprogramming KYRA in KYRA (Onat Türkçüoğlu) + +This document provides a comprehensive synthesis of the "Metaprogramming KYRA in KYRA" presentation given by Onat Türkçüoğlu at the Silicon Valley Forth Interest Group (SVFIG) on April 26, 2025. It integrates insights from the video transcript and the extensive OCR analysis of his visual editor. + +This presentation is the most explicit, hardcore low-level deep dive into Onat's binary-encoded compiler (KYRA) and serves as the definitive mechanical blueprint for our `bootslop` project. + +--- + +## 1. Performance and "Runtime-Opinionated" Languages + +Onat's primary critique of traditional Forth (and languages like C or Rust) is that they are "runtime opinionated." Standard Forth dictates a memory-based data stack and return stack. This makes it fundamentally incompatible with environments like GPU compute shaders. 
+ +* **Compilation Speed:** KYRA compiles its entire program (including a custom editor, Vulkan renderers, and FFMPEG integrations) in **8.24 milliseconds** natively on Windows/Linux. +* **The 2-Item Hardware Stack:** To achieve hardware locality and GPU compatibility, KYRA strictly restricts the data stack to exactly two CPU registers: **`RAX` (Top of Stack)** and **`RDX` (Next on Stack)**. +* **Zero Stack Overhead:** By having no memory data stack, KYRA eliminates the push/pop overhead that plagues standard Forth implementations. + +## 2. The Mechanics of the KYRA Emitter + +KYRA is not an interpreter; it is a high-level macro assembler that generates direct x86-64 machine code via JIT compilation. + +### The `xchg` Trick (The Magenta Pipe `|`) +* Because the stack is just `RAX` and `RDX`, ensuring `RAX` is the active "Top of Stack" before executing a word is vital. +* The `xchg rax, rdx` instruction compiles to a tiny 2-byte opcode: `48 92`. +* **Definitions:** There are no `begin` or `end` words. A magenta pipe token (`|`) implicitly signals the start of a new definition. The JIT reacts to this by: + 1. Emitting a `RET` (`C3`) to close the *previous* definition. + 2. Emitting `48 92` (`xchg rax, rdx`) to ensure proper stack alignment for the *new* definition. + +### Color Semantics and Code Generation (From Transcript & OCR) +* **Magenta (`|`):** Definition boundary (`RET` + `xchg rax, rdx`). +* **White (Call):** A compile-time call. Emits a direct `CALL` instruction or a `JMP RAX` (e.g., `FFE0`) if optimizing a tail call. +* **Green (Load):** Emits a read from memory: `mov rax, [global_offset]`. +* **Red (Store):** Emits a write to memory: `mov [global_offset], rax`. +* **Yellow (Execute/Immediate):** A highly overloaded color used for runtime execution, immediate invocation of lambdas, or prefix accessors (like struct member reading). +* **Cyan (Literal):** Compiles an immediate value load: `mov rax, imm`. 
+* **Blue (Comment):** Stored directly in the token payload (3 characters per 24-bit payload) without polluting the global dictionary. + +## 3. Global Memory vs. Local Variables + +Onat heavily critiques the conventional wisdom of avoiding global variables, specifically calling out Rust for forcing developers to pass state through 30 layers of call stacks. + +* **Implicit Register Passing:** For passing transient state (like the active UI element's `slot ID`), he implicitly passes the value in a dedicated register (e.g., `R12D`) across functions, completely bypassing any need to push it to a stack. +* **Single-Register Memory Base:** He dedicates a single CPU register to act as the base pointer for all program memory. This gives instant `[BASE_REG + offset]` access to "gigabytes of state." +* **The "Tape Drive" in Practice:** Instead of a stack, data needed for complex API calls (like Vulkan initialization) is pre-scattered into these known global offsets using Red (Store) words, and then passed via a single pointer. + +## 4. Dictionary Management and The "Deck" + +Unlike text-based Forths that require hashing, KYRA uses a pure binary index map. + +* **24-Bit Indices:** Words are stored as 24-bit indices pointing to 8-byte cells. (Onat notes his next iteration moves to 32-bit indices + a separate 1-byte tag array, exactly matching Lottes's `x68` annotation model). +* **Visual Organization (The "Scrolls"):** The dictionary is explicitly organized by the programmer into 16-word horizontal "scrolls" (e.g., one scroll for "Vulkan API", another for "Math"). +* **IP Protection:** Because the dictionary mapping is separate from the source array, you can ship the binary source indices without the dictionary symbols, effectively stripping the symbols while retaining the executable structure. + +## 5. Control Flow: Basic Blocks `[ ]` and Lambdas `{ }` + +KYRA eliminates standard Abstract Syntax Trees (ASTs) and `if/else/then` branching. 
+ +* **Basic Blocks `[ ]`:** These visually constrain the assembly output. They provide implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope. +* **Lambdas `{ }`:** A lambda (colored Yellow `{`) does not execute inline. The JIT compiles the block of code elsewhere in the arena and leaves its executable memory address in `RAX`. +* **Conditionals:** To perform an `IF`: + 1. Evaluate a condition (e.g., `luma > 0.6`). + 2. Write the boolean result to a dedicated global `condition` variable. + 3. Define a lambda block containing the "true" branch (leaving its address in `RAX`). + 4. Call an execution word that reads the `condition` variable, emits a `cmp condition, 0`, and executes a `jz` (jump if zero) to skip the lambda address stored in `RAX`. + +## 6. FFI: Bridging to C and Vulkan (WinAPI equivalent) + +Dealing with OS APIs and standard C libraries (like Vulkan and FFMPEG) requires satisfying the C Application Binary Interface (ABI). + +* **RSP Alignment:** The hardware stack pointer (`RSP`) is exclusively used for the call stack (return addresses), eliminating buffer overflow vulnerabilities. +* **The FFI Dance:** When calling external C functions, Onat's macros explicitly read `RSP` into a temporary variable, align `RSP` to 16-bytes (a strict requirement for Windows/Linux x64 C ABI), execute the `CALL`, and then restore `RSP`. +* *(Note for Bootslop: We saw `CCALL1`, `CCALL2`, etc., in the OCR, confirming he uses specialized macro words to map the `RAX`/`RDX` stack and global variables into the standard `RCX`, `RDX`, `R8`, `R9` C-ABI registers before triggering the OS call).* + +## 7. Development Workflow + +* **Bug Triage over Asserts:** There are no unit tests or assertions. Bugs are found by commenting out blocks of code (disabling them) and hitting compile. Because compilation takes 8ms, binary searching for the crash point is faster than writing tests. 
+* **Free Printf / Data Flow:** By hovering over a word in the editor, the system automatically injects code to record `RAX` and `RDX` at that exact execution step, allowing the programmer to step through the data flow visually without running traditional debuggers. + +--- +### Conclusion for `bootslop` + +The "Metaprogramming KYRA" talk confirms that our 2-register stack and "preemptive scatter" global memory model in `attempt_1/main.c` is the exact correct path. + +The next major hurdles for `bootslop` will be: +1. Implementing the `xchg rax, rdx` definition boundary logic. +2. Creating an FFI bridge (like Onat's `CCALL`) that aligns `RSP` to 16 bytes and maps globals to WinAPI registers, allowing our minimal Forth to summon full OS windows and graphics. +3. Transitioning dictionary definitions from string-parsing to direct array index resolution. \ No newline at end of file diff --git a/references/neokineogfx_in-depth.md b/references/neokineogfx_in-depth.md new file mode 100644 index 0000000..7d5910f --- /dev/null +++ b/references/neokineogfx_in-depth.md @@ -0,0 +1,62 @@ +# In-Depth Analysis: Neokineogfx - 4th And Beyond (Timothy Lottes) + +This document synthesizes the insights extracted from the transcript and OCR analysis of Timothy Lottes's "4th And Beyond" presentation video (released under his Neokineogfx channel in 2026). It details the evolution of his Forth derivatives, the specifics of his "x68" encoding, and the mechanics of his "5th" system. + +--- + +## 1. Evolution from Calculator to Forth +Lottes traces the ideal interactive tool back to Reverse Polish Notation (RPN) calculators like the HP48. +* **The Baseline:** Start with simple RPN math on a stack. +* **The Dictionary:** Introduce a dictionary that points to positions on the data stack or to executable code. +* **Color Semantics (ColorForth Inspired):** + * **Yellow (Execute):** Push numbers to the stack, or execute dictionary words. + * **Red (Define):** Define a word. 
+ * **Green (Compile):** Compile words or push values during compilation. + * **Magenta (Variable):** Define a variable. + +## 2. The Branch Misprediction Problem +Standard Forth causes severe CPU pipeline stalls (averaging 16-clock stalls on architectures like Zen 2) due to constant branch misprediction when interpreting tags or navigating the dictionary lookup loop. + +* **Solution - The Folded Interpreter:** Lottes mitigates this by folding a tiny (5-byte) interpreter directly into the end of every compiled word. +* By ending every word with its own fetch/dispatch logic (e.g., `LODSD`, lookup, `JMP`), the CPU's branch predictor gets unique slots for every transition, drastically improving execution speed. + +## 3. The Architecture of "Source-Less" (x68) +To make manipulating binary data as easy as text, Lottes invented "x68"—a subset of x86-64 designed purely around 32-bit boundaries. + +* **32-Bit Instruction Granularity:** Every x86-64 instruction is padded to exactly 4 bytes (or multiples of 4). +* **Prefix Padding:** x86-64 allows ignored prefixes (like `3E`, the DS segment override) and multi-byte NOPs to pad instructions. + * *Example (RET):* `C3` padded to `f0f c3` or `C3 90 90 90` (RET + NOPs). + * *Example (Inline Data):* Moving a 32-bit immediate is padded with `3E`s to ensure the immediate value is perfectly 32-bit aligned in the next memory slot. +* **Why?** This removes the complexity of variable-length instructions, turning compilation into an edit-time operation where the user simply copies and pastes 32-bit words. + +## 4. Editor Mechanics & Annotation Overlay +The editor is an "Advanced 32-bit Hex Editor". The source code is literally the binary array. + +* **Structure:** The file is split into blocks. For every 32-bit source word, there are 64 bits of annotation memory. +* **64-bit Annotation Layout:** + * 8 characters encoded in 7 bits each (56 bits total) acting as the human-readable Label/Note. + * 8-bit Tag. 
This tag dictates how the 32-bit value in memory is formatted in the editor (e.g., Hex Data, Absolute Address, Relative Address). +* **Visual Layout:** The editor displays lines with two elements per cell: + * Top: The Annotation string (color-coded by tag). + * Bottom: The 32-bit interpreted value. +* **Auto-Relinking:** The editor dynamically recalculates `CALL`/`JMP` 32-bit relative offsets and 8-bit conditional jump offsets when tokens are inserted or deleted. The editor is the linker. + +## 5. Free-Form Source & Argument Fetching +Lottes diverges from strict zero-operand Forth by introducing "preemptive scatter" arguments directly in the source stream. + +* **Source is the Dictionary:** The 32-bit words are direct absolute memory pointers into the binary. +* **Argument Fetching:** Instead of pushing to a data stack before calling, words can read ahead in the instruction stream. + * `[RSI]` points to the current word. + * `[RSI+4]`, `[RSI+8]` can be fetched directly into registers (like `RCX`, `RDX`) within the word's implementation. +* **Benefits:** This reduces branch granularity and eliminates stack shuffling overhead, making it much faster for heavy code-generation tasks (like JITing GPU shaders). + +## 6. The Self-Modifying OS Cartridge +To handle persistent storage and live updates without complex OS APIs, Lottes leverages Linux's memory mapping and dirty page writeback. + +* **The Execution Loop:** + 1. Launch `cart` (the binary). + 2. The binary copies itself to `cart.bck` and launches `cart.bck`. + 3. `cart.bck` maps the original `cart` file into memory (e.g., at the 6MiB mark) with Read/Write/Execute (RWE) permissions. + 4. It maps an adjustable zero-fill memory space immediately following it. + 5. It jumps into the interpreter. +* **Persistence:** Because the file is mapped into memory, any changes made in the editor modify the file in RAM. 
The Linux kernel automatically writes these "dirty pages" back to the physical disk (e.g., every 30 seconds on SteamOS/Steam Deck), provided the file is mapped as a shared mapping (`MAP_SHARED`) rather than a private one. There is no "Save File" code required; data and code reside together and persist implicitly. \ No newline at end of file
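The persistence mechanism described in "The Self-Modifying OS Cartridge" can be sketched in C as follows. This is a minimal illustration, not Lottes's actual code: the helper names `cart_map`, `cart_poke`, and `cart_unmap` are hypothetical, the mapping address is left to the kernel instead of being pinned at the 6MiB mark, and `PROT_EXEC` is omitted so the sketch also works on filesystems mounted `noexec`. The one property it depends on is that stores into a `MAP_SHARED` file mapping reach the file through the kernel's page writeback, without any explicit write call.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file at `path` read/write as a shared
 * mapping. Returns the base address and stores the length in *len_out,
 * or returns NULL on failure. */
void *cart_map(const char *path, size_t *len_out) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED is the important flag: with MAP_PRIVATE the edits would
     * stay in anonymous copy-on-write pages and never reach the file. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}

/* Hypothetical helper: overwrite one 32-bit token in place. The store
 * dirties the page; the kernel writes it back to the file later. */
void cart_poke(void *base, size_t index, uint32_t value) {
    assert(base != NULL);
    ((uint32_t *)base)[index] = value;
}

/* Hypothetical helper: drop the mapping. Dirty pages of a shared mapping
 * are still written back after munmap. */
int cart_unmap(void *base, size_t len) {
    return munmap(base, len);
}
```

An editor built this way would call `cart_map` once at startup and `cart_poke` for each edited token, with no save routine at all. Calling `msync` would force an earlier flush, but the design described above deliberately relies on the kernel's periodic writeback instead.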