# In-Depth Chronological Breakdown of Source-Less Programming Reference Videos

This document provides an exhaustive chronological paraphrase of the technical specifics, on-screen visuals, and mechanical explanations given by Timothy Lottes and Onat Türkçüoğlu.

---

## 1. "Forth Day 2020 - Preview of x64 & ColorForth & SPIR V" (Onat, 2020)

**0:00 - 3:00 | Introduction & The Editor Visuals**

Onat introduces his month-old Forth implementation, inspired by ColorForth.

* **Screen Details:** A custom 3-pane UI rendered in C and Vulkan. The left and center panes show block-based colored tokens; the right pane displays live x64 assembly output that updates instantly as he edits.

* The editor treats code blocks as tracked state objects, supporting native undo/redo.

**3:00 - 6:00 | O(1) Dictionary Lookup & Execution Tracing**

* To avoid hashing, his compiler allocates an extra 4 bytes per source character, used solely to store the *source memory location* of the word currently being compiled.

* **Visual Feature:** "Jump to Definition" and an "Execution Trace" overlay. He demonstrates a command that instantly numbers every occurrence of a word across the codebase in the order those occurrences execute, providing a "compile-time call graph" without running the program.
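
A per-character side table of this kind can be sketched in C. This is a minimal model under stated assumptions, not Onat's implementation: it assumes each 4-byte slot holds the source offset of the definition of the word covering that character, and `SourceIndex`, `record_definition`, and `jump_to_definition` are hypothetical names. Once the compiler has filled the table, jump-to-definition is a single array read, with no hashing or string comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical side table: one uint32_t (4 extra bytes) per source character. */
typedef struct {
    const char *src;    /* source text */
    size_t len;         /* number of characters */
    uint32_t *def_loc;  /* def_loc[i]: source offset of the definition of the word covering char i */
} SourceIndex;

static int source_index_init(SourceIndex *ix, const char *src) {
    ix->src = src;
    ix->len = strlen(src);
    ix->def_loc = calloc(ix->len ? ix->len : 1, sizeof(uint32_t));  /* zero-initialized */
    return ix->def_loc != NULL;
}

/* Called by the compiler once it resolves the word occupying [start, start + n). */
static void record_definition(SourceIndex *ix, size_t start, size_t n, uint32_t def_offset) {
    assert(start + n <= ix->len);
    for (size_t i = start; i < start + n; i++)
        ix->def_loc[i] = def_offset;
}

/* Jump to definition: one array read, no hashing or string comparison. */
static uint32_t jump_to_definition(const SourceIndex *ix, size_t cursor) {
    assert(cursor < ix->len);
    return ix->def_loc[cursor];
}

static void source_index_free(SourceIndex *ix) {
    free(ix->def_loc);
    ix->def_loc = NULL;
}
```

For the hypothetical source `: sq dup * ; 3 sq`, recording the second `sq` (characters 15 and 16) against offset 2 makes `jump_to_definition(&ix, 16)` return 2.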

**6:00 - 11:00 | The High-Level x64 Macro Assembler & SPIR-V**

* **Screen Details:** The syntax reads like `AX to BX` or `CX + offset`. With a "direction register" macro toggled, the token sequence he narrates as "from AX to BX register, let's move an unsigned" compiles to a 32-bit `mov ebx, eax`; adding the `long` modifier emits the 64-bit `mov rbx, rax` instead.

* He uses the same macro assembler to generate SPIR-V. He notes that x64 was actually less complicated than SPIR-V: x64 is a flat instruction stream, whereas SPIR-V requires strict sections, type declarations, and capabilities, which forced him to introduce "sections" into his JIT.
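
How a direction flag and a `long` modifier can select between these encodings is sketched below. This is a minimal, hypothetical emitter (`emit_mov_rr` and the `Reg` enum are invented here, not Onat's macro assembler). It handles only register-to-register `mov` in the `89 /r` form, where the ModRM byte is `0xC0 | src << 3 | dst`, and prefixes REX.W (`0x48`) for 64-bit operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x64 numbers for the eight legacy registers (no REX.R/REX.B bits needed). */
enum Reg { AX = 0, CX = 1, DX = 2, BX = 3, SP = 4, BP = 5, SI = 6, DI = 7 };

/*
 * Encode a register-to-register mov using opcode 0x89 (MOV r/m, r).
 * direction_to: nonzero reads the operands as "from a to b", zero as "from b to a".
 * is_long: emit REX.W so the move is 64-bit (mov rbx, rax) rather than 32-bit (mov ebx, eax).
 * Returns the number of bytes written to out (at most 3).
 */
static size_t emit_mov_rr(uint8_t *out, enum Reg a, enum Reg b, int direction_to, int is_long) {
    enum Reg src = direction_to ? a : b;
    enum Reg dst = direction_to ? b : a;
    size_t n = 0;
    assert(a <= DI && b <= DI);
    if (is_long)
        out[n++] = 0x48;                             /* REX.W */
    out[n++] = 0x89;                                 /* MOV r/m, r */
    out[n++] = (uint8_t)(0xC0 | (src << 3) | dst);  /* mod=11, reg=src, rm=dst */
    return n;
}
```

With `direction_to` set, `AX`/`BX` yields `89 C3` (`mov ebx, eax`); adding `is_long` yields `48 89 C3` (`mov rbx, rax`). Clearing `direction_to` reverses the operands to `89 D8` (`mov eax, ebx`).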

---

## 2. "4th And Beyond" (Timothy Lottes, NeoKineoGfx, 2026)

**0:00 - 8:00 | HP48 Evolution & ColorForth Mechanics**

* Lottes advocates removing compilers, linkers, and debuggers from the workflow. He starts from the HP48 calculator's RPN as the baseline.

* **Screen Details:** He defines a red word `4K` that points to the next item on the data stack. Typing `1024 4 *` then computes `4096` into that slot, so `4K` acts as a variable.

* He defines `DROP` as `add esi, -4` followed by `ret`, moving the data stack pointer back one 4-byte cell. `4K 1 2 + DROP` therefore yields 4096: `1 2 +` produces 3, which `DROP` discards.

* He reviews ColorForth: Code compiles onto the data stack. Yellow = Execute, Red = Define, Green = Compile, Magenta = Variable. A Yellow-to-Green transition pops the stack and emits a `push` instruction.

* **Screen Details:** Disassembly of Block 24/26 shows `168B 2, C28B0689 ,`. These words emit raw bytes (literally byte-banging machine code): `168B 2,` lays down `8B 16` and `C28B0689 ,` lays down `89 06 8B C2` (little-endian), which disassemble to `mov edx, dword ptr [esi]`, `mov dword ptr [esi], eax`, and `mov eax, edx`. Together they swap the top of stack (`EAX`) with the next item at `[ESI]`.
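
The byte-laying words can be modeled in a few lines of C. This sketch assumes colorForth-style `2,` and `,` semantics (append the low 2 or 4 bytes of a number, least significant byte first, at the current compile position); `CodeBuf`, `comma_n`, `compile_swap`, and the buffer size are hypothetical. Feeding it `168B` with a 2-byte width and `C28B0689` with a 4-byte width reproduces the bytes quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t bytes[64];  /* hypothetical code space */
    size_t here;        /* next free byte, like Forth's HERE */
} CodeBuf;

/* Append the low n bytes of value, least significant first (x86 is little-endian). */
static void comma_n(CodeBuf *cb, uint32_t value, int n) {
    assert(n >= 1 && n <= 4 && cb->here + (size_t)n <= sizeof cb->bytes);
    for (int i = 0; i < n; i++)
        cb->bytes[cb->here++] = (uint8_t)(value >> (8 * i));
}

/* "168B 2,  C28B0689 ," : swap with EAX as top of stack and [ESI] as the next item. */
static void compile_swap(CodeBuf *cb) {
    comma_n(cb, 0x168B, 2);      /* 8B 16    mov edx, [esi] */
    comma_n(cb, 0xC28B0689, 4);  /* 89 06    mov [esi], eax */
                                 /* 8B C2    mov eax, edx   */
}
```

After `compile_swap`, `here` is 6 and the buffer holds `8B 16 89 06 8B C2`.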

**8:00 - 20:00 | Branch Misprediction, Folded Interpreter, & x68**

* In a standard Forth interpreter, dispatching on each token's tag goes through one shared branch, which causes frequent branch-misprediction stalls of about 16 clocks.

* **The Folded Interpreter:** Lottes fixes this by folding a 5-byte interpreter into the end of every word: `LODSD`, lookup, `JMP RAX`. Every transition gets its own branch predictor slot.

* **x68 Architecture:** All instructions are padded to 32-bit boundaries. `RET` (`C3`) is padded with three `NOP`s (`90 90 90`). `MOV ESI, imm32` is padded with a `3E` prefix, a DS segment override that 64-bit mode ignores.

* This makes relative offsets (`CALL`, `JMP`) align perfectly. The editor auto-relinks offsets as tokens are inserted or deleted.

* **Assembly Shorthand:** The editor displays `add rcx, qword ptr [rdx + 0x8]` as the visual shorthand `h + at i08`.
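
The effect of folding the dispatch into every word can be approximated in C with the GNU "labels as values" extension (supported by GCC and Clang, not ISO C). This is an analogy, not Lottes's x68 machine code: the opcodes and `run` are hypothetical. Each handler ends with its own copy of the `NEXT` fetch-and-jump, which typically compiles to a separate indirect jump per handler (the compiler is free to merge them), so each transition can get its own predictor entry. The program replays the `4K 1 2 + DROP` example with the literal 4096 standing in for `4K`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy threaded program. */
enum { OP_LIT, OP_ADD, OP_DROP, OP_HALT };

/* Requires GCC or Clang: uses "labels as values" (&&label, goto *ptr). */
static int32_t run(const int32_t *ip) {
    static void *const dispatch[] = { &&op_lit, &&op_add, &&op_drop, &&op_halt };
    int32_t stack[16];
    int sp = 0;

/* Each handler ends with its own copy of this fetch-and-jump (the "folded" interpreter). */
#define NEXT goto *dispatch[*ip++]

    NEXT;
op_lit:
    assert(sp < 16);
    stack[sp++] = *ip++;
    NEXT;
op_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT;
op_drop:
    assert(sp >= 1);
    sp--;
    NEXT;
op_halt:
#undef NEXT
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Running `run` on the program `LIT 4096, LIT 1, LIT 2, ADD, DROP, HALT` returns 4096: `ADD` leaves 3 on top and `DROP` discards it.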

**20:00 - End | Live Execution (SteamOS/Linux)**

* Lottes targets a mix of high-level JIT code and raw, sourceless x68 code.

* **Cartridge execution:** The binary copies itself to `cart.back`, maps into memory at a fixed address (bypassing ASLR), and provides a zero-filled space. 32-bit tokens act as direct absolute memory pointers, removing lookup overhead.
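
On Linux, the fixed-address, zero-filled space and the use of 32-bit tokens as absolute pointers can be sketched with `mmap`. This is a hypothetical illustration, not Lottes's loader: the address `0x40000000`, the size, and the helper names are assumptions, and copying the binary to `cart.back` is omitted. The address is chosen below 4 GiB so every location in the mapping fits in a 32-bit token, and anonymous mappings start out zero-filled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define CART_BASE ((uintptr_t)0x40000000u)  /* hypothetical fixed address below 4 GiB */
#define CART_SIZE ((size_t)1 << 20)         /* 1 MiB of zero-filled space */

/* Map the cartridge space at exactly CART_BASE; returns NULL if that address is unavailable. */
static uint8_t *map_cart(void) {
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_FIXED_NOREPLACE
    flags |= MAP_FIXED_NOREPLACE;  /* fail rather than clobber an existing mapping */
#endif
    void *p = mmap((void *)CART_BASE, CART_SIZE, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p != CART_BASE) {  /* older kernels may treat the address as a mere hint */
        munmap(p, CART_SIZE);
        return NULL;
    }
    return (uint8_t *)p;
}

/* A 32-bit token is simply the absolute address: resolving it needs no table lookup. */
static uint32_t token_from_ptr(const void *p) { return (uint32_t)(uintptr_t)p; }
static void *ptr_from_token(uint32_t token) { return (void *)(uintptr_t)token; }
```

With the mapping in place, a token such as `0x40000040` names byte 64 of the cartridge directly.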

---

## 3. "Metaprogramming VAMP in KYRA" (Onat, SVFIG, 2025-04-26)

This presentation contains the most explicit low-level details about Onat's binary-encoded compiler, VAMP.

**0:00 - 10:00 | The Binary Editor, Compilation Speed, & The 2-Item Stack**

* VAMP compiles the entire program (Vulkan renderers, UI) in **8.24 milliseconds** on Windows/Linux. His previous text-based Forth took 16-17.8 ms just to compile the editor.

* **Hardware Locality & The Stack:** Traditional Forth is "runtime opinionated": its data stack lives in memory, which makes GPU compute shaders difficult to target. Onat strictly restricts the stack to two CPU registers: **`RAX` and `RDX`**.

* **Screen Details:** The stack state is constantly visualized in the top left corner.

* **Magenta Pipe `|`:** There are no `begin` or `end` definition words. A magenta pipe token implicitly ends the previous definition (compiling a `ret`) and starts the new one. Spaces between words imply execution.
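
What a two-register stack implies can be shown with a small model. In this hypothetical C sketch (not VAMP's code), `rax` is the top of stack and `rdx` the item below it. It assumes a push copies `rax` into `rdx` before loading the new value, so a third live value pushes the oldest one out; anything beyond two values therefore has to live in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a data stack held entirely in two registers. */
typedef struct {
    uint64_t rax;  /* top of stack */
    uint64_t rdx;  /* second item */
} RegStack;

/* Push: the old top moves to RDX (the old RDX is lost), the new value goes to RAX. */
static void push(RegStack *s, uint64_t v) {
    s->rdx = s->rax;
    s->rax = v;
}

/* Binary operators consume both registers; the result lands in RAX and RDX is treated as dead. */
static void add(RegStack *s) { s->rax = s->rdx + s->rax; }
static void mul(RegStack *s) { s->rax = s->rdx * s->rax; }

/* Swap corresponds to a single "xchg rax, rdx" in generated x64 code. */
static void swap(RegStack *s) {
    uint64_t t = s->rax;
    s->rax = s->rdx;
    s->rdx = t;
}
```

Pushing `1024` and `4` and multiplying leaves `4096` in `rax`; pushing `1` and `2` afterwards leaves only `1` and `2` in the registers.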

**10:00 - 18:00 | Dictionary Management, UX, & Indexing**

* **Dictionary Encoding:** Words are stored as 24-bit indices pointing to 8-byte cells, packed with an 8-bit color tag. (He notes the next iteration will use 32-bit indices plus a separate 1-byte tag block, for faster skipping of empty blocks.)

* This pure index mapping eliminates hashing and string parsing. It also enables IP protection: you can ship the source indices without the symbols/dictionary. The core language is just 2 to 4 KB.

* **Screen Details:** Words are organized explicitly into 16-word horizontal "scrolls" (e.g., "Vulkan API", "FFMPEG", "x64 Assembly"). He presses `Ctrl+Space` to manually redefine a word in a specific scroll.

* **Comments:** A comment (Blue tag) is encoded as a string directly inside the 24-bit payload (3 characters).
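
The token layout can be sketched as bit packing in C. The field order is an assumption (the talk gives only the widths): here the 8-bit color tag sits in the low byte and the 24-bit payload in the upper three bytes, and the tag values and helper names are hypothetical. For a word, the payload indexes an array of 8-byte cells; for a comment, it holds three ASCII characters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; only the tags' existence, not their numbering, comes from the talk. */
enum Color { TAG_DEFINE = 1, TAG_CALL = 2, TAG_COMMENT = 3 };

/* Token = 24-bit payload << 8 | 8-bit tag (assumed field order). */
static uint32_t pack(uint8_t tag, uint32_t payload) {
    assert(payload < (1u << 24));
    return (payload << 8) | tag;
}
static uint8_t token_tag(uint32_t tok) { return (uint8_t)(tok & 0xFF); }
static uint32_t token_payload(uint32_t tok) { return tok >> 8; }

/* A word index selects an 8-byte cell directly: no hashing, no string compare. */
static uint64_t *cell_of(uint64_t *cells, uint32_t tok) { return &cells[token_payload(tok)]; }

/* A comment token stores three characters in its 24-bit payload. */
static uint32_t pack_comment(const char c[3]) {
    uint32_t p = (uint32_t)(uint8_t)c[0] | (uint32_t)(uint8_t)c[1] << 8 | (uint32_t)(uint8_t)c[2] << 16;
    return pack(TAG_COMMENT, p);
}
static void unpack_comment(uint32_t tok, char out[4]) {
    uint32_t p = token_payload(tok);
    out[0] = (char)(p & 0xFF);
    out[1] = (char)((p >> 8) & 0xFF);
    out[2] = (char)((p >> 16) & 0xFF);
    out[3] = '\0';
}
```

Under this layout, a call to word 5 is the token `0x502`, and resolving it is just `&cells[5]`.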

**18:00 - 28:00 | Data Flow Visualization & Global Memory**

* **Free Printf:** Hovering over a word injects code that records `RAX` and `RDX` at that point. Pressing Previous/Next steps visually through the execution flow.

* **Global Variables vs. Stacks:** Since the stack only holds two items, he passes complex state entirely through global memory. He explicitly critiques Rust's "safe programming" for forcing developers to pass state through 30 layers of call stacks.

* **Single-Register Memory Access:** He dedicates a single CPU register to act as the base pointer for all program memory, giving instant access to "gigabytes of state".
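
Addressing all program state through one base pointer can be sketched in C. In this hypothetical model, `g_base` stands in for the dedicated register and every global is a fixed byte offset from it; the offsets and helper names are invented here, and `memcpy` keeps the accesses well-defined in C. In generated x64 code, each access corresponds to one `[base + disp32]` addressing form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for the one register reserved as the base of all program memory. */
static uint8_t *g_base;

/* Hypothetical fixed offsets of globals inside that memory. */
enum { OFF_FRAME = 0, OFF_SLOT_ID = 8, OFF_CONDITION = 16, GLOBALS_SIZE = 4096 };

static uint64_t load64(uint32_t off) {
    uint64_t v;
    memcpy(&v, g_base + off, sizeof v);
    return v;
}

static void store64(uint32_t off, uint64_t v) { memcpy(g_base + off, &v, sizeof v); }

static int globals_init(void) {
    g_base = calloc(1, GLOBALS_SIZE);  /* all state starts zeroed */
    return g_base != NULL;
}
```

Any word can then reach a value such as the one at `OFF_SLOT_ID` without it being threaded through a chain of calls.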

**28:00 - 45:00 | Syntax, Tags, and JIT Assembly Mechanics**

* He demonstrates compiling Vulkan commands. Instead of typing `vkGetSwapchainImagesKHR`, he defines a word `get swap chain images` in the `vk device` scroll.

* **The `xchg` Trick (`48 92`):** Because the stack is just `RAX` and `RDX`, keeping `RAX` as the top of stack is vital. He explicitly notes that `xchg rax, rdx` compiles to just two bytes: `48 92` (REX.W + `xchg eax, edx`). Before starting a definition or making a call, the JIT emits `48 92` to ensure `RAX` holds the top of stack.

* **Color Semantics:**
  * **White (Call):** Emits a `CALL` or `JMP RAX` (e.g., `FFE0`).
  * **Green (Load):** Emits `mov rax, [global_offset]`.
  * **Red (Store):** Emits `mov [global_offset], rax`.
  * **Yellow (Immediate/Execute):** Used heavily. For a number, emits `mov rax, imm`. Also used to invoke a lambda block.
  * **Blue (Comment):** Ignored.
  * **Cyan (Number):** Data literal.
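
Several of these emissions can be written out as byte sequences. This is a hypothetical sketch, not VAMP's JIT. Using `R15` as the base register for globals is an assumption, and so are the function names. The encodings themselves are standard x64: `48 92` for `xchg rax, rdx`, `48 B8 imm64` for `mov rax, imm64`, `FF E0` for `jmp rax`, and `49 8B 87 disp32` / `49 89 87 disp32` for a load or store relative to `R15`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t b[256];  /* hypothetical code buffer */
    size_t n;        /* bytes emitted so far */
} Jit;

static void emit8(Jit *j, uint8_t v) { assert(j->n < sizeof j->b); j->b[j->n++] = v; }
static void emit32(Jit *j, uint32_t v) { for (int i = 0; i < 4; i++) emit8(j, (uint8_t)(v >> (8 * i))); }
static void emit64(Jit *j, uint64_t v) { for (int i = 0; i < 8; i++) emit8(j, (uint8_t)(v >> (8 * i))); }

/* xchg rax, rdx: REX.W + 92 (short-form xchg with the accumulator). */
static void emit_swap(Jit *j) { emit8(j, 0x48); emit8(j, 0x92); }

/* Yellow number: mov rax, imm64. */
static void emit_number(Jit *j, uint64_t imm) { emit8(j, 0x48); emit8(j, 0xB8); emit64(j, imm); }

/* Green load: mov rax, [r15 + disp32] (R15 assumed to hold the globals base). */
static void emit_load(Jit *j, uint32_t off) { emit8(j, 0x49); emit8(j, 0x8B); emit8(j, 0x87); emit32(j, off); }

/* Red store: mov [r15 + disp32], rax. */
static void emit_store(Jit *j, uint32_t off) { emit8(j, 0x49); emit8(j, 0x89); emit8(j, 0x87); emit32(j, off); }

/* Transfer control to the address in RAX: jmp rax. */
static void emit_jmp_rax(Jit *j) { emit8(j, 0xFF); emit8(j, 0xE0); }
```

Emitting a number, a store, a swap, a load, and a jump in sequence produces 28 bytes, with `48 92` marking the register swap.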

**45:00 - 55:00 | Lambdas `{ }` & Basic Blocks `[ ]`**

* He explicitly eliminates `if/else` ASTs.

* **Lambdas `{ }`:** Defining a lambda block (Yellow `{`) does not execute it. It compiles the block elsewhere and leaves its executable memory address in `RAX`.

* **Basic Blocks `[ ]`:** These define a constrained range of assembly with implicit begin, link, and end jump targets.

* **Conditionals in Blocks:** He shows checking `if luma > 0.6`. He explicitly creates a `condition` variable (e.g., `26E`). The `>` operator consumes the values and writes the boolean to `condition`. The conditional word then reads `condition` and consumes the lambda address from `RAX`, emitting a `cmp condition, 0` and `jz lambda_address`.
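
The flow of a conditional through a global `condition` variable can be approximated in C, with a function pointer standing in for the lambda address left in `RAX`. This is a behavioral analogy under stated assumptions, not the machine code VAMP emits. `greater_than`, `when`, and `brighten` are hypothetical names, and the sketch runs the lambda when `condition` is nonzero.

```c
#include <assert.h>

/* Global written by comparison words and read by the conditional word. */
static long condition;

/* ">" consumes two values and records the result in the global, not on the stack. */
static void greater_than(double a, double b) { condition = a > b; }

/* The conditional word: consumes a lambda address and runs it if the condition holds. */
static void when(void (*lambda)(void)) {
    if (condition != 0)
        lambda();
}

/* Hypothetical lambda body and the state it touches. */
static int brightened;
static void brighten(void) { brightened++; }
```

With `luma` at 0.7, `greater_than(0.7, 0.6)` sets `condition` and `when(brighten)` runs the lambda; at 0.5 it is skipped.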

**55:00 - 1:10:00 | FFI, Stack Pointers, and OS Interop**

* **`RSP` Alignment:** The hardware stack pointer (`RSP`) is used exclusively for the call stack, eliminating buffer overflows. When calling OS or external APIs (such as FFMPEG), he explicitly saves `RSP` into a variable, aligns it to 16 bytes (as the C ABI requires), makes the call, and restores it.

* **Filling Structs:** For `VkImageCreateInfo`, he uses a temporary variable `$` (dollar sign) and no C headers. He knows `14` is the structure type ID (`VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO`) and writes each field manually at its offset in contiguous memory (e.g., `info + offset`).
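
Filling a Vulkan struct by offset, without headers, can be sketched as plain byte writes. The offsets below are those of `VkImageCreateInfo` on 64-bit targets (an assumption worth checking against `vulkan_core.h`), the constants are `VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO` = 14 and `VK_IMAGE_TYPE_2D` = 1, and the helper names are hypothetical. Only a few fields are set; the rest stay zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within VkImageCreateInfo on a 64-bit target (assumed; verify against vulkan_core.h). */
enum {
    IMG_INFO_SIZE = 88,
    OFF_STYPE = 0,         /* VkStructureType       */
    OFF_PNEXT = 8,         /* const void*           */
    OFF_FLAGS = 16,        /* VkImageCreateFlags    */
    OFF_IMAGE_TYPE = 20,   /* VkImageType           */
    OFF_EXTENT_W = 28,     /* VkExtent3D.width      */
    OFF_EXTENT_H = 32,     /* VkExtent3D.height     */
    OFF_EXTENT_D = 36,     /* VkExtent3D.depth      */
    OFF_MIP_LEVELS = 40,
    OFF_ARRAY_LAYERS = 44,
};

static void put32(uint8_t *info, int off, uint32_t v) { memcpy(info + off, &v, sizeof v); }
static uint32_t get32(const uint8_t *info, int off) { uint32_t v; memcpy(&v, info + off, sizeof v); return v; }

/* Fill the fields needed for a simple 2D image; everything else stays zero. */
static void fill_image_info(uint8_t *info, uint32_t w, uint32_t h) {
    memset(info, 0, IMG_INFO_SIZE);   /* also zeroes pNext and flags */
    put32(info, OFF_STYPE, 14);       /* VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO */
    put32(info, OFF_IMAGE_TYPE, 1);   /* VK_IMAGE_TYPE_2D */
    put32(info, OFF_EXTENT_W, w);
    put32(info, OFF_EXTENT_H, h);
    put32(info, OFF_EXTENT_D, 1);
    put32(info, OFF_MIP_LEVELS, 1);
    put32(info, OFF_ARRAY_LAYERS, 1);
}
```

The same "base plus offset" writes are what an `info + offset` word would perform on the raw memory.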

**1:10:00 - End | SPIR-V, Bug Triage, and Implicit Registers**

* **SPIR-V Generation:** VAMP directly emits SPIR-V. He shows the spec (opcode 194 is `OpShiftRightLogical`) and demonstrates a 4-line definition that writes opcode `194` (packed with the instruction's word count) and its operands into a binary vector, replacing a 100MB `glslang` compiler with ~256KB of VAMP code.

* **Bug Triage:** He does not use tests or asserts. He triages bugs by commenting out (disabling) blocks of code and recompiling (8 ms) until the crash stops.

* **Implicit Register Passing:** He shows UI hover logic where the `slot ID` is implicitly passed in register `R12D` across functions, completely avoiding pushing it to the data stack.

* **Lock Prefix:** Writing concurrent code is handled by the macro assembler. Placing the word `lock` before an `inc` token simply emits the `F0` prefix byte.
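
Both the SPIR-V emitter and the `lock` word come down to writing known numbers. A SPIR-V instruction's first word packs its word count into the high 16 bits and its opcode into the low 16, so `OpShiftRightLogical` (opcode 194, with result type, result id, base, and shift operands) starts with `(5 << 16) | 194`. On the x64 side, `lock` is the byte `F0` placed ahead of the instruction and ahead of any REX prefix, since REX must sit immediately before the opcode. The function names, ids, and the `inc qword [rax]` operand are hypothetical; this is a sketch, not VAMP's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append one SPIR-V OpShiftRightLogical: 5 words including the opcode word. */
static size_t spv_shift_right_logical(uint32_t *out, uint32_t result_type, uint32_t result_id,
                                      uint32_t base_id, uint32_t shift_id) {
    const uint32_t word_count = 5, opcode = 194;  /* OpShiftRightLogical */
    out[0] = (word_count << 16) | opcode;
    out[1] = result_type;
    out[2] = result_id;
    out[3] = base_id;
    out[4] = shift_id;
    return word_count;
}

/* lock inc qword [rax]: F0 (LOCK) + 48 (REX.W) + FF /0 with ModRM 00 000 000. */
static size_t emit_lock_inc_qword_rax(uint8_t *out) {
    out[0] = 0xF0;  /* lock prefix: the byte the `lock` word emits */
    out[1] = 0x48;  /* REX.W: 64-bit operand */
    out[2] = 0xFF;  /* INC r/m64 is FF /0 */
    out[3] = 0x00;  /* mod=00, reg=/0, rm=rax -> [rax] */
    return 4;
}
```

The first word of the shift instruction is therefore `0x000500C2`, and the locked increment is the four bytes `F0 48 FF 00`.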