forth_bootslop/references/Video_Breakdowns.md
2026-02-20 21:25:46 -05:00

In-Depth Chronological Breakdown of Source-Less Programming Reference Videos

This document provides an exhaustive, highly detailed chronological paraphrase of the technical specifics, screen visuals, and mechanical explanations provided by Timothy Lottes and Onat Türkçüoğlu.


1. "Forth Day 2020 - Preview of x64 & ColorForth & SPIR V" (Onat, 2020)

0:00 - 3:00 | Introduction & The Editor Visuals

  • Onat introduces his 1-month-old iteration of Forth, inspired by ColorForth.
  • Screen Details: A custom 3-pane UI rendered in C and Vulkan. Left/center panes show block-based colored tokens; the right pane displays live x64 assembly output that updates instantly as he edits.
  • The editor treats code blocks as tracked state objects, supporting native undo/redo.

3:00 - 6:00 | O(1) Dictionary Lookup & Execution Tracing

  • To avoid hashing, his compiler allocates an extra 4 bytes per character strictly to store the source memory location of the currently compiled word.
  • Visual Feature: "Jump to Definition" and an "Execution Trace" overlay. He demonstrates invoking a command that instantly numbers every occurrence of a word across the codebase in the exact chronological order of execution, providing a "compile-time call graph" without running the program.

6:00 - 11:00 | The High-Level x64 Macro Assembler & SPIR-V

  • Screen Details: Syntax like AX to BX or CX + offset. Toggling a "direction register" macro sets the direction of the move: AX to BX moves an unsigned value and emits the 32-bit mov ebx, eax. Modifiers like long emit the 64-bit mov rbx, rax instead.
  • He uses this same macro-assembler to generate SPIR-V. He notes x64 was actually less complicated than SPIR-V because x64 is a flat instruction stream, whereas SPIR-V requires strict sections, type declarations, and capabilities, forcing him to introduce "sections" into his JIT.
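
The two encodings mentioned above can be reproduced with a few lines of Python. This is only a sketch of the standard x86-64 encoding of a register-to-register `mov`, not Onat's macro assembler; the function name `encode_mov` and the `long` flag are hypothetical names chosen to mirror the notes.

```python
# Register numbers follow the standard x86-64 encoding (low 8 registers only).
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov(src: str, dst: str, long: bool = False) -> bytes:
    """Encode `mov dst, src` with opcode 89 /r (MOV r/m, r). Hypothetical helper."""
    modrm = 0xC0 | (REG[src] << 3) | REG[dst]  # mod=11 means register to register
    prefix = b"\x48" if long else b""           # REX.W selects 64-bit operands
    return prefix + bytes([0x89, modrm])

print(encode_mov("ax", "bx").hex(" "))             # 89 c3    -> mov ebx, eax
print(encode_mov("ax", "bx", long=True).hex(" "))  # 48 89 c3 -> mov rbx, rax
```

The only difference between the 32-bit and 64-bit forms is the single REX.W prefix byte, which is presumably why a one-word modifier is enough to switch between them.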

2. "4th And Beyond" (Timothy Lottes, NeoKineoGfx, 2026)

0:00 - 8:00 | HP48 Evolution & ColorForth Mechanics

  • Lottes advocates removing compilers, linkers, and debuggers. He starts with HP48's RPN as the baseline.
  • Screen Details: He defines a red word 4K pointing to the next item on the data stack. Typing 1024 4 * computes 4096. 4K acts as a variable.
  • He defines DROP pointing to add esi, -4 and ret. 4K 1 2 + DROP yields 4096.
  • He reviews ColorForth: Code compiles onto the data stack. Yellow = Execute, Red = Define, Green = Compile, Magenta = Variable. A Yellow-to-Green transition pops the stack and emits a push instruction.
  • Screen Details: Disassembly of Block 24/26 shows 168B 2 , C28B0689 ,. This pushes bytes onto the stack, disassembling to mov edx, dword ptr [esi] and mov dword ptr [esi], eax (literally byte-banging machine code).
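
The hex literals in the Block 24/26 listing read backwards because x86 stores integers little-endian. The sketch below models a `,`-style word as "append the low N bytes of a number"; the helper name `comma` and its signature are assumptions for illustration, not ColorForth source.

```python
def comma(value: int, count: int = 4) -> bytes:
    """Model of a `,`-style word: the low `count` bytes of `value`, little-endian."""
    mask = (1 << (8 * count)) - 1
    return (value & mask).to_bytes(count, "little")

# `168B 2 ,` lays down two bytes; `C28B0689 ,` lays down four.
code = comma(0x168B, 2) + comma(0xC28B0689)
print(code.hex(" "))  # 8b 16 89 06 8b c2
```

Read as x86 opcodes, `8b 16` is `mov edx, [esi]` and `89 06` is `mov [esi], eax`. The trailing `8b c2` decodes to `mov eax, edx`, which together with the first two instructions exchanges the top two stack items.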

8:00 - 20:00 | Branch Misprediction, Folded Interpreter, & x68

  • Standard Forth causes 16-clock branch misprediction stalls due to tag branching.
  • The Folded Interpreter: Lottes fixes this by folding a 5-byte interpreter into the end of every word: LODSD, lookup, JMP RAX. Every transition gets its own branch predictor slot.
  • x68 Architecture: Forces all instructions to 32-bit boundaries. RET (C3) is padded with three NOPs (90 90 90). MOV ESI, imm32 is padded with a 3E ignored DS prefix.
  • This makes relative offsets (CALL, JMP) align perfectly. The editor auto-relinks offsets as tokens are inserted/deleted.
  • Assembly Shorthand: Editor maps add rcx, qword ptr [rdx + 0x8] to visual h + at i08.
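
The fixed-width idea can be sketched as follows, assuming only what the notes state: every instruction occupies a whole number of 32-bit slots, and `RET` is filled out with NOPs. The helper names are hypothetical, and the prefix-style padding used for `MOV ESI, imm32` is not modeled here.

```python
NOP = 0x90  # one-byte x86 no-op
WORD = 4    # slot size in bytes, per the notes above

def pad_to_word(insn: bytes) -> bytes:
    """Append NOPs until the instruction fills a whole number of 4-byte slots."""
    return insn + bytes([NOP]) * (-len(insn) % WORD)

def slot_to_offset(slot: int) -> int:
    """Byte offset of a slot; every slot starts on a 32-bit boundary."""
    return slot * WORD

ret = pad_to_word(b"\xc3")
print(ret.hex(" "))       # c3 90 90 90
print(slot_to_offset(3))  # 12
```

Because every slot has the same width, inserting or deleting a token shifts all later code by a multiple of four bytes, which is what lets an editor patch relative offsets mechanically.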

20:00 - End | Live Execution (SteamOS/Linux)

  • Lottes targets a mix of high-level JIT and raw x68 sourceless.
  • Cartridge execution: The binary copies itself to cart.back, maps into memory at a fixed address (bypassing ASLR), and provides a zero-fill space. 32-bit tokens act as direct absolute memory pointers, removing lookup overhead.

3. "Metaprogramming VAMP in KYRA" (Onat, SVFIG, 2025-04-26)

This presentation contains the most explicit, hardcore low-level details regarding Onat's binary-encoded compiler (VAMP).

0:00 - 10:00 | The Binary Editor, Compilation Speed, & The 2-Item Stack

  • VAMP compiles the entire program (Vulkan renderers, UI) in 8.24 milliseconds on Windows/Linux. His previous text-based Forth took 16-17.8ms just to compile the editor.
  • Hardware Locality & The Stack: Traditional Forth is "runtime opinionated" with a memory data stack, making GPU compute shaders difficult. Onat strictly restricts the stack to two CPU registers: RAX and RDX.
  • Screen Details: The stack state is constantly visualized in the top left corner.
  • Magenta Pipe |: There are no begin or end definition words. A magenta pipe token implicitly signals the end of the previous definition (compiling a ret) and starts the new one. Spaces between words imply execution.

10:00 - 18:00 | Dictionary Management, UX, & Indexing

  • Dictionary Encoding: Words are stored as 24-bit indices pointing to 8-byte cells, packed with an 8-bit color tag. (He notes the next iteration will use 32-bit indices + a separate 1-byte tag block for faster skipping of empty blocks).
  • This pure index mapping eliminates hashing and string parsing. It allows IP-protection: you can ship the source indices without the symbols/dictionary. Core language is just 2 to 4 KB.
  • Screen Details: Words are organized explicitly into 16-word horizontal "scrolls" (e.g., "Vulkan API", "FFMPEG", "x64 Assembly"). He presses Ctrl+Space to manually redefine a word in a specific scroll.
  • Comments: A comment (Blue tag) is encoded as a string directly inside the 24-bit payload (3 characters).
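
A 32-bit token holding an 8-bit tag and a 24-bit index can be modeled as below. The bit positions (tag in the top byte) and the tag numbering are assumptions made for illustration; the notes only give the field widths.

```python
INDEX_BITS = 24
INDEX_MASK = (1 << INDEX_BITS) - 1

# Hypothetical tag numbering; the actual values are not given in these notes.
TAG_WHITE, TAG_GREEN, TAG_RED, TAG_YELLOW, TAG_BLUE, TAG_CYAN = range(6)

def pack_token(tag: int, index: int) -> int:
    """Assumed layout: tag in the top 8 bits, 24-bit index in the low bits."""
    return ((tag & 0xFF) << INDEX_BITS) | (index & INDEX_MASK)

def unpack_token(token: int):
    """Return (tag, index) for a packed 32-bit token."""
    return token >> INDEX_BITS, token & INDEX_MASK

def pack_comment(text: str) -> int:
    """Store up to 3 ASCII characters directly in the 24-bit payload."""
    raw = text.encode("ascii")[:3].ljust(3, b" ")
    return pack_token(TAG_BLUE, int.from_bytes(raw, "little"))

def unpack_comment(token: int) -> str:
    """Recover the 3 characters stored in a comment token."""
    _, payload = unpack_token(token)
    return payload.to_bytes(3, "little").decode("ascii")

token = pack_token(TAG_WHITE, 0x000123)
print(unpack_token(token))                  # (0, 291)
print(unpack_comment(pack_comment("abc")))  # abc
```

Since a token is just an index, resolving a word is an array access rather than a string hash, and the symbol table can be withheld without changing the token stream.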

18:00 - 28:00 | Data Flow Visualization & Global Memory

  • Free Printf: Hovering over a word injects code to record RAX and RDX. Pressing Previous/Next steps through the execution flow visually.
  • Global Variables vs. Stacks: To pass complex state (since the stack only holds two items), he relies entirely on global memory. He explicitly critiques Rust's "safe programming" for forcing developers to pass state through 30 layers of call stacks.
  • Single-Register Memory Access: He dedicates a single CPU register to act as the base pointer for all program memory, giving instant access to "gigabytes of state".

28:00 - 45:00 | Syntax, Tags, and JIT Assembly Mechanics

  • He demonstrates compiling Vulkan commands. Instead of typing vkGetSwapchainImagesKHR, he defines a word get swap chain images in the vk device scroll.
  • The xchg Trick (48 92): Because the stack is just RAX and RDX, keeping RAX as the Top of Stack is vital. He explicitly notes that xchg rax, rdx compiles to just two bytes: 48 92 (REX.W + xchg eax, edx). Before starting a definition or making a call, the JIT emits 48 92 to ensure RAX holds the top of the stack.
  • Color Semantics:
    • White (Call): Emits a CALL or JMP RAX (e.g., FF E0, the encoding of JMP RAX).
    • Green (Load): Emits mov rax, [global_offset].
    • Red (Store): Emits mov [global_offset], rax.
    • Yellow (Immediate/Execute): Used heavily. For a number, emits mov rax, imm. Also used to invoke a lambda block.
    • Blue (Comment): Ignored.
    • Cyan (Number): Data literal.
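
The byte sequences behind these colors can be sketched with a handful of emit helpers. The encodings are standard x86-64, but two things are assumptions: the helper names, and the choice of RBX as the dedicated memory base register (the notes do not say which register Onat uses).

```python
import struct

XCHG_RAX_RDX = bytes.fromhex("4892")  # REX.W + xchg eax, edx
CALL_RAX = bytes.fromhex("ffd0")      # indirect call through RAX
JMP_RAX = bytes.fromhex("ffe0")       # indirect jump through RAX

def emit_literal(value: int) -> bytes:
    """Yellow number: mov rax, imm64 (REX.W + B8 followed by 8 immediate bytes)."""
    return b"\x48\xb8" + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)

def emit_load(offset: int) -> bytes:
    """Green: mov rax, [rbx + disp32]. RBX as the base register is an assumption."""
    return b"\x48\x8b\x83" + struct.pack("<i", offset)

def emit_store(offset: int) -> bytes:
    """Red: mov [rbx + disp32], rax. RBX as the base register is an assumption."""
    return b"\x48\x89\x83" + struct.pack("<i", offset)

# Load a global, swap the two stack registers, then push a literal into RAX.
code = emit_load(0x10) + XCHG_RAX_RDX + emit_literal(42)
print(code.hex(" "))
```

Each color maps to a short, fixed byte template plus an operand, which fits the notes' description of a compiler with no parsing or AST stage.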

45:00 - 55:00 | Lambdas { } & Basic Blocks [ ]

  • He explicitly eliminates if/else ASTs.
  • Lambdas { }: Defining a lambda block (Yellow {) does not execute it. It compiles the block elsewhere and leaves its executable memory address in RAX.
  • Basic Blocks [ ]: These define a constrained range of assembly with implicit begin, link, and end jump targets.
  • Conditionals in Blocks: He shows checking if luma > 0.6. He explicitly creates a condition variable (e.g., 26E). The > operator consumes the values and writes the boolean to condition. The conditional word then reads condition and consumes the lambda address from RAX, emitting a cmp condition, 0 and jz lambda_address.

55:00 - 1:10:00 | FFI, Stack Pointers, and OS Interop

  • RSP Alignment: The hardware stack pointer (RSP) is used exclusively for the call stack, which he says eliminates buffer overflows. When calling external C APIs (like FFMPEG), he explicitly reads RSP into a variable, aligns RSP to 16 bytes (as the C ABI requires), makes the call, and then restores the saved value.
  • Filling Structs: For VkImageCreateInfo, he uses a temporary variable $ (Dollar sign). He doesn't use C headers. He knows that 14 is the structure type ID (sType) and manually writes each field at its offset in the contiguous memory space (e.g., info + offset).
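
Both FFI details reduce to plain integer arithmetic on addresses. The sketch below fills two fields of a `VkImageCreateInfo`-sized buffer by offset and shows the 16-byte alignment mask. The offsets and the 88-byte size are my own reading of the 64-bit C layout of that struct and should be treated as assumptions; the helper name `align_down_16` is hypothetical.

```python
import struct

VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO = 14  # the "Type ID" from the notes
VK_IMAGE_TYPE_2D = 1

# Assumed 64-bit layout: sType at 0, pNext at 8, flags at 16, imageType at 20.
OFF_STYPE = 0
OFF_IMAGE_TYPE = 20
SIZEOF_IMAGE_CREATE_INFO = 88

info = bytearray(SIZEOF_IMAGE_CREATE_INFO)  # zero-filled scratch memory
struct.pack_into("<I", info, OFF_STYPE, VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO)
struct.pack_into("<I", info, OFF_IMAGE_TYPE, VK_IMAGE_TYPE_2D)

def align_down_16(rsp: int) -> int:
    """Clear the low 4 bits so the stack pointer sits on a 16-byte boundary."""
    return rsp & ~0xF

print(info[:4].hex(" "))                 # 0e 00 00 00
print(hex(align_down_16(0x7FFDEADBEEF)))  # 0x7ffdeadbee0
```

Writing fields by offset trades the safety of a header file for zero build-time dependencies, so a wrong offset fails silently rather than at compile time.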

1:10:00 - End | SPIR-V, Bug Triage, and Implicit Registers

  • SPIR-V Generation: VAMP directly emits SPIR-V. He shows the spec (Opcode 194 is Shift Right Logical) and demonstrates a 4-line definition that writes exactly 194 and its operands into a binary vector, replacing a 100MB glslang compiler with ~256KB of VAMP code.
  • Bug Triage: He does not use tests or asserts. He triages bugs by commenting out blocks of code (disabling them) and hitting compile (8ms) until the crash stops.
  • Implicit Register Passing: He shows UI hover logic where the slot ID is implicitly passed in register R12D across functions, completely avoiding pushing it to the data stack.
  • Lock Prefix: Writing concurrent code is handled by the macro assembler. Placing the word lock before an inc token simply emits the F0 prefix byte.
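
The SPIR-V step is small because every instruction is a run of 32-bit words whose first word packs the word count (high 16 bits) and the opcode (low 16 bits). The sketch below builds an `OpShiftRightLogical` instruction that way; the helper name and the id values are hypothetical, and this is not VAMP's own definition.

```python
import struct

OP_SHIFT_RIGHT_LOGICAL = 194  # opcode from the SPIR-V specification

def spirv_instruction(opcode: int, *operands: int) -> list:
    """Return the instruction as a list of 32-bit words."""
    word_count = 1 + len(operands)  # the count includes the leading word itself
    return [(word_count << 16) | opcode, *operands]

# Operand order: result type id, result id, base id, shift id (ids are made up).
words = spirv_instruction(OP_SHIFT_RIGHT_LOGICAL, 6, 20, 18, 19)
blob = struct.pack("<%dI" % len(words), *words)

print(hex(words[0]))  # 0x500c2
print(len(blob))      # 20
```

Appending such word lists to a growing vector is enough to produce a module body, which is consistent with the notes' claim that a short definition can stand in for a call into a full shader compiler.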