# Technical Outline: Attempt 1

## Overview

`attempt_1` is a minimal C program that serves as a proof-of-concept for the "Lottes/Onat" sourceless ColorForth paradigm. It integrates a visual editor, a live JIT compiler, and an execution environment into a single Win32 application. The program links against the C runtime but does not include the standard headers directly; the functions it needs are declared manually.

The application presents a visual grid of 32-bit tokens rendered via `microui` floating panels and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed immediately in the HUD.

## Core Concepts Implemented

1. **Sourceless Token Array (`FArena` tape):**
    * The "source code" is a contiguous block of `U4` (32-bit) integers allocated by `VirtualAlloc` and managed by the `FArena` from `duffle.h`.
    * Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
2. **Annotation Layer (`FArena` anno):**
    * A parallel `FArena` of `U8` (64-bit) integers stores an 8-character string for each corresponding token on the tape.
    * The UI renderer prioritizes displaying this string, but the compiler only ever sees the indices packed into the 32-bit token.
3. **2-Register Stack & Global Memory:**
    * The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
    * A `vm_globals` array (16 x `U8`) is passed by pointer into the JIT'd code via `RCX` (Win64 calling convention) and held in `RBX` for the duration of execution.
    * `vm_globals[14]` and `vm_globals[15]` serve as the `RAX` and `RDX` save/restore slots across JIT entry and exit.
    * Indices 0–13 are available as the "tape drive" global memory for the `FETCH`/`STORE` primitives.
4. **Handmade x86-64 JIT Emitter with Named DSL:**
    * A small set of `emit8`/`emit32`/`emit64` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked `PAGE_EXECUTE_READWRITE`.
    * All emission is done through a well-defined **x64 Emission DSL** (`#pragma region x64 Emission DSL`) consisting of:
        * Named REX prefix constants (`x64_REX`, `x64_REX_R`, `x64_REX_B`, etc.).
        * Named register encoding constants (`x64_reg_RAX`, `x64_reg_RDX`, etc.).
        * ModRM and SIB composition macros (`x64_modrm(mod, reg, rm)`, `x64_sib(scale, index, base)`).
        * Named opcode constants (`x64_op_MOV_reg_rm`, `x64_op_CALL_rel32`, etc.).
        * Composite inline instruction helpers (`x64_XCHG_RAX_RDX()`, `x64_ADD_RAX_RDX()`, `x64_RET_IF_ZERO()`, `x64_FETCH()`, `x64_STORE()`, etc.).
        * Prologue/Epilogue helpers (`x64_JIT_PROLOGUE()`, `x64_JIT_EPILOGUE()`).
        * FFI helpers (`x64_FFI_PROLOGUE()`, `x64_FFI_MAP_ARGS()`, `x64_FFI_CALL_ABS(addr)`, `x64_FFI_EPILOGUE()`).
    * **Raw magic bytes are forbidden** in `compile_and_run_tape` and `compile_action`. All emission uses the DSL.
5. **Modal Editor (Win32 GDI + microui):**
    * The UI is built with `microui` rendered via raw Win32 GDI calls defined in `duffle.h`.
    * It features two modes: `Navigation` (blue cursor, arrow key movement) and `Edit` (orange cursor, text input).
    * The editor handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, re-compiling and re-executing on every keystroke.
    * Four floating panels: **ColorForth Source Tape**, **Compiler & Status**, **Registers & Globals**, **Print Log**.
6. **O(1) Dictionary & Visual Linking:**
    * The dictionary relies on an edit-time visual linker. When the tape is modified, `relink_tape` resolves names to absolute source memory indices.
    * The compiler resolves references in `O(1)` time by indexing into `tape_to_code_offset[65536]`.
7. **Implicit Definition Boundaries (`STag_Define`):**
    * A `STag_Define` token causes the JIT to:
        1. Emit `RET` to close the prior block (via `x64_RET()`).
        2. Emit a `JMP rel32` placeholder to skip over the new definition body.
        3. Record the entry point in `tape_to_code_offset[i]`.
        4. Emit `xchg rax, rdx` (via `x64_XCHG_RAX_RDX()`) as the definition's first instruction, rotating the 2-register stack.
8. **Lambda Tag (`STag_Lambda`):**
    * A `STag_Lambda` token compiles a code block out-of-line and leaves its absolute 64-bit address in `RAX` for use with `STORE` or `EXECUTE`.
    * Implemented via `x64_MOV_RDX_RAX()` to save the prior TOS, a `mov rax, imm64` with a patched-in address, and a `JMP rel32` to skip the body.
9. **x86-64 Instruction Padding:**
    * `pad32()` pads every logical block/instruction to exact 32-bit multiples using `0x90` (NOPs), aligning with the visual token grid.
10. **The FFI Bridge:**
    * `x64_FFI_PROLOGUE()` pushes `RDX`, aligns `RSP` to 16 bytes, and allocates 32 bytes of shadow space.
    * `x64_FFI_MAP_ARGS()` maps the 2-register stack and globals into Win64 ABI registers (`RCX` = `RAX`, `R8` = `globals[0]`, `R9` = `globals[1]`).
    * `x64_FFI_CALL_ABS(addr)` loads the absolute 64-bit function address into `R10` and calls it.
    * `x64_FFI_EPILOGUE()` restores `RSP` and pops `RDX`.
11. **Persistence (Cartridge Save/Load):**
    * F1 saves the tape and annotation arenas (with metadata) to `cartridge.bin` via `WriteFile`.
    * F2 loads from `cartridge.bin`, then re-runs `relink_tape()` and `compile_and_run_tape()` to restore the full live state.
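To make two of the concepts above concrete (the 4-bit tag / 28-bit payload packing from item 1 and the `pad32()` rule from item 9), here is a minimal sketch in C. It is not taken from the `attempt_1` source: the helper names `token_pack`, `token_tag`, `token_payload`, and `pad32_sketch` are hypothetical, and placing the tag in the top four bits is an assumption, since the outline states only the bit widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t U4; /* stand-in for the 32-bit type from duffle.h */

#define TOKEN_PAYLOAD_BITS 28u
#define TOKEN_PAYLOAD_MASK 0x0FFFFFFFu

/* ASSUMPTION: the tag occupies the top 4 bits and the payload the low 28.
   The outline gives the widths but not the bit positions. */
U4 token_pack(U4 tag, U4 payload) {
    assert(tag < 16u);                      /* must fit in 4 bits  */
    assert(payload <= TOKEN_PAYLOAD_MASK);  /* must fit in 28 bits */
    return (tag << TOKEN_PAYLOAD_BITS) | payload;
}

U4 token_tag(U4 token)     { return token >> TOKEN_PAYLOAD_BITS; }
U4 token_payload(U4 token) { return token & TOKEN_PAYLOAD_MASK; }

/* Hypothetical equivalent of pad32(): append 0x90 (NOP) bytes until the
   emitted length is a multiple of 4 bytes, so that each logical block
   occupies a whole number of 32-bit cells. Returns the new length. */
size_t pad32_sketch(uint8_t *code, size_t len, size_t cap) {
    while ((len & 3u) != 0u && len < cap) {
        code[len++] = 0x90;
    }
    return len;
}
```

Under these assumptions, `token_pack(3, 42)` yields `0x3000002A`, and a 2-byte instruction followed by `pad32_sketch` grows to 4 bytes with two trailing NOPs.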
## Primitive Instruction Set

```md
| ID | Name     | Emitted x86-64 (via DSL)                    |
|----|----------|---------------------------------------------|
| 1  | SWAP     | x64_XCHG_RAX_RDX()                          |
| 2  | MULT     | x64_IMUL_RAX_RDX()                          |
| 3  | ADD      | x64_ADD_RAX_RDX()                           |
| 4  | FETCH    | x64_FETCH(): mov rax, [rbx + rax*8]         |
| 5  | DEC      | x64_DEC_RAX()                               |
| 6  | STORE    | x64_STORE(): mov [rbx + rax*8], rdx         |
| 7  | RET_IF_Z | x64_RET_IF_ZERO()                           |
| 8  | RETURN   | x64_RET()                                   |
| 9  | PRINT    | FFI dance → ms_builtin_print                |
| 10 | RET_IF_S | x64_RET_IF_SIGN()                           |
| 11 | DUP      | x64_MOV_RDX_RAX()                           |
| 12 | DROP     | x64_MOV_RAX_RDX()                           |
| 13 | SUB      | x64_SUB_RAX_RDX()                           |
| 14 | EXECUTE  | x64_CALL_RAX()                              |
```

## What’s Missing (TODO)

- DSL wrappers for forward jump placeholders: The `JMP rel32` and `CALL rel32` forward-jump patterns in `compile_and_run_tape` still use bare `emit8(x64_op_JMP_rel32)` + `emit32(0)` pairs. Dedicated `x64_JMP_fwd_placeholder(U4* offset_out)` and `x64_patch_fwd(U4 offset)` helpers should be added to the DSL to close this last gap.
- Expanded Annotation Layer (Variable-Length Comments): The `anno_arena` strictly allocates 8 bytes per token. Arbitrarily long comment blocks need a separate indirection layer that does not disrupt the O(1) compile mapping.
- Expanded Instruction Set: No floating point. No multi-way branching beyond `RET_IF_Z` / `RET_IF_S`.
- Basic Block Jumps `[ ]`: Lottes-style scoped jump targets for structured control flow without an AST are not yet implemented.
- Tape Drive / Preemptive Scatter Improvements: The FFI argument mapping reads `globals[0]` and `globals[1]` for `R8`/`R9`. A proper scatter model that pre-places arguments into named slots before a call is not yet formalized.
- Self-Hosting Bootstrap: The editor and JIT are written in C. The long-term goal is to rewrite the core inside the custom language itself, discarding the C host.

## References Utilized

### Heavily Utilized:

- Onat’s Talks: The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
- Lottes’ Twitter Notes: The 2-character mapped dictionary, ret-if-signed (`RET_IF_S`), and annotation layer concepts were taken directly from his tweets.
- User’s `duffle.h` & fortish-study: The C coding conventions (X-Macros, `FArena`, byte-width types, `ms_` prefixes) were adopted from these sources.

### Lightly Utilized:

- Lottes’ Blog: Provided the high-level “sourceless” philosophy and inspiration.
- Grok Searches: Served to validate our understanding and provide parallels (like Wasm’s linear memory), but did not provide direct implementation details.
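Returning to the forward-jump item in the TODO list above: the two proposed helpers might look roughly like the sketch below. This is one possible shape, not the project's code. The helper names come from the TODO entry, but the static `code` buffer, the `code_pos` cursor, and these stand-in versions of `emit8`/`emit32` are assumptions made so the example compiles on its own. The only hard fact relied on is that `JMP rel32` is opcode `0xE9` followed by a little-endian 32-bit displacement measured from the end of the instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1; /* stand-ins for the byte-width types in duffle.h */
typedef uint32_t U4;

enum { x64_op_JMP_rel32 = 0xE9 };

/* ASSUMPTION: a fixed buffer and cursor replace the real executable block. */
U1 code[256];
U4 code_pos = 0;

void emit8(U1 b) {
    assert(code_pos < sizeof code);
    code[code_pos++] = b;
}

void emit32(U4 v) { /* little-endian, as x86-64 expects */
    for (int i = 0; i < 4; i++) emit8((U1)(v >> (8 * i)));
}

/* Emit JMP rel32 with a zero displacement and report where the
   displacement bytes live so they can be patched later. */
void x64_JMP_fwd_placeholder(U4 *offset_out) {
    emit8(x64_op_JMP_rel32);
    *offset_out = code_pos;
    emit32(0);
}

/* Patch the displacement recorded at `offset` so the jump lands at the
   current emit position. rel32 is relative to the byte that follows the
   4-byte displacement. */
void x64_patch_fwd(U4 offset) {
    U4 rel = code_pos - (offset + 4u);
    for (int i = 0; i < 4; i++) code[offset + i] = (U1)(rel >> (8 * i));
}
```

A caller would record the offset when opening a definition body, emit the body, and call `x64_patch_fwd` once the body is closed; the same pattern would apply to `CALL rel32` with its own opcode constant.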