forth_bootslop/attempt_1/attempt_1.md
2026-02-21 14:18:15 -05:00


Technical Outline: Attempt 1

Overview

attempt_1 is a minimal C program that serves as a proof of concept for the "Lottes/Onat" sourceless ColorForth paradigm. It integrates a visual editor, a live JIT compiler, and an execution environment into a single Win32 application. The program links against the C runtime but does not include the standard headers directly; the functions it needs are declared manually.

The application presents a visual grid of 32-bit tokens rendered via microui floating panels and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed instantly in the HUD.

Core Concepts Implemented

  1. Sourceless Token Array (FArena tape):

    • The "source code" is a contiguous block of U4 (32-bit) integers allocated by VirtualAlloc and managed by the FArena from duffle.h.
    • Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
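The 4-bit/28-bit split can be sketched in a few lines of C. This is a minimal illustration, not the code from attempt_1: the bit positions (tag in the top four bits) and the helper names pack_token, token_tag, and token_payload are assumptions, and U4 is re-declared locally as a stand-in for the duffle.h type.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4; /* local stand-in for the byte-width type from duffle.h */

/* Assumed layout: tag in bits 31..28, payload in bits 27..0. */
enum { TOKEN_TAG_SHIFT = 28 };
#define TOKEN_PAYLOAD_MASK 0x0FFFFFFFu

/* Hypothetical helper: combine a 4-bit tag and a 28-bit payload. */
static U4 pack_token(U4 tag, U4 payload) {
    assert(tag < 16u);                      /* tag must fit in 4 bits      */
    assert(payload <= TOKEN_PAYLOAD_MASK);  /* payload must fit in 28 bits */
    return (tag << TOKEN_TAG_SHIFT) | payload;
}

/* Hypothetical accessors for the two fields. */
static U4 token_tag(U4 token)     { return token >> TOKEN_TAG_SHIFT; }
static U4 token_payload(U4 token) { return token & TOKEN_PAYLOAD_MASK; }
```

With this layout, pack_token(3, 0x1234) yields 0x30001234, and both fields can be recovered with a shift or a mask.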
  2. Annotation Layer (FArena anno):

    • A parallel FArena of U8 (64-bit) integers stores an 8-character string for each corresponding token on the tape.
    • The UI renderer prioritizes displaying this string, but the compiler only ever sees the indices packed into the 32-bit token.
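One way to store an 8-character name in a single U8 slot is a plain byte copy. The sketch below assumes zero padding for shorter names; anno_pack and anno_unpack are hypothetical names, and the real annotation arena may encode its strings differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U8; /* local stand-in for the byte-width type from duffle.h */

/* Hypothetical helper: pack up to 8 characters into one U8 slot.
   Shorter names leave the remaining bytes zero. */
static U8 anno_pack(const char *name) {
    U8 slot = 0;
    size_t len = strlen(name);
    assert(len <= 8);
    memcpy(&slot, name, len);
    return slot;
}

/* Hypothetical helper: copy the 8 bytes back out. The 9-byte buffer
   guarantees a terminating NUL even for full-length names. */
static void anno_unpack(U8 slot, char out[9]) {
    memcpy(out, &slot, 8);
    out[8] = '\0';
}
```

Because the slot has a fixed width, annotation i always sits at the same index as token i, which is what keeps the two arenas parallel.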
  3. 2-Register Stack & Global Memory:

    • The JIT compiler emits x86-64 that strictly adheres to Onat's RAX/RDX register stack.
    • A vm_globals array (16 x U8) is passed by pointer into the JIT'd code via RCX (Win64 calling convention), held in RBX for the duration of execution.
    • vm_globals[14] and vm_globals[15] serve as the RAX and RDX save/restore slots across JIT entry and exit.
    • Indices 0-13 are available as the "tape drive" global memory for FETCH/STORE primitives.
  4. Handmade x86-64 JIT Emitter with Named DSL:

    • A small set of emit8/emit32/emit64 functions write raw x86-64 opcodes into a VirtualAlloc block marked PAGE_EXECUTE_READWRITE.
    • All emission is done through a well-defined x64 Emission DSL (#pragma region x64 Emission DSL) consisting of:
      • Named REX prefix constants (x64_REX, x64_REX_R, x64_REX_B, etc.).
      • Named register encoding constants (x64_reg_RAX, x64_reg_RDX, etc.).
      • ModRM and SIB composition macros (x64_modrm(mod, reg, rm), x64_sib(scale, index, base)).
      • Named opcode constants (x64_op_MOV_reg_rm, x64_op_CALL_rel32, etc.).
      • Composite inline instruction helpers (x64_XCHG_RAX_RDX(), x64_ADD_RAX_RDX(), x64_RET_IF_ZERO(), x64_FETCH(), x64_STORE(), etc.).
      • Prologue/Epilogue helpers (x64_JIT_PROLOGUE(), x64_JIT_EPILOGUE()).
      • FFI helpers (x64_FFI_PROLOGUE(), x64_FFI_MAP_ARGS(), x64_FFI_CALL_ABS(addr), x64_FFI_EPILOGUE()).
    • Raw magic bytes are forbidden in compile_and_run_tape and compile_action. All emission uses the DSL.
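The layering described above (byte emitters, named constants, composite helpers) might look roughly like the following. It writes into an ordinary array instead of an executable VirtualAlloc block so that it runs anywhere. The names x64_REX_W, x64_mod_reg, x64_op_XCHG_rm_reg, and x64_op_ADD_rm_reg are illustrative assumptions; only emit8, emit32, x64_modrm, x64_reg_RAX, x64_reg_RDX, x64_XCHG_RAX_RDX, and x64_ADD_RAX_RDX are named in this outline, and their real definitions may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t  U1;
typedef uint32_t U4;

/* Plain array standing in for the executable code block. */
static U1 code_buf[64];
static U4 code_pos = 0;

static void emit8(U1 b) {
    assert(code_pos < sizeof code_buf);
    code_buf[code_pos++] = b;
}

/* Immediates and displacements are stored little-endian on x86-64. */
static void emit32(U4 v) {
    for (int i = 0; i < 4; i++) emit8((U1)(v >> (8 * i)));
}

#define x64_REX_W          0x48u /* assumed name: REX prefix with W=1 (64-bit operands) */
#define x64_reg_RAX        0u
#define x64_reg_RDX        2u
#define x64_mod_reg        3u    /* assumed name: ModRM.mod = 11b, register-direct */
#define x64_op_XCHG_rm_reg 0x87u /* assumed name: XCHG r/m64, r64 */
#define x64_op_ADD_rm_reg  0x01u /* assumed name: ADD r/m64, r64  */
#define x64_modrm(mod, reg, rm) ((U1)(((mod) << 6) | ((reg) << 3) | (rm)))

/* xchg rax, rdx  ->  48 87 D0 */
static void x64_XCHG_RAX_RDX(void) {
    emit8(x64_REX_W);
    emit8(x64_op_XCHG_rm_reg);
    emit8(x64_modrm(x64_mod_reg, x64_reg_RDX, x64_reg_RAX));
}

/* add rax, rdx  ->  48 01 D0 */
static void x64_ADD_RAX_RDX(void) {
    emit8(x64_REX_W);
    emit8(x64_op_ADD_rm_reg);
    emit8(x64_modrm(x64_mod_reg, x64_reg_RDX, x64_reg_RAX));
}
```

Callers then read as a list of named instructions, and the only place a byte value appears is next to the constant that names it.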
  5. Modal Editor (Win32 GDI + microui):

    • The UI is built with microui rendered via raw Win32 GDI calls defined in duffle.h.
    • It features two modes: Navigation (blue cursor, arrow key movement) and Edit (orange cursor, text input).
    • The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
    • Four floating panels: ColorForth Source Tape, Compiler & Status, Registers & Globals, Print Log.
  6. O(1) Dictionary & Visual Linking:

    • The dictionary relies on an edit-time visual linker. When the tape is modified, relink_tape resolves names to absolute source memory indices.
    • The compiler resolves references in O(1) time by indexing into tape_to_code_offset[65536].
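The constant-time lookup can be illustrated with the displacement arithmetic for a CALL rel32. This is a sketch under assumptions: it supposes that a call token's payload is the tape index of its target definition and that the call is emitted as the 5-byte E8 form; call_rel32_for is a hypothetical helper, not a function from attempt_1.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4;
typedef int32_t  S4;

enum { TAPE_CAPACITY = 65536 };

/* Code offset of the definition that starts at each tape index. */
static U4 tape_to_code_offset[TAPE_CAPACITY];

/* Hypothetical helper: displacement for a 5-byte CALL rel32 (E8 + disp32)
   that starts at code offset call_site and targets the definition at
   tape index target_idx. The displacement is measured from the end of
   the CALL instruction. */
static S4 call_rel32_for(U4 call_site, U4 target_idx) {
    assert(target_idx < TAPE_CAPACITY);
    U4 target = tape_to_code_offset[target_idx]; /* single array read: O(1) */
    return (S4)((int64_t)target - ((int64_t)call_site + 5));
}
```

No name comparison or hashing happens at compile time; that work was already done when relink_tape rewrote the token payloads.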
  7. Implicit Definition Boundaries (STag_Define):

    • A STag_Define token causes the JIT to:
      1. Emit RET to close the prior block (via x64_RET()).
      2. Emit a JMP rel32 placeholder to skip over the new definition body.
      3. Record the entry point in tape_to_code_offset[i].
      4. Emit xchg rax, rdx (via x64_XCHG_RAX_RDX()) as the definition's first instruction, rotating the 2-register stack.
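The four steps above can be sketched as follows, together with the patch that later fills in the JMP displacement. The helper names x64_JMP_fwd_placeholder and x64_patch_fwd are the ones proposed in the TODO list of this outline; their bodies here, the begin_define wrapper, and the array-backed code buffer are assumptions for illustration. Raw opcode bytes are used only to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

static U1 code_buf[64]; /* plain array standing in for the JIT code block */
static U4 code_pos = 0;

static void emit8(U1 b) { assert(code_pos < sizeof code_buf); code_buf[code_pos++] = b; }
static void emit32(U4 v) { for (int i = 0; i < 4; i++) emit8((U1)(v >> (8 * i))); }

/* Emit JMP rel32 (E9) with a zero displacement and report where the
   4 displacement bytes live so they can be patched later. */
static void x64_JMP_fwd_placeholder(U4 *offset_out) {
    emit8(0xE9u);
    *offset_out = code_pos;
    emit32(0u);
}

/* Patch the displacement at `offset` so the jump lands at code_pos.
   rel32 is measured from the end of the JMP instruction. */
static void x64_patch_fwd(U4 offset) {
    U4 rel = code_pos - (offset + 4u);
    for (int i = 0; i < 4; i++) code_buf[offset + i] = (U1)(rel >> (8 * i));
}

/* Hypothetical wrapper for steps 1-4; returns the entry offset that
   would be recorded in tape_to_code_offset[i]. */
static U4 begin_define(U4 *skip_out) {
    emit8(0xC3u);                              /* 1. RET closes the prior block */
    x64_JMP_fwd_placeholder(skip_out);         /* 2. JMP over the new body      */
    U4 entry = code_pos;                       /* 3. entry point of the body    */
    emit8(0x48u); emit8(0x87u); emit8(0xD0u);  /* 4. xchg rax, rdx              */
    return entry;
}
```

Once the body has been emitted, calling x64_patch_fwd with the saved offset makes straight-line execution skip the definition, while calls that target the recorded entry still reach it.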
  8. Lambda Tag (STag_Lambda):

    • A STag_Lambda token compiles a code block out-of-line and leaves its absolute 64-bit address in RAX for use with STORE or EXECUTE.
    • Implemented via x64_MOV_RDX_RAX() to save the prior TOS, a mov rax, imm64 with a patched-in address, and a JMP rel32 to skip the body.
  9. x86 Instruction Padding:

    • pad32() pads every logical block/instruction to exact 32-bit multiples using 0x90 (NOPs), aligning with the visual token grid.
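A padding routine of this kind is short. The version below is a sketch that assumes alignment is measured from the start of the code buffer; the actual pad32() in attempt_1 may track block boundaries differently.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

static U1 code_buf[64]; /* plain array standing in for the JIT code block */
static U4 code_pos = 0;

static void emit8(U1 b) { assert(code_pos < sizeof code_buf); code_buf[code_pos++] = b; }

/* Append single-byte NOPs (0x90) until the write position is a
   multiple of 4 bytes. Does nothing if it is already aligned. */
static void pad32(void) {
    while (code_pos & 3u) emit8(0x90u);
}
```

A 3-byte instruction such as xchg rax, rdx therefore occupies exactly one 32-bit cell after padding, matching one cell of the token grid.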
  10. The FFI Bridge:

    • x64_FFI_PROLOGUE() pushes RDX, aligns RSP to 16 bytes, and allocates 32 bytes of shadow space.
    • x64_FFI_MAP_ARGS() maps the 2-register stack and globals into Win64 ABI registers (RCX=RAX, R8=globals[0], R9=globals[1]).
    • x64_FFI_CALL_ABS(addr) loads the absolute 64-bit function address into R10 and calls it.
    • x64_FFI_EPILOGUE() restores RSP and pops RDX.

Persistence (Cartridge Save/Load): F1 saves the tape and annotation arenas (with metadata) to cartridge.bin via WriteFile. F2 loads from cartridge.bin, then re-runs relink_tape() and compile_and_run_tape() to restore full live state.

Primitive Instruction Set

ID	Name	Emitted x86-64 (via DSL)
1	SWAP	x64_XCHG_RAX_RDX()
2	MULT	x64_IMUL_RAX_RDX()
3	ADD	x64_ADD_RAX_RDX()
4	FETCH	x64_FETCH() — mov rax, [rbx + rax*8]
5	DEC	x64_DEC_RAX()
6	STORE	x64_STORE() — mov [rbx + rax*8], rdx
7	RET_IF_Z	x64_RET_IF_ZERO()
8	RETURN	x64_RET()
9	PRINT	FFI dance → ms_builtin_print
10	RET_IF_S	x64_RET_IF_SIGN()
11	DUP	x64_MOV_RDX_RAX()
12	DROP	x64_MOV_RAX_RDX()
13	SUB	x64_SUB_RAX_RDX()
14	EXECUTE	x64_CALL_RAX()
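To make the register-level effect of these primitives concrete, here is a small C model of the data-movement and arithmetic subset. It is an interpretation, not the JIT itself: the operand order of each operation is inferred from the helper names and the two addressing forms shown in the table, and the VM struct, the P_ constants, and vm_step are hypothetical. Control-flow primitives, PRINT, and EXECUTE are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U8;

/* Hypothetical model of the machine state: two registers plus globals. */
typedef struct { U8 rax, rdx; U8 globals[16]; } VM;

/* IDs follow the table above. */
enum { P_SWAP = 1, P_MULT = 2, P_ADD = 3, P_FETCH = 4, P_DEC = 5,
       P_STORE = 6, P_DUP = 11, P_DROP = 12, P_SUB = 13 };

static void vm_step(VM *vm, int prim) {
    switch (prim) {
    case P_SWAP:  { U8 t = vm->rax; vm->rax = vm->rdx; vm->rdx = t; } break;
    case P_MULT:  vm->rax *= vm->rdx; break;            /* imul rax, rdx        */
    case P_ADD:   vm->rax += vm->rdx; break;            /* add rax, rdx         */
    case P_FETCH: assert(vm->rax < 16);
                  vm->rax = vm->globals[vm->rax]; break; /* mov rax, [rbx+rax*8] */
    case P_DEC:   vm->rax -= 1; break;                  /* dec rax              */
    case P_STORE: assert(vm->rax < 16);
                  vm->globals[vm->rax] = vm->rdx; break; /* mov [rbx+rax*8], rdx */
    case P_DUP:   vm->rdx = vm->rax; break;             /* mov rdx, rax         */
    case P_DROP:  vm->rax = vm->rdx; break;             /* mov rax, rdx         */
    case P_SUB:   vm->rax -= vm->rdx; break;            /* sub rax, rdx         */
    default:      assert(!"primitive not modelled");
    }
}
```

For example, starting with RAX = 6, the sequence DUP, MULT leaves 36 in RAX, and a following SWAP, STORE writes that 36 into globals[6].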

What's Missing (TODO)

  • DSL wrappers for forward jump placeholders: The JMP rel32 and CALL rel32 forward-jump patterns in compile_and_run_tape still use bare emit8(x64_op_JMP_rel32) + emit32(0) pairs. Dedicated x64_JMP_fwd_placeholder(U4* offset_out) and x64_patch_fwd(U4 offset) helpers should be added to the DSL to eliminate this last gap.
  • Expanded Annotation Layer (Variable-Length Comments): The anno_arena strictly allocates 8 bytes per token. Arbitrarily long comment blocks need a separate indirection layer without disrupting the O(1) compile mapping.
  • Expanded Instruction Set: No floating point. No multi-way branching beyond RET_IF_Z / RET_IF_S.
  • Basic Block Jumps [ ]: Lottes-style scoped jump targets for structured control flow without an AST are not yet implemented.
  • Tape Drive / Preemptive Scatter Improvements: The FFI argument mapping reads globals[0] and globals[1] for R8/R9. A proper scatter model that pre-places arguments into named slots before a call is not yet formalized.
  • Self-Hosting Bootstrap: The editor and JIT are written in C. The long-term goal is to rewrite the core inside the custom language itself, discarding the C host.

References Utilized

Heavily Utilized:

  • Onat's Talks: The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
  • Lottes' Twitter Notes: The 2-character mapped dictionary, ret-if-signed (RET_IF_S), and annotation layer concepts were taken directly from his tweets.
  • User's duffle.h & fortish-study: The C coding conventions (X-Macros, FArena, byte-width types, ms_ prefixes) were adopted from these sources.

Lightly Utilized:

  • Lottes' Blog: Provided the high-level “sourceless” philosophy and inspiration.
  • Grok Searches: Served to validate our understanding and provide parallels (like Wasm's linear memory), but did not provide direct implementation details.