Technical Outline: Attempt 1
Overview
attempt_1 is a minimal C program that serves as a proof-of-concept for the "Lottes/Onat" sourceless ColorForth paradigm. It integrates a visual editor, a live JIT compiler, and an execution environment into a single Win32 application. The application links against the C runtime but does not include standard headers directly; the functions it needs are declared manually.
The application presents a visual grid of 32-bit tokens rendered via microui floating panels and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed instantly in the HUD.
Core Concepts Implemented
- Sourceless Token Array (FArena tape):
  - The "source code" is a contiguous block of U4 (32-bit) integers allocated by VirtualAlloc and managed by the FArena from duffle.h.
  - Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
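The 4-bit tag plus 28-bit payload split can be sketched as follows. This is only an illustration: the outline does not say which end of the word holds the tag, so the layout here (tag in the top four bits) and the names token_pack, token_tag, and token_payload are assumptions, not the project's definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4;

/* Assumed layout: bits 31..28 hold the color tag, bits 27..0 the payload. */
enum { TOKEN_TAG_SHIFT = 28 };
#define TOKEN_PAYLOAD_MASK 0x0FFFFFFFu

/* Pack a 4-bit tag and a 28-bit payload into one 32-bit token. */
static U4 token_pack(U4 tag, U4 payload) {
    assert(tag < 16u);
    assert(payload <= TOKEN_PAYLOAD_MASK);
    return (tag << TOKEN_TAG_SHIFT) | payload;
}

/* Extract the color tag (upper 4 bits). */
static U4 token_tag(U4 token) { return token >> TOKEN_TAG_SHIFT; }

/* Extract the payload (lower 28 bits). */
static U4 token_payload(U4 token) { return token & TOKEN_PAYLOAD_MASK; }
```

Under this assumed layout, token_pack(3, 0x123) yields 0x30000123, and both fields can be recovered with a shift or a mask.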
- Annotation Layer (FArena anno):
  - A parallel FArena of U8 (64-bit) integers stores an 8-character string for each corresponding token on the tape.
  - The UI renderer prioritizes displaying this string, but the compiler only ever sees the indices packed into the 32-bit token.
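One way to store an 8-character label in a single U8 is to copy the bytes straight into the integer. The sketch below assumes names shorter than eight characters are zero-padded and longer ones are truncated; anno_pack and anno_unpack are hypothetical helpers, not functions from the project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U8;

/* Pack up to 8 characters into one 64-bit annotation slot (zero-padded). */
static U8 anno_pack(const char *name) {
    assert(name != NULL);
    char buf[8] = {0};
    size_t n = strlen(name);
    if (n > 8) n = 8;               /* assumption: longer names are truncated */
    memcpy(buf, name, n);
    U8 out;
    memcpy(&out, buf, sizeof out);
    return out;
}

/* Recover a NUL-terminated string from an annotation slot. */
static void anno_unpack(U8 anno, char out[9]) {
    memcpy(out, &anno, sizeof anno);
    out[8] = '\0';
}
```

Because both directions go through memcpy, the round trip does not depend on the byte order of the host.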
- 2-Register Stack & Global Memory:
  - The JIT compiler emits x86-64 that strictly adheres to Onat's RAX/RDX register stack.
  - A vm_globals array (16 x U8) is passed by pointer into the JIT'd code via RCX (Win64 calling convention), held in RBX for the duration of execution.
  - vm_globals[14] and vm_globals[15] serve as the RAX and RDX save/restore slots across JIT entry and exit.
  - Indices 0–13 are available as the "tape drive" global memory for FETCH/STORE primitives.
- Handmade x86-64 JIT Emitter with Named DSL:
  - A small set of emit8/emit32/emit64 functions write raw x86-64 opcodes into a VirtualAlloc block marked PAGE_EXECUTE_READWRITE.
  - All emission is done through a well-defined x64 Emission DSL (#pragma region x64 Emission DSL) consisting of:
    - Named REX prefix constants (x64_REX, x64_REX_R, x64_REX_B, etc.).
    - Named register encoding constants (x64_reg_RAX, x64_reg_RDX, etc.).
    - ModRM and SIB composition macros (x64_modrm(mod, reg, rm), x64_sib(scale, index, base)).
    - Named opcode constants (x64_op_MOV_reg_rm, x64_op_CALL_rel32, etc.).
    - Composite inline instruction helpers (x64_XCHG_RAX_RDX(), x64_ADD_RAX_RDX(), x64_RET_IF_ZERO(), x64_FETCH(), x64_STORE(), etc.).
    - Prologue/Epilogue helpers (x64_JIT_PROLOGUE(), x64_JIT_EPILOGUE()).
    - FFI helpers (x64_FFI_PROLOGUE(), x64_FFI_MAP_ARGS(), x64_FFI_CALL_ABS(addr), x64_FFI_EPILOGUE()).
  - Raw magic bytes are forbidden in compile_and_run_tape and compile_action. All emission uses the DSL.
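To make the layering concrete, here is a portable sketch of the emitter idea: byte-level emit functions, a ModRM macro, and two composite helpers built from named constants. It writes into a plain array instead of an executable VirtualAlloc block, and the CodeBuf type, the explicit buffer parameter, and every SK_/sk_ name are assumptions made for illustration; the project's actual constant values and signatures are not given in this outline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;
typedef uint64_t U8;

/* Hypothetical code buffer; the real emitter targets executable memory. */
typedef struct { U1 bytes[64]; size_t len; } CodeBuf;

static void emit8(CodeBuf *cb, U1 v) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = v;
}
/* Multi-byte values are written little-endian, as x86-64 expects. */
static void emit32(CodeBuf *cb, U4 v) { for (int i = 0; i < 4; i++) emit8(cb, (U1)(v >> (8 * i))); }
static void emit64(CodeBuf *cb, U8 v) { for (int i = 0; i < 8; i++) emit8(cb, (U1)(v >> (8 * i))); }

/* Sketch-local named encodings (standard x86-64 values). */
enum {
    SK_REX_W          = 0x48, /* REX prefix with W=1: 64-bit operand size */
    SK_REG_RAX        = 0,
    SK_REG_RDX        = 2,
    SK_MOD_REG        = 3,    /* ModRM.mod = 11b: register-direct operand */
    SK_OP_ADD_RM_REG  = 0x01, /* ADD r/m64, r64  */
    SK_OP_XCHG_RM_REG = 0x87  /* XCHG r/m64, r64 */
};
#define SK_MODRM(mod, reg, rm) ((U1)((((mod) & 3) << 6) | (((reg) & 7) << 3) | ((rm) & 7)))

/* xchg rax, rdx  ->  48 87 D0 */
static void sk_xchg_rax_rdx(CodeBuf *cb) {
    emit8(cb, SK_REX_W);
    emit8(cb, SK_OP_XCHG_RM_REG);
    emit8(cb, SK_MODRM(SK_MOD_REG, SK_REG_RDX, SK_REG_RAX));
}

/* add rax, rdx  ->  48 01 D0 */
static void sk_add_rax_rdx(CodeBuf *cb) {
    emit8(cb, SK_REX_W);
    emit8(cb, SK_OP_ADD_RM_REG);
    emit8(cb, SK_MODRM(SK_MOD_REG, SK_REG_RDX, SK_REG_RAX));
}
```

Keeping the opcode bytes behind names like these is what lets the compile functions read as a sequence of instructions rather than hex literals.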
- Modal Editor (Win32 GDI + microui):
  - The UI is built with microui rendered via raw Win32 GDI calls defined in duffle.h.
  - It features two modes: Navigation (blue cursor, arrow key movement) and Edit (orange cursor, text input).
  - The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
  - Four floating panels: ColorForth Source Tape, Compiler & Status, Registers & Globals, Print Log.
- O(1) Dictionary & Visual Linking:
  - The dictionary relies on an edit-time visual linker. When the tape is modified, relink_tape resolves names to absolute source memory indices.
  - The compiler resolves references in O(1) time by indexing into tape_to_code_offset[65536].
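The constant-time lookup amounts to one array index followed by a displacement calculation. The sketch below assumes the table stores byte offsets into the code buffer as U4 values and that calls are emitted as 5-byte CALL rel32 instructions; record_definition and call_rel32 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4;

enum { TAPE_MAX = 65536 };

/* Maps a tape index (already resolved by the linker) to a code offset. */
static U4 tape_to_code_offset[TAPE_MAX];

/* Called when the compiler reaches a definition on the tape. */
static void record_definition(U4 tape_index, U4 code_offset) {
    assert(tape_index < TAPE_MAX);
    tape_to_code_offset[tape_index] = code_offset;
}

/* Displacement for a CALL rel32 whose opcode byte sits at call_site.
   rel32 is measured from the end of the 5-byte instruction. */
static int32_t call_rel32(U4 call_site, U4 target_tape_index) {
    assert(target_tape_index < TAPE_MAX);
    int64_t target = (int64_t)tape_to_code_offset[target_tape_index];
    int64_t next   = (int64_t)call_site + 5;
    return (int32_t)(target - next);
}
```

No name comparison or hashing happens at compile time in this model; the name-to-index work was already done when the tape was edited.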
- Implicit Definition Boundaries (STag_Define):
  - A STag_Define token causes the JIT to:
    - Emit RET to close the prior block (via x64_RET()).
    - Emit a JMP rel32 placeholder to skip over the new definition body.
    - Record the entry point in tape_to_code_offset[i].
    - Emit xchg rax, rdx (via x64_XCHG_RAX_RDX()) as the definition's first instruction, rotating the 2-register stack.
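The four steps can be sketched against a plain byte buffer. This is an illustration under assumptions: it ignores the 32-bit padding the project applies, the CodeBuf type and all sk_/SK_ names are hypothetical, and the point at which the real compiler patches the skip jump is not stated in this outline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

typedef struct { U1 bytes[64]; size_t len; } CodeBuf; /* hypothetical */

enum {
    SK_OP_RET        = 0xC3,
    SK_OP_JMP_REL32  = 0xE9,
    SK_REX_W         = 0x48,
    SK_OP_XCHG       = 0x87,
    SK_MODRM_RDX_RAX = 0xD0  /* mod=11, reg=rdx, rm=rax */
};

static void sk_put8(CodeBuf *cb, U1 v) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = v;
}

/* Emit the definition boundary. Returns the offset of the 4-byte jump
   displacement so it can be patched once the body has been emitted. */
static size_t sk_begin_definition(CodeBuf *cb, U4 *entry_out) {
    sk_put8(cb, SK_OP_RET);                     /* 1. close the prior block   */
    sk_put8(cb, SK_OP_JMP_REL32);               /* 2. jump over the body...   */
    size_t patch_at = cb->len;
    for (int i = 0; i < 4; i++) sk_put8(cb, 0); /*    ...placeholder rel32    */
    *entry_out = (U4)cb->len;                   /* 3. entry point to record   */
    sk_put8(cb, SK_REX_W);                      /* 4. xchg rax, rdx           */
    sk_put8(cb, SK_OP_XCHG);
    sk_put8(cb, SK_MODRM_RDX_RAX);
    return patch_at;
}

/* Point the placeholder at the current end of the buffer. */
static void sk_patch_forward_jump(CodeBuf *cb, size_t patch_at) {
    assert(patch_at + 4 <= cb->len);
    U4 rel = (U4)(cb->len - (patch_at + 4));    /* relative to next instruction */
    for (int i = 0; i < 4; i++) cb->bytes[patch_at + (size_t)i] = (U1)(rel >> (8 * i));
}
```

The caller would store the value written to entry_out into tape_to_code_offset[i], emit the body, and then patch the jump so that straight-line execution skips the definition.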
- Lambda Tag (STag_Lambda):
  - A STag_Lambda token compiles a code block out-of-line and leaves its absolute 64-bit address in RAX for use with STORE or EXECUTE.
  - Implemented via x64_MOV_RDX_RAX() to save the prior TOS, a mov rax, imm64 with a patched-in address, and a JMP rel32 to skip the body.
- x86-64 Instruction Padding:
  - pad32() pads every logical block/instruction to an exact multiple of 32 bits using 0x90 (NOP) bytes, aligning with the visual token grid.
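The padding rule is small enough to sketch directly. The version below works on an explicit buffer and length, which is an assumption; the project's pad32() takes no arguments and presumably uses the emitter's own write position. The name pad32_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t U1;

enum { SK_OP_NOP = 0x90 };

/* Append NOP bytes until *len is a multiple of 4 bytes (32 bits).
   Returns the number of padding bytes written. */
static size_t pad32_sketch(U1 *code, size_t *len, size_t cap) {
    size_t added = 0;
    while ((*len & 3u) != 0) {
        assert(*len < cap);
        code[(*len)++] = SK_OP_NOP;
        added++;
    }
    return added;
}
```

A 3-byte instruction such as xchg rax, rdx therefore occupies one 4-byte cell after padding, and an already aligned position is left unchanged.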
- The FFI Bridge:
  - x64_FFI_PROLOGUE() pushes RDX, aligns RSP to 16 bytes, and allocates 32 bytes of shadow space.
  - x64_FFI_MAP_ARGS() maps the 2-register stack and globals into Win64 ABI registers (RCX=RAX, R8=globals[0], R9=globals[1]).
  - x64_FFI_CALL_ABS(addr) loads the absolute 64-bit function address into R10 and calls it.
  - x64_FFI_EPILOGUE() restores RSP and pops RDX.
- Persistence (Cartridge Save/Load):
  - F1 saves the tape and annotation arenas (with metadata) to cartridge.bin via WriteFile.
  - F2 loads from cartridge.bin, then re-runs relink_tape() and compile_and_run_tape() to restore full live state.
Primitive Instruction Set
| ID | Name | Emitted x86-64 (via DSL) |
|----|------|--------------------------|
| 1 | SWAP | x64_XCHG_RAX_RDX() |
| 2 | MULT | x64_IMUL_RAX_RDX() |
| 3 | ADD | x64_ADD_RAX_RDX() |
| 4 | FETCH | x64_FETCH(): mov rax, [rbx + rax*8] |
| 5 | DEC | x64_DEC_RAX() |
| 6 | STORE | x64_STORE(): mov [rbx + rax*8], rdx |
| 7 | RET_IF_Z | x64_RET_IF_ZERO() |
| 8 | RETURN | x64_RET() |
| 9 | PRINT | FFI dance → ms_builtin_print |
| 10 | RET_IF_S | x64_RET_IF_SIGN() |
| 11 | DUP | x64_MOV_RDX_RAX() |
| 12 | DROP | x64_MOV_RAX_RDX() |
| 13 | SUB | x64_SUB_RAX_RDX() |
| 14 | EXECUTE | x64_CALL_RAX() |
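As a reading aid for the data-manipulation primitives, the following C model mirrors what the listed instructions do to the two registers and the globals array. It is a reference sketch, not the JIT: the operand direction for ADD, SUB, and MULT (RAX as destination, RDX as source) is inferred from the helper names, the VmModel type and vm_step function are hypothetical, and the control-flow and FFI primitives (RET_IF_Z, RETURN, PRINT, RET_IF_S, EXECUTE) are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U8;

enum { VM_GLOBAL_COUNT = 16 };

/* Hypothetical model of the 2-register stack plus global memory. */
typedef struct { U8 rax, rdx; U8 globals[VM_GLOBAL_COUNT]; } VmModel;

/* IDs match the primitive table; only data primitives are modeled. */
typedef enum {
    PRIM_SWAP = 1, PRIM_MULT = 2, PRIM_ADD = 3, PRIM_FETCH = 4, PRIM_DEC = 5,
    PRIM_STORE = 6, PRIM_DUP = 11, PRIM_DROP = 12, PRIM_SUB = 13
} Prim;

static void vm_step(VmModel *vm, Prim p) {
    switch (p) {
    case PRIM_SWAP:  { U8 t = vm->rax; vm->rax = vm->rdx; vm->rdx = t; } break;
    case PRIM_MULT:  vm->rax *= vm->rdx; break;  /* assumed: imul rax, rdx */
    case PRIM_ADD:   vm->rax += vm->rdx; break;  /* assumed: add rax, rdx  */
    case PRIM_SUB:   vm->rax -= vm->rdx; break;  /* assumed: sub rax, rdx  */
    case PRIM_DEC:   vm->rax -= 1;       break;
    case PRIM_DUP:   vm->rdx = vm->rax;  break;  /* mov rdx, rax */
    case PRIM_DROP:  vm->rax = vm->rdx;  break;  /* mov rax, rdx */
    case PRIM_FETCH:                             /* mov rax, [rbx + rax*8] */
        assert(vm->rax < (U8)VM_GLOBAL_COUNT);
        vm->rax = vm->globals[vm->rax];
        break;
    case PRIM_STORE:                             /* mov [rbx + rax*8], rdx */
        assert(vm->rax < (U8)VM_GLOBAL_COUNT);
        vm->globals[vm->rax] = vm->rdx;
        break;
    }
}
```

The bounds assertions exist only in this model; the emitted FETCH and STORE instructions shown in the table perform no range check.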
What’s Missing (TODO)
- DSL wrappers for forward jump placeholders: The JMP rel32 and CALL rel32 forward-jump patterns in compile_and_run_tape still use bare emit8(x64_op_JMP_rel32) + emit32(0) pairs. Dedicated x64_JMP_fwd_placeholder(U4* offset_out) and x64_patch_fwd(U4 offset) helpers should be added to the DSL to eliminate this last gap.
- Expanded Annotation Layer (Variable-Length Comments): The anno_arena strictly allocates 8 bytes per token. Arbitrarily long comment blocks need a separate indirection layer without disrupting the O(1) compile mapping.
- Expanded Instruction Set: No floating point. Conditional control flow is limited to the RET_IF_Z / RET_IF_S early returns; there is no multi-way branching.
- Basic Block Jumps [ ]: Lottes-style scoped jump targets for structured control flow without an AST are not yet implemented.
- Tape Drive / Preemptive Scatter Improvements: The FFI argument mapping reads globals[0] and globals[1] for R8/R9. A proper scatter model that pre-places arguments into named slots before a call is not yet formalized.
- Self-Hosting Bootstrap: The editor and JIT are written in C. The long-term goal is to rewrite the core inside the custom language itself, discarding the C host.
References Utilized
Heavily Utilized:
- Onat’s Talks: The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
- Lottes’ Twitter Notes: The 2-character mapped dictionary, ret-if-signed (RET_IF_S), and annotation layer concepts were taken directly from his tweets.
- User’s duffle.h & fortish-study: The C coding conventions (X-Macros, FArena, byte-width types, ms_ prefixes) were adopted from these sources.
Lightly Utilized:
- Lottes’ Blog: Provided the high-level “sourceless” philosophy and inspiration.
- Grok Searches: Served to validate our understanding and provide parallels (like Wasm’s linear memory), but did not provide direct implementation details.