# Technical Outline: Attempt 1

## Overview

`attempt_1` is a minimal C program that serves as a proof of concept for the "Lottes/Onat" sourceless ColorForth paradigm. It integrates a visual editor, a live JIT compiler, and an execution environment into a single Win32 application. The program links against the C runtime but does not include the standard headers directly; it declares the functions it needs by hand.

The application presents a visual grid of 32-bit tokens, rendered in `microui` floating panels, that the user navigates and edits directly. On every keypress, the token array is recompiled into x86-64 machine code and executed, and the results (register state and global memory) appear immediately in the HUD.

## Core Concepts Implemented

1. **Sourceless Token Array (`FArena` tape):**
    * The "source code" is a contiguous block of `U4` (32-bit) integers allocated by `VirtualAlloc` and managed by the `FArena` from `duffle.h`.
    * Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
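
The tag/payload split can be sketched as a pair of pack and unpack helpers. This is a minimal illustration rather than the code in `attempt_1`: placing the tag in the top four bits is an assumption, and the names `token_pack`, `token_tag`, and `token_payload` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4; /* 32-bit token, following the byte-width type naming */

/* ASSUMPTION: the 4-bit tag sits in the top bits, the 28-bit payload below it. */
enum { TOKEN_TAG_SHIFT = 28 };
#define TOKEN_PAYLOAD_MASK 0x0FFFFFFFu

/* Combine a tag and a payload into one token; out-of-range bits are masked off. */
static U4 token_pack(U4 tag, U4 payload) {
    return ((tag & 0xFu) << TOKEN_TAG_SHIFT) | (payload & TOKEN_PAYLOAD_MASK);
}

/* Extract the 4-bit tag. */
static U4 token_tag(U4 token) { return token >> TOKEN_TAG_SHIFT; }

/* Extract the 28-bit payload. */
static U4 token_payload(U4 token) { return token & TOKEN_PAYLOAD_MASK; }
```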

2. **Annotation Layer (`FArena` anno):**
    * A parallel `FArena` of `U8` (64-bit) integers stores an 8-character string for each corresponding token on the tape.
    * The UI renderer prioritizes displaying this string, but the compiler only ever sees the indices packed into the 32-bit token.
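
One way to fit an 8-character annotation into a single `U8` slot is a bounded byte copy. The sketch below assumes short names are zero-padded and long names are truncated; `anno_pack` and `anno_unpack` are hypothetical helper names, not functions from the project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t U8; /* one 64-bit annotation slot per token */

/* Copy up to 8 characters of `name` into a slot.
   ASSUMPTION: shorter names are zero-padded, longer names are truncated. */
static U8 anno_pack(const char *name) {
    U8 slot = 0;
    size_t len = strlen(name);
    if (len > 8) len = 8;
    memcpy(&slot, name, len);
    return slot;
}

/* Recover the annotation as a NUL-terminated string for display. */
static void anno_unpack(U8 slot, char out[9]) {
    memcpy(out, &slot, 8);
    out[8] = '\0';
}
```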

3. **2-Register Stack & Global Memory:**
    * The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
    * A `vm_globals` array (16 x `U8`) is passed by pointer into the JIT'd code via `RCX` (Win64 calling convention), held in `RBX` for the duration of execution.
    * `vm_globals[14]` and `vm_globals[15]` serve as the `RAX` and `RDX` save/restore slots across JIT entry and exit.
    * Indices 0–13 are available as the "tape drive" global memory for `FETCH`/`STORE` primitives.

4. **Handmade x86-64 JIT Emitter with Named DSL:**
    * A small set of `emit8`/`emit32`/`emit64` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked `PAGE_EXECUTE_READWRITE`.
    * All emission is done through a well-defined **x64 Emission DSL** (`#pragma region x64 Emission DSL`) consisting of:
        * Named REX prefix constants (`x64_REX`, `x64_REX_R`, `x64_REX_B`, etc.).
        * Named register encoding constants (`x64_reg_RAX`, `x64_reg_RDX`, etc.).
        * ModRM and SIB composition macros (`x64_modrm(mod, reg, rm)`, `x64_sib(scale, index, base)`).
        * Named opcode constants (`x64_op_MOV_reg_rm`, `x64_op_CALL_rel32`, etc.).
        * Composite inline instruction helpers (`x64_XCHG_RAX_RDX()`, `x64_ADD_RAX_RDX()`, `x64_RET_IF_ZERO()`, `x64_FETCH()`, `x64_STORE()`, etc.).
        * Prologue/Epilogue helpers (`x64_JIT_PROLOGUE()`, `x64_JIT_EPILOGUE()`).
        * FFI helpers (`x64_FFI_PROLOGUE()`, `x64_FFI_MAP_ARGS()`, `x64_FFI_CALL_ABS(addr)`, `x64_FFI_EPILOGUE()`).
    * **Raw magic bytes are forbidden** in `compile_and_run_tape` and `compile_action`. All emission uses the DSL.
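
The layering described above (byte emitter, named constants, ModRM macro, composite helper) might look roughly like the sketch below. It writes into a plain array instead of an executable `VirtualAlloc` block so it stays portable. The `SK_` constants and `sk_add_rax_rdx` are hypothetical stand-ins for the project's `x64_` names, whose real values and bodies are not shown in this outline; the `x64_modrm` body is assumed from the standard ModRM bit layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

/* Plain byte buffer standing in for the executable code block. */
static U1 code_buf[64];
static U4 code_pos = 0;

/* Append one byte; bounds checking is omitted in this sketch. */
static void emit8(U1 byte) { code_buf[code_pos++] = byte; }

/* Hypothetical named constants (the outline does not give the project's values). */
enum {
    SK_REX_W       = 0x48, /* REX prefix with W=1: 64-bit operand size */
    SK_REG_RAX     = 0,    /* register encoding of RAX */
    SK_REG_RDX     = 2,    /* register encoding of RDX */
    SK_OP_ADD_RM_R = 0x01, /* ADD r/m64, r64 */
    SK_MOD_DIRECT  = 3     /* ModRM mod=11: register-direct operand */
};

/* ModRM byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0. */
#define x64_modrm(mod, reg, rm) \
    ((U1)((((mod) & 3) << 6) | (((reg) & 7) << 3) | ((rm) & 7)))

/* Composite helper: emits `add rax, rdx` (48 01 D0) with no raw bytes at the call site. */
static void sk_add_rax_rdx(void) {
    emit8(SK_REX_W);
    emit8(SK_OP_ADD_RM_R);
    emit8(x64_modrm(SK_MOD_DIRECT, SK_REG_RDX, SK_REG_RAX));
}
```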

5. **Modal Editor (Win32 GDI + microui):**
    * The UI is built with `microui` rendered via raw Win32 GDI calls defined in `duffle.h`.
    * It features two modes: `Navigation` (blue cursor, arrow key movement) and `Edit` (orange cursor, text input).
    * The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
    * Four floating panels: **ColorForth Source Tape**, **Compiler & Status**, **Registers & Globals**, **Print Log**.

6. **O(1) Dictionary & Visual Linking:**
    * The dictionary relies on an edit-time visual linker. When the tape is modified, `relink_tape` resolves names to absolute source memory indices.
    * The compiler resolves references in `O(1)` time by indexing into `tape_to_code_offset[65536]`.
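
A table lookup of this kind can be sketched as follows. The sketch assumes a call token's payload is already the tape index of its target (the result of edit-time linking) and that the call is encoded as a 5-byte `CALL rel32`. `record_definition` and `call_rel32_for` are hypothetical names, and instruction padding is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4;
typedef int32_t  S4;

/* One slot per tape index: the code offset where that definition begins. */
static U4 tape_to_code_offset[65536];

/* Record a definition's entry point while compiling the tape. */
static void record_definition(U4 tape_index, U4 code_offset) {
    tape_to_code_offset[tape_index & 0xFFFFu] = code_offset;
}

/* Resolve a call with a single array index, with no name search at compile time.
   The displacement is relative to the end of the 5-byte CALL rel32 instruction. */
static S4 call_rel32_for(U4 target_tape_index, U4 call_site_offset) {
    U4 target = tape_to_code_offset[target_tape_index & 0xFFFFu];
    return (S4)target - (S4)(call_site_offset + 5u);
}
```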

7. **Implicit Definition Boundaries (STag_Define):**
    * A `STag_Define` token causes the JIT to:
        1. Emit `RET` to close the prior block (via `x64_RET()`).
        2. Emit a `JMP rel32` placeholder to skip over the new definition body.
        3. Record the entry point in `tape_to_code_offset[i]`.
        4. Emit `xchg rax, rdx` (via `x64_XCHG_RAX_RDX()`) as the definition's first instruction, rotating the 2-register stack.
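
The four steps above can be sketched with a forward-jump placeholder that is patched once the end of the body is known. The helper names `x64_JMP_fwd_placeholder` and `x64_patch_fwd` follow the signatures proposed in the TODO section of this outline, but their bodies here are assumptions, as is `begin_define`. The sketch uses raw bytes for brevity, writes into a plain array, and ignores padding.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

static U1 code_buf[256];           /* stand-in for the executable block */
static U4 code_pos = 0;
static U4 tape_to_code_offset[65536];

static void emit8(U1 byte) { code_buf[code_pos++] = byte; }

/* Emit a 32-bit value, least significant byte first. */
static void emit32(U4 value) {
    for (int i = 0; i < 4; i++) emit8((U1)(value >> (8 * i)));
}

/* Emit JMP rel32 with a zero displacement and report where the displacement lives. */
static void x64_JMP_fwd_placeholder(U4 *offset_out) {
    emit8(0xE9);
    *offset_out = code_pos;
    emit32(0);
}

/* Patch the displacement so the jump lands at the current emit position. */
static void x64_patch_fwd(U4 offset) {
    U4 rel = code_pos - (offset + 4); /* relative to the end of the JMP */
    for (int i = 0; i < 4; i++) code_buf[offset + i] = (U1)(rel >> (8 * i));
}

/* Steps 1 to 4 for a define token at tape index `i`; returns the offset to patch later. */
static U4 begin_define(U4 i) {
    U4 skip;
    emit8(0xC3);                               /* 1. RET closes the prior block   */
    x64_JMP_fwd_placeholder(&skip);            /* 2. jump over the new body       */
    tape_to_code_offset[i & 0xFFFFu] = code_pos; /* 3. record the entry point     */
    emit8(0x48); emit8(0x92);                  /* 4. xchg rax, rdx                */
    return skip;
}
```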

8. **Lambda Tag (STag_Lambda):**
    * A `STag_Lambda` token compiles a code block out-of-line and leaves its absolute 64-bit address in `RAX` for use with `STORE` or `EXECUTE`.
    * Implemented via `x64_MOV_RDX_RAX()` to save the prior TOS, a `mov rax, imm64` with a patched-in address, and a `JMP rel32` to skip the body.

9. **x64 Instruction Padding:**
    * `pad32()` pads every logical block/instruction to an exact multiple of 32 bits using `0x90` (NOP) bytes, aligning with the visual token grid.
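
A padding helper of this shape could be as small as the sketch below. It assumes each block starts on a 4-byte boundary, so rounding the emit position up to the next multiple of four pads the block itself; the real `pad32()` body is not shown in this outline, and a plain array stands in for the code block.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

static U1 code_buf[64]; /* stand-in for the executable block */
static U4 code_pos = 0;

static void emit8(U1 byte) { code_buf[code_pos++] = byte; }

/* Append single-byte NOPs (0x90) until the emit position is a multiple of 4 bytes. */
static void pad32(void) {
    while (code_pos & 3u) emit8(0x90);
}
```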

10. **The FFI Bridge:**
    * `x64_FFI_PROLOGUE()` pushes `RDX`, aligns `RSP` to 16 bytes, and allocates 32 bytes of shadow space.
    * `x64_FFI_MAP_ARGS()` maps the 2-register stack and globals into Win64 ABI registers (`RCX`=`RAX`, `R8`=`globals[0]`, `R9`=`globals[1]`).
    * `x64_FFI_CALL_ABS(addr)` loads the absolute 64-bit function address into `R10` and calls it.
    * `x64_FFI_EPILOGUE()` restores `RSP` and pops `RDX`.

11. **Persistence (Cartridge Save/Load):**
    * F1 saves the tape and annotation arenas (with metadata) to `cartridge.bin` via `WriteFile`.
    * F2 loads from `cartridge.bin`, then re-runs `relink_tape()` and `compile_and_run_tape()` to restore full live state.

## Primitive Instruction Set

```md
| ID | Name     | Emitted x86-64 (via DSL)                  |
|----|----------|-------------------------------------------|
| 1  | SWAP     | x64_XCHG_RAX_RDX()                        |
| 2  | MULT     | x64_IMUL_RAX_RDX()                        |
| 3  | ADD      | x64_ADD_RAX_RDX()                         |
| 4  | FETCH    | x64_FETCH(): mov rax, [rbx + rax*8]       |
| 5  | DEC      | x64_DEC_RAX()                             |
| 6  | STORE    | x64_STORE(): mov [rbx + rax*8], rdx       |
| 7  | RET_IF_Z | x64_RET_IF_ZERO()                         |
| 8  | RETURN   | x64_RET()                                 |
| 9  | PRINT    | FFI dance → ms_builtin_print              |
| 10 | RET_IF_S | x64_RET_IF_SIGN()                         |
| 11 | DUP      | x64_MOV_RDX_RAX()                         |
| 12 | DROP     | x64_MOV_RAX_RDX()                         |
| 13 | SUB      | x64_SUB_RAX_RDX()                         |
| 14 | EXECUTE  | x64_CALL_RAX()                            |
```
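
For reasoning about what a tape should do, a few of these primitives can be modelled in plain C. The sketch below is a reference model, not the JIT. It assumes the helper names follow destination-first order (so `x64_ADD_RAX_RDX()` means `add rax, rdx` and `x64_MOV_RDX_RAX()` means `mov rdx, rax`), it masks global indices to stay in bounds (which the emitted code is not stated to do), and `VmModel`, `vm_model_step`, and the `PRIM_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U8;

/* Model of the 2-register stack plus the 16-slot globals array. */
typedef struct {
    U8 rax, rdx;
    U8 globals[16];
} VmModel;

/* IDs taken from the primitive table; only a subset is modelled. */
enum {
    PRIM_SWAP = 1, PRIM_ADD = 3, PRIM_FETCH = 4, PRIM_DEC = 5,
    PRIM_STORE = 6, PRIM_DUP = 11, PRIM_DROP = 12
};

/* Apply one primitive to the model state. Unmodelled IDs are ignored. */
static void vm_model_step(VmModel *vm, int prim) {
    switch (prim) {
    case PRIM_SWAP:  { U8 t = vm->rax; vm->rax = vm->rdx; vm->rdx = t; } break;
    case PRIM_ADD:   vm->rax += vm->rdx; break;                   /* assumed: add rax, rdx */
    case PRIM_FETCH: vm->rax = vm->globals[vm->rax & 15]; break;  /* mov rax, [rbx + rax*8] */
    case PRIM_DEC:   vm->rax -= 1; break;
    case PRIM_STORE: vm->globals[vm->rax & 15] = vm->rdx; break;  /* mov [rbx + rax*8], rdx */
    case PRIM_DUP:   vm->rdx = vm->rax; break;                    /* assumed: mov rdx, rax */
    case PRIM_DROP:  vm->rax = vm->rdx; break;                    /* assumed: mov rax, rdx */
    default: break;
    }
}
```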

## What’s Missing (TODO)

- **DSL wrappers for forward jump placeholders:** The `JMP rel32` and `CALL rel32` forward-jump patterns in `compile_and_run_tape` still use bare `emit8(x64_op_JMP_rel32)` + `emit32(0)` pairs. Dedicated `x64_JMP_fwd_placeholder(U4* offset_out)` and `x64_patch_fwd(U4 offset)` helpers should be added to the DSL to eliminate this last gap.
- **Expanded Annotation Layer (Variable-Length Comments):** The `anno_arena` strictly allocates 8 bytes per token. Arbitrarily long comment blocks need a separate indirection layer without disrupting the O(1) compile mapping.
- **Expanded Instruction Set:** No floating point. No multi-way branching beyond `RET_IF_Z` / `RET_IF_S`.
- **Basic Block Jumps `[ ]`:** Lottes-style scoped jump targets for structured control flow without an AST are not yet implemented.
- **Tape Drive / Preemptive Scatter Improvements:** The FFI argument mapping reads `globals[0]` and `globals[1]` for `R8`/`R9`. A proper scatter model that pre-places arguments into named slots before a call is not yet formalized.
- **Self-Hosting Bootstrap:** The editor and JIT are written in C. The long-term goal is to rewrite the core inside the custom language itself, discarding the C host.

## References Utilized

### Heavily Utilized:

- **Onat’s Talks:** The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
- **Lottes’ Twitter Notes:** The 2-character mapped dictionary, ret-if-signed (`RET_IF_S`), and annotation layer concepts were taken directly from his tweets.
- **User’s `duffle.h` & fortish-study:** The C coding conventions (X-Macros, `FArena`, byte-width types, `ms_` prefixes) were adopted from these sources.

### Lightly Utilized:

- **Lottes’ Blog:** Provided the high-level “sourceless” philosophy and inspiration.
- **Grok Searches:** Served to validate our understanding and provide parallels (like Wasm’s linear memory), but did not provide direct implementation details.