Compare commits

...

4 Commits

Author SHA1 Message Date
ed f18af12968 progress (need to review further) 2026-02-22 18:09:22 -05:00
ed 03fb6a8180 add bootslop sproject toml for manual_slop 2026-02-21 20:58:26 -05:00
ed 67f8639ee7 progress 2026-02-21 14:18:15 -05:00
ed 6953e6b9b3 progress 2026-02-21 13:11:54 -05:00
6 changed files with 1056 additions and 1518 deletions
+31 -16
View File
@@ -44,25 +44,40 @@ Based on the curation in `./references/`, the resulting system MUST adhere to th
## Current Development Roadmap (attempt_1)
The prototype currently implements:
- A functional WinAPI modal editor backed by `microui` for immediate-mode floating panels.
- A 2-register (`RAX`/`RDX`) JIT compiler with an `O(1)` visual linker (`tape_to_code_offset` table).
- x86 32-bit instruction padding via `pad32()` using `0x90` NOPs.
- Implicit definition boundaries (Magenta Pipe / `STag_Define`) emitting `JMP rel32` over the body and `xchg rax, rdx` at the entry point.
- An FFI Bridge (`x64_FFI_PROLOGUE`, `x64_FFI_MAP_ARGS`, `x64_FFI_CALL_ABS`, `x64_FFI_EPILOGUE`) for calling WinAPI functions safely from JIT'd code.
- Persistence via F1 (save) / F2 (load) to `cartridge.bin`.
- A Lambda tag (`STag_Lambda`) that compiles a code block out-of-line and leaves its address in `RAX`.
- A well-defined **x64 Emission DSL** (`#pragma region x64 Emission DSL`) with named REX prefixes, register encodings, ModRM/SIB composition macros, opcode constants, and composite instruction inline functions.
### x64 Emission DSL Discipline
All JIT code emission in `main.c` MUST use the x64 Emission DSL defined in the `#pragma region x64 Emission DSL` block. Raw magic bytes are forbidden. The allowed primitives are:
- **Composite helpers:** `x64_XCHG_RAX_RDX()`, `x64_MOV_RDX_RAX()`, `x64_MOV_RAX_RDX()`, `x64_ADD_RAX_RDX()`, `x64_SUB_RAX_RDX()`, `x64_IMUL_RAX_RDX()`, `x64_DEC_RAX()`, `x64_TEST_RAX_RAX()`, `x64_RET_IF_ZERO()`, `x64_RET_IF_SIGN()`, `x64_FETCH()`, `x64_STORE()`, `x64_CALL_RAX()`, `x64_RET()`.
- **Prologue/Epilogue:** `x64_JIT_PROLOGUE()`, `x64_JIT_EPILOGUE()`.
- **FFI:** `x64_FFI_PROLOGUE()`, `x64_FFI_MAP_ARGS()`, `x64_FFI_CALL_ABS(addr)`, `x64_FFI_EPILOGUE()`.
- **Raw emission only via named constants:** `emit8(x64_op_*)`, `emit8(x64_REX*)`, `emit8(x64_modrm(*))`, `emit32(val)`, `emit64(val)`.
- **Exception:** Forward jump placeholders (`JMP rel32`, `CALL rel32`) that have no composite helper may use `emit8(x64_op_JMP_rel32)` / `emit8(x64_op_CALL_rel32)` directly with a following `emit32(0)` placeholder, pending a dedicated DSL wrapper.
Here is a breakdown of the next steps to advance the `attempt_1` implementation towards a complete ColorForth derivative:
1. ~~**Refine the FFI / Tape Drive Argument Scatter:**~~ (Completed via `PRIM_PRINT` loading `R8`/`R9` from `vm_globals`)
2. ~~**Implement the Self-Modifying Cartridge (Persistence):**~~ (Completed via F1/F2 save/load)
3. ~~**Refine Visual Editor Interactions:**~~ (Completed via `microui` integration)
4. ~~**Audit and enforce x64 Emission DSL usage throughout `main.c`:**~~ (Completed — all raw magic bytes replaced with named DSL constants and composite helpers)
5. **Add DSL wrappers for forward jump placeholders:**
- `x64_JMP_fwd_placeholder(U4* offset_out)` — emits `E9 00000000` and writes the patch offset.
- `x64_patch_fwd(U4 offset)` — patches a previously emitted placeholder with the current code position.
- This will eliminate the last remaining raw `emit8`/`emit32` pairs in `compile_and_run_tape`.
6. **Expanded Annotation Layer (Variable-Length Comments):**
- The current `anno_arena` strictly allocates 8 bytes (a `U8`) per token.
- Refactor the visual editor and annotation memory management to allow for arbitrarily long text blocks (comments) to be attached to specific tokens without disrupting the `O(1)` compilation mapping.
7. **Continuous Validation & Complex Control Flow:**
- Expand the primitive set to allow for more complex, AST-less control flow (e.g., handling Basic Block jumps `[ ]`).
- Investigate adding a `RET_IF_ZERO` + tail-call pattern for loops without explicit branch instructions.
+75 -27
View File
@@ -3,7 +3,7 @@
## Overview
`attempt_1` is a minimal C program that serves as a proof-of-concept for the "Lottes/Onat" sourceless ColorForth paradigm. It successfully integrates a visual editor, a live JIT compiler, and an execution environment into a single, cohesive Win32 application that links against the C runtime but avoids direct includes of standard headers, using manually declared functions instead.
The application presents a visual grid of 32-bit tokens rendered via `microui` floating panels and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed instantly in the HUD.
## Core Concepts Implemented
@@ -17,42 +17,90 @@ The application presents a visual grid of 32-bit tokens and allows the user to n
3. **2-Register Stack & Global Memory:**
* The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
* A `vm_globals` array (16 x `U8`) is passed by pointer into the JIT'd code via `RCX` (Win64 calling convention), held in `RBX` for the duration of execution.
* `vm_globals[14]` and `vm_globals[15]` serve as the `RAX` and `RDX` save/restore slots across JIT entry and exit.
* Indices 0–13 are available as the "tape drive" global memory for `FETCH`/`STORE` primitives.
4. **Handmade x86-64 JIT Emitter with Named DSL:**
* A small set of `emit8`/`emit32`/`emit64` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked `PAGE_EXECUTE_READWRITE`.
* All emission is done through a well-defined **x64 Emission DSL** (`#pragma region x64 Emission DSL`) consisting of:
* Named REX prefix constants (`x64_REX`, `x64_REX_R`, `x64_REX_B`, etc.).
* Named register encoding constants (`x64_reg_RAX`, `x64_reg_RDX`, etc.).
* ModRM and SIB composition macros (`x64_modrm(mod, reg, rm)`, `x64_sib(scale, index, base)`).
* Named opcode constants (`x64_op_MOV_reg_rm`, `x64_op_CALL_rel32`, etc.).
* Composite inline instruction helpers (`x64_XCHG_RAX_RDX()`, `x64_ADD_RAX_RDX()`, `x64_RET_IF_ZERO()`, `x64_FETCH()`, `x64_STORE()`, etc.).
* Prologue/Epilogue helpers (`x64_JIT_PROLOGUE()`, `x64_JIT_EPILOGUE()`).
* FFI helpers (`x64_FFI_PROLOGUE()`, `x64_FFI_MAP_ARGS()`, `x64_FFI_CALL_ABS(addr)`, `x64_FFI_EPILOGUE()`).
* **Raw magic bytes are forbidden** in `compile_and_run_tape` and `compile_action`. All emission uses the DSL.
5. **Modal Editor (Win32 GDI + microui):**
* The UI is built with `microui` rendered via raw Win32 GDI calls defined in `duffle.h`.
* It features two modes: `Navigation` (blue cursor, arrow key movement) and `Edit` (orange cursor, text input).
* The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
* Four floating panels: **ColorForth Source Tape**, **Compiler & Status**, **Registers & Globals**, **Print Log**.
6. **O(1) Dictionary & Visual Linking:**
* The dictionary relies on an edit-time visual linker. When the tape is modified, `relink_tape` resolves names to absolute source memory indices.
* The compiler resolves references in `O(1)` time by indexing into `tape_to_code_offset[65536]`.
7. **Implicit Definition Boundaries (STag_Define):**
* A `STag_Define` token causes the JIT to:
1. Emit `RET` to close the prior block (via `x64_RET()`).
2. Emit a `JMP rel32` placeholder to skip over the new definition body.
3. Record the entry point in `tape_to_code_offset[i]`.
4. Emit `xchg rax, rdx` (via `x64_XCHG_RAX_RDX()`) as the definition's first instruction, rotating the 2-register stack.
8. **Lambda Tag (STag_Lambda):**
* A `STag_Lambda` token compiles a code block out-of-line and leaves its absolute 64-bit address in `RAX` for use with `STORE` or `EXECUTE`.
* Implemented via `x64_MOV_RDX_RAX()` to save the prior TOS, a `mov rax, imm64` with a patched-in address, and a `JMP rel32` to skip the body.
9. **x86 Instruction Padding:**
* `pad32()` pads every logical block/instruction to exact 32-bit multiples using `0x90` (NOPs), aligning with the visual token grid.
10. **The FFI Bridge:**
* `x64_FFI_PROLOGUE()` pushes `RDX`, aligns `RSP` to 16 bytes, and allocates 32 bytes of shadow space.
* `x64_FFI_MAP_ARGS()` maps the 2-register stack and globals into Win64 ABI registers (`RCX` = `RAX`, `R8` = `globals[0]`, `R9` = `globals[1]`).
* `x64_FFI_CALL_ABS(addr)` loads the absolute 64-bit function address into `R10` and calls it.
* `x64_FFI_EPILOGUE()` restores `RSP` and pops `RDX`.
11. **Persistence (Cartridge Save/Load):**
    * F1 saves the tape and annotation arenas (with metadata) to `cartridge.bin` via `WriteFile`.
    * F2 loads from `cartridge.bin`, then re-runs `relink_tape()` and `compile_and_run_tape()` to restore full live state.
## Primitive Instruction Set
| ID | Name     | Emitted x86-64 (via DSL) |
|---:|----------|--------------------------|
| 1  | SWAP     | `x64_XCHG_RAX_RDX()` |
| 2  | MULT     | `x64_IMUL_RAX_RDX()` |
| 3  | ADD      | `x64_ADD_RAX_RDX()` |
| 4  | FETCH    | `x64_FETCH()` — `mov rax, [rbx + rax*8]` |
| 5  | DEC      | `x64_DEC_RAX()` |
| 6  | STORE    | `x64_STORE()` — `mov [rbx + rax*8], rdx` |
| 7  | RET_IF_Z | `x64_RET_IF_ZERO()` |
| 8  | RETURN   | `x64_RET()` |
| 9  | PRINT    | FFI dance → `ms_builtin_print` |
| 10 | RET_IF_S | `x64_RET_IF_SIGN()` |
| 11 | DUP      | `x64_MOV_RDX_RAX()` |
| 12 | DROP     | `x64_MOV_RAX_RDX()` |
| 13 | SUB      | `x64_SUB_RAX_RDX()` |
| 14 | EXECUTE  | `x64_CALL_RAX()` |
## What's Missing (TODO)
- **DSL wrappers for forward jump placeholders:** The `JMP rel32` and `CALL rel32` forward-jump patterns in `compile_and_run_tape` still use bare `emit8(x64_op_JMP_rel32)` + `emit32(0)` pairs. Dedicated `x64_JMP_fwd_placeholder(U4* offset_out)` and `x64_patch_fwd(U4 offset)` helpers should be added to the DSL to eliminate this last gap.
- **Expanded Annotation Layer (Variable-Length Comments):** The `anno_arena` strictly allocates 8 bytes per token. Arbitrarily long comment blocks need a separate indirection layer that does not disrupt the `O(1)` compile mapping.
- **Expanded Instruction Set:** No floating point. No multi-way branching beyond `RET_IF_Z` / `RET_IF_S`.
- **Basic Block Jumps `[ ]`:** Lottes-style scoped jump targets for structured control flow without an AST are not yet implemented.
- **Tape Drive / Preemptive Scatter Improvements:** The FFI argument mapping reads `globals[0]` and `globals[1]` into `R8`/`R9`. A proper scatter model that pre-places arguments into named slots before a call is not yet formalized.
- **Self-Hosting Bootstrap:** The editor and JIT are written in C. The long-term goal is to rewrite the core inside the custom language itself, discarding the C host.
## References Utilized
### Heavily Utilized:
- Onat's Talks: The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
- Lottes' Twitter Notes: The 2-character mapped dictionary, ret-if-signed (`RET_IF_ZERO`), and annotation layer concepts were taken directly from his tweets.
- User's `duffle.h` & `fortish-study`: The C coding conventions (X-Macros, `FArena`, byte-width types, `ms_` prefixes) were adopted from these sources.
### Lightly Utilized:
- Lottes' Blog: Provided the high-level "sourceless" philosophy and inspiration.
- Grok Searches: Served to validate our understanding and provide parallels (like Wasm's linear memory), but did not provide direct implementation details.
+496 -96
View File
@@ -7,7 +7,8 @@
X(Data, "Data", 0x0094BAA1, "$") \
X(Imm, "Imm", 0x004AA4C2, "^") \
X(Comment, "Comment", 0x00AAAAAA, ".") \
X(Format, "Format", 0x003A2F3B, " ") \
X(Lambda, "Lambda", 0x00D675A4, "%")
typedef Enum_(U4, STag) {
#define X(n, s, c, p) tmpl(STag, n),
@@ -137,6 +138,10 @@ WinAPI B4 ms_close_handle(void* hObject) asm("CloseHandle");
#define PRIM_DUP 11
#define PRIM_DROP 12
#define PRIM_SUB 13
#define PRIM_EXECUTE 14
#define PRIM_IF 15
#define PRIM_END 16
#define PRIM_RECURSE 17
global const char* prim_names[] = {
"",
@@ -152,7 +157,11 @@ global const char* prim_names[] = {
"RET_IF_S",
"DUP ",
"DROP ",
"SUB ",
"EXECUTE ",
"[ ",
"] ",
"RECURSE "
};
internal U4 resolve_name_to_index(const char* ref_name);
@@ -222,6 +231,12 @@ internal void emit32(U4 val) {
code_arena.used += 4;
}
}
internal void emit64(U8 val) {
if (code_arena.used + 8 <= code_arena.capacity) {
u8_r(code_arena.start+ code_arena.used)[0] = val;
code_arena.used += 8;
}
}
internal void pad32(void) {
while ((code_arena.used % 4) != 0) emit8(0x90);
@@ -274,82 +289,347 @@ internal void relink_tape(void) {
}
}
#pragma region x64 Emission DSL
// ===================================================================================================================
// x64 Emission DSL
// Follows the same bit-field composition pattern as the GP command macros.
// ===================================================================================================================
// --- REX Prefix Composition ---
// REX byte: 0100 W R X B
// W = 64-bit operand width
// R = extends ModRM.reg field to reach R8-R15
// X = extends SIB.index field to reach R8-R15
// B = extends ModRM.r/m field to reach R8-R15
#define x64_rex_offset_W 3
#define x64_rex_offset_R 2
#define x64_rex_offset_X 1
#define x64_rex_offset_B 0
#define x64_rex_base 0x40
#define x64_rex_W (0x1 << x64_rex_offset_W) // 64-bit operand size
#define x64_rex_R (0x1 << x64_rex_offset_R) // Extend reg field
#define x64_rex_X (0x1 << x64_rex_offset_X) // Extend index field
#define x64_rex_B (0x1 << x64_rex_offset_B) // Extend r/m field
#define x64_rex(flags) (x64_rex_base | (flags))
#define x64_REX x64_rex(x64_rex_W) // 0x48 - 64-bit, standard regs
#define x64_REX_R x64_rex(x64_rex_W | x64_rex_R) // 0x4C - 64-bit, extended reg field
#define x64_REX_B x64_rex(x64_rex_W | x64_rex_B) // 0x49 - 64-bit, extended r/m field
#define x64_REX_RB x64_rex(x64_rex_W | x64_rex_R | x64_rex_B) // 0x4D
// --- Register Encoding ---
// These are the 3-bit register IDs used in ModRM and SIB fields.
#define x64_reg_RAX 0x0 // 000
#define x64_reg_RCX 0x1 // 001
#define x64_reg_RDX 0x2 // 010
#define x64_reg_RBX 0x3 // 011
#define x64_reg_RSP 0x4 // 100 (also: SIB follows when in r/m with Mod != 11)
#define x64_reg_RBP 0x5 // 101 (also: disp32 no base when Mod = 00)
#define x64_reg_RSI 0x6 // 110
#define x64_reg_RDI 0x7 // 111
// Extended registers (require REX.R or REX.B)
#define x64_reg_R8 0x0 // 000 + REX.R/B
#define x64_reg_R9 0x1 // 001 + REX.R/B
#define x64_reg_R10 0x2 // 010 + REX.R/B
#define x64_reg_R11 0x3 // 011 + REX.R/B
// --- ModRM Composition ---
// ModRM byte: [Mod:2][Reg:3][R/M:3]
// Mod=11 -> both operands are registers (no memory)
// Mod=00 -> r/m is a memory address, no displacement
// Mod=01 -> r/m is a memory address + 8-bit displacement
// Mod=10 -> r/m is a memory address + 32-bit displacement
#define x64_mod_mem 0x0 // 00 - memory, no displacement
#define x64_mod_mem_disp8 0x1 // 01 - memory + 8-bit displacement
#define x64_mod_mem_disp32 0x2 // 10 - memory + 32-bit displacement
#define x64_mod_reg 0x3 // 11 - register direct (no memory)
#define x64_modrm_offset_mod 6
#define x64_modrm_offset_reg 3
#define x64_modrm_offset_rm 0
#define x64_modrm(mod, reg, rm) \
(((mod) << x64_modrm_offset_mod) | ((reg) << x64_modrm_offset_reg) | ((rm) << x64_modrm_offset_rm))
// Register-to-register ModRM shortcuts (Mod=11, the common case)
#define x64_modrm_rr(reg, rm) x64_modrm(x64_mod_reg, reg, rm)
// Commonly used ModRM bytes in this runtime (reg direct)
#define x64_modrm_RAX_RAX x64_modrm_rr(x64_reg_RAX, x64_reg_RAX) // 0xC0
#define x64_modrm_RAX_RCX x64_modrm_rr(x64_reg_RAX, x64_reg_RCX) // 0xC1
#define x64_modrm_RAX_RDX x64_modrm_rr(x64_reg_RAX, x64_reg_RDX) // 0xC2
#define x64_modrm_RAX_RBX x64_modrm_rr(x64_reg_RAX, x64_reg_RBX) // 0xC3
#define x64_modrm_RCX_RAX x64_modrm_rr(x64_reg_RCX, x64_reg_RAX) // 0xC8
#define x64_modrm_RCX_RBX x64_modrm_rr(x64_reg_RCX, x64_reg_RBX) // 0xCB
#define x64_modrm_RDX_RAX x64_modrm_rr(x64_reg_RDX, x64_reg_RAX) // 0xD0
// Memory + disp8 ModRM shortcuts (Mod=01)
#define x64_modrm_RAX_mem_disp8_RBX x64_modrm(x64_mod_mem_disp8, x64_reg_RAX, x64_reg_RBX) // 0x43
#define x64_modrm_RDX_mem_disp8_RBX x64_modrm(x64_mod_mem_disp8, x64_reg_RDX, x64_reg_RBX) // 0x53
// SIB-addressed ModRM (Mod=00, R/M=RSP signals SIB follows)
#define x64_modrm_RAX_sib x64_modrm(x64_mod_mem, x64_reg_RAX, x64_reg_RSP) // 0x04
#define x64_modrm_RDX_sib x64_modrm(x64_mod_mem, x64_reg_RDX, x64_reg_RSP) // 0x14
// --- SIB Composition ---
// SIB byte: [Scale:2][Index:3][Base:3]
// Scale: 00=*1, 01=*2, 10=*4, 11=*8
#define x64_sib_scale_1 0x0 // 00
#define x64_sib_scale_2 0x1 // 01
#define x64_sib_scale_4 0x2 // 10
#define x64_sib_scale_8 0x3 // 11
#define x64_sib_offset_scale 6
#define x64_sib_offset_index 3
#define x64_sib_offset_base 0
#define x64_sib(scale, index, base) \
(((scale) << x64_sib_offset_scale) | ((index) << x64_sib_offset_index) | ((base) << x64_sib_offset_base))
// Tape drive SIB: [rbx + rax*8]
// Scale=8, Index=RAX, Base=RBX
#define x64_sib_tape x64_sib(x64_sib_scale_8, x64_reg_RAX, x64_reg_RBX) // 0xC3
// --- Opcodes ---
#define x64_op_MOV_rm_reg 0x89 // mov r/m, reg (store: reg -> memory or register)
#define x64_op_MOV_reg_rm 0x8B // mov reg, r/m (load: memory or register -> reg)
#define x64_op_MOV_rm_imm32 0xC7 // mov r/m, imm32 (sign-extended to 64-bit)
#define x64_op_MOV_rax_imm64 0xB8 // mov rax, imm64 (register baked into opcode)
#define x64_op_MOV_r10_imm64 0xBA // mov r10, imm64 (B8 + r10_id=2, needs REX.B)
#define x64_op_XCHG_rm_reg 0x87 // xchg r/m, reg
#define x64_op_ADD_rm_reg 0x01 // add r/m, reg
#define x64_op_SUB_rm_reg 0x29 // sub r/m, reg
#define x64_op_IMUL_reg_rm 0x0F // imul prefix (followed by 0xAF)
#define x64_op_IMUL_reg_rm2 0xAF // imul reg, r/m (second byte)
#define x64_op_TEST_rm_reg 0x85 // test r/m, reg (sets ZF and SF)
#define x64_op_UNARY 0xFF // inc/dec/call-indirect (Reg field = opcode extension)
#define x64_op_ARITH_imm8 0x83 // add/sub/etc with sign-extended 8-bit immediate (Reg = extension)
#define x64_op_ARITH_imm32 0x81 // add/sub/etc with 32-bit immediate (Reg = extension)
// Opcode extensions (used as the Reg field of ModRM with 0xFF and 0x83)
#define x64_ext_INC 0x0 // /0
#define x64_ext_DEC 0x1 // /1
#define x64_ext_CALL 0x2 // /2 (used with 0xFF for indirect call)
#define x64_ext_ADD 0x0 // /0 (used with 0x83/0x81)
#define x64_ext_SUB 0x5 // /5 (used with 0x83/0x81)
#define x64_op_CALL_rel32 0xE8 // call rel32
#define x64_op_JMP_rel32 0xE9 // jmp rel32
#define x64_op_JNZ_rel8 0x75 // jnz rel8 (jump if Zero Flag not set)
#define x64_op_JNS_rel8 0x79 // jns rel8 (jump if Sign Flag not set)
#define x64_op_RET 0xC3 // ret
#define x64_op_NOP 0x90 // nop (used for padding to 32-bit alignment)
// Push/Pop (opcode encodes register directly, no ModRM)
#define x64_op_PUSH_RBX 0x53 // push rbx (50 + rbx_id=3)
#define x64_op_POP_RBX 0x5B // pop rbx (58 + rbx_id=3)
#define x64_op_PUSH_RDX 0x52 // push rdx (50 + rdx_id=2)
#define x64_op_POP_RDX 0x5A // pop rdx (58 + rdx_id=2)
// --- Composite Instruction Macros ---
// Each maps directly to the emit8/emit32/emit64 calls in compile_action.
// Stack Machine Operations
IA_ void x64_XCHG_RAX_RDX() { emit8(x64_REX); emit8(x64_op_XCHG_rm_reg); emit8(x64_modrm_RAX_RDX); }
IA_ void x64_MOV_RDX_RAX() { emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RAX_RDX); } // DUP
IA_ void x64_MOV_RAX_RDX() { emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RDX_RAX); } // DROP
// Arithmetic (2-register stack: op RAX with RDX, result in RAX)
IA_ void x64_ADD_RAX_RDX() { emit8(x64_REX); emit8(x64_op_ADD_rm_reg); emit8(x64_modrm_RDX_RAX); } // add rax, rdx (r/m is the destination, so RAX must be the r/m field: 48 01 D0)
IA_ void x64_SUB_RAX_RDX() { emit8(x64_REX); emit8(x64_op_SUB_rm_reg); emit8(x64_modrm_RDX_RAX); } // sub rax, rdx (48 29 D0)
IA_ void x64_IMUL_RAX_RDX() { emit8(x64_REX); emit8(x64_op_IMUL_reg_rm); emit8(x64_op_IMUL_reg_rm2); emit8(x64_modrm_RAX_RDX); }
IA_ void x64_DEC_RAX() { emit8(x64_REX); emit8(x64_op_UNARY); emit8(x64_modrm(x64_mod_reg, x64_ext_DEC, x64_reg_RAX)); }
// Flag Operations (for conditional returns)
IA_ void x64_TEST_RAX_RAX() { emit8(x64_REX); emit8(x64_op_TEST_rm_reg); emit8(x64_modrm_RAX_RAX); }
// Conditional Returns (TEST must precede these)
IA_ void x64_RET_IF_ZERO() { x64_TEST_RAX_RAX(); emit8(x64_op_JNZ_rel8); emit8(0x01); emit8(x64_op_RET); } // JNZ skips the RET if RAX != 0, so RET only fires when RAX == 0
IA_ void x64_RET_IF_SIGN() { x64_TEST_RAX_RAX(); emit8(x64_op_JNS_rel8); emit8(0x01); emit8(x64_op_RET); } // JNS skips the RET if RAX >= 0, so RET only fires when RAX < 0
// Tape Drive Memory (Preemptive Scatter via RBX base pointer)
IA_ void x64_FETCH() {
emit8(x64_REX);
emit8(x64_op_MOV_reg_rm);
emit8(x64_modrm_RAX_sib);
emit8(x64_sib_tape);
}
IA_ void x64_STORE() {
emit8(x64_REX);
emit8(x64_op_MOV_rm_reg);
emit8(x64_modrm_RDX_sib);
emit8(x64_sib_tape);
}
// Indirect call through RAX (EXECUTE primitive)
IA_ void x64_CALL_RAX() {
emit8(x64_op_UNARY);
emit8(x64_modrm(x64_mod_reg, x64_ext_CALL, x64_reg_RAX));
}
IA_ void x64_RET() { emit8(x64_op_RET); } // RET
// Forward-jump placeholder: emits JMP rel32 with a zero displacement and writes
// the offset of the displacement field into *offset_out so it can be patched later.
IA_ void x64_JMP_fwd_placeholder(U4* offset_out) {
emit8(x64_op_JMP_rel32);
offset_out[0] = u4_(code_arena.used);
emit32(0);
}
// Patch a previously emitted forward-jump placeholder.
// offset is the value written by x64_JMP_fwd_placeholder; the displacement is
// computed relative to the end of the 4-byte field (i.e. offset + 4).
IA_ void x64_patch_fwd(U4 offset) {
u4_r(code_arena.start + offset)[0] = u4_(code_arena.used) - (offset + 4);
}
// Conditional Forward Jump (IF)
IA_ void x64_JMP_IF_ZERO_fwd_placeholder(U4* offset_out) {
x64_TEST_RAX_RAX();
emit8(0x0F); emit8(0x84); // JZ rel32
offset_out[0] = u4_(code_arena.used);
emit32(0);
}
// Unconditional Backward Jump (RECURSE)
IA_ void x64_JMP_back(U4 target) {
emit8(x64_op_JMP_rel32);
S4 rel32 = s4_(target) - s4_(code_arena.used + 4);
emit32(u4_(rel32));
}
// JIT Entry Prologue: save RBX, load vm_globals ptr from RCX, restore RAX/RDX state
// vm_globals[14] = RAX save slot (14 * 8 = 0x70)
// vm_globals[15] = RDX save slot (15 * 8 = 0x78)
#define x64_vm_rax_slot 0x70
#define x64_vm_rdx_slot 0x78
IA_ void x64_JIT_PROLOGUE() {
emit8(x64_op_PUSH_RBX);
emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RCX_RBX);
emit8(x64_REX); emit8(x64_op_MOV_reg_rm); emit8(x64_modrm_RAX_mem_disp8_RBX); emit8(x64_vm_rax_slot);
emit8(x64_REX); emit8(x64_op_MOV_reg_rm); emit8(x64_modrm_RDX_mem_disp8_RBX); emit8(x64_vm_rdx_slot);
}
// JIT Exit Epilogue: save RAX/RDX state back, restore RBX, return
IA_ void x64_JIT_EPILOGUE() {
emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RAX_mem_disp8_RBX); emit8(x64_vm_rax_slot);
emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RDX_mem_disp8_RBX); emit8(x64_vm_rdx_slot);
emit8(x64_op_POP_RBX);
emit8(x64_op_RET);
}
// Win64 FFI Dance: align RSP, allocate 32 bytes shadow space
// sub rsp, 40 (32 shadow + 8 to realign since we pushed RDX)
#define x64_ffi_shadow_space 0x28
IA_ void x64_FFI_PROLOGUE() {
emit8(x64_op_PUSH_RDX);
emit8(x64_REX); emit8(x64_op_ARITH_imm8);
emit8(x64_modrm(x64_mod_reg, x64_ext_SUB, x64_reg_RSP));
emit8(x64_ffi_shadow_space);
}
// Map 2-register stack and tape drive into Win64 ABI argument registers
// RCX = RAX (arg1 = Top of Stack)
// RDX = RDX (arg2 = Next of Stack, already in place)
// R8 = vm_globals[0]
// R9 = vm_globals[1]
#define x64_FFI_MAP_ARGS() \
do { \
emit8(x64_REX); emit8(x64_op_MOV_rm_reg); emit8(x64_modrm_RAX_RCX); \
emit8(x64_REX_R); emit8(x64_op_MOV_reg_rm); emit8(x64_modrm(x64_mod_mem, x64_reg_R8, x64_reg_RBX)); \
emit8(x64_REX_R); emit8(x64_op_MOV_reg_rm); emit8(x64_modrm(x64_mod_mem_disp8, x64_reg_R9, x64_reg_RBX)); emit8(0x08); \
} while(0)
// Load absolute 64-bit function address into R10 and call it
#define x64_FFI_CALL_ABS(abs_addr) \
do { \
emit8(x64_REX_B); emit8(x64_op_MOV_r10_imm64); \
emit32(u4_(u8_(abs_addr) & 0xFFFFFFFF)); \
emit32(u4_(u8_(abs_addr) >> 32)); \
emit8(0x41); emit8(x64_op_UNARY); \
emit8(x64_modrm(x64_mod_reg, x64_ext_CALL, x64_reg_R10)); \
} while(0)
// Restore RSP and RDX after FFI call
#define x64_FFI_EPILOGUE() \
do { \
emit8(x64_REX); emit8(x64_op_ARITH_imm8); \
emit8(x64_modrm(x64_mod_reg, x64_ext_ADD, x64_reg_RSP)); \
emit8(x64_ffi_shadow_space); \
emit8(x64_op_POP_RDX); \
} while(0)
#pragma endregion x64 Emission DSL
internal void compile_action(U4 val)
{
if (val >= 0x10000) {
U4 p = val - 0x10000;
if (p == PRIM_SWAP) {
x64_XCHG_RAX_RDX();
pad32();
return;
}
else if (p == PRIM_MULT) {
x64_IMUL_RAX_RDX();
pad32();
return;
}
else if (p == PRIM_ADD) {
x64_ADD_RAX_RDX();
pad32();
return;
}
else if (p == PRIM_SUB) {
x64_SUB_RAX_RDX();
pad32();
return;
}
else if (p == PRIM_FETCH) {
x64_FETCH();
pad32();
return;
}
else if (p == PRIM_DEC) {
x64_DEC_RAX();
pad32();
return;
}
else if (p == PRIM_STORE) {
x64_STORE();
pad32();
return;
}
else if (p == PRIM_RET_Z) {
x64_RET_IF_ZERO();
pad32();
return;
}
else if (p == PRIM_RET_S) {
x64_RET_IF_SIGN();
pad32();
return;
}
else if (p == PRIM_RET) {
emit8(x64_op_RET);
pad32();
return;
}
else if (p == PRIM_DUP) {
x64_MOV_RDX_RAX();
pad32();
return;
}
else if (p == PRIM_DROP) {
x64_MOV_RAX_RDX();
pad32();
return;
}
else if (p == PRIM_EXECUTE) {
x64_CALL_RAX();
pad32();
return;
}
else if (p == PRIM_PRINT) {
x64_FFI_PROLOGUE();
x64_FFI_MAP_ARGS();
x64_FFI_CALL_ABS(u8_(& ms_builtin_print));
x64_FFI_EPILOGUE();
pad32();
return;
}
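The retired `PRIM_PRINT` path above emitted `mov r10, imm64` by splitting the 64-bit address of `ms_builtin_print` into two little-endian 32-bit halves. A minimal standalone sketch of that split (the helper names `emit_lo32`/`emit_hi32`/`reassemble` are illustrative, not from `main.c`):

```c
#include <assert.h>
#include <stdint.h>

/* The imm64 operand of `mov r10, imm64` (opcode bytes 0x49 0xBA) is
   written as two emit32() calls: low half first, then high half. */
static uint32_t emit_lo32(uint64_t addr) { return (uint32_t)(addr & 0xFFFFFFFFu); }
static uint32_t emit_hi32(uint64_t addr) { return (uint32_t)(addr >> 32); }

/* Reassemble the halves to confirm the split is lossless. */
static uint64_t reassemble(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

`x64_FFI_CALL_ABS` presumably wraps the same two-halves emission behind one call.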
@@ -359,55 +639,71 @@ internal void compile_action(U4 val)
U4 target = tape_to_code_offset[val];
pad32();
S4 rel32 = s4_(target) - s4_(code_arena.used + 5);
emit8(0xE8);
emit8(x64_op_CALL_rel32);
emit32(u4_(rel32));
pad32();
}
}
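The `rel32` arithmetic in `compile_action` above can be sketched in isolation: a `CALL rel32` (`E8` plus 4 displacement bytes) is 5 bytes long, and x64 measures the displacement from the instruction *after* the call, hence `target - (used + 5)`. The helper name `call_rel32` is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for E8 (CALL rel32): relative to the byte following
   the 5-byte call instruction, so a backward call yields a negative
   value and a call to the very next byte yields zero. */
static int32_t call_rel32(uint32_t target, uint32_t call_site) {
    return (int32_t)target - (int32_t)(call_site + 5);
}
```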
IA_ void compile_and_run_tape(void)
{
farena_reset(& code_arena);
log_count = 0;
gdi_log_count = 0;
emit8(0x53); // push rbx (callee-saved; also aligns RSP to 0 mod 16)
emit8(0x48); emit8(0x89); emit8(0xCB); // mov rbx, rcx (stable globals ptr for whole JIT session)
emit8(0x48); emit8(0x8B); emit8(0x43); emit8(0x70); // mov rax, [rbx+0x70]
emit8(0x48); emit8(0x8B); emit8(0x53); emit8(0x78); // mov rdx, [rbx+0x78]
x64_JIT_PROLOGUE();
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
B4 in_def = false;
U4 def_jmp_offset = 0;
U4 current_def_idx = 0;
B4 in_lambda = false;
U4 lambda_jmp_offset = 0;
U4 ctrl_stack[32] = {0};
U4 ctrl_depth = 0;
U8 end_idx = run_full ? (tape_arena.used / sizeof(U4)) : (cursor_idx + 1);
for (U8 i = 0; i < end_idx; i++)
{
U4 tag = unpack_tag(tape_ptr[i]);
U4 val = unpack_val(tape_ptr[i]);
B4 is_terminator = (tag == STag_Define || tag == STag_Imm);
if (in_lambda && (is_terminator || tag == STag_Lambda)) {
x64_RET();
pad32();
x64_patch_fwd(lambda_jmp_offset);
in_lambda = false;
}
if (in_def && is_terminator) {
x64_RET();
pad32();
x64_patch_fwd(def_jmp_offset);
in_def = false;
}
if (tag == STag_Define)
{
if (in_def == false) {
pad32();
emit8(0xE9);
def_jmp_offset = code_arena.used;
emit32(0);
x64_JMP_fwd_placeholder(& def_jmp_offset);
pad32();
in_def = true;
} else {
emit8(0xC3);
pad32();
}
current_def_idx = i;
tape_to_code_offset[i] = code_arena.used;
emit8(0x48); emit8(0x87); emit8(0xC2);
x64_XCHG_RAX_RDX();
pad32();
}
else if (tag == STag_Call || tag == STag_Imm)
else if (tag == STag_Lambda)
{
char* name = (char*)&anno_ptr[i];
char* name = (char*)& anno_ptr[i];
char val_hex[9];
u64_to_hex(val, val_hex, 8);
val_hex[8] = '\0';
@@ -415,35 +711,72 @@ IA_ void compile_and_run_tape(void)
{ ktl_str8_key("name"), str8(name) },
{ ktl_str8_key("val"), str8(val_hex) },
};
debug_log(str8("Compiling call: <name> (val: <val>)"), ktl_str8_from_arr(call_log_table));
debug_log(str8("Compiling lambda: <name> (val: <val>)"), ktl_str8_from_arr(call_log_table));
if (tag == STag_Imm && in_def) {
emit8(0xC3);
// mov rdx, rax (save old rax into rdx)
x64_MOV_RDX_RAX();
// mov rax, imm64 (placeholder for lambda body address)
emit8(x64_REX);
emit8(x64_op_MOV_rax_imm64);
U4 rax_imm_offset = code_arena.used;
emit64(0);
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
in_def = false;
// jmp rel32 over lambda body
x64_JMP_fwd_placeholder(& lambda_jmp_offset);
pad32();
in_lambda = true;
// Patch the mov rax, imm64 with the actual lambda body address
U8 lambda_addr = u8_(code_arena.start + code_arena.used);
u8_r(code_arena.start + rax_imm_offset)[0] = lambda_addr;
}
else if (tag == STag_Call || tag == STag_Imm)
{
if (val == 0x10000 + PRIM_IF) {
x64_JMP_IF_ZERO_fwd_placeholder(& ctrl_stack[ctrl_depth++]);
pad32();
}
else if (val == 0x10000 + PRIM_END) {
if (ctrl_depth > 0) {
U4 offset = ctrl_stack[--ctrl_depth];
x64_patch_fwd(offset);
}
pad32();
}
else if (val == 0x10000 + PRIM_RECURSE) {
if (in_def) {
x64_JMP_back(tape_to_code_offset[current_def_idx]);
}
pad32();
}
else {
compile_action(val);
}
else if (tag == STag_Data) {
emit8(0x48); emit8(0x89); emit8(0xC2);
emit8(0x48); emit8(0xC7); emit8(0xC0); emit32(val);
}
else if (tag == STag_Data)
{
x64_MOV_RDX_RAX();
emit8(x64_REX);
emit8(x64_op_MOV_rm_imm32);
emit8(x64_modrm(x64_mod_reg, 0, x64_reg_RAX));
emit32(val);
pad32();
}
}
if (in_lambda) {
x64_RET();
pad32();
x64_patch_fwd(lambda_jmp_offset);
}
if (in_def) {
emit8(0xC3);
x64_RET();
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
x64_patch_fwd(def_jmp_offset);
}
emit8(0x48); emit8(0x89); emit8(0x43); emit8(0x70); // mov [rbx+0x70], rax
emit8(0x48); emit8(0x89); emit8(0x53); emit8(0x78); // mov [rbx+0x78], rdx
emit8(0x5B); // pop rbx
emit8(0xC3); // ret
x64_JIT_EPILOGUE();
typedef void JIT_Func(U8* globals_ptr);
JIT_Func* func = (JIT_Func*)code_arena.start;
@@ -452,12 +785,8 @@ IA_ void compile_and_run_tape(void)
vm_rax = vm_globals[14];
vm_rdx = vm_globals[15];
char rax_hex[9];
u64_to_hex(vm_rax, rax_hex, 8);
rax_hex[8] = '\0';
char rdx_hex[9];
u64_to_hex(vm_rdx, rdx_hex, 8);
rdx_hex[8] = '\0';
char rax_hex[9]; u64_to_hex(vm_rax, rax_hex, 8); rax_hex[8] = '\0';
char rdx_hex[9]; u64_to_hex(vm_rdx, rdx_hex, 8); rdx_hex[8] = '\0';
KTL_Slot_Str8 post_jit_log_table[] = {
{ ktl_str8_key("rax"), str8(rax_hex) },
{ ktl_str8_key("rdx"), str8(rdx_hex) },
@@ -466,6 +795,8 @@ IA_ void compile_and_run_tape(void)
}
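The forward-jump patching used for definition and lambda boundaries above (`x64_JMP_fwd_placeholder` / `x64_patch_fwd`) follows a standard two-phase scheme: emit `E9` with a zero displacement, record where the displacement lives, then back-patch `current - (offset + 4)` once the body's end is known. A minimal sketch with a toy buffer standing in for the real `FArena`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t  buf[64];
static uint32_t used;

static void emit8(uint8_t b)   { buf[used++] = b; }
static void emit32(uint32_t v) { memcpy(buf + used, &v, 4); used += 4; }

/* Emit JMP rel32 with a placeholder; return the displacement offset. */
static uint32_t jmp_fwd_placeholder(void) {
    emit8(0xE9);
    uint32_t off = used;
    emit32(0);
    return off;
}

/* Back-patch: displacement is measured from the end of the JMP,
   i.e. from off + 4. */
static void patch_fwd(uint32_t off) {
    uint32_t rel = used - (off + 4);
    memcpy(buf + off, &rel, 4);
}
```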
#undef r
#undef v
#undef expect
@@ -761,14 +1092,16 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
case MS_WM_PAINT: {
mu_begin(&mu_ctx);
if (mu_begin_window(&mu_ctx, "ColorForth Source Tape", mu_rect(10, 10, 900, 480))) {
if (mu_begin_window(&mu_ctx, "ColorForth Source Tape", mu_rect(10, 10, 900, 480)))
{
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
S4 start_x = 5, start_y = 5, spacing_x = 6, spacing_y = 26;
S4 x = start_x, y = start_y;
for (U8 i = 0; i < tape_count; i++) {
for (U8 i = 0; i < tape_count; i++)
{
U4 t = tape_ptr[i];
U4 tag = unpack_tag(t);
U4 val = unpack_val(t);
@@ -851,7 +1184,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
mu_text(&mu_ctx, "[F5] Toggle Run | [PgUp/PgDn] Scroll");
mu_ctx.style->colors[MU_COLOR_TEXT] = mu_color(255, 255, 255, 255);
mu_text(&mu_ctx, jit_str);
if (tape_count > 0 && cursor_idx < tape_count) {
U4 cur_tag = unpack_tag(tape_ptr[cursor_idx]);
const char* tag_name = tag_names[cur_tag];
@@ -869,7 +1201,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
mu_ctx.style->colors[MU_COLOR_TEXT] = mu_color(230, 230, 230, 255);
mu_end_window(&mu_ctx);
}
if (mu_begin_window(&mu_ctx, "Registers & Globals", mu_rect(370, 500, 350, 200))) {
char state_str[64] = "RAX: 00000000 | RDX: 00000000";
u64_to_hex(vm_rax, state_str + 5, 8);
@@ -890,7 +1221,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
mu_ctx.style->colors[MU_COLOR_TEXT] = mu_color(230, 230, 230, 255);
mu_end_window(&mu_ctx);
}
if (mu_begin_window(&mu_ctx, "Print Log", mu_rect(730, 500, 250, 200))) {
mu_layout_row(&mu_ctx, 1, (int[]){-1}, 0);
mu_ctx.style->colors[MU_COLOR_TEXT] = mu_color(161, 186, 148, 255);
@@ -960,6 +1290,8 @@ int main(void) {
mu_ctx.text_width = text_width_cb;
mu_ctx.text_height = text_height_cb;
// Factorial
{
scatter(pack_token(STag_Comment, 0), "INIT ");
scatter(pack_token(STag_Data, 5), 0);
scatter(pack_token(STag_Data, 0), 0);
@@ -998,11 +1330,79 @@ int main(void) {
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Call, 0), "RECURSE ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
}
// IF TEST
{
scatter(pack_token(STag_Comment, 0), "IF TEST ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "[ ");
scatter(pack_token(STag_Data, 99), 0);
scatter(pack_token(STag_Call, 0), "PRINT ");
scatter(pack_token(STag_Call, 0), "] ");
scatter(pack_token(STag_Data, 11), 0);
scatter(pack_token(STag_Call, 0), "PRINT ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "[ ");
scatter(pack_token(STag_Data, 22), 0);
scatter(pack_token(STag_Call, 0), "PRINT ");
scatter(pack_token(STag_Call, 0), "] ");
scatter(pack_token(STag_Format, 0xA), 0);
}
// Lambda test
{
scatter(pack_token(STag_Comment, 0), "LAMBDAS ");
scatter(pack_token(STag_Format, 0xA), 0);
// --- Store Lambda 1 (Square) in Global[0] ---
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Lambda, 0), "L_SQUARE");
// Lambda Body:
scatter(pack_token(STag_Call, 0), "SWAP "); // Get the argument into RAX
scatter(pack_token(STag_Call, 0), "DUP ");
scatter(pack_token(STag_Call, 0), "MULT ");
// Terminate Lambda, return to main scope, and prepare for STORE
scatter(pack_token(STag_Imm, 0), "SWAP "); // RAX = 0, RDX = L_SQUARE_addr
scatter(pack_token(STag_Call, 0), "STORE "); // Global[0] = L_SQUARE_addr
scatter(pack_token(STag_Format, 0xA), 0);
// --- Store Lambda 2 (Double) in Global[1] ---
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Lambda, 0), "L_DOUBLE");
// Lambda Body:
scatter(pack_token(STag_Call, 0), "SWAP "); // Get the argument into RAX
scatter(pack_token(STag_Call, 0), "DUP ");
scatter(pack_token(STag_Call, 0), "ADD ");
// Terminate Lambda, return to main scope, and prepare for STORE
scatter(pack_token(STag_Imm, 0), "SWAP "); // RAX = 1, RDX = L_DOUBLE_addr
scatter(pack_token(STag_Call, 0), "STORE "); // Global[1] = L_DOUBLE_addr
scatter(pack_token(STag_Format, 0xA), 0);
// --- Execute Lambda 1 (Square of 5) ---
scatter(pack_token(STag_Comment, 0), "USE L1 ");
scatter(pack_token(STag_Data, 5), 0); // Argument for lambda
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH "); // RAX = Global[0] (L_SQUARE_addr), RDX = 5
scatter(pack_token(STag_Call, 0), "EXECUTE "); // Calls L_SQUARE. Returns RAX = 25
scatter(pack_token(STag_Call, 0), "PRINT "); // Prints 0x19 (25)
scatter(pack_token(STag_Format, 0xA), 0);
// --- Execute Lambda 2 (Double of 5) ---
scatter(pack_token(STag_Comment, 0), "USE L2 ");
scatter(pack_token(STag_Data, 5), 0); // Argument for lambda
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "FETCH "); // RAX = Global[1] (L_DOUBLE_addr), RDX = 5
scatter(pack_token(STag_Call, 0), "EXECUTE "); // Calls L_DOUBLE. Returns RAX = 10
scatter(pack_token(STag_Call, 0), "PRINT "); // Prints 0xA (10)
}
relink_tape();
run_full = true;
@@ -1,888 +0,0 @@
#include "duffle.amd64.win32.h"
// --- Semantic Tags (Using X-Macros & Enum_) ---
#define Tag_Entries() \
X(Define, "Define", 0x0018AEFF, ":") \
X(Call, "Call", 0x00D6A454, "~") \
X(Data, "Data", 0x0094BAA1, "$") \
X(Imm, "Imm", 0x004AA4C2, "^") \
X(Comment, "Comment", 0x00AAAAAA, ".") \
X(Format, "Format", 0x003A2F3B, " ")
typedef Enum_(U4, STag) {
#define X(n, s, c, p) tmpl(STag, n),
Tag_Entries()
#undef X
STag_Count,
};
global U4 tag_colors[] = {
#define X(n, s, c, p) c,
Tag_Entries()
#undef X
};
global const char* tag_prefixes[] = {
#define X(n, s, c, p) p,
Tag_Entries()
#undef X
};
global const char* tag_names[] = {
#define X(n, s, c, p) s,
Tag_Entries()
#undef X
};
#define pack_token(tag, val) ((u4_(tag) << 28) | (u4_(val) & 0x0FFFFFFF))
#define unpack_tag(token) ( ((token) >> 28) & 0x0F)
#define unpack_val(token) ( (token) & 0x0FFFFFFF)
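The token word packs a 4-bit semantic tag into the top nibble and a 28-bit value below it. Restating the three macros above as a self-contained round-trip check:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors pack_token/unpack_tag/unpack_val: tag in bits 28..31,
   value masked to the low 28 bits. */
#define pack_token(tag, val) (((uint32_t)(tag) << 28) | ((uint32_t)(val) & 0x0FFFFFFF))
#define unpack_tag(token)    (((token) >> 28) & 0x0F)
#define unpack_val(token)    ((token) & 0x0FFFFFFF)
```

Note that an out-of-range value is silently truncated by the mask, which is why primitive indices are offset at `0x10000` well inside the 28-bit range.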
#define TOKENS_PER_ROW 8
#define MODE_NAV 0
#define MODE_EDIT 1
global FArena tape_arena;
global FArena anno_arena;
global U8 cursor_idx = 0;
global U4 editor_mode = MODE_NAV;
global B4 mode_switch_now = false;
global FArena code_arena;
global U8 vm_rax = 0;
global U8 vm_rdx = 0;
global U8 vm_globals[16] = {0};
global B4 run_full = false;
global U8 log_buffer[16] = {0};
global U4 log_count = 0;
global S4 scroll_y_offset = 0;
// New GDI log
#define GDI_LOG_MAX_LINES 10
#define GDI_LOG_MAX_LINE_LEN 128
global char gdi_log_buffer[GDI_LOG_MAX_LINES][GDI_LOG_MAX_LINE_LEN] = {0};
global U4 gdi_log_count = 0;
internal void debug_log(Str8 fmt, KTL_Str8 table) {
// A static buffer for our log lines.
LP_ UTF8 console_log_buffer[1024];
mem_zero(u8_(console_log_buffer), 1024);
// Format the string.
Str8 result = str8_fmt_ktl_buf(slice_ut_arr(console_log_buffer), table, fmt);
// Also write to our GDI log buffer
if (gdi_log_count < GDI_LOG_MAX_LINES) {
U4 len_to_copy = result.len < GDI_LOG_MAX_LINE_LEN - 1 ? result.len : GDI_LOG_MAX_LINE_LEN - 1;
mem_copy(u8_(gdi_log_buffer[gdi_log_count]), u8_(result.ptr), len_to_copy);
gdi_log_buffer[gdi_log_count][len_to_copy] = '\0';
gdi_log_count++;
}
// Get stdout handle.
MS_Handle stdout_handle = ms_get_std_handle(MS_STD_OUTPUT);
// Write the formatted string.
ms_write_console(stdout_handle, result.ptr, (U4)result.len, nullptr, 0);
// Write a newline.
ms_write_console(stdout_handle, (UTF8 const*r)"\n", 1, nullptr, 0);
}
U8 ms_builtin_print(U8 val, U8 rdx_val, U8 r8_val, U8 r9_val) {
char hex1[9], hex2[9], hex3[9], hex4[9];
u64_to_hex(val, hex1, 8); hex1[8] = '\0';
u64_to_hex(rdx_val, hex2, 8); hex2[8] = '\0';
u64_to_hex(r8_val, hex3, 8); hex3[8] = '\0';
u64_to_hex(r9_val, hex4, 8); hex4[8] = '\0';
KTL_Slot_Str8 log_table[] = {
{ ktl_str8_key("v1"), str8(hex1) },
{ ktl_str8_key("v2"), str8(hex2) },
{ ktl_str8_key("v3"), str8(hex3) },
{ ktl_str8_key("v4"), str8(hex4) },
};
debug_log(str8("FFI PRINT -> RCX:<v1> RDX:<v2> R8:<v3> R9:<v4>"), ktl_str8_from_arr(log_table));
if (log_count < 16) log_buffer[log_count++] = val;
return val;
}
// Visual Linker & O(1) Dictionary
global U4 tape_to_code_offset[65536] = {0};
// --- WinAPI Persistence ---
#define MS_GENERIC_READ 0x80000000
#define MS_GENERIC_WRITE 0x40000000
#define MS_CREATE_ALWAYS 2
#define MS_OPEN_EXISTING 3
#define MS_FILE_ATTRIBUTE_NORMAL 0x80
#define MS_VK_F1 0x70
#define MS_VK_F2 0x71
WinAPI void* ms_create_file_a(char const* lpFileName, U4 dwDesiredAccess, U4 dwShareMode, void* lpSecurityAttributes, U4 dwCreationDisposition, U4 dwFlagsAndAttributes, void* hTemplateFile) asm("CreateFileA");
WinAPI B4 ms_write_file(void* hFile, void const* lpBuffer, U4 nNumberOfBytesToWrite, U4* lpNumberOfBytesWritten, void* lpOverlapped) asm("WriteFile");
WinAPI B4 ms_read_file(void* hFile, void* lpBuffer, U4 nNumberOfBytesToRead, U4* lpNumberOfBytesRead, void* lpOverlapped) asm("ReadFile");
WinAPI B4 ms_close_handle(void* hObject) asm("CloseHandle");
#define PRIM_SWAP 1
#define PRIM_MULT 2
#define PRIM_ADD 3
#define PRIM_FETCH 4
#define PRIM_DEC 5
#define PRIM_STORE 6
#define PRIM_RET_Z 7
#define PRIM_RET 8
#define PRIM_PRINT 9
#define PRIM_RET_S 10
#define PRIM_DUP 11
#define PRIM_DROP 12
#define PRIM_SUB 13
global const char* prim_names[] = {
"",
"SWAP ",
"MULT ",
"ADD ",
"FETCH ",
"DEC ",
"STORE ",
"RET_IF_Z",
"RETURN ",
"PRINT ",
"RET_IF_S",
"DUP ",
"DROP ",
"SUB "
};
internal U4 resolve_name_to_index(const char* ref_name);
internal void relink_tape(void);
IA_ void compile_and_run_tape(void);
internal void save_cartridge(void) {
void* hFile = ms_create_file_a("cartridge.bin", MS_GENERIC_WRITE, 0, nullptr, MS_CREATE_ALWAYS, MS_FILE_ATTRIBUTE_NORMAL, nullptr);
if (hFile != (void*)-1) {
U4 written = 0;
ms_write_file(hFile, & tape_arena.used, 8, & written, nullptr);
ms_write_file(hFile, & anno_arena.used, 8, & written, nullptr);
ms_write_file(hFile, & cursor_idx, 8, & written, nullptr);
ms_write_file(hFile, (void*)tape_arena.start, (U4)tape_arena.used, & written, nullptr);
ms_write_file(hFile, (void*)anno_arena.start, (U4)anno_arena.used, & written, nullptr);
ms_close_handle(hFile);
}
}
internal void load_cartridge(void) {
void* hFile = ms_create_file_a("cartridge.bin", MS_GENERIC_READ, 0, nullptr, MS_OPEN_EXISTING, MS_FILE_ATTRIBUTE_NORMAL, nullptr);
if (hFile != (void*)-1) {
U4 read = 0;
ms_read_file(hFile, & tape_arena.used, 8, & read, nullptr);
ms_read_file(hFile, & anno_arena.used, 8, & read, nullptr);
ms_read_file(hFile, & cursor_idx, 8, & read, nullptr);
ms_read_file(hFile, (void*)tape_arena.start, (U4)tape_arena.used, & read, nullptr);
ms_read_file(hFile, (void*)anno_arena.start, (U4)anno_arena.used, & read, nullptr);
ms_close_handle(hFile);
relink_tape();
compile_and_run_tape();
}
}
IA_ void scatter(U4 token, const char* anno_str) {
if (tape_arena.used + sizeof(U4) <= tape_arena.capacity && anno_arena.used + sizeof(U8) <= anno_arena.capacity) {
U4 tag = unpack_tag(token);
U4 val = unpack_val(token);
if (anno_str && (tag == STag_Call || tag == STag_Imm)) {
val = resolve_name_to_index(anno_str);
}
U4*r ptr = u4_r(tape_arena.start + tape_arena.used);
ptr[0] = pack_token(tag, val);
tape_arena.used += sizeof(U4);
U8*r aptr = u8_r(anno_arena.start + anno_arena.used);
aptr[0] = 0;
if (anno_str) {
char* dest = (char*)aptr;
int i = 0; while(i < 8 && anno_str[i]) { dest[i] = anno_str[i]; i ++; }
}
anno_arena.used += sizeof(U8);
}
}
internal void emit8(U1 b) {
if (code_arena.used + 1 <= code_arena.capacity) {
u1_r(code_arena.start + code_arena.used)[0] = b;
code_arena.used += 1;
}
}
internal void emit32(U4 val) {
if (code_arena.used + 4 <= code_arena.capacity) {
u4_r(code_arena.start + code_arena.used)[0] = val;
code_arena.used += 4;
}
}
internal void pad32(void) {
while ((code_arena.used % 4) != 0) emit8(0x90);
}
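`pad32` above fills with `0x90` (the single-byte NOP) until `used` is 4-byte aligned, which is what gives every tape token a fixed 32-bit-aligned code slot for the O(1) visual linker. A sketch against an illustrative buffer rather than the real `code_arena`:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t  code[32];
static uint32_t code_used;

/* Pad the emission cursor to the next 4-byte boundary with NOPs;
   a no-op when already aligned. */
static void pad32_sketch(void) {
    while ((code_used % 4) != 0) code[code_used++] = 0x90;
}
```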
internal U4 resolve_name_to_index(const char* ref_name) {
U8 tape_count = tape_arena.used / sizeof(U4);
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
U8 prim_count = array_len(prim_names);
for (int p = 1; p < prim_count; p++) {
int match = 1;
for (int c = 0; c < 8; c++) {
char c1 = ref_name[c] ? ref_name[c] : ' ';
char c2 = prim_names[p][c] ? prim_names[p][c] : ' ';
if (c1 != c2) { match = 0; break; }
}
if (match) return p + 0x10000;
}
for (U8 j = 0; j < tape_count; j++) {
if (unpack_tag(tape_ptr[j]) == STag_Define) {
char* def_name = (char*)&anno_ptr[j];
int match = 1;
for (int c = 0; c < 8; c++) {
char c1 = ref_name[c] ? ref_name[c] : ' ';
char c2 = def_name[c] ? def_name[c] : ' ';
if (c1 != c2) { match = 0; break; }
}
if (match) return j;
}
}
return 0;
}
internal void relink_tape(void) {
U8 tape_count = tape_arena.used / sizeof(U4);
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
for (U8 i = 0; i < tape_count; i++) {
U4 t = tape_ptr[i];
U4 tag = unpack_tag(t);
if (tag == STag_Call || tag == STag_Imm) {
char* ref_name = (char*)&anno_ptr[i];
U4 new_val = resolve_name_to_index(ref_name);
tape_ptr[i] = pack_token(tag, new_val);
}
}
}
internal void compile_action(U4 val)
{
if (val >= 0x10000) {
U4 p = val - 0x10000;
if (p == PRIM_SWAP) {
emit8(0x48); emit8(0x87); emit8(0xC2);
pad32();
return;
} else if (p == PRIM_MULT) {
emit8(0x48); emit8(0x0F); emit8(0xAF); emit8(0xC2);
pad32();
return;
} else if (p == PRIM_ADD) {
emit8(0x48); emit8(0x01); emit8(0xD0);
pad32();
return;
} else if (p == PRIM_SUB) {
emit8(0x48); emit8(0x29); emit8(0xD0);
pad32();
return;
} else if (p == PRIM_FETCH) {
emit8(0x48); emit8(0x8B); emit8(0x04); emit8(0xC3); // mov rax, [rbx + rax*8]
pad32();
return;
} else if (p == PRIM_DEC) {
emit8(0x48); emit8(0xFF); emit8(0xC8);
pad32();
return;
} else if (p == PRIM_STORE) {
emit8(0x48); emit8(0x89); emit8(0x14); emit8(0xC3); // mov [rbx + rax*8], rdx
pad32();
return;
} else if (p == PRIM_RET_Z) {
emit8(0x48); emit8(0x85); emit8(0xC0);
emit8(0x75); emit8(0x01);
emit8(0xC3);
pad32();
return;
} else if (p == PRIM_RET_S) {
emit8(0x48); emit8(0x85); emit8(0xC0);
emit8(0x79); emit8(0x01);
emit8(0xC3);
pad32();
return;
} else if (p == PRIM_RET) {
emit8(0xC3);
pad32();
return;
} else if (p == PRIM_DUP) {
emit8(0x48); emit8(0x89); emit8(0xC2);
pad32();
return;
} else if (p == PRIM_DROP) {
emit8(0x48); emit8(0x89); emit8(0xD0);
pad32();
return;
} else if (p == PRIM_PRINT) {
// FFI Dance: Save RDX, Align RSP (32 shadow + 8 align = 40)
emit8(0x52); // push rdx
emit8(0x48); emit8(0x83); emit8(0xEC); emit8(0x28); // sub rsp, 40
// Map arguments: RCX=RAX, RDX=RDX(already loaded), R8=Globals[0], R9=Globals[1]
emit8(0x48); emit8(0x89); emit8(0xC1); // mov rcx, rax
emit8(0x4C); emit8(0x8B); emit8(0x03); // mov r8, [rbx]
emit8(0x4C); emit8(0x8B); emit8(0x4B); emit8(0x08); // mov r9, [rbx+8]
// Load func ptr and call
emit8(0x49); emit8(0xBA); // mov r10, ...
U8 addr = u8_(& ms_builtin_print);
emit32(u4_(addr & 0xFFFFFFFF));
emit32(u4_(addr >> 32));
emit8(0x41); emit8(0xFF); emit8(0xD2); // call r10
// Restore
emit8(0x48); emit8(0x83); emit8(0xC4); emit8(0x28); // add rsp, 40
emit8(0x5A); // pop rdx
pad32();
return;
}
}
if (val > 0 && val < 0x10000) {
U4 target = tape_to_code_offset[val];
pad32();
S4 rel32 = s4_(target) - s4_(code_arena.used + 5);
emit8(0xE8);
emit32(u4_(rel32));
pad32();
}
}
IA_ void compile_and_run_tape(void)
{
farena_reset(& code_arena);
log_count = 0;
gdi_log_count = 0;
emit8(0x53); // push rbx (callee-saved; also aligns RSP to 0 mod 16)
emit8(0x48); emit8(0x89); emit8(0xCB); // mov rbx, rcx (stable globals ptr for whole JIT session)
emit8(0x48); emit8(0x8B); emit8(0x43); emit8(0x70); // mov rax, [rbx+0x70]
emit8(0x48); emit8(0x8B); emit8(0x53); emit8(0x78); // mov rdx, [rbx+0x78]
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
B4 in_def = false;
U4 def_jmp_offset = 0;
U8 end_idx = run_full ? (tape_arena.used / sizeof(U4)) : (cursor_idx + 1);
for (U8 i = 0; i < end_idx; i++)
{
U4 tag = unpack_tag(tape_ptr[i]);
U4 val = unpack_val(tape_ptr[i]);
if (tag == STag_Define)
{
if (in_def == false) {
pad32();
emit8(0xE9);
def_jmp_offset = code_arena.used;
emit32(0);
pad32();
in_def = true;
} else {
emit8(0xC3);
pad32();
}
tape_to_code_offset[i] = code_arena.used;
emit8(0x48); emit8(0x87); emit8(0xC2);
pad32();
}
else if (tag == STag_Call || tag == STag_Imm)
{
char* name = (char*)&anno_ptr[i];
char val_hex[9];
u64_to_hex(val, val_hex, 8);
val_hex[8] = '\0';
KTL_Slot_Str8 call_log_table[] = {
{ ktl_str8_key("name"), str8(name) },
{ ktl_str8_key("val"), str8(val_hex) },
};
debug_log(str8("Compiling call: <name> (val: <val>)"), ktl_str8_from_arr(call_log_table));
if (tag == STag_Imm && in_def) {
emit8(0xC3);
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
in_def = false;
}
compile_action(val);
}
else if (tag == STag_Data) {
emit8(0x48); emit8(0x89); emit8(0xC2);
emit8(0x48); emit8(0xC7); emit8(0xC0); emit32(val);
pad32();
}
}
if (in_def) {
emit8(0xC3);
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
}
emit8(0x48); emit8(0x89); emit8(0x43); emit8(0x70); // mov [rbx+0x70], rax
emit8(0x48); emit8(0x89); emit8(0x53); emit8(0x78); // mov [rbx+0x78], rdx
emit8(0x5B); // pop rbx
emit8(0xC3); // ret
typedef void JIT_Func(U8* globals_ptr);
JIT_Func* func = (JIT_Func*)code_arena.start;
func(vm_globals);
vm_rax = vm_globals[14];
vm_rdx = vm_globals[15];
char rax_hex[9];
u64_to_hex(vm_rax, rax_hex, 8);
rax_hex[8] = '\0';
char rdx_hex[9];
u64_to_hex(vm_rdx, rdx_hex, 8);
rdx_hex[8] = '\0';
KTL_Slot_Str8 post_jit_log_table[] = {
{ ktl_str8_key("rax"), str8(rax_hex) },
{ ktl_str8_key("rdx"), str8(rdx_hex) },
};
debug_log(str8("JIT finished. RAX: <rax>, RDX: <rdx>"), ktl_str8_from_arr(post_jit_log_table));
}
S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
{
U8 tape_count = tape_arena.used / sizeof(U4);
U4*r tape_ptr = u4_r(tape_arena.start);
switch (msg) {
case MS_WM_CHAR: {
if (editor_mode != MODE_EDIT) return 0;
U4 t = tape_ptr[cursor_idx];
U4 tag = unpack_tag(t);
U4 val = unpack_val(t);
U1 c = u1_(wparam);
B4 should_skip = c < 32 || (c == 'e' && mode_switch_now);
if (should_skip) { mode_switch_now = false; return 0; }
if (tag == STag_Data) {
U4 digit = 16;
if (c >= '0' && c <= '9') digit = c - '0';
if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
if (digit < 16) {
val = ((val << 4) | digit) & 0x0FFFFFFF;
tape_ptr[cursor_idx] = pack_token(tag, val);
}
}
else if (tag != STag_Format) {
U8*r anno_ptr = u8_r(anno_arena.start);
char* anno_str = (char*) & anno_ptr[cursor_idx];
int len = 0;
while (len < 8 && anno_str[len] != '\0' && anno_str[len] != ' ') len ++;
if (len < 8) {
anno_str[len] = (char)c;
for (int i = len + 1; i < 8; i++) anno_str[i] = '\0';
if (tag == STag_Call || tag == STag_Imm || tag == STag_Define) {
U4 new_val = resolve_name_to_index(anno_str);
tape_ptr[cursor_idx] = pack_token(tag, new_val);
if (tag == STag_Define) relink_tape();
}
}
}
vm_rax = 0; vm_rdx = 0; mem_zero(u8_(vm_globals), sizeof(vm_globals));
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
}
case MS_WM_KEYDOWN: {
if (wparam == 0x45 && editor_mode == MODE_NAV) {
editor_mode = MODE_EDIT;
mode_switch_now = true;
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
}
if (wparam == 0x1B && editor_mode == MODE_EDIT) {
editor_mode = MODE_NAV;
relink_tape();
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
}
if (editor_mode == MODE_EDIT) {
if (wparam == MS_VK_BACK) {
U4 t = tape_ptr[cursor_idx];
U4 tag = unpack_tag(t);
U4 val = unpack_val(t);
if (tag == STag_Data) {
val = val >> 4;
tape_ptr[cursor_idx] = pack_token(tag, val);
}
else if (tag != STag_Format) {
U8*r anno_ptr = u8_r(anno_arena.start);
char* anno_str = (char*) & anno_ptr[cursor_idx];
int len = 0;
while (len < 8 && anno_str[len] != '\0' && anno_str[len] != ' ') len ++;
if (len > 0) {
anno_str[len - 1] = '\0';
if (tag == STag_Call || tag == STag_Imm || tag == STag_Define) {
U4 new_val = resolve_name_to_index(anno_str);
tape_ptr[cursor_idx] = pack_token(tag, new_val);
if (tag == STag_Define) relink_tape();
}
}
}
vm_rax = 0; vm_rdx = 0; mem_zero(u8_(vm_globals), sizeof(vm_globals));
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
}
return 0;
}
if (wparam == MS_VK_RIGHT && cursor_idx < tape_count - 1) cursor_idx ++;
if (wparam == MS_VK_LEFT && cursor_idx > 0) cursor_idx --;
if (wparam == MS_VK_UP) {
U8 line_start = cursor_idx;
while (line_start > 0 && unpack_tag(tape_ptr[line_start - 1]) != STag_Format) line_start--;
if (line_start > 0) {
U8 col = cursor_idx - line_start;
U8 prev_line_start = line_start - 1;
while (prev_line_start > 0 && unpack_tag(tape_ptr[prev_line_start - 1]) != STag_Format) prev_line_start--;
U8 prev_line_len = (line_start - 1) - prev_line_start;
cursor_idx = prev_line_start + (col < prev_line_len ? col : prev_line_len);
}
}
if (wparam == MS_VK_DOWN) {
U8 line_start = cursor_idx;
while (line_start > 0 && unpack_tag(tape_ptr[line_start - 1]) != STag_Format) line_start --;
U8 col = cursor_idx - line_start;
U8 next_line_start = cursor_idx;
while (next_line_start < tape_count && unpack_tag(tape_ptr[next_line_start]) != STag_Format) next_line_start ++;
if (next_line_start < tape_count) {
next_line_start ++;
U8 next_line_end = next_line_start;
while (next_line_end < tape_count && unpack_tag(tape_ptr[next_line_end]) != STag_Format) next_line_end ++;
U8 next_line_len = next_line_end - next_line_start;
cursor_idx = next_line_start + (col < next_line_len ? col : next_line_len);
}
}
if (wparam == MS_VK_PRIOR) { scroll_y_offset -= 100; if (scroll_y_offset < 0) scroll_y_offset = 0; }
if (wparam == MS_VK_NEXT) { scroll_y_offset += 100; }
if (wparam == MS_VK_F5) { run_full = !run_full; }
if (wparam == MS_VK_F1) { save_cartridge(); }
if (wparam == MS_VK_F2) { load_cartridge(); ms_invalidate_rect(hwnd, nullptr, true); }
if (wparam == MS_VK_TAB) {
U4 t = tape_ptr[cursor_idx];
U4 tag = (unpack_tag(t) + 1) % STag_Count;
tape_ptr[cursor_idx] = pack_token(tag, unpack_val(t));
}
else if (wparam == MS_VK_BACK)
{
U8 delete_idx = cursor_idx;
B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
if (is_shift == false) {
if (cursor_idx > 0) {
delete_idx = cursor_idx - 1;
cursor_idx--;
}
else return 0;
}
if (tape_count > 0) {
U8*r anno_ptr = u8_r(anno_arena.start);
for (U8 i = delete_idx; i < tape_count - 1; i ++) {
tape_ptr[i] = tape_ptr[i + 1];
anno_ptr[i] = anno_ptr[i + 1];
}
tape_arena.used -= sizeof(U4);
anno_arena.used -= sizeof(U8);
}
relink_tape();
}
else if (wparam == MS_VK_SPACE || wparam == MS_VK_RETURN) {
B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
U8 insert_idx = cursor_idx;
if (is_shift) insert_idx ++;
if (tape_arena.used + sizeof(U4) <= tape_arena.capacity && anno_arena.used + sizeof(U8) <= anno_arena.capacity) {
U8*r anno_ptr = u8_r(anno_arena.start);
for (U8 i = tape_count; i > insert_idx; i --) {
tape_ptr[i] = tape_ptr[i-1];
anno_ptr[i] = anno_ptr[i-1];
}
if (wparam == MS_VK_RETURN) {
tape_ptr[insert_idx] = pack_token(STag_Format, 0xA);
anno_ptr[insert_idx] = 0;
} else {
tape_ptr[insert_idx] = pack_token(STag_Comment, 0);
anno_ptr[insert_idx] = 0;
}
if (is_shift) cursor_idx ++;
tape_arena.used += sizeof(U4);
anno_arena.used += sizeof(U8);
}
}
vm_rax = 0; vm_rdx = 0;
mem_zero(u8_(vm_globals), sizeof(vm_globals));
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
}
case MS_WM_PAINT: {
MS_PAINTSTRUCT ps;
void* hdc = ms_begin_paint(hwnd, & ps);
void* hFont = ms_create_font_a(20, 0, 0, 0, 400, 0, 0, 0, 0, 0, 0, 0, 0, "Consolas");
void* hOldFont = ms_select_object(hdc, hFont);
ms_set_bk_mode(hdc, 1);
void* hBgBrush = ms_create_solid_brush(0x00222222);
ms_select_object(hdc, hBgBrush);
ms_rectangle(hdc, -1, -1, 3000, 3000);
void* hBrushEdit = ms_create_solid_brush(0x008E563B);
void* hBrushNav = ms_create_solid_brush(0x00262F3B);
S4 start_x = 40, start_y = 60, spacing_x = 110, spacing_y = 35;
S4 x = start_x, y = start_y;
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
for (U8 i = 0; i < tape_count; i++)
{
if (x >= start_x + (TOKENS_PER_ROW * spacing_x)) {
x = start_x; y += spacing_y;
}
S4 render_y = y - scroll_y_offset;
if (i == cursor_idx && render_y >= 30 && render_y < 500) {
ms_select_object(hdc, editor_mode == MODE_EDIT ? hBrushEdit : hBrushNav);
ms_rectangle(hdc, x - 5, render_y - 2, x + 95, render_y + 22);
}
if (render_y >= 30 && render_y < 500)
{
U4 t = tape_ptr[i];
U4 tag = unpack_tag(t);
U4 val = unpack_val(t);
U8 anno = anno_ptr[i];
if (tag == STag_Format && val == 0xA) {
ms_set_text_color(hdc, 0x00444444);
ms_text_out_a(hdc, x, render_y, " \\n ", 6);
x = start_x;
y += spacing_y;
}
else
{
U4 color = tag_colors[tag];
const char* prefix = tag_prefixes[tag];
ms_set_text_color(hdc, color);
if (editor_mode == MODE_EDIT && i == cursor_idx) {
ms_set_text_color(hdc, 0x001E1E1E);
}
char val_str[9];
if (tag == STag_Data) {
u64_to_hex(val, val_str, 6);
val_str[6] = '\0';
}
else
{
char* a_str = (char*) & anno;
for(int c=0; c<8; c++) {
val_str[c] = a_str[c] ? a_str[c] : ' ';
}
val_str[8] = '\0';
}
char out_buf[12];
out_buf[0] = prefix[0];
out_buf[1] = ' ';
mem_copy(u8_(out_buf + 2), u8_(val_str), 8);
out_buf[10] = '\0';
ms_text_out_a(hdc, x, render_y, out_buf, 10);
x += spacing_x;
}
}
else if (unpack_tag(tape_ptr[i]) == STag_Format && unpack_val(tape_ptr[i]) == 0xA) {
x = start_x;
y += spacing_y;
}
else {
x += spacing_x;
}
}
void* hHudBrush = ms_create_solid_brush(0x00141E23);
ms_select_object(hdc, hHudBrush);
ms_rectangle(hdc, -1, 500, 3000, 3000);
ms_rectangle(hdc, -1, -1, 3000, 40);
ms_set_text_color(hdc, 0x00AAAAAA);
ms_text_out_a(hdc, 40, 10, "x86-64 Machine Code Emitter | 2-Reg Stack | [F5] Toggle Run Mode | [PgUp/PgDn] Scroll", 85);
ms_set_text_color(hdc, 0x00FFFFFF);
char jit_str[64] = "Mode: Incremental | JIT Size: 0x000 bytes";
if (run_full) mem_copy(u8_(jit_str + 6), u8_("Full       "), 11); // 11 bytes to overwrite "Incremental"
u64_to_hex(code_arena.used, jit_str + 32, 3);
ms_text_out_a(hdc, 40, 520, jit_str, 41);
char state_str[64] = "RAX: 00000000 | RDX: 00000000";
u64_to_hex(vm_rax, state_str + 5, 8);
u64_to_hex(vm_rdx, state_str + 21, 8);
ms_set_text_color(hdc, 0x0094BAA1);
ms_text_out_a(hdc, 40, 550, state_str, 29);
if (tape_count > 0 && cursor_idx < tape_count) {
U4 cur_tag = unpack_tag(tape_ptr[cursor_idx]);
const char* tag_name = tag_names[cur_tag];
U4 cur_color = tag_colors[cur_tag];
char semantics_str[64] = "Current Tag: ";
U4 name_len = 0;
while (tag_name[name_len]) {
semantics_str[13 + name_len] = tag_name[name_len];
name_len ++;
}
semantics_str[13 + name_len] = '\0';
ms_set_text_color(hdc, cur_color);
ms_text_out_a(hdc, 40, 580, semantics_str, 13 + name_len);
}
ms_set_text_color(hdc, 0x00C8C8C8);
ms_text_out_a(hdc, 400, 520, "Global Memory (Contiguous Array):", 33);
for (int i=0; i < 4; i ++) {
char glob_str[32] = "[0]: 00000000";
glob_str[1] = '0' + i;
u64_to_hex(vm_globals[i], glob_str + 5, 8);
ms_set_text_color(hdc, 0x00D6A454);
ms_text_out_a(hdc, 400, 550 + (i * 25), glob_str, 13);
}
ms_set_text_color(hdc, 0x00C8C8C8);
ms_text_out_a(hdc, 750, 520, "Print Log:", 10);
for (int i = 0; i<log_count && i < 4; i ++) {
char log_str[32] = "00000000";
u64_to_hex(log_buffer[i], log_str, 8);
ms_set_text_color(hdc, 0x0094BAA1);
ms_text_out_a(hdc, 750, 550 + (i * 25), log_str, 8);
}
ms_set_text_color(hdc, 0x00C8C8C8);
ms_text_out_a(hdc, 40, 650, "Debug Log:", 10);
for (U4 i = 0; i < gdi_log_count; i++) {
U4 len = 0;
while(gdi_log_buffer[i][len] != '\0' && len < GDI_LOG_MAX_LINE_LEN) {
len++;
}
ms_set_text_color(hdc, 0x00AAAAAA);
ms_text_out_a(hdc, 40, 670 + (i * 20), gdi_log_buffer[i], len);
}
ms_select_object(hdc, hOldFont);
ms_delete_object(hBgBrush);
ms_delete_object(hBrushEdit);
ms_delete_object(hBrushNav);
ms_delete_object(hHudBrush);
ms_end_paint(hwnd, & ps);
return 0;
}
case MS_WM_DESTROY: { ms_post_quit_message(0); return 0; }
}
return ms_def_window_proc_a(hwnd, msg, wparam, lparam);
}
int main(void) {
Slice tape_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE)), 64 * 1024);
Slice anno_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE)), 64 * 1024);
Slice code_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_EXECUTE_READWRITE)), 64 * 1024);
if (! tape_mem.ptr || ! anno_mem.ptr || ! code_mem.ptr) ms_exit_process(1);
farena_init(& tape_arena, tape_mem);
farena_init(& anno_arena, anno_mem);
farena_init(& code_arena, code_mem);
scatter(pack_token(STag_Comment, 0), "INIT ");
scatter(pack_token(STag_Data, 5), 0);
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Imm, 0), "STORE ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Imm, 0), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Define, 0), "F_STEP ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "RET_IF_Z");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "MULT ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "DEC ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "STORE ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "PRINT ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
relink_tape();
run_full = true;
compile_and_run_tape();
run_full = false;
MS_WNDCLASSA wc;
mem_fill(u8_(& wc), 0, sizeof(wc));
wc.lpfnWndProc = win_proc;
wc.hInstance = ms_get_stock_object(0);
wc.lpszClassName = "ColorForthWindow";
wc.hbrBackground = ms_get_stock_object(4);
ms_register_class_a(& wc);
void* hwnd = ms_create_window_ex_a(0, wc.lpszClassName, "Sourceless Global Memory Explorer", MS_WS_OVERLAPPEDWINDOW | MS_WS_VISIBLE, 100, 100, 1100, 750, nullptr, nullptr, wc.hInstance, nullptr);
MS_MSG msg;
while (ms_get_message_a(& msg, nullptr, 0, 0)) { ms_translate_message(& msg); ms_dispatch_message_a(& msg); }
ms_exit_process(0);
return 0;
}
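The HUD code above calls `u64_to_hex(val, buf, digits)` throughout and always places the NUL terminator itself. A hedged reconstruction of that contract, inferred purely from the call sites (the uppercase digit set is an assumption; the source is not shown in this diff):

```c
// Hedged reconstruction of u64_to_hex as used above: writes exactly
// `digits` hex characters, most-significant nibble first, and appends
// no NUL terminator (callers place it themselves).
typedef unsigned long long U8v; // stands in for the project's U8

static void u64_to_hex(U8v val, char* out, int digits) {
    static const char lut[] = "0123456789ABCDEF";
    for (int i = digits - 1; i >= 0; i--) {
        out[i] = lut[val & 0xF]; // fill from the least-significant end
        val >>= 4;
    }
}
```

This matches usage such as `u64_to_hex(code_arena.used, jit_str + 32, 3)`, which overwrites exactly three digit slots inside a pre-built template string.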
-98
@@ -1,98 +0,0 @@
#pragma region OS
#pragma warning(push)
#pragma warning(disable: 4820)
#pragma comment(lib, "Kernel32.lib")
#pragma comment(lib, "Advapi32.lib")
#define MS_INVALID_HANDLE_VALUE ((MS_HANDLE)(S8)-1)
#define MS_ANYSIZE_ARRAY 1
#define MS_MEM_COMMIT 0x00001000
#define MS_MEM_RESERVE 0x00002000
#define MS_MEM_RELEASE 0x00008000
#define MS_MEM_LARGE_PAGES 0x20000000
#define MS_PAGE_READWRITE 0x04
#define MS_TOKEN_ADJUST_PRIVILEGES (0x0020)
#define MS_SE_PRIVILEGE_ENABLED (0x00000002L)
#define MS_TOKEN_QUERY (0x0008)
#define MS__TEXT(quote) L ## quote // r_winnt
#define MS_TEXT(quote) MS__TEXT(quote) // r_winnt
#define MS_SE_LOCK_MEMORY_NAME MS_TEXT("SeLockMemoryPrivilege")
typedef int MS_BOOL;
typedef unsigned long MS_DWORD;
typedef MS_DWORD* MS_PDWORD;
typedef void* MS_HANDLE;
typedef MS_HANDLE* MS_PHANDLE;
typedef long MS_LONG;
typedef S8 MS_LONGLONG;
typedef char const* MS_LPCSTR;
typedef unsigned short* MS_LPWSTR, *MS_PWSTR;
typedef void* MS_LPVOID;
typedef MS_DWORD* MS_LPDWORD;
typedef U8 MS_ULONG_PTR, *MS_PULONG_PTR;
typedef void const* MS_LPCVOID;
typedef struct MS_SECURITY_ATTRIBUTES *MS_PSECURITY_ATTRIBUTES, *MS_LPSECURITY_ATTRIBUTES;
typedef struct MS_OVERLAPPED *MS_LPOVERLAPPED;
typedef def_union(MS_LARGE_INTEGER) { struct { MS_DWORD LowPart; MS_LONG HighPart; } _; struct { MS_DWORD LowPart; MS_LONG HighPart; } u; MS_LONGLONG QuadPart; };
typedef def_struct(MS_FILE) { void* _Placeholder; };
typedef def_struct(MS_SECURITY_ATTRIBUTES) { MS_DWORD nLength; A4_B1 _PAD_; MS_LPVOID lpSecurityDescriptor; MS_BOOL bInheritHandle; };
typedef def_struct(MS_OVERLAPPED) { MS_ULONG_PTR Internal; MS_ULONG_PTR InternalHigh; union { struct { MS_DWORD Offset; MS_DWORD OffsetHigh; } _; void* Pointer; } _; MS_HANDLE hEvent; };
typedef struct MS_LUID* MS_PLUID;
typedef struct MS_LUID_AND_ATTRIBUTES* MS_PLUID_AND_ATTRIBUTES;
typedef struct MS_TOKEN_PRIVILEGES* MS_PTOKEN_PRIVILEGES;
typedef def_struct(MS_LUID) { MS_DWORD LowPart; MS_LONG HighPart; };
typedef def_struct(MS_LUID_AND_ATTRIBUTES) { MS_LUID Luid; MS_DWORD Attributes; };
typedef def_struct(MS_TOKEN_PRIVILEGES) { MS_DWORD PrivilegeCount; MS_LUID_AND_ATTRIBUTES Privileges[MS_ANYSIZE_ARRAY]; };
WinAPI MS_BOOL CloseHandle(MS_HANDLE hObject);
WinAPI MS_BOOL AdjustTokenPrivileges(MS_HANDLE TokenHandle, MS_BOOL DisableAllPrivileges, MS_PTOKEN_PRIVILEGES NewState, MS_DWORD BufferLength, MS_PTOKEN_PRIVILEGES PreviousState, MS_PDWORD ReturnLength);
WinAPI MS_HANDLE GetCurrentProcess(void);
WinAPI U8 GetLargePageMinimum(void);
WinAPI MS_BOOL LookupPrivilegeValueW(MS_LPWSTR lpSystemName, MS_LPWSTR lpName, MS_PLUID lpLuid);
WinAPI MS_BOOL OpenProcessToken(MS_HANDLE ProcessHandle, MS_DWORD DesiredAccess, MS_PHANDLE TokenHandle);
WinAPI MS_LPVOID VirtualAlloc(MS_LPVOID lpAddress, U8 dwSize, MS_DWORD flAllocationType, MS_DWORD flProtect);
WinAPI MS_BOOL VirtualFree (MS_LPVOID lpAddress, U8 dwSize, MS_DWORD dwFreeType);
#pragma warning(pop)
typedef def_struct(OS_Windows_State) { OS_SystemInfo system_info; };
global OS_Windows_State os__windows_info;
IA_ OS_SystemInfo* os_system_info(void) { return & os__windows_info.system_info; }
I_
void os__enable_large_pages(void) {
MS_HANDLE token;
if (OpenProcessToken(GetCurrentProcess(), MS_TOKEN_ADJUST_PRIVILEGES | MS_TOKEN_QUERY, &token))
{
MS_LUID luid;
if (LookupPrivilegeValueW(0, MS_SE_LOCK_MEMORY_NAME, &luid))
{
MS_TOKEN_PRIVILEGES priv;
priv.PrivilegeCount = 1;
priv.Privileges[0].Luid = luid;
priv.Privileges[0].Attributes = MS_SE_PRIVILEGE_ENABLED;
AdjustTokenPrivileges(token, 0, & priv, size_of(priv), 0, 0);
}
CloseHandle(token);
}
}
I_
void os_init(void) {
os__enable_large_pages();
OS_SystemInfo*R_ info = & os__windows_info.system_info;
info->target_page_size = (U8)GetLargePageMinimum();
}
// TODO(Ed): Large pages disabled for now... (not failing gracefully)
IA_ U8 os__vmem_reserve(U8 size, Opts_vmem*R_ opts) {
assert(opts != nullptr);
void*R_ result = VirtualAlloc(cast(void*R_, opts->base_addr), size
, MS_MEM_RESERVE
// |MS_MEM_COMMIT|(opts->no_large_pages == false ? MS_MEM_LARGE_PAGES : 0)
, MS_PAGE_READWRITE
);
return u8_(result);
}
IA_ B4 os__vmem_commit(U8 vm, U8 size, Opts_vmem*R_ opts) {
assert(opts != nullptr);
// if (opts->no_large_pages == false ) { return 1; }
B4 result = (VirtualAlloc(cast(MS_LPVOID, vm), size, MS_MEM_COMMIT, MS_PAGE_READWRITE) != 0);
return result;
}
I_ void os_vmem_release(U8 vm, U8 size) { VirtualFree(cast(MS_LPVOID, vm), 0, MS_MEM_RELEASE); }
#pragma endregion OS
+61
@@ -0,0 +1,61 @@
[project]
name = "bootslop"
git_dir = "C:/projects/forth/bootslop"
word_wrap = true
system_prompt = "Avoid reading bootslop.md. It has full discussion histories; it's very expensive for your token limits."
main_context = "C:/projects/forth/bootslop/GEMINI.md"
[output]
namespace = "bootslop"
output_dir = "./md_gen"
[files]
base_dir = "."
paths = [
"C:/projects/forth/bootslop/.editorconfig",
"C:/projects/forth/bootslop/CONVENTIONS.md",
"C:/projects/forth/bootslop/GEMINI.md",
"C:/projects/forth/bootslop/Readme.md",
"C:/projects/forth/bootslop/references/Architectural_Consolidation.md",
"C:/projects/forth/bootslop/references/kyra_in-depth.md",
"C:/projects/forth/bootslop/references/neokineogfx_in-depth.md",
"C:/projects/forth/bootslop/scripts/build.attempt_1.c.ps1",
"C:/projects/forth/bootslop/attempt_1/attempt_1.md",
"C:/projects/forth/bootslop/attempt_1/duffle.amd64.win32.h",
"C:/projects/forth/bootslop/attempt_1/main.c",
"C:/projects/forth/bootslop/attempt_1/microui.h",
]
[screenshots]
base_dir = "C:/Users/Ed/scoop/apps/sharex/current/ShareX/Screenshots/2026-02"
paths = []
[discussion]
roles = [
"User",
"AI",
"Vendor API",
"System",
]
active = "test"
auto_add = true
[discussion.discussions.main]
git_commit = ""
last_updated = "2026-02-22T11:09:05"
history = [
"@2026-02-21T20:58:31\nUser:\nwhat should we do next with attempt_1",
"@2026-02-21T20:58:32\nAI:\nNow I have a clear picture. Let me give you a prioritized assessment:\n\n---\n\n## Assessment: What to Do Next with `attempt_1`\n\nBased on the roadmap, the code, and the gap analysis, here's what I'd recommend in priority order:\n\n### 1. **Fix the JMP rel32 Hack & Add Forward-Jump DSL Wrappers** (Quick Win — Immediate)\n\nThis is listed as item #5 in the roadmap and it's the most mechanically obvious fix. Right now you have this abomination on line 657:\n\n```c\nemit8(x64_op_CALL_rel32 - 3); // E9 = JMP rel32\n```\n\n`0xE8 - 3 = 0xE5`, which is **wrong**. `JMP rel32` is `0xE9`, and `0xE8 - 3 = 0xE5` (which is `IN EAX, imm8`). This is a **live bug** — the define-skip jump is emitting garbage. It only works by accident if that code path is never actually taken, or if it's being masked by some other behavior.\n\nThe fix is trivial: you already have `x64_op_JMP_rel32` defined as `0xE9` in the DSL. Plus, adding the two helpers (`x64_JMP_fwd_placeholder` / `x64_CALL_fwd_placeholder` / `x64_patch_fwd`) will clean up ~6 raw emit patterns in `compile_and_run_tape`.\n\n**Also needed:** A `x64_MOV_RAX_IMM64()` helper and a `x64_MOV_RAX_IMM32()` helper for the Data and Lambda tag emission, which are currently still raw.\n\n### 2. **Add a `x64_MOV_RAX_IMM32(val)` / `x64_MOV_RAX_IMM64_placeholder()` DSL wrapper** (Quick Win)\n\nLines 683-686 and 707-710 are raw emit sequences for `mov rax, imm64` and `mov r/m, imm32` that should have named DSL wrappers. This is the last gap before you can honestly say \"all emission uses the DSL.\"\n\n### 3. **Basic Block Jumps `[ ]`** (The Real Next Feature)\n\nThis is item #7 from the roadmap and is the **most architecturally significant next step**. Right now your control flow is limited to:\n- `RET_IF_ZERO` / `RET_IF_SIGN` (conditional early returns)\n- Lambdas (out-of-line code blocks)\n- Unconditional `CALL`\n\nYou have no way to express a loop or a conditional branch *within* a definition. 
The `[ ]` basic block mechanism from Onat would give you:\n- `[` — mark the top of a basic block (save the code offset)\n- `]` — emit a conditional or unconditional backward jump to the `[`\n- Or: `[` emits a forward-jump placeholder, and `]` patches it (for if-style blocks)\n\nThis would let you write factorial as a loop inside a single definition instead of relying on repeated `Imm` calls to `F_STEP`.\n\n### 4. **Expanded Annotation Layer** (Important but Can Wait)\n\nThe 8-byte annotation limit per token is functional but restrictive. This is item #6 — refactoring to variable-length comments. I'd defer this until after `[ ]` blocks work, because it's a UI/data-structure change rather than a compiler feature.\n\n---\n\n## My Recommendation: Do Items 1 + 2 Right Now\n\nFix the JMP bug, add the forward-jump DSL wrappers, and add the remaining immediate-move wrappers. This is ~30 minutes of clean mechanical work, eliminates every remaining raw emit in `compile_and_run_tape`, and gives you the exact `x64_patch_fwd` infrastructure you'll need for `[ ]` basic blocks immediately after.\n\nWant me to implement items 1 and 2 now?",
]
[discussion.discussions.test]
git_commit = "03fb6a818060e71e53e1c240e5f869d4d7e64c49"
last_updated = "2026-02-22T17:03:58"
history = [
"@2026-02-22T11:15:32\nUser:\nWhat should we do next with attempt_1?",
"@2026-02-22T11:15:34\nAI:\nLooking at the `GEMINI.md` roadmap, items 1-4 are completed. The remaining items are:\n\n**5. DSL wrappers for forward jump placeholders** — The `JMP rel32` and `CALL rel32` forward-jump patterns in `compile_and_run_tape` still use bare `emit8(x64_op_JMP_rel32) + emit32(0)` pairs. This is the most surgical, self-contained next step.\n\n**6. Expanded Annotation Layer** — The `anno_arena` is fixed at 8 bytes per token. Arbitrary-length comments need an indirection layer.\n\n**7. Continuous Validation & Complex Control Flow** — Basic Block jumps `[ ]`, more conditional patterns, loop primitives.\n\n---\n\nMy recommendation is to tackle **#5 first**, because:\n\n- It's the smallest, most well-defined gap. The exact API is already specced in the roadmap:\n - `x64_JMP_fwd_placeholder(U4* offset_out)` — emits `E9 00000000`, writes the patch offset.\n - `x64_patch_fwd(U4 offset)` — patches the placeholder with the current code position.\n- It closes the last \"raw magic bytes\" hole in the DSL discipline, making the emitter fully consistent.\n- It's a prerequisite for **#7** (Basic Block `[ ]` jumps need exactly this mechanism to resolve forward targets cleanly).\n\nAfter #5, **#7 (Basic Blocks `[ ]`)** is the highest-leverage architectural feature — it's what unlocks structured control flow without an AST, which is the core of the Lottes/Onat paradigm. **#6** (variable-length annotations) is more of a quality-of-life editor feature and can wait.\n\nWant me to implement **#5** now?",
"@2026-02-22T11:15:45\nUser:\nAlright lets do 5.",
"@2026-02-22T11:17:41\nAI:\nLooking at `main.c`, I need to add two DSL helpers to the `x64 Emission DSL` region and then replace the bare `emit8`/`emit32` pairs in `compile_and_run_tape`. Let me identify exactly what needs changing.\n\nThere are three raw forward-jump sites in `compile_and_run_tape`:\n\n1. **`STag_Define` block** — `emit8(x64_op_CALL_rel32 - 3)` + `emit32(0)` (note: `E9` is `JMP rel32`, written as `CALL_rel32 - 3` which is a bug-prone magic expression)\n2. **`STag_Lambda` block** — `emit8(x64_op_JMP_rel32)` + `emit32(0)`\n3. **Both patch sites** — `u4_r(code_arena.start + <offset>)[0] = current - (<offset> + 4)`\n\nI'll add `x64_JMP_fwd_placeholder` and `x64_patch_fwd` to the DSL, then clean up all three sites.\n\nI can see the file was saved but the PowerShell string replacement missed the `STag_Define` and both patch sites inside the loop — the old raw patterns are still there in `compile_and_run_tape`. The new DSL helpers were added correctly to the DSL region, but the call sites in the loop body weren't updated. Let me fix those now.\n\nThe DSL helpers were added correctly but the call sites in `compile_and_run_tape` still have the old raw patterns. The string replacement is failing because of whitespace/tab differences. I'll use a Python helper script to do the targeted replacements reliably.\n\nThe file looks correct. Let me do a final verification build to confirm it compiles and links cleanly.\n\nItem 5 is done and clean. 
Here's a summary of what was accomplished:\n\n**Two new DSL helpers added** to the `#pragma region x64 Emission DSL` block, immediately after `x64_RET()`:\n\n- `x64_JMP_fwd_placeholder(U4* offset_out)` — emits `E9 00000000` and records the displacement field's offset for later patching.\n- `x64_patch_fwd(U4 offset)` — writes the correct `rel32` displacement back into the previously emitted placeholder, computed as `current_pos - (offset + 4)`.\n\n**Six raw emission sites eliminated** in `compile_and_run_tape`:\n\n| Site | Old | New |\n|---|---|---|\n| `STag_Define` JMP emission | `emit8(x64_op_CALL_rel32 - 3)` + `emit32(0)` | `x64_JMP_fwd_placeholder(& def_jmp_offset)` |\n| `STag_Lambda` JMP emission | `emit8(x64_op_JMP_rel32)` + `emit32(0)` | `x64_JMP_fwd_placeholder(& lambda_jmp_offset)` |\n| Lambda patch (in loop) | `u4_r(...)[0] = current - (offset + 4)` | `x64_patch_fwd(lambda_jmp_offset)` |\n| Define patch (in loop) | `u4_r(...)[0] = current - (offset + 4)` | `x64_patch_fwd(def_jmp_offset)` |\n| Lambda patch (tail) | `u4_r(...)[0] = current - (offset + 4)` | `x64_patch_fwd(lambda_jmp_offset)` |\n| Define patch (tail) | `u4_r(...)[0] = current - (offset + 4)` | `x64_patch_fwd(def_jmp_offset)` |\n\nThe DSL is now fully consistent — no raw forward-jump patterns remain anywhere in `compile_and_run_tape`. The `x64_op_JMP_rel32` exception note in `GEMINI.md` can now be retired as well. Ready to move on to **#7 (Basic Block jumps `[ ]`)** whenever you are.",
"@2026-02-22T11:41:36\nUser:\nMove on to 7. (basic block jumps) sure.",
"@2026-02-22T11:45:25\nAI:\nI have successfully implemented the Basic Block jumps (`[ ]`) and the `RET_IF_ZERO` + tail-call pattern for loops. \n\nHere is a breakdown of what was added to `attempt_1/main.c`:\n\n1. **New Primitives:**\n - `[ ` (`PRIM_IF`): Emits a conditional forward jump (`JZ rel32`) that skips the block if `RAX == 0`. It pushes the patch offset to a new `ctrl_stack` managed by the JIT compiler.\n - `] ` (`PRIM_END`): Pops the offset from `ctrl_stack` and patches the forward jump with the correct relative displacement.\n - `RECURSE ` (`PRIM_RECURSE`): Emits an unconditional backward jump (`JMP rel32`) to the entry point of the current definition (`tape_to_code_offset[current_def_idx]`).\n\n2. **Tail-Call Loop Pattern:**\n The `Factorial` test was refactored. Instead of manually calling `F_STEP` five times, `F_STEP` now ends with a `RECURSE` primitive. Because `F_STEP` already uses `RET_IF_Z` to break out when the counter reaches zero, this effectively creates a zero-overhead `while` loop using the tail-call pattern.\n\n3. **AST-Less Control Flow (`[ ]`):**\n A new `IF TEST` was added to the tape initialization. It demonstrates the `[` and `]` block. It pushes `0` to `RAX`, hits `[`, and correctly skips the inner block. Then it pushes `1` to `RAX`, hits `[`, and correctly executes the inner block.\n\nThe compiler logic for these constructs is extremely simple and operates strictly in `O(1)` time during the single JIT pass, adhering to the Lottes/Onat \"zero-overhead\" philosophy. The project builds cleanly and the executable is ready to run.\n\nWould you like to move on to **#6 (Expanded Annotation Layer)** next, or is there anything you want to tweak with the control flow?",
]
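The `x64_JMP_fwd_placeholder` / `x64_patch_fwd` pair described in the discussion can be sketched over a plain byte buffer standing in for `code_arena` (the buffer and emit helpers here are hypothetical stand-ins; only the encoding and the `offset + 4` displacement math follow the discussion):

```c
#include <stdint.h>

static uint8_t  code[64];
static uint32_t used = 0;

static void emit8(uint8_t b) { code[used++] = b; }
static void emit32(uint32_t v) {
    for (int i = 0; i < 4; i++) emit8((uint8_t)(v >> (8 * i))); // little-endian
}
// Emits E9 00000000 (JMP rel32) and records where the 4-byte displacement lives.
static void x64_JMP_fwd_placeholder(uint32_t* offset_out) {
    emit8(0xE9);
    *offset_out = used;
    emit32(0);
}
// rel32 is relative to the end of the displacement field, hence (offset + 4).
static void x64_patch_fwd(uint32_t offset) {
    uint32_t rel = used - (offset + 4);
    for (int i = 0; i < 4; i++) code[offset + i] = (uint8_t)(rel >> (8 * i));
}
```

This is exactly the mechanism the `[ ]` basic blocks reuse: `[` records a placeholder offset on `ctrl_stack`, and `]` resolves it with the same patch arithmetic.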