# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## AI Behavior Rules

- **Do not** create shell scripts, README files, or descriptive files unless explicitly instructed.
- **Do not** do anything beyond what was asked. Suggest extras in text; do not implement them.
- If a task is heavy, use sub-agents (codebase investigator, code editor, pattern analyzer, etc.).
- Screenshots are in `C:\Users\Ed\scoop\apps\sharex\current\ShareX\Screenshots\2026-02`; the user will specify which by last-modified. Manually pasted content goes in `./gallery`.
- Do not use `.gitignore` to infer file relevance for context.
- Goal is guided mentorship: validate architecture, give nudges, provide tactical help when asked. The user is learning to build this system. Do not auto-generate finished solutions.

## Project Overview

**bootslop** is an experimental x86-64 Windows application: a sourceless, zero-overhead ColorForth-inspired programming environment. Inspired by Timothy Lottes' "x56-40" / source-less programming series and Onat Türkçüoğlu's VAMP/KYRA register-stack architecture. There is no human-readable source; the "source of truth" is a binary token array (the "tape"). It features a modal visual editor (GDI-based), real-time JIT compilation to x86-64 machine code, and cartridge-based persistence.

- Canonical architecture reference: `references/Architectural_Consolidation.md`
- Coding conventions: `CONVENTIONS.md`
- AI behavior and goal context: `GEMINI.md`

## Build

Two-stage build via PowerShell: compile with clang, link with lld-link.

```powershell
pwsh scripts/build.attempt_1.c.ps1
```

Output goes to `build/attempt_1.exe`. Run the exe manually; it opens a GUI window.

**Toolchain requirements:** `clang` and `lld-link.exe` on PATH. Targets amd64 Windows 11.

Compiler flags: `-std=c23 -O0 -g -Wall -DBUILD_DEBUG=1 -fno-exceptions -fdiagnostics-absolute-paths`

Linker flags: `/MACHINE:X64 /SUBSYSTEM:CONSOLE /DEBUG /INCREMENTAL:NO` + `kernel32.lib user32.lib gdi32.lib`

Note: `-nostdlib` / `-ffreestanding` are commented out in the build script, so the CRT is currently linked, but `` / `` must not be included directly.

No automated tests exist. Verification is interactive via the running GUI.

## Code Architecture

All active source is in `attempt_1/`:

- **`main.c`**: The entire application (~867 lines). Contains: semantic tag definitions (X-macro), global VM state, the JIT compiler (`compile_action`, `compile_and_run_tape`), the GDI renderer, keyboard input handling, and cartridge save/load (F1/F2).
- **`duffle.amd64.win32.h`**: The C DSL header. Defines all base types (`U1`–`U8`, `S1`–`S8`, `F4`, `F8`, `B1`–`B8`, `Str8`, `UTF8`), macros (`global`, `internal`, `LP_`, `I_`, `N_`), arena allocator (`FArena`, `farena_push`, `farena_reset`), string formatting, and raw WinAPI bindings.

### Token / Tape Model

- Tokens are `U4` (32-bit): top 4 bits = semantic tag, lower 28 bits = value or annotation index.
- Tags are defined via X-macro `Tag_Entries()`: `Define` (`:`) · `Call` (`~`) · `Data` (`$`) · `Imm` (`^`) · `Comment` (`.`) · `Format` (` `)
- Two arenas: `tape_arena` (array of `U4` tokens) and `anno_arena` (array of `U8`, one 8-char name slot per token, space-padded for name resolution).
- Helper macros: `pack_token(tag, val)`, `unpack_tag(token)`, `unpack_val(token)`.

### JIT Compiler

- `compile_action(val)`: emits x86-64 machine code for a single primitive or call. Called by `compile_and_run_tape` for each token.
- `compile_and_run_tape()` (`IA_` always-inline): resets `code_arena`, compiles the tape up to `cursor_idx + 1` (incremental mode, `run_full == false`) or the full tape (`run_full == true`), then immediately executes the generated code. Called on every relevant keystroke.
- **JIT prologue/epilogue:** The generated function takes `U8* globals_ptr` (= `vm_globals`). Prologue loads `rax` from `globals_ptr[0x70/8]` = `vm_globals[14]` and `rdx` from `globals_ptr[0x78/8]` = `vm_globals[15]`. Epilogue stores them back. `vm_rax` / `vm_rdx` are synced from `vm_globals[14/15]` after execution.
- **The Magenta Pipe:** Every `Define` token emits a `JMP` (to skip over the function body for inline execution flow) followed by `xchg rax, rdx` at the word entry point. This is the implicit register-stack rotation at word boundaries (Onat's "magenta pipe").
- **O(1) linker:** `tape_to_code_offset[65536]` maps tape index → byte offset in `code_arena`. Populated during `compile_and_run_tape` when a `Define` token is encountered.
- The VM uses two global registers (`vm_rax`, `vm_rdx`) and 16 global memory cells (`vm_globals[16]`). No traditional Forth data stack in memory.
- **13 primitive operations:** `SWAP` · `MULT` · `ADD` · `FETCH` · `STORE` · `DUP` · `DROP` · `SUB` · `DEC` · `PRINT` · `RET` · `RET_IF_Z` · `RET_IF_S`
- **32-bit instruction granularity:** All emitted instructions are padded to 4-byte alignment via NOP bytes (0x90). `pad32()` enforces this after every emit.
- Name resolution: `resolve_name_to_index()` matches 8-char space-padded annotations against primitives first, then prior `Define` tokens. After edits, `relink_tape()` re-resolves all `Call`/`Imm` references.

### Editor

- Two modes: `MODE_NAV` (navigate) / `MODE_EDIT` (type into token). Toggled with `E` / `Escape`.
- **Key bindings (NAV mode):**
  - `E`: enter MODE_EDIT
  - Arrow keys: move cursor (Up/Down navigate by logical lines delimited by `Format` tokens)
  - `Tab`: cycle the current token's tag through `STag_*` values
  - `Space`: insert a new `Comment` token at cursor
  - `Shift+Space`: insert a new `Comment` token after cursor
  - `Return`: insert a `Format` (newline) token at cursor
  - `Backspace`: delete token before cursor
  - `Shift+Backspace`: delete token at cursor
  - `PgUp` / `PgDn`: scroll viewport
  - `F5`: toggle `run_full` (incremental ↔ full-tape JIT)
  - `F1`: save cartridge to `cartridge.bin`
  - `F2`: load cartridge from `cartridge.bin` and run
- **Key bindings (EDIT mode):**
  - Hex digits (`0-9`, `a-f`): shift into `Data` token value
  - Any printable char: append to annotation name (up to 8 chars)
  - `Backspace`: shift `Data` value right or trim annotation name
  - `Escape`: exit to MODE_NAV, triggers `relink_tape()`
- Tape renders as colored token boxes, `TOKENS_PER_ROW` (8) per row, each showing a tag prefix char and either a 6-char hex value (Data) or an 8-char annotation name.
- GDI rendering via `BeginPaint`/`EndPaint`. The HUD (status bar at bottom) shows RAX/RDX state, global memory cells [0-3], print log, and debug log.

### Persistence

- Cartridge format: `[tape_arena.used : U8][anno_arena.used : U8][cursor_idx : U8] [tape data][anno data]`
- On load: restores arenas, cursor, calls `relink_tape()` then `compile_and_run_tape()`.

## Current Development Roadmap

Status as of 2026-02-21:

1. **FFI / Tape Drive Argument Scatter**: the PRINT primitive manually aligns RSP and moves rax into rcx before calling `ms_builtin_print`. R8/R9 args should come from pre-defined `vm_globals` offsets ("preemptive scatter") rather than being zeroed.
2. **Variable-Length Annotations**: `anno_arena` is fixed at 8 bytes per token. Need a scheme for longer comments without breaking the `O(1)` `tape_to_code_offset` mapping.
3. ~~**Cartridge Persistence**~~: DONE (F1/F2 save/load via WinAPI `CreateFileA`/`WriteFile`).
4. **Editor Cursor Refinement**: proper in-token cursor for `Data` and annotation tokens, rather than backspace-truncation and right-shift append.
5. **Control Flow Expansion**: lambdas or basic block jumps beyond the current conditional-return primitives (`RET_IF_Z`, `RET_IF_S`).

## C DSL Conventions (from CONVENTIONS.md, strictly enforced)

**Types:** Never use `int`, `long`, `unsigned`, etc. Always use `U1`/`U2`/`U4`/`U8` (unsigned), `S1`/`S2`/`S4`/`S8` (signed), `F4`/`F8` (float), `B1`–`B8` (bool). Use cast macros (`u8_(val)`, `u4_(val)`, `u4_r(ptr)`), not C-style casts. Standard C casts only for complex types where no macro exists.

**Naming:** `lower_snake_case` for functions/variables. `PascalCase` for types. WinAPI bindings prefixed with `ms_` using `asm("SymbolName")`; never declare raw WinAPI names.

**const placement:** Always to the right: `char const*`, not `const char*`.

**Structs/Enums:** Use `typedef Struct_(Name) { ... };` and `typedef Enum_(UnderlyingType, Name) { ... };`.

**X-Macros:** Use for enums coupled with metadata (colors, prefixes, names). Entry names PascalCase, enum symbols use `tmpl(TypeName, Entry)` → `TypeName_Entry`.

**Memory:** Use `FArena` / `farena_push` / `farena_reset`; no raw malloc. Use `mem_fill`/`mem_copy` not memset/memcpy. Do not `#include ` or ``.

**Formatting:** Allman braces for complex blocks. Vertical alignment for struct fields and related declarations. Space between `&` and operand: `& my_var`. `else if` / `else` on new lines. Align consecutive `while`/`if` keywords vertically where possible.

**Storage class keywords:** `global` (= `static` at file scope), `internal` (= `static` for functions), `LP_` (= `static` inside a function), `I_` (inline), `N_` (noinline), `IA_` (always-inline).

**Line length:** 120–160 characters per line in scripts.
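
## Reference Sketch: Token Packing

The token layout described under Token / Tape Model (top 4 bits = tag, lower 28 bits = value) can be illustrated with a minimal standalone sketch. It is not the code in `main.c`: the helpers are written as functions rather than macros, `U4` is a stand-in typedef built on `<stdint.h>` (the project itself uses `duffle.amd64.win32.h` and avoids standard headers), the numeric `STag_*` values simply follow the order the tags are listed in and are assumed, and `TOKEN_VAL_BITS` / `TOKEN_VAL_MASK` / `token_roundtrip_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the project's U4 type (assumption: the real one comes from duffle.amd64.win32.h). */
typedef uint32_t U4;

/* Hypothetical tag numbering in listing order; the real values come from Tag_Entries(). */
enum { STag_Define, STag_Call, STag_Data, STag_Imm, STag_Comment, STag_Format };

/* Hypothetical names for the 28-bit value field. */
#define TOKEN_VAL_BITS 28u
#define TOKEN_VAL_MASK 0x0FFFFFFFu

/* Tag goes in the top 4 bits; the value is truncated to the lower 28 bits. */
static inline U4 pack_token(U4 tag, U4 val) { return (tag << TOKEN_VAL_BITS) | (val & TOKEN_VAL_MASK); }
static inline U4 unpack_tag(U4 token)       { return token >> TOKEN_VAL_BITS; }
static inline U4 unpack_val(U4 token)       { return token & TOKEN_VAL_MASK; }

/* Round-trip check: packing then unpacking returns the original tag and value. */
static inline void token_roundtrip_check(void)
{
    U4 token = pack_token(STag_Data, 0xABCDEFu);
    assert(unpack_tag(token) == (U4)STag_Data);
    assert(unpack_val(token) == 0xABCDEFu);
}
```

Under these assumptions a `Data` token holding `0xABCDEF` packs to `0x20ABCDEF`, and any value wider than 28 bits is silently truncated by the mask.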