forth_bootslop/CLAUDE.md
2026-02-21 10:52:56 -05:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## AI Behavior Rules
- **Do not** create shell scripts, README files, or descriptive files unless explicitly instructed.
- **Do not** do anything beyond what was asked. Suggest extras in text; do not implement them.
- If a task is heavy, use sub-agents (codebase investigator, code editor, pattern analyzer, etc.).
- Screenshots are in `C:\Users\Ed\scoop\apps\sharex\current\ShareX\Screenshots\2026-02` — user will
specify which by last-modified. Manually pasted content goes in `./gallery`.
- Do not use `.gitignore` to infer file relevance for context.
- Goal is guided mentorship: validate architecture, give nudges, provide tactical help when asked.
The user is learning to build this system. Do not auto-generate finished solutions.
## Project Overview
**bootslop** is an experimental x86-64 Windows application: a sourceless, zero-overhead
ColorForth-inspired programming environment. It draws on Timothy Lottes' "x56-40" / source-less
programming series and Onat Türkçüoğlu's VAMP/KYRA register-stack architecture.
There is no human-readable source — the "source of truth" is a binary token array (the "tape").
It features a modal visual editor (GDI-based), real-time JIT compilation to x86-64 machine code,
and cartridge-based persistence.
Canonical architecture reference: `references/Architectural_Consolidation.md`
Coding conventions: `CONVENTIONS.md`
AI behavior and goal context: `GEMINI.md`
## Build
Two-stage build via PowerShell: compile with clang, link with lld-link.
```powershell
pwsh scripts/build.attempt_1.c.ps1
```
Output goes to `build/attempt_1.exe`. Run the exe manually — it opens a GUI window.
**Toolchain requirements:** `clang` and `lld-link.exe` on PATH. Targets amd64 Windows 11.
Compiler flags: `-std=c23 -O0 -g -Wall -DBUILD_DEBUG=1 -fno-exceptions -fdiagnostics-absolute-paths`
Linker flags: `/MACHINE:X64 /SUBSYSTEM:CONSOLE /DEBUG /INCREMENTAL:NO` + `kernel32.lib user32.lib gdi32.lib`
Note: `-nostdlib` / `-ffreestanding` are commented out in the build script — the CRT is currently
linked but `<stdlib.h>` / `<string.h>` must not be included directly.
No automated tests exist. Verification is interactive via the running GUI.
## Code Architecture
All active source is in `attempt_1/`:
- **`main.c`** — The entire application (~867 lines). Contains: semantic tag definitions (X-macro),
global VM state, the JIT compiler (`compile_action`, `compile_and_run_tape`), the GDI renderer,
keyboard input handling, and cartridge save/load (F1/F2).
- **`duffle.amd64.win32.h`** — The C DSL header. Defines all base types (`U1`–`U8`, `S1`–`S8`,
`F4`, `F8`, `B1`–`B8`, `Str8`, `UTF8`), macros (`global`, `internal`, `LP_`, `I_`, `N_`),
arena allocator (`FArena`, `farena_push`, `farena_reset`), string formatting, and raw WinAPI
bindings.
### Token / Tape Model
- Tokens are `U4` (32-bit): top 4 bits = semantic tag, lower 28 bits = value or annotation index.
- Tags are defined via X-macro `Tag_Entries()`:
`Define` (`:`) · `Call` (`~`) · `Data` (`$`) · `Imm` (`^`) · `Comment` (`.`) · `Format` (` `)
- Two arenas: `tape_arena` (array of `U4` tokens) and `anno_arena` (array of `U8` — one 8-char
name slot per token, space-padded for name resolution).
- Helper macros: `pack_token(tag, val)`, `unpack_tag(token)`, `unpack_val(token)`.
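
The 4-bit tag / 28-bit value split can be sketched as follows. This is a standalone illustration, not the code in `main.c`: the `STag_*` numbering is assumed (the real order comes from `Tag_Entries()`), `TOKEN_VAL_BITS` and `TOKEN_VAL_MASK` are hypothetical names, and `U4` is re-declared from `<stdint.h>` so the snippet compiles without the DSL header.

```c
#include <stdint.h>

typedef uint32_t U4; // stand-in for the U4 type from duffle.amd64.win32.h

// Assumed numbering for illustration; the real values come from Tag_Entries().
enum { STag_Define = 0, STag_Call, STag_Data, STag_Imm, STag_Comment, STag_Format };

#define TOKEN_VAL_BITS 28u
#define TOKEN_VAL_MASK ((1u << TOKEN_VAL_BITS) - 1u) // 0x0FFFFFFF

// Tag lives in the top 4 bits, value or annotation index in the low 28 bits.
#define pack_token(tag, val) ((((U4)(tag)) << TOKEN_VAL_BITS) | (((U4)(val)) & TOKEN_VAL_MASK))
#define unpack_tag(token)    (((U4)(token)) >> TOKEN_VAL_BITS)
#define unpack_val(token)    (((U4)(token)) & TOKEN_VAL_MASK)
```

Under these assumptions, `pack_token(STag_Data, 0x123456)` round-trips through `unpack_tag` / `unpack_val`, and any value wider than 28 bits is silently masked rather than spilling into the tag.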
### JIT Compiler
- `compile_action(val)` — emits x86-64 machine code for a single primitive or call. Called by
`compile_and_run_tape` for each token.
- `compile_and_run_tape()` (`IA_` always-inline) — resets `code_arena`, compiles the tape up to
`cursor_idx + 1` (incremental mode, `run_full == false`) or the full tape (`run_full == true`),
then immediately executes the generated code. Called on every relevant keystroke.
- **JIT prologue/epilogue:** The generated function takes `U8* globals_ptr` (= `vm_globals`).
Prologue loads `rax` from `globals_ptr[0x70/8]` = `vm_globals[14]` and `rdx` from
`globals_ptr[0x78/8]` = `vm_globals[15]`. Epilogue stores them back. `vm_rax` / `vm_rdx` are
synced from `vm_globals[14/15]` after execution.
- **The Magenta Pipe:** Every `Define` token emits a `JMP` (to skip over the function body for
inline execution flow) followed by `xchg rax, rdx` at the word entry point. This is the implicit
register-stack rotation at word boundaries — Onat's "magenta pipe".
- **O(1) linker:** `tape_to_code_offset[65536]` maps tape index → byte offset in `code_arena`.
Populated during `compile_and_run_tape` when a `Define` token is encountered.
- The VM uses two global registers (`vm_rax`, `vm_rdx`) and 16 global memory cells
(`vm_globals[16]`). No traditional Forth data stack in memory.
- **13 primitive operations:** `SWAP` · `MULT` · `ADD` · `FETCH` · `STORE` · `DUP` · `DROP` ·
`SUB` · `DEC` · `PRINT` · `RET` · `RET_IF_Z` · `RET_IF_S`
- **32-bit instruction granularity:** All emitted instructions are padded to 4-byte alignment via
NOP bytes (0x90). `pad32()` enforces this after every emit.
- Name resolution: `resolve_name_to_index()` matches 8-char space-padded annotations against
primitives first, then prior `Define` tokens. After edits, `relink_tape()` re-resolves all
`Call`/`Imm` references.
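
A minimal sketch of how the 4-byte padding and the O(1) offset table could fit together. `CodeBuf`, `emit_bytes`, and `emit_define_entry` are hypothetical stand-ins (the real code uses `code_arena` and the `FArena` API), and the `JMP` that skips the word body is omitted for brevity. `48 87 C2` is one valid encoding of an `xchg` between `rax` and `rdx`; the encoding used in `main.c` may differ.

```c
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;
typedef uint64_t U8;

// Hypothetical fixed-size stand-in for code_arena.
typedef struct { U1 bytes[256]; U8 used; } CodeBuf;

static void emit_bytes(CodeBuf* code, U1 const* src, U8 count) {
    for (U8 i = 0; i < count && code->used < sizeof code->bytes; ++i)
        code->bytes[code->used++] = src[i];
}

// Pad with NOP (0x90) until the write position is a multiple of 4 bytes.
static void pad32(CodeBuf* code) {
    while ((code->used & 3u) != 0 && code->used < sizeof code->bytes)
        code->bytes[code->used++] = 0x90;
}

// Tape index -> byte offset of the word's entry point in the code buffer.
static U4 tape_to_code_offset[65536];

// Record where the word defined at `tape_idx` starts, then emit its entry
// instruction (xchg between rax and rdx) padded out to 4 bytes.
static void emit_define_entry(CodeBuf* code, U4 tape_idx) {
    static U1 const xchg_rax_rdx[] = { 0x48, 0x87, 0xC2 };
    if (tape_idx >= 65536u) return;
    pad32(code); // no-op when the buffer is already aligned
    tape_to_code_offset[tape_idx] = (U4)code->used;
    emit_bytes(code, xchg_rax_rdx, sizeof xchg_rax_rdx);
    pad32(code);
}
```

Because every emit ends on a 4-byte boundary, each recorded offset is a multiple of 4, and resolving a `Call` is a single array lookup by tape index.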
### Editor
- Two modes: `MODE_NAV` (navigate) / `MODE_EDIT` (type into token). Toggled with `E` / `Escape`.
- **Key bindings (NAV mode):**
- `E` — enter MODE_EDIT
- Arrow keys — move cursor (Up/Down navigate by logical lines delimited by `Format` tokens)
- `Tab` — cycle the current token's tag through `STag_*` values
- `Space` — insert a new `Comment` token at cursor
- `Shift+Space` — insert a new `Comment` token after cursor
- `Return` — insert a `Format` (newline) token at cursor
- `Backspace` — delete token before cursor
- `Shift+Backspace` — delete token at cursor
- `PgUp` / `PgDn` — scroll viewport
- `F5` — toggle `run_full` (incremental ↔ full-tape JIT)
- `F1` — save cartridge to `cartridge.bin`
- `F2` — load cartridge from `cartridge.bin` and run
- **Key bindings (EDIT mode):**
- Hex digits (`0-9`, `a-f`) — shift into `Data` token value
- Any printable char — append to annotation name (up to 8 chars)
- `Backspace` — shift `Data` value right or trim annotation name
- `Escape` — exit to MODE_NAV, triggers `relink_tape()`
- Tape renders as colored token boxes, `TOKENS_PER_ROW` (8) per row, each showing a tag prefix
char and either a 6-char hex value (Data) or an 8-char annotation name.
- GDI rendering via `BeginPaint`/`EndPaint`. The HUD (status bar at bottom) shows RAX/RDX state,
global memory cells [0-3], print log, and debug log.
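
The EDIT-mode handling of `Data` tokens (hex digits shift in, `Backspace` shifts out) might look roughly like this. The function names are hypothetical, and truncating to the 28-bit value field on overflow is an assumption about the editor's behavior, not something this file specifies.

```c
#include <stdint.h>

typedef uint32_t U4;
typedef int32_t  S4;

#define VAL_MASK 0x0FFFFFFFu // low 28 bits: token value
#define TAG_MASK 0xF0000000u // top 4 bits: semantic tag

// Returns 0..15 for '0'-'9' and 'a'-'f', or -1 for any other key.
static S4 hex_digit_value(char key) {
    if (key >= '0' && key <= '9') return key - '0';
    if (key >= 'a' && key <= 'f') return key - 'a' + 10;
    return -1;
}

// Shift one hex digit into the low nibble of the value; the tag is preserved.
// Digits pushed past bit 27 are dropped (assumed behavior).
static U4 data_shift_in(U4 token, char key) {
    S4 digit = hex_digit_value(key);
    if (digit < 0) return token;
    U4 val = ((token & VAL_MASK) << 4) | (U4)digit;
    return (token & TAG_MASK) | (val & VAL_MASK);
}

// Backspace: drop the low nibble of the value; the tag is preserved.
static U4 data_shift_out(U4 token) {
    return (token & TAG_MASK) | ((token & VAL_MASK) >> 4);
}
```

Starting from an empty `Data` token, typing `a` then `3` yields the value `0xA3`, and one `Backspace` returns it to `0xA`.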
### Persistence
- Cartridge format: `[tape_arena.used : U8][anno_arena.used : U8][cursor_idx : U8]
[tape data][anno data]`
- On load: restores arenas, cursor, calls `relink_tape()` then `compile_and_run_tape()`.
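
A sketch of the cartridge layout as an in-memory round trip. `CartridgeHeader`, `cartridge_write`, and `cartridge_read` are hypothetical names; the real save/load path goes through WinAPI file calls and the arenas directly. The sketch assumes both `used` fields are byte counts and that the header is written in native byte order.

```c
#include <stdint.h>

typedef uint8_t  U1;
typedef uint64_t U8;

// Three U8 fields, matching the header order described above.
typedef struct {
    U8 tape_used;  // bytes of token data (assumed to mirror tape_arena.used)
    U8 anno_used;  // bytes of annotation data (assumed to mirror anno_arena.used)
    U8 cursor_idx; // editor cursor position
} CartridgeHeader;

// Local byte copy so the sketch does not depend on <string.h>.
static void copy_bytes(U1* dst, U1 const* src, U8 count) {
    for (U8 i = 0; i < count; ++i) dst[i] = src[i];
}

// Serialize header + tape + annotations. Returns bytes written, or 0 if `cap` is too small.
static U8 cartridge_write(U1* out, U8 cap, CartridgeHeader hdr, U1 const* tape, U1 const* anno) {
    U8 total = sizeof hdr + hdr.tape_used + hdr.anno_used;
    if (total > cap) return 0;
    copy_bytes(out, (U1 const*)&hdr, sizeof hdr);
    copy_bytes(out + sizeof hdr, tape, hdr.tape_used);
    copy_bytes(out + sizeof hdr + hdr.tape_used, anno, hdr.anno_used);
    return total;
}

// Parse a cartridge image. Returns 1 and fills the outputs if the buffer holds
// a complete cartridge, 0 if it is truncated.
static int cartridge_read(U1 const* in, U8 len, CartridgeHeader* hdr,
                          U1 const** tape, U1 const** anno) {
    if (len < sizeof *hdr) return 0;
    copy_bytes((U1*)hdr, in, sizeof *hdr);
    U8 payload = len - sizeof *hdr;
    if (hdr->tape_used > payload || hdr->anno_used > payload - hdr->tape_used) return 0;
    *tape = in + sizeof *hdr;
    *anno = *tape + hdr->tape_used;
    return 1;
}
```

Validating the two sizes against the buffer length before touching the payload keeps a truncated `cartridge.bin` from being restored into the arenas.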
## Current Development Roadmap
Status as of 2026-02-21:
1. **FFI / Tape Drive Argument Scatter** — the PRINT primitive manually aligns RSP and moves rax
into rcx before calling `ms_builtin_print`. R8/R9 args should come from pre-defined `vm_globals`
offsets ("preemptive scatter") rather than being zeroed.
2. **Variable-Length Annotations** — `anno_arena` is fixed at 8 bytes per token. Need a scheme
for longer comments without breaking the `O(1)` `tape_to_code_offset` mapping.
3. ~~**Cartridge Persistence**~~ — DONE (F1/F2 save/load via WinAPI `CreateFileA`/`WriteFile`).
4. **Editor Cursor Refinement** — proper in-token cursor for `Data` and annotation tokens, rather
than backspace-truncation and right-shift append.
5. **Control Flow Expansion** — lambdas or basic block jumps beyond the current conditional-return
primitives (`RET_IF_Z`, `RET_IF_S`).
## C DSL Conventions (from CONVENTIONS.md — strictly enforced)
**Types:** Never use `int`, `long`, `unsigned`, etc. Always use `U1`/`U2`/`U4`/`U8` (unsigned),
`S1`/`S2`/`S4`/`S8` (signed), `F4`/`F8` (float), `B1`–`B8` (bool).
Use cast macros (`u8_(val)`, `u4_(val)`, `u4_r(ptr)`) — not C-style casts. Standard C casts only
for complex types where no macro exists.
**Naming:** `lower_snake_case` for functions/variables. `PascalCase` for types. WinAPI bindings
prefixed with `ms_` using `asm("SymbolName")` — never declare raw WinAPI names.
**const placement:** Always to the right: `char const*`, not `const char*`.
**Structs/Enums:** Use `typedef Struct_(Name) { ... };` and `typedef Enum_(UnderlyingType, Name) { ... };`.
**X-Macros:** Use for enums coupled with metadata (colors, prefixes, names). Entry names PascalCase,
enum symbols use `tmpl(TypeName, Entry)` → `TypeName_Entry`.
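
As a rough illustration of the X-macro pattern, the sketch below expands one entry list into an enum and two parallel metadata tables. It uses a plain `enum` instead of the repo's `Enum_` macro, re-creates `tmpl` as simple token pasting (an assumption about its definition), and carries only the name and prefix char; the entry order and the `tag_prefixes` / `tag_names` tables are hypothetical.

```c
// Assumed definition: paste TypeName and Entry into TypeName_Entry.
#define tmpl(type, entry) type##_##entry

// Single source of truth: entry name and its prefix character.
#define Tag_Entries() \
    X(Define,  ':')   \
    X(Call,    '~')   \
    X(Data,    '$')   \
    X(Imm,     '^')   \
    X(Comment, '.')   \
    X(Format,  ' ')

// Expansion 1: enum symbols STag_Define, STag_Call, ...
typedef enum {
#define X(name, prefix) tmpl(STag, name),
    Tag_Entries()
#undef X
    STag_Count
} STag;

// Expansion 2: prefix char per tag, indexed by enum value.
static char const tag_prefixes[] = {
#define X(name, prefix) prefix,
    Tag_Entries()
#undef X
};

// Expansion 3: printable name per tag, indexed by enum value.
static char const* const tag_names[] = {
#define X(name, prefix) #name,
    Tag_Entries()
#undef X
};
```

Adding a tag then means adding one `X(...)` line; the enum and both tables stay in sync automatically.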
**Memory:** Use `FArena` / `farena_push` / `farena_reset` — no raw malloc. Use `mem_fill`/`mem_copy`
not memset/memcpy. Do not `#include <stdlib.h>` or `<string.h>`.
**Formatting:** Allman braces for complex blocks. Vertical alignment for struct fields and related
declarations. Space between `&` and operand: `& my_var`. `else if` / `else` on new lines. Align
consecutive `while`/`if` keywords vertically where possible.
**Storage class keywords:** `global` (= `static` at file scope), `internal` (= `static` for
functions), `LP_` (= `static` inside a function), `I_` (inline), `N_` (noinline), `IA_`
(always-inline).
**Line length:** 120–160 characters per line in scripts.