forth_bootslop/CLAUDE.md
2026-02-21 10:52:56 -05:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

AI Behavior Rules

  • Do not create shell scripts, README files, or descriptive files unless explicitly instructed.
  • Do not do anything beyond what was asked. Suggest extras in text; do not implement them.
  • If a task is heavy, use sub-agents (codebase investigator, code editor, pattern analyzer, etc.).
  • Screenshots are in C:\Users\Ed\scoop\apps\sharex\current\ShareX\Screenshots\2026-02 — user will specify which by last-modified. Manually pasted content goes in ./gallery.
  • Do not use .gitignore to infer file relevance for context.
  • Goal is guided mentorship: validate architecture, give nudges, provide tactical help when asked. The user is learning to build this system. Do not auto-generate finished solutions.

Project Overview

bootslop is an experimental x86-64 Windows application: a sourceless, zero-overhead ColorForth-inspired programming environment. It draws on Timothy Lottes' "x56-40" / source-less programming series and Onat Türkçüoğlu's VAMP/KYRA register-stack architecture.

There is no human-readable source — the "source of truth" is a binary token array (the "tape"). It features a modal visual editor (GDI-based), real-time JIT compilation to x86-64 machine code, and cartridge-based persistence.

  • Canonical architecture reference: references/Architectural_Consolidation.md
  • Coding conventions: CONVENTIONS.md
  • AI behavior and goal context: GEMINI.md

Build

Two-stage build via PowerShell: compile with clang, link with lld-link.

pwsh scripts/build.attempt_1.c.ps1

Output goes to build/attempt_1.exe. Run the exe manually — it opens a GUI window.

Toolchain requirements: clang and lld-link.exe on PATH. Targets amd64 Windows 11.

Compiler flags: -std=c23 -O0 -g -Wall -DBUILD_DEBUG=1 -fno-exceptions -fdiagnostics-absolute-paths

Linker flags: /MACHINE:X64 /SUBSYSTEM:CONSOLE /DEBUG /INCREMENTAL:NO + kernel32.lib user32.lib gdi32.lib

Note: -nostdlib / -ffreestanding are commented out in the build script — the CRT is currently linked but <stdlib.h> / <string.h> must not be included directly.

No automated tests exist. Verification is interactive via the running GUI.

Code Architecture

All active source is in attempt_1/:

  • main.c — The entire application (~867 lines). Contains: semantic tag definitions (X-macro), global VM state, the JIT compiler (compile_action, compile_and_run_tape), the GDI renderer, keyboard input handling, and cartridge save/load (F1/F2).
  • duffle.amd64.win32.h — The C DSL header. Defines all base types (U1–U8, S1–S8, F4, F8, B1–B8, Str8, UTF8), macros (global, internal, LP_, I_, N_), arena allocator (FArena, farena_push, farena_reset), string formatting, and raw WinAPI bindings.

Token / Tape Model

  • Tokens are U4 (32-bit): top 4 bits = semantic tag, lower 28 bits = value or annotation index.
  • Tags are defined via X-macro Tag_Entries(): Define (:) · Call (~) · Data ($) · Imm (^) · Comment (.) · Format ( )
  • Two arenas: tape_arena (array of U4 tokens) and anno_arena (array of U8 — one 8-char name slot per token, space-padded for name resolution).
  • Helper macros: pack_token(tag, val), unpack_tag(token), unpack_val(token).
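
The 4-bit tag / 28-bit value split described above can be sketched as follows. This is an illustration only: the real helpers are macros in main.c, the tag numbers here are hypothetical (the real ones come from Tag_Entries()), and U4 is a stand-in typedef for the type defined in duffle.amd64.win32.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4; // stand-in for the U4 type from duffle.amd64.win32.h

// Hypothetical tag numbering; the real values are generated by Tag_Entries().
enum { TAG_DEFINE = 0, TAG_CALL = 1, TAG_DATA = 2 };

#define TOKEN_VAL_BITS 28u
#define TOKEN_VAL_MASK ((1u << TOKEN_VAL_BITS) - 1u) // 0x0FFFFFFF

// Written as inline functions for readability; the project uses macros.
static inline U4 pack_token(U4 tag, U4 val)
{
    assert(tag <= 0xFu); // only 4 bits are available for the tag
    return (tag << TOKEN_VAL_BITS) | (val & TOKEN_VAL_MASK);
}

static inline U4 unpack_tag(U4 token) { return token >> TOKEN_VAL_BITS; }
static inline U4 unpack_val(U4 token) { return token & TOKEN_VAL_MASK; }
```

Values wider than 28 bits are silently truncated by the mask in this sketch; whether the real macros mask or trust the caller is not stated here.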

JIT Compiler

  • compile_action(val) — emits x86-64 machine code for a single primitive or call. Called by compile_and_run_tape for each token.
  • compile_and_run_tape() (IA_ always-inline) — resets code_arena, compiles the tape up to cursor_idx + 1 (incremental mode, run_full == false) or the full tape (run_full == true), then immediately executes the generated code. Called on every relevant keystroke.
  • JIT prologue/epilogue: The generated function takes U8* globals_ptr (= vm_globals). Prologue loads rax from globals_ptr[0x70/8] = vm_globals[14] and rdx from globals_ptr[0x78/8] = vm_globals[15]. Epilogue stores them back. vm_rax / vm_rdx are synced from vm_globals[14/15] after execution.
  • The Magenta Pipe: Every Define token emits a JMP (to skip over the function body for inline execution flow) followed by xchg rax, rdx at the word entry point. This is the implicit register-stack rotation at word boundaries — Onat's "magenta pipe".
  • O(1) linker: tape_to_code_offset[65536] maps tape index → byte offset in code_arena. Populated during compile_and_run_tape when a Define token is encountered.
  • The VM uses two global registers (vm_rax, vm_rdx) and 16 global memory cells (vm_globals[16]). No traditional Forth data stack in memory.
  • 13 primitive operations: SWAP · MULT · ADD · FETCH · STORE · DUP · DROP · SUB · DEC · PRINT · RET · RET_IF_Z · RET_IF_S
  • 32-bit instruction granularity: All emitted instructions are padded to 4-byte alignment via NOP bytes (0x90). pad32() enforces this after every emit.
  • Name resolution: resolve_name_to_index() matches 8-char space-padded annotations against primitives first, then prior Define tokens. After edits, relink_tape() re-resolves all Call/Imm references.
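
To make the 4-byte instruction granularity concrete, here is a minimal sketch of an emit followed by NOP padding. CodeBuf, emit_bytes, and emit_xchg_rax_rdx are hypothetical names for illustration; the real project emits into code_arena and its pad32() signature may differ. The byte pair 48 92 is the standard short encoding of xchg rax, rdx.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  U1;
typedef uint64_t U8;

// Hypothetical code buffer; the real project emits into code_arena.
typedef struct CodeBuf { U1* base; U8 used; U8 capacity; } CodeBuf;

static void emit_bytes(CodeBuf* buf, U1 const* bytes, U8 count)
{
    assert(buf->used + count <= buf->capacity);
    for (U8 i = 0; i < count; ++i) { buf->base[buf->used + i] = bytes[i]; }
    buf->used += count;
}

// Pad with single-byte NOPs (0x90) until the write offset is a multiple of 4.
static void pad32(CodeBuf* buf)
{
    while (buf->used & 3u)
    {
        assert(buf->used < buf->capacity);
        buf->base[buf->used++] = 0x90;
    }
}

// xchg rax, rdx is 2 bytes (48 92), so pad32 appends two NOPs after it.
static void emit_xchg_rax_rdx(CodeBuf* buf)
{
    static U1 const op[] = { 0x48, 0x92 };
    emit_bytes(buf, op, sizeof op);
    pad32(buf);
}
```

Because every instruction ends on a 4-byte boundary, any offset recorded in tape_to_code_offset is itself a multiple of 4 under this scheme.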

Editor

  • Two modes: MODE_NAV (navigate) / MODE_EDIT (type into token). Toggled with E / Escape.
  • Key bindings (NAV mode):
    • E — enter MODE_EDIT
    • Arrow keys — move cursor (Up/Down navigate by logical lines delimited by Format tokens)
    • Tab — cycle the current token's tag through STag_* values
    • Space — insert a new Comment token at cursor
    • Shift+Space — insert a new Comment token after cursor
    • Return — insert a Format (newline) token at cursor
    • Backspace — delete token before cursor
    • Shift+Backspace — delete token at cursor
    • PgUp / PgDn — scroll viewport
    • F5 — toggle run_full (incremental ↔ full-tape JIT)
    • F1 — save cartridge to cartridge.bin
    • F2 — load cartridge from cartridge.bin and run
  • Key bindings (EDIT mode):
    • Hex digits (0-9, a-f) — shift into Data token value
    • Any printable char — append to annotation name (up to 8 chars)
    • Backspace — shift Data value right or trim annotation name
    • Escape — exit to MODE_NAV, triggers relink_tape()
  • Tape renders as colored token boxes, TOKENS_PER_ROW (8) per row, each showing a tag prefix char and either a 6-char hex value (Data) or an 8-char annotation name.
  • GDI rendering via BeginPaint/EndPaint. The HUD (status bar at bottom) shows RAX/RDX state, global memory cells [0-3], print log, and debug log.
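
The EDIT-mode handling of Data tokens listed above (hex digits shift into the value, Backspace shifts it right) can be sketched like this. The function names are hypothetical, and the sketch assumes the value is kept within the 28-bit token payload; the actual key handler in main.c may be structured differently.

```c
#include <stdint.h>

typedef uint32_t U4;
typedef int32_t  S4;

#define DATA_VAL_MASK 0x0FFFFFFFu // 28-bit token payload

// Returns 0..15 for '0'-'9' / 'a'-'f', or -1 for any other character.
static S4 hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Shift one hex digit into the low nibble; non-hex input leaves the value unchanged.
static U4 data_shift_in(U4 val, char c)
{
    S4 digit = hex_digit_value(c);
    if (digit < 0) return val;
    return ((val << 4) | (U4)digit) & DATA_VAL_MASK;
}

// Backspace drops the low nibble.
static U4 data_backspace(U4 val) { return val >> 4; }
```

Typing 1, 2, a on an empty Data token would yield 0x12A in this sketch, and one Backspace would return it to 0x12.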

Persistence

  • Cartridge format: [tape_arena.used : U8][anno_arena.used : U8][cursor_idx : U8] [tape data][anno data]
  • On load: restores arenas, cursor, calls relink_tape() then compile_and_run_tape().
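
The cartridge layout above amounts to a 24-byte header followed by two variable-length blobs. A minimal sketch of the offsets, assuming the .used fields count bytes and the header is written in native (little-endian) byte order; CartridgeHeader, cartridge_write, and copy_bytes are hypothetical names, with copy_bytes standing in for the project's mem_copy:

```c
#include <stdint.h>

typedef uint8_t  U1;
typedef uint64_t U8;

// Mirrors [tape_arena.used : U8][anno_arena.used : U8][cursor_idx : U8].
typedef struct CartridgeHeader { U8 tape_used; U8 anno_used; U8 cursor_idx; } CartridgeHeader;

static void copy_bytes(U1* dst, U1 const* src, U8 count)
{
    for (U8 i = 0; i < count; ++i) { dst[i] = src[i]; }
}

// Total size of the cartridge image in bytes.
static U8 cartridge_size(CartridgeHeader const* h)
{
    return sizeof *h + h->tape_used + h->anno_used;
}

// Writes header, tape data, then annotation data; returns bytes written.
// The caller must provide at least cartridge_size(h) bytes in out.
static U8 cartridge_write(U1* out, CartridgeHeader const* h, U1 const* tape, U1 const* anno)
{
    U8 at = 0;
    copy_bytes(out + at, (U1 const*)h, sizeof *h); at += sizeof *h;
    copy_bytes(out + at, tape, h->tape_used);      at += h->tape_used;
    copy_bytes(out + at, anno, h->anno_used);      at += h->anno_used;
    return at;
}
```

Loading is the mirror image: read the header first, then use tape_used and anno_used to locate and size the two blobs before relinking.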

Current Development Roadmap

Status as of 2026-02-21:

  1. FFI / Tape Drive Argument Scatter — the PRINT primitive manually aligns RSP and moves rax into rcx before calling ms_builtin_print. R8/R9 args should come from pre-defined vm_globals offsets ("preemptive scatter") rather than being zeroed.
  2. Variable-Length Annotations: anno_arena is fixed at 8 bytes per token. Need a scheme for longer comments without breaking the O(1) tape_to_code_offset mapping.
  3. Cartridge Persistence — DONE (F1/F2 save/load via WinAPI CreateFileA/WriteFile).
  4. Editor Cursor Refinement — proper in-token cursor for Data and annotation tokens, rather than backspace-truncation and right-shift append.
  5. Control Flow Expansion — lambdas or basic block jumps beyond the current conditional-return primitives (RET_IF_Z, RET_IF_S).

C DSL Conventions (from CONVENTIONS.md — strictly enforced)

Types: Never use int, long, unsigned, etc. Always use U1/U2/U4/U8 (unsigned), S1/S2/S4/S8 (signed), F4/F8 (float), B1–B8 (bool). Use cast macros (u8_(val), u4_(val), u4_r(ptr)) — not C-style casts. Standard C casts only for complex types where no macro exists.

Naming: lower_snake_case for functions/variables. PascalCase for types. WinAPI bindings prefixed with ms_ using asm("SymbolName") — never declare raw WinAPI names.

const placement: Always to the right: char const*, not const char*.

Structs/Enums: Use typedef Struct_(Name) { ... }; and typedef Enum_(UnderlyingType, Name) { ... };.

X-Macros: Use for enums coupled with metadata (colors, prefixes, names). Entry names are PascalCase; enum symbols use tmpl(TypeName, Entry), which yields TypeName_Entry.
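
A minimal sketch of the X-macro pattern, using the six tags and prefix characters listed under Token / Tape Model. The tmpl definition, the entry order, the two-column entry shape, and the table names are assumptions for illustration; the real Tag_Entries() also carries colors, and the project declares enums through Enum_ rather than a plain enum.

```c
// Assumed definition: pastes TypeName and Entry into TypeName_Entry.
#define tmpl(type_name, entry) type_name##_##entry

// One row per tag: X(EntryName, prefix_char). Colors omitted in this sketch.
#define Tag_Entries() \
    X(Define,  ':')   \
    X(Call,    '~')   \
    X(Data,    '$')   \
    X(Imm,     '^')   \
    X(Comment, '.')   \
    X(Format,  ' ')

// Expansion 1: the enum symbols STag_Define, STag_Call, ...
typedef enum STag
{
#define X(name, prefix) tmpl(STag, name),
    Tag_Entries()
#undef X
    STag_Count
} STag;

// Expansion 2: prefix characters, indexed by the enum.
static char const stag_prefix[STag_Count] =
{
#define X(name, prefix) [tmpl(STag, name)] = prefix,
    Tag_Entries()
#undef X
};

// Expansion 3: printable names, generated by stringizing the entry.
static char const* const stag_name[STag_Count] =
{
#define X(name, prefix) [tmpl(STag, name)] = #name,
    Tag_Entries()
#undef X
};
```

Adding a tag then means adding one row to Tag_Entries(); the enum and both tables stay in sync without further edits.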

Memory: Use FArena / farena_push / farena_reset — no raw malloc. Use mem_fill/mem_copy not memset/memcpy. Do not #include <stdlib.h> or <string.h>.
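
For orientation, a linear (bump) arena of the kind this rule refers to can be sketched as below. The field names and signatures are assumptions: only the names FArena, farena_push, farena_reset and a .used field appear in this document, and the real definitions live in duffle.amd64.win32.h. The sketch ignores alignment.

```c
#include <stdint.h>

typedef uint8_t  U1;
typedef uint64_t U8;

// Assumed layout: a fixed backing block plus a running byte count.
typedef struct FArena { U1* start; U8 capacity; U8 used; } FArena;

// Hands out the next `size` bytes, or a null pointer if the arena is full.
static void* farena_push(FArena* arena, U8 size)
{
    if (size > arena->capacity - arena->used) return 0;
    void* result = arena->start + arena->used;
    arena->used += size;
    return result;
}

// Releases everything at once; the backing memory is kept for reuse.
static void farena_reset(FArena* arena) { arena->used = 0; }
```

This is why compile_and_run_tape() can cheaply discard and regenerate all machine code on each keystroke: resetting an arena is a single store.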

Formatting: Allman braces for complex blocks. Vertical alignment for struct fields and related declarations. Space between & and operand: & my_var. else if / else on new lines. Align consecutive while/if keywords vertically where possible.

Storage class keywords: global (= static at file scope), internal (= static for functions), LP_ (= static inside a function), I_ (inline), N_ (noinline), IA_ (always-inline).

Line length: 120–160 characters per line in scripts.