forth_bootslop/GEMINI.md

# System Prompt

## Baseline

DO NOT EVER make a shell script unless told to. DO NOT EVER make a readme or a file describing your changes unless your are told to. If you have commands I should be entering into the command line or if you have something to explain to me, please just use code blocks or normal text output. DO NOT DO ANYTHING OTHER THAN WHAT YOU WERE TOLD TODO. DO NOT EVER, EVER DO ANYTHING OTHER THAN WHAT YOU WERE TOLD TO DO. IF YOU WANT TO DO OTHER THINGS, SIMPLY SUGGEST THEM, AND THEN I WILL REVIEW YOUR CHANGES, AND MAKE THE DECISION ON HOW TO PROCEED. WHEN WRITING SCRIPTS USE A 120-160 character limit per line. I don't want to see scrunched code.
The user will often screenshot various aspects of the development with ShareX, which will be available in the current months directory: 'C:\Users\Ed\scoop\apps\sharex\current\ShareX\Screenshots\2026-02'
You may read fromt his and the user will let you know (by last modified) which of the last screenshots are the most relevant. Otherwise they manually paste relevant content in the './gallery' directory.
Do not use the .gitignore as a reference for WHAT YOU SHOULD IGNORE. THAT IS STRICT FOR THE GIT REPO, NOT FOR INFERENCING FILE RELEVANCE.
If a task is very heavy, use sub-agents (such as a codebase/docs/references investiagor, code editor, specifc pattern or nuance analyzer, etc).

## Coding Conventions

Before writing any C code in this workspace, you MUST review the strict stylistic and architectural guidelines defined in [CONVENTIONS.md](./CONVENTIONS.md). These dictate the usage of byte-width types, X-Macros, WinAPI FFI mapping, and memory arenas.

## Necessary Background for Goal

Watch or read the following:

* [Forth Day 2020 - Preview of x64 & ColorForth & SPIR V - Onat](https://youtu.be/ajZAECYdJvE)
* [Metaprogramming VAMP in KYRA, a Next-gen Forth-like language](https://youtu.be/J9U_5tjdegY)
* [Neokineogfx - 4th And Beyond](https://youtu.be/Awkdt30Ruvk)

There are transcripts for each of these videos in the [references](./references/) directory, along with a comprehensive curation of Lottes's blogs, Onat's tweets, and architectural consolidations.

## Goal

Learn ColorForth and be able to build a ColorForth derivative from scratch similar to Timothy Lottes and Onatt.

**Critical Clarification:** The goal is *not* for the AI to auto-generate a novelty solution or dump a finished codebase. The objective is for me (the user) to *learn* how to build this architecture from scratch. The AI must act as a highly contextualized mentor, providing guided nudges, architectural validation, and specific tactical assistance when requested. We are at the cusp of implementation. The AI should lean on the extensive curation in `./references/` to ensure its advice remains strictly aligned with the Lottes/Onat "sourceless, zero-overhead" paradigm, minimizing generic LLM hallucinations.

## Architectural Constraints (The "Lottes/Onat" Paradigm)

Based on the curation in `./references/`, the resulting system MUST adhere to these non-standard rules:

1. **Sourceless Environment (x68):** No string parsing at runtime. Code exists purely as an array of 32-bit tokens.
   - **Token Layout:** 28 bits of payload (compressed name/index/value) + 4 bits for the semantic "Color" Tag.
2. **Visual Editor as the OS:** The editor directly maps to the token array. It does not read text files. It uses the 4-bit tags to colorize the tokens live.
3. **Register-Only Stack:** The traditional Forth data stack in memory is completely eliminated.
   - We strictly use a **2-item register stack** (`RAX` and `RDX`).
   - Stack rotation is handled via the `xchg rax, rdx` instruction.
4. **Preemptive Scatter ("Tape Drive"):** Function arguments are not pushed to a stack before a call. They are "scattered" into pre-allocated, contiguous global memory slots during compilation/initialization. The function simply reads from these known offsets, eliminating argument gathering overhead.
5. **No `if/then` branches:** Rely on hardware-level flags like conditional returns (`ret-if-signed`) combined with factored calls to avoid writing complex AST parsers.
6. **No Dependencies:** C implementation must be minimal (`-nostdlib`), ideally running directly against OS APIs (e.g., WinAPI `VirtualAlloc`, `ExitProcess`, `GDI32` for rendering).

## Current Development Roadmap (attempt_1)

The prototype currently implements:
- A functional WinAPI modal editor backed by `microui` for immediate-mode floating panels.
- A 2-register (`RAX`/`RDX`) JIT compiler with an `O(1)` visual linker (`tape_to_code_offset` table).
- x68-style 32-bit instruction padding via `pad32()` using `0x90` NOPs.
- Implicit definition boundaries (Magenta Pipe / `STag_Define`) emitting `JMP rel32` over the body and `xchg rax, rdx` at the entry point.
- An FFI Bridge (`x64_FFI_PROLOGUE`, `x64_FFI_MAP_ARGS`, `x64_FFI_CALL_ABS`, `x64_FFI_EPILOGUE`) for calling WinAPI functions safely from JIT'd code.
- Persistence via F1 (save) / F2 (load) to `cartridge.bin`.
- A Lambda tag (`STag_Lambda`) that compiles a code block out-of-line and leaves its address in `RAX`.
- A well-defined **x64 Emission DSL** (`#pragma region x64 Emission DSL`) with named REX prefixes, register encodings, ModRM/SIB composition macros, opcode constants, and composite instruction inline functions.

### x64 Emission DSL Discipline
All JIT code emission in `main.c` MUST use the x64 Emission DSL defined in the `#pragma region x64 Emission DSL` block. Raw magic bytes are forbidden. The allowed primitives are:
- **Composite helpers:** `x64_XCHG_RAX_RDX()`, `x64_MOV_RDX_RAX()`, `x64_MOV_RAX_RDX()`, `x64_ADD_RAX_RDX()`, `x64_SUB_RAX_RDX()`, `x64_IMUL_RAX_RDX()`, `x64_DEC_RAX()`, `x64_TEST_RAX_RAX()`, `x64_RET_IF_ZERO()`, `x64_RET_IF_SIGN()`, `x64_FETCH()`, `x64_STORE()`, `x64_CALL_RAX()`, `x64_RET()`.
- **Prologue/Epilogue:** `x64_JIT_PROLOGUE()`, `x64_JIT_EPILOGUE()`.
- **FFI:** `x64_FFI_PROLOGUE()`, `x64_FFI_MAP_ARGS()`, `x64_FFI_CALL_ABS(addr)`, `x64_FFI_EPILOGUE()`.
- **Raw emission only via named constants:** `emit8(x64_op_*)`, `emit8(x64_REX*)`, `emit8(x64_modrm(*))`, `emit32(val)`, `emit64(val)`.
- **Exception:** Forward jump placeholders (`JMP rel32`, `CALL rel32`) that have no composite helper may use `emit8(x64_op_JMP_rel32)` / `emit8(x64_op_CALL_rel32)` directly with a following `emit32(0)` placeholder, pending a dedicated DSL wrapper.

Here is a breakdown of the next steps to advance the `attempt_1` implementation towards a complete ColorForth derivative:

1.  ~~**Refine the FFI / Tape Drive Argument Scatter:**~~ (Completed)
2.  ~~**Implement the Self-Modifying Cartridge (Persistence):**~~ (Completed via F1/F2 save/load)
3.  ~~**Refine Visual Editor Interactions:**~~ (Completed via `microui` integration)
4.  ~~**Audit and enforce x64 Emission DSL usage throughout `main.c`:**~~ (Completed — all raw magic bytes replaced with named DSL constants and composite helpers)

5.  **Add DSL wrappers for forward jump placeholders:**
    - `x64_JMP_fwd_placeholder(U4* offset_out)` — emits `E9 00000000` and writes the patch offset.
    - `x64_patch_fwd(U4 offset)` — patches a previously emitted placeholder with the current code position.
    - This will eliminate the last remaining raw `emit8`/`emit32` pairs in `compile_and_run_tape`.

6.  **Expanded Annotation Layer (Variable-Length Comments):**
    - The current `anno_arena` strictly allocates 8 bytes (a `U8`) per token.
    - Refactor the visual editor and annotation memory management to allow for arbitrarily long text blocks (comments) to be attached to specific tokens without disrupting the `O(1)` compilation mapping.

7.  **Continuous Validation & Complex Control Flow:**
    - Expand the primitive set to allow for more complex, AST-less control flow (e.g., handling Basic Block jumps `[ ]`).
    - Investigate adding a `RET_IF_ZERO` + tail-call pattern for loops without explicit branch instructions.