diff --git a/CONVENTIONS.md b/CONVENTIONS.md index 60e6b2d..84c3890 100644 --- a/CONVENTIONS.md +++ b/CONVENTIONS.md @@ -10,6 +10,7 @@ This document outlines the strict C style and architectural conventions expected * Float: `F4`, `F8` * Boolean: `B1`, `B2`, `B4`, `B8` (use `true`/`false` primitives) * Strings/Chars: `UTF8` (for characters), `Str8` (for string slices) +* **Fundamental Type Casts:** Strictly use the provided casting macros (e.g., `u8_(val)`, `u4_r(ptr)`, `s4_(val)`) instead of standard C-style cast syntax like `(U8)val`. Standard casts should only be used for complex types or when an appropriate macro isn't available. * **WinAPI Structs:** Only use `MS_` prefixed fundamental types (e.g., `MS_LONG`, `MS_DWORD`) *inside* WinAPI struct definitions (`MS_WNDCLASSA`, etc.) to maintain FFI compatibility. Do not use them in general application logic. ## 2. Declaration Wrappers & X-Macros @@ -18,8 +19,8 @@ This document outlines the strict C style and architectural conventions expected * `typedef Enum_(UnderlyingType, Name) { ... };` * **X-Macros:** Use X-Macros to tightly couple Enums with their corresponding string representations or metadata. ```c - #define My_Tag_Entries() - X(Define, "Define") + #define My_Tag_Entries() \ + X(Define, "Define") \ X(Call, "Call") ``` @@ -43,19 +44,16 @@ This document outlines the strict C style and architectural conventions expected } MS_WNDCLASSA; ``` * **Multi-line Argument Alignment:** For long function signatures, place one argument per line with a single 4-space indent. - * Example: - ```c - WinAPI B4 ms_read_console( - MS_Handle handle, - UTF8*r buffer, - U4 to_read, - U4*r num_read, - U8 reserved_input_control - ) asm("ReadConsoleA"); - ``` * **WinAPI Grouping:** Group foreign procedure declarations by their originating OS library (e.g., Kernel32, User32, GDI32) using comment headers. * **Brace Style:** Use Allman style (braces on a new line) for function bodies or control blocks (`if`, `for`, `switch`, etc.) 
that are large or complex. Smaller blocks may use K&R style. -* **Conditionals:** Always place `else if` and `else` statements on a new line, un-nested from the previous closing brace. +* **Conditionals & Control Flow:** Always place `else if` and `else` statements on a new line. Align control flow parentheses (e.g., between consecutive `while` and `if` blocks) vertically when possible for aesthetic uniformity: + ```c + while (len < 8) len ++; + if (len > 0) { ... } + ``` +* **Address-Of Operator:** Do insert a space between the address-of operator (`&`) and the variable name. + * **Correct:** `& my_var` + * **Incorrect:** `&my_var` ## 5. Memory Management * **Standard Library:** The C standard library is linked, but headers like `` or `` should not be included directly. Required functions should be declared manually if needed, or accessed via compiler builtins. @@ -77,3 +75,4 @@ This document outlines the strict C style and architectural conventions expected X(Call, "Call", 0x00D6A454, "~") ``` * **Naming Conventions:** When using X-Macros for Tags, entry names should be PascalCase, and the Enum symbols should be prefixed with the Enum type name (e.g., `tmpl(STag, Define)` -> `STag_Define`). + diff --git a/GEMINI.md b/GEMINI.md index 2fb9a3e..a47b1c6 100644 --- a/GEMINI.md +++ b/GEMINI.md @@ -38,3 +38,31 @@ Based on the curation in `./references/`, the resulting system MUST adhere to th 4. **Preemptive Scatter ("Tape Drive"):** Function arguments are not pushed to a stack before a call. They are "scattered" into pre-allocated, contiguous global memory slots during compilation/initialization. The function simply reads from these known offsets, eliminating argument gathering overhead. 5. **No `if/then` branches:** Rely on hardware-level flags like conditional returns (`ret-if-signed`) combined with factored calls to avoid writing complex AST parsers. 6. 
**No Dependencies:** C implementation must be minimal (`-nostdlib`), ideally running directly against OS APIs (e.g., WinAPI `VirtualAlloc`, `ExitProcess`, `GDI32` for rendering). + +## Current Development Roadmap (attempt_1) + +Here's a breakdown of the next steps to advance the `attempt_1` implementation towards a ColorForth derivative: + +1. **Enhance Lexer/Parser/Compiler (JIT) in `main.c`:** + * **Token Interpretation:** Refine the interpretation of the 28-bit payload based on the 4-bit color tag (e.g., differentiate between immediate values, dictionary IDs, and data addresses). + * **Dictionary Lookup:** Improve the efficiency and scalability of dictionary lookups for custom words beyond the current linear search. + * **New Word Definition:** Implement mechanisms for defining new Forth words directly within the editor, compiling them into the `code_arena`. + +2. **Refine Visual Editor (`win_proc` in `main.c`):** + * **Dynamic Colorization:** Ensure all rendered tokens accurately reflect their 4-bit color tags, updating dynamically with changes. + * **Annotation Handling:** Implement more sophisticated display for token annotations, supporting up to 8 characters clearly without truncation or visual artifacts. + * **Input Handling:** Improve text input for `STag_Data` (e.g., supporting full hexadecimal input, backspace functionality). + * **Cursor Behavior:** Ensure the cursor accurately reflects the current editing position within the token stream. + +3. **Expand Register-Only Stack Operations:** + * Implement core Forth stack manipulation words (e.g., `DUP`, `DROP`, `OVER`, `ROT`) by generating appropriate x86-64 assembly instructions that operate solely on `RAX` and `RDX`. + +4. **Develop `Tape Drive` Memory Management:** + * Ensure all memory access (read/write) for Forth variables and data structures correctly utilize the `vm_globals` array and the "preemptive scatter" approach. + +5. 
**Implement Control Flow without Branches:** + * Leverage conditional returns and factored calls to create more complex control flow structures (e.g., `IF`/`ELSE`/`THEN` equivalents) without introducing explicit `jmp` instructions where not architecturally intended. + +6. **Continuous Validation & Debugging:** + * Enhance debugging output within the UI to provide clearer insight into VM state (RAX, RDX, global memory, log buffer) during execution. + * Consider adding simple "tests" as Forth sequences within `tape_arena` to verify new features. diff --git a/references/Video_Breakdowns.md b/references/Video_Breakdowns.md new file mode 100644 index 0000000..c104bee --- /dev/null +++ b/references/Video_Breakdowns.md @@ -0,0 +1,86 @@ +# Advanced Source-Less Programming & JIT Architecture: A Hardcore Technical Study + +This document contains a deep-dive technical extraction of the mechanics, JIT compiler optimizations, and paradigms presented by Timothy Lottes and Onat Türkçüoğlu. These notes surpass high-level theory, detailing the exact x86-64 assembly generation rules, state-tracking mechanisms, and memory layouts required to implement a zero-overhead, source-less Forth environment. + +--- + +## 1. The Lottes "x68" Paradigm: Editor as the OS + +Lottes's approach fundamentally transforms the editor into a live, dynamic linker and machine-code orchestrator. + +### 1.1 The Lexical Grid and 32-Bit Instruction Granularity +In x68, the runtime contains *no parsing logic*. Code is a flat array of 32-bit tokens (4-bit tag, 28-bit payload). +To make the x86-64 architecture fit this visual editor grid, Lottes forces all generated machine code to 32-bit boundaries: +* **Instruction Padding:** Native instructions that are smaller than 4 bytes are padded. + * *Example:* `RET` (`0xC3`) becomes `C3 90 90 90` (using three `NOP`s). + * *Example:* `MOV` or `ADD` can use ignored segment overrides (like the `3E` DS prefix) or unnecessary `REX` prefixes to reach exactly 4 bytes. 
+* **Auto-Relinking:** The editor implicitly acts as a linker. Because every instruction is 32 bits, 32-bit RIP-relative offsets for `CALL` (`E8`) and `JMP` (`E9`) are perfectly aligned. When the user inserts or deletes a token in the editor, the editor instantly recalculates and updates the raw binary relative offsets for all branch instructions. +* **Shorthand Assembly UI:** The editor can decode these 32-bit blocks and display human-readable macro-assembly, e.g., mapping `add rcx, qword ptr [rdx + 0x8]` to the visual string `h + at i08`. + +### 1.2 ColorForth Semantic Tags & The State Machine +The 4-bit color tag dictates how the editor/JIT interprets the 28-bit payload: +* **White (Ignored):** Comments, formatting, or skipped words. +* **Yellow (Immediate Execution):** + * If a number: Append it to the data stack *during edit/compile time*. + * If a word: Look it up in the dictionary and execute its associated code *immediately*. +* **Red (Define):** Sets a word in the dictionary to point to the current compilation address (or TOS). +* **Green (Compile):** + * If a number: Emits machine code to push that number (e.g., `mov rax, imm`). + * If a word: Looks it up in the *Macro* dictionary; if found, calls it (code generation). Otherwise, looks it up in the *Forth* dictionary and emits a `CALL` to it. +* **Cyan/Blue (Defer Execution):** Looks up a word in the macro dictionary and appends a call to it. Used for macros that generate other macros. +* **Magenta (Variable/Pointer):** Sets the dictionary value to point to the *next source token* in memory. +* **The Transition Trigger:** A transition from Yellow (Execution) to Green (Compilation) causes the JIT to pop the current Top of Stack and emit a native machine-code instruction to push that value. (i.e., "Turning a computed number back into a program"). 
+ +### 1.3 The 5-Byte Folded Interpreter +To eliminate the massive pipeline stall (branch misprediction) caused by a standard `NEXT` instruction in threaded-code interpreters, Lottes suggests embedding a micro-interpreter at the *end of every word*: +1. **`LODSD` (1 byte or 2 bytes with REX):** Loads the next 32-bit token from `RSI` (the instruction pointer) into `EAX`/`RAX` and increments `RSI`. +2. **Lookup (2 bytes):** Uses a highly optimized hash or direct mapping to translate the token payload to a memory address. +3. **Jump (2 bytes):** Emits an indirect jump (e.g., `JMP RAX`). +*Result:* Every word transition has its own dedicated branch predictor slot in the CPU hardware, reducing average clock stalls from ~16 to near 0. + +--- + +## 2. Onat's VAMP / KYRA: High-Performance Macro-Assembler + +Onat's implementation provides a masterclass in eliminating the Forth data stack and leveraging x86-64 hardware registers optimally. + +### 2.1 The 2-Register Stack & JIT State Tracking +Traditional Forth maintains a data stack in RAM, requiring constant memory loads/stores. Onat eliminates this: +* **The Stack is `RAX` and `RDX`.** No memory is used for parameter passing. +* **The 1-Bit JIT Optimizer:** The JIT compiler maintains a single bit of state: `is_rax_tos` (Is RAX currently the Top of Stack?). +* **Smart Compilation:** + * If the user types a Cyan number (Immediate), the JIT checks `is_rax_tos`. If true, it emits `mov rax, imm`. If false, it emits `mov rdx, imm`. + * Before compiling a `CALL`, the JIT knows which register the target function expects the TOS to be in. If the current JIT state mismatches the target's expectation, it automatically emits the 3-byte `xchg rax, rdx` (`48 87 C2`) instruction *before* the call. + * This makes operations like `SWAP` virtually free—they often just flip the compiler's internal `is_rax_tos` boolean without emitting any machine code. 
+* **Function Prologue/Epilogue:** Functions do not push/pop to a return stack in memory manually; they rely on the native x86 `call` and `ret` instructions, using `RSP` purely as a call stack. + +### 2.2 Global Preemptive Scatter (The "Tape Drive") +Because the data stack is limited to two items, passing deep context through it is impossible. +* **Global Single-Register Base:** A single x86 register (e.g., `R12` or `R15`) is dedicated globally as the base pointer for all application memory (giving "gigabytes of state"). +* **Colors map to memory operations:** + * **Green Tag (Read):** Emits `mov REG, [base_ptr + token_offset]`. + * **Red Tag (Write):** Emits `mov [base_ptr + token_offset], REG`. +* **FFI (Foreign Function Interface):** To call complex OS APIs (like Vulkan `VkImageCreateInfo`), VAMP does not use C-struct bindings. It manually calculates byte-offsets from the global base, emits instructions to write the struct data inline, aligns `RSP` for the OS calling convention, and calls the dynamic library pointer. + +### 2.3 Lexical Syntax and Color Semantics +Onat uses a 24-bit dictionary index + 8-bit color tag. The semantics map directly to JIT actions: +* **Magenta Pipe (`|`):** Defines the boundary of a function. The JIT encounters this, emits a `RET` (`C3`) to close the previous function, and records the current instruction pointer as the start address of the new function. +* **White (Call):** Emits a relative `CALL` to the target. (If jumping to a dynamic address already in a register, it optimizes to `JMP RAX`). +* **Yellow (Macro):** Executes the attached code *during JIT compilation*. Used for compiler directives, setting layouts, or emitting specialized instructions like `LOCK` prefixes. +* **Blue (Comment):** Ignored by the JIT pointer entirely. + +### 2.4 Control Flow without ASTs +VAMP abandons standard `IF/ELSE/THEN` parsing trees in favor of assembly-level basic blocks and lambdas. 
+* **Lambdas `{ }`:** Defining a lambda simply compiles the block of code elsewhere and leaves its executable memory address on the stack (`RAX` or `RDX`). +* **Conditionals via Global State:** + 1. A comparison (e.g., `>`) is executed. + 2. The result is written to a dedicated global variable (e.g., `condition` using a Red tag). + 3. The conditional jump word reads the `condition` variable, consumes the lambda's address from the stack, and emits `CMP condition, 0` followed by `JZ lambda_address`. +* **Basic Blocks `[ ]`:** These constrain the scope of assembly generation. If a conditional within a block passes, execution falls through. If it fails, it jumps to the nearest closing `]`. + +### 2.5 Live Debugging via Instruction Injection +The most powerful UX feature of VAMP is its real-time data flow visualization. +* The editor tracks the user's cursor position. +* During JIT compilation, if the `compiler_instruction_ptr` equals the `editor_cursor_ptr`, the JIT injects a debug macro. +* This macro emits instructions to copy the current state of `RAX` and `RDX` (the entire data stack) into a global circular buffer. +* The UI reads this buffer, instantly displaying the exact runtime state of the program at the cursor's location, acting as an instant, zero-cost `printf`. \ No newline at end of file