Compare commits

5 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| ed | 34e7f6017e | notes | 2026-02-20 22:31:17 -05:00 |
| ed | fd132c6efc | progress | 2026-02-20 22:10:29 -05:00 |
| ed | 2db4acd493 | dont use let gemini respect .gitignore | 2026-02-20 21:41:43 -05:00 |
| ed | 5a1f2fd799 | remove unused script | 2026-02-20 21:39:22 -05:00 |
| ed | d387fc4f10 | notes | 2026-02-20 21:38:52 -05:00 |
11 changed files with 541 additions and 393 deletions
+7
@@ -0,0 +1,7 @@
{
"context": {
"fileFiltering": {
"respectGitIgnore": false
}
}
}
+16 -19
@@ -6,6 +6,7 @@ DO NOT EVER make a shell script unless told to. DO NOT EVER make a readme or a f
The user will often screenshot various aspects of the development with ShareX, which will be available in the current month's directory: 'C:\Users\Ed\scoop\apps\sharex\current\ShareX\Screenshots\2026-02'
You may read from this, and the user will let you know (by last-modified time) which of the latest screenshots are the most relevant. Otherwise, they will manually paste relevant content into the './gallery' directory.
Do not use the .gitignore as a reference for WHAT YOU SHOULD IGNORE. THAT IS STRICT FOR THE GIT REPO, NOT FOR INFERENCING FILE RELEVANCE.
If a task is very heavy, use sub-agents (such as a codebase/docs/references investigator, code editor, specific pattern or nuance analyzer, etc).
## Coding Conventions
@@ -43,28 +44,24 @@ Based on the curation in `./references/`, the resulting system MUST adhere to th
## Current Development Roadmap (attempt_1)
Here's a breakdown of the next steps to advance the `attempt_1` implementation towards a ColorForth derivative:
The prototype currently implements a functional WinAPI modal editor, a 2-register (`RAX`/`RDX`) JIT compiler with an `O(1)` visual linker, x68 32-bit instruction padding, implicit definition boundaries (Magenta Pipe), and an initial FFI Bridge (`emit_ffi_dance`).
1. **Enhance Lexer/Parser/Compiler (JIT) in `main.c`:**
* **Token Interpretation:** Refine the interpretation of the 28-bit payload based on the 4-bit color tag (e.g., differentiate between immediate values, dictionary IDs, and data addresses).
* **Dictionary Lookup:** Improve the efficiency and scalability of dictionary lookups for custom words beyond the current linear search.
* **New Word Definition:** Implement mechanisms for defining new Forth words directly within the editor, compiling them into the `code_arena`.
Here is a breakdown of the next steps to advance the `attempt_1` implementation towards a complete ColorForth derivative:
2. **Refine Visual Editor (`win_proc` in `main.c`):**
* **Dynamic Colorization:** Ensure all rendered tokens accurately reflect their 4-bit color tags, updating dynamically with changes.
* **Annotation Handling:** Implement more sophisticated display for token annotations, supporting up to 8 characters clearly without truncation or visual artifacts.
* **Input Handling:** Improve text input for `STag_Data` (e.g., supporting full hexadecimal input, backspace functionality).
* **Cursor Behavior:** Ensure the cursor accurately reflects the current editing position within the token stream.
1. **Refine the FFI / Tape Drive Argument Scatter:**
* Currently, the FFI bridge only maps `RAX` and `RDX` to the C-ABI `RCX` and `RDX`.
* Implement "Preemptive Scatter" logic so the FFI bridge correctly reads subsequent arguments (e.g., `R8`, `R9`) directly from pre-defined offsets in the `vm_globals` tape drive instead of just zeroing them out.
3. **Expand Register-Only Stack Operations:**
* Implement core Forth stack manipulation words (e.g., `DUP`, `DROP`, `OVER`, `ROT`) by generating appropriate x86-64 assembly instructions that operate solely on `RAX` and `RDX`.
2. **Expanded Annotation Layer (Variable-Length Comments):**
* The current `anno_arena` strictly allocates 8 bytes (a `U8`) per token.
* Refactor the visual editor and annotation memory management to allow for arbitrarily long text blocks (comments) to be attached to specific tokens without disrupting the `O(1)` compilation mapping.
4. **Develop `Tape Drive` Memory Management:**
* Ensure all memory access (read/write) for Forth variables and data structures correctly utilize the `vm_globals` array and the "preemptive scatter" approach.
3. **Implement the Self-Modifying Cartridge (Persistence):**
* The tape and annotations are currently lost when the program closes.
* Move away from purely transient `VirtualAlloc` buffers to a memory-mapped file approach (or a manual Save/Load equivalent in WinAPI) to allow the "executable as source" to persist between sessions.
5. **Implement Control Flow without Branches:**
* Leverage conditional returns and factored calls to create more complex control flow structures (e.g., `IF`/`ELSE`/`THEN` equivalents) without introducing explicit `jmp` instructions where not architecturally intended.
4. **Refine Visual Editor Interactions:**
* Implement a proper internal text-editing cursor within the `STag_Data` and `STag_Format` (annotation) tokens, rather than relying on backspace-truncation and appendage.
6. **Continuous Validation & Debugging:**
* Enhance debugging output within the UI to provide clearer insight into VM state (RAX, RDX, global memory, log buffer) during execution.
* Consider adding simple "tests" as Forth sequences within `tape_arena` to verify new features.
5. **Continuous Validation & Complex Control Flow:**
* Expand the primitive set to allow for more complex, AST-less control flow (e.g., handling Lambdas or specific Basic Block jumps).
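
The 4-bit tag / 28-bit payload split that the roadmap refers to can be sketched in portable C. This is a minimal illustration rather than the project's code: the `token_*` names and the `TAG_*` values are assumptions made for the example (the numeric order simply mirrors the `Tag_Entries()` list), and standard `<stdint.h>` types stand in for the `duffle` typedefs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tag values; the real enum is generated by an X-macro. */
enum { TAG_DEFINE = 0, TAG_CALL = 1, TAG_DATA = 2, TAG_IMM = 3 };

/* Tag goes in the top nibble, payload in the low 28 bits.
   Payload bits above bit 27 are silently masked off. */
static uint32_t token_pack(uint32_t tag, uint32_t val) {
    assert(tag <= 0xFu); /* the tag must fit in 4 bits */
    return (tag << 28) | (val & 0x0FFFFFFFu);
}

/* Recover the 4-bit tag. */
static uint32_t token_tag(uint32_t token) {
    return (token >> 28) & 0xFu;
}

/* Recover the 28-bit payload. */
static uint32_t token_val(uint32_t token) {
    return token & 0x0FFFFFFFu;
}
```

Because the mask discards anything above 28 bits, a payload such as a tape index or a hex literal has to stay below `0x10000000` to survive a round trip.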
+1 -1
@@ -4,7 +4,7 @@ This repository contains the curation materials and prototype implementation for
## Project Goal
The objective is to *learn* how to build this architecture from scratch, with the AI acting as a highly contextualized mentor. We are using a `-nostdlib` C environment on Win32 to construct a visual editor that is simultaneously the IDE, the compiler, and the OS for a tiny, high-performance computing environment.
The objective is to *learn* how to build this architecture from scratch, with the AI acting as a highly contextualized mentor.
## Current State
+19 -11
@@ -13,7 +13,7 @@ The application presents a visual grid of 32-bit tokens and allows the user to n
2. **Annotation Layer (`FArena` anno):**
* A parallel `FArena` of `U8` (64-bit) integers stores an 8-character string for each corresponding token on the tape.
* The UI renderer prioritizes displaying this string, but the compiler only ever sees the 2-character ID packed into the 32-bit token, successfully implementing Lottes' dictionary annotation strategy.
* The UI renderer prioritizes displaying this string, but the compiler only ever sees the indices packed into the 32-bit token.
3. **2-Register Stack & Global Memory:**
* The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
@@ -23,22 +23,30 @@ The application presents a visual grid of 32-bit tokens and allows the user to n
* A small set of `emit8`/`emit32` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked as executable (`PAGE_EXECUTE_READWRITE`).
* This buffer is cast to a C function pointer and called directly, bypassing the need for an external assembler like NASM or a complex library like Zydis for this prototype stage.
5. **2-Character Mapped Dictionary & Resolver:**
* The `ID2(a, b)` macro packs two characters into a 16-bit integer for use as a token's payload.
* The JIT compiler maintains a simple array-based dictionary. On a `: Define` token, it records the ID and the current memory offset. On a `~ Call` token, it looks up the ID and emits a relative 32-bit `CALL` instruction (`0xE8`).
* It also correctly emits `JMP` instructions to skip over definition bodies during linear execution.
6. **Modal Editor (Win32 GDI):**
5. **Modal Editor (Win32 GDI):**
* The UI is built with raw Win32 GDI calls defined in `duffle.h`.
* It features two modes: `Navigation` (gray cursor, arrow key movement) and `Edit` (orange cursor, text input).
* The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
6. **O(1) Dictionary & Visual Linking:**
* The dictionary relies on an edit-time visual linker. When the tape is modified, `relink_tape` resolves names to absolute source memory indices.
* The compiler then resolves each reference in `O(1)` time by indexing into an offset mapping table (`tape_to_code_offset`).
7. **Implicit Definition Boundaries (Magenta Pipe):**
* Definitions implicitly cause the JIT to emit a `RET` to close the prior block, and an `xchg rax, rdx` to rotate the stack for the new block.
8. **x68 Instruction Padding:**
* The JIT pads every logical block/instruction to exact 32-bit multiples using `0x90` (NOPs) to perfectly align with the visual token grid logic.
9. **The FFI Bridge:**
* The system uses an FFI macro (`emit_ffi_dance`) to align the `RSP` stack to 16 bytes, allocate 32 bytes of shadow space, and map the 2-register data stack/globals into the Windows C-ABI (`RCX`, `RDX`, `R8`, `R9`) to safely call WinAPI functions (like `MessageBoxA`).
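
The two-phase linking described in item 6 can be sketched as follows. This is a simplified model under stated assumptions, not the code from `main.c`: `SketchTape`, `sketch_relink`, and `sketch_target` are hypothetical names, the tape is reduced to a flag plus an 8-byte name per slot, and an explicit sentinel marks unresolved references (the diff itself appears to use a payload of `0` for that purpose).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TAPE_MAX   16
#define SKETCH_UNRESOLVED 0xFFFFFFFFu

/* Hypothetical, simplified tape: one flag and one 8-byte name per slot. */
typedef struct {
    int      is_define[SKETCH_TAPE_MAX];   /* 1 if the slot defines a word */
    char     name[SKETCH_TAPE_MAX][8];     /* zero-padded annotation       */
    uint32_t link[SKETCH_TAPE_MAX];        /* tape index of the definition */
    uint32_t code_offset[SKETCH_TAPE_MAX]; /* filled in while compiling    */
    uint32_t count;
} SketchTape;

/* Edit-time pass: resolve each reference to the tape index of the first
   definition with the same 8-byte name. This is a linear search, but it
   only runs when the tape is modified. */
static void sketch_relink(SketchTape* t) {
    for (uint32_t i = 0; i < t->count; i++) {
        if (t->is_define[i]) continue;
        t->link[i] = SKETCH_UNRESOLVED;
        for (uint32_t j = 0; j < t->count; j++) {
            if (t->is_define[j] && memcmp(t->name[i], t->name[j], 8) == 0) {
                t->link[i] = j;
                break;
            }
        }
    }
}

/* Compile-time lookup: a single array index, no name comparison. */
static uint32_t sketch_target(const SketchTape* t, uint32_t ref_slot) {
    assert(t->link[ref_slot] != SKETCH_UNRESOLVED);
    return t->code_offset[t->link[ref_slot]];
}
```

The design choice is to pay for name matching when the user edits, so that the compile step, which runs on every keystroke, only ever does table lookups.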
## What's Missing (TODO)
* **Saving/Loading:** The tape and annotation arenas are purely in-memory and are lost when the program closes.
* **Expanded Instruction Set:** The JIT only knows a handful of primitives (`SWAP`, `MULT`, `ADD`, `FETCH`, `STORE`, `DEC`, `RET_IF_ZERO`, `PRINT`). It has no support for floating point, stack manipulation for C FFI, or more complex branches.
* **Robust Dictionary:** The current dictionary is a simple array that is rebuilt on every compile. It doesn't handle collisions, scoping, or namespaces.
* **Annotation Editing:** Typing into an annotation just appends characters. A proper text-editing cursor within the token is needed.
* **Saving/Loading (Persistence):** The tape and annotation arenas are purely in-memory and are lost when the program closes. The self-modifying OS cartridge concept still needs to be implemented.
* **Expanded Instruction Set:** The JIT only knows a handful of primitives. It has no support for floating point or more complex branches.
* **Annotation Editing & Comments:** Typing into an annotation just appends characters up to 8 bytes. A proper text-editing cursor within the token is needed, and support for arbitrarily long comments should be implemented.
* **Tape Drive / Preemptive Scatter Logic:** Improve the FFI argument mapping to properly read from the "tape drive" memory slots instead of just mapping RAX/RDX to the first parameters.
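
As one possible shape for the in-token text cursor mentioned under "Annotation Editing & Comments", the sketch below edits a single 8-byte annotation slot at an arbitrary cursor position. Everything here is hypothetical: `AnnoSlot`, `anno_insert`, and `anno_backspace` do not exist in the repository, and the slot is modeled as a plain `char[8]` rather than a `U8` in an `FArena`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 8-byte annotation slot viewed as characters (zero padded). */
typedef struct { char ch[8]; } AnnoSlot;

/* Number of characters stored before the first NUL (at most 8). */
static int anno_len(const AnnoSlot* a) {
    int len = 0;
    while (len < 8 && a->ch[len] != '\0') len++;
    return len;
}

/* Insert c at the cursor, shifting the tail right by one.
   Returns the new cursor; a full slot is left unchanged. */
static int anno_insert(AnnoSlot* a, int cursor, char c) {
    int len = anno_len(a);
    assert(cursor >= 0 && cursor <= len);
    if (len >= 8) return cursor;
    memmove(&a->ch[cursor + 1], &a->ch[cursor], (size_t)(len - cursor));
    a->ch[cursor] = c;
    return cursor + 1;
}

/* Delete the character to the left of the cursor, shifting the tail left.
   Returns the new cursor; at position 0 nothing is deleted. */
static int anno_backspace(AnnoSlot* a, int cursor) {
    int len = anno_len(a);
    assert(cursor >= 0 && cursor <= len);
    if (cursor == 0) return 0;
    memmove(&a->ch[cursor - 1], &a->ch[cursor], (size_t)(len - cursor));
    a->ch[len - 1] = '\0';
    return cursor - 1;
}
```

This keeps the fixed 8-byte slot, so it does not address the arbitrarily long comments also listed above; those would need a separate storage scheme.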
## References Utilized
* **Heavily Utilized:**
+188 -178
@@ -1,16 +1,13 @@
#include "duffle.amd64.win32.h"
// --- Semantic Tags (Using X-Macros & Enum_) ---
// Colors translated from Cozy-and-WIndy:
// 0x00bbggrr Win32 format
#define Tag_Entries() \
X(Define, "Define", 0x0018AEFF, ":") /* Orange-ish (Language.Type) */ \
X(Call, "Call", 0x00D6A454, "~") /* Soft Blue (Language.Class) */ \
X(Data, "Data", 0x0094BAA1, "$") /* Muted Green (Language.Number) */ \
X(Imm, "Imm", 0x004AA4C2, "^") /* Sand/Yellow (Language.Keyword) */ \
X(Comment, "Comment", 0x00AAAAAA, ".") /* Grey (Language.Comment) */ \
X(Format, "Format", 0x003A2F3B, " ") /* Current Line BG for invisibles */
X(Define, "Define", 0x0018AEFF, ":") \
X(Call, "Call", 0x00D6A454, "~") \
X(Data, "Data", 0x0094BAA1, "$") \
X(Imm, "Imm", 0x004AA4C2, "^") \
X(Comment, "Comment", 0x00AAAAAA, ".") \
X(Format, "Format", 0x003A2F3B, " ")
typedef Enum_(U4, STag) {
#define X(n, s, c, p) tmpl(STag, n),
@@ -34,55 +31,61 @@ global const char* tag_names[] = {
#undef X
};
// Token Packing: 28 bits payload | 4 bits tag
#define pack_token(tag, val) ((u4_(tag) << 28) | (u4_(val) & 0x0FFFFFFF))
#define unpack_tag(token) ( ((token) >> 28) & 0x0F)
#define unpack_val(token) ( (token) & 0x0FFFFFFF)
// 2-Character Mapped Dictionary Helper
#define id2(a, b) ((u4_(a) << 8) | u4_(b))
#define TOKENS_PER_ROW 8
#define MODE_NAV 0
#define MODE_EDIT 1
// The Tape Drive (Using FArena from duffle)
global FArena tape_arena;
global FArena anno_arena;
global U8 cursor_idx = 0;
global U4 editor_mode = MODE_NAV;
global U4 mode_switch_now = false;
// Executable Code Arena (The JIT)
global FArena code_arena;
// VM State: 2-Reg Stack + Global Memory
global U8 vm_rax = 0; // Top
global U8 vm_rdx = 0; // Next
global U8 vm_rax = 0;
global U8 vm_rdx = 0;
global U8 vm_globals[16] = {0};
// Execution Mode & Logging
global B4 run_full = false;
global U8 log_buffer[16] = {0};
global U4 log_count = 0;
// UI State
global S4 scroll_y_offset = 0;
void ms_builtin_print(U8 val) {
if (log_count < 16) {
log_buffer[log_count++] = val;
}
if (log_count < 16) log_buffer[log_count++] = val;
}
// Dictionary
typedef struct {
U4 val;
U4 offset;
} DictEntry;
global DictEntry dict[256];
global U8 dict_count = 0;
// Visual Linker & O(1) Dictionary
global U4 tape_to_code_offset[65536] = {0};
#define PRIM_SWAP 1
#define PRIM_MULT 2
#define PRIM_ADD 3
#define PRIM_FETCH 4
#define PRIM_DEC 5
#define PRIM_STORE 6
#define PRIM_RET_Z 7
#define PRIM_RET 8
#define PRIM_PRINT 9
global const char* prim_names[] = {
"",
"SWAP ",
"MULT ",
"ADD ",
"FETCH ",
"DEC ",
"STORE ",
"RET_IF_Z",
"RETURN ",
"PRINT "
};
IA_ void scatter(U4 token, const char* anno_str) {
if (tape_arena.used + sizeof(U4) <= tape_arena.capacity && anno_arena.used + sizeof(U8) <= anno_arena.capacity) {
@@ -93,96 +96,138 @@ IA_ void scatter(U4 token, const char* anno_str) {
aptr[0] = 0;
if (anno_str) {
char* dest = (char*)aptr;
int i = 0; while(i < 8 && anno_str[i]) {
dest[i] = anno_str[i];
i ++;
}
int i = 0; while(i < 8 && anno_str[i]) { dest[i] = anno_str[i]; i ++; }
}
anno_arena.used += sizeof(U8);
}
}
// --- Minimal x86-64 Emitter ---
internal void emit8(U1 b) {
if (code_arena.used + 1 <= code_arena.capacity) {
U1*r ptr = u1_r(code_arena.start + code_arena.used);
ptr[0] = b;
u1_r(code_arena.start + code_arena.used)[0] = b;
code_arena.used += 1;
}
}
internal void emit32(U4 val) {
if (code_arena.used + 4 <= code_arena.capacity) {
U4*r ptr = u4_r(code_arena.start + code_arena.used);
ptr[0] = val;
u4_r(code_arena.start + code_arena.used)[0] = val;
code_arena.used += 4;
}
}
internal void pad32(void) {
while ((code_arena.used % 4) != 0) emit8(0x90);
}
internal void relink_tape(void) {
U8 tape_count = tape_arena.used / sizeof(U4);
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
for (U8 i = 0; i < tape_count; i++) {
U4 t = tape_ptr[i];
U4 tag = unpack_tag(t);
if (tag == STag_Call || tag == STag_Imm) {
char* ref_name = (char*)&anno_ptr[i];
U4 new_val = 0;
for (int p = 1; p <= 9; p++) {
int match = 1;
for (int c = 0; c < 8; c++) {
char c1 = ref_name[c] ? ref_name[c] : ' ';
char c2 = prim_names[p][c] ? prim_names[p][c] : ' ';
if (c1 != c2) { match = 0; break; }
}
if (match) { new_val = p; break; }
}
if (new_val == 0) {
for (U8 j = 0; j < tape_count; j++) {
if (unpack_tag(tape_ptr[j]) == STag_Define) {
char* def_name = (char*)&anno_ptr[j];
int match = 1;
for (int c = 0; c < 8; c++) {
char c1 = ref_name[c] ? ref_name[c] : ' ';
char c2 = def_name[c] ? def_name[c] : ' ';
if (c1 != c2) { match = 0; break; }
}
if (match) { new_val = j; break; }
}
}
}
tape_ptr[i] = pack_token(tag, new_val);
}
}
}
internal void compile_action(U4 val)
{
if (val == id2('S','W')) { // SWAP: xchg rax, rdx
if (val == PRIM_SWAP) {
emit8(0x48); emit8(0x87); emit8(0xC2);
pad32();
return;
} else if (val == id2('M','*')) { // MULT: imul rax, rdx
} else if (val == PRIM_MULT) {
emit8(0x48); emit8(0x0F); emit8(0xAF); emit8(0xC2);
pad32();
return;
} else if (val == id2('+',' ')) { // ADD: add rax, rdx
} else if (val == PRIM_ADD) {
emit8(0x48); emit8(0x01); emit8(0xD0);
pad32();
return;
} else if (val == id2('@',' ')) { // FETCH: mov rax, QWORD PTR [rcx + rax*8]
} else if (val == PRIM_FETCH) {
emit8(0x48); emit8(0x8B); emit8(0x04); emit8(0xC1);
pad32();
return;
} else if (val == id2('-','1')) { // DEC: dec rax
} else if (val == PRIM_DEC) {
emit8(0x48); emit8(0xFF); emit8(0xC8);
pad32();
return;
} else if (val == id2('!',' ')) { // STORE: mov QWORD PTR [rcx + rax*8], rdx
} else if (val == PRIM_STORE) {
emit8(0x48); emit8(0x89); emit8(0x14); emit8(0xC1);
pad32();
return;
} else if (val == id2('R','0')) { // RET_IF_ZERO: test rax, rax; jnz +1; ret
emit8(0x48); emit8(0x85); emit8(0xC0); // test rax, rax
emit8(0x75); emit8(0x01); // jnz skip_ret (+1 byte)
emit8(0xC3); // ret
return;
} else if (val == id2('R','E')) { // RET
} else if (val == PRIM_RET_Z) {
emit8(0x48); emit8(0x85); emit8(0xC0);
emit8(0x75); emit8(0x01);
emit8(0xC3);
pad32();
return;
} else if (val == id2('P','R')) { // PRINT: call ms_builtin_print
emit8(0x51); // push rcx
emit8(0x52); // push rdx
emit8(0x48); emit8(0x83); emit8(0xEC); emit8(0x20); // sub rsp, 32
emit8(0x48); emit8(0x89); emit8(0xC1); // mov rcx, rax
emit8(0x49); emit8(0xB8); // mov r8, imm64
} else if (val == PRIM_RET) {
emit8(0xC3);
pad32();
return;
} else if (val == PRIM_PRINT) {
emit8(0x51); emit8(0x52);
emit8(0x48); emit8(0x83); emit8(0xEC); emit8(0x20);
emit8(0x48); emit8(0x89); emit8(0xC1);
emit8(0x49); emit8(0xB8);
U8 addr = u8_(& ms_builtin_print);
emit32(u4_(addr & 0xFFFFFFFF));
emit32(u4_(addr >> 32));
emit8(0x41); emit8(0xFF); emit8(0xD0); // call r8
emit8(0x48); emit8(0x83); emit8(0xC4); emit8(0x20); // add rsp, 32
emit8(0x5A); // pop rdx
emit8(0x59); // pop rcx
emit8(0x41); emit8(0xFF); emit8(0xD0);
emit8(0x48); emit8(0x83); emit8(0xC4); emit8(0x20);
emit8(0x5A); emit8(0x59);
pad32();
return;
}
// Dictionary Resolver (Call User Word)
for (U8 entry = 0; entry < dict_count; entry++) {
if (dict[entry].val == val) {
U4 target = dict[entry].offset;
U4 current = code_arena.used;
S4 rel32 = s4_(target) - s4_(current + 5);
emit8(0xE8); // CALL rel32
if (val > 0) {
U4 target = tape_to_code_offset[val];
pad32();
S4 rel32 = s4_(target) - s4_(code_arena.used + 5);
emit8(0xE8);
emit32(u4_(rel32));
return;
}
pad32();
}
}
IA_ void compile_and_run_tape(void)
{
farena_reset(& code_arena);
dict_count = 0;
log_count = 0;
// Prologue: Load VM state from globals[14] and [15]
emit8(0x48); emit8(0x8B); emit8(0x41); emit8(0x70); // mov rax, [rcx+112]
emit8(0x48); emit8(0x8B); emit8(0x51); emit8(0x78); // mov rdx, [rcx+120]
emit8(0x48); emit8(0x8B); emit8(0x41); emit8(0x70);
emit8(0x48); emit8(0x8B); emit8(0x51); emit8(0x78);
U4*r tape_ptr = u4_r(tape_arena.start);
B4 in_def = false;
@@ -195,35 +240,36 @@ IA_ void compile_and_run_tape(void)
if (tag == STag_Define)
{
if (in_def == false) {
emit8(0xE9); // JMP rel32 (Skip over definition body)
pad32();
emit8(0xE9);
def_jmp_offset = code_arena.used;
emit32(0);
pad32();
in_def = true;
} else {
emit8(0xC3);
pad32();
}
if (dict_count < 256) {
dict[dict_count].val = val;
dict[dict_count].offset = code_arena.used;
dict_count++;
}
tape_to_code_offset[i] = code_arena.used;
emit8(0x48); emit8(0x87); emit8(0xC2);
pad32();
}
else if (tag == STag_Call)
{
compile_action(val);
if (val == id2('R','E') && in_def) {
// End of definition block, patch the jump
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
in_def = false;
}
}
else if (tag == STag_Data) {
emit8(0x48); emit8(0x89); emit8(0xC2); // mov rdx, rax
emit8(0x48); emit8(0xC7); emit8(0xC0); emit32(val); // mov rax, imm32
emit8(0x48); emit8(0x89); emit8(0xC2);
emit8(0x48); emit8(0xC7); emit8(0xC0); emit32(val);
pad32();
}
else if (tag == STag_Imm)
{
if (in_def) {
// If we execute something, we jump out of def block first
emit8(0xC3);
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
in_def = false;
@@ -233,27 +279,24 @@ IA_ void compile_and_run_tape(void)
}
if (in_def) {
// If we hit cursor inside a definition, patch jump so it doesn't crash on execution
emit8(0xC3);
pad32();
U4 current = code_arena.used;
u4_r(code_arena.start + def_jmp_offset)[0] = current - (def_jmp_offset + 4);
}
// Epilogue: Save VM state back to globals
emit8(0x48); emit8(0x89); emit8(0x41); emit8(0x70); // mov [rcx+112], rax
emit8(0x48); emit8(0x89); emit8(0x51); emit8(0x78); // mov [rcx+120], rdx
emit8(0xC3); // ret
emit8(0x48); emit8(0x89); emit8(0x41); emit8(0x70);
emit8(0x48); emit8(0x89); emit8(0x51); emit8(0x78);
emit8(0xC3);
// Cast code arena to function pointer and CALL it!
typedef void JIT_Func(U8* globals_ptr);
JIT_Func* func = (JIT_Func*)code_arena.start;
func(vm_globals);
// Read state for UI
vm_rax = vm_globals[14];
vm_rdx = vm_globals[15];
}
// --- Window Procedure ---
S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
{
U8 tape_count = tape_arena.used / sizeof(U4);
@@ -267,12 +310,10 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
U4 val = unpack_val(t);
U1 c = u1_(wparam);
// Skip control characters and the 'E' that triggered the mode
bool should_skip = c < 32 || (c == 'e' && mode_switch_now);
if (should_skip) { mode_switch_now = false; return 0; }
if (tag == STag_Data) {
// Hex input
U4 digit = 16;
if (c >= '0' && c <= '9') digit = c - '0';
if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
@@ -290,28 +331,22 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
if (len < 8) {
anno_str[len] = (char)c;
for (int i = len + 1; i < 8; i++) anno_str[i] = '\0';
// Update the 2-char token ID from the first 2 chars
char c1 = anno_str[0] ? anno_str[0] : ' ';
char c2 = anno_str[1] ? anno_str[1] : ' ';
val = id2(c1, c2);
tape_ptr[cursor_idx] = pack_token(tag, val);
}
}
vm_rax = 0; vm_rdx = 0; mem_zero(u8_(vm_globals), sizeof(vm_globals));
relink_tape();
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
}
case MS_WM_KEYDOWN: {
if (wparam == 0x45 && editor_mode == MODE_NAV) { // 'E'
if (wparam == 0x45 && editor_mode == MODE_NAV) {
editor_mode = MODE_EDIT;
mode_switch_now = true;
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
// ~~Consume the keypress so it doesn't trigger WM_CHAR~~
// NOTE(Ed): Still triggers WM_CHAR we need to track when we just entered edit mode and that must be consumed.
}
if (wparam == 0x1B && editor_mode == MODE_EDIT) { // ESC
if (wparam == 0x1B && editor_mode == MODE_EDIT) {
editor_mode = MODE_NAV;
ms_invalidate_rect(hwnd, nullptr, true);
return 0;
@@ -332,16 +367,14 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
while (len < 8 && anno_str[len] != '\0' && anno_str[len] != ' ') len ++;
if (len > 0) {
anno_str[len - 1] = '\0';
char c1 = anno_str[0] ? anno_str[0] : ' ';
char c2 = anno_str[1] ? anno_str[1] : ' ';
tape_ptr[cursor_idx] = pack_token(tag, id2(c1, c2));
}
}
vm_rax = 0; vm_rdx = 0; mem_zero(u8_(vm_globals), sizeof(vm_globals));
relink_tape();
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
}
return 0; // Block navigation keys in Edit Mode
return 0;
}
if (wparam == MS_VK_RIGHT && cursor_idx < tape_count - 1) cursor_idx ++;
@@ -365,7 +398,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
U8 next_line_start = cursor_idx;
while (next_line_start < tape_count && unpack_tag(tape_ptr[next_line_start]) != STag_Format) next_line_start ++;
if (next_line_start < tape_count) {
next_line_start ++; // Skip the newline
next_line_start ++;
U8 next_line_end = next_line_start;
while (next_line_end < tape_count && unpack_tag(tape_ptr[next_line_end]) != STag_Format) next_line_end ++;
U8 next_line_len = next_line_end - next_line_start;
@@ -378,15 +411,12 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
if (wparam == MS_VK_F5) { run_full = !run_full; }
if (wparam == MS_VK_TAB) {
// Cycle Color Tag
U4 t = tape_ptr[cursor_idx];
U4 tag = (unpack_tag(t) + 1) % STag_Count;
tape_ptr[cursor_idx] = pack_token(tag, unpack_val(t));
}
else if (wparam == MS_VK_BACK)
{
// Delete Token
// Shift: delete AT cursor | Regular: delete TO THE LEFT
U8 delete_idx = cursor_idx;
B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
if (is_shift == false) {
@@ -407,8 +437,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
}
}
else if (wparam == MS_VK_SPACE || wparam == MS_VK_RETURN) {
// Insert New Token
// Shift: insert AFTER cursor | Regular: insert BEFORE cursor
B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
U8 insert_idx = cursor_idx;
if (is_shift) insert_idx ++;
@@ -423,7 +451,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
tape_ptr[insert_idx] = pack_token(STag_Format, 0xA);
anno_ptr[insert_idx] = 0;
} else {
tape_ptr[insert_idx] = pack_token(STag_Comment, id2(' ',' '));
tape_ptr[insert_idx] = pack_token(STag_Comment, 0);
anno_ptr[insert_idx] = 0;
}
if (is_shift) cursor_idx ++;
@@ -432,10 +460,10 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
}
}
// Interaction: Reset VM and compile
vm_rax = 0; vm_rdx = 0;
mem_zero(u8_(vm_globals), sizeof(vm_globals));
relink_tape();
compile_and_run_tape();
ms_invalidate_rect(hwnd, nullptr, true);
@@ -447,7 +475,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
void* hFont = ms_create_font_a(20, 0, 0, 0, 400, 0, 0, 0, 0, 0, 0, 0, 0, "Consolas");
void* hOldFont = ms_select_object(hdc, hFont);
ms_set_bk_mode(hdc, 1); // TRANSPARENT text background
ms_set_bk_mode(hdc, 1);
void* hBgBrush = ms_create_solid_brush(0x00222222);
ms_select_object(hdc, hBgBrush);
@@ -462,7 +490,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
U4*r tape_ptr = u4_r(tape_arena.start);
U8*r anno_ptr = u8_r(anno_arena.start);
// Render Tokens
for (U8 i = 0; i < tape_count; i++)
{
if (x >= start_x + (TOKENS_PER_ROW * spacing_x)) {
@@ -492,7 +519,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
ms_set_text_color(hdc, color);
if (editor_mode == MODE_EDIT && i == cursor_idx) {
// Better visibility in Edit Mode: White text on White-ish cursor
ms_set_text_color(hdc, 0x001E1E1E);
}
@@ -503,20 +529,12 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
}
else
{
// Extract annotation string
char* a_str = (char*) & anno;
if (a_str[0] == '\0') {
// Fallback to 2-character ID
val_str[0] = (char)((val >> 8) & 0xFF);
val_str[1] = (char)(val & 0xFF);
val_str[2] = ' '; val_str[3] = ' '; val_str[4] = ' '; val_str[5] = ' ';
val_str[6] = '\0';
for(int c=0; c<8; c++) {
val_str[c] = a_str[c] ? a_str[c] : ' ';
}
else {
mem_copy(u8_(val_str), u8_(a_str), 8);
val_str[8] = '\0';
}
}
char out_buf[12];
out_buf[0] = prefix[0];
out_buf[1] = ' ';
@@ -536,7 +554,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
}
}
// Draw a solid background behind the HUD to cover scrolling text
void* hHudBrush = ms_create_solid_brush(0x00141E23);
ms_select_object(hdc, hHudBrush);
ms_rectangle(hdc, -1, 500, 3000, 3000);
@@ -545,7 +562,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
ms_set_text_color(hdc, 0x00AAAAAA);
ms_text_out_a(hdc, 40, 10, "x86-64 Machine Code Emitter | 2-Reg Stack | [F5] Toggle Run Mode | [PgUp/PgDn] Scroll", 85);
// Render VM State
ms_set_text_color(hdc, 0x00FFFFFF);
char jit_str[64] = "Mode: Incremental | JIT Size: 0x000 bytes";
if (run_full) mem_copy(u8_(jit_str + 6), u8_("Full "), 11);
@@ -555,10 +571,9 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
char state_str[64] = "RAX: 00000000 | RDX: 00000000";
u64_to_hex(vm_rax, state_str + 5, 8);
u64_to_hex(vm_rdx, state_str + 21, 8);
ms_set_text_color(hdc, 0x0094BAA1); // Number green
ms_set_text_color(hdc, 0x0094BAA1);
ms_text_out_a(hdc, 40, 550, state_str, 29);
// HUD: Display Current Token Meaning
if (tape_count > 0 && cursor_idx < tape_count) {
U4 cur_tag = unpack_tag(tape_ptr[cursor_idx]);
const char* tag_name = tag_names [cur_tag];
@@ -580,11 +595,10 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
char glob_str[32] = "[0]: 00000000";
glob_str[1] = '0' + i;
u64_to_hex(vm_globals[i], glob_str + 5, 8);
ms_set_text_color(hdc, 0x00D6A454); // Soft blue
ms_set_text_color(hdc, 0x00D6A454);
ms_text_out_a(hdc, 400, 550 + (i * 25), glob_str, 13);
}
// Print Log
ms_set_text_color(hdc, 0x00C8C8C8);
ms_text_out_a(hdc, 750, 520, "Print Log:", 10);
for (int i = 0; i<log_count && i < 4; i ++) {
@@ -607,7 +621,6 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam)
}
int main(void) {
// 1. Initialize Memory Arenas using WinAPI + FArena
Slice tape_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE)), 64 * 1024);
Slice anno_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE)), 64 * 1024);
Slice code_mem = slice_ut_(u8_(ms_virtual_alloc(nullptr, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_EXECUTE_READWRITE)), 64 * 1024);
@@ -617,54 +630,51 @@ int main(void) {
farena_init(& anno_arena, anno_mem);
farena_init(& code_arena, code_mem);
// Bootstrap Robust Sample: Factorial State Machine
scatter(pack_token(STag_Comment, id2('I','N')), "INIT "); // .IN
scatter(pack_token(STag_Data, 5), 0); // $5
scatter(pack_token(STag_Data, 0), 0); // $0 (Addr)
scatter(pack_token(STag_Imm, id2('!',' ')), "STORE "); // ^!
scatter(pack_token(STag_Data, 1), 0); // $1
scatter(pack_token(STag_Data, 1), 0); // $1 (Addr)
scatter(pack_token(STag_Imm, id2('!',' ')), "STORE "); // ^!
scatter(pack_token(STag_Format, 0xA), 0); // Newline
// Define the FS (Factorial Step) word in memory
scatter(pack_token(STag_Define, id2('F','S')), "F_STEP ");
scatter(pack_token(STag_Comment, 0), "INIT ");
scatter(pack_token(STag_Data, 5), 0);
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, id2('@',' ')), "FETCH ");
scatter(pack_token(STag_Call, id2('R','0')), "RET_IF_Z");
scatter(pack_token(STag_Format, 0xA), 0); // Newline
scatter(pack_token(STag_Imm, 0), "STORE ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, id2('@',' ')), "FETCH ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, id2('@',' ')), "FETCH ");
scatter(pack_token(STag_Call, id2('M','*')), "MULT ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, id2('!',' ')), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0); // Newline
scatter(pack_token(STag_Imm, 0), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Define, 0), "F_STEP ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, id2('@',' ')), "FETCH ");
scatter(pack_token(STag_Call, id2('-','1')), "DEC ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, id2('!',' ')), "STORE ");
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "RET_IF_Z");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, id2('@',' ')), "FETCH ");
scatter(pack_token(STag_Call, id2('P','R')), "PRINT "); // Print Accumulator!
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "MULT ");
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Call, id2('R','E')), "RETURN "); // Return!
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "DEC ");
scatter(pack_token(STag_Data, 0), 0);
scatter(pack_token(STag_Call, 0), "STORE ");
scatter(pack_token(STag_Format, 0xA), 0); // Newline
scatter(pack_token(STag_Data, 1), 0);
scatter(pack_token(STag_Call, 0), "FETCH ");
scatter(pack_token(STag_Call, 0), "PRINT ");
// Call it
scatter(pack_token(STag_Imm, id2('F','S')), "F_STEP "); // ^FS
scatter(pack_token(STag_Imm, id2('F','S')), "F_STEP ");
scatter(pack_token(STag_Imm, id2('F','S')), "F_STEP ");
scatter(pack_token(STag_Imm, id2('F','S')), "F_STEP ");
scatter(pack_token(STag_Imm, id2('F','S')), "F_STEP ");
scatter(pack_token(STag_Format, 0xA), 0);
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
scatter(pack_token(STag_Imm, 0), "F_STEP ");
relink_tape();
MS_WNDCLASSA wc;
mem_fill(u8_(& wc), 0, sizeof(wc));
+2 -1
@@ -19,10 +19,11 @@ This document serves as the master blueprint for the research and curation phase
## 3. Onat's VAMP/KYRA Architecture (The Runtime/Codegen)
* **2-Item Register Stack:** Uses `RAX` and `RDX` as a tiny, hardware-resident stack.
* **The Swap:** `xchg rax, rdx` (1-byte: `48 87 C2`) is emitted to rotate the "top of stack".
* **The Swap / Magenta Pipe:** A definition boundary implicitly emits `RET` (to close the last block) followed by `xchg rax, rdx` (3 bytes as `48 87 C2`, or 2 bytes as `48 92`) to rotate the "top of stack" for the new block.
* **Aliased Global Namespace:** The CPU register file is treated as a shared, aliased memory space for functions.
* **Functions as Blocks:** Words are "free of arguments and returns" in the traditional sense.
* **Preemptive Scatter ("Tape Drive"):** Arguments are pre-placed into fixed, contiguous memory slots ("the tape") by the compiler/loader before execution. This eliminates "argument gathering" during function calls.
* **The FFI Dance (C-ABI Integration):** To call OS APIs (like WinAPI or Vulkan), the hardware stack pointer (`RSP`) must be strictly 16-byte aligned. Custom macros (like `CCALL`) must save state, align `RSP`, map the 2-register stack into C-ABI registers (`RCX`, `RDX`, `R8`, `R9`), execute the `CALL`, and restore `RSP`.
## 4. Implementation Components
* **Emitter:** **Zydis Encoder API**. Zero-allocation, sub-5ms instruction generation.
+97
View File
@@ -0,0 +1,97 @@
# In-Depth Analysis: Timothy Lottes's Development Blogs (2007 - 2016)
This document synthesizes the architectural paradigms, implementation details, and philosophical shifts documented in Timothy Lottes's blogs over a decade of building minimal, high-performance Forth-like operating environments. This knowledge is crucial for understanding the "Lottes/Onat Paradigm" and successfully implementing the `bootslop` project.
---
## 1. The Core Philosophy: "Vintage Programming"
Lottes advocates for returning to a "stone-age" development methodology reminiscent of the Commodore 64 or HP48, but applied to modern x86-64 hardware and GPUs.
* **Rejection of Modern Complexity:** He explicitly rejects the "NO" of modern operating systems—compilers, linkers, debuggers, memory protection, paging, and bloated ABIs. He aims for an environment that says "YES" to direct hardware access.
* **The OS IS the Editor:** The system boots directly into a visual editor. This editor functions simultaneously as an IDE, assembler, disassembler, hex editor, debugger, and live-coding environment.
* **Instant Iteration:** The primary goal is a sub-5ms edit-compile-run loop. Debugging is done via instant visual feedback and "printf" style memory peeking within the editor itself, rendering traditional debuggers obsolete.
* **Extreme Minimalism:** His compilers and core runtimes often fit within 1.5KB to 4KB (e.g., the 1536-byte bootloader/interpreter project).
## 2. The Evolution to "Source-Less" Programming
The most critical architectural shift in Lottes's work is the move from text-based source files (like his 2014 "A" language) to **Source-Less Programming** (2015).
### Why Source-Less?
Parsing text (lexical analysis, string hashing, AST generation) is slow and complex. In a source-less model, the "source code" *is* the binary executable image (or a direct structured representation of it).
### The Architecture of Source-Less (x68)
1. **32-Bit Granularity:** Every token in the system is exactly 32 bits (4 bytes).
* To accommodate variable-length x86-64 instructions, Lottes invented "x68".
* **Padding:** Standard x86 instructions are padded to exactly 32 bits (or multiples of 32 bits) using ignored segment override prefixes (like `2E` or `3E`) and multi-byte NOPs.
* Example: A `RET` instruction (`C3`) becomes `C3 90 90 90`.
* *Why?* This keeps immediate values (like 32-bit addresses or constants) 32-bit aligned, drastically simplifying the editor and the assembler.
2. **The Token Types:** A 32-bit word in memory represents one of four things:
* **DAT (Data):** Hexadecimal data or an immediate value.
* **OP (Opcode):** A padded 32-bit x86-64 machine instruction.
* **ABS (Absolute Address):** A direct 32-bit memory pointer.
* **REL (Relative Address):** An `[RIP + imm32]` relative offset used for branching.
3. **The Annotation Overlay (The "Shadow" Memory):**
* Because raw 32-bit hex values are unreadable to humans, the editor maintains a *parallel array* of 64-bit annotations for every 32-bit token.
* **Annotation Layout (64-bit):**
* `Tag` (4 to 8 bits): Defines how the editor should display and treat the 32-bit value (e.g., display as a signed int, an opcode name, a relative address, or a specific color).
* `Label / Name`: A short string (e.g., 5 to 8 characters, often compressed using 6-bit or 7-bit encodings to fit) that acts as the human-readable name for the memory address.
* *The Magic:* The editor reads the binary array and the annotation array. It uses the tags to dynamically format the screen. There is **zero string parsing** at runtime.
4. **Edit-Time Relinking (The Visual Linker):**
* When you insert or delete a token in the editor, all tokens tagged as `ABS` or `REL` (addresses) are automatically recalculated and updated in real-time. The editor *is* the linker.
5. **Live State vs. Edit State:**
* Memory is split: The live running program, and the edit buffer.
* When edits are made and confirmed (e.g., hitting ESC or Enter), the editor atomically swaps or patches the live image with the edited image.
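The edit-time relinking described in item 4 can be sketched roughly as follows. This is a minimal illustration rather than Lottes's code: `SketchTag`, `sketch_relink`, and the parallel `target` array are hypothetical names, and the convention that a `REL` token stores a byte offset measured from the end of its own 32-bit slot is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical tags mirroring the four token types listed above.
typedef enum { SK_DAT, SK_OP, SK_ABS, SK_REL } SketchTag;

// Recompute every address token after an insert or delete.
// tape[i]   : the 32-bit token values (the "binary source")
// tag[i]    : how the editor treats tape[i]
// target[i] : for ABS/REL tokens, the index of the token being referenced
// base      : assumed byte address of tape[0] in the live image
void sketch_relink(uint32_t *tape, const SketchTag *tag,
                   const uint32_t *target, size_t count, uint32_t base)
{
    for (size_t i = 0; i < count; i++) {
        if (tag[i] == SK_ABS) {
            // Absolute: address of the referenced slot.
            tape[i] = base + target[i] * 4u;
        } else if (tag[i] == SK_REL) {
            // Relative: distance from the end of this slot to the target slot.
            int32_t rel = ((int32_t)target[i] - (int32_t)(i + 1)) * 4;
            tape[i] = (uint32_t)rel;
        }
        // DAT and OP tokens are left untouched.
    }
}
```

Because every token is exactly 4 bytes, the whole pass is one linear sweep with no parsing, which is what lets the editor double as the linker.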
## 3. Language Paradigms: "Ear" and "Toe"
In his "Random Holiday 2015" post, Lottes solidifies the specific DSLs used within this source-less framework:
* **"Toe" (The Low-Level Assembler):** This is the subset of x86-64 with 32-bit padded opcodes. It is heavily macro-driven to assemble machine code.
* **"Ear" (The High-Level Macro/Forth Language):** A zero-operand, Forth-like language embedded directly into the binary form.
* Instead of a traditional Forth interpreter searching a dictionary at runtime, the dictionary is resolved at *edit-time* or *import-time*.
* A token is just an index or a direct `CALL` instruction to the compiled word.
### The 2-Item Stack (Implicit Registers)
While early experiments used a traditional Forth data stack in memory, Lottes's later architectures (and Onat's derived work) map the stack directly to hardware registers to eliminate memory overhead:
* `RAX` = Top of Stack (TOS)
* `RBX` (or `RDX` in Onat's VAMP) = Second item on stack (NOS)
* **The xchg Trick:** Stack rotation is often handled by `xchg rax, rbx` (or `rdx`), which compiles to a tiny 2- or 3-byte instruction and keeps the whole stack in registers rather than in memory.
## 4. Bootstrapping "The Chicken Without an Egg"
How do you build a system that requires a custom binary editor to write code, when you don't have the editor yet?
1. **C Prototype First:** Lottes explicitly states he builds the first iteration of the visual editor and virtual machine in C (using WinAPI or standard libraries). This allows rapid iteration of the visual layout and the memory arena logic.
2. **Hand-Assembling Bootstraps:** He uses standard assemblers (like NASM) or hexadecimal byte-banging (using tools like `objdump -d`) to figure out the exact padded 32-bit opcode bytes.
3. **Embed Opcode Definitions:** The C prototype includes hardcoded arrays of bytes that represent the base opcodes (e.g., `MOV`, `ADD`, `CALL`, `RET`).
4. **Self-Hosting:** Once the C editor is stable and can generate binary code into an arena, he rewrites the editor *inside* the custom language within the C editor, eventually discarding the C host.
## 5. UI and Visual Design
The UI is not an afterthought; it is integral to the architecture.
* **The Grid:** The editor displays memory as a strict grid. Typical layout: 8 tokens per line (fitting half a 64-byte cache line).
* **Two Rows per Token:**
* Top Row: The Annotation (Label/Name), color-coded.
* Bottom Row: The 32-bit Data (Hex value, or a resolved symbol name if tagged as an address).
* **Colors (ColorForth Inspired):**
* Colors dictate semantic meaning (e.g., Red = Define, Green = Compile, Yellow = Execute/Immediate, White/Grey = Comment/Format). This visual syntax replaces traditional language keywords.
* **Pixel-Perfect Fonts:** Lottes builds custom, fixed-width raster fonts (e.g., 6x11 or 8x8) to ensure perfect readability without anti-aliasing blurring, often treating specific characters (like `_`, `-`, `=`) as line-drawing characters to structure the UI.
## Summary for the `bootslop` Implementation
Our current `attempt_1/main.c` is perfectly aligned with Phase 1 of the Lottes bootstrapping process:
1. We have a C-based WinAPI editor.
2. We have a token array (`tape_arena`) and an annotation array (`anno_arena`).
3. We have 32-bit tokens packed with a 4-bit semantic tag and a 28-bit payload.
4. We have a basic JIT emitter targeting a 2-register (`RAX`/`RDX`) virtual machine.
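For reference, the 4-bit tag plus 28-bit payload packing from item 3 might look like the sketch below. The field order (tag in the top 4 bits) and the `sketch_` names are assumptions made for illustration; the actual `pack_token` in `attempt_1/main.c` may lay the fields out differently.

```c
#include <stdint.h>

// Hypothetical layout: tag in the top 4 bits, payload in the low 28 bits.
uint32_t sketch_pack_token(uint32_t tag, uint32_t payload)
{
    return ((tag & 0xFu) << 28) | (payload & 0x0FFFFFFFu);
}

// Recover the 4-bit semantic tag.
uint32_t sketch_token_tag(uint32_t token)
{
    return token >> 28;
}

// Recover the 28-bit payload (immediate, dictionary ID, or data address).
uint32_t sketch_token_payload(uint32_t token)
{
    return token & 0x0FFFFFFFu;
}
```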
**Next Immediate Priorities based on Lottes's path:**
* Move away from string-based dictionary lookups at runtime to **Edit-Time Relinking** (resolving addresses when the token is typed or modified in the UI).
* Implement the **Padding Strategy** for the x86-64 JIT emitter to ensure all emitted logical blocks align cleanly, paving the way for 1:1 token-to-machine-code mapping.
* Refine the Editor Grid to show the two-row (Annotation / Data) layout clearly.
+58
View File
@@ -0,0 +1,58 @@
# In-Depth Analysis: Onat's Forth Day 2020 Presentation
This document provides an exhaustive breakdown of the technical specifics, screen visuals, and mechanical explanations from Onat Türkçüoğlu's "Preview of x64 & ColorForth & SPIR V" presentation at Forth Day 2020, synthesizing both the video transcript and the OCR analysis of the editor's visual state.
---
## 1. The Environment and Editor UI
Onat introduces a custom 3-pane UI built entirely from scratch in C and Vulkan. This editor serves as the primary IDE, compiler, and visual debugger.
### Visual Layout (from OCR & Video)
* **The Three Panes:** Left/center panes display the block-based, colorized Forth/macro tokens. The right pane displays live x86-64 assembly output (or SPIR-V binary data) that updates instantly as the user edits the source.
* **Color Semantics (Observed in OCR):**
* **Cyan:** Low-level x86-64 opcodes or API functions (`mov`, `jmp`, `xorpd`, `CCALL1`, `ide_syscmd`).
* **Yellow:** Line numbers, specific execution tokens, or immediate jump labels/blocks.
* **Magenta:** High-level struct definitions, bitwise layouts, and basic block delineations (`Structs`, `vars`, `bits`).
* **Red:** Literal numbers (`32`, `64`), format strings, or specific SPIR-V instruction IDs and properties.
* **Orange/Green:** UI and control flow modifiers.
* **State Tracking:** The editor treats code blocks as tracked state objects, which allows for native, robust Undo/Redo operations without relying on a traditional text file format.
## 2. O(1) Dictionary Lookup & "Compile-Time Call Graph"
Traditional Forth systems (and even Lottes's early systems) relied on hashing strings or linear searches to resolve words. Onat eliminated this overhead entirely.
* **Source Memory Mapping:** Instead of hashing, the compiler allocates an extra 4 bytes per character in the visual block to store the *exact source memory location* of the currently compiled word.
* **Instant Resolution:** Because the token itself points to its origin, "Jump to Definition" is instantaneous.
* **Execution Tracing:** He demonstrates a command that instantly numbers every occurrence of a word across the codebase in the exact chronological order of execution. This provides a "compile-time call graph" without actually running the program, allowing the programmer to visualize the data flow statically.
## 3. The High-Level x64 Macro Assembler
The core of the system is not a traditional Forth interpreter, but a high-level macro assembler that compiles words directly into x64 machine code.
* **Syntax & Abstraction:**
* The syntax is designed to be readable and fluid: `AX to BX` or `CX + offset`.
* A "direction register" macro allows toggling the flow of data. For instance, `from AX to BX register, let's move an unsigned` emits a 32-bit `mov ebx, eax`.
* Modifiers like `long` change the emission to a 64-bit `mov rbx, rax`.
* **Low-Level Control (OCR Insights):** The OCR reveals exact x64 instructions embedded in the blocks:
* `xorpd xm15, xm15` and `movups [rsi], xm15` show direct, native access to SSE/AVX registers for vectorized operations.
* Macros like `PUSH2 rsi, rdi` and `POP2 rsi, rdi` are used instead of traditional C-style prologues/epilogues, maintaining tight control over the stack pointer and register preservation.
* **C-ABI Integration:** The OCR shows words like `CCALL1 ide_p` and `CCALL3 ide_syscmd`. This indicates a custom FFI (Foreign Function Interface) macro set (`CCALL0`, `CCALL1`, `CCALL2`, `CCALL3`) designed to automatically align the stack (`RSP` to 16 bytes) and map registers to the C-ABI (e.g., `RCX`, `RDX`, `R8`, `R9` on Windows) to call out to the C-based host/Vulkan engine.
## 4. SPIR-V Generation
A significant portion of the presentation focuses on using this same macro-assembler foundation to generate SPIR-V (the intermediate representation for Vulkan compute/graphics shaders) entirely from scratch, replacing massive compiler toolchains like `glslang`.
* **x64 vs. SPIR-V Complexity:** Onat notes that x64 assembly was actually *less* complicated to generate than SPIR-V.
* x64 is a flat, linear instruction stream.
* SPIR-V is strictly structured. It requires rigid sections for Capabilities, Extensions, Memory Models, Entry Points, Execution Modes, Types, and Function Definitions before any actual logic can be emitted.
* **SPIR-V Macros (OCR Insights):** The OCR captures the exact implementation of the SPIR-V generator:
* Words like `opTypeInt 32`, `opTypeVector 4`, `opTypeFloat` map directly to the SPIR-V specification binary IDs.
* Memory addresses and types are explicitly laid out: `PhysicalStorageBuffer64`.
* This proves that the "sourceless" environment scales perfectly from raw CPU machine code to structured GPU bytecodes by just changing the underlying byte-emission macros.
## 5. Key Takeaways for the `bootslop` Implementation
1. **Immediate x64 Access:** The system shouldn't hide the CPU. It should expose it via macros (like `CCALL`) that handle the tedious parts of the ABI while letting the programmer write `movups` if they want to.
2. **Visual Over Text:** The implementation of 4 extra bytes per character to store "source location" reinforces that the visual grid *is* the data structure. It's not text being parsed; it's a spatial array of tokens pointing to each other.
3. **The FFI Bridge:** We will need a macro pattern equivalent to `CCALL` in our JIT emitter to talk to WinAPI functions without trashing the 2-item (`RAX`/`RDX`) stack or violating the 16-byte `RSP` alignment required by Windows.
+86
View File
@@ -0,0 +1,86 @@
# In-Depth Analysis: Metaprogramming KYRA in KYRA (Onat Türkçüoğlu)
This document provides a comprehensive synthesis of the "Metaprogramming KYRA in KYRA" presentation given by Onat Türkçüoğlu at the Silicon Valley Forth Interest Group (SVFIG) on April 26, 2025. It integrates insights from the video transcript and the extensive OCR analysis of his visual editor.
This presentation is the most explicit, hardcore low-level deep dive into Onat's binary-encoded compiler (KYRA) and serves as the definitive mechanical blueprint for our `bootslop` project.
---
## 1. Performance and "Runtime-Opinionated" Languages
Onat's primary critique of traditional Forth (and languages like C or Rust) is that they are "runtime opinionated." Standard Forth dictates a memory-based data stack and return stack. This makes it fundamentally incompatible with environments like GPU compute shaders.
* **Compilation Speed:** KYRA compiles its entire program (including a custom editor, Vulkan renderers, and FFMPEG integrations) in **8.24 milliseconds** natively on Windows/Linux.
* **The 2-Item Hardware Stack:** To achieve hardware locality and GPU compatibility, KYRA strictly restricts the data stack to exactly two CPU registers: **`RAX` (Top of Stack)** and **`RDX` (Next on Stack)**.
* **Zero Stack Overhead:** By having no memory data stack, KYRA eliminates the push/pop overhead that plagues standard Forth implementations.
## 2. The Mechanics of the KYRA Emitter
KYRA is not an interpreter; it is a high-level macro assembler that generates direct x86-64 machine code via JIT compilation.
### The `xchg` Trick (The Magenta Pipe `|`)
* Because the stack is just `RAX` and `RDX`, ensuring `RAX` is the active "Top of Stack" before executing a word is vital.
* The `xchg rax, rdx` instruction compiles to a tiny 2-byte opcode: `48 92`.
* **Definitions:** There are no `begin` or `end` words. A magenta pipe token (`|`) implicitly signals the start of a new definition. The JIT reacts to this by:
1. Emitting a `RET` (`C3`) to close the *previous* definition.
2. Emitting `48 92` (`xchg rax, rdx`) to ensure proper stack alignment for the *new* definition.
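The two emission steps above amount to appending three bytes to the code arena. A minimal sketch, assuming a flat byte buffer and a hypothetical helper name `sketch_emit_magenta_pipe` (not a function from KYRA or from `main.c`):

```c
#include <stdint.h>
#include <stddef.h>

// Appends the definition-boundary bytes at offset `at` and returns the new
// write offset, or `at` unchanged if the buffer cannot hold them.
size_t sketch_emit_magenta_pipe(uint8_t *code, size_t at, size_t cap)
{
    static const uint8_t boundary[3] = {
        0xC3,        // ret            : closes the previous definition
        0x48, 0x92   // xchg rax, rdx  : rotates the 2-item stack for the new one
    };
    if (at > cap || cap - at < sizeof boundary) return at;
    for (size_t i = 0; i < sizeof boundary; i++)
        code[at + i] = boundary[i];
    return at + sizeof boundary;
}
```

A real emitter would probably skip the leading `RET` for the very first definition in a block; that case is left out here.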
### Color Semantics and Code Generation (From Transcript & OCR)
* **Magenta (`|`):** Definition boundary (`RET` + `xchg rax, rdx`).
* **White (Call):** A compile-time call. Emits a direct `CALL` instruction or a `JMP RAX` (e.g., `FFE0`) if optimizing a tail call.
* **Green (Load):** Emits a read from memory: `mov rax, [global_offset]`.
* **Red (Store):** Emits a write to memory: `mov [global_offset], rax`.
* **Yellow (Execute/Immediate):** A highly overloaded color used for runtime execution, immediate invocation of lambdas, or prefix accessors (like struct member reading).
* **Cyan (Literal):** Compiles an immediate value load: `mov rax, imm`.
* **Blue (Comment):** Stored directly in the token payload (3 characters per 24-bit payload) without polluting the global dictionary.
## 3. Global Memory vs. Local Variables
Onat heavily critiques the conventional wisdom of avoiding global variables, specifically calling out Rust for forcing developers to pass state through 30 layers of call stacks.
* **Implicit Register Passing:** For passing transient state (like the active UI element's `slot ID`), he implicitly passes the value in a dedicated register (e.g., `R12D`) across functions, completely bypassing any need to push it to a stack.
* **Single-Register Memory Base:** He dedicates a single CPU register to act as the base pointer for all program memory. This gives instant `[BASE_REG + offset]` access to "gigabytes of state."
* **The "Tape Drive" in Practice:** Instead of a stack, data needed for complex API calls (like Vulkan initialization) is pre-scattered into these known global offsets using Red (Store) words, and then passed via a single pointer.
## 4. Dictionary Management and The "Deck"
Unlike text-based Forths that require hashing, KYRA uses a pure binary index map.
* **24-Bit Indices:** Words are stored as 24-bit indices pointing to 8-byte cells. (Onat notes his next iteration moves to 32-bit indices + a separate 1-byte tag array, exactly matching Lottes's `x68` annotation model).
* **Visual Organization (The "Scrolls"):** The dictionary is explicitly organized by the programmer into 16-word horizontal "scrolls" (e.g., one scroll for "Vulkan API", another for "Math").
* **IP Protection:** Because the dictionary mapping is separate from the source array, you can ship the binary source indices without the dictionary symbols, effectively stripping the symbols while retaining the executable structure.
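The index-based lookup can be pictured as plain array indexing. The sketch below is an assumption-laden illustration: `sketch_resolve`, the mask constant, and the idea that the 24-bit index sits in the low bits of a token are all hypothetical, not details confirmed by the talk.

```c
#include <stdint.h>
#include <stddef.h>

// Assumed: the low 24 bits of a token hold the dictionary index.
#define SKETCH_INDEX_MASK 0x00FFFFFFu

// Resolve a token to its 8-byte cell in O(1), with no hashing or string
// comparison. Returns NULL if the index is out of range.
uint64_t *sketch_resolve(uint64_t *cells, size_t cell_count, uint32_t token)
{
    uint32_t index = token & SKETCH_INDEX_MASK;
    return index < cell_count ? &cells[index] : NULL;
}
```

Since names never take part in resolution, a separate name table can be dropped from a shipped build without affecting this function.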
## 5. Control Flow: Basic Blocks `[ ]` and Lambdas `{ }`
KYRA eliminates standard Abstract Syntax Trees (ASTs) and `if/else/then` branching.
* **Basic Blocks `[ ]`:** These visually constrain the assembly output. They provide implicit begin, link (else), and end jump targets for the JIT to resolve relative offsets within a limited scope.
* **Lambdas `{ }`:** A lambda (colored Yellow `{`) does not execute inline. The JIT compiles the block of code elsewhere in the arena and leaves its executable memory address in `RAX`.
* **Conditionals:** To perform an `IF`:
1. Evaluate a condition (e.g., `luma > 0.6`).
2. Write the boolean result to a dedicated global `condition` variable.
3. Define a lambda block containing the "true" branch (leaving its address in `RAX`).
4. Call an execution word that reads the `condition` variable, emits a `cmp condition, 0`, and executes a `jz` (jump if zero) to skip the lambda address stored in `RAX`.
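The four steps can be modeled at the C level as follows. This only mirrors the control flow (a global condition plus a code address handed over in `RAX`); it is not the emitted machine code, and every `sketch_` name is hypothetical.

```c
#include <stdint.h>

typedef void (*SketchLambda)(void);

int32_t sketch_condition;  // stands in for the dedicated global `condition`
int32_t sketch_hits;       // side effect so the taken branch is observable

// The "true" branch, compiled elsewhere; its address plays the role of RAX.
void sketch_true_branch(void)
{
    sketch_hits++;
}

// Models the execution word: compare the condition with 0 and skip the
// lambda when it is zero (the `jz`), otherwise call through the address.
void sketch_exec_if(SketchLambda lambda_in_rax)
{
    if (sketch_condition == 0) return;
    lambda_in_rax();
}
```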
## 6. FFI: Bridging to C and Vulkan (WinAPI equivalent)
Dealing with OS APIs and standard C libraries (like Vulkan and FFMPEG) requires satisfying the C Application Binary Interface (ABI).
* **RSP Alignment:** The hardware stack pointer (`RSP`) is used exclusively for the call stack (return addresses). Because no data buffers live on that stack, the classic stack-buffer overflow that overwrites a return address is avoided.
* **The FFI Dance:** When calling external C functions, Onat's macros explicitly read `RSP` into a temporary variable, align `RSP` to 16-bytes (a strict requirement for Windows/Linux x64 C ABI), execute the `CALL`, and then restore `RSP`.
* *(Note for Bootslop: We saw `CCALL1`, `CCALL2`, etc., in the OCR, confirming he uses specialized macro words to map the `RAX`/`RDX` stack and global variables into the standard `RCX`, `RDX`, `R8`, `R9` C-ABI registers before triggering the OS call).*
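One possible byte sequence for such a dance, targeting the Windows x64 convention, is sketched below. It is an illustration under stated assumptions, not Onat's `CCALL` macro and not the `emit_ffi_dance` in `main.c`: the helper name `sketch_emit_ccall2` is hypothetical, `RAX` is assumed to be argument 1 and `RDX` argument 2, and `R15` (nonvolatile on Windows x64) and `R10` are assumed to be free for scratch use.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Emits a 2-argument C call into `code`. Returns the number of bytes
// written, or 0 if the buffer is too small.
size_t sketch_emit_ccall2(uint8_t *code, size_t cap, uint64_t fn_addr)
{
    uint8_t seq[30] = {
        0x49, 0x89, 0xE7,                   // mov  r15, rsp   ; save RSP
        0x48, 0x83, 0xE4, 0xF0,             // and  rsp, -16   ; 16-byte align
        0x48, 0x83, 0xEC, 0x20,             // sub  rsp, 32    ; shadow space
        0x48, 0x89, 0xC1,                   // mov  rcx, rax   ; arg 1 <- TOS
                                            // rdx already holds arg 2
        0x49, 0xBA, 0, 0, 0, 0, 0, 0, 0, 0, // mov  r10, imm64 ; patched below
        0x41, 0xFF, 0xD2,                   // call r10
        0x4C, 0x89, 0xFC                    // mov  rsp, r15   ; restore RSP
    };
    if (cap < sizeof seq) return 0;
    for (int i = 0; i < 8; i++)             // little-endian 64-bit address
        seq[16 + i] = (uint8_t)(fn_addr >> (8 * i));
    memcpy(code, seq, sizeof seq);
    return sizeof seq;
}
```

The callee's return value arrives in `RAX`, so it lands on the top of the 2-item stack without further shuffling; `RDX` should be treated as clobbered after the call because it is volatile in this convention.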
## 7. Development Workflow
* **Bug Triage over Asserts:** There are no unit tests or assertions. Bugs are found by commenting out blocks of code (disabling them) and hitting compile. Because compilation takes 8ms, binary searching for the crash point is faster than writing tests.
* **Free Printf / Data Flow:** By hovering over a word in the editor, the system automatically injects code to record `RAX` and `RDX` at that exact execution step, allowing the programmer to step through the data flow visually without running traditional debuggers.
---
### Conclusion for `bootslop`
The "Metaprogramming KYRA" talk confirms that our 2-register stack and "preemptive scatter" global memory model in `attempt_1/main.c` is the exact correct path.
The next major hurdles for `bootslop` will be:
1. Implementing the `xchg rax, rdx` definition boundary logic.
2. Creating an FFI bridge (like Onat's `CCALL`) that aligns `RSP` to 16 bytes and maps globals to WinAPI registers, allowing our minimal Forth to summon full OS windows and graphics.
3. Transitioning dictionary definitions from string-parsing to direct array index resolution.
+62
View File
@@ -0,0 +1,62 @@
# In-Depth Analysis: Neokineogfx - 4th And Beyond (Timothy Lottes)
This document synthesizes the insights extracted from the transcript and OCR analysis of Timothy Lottes's "4th And Beyond" presentation video (released under his Neokineogfx channel in 2026). It details the evolution of his Forth derivatives, the specifics of his "x68" encoding, and the mechanics of his "5th" system.
---
## 1. Evolution from Calculator to Forth
Lottes traces the ideal interactive tool back to Reverse Polish Notation (RPN) calculators like the HP48.
* **The Baseline:** Start with simple RPN math on a stack.
* **The Dictionary:** Introduce a dictionary that points to positions on the data stack or to executable code.
* **Color Semantics (ColorForth Inspired):**
* **Yellow (Execute):** Push numbers to the stack, or execute dictionary words.
* **Red (Define):** Define a word.
* **Green (Compile):** Compile words or push values during compilation.
* **Magenta (Variable):** Define a variable.
## 2. The Branch Misprediction Problem
Standard Forth causes severe CPU pipeline stalls (averaging 16-clock stalls on architectures like Zen 2) due to constant branch misprediction when interpreting tags or navigating the dictionary lookup loop.
* **Solution - The Folded Interpreter:** Lottes mitigates this by folding a tiny (5-byte) interpreter directly into the end of every compiled word.
* By ending every word with its own fetch/dispatch logic (e.g., `LODSD`, lookup, `JMP`), the CPU's branch predictor gets unique slots for every transition, drastically improving execution speed.
## 3. The Architecture of "Source-Less" (x68)
To make manipulating binary data as easy as text, Lottes invented "x68"—a subset of x86-64 designed purely around 32-bit boundaries.
* **32-Bit Instruction Granularity:** Every x86-64 instruction is padded to exactly 4 bytes (or multiples of 4).
* **Prefix Padding:** x86-64 allows ignored prefixes (like `3E`, the DS segment override) and multi-byte NOPs to pad instructions.
* *Example (RET):* `C3` padded to `f0f c3` or `C3 90 90 90` (RET + NOPs).
* *Example (Inline Data):* Moving a 32-bit immediate is padded with `3E`s to ensure the immediate value is perfectly 32-bit aligned in the next memory slot.
* **Why?** This removes the complexity of variable-length instructions, turning compilation into an edit-time operation where the user simply copies and pastes 32-bit words.
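The padding rule can be sketched as a small helper that widens a short instruction to one 32-bit slot. This is a minimal illustration with a hypothetical name (`sketch_pad_x68`); it handles only instructions of 1 to 4 bytes and is not Lottes's tooling.

```c
#include <stdint.h>
#include <stddef.h>

// Pads an instruction of 1..4 bytes to exactly one 32-bit slot.
// use_prefix == 0 : append NOPs (90) after it          (C3 -> C3 90 90 90)
// use_prefix != 0 : prepend 3E DS-override prefixes    (C3 -> 3E 3E 3E C3)
// Returns 1 on success, 0 if len is out of range.
int sketch_pad_x68(const uint8_t *insn, size_t len, int use_prefix,
                   uint8_t out[4])
{
    if (len == 0 || len > 4) return 0;
    size_t pad = 4 - len;
    if (use_prefix) {
        for (size_t i = 0; i < pad; i++) out[i] = 0x3E;
        for (size_t i = 0; i < len; i++) out[pad + i] = insn[i];
    } else {
        for (size_t i = 0; i < len; i++) out[i] = insn[i];
        for (size_t i = 0; i < pad; i++) out[len + i] = 0x90;
    }
    return 1;
}
```

Prefix padding keeps the slot a single instruction, whereas trailing NOPs add instructions that execute whenever control falls through, so the two modes are not interchangeable for every opcode.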
## 4. Editor Mechanics & Annotation Overlay
The editor is an "Advanced 32-bit Hex Editor". The source code is literally the binary array.
* **Structure:** The file is split into blocks. For every 32-bit source word, there are 64 bits of annotation memory.
* **64-bit Annotation Layout:**
* 8 characters encoded in 7 bits each (56 bits total) acting as the human-readable Label/Note.
* 8-bit Tag. This tag dictates how the 32-bit value in memory is formatted in the editor (e.g., Hex Data, Absolute Address, Relative Address).
* **Visual Layout:** The editor displays lines with two elements per cell:
* Top: The Annotation string (color-coded by tag).
* Bottom: The 32-bit interpreted value.
* **Auto-Relinking:** The editor dynamically recalculates `CALL`/`JMP` 32-bit relative offsets and 8-bit conditional jump offsets when tokens are inserted or deleted. The editor is the linker.
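The 64-bit annotation layout (8 characters at 7 bits each plus an 8-bit tag) could be packed as in the sketch below. The bit positions are an assumption (tag in the top byte, character `i` at bit `7 * i`), and the function names are hypothetical.

```c
#include <stdint.h>

// Packs a tag and up to 8 ASCII characters into one 64-bit annotation.
// Shorter labels are padded with zero characters.
uint64_t sketch_anno_pack(uint8_t tag, const char *label)
{
    uint64_t anno = (uint64_t)tag << 56;
    int ended = 0;
    for (int i = 0; i < 8; i++) {
        if (!ended && label[i] == '\0') ended = 1;  // stop reading at the NUL
        uint64_t ch = ended ? 0u : (uint64_t)(label[i] & 0x7F);
        anno |= ch << (7 * i);
    }
    return anno;
}

// Unpacks an annotation into its tag and a NUL-terminated label.
void sketch_anno_unpack(uint64_t anno, uint8_t *tag, char label[9])
{
    *tag = (uint8_t)(anno >> 56);
    for (int i = 0; i < 8; i++)
        label[i] = (char)((anno >> (7 * i)) & 0x7F);
    label[8] = '\0';
}
```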
## 5. Free-Form Source & Argument Fetching
Lottes diverges from strict zero-operand Forth by introducing "preemptive scatter" arguments directly in the source stream.
* **Source is the Dictionary:** The 32-bit words are direct absolute memory pointers into the binary.
* **Argument Fetching:** Instead of pushing to a data stack before calling, words can read ahead in the instruction stream.
* `[RSI]` points to the current word.
* `[RSI+4]`, `[RSI+8]` can be fetched directly into registers (like `RCX`, `RDX`) within the word's implementation.
* **Benefits:** This reduces branch granularity and eliminates stack shuffling overhead, making it much faster for heavy code-generation tasks (like JITing GPU shaders).
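Argument fetching from the instruction stream can be modeled in C as a word that reads the slots following itself. This is a simplified sketch: `sketch_word_add2` is a hypothetical word, the pointer `ip` stands in for `RSI`, and the result is written to a plain accumulator instead of a register.

```c
#include <stdint.h>

// A word that takes two inline arguments from the stream.
// ip[0] is the word itself, ip[1] and ip[2] are its pre-scattered operands.
// Returns the position of the next word in the stream.
const uint32_t *sketch_word_add2(const uint32_t *ip, uint32_t *acc)
{
    uint32_t a = ip[1];  // corresponds to [RSI+4]
    uint32_t b = ip[2];  // corresponds to [RSI+8]
    *acc = a + b;
    return ip + 3;       // skip the word plus its two arguments
}
```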
## 6. The Self-Modifying OS Cartridge
To handle persistent storage and live updates without complex OS APIs, Lottes leverages Linux's memory mapping and dirty page writeback.
* **The Execution Loop:**
1. Launch `cart` (the binary).
2. The binary copies itself to `cart.bck` and launches `cart.bck`.
3. `cart.bck` maps the original `cart` file into memory (e.g., at the 6MiB mark) with Read/Write/Execute (RWE) permissions.
4. It maps an adjustable zero-fill memory space immediately following it.
5. It jumps into the interpreter.
* **Persistence:** Because the file is mapped into memory, any changes made in the editor modify the file in RAM. Linux's kernel automatically flushes "dirty pages" to the physical disk (e.g., every 30 seconds on SteamOS/SteamDeck). There is no "Save File" code required; data and code reside together and persist implicitly.
-178
View File
@@ -1,178 +0,0 @@
$path_root = split-path -Path $PSScriptRoot -Parent
$misc = join-path $PSScriptRoot 'helpers/misc.ps1'
. $misc
$path_toolchain = join-path $path_root 'toolchain'
$path_rad = join-path $path_toolchain 'rad'
# --- Toolchain Executable Paths ---
$compiler = 'clang'
$optimizer = 'opt.exe'
$linker = 'lld-link.exe'
$archiver = 'llvm-lib.exe'
$radbin = join-path $path_rad 'radbin.exe'
$radlink = join-path $path_rad 'radlink.exe'
# https://clang.llvm.org/docs/ClangCommandLineReference.html
$flag_all_c = @('-x', 'c')
$flag_c11 = '-std=c11'
$flag_c23 = '-std=c23'
$flag_all_cpp = '-x c++'
$flag_charset_utf8 = '-fexec-charset=utf-8'
$flag_compile = '-c'
$flag_color_diagnostics = '-fcolor-diagnostics'
$flag_no_builtin_includes = '-nobuiltininc'
$flag_no_color_diagnostics = '-fno-color-diagnostics'
$flag_debug = '-g'
$flag_debug_codeview = '-gcodeview'
$flag_define = '-D'
$flag_emit_llvm = '-emit-llvm'
$flag_stop_after_gen = '-S'
$flag_exceptions_disabled = '-fno-exceptions'
$flag_rtti_disabled = '-fno-rtti'
$flag_diagnostics_absolute_paths = '-fdiagnostics-absolute-paths'
$flag_preprocess = '-E'
$flag_include = '-I'
$flag_section_data = '-fdata-sections'
$flag_section_functions = '-ffunction-sections'
$flag_library = '-l'
$flag_library_path = '-L'
$flag_linker = '-Wl,'
$flag_link_dll = '/DLL'
$flag_link_mapfile = '/MAP:'
$flag_link_optimize_references = '/OPT:REF'
$flag_link_win_subsystem_console = '/SUBSYSTEM:CONSOLE'
$flag_link_win_subsystem_windows = '/SUBSYSTEM:WINDOWS'
$flag_link_win_machine_32 = '/MACHINE:X86'
$flag_link_win_machine_64 = '/MACHINE:X64'
$flag_link_win_debug = '/DEBUG'
$flag_link_win_pdb = '/PDB:'
$flag_link_win_path_output = '/OUT:'
$flag_link_no_incremental = '/INCREMENTAL:NO'
$flag_no_optimization = '-O0'
$flag_optimize_fast = '-O2'
$flag_optimize_size = '-O1'
$flag_optimize_intrinsics = '-Oi'
$flag_path_output = '-o'
$flag_preprocess_non_intergrated = '-no-integrated-cpp'
$flag_profiling_debug = '-fdebug-info-for-profiling'
$flag_set_stack_size = '-stack='
$flag_syntax_only = '-fsyntax-only'
$flag_target_arch = '-target'
$flag_time_trace = '-ftime-trace'
$flag_verbose = '-v'
$flag_wall = '-Wall'
$flag_warning = '-W'
$flag_warnings_as_errors = '-Werror'
$flag_nologo = '/nologo'
$path_build = join-path $path_root 'build'
if ( -not(test-path -Path $path_build) ) {
new-item -ItemType Directory -Path $path_build
}
push-location $path_build
# --- File Paths ---
$unit_name = "simple"
$unit_source = join-path $path_root "code\C\$unit_name.c"
$ir_unoptimized = join-path $path_build "$unit_name.ll"
$ir_optimized = join-path $path_build "$unit_name.opt.ll"
$object = join-path $path_build "$unit_name.obj"
$binary = join-path $path_build "$unit_name.exe"
$pdb = join-path $path_build "$unit_name.pdb"
$map = join-path $path_build "$unit_name.map"
# --- Stage 1: Compile C to LLVM IR ---
write-host "Stage 1: Compiling C to LLVM IR"
$compiler_args = @()
# $compiler_args += $flag_stop_after_gen
# $compiler_args += $flag_emit_llvm
$compiler_args += ($flag_define + 'BUILD_DEBUG=1')
$compiler_args += $flag_debug
# $compiler_args += $flag_debug_codeview
$compiler_args += $flag_wall
# $compiler_args += $flag_charset_utf8
$compiler_args += $flag_c23
$compiler_args += $flag_no_optimization
# $compiler_args += $flag_no_builtin_includes
$compiler_args += $flag_diagnostics_absolute_paths
$compiler_args += $flag_rtti_disabled
$compiler_args += $flag_exceptions_disabled
$compiler_args += ($flag_include + $path_root)
$compiler_args += $flag_compile
$compiler_args += $flag_path_output, $object
$compiler_args += $unit_source
$compiler_args | ForEach-Object { Write-Host $_ }
$stage1_time = Measure-Command { & $compiler $compiler_args }
write-host "Compilation took $($stage1_time.TotalMilliseconds)ms"
# write-host "IR generation took $($stage1_time.TotalMilliseconds)ms"
write-host
# --- Stage 2: Manually Optimize LLVM IR ---
if ($false) {
write-host "Manually Optimizing LLVM IR with 'opt'"
$optimization_passes = @(
'-sroa', # Scalar Replacement Of Aggregates
'-early-cse', # Early Common Subexpression Elimination
'-instcombine' # Instruction Combining
)
$optimizer_args = @(
$optimization_passes,
$ir_unoptimized,
$flag_path_output,
$ir_optimized
)
$optimizer_args | ForEach-Object { Write-Host $_ }
$stage2_time = Measure-Command { & $optimizer $optimizer_args }
write-host "Optimization took $($stage2_time.TotalMilliseconds)ms"
write-host
write-host "Compiling LLVM IR to Object File with 'clang'"
$ir_to_obj_args = @()
$ir_to_obj_args += $flag_compile
$ir_to_obj_args += $flag_path_output, $object
$ir_to_obj_args += $ir_optimized
$ir_to_obj_args | ForEach-Object { Write-Host $_ }
$stage3_time = Measure-Command { & $compiler $ir_to_obj_args }
write-host "Object file generation took $($stage3_time.TotalMilliseconds)ms"
write-host
}
if ($true) {
# write-host "Linking with lld-link"
$linker_args = @()
$linker_args += $flag_nologo
$linker_args += $flag_link_win_machine_64
$linker_args += $flag_link_no_incremental
$linker_args += ($flag_link_win_path_output + $binary)
$linker_args += "$flag_link_win_debug"
$linker_args += $flag_link_win_pdb + $pdb
$linker_args += $flag_link_mapfile + $map
$linker_args += $flag_link_win_subsystem_console
$linker_args += $object
# Diagnostic print for the args
$linker_args | ForEach-Object { Write-Host $_ }
$linking_time = Measure-Command { & $linker $linker_args }
write-host "Linking took $($linking_time.TotalMilliseconds)ms"
write-host
}
if ($false) {
write-host "Dumping Debug Info"
$rbin_out = '--out:'
$rbin_dump = '--dump'
$rdi = join-path $path_build "$unit_name.rdi"
$rdi_listing = join-path $path_build "$unit_name.rdi.list"
$nargs = @($pdb, ($rbin_out + $rdi))
& $radbin $nargs
$nargs = @($rbin_dump, $rdi)
$dump = & $radbin $nargs
$dump > $rdi_listing
}
Pop-Location