Compare commits

2 Commits

Author SHA1 Message Date
ed 884deeda4d notes 2026-02-20 18:57:42 -05:00
ed 9567a05697 progress 2026-02-20 16:58:07 -05:00
5 changed files with 193 additions and 84 deletions
+25
@@ -0,0 +1,25 @@
# Bootslop: A Sourceless ColorForth Derivative
This repository contains the curation materials and prototype implementation for building a zero-overhead, sourceless ColorForth-derivative for x86-64, specifically modeled after the architectures of Timothy Lottes and Onat Türkçüoğlu.
## Project Goal
The objective is to *learn* how to build this architecture from scratch, with the AI acting as a highly contextualized mentor. We are using a `-nostdlib` C environment on Win32 to construct a visual editor that is simultaneously the IDE, the compiler, and the OS for a tiny, high-performance computing environment.
## Current State
The `attempt_1/` directory contains a working C prototype that successfully implements the core architectural pillars:
* A "sourceless" editor that manipulates a 32-bit token array (`Tape Drive`) and a parallel 64-bit annotation array.
* A modal, interactive GUI built with raw Win32 GDI calls.
* A handmade Just-In-Time (JIT) compiler that translates tokens into executable x86-64 machine code on every keypress.
* An execution model based on Onat's 2-register stack (`RAX`/`RDX`) and a global memory tape.
## Helper Scripts
This repository contains several Python scripts used during the initial curation and content-gathering phase:
* `process_visuals.py`: Downloads videos from YouTube, extracts frames based on transcript timestamps, performs OCR on the frames, and uses color analysis to generate semantically-tagged markdown logs of the visual content. It also crops out relevant code blocks and diagrams.
* `fetch_blog.py`: Parses `TimothyLottesBlog.csv` and scrapes the HTML content of each blog post, converting it to clean markdown for local archival.
* `fetch_notes.py`: Parses `FORTH_NOTES.csv`, filters out irrelevant or already-processed links, and scrapes the remaining pages into markdown files.
* `estimate_context.py`: A utility to scan the `references/` directory and provide a rough estimate of the total token count to ensure it fits within the AI model's context window.
* `ocr_interaction.py`: A small utility to perform OCR on single image files.
+50
@@ -0,0 +1,50 @@
# Technical Outline: Attempt 1
## Overview
`attempt_1` is a minimal, `-nostdlib` C program that serves as a proof-of-concept for the "Lottes/Onat" sourceless ColorForth paradigm. It successfully integrates a visual editor, a live JIT compiler, and an execution environment into a single, cohesive Win32 application that runs without any external dependencies or the C runtime.
The application presents a visual grid of 32-bit tokens and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed instantly in the HUD.
## Core Concepts Implemented
1. **Sourceless Token Array (`FArena` tape):**
    * The "source code" is a contiguous block of `U4` (32-bit) integers allocated by `VirtualAlloc` and managed by the `FArena` from `duffle.h`.
    * Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
2. **Annotation Layer (`FArena` anno):**
    * A parallel `FArena` of `U8` (64-bit) integers stores an 8-character string for each corresponding token on the tape.
    * The UI renderer prioritizes displaying this string, but the compiler only ever sees the 2-character ID packed into the 32-bit token, successfully implementing Lottes' dictionary annotation strategy.
3. **2-Register Stack & Global Memory:**
    * The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
    * A `vm_globals` array is passed by pointer into the JIT'd code (via `RCX` on Win64), allowing instructions like `FETCH` and `STORE` to simulate the "tape drive" memory model.
4. **Handmade x86-64 JIT Emitter:**
    * A small set of `emit8`/`emit32` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked as executable (`PAGE_EXECUTE_READWRITE`).
    * This buffer is cast to a C function pointer and called directly, bypassing the need for an external assembler like NASM or a complex library like Zydis for this prototype stage.
5. **2-Character Mapped Dictionary & Resolver:**
    * The `ID2(a, b)` macro packs two characters into a 16-bit integer for use as a token's payload.
    * The JIT compiler maintains a simple array-based dictionary. On a `: Define` token, it records the ID and the current memory offset. On a `~ Call` token, it looks up the ID and emits a relative 32-bit `CALL` instruction (`0xE8`).
    * It also correctly emits `JMP` instructions to skip over definition bodies during linear execution.
6. **Modal Editor (Win32 GDI):**
    * The UI is built with raw Win32 GDI calls defined in `duffle.h`.
    * It features two modes: `Navigation` (gray cursor, arrow key movement) and `Edit` (orange cursor, text input).
    * The editor correctly handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, all while re-compiling and re-executing on every keystroke.
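The token and annotation packing from items 1, 2, and 5 can be sketched as follows. This is a minimal illustration under stated assumptions: `PACK_TOKEN`, `UNPACK_TAG`, `UNPACK_VAL`, and `ID2` are names the prototype uses, but their definitions are not shown here, so the bit positions below (tag in the top 4 bits, first character in the low byte) are guesses. `pack_anno` and `token_roundtrip_check` are hypothetical helpers added only for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t U4;
typedef uint64_t U8;

/* ASSUMPTION: 4-bit tag in bits 28..31, 28-bit payload in bits 0..27. */
#define PACK_TOKEN(tag, val) (((U4)(tag) << 28) | ((U4)(val) & 0x0FFFFFFFu))
#define UNPACK_TAG(t)        ((U4)(t) >> 28)
#define UNPACK_VAL(t)        ((U4)(t) & 0x0FFFFFFFu)

/* ASSUMPTION: first character in the low byte, second in the next byte. */
#define ID2(a, b) ((U4)(unsigned char)(a) | ((U4)(unsigned char)(b) << 8))

/* Hypothetical helper: pack up to 8 characters into one 64-bit annotation
   slot, first character in the low byte. Extra characters are ignored. */
static inline U8 pack_anno(const char *name) {
    U8 out = 0;
    for (int i = 0; i < 8 && name[i] != '\0'; i++) {
        out |= (U8)(unsigned char)name[i] << (8 * i);
    }
    return out;
}

/* Hypothetical self-check: a token survives a pack/unpack round trip. */
void token_roundtrip_check(void) {
    U4 tag   = 1;                 /* stand-in for a tag such as Call */
    U4 token = PACK_TOKEN(tag, ID2('s', 'q'));
    assert(UNPACK_TAG(token) == tag);
    assert(UNPACK_VAL(token) == ID2('s', 'q'));
    /* The annotation carries the readable name; the token carries the ID. */
    assert((pack_anno("square") & 0xFFFFu) == ID2('s', 'q'));
}
```

With this layout the compiler can work from the 32-bit token alone, while the renderer reads the longer name from the parallel 64-bit slot at the same index.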
## What's Missing (TODO)
* **Saving/Loading:** The tape and annotation arenas are purely in-memory and are lost when the program closes.
* **Expanded Instruction Set:** The JIT only knows a handful of primitives (`SWAP`, `MULT`, `ADD`, `FETCH`, `STORE`, `DEC`, `RET_IF_ZERO`, `PRINT`). It has no support for floating point, stack manipulation for C FFI, or more complex branches.
* **Robust Dictionary:** The current dictionary is a simple array that is rebuilt on every compile. It doesn't handle collisions, scoping, or namespaces.
* **Annotation Editing:** Typing into an annotation just appends characters. A proper text-editing cursor within the token is needed.
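The array-based dictionary and the relative `CALL` emission described under Core Concepts, along with the limitations listed in this TODO, can be sketched as follows. This is an illustration under assumptions, not the `attempt_1` source: `emit8` and `emit32` are names from the outline, but their signatures here are guessed, and `Jit`, `DictEntry`, `jit_define`, and `jit_call` are hypothetical. The sketch only writes bytes into a plain array and never executes them.

```c
#include <stdint.h>

typedef uint8_t  U1;
typedef uint32_t U4;

enum { CODE_CAP = 256, DICT_CAP = 32 };

/* Hypothetical dictionary entry: a 2-character ID and a code offset. */
typedef struct { U4 id; U4 offset; } DictEntry;

/* Hypothetical JIT state: code buffer plus a flat dictionary. */
typedef struct {
    U1        code[CODE_CAP];
    U4        used;
    DictEntry dict[DICT_CAP];
    U4        dict_count;
} Jit;

/* Append one byte; silently drops it when the buffer is full. */
static inline void emit8(Jit *j, U1 byte) {
    if (j->used < CODE_CAP) j->code[j->used++] = byte;
}

/* Append a 32-bit value in little-endian order, as x86-64 expects. */
static inline void emit32(Jit *j, U4 v) {
    for (int i = 0; i < 4; i++) emit8(j, (U1)(v >> (8 * i)));
}

/* On a Define token: record the ID and the current code offset. */
static inline void jit_define(Jit *j, U4 id) {
    if (j->dict_count < DICT_CAP) {
        j->dict[j->dict_count].id     = id;
        j->dict[j->dict_count].offset = j->used;
        j->dict_count++;
    }
}

/* On a Call token: linear lookup, then emit E8 rel32. The displacement is
   measured from the end of the 5-byte CALL instruction. The first matching
   entry wins, so duplicate IDs are not detected. Returns 1 if found. */
static inline int jit_call(Jit *j, U4 id) {
    for (U4 i = 0; i < j->dict_count; i++) {
        if (j->dict[i].id != id) continue;
        int32_t rel = (int32_t)j->dict[i].offset - (int32_t)(j->used + 5);
        emit8(j, 0xE8);
        emit32(j, (U4)rel);
        return 1;
    }
    return 0;
}
```

For a definition recorded at offset 0 and a call emitted at offset 1, the displacement is 0 - (1 + 5) = -6, so the emitted bytes are `E8 FA FF FF FF`. Because the lookup is a linear scan over a flat array, redefinitions, scoping, and namespaces would all need extra bookkeeping.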
## References Utilized
* **Heavily Utilized:**
    * **Onat's Talks:** The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
    * **Lottes' Twitter Notes:** The 2-character mapped dictionary, `ret-if-signed` (`RET_IF_ZERO`), and annotation layer concepts were taken directly from his tweets.
    * **User's `duffle.h` & `fortish-study`:** The C coding conventions (X-Macros, `FArena`, byte-width types, `ms_` prefixes) were adopted from these sources.
* **Lightly Utilized:**
    * **Lottes' Blog:** Provided the high-level "sourceless" philosophy and inspiration.
    * **Grok Searches:** Served to validate our understanding and provide parallels (like Wasm's linear memory), but did not provide direct implementation details.
+4
@@ -608,6 +608,10 @@ WinAPI void* ms_get_stock_object(S4 i) asm("GetStockObject");
 WinAPI void* ms_create_font_a(S4 cHeight, S4 cWidth, S4 cEscapement, S4 cOrientation, S4 cWeight, U4 bItalic, U4 bUnderline, U4 bStrikeOut, U4 iCharSet, U4 iOutPrecision, U4 iClipPrecision, U4 iQuality, U4 iPitchAndFamily, char const* pszFaceName) asm("CreateFontA");
 WinAPI void* ms_select_object(void* hdc, void* h) asm("SelectObject");
 WinAPI S4 ms_rectangle(void* hdc, S4 left, S4 top, S4 right, S4 bottom) asm("Rectangle");
+WinAPI S4 ms_set_bk_mode(void* hdc, S4 mode) asm("SetBkMode");
+WinAPI void* ms_create_solid_brush(U4 color) asm("CreateSolidBrush");
+WinAPI S4 ms_delete_object(void* ho) asm("DeleteObject");
+WinAPI S2 ms_get_async_key_state(S4 vKey) asm("GetAsyncKeyState");
 #define MS_MEM_COMMIT 0x00001000
 #define MS_MEM_RESERVE 0x00002000
+65 -35
@@ -6,14 +6,16 @@
 #include "duffle.amd64.win32.h"
 // --- Semantic Tags (Using X-Macros & Enum_) ---
+// Colors translated from Cozy-and-WIndy:
+// 0x00bbggrr Win32 format
 typedef Enum_(U4, STag) {
 #define Tag_Entries() \
-    X(Define, "Define", 0x003333FF, ":") /* RED */ \
-    X(Call, "Call", 0x0033FF33, "~") /* GREEN */ \
-    X(Data, "Data", 0x00FFFF33, "$") /* CYAN */ \
-    X(Imm, "Imm", 0x0033FFFF, "^") /* YELLOW */ \
-    X(Comment, "Comment", 0x00888888, ".") /* DIM */ \
-    X(Format, "Format", 0x00444444, " ") /* INVISIBLE/FORMAT */
+    X(Define, "Define", 0x0018AEFF, ":") /* Orange-ish (Language.Type) */ \
+    X(Call, "Call", 0x00D6A454, "~") /* Soft Blue (Language.Class) */ \
+    X(Data, "Data", 0x0094BAA1, "$") /* Muted Green (Language.Number) */ \
+    X(Imm, "Imm", 0x004AA4C2, "^") /* Sand/Yellow (Language.Keyword) */ \
+    X(Comment, "Comment", 0x00AAAAAA, ".") /* Grey (Language.Comment) */ \
+    X(Format, "Format", 0x003A2F3B, " ") /* Current Line BG for invisibles */
 #define X(n, s, c, p) tmpl(STag, n),
     Tag_Entries()
@@ -287,6 +289,8 @@ IA_ void compile_and_run_tape(void) {
 #define MS_VK_PRIOR 0x21
 #define MS_VK_NEXT 0x22
+#define MS_VK_SHIFT 0x10
 // --- Window Procedure ---
 S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
     U8 tape_count = tape_arena.used / sizeof(U4);
@@ -294,12 +298,14 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
     switch (msg) {
         case MS_WM_CHAR: {
+            if (editor_mode != MODE_EDIT) return 0;
             U4 t = tape_ptr[cursor_idx];
             U4 tag = UNPACK_TAG(t);
             U4 val = UNPACK_VAL(t);
             U1 c = (U1)wparam;
-            // Skip control characters in WM_CHAR (handled in KEYDOWN)
+            // Skip control characters and the 'E' that triggered the mode
             if (c < 32) break;
             if (tag == STag_Data) {
@@ -340,7 +346,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
             if (wparam == 0x45 && editor_mode == MODE_NAV) { // 'E'
                 editor_mode = MODE_EDIT;
                 ms_invalidate_rect(hwnd, NULL, true);
-                return 0;
+                return 0; // Consume the keypress so it doesn't trigger WM_CHAR
             }
             if (wparam == 0x1B && editor_mode == MODE_EDIT) { // ESC
                 editor_mode = MODE_NAV;
@@ -415,34 +421,48 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
                 U4 tag = (UNPACK_TAG(t) + 1) % STag_Count;
                 tape_ptr[cursor_idx] = PACK_TOKEN(tag, UNPACK_VAL(t));
             } else if (wparam == MS_VK_BACK) {
-                // Delete Token and move cursor left, shifting BOTH arenas
+                // Delete Token
+                // Shift: delete AT cursor | Regular: delete TO THE LEFT
+                U8 delete_idx = cursor_idx;
+                B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
+                if (!is_shift) {
+                    if (cursor_idx > 0) {
+                        delete_idx = cursor_idx - 1;
+                        cursor_idx--;
+                    } else return 0;
+                }
                 if (tape_count > 0) {
                     U8*r anno_ptr = C_(U8*r, anno_arena.start);
-                    for (U8 i = cursor_idx; i < tape_count - 1; i++) {
+                    for (U8 i = delete_idx; i < tape_count - 1; i++) {
                         tape_ptr[i] = tape_ptr[i+1];
                         anno_ptr[i] = anno_ptr[i+1];
                     }
                     tape_arena.used -= sizeof(U4);
                     anno_arena.used -= sizeof(U8);
-                    if (cursor_idx > 0) cursor_idx--;
                 }
             } else if (wparam == MS_VK_SPACE || wparam == MS_VK_RETURN) {
-                // Insert New Token (Pre-append at cursor), shifting BOTH arenas
+                // Insert New Token
+                // Shift: insert AFTER cursor | Regular: insert BEFORE cursor
+                B4 is_shift = (ms_get_async_key_state(MS_VK_SHIFT) & 0x8000) != 0;
+                U8 insert_idx = cursor_idx;
+                if (is_shift) insert_idx++;
                 if (tape_arena.used + sizeof(U4) <= tape_arena.capacity && anno_arena.used + sizeof(U8) <= anno_arena.capacity) {
                     U8*r anno_ptr = C_(U8*r, anno_arena.start);
-                    for (U8 i = tape_count; i > cursor_idx; i--) {
+                    for (U8 i = tape_count; i > insert_idx; i--) {
                         tape_ptr[i] = tape_ptr[i-1];
                         anno_ptr[i] = anno_ptr[i-1];
                     }
                     if (wparam == MS_VK_RETURN) {
-                        tape_ptr[cursor_idx] = PACK_TOKEN(STag_Format, 0xA); // Newline
-                        anno_ptr[cursor_idx] = 0;
-                        cursor_idx++; // Move past newline
+                        tape_ptr[insert_idx] = PACK_TOKEN(STag_Format, 0xA);
+                        anno_ptr[insert_idx] = 0;
                     } else {
-                        tape_ptr[cursor_idx] = PACK_TOKEN(STag_Comment, ID2(' ',' '));
-                        anno_ptr[cursor_idx] = 0;
-                        cursor_idx++; // Move past space
+                        tape_ptr[insert_idx] = PACK_TOKEN(STag_Comment, ID2(' ',' '));
+                        anno_ptr[insert_idx] = 0;
                     }
+                    if (is_shift) cursor_idx++;
                     tape_arena.used += sizeof(U4);
                     anno_arena.used += sizeof(U8);
                 }
@@ -463,7 +483,15 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
             void* hFont = ms_create_font_a(20, 0, 0, 0, 400, 0, 0, 0, 0, 0, 0, 0, 0, "Consolas");
             void* hOldFont = ms_select_object(hdc, hFont);
-            ms_set_bk_color(hdc, 0x001E1E1E);
+            ms_set_bk_mode(hdc, 1); // TRANSPARENT text background
+            void* hBgBrush = ms_create_solid_brush(0x00222222);
+            ms_select_object(hdc, hBgBrush);
+            ms_rectangle(hdc, -1, -1, 3000, 3000);
+            void* hBrushEdit = ms_create_solid_brush(0x008E563B);
+            void* hBrushNav = ms_create_solid_brush(0x00262F3B);
             S4 start_x = 40, start_y = 60, spacing_x = 110, spacing_y = 35;
             S4 x = start_x, y = start_y;
@@ -479,8 +507,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
                 S4 render_y = y - scroll_y_offset;
                 if (i == cursor_idx && render_y >= 30 && render_y < 500) {
-                    void* hBrush = ms_get_stock_object(editor_mode == MODE_EDIT ? 0 : 2); // WHITE_BRUSH : GRAY_BRUSH
-                    ms_select_object(hdc, hBrush);
+                    ms_select_object(hdc, editor_mode == MODE_EDIT ? hBrushEdit : hBrushNav);
                     ms_rectangle(hdc, x - 5, render_y - 2, x + 95, render_y + 22);
                 }
@@ -501,11 +528,11 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
                 ms_set_text_color(hdc, color);
                 if (editor_mode == MODE_EDIT && i == cursor_idx) {
-                    ms_set_text_color(hdc, 0x00000000); // Black text on white cursor
+                    // Better visibility in Edit Mode: White text on White-ish cursor
+                    ms_set_text_color(hdc, 0x001E1E1E);
                 }
-                char val_str[9];
-                if (tag == STag_Data) {
+                char val_str[9]; if (tag == STag_Data) {
                     u64_to_hex(val, val_str, 6);
                     val_str[6] = '\0';
                 } else {
@@ -541,10 +568,10 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
             }
             // Draw a solid background behind the HUD to cover scrolling text
-            void* hBlackBrush = ms_get_stock_object(4); // BLACK_BRUSH
-            ms_select_object(hdc, hBlackBrush);
-            ms_rectangle(hdc, 0, 500, 1100, 750);
-            ms_rectangle(hdc, 0, 0, 1100, 40);
+            void* hHudBrush = ms_create_solid_brush(0x00141E23);
+            ms_select_object(hdc, hHudBrush);
+            ms_rectangle(hdc, -1, 500, 3000, 3000);
+            ms_rectangle(hdc, -1, -1, 3000, 40);
             ms_set_text_color(hdc, 0x00AAAAAA);
             ms_text_out_a(hdc, 40, 10, "x86-64 Machine Code Emitter | 2-Reg Stack | [F5] Toggle Run Mode | [PgUp/PgDn] Scroll", 85);
@@ -559,7 +586,7 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
             char state_str[64] = "RAX: 00000000 | RDX: 00000000";
             u64_to_hex(vm_rax, state_str + 5, 8);
             u64_to_hex(vm_rdx, state_str + 21, 8);
-            ms_set_text_color(hdc, 0x0033FF33);
+            ms_set_text_color(hdc, 0x0094BAA1); // Number green
             ms_text_out_a(hdc, 40, 550, state_str, 29);
             // HUD: Display Current Token Meaning
@@ -580,27 +607,30 @@ S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) {
                 ms_text_out_a(hdc, 40, 580, semantics_str, 13 + name_len);
             }
-            ms_set_text_color(hdc, 0x00FFFFFF);
+            ms_set_text_color(hdc, 0x00C8C8C8);
             ms_text_out_a(hdc, 400, 520, "Global Memory (Contiguous Array):", 33);
             for (int i=0; i<4; i++) {
                 char glob_str[32] = "[0]: 00000000";
                 glob_str[1] = '0' + i;
                 u64_to_hex(vm_globals[i], glob_str + 5, 8);
-                ms_set_text_color(hdc, 0x00FFFF33);
+                ms_set_text_color(hdc, 0x00D6A454); // Soft blue
                 ms_text_out_a(hdc, 400, 550 + (i * 25), glob_str, 13);
             }
             // Print Log
-            ms_set_text_color(hdc, 0x00FFFFFF);
+            ms_set_text_color(hdc, 0x00C8C8C8);
             ms_text_out_a(hdc, 750, 520, "Print Log:", 10);
             for (int i=0; i<log_count && i<4; i++) {
                 char log_str[32] = "00000000";
                 u64_to_hex(log_buffer[i], log_str, 8);
-                ms_set_text_color(hdc, 0x00FF33FF);
+                ms_set_text_color(hdc, 0x0094BAA1);
                 ms_text_out_a(hdc, 750, 550 + (i * 25), log_str, 8);
             }
             ms_select_object(hdc, hOldFont);
+            ms_delete_object(hBgBrush);
+            ms_delete_object(hBrushEdit);
+            ms_delete_object(hBrushNav);
+            ms_delete_object(hHudBrush);
             ms_end_paint(hwnd, &ps);
             return 0;
         }
Binary file not shown. Size: 20 KiB