From 884deeda4dd2a342d82058b3943a7af7efbe7780 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Fri, 20 Feb 2026 18:57:42 -0500
Subject: [PATCH] notes

---
 Readme.md              | 25 +++++++++++++++++++++
 attempt_1/attempt_1.md | 50 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 Readme.md
 create mode 100644 attempt_1/attempt_1.md

diff --git a/Readme.md b/Readme.md
new file mode 100644
index 0000000..27dc685
--- /dev/null
+++ b/Readme.md
@@ -0,0 +1,25 @@
+# Bootslop: A Sourceless ColorForth Derivative
+
+This repository contains the curation materials and prototype implementation for building a zero-overhead, sourceless ColorForth-derivative for x86-64, specifically modeled after the architectures of Timothy Lottes and Onat Türkçüoğlu.
+
+## Project Goal
+
+The objective is to *learn* how to build this architecture from scratch, with the AI acting as a highly contextualized mentor. We are using a `-nostdlib` C environment on Win32 to construct a visual editor that is simultaneously the IDE, the compiler, and the OS for a tiny, high-performance computing environment.
+
+## Current State
+
+The `attempt_1/` directory contains a working C prototype that successfully implements the core architectural pillars:
+* A "sourceless" editor that manipulates a 32-bit token array (`Tape Drive`) and a parallel 64-bit annotation array.
+* A modal, interactive GUI built with raw Win32 GDI calls.
+* A handmade Just-In-Time (JIT) compiler that translates tokens into executable x86-64 machine code on every keypress.
+* An execution model based on Onat's 2-register stack (`RAX`/`RDX`) and a global memory tape.
+
+## Helper Scripts
+
+This repository contains several Python scripts used during the initial curation and content-gathering phase:
+
+* `process_visuals.py`: Downloads videos from YouTube, extracts frames based on transcript timestamps, performs OCR on the frames, and uses color analysis to generate semantically-tagged markdown logs of the visual content. It also crops out relevant code blocks and diagrams.
+* `fetch_blog.py`: Parses `TimothyLottesBlog.csv` and scrapes the HTML content of each blog post, converting it to clean markdown for local archival.
+* `fetch_notes.py`: Parses `FORTH_NOTES.csv`, filters out irrelevant or already-processed links, and scrapes the remaining pages into markdown files.
+* `estimate_context.py`: A utility to scan the `references/` directory and provide a rough estimate of the total token count to ensure it fits within the AI model's context window.
+* `ocr_interaction.py`: A small utility to perform OCR on single image files.
diff --git a/attempt_1/attempt_1.md b/attempt_1/attempt_1.md
new file mode 100644
index 0000000..27cef8e
--- /dev/null
+++ b/attempt_1/attempt_1.md
@@ -0,0 +1,50 @@
+# Technical Outline: Attempt 1
+
+## Overview
+`attempt_1` is a minimal, `-nostdlib` C program that serves as a proof-of-concept for the "Lottes/Onat" sourceless ColorForth paradigm. It integrates a visual editor, a live JIT compiler, and an execution environment into a single, cohesive Win32 application that runs without the C runtime or any third-party libraries.
+
+The application presents a visual grid of 32-bit tokens and allows the user to navigate and edit them directly. On every keypress, the token array is re-compiled into x86-64 machine code and executed, with the results (register states and global memory) displayed instantly in the HUD.
+
+## Core Concepts Implemented
+
+1. **Sourceless Token Array (`FArena` tape):**
+    * The "source code" is a contiguous block of `U4` (32-bit) integers allocated by `VirtualAlloc` and managed by the `FArena` from `duffle.h`.
+    * Each token is packed with a 4-bit "Color" tag and a 28-bit payload, adhering to the core design.
+
+2. **Annotation Layer (`FArena` anno):**
+    * A parallel `FArena` of `U8` (64-bit) integers stores an 8-character string for each corresponding token on the tape.
+    * The UI renderer prioritizes displaying this string, but the compiler only ever sees the 2-character ID packed into the 32-bit token, which implements Lottes' dictionary annotation strategy.
+
+3. **2-Register Stack & Global Memory:**
+    * The JIT compiler emits x86-64 that strictly adheres to Onat's `RAX`/`RDX` register stack.
+    * A `vm_globals` array is passed by pointer into the JIT'd code (via `RCX` on Win64), allowing instructions like `FETCH` and `STORE` to simulate the "tape drive" memory model.
+
+4. **Handmade x86-64 JIT Emitter:**
+    * A small set of `emit8`/`emit32` functions write raw x86-64 opcodes into a `VirtualAlloc` block marked as executable (`PAGE_EXECUTE_READWRITE`).
+    * This buffer is cast to a C function pointer and called directly, bypassing the need for an external assembler like NASM or a complex library like Zydis for this prototype stage.
+
+5. **2-Character Mapped Dictionary & Resolver:**
+    * The `ID2(a, b)` macro packs two characters into a 16-bit integer for use as a token's payload.
+    * The JIT compiler maintains a simple array-based dictionary. On a `: Define` token, it records the ID and the current memory offset. On a `~ Call` token, it looks up the ID and emits a relative 32-bit `CALL` instruction (`0xE8`).
+    * It also emits `JMP` instructions to skip over definition bodies during linear execution.
+
+6. **Modal Editor (Win32 GDI):**
+    * The UI is built with raw Win32 GDI calls defined in `duffle.h`.
+    * It features two modes: `Navigation` (gray cursor, arrow key movement) and `Edit` (orange cursor, text input).
+    * The editor handles token insertion, deletion (Vim-style backspace), tag cycling (Tab), and value editing, re-compiling and re-executing on every keystroke.
+
+## What's Missing (TODO)
+
+* **Saving/Loading:** The tape and annotation arenas are purely in-memory and are lost when the program closes.
+* **Expanded Instruction Set:** The JIT only knows a handful of primitives (`SWAP`, `MULT`, `ADD`, `FETCH`, `STORE`, `DEC`, `RET_IF_ZERO`, `PRINT`). It has no support for floating point, stack manipulation for C FFI, or more complex branches.
+* **Robust Dictionary:** The current dictionary is a simple array that is rebuilt on every compile. It doesn't handle collisions, scoping, or namespaces.
+* **Annotation Editing:** Typing into an annotation just appends characters. A proper text-editing cursor within the token is needed.
+
+## References Utilized
+* **Heavily Utilized:**
+    * **Onat's Talks:** The core architecture (2-register stack, global memory tape, JIT philosophy) is a direct implementation of the concepts from his VAMP/KYRA presentations.
+    * **Lottes' Twitter Notes:** The 2-character mapped dictionary, the conditional-return idea (his `ret-if-signed`, implemented here as the `RET_IF_ZERO` variant), and the annotation layer concepts were taken directly from his tweets.
+    * **User's `duffle.h` & `fortish-study`:** The C coding conventions (X-Macros, `FArena`, byte-width types, `ms_` prefixes) were adopted from these sources.
+* **Lightly Utilized:**
+    * **Lottes' Blog:** Provided the high-level "sourceless" philosophy and inspiration.
+    * **Grok Searches:** Served to validate our understanding and provide parallels (like Wasm's linear memory), but did not provide direct implementation details.