# In-Depth Analysis: Timothy Lottes's Development Blogs (2007 - 2016)

This document synthesizes the architectural paradigms, implementation details, and philosophical shifts documented in Timothy Lottes's blogs over a decade of building minimal, high-performance Forth-like operating environments. This knowledge is crucial for understanding the "Lottes/Onat Paradigm" and successfully implementing the `bootslop` project.

---

## 1. The Core Philosophy: "Vintage Programming"

Lottes advocates a return to a "stone-age" development methodology reminiscent of the Commodore 64 or HP48, but applied to modern x86-64 hardware and GPUs.

* **Rejection of Modern Complexity:** He explicitly rejects the "NO" of modern operating systems: compilers, linkers, debuggers, memory protection, paging, and bloated ABIs. He aims for an environment that says "YES" to direct hardware access.
* **The OS IS the Editor:** The system boots directly into a visual editor. This editor functions simultaneously as an IDE, assembler, disassembler, hex editor, debugger, and live-coding environment.
* **Instant Iteration:** The primary goal is a sub-5ms edit-compile-run loop. Debugging is done via instant visual feedback and "printf"-style memory peeking within the editor itself, which makes a separate debugger unnecessary.
* **Extreme Minimalism:** His compilers and core runtimes often fit within 1.5KB to 4KB (e.g., the 1536-byte bootloader/interpreter project).

## 2. The Evolution to "Source-Less" Programming

The most critical architectural shift in Lottes's work is the move from text-based source files (like his 2014 "A" language) to **Source-Less Programming** (2015).

### Why Source-Less?

Parsing text (lexical analysis, string hashing, AST generation) is slow and complex. In a source-less model, the "source code" *is* the binary executable image (or a direct structured representation of it).

### The Architecture of Source-Less (x68)

1. **32-Bit Granularity:** Every token in the system is exactly 32 bits (4 bytes).
   * To accommodate variable-length x86-64 instructions, Lottes invented "x68".
   * **Padding:** Standard x86 instructions are padded to exactly 32 bits (or multiples of 32 bits) using ignored segment override prefixes (like `2E` or `3E`) and multi-byte NOPs.
   * Example: A `RET` instruction (`C3`) becomes `C3 90 90 90`. Here the padding is three single-byte NOPs (`90`), which never execute because they follow the return.
   * *Why?* This keeps immediate values (like 32-bit addresses or constants) 32-bit aligned, drastically simplifying the editor and the assembler.

2. **The Token Types:** A 32-bit word in memory represents one of four things:
   * **DAT (Data):** Hexadecimal data or an immediate value.
   * **OP (Opcode):** A padded 32-bit x86-64 machine instruction.
   * **ABS (Absolute Address):** A direct 32-bit memory pointer.
   * **REL (Relative Address):** An `[RIP + imm32]` relative offset used for branching.

3. **The Annotation Overlay (The "Shadow" Memory):**
   * Because raw 32-bit hex values are unreadable to humans, the editor maintains a *parallel array* of 64-bit annotations, one for every 32-bit token.
   * **Annotation Layout (64-bit):**
     * `Tag` (4 to 8 bits): Defines how the editor should display and treat the 32-bit value (e.g., display as a signed int, an opcode name, a relative address, or a specific color).
     * `Label / Name`: A short string (e.g., 5 to 8 characters, often compressed using 6-bit or 7-bit encodings to fit) that acts as the human-readable name for the memory address.
   * *The Magic:* The editor reads the binary array and the annotation array. It uses the tags to dynamically format the screen. There is **zero string parsing** at runtime.

4. **Edit-Time Relinking (The Visual Linker):**
   * When you insert or delete a token in the editor, all tokens tagged as `ABS` or `REL` (addresses) are automatically recalculated and updated in real time. The editor *is* the linker.

5. **Live State vs. Edit State:**
   * Memory is split into two regions: the live running program and the edit buffer.
   * When edits are made and confirmed (e.g., hitting ESC or Enter), the editor atomically swaps or patches the live image with the edited image.

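
The padding rule from item 1 can be sketched in C. This is only an illustration under stated assumptions: the helper name `pad_to_word` is hypothetical, and it pads in front of the instruction with `3E` prefixes, whereas the `RET` example above pads after the opcode with `90`. Either placement yields a 4-byte token; which one Lottes uses for a given opcode is not stated here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: pad with the DS segment-override prefix,
 * which 64-bit mode ignores. Prefixes go in front of the instruction,
 * so they also precede any REX byte, as the encoding rules require. */
#define PAD_PREFIX 0x3E

/* Hypothetical helper: copy `len` instruction bytes into `out`, prepending
 * prefixes until the total is a multiple of 4. Returns the padded length,
 * or 0 if the input is empty or `out` (capacity `cap`) is too small. */
size_t pad_to_word(const uint8_t *insn, size_t len, uint8_t *out, size_t cap)
{
    size_t padded = (len + 3u) & ~(size_t)3u;  /* round up to 4 bytes */
    size_t pad = padded - len;

    if (len == 0 || padded > cap)
        return 0;
    memset(out, PAD_PREFIX, pad);   /* leading ignored prefixes */
    memcpy(out + pad, insn, len);   /* then the real instruction */
    return padded;
}
```

Applied to only the opcode byte of `mov eax, imm32` (`B8`), this gives `3E 3E 3E B8`, so the 32-bit immediate that follows starts on its own token boundary.
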
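
The 64-bit annotation from item 3 could be packed as follows. The exact bit layout is an assumption chosen so the numbers add up (an 8-bit tag plus eight 7-bit characters is exactly 64 bits); the function names `anno_pack`, `anno_tag`, and `anno_name` are hypothetical and do not come from Lottes's posts.

```c
#include <stdint.h>

/* Assumed layout: bits 63..56 hold the tag, followed by eight 7-bit
 * ASCII characters with the first character in the highest bits.
 * Unused character slots stay zero. */
uint64_t anno_pack(uint8_t tag, const char *name)
{
    uint64_t a = (uint64_t)tag << 56;
    int shift = 49;

    for (int i = 0; i < 8 && name[i] != '\0'; i++, shift -= 7)
        a |= (uint64_t)(name[i] & 0x7F) << shift;
    return a;
}

/* Extract the display tag. */
uint8_t anno_tag(uint64_t a)
{
    return (uint8_t)(a >> 56);
}

/* Decode the label into `out` (up to 8 characters plus a NUL). */
void anno_name(uint64_t a, char out[9])
{
    int n = 0;

    for (int shift = 49; shift >= 0; shift -= 7) {
        char c = (char)((a >> shift) & 0x7F);
        if (c == '\0')
            break;
        out[n++] = c;
    }
    out[n] = '\0';
}
```

Because the label lives in a fixed-width integer, the editor can draw it with shifts and masks alone, which is consistent with the "zero string parsing" claim above.
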
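
Edit-time relinking (item 4) can be modeled on a small token array. This is a minimal sketch, not Lottes's implementation: the `Image` type, the `TOK_*` tags, and `image_insert` are hypothetical, `ABS` is assumed to hold a byte offset from the image base, and `REL` is assumed to hold the distance from the byte just past the `REL` token to its target. All targets are assumed to lie inside the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOK_DAT, TOK_OP, TOK_ABS, TOK_REL };
enum { IMAGE_CAP = 64 };

typedef struct {
    uint32_t word[IMAGE_CAP];  /* the 32-bit tokens */
    uint8_t  tag[IMAGE_CAP];   /* parallel tag per token */
    size_t   count;
} Image;

/* Index of a token after one token has been inserted at `at`. */
static size_t shifted(size_t idx, size_t at)
{
    return idx >= at ? idx + 1 : idx;
}

/* Insert one token at index `at` and relink every ABS/REL token.
 * Returns 1 on success, 0 if the image is full or `at` is out of range. */
int image_insert(Image *im, size_t at, uint32_t word, uint8_t tag)
{
    if (im->count >= IMAGE_CAP || at > im->count)
        return 0;

    /* Relink using the old positions, before anything moves. */
    for (size_t i = 0; i < im->count; i++) {
        if (im->tag[i] == TOK_ABS) {
            size_t target = im->word[i] / 4;
            im->word[i] = (uint32_t)(shifted(target, at) * 4);
        } else if (im->tag[i] == TOK_REL) {
            int64_t rel = (int32_t)im->word[i];
            int64_t target = (int64_t)i + 1 + rel / 4;
            int64_t new_target = (int64_t)shifted((size_t)target, at);
            int64_t new_site = (int64_t)shifted(i, at) + 1;
            im->word[i] = (uint32_t)(int32_t)((new_target - new_site) * 4);
        }
    }

    /* Open a gap for the new token in both parallel arrays. */
    memmove(&im->word[at + 1], &im->word[at],
            (im->count - at) * sizeof im->word[0]);
    memmove(&im->tag[at + 1], &im->tag[at],
            (im->count - at) * sizeof im->tag[0]);
    im->word[at] = word;
    im->tag[at] = tag;
    im->count++;
    return 1;
}
```

The fixed 4-byte token size is what keeps this loop simple: every address moves by a whole number of tokens, so no instruction-length decoding is needed to repair references.
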
## 3. Language Paradigms: "Ear" and "Toe"

In his "Random Holiday 2015" post, Lottes solidifies the specific DSLs used within this source-less framework:

* **"Toe" (The Low-Level Assembler):** This is the subset of x86-64 with 32-bit padded opcodes. It relies heavily on macros to assemble machine code.
* **"Ear" (The High-Level Macro/Forth Language):** A zero-operand, Forth-like language embedded directly into the binary form.
  * Instead of a traditional Forth interpreter searching a dictionary at runtime, the dictionary is resolved at *edit-time* or *import-time*.
  * A token is just an index or a direct `CALL` instruction to the compiled word.

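
The difference between runtime lookup and edit-time resolution can be illustrated with a small dictionary sketch. The `Word` type and the functions below are hypothetical names for illustration; the only x86 fact relied on is that `CALL rel32` (`E8`) is 5 bytes long and its displacement is measured from the end of the instruction.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;         /* label shown in the editor */
    uint32_t    code_offset;  /* where the compiled word starts */
} Word;

/* Edit time: the only place a string comparison happens.
 * Searches newest-first, so a redefinition shadows older words,
 * as in Forth. Returns the word index, or -1 if not found. */
int dict_resolve(const Word *dict, int count, const char *name)
{
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(dict[i].name, name) == 0)
            return i;
    return -1;
}

/* After resolution the token carries only an index, so finding the
 * target is a plain array access. */
uint32_t token_target(const Word *dict, uint32_t word_index)
{
    return dict[word_index].code_offset;
}

/* Displacement for a `CALL rel32` whose E8 byte sits at `call_site`. */
int32_t call_rel32(uint32_t call_site, uint32_t target)
{
    return (int32_t)(target - (call_site + 5u));
}
```

In this model the name is consulted once, when the token is typed; everything that runs afterwards works from the stored index or displacement.
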
### The 2-Item Stack (Implicit Registers)

While early experiments used a traditional Forth data stack in memory, Lottes's later architectures (and Onat's derived work) map the stack directly to hardware registers to eliminate memory overhead:

* `RAX` = Top of Stack (TOS)
* `RBX` (or `RDX` in Onat's VAMP) = Second item on stack (NOS)
* **The xchg Trick:** Swapping the two items is often handled by `xchg rax, rbx` (or `rdx`), which encodes in 2 to 3 bytes and operates entirely on registers, so the swap touches no memory.

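
For reference, the 2-byte form of the swap can be emitted as shown below. The function name `emit_xchg_rax` is hypothetical; the encoding itself is the standard short form `REX.W` + (`90` + register number), so `xchg rax, rbx` is `48 93` and `xchg rax, rdx` is `48 92`. The 3-byte variant is the general `48 87 /r` form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hardware numbers of the low general-purpose registers used here. */
enum { REG_RAX = 0, REG_RCX = 1, REG_RDX = 2, REG_RBX = 3 };

/* Emit `xchg rax, <reg>` in its 2-byte short form.
 * Register 0 is rejected because 48 90 is a NOP, not a swap, and
 * r8..r15 are out of scope because they need a different REX byte.
 * Returns the number of bytes written, or 0 on invalid input. */
size_t emit_xchg_rax(uint8_t *buf, size_t cap, int reg)
{
    if (cap < 2 || reg < 1 || reg > 7)
        return 0;
    buf[0] = 0x48;                   /* REX.W: 64-bit operand size */
    buf[1] = (uint8_t)(0x90 + reg);  /* XCHG RAX, r64 */
    return 2;
}
```
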
## 4. Bootstrapping "The Chicken Without an Egg"

How do you build a system that requires a custom binary editor to write code, when you don't have the editor yet?

1. **C Prototype First:** Lottes explicitly states he builds the first iteration of the visual editor and virtual machine in C (using WinAPI or standard libraries). This allows rapid iteration of the visual layout and the memory arena logic.
2. **Hand-Assembling Bootstraps:** He uses standard assemblers (like NASM) or hand-written hexadecimal bytes, checked with a disassembler such as `objdump -d`, to work out the exact padded 32-bit opcode bytes.
3. **Embed Opcode Definitions:** The C prototype includes hardcoded arrays of bytes that represent the base opcodes (e.g., `MOV`, `ADD`, `CALL`, `RET`).
4. **Self-Hosting:** Once the C editor is stable and can generate binary code into an arena, he rewrites the editor *inside* the custom language within the C editor, eventually discarding the C host.

## 5. UI and Visual Design

The UI is not an afterthought; it is integral to the architecture.

* **The Grid:** The editor displays memory as a strict grid. Typical layout: 8 tokens per line (fitting half a 64-byte cache line).
* **Two Rows per Token:**
  * Top Row: The Annotation (Label/Name), color-coded.
  * Bottom Row: The 32-bit Data (Hex value, or a resolved symbol name if tagged as an address).
* **Colors (ColorForth Inspired):**
  * Colors dictate semantic meaning (e.g., Red = Define, Green = Compile, Yellow = Execute/Immediate, White/Grey = Comment/Format). This visual syntax replaces traditional language keywords.
* **Pixel-Perfect Fonts:** Lottes builds custom, fixed-width raster fonts (e.g., 6x11 or 8x8) to ensure perfect readability without anti-aliasing blurring, often treating specific characters (like `_`, `-`, `=`) as line-drawing characters to structure the UI.

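
The grid arithmetic described above is simple enough to sketch directly. The names `GridCell`, `grid_cell`, and `format_data_cell` are hypothetical; the sketch assumes 8 tokens per line and two text rows per line of tokens, and it only covers the plain hex case of the bottom row.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { TOKENS_PER_LINE = 8, ROWS_PER_TOKEN = 2 };

typedef struct {
    size_t anno_row;  /* text row showing the label */
    size_t data_row;  /* text row showing the 32-bit value */
    size_t col;       /* token column, 0..7 */
} GridCell;

/* Map a token index to its position in the two-row grid. */
GridCell grid_cell(size_t token_index)
{
    GridCell c;
    size_t line = token_index / TOKENS_PER_LINE;

    c.anno_row = line * ROWS_PER_TOKEN;
    c.data_row = c.anno_row + 1;
    c.col = token_index % TOKENS_PER_LINE;
    return c;
}

/* Format a token for the bottom row as 8 uppercase hex digits. */
void format_data_cell(uint32_t word, char out[9])
{
    snprintf(out, 9, "%08" PRIX32, word);
}
```

Note that a token is displayed as a 32-bit value, so the bytes `C3 90 90 90` in memory appear as `909090C3` on a little-endian machine.
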
## Summary for the `bootslop` Implementation

Our current `attempt_1/main.c` aligns with step 1 of the Lottes bootstrapping process (the C prototype):

1. We have a C-based WinAPI editor.
2. We have a token array (`tape_arena`) and an annotation array (`anno_arena`).
3. We have 32-bit tokens packed with a 4-bit semantic tag and a 28-bit payload.
4. We have a basic JIT emitter targeting a 2-register (`RAX`/`RDX`) virtual machine.

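
For readers new to the token format in item 3, a 4-bit tag and 28-bit payload can share one 32-bit word as sketched below. Placing the tag in the top four bits is an assumption made for illustration, and the helper names are hypothetical; the actual layout in `attempt_1/main.c` may differ.

```c
#include <stdint.h>

#define TAG_BITS     4u
#define PAYLOAD_BITS 28u
#define TAG_MASK     ((1u << TAG_BITS) - 1u)      /* 0xF */
#define PAYLOAD_MASK ((1u << PAYLOAD_BITS) - 1u)  /* 0x0FFFFFFF */

/* Combine a tag and payload; bits that do not fit are dropped. */
uint32_t token_pack(uint32_t tag, uint32_t payload)
{
    return ((tag & TAG_MASK) << PAYLOAD_BITS) | (payload & PAYLOAD_MASK);
}

/* Recover the 4-bit semantic tag. */
uint32_t token_tag(uint32_t token)
{
    return token >> PAYLOAD_BITS;
}

/* Recover the 28-bit payload. */
uint32_t token_payload(uint32_t token)
{
    return token & PAYLOAD_MASK;
}
```

One consequence of this format is that a payload cannot hold a full 32-bit address or immediate, which is part of why the x68 scheme keeps tags in the separate annotation array instead.
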
**Next Immediate Priorities based on Lottes's path:**

* Move away from string-based dictionary lookups at runtime to **Edit-Time Relinking** (resolving addresses when the token is typed or modified in the UI).
* Implement the **Padding Strategy** for the x86-64 JIT emitter to ensure all emitted logical blocks align cleanly, paving the way for 1:1 token-to-machine-code mapping.
* Refine the Editor Grid to show the two-row (Annotation / Data) layout clearly.