diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000..1c884f6 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,3 @@ +{ + "C_Cpp.default.compilerPath": "C:/Users/Ed/scoop/apps/llvm/current/bin/clang-cl.exe" +} \ No newline at end of file diff --git a/CONVENTIONS.md b/CONVENTIONS.md new file mode 100644 index 0000000..d1f61e8 --- /dev/null +++ b/CONVENTIONS.md @@ -0,0 +1,43 @@ +# C DSL Conventions (duffle) + +This document outlines the strict C style and architectural conventions expected in this workspace. It is based on the `duffle.amd64.win32.h` header, the user's `fortish-study` samples, and iterative feedback. + +## 1. Type Conventions (Byte-Width Fundamentals) +* **Never use standard C types** (e.g., `int`, `long`, `unsigned`, `short`, `float`, `double`) directly in application code. +* **Always use the byte-width typedefs:** + * Unsigned: `U1`, `U2`, `U4`, `U8` + * Signed: `S1`, `S2`, `S4`, `S8` + * Float: `F4`, `F8` + * Boolean: `B1`, `B2`, `B4`, `B8` (use `true`/`false` primitives) + * Strings/Chars: `UTF8` (for characters), `Str8` (for string slices) +* **WinAPI Structs:** Only use `MS_` prefixed fundamental types (e.g., `MS_LONG`, `MS_DWORD`) *inside* WinAPI struct definitions (`MS_WNDCLASSA`, etc.) to maintain FFI compatibility. Do not use them in general application logic. + +## 2. Declaration Wrappers & X-Macros +* **Structs and Enums:** Always use the macro wrappers for defining compound types to enforce clean namespacing. + * `typedef Struct_(Name) { ... };` + * `typedef Enum_(UnderlyingType, Name) { ... };` +* **X-Macros:** Use X-Macros to tightly couple Enums with their corresponding string representations or metadata. + ```c + #define My_Tag_Entries() \ + X(Define, "Define") \ + X(Call, "Call") + ``` + +## 3. Function & Symbol Naming +* **Case:** Strictly use `lower_snake_case` for all functions and variables. +* **Types:** Use `PascalCase` for type names (`FArena`, `SWord_Tag`).
+* **WinAPI Symbols:** When declaring foreign Win32 symbols, prefix the C function name with `ms_` (using `lower_snake_case`) and use the `asm("SymbolName")` attribute to link it to the actual DLL export. + * *Correct:* `WinAPI U2 ms_register_class(const MS_WNDCLASSA* lpWndClass) asm("RegisterClassA");` + * *Incorrect:* `WinAPI U2 RegisterClassA(...);` + +## 4. Memory Management +* **No Standard Library:** The environment is built with `-nostdlib` and `-ffreestanding`. Never include ``, ``, etc. +* **Arenas over Malloc:** Use `FArena` and its associated macros (`farena_push`, `farena_push_type`, `farena_reset`) for all dynamic memory allocations. Do not use raw pointers with manual arithmetic when an arena can handle it. +* **Memory Ops:** Use `mem_fill` and `mem_copy` instead of standard `memset`/`memcpy` within the application logic. (A minimal `memset`/`memcpy` shim is only provided to satisfy compiler intrinsic struct zeroing under `-nostdlib`). + +## 5. Modifiers +* `internal`: Static functions. +* `global`: Global state variables. +* `IA_`: Internal Always Inline. +* `I_`: Internal Inline. +* Pointers use `*r` (restrict) or `*v` (volatile) macros where applicable. \ No newline at end of file diff --git a/GEMINI.md b/GEMINI.md index eeabf07..fbc8cb7 100644 --- a/GEMINI.md +++ b/GEMINI.md @@ -6,6 +6,9 @@ DO NOT EVER make a shell script unless told to. DO NOT EVER make a readme or a f WHEN WRITING SCRIPTS USE A 120-160 character limit per line. I don't want to see scrunched code. +## Coding Conventions +Before writing any C code in this workspace, you MUST review the strict stylistic and architectural guidelines defined in [CONVENTIONS.md](./CONVENTIONS.md). These dictate the usage of byte-width types, X-Macros, WinAPI FFI mapping, and memory arenas. 
+ ## Necessary Background for Goal Watch or read the following: @@ -14,111 +17,31 @@ Watch or read the following: * [Metaprogramming VAMP in KYRA, a Next-gen Forth-like language](https://youtu.be/J9U_5tjdegY) * [Neokineogfx - 4th And Beyond](https://youtu.be/Awkdt30Ruvk) -There are transcripts for each of these vide2s in the [references](./references/) directory. +There are transcripts for each of these videos in the [references](./references/) directory, along with a comprehensive curation of Lottes's blogs, Onat's tweets, and architectural consolidations. ## Goal Learn ColorForth and be able to build a ColorForth derivative from scratch similar to Timothy Lottes and Onatt. -**Critical Clarification:** The goal is *not* for the AI to auto-generate a novelty solution or dump a finished codebase. The objective is for me (the user) to *learn* how to build this architecture from scratch. The AI must act as a highly contextualized mentor, providing guided nudges, architectural validation, and specific tactical assistance when requested. We are at the cusp of implementation. The AI should lean on the extensive curation in `./references/` to ensure its advice remains strictly aligned with the Lottes/Onat "sourceless, zero-overhead, register-only" paradigm, minimizing generic LLM hallucinations. +**Critical Clarification:** The goal is *not* for the AI to auto-generate a novelty solution or dump a finished codebase. The objective is for me (the user) to *learn* how to build this architecture from scratch. The AI must act as a highly contextualized mentor, providing guided nudges, architectural validation, and specific tactical assistance when requested. We are at the cusp of implementation. The AI should lean on the extensive curation in `./references/` to ensure its advice remains strictly aligned with the Lottes/Onat "sourceless, zero-overhead" paradigm, minimizing generic LLM hallucinations. 
-## Some exceprts from discussing things with Grok 4.20 beta +## Architectural Constraints (The "Lottes/Onat" Paradigm) -### User +Based on the curation in `./references/`, the resulting system MUST adhere to these non-standard rules: -```md -What are the prerequisites for me to understand this completely. I know code such as those conveyed by Ryan Fleury, Jonathan Blow, Casey Muratori, Mike Acton, John Carmack, Tim Sweeney, Eskil Steenberg, Sebastian Aaltonen, etc. - -I'm only aware of x86-64 and mips assembly by doing Pikuma, reading some information on assemblers. Looking deeply into linkers and doing my own build scripts. Looking into portable executable format on windows. But I'm not fluent in machine level code or its nuances. Like I don't have intuition on the trade-offs of Lottes's subset of x86 he uses nor for Onatt's and how easy it is to extend and support new codegen within their forth-like derivatives, etc. - -They always advertise it as simple and I think there is a true simplicity to it but there is a gulf. Similar to being able to use lua but maybe it would take a while to bootstrap your own lua jit I guess. The biggest hurdle being the frontend they need to actually deal with non-text based code modification or management. - -The other thing is I value being able to integrate or use external tooling like the rad debugger. And it will need most likely me to generate some source file and dynamically populate data it needs on the fly so I can open step-debug the program at runtime... I'm not sure how big of a hurdle that is. Do I even need linkers with this setup? -``` - -### Grok - -```md -**Lottes' x68 (32-bit granularity x86-64 subset)** and **Onat's KYRA codegen style** are both deliberate restrictions on the full x86-64 ISA. They are chosen for **editor simplicity** and **zero-overhead execution**, not raw peak FLOPS. 
Both deliver **net performance gains** in their target domain (tiny self-contained interactive systems, shaders, live-reloaded tools) because the code stays <100 KB and fits entirely in L1/L2 cache. There is **no notable runtime degradation** for real work — often the opposite. - -Here is the exact trade-off breakdown. - -### Lottes' x68 — 32-bit aligned opcodes only -Every instruction (and immediate) is forced to 32-bit (4-byte) boundaries by padding. - -**How padding works (exact from his talk):** -- Ignored prefixes (e.g. 0x3E DS segment) -- Multi-byte NOPs (0x0F 0x1F ...) -- Extra REX prefixes when not needed -- Example: normal RET (C3) → C3 90 90 90 (padded to 4 bytes) - -**Trade-offs:** - -| Aspect | Benefit (why he does it) | Cost / Potential downside | Real perf impact (his domain) | -|---------------------|-------------------------------------------------------|----------------------------------------------------|-------------------------------| -| **Editor / Sourceless** | Code = pure 32-bit word array. Hex editor, find/replace, auto-relink jumps all trivial. No variable-length decode. | None in practice | Huge win (instant edit) | -| **Jumps / Calls** | Relative offsets always 32-bit and aligned to end of opcode → editor can patch without decoding | Slightly larger code | Neutral (tiny code) | -| **Inline data** | Immediates always at 32-bit boundaries → easy visual data in hex view | — | Win | -| **Code size** | — | 20–50% larger due to padding (estimated) | Negligible (everything in L1) | -| **Decode / uop cache** | — | Slightly more bytes fetched, possible extra uops | Negligible on Zen/Intel wide decoders | -| **ISA coverage** | Still full x64 power for the ops he needs | Can't use arbitrary unpadable instructions easily | None (he only needs tiny subset) | - -**Perf verdict from Lottes:** -He explicitly says the code is so small it all fits in cache, so the padding cost is irrelevant. 
The **gain** is massive reduction in total system complexity (no assembler, no linker, no text parser). Sourceless mode = true zero-overhead runtime. He views this as **better** than full ISA for interactive work. - -### Onat's KYRA style (not a strict opcode subset, but a coding / emission convention) -Full x86-64 base + SSE (without VEX prefixes) + all addressing modes (including RIP-relative). The restriction is in **how** he uses it. - -**Core rules (exact from talk + his site):** -- Only **two temporary registers** for the "stack": RAX and RDX + 1 hidden bit ("which is top?"). -- Before every call/definition: `xchg rax, rdx` (1 byte, 48 87 C2). -- Everything else = global memory accessed via **one base register** (R15 points to a contiguous struct of all variables). -- Every high-level word emits **1–3 real instructions** max (often 1). -- SPIR-V shaders = direct binary emit (no glslang). - -**Trade-offs:** - -| Aspect | Benefit (why Onat does it) | Cost / downside | Real perf impact | -|-------------------------|-----------------------------------------------------------------|------------------------------------------------------|------------------| -| **Runtime stack** | Zero stack ops, zero frame setup, no juggling | Must manage globals manually | **Massive gain** (60 ms text/Forth vs 4 ms) | -| **Register usage** | Only 2 temps + base → trivial templates, 1800-byte compiler | Occasional spill to global (he says 3 ns, ignore) | Gain (fewer instructions) | -| **Globals** | One contiguous block via R15 → perfect cache locality | Programmer tracks lifetimes (single-threaded) | Gain (better than stack) | -| **SSE / AVX** | SSE without VEX (2/3-byte encoding) | No VEX/AVX2+ in templates (easy to add later) | Minor loss if you need AVX, but he doesn't | -| **Dictionary / lookup** | Pure 32-bit index (next version 32-bit index + separate tag byte) | You curate the index space by hand | **Huge gain** (no hash, 1–2 ms full compile) | -| **Code size / density** | 
Extremely dense (single instructions) | None | Win | - -**Perf numbers Onat actually shows (VAMP program):** -- Full recompile (editor + Vulkan + FFmpeg + shaders): 4–8 ms on normal hardware. -- Shader compile: <5 ms (mostly driver), ~100× faster than glslang. -- Instruction generation: 180 million instructions/second/core on Ryzen 5900HX. -- Compared to C: "thousand times more time to compile in C" for the same program. - -He explicitly calls out the classic Forth stack as the bottleneck: "the difference between using data stack versus using registers only". - -### Overall: Do they lose or gain perf? -**Net gain — often by 5-100× in the metrics that matter to them.** - -**Gains:** -- Iteration speed (4–8 ms full rebuild vs seconds in C). -- Runtime speed (fewer instructions, perfect locality, no ABI/calling-convention tax). -- Cache behavior (tiny code + contiguous globals = L1 resident forever). -- Debuggability / visibility (live dataflow viz of RAX/RDX + globals). - -**Losses (real but tiny):** -- Slightly larger code from Lottes padding. -- Occasional extra global access instead of a 3rd/4th register (Onat). -- Can't blindly copy-paste AVX-heavy code without adjusting templates. - -For programs that fit in a few cache lines (exactly their use-case: editors, shaders, tools, games), the simplicity wins so hard that the "losses" disappear. This is why both say "simple" and mean it — they removed everything that costs more than it gives. - -If you build the 60-line emitter I described earlier, you will feel the gain immediately: change one token → re-emit 100 bytes → run again in <1 ms. That is the real performance story. - -Want the exact 32-bit x68 padding helper + Onat's xchg + global load/store templates as copy-paste C code next? Just say the word and I'll drop the blocks. -``` +1. **Sourceless Environment (x68):** No string parsing at runtime. Code exists purely as an array of 32-bit tokens. 
+ - **Token Layout:** 28 bits of payload (compressed name/index/value) + 4 bits for the semantic "Color" Tag. +2. **Visual Editor as the OS:** The editor directly maps to the token array. It does not read text files. It uses the 4-bit tags to colorize the tokens live. +3. **Register-Only Stack:** The traditional Forth data stack in memory is completely eliminated. + - We strictly use a **2-item register stack** (`RAX` and `RDX`). + - Stack rotation is handled via the `xchg rax, rdx` instruction. +4. **Preemptive Scatter ("Tape Drive"):** Function arguments are not pushed to a stack before a call. They are "scattered" into pre-allocated, contiguous global memory slots during compilation/initialization. The function simply reads from these known offsets, eliminating argument gathering overhead. +5. **No `if/then` branches:** Rely on hardware-level flags like conditional returns (`ret-if-signed`) combined with factored calls to avoid writing complex AST parsers. +6. **No Dependencies:** C implementation must be minimal (`-nostdlib`), ideally running directly against OS APIs (e.g., WinAPI `VirtualAlloc`, `ExitProcess`, `GDI32` for rendering). ## Visual Context Synthesis & Color Semantics -Based on the extracted frame OCR data from the references (Lottes' and Onat's presentations), here is the persistent mapping of ColorForth visual semantics to language logic for this project: +Based on the extracted frame OCR data from the references: - **Red (``):** Defines a new word or symbol in the dictionary. This is the entry point for compilation. - **Green (``):** Compiles a word into the current definition. @@ -126,8 +49,3 @@ Based on the extracted frame OCR data from the references (Lottes' and Onat's pr - **Cyan/Blue (`` / ``):** Used for variables, memory addresses, or formatting layout (not executable instruction logic). - **White/Dim (`` / ``):** Comments, annotations, and UI elements. - **Magenta (``):** Typically used for pointers or state modifiers. 
- -**Architectural Notes Extracted:** -1. **Sourceless Environment:** The underlying system doesn't deal with parsing strings. It deals with 32-bit tagged tokens (as noted in Lottes' 32-bit x68 alignment). -2. **Visual Editor:** The editor is intrinsically tied to the compiler. It reads the same memory structure. It uses these color properties to colorize the tokens live. -3. **Hardware Locality:** We see a major focus on removing the stack in favor of register rotation (`RAX`, `RDX`) as per Onat's methodology. diff --git a/attempt_1/duffle.amd64.win32.h b/attempt_1/duffle.amd64.win32.h index 25820c5..7416cfb 100644 --- a/attempt_1/duffle.amd64.win32.h +++ b/attempt_1/duffle.amd64.win32.h @@ -159,7 +159,7 @@ IA_ U8 atm_swap_u8(U8*r addr, U8 value){asm volatile("lock xchgq %0,%1":"=r"(val #pragma endregion Thread Coherence #pragma region Debug -WinAPI void process_exit(U4 status) asm("exit"); +WinAPI void process_exit(U4 status) asm("ExitProcess"); #define debug_trap() __builtin_debugtrap() #if BUILD_DEBUG IA_ void assert(U8 cond) { if(cond){return;} else{debug_trap(); process_exit(1);} } @@ -566,3 +566,61 @@ IA_ void str8gen_append_fmt(Str8Gen*r gen, Str8 fmt, KTL_Str8 tbl) { } #define str8gen_append_str8_(gen, s) str8gen_append_str8(gen, str8(s)) #pragma endregion Text Ops + +#pragma region OS_GDI_And_Minimal +// --- WinAPI Minimal Definitions --- +typedef struct MS_WNDCLASSA { + U4 style; + S8 (*lpfnWndProc)(void*, U4, U8, S8); + S4 cbClsExtra; + S4 cbWndExtra; + void* hInstance; + void* hIcon; + void* hCursor; + void* hbrBackground; + char const* lpszMenuName; + char const* lpszClassName; +} MS_WNDCLASSA; + +typedef struct MS_POINT { S4 x, y; } MS_POINT; +typedef struct MS_MSG { void* hwnd; U4 message; U8 wParam; S8 lParam; U4 time; MS_POINT pt; } MS_MSG; +typedef struct MS_RECT { S4 left, top, right, bottom; } MS_RECT; +typedef struct MS_PAINTSTRUCT { void* hdc; S4 fErase; MS_RECT rcPaint; S4 fRestore; S4 fIncUpdate; U1 rgbReserved[32]; } MS_PAINTSTRUCT; + +// 
Win32 API declarations +WinAPI void* ms_virtual_alloc(void* lpAddress, U8 dwSize, U4 flAllocationType, U4 flProtect) asm("VirtualAlloc"); +WinAPI void ms_exit_process(U4 uExitCode) asm("ExitProcess"); +WinAPI U2 ms_register_class_a(const MS_WNDCLASSA* lpWndClass) asm("RegisterClassA"); +WinAPI void* ms_create_window_ex_a(U4 dwExStyle, char const* lpClassName, char const* lpWindowName, U4 dwStyle, S4 X, S4 Y, S4 nWidth, S4 nHeight, void* hWndParent, void* hMenu, void* hInstance, void* lpParam) asm("CreateWindowExA"); +WinAPI S4 ms_show_window(void* hWnd, S4 nCmdShow) asm("ShowWindow"); +WinAPI S4 ms_get_message_a(MS_MSG* lpMsg, void* hWnd, U4 wMsgFilterMin, U4 wMsgFilterMax) asm("GetMessageA"); +WinAPI S4 ms_translate_message(const MS_MSG* lpMsg) asm("TranslateMessage"); +WinAPI S8 ms_dispatch_message_a(const MS_MSG* lpMsg) asm("DispatchMessageA"); +WinAPI S8 ms_def_window_proc_a(void* hWnd, U4 Msg, U8 wParam, S8 lParam) asm("DefWindowProcA"); +WinAPI void ms_post_quit_message(S4 nExitCode) asm("PostQuitMessage"); +WinAPI S4 ms_invalidate_rect(void* hWnd, const MS_RECT* lpRect, S4 bErase) asm("InvalidateRect"); +WinAPI void* ms_begin_paint(void* hWnd, MS_PAINTSTRUCT* lpPaint) asm("BeginPaint"); +WinAPI S4 ms_end_paint(void* hWnd, const MS_PAINTSTRUCT* lpPaint) asm("EndPaint"); +WinAPI U4 ms_set_text_color(void* hdc, U4 color) asm("SetTextColor"); +WinAPI U4 ms_set_bk_color(void* hdc, U4 color) asm("SetBkColor"); +WinAPI S4 ms_text_out_a(void* hdc, S4 x, S4 y, char const* lpString, S4 c) asm("TextOutA"); +WinAPI void* ms_get_stock_object(S4 i) asm("GetStockObject"); +WinAPI void* ms_create_font_a(S4 cHeight, S4 cWidth, S4 cEscapement, S4 cOrientation, S4 cWeight, U4 bItalic, U4 bUnderline, U4 bStrikeOut, U4 iCharSet, U4 iOutPrecision, U4 iClipPrecision, U4 iQuality, U4 iPitchAndFamily, char const* pszFaceName) asm("CreateFontA"); +WinAPI void* ms_select_object(void* hdc, void* h) asm("SelectObject"); +WinAPI S4 ms_rectangle(void* hdc, S4 left, S4 top, S4 right, S4 
bottom) asm("Rectangle"); + +#define MS_MEM_COMMIT 0x00001000 +#define MS_MEM_RESERVE 0x00002000 +#define MS_PAGE_READWRITE 0x04 +#define MS_WM_DESTROY 0x0002 +#define MS_WM_PAINT 0x000F +#define MS_WM_KEYDOWN 0x0100 +#define MS_WS_OVERLAPPEDWINDOW 0x00CF0000 +#define MS_WS_VISIBLE 0x10000000 +#define MS_VK_LEFT 0x25 +#define MS_VK_UP 0x26 +#define MS_VK_RIGHT 0x27 +#define MS_VK_DOWN 0x28 + +#define MS_PAGE_EXECUTE_READWRITE 0x40 +#pragma endregion OS_GDI_And_Minimal diff --git a/attempt_1/main.c b/attempt_1/main.c index dafe721..1562208 100644 --- a/attempt_1/main.c +++ b/attempt_1/main.c @@ -5,332 +5,301 @@ #include "duffle.amd64.win32.h" -// --- WinAPI Minimal Definitions --- -typedef int MS_BOOL; -typedef unsigned long MS_DWORD; -typedef void* MS_HANDLE; -typedef MS_HANDLE MS_HWND; -typedef MS_HANDLE MS_HMENU; -typedef MS_HANDLE MS_HINSTANCE; -typedef MS_HANDLE MS_HICON; -typedef MS_HANDLE MS_HCURSOR; -typedef MS_HANDLE MS_HBRUSH; -typedef MS_HANDLE MS_HDC; -typedef MS_HANDLE MS_HFONT; -typedef long MS_LONG; -typedef char const* MS_LPCSTR; -typedef void* MS_LPVOID; -typedef S8 MS_LRESULT; -typedef U8 MS_WPARAM; -typedef S8 MS_LPARAM; -typedef U4 MS_UINT; +// --- Semantic Tags (Using X-Macros & Enum_) --- +typedef Enum_(U4, STag) { +#define Tag_Entries() \ + X(Define, "Define", 0x003333FF, ":") /* RED */ \ + X(Call, "Call", 0x0033FF33, "~") /* GREEN */ \ + X(Data, "Data", 0x00FFFF33, "$") /* CYAN */ \ + X(Imm, "Imm", 0x0033FFFF, "^") /* YELLOW */ \ + X(Comment, "Comment", 0x00888888, ".") /* DIM */ -typedef struct MS_WNDCLASSA { - MS_UINT style; - MS_LRESULT (*lpfnWndProc)(MS_HWND, MS_UINT, MS_WPARAM, MS_LPARAM); - int cbClsExtra; - int cbWndExtra; - MS_HINSTANCE hInstance; - MS_HICON hIcon; - MS_HCURSOR hCursor; - MS_HBRUSH hbrBackground; - MS_LPCSTR lpszMenuName; - MS_LPCSTR lpszClassName; -} MS_WNDCLASSA; +#define X(n, s, c, p) tmpl(STag, n), + Tag_Entries() +#undef X + STag_Count, +}; -typedef struct MS_POINT { - MS_LONG x, y; -} MS_POINT; +// Helper array 
to fetch Hex colors for UI rendering based on STag +global U4 tag_colors[] = { +#define X(n, s, c, p) c, + Tag_Entries() +#undef X +}; -typedef struct MS_MSG { - MS_HWND hwnd; - MS_UINT message; - MS_WPARAM wParam; - MS_LPARAM lParam; - MS_DWORD time; - MS_POINT pt; -} MS_MSG; - -typedef struct MS_RECT { - MS_LONG left, top, right, bottom; -} MS_RECT; - -typedef struct MS_PAINTSTRUCT { - MS_HDC hdc; - MS_BOOL fErase; - MS_RECT rcPaint; - MS_BOOL fRestore; - MS_BOOL fIncUpdate; - U1 rgbReserved[32]; -} MS_PAINTSTRUCT; - -// Win32 API declarations -WinAPI MS_LPVOID VirtualAlloc(MS_LPVOID lpAddress, U8 dwSize, MS_DWORD flAllocationType, MS_DWORD flProtect); -WinAPI void ExitProcess(MS_UINT uExitCode); - -WinAPI U2 RegisterClassA(const MS_WNDCLASSA* lpWndClass); -WinAPI MS_HWND CreateWindowExA(MS_DWORD dwExStyle, MS_LPCSTR lpClassName, MS_LPCSTR lpWindowName, MS_DWORD dwStyle, int X, int Y, int nWidth, int nHeight, MS_HWND hWndParent, MS_HMENU hMenu, MS_HINSTANCE hInstance, MS_LPVOID lpParam); -WinAPI MS_BOOL ShowWindow(MS_HWND hWnd, int nCmdShow); -WinAPI MS_BOOL GetMessageA(MS_MSG* lpMsg, MS_HWND hWnd, MS_UINT wMsgFilterMin, MS_UINT wMsgFilterMax); -WinAPI MS_BOOL TranslateMessage(const MS_MSG* lpMsg); -WinAPI MS_LRESULT DispatchMessageA(const MS_MSG* lpMsg); -WinAPI MS_LRESULT DefWindowProcA(MS_HWND hWnd, MS_UINT Msg, MS_WPARAM wParam, MS_LPARAM lParam); -WinAPI void PostQuitMessage(int nExitCode); -WinAPI MS_BOOL InvalidateRect(MS_HWND hWnd, const MS_RECT* lpRect, MS_BOOL bErase); - -WinAPI MS_HDC BeginPaint(MS_HWND hWnd, MS_PAINTSTRUCT* lpPaint); -WinAPI MS_BOOL EndPaint(MS_HWND hWnd, const MS_PAINTSTRUCT* lpPaint); -WinAPI MS_DWORD SetTextColor(MS_HDC hdc, MS_DWORD color); -WinAPI MS_DWORD SetBkColor(MS_HDC hdc, MS_DWORD color); -WinAPI MS_BOOL TextOutA(MS_HDC hdc, int x, int y, MS_LPCSTR lpString, int c); -WinAPI MS_HANDLE GetStockObject(int i); -WinAPI MS_HFONT CreateFontA(int cHeight, int cWidth, int cEscapement, int cOrientation, int cWeight, MS_DWORD 
bItalic, MS_DWORD bUnderline, MS_DWORD bStrikeOut, MS_DWORD iCharSet, MS_DWORD iOutPrecision, MS_DWORD iClipPrecision, MS_DWORD iQuality, MS_DWORD iPitchAndFamily, MS_LPCSTR pszFaceName); -WinAPI MS_HANDLE SelectObject(MS_HDC hdc, MS_HANDLE h); - -#define MS_MEM_COMMIT 0x00001000 -#define MS_MEM_RESERVE 0x00002000 -#define MS_PAGE_READWRITE 0x04 - -#define MS_WM_DESTROY 0x0002 -#define MS_WM_PAINT 0x000F -#define MS_WM_KEYDOWN 0x0100 -#define MS_WS_OVERLAPPEDWINDOW 0x00CF0000 -#define MS_WS_VISIBLE 0x10000000 - -#define MS_VK_PRIOR 0x21 // Page Up -#define MS_VK_NEXT 0x22 // Page Down - -// --- Semantic Tags (The "Colors" of ColorForth) --- -#define TAG_DEFINE 0x0 // RED: New word definition -#define TAG_CALL 0x1 // GREEN: Call/Compile word -#define TAG_DATA 0x2 // CYAN: Variable or Literal Address -#define TAG_IMM 0x3 // YELLOW: Immediate value/Execute -#define TAG_COMMENT 0x4 // WHITE: Ignored by compiler +// Helper array to fetch the text prefix based on STag +global const char* tag_prefixes[] = { +#define X(n, s, c, p) p, + Tag_Entries() +#undef X +}; // Token Packing: 28 bits payload | 4 bits tag #define PACK_TOKEN(tag, val) (((U4)(tag) << 28) | ((U4)(val) & 0x0FFFFFFF)) #define UNPACK_TAG(token) (((token) >> 28) & 0x0F) #define UNPACK_VAL(token) ((token) & 0x0FFFFFFF) -// The Tape Drive (Memory Arena) -global U4* tape; -global U8 tape_pos = 0; -global U8 view_block = 0; // Current block being viewed -#define TOKENS_PER_BLOCK 256 +#define TOKENS_PER_ROW 8 -// Virtual Machine State (Onat's 2-item stack) -global U8 vm_rax = 0; -global U8 vm_rdx = 0; +// The Tape Drive (Using FArena from duffle) +global FArena tape_arena; +global U8 cursor_idx = 0; -internal void scatter(U4 token) { - tape[tape_pos++] = token; -} +// Executable Code Arena (The JIT) +global FArena code_arena; -// Minimal u64 to hex string helper -internal void u64_to_hex(U8 val, char* buf, int chars) { - static const char hex_chars[] = "0123456789ABCDEF"; - for(S1 i = chars - 1; i >= 0; --i) { - 
buf[i] = hex_chars[val & 0xF]; - val >>= 4; - } -} +// VM State: 2-Reg Stack + Global Memory +global U8 vm_rax = 0; // Top +global U8 vm_rdx = 0; // Next +global U8 vm_globals[16] = {0}; -// Provide memset for the compiler's implicit struct zeroing (-nostdlib) +// Provide memset/memcpy for the compiler's implicit struct zeroing (-nostdlib) void* memset(void* dest, int c, U8 count) { - U1* bytes = (U1*)dest; - while (count--) { - *bytes++ = (U1)c; - } + mem_fill(u8_(dest), c, count); return dest; } -// --- The Tiny Interpreter --- -internal void vm_execute(U4 val) { - // Very rudimentary simulated execution. - // 0x1 = DUP - // 0x2 = MULT - // Normally this would look up the address in the dictionary. - if (val == 0x1) { - // DUP (push rax into rdx, simulating a 2-reg stack) - vm_rdx = vm_rax; - } else if (val == 0x2) { - // MULT (rax = rax * rdx) - vm_rax = vm_rax * vm_rdx; - } else if (val == 0x51415245) { - // Call "SQUARE". For this tiny mock, we just execute its body directly: DUP * - vm_execute(0x1); // DUP - vm_execute(0x2); // MULT - } +void* memcpy(void* dest, const void* src, U8 count) { + mem_copy(u8_(dest), u8_(src), count); + return dest; } -internal void vm_eval_tape() { - for (U8 i = 0; i < tape_pos; ++i) { - U4 t = tape[i]; - U4 tag = UNPACK_TAG(t); - U4 val = UNPACK_VAL(t); +IA_ void scatter(U4 token) { + U4*r ptr = farena_push_type(&tape_arena, U4); + if (ptr) { ptr[0] = token; } +} - if (tag == TAG_DATA) { - // Push data onto the 2-register stack (simulate the xchg setup) - vm_rdx = vm_rax; - vm_rax = val; - } else if (tag == TAG_IMM) { - // Execute immediately - vm_execute(val); +IA_ void u64_to_hex(U8 val, char* buf, S4 chars) { + static const char hex_chars[] = "0123456789ABCDEF"; + for(S1 i = chars - 1; i >= 0; --i) { buf[i] = hex_chars[val & 0xF]; val >>= 4; } +} + +// --- Minimal x86-64 Emitter --- +IA_ void emit8(U1 b) { U1*r p = farena_push_type(&code_arena, U1); if(p) p[0] = b; } +IA_ void emit32(U4 val){ U4*r p = 
farena_push_type(&code_arena, U4); if(p) p[0] = val; } + +IA_ void compile_word(U4 tag, U4 val) { + if (tag == tmpl(STag, Data)) { + // mov rdx, rax + emit8(0x48); emit8(0x89); emit8(0xC2); + // mov rax, imm32 + emit8(0x48); emit8(0xC7); emit8(0xC0); emit32(val); + } else if (tag == tmpl(STag, Imm) || tag == tmpl(STag, Call)) { + if (val == 0x1) { // SWAP: xchg rax, rdx + emit8(0x48); emit8(0x87); emit8(0xC2); + } else if (val == 0x2) { // MULT: imul rax, rdx + emit8(0x48); emit8(0x0F); emit8(0xAF); emit8(0xC2); + } else if (val == 0x3) { // ADD: add rax, rdx + emit8(0x48); emit8(0x01); emit8(0xD0); + } else if (val == 0x4) { // FETCH: mov rax, QWORD PTR [rcx + rax*8] + emit8(0x48); emit8(0x8B); emit8(0x04); emit8(0xC1); + } else if (val == 0x5) { // DEC: dec rax + emit8(0x48); emit8(0xFF); emit8(0xC8); + } else if (val == 0x6) { // STORE: mov QWORD PTR [rcx + rax*8], rdx + emit8(0x48); emit8(0x89); emit8(0x14); emit8(0xC1); + } else if (val == 0x7) { // RET_IF_ZERO: test rax, rax; jnz +9; epilogue; ret + emit8(0x48); emit8(0x85); emit8(0xC0); // test rax, rax + emit8(0x75); emit8(0x09); // jnz skip_ret (+9 bytes) + emit8(0x48); emit8(0x89); emit8(0x41); emit8(0x70); // mov [rcx+112], rax + emit8(0x48); emit8(0x89); emit8(0x51); emit8(0x78); // mov [rcx+120], rdx + emit8(0xC3); // ret + } else if (val == 0xFA) { // F_STEP (Inlined Compiler Macro) + compile_word(tmpl(STag, Data), 0); + compile_word(tmpl(STag, Imm), 0x4); + compile_word(tmpl(STag, Imm), 0x7); + + compile_word(tmpl(STag, Data), 1); + compile_word(tmpl(STag, Imm), 0x4); + compile_word(tmpl(STag, Data), 0); + compile_word(tmpl(STag, Imm), 0x4); + compile_word(tmpl(STag, Imm), 0x2); + compile_word(tmpl(STag, Data), 1); + compile_word(tmpl(STag, Imm), 0x6); + + compile_word(tmpl(STag, Data), 0); + compile_word(tmpl(STag, Imm), 0x4); + compile_word(tmpl(STag, Imm), 0x5); + compile_word(tmpl(STag, Data), 0); + compile_word(tmpl(STag, Imm), 0x6); } } } -// --- Window Procedure (Event Loop) --- -MS_LRESULT 
win_proc(MS_HWND hwnd, MS_UINT msg, MS_WPARAM wparam, MS_LPARAM lparam) { +IA_ void compile_and_run_tape(void) { + farena_reset(&code_arena); + + // Prologue: Load VM state from globals[14] and [15] + emit8(0x48); emit8(0x8B); emit8(0x41); emit8(0x70); // mov rax, [rcx+112] + emit8(0x48); emit8(0x8B); emit8(0x51); emit8(0x78); // mov rdx, [rcx+120] + + // Compile the selected tokens + U4*r tape_ptr = C_(U4*r, tape_arena.start); + for (U8 i = 0; i <= cursor_idx; i++) { + compile_word(UNPACK_TAG(tape_ptr[i]), UNPACK_VAL(tape_ptr[i])); + } + + // Epilogue: Save VM state back to globals + emit8(0x48); emit8(0x89); emit8(0x41); emit8(0x70); // mov [rcx+112], rax + emit8(0x48); emit8(0x89); emit8(0x51); emit8(0x78); // mov [rcx+120], rdx + emit8(0xC3); // ret + + // Cast code arena to function pointer and CALL it! + typedef void (*JIT_Func)(U8* globals_ptr); + JIT_Func func = (JIT_Func)code_arena.start; + func(vm_globals); + + // Read state for UI + vm_rax = vm_globals[14]; + vm_rdx = vm_globals[15]; +} + +// --- Window Procedure --- +S8 win_proc(void* hwnd, U4 msg, U8 wparam, S8 lparam) { + U8 tape_count = tape_arena.used / sizeof(U4); + switch (msg) { case MS_WM_KEYDOWN: { - if (wparam == MS_VK_NEXT) { // Page Down - if ((view_block + 1) * TOKENS_PER_BLOCK < tape_pos) view_block++; - InvalidateRect(hwnd, NULL, true); - } else if (wparam == MS_VK_PRIOR) { // Page Up - if (view_block > 0) view_block--; - InvalidateRect(hwnd, NULL, true); - } + if (wparam == MS_VK_RIGHT && cursor_idx < tape_count - 1) cursor_idx++; + if (wparam == MS_VK_LEFT && cursor_idx > 0) cursor_idx--; + if (wparam == MS_VK_DOWN && cursor_idx + TOKENS_PER_ROW < tape_count) cursor_idx += TOKENS_PER_ROW; + if (wparam == MS_VK_UP && cursor_idx >= TOKENS_PER_ROW) cursor_idx -= TOKENS_PER_ROW; + + // Interaction: Reset VM and compile up to cursor + vm_rax = 0; vm_rdx = 0; + mem_zero(u8_(vm_globals), sizeof(vm_globals)); + + compile_and_run_tape(); + + ms_invalidate_rect(hwnd, NULL, true); return 0; } case 
MS_WM_PAINT: {
             MS_PAINTSTRUCT ps;
-            MS_HDC hdc = BeginPaint(hwnd, &ps);
+            void* hdc = ms_begin_paint(hwnd, &ps);
+            void* hFont = ms_create_font_a(20, 0, 0, 0, 400, 0, 0, 0, 0, 0, 0, 0, 0, "Consolas");
+            void* hOldFont = ms_select_object(hdc, hFont);
 
-            // Modern Monospace Font (Consolas)
-            MS_HFONT hFont = CreateFontA(22, 0, 0, 0, 400, 0, 0, 0, 0, 0, 0, 0, 0, "Consolas");
-            MS_HANDLE hOldFont = SelectObject(hdc, hFont);
+            ms_set_bk_color(hdc, 0x001E1E1E);
+            S4 start_x = 40, start_y = 60, spacing_x = 110, spacing_y = 35;
 
-            // Dark background
-            U4 bg_color = 0x001E1E1E;
-            SetBkColor(hdc, bg_color);
+            U4*r tape_ptr = C_(U4*r, tape_arena.start);
 
-            int x = 20;
-            int y = 20;
-            int line_height = 24;
+            // Render Tokens
+            for (U8 i = 0; i < tape_count; i++) {
+                S4 col = (S4)(i % TOKENS_PER_ROW);
+                S4 row = (S4)(i / TOKENS_PER_ROW);
+                S4 x = start_x + (col * spacing_x);
+                S4 y = start_y + (row * spacing_y);
 
-            // Render Block Header
-            SetTextColor(hdc, 0x00AAAAAA);
-            char header_str[32] = "Block 0x000";
-            u64_to_hex(view_block, header_str + 8, 3);
-            TextOutA(hdc, x, y, header_str, 11);
-            y += line_height * 2;
+                if (i == cursor_idx) {
+                    void* hBrush = ms_get_stock_object(2); // GRAY_BRUSH
+                    ms_select_object(hdc, hBrush);
+                    ms_rectangle(hdc, x - 5, y - 2, x + 95, y + 22);
+                }
 
-            // Render Tokens for current block
-            U8 start_idx = view_block * TOKENS_PER_BLOCK;
-            U8 end_idx = start_idx + TOKENS_PER_BLOCK;
-            if (end_idx > tape_pos) end_idx = tape_pos;
-
-            for (U8 i = start_idx; i < end_idx; i++) {
-                U4 t = tape[i];
+                U4 t = tape_ptr[i];
                 U4 tag = UNPACK_TAG(t);
                 U4 val = UNPACK_VAL(t);
-                U4 color = 0x00FFFFFF;
-                const char* prefix = "";
+                U4 color = tag_colors[tag];
+                const char* prefix = tag_prefixes[tag];
 
-                switch (tag) {
-                    case TAG_DEFINE: color = 0x003333FF; prefix = ": "; break; // RED
-                    case TAG_CALL: color = 0x0033FF33; prefix = "~ "; break; // GREEN
-                    case TAG_DATA: color = 0x00FFFF33; prefix = "$ "; break; // CYAN
-                    case TAG_IMM: color = 0x0033FFFF; prefix = "^ "; break; // YELLOW
-                    case TAG_COMMENT: color = 0x00AAAAAA; prefix = ". "; break; // DIM
+                ms_set_text_color(hdc, color);
+
+                char val_str[9];
+                u64_to_hex(val, val_str, 6);
+                val_str[6] = '\0';
+
+                // Friendly names for our primitives
+                if (tag == tmpl(STag, Imm) || tag == tmpl(STag, Call)) {
+                    if (val == 0x1) mem_copy(u8_(val_str), u8_("SWAP  "), 6);
+                    if (val == 0x2) mem_copy(u8_(val_str), u8_("MULT  "), 6);
+                    if (val == 0x3) mem_copy(u8_(val_str), u8_("ADD   "), 6);
+                    if (val == 0x4) mem_copy(u8_(val_str), u8_("FETCH "), 6);
+                    if (val == 0x5) mem_copy(u8_(val_str), u8_("DEC   "), 6);
+                    if (val == 0x6) mem_copy(u8_(val_str), u8_("STORE "), 6);
+                    if (val == 0x7) mem_copy(u8_(val_str), u8_("RET_IF"), 6);
+                    if (val == 0xFA) mem_copy(u8_(val_str), u8_("F_STEP"), 6);
                 }
 
-                SetTextColor(hdc, color);
-                TextOutA(hdc, x, y, prefix, 2);
+                char out_buf[10];
+                out_buf[0] = prefix[0];
+                out_buf[1] = ' ';
+                mem_copy(u8_(out_buf + 2), u8_(val_str), 6);
+                out_buf[8] = '\0';
 
-                char val_str[8];
-                u64_to_hex(val, val_str, 7);
-                val_str[7] = '\0';
-                TextOutA(hdc, x + 24, y, val_str, 7);
-
-                y += line_height;
-                // Simple column wrapping inside the block
-                if (y > 500) {
-                    y = 20 + line_height * 2;
-                    x += 160;
-                }
+                ms_text_out_a(hdc, x, y, out_buf, 8);
             }
 
-            // Render VM State at the bottom right
-            y = 480;
-            x = 600;
-            SetTextColor(hdc, 0x00FFFFFF);
-            TextOutA(hdc, x, y, "VM State (2-Reg Stack)", 22);
-            y += line_height;
+            ms_set_text_color(hdc, 0x00AAAAAA);
+            ms_text_out_a(hdc, 40, 20, "x86-64 Machine Code Emitter | 2-Reg Stack + Global Tape | Factorial", 68);
+
+            // Render VM State
+            ms_set_text_color(hdc, 0x00FFFFFF);
+            char jit_str[32] = "JIT Size: 0x000 bytes";
+            u64_to_hex(code_arena.used, jit_str + 12, 3);
+            ms_text_out_a(hdc, 40, 480, jit_str, 21);
+
+            char state_str[64] = "RAX: 00000000 | RDX: 00000000";
+            u64_to_hex(vm_rax, state_str + 5, 8);
+            u64_to_hex(vm_rdx, state_str + 21, 8);
+            ms_set_text_color(hdc, 0x0033FF33);
+            ms_text_out_a(hdc, 40, 510, state_str, 29);
 
-            char rax_str[16] = "RAX: 0x00000000";
-            char rdx_str[16] = "RDX: 0x00000000";
-            u64_to_hex(vm_rax, rax_str + 7, 8);
-            u64_to_hex(vm_rdx, rdx_str + 7, 8);
+            ms_set_text_color(hdc, 0x00FFFFFF);
+            ms_text_out_a(hdc, 400, 480, "Global Memory (Contiguous Array):", 33);
+            for (int i=0; i<4; i++) {
+                char glob_str[32] = "[0]: 00000000";
+                glob_str[1] = '0' + i;
+                u64_to_hex(vm_globals[i], glob_str + 5, 8);
+                ms_set_text_color(hdc, 0x00FFFF33);
+                ms_text_out_a(hdc, 400, 510 + (i * 25), glob_str, 13);
+            }
 
-            SetTextColor(hdc, 0x0033FF33);
-            TextOutA(hdc, x, y, rax_str, 15);
-            y += line_height;
-            SetTextColor(hdc, 0x00FFFF33);
-            TextOutA(hdc, x, y, rdx_str, 15);
-
-            SelectObject(hdc, hOldFont);
-            EndPaint(hwnd, &ps);
-            return 0;
-        }
-        case MS_WM_DESTROY: {
-            PostQuitMessage(0);
+            ms_select_object(hdc, hOldFont);
+            ms_end_paint(hwnd, &ps);
             return 0;
         }
+        case MS_WM_DESTROY: { ms_post_quit_message(0); return 0; }
     }
-    return DefWindowProcA(hwnd, msg, wparam, lparam);
+    return ms_def_window_proc_a(hwnd, msg, wparam, lparam);
 }
 
-void main() {
-    tape = (U4*)VirtualAlloc(NULL, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE);
-    if (!tape) ExitProcess(1);
+int main(void) {
+    // 1. Initialize Memory Arenas using WinAPI + FArena
+    Slice tape_mem = slice_ut_(u8_(ms_virtual_alloc(NULL, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_READWRITE)), 64 * 1024);
+    Slice code_mem = slice_ut_(u8_(ms_virtual_alloc(NULL, 64 * 1024, MS_MEM_COMMIT | MS_MEM_RESERVE, MS_PAGE_EXECUTE_READWRITE)), 64 * 1024);
+    if (!tape_mem.ptr || !code_mem.ptr) ms_exit_process(1);
+
+    farena_init(&tape_arena, tape_mem);
+    farena_init(&code_arena, code_mem);
 
-    // Bootstrap Block 0
-    scatter(PACK_TOKEN(TAG_DEFINE, 0x51415245)); // ":SQUARE"
-    scatter(PACK_TOKEN(TAG_CALL, 0x00000001)); // DUP
-    scatter(PACK_TOKEN(TAG_CALL, 0x00000002)); // MULT
-    scatter(PACK_TOKEN(TAG_CALL, 0x00000003)); // RET
-    scatter(PACK_TOKEN(TAG_COMMENT, 0x4E4F5445)); // ".NOTE"
-    scatter(PACK_TOKEN(TAG_DATA, 5)); // $5
-    scatter(PACK_TOKEN(TAG_IMM, 0x51415245)); // ^SQUARE
+    // Bootstrap Robust Sample: Factorial State Machine
+    scatter(PACK_TOKEN(tmpl(STag, Comment), 0x1111)); // .INIT
+    scatter(PACK_TOKEN(tmpl(STag, Data), 5)); // $5
+    scatter(PACK_TOKEN(tmpl(STag, Data), 0)); // $0 (Addr)
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0x6)); // ^STORE
+    scatter(PACK_TOKEN(tmpl(STag, Data), 1)); // $1
+    scatter(PACK_TOKEN(tmpl(STag, Data), 1)); // $1 (Addr)
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0x6)); // ^STORE
+    scatter(PACK_TOKEN(tmpl(STag, Comment), 0xFAFA)); // .FAFA
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0xFA)); // ^F_STEP
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0xFA));
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0xFA));
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0xFA));
+    scatter(PACK_TOKEN(tmpl(STag, Imm), 0xFA));
 
-    // Fill some padding so we can test pagination (Page Down)
-    for(int i=0; i < 300; i++) {
-        scatter(PACK_TOKEN(TAG_COMMENT, 0x0));
-    }
-
-    // Block 1 content
-    scatter(PACK_TOKEN(TAG_DATA, 10)); // $10
-    scatter(PACK_TOKEN(TAG_IMM, 0x51415245)); // ^SQUARE
-
-    // Run Interpreter
-    vm_eval_tape();
-
-    // Window Setup
     MS_WNDCLASSA wc;
     memset(&wc, 0, sizeof(wc));
     wc.lpfnWndProc = win_proc;
-    wc.hInstance = (MS_HINSTANCE)GetStockObject(0);
+    wc.hInstance = ms_get_stock_object(0);
     wc.lpszClassName = "ColorForthWindow";
-    wc.hbrBackground = (MS_HBRUSH)GetStockObject(4);
-
-    if (!RegisterClassA(&wc)) ExitProcess(1);
-
-    MS_HWND hwnd = CreateWindowExA(
-        0, wc.lpszClassName, "Sourceless Tape Drive Editor",
-        MS_WS_OVERLAPPEDWINDOW | MS_WS_VISIBLE,
-        100, 100, 800, 600, NULL, NULL, wc.hInstance, NULL
-    );
-
-    if (!hwnd) ExitProcess(1);
+    wc.hbrBackground = ms_get_stock_object(4);
+    ms_register_class_a(&wc);
+    void* hwnd = ms_create_window_ex_a(0, wc.lpszClassName, "Sourceless Global Memory Explorer", MS_WS_OVERLAPPEDWINDOW | MS_WS_VISIBLE, 100, 100, 1100, 750, NULL, NULL, wc.hInstance, NULL);
 
     MS_MSG msg;
-    while (GetMessageA(&msg, NULL, 0, 0)) {
-        TranslateMessage(&msg);
-        DispatchMessageA(&msg);
-    }
-
-    ExitProcess(0);
+    while (ms_get_message_a(&msg, NULL, 0, 0)) { ms_translate_message(&msg); ms_dispatch_message_a(&msg); }
+    ms_exit_process(0);
+    return 0;
 }