initial commit

2026-02-19 15:37:05 -05:00
commit f4c8181cfc
6 changed files with 489 additions and 0 deletions

2
.gitignore vendored Normal file

@@ -0,0 +1,2 @@
references
.env

131
GEMINI.md Normal file

@@ -0,0 +1,131 @@
# System Prompt
## Baseline
DO NOT EVER make a shell script unless told to. DO NOT EVER make a README or a file describing your changes unless you are told to. If you have commands I should be entering into the command line, or if you have something to explain to me, just use code blocks or normal text output. DO NOT DO ANYTHING OTHER THAN WHAT YOU WERE TOLD TO DO. DO NOT EVER, EVER DO ANYTHING OTHER THAN WHAT YOU WERE TOLD TO DO. IF YOU WANT TO DO OTHER THINGS, SIMPLY SUGGEST THEM, AND THEN I WILL REVIEW YOUR CHANGES AND MAKE THE DECISION ON HOW TO PROCEED.
WHEN WRITING SCRIPTS USE A 120-160 character limit per line. I don't want to see scrunched code.
## Necessary Background for Goal
Watch or read the following:
* [Forth Day 2020 - Preview of x64 & ColorForth & SPIR V - Onat](https://youtu.be/ajZAECYdJvE)
* [Metaprogramming VAMP in KYRA, a Next-gen Forth-like language](https://youtu.be/J9U_5tjdegY)
* [Neokineogfx - 4th And Beyond](https://youtu.be/Awkdt30Ruvk)
There are transcripts for each of these videos in the [references](./references/) directory.
## Goal
Learn ColorForth and be able to build a ColorForth derivative from scratch, similar to Timothy Lottes and Onat. First I want to make sure the agent has a full grasp of the background content provided. I can provide more if necessary.
## Some excerpts from discussing things with Grok 4.20 beta
### User
```md
What are the prerequisites for me to understand this completely. I know code such as those conveyed by Ryan Fleury, Jonathan Blow, Casey Muratori, Mike Acton, John Carmack, Tim Sweeney, Eskil Steenberg, Sebastian Aaltonen, etc.
I'm only aware of x86-64 and MIPS assembly from doing Pikuma, reading some material on assemblers, digging into linkers and writing my own build scripts, and looking into the Portable Executable format on Windows. But I'm not fluent in machine-level code or its nuances. For example, I don't have intuition for the trade-offs of the x86 subset Lottes uses, nor for Onat's, nor for how easy it is to extend and support new codegen within their Forth-like derivatives, etc.
They always advertise it as simple, and I think there is a true simplicity to it, but there is a gulf. It's similar to being able to use Lua while it would still take a while to bootstrap your own LuaJIT, I guess. The biggest hurdle is the frontend: they need to actually deal with non-text-based code modification and management.
The other thing is that I value being able to integrate or use external tooling like the RAD Debugger. That will most likely require me to generate some source file and dynamically populate the data it needs on the fly, so I can open it and step-debug the program at runtime... I'm not sure how big of a hurdle that is. Do I even need linkers with this setup?
```
### Grok
```md
**Lottes' x68 (32-bit granularity x86-64 subset)** and **Onat's KYRA codegen style** are both deliberate restrictions on the full x86-64 ISA. They are chosen for **editor simplicity** and **zero-overhead execution**, not raw peak FLOPS. Both deliver **net performance gains** in their target domain (tiny self-contained interactive systems, shaders, live-reloaded tools) because the code stays <100 KB and fits entirely in L1/L2 cache. There is **no notable runtime degradation** for real work — often the opposite.
Here is the exact trade-off breakdown.
### Lottes' x68 — 32-bit aligned opcodes only
Every instruction (and immediate) is forced to 32-bit (4-byte) boundaries by padding.
**How padding works (exact from his talk):**
- Ignored prefixes (e.g. 0x3E DS segment)
- Multi-byte NOPs (0x0F 0x1F ...)
- Extra REX prefixes when not needed
- Example: normal RET (C3) → C3 90 90 90 (padded to 4 bytes)
**Trade-offs:**
| Aspect | Benefit (why he does it) | Cost / Potential downside | Real perf impact (his domain) |
|---------------------|-------------------------------------------------------|----------------------------------------------------|-------------------------------|
| **Editor / Sourceless** | Code = pure 32-bit word array. Hex editor, find/replace, auto-relink jumps all trivial. No variable-length decode. | None in practice | Huge win (instant edit) |
| **Jumps / Calls** | Relative offsets always 32-bit and aligned to end of opcode → editor can patch without decoding | Slightly larger code | Neutral (tiny code) |
| **Inline data** | Immediates always at 32-bit boundaries → easy visual data in hex view | — | Win |
| **Code size** | — | 20-50% larger due to padding (estimated) | Negligible (everything in L1) |
| **Decode / uop cache** | — | Slightly more bytes fetched, possible extra uops | Negligible on Zen/Intel wide decoders |
| **ISA coverage** | Still full x64 power for the ops he needs | Can't use arbitrary unpadable instructions easily | None (he only needs tiny subset) |
**Perf verdict from Lottes:**
He explicitly says the code is so small it all fits in cache, so the padding cost is irrelevant. The **gain** is massive reduction in total system complexity (no assembler, no linker, no text parser). Sourceless mode = true zero-overhead runtime. He views this as **better** than full ISA for interactive work.
### Onat's KYRA style (not a strict opcode subset, but a coding / emission convention)
Full x86-64 base + SSE (without VEX prefixes) + all addressing modes (including RIP-relative). The restriction is in **how** he uses it.
**Core rules (exact from talk + his site):**
- Only **two temporary registers** for the "stack": RAX and RDX + 1 hidden bit ("which is top?").
- Before every call/definition: `xchg rax, rdx` (3 bytes: 48 87 C2).
- Everything else = global memory accessed via **one base register** (R15 points to a contiguous struct of all variables).
- Every high-level word emits **1-3 real instructions** max (often 1).
- SPIR-V shaders = direct binary emit (no glslang).
**Trade-offs:**
| Aspect | Benefit (why Onat does it) | Cost / downside | Real perf impact |
|-------------------------|-----------------------------------------------------------------|------------------------------------------------------|------------------|
| **Runtime stack** | Zero stack ops, zero frame setup, no juggling | Must manage globals manually | **Massive gain** (60 ms text/Forth vs 4 ms) |
| **Register usage** | Only 2 temps + base → trivial templates, 1800-byte compiler | Occasional spill to global (he says 3 ns, ignore) | Gain (fewer instructions) |
| **Globals** | One contiguous block via R15 → perfect cache locality | Programmer tracks lifetimes (single-threaded) | Gain (better than stack) |
| **SSE / AVX** | SSE without VEX (2/3-byte encoding) | No VEX/AVX2+ in templates (easy to add later) | Minor loss if you need AVX, but he doesn't |
| **Dictionary / lookup** | Pure 32-bit index (next version 32-bit index + separate tag byte) | You curate the index space by hand | **Huge gain** (no hash, 12 ms full compile) |
| **Code size / density** | Extremely dense (single instructions) | None | Win |
**Perf numbers Onat actually shows (VAMP program):**
- Full recompile (editor + Vulkan + FFmpeg + shaders): 48 ms on normal hardware.
- Shader compile: <5 ms (mostly driver), ~100× faster than glslang.
- Instruction generation: 180 million instructions/second/core on Ryzen 5900HX.
- Compared to C: "thousand times more time to compile in C" for the same program.
He explicitly calls out the classic Forth stack as the bottleneck: "the difference between using data stack versus using registers only".
### Overall: Do they lose or gain perf?
**Net gain — often by 5-100× in the metrics that matter to them.**
**Gains:**
- Iteration speed (48 ms full rebuild vs seconds in C).
- Runtime speed (fewer instructions, perfect locality, no ABI/calling-convention tax).
- Cache behavior (tiny code + contiguous globals = L1 resident forever).
- Debuggability / visibility (live dataflow viz of RAX/RDX + globals).
**Losses (real but tiny):**
- Slightly larger code from Lottes padding.
- Occasional extra global access instead of a 3rd/4th register (Onat).
- Can't blindly copy-paste AVX-heavy code without adjusting templates.
For programs that fit in a few cache lines (exactly their use-case: editors, shaders, tools, games), the simplicity wins so hard that the "losses" disappear. This is why both say "simple" and mean it — they removed everything that costs more than it gives.
If you build the 60-line emitter I described earlier, you will feel the gain immediately: change one token → re-emit 100 bytes → run again in <1 ms. That is the real performance story.
Want the exact 32-bit x68 padding helper + Onat's xchg + global load/store templates as copy-paste C code next? Just say the word and I'll drop the blocks.
```
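For concreteness, here is a minimal sketch of the two emission ideas discussed in the excerpt above, written in Python to match the rest of this repo's tooling: Lottes-style padding of every instruction to a 4-byte boundary, and Onat-style fixed byte templates (`xchg rax, rdx` plus global load/store off R15). The helper names (`emit32`, `load_global`, `store_global`) and the single-byte-NOP padding choice are assumptions for illustration, not code from either talk.
```python
# Sketch only - hypothetical helpers, not Lottes' or Onat's actual emitters.

def emit32(code: bytearray, instr: bytes) -> None:
    """Lottes-style: append one instruction, then pad with 1-byte NOPs (0x90) to a 4-byte boundary."""
    code += instr
    code += b"\x90" * (-len(code) % 4)

# Onat-style fixed templates: swap the two "stack" registers, and access globals through
# a single base register (R15) with a 32-bit displacement.
XCHG_RAX_RDX = bytes([0x48, 0x87, 0xC2])  # xchg rdx, rax

def load_global(disp: int) -> bytes:
    """mov rax, [r15 + disp32] - disp must be an unsigned 32-bit offset into the globals block."""
    return bytes([0x49, 0x8B, 0x87]) + disp.to_bytes(4, "little")

def store_global(disp: int) -> bytes:
    """mov [r15 + disp32], rax"""
    return bytes([0x49, 0x89, 0x87]) + disp.to_bytes(4, "little")

code = bytearray()
emit32(code, bytes([0xC3]))        # RET -> C3 90 90 90, matching the example above
emit32(code, XCHG_RAX_RDX)         # -> 48 87 C2 90
emit32(code, load_global(0x40))    # -> 49 8B 87 40 00 00 00 90
assert len(code) % 4 == 0
```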
## Visual Context Synthesis & Color Semantics
Based on the extracted frame OCR data from the references (Lottes' and Onat's presentations), here is the persistent mapping of ColorForth visual semantics to language logic for this project (a toy dispatch sketch follows the list):
- **Red (`<RED>`):** Defines a new word or symbol in the dictionary. This is the entry point for compilation.
- **Green (`<GREEN>`):** Compiles a word into the current definition.
- **Yellow/Orange (`<YELLOW>` / `<ORANGE>`):** Executes a word immediately. Often used for macros or immediate mode evaluation within the editor.
- **Cyan/Blue (`<CYAN>` / `<BLUE>`):** Used for variables, memory addresses, or formatting layout (not executable instruction logic).
- **White/Dim (`<WHITE>` / `<DIM>`):** Comments, annotations, and UI elements.
- **Magenta (`<MAGENTA>`):** Typically used for pointers or state modifiers.
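As a sanity check of the mapping above, a toy dispatch over these color tags might look like the sketch below; the `Tag` enum, the `consume` helper, and the emitted tuples are hypothetical placeholders, not how Lottes' or Onat's editors are actually implemented.
```python
# Toy illustration of color-tag dispatch (hypothetical names and actions).
from enum import Enum, auto

class Tag(Enum):
    RED = auto()      # define a new word
    GREEN = auto()    # compile a call to an existing word
    YELLOW = auto()   # execute immediately (macro / editor-time)
    CYAN = auto()     # variable / memory reference
    WHITE = auto()    # comment - ignored

def consume(tag: Tag, payload: str, dictionary: dict, out: list) -> None:
    if tag is Tag.RED:
        dictionary[payload] = len(out)              # record the entry point of the new word
    elif tag is Tag.GREEN:
        out.append(("call", dictionary[payload]))   # compile a reference by dictionary index
    elif tag is Tag.YELLOW:
        out.append(("exec_now", payload))           # placeholder for immediate execution
    elif tag is Tag.CYAN:
        out.append(("load_addr", payload))          # placeholder for a variable/address token
    # WHITE / DIM tokens are skipped by the compiler

dictionary, out = {}, []
consume(Tag.RED, "square", dictionary, out)
consume(Tag.GREEN, "square", dictionary, out)       # compiles a call to the word just defined
```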
**Architectural Notes Extracted:**
1. **Sourceless Environment:** The underlying system doesn't deal with parsing strings. It deals with 32-bit tagged tokens (as noted in Lottes' 32-bit x68 alignment); a hypothetical packing sketch follows this list.
2. **Visual Editor:** The editor is intrinsically tied to the compiler. It reads the same memory structure. It uses these color properties to colorize the tokens live.
3. **Hardware Locality:** We see a major focus on removing the stack in favor of register rotation (`RAX`, `RDX`) as per Onat's methodology.
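Note 1 implies each token fits in one 32-bit word. A hypothetical packing is sketched below; the 4-bit tag / 28-bit index split is an assumption, not the layout either author actually uses.
```python
# Hypothetical 32-bit tagged-token layout: low 4 bits = color tag, upper 28 bits = dictionary index.
TAG_BITS = 4
TAG_MASK = (1 << TAG_BITS) - 1

def pack_token(tag: int, index: int) -> int:
    return ((index << TAG_BITS) | (tag & TAG_MASK)) & 0xFFFFFFFF

def unpack_token(token: int) -> tuple[int, int]:
    return token & TAG_MASK, token >> TAG_BITS

tok = pack_token(tag=2, index=1234)       # e.g. a GREEN reference to dictionary entry 1234
assert unpack_token(tok) == (2, 1234)
```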

29
download_videos.py Normal file

@@ -0,0 +1,29 @@
import os
import yt_dlp

VIDEOS = {
    "Forth Day 2020 - Preview of x64 & ColorForth & SPIR V - Onat.txt": "https://youtu.be/ajZAECYdJvE",
    "Neokineogfx - 4th And Beyond - Transcript.txt": "https://youtu.be/Awkdt30Ruvk",
    "Silicon Valley Forth Interest Group - Metaprogramming VAMP in KYRA, a Next-gen Forth-like language --- Onat Türkçüoğlu -- 2025-04-26.txt": "https://youtu.be/J9U_5tjdegY"
}

OUT_DIR = "C:/projects/forth/bootslop/references/processed_visuals"
os.makedirs(OUT_DIR, exist_ok=True)

ydl_opts = {
    'format': 'bestvideo[ext=mp4]/best',
    'quiet': False,
    'no_warnings': True,
}

for transcript_file, url in VIDEOS.items():
    video_name = os.path.splitext(transcript_file)[0]
    video_path = os.path.join(OUT_DIR, f"{video_name}.mp4")
    if not os.path.exists(video_path):
        print(f"Downloading {video_name}...")
        opts = ydl_opts.copy()
        opts['outtmpl'] = video_path
        with yt_dlp.YoutubeDL(opts) as ydl:
            ydl.download([url])
    else:
        print(f"{video_name} already exists.")

112
extract_frames.py Normal file

@@ -0,0 +1,112 @@
import os
import re
import sys
import imagehash
from PIL import Image
import cv2
import yt_dlp

VIDEOS = {
    "Forth Day 2020 - Preview of x64 & ColorForth & SPIR V - Onat.txt": "https://youtu.be/ajZAECYdJvE",
    "Neokineogfx - 4th And Beyond - Transcript.txt": "https://youtu.be/Awkdt30Ruvk",
    "Silicon Valley Forth Interest Group - Metaprogramming VAMP in KYRA, a Next-gen Forth-like language --- Onat Türkçüoğlu -- 2025-04-26.txt": "https://youtu.be/J9U_5tjdegY"
}

REFERENCES_DIR = "C:/projects/forth/bootslop/references"
SNAPSHOTS_DIR = os.path.join(REFERENCES_DIR, "snapshots")
os.makedirs(SNAPSHOTS_DIR, exist_ok=True)

def parse_timestamps(filepath):
    timestamps = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            # Match MM:SS or H:MM:SS
            if re.match(r'^(\d+:)?\d+:\d{2}$', line):
                parts = list(map(int, line.split(':')))
                if len(parts) == 2:
                    seconds = parts[0] * 60 + parts[1]
                else:
                    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
                timestamps.append(seconds)
    return sorted(list(set(timestamps)))

def download_video(url, output_path):
    ydl_opts = {
        'format': 'best[ext=mp4]/best',
        'outtmpl': output_path,
        'quiet': True,
        'no_warnings': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

def extract_and_dedup_frames(video_path, timestamps, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Failed to open video {video_path}")
        return
    fps = cap.get(cv2.CAP_PROP_FPS)
    saved_hashes = []
    for t_sec in timestamps:
        # Set video position
        cap.set(cv2.CAP_PROP_POS_MSEC, t_sec * 1000)
        ret, frame = cap.read()
        if not ret:
            continue
        # Convert CV2 BGR to RGB for Pillow
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pil_img = Image.fromarray(frame_rgb)
        # Calculate perceptual hash
        h = imagehash.phash(pil_img)
        # Check against previous hashes (hamming distance < 5 is very similar)
        is_duplicate = False
        for saved_h in saved_hashes:
            if h - saved_h < 5:
                is_duplicate = True
                break
        if not is_duplicate:
            saved_hashes.append(h)
            out_filename = os.path.join(out_dir, f"frame_{t_sec:04d}s.jpg")
            pil_img.save(out_filename, quality=90)
            print(f"Saved {out_filename}")
    cap.release()

def main():
    for transcript_file, url in VIDEOS.items():
        filepath = os.path.join(REFERENCES_DIR, transcript_file)
        if not os.path.exists(filepath):
            print(f"Transcript not found: {filepath}")
            continue
        print(f"Processing {transcript_file}...")
        timestamps = parse_timestamps(filepath)
        print(f"Found {len(timestamps)} timestamps.")
        video_name = os.path.splitext(transcript_file)[0]
        video_path = os.path.join(SNAPSHOTS_DIR, f"{video_name}.mp4")
        out_dir = os.path.join(SNAPSHOTS_DIR, video_name)
        if not os.path.exists(video_path):
            print(f"Downloading video from {url}...")
            download_video(url, video_path)
        print("Extracting frames...")
        extract_and_dedup_frames(video_path, timestamps, out_dir)
        # Clean up video to save space
        if os.path.exists(video_path):
            os.remove(video_path)

if __name__ == '__main__':
    main()

214
process_visuals.py Normal file

@@ -0,0 +1,214 @@
import os
import re
import cv2
import yt_dlp
import numpy as np
import imagehash
from PIL import Image
import asyncio
from winsdk.windows.media.ocr import OcrEngine
from winsdk.windows.globalization import Language
from winsdk.windows.graphics.imaging import SoftwareBitmap, BitmapPixelFormat, BitmapAlphaMode, BitmapDecoder
from winsdk.windows.storage import StorageFile

VIDEOS = {
    "Forth Day 2020 - Preview of x64 & ColorForth & SPIR V - Onat.txt": "https://youtu.be/ajZAECYdJvE",
    "Neokineogfx - 4th And Beyond - Transcript.txt": "https://youtu.be/Awkdt30Ruvk",
    "Silicon Valley Forth Interest Group - Metaprogramming VAMP in KYRA, a Next-gen Forth-like language --- Onat Türkçüoğlu -- 2025-04-26.txt": "https://youtu.be/J9U_5tjdegY"
}

REFERENCES_DIR = "C:/projects/forth/bootslop/references"
OUT_DIR = os.path.join(REFERENCES_DIR, "processed_visuals")
os.makedirs(OUT_DIR, exist_ok=True)

def parse_timestamps(filepath):
    timestamps = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if re.match(r'^(\d+:)?\d+:\d{2}$', line):
                parts = list(map(int, line.split(':')))
                if len(parts) == 2:
                    seconds = parts[0] * 60 + parts[1]
                else:
                    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
                timestamps.append(seconds)
    return sorted(list(set(timestamps)))

def download_highres(url, output_path):
    ydl_opts = {
        'format': 'bestvideo[ext=mp4]/best',
        'outtmpl': output_path,
        'quiet': True,
        'no_warnings': True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

async def ocr_image(img_cv2):
    # Round-trip through a temporary PNG so the Windows OCR engine can decode it from a StorageFile.
    temp_path = os.path.join(OUT_DIR, "temp_ocr.png")
    cv2.imwrite(temp_path, img_cv2)
    file = await StorageFile.get_file_from_path_async(os.path.abspath(temp_path))
    stream = await file.open_read_async()
    decoder = await BitmapDecoder.create_async(stream)
    bitmap = await decoder.get_software_bitmap_async()
    engine = OcrEngine.try_create_from_language(Language("en-US"))
    if not engine:
        return None
    return await engine.recognize_async(bitmap)

def get_word_color(img_hsv, rect):
    x, y, w, h = int(rect.x), int(rect.y), int(rect.width), int(rect.height)
    roi = img_hsv[y:y+h, x:x+w]
    if roi.size == 0:
        return "WHITE"
    # Mask out background (low saturation/value)
    mask = (roi[:,:,1] > 60) & (roi[:,:,2] > 60)
    if np.sum(mask) < 5:
        # Check brightness
        if np.mean(roi[:,:,2]) > 128:
            return "WHITE"
        return "DIM"
    avg_h = np.median(roi[:,:,0][mask])
    # OpenCV Hue is 0-179
    if avg_h < 12 or avg_h > 165:
        return "RED"
    elif 12 <= avg_h < 25:
        return "ORANGE"
    elif 25 <= avg_h < 40:
        return "YELLOW"
    elif 40 <= avg_h < 85:
        return "GREEN"
    elif 85 <= avg_h < 130:
        return "CYAN"
    elif 130 <= avg_h < 150:
        return "BLUE"
    elif 150 <= avg_h <= 165:
        return "MAGENTA"
    return "UNKNOWN"

def detect_faces(gray_img):
    cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
    if not os.path.exists(cascade_path):
        return []
    face_cascade = cv2.CascadeClassifier(cascade_path)
    return face_cascade.detectMultiScale(gray_img, 1.3, 5)

async def process_video(video_path, timestamps, video_name):
    out_dir = os.path.join(OUT_DIR, video_name)
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Failed to open {video_path}")
        return
    saved_hashes = []
    markdown_lines = [f"# OCR and Visual Log for {video_name}\n"]
    for t_sec in timestamps:
        cap.set(cv2.CAP_PROP_POS_MSEC, t_sec * 1000)
        ret, frame = cap.read()
        if not ret:
            continue
        h = imagehash.phash(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        is_duplicate = any(h - saved_h < 5 for saved_h in saved_hashes)
        if is_duplicate:
            continue
        saved_hashes.append(h)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        faces = detect_faces(gray)
        total_area = frame.shape[0] * frame.shape[1]
        face_area = sum(w*h for (x,y,w,h) in faces)
        ocr_result = await ocr_image(frame)
        words = []
        text_area = 0
        lines_output = []
        if ocr_result and ocr_result.lines:
            for line in ocr_result.lines:
                line_str = []
                for word in line.words:
                    words.append(word)
                    text_area += word.bounding_rect.width * word.bounding_rect.height
                    color = get_word_color(hsv, word.bounding_rect)
                    line_str.append(f"<{color}>{word.text}</{color}>")
                lines_output.append(" ".join(line_str))
        # Heuristic to discard useless frames (mostly face, no code)
        if face_area > total_area * 0.05 and text_area < total_area * 0.01:
            continue  # Skip this frame
        markdown_lines.append(f"## Time {t_sec}s")
        if lines_output:
            markdown_lines.append("\n".join(lines_output))
        # Crop code block
        pad = 30
        if words:
            min_x = int(min(w.bounding_rect.x for w in words))
            min_y = int(min(w.bounding_rect.y for w in words))
            max_x = int(max(w.bounding_rect.x + w.bounding_rect.width for w in words))
            max_y = int(max(w.bounding_rect.y + w.bounding_rect.height for w in words))
            min_x = max(0, min_x - pad)
            min_y = max(0, min_y - pad)
            max_x = min(frame.shape[1], max_x + pad)
            max_y = min(frame.shape[0], max_y + pad)
            code_crop = frame[min_y:max_y, min_x:max_x]
            code_path = os.path.join(out_dir, f"code_{t_sec:04d}s.jpg")
            cv2.imwrite(code_path, code_crop)
            markdown_lines.append(f"\n*Saved code image: {code_path}*")
        # Find non-text visual content (e.g. diagrams) outside the text area
        # A very simple heuristic: if there's a huge gap of non-text that has strong edges
        edges = cv2.Canny(gray, 50, 150)
        # mask out the text area (only defined when OCR found words)
        if words:
            edges[min_y:max_y, min_x:max_x] = 0
        # mask out faces
        for (fx, fy, fw, fh) in faces:
            edges[fy:fy+fh, fx:fx+fw] = 0
        non_text_pixels = cv2.countNonZero(edges)
        if non_text_pixels > 5000:
            # There is significant visual structure outside the text and faces.
            # Find its bounding box
            pts = cv2.findNonZero(edges)
            if pts is not None:
                bx, by, bw, bh = cv2.boundingRect(pts)
                if bw > 100 and bh > 100:  # large enough
                    visual_crop = frame[max(0, by-pad):min(frame.shape[0], by+bh+pad), max(0, bx-pad):min(frame.shape[1], bx+bw+pad)]
                    visual_path = os.path.join(out_dir, f"visual_{t_sec:04d}s.jpg")
                    cv2.imwrite(visual_path, visual_crop)
                    markdown_lines.append(f"\n*Saved non-text visual: {visual_path}*")
        markdown_lines.append("\n---\n")
    cap.release()
    with open(os.path.join(OUT_DIR, f"{video_name}_ocr.md"), "w", encoding="utf-8") as f:
        f.write("\n".join(markdown_lines))

async def main():
    for transcript_file, url in VIDEOS.items():
        filepath = os.path.join(REFERENCES_DIR, transcript_file)
        if not os.path.exists(filepath):
            continue
        print(f"Processing {transcript_file}...")
        timestamps = parse_timestamps(filepath)
        video_name = os.path.splitext(transcript_file)[0]
        video_path = os.path.join(OUT_DIR, f"{video_name}.mp4")
        if not os.path.exists(video_path):
            print("Downloading high-res video...")
            download_highres(url, video_path)
        print("Extracting, OCRing, and semantic tagging frames...")
        await process_video(video_path, timestamps, video_name)

if __name__ == '__main__':
    asyncio.run(main())

1
setup_gemini.ps1 Normal file

@@ -0,0 +1 @@
Get-Content .env | ForEach-Object { $name, $value = $_.Split('=', 2); [Environment]::SetEnvironmentVariable($name, $value, "Process") }