manual_slop

Private

Author	SHA1	Message	Date
ed	d08fe29521	docs(report): extensive Lottes/Onat paradigm report (4102 lines, 26 appendices) Comprehensive synthesis of the Lottes/Onat compiler-as-OS paradigm from: - 53 Twitter threads (Nov 2022 - Jul 2026, ~250 posts, ~150 media) - 22 historical Lottes blog posts (2007-2016) - 4 bootslop in-depth analyst notes (blog_in-depth, forth_day_2020, kyra_in-depth, neokineogfx) - ps1-ai brain-dump-distilled-v2.md (53K chars of Ed's synthesis) - 20+ Twitter conversational partners - Charles Moore (ColorForth), Onat Türkçüoğlu (KYRA/VAMP), Andy Gavin (GOAL), Ryan Fleury (RAD Debugger) as convergent practitioners Covers: timeline 2007-2026, technical architecture in depth (x68 / annotation overlay / source-less editor / color tokens / 2-reg stack / folded interpreter / pre-emptive scatter / register-file-as-memory / C-ABI words), per-thread and per-blog-post analysis, UE/Unity/Godot bridge, adoption decision framework, failure modes, convergence matrix across 4 implementers, 20 defining quotes. Written in a single session, ~4 hours of focused research, using the manual-slop MCP tools + native read/write/edit. The report itself follows the Lottes/Onat philosophy: Touch it once (read each source once), small to macros to data structures (this report IS the data structure), the editor IS the linker (every claim links to its source).	2026-07-27 21:54:03 -04:00
ed	a92941979e	docs(twitter): add 1776597200073568623 corpus (NOTimothyLottes 'AMD GPU shader compiler isn't doing' book) 7-post thread with @MartinJIFuller + @SebAaltonen, April 2024, 35 likes, 4445 views, 4 media (3 JPGs + 1 MP4). 'This is how slow code feels like' (huge switch table in compiled shader) vs 'this is how clever you feel after finding the optimization' (manual bit extract + AND). @SebAaltonen: 'Sad that the compiler isn't doing this for us in 2024.' NOTimothyLottes: 'On mastodon.gamedev.place I'm collecting posts with AMD GPU disassembly dumps for a book on everything the shader compiler ISN'T doing. Constant AMD GPU compiler perf bug SPAM is a key part of my online strategy to keep a low follower count.' High engagement because of the contrarian framing.	2026-07-27 13:03:43 -04:00
ed	a6229e9308	docs(twitter): add 1687062786273062912 corpus (NOTimothyLottes SPC 8-deep swap BO + 3-thread load) 5-post thread, 2 JPGs, 2023-08-03. SPC: 8-deep swap all aliasing same BO (image) fixes Game Mode crash with external display. Pipelined load: 2 ms to black screen; 460 ms to allocate 4 GiB GTT video RAM; 558 ms to copy 256 MiB cart from page cache to USWC (page faults), all in parallel. Cold-launch worst case (cache cleared): 1.5 sec to read 256 MiB from stock SSD; no compile after this, 'literally in game at this point.' 3-thread load: (1.) X11 then spin present/keyboard, (2.) AMDgpu device open + SDMA after CART read then dispatch, (3.) CART disk->USWC GTT read(). SSD needs linear reads to be fast so don't manually thread read(). Sibling to SPC project intro (1687257354490818561).	2026-07-27 13:01:26 -04:00
ed	ef5e824694	docs(twitter): add 1735622924571201674 corpus (NOTimothyLottes Holiday GPU thoughts 2023) 25-post thread, 0 media, 2023-12-15. HIGHEST engagement 2023 thread: 112 likes, 25 reposts, 31576 views. 'Holiday GPU thoughts as a recovering pc/mobile/etc cross platform dev-holic' - the 16-bit permutation philosophy + post-black-box philosophy in one thread. [0] STP prototypes: 32-bit multi-dispatch vs packed-16 single-dispatch-ubershader, 40% faster, 'Industry has huge untapped opt potential.' [1-7] 16-bit fundamentals: AMD Vega+ for PC, draw compatibility line at 16-bit, dev on AMD VK RDNA2 + avoid DXC deoptimizer, Gather4 SoA, range management, denormals, Touch It Once is top optimization, no pass-graph fragmentation. [8-12] TAA scaling: 720p/1080p rendering for 4k, large L3 holds full render targets, RDNA2's 128 MiB L3 was spot on, RDNA3 drop was wrong, console no-L3 was wrong. [13-17] Triangles obsolete: 8x area scaling doesn't know connectivity, scaling TAA = mostly tri culling, fix geometrical aliasing to break black boxes, stratified sampling 2x geo density, frame viewport jitter EOL, TAA disocclusion deeply integrated into shading. [18-23] ML/GI/RT critique: per-pixel ML people wrong, multi-pass vs fewer-passes, GI needs surface shade cache -> object-space shading -> NUMA, HW RT people wrong (non-ray-traversal ordered access, occlusion = neighbor-coherent, GI = high-freq occlusion of low-freq probe domain, HW RT skin-all re-tree won't scale, stratified visibility = bounded costs). [24] Vote with engineering: general-purpose CS, say no to black boxes. [25] Scaling TAA wants sparse striped data per frame, low-freq pixel control cage displacing high-freq reprojected feedback.	2026-07-27 13:01:23 -04:00
ed	cd1f785bd0	docs(twitter): add 1590241107317002240 corpus (NOTimothyLottes 'Valve please allow binary shaders on Steam Deck') 1-post thread with @olson_dan, Nov 9 2022. 'If only PC vendors with open ISAs (AMD/Intel) would allow loading binary compute shaders in VK built using non-driver toolchains. If Valve enabled binary shaders on Steam Deck in Linux, that would be my new exclusive dev platform, instead of just a possible porting target.' Earliest known expression of the wish that becomes the Aug 2025 SPIR-V-from-data work. Sits between the Nov 2022 tip-line (1588906002212323328) and the late-Nov 2022 FSR1 thread (1597798161665253376).	2026-07-27 12:59:04 -04:00
ed	b20fa47487	docs(twitter): add 1591646625692553216 corpus (NOTimothyLottes self-modifying-binary update) 1-post thread, Nov 13 2022. 'My prefered method of managing application data was the self-modifying-binary: (a.) on launch copy binary to temp file, (b.) launch temp file, (c.) exit, (d.) now temp file execution can freely modify origional binary.' Cleanest description of his update/install pattern.	2026-07-27 12:59:02 -04:00
ed	d08a14e3a5	docs(twitter): add 1588515348223254528 corpus (NOTimothyLottes VK ~1500 lines overhead + @rianflo) 3-post thread with @rianflo, Nov 4 2022. Vulkan 'patches and wires' might be a sweet spot (kinda like Xcode visualization of Metal but reverse). VK: ~1500 lines of overhead (including header src, no includes) for complete hotrod bind-everything-only-once compute-only gfx engine. Not a fan of VK bloatware API complexity but can be minimized and exposes enough to do 'wonderfull things to the hardware.' Earliest known line-count for his VK engine - context for the Dec 2023 retrospective (1736161886079533186) which shows the fully-articulated design.	2026-07-27 12:58:07 -04:00
ed	3a27b89890	docs(twitter): add 1939170419958816855 corpus (NOTimothyLottes python-as-C-preprocessor + runtime code-gen) 4-post thread with @mmalex + @seflless, June 2025. @mmalex vibe-coded a Python C preprocessor: finds /* python <code> */ in .c, runs the Python, dumps stdout after the comment, replaces old block on change. @seflless points out Cog (https://cog.readthedocs.io) which did the same back in 2000ish. NOTimothyLottes: 'Many of my forth-like languages worked via nested runtime code gen, so it was possible to generate+run code to build either code or data, then repeat. The system would bootstrap its own assembler that way.' Precursor to the Jul 2026 micro macro-forth + nasm-bootstrapped 4KiB Linux Forth threads.	2026-07-27 12:53:00 -04:00
ed	9e2ff29841	docs(twitter): add 2077241596869751029 corpus (NOTimothyLottes uber-instruction interp language) 5-post thread, 0 media, July 2026. SAR (signed shift-right) trick: HW uses only the low 5 or 6 bits of the shift count (32-bit vs 64-bit operand) so mixed-use control words don't need MSB masking - saves an instruction. Working through a single uber-instruction interpreted language for code generation: never misses the instruction cache, executes fully out of micro-op cache, fully data driven, direct map 64K entry symbol table. Borrows from SIMD GPU programming: instead of branching/predicating to avoid stores, just store to a discarded area (negative offset on GPU, designated trash address on CPU). No exit condition logic in infinite loop - exit via storing to set address, external watchdog polls at some sleep frequency. SAR trick can grab 3 values: SF (MSB), CF (last bit shifted out, i.e. bit count-1), and output - SF+CF for CMOVcc, output for next shift. Sibling to the 2079746130309415185 micro macro-forth and 2078349730204078527 8-byte aligned Forth word threads - the uber-instruction version of the runtime-language work.	2026-07-27 10:43:43 -04:00
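The "store to a discarded area" idea has a direct CPU analogue in branch-free stream compaction: every element is stored, and only the destination index changes. A minimal C sketch; the function name, the trash slot at `out[n]`, and the mask-select formulation are illustrative assumptions rather than the thread's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep the positive values of in[0..n) in order, without a branch around the
   store. `out` must hold n + 1 values: out[n] is a hypothetical "trash" slot
   that absorbs every store we want to discard. Returns the number kept. */
static size_t compact_positive(const int32_t *in, size_t n, int32_t *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        size_t keep = (size_t)(in[i] > 0);         /* 0 or 1 */
        size_t mask = (size_t)0 - keep;            /* all-0s or all-1s */
        size_t dst = (kept & mask) | (n & ~mask);  /* next slot, or trash */
        out[dst] = in[i];                          /* the store always happens */
        kept += keep;
    }
    return kept;
}
```

The loop bound is still a branch; the point is that the per-element decision only selects an address, so there is nothing data-dependent to mispredict.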
ed	907d7d5e83	docs(twitter): add 1589810282104524800 corpus (NOTimothyLottes Bind-Everything-Once layout aliasing) 1-post thread, 1 image, Nov 8 2022. Single-post example showing storage image format by aliased name + bound as sampled image; buffer access {readonly, writeonly, volatile (read uncached), atomic} by aliased name as {uint, uint4, float, float4}. Earliest documented example of the Bind-Everything-Once pattern that resurfaces in the Dec 2023 VK retrospective (1736161886079533186) and the Dec 2024 STORAGE_TEXEL_BUFFER deep dive (1870351985855119449).	2026-07-27 02:10:32 -04:00
ed	409413a169	docs(twitter): add 1597798161665253376 corpus (NOTimothyLottes explicit packed 16-bit GLSL + FSR1) 7-post thread with @rianflo, Nov 2022. Confirms NOTimothyLottes wrote CAS/FSR1/etc GLSL versions with packed 16-bit math by hand. 'Explicit packed 16-bit works on AMD VK Vega and up. Up to 30% improvement on ALU bound stuff. Lots of occupancy wins.' Constants all packed and aliased as UINT (no conversion overheads). FSR1 source has 'F' (32-bit) and 'H' and 'Hx2' (packed 16-bit) function variants. Sibling to the Nov 2022 GPU tip-line thread (1588906002212323328).	2026-07-27 02:04:50 -04:00
ed	a64cc163f1	docs(twitter): add 1651268028795961344 corpus (NOTimothyLottes DuskOS + Forth-as-asm) 5-post thread, 0 media, April 2023. NOTimothyLottes + @kenpex + @wadetb. Links DuskOS (a Forth-as-asm-level-OS). NOTimothyLottes: Forth as OLPC boot firmware has been done (openbios). 'Can make a custom forth to binary in a few K. Bring up a runtime assembler in the language and do anything fast and simple from that point, load libraries, make system/dll C calls. 100,000x complexity reduction in compiler toolchain size.' Seed of the whole Forth-in-runtime workstream that shows up in the Jul 2026 threads (nasm-bootstrapped 4 KiB Linux Forth, micro macro-forth, etc).	2026-07-27 02:02:54 -04:00
ed	d2098c8dec	docs(twitter): add 1688022114962374656 corpus (NOTimothyLottes SPC vintage 640x480 + 15KHz CRT plan) 4-post thread, 2 images, Aug 6 2023. Same day as the RDNA2 assembler design doc (1688286623375454208). SPC project: vintage 640x480 VGA style render + 640x240 fallback for 15 KHz arcade CRTs (PGM-esque) + CRT shader scaling on Deck + OLED/LCD. Mix of render pixel aspect changes, fewer than 480 lines, letter-box for integer Y scaling. 15 KHz output path docked: HD Fury Nano (no <480p) -> HDMI-to-VGA -> sync combiner (GBS-C bug workaround) -> GBS-C line-drops 480p60 to 240p60 component. Bitmap font uses right angles + 2x2 stroke in 8x16 (pow2) so 640x240p still gets readable 8x8. Overscan safety: Y axis easy - keep cursor pinned in 'safe' zone, still print lines in overscan area (visible on non-overscan CRTs).	2026-07-27 02:01:57 -04:00
ed	5db46d3b64	docs(twitter): add 1588906002212323328 corpus (NOTimothyLottes GPU Programming Tip Line, Nov 2022) 21-post thread, 0 media, 20 numbered GPU tips - at the time of this commit the earliest NOTimothyLottes thread in the corpus. Highlights: [0] Normalization: rsq(0)=INF and INF*0=NaN -> 'normalize_safe(x){return x*min(MAX_FLOAT,rsq(dot(x,x)));}' [1] FP16 1/denormals -> INFs -> NaNs; fix 'rcp(max(x,SMALLEST_NORMAL))' [2] 'spirv-opt -Os' against IHV compile times [3] Driver ignoring '[[dont_unroll]]' workaround via constantBuffer.zero [4-5] 'Ship-One-Shader' = one SPIR-V binary + specialization constants at PSO gen (requires spirv-opt -Os) [6-8] Bit masking: signed 'bitfieldExtract(int(v),bit,1)' to all-0s/all-1s mask; AMD 'Bfi' as '(ins&mask)|(src&(~mask))'; bitfieldInsert portably hits V_BFI_B32 [9] clamp = med3 on AMD (V_MED3_* non-portable) [10-11] Bool-as-float: 'saturate(a*b+c)' for AND|OR, '(-a)*b+1.0' for NAND [12] INF-to-NaN: 'x*0.0+x' [13-15] Semi-persistent workgroups (4x 8x8 tiles in 16x16 footprint, 64-wide group) - up to 10% perf on AMD, gains via L0 retention, no wait-for-store on exit [16-17] Merge passes to avoid DRAM round-trips; serial dependent passes can be merged into one shader for >10% L2-resident gains [18-19] Packed 16-bit double-rate: up to 30% (except NV); even without, 16-bit manages register pressure, especially with smaller HW limits or compiler troubles	2026-07-27 02:00:58 -04:00
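Several of these tips are plain IEEE-float and integer identities, so they can be checked on the CPU with scalar C stand-ins. This is a sketch, not shader code: the helper names (`rcp_safe`, `bit_mask`, `bfi`, `and_or`, `nand`, `inf_to_nan`) are hypothetical, `FLT_MIN` stands in for SMALLEST_NORMAL using 32-bit floats instead of FP16, and the results assume default IEEE behavior (no `-ffast-math`).

```c
#include <float.h>
#include <math.h>   /* isnan, INFINITY (macros only, no libm calls) */
#include <stdint.h>

/* [1] Reciprocal that cannot overflow to INF on zero or denormal input.
   Mirrors rcp(max(x, SMALLEST_NORMAL)); assumes x >= 0. */
static float rcp_safe(float x) {
    float m = x > FLT_MIN ? x : FLT_MIN;   /* FLT_MIN = smallest normal float */
    return 1.0f / m;
}

/* [6-8] Signed 1-bit bitfieldExtract: all-0s or all-1s mask from bit `bit` of v. */
static uint32_t bit_mask(uint32_t v, unsigned bit) {
    return 0u - ((v >> bit) & 1u);          /* 0 -> 0x00000000, 1 -> 0xFFFFFFFF */
}

/* [6-8] Bfi: take bits from ins where mask is 1, from src where mask is 0. */
static uint32_t bfi(uint32_t mask, uint32_t ins, uint32_t src) {
    return (ins & mask) | (src & ~mask);
}

/* [10-11] Bools stored as 0.0f / 1.0f. */
static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }
static float and_or(float a, float b, float c) { return saturate(a * b + c); } /* (a AND b) OR c */
static float nand(float a, float b) { return (-a) * b + 1.0f; }

/* [12] INF becomes NaN (INF*0 = NaN); finite x comes back unchanged. */
static float inf_to_nan(float x) { return x * 0.0f + x; }
```

For example, `1.0f / 1e-40f` overflows to INF because the denormal input is smaller than `1 / FLT_MAX`, while `rcp_safe(1e-40f)` returns `1 / FLT_MIN`, which is finite.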
ed	eed1590d1e	docs(twitter): add 1687257354490818561 corpus (NOTimothyLottes SPC project intro + Steam Deck HID) 7-post thread, 6 images. SPC project intro: 'like a Pico8, but turns the Steam Deck into an actual console for exclusive GPU assembly projects. The way consoles had been done before silly portability became the only focus.' GPU-accessible KEY + GAMEPAD input; no functions, no API, fixed VA address + data format. Steam Deck + 3 generations of PlayStation HID brought up; Xbox controllers aren't HIDs so skipping for now. Hot plug via epoll, every-second open attempt; {product, vendor} not unique enough (aliasing), use packet size too. Up to 4 device outputs collected for GPU; Steam Deck controls always last for docking. Audio deferred, networking skipped for v1. 2000 lines of C, no includes. Next: GPU-side assembler + data editor (binary at first).	2026-07-27 01:58:31 -04:00
ed	a642db922d	docs(twitter): add 1688286623375454208 corpus (NOTimothyLottes SPC RDNA2 assembler design) 13-post thread, 12 images. THE RDNA2 ASSEMBLER DESIGN DOC, Aug 2023 - 2 years before the Aug 2025 SPIR-V-from-data work. Same idea, more concrete, GPU-side assembler+editor building RDNA2 binary for Steam Deck. Custom binary format (no text input); 32-bit words treated identically so assembly step is massively parallel. 6-arg symbol joining + 7th arg for signed 16-bit relative branch offset. 5-char symbol names (6-bit char table, right-angle-only font, upper case, some glyph aliasing). SMEM 21-bit offset: 20 bits absolute positive from KART base, 1 MiB easy data. Source = two 128-bit values per 32-bit word: {RULE, SYMBOLS, RELATIVE BRANCH} and {LABEL, COMMENT}. 3-view editor {symbol, rule, source}, 2 copies of everything per edit, dead parallel. Interface inspired by MOD-like trackers (Impulse Tracker): simple {instrument, pattern, arrangement} with spreadsheet feel. Precursor to the 2079746130309415185 micro macro-forth and the 1951638140600381942 cart file work.	2026-07-27 01:58:27 -04:00
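Five 6-bit characters need 5 * 6 = 30 bits, so a symbol name fits in a single 32-bit word with 2 bits to spare. A sketch of that packing in C; the character table, the bit order (first character in the low bits), and the function names are assumptions, since the thread only gives the 6-bit, upper-case, 5-character constraints.

```c
#include <stdint.h>

/* Hypothetical 6-bit character table: code 0 = space, 1-26 = A-Z, 27-36 = 0-9.
   The thread does not list the real table; codes 37-63 are unused here. */
static const char kChars[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
#define KCHARS_COUNT ((uint32_t)(sizeof kChars - 1))

static uint32_t char_to_code(char c) {
    for (uint32_t i = 0; i < KCHARS_COUNT; i++)
        if (kChars[i] == c) return i;
    return 0;                                  /* unknown characters fold to space */
}

/* Pack up to 5 characters at 6 bits each into one 32-bit word,
   first character in the lowest bits; bits 30-31 stay free. */
static uint32_t pack_symbol(const char *s) {
    uint32_t word = 0;
    for (int i = 0; i < 5 && s[i] != '\0'; i++)
        word |= char_to_code(s[i]) << (6 * i);
    return word;
}

/* Unpack back to a space-padded, NUL-terminated 5-character string. */
static void unpack_symbol(uint32_t word, char out[6]) {
    for (int i = 0; i < 5; i++) {
        uint32_t code = (word >> (6 * i)) & 63u;
        out[i] = code < KCHARS_COUNT ? kChars[code] : '?';
    }
    out[5] = '\0';
}
```

Because every symbol is a fixed-width word, comparing two names is a single 32-bit compare, which fits the "32-bit words treated identically" design.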
ed	e0c4ef80b7	docs(twitter): add 1692395128739057844 corpus (NOTimothyLottes SPC CRT shader sub-pixel fix) 4-post thread, 3 images, 21 likes, 2894 views. Previous CRT shader was broken; corrected, energy-conserving sub-pixel line width raster using individual channels. Each scan line simulated with only 3 pixels but 9 sub-pixels; reaches 7/8 of display peak brightness; some perceptual scan effect preserved. Scan line thickness adaptive to linear brightness: thins = energy-conserving + thin-pixel brightness compensation; bloom = lines overlap. 'CRTs feel like they have more contrast than they do because individual scan lines are super bright when in focus; the black surround feels darker to the mind.'	2026-07-27 01:58:22 -04:00
ed	3f8a243bab	docs(twitter): add 1692565070583136348 corpus (NOTimothyLottes SPC CRT shader on Steam Deck) 4-post thread, 7 images. SPC CRT shader on Steam Deck; not doing subpix render this time but RGB vs BGR order still matters. Comparing grille / bad-convergence-scan / subpix-scan. Non-subpix: 1-pixel RGB separation (intentional bad convergence) to help hide scaling. Subpix: 1/3-pixel separation (less visible misaligned convergence). Subpix wins: 'almost 3x doubling of scanline width detail, just perceptually feels a lot cleaner and easier to reconstruct in the mind.'	2026-07-27 01:56:31 -04:00
ed	0e74369052	docs(twitter): add 1857820858162753661 corpus (NOTimothyLottes mmap-log 'no debugger no libc' design) 13-post thread with @bmcnett. Canonical statement of the mmap log design: 'printf-debugging is a horrible term; if you're debugger-free might as well be libc-free.' .log file mapped, first half = lines of fixed 64-char size (cacheline), second half = single 32-bit atomic counter for write position. Writing = increment atomic + dump line. No contention, no file IO, lock-free, captures temporal ordering across threads. Fixed format: r|sc.milmic|line_|hex_____|0000000000-|string... (r=reload, sc.milmic=time since launch, line=source line, hex, dec, msg). Tlk(__LINE__, n, 'msg') API, no printf needed. Examples show startup timing (cart mapping 0.005-0.008s) + Vulkan instance 2.4s with PSO hits at 3K us each. Multi-run log for comparison, Notepad2 F5 to reload. Discussion of fixed line widths in modern era. The cleanest documentation of the mmap log pattern - 8 months before the Aug 2025 CART file announcement.	2026-07-27 01:56:28 -04:00
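The mmap log pattern can be sketched in C on a POSIX system: a file-backed `MAP_SHARED` mapping holds fixed 64-byte lines plus one atomic write counter, and a writer claims a slot with a single atomic add, then fills it without locks or file I/O calls. This is an illustration under assumptions, not the author's code: the ring size, wrap-around, the field layout (restart|seconds|source line|hex|message, with no decimal column), `log_open`, `log_write`, and the `TLK` wrapper are hypothetical, and placing the counter after the lines is one reading of the thread's layout. Sharing the counter across processes relies on `atomic_uint` being lock-free, as it is on x86-64.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LOG_LINE_BYTES 64    /* one cache line per log line */
#define LOG_LINES      1024  /* hypothetical ring size; wraps when full */

typedef struct {
    char lines[LOG_LINES][LOG_LINE_BYTES];
    atomic_uint next;        /* write position, claimed with one atomic add */
} MmapLog;

static MmapLog *log_open(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)sizeof(MmapLog)) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, sizeof(MmapLog), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);               /* the mapping remains valid after close */
    if (p == MAP_FAILED) return NULL;
    MmapLog *log = p;
    atomic_store(&log->next, 0u);
    return log;
}

/* Claim a slot, then format a fixed-width line into it:
   restart|seconds|source line|hex|message, space padded, '\n' terminated. */
static void log_write(MmapLog *log, unsigned restart, double secs, int line,
                      uint32_t value, const char *msg) {
    uint32_t slot = atomic_fetch_add(&log->next, 1u) % LOG_LINES;
    char tmp[LOG_LINE_BYTES + 1];
    int n = snprintf(tmp, sizeof tmp, "%u|%9.6f|%5d|%08x|%s",
                     restart, secs, line, (unsigned)value, msg);
    if (n < 0) n = 0;
    if (n > LOG_LINE_BYTES - 1) n = LOG_LINE_BYTES - 1;
    memset(tmp + n, ' ', (size_t)(LOG_LINE_BYTES - 1 - n));
    tmp[LOG_LINE_BYTES - 1] = '\n';
    memcpy(log->lines[slot], tmp, LOG_LINE_BYTES);
}

/* Hypothetical Tlk-style wrapper that records the calling source line. */
#define TLK(log, secs, value, msg) log_write((log), 0u, (secs), __LINE__, (value), (msg))
```

Because each writer owns its slot after the atomic add, threads never contend on the line contents, and the slot order preserves the temporal order in which lines were claimed. Unwritten slots stay zero-filled in this sketch.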
ed	6df60c25ae	docs(twitter): add 1868492851170205803 corpus (NOTimothyLottes 'buffer zoo' + @kechogarcia + @axelgneiting) 7-post thread. NOTimothyLottes: 'Buffer zoo' is the silly season of Vulkan shader-side stuff to get what you actually want = instruction intrinsics; just layout + SSBO aliasing hints at brutal API design. Dream API: 'Read<32,64,128>(uint64_t base, uint32_t offset, uint32_t immediate, uint32_t cacheControl, uint32_t format); And be done with this stupid mess.' Rough edges of TEXEL_BUFFER: 9-bit shared 5-bit E is read-only on NV, no AMD; sRGB yes NV, no AMD; NV limits to 128M elements = {512MiB, 1GiB, 2GiB} for {32,64,128}-bit. Third sibling of the Dec 2024 buffer-zoo cluster.	2026-07-27 01:55:10 -04:00
ed	2f60019cbd	docs(twitter): add 1868436190447432064 corpus (NOTimothyLottes Vulkan 'buffer zoo' problem) 2-post thread. The real 'buffer zoo' problem with Vulkan: need HW instruction emulation macros with overcomplete both {offset, index} inputs so emulation can choose the right path based on whatever {SSBO, TEXEL_BUFFER, future pointer}. TEXEL_BUFFER for formats one can't load from SSBOs, both need 'indexes', and whenever IHVs actually correctly optimize the pointer extension, one needs byte offsets. 'Deadcode removal nightmare land wins today.' Sibling to the 1870351985855119449 STORAGE_TEXEL_BUFFER aliasing thread.	2026-07-27 01:53:21 -04:00
ed	f75e2c2442	docs(twitter): add 1868716773937414589 corpus (NOTimothyLottes 'write an assembler for GCN/RDNA' + SteamOS wish) 5-post thread with @GustavSterbrant + @AgileJebrim. NOTimothyLottes: most problems can't be fixed by bypassing GLSL and doing SPIR-V directly; not yet tempted to write a new shader language. At-home stuff targets VK Windows mostly (unfortunately) - no interface in AMD's Windows driver to load binary shaders into VK. 'If SteamOS ever fully took over PC gaming, then certainly I'd just go direct to the AMD kernel driver and bypass user-mode VK.' Would write an assembler specific for GCN/RDNA/whateversNext, not just modify an existing compiler. Precursor to the Aug 2025 SPIR-V-from-data work.	2026-07-27 01:53:18 -04:00
ed	aef260d907	docs(twitter): add 1859395250537595016 corpus (NOTimothyLottes hard-core shader dev mega-thread) 33-post thread, 13 PNGs, the canonical 'shader dev' walkthrough. Reads FXAA 3.11 / FSR1 / Unity STP, names the permutations pattern (32-bit / packed 16-bit / implicit mediump / MIN-MAX sampling). DXIL no bitfield ops vs SPIR-V (HLSL pre-processing on non-Xbox = fail). 16-bit perf: don't permute from 32-bit float constants (instant PC perf death), alias FP16 as UINT32 binary blobs. Designing shaders like GPU assembly, AMD RDNA2 as the design target. Defines map logic to associated instruction (fma, bitfieldExtract), all-ints-unsigned convention with SI1_I1()-style bitcast macros, 3-letter type macros. Compiler pattern-match bugs: tried always-UINT4 bitcast, didn't work; native types except waveops. Argument passing SSA state explosion: inout uint4[16] (all VGPRs) didn't end well - shader langs support globals, use them. Next level: mostly globals with type-bitcast aliasing; x86-64 union-of-structs-on-globals as the GPU calling ABI. Foundation for the SPIR-V-from-data + cart-file work (Aug 2025).	2026-07-27 01:52:23 -04:00
ed	a9335928c7	docs(twitter): add 1858715434436227544 corpus (NOTimothyLottes NV front-buffer bringup + 'cart file' origin) 8-post thread on NV-specific 1-deep swap with IMMEDIATE presentation, after previous AMD-tuned impl stopped working. One descriptor set always bound (resources static after init). Dropped VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT ('god awful naming length') to avoid NV indirection perf hit. 2700 lines of engine, embedded headers, one-file compile. 0.25 sec hot-load on NV dGPU: 4 MiB 'cart' file from pagecache, 512 MiB GPU buffer, all PSOs, image alloc, command buffer copy+clear. Load-time parallelism: {instance create, cart map, TLB warming by walking pages, window bringup} = 0.08 sec. After VK device, signal background SPIR-V load while building descriptor set layout, then PSO compile parallel. Earliest documented use of 'cart file' term - 9 months before the Aug 2025 public announcement.	2026-07-27 01:48:44 -04:00
ed	d3460e3b53	docs(twitter): add 1786551881504104518 corpus (NOTimothyLottes 'pure ASM is easy, bloat is hard') 7-post debate with @SebAaltonen + @Nerfoxingaround. NOTimothyLottes: back in the day wrote DOS extender (32-bit mode), Sound Blaster drivers, VGA interface, UI + audio synth + sequencer/editor, all in assembly, dead easy - a lot easier than bringing up a triangle in Vulkan. Nothing complex about ASM with a good macro preprocessor; easier than HLLs because you know exactly what you get; interrupts and syscalls are easy. The mess came later when systems forced C ABI library interfaces with their stacks and junk - then C++ made it worse. 'Pure ASM is easy, its interfacing with the bloat world that got hard.' Philosophical backbone of his single-file-C / no-debugger lifestyle.	2026-07-27 01:46:13 -04:00
ed	60231356bb	docs(twitter): add 1870351985855119449 corpus (NOTimothyLottes STORAGE_TEXEL_BUFFER aliasing) 18-post single-author technical walkthrough, 6 PNGs. Defines streaming qualifier alphabet (W=writeonly coherent, R=readonly, A=atomics, E=streaming readonly/exclusive, F=streaming writeonly/final) for future-compat code even though Vulkan is missing them. Only 32/64/128-bit type descriptors kept; type aliasing for fast path, explicit type only when buffer compression might help someday. No 16-bit <U,S>NORM, no 9E5/sRGB buffer access. AMD buffer atomics get signedness from the opcode so a UINT32 TEXEL buffer can alias r32ui/r32i. Few hundred macros for STB access. Continuation of the bind-everything-once engine cleanup.	2026-07-27 01:45:02 -04:00
ed	2dd370ae39	docs(twitter): add 1917656437473399108 corpus (NOTimothyLottes 'C is not assembly' punchline) 10-post thread. 9 of 10 posts are duplicates of posts 2-10 in the existing merged 1917646466417381426 corpus (same conversation, gallery-dl anchored to this leaf URL). The unique value here is post 10 (2025-04-30 19:04:57): 'I laugh when people say C is like assembly, they are missing what we actually did in assembly back then, which was all registers and globals and gotos, no stacks. It's radically different than good assembly.' Same content also captured in bootslop references/X.com - Onat & Lottes Interaction 1.png.ocr.md.	2026-07-27 01:42:25 -04:00
ed	df60414c98	docs(twitter): add 1948009807161721332 corpus (NOTimothyLottes single-file-C + mmap log dev setup) 13-post thread with @Karyuuntei. Single-source C includes only __FILE__; WIN32 + VK headers inlined with structural-type rewrites (64-bit ints not pointers) to get 'toward C--'. GLSL and C mixed in same file, sharing defines. One external include for compiled SPIR-V; spirv-opt as pre-processor (else IHV compilers 10x slower). Dev setup: 2 terminals each with own shell script, one loops regenerating SPIR-V, other loops recompiling+running. MINGW64 not VS. Mmap log format: {[restart]|[ms]|[line]|[hex]|[dec]|[comment]}, fixed-size lines, lock-free (one atomic add), wraps, clear = rm. 0.3ms startup.	2026-07-27 00:20:04 -04:00
ed	1dbcafd8c2	docs(twitter): add 1950860870818439202 corpus (NOTimothyLottes 'build engine from asm' announcement) 55-post thread, the highest-engagement NOTimothyLottes thread in the corpus (312 likes, 15 reposts, 20147 views), 1 MP4 video. The canonical project-announcement thread: 'people claim assembly is hard; a good counter would be showing how to build a x86-64 WIN32 Vulkan engine from scratch in ASM.' Posts 2-3 lay out the rationale (game logic on GPU = no point avoiding asm; re-arch argument-gather for cold-cache via store multicast + linear prefetch). Posts 7+ defend the WIN32-as-Linux-strategy (Valve/Proton + Wine). Post 22 reveals CRT setup (low-res, bitmap fonts). Post 50 is the punchline: no triangles, PS-free, CS-only, all gfx generated vintage-PC-style on compute GPUs. Post 51: VK wins for compute because VkEvents are pipelinable vs DX12's serializing barriers vs GL's lack of pipelining. Post 55: live systems as 'the IDE' with function keys as a save/restore palette, snapshot the entire project as one file.	2026-07-27 00:13:11 -04:00
ed	fcf70065e6	docs(twitter): add 1951347512088088657 corpus (NOTimothyLottes 'engine' project announcement) 5-post thread. Announcement of next at-home project: build-from-assembly video series + public-domain 'engine' = live-edit toy for GPU-side PC game dev. Won't run on Intel iGPUs (binding limits). Memory model is CART-style (RAM CART buffer = snapshotted + dynamic GPU = not). GPU-side editing tools for shader source + bind tables, build-the-editors-in-it, load/store to CART. Sized for what a single person could pull off, not TBs of team-gen content.	2026-07-27 00:11:57 -04:00
ed	ea9ca0e738	docs(twitter): add 2075403544111263795 corpus (NOTimothyLottes KMSDRM + framebuffer font + @darrellprograms) 6-post thread. Prefers framebuffer-console boot over graphical login; converting LottesCode6x12 font to PSF2 for framebuffer terminal source-editing; wants Vulkan swap bringup without X/Wayland. @darrellprograms suggests SDL KMSDRM option. NOTimothyLottes: 'zero development effort' is the wrong goal for him - the goal is to push the state of the art, use SDL source as reference. Tangent: VT switching + VRAM page-out + exclusive GPU ownership.	2026-07-27 00:10:16 -04:00
ed	57f26b0aec	docs(twitter): add 1688491417109196800 corpus (NOTimothyLottes CRT-on-LCD sub-pixel multiplexing) 6-post thread, 6 images, at the time of this commit the highest-engagement NOTimothyLottes thread in the corpus (98 likes, 18 reposts, 12960 views). CRT emulation on LCDs can trigger LCD hardware bugs - Steam Deck example: 2x1 {G,RB} checker inside a window affects the scan outside the window (Deck scan is 90deg rotated). Theory: 90deg scan gives {(bottom)R, G, B (top)} sub-pixel components; alternate {G,RB} per pixel for new sub-pixel pattern at different virtual resolution. Linear-energy-conserving in theory, breaks down in practice. Tangent post 6 from @realtimekeith on voltage-inversion crosstalk varying across TN/IPS/VA.	2026-07-26 22:12:06 -04:00
ed	56538f8e3d	docs(twitter): add 2061252886453944549 corpus (NOTimothyLottes debug philosophy + @retrotink2) 3-post tangent. Same root post as 2061124942968545433 (no-debugger debug / mmap log file evolving toward CART). @retrotink2 (RetroTink hardware-modder) replies 'here I am still debugging by looking at if a single LED turns on'; NOTimothyLottes: 'that plus an oscilloscope, this is the way'.	2026-07-26 22:12:03 -04:00
ed	2b2ff48a35	docs(twitter): add 2011429743053111302 corpus (NOTimothyLottes color-forth permutation space + @VPCOMPRESSB) 3-post thread, 1 image. Links a YouTube video on exploring the permutation space of color-forth-like systems. @VPCOMPRESSB asks if the program should optimize its own code at/after init for every subsequent exec to be hyper-specialized. NOTimothyLottes: yes can do that, but those systems are already faster than human response time (without external Linux kernel deps); will be able to recompile all binary code (including GPU-side) in a tiny fraction of frame time.	2026-07-26 21:43:24 -04:00
ed	212d29c5f1	docs(twitter): add 2061124942968545433 corpus (NOTimothyLottes CART-as-log + GPU-rendered hex) 3-post thread. Replaces the mmap fixed-size log file with a CART file mmapped on CPU with mapped GPU access (no file IO) + background page walker to prevent paging out. Beginning of CART is a grid of 32-bit unsigned values, hex-dumped as the 'log file' to either term or GPU render. Same framework for CPU and GPU debug ('write and it just appears'). Visualized as 4-bit/char hex terminal with hex font.	2026-07-26 21:43:23 -04:00
ed	58af0e7bb1	docs(twitter): add 1951638140600381942 corpus (NOTimothyLottes SPIR-V-from-data + 'cart file') 7-post multi-person thread (NOTimothyLottes / @onatt0 / @EskilSteenberg / @olson_dan). On-the-fly SPIR-V generation instead of GLSL: cart file = 'code+data' restartable package, macro-assembly-style language where defines are played back interleaved for ILP/loop-unroll, SPIR-V out direct (no GLSL step). Inspired by C64 SID tracker pattern + instrument. CPU does the SPIR-V generation from data the GPU can read/write.	2026-07-26 21:43:21 -04:00
ed	2fe6afa15d	docs(twitter): add 2074725506050642021 corpus (NOTimothyLottes RADV + queue setup) 5-post thread. vkGetShaderInfoAMD available on RADV (will be filing optimization bugs). DEVICE_UNCACHED_BIT_AMD works on RADV for low-latency CPU/GPU. Header-free Vulkan: VkPhysicalDeviceLimits replaced with 63 64-bit values (504 bytes) for direct byte offset. Next: another beam-racing effort on Linux, separate queue for present-only to fully decouple {dispatch, swap}.	2026-07-26 21:37:44 -04:00
ed	b3e4c9c1c6	docs(twitter): add 2075769880092037309 corpus (NOTimothyLottes Vulkan queue-choice rethink) 4-post thread, at the time of this commit the highest-engagement NOTimothyLottes thread in the corpus (42 likes, 5063 views, 2 reposts). Don't search for the ideal {presentation,graphics,compute} queue - just use queue 0. AMD: render via compute on queue 0, present on queue 1, decouple for front-buffer racing. NVIDIA: queue 1 is DMA, queue 2 is compute; queue 0 = gfx + present, dispatch on queue 2 by default. Tangent: someone complaining about the bitmap font in Post 1 looking like a captcha.	2026-07-26 21:37:42 -04:00
ed	ffc39fc6a2	docs(twitter): add 2076833886827376782 corpus (NOTimothyLottes J_/JNzI1_ goto macros + nanorc) 6-post thread. Root: commits fully to {if,goto} control flow via better macros (J_(label) -> goto label; JNzI1_(label,v) -> if(v!=0) goto label;). Customizes nanorc syntax-highlight to color them. Tangent: someone suggesting Perl DSL, not invited to the 'better software' conference.	2026-07-26 21:36:18 -04:00
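With the two macros exactly as given in the thread, a loop becomes a forward entry test plus a backward conditional branch at the bottom. A minimal sketch; `sum_down` is a hypothetical example function. Because `JNzI1_` expands to a bare `if`, a following `else` would attach to it, so it is safest used as a standalone statement.

```c
/* The two macros as described in the thread. */
#define J_(label) goto label
#define JNzI1_(label, v) if ((v) != 0) goto label

/* Sum n + (n-1) + ... + 1 using only {if, goto} control flow. */
static unsigned sum_down(unsigned n) {
    unsigned acc = 0;
    JNzI1_(loop, n);     /* forward branch into the loop when n != 0 */
    J_(done);
loop:
    acc += n;
    n -= 1;
    JNzI1_(loop, n);     /* backward branch: the usual, taken case */
done:
    return acc;
}
```

Writing the loop this way makes the branch directions visible in the source: the hot path is the backward branch, which static prediction treats as taken.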
ed	186a71321a	docs(twitter): add 2078729662021111958 corpus (NOTimothyLottes 64KiB aligned jump-window + @noop_dev exchange) 11-post thread. Root: 4-byte overhead interpreter with 64KiB aligned window of directly-jumpable words (write to ax doesn't change other 48 bits), cuts source size in half. Tangent with @noop_dev covering cold-cache misses, runtime-macroassembler idea, 4K Atari 2600 emu precedent. End: GPU code generation for AMD where 8+ bitfields in an opcode means the simple interpreter won't work.	2026-07-26 21:36:15 -04:00
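One way to read the "write to ax doesn't change other 48 bits" note is through x86-64 partial-register semantics: a 16-bit write to AX replaces only bits 0..15 of RAX (unlike a 32-bit write to EAX, which zero-extends into the upper half). If RAX already points into a 64 KiB-aligned code window, loading a 16-bit word offset keeps the jump target inside that window. The C model below is a sketch of that arithmetic only; `write_ax` is a hypothetical helper name.

```c
#include <stdint.h>

/* Model of a 16-bit write into AX: bits 16..63 of RAX are preserved and
   only bits 0..15 are replaced, so the result stays in the same 64 KiB window. */
static uint64_t write_ax(uint64_t rax, uint16_t ax) {
    return (rax & ~(uint64_t)0xFFFF) | ax;
}
```

For example, with RAX = `0x00007F0012340000`, writing `0x0ABC` to AX yields `0x00007F0012340ABC`, so a following `jmp rax` lands at offset `0x0ABC` in the window.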
ed	f603a7bf69	docs(twitter): add 2078349730204078527 corpus (NOTimothyLottes 8-byte aligned Forth word) 4-post x86-64 interpreter work: all 0-6 arg syscalls in 32 bytes, embed interpreter inside words with 3-byte overhead (AD lodsd + FF E0 jmp rax), force lower 32-bit but keep 64-bit, pack interpreted forth words in aligned 8-bytes. Tangent post 4 from @NOTimothyLottes self-replying about custom bytecode + on-load decompression.	2026-07-26 21:36:13 -04:00
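The 3-byte `lodsd` + `jmp rax` tail replicates the interpreter's NEXT step inside every word instead of returning to a central dispatch loop. The sketch below shows that replicated-dispatch shape in C, with two swapped-in choices that are not from the thread: it uses the GCC/Clang "labels as values" extension (`&&label`, `goto *p`), and it dispatches on 8-bit tokens through a table (token threading) rather than loading 32-bit word addresses. The token set and `run` are hypothetical, and there are no stack bounds checks.

```c
#include <stdint.h>

enum { T_LIT, T_ADD, T_MUL, T_HALT };   /* hypothetical 8-bit tokens */

/* Runs a token program over a small data stack. Each word ends with its own
   copy of NEXT, so there is no shared dispatch loop to return to.
   Requires GCC/Clang "labels as values"; not standard C. */
static int64_t run(const uint8_t *ip, const int64_t *lits) {
    static void *const table[] = { &&w_lit, &&w_add, &&w_mul, &&w_halt };
    int64_t stack[16];
    int sp = 0;
#define NEXT goto *table[*ip++]
    NEXT;
w_lit:  stack[sp++] = *lits++;            NEXT;
w_add:  sp--; stack[sp - 1] += stack[sp]; NEXT;
w_mul:  sp--; stack[sp - 1] *= stack[sp]; NEXT;
w_halt: return stack[sp - 1];
#undef NEXT
}
```

The program `{T_LIT, T_LIT, T_ADD, T_LIT, T_MUL, T_HALT}` with literals `{2, 3, 4}` computes `(2 + 3) * 4`.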
ed	60510ac349	docs(twitter): add 2078257788732461389 corpus (NOTimothyLottes nasm-bootstrapped 4KiB Linux Forth) 4 posts, 1 image. Post 1 = the actual content (nasm, ELF64, 4 KiB binary target, radical Forth, no C baggage). Posts 2-4 = tangent with @furan about the bitmap font in Post 1's screenshot. gallery-dl pulled the whole conversation tree.	2026-07-26 21:35:03 -04:00
ed	d2ee4f0ea0	docs(twitter): add 2079746130309415185 corpus (NOTimothyLottes micro macro-forth rough-draft) 5-post design walkthrough of a branch-free 8-bit/word color-forth variant for code generation: ~62 dict entries/page with paging, call/return inlined by the compiler, branch-free compile + branch-free execution, lookup-table pre-compile writes unaligned 8-bytes. Intended for tiny self-contained chunks that include their own codegen. Whitney-esque in single-char vars, non-Whitney in having no higher-order arrays.	2026-07-26 21:33:59 -04:00
ed	41be106888	docs(twitter): add 2076534605193036043 corpus (NOTimothyLottes X_/G_ macro + assembly-style control flow) 3-post companion to 2063733456144597200 + 2076893128515112995: defines X_ = return, G_ = goto. Moves from C-style {if,while,do,switch,for} to assembly-style {if,goto} so static branch prediction (backward=taken, forward=not_taken) is explicit. Plus exit-with-error now properly drains the background console render thread.	2026-07-26 21:22:36 -04:00
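A minimal C sketch, assuming X_ and G_ are plain aliases for `return` and `goto` as the commit states. The hypothetical `find_zero` function lays out the loop so the rare exits are forward branches (statically predicted not-taken) and the loop-back is a backward branch (statically predicted taken). Compilers are free to reorder basic blocks, so the source layout only expresses intent; check the disassembly to confirm.

```c
/* Aliases as described in the commit: X_ = return, G_ = goto. */
#define X_ return
#define G_ goto

/* Hypothetical example: index of the first zero byte in p[0..n), or -1. */
static long find_zero(const unsigned char *p, long n) {
  long i = 0;
  if (n <= 0) G_ none;        /* forward branch: rare empty input */
top:
  if (p[i] == 0) G_ found;    /* forward branch: loop exit, predicted not-taken */
  i++;
  if (i < n) G_ top;          /* backward branch: loop-back, predicted taken */
none:
  X_ -1;
found:
  X_ i;
}
```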
ed	175bce00f3	docs(twitter): add 1737201090058219980 corpus (NOTimothyLottes logger + multi-session design) 3-post sibling to the 1736161886079533186 VK retrospective: same public-domain release context, Dec 2023. Covers the mmap-ring-buffer error logger and the multi-session rationale (crash auto-reload + log preservation, no debugger).	2026-07-26 21:21:17 -04:00
ed	c271307215	docs(twitter): add 1736161886079533186 corpus (NOTimothyLottes VK engine retrospective) 10-post single-author walkthrough of his old Vulkan pipeline: warm-all-pages startup, auto-relaunch on crash, single-SPIR-V plus spec-constants, Bind-Everything-Once, SSBO-as-4-types aliasing, GPU-side game logic, hardware-style fixed resources. High engagement (36 likes, 3778 views). Closes with the ruthless-anti-complexity thesis.	2026-07-26 21:19:35 -04:00
ed	1d581651fe	docs(twitter): add 2073543950497898839 corpus (Wine audio exclusive-mode bug)	2026-07-26 21:17:53 -04:00
ed	0f80938a55	docs(twitter): add 2076893128515112995 corpus (NOTimothyLottes error-check disasm) 6-post single-author walkthrough of the fast-path error-check pattern: volatile store __LINE__ + error code, TEST, conditional forward branch to a distant Err() call. Companion to the 2063733456144597200 thread which established the macro system via the conversation with @winning_tactic.	2026-07-26 21:15:16 -04:00
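A minimal C sketch of the fast-path error-check shape the commit summarizes: store `__LINE__` and an error code to volatile memory, test the condition, and branch forward to a single out-of-line `Err()` call. The names `CHK_`, `err_line`, `err_code` and `parse_digit` are hypothetical; only `Err()` and the store/test/branch sequence come from the thread summary. Real code would likely also mark `Err()` cold and noinline with compiler-specific attributes so it lands far from the hot path.

```c
#include <stdio.h>

/* Hypothetical volatile slots written on every check, so the hot path is
   just two stores, a test, and a not-taken forward branch. */
static volatile int err_line;
static volatile int err_code;

/* Out-of-line error handler: reports the last stored line/code. */
static int Err(void) {
  fprintf(stderr, "error %d at line %d\n", err_code, err_line);
  return err_code;
}

/* Store where + what, then test and branch forward to the cold path. */
#define CHK_(cond, code) do { \
    err_line = __LINE__;      /* volatile store: where */ \
    err_code = (code);        /* volatile store: what  */ \
    if (cond) goto err;       /* test + forward branch */ \
  } while (0)

/* Hypothetical example: parse one decimal digit; returns 0 on success. */
static int parse_digit(char c, int *out) {
  CHK_(c < '0', 1);
  CHK_(c > '9', 2);
  *out = c - '0';
  return 0;
err:
  return Err();               /* single distant error call per function */
}
```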
ed	0d51c3541d	fix(twitter_threads): strip trailing slashes from --media-dir and --output Path objects with trailing separators (e.g. './media/') intermittently failed to glob on Windows after download_media.py had just finished writing the directory. Normalize via rstrip('/\\\\') before Path() re-wrapping so the CLI is forgiving regardless of OS path quirks. Repro: gallery-dl ran + immediate render with trailing slash on the just-created media/ dir -> glob returned nothing -> markdown emitted no ![Media N] lines. Removing the trailing slash fixed it; this makes both work.	2026-07-26 21:12:20 -04:00
ed	8a1e2ecc9e	docs: TRACK_COMPLETION + chronology row + state.toml superseded Track closed per user direction with PARTIAL completion (17 of 31 tasks done; 13 deferred to followup tracks). TRACK_COMPLETION_test_suite_cleanup_gemini_cli_removal_20260705.md records the 12-VC status (7 PASS, 5 NOT DONE / NOT VERIFIED), the phase-by-phase breakdown, branch state, risks, and hand-off notes. conductor/chronology.md: Active row updated to 'Partially Completed' with the commit range + summary of completed/deferred work. state.toml: status = 'superseded'; current_phase kept at 1 (mid-Phase 1) to reflect actual stopping point; new [followup_tracks] section records the two upcoming tracks: - vendor_ai_client_track (Front C metadata work) - test_de_crufting_track (Front B cruft work)	2026-07-05 21:29:05 -04:00
