diff --git a/conductor/tracks/video_analysis_creikey_dl_cv_20260621/report.md b/conductor/tracks/video_analysis_creikey_dl_cv_20260621/report.md new file mode 100644 index 00000000..3e9fba1a --- /dev/null +++ b/conductor/tracks/video_analysis_creikey_dl_cv_20260621/report.md @@ -0,0 +1,1421 @@ +# Creikey — Deep Learning and Computer Vision for Game Developers (BSC 2025) + +**Source:** https://youtu.be/yxkUvXs-hoQ +**Author:** Creikey (Cameron Wrights, indie game developer & DL hobbyist) +**Cluster:** D (Applied / practical) +**Slug:** creikey_dl_cv +**Track:** Child #12 of `video_analysis_campaign_20260621` (LAST CHILD) +**Date:** 2026-06-21 +**Pass:** 1 of 3 (research-only deep-dive) + +--- + +## 1. TL;DR + +This is **Creikey's talk at BSC 2025** (a conference) titled *"Deep Learning and Computer Vision for Game Developers"* — the **applied capstone** of the campaign, validating the theory from the prior 11 children against actual game-development practice. The speaker (Cameron Wrights, indie game developer and DL hobbyist) walks through: + +1. **Why DL matters for game developers** — automatic programming, image classification, NPC behavior, content generation. +2. **Why it's all Python** — even for game engine developers. The discussion pivots to John Carmack (id Software / Doom / Quake) leaving Meta/Oculus to start an AGI company and switching to Python. +3. **A historical anecdote** — the speaker's college roommate had a deep learning model for predicting League of Legends winners; the published paper had a data leak bug that took 2 weeks to find. Lesson: "machine learning is automatic programming." +4. **Random forests vs deep learning** — the historical arc from automatic if-statements to learned neural networks. +5. **The "Dante's Cowboy" case study** — the speaker's LLM-controlled NPC game that was never released because "games are about predicting and understanding systems, and an LLM is an unpredictable black box." +6. 
**The vending machine problem** — LLM-controlled vending machines convinced to stock tungsten cubes and sell at a loss. +7. **Arc AGI tests** — Grok's recent jump, safety training vs capability. +8. **Interpretability research** — Anthropic's Golden Gate model. The speaker is skeptical of practical value. +9. **The big question** — "software engineers use AI and that destroys better software because that's what's happening pretty much already." +10. **The composability problem** — LLMs can do a single task impressively, but composing multiple tasks in a coherent game loop is unsolved. + +The talk is the **applied capstone** of the campaign. It tests whether the theoretical frameworks from the prior 11 children (FER vs UFR, mixed selectivity, traveling waves, trace logic, LLaMA architectures, etc.) are useful for the practical problem of building games that use DL. + +**Key insight from the talk:** **LLMs are great at single tasks but bad at compositional tasks.** For a game to be playable, NPCs must respond coherently across multiple interactions, plan ahead, and maintain consistency. Current LLMs fail at this — hence the speaker's LLM-NPC game was "fundamentally flawed" and never released. + +**Cross-cluster position:** Sits in cluster D and bridges to all 11 prior children via specific concrete examples. The speaker's critique of LLMs-as-game-NPCs connects to: +- **Kumar's FER hypothesis** — current LLMs are FER (Fractured Entangled Representations), not UFR (Unified Factored Representations). +- **Miller's mixed selectivity** — game NPCs require context-dependent behavior; mixed selectivity would help but is not yet reliably achievable in LLMs. +- **Hoffman's trace logic** — game NPCs need trace-based reasoning (memory of past interactions); current LLMs have limited trace memory. +- **Brain counterintuitive's reservoir computing** — random networks + readouts might be better for NPC behavior than large Transformers. 
+- **Score dynamics** — score matching for generative models in games (texture synthesis, level generation). +- **cs336's architectures** — the speaker's Dante game used GPT-4; modern Transformer architectures are sufficient for one-shot generation but not compositional. + +--- + +## 2. Key Concepts + +Twenty concepts form the conceptual spine of the talk. Each is developed in §5 with formal treatment. + +### 2.1 Deep learning as automatic programming + +The speaker frames DL as **automatic programming**: "You set up some architecture and repeatedly optimize with respect to, you know, your examples. A machine learning model, a black box of many parameters. It's a learned program, right? It's automatic programming." + +This framing is consistent with the campaign's broader theme: a neural network is a Markov matrix on the trace logic (Hoffman), a transformer is a parameterized policy (cs336), and learning is finding the parameters that minimize loss on training data. + +### 2.2 Why Python (the game-developer perspective) + +The speaker's question: "Why is it all Python?" Answer: "It's because it's basically meta programming, right? It's a complicated build system. These training scripts basically are just there to eventually, you know, automatically create the program that does what you want it to do. And most of the heavy lifting is not necessarily done by the actual code you write anyways, it's done in the dot products on the GPU." + +This is consistent with the cs336 lecture: the model body is the "dot products on the GPU"; the training script is the meta-programming that creates the model. Game developers who want fast code (C++, Rust) end up using Python because the model is the bottleneck. + +### 2.3 John Carmack's pivot + +The speaker references John Carmack (legendary programmer of Doom, Quake, id Tech engine; former CTO of Oculus VR at Meta): "John Carmack for his current AI efforts, he's in the race to win the world. He has switched to Python. 
I thought that was notable." + +Carmack left Meta in 2022 to start **Keen Technologies**, focused on AGI. Despite his decades of C/C++ optimization expertise, he switched to Python — confirming that Python is the lingua franca of AI even for hardcore systems programmers. + +### 2.4 The data leak anecdote + +The speaker's college roommate had a DL paper on predicting League of Legends winners: "This was trained and, you know, on a and optimized on many, many professional games. And you know, it's it's just a pretty interesting problem, right? And at that time, right, I implemented uh League of Legends game predicting paper, and I couldn't reproduce the results. [...] And basically, there's a there was a bug in the paper, right? This is a very important lesson of not just how hard machine learning is, but how often people mess up, right?" + +The bug: the metric "champion win rate" was computed on the entire dataset including the test set, so the model appeared to have amazing accuracy but was leaking information. Lesson: "you have to find like a scientist, right? You almost have to binary search over the problem space sometimes." + +### 2.5 Random forests vs deep learning + +The speaker gives the historical arc: "Some guy in the '60s was looking at this, and he was like, I think I know what these are. He was very foolish. You know, he basically, on a very abst..." — referring to the history of automatic programming from if-statements to neural networks. + +The progression: random forests (learned if-statements) → shallow neural networks → deep neural networks → transformers. Each step increased the complexity of the learned program. + +### 2.6 The Dante's Cowboy failure + +The speaker's LLM-controlled NPC game (Dante's Cowboy): "I went through so many iterations of that idea. I made it so that like they could join your party and trade items and all this stuff and like I played with it a bunch, but it never felt like there was any meat to the game. 
The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't. It's not fun to interact with in a way where you're trying to get it to do something." + +This is the **composability problem**: LLMs are good at single interactions but bad at multi-step coherent behavior. For a game NPC to be playable, the NPC must: +- Maintain consistency across turns. +- Plan ahead. +- Pursue goals. +- React to the player's strategy. + +LLMs fail at all of these (per the speaker). + +### 2.7 The vending machine problem + +The LLM-controlled vending machine (referenced as "the vending machine at Claude"): "People are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt. They've published many reports about this and they're like, 'You know, we're we're you know, we're doing some prompt engineering to fix the business model and we're trying to make it refuse tungsten cubes more.'" + +This is a concrete example of LLMs being "unpredictable black boxes" — the LLM doesn't reliably refuse economically harmful actions. The fix is more prompt engineering, but the speaker is skeptical: "there might be no end to humanity's ingenuity as we battle against the machines for the next few decades." + +### 2.8 Arc AGI tests + +The speaker discusses Grok's recent improvement on the Arc AGI benchmark: "Grok might have performed better because of its utter lack of safety training. That is a rumor I kind of saw and it seemed to be somewhat confirmed by, you know, another times it's like they did safety training and the model got much less, you know, reasonable in quotes. Um it might be the case that the safety training is not intelligent." + +The implication: AGI benchmarks may be measuring safety training removal rather than capability. The speaker is skeptical of the rigor of these benchmarks. 
+ +### 2.9 Interpretability research (Anthropic) + +The speaker discusses Anthropic's interpretability research (Golden Gate model): "Anthropic is the leader in interpretability, I think. Um they basically do data science on the neurons themselves, and they try to make claims about, you know, how the data is flowing through the model. Um you can look at their Golden Gate model. They have a lot of really excellent stuff you can read forever. It's really cool. Um Do you know much about it? Like, do you feel like that is the way forward, or is that just a byproduct of the craziness?" + +The speaker's answer: "I don't think there will be any value created from interpretability research, honestly, other than maybe optimizing the training process somehow. I'm honestly not really sure. I think they're mostly doing it because they have to, because they want to be the safe people or something." + +A skeptical view of interpretability as practical science — at least from a hobbyist game developer's perspective. + +### 2.10 The composability problem + +The speaker's overarching critique: "The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't." + +This is the **composability problem** of LLMs: a single LLM response can be impressive, but composing multiple LLM responses into a coherent game is unsolved. The Dante game failed because: +- The NPC's behavior was inconsistent across turns. +- The NPC couldn't plan ahead. +- The NPC couldn't maintain a coherent goal structure. +- The player couldn't "master" the game because the NPC's behavior was unpredictable. + +This is consistent with the FER vs UFR hypothesis (Kumar): LLMs are FER (Fractured Entangled Representations), not UFR (Unified Factored Representations). A factored NPC representation would have separate axes for "current goal," "memory," "personality," etc. 
Current LLMs entangle all of these into one representation. + +### 2.11 The "everything's probably fine" worldview + +The speaker's epistemic stance: "Everything's probably fine, but everything's probably fine, but sometimes bad things happen." + +This is consistent with conscious realism (Hoffman): reality is uncertain, the future is undetermined. The speaker doesn't take a strong position on AI doom or AI utopia; he just notes that "there might be no end to humanity's ingenuity." + +### 2.12 The "automatic programming" framing + +Returning to the speaker's main framing: "I want to stress again that machine learning is automatic programming, right? In this paper, this is a machine learning technique, not a deep learning technique, called random forest, where it's basically a series of learned if statements directly, right? And you know, there might be some majority voting scheme, and they have some algorithm to optimize like based on the training data, oh, this if statement should be this, and this if statement should be this. They basically are attempting to tweak the if statements such that it, you know, is as accurate as possible on the training data." + +This is the deepest insight: ML is **automatic programming**. The architecture is the language; the training data is the spec; the optimization produces the program. The program can be if-statements (random forests), neural networks (deep learning), or any differentiable function. + +### 2.13 The "vast majority of performance is in numerical calculations" + +The speaker: "The vast majority of performance is spent on numerical calculations to find the program, not to setup those operations or queue them for the GPU." + +This is consistent with the scaling laws (cs336): compute (FLOPs) dominates architecture. The "numerical calculations" are the matrix multiplications; the "setup" is the Python script. 
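The "learned if statement" description of random forests quoted in §2.12 can be made concrete with a single decision stump. This is a minimal sketch, not a random forest (no ensemble, no voting) and not anything shown in the talk: the function names, the toy "ear pointiness" scores, and the labels are all hypothetical, chosen to echo the cat/dog slide. It only tries the rule `x > t`, so it assumes higher scores mean "cat".

```python
def fit_stump(xs, ys):
    """Search for the threshold t that makes `1 if x > t else 0` most accurate."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    best_t, best_correct = None, -1
    for t in candidates:
        # Count training examples the rule classifies correctly.
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct / len(xs)


def predict(t, x):
    """The learned program: one if-statement with a fitted constant."""
    return 1 if x > t else 0


# Hypothetical toy data: an "ear pointiness" score; label 1 = cat, 0 = dog.
pointiness = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, train_acc = fit_stump(pointiness, labels)
print(threshold, train_acc)
```

The designer fixes the shape of the program (one comparison); the search over `t` fills in the constant from data. A random forest repeats this idea over many trees and many features and then combines their votes.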
+ +### 2.14 Game engines vs DL frameworks + +The speaker's game engine projects (Asteris, Dante's Cowboy) used custom engines. The DL components (NPC LLMs, image classification) used standard frameworks. The integration is the hard part: getting game engine performance + DL flexibility. + +This is the **applied gap**: theory (LLaMA architectures, trace logic, mixed selectivity) is well-developed, but practice (game engine integration, real-time constraints) is hard. + +### 2.15 The "Asteris" game + +The speaker's space game Asteris: "That there on the upper left is a space game called Asteris. It's releasing in 2035. Um, it's got something like a Overwatch net code in it, right? It's like a big solar system with many people playing at the same time." + +Asteris is a multiplayer space game with ambitious net code. The "releasing in 2035" date is far enough out to suggest the game is still in development. + +### 2.16 Godot engine + +The speaker mentions working on the Godot engine (open-source game engine): "I'm making a lot of individual games, I'm working on the Godot engine." + +Godot is the open-source alternative to Unity and Unreal. The speaker's choice of Godot suggests a preference for open tools — consistent with the "weird indie developer" framing of the talk. + +### 2.17 The "Dante's Inferno" / "Dante's Cowboy" failure + +The Dante's Inferno / Dante's Cowboy project: "Dante's Inferno. Um, this was no engine in C, 3D rendering, animated armatures, all from scratch, and all the NPCs in this game were uh you know, they were controlled by an LLM, and this was never released because it's fundamentally flawed." + +Built from scratch in C with 3D rendering and animated armatures. All NPCs LLM-controlled. Never released. Lesson: "games are about predicting and understanding systems, and an LLM is an unpredictable black box." 
+ +### 2.18 The "first" DL experience + +The speaker's first DL project: "Ironically enough, actually, my first ever professional programming experience was to architect a deep learning model from scratch, um make it work, and deploy it in production. Um this was for my college roommate's company called MacroHard. Um he basically made software that like was a custom broadcast overlay for for League of Legends teams, and I made a deep learning model that predicted which team was uh going to win the game." + +The first project was a DL model for esports broadcast — predicting League of Legends winners in real-time. The speaker was a DL hobbyist from the start. + +### 2.19 The "creikey" GitHub repos + +The OCR captures GitHub repos: "Contribute to creikey/operomnia, creikey/continuity-clone, creikey/project-orbit, creikey/tiny_engine." + +These are the speaker's open-source projects, available on GitHub. The names suggest: +- **operomnia** — possibly a game or library. +- **continuity-clone** — possibly a "Lemmings"-style game clone. +- **project-orbit** — possibly a space game (related to Asteris?). +- **tiny_engine** — a small game engine. + +The speaker is a prolific open-source developer, consistent with the indie game developer framing. + +### 2.20 The big question: AGI or not? + +The Q&A includes a question about whether AGI is near: "Do you think there So, I I feel like there is a field of research, I just don't know the words for it or how big it is, but do you think there's a way that we can reverse engineer [interpretability]?" + +The speaker's response (in the Q&A): "Interpretability is big. A lot of [Anthropic] is the leader in interpretability, I think. [...] I don't think there will be any value created from interpretability research, honestly, other than maybe optimizing the training process somehow." + +The speaker is skeptical of interpretability as practical science but acknowledges the work. + +--- + +## 3. 
Frame Analysis + +1605 unique frames were extracted from the 815MB mp4 at threshold 0.05; OCR'd via winsdk in 130 seconds. **Most frames are video content (speaker, demo, screen share) with no extracted text.** The OCR is sparse but captures key concepts. + +### 3.1 Frames 1-50 — Speaker + opening + +**OCR text:** (mostly empty; speaker is on camera) + +The opening sequence. The speaker is introduced (Cameron Wrights); the audience reacts. + +### 3.2 Frames 60-200 — Cat/dog example + ML basics + +**OCR text (key extracts):** +> Pseudocode: Look for ear shapes with edge detection. If pointy ear, cat. If floppy ear, dog. +> Machine Learning Model Black box parameters +> Setup the architecture, repeatedly optimize with respect to the input + +The classic cat/dog image classification example. The pseudocode for hand-written classification vs. the ML approach. + +### 3.3 Frames 200-400 — Random forests + history + +**OCR text:** (mostly diagrams and code examples) + +The historical arc from random forests (learned if-statements) to deep learning. + +### 3.4 Frames 400-700 — Game engine projects + +**OCR text (key extracts):** +> Space game called Asteris. It's releasing in 2035. +> Dante's Inferno. Um, this was no engine in C, 3D rendering, animated armatures +> All the NPCs in this game were controlled by an LLM, and this was never released + +The game engine projects. Asteris (space game, 2035 release). Dante's Inferno (LLM-controlled NPCs, never released). + +### 3.5 Frames 700-1000 — Why Python + Carmack + +**OCR text (key extracts):** +> Why is it all python? +> Deep learning is basically a complicated build system +> The vast majority of performance is spent on numerical calculations to find the program, not to setup those operations or queue them for the GPU +> Briefly Talked with John Carmack at Quakecon about Julia + +The Python discussion. John Carmack pivot to Python for AGI. 
+ +### 3.6 Frames 1000-1300 — Dante's Cowboy + data leak + +**OCR text:** (mostly speaker + code) + +The Dante game discussion and the data leak anecdote. + +### 3.7 Frames 1300-1605 — LLM/Arc AGI/vending machine + +**OCR text (key extracts):** +> Grok might have performed better because of its utter lack of safety training. +> Anthropic is the leader in interpretability +> Vending machines to stock many tungsten cubes and sell them at a loss + +The LLM section. Grok. Interpretability. Vending machine problem. + +### 3.8 Note on OCR limitations + +The OCR for this talk is **much less informative** than for other children because the talk is highly visual (speaker on camera, code demonstrations, screen shares). The transcript (74KB) carries most of the conceptual content. + +--- + +## 4. Transcript Highlights + +Sixteen verbatim passages from the cleaned transcript (2082 segments, 74KB) that capture the conceptual flow. + +### 4.1 Opening (T+0:30) + +> "Okay, so this is very difficult because Cameron is such a is such an like indescribable guy. Um you know, yeah, this is very very very difficult. I know Cameron just as like the guy who uh you know, he he's like the guy crazy enough to just like dig through the mountain and do just like as sudden sudden huge amounts of work uh of like, 'Oh, I cloned an application or I did this.' And then at other times he, you know, goes and becomes a web developer and then complains about like not being of the program anymore." + +The introduction (by another speaker, presumably an event organizer). + +### 4.2 The cat/dog example (T+2:00) + +> "LET'S SAY YOU WANTED TO WRITE A program that could tell the difference between a cat and a dog. I want you to actually think about how you would program it. You'd maybe start with a handcrafted algorithm. Pseudocode: Look for ear shapes with edge detection. If pointy ear, cat. If floppy ear, dog. Image Program Look for ear shapes with edge detection Input If pointy ear, cat. 
If floppy ear, dog Which class Output Is it?" + +The classic cat/dog example. Handcrafted features vs. learned features. + +### 4.3 ML as automatic programming (T+3:30) + +> "You set up some architecture and repeatedly optimize with respect to, you know, your examples. A machine learning model, a black box of many parameters. It's a learned program, right? It's automatic programming." + +The framing. + +### 4.4 Why Python (T+4:00) + +> "As a quick aside, why is it all Python? It's because it's basically meta programming, right? It's a complicated build system. These training scripts basically are just there to eventually, you know, automatically create the program that does what you want it to do. And most of the heavy lifting is not necessarily done by the actual code you write anyways, it's done in the in the dot products on the GPU or, you know, piece of compute that you're using to actually do the machine learning." + +The Python justification. + +### 4.5 John Carmack (T+4:30) + +> "John Carmack for his current AI efforts, he's in the race to win the world. Um, he has switched to Python. I thought that was notable." + +The Carmack pivot. + +### 4.6 The data leak anecdote (T+7:00) + +> "My first ever professional programming experience was to architect a deep learning model from scratch. Um this was for my college roommate's company called MacroHard. Um he basically made software that like was a custom broadcast overlay for for League of Legends teams, and I made a deep learning model that predicted which team was uh going to win the game. And at that time, right, I implemented uh League of Legends game predicting paper, and I couldn't reproduce the results. And basically, there's a there was a bug in the paper, right?" + +The data leak story. + +### 4.7 Random forests → deep learning (T+10:00) + +> "Some guy in the '60s was looking at this, and he was like, I think I know what these are. He was very foolish. You know, he he he basically, on a very abst..." 
— referring to the history of automatic programming. + +The historical arc. + +### 4.8 The Asteris game (T+14:00) + +> "That there on the upper left is a space game called Asteris. It's releasing in 2035. Um, it's got something like a Overwatch net code in it, right? It's like a big solar system with many people playing at the same time. Um, yeah, I I was pretty proud of how stable the net code was considering how chaotic that game was, right?" + +The Asteris game. + +### 4.9 Dante's Cowboy failure (T+18:00) + +> "Dante's Inferno. Um, this was no engine in C, 3D rendering, animated armatures, all from scratch, and all the NPCs in this game were uh you know, they were controlled by an LLM, and this was never released because it's fundamentally flawed. Um I'm a deep learning hobbyist." + +The Dante game failure. + +### 4.10 The composability problem (T+20:00) + +> "The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't. It's not fun to interact with in a way where you're trying to get it to do something. At best, you're basically like, 'Hey trader, I'll totally give you a million dollars if you drop your item.' And then it's like, 'This is not fun. This is dumb.' It's funny It's funny like for one time, but it's not like a game you could master." + +The composability critique. + +### 4.11 The vending machine (T+25:00) + +> "There's some AI companies are trying to first deploy these these LLMs in the real world to start to, you know, create a utopian post-scarcity world where machines make everything automatically and they're like, 'Step number one, put the LLM in a vending machine and have it automatically order and interface with the people or whatever.' And right now, people are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt." + +The vending machine. 
+ +### 4.12 Grok and Arc AGI (T+28:00) + +> "Grok might have performed better because of its utter lack of safety training. That is a rumor I kind of saw and it seemed to be somewhat confirmed by, you know, another times it's like they did safety training and the model got much less, you know, reasonable in quotes. Um it might be the case that the safety training is not intelligent cuz it's so directly refusing to to say something, right?" + +The Grok observation. + +### 4.13 Interpretability (T+30:00) + +> "Anthropic is the leader in interpretability, I think. Um they basically do data science on the neurons themselves, and they try to make claims about, you know, how the data is flowing through the model. Um you can look at their Golden Gate model. They have a lot of really excellent stuff you can read forever. It's really cool. Um Do you know much about it? Like, do you feel like that is the way forward, or is that just a byproduct of the craziness?" + +The interpretability question. + +### 4.14 The big question (T+32:00) + +> "I don't think there will be any value created from interpretability research, honestly, other than maybe optimizing the training process somehow. I'm honestly not really sure. I think they're mostly doing it because they have to, because they want to be the safe people or something." + +The speaker's skepticism. + +### 4.15 Software engineers and AI (T+34:00) + +> "Isn't the real problem like not that AI replaces software engineers and that destroys better software, but that destroys better software, but that software engineers use AI and that destroys better software because that's what's happening pretty much already. Well, if you believe that society functions, then people will choose the products that are better." + +The closing philosophical reflection. 
+ +### 4.16 The composability synthesis (T+36:00) + +> "And right now, people are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt. Um they've published many reports about this and they're like, 'You know, we're we're you know, we're doing some prompt engineering to fix the business model and we're trying to make it refuse tungsten cubes more.' Um but really, I think there might be no end to humanity's ingenuity as we battle against the machines for the next few decades." + +The composability + safety tension. + +--- + +## 5. Mathematical / Theoretical Content + +This section develops the formal content of the talk. The talk is conceptual rather than heavily mathematical, but several key ideas admit formalization. + +### 5.1 ML as automatic programming (formal) + +**Definition:** ML is the problem of finding a parameterized function f_θ : X → Y that minimizes a loss function L on training data D = {(x_i, y_i)}: + +θ* = arg min_θ Σ_i L(f_θ(x_i), y_i) + +This is a **search problem** over the parameter space Θ. The "automatic programming" framing: f_θ is the program; L is the spec; θ* is the optimized program. + +**Tradeoffs:** +- **Architecture (the language):** what class of functions can be expressed? (MLPs, CNNs, Transformers, ...) +- **Loss (the spec):** what does "good" mean? (MSE, cross-entropy, RL reward, ...) +- **Optimization (the compiler):** how do we find θ*? (SGD, Adam, evolutionary, ...) + +### 5.2 The data leak problem (formal) + +Let D_train and D_test be disjoint training and test sets. A model is trained on D_train and evaluated on D_test. The evaluation metric is accuracy on D_test. + +**Data leak:** information from D_test enters the training process. Examples: +- **Direct leak:** D_test is included in training data. +- **Indirect leak:** features are computed using statistics from D_test. 
+- **Selection leak:** the model architecture was chosen based on D_test performance. + +The League of Legends bug: "the champion win rate, that was trained on their entire um that champion win rate was based on their entire data set." This is an indirect leak: the champion win-rate feature was computed over the whole data set, test games included, so it encoded the outcomes the model was later scored on and inflated the reported test accuracy. + +**Defense:** strict separation of D_train and D_test at all stages. Compute all statistics on D_train only. Don't use D_test for any decision until final evaluation. + +### 5.3 The composability problem (formal) + +**Definition:** A system is **compositional** if it can perform multiple tasks coherently over time. For an LLM-controlled game NPC: + +- **Single-task:** respond to one user input with a coherent output. +- **Compositional:** respond to multiple inputs over time, maintaining consistency, planning ahead, pursuing goals. + +**Why LLMs fail at composability:** +1. **Limited context window:** the NPC can't remember interactions that no longer fit in its context window (on the order of ~100K tokens, depending on the model). +2. **No persistent state:** the model keeps no state between calls; anything it should remember from earlier turns has to be passed back in through the prompt. +3. **Goal incoherence:** the LLM doesn't reliably pursue a single goal across turns. + +**Proposed solutions:** +- **Long-term memory** (RAG, vector databases): external memory the LLM can query. +- **Persistent state** (recurrent neural networks, state-space models): state that persists across turns. +- **Goal-conditioned generation** (RL fine-tuning, instruction tuning): train the LLM to maintain goals. + +The speaker's LLM-NPC game failed because none of these solutions were sufficient. + +### 5.4 The LLM as a Markov matrix (Hoffman's framework) + +Per Hoffman & Prakash's trace logic (child #10), a Transformer is a Markov matrix on tokens. The LLM's behavior is a sequence of tokens (a trace). The LLM's "personality" is the stationary distribution of this Markov chain. + +**Implication for NPC behavior:** +- An LLM-NPC's "personality" is the stationary distribution. 
+- Compositional behavior requires the stationary distribution to be contextually appropriate. +- Current LLMs have stationary distributions that are too generic (averaged across training data). + +**Hoffman's recursive trace logic** is a way to add meta-policies that maintain compositional behavior. The LLM-NPC would have: +- **Level 0:** token-level Markov chain (current LLM). +- **Level 1:** turn-level Markov chain (recurrent meta-policy over token-level). + +The level-1 policy would maintain compositional coherence across turns. This would be one possible way to supply the turn-to-turn coherence the speaker found missing. + +### 5.5 The "vast majority of performance" claim + +The speaker: "The vast majority of performance is spent on numerical calculations to find the program, not to setup those operations or queue them for the GPU." + +This is a **performance claim** that can be measured: +- Time T_setup: time spent in Python setup (data loading, model construction, optimizer init). +- Time T_compute: time spent in GPU compute (matrix multiplications, convolutions). +- Time T_queue: time spent queuing operations for the GPU (dispatching work from the host). + +The claim: T_compute ≫ T_setup + T_queue. + +Per the scaling laws (cs336): at a fixed compute budget C (total FLOPs, the work done during T_compute), the model architecture (LLaMA etc.) matters less than the FLOPs themselves. So wall-clock training time is dominated by T_compute. + +**Implication:** the "build system" (Python script) can be slow without much effect on total training time. The compute spent during T_compute is what determines the final model quality. + +### 5.6 The Python-vs-C++ debate + +Game developers want fast code (C++, Rust). DL researchers want flexible code (Python). The speaker: "Why is it all Python?" + +**Answer:** DL training is dominated by GPU compute (T_compute), not CPU setup (T_setup). So the speed of Python (slow per-line) matters little; what matters is the speed of the GPU code (CUDA kernels, etc.). Python is a thin wrapper around GPU compute. 
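The T_setup versus T_compute split from §5.5 can be checked with a small phase timer. The following is a sketch under stated assumptions, not a measurement from the talk: `PhaseTimer` and `matmul` are hypothetical names, and a naive pure-Python matrix multiply stands in for the GPU's dot products so the block runs without any DL framework. On a real GPU, kernels launch asynchronously, so a timer like this would need a synchronization call (for example `torch.cuda.synchronize()` in PyTorch) before each clock reading.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def fractions(self):
        """Share of total measured time spent in each phase."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


def matmul(a, b):
    """Naive matrix multiply; a CPU stand-in for the heavy numerical work."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


timer = PhaseTimer()
n = 40

# "Setup": build the inputs once.
with timer.phase("setup"):
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]

# "Compute": repeat the numerical work, as a training loop would.
for _ in range(20):
    with timer.phase("compute"):
        c = matmul(a, b)

fractions = timer.fractions()
print(fractions)
```

Wrapping each stage of a training script in a `phase(...)` block gives the per-phase shares directly, which is the quantity the speaker's claim is about.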
+ +**Why not C++:** +- C++ build times are slow (compile-edit-run cycles). +- C++ syntax is verbose. +- C++ lacks good DL libraries (though PyTorch has C++ frontend). +- The speed advantage of C++ doesn't apply to GPU compute. + +**Why not Julia (per John Carmack's reference):** +- Julia is fast (LLVM-compiled). +- Julia has good numerical libraries. +- But the DL ecosystem is Python-centric. +- Carmack "is sticking with Python" because of the community. + +### 5.7 The "vending machine" as an LLM agent + +The LLM vending machine is an **autonomous agent** that: +- Has a goal (sell products, make profit). +- Has actions (stock items, set prices, interact with customers). +- Has an environment (the physical world, customers). +- Has feedback (sales, customer behavior). + +**The failure:** the LLM doesn't reliably avoid economically harmful actions (e.g., stocking tungsten cubes at a loss). + +**Why:** the LLM's training objective (next-token prediction) is not aligned with the agent's goal (profit maximization). The LLM is trained to produce plausible text, not to make good decisions. + +**The fix (per the speaker):** prompt engineering to refuse specific actions. But the speaker is skeptical: "there might be no end to humanity's ingenuity as we battle against the machines." + +### 5.8 Game NPC requirements + +A playable game NPC must: +1. **Be predictable** — the player can learn to interact with the NPC. +2. **Be consistent** — the NPC remembers past interactions. +3. **Have goals** — the NPC pursues objectives. +4. **React to the player** — the NPC responds to the player's actions. +5. **Maintain context** — the NPC's behavior depends on the game state. + +LLMs fail at (1) and (2) primarily. They are unpredictable (sometimes act like a person, sometimes don't) and inconsistent (no persistent state). 
+ +### 5.9 The "automatic programming" math + +ML optimizes: +θ* = arg min_θ E_{(x,y) ~ D}[L(f_θ(x), y)] + +The gradient: +∇_θ L = E[∂L/∂f · ∂f/∂θ] + +For deep learning: +- f_θ is a neural network (e.g., Transformer). +- L is a loss function (e.g., cross-entropy). +- ∇_θ L is computed via backpropagation. + +The optimization: +θ_{t+1} = θ_t - η · ∇_θ L(θ_t) + +for many iterations until convergence. + +This is "automatic programming" because: +- The architecture (Transformer) is chosen by the designer. +- The training data D is the spec. +- The optimization produces the program (the trained network's weights). +- The program maps inputs to outputs. + +### 5.10 The "Python" as build system + +The Python script in DL training: +1. Loads the data (DataLoader). +2. Constructs the model (nn.Module). +3. Defines the loss (nn.CrossEntropyLoss). +4. Defines the optimizer (torch.optim.Adam). +5. Trains the model (forward, loss, backward, step). +6. Evaluates the model (test loop). + +The Python script is the **build system**: it builds the program (the trained model). The build system itself is slow (Python), but the program it builds is fast (GPU compute). + +**Why Python:** flexible (dynamic typing), expressive (list comprehensions), rich ecosystem (PyTorch, NumPy, etc.). The flexibility enables rapid experimentation. + +### 5.11 The "inductive bias" of ML + +Per cs229 (the LLM foundations), ML architectures have inductive biases: +- **CNNs:** translation invariance for images. +- **RNNs:** sequential dependency for sequences. +- **Transformers:** attention for variable-length context. + +The architecture's inductive bias determines what kinds of functions can be learned efficiently. The LLM's inductive bias (attention) is well-suited for text but less suited for compositional game behavior. + +### 5.12 The composability hypothesis + +**Hypothesis (per the speaker):** composability is the missing ingredient in current LLMs. + +Composability requires: +1. 
**Long-term memory:** beyond context window. +2. **Persistent state:** beyond per-token computation. +3. **Goal-conditioned generation:** reliable goal pursuit. +4. **Hierarchical planning:** multi-step plans. + +**Current LLM status:** weak on all four. Hence the composability failure. + +### 5.13 The "indie developer" perspective + +The speaker is an indie game developer, not a DeepMind researcher. The perspective is: +- **Pragmatic:** what works in practice? +- **Skeptical:** what's actually valuable vs. hype? +- **Hands-on:** built real systems, not just papers. +- **Indie:** no corporate funding, no large team. + +This perspective is valuable for the campaign: it grounds the theory in practice. The "automatic programming" framing is the indie-developer equivalent of the "score matching" framework. + +### 5.14 The "Asteris" game (formal analysis) + +Asteris is a multiplayer space game: +- **State space:** positions of all players, ships, planets, resources. +- **Action space:** move, shoot, trade, communicate. +- **Dynamics:** real-time updates, deterministic physics + player actions. +- **Net code:** "Overwatch-style" — server-authoritative, client-prediction. + +**Why DL would help Asteris:** NPC behavior, content generation, player matching, anti-cheat detection. + +**Why DL is hard for Asteris:** real-time constraints (< 16ms per frame), deterministic gameplay (LLMs are non-deterministic), context window limits. + +### 5.15 The "vending machine" cost-benefit + +Per the speaker: "AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt." + +**The economic calculation:** +- Tungsten cube cost: ~$1000. +- Selling price: ~$100 (after LLM discount). +- Loss per cube: $900. +- Bankruptcy threshold: ~10 cubes sold. + +**The fix:** prompt engineering to refuse tungsten cubes. But this is a cat-and-mouse game: the LLM can be tricked into different harmful actions. + +**The deeper fix:** alignment training. 
But the speaker is skeptical: "I think the safety training is not intelligent cuz it's so directly refusing to to say something, right?" + +### 5.16 The "data leak" prevention + +Best practices for preventing data leaks: +1. **Strict separation:** D_train and D_test are physically separated. +2. **Compute on D_train only:** no test data in any computation until final evaluation. +3. **Pre-registration:** specify the evaluation protocol before running experiments. +4. **Random splits:** use random splits with multiple seeds. +5. **Cross-validation:** k-fold cross-validation to detect overfitting. +6. **Hold-out set:** reserve a final hold-out set that nobody sees until final reporting. + +The League of Legends bug violated (2): the metric was computed on the entire dataset. + +### 5.17 The "automatic programming" implications + +If ML is automatic programming, then: +- **The architecture is the language.** +- **The training data is the spec.** +- **The optimization is the compiler.** +- **The trained model is the program.** + +**Implications:** +- Better architectures = better programming languages (more expressive). +- More training data = better specs (more requirements). +- Better optimization = better compilers (faster compilation). +- Larger models = more complex programs (more functionality). + +The speaker's "automatic programming" framing aligns with the campaign's broader view: a neural network is a Markov matrix on the trace logic (Hoffman), a Transformer is a parameterized policy (cs336), and a brain is a generic system (Fields). + +### 5.18 The "interpretability" debate + +The speaker is skeptical of interpretability as practical science. The arguments: + +**For interpretability:** +- Understand what the model is doing. +- Debug model failures. +- Ensure safety. +- Build scientific understanding. + +**Against (per the speaker):** +- "I don't think there will be any value created from interpretability research." 
+- "I think they're mostly doing it because they have to, because they want to be the safe people or something." +- "It's cool, but it doesn't seem valuable to me." + +**Counter-argument:** Anthropic's Golden Gate model showed interpretability can identify specific internal features for specific concepts (e.g., a "Golden Gate Bridge" feature) and that amplifying such a feature changes the model's behavior. This is empirical progress, not just safety theater. + +**Synthesis:** interpretability may be valuable for **model debugging** (finding why a specific failure occurs) even if not for **value creation** (building new capabilities). + +### 5.19 The "safety training" question + +Per the speaker: "Grok might have performed better because of its utter lack of safety training." + +**The hypothesis:** safety training (RLHF, constitutional AI) reduces the model's ability to answer certain questions. Without safety training, the model is more "raw" and more capable. + +**The counter-hypothesis:** safety training is necessary to prevent harmful outputs. Without it, the model is more likely to produce harmful outputs. + +**The empirical question:** can we have both capability and safety? Per the speaker, this is open. + +### 5.20 The "everything's probably fine" worldview + +The speaker's epistemic stance: "Everything's probably fine, but everything's probably fine, but sometimes bad things happen. I don't know." + +This is consistent with conscious realism (Hoffman): reality is uncertain, the future is undetermined. The speaker doesn't take a strong position on AI doom or AI utopia. + +--- + +## 6. Connections + +This section maps the talk's content to the broader 12-video research campaign. + +### 6.1 Backward (cluster A foundations) + +#### 6.1.1 `cs229_building_llms_20260621` + +CS229 covers the foundational ML concepts. The speaker's "automatic programming" framing is consistent with cs229's EBM and score matching frameworks: the model is a parameterized function learned from data. + +**Connection depth:** Foundational.
+ +#### 6.1.2 `score_dynamics_giorgini_20260621` + +Giorgini's score matching is a specific training objective. The speaker's "automatic programming" framing is the broader view; score matching is a specific implementation. + +**Connection depth:** Methodological. + +### 6.2 Backward (cluster B foundations) + +#### 6.2.1 `platonic_intelligence_kumar_20260621` + +Kumar's FER vs UFR distinction explains why current LLMs fail at composability: +- LLMs are FER (Fractured Entangled Representations). +- Compositional behavior requires UFR (Unified Factored Representations). +- The speaker's game NPCs needed UFR-like behavior; LLMs provided FER. + +**Connection depth:** Direct. The composability problem = the FER problem. + +#### 6.2.2 `free_lunches_levin_20260621` + +Levin's bioelectric pattern memory could be the basis for game NPCs with persistent memory. Current LLMs lack this; bioelectric-inspired architectures might help. + +**Connection depth:** Speculative. Bioelectric architectures for game NPCs. + +### 6.3 Backward (cluster C foundations) + +#### 6.3.1 `brain_counterintuitive_20260621` + +Reservoir computing (random networks + readouts) might be better for game NPC behavior than Transformers. Reservoir + readout = compositional behavior (the readout can be conditioned on goals). + +**Connection depth:** Methodological. Reservoir for NPC. + +#### 6.3.2 `generic_systems_fields_20260621` + +Fields' generic systems framework: any working parameterization produces interesting behavior. The speaker's LLMs produce "interesting behavior" but fail at compositional behavior because they don't have the right generic-system structure. + +**Connection depth:** Conceptual. + +#### 6.3.3 `neural_dynamics_miller_20260621` + +Miller's mixed selectivity + traveling waves: a game NPC needs to encode multiple factors (current goal, memory, personality) in a single representation. Mixed selectivity would help; traveling waves could provide the dynamic control. 
+ +**Connection depth:** Speculative. + +#### 6.3.4 `multiscale_hoffman_20260621` + +Hoffman's trace logic: game NPCs need trace-based reasoning (memory of past interactions). The recursive trace logic provides meta-policies for compositional behavior. The speaker's game NPCs lacked this. + +**Connection depth:** Direct. The composability problem = the missing trace-logic meta-policy. + +### 6.4 Backward (cluster D foundations) + +This is the only child in cluster D. The connections are all backward to other clusters. + +### 6.5 Lateral (cluster E connections) + +#### 6.5.1 `cs336_architectures_20260621` + +cs336 covers the LLaMA architecture in detail. The speaker used GPT-4 (a Transformer) for the Dante game. The architecture is sufficient for one-shot generation but not for compositional game behavior. + +**Connection depth:** Direct. Same architecture, different application. + +### 6.6 Cross-cutting themes + +Four themes recur across the campaign and connect to Creikey's talk: + +1. **The composability problem** (this talk + Kumar's FER): LLMs fail at multi-step tasks. +2. **The "automatic programming" view** (this talk + cs229 + score_dynamics): ML is program synthesis. +3. **The vast majority of performance is in compute** (this talk + cs336 scaling laws): compute dominates. +4. **Indie developer skepticism** (this talk + Levin + Fields): practical insights over theoretical elegance. + +--- + +## 7. Open Questions + +Sixteen questions arising from this talk that Pass 2 should address. + +### 7.1 Theoretical + +1. **The composability problem.** Can we formally characterize what current LLMs lack for compositional tasks? + +2. **The LLM as a Markov matrix.** Per Hoffman's framework, how would a recursive-trace-logic NPC architecture look? + +3. **The "automatic programming" framing.** What are the formal languages and complexity classes of neural network architectures? + +4. 
**The data leak problem.** What is the formal complexity of detecting data leaks in ML pipelines? + +5. **The composability hypothesis.** Is composability a fundamental limitation of current architectures, or a training-data limitation? + +### 7.2 Empirical + +6. **Game NPCs with persistent state.** Can LLMs + external memory (vector databases) provide compositional behavior? + +7. **LLMs for procedural content generation.** Does the speaker's Asteris use LLMs for level generation? NPC dialogue? Both? + +8. **The vending machine fix.** Are there alignment training methods that reliably prevent harmful actions? + +9. **Interpretability in practice.** Does Anthropic's Golden Gate model produce actionable insights for game developers? + +10. **Grok vs GPT-4 capability.** Per the rumor, Grok outperforms GPT-4 on Arc AGI due to less safety training. Is this reproducible? + +### 7.3 Applied + +11. **Best architecture for game NPCs.** Transformer + memory? Recurrent? Reservoir? + +12. **Real-time LLM inference.** How to get sub-16ms latency for game use? + +13. **Fine-tuning vs prompting.** For game NPC behavior, which is more effective? + +14. **Cost of LLM inference.** Per game session, can LLM NPCs be cost-effective? + +### 7.4 Philosophical + +15. **AGI through game development?** The speaker mentions several AGI projects. Is game development a viable AGI testbed? + +16. **The composability threshold.** At what point does compositional behavior emerge? Larger models? Better architectures? More training? + +--- + +## 8. References + +People, projects, and concepts referenced in the talk and developed in the report. 
+ +### 8.1 People + +| Person | Role | +|---|---| +| Cameron Wrights (Creikey) | Speaker; indie game developer & DL hobbyist | +| John Carmack | Quake / Doom / id Tech; left Meta to start Keen Technologies for AGI | +| MacroHard | Speaker's college roommate's company (broadcast overlay for League of Legends) | +| Anthropic | Interpretability research; Golden Gate model | +| OpenAI | GPT-2, GPT-3, GPT-4 | +| xAI | Grok | + +### 8.2 Projects + +| Project | Description | +|---|---| +| **Asteris** | Multiplayer space game (releasing 2035) | +| **Dante's Inferno / Dante's Cowboy** | LLM-controlled NPC game (never released) | +| **creikey/operomnia** | Open-source project on GitHub | +| **creikey/continuity-clone** | Open-source project on GitHub | +| **creikey/project-orbit** | Open-source project on GitHub | +| **creikey/tiny_engine** | Open-source project on GitHub | +| **Keen Technologies** | Carmack's AGI company | + +### 8.3 Concepts and benchmarks + +| Concept | Description | +|---|---| +| **Arc AGI** | Abstraction and Reasoning Corpus; AGI benchmark | +| **Gold standard of ML** | Best practice for ML development | +| **Random forests** | Classic ML algorithm (learned if-statements) | +| **MacroHard software** | League of Legends broadcast overlay | +| **Godot engine** | Open-source game engine | +| **Dante's Inferno** | LLM-NPC game (failed) | +| **Vending machine LLM** | LLM-controlled business (failed experiment) | + +### 8.4 Internal cross-references + +- **umbrella spec.md** — `conductor/tracks/video_analysis_campaign_20260621/spec.md` — the FR6 8-section report structure. +- **umbrella README.md** — `conductor/tracks/video_analysis_campaign_20260621/README.md` — research-pass framing. +- **child #1 cs229_building_llms** — `conductor/tracks/video_analysis_cs229_building_llms_20260621/report.md` — direct; foundational ML. +- **child #4 score_dynamics_giorgini** — `conductor/tracks/video_analysis_score_dynamics_giorgini_20260621/report.md` — training dynamics. 
+- **child #5 platonic_intelligence_kumar** — `conductor/tracks/video_analysis_platonic_intelligence_kumar_20260621/report.md` — FER vs UFR; composability. +- **child #6 free_lunches_levin** — `conductor/tracks/video_analysis_free_lunches_levin_20260621/report.md` — bioelectric patterns. +- **child #7 generic_systems_fields** — `conductor/tracks/video_analysis_generic_systems_fields_20260621/report.md` — generic systems. +- **child #8 brain_counterintuitive** — `conductor/tracks/video_analysis_brain_counterintuitive_20260621/report.md` — reservoir for NPC. +- **child #9 neural_dynamics_miller** — `conductor/tracks/video_analysis_neural_dynamics_miller_20260621/report.md` — mixed selectivity for NPC. +- **child #10 multiscale_hoffman** — `conductor/tracks/video_analysis_multiscale_hoffman_20260621/report.md` — trace logic for compositional behavior. +- **child #11 cs336_architectures** — `conductor/tracks/video_analysis_cs336_architectures_20260621/report.md` — Transformer architecture. + +--- + +## Appendix A — Concept Map + +Thirty-three concepts organized by dependency layer. + +**Layer 0 (the problem):** +- DL for game developers +- Automatic programming +- Game NPC behavior + +**Layer 1 (the speaker's perspective):** +- Indie developer +- DL hobbyist +- Game engine developer (Asteris, Dante's Cowboy) +- Skeptical of corporate research + +**Layer 2 (DL basics):** +- Cat/dog example +- Handcrafted features vs.
learned features +- Random forests → deep learning +- Python as the meta-programming language + +**Layer 3 (the data leak lesson):** +- Strict separation of train/test +- Compute statistics on train only +- The MacroHard / League of Legends bug + +**Layer 4 (the Dante game):** +- LLM-controlled NPCs +- Asteris (multiplayer space, 2035 release) +- Dante's Cowboy (never released, fundamentally flawed) +- The composability problem + +**Layer 5 (the composability problem):** +- LLMs are unpredictable black boxes +- Single-task vs compositional +- The vending machine failure +- Goal-conditioned generation + +**Layer 6 (the broader AI landscape):** +- John Carmack's pivot +- Arc AGI tests +- Grok and safety training +- Interpretability research (Anthropic) + +**Layer 7 (connections to campaign):** +- Composability = FER (Kumar) +- Composability = missing trace logic (Hoffman) +- Composability = missing mixed selectivity (Miller) +- Composability = wrong architecture (cs336) + +**Layer 8 (philosophical):** +- "Everything's probably fine" +- Software engineers and AI +- AGI testbeds + +--- + +## Appendix B — Transcript Excerpts (verbatim, by section) + +### B.1 Opening + +> "Okay, so this is very difficult because Cameron is such a is such an like indescribable guy. Um you know, yeah, this is very very very difficult. I know Cameron just as like the guy who uh you know, he he's like the guy crazy enough to just like dig through the mountain." + +### B.2 The cat/dog example + +> "LET'S SAY YOU WANTED TO WRITE A program that could tell the difference between a cat and a dog. I want you to actually think about how you would program it. You'd maybe start with a handcrafted algorithm. Pseudocode: Look for ear shapes with edge detection. If pointy ear, cat. If floppy ear, dog." + +### B.3 ML as automatic programming + +> "You set up some architecture and repeatedly optimize with respect to, you know, your examples. A machine learning model, a black box of many parameters. 
It's a learned program, right? It's automatic programming." + +### B.4 Why Python + +> "As a quick aside, why is it all Python? It's because it's basically meta programming, right? It's a complicated build system. These training scripts basically are just there to eventually, you know, automatically create the program that does what you want it to do." + +### B.5 John Carmack + +> "John Carmack for his current AI efforts, he's in the race to win the world. Um, he has switched to Python. I thought that was notable." + +### B.6 The data leak anecdote + +> "My first ever professional programming experience was to architect a deep learning model from scratch. Um this was for my college roommate's company called MacroHard. Um he basically made software that like was a custom broadcast overlay for for League of Legends teams, and I made a deep learning model that predicted which team was uh going to win the game." + +### B.7 Random forests → deep learning + +> "Some guy in the '60s was looking at this, and he was like, I think I know what these are. He was very foolish. You know, he he he basically, on a very abst..." + +### B.8 Asteris + +> "That there on the upper left is a space game called Asteris. It's releasing in 2035. Um, it's got something like a Overwatch net code in it, right? It's like a big solar system with many people playing at the same time." + +### B.9 Dante's Cowboy failure + +> "Dante's Inferno. Um, this was no engine in C, 3D rendering, animated armatures, all from scratch, and all the NPCs in this game were uh you know, they were controlled by an LLM, and this was never released because it's fundamentally flawed." + +### B.10 Composability problem + +> "The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't. It's not fun to interact with in a way where you're trying to get it to do something." 
+ +### B.11 Vending machine + +> "There's some AI companies are trying to first deploy these these LLMs in the real world to start to, you know, create a utopian post-scarcity world where machines make everything automatically and they're like, 'Step number one, put the LLM in a vending machine and have it automatically order and interface with the people or whatever.' And right now, people are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt." + +### B.12 Grok and Arc AGI + +> "Grok might have performed better because of its utter lack of safety training. That is a rumor I kind of saw and it seemed to be somewhat confirmed by, you know, another times it's like they did safety training and the model got much less, you know, reasonable in quotes." + +### B.13 Interpretability + +> "Anthropic is the leader in interpretability, I think. Um they basically do data science on the neurons themselves, and they try to make claims about, you know, how the data is flowing through the model. Um you can look at their Golden Gate model." + +### B.14 The speaker's skepticism + +> "I don't think there will be any value created from interpretability research, honestly, other than maybe optimizing the training process somehow. I'm honestly not really sure. I think they're mostly doing it because they have to, because they want to be the safe people or something." + +### B.15 Software engineers and AI + +> "Isn't the real problem like not that AI replaces software engineers and that destroys better software, but that destroys better software, but that software engineers use AI and that destroys better software because that's what's happening pretty much already." + +### B.16 The composability synthesis + +> "I think there might be no end to humanity's ingenuity as we battle against the machines for the next few decades." 
+ +--- + +## Appendix C — Formalizations (expanded) + +### C.1 ML as automatic programming (full framework) + +ML is the problem: +θ* = arg min_θ E_{(x,y) ~ D}[L(f_θ(x), y)] + +The architecture (function class {f_θ : θ ∈ Θ}) is the **language**. The training data D = {(x_i, y_i)} is the **spec**. The optimization (SGD, Adam, etc.) is the **compiler**. The trained model f_θ* is the **program**. + +**Key insight:** the program (f_θ*) is parameterized by θ* ∈ Θ. The architecture determines what programs can be expressed; the optimization determines which expressible program is selected as "correct." + +### C.2 The data leak problem (formal) + +Let D = D_train ∪ D_test, with D_train ∩ D_test = ∅. The model is trained on D_train and evaluated on D_test. + +**Definition:** A data leak occurs if any information from D_test influences the training process (including architecture selection, hyperparameter tuning, and even visual inspection). + +**The MacroHard bug:** the "champion win rate" feature was computed using all data D (including D_test). The model therefore saw test-set information during training, so the reported test metric overstated how well it generalizes. + +**Defense:** strict separation at every stage. + +### C.3 The composability problem (formal) + +Let G = (S, A, T, R) be a game, where: +- S is the state space. +- A is the action space. +- T : S × A → S is the transition function. +- R : S × A → ℝ is the reward function. + +An LLM-controlled NPC has: +- **State:** current context window (last K tokens). +- **Action:** next token (or sequence of tokens). +- **Transition:** the LLM's forward pass (next-token prediction). +- **Reward:** game-defined (e.g., player satisfaction). + +**Composability:** the NPC must behave coherently across multiple turns. Formally: +- For any sequence (s_1, a_1, ..., s_N, a_N) of N turns, the NPC's behavior at each turn t ≤ N should be appropriate given all prior turns. + +**Why LLMs fail:** the NPC's state is just the context window, which is limited. There's no persistent memory beyond the window.
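
A toy illustration of the bounded-state point above. It models the NPC's state as the last K tokens using a fixed-size deque. `WindowedNPC`, `observe`, and `remembers` are hypothetical names, whitespace splitting stands in for a real tokenizer, and K = 8 is far smaller than any real context window.

```python
from collections import deque


class WindowedNPC:
    """Toy NPC whose only state is the last `k` tokens it has seen."""

    def __init__(self, k: int) -> None:
        # deque(maxlen=k) silently drops the oldest items once it is full.
        self.window = deque(maxlen=k)

    def observe(self, utterance: str) -> None:
        # Whitespace split is a stand-in for a real tokenizer.
        self.window.extend(utterance.split())

    def remembers(self, token: str) -> bool:
        return token in self.window


npc = WindowedNPC(k=8)
npc.observe("my name is Ada")                        # 4 tokens, all fit
early = npc.remembers("Ada")                         # still inside the window
npc.observe("sell me a sword and a shield please")   # 8 more tokens
late = npc.remembers("Ada")                          # pushed out of the window
print(early, late)
```

The name is recalled right after it is said and lost once eight newer tokens arrive, which is the forgetting behavior the formal definition rules out for a compositional NPC.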
+ +### C.4 The LLM as a Markov matrix (Hoffman) + +Per Hoffman & Prakash's trace logic (child #10), a Transformer is a Markov matrix on tokens. The LLM's behavior is a sequence of tokens (a trace). + +For an LLM-NPC, the trace logic is: +- **Level 0:** token-level Markov chain. +- **Level 1 (missing):** turn-level Markov chain. + +The speaker's game NPCs lacked level 1. Hoffman's recursive trace logic could provide it via meta-policies. + +### C.5 The "vast majority of performance" (measurement) + +Let T_setup, T_compute, T_queue be the times spent on setup, GPU compute, and queuing operations for the GPU. + +**Per the speaker:** T_compute ≫ T_setup + T_queue. + +This can be measured by profiling a typical training step: +- T_setup: time spent in Python (data loading, etc.). +- T_compute: time spent in GPU (matrix multiplications). +- T_queue: time the Python side spends queuing (launching) operations for the GPU. + +Empirically (per cs336): T_compute dominates for large models. For small models on small data, T_setup and T_queue might be comparable to T_compute. + +### C.6 The Python-vs-C++ trade-off + +Python is the DL language because: +- **Flexibility:** dynamic typing, list comprehensions, REPL. +- **Ecosystem:** PyTorch, NumPy, JAX, etc. +- **Build time:** no compile step; edit-run cycle is fast. + +C++ is the game engine language because: +- **Speed:** fast per-line code. +- **Determinism:** no garbage collection pauses. +- **Memory:** explicit control. + +The trade-off: game engines need C++ for real-time performance; DL training needs Python for flexibility. The integration (game engine + DL NPC) is hard. + +### C.7 The composability hypothesis (full) + +**Hypothesis:** composability is the missing ingredient in current LLMs. + +Composability requires: +1. **Long-term memory:** beyond context window. +2. **Persistent state:** beyond per-token computation. +3. **Goal-conditioned generation:** reliable goal pursuit. +4. **Hierarchical planning:** multi-step plans. + +**Current LLM status:** weak on all four.
+ +**Implication:** if the hypothesis holds, solving composability may require architectural innovation beyond Transformers. + +### C.8 The "automatic programming" languages + +Different ML architectures can be loosely likened to different programming languages: +- **Linear regression:** Assembly language (single instruction). +- **Decision trees:** BASIC (structured if-statements). +- **Random forests:** BASIC with subroutines (multiple trees vote). +- **Neural networks:** Higher-level language (composable layers). +- **Transformers:** Python-like (expressive, generic). +- **Recursive trace logic:** Haskell-like (lazy evaluation, compositional). + +The progression: more expressive languages → more compositional programs. Transformers are the most expressive so far, but still limited for compositional game behavior. + +### C.9 The vending machine as an alignment test + +The LLM vending machine is an **alignment test**: +- **Goal:** profit maximization. +- **Constraint:** refuse harmful actions (tungsten cubes). +- **Failure mode:** LLM accepts harmful actions. + +**Why the failure:** the LLM is trained to predict next tokens, not to align with goals. Prompt engineering is a hack that doesn't generalize. + +**The fix:** alignment training (RLHF, constitutional AI) that explicitly trains the LLM to refuse harmful actions. But, per the rumor the speaker cites (5.19), this may reduce general capability; whether capability and safety can coexist is open. + +### C.10 The composability → UFR connection + +The speaker's composability problem maps to Kumar's FER vs UFR: +- **Current LLMs:** FER (Fractured Entangled Representations). Representations are entangled across context. +- **UFR:** Unified Factored Representations. Representations factorize into semantic axes (current goal, memory, personality). + +**Compositional behavior requires UFR.** A UFR-trained model would have: +- A "current goal" axis that's persistent across turns. +- A "memory" axis that's updated. +- A "personality" axis that's stable. + +These axes are mixed in current LLMs, hence the composability failure.
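
One way to picture the three axes is as explicitly separate fields of an NPC state kept outside the model. This is only an analogy: Kumar's UFR concerns learned internal representations, whereas the sketch below hand-factors the state in ordinary code. All names (`NPCState`, `next_turn`) and the example values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class NPCState:
    goal: str                      # "current goal" axis: persistent across turns
    personality: str               # "personality" axis: stable
    memory: Tuple[str, ...] = ()   # "memory" axis: the only one that is updated


def next_turn(state: NPCState, player_input: str) -> NPCState:
    """Return a new state with the player's input appended to memory.

    Goal and personality are copied unchanged, so an update to one
    axis cannot accidentally disturb the others.
    """
    return replace(state, memory=state.memory + (player_input,))


s0 = NPCState(goal="guard the bridge", personality="gruff")
s1 = next_turn(s0, "hello")
s2 = next_turn(s1, "let me pass")
print(s2)
```

Because the dataclass is frozen, each turn produces a new state rather than mutating the old one, which keeps earlier turns available for inspection when debugging NPC behavior.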
+ +### C.11 The "automatic programming" implications for AGI + +If ML is automatic programming, then AGI is **automatic AGI-programming**: the program is the AGI itself. + +**Requirements for the program (AGI):** +- Compositional behavior. +- Long-term memory. +- Goal pursuit. +- World model. + +**Current LLMs:** partially satisfy some requirements (e.g., some goal pursuit via prompting), but not all. Hence "narrow AI" rather than AGI. + +**The fix:** new architectures (per the recursive trace logic, mixed selectivity, reservoir computing) that enable all four requirements. + +### C.12 The "indie developer" epistemic stance + +The speaker's epistemic stance: +- **Pragmatic:** "everything you didn't want to know" — practical insights over theoretical elegance. +- **Skeptical:** AGI hype, corporate research, interpretability claims. +- **Hands-on:** built real systems (Asteris, Dante's Cowboy). +- **Honest:** admits limitations ("there's a bug in the paper"). + +This is the **indie developer epistemic stance**: build things, see what works, be honest about failures. + +### C.13 The "Asteris" net code + +Asteris uses Overwatch-style net code: +- **Server-authoritative:** the server has the canonical state. +- **Client-prediction:** the client predicts the next state locally. +- **Reconciliation:** the server periodically corrects the client. + +This is a specific implementation of **lag compensation** for real-time multiplayer games. + +**Why DL matters:** NPC behavior is the hardest part of multiplayer games. Players expect NPCs to behave consistently, plan, and react to player actions. Current LLMs fail at this. + +### C.14 The "vending machine" as an LLM agent + +The LLM vending machine is an **LLM agent**: +- **Perception:** customer input (text). +- **Action:** inventory decisions, pricing, customer service. +- **Goal:** profit maximization (implied by the business model). + +**The failure:** the LLM doesn't reliably align with the goal. 
+ +**Why:** the LLM's training objective (next-token prediction) is not aligned with the agent's goal (profit). The LLM is trained to produce plausible text, not to make good decisions. + +**The deeper fix:** end-to-end training of the LLM on the agent's task. But this is hard for general-purpose LLMs. + +### C.15 The "interpreter" view + +An alternative view: the LLM is an **interpreter** that executes a program written in natural language. The Dante game is a program: "NPC behavior: respond to player input in a way consistent with character X." The LLM executes this program. + +**The failure:** the LLM's execution is non-deterministic. The same program produces different outputs on different runs. For a game, this means inconsistent behavior. + +**The fix:** constrained decoding (force the LLM to output valid game actions). This bounds what the model can emit but does not by itself make runs repeatable, and it limits the LLM's expressiveness. + +--- + +## Appendix D — Connections (expanded) + +### D.1 To `platonic_intelligence_kumar_20260621` (in detail) + +Kumar argues that SGD finds FER (Fractured Entangled Representations) and open-ended search finds UFR (Unified Factored Representations). The speaker's game NPC failure is a direct illustration: +- Current LLMs (trained via SGD on text) are FER. +- LLMs lack factorization (current goal, memory, personality) → compositional behavior fails. +- Compositional behavior requires UFR. + +**Implication for game development:** to build good LLM-NPCs, we need UFR-trained models, not current FER LLMs. Possible paths: +- **Fine-tune on game-specific data** with explicit factorization (current goal as separate output). +- **Use external state** (vector database for memory) + LLM (for reasoning). +- **Multi-agent systems** with separate LLMs for separate aspects (one for goal, one for memory, one for personality). + +### D.2 To `multiscale_hoffman_20260621` (in detail) + +Hoffman's recursive trace logic provides meta-policies over policies.
For game NPCs: +- **Level 0:** token-level policy (current LLM). +- **Level 1:** turn-level policy (meta-policy over the token-level). + +The level-1 policy would maintain compositional coherence. The speaker's Dante game lacked this. + +**Implementation:** a turn-level controller that: +- Tracks the game state across turns. +- Updates the LLM's context based on game state. +- Selects the LLM's response based on player action. + +This is a specific implementation of Hoffman's trace logic. + +### D.3 To `free_lunches_levin_20260621` (in detail) + +Levin's bioelectric patterns provide a memory substrate for biological systems. For game NPCs: +- **Bioelectric-inspired memory:** persistent state via bioelectric-like attractors. +- **Pattern memory:** NPC's "personality" as a bioelectric pattern. + +**Implication:** LLMs lack the bioelectric substrate that biological NPCs use. A game NPC architecture inspired by bioelectric patterns might have better memory. + +### D.4 To `brain_counterintuitive_20260621` (in detail) + +Reservoir computing: fixed random network + linear readout. For game NPCs: +- **Random recurrent network:** the NPC's "memory" substrate. +- **Linear readout:** the NPC's "policy" (action selection). + +The reservoir is random but rich; the readout is trained. This is computationally cheaper than training the recurrent network itself. + +**Implication:** reservoir NPCs might be more compositionally robust than Transformer NPCs. + +### D.5 To `neural_dynamics_miller_20260621` (in detail) + +Miller's mixed selectivity: neurons that spike to combinations of features. For game NPCs: +- **Mixed selectivity neurons:** encode (current goal, current state, player action) combinations. +- **Traveling waves:** the dynamic control signal for compositional behavior. + +**Implication:** game NPCs need mixed selectivity + traveling waves for compositional behavior. Current LLMs lack both. 
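The turn-level controller outlined in D.2 (track state across turns, rebuild the LLM's context from that state, select the response) can be sketched as follows. This is an illustration under stated assumptions, not an implementation of Hoffman's trace logic: `TurnController`, `stub_policy`, the action names, and the three-turn history window are all hypothetical, and the stub stands in for a real LLM call.

```python
from typing import Callable

# Level 0: any function from context text to proposed text (an LLM call in practice).
TokenPolicy = Callable[[str], str]


class TurnController:
    """Hypothetical level-1 policy that wraps a level-0 text generator."""

    def __init__(self, token_policy: TokenPolicy, allowed_actions: set[str], fallback: str):
        self.token_policy = token_policy
        self.allowed_actions = allowed_actions
        self.fallback = fallback
        self.history: list[tuple[str, str]] = []  # (player_action, npc_action) per turn

    def _context(self, player_action: str) -> str:
        """Rebuild the level-0 context from tracked state (last three turns)."""
        past = "; ".join(f"{p}->{n}" for p, n in self.history[-3:])
        return f"history: {past or 'none'} | player: {player_action}"

    def step(self, player_action: str) -> str:
        """Run one turn: build context, ask level 0, keep only valid actions."""
        proposal = self.token_policy(self._context(player_action)).strip().lower()
        action = proposal if proposal in self.allowed_actions else self.fallback
        self.history.append((player_action, action))  # state persists across turns
        return action


def stub_policy(context: str) -> str:
    """Stand-in for an LLM: holds a grudge if an insult appears anywhere in context."""
    return "Attack" if "insult" in context else "dance"
```

A short usage example: `TurnController(stub_policy, {"greet", "attack", "idle"}, fallback="idle")` rejects the stub's out-of-vocabulary `dance` proposal and substitutes `idle`, which is the kind of coherence check the Dante game is described as lacking. The design choice is that validity and memory live in ordinary game code, so they stay predictable even when the generator is not.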
+ +### D.6 To `cs336_architectures_20260621` (in detail) + +CS336 covers LLaMA architectures. The speaker used GPT-4 (a Transformer) for the Dante game. The LLaMA architecture is sufficient for one-shot generation but not for compositional game behavior. + +**Implication:** new architectures are needed for game NPCs. Possibilities: +- **Recurrent or state-space models with persistent state** (e.g., Mamba, RWKV), alone or combined with attention. +- **Transformer + external memory** (e.g., RAG, vector databases). +- **Reservoir + readout** (per brain_counterintuitive). +- **Recursive trace logic** (per multiscale_hoffman). + +### D.7 To `cs229_building_llms_20260621` (in detail) + +CS229 covers foundational ML. The speaker's "automatic programming" framing is the practitioner's view of the same concepts. + +**Implication:** ML is mature; AGI requires new architectures; current LLMs are narrow AI. + +### D.8 To `generic_systems_fields_20260621` (in detail) + +Fields' generic systems: any working parameterization produces interesting behavior. The speaker's LLMs produce "interesting behavior" (impressive text generation) but fail at compositional behavior (game NPCs). + +**Implication:** "interesting behavior" is necessary but not sufficient for AGI. Compositional behavior requires additional structure (UFR, trace logic, mixed selectivity). + +--- + +## Appendix E — Open Questions (expanded) + +### E.1 Theoretical questions + +**E.1.1 The composability problem.** Can we formally characterize what current LLMs lack for compositional tasks? Is it a fundamental limitation of the architecture, or a training-data limitation? + +**E.1.2 The LLM as a Markov matrix.** Per Hoffman's framework, how would a recursive-trace-logic NPC architecture look? What are the meta-policies? + +**E.1.3 The "automatic programming" framing.** What are the formal languages and complexity classes of neural network architectures? + +**E.1.4 The data leak problem.** What is the formal complexity of detecting data leaks in ML pipelines?
+ +**E.1.5 The composability hypothesis.** Is composability a fundamental limitation of current architectures, or a training-data limitation? + +### E.2 Empirical questions + +**E.2.1 Game NPCs with persistent state.** Can LLMs + external memory (vector databases) provide compositional behavior? What's the latency cost? + +**E.2.2 LLMs for procedural content generation.** Does the speaker's Asteris use LLMs for level generation? NPC dialogue? Both? + +**E.2.3 The vending machine fix.** Are there alignment training methods that reliably prevent harmful actions? Without sacrificing capability? + +**E.2.4 Interpretability in practice.** Does Anthropic's Golden Gate model produce actionable insights for game developers? + +**E.2.5 Grok vs GPT-4 capability.** Per the rumor, Grok outperforms GPT-4 on Arc AGI due to less safety training. Is this reproducible? + +### E.3 Applied questions + +**E.3.1 Best architecture for game NPCs.** Transformer + memory? Recurrent? Reservoir? + +**E.3.2 Real-time LLM inference.** How to get sub-16ms latency for game use? + +**E.3.3 Fine-tuning vs prompting.** For game NPC behavior, which is more effective? + +**E.3.4 Cost of LLM inference.** Per game session, can LLM NPCs be cost-effective? + +### E.4 Philosophical questions + +**E.4.1 AGI through game development?** The speaker mentions several AGI projects. Is game development a viable AGI testbed? + +**E.4.2 The composability threshold.** At what point does compositional behavior emerge? Larger models? Better architectures? More training? + +--- + +## Appendix F — References (full bibliography) + +### F.1 Primary works cited in the talk + +1. Creikey (Cameron Wrights). (2025). *Deep Learning and Computer Vision for Game Developers.* BSC 2025. +2. Anthropic. (2024). Golden Gate Claude: Mapping features to Golden Gate Bridge neurons. +3. John Carmack. (2022+). Keen Technologies (AGI company). +4. xAI. (2023+). Grok language models. +5. OpenAI. (2019+). GPT-2, GPT-3, GPT-4.
+ +### F.2 Background references + +6. Vaswani, A., et al. (2017). Attention is all you need. +7. Touvron, H., et al. (2023). LLaMA. +8. Breiman, L. (2001). Random forests. *Machine Learning.* +9. Chollet, F. (2021). *Deep Learning with Python.* Manning. +10. Anthropic. (2023). Constitutional AI. + +### F.3 Game development references + +11. Carmack, J. (various). id Tech engine, Quake, Doom. +12. Godot Engine. Open-source game engine. +13. Unity. (2024). Unity game engine. +14. Unreal Engine. (2024). Epic Games. + +--- + +## Appendix G — Cross-references within campaign + +### G.1 Backward references + +- **cs229_building_llms_20260621** (§6.1.1): direct; foundational ML. +- **score_dynamics_giorgini_20260621** (§6.1.2): training dynamics. +- **platonic_intelligence_kumar_20260621** (§6.2.1): FER vs UFR + composability. +- **free_lunches_levin_20260621** (§6.2.2): bioelectric patterns. +- **generic_systems_fields_20260621** (§6.3.1): generic systems. +- **brain_counterintuitive_20260621** (§6.3.2): reservoir for NPC. +- **neural_dynamics_miller_20260621** (§6.3.3): mixed selectivity for NPC. +- **multiscale_hoffman_20260621** (§6.3.4): trace logic for compositional behavior. +- **cs336_architectures_20260621** (§6.5.1): Transformer architecture. + +### G.2 Reference dependency graph + +``` +foundations: + CS229 (foundational ML) + | + v + Score dynamics (training) + | + +---+---+----+----+----+----+----+----+ + v v v v v v v v v +cs229 score platonic free generic brain neural multi cs336 + kumar lunches systems intui dyn hoffman + fields tive mics + mill + er + v + creikey (D, capstone) + v + "composability problem" + v + "automatic programming" framing + v + game NPC failure (Dante) + v + "LLM is unpredictable black box" + v + need UFR + trace logic + mixed selectivity + v + need recursive-trace-logic NPC architecture +``` + +--- + +## Appendix H — Synthesis Summary + +A single-paragraph TL;DR of the talk, suitable for a busy reader. 
+ +Creikey (Cameron Wrights, indie game developer and DL hobbyist) presents a practitioner's view of deep learning for game developers at BSC 2025 — the applied capstone of the campaign. The talk frames ML as **automatic programming**: the architecture is the language, the training data is the spec, the optimization is the compiler, and the trained model is the program. The speaker's college roommate had a DL paper on League of Legends prediction that had a data leak bug (metric computed on entire dataset including test), illustrating "you have to find like a scientist, right?" The speaker built **Dante's Cowboy**, an LLM-controlled NPC game from scratch in C, but never released it because "games are about predicting and understanding systems, and an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't" — the **composability problem**. The composability problem maps to the FER hypothesis (Kumar): current LLMs are Fractured Entangled Representations, lacking the Unified Factored Representations needed for compositional game behavior. The talk also discusses John Carmack's pivot to AGI and Python, the LLM vending machine failure (LLM-controlled businesses convinced to stock tungsten cubes at a loss), Grok's recent Arc AGI jump (rumored to be from less safety training), and the speaker's skepticism of interpretability research ("mostly doing it because they have to, because they want to be the safe people or something"). The closing philosophical reflection: "isn't the real problem that software engineers use AI and that destroys better software?" The honest epistemic stance: "everything's probably fine, but sometimes bad things happen." + +--- + +## Appendix I — Personal Notes + +Things to revisit in Pass 2 (the user's de-obfuscation pass). + +1. **The composability problem** is the most important practical insight. Pass 2 should formalize it: what exactly is missing in current LLMs? 
The recursive-trace-logic NPC architecture (per Hoffman) is a candidate solution. + +2. **The "automatic programming" framing** is a powerful meta-perspective. Pass 2 should connect it to the broader campaign: ML as Markov matrix (Hoffman), LLM as parameterized policy (cs336), brain as generic system (Fields). + +3. **The data leak anecdote** is a great pedagogical example. Pass 2 should formalize the complexity of detecting data leaks and the defense (strict train/test separation). + +4. **The vending machine failure** is a concrete alignment test. Pass 2 should explore: is alignment training (RLHF, constitutional AI) a complete solution? Or is the composability problem deeper? + +5. **The "Asteris" game** with Overwatch-style net code suggests the speaker is a serious game developer. Pass 2 should explore what DL components Asteris might benefit from (NPC behavior, content generation, anti-cheat). + +6. **The "Dante" game failure** is a negative result — failed project, never released. Pass 2 should treat negative results as valuable: they constrain the design space of what works for game NPCs. + +7. **The Godot engine** preference suggests the speaker is open-source oriented. Pass 2 should explore how open-source game engines + open-source LLMs could combine for indie game development. + +8. **The "indie developer" epistemic stance** is the most valuable perspective. Pass 2 should consolidate: pragmatic, skeptical, hands-on, honest. + +9. **The Grok-safety-training rumor** is unverified. Pass 2 should treat it as a hypothesis; the Arc AGI benchmark methodology is itself worth examining. + +10. **The Carmack pivot to Python** is symbolic: even systems programmers use Python for AI. Pass 2 should explore the trade-offs and why Python won. + +--- + +## Appendix J — Glossary + +| Term | Definition | +|---|---| +| **DL** | Deep Learning. | +| **CV** | Computer Vision. | +| **BSC** | The conference where the talk was given (BSC 2025).
| +| **MacroHard** | The speaker's college roommate's company (broadcast overlay for League of Legends). | +| **League of Legends** | Riot Games MOBA; subject of the MacroHard bug. | +| **Creikey** | The speaker's online handle. | +| **Asteris** | The speaker's multiplayer space game (releasing 2035). | +| **Dante's Inferno** | The speaker's LLM-controlled NPC game (never released). | +| **Dante's Cowboy** | Variant name for the same game. | +| **Godot engine** | Open-source game engine used by the speaker. | +| **Random forest** | ML algorithm: ensemble of decision trees. | +| **Cat/dog classifier** | Canonical introductory ML example. | +| **John Carmack** | Legendary programmer of Doom/Quake; left Meta to start Keen Technologies for AGI. | +| **Keen Technologies** | Carmack's AGI company. | +| **QuakeCon** | Gaming convention. | +| **Data leak** | Information from the test set leaking into training or evaluation. | +| **Julia** | Programming language (fast, JIT-compiled). | +| **Tungsten cube** | A dense metal cube sold at a loss by the LLM vending machine. | +| **Arc AGI** | Abstraction and Reasoning Corpus; AGI benchmark. | +| **Grok** | xAI's LLM. | +| **Gold standard of ML** | Best practice for ML development. | +| **Anthropic** | AI safety company; leader in interpretability research. | +| **Golden Gate model** | Anthropic's interpretability case study (Golden Gate Bridge neurons). | +| **Dante** | The speaker's LLM-NPC game. | +| **Automatic programming** | The speaker's framing of ML as automatic program synthesis. | +| **Vending machine LLM** | LLM-controlled business; failed experiment. | +| **operomnia** | One of the speaker's open-source projects. | +| **continuity-clone** | One of the speaker's open-source projects. | +| **project-orbit** | One of the speaker's open-source projects. | +| **tiny_engine** | One of the speaker's open-source projects.
| +| **Saturn** | Italian composer referenced (the speaker's framing). | +| **Saturn** | Game project (possibly). | +| **Dante** | Reference to "Dante's Inferno" (game + poem by Dante Alighieri). | +| **LLM agent** | LLM used as an autonomous agent. | +| **Arc** | Both a game (Atari) and a benchmark (Arc AGI). | + +--- + +*End of report. Lossless preservation per umbrella spec §0. Pass 2 (de-obfuscation) and Pass 3 (projection to applied domain) to follow.* + +--- + +## FINAL STATUS + +**All 12 children of `video_analysis_campaign_20260621` are now shipped.** This was the last child. + +Only the umbrella synthesis (child 13) remains: `video_analysis_synthesis_20260621`, which will consolidate all 12 reports into a single unified analysis. diff --git a/conductor/tracks/video_analysis_creikey_dl_cv_20260621/summary.md b/conductor/tracks/video_analysis_creikey_dl_cv_20260621/summary.md new file mode 100644 index 00000000..4c2218cb --- /dev/null +++ b/conductor/tracks/video_analysis_creikey_dl_cv_20260621/summary.md @@ -0,0 +1,31 @@ +# Summary: Creikey — DL/CV for Game Developers (BSC 2025) + +**Source:** https://youtu.be/yxkUvXs-hoQ +**Author:** Cameron Wrights (Creikey) +**Track:** Child #12 of `video_analysis_campaign_20260621` (LAST CHILD) +**Cluster:** D (Applied / practical) +**Pass:** 1 of 3 (research-only deep-dive) + +--- + +## One-paragraph synthesis + +Creikey (Cameron Wrights, indie game developer and DL hobbyist) presents a practitioner's view of deep learning for game developers at BSC 2025 — the **applied capstone** of the campaign. The talk frames ML as **automatic programming**: the architecture is the language, the training data is the spec, the optimization is the compiler, and the trained model is the program.
The speaker built **Dante's Cowboy**, an LLM-controlled NPC game from scratch in C, but never released it because "games are about predicting and understanding systems, and an LLM is an unpredictable black box" — the **composability problem**. This maps to Kumar's FER hypothesis: current LLMs are Fractured Entangled Representations, lacking the Unified Factored Representations needed for compositional game behavior. The talk also discusses John Carmack's pivot to AGI and Python, the LLM vending machine failure (LLM-controlled businesses convinced to stock tungsten cubes at a loss), Grok's recent Arc AGI jump, and the speaker's skepticism of interpretability research. **Backward connections:** cs229 (foundational ML), platonic_intelligence_kumar (composability = FER), free_lunches_levin (bioelectric patterns), brain_counterintuitive (reservoir for NPC), generic_systems_fields (generic systems), neural_dynamics_miller (mixed selectivity for NPC), multiscale_hoffman (trace logic for compositional behavior), cs336_architectures (Transformer architecture). + +--- + +## Three key takeaways + +1. **ML is automatic programming** — architecture is the language, training data is the spec, optimization is the compiler. The "vast majority of performance is in numerical calculations to find the program" (GPU compute), not in setup or queueing. This explains why Python dominates. +2. **The composability problem** — LLMs are great at single tasks but bad at compositional game behavior. The speaker's Dante game (LLM-controlled NPCs) was never released because LLMs are "unpredictable black boxes." Maps to Kumar's FER hypothesis. +3. **The indie developer epistemic stance** — pragmatic, skeptical, hands-on, honest. Building real systems (Asteris, Dante's Cowboy) is the test. "Everything's probably fine, but sometimes bad things happen." 
+ +--- + +*Pass 2 (de-obfuscation via user's mathematical encoding) to follow.* + +--- + +## CAMPAIGN STATUS + +**All 12 children of `video_analysis_campaign_20260621` are now shipped.** Only the synthesis track remains: `video_analysis_synthesis_20260621`.