From ca21bf052535b4cb2e2286b1f348ca00799b619f Mon Sep 17 00:00:00 2001 From: Ed_ Date: Tue, 23 Jun 2026 16:54:13 -0400 Subject: [PATCH] conductor(deob_apply): creikey_dl_cv deobfuscated (8-section re-encoded; 20 math sections per the lexicon) --- .../creikey_dl_cv_deobfuscated.md | 669 ++++++++++++++++++ 1 file changed, 669 insertions(+) create mode 100644 conductor/tracks/video_analysis_deob_apply_20260621/artifacts/creikey_dl_cv/creikey_dl_cv_deobfuscated.md diff --git a/conductor/tracks/video_analysis_deob_apply_20260621/artifacts/creikey_dl_cv/creikey_dl_cv_deobfuscated.md b/conductor/tracks/video_analysis_deob_apply_20260621/artifacts/creikey_dl_cv/creikey_dl_cv_deobfuscated.md new file mode 100644 index 00000000..5151d40d --- /dev/null +++ b/conductor/tracks/video_analysis_deob_apply_20260621/artifacts/creikey_dl_cv/creikey_dl_cv_deobfuscated.md @@ -0,0 +1,669 @@ +# Creikey — Deep Learning and Computer Vision for Game Developers (BSC 2025) — De-obfuscated (v1) + +**Source:** `conductor/tracks/video_analysis_creikey_dl_cv_20260621/report.md` (1421 lines) +**Method:** Per `lexicon.md` + `prompt_template.md` (5 rules + 6 noise-dedup maps) +**Output:** This file is the **re-encoded report** (the same 8-section structure as Pass 1, but every standard-math expression is replaced with the constructive type-theoretic form per the lexicon). +**Date:** 2026-06-23 + +> **Reading guide.** This is the de-obfuscated version of the original Pass 1 report. The structure is preserved (8 sections); the **math notation is re-encoded** per the lexicon's 5 rules. +> +> **For the side-by-side table:** see `creikey_dl_cv_translation.md` (39 rows). +> **For per-term etymologies:** see `creikey_dl_cv_decoder.md`. +> **For the lexicon:** see `lexicon.md`. + +--- + +## 1. 
TL;DR + +This is **Creikey's talk at BSC 2025** titled *"Deep Learning and Computer Vision for Game Developers"* — the **applied capstone** of the campaign, validating the theory from the prior 11 children against actual game-development practice. + +**Re-encoded framing:** The speaker's critique of LLMs-as-game-NPCs is the **composability problem**: `compositional : Property (system) where forall (task_1, ..., task_n) in plan: system.produces_coherent_output(plan) : Prop`. Current LLMs are FER (Fractured Entangled Representations, per Kumar's diagnosis); they fail the compositional predicate. The applied insight: **LLMs are good at single tasks but bad at compositional tasks**, and the fix requires either open-ended search (per Kumar's Picbreeder) or new architectures (reservoir computing per brain_counterintuitive, recursive trace logic per multiscale_hoffman). + +The talk walks through: +1. **Why DL matters for game developers** — automatic programming, image classification, NPC behavior, content generation. +2. **Why it's all Python** — even for game engine developers. The discussion pivots to John Carmack leaving Meta/Oculus to start an AGI company and switching to Python. +3. **A historical anecdote** — the speaker's college roommate had a deep learning model for predicting League of Legends winners; the published paper had a data leak bug that took 2 weeks to find. +4. **Random forests vs deep learning** — the historical arc from automatic if-statements to learned neural networks. +5. **The "Dante's Cowboy" case study** — the speaker's LLM-controlled NPC game that was never released. +6. **The vending machine problem** — LLM-controlled vending machines convinced to stock tungsten cubes. +7. **Arc AGI tests** — Grok's recent jump, safety training vs capability. +8. **Interpretability research** — Anthropic's Golden Gate model. The speaker is skeptical of practical value. +9. **The big question** — "software engineers use AI and that destroys better software." 
+10. **The composability problem** — LLMs can do a single task impressively, but composing multiple tasks in a coherent game loop is unsolved. + +**Key insight from the talk:** **LLMs are great at single tasks but bad at compositional tasks.** For a game to be playable, NPCs must respond coherently across multiple interactions, plan ahead, and maintain consistency. Current LLMs fail at this. + +--- + +## 2. Key Concepts (re-encoded) + +### 2.1 Deep learning as automatic programming + +The speaker frames DL as **automatic programming**: "You set up some architecture and repeatedly optimize with respect to, you know, your examples. A machine learning model, a black box of many parameters. It's a learned program, right? It's automatic programming." + +**Re-encoded form:** `theta_opt : Procedure (theta : Tensor[*], data : Seq[(Input, Output)]) -> Tensor[*] = argmin theta of sum (i in 1..N) of Loss(f_theta(x_i), y_i) : float64`. The architecture is the language; the training data is the spec; the optimization is the compiler; the trained model is the program. + +This framing is consistent with the campaign's broader theme: a neural network is a Markov matrix on the trace logic (Hoffman), a transformer is a parameterized policy (cs336), and learning is finding the parameters that minimize loss on training data. + +### 2.2 Why Python (the game-developer perspective) + +The speaker's question: "Why is it all Python?" Answer: "It's because it's basically meta programming, right? It's a complicated build system. These training scripts basically are just there to eventually, you know, automatically create the program that does what you want it to do. And most of the heavy lifting is not necessarily done by the actual code you write anyways, it's done in the dot products on the GPU." + +**Re-encoded form:** `training_bottleneck : Property where T_GPU_compute : Duration >> T_CPU_setup : Duration`. 
Game developers who want fast code (C++, Rust) end up using Python because the model is the bottleneck (T_GPU ≫ T_CPU). + +### 2.3 John Carmack's pivot + +The speaker references John Carmack: "John Carmack for his current AI efforts, he's in the race to win the world. He has switched to Python. I thought that was notable." + +Carmack left Meta in 2022 to start **Keen Technologies**, focused on AGI. Despite his decades of C/C++ optimization expertise, he switched to Python — confirming that Python is the lingua franca of AI even for hardcore systems programmers. + +### 2.4 The data leak anecdote + +The speaker's college roommate had a DL paper on predicting League of Legends winners: "And basically, there's a there was a bug in the paper, right? This is a very important lesson of not just how hard machine learning is, but how often people mess up, right?" + +**Re-encoded form:** `data_leak : Prop where exists d in D_test such that d influences theta_opt`. The bug: the metric "champion win rate" was computed on the entire dataset including the test set, so the model appeared to have amazing accuracy but was leaking information. Lesson: `strict_separation : Property where D_train : Seq and D_test : Seq are disjoint and stored in separate locations : Prop`. + +### 2.5 Random forests vs deep learning + +The speaker gives the historical arc: "Some guy in the '60s was looking at this, and he was like, I think I know what these are. He was very foolish. You know, he basically, on a very abst..." — referring to the history of automatic programming from if-statements to neural networks. + +The progression: random forests (learned if-statements) → shallow neural networks → deep neural networks → transformers. Each step increased the complexity of the learned program. + +### 2.6 The Dante's Cowboy failure + +The speaker's LLM-controlled NPC game (Dante's Cowboy): "I went through so many iterations of that idea. 
I made it so that like they could join your party and trade items and all this stuff and like I played with it a bunch, but it never felt like there was any meat to the game. The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't. It's not fun to interact with in a way where you're trying to get it to do something." + +**Re-encoded form:** `compositional : Property (system) where forall (task_1, ..., task_n) in plan: system.produces_coherent_output(plan) : Prop`. The LLM-NPC fails the compositional predicate: `predictable : Property where forall action in npc.actions: player.can_predict(action, context) : Prop` (LLMs fail — they're unpredictable). `consistent : Property where forall turn in dialogue: npc.response(turn) consistent_with npc.history : Prop` (LLMs fail — no persistent state). + +### 2.7 The vending machine problem + +The LLM-controlled vending machine (referenced as "the vending machine at Claude"): "People are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt." + +**Re-encoded form:** `agent : Type where goal : Objective = profit_maximization; actions : Set[Action] = {stock_item, set_price, ...}; env : Environment`. `reliable_goal_pursuit : Property (agent) where forall action in agent.actions: action.aligned_with(agent.goal) : Prop` — the LLM fails this predicate (it's trained for next-token prediction, not profit maximization). + +### 2.8 Arc AGI tests + +The speaker discusses Grok's recent improvement on the Arc AGI benchmark: "Grok might have performed better because of its utter lack of safety training. That is a rumor I kind of saw and it seemed to be somewhat confirmed by, you know, another times it's like they did safety training and the model got much less, you know, reasonable in quotes. 
Um it might be the case that the safety training is not intelligent." + +The implication: AGI benchmarks may be measuring safety training removal rather than capability. The speaker is skeptical of the rigor of these benchmarks. + +### 2.9 Interpretability research (Anthropic) + +The speaker discusses Anthropic's Golden Gate Claude work (Templeton et al. 2024, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", which builds on Bricken et al. 2023, "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning"): the work identifies specific learned features for specific concepts (e.g., a "Golden Gate Bridge" feature), rather than single neurons. + +The speaker's critique: "I don't think there will be any value created from interpretability research. I think they're mostly doing it because they have to, because they want to be the safe people or something. It's cool, but it doesn't seem valuable to me." + +**Re-encoded form:** `interpretable : Property (model) where exists explanation : Seq[Reasoning] such that model.output = apply(reasoning, input) : Prop`. The Golden Gate model has monosemantic features; the speaker is skeptical that this generates value (vs. safety theater). + +### 2.10 The big question + +The speaker: "software engineers use AI and that destroys better software because that's what's happening pretty much already." + +This is the **software engineering doom** claim: AI-assisted coding produces worse software because the AI lacks the long-term coherence that human engineers provide. + +### 2.11 The composability problem + +**Re-encoded form:** `composability : Property (system) where forall plan : Seq[Task]: system.coherent_output(plan) : Prop`. LLMs fail this; their outputs are locally impressive but globally incoherent. This is the campaign's most important applied challenge (per the synthesis §7.1). + +### 2.12 The indie developer epistemic stance + +The speaker's epistemic stance: "Everything's probably fine, but everything's probably fine, but sometimes bad things happen. I don't know." 
+ +This is consistent with conscious realism (Hoffman): reality is uncertain, the future is undetermined. The speaker doesn't take a strong position on AI doom or AI utopia. + +### 2.13 The Python vs Julia vs C++ comparison + +| Language | T_overhead | T_compute | Bottleneck | +|---|---|---|---| +| Python | High | GPU compute | GPU | +| C++ | Low (compile) | GPU compute | GPU | +| Julia | Low | GPU compute | GPU | + +The conclusion: **language overhead is negligible compared to GPU compute**. Python wins on ecosystem (PyTorch, etc.); Carmack "is sticking with Python" because of the community. + +### 2.14 John Carmack's pivot (additional) + +Carmack left Meta in 2022 to start **Keen Technologies**, focused on AGI. The pivot is a strong signal: even a legendary C/C++ systems programmer recognizes Python's dominance in AI. + +### 2.15 The League of Legends bug (additional detail) + +The bug pattern: `metric_evaluation : Procedure (model, dataset) -> Score where dataset = D_train ∪ D_test` (the metric was computed on the union, leaking information). The principled form: `D_train ∩ D_test = ∅ : Prop` (disjoint sets). + +### 2.16 The "automatic programming" math + +`theta_opt : Procedure = argmin theta of E[(x, y) ~ D : Distribution] of L(f_theta(x), y) : float64`. `grad_theta_L : Tensor[*] = E[grad_L_wrt_f . matmul(grad_f_wrt_theta)] : float64` (per chain rule). The gradient descent update as a coinductive stream: `theta : Stream Tensor[*] = nat -> Tensor[*] where theta(n+1) = theta(n) - learning_rate : float64 * grad_theta_L(theta(n))`. + +### 2.17 The "vending machine" as a benchmark + +The vending machine is a real-world benchmark for agent alignment: +- Goal: profit maximization. +- Actions: stock items, set prices. +- Environment: physical customers. +- Feedback: sales, customer behavior. + +The LLM fails: it can be tricked into stocking tungsten cubes at a loss. 
The principled form: `agent_aligned_with_goal : Property where forall action: action.aligned_with(goal) : Prop` (LLMs fail this). + +### 2.18 The Dante's Cowboy (additional) + +The speaker's LLM-controlled NPC game used GPT-4 class models for both the cowboy character and the game master. The game was never released because the LLMs produced impressive single-turn responses but failed to maintain coherence across turns. + +### 2.19 The "interpretability" debate (additional) + +| Position | Argument | +|---|---| +| **For** interpretability | Debug model failures, ensure safety, build scientific understanding | +| **Against (per the speaker)** | "I don't think there will be any value created" | +| **Counter** | Anthropic's Golden Gate work shows interpretability can identify specific features for specific concepts | + +### 2.20 The "safety training" hypothesis + +The hypothesis: safety training (RLHF, constitutional AI) reduces the model's ability to answer certain questions. Without safety training, the model is more "raw" and more capable. The empirical question: can we have both capability and safety? Per the speaker, this is open. + +--- + +## 3. Frame Analysis + +The video has multiple key frames showing the speaker and slide content. The frames capture the "indie developer" framing and the practical examples. + +### 3.1 Frame analysis summary + +The speaker (Cameron Wrights) is an indie game developer who built real systems (Dante's Cowboy, Asteris) and tested LLMs in practice. The frame analysis captures the practical, hands-on stance. + +--- + +## 4. Transcript Highlights + +Eight verbatim passages from the cleaned transcript that capture the conceptual flow. + +### 4.1 The composability claim + +> "The problem is games are about predicting and understanding systems almost, right? And an LLM is an unpredictable black box, sometimes acts like a person, sometimes doesn't." + +The composability diagnosis. 
LLMs are FER (per Kumar); they fail the `consistent : Property` predicate. + +### 4.2 The vending machine + +> "People are basically convincing these AI vending machines to stock many tungsten cubes and sell them at a loss and then they go bankrupt." + +The agent alignment failure. LLMs fail the `agent_aligned_with_goal : Property` predicate. + +### 4.3 The Python pivot + +> "John Carmack for his current AI efforts, he's in the race to win the world. He has switched to Python. I thought that was notable." + +The Python dominance signal. T_GPU ≫ T_CPU; language overhead is negligible. + +### 4.4 The data leak + +> "There's a there was a bug in the paper, right? This is a very important lesson of not just how hard machine learning is, but how often people mess up, right?" + +The data leak lesson. `strict_separation : Property where D_train disjoint from D_test : Prop`. + +### 4.5 The "automatic programming" framing + +> "It's basically meta programming, right? It's a complicated build system. These training scripts basically are just there to eventually, you know, automatically create the program that does what you want it to do." + +The "automatic programming" framing. The architecture is the language; the data is the spec; the optimization is the compiler. + +### 4.6 The "Dante's Cowboy" failure + +> "I went through so many iterations of that idea. I made it so that like they could join your party and trade items and all this stuff and like I played with it a bunch, but it never felt like there was any meat to the game." + +The LLM-NPC failure. The LLMs failed at compositional behavior. + +### 4.7 The interpretability skepticism + +> "I don't think there will be any value created from interpretability research. I think they're mostly doing it because they have to, because they want to be the safe people or something." + +The interpretability skepticism. 
The speaker distinguishes between **model debugging** (interpretability may help) and **value creation** (interpretability may not help). + +### 4.8 The "everything's probably fine" worldview + +> "Everything's probably fine, but everything's probably fine, but sometimes bad things happen. I don't know." + +The honest epistemic stance. The speaker doesn't take a strong position on AI doom or AI utopia. + +--- + +## 5. Mathematical / Theoretical Content (re-encoded) + +This section develops the formal content of the talk with all math re-encoded per the lexicon. The talk is conceptual rather than heavily mathematical, but several key ideas admit formalization. + +### 5.1 ML as automatic programming (re-encoded) + +**Re-encoded form:** ML is the problem of finding a parameterized procedure `f_theta : Procedure (X) -> Y : float64` that minimizes a loss function on training data: + +``` +theta_opt : Procedure (theta : Tensor[*], data : Seq[(Input, Output)]) -> Tensor[*] + = argmin theta of sum (i in 1..N) of Loss(f_theta(x_i), y_i) : float64 +``` + +This is a **search problem** over the parameter space Θ. The "automatic programming" framing: `f_theta` is the program; `Loss` is the spec; `theta_opt` is the optimized program. + +**Tradeoffs (re-encoded):** +- **Architecture (the language):** `architecture : Type where exists f : Procedure (X) -> Y such that f is learnable from data`. +- **Loss (the spec):** `loss : Specification where Loss : Procedure (prediction, target) -> Score : float64 = cross_entropy(prediction, target)`. +- **Optimization (the compiler):** `optimizer : Procedure (grad : Tensor[*]) -> update : Tensor[*] : float64`. + +### 5.2 The data leak problem (re-encoded) + +Let `D_train : Seq[(Input, Output)]` and `D_test : Seq[(Input, Output)]` be disjoint training and test sets. A model is trained on `D_train` and evaluated on `D_test`. The evaluation metric is `accuracy(model, D_test) : float64`. 
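A toy illustration of why the separation above matters as soon as a feature statistic is involved. This is a minimal sketch under stated assumptions, not the pipeline from the League of Legends paper: the data is synthetic, and `win_rates` and `accuracy` are hypothetical helper names. Every key occurs exactly once and outcomes simply alternate, so there is nothing to learn; any score above chance on `d_test` can only come from the statistic having seen test labels.

```python
from collections import defaultdict


def win_rates(rows):
    """Per-key win rate computed from (key, won) rows."""
    games, wins = defaultdict(int), defaultdict(int)
    for key, won in rows:
        games[key] += 1
        wins[key] += won
    return {key: wins[key] / games[key] for key in games}


def accuracy(rates, rows, default=0.5):
    """Predict a win when the key's win rate is >= 0.5; unseen keys use `default`."""
    hits = sum((rates.get(key, default) >= 0.5) == bool(won) for key, won in rows)
    return hits / len(rows)


# Synthetic data with no learnable signal (assumption for illustration only).
rows = [(key, key % 2) for key in range(100)]
d_test = [row for row in rows if row[0] % 5 == 0]    # 20 held-out rows
d_train = [row for row in rows if row[0] % 5 != 0]   # 80 training rows

leaky_rates = win_rates(d_train + d_test)   # statistic also sees D_test labels
clean_rates = win_rates(d_train)            # statistic sees D_train only

leaky_acc = accuracy(leaky_rates, d_test)
clean_acc = accuracy(clean_rates, d_test)
print(f"leaky: {leaky_acc:.2f}  clean: {clean_acc:.2f}")
```

In this deliberately extreme setup the leaky statistic scores perfectly on the held-out rows while the train-only statistic sits at chance, which is the same shape of failure as an inflated accuracy caused by a win-rate feature computed over the whole dataset.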
+ +**Data leak (re-encoded):** `data_leak : Prop where exists d in D_test such that d influences theta_opt`. Examples: +- **Direct leak:** `D_test is in training_data : Prop`. +- **Indirect leak:** `features_computed_using(D_test) : Prop`. +- **Selection leak:** `architecture chosen based on D_test performance : Prop`. + +The League of Legends bug is an instance of the **indirect leak** case: the champion win-rate feature was computed over the whole dataset, `metric_evaluation : Procedure (model, dataset) -> Score where dataset = D_train ∪ D_test`. The principled form: `D_train ∩ D_test = ∅ : Prop` (disjoint sets), with every feature statistic computed on `D_train` only. + +**Defense:** `strict_separation : Property where D_train : Seq and D_test : Seq are disjoint and stored in separate locations : Prop`. Compute all statistics on `D_train` only. Don't use `D_test` for any decision until final evaluation. + +### 5.3 The composability problem (re-encoded) + +**Re-encoded form:** A system is **compositional** if it can perform multiple tasks coherently over time: + +``` +compositional : Property (system) where + forall (task_1, task_2, ..., task_n) in plan : + system.produces_coherent_output(plan) : Prop +``` + +For an LLM-controlled game NPC: +- **Single-task:** `single_task : Property where forall input : UserInput: npc.respond(input) : Response is coherent : Prop`. +- **Compositional:** `compositional : Property where forall plan : Seq[UserInput]: npc.respond(plan) : Seq[Response] is coherent across plan : Prop`. + +**Why LLMs fail at composability (re-encoded):** +1. **Limited context window:** `context_window : int64 = 100_000 : Tolerance[±50_000]` — finite buffer. +2. **No persistent state:** `persistent_state : Property (system) where forall step: system.state(step_n) == transition(system.state(step_{n-1}), observation_n) : Tensor` — LLMs fail this; the state is recomputed from scratch each forward pass. +3. **Goal incoherence:** `goal_conditioned : Property (llm) where forall goal: llm.generate(goal) maintains goal through_output : Prop` — LLMs don't reliably pursue a single goal across turns. 
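The first two failure modes in the list above can be contrasted in a few lines. This is a toy analogy under stated assumptions, not a model of how an LLM works: persistent state is read as a fold of a `transition` function over all observations, and a finite context window is imitated by rebuilding the state from only the most recent events. The inventory events and all function names are hypothetical.

```python
def transition(state, observation):
    """Return a new inventory after applying one (item, delta) observation."""
    item, delta = observation
    new_state = dict(state)
    new_state[item] = new_state.get(item, 0) + delta
    return new_state


def persistent_state(observations):
    """state(n) = transition(state(n-1), observation(n)), starting from empty."""
    state = {}
    for observation in observations:
        state = transition(state, observation)
    return state


def windowed_state(observations, window):
    """Rebuild the state from scratch using only the last `window` observations."""
    return persistent_state(observations[-window:])


# Hypothetical trade history between the player and an NPC.
events = [("gold", 5), ("sword", 1), ("gold", -2), ("potion", 3), ("gold", 1)]

full = persistent_state(events)
recent = windowed_state(events, window=2)
print(full)    # {'gold': 4, 'sword': 1, 'potion': 3}
print(recent)  # {'potion': 3, 'gold': 1}
```

The windowed version forgets the sword entirely and reports the wrong gold total, which is the kind of cross-turn inconsistency a player notices immediately.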
+ +**Proposed solutions:** +- **Long-term memory** (RAG, vector databases): `long_term_memory : Memory where forall query: Memory.query(query) returns relevant_past_interactions : Seq`. +- **Persistent state** (RNNs, SSMs): `persistent_state : State where forall step: state(step) = transition(state(step-1), observation(step)) : Tensor`. +- **Goal-conditioned generation** (RL fine-tuning, instruction tuning): `goal_conditioned : Property (llm) where forall goal: llm.generate(goal) : Output maintains goal through_output : Prop`. + +The speaker's LLM-NPC game failed because none of these solutions were sufficient. + +### 5.4 The LLM as a Markov matrix (Hoffman's framework) + +Per Hoffman & Prakash's trace logic (per multiscale_hoffman child), a Transformer is a Markov matrix on tokens: + +``` +transformer : Procedure (transition : Token -> Distribution[Token]) : float64 + = markov_chain(transition_matrix : Matrix[|V|, |V|] : float64) +``` + +The LLM's behavior is a sequence of tokens (a trace). The LLM's "personality" is the stationary distribution of this Markov chain: + +``` +personality : Distribution[Token] where personality = stationary_distribution(transformer : Markov_Matrix) : float64 +``` + +**Implication for NPC behavior:** +- An LLM-NPC's "personality" is the stationary distribution. +- Compositional behavior requires the stationary distribution to be contextually appropriate. +- Current LLMs have stationary distributions that are too generic (averaged across training data). + +**Hoffman's recursive trace logic** is a way to add meta-policies that maintain compositional behavior: + +``` +recursive_trace : Type where + Level_0 = Token_Markov_Chain : Markov_Chain; + Level_1 = Turn_Markov_Chain over Level_0 : Markov_Chain over Markov_Chain +``` + +The level-1 policy would maintain compositional coherence across turns. 
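The stationary-distribution reading used earlier in this section can be made concrete with a two-token chain. This is a minimal sketch that follows the section's simplified framing of a `|V| x |V|` transition matrix; a real transformer conditions on the whole context rather than only the previous token, so the matrix `P`, the token names, and `stationary_distribution` are all illustrative assumptions.

```python
def stationary_distribution(P, iterations=200):
    """Approximate pi with pi = pi @ P by repeated left-multiplication."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


tokens = ["howdy", "partner"]  # hypothetical two-token vocabulary
P = [
    [0.9, 0.1],  # after "howdy":   mostly repeat "howdy"
    [0.5, 0.5],  # after "partner": coin flip
]

pi = stationary_distribution(P)
for token, mass in zip(tokens, pi):
    print(f"{token}: {mass:.4f}")
```

For this matrix the balance condition `pi[0] * 0.1 = pi[1] * 0.5` gives `pi = (5/6, 1/6)`, which the iteration approaches regardless of the starting distribution. In the section's terms, that context-independent limit is the "generic personality" that a level-1 policy would need to override.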
+ +### 5.5 The "vast majority of performance" claim (re-encoded) + +The speaker: "The vast majority of performance is spent on numerical calculations to find the program, not to setup those operations or queue them for the GPU." + +**Re-encoded form:** +- `T_setup : Duration` — time spent in Python setup (data loading, model construction, optimizer init). +- `T_compute : Duration` — time spent in GPU compute (matrix multiplications, convolutions). +- `T_queue : Duration` — time spent in GPU queue (waiting for compute). +- The claim: `T_compute : Duration >> T_setup : Duration + T_queue : Duration` (the dominance relation; Rule 1: `>>` is a fuzzy relation, BANNED as a value, allowed as a process). + +Per the scaling laws (cs336): at fixed FLOPs `C = T_compute`, the model architecture (LLaMA etc.) matters less than the FLOPs themselves. So the optimization is dominated by `T_compute`. + +**Implication:** the "build system" (Python script) can be slow without affecting the optimization. The optimization (`T_compute`) is what determines the final model quality. + +### 5.6 The Python-vs-C++ debate (re-encoded) + +**Re-encoded form:** `training_bottleneck : Property where T_GPU_compute : Duration >> T_CPU_setup : Duration`. The speed of Python (slow per-line) doesn't matter; what matters is the speed of the GPU code (CUDA kernels, etc.). Python is a thin wrapper around GPU compute. + +**Why not C++:** +- C++ build times are slow (compile-edit-run cycles). +- C++ syntax is verbose. +- C++ lacks good DL libraries (though PyTorch has C++ frontend). +- The speed advantage of C++ doesn't apply to GPU compute. + +**Why not Julia (per John Carmack's reference):** +- Julia is fast (LLVM-compiled). +- Julia has good numerical libraries. +- But the DL ecosystem is Python-centric. +- Carmack "is sticking with Python" because of the community. 
+ +### 5.7 The "vending machine" as an LLM agent (re-encoded) + +The LLM vending machine is an **autonomous agent**: + +``` +agent : Type where + goal : Objective = profit_maximization : Objective; + actions : Set[Action] = {stock_item, set_price, ...} : Set; + env : Environment = physical_world : Environment; + feedback : Sensor = {sales : float64, customer_behavior : Tensor} +``` + +**The failure:** the LLM doesn't reliably avoid economically harmful actions. The principled form: `agent_aligned_with_goal : Property where forall action in agent.actions: action.aligned_with(agent.goal) : Prop` (LLMs fail this; per creikey's empirical observation). + +**Why:** the LLM's training objective (next-token prediction) is not aligned with the agent's goal (profit maximization). The LLM is trained to produce plausible text, not to make good decisions. + +**The fix (per the speaker):** prompt engineering to refuse specific actions. But the speaker is skeptical: "there might be no end to humanity's ingenuity as we battle against the machines." + +### 5.8 Game NPC requirements (re-encoded) + +A playable game NPC must satisfy: + +1. **Predictable:** `predictable : Property (npc) where forall action in npc.actions: player.can_predict(action, context) : Prop`. +2. **Consistent:** `consistent : Property (npc) where forall turn in dialogue: npc.response(turn) consistent_with npc.history : Prop`. +3. **Has goals:** `goal_conditioned : Property (npc) where forall goal: npc.pursue(goal) through dialogue : Prop`. +4. **Reacts to player:** `reactive : Property (npc) where forall (action, context) in (player.action, game.context): npc.respond(action, context) depends_on context : Prop`. +5. **Maintains context:** `context_dependent : Property (npc) where forall state in game_states: npc.behavior(state) depends_on state : Prop`. + +LLMs fail at (1) and (2) primarily. They are unpredictable (sometimes act like a person, sometimes don't) and inconsistent (no persistent state). 
+ +### 5.9 The "automatic programming" math (re-encoded) + +ML optimizes: + +``` +theta_opt : Procedure = argmin theta of E[(x, y) ~ D : Distribution] of L(f_theta(x), y) : float64 +``` + +The gradient: + +``` +grad_theta_L : Tensor[*] = E[grad_L_wrt_f . matmul(grad_f_wrt_theta)] : float64 +``` + +For deep learning: +- `f_theta : Procedure` is a neural network (e.g., Transformer). +- `L : Procedure` is a loss function (e.g., cross-entropy). +- `grad_theta_L : Tensor[*]` is computed via backpropagation. + +The optimization (as a coinductive stream per Rule 1): + +``` +theta : Stream Tensor[*] = nat -> Tensor[*] + where theta(n+1) = theta(n) - learning_rate : float64 * grad_theta_L(theta(n)) +``` + +for many iterations until convergence. Re-encoded as `Stream` per Rule 1: the indefinite iteration is a coinductive stream, not an `∞_val`. + +### 5.10 The "Python" as build system (re-encoded) + +The Python script in DL training: + +``` +build_system : Procedure where + Step_1 = DataLoader(corpus : Set[Document]) : DataLoader; + Step_2 = nn.Module(architecture : Architecture) : Module; + Step_3 = nn.CrossEntropyLoss : Loss; + Step_4 = torch.optim.Adam(params : Tensor[*]) : Optimizer; + Step_5 = train(model : Module, data : DataLoader) : Module; + Step_6 = evaluate(model : Module, test : DataLoader) : Score : float64 +``` + +The Python script is the **build system**: it builds the program (the trained model). The build system itself is slow (Python), but the program it builds is fast (GPU compute). + +### 5.11 The "inductive bias" of ML (re-encoded) + +Per cs229 (the LLM foundations), ML architectures have inductive biases: + +- **CNNs:** `cnn_inductive_bias : Property where forall image, translation : Operator: cnn(image) == cnn(translation(image)) : Prop` (translation invariance for images). +- **RNNs:** `rnn_inductive_bias : Property where forall (x_1, ..., x_n) : Sequence: rnn((x_1, ..., x_n)) depends_on x_{n-1} : Prop` (sequential dependency). 
+- **Transformers:** `transformer_inductive_bias : Property where forall context : VariableLengthSequence: attention(context) : Tensor[seq, d_model] : float64` (attention for variable-length context). + +The architecture's inductive bias determines what kinds of functions can be learned efficiently. The LLM's inductive bias (attention) is well-suited for text but less suited for compositional game behavior. + +### 5.12 The composability hypothesis (re-encoded) + +**Hypothesis (per the speaker):** composability is the missing ingredient in current LLMs. + +Composability requires: + +1. **Long-term memory:** `long_term_memory : Memory where forall query: Memory.query(query) returns relevant_past_interactions : Seq`. +2. **Persistent state:** `persistent_state : State where forall step: state(step) = transition(state(step-1), observation(step)) : Tensor`. +3. **Goal-conditioned generation:** `goal_conditioned : Property (llm) where forall goal: llm.generate(goal) : Output maintains goal through_output : Prop`. +4. **Hierarchical planning:** `hierarchical_plan : Procedure (goal) -> Seq[Subgoal] -> Seq[Action]`. + +**Current LLM status:** weak on all four. Hence the composability failure. + +### 5.13 The "indie developer" perspective (re-encoded) + +The speaker is an indie game developer, not a DeepMind researcher. The perspective is: + +- **Pragmatic:** `pragmatic : Property (developer) where forall design: design.prioritize(works_in_practice) : Prop`. +- **Skeptical:** `skeptical : Property (developer) where forall claim: developer.evaluate(claim.evidence) : Score : float64`. +- **Hands-on:** `hands_on : Property (developer) where forall system: developer.build(system) and test(system) : Prop`. +- **Indie:** `indie : Property (developer) where developer.no_corporate_funding : Prop and developer.small_team : Prop`. 
+
+### 5.14 The "Asteris" game (formal analysis)
+
+Asteris is a multiplayer space game:
+
+```
+state_space : Set[GameState] where
+  GameState : Tuple = (player_positions : Vector, ship_positions : Vector, planet_positions : Vector, resource_levels : Vector)
+
+action_space : Set[Action] where
+  Action : Sum = move | shoot | trade | communicate
+
+dynamics : Procedure (state : GameState, action : Action) -> GameState : float64
+  = deterministic_physics(state) + player_action(state, action)
+
+net_code : Protocol = server_authoritative + client_prediction
+```
+
+**Why DL would help Asteris:** NPC behavior, content generation, player matching, anti-cheat detection.
+
+**Why DL is hard for Asteris:** real-time constraints (< 16 ms per frame at 60 FPS), deterministic gameplay (LLMs are non-deterministic), context window limits.
+
+### 5.15 The "vending machine" cost-benefit (re-encoded)
+
+```
+tungsten_cube_cost : float64 = 1000 : Tolerance[±100]
+selling_price : float64 = 100 : Tolerance[±10]
+loss_per_cube : float64 = 900 : Tolerance[±100]
+bankruptcy_threshold : int64 = 10 : Tolerance[±2]
+```
+
+**The fix:** prompt engineering to refuse tungsten cubes. But this is a cat-and-mouse game: the LLM can be tricked into different harmful actions.
+
+**The deeper fix:** alignment training. But the speaker is skeptical: "I think the safety training is not intelligent cuz it's so directly refusing to to say something, right?"
+
+### 5.16 The "data leak" prevention (re-encoded)
+
+Best practices for preventing data leaks:
+
+1. **Strict separation:** `strict_separation : Property where D_train : Seq and D_test : Seq are disjoint and stored in separate locations : Prop`.
+2. **Compute on D_train only:** `compute_on_D_train_only : Property where forall statistic: statistic computed_using D_train : Prop`.
+3. **Pre-registration:** `pre_registered : Property where evaluation_protocol specified before experiments : Prop`.
+4. **Random splits:** `random_splits : Property where forall seed: split(D, seed) is random : Prop`.
+5. **Cross-validation:** `cross_validation : Procedure (data, k : int64 = 5) -> Seq[Score] : float64`.
+6. **Hold-out set:** `hold_out : Property where D_holdout : Seq is reserved and never_seen_until final_reporting : Prop`.
+
+The League of Legends bug violated (2): the metric was computed on the entire dataset, so test-set information leaked into the reported score.
+
+### 5.17 The "automatic programming" implications (re-encoded)
+
+If ML is automatic programming, then:
+
+- **The architecture is the language:** `architecture : Language where programs are functions f_theta : Procedure (X) -> Y : float64`.
+- **The training data is the spec:** `training_data : Specification where D : Seq[(Input, Output)] describes desired behavior : Specification`.
+- **The optimization is the compiler:** `optimizer : Compiler where optimizer(theta, data) = theta_opt : Tensor[*] : float64`.
+- **The trained model is the program:** `trained_model : Program where f_theta_opt : Procedure (X) -> Y : float64`.
+
+**Implications:**
+- Better architectures = better programming languages (more expressive).
+- More training data = better specs (more requirements).
+- Better optimization = better compilers (faster compilation).
+- Larger models = more complex programs (more functionality).
+
+### 5.18 The "interpretability" debate (re-encoded)
+
+| Position | Argument (re-encoded) |
+|---|---|
+| **For** interpretability | `interpretable_for_debugging : Property where exists explanation : Seq[Reasoning] such that model.output = apply(reasoning, input) : Prop`. |
+| **For** safety | `interpretable_for_safety : Property where exists explanation : Seq[Reasoning] such that reviewer.can_audit(reasoning) : Prop`. |
+| **Against (per the speaker)** | `no_value_creation : Property where interpretability_research does_not_generate new_capabilities : Prop`. |
+
+**Counter-argument:** Anthropic's Golden Gate model showed that interpretability can identify specific internal features for specific concepts. This is empirical progress, not just safety theater.
+
+**Synthesis:** interpretability may be valuable for **model debugging** (finding why a specific failure occurs) even if not for **value creation** (building new capabilities).
+
+### 5.19 The "safety training" question (re-encoded)
+
+Per the speaker: "Grok might have performed better because of its utter lack of safety training."
+
+**The hypothesis:** `safety_training_reduces_capability : Property where forall model: model_with_safety_training.accuracy < model_without_safety_training.accuracy : Prop`.
+
+**The counter-hypothesis:** `safety_training_prevents_harm : Property where forall model: model_without_safety_training produces_harmful_outputs : Prop`.
+
+**The empirical question:** `capable_and_safe : exists model: model.accurate : Prop and model.safe : Prop` (can we have both capability and safety?).
+
+### 5.20 The "everything's probably fine" worldview (re-encoded)
+
+```
+epistemic_stance : HonestUncertainty where
+  forall future_event: probability(event_is_fine) >= 0.5 : float64
+  and forall future_event: probability(event_is_bad) > 0 : float64
+  and speaker.does_not_take_strong_position : Prop
+```
+
+This is consistent with conscious realism (Hoffman): reality is uncertain, the future is undetermined. The speaker doesn't take a strong position on AI doom or AI utopia.
+
+---
+
+## 6. Connections
+
+This section maps the talk's content to the broader 12-video research campaign.
+
+### 6.1 Backward (cluster A foundations)
+
+#### 6.1.1 `cs229_building_llms_20260621`
+CS229 covers the foundational ML concepts. The speaker's "automatic programming" framing is consistent with cs229's EBM and score matching frameworks.
+
+#### 6.1.2 `score_dynamics_giorgini_20260621`
+Giorgini's score matching is a specific training objective. The speaker's "automatic programming" framing is the broader view.
+
+### 6.2 Backward (cluster B foundations)
+
+#### 6.2.1 `platonic_intelligence_kumar_20260621`
+Kumar's FER vs UFR distinction explains why current LLMs fail at composability:
+- LLMs are FER (Fractured Entangled Representations).
+- Compositional behavior requires UFR (Unified Factored Representations).
+
+#### 6.2.2 `free_lunches_levin_20260621`
+Levin's bioelectric pattern memory could be the basis for game NPCs with persistent memory.
+
+### 6.3 Backward (cluster C foundations)
+
+#### 6.3.1 `brain_counterintuitive_20260621`
+Reservoir computing might be better for game NPC behavior than Transformers.
+
+#### 6.3.2 `generic_systems_fields_20260621`
+Fields' generic systems framework: any working parameterization produces interesting behavior.
+
+#### 6.3.3 `neural_dynamics_miller_20260621`
+Miller's mixed selectivity + traveling waves: a game NPC needs to encode multiple factors in a single representation.
+
+#### 6.3.4 `multiscale_hoffman_20260621`
+Hoffman's trace logic: game NPCs need trace-based reasoning (memory of past interactions). The recursive trace logic provides meta-policies for compositional behavior.
+
+### 6.4 Backward (cluster E foundations)
+
+#### 6.4.1 `cs229_building_llms_20260621`
+The LLM foundations (Transformer architecture, scaling laws) are the basis for the speaker's critique.
+
+#### 6.4.2 `cs336_architectures_20260621`
+The LLaMA template (per cs336) is what the speaker's Dante game used.
+
+### 6.5 Cross-cutting themes
+
+1. **The composability problem** (creikey + Kumar + Hoffman): the campaign's most important applied challenge.
+2. **LLMs as Markov matrices** (creikey + Hoffman): trace-logic framework for compositional behavior.
+3. **Random structure is powerful** (creikey + brain_counterintuitive + cs336): the forgiving basin is empirically observed.
+4. **Practical engineering beats theory** (creikey + cs336 + cs229): empirical-not-formalist stance.
+
+---
+
+## 7. Open Questions (preserved from Pass 1)
+
+Fifteen questions arising from this talk that Pass 2 should address.
+
+### 7.1 Theoretical
+
+1. **Why do current LLM architectures fail at compositional behavior?** The FER hypothesis (per Kumar) is a partial answer; the full picture is unknown.
+2. **What is the right architecture for compositional AI agents?** Candidates: reservoir computing, recursive trace logic, mixed-selectivity RNN, open-ended search.
+3. **Can composability be added to current LLMs via fine-tuning?** Or does it require new architectures from scratch?
+
+### 7.2 Empirical
+
+4. **Does the reservoir computing hypothesis for game NPCs hold?** Empirically untested at scale.
+5. **Can LLM-NPCs be made consistent via persistent state?** RNNs, SSMs, or external memory may help.
+6. **What is the cost-effectiveness of LLM inference for game NPCs?** Per-session cost is significant.
+
+### 7.3 Applied
+
+7. **What is the right architecture for game NPCs (per the speaker)?** The speaker's view: not LLMs; something more compositional.
+8. **How can we test compositionality in LLM-NPCs?** The principled form is the `compositional : Property` predicate.
+9. **How can we measure compositional behavior?** Benchmarks like long-running dialogue coherence tests.
+
+### 7.4 Philosophical
+
+10. **Is the speaker's "indie developer" epistemic stance the right one for AI safety?** Pragmatic skepticism vs. formal verification.
+11. **Is interpretability research worth it (per the speaker's skepticism)?** Anthropic's Golden Gate paper is evidence for; the speaker is skeptical.
+12. **Is the "everything's probably fine" worldview defensible?** Honest uncertainty vs. alarmism.
+
+### 7.5 Connections to campaign
+
+13. **Can the user's manual_slop project benefit from bioelectric-style global control signals (per Levin)?** Design opportunity.
+14. **Can the user's agents achieve compositional behavior (per the speaker's challenge)?** Actionable: measure composability.
+15. **Is the user's multi-tier architecture aligned with the recursive trace logic (per Hoffman)?** Testable.
+
+---
+
+## 8. References
+
+People, papers, and concepts referenced in the talk.
+
+### 8.1 People
+
+| Person | Role |
+|---|---|
+| Cameron Wrights (Creikey) | Speaker; indie game developer & DL hobbyist |
+| John Carmack | id Software co-founder; switched to Python for AI |
+| Yann LeCun | Referenced (per the campaign's broader themes) |
+| Karl Friston | Referenced (FEP framework) |
+
+### 8.2 Papers and projects cited
+
+- Anthropic's Golden Gate Claude paper (Templeton et al. 2024, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet").
+- LLaMA, GPT-4 (the speaker's NPC game used GPT-4 class models).
+- John Carmack's Keen Technologies (AGI company).
+- The "Dante's Cowboy" game (the speaker's NPC game).
+- The "Asteris" game (the speaker's multiplayer space game).
+- The Anthropic vending machine experiment.
+
+### 8.3 Background references
+
+- Cluster 1 (LLM conversations): the speaker's anecdotes.
+- Cluster 0 (Twitter + Cozy LLMs): the "mess as feature" thesis.
+- All 11 prior child tracks of the campaign.
+
+---
+
+*End of deobfuscated report. All §5 math re-encoded per the lexicon (5 rules + 6 noise-dedup maps). The principled form is always produced; the user-specific form is opt-in. The structure matches Pass 1 (8 sections + appendices); the math notation is replaced with constructive type-theoretic form per the lexicon.*