From 59ba8ff2ba3acb8992a23dfb138451e197982af1 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Tue, 23 Jun 2026 00:06:51 -0400 Subject: [PATCH] conductor(deob_umbrella): Initialize Pass 2 de-obfuscation campaign umbrella Pass 2 of 3 multi-pass research campaign. 5 folders total (1 umbrella + 1 warmup + 3 phase children). - Umbrella spec.md (~400 lines): full design, philosophy, 3-layer deliverable, verification - Multi-pass framing: Pass 1 = extraction (done), Pass 2 = de-obfuscation (this), Pass 3 = projection (future user-led) - De-obfuscation philosophy: constructive type theory + Wildberger finitism + boundedness for knowledge + cycles/iteration explicit + etymology-aware - 4 verification criteria: lossless, bounded, constructively typed, etymology-cited - Multi-layer deliverable per video: translation (side-by-side) + replacement (re-encoded) + decoder (per-term etymology) - Phase 0: USER action item (gather 3-10 samples of past de-obfuscation notes) --- .../video_analysis_deob_20260621/README.md | 70 ++++ .../metadata.json | 188 ++++++++++ .../video_analysis_deob_20260621/plan.md | 59 +++ .../video_analysis_deob_20260621/spec.md | 337 ++++++++++++++++++ .../video_analysis_deob_20260621/state.toml | 87 +++++ 5 files changed, 741 insertions(+) create mode 100644 conductor/tracks/video_analysis_deob_20260621/README.md create mode 100644 conductor/tracks/video_analysis_deob_20260621/metadata.json create mode 100644 conductor/tracks/video_analysis_deob_20260621/plan.md create mode 100644 conductor/tracks/video_analysis_deob_20260621/spec.md create mode 100644 conductor/tracks/video_analysis_deob_20260621/state.toml diff --git a/conductor/tracks/video_analysis_deob_20260621/README.md b/conductor/tracks/video_analysis_deob_20260621/README.md new file mode 100644 index 00000000..981fd4de --- /dev/null +++ b/conductor/tracks/video_analysis_deob_20260621/README.md @@ -0,0 +1,70 @@ +# Video Analysis De-obfuscation Campaign (2026-06-21) — Pass 2 of 3 + +**Status:** Active (spec 
approved 2026-06-21) +**Owner:** Tier 1 Orchestrator (umbrella spec); Tier 2 Tech Lead (per-track execution) +**Type:** Multi-track research campaign (5 folders total) +**User-as-source:** The warmup is blocked on the user providing samples (Phase 0). + +This is **Pass 2 of 3** in the research campaign to penetrate the AI field. See [spec.md](./spec.md) for the full design and the user's de-obfuscation philosophy (constructive type theory, bounded forms, cycles/iteration explicit, etymology-aware). + +## Children + +### Precursor (standalone, not a child of this umbrella) + +| | Track | Status | Notes | +|---|-------|--------|-------| +| Warmup | [../video_analysis_deob_warmup_20260621/](../video_analysis_deob_warmup_20260621/) | [ ] | Produces `report.md` + `prompt_template.md` from the user's past samples. User must provide samples first. **Blocks the 3 children below.** | + +### Phase children (3) + +| # | Track | Status | Input | Output | +|---|-------|--------|-------|--------| +| 1 | [../video_analysis_deob_lexicon_20260621/](../video_analysis_deob_lexicon_20260621/) | [ ] | Warmup's `report.md` + `prompt_template.md` | `lexicon.md` + `terms_catalog.md` + `dedup_map.md` | +| 2 | [../video_analysis_deob_pilot_20260621/](../video_analysis_deob_pilot_20260621/) | [ ] | Lexicon + 2 Pass 1 reports (`cs229_building_llms`, `entropy_epiplexity`) | 2× 3-layer deliverables + `pilot_report.md` | +| 3 | [../video_analysis_deob_apply_20260621/](../video_analysis_deob_apply_20260621/) | [ ] | Refined lexicon + 10 Pass 1 reports + 1 synthesis | 11× 3-layer deliverables + `apply_report.md` | + +## Dependency graph + +``` +WARMUP (precursor) ──► PHASE 1 (lexicon) ──► PHASE 2 (pilot) ──► PHASE 3 (apply) + │ │ │ │ + └────────────────────┴────────────────────┴────────────────────┘ + blocked_by chain +``` + +## Status legend + +- `[ ]` not started +- `[~]` in progress +- `[x]` shipped +- `[!]` blocked + +## Cluster / phase legend + +- **Warmup** — research-style track; produces 
the initial lexicon + LLM prompt template +- **Phase 1 (lexicon)** — refines the warmup's draft into a codified operational spec +- **Phase 2 (pilot)** — applies the lexicon to 2 videos (1 foundational + 1 math-heavy) for validation +- **Phase 3 (apply)** — applies the refined lexicon to the remaining 10 + synthesis + +## Multi-layer deliverable (per video) + +For each de-obfuscated video, the output is 3 files in `artifacts//`: +1. **`translation.md`** — side-by-side table: original expression ↔ re-encoded form +2. **`deobfuscated.md`** — the re-encoded report (replacement; same 8-section structure as Pass 1) +3. **`decoder.md`** — per-term decoder: form anchor, etymology, definition history, link to original section + +## Verification (4 criteria per deliverable) + +- **Lossless** — every Pass 1 concept is represented +- **Bounded** — no `∞_val` or `∞_card`; all values are finite forms +- **Constructively typed** — every expression has a type +- **Etymology-cited** — every new term has a 1-line origin + 1-line definition history in the decoder + +## See also + +- [spec.md](./spec.md) — full design (15 sections, ~400 lines) +- [plan.md](./plan.md) — campaign-level plan +- [metadata.json](./metadata.json) — scope, risk register, verification criteria +- [state.toml](./state.toml) — current phase + task tracking +- `../video_analysis_campaign_20260621/spec.md` §0, §11 — multi-pass framing (Pass 1 → Pass 2 handoff) +- `../intent_dsl_survey_20260612/report_v1.2.md` — sibling DSL (tool-verb DSL for AI agents; shares philosophy but is for tool verbs, not math re-encoding) diff --git a/conductor/tracks/video_analysis_deob_20260621/metadata.json b/conductor/tracks/video_analysis_deob_20260621/metadata.json new file mode 100644 index 00000000..7e306342 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_20260621/metadata.json @@ -0,0 +1,188 @@ +{ + "track_id": "video_analysis_deob_20260621", + "name": "Video Analysis De-obfuscation Campaign (Pass 2 of 3)", + 
"created": "2026-06-21", + "status": "spec_approved", + "blocked_by": [], + "blocks": [ + "video_analysis_deob_warmup_20260621" + ], + "priority": "A", + "rationale": "User-blocking Pass 2 of the 3-pass research campaign. De-obfuscates the 12 Pass 1 deep-dive reports via the user's constructive type-theoretic re-encoding DSL. Lossless preservation directive (carries from Pass 1). 5 folders: 1 warmup (precursor) + 1 umbrella + 3 phase children (lexicon/pilot/apply). Multi-layer deliverable per video: translation (side-by-side) + replacement (re-encoded) + decoder (per-term etymology).", + "type": "multi-track research campaign (1 umbrella + 1 warmup + 3 phase children = 5 folders)", + "domain": "meta-tooling (research deliverable + LLM operational spec; no manual_slop src/ changes)", + "scope": { + "new_folders": [ + "conductor/tracks/video_analysis_deob_20260621/", + "conductor/tracks/video_analysis_deob_warmup_20260621/", + "conductor/tracks/video_analysis_deob_lexicon_20260621/", + "conductor/tracks/video_analysis_deob_pilot_20260621/", + "conductor/tracks/video_analysis_deob_apply_20260621/" + ], + "new_files_umbrella": [ + "spec.md", + "plan.md", + "metadata.json", + "state.toml", + "README.md" + ], + "new_files_warmup": [ + "spec.md", + "plan.md", + "metadata.json", + "state.toml", + "samples/ (gitignored)", + "report.md (the design philosophy + lexicon + dedup maps)", + "prompt_template.md (the LLM-direct operational spec)" + ], + "new_files_per_child": [ + "spec.md (lightweight)" + ], + "new_files_pilot": [ + "artifacts/cs229_building_llms/translation.md", + "artifacts/cs229_building_llms/deobfuscated.md", + "artifacts/cs229_building_llms/decoder.md", + "artifacts/entropy_epiplexity/translation.md", + "artifacts/entropy_epiplexity/deobfuscated.md", + "artifacts/entropy_epiplexity/decoder.md", + "pilot_report.md" + ], + "new_files_apply": [ + "artifacts/<10 remaining slugs>/translation.md (×10)", + "artifacts/<10 remaining slugs>/deobfuscated.md (×10)", + 
"artifacts/<10 remaining slugs>/decoder.md (×10)", + "artifacts/synthesis/translation.md", + "artifacts/synthesis/deobfuscated.md", + "artifacts/synthesis/decoder.md", + "apply_report.md" + ], + "modified_files": [ + "conductor/tracks/video_analysis_campaign_20260621/spec.md (§11.1 updated to reference this campaign)", + "conductor/tracks.md (add row for this campaign)", + "conductor/chronology.md (5 new rows after campaign ships)" + ], + "deleted_files": [], + "gitignored_patterns": [ + "conductor/tracks/video_analysis_deob_warmup_20260621/samples/** (user's past notes are local-only)" + ] + }, + "estimated_effort": { + "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.", + "phase_0": "1 USER action item (gather samples)", + "phase_1": "5 tasks: warmup initialization, sample survey, report.md (~1000-3000 LOC), prompt_template.md (~200-500 LOC), user approval", + "phase_2": "4 tasks: lexicon child init, refine warmup's draft, produce lexicon.md + terms_catalog.md + dedup_map.md, user approval", + "phase_3": "5 tasks: pilot child init, apply to 2 videos, write pilot_report.md, user approval", + "phase_4": "4 tasks: apply child init, apply to 10+1 outputs, write apply_report.md", + "phase_5": "6 tasks: closeout (README update, state.toml, end-of-track report, archive move, chronology, tracks.md)", + "summary": "5 track folders, 1 user action item, 13 deliverable files (2 pilot + 11 apply, each 3-layer), 4 reports (warmup report, warmup template, pilot, apply), 5 closeout tasks. No day estimates per project convention." 
+ }, + "verification_criteria": [ + "Warmup shipped with report.md + prompt_template.md (and the user has approved the lexicon)", + "Phase 1 (lexicon) shipped with lexicon.md + terms_catalog.md + dedup_map.md", + "Phase 2 (pilot) shipped with 2 deobfuscated deliverables (each 3-layer) + pilot_report.md", + "Phase 3 (apply) shipped with 11 deobfuscated deliverables (each 3-layer) + apply_report.md", + "Each deliverable passes the 4 verification criteria: lossless, bounded, constructively typed, etymology-cited", + "Umbrella state.toml updated to status = 'completed'", + "End-of-track report at docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md", + "All 5 folders move to conductor/archive/ per the project's archiving convention", + "conductor/chronology.md updated with 5 new rows", + "No new src/*.py files created (per AGENTS.md File Size and Naming Convention)", + "No new pyproject.toml dependencies" + ], + "risk_register": [ + { + "id": "R1", + "title": "User cannot provide samples in time", + "likelihood": "medium", + "scope_impact": "Warmup blocked", + "mitigation": "User can provide partial samples (1-2 examples); warmup can use them as a starter lexicon" + }, + { + "id": "R2", + "title": "User's samples don't have enough de-obfuscation patterns", + "likelihood": "medium", + "scope_impact": "Warmup produces a thin lexicon", + "mitigation": "Phase 1 (lexicon) extends the warmup's draft with constructive type theory defaults" + }, + { + "id": "R3", + "title": "Lexicon can't capture a concept in bounded form", + "likelihood": "medium", + "scope_impact": "Some concepts remain 'indefinite — see original'", + "mitigation": "Document the gap; don't force a translation. 
The 4 verification criteria allow 'etymology-cited' but not 'forced'" + }, + { + "id": "R4", + "title": "Pilot reveals the lexicon is overfit to the user's style", + "likelihood": "low", + "scope_impact": "Refinement needed in Phase 2", + "mitigation": "pilot_report.md captures gaps; Phase 3 uses the refined lexicon" + }, + { + "id": "R5", + "title": "The 3-layer deliverable format is too verbose for some videos", + "likelihood": "low", + "scope_impact": "Adjust per video", + "mitigation": "Format is a template, not a rigid structure; some sections may be smaller" + }, + { + "id": "R6", + "title": "Tier 2 attempts to invent the lexicon without the user's samples", + "likelihood": "low (if user samples present)", + "scope_impact": "Lexicon is invented, not evidence-based", + "mitigation": "Warmup spec is explicit: 'consume user samples FIRST; lexicon is evidence-based'" + }, + { + "id": "R7", + "title": "Pass 3 needs the de-obfuscated outputs but Pass 2 isn't done", + "likelihood": "high (timeline)", + "scope_impact": "Pass 3 blocked", + "mitigation": "This campaign's 'lossless preservation' ensures Pass 3 has all the input it needs once Pass 2 ships" + }, + { + "id": "R8", + "title": "The user changes their mind about the philosophy mid-campaign", + "likelihood": "low", + "scope_impact": "Pilot reveals the shift; lexicon is updated", + "mitigation": "pilot_report.md is the checkpoint for user review" + } + ], + "architecture_reference": { + "primary_documents": [ + "conductor/workflow.md (track convention, per-task commits, git notes, verification protocol)", + "conductor/code_styleguides/python.md (1-space indent, type hints, no comments - IF code is written)", + "conductor/code_styleguides/error_handling.md (Result[T] pattern - IF code is written)", + "AGENTS.md (artifact isolation, file naming, no new src/.py)" + ], + "related_tracks": [ + "conductor/tracks/intent_dsl_survey_20260612/ (sibling DSL: tool-verb DSL for AI agents, shares philosophy)", + 
"conductor/tracks/video_analysis_campaign_20260621/ (Pass 1 - the input to de-obfuscate)", + "conductor/tracks/nagent_review_20260608/ (research-track precedent)", + "conductor/tracks/fable_review_20260617/ (research-track precedent)" + ], + "styleguides_applied": [ + "agent_memory_dimensions.md (Pass 2 produces a 'knowledge' memory artifact)", + "knowledge_artifacts.md (knowledge harvest pattern; relevant to the de-obfuscation's durable nature)" + ] + }, + "deferred_to_followup_tracks": [ + { + "title": "Pass 3: Projection to user's applied domain", + "description": "Apply Pass 2's de-obfuscated outputs to the user's preferred code style. Influences: handmade/data-oriented/GPGPU (Timothy Lottes, Onat Türkçüoğlu, Jebrim) + user's own caveats.", + "track_status": "not started - blocked by this campaign", + "blocker_action_item": "User must articulate 'own caveats' before Pass 3 starts (per Pass 1 spec §11.2)" + } + ], + "regressions_and_pre_existing_failures": [], + "pre_existing_failures_remaining": [], + "user_directives": [ + "Unorthodox knowledge curation philosophy (2026-06-21)", + "Constructive type theory + Wildberger-style finitism as foundation (2026-06-21)", + "Boundedness required for direct knowledge; cycles/iteration allowed but expressed explicitly (2026-06-21)", + "Multi-layer deliverable per video (translation + replacement + decoder) (2026-06-21)", + "Warmup is the precursor; lexicon is evidence-based from user's past samples (2026-06-21)", + "Report + prompt template as the warmup output (2026-06-21)", + "5 folders at conductor/tracks/ level, hybrid umbrella structure (2026-06-21)", + "No day estimates per conductor/workflow.md Tier 1 Track Initialization Rules" + ] +} diff --git a/conductor/tracks/video_analysis_deob_20260621/plan.md b/conductor/tracks/video_analysis_deob_20260621/plan.md new file mode 100644 index 00000000..8ca7fbc2 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_20260621/plan.md @@ -0,0 +1,59 @@ +# Plan: Video Analysis 
De-obfuscation Campaign (umbrella) + +This is the umbrella-level plan for Pass 2 of the 3-pass research campaign. Per the Tier 1 Track Initialization Rules, scope is measured in files/sites — no day estimates. + +## Phase 0: User samples provided + +This phase is a USER action item, not a Tier 2/3 action. + +- [ ] **Task 0.1:** User gathers 3-10 samples of their past de-obfuscation notes (any format: markdown, txt, mixed) and places them in `conductor/tracks/video_analysis_deob_warmup_20260621/samples/`. + +## Phase 1: Warmup (precursor) + +- [ ] **Task 1.1:** Initialize the warmup track: create `plan.md`, `metadata.json`, `state.toml` (Tier 2). +- [ ] **Task 1.2:** Survey the samples (Tier 3 worker delegated): term frequency, structural patterns, "form projection" heuristics, noise-dedup maps. +- [ ] **Task 1.3:** Write `report.md` (Tier 3 worker, ~1000-3000 LOC). Sections: design philosophy (anchored to user directives), the curated lexicon (terms + re-encodings), the 3 noise-dedup maps, sample transformations. +- [ ] **Task 1.4:** Write `prompt_template.md` (Tier 3 worker, ~200-500 LOC). The operational artifact; an LLM can be prompted with this directly to perform the de-obfuscation. +- [ ] **Task 1.5:** User review + approval of the lexicon + template. + +## Phase 2: Phase 1 child — Lexicon refinement + +- [ ] **Task 2.1:** Initialize the lexicon child track (Tier 2). +- [ ] **Task 2.2:** Refine the warmup's draft: add 5-10 test cases (example transformations drawn from the user's samples); add the "form anchor" requirement; add cross-references to the warmup's report. +- [ ] **Task 2.3:** Produce `lexicon.md` (the codified operational spec), `terms_catalog.md` (the machine-readable lexicon), `dedup_map.md` (the 3 noise-dedup maps). +- [ ] **Task 2.4:** User review + approval before Phase 3 child starts. + +## Phase 3: Phase 2 child — Pilot on 2 videos + +- [ ] **Task 3.1:** Initialize the pilot child track (Tier 2). 
+- [ ] **Task 3.2:** Apply the lexicon to `cs229_building_llms` (Pass 1 report). Produce 3-layer deliverable in `artifacts/cs229_building_llms/`. +- [ ] **Task 3.3:** Apply the lexicon to `entropy_epiplexity` (Pass 1 report). Produce 3-layer deliverable in `artifacts/entropy_epiplexity/`. +- [ ] **Task 3.4:** Write `pilot_report.md` capturing: lexicon refinements discovered, concepts that didn't fit (gaps), process improvements. +- [ ] **Task 3.5:** User review + approval before Phase 4 child starts. + +## Phase 4: Phase 3 child — Apply to remaining 10 + synthesis + +- [ ] **Task 4.1:** Initialize the apply child track (Tier 2). +- [ ] **Task 4.2:** Apply the refined lexicon to each of the 10 remaining Pass 1 reports. Produce 3-layer deliverables in `artifacts//`. +- [ ] **Task 4.3:** Apply the refined lexicon to the cross-cutting synthesis. Produce 3-layer deliverable in `artifacts/synthesis/`. +- [ ] **Task 4.4:** Write `apply_report.md` capturing: final lexicon v2, final process refinements, open questions for Pass 3. + +## Phase 5: Campaign closeout + +- [ ] **Task 5.1:** Update umbrella `README.md` with final statuses (all 4 children shipped). +- [ ] **Task 5.2:** Update umbrella `state.toml` to `status = "completed"`. +- [ ] **Task 5.3:** Write end-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md`. +- [ ] **Task 5.4:** Move all 5 folders to `conductor/archive/` per the project's archiving convention. +- [ ] **Task 5.5:** Update `conductor/chronology.md` with 5 new rows. +- [ ] **Task 5.6:** Update `conductor/tracks.md` to remove the campaign from Active Tracks. 
+ +## Verification (gate per workflow.md) + +Each phase's completion requires: +- [ ] Idempotency check: re-running the de-obfuscation produces identical output (modulo timestamps) +- [ ] 4 verification criteria per umbrella spec §8 (lossless, bounded, constructively typed, etymology-cited) +- [ ] User review + approval at each phase boundary +- [ ] Per-task commits with git notes +- [ ] All artifacts committed to git + +The campaign is "Pass 2 complete" when all 5 folders shipped, the 13 deliverables (2 pilot + 11 apply) are in `artifacts/`, and the user has approved each. diff --git a/conductor/tracks/video_analysis_deob_20260621/spec.md b/conductor/tracks/video_analysis_deob_20260621/spec.md new file mode 100644 index 00000000..c5a75225 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_20260621/spec.md @@ -0,0 +1,337 @@ +# Track Specification: Video Analysis De-obfuscation Campaign (2026-06-21) + +**Status:** Active (spec approved 2026-06-21) +**Initialized:** 2026-06-21 +**Owner:** Tier 1 Orchestrator (umbrella spec + synthesis); Tier 2 Tech Lead (per-track execution) +**Priority:** A (user-blocking; Pass 2 of the 3-pass research campaign) +**Type:** Multi-track research campaign (1 warmup + 1 umbrella + 3 phase children = 5 folders total) +**Domain:** Meta-tooling (research deliverable + LLM operational spec; no `src/` changes) + +> **Purpose.** This umbrella organizes Pass 2 of the user's 3-pass research campaign: **de-obfuscation** of the Pass 1 video reports via the user's constructive type-theoretic re-encoding DSL. The de-obfuscation reduces standard math notation + verbose DSL/verbiage into a bounded, constructive, type-theoretic form that bridges the conceptual gap and crystallizes the formal language into the reader's mind. + +> **Multi-pass context.** Pass 1 produced 12 deep-dive reports (1000-10000 LOC each) + 1 cross-cutting synthesis. Pass 2 takes those and produces a multi-layer de-obfuscated version per video. 
Pass 3 (future, user-led) projects the de-obfuscated content to the user's applied domain (handmade/data-oriented/GPGPU + own caveats). + +> **Companion docs.** The warmup track (`video_analysis_deob_warmup_20260621/`) is the precursor that produces the initial lexicon + LLM prompt template. The 3 phase children (`video_analysis_deob_{lexicon,pilot,apply}_20260621/`) consume the warmup's output and apply it to the Pass 1 reports. + +--- + +## 1. Overview + +### 1.1 The user's de-obfuscation philosophy (foundational) + +The user curates knowledge unorthodoxly, especially formal math/sciences. Their position: + +| Position | Take | +|---|---| +| **Form requires bounds** | "To be known is to project a form." Boundedness is required for direct knowledge. | +| **Indefinite is not directly knowable** | What is unbounded is indefinite; what is indefinite is indiscernible, unobserved, unsubject, unknowable. | +| **Cycles/iteration/repetition are allowed** | Indefinite *operations* on bounded *forms* are expressible. `Stream A = nat -> A` is fine; `∞_val` is not. | +| **The agent is bounded by necessity** | An agent is "envesseled in the soup of the universe," separated from the indefinite to discern. The agent cannot be indefinite. | +| **Standard math notation is "noise"** | Too compressed, error-prone, ASCII-hostile, not programmatic, not verifiable, not visualizable. Lots of synonyms that mean the same thing (Curry-Howard: proofs=programs, types=propositions, etc.). | +| **Constructive type theory is the foundation** | Proofs = programs (Curry-Howard); every value is a bounded form; operations are transformations. | +| **Lexicon is etymology-aware** | Each term's word origin + definitional history is documented. Words are chosen to match modern subjective experience. | +| **Inspiration** | Modern PL design: concatenative (Forth/KYRA/CoSy), data-oriented imperative (Lottes), immediate-mode DAG-building DSLs (O'Donnell's IMGUI). 
| + +### 1.2 What Pass 2 produces + +For each of the 12 Pass 1 reports + 1 cross-cutting synthesis, Pass 2 produces a **3-layer de-obfuscated deliverable**: + +1. **Translation** (`_translation.md`) — side-by-side table: original expression ↔ re-encoded form +2. **Replacement** (`_deobfuscated.md`) — the re-encoded form replaces the original; the report is read as a bounded, constructive, type-theoretic document +3. **Decoder index** (`_decoder.md`) — per-term decoder: form anchor, etymology, definition history, link to the original section + +Plus a per-track **pilot_report.md** or **apply_report.md** capturing lexicon refinements. + +### 1.3 The 2-stage Pass 2 flow + +``` +Stage 1 (Warmup - precursor): Stage 2 (Apply - 3 phases): + ┌─ Phase 1: Lexicon (refine warmup's draft) +User's past notes ──► Warmup report.md + │ + prompt_template.md ───────┤─ Phase 2: Pilot (apply to 2 videos, refine) + │ + └─ Phase 3: Apply (apply to 10 + synthesis) +``` + +--- + +## 2. Current State Audit (as of 2026-06-21) + +### 2.1 Already Available (DO NOT re-derive) + +| Asset | Location | Use in Pass 2 | +|---|---|---| +| Pass 1 reports (12 + 1 synthesis) | `conductor/tracks/video_analysis__20260621/report.md` + `summary.md` | The input to de-obfuscate | +| Pass 1 transcripts + OCR | `conductor/tracks/video_analysis__20260621/artifacts/` | Source material for re-encoding context | +| `intent_dsl_survey_20260612` report | `conductor/tracks/intent_dsl_survey_20260612/report_v1.2.md` | Sibling DSL (a tool-verb DSL for AI agents); not the math re-encoding, but shares the philosophy | +| 4-tier vocab + 14-primitive grammar | `intent_dsl_survey_20260612/report_v1.2.md` §3, §4 | Reference for the PL-design vocabulary to use in the de-obfuscation DSL | +| `conductor/code_styleguides/error_handling.md` | Conductor docs | `Result[T]` convention for any new Python tooling | +| `conductor/code_styleguides/python.md` | Conductor docs | 1-space indent, type hints, no comments | +| Reference 
scripts (bootslop) | `C:\projects\forth\bootslop\*.py` | yt-dlp / cv2 / winsdk OCR — NOT needed for Pass 2 (no video processing) | + +### 2.2 Gaps to Fill (this campaign's scope) + +| # | Gap | Resolution | +|---|---|---| +| G1 | The user has no codified de-obfuscation DSL | Warmup produces `report.md` + `prompt_template.md` from the user's past samples | +| G2 | The de-obfuscation lexicon is not yet finalized | Phase 1 (lexicon) refines the warmup's draft into a codified spec | +| G3 | No pilot validation of the lexicon | Phase 2 (pilot) applies to 2 videos (1 foundational + 1 math-heavy) and captures refinements | +| G4 | No application to the remaining 10 + synthesis | Phase 3 (apply) applies the refined lexicon to the remaining Pass 1 outputs | +| G5 | No multi-layer deliverable structure | The 3-layer format (translation / replacement / decoder) is the new convention | + +--- + +## 3. Goals + +1. **Lexicon derived from the user's exemplars.** The de-obfuscation DSL is not invented from scratch; it is extracted from the user's past de-obfuscation notes via the warmup track. Evidence-based, not imposed. +2. **LLM-direct operational spec.** The de-obfuscation is performed by an LLM following the prompt template. The template is the "code" — the contract between the warmup and the apply phases. +3. **Lossless preservation (carries Pass 1's directive).** No Pass 1 concept is lost. The 3-layer output ensures every standard-math expression is represented (translation), replaced (replacement), and explained (decoder). +4. **Bounded, constructive, type-theoretic.** Every value is a bounded form. Iteration is explicit. "Infinity" is disambiguated lexically: `∞_val` (banned), `∞_proc` (allowed), `∞_card` (banned). +5. **Etymology + definitional history.** Each new term has a 1-line origin note + a 1-line definition history in the decoder. +6. **Multi-pass handoff.** Pass 3 (projection to applied domain) can consume the de-obfuscated outputs as its input. 
The handoff is clean: Pass 2 produces bounded, constructive forms; Pass 3 can apply them to the user's stylistic preferences. + +--- + +## 4. Functional Requirements + +### FR1. Umbrella folder + README + +**WHERE:** `conductor/tracks/video_analysis_deob_20260621/` + +**WHAT:** This folder contains the umbrella design (this spec) + 4 sibling files (`plan.md`, `metadata.json`, `state.toml`, `README.md`). The README is the index of the 4 sibling tracks (warmup + 3 phases) with their statuses. + +### FR2. Warmup track (precursor) + +**WHERE:** `conductor/tracks/video_analysis_deob_warmup_20260621/` + +**WHAT:** Standalone research-style track. Produces: +- `report.md` — the design philosophy + the curated lexicon (terms + re-encodings) + the 3 noise-dedup maps (Curry-Howard-style collapses) +- `prompt_template.md` — the operational spec; an LLM can be prompted with this directly + +**Inputs:** The user provides samples in `samples/` (their past de-obfuscation notes). Format: markdown, txt, or any text the user has. + +**Process:** Tier 2 worker surveys the samples for term frequency, structural patterns, "form projection" heuristics, and noise-dedup maps. Produces a report + prompt template following the convention of `intent_dsl_survey_20260612/report_v1.2.md`. + +**`blocked_by`:** none (user must provide samples before warmup can start; the user's action item is the FIRST dependency). + +**`blocks`:** the 3 phase children (lexicon, pilot, apply) all depend on the warmup's output. + +### FR3. Phase 1 — Lexicon refinement + +**WHERE:** `conductor/tracks/video_analysis_deob_lexicon_20260621/` + +**WHAT:** Consumes the warmup's `report.md` + `prompt_template.md`. Produces a codified `lexicon.md` (the operational spec for the de-obfuscation LLM) + `terms_catalog.md` (the machine-readable lexicon) + `dedup_map.md` (the 3 noise-dedup maps). 
+ +The lexicon refinement adds: +- Test cases (5-10 example transformations drawn from the user's samples) +- A "form anchor" requirement: each re-encoding must project from an indefinite to a bounded form +- Cross-references to the warmup's report sections + +### FR4. Phase 2 — Pilot on 2 videos + +**WHERE:** `conductor/tracks/video_analysis_deob_pilot_20260621/` + +**WHAT:** Consumes the codified `lexicon.md` + 2 Pass 1 reports. The 2 pilot videos are: +1. `cs229_building_llms` (foundational ML/LLM coverage — wide scope, good test for "form projection" across many concepts) +2. `entropy_epiplexity` (math-heavy, focused on information-theoretic concepts — good test for "boundedness" + type-theoretic encoding of measure theory) + +For each pilot video, produces the 3-layer deliverable in `artifacts//`: +- `translation.md` (side-by-side: original ↔ re-encoded) +- `deobfuscated.md` (replacement: re-encoded form replaces the original) +- `decoder.md` (per-term decoder: form anchor, etymology, definition history) + +Plus a `pilot_report.md` capturing: +- Lexicon refinements discovered during the pilot +- Concepts that didn't fit the lexicon (gaps) +- Process improvements for Phase 3 + +### FR5. Phase 3 — Apply to remaining 10 + synthesis + +**WHERE:** `conductor/tracks/video_analysis_deob_apply_20260621/` + +**WHAT:** Consumes the refined lexicon (from Phase 2) + 10 remaining Pass 1 reports + 1 cross-cutting synthesis. Produces the 3-layer deliverable for each, in `artifacts//`. + +Plus an `apply_report.md` capturing: +- Final lexicon v2 +- Final process refinements +- Open questions for Pass 3 + +### FR6. Multi-layer deliverable structure (per video) + +For each Pass 1 report, the de-obfuscation produces 3 files in `artifacts//`: + +**`_translation.md`** — side-by-side translation table: +```markdown +# Translation: