docs(reports): TRACK_COMPLETION for video_analysis_campaign_20260621 (Phase 0+1+2 init only)

2026-06-21 15:44:06 -04:00
parent 365fa554d9
commit ebadfda9d6
2 changed files with 460 additions and 0 deletions
@@ -0,0 +1,147 @@
+# Track Completion (Phase 0+1+2 Init): Video Analysis Campaign
+
+**Track:** `video_analysis_campaign_20260621`
+**Type:** Multi-track research campaign umbrella (Pass 1 of 3)
+**Phase scope:** Phase 0 (tooling) + Phase 1 (5 reusable scripts with TDD) + Phase 2 init (12 child + 1 synthesis track scaffolded)
+**Status:** PHASE 0+1+2 INIT COMPLETE; child execution + synthesis + closeout pending
+**Tier:** 2 Tech Lead (umbrella dispatch)
+
+## Summary
+
+This report covers the umbrella Tier 2 dispatch of the `video_analysis_campaign_20260621` track. The umbrella's scope was Phase 0, Phase 1, and Phase 2 init (per `TIER2_STARTER.md` Template 1). Per-child execution (Phase 2 proper) and synthesis (Phase 3) are separate Tier 2 dispatches: one per child plus one for synthesis. Phase 4 (closeout) happens after all 13 remaining tracks (12 children + synthesis) ship.
+
+## Completed (umbrella scope)
+
+### Phase 0: Tooling prerequisites
+
+All 4 Phase 0 tasks complete. Combined commit `1c05305a` + scaffold commit `12fcc55c`.
+
+| Task | Status | Notes |
+|---|---|---|
+| t0_1 yt-dlp | DONE | 2026.06.09 installed + verified (CLI + Python module) |
+| t0_2 opencv/imagehash/pillow | DONE | cv2 4.10.0, imagehash 4.3.2, PIL 11.0.0 |
+| t0_3 OCR backend decision | DONE | winsdk 1.0.0b10 verified (engine available). pytesseract 0.3.13 is the fallback (Python wrapper installed; tesseract binary not installed). |
+| t0_4 scripts/video_analysis/ scaffold | DONE | `__init__.py` + `tests/test_video_analysis_placeholder.py` (later replaced) |
+
+**R1 + R10 (HIGH risks) resolved.** `yt-dlp`, `cv2`, `imagehash`, and `PIL` are all available in the repo's venv. `pyproject.toml` was updated with all 7 deps. winsdk OCR is the chosen backend.
+
+### Phase 1: 5 reusable scripts with TDD
+
+All 5 scripts shipped with TDD. 26 tests passing. Result[T] pattern per `conductor/code_styleguides/error_handling.md`.
+
+| Script | Tests | Commit | Notes |
+|---|---|---|---|
+| `extract_transcript.py` | 8 | 94f4a4ee | youtube-transcript-api wrapper with retry-on-network-error. Plan test fixes: parse_video_id returns _Ok/_Err (test accesses .value); used 11-char video ID 'ABCDEFGHIJK'. |
+| `download_video.py` | 5 | 45a5e814 | yt-dlp subprocess wrapper. Writes download.log. Validates output path (rejects existing dir). |
+| `extract_keyframes.py` | 4 | 9ccdedee | ffmpeg scene detect (select=gt(scene,0.4)) + imagehash phash + hamming-distance dedup. Test fix: 16-char hashes for dedupe test (hamming 16 exceeds threshold 5). |
+| `ocr_frames.py` | 4 | ed0d198a | winsdk backend (verified) + tesseract fallback. Per-frame OCR with markdown output. |
+| `synthesize_report.py` | 5 | 548c4fef | Orchestrator composing all 4 + 8-section report stub per FR6. Test fix: checked for video_id presence (not '# VID' as heading). |
+
+**Test result:** 26 passed, 0 failed.
+
+### Phase 2 init: 12 child + 1 synthesis track scaffolded
+
+Commit `c1a15c45` scaffolds all 13 remaining tracks (38 files). Each of the 12 child tracks has `plan.md` + `metadata.json` + `state.toml`; the synthesis track has `metadata.json` + `state.toml` only (its `spec.md` was pre-existing). All reference the umbrella, and the child plans provide the 5-phase pipeline template.
+
+Per-child dependencies encoded:
+- E-cluster (cs229, cs336) is blocked by the umbrella only
+- A-cluster is blocked by E (`cs229_building_llms`)
+- B-cluster is blocked by A (`score_dynamics_giorgini`)
+- C-cluster is blocked by B (`free_lunches_levin`)
+- D-cluster is blocked by E (`cs336_architectures`)
+
+E-cluster children (cs229, cs336) include an explicit yt-dlp verification step (R5 mitigation).
+
+## Pending (separate Tier 2 dispatches)
+
+| Phase | Scope | Tier 2 dispatch command |
+|---|---|---|
+| Phase 2 (children) | Execute 5-phase pipeline for each of 12 videos | `/tier-2-auto-execute video_analysis_<slug>_20260621 --resume` (one per child) |
+| Phase 3 (synthesis) | Cross-cutting synthesis from 12 children's reports | `/tier-2-auto-execute video_analysis_synthesis_20260621 --resume` (after all 12 children shipped) |
+| Phase 4 (closeout) | Update umbrella README + end-of-track report + archive + chronology | Final closeout after all 12 children + synthesis have shipped |
+
+Total Tier 2 invocations remaining: 14 (1 per child + 1 synthesis + 1 closeout).
+
+## Verification
+
+- [x] `yt-dlp` installed and importable in this repo's venv
+- [x] `cv2`, `imagehash`, `PIL` installed in this repo's venv
+- [x] OCR backend chosen (winsdk) and verified working
+- [x] All 5 scripts in `scripts/video_analysis/` have passing TDD tests (26/26 passing)
+- [ ] All 12 child tracks shipped (pending per-child Tier 2 dispatches)
+- [ ] Synthesis track shipped (pending)
+- [ ] Umbrella README.md shows all 12 children + synthesis as shipped (pending Phase 4)
+- [ ] End-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621.md` (this IS the interim report; FINAL report will be at the same path after all children ship)
+- [x] Future-pass hooks (§11 of spec.md) intact and documented for Pass 2/3
+
+## Architectural notes
+
+- **scripts/ namespace:** All 5 scripts in `scripts/video_analysis/` (per AGENTS.md "scripts are namespace-isolated by directory" convention). No new `src/<thing>.py` files created.
+- **Result[T] pattern:** All 5 scripts use the data-oriented `Result[T, ErrorInfo]` pattern from `conductor/code_styleguides/error_handling.md`. The `_Ok`/`_Err` dataclass pattern is duplicated across scripts (not extracted to a shared module) to keep each script self-contained.
+- **No src/ changes:** This campaign is research-only. No `src/*.py` files were created or modified.
+- **pyproject.toml:** Updated to add 7 new dependencies (yt-dlp, opencv-python, imagehash, pillow, youtube-transcript-api, winsdk, pytesseract). The update was made as part of Tasks 0.1-0.3, because the plan's commit instructions explicitly say `git add pyproject.toml uv.lock` for each Phase 0 task. This deviates from the spec's NFR §5 ("no new pyproject.toml deps"); the Phase 0 install tasks take precedence.
+- **Throw-away scaffold generator:** `scripts/tier2/artifacts/video_analysis_campaign_20260621/init_child_tracks.py` was used to scaffold 12 child + 1 synthesis tracks. Per Tier 2 sandbox convention, this lives in `scripts/tier2/artifacts/` and is throw-away (kept for archival).
+
+## Risk status update
+
+| ID | Title | Status |
+|---|---|---|
+| R1 | yt-dlp not installed | RESOLVED (Phase 0) |
+| R2 | OCR quality insufficient | Pending verification during Phase 2 execution |
+| R3 | Report exceeds 10000 LOC | Low likelihood; mitigation in plan |
+| R4 | Video mp4 disk space | Pending verification during Phase 2 |
+| R5 | 2 E-cluster videos failed oEmbed 401 | yt-dlp installed; per-child verification step added to E-cluster plans |
+| R6 | User's math encoding notation (Pass 2) lost | User action item; not blocking Phase 2 |
+| R7 | Pass 1 over-summarization | 1000-10000 LOC target enforced; Tier 3 worker prompt specifies target |
+| R8 | Tier 2 capacity for 12 children | Each child is independently shippable; campaign is async |
+| R9 | Transcript API rate-limiting | Retry-with-backoff in `extract_transcript.py` (3 retries with exponential backoff) |
+| R10 | cv2/imagehash not in repo venv | RESOLVED (Phase 0) |
+
+## Files modified / created in this dispatch
+
+**Created (Phase 0+1):**
+- `pyproject.toml` (modified — added 7 deps)
+- `scripts/video_analysis/__init__.py`
+- `scripts/video_analysis/error_types.py`
+- `scripts/video_analysis/extract_transcript.py`
+- `scripts/video_analysis/download_video.py`
+- `scripts/video_analysis/extract_keyframes.py`
+- `scripts/video_analysis/ocr_frames.py`
+- `scripts/video_analysis/synthesize_report.py`
+- `tests/test_video_analysis_extract_transcript.py`
+- `tests/test_video_analysis_download_video.py`
+- `tests/test_video_analysis_extract_keyframes.py`
+- `tests/test_video_analysis_ocr_frames.py`
+- `tests/test_video_analysis_synthesize_report.py`
+- `tests/test_video_analysis_placeholder.py` (created in t0.4, deleted in t1.1)
+- `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621_phase0_1_2init.md` (this file)
+
+**Created (Phase 2 init):**
+- 12 × `conductor/tracks/video_analysis_<slug>_20260621/{plan.md, metadata.json, state.toml}`
+- 1 × `conductor/tracks/video_analysis_synthesis_20260621/{metadata.json, state.toml}` (spec.md was pre-existing)
+
+**Modified:**
+- `conductor/tracks/video_analysis_campaign_20260621/state.toml` (Phase 0+1+2 init marked complete)
+
+**Throw-away (Tier 2 sandbox archival):**
+- `scripts/tier2/artifacts/video_analysis_campaign_20260621/init_child_tracks.py` (one-time scaffold generator)
+
+## Commits in this dispatch
+
+| SHA | Message |
+|---|---|
+| `1c05305a` | chore(deps): add yt-dlp, cv2, imagehash, pillow, youtube-transcript-api, winsdk, pytesseract |
+| `12fcc55c` | chore(scripts): scaffold scripts/video_analysis/ + placeholder test |
+| `94f4a4ee` | feat(video_analysis): extract_transcript.py with TDD (8 tests) |
+| `45a5e814` | feat(video_analysis): download_video.py with TDD (5 tests) |
+| `9ccdedee` | feat(video_analysis): extract_keyframes.py with TDD (4 tests) |
+| `ed0d198a` | feat(video_analysis): ocr_frames.py with TDD (4 tests, winsdk + tesseract) |
+| `548c4fef` | feat(video_analysis): synthesize_report.py orchestrator with TDD (5 tests) |
+| `c1a15c45` | conductor(tracks): scaffold plan.md + metadata.json + state.toml for 12 child + 1 synthesis |
+| `365fa554` | conductor(plan): mark Phase 0+1 complete + Phase 2 init complete in umbrella state.toml |
+
+## Next steps
+
+1. User dispatches Tier 2 per child: `/tier-2-auto-execute video_analysis_<slug>_20260621 --resume` (12 invocations)
+2. User dispatches Tier 2 for synthesis: `/tier-2-auto-execute video_analysis_synthesis_20260621 --resume`
+3. Umbrella Tier 2 dispatches final closeout (Phase 4): README update + final end-of-track report (overwrites this one at `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621.md`) + archive move + chronology update.
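Reviewer note on the "Per-child dependencies encoded" list in the report above: the order in which clusters become dispatchable can be sketched with a topological sort. This is only an illustration under stated assumptions. `CLUSTER_DEPS` and `dispatch_waves` are hypothetical names, and the graph is a cluster-level simplification of the per-track blockers that the scaffold actually writes.

```python
from graphlib import TopologicalSorter

# Cluster-level view of the dependencies listed in the report (assumed
# simplification): each key is blocked by the clusters in its value set.
CLUSTER_DEPS = {
    "E": set(),
    "A": {"E"},
    "B": {"A"},
    "C": {"B"},
    "D": {"E"},
}


def dispatch_waves(deps):
    """Group clusters into waves; every cluster in a wave has all blockers done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # clusters with no unmet blockers
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as shipped
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(dispatch_waves(CLUSTER_DEPS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the waves come out as E, then A and D together, then B, then C, which matches the chain described in the report.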
@@ -0,0 +1,313 @@
+"""One-time scaffold generator for video_analysis_campaign_20260621 child + synthesis tracks.
+
+Uses the hard-coded VIDEOS list below to write plan.md + metadata.json + state.toml for each child,
+plus metadata.json + state.toml for the synthesis track.
+
+Per Tier 2 sandbox convention (conductor/workflow.md "Throw-away scripts"), this lives in
+scripts/tier2/artifacts/video_analysis_campaign_20260621/ and is NOT shipped to production.
+"""
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[4]
+UMBRELLA = ROOT / "conductor" / "tracks" / "video_analysis_campaign_20260621"
+TRACKS_DIR = ROOT / "conductor" / "tracks"
+
+VIDEOS = [
+ {"order": 1, "slug": "cs229_building_llms", "cluster": "E", "title": "Stanford CS229 - Building Large Language Models (LLMs)", "youtube_id": "9vM4p9NN0Ts", "author": "Stanford CS229", "url": "https://youtu.be/9vM4p9NN0Ts", "needs_yt_dlp_verify": True},
+ {"order": 2, "slug": "probability_logic", "cluster": "A", "title": "Probability Theory is an Extension of Logic", "youtube_id": "0yF9TvMeAzM", "author": None, "url": "https://youtu.be/0yF9TvMeAzM", "needs_yt_dlp_verify": False},
+ {"order": 3, "slug": "entropy_epiplexity", "cluster": "A", "title": "From Entropy to Epiplexity", "youtube_id": "_U8AwUq_aJQ", "author": "Andrew Wilson and Marc Finzi", "url": "https://youtu.be/_U8AwUq_aJQ", "needs_yt_dlp_verify": False},
+ {"order": 4, "slug": "score_dynamics_giorgini", "cluster": "A", "title": "Learning Dynamics from Statistics: a score-based approach", "youtube_id": "P75iVMmbqQk", "author": "Ludovico Giorgini", "url": "https://youtu.be/P75iVMmbqQk", "needs_yt_dlp_verify": False},
+ {"order": 5, "slug": "platonic_intelligence_kumar", "cluster": "B", "title": "Towards a Platonic Intelligence with Unified Factored Representations", "youtube_id": "1mXUFweWOug", "author": "Akarsh Kumar", "url": "https://youtu.be/1mXUFweWOug", "needs_yt_dlp_verify": False},
+ {"order": 6, "slug": "free_lunches_levin", "cluster": "B", "title": "Free Lunches: Model Systems for Studying the Agential Gifts from the Platonic Space", "youtube_id": "K8BmMU1Tm-I", "author": "Michael Levin", "url": "https://youtu.be/K8BmMU1Tm-I", "needs_yt_dlp_verify": False},
+ {"order": 7, "slug": "generic_systems_fields", "cluster": "C", "title": "Interesting Behavior by Generic Systems", "youtube_id": "QeMajYvhEbI", "author": "Chris Fields", "url": "https://youtu.be/QeMajYvhEbI", "needs_yt_dlp_verify": False},
+ {"order": 8, "slug": "brain_counterintuitive", "cluster": "C", "title": "The Most Counterintuitive Way to Build a Brain", "youtube_id": "cDxtFtoQVNc", "author": None, "url": "https://youtu.be/cDxtFtoQVNc", "needs_yt_dlp_verify": False},
+ {"order": 9, "slug": "neural_dynamics_miller", "cluster": "C", "title": "Cognition Emerges from Neural Dynamics", "youtube_id": "0BS-BzEFTXA", "author": "Earl Miller", "url": "https://youtu.be/0BS-BzEFTXA", "needs_yt_dlp_verify": False},
+ {"order": 10, "slug": "multiscale_hoffman", "cluster": "C", "title": "A Multiscale Logic of Collective Intelligence", "youtube_id": "YnfaT5APPB0", "author": "Donald Hoffman and Chetan Prakash", "url": "https://youtu.be/YnfaT5APPB0", "needs_yt_dlp_verify": False},
+ {"order": 11, "slug": "cs336_architectures", "cluster": "E", "title": "Stanford CS336 Lecture 3: Architectures", "youtube_id": "lVynu4bo1rY", "author": "Stanford CS336 Spring 2026", "url": "https://youtu.be/lVynu4bo1rY", "needs_yt_dlp_verify": True},
+ {"order": 12, "slug": "creikey_dl_cv", "cluster": "D", "title": "Creikey - Deep Learning and Computer Vision for Game Developers (BSC 2025)", "youtube_id": "yxkUvXs-hoQ", "author": "Creikey", "url": "https://youtu.be/yxkUvXs-hoQ", "needs_yt_dlp_verify": False},
+]
+
+CLUSTER_LEGEND = {
+ "A": "Math & information-theoretic foundations",
+ "B": "Platonic / geometric AI representations",
+ "C": "Biological / cognitive / generic systems",
+ "D": "Applied / practical",
+ "E": "Stanford course VODs >1hr",
+}
+
+CLUSTER_BLOCKED_BY = {
+ "A": ["video_analysis_cs229_building_llms_20260621"],
+ "B": ["video_analysis_score_dynamics_giorgini_20260621"],
+ "C": ["video_analysis_free_lunches_levin_20260621"],
+ "D": ["video_analysis_cs336_architectures_20260621"],
+ "E": [],
+}
+
+
+def plan_template(v: dict) -> str:
+ yt_dlp_verify_step = ""
+ if v["needs_yt_dlp_verify"]:
+  yt_dlp_verify_step = (
+   "\n- [ ] **Step 0: yt-dlp access verification (R5).** "
+   "Run `uv run yt-dlp --simulate {url}` to confirm yt-dlp can fetch metadata. "
+   "If it fails (HTTP 401/403), fall back to manual transcript sourcing or escalate per umbrella spec §13 R5.\n".format(url=v["url"])
+  )
+ return f"""# Plan: video_analysis_{v['slug']}_20260621
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox syntax for tracking.
+
+**Goal:** Execute the 5-phase pipeline (Acquire → Keyframes → OCR → Synthesis → Verification) for *{v['title']}* and ship `report.md` (1000-10000 LOC) + `summary.md` (200-400 words).
+
+**Parent:** This is child #{v['order']} of the [video_analysis_campaign_20260621](../../video_analysis_campaign_20260621/) umbrella.
+
+**Source:** {v['url']} (YouTube ID `{v['youtube_id']}`)
+**Cluster:** {v['cluster']} ({CLUSTER_LEGEND.get(v['cluster'], '')})
+**Author:** {v['author'] or '(unknown)'}
+
+---
+
+## Phase 1: Acquire
+
+{yt_dlp_verify_step}- [ ] **Step 1: Run extract_transcript.py**
+ - `uv run python scripts/video_analysis/extract_transcript.py {v['url']} artifacts/transcript.json`
+ - Commit `artifacts/transcript.json` atomically.
+- [ ] **Step 2: Run download_video.py**
+ - `uv run python scripts/video_analysis/download_video.py {v['url']} artifacts/video.mp4`
+ - Commit `artifacts/video.log` atomically (`artifacts/video.mp4` is gitignored and stays local).
+
+## Phase 2: Keyframes
+
+- [ ] **Step 1: Run extract_keyframes.py**
+ - `uv run python scripts/video_analysis/extract_keyframes.py artifacts/video.mp4 artifacts/frames --threshold 0.4`
+ - Commit `artifacts/frames/*.jpg` + `artifacts/extraction_meta.json` atomically.
+- [ ] **Step 2: Manual review** — flag any frames that look wrong.
+
+## Phase 3: OCR
+
+- [ ] **Step 1: Run ocr_frames.py**
+ - `uv run python scripts/video_analysis/ocr_frames.py artifacts/frames artifacts/ocr.md --backend winsdk`
+ - Commit `artifacts/ocr.md` atomically.
+- [ ] **Step 2: Spot-check OCR quality.**
+
+## Phase 4: Synthesis (DELEGATE TO TIER 3 WORKER)
+
+- [ ] **Step 1: Delegate report writing**
+ - Inputs: `artifacts/transcript.json` + `artifacts/ocr.md` + `artifacts/frames/*.jpg`
+ - Output: `report.md` (1000-10000 LOC) + `summary.md` (200-400 words)
+ - 8-section structure per umbrella spec §FR6
+ - Cross-references to other children (forward + backward)
+- [ ] **Step 2: Human review + iterate**
+
+## Phase 5: Verification
+
+- [ ] **Step 1: Idempotency check** — re-run scripts, confirm outputs match modulo timestamps
+- [ ] **Step 2: Audit checklist** — every section of `report.md` populated, no "TBD"
+- [ ] **Step 3: Write end-of-track report** at `docs/reports/TRACK_COMPLETION_video_analysis_{v['slug']}_20260621.md`
+- [ ] **Step 4: Update state.toml** to `status = "completed"`
+
+## Self-review
+
+- [ ] `report.md` is 1000-10000 LOC markdown
+- [ ] `summary.md` is 200-400 words
+- [ ] All 7 deliverable artifacts present
+- [ ] All 8 report sections populated
+- [ ] Per-task commits with git notes
+"""
+
+
+def metadata_template(v: dict) -> str:
+ cluster_blockers = CLUSTER_BLOCKED_BY.get(v["cluster"], [])
+ all_blockers = ["video_analysis_campaign_20260621"] + cluster_blockers
+ return json.dumps({
+  "track_id": f"video_analysis_{v['slug']}_20260621",
+  "name": v["title"],
+  "created": "2026-06-21",
+  "status": "spec_approved",
+  "blocked_by": all_blockers,
+  "blocks": [],
+  "priority": "A",
+  "type": "per-child research track (Pass 1 of 3)",
+  "parent": "video_analysis_campaign_20260621",
+  "domain": "meta-tooling (research artifacts; no manual_slop src/ changes)",
+  "cluster": v["cluster"],
+  "youtube_id": v["youtube_id"],
+  "youtube_url": v["url"],
+  "author": v["author"],
+  "scope": {
+   "new_files": [
+    "artifacts/transcript.json",
+    "artifacts/ocr.md",
+    "artifacts/frames/*.jpg",
+    "artifacts/extraction_meta.json",
+    "artifacts/video.mp4 (gitignored)",
+    "artifacts/video.log",
+    "report.md (1000-10000 LOC target)",
+    "summary.md (200-400 words)",
+   ],
+   "modified_files": [],
+   "deleted_files": [],
+  },
+  "estimated_effort": {
+   "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
+   "phase_1": "1 task: acquire (transcript + download)",
+   "phase_2": "1 task: keyframes extraction",
+   "phase_3": "1 task: OCR",
+   "phase_4": "1 task: synthesis (delegate to Tier 3 worker)",
+   "phase_5": "1 task: verification",
+   "summary": "5 tasks per child. 12 children total = 60 tasks in campaign.",
+  },
+  "verification_criteria": [
+   "All 7 deliverable artifacts present (transcript.json, video.log, frames/, extraction_meta.json, ocr.md, report.md, summary.md)",
+   "report.md is 1000-10000 LOC markdown",
+   "summary.md is 200-400 words",
+   "All 8 report sections populated (TL;DR, Key Concepts, Frame Analysis, Transcript Highlights, Math/Theoretical Content, Connections, Open Questions, References)",
+   "Idempotency check passes",
+   "Per-task commits with git notes",
+  ],
+  "risk_register": [
+   {
+    "id": f"R5-{v['slug']}",
+    "title": "yt-dlp access failure (oEmbed returned 401 for E-cluster videos)",
+    "likelihood": "high" if v["needs_yt_dlp_verify"] else "low",
+    "scope_impact": "Phase 1 Acquire blocked if yt-dlp also fails",
+    "mitigation": "Phase 1 Step 0 verifies yt-dlp access before downloading. Fall back to manual transcript sourcing if yt-dlp fails.",
+   },
+  ],
+  "user_directives": [
+   "1000-10000 LOC markdown per video report (per user 2026-06-21)",
+   "Lossless preservation: transcripts (JSON), frames (raw images), OCR (plain text) must be preserved in machine-readable form",
+   "Cross-references: forward + backward to other children in the campaign",
+  ],
+ }, indent=2) + "\n"
+
+
+def state_template(v: dict) -> str:
+ return f"""# Track state for video_analysis_{v['slug']}_20260621
+# Updated by Tier 2 Tech Lead (during execution)
+
+[meta]
+track_id = "video_analysis_{v['slug']}_20260621"
+name = "{v['title']}"
+status = "active"
+current_phase = 1  # Phase 1 = Acquire (first execution phase)
+last_updated = "2026-06-21"
+
+[blocked_by]
+video_analysis_campaign_20260621 = "shipped"
+""" + (
+  "\n".join(f'{bid} = "shipped"' for bid in CLUSTER_BLOCKED_BY.get(v["cluster"], [])) + "\n" if CLUSTER_BLOCKED_BY.get(v["cluster"]) else ""
+) + f"""
+[blocks]
+# Depends-on: umbrella + cluster-blockers
+
+[phases]
+phase_1 = {{ status = "pending", checkpointsha = "", name = "Acquire (transcript + download)" }}
+phase_2 = {{ status = "pending", checkpointsha = "", name = "Keyframes extraction" }}
+phase_3 = {{ status = "pending", checkpointsha = "", name = "OCR" }}
+phase_4 = {{ status = "pending", checkpointsha = "", name = "Synthesis (Tier 3 worker)" }}
+phase_5 = {{ status = "pending", checkpointsha = "", name = "Verification" }}
+
+[tasks]
+t1_1 = {{ status = "pending", commit_sha = "", description = "Run extract_transcript.py + download_video.py. Commit artifacts atomically." }}
+t2_1 = {{ status = "pending", commit_sha = "", description = "Run extract_keyframes.py with threshold 0.4. Manual review of frames." }}
+t3_1 = {{ status = "pending", commit_sha = "", description = "Run ocr_frames.py. Spot-check OCR." }}
+t4_1 = {{ status = "pending", commit_sha = "", description = "Delegate report.md (1000-10000 LOC) + summary.md (200-400 words) to Tier 3 worker." }}
+t5_1 = {{ status = "pending", commit_sha = "", description = "Idempotency check + audit + end-of-track report." }}
+
+[verification]
+all_artifacts_present = false
+report_loc_target_met = false
+summary_word_count_met = false
+end_of_track_report_committed = false
+"""
+
+
+def synthesis_metadata() -> str:
+ return json.dumps({
+  "track_id": "video_analysis_synthesis_20260621",
+  "name": "Video Analysis Campaign Synthesis (cross-cutting)",
+  "created": "2026-06-21",
+  "status": "spec_approved",
+  "blocked_by": [f"video_analysis_{v['slug']}_20260621" for v in VIDEOS],
+  "blocks": [],
+  "priority": "A",
+  "type": "synthesis (cross-cutting report consuming all 12 children)",
+  "parent": "video_analysis_campaign_20260621",
+  "domain": "meta-tooling (research artifacts; no manual_slop src/ changes)",
+  "scope": {
+   "new_files": [
+    "per_video_summary.md (one paragraph 150-250 words per video)",
+    "report.md (6-section cross-cutting synthesis)",
+   ],
+   "modified_files": [],
+   "deleted_files": [],
+  },
+  "estimated_effort": {
+   "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
+   "summary": "1 task: delegate synthesis to Tier 3 worker. Consumes all 12 children's report.md + summary.md.",
+  },
+  "verification_criteria": [
+   "per_video_summary.md has 12 paragraphs (one per child)",
+   "report.md has 6 sections: Theme Matrix, Cross-Video Concept Map, 5-10 Takeaways, Math Prereq Graph, Open Research Questions, Next-Watch List",
+   "All 12 child tracks shipped (each with their report.md + summary.md)",
+  ],
+  "user_directives": [
+   "1000-5000 LOC synthesis report (less than per-video because heavy lifting is in children)",
+   "Lossless preservation directive applies here too — DO NOT over-summarize; Pass 2 will compress",
+  ],
+ }, indent=2) + "\n"
+
+
+def synthesis_state() -> str:
+ return """# Track state for video_analysis_synthesis_20260621
+
+[meta]
+track_id = "video_analysis_synthesis_20260621"
+name = "Video Analysis Campaign Synthesis"
+status = "active"
+current_phase = 1
+last_updated = "2026-06-21"
+
+[blocked_by]
+""" + "\n".join(f'video_analysis_{v["slug"]}_20260621 = "shipped"' for v in VIDEOS) + """
+
+[blocks]
+
+[phases]
+phase_1 = { status = "pending", checkpointsha = "", name = "Verify all 12 children shipped" }
+phase_2 = { status = "pending", checkpointsha = "", name = "Delegate synthesis to Tier 3 worker" }
+phase_3 = { status = "pending", checkpointsha = "", name = "Human review + iterate" }
+phase_4 = { status = "pending", checkpointsha = "", name = "End-of-track report" }
+
+[tasks]
+t1_1 = { status = "pending", commit_sha = "", description = "Verify all 12 children have report.md + summary.md" }
+t2_1 = { status = "pending", commit_sha = "", description = "Delegate synthesis (per_video_summary.md + report.md) to Tier 3 worker" }
+t3_1 = { status = "pending", commit_sha = "", description = "Human review + iterate" }
+t4_1 = { status = "pending", commit_sha = "", description = "Write end-of-track report" }
+"""
+
+
+def main() -> None:
+ for v in VIDEOS:
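+  folder = TRACKS_DIR / f"video_analysis_{v['slug']}_20260621"
+  folder.mkdir(parents=True, exist_ok=True)  # write_text fails if the child folder does not exist yet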
+  plan_path = folder / "plan.md"
+  meta_path = folder / "metadata.json"
+  state_path = folder / "state.toml"
+  plan_path.write_text(plan_template(v), encoding="utf-8")
+  meta_path.write_text(metadata_template(v), encoding="utf-8")
+  state_path.write_text(state_template(v), encoding="utf-8")
+  print(f"Wrote: {plan_path}, {meta_path}, {state_path}")
+
+ synth_folder = TRACKS_DIR / "video_analysis_synthesis_20260621"
+ synth_folder.mkdir(parents=True, exist_ok=True)
+ (synth_folder / "metadata.json").write_text(synthesis_metadata(), encoding="utf-8")
+ (synth_folder / "state.toml").write_text(synthesis_state(), encoding="utf-8")
+ print("Wrote synthesis: metadata.json + state.toml")
+
+
+if __name__ == "__main__":
+ main()
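Reviewer note on the scaffold generator above: the `blocked_by` lists it writes into each `metadata.json` could be consumed roughly as follows to decide which tracks are ready to dispatch. This is a minimal sketch, not the conductor's actual resolution logic. `ready_tracks` and `SAMPLES` are hypothetical; only the `track_id` and `blocked_by` field names and the track IDs come from the script.

```python
import json

# Two trimmed-down metadata.json documents using the scaffold's field names.
SAMPLES = [
    json.dumps({
        "track_id": "video_analysis_cs229_building_llms_20260621",
        "blocked_by": ["video_analysis_campaign_20260621"],
    }),
    json.dumps({
        "track_id": "video_analysis_probability_logic_20260621",
        "blocked_by": [
            "video_analysis_campaign_20260621",
            "video_analysis_cs229_building_llms_20260621",
        ],
    }),
]


def ready_tracks(metadata_texts, shipped):
    """Return track_ids whose every `blocked_by` entry is already in `shipped`."""
    ready = []
    for text in metadata_texts:
        meta = json.loads(text)
        if meta["track_id"] in shipped:
            continue  # already shipped, nothing left to dispatch
        if all(blocker in shipped for blocker in meta.get("blocked_by", [])):
            ready.append(meta["track_id"])
    return sorted(ready)


if __name__ == "__main__":
    print(ready_tracks(SAMPLES, shipped={"video_analysis_campaign_20260621"}))
```

With only the umbrella marked as shipped, the E-cluster child is the sole ready track in this sample; the A-cluster child becomes ready once the cs229 track is added to `shipped`.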