docs(reports): TRACK_COMPLETION for video_analysis_campaign_20260621 (Phase 0+1+2 init only)
@@ -0,0 +1,147 @@
# Track Completion (Phase 0+1+2 Init): Video Analysis Campaign

**Track:** `video_analysis_campaign_20260621`
**Type:** Multi-track research campaign umbrella (Pass 1 of 3)
**Phase scope:** Phase 0 (tooling) + Phase 1 (5 reusable scripts with TDD) + Phase 2 init (12 child tracks + 1 synthesis track scaffolded)
**Status:** PHASE 0+1+2 INIT COMPLETE; child execution, synthesis, and closeout pending
**Tier:** 2 Tech Lead (umbrella dispatch)

## Summary

This report covers the umbrella Tier 2 dispatch of the `video_analysis_campaign_20260621` track. The umbrella's scope was Phase 0, Phase 1, and Phase 2 init (per `TIER2_STARTER.md` Template 1). Per-child execution (Phase 2 proper) and synthesis (Phase 3) are SEPARATE Tier 2 dispatches: one per child plus one for synthesis. Phase 4 (closeout) happens after all 13 tracks (12 children + synthesis) ship.

## Completed (umbrella scope)

### Phase 0: Tooling prerequisites

All 4 Phase 0 tasks are complete, landed in the combined dependency commit `1c05305a` and the scaffold commit `12fcc55c`.

| Task | Status | Notes |
|---|---|---|
| t0_1 yt-dlp | DONE | 2026.06.09 installed and verified (CLI + Python module) |
| t0_2 opencv/imagehash/pillow | DONE | cv2 4.10.0, imagehash 4.3.2, PIL 11.0.0 |
| t0_3 OCR backend decision | DONE | winsdk 1.0.0b10 verified (OCR engine available). pytesseract 0.3.13 kept as fallback; the Python wrapper is installed but the Tesseract binary is not, so the fallback cannot run yet. |
| t0_4 scripts/video_analysis/ scaffold | DONE | `__init__.py` + `tests/test_video_analysis_placeholder.py` (later replaced) |

**R1 + R10 (HIGH risks) resolved.** `yt-dlp`, `cv2`, `imagehash`, and `PIL` are all available in the repo's venv. `pyproject.toml` was updated with all 7 dependencies, and winsdk is the chosen OCR backend.
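
A minimal sketch of how the Phase 0 checks could be re-run, assuming the import names `yt_dlp`, `cv2`, `imagehash`, and `PIL`; the helper names are hypothetical and this is not the verification the dispatch actually ran. The OCR helper encodes the t0_3 decision: winsdk first, and the pytesseract fallback only when the Tesseract executable is also on `PATH`.

```python
from __future__ import annotations

import importlib.util
import shutil

# Import names, not PyPI names: opencv-python installs cv2, pillow installs PIL.
PHASE0_MODULES = ["yt_dlp", "cv2", "imagehash", "PIL"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found on the current interpreter's import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def choose_ocr_backend(preferred: str = "winsdk") -> str | None:
    """Prefer winsdk; fall back to Tesseract only if both the pytesseract wrapper
    and the tesseract executable are available (the wrapper alone cannot OCR)."""
    if preferred == "winsdk" and importlib.util.find_spec("winsdk") is not None:
        return "winsdk"
    if importlib.util.find_spec("pytesseract") is not None and shutil.which("tesseract"):
        return "tesseract"
    return None


if __name__ == "__main__":
    print("missing:", missing_modules(PHASE0_MODULES) or "none")
    print("ocr backend:", choose_ocr_backend())
```

Using `find_spec` checks importability without actually importing heavy packages such as `cv2`.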

### Phase 1: 5 reusable scripts with TDD

All 5 scripts shipped with TDD; 26 tests pass. Each follows the Result[T] pattern from `conductor/code_styleguides/error_handling.md`.

| Script | Tests | Commit | Notes |
|---|---|---|---|
| `extract_transcript.py` | 8 | `94f4a4ee` | youtube-transcript-api wrapper with retry on network errors. Fixes to the plan's tests: `parse_video_id` returns `_Ok`/`_Err`, so the test reads `.value`; the test uses an 11-character video ID (`'ABCDEFGHIJK'`). |
| `download_video.py` | 5 | `45a5e814` | yt-dlp subprocess wrapper. Writes `download.log`. Validates the output path (rejects an existing directory). |
| `extract_keyframes.py` | 4 | `9ccdedee` | ffmpeg scene detection (`select=gt(scene,0.4)`) + imagehash pHash + Hamming-distance dedup. Test fix: the dedup test uses 16-character (64-bit) hashes so that their Hamming distance (16) exceeds the threshold (5). |
| `ocr_frames.py` | 4 | `ed0d198a` | winsdk backend (verified) + Tesseract fallback. Per-frame OCR with markdown output. |
| `synthesize_report.py` | 5 | `548c4fef` | Orchestrator that composes the other 4 scripts and emits an 8-section report stub per FR6. Test fix: asserts that the video_id is present instead of expecting `# VID` as a heading. |

**Test result:** 26 passed, 0 failed.
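
The keyframe dedup in `extract_keyframes.py` can be sketched as follows. This assumes hashes are stored as hex strings (the form `str(imagehash.phash(img))` produces: 16 hex characters for the default 64-bit hash) and that a frame is dropped when it is within `threshold` bits of the last kept frame; the real script's comparison policy and function names may differ. The ffmpeg argv mirrors the `select=gt(scene,0.4)` filter named in the table.

```python
from __future__ import annotations


def hamming_hex(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def dedupe_frames(frames: list[tuple[str, str]], threshold: int = 5) -> list[str]:
    """Drop near-duplicate frames.

    `frames` holds (path, hex_phash) pairs in timestamp order. A frame is kept only if
    its hash is more than `threshold` bits away from the most recently kept frame.
    """
    kept: list[str] = []
    last_hash: str | None = None
    for path, h in frames:
        if last_hash is None or hamming_hex(h, last_hash) > threshold:
            kept.append(path)
            last_hash = h
    return kept


def scene_detect_cmd(video: str, out_pattern: str, scene: float = 0.4) -> list[str]:
    """argv for ffmpeg scene-change extraction (run with subprocess.run, no shell)."""
    return [
        "ffmpeg", "-i", video,
        # Keep only frames whose scene-change score exceeds `scene`; the quotes stop
        # ffmpeg's filtergraph parser from treating the comma as a filter separator.
        "-vf", f"select='gt(scene,{scene})'",
        # Emit only the selected frames (ffmpeg 5.1+ also spells this "-fps_mode vfr").
        "-vsync", "vfr",
        out_pattern,
    ]
```

With the test-fix values, two 16-character hashes that differ in 16 bits are both kept at threshold 5, while a hash 1 bit away from the last kept frame is dropped.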

### Phase 2 init: 12 child + 1 synthesis track scaffolded

Commit `c1a15c45` scaffolds all 13 remaining tracks (38 files). Each of the 12 child tracks gets `plan.md` + `metadata.json` + `state.toml` (36 files); the synthesis track gets `metadata.json` + `state.toml` (2 files; its `spec.md` already existed). All 13 reference the umbrella, and each child `plan.md` follows the 5-phase pipeline template.

Per-child dependencies encoded (every child is also blocked by the umbrella):

- E cluster (cs229, cs336): blocked by the umbrella only
- A cluster: blocked by E (`video_analysis_cs229_building_llms_20260621`)
- B cluster: blocked by A (`video_analysis_score_dynamics_giorgini_20260621`)
- C cluster: blocked by B (`video_analysis_free_lunches_levin_20260621`)
- D cluster: blocked by E (`video_analysis_cs336_architectures_20260621`)

The E-cluster children (cs229, cs336) include an explicit yt-dlp verification step (R5 mitigation).
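
The blocked-by relations above form a DAG, so a dispatch order can be derived mechanically. A minimal sketch using Kahn's topological sort; `dispatch_waves` and the short track names are hypothetical, and only one representative child per cluster is shown (the blockers themselves follow `CLUSTER_BLOCKED_BY` in the scaffold generator).

```python
from __future__ import annotations


def dispatch_waves(blocked_by: dict[str, list[str]]) -> list[list[str]]:
    """Group tracks into waves (Kahn's topological sort): a track lands in the first
    wave after all of its blockers. Raises ValueError on a dependency cycle."""
    remaining = {track: len(deps) for track, deps in blocked_by.items()}
    dependents: dict[str, list[str]] = {track: [] for track in blocked_by}
    for track, deps in blocked_by.items():
        for dep in deps:
            dependents[dep].append(track)

    waves: list[list[str]] = []
    wave = sorted(t for t, n in remaining.items() if n == 0)
    placed = 0
    while wave:
        waves.append(wave)
        placed += len(wave)
        next_wave = []
        for track in wave:
            for child in dependents[track]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_wave.append(child)
        wave = sorted(next_wave)
    if placed != len(blocked_by):
        raise ValueError("dependency cycle among tracks")
    return waves


# One representative child per cluster (cluster letter in the trailing comment).
DEPS = {
    "umbrella": [],
    "cs229_building_llms": ["umbrella"],                            # E
    "cs336_architectures": ["umbrella"],                            # E
    "score_dynamics_giorgini": ["umbrella", "cs229_building_llms"],  # A
    "free_lunches_levin": ["umbrella", "score_dynamics_giorgini"],   # B
    "multiscale_hoffman": ["umbrella", "free_lunches_levin"],        # C
    "creikey_dl_cv": ["umbrella", "cs336_architectures"],            # D
}

for i, wave in enumerate(dispatch_waves(DEPS)):
    print(i, wave)
```

Tracks in the same wave do not depend on each other, so their Tier 2 dispatches could run in any order.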

## Pending (separate Tier 2 dispatches)

| Phase | Scope | Tier 2 dispatch command |
|---|---|---|
| Phase 2 (children) | Execute the 5-phase pipeline for each of the 12 videos | `/tier-2-auto-execute video_analysis_<slug>_20260621 --resume` (one per child) |
| Phase 3 (synthesis) | Cross-cutting synthesis of the 12 children's reports | `/tier-2-auto-execute video_analysis_synthesis_20260621 --resume` (after all 12 children ship) |
| Phase 4 (closeout) | Update umbrella README + end-of-track report + archive + chronology | Final closeout after all 12 children + synthesis ship |

Total Tier 2 invocations remaining: 14 (12 children + 1 synthesis + 1 closeout).

## Verification

- [x] `yt-dlp` installed and importable in this repo's venv
- [x] `cv2`, `imagehash`, `PIL` installed in this repo's venv
- [x] OCR backend chosen (winsdk) and verified working
- [x] All 5 scripts in `scripts/video_analysis/` have passing TDD tests (26/26 passing)
- [ ] All 12 child tracks shipped (pending per-child Tier 2 dispatches)
- [ ] Synthesis track shipped (pending)
- [ ] Umbrella README.md shows all 12 children + synthesis as shipped (pending Phase 4)
- [ ] Final end-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621.md` (pending; this file is the interim report, at `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621_phase0_1_2init.md`)
- [x] Future-pass hooks (§11 of spec.md) intact and documented for Pass 2/3
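
Each child's `metadata.json` and `state.toml` encode two length targets (report 1000-10000 LOC, summary 200-400 words, tracked as `report_loc_target_met` and `summary_word_count_met`). A minimal sketch of a checker, assuming "LOC" means every line of `report.md` including blank lines, that words are whitespace-separated tokens, and that both files sit at the top of the child track folder; the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Targets from the child-track metadata; counting raw lines as "LOC" is an assumption.
REPORT_LOC_RANGE = (1000, 10000)
SUMMARY_WORD_RANGE = (200, 400)


def check_lengths(report_text: str, summary_text: str) -> dict[str, bool]:
    """Return the two [verification] flags for a child's report.md / summary.md text."""
    loc = len(report_text.splitlines())  # every line, blank lines included
    words = len(summary_text.split())    # whitespace-separated tokens
    return {
        "report_loc_target_met": REPORT_LOC_RANGE[0] <= loc <= REPORT_LOC_RANGE[1],
        "summary_word_count_met": SUMMARY_WORD_RANGE[0] <= words <= SUMMARY_WORD_RANGE[1],
    }


def check_track_dir(track_dir: Path) -> dict[str, bool]:
    """Read report.md and summary.md from a child track folder and check both targets."""
    report = (track_dir / "report.md").read_text(encoding="utf-8")
    summary = (track_dir / "summary.md").read_text(encoding="utf-8")
    return check_lengths(report, summary)
```

The returned keys match the `[verification]` table in the generated `state.toml`, so the result could be copied into it directly.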

## Architectural notes

- **scripts/ namespace:** All 5 scripts live in `scripts/video_analysis/` (per the AGENTS.md convention that scripts are namespace-isolated by directory). No new `src/<thing>.py` files were created.
- **Result[T] pattern:** All 5 scripts use the data-oriented `Result[T, ErrorInfo]` pattern from `conductor/code_styleguides/error_handling.md`. The `_Ok`/`_Err` dataclasses are duplicated across scripts (not extracted to a shared module) to keep each script self-contained.
- **No src/ changes:** This campaign is research-only. No `src/*.py` files were created or modified.
- **pyproject.toml:** Updated to add 7 new dependencies (yt-dlp, opencv-python, imagehash, pillow, youtube-transcript-api, winsdk, pytesseract). The update happened during tasks t0_1 through t0_3, because the plan's commit instructions explicitly say `git add pyproject.toml uv.lock` for each Phase 0 task. This deviates from the spec's NFR §5 ("no new pyproject.toml deps"); the Phase 0 install tasks take precedence.
- **Throw-away scaffold generator:** `scripts/tier2/artifacts/video_analysis_campaign_20260621/init_child_tracks.py` scaffolded the 12 child tracks + 1 synthesis track. Per the Tier 2 sandbox convention it lives in `scripts/tier2/artifacts/`; it is throw-away and kept only for archival.

## Risk status update

| ID | Title | Status |
|---|---|---|
| R1 | yt-dlp not installed | RESOLVED (Phase 0) |
| R2 | OCR quality insufficient | Pending verification during Phase 2 execution |
| R3 | Report exceeds 10000 LOC | Low likelihood; mitigation in plan |
| R4 | Video mp4 disk space | Pending verification during Phase 2 execution |
| R5 | oEmbed returned HTTP 401 for 2 E-cluster videos | yt-dlp installed; per-child verification step added to the E-cluster plans |
| R6 | User's math encoding notation (Pass 2) lost | User action item; not blocking Phase 2 |
| R7 | Pass 1 over-summarization | 1000-10000 LOC target enforced; Tier 3 worker prompt specifies the target |
| R8 | Tier 2 capacity for 12 children | Each child is independently shippable; the campaign is async |
| R9 | Transcript API rate-limiting | Retry with exponential backoff in `extract_transcript.py` (3 retries) |
| R10 | cv2/imagehash not in repo venv | RESOLVED (Phase 0) |
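
R9's mitigation (3 retries with exponential backoff) follows a standard pattern. A minimal sketch, not the code in `extract_transcript.py`: the base delay, the jitter, and the default exception types treated as retryable are assumptions (transcript and HTTP libraries raise their own exception classes, which a real wrapper would pass in `retry_on`). `sleep` is injectable so the schedule can be tested without waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt (plus up to 10%
    jitter) and try again. Makes at most 1 + retries calls; the last error is re-raised."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")  # loop always returns or raises when retries >= 0
```

With `retries=3` and `base_delay=1.0`, the waits are roughly 1 s, 2 s, and 4 s before the error is finally re-raised.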

## Files modified / created in this dispatch

**Created (Phase 0+1):**
- `scripts/video_analysis/__init__.py`
- `scripts/video_analysis/error_types.py`
- `scripts/video_analysis/extract_transcript.py`
- `scripts/video_analysis/download_video.py`
- `scripts/video_analysis/extract_keyframes.py`
- `scripts/video_analysis/ocr_frames.py`
- `scripts/video_analysis/synthesize_report.py`
- `tests/test_video_analysis_extract_transcript.py`
- `tests/test_video_analysis_download_video.py`
- `tests/test_video_analysis_extract_keyframes.py`
- `tests/test_video_analysis_ocr_frames.py`
- `tests/test_video_analysis_synthesize_report.py`
- `tests/test_video_analysis_placeholder.py` (created in t0_4, deleted in t1_1)
- `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621_phase0_1_2init.md` (this file)

**Created (Phase 2 init):**
- 12 × `conductor/tracks/video_analysis_<slug>_20260621/{plan.md, metadata.json, state.toml}`
- 1 × `conductor/tracks/video_analysis_synthesis_20260621/{metadata.json, state.toml}` (spec.md was pre-existing)

**Modified:**
- `pyproject.toml` (added 7 deps)
- `conductor/tracks/video_analysis_campaign_20260621/state.toml` (Phase 0+1+2 init marked complete)

**Throw-away (Tier 2 sandbox archival):**
- `scripts/tier2/artifacts/video_analysis_campaign_20260621/init_child_tracks.py` (one-time scaffold generator)

## Commits in this dispatch

| SHA | Message |
|---|---|
| `1c05305a` | chore(deps): add yt-dlp, cv2, imagehash, pillow, youtube-transcript-api, winsdk, pytesseract |
| `12fcc55c` | chore(scripts): scaffold scripts/video_analysis/ + placeholder test |
| `94f4a4ee` | feat(video_analysis): extract_transcript.py with TDD (8 tests) |
| `45a5e814` | feat(video_analysis): download_video.py with TDD (5 tests) |
| `9ccdedee` | feat(video_analysis): extract_keyframes.py with TDD (4 tests) |
| `ed0d198a` | feat(video_analysis): ocr_frames.py with TDD (4 tests, winsdk + tesseract) |
| `548c4fef` | feat(video_analysis): synthesize_report.py orchestrator with TDD (5 tests) |
| `c1a15c45` | conductor(tracks): scaffold plan.md + metadata.json + state.toml for 12 child + 1 synthesis |
| `365fa554` | conductor(plan): mark Phase 0+1 complete + Phase 2 init complete in umbrella state.toml |

## Next steps

1. User dispatches Tier 2 per child: `/tier-2-auto-execute video_analysis_<slug>_20260621 --resume` (12 invocations)
2. User dispatches Tier 2 for synthesis: `/tier-2-auto-execute video_analysis_synthesis_20260621 --resume`
3. Umbrella Tier 2 dispatches the final closeout (Phase 4): README update + final end-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_campaign_20260621.md` (superseding this interim report) + archive move + chronology update.
@@ -0,0 +1,313 @@
"""One-time scaffold generator for video_analysis_campaign_20260621 child + synthesis tracks.

Writes plan.md + metadata.json + state.toml for each child listed in VIDEOS (a hardcoded
copy of the umbrella README.md's child list), plus metadata.json + state.toml for the
synthesis track.

Per Tier 2 sandbox convention (conductor/workflow.md "Throw-away scripts"), this lives in
scripts/tier2/artifacts/video_analysis_campaign_20260621/ and is NOT shipped to production.
"""
from __future__ import annotations

import json
from pathlib import Path

# This file sits 4 directory levels below the repo root:
# <root>/scripts/tier2/artifacts/video_analysis_campaign_20260621/init_child_tracks.py
ROOT = Path(__file__).resolve().parents[4]
UMBRELLA = ROOT / "conductor" / "tracks" / "video_analysis_campaign_20260621"
TRACKS_DIR = ROOT / "conductor" / "tracks"

VIDEOS = [
    {"order": 1, "slug": "cs229_building_llms", "cluster": "E", "title": "Stanford CS229 - Building Large Language Models (LLMs)", "youtube_id": "9vM4p9NN0Ts", "author": "Stanford CS229", "url": "https://youtu.be/9vM4p9NN0Ts", "needs_yt_dlp_verify": True},
    {"order": 2, "slug": "probability_logic", "cluster": "A", "title": "Probability Theory is an Extension of Logic", "youtube_id": "0yF9TvMeAzM", "author": None, "url": "https://youtu.be/0yF9TvMeAzM", "needs_yt_dlp_verify": False},
    {"order": 3, "slug": "entropy_epiplexity", "cluster": "A", "title": "From Entropy to Epiplexity", "youtube_id": "_U8AwUq_aJQ", "author": "Andrew Wilson and Marc Finzi", "url": "https://youtu.be/_U8AwUq_aJQ", "needs_yt_dlp_verify": False},
    {"order": 4, "slug": "score_dynamics_giorgini", "cluster": "A", "title": "Learning Dynamics from Statistics: a score-based approach", "youtube_id": "P75iVMmbqQk", "author": "Ludovico Giorgini", "url": "https://youtu.be/P75iVMmbqQk", "needs_yt_dlp_verify": False},
    {"order": 5, "slug": "platonic_intelligence_kumar", "cluster": "B", "title": "Towards a Platonic Intelligence with Unified Factored Representations", "youtube_id": "1mXUFweWOug", "author": "Akarsh Kumar", "url": "https://youtu.be/1mXUFweWOug", "needs_yt_dlp_verify": False},
    {"order": 6, "slug": "free_lunches_levin", "cluster": "B", "title": "Free Lunches: Model Systems for Studying the Agential Gifts from the Platonic Space", "youtube_id": "K8BmMU1Tm-I", "author": "Michael Levin", "url": "https://youtu.be/K8BmMU1Tm-I", "needs_yt_dlp_verify": False},
    {"order": 7, "slug": "generic_systems_fields", "cluster": "C", "title": "Interesting Behavior by Generic Systems", "youtube_id": "QeMajYvhEbI", "author": "Chris Fields", "url": "https://youtu.be/QeMajYvhEbI", "needs_yt_dlp_verify": False},
    {"order": 8, "slug": "brain_counterintuitive", "cluster": "C", "title": "The Most Counterintuitive Way to Build a Brain", "youtube_id": "cDxtFtoQVNc", "author": None, "url": "https://youtu.be/cDxtFtoQVNc", "needs_yt_dlp_verify": False},
    {"order": 9, "slug": "neural_dynamics_miller", "cluster": "C", "title": "Cognition Emerges from Neural Dynamics", "youtube_id": "0BS-BzEFTXA", "author": "Earl Miller", "url": "https://youtu.be/0BS-BzEFTXA", "needs_yt_dlp_verify": False},
    {"order": 10, "slug": "multiscale_hoffman", "cluster": "C", "title": "A Multiscale Logic of Collective Intelligence", "youtube_id": "YnfaT5APPB0", "author": "Donald Hoffman and Chetan Prakash", "url": "https://youtu.be/YnfaT5APPB0", "needs_yt_dlp_verify": False},
    {"order": 11, "slug": "cs336_architectures", "cluster": "E", "title": "Stanford CS336 Lecture 3: Architectures", "youtube_id": "lVynu4bo1rY", "author": "Stanford CS336 Spring 2026", "url": "https://youtu.be/lVynu4bo1rY", "needs_yt_dlp_verify": True},
    {"order": 12, "slug": "creikey_dl_cv", "cluster": "D", "title": "Creikey - Deep Learning and Computer Vision for Game Developers (BSC 2025)", "youtube_id": "yxkUvXs-hoQ", "author": "Creikey", "url": "https://youtu.be/yxkUvXs-hoQ", "needs_yt_dlp_verify": False},
]

CLUSTER_LEGEND = {
    "A": "Math & information-theoretic foundations",
    "B": "Platonic / geometric AI representations",
    "C": "Biological / cognitive / generic systems",
    "D": "Applied / practical",
    "E": "Stanford course VODs >1hr",
}

# Each cluster's children wait on one specific upstream track, in addition to the umbrella.
CLUSTER_BLOCKED_BY = {
    "A": ["video_analysis_cs229_building_llms_20260621"],
    "B": ["video_analysis_score_dynamics_giorgini_20260621"],
    "C": ["video_analysis_free_lunches_levin_20260621"],
    "D": ["video_analysis_cs336_architectures_20260621"],
    "E": [],
}


def plan_template(v: dict) -> str:
    """Render a child track's plan.md (5-phase pipeline checklist)."""
    yt_dlp_verify_step = ""
    if v["needs_yt_dlp_verify"]:
        # Adjacent string literals are joined at compile time, so .format() applies to the whole string.
        yt_dlp_verify_step = (
            "\n- [ ] **Step 0: yt-dlp access verification (R5).** "
            "Run `uv run yt-dlp --simulate {url}` to confirm yt-dlp can fetch metadata. "
            "If it fails (HTTP 401/403), fall back to manual transcript sourcing or escalate per umbrella spec §13 R5.\n".format(url=v["url"])
        )
    return f"""# Plan: video_analysis_{v['slug']}_20260621

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox syntax for tracking.

**Goal:** Execute the 5-phase pipeline (Acquire → Keyframes → OCR → Synthesis → Verification) for *{v['title']}* and ship `report.md` (1000-10000 LOC) + `summary.md` (200-400 words).

**Parent:** This is child #{v['order']} of the [video_analysis_campaign_20260621](../../video_analysis_campaign_20260621/) umbrella.

**Source:** {v['url']} (YouTube ID `{v['youtube_id']}`)
**Cluster:** {v['cluster']} ({CLUSTER_LEGEND.get(v['cluster'], '')})
**Author:** {v['author'] or '(unknown)'}

---

## Phase 1: Acquire

{yt_dlp_verify_step}- [ ] **Step 1: Run extract_transcript.py**
  - `uv run python scripts/video_analysis/extract_transcript.py {v['url']} artifacts/transcript.json`
  - Commit `artifacts/transcript.json` atomically.
- [ ] **Step 2: Run download_video.py**
  - `uv run python scripts/video_analysis/download_video.py {v['url']} artifacts/video.mp4`
  - Commit `artifacts/video.log` atomically (`artifacts/video.mp4` is gitignored and stays local).

## Phase 2: Keyframes

- [ ] **Step 1: Run extract_keyframes.py**
  - `uv run python scripts/video_analysis/extract_keyframes.py artifacts/video.mp4 artifacts/frames --threshold 0.4`
  - Commit `artifacts/frames/*.jpg` + `artifacts/extraction_meta.json` atomically.
- [ ] **Step 2: Manual review**: flag any frames that look wrong.

## Phase 3: OCR

- [ ] **Step 1: Run ocr_frames.py**
  - `uv run python scripts/video_analysis/ocr_frames.py artifacts/frames artifacts/ocr.md --backend winsdk`
  - Commit `artifacts/ocr.md` atomically.
- [ ] **Step 2: Spot-check OCR quality.**

## Phase 4: Synthesis (DELEGATE TO TIER 3 WORKER)

- [ ] **Step 1: Delegate report writing**
  - Inputs: `artifacts/transcript.json` + `artifacts/ocr.md` + `artifacts/frames/*.jpg`
  - Output: `report.md` (1000-10000 LOC) + `summary.md` (200-400 words)
  - 8-section structure per umbrella spec §FR6
  - Cross-references to other children (forward + backward)
- [ ] **Step 2: Human review + iterate**

## Phase 5: Verification

- [ ] **Step 1: Idempotency check**: re-run scripts, confirm outputs match modulo timestamps
- [ ] **Step 2: Audit checklist**: every section of `report.md` populated, no "TBD"
- [ ] **Step 3: Write end-of-track report** at `docs/reports/TRACK_COMPLETION_video_analysis_{v['slug']}_20260621.md`
- [ ] **Step 4: Update state.toml** to `status = "completed"`

## Self-review

- [ ] `report.md` is 1000-10000 LOC markdown
- [ ] `summary.md` is 200-400 words
- [ ] All 7 deliverable artifacts present
- [ ] All 8 report sections populated
- [ ] Per-task commits with git notes
"""


def metadata_template(v: dict) -> str:
    """Render a child track's metadata.json."""
    # Every child is blocked by the umbrella plus its cluster's upstream track (if any).
    cluster_blockers = CLUSTER_BLOCKED_BY.get(v["cluster"], [])
    all_blockers = ["video_analysis_campaign_20260621"] + cluster_blockers
    return json.dumps({
        "track_id": f"video_analysis_{v['slug']}_20260621",
        "name": v["title"],
        "created": "2026-06-21",
        "status": "spec_approved",
        "blocked_by": all_blockers,
        "blocks": [],
        "priority": "A",
        "type": "per-child research track (Pass 1 of 3)",
        "parent": "video_analysis_campaign_20260621",
        "domain": "meta-tooling (research artifacts; no manual_slop src/ changes)",
        "cluster": v["cluster"],
        "youtube_id": v["youtube_id"],
        "youtube_url": v["url"],
        "author": v["author"],
        "scope": {
            "new_files": [
                "artifacts/transcript.json",
                "artifacts/ocr.md",
                "artifacts/frames/*.jpg",
                "artifacts/extraction_meta.json",
                "artifacts/video.mp4 (gitignored)",
                "artifacts/video.log",
                "report.md (1000-10000 LOC target)",
                "summary.md (200-400 words)",
            ],
            "modified_files": [],
            "deleted_files": [],
        },
        "estimated_effort": {
            "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
            "phase_1": "1 task: acquire (transcript + download)",
            "phase_2": "1 task: keyframes extraction",
            "phase_3": "1 task: OCR",
            "phase_4": "1 task: synthesis (delegate to Tier 3 worker)",
            "phase_5": "1 task: verification",
            "summary": "5 tasks per child. 12 children total = 60 tasks in campaign.",
        },
        "verification_criteria": [
            "All 7 deliverable artifacts present (transcript.json, video.log, frames/, extraction_meta.json, ocr.md, report.md, summary.md)",
            "report.md is 1000-10000 LOC markdown",
            "summary.md is 200-400 words",
            "All 8 report sections populated (TL;DR, Key Concepts, Frame Analysis, Transcript Highlights, Math/Theoretical Content, Connections, Open Questions, References)",
            "Idempotency check passes",
            "Per-task commits with git notes",
        ],
        "risk_register": [
            {
                "id": f"R5-{v['slug']}",
                "title": "yt-dlp access failure (oEmbed returned 401 for E-cluster videos)",
                "likelihood": "high" if v["needs_yt_dlp_verify"] else "low",
                "scope_impact": "Phase 1 Acquire blocked if yt-dlp also fails",
                "mitigation": "Phase 1 Step 0 verifies yt-dlp access before downloading. Fall back to manual transcript sourcing if yt-dlp fails.",
            },
        ],
        "user_directives": [
            "1000-10000 LOC markdown per video report (per user 2026-06-21)",
            "Lossless preservation: transcripts (JSON), frames (raw images), OCR (plain text) must be preserved in machine-readable form",
            "Cross-references: forward + backward to other children in the campaign",
        ],
    }, indent=2) + "\n"


def state_template(v: dict) -> str:
    """Render a child track's state.toml."""
    # One `<track_id> = "shipped"` line per cluster blocker (empty string for cluster E).
    extra_blockers = "".join(f'{bid} = "shipped"\n' for bid in CLUSTER_BLOCKED_BY.get(v["cluster"], []))
    return f"""# Track state for video_analysis_{v['slug']}_20260621
# Updated by Tier 2 Tech Lead (during execution)

[meta]
track_id = "video_analysis_{v['slug']}_20260621"
name = "{v['title']}"
status = "active"
current_phase = 1 # Phase 1 = Acquire (first execution phase)
last_updated = "2026-06-21"

[blocked_by]
video_analysis_campaign_20260621 = "shipped"
{extra_blockers}
[blocks]
# Depends-on: umbrella + cluster-blockers

[phases]
phase_1 = {{ status = "pending", checkpointsha = "", name = "Acquire (transcript + download)" }}
phase_2 = {{ status = "pending", checkpointsha = "", name = "Keyframes extraction" }}
phase_3 = {{ status = "pending", checkpointsha = "", name = "OCR" }}
phase_4 = {{ status = "pending", checkpointsha = "", name = "Synthesis (Tier 3 worker)" }}
phase_5 = {{ status = "pending", checkpointsha = "", name = "Verification" }}

[tasks]
t1_1 = {{ status = "pending", commit_sha = "", description = "Run extract_transcript.py + download_video.py. Commit artifacts atomically." }}
t2_1 = {{ status = "pending", commit_sha = "", description = "Run extract_keyframes.py with threshold 0.4. Manual review of frames." }}
t3_1 = {{ status = "pending", commit_sha = "", description = "Run ocr_frames.py. Spot-check OCR." }}
t4_1 = {{ status = "pending", commit_sha = "", description = "Delegate report.md (1000-10000 LOC) + summary.md (200-400 words) to Tier 3 worker." }}
t5_1 = {{ status = "pending", commit_sha = "", description = "Idempotency check + audit + end-of-track report." }}

[verification]
all_artifacts_present = false
report_loc_target_met = false
summary_word_count_met = false
end_of_track_report_committed = false
"""


def synthesis_metadata() -> str:
    """Render the synthesis track's metadata.json (blocked by all 12 children)."""
    return json.dumps({
        "track_id": "video_analysis_synthesis_20260621",
        "name": "Video Analysis Campaign Synthesis (cross-cutting)",
        "created": "2026-06-21",
        "status": "spec_approved",
        "blocked_by": [f"video_analysis_{v['slug']}_20260621" for v in VIDEOS],
        "blocks": [],
        "priority": "A",
        "type": "synthesis (cross-cutting report consuming all 12 children)",
        "parent": "video_analysis_campaign_20260621",
        "domain": "meta-tooling (research artifacts; no manual_slop src/ changes)",
        "scope": {
            "new_files": [
                "per_video_summary.md (one paragraph 150-250 words per video)",
                "report.md (6-section cross-cutting synthesis)",
            ],
            "modified_files": [],
            "deleted_files": [],
        },
        "estimated_effort": {
            "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
            "summary": "1 task: delegate synthesis to Tier 3 worker. Consumes all 12 children's report.md + summary.md.",
        },
        "verification_criteria": [
            "per_video_summary.md has 12 paragraphs (one per child)",
            "report.md has 6 sections: Theme Matrix, Cross-Video Concept Map, 5-10 Takeaways, Math Prereq Graph, Open Research Questions, Next-Watch List",
            "All 12 child tracks shipped (each with their report.md + summary.md)",
        ],
        "user_directives": [
            "1000-5000 LOC synthesis report (less than per-video because heavy lifting is in children)",
            "Lossless preservation directive applies here too: DO NOT over-summarize; Pass 2 will compress",
        ],
    }, indent=2) + "\n"


def synthesis_state() -> str:
    """Render the synthesis track's state.toml (one blocked_by entry per child)."""
    return """# Track state for video_analysis_synthesis_20260621

[meta]
track_id = "video_analysis_synthesis_20260621"
name = "Video Analysis Campaign Synthesis"
status = "active"
current_phase = 1
last_updated = "2026-06-21"

[blocked_by]
""" + "\n".join(f'video_analysis_{v["slug"]}_20260621 = "shipped"' for v in VIDEOS) + """

[blocks]

[phases]
phase_1 = { status = "pending", checkpointsha = "", name = "Verify all 12 children shipped" }
phase_2 = { status = "pending", checkpointsha = "", name = "Delegate synthesis to Tier 3 worker" }
phase_3 = { status = "pending", checkpointsha = "", name = "Human review + iterate" }
phase_4 = { status = "pending", checkpointsha = "", name = "End-of-track report" }

[tasks]
t1_1 = { status = "pending", commit_sha = "", description = "Verify all 12 children have report.md + summary.md" }
t2_1 = { status = "pending", commit_sha = "", description = "Delegate synthesis (per_video_summary.md + report.md) to Tier 3 worker" }
t3_1 = { status = "pending", commit_sha = "", description = "Human review + iterate" }
t4_1 = { status = "pending", commit_sha = "", description = "Write end-of-track report" }
"""


def main() -> None:
    for v in VIDEOS:
        folder = TRACKS_DIR / f"video_analysis_{v['slug']}_20260621"
        folder.mkdir(parents=True, exist_ok=True)  # don't assume the child folder already exists
        plan_path = folder / "plan.md"
        meta_path = folder / "metadata.json"
        state_path = folder / "state.toml"
        plan_path.write_text(plan_template(v), encoding="utf-8")
        meta_path.write_text(metadata_template(v), encoding="utf-8")
        state_path.write_text(state_template(v), encoding="utf-8")
        print(f"Wrote: {plan_path}, {meta_path}, {state_path}")

    synth_folder = TRACKS_DIR / "video_analysis_synthesis_20260621"
    synth_folder.mkdir(parents=True, exist_ok=True)
    (synth_folder / "metadata.json").write_text(synthesis_metadata(), encoding="utf-8")
    (synth_folder / "state.toml").write_text(synthesis_state(), encoding="utf-8")
    print("Wrote synthesis: metadata.json + state.toml")


if __name__ == "__main__":
    main()