ed
338573b1e8
refactor(video_analysis): extract_transcript.py uses yt-dlp VTT directly (skip youtube-transcript-api which consistently fails for these videos)
youtube-transcript-api v1.2.4 fails with an XML parse error on an empty response for every video in this campaign. yt-dlp's --write-auto-subs reliably returns thousands of segments per video, so yt-dlp is now the primary path.
Tests updated to mock _fetch_via_ytdlp instead of _fetch_raw_transcript. 8/8 tests passing.
2026-06-21 16:33:44 -04:00
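A minimal sketch of how auto-sub VTT text could be turned into transcript segments, for readers who want to see what the yt-dlp path involves. The function name parse_vtt and the segment keys (start, duration, text) are assumptions for illustration; the actual contents of extract_transcript.py are not shown in this log.

```python
import re

# Matches a WebVTT cue timing line, e.g.
# "00:00:01.000 --> 00:00:04.000 align:start position:0%"
# The hour field is optional in WebVTT timestamps.
_CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
# Inline markup such as <c> or <00:00:01.500> that auto-subs often carry.
_TAG = re.compile(r"<[^>]+>")


def _to_seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (hours may be None) to float seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(text):
    """Parse WebVTT text into a list of {start, duration, text} dicts.

    Hypothetical helper: header lines, cue identifiers, and NOTE blocks
    are skipped because they appear outside a timed cue.
    """
    cues = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = _CUE_TIMING.match(line)
        if match:
            fields = match.groups()
            current = {
                "start": _to_seconds(*fields[:4]),
                "end": _to_seconds(*fields[4:]),
                "lines": [],
            }
            cues.append(current)
        elif not line:
            # A blank line ends the current cue.
            current = None
        elif current is not None:
            cleaned = _TAG.sub("", line).strip()
            if cleaned:
                current["lines"].append(cleaned)
    return [
        {
            "start": cue["start"],
            "duration": round(cue["end"] - cue["start"], 3),
            "text": " ".join(cue["lines"]),
        }
        for cue in cues
        if cue["lines"]
    ]
```

Auto-generated subtitles frequently repeat text across overlapping cues, so a real pipeline would likely need a deduplication pass that this sketch leaves out.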
ed
0bc8abbe9a
conductor(cs229): Phase 1 Acquire - transcript.json (5397 segments via yt-dlp VTT fallback) + video.log (yt-dlp success for 336MB mp4, R5 verified)
Fix extract_transcript.py: YouTubeTranscriptApi.get_transcript() (not .fetch()). youtube-transcript-api v1.2.4 uses class method get_transcript(video_id), not instance .fetch().
R5 mitigation: yt-dlp's VTT auto-sub extraction works where youtube-transcript-api fails (XML parse error on empty response). 5397 segments recovered.
Add gitignore patterns for video_analysis artifacts: *.mp4, *.vtt (regenerable). video.log intentionally tracked.
2026-06-21 16:08:15 -04:00
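The fallback described in this commit (try one transcript source, recover via another when it fails) can be sketched roughly as follows. This is an assumption-laden illustration, not the code in extract_transcript.py: fetch_with_fallback and its primary/fallback callables are hypothetical names, and no real library is called here.

```python
def fetch_with_fallback(video_id, primary, fallback):
    """Return (segments, source_label), trying primary before fallback.

    Hypothetical helper: `primary` and `fallback` are any callables that
    take a video ID and return a list of segments. An exception or an
    empty result from `primary` triggers the fallback.
    """
    try:
        segments = primary(video_id)
        if segments:
            return segments, "primary"
    except Exception as exc:  # e.g. an XML parse error on an empty response
        print(f"primary transcript source failed for {video_id}: {exc}")
    return fallback(video_id), "fallback"
```

Returning the source label alongside the segments makes it easy to record which path produced a given transcript.json.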
ed
548c4fef63
feat(video_analysis): synthesize_report.py orchestrator with TDD (5 tests)
2026-06-21 15:39:22 -04:00
ed
ed0d198afe
feat(video_analysis): ocr_frames.py with TDD (4 tests, winsdk + tesseract backends)
2026-06-21 15:35:41 -04:00
ed
9ccdedeeb3
feat(video_analysis): extract_keyframes.py with TDD (4 tests)
2026-06-21 15:34:18 -04:00
ed
45a5e81406
feat(video_analysis): download_video.py with TDD (5 tests)
2026-06-21 15:32:46 -04:00
ed
94f4a4eee9
feat(video_analysis): extract_transcript.py with TDD (8 tests)
2026-06-21 15:31:42 -04:00
ed
12fcc55cfc
chore(scripts): scaffold scripts/video_analysis/ + placeholder test
2026-06-21 15:26:56 -04:00