diff --git a/conductor/tracks/video_analysis_deob_lexicon_20260621/TIER2_STARTER.md b/conductor/tracks/video_analysis_deob_lexicon_20260621/TIER2_STARTER.md new file mode 100644 index 00000000..7e4ee2d2 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_lexicon_20260621/TIER2_STARTER.md @@ -0,0 +1,239 @@ +# Tier 2 Starter Prompt: Video Analysis De-obfuscation — Lexicon Refinement + +**Purpose.** This file is the dispatch prompt for Tier 2 autonomous agents picking up the Phase 1 (lexicon) child track. It supplements the auto-loaded `spec.md` + `plan.md` per `conductor/tier2/commands/tier-2-auto-execute.md` step 2. + +**Track:** `video_analysis_deob_lexicon_20260621` (Pass 2 of 3, Phase 1 of 3 within Pass 2) + +--- + +## Track identity + +- **ID:** `video_analysis_deob_lexicon_20260621` +- **Type:** Research-only child track (Phase 1 of Pass 2) +- **Status:** spec ✓ (lightweight scaffold; plan/metadata/state to be created by Tier 2) +- **Priority:** A (user-blocking; Pass 2 of the 3-pass research campaign) +- **Domain:** Meta-tooling (research deliverable; no `src/` changes) + +## Mission (what this track produces) + +The lexicon child consumes the warmup's `report.md` + `prompt_template.md` and produces **3 codified operational artifacts**: + +1. **`lexicon.md`** — the codified operational spec; the "contract" between the warmup and the apply phases. **The principled spine is preserved** (5 rules + 6 noise-dedup maps + form-anchor examples + etymology rule + lossless preservation); **user-specific re-encodings are moved to Appendix B** as optional output conventions. + +2. **`terms_catalog.md`** — the machine-readable lexicon (~70 terms in 4 tiers) with: per-term tier, conventional form, principled re-encoding, optional user-specific form, etymology, form anchor, source cluster. Each entry is tagged `[principled]` or `[user-also-accepted]` to make the distinction explicit. + +3. 
**`dedup_map.md`** — the 6 noise-dedup maps refined with: per-map source clusters, examples, and a clear distinction between principled maps (3) and user-preferred maps (3). + +This Phase 1 child is **NOT a research track** — it does NOT survey new material. It refines the warmup's draft into a codified spec. The evidence base is the warmup's 10 cluster sub-reports (`research/cluster_0_*.md` through `research/cluster_9_*.md`, ~2,491 LOC, 137 patterns). + +--- + +## Files to read in this order + +### 1. This Phase 1 child track (REQUIRED) + +- `./spec.md` (lightweight scaffold; read first) +- `./plan.md` (created by Tier 2 during init) + +### 2. The umbrella + warmup (REQUIRED — these are your inputs) + +- `/conductor/tracks/video_analysis_deob_20260621/spec.md` (umbrella; the full design) +- `/conductor/tracks/video_analysis_deob_20260621/README.md` (child index) +- `/conductor/tracks/video_analysis_deob_20260621/state.toml` (current phase + tasks) +- `/conductor/tracks/video_analysis_deob_warmup_20260621/spec.md` (warmup design) +- `/conductor/tracks/video_analysis_deob_warmup_20260621/report.md` (warmup's design doc; the primary input) +- `/conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md` (warmup's prompt template; the operational spec the lexicon refines) +- `/conductor/tracks/video_analysis_deob_warmup_20260621/TIER2_STARTER.md` (warmup's dispatch prompt; contains the user-directives log + risk register + verification criteria) + +### 3. The evidence base (LOCAL-ONLY — read but DO NOT commit) + +- `/conductor/tracks/video_analysis_deob_warmup_20260621/research/cluster_0_*.md` through `cluster_9_*.md` (10 cluster sub-reports, ~2,491 LOC; the source material for the lexicon) +- **CRITICAL:** These are derivative work of the user's samples, NOT the samples themselves. Reading them is fine; committing their content verbatim is NOT. The lexicon is the codified form; the cluster sub-reports are the evidence base. + +### 4. 
Project conventions (REQUIRED at session start per workflow.md) + +- `/AGENTS.md` (critical anti-patterns, file naming, no day estimates, skip-marker policy) +- `/conductor/workflow.md` (task workflow, Tier 2 sandbox conventions, failcount contract) +- `/conductor/code_styleguides/python.md` (1-space indent, type hints, no comments — IF code is written; not applicable here) +- `/conductor/code_styleguides/error_handling.md` (Result[T] pattern — not applicable for this research-only track) + +### 5. Reference tracks (consult as needed) + +- `/conductor/tracks/intent_dsl_survey_20260612/report_v1.2.md` — sibling DSL research track (same report structure, similar scope) +- `/conductor/tracks/video_analysis_deob_warmup_20260621/TIER2_STARTER.md` — sibling track (Pass 2 precursor); reference for prompt-template format + +--- + +## Critical user directives (load-bearing) + +The warmup captured these in `state.toml` `[user_directives_logged]`; they apply to Phase 1 unchanged. + +1. **Constructive type theory as foundation.** "I like Norman Wildberger's work. And I like the constructivist current progress on type theories as a foundational system." Phase 1 must keep the type-theoretic primitives (Cluster 3) as the principled core of the lexicon. + +2. **Boundedness for direct knowledge.** "No observer or mechanism or construct can be infinite in resolution or quantification. To have distinction must have a bounds." Phase 1's `lexicon.md` Rule 1 (Boundedness) is non-negotiable. + +3. **Cycles/iteration allowed but expressed explicitly.** "Infinite is okay well handled CORRECTLY." Phase 1 must keep the `∞_val` / `∞_proc` / `∞_card` lexical disambiguation as a principled rule. + +4. **Etymology-aware lexicon.** Every new term has a 1-line origin + 1-line definition history. The 4-language pattern (Greek + Latin + English + Sanskrit) is the user's preferred convention; the 1-line origin is the scheme-canonical minimum. + +5. 
**Lossless preservation with explicit compression history.** (User 2026-06-23 directive.) Phase 1 must preserve the `prompt_template.md` Rule 4 with the "compression notes" field per layer. + +6. **Encoding-explicit.** (User 2026-06-23 directive.) Every value-bearing term has an `encoding:` attribute. Phase 1 must keep the encoding taxonomy + the `quantity() : ` form as a principled rule. + +7. **Honest epistemic hedging.** "Don't know what `<<` here is" / "this is probably not the 'full' definition" / "Me fucking around" — the user values honest uncertainty over confident guesses. Phase 1 must preserve the "indefinite — see original" pattern. + +8. **The principled vs user-specific distinction (the surgical-edits context).** Per the warmup's 2026-06-23 surgical edits, Phase 1 must FORMALIZE this distinction: + - **Principled entries** are the scheme's canonical forms (from the 5 rules + 6 noise-dedup maps + form-anchor examples). + - **User-specific entries** are the user's personal preferences (Sectored Language V1, GA reinterpretations, classical Greek/Latin/Sanskrit, "construct not invent" rename). These are OPTIONAL output conventions. + - **Phase 1 should NOT** promote user-specific entries to scheme-canonical. **Phase 1 SHOULD** move §3.5 (Sectored Language operator terms) to Appendix B and tag each user-specific entry with `[user-also-accepted]`. + +9. **Secular sanitization (per user 2026-06-23).** "Make sure to sanitize some of the more esoteric or theurgic stuff. I want this to be somewhat secular in its perception so it's better formalization for general audiences." Phase 1 must keep the esoteric content (Witness/Vessel/Aether ontology, classical philosophy, cosmology) OUT of the public `lexicon.md`. It stays in `cluster_0_twitter.md` for the user's reference. + +10. **The 12 unresolved items + 19 meditation-depth items = 31 total.** Per warmup's §A.3 + §11.3. 
Phase 1 should address these: either (a) include an item in the lexicon if the answer is clear from the cluster sub-reports, or (b) defer it to a future "lexicon v2" with an explicit TODO list. + +--- + +## Key risks (from warmup spec §9 + Tier 2's experience) + +- **R1 (medium):** Tier 2 may revert the surgical edits by re-including user-specific entries in the principled section. **Mitigation:** this starter prompt explicitly tells Tier 2 to FORMALIZE the distinction, not undo it. +- **R2 (medium):** Tier 2 may try to re-survey the samples (wasteful; the cluster sub-reports are already the evidence base). **Mitigation:** this starter prompt tells Tier 2 to REFINE the warmup, not re-survey. +- **R3 (medium):** The 31 unresolved items may bloat the lexicon. **Mitigation:** include only if the answer is clear; otherwise defer with explicit TODO. +- **R4 (low):** Phase 1's `lexicon.md` may grow too large (>3000 LOC). **Mitigation:** cap at 3000 LOC; move deep examples to Appendix. +- **R5 (low):** The 4-language pattern (Greek/Latin/English/Sanskrit) in the etymology section may be dropped. **Mitigation:** the warmup preserved it; Phase 1 must preserve it for user-specific terms (per the user's preference). + +--- + +## Hard constraints + +- **No `src/*.py` changes.** This is a research-only track; no production code. +- **No new `pyproject.toml` dependencies.** All work is research (markdown files). +- **No `uv pip install` for new packages.** The lexicon child is markdown only. +- **No `scripts/` Python tooling.** The warmup did not need any, and Phase 1 does not either. +- **No day estimates in any artifact.** Scope measured in files/sites per `conductor/workflow.md` Tier 1 Track Initialization Rules. +- **No re-surveying.** Phase 1 refines the warmup's draft; it does NOT re-read the user's samples. The cluster sub-reports in `research/` are the evidence base. 
+- **Per-task atomic commits.** Each deliverable (`lexicon.md`, `terms_catalog.md`, `dedup_map.md`) is committed in its own commit with a git note. +- **No comments in code** (if any code is written; not applicable here). +- **1-space indent, type hints** (if any code is written; not applicable here). + +--- + +## Tier 2 sandbox conventions (per `conductor/tier2/agents/tier2-autonomous.md`) + +- **Test runner:** `uv run python scripts/run_tests_batched.py` (NEVER `uv run pytest` directly). Not applicable for this research-only track — no tests to run. +- **Default branch:** `master`. Use `origin/master` for `git fetch` and as the base for new branches. +- **Line endings:** preserve existing (CRLF stays CRLF, LF stays LF). +- **Throw-away scripts:** `scripts/tier2/artifacts//` (not the base `scripts/tier2/` directory). Not applicable here. +- **End-of-track report:** `docs/reports/TRACK_COMPLETION_.md` per `conductor/tier2/agents/tier2-autonomous.md` step 42. +- **State update:** `state.toml` → `status = "completed"` at the end. +- **Hard bans:** `git push*`, `git checkout*`, `git restore*`, `git reset*` (3-layer enforced; see `conductor/tier2/agents/tier2-autonomous.md` for details). +- **File access:** Tier 2 clone only (Windows restricted token + OpenCode permission rules). **NEVER USE APPDATA** — denied at the bash level. +- **Failcount contract:** After every task commit, check `should_give_up` from `scripts.tier2.failcount`. State persisted at `tests/artifacts/tier2_state//state.json`. Thresholds: 3 consecutive red, 3 consecutive green, 30 min no progress. 
+ +--- + +## Verification criteria (gate for Phase 1 completion) + +Per the lexicon child spec §6: + +- [ ] `lexicon.md` exists, follows the refined structure (principled spine + Appendix B for user-specific) +- [ ] `terms_catalog.md` exists, machine-readable (per-term table with principled/user-also-accepted tags) +- [ ] `dedup_map.md` exists, refined 6 maps with principled vs user-preferred distinction +- [ ] §3.5 (Sectored Language operator terms) moved to Appendix B with "User's preferred output conventions" framing +- [ ] Each user-specific entry in §3.4 tagged `[user-also-accepted]` +- [ ] The 4-language pattern (Greek/Latin/English/Sanskrit) preserved for user-specific terms +- [ ] The esoteric content (Witness/Vessel/Aether ontology, classical philosophy) NOT in the public lexicon +- [ ] The 12 unresolved items + 19 meditation-depth items addressed (include if clear, defer with TODO if not) +- [ ] 5 rules preserved (Boundedness, Form-anchor, Etymology, Lossless, Encoding-explicit) +- [ ] 6 noise-dedup maps preserved (3 principled: proofs=programs, sets=kinds, functions=procedures; 3 user-preferred: GA collapse, invent→construct, number=quantity) +- [ ] Lossless + compression history preserved (per user 2026-06-23) +- [ ] User has reviewed and approved the refined lexicon +- [ ] All 3 deliverables committed atomically (one commit per deliverable) +- [ ] Git notes attached to each commit +- [ ] `state.toml` updated to `status = "completed"` +- [ ] `docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md` exists + +--- + +## Execution plan (per lexicon child plan.md) + +| Phase | Task | Notes | +|---|---|---| +| 0 | Initialize Phase 1 track | Create `plan.md` + `metadata.json` + `state.toml` per Tier 2 conventions | +| 1 | Read the warmup's `report.md` + `prompt_template.md` | DO NOT re-survey the samples; the cluster sub-reports are the evidence base | +| 2 | Refine the lexicon (5-step process) | (a) Tag each user-specific entry with 
`[user-also-accepted]`; (b) Move §3.5 to Appendix B; (c) Refine the 6 noise-dedup maps (3 principled, 3 user-preferred); (d) Address the 31 unresolved items; (e) Add test cases from the cluster sub-reports | +| 3 | Produce `lexicon.md` | The codified operational spec; the "contract" between warmup and apply phases | +| 4 | Produce `terms_catalog.md` | Machine-readable lexicon with principled/user-also-accepted tags | +| 5 | Produce `dedup_map.md` | Refined 6 maps with principled vs user-preferred distinction | +| 6 | User review | Pause for user feedback before marking completed | +| 7 | Update `state.toml` to `status = "completed"` | `git add state.toml && git commit -m "conductor(state): mark lexicon completed"` | +| 8 | Write end-of-track report | `docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md` per `conductor/tier2/agents/tier2-autonomous.md` | + +--- + +## When stuck + +- **Cluster sub-reports are the evidence base; do NOT re-survey the samples.** The 137 patterns are already documented. Phase 1's job is to CODIFY them, not discover new ones. +- **The 4-language pattern is OPTIONAL** in the principled section but MANDATORY for user-specific terms. If you're unsure whether a term is principled or user-specific, default to principled (no etymology required beyond 1-line origin + 1-line definition history). +- **The 31 unresolved items are not blocking.** Include in the lexicon only if the cluster sub-reports have a clear answer; otherwise defer with an explicit TODO at the end of `lexicon.md`. +- **The esoteric content (Witness/Vessel/Aether ontology, classical philosophy) is NOT in the public lexicon.** It stays in `cluster_0_twitter.md` for the user's reference. If a tier-1 entry references these terms, the user has already accepted to keep them; do NOT promote them to scheme-canonical. +- **Sandbox blocks reads from cluster sub-reports?** Per the warmup spec, the cluster sub-reports are committed (they're research artifacts). 
The Tier 2 clone has them. `git status` should show no untracked files. + +--- + +## Quick reference: dispatch + +``` +/tier-2-auto-execute video_analysis_deob_lexicon_20260621 +``` + +Plus this context (paste BEFORE invoking): + +``` +TRACK: video_analysis_deob_lexicon_20260621 +TYPE: Research-only child track (Pass 2 Phase 1 of 3) +STATUS: spec ✓ (lightweight); plan/metadata/state to be created +PRIORITY: A (user-blocking) + +PRODUCES: 3 deliverables (lexicon.md + terms_catalog.md + dedup_map.md) that refine the warmup's draft into a codified operational spec. +CONSUMES: warmup's report.md + prompt_template.md + research/cluster_*.md (10 cluster sub-reports, ~2,491 LOC) + +CRITICAL: The warmup's 2026-06-23 surgical edits distinguished principled re-encodings (from the 5 rules) from user-specific re-encodings (Sectored Language, GA, classical Greek/Latin). Phase 1 must FORMALIZE this distinction: +- Tag each user-specific entry with [user-also-accepted] +- Move §3.5 (Sectored Language operator terms) to Appendix B +- DO NOT promote user-specific entries to scheme-canonical +- DO NOT re-include esoteric content (Witness/Vessel/Aether) in the public lexicon +- DO NOT re-survey the samples; the cluster sub-reports are the evidence base + +USER DIRECTIVES (load-bearing): +1. Constructive type theory as foundation +2. Boundedness for direct knowledge +3. Cycles/iteration explicit; no "infinite" values +4. Etymology-aware lexicon (1-line origin + 1-line history) +5. Lossless with compression history +6. Encoding-explicit (every value-bearing term has encoding: attribute) +7. Honest epistemic hedging +8. Principled vs user-specific distinction (the surgical-edit context) +9. Secular sanitization +10. 31 unresolved items (12 + 19) to address + +FILES TO READ FIRST: +1. ./TIER2_STARTER.md (this file) +2. ./spec.md (the lightweight scaffold) +3. /conductor/tracks/video_analysis_deob_20260621/spec.md (umbrella) +4. 
/conductor/tracks/video_analysis_deob_warmup_20260621/report.md (warmup design) +5. /conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md (warmup prompt) +6. /conductor/tracks/video_analysis_deob_warmup_20260621/research/cluster_*.md (10 cluster sub-reports, evidence base) + +EXECUTION: Read warmup outputs → refine (tag user-specific, move §3.5 to Appendix B) → produce 3 deliverables → user review → closeout. +``` + +--- + +## Post-Phase-1 + +After Phase 1 ships, Phase 2 (pilot) can start: +- `video_analysis_deob_pilot_20260621/` (consumes the refined `lexicon.md` + `terms_catalog.md` + `dedup_map.md`) +- Applies the lexicon to 2 Pass 1 reports (`cs229_building_llms` + `entropy_epiplexity`) +- Captures refinements in `pilot_report.md` + +Each phase has its own spec.md (already scaffolded). Tier 2 will be dispatched for each phase separately. diff --git a/conductor/tracks/video_analysis_deob_lexicon_20260621/metadata.json b/conductor/tracks/video_analysis_deob_lexicon_20260621/metadata.json new file mode 100644 index 00000000..9d90aba1 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_lexicon_20260621/metadata.json @@ -0,0 +1,145 @@ +{ + "track_id": "video_analysis_deob_lexicon_20260621", + "name": "Video Analysis De-obfuscation — Lexicon Refinement (Pass 2 Phase 1 of 3)", + "created": "2026-06-21", + "status": "spec_approved", + "blocked_by": [ + "video_analysis_deob_warmup_20260621" + ], + "blocks": [ + "video_analysis_deob_pilot_20260621", + "video_analysis_deob_apply_20260621" + ], + "priority": "A", + "rationale": "User-blocking Phase 1 of Pass 2 (de-obfuscation). Consumes the warmup's report.md + prompt_template.md and refines them into a codified operational spec (lexicon.md + terms_catalog.md + dedup_map.md). Formalizes the 2026-06-23 surgical-edits distinction between principled re-encodings (from the 5 rules) and user-specific re-encodings (from the samples). 
Research-only; no src/ changes.", + "type": "research-only child track (Pass 2 Phase 1 of 3)", + "domain": "meta-tooling (research deliverable; no manual_slop src/ changes)", + "scope": { + "new_folders": [ + "conductor/tracks/video_analysis_deob_lexicon_20260621/" + ], + "new_files": [ + "spec.md (lightweight scaffold)", + "plan.md", + "metadata.json", + "state.toml", + "TIER2_STARTER.md", + "lexicon.md (~1000-2000 LOC; the codified operational spec)", + "terms_catalog.md (machine-readable lexicon)", + "dedup_map.md (the 6 noise-dedup maps refined)" + ], + "modified_files": [], + "deleted_files": [], + "gitignored_patterns": [] + }, + "estimated_effort": { + "method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.", + "phase_0": "1 task: init state.toml", + "phase_1": "5 tasks: read warmup outputs (no re-survey; spot-check 2-3 cluster sub-reports)", + "phase_2": "5 tasks: tag user-specific entries, move §3.5 to Appendix B, refine 6 dedup maps, add 5-10 test cases, address 31 unresolved items", + "phase_3": "6 tasks: write 3 deliverables + 3 atomic commits with git notes", + "phase_4": "2 tasks: user review + state.toml update", + "phase_5": "3 tasks: idempotency check, audit, end-of-track report", + "summary": "5 phases, 22 tasks, 3 deliverables (~1500-3000 LOC combined), 1 user action item (review). No day estimates per project convention." 
+ }, + "verification_criteria": [ + "All 3 deliverables present (lexicon.md + terms_catalog.md + dedup_map.md)", + "lexicon.md has the 5 rules (Boundedness, Form-anchor, Etymology, Lossless, Encoding-explicit) + 4 tiers of terms + 6 noise-dedup maps + test cases", + "§3.5 (Sectored Language operator terms) moved to Appendix B", + "Each user-specific entry in §3.4 tagged [user-also-accepted]", + "The 4-language pattern (Greek/Latin/English/Sanskrit) preserved for user-specific terms", + "The esoteric content (Witness/Vessel/Aether ontology, classical philosophy) NOT in the public lexicon", + "The 31 unresolved items addressed (include if clear, defer with TODO if not)", + "At least 5 test cases included (drawn from the cluster sub-reports, not verbatim from samples)", + "User has reviewed and approved the refined lexicon", + "All 3 deliverables committed atomically (one commit per deliverable)", + "Git notes attached to each commit", + "state.toml updated to status = 'completed'", + "End-of-track report at docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md", + "No new src/*.py files created (per AGENTS.md File Size and Naming Convention)", + "No new pyproject.toml dependencies" + ], + "risk_register": [ + { + "id": "R1", + "title": "Tier 2 reverts the surgical edits by re-including user-specific entries in the principled section", + "likelihood": "medium", + "scope_impact": "undoes the 2026-06-23 user refinement", + "mitigation": "TIER2_STARTER.md explicitly tells Tier 2 to FORMALIZE the distinction, not undo it; lexicon.md must tag user-specific entries with [user-also-accepted]" + }, + { + "id": "R2", + "title": "Tier 2 re-surveys the samples (wasteful)", + "likelihood": "medium", + "scope_impact": "duplicates work; bloats lexicon.md", + "mitigation": "TIER2_STARTER.md tells Tier 2 to REFINE the warmup, not re-survey; the cluster sub-reports are the evidence base" + }, + { + "id": "R3", + "title": "The 31 unresolved items bloat the lexicon", + 
"likelihood": "medium", + "scope_impact": "lexicon.md grows too large", + "mitigation": "include only if cluster sub-reports have a clear answer; otherwise defer with explicit TODO" + }, + { + "id": "R4", + "title": "lexicon.md grows too large (>3000 LOC)", + "likelihood": "low", + "scope_impact": "hard to reference", + "mitigation": "cap at 3000 LOC; move deep examples to Appendix" + }, + { + "id": "R5", + "title": "The 4-language pattern is dropped for user-specific terms", + "likelihood": "low", + "scope_impact": "user preference not preserved", + "mitigation": "TIER2_STARTER.md tells Tier 2 to preserve the 4-language pattern for user-specific terms (per the warmup's etymology rule)" + }, + { + "id": "R6", + "title": "Esoteric content (Witness/Vessel/Aether) leaks into the public lexicon", + "likelihood": "low", + "scope_impact": "violates user 2026-06-23 secular sanitization directive", + "mitigation": "TIER2_STARTER.md tells Tier 2 to keep esoteric content in cluster_0_twitter.md only; it stays OUT of lexicon.md" + } + ], + "architecture_reference": { + "primary_documents": [ + "conductor/workflow.md (track convention, per-task commits, git notes)", + "conductor/tracks/video_analysis_deob_20260621/spec.md (umbrella design)", + "conductor/tracks/video_analysis_deob_warmup_20260621/report.md (warmup design doc; primary input)", + "conductor/tracks/video_analysis_deob_warmup_20260621/prompt_template.md (warmup prompt; operational input)" + ], + "related_tracks": [ + "conductor/tracks/video_analysis_deob_warmup_20260621/ (upstream producer; the precursor)", + "conductor/tracks/video_analysis_deob_pilot_20260621/ (downstream consumer; consumes the refined lexicon)", + "conductor/tracks/video_analysis_deob_apply_20260621/ (downstream consumer; consumes the refined lexicon)", + "conductor/tracks/intent_dsl_survey_20260612/ (sibling research track; same report structure)" + ] + }, + "deferred_to_followup_tracks": [ + { + "title": "Phase 2 (pilot)", + "description": 
"Applies the refined lexicon to 2 Pass 1 reports (cs229_building_llms + entropy_epiplexity).", + "track_status": "blocked by this track" + }, + { + "title": "Phase 3 (apply)", + "description": "Applies the refined lexicon to 10 remaining Pass 1 reports + 1 cross-cutting synthesis.", + "track_status": "blocked by Phase 2" + } + ], + "regressions_and_pre_existing_failures": [], + "pre_existing_failures_remaining": [], + "user_directives": [ + "Constructive type theory as foundation (2026-06-21)", + "Boundedness for direct knowledge; cycles/iteration explicit (2026-06-21)", + "Etymology-aware lexicon (2026-06-21)", + "Lossless with explicit compression history (2026-06-23)", + "Encoding-explicit (every value-bearing term has encoding: attribute) (2026-06-23)", + "Honest epistemic hedging (2026-06-21)", + "Secular sanitization - esoteric content OUT of public lexicon (2026-06-23)", + "Principled vs user-specific distinction (the 2026-06-23 surgical edits; Phase 1 formalizes this)", + "No day estimates per conductor/workflow.md Tier 1 Track Initialization Rules (added 2026-06-16). Scope measured in files/sites only." + ] +} diff --git a/conductor/tracks/video_analysis_deob_lexicon_20260621/plan.md b/conductor/tracks/video_analysis_deob_lexicon_20260621/plan.md new file mode 100644 index 00000000..282bcc86 --- /dev/null +++ b/conductor/tracks/video_analysis_deob_lexicon_20260621/plan.md @@ -0,0 +1,75 @@ +# Plan: Video Analysis De-obfuscation — Lexicon Refinement + +This is the Phase 1 (lexicon) child plan for Pass 2 of the 3-pass research campaign. Per the Tier 1 Track Initialization Rules, scope is measured in files/sites — no day estimates. + +## Phase 0: Init + +- [ ] **Task 0.1:** Initialize the Phase 1 child track: create `state.toml` (Tier 2). + +## Phase 1: Read the warmup outputs (no re-survey) + +- [ ] **Task 1.1:** Read the warmup's `report.md` (the design doc, 576 lines). 
+- [ ] **Task 1.2:** Read the warmup's `prompt_template.md` (the operational spec, ~430 lines, 5 rules + 6 noise-dedup maps). +- [ ] **Task 1.3:** Read the warmup's `TIER2_STARTER.md` (the user-directives log + risk register + verification criteria). +- [ ] **Task 1.4:** Spot-check 2-3 of the 10 cluster sub-reports in `research/` (the evidence base; do NOT re-survey all 158 sample files). +- [ ] **Task 1.5:** Honor the 2026-06-23 surgical edits: the principled vs user-specific distinction is explicit in the warmup's `report.md` §3.4, §3.5, §4.4, §6.2 (Reading guide notes) and `prompt_template.md` "Your role" + "The Sectored Language Operator Names" + verification checklist. **Phase 1 FORMALIZES this distinction; it does NOT undo it.** + +## Phase 2: Refine the lexicon (5-step process) + +- [ ] **Task 2.1:** Tag every user-specific entry in `report.md` §3.4 with `[user-also-accepted]`. The principled entries (from the 5 rules) stay untagged; the user-specific entries (Sectored Language V1, GA reinterpretations, classical Greek/Latin) get the tag. +- [ ] **Task 2.2:** Move `report.md` §3.5 (Sectored Language operator terms) to **Appendix B** ("User's preferred output conventions, optional"). The table itself stays; the location changes. +- [ ] **Task 2.3:** Refine the 6 noise-dedup maps in `report.md` §4: clearly mark which are principled (3) and which are user-preferred (3): + - **Principled:** proofs=programs (Curry-Howard), sets=kinds (constructive), functions=procedures (concatenative) + - **User-preferred:** Real=Imaginary=Bivector (GA collapse), invent→construct, number=value=quantity (encoding-explicit) +- [ ] **Task 2.4:** Add 5-10 test cases drawn from the cluster sub-reports. Each test case: original notation → re-encoded form, with the lexicon terms used. **The test cases use the SHAPE of the re-encoding, not verbatim sample content** (per the warmup spec's "extract patterns, not content" directive). 
+- [ ] **Task 2.5:** Address the 31 unresolved items (12 from warmup §A.3 + 19 from §11.3). For each: + - If the cluster sub-reports have a clear answer: include in the lexicon with a citation. + - If unclear: add to a "Deferred to lexicon v2" TODO list at the end of `lexicon.md`. + +## Phase 3: Codify (produce the 3 deliverables) + +- [ ] **Task 3.1:** Write `lexicon.md` (~1000-2000 LOC). Structure: + - §1 The 4 Rules (Boundedness, Form-anchor, Etymology, Lossless) + Rule 5 (Encoding-explicit, per user 2026-06-23) + - §2 The 4 Tiers (~70 terms, each tagged `[principled]` or `[user-also-accepted]`) + - §3 The 6 Noise-Dedup Maps (3 principled, 3 user-preferred, clearly marked) + - §4 Test Cases (5-10, drawn from cluster sub-reports) + - §5 Form-Anchor Requirement (formal definition) + - §6 Etymology Requirement (1-line origin + 1-line history; 4-language for user-specific terms) + - §7 Encoding-Explicit Requirement (per Rule 5) + - §8 Cross-References to Warmup + Phase 2/3 (downstream) + - Appendix A: Provenance (cluster sub-reports) + - **Appendix B: User's preferred output conventions (optional)** — moved from `report.md` §3.5 +- [ ] **Task 3.2:** Commit `lexicon.md` with git note summarizing the principled vs user-specific formalization. +- [ ] **Task 3.3:** Write `terms_catalog.md` (machine-readable). Per-term table with: tier, conventional form, principled re-encoding, optional user-specific form (`[user-also-accepted]`), etymology, form anchor, source cluster. +- [ ] **Task 3.4:** Commit `terms_catalog.md` with git note. +- [ ] **Task 3.5:** Write `dedup_map.md` (the 6 maps refined). Each map: source clusters, principled/user-preferred flag, examples (drawn from cluster sub-reports), edge cases. +- [ ] **Task 3.6:** Commit `dedup_map.md` with git note. + +## Phase 4: User review + +- [ ] **Task 4.1:** User reviews the 3 deliverables + the user-specific tagging. Approves, or iterates (loop back to Phase 2). 
+- [ ] **Task 4.2:** Update `state.toml` to `status = "completed"`. + +## Phase 5: Verification + +- [ ] **Task 5.1:** Idempotency check (re-read the warmup's outputs, confirm the refined lexicon is consistent). +- [ ] **Task 5.2:** Audit checklist: every user-specific entry tagged; esoteric content NOT in public lexicon; 5 rules preserved; 6 noise-dedup maps preserved; 31 unresolved items addressed. +- [ ] **Task 5.3:** Write end-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md`. + +## Verification (gate per workflow.md) + +- [ ] All 3 deliverables present (`lexicon.md` + `terms_catalog.md` + `dedup_map.md`) +- [ ] `lexicon.md` has the 5 rules (Boundedness, Form-anchor, Etymology, Lossless, Encoding-explicit) + 4 tiers of terms + 6 noise-dedup maps + test cases +- [ ] §3.5 (Sectored Language operator terms) moved to Appendix B +- [ ] Each user-specific entry in §3.4 tagged `[user-also-accepted]` +- [ ] The 4-language pattern (Greek/Latin/English/Sanskrit) preserved for user-specific terms +- [ ] The esoteric content (Witness/Vessel/Aether ontology, classical philosophy) NOT in the public lexicon +- [ ] The 31 unresolved items addressed (include if clear, defer with TODO if not) +- [ ] At least 5 test cases included (drawn from the cluster sub-reports) +- [ ] User has reviewed and approved +- [ ] All 3 deliverables committed atomically +- [ ] Git notes attached to each commit +- [ ] `state.toml` updated to `status = "completed"` +- [ ] End-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md` + +The Phase 1 child is "Pass 2 Phase 1 complete" when all 3 deliverables are committed + user-approved. Phase 2 (pilot) can then start. 
diff --git a/conductor/tracks/video_analysis_deob_lexicon_20260621/spec.md b/conductor/tracks/video_analysis_deob_lexicon_20260621/spec.md index f1d8a5f2..4afe8a5c 100644 --- a/conductor/tracks/video_analysis_deob_lexicon_20260621/spec.md +++ b/conductor/tracks/video_analysis_deob_lexicon_20260621/spec.md @@ -29,13 +29,13 @@ | Dedup maps | `dedup_map.md` | The 3+ noise-dedup maps from the warmup, refined with examples | | Test cases | (5-10 example transformations, embedded in `lexicon.md`) | Each test case: original notation → re-encoded form, with the lexicon terms used | -**Optional (added per child track execution convention):** `plan.md`, `metadata.json`, `state.toml`. +**Optional (added per child track execution convention):** `plan.md`, `metadata.json`, `state.toml`. (These are now present in the folder, scaffolded at spec time for Tier 2 to consume. Per the warmup pattern, the child track's `plan.md` enumerates the 5-phase pipeline; `metadata.json` is the scope/risk register; `state.toml` is the task tracker.) ## 3. Pipeline (5 phases) Per the umbrella spec §5 (Phase 2 of the umbrella). Each phase commits atomically. -- [ ] **Phase 1: Init.** Initialize the child track (`plan.md` + `metadata.json` + `state.toml` per Tier 2 conventions). +- [ ] **Phase 1: Init.** Initialize the child track (Tier 2 reads the scaffolded `plan.md` + `metadata.json` + `state.toml`). - [ ] **Phase 2: Refine.** Tier 3 worker reads the warmup's `report.md` + `prompt_template.md`. Refines the lexicon by: - Adding 5-10 test cases drawn from the user's samples - Adding the "form anchor" requirement to each term @@ -102,3 +102,7 @@ Per the umbrella spec §5 (Phase 2 of the umbrella). 
Each phase commits atomical - [Umbrella spec.md](../../video_analysis_deob_20260621/spec.md) - [Umbrella README.md](../../video_analysis_deob_20260621/README.md) - [Warmup spec.md](../../video_analysis_deob_warmup_20260621/spec.md) +- [Warmup report.md](../../video_analysis_deob_warmup_20260621/report.md) (the design doc; primary input) +- [Warmup prompt_template.md](../../video_analysis_deob_warmup_20260621/prompt_template.md) (the operational spec; the input being codified) +- [Warmup TIER2_STARTER.md](../../video_analysis_deob_warmup_20260621/TIER2_STARTER.md) (the user-directives log + risk register + verification criteria) +- [TIER2_STARTER.md](./TIER2_STARTER.md) (the dispatch prompt for Tier 2) diff --git a/conductor/tracks/video_analysis_deob_lexicon_20260621/state.toml b/conductor/tracks/video_analysis_deob_lexicon_20260621/state.toml new file mode 100644 index 00000000..92da694a --- /dev/null +++ b/conductor/tracks/video_analysis_deob_lexicon_20260621/state.toml @@ -0,0 +1,84 @@ +# Track state for video_analysis_deob_lexicon_20260621 +# Updated by Tier 2 Tech Lead during execution + +[meta] +track_id = "video_analysis_deob_lexicon_20260621" +name = "Video Analysis De-obfuscation — Lexicon Refinement (Pass 2 Phase 1 of 3)" +status = "active" +current_phase = 0 # Phase 0 = init +last_updated = "2026-06-23" + +[blocked_by] +video_analysis_deob_warmup_20260621 = "shipped 2026-06-23 (after the 2026-06-23 surgical edits)" + +[blocks] +video_analysis_deob_pilot_20260621 = "blocked (consumes lexicon.md + terms_catalog.md + dedup_map.md)" +video_analysis_deob_apply_20260621 = "blocked (consumes lexicon.md + terms_catalog.md + dedup_map.md)" + +[phases] +phase_0 = { status = "pending", checkpointsha = "", name = "Init (state.toml)" } +phase_1 = { status = "pending", checkpointsha = "", name = "Read the warmup outputs (no re-survey)" } +phase_2 = { status = "pending", checkpointsha = "", name = "Refine the lexicon (5-step process)" } +phase_3 = { status = "pending", 
checkpointsha = "", name = "Codify (produce 3 deliverables)" } +phase_4 = { status = "pending", checkpointsha = "", name = "User review" } +phase_5 = { status = "pending", checkpointsha = "", name = "Verification + end-of-track report" } + +[tasks] +# Phase 0 (init) +t0_1 = { status = "pending", commit_sha = "", description = "Initialize Phase 1 child track: create state.toml per Tier 2 conventions" } + +# Phase 1 (read warmup outputs) +t1_1 = { status = "pending", commit_sha = "", description = "Read the warmup's report.md (the design doc, 576 lines)" } +t1_2 = { status = "pending", commit_sha = "", description = "Read the warmup's prompt_template.md (the operational spec, ~430 lines)" } +t1_3 = { status = "pending", commit_sha = "", description = "Read the warmup's TIER2_STARTER.md (user-directives log + risk register + verification criteria)" } +t1_4 = { status = "pending", commit_sha = "", description = "Spot-check 2-3 of the 10 cluster sub-reports in research/ (the evidence base; do NOT re-survey all 158 sample files)" } +t1_5 = { status = "pending", commit_sha = "", description = "Honor the 2026-06-23 surgical edits: the principled vs user-specific distinction is explicit in the warmup's report.md §3.4, §3.5, §4.4, §6.2 and prompt_template.md. Phase 1 FORMALIZES this distinction; it does NOT undo it." } + +# Phase 2 (refine the lexicon) +t2_1 = { status = "pending", commit_sha = "", description = "Tag every user-specific entry in report.md §3.4 with [user-also-accepted]. The principled entries (from the 5 rules) stay untagged; the user-specific entries (Sectored Language V1, GA reinterpretations, classical Greek/Latin) get the tag." } +t2_2 = { status = "pending", commit_sha = "", description = "Move report.md §3.5 (Sectored Language operator terms) to Appendix B (User's preferred output conventions, optional). The table itself stays; the location changes." 
} +t2_3 = { status = "pending", commit_sha = "", description = "Refine the 6 noise-dedup maps in report.md §4: clearly mark which are principled (3) and which are user-preferred (3)" } +t2_4 = { status = "pending", commit_sha = "", description = "Add 5-10 test cases drawn from the cluster sub-reports (the SHAPE of the re-encoding, not verbatim sample content)" } +t2_5 = { status = "pending", commit_sha = "", description = "Address the 31 unresolved items (12 from warmup §A.3 + 19 from §11.3). Include if cluster sub-reports have a clear answer; otherwise defer with explicit TODO at the end of lexicon.md." } + +# Phase 3 (codify) +t3_1 = { status = "pending", commit_sha = "", description = "Write lexicon.md (~1000-2000 LOC). Structure: §1 5 Rules, §2 4 Tiers (~70 terms), §3 6 Noise-Dedup Maps, §4 Test Cases, §5 Form-Anchor, §6 Etymology, §7 Encoding-Explicit, §8 Cross-Refs, Appendix A Provenance, Appendix B User's preferred output conventions" } +t3_2 = { status = "pending", commit_sha = "", description = "Commit lexicon.md with git note summarizing the principled vs user-specific formalization" } +t3_3 = { status = "pending", commit_sha = "", description = "Write terms_catalog.md (machine-readable). Per-term table with: tier, conventional form, principled re-encoding, optional user-specific form, etymology, form anchor, source cluster" } +t3_4 = { status = "pending", commit_sha = "", description = "Commit terms_catalog.md with git note" } +t3_5 = { status = "pending", commit_sha = "", description = "Write dedup_map.md (the 6 maps refined). Each map: source clusters, principled/user-preferred flag, examples (drawn from cluster sub-reports), edge cases" } +t3_6 = { status = "pending", commit_sha = "", description = "Commit dedup_map.md with git note" } + +# Phase 4 (user review) +t4_1 = { status = "pending", commit_sha = "", description = "User reviews the 3 deliverables + the user-specific tagging. 
Approves or iterates (loop back to Phase 2)" } +t4_2 = { status = "pending", commit_sha = "", description = "Update state.toml to status = 'completed'" } + +# Phase 5 (verification) +t5_1 = { status = "pending", commit_sha = "", description = "Idempotency check (re-read the warmup's outputs, confirm the refined lexicon is consistent)" } +t5_2 = { status = "pending", commit_sha = "", description = "Audit checklist: every user-specific entry tagged; esoteric content NOT in public lexicon; 5 rules preserved; 6 noise-dedup maps preserved; 31 unresolved items addressed" } +t5_3 = { status = "pending", commit_sha = "", description = "Write end-of-track report at docs/reports/TRACK_COMPLETION_video_analysis_deob_lexicon_20260621.md" } + +[verification] +lexicon_md_committed = false +terms_catalog_md_committed = false +dedup_map_md_committed = false +appendix_b_moved = false +user_specific_tagged = false +esoteric_content_excluded = false +test_cases_added = false +unresolved_items_addressed = false +user_approved = false +state_toml_completed = false +end_of_track_report_committed = false + +[user_directives_logged] +constructive_type_theory = "Per user 2026-06-21: 'I like Norman Wildberger's work. And I like the constructivist current progress on type theories as a foundational system.'" +bounded_for_knowledge = "Per user 2026-06-21: 'No observer or mechanism or construct can be infinite in resolution or quantification. To have distinction must have a bounds.'" +cycles_iteration_allowed = "Per user 2026-06-21: 'Infinite is okay well handled CORRECTLY... 
What can be indefinite is that can be subjected upon is that of cycles, that of iteration, that of repetition.'" +etymology_aware = "Per user 2026-06-21: etymology + definitional history documented" +lossless_compression_history = "Per user 2026-06-23: explicit compression notes per layer in the de-obfuscation's 3-layer output" +encoding_explicit = "Per user 2026-06-23: every value-bearing term has an encoding: attribute (default float64)" +honest_epistemic_hedging = "Per user 2026-06-21: 'Don't know what `<<` here is' / 'Me fucking around' style annotations are valid" +secular_sanitization = "Per user 2026-06-23: esoteric content (Witness/Vessel/Aether, classical philosophy) NOT in public lexicon; stays in cluster_0_twitter.md" +principled_vs_user_specific = "Per user 2026-06-23 surgical edits: Phase 1 formalizes this distinction (tag user-specific entries, move §3.5 to Appendix B)" +no_day_estimates = "Per conductor/workflow.md Tier 1 Track Initialization Rules (added 2026-06-16). Scope measured in files/sites only."