conductor(deob_umbrella): Initialize Pass 2 de-obfuscation campaign umbrella

Pass 2 of a 3-pass research campaign. 5 folders total (1 umbrella + 1 warmup + 3 phase children).
- Umbrella spec.md (~400 lines): full design, philosophy, 3-layer deliverable, verification
- Multi-pass framing: Pass 1 = extraction (done), Pass 2 = de-obfuscation (this), Pass 3 = projection (future user-led)
- De-obfuscation philosophy: constructive type theory + Wildberger finitism + boundedness for knowledge + cycles/iteration explicit + etymology-aware
- 4 verification criteria: lossless, bounded, constructively typed, etymology-cited
- Multi-layer deliverable per video: translation (side-by-side) + replacement (re-encoded) + decoder (per-term etymology)
- Phase 0: USER action item (gather 3-10 samples of past de-obfuscation notes)
2026-06-23 00:06:51 -04:00
parent 2b9f7376e0
commit 59ba8ff2ba
5 changed files with 741 additions and 0 deletions
@@ -0,0 +1,70 @@
# Video Analysis De-obfuscation Campaign (2026-06-21) — Pass 2 of 3
**Status:** Active (spec approved 2026-06-21)
**Owner:** Tier 1 Orchestrator (umbrella spec); Tier 2 Tech Lead (per-track execution)
**Type:** Multi-track research campaign (5 folders total)
**User-as-source:** The warmup is blocked on the user providing samples (Phase 0).
This is **Pass 2 of 3** in the research campaign to penetrate the AI field. See [spec.md](./spec.md) for the full design and the user's de-obfuscation philosophy (constructive type theory, bounded forms, cycles/iteration explicit, etymology-aware).
## Children
### Precursor (standalone, not a child of this umbrella)
| | Track | Status | Notes |
|---|-------|--------|-------|
| Warmup | [../video_analysis_deob_warmup_20260621/](../video_analysis_deob_warmup_20260621/) | [ ] | Produces `report.md` + `prompt_template.md` from the user's past samples. User must provide samples first. **Blocks the 3 children below.** |
### Phase children (3)
| # | Track | Status | Input | Output |
|---|-------|--------|-------|--------|
| 1 | [../video_analysis_deob_lexicon_20260621/](../video_analysis_deob_lexicon_20260621/) | [ ] | Warmup's `report.md` + `prompt_template.md` | `lexicon.md` + `terms_catalog.md` + `dedup_map.md` |
| 2 | [../video_analysis_deob_pilot_20260621/](../video_analysis_deob_pilot_20260621/) | [ ] | Lexicon + 2 Pass 1 reports (`cs229_building_llms`, `entropy_epiplexity`) | 2× 3-layer deliverables + `pilot_report.md` |
| 3 | [../video_analysis_deob_apply_20260621/](../video_analysis_deob_apply_20260621/) | [ ] | Refined lexicon + 10 Pass 1 reports + 1 synthesis | 11× 3-layer deliverables + `apply_report.md` |
## Dependency graph
```
WARMUP (precursor) ──► PHASE 1 (lexicon) ──► PHASE 2 (pilot) ──► PHASE 3 (apply)
        │                    │                    │                    │
        └────────────────────┴────────────────────┴────────────────────┘
                                 blocked_by chain
```
## Status legend
- `[ ]` not started
- `[~]` in progress
- `[x]` shipped
- `[!]` blocked
## Cluster / phase legend
- **Warmup** — research-style track; produces the initial lexicon + LLM prompt template
- **Phase 1 (lexicon)** — refines the warmup's draft into a codified operational spec
- **Phase 2 (pilot)** — applies the lexicon to 2 videos (1 foundational + 1 math-heavy) for validation
- **Phase 3 (apply)** — applies the refined lexicon to the remaining 10 + synthesis
## Multi-layer deliverable (per video)
For each de-obfuscated video, the output is 3 files in `artifacts/<slug>/`:
1. **`translation.md`** — side-by-side table: original expression ↔ re-encoded form
2. **`deobfuscated.md`** — the re-encoded report (replacement; same 8-section structure as Pass 1)
3. **`decoder.md`** — per-term decoder: form anchor, etymology, definition history, link to original section
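The 3-file layout above lends itself to a simple completeness check before a deliverable is handed to review. The following is a minimal sketch, assuming the file names listed above; the function name `missing_layers` and the `artifacts_dir` parameter are hypothetical and not part of the campaign's tooling.

```python
from pathlib import Path

# The three layer files named in the list above.
LAYER_FILES = ("translation.md", "deobfuscated.md", "decoder.md")


def missing_layers(artifacts_dir: Path, slug: str) -> list[str]:
    """Return the layer files that are absent from artifacts/<slug>/."""
    slug_dir = artifacts_dir / slug
    # A layer counts as present only if it exists as a regular file.
    return [name for name in LAYER_FILES if not (slug_dir / name).is_file()]
```

An empty list means all 3 layers are present for that slug. The check only covers file presence, not the 4 verification criteria.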
## Verification (4 criteria per deliverable)
- **Lossless** — every Pass 1 concept is represented
- **Bounded** — no `∞_val` or `∞_card`; all values are finite forms
- **Constructively typed** — every expression has a type
- **Etymology-cited** — every new term has a 1-line origin + 1-line definition history in the decoder
## See also
- [spec.md](./spec.md) — full design (15 sections, ~400 lines)
- [plan.md](./plan.md) — campaign-level plan
- [metadata.json](./metadata.json) — scope, risk register, verification criteria
- [state.toml](./state.toml) — current phase + task tracking
- `../video_analysis_campaign_20260621/spec.md` §0, §11 — multi-pass framing (Pass 1 → Pass 2 handoff)
- `../intent_dsl_survey_20260612/report_v1.2.md` — sibling DSL (tool-verb DSL for AI agents; shares philosophy but is for tool verbs, not math re-encoding)
@@ -0,0 +1,188 @@
{
"track_id": "video_analysis_deob_20260621",
"name": "Video Analysis De-obfuscation Campaign (Pass 2 of 3)",
"created": "2026-06-21",
"status": "spec_approved",
"blocked_by": [],
"blocks": [
"video_analysis_deob_warmup_20260621"
],
"priority": "A",
"rationale": "User-blocking Pass 2 of the 3-pass research campaign. De-obfuscates the 12 Pass 1 deep-dive reports via the user's constructive type-theoretic re-encoding DSL. Lossless preservation directive (carries from Pass 1). 5 folders: 1 warmup (precursor) + 1 umbrella + 3 phase children (lexicon/pilot/apply). Multi-layer deliverable per video: translation (side-by-side) + replacement (re-encoded) + decoder (per-term etymology).",
"type": "multi-track research campaign (1 umbrella + 1 warmup + 3 phase children = 5 folders)",
"domain": "meta-tooling (research deliverable + LLM operational spec; no manual_slop src/ changes)",
"scope": {
"new_folders": [
"conductor/tracks/video_analysis_deob_20260621/",
"conductor/tracks/video_analysis_deob_warmup_20260621/",
"conductor/tracks/video_analysis_deob_lexicon_20260621/",
"conductor/tracks/video_analysis_deob_pilot_20260621/",
"conductor/tracks/video_analysis_deob_apply_20260621/"
],
"new_files_umbrella": [
"spec.md",
"plan.md",
"metadata.json",
"state.toml",
"README.md"
],
"new_files_warmup": [
"spec.md",
"plan.md",
"metadata.json",
"state.toml",
"samples/<user-provided-files> (gitignored)",
"report.md (the design philosophy + lexicon + dedup maps)",
"prompt_template.md (the LLM-direct operational spec)"
],
"new_files_per_child": [
"spec.md (lightweight)"
],
"new_files_pilot": [
"artifacts/cs229_building_llms/translation.md",
"artifacts/cs229_building_llms/deobfuscated.md",
"artifacts/cs229_building_llms/decoder.md",
"artifacts/entropy_epiplexity/translation.md",
"artifacts/entropy_epiplexity/deobfuscated.md",
"artifacts/entropy_epiplexity/decoder.md",
"pilot_report.md"
],
"new_files_apply": [
"artifacts/<10 remaining slugs>/translation.md (×10)",
"artifacts/<10 remaining slugs>/deobfuscated.md (×10)",
"artifacts/<10 remaining slugs>/decoder.md (×10)",
"artifacts/synthesis/translation.md",
"artifacts/synthesis/deobfuscated.md",
"artifacts/synthesis/decoder.md",
"apply_report.md"
],
"modified_files": [
"conductor/tracks/video_analysis_campaign_20260621/spec.md (§11.1 updated to reference this campaign)",
"conductor/tracks.md (add row for this campaign)",
"conductor/chronology.md (5 new rows after campaign ships)"
],
"deleted_files": [],
"gitignored_patterns": [
"conductor/tracks/video_analysis_deob_warmup_20260621/samples/** (user's past notes are local-only)"
]
},
"estimated_effort": {
"method": "scope (per conductor/workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
"phase_0": "1 USER action item (gather samples)",
"phase_1": "5 tasks: warmup initialization, sample survey, report.md (~1000-3000 LOC), prompt_template.md (~200-500 LOC), user approval",
"phase_2": "4 tasks: lexicon child init, refine warmup's draft, produce lexicon.md + terms_catalog.md + dedup_map.md, user approval",
"phase_3": "5 tasks: pilot child init, apply to 2 videos, write pilot_report.md, user approval",
"phase_4": "4 tasks: apply child init, apply to 10+1 outputs, write apply_report.md",
"phase_5": "6 tasks: closeout (README update, state.toml, end-of-track report, archive move, chronology, tracks.md)",
"summary": "5 track folders, 1 user action item, 13 deliverables (2 pilot + 11 apply, each 3-layer), 4 reports (warmup report, warmup template, pilot, apply), 6 closeout tasks. No day estimates per project convention."
},
"verification_criteria": [
"Warmup shipped with report.md + prompt_template.md (and the user has approved the lexicon)",
"Phase 1 (lexicon) shipped with lexicon.md + terms_catalog.md + dedup_map.md",
"Phase 2 (pilot) shipped with 2 deobfuscated deliverables (each 3-layer) + pilot_report.md",
"Phase 3 (apply) shipped with 11 deobfuscated deliverables (each 3-layer) + apply_report.md",
"Each deliverable passes the 4 verification criteria: lossless, bounded, constructively typed, etymology-cited",
"Umbrella state.toml updated to status = 'completed'",
"End-of-track report at docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md",
"All 5 folders move to conductor/archive/ per the project's archiving convention",
"conductor/chronology.md updated with 5 new rows",
"No new src/*.py files created (per AGENTS.md File Size and Naming Convention)",
"No new pyproject.toml dependencies"
],
"risk_register": [
{
"id": "R1",
"title": "User cannot provide samples in time",
"likelihood": "medium",
"scope_impact": "Warmup blocked",
"mitigation": "User can provide partial samples (1-2 examples); warmup can use them as a starter lexicon"
},
{
"id": "R2",
"title": "User's samples don't have enough de-obfuscation patterns",
"likelihood": "medium",
"scope_impact": "Warmup produces a thin lexicon",
"mitigation": "Phase 1 (lexicon) extends the warmup's draft with constructive type theory defaults"
},
{
"id": "R3",
"title": "Lexicon can't capture a concept in bounded form",
"likelihood": "medium",
"scope_impact": "Some concepts remain 'indefinite — see original'",
"mitigation": "Document the gap; don't force a translation. The 4 verification criteria allow 'etymology-cited' but not 'forced'"
},
{
"id": "R4",
"title": "Pilot reveals the lexicon is overfit to the user's style",
"likelihood": "low",
"scope_impact": "Refinement needed in Phase 2",
"mitigation": "pilot_report.md captures gaps; Phase 3 uses the refined lexicon"
},
{
"id": "R5",
"title": "The 3-layer deliverable format is too verbose for some videos",
"likelihood": "low",
"scope_impact": "Adjust per video",
"mitigation": "Format is a template, not a rigid structure; some sections may be smaller"
},
{
"id": "R6",
"title": "Tier 2 attempts to invent the lexicon without the user's samples",
"likelihood": "low (if user samples present)",
"scope_impact": "Lexicon is invented, not evidence-based",
"mitigation": "Warmup spec is explicit: 'consume user samples FIRST; lexicon is evidence-based'"
},
{
"id": "R7",
"title": "Pass 3 needs the de-obfuscated outputs but Pass 2 isn't done",
"likelihood": "high (timeline)",
"scope_impact": "Pass 3 blocked",
"mitigation": "This campaign's 'lossless preservation' ensures Pass 3 has all the input it needs once Pass 2 ships"
},
{
"id": "R8",
"title": "The user changes their mind about the philosophy mid-campaign",
"likelihood": "low",
"scope_impact": "Pilot reveals the shift; lexicon is updated",
"mitigation": "pilot_report.md is the checkpoint for user review"
}
],
"architecture_reference": {
"primary_documents": [
"conductor/workflow.md (track convention, per-task commits, git notes, verification protocol)",
"conductor/code_styleguides/python.md (1-space indent, type hints, no comments - IF code is written)",
"conductor/code_styleguides/error_handling.md (Result[T] pattern - IF code is written)",
"AGENTS.md (artifact isolation, file naming, no new src/<thing>.py)"
],
"related_tracks": [
"conductor/tracks/intent_dsl_survey_20260612/ (sibling DSL: tool-verb DSL for AI agents, shares philosophy)",
"conductor/tracks/video_analysis_campaign_20260621/ (Pass 1 - the input to de-obfuscate)",
"conductor/tracks/nagent_review_20260608/ (research-track precedent)",
"conductor/tracks/fable_review_20260617/ (research-track precedent)"
],
"styleguides_applied": [
"agent_memory_dimensions.md (Pass 2 produces a 'knowledge' memory artifact)",
"knowledge_artifacts.md (knowledge harvest pattern; relevant to the de-obfuscation's durable nature)"
]
},
"deferred_to_followup_tracks": [
{
"title": "Pass 3: Projection to user's applied domain",
"description": "Apply Pass 2's de-obfuscated outputs to the user's preferred code style. Influences: handmade/data-oriented/GPGPU (Timothy Lottes, Onat Türkçüoğlu, Jebrim) + user's own caveats.",
"track_status": "not started - blocked by this campaign",
"blocker_action_item": "User must articulate 'own caveats' before Pass 3 starts (per Pass 1 spec §11.2)"
}
],
"regressions_and_pre_existing_failures": [],
"pre_existing_failures_remaining": [],
"user_directives": [
"Unorthodox knowledge curation philosophy (2026-06-21)",
"Constructive type theory + Wildberger-style finitism as foundation (2026-06-21)",
"Boundedness required for direct knowledge; cycles/iteration allowed but expressed explicitly (2026-06-21)",
"Multi-layer deliverable per video (translation + replacement + decoder) (2026-06-21)",
"Warmup is the precursor; lexicon is evidence-based from user's past samples (2026-06-21)",
"Report + prompt template as the warmup output (2026-06-21)",
"5 folders at conductor/tracks/ level, hybrid umbrella structure (2026-06-21)",
"No day estimates per conductor/workflow.md Tier 1 Track Initialization Rules"
]
}
@@ -0,0 +1,59 @@
# Plan: Video Analysis De-obfuscation Campaign (umbrella)
This is the umbrella-level plan for Pass 2 of the 3-pass research campaign. Per the Tier 1 Track Initialization Rules, scope is measured in files/sites — no day estimates.
## Phase 0: User samples provided
This phase is a USER action item, not a Tier 2/3 action.
- [ ] **Task 0.1:** User gathers 3-10 samples of their past de-obfuscation notes (any format: markdown, txt, mixed) and places them in `conductor/tracks/video_analysis_deob_warmup_20260621/samples/`.
## Phase 1: Warmup (precursor)
- [ ] **Task 1.1:** Initialize the warmup track: create `plan.md`, `metadata.json`, `state.toml` (Tier 2).
- [ ] **Task 1.2:** Survey the samples (Tier 3 worker delegated): term frequency, structural patterns, "form projection" heuristics, noise-dedup maps.
- [ ] **Task 1.3:** Write `report.md` (Tier 3 worker, ~1000-3000 LOC). Sections: design philosophy (anchored to user directives), the curated lexicon (terms + re-encodings), the 3 noise-dedup maps, sample transformations.
- [ ] **Task 1.4:** Write `prompt_template.md` (Tier 3 worker, ~200-500 LOC). The operational artifact; an LLM can be prompted with this directly to perform the de-obfuscation.
- [ ] **Task 1.5:** User review + approval of the lexicon + template.
## Phase 2: Phase 1 child — Lexicon refinement
- [ ] **Task 2.1:** Initialize the lexicon child track (Tier 2).
- [ ] **Task 2.2:** Refine the warmup's draft: add 5-10 test cases (example transformations drawn from the user's samples); add the "form anchor" requirement; add cross-references to the warmup's report.
- [ ] **Task 2.3:** Produce `lexicon.md` (the codified operational spec), `terms_catalog.md` (the machine-readable lexicon), `dedup_map.md` (the 3 noise-dedup maps).
- [ ] **Task 2.4:** User review + approval before Phase 3 child starts.
## Phase 3: Phase 2 child — Pilot on 2 videos
- [ ] **Task 3.1:** Initialize the pilot child track (Tier 2).
- [ ] **Task 3.2:** Apply the lexicon to `cs229_building_llms` (Pass 1 report). Produce 3-layer deliverable in `artifacts/cs229_building_llms/`.
- [ ] **Task 3.3:** Apply the lexicon to `entropy_epiplexity` (Pass 1 report). Produce 3-layer deliverable in `artifacts/entropy_epiplexity/`.
- [ ] **Task 3.4:** Write `pilot_report.md` capturing: lexicon refinements discovered, concepts that didn't fit (gaps), process improvements.
- [ ] **Task 3.5:** User review + approval before Phase 4 child starts.
## Phase 4: Phase 3 child — Apply to remaining 10 + synthesis
- [ ] **Task 4.1:** Initialize the apply child track (Tier 2).
- [ ] **Task 4.2:** Apply the refined lexicon to each of the 10 remaining Pass 1 reports. Produce 3-layer deliverables in `artifacts/<slug>/`.
- [ ] **Task 4.3:** Apply the refined lexicon to the cross-cutting synthesis. Produce 3-layer deliverable in `artifacts/synthesis/`.
- [ ] **Task 4.4:** Write `apply_report.md` capturing: final lexicon v2, final process refinements, open questions for Pass 3.
## Phase 5: Campaign closeout
- [ ] **Task 5.1:** Update umbrella `README.md` with final statuses (warmup + all 3 phase children shipped).
- [ ] **Task 5.2:** Update umbrella `state.toml` to `status = "completed"`.
- [ ] **Task 5.3:** Write end-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md`.
- [ ] **Task 5.4:** Move all 5 folders to `conductor/archive/` per the project's archiving convention.
- [ ] **Task 5.5:** Update `conductor/chronology.md` with 5 new rows.
- [ ] **Task 5.6:** Update `conductor/tracks.md` to remove the campaign from Active Tracks.
## Verification (gate per workflow.md)
Each phase's completion requires:
- [ ] Idempotency check: re-running the de-obfuscation produces identical output (modulo timestamps)
- [ ] 4 verification criteria per umbrella spec §8 (lossless, bounded, constructively typed, etymology-cited)
- [ ] User review + approval at each phase boundary
- [ ] Per-task commits with git notes
- [ ] All artifacts committed to git
The campaign is "Pass 2 complete" when all 5 folders shipped, the 13 deliverables (2 pilot + 11 apply) are in `artifacts/`, and the user has approved each.
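The idempotency check in the gate list ("identical output modulo timestamps") could be approximated by comparing fingerprints of two runs after timestamps are masked. This is a sketch only: the timestamp pattern is an assumption (ISO-like `YYYY-MM-DD HH:MM:SS` with an optional offset), and `fingerprint` and `is_idempotent` are hypothetical names.

```python
import hashlib
import re

# Assumed timestamp shape: "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS",
# optionally followed by a UTC offset or "Z". Bare dates are left untouched,
# so track names such as "..._20260621" and dates in prose still count.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?: ?[+-]\d{2}:?\d{2}|Z)?"
)


def fingerprint(text: str) -> str:
    """Hash the text with every timestamp replaced by a fixed placeholder."""
    normalized = TIMESTAMP.sub("<timestamp>", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_idempotent(first_run: str, second_run: str) -> bool:
    """True when two runs differ in nothing but their timestamps."""
    return fingerprint(first_run) == fingerprint(second_run)
```

Since the de-obfuscation is performed by an LLM, exact equality may be too strict in practice; whether the gate should compare bytes or compare meaning is a judgment this sketch does not settle.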
@@ -0,0 +1,337 @@
# Track Specification: Video Analysis De-obfuscation Campaign (2026-06-21)
**Status:** Active (spec approved 2026-06-21)
**Initialized:** 2026-06-21
**Owner:** Tier 1 Orchestrator (umbrella spec + synthesis); Tier 2 Tech Lead (per-track execution)
**Priority:** A (user-blocking; Pass 2 of the 3-pass research campaign)
**Type:** Multi-track research campaign (1 warmup + 1 umbrella + 3 phase children = 5 folders total)
**Domain:** Meta-tooling (research deliverable + LLM operational spec; no `src/` changes)
> **Purpose.** This umbrella organizes Pass 2 of the user's 3-pass research campaign: **de-obfuscation** of the Pass 1 video reports via the user's constructive type-theoretic re-encoding DSL. The de-obfuscation reduces standard math notation + verbose DSL/verbiage into a bounded, constructive, type-theoretic form that bridges the conceptual gap and crystallizes the formal language into the reader's mind.
> **Multi-pass context.** Pass 1 produced 12 deep-dive reports (1000-10000 LOC each) + 1 cross-cutting synthesis. Pass 2 takes those and produces a multi-layer de-obfuscated version per video. Pass 3 (future, user-led) projects the de-obfuscated content to the user's applied domain (handmade/data-oriented/GPGPU + own caveats).
> **Companion docs.** The warmup track (`video_analysis_deob_warmup_20260621/`) is the precursor that produces the initial lexicon + LLM prompt template. The 3 phase children (`video_analysis_deob_{lexicon,pilot,apply}_20260621/`) consume the warmup's output and apply it to the Pass 1 reports.
---
## 1. Overview
### 1.1 The user's de-obfuscation philosophy (foundational)
The user curates knowledge in an unorthodox way, especially in formal math and the sciences. Their position:
| Position | Take |
|---|---|
| **Form requires bounds** | "To be known is to project a form." Boundedness is required for direct knowledge. |
| **Indefinite is not directly knowable** | What is unbounded is indefinite; what is indefinite is indiscernible, unobserved, unsubject, unknowable. |
| **Cycles/iteration/repetition are allowed** | Indefinite *operations* on bounded *forms* are expressible. `Stream A = nat -> A` is fine; `∞_val` is not. |
| **The agent is bounded by necessity** | An agent is "envesseled in the soup of the universe," separated from the indefinite to discern. The agent cannot be indefinite. |
| **Standard math notation is "noise"** | Too compressed, error-prone, ASCII-hostile, not programmatic, not verifiable, not visualizable. Lots of synonyms that mean the same thing (Curry-Howard: proofs=programs, types=propositions, etc.). |
| **Constructive type theory is the foundation** | Proofs = programs (Curry-Howard); every value is a bounded form; operations are transformations. |
| **Lexicon is etymology-aware** | Each term's word origin + definitional history is documented. Words are chosen to match modern subjective experience. |
| **Inspiration** | Modern PL design — concatenative (Forth/KYRA/CoSy), data-oriented imperative (Lottes), immediate-mode DAG-building DSLs (O'Donnell's IMGUI). |
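The distinction between an indefinite *operation* and a bounded *form* (`Stream A = nat -> A` is fine; `∞_val` is not) can be illustrated with a small sketch. It assumes Python's `int` restricted to non-negative values as a stand-in for `nat`; the names `squares` and `take` are illustrative and do not come from the user's lexicon.

```python
from typing import Callable, TypeVar

A = TypeVar("A")

# Stream A = nat -> A: a stream is a rule from a natural-number index to a
# value. Nothing infinite is ever stored; only the rule exists.
Stream = Callable[[int], A]


def squares(n: int) -> int:
    """A process defined for every index n >= 0 (the allowed `∞_proc` reading)."""
    return n * n


def take(stream: Callable[[int], A], bound: int) -> list[A]:
    """Observe a bounded form: the first `bound` values of the stream."""
    if bound < 0:
        raise ValueError("bound must be a natural number")
    return [stream(i) for i in range(bound)]
```

For example, `take(squares, 5)` evaluates to `[0, 1, 4, 9, 16]`: the iteration is explicit, and every value that is actually observed is finite.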
### 1.2 What Pass 2 produces
For each of the 12 Pass 1 reports + 1 cross-cutting synthesis, Pass 2 produces a **3-layer de-obfuscated deliverable**:
1. **Translation** (`artifacts/<slug>/translation.md`) — side-by-side table: original expression ↔ re-encoded form
2. **Replacement** (`artifacts/<slug>/deobfuscated.md`) — the re-encoded form replaces the original; the report is read as a bounded, constructive, type-theoretic document
3. **Decoder index** (`artifacts/<slug>/decoder.md`) — per-term decoder: form anchor, etymology, definition history, link to the original section
Plus a per-track **pilot_report.md** or **apply_report.md** capturing lexicon refinements.
### 1.3 The 2-stage Pass 2 flow
```
Stage 1 (Warmup - precursor):                   Stage 2 (Apply - 3 phases):
                                                ┌─ Phase 1: Lexicon (refine warmup's draft)
User's past notes ──► Warmup report.md +        │
                      prompt_template.md ───────┤─ Phase 2: Pilot (apply to 2 videos, refine)
                                                └─ Phase 3: Apply (apply to 10 + synthesis)
```
---
## 2. Current State Audit (as of 2026-06-21)
### 2.1 Already Available (DO NOT re-derive)
| Asset | Location | Use in Pass 2 |
|---|---|---|
| Pass 1 reports (12 + 1 synthesis) | `conductor/tracks/video_analysis_<slug>_20260621/report.md` + `summary.md` | The input to de-obfuscate |
| Pass 1 transcripts + OCR | `conductor/tracks/video_analysis_<slug>_20260621/artifacts/` | Source material for re-encoding context |
| `intent_dsl_survey_20260612` report | `conductor/tracks/intent_dsl_survey_20260612/report_v1.2.md` | Sibling DSL (a tool-verb DSL for AI agents); not the math re-encoding, but shares the philosophy |
| 4-tier vocab + 14-primitive grammar | `intent_dsl_survey_20260612/report_v1.2.md` §3, §4 | Reference for the PL-design vocabulary to use in the de-obfuscation DSL |
| `conductor/code_styleguides/error_handling.md` | Conductor docs | `Result[T]` convention for any new Python tooling |
| `conductor/code_styleguides/python.md` | Conductor docs | 1-space indent, type hints, no comments |
| Reference scripts (bootslop) | `C:\projects\forth\bootslop\*.py` | yt-dlp / cv2 / winsdk OCR — NOT needed for Pass 2 (no video processing) |
### 2.2 Gaps to Fill (this campaign's scope)
| # | Gap | Resolution |
|---|---|---|
| G1 | The user has no codified de-obfuscation DSL | Warmup produces `report.md` + `prompt_template.md` from the user's past samples |
| G2 | The de-obfuscation lexicon is not yet finalized | Phase 1 (lexicon) refines the warmup's draft into a codified spec |
| G3 | No pilot validation of the lexicon | Phase 2 (pilot) applies to 2 videos (1 foundational + 1 math-heavy) and captures refinements |
| G4 | No application to the remaining 10 + synthesis | Phase 3 (apply) applies the refined lexicon to the remaining Pass 1 outputs |
| G5 | No multi-layer deliverable structure | The 3-layer format (translation / replacement / decoder) is the new convention |
---
## 3. Goals
1. **Lexicon derived from the user's exemplars.** The de-obfuscation DSL is not invented from scratch; it is extracted from the user's past de-obfuscation notes via the warmup track. Evidence-based, not imposed.
2. **LLM-direct operational spec.** The de-obfuscation is performed by an LLM following the prompt template. The template is the "code" — the contract between the warmup and the apply phases.
3. **Lossless preservation (carries Pass 1's directive).** No Pass 1 concept is lost. The 3-layer output ensures every standard-math expression is represented (translation), replaced (replacement), and explained (decoder).
4. **Bounded, constructive, type-theoretic.** Every value is a bounded form. Iteration is explicit. "Infinity" is disambiguated lexically: `∞_val` (banned), `∞_proc` (allowed), `∞_card` (banned).
5. **Etymology + definitional history.** Each new term has a 1-line origin note + a 1-line definition history in the decoder.
6. **Multi-pass handoff.** Pass 3 (projection to applied domain) can consume the de-obfuscated outputs as its input. The handoff is clean: Pass 2 produces bounded, constructive forms; Pass 3 can apply them to the user's stylistic preferences.
---
## 4. Functional Requirements
### FR1. Umbrella folder + README
**WHERE:** `conductor/tracks/video_analysis_deob_20260621/`
**WHAT:** This folder contains the umbrella design (this spec) + 4 sibling files (`plan.md`, `metadata.json`, `state.toml`, `README.md`). The README is the index of the 4 sibling tracks (warmup + 3 phases) with their statuses.
### FR2. Warmup track (precursor)
**WHERE:** `conductor/tracks/video_analysis_deob_warmup_20260621/`
**WHAT:** Standalone research-style track. Produces:
- `report.md` — the design philosophy + the curated lexicon (terms + re-encodings) + the 3 noise-dedup maps (Curry-Howard-style collapses)
- `prompt_template.md` — the operational spec; an LLM can be prompted with this directly
**Inputs:** The user provides samples in `samples/` (their past de-obfuscation notes). Format: markdown, txt, or any text the user has.
**Process:** A Tier 3 worker (delegated by Tier 2) surveys the samples for term frequency, structural patterns, "form projection" heuristics, and noise-dedup maps, then produces a report + prompt template following the convention of `intent_dsl_survey_20260612/report_v1.2.md`.
**`blocked_by`:** none at the track level. The only prerequisite is the user's Phase 0 action item: the user must provide samples before the warmup can start.
**`blocks`:** the 3 phase children (lexicon, pilot, apply) all depend on the warmup's output.
### FR3. Phase 1 — Lexicon refinement
**WHERE:** `conductor/tracks/video_analysis_deob_lexicon_20260621/`
**WHAT:** Consumes the warmup's `report.md` + `prompt_template.md`. Produces a codified `lexicon.md` (the operational spec for the de-obfuscation LLM) + `terms_catalog.md` (the machine-readable lexicon) + `dedup_map.md` (the 3 noise-dedup maps).
The lexicon refinement adds:
- Test cases (5-10 example transformations drawn from the user's samples)
- A "form anchor" requirement: each re-encoding must project from an indefinite to a bounded form
- Cross-references to the warmup's report sections
### FR4. Phase 2 — Pilot on 2 videos
**WHERE:** `conductor/tracks/video_analysis_deob_pilot_20260621/`
**WHAT:** Consumes the codified `lexicon.md` + 2 Pass 1 reports. The 2 pilot videos are:
1. `cs229_building_llms` (foundational ML/LLM coverage — wide scope, good test for "form projection" across many concepts)
2. `entropy_epiplexity` (math-heavy, focused on information-theoretic concepts — good test for "boundedness" + type-theoretic encoding of measure theory)
For each pilot video, produces the 3-layer deliverable in `artifacts/<slug>/`:
- `translation.md` (side-by-side: original ↔ re-encoded)
- `deobfuscated.md` (replacement: re-encoded form replaces the original)
- `decoder.md` (per-term decoder: form anchor, etymology, definition history)
Plus a `pilot_report.md` capturing:
- Lexicon refinements discovered during the pilot
- Concepts that didn't fit the lexicon (gaps)
- Process improvements for Phase 3
### FR5. Phase 3 — Apply to remaining 10 + synthesis
**WHERE:** `conductor/tracks/video_analysis_deob_apply_20260621/`
**WHAT:** Consumes the refined lexicon (from Phase 2) + 10 remaining Pass 1 reports + 1 cross-cutting synthesis. Produces the 3-layer deliverable for each, in `artifacts/<slug>/`.
Plus an `apply_report.md` capturing:
- Final lexicon v2
- Final process refinements
- Open questions for Pass 3
### FR6. Multi-layer deliverable structure (per video)
For each Pass 1 report, the de-obfuscation produces 3 files in `artifacts/<slug>/`:
**`translation.md`** — side-by-side translation table:
```markdown
# Translation: <Video Title>
| # | Original Section | Original Expression | Re-encoded Form | Form Anchor |
|---|------------------|---------------------|-----------------|-------------|
| 1 | §2 Key Concepts | `set S = {x \| P(x)}` | `kind S = {x : T \| proof : P x}` | bounded set comprehension |
| 2 | §4 Transcript | `∀x ∈ ℝ : x² ≥ 0` | `forall x : Real, square x >= 0` | bounded quantification over Reals |
| ... |
```
**`deobfuscated.md`** — the re-encoded report (replacement). Same 8-section structure as Pass 1's report, but every standard-math expression is replaced with the constructive type-theoretic form.
**`decoder.md`** — per-term decoder:
```markdown
# Decoder: <Video Title>
## Term: Set
- Original notation: `S = {x | P(x)}`
- Re-encoded: `kind S = {x : T | proof : P x}`
- Form anchor: bounded set comprehension (a `kind` is a finite enumerated type)
- Etymology: "set" (Old English "settan" = to set, place); the word evokes "placement"
- Definition history: Cantor (1895) proposed unbounded set theory; the user rejects this in favor of bounded kinds
- Source sections in original: §2.1, §4.3, §5.7
## Term: Forall
- ...
```
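If tooling is ever written for the decoder layer, one entry of the template above could be held as a small record and rendered to markdown. This is a hypothetical sketch: the `DecoderEntry` class and `render` function are illustrative names, and the field set simply mirrors the bullets in the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderEntry:
    """One term of the decoder, with one field per bullet in the template."""
    term: str
    original: str
    reencoded: str
    form_anchor: str
    etymology: str
    definition_history: str
    source_sections: tuple[str, ...]


def render(entry: DecoderEntry) -> str:
    """Render a single entry in the decoder template's markdown layout."""
    lines = [
        f"## Term: {entry.term}",
        f"- Original notation: `{entry.original}`",
        f"- Re-encoded: `{entry.reencoded}`",
        f"- Form anchor: {entry.form_anchor}",
        f"- Etymology: {entry.etymology}",
        f"- Definition history: {entry.definition_history}",
        "- Source sections in original: " + ", ".join(entry.source_sections),
    ]
    return "\n".join(lines)
```

Keeping the fields explicit makes a missing etymology or definition history visible as a missing argument rather than as a silently absent bullet.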
### FR7. The de-obfuscation DSL (what the lexicon defines)
The lexicon produced by the warmup + refined by Phase 1 is the **de-obfuscation DSL**. It has:
| Component | Definition |
|---|---|
| **Terms** | The vocabulary: `kind`, `forall`, `exists`, `proof`, `program`, `type`, `bounded`, `stream`, `iterate`, `cycle`, `form`, `anchor`, etc. (the warmup discovers these from the user's samples) |
| **Grammar** | How terms combine. Inherits from constructive type theory: `term := term term | ( term ) | name : type | lambda name . term | ...` |
| **Noise-dedup map** | The 3 collapse maps: proofs=programs (Curry-Howard), types=propositions, sets=kinds, etc. |
| **Boundedness rules** | `∞_val` (banned), `∞_proc` (allowed as `Stream A = nat -> A`), `∞_card` (banned). Every value must be a bounded form. |
| **Form-anchor rule** | Every re-encoding must have a form anchor: "what bounded form does this project from the indefinite?" |
| **Etymology rule** | Every new term has a 1-line origin + 1-line definition history in the decoder. |
| **Verification rule** | The 4 verification criteria per §12: lossless, bounded, constructively typed, etymology-cited. |
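The boundedness rule in the table names two banned markers and one allowed marker, so a literal scan can flag the banned ones before review. The sketch below is a pre-review aid only and assumes the markers appear verbatim in the text; it does not judge whether a re-encoding is actually bounded, which stays with Tier 3 and the user. The name `banned_markers` is hypothetical.

```python
# Markers banned by the boundedness rules; `∞_proc` is allowed and is
# therefore deliberately absent from this tuple.
BANNED = ("∞_val", "∞_card")


def banned_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, marker) for each banned marker found in the text."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in BANNED:
            if marker in line:
                hits.append((lineno, marker))
    return hits
```

An empty result means only that no banned marker is spelled out literally; an unbounded value written in other notation would pass this scan unnoticed.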
### FR8. Dependency graph
```
UMBRELLA (video_analysis_deob_20260621)
├── Warmup (video_analysis_deob_warmup_20260621)
│       │
│       ▼ (warmup produces report.md + prompt_template.md)
├── Phase 1 (video_analysis_deob_lexicon_20260621) — consumes warmup
│       │
│       ▼ (Phase 1 produces lexicon.md + terms_catalog.md + dedup_map.md)
├── Phase 2 (video_analysis_deob_pilot_20260621) — consumes Phase 1
│       │
│       ▼ (Phase 2 produces 2 deobfuscated deliverables + pilot_report.md)
└── Phase 3 (video_analysis_deob_apply_20260621) — consumes Phase 2
        │
        ▼ (Phase 3 produces 11 deobfuscated deliverables + apply_report.md)
```
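The consumption chain in the graph above is a linear dependency order, which can be written down as data and checked with Python's standard `graphlib` module. This is a sketch under the assumption that each track is blocked only by the track directly before it; the `BLOCKED_BY` mapping and `execution_order` function are illustrative, not the contents of any `metadata.json`.

```python
from graphlib import TopologicalSorter

# Track -> set of tracks it is blocked by, read off the graph above.
BLOCKED_BY: dict[str, set[str]] = {
    "video_analysis_deob_warmup_20260621": set(),
    "video_analysis_deob_lexicon_20260621": {"video_analysis_deob_warmup_20260621"},
    "video_analysis_deob_pilot_20260621": {"video_analysis_deob_lexicon_20260621"},
    "video_analysis_deob_apply_20260621": {"video_analysis_deob_pilot_20260621"},
}


def execution_order(blocked_by: dict[str, set[str]]) -> list[str]:
    """Return tracks so that every blocker precedes the track it blocks.

    TopologicalSorter raises graphlib.CycleError if the blocked_by
    declarations ever form a cycle.
    """
    return list(TopologicalSorter(blocked_by).static_order())
```

For the chain above the order is forced: warmup, lexicon, pilot, apply.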
### FR9. Storage & naming
- All 5 new folders under `conductor/tracks/` (matching the user's directive: "just make new files for pass 2 in the same directories")
- The warmup's `samples/` is gitignored (user's past notes are local-only; not committed)
- The 3-layer deliverables are committed (research artifacts, the whole point of Pass 2)
- Reports and decoder files are committed
- Per-phase pilot/apply reports are committed
### FR10. No src/ changes
Pass 2 produces research artifacts (markdown files). It does NOT modify `src/*.py`, add `src/<thing>.py` files, or add new `pyproject.toml` deps. The only code that may be written is for tooling (a possible `scripts/deobfuscate/` namespace IF a Tier 3 worker finds that the prompt template alone is insufficient — this is a judgment call during Phase 1).
---
## 5. Non-Functional Requirements
- **TDD if code is written.** Any new Python tooling in `scripts/deobfuscate/` follows the same conventions as `scripts/video_analysis/` (Result[T], 1-space indent, type hints, no comments, tests in `tests/test_deobfuscate_*.py`).
- **Per-task atomic commits.** Each phase follows `conductor/workflow.md` per-task commit discipline.
- **Git notes.** Each task gets a git note summarizing what was done and why.
- **No day estimates.** Scope measured in files/sites per `conductor/workflow.md` Tier 1 Track Initialization Rules.
- **User-as-source dependency.** The warmup is blocked on the user providing samples. This is a USER action item, not a Tier 2/3 action.
- **Lossless preservation directive (carried from Pass 1).** No Pass 1 concept is lost in the de-obfuscation.
---
## 6. Out of Scope (Explicit)
- **Pass 3 (projection to applied domain).** Future, user-led. The de-obfuscated outputs of Pass 2 are Pass 3's input.
- **The user's "own caveats" (referenced in Pass 1's spec §11.2).** User must articulate these before Pass 3 starts. Out of scope here.
- **The math encoding notation design itself (without the user's exemplars).** Pass 2 is EVIDENCE-BASED — the lexicon is derived from the user's past work, not invented.
- **Interpreter for the de-obfuscation DSL.** Out of scope. The LLM is the executor; no interpreter is built.
- **Modifying `src/*.py` files in manual_slop.** Research-only campaign.
- **Adding `pyproject.toml` dependencies.** All work is research (markdown files).
- **Automated verification of the de-obfuscation's "correctness."** The 4 verification criteria (lossless, bounded, constructively typed, etymology-cited) are checked by Tier 3 + the user, not by automated tooling.
---
## 7. Architecture Reference
This campaign does not modify the manual_slop application architecture. It produces research artifacts. The architecture refs that DO apply:
- **Track convention:** `conductor/workflow.md` "Standard Task Workflow" + "Tier 1 Track Initialization Rules" + per-task commit discipline
- **Code style (if code is written):** `conductor/code_styleguides/python.md` + `conductor/code_styleguides/error_handling.md`
- **Research track precedent:** `conductor/tracks/intent_dsl_survey_20260612/` (research-style report + operational spec)
- **Campaign umbrella precedent:** `conductor/tracks/video_analysis_campaign_20260621/` (1 umbrella + N children at `conductor/tracks/` level)
- **Multi-pass framing (load-bearing):** `conductor/tracks/video_analysis_campaign_20260621/spec.md` §0, §11
---
## 8. Verification Criteria
The campaign is "done" when all of the following are true:
- [ ] Warmup shipped with `report.md` + `prompt_template.md` (and the user has approved the lexicon)
- [ ] Phase 1 (lexicon) shipped with `lexicon.md` + `terms_catalog.md` + `dedup_map.md`
- [ ] Phase 2 (pilot) shipped with 2 deobfuscated deliverables (each 3-layer) + `pilot_report.md` capturing refinements
- [ ] Phase 3 (apply) shipped with 11 deobfuscated deliverables (each 3-layer) + `apply_report.md` capturing final lexicon v2
- [ ] Each deobfuscated deliverable passes the 4 verification criteria:
1. **Lossless** — every Pass 1 concept is represented in the de-obfuscated form (no dropped content)
2. **Bounded** — no `∞_val` or `∞_card` in the output; all values are finite forms
3. **Constructively typed** — every expression has a type; type-checking is mentally executable
4. **Etymology-cited** — every new term in the deobfuscation has a 1-line origin + 1-line definition history in the decoder
- [ ] Umbrella `state.toml` updated to `status = "completed"`
- [ ] End-of-track report at `docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md`
- [ ] All 5 folders move to `conductor/archive/` per the project's archiving convention
- [ ] `conductor/chronology.md` updated with 5 new rows
---
## 9. Risk Register
| ID | Title | Likelihood | Scope impact | Mitigation |
|---|---|---|---|---|
| R1 | User cannot provide samples in time | Medium | Warmup blocked | User can provide partial samples; warmup can use 1-2 examples as a starter |
| R2 | User's samples don't have enough de-obfuscation patterns (e.g., mostly raw notes) | Medium | Warmup produces a thin lexicon | Phase 1 (lexicon) extends the warmup's draft with constructive type theory defaults |
| R3 | Lexicon can't capture a concept in bounded form | Medium | Some concepts remain "indefinite — see original" | Document the gap; don't force a translation |
| R4 | Pilot reveals the lexicon is overfit to the user's style | Low | Refinement needed in Phase 2 | `pilot_report.md` captures gaps; Phase 3 uses the refined lexicon |
| R5 | The 3-layer deliverable format is too verbose for some videos | Low | Adjust per video | Format is a template, not a rigid structure; some sections may be smaller |
| R6 | Tier 2 attempts to invent the lexicon without the user's samples | Low (if user samples present) | Lexicon is invented, not evidence-based | Warmup spec is explicit: "consume user samples FIRST; lexicon is evidence-based" |
| R7 | Pass 3 needs the de-obfuscated outputs but Pass 2 isn't done | High (timeline) | Pass 3 blocked | This campaign's "lossless preservation" ensures Pass 3 has all the input it needs once Pass 2 ships |
| R8 | The user changes their mind about the philosophy mid-campaign | Low | Lexicon must be updated once the pilot reveals the shift | `pilot_report.md` is the checkpoint for user review |
---
## 10. User Directives (recorded for next agent / future-self)
- **2026-06-21:** "I have a very unorthodox take for how I curate knowledge, especially formal knowledge in the math and sciences." — Pass 2 is curation, not just translation.
- **2026-06-21:** "I like theurgy, I like some aspects of platonic thought... consistent time-invariant shared objective reference to similar patterns." — Platonism: shared reference through subjective lenses.
- **2026-06-21:** "I like Norman Wildberger's work. And I like the constructivist current progress on type theories." — Foundational: constructive type theory + Wildberger's algebraic finitism.
- **2026-06-21:** "I don't like the way indefinites/infinities/infinitesimals are defined or verbally utilized." — Boundedness required for direct knowledge.
- **2026-06-21:** "Infinite is okay well handled CORRECTLY. No observer or mechanism or construct can be infinite in resolution or quantification." — Cycles/iteration are fine; `∞_val` is not.
- **2026-06-21:** "I can provide samples of notes I've done but it will take time and might be best to leave to a 'warmup' track to gather and survey those." — Warmup is the precursor; the lexicon is evidence-based.
- **2026-06-21:** "Multi-layer for sure" (answer to Q1).
- **2026-06-21:** "Report + prompt template" (answer to Q3, Q4).
- **2026-06-21:** "Just make new files for pass 2 in the same directories" + "I like having that umbrella track similar to the campaign track" (answer to Q4) — 5 folders at `conductor/tracks/` level, hybrid umbrella.
- **2026-06-21:** "Without giving examples this is the best I can do to describe where I am." — The ideation has set the philosophy; the exemplars (warmup) provide the concrete terms.
---
## 11. See Also
- `conductor/tracks/video_analysis_campaign_20260621/spec.md` §0 (multi-pass framing) + §11 (Pass 2 handoff contract, now superseded by this spec)
- `conductor/tracks/intent_dsl_survey_20260612/report_v1.2.md` — the sibling DSL; shares the philosophy but is for tool verbs, not math re-encoding
- `conductor/tracks/nagent_review_20260608/` — research-track precedent
- `conductor/tracks/fable_review_20260617/` — research-track precedent
- `conductor/code_styleguides/agent_memory_dimensions.md` — 4 memory dimensions; Pass 2 produces a "knowledge" memory (per-dimension)
- `conductor/code_styleguides/knowledge_artifacts.md` — knowledge harvest pattern; relevant to the de-obfuscation's "durable" nature
- `conductor/workflow.md` "Tier 1 Track Initialization Rules" + "Tier 2 Autonomous Sandbox" — execution conventions
- `conductor/tier2/agents/tier2-autonomous.md` — Tier 2 agent directives (test runner, branch conventions, failcount)
- `conductor/tier2/commands/tier-2-auto-execute.md` — Tier 2 dispatch protocol
@@ -0,0 +1,87 @@
# Track state for video_analysis_deob_20260621
# Updated by Tier 1 Orchestrator (initially) and Tier 2 Tech Lead (during execution)
[meta]
track_id = "video_analysis_deob_20260621"
name = "Video Analysis De-obfuscation Campaign (Pass 2 of 3)"
status = "active"
current_phase = 0 # Phase 0 = waiting for user samples
last_updated = "2026-06-21"
[blocked_by]
# Independent umbrella. No blockers.
[blocks]
# This umbrella blocks (via dependency graph, not direct block):
# - video_analysis_deob_lexicon_20260621 (blocked by warmup)
# - video_analysis_deob_pilot_20260621 (blocked by lexicon)
# - video_analysis_deob_apply_20260621 (blocked by pilot)
# The umbrella is a coordination artifact, not a literal block.
[phases]
phase_0 = { status = "in_progress", checkpointsha = "", name = "User samples provided (USER action item)" }
phase_1 = { status = "pending", checkpointsha = "", name = "Warmup (precursor) - report.md + prompt_template.md" }
phase_2 = { status = "pending", checkpointsha = "", name = "Phase 1 child: Lexicon refinement" }
phase_3 = { status = "pending", checkpointsha = "", name = "Phase 2 child: Pilot on 2 videos" }
phase_4 = { status = "pending", checkpointsha = "", name = "Phase 3 child: Apply to remaining 10 + synthesis" }
phase_5 = { status = "pending", checkpointsha = "", name = "Campaign closeout" }
[tasks]
# Phase 0 (USER action)
t0_1 = { status = "pending", commit_sha = "", description = "User gathers 3-10 samples of past de-obfuscation notes and places them in conductor/tracks/video_analysis_deob_warmup_20260621/samples/" }
# Phase 1 (warmup)
t1_1 = { status = "pending", commit_sha = "", description = "Initialize warmup track: create plan.md + metadata.json + state.toml" }
t1_2 = { status = "pending", commit_sha = "", description = "Survey the samples: term frequency, structural patterns, form projection heuristics, noise-dedup maps (Tier 3 worker)" }
t1_3 = { status = "pending", commit_sha = "", description = "Write report.md (~1000-3000 LOC) - design philosophy + curated lexicon + 3 noise-dedup maps + sample transformations" }
t1_4 = { status = "pending", commit_sha = "", description = "Write prompt_template.md (~200-500 LOC) - LLM-direct operational spec" }
t1_5 = { status = "pending", commit_sha = "", description = "User review + approval of the lexicon + template" }
# Phase 2 (lexicon child)
t2_1 = { status = "pending", commit_sha = "", description = "Initialize lexicon child track" }
t2_2 = { status = "pending", commit_sha = "", description = "Refine warmup's draft: add 5-10 test cases, add form anchor requirement, add cross-references" }
t2_3 = { status = "pending", commit_sha = "", description = "Produce lexicon.md + terms_catalog.md + dedup_map.md" }
t2_4 = { status = "pending", commit_sha = "", description = "User review + approval before Phase 3 starts" }
# Phase 3 (pilot child)
t3_1 = { status = "pending", commit_sha = "", description = "Initialize pilot child track" }
t3_2 = { status = "pending", commit_sha = "", description = "Apply lexicon to cs229_building_llms - produce 3-layer deliverable in artifacts/cs229_building_llms/" }
t3_3 = { status = "pending", commit_sha = "", description = "Apply lexicon to entropy_epiplexity - produce 3-layer deliverable in artifacts/entropy_epiplexity/" }
t3_4 = { status = "pending", commit_sha = "", description = "Write pilot_report.md - lexicon refinements, gaps, process improvements" }
t3_5 = { status = "pending", commit_sha = "", description = "User review + approval before Phase 4 starts" }
# Phase 4 (apply child)
t4_1 = { status = "pending", commit_sha = "", description = "Initialize apply child track" }
t4_2 = { status = "pending", commit_sha = "", description = "Apply refined lexicon to 10 remaining Pass 1 reports - produce 3-layer deliverables" }
t4_3 = { status = "pending", commit_sha = "", description = "Apply refined lexicon to cross-cutting synthesis - produce 3-layer deliverable" }
t4_4 = { status = "pending", commit_sha = "", description = "Write apply_report.md - final lexicon v2, process refinements, open questions for Pass 3" }
# Phase 5 (closeout)
t5_1 = { status = "pending", commit_sha = "", description = "Update umbrella README.md with final statuses" }
t5_2 = { status = "pending", commit_sha = "", description = "Update umbrella state.toml to status = 'completed'" }
t5_3 = { status = "pending", commit_sha = "", description = "Write end-of-track report at docs/reports/TRACK_COMPLETION_video_analysis_deob_20260621.md" }
t5_4 = { status = "pending", commit_sha = "", description = "Move all 5 folders to conductor/archive/" }
t5_5 = { status = "pending", commit_sha = "", description = "Update conductor/chronology.md with 5 new rows" }
t5_6 = { status = "pending", commit_sha = "", description = "Update conductor/tracks.md to remove from Active Tracks" }
[verification]
warmup_shipped = false
lexicon_shipped = false
pilot_shipped = false
apply_shipped = false
end_of_track_report_committed = false
all_5_folders_archived = false
chronology_updated = false
user_approved_at_each_phase = false
[user_directives_logged]
unorthodox_curation = "Per user 2026-06-21: 'I have a very unorthodox take for how I curate knowledge, especially formal knowledge in the math and sciences.'"
constructive_type_theory = "Per user 2026-06-21: 'I like Norman Wildberger's work. And I like the constructivist current progress on type theories as a foundational system.'"
bounded_for_knowledge = "Per user 2026-06-21: 'No observer or mechanism or construct can be infinite in resolution or quantification. To have distinction must have a bounds.'"
cycles_iteration_allowed = "Per user 2026-06-21: 'Infinite is okay well handled CORRECTLY... What can be indefinite is that can be subjected upon is that of cycles, that of iteration, that of repetition.'"
warmup_evidence_based = "Per user 2026-06-21: 'I can provide samples of notes I've done but it will take time and might be best to leave to a warmup track to gather and survey those, to then codify how this de-obfuscation via an llm following that within a track's plan would do.'"
multi_layer_deliverable = "Per user 2026-06-21: 'Multi-layer for sure'"
report_plus_template = "Per user 2026-06-21: warmup output is 'report.md (research style) + the actionable template which can reference the report among the samples as well'"
hybrid_umbrella = "Per user 2026-06-21: 'I like having that umbrella track similar to the campaign track for the first pass'"
same_directories = "Per user 2026-06-21: 'You just make new files for pass 2 in the same directories'"
no_day_estimates = "Per conductor/workflow.md Tier 1 Track Initialization Rules (added 2026-06-16). Scope measured in files/sites only."