docs(spec): directive hot-swap harness design + video analysis campaign B

Design for the directive hot-swap harness (Campaign A) plus scope for the 4-video analysis campaign (Campaign B). The two campaigns share a theme (encoding information densely for LLMs) but are tracked independently.

Campaign A (Track A-1): directive harvest + conductor/directives/ scaffold + preset markdown system + role-prompt 'warm with:' bootstrap. No scripts, no TOML: markdown-only, LLM-native. Duplicates current directives as v1 variants; alternative encodings (v2+) are added over time as experiments.

Campaign B: 4 new videos (entropy/compression, LeCun world models, LeCun vs LLMs, recursive self-improvement). Follows the established 3-pass pattern from the previous 12-video campaign. Gets its own track spec.

Cross-campaign: video insights may surface alternative encoding strategies; the harness design mirrors the video campaign's deobfuscation pattern (same content, different encoding).
2026-06-27 13:42:32 -04:00
parent 284d4c42fd
commit d07296bbb4
1 changed files with 230 additions and 0 deletions
@@ -0,0 +1,230 @@
+# Design: Directive Hot-Swap Harness (OpenCode Directive Presets)
+
+**Date:** 2026-06-27
+**Status:** Draft — pending user review
+**Track ID (proposed):** `directive_hotswap_harness_20260627`
+
+## Problem
+
+The codebase's directives (the instructions that tell LLMs how to behave: banned patterns, conventions, hard bans, anti-patterns) are scattered across the entire doc tree: `AGENTS.md`, `conductor/workflow.md`, `conductor/product-guidelines.md`, `conductor/tech-stack.md`, every `conductor/code_styleguides/*.md`, `docs/Readme.md`, `docs/AGENTS.md`, all 14 `docs/guide_*.md`, etc. They're embedded in prose, tables, anti-pattern sections such as "Critical Anti-Patterns", "Hard Rules" lists, and styleguide sections.
+
+The 4 tier role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`) plus the autonomous variant (`conductor/tier2/agents/tier2-autonomous.md`) currently hardcode a list of ~11 files to read before any action. This list is static — every session gets the same directives regardless of the task. There's no mechanism to:
+- Test whether an alternative encoding of the same directive (imperative-ban vs. rationale-first vs. before/after) produces better LLM compliance
+- Hot-swap which encoding is active without manually editing files or navigating the filesystem
+- Exercise per-session control over which directives the LLM warms up with
+
+## Goal
+
+Build a **directive hot-swap harness** that lets the user:
+1. Maintain multiple alternative encodings ("variants") of the same directive as separate files
+2. Compose active directive sets into named "presets" (markdown bills of materials)
+3. Hot-swap which preset is active via a single `warm with: <path>` instruction in the role prompt or session message
+4. Use the existing file-reading behavior LLMs already have — no scripts, no TOML, no build steps
+
+## Design
+
+### The directive directory structure
+
+```
+conductor/directives/
+  <directive_name>/
+    v1.md          ← the baseline encoding (verbatim lift from current docs)
+    v2_<style>.md  ← alternative encodings (added over time)
+  presets/
+    current_baseline.md    ← the default preset (all v1)
+    <experimental>.md      ← alternative presets (added over time)
+```
+
+**Naming convention:** lowercase, underscore-separated, action-oriented (`ban_dict_any`, not `dict_str_any_ban`). The name describes the directive's intent.
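
The naming rules above can be pinned down as two regular expressions. This is a minimal sketch under stated assumptions: directive names are ASCII lowercase words joined by single underscores, and variant files follow the `v1.md` / `v2_<style>.md` pattern from the tree. `is_valid_directive_name`, `parse_variant_filename`, and the `rationale_first` style name are hypothetical; the harness ships no scripts, and the "action-oriented" half of the convention is a judgment call no regex can check.

```python
import re
from typing import NamedTuple

# Hypothetical helpers for illustration only; the harness itself ships no scripts.
# Directive names: lowercase words joined by single underscores (e.g. ban_dict_any).
DIRECTIVE_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
# Variant files: v1.md, or vN_<style>.md where <style> follows the same naming rule.
VARIANT_FILE = re.compile(
    r"v(?P<version>[1-9][0-9]*)(?:_(?P<style>[a-z0-9]+(?:_[a-z0-9]+)*))?\.md"
)


class Variant(NamedTuple):
    version: int
    style: str  # "" for an unstyled file such as the v1.md baseline


def is_valid_directive_name(name: str) -> bool:
    # Only the lexical rule is checkable; "action-oriented" still needs a human.
    return DIRECTIVE_NAME.fullmatch(name) is not None


def parse_variant_filename(filename: str) -> Variant:
    # Raises on malformed names instead of returning None.
    match = VARIANT_FILE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a variant filename: {filename!r}")
    return Variant(int(match.group("version")), match.group("style") or "")


print(is_valid_directive_name("ban_dict_any"))          # True
print(is_valid_directive_name("Ban-Dict-Any"))          # False
print(parse_variant_filename("v1.md"))                  # Variant(version=1, style='')
print(parse_variant_filename("v2_rationale_first.md"))  # Variant(version=2, style='rationale_first')
```

Note that `dict_str_any_ban` also passes the lexical check; the regex only guards against casing and separator drift.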
+
+**Variant file format:** each variant file (`v1.md`, `v2_<style>.md`, ...) starts with a short header explaining why this iteration exists, followed by a `---` rule and then the directive text:
+
+```markdown
+# <directive_name> — v1
+
+**Why this iteration:** Lifted verbatim from `conductor/code_styleguides/python.md` §17.1.
+This is the baseline encoding — the imperative-ban style currently in production.
+Future variants will test alternative encodings against this baseline.
+
+---
+
+<directive text>
+```
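
The template keeps the annotation and the directive text in one file, separated by a horizontal rule. A minimal sketch of that split, assuming the first line consisting only of `---` always marks the boundary (so any later rules inside the directive text are preserved); `split_variant` is hypothetical and only makes the file layout precise.

```python
def split_variant(text: str) -> tuple[str, str]:
    """Return (annotation header, directive text), split at the first '---' line."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "---":
            header = "\n".join(lines[:index]).strip()
            # Everything after the first rule is directive text, including any later rules.
            body = "\n".join(lines[index + 1 :]).strip()
            return header, body
    raise ValueError("variant file has no '---' separator")


VARIANT = """**Why this iteration:** Lifted verbatim from `conductor/code_styleguides/python.md` §17.1.
This is the baseline encoding.

---

<directive text>
"""

header, body = split_variant(VARIANT)
print(body)  # <directive text>
```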
+
+### The preset format
+
+A preset is a markdown bill of materials. It tells the LLM which directive variant files to read for this run. Nothing more.
+
+```markdown
+# Preset: current_baseline
+
+The baseline directive composition — all v1 variants lifted from the current
+production docs.
+
+## Directives to warm
+
+Read each file below before any action.
+
+- ban_dict_any: conductor/directives/ban_dict_any/v1.md
+- ban_optional_returns: conductor/directives/ban_optional_returns/v1.md
+- no_local_imports: conductor/directives/no_local_imports/v1.md
+- ...
+
+## Notes
+
+All v1 (verbatim lifts from current production docs). No alternative encodings
+tested yet. This preset is the control group for future experiments.
+```
+
+**Key properties:**
+- **Flat list.** No nesting, no conditionals, no includes. The LLM reads the list, reads the files.
+- **Human-readable name.** `current_baseline`, `exploratory_rationale`, `minimal_tokens` — pick by name.
+- **Notes section.** Documents the hypothesis being tested. This is the experiment log, inline with the preset.
+- **Partial swaps.** Swap 2-3 directives to v2, leave the rest at v1. The preset makes the diff explicit.
+- **No script needed.** Author a new preset by copying an existing one and changing variant paths. Hot-swap by telling the LLM which preset to use.
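
The preset grammar is small enough to state exactly, which also makes the "partial swaps" property concrete. A minimal sketch, assuming each entry is a `- <directive_name>: <path>` bullet under the `## Directives to warm` heading and that the section ends at the next `##` heading. `parse_preset` and `diff_presets` are hypothetical (the design deliberately ships no tooling), and the `exploratory_rationale` body and its `v2_rationale_first.md` path are made up for the example.

```python
import re

# Hypothetical illustration of the preset grammar, not a proposed script.
# An entry is "- <directive_name>: <path>" on its own line.
ENTRY = re.compile(r"-\s+(?P<name>[a-z0-9_]+):\s+(?P<path>\S+)")
WARM_SECTION = "## Directives to warm"


def parse_preset(text: str) -> dict[str, str]:
    """Map directive name -> variant path for entries under '## Directives to warm'."""
    entries: dict[str, str] = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any level-2 heading opens or closes the warm-up section.
            in_section = stripped == WARM_SECTION
            continue
        match = ENTRY.fullmatch(stripped)
        if in_section and match is not None:
            entries[match.group("name")] = match.group("path")
    return entries


def diff_presets(base: dict[str, str], other: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Directives present in both presets whose variant path differs."""
    return {
        name: (base[name], other[name])
        for name in sorted(base.keys() & other.keys())
        if base[name] != other[name]
    }


BASELINE = """# Preset: current_baseline

## Directives to warm

- ban_dict_any: conductor/directives/ban_dict_any/v1.md
- ban_optional_returns: conductor/directives/ban_optional_returns/v1.md
- no_local_imports: conductor/directives/no_local_imports/v1.md
"""

EXPERIMENT = """# Preset: exploratory_rationale

## Directives to warm

- ban_dict_any: conductor/directives/ban_dict_any/v2_rationale_first.md
- ban_optional_returns: conductor/directives/ban_optional_returns/v1.md
- no_local_imports: conductor/directives/no_local_imports/v1.md

## Notes

- control: conductor/directives/presets/current_baseline.md
"""

print(diff_presets(parse_preset(BASELINE), parse_preset(EXPERIMENT)))
# {'ban_dict_any': ('conductor/directives/ban_dict_any/v1.md', 'conductor/directives/ban_dict_any/v2_rationale_first.md')}
```

Bullets outside the warm-up section, such as the `control:` line under Notes, are ignored, so the Notes section stays free for experiment-log prose.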
+
+### The role-prompt bootstrap
+
+The 5 role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`, and `conductor/tier2/agents/tier2-autonomous.md`) each have a hardcoded "MANDATORY: Pre-Action Required Reading" section listing ~11 specific files. The directive-bearing part of that list is replaced with a single `warm with:` line; non-directive context files stay hardcoded.
+
+```markdown
+## MANDATORY: Directive Warm-up
+
+warm with: conductor/directives/presets/current_baseline.md
+
+Read the preset file above. It lists directive variant files to read before any action.
+Read each file the preset references. These are your active directives for this session.
+
+If the user specifies a different preset (e.g., "warm with: conductor/directives/presets/exploratory_rationale.md"),
+use that instead. The user's instruction overrides the default.
+```
+
+**Key properties:**
+- **One line is the bootstrap.** `warm with: <path>` is the entire mechanism.
+- **User override.** The user can tell the LLM "warm with: <path>" in their session message and it uses that preset instead of the default. This is the hot-swap — no file editing, just a text instruction.
+- **Per-role defaults.** Each tier role prompt can default to a different preset.
+- **Non-directive reads remain hardcoded.** Files that aren't tunable directives (e.g., `conductor/tracks/tier2_leak_prevention_20260620/spec.md`, `conductor/tier2/githooks/forbidden-files.txt`) stay as direct references in the role prompt.
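
The override rule can be stated exactly in a few lines. A minimal sketch, assuming the `warm with:` instruction always sits on its own line and that, when a message contains several, the last one wins (the spec does not say). `last_warm_with` and `resolve_preset_path` are hypothetical; in the harness itself the LLM applies this rule by reading, not by running code.

```python
import re

# Hypothetical model of the bootstrap precedence; in the harness the LLM does this by reading.
# Assumption: the instruction sits on its own line, so an example quoted inside prose
# (like the "e.g." in the role prompt) is not mistaken for a real override.
WARM_WITH = re.compile(r"^[ \t]*warm with:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def last_warm_with(text: str) -> str:
    """Path from the last standalone 'warm with:' line, or '' if there is none."""
    paths = WARM_WITH.findall(text)
    return paths[-1] if paths else ""


def resolve_preset_path(role_prompt: str, user_message: str) -> str:
    """The user's override wins; otherwise fall back to the role prompt's default."""
    override = last_warm_with(user_message)
    if override:
        return override
    default = last_warm_with(role_prompt)
    if not default:
        raise ValueError("role prompt has no 'warm with:' line")
    return default


ROLE_PROMPT = """## MANDATORY: Directive Warm-up

warm with: conductor/directives/presets/current_baseline.md

If the user specifies a different preset (e.g., "warm with: conductor/directives/presets/exploratory_rationale.md"),
use that instead. The user's instruction overrides the default.
"""

print(resolve_preset_path(ROLE_PROMPT, "Fix the flaky test."))
# conductor/directives/presets/current_baseline.md
print(resolve_preset_path(
    ROLE_PROMPT,
    "warm with: conductor/directives/presets/exploratory_rationale.md\nFix the flaky test.",
))
# conductor/directives/presets/exploratory_rationale.md
```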
+
+### What stays in the role prompt (not directive-based)
+
+- `AGENTS.md` — project operating rules (contains directives AND non-directive rules)
+- `conductor/workflow.md` — operational workflow
+- `conductor/edit_workflow.md` — edit tool contract
+- `conductor/tier2/githooks/forbidden-files.txt` — file denylist
+- The relevant `docs/guide_*.md` — architecture reference
+
+These are context, not tunable directives. They stay hardcoded in the role prompt.
+
+### The directive harvest
+
+The directives are NOT limited to the 11 files the role prompts mandate. They're scattered across the entire doc tree. The track's first phase is a systematic harvest:
+
+**A directive is any statement that tells the LLM:**
+- "Do X" / "Don't do X" (imperative)
+- "Use Y instead of Z" (preference)
+- "This is BANNED" (hard ban)
+- "Follow pattern P" (convention)
+- "Never do Q" (anti-pattern)
+
+**NOT a directive:**
+- Descriptive prose ("The App class holds GUI state")
+- Architecture documentation ("Thread domains are separated by...")
+- Reference material ("The 45-tool inventory includes...")
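
Applying this split by hand across the whole doc tree is where the harvest can slip (see Risks). One way to make the combing more repeatable is a crude keyword pass that flags candidate lines for review. A minimal sketch, assuming directive-bearing lines usually contain one of a few trigger phrases; `TRIGGERS` and `flag_candidates` are hypothetical and not a deliverable of Track A-1. The pass will miss bare imperatives such as "Do X" and flag some descriptive lines, so the user's review of the candidate list remains the real filter.

```python
import re

# Hypothetical first-pass triage for the harvest; a human reviews every flagged line.
# Order matters: the first matching trigger decides the kind.
TRIGGERS = [
    ("hard ban", re.compile(r"\bbanned\b", re.IGNORECASE)),
    ("anti-pattern", re.compile(r"\bnever\b", re.IGNORECASE)),
    ("imperative", re.compile(r"\b(?:do not|don't)\b", re.IGNORECASE)),
    ("preference", re.compile(r"\buse\b.+\binstead of\b", re.IGNORECASE)),
    ("convention", re.compile(r"\b(?:must|always|follow)\b", re.IGNORECASE)),
]


def flag_candidates(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line number, kind, text) for each line that matches a trigger."""
    flagged = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        for kind, pattern in TRIGGERS:
            if pattern.search(line):
                flagged.append((number, kind, line.strip()))
                break
    return flagged


SAMPLE = """The App class holds GUI state.
`dict[str, Any]` is BANNED.
Never use local imports.
Use typed dataclass fields instead of dict[str, Any].
Don't add scripts for preset generation.
"""

for row in flag_candidates(SAMPLE):
    print(row)
```

On this sample the pass flags lines 2 to 5 and leaves the descriptive first line alone.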
+
+**Sources to comb (non-exhaustive):**
+- `AGENTS.md` — "Critical Anti-Patterns", "File Size and Naming Convention", "Session-Learned Anti-Patterns", "Process Anti-Patterns"
+- `conductor/workflow.md` — "Code Style", "Guiding Principles", "Testing Requirements", "Known Pitfalls", "Process Anti-Patterns", "Tier 2 Autonomous Sandbox conventions"
+- `conductor/product-guidelines.md` — "Core Value", "Code Standards & Architecture", "Data-Oriented Error Handling", "Phase 5: Heavy Curation"
+- `conductor/tech-stack.md` — "Core Value" header
+- `conductor/code_styleguides/data_oriented_design.md` — §8.5 "Python Type Promotion Mandate", the 7-question simplification pass, the 10-question self-check
+- `conductor/code_styleguides/python.md` — §10 "Anti-OOP Conventions", §17 "LLM Default Anti-Patterns" (the 7 banned patterns)
+- `conductor/code_styleguides/error_handling.md` — the Result[T] convention, the AI Agent Checklist
+- `conductor/code_styleguides/type_aliases.md` — "When NOT to promote"
+- `conductor/code_styleguides/feature_flags.md` — "delete to turn off" convention
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4-dimension decision tree
+- `conductor/code_styleguides/rag_integration_discipline.md` — "conservative-RAG rule"
+- `conductor/code_styleguides/cache_friendly_context.md` — stable-to-volatile ordering
+- `conductor/code_styleguides/knowledge_artifacts.md` — the harvest pattern
+- `docs/AGENTS.md` — "Convention Enforcement"
+- `docs/Readme.md` — any directive-like content in feature descriptions
+
+**Granularity resolution:** the harvest produces a candidate list, and decisions about which directives to merge (e.g., `ban_prefix_aliasing` + `no_local_imports` might become `import_hygiene`), split, or keep standalone are made during the harvest phase rather than locked in upfront.
+
+### The original docs stay untouched
+
+The `conductor/directives/` tree is a *parallel* structure, not a replacement. The original docs (`python.md`, `error_handling.md`, `AGENTS.md`, etc.) remain the canonical source until a future track deprecates them. The harness is useful immediately (the v1 variants are exact copies); the old docs are not broken.
+
+### Why no scripts / TOML
+
+The user explicitly rejected TOML manifests and scripts for this initial version: "no need to systematize that hard when I don't know what's going to work yet." The preset is markdown. The hot-swap is a text instruction. The variant selection is a path in a markdown file. No build steps, no generated files, no tooling dependencies. If the system proves useful, a future track can add automation (auto-generating presets from the directory tree, token-cost analysis per variant, automated compliance testing).
+
+## Scope: Two Parallel Campaigns
+
+The user's request bundles two distinct campaigns that share a theme ("how do you encode information densely for an LLM?") but are tracked and executed independently.
+
+### Campaign A: Directive Hot-Swap Harness (this spec)
+
+**Track A-1 (this):** directive harvest + scaffold + baseline preset + role-prompt bootstrap update. Gets the system working with v1 (current) encodings.
+
+Future tracks in Campaign A:
+- Alternative encoding authoring (v2, v3 per directive — the actual experimentation)
+- Manual Slop integration (a "Directive Lab" panel for virtualized directive selection)
+- Token-cost analysis tooling
+- Automated compliance testing
+
+### Campaign B: Video Analysis (4 new videos)
+
+A separate research campaign following the established 3-pass pattern from the previous 12-video campaign (Pass 1: extract → Pass 2: deobfuscate → Pass 3: project to C11/Python). The 4 videos:
+
+1. **Reinventing Entropy | Compression is Intelligence Part 1** (https://youtu.be/l6DKRf-fAAM)
+2. **Yann LeCun: World Models: Enabling the next AI revolution** (https://www.youtube.com/watch?v=72Xj8k5WQX4)
+3. **Yann LeCun's $1B Bet Against LLMs [Part 1]** (https://youtu.be/kYkIdXwW2AE)
+4. **Recursive Self-Improvement** (https://youtu.be/t7_ZXgfJVG8)
+
+### Cross-Campaign Relationship
+
+The two campaigns inform each other but have no hard dependency:
+
+- **The video analysis informs directive encoding.** The entropy/compression video (video 1) may provide theoretical grounding for how information density affects comprehension. LeCun's world-model work (videos 2-3) may inform how LLMs model directive intent. Recursive self-improvement (video 4) bears directly on the meta-question of whether better directive encodings can be discovered iteratively. Insights from the video analysis may surface alternative encoding strategies to test in Campaign A's harness.
+
+- **The harness informs the video analysis.** The previous video campaign produced a lexicon, a C11 reference, and a deobfuscation DSL. The directive harness is itself a compression-aid tool: it encodes the same directive in fewer or different tokens and observes the effect. Its design (preset as bill of materials, variant as alternative encoding) follows the same pattern as the video campaign's deobfuscation pass (same content, different encoding), so it may inform how the video analysis encodes its own outputs.
+
+- **Execution order:** the campaigns can run in parallel. Campaign A (Track A-1) is an engineering track; Campaign B is a research track. They don't share files. The cross-pollination is intellectual, not structural.
+
+### The video analysis track structure (Campaign B)
+
+Follows the established 3-pass pattern from `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`:
+
+- **Pass 1:** Information extraction (4 deep-dive reports, one per video). Uses the existing `scripts/video_analysis/` pipeline (download_video, extract_transcript, extract_keyframes, ocr_frames, synthesize_report). The lexicon v2 from the previous campaign is the starting point for deobfuscation.
+- **Pass 2:** Deobfuscation (apply the lexicon v2 to the 4 new videos' content). May produce lexicon v3 corrections if the new videos surface notation the lexicon doesn't cover.
+- **Pass 3:** C11/Python projection (project each video's deobfuscated content to code in the user's idiomatic style).
+
+The video analysis track is initialized as a separate conductor track (`video_analysis_campaign_2_20260627` or similar). Its spec/plan is authored separately from this design doc.
+
+## Out of Scope (for Track A-1)
+
+- **Authoring alternative encodings (v2+).** This track only creates v1 (verbatim lifts). The experimentation is a future activity.
+- **Deprecating the original docs.** The old docs stay as canonical source.
+- **Scripts for preset generation or variant selection.** No automation in this version.
+- **Manual Slop GUI integration.** The harness is OpenCode-only for now.
+- **Token-cost analysis.** No tooling to measure token cost per variant in this version.
+- **Automated compliance testing.** No test harness to measure LLM compliance per encoding.
+- **The 4-video analysis (Campaign B).** Separate track, separate campaign. This design doc covers Campaign A (the harness) only. The video analysis gets its own track spec.
+
+## Risks
+
+1. **Harvest completeness.** The directive harvest might miss directives embedded in prose. Mitigation: systematic combing of the doc tree + the user reviews the candidate list before variants are created.
+2. **Granularity ambiguity.** Some directives overlap (e.g., "ban dict[str, Any]" and "use typed dataclass fields" are two sides of the same coin). Mitigation: the harvest phase produces a candidate list; the granularity is resolved there, not upfront.
+3. **Role-prompt drift.** The 5 role prompts need to be updated consistently. Mitigation: the only change is replacing each prompt's directive reading list with the same `warm with:` section; the rest of each role prompt is untouched.
+4. **Adoption friction.** LLMs might not follow the `warm with:` instruction reliably. Mitigation: the instruction is simple (read a file, read the files it lists) and uses the existing file-reading behavior the LLMs already have.
+
+## See Also
+
+- `conductor/tier2/agents/tier2-autonomous.md` — the role prompt that will be updated with `warm with:`
+- `conductor/tier2/commands/tier-2-auto-execute.md` — the slash command template
+- `conductor/code_styleguides/python.md` §17 — the primary source of directives to harvest
+- `conductor/code_styleguides/error_handling.md` — the Result[T] convention to harvest
+- `AGENTS.md` "Critical Anti-Patterns" — the hard bans to harvest
+- `docs/guide_meta_boundary.md` — the meta-tooling / application distinction (relevant to why this harness lives in the meta-tooling domain)
+- `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md` — the previous video campaign's closeout (the pattern Campaign B follows)
+- `scripts/video_analysis/` — the existing video analysis pipeline (Campaign B reuses this)