docs(spec): directive hot-swap harness design + video analysis campaign B

Design for the directive hot-swap harness (Campaign A) + scope for the
4-video analysis campaign (Campaign B). Two parallel campaigns sharing a
theme (encoding information densely for LLMs) but tracked independently.

Campaign A (Track A-1): directive harvest + conductor/directives/ scaffold
+ preset markdown system + role-prompt 'warm with:' bootstrap. No scripts,
no TOML — markdown-only, LLM-native. Duplicates current directives as v1
variants; alternative encodings (v2+) added over time as experiments.

Campaign B: 4 new videos (entropy/compression, LeCun world models, LeCun
vs LLMs, recursive self-improvement). Follows the established 3-pass
pattern from the previous 12-video campaign. Separate track spec.

Cross-campaign: video insights may surface alternative encoding strategies;
the harness design mirrors the video campaign's deobfuscation pattern
(same content, different encoding).
2026-06-27 13:42:32 -04:00
parent 284d4c42fd
commit d07296bbb4
@@ -0,0 +1,230 @@
# Design: Directive Hot-Swap Harness (OpenCode Directive Presets)
**Date:** 2026-06-27
**Status:** Draft — pending user review
**Track ID (proposed):** `directive_hotswap_harness_20260627`
## Problem
The codebase's directives (the instructions that tell LLMs how to behave: banned patterns, conventions, hard bans, anti-patterns) are scattered across the entire doc tree: `AGENTS.md`, `conductor/workflow.md`, `conductor/product-guidelines.md`, `conductor/tech-stack.md`, every `conductor/code_styleguides/*.md`, `docs/Readme.md`, `docs/AGENTS.md`, all 14 `docs/guide_*.md`, etc. They are embedded in prose, tables, anti-pattern sections, "Critical Anti-Patterns" lists, "Hard Rules," and styleguide sections.
The 4 tier role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`) plus the autonomous variant (`conductor/tier2/agents/tier2-autonomous.md`) currently hardcode a list of ~11 files to read before any action. This list is static — every session gets the same directives regardless of the task. There's no mechanism to:
- Test whether an alternative encoding of the same directive (imperative-ban vs. rationale-first vs. before/after) produces better LLM compliance
- Hot-swap which encoding is active without manually editing files or navigating the filesystem
- Exercise per-session control over which directives the LLM warms up with
## Goal
Build a **directive hot-swap harness** that lets the user:
1. Maintain multiple alternative encodings ("variants") of the same directive as separate files
2. Compose active directive sets into named "presets" (markdown bills of materials)
3. Hot-swap which preset is active via a single `warm with: <path>` instruction in the role prompt or session message
4. Use the existing file-reading behavior LLMs already have — no scripts, no TOML, no build steps
## Design
### The directive directory structure
```
conductor/directives/
  <directive_name>/
    v1.md                 ← the baseline encoding (verbatim lift from current docs)
    v2_<style>.md         ← alternative encodings (added over time)
  presets/
    current_baseline.md   ← the default preset (all v1)
    <experimental>.md     ← alternative presets (added over time)
```
**Naming convention:** lowercase, underscore-separated, action-oriented (`ban_dict_any`, not `dict_str_any_ban`). The name describes the directive's intent.
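As an illustration only (Track A-1 itself ships no scripts), the mechanical part of the naming convention could be checked with a sketch like the one below. Both regular expressions are assumptions inferred from the examples in this spec (`ban_dict_any`, `v1.md`, `v2_<style>.md`). The "action-oriented" requirement is a judgment call that a pattern cannot verify.

```python
import re

# Assumed shape: lowercase words joined by single underscores, e.g. "ban_dict_any".
DIRECTIVE_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

# Assumed shape: "v<N>.md" or "v<N>_<style>.md", with N starting at 1.
VARIANT_FILE = re.compile(r"v[1-9][0-9]*(?:_[a-z0-9]+)*\.md")


def is_valid_directive_name(name: str) -> bool:
    """True when the whole name is lowercase and underscore-separated."""
    return DIRECTIVE_NAME.fullmatch(name) is not None


def is_valid_variant_filename(filename: str) -> bool:
    """True when the whole filename looks like a versioned variant file."""
    return VARIANT_FILE.fullmatch(filename) is not None
```

A check like this would only flag formatting slips such as `Ban-Dict-Any` or `V1.md`; whether a name describes the directive's intent still needs a human reviewer.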
**Variant file format:** each `vN.md` has a short header annotating why this iteration exists, then the directive text:
```markdown
# <directive_name> — v1
**Why this iteration:** Lifted verbatim from `conductor/code_styleguides/python.md` §17.1.
This is the baseline encoding — the imperative-ban style currently in production.
Future variants will test alternative encodings against this baseline.
---
<directive text>
```
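For readers who want to see the variant layout as data, here is a minimal sketch that splits a variant file into its title, its "why this iteration" header, and the directive text. It assumes the first line consisting only of `---` is the separator, as in the template above. The `Variant` type and `parse_variant` function are hypothetical names and not part of the harness, which stays markdown-only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    title: str      # text of the first "# " heading line
    rationale: str  # header text between the title and the "---" separator
    body: str       # directive text after the separator


def parse_variant(text: str) -> Variant:
    """Split a variant file at the first line that is exactly '---'."""
    lines = text.splitlines()
    # If no separator exists, the whole file is treated as header.
    sep = next(
        (i for i, line in enumerate(lines) if line.strip() == "---"),
        len(lines),
    )
    header, body = lines[:sep], lines[sep + 1:]
    title = next(
        (line[2:].strip() for line in header if line.startswith("# ")), ""
    )
    rationale = "\n".join(
        line for line in header if not line.startswith("# ")
    ).strip()
    return Variant(title, rationale, "\n".join(body).strip())
```

Keeping the rationale separate from the body makes it possible to compare only the directive text across variants, which is the part an encoding experiment would vary.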
### The preset format
A preset is a markdown bill of materials. It tells the LLM which directive variant files to read for this run. Nothing more.
```markdown
# Preset: current_baseline
The baseline directive composition — all v1 variants lifted from the current
production docs.
## Directives to warm
Read each file below before any action.
- ban_dict_any: conductor/directives/ban_dict_any/v1.md
- ban_optional_returns: conductor/directives/ban_optional_returns/v1.md
- no_local_imports: conductor/directives/no_local_imports/v1.md
- ...
## Notes
All v1 (verbatim lifts from current production docs). No alternative encodings
tested yet. This preset is the control group for future experiments.
```
**Key properties:**
- **Flat list.** No nesting, no conditionals, no includes. The LLM reads the list, reads the files.
- **Human-readable name.** `current_baseline`, `exploratory_rationale`, `minimal_tokens` — pick by name.
- **Notes section.** Documents the hypothesis being tested. This is the experiment log, inline with the preset.
- **Partial swaps.** Swap 2-3 directives to v2, leave the rest at v1. The preset makes the diff explicit.
- **No script needed.** Author a new preset by copying an existing one and changing variant paths. Hot-swap by telling the LLM which preset to use.
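The flat-list and partial-swap properties can be made concrete with a small sketch. It assumes every preset entry has the `- <name>: <path>.md` shape shown in the example above; `preset_entries` and `swapped` are hypothetical helper names used for illustration, not tooling this track delivers.

```python
import re

# Assumed entry shape, taken from the preset example: "- <name>: <path>.md"
ENTRY = re.compile(r"^-\s+([a-z0-9_]+):\s+(\S+\.md)\s*$")


def preset_entries(preset_text: str) -> list[tuple[str, str]]:
    """Return (directive_name, variant_path) pairs in file order."""
    entries = []
    for line in preset_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


def swapped(base_text: str, other_text: str) -> list[tuple[str, str, str]]:
    """List (name, base_path, other_path) for directives whose variant differs."""
    base = dict(preset_entries(base_text))
    return [
        (name, base[name], path)
        for name, path in preset_entries(other_text)
        if name in base and base[name] != path
    ]
```

Because a preset is a flat list, the difference between the control preset and an experimental one reduces to a list of changed paths, which is what makes a partial swap easy to read at a glance.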
### The role-prompt bootstrap
The 5 role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`, and `conductor/tier2/agents/tier2-autonomous.md`) have a hardcoded "MANDATORY: Pre-Action Required Reading" section listing ~11 specific files. This is replaced with a single `warm with:` directive.
```markdown
## MANDATORY: Directive Warm-up
warm with: conductor/directives/presets/current_baseline.md
Read the preset file above. It lists directive variant files to read before any action.
Read each file the preset references. These are your active directives for this session.
If the user specifies a different preset (e.g., "warm with: conductor/directives/presets/exploratory_rationale.md"),
use that instead. The user's instruction overrides the default.
```
**Key properties:**
- **One line is the bootstrap.** `warm with: <path>` is the entire mechanism.
- **User override.** The user can tell the LLM "warm with: <path>" in their session message and it uses that preset instead of the default. This is the hot-swap — no file editing, just a text instruction.
- **Per-role defaults.** Each tier role prompt can default to a different preset.
- **Non-directive reads remain hardcoded.** Files that aren't tunable directives (e.g., `conductor/tracks/tier2_leak_prevention_20260620/spec.md`, `conductor/tier2/githooks/forbidden-files.txt`) stay as direct references in the role prompt.
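The precedence rule (the user's instruction overrides the role prompt's default) is carried out by the LLM itself, not by code. Purely to pin down the intended behavior, it can be modeled as the sketch below. The `active_preset` function and the regular expression are hypothetical, and the sketch assumes preset paths contain no whitespace and end in `.md`.

```python
import re

# Assumed shape of the instruction: "warm with: <path ending in .md>"
WARM_WITH = re.compile(r"warm with:\s*(\S+\.md)")


def active_preset(role_prompt: str, user_message: str) -> str:
    """Pick the preset path; a `warm with:` in the user message wins.

    Returns an empty string when neither text names a preset.
    """
    for text in (user_message, role_prompt):
        match = WARM_WITH.search(text)
        if match:
            # First occurrence in the text; in the role prompt this is the
            # default line, which precedes the quoted override example.
            return match.group(1)
    return ""
```

Checking the user message first is the whole hot-swap: the default preset in the role prompt is consulted only when the session message does not name one.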
### What stays in the role prompt (not directive-based)
- `AGENTS.md` — project operating rules (contains directives AND non-directive rules)
- `conductor/workflow.md` — operational workflow
- `conductor/edit_workflow.md` — edit tool contract
- `conductor/tier2/githooks/forbidden-files.txt` — file denylist
- The relevant `docs/guide_*.md` — architecture reference
These are context, not tunable directives. They stay hardcoded in the role prompt.
### The directive harvest
The directives are NOT limited to the ~11 files the role prompts mandate; they are scattered across the entire doc tree. The track's first phase is a systematic harvest:
**A directive is any statement that tells the LLM:**
- "Do X" / "Don't do X" (imperative)
- "Use Y instead of Z" (preference)
- "This is BANNED" (hard ban)
- "Follow pattern P" (convention)
- "Never do Q" (anti-pattern)
**NOT a directive:**
- Descriptive prose ("The App class holds GUI state")
- Architecture documentation ("Thread domains are separated by...")
- Reference material ("The 45-tool inventory includes...")
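A first pass over the doc tree could shortlist candidate lines by keyword before a human applies the definition above. The sketch below is a rough heuristic under stated assumptions: the marker list is hypothetical, it will miss directives phrased without these words, and it will flag some descriptive prose. It is no substitute for the systematic combing and user review this spec calls for.

```python
import re

# Hypothetical marker list; the spec defines directives by intent, not keyword.
MARKERS = re.compile(
    r"\b(?:do not|don't|never|always|must|banned|instead of|avoid)\b",
    re.IGNORECASE,
)


def candidate_directives(doc_text: str) -> list[str]:
    """Return stripped lines that contain an imperative marker, in file order."""
    return [
        line.strip()
        for line in doc_text.splitlines()
        if MARKERS.search(line)
    ]
```

The output is a candidate list only; merging, splitting, and rejecting false positives remain manual decisions made during the harvest.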
**Sources to comb (non-exhaustive):**
- `AGENTS.md` — "Critical Anti-Patterns", "File Size and Naming Convention", "Session-Learned Anti-Patterns", "Process Anti-Patterns"
- `conductor/workflow.md` — "Code Style", "Guiding Principles", "Testing Requirements", "Known Pitfalls", "Process Anti-Patterns", "Tier 2 Autonomous Sandbox conventions"
- `conductor/product-guidelines.md` — "Core Value", "Code Standards & Architecture", "Data-Oriented Error Handling", "Phase 5: Heavy Curation"
- `conductor/tech-stack.md` — "Core Value" header
- `conductor/code_styleguides/data_oriented_design.md` — §8.5 "Python Type Promotion Mandate", the 7-question simplification pass, the 10-question self-check
- `conductor/code_styleguides/python.md` — §10 "Anti-OOP Conventions", §17 "LLM Default Anti-Patterns" (the 7 banned patterns)
- `conductor/code_styleguides/error_handling.md` — the Result[T] convention, the AI Agent Checklist
- `conductor/code_styleguides/type_aliases.md` — "When NOT to promote"
- `conductor/code_styleguides/feature_flags.md` — "delete to turn off" convention
- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4-dimension decision tree
- `conductor/code_styleguides/rag_integration_discipline.md` — "conservative-RAG rule"
- `conductor/code_styleguides/cache_friendly_context.md` — stable-to-volatile ordering
- `conductor/code_styleguides/knowledge_artifacts.md` — the harvest pattern
- `docs/AGENTS.md` — "Convention Enforcement"
- `docs/Readme.md` — any directive-like content in feature descriptions
**Granularity resolution:** the harvest produces a candidate list. Whether to merge directives (e.g., `ban_prefix_aliasing` + `no_local_imports` might become `import_hygiene`), split them, or keep them standalone is decided during the harvest phase, not locked in upfront.
### The original docs stay untouched
The `conductor/directives/` tree is a *parallel* structure, not a replacement. The original docs (`python.md`, `error_handling.md`, `AGENTS.md`, etc.) remain the canonical source until a future track deprecates them. The harness is useful immediately because the v1 variants are exact copies, and the old docs keep working unchanged.
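Since the v1 variants are meant to be exact copies, one way to spot drift between a variant and its source doc is a verbatim-containment check. The sketch below is an assumption about how such a check might look, not part of this track: `is_verbatim_lift` is a hypothetical name, and it deliberately ignores whitespace differences (line wrapping, indentation) while requiring every word and symbol to match.

```python
def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space so re-wrapping is ignored."""
    return " ".join(text.split())


def is_verbatim_lift(variant_body: str, source_text: str) -> bool:
    """True when the non-empty variant body appears word-for-word in the source."""
    body = normalize(variant_body)
    return bool(body) and body in normalize(source_text)
```

An empty body is reported as not verbatim, so a variant file that lost its directive text is flagged instead of passing trivially.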
### Why no scripts / TOML
The user explicitly rejected TOML manifests and scripts for this initial version: "no need to systematize that hard when I don't know what's going to work yet." The preset is markdown. The hot-swap is a text instruction. The variant selection is a path in a markdown file. No build steps, no generated files, no tooling dependencies. If the system proves useful, a future track can add automation (auto-generating presets from the directory tree, token-cost analysis per variant, automated compliance testing).
## Scope: Two Parallel Campaigns
The user's request bundles two distinct campaigns that share a theme ("how do you encode information densely for an LLM?") but are tracked and executed independently.
### Campaign A: Directive Hot-Swap Harness (this spec)
**Track A-1 (this track):** directive harvest + scaffold + baseline preset + role-prompt bootstrap update. Gets the system working with v1 (current) encodings.
Future tracks in Campaign A:
- Alternative encoding authoring (v2, v3 per directive — the actual experimentation)
- Manual Slop integration (a "Directive Lab" panel for virtualized directive selection)
- Token-cost analysis tooling
- Automated compliance testing
### Campaign B: Video Analysis (4 new videos)
A separate research campaign following the established 3-pass pattern from the previous 12-video campaign (Pass 1: extract → Pass 2: deobfuscate → Pass 3: project to C11/Python). The 4 videos:
1. **Reinventing Entropy | Compression is Intelligence Part 1** (https://youtu.be/l6DKRf-fAAM)
2. **Yann LeCun: World Models: Enabling the next AI revolution** (https://www.youtube.com/watch?v=72Xj8k5WQX4)
3. **Yann LeCun's $1B Bet Against LLMs [Part 1]** (https://youtu.be/kYkIdXwW2AE)
4. **Recursive Self-Improvement** (https://youtu.be/t7_ZXgfJVG8)
### Cross-Campaign Relationship
The two campaigns inform each other but have no hard dependency:
- **The video analysis informs directive encoding.** The entropy/compression video (video 1) provides theoretical grounding for how information density affects comprehension. LeCun's world-model work (videos 2-3) informs how LLMs model directive intent. Recursive self-improvement (video 4) is directly relevant to the meta-question of whether better directive encodings can be discovered iteratively. Insights from the video analysis may surface alternative encoding strategies to test in Campaign A's harness.
- **The harness informs the video analysis.** The previous video campaign produced a lexicon + C11 reference + deobfuscation DSL. The directive harness is itself a compression-aid tool — it encodes the same directive in fewer/different tokens and observes the effect. The harness's design (preset as bill-of-materials, variant as alternative encoding) is the same pattern as the video campaign's deobfuscation pass (same content, different encoding). The harness may inform how the video analysis encodes its own outputs.
- **Execution order:** the campaigns can run in parallel. Campaign A (Track A-1) is an engineering track; Campaign B is a research track. They don't share files. The cross-pollination is intellectual, not structural.
### The video analysis track structure (Campaign B)
Follows the established 3-pass pattern from `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`:
- **Pass 1:** Information extraction (4 deep-dive reports, one per video). Uses the existing `scripts/video_analysis/` pipeline (download_video, extract_transcript, extract_keyframes, ocr_frames, synthesize_report). The lexicon v2 from the previous campaign is the starting point for deobfuscation.
- **Pass 2:** Deobfuscation (apply the lexicon v2 to the 4 new videos' content). May produce lexicon v3 corrections if the new videos surface notation the lexicon doesn't cover.
- **Pass 3:** C11/Python projection (project each video's deobfuscated content to code in the user's idiomatic style).
The video analysis track is initialized as a separate conductor track (`video_analysis_campaign_2_20260627` or similar). Its spec/plan is authored separately from this design doc.
## Out of Scope (for Track A-1)
- **Authoring alternative encodings (v2+).** This track only creates v1 (verbatim lifts). The experimentation is a future activity.
- **Deprecating the original docs.** The old docs stay as canonical source.
- **Scripts for preset generation or variant selection.** No automation in this version.
- **Manual Slop GUI integration.** The harness is OpenCode-only for now.
- **Token-cost analysis.** No tooling to measure token cost per variant in this version.
- **Automated compliance testing.** No test harness to measure LLM compliance per encoding.
- **The 4-video analysis (Campaign B).** Separate track, separate campaign. This design doc covers Campaign A (the harness) only. The video analysis gets its own track spec.
## Risks
1. **Harvest completeness.** The directive harvest might miss directives embedded in prose. Mitigation: systematic combing of the doc tree + the user reviews the candidate list before variants are created.
2. **Granularity ambiguity.** Some directives overlap (e.g., "ban dict[str, Any]" and "use typed dataclass fields" are two sides of the same coin). Mitigation: the harvest phase produces a candidate list; the granularity is resolved there, not upfront.
3. **Role-prompt drift.** The 5 role prompts need to be updated consistently. Mitigation: replacing the hardcoded required-reading section with the `warm with:` section is the only change; the rest of each role prompt is untouched.
4. **Adoption friction.** LLMs might not follow the `warm with:` instruction reliably. Mitigation: the instruction is simple (read a file, read the files it lists) and uses the existing file-reading behavior the LLMs already have.
## See Also
- `conductor/tier2/agents/tier2-autonomous.md` — the role prompt that will be updated with `warm with:`
- `conductor/tier2/commands/tier-2-auto-execute.md` — the slash command template
- `conductor/code_styleguides/python.md` §17 — the primary source of directives to harvest
- `conductor/code_styleguides/error_handling.md` — the Result[T] convention to harvest
- `AGENTS.md` "Critical Anti-Patterns" — the hard bans to harvest
- `docs/guide_meta_boundary.md` — the meta-tooling / application distinction (relevant to why this harness lives in the meta-tooling domain)
- `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md` — the previous video campaign's closeout (the pattern Campaign B follows)
- `scripts/video_analysis/` — the existing video analysis pipeline (Campaign B reuses this)