docs(spec): directive hot-swap harness design + video analysis campaign B
Design for the directive hot-swap harness (Campaign A) + scope for the 4-video analysis campaign (Campaign B). Two parallel campaigns sharing a theme (encoding information densely for LLMs) but tracked independently.

Campaign A (Track A-1): directive harvest + conductor/directives/ scaffold + preset markdown system + role-prompt 'warm with:' bootstrap. No scripts, no TOML: markdown-only, LLM-native. Duplicates current directives as v1 variants; alternative encodings (v2+) added over time as experiments.

Campaign B: 4 new videos (entropy/compression, LeCun world models, LeCun vs LLMs, recursive self-improvement). Follows the established 3-pass pattern from the previous 12-video campaign. Separate track spec.

Cross-campaign: video insights may surface alternative encoding strategies; the harness design mirrors the video campaign's deobfuscation pattern (same content, different encoding).
@@ -0,0 +1,230 @@
# Design: Directive Hot-Swap Harness (OpenCode Directive Presets)

**Date:** 2026-06-27

**Status:** Draft (pending user review)

**Track ID (proposed):** `directive_hotswap_harness_20260627`
## Problem

The codebase's directives (the instructions that tell LLMs how to behave: banned patterns, conventions, hard bans, anti-patterns) are scattered across the entire doc tree: `AGENTS.md`, `conductor/workflow.md`, `conductor/product-guidelines.md`, `conductor/tech-stack.md`, every `conductor/code_styleguides/*.md`, `docs/Readme.md`, `docs/AGENTS.md`, all 14 `docs/guide_*.md`, etc. They are embedded in prose, tables, anti-pattern sections, "Critical Anti-Patterns" lists, "Hard Rules" blocks, and styleguide sections.

The 4 tier role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`) plus the autonomous variant (`conductor/tier2/agents/tier2-autonomous.md`) currently hardcode a list of ~11 files to read before any action. This list is static: every session gets the same directives regardless of the task. There is no mechanism to:

- Test whether an alternative encoding of the same directive (imperative-ban vs. rationale-first vs. before/after) produces better LLM compliance
- Hot-swap which encoding is active without manually editing files or navigating the filesystem
- Exercise per-session control over which directives the LLM warms up with
## Goal

Build a **directive hot-swap harness** that lets the user:

1. Maintain multiple alternative encodings ("variants") of the same directive as separate files
2. Compose active directive sets into named "presets" (markdown bills of materials)
3. Hot-swap which preset is active via a single `warm with: <path>` instruction in the role prompt or session message
4. Rely on the file-reading behavior LLMs already have: no scripts, no TOML, no build steps
## Design

### The directive directory structure

```
conductor/directives/
  <directive_name>/
    v1.md                  ← the baseline encoding (verbatim lift from current docs)
    v2_<style>.md          ← alternative encodings (added over time)
  presets/
    current_baseline.md    ← the default preset (all v1)
    <experimental>.md      ← alternative presets (added over time)
```

**Naming convention:** lowercase, underscore-separated, action-oriented (`ban_dict_any`, not `dict_str_any_ban`). The name describes the directive's intent.

**Variant file format:** each `vN.md` has a short header annotating why this iteration exists, followed by the directive text:

```markdown
# <directive_name> — v1

**Why this iteration:** Lifted verbatim from `conductor/code_styleguides/python.md` §17.1.
This is the baseline encoding — the imperative-ban style currently in production.
Future variants will test alternative encodings against this baseline.

---

<directive text>
```
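Track A-1 ships no scripts, so nothing enforces these conventions mechanically; the user and the LLM apply them by eye. As a hypothetical illustration of what a future automation track might check, the Python sketch below validates directory names against the lowercase/underscore rule and variant file names against the `v1.md` / `v2_<style>.md` pattern. Two rules go beyond the text and are assumptions: `v1.md` never carries a style suffix, and every v2+ file does. The "action-oriented" part of the convention is a judgment call that no regex can check.

```python
from __future__ import annotations

import re

# Hypothetical validators for the layout above (no such script exists in Track A-1).
# Directive directories: lowercase words joined by underscores, e.g. ban_dict_any.
DIRECTIVE_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
# Variant files: v<N>.md, optionally with an _<style> suffix, e.g. v2_rationale_first.md.
VARIANT_FILE = re.compile(
    r"v(?P<n>[1-9][0-9]*)(?:_(?P<style>[a-z0-9]+(?:_[a-z0-9]+)*))?\.md"
)


def is_valid_directive_name(name: str) -> bool:
    """True if the directory name is lowercase and underscore-separated."""
    return DIRECTIVE_NAME.fullmatch(name) is not None


def parse_variant_filename(filename: str) -> tuple[int, str | None] | None:
    """Return (version, style) for a well-formed variant file name, else None."""
    m = VARIANT_FILE.fullmatch(filename)
    if m is None:
        return None
    version, style = int(m.group("n")), m.group("style")
    # Assumption: the baseline is always plain v1.md ...
    if version == 1 and style is not None:
        return None
    # ... and every alternative encoding names its style (v2_<style>.md, v3_<style>.md).
    if version > 1 and style is None:
        return None
    return version, style
```

For example, `parse_variant_filename("v2_rationale_first.md")` returns `(2, "rationale_first")`, where `rationale_first` is a hypothetical style name. Note that `dict_str_any_ban` also passes `is_valid_directive_name`: the regex checks the character rules only, not whether the name is action-oriented.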
### The preset format

A preset is a markdown bill of materials. It tells the LLM which directive variant files to read for this run, and nothing more.

```markdown
# Preset: current_baseline

The baseline directive composition — all v1 variants lifted from the current
production docs.

## Directives to warm

Read each file below before any action.

- ban_dict_any: conductor/directives/ban_dict_any/v1.md
- ban_optional_returns: conductor/directives/ban_optional_returns/v1.md
- no_local_imports: conductor/directives/no_local_imports/v1.md
- ...

## Notes

All v1 (verbatim lifts from current production docs). No alternative encodings
tested yet. This preset is the control group for future experiments.
```
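In Track A-1 the LLM itself is the parser: it reads the list, then reads each file. To show how little structure that asks of the model, here is a hypothetical Python sketch that extracts the same `(directive, path)` pairs from a preset. It assumes entries follow the `- <name>: <path>` shape used in the example, that only the `## Directives to warm` section counts, and that placeholder lines such as `- ...` are skipped.

```python
from __future__ import annotations

import re

# One bill-of-materials entry, e.g.
#   - ban_dict_any: conductor/directives/ban_dict_any/v1.md
ENTRY = re.compile(r"^- (?P<name>[a-z0-9_]+): (?P<path>\S+\.md)$")


def parse_preset(text: str) -> list[tuple[str, str]]:
    """Return (directive_name, variant_path) pairs listed under '## Directives to warm'."""
    entries: list[tuple[str, str]] = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Only this section is a bill of materials; '## Notes' is free prose.
            in_section = line == "## Directives to warm"
            continue
        if in_section:
            match = ENTRY.match(line)
            if match:  # skips instructions and placeholders such as "- ..."
                entries.append((match.group("name"), match.group("path")))
    return entries
```

The flat-list property is what keeps this trivial: there are no includes or conditionals to resolve, so one pass over the lines recovers the whole composition.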
**Key properties:**

- **Flat list.** No nesting, no conditionals, no includes. The LLM reads the list, then reads the files.
- **Human-readable name.** `current_baseline`, `exploratory_rationale`, `minimal_tokens`: pick by name.
- **Notes section.** Documents the hypothesis being tested. This is the experiment log, kept inline with the preset.
- **Partial swaps.** Swap 2-3 directives to v2 and leave the rest at v1. The preset makes the diff explicit.
- **No script needed.** Author a new preset by copying an existing one and changing variant paths. Hot-swap by telling the LLM which preset to use.
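The "partial swaps" property can be made concrete. Assuming each preset has already been reduced to a `directive name → variant path` mapping (for instance by reading its `## Directives to warm` list), the hypothetical helper below reports exactly which directives an experimental preset changes relative to a baseline; everything it omits is unchanged. This is an illustration only, not Track A-1 tooling.

```python
from __future__ import annotations


def preset_diff(
    baseline: dict[str, str], experiment: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Map each directive whose variant differs to (baseline_path, experiment_path).

    None on one side means the directive is absent from that preset.
    """
    changed: dict[str, tuple[str | None, str | None]] = {}
    for name in sorted(baseline.keys() | experiment.keys()):
        before, after = baseline.get(name), experiment.get(name)
        if before != after:  # unchanged directives are omitted
            changed[name] = (before, after)
    return changed
```

For a preset that swaps only `ban_dict_any` to a hypothetical `v2_rationale_first.md`, the result has a single key, which is the explicit diff the preset format is meant to expose.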
### The role-prompt bootstrap

The 5 role prompts (`.opencode/agents/tier1-orchestrator.md`, `tier2-tech-lead.md`, `tier3-worker.md`, `tier4-qa.md`, and `conductor/tier2/agents/tier2-autonomous.md`) each have a hardcoded "MANDATORY: Pre-Action Required Reading" section listing ~11 specific files. The directive files in that list are replaced with a single `warm with:` line; the non-directive context files stay hardcoded (see "What stays in the role prompt" below).

```markdown
## MANDATORY: Directive Warm-up

warm with: conductor/directives/presets/current_baseline.md

Read the preset file above. It lists directive variant files to read before any action.
Read each file the preset references. These are your active directives for this session.

If the user specifies a different preset (e.g., "warm with: conductor/directives/presets/exploratory_rationale.md"),
use that instead. The user's instruction overrides the default.
```
**Key properties:**

- **One line is the bootstrap.** `warm with: <path>` is the entire mechanism.
- **User override.** The user can write "warm with: <path>" in a session message, and the LLM uses that preset instead of the default. This is the hot-swap: no file editing, just a text instruction.
- **Per-role defaults.** Each tier role prompt can default to a different preset.
- **Non-directive reads remain hardcoded.** Files that aren't tunable directives (e.g., `conductor/tracks/tier2_leak_prevention_20260620/spec.md`, `conductor/tier2/githooks/forbidden-files.txt`) stay as direct references in the role prompt.
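The override rule amounts to a two-step lookup: an instruction in the user's message wins, and otherwise the role prompt's default applies. A hypothetical Python sketch of that precedence follows. It assumes an effective `warm with:` instruction starts its own line, which is what keeps the quoted example inside the role prompt's override sentence from being mistaken for a second default. In a real session the LLM's reading does this work, not a regex.

```python
from __future__ import annotations

import re

# Assumption: an effective instruction is a line that starts with 'warm with:'.
WARM_WITH = re.compile(r"^warm with:\s*(?P<path>\S+\.md)\s*$", re.MULTILINE)


def resolve_preset(role_prompt: str, user_message: str) -> str | None:
    """Return the preset path to warm with; the user's message overrides the role default."""
    for source in (user_message, role_prompt):  # precedence order: user first
        match = WARM_WITH.search(source)
        if match:
            return match.group("path")
    return None  # no preset named anywhere
```

With the role prompt shown above and a plain task message, this returns `conductor/directives/presets/current_baseline.md`; if the user's message opens with a `warm with:` line, that path is returned instead.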
### What stays in the role prompt (not directive-based)

- `AGENTS.md`: project operating rules (contains directives AND non-directive rules)
- `conductor/workflow.md`: operational workflow
- `conductor/edit_workflow.md`: edit tool contract
- `conductor/tier2/githooks/forbidden-files.txt`: file denylist
- The relevant `docs/guide_*.md`: architecture reference

These are context, not tunable directives. They stay hardcoded in the role prompt.
### The directive harvest

The directives are NOT limited to the ~11 files the role prompts mandate; they are scattered across the entire doc tree. The track's first phase is a systematic harvest:

**A directive is any statement that tells the LLM:**

- "Do X" / "Don't do X" (imperative)
- "Use Y instead of Z" (preference)
- "This is BANNED" (hard ban)
- "Follow pattern P" (convention)
- "Never do Q" (anti-pattern)

**NOT a directive:**

- Descriptive prose ("The App class holds GUI state")
- Architecture documentation ("Thread domains are separated by...")
- Reference material ("The 45-tool inventory includes...")

**Sources to comb (non-exhaustive):**

- `AGENTS.md`: "Critical Anti-Patterns", "File Size and Naming Convention", "Session-Learned Anti-Patterns", "Process Anti-Patterns"
- `conductor/workflow.md`: "Code Style", "Guiding Principles", "Testing Requirements", "Known Pitfalls", "Process Anti-Patterns", "Tier 2 Autonomous Sandbox conventions"
- `conductor/product-guidelines.md`: "Core Value", "Code Standards & Architecture", "Data-Oriented Error Handling", "Phase 5: Heavy Curation"
- `conductor/tech-stack.md`: the "Core Value" header
- `conductor/code_styleguides/data_oriented_design.md`: §8.5 "Python Type Promotion Mandate", the 7-question simplification pass, the 10-question self-check
- `conductor/code_styleguides/python.md`: §10 "Anti-OOP Conventions", §17 "LLM Default Anti-Patterns" (the 7 banned patterns)
- `conductor/code_styleguides/error_handling.md`: the Result[T] convention, the AI Agent Checklist
- `conductor/code_styleguides/type_aliases.md`: "When NOT to promote"
- `conductor/code_styleguides/feature_flags.md`: the "delete to turn off" convention
- `conductor/code_styleguides/agent_memory_dimensions.md`: the 4-dimension decision tree
- `conductor/code_styleguides/rag_integration_discipline.md`: the "conservative-RAG rule"
- `conductor/code_styleguides/cache_friendly_context.md`: stable-to-volatile ordering
- `conductor/code_styleguides/knowledge_artifacts.md`: the harvest pattern
- `docs/AGENTS.md`: "Convention Enforcement"
- `docs/Readme.md`: any directive-like content in feature descriptions

**Granularity resolution:** the harvest produces a candidate list, and the decision of which candidates to merge (e.g., `ban_prefix_aliasing` + `no_local_imports` might become `import_hygiene`), split, or keep standalone is made during the harvest phase rather than locked in upfront.
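The directive/non-directive definition above is applied by reading, and the user reviews the resulting candidate list. Purely to illustrate that definition, here is a hypothetical keyword pre-filter in Python that tags a sentence with one of the five directive kinds, or returns `None` for descriptive prose. The cue words are assumptions. A filter like this would miss directives phrased as plain prose (the harvest-completeness risk), so at most it could help rank candidates for human review.

```python
from __future__ import annotations

import re

# Hypothetical cue words for the five directive kinds defined above.
# Checked in order; the first kind whose cue appears wins.
CUES: dict[str, re.Pattern[str]] = {
    "hard_ban": re.compile(r"\bbanned\b", re.IGNORECASE),
    "anti_pattern": re.compile(r"\bnever\b", re.IGNORECASE),
    "preference": re.compile(r"\buse\b.+\binstead of\b", re.IGNORECASE),
    "convention": re.compile(r"\bfollow\b|\bconvention\b", re.IGNORECASE),
    "imperative": re.compile(r"^(?:do|don't|do not|always)\b|\bmust\b", re.IGNORECASE),
}


def classify(sentence: str) -> str | None:
    """Return the directive kind a sentence looks like, or None for descriptive prose."""
    text = sentence.strip()
    for kind, pattern in CUES.items():
        if pattern.search(text):
            return kind
    return None  # e.g. architecture description or reference material
```

Run over the definition's own examples, each directive example maps to its listed kind and each "NOT a directive" example returns `None`; real doc prose would be far noisier.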
### The original docs stay untouched

The `conductor/directives/` tree is a *parallel* structure, not a replacement. The original docs (`python.md`, `error_handling.md`, `AGENTS.md`, etc.) remain the canonical source until a future track deprecates them. The harness is useful immediately (the v1 variants are exact copies), and nothing that currently reads the old docs breaks.
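Because the original docs stay canonical while the v1 variants claim to be exact copies, the two can drift apart unnoticed. A hypothetical Python check for that property takes the directive text after the variant header's `---` separator and confirms it still appears in the source doc. Whitespace is normalized so that re-wrapping a paragraph is not flagged; that tolerance, the helper names, and the idea of running such a check at all are assumptions, since Track A-1 adds no tooling.

```python
from __future__ import annotations


def directive_body(variant_text: str) -> str:
    """Return the directive text that follows the header's '---' separator line."""
    lines = variant_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "---":
            return "\n".join(lines[i + 1 :]).strip()
    raise ValueError("variant file has no '---' separator")


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so re-wrapping a paragraph doesn't count as a change.
    return " ".join(text.split())


def is_verbatim_lift(variant_text: str, source_doc_text: str) -> bool:
    """True if the v1 directive body appears unchanged (modulo whitespace) in its source doc."""
    return _normalize(directive_body(variant_text)) in _normalize(source_doc_text)
```

The first `---` line is taken as the header separator, so a horizontal rule inside the directive text itself does not confuse the split.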
### Why no scripts / TOML

The user explicitly rejected TOML manifests and scripts for this initial version: "no need to systematize that hard when I don't know what's going to work yet." The preset is markdown. The hot-swap is a text instruction. The variant selection is a path in a markdown file. No build steps, no generated files, no tooling dependencies. If the system proves useful, a future track can add automation (auto-generating presets from the directory tree, token-cost analysis per variant, automated compliance testing).
## Scope: Two Parallel Campaigns

The user's request bundles two distinct campaigns that share a theme ("how do you encode information densely for an LLM?") but are tracked and executed independently.

### Campaign A: Directive Hot-Swap Harness (this spec)

**Track A-1 (this):** directive harvest + scaffold + baseline preset + role-prompt bootstrap update. Gets the system working with v1 (current) encodings.

Future tracks in Campaign A:

- Alternative encoding authoring (v2, v3 per directive: the actual experimentation)
- Manual Slop integration (a "Directive Lab" panel for virtualized directive selection)
- Token-cost analysis tooling
- Automated compliance testing
### Campaign B: Video Analysis (4 new videos)

A separate research campaign following the established 3-pass pattern from the previous 12-video campaign (Pass 1: extract → Pass 2: deobfuscate → Pass 3: project to C11/Python). The 4 videos:

1. **Reinventing Entropy | Compression is Intelligence Part 1** (https://youtu.be/l6DKRf-fAAM)
2. **Yann LeCun: World Models: Enabling the next AI revolution** (https://www.youtube.com/watch?v=72Xj8k5WQX4)
3. **Yann LeCun's $1B Bet Against LLMs [Part 1]** (https://youtu.be/kYkIdXwW2AE)
4. **Recursive Self-Improvement** (https://youtu.be/t7_ZXgfJVG8)
### Cross-Campaign Relationship

The two campaigns inform each other but have no hard dependency:

- **The video analysis informs directive encoding.** The entropy/compression video (video 1) is expected to provide theoretical grounding for how information density affects comprehension. LeCun's world-model work (videos 2-3) may inform how we reason about the way LLMs model directive intent. Recursive self-improvement (video 4) bears directly on the meta-question of whether better directive encodings can be discovered iteratively. Insights from the video analysis may surface alternative encoding strategies to test in Campaign A's harness.

- **The harness informs the video analysis.** The previous video campaign produced a lexicon, a C11 reference, and a deobfuscation DSL. The directive harness is itself a compression-aid tool: it encodes the same directive in fewer or different tokens and observes the effect. Its design (preset as bill of materials, variant as alternative encoding) follows the same pattern as the video campaign's deobfuscation pass (same content, different encoding), so the harness may inform how the video analysis encodes its own outputs.

- **Execution order:** the campaigns can run in parallel. Campaign A (Track A-1) is an engineering track; Campaign B is a research track. They don't share files. The cross-pollination is intellectual, not structural.
### The video analysis track structure (Campaign B)

Campaign B follows the established 3-pass pattern from `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`:

- **Pass 1:** Information extraction (4 deep-dive reports, one per video). Uses the existing `scripts/video_analysis/` pipeline (download_video, extract_transcript, extract_keyframes, ocr_frames, synthesize_report). The lexicon v2 from the previous campaign is the starting point for deobfuscation.
- **Pass 2:** Deobfuscation (apply the lexicon v2 to the 4 new videos' content). May produce lexicon v3 corrections if the new videos surface notation the lexicon doesn't cover.
- **Pass 3:** C11/Python projection (project each video's deobfuscated content to code in the user's idiomatic style).

The video analysis track is initialized as a separate conductor track (`video_analysis_campaign_2_20260627` or similar). Its spec and plan are authored separately from this design doc.
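Pass 2 pairs a lookup with a coverage report: known notation is rewritten via the lexicon, and anything the lexicon lacks becomes a candidate for a lexicon v3 correction. The lexicon v2's real format is not described in this doc, so the Python sketch below assumes a plain `notation → plain term` dictionary and, for simplicity, treats backtick-quoted spans as the notation to look up. Both that representation and the `deobfuscate` name are hypothetical; the existing `scripts/video_analysis/` pipeline is not modeled here.

```python
from __future__ import annotations

import re


def deobfuscate(text: str, lexicon: dict[str, str]) -> tuple[str, set[str]]:
    """Replace known notation with plain terms; return (rewritten text, uncovered tokens).

    Hypothetical: notation is approximated as backtick-quoted tokens such as `H(X)`.
    Uncovered tokens are candidates for lexicon v3 corrections.
    """
    uncovered: set[str] = set()

    def substitute(match: re.Match[str]) -> str:
        token = match.group(1)
        if token in lexicon:
            return lexicon[token]
        uncovered.add(token)
        return match.group(0)  # leave unknown notation untouched for review

    return re.sub(r"`([^`]+)`", substitute, text), uncovered
```

With a one-entry lexicon mapping `H(X)` to "the entropy of X", the input "Compression tracks `H(X)` and `K(x)`." comes back as "Compression tracks the entropy of X and `K(x)`.", with `{"K(x)"}` reported as uncovered.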
## Out of Scope (for Track A-1)

- **Authoring alternative encodings (v2+).** This track only creates v1 (verbatim lifts). The experimentation is a future activity.
- **Deprecating the original docs.** The old docs stay as canonical source.
- **Scripts for preset generation or variant selection.** No automation in this version.
- **Manual Slop GUI integration.** The harness is OpenCode-only for now.
- **Token-cost analysis.** No tooling to measure token cost per variant in this version.
- **Automated compliance testing.** No test harness to measure LLM compliance per encoding.
- **The 4-video analysis (Campaign B).** Separate track, separate campaign. This design doc specifies Campaign A (the harness) and only outlines Campaign B's scope. The video analysis gets its own track spec.
## Risks

1. **Harvest completeness.** The directive harvest might miss directives embedded in prose. Mitigation: systematic combing of the doc tree, plus user review of the candidate list before variants are created.
2. **Granularity ambiguity.** Some directives overlap (e.g., "ban dict[str, Any]" and "use typed dataclass fields" are two sides of the same coin). Mitigation: the harvest phase produces a candidate list; granularity is resolved there, not upfront.
3. **Role-prompt drift.** The 5 role prompts need to be updated consistently. Mitigation: the only change to each prompt is swapping the directive portion of its required-reading list for the same `warm with:` block; the rest of each role prompt is untouched.
4. **Adoption friction.** LLMs might not follow the `warm with:` instruction reliably. Mitigation: the instruction is simple (read a file, then read the files it lists) and relies on the file-reading behavior the LLMs already have.
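The role-prompt drift mitigation could be spot-checked mechanically in a later track. The hypothetical Python sketch below flags a role prompt that lacks the `## MANDATORY: Directive Warm-up` heading or that does not contain exactly one default `warm with:` line; the quoted example inside the override sentence does not start a line, so it is not counted. The function name and the exactly-one rule are assumptions.

```python
from __future__ import annotations

import re

HEADING = "## MANDATORY: Directive Warm-up"
# Only lines that *start* with 'warm with:' count as the default instruction.
DEFAULT_LINE = re.compile(r"^warm with: \S+\.md\s*$", re.MULTILINE)


def role_prompt_problems(text: str) -> list[str]:
    """Return drift problems found in one role prompt; an empty list means consistent."""
    problems: list[str] = []
    if HEADING not in text:
        problems.append("missing warm-up heading")
    count = len(DEFAULT_LINE.findall(text))
    if count != 1:
        problems.append(f"expected 1 default 'warm with:' line, found {count}")
    return problems
```

Applied to each of the five role prompts, an empty result everywhere would indicate the update landed uniformly; any non-empty result names the prompt that was missed or edited inconsistently.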
## See Also

- `conductor/tier2/agents/tier2-autonomous.md`: the role prompt that will be updated with `warm with:`
- `conductor/tier2/commands/tier-2-auto-execute.md`: the slash command template
- `conductor/code_styleguides/python.md` §17: the primary source of directives to harvest
- `conductor/code_styleguides/error_handling.md`: the Result[T] convention to harvest
- `AGENTS.md` "Critical Anti-Patterns": the hard bans to harvest
- `docs/guide_meta_boundary.md`: the meta-tooling / application distinction (relevant to why this harness lives in the meta-tooling domain)
- `docs/reports/2026-06-15/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`: the previous video campaign's closeout (the pattern Campaign B follows)
- `scripts/video_analysis/`: the existing video analysis pipeline (Campaign B reuses this)