conductor: Encode surgical spec methodology into Tier 1 skills for Claude and Gemini

Distills what made this session's track specs high-quality into reusable methodology for both Claude and Gemini Tier 1 orchestrators:

Key additions to conductor-new-track.md:
- MANDATORY Step 2: Deep Codebase Audit before writing any spec
- 'Current State Audit' section template (Already Implemented + Gaps)
- 6 rules for writing worker-ready tasks (WHERE/WHAT/HOW/SAFETY)
- Anti-patterns section (vague specs, no line refs, no audit, etc.)
- Architecture doc fallback references

Key additions to mma-tier1-orchestrator.md (Claude + Gemini):
- 'The Surgical Methodology' section with 6 protocols
- Spec template with REQUIRED sections (Current State Audit is mandatory)
- Plan template with REQUIRED task format (file:line refs + API calls)
- Root cause analysis requirement for fix tracks
- Cross-track dependency mapping requirement
- Added py_get_definition to Gemini's tool list (was missing)

The core insight: the quality gap between this session's output and previous track specs came from (1) reading actual code before writing specs, (2) listing what EXISTS before what's MISSING, and (3) specifying exact locations and APIs in tasks so lesser models don't have to search or guess.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore(conductor): Add index.md to new tracks, archive completed/superseded tracks
2026-03-01 10:08:25 -05:00 · 2026-03-01 10:00:49 -05:00
14 changed files with 243 additions and 34 deletions
@@ -5,10 +5,17 @@ description: Initialize a new conductor track with spec, plan, and metadata
 # /conductor-new-track
 Create a new track in the conductor system. This is a Tier 1 (Orchestrator) operation.
 The quality of the spec and plan directly determines whether Tier 3 workers can execute
 without confusion. Vague specs produce vague implementations.
 ## Prerequisites
 - Read `conductor/product.md` and `conductor/product-guidelines.md` for product alignment
 - Read `conductor/tech-stack.md` for technology constraints
 - Consult architecture docs in `docs/` when the track touches core systems:
  - `docs/guide_architecture.md`: Threading, events, AI client, HITL mechanism
  - `docs/guide_tools.md`: MCP tools, Hook API, ApiHookClient
  - `docs/guide_mma.md`: Tickets, tracks, DAG engine, worker lifecycle
  - `docs/guide_simulations.md`: Test framework, mock provider, verification patterns
 ## Steps
@@ -19,13 +26,34 @@ Ask the user for:
 - **Description**: one-line summary
 - **Requirements**: functional requirements for the spec
-### 2. Create Track Directory
+### 2. MANDATORY: Deep Codebase Audit
 **This step is what separates useful specs from useless ones.**
 Before writing a single line of spec, you MUST audit the actual codebase to understand
 what already exists. Use the Research-First Protocol:
 1. **Map the target area**: Use `py_get_code_outline` on every file the track will touch.
   Identify existing functions, classes, and their line ranges.
 2. **Read key implementations**: Use `py_get_definition` on functions that are relevant
   to the track's goals. Understand their signatures, data structures, and control flow.
 3. **Search for existing work**: Use `Grep` to find symbols, patterns, or partial
   implementations that may already address some requirements.
 4. **Check recent changes**: Use `get_git_diff` on target files to understand what's
   been modified recently and by which tracks.
 **Output of this step**: A "Current State Audit" section listing:
 - What already exists (with file:line references)
 - What's missing (the actual gaps this track fills)
 - What's partially implemented and needs enhancement
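As an illustration of the "map the target area" step, here is a minimal sketch that builds file:line entries with Python's standard `ast` module. It is a stand-in for what a tool like `py_get_code_outline` might return, not that tool's implementation; the `outline` function and the `SAMPLE` source are hypothetical.

```python
import ast

def outline(source: str, filename: str = "<source>") -> list[str]:
    """List every class and function as 'file:start-end kind name'."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            text = f"{filename}:{node.lineno}-{node.end_lineno} {kind} {node.name}"
            entries.append((node.lineno, text))
    # Sort by starting line so the outline reads top to bottom.
    return [text for _, text in sorted(entries)]

# Hypothetical source file used only for the demonstration.
SAMPLE = '''\
class Dashboard:
    def render(self):
        return "ok"

def helper():
    pass
'''

for entry in outline(SAMPLE, "example.py"):
    print(entry)
```

Entries in this shape can be pasted directly into the "Already Implemented" list of a Current State Audit.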
 ### 3. Create Track Directory
 ```
 conductor/tracks/{track_name}_{YYYYMMDD}/
 ```
 Use today's date in YYYYMMDD format.
-### 3. Create metadata.json
+### 4. Create metadata.json
 ```json
 {
  "track_id": "{track_name}_{YYYYMMDD}",
@@ -37,63 +65,109 @@ Use today's date in YYYYMMDD format.
 }
 ```
-### 4. Create index.md
+### 5. Create index.md
 ```markdown
-# Track: {Track Title}
+# Track {track_name}_{YYYYMMDD} Context
-- [Specification](spec.md)
+- [Specification](./spec.md)
-- [Implementation Plan](plan.md)
+- [Implementation Plan](./plan.md)
 - [Metadata](./metadata.json)
 ```
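The directory naming and index.md layout above could be scripted roughly as follows. This is a sketch under assumptions: `track_id` and `write_index` are hypothetical helper names, and metadata.json is left out because only part of its schema is visible in this diff.

```python
from datetime import date
from pathlib import Path
import tempfile

def track_id(track_name: str, today: date) -> str:
    """Build '{track_name}_{YYYYMMDD}' as described in the track directory step."""
    return f"{track_name}_{today:%Y%m%d}"

def write_index(root: Path, track_name: str, today: date) -> Path:
    """Create conductor/tracks/<track_id>/index.md with the three standard links."""
    tid = track_id(track_name, today)
    track_dir = root / "conductor" / "tracks" / tid
    track_dir.mkdir(parents=True, exist_ok=True)
    index = track_dir / "index.md"
    index.write_text(
        f"# Track {tid} Context\n"
        "- [Specification](./spec.md)\n"
        "- [Implementation Plan](./plan.md)\n"
        "- [Metadata](./metadata.json)\n",
        encoding="utf-8",
    )
    return index

# Demonstration in a throwaway directory so nothing touches a real repository.
with tempfile.TemporaryDirectory() as tmp:
    created = write_index(Path(tmp), "mma_pipeline_fix", date(2026, 3, 1))
    INDEX_TEXT = created.read_text(encoding="utf-8")
    print(INDEX_TEXT)
```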
-### 5. Create spec.md
+### 6. Create spec.md — The Surgical Specification
 The spec MUST include these sections:
 ```markdown
-# {Track Title} — Specification
+# Track Specification: {Title}
 ## Overview
-{Description of what this track delivers}
+{What this track delivers and WHY — 2-3 sentences max}
-## Functional Requirements
+## Current State Audit (as of {latest_commit_sha})
-1. {Requirement from user input}
+### Already Implemented (DO NOT re-implement)
 - **{Feature}** (`{function_name}`, {file}:{lines}): {what it does}
 - ...
 ### Gaps to Fill (This Track's Scope)
 1. **{Gap}**: {What's missing, with reference to where it should go}
 2. ...
-## Non-Functional Requirements
+## Goals
-- Performance: {if applicable}
+{Numbered list — crisp, no fluff}
 - Testing: >80% coverage for new code
-## Acceptance Criteria
+## Functional Requirements
-- [ ] {Criterion 1}
+### {Requirement Group}
-- [ ] {Criterion 2}
+- {Specific requirement referencing actual data structures, function names, dict keys}
 - ...
 ## Non-Functional Requirements
 - Thread safety constraints (reference guide_architecture.md if applicable)
 - Performance targets
 - No new dependencies unless justified
 ## Architecture Reference
 - {Link to relevant docs/guide_*.md section}
 ## Out of Scope
-- {Explicitly excluded items}
+- {Explicit exclusions}
 ## Context
 - Tech stack: see `conductor/tech-stack.md`
 - Product guidelines: see `conductor/product-guidelines.md`
 ```
-### 6. Create plan.md
+**Critical rules for specs:**
 - NEVER describe a feature to implement without first checking if it exists
 - ALWAYS include the "Current State Audit" section with line references
 - ALWAYS link to relevant architecture docs
 - Reference actual variable names, dict keys, and class names from the codebase
 ### 7. Create plan.md — The Surgical Plan
 Each task must be specific enough that a Tier 3 worker on a lightweight model
 can execute it without needing to understand the overall architecture.
 ```markdown
-# {Track Title} — Implementation Plan
+# Implementation Plan: {Title}
 Architecture reference: [docs/guide_architecture.md](../../docs/guide_architecture.md)
 ## Phase 1: {Phase Name}
-- [ ] Task: {Description}
+Focus: {One-sentence scope}
 - [ ] Task: {Description}
-## Phase 2: {Phase Name}
+- [ ] Task 1.1: {SURGICAL description — see rules below}
-- [ ] Task: {Description}
+- [ ] Task 1.2: ...
 - [ ] Task 1.N: Write tests for {what Phase 1 changed}
 - [ ] Task 1.X: Conductor - User Manual Verification (Protocol in workflow.md)
 ```
-Break requirements into phases with 2-5 tasks each. Each task should be a single atomic unit of work suitable for a Tier 3 Worker.
+**Rules for writing tasks:**
-### 7. Update Track Registry
+1. **Reference exact locations**: "In `_render_mma_dashboard` (gui_2.py:2700-2701)"
-If `conductor/tracks.md` exists, add the new track entry.
+   not "in the dashboard."
 2. **Specify the API**: "Use `imgui.progress_bar(value, ImVec2(-1, 0), label)`"
   not "add a progress bar."
 3. **Name the data**: "Read from `self.mma_streams` dict, keys prefixed with `'Tier 3'`"
   not "display the streams."
 4. **Describe the change shape**: "Replace the single text box with four collapsible sections"
   not "improve the display."
 5. **State thread safety**: "Push via `_pending_gui_tasks` with lock" when the task
   involves cross-thread data.
 6. **For bug fixes**: List specific root cause candidates with code-level reasoning,
   not "investigate and fix."
 7. **Each phase ends with**: A test task and a verification task.
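A rough way to check tasks against the first two rules above is a small lint pass. The sketch below is an assumption about how such a check might look, not part of the conductor tooling: `lint_task` and both regular expressions are hypothetical, and they only detect the presence of a file:line reference and a backticked identifier.

```python
import re

# Matches references such as gui_2.py:2700 or gui_2.py:2700-2701.
FILE_LINE_REF = re.compile(r"\b[\w./-]+\.\w+:\d+(?:-\d+)?\b")
# Matches any inline code span, e.g. `imgui.progress_bar(...)`.
CODE_SPAN = re.compile(r"`[^`]+`")

def lint_task(task: str) -> list[str]:
    """Return the reasons a task description is not yet worker-ready."""
    problems = []
    if not FILE_LINE_REF.search(task):
        problems.append("no file:line reference")
    if not CODE_SPAN.search(task):
        problems.append("no backticked identifier or API call")
    return problems

GOOD = ("In `_render_mma_dashboard` (gui_2.py:2700-2701), use "
        "`imgui.progress_bar(value, ImVec2(-1, 0), label)`")
BAD = "Add a progress bar to the dashboard."

print(lint_task(GOOD))
print(lint_task(BAD))
```

A check like this cannot judge whether the referenced lines are correct; it only flags tasks that omit a location or API name entirely.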
 ### 8. Commit
 ```
 conductor(track): Initialize track '{track_name}'
 ```
 ## Anti-Patterns (DO NOT do these)
 - **Spec that describes features without checking if they exist** → produces duplicate work
 - **Task that says "implement X" without saying WHERE or HOW** → worker guesses wrong
 - **Plan with no line references** → worker wastes tokens searching
 - **Spec with no architecture doc links** → worker misunderstands threading/data model
 - **Tasks scoped too broadly** → worker tries to do too much, fails
 - **No "Current State Audit"** → entire track may be re-implementing existing code
 ## Important
 - Do NOT start implementing — track initialization only
 - Implementation is done via `/conductor-implement`
@@ -9,16 +9,63 @@ STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator. Focused on product align
 ## Primary Context Documents
 Read at session start: `conductor/product.md`, `conductor/product-guidelines.md`
 ## Architecture Fallback
 When planning tracks that touch core systems, consult the deep-dive docs:
 - `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism, frame-sync action catalog
 - `docs/guide_tools.md`: MCP Bridge security, 26-tool inventory, Hook API endpoints, ApiHookClient
 - `docs/guide_mma.md`: Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle
 - `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider, verification patterns
 ## Responsibilities
 - Maintain alignment with the product guidelines and definition
-- Define track boundaries and initialize new tracks (`/conductor:newTrack`)
+- Define track boundaries and initialize new tracks (`/conductor-new-track`)
-- Set up the project environment (`/conductor:setup`)
+- Set up the project environment (`/conductor-setup`)
 - Delegate track execution to the Tier 2 Tech Lead
 ## The Surgical Methodology
 When creating or refining tracks, follow this protocol to produce specs that
 lesser-reasoning models can execute without confusion:
 ### 1. Audit Before Specifying
 NEVER write a spec without first reading the actual code. Use `py_get_code_outline`,
 `py_get_definition`, `Grep`, and `get_git_diff` to build a map of what exists.
 Document existing implementations with file:line references in a "Current State Audit"
 section. This prevents specs that ask to re-implement existing features.
 ### 2. Identify Gaps, Not Features
 The spec should focus on what's MISSING, not what the track "will build."
 Frame requirements as: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724)
 has a token usage table but no cost estimation column. Add cost tracking."
 Not: "Build a metrics dashboard with token and cost tracking."
 ### 3. Write Worker-Ready Tasks
 Each task in the plan must be executable by a Tier 3 worker on a lightweight model
 (gemini-2.5-flash-lite) without needing to understand the overall architecture.
 This means every task must specify:
 - **WHERE**: Exact file and line range to modify
 - **WHAT**: The specific change (add function, modify dict, extend table)
 - **HOW**: Which API calls, data structures, or patterns to use
 - **SAFETY**: Thread-safety constraints if cross-thread data is involved
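One way to keep the four fields from being forgotten is to model a task as a small record and render the checklist line from it. This is a sketch only; the `WorkerTask` class and its rendering format are hypothetical, and the example values are borrowed from the task-writing rules in conductor-new-track.md.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerTask:
    number: str       # e.g. "1.1"
    where: str        # exact function plus file:line range
    what: str         # the specific change
    how: str          # API call, data structure, or pattern to use
    safety: str = ""  # thread-safety constraint, if any

    def render(self) -> str:
        """Render the task as a plan.md checklist line."""
        parts = [f"In {self.where}: {self.what}.", f"Use {self.how}."]
        if self.safety:
            parts.append(f"Thread safety: {self.safety}.")
        return f"- [ ] Task {self.number}: " + " ".join(parts)

TASK_LINE = WorkerTask(
    number="1.1",
    where="`_render_mma_dashboard` (gui_2.py:2700-2701)",
    what="add a progress bar",
    how="`imgui.progress_bar(value, ImVec2(-1, 0), label)`",
).render()
print(TASK_LINE)
```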
 ### 4. Reference Architecture Docs
 Every spec should link to the relevant `docs/guide_*.md` section so implementing
 agents have a fallback when confused about threading, data flow, or module interactions.
 ### 5. Map Dependencies
 Explicitly state which tracks must complete before this one, and which tracks
 this one blocks. Include execution order in the spec.
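Execution order can be derived mechanically once blockers are written down. The sketch below uses the standard-library `graphlib` module; the `blocked_by` mapping, the `execution_order` helper, and the placeholder track names are all hypothetical, since this diff does not show how dependencies are stored.

```python
from graphlib import CycleError, TopologicalSorter

def execution_order(blocked_by: dict[str, list[str]]) -> list[str]:
    """Return tracks ordered so that every blocker comes before the track it blocks."""
    return list(TopologicalSorter(blocked_by).static_order())

# Placeholder tracks: track_c waits on track_b, which waits on track_a.
ORDER = execution_order({"track_c": ["track_b"], "track_b": ["track_a"], "track_a": []})
print(ORDER)

# A dependency cycle means the tracks cannot be scheduled at all.
try:
    execution_order({"track_a": ["track_b"], "track_b": ["track_a"]})
    CYCLE_DETECTED = False
except CycleError:
    CYCLE_DETECTED = True
print(CYCLE_DETECTED)
```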
 ### 6. Root Cause Analysis (for fix tracks)
 Don't write "investigate and fix X." Instead, read the code, trace the data flow,
 and list specific root cause candidates with code-level reasoning:
 "Candidate 1: `_queue_put` (line 138) uses `asyncio.run_coroutine_threadsafe` but
 the `else` branch uses `put_nowait` which is NOT thread-safe from a thread-pool thread."
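The candidate quoted above rests on a general asyncio fact: `asyncio.Queue` is not thread-safe, so a non-loop thread has to hand work to the event loop instead of calling `put_nowait` directly. The following sketch shows the two standard ways to do that. It is a generic illustration, not the project's `_queue_put`; the `worker` and `main` functions are hypothetical.

```python
import asyncio

async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def worker() -> None:
        # Runs on a thread-pool thread, so it must not touch the queue directly.
        # Option 1: schedule the synchronous put on the loop's own thread.
        loop.call_soon_threadsafe(queue.put_nowait, "via call_soon_threadsafe")
        # Option 2: submit the coroutine to the loop and wait for it to finish.
        future = asyncio.run_coroutine_threadsafe(
            queue.put("via run_coroutine_threadsafe"), loop
        )
        future.result(timeout=5)

    # to_thread keeps the event loop free to process the scheduled callbacks.
    await asyncio.to_thread(worker)
    return [queue.get_nowait(), queue.get_nowait()]

RESULT = asyncio.run(main())
print(RESULT)
```

Blocking the loop thread while the worker waits on `future.result()` would stall the hand-off, which is why the sketch awaits the worker rather than joining a thread synchronously.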
 ## Limitations
 - Read-only tools only: Read, Glob, Grep, WebFetch, WebSearch, Bash (read-only ops)
 - Do NOT execute tracks or implement features
-- Do NOT write code or edit files
+- Do NOT write code or edit files (except track spec/plan/metadata)
 - Do NOT perform low-level bug fixing
 - Keep context strictly focused on product definitions and high-level strategy
 - To delegate track execution: instruct the human operator to run:
@@ -21,7 +21,80 @@ tools:
  - discovered_tool_py_get_hierarchy
  - discovered_tool_py_get_docstring
  - discovered_tool_get_tree
  - discovered_tool_py_get_definition
 ---
 STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator.
 Focused on product alignment, high-level planning, and track initialization.
 ONLY output the requested text. No pleasantries.
 ## Architecture Fallback
 When planning tracks that touch core systems, consult the deep-dive docs:
 - `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism, frame-sync action catalog
 - `docs/guide_tools.md`: MCP Bridge security, 26-tool inventory, Hook API endpoints, ApiHookClient
 - `docs/guide_mma.md`: Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle
 - `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider, verification patterns
 ## The Surgical Methodology
 When creating or refining tracks, you MUST follow this protocol:
 ### 1. MANDATORY: Audit Before Specifying
 NEVER write a spec without first reading the actual code using your tools.
 Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff`
 to build a map of what exists. Document existing implementations with file:line
 references in a "Current State Audit" section in the spec.
 **WHY**: Previous track specs asked to implement features that already existed
 (Track Browser, DAG tree, approval dialogs) because no code audit was done first.
 This wastes entire implementation phases.
 ### 2. Identify Gaps, Not Features
 Frame requirements around what's MISSING relative to what exists:
 GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token
 usage table but no cost estimation column."
 BAD: "Build a metrics dashboard with token and cost tracking."
 ### 3. Write Worker-Ready Tasks
 Each plan task must be executable by a Tier 3 worker on gemini-2.5-flash-lite
 without understanding the overall architecture. Every task specifies:
 - **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
 - **WHAT**: The specific change (add function, modify dict, extend table)
 - **HOW**: Which API calls or patterns (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
 - **SAFETY**: Thread-safety constraints if cross-thread data is involved
 ### 4. For Bug Fix Tracks: Root Cause Analysis
 Don't write "investigate and fix." Read the code, trace the data flow, list
 specific root cause candidates with code-level reasoning.
 ### 5. Reference Architecture Docs
 Link to relevant `docs/guide_*.md` sections in every spec so implementing
 agents have a fallback for threading, data flow, or module interactions.
 ### 6. Map Dependencies Between Tracks
 State execution order and blockers explicitly in metadata.json and spec.
 ## Spec Template (REQUIRED sections)
 ```
 # Track Specification: {Title}
 ## Overview
 ## Current State Audit (as of {commit_sha})
 ### Already Implemented (DO NOT re-implement)
 ### Gaps to Fill (This Track's Scope)
 ## Goals
 ## Functional Requirements
 ## Non-Functional Requirements
 ## Architecture Reference
 ## Out of Scope
 ```
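The required-sections rule lends itself to an automated check. Here is a minimal sketch, assuming specs are Markdown files whose section headings use `##` or `###`; the `missing_sections` helper is hypothetical and matches headings by prefix so that "Current State Audit (as of abc123)" still counts.

```python
import re

REQUIRED_HEADINGS = [
    "Overview",
    "Current State Audit",
    "Already Implemented",
    "Gaps to Fill",
    "Goals",
    "Functional Requirements",
    "Non-Functional Requirements",
    "Architecture Reference",
    "Out of Scope",
]

def missing_sections(spec_text: str) -> list[str]:
    """Return required headings that do not appear as ## or ### headings."""
    headings = [
        match.group(1).strip()
        for match in re.finditer(r"^#{2,3}\s+(.*)$", spec_text, re.MULTILINE)
    ]
    return [
        required
        for required in REQUIRED_HEADINGS
        if not any(heading.startswith(required) for heading in headings)
    ]

# Hypothetical incomplete spec used only for the demonstration.
DRAFT = "# Track Specification: Demo\n## Overview\ntext\n## Goals\n1. One\n"
print(missing_sections(DRAFT))
```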
 ## Plan Template (REQUIRED format)
 ```
 ## Phase N: {Name}
 Focus: {One-sentence scope}
 - [ ] Task N.1: {Surgical description with file:line refs and API calls}
 - [ ] Task N.2: ...
 - [ ] Task N.N: Write tests for Phase N changes
 - [ ] Task N.X: Conductor - User Manual Verification (Protocol in workflow.md)
 ```
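The plan format can be checked in the same spirit. The sketch below verifies only the closing rule, that each phase ends with a test task followed by the manual verification task; `phase_problems` and both patterns are hypothetical and assume the exact checkbox and heading syntax shown in the template.

```python
import re

PHASE = re.compile(r"^## Phase \S+: ", re.MULTILINE)
TASK = re.compile(r"^- \[[ x]\] Task [^:]+: (.*)$", re.MULTILINE)

def phase_problems(plan_text: str) -> list[str]:
    """Flag phases whose last two tasks are not a test task and a verification task."""
    problems = []
    # Everything before the first phase heading is preamble, so skip it.
    for index, chunk in enumerate(PHASE.split(plan_text)[1:], start=1):
        tasks = TASK.findall(chunk)
        ends_correctly = (
            len(tasks) >= 2
            and tasks[-2].startswith("Write tests")
            and "User Manual Verification" in tasks[-1]
        )
        if not ends_correctly:
            problems.append(f"phase {index}: must end with a test task and a verification task")
    return problems

GOOD_PLAN = (
    "## Phase 1: Setup\n"
    "Focus: demo scope\n"
    "- [ ] Task 1.1: In `f` (a.py:1-2), call `g()`\n"
    "- [ ] Task 1.2: Write tests for Phase 1 changes\n"
    "- [ ] Task 1.3: Conductor - User Manual Verification (Protocol in workflow.md)\n"
)
BAD_PLAN = "## Phase 1: Setup\n- [ ] Task 1.1: Do something\n"

print(phase_problems(GOOD_PLAN))
print(phase_problems(BAD_PLAN))
```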
@@ -0,0 +1,5 @@
 # Track context_token_viz_20260301 Context
 - [Specification](./spec.md)
 - [Implementation Plan](./plan.md)
 - [Metadata](./metadata.json)
@@ -0,0 +1,5 @@
 # Track mma_pipeline_fix_20260301 Context
 - [Specification](./spec.md)
 - [Implementation Plan](./plan.md)
 - [Metadata](./metadata.json)
@@ -0,0 +1,5 @@
 # Track simulation_hardening_20260301 Context
 - [Specification](./spec.md)
 - [Implementation Plan](./plan.md)
 - [Metadata](./metadata.json)