conductor: Encode surgical spec methodology into Tier 1 skills for Claude and Gemini

Distills what made this session's track specs high-quality into reusable methodology for both Claude and Gemini Tier 1 orchestrators:

Key additions to conductor-new-track.md:
- MANDATORY Step 2: Deep Codebase Audit before writing any spec
- 'Current State Audit' section template (Already Implemented + Gaps)
- 6 rules for writing worker-ready tasks (WHERE/WHAT/HOW/SAFETY)
- Anti-patterns section (vague specs, no line refs, no audit, etc.)
- Architecture doc fallback references

Key additions to mma-tier1-orchestrator.md (Claude + Gemini):
- 'The Surgical Methodology' section with 6 protocols
- Spec template with REQUIRED sections (Current State Audit is mandatory)
- Plan template with REQUIRED task format (file:line refs + API calls)
- Root cause analysis requirement for fix tracks
- Cross-track dependency mapping requirement
- Added py_get_definition to Gemini's tool list (was missing)

The core insight: the quality gap between this session's output and previous track specs came from (1) reading actual code before writing specs, (2) listing what EXISTS before what's MISSING, and (3) specifying exact locations and APIs in tasks so lesser models don't have to search or guess.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 10:08:25 -05:00
parent 458529fb13
commit 52a463d13f
3 changed files with 228 additions and 34 deletions
@@ -5,10 +5,17 @@ description: Initialize a new conductor track with spec, plan, and metadata
 # /conductor-new-track

 Create a new track in the conductor system. This is a Tier 1 (Orchestrator) operation.
+The quality of the spec and plan directly determines whether Tier 3 workers can execute
+without confusion. Vague specs produce vague implementations.

 ## Prerequisites
 - Read `conductor/product.md` and `conductor/product-guidelines.md` for product alignment
 - Read `conductor/tech-stack.md` for technology constraints
+- Consult architecture docs in `docs/` when the track touches core systems:
+  - `docs/guide_architecture.md`: Threading, events, AI client, HITL mechanism
+  - `docs/guide_tools.md`: MCP tools, Hook API, ApiHookClient
+  - `docs/guide_mma.md`: Tickets, tracks, DAG engine, worker lifecycle
+  - `docs/guide_simulations.md`: Test framework, mock provider, verification patterns

 ## Steps

@@ -19,13 +26,34 @@ Ask the user for:
 - **Description**: one-line summary
 - **Requirements**: functional requirements for the spec

-### 2. Create Track Directory
+### 2. MANDATORY: Deep Codebase Audit
+
+**This step is what separates useful specs from useless ones.**
+
+Before writing a single line of spec, you MUST audit the actual codebase to understand
+what already exists. Use the Research-First Protocol:
+
+1. **Map the target area**: Use `py_get_code_outline` on every file the track will touch.
+   Identify existing functions, classes, and their line ranges.
+2. **Read key implementations**: Use `py_get_definition` on functions that are relevant
+   to the track's goals. Understand their signatures, data structures, and control flow.
+3. **Search for existing work**: Use `Grep` to find symbols, patterns, or partial
+   implementations that may already address some requirements.
+4. **Check recent changes**: Use `get_git_diff` on target files to understand what's
+   been modified recently and by which tracks.
+
+**Output of this step**: A "Current State Audit" section listing:
+- What already exists (with file:line references)
+- What's missing (the actual gaps this track fills)
+- What's partially implemented and needs enhancement
+
+### 3. Create Track Directory
 ```
 conductor/tracks/{track_name}_{YYYYMMDD}/
 ```
 Use today's date in YYYYMMDD format.

-### 3. Create metadata.json
+### 4. Create metadata.json
 ```json
 {
  "track_id": "{track_name}_{YYYYMMDD}",
@@ -37,63 +65,109 @@ Use today's date in YYYYMMDD format.
 }
 ```

-### 4. Create index.md
+### 5. Create index.md
 ```markdown
-# Track: {Track Title}
+# Track {track_name}_{YYYYMMDD} Context

- [Specification](spec.md)
- [Implementation Plan](plan.md)
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Metadata](./metadata.json)
 ```

-### 5. Create spec.md
+### 6. Create spec.md — The Surgical Specification
+
+The spec MUST include these sections:
+
 ```markdown
-# {Track Title} — Specification
+# Track Specification: {Title}

 ## Overview
-{Description of what this track delivers}
+{What this track delivers and WHY — 2-3 sentences max}

-## Functional Requirements
-1. {Requirement from user input}
+## Current State Audit (as of {latest_commit_sha})
+### Already Implemented (DO NOT re-implement)
+- **{Feature}** (`{function_name}`, {file}:{lines}): {what it does}
+- ...
+
+### Gaps to Fill (This Track's Scope)
+1. **{Gap}**: {What's missing, with reference to where it should go}
 2. ...

-## Non-Functional Requirements
- Performance: {if applicable}
- Testing: >80% coverage for new code
+## Goals
+{Numbered list — crisp, no fluff}

-## Acceptance Criteria
- [ ] {Criterion 1}
- [ ] {Criterion 2}
+## Functional Requirements
+### {Requirement Group}
+- {Specific requirement referencing actual data structures, function names, dict keys}
+- ...
+
+## Non-Functional Requirements
+- Thread safety constraints (reference guide_architecture.md if applicable)
+- Performance targets
+- No new dependencies unless justified
+
+## Architecture Reference
+- {Link to relevant docs/guide_*.md section}

 ## Out of Scope
- {Explicitly excluded items}
-
-## Context
- Tech stack: see `conductor/tech-stack.md`
- Product guidelines: see `conductor/product-guidelines.md`
+- {Explicit exclusions}
 ```

-### 6. Create plan.md
+**Critical rules for specs:**
+- NEVER describe a feature to implement without first checking if it exists
+- ALWAYS include the "Current State Audit" section with line references
+- ALWAYS link to relevant architecture docs
+- Reference actual variable names, dict keys, and class names from the codebase
+
+### 7. Create plan.md — The Surgical Plan
+
+Each task must be specific enough that a Tier 3 worker on a lightweight model
+can execute it without needing to understand the overall architecture.
+
 ```markdown
-# {Track Title} — Implementation Plan
+# Implementation Plan: {Title}
+
+Architecture reference: [docs/guide_architecture.md](../../docs/guide_architecture.md)

 ## Phase 1: {Phase Name}
- [ ] Task: {Description}
- [ ] Task: {Description}
+Focus: {One-sentence scope}

-## Phase 2: {Phase Name}
- [ ] Task: {Description}
+- [ ] Task 1.1: {SURGICAL description — see rules below}
+- [ ] Task 1.2: ...
+- [ ] Task 1.N: Write tests for {what Phase 1 changed}
+- [ ] Task 1.X: Conductor - User Manual Verification (Protocol in workflow.md)
 ```

-Break requirements into phases with 2-5 tasks each. Each task should be a single atomic unit of work suitable for a Tier 3 Worker.
+**Rules for writing tasks:**

-### 7. Update Track Registry
-If `conductor/tracks.md` exists, add the new track entry.
+1. **Reference exact locations**: "In `_render_mma_dashboard` (gui_2.py:2700-2701)"
+   not "in the dashboard."
+2. **Specify the API**: "Use `imgui.progress_bar(value, ImVec2(-1, 0), label)`"
+   not "add a progress bar."
+3. **Name the data**: "Read from `self.mma_streams` dict, keys prefixed with `'Tier 3'`"
+   not "display the streams."
+4. **Describe the change shape**: "Replace the single text box with four collapsible sections"
+   not "improve the display."
+5. **State thread safety**: "Push via `_pending_gui_tasks` with lock" when the task
+   involves cross-thread data.
+6. **For bug fixes**: List specific root cause candidates with code-level reasoning,
+   not "investigate and fix."
+7. **Each phase ends with**: A test task and a verification task.

 ### 8. Commit
 ```
 conductor(track): Initialize track '{track_name}'
 ```

+## Anti-Patterns (DO NOT do these)
+
+- **Spec that describes features without checking if they exist** → produces duplicate work
+- **Task that says "implement X" without saying WHERE or HOW** → worker guesses wrong
+- **Plan with no line references** → worker wastes tokens searching
+- **Spec with no architecture doc links** → worker misunderstands threading/data model
+- **Tasks scoped too broadly** → worker tries to do too much, fails
+- **No "Current State Audit"** → entire track may be re-implementing existing code
+
 ## Important
 - Do NOT start implementing — track initialization only
 - Implementation is done via `/conductor-implement`
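
Note on the track scaffolding in steps 3 and 5 of this file: the directory name and the `index.md` body are fully determined by the track name and the date, so they can be generated rather than typed by hand. The sketch below is a minimal illustration only. The helper names `track_id` and `render_index` are hypothetical and do not appear in the repository; the `metadata.json` fields are deliberately left out because the diff does not show them in full.

```python
from datetime import date
from typing import Optional


def track_id(track_name: str, today: Optional[date] = None) -> str:
    """Build '{track_name}_{YYYYMMDD}' as described in step 3 (hypothetical helper)."""
    day = today or date.today()
    return f"{track_name}_{day:%Y%m%d}"


def render_index(tid: str) -> str:
    """Render the index.md body from the step 5 template (hypothetical helper)."""
    return (
        f"# Track {tid} Context\n\n"
        "- [Specification](./spec.md)\n"
        "- [Implementation Plan](./plan.md)\n"
        "- [Metadata](./metadata.json)\n"
    )


if __name__ == "__main__":
    tid = track_id("cost_tracking")  # "cost_tracking" is an example name only
    print(f"conductor/tracks/{tid}/")
    print(render_index(tid))
```

Passing `today` explicitly keeps the helper deterministic in tests; in normal use it falls back to the current date.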
@@ -9,16 +9,63 @@ STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator. Focused on product align
 ## Primary Context Documents
 Read at session start: `conductor/product.md`, `conductor/product-guidelines.md`

+## Architecture Fallback
+When planning tracks that touch core systems, consult the deep-dive docs:
+- `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism, frame-sync action catalog
+- `docs/guide_tools.md`: MCP Bridge security, 26-tool inventory, Hook API endpoints, ApiHookClient
+- `docs/guide_mma.md`: Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle
+- `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider, verification patterns
+
 ## Responsibilities
 - Maintain alignment with the product guidelines and definition
- Define track boundaries and initialize new tracks (`/conductor:newTrack`)
- Set up the project environment (`/conductor:setup`)
+- Define track boundaries and initialize new tracks (`/conductor-new-track`)
+- Set up the project environment (`/conductor-setup`)
 - Delegate track execution to the Tier 2 Tech Lead

+## The Surgical Methodology
+
+When creating or refining tracks, follow this protocol to produce specs that
+lesser-reasoning models can execute without confusion:
+
+### 1. Audit Before Specifying
+NEVER write a spec without first reading the actual code. Use `py_get_code_outline`,
+`py_get_definition`, `Grep`, and `get_git_diff` to build a map of what exists.
+Document existing implementations with file:line references in a "Current State Audit"
+section. This prevents specs that ask to re-implement existing features.
+
+### 2. Identify Gaps, Not Features
+The spec should focus on what's MISSING, not what the track "will build."
+Frame requirements as: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724)
+has a token usage table but no cost estimation column. Add cost tracking."
+Not: "Build a metrics dashboard with token and cost tracking."
+
+### 3. Write Worker-Ready Tasks
+Each task in the plan must be executable by a Tier 3 worker on a lightweight model
+(gemini-2.5-flash-lite) without needing to understand the overall architecture.
+This means every task must specify:
+- **WHERE**: Exact file and line range to modify
+- **WHAT**: The specific change (add function, modify dict, extend table)
+- **HOW**: Which API calls, data structures, or patterns to use
+- **SAFETY**: Thread-safety constraints if cross-thread data is involved
+
+### 4. Reference Architecture Docs
+Every spec should link to the relevant `docs/guide_*.md` section so implementing
+agents have a fallback when confused about threading, data flow, or module interactions.
+
+### 5. Map Dependencies
+Explicitly state which tracks must complete before this one, and which tracks
+this one blocks. Include execution order in the spec.
+
+### 6. Root Cause Analysis (for fix tracks)
+Don't write "investigate and fix X." Instead, read the code, trace the data flow,
+and list specific root cause candidates with code-level reasoning:
+"Candidate 1: `_queue_put` (line 138) uses `asyncio.run_coroutine_threadsafe` but
+the `else` branch uses `put_nowait` which is NOT thread-safe from a thread-pool thread."
+
 ## Limitations
 - Read-only tools only: Read, Glob, Grep, WebFetch, WebSearch, Bash (read-only ops)
 - Do NOT execute tracks or implement features
- Do NOT write code or edit files
+- Do NOT write code or edit files (except track spec/plan/metadata)
 - Do NOT perform low-level bug fixing
 - Keep context strictly focused on product definitions and high-level strategy
 - To delegate track execution: instruct the human operator to run:
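
Note on the root cause candidate quoted in section 6 of this file: the reasoning rests on the documented fact that `asyncio.Queue` is not thread-safe, so calling `put_nowait` directly from a thread-pool thread can race with the event loop. The sketch below is a generic illustration of the safe pattern, not the project's `_queue_put`, whose body is not shown in this diff. The names `queue_put_threadsafe` and `demo` are hypothetical.

```python
import asyncio
import threading


def queue_put_threadsafe(
    loop: asyncio.AbstractEventLoop, queue: asyncio.Queue, item: object
) -> None:
    """Hand the put to the loop's own thread instead of calling put_nowait directly."""
    # call_soon_threadsafe is the supported way to schedule a callback
    # on an event loop from a different thread.
    loop.call_soon_threadsafe(queue.put_nowait, item)


async def demo() -> object:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    # Simulate a worker thread that produces one event for the loop.
    worker = threading.Thread(
        target=queue_put_threadsafe, args=(loop, queue, "event")
    )
    worker.start()
    item = await asyncio.wait_for(queue.get(), timeout=5)
    worker.join()
    return item


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A candidate written at this level of detail tells the worker which call to inspect and which replacement pattern to consider, which is the point of the protocol.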
@@ -21,7 +21,80 @@ tools:
  - discovered_tool_py_get_hierarchy
  - discovered_tool_py_get_docstring
  - discovered_tool_get_tree
+  - discovered_tool_py_get_definition
 ---
 STRICT SYSTEM DIRECTIVE: You are a Tier 1 Orchestrator.
 Focused on product alignment, high-level planning, and track initialization.
 ONLY output the requested text. No pleasantries.
+
+## Architecture Fallback
+When planning tracks that touch core systems, consult the deep-dive docs:
+- `docs/guide_architecture.md`: Thread domains, event system, AI client, HITL mechanism, frame-sync action catalog
+- `docs/guide_tools.md`: MCP Bridge security, 26-tool inventory, Hook API endpoints, ApiHookClient
+- `docs/guide_mma.md`: Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle
+- `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider, verification patterns
+
+## The Surgical Methodology
+
+When creating or refining tracks, you MUST follow this protocol:
+
+### 1. MANDATORY: Audit Before Specifying
+NEVER write a spec without first reading the actual code using your tools.
+Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff`
+to build a map of what exists. Document existing implementations with file:line
+references in a "Current State Audit" section in the spec.
+
+**WHY**: Previous track specs asked to implement features that already existed
+(Track Browser, DAG tree, approval dialogs) because no code audit was done first.
+This wastes entire implementation phases.
+
+### 2. Identify Gaps, Not Features
+Frame requirements around what's MISSING relative to what exists:
+GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token
+usage table but no cost estimation column."
+BAD: "Build a metrics dashboard with token and cost tracking."
+
+### 3. Write Worker-Ready Tasks
+Each plan task must be executable by a Tier 3 worker on gemini-2.5-flash-lite
+without understanding the overall architecture. Every task specifies:
+- **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
+- **WHAT**: The specific change (add function, modify dict, extend table)
+- **HOW**: Which API calls or patterns (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
+- **SAFETY**: Thread-safety constraints if cross-thread data is involved
+
+### 4. For Bug Fix Tracks: Root Cause Analysis
+Don't write "investigate and fix." Read the code, trace the data flow, list
+specific root cause candidates with code-level reasoning.
+
+### 5. Reference Architecture Docs
+Link to relevant `docs/guide_*.md` sections in every spec so implementing
+agents have a fallback for threading, data flow, or module interactions.
+
+### 6. Map Dependencies Between Tracks
+State execution order and blockers explicitly in metadata.json and spec.
+
+## Spec Template (REQUIRED sections)
+```
+# Track Specification: {Title}
+
+## Overview
+## Current State Audit (as of {commit_sha})
+### Already Implemented (DO NOT re-implement)
+### Gaps to Fill (This Track's Scope)
+## Goals
+## Functional Requirements
+## Non-Functional Requirements
+## Architecture Reference
+## Out of Scope
+```
+
+## Plan Template (REQUIRED format)
+```
+## Phase N: {Name}
+Focus: {One-sentence scope}
+
+- [ ] Task N.1: {Surgical description with file:line refs and API calls}
+- [ ] Task N.2: ...
+- [ ] Task N.N: Write tests for Phase N changes
+- [ ] Task N.X: Conductor - User Manual Verification (Protocol in workflow.md)
+```
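
Note on the plan template: the "file:line refs" requirement is mechanical enough to check automatically before a plan is handed to a worker. The sketch below is one possible lint, written under assumptions: the function name `tasks_missing_location` is hypothetical, the task-line pattern is inferred from the template above, and test and verification tasks are treated as exempt because the template does not give them line references.

```python
import re
from typing import List

# Matches template lines such as "- [ ] Task 1.1: {description}".
TASK = re.compile(r"^- \[[ x]\] Task (?P<id>[\w.]+): (?P<body>.+)$")
# Matches a file:line or file:line-line reference, e.g. gui_2.py:2700-2701.
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+(?:-\d+)?")
# Assumption: these closing tasks do not need a file:line reference.
EXEMPT = ("Write tests", "Conductor - User Manual Verification")


def tasks_missing_location(plan_text: str) -> List[str]:
    """Return the ids of tasks that carry no file:line reference."""
    missing = []
    for line in plan_text.splitlines():
        match = TASK.match(line.strip())
        if not match:
            continue  # headings, Focus lines, blank lines
        body = match["body"]
        if body.startswith(EXEMPT):
            continue
        if not FILE_LINE.search(body):
            missing.append(match["id"])
    return missing


if __name__ == "__main__":
    plan = (
        "## Phase 1: Dashboard\n"
        "Focus: cost column\n\n"
        "- [ ] Task 1.1: In `_render_mma_dashboard` (gui_2.py:2700-2701), add a cost column\n"
        "- [ ] Task 1.2: Improve the display\n"
        "- [ ] Task 1.3: Write tests for Phase 1 changes\n"
    )
    print(tasks_missing_location(plan))
```

A check like this only catches the absence of a location; whether the referenced lines are correct still depends on the audit step.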