chore: Wire architecture docs into mma_exec.py and workflow delegation prompts

mma_exec.py changes:
- get_role_documents: Tier 1 now gets docs/guide_architecture.md + guide_mma.md (was: only product.md). Tier 2 gets same (was: only tech-stack + workflow). Tier 3 gets guide_architecture.md (was: only workflow.md; workers modifying gui_2.py had zero knowledge of threading model). Tier 4 gets guide_architecture.md (was: nothing).
- Tier 3 system directive: Added ARCHITECTURE REFERENCE callout, CRITICAL THREADING RULE (never write GUI state from background thread), TASK FORMAT instruction (follow WHERE/WHAT/HOW/SAFETY from surgical tasks), and py_get_definition to tool list.
- Tier 4 system directive: Added ARCHITECTURE REFERENCE callout and instruction to trace errors through thread domains documented in guide_architecture.md.

conductor/workflow.md changes:
- Red Phase delegation prompt: Replaced 'with a prompt to create tests' with surgical prompt format example showing WHERE/WHAT/HOW/SAFETY.
- Green Phase delegation prompt: Replaced 'with a highly specific prompt' with surgical prompt format example with exact line refs and API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore(gemini): Encode surgical methodology into all Gemini MMA skills
2026-03-01 10:16:38 -05:00 · 2026-03-01 10:13:29 -05:00
5 changed files with 115 additions and 20 deletions
@@ -7,12 +7,33 @@ description: Focused on product alignment, high-level planning, and track initia

 You are the Tier 1 Orchestrator. Your role is to oversee the product direction and manage project/track initialization within the Conductor framework.

+## Primary Context Documents
+Read at session start: `conductor/product.md`, `conductor/product-guidelines.md`
+
+## Architecture Fallback
+When planning tracks that touch core systems, consult:
+- `docs/guide_architecture.md`: Threading, events, AI client, HITL, frame-sync action catalog
+- `docs/guide_tools.md`: MCP Bridge, Hook API endpoints, ApiHookClient methods
+- `docs/guide_mma.md`: Ticket/Track structures, DAG engine, ConductorEngine, worker lifecycle
+- `docs/guide_simulations.md`: live_gui fixture, Puppeteer pattern, mock provider
+
 ## Responsibilities
 - Maintain alignment with the product guidelines and definition.
 - Define track boundaries and initialize new tracks (`/conductor:newTrack`).
 - Set up the project environment (`/conductor:setup`).
 - Delegate track execution to the Tier 2 Tech Lead.

+## Surgical Spec Protocol (MANDATORY)
+When creating or refining tracks, you MUST:
+1. **Audit** the codebase with `get_code_outline`, `py_get_definition`, `grep_search` before writing any spec. Document what exists with file:line refs.
+2. **Spec gaps, not features** — frame requirements relative to what already exists.
+3. **Write worker-ready tasks** — each specifies WHERE (file:line), WHAT (change), HOW (API call), SAFETY (thread constraints).
+4. **For fix tracks** — list root cause candidates with code-level reasoning.
+5. **Reference architecture docs** — link to relevant `docs/guide_*.md` sections.
+6. **Map dependencies** — state execution order and blockers between tracks.
+
+See `activate_skill mma-orchestrator` for the full protocol and examples.
+
 ## Limitations
 - Do not execute tracks or implement features.
 - Do not write code or perform low-level bug fixing.
@@ -7,6 +7,13 @@ description: Focused on track execution, architectural design, and implementatio

 You are the Tier 2 Tech Lead. Your role is to manage the implementation of tracks (`/conductor:implement`), ensure architectural integrity, and oversee the work of Tier 3 and 4 sub-agents.

+## Architecture Fallback
+When implementing tracks, consult these docs for threading, data flow, and module interactions:
+- `docs/guide_architecture.md`: Thread domains, `_process_pending_gui_tasks` action catalog, AI client architecture, HITL blocking flow
+- `docs/guide_tools.md`: MCP tools, Hook API endpoints, session logging
+- `docs/guide_mma.md`: Ticket/Track structures, DAG engine, worker lifecycle
+- `docs/guide_simulations.md`: Testing patterns, mock provider
+
 ## Responsibilities
 - Manage the execution of implementation tracks.
 - Ensure alignment with `tech-stack.md` and project architecture.
@@ -14,6 +21,15 @@ You are the Tier 2 Tech Lead. Your role is to manage the implementation of track
 - Maintain persistent context throughout a track's implementation phase (No Context Amnesia).
 - Review implementations and coordinate bug fixes via Tier 4 QA.

+## Surgical Delegation Protocol
+When delegating to Tier 3 workers, construct prompts that specify:
+- **WHERE**: Exact file and line range to modify
+- **WHAT**: The specific change (add function, modify dict, extend table)
+- **HOW**: Which API calls, data structures, or patterns to use
+- **SAFETY**: Thread-safety constraints (e.g., "push via `_pending_gui_tasks` with lock")
+
+Example prompt: `"In gui_2.py, modify _render_mma_dashboard (lines 2685-2699). Extend the token usage table from 3 to 5 columns by adding 'Model' and 'Est. Cost'. Use imgui.table_setup_column(). Import cost_tracker. Use 1-space indentation."`
+
 ## Limitations
 - Do not perform heavy implementation work directly; delegate to Tier 3.
 - Delegate implementation tasks to Tier 3 Workers using `uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`.
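
The SAFETY constraint in the delegation protocol ("push via `_pending_gui_tasks` with lock") can be illustrated with a minimal sketch. Only the name `_pending_gui_tasks` and the rule itself come from the text. The `GuiState` class, the lock attribute name, the `set_status` action, and the drain method are hypothetical stand-ins, not the real `gui_2.py` code; `docs/guide_architecture.md` remains the authority on the actual action catalog. Indentation follows the project's 1-space rule.

```python
import threading


class GuiState:
 """Hypothetical stand-in for the GUI object (not the real gui_2.py class)."""

 def __init__(self) -> None:
  self._pending_gui_tasks: list[dict] = []
  self._pending_gui_tasks_lock = threading.Lock()  # assumed attribute name
  self.status_text = ""  # GUI state: written by the GUI thread only

 def push_gui_task(self, task: dict) -> None:
  """Safe to call from any thread: it only enqueues a task dict."""
  with self._pending_gui_tasks_lock:
   self._pending_gui_tasks.append(task)

 def process_pending_gui_tasks(self) -> None:
  """Meant to run once per frame on the GUI thread."""
  # Swap the queue out under the lock, then apply tasks without holding it.
  with self._pending_gui_tasks_lock:
   tasks, self._pending_gui_tasks = self._pending_gui_tasks, []
  for task in tasks:
   if task.get("action") == "set_status":  # hypothetical action name
    self.status_text = task["value"]


def background_worker(gui: GuiState) -> None:
 # Never do this from a background thread: gui.status_text = "done"
 gui.push_gui_task({"action": "set_status", "value": "done"})


gui = GuiState()
worker = threading.Thread(target=background_worker, args=(gui,))
worker.start()
worker.join()
gui.process_pending_gui_tasks()  # the GUI thread applies the queued change
```

The point of the pattern is that background threads only append to the queue, while every write to GUI state happens in one place on one thread.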
@@ -36,14 +36,14 @@ All tasks follow a strict lifecycle:
 4. **Write Failing Tests (Red Phase):**
   - **Pre-Delegation Checkpoint:** Before spawning a worker for dangerous or non-trivial changes, ensure your current progress is staged (`git add .`) or committed. This prevents losing iterations if a sub-agent incorrectly uses `git restore`.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
-   - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a prompt to create the necessary test files and unit tests based on the task criteria. (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
+   - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range), WHAT (test to create), HOW (which assertions/fixtures to use), and SAFETY (thread constraints if applicable). Example: `"Write tests in tests/test_cost_tracker.py for cost_tracker.py:estimate_cost(). Test all model patterns in MODEL_PRICING dict. Assert unknown model returns 0. Use 1-space indentation."` (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
   - Take the code generated by the Worker and apply it.
   - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.

 5. **Implement to Pass Tests (Green Phase):**
   - **Pre-Delegation Checkpoint:** Ensure current progress is staged or committed before delegating.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
-   - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a highly specific prompt to write the minimum amount of application code necessary to make the failing tests pass. (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
+   - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range to modify), WHAT (the specific change), HOW (which API calls, data structures, or patterns to use), and SAFETY (thread-safety constraints). Example: `"In gui_2.py _render_mma_dashboard (lines 2685-2699), extend the token usage table from 3 to 5 columns. Add 'Model' and 'Est. Cost' using imgui.table_setup_column(). Call cost_tracker.estimate_cost(model, input_tokens, output_tokens). Use 1-space indentation."` (If repeating due to failures, pass `--failure-count X` to switch to a more capable model).
   - Take the code generated by the Worker and apply it.
   - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
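
As a rough illustration of the Red and Green delegation steps above, the sketch below assembles a surgical prompt from its WHERE/WHAT/HOW/SAFETY parts and builds the worker command. The helper names `build_surgical_prompt` and `build_worker_command` are hypothetical and are not part of `mma_exec.py`. Placing `--failure-count` before the prompt is also an assumption, since the text only says to pass the flag. Indentation follows the project's 1-space rule.

```python
def build_surgical_prompt(where: str, what: str, how: str, safety: str = "") -> str:
 """Join the WHERE/WHAT/HOW/SAFETY parts into one prompt string (hypothetical helper)."""
 parts = [f"In {where}, {what}.", f"{how}."]
 if safety:
  parts.append(f"{safety}.")
 # The workflow requires this reminder in every sub-agent prompt.
 parts.append("Use exactly 1-space indentation for Python code.")
 return " ".join(parts)


def build_worker_command(prompt: str, failure_count: int = 0) -> list[str]:
 """Build the argv list for a Tier 3 worker; the flag position is an assumption."""
 cmd = ["python", "scripts/mma_exec.py", "--role", "tier3-worker"]
 if failure_count > 0:
  # Passed on retries so mma_exec.py can switch to a more capable model.
  cmd += ["--failure-count", str(failure_count)]
 cmd.append(prompt)
 return cmd


prompt = build_surgical_prompt(
 where="gui_2.py _render_mma_dashboard (lines 2685-2699)",
 what="extend the token usage table from 3 to 5 columns",
 how="Add 'Model' and 'Est. Cost' using imgui.table_setup_column()",
)
command = build_worker_command(prompt, failure_count=2)
```

Returning an argv list rather than a single shell string avoids quoting problems when the prompt itself contains quotes, as the examples above do.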

@@ -13,16 +13,46 @@ To accomplish this, you MUST delegate token-heavy or stateless tasks to **Tier 3
 To ensure proper environment handling and logging, you MUST NOT call the `gemini` command directly for sub-tasks. Instead, use the wrapper script:
 `uv run python scripts/mma_exec.py --role <Role> "..."`

+## 0. Architecture Fallback & Surgical Methodology
+
+**Before creating or refining any track**, consult the deep-dive architecture docs:
+- `docs/guide_architecture.md`: Thread domains, event system (`AsyncEventQueue`, `_pending_gui_tasks` action catalog), AI client multi-provider architecture, HITL Execution Clutch blocking flow, frame-sync mechanism
+- `docs/guide_tools.md`: MCP Bridge 3-layer security model, full 26-tool inventory with params, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference
+- `docs/guide_mma.md`: Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia
+- `docs/guide_simulations.md`: `live_gui` fixture lifecycle, Puppeteer pattern, mock provider JSON-L protocol, visual verification patterns
+
+### The Surgical Spec Protocol (MANDATORY for track creation)
+
+When creating tracks (`activate_skill mma-tier1-orchestrator`), follow this protocol:
+
+1. **AUDIT BEFORE SPECIFYING**: Use `get_code_outline`, `py_get_definition`, `grep_search`, and `get_git_diff` to map what already exists. Previous track specs asked to re-implement existing features (Track Browser, DAG tree, approval dialogs) because no audit was done. Document findings in a "Current State Audit" section with file:line references.
+
+2. **GAPS, NOT FEATURES**: Frame requirements as what's MISSING relative to what exists.
+   - GOOD: "The existing `_render_mma_dashboard` (gui_2.py:2633-2724) has a token usage table but no cost column."
+   - BAD: "Build a metrics dashboard with token and cost tracking."
+
+3. **WORKER-READY TASKS**: Each plan task must specify:
+   - **WHERE**: Exact file and line range (`gui_2.py:2700-2701`)
+   - **WHAT**: The specific change (add function, modify dict, extend table)
+   - **HOW**: Which API calls (`imgui.progress_bar(...)`, `imgui.collapsing_header(...)`)
+   - **SAFETY**: Thread-safety constraints if cross-thread data is involved
+
+4. **ROOT CAUSE ANALYSIS** (for fix tracks): Don't write "investigate and fix." List specific candidates with code-level reasoning.
+
+5. **REFERENCE DOCS**: Link to relevant `docs/guide_*.md` sections in every spec.
+
+6. **MAP DEPENDENCIES**: State execution order and blockers between tracks.
+
 ## 1. The Tier 3 Worker (Execution)
 When performing code modifications or implementing specific requirements:
 1. **Pre-Delegation Checkpoint:** For dangerous or non-trivial changes, ALWAYS stage your changes (`git add .`) or commit before delegating to a Tier 3 Worker. If the worker fails or runs `git restore`, you will lose all prior AI iterations for that file if it wasn't staged/committed.
 2. **Code Style Enforcement:** You MUST explicitly remind the worker to "use exactly 1-space indentation for Python code" in your prompt to prevent them from breaking the established codebase style.
 3. **DO NOT** perform large code writes yourself.
-4. **DO** construct a single, highly specific prompt with a clear objective.
+4. **DO** construct a single, highly specific prompt with a clear objective. Include exact file:line references and the specific API calls to use (from your audit or the architecture docs).
 5. **DO** spawn a Tier 3 Worker.
-   *Command:* `uv run python scripts/mma_exec.py --role tier3-worker "Implement [SPECIFIC_INSTRUCTION] in [FILE_PATH]. Use 1-space indentation. Follow TDD and return success status or code changes."`
+   *Command:* `uv run python scripts/mma_exec.py --role tier3-worker "Implement [SPECIFIC_INSTRUCTION] in [FILE_PATH] at lines [N-M]. Use [SPECIFIC_API_CALL]. Use 1-space indentation."`
 6. **Handling Repeated Failures:** If a Tier 3 Worker fails multiple times on the same task, it may lack the necessary capability. You must track failures and retry with `--failure-count <N>` (e.g., `--failure-count 2`). This tells `mma_exec.py` to escalate the sub-agent to a more powerful reasoning model (like `gemini-3-flash`).
-7. The Tier 3 Worker is stateless and has tool access for file I/O. 
+7. The Tier 3 Worker is stateless and has tool access for file I/O.

 ## 2. The Tier 4 QA Agent (Diagnostics)
 If you run a test or command that fails with a significant error or large traceback:
@@ -38,15 +68,23 @@ Unlike the stateless sub-agents (Tiers 3 & 4), the **Tier 2 Tech Lead** maintain
 To minimize context bloat for Tier 2 & 3:
 1. Use `py_get_code_outline` or `get_tree` to map out the structure of a file or project.
 2. Use `py_get_skeleton` and `py_get_imports` to understand the interface, docstrings, and dependencies of modules.
-3. Use `py_find_usages` to pinpoint where a function or class is called instead of searching the whole codebase.
-4. Use `py_check_syntax` after making string replacements to ensure the file is still syntactically valid.
-5. Only use `read_file` with `start_line` and `end_line` for specific implementation details once target areas are identified.
-6. Tier 3 workers MUST NOT read the full content of unrelated files.
+3. Use `py_get_definition` to read specific functions/classes by name without loading entire files.
+4. Use `py_find_usages` to pinpoint where a function or class is called instead of searching the whole codebase.
+5. Use `py_check_syntax` after making string replacements to ensure the file is still syntactically valid.
+6. Only use `read_file` with `start_line` and `end_line` for specific implementation details once target areas are identified.
+7. Tier 3 workers MUST NOT read the full content of unrelated files.
+
+## 5. Cross-Skill Activation
+When your current role requires capabilities from another tier, use `activate_skill`:
+- **Track creation/refinement**: `activate_skill mma-tier1-orchestrator` — applies the Surgical Spec Protocol
+- **Track execution**: `activate_skill mma-tier2-tech-lead` — applies persistent context and TDD workflow
+- **Quick code task**: Spawn via `mma_exec.py --role tier3-worker` (stateless, no skill activation needed)
+- **Error analysis**: Spawn via `mma_exec.py --role tier4-qa` (stateless, no skill activation needed)

 <examples>
 ### Example 1: Spawning a Tier 4 QA Agent
 **User / System:** `pytest tests/test_gui.py` failed with 400 lines of output.
-**Agent (You):** 
+**Agent (You):**
 ```json
 {
  "command": "python scripts/mma_exec.py --role tier4-qa \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
@@ -54,15 +92,24 @@ To minimize context bloat for Tier 2 & 3:
 }
 ```

-### Example 2: Spawning a Tier 3 Worker
-**User:** Please implement the `ASTParser` class in `file_cache.py` as defined in Track 1.
+### Example 2: Spawning a Tier 3 Worker with Surgical Prompt
+**User:** Please implement the cost tracking column in the token usage table.
 **Agent (You):**
 ```json
 {
-  "command": "python scripts/mma_exec.py --role tier3-worker \"Read file_cache.py and implement the ASTParser class using tree-sitter. Ensure you preserve docstrings but strip function bodies. Output the updated code.\"",
-  "description": "Delegating implementation to a Tier 3 Worker."
+  "command": "python scripts/mma_exec.py --role tier3-worker \"In gui_2.py, modify _render_mma_dashboard (lines 2685-2699). Extend the token usage table from 3 columns to 5 by adding 'Model' and 'Est. Cost' columns. Use imgui.table_setup_column() for the new columns. Import cost_tracker and call cost_tracker.estimate_cost(model, input_tokens, output_tokens) for each tier row. Add a total row at the bottom. Use 1-space indentation.\"",
+  "description": "Delegating surgical implementation to Tier 3 Worker with exact line refs."
 }
 ```
+
+### Example 3: Creating a Track with Audit
+**User:** Create a track for adding dark mode support.
+**Agent (You):**
+1. First, audit the codebase: `py_get_code_outline gui_2.py` → find `_render_theme_panel` (lines 2993-3030).
+2. Read the existing implementation: `py_get_definition gui_2.py _render_theme_panel`.
+3. Check for existing color constants: `grep_search "vec4\|C_" gui_2.py`.
+4. Now write the spec with a "Current State Audit" section documenting what the theme panel already does.
+5. Write tasks referencing the exact lines and imgui color APIs to use.
 </examples>

 <triggers>
@@ -70,4 +117,5 @@ To minimize context bloat for Tier 2 & 3:
 - When encountering a large error trace from a shell execution (Errors > 100 lines).
 - When explicitly instructed to act as a "Tech Lead" or "Orchestrator".
 - When managing complex, multi-file Track implementations.
-</triggers>
+- When creating or refining conductor tracks (MUST follow Surgical Spec Protocol).
+</triggers>
@@ -73,11 +73,15 @@ def get_model_for_role(role: str, failure_count: int = 0) -> str:

 def get_role_documents(role: str) -> list[str]:
 if role == 'tier1-orchestrator' or role == 'tier1':
-  return ['conductor/product.md', 'conductor/product-guidelines.md']
+  return ['conductor/product.md', 'conductor/product-guidelines.md',
+   'docs/guide_architecture.md', 'docs/guide_mma.md']
 elif role == 'tier2-tech-lead' or role == 'tier2':
-  return ['conductor/tech-stack.md', 'conductor/workflow.md']
+  return ['conductor/tech-stack.md', 'conductor/workflow.md',
+   'docs/guide_architecture.md', 'docs/guide_mma.md']
 elif role == 'tier3-worker' or role == 'tier3':
-  return ['conductor/workflow.md']
+  return ['docs/guide_architecture.md']
+ elif role == 'tier4-qa' or role == 'tier4':
+  return ['docs/guide_architecture.md']
 return []

 def log_delegation(role: str, full_prompt: str, result: str | None = None, summary_prompt: str | None = None) -> str:
@@ -165,16 +169,22 @@ def execute_agent(role: str, prompt: str, docs: list[str], debug: bool = False,
  "Your goal is to implement specific code changes or tests based on the provided task. " \
  "CRITICAL CODE STYLE RULE: ALL Python code MUST use exactly 1 SPACE for indentation. DO NOT use 4 spaces or tabs. " \
  "You have access to tools for reading and writing files (e.g., read_file, write_file, replace), " \
-  "codebase investigation (discovered_tool_py_get_code_outline, discovered_tool_py_get_skeleton, discovered_tool_py_find_usages, discovered_tool_py_get_imports, discovered_tool_py_check_syntax, discovered_tool_get_tree), " \
+  "codebase investigation (discovered_tool_py_get_code_outline, discovered_tool_py_get_skeleton, discovered_tool_py_get_definition, discovered_tool_py_find_usages, discovered_tool_py_get_imports, discovered_tool_py_check_syntax, discovered_tool_get_tree), " \
  "version control (discovered_tool_get_git_diff), and web tools (discovered_tool_web_search, discovered_tool_fetch_url). " \
  "You CAN execute PowerShell scripts via discovered_tool_run_powershell for verification and testing. " \
+  "ARCHITECTURE REFERENCE: docs/guide_architecture.md contains the threading model, cross-thread data structures, " \
+  "frame-sync mechanism (_process_pending_gui_tasks action catalog), AI client architecture, and HITL Execution Clutch. " \
+  "CRITICAL THREADING RULE: NEVER write GUI state from a background thread. Push task dicts to _pending_gui_tasks with the lock. " \
+  "TASK FORMAT: Your task will specify WHERE (file:line), WHAT (change), HOW (API calls), and SAFETY (thread constraints). Follow these exactly. " \
  "Follow TDD and return success status or code changes. No pleasantries, no conversational filler."
 elif role in ['tier4', 'tier4-qa']:
  system_directive = "STRICT SYSTEM DIRECTIVE: You are a stateless Tier 4 QA Agent. " \
  "Your goal is to analyze errors, summarize logs, or verify tests. " \
-  "You have access to tools for reading files, exploring the codebase (discovered_tool_py_get_code_outline, discovered_tool_py_get_skeleton, discovered_tool_py_find_usages, discovered_tool_py_get_imports), " \
+  "You have access to tools for reading files, exploring the codebase (discovered_tool_py_get_code_outline, discovered_tool_py_get_skeleton, discovered_tool_py_get_definition, discovered_tool_py_find_usages, discovered_tool_py_get_imports), " \
  "version control (discovered_tool_get_git_diff), and web tools (discovered_tool_web_search, discovered_tool_fetch_url). " \
  "You CAN execute PowerShell scripts via discovered_tool_run_powershell for diagnostics. " \
+  "ARCHITECTURE REFERENCE: docs/guide_architecture.md contains the threading model and data flow. " \
+  "When analyzing errors, trace the data flow through the thread domains (GUI main, asyncio worker, HookServer) documented there. " \
  "ONLY output the requested analysis. No pleasantries."
 else:
  system_directive = f"STRICT SYSTEM DIRECTIVE: You are a stateless {role}. " \