- Add step 1: read mma-tier2-tech-lead.md before any track work
- Add explicit stop rule when Tier 3 delegation fails (credit/API error)
Tier 2 must NOT silently absorb Tier 3 work as a fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tier 1 planning calls are strategic — the model should never use file tools
during epic initialization. This caused JSON parse failures when the model
tried to verify file references in the epic prompt.
- ai_client.py: add enable_tools param to send() and _send_gemini()
- orchestrator_pm.py: pass enable_tools=False in generate_tracks()
- tests/visual_sim_mma_v2.py: remove file reference from test epic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task 2.2 of mma_pipeline_fix_20260301: _cb_plan_epic captures comms baseline before generate_tracks() and pushes mma_tier_usage['Tier 1'] update via custom_callback. _start_track_logic does same for generate_tickets() -> mma_tier_usage['Tier 2'].
Task 2.1 of mma_pipeline_fix_20260301: capture comms baseline before send(), then sum input_tokens/output_tokens from IN/response entries to populate engine.tier_usage['Tier 3'].
Tasks 1.1, 1.2, 1.3 of mma_pipeline_fix_20260301:
- Task 1.1: Add [MMA] diagnostic print before _queue_put in run_worker_lifecycle; enhance except to include traceback
- Task 1.2: Replace unsafe event_queue._queue.put_nowait() else branches with RuntimeError in run_worker_lifecycle, confirm_execution, confirm_spawn
- Task 1.3: Verified run_in_executor positional arg order is correct (no change needed)
Addresses three gaps where Claude Code and Gemini CLI outperform Manual Slop's
MMA during actual execution:
1. Live worker streaming: Wire comms_log_callback to per-ticket streams so
users see real-time output instead of waiting for worker completion.
2. Per-tier model config: Replace hardcoded get_model_for_role with GUI
dropdowns persisted to project TOML.
3. Parallel DAG execution: asyncio.gather for independent tickets (exploratory
— _send_lock may block, needs investigation).
4. Auto-retry with escalation: flash-lite -> flash -> pro on BLOCKED, up to
2 retries (wires existing --failure-count mechanism into ConductorEngine).
7 new tasks across Phase 6, bringing total to 30 tasks across 6 phases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mma_exec.py changes:
- get_role_documents: Tier 1 now gets docs/guide_architecture.md + guide_mma.md
(was: only product.md). Tier 2 gets same (was: only tech-stack + workflow).
Tier 3 gets guide_architecture.md (was: only workflow.md — workers modifying
gui_2.py had zero knowledge of threading model). Tier 4 gets guide_architecture.md
(was: nothing).
- Tier 3 system directive: Added ARCHITECTURE REFERENCE callout, CRITICAL
THREADING RULE (never write GUI state from background thread), TASK FORMAT
instruction (follow WHERE/WHAT/HOW/SAFETY from surgical tasks), and
py_get_definition to tool list.
- Tier 4 system directive: Added ARCHITECTURE REFERENCE callout and instruction
to trace errors through thread domains documented in guide_architecture.md.
conductor/workflow.md changes:
- Red Phase delegation prompt: Replaced 'with a prompt to create tests' with
surgical prompt format example showing WHERE/WHAT/HOW/SAFETY.
- Green Phase delegation prompt: Replaced 'with a highly specific prompt' with
surgical prompt format example with exact line refs and API calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates three Gemini skill files to match the Claude command methodology:
mma-orchestrator/SKILL.md:
- New Section 0: Architecture Fallback with links to all 4 docs/guide_*.md
- New Surgical Spec Protocol (6-point mandatory checklist)
- New Section 5: Cross-Skill Activation for tier transitions
- Example 2 rewritten with surgical prompt (exact line refs + API calls)
- New Example 3: Track creation with audit-first workflow
- Added py_get_definition to tool usage guidance
mma-tier1-orchestrator/SKILL.md:
- Added Architecture Fallback and Surgical Spec Protocol summary
- References activate_skill mma-orchestrator for full protocol
mma-tier2-tech-lead/SKILL.md:
- Added Architecture Fallback section
- Added Surgical Delegation Protocol with WHERE/WHAT/HOW/SAFETY example
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Distills what made this session's track specs high-quality into reusable
methodology for both Claude and Gemini Tier 1 orchestrators:
Key additions to conductor-new-track.md:
- MANDATORY Step 2: Deep Codebase Audit before writing any spec
- 'Current State Audit' section template (Already Implemented + Gaps)
- 6 rules for writing worker-ready tasks (WHERE/WHAT/HOW/SAFETY)
- Anti-patterns section (vague specs, no line refs, no audit, etc.)
- Architecture doc fallback references
Key additions to mma-tier1-orchestrator.md (Claude + Gemini):
- 'The Surgical Methodology' section with 6 protocols
- Spec template with REQUIRED sections (Current State Audit is mandatory)
- Plan template with REQUIRED task format (file:line refs + API calls)
- Root cause analysis requirement for fix tracks
- Cross-track dependency mapping requirement
- Added py_get_definition to Gemini's tool list (was missing)
The core insight: the quality gap between this session's output and previous
track specs came from (1) reading actual code before writing specs, (2) listing
what EXISTS before what's MISSING, and (3) specifying exact locations and APIs
in tasks so lesser models don't have to search or guess.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites comprehensive_gui_ux_20260228 spec and plan using deep analysis of
the actual gui_2.py implementation (3078 lines). The previous spec asked to
implement features that already exist (Track Browser, DAG tree, epic planning,
approval dialogs, token table, performance monitor). The new spec:
- Documents 15 already-implemented features with exact line references
- Identifies 8 actual gaps (tier stream panels, DAG editing, cost tracking,
conductor lifecycle forms, track-scoped discussions, approval indicators,
track proposal editing, stream scrollability)
- Rewrites all 5 phases with surgical task descriptions referencing exact
gui_2.py line ranges, function names, and data structures
- Each task specifies the precise imgui API calls to use
- References docs/guide_architecture.md for threading constraints
- References docs/guide_mma.md for Ticket/Track data structures
Also adds architecture documentation fallback references to:
- conductor/workflow.md (new principle #9)
- conductor/product.md (new Architecture Reference section)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three independent root causes fixed:
- gui_2.py: Route mma_spawn_approval/mma_step_approval events in _process_event_queue
- multi_agent_conductor.py: Pass asyncio loop from ConductorEngine.run() through to
thread-pool workers for thread-safe event queue access; add _queue_put helper
- ai_client.py: Preserve GeminiCliAdapter in reset_session() instead of nulling it
Test: visual_sim_mma_v2::test_mma_complete_lifecycle passes in ~8s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>