Files
manual_slop/conductor/tracks/simulation_hardening_20260301/plan.md
Ed_ 0d2b6049d1 conductor: Create 3 MVP tracks with surgical specs from full codebase analysis
Three new tracks identified by analyzing product.md requirements against
actual codebase state using 1M-context Opus with all architecture docs loaded:

1. mma_pipeline_fix_20260301 (P0, blocker):
   - Diagnoses why Tier 3 worker output never reaches mma_streams in GUI
   - Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue
     thread-safety violation, ai_client.reset_session() side effects, token
     stats stub returning empty dict
   - 2 phases, 6 tasks with exact line references

2. simulation_hardening_20260301 (P1, depends on pipeline fix):
   - Addresses 3 documented issues from robust_live_simulation session compression
   - Mock triggers wrong approval popup, popup state desync, approval ambiguity
   - 3 phases, 9 tasks including standalone mock test suite

3. context_token_viz_20260301 (P2):
   - Builds UI for product.md primary use case #2 'Context & Memory Management'
   - Backend already complete (get_history_bleed_stats, 140 lines)
   - Token budget bar, proportion breakdown, trimming preview, cache status
   - 3 phases, 10 tasks

Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 09:58:34 -05:00


Implementation Plan: Simulation Hardening

Depends on: mma_pipeline_fix_20260301
Architecture reference: docs/guide_simulations.md

Phase 1: Mock Provider Cleanup

  • Task 1.1: Rewrite tests/mock_gemini_cli.py response routing to be explicit about which prompts trigger tool calls versus plain text. The current default path emits read_file tool calls, which trigger _pending_ask_dialog (the wrong approval type). Fix: emit tool calls only when the prompt contains '"role": "tool"' (already handled as the post-tool-call response path). The default path (Tier 3 worker prompts, epic planning, sprint planning) should return plain text only. Remove any magic keyword matching that is no longer necessary. Verify by checking that the mock's output for an epic planning prompt does NOT contain any function_call JSON.
  • Task 1.2: Add a new response route to mock_gemini_cli.py for Tier 2 Tech Lead prompts. Detect via 'PATH: Sprint Planning' or 'generate the implementation tickets' in the prompt. Return a well-formed JSON array of 2-3 mock tickets with proper depends_on relationships. Ensure the JSON is parseable by conductor_tech_lead.py's multi-layer extraction (test by feeding the mock output through json.loads()).
  • Task 1.3: Write a standalone test (tests/test_mock_gemini_cli.py) that invokes the mock script via subprocess.run() with various stdin prompts and verifies: (a) epic prompt → Track JSON, no tool calls; (b) sprint prompt → Ticket JSON, no tool calls; (c) worker prompt → plain text, no tool calls; (d) tool-result prompt → plain text response.
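The explicit-marker routing described in Tasks 1.1-1.2 can be sketched as follows. This is a hypothetical outline, not the real mock's code: the `route` function name and the mock ticket payload are assumptions, and only two of the routes (post-tool-call and sprint planning) are shown; the real script must also handle the epic-prompt → Track JSON route from Task 1.3.

```python
# Sketch of explicit prompt routing for a mock_gemini_cli.py-style script.
# Route names and payloads are illustrative assumptions.
import json
import sys

# 2-3 mock tickets with a depends_on relationship, parseable via json.loads().
MOCK_TICKETS = json.dumps([
    {"id": "T1", "title": "Mock ticket one", "depends_on": []},
    {"id": "T2", "title": "Mock ticket two", "depends_on": ["T1"]},
])

def route(prompt: str) -> str:
    if '"role": "tool"' in prompt:
        # Post-tool-call path: plain text acknowledging the tool result.
        return "Tool result received; continuing."
    if "PATH: Sprint Planning" in prompt or "generate the implementation tickets" in prompt:
        # Tier 2 Tech Lead path: a well-formed JSON array of tickets.
        return MOCK_TICKETS
    # Default path (e.g. Tier 3 worker prompts): plain text, no tool calls.
    return "Mock worker response: implementation step completed."

if __name__ == "__main__":
    print(route(sys.stdin.read()))
```

Because every route is keyed on an explicit marker, the Task 1.3 test can assert the absence of function_call JSON on each path without depending on incidental keyword matches.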

Phase 2: Simulation Stability

  • Task 2.1: In tests/visual_sim_mma_v2.py, add a time.sleep(0.5) after every client.click() call that triggers a state change (Accept, Load Track, Approve). This gives the GUI thread one frame to process _pending_gui_tasks before the next get_mma_status() poll. The current rapid-fire click-then-poll pattern races against the frame-sync mechanism.
  • Task 2.2: Add explicit client.wait_for_value() calls after critical state transitions instead of raw polling loops. For example, after client.click('btn_mma_accept_tracks'), use client.wait_for_value('proposed_tracks_count', 0, timeout=10) (may need to add a proposed_tracks_count field to the /api/gui/mma_status response, or just poll until proposed_tracks is empty/absent).
  • Task 2.3: Add a test timeout decorator or pytest.mark.timeout(300) to the main test function to prevent infinite hangs in CI. Currently the test can hang forever if any polling loop never satisfies its condition.
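The explicit-wait pattern from Task 2.2 amounts to polling with a deadline instead of raw click-then-poll. A minimal sketch, assuming the test client exposes a status-dict getter (the `get_status` callable and the `proposed_tracks_count` field are assumptions from the plan, not the real client API):

```python
# Sketch of a wait_for_value-style helper: poll a status getter until a
# field reaches an expected value, or fail loudly at the timeout.
import time

def wait_for_value(get_status, key, expected, timeout=10.0, interval=0.25):
    """Poll get_status() until status[key] == expected, else TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status.get(key) == expected:
            return status
        time.sleep(interval)  # yield so the GUI thread can process a frame
    raise TimeoutError(f"{key} never reached {expected!r} within {timeout}s")
```

Unlike a bare sleep after each click, this makes the awaited state transition explicit and converts a silent infinite hang into a diagnosable TimeoutError, which also complements the pytest.mark.timeout safety net from Task 2.3.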

Phase 3: End-to-End Verification

  • Task 3.1: Run the full tests/visual_sim_mma_v2.py against the live GUI with mock provider. All 8 stages must pass. Document any remaining failures with exact error output and polling state at time of failure.
  • Task 3.2: Verify that after the full simulation run, client.get_mma_status() returns: (a) mma_status is 'done' or tickets are all 'completed'; (b) mma_streams contains at least one key with 'Tier 3'; (c) mma_tier_usage shows non-zero values for at least Tier 3.
  • Task 3.3: Conductor - User Manual Verification 'Phase 3: End-to-End Verification' (Protocol in workflow.md)
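The Task 3.2 checks can be expressed as a single post-run assertion helper. This is a hedged sketch: the shape of the status dict (mma_status, tickets, mma_streams, mma_tier_usage keys) is assumed from the plan's wording, not verified against the real /api/gui/mma_status response.

```python
# Sketch of the Task 3.2 post-run checks against a status dict whose
# field names are assumed from the plan, not the real API.
def verify_final_status(status: dict) -> None:
    # (a) pipeline finished: mma_status 'done' or every ticket 'completed'.
    tickets = status.get("tickets", [])
    assert status.get("mma_status") == "done" or (
        tickets and all(t.get("status") == "completed" for t in tickets)
    ), "pipeline did not finish"
    # (b) at least one Tier 3 stream key is present.
    assert any("Tier 3" in key for key in status.get("mma_streams", {})), \
        "no Tier 3 worker output reached mma_streams"
    # (c) non-zero usage recorded for at least Tier 3.
    usage = status.get("mma_tier_usage", {})
    assert any("3" in str(tier) and count > 0 for tier, count in usage.items()), \
        "Tier 3 usage is zero"
```

Failures raise AssertionError with a message naming which of the three conditions broke, which supports the "document any remaining failures with exact error output" requirement of Task 3.1.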