conductor: Create 3 MVP tracks with surgical specs from full codebase analysis
Three new tracks identified by analyzing product.md requirements against
actual codebase state using 1M-context Opus with all architecture docs loaded:
1. mma_pipeline_fix_20260301 (P0, blocker):
- Diagnoses why Tier 3 worker output never reaches mma_streams in GUI
- Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue
thread-safety violation, ai_client.reset_session() side effects, token
stats stub returning empty dict
- 2 phases, 6 tasks with exact line references
2. simulation_hardening_20260301 (P1, depends on pipeline fix):
- Addresses 3 documented issues from robust_live_simulation session compression
- Mock triggers wrong approval popup, popup state desync, approval ambiguity
- 3 phases, 9 tasks including standalone mock test suite
3. context_token_viz_20260301 (P2):
- Builds UI for product.md primary use case #2 'Context & Memory Management'
- Backend already complete (get_history_bleed_stats, 140 lines)
- Token budget bar, proportion breakdown, trimming preview, cache status
- 3 phases, 10 tasks
Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
conductor/tracks/simulation_hardening_20260301/metadata.json (new file)
@@ -0,0 +1,10 @@
{
  "track_id": "simulation_hardening_20260301",
  "description": "Stabilize visual_sim_mma_v2.py and mock_gemini_cli.py for reliable end-to-end MMA simulation.",
  "type": "fix",
  "status": "new",
  "priority": "P1",
  "depends_on": ["mma_pipeline_fix_20260301"],
  "created_at": "2026-03-01T15:45:00Z",
  "updated_at": "2026-03-01T15:45:00Z"
}
conductor/tracks/simulation_hardening_20260301/plan.md (new file)
@@ -0,0 +1,22 @@
# Implementation Plan: Simulation Hardening

Depends on: `mma_pipeline_fix_20260301`
Architecture reference: [docs/guide_simulations.md](../../docs/guide_simulations.md)

## Phase 1: Mock Provider Cleanup

- [ ] Task 1.1: Rewrite `tests/mock_gemini_cli.py` response routing to be explicit about which prompts trigger tool calls vs plain text. The current default emits `read_file` tool calls, which trigger `_pending_ask_dialog` (the wrong approval type). Fix: only emit tool calls when the prompt contains `'"role": "tool"'` (already handled as the post-tool-call response path). The default path (Tier 3 worker prompts, epic planning, sprint planning) should return plain text only. Remove any remaining magic keyword matching that isn't necessary. Verify by checking that the mock's output for an epic planning prompt does NOT contain any `function_call` JSON.
- [ ] Task 1.2: Add a new response route to `mock_gemini_cli.py` for Tier 2 Tech Lead prompts. Detect via `'PATH: Sprint Planning'` or `'generate the implementation tickets'` in the prompt. Return a well-formed JSON array of 2-3 mock tickets with proper `depends_on` relationships. Ensure the JSON is parseable by `conductor_tech_lead.py`'s multi-layer extraction (test by feeding the mock output through `json.loads()`).
- [ ] Task 1.3: Write a standalone test (`tests/test_mock_gemini_cli.py`) that invokes the mock script via `subprocess.run()` with various stdin prompts and verifies: (a) epic prompt → Track JSON, no tool calls; (b) sprint prompt → Ticket JSON, no tool calls; (c) worker prompt → plain text, no tool calls; (d) tool-result prompt → plain text response.
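
The routing rules in Tasks 1.1 and 1.2 can be sketched as a simple prompt classifier. The marker strings (`'"role": "tool"'`, `'PATH: Sprint Planning'`, `'generate the implementation tickets'`) come from the tasks above; the function name and the mock ticket shape are hypothetical illustrations, not the real mock's internals:

```python
import json


def route_response(prompt: str) -> str:
    """Hypothetical sketch of the routing described in Tasks 1.1-1.2.

    Only the post-tool-call path is allowed to follow up on a tool result;
    orchestration and worker prompts get plain text or structured JSON,
    never a spurious tool call.
    """
    if '"role": "tool"' in prompt:
        # Post-tool-call path: plain-text follow-up, never another tool call.
        return "Mock follow-up after tool result."
    if "PATH: Sprint Planning" in prompt or "generate the implementation tickets" in prompt:
        # Tier 2 Tech Lead path: well-formed ticket JSON with depends_on links
        # (ticket fields here are placeholders for illustration).
        tickets = [
            {"id": "T1", "title": "Mock ticket one", "depends_on": []},
            {"id": "T2", "title": "Mock ticket two", "depends_on": ["T1"]},
        ]
        return json.dumps(tickets)
    # Default path (epic planning, Tier 3 worker prompts): plain text only,
    # so nothing ever reaches the _pending_ask_dialog approval popup.
    return "Mock plain-text response."
```

Keeping the classifier to explicit substring markers (rather than broad keyword matching) is what makes the mock deterministic enough for Task 1.3's standalone test.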

## Phase 2: Simulation Stability

- [ ] Task 2.1: In `tests/visual_sim_mma_v2.py`, add a `time.sleep(0.5)` after every `client.click()` call that triggers a state change (Accept, Load Track, Approve). This gives the GUI thread one frame to process `_pending_gui_tasks` before the next `get_mma_status()` poll. The current rapid-fire click-then-poll pattern races against the frame-sync mechanism.
- [ ] Task 2.2: Add explicit `client.wait_for_value()` calls after critical state transitions instead of raw polling loops. For example, after `client.click('btn_mma_accept_tracks')`, use `client.wait_for_value('proposed_tracks_count', 0, timeout=10)` (this may require adding a `proposed_tracks_count` field to the `/api/gui/mma_status` response, or simply polling until `proposed_tracks` is empty/absent).
- [ ] Task 2.3: Add a test timeout decorator or `pytest.mark.timeout(300)` to the main test function to prevent infinite hangs in CI. Currently the test can hang forever if any polling loop never satisfies its condition.
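
Task 2.2's `wait_for_value` does not exist yet, but its shape follows from the tasks above: poll the status payload, yield the GUI thread between polls, and fail loudly at the deadline instead of hanging (Task 2.3). A minimal sketch, assuming the client exposes `get_mma_status()` as described; everything else is hypothetical:

```python
import time


def wait_for_value(client, field, expected, timeout=10.0, interval=0.5):
    """Hypothetical sketch of the helper proposed in Task 2.2.

    Polls the mma_status payload until `field` equals `expected`,
    or raises TimeoutError after `timeout` seconds so CI never hangs.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.get_mma_status()
        if status.get(field) == expected:
            return status
        # Give the GUI thread a frame to drain _pending_gui_tasks.
        time.sleep(interval)
    raise TimeoutError(f"{field!r} never became {expected!r} within {timeout}s")
```

A bounded wait like this replaces both the blind `time.sleep(0.5)` padding and the unbounded polling loops with a single explicit contract per state transition.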

## Phase 3: End-to-End Verification

- [ ] Task 3.1: Run the full `tests/visual_sim_mma_v2.py` against the live GUI with the mock provider. All 8 stages must pass. Document any remaining failures with exact error output and the polling state at the time of failure.
- [ ] Task 3.2: Verify that after the full simulation run, `client.get_mma_status()` returns: (a) `mma_status` is `'done'` or all tickets are `'completed'`; (b) `mma_streams` contains at least one key with `'Tier 3'`; (c) `mma_tier_usage` shows non-zero values for at least Tier 3.
- [ ] Task 3.3: Conductor - User Manual Verification 'Phase 3: End-to-End Verification' (Protocol in workflow.md)
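
The three checks in Task 3.2 translate directly into assertions over the status payload. The field names come from the task; the exact payload shape (a `tickets` list, dict-valued `mma_streams` and `mma_tier_usage`) is an assumption for this sketch:

```python
def verify_final_state(status: dict) -> None:
    """Hypothetical assertions mirroring Task 3.2 (payload shape assumed)."""
    # (a) orchestration finished, or every ticket reached 'completed'
    tickets = status.get("tickets", [])
    assert status.get("mma_status") == "done" or (
        tickets and all(t.get("status") == "completed" for t in tickets)
    ), "pipeline did not finish"
    # (b) at least one Tier 3 stream key is present in mma_streams
    assert any("Tier 3" in key for key in status.get("mma_streams", {})), \
        "no Tier 3 output reached mma_streams"
    # (c) Tier 3 usage counters are non-zero
    assert status.get("mma_tier_usage", {}).get("Tier 3", 0) > 0, \
        "Tier 3 usage is zero"
```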
conductor/tracks/simulation_hardening_20260301/spec.md (new file)
@@ -0,0 +1,34 @@
# Track Specification: Simulation Hardening

## Overview

The `robust_live_simulation_verification` track is marked complete, but its session compression documents three unresolved issues: (1) a brittle mock that triggers the wrong approval popup, (2) popup state desynchronization after "Accept" clicks, (3) Tier 3 output never appearing in `mma_streams` (fixed by the `mma_pipeline_fix` track). This track stabilizes the simulation framework so it reliably passes end-to-end.

## Prerequisites

- `mma_pipeline_fix_20260301` MUST be completed first (fixes Tier 3 stream plumbing).

## Current Issues (from session compression 2026-02-28)

### Issue 1: Mock Triggers Wrong Approval Popup

`mock_gemini_cli.py` defaults to emitting a `read_file` tool call, which triggers the general tool approval popup (`_pending_ask_dialog`) instead of the MMA spawn popup (`_pending_mma_spawn`). The test expects the spawn popup and times out.

**Root cause**: The mock's default response path doesn't distinguish between MMA orchestration prompts and Tier 3 worker prompts. It must NOT emit tool calls for orchestration-level prompts (Tier 1/2), only for worker-level prompts where tool use is expected.

### Issue 2: Popup State Desynchronization

After clicking "Accept" on the track proposal modal, `_show_track_proposal_modal` is set to `False`, but the test still sees the popup as active. The hook API's `mma_status` returns stale `proposed_tracks` data.

**Root cause**: `_cb_accept_tracks` (gui_2.py:2012-2045) processes tracks and clears `proposed_tracks`, but this runs on the GUI thread. `ApiHookClient.get_mma_status()` reads via the GUI trampoline pattern, but there may be a frame delay before the state updates become visible.

### Issue 3: Approval Type Ambiguity

The test polling loop auto-approves `pending_approval` but can't distinguish between tool approval (`_pending_ask_dialog`), MMA step approval (`_pending_mma_approval`), and spawn approval (`_pending_mma_spawn`). The simulation needs explicit handling for each type.

**Already resolved in code**: `get_mma_status` now returns separate `pending_tool_approval`, `pending_mma_step_approval`, and `pending_mma_spawn_approval` booleans, and the test in `visual_sim_mma_v2.py` already checks these individually. The remaining fix is making the mock not trigger unexpected approval types.
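
The per-type handling Issue 3 calls for could look like the following sketch. The three boolean field names come from the spec above; the click targets are hypothetical placeholders for the real widget IDs, and the dispatcher itself is an illustration, not code from the test:

```python
def dispatch_approval(client, status):
    """Sketch of explicit per-type approval handling (Issue 3).

    Returns which approval type was handled, or None if nothing is pending.
    The btn_* widget IDs below are hypothetical placeholders.
    """
    if status.get("pending_tool_approval"):
        client.click("btn_tool_approve")       # hypothetical widget id
        return "tool"
    if status.get("pending_mma_step_approval"):
        client.click("btn_mma_step_approve")   # hypothetical widget id
        return "step"
    if status.get("pending_mma_spawn_approval"):
        client.click("btn_mma_spawn_approve")  # hypothetical widget id
        return "spawn"
    return None
```

Dispatching on the three booleans individually, rather than on a single `pending_approval` flag, is what lets the simulation fail fast when the mock raises an approval type the current stage does not expect.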

## Goals

1. Make `tests/visual_sim_mma_v2.py` pass reliably against the live GUI.
2. Clean up `mock_gemini_cli.py` to be deterministic and not trigger spurious approvals.
3. Add retry/timeout resilience to polling loops.

## Architecture Reference

- Simulation patterns: [docs/guide_simulations.md](../../docs/guide_simulations.md)
- Hook API endpoints: [docs/guide_tools.md](../../docs/guide_tools.md) — see `/api/gui/mma_status` response fields
- HITL mechanism: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "The Execution Clutch"