Three new tracks identified by analyzing product.md requirements against
actual codebase state using 1M-context Opus with all architecture docs loaded:
1. mma_pipeline_fix_20260301 (P0, blocker):
- Diagnoses why Tier 3 worker output never reaches mma_streams in GUI
- Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue
thread-safety violation, ai_client.reset_session() side effects, token
stats stub returning empty dict
- 2 phases, 6 tasks with exact line references
2. simulation_hardening_20260301 (P1, depends on pipeline fix):
- Addresses 3 documented issues from robust_live_simulation session compression
- Mock triggers wrong approval popup, popup state desync, approval ambiguity
- 3 phases, 9 tasks including standalone mock test suite
3. context_token_viz_20260301 (P2):
- Builds UI for product.md primary use case #2 'Context & Memory Management'
- Backend already complete (get_history_bleed_stats, 140 lines)
- Token budget bar, proportion breakdown, trimming preview, cache status
- 3 phases, 10 tasks
Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz)
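One of the root-cause candidates named above, the asyncio.Queue thread-safety violation, is worth illustrating. This is a hedged sketch, not code from the repository: asyncio.Queue is not thread-safe, so a worker thread that calls put_nowait directly can silently lose items or corrupt queue state; the safe pattern is to schedule the put on the event loop's own thread.

```python
# Illustrative sketch only: names ("worker", "tier3-out-*") are hypothetical.
# asyncio.Queue must only be touched from the loop's thread; a background
# thread hands items over via loop.call_soon_threadsafe instead.
import asyncio
import threading

async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[str] = asyncio.Queue()

    def worker() -> None:
        # Wrong (racy): queue.put_nowait("chunk") called from this thread.
        # Right: schedule the put on the event loop's thread.
        for chunk in ("tier3-out-1", "tier3-out-2"):
            loop.call_soon_threadsafe(queue.put_nowait, chunk)

    t = threading.Thread(target=worker)
    t.start()
    received = [await queue.get() for _ in range(2)]
    t.join()
    return received

print(asyncio.run(main()))  # ['tier3-out-1', 'tier3-out-2']
```

If the Tier 3 worker writes to mma_streams from a plain thread without this handoff, its output may never surface in the GUI, which matches the symptom in track 1.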
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track Specification: Simulation Hardening
Overview
The robust_live_simulation_verification track is marked complete but its session compression documents three unresolved issues: (1) brittle mock that triggers the wrong approval popup, (2) popup state desynchronization after "Accept" clicks, (3) Tier 3 output never appearing in mma_streams (fixed by mma_pipeline_fix track). This track stabilizes the simulation framework so it reliably passes end-to-end.
Prerequisites
mma_pipeline_fix_20260301 MUST be completed first (fixes Tier 3 stream plumbing).
Current Issues (from session compression 2026-02-28)
Issue 1: Mock Triggers Wrong Approval Popup
mock_gemini_cli.py defaults to emitting a read_file tool call, which triggers the general tool approval popup (_pending_ask_dialog) instead of the MMA spawn popup (_pending_mma_spawn). The test expects the spawn popup and times out.
Root cause: The mock's default response path doesn't distinguish between MMA orchestration prompts and Tier 3 worker prompts. It should emit tool calls only for worker-level prompts where tool use is expected, never for orchestration-level (Tier 1/2) prompts.
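The fix for Issue 1 might look like the following sketch. The marker strings and function names are illustrative assumptions, not taken from mock_gemini_cli.py: the point is that the mock classifies the prompt before deciding whether a tool call is allowed.

```python
# Hypothetical sketch: classify the incoming prompt before emitting a tool
# call, so orchestration prompts never trigger the tool-approval popup.
# ORCHESTRATION_MARKERS and both functions are assumed names, not real code.
ORCHESTRATION_MARKERS = ("propose tracks", "plan the following")  # assumed Tier 1/2 cues

def should_emit_tool_call(prompt: str) -> bool:
    """Only worker-level (Tier 3) prompts may trigger tool calls."""
    lowered = prompt.lower()
    return not any(marker in lowered for marker in ORCHESTRATION_MARKERS)

def mock_response(prompt: str) -> dict:
    if should_emit_tool_call(prompt):
        # Worker prompt: a tool call (and the tool-approval popup) is expected.
        return {"tool_call": {"name": "read_file", "args": {"path": "README.md"}}}
    # Orchestration prompt: plain text only, so no spurious popup fires.
    return {"text": "Proposed plan acknowledged."}
```

With this split, the spawn-popup path is exercised only when the test drives an actual worker prompt, so the timeout described above cannot occur on orchestration turns.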
Issue 2: Popup State Desynchronization
After clicking "Accept" on the track proposal modal, _show_track_proposal_modal is set to False but the test still sees the popup as active. The hook API's mma_status returns stale proposed_tracks data.
Root cause: _cb_accept_tracks (gui_2.py:2012-2045) processes tracks and clears proposed_tracks, but this runs on the GUI thread. The ApiHookClient.get_mma_status() reads via the GUI trampoline pattern, but there may be a frame delay before the state updates are visible.
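Given the frame-delay hypothesis, a single read of get_mma_status() after clicking "Accept" is inherently racy. A deadline-based poll is the usual remedy; this is a hypothetical test helper, not an existing ApiHookClient method.

```python
# Sketch for Issue 2: re-read GUI-owned state until it converges or a
# deadline passes. poll_until and its parameters are assumed helpers.
import time
from typing import Callable

def poll_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Return True once predicate() holds; False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)  # give the GUI thread a frame to apply the change
    return False
```

The test would then assert, for example, poll_until(lambda: not client.get_mma_status()["proposed_tracks"]) rather than checking the status once, absorbing the one-frame lag of the GUI trampoline.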
Issue 3: Approval Type Ambiguity
The test polling loop auto-approves pending_approval but can't distinguish between tool approval (_pending_ask_dialog), MMA step approval (_pending_mma_approval), and spawn approval (_pending_mma_spawn). The simulation needs explicit handling for each type.
Already resolved in code: get_mma_status now returns separate pending_tool_approval, pending_mma_step_approval, and pending_mma_spawn_approval booleans, and the test in visual_sim_mma_v2.py already checks these individually. The remaining work is making the mock not trigger unexpected approval types.
Goals
- Make tests/visual_sim_mma_v2.py pass reliably against the live GUI.
- Clean up mock_gemini_cli.py to be deterministic and not trigger spurious approvals.
- Add retry/timeout resilience to polling loops.
Architecture Reference
- Simulation patterns: docs/guide_simulations.md
- Hook API endpoints: docs/guide_tools.md — see the /api/gui/mma_status response fields
- HITL mechanism: docs/guide_architecture.md — see "The Execution Clutch"