Track Specification: Simulation Hardening
Overview
The robust_live_simulation_verification track is marked complete but its session compression documents three unresolved issues: (1) brittle mock that triggers the wrong approval popup, (2) popup state desynchronization after "Accept" clicks, (3) Tier 3 output never appearing in mma_streams (fixed by mma_pipeline_fix track). This track stabilizes the simulation framework so it reliably passes end-to-end.
Prerequisites
mma_pipeline_fix_20260301 MUST be completed first (fixes Tier 3 stream plumbing).
Current Issues (from session compression 2026-02-28)
Issue 1: Mock Triggers Wrong Approval Popup
mock_gemini_cli.py defaults to emitting a read_file tool call, which triggers the general tool approval popup (_pending_ask_dialog) instead of the MMA spawn popup (_pending_mma_spawn). The test expects the spawn popup and times out.
Root cause: The mock's default response path doesn't distinguish between MMA orchestration prompts and Tier 3 worker prompts. It needs to NOT emit tool calls for orchestration-level prompts (Tier 1/2), only for worker-level prompts where tool use is expected.
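A minimal sketch of the classification the mock needs, assuming the prompt text itself is enough to tell tiers apart. The marker strings, function names, and response shapes below are illustrative assumptions, not the real mock's protocol:

```python
# Hypothetical sketch: only emit tool calls for Tier 3 worker prompts,
# never for Tier 1/2 orchestration prompts, so the general tool-approval
# popup (_pending_ask_dialog) cannot fire during orchestration.
# The marker strings and response dicts are assumptions for illustration.

ORCHESTRATION_MARKERS = ("propose tracks", "orchestrate", "plan the")


def classify_prompt(prompt: str) -> str:
    """Return 'orchestration' for Tier 1/2 prompts, 'worker' for Tier 3."""
    lowered = prompt.lower()
    if any(marker in lowered for marker in ORCHESTRATION_MARKERS):
        return "orchestration"
    return "worker"


def mock_response(prompt: str) -> dict:
    """Deterministic mock reply keyed off the prompt classification."""
    if classify_prompt(prompt) == "orchestration":
        # Plain-text answer: no tool call, so no approval popup fires.
        return {"type": "text", "content": "Proposed plan: ..."}
    # Worker-level prompts may exercise the tool-call path as before.
    return {"type": "tool_call", "name": "read_file",
            "args": {"path": "README.md"}}
```

In practice the real mock may need a more robust signal than substring markers (e.g. an explicit tier tag in the prompt envelope), but the shape of the fix is the same: branch before emitting any tool call.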
Issue 2: Popup State Desynchronization
After clicking "Accept" on the track proposal modal, _show_track_proposal_modal is set to False but the test still sees the popup as active. The hook API's mma_status returns stale proposed_tracks data.
Root cause: _cb_accept_tracks (gui_2.py:2012-2045) processes tracks and clears proposed_tracks, but this runs on the GUI thread. The ApiHookClient.get_mma_status() reads via the GUI trampoline pattern, but there may be a frame delay before the state updates are visible.
Issue 3: Approval Type Ambiguity
The test polling loop auto-approves pending_approval but can't distinguish between tool approval (_pending_ask_dialog), MMA step approval (_pending_mma_approval), and spawn approval (_pending_mma_spawn). The simulation needs explicit handling for each type.
Already resolved in code: get_mma_status now returns separate pending_tool_approval, pending_mma_step_approval, pending_mma_spawn_approval booleans. The test in visual_sim_mma_v2.py already checks these individually. The fix is in making the mock not trigger unexpected approval types.
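The per-type dispatch the polling loop needs can be sketched as follows. The three flag names match the booleans described above; the action keys and callback wiring are stand-ins for whatever the real test harness invokes:

```python
# Hypothetical sketch: approve exactly the pending-approval type that is
# set, instead of a single ambiguous auto-approve. Flag names follow the
# get_mma_status fields above; the actions mapping is an assumption.

def auto_approve(status: dict, actions: dict):
    """Dispatch on the specific pending flag; return which action fired."""
    for flag, action in (
        ("pending_tool_approval", "approve_tool"),
        ("pending_mma_step_approval", "approve_step"),
        ("pending_mma_spawn_approval", "approve_spawn"),
    ):
        if status.get(flag):
            actions[action]()  # e.g. click the matching popup's button
            return action
    return None  # nothing pending this poll cycle
```

Keeping the flag-to-action table explicit also makes the failure mode obvious: if the mock triggers a type the test did not expect, the returned action name shows up directly in the test log.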
Goals
- Make tests/visual_sim_mma_v2.py pass reliably against the live GUI.
- Clean up mock_gemini_cli.py to be deterministic and not trigger spurious approvals.
- Add retry/timeout resilience to polling loops.
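A generic bounded-poll helper covers both this goal and the frame-delay race in Issue 2 (the GUI-thread callback clearing state one frame after the hook API reads it). This is a sketch; the predicate and timings are placeholders for the real test's checks:

```python
# Hypothetical sketch of a bounded retry loop for the test's polling.
# Using time.monotonic() keeps the deadline immune to wall-clock jumps.
import time


def poll_until(predicate, timeout=10.0, interval=0.25, label="condition"):
    """Retry predicate until it returns truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)  # absorbs GUI frame-delay between polls
    raise TimeoutError(f"timed out after {timeout}s waiting for {label}")
```

For Issue 2 specifically, the call site would look like `poll_until(lambda: not client.get_mma_status().get("proposed_tracks"), label="proposed_tracks cleared")`, so a one-frame stale read becomes a retry instead of a failure.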
Architecture Reference
- Simulation patterns: docs/guide_simulations.md
- Hook API endpoints: docs/guide_tools.md — see /api/gui/mma_status response fields
- HITL mechanism: docs/guide_architecture.md — see "The Execution Clutch"