Three new tracks identified by analyzing product.md requirements against
actual codebase state using 1M-context Opus with all architecture docs loaded:
1. mma_pipeline_fix_20260301 (P0, blocker):
- Diagnoses why Tier 3 worker output never reaches mma_streams in GUI
- Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue
thread-safety violation, ai_client.reset_session() side effects, token
stats stub returning empty dict
- 2 phases, 6 tasks with exact line references
2. simulation_hardening_20260301 (P1, depends on pipeline fix):
- Addresses 3 documented issues from robust_live_simulation session compression
- Mock triggers wrong approval popup, popup state desync, approval ambiguity
- 3 phases, 9 tasks including standalone mock test suite
3. context_token_viz_20260301 (P2):
- Builds UI for product.md primary use case #2 'Context & Memory Management'
- Backend already complete (get_history_bleed_stats, 140 lines)
- Token budget bar, proportion breakdown, trimming preview, cache status
- 3 phases, 10 tasks
Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz)
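One of the root-cause candidates named above, the asyncio.Queue thread-safety violation, is worth illustrating. This is a hedged sketch, not code from the repository: asyncio.Queue is not thread-safe, so a worker thread that calls put_nowait directly can silently lose items or corrupt queue state; the safe pattern is to schedule the put on the event loop's own thread.

```python
# Illustrative sketch only: names ("worker", "tier3-out-*") are hypothetical.
# asyncio.Queue must only be touched from the loop's thread; a background
# thread hands items over via loop.call_soon_threadsafe instead.
import asyncio
import threading

async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[str] = asyncio.Queue()

    def worker() -> None:
        # Wrong (racy): queue.put_nowait("chunk") called from this thread.
        # Right: schedule the put on the event loop's thread.
        for chunk in ("tier3-out-1", "tier3-out-2"):
            loop.call_soon_threadsafe(queue.put_nowait, chunk)

    t = threading.Thread(target=worker)
    t.start()
    received = [await queue.get() for _ in range(2)]
    t.join()
    return received

print(asyncio.run(main()))  # ['tier3-out-1', 'tier3-out-2']
```

If the Tier 3 worker writes to mma_streams from a plain thread without this handoff, its output may never surface in the GUI, which matches the symptom in track 1.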
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track Specification: Simulation Hardening
Overview
The robust_live_simulation_verification track is marked complete but its session compression documents three unresolved issues: (1) brittle mock that triggers the wrong approval popup, (2) popup state desynchronization after "Accept" clicks, (3) Tier 3 output never appearing in mma_streams (fixed by mma_pipeline_fix track). This track stabilizes the simulation framework so it reliably passes end-to-end.
Prerequisites
mma_pipeline_fix_20260301 MUST be completed first (fixes Tier 3 stream plumbing).
Current Issues (from session compression 2026-02-28)
Issue 1: Mock Triggers Wrong Approval Popup
mock_gemini_cli.py defaults to emitting a read_file tool call, which triggers the general tool approval popup (_pending_ask_dialog) instead of the MMA spawn popup (_pending_mma_spawn). The test expects the spawn popup and times out.
Root cause: The mock's default response path doesn't distinguish between MMA orchestration prompts and Tier 3 worker prompts. It should emit tool calls only for worker-level prompts where tool use is expected, never for orchestration-level (Tier 1/2) prompts.
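The fix for Issue 1 might look like the following sketch. The marker strings and function names are illustrative assumptions, not taken from mock_gemini_cli.py: the point is that the mock classifies the prompt before deciding whether a tool call is allowed.

```python
# Hypothetical sketch: classify the incoming prompt before emitting a tool
# call, so orchestration prompts never trigger the tool-approval popup.
# ORCHESTRATION_MARKERS and both functions are assumed names, not real code.
ORCHESTRATION_MARKERS = ("propose tracks", "plan the following")  # assumed Tier 1/2 cues

def should_emit_tool_call(prompt: str) -> bool:
    """Only worker-level (Tier 3) prompts may trigger tool calls."""
    lowered = prompt.lower()
    return not any(marker in lowered for marker in ORCHESTRATION_MARKERS)

def mock_response(prompt: str) -> dict:
    if should_emit_tool_call(prompt):
        # Worker prompt: a tool call (and the tool-approval popup) is expected.
        return {"tool_call": {"name": "read_file", "args": {"path": "README.md"}}}
    # Orchestration prompt: plain text only, so no spurious popup fires.
    return {"text": "Proposed plan acknowledged."}
```

With this split, the spawn-popup path is exercised only when the test drives an actual worker prompt, so the timeout described above cannot occur on orchestration turns.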
Issue 2: Popup State Desynchronization
After clicking "Accept" on the track proposal modal, _show_track_proposal_modal is set to False but the test still sees the popup as active. The hook API's mma_status returns stale proposed_tracks data.
Root cause: _cb_accept_tracks (gui_2.py:2012-2045) processes tracks and clears proposed_tracks, but this runs on the GUI thread. The ApiHookClient.get_mma_status() reads via the GUI trampoline pattern, but there may be a frame delay before the state updates are visible.
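Given the frame-delay hypothesis, a single read of get_mma_status() after clicking "Accept" is inherently racy. A deadline-based poll is the usual remedy; this is a hypothetical test helper, not an existing ApiHookClient method.

```python
# Sketch for Issue 2: re-read GUI-owned state until it converges or a
# deadline passes. poll_until and its parameters are assumed helpers.
import time
from typing import Callable

def poll_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Return True once predicate() holds; False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)  # give the GUI thread a frame to apply the change
    return False
```

The test would then assert, for example, poll_until(lambda: not client.get_mma_status()["proposed_tracks"]) rather than checking the status once, absorbing the one-frame lag of the GUI trampoline.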
Issue 3: Approval Type Ambiguity
The test polling loop auto-approves pending_approval but can't distinguish between tool approval (_pending_ask_dialog), MMA step approval (_pending_mma_approval), and spawn approval (_pending_mma_spawn). The simulation needs explicit handling for each type.
Already resolved in code: get_mma_status now returns separate pending_tool_approval, pending_mma_step_approval, and pending_mma_spawn_approval booleans, and the test in visual_sim_mma_v2.py already checks these individually. The remaining work is making the mock not trigger unexpected approval types.
Goals
- Make tests/visual_sim_mma_v2.py pass reliably against the live GUI.
- Clean up mock_gemini_cli.py to be deterministic and not trigger spurious approvals.
- Add retry/timeout resilience to polling loops.
Architecture Reference
- Simulation patterns: docs/guide_simulations.md
- Hook API endpoints: docs/guide_tools.md — see the /api/gui/mma_status response fields
- HITL mechanism: docs/guide_architecture.md — see "The Execution Clutch"