checkpoint: this is a mess... need to define a stricter DSL or system for how the AI devises sims and hooks up the API for tests.

2026-02-28 22:50:14 -05:00
parent 2a69244f36
commit 6b0823ad6c
10 changed files with 101 additions and 77 deletions

@@ -31,4 +31,15 @@ This is a multi-track phase. To ensure architectural integrity, these tracks **M
3. **MMA Dashboard Visualization Overhaul:** (Builds the UI to visualize the state and subsets)
4. **[CURRENT] Robust Live Simulation Verification:** (Builds the tests to verify the UI and state)
**Prerequisites for this track:** `MMA Dashboard Visualization Overhaul` MUST be completed (`[x]`) before starting this track.
## Session Compression (2026-02-28)
**Current State & Glaring Issues:**
1. **Brittle Interception System:** The visual simulation (`tests/visual_sim_mma_v2.py`) relies heavily on polling an `api_hooks.py` endpoint (`/api/gui/mma_status`) that aggregates several boolean flags (`pending_approval`, `pending_spawn`). This has proven extremely brittle. For example, `mock_gemini_cli.py` defaults to emitting a `read_file` tool call, which triggers the *general* tool approval popup (`_pending_ask`). The test then freezes because it is waiting for the *MMA spawn* popup (`_pending_mma_spawn`) or the *Track Proposal* modal instead.
2. **Mock Pollution in App Domain:** Previous attempts to fix the simulation shoehorned test-specific mock JSON responses directly into `ai_client.py` and `scripts/mma_exec.py`. This conflates the test environment with the production application codebase.
3. **Popup Handling Failures:** The GUI's state machine for closing popups (like `_show_track_proposal_modal` in `_cb_accept_tracks`) is desynchronized from the hook API. The test clicks "Accept" and the tracks generate, but the UI state doesn't cleanly reset, so test runs repeatedly time out.
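
One way to make the freeze in issue 1 visible, rather than silent, is for the polling helper to fail fast when a popup other than the awaited one is pending. The sketch below is illustrative only: `wait_for_flag`, `UnexpectedPopup`, and the `fetch_status` callable are hypothetical names, and the flat `{flag: bool}` payload shape is an assumption about what `/api/gui/mma_status` returns. In a real test, `fetch_status` would wrap the HTTP call to the hook endpoint.

```python
import time
from typing import Callable, Dict

# Flag names come from the notes above; the flat {flag: bool} payload is an assumption.
KNOWN_FLAGS = ("pending_approval", "pending_spawn")


class UnexpectedPopup(RuntimeError):
    """Raised when a popup other than the awaited one is pending (hypothetical helper)."""


def wait_for_flag(
    fetch_status: Callable[[], Dict[str, bool]],
    expected: str,
    timeout: float = 5.0,
    interval: float = 0.1,
) -> Dict[str, bool]:
    """Poll until `expected` is true; fail fast if a different known flag is raised."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get(expected):
            return status
        # Any other raised flag means the GUI is blocked on a popup we are not handling.
        others = [f for f in KNOWN_FLAGS if f != expected and status.get(f)]
        if others:
            raise UnexpectedPopup(f"waiting for {expected!r}, but {others} is pending")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected!r} not raised within {timeout}s")
        time.sleep(interval)
```

With a helper like this, the `read_file` case would surface as an `UnexpectedPopup` naming the blocking flag instead of a generic timeout. It does not fix the underlying ambiguity of the flags themselves.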
**Next Steps for the Handoff:**
- Completely rip out the hardcoded mock JSON arrays from `ai_client.py` and `scripts/mma_exec.py`.
- Refactor `tests/mock_gemini_cli.py` to be a pure, standalone mock that perfectly simulates the expected streaming behavior of `gemini_cli` without relying on the app to intercept specific magic prompts.
- Stabilize the hook API (`api_hooks.py`) so the test script can unambiguously distinguish between a general tool approval, an MMA step approval, and an MMA worker spawn approval, instead of relying on a fragile `pending_approval` catch-all.
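
As a rough sketch of what the last step could look like, the status payload might report named pending kinds instead of a single catch-all boolean. Everything here is an assumption for illustration: `PendingKind`, `build_status`, `is_only_pending`, the `pending` key, and the `pending_mma_step` argument are hypothetical, and only `_pending_ask` and `_pending_mma_spawn` correspond to state mentioned in the notes above.

```python
import json
from enum import Enum
from typing import Dict, List


class PendingKind(str, Enum):
    """Hypothetical names for the three approvals the test must tell apart."""
    TOOL_APPROVAL = "tool_approval"            # general tool approval (`_pending_ask`)
    MMA_STEP_APPROVAL = "mma_step_approval"    # assumed flag; not named in the notes
    MMA_SPAWN_APPROVAL = "mma_spawn_approval"  # worker spawn (`_pending_mma_spawn`)


def build_status(
    pending_ask: bool, pending_mma_step: bool, pending_mma_spawn: bool
) -> Dict[str, List[str]]:
    """Return a JSON-serializable payload listing every approval currently pending."""
    flags = [
        (pending_ask, PendingKind.TOOL_APPROVAL),
        (pending_mma_step, PendingKind.MMA_STEP_APPROVAL),
        (pending_mma_spawn, PendingKind.MMA_SPAWN_APPROVAL),
    ]
    return {"pending": [kind.value for is_set, kind in flags if is_set]}


def is_only_pending(status: Dict[str, List[str]], kind: PendingKind) -> bool:
    """True when `kind` is the single pending approval, with nothing else blocking."""
    return status["pending"] == [kind.value]


if __name__ == "__main__":
    # Example: the mock emitted a tool call, so only the general approval is pending.
    print(json.dumps(build_status(True, False, False)))
```

Reporting a list rather than a single value is a design choice in this sketch: if the GUI ever has two popups queued at once, the test sees both kinds instead of whichever flag happened to win.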