conductor: Create 3 MVP tracks with surgical specs from full codebase analysis

Three new tracks identified by analyzing product.md requirements against actual codebase state using 1M-context Opus with all architecture docs loaded: 1. mma_pipeline_fix_20260301 (P0, blocker): - Diagnoses why Tier 3 worker output never reaches mma_streams in GUI - Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue thread-safety violation, ai_client.reset_session() side effects, token stats stub returning empty dict - 2 phases, 6 tasks with exact line references 2. simulation_hardening_20260301 (P1, depends on pipeline fix): - Addresses 3 documented issues from robust_live_simulation session compression - Mock triggers wrong approval popup, popup state desync, approval ambiguity - 3 phases, 9 tasks including standalone mock test suite 3. context_token_viz_20260301 (P2): - Builds UI for product.md primary use case #2 'Context & Memory Management' - Backend already complete (get_history_bleed_stats, 140 lines) - Token budget bar, proportion breakdown, trimming preview, cache status - 3 phases, 10 tasks Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-01 09:58:34 -05:00
parent d93f650c3a
commit 0d2b6049d1
9 changed files with 194 additions and 0 deletions
@@ -0,0 +1,10 @@
+{
+    "track_id": "mma_pipeline_fix_20260301",
+    "description": "Fix Tier 3 worker responses not reaching mma_streams in GUI, fix token usage tracking stubs.",
+    "type": "fix",
+    "status": "new",
+    "priority": "P0",
+    "blocks": ["comprehensive_gui_ux_20260228", "simulation_hardening_20260301"],
+    "created_at": "2026-03-01T15:45:00Z",
+    "updated_at": "2026-03-01T15:45:00Z"
+}
@@ -0,0 +1,18 @@
+# Implementation Plan: MMA Pipeline Fix & Worker Stream Verification
+
+## Phase 1: Diagnose & Fix Worker Stream Pipeline
+
+- [ ] Task 1.1: Add diagnostic logging to `run_worker_lifecycle` (multi_agent_conductor.py:280-290). Before the `_queue_put` call, add `print(f"[MMA] Pushing Tier 3 response for {ticket.id}, loop={'present' if loop else 'NONE'}, stream_id={response_payload['stream_id']}")`. Also add a `print` inside the `except Exception as e` block that currently silently swallows errors. This will reveal whether (a) the function reaches the push point, (b) `loop` is passed correctly, (c) any exceptions are being swallowed.
+- [ ] Task 1.2: Remove the unsafe `else` branch in `run_worker_lifecycle` (multi_agent_conductor.py:289-290) that calls `event_queue._queue.put_nowait()`. `asyncio.Queue` is NOT thread-safe from non-event-loop threads. The `else` branch should either raise an error (`raise RuntimeError("loop is required for thread-safe event queue access")`) or use a fallback that IS thread-safe. Same fix needed in `confirm_execution` (line 156) and `confirm_spawn` (line 183).
+- [ ] Task 1.3: Verify the `run_in_executor` positional argument order at `multi_agent_conductor.py:118-127` matches `run_worker_lifecycle`'s signature exactly: `(ticket, context, context_files, event_queue, engine, md_content, loop)`. The signature at line 207 is: `(ticket, context, context_files=None, event_queue=None, engine=None, md_content="", loop=None)`. Positional args must be in this exact order. If any are swapped, fix the call site.
+- [ ] Task 1.4: Write a unit test that creates a mock `AsyncEventQueue` and `asyncio.AbstractEventLoop`, calls `run_worker_lifecycle` with a mock `ai_client.send` (returning a fixed string), and verifies the `("response", {...})` event was pushed with the correct `stream_id` format `"Tier 3 (Worker): {ticket.id}"`.
+
+## Phase 2: Fix Token Usage Tracking
+
+- [ ] Task 2.1: In `run_worker_lifecycle` (multi_agent_conductor.py:295-298), the `stats = {}` stub produces zero token counts. Replace with `stats = ai_client.get_history_bleed_stats()` which returns a dict containing `"total_input_tokens"` and `"total_output_tokens"` (see ai_client.py:1657-1796). Extract the relevant fields and update `engine.tier_usage["Tier 3"]`. If `get_history_bleed_stats` is too heavy, use the simpler approach: after `ai_client.send()`, read the last comms log entry from `ai_client.get_comms_log()[-1]` which contains `payload.usage` with token counts.
+- [ ] Task 2.2: Similarly fix Tier 1 and Tier 2 token tracking. In `_cb_plan_epic` (gui_2.py:1985-2010) and wherever Tier 2 calls happen, ensure `mma_tier_usage` is updated with actual token counts from comms log entries.
+
+## Phase 3: End-to-End Verification
+
+- [ ] Task 3.1: Update `tests/visual_sim_mma_v2.py` Stage 8 to assert that `mma_streams` contains a key matching `"Tier 3"` with non-empty content after a full mock MMA run. If this already passes with the fixes from Phase 1, mark as verified. If not, trace the specific failure point using the diagnostic logging from Task 1.1.
+- [ ] Task 3.2: Conductor - User Manual Verification 'Phase 3: End-to-End Verification' (Protocol in workflow.md)
@@ -0,0 +1,26 @@
+# Track Specification: MMA Pipeline Fix & Worker Stream Verification
+
+## Overview
+The MMA pipeline has a verified code path from `run_worker_lifecycle` → `_queue_put("response", ...)` → `_process_event_queue` → `_pending_gui_tasks("handle_ai_response")` → `mma_streams[stream_id] = text`. However, the robust_live_simulation track's session compression (2026-02-28) documented that Tier 3 worker output never appears in `mma_streams` during actual GUI operation. The simulation only ever sees `'Tier 1'` in `mma_streams` keys.
+
+This track diagnoses and fixes the pipeline break, then verifies end-to-end that worker output flows from `ai_client.send()` through to the GUI's `mma_streams` dict.
+
+## Root Cause Candidates (from code analysis)
+
+1. **`run_in_executor` positional arg ordering**: `run_worker_lifecycle` has 7 parameters. The call at `multi_agent_conductor.py:118-127` passes them positionally. If the order is wrong, `loop` could be `None` and `_queue_put` would silently fail (the `if loop:` branch is skipped, falling to `event_queue._queue.put_nowait()` which may not work from a thread-pool thread because `asyncio.Queue.put_nowait` is not thread-safe when called from outside the event loop).
+
+2. **`asyncio.Queue` thread safety**: `_queue_put` uses `asyncio.run_coroutine_threadsafe()` which IS thread-safe. But the `else` branch (`event_queue._queue.put_nowait(...)`) is NOT — `asyncio.Queue` is NOT thread-safe for cross-thread access. If `loop` is `None`, this branch silently corrupts or drops the event.
+
+3. **`ai_client.reset_session()` side effects**: Called at the start of `run_worker_lifecycle`, this resets the global `_gemini_cli_adapter.session_id = None`. If the adapter is shared state and the GUI's Tier 2 call is still in-flight, this could corrupt the provider state.
+
+4. **Token stats stub**: `engine.tier_usage` update uses `stats = {}` (empty dict, commented "ai_client.get_token_stats() is not available"), so `prompt_tokens` and `candidates_tokens` are always 0. Not a stream bug but a data bug.
+
+## Goals
+1. Fix Tier 3 worker responses reaching `mma_streams` in the GUI.
+2. Fix token usage tracking for Tier 3 workers.
+3. Verify via `ApiHookClient.get_mma_status()` that `mma_streams` contains Tier 3 output after a mock MMA run.
+
+## Architecture Reference
+- Threading model: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "Cross-Thread Data Structures" and "Pattern A: AsyncEventQueue"
+- Worker lifecycle: [docs/guide_mma.md](../../docs/guide_mma.md) — see "Tier 3: Worker Lifecycle"
+- Frame-sync: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "Frame-Sync Mechanism" action catalog (`handle_ai_response` with `stream_id`)