conductor(tracks): archive 3 completed tracks, update tracks.md with active/archived sections

This commit is contained in:
2026-03-02 10:46:08 -05:00
parent e7879f45a6
commit c35f372f52
13 changed files with 17 additions and 7 deletions

View File

@@ -1,5 +0,0 @@
# Track comprehensive_gui_ux_20260228 Context
- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

View File

@@ -1,10 +0,0 @@
{
"description": "Enhance existing MMA orchestration GUI: tier stream panels, DAG editing, cost tracking, conductor lifecycle forms, track-scoped discussions, approval indicators, visual polish.",
"track_id": "comprehensive_gui_ux_20260228",
"type": "feature",
"created_at": "2026-03-01T08:42:57Z",
"status": "completed",
"updated_at": "2026-03-01T20:15:00Z",
"refined_by": "claude-opus-4-6 (1M context)",
"refined_from_commit": "08e003a"
}

View File

@@ -1,58 +0,0 @@
# Implementation Plan: Comprehensive Conductor & MMA GUI UX
Architecture reference: [docs/guide_architecture.md](../../docs/guide_architecture.md), [docs/guide_mma.md](../../docs/guide_mma.md)
## Phase 1: Tier Stream Panels & Approval Indicators
Focus: Make all 4 tier output streams visible and indicate pending approvals.
- [x] Task 1.1: Replace the single Tier 1 strategy text box in `_render_mma_dashboard` (gui_2.py:2700-2701) with four collapsible sections — one per tier. Each section uses `imgui.collapsing_header(f"Tier {N}: {label}")` wrapping a `begin_child` scrollable region (200px height). Tier 1 = "Strategy", Tier 2 = "Tech Lead", Tier 3 = "Workers", Tier 4 = "QA". Tier 3 should aggregate all `mma_streams` keys containing "Tier 3" with ticket ID sub-headers. Each section auto-scrolls to bottom when new content arrives (track previous scroll position, scroll only if user was at bottom).
- [x] Task 1.2: Add approval state indicators to the MMA dashboard. After the "Status:" line in `_render_mma_dashboard` (gui_2.py:2672-2676), check `self._pending_mma_spawn`, `self._pending_mma_approval`, and `self._pending_ask_dialog`. When any is active, render a colored blinking badge: `imgui.text_colored(ImVec4(1,0.3,0.3,1), "APPROVAL PENDING")` using `sin(time.time()*5)` for alpha pulse. Also add a `imgui.same_line()` button "Go to Approval" that scrolls/focuses the relevant dialog.
- [x] Task 1.3: Write unit tests verifying: (a) `mma_streams` with keys "Tier 1", "Tier 2 (Tech Lead)", "Tier 3: T-001", "Tier 4 (QA)" are all rendered (check by mocking `imgui.collapsing_header` calls); (b) approval indicators appear when `_pending_mma_spawn is not None`.
- [x] Task 1.4: Conductor - User Manual Verification 'Phase 1: Tier Stream Panels & Approval Indicators' (Protocol in workflow.md)
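The auto-scroll rule in Task 1.1 ("scroll only if user was at bottom") reduces to a pure check that is testable without imgui; a minimal sketch, with the function name and pixel epsilon as assumptions:

```python
def should_autoscroll(scroll_y: float, scroll_max_y: float, eps: float = 1.0) -> bool:
    """True when the user was already at (or within eps px of) the bottom,
    so newly arrived stream content should snap the view down; otherwise
    the user's manual scroll position is left alone."""
    return scroll_max_y - scroll_y <= eps
```

In the render loop, record `imgui.get_scroll_y()` / `imgui.get_scroll_max_y()` inside each tier's child region on the previous frame; when new content arrives and this returns True, call `imgui.set_scroll_here_y(1.0)` after emitting the text.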
## Phase 2: Cost Tracking & Enhanced Token Table
Focus: Add cost estimation to the existing token usage display.
- [x] Task 2.1: Create a new module `cost_tracker.py` with a `MODEL_PRICING` dict mapping model name patterns to `{"input_per_mtok": float, "output_per_mtok": float}`. Include entries for: `gemini-2.5-flash-lite` ($0.075/$0.30), `gemini-2.5-flash` ($0.15/$0.60), `gemini-3-flash-preview` ($0.15/$0.60), `gemini-3.1-pro-preview` ($3.50/$10.50), `claude-*-sonnet` ($3/$15), `claude-*-opus` ($15/$75), `deepseek-v3` ($0.27/$1.10). Function: `estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float` that does pattern matching on model name and returns dollar cost.
- [x] Task 2.2: Extend the token usage table in `_render_mma_dashboard` (gui_2.py:2685-2699) from 3 columns to 5: add "Est. Cost" and "Model". Populate using `cost_tracker.estimate_cost()` with the model name from `self.mma_tier_usage` (need to extend `tier_usage` dict in `ConductorEngine._push_state` to include model name per tier, or use a default mapping: Tier 1 → `gemini-3.1-pro-preview`, Tier 2 → `gemini-3-flash-preview`, Tier 3 → `gemini-2.5-flash-lite`, Tier 4 → `gemini-2.5-flash-lite`). Show total cost row at bottom.
- [x] Task 2.3: Write tests for `cost_tracker.estimate_cost()` covering all model patterns and edge cases (unknown model returns 0).
- [x] Task 2.4: Conductor - User Manual Verification 'Phase 2: Cost Tracking & Enhanced Token Table' (Protocol in workflow.md)
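Task 2.1's pricing lookup can be sketched as follows; the prices come from the plan above, while the matching strategy (glob match plus substring fallback, with more specific entries listed first so `flash-lite` wins over `flash`) is an assumption:

```python
from fnmatch import fnmatch

# Per-million-token prices from Task 2.1; order matters: more specific
# names must precede their prefixes for the substring fallback.
MODEL_PRICING = {
    "gemini-2.5-flash-lite": {"input_per_mtok": 0.075, "output_per_mtok": 0.30},
    "gemini-2.5-flash": {"input_per_mtok": 0.15, "output_per_mtok": 0.60},
    "gemini-3-flash-preview": {"input_per_mtok": 0.15, "output_per_mtok": 0.60},
    "gemini-3.1-pro-preview": {"input_per_mtok": 3.50, "output_per_mtok": 10.50},
    "claude-*-sonnet": {"input_per_mtok": 3.0, "output_per_mtok": 15.0},
    "claude-*-opus": {"input_per_mtok": 15.0, "output_per_mtok": 75.0},
    "deepseek-v3": {"input_per_mtok": 0.27, "output_per_mtok": 1.10},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost estimate; unknown models return 0 (Task 2.3 edge case)."""
    for pattern, price in MODEL_PRICING.items():
        if fnmatch(model, pattern) or pattern in model:
            return ((input_tokens / 1_000_000) * price["input_per_mtok"]
                    + (output_tokens / 1_000_000) * price["output_per_mtok"])
    return 0.0
```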
## Phase 3: Track Proposal Editing & Conductor Lifecycle Forms
Focus: Make track proposals editable and add conductor setup/newTrack GUI forms.
- [x] Task 3.1: Enhance `_render_track_proposal_modal` (gui_2.py:2146-2173) to make track titles and goals editable. Replace `imgui.text_colored` for title with `imgui.input_text(f"##track_title_{idx}", track['title'])`. Replace `imgui.text_wrapped` for goal with `imgui.input_text_multiline(f"##track_goal_{idx}", track['goal'], ImVec2(-1, 60))`. Add a "Remove" button per track (`imgui.button(f"Remove##{idx}")`) that pops from `self.proposed_tracks`. Edited values must be written back to `self.proposed_tracks[idx]`.
- [x] Task 3.2: Add a "Conductor Setup" collapsible section at the top of the MMA dashboard (before the Track Browser). Contains a "Run Setup" button. On click, reads `conductor/workflow.md`, `conductor/tech-stack.md`, `conductor/product.md` using `Path.read_text()`, computes a readiness summary (files found, line counts, track count via `project_manager.get_all_tracks()`), and displays it in a read-only text region. This is informational only — no backend changes.
- [x] Task 3.3: Add a "New Track" form below the Track Browser. Fields: track name (input_text), description (input_text_multiline), type dropdown (feature/chore/fix via `imgui.combo`). "Create" button calls a new helper `_cb_create_track(name, desc, type)` that: creates `conductor/tracks/{name}_{date}/` directory, writes a minimal `spec.md` from the description, writes an empty `plan.md` template, writes `metadata.json` with the track ID/type/status="new", then refreshes `self.tracks` via `project_manager.get_all_tracks()`.
- [x] Task 3.4: Write tests for track creation helper: verify directory structure, file contents, and metadata.json format. Test proposal modal editing by verifying `proposed_tracks` list is mutated correctly.
- [x] Task 3.5: Conductor - User Manual Verification 'Phase 3: Track Proposal Editing & Conductor Lifecycle Forms' (Protocol in workflow.md)
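A minimal sketch of the Task 3.3 helper, assuming the directory layout and metadata fields described above; the function name and template contents are placeholders, not the shipped implementation:

```python
import json
from datetime import date
from pathlib import Path

def create_track(base_dir: Path, name: str, desc: str, track_type: str) -> Path:
    """Create conductor/tracks/{name}_{date}/ with spec.md, plan.md,
    and metadata.json (status="new"), per Task 3.3."""
    track_id = f"{name}_{date.today().strftime('%Y%m%d')}"
    track_dir = base_dir / "conductor" / "tracks" / track_id
    track_dir.mkdir(parents=True, exist_ok=False)  # fail loudly on collision
    (track_dir / "spec.md").write_text(f"# Track Specification: {name}\n\n{desc}\n")
    (track_dir / "plan.md").write_text(f"# Implementation Plan: {name}\n")
    (track_dir / "metadata.json").write_text(json.dumps({
        "track_id": track_id,
        "type": track_type,
        "status": "new",
        "description": desc,
    }, indent=2))
    return track_dir
```

The GUI callback would wrap this and then refresh `self.tracks` via `project_manager.get_all_tracks()`.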
## Phase 4: DAG Editing & Track-Scoped Discussion
Focus: Allow GUI-based ticket manipulation and track-specific discussion history.
- [x] Task 4.1: Add an "Add Ticket" button below the Task DAG section in `_render_mma_dashboard`. On click, show an inline form: ticket ID (input_text, default auto-increment like "T-NNN"), description (input_text_multiline), target_file (input_text), depends_on (multi-select or comma-separated input of existing ticket IDs). "Create" button appends a new `Ticket` dict to `self.active_tickets` with `status="todo"` and triggers `_push_mma_state_update()` to synchronize the ConductorEngine. Cancel hides the form. Store the form visibility in `self._show_add_ticket_form: bool`.
- [x] Task 4.2: Add a "Delete" button to each DAG node in `_render_ticket_dag_node` (gui_2.py:2770-2773, after the Skip button). On click, show a confirmation popup. On confirm, remove the ticket from `self.active_tickets`, remove it from all other tickets' `depends_on` lists, and push state update. Only allow deletion of `todo` or `blocked` tickets (not `in_progress` or `completed`).
- [x] Task 4.3: Add track-scoped discussion support. In `_render_discussion_panel` (gui_2.py:2295-2483), add a toggle checkbox "Track Discussion" (visible only when `self.active_track` is set). When toggled ON: load history via `project_manager.load_track_history(self.active_track.id, base_dir)` into `self.disc_entries`, set a flag `self._track_discussion_active = True`. When toggled OFF or track changes: restore project discussion. On save/flush, if `_track_discussion_active`, write to track history file instead of project history.
- [x] Task 4.4: Write tests for: (a) adding a ticket updates `active_tickets` and has correct default fields; (b) deleting a ticket removes it from all `depends_on` references; (c) track discussion toggle switches `disc_entries` source.
- [x] Task 4.5: Conductor - User Manual Verification 'Phase 4: DAG Editing & Track-Scoped Discussion' (Protocol in workflow.md)
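Task 4.2's dependency cleanup is pure list manipulation and can be tested without the GUI; a sketch, with the function name hypothetical and field names taken from the Ticket dicts described in the plan:

```python
def delete_ticket(tickets: list[dict], ticket_id: str) -> list[dict]:
    """Remove a ticket and scrub it from every other ticket's depends_on.
    Only "todo" or "blocked" tickets may be deleted (Task 4.2); otherwise
    the original list is returned unchanged."""
    target = next((t for t in tickets if t["id"] == ticket_id), None)
    if target is None or target["status"] not in ("todo", "blocked"):
        return tickets
    remaining = [t for t in tickets if t["id"] != ticket_id]
    for t in remaining:
        t["depends_on"] = [d for d in t.get("depends_on", []) if d != ticket_id]
    return remaining
```

After mutation the caller would assign the result to `self.active_tickets` and call `_push_mma_state_update()`.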
## Phase 5: Visual Polish & Integration Testing
Focus: Dense, responsive dashboard with arcade aesthetics and end-to-end verification.
- [x] Task 5.1: Add color-coded styling to the Track Browser table. Status column uses colored text: "new" = gray, "active" = yellow, "done" = green, "blocked" = red. Progress bar uses `imgui.push_style_color` to tint: <33% red, 33-66% yellow, >66% green.
- [x] Task 5.2: Improve the DAG tree nodes with status-colored left borders. Use `imgui.get_cursor_screen_pos()` and `imgui.get_window_draw_list().add_rect_filled()` to draw a 4px colored strip to the left of each tree node matching its status color.
- [x] Task 5.3: Add a "Dashboard Summary" header line at the top of `_render_mma_dashboard` showing: `Track: {name} | Tickets: {done}/{total} | Cost: ${total_cost:.4f} | Status: {mma_status}` in a single dense line with colored segments.
- [x] Task 5.4: Write an end-to-end integration test (extending `tests/visual_sim_mma_v2.py` or creating `tests/visual_sim_gui_ux.py`) that verifies via `ApiHookClient`: (a) track creation form produces correct directory structure; (b) tier streams are populated during MMA execution; (c) approval indicators appear when expected; (d) cost tracking shows non-zero values after execution.
- [x] Task 5.5: Verify all new UI elements maintain >30 FPS via `get_ui_performance` during a full MMA simulation run.
- [x] Task 5.6: Conductor - User Manual Verification 'Phase 5: Visual Polish & Integration Testing' (Protocol in workflow.md)
## Phase 6: Live Worker Streaming & Engine Enhancements
Focus: Make MMA execution observable in real-time and configurable from the GUI. Currently workers are black boxes until completion.
- [x] Task 6.1: Wire `ai_client.comms_log_callback` to per-ticket streams during `run_worker_lifecycle` (multi_agent_conductor.py:207-300). Before calling `ai_client.send()`, set `ai_client.comms_log_callback` to a closure that pushes intermediate text chunks to the GUI via `_queue_put(event_queue, loop, "response", {"text": chunk, "stream_id": f"Tier 3 (Worker): {ticket.id}", "status": "streaming..."})`. After `send()` returns, restore the original callback. This gives real-time output streaming to the Tier 3 stream panels from Phase 1.
- [x] Task 6.2: Add per-tier model configuration to the MMA dashboard. Below the token usage table in `_render_mma_dashboard`, add a collapsible "Tier Model Config" section with 4 rows (Tier 1-4). Each row: tier label + `imgui.combo` dropdown populated from `ai_client.list_models()` (cached). Store selections in `self.mma_tier_models: dict[str, str]` with defaults from `mma_exec.get_model_for_role()`. On change, write to `self.project["mma"]["tier_models"]` for persistence.
- [x] Task 6.3: Wire per-tier model config into the execution pipeline. In `ConductorEngine.run` (multi_agent_conductor.py:105-135), when creating `WorkerContext`, read the model name from the GUI's `mma_tier_models` dict (passed via the event queue or stored on the engine). Pass it through to `run_worker_lifecycle` which should use it in `ai_client.set_provider`/`ai_client.set_model_params` before calling `send()`. Also update `mma_exec.py:get_model_for_role` to accept an override parameter.
- [x] Task 6.4: Add parallel DAG execution. In `ConductorEngine.run` (multi_agent_conductor.py:100-135), replace the sequential `for ticket in ready_tasks` loop with `asyncio.gather(*[loop.run_in_executor(None, run_worker_lifecycle, ...) for ticket in ready_tasks])`. Each worker already gets its own `ai_client.reset_session()` so they're isolated. Guard with `ai_client._send_lock` awareness — if the lock serializes all sends, parallel execution won't help. In that case, create per-worker provider instances or use separate session IDs. Mark this task as exploratory — if `_send_lock` blocks parallelism, document the constraint and defer.
- [x] Task 6.5: Add automatic retry with model escalation. In `ConductorEngine.run`, after `run_worker_lifecycle` returns, check if `ticket.status == "blocked"`. If so, and `retry_count < max_retries` (default 2), increment retry count, escalate the model (e.g., flash-lite → flash → pro), and re-execute. Store `retry_count` as a field on the ticket dict. After max retries, leave as blocked.
- [x] Task 6.6: Write tests for: (a) streaming callback pushes intermediate content to event queue; (b) per-tier model config persists to project TOML; (c) retry escalation increments model tier.
- [x] Task 6.7: Conductor - User Manual Verification 'Phase 6: Live Worker Streaming & Engine Enhancements' (Protocol in workflow.md)
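The retry-with-escalation loop from Task 6.5 can be sketched as below; the escalation chain and the shape of the `execute` callback are assumptions for illustration:

```python
# Assumed escalation order: each retry moves one step toward the most
# capable model and then stays there.
ESCALATION = ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-3.1-pro-preview"]

def next_model(current: str) -> str:
    try:
        i = ESCALATION.index(current)
    except ValueError:
        return current  # unknown model: no escalation path
    return ESCALATION[min(i + 1, len(ESCALATION) - 1)]

def run_with_retry(ticket: dict, execute, max_retries: int = 2) -> dict:
    """Re-run a blocked ticket with an escalated model, up to max_retries.
    retry_count is stored on the ticket dict, per Task 6.5."""
    while True:
        execute(ticket)
        if ticket["status"] != "blocked" or ticket.get("retry_count", 0) >= max_retries:
            return ticket
        ticket["retry_count"] = ticket.get("retry_count", 0) + 1
        ticket["model"] = next_model(ticket.get("model", ESCALATION[0]))
```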

View File

@@ -1,112 +0,0 @@
# Track Specification: Comprehensive Conductor & MMA GUI UX
## Overview
This track enhances the existing MMA orchestration GUI from its current functional-but-minimal state to a production-quality control surface. The existing implementation already has a working Track Browser, DAG tree visualizer, epic planning flow, approval dialogs, and token usage table. This track focuses on the **gaps**: dedicated tier stream panels, DAG editing, track-scoped discussions, conductor lifecycle GUI forms, cost tracking, and visual polish.
## Current State Audit (as of 08e003a)
### Already Implemented (DO NOT re-implement)
- **Track Browser table** (`_render_mma_dashboard`, lines 2633-2660): Title, status, progress bar, Load button per track.
- **Epic Planning** (`_render_projects_panel`, lines 1968-1983 + `_cb_plan_epic`): Input field + "Plan Epic (Tier 1)" button, background thread orchestration.
- **Track Proposal Modal** (`_render_track_proposal_modal`, lines 2146-2173): Shows proposed tracks, Start/Accept/Cancel.
- **Step Mode toggle**: Checkbox for "Step Mode (HITL)" with `self.mma_step_mode`.
- **Active Track Info**: Description + ticket progress bar.
- **Token Usage Table**: Per-tier input/output display in a 3-column ImGui table.
- **Tier 1 Strategy Stream**: `mma_streams.get("Tier 1")` rendered as read-only multiline (150px).
- **Task DAG Tree** (`_render_ticket_dag_node`, lines 2726-2785): Recursive tree with color-coded status (gray/yellow/green/red/orange), tooltips showing ID/target/description/dependencies/worker-stream, Retry/Skip buttons.
- **Spawn Interceptor** (`MMASpawnApprovalDialog`): Editable prompt, context_md, abort capability.
- **MMA Step Approval** (`MMAApprovalDialog`): Editable payload, approve/reject.
- **Script Confirmation** (`ConfirmDialog`): Editable script, approve/reject.
- **Comms History Panel** (`_render_comms_history_panel`, lines 2859-2984).
- **Tool Calls Panel** (`_render_tool_calls_panel`, lines 2787-2857).
- **Performance Monitor**: FPS, Frame Time, CPU, Input Lag via `perf_monitor`.
### Gaps to Fill (This Track's Scope)
1. **Tier Stream Panels**: Only Tier 1 gets a dedicated text box. Tier 2/3/4 streams exist in `mma_streams` dict but have no dedicated UI. Tier 3 output is tooltip-only on DAG nodes. No Tier 2 (Tech Lead) or Tier 4 (QA) visibility at all.
2. **DAG Editing**: Can Retry/Skip tickets but cannot reorder, insert, or delete tasks from the GUI.
3. **Conductor Lifecycle Forms**: `/conductor:setup` and `/conductor:newTrack` have no GUI equivalents — they're CLI-only. Users must use slash commands or the epic planning flow.
4. **Track-Scoped Discussion**: Discussions are global. When a track is active, the discussion panel should optionally isolate to that track's context. `project_manager.load_track_history()` exists but isn't wired to the GUI.
5. **Cost Estimation**: Token counts are displayed but not converted to estimated cost per tier or per track.
6. **Approval State Indicators**: The dashboard doesn't visually indicate when a spawn/step/tool approval is pending. `pending_mma_spawn_approval`, `pending_mma_step_approval`, `pending_tool_approval` are tracked but not rendered.
7. **Track Proposal Editing**: The modal shows proposed tracks read-only. No ability to edit track titles, goals, or remove unwanted tracks before accepting.
8. **Stream Scrollability**: Tier 1 stream is a 150px non-scrolling text box. Needs proper scrollable, resizable panels for all tier streams.
## Goals
1. **Tier Stream Visibility**: Dedicated, scrollable panels for all 4 tier output streams (Tier 1 Strategy, Tier 2 Tech Lead, Tier 3 Worker, Tier 4 QA) with auto-scroll and copy support.
2. **DAG Manipulation**: Add/remove tickets from the active track's DAG via the GUI, with dependency validation.
3. **Conductor GUI Forms**: Setup and track creation forms that invoke the same logic as the CLI slash commands.
4. **Track-Scoped Discussions**: Switch the discussion panel to track-specific history when a track is active.
5. **Cost Tracking**: Per-tier and per-track cost estimation based on model pricing.
6. **Approval Indicators**: Clear visual cues (blinking, color changes) when any approval gate is pending.
7. **Track Proposal Editing**: Allow editing/removing proposed tracks before acceptance.
8. **Polish & Density**: Make the dashboard information-dense and responsive to the MMA engine's state.
## Functional Requirements
### Tier Stream Panels
- Four collapsible/expandable text regions in the MMA dashboard, one per tier.
- Auto-scroll to bottom on new content. Toggle for manual scroll lock.
- Each stream populated from `self.mma_streams` keyed by tier prefix.
- Tier 3 streams: aggregate all `"Tier 3: T-xxx"` keyed entries, render with ticket ID headers.
### DAG Editing
- "Add Ticket" button: opens an inline form (ID, description, target_file, depends_on dropdown).
- "Remove Ticket" button on each DAG node (with confirmation).
- Changes must update `self.active_tickets`, rebuild the ConductorEngine's `TrackDAG`, and push state via `_push_state`.
### Conductor Lifecycle Forms
- "Setup Conductor" button that reads `conductor/workflow.md`, `conductor/tech-stack.md`, `conductor/product.md` and displays a readiness summary.
- "New Track" form: name, description, type dropdown. Creates the track directory structure under `conductor/tracks/`.
### Track-Scoped Discussion
- When `self.active_track` is set, add a toggle "Track Discussion" that switches to `project_manager.load_track_history(track_id)`.
- Saving flushes to the track's history file instead of the project's.
### Cost Tracking
- Model pricing table (configurable or hardcoded initial version).
- Compute `cost = (input_tokens / 1M) * input_price + (output_tokens / 1M) * output_price` per tier.
- Display as additional column in the existing token usage table.
### Approval Indicators
- When `_pending_mma_spawn` is not None: flash the "MMA Dashboard" tab header or show a blinking indicator.
- When `_pending_mma_approval` is not None: similar.
- When `_pending_ask_dialog` is True: similar.
- Use `imgui.push_style_color` to tint the relevant UI region.
### Track Proposal Editing
- Make track titles and goals editable in the proposal modal.
- Add a "Remove" button per proposed track.
- Edited data flows back to `self.proposed_tracks` before acceptance.
## Non-Functional Requirements
- **Thread Safety**: All new data mutations from background threads must go through `_pending_gui_tasks`. No direct GUI state writes from non-main threads.
- **No New Dependencies**: Use only existing Dear PyGui / imgui-bundle APIs.
- **Performance**: New panels must not degrade FPS below 30 under normal operation. Verify via `get_ui_performance`.
## Architecture Reference
- Threading model and `_process_pending_gui_tasks` action catalog: [docs/guide_architecture.md](../../docs/guide_architecture.md)
- MMA data structures (Ticket, Track, WorkerContext): [docs/guide_mma.md](../../docs/guide_mma.md)
- Hook API for testing: [docs/guide_tools.md](../../docs/guide_tools.md)
- Simulation patterns: [docs/guide_simulations.md](../../docs/guide_simulations.md)
## Functional Requirements (Engine Enhancements)
### Live Worker Streaming
- During `run_worker_lifecycle`, set `ai_client.comms_log_callback` to push intermediate text chunks to the per-ticket stream via the event queue. Currently workers are black boxes until completion — both Claude Code and Gemini CLI stream in real-time. The callback should push `{"text": chunk, "stream_id": "Tier 3 (Worker): {ticket.id}", "status": "streaming..."}` events.
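The callback swap can be sketched as below, assuming `comms_log_callback` is a plain attribute and that `send()` invokes it once per chunk (a simplification of the real signature; names follow the plan):

```python
def stream_worker(ai_client, ticket_id: str, push_event) -> str:
    """Temporarily route comms-log chunks to the per-ticket stream, then
    restore the original callback even if send() raises."""
    original = ai_client.comms_log_callback
    ai_client.comms_log_callback = lambda chunk: push_event(
        "response",
        {"text": chunk,
         "stream_id": f"Tier 3 (Worker): {ticket_id}",
         "status": "streaming..."},
    )
    try:
        return ai_client.send()
    finally:
        ai_client.comms_log_callback = original
```

The `try`/`finally` matters: without it, a worker exception would leave the client permanently wired to a dead ticket stream.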
### Per-Tier Model Configuration
- `mma_exec.py:get_model_for_role` is hardcoded. Add a GUI section with `imgui.combo` dropdowns for each tier's model. Persist to `project["mma"]["tier_models"]`. Wire into `ConductorEngine` and `run_worker_lifecycle`.
### Parallel DAG Execution
- `ConductorEngine.run()` executes ready tickets sequentially. DAG-independent tickets should run in parallel via `asyncio.gather`. Constraint: `ai_client._send_lock` serializes all API calls — parallel workers may need separate provider instances or the lock needs to be per-session rather than global. Mark as exploratory.
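A sketch of the parallel step under those constraints; ticket field names follow the plan, and per-worker isolation (the `_send_lock` question above) is assumed rather than demonstrated:

```python
import asyncio

async def run_ready_parallel(tickets: list[dict], run_worker) -> None:
    """Run every DAG-ready ticket concurrently in the default executor.
    A ticket is ready when it is "todo" and all of its depends_on IDs
    are already "completed"."""
    loop = asyncio.get_running_loop()
    done = {t["id"] for t in tickets if t["status"] == "completed"}
    ready = [t for t in tickets
             if t["status"] == "todo"
             and all(d in done for d in t.get("depends_on", []))]
    await asyncio.gather(*(loop.run_in_executor(None, run_worker, t)
                           for t in ready))
```

`ConductorEngine.run` would call this in a loop, recomputing the ready set after each wave completes.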
### Automatic Retry with Model Escalation
- `mma_exec.py` has `--failure-count` for escalation but `ConductorEngine` doesn't use it. When a worker produces BLOCKED, auto-retry with a more capable model (up to 2 retries).
## Out of Scope
- Remote management via web browser.
- Visual diagram generation (Dear PyGui node editor for DAG — future track).
- Docking/floating multi-viewport layout (requires imgui docking branch investigation — future track).

View File

@@ -1,5 +0,0 @@
# Track mma_pipeline_fix_20260301 Context
- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

View File

@@ -1,10 +0,0 @@
{
"track_id": "mma_pipeline_fix_20260301",
"description": "Fix Tier 3 worker responses not reaching mma_streams in GUI, fix token usage tracking stubs.",
"type": "fix",
"status": "new",
"priority": "P0",
"blocks": ["comprehensive_gui_ux_20260228", "simulation_hardening_20260301"],
"created_at": "2026-03-01T15:45:00Z",
"updated_at": "2026-03-01T15:45:00Z"
}

View File

@@ -1,18 +0,0 @@
# Implementation Plan: MMA Pipeline Fix & Worker Stream Verification
## Phase 1: Diagnose & Fix Worker Stream Pipeline
- [x] Task 1.1: Add diagnostic logging to `run_worker_lifecycle` (multi_agent_conductor.py:280-290). Before the `_queue_put` call, add `print(f"[MMA] Pushing Tier 3 response for {ticket.id}, loop={'present' if loop else 'NONE'}, stream_id={response_payload['stream_id']}")`. Also add a `print` inside the `except Exception as e` block that currently silently swallows errors. This will reveal whether (a) the function reaches the push point, (b) `loop` is passed correctly, (c) any exceptions are being swallowed. b7c2839
- [x] Task 1.2: Remove the unsafe `else` branch in `run_worker_lifecycle` (multi_agent_conductor.py:289-290) that calls `event_queue._queue.put_nowait()`. `asyncio.Queue` is NOT thread-safe from non-event-loop threads. The `else` branch should either raise an error (`raise RuntimeError("loop is required for thread-safe event queue access")`) or use a fallback that IS thread-safe. Same fix needed in `confirm_execution` (line 156) and `confirm_spawn` (line 183). b7c2839
- [x] Task 1.3: Verify the `run_in_executor` positional argument order at `multi_agent_conductor.py:118-127` matches `run_worker_lifecycle`'s signature exactly: `(ticket, context, context_files, event_queue, engine, md_content, loop)`. The signature at line 207 is: `(ticket, context, context_files=None, event_queue=None, engine=None, md_content="", loop=None)`. Positional args must be in this exact order. If any are swapped, fix the call site. VERIFIED CORRECT — no code change needed. b7c2839
- [x] Task 1.4: Write a unit test that creates a mock `AsyncEventQueue` and `asyncio.AbstractEventLoop`, calls `run_worker_lifecycle` with a mock `ai_client.send` (returning a fixed string), and verifies the `("response", {...})` event was pushed with the correct `stream_id` format `"Tier 3 (Worker): {ticket.id}"`. c5695c6
## Phase 2: Fix Token Usage Tracking
- [x] Task 2.1: In `run_worker_lifecycle` (multi_agent_conductor.py:295-298), the `stats = {}` stub produces zero token counts. Replace with `stats = ai_client.get_history_bleed_stats()` which returns a dict containing `"total_input_tokens"` and `"total_output_tokens"` (see ai_client.py:1657-1796). Extract the relevant fields and update `engine.tier_usage["Tier 3"]`. If `get_history_bleed_stats` is too heavy, use the simpler approach: after `ai_client.send()`, read the last comms log entry from `ai_client.get_comms_log()[-1]` which contains `payload.usage` with token counts. Used comms-log baseline approach. 3eefdfd
- [x] Task 2.2: Similarly fix Tier 1 and Tier 2 token tracking. In `_cb_plan_epic` (gui_2.py:1985-2010) and wherever Tier 2 calls happen, ensure `mma_tier_usage` is updated with actual token counts from comms log entries. a2097f1
## Phase 3: End-to-End Verification
- [x] Task 3.1: Update `tests/visual_sim_mma_v2.py` Stage 8 to assert that `mma_streams` contains a key matching `"Tier 3"` with non-empty content after a full mock MMA run. Rewrote test for real Gemini API (CLI quota exhausted) with _poll/_drain_approvals helpers, frame-sync sleeps, 120s timeouts. Addresses simulation_hardening Issues 2 & 3. 89a8d9b
- [x] Task 3.2: Fix Tier 1 tool-use bug (enable_tools=False in generate_tracks), rerun sim test — PASSED in 11s. ce5b6d2

View File

@@ -1,26 +0,0 @@
# Track Specification: MMA Pipeline Fix & Worker Stream Verification
## Overview
The MMA pipeline has a verified code path from `run_worker_lifecycle` → `_queue_put("response", ...)` → `_process_event_queue` → `_pending_gui_tasks("handle_ai_response")` → `mma_streams[stream_id] = text`. However, the robust_live_simulation track's session compression (2026-02-28) documented that Tier 3 worker output never appears in `mma_streams` during actual GUI operation. The simulation only ever sees `'Tier 1'` in `mma_streams` keys.
This track diagnoses and fixes the pipeline break, then verifies end-to-end that worker output flows from `ai_client.send()` through to the GUI's `mma_streams` dict.
## Root Cause Candidates (from code analysis)
1. **`run_in_executor` positional arg ordering**: `run_worker_lifecycle` has 7 parameters. The call at `multi_agent_conductor.py:118-127` passes them positionally. If the order is wrong, `loop` could be `None` and `_queue_put` would silently fail (the `if loop:` branch is skipped, falling back to `event_queue._queue.put_nowait()`, which may not work from a thread-pool thread because `asyncio.Queue.put_nowait` is not thread-safe when called from outside the event loop).
2. **`asyncio.Queue` thread safety**: `_queue_put` uses `asyncio.run_coroutine_threadsafe()` which IS thread-safe. But the `else` branch (`event_queue._queue.put_nowait(...)`) is NOT — `asyncio.Queue` is NOT thread-safe for cross-thread access. If `loop` is `None`, this branch silently corrupts or drops the event.
3. **`ai_client.reset_session()` side effects**: Called at the start of `run_worker_lifecycle`, this resets the global `_gemini_cli_adapter.session_id = None`. If the adapter is shared state and the GUI's Tier 2 call is still in-flight, this could corrupt the provider state.
4. **Token stats stub**: `engine.tier_usage` update uses `stats = {}` (empty dict, commented "ai_client.get_token_stats() is not available"), so `prompt_tokens` and `candidates_tokens` are always 0. Not a stream bug but a data bug.
## Goals
1. Fix Tier 3 worker responses reaching `mma_streams` in the GUI.
2. Fix token usage tracking for Tier 3 workers.
3. Verify via `ApiHookClient.get_mma_status()` that `mma_streams` contains Tier 3 output after a mock MMA run.
## Architecture Reference
- Threading model: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "Cross-Thread Data Structures" and "Pattern A: AsyncEventQueue"
- Worker lifecycle: [docs/guide_mma.md](../../docs/guide_mma.md) — see "Tier 3: Worker Lifecycle"
- Frame-sync: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "Frame-Sync Mechanism" action catalog (`handle_ai_response` with `stream_id`)

View File

@@ -1,5 +0,0 @@
# Track simulation_hardening_20260301 Context
- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

View File

@@ -1,10 +0,0 @@
{
"track_id": "simulation_hardening_20260301",
"description": "Stabilize visual_sim_mma_v2.py and mock_gemini_cli.py for reliable end-to-end MMA simulation.",
"type": "fix",
"status": "new",
"priority": "P1",
"depends_on": ["mma_pipeline_fix_20260301"],
"created_at": "2026-03-01T15:45:00Z",
"updated_at": "2026-03-01T15:45:00Z"
}

View File

@@ -1,22 +0,0 @@
# Implementation Plan: Simulation Hardening
Depends on: `mma_pipeline_fix_20260301`
Architecture reference: [docs/guide_simulations.md](../../docs/guide_simulations.md)
## Phase 1: Mock Provider Cleanup
- [x] Task 1.1: PRE-RESOLVED — mock_gemini_cli.py default path already returns plain text JSON (not function_call). Routing verified by code inspection: Epic/Sprint/Worker/tool-result all return plain text. Covered by Task 1.3 test.
- [x] Task 1.2: Fix mock sprint planning ticket format. Current mock returns `goal`/`target_file` fields; ConductorEngine.parse_json_tickets expects `description`/`status`/`assigned_to`. Also add `'generate the implementation tickets'` keyword detection alongside `'PATH: Sprint Planning'`. 0593b28
- [x] Task 1.3: Write a standalone test (`tests/test_mock_gemini_cli.py`) that invokes the mock script via `subprocess.run()` with various stdin prompts and verifies: (a) epic prompt → Track JSON, no tool calls; (b) sprint prompt → Ticket JSON, no tool calls; (c) worker prompt → plain text, no tool calls; (d) tool-result prompt → plain text response. 0873453
## Phase 2: Simulation Stability
- [x] Task 2.1: PRE-RESOLVED — visual_sim_mma_v2.py already has 0.3–1.5s frame-sync sleeps after every state-changing click, implemented in mma_pipeline_fix track (89a8d9b).
- [x] Task 2.2: PRE-RESOLVED — _poll() with condition lambdas already covers all state-transition waits cleanly. wait_for_value exists in ApiHookClient but _poll() is more flexible and already in use.
- [x] Task 2.3: Add `@pytest.mark.timeout(300)` to test_mma_complete_lifecycle to prevent infinite CI hangs. 63fa181
## Phase 3: End-to-End Verification
- [x] Task 3.1: PRE-RESOLVED — visual_sim_mma_v2.py passes in 11s against live GUI with real Gemini API (gemini-2.5-flash-lite). Verified in mma_pipeline_fix track. All 8 stages pass. ce5b6d2
- [x] Task 3.2: Added Stage 9 to sim test: non-blocking poll for mma_tier_usage Tier 3 non-zero (30s, warns if not wired). Tier 3 stream and mma_status checks already covered by Stages 7-8. 63fa181
- [x] Task 3.3: Fixed pending_script_approval gap (btn_approve_script unwired, _pending_dialog not in hook API). Sim test PASSED in 19.73s. Tier 3 token usage confirmed: input=34839, output=514. 90fc38f

View File

@@ -1,34 +0,0 @@
# Track Specification: Simulation Hardening
## Overview
The `robust_live_simulation_verification` track is marked complete but its session compression documents three unresolved issues: (1) brittle mock that triggers the wrong approval popup, (2) popup state desynchronization after "Accept" clicks, (3) Tier 3 output never appearing in `mma_streams` (fixed by `mma_pipeline_fix` track). This track stabilizes the simulation framework so it reliably passes end-to-end.
## Prerequisites
- `mma_pipeline_fix_20260301` MUST be completed first (fixes Tier 3 stream plumbing).
## Current Issues (from session compression 2026-02-28)
### Issue 1: Mock Triggers Wrong Approval Popup
`mock_gemini_cli.py` defaults to emitting a `read_file` tool call, which triggers the general tool approval popup (`_pending_ask_dialog`) instead of the MMA spawn popup (`_pending_mma_spawn`). The test expects the spawn popup and times out.
**Root cause**: The mock's default response path doesn't distinguish between MMA orchestration prompts and Tier 3 worker prompts. It needs to NOT emit tool calls for orchestration-level prompts (Tier 1/2), only for worker-level prompts where tool use is expected.
### Issue 2: Popup State Desynchronization
After clicking "Accept" on the track proposal modal, `_show_track_proposal_modal` is set to `False` but the test still sees the popup as active. The hook API's `mma_status` returns stale `proposed_tracks` data.
**Root cause**: `_cb_accept_tracks` (gui_2.py:2012-2045) processes tracks and clears `proposed_tracks`, but this runs on the GUI thread. The `ApiHookClient.get_mma_status()` reads via the GUI trampoline pattern, but there may be a frame delay before the state updates are visible.
### Issue 3: Approval Type Ambiguity
The test polling loop auto-approves `pending_approval` but can't distinguish between tool approval (`_pending_ask_dialog`), MMA step approval (`_pending_mma_approval`), and spawn approval (`_pending_mma_spawn`). The simulation needs explicit handling for each type.
**Already resolved in code**: `get_mma_status` now returns separate `pending_tool_approval`, `pending_mma_step_approval`, `pending_mma_spawn_approval` booleans. The test in `visual_sim_mma_v2.py` already checks these individually. The fix is in making the mock not trigger unexpected approval types.
## Goals
1. Make `tests/visual_sim_mma_v2.py` pass reliably against the live GUI.
2. Clean up mock_gemini_cli.py to be deterministic and not trigger spurious approvals.
3. Add retry/timeout resilience to polling loops.
## Architecture Reference
- Simulation patterns: [docs/guide_simulations.md](../../docs/guide_simulations.md)
- Hook API endpoints: [docs/guide_tools.md](../../docs/guide_tools.md) — see `/api/gui/mma_status` response fields
- HITL mechanism: [docs/guide_architecture.md](../../docs/guide_architecture.md) — see "The Execution Clutch"