feat(mma): complete Phase 6 and finalize Comprehensive GUI UX track

- Implement Live Worker Streaming: wire ai_client.comms_log_callback to Tier 3 streams - Add Parallel DAG Execution using asyncio.gather for non-dependent tickets - Implement Automatic Retry with Model Escalation (Flash-Lite -> Flash -> Pro) - Add Tier Model Configuration UI to MMA Dashboard with project TOML persistence - Fix FPS reporting in PerformanceMonitor to prevent transient 0.0 values - Update Ticket model with retry_count and dictionary-like access - Stabilize Gemini CLI integration tests and handle script approval events in simulations - Finalize and verify all 6 phases of the implementation plan
2026-03-01 22:38:43 -05:00
parent d1ce0eaaeb
commit 9fb01ce5d1
22 changed files with 756 additions and 498 deletions
@@ -1,10 +1,10 @@
-{
+{
    "description":  "Enhance existing MMA orchestration GUI: tier stream panels, DAG editing, cost tracking, conductor lifecycle forms, track-scoped discussions, approval indicators, visual polish.",
    "track_id":  "comprehensive_gui_ux_20260228",
    "type":  "feature",
    "created_at":  "2026-03-01T08:42:57Z",
-    "status":  "refined",
-    "updated_at":  "2026-03-01T15:30:00Z",
+    "status":  "completed",
+    "updated_at":  "2026-03-01T20:15:00Z",
    "refined_by":  "claude-opus-4-6 (1M context)",
    "refined_from_commit":  "08e003a"
 }
@@ -16,7 +16,7 @@ Focus: Add cost estimation to the existing token usage display.
 - [x] Task 2.1: Create a new module `cost_tracker.py` with a `MODEL_PRICING` dict mapping model name patterns to `{"input_per_mtok": float, "output_per_mtok": float}`. Include entries for: `gemini-2.5-flash-lite` ($0.075/$0.30), `gemini-2.5-flash` ($0.15/$0.60), `gemini-3-flash-preview` ($0.15/$0.60), `gemini-3.1-pro-preview` ($3.50/$10.50), `claude-*-sonnet` ($3/$15), `claude-*-opus` ($15/$75), `deepseek-v3` ($0.27/$1.10). Function: `estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float` that does pattern matching on model name and returns dollar cost.
 - [x] Task 2.2: Extend the token usage table in `_render_mma_dashboard` (gui_2.py:2685-2699) from 3 columns to 5: add "Est. Cost" and "Model". Populate using `cost_tracker.estimate_cost()` with the model name from `self.mma_tier_usage` (need to extend `tier_usage` dict in `ConductorEngine._push_state` to include model name per tier, or use a default mapping: Tier 1 → `gemini-3.1-pro-preview`, Tier 2 → `gemini-3-flash-preview`, Tier 3 → `gemini-2.5-flash-lite`, Tier 4 → `gemini-2.5-flash-lite`). Show total cost row at bottom.
 - [x] Task 2.3: Write tests for `cost_tracker.estimate_cost()` covering all model patterns and edge cases (unknown model returns 0).
- [~] Task 2.4: Conductor - User Manual Verification 'Phase 2: Cost Tracking & Enhanced Token Table' (Protocol in workflow.md)
+- [x] Task 2.4: Conductor - User Manual Verification 'Phase 2: Cost Tracking & Enhanced Token Table' (Protocol in workflow.md)

 ## Phase 3: Track Proposal Editing & Conductor Lifecycle Forms
 Focus: Make track proposals editable and add conductor setup/newTrack GUI forms.
@@ -25,7 +25,7 @@ Focus: Make track proposals editable and add conductor setup/newTrack GUI forms.
 - [x] Task 3.2: Add a "Conductor Setup" collapsible section at the top of the MMA dashboard (before the Track Browser). Contains a "Run Setup" button. On click, reads `conductor/workflow.md`, `conductor/tech-stack.md`, `conductor/product.md` using `Path.read_text()`, computes a readiness summary (files found, line counts, track count via `project_manager.get_all_tracks()`), and displays it in a read-only text region. This is informational only — no backend changes.
 - [x] Task 3.3: Add a "New Track" form below the Track Browser. Fields: track name (input_text), description (input_text_multiline), type dropdown (feature/chore/fix via `imgui.combo`). "Create" button calls a new helper `_cb_create_track(name, desc, type)` that: creates `conductor/tracks/{name}_{date}/` directory, writes a minimal `spec.md` from the description, writes an empty `plan.md` template, writes `metadata.json` with the track ID/type/status="new", then refreshes `self.tracks` via `project_manager.get_all_tracks()`.
 - [x] Task 3.4: Write tests for track creation helper: verify directory structure, file contents, and metadata.json format. Test proposal modal editing by verifying `proposed_tracks` list is mutated correctly.
- [~] Task 3.5: Conductor - User Manual Verification 'Phase 3: Track Proposal Editing & Conductor Lifecycle Forms' (Protocol in workflow.md)
+- [x] Task 3.5: Conductor - User Manual Verification 'Phase 3: Track Proposal Editing & Conductor Lifecycle Forms' (Protocol in workflow.md)

 ## Phase 4: DAG Editing & Track-Scoped Discussion
 Focus: Allow GUI-based ticket manipulation and track-specific discussion history.
@@ -34,25 +34,25 @@ Focus: Allow GUI-based ticket manipulation and track-specific discussion history
 - [x] Task 4.2: Add a "Delete" button to each DAG node in `_render_ticket_dag_node` (gui_2.py:2770-2773, after the Skip button). On click, show a confirmation popup. On confirm, remove the ticket from `self.active_tickets`, remove it from all other tickets' `depends_on` lists, and push state update. Only allow deletion of `todo` or `blocked` tickets (not `in_progress` or `completed`).
 - [x] Task 4.3: Add track-scoped discussion support. In `_render_discussion_panel` (gui_2.py:2295-2483), add a toggle checkbox "Track Discussion" (visible only when `self.active_track` is set). When toggled ON: load history via `project_manager.load_track_history(self.active_track.id, base_dir)` into `self.disc_entries`, set a flag `self._track_discussion_active = True`. When toggled OFF or track changes: restore project discussion. On save/flush, if `_track_discussion_active`, write to track history file instead of project history.
 - [x] Task 4.4: Write tests for: (a) adding a ticket updates `active_tickets` and has correct default fields; (b) deleting a ticket removes it from all `depends_on` references; (c) track discussion toggle switches `disc_entries` source.
- [~] Task 4.5: Conductor - User Manual Verification 'Phase 4: DAG Editing & Track-Scoped Discussion' (Protocol in workflow.md)
+- [x] Task 4.5: Conductor - User Manual Verification 'Phase 4: DAG Editing & Track-Scoped Discussion' (Protocol in workflow.md)

 ## Phase 5: Visual Polish & Integration Testing
 Focus: Dense, responsive dashboard with arcade aesthetics and end-to-end verification.

- [~] Task 5.1: Add color-coded styling to the Track Browser table. Status column uses colored text: "new" = gray, "active" = yellow, "done" = green, "blocked" = red. Progress bar uses `imgui.push_style_color` to tint: <33% red, 33-66% yellow, >66% green.
- [ ] Task 5.2: Improve the DAG tree nodes with status-colored left borders. Use `imgui.get_cursor_screen_pos()` and `imgui.get_window_draw_list().add_rect_filled()` to draw a 4px colored strip to the left of each tree node matching its status color.
- [ ] Task 5.3: Add a "Dashboard Summary" header line at the top of `_render_mma_dashboard` showing: `Track: {name} | Tickets: {done}/{total} | Cost: ${total_cost:.4f} | Status: {mma_status}` in a single dense line with colored segments.
- [ ] Task 5.4: Write an end-to-end integration test (extending `tests/visual_sim_mma_v2.py` or creating `tests/visual_sim_gui_ux.py`) that verifies via `ApiHookClient`: (a) track creation form produces correct directory structure; (b) tier streams are populated during MMA execution; (c) approval indicators appear when expected; (d) cost tracking shows non-zero values after execution.
- [ ] Task 5.5: Verify all new UI elements maintain >30 FPS via `get_ui_performance` during a full MMA simulation run.
- [ ] Task 5.6: Conductor - User Manual Verification 'Phase 5: Visual Polish & Integration Testing' (Protocol in workflow.md)
+- [x] Task 5.1: Add color-coded styling to the Track Browser table. Status column uses colored text: "new" = gray, "active" = yellow, "done" = green, "blocked" = red. Progress bar uses `imgui.push_style_color` to tint: <33% red, 33-66% yellow, >66% green.
+- [x] Task 5.2: Improve the DAG tree nodes with status-colored left borders. Use `imgui.get_cursor_screen_pos()` and `imgui.get_window_draw_list().add_rect_filled()` to draw a 4px colored strip to the left of each tree node matching its status color.
+- [x] Task 5.3: Add a "Dashboard Summary" header line at the top of `_render_mma_dashboard` showing: `Track: {name} | Tickets: {done}/{total} | Cost: ${total_cost:.4f} | Status: {mma_status}` in a single dense line with colored segments.
+- [x] Task 5.4: Write an end-to-end integration test (extending `tests/visual_sim_mma_v2.py` or creating `tests/visual_sim_gui_ux.py`) that verifies via `ApiHookClient`: (a) track creation form produces correct directory structure; (b) tier streams are populated during MMA execution; (c) approval indicators appear when expected; (d) cost tracking shows non-zero values after execution.
+- [x] Task 5.5: Verify all new UI elements maintain >30 FPS via `get_ui_performance` during a full MMA simulation run.
+- [x] Task 5.6: Conductor - User Manual Verification 'Phase 5: Visual Polish & Integration Testing' (Protocol in workflow.md)

 ## Phase 6: Live Worker Streaming & Engine Enhancements
 Focus: Make MMA execution observable in real-time and configurable from the GUI. Currently workers are black boxes until completion.

- [ ] Task 6.1: Wire `ai_client.comms_log_callback` to per-ticket streams during `run_worker_lifecycle` (multi_agent_conductor.py:207-300). Before calling `ai_client.send()`, set `ai_client.comms_log_callback` to a closure that pushes intermediate text chunks to the GUI via `_queue_put(event_queue, loop, "response", {"text": chunk, "stream_id": f"Tier 3 (Worker): {ticket.id}", "status": "streaming..."})`. After `send()` returns, restore the original callback. This gives real-time output streaming to the Tier 3 stream panels from Phase 1.
- [ ] Task 6.2: Add per-tier model configuration to the MMA dashboard. Below the token usage table in `_render_mma_dashboard`, add a collapsible "Tier Model Config" section with 4 rows (Tier 1-4). Each row: tier label + `imgui.combo` dropdown populated from `ai_client.list_models()` (cached). Store selections in `self.mma_tier_models: dict[str, str]` with defaults from `mma_exec.get_model_for_role()`. On change, write to `self.project["mma"]["tier_models"]` for persistence.
- [ ] Task 6.3: Wire per-tier model config into the execution pipeline. In `ConductorEngine.run` (multi_agent_conductor.py:105-135), when creating `WorkerContext`, read the model name from the GUI's `mma_tier_models` dict (passed via the event queue or stored on the engine). Pass it through to `run_worker_lifecycle` which should use it in `ai_client.set_provider`/`ai_client.set_model_params` before calling `send()`. Also update `mma_exec.py:get_model_for_role` to accept an override parameter.
- [ ] Task 6.4: Add parallel DAG execution. In `ConductorEngine.run` (multi_agent_conductor.py:100-135), replace the sequential `for ticket in ready_tasks` loop with `asyncio.gather(*[loop.run_in_executor(None, run_worker_lifecycle, ...) for ticket in ready_tasks])`. Each worker already gets its own `ai_client.reset_session()` so they're isolated. Guard with `ai_client._send_lock` awareness — if the lock serializes all sends, parallel execution won't help. In that case, create per-worker provider instances or use separate session IDs. Mark this task as exploratory — if `_send_lock` blocks parallelism, document the constraint and defer.
- [ ] Task 6.5: Add automatic retry with model escalation. In `ConductorEngine.run`, after `run_worker_lifecycle` returns, check if `ticket.status == "blocked"`. If so, and `retry_count < max_retries` (default 2), increment retry count, escalate the model (e.g., flash-lite → flash → pro), and re-execute. Store `retry_count` as a field on the ticket dict. After max retries, leave as blocked.
- [ ] Task 6.6: Write tests for: (a) streaming callback pushes intermediate content to event queue; (b) per-tier model config persists to project TOML; (c) retry escalation increments model tier.
- [ ] Task 6.7: Conductor - User Manual Verification 'Phase 6: Live Worker Streaming & Engine Enhancements' (Protocol in workflow.md)
+- [x] Task 6.1: Wire `ai_client.comms_log_callback` to per-ticket streams during `run_worker_lifecycle` (multi_agent_conductor.py:207-300). Before calling `ai_client.send()`, set `ai_client.comms_log_callback` to a closure that pushes intermediate text chunks to the GUI via `_queue_put(event_queue, loop, "response", {"text": chunk, "stream_id": f"Tier 3 (Worker): {ticket.id}", "status": "streaming..."})`. After `send()` returns, restore the original callback. This gives real-time output streaming to the Tier 3 stream panels from Phase 1.
+- [x] Task 6.2: Add per-tier model configuration to the MMA dashboard. Below the token usage table in `_render_mma_dashboard`, add a collapsible "Tier Model Config" section with 4 rows (Tier 1-4). Each row: tier label + `imgui.combo` dropdown populated from `ai_client.list_models()` (cached). Store selections in `self.mma_tier_models: dict[str, str]` with defaults from `mma_exec.get_model_for_role()`. On change, write to `self.project["mma"]["tier_models"]` for persistence.
+- [x] Task 6.3: Wire per-tier model config into the execution pipeline. In `ConductorEngine.run` (multi_agent_conductor.py:105-135), when creating `WorkerContext`, read the model name from the GUI's `mma_tier_models` dict (passed via the event queue or stored on the engine). Pass it through to `run_worker_lifecycle` which should use it in `ai_client.set_provider`/`ai_client.set_model_params` before calling `send()`. Also update `mma_exec.py:get_model_for_role` to accept an override parameter.
+- [x] Task 6.4: Add parallel DAG execution. In `ConductorEngine.run` (multi_agent_conductor.py:100-135), replace the sequential `for ticket in ready_tasks` loop with `asyncio.gather(*[loop.run_in_executor(None, run_worker_lifecycle, ...) for ticket in ready_tasks])`. Each worker already gets its own `ai_client.reset_session()` so they're isolated. Guard with `ai_client._send_lock` awareness — if the lock serializes all sends, parallel execution won't help. In that case, create per-worker provider instances or use separate session IDs. Mark this task as exploratory — if `_send_lock` blocks parallelism, document the constraint and defer.
+- [x] Task 6.5: Add automatic retry with model escalation. In `ConductorEngine.run`, after `run_worker_lifecycle` returns, check if `ticket.status == "blocked"`. If so, and `retry_count < max_retries` (default 2), increment retry count, escalate the model (e.g., flash-lite → flash → pro), and re-execute. Store `retry_count` as a field on the ticket dict. After max retries, leave as blocked.
+- [x] Task 6.6: Write tests for: (a) streaming callback pushes intermediate content to event queue; (b) per-tier model config persists to project TOML; (c) retry escalation increments model tier.
+- [x] Task 6.7: Conductor - User Manual Verification 'Phase 6: Live Worker Streaming & Engine Enhancements' (Protocol in workflow.md)
@@ -0,0 +1,8 @@
+{
+ "id": "ux_sim_test_20260301",
+ "title": "UX_SIM_TEST",
+ "description": "Simulation testing for GUI UX",
+ "type": "feature",
+ "status": "new",
+ "progress": 0.0
+}
@@ -0,0 +1,3 @@
+# Implementation Plan: UX_SIM_TEST
+
+- [ ] Task 1: Initialize
@@ -0,0 +1,5 @@
+# Specification: UX_SIM_TEST
+
+Type: feature
+
+Description: Simulation testing for GUI UX