more refinements

2026-03-06 15:47:18 -05:00
parent fca40fd8da
commit 49ae811be9
2 changed files with 149 additions and 106 deletions
@@ -8,59 +8,48 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre
 ### Already Implemented (DO NOT re-implement)
 #### cost_tracker.py (src/cost_tracker.py)
- **`MODEL_PRICING` dict**: Pricing per 1M tokens for all supported models
+- **`MODEL_PRICING` list**: List of (regex_pattern, rates_dict) tuples
  ```python
-  MODEL_PRICING: dict[str, dict[str, float]] = {
+  MODEL_PRICING = [
-   "gemini-2.5-flash-lite": {"input": 0.075, "output": 0.30},
+      (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
-   "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
+      (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
-   "gemini-3.1-pro-preview": {"input": 1.25, "output": 5.00},
+      (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
-   "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
+      (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
-   "deepseek-v3": {"input": 0.27, "output": 1.10},
+      (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
-   # ... more models
+      (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
-  }
+  ]
  ```
- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Calculate cost in USD
+- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Uses regex match, returns 0.0 for unknown models
 - **Returns 0.0 for unknown models** - safe default
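A minimal sketch of how a regex-keyed pricing list can resolve a model name, assuming first-match-wins ordering and `re.search` semantics; the actual `estimate_cost()` in `src/cost_tracker.py` may differ. `PRICING_SKETCH` and `estimate_cost_sketch` are illustrative names, and the table is a two-entry subset of the list above.

```python
import re

# Illustrative subset of the pricing list; rates are USD per 1M tokens.
# Order matters: the more specific "flash-lite" pattern must come first,
# because the plain "flash" pattern also matches "flash-lite" names.
PRICING_SKETCH = [
    (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
    (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
]


def estimate_cost_sketch(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost, or 0.0 when no pattern matches."""
    for pattern, rates in PRICING_SKETCH:
        if re.search(pattern, model):
            return (
                input_tokens * rates["input_per_mtok"]
                + output_tokens * rates["output_per_mtok"]
            ) / 1_000_000
    return 0.0  # unknown model: safe default
```

With unanchored patterns, list order is part of the contract: moving the `flash` entry above `flash-lite` would silently price lite calls at the higher rate.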
-#### Token Tracking in ai_client.py
+#### MMA Tier Usage Tracking (multi_agent_conductor.py)
- **`_add_bleed_derived()`** (ai_client.py): Adds derived token counts to comms entries
+- **`ConductorEngine.tier_usage`** already tracks per-tier token counts AND model:
 - **`get_history_bleed_stats()`**: Returns token statistics from history
 - **Gemini**: Token counts from API response (`usage_metadata`)
 - **Anthropic**: Token counts from API response (`usage`)
 - **DeepSeek**: Token counts from API response (`usage`)
 #### MMA Tier Usage Tracking
 - **`ConductorEngine.tier_usage`** (multi_agent_conductor.py): Tracks per-tier token usage
  ```python
  self.tier_usage = {
-   "Tier 1": {"input": 0, "output": 0},
+   "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
-   "Tier 2": {"input": 0, "output": 0},
+   "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
-   "Tier 3": {"input": 0, "output": 0},
+   "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
-   "Tier 4": {"input": 0, "output": 0},
+   "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
  }
  ```
 - **Key insight**: The model name is already tracked per tier in `tier_usage[tier]["model"]`
 - Updated in `run_worker_lifecycle()` from comms_log token counts
 ### Gaps to Fill (This Track's Scope)
 - No GUI panel to display cost information
 - No session-level cost accumulation
- No per-model breakdown visualization
+- No per-tier cost breakdown in UI
 - No tier breakdown visualization
 ## Architectural Constraints
 ### Non-Blocking Updates
 - Cost calculations MUST NOT block UI thread
- Token counts are read from existing tracking - no new API calls
+- Token counts are read from existing tier_usage - no new tracking needed
 - Use cached values, update on state change events
 ### Cross-Thread Data Access
- `tier_usage` is updated on asyncio worker thread
+- `tier_usage` is updated on worker threads
- GUI reads via `_process_pending_gui_tasks` pattern
+- GUI reads via MMA state updates through `_pending_gui_tasks` pattern
- Already synchronized through MMA state updates
+- Already synchronized through existing state update mechanism
 ### Memory Efficiency
 - Session cost is a simple float - no history array needed
 - Per-model costs can be dict: `{model_name: float}`
 ## Architecture Reference
@@ -69,38 +58,72 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre
 | File | Lines | Purpose |
 |------|-------|---------|
 | `src/cost_tracker.py` | 10-40 | `MODEL_PRICING`, `estimate_cost()` |
-| `src/ai_client.py` | ~500-550 | `_add_bleed_derived()`, `get_history_bleed_stats()` |
+| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict with input/output/model |
 | `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict |
 | `src/gui_2.py` | ~2700-2800 | `_render_mma_dashboard()` - existing tier usage display |
 | `src/gui_2.py` | ~1800-1900 | `_render_token_budget_panel()` - potential location |
-### Existing MMA Dashboard Pattern
+### Cost Calculation Pattern
-The `_render_mma_dashboard()` method already displays tier usage in a table. Extend this pattern for cost display.
+```python
 from src import cost_tracker
 usage = engine.tier_usage["Tier 3"]
 cost = cost_tracker.estimate_cost(
    usage["model"],      # Already tracked!
    usage["input"],
    usage["output"]
 )
 ```
 ## Functional Requirements
 ### FR1: Session Cost Accumulation
- Track total cost for the current session
+- Track total cost for the current session in App/AppController state
 - Reset on session reset
- Store in `App` or `AppController` state
+- Sum of all tier costs
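The session total can be derived from `tier_usage` on demand rather than accumulated separately. The sketch below assumes that approach; `session_cost` is a hypothetical helper, and `flat_rate_estimate` is a stand-in for `cost_tracker.estimate_cost()` with made-up rates so the example runs on its own.

```python
def flat_rate_estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Stand-in estimator with made-up rates: $1 / $2 per 1M input / output tokens."""
    return (input_tokens * 1.0 + output_tokens * 2.0) / 1_000_000


def session_cost(tier_usage: dict, estimate=flat_rate_estimate) -> tuple:
    """Return (per-tier cost dict, session total) computed from tier_usage."""
    per_tier = {
        tier: estimate(usage["model"], usage["input"], usage["output"])
        for tier, usage in tier_usage.items()
    }
    return per_tier, sum(per_tier.values())


usage = {
    "Tier 1": {"input": 1_000_000, "output": 500_000, "model": "gemini-3.1-pro-preview"},
    "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
}
per_tier, total = session_cost(usage)
```

Because the total is recomputed from the counters, a session reset only needs to zero `tier_usage`; there is no second value to keep in sync.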
-### FR2: Per-Model Cost Display
+### FR2: Per-Tier Cost Display
- Show cost broken down by model name
+- Show cost per MMA tier using existing `tier_usage[tier]["model"]` for model
- Group by provider (Gemini, Anthropic, DeepSeek)
+- Show input/output tokens alongside cost
- Show token counts alongside costs
+- Calculate using `cost_tracker.estimate_cost()`
-### FR3: Tier Breakdown Display
+### FR3: Real-Time Updates
 - Show cost per MMA tier (Tier 1-4)
 - Use existing `tier_usage` data
 - Calculate cost using `cost_tracker.estimate_cost()`
 ### FR4: Real-Time Updates
 - Update cost display when MMA state changes
 - Hook into existing `mma_state_update` event handling
 - No polling - event-driven
 ## Non-Functional Requirements
 | Requirement | Constraint |
 |-------------|------------|
 | Frame Time Impact | <1ms when panel visible |
 | Memory Overhead | <1KB for session cost state |
 ## Testing Requirements
 ### Unit Tests
 - Test `estimate_cost()` with known model/token combinations
 - Test unknown model returns 0.0
 - Test session cost accumulation
 ### Integration Tests (via `live_gui` fixture)
 - Verify cost panel displays after MMA execution
 - Verify session reset clears costs
 ## Out of Scope
 - Historical cost tracking across sessions
 - Cost budgeting/alerts
 - Per-model aggregation (model already per-tier)
 ## Acceptance Criteria
 - [ ] Cost panel displays in GUI
 - [ ] Per-tier cost shown with token counts
 - [ ] Tier breakdown uses existing tier_usage model field
 - [ ] Total session cost accumulates correctly
 - [ ] Panel updates on MMA state changes
 - [ ] Uses existing `cost_tracker.estimate_cost()`
 - [ ] Session reset clears costs
 - [ ] 1-space indentation maintained
 ## Non-Functional Requirements
 | Requirement | Constraint |
 |-------------|------------|
 | Frame Time Impact | <1ms when panel visible |
@@ -1,49 +1,40 @@
 # Track Specification: Kill/Abort Running Workers (kill_abort_workers_20260306)
 ## Overview
-Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion in `run_in_executor()`; add cancel button with forced termination option.
+Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button with forced termination option.
 ## Current State Audit
 ### Already Implemented (DO NOT re-implement)
 #### Worker Execution (multi_agent_conductor.py)
- **`run_worker_lifecycle()`**: Executes ticket via `loop.run_in_executor(None, ...)`
+- **`run_worker_lifecycle()`**: Executes ticket via `threading.Thread(daemon=True)`
- **`ConductorEngine.run()`**: Main loop that spawns workers
+- **`ConductorEngine.run()`**: Spawns parallel workers:
- **No cancellation mechanism exists**
+- **No thread references stored** - threads are launched and `join()`ed in `run()`, but no reference is kept for later tracking
 - **No abort mechanism** - no way to stop a running worker
-#### Thread Pool Pattern (app_controller.py / gui_2.py)
+#### Threading (multi_agent_conductor.py)
- Uses `threading.Thread(daemon=True)` for background work
+- **`threading.Thread`**: Used for workers
- `asyncio.run_in_executor()` for blocking AI calls
+- **`threading.Event`**: Available for signaling
- No thread references stored for later cancellation
+- **No abort event per worker**
-#### GUI State (gui_2.py)
+### Gaps to Fill (This Track's Scope)
- **`mma_streams`**: Dict of worker output streams by tier
+- No worker thread tracking
- **`active_tickets`**: List of currently active tickets
+- No abort signal mechanism
- **No per-worker thread tracking**
+- No kill button UI
 ### Gaps to Fill (This Track's Scope)
 - No way to track individual worker threads
 - No cancellation signal mechanism
 - No UI for kill/abort
 - No cleanup on termination
 ## Architectural Constraints
 ### Thread Safety
 - Worker tracking MUST use thread-safe data structures
 - Kill signal MUST be atomic (threading.Event)
 - Status updates MUST be atomic
 ### Clean Termination
 - Resources (file handles, network connections) MUST be released
 - Partial results SHOULD be preserved
 - No zombie processes
-### AI Client Cancellation
+### Abort Timing
- `ai_client.send()` is blocking - cannot be interrupted mid-call
+- **AI API calls cannot be interrupted mid-call** (API limitation)
- Kill can only happen between API calls or during tool execution
+- Abort only between API calls or during tool execution
- Use `threading.Event` to signal abort between operations
+- Check abort flag between operations
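A cooperative abort check between operations might look like the following sketch. `run_steps` is a hypothetical helper rather than part of `run_worker_lifecycle()`; each step stands in for one blocking unit of work (an API call or a tool execution) that cannot itself be interrupted.

```python
import threading


def run_steps(steps, abort_event: threading.Event) -> list:
    """Run callables in order, checking the abort flag before each one."""
    results = []
    for step in steps:
        if abort_event.is_set():
            break  # stop before the next operation; partial results are kept
        results.append(step())  # a step that has started always runs to completion
    return results


abort = threading.Event()


def second_step():
    abort.set()  # simulate a kill request arriving while this step is running
    return "step 2"


partial = run_steps([lambda: "step 1", second_step, lambda: "step 3"], abort)
```

The abort latency is bounded by the longest single step, so how quickly a kill takes effect depends on how finely the worker's operations are split.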
 ## Architecture Reference
@@ -51,56 +42,85 @@ Add ability to kill/abort a running Tier 3 worker mid-execution. Currently worke
 | File | Lines | Purpose |
 |------|-------|---------|
-| `src/multi_agent_conductor.py` | 150-250 | `run_worker_lifecycle()` - add abort check |
+| `src/multi_agent_conductor.py` | ~80-150 | `ConductorEngine.run()` - thread spawning |
-| `src/multi_agent_conductor.py` | 80-120 | `ConductorEngine.run()` - track workers |
+| `src/multi_agent_conductor.py` | ~250-320 | `run_worker_lifecycle()` - add abort check |
-| `src/dag_engine.py` | 50-80 | `ExecutionEngine` - add kill method |
+| `src/gui_2.py` | ~2650-2750 | `_render_mma_dashboard()` - add kill buttons |
 | `src/models.py` | 30-50 | `Ticket` - add abort_event field |
 | `src/gui_2.py` | 2700-2800 | `_render_mma_dashboard()` - add kill buttons |
 ### Proposed Worker Tracking Pattern
 ### Current Thread Pattern
 ```python
-# In ConductorEngine:
+# In ConductorEngine.run():
-self._active_workers: dict[str, dict] = {}  # ticket_id -> {thread, event, status}
+threads = []
 for ticket in to_run:
    t = threading.Thread(
        target=run_worker_lifecycle,
        args=(ticket, context, context_files, self.event_queue, self, md_content),
        daemon=True
    )
    threads.append(t)
    t.start()
-# In Ticket:
+for t in threads:
-self._abort_event: threading.Event = threading.Event()
+    t.join()
 # In run_worker_lifecycle:
 if ticket._abort_event.is_set():
 return  # Exit early
 ```
 ## Functional Requirements
 ### FR1: Worker Thread Tracking
- Store reference to each worker's thread
+- Store thread reference in `_active_workers: dict[ticket_id, Thread]`
- Store abort Event per worker
+- Track thread state: running, completed, killed
- Track status: running, killing, killed
+- Clean up on completion
-### FR2: Kill Button UI
+### FR2: Abort Event Mechanism
- Button per active worker in MMA dashboard
+- Add `threading.Event()` per ticket: `_abort_events[ticket_id]`
 - Worker checks event between operations
 - API call cannot be interrupted (limitation documented)
 ### FR3: Kill Button UI
 - Button per running worker in MMA dashboard
 - Confirmation dialog before kill
 - Disabled if no workers running
-### FR3: Abort Signal Mechanism
+### FR4: Clean Termination
- `threading.Event` per ticket for abort signaling
+- On kill: set `abort_event.set()`
- Worker checks event between operations
+- Wait for thread to finish (with timeout)
- AI client call cannot be interrupted (document limitation)
+- Remove from `_active_workers`
 ### FR4: Clean Cleanup
 - Mark ticket as "killed" status
 - Preserve partial output in stream
 - Remove from active workers dict
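The tracking and kill requirements could be combined in a small lock-guarded registry. This is a sketch under assumptions: `WorkerRegistry`, `spawn`, `kill`, and `active` are hypothetical names, and the worker target is assumed to accept its abort event as its only argument.

```python
import threading


class WorkerRegistry:
    """Tracks worker threads and their abort events by ticket id."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._workers = {}  # ticket_id -> (thread, abort_event)

    def spawn(self, ticket_id: str, target) -> threading.Thread:
        abort_event = threading.Event()
        thread = threading.Thread(target=target, args=(abort_event,), daemon=True)
        with self._lock:
            self._workers[ticket_id] = (thread, abort_event)
            thread.start()  # started under the lock so kill() never sees an unstarted thread
        return thread

    def kill(self, ticket_id: str, timeout: float = 1.0) -> bool:
        """Signal abort and wait up to `timeout` seconds; True if the worker stopped."""
        with self._lock:
            entry = self._workers.get(ticket_id)
        if entry is None:
            return False
        thread, abort_event = entry
        abort_event.set()
        thread.join(timeout)  # bounded wait, so a stuck worker cannot hang the caller
        stopped = not thread.is_alive()
        if stopped:
            with self._lock:
                self._workers.pop(ticket_id, None)
        return stopped

    def active(self) -> list:
        with self._lock:
            return list(self._workers)
```

The join happens outside the lock so that a slow worker does not block other registry calls. A worker that finishes normally would also need to be removed from the dict, which this sketch leaves out.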
 ## Non-Functional Requirements
 | Requirement | Constraint |
 |-------------|------------|
 | Response Time | Kill takes effect within 1s of button press |
 | No Deadlocks | Kill cannot cause system hang |
 | Memory Safety | Worker resources freed after kill |
 ## Testing Requirements
 ### Unit Tests
 - Test abort event stops worker at check point
 - Test worker tracking dict updates correctly
 - Test kill button enables/disables based on workers
 ### Integration Tests (via `live_gui` fixture)
 - Start worker, click kill, verify termination
 - Verify partial output preserved
 - Verify no zombie threads
 ## Out of Scope
 - Force-killing AI API calls (API limitation)
 - Kill and restart (separate track)
 - Kill during PowerShell execution (separate concern)
 ## Acceptance Criteria
 - [ ] Kill button visible per running worker
 - [ ] Confirmation dialog appears
 - [ ] Worker terminates within 1s of kill
 - [ ] Partial output preserved in stream
 - [ ] Resources cleaned up
 - [ ] Status reflects "killed"
 - [ ] No zombie threads after kill
 - [ ] 1-space indentation maintained
 | No Deadlocks | Kill cannot cause system hang |
 | Memory Safety | Worker resources freed after kill |
 ## Testing Requirements
 ### Unit Tests