diff --git a/conductor/tracks/cost_token_analytics_20260306/spec.md b/conductor/tracks/cost_token_analytics_20260306/spec.md index ae5cc66..2f56dc0 100644 --- a/conductor/tracks/cost_token_analytics_20260306/spec.md +++ b/conductor/tracks/cost_token_analytics_20260306/spec.md @@ -8,59 +8,48 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre ### Already Implemented (DO NOT re-implement) #### cost_tracker.py (src/cost_tracker.py) -- **`MODEL_PRICING` dict**: Pricing per 1M tokens for all supported models +- **`MODEL_PRICING` list**: List of (regex_pattern, rates_dict) tuples ```python - MODEL_PRICING: dict[str, dict[str, float]] = { - "gemini-2.5-flash-lite": {"input": 0.075, "output": 0.30}, - "gemini-2.5-flash": {"input": 0.15, "output": 0.60}, - "gemini-3.1-pro-preview": {"input": 1.25, "output": 5.00}, - "claude-3-5-sonnet": {"input": 3.00, "output": 15.00}, - "deepseek-v3": {"input": 0.27, "output": 1.10}, - # ... more models - } + MODEL_PRICING = [ + (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}), + (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}), + (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}), + (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}), + (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}), + (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}), + ] ``` -- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Calculate cost in USD -- **Returns 0.0 for unknown models** - safe default +- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Uses regex match, returns 0.0 for unknown models -#### Token Tracking in ai_client.py -- **`_add_bleed_derived()`** (ai_client.py): Adds derived token counts to comms entries -- **`get_history_bleed_stats()`**: Returns token statistics from history -- **Gemini**: Token counts 
from API response (`usage_metadata`) -- **Anthropic**: Token counts from API response (`usage`) -- **DeepSeek**: Token counts from API response (`usage`) - -#### MMA Tier Usage Tracking -- **`ConductorEngine.tier_usage`** (multi_agent_conductor.py): Tracks per-tier token usage +#### MMA Tier Usage Tracking (multi_agent_conductor.py) +- **`ConductorEngine.tier_usage`** already tracks per-tier token counts AND model: ```python self.tier_usage = { - "Tier 1": {"input": 0, "output": 0}, - "Tier 2": {"input": 0, "output": 0}, - "Tier 3": {"input": 0, "output": 0}, - "Tier 4": {"input": 0, "output": 0}, + "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"}, + "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"}, + "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, + "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, } ``` +- **Key insight**: The model name is already tracked per tier in `tier_usage[tier]["model"]` +- Updated in `run_worker_lifecycle()` from comms_log token counts ### Gaps to Fill (This Track's Scope) - No GUI panel to display cost information - No session-level cost accumulation -- No per-model breakdown visualization -- No tier breakdown visualization +- No per-tier cost breakdown in UI ## Architectural Constraints ### Non-Blocking Updates - Cost calculations MUST NOT block UI thread -- Token counts are read from existing tracking - no new API calls +- Token counts are read from existing tier_usage - no new tracking needed - Use cached values, update on state change events ### Cross-Thread Data Access -- `tier_usage` is updated on asyncio worker thread -- GUI reads via `_process_pending_gui_tasks` pattern -- Already synchronized through MMA state updates - -### Memory Efficiency -- Session cost is a simple float - no history array needed -- Per-model costs can be dict: `{model_name: float}` +- `tier_usage` is updated on worker threads +- GUI reads via MMA state updates through 
`_pending_gui_tasks` pattern +- Already synchronized through existing state update mechanism ## Architecture Reference @@ -69,38 +58,72 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre | File | Lines | Purpose | |------|-------|---------| | `src/cost_tracker.py` | 10-40 | `MODEL_PRICING`, `estimate_cost()` | -| `src/ai_client.py` | ~500-550 | `_add_bleed_derived()`, `get_history_bleed_stats()` | -| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict | +| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict with input/output/model | | `src/gui_2.py` | ~2700-2800 | `_render_mma_dashboard()` - existing tier usage display | -| `src/gui_2.py` | ~1800-1900 | `_render_token_budget_panel()` - potential location | -### Existing MMA Dashboard Pattern -The `_render_mma_dashboard()` method already displays tier usage in a table. Extend this pattern for cost display. +### Cost Calculation Pattern +```python +from src import cost_tracker +usage = engine.tier_usage["Tier 3"] +cost = cost_tracker.estimate_cost( + usage["model"], # Already tracked! 
+ usage["input"], + usage["output"] +) +``` ## Functional Requirements ### FR1: Session Cost Accumulation -- Track total cost for the current session +- Track total cost for the current session in App/AppController state - Reset on session reset -- Store in `App` or `AppController` state +- Sum of all tier costs -### FR2: Per-Model Cost Display -- Show cost broken down by model name -- Group by provider (Gemini, Anthropic, DeepSeek) -- Show token counts alongside costs +### FR2: Per-Tier Cost Display +- Show cost per MMA tier using existing `tier_usage[tier]["model"]` for model +- Show input/output tokens alongside cost +- Calculate using `cost_tracker.estimate_cost()` -### FR3: Tier Breakdown Display -- Show cost per MMA tier (Tier 1-4) -- Use existing `tier_usage` data -- Calculate cost using `cost_tracker.estimate_cost()` - -### FR4: Real-Time Updates +### FR3: Real-Time Updates - Update cost display when MMA state changes - Hook into existing `mma_state_update` event handling - No polling - event-driven ## Non-Functional Requirements +| Requirement | Constraint | +|-------------|------------| +| Frame Time Impact | <1ms when panel visible | +| Memory Overhead | <1KB for session cost state | + +## Testing Requirements + +### Unit Tests +- Test `estimate_cost()` with known model/token combinations +- Test unknown model returns 0.0 +- Test session cost accumulation + +### Integration Tests (via `live_gui` fixture) +- Verify cost panel displays after MMA execution +- Verify session reset clears costs + +## Out of Scope +- Historical cost tracking across sessions +- Cost budgeting/alerts +- Per-model aggregation (model already per-tier) + +## Acceptance Criteria +- [ ] Cost panel displays in GUI +- [ ] Per-tier cost shown with token counts +- [ ] Tier breakdown uses existing tier_usage model field +- [ ] Total session cost accumulates correctly +- [ ] Panel updates on MMA state changes +- [ ] Uses existing `cost_tracker.estimate_cost()` +- [ ] Session reset clears 
costs +- [ ] 1-space indentation maintained + -| Requirement | Constraint | -|-------------|------------| -| Frame Time Impact | <1ms when panel visible | diff --git a/conductor/tracks/kill_abort_workers_20260306/spec.md b/conductor/tracks/kill_abort_workers_20260306/spec.md index d77576a..59c947e 100644 --- a/conductor/tracks/kill_abort_workers_20260306/spec.md +++ b/conductor/tracks/kill_abort_workers_20260306/spec.md @@ -1,49 +1,40 @@ # Track Specification: Kill/Abort Running Workers (kill_abort_workers_20260306) ## Overview -Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion in `run_in_executor()`; add cancel button with forced termination option. +Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button with forced termination option. ## Current State Audit ### Already Implemented (DO NOT re-implement) #### Worker Execution (multi_agent_conductor.py) -- **`run_worker_lifecycle()`**: Executes ticket via `loop.run_in_executor(None, ...)` -- **`ConductorEngine.run()`**: Main loop that spawns workers -- **No cancellation mechanism exists** +- **`run_worker_lifecycle()`**: Executes ticket via `threading.Thread(daemon=True)` +- **`ConductorEngine.run()`**: Spawns parallel worker threads +- **No thread references stored** - threads are started and joined, but never tracked +- **No abort mechanism** - no way to stop a running worker -#### Thread Pool Pattern (app_controller.py / gui_2.py) -- Uses `threading.Thread(daemon=True)` for background work -- `asyncio.run_in_executor()` for blocking AI calls -- No thread references stored for later cancellation +#### Threading (multi_agent_conductor.py) +- **`threading.Thread`**: Used for workers +- **`threading.Event`**: Available for signaling +- **No abort event per worker** -#### GUI State (gui_2.py) -- **`mma_streams`**: Dict of worker output streams by tier -- **`active_tickets`**: List of
currently active tickets -**No per-worker thread tracking** - -### Gaps to Fill (This Track's Scope) -- No way to track individual worker threads -- No cancellation signal mechanism -- No UI for kill/abort +### Gaps to Fill (This Track's Scope) +- No worker thread tracking +- No abort signal mechanism +- No kill button UI - No cleanup on termination ## Architectural Constraints -### Thread Safety -- Worker tracking MUST use thread-safe data structures -- Kill signal MUST be atomic (threading.Event) -- Status updates MUST be atomic - ### Clean Termination - Resources (file handles, network connections) MUST be released - Partial results SHOULD be preserved - No zombie processes -### AI Client Cancellation -- `ai_client.send()` is blocking - cannot be interrupted mid-call -- Kill can only happen between API calls or during tool execution -- Use `threading.Event` to signal abort between operations +### Abort Timing +- **AI API calls cannot be interrupted mid-call** (API limitation) +- Abort takes effect only between API calls or during tool execution +- Check abort flag between operations ## Architecture Reference @@ -51,56 +42,85 @@ Add ability to kill/abort a running Tier 3 worker mid-execution. 
Currently worke | File | Lines | Purpose | |------|-------|---------| -| `src/multi_agent_conductor.py` | 150-250 | `run_worker_lifecycle()` - add abort check | -| `src/multi_agent_conductor.py` | 80-120 | `ConductorEngine.run()` - track workers | -| `src/dag_engine.py` | 50-80 | `ExecutionEngine` - add kill method | -| `src/models.py` | 30-50 | `Ticket` - add abort_event field | -| `src/gui_2.py` | 2700-2800 | `_render_mma_dashboard()` - add kill buttons | - -### Proposed Worker Tracking Pattern +| `src/multi_agent_conductor.py` | ~80-150 | `ConductorEngine.run()` - thread spawning | +| `src/multi_agent_conductor.py` | ~250-320 | `run_worker_lifecycle()` - add abort check | +| `src/gui_2.py` | ~2650-2750 | `_render_mma_dashboard()` - add kill buttons | +### Current Thread Pattern ```python -# In ConductorEngine: -self._active_workers: dict[str, dict] = {} # ticket_id -> {thread, event, status} +# In ConductorEngine.run(): +threads = [] +for ticket in to_run: + t = threading.Thread( + target=run_worker_lifecycle, + args=(ticket, context, context_files, self.event_queue, self, md_content), + daemon=True + ) + threads.append(t) + t.start() -# In Ticket: -self._abort_event: threading.Event = threading.Event() - -# In run_worker_lifecycle: -if ticket._abort_event.is_set(): - return # Exit early ``` ## Functional Requirements ### FR1: Worker Thread Tracking -- Store reference to each worker's thread -- Store abort Event per worker -- Track status: running, killing, killed +- Store thread reference in `_active_workers: dict[str, Thread]` keyed by ticket ID +- Track thread state: running, completed, killed +- Clean up on completion -### FR2: Kill Button UI -- Button per active worker in MMA dashboard +### FR2: Abort Event Mechanism +- Add `threading.Event()` per ticket: `_abort_events[ticket_id]` +- Worker checks event between operations +- API call cannot be interrupted (limitation documented) + +### FR3: Kill Button UI +- Button per running worker in MMA 
dashboard - Confirmation dialog before kill - Disabled if no workers running -### FR3: Abort Signal Mechanism -- `threading.Event` per ticket for abort signaling -- Worker checks event between operations -- AI client call cannot be interrupted (document limitation) - -### FR4: Clean Cleanup -- Mark ticket as "killed" status +### FR4: Clean Termination +- On kill: call `abort_event.set()` +- Wait for thread to finish (with timeout) +- Remove from `_active_workers` - Preserve partial output in stream -- Remove from active workers dict ## Non-Functional Requirements - | Requirement | Constraint | |-------------|------------| | Response Time | Kill takes effect within 1s of button press | | No Deadlocks | Kill cannot cause system hang | | Memory Safety | Worker resources freed after kill | +## Testing Requirements +### Unit Tests +- Test abort event stops worker at check point +- Test worker tracking dict updates correctly +- Test kill button enables/disables based on whether workers are running + +### Integration Tests (via `live_gui` fixture) +- Start worker, click kill, verify termination +- Verify partial output preserved +- Verify no zombie threads + +## Out of Scope +- Force-killing AI API calls (API limitation) +- Kill and restart (separate track) +- Kill during PowerShell execution (separate concern) + +## Acceptance Criteria +- [ ] Kill button visible per running worker +- [ ] Confirmation dialog appears +- [ ] Worker terminates within 1s of kill +- [ ] Partial output preserved in stream +- [ ] Resources cleaned up +- [ ] Status reflects "killed" +- [ ] No zombie threads after kill +- [ ] 1-space indentation maintained + ## Testing Requirements ### Unit Tests