more refinements
@@ -8,59 +8,48 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre
### Already Implemented (DO NOT re-implement)

#### cost_tracker.py (src/cost_tracker.py)
- **`MODEL_PRICING` dict**: Pricing per 1M tokens for all supported models
- **`MODEL_PRICING` list**: List of (regex_pattern, rates_dict) tuples
```python
MODEL_PRICING: dict[str, dict[str, float]] = {
 "gemini-2.5-flash-lite": {"input": 0.075, "output": 0.30},
 "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
 "gemini-3.1-pro-preview": {"input": 1.25, "output": 5.00},
 "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
 "deepseek-v3": {"input": 0.27, "output": 1.10},
 # ... more models
}
MODEL_PRICING = [
 (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
 (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
 (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
 (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
 (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
 (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
]
```
- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Calculate cost in USD
- **Returns 0.0 for unknown models** - safe default
- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Uses regex match, returns 0.0 for unknown models
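A minimal sketch of how a regex-keyed lookup like this could behave. It is not the code in `src/cost_tracker.py`: it assumes prefix matching with `re.match` and a first-match-wins scan, and it uses only a two-entry excerpt of the table. Under those assumptions the `flash-lite` pattern has to come before the `flash` pattern, because the shorter pattern also matches the longer model name.

```python
import re

# Two-entry excerpt for illustration only. Order matters here: the first
# matching pattern wins (an assumption, not confirmed by the spec).
MODEL_PRICING = [
 (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
 (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
]

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 """Return the estimated USD cost, or 0.0 when no pattern matches."""
 for pattern, rates in MODEL_PRICING:
  if re.match(pattern, model):
   # Rates are quoted per one million tokens.
   return (input_tokens * rates["input_per_mtok"]
     + output_tokens * rates["output_per_mtok"]) / 1_000_000
 return 0.0
```

If the real implementation uses `re.search` or `re.fullmatch` instead, the ordering constraint changes accordingly.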

#### Token Tracking in ai_client.py
- **`_add_bleed_derived()`** (ai_client.py): Adds derived token counts to comms entries
- **`get_history_bleed_stats()`**: Returns token statistics from history
- **Gemini**: Token counts from API response (`usage_metadata`)
- **Anthropic**: Token counts from API response (`usage`)
- **DeepSeek**: Token counts from API response (`usage`)

#### MMA Tier Usage Tracking
- **`ConductorEngine.tier_usage`** (multi_agent_conductor.py): Tracks per-tier token usage
#### MMA Tier Usage Tracking (multi_agent_conductor.py)
- **`ConductorEngine.tier_usage`** already tracks per-tier token counts AND model:
```python
self.tier_usage = {
 "Tier 1": {"input": 0, "output": 0},
 "Tier 2": {"input": 0, "output": 0},
 "Tier 3": {"input": 0, "output": 0},
 "Tier 4": {"input": 0, "output": 0},
 "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
 "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
 "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
 "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}
```
- **Key insight**: The model name is already tracked per tier in `tier_usage[tier]["model"]`
- Updated in `run_worker_lifecycle()` from comms_log token counts

### Gaps to Fill (This Track's Scope)
- No GUI panel to display cost information
- No session-level cost accumulation
- No per-model breakdown visualization
- No tier breakdown visualization
- No per-tier cost breakdown in UI

## Architectural Constraints

### Non-Blocking Updates
- Cost calculations MUST NOT block UI thread
- Token counts are read from existing tracking - no new API calls
- Token counts are read from existing tier_usage - no new tracking needed
- Use cached values, update on state change events

### Cross-Thread Data Access
- `tier_usage` is updated on asyncio worker thread
- GUI reads via `_process_pending_gui_tasks` pattern
- Already synchronized through MMA state updates

### Memory Efficiency
- Session cost is a simple float - no history array needed
- Per-model costs can be dict: `{model_name: float}`
- `tier_usage` is updated on worker threads
- GUI reads via MMA state updates through `_pending_gui_tasks` pattern
- Already synchronized through existing state update mechanism

## Architecture Reference

@@ -69,38 +58,72 @@ Real-time cost tracking panel displaying cost per model, session totals, and bre
| File | Lines | Purpose |
|------|-------|---------|
| `src/cost_tracker.py` | 10-40 | `MODEL_PRICING`, `estimate_cost()` |
| `src/ai_client.py` | ~500-550 | `_add_bleed_derived()`, `get_history_bleed_stats()` |
| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict |
| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict with input/output/model |
| `src/gui_2.py` | ~2700-2800 | `_render_mma_dashboard()` - existing tier usage display |
| `src/gui_2.py` | ~1800-1900 | `_render_token_budget_panel()` - potential location |

### Existing MMA Dashboard Pattern
The `_render_mma_dashboard()` method already displays tier usage in a table. Extend this pattern for cost display.
### Cost Calculation Pattern
```python
from src import cost_tracker
usage = engine.tier_usage["Tier 3"]
cost = cost_tracker.estimate_cost(
 usage["model"],  # Already tracked!
 usage["input"],
 usage["output"]
)
```

## Functional Requirements

### FR1: Session Cost Accumulation
- Track total cost for the current session
- Track total cost for the current session in App/AppController state
- Reset on session reset
- Store in `App` or `AppController` state
- Sum of all tier costs

### FR2: Per-Model Cost Display
- Show cost broken down by model name
- Group by provider (Gemini, Anthropic, DeepSeek)
- Show token counts alongside costs
### FR2: Per-Tier Cost Display
- Show cost per MMA tier using existing `tier_usage[tier]["model"]` for model
- Show input/output tokens alongside cost
- Calculate using `cost_tracker.estimate_cost()`
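FR1 and FR2 can be derived in one pass over `tier_usage`. The sketch below is illustrative only: `tier_cost_rows` and `flat_rate` are hypothetical names, and the estimator is passed in as a parameter so the example runs without the project. In the application the callable would presumably be `cost_tracker.estimate_cost`.

```python
from typing import Callable

def tier_cost_rows(
 tier_usage: dict[str, dict],
 estimate: Callable[[str, int, int], float],
) -> tuple[list[tuple], float]:
 """Build (tier, model, input, output, cost) rows plus the session total."""
 rows = []
 total = 0.0
 for tier, usage in tier_usage.items():
  cost = estimate(usage["model"], usage["input"], usage["output"])
  rows.append((tier, usage["model"], usage["input"], usage["output"], cost))
  total += cost  # FR1: session cost is the sum of all tier costs
 return rows, total

def flat_rate(model: str, input_tokens: int, output_tokens: int) -> float:
 # Stand-in for cost_tracker.estimate_cost: $1 per 1M tokens for any model.
 return (input_tokens + output_tokens) / 1_000_000

usage = {
 "Tier 1": {"input": 500_000, "output": 500_000, "model": "gemini-3.1-pro-preview"},
 "Tier 3": {"input": 250_000, "output": 0, "model": "gemini-2.5-flash-lite"},
}
rows, session_total = tier_cost_rows(usage, flat_rate)
```

Because `tier_usage` already holds cumulative counts, recomputing the total from it avoids keeping a separate running sum that could drift.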

### FR3: Tier Breakdown Display
- Show cost per MMA tier (Tier 1-4)
- Use existing `tier_usage` data
- Calculate cost using `cost_tracker.estimate_cost()`

### FR4: Real-Time Updates
### FR3: Real-Time Updates
- Update cost display when MMA state changes
- Hook into existing `mma_state_update` event handling
- No polling - event-driven
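One way to satisfy the event-driven requirement is to recompute costs only inside the state-update handler and let the render path read cached values. The class and method names below (`CostPanelState`, `on_mma_state_update`, `reset`) are hypothetical, and how the handler is wired to the existing `mma_state_update` event is left open.

```python
from typing import Callable

class CostPanelState:
 """Caches cost figures so the render path never recomputes them."""

 def __init__(self, estimate: Callable[[str, int, int], float]) -> None:
  self._estimate = estimate  # e.g. cost_tracker.estimate_cost
  self.tier_costs: dict[str, float] = {}
  self.session_total = 0.0

 def on_mma_state_update(self, tier_usage: dict[str, dict]) -> None:
  """Recompute the cache; meant to run once per state-update event."""
  self.tier_costs = {
   tier: self._estimate(u["model"], u["input"], u["output"])
   for tier, u in tier_usage.items()
  }
  self.session_total = sum(self.tier_costs.values())

 def reset(self) -> None:
  """Clear cached costs on session reset."""
  self.tier_costs = {}
  self.session_total = 0.0
```

The per-frame render code would then only read `tier_costs` and `session_total`, which keeps the frame-time cost independent of the pricing lookup.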

## Non-Functional Requirements

| Requirement | Constraint |
|-------------|------------|
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for session cost state |

## Testing Requirements

### Unit Tests
- Test `estimate_cost()` with known model/token combinations
- Test unknown model returns 0.0
- Test session cost accumulation

### Integration Tests (via `live_gui` fixture)
- Verify cost panel displays after MMA execution
- Verify session reset clears costs

## Out of Scope
- Historical cost tracking across sessions
- Cost budgeting/alerts
- Per-model aggregation (model already per-tier)

## Acceptance Criteria
- [ ] Cost panel displays in GUI
- [ ] Per-tier cost shown with token counts
- [ ] Tier breakdown uses existing tier_usage model field
- [ ] Total session cost accumulates correctly
- [ ] Panel updates on MMA state changes
- [ ] Uses existing `cost_tracker.estimate_cost()`
- [ ] Session reset clears costs
- [ ] 1-space indentation maintained

## Non-Functional Requirements

| Requirement | Constraint |
|-------------|------------|
| Frame Time Impact | <1ms when panel visible |

@@ -1,49 +1,40 @@
# Track Specification: Kill/Abort Running Workers (kill_abort_workers_20260306)

## Overview
Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion in `run_in_executor()`; add cancel button with forced termination option.
Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button with forced termination option.

## Current State Audit

### Already Implemented (DO NOT re-implement)

#### Worker Execution (multi_agent_conductor.py)
- **`run_worker_lifecycle()`**: Executes ticket via `loop.run_in_executor(None, ...)`
- **`ConductorEngine.run()`**: Main loop that spawns workers
- **No cancellation mechanism exists**
- **`run_worker_lifecycle()`**: Executes ticket via `threading.Thread(daemon=True)`
- **`ConductorEngine.run()`**: Spawns parallel workers:
- **No thread references stored** - threads are launched and `join()`ed but not tracked
- **No abort mechanism** - no way to stop a running worker

#### Thread Pool Pattern (app_controller.py / gui_2.py)
- Uses `threading.Thread(daemon=True)` for background work
- `loop.run_in_executor()` for blocking AI calls
- No thread references stored for later cancellation
#### Threading (multi_agent_conductor.py)
- **`threading.Thread`**: Used for workers
- **`threading.Event`**: Available for signaling
- **No abort event per worker**

#### GUI State (gui_2.py)
- **`mma_streams`**: Dict of worker output streams by tier
- **`active_tickets`**: List of currently active tickets
- **No per-worker thread tracking**

### Gaps to Fill (This Track's Scope)
- No way to track individual worker threads
- No cancellation signal mechanism
- No UI for kill/abort
### Gaps to Fill (This Track's Scope)
- No worker thread tracking
- No abort signal mechanism
- No kill button UI
- No cleanup on termination

## Architectural Constraints

### Thread Safety
- Worker tracking MUST use thread-safe data structures
- Kill signal MUST be atomic (`threading.Event`)
- Status updates MUST be atomic

### Clean Termination
- Resources (file handles, network connections) MUST be released
- Partial results SHOULD be preserved
- No zombie processes

### AI Client Cancellation
- `ai_client.send()` is blocking - cannot be interrupted mid-call
- Kill can only happen between API calls or during tool execution
- Use `threading.Event` to signal abort between operations
### Abort Timing
- **AI API calls cannot be interrupted mid-call** (API limitation)
- Abort only between API calls or during tool execution
- Check abort flag between operations

## Architecture Reference

@@ -51,56 +42,85 @@ Add ability to kill/abort a running Tier 3 worker mid-execution. Currently worke

| File | Lines | Purpose |
|------|-------|---------|
| `src/multi_agent_conductor.py` | 150-250 | `run_worker_lifecycle()` - add abort check |
| `src/multi_agent_conductor.py` | 80-120 | `ConductorEngine.run()` - track workers |
| `src/dag_engine.py` | 50-80 | `ExecutionEngine` - add kill method |
| `src/models.py` | 30-50 | `Ticket` - add abort_event field |
| `src/gui_2.py` | 2700-2800 | `_render_mma_dashboard()` - add kill buttons |

### Proposed Worker Tracking Pattern
| `src/multi_agent_conductor.py` | ~80-150 | `ConductorEngine.run()` - thread spawning |
| `src/multi_agent_conductor.py` | ~250-320 | `run_worker_lifecycle()` - add abort check |
| `src/gui_2.py` | ~2650-2750 | `_render_mma_dashboard()` - add kill buttons |

### Current Thread Pattern
```python
# In ConductorEngine:
self._active_workers: dict[str, dict] = {}  # ticket_id -> {thread, event, status}
# In ConductorEngine.run():
threads = []
for ticket in to_run:
 t = threading.Thread(
  target=run_worker_lifecycle,
  args=(ticket, context, context_files, self.event_queue, self, md_content),
  daemon=True
 )
 threads.append(t)
 t.start()

# In Ticket:
self._abort_event: threading.Event = threading.Event()

# In run_worker_lifecycle:
if ticket._abort_event.is_set():
 return  # Exit early
for t in threads:
 t.join()
```

## Functional Requirements

### FR1: Worker Thread Tracking
- Store reference to each worker's thread
- Store abort Event per worker
- Track status: running, killing, killed
- Store thread reference in `_active_workers: dict[ticket_id, Thread]`
- Track thread state: running, completed, killed
- Clean up on completion
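A small sketch of lock-guarded worker tracking, assuming the GUI thread and the worker threads both touch the registry. `WorkerRegistry` and its methods are hypothetical names. Unlike the plain `dict[ticket_id, Thread]` named above, this version also stores a status string per entry.

```python
import threading

class WorkerRegistry:
 """Thread-safe map of ticket_id -> worker thread and status."""

 def __init__(self) -> None:
  self._lock = threading.Lock()
  self._workers: dict[str, dict] = {}

 def register(self, ticket_id: str, thread: threading.Thread) -> None:
  with self._lock:
   self._workers[ticket_id] = {"thread": thread, "status": "running"}

 def set_status(self, ticket_id: str, status: str) -> None:
  # Expected values per the spec: "running", "completed", "killed".
  with self._lock:
   if ticket_id in self._workers:
    self._workers[ticket_id]["status"] = status

 def remove(self, ticket_id: str) -> None:
  # Safe to call twice; a missing id is ignored.
  with self._lock:
   self._workers.pop(ticket_id, None)

 def running_ids(self) -> list[str]:
  # Returns a snapshot so callers can iterate without holding the lock.
  with self._lock:
   return [tid for tid, w in self._workers.items() if w["status"] == "running"]
```

The dashboard could call `running_ids()` each time it decides which kill buttons to show.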

### FR2: Kill Button UI
- Button per active worker in MMA dashboard
### FR2: Abort Event Mechanism
- Add `threading.Event()` per ticket: `_abort_events[ticket_id]`
- Worker checks event between operations:
- API call cannot be interrupted (limitation documented)
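The checkpoint idea can be sketched as a loop that tests the event before each operation. `run_steps` is a hypothetical helper, not part of `run_worker_lifecycle()`. It shows that a step already in flight, such as a blocking API call, still runs to completion, and that results gathered before the abort are kept.

```python
import threading
from typing import Callable

def run_steps(steps: list[Callable[[], str]], abort_event: threading.Event) -> list[str]:
 """Run callables in order, stopping at the first checkpoint after abort."""
 results = []
 for step in steps:
  if abort_event.is_set():
   break  # checkpoint: abort is only observed between steps
  results.append(step())  # a step in flight always runs to completion
 return results

abort = threading.Event()

def first() -> str:
 abort.set()  # simulate a kill request arriving while this step runs
 return "first"

partial = run_steps([first, lambda: "second"], abort)
```

Here `partial` holds only the first step's result, which is the partial output the track wants preserved.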

### FR3: Kill Button UI
- Button per running worker in MMA dashboard
- Confirmation dialog before kill
- Disabled if no workers running

### FR3: Abort Signal Mechanism
- `threading.Event` per ticket for abort signaling
- Worker checks event between operations
- AI client call cannot be interrupted (document limitation)

### FR4: Clean Cleanup
- Mark ticket as "killed" status
### FR4: Clean Termination
- On kill: set `abort_event.set()`
- Wait for thread to finish (with timeout)
- Remove from `_active_workers`
- Preserve partial output in stream
- Remove from active workers dict
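The termination sequence above might look like the following. `kill_worker` and its two dict parameters are hypothetical, and the sketch assumes a single caller (for example the GUI thread), so the dicts are not locked. A worker that is still blocked inside an API call when the timeout expires stays tracked instead of being reported as killed.

```python
import threading

def kill_worker(
 ticket_id: str,
 threads: dict[str, threading.Thread],
 abort_events: dict[str, threading.Event],
 timeout: float = 1.0,
) -> bool:
 """Signal abort, wait up to `timeout` seconds, report whether the thread exited."""
 event = abort_events.get(ticket_id)
 thread = threads.get(ticket_id)
 if event is None or thread is None:
  return False  # unknown ticket: nothing to kill
 event.set()
 thread.join(timeout)  # bounded wait, so the caller cannot hang
 stopped = not thread.is_alive()
 if stopped:
  # Only forget the worker once its thread has really finished.
  threads.pop(ticket_id, None)
  abort_events.pop(ticket_id, None)
 return stopped
```

Running the bounded `join()` off the render path would keep the GUI responsive even when the worker takes the full timeout to stop.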

## Non-Functional Requirements

| Requirement | Constraint |
|-------------|------------|
| Response Time | Kill takes effect within 1s of button press |
| No Deadlocks | Kill cannot cause system hang |
| Memory Safety | Worker resources freed after kill |

## Testing Requirements
### Unit Tests
- Test abort event stops worker at check point
- Test worker tracking dict updates correctly
- Test kill button enables/disables based on workers

### Integration Tests (via `live_gui` fixture)
- Start worker, click kill, verify termination
- Verify partial output preserved
- Verify no zombie threads

## Out of Scope
- Force-killing AI API calls (API limitation)
- Kill and restart (separate track)
- Kill during PowerShell execution (separate concern)

## Acceptance Criteria
- [ ] Kill button visible per running worker
- [ ] Confirmation dialog appears
- [ ] Worker terminates within 1s of kill
- [ ] Partial output preserved in stream
- [ ] Resources cleaned up
- [ ] Status reflects "killed"
- [ ] No zombie threads after kill
- [ ] 1-space indentation maintained
| No Deadlocks | Kill cannot cause system hang |
| Memory Safety | Worker resources freed after kill |

## Testing Requirements

### Unit Tests