sigh

2026-03-06 15:57:39 -05:00
parent 49ae811be9
commit bf24164b1f
4 changed files with 435 additions and 212 deletions
@@ -1,100 +1,108 @@
-# Track Specification: Cost & Token Analytics Panel (cost_token_analytics_20260306)
+# Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)

-## Overview
-Real-time cost tracking panel displaying cost per model, session totals, and breakdown by tier. Uses existing `cost_tracker.py` which is implemented but has no GUI representation.
+> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)

-## Current State Audit
+## Phase 1: Foundation & Research
+Focus: Verify existing infrastructure
+
+- [ ] Task 1.1: Initialize MMA Environment
+    - Run `activate_skill mma-orchestrator` before starting
+
+- [ ] Task 1.2: Verify cost_tracker.py implementation
+    - WHERE: `src/cost_tracker.py`
+    - WHAT: Confirm `MODEL_PRICING` list structure
+    - HOW: Use `manual-slop_py_get_definition` on `estimate_cost`
+    - OUTPUT: Document exact regex-based matching
+
+    - **Note**: `estimate_cost` loops through the patterns; unknown models return 0.0.
+    - **SHA verification**: Run `uv run pytest tests/test_cost_tracker.py -v`
+    - COMMAND: `uv run pytest tests/test_cost_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py -v` (run at most 4 test files per invocation because of complex threading issues)
+
+    - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py` (substitute actual file)"
+    - Execute the announced command.
+    - Run tests in batches of at most 4 test files at a time. Use `--timeout=60`, or raise it (e.g., `--timeout=120`) if the tests in a batch are known to be slow (e.g., simulation tests).
+    - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_cache_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py tests/test_cost_panel.py -v`"
+    - **CRITICAL:** Running the full suite frequently leads to random timeouts or threading access violations. To avoid waiting out the full timeout when the GUI exits early, the test should check whether the GUI process has already exited.
+    - For each remaining code file, verify a corresponding test file exists.
+    - If a test file is missing, create one. Before writing the test, review the naming convention and testing style of the existing tests, including any `@pytest` markers they use (e.g., `@pytest.mark.integration`). The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).
+    - Use the `live_gui` fixture to interact with a real instance of the application via the Hook API; `test_gui2_events.py` and `test_gui2_parity.py` already demonstrate this pattern.
+    - For any file over 50 lines, first use `get_file_summary`, `py_get_skeleton`, `py_get_code_outline`, or `py_get_definition` to map its structure and decide whether you need the full content. When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/` (last updated: 08e003a):
+
+- **[docs/guide_architecture.md](../docs/guide_architecture.md):** Threading model, event system, AI client, HITL mechanism.
+- **[docs/guide_mma.md](../docs/guide_mma.md):** Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.
+- **[docs/guide_simulations.md](../docs/guide_simulations.md):** `live_gui` fixture and Puppeteer pattern, mock provider protocol, visual verification patterns.
+- **[docs/guide_tools.md](../docs/guide_tools.md):** MCP Bridge 3-layer security model, 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference.
+- **[docs/guide_meta_boundary.md](../docs/guide_meta_boundary.md):** The critical distinction between the Application's Strict-HITL environment and the Meta-Tooling environment used to build it.
+- **Application Layer** (`gui_2.py`, `app_controller.py`): Located in the `src/` directory. Events flow through `SyncEventQueue` and `EventEmitter` for decoupled communication.
+- **`api_hooks.py`**: HTTP server that exposes internal state via a REST API when the app is launched with the `--enable-test-hooks` flag (otherwise used only by the CLI adapter); uses `SyncEventQueue` to push events to the GUI.
+- **ApiHookClient** (`api_hook_client.py`): Client for interacting with the running application via the Hook API.
+    - `get_status()`: Health check endpoint
+    - `get_mma_status()`: Returns full MMA engine status
+    - `get_gui_state()`: Returns full GUI state
+    - `get_value(item)`: Gets a GUI value by mapped field name
+    - `get_performance()`: Returns performance metrics
+    - `click(item, user_data)`: Simulates a button click
+    - `set_value(item, value)`: Sets a GUI value
+    - `select_tab(item, value)`: Selects a specific tab
+    - `reset_session()`: Resets the session via button click
+
+- **MMA Prompts** (`mma_prompts.py`): Structured system prompts for MMA tiers
+- **ConductorTechLead** (`conductor_tech_lead.py`): Generates tickets from track brief
+- **models.py** (`models.py`): Data structures (Ticket, Track, TrackState, WorkerContext)
+- **dag_engine.py** (`dag_engine.py`): DAG execution engine with cycle detection and topological sorting
+- **multi_agent_conductor.py** (`multi_agent_conductor.py`): MMA orchestration engine
+- **shell_runner.py** (`shell_runner.py`): Sandboxed PowerShell execution
+- **file_cache.py** (`file_cache.py`): AST parser with tree-sitter
+- **summarize.py** (`summarize.py`): Heuristic file summaries
+- **outline_tool.py** (`outline_tool.py`): Code outlining with line ranges
+- **theme.py** / **theme_2.py** (`theme.py`, `theme_2.py`): ImGui theme/color palettes
+- **log_registry.py** (`log_registry.py`): Session log registry with TOML persistence
+- **log_pruner.py** (`log_pruner.py`): Automated log pruning
+- **performance_monitor.py** (`performance_monitor.py`): FPS, frame time, CPU tracking
+
+- **gui_2.py**: Main GUI (79KB) - Primary ImGui interface
+- **ai_client.py**: Multi-provider LLM abstraction (71KB)
+- **mcp_client.py**: 26 MCP-style tools (48KB)
+- **app_controller.py**: Headless controller (82KB) - FastAPI for headless mode
+- **project_manager.py**: Project configuration management (13KB)
+- **aggregate.py**: Context aggregation (14KB)
+- **session_logger.py**: Session logging (6KB)
+- **gemini_cli_adapter.py**: CLI subprocess adapter (6KB)
+
+- **events.py**: Event system (3KB)
+- **cost_tracker.py**: Cost estimation (1KB)
+
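The regex-based matching that Task 1.2 asks you to document can be sketched as below. This is a minimal reconstruction, not the code in `src/cost_tracker.py`: it reuses the `MODEL_PRICING` table as listed in the previous version of the spec and assumes `re.search`, first-match-wins iteration, and per-million-token arithmetic. Confirm each assumption with `py_get_definition` before relying on it. The sketch uses the project's 1-space indentation.

```python
import re

# Pricing table as listed in the previous spec: (regex_pattern, rates), with
# rates in USD per million tokens. Order matters because the first match wins,
# so the "flash-lite" pattern must come before the broader "flash" pattern.
MODEL_PRICING = [
 (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
 (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
 (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
 (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
 (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
 (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
]


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 """Return the estimated USD cost of a call, or 0.0 for an unknown model."""
 for pattern, rates in MODEL_PRICING:
  # Assumption: re.search (unanchored); the real module may use re.match.
  if re.search(pattern, model):
   input_cost = input_tokens / 1_000_000 * rates["input_per_mtok"]
   output_cost = output_tokens / 1_000_000 * rates["output_per_mtok"]
   return input_cost + output_cost
 # No pattern matched: unknown models are priced at zero.
 return 0.0
```

Because the first matching pattern wins, `gemini-2\.5-flash-lite` has to appear before `gemini-2\.5-flash`; otherwise a flash-lite model would be billed at flash rates.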
+## Current State Audit (as of {commit_sha})

 ### Already Implemented (DO NOT re-implement)
-
-#### cost_tracker.py (src/cost_tracker.py)
- **`MODEL_PRICING` list**: List of (regex_pattern, rates_dict) tuples
-  ```python
-  MODEL_PRICING = [
-      (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
-      (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
-      (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
-      (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
-      (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
-      (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
-  ]
-  ```
- **`estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float`**: Uses regex match, returns 0.0 for unknown models
-
-#### MMA Tier Usage Tracking (multi_agent_conductor.py)
- **`ConductorEngine.tier_usage`** already tracks per-tier token counts AND model:
-  ```python
-  self.tier_usage = {
-   "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
-   "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
-   "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
-   "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
-  }
-  ```
- **Key insight**: The model name is already tracked per tier in `tier_usage[tier]["model"]`
- Updated in `run_worker_lifecycle()` from comms_log token counts
-
-### Gaps to Fill (This Track's Scope)
- No GUI panel to display cost information
- No session-level cost accumulation
- No per-tier cost breakdown in UI
-
-## Architectural Constraints
-
-### Non-Blocking Updates
- Cost calculations MUST NOT block UI thread
- Token counts are read from existing tier_usage - no new tracking needed
- Use cached values, update on state change events
-
-### Cross-Thread Data Access
- `tier_usage` is updated on worker threads
- GUI reads via MMA state updates through `_pending_gui_tasks` pattern
- Already synchronized through existing state update mechanism
-
-## Architecture Reference
-
-### Key Integration Points
-
-| File | Lines | Purpose |
-|------|-------|---------|
-| `src/cost_tracker.py` | 10-40 | `MODEL_PRICING`, `estimate_cost()` |
-| `src/multi_agent_conductor.py` | ~50-60 | `tier_usage` dict with input/output/model |
-| `src/gui_2.py` | ~2700-2800 | `_render_mma_dashboard()` - existing tier usage display |
-
-### Cost Calculation Pattern
-```python
-from src import cost_tracker
-usage = engine.tier_usage["Tier 3"]
-cost = cost_tracker.estimate_cost(
-    usage["model"],      # Already tracked!
-    usage["input"],
-    usage["output"]
-)
+- **`tier_usage` dict in `ConductorEngine.__init__`** (`multi_agent_conductor.py`, lines 50-60)
+ ```python
+ self.tier_usage = {
+  "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
+  "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
+  "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
+  "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
+ }
 ```
-
-## Functional Requirements
-
-### FR1: Session Cost Accumulation
- Track total cost for the current session in App/AppController state
- Reset on session reset
- Sum of all tier costs
-
-### FR2: Per-Tier Cost Display
- Show cost per MMA tier using existing `tier_usage[tier]["model"]` for model
- Show input/output tokens alongside cost
- Calculate using `cost_tracker.estimate_cost()`
-
-### FR3: Real-Time Updates
- Update cost display when MMA state changes
- Hook into existing `mma_state_update` event handling
- No polling - event-driven
+
+### Gaps to Fill (This Track's Scope)
+- **Per-tier cost breakdown display** (token counts and model are already tracked per tier)
+- **Cost per model**, grouped by model name (Gemini, Anthropic, DeepSeek)
+- **Total session cost**: accumulate and display the total cost
+- **Uses existing `cost_tracker.py` functions**
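The per-tier breakdown, per-model grouping, and session total listed above could share one cached state object, sketched here under stated assumptions. `SessionCostState`, `on_mma_state_update`, and `render_lines` are hypothetical names, and the trimmed pricing table with its local `estimate_cost` stands in for `cost_tracker.estimate_cost` so the sketch runs on its own. Costs are recomputed only when an MMA state update arrives, so the per-frame path just formats cached numbers; this is one possible way to stay within the frame-time and memory budgets. Models are grouped by their exact name string; grouping by provider would need an extra name-to-provider mapping that is not shown.

```python
import re
from typing import Dict, List

# Trimmed copy of MODEL_PRICING (USD per million tokens) so the sketch runs
# on its own; the real panel would call cost_tracker.estimate_cost instead.
MODEL_PRICING = [
 (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
 (r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
 (r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
]


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 for pattern, rates in MODEL_PRICING:
  if re.search(pattern, model):
   input_cost = input_tokens / 1_000_000 * rates["input_per_mtok"]
   output_cost = output_tokens / 1_000_000 * rates["output_per_mtok"]
   return input_cost + output_cost
 return 0.0


class SessionCostState:
 """Hypothetical cached cost state for the panel: a few numbers per tier."""

 def __init__(self) -> None:
  self.reset()

 def reset(self) -> None:
  # Call on session reset so the panel shows zero cost again.
  self.per_tier: Dict[str, dict] = {}
  self.per_model: Dict[str, float] = {}
  self.total = 0.0

 def on_mma_state_update(self, tier_usage: Dict[str, dict]) -> None:
  # Event-driven: recompute once per MMA state update on the GUI thread,
  # never per frame. `tier_usage` is assumed to be the snapshot carried by
  # the state update, not the dict that worker threads keep mutating.
  self.per_tier = {}
  self.per_model = {}
  for tier, usage in tier_usage.items():
   cost = estimate_cost(usage["model"], usage["input"], usage["output"])
   # Copy the tier's token counts and model, adding the computed cost.
   self.per_tier[tier] = dict(usage, cost=cost)
   self.per_model[usage["model"]] = self.per_model.get(usage["model"], 0.0) + cost
  # Session total is the sum of all tier costs.
  self.total = sum(row["cost"] for row in self.per_tier.values())

 def render_lines(self) -> List[str]:
  # Per-frame path: formats cached numbers only, no regex or pricing lookups.
  lines = []
  for tier, row in self.per_tier.items():
   lines.append(f"{tier} [{row['model']}] in={row['input']} out={row['output']} ${row['cost']:.4f}")
  lines.append(f"Session total: ${self.total:.4f}")
  return lines
```

On session reset, calling `reset()` clears every cached value, which is what the "Session reset clears costs" criterion checks.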

 ## Non-Functional Requirements
-
 | Requirement | Constraint |
 |-------------|------------|
 | Frame Time Impact | <1ms when panel visible |
 | Memory Overhead | <1KB for session cost state |
+| Thread Safety | Read `tier_usage` only via MMA state updates (it is written on worker threads) |

 ## Testing Requirements

@@ -103,6 +111,35 @@ cost = cost_tracker.estimate_cost(
 - Test unknown model returns 0.0
 - Test session cost accumulation
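A sketch of the three unit tests listed above, written as plain `assert`-based functions that pytest collects. The test function names are hypothetical. To keep the snippet self-contained it defines a two-entry stand-in for `estimate_cost` (assumed to use `re.search` and per-million-token rates); in `tests/test_cost_tracker.py` you would instead use `from src import cost_tracker` and call `cost_tracker.estimate_cost`, without mocking its internals.

```python
import re

# Two-entry stand-in for src/cost_tracker.py so this file runs on its own.
# In tests/test_cost_tracker.py, use `from src import cost_tracker` and call
# cost_tracker.estimate_cost instead.
MODEL_PRICING = [
 (r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
 (r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
]


def estimate_cost(model, input_tokens, output_tokens):
 for pattern, rates in MODEL_PRICING:
  if re.search(pattern, model):
   input_cost = input_tokens / 1_000_000 * rates["input_per_mtok"]
   output_cost = output_tokens / 1_000_000 * rates["output_per_mtok"]
   return input_cost + output_cost
 return 0.0


def close(a, b, tol=1e-9):
 # Float costs are compared with a tolerance rather than ==.
 return abs(a - b) < tol


def test_estimate_cost_known_model():
 # flash-lite must get its own rate, not the broader "flash" rate:
 # 1M input * 0.075 + 1M output * 0.30 = 0.375 USD.
 assert close(estimate_cost("gemini-2.5-flash-lite", 1_000_000, 1_000_000), 0.375)
 # Plain flash: 2M input * 0.15 = 0.30 USD.
 assert close(estimate_cost("gemini-2.5-flash", 2_000_000, 0), 0.30)


def test_estimate_cost_unknown_model_returns_zero():
 assert estimate_cost("some-unlisted-model", 10_000, 10_000) == 0.0


def test_session_cost_accumulation():
 # Session total is the sum of the per-tier costs.
 tier_usage = {
  "Tier 3": {"input": 1_000_000, "output": 0, "model": "gemini-2.5-flash-lite"},
  "Tier 4": {"input": 0, "output": 1_000_000, "model": "gemini-2.5-flash-lite"},
 }
 total = sum(estimate_cost(u["model"], u["input"], u["output"]) for u in tier_usage.values())
 assert close(total, 0.375)
```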

+### Integration Tests (via `live_gui` fixture)
+- Verify cost panel displays after API call
+- Verify costs update after MMA execution
+- Verify session reset clears costs
+
+- **NO mocking** of `cost_tracker` internals
+- Use real state
+- Test artifacts go to `tests/artifacts/`
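The batching rule from Phase 1 (at most 4 test files per pytest run, each run with a timeout) can be scripted roughly as below. `batch_pytest_commands` and `run_batches` are hypothetical helpers, and `--timeout` assumes the `pytest-timeout` plugin is installed. Batches run one after another and stop at the first failure, in line with the plan's advice to avoid full-suite runs.

```python
import subprocess
from typing import List


def batch_pytest_commands(test_files: List[str], batch_size: int = 4, timeout: int = 60) -> List[List[str]]:
 """Split test files into batches of at most `batch_size` files and
 build one `uv run pytest` argument list per batch."""
 if batch_size < 1:
  raise ValueError("batch_size must be at least 1")
 commands = []
 for start in range(0, len(test_files), batch_size):
  batch = test_files[start:start + batch_size]
  # --timeout comes from the pytest-timeout plugin (assumed installed).
  commands.append(["uv", "run", "pytest", *batch, "-v", f"--timeout={timeout}"])
 return commands


def run_batches(commands: List[List[str]]) -> int:
 """Run each batch in turn, announcing the command first; stop at the first failure."""
 for cmd in commands:
  print("Command:", " ".join(cmd))
  result = subprocess.run(cmd)
  if result.returncode != 0:
   return result.returncode
 return 0
```

For a batch known to contain slow simulation tests, pass a larger value, e.g. `run_batches(batch_pytest_commands(files, timeout=120))`.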
+
+## Out of Scope
+- Historical cost tracking across sessions
+- Cost budgeting/alerts
+- Export cost reports
+- API cost for web searches (no token counts available)
+
+## Acceptance Criteria
+- [ ] Cost panel displays in GUI
+- [ ] Per-tier cost shown with token counts
+- [ ] Tier breakdown accurate using existing `tier_usage`
+- [ ] Total session cost accumulates correctly
+- [ ] Panel updates on MMA state changes
+- [ ] Uses existing `cost_tracker.estimate_cost()`
+- [ ] Session reset clears costs
+- [ ] 1-space indentation maintained
+
 ### Integration Tests (via `live_gui` fixture)
 - Verify cost panel displays after MMA execution
 - Verify session reset clears costs