archiving tracks

2026-03-08 13:29:53 -04:00
parent b44c0f42cd
commit 66338b3ba0
83 changed files with 0 additions and 0 deletions
@@ -0,0 +1,9 @@
+# Cost & Token Analytics Panel
+
+**Track ID:** cost_token_analytics_20260306
+
+**Status:** Planned
+
+**See Also:**
+- [Spec](./spec.md)
+- [Plan](./plan.md)
@@ -0,0 +1,9 @@
+{
+  "id": "cost_token_analytics_20260306",
+  "name": "Cost & Token Analytics Panel",
+  "status": "planned",
+  "created_at": "2026-03-06T00:00:00Z",
+  "updated_at": "2026-03-06T00:00:00Z",
+  "type": "feature",
+  "priority": "medium"
+}
@@ -0,0 +1,61 @@
+# Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)
+
+> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
+
+## Phase 1: Foundation & Research
+Focus: Verify existing infrastructure
+
+- [x] Task 1.1: Initialize MMA Environment (skipped - already in context)
+- [x] Task 1.2: Verify cost_tracker.py implementation - cost_tracker.estimate_cost() exists, uses MODEL_PRICING regex patterns
+- [x] Task 1.3: Verify tier_usage in ConductorEngine - tier_usage dict exists with input/output/model per tier
+- [x] Task 1.4: Review existing MMA dashboard - Cost already shown in summary line (lines 1659-1670), no dedicated panel yet
+
+## Phase 2: State Management
+Focus: Add cost tracking state to app
+
+- [x] Task 2.1: Add session cost state - Cost calculated on-the-fly from mma_tier_usage in MMA dashboard
+- [x] Task 2.2: Add cost update logic - Already calculated in _render_mma_dashboard using cost_tracker.estimate_cost()
+- [x] Task 2.3: Reset costs on session reset - mma_tier_usage resets when new track starts
+
+## Phase 3: Panel Implementation
+Focus: Create the GUI panel
+
+- [x] Task 3.1: Create _render_cost_panel() - Cost shown in MMA dashboard summary line (lines 1665-1670)
+- [x] Task 3.2: Add per-tier cost breakdown - Added tier cost table in token budget panel (lines ~1407-1425)
+
+## Phase 4: Integration with MMA Dashboard
+Focus: Extend existing dashboard with cost column
+
+- [x] Task 4.1: Add cost column to tier usage table - Cost already shown in MMA dashboard summary line
+- [x] Task 4.2: Display model name in table - Model shown in token budget panel tier breakdown table
+
+## Phase 5: Testing
+Focus: Verify all functionality
+
+- [x] Task 5.1: Write unit tests - test_cost_tracker.py already covers estimate_cost()
+- [x] Task 5.2: Write integration test - test_mma_dashboard_refresh.py covers MMA dashboard
+- [ ] Task 5.3: Conductor - Phase Verification - Run tests to verify
+
+## Implementation Notes
+
+### Thread Safety
+- tier_usage is updated on the asyncio worker thread
+- GUI reads via `_process_pending_gui_tasks` - already synchronized
+- No additional locking needed
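The hand-off described above could look roughly like the sketch below. It is an assumption about the general pattern, not the app's actual code: `CostState`, `post_tier_usage`, and the method bodies are hypothetical, and only the idea of draining pending tasks on the GUI thread comes from the notes. The lock lives inside the hand-off, which is why the panel code itself would need none.

```python
import copy
import threading

class CostState:
 """Hypothetical holder for GUI-side tier usage; not the app's real class."""

 def __init__(self) -> None:
  self._lock = threading.Lock()
  self._pending: list[dict] = []
  self.mma_tier_usage: dict = {}

 def post_tier_usage(self, snapshot: dict) -> None:
  # Worker thread: queue a deep copy so the GUI never shares mutable dicts.
  with self._lock:
   self._pending.append(copy.deepcopy(snapshot))

 def process_pending_gui_tasks(self) -> None:
  # GUI thread, once per frame: swap the queue out under the lock, then apply.
  with self._lock:
   tasks, self._pending = self._pending, []
  for snapshot in tasks:
   self.mma_tier_usage = snapshot
```

Rendering code would then read `mma_tier_usage` only on the GUI thread, after the pending tasks have been drained.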
+
+### Cost Calculation Strategy
+- Use current model for all tiers (simplification)
+- Future: Track model per tier if needed
+- Unknown models return 0.0 cost (safe default)
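The strategy above can be sketched as follows. This is a minimal illustration, not the contents of `src/cost_tracker.py`: the pattern list, the per-million-token prices, and the exact signature of `estimate_cost` are placeholder assumptions. Only the regex-based lookup and the 0.0 fallback come from the notes. Indentation follows the 1-space style from the checklist below.

```python
import re

# Hypothetical pricing table: (model-name regex, USD per 1M input tokens,
# USD per 1M output tokens). Patterns and prices are placeholders, not real rates.
MODEL_PRICING = [
 (r"example-pro", 2.00, 8.00),
 (r"example-flash", 0.10, 0.40),
]

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 """Return the estimated cost in USD, or 0.0 when no pattern matches."""
 for pattern, input_price, output_price in MODEL_PRICING:
  if re.search(pattern, model, re.IGNORECASE):
   return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
 # Unknown models fall through to the safe default.
 return 0.0
```

Returning 0.0 for an unmatched model keeps the panel rendering even when a new model name appears, at the price of silently under-reporting until a pattern is added.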
+
+### Files Modified
+- `src/gui_2.py`: Add cost state, render methods
+- `src/app_controller.py`: Possibly add cost state (if using controller)
+- `tests/test_cost_panel.py`: New test file
+
+### Code Style Checklist
+- [ ] 1-space indentation throughout
+- [ ] CRLF line endings on Windows
+- [ ] No comments unless requested
+- [ ] Type hints on new state variables
+- [ ] Use existing `vec4` colors for consistency
@@ -0,0 +1,200 @@
+# Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)
+
+> **Reference:** [Spec](./spec.md) | [Architecture Guide](../../../docs/guide_architecture.md)
+
+## Phase 1: Foundation & Research
+Focus: Verify existing infrastructure
+
+- [ ] Task 1.1: Initialize MMA Environment
+    - Run `activate_skill mma-orchestrator` before starting
+
+- [ ] Task 1.2: Verify cost_tracker.py implementation
+    - WHERE: `src/cost_tracker.py`
+    - WHAT: Confirm `MODEL_PRICING` list structure
+    - HOW: Use `manual-slop_py_get_definition` on `estimate_cost`
+    - OUTPUT: Document exact regex-based matching
+
+    - **Note**: `estimate_cost` loops through the patterns; unknown models return 0.0.
+    - **SHA verification**: Run `uv run pytest tests/test_cost_tracker.py -v`
+    - COMMAND: `uv run pytest tests/test_cost_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py -v --batched` (4 files max due to complex threading issues)
+
+    - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py` (substitute actual file)"
+    - Execute the announced command.
+    - Execute test commands in batches of at most 4 test files at a time. If the tests in a batch are known to be slow (e.g., simulation tests), raise the limit with `--timeout=60` or `--timeout=120` as appropriate.
+    - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_cache_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py tests/test_cost_panel.py -v`"
+    - **CRITICAL:** Running the full suite at once can frequently lead to random timeouts or threading access violations. To avoid waiting out the full timeout if the GUI exits early, the test file should check its extension.
+    - For each remaining code file, verify a corresponding test file exists.
+    - If a test file is missing, create one. Before writing the test, be aware of the naming convention and testing style, and note that existing tests may carry `@pytest` decorators (e.g., `@pytest.mark.integration`). The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).
+    - Use the `live_gui` fixture to interact with a real instance of the application via the Hook API; `test_gui2_events.py` and `test_gui2_parity.py` already verify this pattern.
+    - For each test file over 50 lines, use `py_get_skeleton`, `py_get_code_outline`, or `py_get_definition` first to map its structure. When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/` (last updated: 08e003a):
+
+- **[docs/guide_architecture.md](../docs/guide_architecture.md):** Threading model, event system, AI client, HITL mechanism.
+- **[docs/guide_mma.md](../docs/guide_mma.md):** Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.
+- **[docs/guide_simulations.md](../docs/guide_simulations.md):** `live_gui` fixture and Puppeteer pattern, mock provider protocol, visual verification patterns.
+- Use `get_file_summary` first to decide whether you need the full content. Use `get_file_summary`, `py_get_skeleton`, or `py_get_code_outline` to map the architecture. When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/` (last updated: 08e003a):
+
+- **[docs/guide_tools.md](../docs/guide_tools.md):** MCP Bridge 3-layer security model, 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference.
+- **[docs/guide_meta_boundary.md](../docs/guide_meta_boundary.md):** The critical distinction between the Application's Strict-HITL environment and the Meta-Tooling environment used to build it.
+- **Application Layer** (`gui_2.py`, `app_controller.py`): Threads run in `src/` directory. Events flow through `SyncEventQueue` and `EventEmitter` for decoupled communication.
+- **`api_hooks.py`**: HTTP server exposing internal state via a REST API when launched with the `--enable-test-hooks` flag (otherwise only for the CLI adapter); uses `SyncEventQueue` to push events to the GUI.
+- **ApiHookClient** (`api_hook_client.py`): Client for interacting with the running application via the Hook API.
+    - `get_status()`: Health check endpoint
+    - `get_mma_status()`: Returns full MMA engine status
+    - `get_gui_state()`: Returns full GUI state
+    - `get_value(item)`: Gets a GUI value by mapped field name
+    - `get_performance()`: Returns performance metrics
+    - `click(item, user_data)`: Simulates a button click
+    - `set_value(item, value)`: Sets a GUI value
+    - `select_tab(item, value)`: Selects a specific tab
+    - `reset_session()`: Resets the session via button click
+
+- **MMA Prompts** (`mma_prompts.py`): Structured system prompts for MMA tiers
+- **ConductorTechLead** (`conductor_tech_lead.py`): Generates tickets from track brief
+- **models.py** (`models.py`): Data structures (Ticket, Track, TrackState, WorkerContext)
+- **dag_engine.py** (`dag_engine.py`): DAG execution engine with cycle detection and topological sorting
+- **multi_agent_conductor.py** (`multi_agent_conductor.py`): MMA orchestration engine
+- **shell_runner.py** (`shell_runner.py`): Sandboxed PowerShell execution
+- **file_cache.py** (`file_cache.py`): AST parser with tree-sitter
+- **summarize.py** (`summarize.py`): Heuristic file summaries
+- **outline_tool.py** (`outline_tool.py`): Code outlining with line ranges
+- **theme.py** / **theme_2.py** (`theme.py`, `theme_2.py`): ImGui theme/color palettes
+- **log_registry.py** (`log_registry.py`): Session log registry with TOML persistence
+- **log_pruner.py** (`log_pruner.py`): Automated log pruning
+- **performance_monitor.py** (`performance_monitor.py`): FPS, frame time, CPU tracking
+
+- **gui_2.py**: Main GUI (79KB) - Primary ImGui interface
+- **ai_client.py**: Multi-provider LLM abstraction (71KB)
+- **mcp_client.py**: 26 MCP-style tools (48KB)
+- **app_controller.py**: Headless controller (82KB) - FastAPI for headless mode
+- **project_manager.py**: Project configuration management (13KB)
+- **aggregate.py**: Context aggregation (14KB)
+- **session_logger.py**: Session logging (6KB)
+- **gemini_cli_adapter.py**: CLI subprocess adapter (6KB)
+
+- **events.py**: Event system (3KB)
+- **cost_tracker.py**: Cost estimation (1KB)
+
+## Current State Audit (as of {commit_sha})
+
+### Already Implemented (DO NOT re-implement)
+- **`tier_usage` dict in `ConductorEngine.__init__`** (multi_agent_conductor.py lines 50-60)
+ ```python
+ self.tier_usage = {
+  "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
+  "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
+  "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
+  "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
+ }
+```
+- **Per-ticket breakdown available** (already tracked by tier)
+- **Cost per model**: grouped by model name (Gemini, Anthropic, DeepSeek)
+- **Total session cost**: accumulate and display the total cost
+- **Uses existing `cost_tracker.py` functions**
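A per-tier breakdown and session total can be derived from a `tier_usage`-shaped dict along the lines below. This is a sketch under stated assumptions: `tier_cost_rows` is a hypothetical helper, and the pricing function is passed in as a parameter so the example does not guess at the real `cost_tracker.estimate_cost()` signature. The flat-rate lambda and the model names in the usage section are stand-ins only.

```python
from typing import Callable

def tier_cost_rows(
 tier_usage: dict,
 estimate: Callable[[str, int, int], float],
) -> tuple[list[tuple], float]:
 """Build (tier, model, input, output, cost) rows plus the session total."""
 rows = []
 total = 0.0
 for tier, usage in tier_usage.items():
  cost = estimate(usage["model"], usage["input"], usage["output"])
  rows.append((tier, usage["model"], usage["input"], usage["output"], cost))
  total += cost
 return rows, total

if __name__ == "__main__":
 # Stand-in pricing: a flat hypothetical rate of 1 USD per 1M tokens.
 flat_rate = lambda model, inp, out: (inp + out) / 1_000_000
 usage = {
  "Tier 1": {"input": 1_000, "output": 500, "model": "model-a"},
  "Tier 2": {"input": 2_000, "output": 0, "model": "model-b"},
 }
 rows, total = tier_cost_rows(usage, flat_rate)
 for row in rows:
  print(row)
 print("total", total)
```

Because the rows are recomputed from `tier_usage` on each call, no separate cost state has to be stored or reset.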
+
+## Non-Functional Requirements
+| Requirement | Constraint |
+|-------------|------------|
+| Frame Time Impact | <1ms when panel visible |
+| Memory Overhead | <1KB for session cost state |
+| Thread Safety | Read tier_usage via state updates only |
+
+## Testing Requirements
+
+### Unit Tests
+- Test `estimate_cost()` with known model/token combinations
+- Test unknown model returns 0.0
+- Test session cost accumulation
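The three unit tests listed above might take a shape like the following. To keep the sketch runnable on its own it defines a stand-in `estimate_cost` with a hypothetical rate table; a real test file would import the function from the project's `cost_tracker` module instead, in line with the no-mocking rule, and the model names and rates here are invented for illustration.

```python
# Stand-in for cost_tracker.estimate_cost so this sketch runs on its own.
# The model name and rates are hypothetical (USD per 1M tokens).
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 rates = {"example-model": (1.0, 2.0)}
 if model not in rates:
  return 0.0
 input_rate, output_rate = rates[model]
 return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def test_known_model_cost() -> None:
 # 1M input tokens at 1.0 plus 0.5M output tokens at 2.0.
 assert estimate_cost("example-model", 1_000_000, 500_000) == 2.0

def test_unknown_model_returns_zero() -> None:
 assert estimate_cost("no-such-model", 1_000, 1_000) == 0.0

def test_session_cost_accumulates() -> None:
 # Session cost is the sum of the per-call estimates.
 calls = [("example-model", 1_000_000, 0), ("example-model", 0, 1_000_000)]
 total = sum(estimate_cost(*call) for call in calls)
 assert total == 3.0
```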
+
+### Integration Tests (via `live_gui` fixture)
+- Verify cost panel displays after API call
+- Verify costs update after MMA execution
+- Verify session reset clears costs
+
+### Structural Testing Contract
+- **NO mocking** of `cost_tracker` internals
+- Use real state
+- Test artifacts go to `tests/artifacts/`
+
+## Out of Scope
+- Historical cost tracking across sessions
+- Cost budgeting/alerts
+- Export cost reports
+- API cost for web searches (no token counts available)
+
+## Acceptance Criteria
+- [ ] Cost panel displays in GUI
+- [ ] Per-tier cost shown with token counts
+- [ ] Tier breakdown accurate using existing `tier_usage`
+- [ ] Total session cost accumulates correctly
+- [ ] Panel updates on MMA state changes
+- [ ] Uses existing `cost_tracker.estimate_cost()`
+- [ ] Session reset clears costs
+- [ ] 1-space indentation maintained
+
+## Out of Scope
+- Historical cost tracking across sessions
+- Cost budgeting/alerts
+- Per-model aggregation (model already per-tier)
+
+## Acceptance Criteria
+- [ ] Cost panel displays in GUI
+- [ ] Per-tier cost shown with token counts
+- [ ] Tier breakdown uses existing tier_usage model field
+- [ ] Total session cost accumulates correctly
+- [ ] Panel updates on MMA state changes
+- [ ] Uses existing `cost_tracker.estimate_cost()`
+- [ ] Session reset clears costs
+- [ ] 1-space indentation maintained
+