Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)
Reference: Spec | Architecture Guide
Phase 1: Foundation & Research
Focus: Verify existing infrastructure
- Task 1.1: Initialize MMA Environment
  - Run activate_skill mma-orchestrator before starting
- Task 1.2: Verify cost_tracker.py implementation
  - WHERE: src/cost_tracker.py
  - WHAT: Confirm MODEL_PRICING list structure
  - HOW: Use manual-slop_py_get_definition on estimate_cost
  - OUTPUT: Document exact regex-based matching
  - Note: estimate_cost loops through patterns; unknown models return 0.0.
  - SHA verification: Run uv run pytest tests/test_cost_tracker.py -v
  - COMMAND: uv run pytest tests/test_cost_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py -v --batched (4 files max due to complex threading issues)
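The note in Task 1.2 describes regex-based matching with a 0.0 fallback for unknown models. A minimal sketch of that behavior follows, for orientation only. The table layout, the patterns, the per-million-token rates, and the `estimate_cost` signature are all assumptions made for illustration; the real structure in src/cost_tracker.py is what Task 1.2 is meant to document. The sketch uses the 1-space indentation this plan requires.

```python
import re

# Hypothetical pricing table: (regex pattern, USD per 1M input tokens,
# USD per 1M output tokens). Patterns and rates are placeholders, not real prices.
MODEL_PRICING = [
 (r"example-pro", 2.00, 8.00),
 (r"example-flash", 0.50, 1.50),
]

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 """Return the estimated cost in USD, or 0.0 if no pattern matches the model."""
 for pattern, input_rate, output_rate in MODEL_PRICING:
  if re.search(pattern, model):
   input_cost = (input_tokens / 1_000_000) * input_rate
   output_cost = (output_tokens / 1_000_000) * output_rate
   return input_cost + output_cost
 return 0.0  # unknown model: no pattern matched
```

Because the first matching pattern wins in a loop like this, the order of entries matters whenever one pattern is a prefix of another.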
- Example announcement: "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_specific_feature.py" (substitute the actual file).
- Execute the announced command.
- Run potentially slow simulation tests in batches of at most 4 test files at a time. Use --timeout=60, or --timeout=120 if the tests in the batch are known to be slow (e.g., simulation tests).
- Example announcement for a batch: "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_cache_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py tests/test_cost_panel.py -v"
- CRITICAL: Running the full suite at once frequently leads to random timeouts or threading access violations. To avoid waiting out the full timeout if the GUI exits early, the test file should check its extension.
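The batching rule above (at most 4 test files per run, with a timeout) can be sketched as a small helper that builds one command per batch. The function name `build_batches` and the default timeout are hypothetical; only the 4-file limit, the `uv run pytest` invocation, and the `--timeout` option come from this plan.

```python
from typing import List

MAX_FILES_PER_BATCH = 4  # limit stated in the plan

def build_batches(test_files: List[str], timeout: int = 60) -> List[List[str]]:
 """Build one pytest command per batch of at most four test files."""
 commands = []
 for start in range(0, len(test_files), MAX_FILES_PER_BATCH):
  batch = test_files[start:start + MAX_FILES_PER_BATCH]
  commands.append(["uv", "run", "pytest", *batch, "-v", f"--timeout={timeout}"])
 return commands
```

Each returned list can be passed to `subprocess.run` one at a time, so a hang in one batch does not block the report for the others.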
- For each remaining code file, verify a corresponding test file exists.
- If a test file is missing, create one. Before writing the test, be aware of the naming convention and testing style; existing tests may carry @pytest decorators (e.g., @pytest.mark.integration). The new tests must validate the functionality described in this phase's tasks (plan.md).
- Use the live_gui fixture to interact with a real instance of the application via the Hook API. test_gui2_events.py and test_gui2_parity.py already verify this pattern.
- For each file over 50 lines, use get_file_summary first to decide whether you need the full content. Use get_file_summary, py_get_skeleton, py_get_code_outline, or py_get_definition to map the architecture.
- When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in docs/ (last updated: 08e003a):
  - docs/guide_architecture.md: Threading model, event system, AI client, HITL mechanism.
  - docs/guide_mma.md: Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.
  - docs/guide_simulations.md: live_gui fixture and Puppeteer pattern, mock provider protocol, visual verification patterns.
  - docs/guide_tools.md: MCP Bridge 3-layer security model, 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference.
  - docs/guide_meta_boundary.md: The critical distinction between the Application's Strict-HITL environment and the Meta-Tooling environment used to build it.
- Application Layer (gui_2.py, app_controller.py): Threads run in src/ directory. Events flow through SyncEventQueue and EventEmitter for decoupled communication.
- api_hooks.py: HTTP server exposing internal state via REST API when launched with the --enable-test-hooks flag (otherwise only for the CLI adapter); uses SyncEventQueue to push events to the GUI.
- ApiHookClient (api_hook_client.py): Client for interacting with the running application via the Hook API.
  - get_status(): Health check endpoint
  - get_mma_status(): Returns full MMA engine status
  - get_gui_state(): Returns full GUI state
  - get_value(item): Gets a GUI value by mapped field name
  - get_performance(): Returns performance metrics
  - click(item, user_data): Simulates a button click
  - set_value(item, value): Sets a GUI value
  - select_tab(item, value): Selects a specific tab
  - reset_session(): Resets the session via button click
- MMA Prompts (mma_prompts.py): Structured system prompts for MMA tiers
- ConductorTechLead (conductor_tech_lead.py): Generates tickets from track brief
- models.py: Data structures (Ticket, Track, TrackState, WorkerContext)
- dag_engine.py: DAG execution engine with cycle detection and topological sorting
- multi_agent_conductor.py: MMA orchestration engine
- shell_runner.py: Sandboxed PowerShell execution
- file_cache.py: AST parser with tree-sitter
- summarize.py: Heuristic file summaries
- outline_tool.py: Code outlining with line ranges
- theme.py / theme_2.py: ImGui theme/color palettes
- log_registry.py: Session log registry with TOML persistence
- log_pruner.py: Automated log pruning
- performance_monitor.py: FPS, frame time, CPU tracking
- gui_2.py: Main GUI (79KB) - Primary ImGui interface
- ai_client.py: Multi-provider LLM abstraction (71KB)
- mcp_client.py: 26 MCP-style tools (48KB)
- app_controller.py: Headless controller (82KB) - FastAPI for headless mode
- project_manager.py: Project configuration management (13KB)
- aggregate.py: Context aggregation (14KB)
- session_logger.py: Session logging (6KB)
- gemini_cli_adapter.py: CLI subprocess adapter (6KB)
- events.py: Event system (3KB)
- cost_tracker.py: Cost estimation (1KB)
Current State Audit (as of {commit_sha})
Already Implemented (DO NOT re-implement)
tier_usage dict in ConductorEngine.__init__ (multi_agent_conductor.py lines 50-60):
self.tier_usage = {
"Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
"Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
"Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
"Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}
- Per-ticket breakdown available (already tracked by tier)
- Cost per model: grouped by model name (Gemini, Anthropic, DeepSeek)
- Total session cost: accumulate and display total cost
- Uses existing cost_tracker.py functions
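To illustrate how the existing tier_usage dict could feed the panel, here is a minimal sketch that turns it into per-tier rows plus a session total. The helper name `summarize_tier_costs` and the row layout are hypothetical. The estimator is passed in as a parameter, and its `(model, input_tokens, output_tokens)` signature is an assumption about cost_tracker.estimate_cost that should be checked in Task 1.2.

```python
from typing import Callable, Dict, List, Tuple

def summarize_tier_costs(
 tier_usage: Dict[str, dict],
 estimate: Callable[[str, int, int], float],
) -> Tuple[List[dict], float]:
 """Return one row per tier and the summed session cost."""
 rows = []
 total = 0.0
 for tier, usage in tier_usage.items():
  # Each entry has the shape shown above: {"input": int, "output": int, "model": str}
  cost = estimate(usage["model"], usage["input"], usage["output"])
  rows.append({
   "tier": tier,
   "model": usage["model"],
   "input": usage["input"],
   "output": usage["output"],
   "cost": cost,
  })
  total += cost
 return rows, total
```

Keeping the estimator as an argument means the helper never duplicates pricing logic, which matches the requirement to reuse the existing cost_tracker.py functions.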
Non-Functional Requirements
| Requirement | Constraint |
|---|---|
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for session cost state |
| Thread Safety | Read tier_usage via state updates only |
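One way to approach the frame-time and thread-safety constraints in the table is to recompute costs only when a state update arrives and let the render loop read a cached value. The sketch below shows that idea under stated assumptions: the class name `CostPanelState`, the `on_state_update` hook, and a payload carrying a `"tier_usage"` key are all hypothetical, as is the estimator signature.

```python
import copy
from typing import Callable, Dict

class CostPanelState:
 """Holds a private snapshot of tier_usage and a cached total cost."""

 def __init__(self, estimate: Callable[[str, int, int], float]) -> None:
  self._estimate = estimate
  self._snapshot: Dict[str, dict] = {}
  self._total = 0.0

 def on_state_update(self, payload: dict) -> None:
  """Copy tier_usage out of the update and recompute the total once."""
  # Assumed payload shape: {"tier_usage": {...}}. The deep copy keeps the
  # panel from sharing mutable data with the engine thread.
  self._snapshot = copy.deepcopy(payload.get("tier_usage", {}))
  self._total = sum(
   self._estimate(u["model"], u["input"], u["output"])
   for u in self._snapshot.values()
  )

 def reset(self) -> None:
  """Clear the snapshot and cached cost, e.g. on session reset."""
  self._snapshot = {}
  self._total = 0.0

 @property
 def total_cost(self) -> float:
  """Cheap read for the render loop; nothing is recomputed per frame."""
  return self._total
```

With this shape the per-frame work is a single attribute read, and the stored state is one small dict plus a float.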
Testing Requirements
Unit Tests
- Test estimate_cost() with known model/token combinations
- Test unknown model returns 0.0
- Test session cost accumulation
Integration Tests (via live_gui fixture)
- Verify cost panel displays after API call
- Verify costs update after MMA execution
- Verify session reset clears costs
- NO mocking of cost_tracker internals
- Use real state
- Test artifacts go to tests/artifacts/
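Integration checks such as "costs update after MMA execution" usually need to wait for the running application to reach a state. A minimal polling sketch follows. The helper `wait_for_status` is hypothetical; it relies only on a client object exposing get_mma_status(), and it assumes that call returns a dict, which should be confirmed against api_hook_client.py.

```python
import time
from typing import Any, Callable

def wait_for_status(
 client: Any,
 predicate: Callable[[dict], bool],
 timeout: float = 10.0,
 interval: float = 0.1,
) -> dict:
 """Poll client.get_mma_status() until predicate(status) holds or timeout expires."""
 deadline = time.monotonic() + timeout
 while True:
  status = client.get_mma_status()
  if predicate(status):
   return status
  if time.monotonic() >= deadline:
   raise TimeoutError("MMA status did not reach the expected state")
  time.sleep(interval)
```

A test could pass a predicate that checks for non-zero token counts in the returned status, then assert on the displayed cost once the helper returns.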
Out of Scope
- Historical cost tracking across sessions
- Cost budgeting/alerts
- Export cost reports
- API cost for web searches (no token counts available)
Acceptance Criteria
- Cost panel displays in GUI
- Per-tier cost shown with token counts
- Tier breakdown accurate using existing tier_usage
- Total session cost accumulates correctly
- Panel updates on MMA state changes
- Uses existing cost_tracker.estimate_cost()
- Session reset clears costs
- 1-space indentation maintained
Unit Tests
- Test estimate_cost() with known model/token combinations
- Test unknown model returns 0.0
- Test session cost accumulation
Integration Tests (via live_gui fixture)
- Verify cost panel displays after MMA execution
- Verify session reset clears costs
Out of Scope
- Historical cost tracking across sessions
- Cost budgeting/alerts
- Per-model aggregation (model already per-tier)
Acceptance Criteria
- Cost panel displays in GUI
- Per-tier cost shown with token counts
- Tier breakdown uses existing tier_usage model field
- Total session cost accumulates correctly
- Panel updates on MMA state changes
- Uses existing cost_tracker.estimate_cost()
- Session reset clears costs
- 1-space indentation maintained
Structural Testing Contract
- Use real cost_tracker module (no mocking)
- Test artifacts go to tests/artifacts/