manual_slop/conductor/tracks/cost_token_analytics_20260306/spec.md
2026-03-06 15:57:39 -05:00

Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)

Reference: Spec | Architecture Guide

Phase 1: Foundation & Research

Focus: Verify existing infrastructure

  • Task 1.1: Initialize MMA Environment

    • Run activate_skill mma-orchestrator before starting
  • Task 1.2: Verify cost_tracker.py implementation

    • WHERE: src/cost_tracker.py

    • WHAT: Confirm MODEL_PRICING list structure

    • HOW: Use manual-slop_py_get_definition on estimate_cost

    • OUTPUT: Document exact regex-based matching

    • Note: estimate_cost loops through the patterns in order; unknown models return 0.0.

    • SHA verification: Run uv run pytest tests/test_cost_tracker.py -v

    • COMMAND: `uv run pytest tests/test_cost_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py -v --batched` (4 files max due to complex threading issues)

    • Example Announcement: "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_specific_feature.py (substitute actual file)"

    • Execute the announced command.

    • Batch the test runs: execute at most 4 test files at a time. If the tests in a batch are known to be slow (e.g., simulation tests), raise the timeout with --timeout=60 or --timeout=120 as appropriate.

    • Example Announcement: "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_cache_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py tests/test_cost_panel.py -v"

    • CRITICAL: Running the full suite frequently leads to random timeouts or threading access violations. To avoid waiting out the full timeout if the GUI exits early, the test file should check its extension.

    • For each remaining code file, verify a corresponding test file exists.

    • If a test file is missing, create one. Before writing the test, be aware of the naming convention and testing style, and note that existing tests may carry pytest decorators (e.g., @pytest.mark.integration). The new tests must validate the functionality described in this phase's tasks (plan.md).

    • Use the live_gui fixture to interact with a real instance of the application via the Hook API. test_gui2_events.py and test_gui2_parity.py already verify this pattern.

    • Do not read a test file over 50 lines without first using py_get_skeleton, py_get_code_outline, or py_get_definition to map the architecture. When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in docs/ (last updated: 08e003a):

  • docs/guide_architecture.md: Threading model, event system, AI client, HITL mechanism.

  • docs/guide_mma.md: Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.

  • docs/guide_simulations.md: live_gui fixture and Puppeteer pattern, mock provider protocol, visual verification patterns.

  • Use get_file_summary first to decide whether you need the full content of a file; get_file_summary, py_get_skeleton, and py_get_code_outline are the tools for mapping the architecture. Further deep-dive docs in docs/ (last updated: 08e003a):

  • docs/guide_tools.md: MCP Bridge 3-layer security model, 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference.

  • docs/guide_meta_boundary.md: The critical distinction between the Application's Strict-HITL environment and the Meta-Tooling environment used to build it.

  • Application Layer (gui_2.py, app_controller.py): Threads run in src/ directory. Events flow through SyncEventQueue and EventEmitter for decoupled communication.

  • api_hooks.py: HTTP server that exposes internal state via a REST API when launched with the --enable-test-hooks flag (otherwise it serves only the CLI adapter). Uses SyncEventQueue to push events to the GUI.

  • ApiHookClient (api_hook_client.py): Client for interacting with the running application via the Hook API.

    • get_status(): Health check endpoint
    • get_mma_status(): Returns full MMA engine status
    • get_gui_state(): Returns full GUI state
    • get_value(item): Gets a GUI value by mapped field name
    • get_performance(): Returns performance metrics
    • click(item, user_data): Simulates a button click
    • set_value(item, value): Sets a GUI value
    • select_tab(item, value): Selects a specific tab
    • reset_session(): Resets the session via button click
  • MMA Prompts (mma_prompts.py): Structured system prompts for MMA tiers

  • ConductorTechLead (conductor_tech_lead.py): Generates tickets from track brief

  • models.py: Data structures (Ticket, Track, TrackState, WorkerContext)

  • dag_engine.py: DAG execution engine with cycle detection and topological sorting

  • multi_agent_conductor.py: MMA orchestration engine

  • shell_runner.py: Sandboxed PowerShell execution

    • file_cache.py: AST parser with tree-sitter

    • summarize.py: Heuristic file summaries

    • outline_tool.py: Code outlining with line ranges

    • theme.py / theme_2.py: ImGui theme/color palettes

    • log_registry.py: Session log registry with TOML persistence

    • log_pruner.py: Automated log pruning

    • performance_monitor.py: FPS, frame time, CPU tracking

    • gui_2.py: Main GUI (79KB) - Primary ImGui interface

    • ai_client.py: Multi-provider LLM abstraction (71KB)

    • mcp_client.py: 26 MCP-style tools (48KB)

    • app_controller.py: Headless controller (82KB) - FastAPI for headless mode

    • project_manager.py: Project configuration management (13KB)

    • aggregate.py: Context aggregation (14KB)

    • session_logger.py: Session logging (6KB)

    • gemini_cli_adapter.py: CLI subprocess adapter (6KB)

    • events.py: Event system (3KB)

    • cost_tracker.py: Cost estimation (1KB)
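The batching rule from Phase 1 (at most 4 test files per pytest run, with a longer --timeout for batches known to be slow) could be scripted roughly as follows. This is a minimal sketch: the helper names `batch_test_files` and `build_pytest_command` are hypothetical and not part of the repository, and `--timeout` is only emitted when a value is passed in.

```python
from typing import Iterator, List, Optional, Sequence

MAX_FILES_PER_BATCH = 4  # limit stated in the plan


def batch_test_files(files: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` test files (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])


def build_pytest_command(batch: Sequence[str], timeout: Optional[int] = None) -> List[str]:
    """Build the argv for one batch; --timeout is appended only when requested."""
    cmd = ["uv", "run", "pytest", *batch, "-v"]
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    return cmd


if __name__ == "__main__":
    test_files = [
        "tests/test_cost_panel.py",
        "tests/test_conductor_engine_v2.py",
        "tests/test_cost_tracker.py",
        "tests/test_cache_panel.py",
        "tests/test_gui2_events.py",
    ]
    for batch in batch_test_files(test_files):
        # Print rather than execute, so the sketch has no side effects.
        print(" ".join(build_pytest_command(batch, timeout=60)))
```

Each printed line is one command to announce and then execute; keeping the batches sequential avoids the threading problems the plan attributes to running the full suite at once.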

Current State Audit (as of {commit_sha})

Already Implemented (DO NOT re-implement)

  • tier_usage dict in ConductorEngine.__init__ (multi_agent_conductor.py lines 50-60):
self.tier_usage = {
 "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
 "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
 "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
 "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}
  • Per-ticket breakdown available (already tracked by tier)
  • Cost per model, grouped by model name (Gemini, Anthropic, DeepSeek)
  • Total session cost: accumulate and display the total
  • Uses existing cost_tracker.py functions
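To make the relationship between tier_usage and cost estimation concrete, here is a rough sketch of regex-based pricing lookup applied per tier. It is an illustration under assumptions, not the contents of src/cost_tracker.py: the tuple layout of `MODEL_PRICING`, the per-million-token unit, the `estimate_cost(model, input_tokens, output_tokens)` signature, and the helper `tier_costs` are all assumed, and the prices are placeholder numbers rather than real rates.

```python
import re
from typing import Dict, List, Tuple

# Assumed layout: (regex pattern, USD per 1M input tokens, USD per 1M output tokens).
# The rates below are placeholders for illustration only, not real prices.
MODEL_PRICING: List[Tuple[str, float, float]] = [
    (r"flash-lite", 0.10, 0.40),  # more specific patterns must come first
    (r"flash", 0.50, 2.00),
    (r"pro", 2.00, 8.00),
]


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Loop through the patterns; the first match determines the rates."""
    for pattern, in_rate, out_rate in MODEL_PRICING:
        if re.search(pattern, model):
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return 0.0  # unknown models return 0.0, as noted in the plan


def tier_costs(tier_usage: Dict[str, dict]) -> Dict[str, float]:
    """Hypothetical helper: map each tier to its estimated cost."""
    return {
        tier: estimate_cost(usage["model"], usage["input"], usage["output"])
        for tier, usage in tier_usage.items()
    }


if __name__ == "__main__":
    usage = {
        "Tier 1": {"input": 1_000_000, "output": 500_000, "model": "gemini-3.1-pro-preview"},
        "Tier 3": {"input": 2_000_000, "output": 0, "model": "gemini-2.5-flash-lite"},
    }
    costs = tier_costs(usage)
    for tier, cost in costs.items():
        print(f"{tier}: ${cost:.4f}")
    print(f"Session total: ${sum(costs.values()):.4f}")
```

Because each tier_usage entry already carries its model name, the panel can derive per-tier cost and the session total from that one dict without keeping extra bookkeeping.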

Non-Functional Requirements

| Requirement | Constraint |
|---|---|
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for session cost state |
| Thread Safety | Read tier_usage via state updates only |
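One way to satisfy the thread-safety and frame-time constraints together is to have the engine thread hand over copies of tier_usage and let the GUI thread recompute costs only when a new copy arrives. The sketch below is an assumption about how that could look: `CostPanelState` is a hypothetical class, the standard-library `queue.Queue` stands in for the project's SyncEventQueue, and the cost function is injected rather than imported.

```python
import copy
import queue
from typing import Callable, Dict


class CostPanelState:
    """GUI-side holder: consumes tier_usage snapshots and caches the derived costs."""

    def __init__(self, cost_fn: Callable[[str, int, int], float]) -> None:
        self._updates: "queue.Queue[Dict[str, dict]]" = queue.Queue()
        self._cost_fn = cost_fn
        self.tier_costs: Dict[str, float] = {}
        self.total: float = 0.0

    def publish(self, tier_usage: Dict[str, dict]) -> None:
        """Engine thread: enqueue a deep copy, never the live dict."""
        self._updates.put(copy.deepcopy(tier_usage))

    def poll(self) -> bool:
        """GUI thread, once per frame. Returns True if costs were recomputed."""
        latest = None
        while True:
            try:
                latest = self._updates.get_nowait()  # keep only the newest snapshot
            except queue.Empty:
                break
        if latest is None:
            return False  # nothing new: the frame just reads the cached values
        self.tier_costs = {
            tier: self._cost_fn(u["model"], u["input"], u["output"])
            for tier, u in latest.items()
        }
        self.total = sum(self.tier_costs.values())
        return True
```

With this shape the per-frame work in the common case is a single non-blocking queue check, and the cached state is a handful of floats, which is in keeping with the <1ms and <1KB targets.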

Testing Requirements

Unit Tests

  • Test estimate_cost() with known model/token combinations
  • Test unknown model returns 0.0
  • Test session cost accumulation
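The three unit tests listed above might take roughly the following shape. To keep the sketch runnable on its own, it defines a stand-in `estimate_cost` with a dict lookup and placeholder rates; the real tests would instead import the regex-based function from cost_tracker.py (the exact import path is not stated here), and `session_total` is a hypothetical helper.

```python
import math
from typing import Iterable, Tuple

# Stand-in for cost_tracker.estimate_cost so the sketch is self-contained.
# Placeholder rates in USD per 1M tokens; the real function matches by regex.
_RATES = {"known-model": (1.0, 2.0)}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    if model not in _RATES:
        return 0.0
    in_rate, out_rate = _RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def session_total(calls: Iterable[Tuple[str, int, int]]) -> float:
    """Hypothetical helper: sum the estimated cost of every call in a session."""
    return sum(estimate_cost(model, i, o) for model, i, o in calls)


def test_known_model_cost() -> None:
    assert math.isclose(estimate_cost("known-model", 1_000_000, 500_000), 2.0)


def test_unknown_model_returns_zero() -> None:
    assert estimate_cost("mystery-model", 1_000, 1_000) == 0.0


def test_session_cost_accumulates() -> None:
    calls = [
        ("known-model", 1_000_000, 0),
        ("known-model", 0, 1_000_000),
        ("mystery-model", 5, 5),  # contributes nothing
    ]
    assert math.isclose(session_total(calls), 3.0)
```

The functions follow pytest's `test_*` naming so they would be collected automatically once the stand-in is swapped for the real module.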

Integration Tests (via live_gui fixture)

  • Verify cost panel displays after API call
  • Verify costs update after MMA execution
  • Verify session reset clears costs

Structural Testing Contract

  • NO mocking of cost_tracker internals
  • Use real state
  • Test artifacts go to tests/artifacts/
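For the "session reset clears costs" integration check, a polling helper tends to be more reliable than a fixed sleep, since the GUI updates asynchronously. The sketch below uses the ApiHookClient method names listed earlier (get_mma_status, reset_session), but the shape of the returned status (a "tier_usage" key) is an assumption. `FakeHookClient` exists only so the example runs without the application; a real test would obtain the client through the live_gui fixture and use no fake.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll `predicate` until it is true or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline


class FakeHookClient:
    """Stand-in for ApiHookClient, for this sketch only (assumed status shape)."""

    def __init__(self) -> None:
        self._usage = {"Tier 3": {"input": 1200, "output": 300, "model": "gemini-2.5-flash-lite"}}

    def get_mma_status(self) -> dict:
        return {"tier_usage": self._usage}

    def reset_session(self) -> None:
        self._usage = {"Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}}


def total_tokens(client) -> int:
    """Sum input and output tokens across all tiers reported by the client."""
    usage = client.get_mma_status().get("tier_usage", {})
    return sum(u["input"] + u["output"] for u in usage.values())


def check_reset_clears_usage(client, timeout: float = 5.0) -> bool:
    """Trigger a reset, then wait for every tier's token counts to reach zero."""
    client.reset_session()
    return wait_until(lambda: total_tokens(client) == 0, timeout=timeout)
```

If token counts are zero after the reset, any cost derived from them through estimate_cost is zero as well, so the same check covers the panel's displayed total.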

Out of Scope

  • Historical cost tracking across sessions
  • Cost budgeting/alerts
  • Export cost reports
  • API cost for web searches (no token counts available)

Acceptance Criteria

  • Cost panel displays in GUI
  • Per-tier cost shown with token counts
  • Tier breakdown accurate using existing tier_usage
  • Total session cost accumulates correctly
  • Panel updates on MMA state changes
  • Uses existing cost_tracker.estimate_cost()
  • Session reset clears costs
  • 1-space indentation maintained

