ed/manual_slop

Ed_ bf24164b1f sigh

2026-03-06 15:57:39 -05:00

Implementation Plan: Cost & Token Analytics Panel (cost_token_analytics_20260306)

Reference: Spec | Architecture Guide

Phase 1: Foundation & Research

Focus: Verify existing infrastructure

Task 1.1: Initialize MMA Environment
- Run activate_skill mma-orchestrator before starting
Task 1.2: Verify cost_tracker.py implementation
- WHERE: src/cost_tracker.py
- WHAT: Confirm MODEL_PRICING list structure
- HOW: Use manual-slop_py_get_definition on estimate_cost
- OUTPUT: Document exact regex-based matching
- Note: estimate_cost loops through the patterns in order; unknown models return 0.0.
- SHA verification: Run uv run pytest tests/test_cost_tracker.py -v
- COMMAND: `uv run pytest tests/test_cost_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py -v --batched` (4 files max per batch due to complex threading issues)
- Example Announcement: "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_specific_feature.py (substitute actual file)"
- Execute the announced command.
- For potentially slow simulation tests, batch the runs: maximum 4 test files at a time. Use --timeout=60, or --timeout=120 if the specific tests in the batch are known to be slow (e.g., simulation tests).
- Example Announcement (batched): "I will now run the automated test suite to verify the phase. Command: uv run pytest tests/test_cache_panel.py tests/test_conductor_engine_v2.py tests/test_cost_tracker.py tests/test_cost_panel.py -v"
- CRITICAL: Running the full suite at once frequently leads to random timeouts or threading access violations. To prevent waiting the full timeout if the GUI exits early, the test file should check its extension.
- For each remaining code file, verify a corresponding test file exists.
- If a test file is missing, create one. Before writing the test, be aware of the naming convention and testing style; existing tests may carry @pytest decorators (e.g., @pytest.mark.integration). The new tests must validate the functionality described in this phase's tasks (plan.md).
- Use the live_gui fixture to interact with a real instance of the application via the Hook API. test_gui2_events.py and test_gui2_parity.py already verify this pattern.
- For any file over 50 lines, use get_file_summary, py_get_skeleton, py_get_code_outline, or py_get_definition first to decide whether you need the full content.
- When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in docs/ (last updated: 08e003a):
docs/guide_architecture.md: Threading model, event system, AI client, HITL mechanism.
docs/guide_mma.md: Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.
docs/guide_simulations.md: live_gui fixture and Puppeteer pattern, mock provider protocol, visual verification patterns.
docs/guide_tools.md: MCP Bridge 3-layer security model, 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference.
docs/guide_meta_boundary.md: The critical distinction between the Application's Strict-HITL environment and the Meta-Tooling environment used to build it.
Application Layer (gui_2.py, app_controller.py): Threads run in src/ directory. Events flow through SyncEventQueue and EventEmitter for decoupled communication.
api_hooks.py: HTTP server exposing internal state via a REST API when launched with the --enable-test-hooks flag (otherwise used only for the CLI adapter). Uses SyncEventQueue to push events to the GUI.
ApiHookClient (api_hook_client.py): Client for interacting with the running application via the Hook API.
- get_status(): Health check endpoint
- get_mma_status(): Returns full MMA engine status
- get_gui_state(): Returns full GUI state
- get_value(item): Gets a GUI value by mapped field name
- get_performance(): Returns performance metrics
- click(item, user_data): Simulates a button click
- set_value(item, value): Sets a GUI value
- select_tab(item, value): Selects a specific tab
- reset_session(): Resets the session via button click
MMA Prompts (mma_prompts.py): Structured system prompts for MMA tiers
ConductorTechLead (conductor_tech_lead.py): Generates tickets from track brief
models.py: Data structures (Ticket, Track, TrackState, WorkerContext)
dag_engine.py: DAG execution engine with cycle detection and topological sorting
multi_agent_conductor.py: MMA orchestration engine
shell_runner.py: Sandboxed PowerShell execution
- file_cache.py: AST parser with tree-sitter
- summarize.py: Heuristic file summaries
- outline_tool.py: Code outlining with line ranges
- theme.py / theme_2.py: ImGui theme/color palettes
- log_registry.py: Session log registry with TOML persistence
- log_pruner.py: Automated log pruning
- performance_monitor.py: FPS, frame time, CPU tracking
- gui_2.py: Main GUI (79KB) - Primary ImGui interface
- ai_client.py: Multi-provider LLM abstraction (71KB)
- mcp_client.py: 26 MCP-style tools (48KB)
- app_controller.py: Headless controller (82KB) - FastAPI for headless mode
- project_manager.py: Project configuration management (13KB)
- aggregate.py: Context aggregation (14KB)
- session_logger.py: Session logging (6KB)
- gemini_cli_adapter.py: CLI subprocess adapter (6KB)
- events.py: Event system (3KB)
- cost_tracker.py: Cost estimation (1KB)
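
Task 1.2 describes estimate_cost as a loop over regex patterns that returns 0.0 for unknown models. The following is a minimal sketch of that shape, not the actual contents of src/cost_tracker.py. The argument order, the tuple layout of MODEL_PRICING, the model patterns, and the prices are all assumptions chosen for illustration; the prices are placeholders, not real provider rates.

```python
import re

# Hypothetical pricing table: (regex pattern, input $ per 1M tokens, output $ per 1M tokens).
# Patterns and prices are placeholders. More specific patterns come first, because
# "example-flash" would otherwise also match "example-flash-lite".
MODEL_PRICING = [
 (r"example-pro", 2.00, 8.00),
 (r"example-flash-lite", 0.10, 0.40),
 (r"example-flash", 0.50, 2.00),
]

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 """Return an estimated USD cost, or 0.0 when no pattern matches the model name."""
 for pattern, input_per_m, output_per_m in MODEL_PRICING:
  if re.search(pattern, model, re.IGNORECASE):
   return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m
 return 0.0
```

When documenting the real function, the points worth confirming are the pattern order (first match wins in this sketch) and whether matching is case-sensitive.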

Current State Audit (as of {commit_sha})

Already Implemented (DO NOT re-implement)

tier_usage dict in ConductorEngine.__init__ (multi_agent_conductor.py lines 50-60)

self.tier_usage = {
 "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
 "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
 "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
 "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}

Per-ticket breakdown available for display (already tracked by tier)
Cost per model: grouped by model name (Gemini, Anthropic, DeepSeek)
Total session cost: accumulate and display total cost
Uses existing cost_tracker.py functions
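
The per-tier, per-model, and session totals listed above can all be derived from the tier_usage dict in one pass. The sketch below assumes the dict shape shown in the audit; summarize_costs and flat_rate are hypothetical names, and flat_rate is a made-up stand-in for cost_tracker.estimate_cost so the snippet runs on its own.

```python
from collections import defaultdict
from typing import Callable

def summarize_costs(tier_usage: dict, estimate: Callable[[str, int, int], float]) -> dict:
 """Build per-tier rows, per-model totals, and a session total from tier_usage."""
 per_tier = {}
 per_model = defaultdict(float)
 for tier, usage in tier_usage.items():
  cost = estimate(usage["model"], usage["input"], usage["output"])
  per_tier[tier] = {
   "model": usage["model"],
   "input": usage["input"],
   "output": usage["output"],
   "cost": cost,
  }
  per_model[usage["model"]] += cost
 return {"per_tier": per_tier, "per_model": dict(per_model), "total": sum(per_model.values())}

# Stand-in estimator (assumption): a flat, made-up $1 per million tokens.
def flat_rate(model: str, input_tokens: int, output_tokens: int) -> float:
 return (input_tokens + output_tokens) / 1_000_000

tier_usage = {
 "Tier 1": {"input": 500_000, "output": 500_000, "model": "gemini-3.1-pro-preview"},
 "Tier 3": {"input": 250_000, "output": 0, "model": "gemini-2.5-flash-lite"},
 "Tier 4": {"input": 250_000, "output": 0, "model": "gemini-2.5-flash-lite"},
}
summary = summarize_costs(tier_usage, flat_rate)
```

Passing the estimator in as a parameter keeps the aggregation independent of the pricing table, so the panel can call the existing function rather than re-implementing it.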

Non-Functional Requirements

| Requirement | Constraint |
| --- | --- |
| Frame Time Impact | <1ms when panel visible |
| Memory Overhead | <1KB for session cost state |
| Thread Safety | Read tier_usage via state updates only |
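
One way to meet the frame-time and thread-safety constraints together is to rebuild the display rows only when a state update arrives and let the render path read the cached rows. The class below is a sketch under that assumption. CostPanelState, on_state_update, and the "tier_usage" payload key are hypothetical names, not taken from gui_2.py, and the sketch assumes the payload is already a snapshot handed over by the sender.

```python
import copy

class CostPanelState:
 """Hypothetical GUI-side cache; rows are rebuilt only when a state update arrives."""

 def __init__(self, estimate):
  self._estimate = estimate
  self.rows = []
  self.total = 0.0

 def on_state_update(self, payload: dict) -> None:
  # Defensive copy so later changes to the payload cannot alter the cached view.
  usage = copy.deepcopy(payload.get("tier_usage", {}))
  rows = []
  for tier, u in usage.items():
   cost = self._estimate(u["model"], u["input"], u["output"])
   rows.append((tier, u["model"], u["input"], u["output"], cost))
  # Rebind in one step so the render path never sees a half-built list.
  self.rows = rows
  self.total = sum(row[4] for row in rows)

 def reset(self) -> None:
  """Clear cached costs, e.g. when the session is reset."""
  self.rows = []
  self.total = 0.0
```

With four tiers the cached state is a handful of small tuples, and the per-frame work reduces to iterating over self.rows.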

Testing Requirements

Unit Tests

Test estimate_cost() with known model/token combinations
Test unknown model returns 0.0
Test session cost accumulation
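
The three unit tests above might look like the following sketch. The estimate_cost defined here is a stand-in with made-up rates so the snippet runs without the project; the real tests would import estimate_cost from src/cost_tracker.py, and its exact signature is an assumption here.

```python
# Stand-in (assumption): placeholder rates in $ per 1M tokens as (input, output).
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 rates = {"example-model": (1.00, 2.00)}
 if model not in rates:
  return 0.0
 input_rate, output_rate = rates[model]
 return input_tokens / 1_000_000 * input_rate + output_tokens / 1_000_000 * output_rate

def test_known_model_cost():
 assert estimate_cost("example-model", 1_000_000, 500_000) == 2.0

def test_unknown_model_returns_zero():
 assert estimate_cost("no-such-model", 1_000, 1_000) == 0.0

def test_session_cost_accumulates():
 calls = [
  ("example-model", 1_000_000, 0),
  ("example-model", 0, 1_000_000),
  ("no-such-model", 5, 5),
 ]
 total = sum(estimate_cost(*call) for call in calls)
 assert total == 3.0
```

Token counts that are whole multiples of one million keep the expected values exact, which avoids floating-point tolerance checks in the assertions.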

Integration Tests (via `live_gui` fixture)

Verify cost panel displays after API call
Verify costs update after MMA execution
Verify session reset clears costs
NO mocking of cost_tracker internals
Use real state
Test artifacts go to tests/artifacts/
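
The reset check above can be sketched as a polling assertion against the Hook API. FakeHookClient is an in-memory stand-in used only so the snippet runs standalone; a real test would use the live_gui fixture and ApiHookClient instead. The get_mma_status and reset_session names come from the method list earlier in this plan, while the "tier_usage" key in the status payload and the wait_for helper are assumptions.

```python
import time
from typing import Callable

def wait_for(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
 """Poll condition until it returns True or the timeout elapses."""
 deadline = time.monotonic() + timeout
 while time.monotonic() < deadline:
  if condition():
   return True
  time.sleep(interval)
 return condition()

class FakeHookClient:
 """In-memory stand-in exposing two ApiHookClient method names."""

 def __init__(self):
  self._usage = {"Tier 3": {"input": 1200, "output": 300, "model": "gemini-2.5-flash-lite"}}

 def get_mma_status(self) -> dict:
  # Assumption: the real status payload carries tier_usage under this key.
  return {"tier_usage": self._usage}

 def reset_session(self) -> None:
  for usage in self._usage.values():
   usage["input"] = 0
   usage["output"] = 0

def total_tokens(client) -> int:
 """Sum input and output tokens across all tiers reported by the client."""
 usage = client.get_mma_status().get("tier_usage", {})
 return sum(u["input"] + u["output"] for u in usage.values())

def check_reset_clears_usage(client) -> bool:
 """Reset the session, then wait until the reported token total reaches zero."""
 client.reset_session()
 return wait_for(lambda: total_tokens(client) == 0, timeout=2.0)
```

Polling with a short timeout, rather than sleeping for a fixed period, lets the test finish as soon as the GUI reflects the reset and fail quickly if it never does.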

Out of Scope

Historical cost tracking across sessions
Cost budgeting/alerts
Export cost reports
API cost for web searches (no token counts available)

Acceptance Criteria

Cost panel displays in GUI
Per-tier cost shown with token counts
Tier breakdown accurate using existing tier_usage
Total session cost accumulates correctly
Panel updates on MMA state changes
Uses existing cost_tracker.estimate_cost()
Session reset clears costs
1-space indentation maintained

Out of Scope

Historical cost tracking across sessions
Cost budgeting/alerts
Per-model aggregation (model already per-tier)

Structural Testing Contract

Use real cost_tracker module - no mocking
Test artifacts go to tests/artifacts/

