Private
Public Access
0
0
Files
manual_slop/src/cost_tracker.py
T
ed f9b5c9372d feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama)
Side concerns for Phase 3:

1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside
   the 6 existing vendors. Centralized registry; gui_2.py and
   app_controller.py import from here. State tasks t3.5 and t3.16
   were scoped to gui_2.py/app_controller.py but the actual change
   is at the centralized registry, per the project's single-source-of-
   truth pattern (per src/models.py module docstring and the Phase 5
   audit script audit_no_models_config_io.py which enforces that
   PROVIDERS lives in models.py).

2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama):

   Grok (per xAI public pricing):
   - grok-2: 2.00 / 10.00
   - grok-2-vision: 2.00 / 10.00
   - grok-beta: 5.00 / 15.00

   Llama (per Grok's consultation: pricing varies by backend; registry
   entries represent the most common case):
   - llama-3.1-8b-instant: 0.05 / 0.08 (Groq)
   - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq)
   - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg)
   - llama-3.2-1b-preview: 0.04 / 0.04
   - llama-3.2-3b-preview: 0.06 / 0.06
   - llama-3.2-11b-vision-preview: 0.18 / 0.18
   - llama-3.2-90b-vision-preview: 0.90 / 0.90
   - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq)

   (all per 1M tokens, USD; matches the structure of existing entries;
   note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to
   allow future model variants in the same family.)

   Spot check:
   - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005)
   - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985

3. SKIPPED t3.4 and t3.15 (credentials templates): no
   credentials_template.toml exists in the project (Phase 2 established
   this). The user maintains their own credentials.toml directly.

4. t3.6 and t3.17 (Grok/Llama models in capability registry) were
   completed in Phase 1's initial population of 22 entries
   (commit 6be04bc). Grok has 4 entries (1 wildcard + 3 models);
   Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has
   vision=True; Llama 3.2-11b/90b vision variants have vision=True.

Verification: 38/38 tests pass in batch.
2026-06-11 02:02:56 -04:00

82 lines
3.6 KiB
Python

"""
Cost Tracker - Token cost estimation for API calls.
This module provides cost estimation for different LLM providers based on per-token pricing.
It is used to display estimated costs in the MMA Dashboard.
Pricing Data (per 1M tokens):
- gemini-2.5-flash-lite: $0.075 input / $0.30 output
- gemini-3-flash-preview: $0.15 input / $0.60 output
- gemini-3.1-pro-preview: $3.50 input / $10.50 output
- claude-*-sonnet: $3.0 input / $15.0 output
- claude-*-opus: $15.0 input / $75.0 output
- deepseek-v3: $0.27 input / $1.10 output
Usage:
from src.cost_tracker import estimate_cost
total = estimate_cost("gemini-2.5-flash-lite", 50000, 10000)
# Returns: 0.007 (approx)
Accuracy:
- Pricing data may be outdated
- Uses regex matching for model identification
- Returns 0.0 for unknown models
Integration:
- Used by gui_2.py for MMA dashboard cost display
- Called after each API call
See Also:
- src/ai_client.py for token tracking
- docs/guide_mma.md for MMA dashboard documentation
"""
import re
# Pricing per 1M tokens in USD
MODEL_PRICING = [
(r"gemini-2\.5-flash-lite", {"input_per_mtok": 0.075, "output_per_mtok": 0.30}),
(r"gemini-2\.5-flash", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
(r"gemini-3-flash-preview", {"input_per_mtok": 0.15, "output_per_mtok": 0.60}),
(r"gemini-3\.1-pro-preview", {"input_per_mtok": 3.50, "output_per_mtok": 10.50}),
(r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
(r"claude-.*-opus", {"input_per_mtok": 15.0, "output_per_mtok": 75.0}),
(r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
(r"qwen-turbo", {"input_per_mtok": 0.05, "output_per_mtok": 0.10}),
(r"qwen-plus", {"input_per_mtok": 0.40, "output_per_mtok": 1.20}),
(r"qwen-max", {"input_per_mtok": 2.00, "output_per_mtok": 6.00}),
(r"qwen-long", {"input_per_mtok": 0.07, "output_per_mtok": 0.28}),
(r"qwen-vl-plus", {"input_per_mtok": 0.21, "output_per_mtok": 0.63}),
(r"qwen-vl-max", {"input_per_mtok": 0.50, "output_per_mtok": 1.50}),
(r"qwen-audio", {"input_per_mtok": 0.10, "output_per_mtok": 0.30}),
(r"grok-2", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
(r"grok-2-vision", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
(r"grok-beta", {"input_per_mtok": 5.00, "output_per_mtok": 15.00}),
(r"llama-3\.1-8b-instant", {"input_per_mtok": 0.05, "output_per_mtok": 0.08}),
(r"llama-3\.1-70b-versatile", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
(r"llama-3\.1-405b-reasoning", {"input_per_mtok": 3.00, "output_per_mtok": 3.00}),
(r"llama-3\.2-1b-preview", {"input_per_mtok": 0.04, "output_per_mtok": 0.04}),
(r"llama-3\.2-3b-preview", {"input_per_mtok": 0.06, "output_per_mtok": 0.06}),
(r"llama-3\.2-11b-vision-preview", {"input_per_mtok": 0.18, "output_per_mtok": 0.18}),
(r"llama-3\.2-90b-vision-preview", {"input_per_mtok": 0.90, "output_per_mtok": 0.90}),
(r"llama-3\.3-70b-specdec", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
]
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
"""
Estimate the cost of a model call based on input and output tokens.
Returns the total cost in USD.
[C: src/gui_2.py:App._render_mma_track_summary, src/gui_2.py:App._render_mma_usage_section, src/gui_2.py:App._render_token_budget_panel, tests/test_cost_tracker.py:test_estimate_cost]
"""
if not model:
return 0.0
for pattern, rates in MODEL_PRICING:
if re.search(pattern, model, re.IGNORECASE):
input_cost = (input_tokens / 1_000_000) * rates["input_per_mtok"]
output_cost = (output_tokens / 1_000_000) * rates["output_per_mtok"]
return input_cost + output_cost
return 0.0