# Track Specification: Context & Token Visualization

## Overview

product.md lists "Context & Memory Management" as primary use case #2: "Better visualization and management of token usage and context memory, allowing developers to optimize prompt limits manually."

The backend already computes everything needed via `ai_client.get_history_bleed_stats()` (ai_client.py:1657-1796, 140 lines). This track builds the UI to expose it.

## Current State

### Backend (already implemented)

`get_history_bleed_stats(md_content=None) -> dict[str, Any]` returns:

- `provider`: Active provider name
- `model`: Active model name
- `history_turns`: Number of conversation turns
- `estimated_prompt_tokens`: Total estimated prompt tokens (system + history + tools)
- `max_prompt_tokens`: Provider's maximum (180K for Anthropic, 900K for Gemini)
- `utilization_pct`: `estimated / max * 100`
- `headroom_tokens`: Tokens remaining before trimming kicks in
- `would_trim`: Whether the next call would trigger history trimming (boolean)
- `trimmable_turns`: Number of turns that could be dropped
- `system_tokens`: Tokens consumed by the system prompt + context
- `tools_tokens`: Tokens consumed by tool definitions
- `history_tokens`: Tokens consumed by conversation history
- A per-message breakdown with role, token estimate, and whether the message contains tool use

### GUI (missing)

No UI exists to display any of this. The user has zero visibility into:

- How close they are to hitting the context-window limit
- What proportion is system prompt vs. history vs. tools
- Which messages would be trimmed, and when
- Whether Gemini's server-side cache is active and how large it is

## Goals

1. **Token Budget Bar**: A prominent progress bar showing context utilization (green < 50%, yellow 50-80%, red > 80%).
2. **Breakdown Panel**: A stacked bar or table showing system/tools/history proportions.
3. **Trimming Preview**: When `would_trim` is true, show which turns would be dropped.
4.
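The stats dict above already carries everything Goals 1 and 2 need; the GUI layer only has to map fields to widgets. A minimal sketch of that mapping follows. `budget_color` and `format_breakdown` are hypothetical helpers invented for illustration (they do not exist in the codebase); the dict keys and the 50%/80% color thresholds come from this spec.

```python
def budget_color(utilization_pct: float) -> str:
    """Map context utilization to the budget-bar color (Goal 1 thresholds)."""
    if utilization_pct < 50:
        return "green"
    if utilization_pct <= 80:
        return "yellow"
    return "red"


def format_breakdown(stats: dict) -> list[tuple[str, float]]:
    """Return (label, share-of-prompt %) rows for the breakdown panel (Goal 2)."""
    total = stats["estimated_prompt_tokens"] or 1  # guard against an empty prompt
    return [
        ("system", 100.0 * stats["system_tokens"] / total),
        ("tools", 100.0 * stats["tools_tokens"] / total),
        ("history", 100.0 * stats["history_tokens"] / total),
    ]


# Illustrative numbers only -- not real measurements:
stats = {
    "estimated_prompt_tokens": 40_000,
    "system_tokens": 8_000,
    "tools_tokens": 12_000,
    "history_tokens": 20_000,
    "utilization_pct": 22.2,
}
print(budget_color(stats["utilization_pct"]))  # prints "green"
for label, pct in format_breakdown(stats):
    print(f"{label:>8}: {pct:5.1f}%")
```

Keeping the color and percentage logic in plain functions like these makes the thresholds trivially unit-testable, independent of whatever widget toolkit renders the bar.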
   **Cache Status**: For Gemini, show whether `_gemini_cache` exists, its size in tokens, and the TTL remaining.
5. **Refresh**: Auto-refresh on provider/model switch and after each AI response.

## Architecture Reference

- AI client state: see "AI Client: Multi-Provider Architecture" in [docs/guide_architecture.md](../../docs/guide_architecture.md)
- Gemini cache: see "Gemini Cache Strategy" in [docs/guide_architecture.md](../../docs/guide_architecture.md)
- Anthropic cache: see "Anthropic Cache Strategy (4-Breakpoint System)" in [docs/guide_architecture.md](../../docs/guide_architecture.md)
- Frame-sync: see `_process_pending_gui_tasks` in [docs/guide_architecture.md](../../docs/guide_architecture.md) for how to safely read backend state from the GUI thread
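The refresh goal plus the frame-sync reference suggest the panel should never read `ai_client` state directly from a worker thread; instead, reads are queued and drained on the GUI thread. A minimal sketch of that pattern, assuming a plain thread-safe callable queue — the project's actual mechanism is `_process_pending_gui_tasks` (see the architecture guide), whose exact API is not reproduced here, and `schedule_stats_refresh` / `update_panel` are hypothetical names:

```python
import queue
from typing import Callable

# Stand-in for the GUI's pending-task queue (assumption, not the real one).
gui_tasks: "queue.Queue[Callable[[], None]]" = queue.Queue()


def schedule_stats_refresh(ai_client, update_panel: Callable[[dict], None]) -> None:
    """Enqueue a stats read so it executes on the GUI thread, not the caller's."""
    def task() -> None:
        stats = ai_client.get_history_bleed_stats()
        update_panel(stats)
    gui_tasks.put(task)


def process_pending_gui_tasks() -> None:
    """Drain queued tasks once per frame (stand-in for the real GUI hook)."""
    while True:
        try:
            task = gui_tasks.get_nowait()
        except queue.Empty:
            break
        task()
```

Wiring `schedule_stats_refresh` into the provider-switch and response-complete paths would satisfy Goal 5 without adding any new cross-thread access to client state.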