Track Specification: Context & Token Visualization
Overview
product.md lists "Context & Memory Management" as primary use case #2: "Better visualization and management of token usage and context memory, allowing developers to optimize prompt limits manually." The backend already computes everything needed via ai_client.get_history_bleed_stats() (ai_client.py:1657-1796, 140 lines). This track builds the UI to expose it.
Current State
Backend (already implemented)
get_history_bleed_stats(md_content=None) -> dict[str, Any] returns:
- provider: Active provider name
- model: Active model name
- history_turns: Number of conversation turns
- estimated_prompt_tokens: Total estimated prompt tokens (system + history + tools)
- max_prompt_tokens: Provider's max (180K Anthropic, 900K Gemini)
- utilization_pct: estimated / max * 100
- headroom_tokens: Tokens remaining before trimming kicks in
- would_trim: Boolean, whether the next call would trigger history trimming
- trimmable_turns: Number of turns that could be dropped
- system_tokens: Tokens consumed by system prompt + context
- tools_tokens: Tokens consumed by tool definitions
- history_tokens: Tokens consumed by conversation history
- Per-message breakdown with role, token estimate, and whether it contains tool use
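To make the shape of this dictionary concrete, here is a minimal sketch of a caller formatting a one-line summary from it. Only the key names come from the list above. The helper name summarize_stats, the sample values, and the string form of the provider and model fields are assumptions for illustration; a real dictionary would come from ai_client.get_history_bleed_stats().

```python
from typing import Any


def summarize_stats(stats: dict[str, Any]) -> str:
    """Format a one-line summary from the documented top-level keys."""
    used = stats["estimated_prompt_tokens"]
    limit = stats["max_prompt_tokens"]
    # Recompute the percentage locally so the sketch does not depend on
    # how the backend rounds utilization_pct; guard against a zero limit.
    pct = used / limit * 100 if limit else 0.0
    trim_note = "trim pending" if stats["would_trim"] else "no trim"
    return (
        f"{stats['provider']}/{stats['model']}: "
        f"{used:,} / {limit:,} tokens ({pct:.1f}%), {trim_note}"
    )


# Hypothetical sample values, not real backend output.
sample_stats: dict[str, Any] = {
    "provider": "anthropic",
    "model": "example-model",
    "history_turns": 12,
    "estimated_prompt_tokens": 45_000,
    "max_prompt_tokens": 180_000,
    "would_trim": False,
}

print(summarize_stats(sample_stats))
```

The sketch reads only a handful of keys, so a UI label built this way would keep working if the backend adds further fields later.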
GUI (missing)
No UI exists to display any of this. The user has zero visibility into:
- How close they are to hitting the context window limit
- What proportion is system prompt vs history vs tools
- Which messages would be trimmed and when
- Whether Gemini's server-side cache is active and how large it is
Goals
- Token Budget Bar: A prominent progress bar showing context utilization (green < 50%, yellow 50-80%, red > 80%).
- Breakdown Panel: Stacked bar or table showing system/tools/history proportions.
- Trimming Preview: When would_trim is true, show which turns would be dropped.
- Cache Status: For Gemini, show whether _gemini_cache exists, its size in tokens, and TTL remaining.
- Refresh: Auto-refresh on provider/model switch and after each AI response.
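The color thresholds for the Token Budget Bar and the proportions for the Breakdown Panel are plain arithmetic, so they can be sketched independently of any GUI toolkit. The function names budget_color and breakdown_fractions are hypothetical. Treating exactly 50% and exactly 80% as yellow is one reading of the ranges given above, not something the spec states.

```python
def budget_color(utilization_pct: float) -> str:
    """Map context utilization to a bar color: green < 50, yellow 50-80, red > 80."""
    if utilization_pct < 50:
        return "green"
    if utilization_pct <= 80:
        return "yellow"
    return "red"


def breakdown_fractions(
    system_tokens: int, tools_tokens: int, history_tokens: int
) -> dict[str, float]:
    """Return each category's share of the combined token count (0.0 to 1.0)."""
    total = system_tokens + tools_tokens + history_tokens
    if total == 0:
        # Nothing to show yet; avoid dividing by zero on an empty session.
        return {"system": 0.0, "tools": 0.0, "history": 0.0}
    return {
        "system": system_tokens / total,
        "tools": tools_tokens / total,
        "history": history_tokens / total,
    }
```

Keeping these as pure functions means the thresholds can be unit-tested without a running GUI, and a stacked bar or a table can both be driven from the same fractions.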
Architecture Reference
- AI client state: docs/guide_architecture.md — see "AI Client: Multi-Provider Architecture"
- Gemini cache: docs/guide_architecture.md — see "Gemini Cache Strategy"
- Anthropic cache: docs/guide_architecture.md — see "Anthropic Cache Strategy (4-Breakpoint System)"
- Frame-sync: docs/guide_architecture.md, see _process_pending_gui_tasks for how to safely read backend state from the GUI thread
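
For readers unfamiliar with the frame-sync idea, the following is a generic sketch of handing stats from a background thread to a GUI thread through a thread-safe queue. It does not describe how _process_pending_gui_tasks is actually implemented; the names pending_stats, publish_stats, and drain_latest are hypothetical, and docs/guide_architecture.md remains the authority on the project's real mechanism.

```python
import queue
from typing import Any, Optional

# Thread-safe mailbox: the backend thread writes, the GUI thread reads.
pending_stats: queue.Queue = queue.Queue()


def publish_stats(stats: dict[str, Any]) -> None:
    """Called from the backend thread after stats are recomputed."""
    pending_stats.put(stats)


def drain_latest(current: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Called once per frame on the GUI thread.

    Returns the newest queued snapshot, or `current` unchanged if
    nothing new has arrived. Never blocks the frame.
    """
    latest = current
    while True:
        try:
            latest = pending_stats.get_nowait()
        except queue.Empty:
            return latest
```

In this pattern the GUI only ever renders a complete snapshot, and older snapshots are discarded when several arrive between frames.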