Implementation Plan: Context & Token Visualization
Architecture reference: docs/guide_architecture.md — AI Client section
Phase 1: Token Budget Display
- Task 1.1: Add a new method `_render_token_budget_panel(self)` in `gui_2.py`. Place it in the Provider panel area (after `_render_provider_panel`, gui_2.py:2485-2542), or as a new collapsible section within the provider panel. Call `ai_client.get_history_bleed_stats(self._last_stable_md)` — this requires caching `self._last_stable_md` from the last `_do_generate()` call (gui_2.py:1408-1425, the `stable_md` return value). Store the result in `self._token_stats: dict = {}`, refreshed on each `_do_generate` call and on provider/model switch.
- Task 1.2: Render the utilization bar. Use `imgui.progress_bar(stats['utilization_pct'] / 100, ImVec2(-1, 0), f"{stats['utilization_pct']:.1f}%")`. Color-code via `imgui.push_style_color(imgui.Col_.plot_histogram, ...)`: green if <50%, yellow if 50-80%, red if >80%. Below the bar, show: `f"{stats['estimated_prompt_tokens']:,} / {stats['max_prompt_tokens']:,} tokens ({stats['headroom_tokens']:,} remaining)"`.
- Task 1.3: Render the proportion breakdown as a 3-row table: System (`system_tokens`), Tools (`tools_tokens`), History (`history_tokens`). Each row shows the token count and its percentage of the total. Use `imgui.begin_table("token_breakdown", 3)` with columns: Component, Tokens, Pct.
- Task 1.4: Write tests verifying that `_render_token_budget_panel` calls `get_history_bleed_stats` and handles the empty-dict case (when no provider is configured).
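The formatting rules in Tasks 1.2 and 1.4 can be kept out of the draw code so they are unit-testable without a GUI. A minimal sketch, assuming the stats keys named above; the helper name `budget_bar_state` and the RGBA tuples are illustrative, not part of the existing codebase:

```python
def budget_bar_state(stats: dict) -> dict:
    """Map get_history_bleed_stats() output to progress-bar fraction, color, labels."""
    if not stats:  # no provider configured (the empty-dict case from Task 1.4)
        return {}
    pct = stats["utilization_pct"]
    # Thresholds from Task 1.2: green <50%, yellow 50-80%, red >80%.
    if pct < 50:
        color = (0.2, 0.8, 0.2, 1.0)   # green
    elif pct <= 80:
        color = (0.9, 0.8, 0.1, 1.0)   # yellow
    else:
        color = (0.9, 0.2, 0.2, 1.0)   # red
    return {
        "fraction": pct / 100,
        "color": color,
        "overlay": f"{pct:.1f}%",
        "detail": (
            f"{stats['estimated_prompt_tokens']:,} / "
            f"{stats['max_prompt_tokens']:,} tokens "
            f"({stats['headroom_tokens']:,} remaining)"
        ),
    }
```

`_render_token_budget_panel` would then only push the color, call `imgui.progress_bar`, and print the `detail` line, while tests assert on the dict directly.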
Phase 2: Trimming Preview & Cache Status
- Task 2.1: When `stats.get('would_trim')` is True, render a warning: `imgui.text_colored(ImVec4(1, 0.3, 0, 1), "WARNING: Next call will trim history")`. Below it, show `f"Trimmable turns: {stats['trimmable_turns']}"`. If `stats` contains a per-message breakdown, render the first 3 trimmable messages with their role and token count in a compact list.
- Task 2.2: Add a Gemini cache status display. Read `ai_client._gemini_cache` (check `is not None`), `ai_client._gemini_cache_created_at`, and `ai_client._GEMINI_CACHE_TTL`. If the cache exists, show `"Gemini Cache: ACTIVE | Age: {age_seconds}s / {ttl}s | Renews at: {ttl * 0.9:.0f}s"`; otherwise show `"Gemini Cache: INACTIVE"`. Guard with `if ai_client._provider == "gemini":`.
- Task 2.3: Add an Anthropic cache hint. When the provider is `"anthropic"`, show `"Anthropic: 4-breakpoint ephemeral caching (auto-managed)"` along with the number of history turns and whether the latest response used cache reads (check the last comms-log entry for `cache_read_input_tokens`).
- [~] Task 2.4: Write tests for trimming-warning visibility and cache status display.
Phase 3: Auto-Refresh & Integration
- Task 3.1: Hook the `_token_stats` refresh into three trigger points: (a) after `_do_generate()` completes — cache `stable_md` and call `get_history_bleed_stats`; (b) after a provider/model switch in the `current_provider` and `current_model` setters — clear and re-fetch; (c) after each `handle_ai_response` in `_process_pending_gui_tasks` — refresh stats since the history grew. For (c), set a flag `self._token_stats_dirty = True` and refresh in the next frame's render call to avoid calling the stats function too frequently.
- Task 3.2: Expose the token budget panel through the Hook API. Extend `/api/gui/mma_status` (or add a new `/api/gui/token_stats` endpoint) to expose `_token_stats` for simulation verification, so tests can assert on token utilization levels.
- Task 3.3: Conductor - User Manual Verification 'Phase 3: Auto-Refresh & Integration' (Protocol in workflow.md)
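The dirty-flag pattern in Task 3.1(c) can be sketched as a small cache class; `fetch_stats` here is a hypothetical stand-in for `lambda: ai_client.get_history_bleed_stats(self._last_stable_md)`, and the class itself is an illustration, not existing code:

```python
class TokenStatsCache:
    """Defers stats refresh to the render loop: at most one fetch per dirty event."""

    def __init__(self, fetch_stats):
        self._fetch = fetch_stats
        self._stats: dict = {}
        self._dirty = True          # force a fetch on the first frame

    def mark_dirty(self) -> None:
        """Called from handle_ai_response (and provider/model setters)."""
        self._dirty = True

    def get(self) -> dict:
        """Called once per frame by the render code."""
        if self._dirty:
            self._stats = self._fetch()
            self._dirty = False
        return self._stats
```

Repeated `get()` calls between dirty events return the cached dict, which is exactly the throttling behavior Task 3.1(c) asks for.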