manual_slop/conductor/tracks/context_token_viz_20260301/plan.md

Implementation Plan: Context & Token Visualization

Architecture reference: docs/guide_architecture.md — AI Client section

Phase 1: Token Budget Display

  • Task 1.1: Add a new method _render_token_budget_panel(self) in gui_2.py (5bfb20f). Place it in the Provider panel area (after _render_provider_panel, gui_2.py:2485-2542), or as a new collapsible section within the provider panel. Call ai_client.get_history_bleed_stats(self._last_stable_md); cache self._last_stable_md from the last _do_generate() call (gui_2.py:1408-1425, the stable_md return value). Store the result in self._token_stats: dict = {}, refreshed on each _do_generate call and on provider/model switch.
  • Task 1.2: Render the utilization bar (5bfb20f). Use imgui.progress_bar(stats['utilization_pct'] / 100, ImVec2(-1, 0), f"{stats['utilization_pct']:.1f}%"). Color-code via imgui.push_style_color(imgui.Col_.plot_histogram, ...): green if <50%, yellow if 50-80%, red if >80%. Below the bar, show: f"{stats['estimated_prompt_tokens']:,} / {stats['max_prompt_tokens']:,} tokens ({stats['headroom_tokens']:,} remaining)".
  • Task 1.3: Render the proportion breakdown as a three-row table (5bfb20f): System (system_tokens), Tools (tools_tokens), History (history_tokens). Each row shows the token count and its percentage of the total. Use imgui.begin_table("token_breakdown", 3) with columns: Component, Tokens, Pct.
  • Task 1.4: Write tests verifying that _render_token_budget_panel calls get_history_bleed_stats and handles the empty dict case (when no provider is configured) (5bfb20f).
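The display logic in Tasks 1.2-1.3 separates cleanly from the imgui calls, which helps with Task 1.4's tests. A minimal sketch of the pure helpers, assuming the stats keys named above; the helper names budget_bar_color, budget_labels, and breakdown_rows are hypothetical, and the actual imgui calls are noted only in comments so the sketch runs without a GUI context:

```python
def budget_bar_color(utilization_pct: float) -> tuple:
    """RGBA to pass to imgui.push_style_color(imgui.Col_.plot_histogram, ...)."""
    if utilization_pct < 50:
        return (0.2, 0.8, 0.2, 1.0)   # green: comfortable headroom
    if utilization_pct <= 80:
        return (0.9, 0.8, 0.1, 1.0)   # yellow: getting full
    return (0.9, 0.2, 0.2, 1.0)       # red: trim likely soon

def budget_labels(stats: dict) -> tuple:
    """Overlay text for imgui.progress_bar and the summary line below it."""
    overlay = f"{stats['utilization_pct']:.1f}%"
    summary = (f"{stats['estimated_prompt_tokens']:,} / "
               f"{stats['max_prompt_tokens']:,} tokens "
               f"({stats['headroom_tokens']:,} remaining)")
    return overlay, summary

def breakdown_rows(stats: dict) -> list:
    """(Component, Tokens, Pct) rows for the "token_breakdown" table."""
    parts = [("System", stats["system_tokens"]),
             ("Tools", stats["tools_tokens"]),
             ("History", stats["history_tokens"])]
    total = sum(tok for _, tok in parts) or 1  # avoid div-by-zero on empty stats
    return [(name, tok, f"{100 * tok / total:.1f}%") for name, tok in parts]
```

Keeping these as pure functions means the empty-dict and threshold cases can be asserted directly, without driving a render loop.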

Phase 2: Trimming Preview & Cache Status

  • Task 2.1: When stats.get('would_trim') is True, render a warning (7b5d9b1): imgui.text_colored(ImVec4(1,0.3,0,1), "WARNING: Next call will trim history"). Below it, show f"Trimmable turns: {stats['trimmable_turns']}". If stats contains a per-message breakdown, render the first 3 trimmable messages with their role and token count in a compact list.
  • Task 2.2: Add a Gemini cache status display (7b5d9b1). Read ai_client._gemini_cache (check it is not None), ai_client._gemini_cache_created_at, and ai_client._GEMINI_CACHE_TTL. If the cache exists, show: "Gemini Cache: ACTIVE | Age: {age_seconds}s / {ttl}s | Renews at: {ttl * 0.9:.0f}s". Otherwise, show "Gemini Cache: INACTIVE". Guard with if ai_client._provider == "gemini":.
  • Task 2.3: Add an Anthropic cache hint (7b5d9b1). When the provider is "anthropic", show: "Anthropic: 4-breakpoint ephemeral caching (auto-managed)", along with the number of history turns and whether the latest response used cache reads (check the last comms log entry for cache_read_input_tokens).
  • Task 2.4: Write tests for trimming-warning visibility and cache status display (7b5d9b1).
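The strings in Tasks 2.1-2.2 can also be built as pure functions, which is what Task 2.4's tests would exercise. A sketch under the attribute semantics described above; trim_warning_lines and gemini_cache_status are hypothetical helper names, and the explicit now parameter exists only so tests can pin the clock:

```python
import time

def trim_warning_lines(stats: dict) -> list:
    """Lines for the Task 2.1 warning; empty list when no trim is pending."""
    if not stats.get("would_trim"):
        return []
    return ["WARNING: Next call will trim history",
            f"Trimmable turns: {stats['trimmable_turns']}"]

def gemini_cache_status(cache, created_at, ttl, now=None):
    """Status line for Task 2.2, from the cached object and its creation time."""
    if cache is None:
        return "Gemini Cache: INACTIVE"
    now = time.time() if now is None else now
    age_seconds = int(now - created_at)
    # The 0.9 * TTL point is where the plan expects the cache to be renewed.
    return (f"Gemini Cache: ACTIVE | Age: {age_seconds}s / {ttl:.0f}s | "
            f"Renews at: {ttl * 0.9:.0f}s")
```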

Phase 3: Auto-Refresh & Integration

  • Task 3.1: Hook the _token_stats refresh into three trigger points (6f18102): (a) after _do_generate() completes — cache stable_md and call get_history_bleed_stats; (b) after a provider/model switch in the current_provider and current_model setters — clear and re-fetch; (c) after each handle_ai_response in _process_pending_gui_tasks — refresh stats, since the history grew. For (c), set a flag self._token_stats_dirty = True and refresh in the next frame's render call, to avoid calling the stats function too frequently.
  • Task 3.2: Expose the token budget panel through the Hook API (6f18102). Extend /api/gui/mma_status (or add a new /api/gui/token_stats endpoint) to expose _token_stats for simulation verification. This lets tests assert on token utilization levels.
  • Task 3.3: Conductor - User Manual Verification 'Phase 3: Auto-Refresh & Integration' (Protocol in workflow.md) (2929a64) — verified by user; the panel renders correctly.
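The deferred refresh in Task 3.1(c) can be sketched as a small cache object that coalesces any number of dirty marks into at most one fetch per frame. TokenStatsCache and fetch_stats are hypothetical names standing in for the actual gui_2.py wiring around ai_client.get_history_bleed_stats:

```python
class TokenStatsCache:
    """Coalesces stats refreshes: many mark_dirty calls, one fetch per frame."""

    def __init__(self, fetch_stats):
        self._fetch = fetch_stats   # e.g. a closure over get_history_bleed_stats
        self.stats: dict = {}
        self._dirty = True          # force a fetch on the first frame

    def mark_dirty(self) -> None:
        """Trigger (c): after handle_ai_response; defers the fetch."""
        self._dirty = True

    def refresh_now(self) -> None:
        """Triggers (a) and (b): after _do_generate or a provider/model switch."""
        self.stats = self._fetch()
        self._dirty = False

    def on_frame(self) -> None:
        """Called once per render frame; fetches only if something changed."""
        if self._dirty:
            self.refresh_now()
```

This keeps the stats call off the hot render path even when several AI responses land between frames.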