Track Specification: Context & Token Visualization

Overview

product.md lists "Context & Memory Management" as primary use case #2: "Better visualization and management of token usage and context memory, allowing developers to optimize prompt limits manually." The backend already computes everything needed via ai_client.get_history_bleed_stats() (ai_client.py:1657-1796, 140 lines). This track builds the UI to expose it.

Current State

Backend (already implemented)

get_history_bleed_stats(md_content=None) -> dict[str, Any] returns:

  • provider: Active provider name
  • model: Active model name
  • history_turns: Number of conversation turns
  • estimated_prompt_tokens: Total estimated prompt tokens (system + history + tools)
  • max_prompt_tokens: Provider's maximum prompt size in tokens (180K for Anthropic, 900K for Gemini)
  • utilization_pct: estimated_prompt_tokens / max_prompt_tokens * 100
  • headroom_tokens: Tokens remaining before trimming kicks in
  • would_trim: Boolean — whether the next call would trigger history trimming
  • trimmable_turns: Number of turns that could be dropped
  • system_tokens: Tokens consumed by system prompt + context
  • tools_tokens: Tokens consumed by tool definitions
  • history_tokens: Tokens consumed by conversation history
  • Per-message breakdown with role, token estimate, and whether it contains tool use
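
As a rough illustration of how a UI layer might consume the fields listed above, the sketch below formats a one-line summary from a stats dict. The helper name summarize_stats and the sample values (including the model name) are hypothetical and are not taken from ai_client.py; only the dict keys come from the list above.

```python
from typing import Any


def summarize_stats(stats: dict[str, Any]) -> str:
    """Build a one-line summary from the documented top-level fields.

    Hypothetical helper for illustration; not part of ai_client.py.
    """
    used = stats["estimated_prompt_tokens"]
    limit = stats["max_prompt_tokens"]
    # Recompute the percentage locally and guard against a zero limit.
    pct = used / limit * 100 if limit else 0.0
    state = "would trim" if stats["would_trim"] else "ok"
    return (
        f"{stats['provider']}/{stats['model']}: "
        f"{used:,}/{limit:,} tokens ({pct:.1f}%), {state}"
    )


# Sample values are made up; a real dict would come from
# ai_client.get_history_bleed_stats().
sample = {
    "provider": "anthropic",
    "model": "example-model",
    "estimated_prompt_tokens": 45_000,
    "max_prompt_tokens": 180_000,
    "would_trim": False,
}
print(summarize_stats(sample))
```

Recomputing the percentage from the two token counts keeps the helper usable even if a caller passes a partial dict without utilization_pct; a real panel could read utilization_pct directly instead.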

GUI (missing)

No UI exists to display any of this. The user has zero visibility into:

  • How close they are to hitting the context window limit
  • What proportion is system prompt vs history vs tools
  • Which messages would be trimmed and when
  • Whether Gemini's server-side cache is active and how large it is

Goals

  1. Token Budget Bar: A prominent progress bar showing context utilization (green < 50%, yellow 50-80%, red > 80%).
  2. Breakdown Panel: Stacked bar or table showing system/tools/history proportions.
  3. Trimming Preview: When would_trim is true, show which turns would be dropped.
  4. Cache Status: For Gemini, show whether _gemini_cache exists, its size in tokens, and TTL remaining.
  5. Refresh: Auto-refresh on provider/model switch and after each AI response.
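
Goals 1 and 2 reduce to two small pure functions, sketched below under stated assumptions. The names budget_color and breakdown_fractions are hypothetical, and the treatment of the exact 50% and 80% boundaries (both mapped to yellow) is one reading of the thresholds in goal 1, not a confirmed design decision.

```python
def budget_color(utilization_pct: float) -> str:
    """Map context utilization to a bar color.

    Assumed boundaries: green below 50%, yellow from 50% to 80%
    inclusive, red above 80%.
    """
    if utilization_pct < 50:
        return "green"
    if utilization_pct <= 80:
        return "yellow"
    return "red"


def breakdown_fractions(
    system_tokens: int, tools_tokens: int, history_tokens: int
) -> dict[str, float]:
    """Return each category's share of the combined token total (0.0 to 1.0)."""
    total = system_tokens + tools_tokens + history_tokens
    if total <= 0:
        # Avoid division by zero when nothing has been counted yet.
        return {"system": 0.0, "tools": 0.0, "history": 0.0}
    return {
        "system": system_tokens / total,
        "tools": tools_tokens / total,
        "history": history_tokens / total,
    }


print(budget_color(72.5))
print(breakdown_fractions(10, 30, 60))
```

Keeping these as pure functions, separate from any widget code, would let the thresholds and proportions be unit-tested without a running GUI.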

Architecture Reference