
Ed_ 0d2b6049d1 conductor: Create 3 MVP tracks with surgical specs from full codebase analysis

Three new tracks identified by analyzing product.md requirements against
actual codebase state using 1M-context Opus with all architecture docs loaded:

1. mma_pipeline_fix_20260301 (P0, blocker):
   - Diagnoses why Tier 3 worker output never reaches mma_streams in GUI
   - Identifies 4 root cause candidates: positional arg ordering, asyncio.Queue
     thread-safety violation, ai_client.reset_session() side effects, token
     stats stub returning empty dict
   - 2 phases, 6 tasks with exact line references

2. simulation_hardening_20260301 (P1, depends on pipeline fix):
   - Addresses 3 documented issues from robust_live_simulation session compression
   - Mock triggers wrong approval popup, popup state desync, approval ambiguity
   - 3 phases, 9 tasks including standalone mock test suite

3. context_token_viz_20260301 (P2):
   - Builds UI for product.md primary use case #2 'Context & Memory Management'
   - Backend already complete (get_history_bleed_stats, 140 lines)
   - Token budget bar, proportion breakdown, trimming preview, cache status
   - 3 phases, 10 tasks

Execution order: pipeline_fix -> simulation_hardening -> gui_ux (parallel w/ token_viz)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-01 09:58:34 -05:00

2.6 KiB


Track Specification: Context & Token Visualization

Overview

product.md lists "Context & Memory Management" as primary use case #2: "Better visualization and management of token usage and context memory, allowing developers to optimize prompt limits manually." The backend already computes everything needed via ai_client.get_history_bleed_stats() (ai_client.py:1657-1796, 140 lines). This track builds the UI to expose it.

Current State

Backend (already implemented)

get_history_bleed_stats(md_content=None) -> dict[str, Any] returns:

provider: Active provider name
model: Active model name
history_turns: Number of conversation turns
estimated_prompt_tokens: Total estimated prompt tokens (system + history + tools)
max_prompt_tokens: Provider's max (180K Anthropic, 900K Gemini)
utilization_pct: estimated_prompt_tokens / max_prompt_tokens * 100
headroom_tokens: Tokens remaining before trimming kicks in
would_trim: Boolean — whether the next call would trigger history trimming
trimmable_turns: Number of turns that could be dropped
system_tokens: Tokens consumed by system prompt + context
tools_tokens: Tokens consumed by tool definitions
history_tokens: Tokens consumed by conversation history
Per-message breakdown with role, token estimate, and whether it contains tool use
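To make the shape of the return value concrete, here is a minimal consumer-side sketch. The key names are taken from the list above; the sample values, the provider and model strings, and the helper name summarize_stats are hypothetical and exist only for illustration. The per-message breakdown is omitted because its key name is not given here.

```python
from typing import Any

# Hypothetical sample: key names follow the field list above,
# but every value here is invented for illustration.
SAMPLE_STATS: dict[str, Any] = {
    "provider": "anthropic",
    "model": "example-model",
    "history_turns": 12,
    "estimated_prompt_tokens": 90_000,
    "max_prompt_tokens": 180_000,
    "utilization_pct": 50.0,
    "headroom_tokens": 90_000,
    "would_trim": False,
    "trimmable_turns": 4,
    "system_tokens": 20_000,
    "tools_tokens": 10_000,
    "history_tokens": 60_000,
}


def summarize_stats(stats: dict[str, Any]) -> str:
    """Format a one-line status string from a stats dict (hypothetical helper)."""
    used = stats["estimated_prompt_tokens"]
    limit = stats["max_prompt_tokens"]
    pct = stats["utilization_pct"]
    trim = "trimming on next call" if stats["would_trim"] else "no trimming pending"
    return (
        f"{stats['provider']}/{stats['model']}: "
        f"{used:,} / {limit:,} tokens ({pct:.1f}%), {trim}"
    )
```

In the real GUI the dict would come from ai_client.get_history_bleed_stats() rather than a hard-coded sample.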

GUI (missing)

No UI exists to display any of this. The user has zero visibility into:

How close they are to hitting the context window limit
What proportion is system prompt vs history vs tools
Which messages would be trimmed and when
Whether Gemini's server-side cache is active and how large it is

Goals

Token Budget Bar: A prominent progress bar showing context utilization (green < 50%, yellow 50-80%, red > 80%).
Breakdown Panel: Stacked bar or table showing system/tools/history proportions.
Trimming Preview: When would_trim is true, show which turns would be dropped.
Cache Status: For Gemini, show whether _gemini_cache exists, its size in tokens, and TTL remaining.
Refresh: Auto-refresh on provider/model switch and after each AI response.
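The first two goals reduce to small pure functions that can be tested without any GUI. The sketch below is one possible reading of the thresholds, not the planned implementation: the function names are hypothetical, and it assumes that exactly 50% and exactly 80% both fall in the yellow band, which the "50-80%" wording suggests but does not state outright.

```python
def budget_color(utilization_pct: float) -> str:
    """Map context utilization to a bar color.

    Assumption: the boundaries 50.0 and 80.0 are both treated as yellow.
    """
    if utilization_pct < 50.0:
        return "green"
    if utilization_pct <= 80.0:
        return "yellow"
    return "red"


def breakdown_proportions(
    system_tokens: int, tools_tokens: int, history_tokens: int
) -> dict[str, float]:
    """Return each component's share of the prompt as a percentage."""
    total = system_tokens + tools_tokens + history_tokens
    if total <= 0:
        # Nothing counted yet (e.g. fresh session): avoid dividing by zero.
        return {"system": 0.0, "tools": 0.0, "history": 0.0}
    return {
        "system": 100.0 * system_tokens / total,
        "tools": 100.0 * tools_tokens / total,
        "history": 100.0 * history_tokens / total,
    }
```

Keeping these free of widget calls means the color bands and the stacked-bar math can be unit-tested independently of the rendering code.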

Architecture Reference

AI client state: docs/guide_architecture.md — see "AI Client: Multi-Provider Architecture"
Gemini cache: docs/guide_architecture.md — see "Gemini Cache Strategy"
Anthropic cache: docs/guide_architecture.md — see "Anthropic Cache Strategy (4-Breakpoint System)"
Frame-sync: docs/guide_architecture.md — see _process_pending_gui_tasks for how to safely read backend state from GUI thread
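For readers without the architecture guide at hand, the general idea behind reading backend state safely from a GUI thread can be sketched with a thread-safe task queue that the render loop drains once per frame. This is a generic pattern under stated assumptions, not a description of how _process_pending_gui_tasks is actually written; the class name PendingTasks and its methods are hypothetical.

```python
import queue
from typing import Callable


class PendingTasks:
    """Hypothetical hand-off queue: workers submit, the GUI thread drains."""

    def __init__(self) -> None:
        # queue.Queue is safe to use from multiple threads.
        self._tasks = queue.Queue()

    def submit(self, task: Callable[[], None]) -> None:
        """Called from any thread: schedule a callable for the GUI thread."""
        self._tasks.put(task)

    def drain(self) -> int:
        """Called once per frame on the GUI thread; returns tasks executed."""
        ran = 0
        while True:
            try:
                task = self._tasks.get_nowait()
            except queue.Empty:
                return ran
            task()
            ran += 1
```

Under this pattern a background worker would compute a stats snapshot and submit a closure that stores it, so widget state is only ever mutated from the GUI thread during drain().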
