manual_slop/conductor/tracks/caching_optimization_20260308/plan.md

Implementation Plan: AI Provider Caching Optimization

Phase 1: Metric Tracking & Prefix Stabilization

  • Task: Implement cache metric tracking for OpenAI and DeepSeek in src/ai_client.py.
    • Update _send_deepseek to extract prompt_cache_hit_tokens and prompt_cache_miss_tokens from usage metadata.
    • Update _send_openai (or its equivalent) to extract cached_tokens from prompt_tokens_details.
    • Update _append_comms and the response_received event to propagate these metrics.
  • Task: Optimize prompt structure for OpenAI and DeepSeek to stabilize prefixes.
    • Ensure system instructions and tool definitions are at the absolute beginning of the messages array.
    • Research the prompt_cache_key parameter for OpenAI and, if applicable, implement it to increase hit rates.
  • Task: Conductor - User Manual Verification 'Phase 1: Metric Tracking & Prefix Stabilization' (Protocol in workflow.md)
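The metric-extraction tasks above can be sketched as a single normalizer over the provider usage payloads. The field names (prompt_cache_hit_tokens, prompt_cache_miss_tokens, prompt_tokens_details.cached_tokens) come from the plan; the function name and the assumption that usage arrives as a plain dict are illustrative, not part of src/ai_client.py.

```python
def extract_cache_metrics(provider: str, usage: dict) -> dict:
    """Normalize provider-specific cache fields into one shape.

    Hypothetical helper: how `usage` reaches this point depends on the
    existing _send_* methods in src/ai_client.py.
    """
    if provider == "deepseek":
        # DeepSeek reports cache hits and misses as separate counters.
        hit = usage.get("prompt_cache_hit_tokens", 0)
        miss = usage.get("prompt_cache_miss_tokens", 0)
        return {"cached_tokens": hit, "uncached_tokens": miss}
    if provider == "openai":
        # OpenAI nests the cached-token count under prompt_tokens_details.
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens", 0)
        total = usage.get("prompt_tokens", 0)
        return {"cached_tokens": cached, "uncached_tokens": total - cached}
    # Providers without cache metadata: everything counts as uncached.
    return {"cached_tokens": 0, "uncached_tokens": usage.get("prompt_tokens", 0)}
```

A normalized shape like this also keeps _append_comms and the response_received event provider-agnostic.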

Phase 2: Anthropic 4-Breakpoint Optimization

  • Task: Implement hierarchical caching for Anthropic in src/ai_client.py.
    • Refactor _send_anthropic to use exactly 4 breakpoints:
      1. Global System block.
      2. Project Context block.
      3. Context Injection block (file contents).
      4. Sliding history window (last N turns).
  • Task: Research and implement "Automatic Caching" if supported by the SDK.
    • Check if cache_control: {"type": "ephemeral"} can be applied at the request level to simplify history caching.
  • Task: Conductor - User Manual Verification 'Phase 2: Anthropic 4-Breakpoint Optimization' (Protocol in workflow.md)
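The four-breakpoint layout can be sketched as request assembly. Only the cache_control: {"type": "ephemeral"} marker format is from the Anthropic Messages API; the block names, parameters, and helper itself are assumptions standing in for whatever _send_anthropic actually does.

```python
def build_anthropic_request(global_system: str, project_context: str,
                            injected_files: str, history: list) -> dict:
    """Assemble a request body with the plan's four cache breakpoints.

    Hypothetical sketch; real code would live in _send_anthropic.
    """
    ephemeral = {"type": "ephemeral"}
    system = [
        # Breakpoint 1: global system instructions (most stable prefix).
        {"type": "text", "text": global_system, "cache_control": ephemeral},
        # Breakpoint 2: project-level context.
        {"type": "text", "text": project_context, "cache_control": ephemeral},
        # Breakpoint 3: injected file contents.
        {"type": "text", "text": injected_files, "cache_control": ephemeral},
    ]
    messages = [dict(m) for m in history]
    if messages:
        # Breakpoint 4: mark the final history turn so the whole sliding
        # window up to this point becomes a cacheable prefix.
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        else:
            content = list(content)
        content[-1] = {**content[-1], "cache_control": ephemeral}
        last["content"] = content
    return {"system": system, "messages": messages}
```

Ordering the blocks from most to least stable matters: a change in any block invalidates the cache for everything after it.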

Phase 3: Gemini Caching & TTL Management

  • Task: Optimize Gemini explicit caching logic.
    • Update _send_gemini to handle the 32k token threshold more intelligently (e.g., only create CachedContent when multiple turns are expected).
    • Expose _GEMINI_CACHE_TTL as a configurable setting in config.toml.
  • Task: Implement manual cache controls in src/ai_client.py.
    • Add invalidate_provider_caches(provider) to delete server-side caches.
  • Task: Conductor - User Manual Verification 'Phase 3: Gemini Caching & TTL Management' (Protocol in workflow.md)
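The "handle the threshold more intelligently" task reduces to a small gate: only pay the CachedContent creation cost when the prefix is big enough and will be reused. The constants below are stand-ins for the config.toml settings the plan proposes (GEMINI_CACHE_TTL_SECONDS mirrors _GEMINI_CACHE_TTL); both names are hypothetical.

```python
# Hypothetical defaults mirroring the plan; real values would come from
# config.toml via the project's existing settings loader.
GEMINI_CACHE_MIN_TOKENS = 32_000   # explicit-caching threshold from the plan
GEMINI_CACHE_TTL_SECONDS = 3600    # stand-in for _GEMINI_CACHE_TTL

def should_create_gemini_cache(prompt_tokens: int,
                               expected_turns: int,
                               min_tokens: int = GEMINI_CACHE_MIN_TOKENS) -> bool:
    """Create CachedContent only when the prefix clears the token
    threshold AND more than one turn is expected to reuse it."""
    return prompt_tokens >= min_tokens and expected_turns > 1
```

A single-turn request below the threshold then falls through to a normal, uncached call in _send_gemini.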

Phase 4: GUI Integration & Visualization

  • Task: Enhance the AI Metrics panel in src/gui_2.py.
    • Add "Saved Tokens" and "Cache Hit Rate" displays.
    • Implement visual indicators (badges) for cached files in the Context Hub.
  • Task: Add manual cache management buttons to the AI Settings panel.
    • "Force Cache Rebuild" and "Clear All Server Caches".
  • Task: Update Comms Log UI to show per-response metrics.
    • Modify _render_comms_history_panel in src/gui_2.py to display token usage (including cache hits) for each response entry.
  • Task: Final end-to-end efficiency audit across all providers.
  • Task: Conductor - User Manual Verification 'Phase 4: GUI Integration & Visualization' (Protocol in workflow.md)
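The "Cache Hit Rate" display and the per-response Comms Log line can share one small formatting layer. Both function names are illustrative; the real rendering would live in src/gui_2.py.

```python
def cache_hit_rate(cached_tokens: int, prompt_tokens: int) -> float:
    """Fraction of the prompt served from cache, for the metrics panel."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens

def format_metrics_row(cached_tokens: int, prompt_tokens: int) -> str:
    """One-line summary suitable for a Comms Log response entry."""
    rate = cache_hit_rate(cached_tokens, prompt_tokens)
    return f"{prompt_tokens} prompt tokens ({cached_tokens} cached, {rate:.0%} hit rate)"
```

Keeping the math out of the widget code makes the end-to-end audit in the final task easier to script against.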