manual_slop/conductor/tracks/caching_optimization_20260308/plan.md

Implementation Plan: AI Provider Caching Optimization

Phase 1: Metric Tracking & Prefix Stabilization

  • Task: Implement cache metric tracking for OpenAI and DeepSeek in src/ai_client.py.
    • Update _send_deepseek to extract prompt_cache_hit_tokens and prompt_cache_miss_tokens from usage metadata.
    • Update _send_openai (or its equivalent) to extract cached_tokens from prompt_tokens_details.
    • Update _append_comms and the response_received event to propagate these metrics.
  • Task: Optimize prompt structure for OpenAI and DeepSeek to stabilize prefixes.
    • Ensure system instructions and tool definitions are at the absolute beginning of the messages array.
    • Research the prompt_cache_key parameter for OpenAI and, if applicable, implement it to increase hit rates.
  • Task: Conductor - User Manual Verification 'Phase 1: Metric Tracking & Prefix Stabilization' (Protocol in workflow.md)
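The metric-extraction tasks above can be sketched as a single normalizer over the provider usage payloads. The field names (prompt_cache_hit_tokens, prompt_cache_miss_tokens, prompt_tokens_details.cached_tokens) come from the plan; the function name and the assumption that usage arrives as a plain dict are illustrative, not part of src/ai_client.py.

```python
def extract_cache_metrics(provider: str, usage: dict) -> dict:
    """Normalize provider-specific cache fields into one shape.

    Hypothetical helper: how `usage` reaches this point depends on the
    existing _send_* methods in src/ai_client.py.
    """
    if provider == "deepseek":
        # DeepSeek reports cache hits and misses as separate counters.
        hit = usage.get("prompt_cache_hit_tokens", 0)
        miss = usage.get("prompt_cache_miss_tokens", 0)
        return {"cached_tokens": hit, "uncached_tokens": miss}
    if provider == "openai":
        # OpenAI nests the cached-token count under prompt_tokens_details.
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens", 0)
        total = usage.get("prompt_tokens", 0)
        return {"cached_tokens": cached, "uncached_tokens": total - cached}
    # Providers without cache metadata: everything counts as uncached.
    return {"cached_tokens": 0, "uncached_tokens": usage.get("prompt_tokens", 0)}
```

A normalized shape like this also keeps _append_comms and the response_received event provider-agnostic.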

Phase 2: Anthropic 4-Breakpoint Optimization

  • Task: Implement hierarchical caching for Anthropic in src/ai_client.py.
    • Refactor _send_anthropic to use exactly 4 breakpoints:
      1. Global System block.
      2. Project Context block.
      3. Context Injection block (file contents).
      4. Sliding history window (last N turns).
  • Task: Research and implement "Automatic Caching" if supported by the SDK.
    • Check if cache_control: {"type": "ephemeral"} can be applied at the request level to simplify history caching.
  • Task: Conductor - User Manual Verification 'Phase 2: Anthropic 4-Breakpoint Optimization' (Protocol in workflow.md)
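The four-breakpoint layout can be sketched as request assembly. Only the cache_control: {"type": "ephemeral"} marker format is from the Anthropic Messages API; the block names, parameters, and helper itself are assumptions standing in for whatever _send_anthropic actually does.

```python
def build_anthropic_request(global_system: str, project_context: str,
                            injected_files: str, history: list) -> dict:
    """Assemble a request body with the plan's four cache breakpoints.

    Hypothetical sketch; real code would live in _send_anthropic.
    """
    ephemeral = {"type": "ephemeral"}
    system = [
        # Breakpoint 1: global system instructions (most stable prefix).
        {"type": "text", "text": global_system, "cache_control": ephemeral},
        # Breakpoint 2: project-level context.
        {"type": "text", "text": project_context, "cache_control": ephemeral},
        # Breakpoint 3: injected file contents.
        {"type": "text", "text": injected_files, "cache_control": ephemeral},
    ]
    messages = [dict(m) for m in history]
    if messages:
        # Breakpoint 4: mark the final history turn so the whole sliding
        # window up to this point becomes a cacheable prefix.
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        else:
            content = list(content)
        content[-1] = {**content[-1], "cache_control": ephemeral}
        last["content"] = content
    return {"system": system, "messages": messages}
```

Ordering the blocks from most to least stable matters: a change in any block invalidates the cache for everything after it.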

Phase 3: Gemini Caching & TTL Management

  • Task: Optimize Gemini explicit caching logic.
    • Update _send_gemini to handle the 32k token threshold more intelligently (e.g., only create CachedContent when multiple turns are expected).
    • Expose _GEMINI_CACHE_TTL as a configurable setting in config.toml.
  • Task: Implement manual cache controls in src/ai_client.py.
    • Add invalidate_provider_caches(provider) to delete server-side caches.
  • Task: Conductor - User Manual Verification 'Phase 3: Gemini Caching & TTL Management' (Protocol in workflow.md)
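The "handle the threshold more intelligently" task reduces to a small gate: only pay the CachedContent creation cost when the prefix is big enough and will be reused. The constants below are stand-ins for the config.toml settings the plan proposes (GEMINI_CACHE_TTL_SECONDS mirrors _GEMINI_CACHE_TTL); both names are hypothetical.

```python
# Hypothetical defaults mirroring the plan; real values would come from
# config.toml via the project's existing settings loader.
GEMINI_CACHE_MIN_TOKENS = 32_000   # explicit-caching threshold from the plan
GEMINI_CACHE_TTL_SECONDS = 3600    # stand-in for _GEMINI_CACHE_TTL

def should_create_gemini_cache(prompt_tokens: int,
                               expected_turns: int,
                               min_tokens: int = GEMINI_CACHE_MIN_TOKENS) -> bool:
    """Create CachedContent only when the prefix clears the token
    threshold AND more than one turn is expected to reuse it."""
    return prompt_tokens >= min_tokens and expected_turns > 1
```

A single-turn request below the threshold then falls through to a normal, uncached call in _send_gemini.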

Phase 4: GUI Integration & Visualization

  • Task: Enhance the AI Metrics panel in src/gui_2.py.
    • Add "Saved Tokens" and "Cache Hit Rate" displays.
    • Implement visual indicators (badges) for cached files in the Context Hub.
  • Task: Add manual cache management buttons to the AI Settings panel.
    • "Force Cache Rebuild" and "Clear All Server Caches".
  • Task: Update Comms Log UI to show per-response metrics.
    • Modify _render_comms_history_panel in src/gui_2.py to display token usage (including cache hits) for each response entry.
  • Task: Final end-to-end efficiency audit across all providers.
  • Task: Conductor - User Manual Verification 'Phase 4: GUI Integration & Visualization' (Protocol in workflow.md)
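The "Cache Hit Rate" display and the per-response Comms Log line can share one small formatting layer. Both function names are illustrative; the real rendering would live in src/gui_2.py.

```python
def cache_hit_rate(cached_tokens: int, prompt_tokens: int) -> float:
    """Fraction of the prompt served from cache, for the metrics panel."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens

def format_metrics_row(cached_tokens: int, prompt_tokens: int) -> str:
    """One-line summary suitable for a Comms Log response entry."""
    rate = cache_hit_rate(cached_tokens, prompt_tokens)
    return f"{prompt_tokens} prompt tokens ({cached_tokens} cached, {rate:.0%} hit rate)"
```

Keeping the math out of the widget code makes the end-to-end audit in the final task easier to script against.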