chore(conductor): Add new track 'AI Provider Caching Optimization'
This commit is contained in:

conductor/tracks/caching_optimization_20260308/plan.md (new file, 39 lines)
@@ -0,0 +1,39 @@

# Implementation Plan: AI Provider Caching Optimization

## Phase 1: Metric Tracking & Prefix Stabilization

- [ ] Task: Implement cache metric tracking for OpenAI and DeepSeek in `src/ai_client.py`.
  - [ ] Update `_send_deepseek` to extract `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` from the usage metadata.
  - [ ] Update `_send_openai` (or its equivalent) to extract `cached_tokens` from `prompt_tokens_details`.
  - [ ] Update `_append_comms` and the `response_received` event to propagate these metrics.
- [ ] Task: Optimize prompt structure for OpenAI and DeepSeek to stabilize prefixes.
  - [ ] Ensure system instructions and tool definitions are at the absolute beginning of the messages array.
  - [ ] Research and implement the `prompt_cache_key` parameter for OpenAI, if applicable, to increase hit rates.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Metric Tracking & Prefix Stabilization' (Protocol in workflow.md)

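The metric extraction above could be sketched roughly as follows. The helper names are hypothetical, and the dict-shaped `usage` payloads stand in for the SDK response objects; only the field names (`prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` for DeepSeek, `prompt_tokens_details.cached_tokens` for OpenAI) come from the tasks above:

```python
# Illustrative sketch only: helper names and dict-shaped payloads are
# assumptions; real code would read these fields off the SDK response objects.

def extract_deepseek_cache_metrics(usage: dict) -> dict:
    """Pull cache hit/miss token counts from a DeepSeek usage payload."""
    return {
        "cache_hit_tokens": usage.get("prompt_cache_hit_tokens", 0),
        "cache_miss_tokens": usage.get("prompt_cache_miss_tokens", 0),
    }


def extract_openai_cache_metrics(usage: dict) -> dict:
    """Pull cached-token counts from an OpenAI usage payload."""
    details = usage.get("prompt_tokens_details") or {}
    return {
        "cached_tokens": details.get("cached_tokens", 0),
        "prompt_tokens": usage.get("prompt_tokens", 0),
    }
```

Defaulting every field to 0 keeps `_append_comms` and the `response_received` event well-formed even when a provider omits cache metadata.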
## Phase 2: Anthropic 4-Breakpoint Optimization

- [ ] Task: Implement hierarchical caching for Anthropic in `src/ai_client.py`.
  - [ ] Refactor `_send_anthropic` to use exactly 4 breakpoints:
    1. Global System block.
    2. Project Context block.
    3. Context Injection block (file contents).
    4. Sliding history window (last N turns).
- [ ] Task: Research and implement "Automatic Caching" if supported by the SDK.
  - [ ] Check if `cache_control: {"type": "ephemeral"}` can be applied at the request level to simplify history caching.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Anthropic 4-Breakpoint Optimization' (Protocol in workflow.md)

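A minimal sketch of the 4-breakpoint layout described above, assuming string-content history messages; the builder name and block ordering are illustrative, not the project's confirmed `_send_anthropic` shape:

```python
# Sketch of the 4-breakpoint request layout (assumed helper, not real code).
EPHEMERAL = {"type": "ephemeral"}

def build_anthropic_request(global_system: str, project_context: str,
                            injected_files: str, history: list,
                            window: int = 6) -> dict:
    system_blocks = [
        # Breakpoints 1-3: stable prefix blocks, each marked cacheable.
        {"type": "text", "text": global_system, "cache_control": EPHEMERAL},
        {"type": "text", "text": project_context, "cache_control": EPHEMERAL},
        {"type": "text", "text": injected_files, "cache_control": EPHEMERAL},
    ]
    # Breakpoint 4: sliding window over the last N turns; marking the final
    # message lets the whole prefix up to it be reused on the next turn.
    messages = [dict(m) for m in history[-window:]]
    if messages:
        last = messages[-1]
        last["content"] = [{"type": "text", "text": last["content"],
                           "cache_control": EPHEMERAL}]
    return {"system": system_blocks, "messages": messages}
```

The point of exactly four markers is that Anthropic caches everything up to each breakpoint, so the stable prefix (1-3) and the rolling history tail (4) can be invalidated independently.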
## Phase 3: Gemini Caching & TTL Management

- [ ] Task: Optimize Gemini explicit caching logic.
  - [ ] Update `_send_gemini` to handle the 32k-token threshold more intelligently (e.g., only create `CachedContent` when multiple turns are expected).
  - [ ] Expose `_GEMINI_CACHE_TTL` as a configurable setting in `config.toml`.
- [ ] Task: Implement manual cache controls in `src/ai_client.py`.
  - [ ] Add `invalidate_provider_caches(provider)` to delete server-side caches.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Gemini Caching & TTL Management' (Protocol in workflow.md)

## Phase 4: GUI Integration & Visualization

- [ ] Task: Enhance the AI Metrics panel in `src/gui_2.py`.
  - [ ] Add "Saved Tokens" and "Cache Hit Rate" displays.
  - [ ] Implement visual indicators (badges) for cached files in the Context Hub.
- [ ] Task: Add manual cache management buttons to the AI Settings panel.
  - [ ] Add "Force Cache Rebuild" and "Clear All Server Caches" actions.
- [ ] Task: Final end-to-end efficiency audit across all providers.
- [ ] Task: Conductor - User Manual Verification 'Phase 4: GUI Integration & Visualization' (Protocol in workflow.md)

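The two panel figures could be aggregated from the propagated `response_received` metrics along these lines; the function name and the exact definitions ("Saved Tokens" as total cached tokens, "Cache Hit Rate" as cached over total prompt tokens) are assumptions about how the panel would compute them:

```python
# Hypothetical aggregation for the AI Metrics panel; definitions of the two
# figures are assumptions, not existing gui_2.py code.

def accumulate_metrics(events: list) -> dict:
    """Aggregate per-response cache metrics into panel-ready figures."""
    cached = sum(e.get("cached_tokens", 0) for e in events)
    prompt = sum(e.get("prompt_tokens", 0) for e in events)
    return {
        "saved_tokens": cached,                               # "Saved Tokens"
        "cache_hit_rate": cached / prompt if prompt else 0.0,  # "Cache Hit Rate"
    }
```

Guarding the division keeps the panel sane before the first request, when no prompt tokens have been sent.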