chore(conductor): Add new track 'AI Provider Caching Optimization'

This commit is contained in:
2026-03-08 13:55:06 -04:00
parent 792352fb5b
commit d7083fc73f
5 changed files with 109 additions and 0 deletions

View File

@@ -76,6 +76,10 @@ This file tracks all major tracks for the project. Each track has its own detail
*Link: [./tracks/zhipu_integration_20260308/](./tracks/zhipu_integration_20260308/)*
*Goal: Add support for Zhipu AI (z.ai) as a first-class model provider (GLM-4, GLM-4-Flash, GLM-4V). Implement core client, vision support, and cost tracking.*
14. [ ] **Track: AI Provider Caching Optimization**
*Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
*Goal: Verify and optimize caching strategies across all providers. Implement 4-breakpoint hierarchy for Anthropic, prefix stabilization for OpenAI/DeepSeek, and hybrid explicit/implicit caching for Gemini. Add GUI hit rate metrics.*
---
## Phase 3: Future Horizons

View File

@@ -0,0 +1,5 @@
# Track caching_optimization_20260308 Context
- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)

View File

@@ -0,0 +1,8 @@
{
"track_id": "caching_optimization_20260308",
"type": "feature",
"status": "new",
"created_at": "2026-03-08T13:55:00Z",
"updated_at": "2026-03-08T13:55:00Z",
"description": "Verify that all AI provider implementations in ai_client.py and elsewhere use the best approach to caching files, prompts, etc. The intent is to maximize the efficiency of agent token usage and the other metrics providers charge for."
}

View File

@@ -0,0 +1,39 @@
# Implementation Plan: AI Provider Caching Optimization
## Phase 1: Metric Tracking & Prefix Stabilization
- [ ] Task: Implement cache metric tracking for OpenAI and DeepSeek in `src/ai_client.py`.
- [ ] Update `_send_deepseek` to extract `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` from usage metadata.
- [ ] Update `_send_openai` (or its equivalent) to extract `cached_tokens` from `prompt_tokens_details`.
- [ ] Update `_append_comms` and the `response_received` event to propagate these metrics.
- [ ] Task: Optimize prompt structure for OpenAI and DeepSeek to stabilize prefixes.
- [ ] Ensure system instructions and tool definitions are at the absolute beginning of the messages array.
- [ ] Research and implement the `prompt_cache_key` parameter for OpenAI if applicable to increase hit rates.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Metric Tracking & Prefix Stabilization' (Protocol in workflow.md)
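
The metric-tracking tasks above could normalize the per-provider usage fields into one shape before they reach `_append_comms`. A minimal sketch (the helper name `extract_cache_metrics` is hypothetical; the field names follow the OpenAI `prompt_tokens_details.cached_tokens` and DeepSeek `prompt_cache_hit_tokens`/`prompt_cache_miss_tokens` usage schemas named in the plan):

```python
# Hedged sketch: normalize cache fields from provider usage payloads
# into one metrics dict that _append_comms could propagate.

def extract_cache_metrics(provider: str, usage: dict) -> dict:
    """Map each provider's cache usage fields onto a common shape."""
    if provider == "deepseek":
        hit = usage.get("prompt_cache_hit_tokens", 0)
        miss = usage.get("prompt_cache_miss_tokens", 0)
        return {"cached_tokens": hit, "uncached_tokens": miss}
    if provider == "openai":
        details = usage.get("prompt_tokens_details") or {}
        hit = details.get("cached_tokens", 0)
        total = usage.get("prompt_tokens", 0)
        return {"cached_tokens": hit, "uncached_tokens": total - hit}
    # Providers without cache reporting: everything counts as uncached.
    return {"cached_tokens": 0, "uncached_tokens": usage.get("prompt_tokens", 0)}

# Example with a DeepSeek-style usage block:
metrics = extract_cache_metrics(
    "deepseek",
    {"prompt_cache_hit_tokens": 1024, "prompt_cache_miss_tokens": 256},
)
```

Normalizing here keeps the `response_received` event payload identical across providers, so the GUI layer never needs provider-specific branches.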
## Phase 2: Anthropic 4-Breakpoint Optimization
- [ ] Task: Implement hierarchical caching for Anthropic in `src/ai_client.py`.
- [ ] Refactor `_send_anthropic` to use exactly 4 breakpoints:
1. Global System block.
2. Project Context block.
3. Context Injection block (file contents).
4. Sliding history window (last N turns).
- [ ] Task: Research and implement "Automatic Caching" if supported by the SDK.
- [ ] Check if `cache_control: {"type": "ephemeral"}` can be applied at the request level to simplify history caching.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Anthropic 4-Breakpoint Optimization' (Protocol in workflow.md)
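
The refactored `_send_anthropic` could lay out the four breakpoints as below. A sketch only: the builder name and placeholder strings are illustrative, while the `cache_control: {"type": "ephemeral"}` marker is Anthropic's documented mechanism:

```python
# Hedged sketch of the 4-breakpoint layout for an Anthropic request body.
# Block contents are placeholders; the real strings come from ai_client.py.

CACHE = {"cache_control": {"type": "ephemeral"}}

def build_anthropic_request(global_system, project_context, injected_files, history):
    """Place one cache breakpoint at the end of each stable prompt tier."""
    system = [
        {"type": "text", "text": global_system, **CACHE},    # 1. global system
        {"type": "text", "text": project_context, **CACHE},  # 2. project context
        {"type": "text", "text": injected_files, **CACHE},   # 3. context injection
    ]
    messages = [dict(m) for m in history]  # copy so the caller's history is untouched
    if messages:
        # 4. sliding-window breakpoint on the most recent history turn
        last = messages[-1]
        last["content"] = [{"type": "text", "text": last["content"], **CACHE}]
    return {"system": system, "messages": messages}
```

Ordering matters: the tiers go from most static to most volatile, so a change in injected files invalidates breakpoints 3-4 but leaves 1-2 hitting the cache.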
## Phase 3: Gemini Caching & TTL Management
- [ ] Task: Optimize Gemini explicit caching logic.
- [ ] Update `_send_gemini` to handle the 32k token threshold more intelligently (e.g., only create `CachedContent` when multiple turns are expected).
- [ ] Expose `_GEMINI_CACHE_TTL` as a configurable setting in `config.toml`.
- [ ] Task: Implement manual cache controls in `src/ai_client.py`.
- [ ] Add `invalidate_provider_caches(provider)` to delete server-side caches.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Gemini Caching & TTL Management' (Protocol in workflow.md)
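
The "create `CachedContent` only when multiple turns are expected" decision could be isolated in a small predicate so `_send_gemini` stays readable. A sketch under stated assumptions: the helper name and the default TTL constant are hypothetical; the 32k figure is the threshold this plan already uses:

```python
# Hedged sketch of the explicit-cache decision described above. Threshold
# and names are illustrative; the TTL would come from config.toml.

GEMINI_EXPLICIT_CACHE_MIN_TOKENS = 32_000  # plan's threshold for CachedContent
DEFAULT_GEMINI_CACHE_TTL_SECONDS = 3600    # assumed configurable default

def should_create_explicit_cache(context_tokens: int, expected_turns: int) -> bool:
    """Only pay Gemini storage fees when the context is large enough to
    qualify AND the session is likely to reuse it across several turns."""
    return (
        context_tokens >= GEMINI_EXPLICIT_CACHE_MIN_TOKENS
        and expected_turns > 1
    )
```

Single-query sessions fall through to implicit caching, which matches the cost-effectiveness requirement in the spec.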
## Phase 4: GUI Integration & Visualization
- [ ] Task: Enhance the AI Metrics panel in `src/gui_2.py`.
- [ ] Add "Saved Tokens" and "Cache Hit Rate" displays.
- [ ] Implement visual indicators (badges) for cached files in the Context Hub.
- [ ] Task: Add manual cache management buttons to the AI Settings panel.
- [ ] "Force Cache Rebuild" and "Clear All Server Caches".
- [ ] Task: Final end-to-end efficiency audit across all providers.
- [ ] Task: Conductor - User Manual Verification 'Phase 4: GUI Integration & Visualization' (Protocol in workflow.md)

View File

@@ -0,0 +1,53 @@
# Specification: AI Provider Caching Optimization
## Overview
This track aims to verify and optimize the caching strategies across all supported AI providers (Gemini, Anthropic, DeepSeek, MiniMax, and OpenAI). The goal is to minimize token consumption and latency by ensuring that static and recurring context (system prompts, tool definitions, project documents, and conversation history) is effectively cached using each provider's specific mechanisms.
## Functional Requirements
### 1. Provider-Specific Optimization
#### **Anthropic (Claude)**
- **4-Breakpoint Strategy:** Implement a hierarchical caching structure using the 4 available `cache_control: {"type": "ephemeral"}` breakpoints:
1. **Global System:** Core instructions and stable tool definitions.
2. **Project Context:** High-signal project documents (`README.md`, `tech-stack.md`, etc.).
3. **Context Injection:** Large file contents or skeletons injected into the current session.
4. **Rolling History:** A "sliding window" breakpoint on the history to cache the majority of previous turns.
- **Automatic Caching:** Research and integrate the top-level `cache_control` flag if supported by the current SDK version to simplify history caching.
#### **OpenAI & DeepSeek**
- **Prefix Stabilization:** Ensure that the `messages` array is structured with most-static content first (System role content, then invariant project context).
- **Metric Monitoring:** Implement tracking for `cached_tokens` (OpenAI) and `prompt_cache_hit_tokens` (DeepSeek) from the API usage metadata.
- **Minimum Threshold Awareness:** Only attempt to optimize or report on caching when the total prefix length exceeds the provider's threshold (1,024 for OpenAI, 64 for DeepSeek).
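
The three requirements above can be sketched together: static-first message assembly plus a threshold gate before reporting. The builder and predicate names are hypothetical, and the whitespace-split token estimate is a deliberate simplification; the per-provider minimums are the ones stated above:

```python
# Hedged sketch of prefix stabilization plus minimum-threshold awareness.
# Token counts are approximated by whitespace splitting for illustration.

CACHE_MIN_PREFIX_TOKENS = {"openai": 1024, "deepseek": 64}

def build_messages(system_prompt, project_context, history, user_turn):
    """Static content first, so the prefix stays byte-identical across calls."""
    return (
        [{"role": "system", "content": system_prompt + "\n\n" + project_context}]
        + list(history)            # append-only history preserves the prefix
        + [{"role": "user", "content": user_turn}]
    )

def prefix_cacheable(provider, messages):
    """Report on caching only once the stable prefix clears the threshold."""
    prefix_tokens = len(messages[0]["content"].split())  # rough estimate
    return prefix_tokens >= CACHE_MIN_PREFIX_TOKENS.get(provider, float("inf"))
```

Because history is append-only and the user turn is last, every request shares the longest possible byte-identical prefix with the previous one, which is exactly what both providers' prefix caches match on.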
#### **Gemini**
- **Hybrid Caching:**
- **Explicit:** Continue using `CachedContent` for massive context blocks (>= 32k tokens) in the `system_instruction`.
- **Implicit:** Optimize the `system_instruction` and first message prefix to trigger implicit caching for smaller contexts (~1k-4k tokens).
- **TTL Management:** Allow users to configure the TTL for explicit caches to balance cost vs. persistence.
#### **MiniMax**
- **Parity Check:** Verify if the MiniMax API supports OpenAI-style usage metrics for cached tokens and implement tracking if available.
### 2. GUI Enhancements
- **Cache Hit Visualization:** Add a "Cache Hit Rate" or "Saved Tokens" indicator to the AI Metrics panel.
- **Manual Cache Management:**
- Add a "Force Cache Rebuild" button to invalidate existing caches and start fresh.
- Add a "Clear Provider Caches" button to explicitly delete server-side caches (primarily for Gemini).
- **Cache Awareness:** Display a visual indicator (e.g., a "Cached" badge) next to files in the Context Hub that are currently part of a cached prefix.
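
The hit-rate and saved-tokens figures above reduce to a small calculation over the normalized metrics. A sketch with hypothetical function names:

```python
# Hedged sketch of the figures the AI Metrics panel would display.
# Inputs assume cached/prompt token counts gathered per response.

def cache_hit_rate(cached_tokens: int, prompt_tokens: int) -> float:
    """Fraction of the prompt that was served from the provider's cache."""
    if prompt_tokens <= 0:
        return 0.0
    return cached_tokens / prompt_tokens

def saved_tokens_label(cached_tokens: int, prompt_tokens: int) -> str:
    """Human-readable summary string for the metrics panel."""
    rate = cache_hit_rate(cached_tokens, prompt_tokens)
    return f"Saved {cached_tokens} tokens ({rate:.0%} hit rate)"
```

Guarding against `prompt_tokens == 0` keeps the panel stable on the first request of a session, before any usage metadata has arrived.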
## Non-Functional Requirements
- **Efficiency:** The caching logic itself must not introduce significant overhead or latency.
- **Cost-Effectiveness:** Avoid explicit caching (Gemini) for single-query sessions where storage fees outweigh token savings.
- **Robustness:** Gracefully handle "Cache Not Found" errors or automatic evictions.
## Acceptance Criteria
- [ ] Anthropic requests consistently use 3-4 breakpoints for complex sessions.
- [ ] OpenAI and DeepSeek responses correctly report cached token counts in the Comms Log and GUI.
- [ ] Gemini explicit caches are proactively rebuilt and deleted according to the configured TTL and session lifecycle.
- [ ] The GUI provides clear visibility into saved tokens and hit rates.
- [ ] Users can manually clear or rebuild caches via the settings.
## Out of Scope
- Implementing client-side persistent KV caching (focusing strictly on vendor-side prompt/context caching).
- Caching of image/vision data across different sessions (unless natively supported by the provider's prompt cache).