wip docs

2026-02-22 11:22:08 -05:00
parent da8df7a393
commit 1b71b748db
6 changed files with 31 additions and 21 deletions
--- a/MainContext.md
+++ b/MainContext.md
@@ -87,9 +87,9 @@ Is a local GUI tool for manually curating and sending context to AI APIs. It agg
 - All tool calls (script + result/rejection) are appended to `_tool_log` and displayed in the Tool Calls panel

 **Dynamic file context refresh (ai_client.py):**
- After every tool call round, all project files from `file_items` are re-read from disk via `_reread_file_items()`
- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them
- For Gemini: files are re-read (updating the `file_items` list in place) but cannot be injected into tool results due to Gemini's structured function response format
+- After the last tool call in each round, all project files from `file_items` are re-read from disk via `_reread_file_items()`. The `file_items` variable is reassigned so subsequent rounds see fresh content.
+- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them.
+- For Gemini: refreshed file contents are appended to the last function response's `output` string as a `[SYSTEM: FILES UPDATED]` block. On the next tool round, stale `[FILES UPDATED]` blocks are stripped from history and old tool outputs are truncated to `_history_trunc_limit` characters to control token growth.
 - `_build_file_context_text(file_items)` formats the refreshed files as markdown code blocks (same format as the original context)
 - The `tool_result_send` comms log entry filters out the injected text block (only logs actual `tool_result` entries) to keep the comms panel clean
 - `file_items` flows from `aggregate.build_file_items()` → `gui.py` `self.last_file_items` → `ai_client.send(file_items=...)` → `_send_anthropic(file_items=...)` / `_send_gemini(file_items=...)`
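
The Gemini history pruning described above (strip stale file blocks, then truncate old tool outputs) might look roughly like the sketch below. It assumes the refreshed-files block is always appended at the end of an output string, so everything from the marker onward can be dropped. The helper names and the truncation suffix are hypothetical, not taken from `ai_client.py`.

```python
# Marker text taken from the notes above; its exact formatting in ai_client.py is assumed.
FILES_MARKER = "[SYSTEM: FILES UPDATED]"


def strip_stale_file_blocks(output: str) -> str:
    """Drop an appended refreshed-files block, assuming it runs to the end of the string."""
    idx = output.find(FILES_MARKER)
    if idx == -1:
        return output
    return output[:idx].rstrip()


def truncate_output(output: str, limit: int) -> str:
    """Cap an old tool output at `limit` characters (hypothetical suffix marks the cut)."""
    if len(output) <= limit:
        return output
    return output[:limit] + "\n...[truncated]"


def prune_tool_history(outputs: list[str], limit: int) -> list[str]:
    """Apply both steps to every old tool output before the next round."""
    return [truncate_output(strip_stale_file_blocks(o), limit) for o in outputs]
```

Here `limit` stands in for `_history_trunc_limit`; only outputs from earlier rounds would be passed in, so the freshly appended block on the latest response is left intact.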
@@ -142,9 +142,11 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
 - `close_session()` flushes and closes both file handles; called just before `dpg.destroy_context()`

 **Anthropic prompt caching:**
- System prompt sent as an array with `cache_control: ephemeral` on the text block
- Last tool in `_ANTHROPIC_TOOLS` has `cache_control: ephemeral`; system + tools prefix is cached together after the first request
- First user message content[0] is the `<context>` block with `cache_control: ephemeral`; content[1] is the user question without cache control
+- System prompt + context are combined into one string, chunked into <=120k char blocks, and sent as the `system=` parameter array. Only the LAST chunk gets `cache_control: ephemeral`, so the entire system prefix is cached as one unit.
+- Last tool in `_ANTHROPIC_TOOLS` (`run_powershell`) has `cache_control: ephemeral`; this means the tools prefix is cached together with the system prefix after the first request.
+- The user message is sent as a plain `[{"type": "text", "text": user_message}]` block with NO cache_control. The context lives in `system=`, not in the first user message.
+- The tools list is built once per session via `_get_anthropic_tools()` and reused across all API calls within the tool loop, avoiding redundant Python-side reconstruction.
+- `_strip_cache_controls()` removes stale `cache_control` markers from all history entries before each API call, ensuring only the stable system/tools prefix consumes cache breakpoint slots.
 - Cache stats (creation tokens, read tokens) are surfaced in the comms log usage dict and displayed in the Comms History panel
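
A minimal sketch of the system-prefix chunking and cache-marker stripping described above, under stated assumptions: the separator between prompt and context, the function names, and the default chunk size parameter are illustrative, and the real `_strip_cache_controls()` may differ in detail.

```python
def build_system_blocks(system_prompt: str, context: str,
                        chunk_chars: int = 120_000) -> list[dict]:
    """Combine prompt + context, chunk it, and mark only the last chunk as cacheable."""
    combined = f"{system_prompt}\n\n{context}"  # separator is an assumption
    chunks = [combined[i:i + chunk_chars]
              for i in range(0, len(combined), chunk_chars)]
    blocks = [{"type": "text", "text": chunk} for chunk in chunks]
    # A single breakpoint on the final block caches the whole prefix as one unit.
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks


def strip_cache_controls(messages: list[dict]) -> None:
    """Remove cache_control markers from list-style message content, in place."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    block.pop("cache_control", None)
```

The returned list would be passed as the `system=` argument, while the user turn stays a plain `[{"type": "text", "text": user_message}]` block with no cache marker.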

 **Data flow:**
@@ -190,15 +192,17 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,

 **Known extension points:**
 - Add more providers by adding a section to `credentials.toml`, a `_list_*` and `_send_*` function in `ai_client.py`, and the provider name to the `PROVIDERS` list in `gui.py`
- System prompt support could be added as a field in the project `.toml` and passed in `ai_client.send()`
 - Discussion history excerpts could be individually toggleable for inclusion in the generated md
 - `MAX_TOOL_ROUNDS` in `ai_client.py` caps agentic loops at 10 rounds; adjustable
 - `COMMS_CLAMP_CHARS` in `gui.py` controls the character threshold for clamping heavy payload fields in the Comms History panel
 - Additional project metadata (description, tags, created date) could be added to `[project]` in the per-project toml

 ### Gemini Context Management
- Investigating ways to prevent context duplication in _gemini_chat history, as currently <context>{md_content}</context> is prepended to the user message on every single request, causing history bloat.
- Discussing explicit Gemini Context Caching API (client.caches.create()) to store read-only file context and avoid re-reading files across sessions.
+- Gemini uses explicit caching via `client.caches.create()` to store the `system_instruction` + tools as an immutable cached prefix with a 1-hour TTL. The cache is created once per chat session.
+- When context changes (detected via `md_content` hash), the old cache is deleted, a new cache is created, and chat history is migrated to a fresh chat session pointing at the new cache.
+- If cache creation fails (e.g., content is under the minimum token threshold — 1024 for Flash, 4096 for Pro), the system falls back to inline `system_instruction` in the chat config. Implicit caching may still provide cost savings in this case.
+- The `<context>` block lives inside `system_instruction`, NOT in user messages, preventing history bloat across turns.
+- On cleanup/exit, active caches are deleted via `ai_client.cleanup()` to prevent orphaned billing.
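
The change detection that drives cache rebuilds can be sketched as a small hash tracker. This is only an illustration: the class name and the choice of SHA-256 are assumptions, and the actual cache deletion, re-creation, and history migration are left out.

```python
import hashlib


class ContextCacheTracker:
    """Remembers the hash of the last md_content used to build a cache (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_hash = None

    def needs_rebuild(self, md_content: str) -> bool:
        """Return True when md_content differs from what the current cache was built from."""
        digest = hashlib.sha256(md_content.encode("utf-8")).hexdigest()
        changed = digest != self._last_hash
        self._last_hash = digest
        return changed
```

When `needs_rebuild()` returns `True`, the caller would delete the old cache, create a new one, and migrate chat history as described above; the first call always returns `True` because no cache exists yet.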

 ### Latest Changes
 - Removed `Config` panel from the GUI to streamline per-project configuration.