1 Commit

Author:  ed
SHA1:    1b598972fb
Message: gemini "fixes"
Date:    2026-02-22 11:32:54 -05:00
9 changed files with 216 additions and 551 deletions
+7 -44
@@ -12,16 +12,16 @@ Is a local GUI tool for manually curating and sending context to AI APIs. It agg
- `uv` - package/env management
**Files:**
- `gui.py` - main GUI, `App` class, all panels, all callbacks, confirmation dialog, layout persistence, rich comms rendering; `[+ Maximize]` buttons in `ConfirmDialog` and `win_script_output` now pass text directly as `user_data` / read from `self._last_script` / `self._last_output` instance vars instead of `dpg.get_value(tag)` — fixes glitch when word-wrap is ON or dialog is dismissed before viewer opens
- `ai_client.py` - unified provider wrapper, model listing, session management, send, tool/function-call loop, comms log, provider error classification, token estimation, and aggressive history truncation
- `aggregate.py` - reads config, collects files/screenshots/discussion, builds `file_items` with `mtime` for cache optimization, writes numbered `.md` files to `output_dir` using `build_markdown_from_items` to avoid double I/O; `run()` returns `(markdown_str, path, file_items)` tuple; `summary_only=False` by default (full file contents sent, not heuristic summaries)
- `gui.py` - main GUI, `App` class, all panels, all callbacks, confirmation dialog, layout persistence, rich comms rendering
- `ai_client.py` - unified provider wrapper, model listing, session management, send, tool/function-call loop, comms log, provider error classification
- `aggregate.py` - reads config, collects files/screenshots/discussion, writes numbered `.md` files to `output_dir`
- `shell_runner.py` - subprocess wrapper that runs PowerShell scripts sandboxed to `base_dir`, returns stdout/stderr/exit code as a string
- `session_logger.py` - opens timestamped log files at session start; writes comms entries as JSON-L and tool calls as markdown; saves each AI-generated script as a `.ps1` file
- `project_manager.py` - per-project .toml load/save, entry serialisation (entry_to_str/str_to_entry with @timestamp support), default_project/default_discussion factories, migrate_from_legacy_config, flat_config for aggregate.run(), git helpers (get_git_commit, get_git_log)
- `theme.py` - palette definitions, font loading, scale, load_from_config/save_to_config
- `gemini.py` - legacy standalone Gemini wrapper (not used by the main GUI; superseded by `ai_client.py`)
- `file_cache.py` - stub; Anthropic Files API path removed; kept so stale imports don't break
- `mcp_client.py` - MCP-style tools (read_file, list_directory, search_files, get_file_summary, web_search, fetch_url); allowlist enforced against project file_items + base_dirs for file tools; web tools are unrestricted; dispatched by ai_client tool-use loop for both Anthropic and Gemini
- `mcp_client.py` - MCP-style read-only file tools (read_file, list_directory, search_files, get_file_summary); allowlist enforced against project file_items + base_dirs; dispatched by ai_client tool-use loop for both Anthropic and Gemini
- `summarize.py` - local heuristic summariser (no AI); .py via AST, .toml via regex, .md headings, generic preview; used by mcp_client.get_file_summary and aggregate.build_summary_section
- `config.toml` - global-only settings: [ai] provider+model+system_prompt, [theme] palette+font+scale, [projects] paths array + active path
- `manual_slop.toml` - per-project file: [project] name+git_dir+system_prompt+main_context, [output] namespace+output_dir, [files] base_dir+paths, [screenshots] base_dir+paths, [discussion] roles+active+[discussion.discussions.<name>] git_commit+last_updated+history
@@ -87,7 +87,7 @@ Is a local GUI tool for manually curating and sending context to AI APIs. It agg
- All tool calls (script + result/rejection) are appended to `_tool_log` and displayed in the Tool Calls panel
**Dynamic file context refresh (ai_client.py):**
- After the last tool call in each round, project files from `file_items` are checked via `_reread_file_items()`. It uses `mtime` to only re-read modified files, returning only the `changed` files to build a minimal `[FILES UPDATED]` block.
- After the last tool call in each round, all project files from `file_items` are re-read from disk via `_reread_file_items()`. The `file_items` variable is reassigned so subsequent rounds see fresh content.
- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them.
- For Gemini: refreshed file contents are appended to the last function response's `output` string as a `[SYSTEM: FILES UPDATED]` block. On the next tool round, stale `[FILES UPDATED]` blocks are stripped from history and old tool outputs are truncated to `_history_trunc_limit` characters to control token growth.
- `_build_file_context_text(file_items)` formats the refreshed files as markdown code blocks (same format as the original context)
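A minimal sketch of the mtime-gated variant described in the first bullet (item dict shape from `build_file_items`; exact error handling assumed):

```python
from pathlib import Path

def reread_changed(file_items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (all items refreshed, only the items whose mtime moved)."""
    refreshed, changed = [], []
    for item in file_items:
        path = item.get("path")
        if path is None:                      # unmatched glob entry: nothing to read
            refreshed.append(item)
            continue
        p = Path(path)
        try:
            mtime = p.stat().st_mtime
            if mtime == item.get("mtime", 0.0):
                refreshed.append(item)        # unchanged file: skip the disk read
                continue
            new = {**item, "content": p.read_text(encoding="utf-8"),
                   "error": False, "mtime": mtime}
        except OSError as e:
            new = {**item, "content": f"ERROR re-reading {p}: {e}",
                   "error": True, "mtime": 0.0}
        refreshed.append(new)
        changed.append(new)
    return refreshed, changed
```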
@@ -141,12 +141,10 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
- `log_tool_call(script, result, script_path)` writes the script to `scripts/generated/<ts>_<seq:04d>.ps1` and appends a markdown record to the toolcalls log without the script body (just the file path + result); uses a `threading.Lock` for the sequence counter
- `close_session()` flushes and closes both file handles; called just before `dpg.destroy_context()`
**Anthropic prompt caching & history management:**
**Anthropic prompt caching:**
- System prompt + context are combined into one string, chunked into <=120k char blocks, and sent as the `system=` parameter array. Only the LAST chunk gets `cache_control: ephemeral`, so the entire system prefix is cached as one unit.
- Last tool in `_ANTHROPIC_TOOLS` (`run_powershell`) has `cache_control: ephemeral`; this means the tools prefix is cached together with the system prefix after the first request.
- The user message is sent as a plain `[{"type": "text", "text": user_message}]` block with NO cache_control. The context lives in `system=`, not in the first user message.
- `_add_history_cache_breakpoint` places `cache_control:ephemeral` on the last content block of the second-to-last user message, using the 4th cache breakpoint to cache the conversation history prefix.
- `_trim_anthropic_history` uses token estimation (`_CHARS_PER_TOKEN = 3.5`) to keep the prompt under `_ANTHROPIC_MAX_PROMPT_TOKENS = 180_000`. It strips stale file refreshes from old turns, and drops oldest turn pairs if still over budget.
- The tools list is built once per session via `_get_anthropic_tools()` and reused across all API calls within the tool loop, avoiding redundant Python-side reconstruction.
- `_strip_cache_controls()` removes stale `cache_control` markers from all history entries before each API call, ensuring only the stable system/tools prefix consumes cache breakpoint slots.
- Cache stats (creation tokens, read tokens) are surfaced in the comms log usage dict and displayed in the Comms History panel
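A sketch of the chunking rule those bullets describe (the real helper is `_build_chunked_context_blocks` in ai_client.py; the constant name and body here are assumed, only the 120k-char limit and last-chunk marker come from the docs):

```python
_MAX_SYSTEM_CHUNK = 120_000   # assumed name; docs only state the limit

def build_chunked_context_blocks(system_text: str) -> list[dict]:
    chunks = [system_text[i:i + _MAX_SYSTEM_CHUNK]
              for i in range(0, len(system_text), _MAX_SYSTEM_CHUNK)] or [""]
    blocks = [{"type": "text", "text": c} for c in chunks]
    # Only the LAST chunk is marked ephemeral, so the whole system prefix
    # is cached as a single unit.
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks
```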
@@ -182,15 +180,13 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
**MCP file tools (mcp_client.py + ai_client.py):**
- Four read-only tools exposed to the AI as native function/tool declarations: `read_file`, `list_directory`, `search_files`, `get_file_summary`
- Access control: `mcp_client.configure(file_items, extra_base_dirs)` is called before each send; builds an allowlist of resolved absolute paths from the project's `file_items` plus the `base_dir`; any path that is not explicitly in the list or not under one of the allowed directories returns `ACCESS DENIED`
- `mcp_client.dispatch(tool_name, tool_input)` is the single dispatch entry point used by both Anthropic and Gemini tool-use loops; `TOOL_NAMES` set now includes all six tool names
- `mcp_client.dispatch(tool_name, tool_input)` is the single dispatch entry point used by both Anthropic and Gemini tool-use loops
- Anthropic: MCP tools appear before `run_powershell` in the tools list (no `cache_control` on them; only `run_powershell` carries `cache_control: ephemeral`)
- Gemini: MCP tools are included in the `FunctionDeclaration` list alongside `run_powershell`
- `get_file_summary` uses `summarize.summarise_file()` — same heuristic used for the initial `<context>` block, so the AI gets the same compact structural view it already knows
- `list_directory` sorts dirs before files; shows name, type, and size
- `search_files` uses `Path.glob()` with the caller-supplied pattern (supports `**/*.py` style)
- `read_file` returns raw UTF-8 text; errors (not found, access denied, decode error) are returned as error strings rather than exceptions, so the AI sees them as tool results
- `web_search(query)` queries DuckDuckGo HTML endpoint and returns the top 5 results (title, URL, snippet) as a formatted string; uses a custom `_DDGParser` (HTMLParser subclass)
- `fetch_url(url)` fetches a URL, strips HTML tags/scripts via `_TextExtractor` (HTMLParser subclass), collapses whitespace, and truncates to 40k chars to prevent context blowup; handles DuckDuckGo redirect links automatically
- `summarize.py` heuristics: `.py` → AST imports + ALL_CAPS constants + classes+methods + top-level functions; `.toml` → table headers + top-level keys; `.md` → h1–h3 headings with indentation; all others → line count + first 8 lines preview
- Comms log: MCP tool calls log `OUT/tool_call` with `{"name": ..., "args": {...}}` and `IN/tool_result` with `{"name": ..., "output": ...}`; rendered in the Comms History panel via `_render_payload_tool_call` (shows each arg key/value) and `_render_payload_tool_result` (shows output)
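A sketch of the allowlist check behind `ACCESS DENIED` (the `configure` signature is from the bullets above; the module-level names and `_resolve_and_check` body are assumed):

```python
from pathlib import Path

_ALLOWED_FILES: set[Path] = set()   # assumed names for the allowlist state
_ALLOWED_DIRS: list[Path] = []

def configure(file_items: list[dict], extra_base_dirs: list[str]) -> None:
    """Rebuild the allowlist before each send."""
    _ALLOWED_FILES.clear()
    _ALLOWED_DIRS.clear()
    _ALLOWED_FILES.update(Path(i["path"]).resolve()
                          for i in file_items if i.get("path"))
    _ALLOWED_DIRS.extend(Path(d).resolve() for d in extra_base_dirs)

def _resolve_and_check(raw: str) -> Path | None:
    """Return the resolved path, or None for the caller to map to ACCESS DENIED."""
    p = Path(raw).resolve()
    if p in _ALLOWED_FILES or any(d == p or d in p.parents for d in _ALLOWED_DIRS):
        return p
    return None
```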
@@ -203,9 +199,7 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
### Gemini Context Management
- Gemini uses explicit caching via `client.caches.create()` to store the `system_instruction` + tools as an immutable cached prefix with a 1-hour TTL. The cache is created once per chat session.
- Proactively rebuilds cache at 90% of `_GEMINI_CACHE_TTL = 3600` to avoid stale-reference errors.
- When context changes (detected via `md_content` hash), the old cache is deleted, a new cache is created, and chat history is migrated to a fresh chat session pointing at the new cache.
- Trims history by dropping oldest pairs if input tokens exceed `_GEMINI_MAX_INPUT_TOKENS = 900_000`.
- If cache creation fails (e.g., content is under the minimum token threshold — 1024 for Flash, 4096 for Pro), the system falls back to inline `system_instruction` in the chat config. Implicit caching may still provide cost savings in this case.
- The `<context>` block lives inside `system_instruction`, NOT in user messages, preventing history bloat across turns.
- On cleanup/exit, active caches are deleted via `ai_client.cleanup()` to prevent orphaned billing.
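A minimal google-genai usage sketch of the explicit-cache flow described above (API key, model name, and prompt are placeholders; tools omitted for brevity — the real code also caches the tool declarations):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_KEY")              # placeholder
sys_instr = "combined system prompt\n\n<context>...</context>"

cache = client.caches.create(                           # fails below the minimum
    model="gemini-2.5-flash",                           # token threshold — see above
    config=types.CreateCachedContentConfig(system_instruction=sys_instr,
                                           ttl="3600s"),
)
chat = client.chats.create(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(chat.send_message("ping").text)
```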
@@ -250,34 +244,3 @@ Documentation has been completely rewritten matching the strict, structural form
- `docs/guide_architecture.md`: Details the Python implementation algorithms, queue management for UI rendering, the specific AST heuristics used for context aggregation, and the distinct algorithms for trimming Anthropic history vs Gemini state caching.
- `docs/Readme.md`: The core interface manual.
- `docs/guide_tools.md`: Security architecture for `_is_allowed` paths and definitions of the read-only vs destructive tool pipeline.
## Updates (2026-02-22 — ai_client.py & aggregate.py)
### mcp_client.py — Web Tools Added
- `web_search(query)` and `fetch_url(url)` added as two new MCP tools alongside the existing four file tools.
- `TOOL_NAMES` set updated to include all six tool names for dispatch routing.
- `MCP_TOOL_SPECS` list extended with full JSON schema definitions for both web tools.
- Both tools are declared in `_build_anthropic_tools()` and `_gemini_tool_declaration()` so they are available to both providers.
- Web tools bypass the `_is_allowed` path check (no filesystem access); file tools retain the allowlist enforcement.
### aggregate.py — run() double-I/O elimination
- `run()` now calls `build_file_items()` once, then passes the result to `build_markdown_from_items()` instead of calling `build_files_section()` separately. This avoids reading every file twice per send.
- `build_markdown_from_items()` accepts a `summary_only` flag (default `False`); when `False` it inlines full file content; when `True` it delegates to `summarize.build_summary_markdown()` for compact structural summaries.
- `run()` returns a 3-tuple `(markdown_str, output_path, file_items)` — the `file_items` list is passed through to `gui.py` as `self.last_file_items` for dynamic context refresh after tool calls.
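A sketch of the single-read flow this (since-removed) section described, with the surrounding locals assumed to come from the config as in `run()` itself:

```python
file_items = build_file_items(base_dir, files)          # the only disk pass
markdown = build_markdown_from_items(file_items, screenshot_base_dir,
                                     screenshots, history, summary_only=False)
output_file.write_text(markdown, encoding="utf-8")
# return (markdown, output_file, file_items) — file_items flows to gui.py
# as self.last_file_items for the post-tool-call refresh
```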
## Updates (2026-02-22 — gui.py [+ Maximize] bug fix)
### Problem
Three `[+ Maximize]` buttons were reading their text content via `dpg.get_value(tag)` at click time:
1. `ConfirmDialog.show()` — passed `f"{self._tag}_script"` as `user_data` and called `dpg.get_value(u)` in the lambda. If the dialog was dismissed before the viewer opened, the item no longer existed and the call would fail silently or crash.
2. `win_script_output` Script `[+ Maximize]` — used `user_data="last_script_text"` and `dpg.get_value(u)`. When word-wrap is ON, `last_script_text` is hidden (`show=False`); in some DPG versions `dpg.get_value` on a hidden `input_text` returns `""`.
3. `win_script_output` Output `[+ Maximize]` — same issue with `"last_script_output"`.
### Fix
- `ConfirmDialog.show()`: changed `user_data` to `self._script` (the actual text string captured at button-creation time) and the callback to `lambda s, a, u: _show_text_viewer("Confirm Script", u)`. The text is now baked in at dialog construction, not read from a potentially-deleted widget.
- `App._append_tool_log()`: added `self._last_script = script` and `self._last_output = result` assignments so the latest values are always available as instance state.
- `win_script_output` buttons: both `[+ Maximize]` buttons now use `lambda s, a, u: _show_text_viewer("...", self._last_script/output)` directly, bypassing DPG widget state entirely.
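For reference, a minimal dearpygui sketch of the bake-at-creation pattern this removed section described (`_show_text_viewer` stubbed; the real viewer lives in gui.py):

```python
import dearpygui.dearpygui as dpg

def _show_text_viewer(title: str, text: str) -> None:
    print(title, text)           # stub for the project's global text viewer

dpg.create_context()
script_text = "Get-ChildItem -Recurse"
with dpg.window(label="Confirm"):
    dpg.add_button(
        label="[+ Maximize]",
        user_data=script_text,   # text baked in at creation time, not read
        callback=lambda s, a, u: _show_text_viewer("Confirm Script", u),
    )                            # u is never stale: no widget lookup at click
```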
+17 -53
aggregate.py
@@ -98,28 +98,24 @@ def build_file_items(base_dir: Path, files: list[str]) -> list[dict]:
entry : str (original config entry string)
content : str (file text, or error string)
error : bool
mtime : float (last modification time, for skip-if-unchanged optimization)
"""
items = []
for entry in files:
paths = resolve_paths(base_dir, entry)
if not paths:
items.append({"path": None, "entry": entry, "content": f"ERROR: no files matched: {entry}", "error": True, "mtime": 0.0})
items.append({"path": None, "entry": entry, "content": f"ERROR: no files matched: {entry}", "error": True})
continue
for path in paths:
try:
content = path.read_text(encoding="utf-8")
mtime = path.stat().st_mtime
error = False
except FileNotFoundError:
content = f"ERROR: file not found: {path}"
mtime = 0.0
error = True
except Exception as e:
content = f"ERROR: {e}"
mtime = 0.0
error = True
items.append({"path": path, "entry": entry, "content": content, "error": error, "mtime": mtime})
items.append({"path": path, "entry": entry, "content": content, "error": error})
return items
def build_summary_section(base_dir: Path, files: list[str]) -> str:
@@ -130,43 +126,8 @@ def build_summary_section(base_dir: Path, files: list[str]) -> str:
items = build_file_items(base_dir, files)
return summarize.build_summary_markdown(items)
def _build_files_section_from_items(file_items: list[dict]) -> str:
"""Build the files markdown section from pre-read file items (avoids double I/O)."""
sections = []
for item in file_items:
path = item.get("path")
entry = item.get("entry", "unknown")
content = item.get("content", "")
if path is None:
sections.append(f"### `{entry}`\n\n```text\n{content}\n```")
continue
suffix = path.suffix.lstrip(".") if hasattr(path, "suffix") else "text"
lang = suffix if suffix else "text"
original = entry if "*" not in entry else str(path)
sections.append(f"### `{original}`\n\n```{lang}\n{content}\n```")
return "\n\n---\n\n".join(sections)
def build_markdown_from_items(file_items: list[dict], screenshot_base_dir: Path, screenshots: list[str], history: list[str], summary_only: bool = False) -> str:
"""Build markdown from pre-read file items instead of re-reading from disk."""
def build_static_markdown(base_dir: Path, files: list[str], screenshot_base_dir: Path, screenshots: list[str], summary_only: bool = False) -> str:
parts = []
# STATIC PREFIX: Files and Screenshots must go first to maximize Cache Hits
if file_items:
if summary_only:
parts.append("## Files (Summary)\n\n" + summarize.build_summary_markdown(file_items))
else:
parts.append("## Files\n\n" + _build_files_section_from_items(file_items))
if screenshots:
parts.append("## Screenshots\n\n" + build_screenshots_section(screenshot_base_dir, screenshots))
# DYNAMIC SUFFIX: History changes every turn, must go last
if history:
parts.append("## Discussion History\n\n" + build_discussion_section(history))
return "\n\n---\n\n".join(parts)
def build_markdown(base_dir: Path, files: list[str], screenshot_base_dir: Path, screenshots: list[str], history: list[str], summary_only: bool = False) -> str:
parts = []
# STATIC PREFIX: Files and Screenshots must go first to maximize Cache Hits
if files:
if summary_only:
parts.append("## Files (Summary)\n\n" + build_summary_section(base_dir, files))
@@ -174,12 +135,12 @@ def build_markdown(base_dir: Path, files: list[str], screenshot_base_dir: Path,
parts.append("## Files\n\n" + build_files_section(base_dir, files))
if screenshots:
parts.append("## Screenshots\n\n" + build_screenshots_section(screenshot_base_dir, screenshots))
# DYNAMIC SUFFIX: History changes every turn, must go last
if history:
parts.append("## Discussion History\n\n" + build_discussion_section(history))
return "\n\n---\n\n".join(parts)
return "\n\n---\n\n".join(parts) if parts else ""
def run(config: dict) -> tuple[str, Path, list[dict]]:
def build_dynamic_markdown(history: list[str]) -> str:
return "## Discussion History\n\n" + build_discussion_section(history) if history else ""
def run(config: dict) -> tuple[str, str, Path, list[dict]]:
namespace = config.get("project", {}).get("name")
if not namespace:
namespace = config.get("output", {}).get("namespace", "project")
@@ -193,18 +154,21 @@ def run(config: dict) -> tuple[str, Path, list[dict]]:
output_dir.mkdir(parents=True, exist_ok=True)
increment = find_next_increment(output_dir, namespace)
output_file = output_dir / f"{namespace}_{increment:03d}.md"
# Build file items once, then construct markdown from them (avoids double I/O)
file_items = build_file_items(base_dir, files)
markdown = build_markdown_from_items(file_items, screenshot_base_dir, screenshots, history,
summary_only=False)
static_md = build_static_markdown(base_dir, files, screenshot_base_dir, screenshots, summary_only=False)
dynamic_md = build_dynamic_markdown(history)
markdown = f"{static_md}\n\n---\n\n{dynamic_md}" if static_md and dynamic_md else static_md or dynamic_md
output_file.write_text(markdown, encoding="utf-8")
return markdown, output_file, file_items
file_items = build_file_items(base_dir, files)
return static_md, dynamic_md, output_file, file_items
def main():
with open("config.toml", "rb") as f:
import tomllib
config = tomllib.load(f)
markdown, output_file, _ = run(config)
static_md, dynamic_md, output_file, _ = run(config)
print(f"Written: {output_file}")
if __name__ == "__main__":
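`find_next_increment` is referenced above but not shown in this hunk; a plausible sketch under the `<namespace>_NNN.md` naming scheme (body assumed):

```python
import re
from pathlib import Path

def find_next_increment(output_dir: Path, namespace: str) -> int:
    """Scan for '<namespace>_NNN.md' and return the next free number."""
    pat = re.compile(re.escape(namespace) + r"_(\d+)\.md$")
    nums = [int(m.group(1)) for p in output_dir.glob(f"{namespace}_*.md")
            if (m := pat.match(p.name))]
    return max(nums, default=0) + 1
```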
+120 -367
ai_client.py
@@ -13,7 +13,6 @@ during chat creation to avoid massive history bloat.
# ai_client.py
import tomllib
import json
import time
import datetime
from pathlib import Path
import file_cache
@@ -35,12 +34,6 @@ def set_model_params(temp: float, max_tok: int, trunc_limit: int = 8000):
_gemini_client = None
_gemini_chat = None
_gemini_cache = None
_gemini_cache_md_hash: int | None = None
_gemini_cache_created_at: float | None = None
# Gemini cache TTL in seconds. Caches are created with this TTL and
# proactively rebuilt at 90% of this value to avoid stale-reference errors.
_GEMINI_CACHE_TTL = 3600
_anthropic_client = None
_anthropic_history: list[dict] = []
@@ -223,7 +216,6 @@ def cleanup():
def reset_session():
global _gemini_client, _gemini_chat, _gemini_cache
global _gemini_cache_md_hash, _gemini_cache_created_at
global _anthropic_client, _anthropic_history
global _CACHED_ANTHROPIC_TOOLS
if _gemini_client and _gemini_cache:
@@ -234,8 +226,6 @@ def reset_session():
_gemini_client = None
_gemini_chat = None
_gemini_cache = None
_gemini_cache_md_hash = None
_gemini_cache_created_at = None
_anthropic_client = None
_anthropic_history = []
_CACHED_ANTHROPIC_TOOLS = None
@@ -393,15 +383,12 @@ def _run_script(script: str, base_dir: str) -> str:
# ------------------------------------------------------------------ dynamic file context refresh
def _reread_file_items(file_items: list[dict]) -> tuple[list[dict], list[dict]]:
def _reread_file_items(file_items: list[dict]) -> list[dict]:
"""
Re-read file_items from disk, but only files whose mtime has changed.
Returns (all_items, changed_items) — all_items is the full refreshed list,
changed_items contains only the files that were actually modified since
the last read (used to build a minimal [FILES UPDATED] block).
Re-read every file in file_items from disk, returning a fresh list.
This is called after tool calls so the AI sees updated file contents.
"""
refreshed = []
changed = []
for item in file_items:
path = item.get("path")
if path is None:
@@ -410,20 +397,11 @@ def _reread_file_items(file_items: list[dict]) -> tuple[list[dict], list[dict]]:
from pathlib import Path as _P
p = _P(path) if not isinstance(path, _P) else path
try:
current_mtime = p.stat().st_mtime
prev_mtime = item.get("mtime", 0.0)
if current_mtime == prev_mtime:
refreshed.append(item) # unchanged — skip re-read
continue
content = p.read_text(encoding="utf-8")
new_item = {**item, "content": content, "error": False, "mtime": current_mtime}
refreshed.append(new_item)
changed.append(new_item)
refreshed.append({**item, "content": content, "error": False})
except Exception as e:
err_item = {**item, "content": f"ERROR re-reading {p}: {e}", "error": True, "mtime": 0.0}
refreshed.append(err_item)
changed.append(err_item)
return refreshed, changed
refreshed.append({**item, "content": f"ERROR re-reading {p}: {e}", "error": True})
return refreshed
def _build_file_context_text(file_items: list[dict]) -> str:
@@ -475,110 +453,66 @@ def _ensure_gemini_client():
_gemini_client = genai.Client(api_key=creds["gemini"]["api_key"])
def _get_gemini_history_list(chat):
if not chat: return []
# google-genai SDK stores the mutable list in _history
if hasattr(chat, "_history"):
return chat._history
if hasattr(chat, "history"):
return chat.history
if hasattr(chat, "get_history"):
return chat.get_history()
return []
def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items: list[dict] | None = None) -> str:
global _gemini_chat, _gemini_cache, _gemini_cache_md_hash, _gemini_cache_created_at
def _send_gemini(static_md: str, dynamic_md: str, user_message: str, base_dir: str, file_items: list[dict] | None = None) -> str:
global _gemini_chat, _gemini_cache
from google.genai import types
try:
_ensure_gemini_client(); mcp_client.configure(file_items or [], [base_dir])
sys_instr = f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"
sys_instr = f"{_get_combined_system_prompt()}\n\n<context>\n{static_md}\n</context>"
tools_decl = [_gemini_tool_declaration()]
# DYNAMIC CONTEXT: Check if files/context changed mid-session
current_md_hash = hash(md_content)
current_md_hash = hash(static_md)
old_history = None
if _gemini_chat and _gemini_cache_md_hash != current_md_hash:
old_history = list(_get_gemini_history_list(_gemini_chat)) if _get_gemini_history_list(_gemini_chat) else []
if _gemini_chat and getattr(_gemini_chat, "_last_md_hash", None) != current_md_hash:
old_history = list(_gemini_chat.history) if _gemini_chat.history else []
if _gemini_cache:
try: _gemini_client.caches.delete(name=_gemini_cache.name)
except: pass
_gemini_chat = None
_gemini_cache = None
_gemini_cache_created_at = None
_append_comms("OUT", "request", {"message": "[CONTEXT CHANGED] Rebuilding cache and chat session..."})
# CACHE TTL: Proactively rebuild before the cache expires server-side.
# If we don't, send_message() will reference a deleted cache and fail.
if _gemini_chat and _gemini_cache and _gemini_cache_created_at:
elapsed = time.time() - _gemini_cache_created_at
if elapsed > _GEMINI_CACHE_TTL * 0.9:
old_history = list(_get_gemini_history_list(_gemini_chat)) if _get_gemini_history_list(_gemini_chat) else []
try: _gemini_client.caches.delete(name=_gemini_cache.name)
except: pass
_gemini_chat = None
_gemini_cache = None
_gemini_cache_created_at = None
_append_comms("OUT", "request", {"message": f"[CACHE TTL] Rebuilding cache (expired after {int(elapsed)}s)..."})
_gemini_chat, _gemini_cache = None, None
_append_comms("OUT", "request", {"message": "[STATIC CONTEXT CHANGED] Rebuilding cache and chat session..."})
if not _gemini_chat:
chat_config = types.GenerateContentConfig(
system_instruction=sys_instr,
tools=tools_decl,
temperature=_temperature,
max_output_tokens=_max_tokens,
system_instruction=sys_instr, tools=tools_decl, temperature=_temperature, max_output_tokens=_max_tokens,
safety_settings=[types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="BLOCK_ONLY_HIGH")]
)
try:
# Gemini requires 1024 (Flash) or 4096 (Pro) tokens to cache.
_gemini_cache = _gemini_client.caches.create(
model=_model,
config=types.CreateCachedContentConfig(
system_instruction=sys_instr,
tools=tools_decl,
ttl=f"{_GEMINI_CACHE_TTL}s",
)
)
_gemini_cache_created_at = time.time()
_gemini_cache = _gemini_client.caches.create(model=_model, config=types.CreateCachedContentConfig(system_instruction=sys_instr, tools=tools_decl, ttl="3600s"))
chat_config = types.GenerateContentConfig(
cached_content=_gemini_cache.name,
temperature=_temperature,
max_output_tokens=_max_tokens,
cached_content=_gemini_cache.name, temperature=_temperature, max_output_tokens=_max_tokens,
safety_settings=[types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="BLOCK_ONLY_HIGH")]
)
_append_comms("OUT", "request", {"message": f"[CACHE CREATED] {_gemini_cache.name}"})
except Exception as e:
_gemini_cache = None
_gemini_cache_created_at = None
_append_comms("OUT", "request", {"message": f"[CACHE FAILED] {type(e).__name__}: {e} — falling back to inline system_instruction"})
except Exception: _gemini_cache = None
kwargs = {"model": _model, "config": chat_config}
if old_history:
kwargs["history"] = old_history
if old_history: kwargs["history"] = old_history
_gemini_chat = _gemini_client.chats.create(**kwargs)
_gemini_cache_md_hash = current_md_hash
_append_comms("OUT", "request", {"message": f"[ctx {len(md_content)} + msg {len(user_message)}]"})
payload, all_text = user_message, []
_gemini_chat._last_md_hash = current_md_hash
# Strip stale file refreshes and truncate old tool outputs ONCE before
# entering the tool loop (not per-round — history entries don't change).
if _gemini_chat and _get_gemini_history_list(_gemini_chat):
for msg in _get_gemini_history_list(_gemini_chat):
import re
if _gemini_chat and _gemini_chat.history:
for msg in _gemini_chat.history:
if msg.role == "user" and hasattr(msg, "parts"):
for p in msg.parts:
if hasattr(p, "text") and p.text and "<discussion>" in p.text:
p.text = re.sub(r"<discussion>.*?</discussion>\n\n", "", p.text, flags=re.DOTALL)
if hasattr(p, "function_response") and p.function_response and hasattr(p.function_response, "response"):
r = p.function_response.response
if isinstance(r, dict) and "output" in r:
val = r["output"]
if isinstance(val, str):
if "[SYSTEM: FILES UPDATED]" in val:
val = val.split("[SYSTEM: FILES UPDATED]")[0].strip()
if _history_trunc_limit > 0 and len(val) > _history_trunc_limit:
val = val[:_history_trunc_limit] + "\n\n... [TRUNCATED BY SYSTEM TO SAVE TOKENS.]"
r["output"] = val
r_dict = r if isinstance(r, dict) else getattr(r, "__dict__", {})
val = r_dict.get("output") if isinstance(r_dict, dict) else getattr(r, "output", None)
if isinstance(val, str):
if "[SYSTEM: FILES UPDATED]" in val: val = val.split("[SYSTEM: FILES UPDATED]")[0].strip()
if _history_trunc_limit > 0 and len(val) > _history_trunc_limit:
val = val[:_history_trunc_limit] + "\n\n... [TRUNCATED BY SYSTEM TO SAVE TOKENS.]"
if isinstance(r, dict): r["output"] = val
else: setattr(r, "output", val)
full_user_msg = f"<discussion>\n{dynamic_md}\n</discussion>\n\n{user_message}" if dynamic_md else user_message
_append_comms("OUT", "request", {"message": f"[ctx {len(static_md)} static + {len(dynamic_md)} dynamic + msg {len(user_message)}]"})
payload, all_text = full_user_msg, []
for r_idx in range(MAX_TOOL_ROUNDS + 2):
resp = _gemini_chat.send_message(payload)
txt = "\n".join(p.text for c in resp.candidates if getattr(c, "content", None) for p in c.content.parts if hasattr(p, "text") and p.text)
@@ -587,34 +521,27 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
calls = [p.function_call for c in resp.candidates if getattr(c, "content", None) for p in c.content.parts if hasattr(p, "function_call") and p.function_call]
usage = {"input_tokens": getattr(resp.usage_metadata, "prompt_token_count", 0), "output_tokens": getattr(resp.usage_metadata, "candidates_token_count", 0)}
cached_tokens = getattr(resp.usage_metadata, "cached_content_token_count", None)
if cached_tokens:
usage["cache_read_input_tokens"] = cached_tokens
if cached_tokens: usage["cache_read_input_tokens"] = cached_tokens
reason = resp.candidates[0].finish_reason.name if resp.candidates and hasattr(resp.candidates[0], "finish_reason") else "STOP"
_append_comms("IN", "response", {"round": r_idx, "stop_reason": reason, "text": txt, "tool_calls": [{"name": c.name, "args": dict(c.args)} for c in calls], "usage": usage})
# Guard: if Gemini reports input tokens approaching the limit, drop oldest history pairs
total_in = usage.get("input_tokens", 0)
if total_in > _GEMINI_MAX_INPUT_TOKENS and _gemini_chat and _get_gemini_history_list(_gemini_chat):
hist = _get_gemini_history_list(_gemini_chat)
if total_in > _GEMINI_MAX_INPUT_TOKENS and _gemini_chat and _gemini_chat.history:
hist = list(_gemini_chat.history)
dropped = 0
# Drop oldest pairs (user+model) but keep at least the last 2 entries
while len(hist) > 4 and total_in > _GEMINI_MAX_INPUT_TOKENS * 0.7:
# Drop in pairs (user + model) to maintain alternating roles required by Gemini
saved = 0
for _ in range(2):
if not hist: break
for p in hist[0].parts:
if hasattr(p, "text") and p.text:
saved += len(p.text) // 4
elif hasattr(p, "function_response") and p.function_response:
r = getattr(p.function_response, "response", {})
if isinstance(r, dict):
saved += len(str(r.get("output", ""))) // 4
hist.pop(0)
dropped += 1
total_in -= max(saved, 200)
saved = sum(len(p.text)//4 for p in hist[0].parts if hasattr(p, "text") and p.text)
for p in hist[0].parts:
if hasattr(p, "function_response") and p.function_response:
r = getattr(p.function_response, "response", {})
val = r.get("output", "") if isinstance(r, dict) else getattr(r, "output", "")
saved += len(str(val)) // 4
hist.pop(0)
total_in -= max(saved, 100)
dropped += 1
if dropped > 0:
_gemini_chat.history = hist
_append_comms("OUT", "request", {"message": f"[GEMINI HISTORY TRIMMED: dropped {dropped} old entries to stay within token budget]"})
if not calls or r_idx > MAX_TOOL_ROUNDS: break
@@ -633,12 +560,11 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
if i == len(calls) - 1:
if file_items:
file_items, changed = _reread_file_items(file_items)
ctx = _build_file_context_text(changed)
if ctx:
out += f"\n\n[SYSTEM: FILES UPDATED]\n\n{ctx}"
file_items = _reread_file_items(file_items)
ctx = _build_file_context_text(file_items)
if ctx: out += f"\n\n[SYSTEM: FILES UPDATED]\n\n{ctx}"
if r_idx == MAX_TOOL_ROUNDS: out += "\n\n[SYSTEM: MAX ROUNDS. PROVIDE FINAL ANSWER.]"
f_resps.append(types.Part.from_function_response(name=name, response={"output": out}))
log.append({"tool_use_id": name, "content": out})
@@ -670,15 +596,7 @@ _FILE_REFRESH_MARKER = "[FILES UPDATED"
def _estimate_message_tokens(msg: dict) -> int:
"""
Rough token estimate for a single Anthropic message dict.
Caches the result on the dict as '_est_tokens' so repeated calls
(e.g., from _trim_anthropic_history) don't re-scan unchanged messages.
Call _invalidate_token_estimate() when a message's content is modified.
"""
cached = msg.get("_est_tokens")
if cached is not None:
return cached
"""Rough token estimate for a single Anthropic message dict."""
total_chars = 0
content = msg.get("content", "")
if isinstance(content, str):
@@ -696,14 +614,7 @@ def _estimate_message_tokens(msg: dict) -> int:
total_chars += len(_json.dumps(inp, ensure_ascii=False))
elif isinstance(block, str):
total_chars += len(block)
est = max(1, int(total_chars / _CHARS_PER_TOKEN))
msg["_est_tokens"] = est
return est
def _invalidate_token_estimate(msg: dict):
"""Remove the cached token estimate so the next call recalculates."""
msg.pop("_est_tokens", None)
return max(1, int(total_chars / _CHARS_PER_TOKEN))
def _estimate_prompt_tokens(system_blocks: list[dict], history: list[dict]) -> int:
@@ -715,86 +626,48 @@ def _estimate_prompt_tokens(system_blocks: list[dict], history: list[dict]) -> i
total += max(1, int(len(text) / _CHARS_PER_TOKEN))
# Tool definitions (rough fixed estimate — they're ~2k tokens for our set)
total += 2500
# History messages (uses cached estimates for unchanged messages)
# History messages
for msg in history:
total += _estimate_message_tokens(msg)
return total
def _strip_stale_file_refreshes(history: list[dict]):
"""
Remove [FILES UPDATED ...] text blocks from all history turns EXCEPT
the very last user message. These are stale snapshots from previous
tool rounds that bloat the context without providing value.
"""
if len(history) < 2:
return
# Find the index of the last user message — we keep its file refresh intact
last_user_idx = -1
for i in range(len(history) - 1, -1, -1):
if history[i].get("role") == "user":
last_user_idx = i
break
last_user_idx = next((i for i in range(len(history)-1, -1, -1) if history[i].get("role") == "user"), -1)
for i, msg in enumerate(history):
if msg.get("role") != "user" or i == last_user_idx:
continue
content = msg.get("content")
if not isinstance(content, list):
continue
cleaned = []
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
if text.startswith(_FILE_REFRESH_MARKER):
continue # drop this stale file refresh block
cleaned.append(block)
cleaned = [b for b in content if not (isinstance(b, dict) and b.get("type") == "text" and b.get("text", "").startswith(_FILE_REFRESH_MARKER))]
if len(cleaned) < len(content):
msg["content"] = cleaned
_invalidate_token_estimate(msg)
def _trim_anthropic_history(system_blocks: list[dict], history: list[dict]):
"""
Trim the Anthropic history to fit within the token budget.
Strategy:
1. Strip stale file-refresh injections from old turns.
2. If still over budget, drop oldest turn pairs (user + assistant).
Returns the number of messages dropped.
"""
# Phase 1: strip stale file refreshes
def _trim_anthropic_history(system_blocks: list[dict], history: list[dict]) -> int:
_strip_stale_file_refreshes(history)
est = _estimate_prompt_tokens(system_blocks, history)
if est <= _ANTHROPIC_MAX_PROMPT_TOKENS:
return 0
# Phase 2: drop oldest turn pairs until within budget
dropped = 0
while len(history) > 3 and est > _ANTHROPIC_MAX_PROMPT_TOKENS:
# Protect history[0] (original user prompt). Drop from history[1] (assistant) and history[2] (user)
if history[1].get("role") == "assistant" and len(history) > 2 and history[2].get("role") == "user":
removed_asst = history.pop(1)
removed_user = history.pop(1)
est -= _estimate_message_tokens(history.pop(1))
est -= _estimate_message_tokens(history.pop(1))
dropped += 2
est -= _estimate_message_tokens(removed_asst)
est -= _estimate_message_tokens(removed_user)
# Also drop dangling tool_results if the next message is an assistant and the removed user was just tool results
while len(history) > 2 and history[1].get("role") == "assistant" and history[2].get("role") == "user":
content = history[2].get("content", [])
if isinstance(content, list) and content and isinstance(content[0], dict) and content[0].get("type") == "tool_result":
r_a = history.pop(1)
r_u = history.pop(1)
c = history[2].get("content", [])
if isinstance(c, list) and c and isinstance(c[0], dict) and c[0].get("type") == "tool_result":
est -= _estimate_message_tokens(history.pop(1))
est -= _estimate_message_tokens(history.pop(1))
dropped += 2
est -= _estimate_message_tokens(r_a)
est -= _estimate_message_tokens(r_u)
else:
break
else: break
else:
# Edge case fallback: drop index 1 (protecting index 0)
removed = history.pop(1)
est -= _estimate_message_tokens(history.pop(1))
dropped += 1
est -= _estimate_message_tokens(removed)
return dropped
@@ -842,28 +715,6 @@ def _strip_cache_controls(history: list[dict]):
if isinstance(block, dict):
block.pop("cache_control", None)
def _add_history_cache_breakpoint(history: list[dict]):
"""
Place cache_control:ephemeral on the last content block of the
second-to-last user message. This uses one of the 4 allowed Anthropic
cache breakpoints to cache the conversation prefix so the full history
isn't reprocessed on every request.
"""
user_indices = [i for i, m in enumerate(history) if m.get("role") == "user"]
if len(user_indices) < 2:
return # Only one user message (the current turn) — nothing stable to cache
target_idx = user_indices[-2]
content = history[target_idx].get("content")
if isinstance(content, list) and content:
last_block = content[-1]
if isinstance(last_block, dict):
last_block["cache_control"] = {"type": "ephemeral"}
elif isinstance(content, str):
history[target_idx]["content"] = [
{"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
]
def _repair_anthropic_history(history: list[dict]):
"""
If history ends with an assistant message that contains tool_use blocks
@@ -896,217 +747,119 @@ def _repair_anthropic_history(history: list[dict]):
})
def _send_anthropic(md_content: str, user_message: str, base_dir: str, file_items: list[dict] | None = None) -> str:
def _send_anthropic(static_md: str, dynamic_md: str, user_message: str, base_dir: str, file_items: list[dict] | None = None) -> str:
try:
_ensure_anthropic_client()
mcp_client.configure(file_items or [], [base_dir])
# Split system into two cache breakpoints:
# 1. Stable system prompt (never changes — always a cache hit)
# 2. Dynamic file context (invalidated only when files change)
stable_prompt = _get_combined_system_prompt()
stable_blocks = [{"type": "text", "text": stable_prompt, "cache_control": {"type": "ephemeral"}}]
context_text = f"\n\n<context>\n{md_content}\n</context>"
context_blocks = _build_chunked_context_blocks(context_text)
system_blocks = stable_blocks + context_blocks
system_text = _get_combined_system_prompt() + f"\n\n<context>\n{static_md}\n</context>"
system_blocks = _build_chunked_context_blocks(system_text)
if dynamic_md:
system_blocks.append({"type": "text", "text": f"<discussion>\n{dynamic_md}\n</discussion>"})
user_content = [{"type": "text", "text": user_message}]
# COMPRESS HISTORY: Truncate massive tool outputs from previous turns
for msg in _anthropic_history:
if msg.get("role") == "user" and isinstance(msg.get("content"), list):
modified = False
for block in msg["content"]:
if isinstance(block, dict) and block.get("type") == "tool_result":
t_content = block.get("content", "")
if _history_trunc_limit > 0 and isinstance(t_content, str) and len(t_content) > _history_trunc_limit:
block["content"] = t_content[:_history_trunc_limit] + "\n\n... [TRUNCATED BY SYSTEM TO SAVE TOKENS. Original output was too large.]"
modified = True
if modified:
_invalidate_token_estimate(msg)
_strip_cache_controls(_anthropic_history)
_repair_anthropic_history(_anthropic_history)
user_content[-1]["cache_control"] = {"type": "ephemeral"}
_anthropic_history.append({"role": "user", "content": user_content})
# Use the 4th cache breakpoint to cache the conversation history prefix.
# This is placed on the second-to-last user message (the last stable one).
_add_history_cache_breakpoint(_anthropic_history)
n_chunks = len(system_blocks)
_append_comms("OUT", "request", {
"message": (
f"[system {n_chunks} chunk(s), {len(md_content)} chars context] "
f"{user_message[:200]}{'...' if len(user_message) > 200 else ''}"
),
"message": (f"[system {n_chunks} chunk(s), {len(static_md)} static + {len(dynamic_md)} dynamic chars context] "
f"{user_message[:200]}{'...' if len(user_message) > 200 else ''}"),
})
all_text_parts = []
# We allow MAX_TOOL_ROUNDS, plus 1 final loop to get the text synthesis
for round_idx in range(MAX_TOOL_ROUNDS + 2):
# Trim history to fit within token budget before each API call
dropped = _trim_anthropic_history(system_blocks, _anthropic_history)
if dropped > 0:
est_tokens = _estimate_prompt_tokens(system_blocks, _anthropic_history)
_append_comms("OUT", "request", {
"message": (
f"[HISTORY TRIMMED: dropped {dropped} old messages to fit token budget. "
f"Estimated {est_tokens} tokens remaining. {len(_anthropic_history)} messages in history.]"
),
})
def _strip_private_keys(history):
return [{k: v for k, v in m.items() if not k.startswith("_")} for m in history]
_append_comms("OUT", "request", {"message": f"[HISTORY TRIMMED: dropped {dropped} old messages to fit token budget. Estimated {est_tokens} tokens remaining.]"})
response = _anthropic_client.messages.create(
model=_model,
max_tokens=_max_tokens,
temperature=_temperature,
system=system_blocks,
tools=_get_anthropic_tools(),
messages=_strip_private_keys(_anthropic_history),
model=_model, max_tokens=_max_tokens, temperature=_temperature,
system=system_blocks, tools=_get_anthropic_tools(), messages=_anthropic_history,
)
# Convert SDK content block objects to plain dicts before storing in history
serialised_content = [_content_block_to_dict(b) for b in response.content]
_anthropic_history.append({
"role": "assistant",
"content": serialised_content,
})
_anthropic_history.append({"role": "assistant", "content": serialised_content})
text_blocks = [b.text for b in response.content if hasattr(b, "text") and b.text]
if text_blocks:
all_text_parts.append("\n".join(text_blocks))
if text_blocks: all_text_parts.append("\n".join(text_blocks))
tool_use_blocks = [
{"id": b.id, "name": b.name, "input": b.input}
for b in response.content
if getattr(b, "type", None) == "tool_use"
]
tool_use_blocks = [{"id": b.id, "name": b.name, "input": b.input} for b in response.content if getattr(b, "type", None) == "tool_use"]
usage_dict: dict = {}
usage_dict = {}
if response.usage:
usage_dict["input_tokens"] = response.usage.input_tokens
usage_dict["output_tokens"] = response.usage.output_tokens
cache_creation = getattr(response.usage, "cache_creation_input_tokens", None)
cache_read = getattr(response.usage, "cache_read_input_tokens", None)
if cache_creation is not None:
usage_dict["cache_creation_input_tokens"] = cache_creation
if cache_read is not None:
usage_dict["cache_read_input_tokens"] = cache_read
usage_dict.update({"input_tokens": response.usage.input_tokens, "output_tokens": response.usage.output_tokens})
if getattr(response.usage, "cache_creation_input_tokens", None) is not None:
usage_dict["cache_creation_input_tokens"] = response.usage.cache_creation_input_tokens
if getattr(response.usage, "cache_read_input_tokens", None) is not None:
usage_dict["cache_read_input_tokens"] = response.usage.cache_read_input_tokens
_append_comms("IN", "response", {
"round": round_idx,
"stop_reason": response.stop_reason,
"text": "\n".join(text_blocks),
"tool_calls": tool_use_blocks,
"usage": usage_dict,
})
_append_comms("IN", "response", {"round": round_idx, "stop_reason": response.stop_reason, "text": "\n".join(text_blocks), "tool_calls": tool_use_blocks, "usage": usage_dict})
if response.stop_reason != "tool_use" or not tool_use_blocks:
break
if round_idx > MAX_TOOL_ROUNDS:
# The model ignored the MAX ROUNDS warning and kept calling tools.
# Force abort to prevent infinite loop.
break
if response.stop_reason != "tool_use" or not tool_use_blocks: break
if round_idx > MAX_TOOL_ROUNDS: break
tool_results = []
for block in response.content:
if getattr(block, "type", None) != "tool_use":
continue
b_name = getattr(block, "name", None)
b_id = getattr(block, "id", "")
b_input = getattr(block, "input", {})
if getattr(block, "type", None) != "tool_use": continue
b_name, b_id, b_input = getattr(block, "name", None), getattr(block, "id", ""), getattr(block, "input", {})
if b_name in mcp_client.TOOL_NAMES:
_append_comms("OUT", "tool_call", {"name": b_name, "id": b_id, "args": b_input})
output = mcp_client.dispatch(b_name, b_input)
_append_comms("IN", "tool_result", {"name": b_name, "id": b_id, "output": output})
tool_results.append({
"type": "tool_result",
"tool_use_id": b_id,
"content": output,
})
out = mcp_client.dispatch(b_name, b_input)
elif b_name == TOOL_NAME:
script = b_input.get("script", "")
_append_comms("OUT", "tool_call", {
"name": TOOL_NAME,
"id": b_id,
"script": script,
})
output = _run_script(script, base_dir)
_append_comms("IN", "tool_result", {
"name": TOOL_NAME,
"id": b_id,
"output": output,
})
tool_results.append({
"type": "tool_result",
"tool_use_id": b_id,
"content": output,
})
scr = b_input.get("script", "")
_append_comms("OUT", "tool_call", {"name": TOOL_NAME, "id": b_id, "script": scr})
out = _run_script(scr, base_dir)
else: out = f"ERROR: unknown tool '{b_name}'"
_append_comms("IN", "tool_result", {"name": b_name, "id": b_id, "output": out})
tool_results.append({"type": "tool_result", "tool_use_id": b_id, "content": out})
# Refresh file context after tool calls — only inject CHANGED files
if file_items:
file_items, changed = _reread_file_items(file_items)
refreshed_ctx = _build_file_context_text(changed)
file_items = _reread_file_items(file_items)
refreshed_ctx = _build_file_context_text(file_items)
if refreshed_ctx:
tool_results.append({
"type": "text",
"text": (
"[FILES UPDATED — current contents below. "
"Do NOT re-read these files with PowerShell.]\n\n"
+ refreshed_ctx
),
})
tool_results.append({"type": "text", "text": f"[{_FILE_REFRESH_MARKER} — current contents below. Do NOT re-read these files with PowerShell.]\n\n{refreshed_ctx}"})
if round_idx == MAX_TOOL_ROUNDS:
tool_results.append({
"type": "text",
"text": "SYSTEM WARNING: MAX TOOL ROUNDS REACHED. YOU MUST PROVIDE YOUR FINAL ANSWER NOW WITHOUT CALLING ANY MORE TOOLS."
})
tool_results.append({"type": "text", "text": "SYSTEM WARNING: MAX TOOL ROUNDS REACHED. YOU MUST PROVIDE YOUR FINAL ANSWER NOW WITHOUT CALLING ANY MORE TOOLS."})
_anthropic_history.append({
"role": "user",
"content": tool_results,
})
_append_comms("OUT", "tool_result_send", {
"results": [
{"tool_use_id": r["tool_use_id"], "content": r["content"]}
for r in tool_results if r.get("type") == "tool_result"
],
})
_anthropic_history.append({"role": "user", "content": tool_results})
_append_comms("OUT", "tool_result_send", {"results": [{"tool_use_id": r["tool_use_id"], "content": r["content"]} for r in tool_results if r.get("type") == "tool_result"]})
final_text = "\n\n".join(all_text_parts)
return final_text if final_text.strip() else "(No text returned by the model)"
except ProviderError:
raise
except Exception as exc:
raise _classify_anthropic_error(exc) from exc
except ProviderError: raise
except Exception as exc: raise _classify_anthropic_error(exc) from exc
# ------------------------------------------------------------------ unified send
def send(
md_content: str,
static_md: str,
dynamic_md: str,
user_message: str,
base_dir: str = ".",
file_items: list[dict] | None = None,
) -> str:
"""
Send a message to the active provider.
md_content : aggregated markdown string from aggregate.run()
user_message: the user question / instruction
base_dir : project base directory (for PowerShell tool calls)
file_items : list of file dicts from aggregate.build_file_items() for
dynamic context refresh after tool calls
"""
"""Send a message to the active provider."""
if _provider == "gemini":
return _send_gemini(md_content, user_message, base_dir, file_items)
return _send_gemini(static_md, dynamic_md, user_message, base_dir, file_items)
elif _provider == "anthropic":
return _send_anthropic(md_content, user_message, base_dir, file_items)
raise ValueError(f"unknown provider: {_provider}")
return _send_anthropic(static_md, dynamic_md, user_message, base_dir, file_items)
raise ValueError(f"unknown provider: {_provider}")
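End-to-end usage sketch of the new split-context API (signatures as they appear in this diff; the project filename and message are examples):

```python
import tomllib
import aggregate
import ai_client

with open("manual_slop.toml", "rb") as f:   # the active per-project file
    config = tomllib.load(f)

static_md, dynamic_md, out_path, file_items = aggregate.run(config)
reply = ai_client.send(static_md, dynamic_md,
                       "Summarise the outstanding work.",
                       base_dir=".", file_items=file_items)
print(f"{out_path}: {reply[:200]}")
```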
+2 -2
config.toml
@@ -10,11 +10,11 @@ system_prompt = "DO NOT EVER make a shell script unless told to. DO NOT EVER mak
palette = "10x Dark"
font_path = "C:/Users/Ed/AppData/Local/uv/cache/archive-v0/WSthkYsQ82b_ywV6DkiaJ/pygame_gui/data/FiraCode-Regular.ttf"
font_size = 18.0
scale = 1.25
scale = 1.1
[projects]
paths = [
"manual_slop.toml",
"C:/projects/forth/bootslop/bootslop.toml",
]
active = "manual_slop.toml"
active = "C:/projects/forth/bootslop/bootslop.toml"
+2 -3
docs/Readme.md
@@ -29,7 +29,7 @@ Controls what is explicitly fed into the context compiler.
- **Base Dir:** Defines the root for path resolution and tool constraints.
- **Paths:** Explicit files or wildcard globs (e.g., src/**/*.rs).
- When generating a request, full file contents are inlined into the context by default (`summary_only=False`). The AI can also call `get_file_summary` via its MCP tools to get a compact structural view of any file on demand.
- When generating a request, these files are summarized symbolically (summarize.py) to conserve tokens, unless the AI explicitly decides to read their full contents via its internal tools.
## Interaction Panels
@@ -46,9 +46,8 @@ Switch between API backends (Gemini, Anthropic) on the fly. Clicking "Fetch Mode
### Global Text Viewer & Script Outputs
- **Last Script Output:** Whenever the AI executes a background script, this window pops up, flashing blue. It contains both the executed script and the stdout/stderr. The `[+ Maximize]` buttons read directly from stored instance variables (`_last_script`, `_last_output`) rather than DPG widget tags, so they work correctly regardless of word-wrap state.
- **Last Script Output:** Whenever the AI executes a background script, this window pops up, flashing blue. It contains both the executed script and the stdout/stderr.
- **Text Viewer:** A large, resizable global popup invoked anytime you click a [+] or [+ Maximize] button in the UI. Used for deep-reading long logs, discussion entries, or script bodies.
- **Confirm Dialog:** The `[+ Maximize]` button in the script approval modal passes the script text directly as `user_data` at button-creation time, so it remains safe to click even after the dialog has been dismissed.
## System Prompts
+5 -5
docs/guide_architecture.md
@@ -1,4 +1,4 @@
# Guide: Architecture
# Guide: Architecture
Overview of the package design, state management, and code-path layout.
@@ -33,9 +33,10 @@ This occurs inside aggregate.run.
If using the default workflow, aggregate.py works through the following process (a sketch of the AST pass follows the list):
1. **Glob Resolution:** Iterates through config["files"]["paths"] and unpacks any wildcards (e.g., src/**/*.rs) against the designated base_dir.
2. **File Item Build:** `build_file_items()` reads each resolved file once, storing path, content, and `mtime`. This list is returned alongside the markdown so `ai_client.py` can use it for dynamic context refresh after tool calls without re-reading from disk.
3. **Markdown Generation:** `build_markdown_from_items()` assembles the final `<project>_00N.md` string. By default (`summary_only=False`) it inlines full file contents. If `summary_only=True`, it delegates to `summarize.build_summary_markdown()` which uses AST-based heuristics to produce compact structural summaries instead.
4. The Markdown file is persisted to disk (`./md_gen/` by default) for auditing. `run()` returns a 3-tuple `(markdown_str, output_path, file_items)`.
2. **Summarization Pass:** Instead of concatenating raw file bodies (which would quickly overwhelm the ~200k token limit over multiple rounds), the files are passed to summarize.py.
3. **AST Parsing:** summarize.py runs a heuristic pass. For Python files, it uses the standard ast module to read structural nodes (Classes, Methods, Imports, Constants). It outputs a compact Markdown table.
4. **Markdown Generation:** The final <project>_00N.md string is constructed, comprising the truncated AST summaries, the user's current project system prompt, and the active discussion branch.
5. The Markdown file is persisted to disk (./md_gen/ by default) for auditing.
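A minimal sketch of the AST heuristic in step 3 (the real pass in summarize.py also extracts ALL_CAPS constants and formats output differently; this shows the structural idea only):

```python
import ast

def summarise_py(source: str) -> list[str]:
    """Structural skim: imports, classes + methods, top-level functions."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(f"import: {ast.unparse(node)}")
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            lines.append(f"class {node.name}({', '.join(methods)})")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(f"def {node.name}()")
    return lines
```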
### AI Communication & The Tool Loop
@@ -84,4 +85,3 @@ All I/O bound session data is recorded sequentially. session_logger.py hooks int
- logs/comms_<ts>.log: A JSON-L structured timeline of every raw payload sent/received.
- logs/toolcalls_<ts>.log: A sequential markdown record detailing every AI tool invocation and its exact stdout result.
- scripts/generated/: Every .ps1 script approved and executed by the shell runner is physically written to disk for version control transparency.
+7 -12
docs/guide_tools.md
@@ -12,22 +12,17 @@ Implemented in mcp_client.py. These tools allow the AI to selectively expand its
### Security & Scope
Every **filesystem** MCP tool passes its arguments through `_resolve_and_check`. This function ensures that the requested path falls under one of the allowed directories defined in the GUI's Base Dir configurations.
Every filesystem MCP tool passes its arguments through _resolve_and_check. This function ensures that the requested path falls under one of the allowed directories defined in the GUI's Base Dir configurations.
If the AI attempts to read or search a path outside the project bounds, the tool safely catches the constraint violation and returns ACCESS DENIED.
The two **web tools** (`web_search`, `fetch_url`) bypass this check entirely — they have no filesystem access and are unrestricted.
### Supplied Tools:
**Filesystem tools** (access-controlled via `_resolve_and_check`):
* `read_file(path)`: Returns the raw UTF-8 text of a file.
* `list_directory(path)`: Returns a formatted table of a directory's contents, showing file vs dir and byte sizes.
* `search_files(path, pattern)`: Executes a glob search (e.g., `**/*.py`) within an allowed directory.
* `get_file_summary(path)`: Invokes the local `summarize.py` heuristic parser to get the AST structure of a file without reading the whole body.
**Web tools** (unrestricted — no filesystem access):
* `web_search(query)`: Queries DuckDuckGo's raw HTML endpoint and returns the top 5 results (title, URL, snippet) using a native `_DDGParser` (HTMLParser subclass) to avoid heavy dependencies.
* `fetch_url(url)`: Downloads a target webpage and strips out all scripts, styling, and structural HTML via `_TextExtractor`, returning only the raw prose content (clamped to 40,000 characters). Automatically resolves DuckDuckGo redirect links.
* read_file(path): Returns the raw UTF-8 text of a file.
* list_directory(path): Returns a formatted table of a directory's contents, showing file vs dir and byte sizes.
* search_files(path, pattern): Executes an absolute glob search (e.g., **/*.py) to find specific files.
* get_file_summary(path): Invokes the local summarize.py heuristic parser to get the AST structure of a file without reading the whole body.
* web_search(query): Queries DuckDuckGo's raw HTML endpoint and returns the top 5 results (Titles, URLs, Snippets) using a native HTMLParser to avoid heavy dependencies.
* fetch_url(url): Downloads a target webpage and strips out all scripts, styling, and structural HTML, returning only the raw prose content (clamped to 40,000 characters).
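A self-contained sketch of the fetch_url stripping stage (class name from these docs; the body and the wrapper function are assumed — the real tool also handles DuckDuckGo redirect links):

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Keep text nodes; drop everything inside <script>/<style>."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

def strip_html(html: str, limit: int = 40_000) -> str:
    p = _TextExtractor()
    p.feed(html)
    # collapse whitespace and clamp, per the 40k-char limit above
    return re.sub(r"\s+", " ", " ".join(p._chunks)).strip()[:limit]
```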
## 2. Destructive Execution (run_powershell)
+27 -21
gui.py
@@ -1,4 +1,4 @@
# gui.py
# gui.py
"""
Note(Gemini):
The main DearPyGui interface orchestrator.
@@ -121,10 +121,19 @@ def _add_kv_row(parent: str, key: str, val, val_color=None):
def _render_usage(parent: str, usage: dict):
"""Render Anthropic usage dict as a compact token table."""
"""Render Anthropic usage dict as a compact token table, with true totals."""
if not usage:
return
dpg.add_text("usage:", color=_SUBHDR_COLOR, parent=parent)
cache_read = usage.get("cache_read_input_tokens", 0)
cache_create = usage.get("cache_creation_input_tokens", 0)
raw_input = usage.get("input_tokens", 0)
total_in = cache_read + cache_create + raw_input
if total_in > raw_input:
_add_kv_row(parent, " total_input_tokens", total_in, _NUM_COLOR)
order = [
"input_tokens",
"cache_read_input_tokens",
@@ -301,9 +310,9 @@ class ConfirmDialog:
with dpg.group(horizontal=True):
dpg.add_text("Script:")
dpg.add_button(
label="[+ Maximize]",
user_data=self._script,
callback=lambda s, a, u: _show_text_viewer("Confirm Script", u)
label="[+ Maximize]",
user_data=f"{self._tag}_script",
callback=lambda s, a, u: _show_text_viewer("Confirm Script", dpg.get_value(u))
)
dpg.add_input_text(
tag=f"{self._tag}_script",
@@ -432,8 +441,6 @@ class App:
self._pending_dialog_lock = threading.Lock()
self._tool_log: list[tuple[str, str]] = []
self._last_script: str = ""
self._last_output: str = ""
# Comms log entries queued from background thread for main-thread rendering
self._pending_comms: list[dict] = []
@@ -750,8 +757,6 @@ class App:
return output
def _append_tool_log(self, script: str, result: str):
self._last_script = script
self._last_output = result
self._tool_log.append((script, result))
self._rebuild_tool_log()
@@ -859,7 +864,7 @@ class App:
}
theme.save_to_config(self.config)
def _do_generate(self) -> tuple[str, Path, list]:
def _do_generate(self) -> tuple[str, str, Path, list]:
self._flush_to_project()
self._save_active_project()
self._flush_to_config()
@@ -1114,8 +1119,9 @@ class App:
def cb_md_only(self):
try:
md, path, _file_items = self._do_generate()
self.last_md = md
s_md, d_md, path, _file_items = self._do_generate()
self.last_static_md = s_md
self.last_dynamic_md = d_md
self.last_md_path = path
self._update_status(f"md written: {path.name}")
except Exception as e:
@@ -1138,8 +1144,9 @@ class App:
if self.send_thread and self.send_thread.is_alive():
return
try:
md, path, file_items = self._do_generate()
self.last_md = md
s_md, d_md, path, file_items = self._do_generate()
self.last_static_md = s_md
self.last_dynamic_md = d_md
self.last_md_path = path
self.last_file_items = file_items
except Exception as e:
@@ -1156,6 +1163,7 @@ class App:
if global_sp: combined_sp.append(global_sp.strip())
if project_sp: combined_sp.append(project_sp.strip())
ai_client.set_custom_system_prompt("\n\n".join(combined_sp))
temp = dpg.get_value("ai_temperature") if dpg.does_item_exist("ai_temperature") else 0.0
max_tok = dpg.get_value("ai_max_tokens") if dpg.does_item_exist("ai_max_tokens") else 8192
trunc = dpg.get_value("ai_history_trunc") if dpg.does_item_exist("ai_history_trunc") else 8000
@@ -1166,7 +1174,7 @@ class App:
if auto_add:
self._queue_history_add("User", user_msg)
try:
response = ai_client.send(self.last_md, user_msg, base_dir, self.last_file_items)
response = ai_client.send(getattr(self, "last_static_md", ""), getattr(self, "last_dynamic_md", ""), user_msg, base_dir, self.last_file_items)
self._update_response(response)
self._update_status("done")
self._trigger_blink = True
@@ -1921,7 +1929,8 @@ class App:
dpg.add_text("Script:")
dpg.add_button(
label="[+ Maximize]",
callback=lambda s, a, u: _show_text_viewer("Last Script", self._last_script),
user_data="last_script_text",
callback=lambda s, a, u: _show_text_viewer("Last Script", dpg.get_value(u))
)
dpg.add_input_text(
tag="last_script_text",
@@ -1937,7 +1946,8 @@ class App:
dpg.add_text("Output:")
dpg.add_button(
label="[+ Maximize]",
callback=lambda s, a, u: _show_text_viewer("Last Output", self._last_output),
user_data="last_script_output",
callback=lambda s, a, u: _show_text_viewer("Last Output", dpg.get_value(u))
)
dpg.add_input_text(
tag="last_script_output",
@@ -2122,7 +2132,3 @@ def main():
if __name__ == "__main__":
main()
+29 -44
File diff suppressed because one or more lines are too long