Compare commits
2 Commits: da8df7a393 ... 4755f4b590

| Author | SHA1 | Date |
|---|---|---|
|  | 4755f4b590 |  |
|  | 1b71b748db |  |
+13
-9
@@ -87,9 +87,9 @@ Is a local GUI tool for manually curating and sending context to AI APIs. It agg
 - All tool calls (script + result/rejection) are appended to `_tool_log` and displayed in the Tool Calls panel
 
 **Dynamic file context refresh (ai_client.py):**
-- After every tool call round, all project files from `file_items` are re-read from disk via `_reread_file_items()`
-- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them
-- For Gemini: files are re-read (updating the `file_items` list in place) but cannot be injected into tool results due to Gemini's structured function response format
+- After the last tool call in each round, all project files from `file_items` are re-read from disk via `_reread_file_items()`. The `file_items` variable is reassigned so subsequent rounds see fresh content.
+- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them.
+- For Gemini: refreshed file contents are appended to the last function response's `output` string as a `[SYSTEM: FILES UPDATED]` block. On the next tool round, stale `[FILES UPDATED]` blocks are stripped from history and old tool outputs are truncated to `_history_trunc_limit` characters to control token growth.
 - `_build_file_context_text(file_items)` formats the refreshed files as markdown code blocks (same format as the original context)
 - The `tool_result_send` comms log entry filters out the injected text block (only logs actual `tool_result` entries) to keep the comms panel clean
 - `file_items` flows from `aggregate.build_file_items()` → `gui.py` `self.last_file_items` → `ai_client.send(file_items=...)` → `_send_anthropic(file_items=...)` / `_send_gemini(file_items=...)`
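The refresh path above reduces to two small helpers. A minimal sketch, assuming each `file_items` entry is a dict with `path` and `content` keys (the real field names in ai_client.py may differ):

```python
from pathlib import Path

def _reread_file_items(file_items: list[dict]) -> list[dict]:
    # Pull the latest on-disk state of every tracked file; keep the old
    # content if a file has vanished so the context stays coherent.
    refreshed = []
    for item in file_items:
        path = Path(item["path"])
        content = path.read_text(encoding="utf-8") if path.is_file() else item.get("content", "")
        refreshed.append({**item, "content": content})
    return refreshed

def _build_file_context_text(file_items: list[dict]) -> str:
    # Format the refreshed files as markdown code blocks, mirroring the
    # shape of the original aggregated context.
    fence = "`" * 3
    return "\n\n".join(
        f"### {item['path']}\n{fence}\n{item['content']}\n{fence}" for item in file_items
    )
```

Reassigning the return value of `_reread_file_items` (rather than mutating in place) is what lets later tool rounds see the fresh snapshot.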
@@ -142,9 +142,11 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
 - `close_session()` flushes and closes both file handles; called just before `dpg.destroy_context()`
 
 **Anthropic prompt caching:**
-- System prompt sent as an array with `cache_control: ephemeral` on the text block
-- Last tool in `_ANTHROPIC_TOOLS` has `cache_control: ephemeral`; system + tools prefix is cached together after the first request
-- First user message content[0] is the `<context>` block with `cache_control: ephemeral`; content[1] is the user question without cache control
+- System prompt + context are combined into one string, chunked into <=120k char blocks, and sent as the `system=` parameter array. Only the LAST chunk gets `cache_control: ephemeral`, so the entire system prefix is cached as one unit.
+- Last tool in `_ANTHROPIC_TOOLS` (`run_powershell`) has `cache_control: ephemeral`; this means the tools prefix is cached together with the system prefix after the first request.
+- The user message is sent as a plain `[{"type": "text", "text": user_message}]` block with NO cache_control. The context lives in `system=`, not in the first user message.
+- The tools list is built once per session via `_get_anthropic_tools()` and reused across all API calls within the tool loop, avoiding redundant Python-side reconstruction.
+- `_strip_cache_controls()` removes stale `cache_control` markers from all history entries before each API call, ensuring only the stable system/tools prefix consumes cache breakpoint slots.
 - Cache stats (creation tokens, read tokens) are surfaced in the comms log usage dict and displayed in the Comms History panel
 
 **Data flow:**
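The chunking in the first bullet is mechanical. A sketch, assuming the 120k character limit from the diff and the block shape of Anthropic's Messages API; `_build_system_blocks` is a hypothetical name:

```python
_SYSTEM_CHUNK_CHARS = 120_000

def _build_system_blocks(system_prompt: str, md_content: str) -> list[dict]:
    # Combine prompt + context, then split into <=120k char text blocks.
    combined = f"{system_prompt}\n\n<context>\n{md_content}\n</context>"
    chunks = [combined[i:i + _SYSTEM_CHUNK_CHARS]
              for i in range(0, len(combined), _SYSTEM_CHUNK_CHARS)] or [""]
    blocks = [{"type": "text", "text": chunk} for chunk in chunks]
    # Only the LAST chunk carries the breakpoint, so the whole prefix
    # (every earlier chunk included) is cached as a single unit.
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks
```

Placing the breakpoint on the final block matters because Anthropic caches everything up to and including the marked block.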
@@ -190,15 +192,17 @@ Entry layout: index + timestamp + direction + kind + provider/model header row,
 
 **Known extension points:**
 - Add more providers by adding a section to `credentials.toml`, a `_list_*` and `_send_*` function in `ai_client.py`, and the provider name to the `PROVIDERS` list in `gui.py`
 - System prompt support could be added as a field in the project `.toml` and passed in `ai_client.send()`
 - Discussion history excerpts could be individually toggleable for inclusion in the generated md
 - `MAX_TOOL_ROUNDS` in `ai_client.py` caps agentic loops at 10 rounds; adjustable
 - `COMMS_CLAMP_CHARS` in `gui.py` controls the character threshold for clamping heavy payload fields in the Comms History panel
 - Additional project metadata (description, tags, created date) could be added to `[project]` in the per-project toml
 
 ### Gemini Context Management
-- Investigating ways to prevent context duplication in _gemini_chat history, as currently <context>{md_content}</context> is prepended to the user message on every single request, causing history bloat.
-- Discussing explicit Gemini Context Caching API (client.caches.create()) to store read-only file context and avoid re-reading files across sessions.
+- Gemini uses explicit caching via `client.caches.create()` to store the `system_instruction` + tools as an immutable cached prefix with a 1-hour TTL. The cache is created once per chat session.
+- When context changes (detected via `md_content` hash), the old cache is deleted, a new cache is created, and chat history is migrated to a fresh chat session pointing at the new cache.
+- If cache creation fails (e.g., content is under the minimum token threshold — 1024 for Flash, 4096 for Pro), the system falls back to inline `system_instruction` in the chat config. Implicit caching may still provide cost savings in this case.
+- The `<context>` block lives inside `system_instruction`, NOT in user messages, preventing history bloat across turns.
+- On cleanup/exit, active caches are deleted via `ai_client.cleanup()` to prevent orphaned billing.
 
 ### Latest Changes
+- Removed `Config` panel from the GUI to streamline per-project configuration.
 
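The Gemini side of this is a thin wrapper around the google-genai caching API. A sketch of the create-or-fallback step, assuming the SDK's `caches.create` / `CreateCachedContentConfig` surface; the helper name and fallback shape are illustrative, not the actual ai_client.py code:

```python
from google import genai
from google.genai import types

def _create_context_cache(client: genai.Client, model: str, sys_instr: str, tools: list):
    # Cache system_instruction + tools as an immutable prefix with a 1h TTL.
    # Returns None when the content is under the minimum token threshold
    # (1024 for Flash, 4096 for Pro) or the API call fails; the caller then
    # falls back to inline system_instruction (implicit caching may still help).
    try:
        return client.caches.create(
            model=model,
            config=types.CreateCachedContentConfig(
                system_instruction=sys_instr,
                tools=tools,
                ttl="3600s",
            ),
        )
    except Exception:
        return None
```

On success the chat is created with `cached_content=cache.name` in its config; when the `md_content` hash changes, the old cache is deleted and this step runs again.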
@@ -171,5 +171,3 @@ def main():
 
 if __name__ == "__main__":
     main()
-
-
+53
-29
@@ -217,6 +217,7 @@ def cleanup():
 def reset_session():
     global _gemini_client, _gemini_chat, _gemini_cache
     global _anthropic_client, _anthropic_history
+    global _CACHED_ANTHROPIC_TOOLS
     if _gemini_client and _gemini_cache:
         try:
             _gemini_client.caches.delete(name=_gemini_cache.name)
@@ -227,6 +228,7 @@ def reset_session():
     _gemini_cache = None
     _anthropic_client = None
     _anthropic_history = []
+    _CACHED_ANTHROPIC_TOOLS = None
     file_cache.reset_client()
 
 
@@ -309,6 +311,15 @@ def _build_anthropic_tools() -> list[dict]:
 
 _ANTHROPIC_TOOLS = _build_anthropic_tools()
 
+_CACHED_ANTHROPIC_TOOLS = None
+
+def _get_anthropic_tools() -> list[dict]:
+    """Return the Anthropic tools list, rebuilding only once per session."""
+    global _CACHED_ANTHROPIC_TOOLS
+    if _CACHED_ANTHROPIC_TOOLS is None:
+        _CACHED_ANTHROPIC_TOOLS = _build_anthropic_tools()
+    return _CACHED_ANTHROPIC_TOOLS
+
 
 def _gemini_tool_declaration():
     from google.genai import types
@@ -443,15 +454,13 @@ def _ensure_gemini_client():
 
 
 def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items: list[dict] | None = None) -> str:
-    global _gemini_chat
+    global _gemini_chat, _gemini_cache
     from google.genai import types
     try:
         _ensure_gemini_client(); mcp_client.configure(file_items or [], [base_dir])
         sys_instr = f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"
         tools_decl = [_gemini_tool_declaration()]
 
-        global _gemini_cache, _gemini_chat
-
         # DYNAMIC CONTEXT: Check if files/context changed mid-session
         current_md_hash = hash(md_content)
         old_history = None
@@ -490,8 +499,7 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
             )
             _append_comms("OUT", "request", {"message": f"[CACHE CREATED] {_gemini_cache.name}"})
         except Exception as e:
-            # Fallback if under token limit or API error
-            pass
+            _gemini_cache = None  # Ensure clean state on failure
 
         kwargs = {"model": _model, "config": chat_config}
         if old_history:
@@ -500,27 +508,11 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
         _gemini_chat = _gemini_client.chats.create(**kwargs)
         _gemini_chat._last_md_hash = current_md_hash
 
-        # COMPRESS HISTORY: Truncate massive tool outputs from previous turns to stop token leaks
-        if _gemini_chat and getattr(_gemini_chat, "history", None):
-            for msg in _gemini_chat.history:
-                if msg.role == "user" and hasattr(msg, "parts"):
-                    for p in msg.parts:
-                        if hasattr(p, "function_response") and p.function_response and hasattr(p.function_response, "response"):
-                            r = p.function_response.response
-                            if isinstance(r, dict) and "output" in r:
-                                val = r["output"]
-                                if isinstance(val, str):
-                                    if "[SYSTEM: FILES UPDATED]" in val:
-                                        val = val.split("[SYSTEM: FILES UPDATED]")[0].strip()
-                                    if _history_trunc_limit > 0 and len(val) > _history_trunc_limit:
-                                        val = val[:_history_trunc_limit] + "\n\n... [TRUNCATED BY SYSTEM TO SAVE TOKENS. Original output was too large.]"
-                                    r["output"] = val
-
         _append_comms("OUT", "request", {"message": f"[ctx {len(md_content)} + msg {len(user_message)}]"})
         payload, all_text = user_message, []
 
         for r_idx in range(MAX_TOOL_ROUNDS + 2):
-            # Strip stale file refreshes from Gemini history
+            # Strip stale file refreshes and truncate old tool outputs in Gemini history
             if _gemini_chat and _gemini_chat.history:
                 for msg in _gemini_chat.history:
                     if msg.role == "user" and hasattr(msg, "parts"):
@@ -529,8 +521,12 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
                             r = p.function_response.response
                             if isinstance(r, dict) and "output" in r:
                                 val = r["output"]
-                                if isinstance(val, str) and "[SYSTEM: FILES UPDATED]" in val:
-                                    r["output"] = val.split("[SYSTEM: FILES UPDATED]")[0].strip()
+                                if isinstance(val, str):
+                                    if "[SYSTEM: FILES UPDATED]" in val:
+                                        val = val.split("[SYSTEM: FILES UPDATED]")[0].strip()
+                                    if _history_trunc_limit > 0 and len(val) > _history_trunc_limit:
+                                        val = val[:_history_trunc_limit] + "\n\n... [TRUNCATED BY SYSTEM TO SAVE TOKENS.]"
+                                    r["output"] = val
 
             resp = _gemini_chat.send_message(payload)
             txt = "\n".join(p.text for c in resp.candidates if getattr(c, "content", None) for p in c.content.parts if hasattr(p, "text") and p.text)
@@ -544,6 +540,29 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
             reason = resp.candidates[0].finish_reason.name if resp.candidates and hasattr(resp.candidates[0], "finish_reason") else "STOP"
 
             _append_comms("IN", "response", {"round": r_idx, "stop_reason": reason, "text": txt, "tool_calls": [{"name": c.name, "args": dict(c.args)} for c in calls], "usage": usage})
+
+            # Guard: if Gemini reports input tokens approaching the limit, drop oldest history pairs
+            total_in = usage.get("input_tokens", 0)
+            if total_in > _GEMINI_MAX_INPUT_TOKENS and _gemini_chat and _gemini_chat.history:
+                hist = _gemini_chat.history
+                dropped = 0
+                # Drop oldest pairs (user+model) but keep at least the last 2 entries
+                while len(hist) > 4 and total_in > _GEMINI_MAX_INPUT_TOKENS * 0.7:
+                    # Rough estimate: each dropped message saves ~(chars/4) tokens
+                    saved = 0
+                    for p in hist[0].parts:
+                        if hasattr(p, "text") and p.text:
+                            saved += len(p.text) // 4
+                        elif hasattr(p, "function_response") and p.function_response:
+                            r = getattr(p.function_response, "response", {})
+                            if isinstance(r, dict):
+                                saved += len(str(r.get("output", ""))) // 4
+                    hist.pop(0)
+                    total_in -= max(saved, 100)
+                    dropped += 1
+                if dropped > 0:
+                    _append_comms("OUT", "request", {"message": f"[GEMINI HISTORY TRIMMED: dropped {dropped} old entries to stay within token budget]"})
+
             if not calls or r_idx > MAX_TOOL_ROUNDS: break
 
             f_resps, log = [], []
@@ -560,8 +579,10 @@ def _send_gemini(md_content: str, user_message: str, base_dir: str, file_items:
 
                 if i == len(calls) - 1:
                     if file_items:
-                        ctx = _build_file_context_text(_reread_file_items(file_items))
-                        if ctx: out += f"\n\n[SYSTEM: FILES UPDATED]\n\n{ctx}"
+                        file_items = _reread_file_items(file_items)
+                        ctx = _build_file_context_text(file_items)
+                        if ctx:
+                            out += f"\n\n[SYSTEM: FILES UPDATED]\n\n{ctx}"
                     if r_idx == MAX_TOOL_ROUNDS: out += "\n\n[SYSTEM: MAX ROUNDS. PROVIDE FINAL ANSWER.]"
 
                 f_resps.append(types.Part.from_function_response(name=name, response={"output": out}))
@@ -586,6 +607,10 @@ _CHARS_PER_TOKEN = 3.5
 # Anthropic's limit is 200k. We leave headroom for the response + tool schemas.
 _ANTHROPIC_MAX_PROMPT_TOKENS = 180_000
 
+# Gemini models have a 1M context window but we cap well below to leave headroom.
+# If the model reports input tokens exceeding this, we trim old history.
+_GEMINI_MAX_INPUT_TOKENS = 900_000
+
 # Marker prefix used to identify stale file-refresh injections in history
 _FILE_REFRESH_MARKER = "[FILES UPDATED"
 
@@ -830,7 +855,7 @@ def _send_anthropic(md_content: str, user_message: str, base_dir: str, file_item
             max_tokens=_max_tokens,
             temperature=_temperature,
             system=system_blocks,
-            tools=_build_anthropic_tools(),
+            tools=_get_anthropic_tools(),
             messages=_anthropic_history,
         )
 
@@ -976,5 +1001,4 @@ def send(
         return _send_gemini(md_content, user_message, base_dir, file_items)
     elif _provider == "anthropic":
         return _send_anthropic(md_content, user_message, base_dir, file_items)
-
-    raise ValueError(f"unknown provider: {_provider}")
+    raise ValueError(f"unknown provider: {_provider}")
+3
-3
@@ -1,6 +1,6 @@
 [ai]
-provider = "gemini"
-model = "gemini-3.1-pro-preview"
+provider = "anthropic"
+model = "claude-sonnet-4-6"
 temperature = 0.6000000238418579
 max_tokens = 12000
 history_trunc_limit = 8000
@@ -17,4 +17,4 @@ paths = [
 "manual_slop.toml",
 "C:/projects/forth/bootslop/bootslop.toml",
 ]
-active = "manual_slop.toml"
+active = "C:/projects/forth/bootslop/bootslop.toml"
@@ -8,6 +8,8 @@ A GUI orchestrator for local LLM-driven coding sessions, built to prevent the AI
 
 The heart of context management.
 
+> **Note:** The Config panel has been removed. Output directory and auto-add history settings are now integrated into the Projects and Discussion History panels respectively.
+
 - **Configuration:** You specify the Git Directory (for commit tracking) and a Main Context File (the markdown file containing your project's notes and schema).
 - **Word-Wrap Toggle:** Dynamically swaps text rendering in large read-only panels (Responses, Comms Log) between unwrapped (ideal for viewing precise code formatting) and wrapped (ideal for prose).
 - **Project Switching:** Switch between different <project>.toml profiles to instantly swap out your entire active file list, discussion history, and settings.
 
@@ -44,14 +44,15 @@ The communication model is unified under ai_client.py, which normalizes the Gemi
 
 The loop is defined as follows:
 
-1. **Prompt Injection:** The aggregated Markdown context and system prompt are injected. (Gemini injects this directly into system_instruction at chat instantiation to prevent history bloat; Anthropic chunks this into cache_control: ephemeral blocks).
-2. **Execution Loop:** A MAX_TOOL_ROUNDS (default 10) bounded loop begins.
+1. **Prompt Injection:** The aggregated Markdown context and system prompt are injected. For Gemini, the system_instruction and tools are stored in an explicit cache via `client.caches.create()` with a 1-hour TTL; if cache creation fails (under minimum token threshold), it falls back to inline system_instruction. When context changes mid-session, the old cache is deleted and a new one is created. For Anthropic, the system prompt + context are sent as `system=` blocks with `cache_control: ephemeral` on the last chunk, and tools carry `cache_control: ephemeral` on the last tool definition.
+2. **Execution Loop:** A MAX_TOOL_ROUNDS (default 10) bounded loop begins. The tools list for Anthropic is built once per session and reused.
 3. The AI provider is polled.
-4. If the provider's stop_reason is ool_use:
+4. If the provider's stop_reason is tool_use:
     1. The loop parses the requested tool (either a read-only MCP tool or the destructive PowerShell tool).
     2. If PowerShell, it dispatches a blocking event to the Main Thread (see *On Tool Execution & Concurrency*).
-    3. Once the result is retrieved, the loop executes a **Dynamic Refresh** (_reread_file_items). Any files currently tracked by the project are pulled from the disk fresh.
-    4. The tool result, appended with the fresh [FILES UPDATED] block, is sent back to the provider.
+    3. Once the last tool result in the batch is retrieved, the loop executes a **Dynamic Refresh** (`_reread_file_items`). Any files currently tracked by the project are pulled from disk fresh. The `file_items` variable is reassigned so subsequent tool rounds see the updated content.
+    4. For Anthropic: the refreshed file contents are appended as a text block to the tool_results user message. For Gemini: the refreshed contents are appended to the last function response's output string. In both cases, the block is prefixed with `[FILES UPDATED]` / `[SYSTEM: FILES UPDATED]`.
+    5. On subsequent rounds, stale file-refresh blocks from previous turns are stripped from history to prevent token accumulation. For Gemini, old tool outputs exceeding `_history_trunc_limit` characters are also truncated.
 5. Once the model outputs standard text, the loop terminates and yields the string back to the GUI callback.
 
 ### On Tool Execution & Concurrency
 
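Stripped of provider details, the loop in step 2 is a bounded request/execute cycle. A schematic sketch with the provider and tool execution stubbed out as callables; the names are illustrative, not the actual ai_client.py API:

```python
MAX_TOOL_ROUNDS = 10

def run_tool_loop(poll_provider, execute_tool, payload):
    for r_idx in range(MAX_TOOL_ROUNDS + 2):
        resp = poll_provider(payload)                 # step 3: poll the provider
        if resp["stop_reason"] != "tool_use":
            return resp["text"]                       # step 5: plain text ends the loop
        results = [execute_tool(call) for call in resp["tool_calls"]]
        # steps 4.3-4.5: dynamic refresh + [FILES UPDATED] injection happen here
        if r_idx == MAX_TOOL_ROUNDS:
            results.append("[SYSTEM: MAX ROUNDS. PROVIDE FINAL ANSWER.]")
        payload = results                             # step 4.4: send results back
    return ""
```

The `+ 2` mirrors the `range(MAX_TOOL_ROUNDS + 2)` in the diff, leaving room to deliver the MAX ROUNDS notice and collect the model's final text turn.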
+5
-2
@@ -46,5 +46,8 @@ The core system prompt explicitly guides the AI on how to use this tool safely:
 
 ### Synthetic Context Refresh
 
-Immediately after **any** tool call turn finishes, ai_client runs _reread_file_items. It fetches the latest disk state of all files in the current project context and appends them as a synthetic [FILES UPDATED] message to the tool result.
-This means if the AI writes to a file, it instantly "sees" the modification in its next turn without having to waste a cycle calling read_file.
+After the **last** tool call in each round finishes (when multiple tools are called in a single round, the refresh happens once after all of them), ai_client runs `_reread_file_items`. It fetches the latest disk state of all files in the current project context. The `file_items` variable is reassigned so subsequent tool rounds within the same request use the fresh content.
+
+For Anthropic, the refreshed contents are injected as a text block in the `tool_results` user message. For Gemini, they are appended to the last function response's output string. In both cases, the block is prefixed with `[FILES UPDATED]` / `[SYSTEM: FILES UPDATED]`.
+
+On the next tool round, stale file-refresh blocks from previous rounds are stripped from history to prevent token accumulation. This means if the AI writes to a file, it instantly "sees" the modification in its next turn without having to waste a cycle calling `read_file`, and the cost of carrying the full file snapshot is limited to one round.
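The stripping step is a single pass over history looking for the marker. A minimal sketch, assuming a dict-shaped history (the real Gemini history uses SDK objects, as the ai_client.py diff above shows):

```python
_MARKER = "[SYSTEM: FILES UPDATED]"

def strip_stale_refreshes(history: list[dict]) -> None:
    # Drop everything from the marker onward in old tool outputs, so the
    # full file snapshot is only ever carried for a single round.
    for msg in history:
        for part in msg.get("parts", []):
            out = part.get("output")
            if isinstance(out, str) and _MARKER in out:
                part["output"] = out.split(_MARKER)[0].strip()
```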
+2
-2
@@ -1,7 +1,7 @@
 [project]
 name = "manual_slop"
 git_dir = "C:/projects/manual_slop"
-system_prompt = "Make sure to update MainContext.md every time.\nMake destructive modifications to the project, ITS OK, I HAVE GIT HISTORY TO MANAGE THE PROJECTS."
+system_prompt = "Make sure to update MainContext.md every time.\nMake destructive modifications to the project, ITS OK, I HAVE GIT HISTORY TO MANAGE THE PROJECTS.\nAvoid reading manual_slop.toml its expensive as it has the history of multiple dicussions.\n"
 main_context = "C:/projects/manual_slop/MainContext.md"
 word_wrap = true
 
@@ -147,7 +147,7 @@ history = [
 
 [discussion.discussions."docs writeup"]
 git_commit = "bf2d09f3fd817d64fbf6b4aa667e2b635b6fbc0e"
-last_updated = "2026-02-22T10:34:24"
+last_updated = "2026-02-22T11:08:58"
 history = [
 "@2026-02-22T08:56:39\nUser:\nLets write extensive documentation in the same style that I used for my VEFontCache-Oodin project.\nI added it's directories to your context.",
 "@2026-02-22T08:56:58\nAI:\n(No text returned)",
@@ -154,4 +154,3 @@ def flat_config(proj: dict, disc_name: str | None = None) -> dict:
             "history": disc_data.get("history", []),
         },
     }
-
@@ -133,5 +133,3 @@ def log_tool_call(script: str, result: str, script_path: str | None):
         pass
 
     return str(ps1_path) if ps1_path else None
-
-