Manual Slop
Summary
Manual Slop is a local GUI tool for manually curating and sending context to AI APIs. It aggregates files, screenshots, and discussion history into a structured markdown file and sends it to a chosen AI provider with a user-written message. The AI can also execute PowerShell scripts within the project directory, with user confirmation required before each execution.
Stack:
- `dearpygui` - GUI with docking/floating/resizable panels
- `google-genai` - Gemini API
- `anthropic` - Anthropic API
- `tomli-w` - TOML writing
- `uv` - package/env management
Files:
- `gui.py` - main GUI, `App` class, all panels, all callbacks, confirmation dialog, layout persistence, rich comms rendering
- `ai_client.py` - unified provider wrapper, model listing, session management, send, tool/function-call loop, comms log, provider error classification
- `aggregate.py` - reads config, collects files/screenshots/discussion, writes numbered `.md` files to `output_dir`
- `shell_runner.py` - subprocess wrapper that runs PowerShell scripts sandboxed to `base_dir`, returns stdout/stderr/exit code as a string
- `session_logger.py` - opens timestamped log files at session start; writes comms entries as JSON-L and tool calls as markdown; saves each AI-generated script as a `.ps1` file
- `project_manager.py` - per-project .toml load/save, entry serialisation (`entry_to_str`/`str_to_entry` with `@timestamp` support), `default_project`/`default_discussion` factories, `migrate_from_legacy_config`, `flat_config` for `aggregate.run()`, git helpers (`get_git_commit`, `get_git_log`)
- `theme.py` - palette definitions, font loading, scale, `load_from_config`/`save_to_config`
- `gemini.py` - legacy standalone Gemini wrapper (not used by the main GUI; superseded by `ai_client.py`)
- `file_cache.py` - stub; Anthropic Files API path removed; kept so stale imports don't break
- `mcp_client.py` - MCP-style read-only file tools (`read_file`, `list_directory`, `search_files`, `get_file_summary`); allowlist enforced against project `file_items` + `base_dirs`; dispatched by the `ai_client` tool-use loop for both Anthropic and Gemini
- `summarize.py` - local heuristic summariser (no AI); `.py` via AST, `.toml` via regex, `.md` headings, generic preview; used by `mcp_client.get_file_summary` and `aggregate.build_summary_section`
- `config.toml` - global-only settings: `[ai]` provider+model+system_prompt, `[theme]` palette+font+scale, `[projects]` paths array + active path
- `manual_slop.toml` - per-project file: `[project]` name+git_dir+system_prompt+main_context, `[output]` namespace+output_dir, `[files]` base_dir+paths, `[screenshots]` base_dir+paths, `[discussion]` roles+active, `[discussion.discussions.<name>]` git_commit+last_updated+history
- `credentials.toml` - gemini api_key, anthropic api_key
- `dpg_layout.ini` - Dear PyGui window layout file (auto-saved on exit, auto-loaded on startup); gitignore this per-user
GUI Panels:
- Projects - active project name display (green), git directory input + Browse button, scrollable list of loaded project paths (click name to switch, x to remove), Add Project / New Project / Save All buttons
- Config - namespace, output dir, save (these are project-level fields from the active .toml)
- Files - base_dir, scrollable path list with remove, add file(s), add wildcard
- Screenshots - base_dir, scrollable path list with remove, add screenshot(s)
- Discussion History:
  - Discussion selector (collapsible header): listbox of named discussions, git commit + last_updated display, Update Commit button, Create/Rename/Delete buttons with a name input
  - Structured entry editor: each entry has a collapse toggle (-/+), role combo, timestamp display, and multiline content field; per-entry Ins/Del buttons when collapsed
  - Global toolbar: + Entry, -All, +All, Clear All, Save; collapsible Roles sub-section
  - `-> History` buttons on the Message and Response panels append the current message/response as a new entry with a timestamp
- Provider - provider combo (gemini/anthropic), model listbox populated from API, fetch models button
- Message - multiline input, Gen+Send button, MD Only button, Reset session button, -> History button
- Response - readonly multiline displaying last AI response, -> History button
- Tool Calls - scrollable log of every PowerShell tool call the AI made; Clear button
- System Prompts - global (all projects) and project-specific multiline text areas for injecting custom system instructions. Combined with the built-in tool prompt.
- Comms History - rich structured live log of every API interaction; status line at top; colour legend; Clear button
Layout persistence:
- `dpg.configure_app(..., init_file="dpg_layout.ini")` loads the ini at startup if it exists; DPG silently ignores a missing file
- `dpg.save_init_file("dpg_layout.ini")` is called immediately before `dpg.destroy_context()` on clean exit
- The ini records window positions, sizes, and dock node assignments in DPG's native format
- First run (no ini) uses the hardcoded `pos=` defaults in `_build_ui()`; after that the ini takes over
- Delete `dpg_layout.ini` to reset to defaults
Project management:
- `config.toml` is global-only: `[ai]`, `[theme]`, `[projects]` (paths list + active path). No project data lives here.
- Each project has its own `.toml` file (e.g. `manual_slop.toml`). Multiple project tomls can be registered by path.
- `App.__init__` loads global config, then loads the active project `.toml` via `project_manager.load_project()`. Falls back to `migrate_from_legacy_config()` if no valid project file exists, creating a new `.toml` automatically.
- `_flush_to_project()` pulls widget values into `self.project` (the per-project dict) and serialises `disc_entries` into the active discussion's history list
- `_flush_to_config()` writes global settings (`[ai]`, `[theme]`, `[projects]`) into `self.config`
- `_save_active_project()` writes `self.project` to the active `.toml` path via `project_manager.save_project()`
- `_do_generate()` calls both flush methods, saves both files, then uses `project_manager.flat_config()` to produce the dict that `aggregate.run()` expects, so `aggregate.py` needs zero changes
- Switching projects: saves the current project, loads the new one, refreshes all GUI state, resets the AI session
- New project: file dialog for the save path, creates the default project structure, saves it, switches to it
Discussion management (per-project):
- Each project `.toml` stores one or more named discussions under `[discussion.discussions.<name>]`
- Each discussion has: `git_commit` (str), `last_updated` (ISO timestamp), `history` (list of serialised entry strings)
- The `active` key in `[discussion]` tracks which discussion is currently selected
- Creating a discussion: adds a new empty discussion dict via `default_discussion()`, switches to it
- Renaming: moves the dict to a new key, updates `active` if it was the current one
- Deleting: removes the dict; cannot delete the last discussion; switches to the first remaining if the active one was deleted
- Switching: flushes current entries to the project, loads the new discussion's history, rebuilds the disc list
- Update Commit button: runs `git rev-parse HEAD` in the project's `git_dir` and stores the result + timestamp in the active discussion
- Timestamps: each disc entry carries a `ts` field (ISO datetime); shown next to the role combo; new entries from `-> History` or `+ Entry` get `now_ts()`
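The rename/delete invariants above can be sketched as plain dict operations. This is an illustrative sketch, not the actual `project_manager` API; the function names and signatures are hypothetical.

```python
# Sketch of the rename/delete invariants (hypothetical helper names).
def rename_discussion(discussions: dict, active: str, old: str, new: str) -> str:
    """Move a discussion dict to a new key; repoint `active` if needed."""
    discussions[new] = discussions.pop(old)
    return new if active == old else active

def delete_discussion(discussions: dict, active: str, name: str) -> str:
    """Remove a discussion; refuse to delete the last one; repoint `active`."""
    if len(discussions) <= 1:
        return active  # cannot delete the last discussion
    del discussions[name]
    if active == name:
        active = next(iter(discussions))  # switch to first remaining
    return active
```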
Entry serialisation (project_manager):
- `entry_to_str(entry)` → `"@<ts>\n<role>:\n<content>"` (or `"<role>:\n<content>"` if no ts)
- `str_to_entry(raw, roles)` → parses the optional `@<ts>` prefix, then the role line, then the content; returns `{role, content, collapsed, ts}`
- Round-trips correctly through TOML string arrays; handles legacy entries without timestamps
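A minimal sketch of the round-trip format described above; the real `project_manager` implementation may differ in details (e.g. role validation against the `roles` list is omitted here).

```python
# Sketch of the "@<ts>\n<role>:\n<content>" entry format (simplified).
def entry_to_str(entry: dict) -> str:
    prefix = f"@{entry['ts']}\n" if entry.get("ts") else ""
    return f"{prefix}{entry['role']}:\n{entry['content']}"

def str_to_entry(raw: str) -> dict:
    ts = ""
    if raw.startswith("@"):          # optional timestamp prefix
        ts, _, raw = raw.partition("\n")
        ts = ts[1:]
    role_line, _, content = raw.partition("\n")
    return {"role": role_line.rstrip(":"), "content": content,
            "collapsed": True, "ts": ts}
```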
AI Tool Use (PowerShell):
- Both Gemini and Anthropic are configured with a `run_powershell` tool/function declaration
- When the AI wants to edit or create files it emits a tool call with a `script` string
- `ai_client` runs a loop (max `MAX_TOOL_ROUNDS = 5`) feeding tool results back until the AI stops calling tools
- Before any script runs, `gui.py` shows a modal `ConfirmDialog` on the main thread; the background send thread blocks on a `threading.Event` until the user clicks Approve or Reject
- The dialog displays `base_dir`, shows the script in an editable text box (allowing last-second tweaks), and has Approve & Run / Reject buttons
- On approval the (possibly edited) script is passed to `shell_runner.run_powershell()`, which prepends `Set-Location -LiteralPath '<base_dir>'` and runs it via `powershell -NoProfile -NonInteractive -Command`
- stdout, stderr, and exit code are returned to the AI as the tool result
- Rejections return `"USER REJECTED: command was not executed"` to the AI
- All tool calls (script + result/rejection) are appended to `_tool_log` and displayed in the Tool Calls panel
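The sandboxing step above can be sketched as follows. This is a hedged approximation of `shell_runner`'s behaviour (command construction follows the flags named in the notes; the result-string layout is an assumption):

```python
import subprocess

# Sketch: prepend Set-Location so the script starts in base_dir, run
# PowerShell non-interactively, fold stdout/stderr/exit code into one string.
def build_command(script: str, base_dir: str) -> list[str]:
    full = f"Set-Location -LiteralPath '{base_dir}'\n{script}"
    return ["powershell", "-NoProfile", "-NonInteractive", "-Command", full]

def run_powershell(script: str, base_dir: str) -> str:
    proc = subprocess.run(build_command(script, base_dir),
                          capture_output=True, text=True)
    # Exact formatting of the returned string is illustrative.
    return (f"exit code: {proc.returncode}\n"
            f"stdout:\n{proc.stdout}\nstderr:\n{proc.stderr}")
```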
Dynamic file context refresh (ai_client.py):
- After every tool call round, all project files from `file_items` are re-read from disk via `_reread_file_items()`
- For Anthropic: the refreshed file contents are injected as a `text` block appended to the `tool_results` user message, prefixed with `[FILES UPDATED]` and an instruction not to re-read them
- For Gemini: files are re-read (updating the `file_items` list in place) but cannot be injected into tool results due to Gemini's structured function response format
- `_build_file_context_text(file_items)` formats the refreshed files as markdown code blocks (same format as the original context)
- The `tool_result_send` comms log entry filters out the injected text block (only logs actual `tool_result` entries) to keep the comms panel clean
- `file_items` flows from `aggregate.build_file_items()` → `gui.py` `self.last_file_items` → `ai_client.send(file_items=...)` → `_send_anthropic(file_items=...)` / `_send_gemini(file_items=...)`
- The system prompt was updated to tell the AI: "the user's context files are automatically refreshed after every tool call, so you do NOT need to re-read files that are already provided in the block"
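A sketch of the `_build_file_context_text` formatting step. The exact markdown layout (header style, fence language tags) is an assumption; only the "files as markdown code blocks, prefixed with `[FILES UPDATED]`" behaviour comes from the notes.

```python
FENCE = "`" * 3  # markdown code fence, built to avoid a literal fence here

# Sketch: format refreshed file_items ({"path", "content"} dicts assumed)
# as markdown code blocks for injection into the tool_results message.
def build_file_context_text(file_items: list[dict]) -> str:
    parts = ["[FILES UPDATED] Current file contents (do not re-read these):"]
    for item in file_items:
        parts.append(f"### {item['path']}\n{FENCE}\n{item['content']}\n{FENCE}")
    return "\n\n".join(parts)
```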
Anthropic bug fixes applied (session history):
- Bug 1: SDK ContentBlock objects now converted to plain dicts via `_content_block_to_dict()` before storing in `_anthropic_history`; prevents re-serialisation failures on subsequent tool-use rounds
- Bug 2: `_repair_anthropic_history` simplified to a dict-only path since history always contains dicts
- Bug 3: Gemini `part.function_call` access now guarded with a `hasattr` check
- Bug 4: Anthropic `b.type == "tool_use"` changed to `getattr(b, "type", None) == "tool_use"` for safe access during response processing
Comms Log (ai_client.py):
- `_comms_log: list[dict]` accumulates every API interaction during a session
- `_append_comms(direction, kind, payload)` is called at each boundary: OUT/request before sending, IN/response after each model reply, OUT/tool_call before executing, IN/tool_result after executing, OUT/tool_result_send when returning results to the model
- Entry fields: `ts` (HH:MM:SS), `direction` (OUT/IN), `kind`, `provider`, `model`, `payload` (dict)
- Anthropic responses also include `usage` (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens) and `stop_reason` in the payload
- `get_comms_log()` returns a snapshot; `clear_comms_log()` empties it
- `comms_log_callback` (injected by gui.py) is called from the background thread with each new entry; gui queues entries in `_pending_comms` (lock-protected) and flushes them to the DPG panel each render frame
- `COMMS_CLAMP_CHARS = 300` in gui.py governs the display cutoff for heavy text fields
Comms History panel — rich structured rendering (gui.py):
Rather than showing raw JSON, each comms entry is rendered using a kind-specific renderer function. Unknown kinds fall back to a generic key/value layout.
Colour maps:
- Direction: OUT = blue-ish `(100,200,255)`, IN = green-ish `(140,255,160)`
- Kind: request=gold, response=light-green, tool_call=orange, tool_result=light-blue, tool_result_send=lavender
- Labels: grey `(180,180,180)`; values: near-white `(220,220,220)`; dict keys/indices: `(140,200,255)`; numbers/token counts: `(180,255,180)`; sub-headers: `(220,200,120)`
Helper functions:
- `_add_text_field(parent, label, value)` - labelled text; strings longer than `COMMS_CLAMP_CHARS` render as an 80px readonly scrollable `input_text`; shorter strings render as `add_text`
- `_add_kv_row(parent, key, val)` - single horizontal key: value row
- `_render_usage(parent, usage)` - renders the Anthropic token usage dict in a fixed display order (input → cache_read → cache_creation → output)
- `_render_tool_calls_list(parent, tool_calls)` - iterates the tool call list, showing name, id, and all args via `_add_text_field`
Kind-specific renderers (in _KIND_RENDERERS dict, dispatched by _render_comms_entry):
- `_render_payload_request` - shows the `message` field via `_add_text_field`
- `_render_payload_response` - shows round, stop_reason (orange), text, tool_calls list, usage block
- `_render_payload_tool_call` - shows name, optional id, script via `_add_text_field`
- `_render_payload_tool_result` - shows name, optional id, output via `_add_text_field`
- `_render_payload_tool_result_send` - iterates the results list, shows tool_use_id and content per result
- `_render_payload_generic` - fallback for unknown kinds; renders all keys, using `_add_text_field` for keys in `_HEAVY_KEYS`, `_add_kv_row` for others; dicts/lists are JSON-serialised
Entry layout: index + timestamp + direction + kind + provider/model header row, then payload rendered by the appropriate function, then a separator line.
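The dict-dispatch with a generic fallback can be sketched without DPG widgets (render targets reduced to strings here; renderer bodies are illustrative, only the `_KIND_RENDERERS` dispatch shape comes from the notes):

```python
# Sketch of kind-specific renderers in a dict, with a generic fallback.
def render_request(payload: dict) -> str:
    return f"message: {payload.get('message', '')}"

def render_tool_call(payload: dict) -> str:
    return f"tool: {payload.get('name')}"

def render_generic(payload: dict) -> str:
    return ", ".join(f"{k}={v}" for k, v in payload.items())

_KIND_RENDERERS = {"request": render_request, "tool_call": render_tool_call}

def render_comms_entry(entry: dict) -> str:
    renderer = _KIND_RENDERERS.get(entry["kind"], render_generic)
    return renderer(entry["payload"])
```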
Session Logger (session_logger.py):
- `open_session()` is called once at GUI startup; creates `logs/` and `scripts/generated/` directories; opens `logs/comms_<ts>.log` and `logs/toolcalls_<ts>.log` (line-buffered)
- `log_comms(entry)` appends each comms entry as a JSON-L line to the comms log; called from `App._on_comms_entry` (background thread); thread-safe via GIL + line buffering
- `log_tool_call(script, result, script_path)` writes the script to `scripts/generated/<ts>_<seq:04d>.ps1` and appends a markdown record to the toolcalls log without the script body (just the file path + result); uses a `threading.Lock` for the sequence counter
- `close_session()` flushes and closes both file handles; called just before `dpg.destroy_context()`
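The JSON-L plus lock-guarded sequence-counter pattern can be sketched like this; directory names and the class shape are simplified for illustration (the real module uses free functions and timestamped filenames).

```python
import json
import tempfile
import threading
from pathlib import Path

# Sketch of the session logger: JSON-L comms log + sequenced .ps1 scripts.
class SessionLogger:
    def __init__(self, root: Path):
        (root / "logs").mkdir(parents=True, exist_ok=True)
        (root / "scripts").mkdir(exist_ok=True)
        # buffering=1 gives line buffering, so each JSON-L line hits disk
        self._comms = open(root / "logs" / "comms.log", "a", buffering=1)
        self._seq = 0
        self._lock = threading.Lock()  # guards the sequence counter
        self.root = root

    def log_comms(self, entry: dict) -> None:
        self._comms.write(json.dumps(entry) + "\n")  # one JSON object per line

    def save_script(self, script: str) -> Path:
        with self._lock:
            self._seq += 1
            path = self.root / "scripts" / f"{self._seq:04d}.ps1"
        path.write_text(script, encoding="utf-8")
        return path

    def close(self) -> None:
        self._comms.close()
```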
Anthropic prompt caching:
- System prompt sent as an array with `cache_control: ephemeral` on the text block
- Last tool in `_ANTHROPIC_TOOLS` has `cache_control: ephemeral`; the system + tools prefix is cached together after the first request
- First user message `content[0]` is the `<context>` block with `cache_control: ephemeral`; `content[1]` is the user question without cache control
- Cache stats (creation tokens, read tokens) are surfaced in the comms log usage dict and displayed in the Comms History panel
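The request shape implied by the three cache points above can be sketched as follows. The `cache_control` marker shapes follow Anthropic's Messages API prompt-caching format; the helper function itself is hypothetical.

```python
# Sketch of an Anthropic request with three ephemeral cache breakpoints:
# system text block, last tool declaration, and the <context> user block.
def build_cached_request(system_prompt: str, tools: list[dict],
                         context_md: str, user_message: str) -> dict:
    ephemeral = {"type": "ephemeral"}
    system = [{"type": "text", "text": system_prompt,
               "cache_control": ephemeral}]
    tools = [dict(t) for t in tools]            # copy before marking
    tools[-1]["cache_control"] = ephemeral      # caches system+tools prefix
    messages = [{"role": "user", "content": [
        {"type": "text", "text": f"<context>\n{context_md}\n</context>",
         "cache_control": ephemeral},           # cached context block
        {"type": "text", "text": user_message}, # uncached question
    ]}]
    return {"system": system, "tools": tools, "messages": messages}
```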
Data flow:
- GUI edits are held in `App` state (`self.files`, `self.screenshots`, `self.disc_entries`, `self.project`) and dpg widget values
- `_flush_to_project()` pulls all widget values into the `self.project` dict (per-project data)
- `_flush_to_config()` pulls global settings into the `self.config` dict
- `_do_generate()` calls both flush methods, saves both files, calls `project_manager.flat_config(self.project, disc_name)` to produce a dict for `aggregate.run()`, which writes the md and returns `(markdown_str, path, file_items)`
- `cb_generate_send()` calls `_do_generate()` then threads a call to `ai_client.send(md, message, base_dir)`
- `ai_client.send()` prepends the md as a `<context>` block to the user message and sends via the active provider chat session
- If the AI responds with tool calls, the loop handles them (with GUI confirmation) before returning the final text response
- Sessions are stateful within a run (chat history maintained); Reset clears them, the tool log, and the comms log
Config persistence:
- `config.toml` - global only: `[ai]` provider+model, `[theme]` palette+font+scale, `[projects]` paths array + active path
- `<project>.toml` - per-project: output, files, screenshots, discussion (roles, active discussion name, all named discussions with their history+metadata)
- On every send and save, both files are written
- On clean exit, `run()` calls `_flush_to_project()`, `_save_active_project()`, `_flush_to_config()`, `save_config()` before destroying the context
Threading model:
- DPG render loop runs on the main thread
- AI sends and model fetches run on daemon background threads
- `_pending_dialog` (guarded by a `threading.Lock`) is set by the background thread and consumed by the render loop each frame, calling `dialog.show()` on the main thread
- `dialog.wait()` blocks the background thread on a `threading.Event` until the user acts
- `_pending_comms` (guarded by a separate `threading.Lock`) is populated by `_on_comms_entry` (background thread) and drained by `_flush_pending_comms()` each render frame (main thread)
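The dialog handoff can be sketched without DPG: the background thread posts a request under a lock and blocks on an `Event`; the render loop (simulated here by a single drain call) consumes it and records the verdict. Names are illustrative, not the actual `gui.py` symbols.

```python
import threading

# Sketch of the background-thread-posts / main-thread-consumes handoff.
class ConfirmDialog:
    def __init__(self, script: str):
        self.script = script
        self.approved = False
        self._done = threading.Event()

    def resolve(self, approved: bool) -> None:  # main (render) thread
        self.approved = approved
        self._done.set()

    def wait(self) -> bool:  # background send thread blocks here
        self._done.wait()
        return self.approved

_pending: list[ConfirmDialog] = []
_pending_lock = threading.Lock()

def request_confirmation(script: str) -> bool:  # background thread
    dialog = ConfirmDialog(script)
    with _pending_lock:
        _pending.append(dialog)
    return dialog.wait()

def drain_pending(verdict: bool) -> None:  # one simulated render frame
    with _pending_lock:
        dialogs = list(_pending)
        _pending.clear()
    for d in dialogs:
        d.resolve(verdict)
```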
Provider error handling:
- `ProviderError(kind, provider, original)` wraps upstream API exceptions with a classified `kind`: quota, rate_limit, auth, balance, network, unknown
- `_classify_anthropic_error` and `_classify_gemini_error` inspect exception types and status codes/message bodies to assign the kind
- `ui_message()` returns a human-readable label for display in the Response panel
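An illustrative sketch of the wrapper. The real classifiers inspect SDK exception types and status codes; this version only pattern-matches the message text, and the label strings are invented for the example.

```python
# Sketch of ProviderError + a text-only classifier (labels are illustrative).
class ProviderError(Exception):
    LABELS = {
        "quota": "API quota exhausted",
        "rate_limit": "Rate limited, retry shortly",
        "auth": "Invalid or missing API key",
        "balance": "Insufficient account balance",
        "network": "Network error, check connectivity",
        "unknown": "Unknown provider error",
    }

    def __init__(self, kind: str, provider: str, original: Exception):
        super().__init__(str(original))
        self.kind, self.provider, self.original = kind, provider, original

    def ui_message(self) -> str:
        label = self.LABELS.get(self.kind, self.LABELS["unknown"])
        return f"[{self.provider}] {label}"

def classify(provider: str, exc: Exception) -> ProviderError:
    text = str(exc).lower()
    for kind, needles in [("rate_limit", ("429", "rate")),
                          ("auth", ("401", "api key")),
                          ("quota", ("quota",)),
                          ("balance", ("credit", "balance"))]:
        if any(n in text for n in needles):
            return ProviderError(kind, provider, exc)
    return ProviderError("unknown", provider, exc)
```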
MCP file tools (mcp_client.py + ai_client.py):
- Four read-only tools exposed to the AI as native function/tool declarations: `read_file`, `list_directory`, `search_files`, `get_file_summary`
- Access control: `mcp_client.configure(file_items, extra_base_dirs)` is called before each send; it builds an allowlist of resolved absolute paths from the project's `file_items` plus the `base_dir`; any path that is neither explicitly in the list nor under one of the allowed directories returns `ACCESS DENIED`
- `mcp_client.dispatch(tool_name, tool_input)` is the single dispatch entry point used by both the Anthropic and Gemini tool-use loops
- Anthropic: MCP tools appear before `run_powershell` in the tools list (no `cache_control` on them; only `run_powershell` carries `cache_control: ephemeral`)
- Gemini: MCP tools are included in the `FunctionDeclaration` list alongside `run_powershell`
- `get_file_summary` uses `summarize.summarise_file()`, the same heuristic used for the initial `<context>` block, so the AI gets the same compact structural view it already knows
- `list_directory` sorts dirs before files; shows name, type, and size
- `search_files` uses `Path.glob()` with the caller-supplied pattern (supports `**/*.py` style)
- `read_file` returns raw UTF-8 text; errors (not found, access denied, decode error) are returned as error strings rather than exceptions, so the AI sees them as tool results
- `summarize.py` heuristics: `.py` → AST imports + ALL_CAPS constants + classes+methods + top-level functions; `.toml` → table headers + top-level keys; `.md` → h1–h3 headings with indentation; all others → line count + first 8 lines preview
- Comms log: MCP tool calls log `OUT/tool_call` with `{"name": ..., "args": {...}}` and `IN/tool_result` with `{"name": ..., "output": ...}`; rendered in the Comms History panel via `_render_payload_tool_call` (shows each arg key/value) and `_render_payload_tool_result` (shows output)
Known extension points:
- Add more providers by adding a section to `credentials.toml`, a `_list_*` and `_send_*` function in `ai_client.py`, and the provider name to the `PROVIDERS` list in `gui.py`
- System prompt support could be added as a field in the project `.toml` and passed in `ai_client.send()`
- Discussion history excerpts could be individually toggleable for inclusion in the generated md
- `MAX_TOOL_ROUNDS` in `ai_client.py` caps agentic loops at 5 rounds; adjustable
- `COMMS_CLAMP_CHARS` in `gui.py` controls the character threshold for clamping heavy payload fields in the Comms History panel
- Additional project metadata (description, tags, created date) could be added to `[project]` in the per-project toml
Gemini Context Management
- Investigating ways to prevent context duplication in the `_gemini_chat` history: currently `{md_content}` is prepended to the user message on every single request, causing history bloat.
- Discussing the explicit Gemini Context Caching API (`client.caches.create()`) to store read-only file context and avoid re-reading files across sessions.
Latest Changes
- Removed the Config panel from the GUI to streamline per-project configuration
- `output_dir` was moved into the Projects panel
- `auto_add_history` was moved to the Discussion History panel
- `namespace` is no longer a configurable field; `aggregate.py` automatically uses the active project's `name` property
UI / Visual Updates
- The success blink notification on the response text box is now dimmer and more transparent to be less visually jarring.
- Added a new floating Last Script Output popup window. This window automatically displays and blinks blue whenever the AI executes a PowerShell tool, showing both the executed script and its result in real-time.