Architecture
Philosophy: The Decoupled State Machine
Manual Slop solves a single tension: AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive. The engine enforces strict decoupling between four thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution.
The architectural philosophy follows data-oriented design principles:
- The GUI (`gui_2.py`, `app_controller.py`) remains a pure visualization of application state
- State mutations occur only through lock-guarded queues consumed on the main render thread
- Background threads never write GUI state directly — they serialize task dicts for later consumption
- All cross-thread communication uses explicit synchronization primitives (Locks, Conditions, Events)
Project Structure
The codebase is organized into a src/ layout to separate implementation from configuration and artifacts.
manual_slop/
├── conductor/ # Conductor tracks, specs, and plans
├── docs/ # Deep-dive architectural documentation
├── logs/ # Session logs, agent traces, and errors
├── scripts/ # Build, migration, and IPC bridge scripts
├── src/ # Core Python implementation
│ ├── ai_client.py # LLM provider abstraction
│ ├── gui_2.py # Main ImGui application
│ ├── mcp_client.py # MCP tool implementation
│ └── ... # Other core modules
├── tests/ # Pytest suite and simulation fixtures
├── simulation/ # Workflow and agent simulation logic
├── sloppy.py # Primary application entry point
├── config.toml # Global application settings
└── manual_slop.toml # Project-specific configuration
Thread Domains
Four distinct thread domains operate concurrently:
| Domain | Created By | Purpose | Lifecycle | Key Synchronization Primitives |
|---|---|---|---|---|
| Main / GUI | `immapp.run()` | Dear ImGui immediate-mode render loop; sole writer of GUI state | App lifetime | None (consumer of queues) |
| Asyncio Worker | `App.__init__` via `threading.Thread(daemon=True)` | Event queue processing, AI client calls | Daemon (dies with process) | `AsyncEventQueue`, `threading.Lock` |
| HookServer | `api_hooks.HookServer.start()` | HTTP API on `:8999` for external automation and IPC | Daemon thread | `threading.Lock`, `threading.Event` |
| Ad-hoc | Transient `threading.Thread` calls | Model-fetching, legacy send paths, log pruning | Short-lived | Task-specific locks |
The asyncio worker is not the main thread's event loop. It runs a dedicated asyncio.new_event_loop() on its own daemon thread:
# AppController.__init__:
self._loop = asyncio.new_event_loop()
self._loop_thread = threading.Thread(target=self._run_event_loop, daemon=True)
self._loop_thread.start()

# _run_event_loop:
def _run_event_loop(self) -> None:
    asyncio.set_event_loop(self._loop)
    self._loop.create_task(self._process_event_queue())
    self._loop.run_forever()
The GUI thread uses asyncio.run_coroutine_threadsafe(coro, self._loop) to push work into this loop.
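A minimal, self-contained sketch of this hand-off (the names `do_work`, `loop_thread`, and `result` are illustrative, not the app's actual identifiers):

```python
import asyncio
import threading

# A dedicated event loop running on a daemon thread, as in AppController.
loop = asyncio.new_event_loop()
loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
loop_thread.start()

async def do_work(x: int) -> int:
    # Stand-in for a long AI call handled on the asyncio thread.
    await asyncio.sleep(0.01)
    return x * 2

# "GUI thread" side: schedule the coroutine on the background loop.
# run_coroutine_threadsafe returns a concurrent.futures.Future immediately.
future = asyncio.run_coroutine_threadsafe(do_work(21), loop)
result = future.result(timeout=2)  # blocks only this caller, never the loop

loop.call_soon_threadsafe(loop.stop)
loop_thread.join(timeout=2)
```

The key property is that the render loop never waits: it either fires and forgets, or polls the returned future on later frames.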
Thread-Local Context Isolation
For concurrent multi-agent execution, the application uses threading.local() to manage per-thread context:
# ai_client.py
_local_storage = threading.local()

def get_current_tier() -> Optional[str]:
    """Returns the current tier from thread-local storage."""
    return getattr(_local_storage, "current_tier", None)

def set_current_tier(tier: Optional[str]) -> None:
    """Sets the current tier in thread-local storage."""
    _local_storage.current_tier = tier
This ensures that comms log entries and tool calls are correctly tagged with their source tier even when multiple workers execute concurrently.
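The isolation property can be demonstrated in a self-contained snippet (the `seen` dict below exists purely for the demonstration):

```python
import threading
from typing import Optional

_local_storage = threading.local()

def set_current_tier(tier: Optional[str]) -> None:
    _local_storage.current_tier = tier

def get_current_tier() -> Optional[str]:
    return getattr(_local_storage, "current_tier", None)

seen: dict = {}

def worker(tier: str) -> None:
    set_current_tier(tier)           # each thread writes its own slot
    seen[tier] = get_current_tier()  # reads back its own value only

threads = [threading.Thread(target=worker, args=(t,)) for t in ("Tier 3", "Tier 4")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each worker saw exactly its own tier; the main thread's slot is untouched.
assert seen == {"Tier 3": "Tier 3", "Tier 4": "Tier 4"}
assert get_current_tier() is None
```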
Cross-Thread Data Structures
All cross-thread communication uses one of three patterns:
Pattern A: AsyncEventQueue (GUI -> Asyncio)
# events.py
class AsyncEventQueue:
    _queue: asyncio.Queue  # holds Tuple[str, Any] items

    async def put(self, event_name: str, payload: Any = None) -> None
    async def get(self) -> Tuple[str, Any]
The central event bus. Uses asyncio.Queue, so non-asyncio threads must enqueue via asyncio.run_coroutine_threadsafe(). Consumer is App._process_event_queue(), running as a long-lived coroutine on the asyncio loop.
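Given the signatures above, a minimal implementation could look like this (a sketch consistent with the interface, not the actual body of `events.py`):

```python
import asyncio
from typing import Any, Tuple

class AsyncEventQueue:
    """Async FIFO carrying (event_name, payload) tuples between coroutines."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def put(self, event_name: str, payload: Any = None) -> None:
        await self._queue.put((event_name, payload))

    async def get(self) -> Tuple[str, Any]:
        return await self._queue.get()

async def demo() -> Tuple[str, Any]:
    q = AsyncEventQueue()
    await q.put("user_request", {"prompt": "hi"})
    return await q.get()

event = asyncio.run(demo())
assert event == ("user_request", {"prompt": "hi"})
```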
Pattern B: Guarded Lists (Any Thread -> GUI)
Background threads cannot write GUI state directly. They append task dicts to lock-guarded lists; the main thread drains these once per frame:
# App.__init__:
self._pending_gui_tasks: list[dict[str, Any]] = []
self._pending_gui_tasks_lock = threading.Lock()
self._pending_comms: list[dict[str, Any]] = []
self._pending_comms_lock = threading.Lock()
self._pending_tool_calls: list[tuple[str, str, float]] = []
self._pending_tool_calls_lock = threading.Lock()
self._pending_history_adds: list[dict[str, Any]] = []
self._pending_history_adds_lock = threading.Lock()
Additional locks:
self._send_thread_lock = threading.Lock() # Guards send_thread creation
self._pending_dialog_lock = threading.Lock() # Guards _pending_dialog + _pending_actions dict
Pattern C: Condition-Variable Dialogs (Bidirectional Blocking)
Used for Human-in-the-Loop (HITL) approval. Background thread blocks on threading.Condition; GUI thread signals after user action. See the HITL section below.
Event System
Three classes in events.py (89 lines, no external dependencies beyond asyncio and typing):
EventEmitter
class EventEmitter:
    _listeners: Dict[str, List[Callable]]

    def on(self, event_name: str, callback: Callable) -> None
    def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None
Synchronous pub-sub. Callbacks execute in the caller's thread. Used by ai_client.events for lifecycle hooks (request_start, response_received, tool_execution). No thread safety — relies on consistent single-thread usage.
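Usage is plain synchronous dispatch. A sketch matching the signatures above (the body shown is an assumption consistent with the interface, not the file's actual code):

```python
from typing import Any, Callable, Dict, List

class EventEmitter:
    """Synchronous pub-sub: callbacks run immediately in the emitter's thread."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable]] = {}

    def on(self, event_name: str, callback: Callable) -> None:
        self._listeners.setdefault(event_name, []).append(callback)

    def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None:
        for cb in self._listeners.get(event_name, []):
            cb(*args, **kwargs)

events = EventEmitter()
calls: list = []
events.on("request_start", lambda provider: calls.append(provider))
events.emit("request_start", "gemini")  # callback runs synchronously, right here
assert calls == ["gemini"]
```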
AsyncEventQueue
Described above in Pattern A.
UserRequestEvent
class UserRequestEvent:
    prompt: str            # User's raw input text
    stable_md: str         # Generated markdown context (files, screenshots)
    file_items: List[Any]  # File attachment items for dynamic refresh
    disc_text: str         # Serialized discussion history
    base_dir: str          # Working directory for shell commands

    def to_dict(self) -> Dict[str, Any]
Pure data carrier. Created on the GUI thread in _handle_generate_send, consumed on the asyncio thread in _handle_request_event.
Application Lifetime
Boot Sequence
The App.__init__ (lines 152-296) follows this precise order:
- Config hydration: Reads `config.toml` (global) and `<project>.toml` (local). Builds the initial "world view": tracked files, discussion history, active models.
- Thread bootstrapping: The asyncio event loop thread starts (`_loop_thread`). `HookServer` starts as a daemon if `test_hooks_enabled` or the provider is `gemini_cli`.
- Callback wiring (`_init_ai_and_hooks`): Connects `ai_client.confirm_and_run_callback`, `comms_log_callback`, and `tool_log_callback` to GUI handlers.
- UI entry: Main thread enters `immapp.run()`. The GUI is now alive; background threads are ready.
Shutdown Sequence
When immapp.run() returns (user closed window):
- `hook_server.stop()`: shuts down the HTTP server and joins its thread.
- `perf_monitor.stop()`.
- `ai_client.cleanup()`: destroys server-side API caches (Gemini `CachedContent`).
- Dual-Flush persistence: `_flush_to_project()`, `_save_active_project()`, `_flush_to_config()`, `save_config()` commit state back to both project and global configs.
- `session_logger.close_session()`.
The asyncio loop thread is a daemon — it dies with the process. App.shutdown() exists for explicit cleanup in test scenarios:
def shutdown(self) -> None:
    if self._loop.is_running():
        self._loop.call_soon_threadsafe(self._loop.stop)
    if self._loop_thread.is_alive():
        self._loop_thread.join(timeout=2.0)
The Task Pipeline: Producer-Consumer Synchronization
Request Flow
GUI Thread Asyncio Thread GUI Thread (next frame)
────────── ────────────── ──────────────────────
1. User clicks "Gen + Send"
2. _handle_generate_send():
- Compiles md context
- Creates UserRequestEvent
- Enqueues via
run_coroutine_threadsafe ──> 3. _process_event_queue():
awaits event_queue.get()
routes "user_request" to
_handle_request_event()
4. Configures ai_client
5. ai_client.send() BLOCKS
(seconds to minutes)
6. On completion, enqueues
"response" event back ──> 7. _process_pending_gui_tasks():
Drains task list under lock
Sets ai_response text
Triggers terminal blink
Event Types Routed by _process_event_queue
| Event Name | Action |
|---|---|
"user_request" |
Calls _handle_request_event(payload) — synchronous blocking AI call |
"response" |
Appends {"action": "handle_ai_response", ...} to _pending_gui_tasks |
"mma_state_update" |
Appends {"action": "mma_state_update", ...} to _pending_gui_tasks |
"mma_spawn_approval" |
Appends the raw payload for HITL dialog creation |
"mma_step_approval" |
Appends the raw payload for HITL dialog creation |
The pattern: events arriving on the asyncio thread that need GUI state changes are serialized into _pending_gui_tasks for consumption on the next render frame.
Frame-Sync Mechanism: _process_pending_gui_tasks
Called once per ImGui frame on the main GUI thread. This is the sole safe point for mutating GUI-visible state.
Locking strategy — copy-and-clear:
def _process_pending_gui_tasks(self) -> None:
    if not self._pending_gui_tasks:
        return
    with self._pending_gui_tasks_lock:
        tasks = self._pending_gui_tasks[:]  # Snapshot
        self._pending_gui_tasks.clear()     # Release lock fast
    for task in tasks:
        ...  # Process each task outside the lock
Acquires the lock briefly to snapshot the task list, then processes outside the lock. Minimizes lock contention with producer threads.
Complete Action Type Catalog
| Action | Source | Effect |
|---|---|---|
"refresh_api_metrics" |
asyncio/hooks | Updates API metrics display |
"handle_ai_response" |
asyncio | Sets ai_response, ai_status, mma_streams[stream_id]; triggers blink; optionally auto-adds to discussion history |
"show_track_proposal" |
asyncio | Sets proposed_tracks list, opens modal |
"mma_state_update" |
asyncio | Updates mma_status, active_tier, mma_tier_usage, active_tickets, active_track |
"set_value" |
HookServer | Sets any field in _settable_fields map via setattr; special-cases current_provider/current_model to reconfigure AI client |
"click" |
HookServer | Dispatches to _clickable_actions map; introspects signatures to decide whether to pass user_data |
"select_list_item" |
HookServer | Routes to _switch_discussion() for discussion listbox |
{"type": "ask"} |
HookServer | Opens ask dialog: sets _pending_ask_dialog = True, stores _ask_request_id and _ask_tool_data |
"clear_ask" |
HookServer | Clears ask dialog state if request_id matches |
"custom_callback" |
HookServer | Executes an arbitrary callable with args |
"mma_step_approval" |
asyncio (MMA engine) | Creates MMAApprovalDialog, stores in _pending_mma_approval |
"mma_spawn_approval" |
asyncio (MMA engine) | Creates MMASpawnApprovalDialog, stores in _pending_mma_spawn |
"refresh_from_project" |
HookServer/internal | Reloads all UI state from project dict |
The Execution Clutch: Human-in-the-Loop
The "Execution Clutch" ensures every destructive AI action passes through an auditable human gate. Three dialog types implement this, all sharing the same blocking pattern.
Dialog Classes
ConfirmDialog — PowerShell script execution approval:
class ConfirmDialog:
    _uid: str                        # uuid4 identifier
    _script: str                     # The PowerShell script text (editable)
    _base_dir: str                   # Working directory
    _condition: threading.Condition  # Blocking primitive
    _done: bool                      # Signal flag
    _approved: bool                  # User's decision

    def wait(self) -> tuple[bool, str]  # Blocks until _done; returns (approved, script)
MMAApprovalDialog — MMA tier step approval:
class MMAApprovalDialog:
    _ticket_id: str
    _payload: str  # The step payload (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool

    def wait(self) -> tuple[bool, str]  # Returns (approved, payload)
MMASpawnApprovalDialog — Sub-agent spawn approval:
class MMASpawnApprovalDialog:
    _ticket_id: str
    _role: str        # tier3-worker, tier4-qa, etc.
    _prompt: str      # Spawn prompt (editable)
    _context_md: str  # Context document (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool
    _abort: bool      # Can abort entire track

    def wait(self) -> dict[str, Any]  # Returns {approved, abort, prompt, context_md}
Blocking Flow
Using ConfirmDialog as exemplar:
ASYNCIO THREAD (ai_client tool callback) GUI MAIN THREAD
───────────────────────────────────────── ───────────────
1. ai_client calls _confirm_and_run(script)
2. Creates ConfirmDialog(script, base_dir)
3. Stores dialog:
- Headless: _pending_actions[uid] = dialog
- GUI mode: _pending_dialog = dialog
4. If test_hooks_enabled:
pushes to _api_event_queue
5. dialog.wait() BLOCKS on _condition
6. Next frame: ImGui renders
_pending_dialog in modal
7. User clicks Approve/Reject
8. _handle_approve_script():
with dialog._condition:
dialog._approved = True
dialog._done = True
dialog._condition.notify_all()
9. wait() returns (True, potentially_edited_script)
10. Executes shell_runner.run_powershell()
11. Returns output to ai_client
The _condition.wait(timeout=0.1) uses a 100ms polling interval inside a loop — a polling-with-condition hybrid that ensures the blocking thread wakes periodically.
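The blocking side of that hybrid can be sketched as follows, assuming the `ConfirmDialog` fields listed above (this `wait` body and the `approve` helper are illustrative, not the app's exact code):

```python
import threading

class ConfirmDialog:
    def __init__(self, script: str, base_dir: str) -> None:
        self._script = script
        self._base_dir = base_dir
        self._condition = threading.Condition()
        self._done = False
        self._approved = False

    def wait(self) -> tuple:
        # Polling-with-condition hybrid: wake every 100ms even without a
        # notify, so the blocked thread can periodically observe state.
        with self._condition:
            while not self._done:
                self._condition.wait(timeout=0.1)
            return (self._approved, self._script)

dialog = ConfirmDialog("Get-ChildItem", ".")

def approve() -> None:  # stands in for the GUI thread's approve handler
    with dialog._condition:
        dialog._approved = True
        dialog._done = True
        dialog._condition.notify_all()

threading.Timer(0.05, approve).start()   # "user clicks Approve" shortly after
approved, script = dialog.wait()         # blocks until the "GUI" signals
assert approved and script == "Get-ChildItem"
```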
Resolution Paths
GUI button path (normal interactive use):
_handle_approve_script() / _handle_approve_mma_step() / _handle_approve_spawn() directly manipulate the dialog's condition variable from the GUI thread.
HTTP API path (headless/automation):
resolve_pending_action(action_id, approved) looks up the dialog by UUID in _pending_actions dict (headless) or _pending_dialog (GUI), then signals the condition:
def resolve_pending_action(self, action_id: str, approved: bool) -> bool:
    with self._pending_dialog_lock:
        if action_id in self._pending_actions:
            dialog = self._pending_actions[action_id]
            with dialog._condition:
                dialog._approved = approved
                dialog._done = True
                dialog._condition.notify_all()
            return True
MMA approval path:
_handle_mma_respond(approved, payload, abort, prompt, context_md) is the unified resolver. It uses a dialog_container — a one-element list [None] used as a mutable reference shared between the MMA engine (which creates the container) and the GUI (which populates it via _process_pending_gui_tasks).
AI Client: Multi-Provider Architecture
ai_client.py operates as a stateful singleton — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.
Module-Level State
_provider: str = "gemini" # "gemini" | "anthropic" | "deepseek" | "gemini_cli"
_model: str = "gemini-2.5-flash-lite"
_temperature: float = 0.0
_max_tokens: int = 8192
_history_trunc_limit: int = 8000 # Char limit for truncating old tool outputs
_send_lock: threading.Lock # Serializes ALL send() calls across providers
Per-provider client objects:
# Gemini (SDK-managed stateful chat)
_gemini_client: genai.Client | None
_gemini_chat: Any # Holds history internally
_gemini_cache: Any # Server-side CachedContent
_gemini_cache_md_hash: int | None # For cache invalidation
_GEMINI_CACHE_TTL: int = 3600 # 1-hour; rebuilt at 90% (3240s)
# Anthropic (client-managed history)
_anthropic_client: anthropic.Anthropic | None
_anthropic_history: list[dict] # Mutable [{role, content}, ...]
_anthropic_history_lock: threading.Lock
# DeepSeek (raw HTTP, client-managed history)
_deepseek_history: list[dict]
_deepseek_history_lock: threading.Lock
# Gemini CLI (adapter wrapper)
_gemini_cli_adapter: GeminiCliAdapter | None
Safety limits:
MAX_TOOL_ROUNDS: int = 10 # Max tool-call loop iterations per send()
_MAX_TOOL_OUTPUT_BYTES: int = 500_000 # 500KB cumulative tool output budget
_ANTHROPIC_CHUNK_SIZE: int = 120_000 # Max chars per system text block
_ANTHROPIC_MAX_PROMPT_TOKENS: int = 180_000 # 200k limit minus headroom
_GEMINI_MAX_INPUT_TOKENS: int = 900_000 # 1M window minus headroom
The send() Dispatcher
def send(md_content, user_message, base_dir=".", file_items=None,
         discussion_history="", stream=False,
         pre_tool_callback=None, qa_callback=None) -> str:
    with _send_lock:
        if _provider == "gemini": return _send_gemini(...)
        elif _provider == "gemini_cli": return _send_gemini_cli(...)
        elif _provider == "anthropic": return _send_anthropic(...)
        elif _provider == "deepseek": return _send_deepseek(..., stream=stream)
_send_lock serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. Return type is always str.
Provider Comparison
| Aspect | Gemini SDK | Anthropic | DeepSeek | Gemini CLI |
|---|---|---|---|---|
| Client | `genai.Client` | `anthropic.Anthropic` | Raw `requests.post` | `GeminiCliAdapter` (subprocess) |
| History | SDK-managed (`_gemini_chat._history`) | Client-managed list | Client-managed list | CLI-managed (session ID) |
| Caching | Server-side `CachedContent` with TTL | Prompt caching via `cache_control: ephemeral` (4 breakpoints) | None | None |
| Tool format | `types.FunctionDeclaration` | JSON Schema dict | Not declared | Same as SDK via adapter |
| Tool results | `Part.from_function_response(response={"output": ...})` | `{"type": "tool_result", "tool_use_id": ..., "content": ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` | `{"role": "tool", ...}` |
| History trimming | In-place at 40% of 900K token estimate | 2-phase: strip stale file refreshes, then drop turn pairs at 180K | None | None |
| Streaming | No | No | Yes | No |
Tool-Call Loop (common pattern across providers)
All providers follow the same high-level loop, iterated up to MAX_TOOL_ROUNDS + 2 times:
- Send message (or tool results from prior round) to API.
- Extract text response and any function calls.
- Log to comms log; emit events.
- If no function calls or max rounds exceeded: break.
- For each function call:
  - If `pre_tool_callback` rejects: return rejection text.
  - Dispatch to `mcp_client.dispatch()` or `shell_runner.run_powershell()`.
  - After the last call of this round: run `_reread_file_items()` for context refresh.
  - Truncate tool output at `_history_trunc_limit` chars.
  - Accumulate `_cumulative_tool_bytes`.
- If cumulative bytes > 500KB: inject a warning.
- Package tool results in provider-specific format; loop.
Context Refresh Mechanism
After the last tool call in each round, _reread_file_items(file_items) checks mtimes of all tracked files:
- For each file item: compare `Path.stat().st_mtime` against the stored `mtime`.
- If unchanged: pass through as-is.
- If changed: re-read content, store `old_content` for diffing, update `mtime`.
- Changed files are diffed via `_build_file_diff_text`:
  - Files <= 200 lines: emit full content.
  - Files > 200 lines with `old_content`: emit `difflib.unified_diff`.
- The diff is appended to the last tool's output as `[SYSTEM: FILES UPDATED]\n\n{diff}`.
- Stale `[FILES UPDATED]` blocks are stripped from older history turns by `_strip_stale_file_refreshes` to prevent context bloat.
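The mtime comparison step can be condensed into a runnable sketch (the item dict shape and the `reread_file_items` name are assumptions for illustration):

```python
import os
import tempfile
from pathlib import Path

def reread_file_items(file_items: list) -> list:
    """Return the items whose files changed on disk since the last read."""
    changed = []
    for item in file_items:
        p = Path(item["path"])
        mtime = p.stat().st_mtime
        if mtime == item["mtime"]:
            continue                            # unchanged: pass through as-is
        item["old_content"] = item["content"]   # keep for unified_diff later
        item["content"] = p.read_text()
        item["mtime"] = mtime
        changed.append(item)
    return changed

with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "a.txt"
    f.write_text("v1")
    item = {"path": str(f), "mtime": f.stat().st_mtime, "content": "v1"}
    assert reread_file_items([item]) == []      # nothing changed yet
    f.write_text("v2")
    os.utime(f, (f.stat().st_atime, f.stat().st_mtime + 1))  # force a new mtime
    assert reread_file_items([item]) and item["old_content"] == "v1"
```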
Anthropic Cache Strategy (4-Breakpoint System)
Anthropic allows a maximum of 4 cache_control: ephemeral breakpoints:
| # | Location | Purpose |
|---|---|---|
| 1 | Last block of stable system prompt | Cache base instructions |
| 2 | Last block of context chunks | Cache file context |
| 3 | Last tool definition | Cache tool schema |
| 4 | Second-to-last user message | Cache conversation prefix |
Before placing breakpoint 4, all existing cache_control is stripped from history to prevent exceeding the limit.
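The strip-then-place step for breakpoint 4 can be sketched as follows (message shapes follow the Anthropic Messages API; `place_history_breakpoint` is an illustrative name, not the module's actual function):

```python
from typing import Any

def place_history_breakpoint(history: list) -> None:
    """Strip every cache_control marker from history, then mark the
    second-to-last user message so the 4-breakpoint limit holds."""
    for msg in history:
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                block.pop("cache_control", None)
    user_idx = [i for i, m in enumerate(history) if m["role"] == "user"]
    if len(user_idx) >= 2:
        target = history[user_idx[-2]]["content"]
        if isinstance(target, list) and target:
            target[-1]["cache_control"] = {"type": "ephemeral"}

history = [
    {"role": "user", "content": [{"type": "text", "text": "q1",
                                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "assistant", "content": [{"type": "text", "text": "a1"}]},
    {"role": "user", "content": [{"type": "text", "text": "q2"}]},
]
place_history_breakpoint(history)
# The stale marker on q1 is replaced by a fresh one on the same message,
# which is now the second-to-last user turn; q2 carries none.
assert "cache_control" in history[0]["content"][-1]
assert "cache_control" not in history[2]["content"][-1]
```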
Gemini Cache Strategy (Server-Side TTL)
System instruction content is hashed. On each call, a 3-way decision:
- Hash changed: Delete old cache, rebuild with new content.
- Cache age > 90% of TTL: Proactive renewal (delete + rebuild).
- No cache exists: Create a new `CachedContent` if the token count >= 2048; otherwise inline.
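The 3-way decision reduces to a small pure function (a sketch: constants mirror those listed earlier, and `cache_decision` is an illustrative name):

```python
_GEMINI_CACHE_TTL = 3600   # 1 hour; renew proactively at 90% (3240s)
_MIN_CACHE_TOKENS = 2048   # below this, inline instead of caching

def cache_decision(cached_hash, new_hash: int,
                   cache_age_s: float, token_count: int) -> str:
    """Decide what the cache layer should do on this call."""
    if cached_hash is None:
        return "create" if token_count >= _MIN_CACHE_TOKENS else "inline"
    if cached_hash != new_hash:
        return "rebuild"                         # system instruction changed
    if cache_age_s > _GEMINI_CACHE_TTL * 0.9:
        return "rebuild"                         # proactive renewal before expiry
    return "reuse"

assert cache_decision(None, 1, 0.0, 4096) == "create"
assert cache_decision(None, 1, 0.0, 100) == "inline"
assert cache_decision(1, 2, 10.0, 4096) == "rebuild"
assert cache_decision(1, 1, 3300.0, 4096) == "rebuild"   # past the 3240s mark
assert cache_decision(1, 1, 60.0, 4096) == "reuse"
```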
Comms Log System
Every API interaction is logged to a module-level list with real-time GUI push:
def _append_comms(direction: str, kind: str, payload: dict[str, Any]) -> None:
    entry = {
        "ts": datetime.now().strftime("%H:%M:%S"),
        "direction": direction,  # "OUT" (to API) or "IN" (from API)
        "kind": kind,            # "request" | "response" | "tool_call" | "tool_result"
        "provider": _provider,
        "model": _model,
        "payload": payload,
    }
    _comms_log.append(entry)
    if comms_log_callback:
        comms_log_callback(entry)  # Real-time push to GUI
State Machines
ai_status (Informal)
"idle" -> "sending..." -> [AI call in progress]
-> "running powershell..." -> "powershell done, awaiting AI..."
-> "fetching url..." | "searching web..."
-> "done" | "error"
-> "idle" (on reset)
HITL Dialog State (Binary per type)
- `_pending_dialog is not None`: script confirmation active
- `_pending_mma_approval is not None`: MMA step approval active
- `_pending_mma_spawn is not None`: spawn approval active
- `_pending_ask_dialog == True`: tool ask dialog active
Security: The MCP Allowlist
Every filesystem tool (read, list, search, write) is gated by the MCP Bridge (mcp_client.py). See guide_tools.md for the complete security model, tool inventory, and endpoint reference.
Summary: Every path is resolved to an absolute path and checked against a dynamically-built allowlist constructed from the project's tracked files and base directories. Files named history.toml or *_history.toml are hard-blacklisted.
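The gate boils down to resolve-then-check. A sketch of the policy described above (`is_path_allowed` and `allowed_dirs` are illustrative names; the real allowlist is built dynamically from tracked files):

```python
from pathlib import Path

def is_path_allowed(candidate: str, allowed_dirs: list) -> bool:
    """Resolve to an absolute path, hard-block history files, then require
    the path to live under an allowlisted base directory."""
    p = Path(candidate).resolve()
    if p.name == "history.toml" or p.name.endswith("_history.toml"):
        return False                    # hard blacklist, regardless of location
    return any(p.is_relative_to(Path(d).resolve()) for d in allowed_dirs)

assert is_path_allowed("/tmp/proj/src/gui_2.py", ["/tmp/proj"])
assert not is_path_allowed("/tmp/proj/history.toml", ["/tmp/proj"])
assert not is_path_allowed("/tmp/proj/../etc/passwd", ["/tmp/proj"])  # traversal
```

Resolving before checking is the important ordering: it defeats `..` traversal and symlink tricks that would otherwise slip a disallowed path past a prefix comparison.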
Telemetry & Auditing
Every interaction is designed to be auditable:
- JSON-L Comms Logs: Raw API traffic logged to `logs/sessions/<id>/comms.log` for debugging and token cost analysis.
- Tool Call Logs: Markdown-formatted sequential records to `toolcalls.log`.
- Generated Scripts: Every PowerShell script that passes through the Execution Clutch is saved to `scripts/generated/<ts>_<seq>.ps1`.
- API Hook Logs: All HTTP hook invocations logged to `apihooks.log`.
- CLI Call Logs: Subprocess execution details (command, stdin, stdout, stderr, latency) to `clicalls.log` as JSON-L.
- Performance Monitor: Real-time FPS, frame time, CPU, and input lag, tracked and queryable via the Hook API.
Telemetry Data Structures
# Comms log entry (JSON-L)
{
"ts": "14:32:05",
"direction": "OUT",
"kind": "tool_call",
"provider": "gemini",
"model": "gemini-2.5-flash-lite",
"payload": {
"name": "run_powershell",
"id": "call_abc123",
"script": "Get-ChildItem"
},
"source_tier": "Tier 3",
"local_ts": 1709875925.123
}
# Performance metrics (via get_metrics())
{
"fps": 60.0,
"fps_avg": 58.5,
"last_frame_time_ms": 16.67,
"frame_time_ms_avg": 17.1,
"cpu_percent": 12.5,
"cpu_percent_avg": 15.2,
"input_lag_ms": 2.3,
"input_lag_ms_avg": 3.1,
"time_render_mma_dashboard_ms": 5.2,
"time_render_mma_dashboard_ms_avg": 4.8
}
MMA Engine Architecture
WorkerPool: Concurrent Worker Management
The WorkerPool class in multi_agent_conductor.py manages a bounded pool of worker threads:
class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)

    def spawn(self, ticket_id: str, target: Callable, args: tuple) -> Optional[threading.Thread]:
        with self._lock:
            if len(self._active) >= self.max_workers:
                return None

        def wrapper(*a, **kw):
            try:
                with self._semaphore:
                    target(*a, **kw)
            finally:
                with self._lock:
                    self._active.pop(ticket_id, None)

        t = threading.Thread(target=wrapper, args=args, daemon=True)
        with self._lock:
            self._active[ticket_id] = t
        t.start()
        return t
Key behaviors:
- Bounded concurrency: `max_workers` (default 4) limits parallel ticket execution
- Semaphore gating: Ensures no more than `max_workers` can execute simultaneously
- Automatic cleanup: Each thread removes itself from the `_active` dict on completion
- Non-blocking spawn: Returns `None` if the pool is full, allowing the engine to defer
ConductorEngine: Orchestration Loop
The ConductorEngine orchestrates ticket execution within a track:
class ConductorEngine:
    def __init__(self, track: Track, event_queue: Optional[SyncEventQueue] = None,
                 auto_queue: bool = False) -> None:
        self.track = track
        self.event_queue = event_queue
        self.dag = TrackDAG(self.track.tickets)
        self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue)
        self.pool = WorkerPool(max_workers=4)
        self._abort_events: dict[str, threading.Event] = {}
        self._pause_event = threading.Event()
        self._tier_usage_lock = threading.Lock()
        self.tier_usage = {
            "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
            "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
            "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
            "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
        }
Main execution loop (run method):
- Pause check: If `_pause_event` is set, sleep and broadcast "paused" status
- DAG tick: Call `engine.tick()` to get ready tasks
- Completion check: If no ready tasks and all completed, break with "done" status
- Wait for workers: If tasks are in progress or the pool is active, sleep and continue
- Blockage detection: If nothing is ready, nothing is in progress, and not all tasks are done, break with "blocked" status
- Spawn workers: For each ready task, spawn a worker via `pool.spawn()`
- Model escalation: Workers use `models_list[min(retry_count, 2)]` for a capability upgrade on retries
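Stripped to its control flow, the loop looks roughly like this (a sketch: `FakeEngine` is a simplified stand-in for `ExecutionEngine`, with workers run inline instead of via `pool.spawn()`):

```python
import threading
import time

class FakeEngine:
    """Stand-in for the DAG ExecutionEngine: every task becomes ready once."""
    def __init__(self, tickets: list) -> None:
        self.pending = list(tickets)
        self.done: list = []
        self.total = len(tickets)

    def tick(self) -> list:
        ready, self.pending = self.pending, []
        return ready

    def all_completed(self) -> bool:
        return len(self.done) == self.total

def run(engine: FakeEngine, pause_event: threading.Event) -> str:
    while True:
        if pause_event.is_set():          # 1. pause check
            time.sleep(0.01)              #    (would broadcast "paused" here)
            continue
        ready = engine.tick()             # 2. DAG tick
        if not ready and engine.all_completed():
            return "done"                 # 3. completion check
        if not ready:
            return "blocked"              # 5. blockage detection
        for ticket in ready:              # 6. spawn workers (inline here)
            engine.done.append(ticket)

status = run(FakeEngine(["t1", "t2"]), threading.Event())
assert status == "done"
```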
Abort Event Propagation
Each ticket has an associated threading.Event for abort signaling:
# Before spawning worker
self._abort_events[ticket.id] = threading.Event()
# Worker checks abort at three points:
# 1. Before major work
if abort_event.is_set():
ticket.status = "killed"
return "ABORTED"
# 2. Before tool execution (in clutch_callback)
if abort_event.is_set():
return False # Reject tool
# 3. After blocking send() returns
if abort_event.is_set():
ticket.status = "killed"
return "ABORTED"
Architectural Invariants
- Single-writer principle: All GUI state mutations happen on the main thread via `_process_pending_gui_tasks`. Background threads never write GUI state directly.
- Copy-and-clear lock pattern: `_process_pending_gui_tasks` snapshots and clears the task list under the lock, then processes outside the lock.
- Context Amnesia: Each MMA Tier 3 Worker starts with `ai_client.reset_session()`. No conversational bleed between tickets.
- Send serialization: `_send_lock` ensures only one provider call is in-flight at a time across all threads.
- Dual-Flush persistence: On exit, state is committed to both project-level and global-level config files.
- No cross-thread GUI mutation: Background threads must push tasks to `_pending_gui_tasks` rather than calling GUI methods directly.
- Abort-before-execution: Workers check abort events before major work phases, enabling clean cancellation.
- Bounded worker pool: `WorkerPool` enforces the `max_workers` limit to prevent resource exhaustion.
Error Classification & Recovery
ProviderError Taxonomy
The ProviderError class provides structured error classification:
class ProviderError(Exception):
    def __init__(self, kind: str, provider: str, original: Exception):
        self.kind = kind  # "quota" | "rate_limit" | "auth" | "balance" | "network" | "unknown"
        self.provider = provider
        self.original = original

    def ui_message(self) -> str:
        labels = {
            "quota": "QUOTA EXHAUSTED",
            "rate_limit": "RATE LIMITED",
            "auth": "AUTH / API KEY ERROR",
            "balance": "BALANCE / BILLING ERROR",
            "network": "NETWORK / CONNECTION ERROR",
            "unknown": "API ERROR",
        }
        return f"[{self.provider.upper()} {labels.get(self.kind, 'API ERROR')}]\n\n{self.original}"
Error Recovery Patterns
| Error Kind | Recovery Strategy |
|---|---|
| `quota` | Display in UI, await user intervention |
| `rate_limit` | Exponential backoff (not yet implemented) |
| `auth` | Prompt for credential verification |
| `balance` | Display billing alert |
| `network` | Auto-retry with timeout |
| `unknown` | Log full traceback, display in UI |
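Classification typically keys off the provider exception's type and message. A heuristic sketch (the real mapping in `ai_client.py` is richer; `classify_error` and its string matches are assumptions):

```python
def classify_error(exc: Exception) -> str:
    """Map a raw provider exception to a ProviderError kind (heuristic)."""
    msg = str(exc).lower()
    if "quota" in msg or "resource_exhausted" in msg:
        return "quota"
    if "429" in msg or "rate" in msg:
        return "rate_limit"
    if "401" in msg or "api key" in msg or "unauthorized" in msg:
        return "auth"
    if "balance" in msg or "billing" in msg:
        return "balance"
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return "network"
    return "unknown"

assert classify_error(Exception("HTTP 429: rate limit exceeded")) == "rate_limit"
assert classify_error(Exception("Invalid API key provided")) == "auth"
assert classify_error(ConnectionError("reset by peer")) == "network"
assert classify_error(Exception("boom")) == "unknown"
```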
Memory Management
History Trimming Strategies
Gemini (40% threshold):
if total_in > _GEMINI_MAX_INPUT_TOKENS * 0.4:
    while len(hist) > 4 and total_in > _GEMINI_MAX_INPUT_TOKENS * 0.3:
        # Drop oldest message pairs
        hist.pop(0)  # Assistant
        hist.pop(0)  # User
Anthropic (180K limit):
def _trim_anthropic_history(system_blocks, history):
    est = _estimate_prompt_tokens(system_blocks, history)
    while len(history) > 3 and est > _ANTHROPIC_MAX_PROMPT_TOKENS:
        # Drop turn pairs, preserving tool_result chains
        ...
Tool Output Budget
_MAX_TOOL_OUTPUT_BYTES: int = 500_000 # 500KB cumulative
if _cumulative_tool_bytes > _MAX_TOOL_OUTPUT_BYTES:
    # Inject warning, force final answer
    parts.append("SYSTEM WARNING: Cumulative tool output exceeded 500KB budget.")
AST Cache (file_cache.py)
_ast_cache: Dict[str, Tuple[float, tree_sitter.Tree]] = {}

def get_cached_tree(self, path: Optional[str], code: str) -> tree_sitter.Tree:
    p = Path(path or "")  # exists() is False for an unset path, so mtime stays 0.0
    mtime = p.stat().st_mtime if p.exists() else 0.0
    if path in _ast_cache:
        cached_mtime, tree = _ast_cache[path]
        if cached_mtime == mtime:
            return tree
    # Parse and cache with simple LRU (max 10 entries)
    if len(_ast_cache) >= 10:
        del _ast_cache[next(iter(_ast_cache))]
    tree = self.parse(code)
    _ast_cache[path] = (mtime, tree)
    return tree