Files

Ed_ 08e003a137 docs: Complete documentation rewrite at gencpp/VEFontCache reference quality

Rewrites all docs from Gemini's 330-line executive summaries to 1874 lines
of expert-level architectural reference matching the pedagogical depth of
gencpp (Parser_Algo.md, AST_Types.md) and VEFontCache-Odin (guide_architecture.md).

Changes:
- guide_architecture.md: 73 -> 542 lines. Adds inline data structures for all
  dialog classes, cross-thread communication patterns, complete action type
  catalog, provider comparison table, 4-breakpoint Anthropic cache strategy,
  Gemini server-side cache lifecycle, context refresh algorithm.
- guide_tools.md: 66 -> 385 lines. Full 26-tool inventory with parameters,
  3-layer MCP security model walkthrough, all Hook API GET/POST endpoints
  with request/response formats, ApiHookClient method reference, /api/ask
  synchronous HITL protocol, shell runner with env config.
- guide_mma.md: NEW (368 lines). Fills major documentation gap — complete
  Ticket/Track/WorkerContext data structures, DAG engine algorithms (cycle
  detection, topological sort), ConductorEngine execution loop, Tier 2 ticket
  generation, Tier 3 worker lifecycle with context amnesia, token firewalling.
- guide_simulations.md: 64 -> 377 lines. 8-stage Puppeteer simulation
  lifecycle, mock_gemini_cli.py JSON-L protocol, approval automation pattern,
  ASTParser tree-sitter vs stdlib ast comparison, VerificationLogger.
- Readme.md: Rewritten with module map, architecture summary, config examples.
- docs/Readme.md: Proper index with guide contents table and GUI panel docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-01 09:44:50 -05:00

23 KiB

Raw Blame History

Architecture

Top | Tools & IPC | MMA Orchestration | Simulations

Philosophy: The Decoupled State Machine

Manual Slop solves a single tension: AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive. The engine enforces strict decoupling between three thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution.

Thread Domains

Four distinct thread domains operate concurrently:

Domain	Created By	Purpose	Lifecycle
Main / GUI	`immapp.run()`	Dear ImGui retained-mode render loop; sole writer of GUI state	App lifetime
Asyncio Worker	`App.__init__` via `threading.Thread(daemon=True)`	Event queue processing, AI client calls	Daemon (dies with process)
HookServer	`api_hooks.HookServer.start()`	HTTP API on `:8999` for external automation and IPC	Daemon thread
Ad-hoc	Transient `threading.Thread` calls	Model-fetching, legacy send paths	Short-lived

The asyncio worker is not the main thread's event loop. It runs a dedicated asyncio.new_event_loop() on its own daemon thread:

# App.__init__:
self._loop = asyncio.new_event_loop()
self._loop_thread = threading.Thread(target=self._run_event_loop, daemon=True)
self._loop_thread.start()

# _run_event_loop:
def _run_event_loop(self) -> None:
    asyncio.set_event_loop(self._loop)
    self._loop.create_task(self._process_event_queue())
    self._loop.run_forever()

The GUI thread uses asyncio.run_coroutine_threadsafe(coro, self._loop) to push work into this loop.

Cross-Thread Data Structures

All cross-thread communication uses one of three patterns:

Pattern A: AsyncEventQueue (GUI -> Asyncio)

# events.py
class AsyncEventQueue:
    _queue: asyncio.Queue  # holds Tuple[str, Any] items

    async def put(self, event_name: str, payload: Any = None) -> None
    async def get(self) -> Tuple[str, Any]

The central event bus. Uses asyncio.Queue, so non-asyncio threads must enqueue via asyncio.run_coroutine_threadsafe(). Consumer is App._process_event_queue(), running as a long-lived coroutine on the asyncio loop.

Pattern B: Guarded Lists (Any Thread -> GUI)

Background threads cannot write GUI state directly. They append task dicts to lock-guarded lists; the main thread drains these once per frame:

# App.__init__:
self._pending_gui_tasks: list[dict[str, Any]] = []
self._pending_gui_tasks_lock = threading.Lock()

self._pending_comms: list[dict[str, Any]] = []
self._pending_comms_lock = threading.Lock()

self._pending_tool_calls: list[tuple[str, str, float]] = []
self._pending_tool_calls_lock = threading.Lock()

self._pending_history_adds: list[dict[str, Any]] = []
self._pending_history_adds_lock = threading.Lock()

Additional locks:

self._send_thread_lock = threading.Lock()       # Guards send_thread creation
self._pending_dialog_lock = threading.Lock()     # Guards _pending_dialog + _pending_actions dict

Pattern C: Condition-Variable Dialogs (Bidirectional Blocking)

Used for Human-in-the-Loop (HITL) approval. Background thread blocks on threading.Condition; GUI thread signals after user action. See the HITL section below.

Event System

Three classes in events.py (89 lines, no external dependencies beyond asyncio and typing):

EventEmitter

class EventEmitter:
    _listeners: Dict[str, List[Callable]]

    def on(self, event_name: str, callback: Callable) -> None
    def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None

Synchronous pub-sub. Callbacks execute in the caller's thread. Used by ai_client.events for lifecycle hooks (request_start, response_received, tool_execution). No thread safety — relies on consistent single-thread usage.

AsyncEventQueue

Described above in Pattern A.

UserRequestEvent

class UserRequestEvent:
    prompt: str           # User's raw input text
    stable_md: str        # Generated markdown context (files, screenshots)
    file_items: List[Any] # File attachment items for dynamic refresh
    disc_text: str        # Serialized discussion history
    base_dir: str         # Working directory for shell commands

    def to_dict(self) -> Dict[str, Any]

Pure data carrier. Created on the GUI thread in _handle_generate_send, consumed on the asyncio thread in _handle_request_event.

Application Lifetime

Boot Sequence

The App.__init__ (lines 152-296) follows this precise order:

Config hydration: Reads config.toml (global) and <project>.toml (local). Builds the initial "world view" — tracked files, discussion history, active models.
Thread bootstrapping:
- Asyncio event loop thread starts (_loop_thread).
- HookServer starts as a daemon if test_hooks_enabled or provider is gemini_cli.
Callback wiring (_init_ai_and_hooks): Connects ai_client.confirm_and_run_callback, comms_log_callback, tool_log_callback to GUI handlers.
UI entry: Main thread enters immapp.run(). GUI is now alive; background threads are ready.

Shutdown Sequence

When immapp.run() returns (user closed window):

hook_server.stop() — shuts down HTTP server, joins thread.
perf_monitor.stop().
ai_client.cleanup() — destroys server-side API caches (Gemini CachedContent).
Dual-Flush persistence: _flush_to_project(), _save_active_project(), _flush_to_config(), save_config() — commits state back to both project and global configs.
session_logger.close_session().

The asyncio loop thread is a daemon — it dies with the process. App.shutdown() exists for explicit cleanup in test scenarios:

def shutdown(self) -> None:
    if self._loop.is_running():
        self._loop.call_soon_threadsafe(self._loop.stop)
    if self._loop_thread.is_alive():
        self._loop_thread.join(timeout=2.0)

The Task Pipeline: Producer-Consumer Synchronization

Request Flow

GUI Thread                    Asyncio Thread                      GUI Thread (next frame)
──────────                    ──────────────                      ──────────────────────
1. User clicks "Gen + Send"
2. _handle_generate_send():
   - Compiles md context
   - Creates UserRequestEvent
   - Enqueues via
     run_coroutine_threadsafe  ──>  3. _process_event_queue():
                                       awaits event_queue.get()
                                       routes "user_request" to
                                       _handle_request_event()
                                   4. Configures ai_client
                                   5. ai_client.send() BLOCKS
                                      (seconds to minutes)
                                   6. On completion, enqueues
                                      "response" event back       ──>  7. _process_pending_gui_tasks():
                                                                          Drains task list under lock
                                                                          Sets ai_response text
                                                                          Triggers terminal blink

Event Types Routed by `_process_event_queue`

Event Name	Action
`"user_request"`	Calls `_handle_request_event(payload)` — synchronous blocking AI call
`"response"`	Appends `{"action": "handle_ai_response", ...}` to `_pending_gui_tasks`
`"mma_state_update"`	Appends `{"action": "mma_state_update", ...}` to `_pending_gui_tasks`
`"mma_spawn_approval"`	Appends the raw payload for HITL dialog creation
`"mma_step_approval"`	Appends the raw payload for HITL dialog creation

The pattern: events arriving on the asyncio thread that need GUI state changes are serialized into _pending_gui_tasks for consumption on the next render frame.

Frame-Sync Mechanism: `_process_pending_gui_tasks`

Called once per ImGui frame on the main GUI thread. This is the sole safe point for mutating GUI-visible state.

Locking strategy — copy-and-clear:

def _process_pending_gui_tasks(self) -> None:
    if not self._pending_gui_tasks:
        return
    with self._pending_gui_tasks_lock:
        tasks = self._pending_gui_tasks[:]   # Snapshot
        self._pending_gui_tasks.clear()       # Release lock fast
    for task in tasks:
        # Process each task outside the lock

Acquires the lock briefly to snapshot the task list, then processes outside the lock. Minimizes lock contention with producer threads.

Complete Action Type Catalog

Action	Source	Effect
`"refresh_api_metrics"`	asyncio/hooks	Updates API metrics display
`"handle_ai_response"`	asyncio	Sets `ai_response`, `ai_status`, `mma_streams[stream_id]`; triggers blink; optionally auto-adds to discussion history
`"show_track_proposal"`	asyncio	Sets `proposed_tracks` list, opens modal
`"mma_state_update"`	asyncio	Updates `mma_status`, `active_tier`, `mma_tier_usage`, `active_tickets`, `active_track`
`"set_value"`	HookServer	Sets any field in `_settable_fields` map via `setattr`; special-cases `current_provider`/`current_model` to reconfigure AI client
`"click"`	HookServer	Dispatches to `_clickable_actions` map; introspects signatures to decide whether to pass `user_data`
`"select_list_item"`	HookServer	Routes to `_switch_discussion()` for discussion listbox
`{"type": "ask"}`	HookServer	Opens ask dialog: sets `_pending_ask_dialog = True`, stores `_ask_request_id` and `_ask_tool_data`
`"clear_ask"`	HookServer	Clears ask dialog state if request_id matches
`"custom_callback"`	HookServer	Executes an arbitrary callable with args
`"mma_step_approval"`	asyncio (MMA engine)	Creates `MMAApprovalDialog`, stores in `_pending_mma_approval`
`"mma_spawn_approval"`	asyncio (MMA engine)	Creates `MMASpawnApprovalDialog`, stores in `_pending_mma_spawn`
`"refresh_from_project"`	HookServer/internal	Reloads all UI state from project dict

The Execution Clutch: Human-in-the-Loop

The "Execution Clutch" ensures every destructive AI action passes through an auditable human gate. Three dialog types implement this, all sharing the same blocking pattern.

Dialog Classes

ConfirmDialog — PowerShell script execution approval:

class ConfirmDialog:
    _uid: str                        # uuid4 identifier
    _script: str                     # The PowerShell script text (editable)
    _base_dir: str                   # Working directory
    _condition: threading.Condition  # Blocking primitive
    _done: bool                      # Signal flag
    _approved: bool                  # User's decision

    def wait(self) -> tuple[bool, str]   # Blocks until _done; returns (approved, script)

MMAApprovalDialog — MMA tier step approval:

class MMAApprovalDialog:
    _ticket_id: str
    _payload: str                    # The step payload (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool

    def wait(self) -> tuple[bool, str]   # Returns (approved, payload)

MMASpawnApprovalDialog — Sub-agent spawn approval:

class MMASpawnApprovalDialog:
    _ticket_id: str
    _role: str                       # tier3-worker, tier4-qa, etc.
    _prompt: str                     # Spawn prompt (editable)
    _context_md: str                 # Context document (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool
    _abort: bool                     # Can abort entire track

    def wait(self) -> dict[str, Any]   # Returns {approved, abort, prompt, context_md}

Blocking Flow

Using ConfirmDialog as exemplar:

   ASYNCIO THREAD (ai_client tool callback)         GUI MAIN THREAD
   ─────────────────────────────────────────         ───────────────
   1. ai_client calls _confirm_and_run(script)
   2. Creates ConfirmDialog(script, base_dir)
   3. Stores dialog:
      - Headless: _pending_actions[uid] = dialog
      - GUI mode: _pending_dialog = dialog
   4. If test_hooks_enabled:
      pushes to _api_event_queue
   5. dialog.wait() BLOCKS on _condition
                                                    6. Next frame: ImGui renders
                                                       _pending_dialog in modal
                                                    7. User clicks Approve/Reject
                                                    8. _handle_approve_script():
                                                       with dialog._condition:
                                                           dialog._approved = True
                                                           dialog._done = True
                                                           dialog._condition.notify_all()
   9. wait() returns (True, potentially_edited_script)
   10. Executes shell_runner.run_powershell()
   11. Returns output to ai_client

The _condition.wait(timeout=0.1) uses a 100ms polling interval inside a loop — a polling-with-condition hybrid that ensures the blocking thread wakes periodically.

Resolution Paths

GUI button path (normal interactive use): _handle_approve_script() / _handle_approve_mma_step() / _handle_approve_spawn() directly manipulate the dialog's condition variable from the GUI thread.

HTTP API path (headless/automation): resolve_pending_action(action_id, approved) looks up the dialog by UUID in _pending_actions dict (headless) or _pending_dialog (GUI), then signals the condition:

def resolve_pending_action(self, action_id: str, approved: bool) -> bool:
    with self._pending_dialog_lock:
        if action_id in self._pending_actions:
            dialog = self._pending_actions[action_id]
            with dialog._condition:
                dialog._approved = approved
                dialog._done = True
                dialog._condition.notify_all()
            return True

MMA approval path: _handle_mma_respond(approved, payload, abort, prompt, context_md) is the unified resolver. It uses a dialog_container — a one-element list [None] used as a mutable reference shared between the MMA engine (which creates the container) and the GUI (which populates it via _process_pending_gui_tasks).

AI Client: Multi-Provider Architecture

ai_client.py operates as a stateful singleton — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.

Module-Level State

_provider: str = "gemini"              # "gemini" | "anthropic" | "deepseek" | "gemini_cli"
_model: str = "gemini-2.5-flash-lite"
_temperature: float = 0.0
_max_tokens: int = 8192
_history_trunc_limit: int = 8000       # Char limit for truncating old tool outputs

_send_lock: threading.Lock             # Serializes ALL send() calls across providers

Per-provider client objects:

# Gemini (SDK-managed stateful chat)
_gemini_client: genai.Client | None
_gemini_chat: Any                      # Holds history internally
_gemini_cache: Any                     # Server-side CachedContent
_gemini_cache_md_hash: int | None      # For cache invalidation
_GEMINI_CACHE_TTL: int = 3600          # 1-hour; rebuilt at 90% (3240s)

# Anthropic (client-managed history)
_anthropic_client: anthropic.Anthropic | None
_anthropic_history: list[dict]         # Mutable [{role, content}, ...]
_anthropic_history_lock: threading.Lock

# DeepSeek (raw HTTP, client-managed history)
_deepseek_history: list[dict]
_deepseek_history_lock: threading.Lock

# Gemini CLI (adapter wrapper)
_gemini_cli_adapter: GeminiCliAdapter | None

Safety limits:

MAX_TOOL_ROUNDS: int = 10              # Max tool-call loop iterations per send()
_MAX_TOOL_OUTPUT_BYTES: int = 500_000  # 500KB cumulative tool output budget
_ANTHROPIC_CHUNK_SIZE: int = 120_000   # Max chars per system text block
_ANTHROPIC_MAX_PROMPT_TOKENS: int = 180_000  # 200k limit minus headroom
_GEMINI_MAX_INPUT_TOKENS: int = 900_000      # 1M window minus headroom

The `send()` Dispatcher

def send(md_content, user_message, base_dir=".", file_items=None,
         discussion_history="", stream=False,
         pre_tool_callback=None, qa_callback=None) -> str:
    with _send_lock:
        if _provider == "gemini":      return _send_gemini(...)
        elif _provider == "gemini_cli": return _send_gemini_cli(...)
        elif _provider == "anthropic":  return _send_anthropic(...)
        elif _provider == "deepseek":   return _send_deepseek(..., stream=stream)

_send_lock serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. Return type is always str.

Provider Comparison

Aspect	Gemini SDK	Anthropic	DeepSeek	Gemini CLI
Client	`genai.Client`	`anthropic.Anthropic`	Raw `requests.post`	`GeminiCliAdapter` (subprocess)
History	SDK-managed (`_gemini_chat._history`)	Client-managed list	Client-managed list	CLI-managed (session ID)
Caching	Server-side `CachedContent` with TTL	Prompt caching via `cache_control: ephemeral` (4 breakpoints)	None	None
Tool format	`types.FunctionDeclaration`	JSON Schema dict	Not declared	Same as SDK via adapter
Tool results	`Part.from_function_response(response={"output": ...})`	`{"type": "tool_result", "tool_use_id": ..., "content": ...}`	`{"role": "tool", "tool_call_id": ..., "content": ...}`	`{"role": "tool", ...}`
History trimming	In-place at 40% of 900K token estimate	2-phase: strip stale file refreshes, then drop turn pairs at 180K	None	None
Streaming	No	No	Yes	No

Tool-Call Loop (common pattern across providers)

All providers follow the same high-level loop, iterated up to MAX_TOOL_ROUNDS + 2 times:

Send message (or tool results from prior round) to API.
Extract text response and any function calls.
Log to comms log; emit events.
If no function calls or max rounds exceeded: break.
For each function call:
- If pre_tool_callback rejects: return rejection text.
- Dispatch to mcp_client.dispatch() or shell_runner.run_powershell().
- After the last call of this round: run _reread_file_items() for context refresh.
- Truncate tool output at _history_trunc_limit chars.
- Accumulate _cumulative_tool_bytes.
If cumulative bytes > 500KB: inject warning.
Package tool results in provider-specific format; loop.

Context Refresh Mechanism

After the last tool call in each round, _reread_file_items(file_items) checks mtimes of all tracked files:

For each file item: compare Path.stat().st_mtime against stored mtime.
If unchanged: pass through as-is.
If changed: re-read content, store old_content for diffing, update mtime.
Changed files are diffed via _build_file_diff_text:
- Files <= 200 lines: emit full content.
- Files > 200 lines with old_content: emit difflib.unified_diff.
Diff is appended to the last tool's output as [SYSTEM: FILES UPDATED]\n\n{diff}.
Stale [FILES UPDATED] blocks are stripped from older history turns by _strip_stale_file_refreshes to prevent context bloat.

Anthropic Cache Strategy (4-Breakpoint System)

Anthropic allows a maximum of 4 cache_control: ephemeral breakpoints:

#	Location	Purpose
1	Last block of stable system prompt	Cache base instructions
2	Last block of context chunks	Cache file context
3	Last tool definition	Cache tool schema
4	Second-to-last user message	Cache conversation prefix

Before placing breakpoint 4, all existing cache_control is stripped from history to prevent exceeding the limit.

Gemini Cache Strategy (Server-Side TTL)

System instruction content is hashed. On each call, a 3-way decision:

Hash changed: Delete old cache, rebuild with new content.
Cache age > 90% of TTL: Proactive renewal (delete + rebuild).
No cache exists: Create new CachedContent if token count >= 2048; otherwise inline.

Comms Log System

Every API interaction is logged to a module-level list with real-time GUI push:

def _append_comms(direction: str, kind: str, payload: dict[str, Any]) -> None:
    entry = {
        "ts":        datetime.now().strftime("%H:%M:%S"),
        "direction": direction,     # "OUT" (to API) or "IN" (from API)
        "kind":      kind,          # "request" | "response" | "tool_call" | "tool_result"
        "provider":  _provider,
        "model":     _model,
        "payload":   payload,
    }
    _comms_log.append(entry)
    if comms_log_callback:
        comms_log_callback(entry)   # Real-time push to GUI

State Machines

`ai_status` (Informal)

"idle" -> "sending..." -> [AI call in progress]
    -> "running powershell..." -> "powershell done, awaiting AI..."
    -> "fetching url..." | "searching web..."
    -> "done" | "error"
    -> "idle" (on reset)

HITL Dialog State (Binary per type)

_pending_dialog is not None — script confirmation active
_pending_mma_approval is not None — MMA step approval active
_pending_mma_spawn is not None — spawn approval active
_pending_ask_dialog == True — tool ask dialog active

Security: The MCP Allowlist

Every filesystem tool (read, list, search, write) is gated by the MCP Bridge (mcp_client.py). See guide_tools.md for the complete security model, tool inventory, and endpoint reference.

Summary: Every path is resolved to an absolute path and checked against a dynamically-built allowlist constructed from the project's tracked files and base directories. Files named history.toml or *_history.toml are hard-blacklisted.

Telemetry & Auditing

Every interaction is designed to be auditable:

JSON-L Comms Logs: Raw API traffic logged to logs/sessions/<id>/comms.log for debugging and token cost analysis.
Tool Call Logs: Markdown-formatted sequential records to toolcalls.log.
Generated Scripts: Every PowerShell script that passes through the Execution Clutch is saved to scripts/generated/<ts>_<seq>.ps1.
API Hook Logs: All HTTP hook invocations logged to apihooks.log.
CLI Call Logs: Subprocess execution details (command, stdin, stdout, stderr, latency) to clicalls.log as JSON-L.
Performance Monitor: Real-time FPS, Frame Time, CPU, Input Lag tracked and queryable via Hook API.

Architectural Invariants

Single-writer principle: All GUI state mutations happen on the main thread via _process_pending_gui_tasks. Background threads never write GUI state directly.
Copy-and-clear lock pattern: _process_pending_gui_tasks snapshots and clears the task list under the lock, then processes outside the lock.
Context Amnesia: Each MMA Tier 3 Worker starts with ai_client.reset_session(). No conversational bleed between tickets.
Send serialization: _send_lock ensures only one provider call is in-flight at a time across all threads.
Dual-Flush persistence: On exit, state is committed to both project-level and global-level config files.

23 KiB Raw Blame History