From 08e003a1372a9a669f06c3b8a9a6699cdab23c99 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Sun, 1 Mar 2026 09:44:50 -0500 Subject: [PATCH] docs: Complete documentation rewrite at gencpp/VEFontCache reference quality MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rewrites all docs from Gemini's 330-line executive summaries to 1874 lines of expert-level architectural reference matching the pedagogical depth of gencpp (Parser_Algo.md, AST_Types.md) and VEFontCache-Odin (guide_architecture.md). Changes: - guide_architecture.md: 73 -> 542 lines. Adds inline data structures for all dialog classes, cross-thread communication patterns, complete action type catalog, provider comparison table, 4-breakpoint Anthropic cache strategy, Gemini server-side cache lifecycle, context refresh algorithm. - guide_tools.md: 66 -> 385 lines. Full 26-tool inventory with parameters, 3-layer MCP security model walkthrough, all Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference, /api/ask synchronous HITL protocol, shell runner with env config. - guide_mma.md: NEW (368 lines). Fills major documentation gap — complete Ticket/Track/WorkerContext data structures, DAG engine algorithms (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia, token firewalling. - guide_simulations.md: 64 -> 377 lines. 8-stage Puppeteer simulation lifecycle, mock_gemini_cli.py JSON-L protocol, approval automation pattern, ASTParser tree-sitter vs stdlib ast comparison, VerificationLogger. - Readme.md: Rewritten with module map, architecture summary, config examples. - docs/Readme.md: Proper index with guide contents table and GUI panel docs. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- Readme.md | 138 ++++++--- docs/Readme.md | 81 +++--- docs/guide_architecture.md | 554 ++++++++++++++++++++++++++++++++++--- docs/guide_mma.md | 368 ++++++++++++++++++++++++ docs/guide_simulations.md | 390 +++++++++++++++++++++++--- docs/guide_tools.md | 404 ++++++++++++++++++++++++--- 6 files changed, 1742 insertions(+), 193 deletions(-) create mode 100644 docs/guide_mma.md diff --git a/Readme.md b/Readme.md index d61ae5e..6f23389 100644 --- a/Readme.md +++ b/Readme.md @@ -1,66 +1,128 @@ # Manual Slop -An experimental, high-density AI orchestration engine designed for expert developers. Manual Slop provides a strictly controlled environment for executing complex, multi-tier AI workflows with deterministic human-in-the-loop (HITL) overrides. +A GUI orchestrator for local LLM-driven coding sessions. Manual Slop bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe asynchronous pipeline, ensuring every AI-generated payload passes through a human-auditable gate before execution. + +**Tech Stack**: Python 3.11+, Dear PyGui / ImGui, FastAPI, Uvicorn +**Providers**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless) +**Platform**: Windows (PowerShell) — single developer, local use --- -## 1. Technical Philosophy +## Architecture at a Glance -Manual Slop is not a chat interface. It is a **Decoupled State Machine** built on the principle that AI reasoning should be observable, mutable, and interruptible. It bridges high-latency AI execution with a low-latency, retained-mode GUI via a thread-safe asynchronous pipeline. +Four thread domains operate concurrently: the ImGui main loop, an asyncio worker for AI calls, a `HookServer` (HTTP on `:8999`) for external automation, and transient threads for model fetching. 
Background threads never write GUI state directly — they serialize task dicts into lock-guarded lists that the main thread drains once per frame ([details](./docs/guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)). -### Core Features -* **Hierarchical MMA (4-Tier Architecture):** Orchestrate complex tracks using a tiered model (Orchestrator -> Tech Lead -> Worker -> QA) with explicit token firewalling. -* **The Execution Clutch:** A deterministic "gear-shifting" mechanism that pauses execution for human inspection and mutation of AI-generated payloads. -* **MCP-Bridge & Tooling:** Integrated filesystem sandboxing and native search/fetch tools with project-wide security allowlists. -* **Live Simulation Framework:** A robust verification suite using API hooks for automated visual and state assertions. +The **Execution Clutch** suspends the AI execution thread on a `threading.Condition` when a destructive action (PowerShell script, sub-agent spawn) is requested. The GUI renders a modal where the user can read, edit, or reject the payload. On approval, the condition is signaled and execution resumes ([details](./docs/guide_architecture.md#the-execution-clutch-human-in-the-loop)). + +The **MMA (Multi-Model Agent)** system decomposes epics into tracks, tracks into DAG-ordered tickets, and executes each ticket with a stateless Tier 3 worker that starts from `ai_client.reset_session()` — no conversational bleed between tickets ([details](./docs/guide_mma.md)). --- -## 2. Deep-Dive Documentation +## Documentation -For expert-level technical details, refer to our specialized guides: - -* **[Architectural Technical Reference](./docs/guide_architecture.md):** Deep-dive into thread synchronization, the task pipeline, and the decoupled state machine. -* **[Tooling & IPC Reference](./docs/guide_tools.md):** Specification of the Hook API, MCP bridge, and the HITL communication protocol. 
-* **[Verification & Simulation Framework](./docs/guide_simulations.md):** Detailed breakdown of the live GUI testing infrastructure and simulation lifecycle. +| Guide | Scope | +|---|---| +| [Architecture](./docs/guide_architecture.md) | Threading model, event system, AI client multi-provider architecture, HITL mechanism, comms logging | +| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge security model, all 26 native tools, Hook API endpoints, ApiHookClient reference, shell runner | +| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track data structures, DAG engine, ConductorEngine execution loop, worker lifecycle | +| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification patterns, ASTParser / summarizer | --- -## 3. Setup & Environment +## Module Map + +| File | Lines | Role | +|---|---|---| +| `gui_2.py` | ~3080 | Primary ImGui interface — App class, frame-sync, HITL dialogs | +| `ai_client.py` | ~1800 | Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, Gemini CLI) | +| `mcp_client.py` | ~870 | 26 MCP tools with filesystem sandboxing and tool dispatch | +| `api_hooks.py` | ~330 | HookServer — REST API for external automation on `:8999` | +| `api_hook_client.py` | ~245 | Python client for the Hook API (used by tests and external tooling) | +| `multi_agent_conductor.py` | ~250 | ConductorEngine — Tier 2 orchestration loop with DAG execution | +| `conductor_tech_lead.py` | ~100 | Tier 2 ticket generation from track briefs | +| `dag_engine.py` | ~100 | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) | +| `models.py` | ~100 | Ticket, Track, WorkerContext dataclasses | +| `events.py` | ~89 | EventEmitter, AsyncEventQueue, UserRequestEvent | +| `project_manager.py` | ~300 | TOML config persistence, discussion management, track state | +| `session_logger.py` | ~200 | JSON-L + markdown audit trails (comms, tools, CLI, hooks) | +| `shell_runner.py` | 
~100 | PowerShell execution with timeout, env config, QA callback | +| `file_cache.py` | ~150 | ASTParser (tree-sitter) — skeleton and curated views | +| `summarize.py` | ~120 | Heuristic file summaries (imports, classes, functions) | +| `outline_tool.py` | ~80 | Hierarchical code outline via stdlib `ast` | + +--- + +## Setup ### Prerequisites -* Python 3.11+ -* [`uv`](https://github.com/astral-sh/uv) for high-speed package management. + +- Python 3.11+ +- [`uv`](https://github.com/astral-sh/uv) for package management ### Installation -1. Clone the repository. -2. Install dependencies: - ```powershell - uv sync - ``` -3. Configure credentials in `credentials.toml`: - ```toml - [gemini] - api_key = "YOUR_KEY" - [anthropic] - api_key = "YOUR_KEY" - ``` -### Running the Engine -Launch the main GUI application: ```powershell -uv run gui_2.py +git clone +cd manual_slop +uv sync ``` -To enable the Hook API for external telemetry or testing: +### Credentials + +Configure in `credentials.toml`: + +```toml +[gemini] +api_key = "YOUR_KEY" + +[anthropic] +api_key = "YOUR_KEY" + +[deepseek] +api_key = "YOUR_KEY" +``` + +### Running + ```powershell -uv run gui_2.py --enable-test-hooks +uv run gui_2.py # Normal mode +uv run gui_2.py --enable-test-hooks # With Hook API on :8999 +``` + +### Running Tests + +```powershell +uv run pytest tests/ -v ``` --- -## 4. Feature Roadmap (2026) +## Project Configuration -* **DAG-Based Task Execution:** Real-time visual tracking of multi-agent ticket dependencies. -* **Token Budgeting & Throttling:** Granular control over cost and context accumulation per tier. -* **Advanced Simulation Suite:** Expanded visual verification for multi-modal reasoning tracks. +Projects are stored as `.toml` files. The discussion history is split into a sibling `_history.toml` to keep the main config lean. 
+ +```toml +[project] +name = "my_project" +git_dir = "./my_repo" +system_prompt = "" + +[files] +base_dir = "./my_repo" +paths = ["src/**/*.py", "README.md"] + +[screenshots] +base_dir = "./my_repo" +paths = [] + +[output] +output_dir = "./md_gen" + +[gemini_cli] +binary_path = "gemini" + +[agent.tools] +run_powershell = true +read_file = true +# ... 26 tool flags +``` diff --git a/docs/Readme.md b/docs/Readme.md index 5e690bf..07d3af4 100644 --- a/docs/Readme.md +++ b/docs/Readme.md @@ -1,59 +1,74 @@ -# Manual Slop +# Documentation Index -A GUI orchestrator for local LLM-driven coding sessions, built to prevent the AI from running wild and to provide total transparency into the context and execution state. +[Top](../Readme.md) -## Core Management Panels +--- + +## Guides + +| Guide | Contents | +|---|---| +| [Architecture](guide_architecture.md) | Thread domains, cross-thread data structures, event system, application lifetime, task pipeline (producer-consumer), Execution Clutch (HITL), AI client multi-provider architecture, Anthropic/Gemini caching strategies, context refresh, comms logging, state machines | +| [Tools & IPC](guide_tools.md) | MCP Bridge 3-layer security model, all 26 native tool signatures, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference, `/api/ask` synchronous HITL protocol, session logging, shell runner | +| [MMA Orchestration](guide_mma.md) | Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia, Tier 4 QA integration, token firewalling, track state persistence | +| [Simulations](guide_simulations.md) | `live_gui` pytest fixture lifecycle, `VerificationLogger`, process cleanup, Puppeteer pattern (8-stage MMA simulation), approval automation, mock provider (`mock_gemini_cli.py`) with JSON-L protocol, visual verification patterns, ASTParser (tree-sitter) vs 
summarizer (stdlib `ast`) | + +--- + +## GUI Panels ### Projects Panel -The heart of context management. +Configuration and context management. Specifies the Git Directory (for commit tracking) and tracked file paths. Project switching swaps the active file list, discussion history, and settings via `.toml` profiles. -> **Note:** The Config panel has been removed. Output directory and auto-add history settings are now integrated into the Projects and Discussion History panels respectively. - -- **Configuration:** You specify the Git Directory (for commit tracking) and a Main Context File (the markdown file containing your project's notes and schema). -- **Word-Wrap Toggle:** Dynamically swaps text rendering in large read-only panels (Responses, Comms Log) between unwrapped (ideal for viewing precise code formatting) and wrapped (ideal for prose). -- **Project Switching:** Switch between different .toml profiles to instantly swap out your entire active file list, discussion history, and settings. +- **Word-Wrap Toggle**: Dynamically swaps text rendering in large read-only panels (Responses, Comms Log) between unwrapped (code formatting) and wrapped (prose). ### Discussion History -Manages your conversational branches, preventing context poisoning across different tasks. +Manages conversational branches to prevent context poisoning across tasks. -- **Discussions Sub-Menu:** Allows you to create separate timelines for different tasks (e.g., "Refactoring Auth" vs. "Adding API Endpoints"). -- **Git Commit Tracking:** Clicking "Update Commit" reads HEAD from your project's git directory and stamps the discussion. -- **Entry Management:** Each turn has a Role (User, AI, System). You can toggle entries between **Read** and **Edit** modes, collapse them, or hit [+ Max] to open them in the Global Text Viewer. -- **Auto-Add:** If toggled, anything sent from the "Message" panel and returned to the "Response" panel is automatically appended to the current discussion history. 
+- **Discussions Sub-Menu**: Create separate timelines for different tasks (e.g., "Refactoring Auth" vs. "Adding API Endpoints"). +- **Git Commit Tracking**: "Update Commit" reads HEAD from the project's git directory and stamps the discussion. +- **Entry Management**: Each turn has a Role (User, AI, System). Toggle between Read/Edit modes, collapse entries, or open in the Global Text Viewer via `[+ Max]`. +- **Auto-Add**: When toggled, Message panel sends and Response panel returns are automatically appended to the current discussion. ### Files & Screenshots -Controls what is explicitly fed into the context compiler. +Controls what is fed into the context compiler. -- **Base Dir:** Defines the root for path resolution and tool constraints. -- **Paths:** Explicit files or wildcard globs (e.g., src/**/*.rs). -- When generating a request, full file contents are inlined into the context by default (`summary_only=False`). The AI can also call `get_file_summary` via its MCP tools to get a compact structural view of any file on demand. - -## Interaction Panels +- **Base Dir**: Defines the root for path resolution and MCP tool constraints. +- **Paths**: Explicit files or wildcard globs (`src/**/*.rs`). +- Full file contents are inlined by default. The AI can call `get_file_summary` for compact structural views. ### Provider -Switch between API backends (Gemini, Anthropic) on the fly. Clicking "Fetch Models" queries the active provider for the latest model list. +Switches between API backends (Gemini, Anthropic, DeepSeek, Gemini CLI). "Fetch Models" queries the active provider for the latest model list. ### Message & Response -- **Message:** Your input field. -- **Gen + Send:** Compiles the markdown context and dispatches the background thread to the AI. -- **MD Only:** Dry-runs the compiler so you can inspect the generated _00N.md without triggering an API charge. -- **Response:** The read-only output. Flashes green when a new response arrives. 
+- **Message**: User input field. +- **Gen + Send**: Compiles markdown context and dispatches to the AI via `AsyncEventQueue`. +- **MD Only**: Dry-runs the compiler for context inspection without API cost. +- **Response**: Read-only output; flashes green on new response. ### Global Text Viewer & Script Outputs -- **Last Script Output:** Whenever the AI executes a background script, this window pops up, flashing blue. It contains both the executed script and the stdout/stderr. The `[+ Maximize]` buttons read directly from stored instance variables (`_last_script`, `_last_output`) rather than DPG widget tags, so they work correctly regardless of word-wrap state. -- **Text Viewer:** A large, resizable global popup invoked anytime you click a [+] or [+ Maximize] button in the UI. Used for deep-reading long logs, discussion entries, or script bodies. -- **Confirm Dialog:** The `[+ Maximize]` button in the script approval modal passes the script text directly as `user_data` at button-creation time, so it remains safe to click even after the dialog has been dismissed. +- **Last Script Output**: Pops up (flashing blue) whenever the AI executes a script. Shows both the executed script and stdout/stderr. `[+ Maximize]` reads from stored instance variables, not DPG widget tags, so it works regardless of word-wrap state. +- **Text Viewer**: Large resizable popup invoked by `[+]` / `[+ Maximize]` buttons. For deep-reading long logs, discussion entries, or script bodies. +- **Confirm Dialog**: The `[+ Maximize]` button in the script approval modal passes script text as `user_data` at button-creation time — safe to click even after the dialog is dismissed. -## System Prompts +### Tool Calls & Comms History -Provides two text inputs for overriding default instructions: +Real-time display of MCP tool invocations and raw API traffic. Each comms entry: timestamp, direction (OUT/IN), kind, provider, model, payload. -1. **Global:** Applied across every project you load. -2. 
**Project:** Specific to the active workspace. -These are concatenated onto the strict tool-usage guidelines the agent is initialized with. +### MMA Dashboard + +Displays the 4-tier orchestration state: active track, ticket DAG with status indicators, per-tier token usage, output streams. Approval buttons for spawn/step/tool gates. + +### System Prompts + +Two text inputs for instruction overrides: +1. **Global**: Applied across every project. +2. **Project**: Specific to the active workspace. + +Concatenated onto the base tool-usage guidelines. diff --git a/docs/guide_architecture.md b/docs/guide_architecture.md index 2a9494e..6f54ae1 100644 --- a/docs/guide_architecture.md +++ b/docs/guide_architecture.md @@ -1,72 +1,542 @@ -# Manual Slop: Architectural Technical Reference +# Architecture -A deep-dive into the asynchronous orchestration, state synchronization, and the "Linear Execution Clutch" of the Manual Slop engine. This document is designed to move the reader from a high-level mental model to a low-level implementation understanding. +[Top](../Readme.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md) --- -## 1. Philosophy: The Decoupled State Machine +## Philosophy: The Decoupled State Machine -Manual Slop is built on a single, core realization: **AI reasoning is high-latency and non-deterministic, while GUI interaction must be low-latency and responsive.** - -To solve this, the engine enforces a strict decoupling between three distinct boundaries: - -* **The GUI Boundary (Main Thread):** A retained-mode loop (ImGui) that must never block. It handles visual telemetry and user "Seal of Approval" actions. -* **The AI Boundary (Daemon Threads):** Stateless execution loops that handle the "heavy lifting" of context aggregation, LLM communication, and tool reasoning. 
-* **The Orchestration Boundary (Asyncio):** A background thread that manages the flow of data between the other two, ensuring thread-safe communication without blocking the UI. +Manual Slop solves a single tension: **AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive.** The engine enforces strict decoupling between four thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution. --- ## 2. System Lifetime & Initialization -The application lifecycle, managed by `App` in `gui_2.py`, follows a precise sequence to ensure the environment is ready before the first frame: +## Thread Domains -1. **Context Hydration:** The engine reads `config.toml` (global) and `.toml` (local). This builds the initial "world view" of the project—what files are tracked, what the discussion history is, and which AI models are active. -2. **Thread Bootstrapping:** - * The `Asyncio` event loop thread is started (`_loop_thread`). - * The `HookServer` (FastAPI) is started as a daemon to handle IPC. -3. **UI Entry:** The main thread enters `immapp.run()`. At this point, the GUI is "alive," and the background threads are ready to receive tasks. -4. **The Dual-Flush Shutdown:** On exit, the system commits state back to both project and global configs. This ensures that your window positions, active discussions, and even pending tool results are preserved for the next session.
+| Domain | Created By | Purpose | Lifecycle | +|---|---|---|---| +| **Main / GUI** | `immapp.run()` | Dear ImGui retained-mode render loop; sole writer of GUI state | App lifetime | +| **Asyncio Worker** | `App.__init__` via `threading.Thread(daemon=True)` | Event queue processing, AI client calls | Daemon (dies with process) | +| **HookServer** | `api_hooks.HookServer.start()` | HTTP API on `:8999` for external automation and IPC | Daemon thread | +| **Ad-hoc** | Transient `threading.Thread` calls | Model-fetching, legacy send paths | Short-lived | + +The asyncio worker is **not** the main thread's event loop. It runs a dedicated `asyncio.new_event_loop()` on its own daemon thread: + +```python +# App.__init__: +self._loop = asyncio.new_event_loop() +self._loop_thread = threading.Thread(target=self._run_event_loop, daemon=True) +self._loop_thread.start() + +# _run_event_loop: +def _run_event_loop(self) -> None: + asyncio.set_event_loop(self._loop) + self._loop.create_task(self._process_event_queue()) + self._loop.run_forever() +``` + +The GUI thread uses `asyncio.run_coroutine_threadsafe(coro, self._loop)` to push work into this loop. --- -## 3. The Task Pipeline: Producer-Consumer Synchronization +## Cross-Thread Data Structures -Because ImGui state cannot be safely modified from a background thread, Manual Slop uses a **Producer-Consumer** model for all updates. +All cross-thread communication uses one of three patterns: -### The Flow of an AI Request -1. **Produce:** When you click "Gen + Send," the GUI thread produces a `UserRequestEvent` and pushes it into the `AsyncEventQueue`. -2. **Consume:** The background `asyncio` loop pops this event and dispatches it to the `ai_client`. The GUI thread remains free to render and respond to other inputs. -3. **Task Backlog:** When the AI responds, the background thread *cannot* update the UI text boxes directly. Instead, it appends a **Task Dictionary** to the `_pending_gui_tasks` list. -4. 
**Sync:** On every frame, the GUI thread checks this list. If tasks exist, it acquires a lock, clears the list, and executes the updates (e.g., "Set AI response text," "Blink the terminal indicator"). +### Pattern A: AsyncEventQueue (GUI -> Asyncio) + +```python +# events.py +class AsyncEventQueue: + _queue: asyncio.Queue # holds Tuple[str, Any] items + + async def put(self, event_name: str, payload: Any = None) -> None + async def get(self) -> Tuple[str, Any] +``` + +The central event bus. Uses `asyncio.Queue`, so non-asyncio threads must enqueue via `asyncio.run_coroutine_threadsafe()`. Consumer is `App._process_event_queue()`, running as a long-lived coroutine on the asyncio loop. + +### Pattern B: Guarded Lists (Any Thread -> GUI) + +Background threads cannot write GUI state directly. They append task dicts to lock-guarded lists; the main thread drains these once per frame: + +```python +# App.__init__: +self._pending_gui_tasks: list[dict[str, Any]] = [] +self._pending_gui_tasks_lock = threading.Lock() + +self._pending_comms: list[dict[str, Any]] = [] +self._pending_comms_lock = threading.Lock() + +self._pending_tool_calls: list[tuple[str, str, float]] = [] +self._pending_tool_calls_lock = threading.Lock() + +self._pending_history_adds: list[dict[str, Any]] = [] +self._pending_history_adds_lock = threading.Lock() +``` + +Additional locks: +```python +self._send_thread_lock = threading.Lock() # Guards send_thread creation +self._pending_dialog_lock = threading.Lock() # Guards _pending_dialog + _pending_actions dict +``` + +### Pattern C: Condition-Variable Dialogs (Bidirectional Blocking) + +Used for Human-in-the-Loop (HITL) approval. Background thread blocks on `threading.Condition`; GUI thread signals after user action. See the [HITL section](#the-execution-clutch-human-in-the-loop) below. --- -## 4. The Execution Clutch: Human-In-The-Loop (HITL) +## Event System -The "Execution Clutch" is our answer to the "Black Box" problem of AI. 
It allows you to shift from automatic execution to a manual, deterministic step-through mode. +Three classes in `events.py` (89 lines, no external dependencies beyond `asyncio` and `typing`): -### How the "Shifting" Works -When the AI requests a destructive action (like running a PowerShell script), the background execution thread is **suspended** using a `threading.Condition`: +### EventEmitter -1. **The Pause:** The thread enters a `.wait()` state. It is physically blocked. -2. **The Modal:** A task is sent to the GUI to open a modal dialog. -3. **The Mutation:** The user can read the script, edit it, or reject it. -4. **The Unleash:** When the user clicks "Approve," the GUI thread updates the shared state and calls `.notify_all()`. The background thread "wakes up," executes the (potentially modified) script, and reports the result back to the AI. +```python +class EventEmitter: + _listeners: Dict[str, List[Callable]] + + def on(self, event_name: str, callback: Callable) -> None + def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None +``` + +Synchronous pub-sub. Callbacks execute in the caller's thread. Used by `ai_client.events` for lifecycle hooks (`request_start`, `response_received`, `tool_execution`). No thread safety — relies on consistent single-thread usage. + +### AsyncEventQueue + +Described above in Pattern A. + +### UserRequestEvent + +```python +class UserRequestEvent: + prompt: str # User's raw input text + stable_md: str # Generated markdown context (files, screenshots) + file_items: List[Any] # File attachment items for dynamic refresh + disc_text: str # Serialized discussion history + base_dir: str # Working directory for shell commands + + def to_dict(self) -> Dict[str, Any] +``` + +Pure data carrier. Created on the GUI thread in `_handle_generate_send`, consumed on the asyncio thread in `_handle_request_event`. --- -## 5. 
Security: The MCP Allowlist +## Application Lifetime -To prevent "hallucinated" file access, every filesystem tool (read, list, search) is gated by the **MCP (Model Context Protocol) Bridge**: +### Boot Sequence -* **Resolution:** Every path requested by the AI is resolved to an absolute path. -* **Checking:** It is verified against the project's `base_dir`. If the AI tries to `read_file("C:/Windows/System32/...")`, the bridge intercepts the call and returns an `ACCESS DENIED` error to the model before the OS is ever touched. +The `App.__init__` (lines 152-296) follows this precise order: + +1. **Config hydration**: Reads `config.toml` (global) and `.toml` (local). Builds the initial "world view" — tracked files, discussion history, active models. +2. **Thread bootstrapping**: + - Asyncio event loop thread starts (`_loop_thread`). + - `HookServer` starts as a daemon if `test_hooks_enabled` or provider is `gemini_cli`. +3. **Callback wiring** (`_init_ai_and_hooks`): Connects `ai_client.confirm_and_run_callback`, `comms_log_callback`, `tool_log_callback` to GUI handlers. +4. **UI entry**: Main thread enters `immapp.run()`. GUI is now alive; background threads are ready. + +### Shutdown Sequence + +When `immapp.run()` returns (user closed window): + +1. `hook_server.stop()` — shuts down HTTP server, joins thread. +2. `perf_monitor.stop()`. +3. `ai_client.cleanup()` — destroys server-side API caches (Gemini `CachedContent`). +4. **Dual-Flush persistence**: `_flush_to_project()`, `_save_active_project()`, `_flush_to_config()`, `save_config()` — commits state back to both project and global configs. +5. `session_logger.close_session()`. + +The asyncio loop thread is a daemon — it dies with the process. 
`App.shutdown()` exists for explicit cleanup in test scenarios: + +```python +def shutdown(self) -> None: + if self._loop.is_running(): + self._loop.call_soon_threadsafe(self._loop.stop) + if self._loop_thread.is_alive(): + self._loop_thread.join(timeout=2.0) +``` --- -## 6. Telemetry & Auditing +## The Task Pipeline: Producer-Consumer Synchronization -Every interaction in Manual Slop is designed to be auditable: -* **JSON-L Comms Logs:** Raw API traffic is logged for debugging and token cost analysis. -* **Generated Scripts:** Every script that passes through the "Clutch" is saved to `scripts/generated/`. -* **Performance Monitor:** Real-time metrics (FPS, Frame Time, Input Lag) are tracked and can be queried via the Hook API to ensure the UI remains "fluid" under load. +### Request Flow + +``` +GUI Thread Asyncio Thread GUI Thread (next frame) +────────── ────────────── ────────────────────── +1. User clicks "Gen + Send" +2. _handle_generate_send(): + - Compiles md context + - Creates UserRequestEvent + - Enqueues via + run_coroutine_threadsafe ──> 3. _process_event_queue(): + awaits event_queue.get() + routes "user_request" to + _handle_request_event() + 4. Configures ai_client + 5. ai_client.send() BLOCKS + (seconds to minutes) + 6. On completion, enqueues + "response" event back ──> 7. 
_process_pending_gui_tasks(): + Drains task list under lock + Sets ai_response text + Triggers terminal blink +``` + +### Event Types Routed by `_process_event_queue` + +| Event Name | Action | +|---|---| +| `"user_request"` | Calls `_handle_request_event(payload)` — synchronous blocking AI call | +| `"response"` | Appends `{"action": "handle_ai_response", ...}` to `_pending_gui_tasks` | +| `"mma_state_update"` | Appends `{"action": "mma_state_update", ...}` to `_pending_gui_tasks` | +| `"mma_spawn_approval"` | Appends the raw payload for HITL dialog creation | +| `"mma_step_approval"` | Appends the raw payload for HITL dialog creation | + +The pattern: events arriving on the asyncio thread that need GUI state changes are **serialized into `_pending_gui_tasks`** for consumption on the next render frame. + +### Frame-Sync Mechanism: `_process_pending_gui_tasks` + +Called once per ImGui frame on the **main GUI thread**. This is the sole safe point for mutating GUI-visible state. + +**Locking strategy** — copy-and-clear: + +```python +def _process_pending_gui_tasks(self) -> None: + if not self._pending_gui_tasks: + return + with self._pending_gui_tasks_lock: + tasks = self._pending_gui_tasks[:] # Snapshot + self._pending_gui_tasks.clear() # Release lock fast + for task in tasks: + # Process each task outside the lock +``` + +Acquires the lock briefly to snapshot the task list, then processes outside the lock. Minimizes lock contention with producer threads. 
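The full producer/consumer handshake can be sketched as a standalone model. This is an illustration, not the actual `gui_2.py` code — `pending_tasks`, `post_task`, and `drain_tasks` are hypothetical stand-ins for the app's `_pending_gui_tasks` machinery:

```python
import threading

# Illustrative stand-ins for App._pending_gui_tasks / _pending_gui_tasks_lock.
pending_tasks: list[dict] = []
pending_tasks_lock = threading.Lock()

def post_task(action: str, payload: str) -> None:
    # Producer (any background thread): serialize the update as a task dict.
    with pending_tasks_lock:
        pending_tasks.append({"action": action, "payload": payload})

def drain_tasks() -> list[dict]:
    # Consumer (GUI thread, once per frame): unlocked fast-path check,
    # then snapshot under the lock and process outside it.
    if not pending_tasks:
        return []
    with pending_tasks_lock:
        tasks = pending_tasks[:]   # Snapshot
        pending_tasks.clear()      # Release lock fast
    return tasks

t = threading.Thread(target=post_task, args=("handle_ai_response", "done"))
t.start()
t.join()
frame_tasks = drain_tasks()
```

Note that the unlocked emptiness check is a benign race: at worst, a task posted mid-frame waits one extra frame before being drained.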
+ +### Complete Action Type Catalog + +| Action | Source | Effect | +|---|---|---| +| `"refresh_api_metrics"` | asyncio/hooks | Updates API metrics display | +| `"handle_ai_response"` | asyncio | Sets `ai_response`, `ai_status`, `mma_streams[stream_id]`; triggers blink; optionally auto-adds to discussion history | +| `"show_track_proposal"` | asyncio | Sets `proposed_tracks` list, opens modal | +| `"mma_state_update"` | asyncio | Updates `mma_status`, `active_tier`, `mma_tier_usage`, `active_tickets`, `active_track` | +| `"set_value"` | HookServer | Sets any field in `_settable_fields` map via `setattr`; special-cases `current_provider`/`current_model` to reconfigure AI client | +| `"click"` | HookServer | Dispatches to `_clickable_actions` map; introspects signatures to decide whether to pass `user_data` | +| `"select_list_item"` | HookServer | Routes to `_switch_discussion()` for discussion listbox | +| `{"type": "ask"}` | HookServer | Opens ask dialog: sets `_pending_ask_dialog = True`, stores `_ask_request_id` and `_ask_tool_data` | +| `"clear_ask"` | HookServer | Clears ask dialog state if request_id matches | +| `"custom_callback"` | HookServer | Executes an arbitrary callable with args | +| `"mma_step_approval"` | asyncio (MMA engine) | Creates `MMAApprovalDialog`, stores in `_pending_mma_approval` | +| `"mma_spawn_approval"` | asyncio (MMA engine) | Creates `MMASpawnApprovalDialog`, stores in `_pending_mma_spawn` | +| `"refresh_from_project"` | HookServer/internal | Reloads all UI state from project dict | + +--- + +## The Execution Clutch: Human-in-the-Loop + +The "Execution Clutch" ensures every destructive AI action passes through an auditable human gate. Three dialog types implement this, all sharing the same blocking pattern. 
+ +### Dialog Classes + +**`ConfirmDialog`** — PowerShell script execution approval: + +```python +class ConfirmDialog: + _uid: str # uuid4 identifier + _script: str # The PowerShell script text (editable) + _base_dir: str # Working directory + _condition: threading.Condition # Blocking primitive + _done: bool # Signal flag + _approved: bool # User's decision + + def wait(self) -> tuple[bool, str] # Blocks until _done; returns (approved, script) +``` + +**`MMAApprovalDialog`** — MMA tier step approval: + +```python +class MMAApprovalDialog: + _ticket_id: str + _payload: str # The step payload (editable) + _condition: threading.Condition + _done: bool + _approved: bool + + def wait(self) -> tuple[bool, str] # Returns (approved, payload) +``` + +**`MMASpawnApprovalDialog`** — Sub-agent spawn approval: + +```python +class MMASpawnApprovalDialog: + _ticket_id: str + _role: str # tier3-worker, tier4-qa, etc. + _prompt: str # Spawn prompt (editable) + _context_md: str # Context document (editable) + _condition: threading.Condition + _done: bool + _approved: bool + _abort: bool # Can abort entire track + + def wait(self) -> dict[str, Any] # Returns {approved, abort, prompt, context_md} +``` + +### Blocking Flow + +Using `ConfirmDialog` as exemplar: + +``` + ASYNCIO THREAD (ai_client tool callback) GUI MAIN THREAD + ───────────────────────────────────────── ─────────────── + 1. ai_client calls _confirm_and_run(script) + 2. Creates ConfirmDialog(script, base_dir) + 3. Stores dialog: + - Headless: _pending_actions[uid] = dialog + - GUI mode: _pending_dialog = dialog + 4. If test_hooks_enabled: + pushes to _api_event_queue + 5. dialog.wait() BLOCKS on _condition + 6. Next frame: ImGui renders + _pending_dialog in modal + 7. User clicks Approve/Reject + 8. _handle_approve_script(): + with dialog._condition: + dialog._approved = True + dialog._done = True + dialog._condition.notify_all() + 9. wait() returns (True, potentially_edited_script) + 10. 
Executes shell_runner.run_powershell() + 11. Returns output to ai_client +``` + +The `_condition.wait(timeout=0.1)` uses a 100ms polling interval inside a loop — a polling-with-condition hybrid that ensures the blocking thread wakes periodically. + +### Resolution Paths + +**GUI button path** (normal interactive use): +`_handle_approve_script()` / `_handle_approve_mma_step()` / `_handle_approve_spawn()` directly manipulate the dialog's condition variable from the GUI thread. + +**HTTP API path** (headless/automation): +`resolve_pending_action(action_id, approved)` looks up the dialog by UUID in `_pending_actions` dict (headless) or `_pending_dialog` (GUI), then signals the condition: + +```python +def resolve_pending_action(self, action_id: str, approved: bool) -> bool: + with self._pending_dialog_lock: + if action_id in self._pending_actions: + dialog = self._pending_actions[action_id] + with dialog._condition: + dialog._approved = approved + dialog._done = True + dialog._condition.notify_all() + return True +``` + +**MMA approval path**: +`_handle_mma_respond(approved, payload, abort, prompt, context_md)` is the unified resolver. It uses a `dialog_container` — a one-element list `[None]` used as a mutable reference shared between the MMA engine (which creates the container) and the GUI (which populates it via `_process_pending_gui_tasks`). + +--- + +## AI Client: Multi-Provider Architecture + +`ai_client.py` operates as a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer. 
+ +### Module-Level State + +```python +_provider: str = "gemini" # "gemini" | "anthropic" | "deepseek" | "gemini_cli" +_model: str = "gemini-2.5-flash-lite" +_temperature: float = 0.0 +_max_tokens: int = 8192 +_history_trunc_limit: int = 8000 # Char limit for truncating old tool outputs + +_send_lock: threading.Lock # Serializes ALL send() calls across providers +``` + +Per-provider client objects: + +```python +# Gemini (SDK-managed stateful chat) +_gemini_client: genai.Client | None +_gemini_chat: Any # Holds history internally +_gemini_cache: Any # Server-side CachedContent +_gemini_cache_md_hash: int | None # For cache invalidation +_GEMINI_CACHE_TTL: int = 3600 # 1-hour; rebuilt at 90% (3240s) + +# Anthropic (client-managed history) +_anthropic_client: anthropic.Anthropic | None +_anthropic_history: list[dict] # Mutable [{role, content}, ...] +_anthropic_history_lock: threading.Lock + +# DeepSeek (raw HTTP, client-managed history) +_deepseek_history: list[dict] +_deepseek_history_lock: threading.Lock + +# Gemini CLI (adapter wrapper) +_gemini_cli_adapter: GeminiCliAdapter | None +``` + +Safety limits: + +```python +MAX_TOOL_ROUNDS: int = 10 # Max tool-call loop iterations per send() +_MAX_TOOL_OUTPUT_BYTES: int = 500_000 # 500KB cumulative tool output budget +_ANTHROPIC_CHUNK_SIZE: int = 120_000 # Max chars per system text block +_ANTHROPIC_MAX_PROMPT_TOKENS: int = 180_000 # 200k limit minus headroom +_GEMINI_MAX_INPUT_TOKENS: int = 900_000 # 1M window minus headroom +``` + +### The `send()` Dispatcher + +```python +def send(md_content, user_message, base_dir=".", file_items=None, + discussion_history="", stream=False, + pre_tool_callback=None, qa_callback=None) -> str: + with _send_lock: + if _provider == "gemini": return _send_gemini(...) + elif _provider == "gemini_cli": return _send_gemini_cli(...) + elif _provider == "anthropic": return _send_anthropic(...) 
+ elif _provider == "deepseek": return _send_deepseek(..., stream=stream) +``` + +`_send_lock` serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. Return type is always `str`. + +### Provider Comparison + +| Aspect | Gemini SDK | Anthropic | DeepSeek | Gemini CLI | +|---|---|---|---|---| +| **Client** | `genai.Client` | `anthropic.Anthropic` | Raw `requests.post` | `GeminiCliAdapter` (subprocess) | +| **History** | SDK-managed (`_gemini_chat._history`) | Client-managed list | Client-managed list | CLI-managed (session ID) | +| **Caching** | Server-side `CachedContent` with TTL | Prompt caching via `cache_control: ephemeral` (4 breakpoints) | None | None | +| **Tool format** | `types.FunctionDeclaration` | JSON Schema dict | Not declared | Same as SDK via adapter | +| **Tool results** | `Part.from_function_response(response={"output": ...})` | `{"type": "tool_result", "tool_use_id": ..., "content": ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` | `{"role": "tool", ...}` | +| **History trimming** | In-place at 40% of 900K token estimate | 2-phase: strip stale file refreshes, then drop turn pairs at 180K | None | None | +| **Streaming** | No | No | Yes | No | + +### Tool-Call Loop (common pattern across providers) + +All providers follow the same high-level loop, iterated up to `MAX_TOOL_ROUNDS + 2` times: + +1. Send message (or tool results from prior round) to API. +2. Extract text response and any function calls. +3. Log to comms log; emit events. +4. If no function calls or max rounds exceeded: **break**. +5. For each function call: + - If `pre_tool_callback` rejects: return rejection text. + - Dispatch to `mcp_client.dispatch()` or `shell_runner.run_powershell()`. + - After the **last** call of this round: run `_reread_file_items()` for context refresh. + - Truncate tool output at `_history_trunc_limit` chars. + - Accumulate `_cumulative_tool_bytes`. +6. 
If cumulative bytes > 500KB: inject warning. +7. Package tool results in provider-specific format; loop. + +### Context Refresh Mechanism + +After the last tool call in each round, `_reread_file_items(file_items)` checks mtimes of all tracked files: + +1. For each file item: compare `Path.stat().st_mtime` against stored `mtime`. +2. If unchanged: pass through as-is. +3. If changed: re-read content, store `old_content` for diffing, update `mtime`. +4. Changed files are diffed via `_build_file_diff_text`: + - Files <= 200 lines: emit full content. + - Files > 200 lines with `old_content`: emit `difflib.unified_diff`. +5. Diff is appended to the last tool's output as `[SYSTEM: FILES UPDATED]\n\n{diff}`. +6. Stale `[FILES UPDATED]` blocks are stripped from older history turns by `_strip_stale_file_refreshes` to prevent context bloat. + +### Anthropic Cache Strategy (4-Breakpoint System) + +Anthropic allows a maximum of 4 `cache_control: ephemeral` breakpoints: + +| # | Location | Purpose | +|---|---|---| +| 1 | Last block of stable system prompt | Cache base instructions | +| 2 | Last block of context chunks | Cache file context | +| 3 | Last tool definition | Cache tool schema | +| 4 | Second-to-last user message | Cache conversation prefix | + +Before placing breakpoint 4, all existing `cache_control` is stripped from history to prevent exceeding the limit. + +### Gemini Cache Strategy (Server-Side TTL) + +System instruction content is hashed. On each call, a 3-way decision: + +- **Hash changed**: Delete old cache, rebuild with new content. +- **Cache age > 90% of TTL**: Proactive renewal (delete + rebuild). +- **No cache exists**: Create new `CachedContent` if token count >= 2048; otherwise inline. 
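
The 3-way decision reduces to a small pure function. A sketch under assumed names (the real module tracks this state in globals rather than passing it as arguments):

```python
import time

GEMINI_CACHE_TTL = 3600      # 1 hour
RENEW_FRACTION = 0.9         # Proactive renewal at 90% of TTL (3240s)
MIN_CACHE_TOKENS = 2048      # Below this, inline instead of caching

def cache_decision(content_hash, cached_hash, cache_created_at, token_count, now=None):
    """Return one of: 'rebuild', 'renew', 'reuse', 'create', 'inline'."""
    now = time.time() if now is None else now
    if cached_hash is not None:
        if content_hash != cached_hash:
            return "rebuild"                     # Hash changed: delete + rebuild
        if now - cache_created_at > GEMINI_CACHE_TTL * RENEW_FRACTION:
            return "renew"                       # Age > 90% of TTL: delete + rebuild
        return "reuse"
    if token_count >= MIN_CACHE_TOKENS:
        return "create"                          # New server-side CachedContent
    return "inline"                              # Too small to be worth caching

assert cache_decision("h2", "h1", 0, 5000, now=10) == "rebuild"
assert cache_decision("h1", "h1", 0, 5000, now=3300) == "renew"
assert cache_decision("h1", "h1", 0, 5000, now=100) == "reuse"
assert cache_decision("h1", None, 0, 5000) == "create"
assert cache_decision("h1", None, 0, 100) == "inline"
```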
+ +--- + +## Comms Log System + +Every API interaction is logged to a module-level list with real-time GUI push: + +```python +def _append_comms(direction: str, kind: str, payload: dict[str, Any]) -> None: + entry = { + "ts": datetime.now().strftime("%H:%M:%S"), + "direction": direction, # "OUT" (to API) or "IN" (from API) + "kind": kind, # "request" | "response" | "tool_call" | "tool_result" + "provider": _provider, + "model": _model, + "payload": payload, + } + _comms_log.append(entry) + if comms_log_callback: + comms_log_callback(entry) # Real-time push to GUI +``` + +--- + +## State Machines + +### `ai_status` (Informal) + +``` +"idle" -> "sending..." -> [AI call in progress] + -> "running powershell..." -> "powershell done, awaiting AI..." + -> "fetching url..." | "searching web..." + -> "done" | "error" + -> "idle" (on reset) +``` + +### HITL Dialog State (Binary per type) + +- `_pending_dialog is not None` — script confirmation active +- `_pending_mma_approval is not None` — MMA step approval active +- `_pending_mma_spawn is not None` — spawn approval active +- `_pending_ask_dialog == True` — tool ask dialog active + +--- + +## Security: The MCP Allowlist + +Every filesystem tool (read, list, search, write) is gated by the MCP Bridge (`mcp_client.py`). See [guide_tools.md](guide_tools.md) for the complete security model, tool inventory, and endpoint reference. + +Summary: Every path is resolved to an absolute path and checked against a dynamically-built allowlist constructed from the project's tracked files and base directories. Files named `history.toml` or `*_history.toml` are hard-blacklisted. + +--- + +## Telemetry & Auditing + +Every interaction is designed to be auditable: + +- **JSON-L Comms Logs**: Raw API traffic logged to `logs/sessions//comms.log` for debugging and token cost analysis. +- **Tool Call Logs**: Markdown-formatted sequential records to `toolcalls.log`. 
+- **Generated Scripts**: Every PowerShell script that passes through the Execution Clutch is saved to `scripts/generated/_.ps1`. +- **API Hook Logs**: All HTTP hook invocations logged to `apihooks.log`. +- **CLI Call Logs**: Subprocess execution details (command, stdin, stdout, stderr, latency) to `clicalls.log` as JSON-L. +- **Performance Monitor**: Real-time FPS, Frame Time, CPU, Input Lag tracked and queryable via Hook API. + +--- + +## Architectural Invariants + +1. **Single-writer principle**: All GUI state mutations happen on the main thread via `_process_pending_gui_tasks`. Background threads never write GUI state directly. +2. **Copy-and-clear lock pattern**: `_process_pending_gui_tasks` snapshots and clears the task list under the lock, then processes outside the lock. +3. **Context Amnesia**: Each MMA Tier 3 Worker starts with `ai_client.reset_session()`. No conversational bleed between tickets. +4. **Send serialization**: `_send_lock` ensures only one provider call is in-flight at a time across all threads. +5. **Dual-Flush persistence**: On exit, state is committed to both project-level and global-level config files. diff --git a/docs/guide_mma.md b/docs/guide_mma.md new file mode 100644 index 0000000..fac825e --- /dev/null +++ b/docs/guide_mma.md @@ -0,0 +1,368 @@ +# MMA: 4-Tier Multi-Model Agent Orchestration + +[Top](../Readme.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [Simulations](guide_simulations.md) + +--- + +## Overview + +The MMA (Multi-Model Agent) system is a hierarchical task decomposition and execution engine. A high-level "epic" is broken into tracks, tracks are decomposed into tickets with dependency relationships, and tickets are executed by stateless workers with human-in-the-loop approval at every destructive boundary. 
+
+```
+Tier 1: Orchestrator — product alignment, epic → tracks
+Tier 2: Tech Lead — track → tickets (DAG), architectural oversight
+Tier 3: Worker — stateless TDD implementation per ticket
+Tier 4: QA — stateless error analysis, no fixes
+```
+
+---
+
+## Data Structures (`models.py`)
+
+### Ticket
+
+The atomic unit of work. All MMA execution revolves around transitioning tickets through their state machine.
+
+```python
+@dataclass
+class Ticket:
+    id: str                            # e.g., "T-001"
+    description: str                   # Human-readable task description
+    status: str                        # "todo" | "in_progress" | "completed" | "blocked"
+    assigned_to: str                   # Tier assignment: "tier3-worker", "tier4-qa"
+    target_file: Optional[str] = None  # File this ticket modifies
+    context_requirements: List[str] = field(default_factory=list)  # Files needed for context injection
+    depends_on: List[str] = field(default_factory=list)            # Ticket IDs that must complete first
+    blocked_reason: Optional[str] = None  # Why this ticket is blocked
+    step_mode: bool = False            # If True, requires manual approval before execution
+
+    def mark_blocked(self, reason: str) -> None  # Sets status="blocked", stores reason
+    def mark_complete(self) -> None              # Sets status="completed"
+    def to_dict(self) -> Dict[str, Any]
+    @classmethod
+    def from_dict(cls, data) -> "Ticket"
+```
+
+**Status state machine:**
+
+```
+todo ──> in_progress ──> completed
+  |           |
+  v           v
+blocked    blocked
+```
+
+### Track
+
+A collection of tickets with a shared goal.
+
+```python
+@dataclass
+class Track:
+    id: str           # Track identifier
+    description: str  # Track-level brief
+    tickets: List[Ticket] = field(default_factory=list)  # Ordered list of tickets
+
+    def get_executable_tickets(self) -> List[Ticket]
+        # Returns all 'todo' tickets whose depends_on are all 'completed'
+```
+
+### WorkerContext
+
+```python
+@dataclass
+class WorkerContext:
+    ticket_id: str        # Which ticket this worker is processing
+    model_name: str       # LLM model to use (e.g., "gemini-2.5-flash-lite")
+    messages: List[dict]  # Conversation history for this worker
+```
+
+---
+
+## DAG Engine (`dag_engine.py`)
+
+Two classes: `TrackDAG` (graph) and `ExecutionEngine` (state machine).
+
+### TrackDAG
+
+```python
+class TrackDAG:
+    def __init__(self, tickets: List[Ticket]):
+        self.tickets = tickets
+        self.ticket_map = {t.id: t for t in tickets}  # O(1) lookup by ID
+```
+
+**`get_ready_tasks()`**: Returns tickets where `status == 'todo'` AND all `depends_on` have `status == 'completed'`. Missing dependencies are treated as NOT completed (fail-safe).
+
+**`has_cycle()`**: Classic DFS cycle detection using a visited set plus a recursion stack:
+
+```python
+def has_cycle(self) -> bool:
+    visited = set()
+    rec_stack = set()
+    def is_cyclic(ticket_id):
+        if ticket_id in rec_stack: return True  # Back edge = cycle
+        if ticket_id in visited: return False   # Already explored
+        visited.add(ticket_id)
+        rec_stack.add(ticket_id)
+        for neighbor in self.ticket_map[ticket_id].depends_on:
+            # Unknown ticket IDs cannot contribute to a cycle
+            if neighbor in self.ticket_map and is_cyclic(neighbor):
+                return True
+        rec_stack.remove(ticket_id)
+        return False
+    for ticket in self.tickets:
+        if ticket.id not in visited:
+            if is_cyclic(ticket.id): return True
+    return False
+```
+
+**`topological_sort()`**: Calls `has_cycle()` first — raises `ValueError` if a cycle is found. Standard DFS post-order topological sort. Returns a list of ticket ID strings in dependency order.
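
The fail-safe ready-task rule and the sort order are easy to verify with a toy graph. A self-contained illustration over plain dicts (the real engine uses `Ticket` objects and its own DFS; the stdlib `graphlib` is used here only as an equivalent check):

```python
from graphlib import TopologicalSorter

tickets = {
    "T-001": {"status": "completed", "depends_on": []},
    "T-002": {"status": "todo", "depends_on": ["T-001"]},
    "T-003": {"status": "todo", "depends_on": ["T-002"]},
    "T-004": {"status": "todo", "depends_on": ["T-999"]},  # Dangling dependency
}

def ready(tid):
    t = tickets[tid]
    # A missing dependency is treated as NOT completed (fail-safe).
    return t["status"] == "todo" and all(
        tickets.get(d, {}).get("status") == "completed" for d in t["depends_on"]
    )

# Only T-002 is eligible: T-003 waits on T-002, T-004 waits on a missing ticket.
assert [tid for tid in tickets if ready(tid)] == ["T-002"]

# Dependency-order sort: predecessors always come before dependents.
graph = {tid: set(t["depends_on"]) for tid, t in tickets.items()}
order = list(TopologicalSorter(graph).static_order())
assert order.index("T-001") < order.index("T-002") < order.index("T-003")
```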
+ +### ExecutionEngine + +```python +class ExecutionEngine: + def __init__(self, dag: TrackDAG, auto_queue: bool = False): + self.dag = dag + self.auto_queue = auto_queue +``` + +**`tick()`** — the heartbeat. On each call: +1. Queries `dag.get_ready_tasks()` for eligible tickets. +2. If `auto_queue` is enabled: non-`step_mode` tasks are automatically promoted to `in_progress`. +3. `step_mode` tasks remain in `todo` until `approve_task()` is called. +4. Returns the list of ready tasks. + +**`approve_task(task_id)`**: Manually transitions `todo` → `in_progress` if all dependencies are met. + +**`update_task_status(task_id, status)`**: Force-sets status (used by workers to mark `completed` or `blocked`). + +--- + +## ConductorEngine (`multi_agent_conductor.py`) + +The Tier 2 orchestrator. Owns the execution loop that drives tickets through the DAG. + +```python +class ConductorEngine: + def __init__(self, track: Track, event_queue=None, auto_queue=False): + self.track = track + self.event_queue = event_queue + self.tier_usage = { + "Tier 1": {"input": 0, "output": 0}, + "Tier 2": {"input": 0, "output": 0}, + "Tier 3": {"input": 0, "output": 0}, + "Tier 4": {"input": 0, "output": 0}, + } + self.dag = TrackDAG(self.track.tickets) + self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue) +``` + +### State Broadcast (`_push_state`) + +On every state change, the engine pushes the full orchestration state to the GUI via `AsyncEventQueue`: + +```python +async def _push_state(self, status="running", active_tier=None): + payload = { + "status": status, # "running" | "done" | "blocked" + "active_tier": active_tier, # e.g., "Tier 2 (Tech Lead)", "Tier 3 (Worker): T-001" + "tier_usage": self.tier_usage, + "track": {"id": self.track.id, "title": self.track.description}, + "tickets": [asdict(t) for t in self.track.tickets] + } + await self.event_queue.put("mma_state_update", payload) +``` + +This payload is consumed by the GUI's `_process_pending_gui_tasks` handler for 
`"mma_state_update"`, which updates `mma_status`, `active_tier`, `mma_tier_usage`, `active_tickets`, and `active_track`. + +### Ticket Ingestion (`parse_json_tickets`) + +Parses a JSON array of ticket dicts (from Tier 2 LLM output) into `Ticket` objects, appends to `self.track.tickets`, then rebuilds the `TrackDAG` and `ExecutionEngine`. + +### Main Execution Loop (`run`) + +```python +async def run(self): + while True: + ready_tasks = self.engine.tick() + + if not ready_tasks: + if all tickets completed: + await self._push_state("done") + break + if any in_progress: + await asyncio.sleep(1) # Waiting for async workers + continue + else: + await self._push_state("blocked") + break + + for ticket in ready_tasks: + if in_progress or (auto_queue and not step_mode): + ticket.status = "in_progress" + await self._push_state("running", f"Tier 3 (Worker): {ticket.id}") + + # Create worker context + context = WorkerContext( + ticket_id=ticket.id, + model_name="gemini-2.5-flash-lite", + messages=[] + ) + + # Execute in thread pool (blocking AI call) + await loop.run_in_executor( + None, run_worker_lifecycle, ticket, context, ... + ) + + await self._push_state("running", "Tier 2 (Tech Lead)") + + elif todo and (step_mode or not auto_queue): + await self._push_state("running", f"Awaiting Approval: {ticket.id}") + await asyncio.sleep(1) # Pause for HITL approval +``` + +--- + +## Tier 2: Tech Lead (`conductor_tech_lead.py`) + +The Tier 2 AI call converts a high-level Track brief into discrete Tier 3 tickets. + +### `generate_tickets(track_brief, module_skeletons) -> list[dict]` + +```python +def generate_tickets(track_brief: str, module_skeletons: str) -> list[dict]: + system_prompt = mma_prompts.PROMPTS.get("tier2_sprint_planning") + user_message = ( + f"### TRACK BRIEF:\n{track_brief}\n\n" + f"### MODULE SKELETONS:\n{module_skeletons}\n\n" + "Please generate the implementation tickets for this track." 
+ ) + # Temporarily override system prompt + old_system_prompt = ai_client._custom_system_prompt + ai_client.set_custom_system_prompt(system_prompt) + try: + response = ai_client.send(md_content="", user_message=user_message) + # Multi-layer JSON extraction: + # 1. Try ```json ... ``` blocks + # 2. Try ``` ... ``` blocks + # 3. Regex search for [ { ... } ] pattern + tickets = json.loads(json_match) + return tickets + finally: + ai_client.set_custom_system_prompt(old_system_prompt) +``` + +The JSON extraction is defensive — handles markdown code fences, bare JSON, and regex fallback for embedded arrays. + +### `topological_sort(tickets: list[dict]) -> list[dict]` + +Convenience wrapper: converts raw dicts to `Ticket` objects, builds a `TrackDAG`, calls `dag.topological_sort()`, returns the original dicts reordered by sorted IDs. + +--- + +## Tier 3: Worker Lifecycle (`run_worker_lifecycle`) + +This free function executes a single ticket. Key behaviors: + +### Context Amnesia + +```python +ai_client.reset_session() # Each ticket starts with a clean slate +``` + +No conversational bleed between tickets. Every worker is stateless. + +### Context Injection + +For `context_requirements` files: +- First file: `parser.get_curated_view(content)` — full skeleton with `@core_logic` and `[HOT]` bodies preserved. +- Subsequent files: `parser.get_skeleton(content)` — cheaper, signatures + docstrings only. + +### Prompt Construction + +```python +user_message = ( + f"You are assigned to Ticket {ticket.id}.\n" + f"Task Description: {ticket.description}\n" + f"\nContext Files:\n{context_injection}\n" + "Please complete this task. If you are blocked and cannot proceed, " + "start your response with 'BLOCKED' and explain why." +) +``` + +### HITL Clutch Integration + +If `event_queue` is provided, `confirm_spawn()` is called before executing, allowing the user to: +- Read the prompt and context. +- Edit both the prompt and context markdown. 
+- Approve, reject, or abort the entire track. + +The `confirm_spawn` function uses the `dialog_container` pattern: + +1. Create `dialog_container = [None]` (mutable container for thread communication). +2. Push `"mma_spawn_approval"` task to event queue with the container. +3. Poll `dialog_container[0]` every 100ms for up to 60 seconds. +4. When the GUI fills in the dialog, call `.wait()` to get the result. +5. Returns `(approved, modified_prompt, modified_context)`. + +--- + +## Tier 4: QA Error Analysis + +Stateless error analysis. Invoked via the `qa_callback` parameter in `shell_runner.run_powershell()` when a command fails. + +```python +def run_tier4_analysis(error_message: str) -> str: + """Stateless Tier 4 QA analysis of an error message.""" + # Uses a dedicated system prompt for error triage + # Returns analysis text (root cause, suggested fix) + # Does NOT modify any code — analysis only +``` + +Integrated directly into the shell execution pipeline: if `qa_callback` is provided and the command has non-zero exit or stderr output, the callback result is appended to the tool output as `QA ANALYSIS:\n`. + +--- + +## Cross-System Data Flow + +The full MMA lifecycle from epic to completion: + +1. **Tier 1 (Orchestrator)**: User enters an epic description in the GUI. Creates a `Track` with a brief. +2. **Tier 2 (Tech Lead)**: `conductor_tech_lead.generate_tickets()` calls `ai_client.send()` with the `tier2_sprint_planning` prompt, producing a JSON ticket list. +3. **Ingestion**: `ConductorEngine.parse_json_tickets()` ingests the JSON, builds `Ticket` objects, constructs `TrackDAG` + `ExecutionEngine`. +4. **Execution loop**: `ConductorEngine.run()` enters the async loop, calling `engine.tick()` each iteration. +5. **Worker dispatch**: For each ready ticket, `run_worker_lifecycle()` is called in a thread executor. It uses `ai_client.send()` with MCP tools (dispatched through `mcp_client.dispatch()`). +6. 
**Security enforcement**: MCP tools enforce the allowlist via `_resolve_and_check()` on every filesystem operation. +7. **State broadcast**: `_push_state()` → `AsyncEventQueue` → GUI renders DAG + ticket status. +8. **External visibility**: `ApiHookClient.get_mma_status()` queries the Hook API for the full orchestration state. +9. **HITL gates**: `confirm_spawn()` pushes to event queue → GUI renders dialog → user approves/edits → `dialog_container[0].wait()` returns the decision. + +--- + +## Token Firewalling + +Each tier operates within its own token budget: + +- **Tier 3 workers** use lightweight models (default: `gemini-2.5-flash-lite`) and receive only the files listed in `context_requirements`. +- **Context Amnesia** ensures no accumulated history bleeds between tickets. +- **Tier 2** tracks cumulative `tier_usage` per tier: `{"input": N, "output": N}` for token cost monitoring. +- **First file vs subsequent files**: The first `context_requirements` file gets a curated view (preserving hot paths); subsequent files get only skeletons. + +--- + +## Track State Persistence + +Track state can be persisted to disk via `project_manager.py`: + +``` +conductor/tracks// + spec.md # Track specification (human-authored) + plan.md # Implementation plan with checkbox tasks + metadata.json # Track metadata (id, type, status, timestamps) + state.toml # Structured TrackState with task list +``` + +`project_manager.get_all_tracks(base_dir)` scans the tracks directory with a three-tier metadata fallback: +1. `state.toml` (structured `TrackState`) — counts tasks with `status == "completed"`. +2. `metadata.json` (legacy) — gets id/title/status only. +3. `plan.md` (regex) — counts `- [x]` vs `- [ ]` checkboxes for progress. 
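
The tier-3 checkbox fallback amounts to counting task markers in `plan.md`. A sketch of that regex pass (the pattern shown is an assumption; the real parser lives in `project_manager.py`):

```python
import re

def plan_progress(plan_md: str) -> tuple[int, int]:
    """Return (completed, total) counted from markdown checkboxes."""
    done = len(re.findall(r"^\s*- \[[xX]\] ", plan_md, flags=re.MULTILINE))
    todo = len(re.findall(r"^\s*- \[ \] ", plan_md, flags=re.MULTILINE))
    return done, done + todo

plan = """# Plan
- [x] Scaffold module
- [x] Write tests
- [ ] Implement feature
"""
assert plan_progress(plan) == (2, 3)
```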
diff --git a/docs/guide_simulations.md b/docs/guide_simulations.md index 4916b4a..2e235dd 100644 --- a/docs/guide_simulations.md +++ b/docs/guide_simulations.md @@ -1,63 +1,377 @@ -# Manual Slop: Verification & Simulation Framework +# Verification & Simulation Framework -Detailed specification of the live GUI testing infrastructure, simulation lifecycle, and the mock provider strategy. +[Top](../Readme.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md) --- -## 1. Live GUI Verification Infrastructure - -To verify complex UI state and asynchronous interactions, Manual Slop employs a **Live Verification** strategy using the application's built-in API hooks. +## Infrastructure ### `--enable-test-hooks` -When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated visual verification. + +When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated verification. Without this flag, the Hook API is only available when the provider is `gemini_cli`. ### The `live_gui` pytest Fixture -Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test: -1. **Startup:** Spawns `gui_2.py` in a separate process with `--enable-test-hooks`. -2. **Telemetry:** Polls `/status` until the hook server is ready. -3. **Isolation:** Resets the AI session and clears comms logs between tests to prevent state pollution. -4. **Teardown:** Robustly kills the process tree on completion or failure. + +Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test. 
+ +**Spawning:** + +```python +@pytest.fixture(scope="session") +def live_gui() -> Generator[tuple[subprocess.Popen, str], None, None]: + process = subprocess.Popen( + ["uv", "run", "python", "-u", gui_script, "--enable-test-hooks"], + stdout=log_file, stderr=log_file, text=True, + creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if os.name == 'nt' else 0 + ) +``` + +- **`-u` flag**: Disables output buffering for real-time log capture. +- **Process group**: On Windows, uses `CREATE_NEW_PROCESS_GROUP` so the entire tree (GUI + child processes) can be killed cleanly. +- **Logging**: Stdout/stderr redirected to `logs/gui_2_py_test.log`. + +**Readiness polling:** + +```python +max_retries = 15 # seconds +while time.time() - start_time < max_retries: + response = requests.get("http://127.0.0.1:8999/status", timeout=0.5) + if response.status_code == 200: + ready = True; break + if process.poll() is not None: break # Process died early + time.sleep(0.5) +``` + +Polls `GET /status` every 500ms for up to 15 seconds. Checks `process.poll()` each iteration to detect early crashes (avoids waiting the full timeout if the GUI exits). Pre-check: tests if port 8999 is already occupied. + +**Failure path:** If the hook server never responds, kills the process tree and calls `pytest.fail()` to abort the entire test session. Diagnostic telemetry (startup time, PID, success/fail) is written via `VerificationLogger`. + +**Teardown:** + +```python +finally: + client = ApiHookClient() + client.reset_session() # Clean GUI state before killing + time.sleep(0.5) + kill_process_tree(process.pid) + log_file.close() +``` + +Sends `reset_session()` via `ApiHookClient` before killing to prevent stale state files. + +**Yield value:** `(process: subprocess.Popen, gui_script: str)`. 
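
The readiness loop generalizes to a small helper: poll a check, bail out early if the process dies, give up after a deadline. A sketch of that shape (not the fixture's literal code; the usage comment below names hypothetical variables):

```python
import time

def poll_until(check, *, timeout=15.0, interval=0.5, still_alive=lambda: True):
    """Poll check() until it returns True, the deadline passes,
    or still_alive() reports the subject process died early."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except Exception:
            pass                      # e.g. connection refused while server boots
        if not still_alive():
            return False              # Process exited: don't wait out the clock
        time.sleep(interval)
    return False

# Usage shape in the fixture (hypothetical names):
#   ready = poll_until(
#       lambda: requests.get(status_url, timeout=0.5).status_code == 200,
#       still_alive=lambda: process.poll() is None)

attempts = iter([False, False, True])
assert poll_until(lambda: next(attempts), timeout=2.0, interval=0.01) is True
assert poll_until(lambda: False, timeout=0.05, interval=0.01) is False
```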
+ +### Session Isolation + +```python +@pytest.fixture(autouse=True) +def reset_ai_client() -> Generator[None, None, None]: + ai_client.reset_session() + ai_client.set_provider("gemini", "gemini-2.5-flash-lite") + yield +``` + +Runs automatically before every test. Resets the `ai_client` module state and defaults to a safe model, preventing state pollution between tests. + +### Process Cleanup + +```python +def kill_process_tree(pid: int | None) -> None: +``` + +- **Windows**: `taskkill /F /T /PID ` — force-kills the process and all children (`/T` is critical since the GUI spawns child processes). +- **Unix**: `os.killpg(os.getpgid(pid), SIGKILL)` to kill the entire process group. + +### VerificationLogger + +Structured diagnostic logging for test telemetry: + +```python +class VerificationLogger: + def __init__(self, test_name: str, script_name: str): + self.logs_dir = Path(f"logs/test/{datetime.now().strftime('%Y%m%d_%H%M%S')}") + + def log_state(self, field: str, before: Any, after: Any, delta: Any = None) + def finalize(self, description: str, status: str, result_msg: str) +``` + +Output format: fixed-width column table (`Field | Before | After | Delta`) written to `logs/test//.txt`. Dual output: file + tagged stdout lines for CI visibility. --- -## 2. Simulation Lifecycle: The "Puppeteer" Pattern +## Simulation Lifecycle: The "Puppeteer" Pattern -Simulations (like `tests/visual_sim_mma_v2.py`) act as a "Puppeteer," driving the GUI through the `ApiHookClient`. +Simulations act as external puppeteers, driving the GUI through the `ApiHookClient` HTTP interface. The canonical example is `tests/visual_sim_mma_v2.py`. -### Phase 1: Environment Setup -* **Provider Mocking:** The simulation sets the `current_provider` to `gemini_cli` and redirects the `gcli_path` to a mock script (e.g., `tests/mock_gemini_cli.py`). -* **Workspace Isolation:** The `files_base_dir` is pointed to a temporary artifacts directory to prevent accidental modification of the host project. 
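
A puppeteer needs only a thin HTTP wrapper. The sketch below mirrors the `ApiHookClient` surface used by the simulations; the transport is injected so it can be exercised without a live server, and the endpoint paths are illustrative assumptions rather than the documented Hook API routes:

```python
import json
from urllib import request

class MiniHookClient:
    """Minimal stand-in for ApiHookClient (endpoint paths are illustrative)."""
    def __init__(self, base_url="http://127.0.0.1:8999", post=None):
        self.base_url = base_url
        self._post = post or self._http_post   # Injectable transport for tests

    def _http_post(self, path, payload):
        req = request.Request(self.base_url + path,
                              data=json.dumps(payload).encode(),
                              headers={"Content-Type": "application/json"})
        with request.urlopen(req, timeout=5) as resp:
            return json.loads(resp.read())

    def set_value(self, field, value):
        return self._post("/set_value", {"field": field, "value": value})

    def click(self, action, user_data=None):
        payload = {"action": action}
        if user_data is not None:
            payload["user_data"] = user_data
        return self._post("/click", payload)

# Exercised offline via an injected recorder instead of a live HookServer:
sent = []
client = MiniHookClient(post=lambda path, payload: sent.append((path, payload)))
client.set_value("current_provider", "gemini_cli")
client.click("btn_mma_load_track", user_data="track-1")
assert sent[0] == ("/set_value", {"field": "current_provider", "value": "gemini_cli"})
assert sent[1][1]["user_data"] == "track-1"
```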
+### Stage 1: Mock Provider Setup -### Phase 2: User Interaction Loop -The simulation replicates a human workflow by invoking client methods: -1. `client.set_value('mma_epic_input', '...')`: Injects the epic description. -2. `client.click('btn_mma_plan_epic')`: Triggers the orchestration engine. +```python +client = ApiHookClient() +client.set_value('current_provider', 'gemini_cli') +mock_cli_path = f'{sys.executable} {os.path.abspath("tests/mock_gemini_cli.py")}' +client.set_value('gcli_path', mock_cli_path) +client.set_value('files_base_dir', 'tests/artifacts/temp_workspace') +client.click('btn_project_save') +``` -### Phase 3: Polling & Assertion -Because AI orchestration is asynchronous, simulations use a **Polling with Multi-Modal Approval** loop: -* **State Polling:** The script polls `client.get_mma_status()` in a loop. -* **Auto-Approval:** If the status indicates a pending tool or spawn request, the simulation automatically clicks the approval buttons (`btn_approve_spawn`, `btn_approve_tool`). -* **Verification:** Once the expected state (e.g., "Mock Goal 1" appears in the track list) is detected, the simulation proceeds to the next phase or asserts success. +- Switches the GUI's LLM provider to `gemini_cli` (the CLI adapter). +- Points the CLI binary to `python tests/mock_gemini_cli.py` — all LLM calls go to the mock. +- Redirects `files_base_dir` to a temp workspace to prevent polluting real project directories. +- Saves the project configuration. + +### Stage 2: Epic Planning + +```python +client.set_value('mma_epic_input', 'Develop a new feature') +client.click('btn_mma_plan_epic') +``` + +Enters an epic description and triggers planning. The GUI invokes the LLM (which hits the mock). 
+ +### Stage 3: Poll for Proposed Tracks (60s timeout) + +```python +for _ in range(60): + status = client.get_mma_status() + if status.get('pending_mma_spawn_approval'): client.click('btn_approve_spawn') + elif status.get('pending_mma_step_approval'): client.click('btn_approve_mma_step') + elif status.get('pending_tool_approval'): client.click('btn_approve_tool') + if status.get('proposed_tracks') and len(status['proposed_tracks']) > 0: break + time.sleep(1) +``` + +The **approval automation** is a critical pattern repeated in every polling loop. The MMA engine has three approval gates: +- **Spawn approval**: Permission to create a new worker subprocess. +- **Step approval**: Permission to proceed with the next orchestration step. +- **Tool approval**: Permission to execute a tool call. + +All three are auto-approved by clicking the corresponding button. Without this, the engine would block indefinitely at each gate. + +### Stage 4: Accept Tracks + +```python +client.click('btn_mma_accept_tracks') +``` + +### Stage 5: Poll for Tracks Populated (30s timeout) + +Waits until `status['tracks']` contains a track with `'Mock Goal 1'` in its title. + +### Stage 6: Load Track and Verify Tickets (60s timeout) + +```python +client.click('btn_mma_load_track', user_data=track_id_to_load) +``` + +Then polls until: +- `active_track` matches the loaded track ID. +- `active_tickets` list is non-empty. + +### Stage 7: Verify MMA Status Transitions (120s timeout) + +Polls until `mma_status == 'running'` or `'done'`. Continues auto-approving all gates. + +### Stage 8: Verify Worker Output in Streams (60s timeout) + +```python +streams = status.get('mma_streams', {}) +if any("Tier 3" in k for k in streams.keys()): + tier3_key = [k for k in streams.keys() if "Tier 3" in k][0] + if "SUCCESS: Mock Tier 3 worker" in streams[tier3_key]: + streams_found = True +``` + +Verifies that `mma_streams` contains a key with "Tier 3" and the value contains the exact mock output string. 
+ +### Assertions Summary + +1. Mock provider setup succeeds (try/except with `pytest.fail`). +2. `proposed_tracks` appears within 60 seconds. +3. `'Mock Goal 1'` track exists in tracks list within 30 seconds. +4. Track loads and `active_tickets` populate within 60 seconds. +5. MMA status becomes `'running'` or `'done'` within 120 seconds. +6. Tier 3 worker output with specific mock content appears in `mma_streams` within 60 seconds. --- -## 3. Mock Provider Strategy - -To test the 4-Tier MMA hierarchy without incurring API costs or latency, Manual Slop uses a **Script-Based Mocking** strategy via the `gemini_cli` adapter. +## Mock Provider Strategy ### `tests/mock_gemini_cli.py` -This script simulates the behavior of the `gemini` CLI by: -1. **Input Parsing:** Reading the system prompt and user message from the environment/stdin. -2. **Deterministic Response:** Returning pre-defined JSON payloads (e.g., track definitions, worker implementation scripts) based on keywords in the prompt. -3. **Tool Simulation:** Mimicking function-call responses to trigger the "Execution Clutch" within the GUI. + +A fake Gemini CLI executable that replaces the real `gemini` binary during integration tests. Outputs JSON-L messages matching the real CLI's streaming output protocol. 
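The mock's core is keyword routing plus a two-line JSON-L reply. A compressed sketch under the protocol described here (the real script has more branches and richer payloads; the canned strings and `respond` helper are illustrative):

```python
import json
import sys

def respond(prompt: str, out=sys.stdout) -> None:
    """Route a prompt to a canned reply and emit the two-line JSON-L protocol."""
    if 'PATH: Epic Initialization' in prompt:
        content, session = 'Here are two proposed tracks...', 'mock-session-epic'
    elif '"role": "tool"' in prompt:
        content, session = 'Tool results received; task complete.', 'mock-session-final'
    else:
        content, session = ('SUCCESS: Mock Tier 3 worker implemented the change. '
                            '[MOCK OUTPUT]'), 'mock-session-default'
    # Two-line protocol: a message line, then a result line. flush=True so the
    # consuming process receives data immediately.
    print(json.dumps({"type": "message", "role": "assistant", "content": content}),
          file=out, flush=True)
    print(json.dumps({"type": "result", "status": "success", "session_id": session}),
          file=out, flush=True)
```

Debug output would go to stderr, never `out`, to keep the JSON-L stream clean.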
+
+**Input mechanism:**
+
+```python
+prompt = sys.stdin.read()                   # Primary: prompt via stdin
+sys.argv                                    # Secondary: management command detection
+os.environ.get('GEMINI_CLI_HOOK_CONTEXT')   # Tertiary: environment variable
+```
+
+**Management command bypass:**
+
+```python
+if len(sys.argv) > 1 and sys.argv[1] in ["mcp", "extensions", "skills", "hooks"]:
+    return  # Silent exit
+```
+
+**Response routing** — keyword matching on stdin content:
+
+| Prompt Contains | Response | Session ID |
+|---|---|---|
+| `'PATH: Epic Initialization'` | Two mock Track objects (`mock-track-1`, `mock-track-2`) | `mock-session-epic` |
+| `'PATH: Sprint Planning'` | Two mock Ticket objects (`mock-ticket-1` independent, `mock-ticket-2` depends on `mock-ticket-1`) | `mock-session-sprint` |
+| `'"role": "tool"'` or `'"tool_call_id"'` | Success message (simulates post-tool-call final answer) | `mock-session-final` |
+| Default (Tier 3 worker prompts) | `"SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]"` | `mock-session-default` |
+
+**Output protocol** — every response is exactly two JSON-L lines:
+
+```json
+{"type": "message", "role": "assistant", "content": "<assistant text>"}
+{"type": "result", "status": "success", "stats": {"total_tokens": N, ...}, "session_id": "mock-session-*"}
+```
+
+This matches the real Gemini CLI's streaming output format. `flush=True` on every `print()` ensures the consuming process receives data immediately.
+
+**Tool call simulation:** The mock does **not** emit tool calls. It detects tool results in the prompt (the `'"role": "tool"'` check) and responds with a final answer, simulating the second turn of a tool-call conversation without actually issuing calls.
+
+**Debug output:** All debug information goes to stderr, keeping stdout clean for the JSON-L protocol.

---

-## 4.
Visual Verification Examples +## Visual Verification Patterns -Tests in this framework don't just check return values; they verify the **rendered state** of the application: -* **DAG Integrity:** Verifying that `active_tickets` in the MMA status matches the expected task graph. -* **Stream Telemetry:** Checking `mma_streams` to ensure that output from multiple tiers is correctly captured and displayed in the terminal. -* **Modal State:** Asserting that the correct dialog (e.g., `ConfirmDialog`) is active during a pending tool call. +Tests in this framework don't just check return values — they verify the **rendered state** of the application via the Hook API. -By combining these techniques, Manual Slop achieves a level of verification rigor usually reserved for high-stakes embedded systems or complex graphics engines. +### DAG Integrity + +Verify that `active_tickets` in the MMA status matches the expected task graph: + +```python +status = client.get_mma_status() +tickets = status.get('active_tickets', []) +assert len(tickets) >= 2 +assert any(t['id'] == 'mock-ticket-1' for t in tickets) +``` + +### Stream Telemetry + +Check `mma_streams` to ensure output from multiple tiers is correctly captured and routed: + +```python +streams = status.get('mma_streams', {}) +tier3_keys = [k for k in streams.keys() if "Tier 3" in k] +assert len(tier3_keys) > 0 +assert "SUCCESS" in streams[tier3_keys[0]] +``` + +### Modal State + +Assert that the correct dialog is active during a pending tool call: + +```python +status = client.get_mma_status() +assert status.get('pending_tool_approval') == True +# or +diag = client.get_indicator_state('thinking') +assert diag.get('thinking') == True +``` + +### Performance Monitoring + +Verify UI responsiveness under load: + +```python +perf = client.get_performance() +assert perf['fps'] > 30 +assert perf['input_lag_ms'] < 100 +``` + +--- + +## Supporting Analysis Modules + +### `file_cache.py` — ASTParser (tree-sitter) + +```python +class 
ASTParser: + def __init__(self, language: str = "python"): + self.language = tree_sitter.Language(tree_sitter_python.language()) + self.parser = tree_sitter.Parser(self.language) + + def parse(self, code: str) -> tree_sitter.Tree + def get_skeleton(self, code: str) -> str + def get_curated_view(self, code: str) -> str +``` + +**`get_skeleton` algorithm:** +1. Parse code to tree-sitter AST. +2. Walk all `function_definition` nodes. +3. For each body (`block` node): + - If first non-comment child is a docstring: preserve docstring, replace rest with `...`. + - Otherwise: replace entire body with `...`. +4. Apply edits in reverse byte order (maintains valid offsets). + +**`get_curated_view` algorithm:** +Enhanced skeleton that preserves bodies under two conditions: +- Function has `@core_logic` decorator. +- Function body contains a `# [HOT]` comment anywhere in its descendants. + +If either condition is true, the body is preserved verbatim. This enables a two-tier code view: hot paths shown in full, boilerplate compressed. + +### `summarize.py` — Heuristic File Summaries + +Token-efficient structural descriptions without AI calls: + +```python +_SUMMARISERS: dict[str, Callable] = { + ".py": _summarise_python, # imports, classes, methods, functions, constants + ".toml": _summarise_toml, # table keys + array lengths + ".md": _summarise_markdown, # h1-h3 headings + ".ini": _summarise_generic, # line count + preview +} +``` + +**`_summarise_python`** uses stdlib `ast`: +1. Parse with `ast.parse()`. +2. Extract deduplicated imports (top-level module names only). +3. Extract `ALL_CAPS` constants (both `Assign` and `AnnAssign`). +4. Extract classes with their method names. +5. Extract top-level function names. 
+ +Output: +``` +**Python** — 150 lines +imports: ast, json, pathlib +constants: TIMEOUT_SECONDS +class ASTParser: __init__, parse, get_skeleton +functions: summarise_file, build_summary_markdown +``` + +### `outline_tool.py` — Hierarchical Code Outline + +```python +class CodeOutliner: + def outline(self, code: str) -> str +``` + +Walks top-level `ast` nodes: +- `ClassDef` → `[Class] Name (Lines X-Y)` + docstring + recurse for methods +- `FunctionDef` → `[Func] Name (Lines X-Y)` or `[Method] Name` if nested +- `AsyncFunctionDef` → `[Async Func] Name (Lines X-Y)` + +Only extracts first line of docstrings. Uses indentation depth as heuristic for method vs function. + +--- + +## Two Parallel Code Analysis Implementations + +The codebase has two parallel approaches for structural code analysis: + +| Aspect | `file_cache.py` (tree-sitter) | `summarize.py` / `outline_tool.py` (stdlib `ast`) | +|---|---|---| +| Parser | tree-sitter with `tree_sitter_python` | Python's built-in `ast` module | +| Precision | Byte-accurate, preserves exact syntax | Line-level, may lose formatting nuance | +| `@core_logic` / `[HOT]` | Supported (selective body preservation) | Not supported | +| Used by | `py_get_skeleton` MCP tool, worker context injection | `get_file_summary` MCP tool, `py_get_code_outline` | +| Performance | Slightly slower (C extension + tree walk) | Faster (pure Python, simpler walk) | diff --git a/docs/guide_tools.md b/docs/guide_tools.md index f95800d..4d6bdb1 100644 --- a/docs/guide_tools.md +++ b/docs/guide_tools.md @@ -1,65 +1,385 @@ -# Manual Slop: Tooling & IPC Technical Reference +# Tooling & IPC Technical Reference -A deep-dive into the Model Context Protocol (MCP) bridge, the Hook API, and the "Human-in-the-Loop" communication protocol. +[Top](../Readme.md) | [Architecture](guide_architecture.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md) --- -## 1. 
The MCP Bridge: Filesystem Security +## The MCP Bridge: Filesystem Security -The AI's ability to interact with your filesystem is mediated by a strict security allowlist. +The AI's ability to interact with the filesystem is mediated by a three-layer security model in `mcp_client.py`. Every tool accessing the disk passes through `_resolve_and_check(path)` before any I/O occurs. -### Path Resolution & Sandboxing -Every tool accessing the disk (e.g., `read_file`, `list_directory`, `search_files`) executes `_resolve_and_check(path)`: -1. **Normalization:** The requested path is converted to an absolute path. -2. **Constraint Check:** The path must reside within the project's `base_dir`. -3. **Enforcement:** Violations trigger a `PermissionError`, returned to the model as an `ACCESS DENIED` status. +### Global State -### Native Toolset -* **`read_file(path)`:** UTF-8 extraction, clamped by token budgets. -* **`list_directory(path)`:** Returns a structural map (Name, Type, Size). -* **`get_file_summary(path)`:** AST-based heuristic parsing for high-signal architectural mapping without full-file read costs. -* **`web_search(query)`:** Scrapes DuckDuckGo raw HTML via a dependency-free parser. +```python +_allowed_paths: set[Path] = set() # Explicit file allowlist (resolved absolutes) +_base_dirs: set[Path] = set() # Directory roots for containment checks +_primary_base_dir: Path | None = None # Used for resolving relative paths +perf_monitor_callback: Optional[Callable[[], dict[str, Any]]] = None +``` + +### Layer 1: Allowlist Construction (`configure`) + +Called by `ai_client` before each send cycle. Takes `file_items` (from `aggregate.build_file_items()`) and optional `extra_base_dirs`. + +1. Resets `_allowed_paths` and `_base_dirs` to empty sets on every call. +2. Sets `_primary_base_dir` from `extra_base_dirs[0]` (resolved) or falls back to `Path.cwd()`. +3. Iterates all `file_items`, resolving each `item["path"]` to an absolute path. 
Each resolved path is added to `_allowed_paths`; its parent directory is added to `_base_dirs`. +4. Any entries in `extra_base_dirs` that are valid directories are also added to `_base_dirs`. + +### Layer 2: Path Validation (`_is_allowed`) + +Checks run in this exact order: + +1. **Blacklist** (hard deny): If filename is `history.toml` or ends with `_history.toml`, return `False`. Prevents the AI from reading conversation history. +2. **Explicit allowlist**: If resolved path is in `_allowed_paths`, return `True`. +3. **CWD fallback**: If `_base_dirs` is empty, any path under `cwd()` is allowed. +4. **Base directory containment**: Path must be a subpath of at least one entry in `_base_dirs` (via `relative_to()`). +5. **Default deny**: All other paths are rejected. + +All paths are resolved (following symlinks) before comparison, preventing symlink-based traversal. + +### Layer 3: Resolution Gate (`_resolve_and_check`) + +Every tool call passes through this: + +1. Convert raw path string to `Path`. +2. If not absolute, prepend `_primary_base_dir`. +3. Resolve to absolute. +4. Call `_is_allowed()`. +5. Return `(resolved_path, "")` on success or `(None, error_message)` on failure. + +The error message includes the full list of allowed base directories for debugging. --- -## 2. The Hook API: Remote Control & Telemetry +## Native Tool Inventory -Manual Slop exposes a REST-based IPC interface (running by default on port `8999`) to facilitate automated verification and external monitoring. +The `dispatch` function (line 806) is a flat if/elif chain mapping 26 tool names to implementations. All tools are categorized below with their parameters and behavior. -### Core Endpoints -* `GET /status`: Engine health and hook server readiness. -* `GET /mma_status`: Retrieves the 4-Tier state, active track metadata, and current ticket DAG status. -* `POST /api/gui`: Pushes events into the `AsyncEventQueue`. 
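The `dispatch` chain described above is plain control flow. A condensed sketch with stub tool bodies standing in for the real implementations (only 2 of the 26 branches shown; the stubs are hypothetical):

```python
def read_file(path: str) -> str:
    return f"<contents of {path}>"      # stub; the real tool does security-checked I/O

def list_directory(path: str) -> str:
    return f"<listing of {path}>"       # stub

def dispatch(name: str, args: dict) -> str:
    """Flat if/elif chain: tool name -> implementation; unknown names -> error string."""
    if name == "read_file":
        return read_file(args["path"])
    elif name == "list_directory":
        return list_directory(args["path"])
    # ... 24 more branches in the real chain ...
    return f"ERROR: unknown tool '{name}'"
```

Every branch returns a string: either the tool's output or an `ERROR:`-prefixed message the model can read and react to.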
- * Payload example: `{"action": "set_value", "item": "current_provider", "value": "anthropic"}` -* `GET /diagnostics`: High-frequency telemetry for UI performance (FPS, CPU, Input Lag). +### File I/O Tools -### ApiHookClient Implementation -The `api_hook_client.py` provides a robust wrapper for the Hook API: -* **Synchronous Wait:** `wait_for_server()` polls `/status` with exponential backoff. -* **State Polling:** `wait_for_value()` blocks until a specific GUI element matches an expected state. -* **Remote Interaction:** `click()`, `set_value()`, and `select_tab()` methods allow external agents to drive the GUI. +| Tool | Parameters | Description | +|---|---|---| +| `read_file` | `path` | UTF-8 file content extraction | +| `list_directory` | `path` | Compact table: `[file/dir] name size`. Applies blacklist filter to entries. | +| `search_files` | `path`, `pattern` | Glob pattern matching within an allowed directory. Applies blacklist filter. | +| `get_file_slice` | `path`, `start_line`, `end_line` | Returns specific line range (1-based, inclusive) | +| `set_file_slice` | `path`, `start_line`, `end_line`, `new_content` | Replaces a line range with new content (surgical edit) | +| `get_tree` | `path`, `max_depth` | Directory structure up to `max_depth` levels | + +### AST-Based Tools (Python only) + +These use `file_cache.ASTParser` (tree-sitter) or stdlib `ast` for structural code analysis: + +| Tool | Parameters | Description | +|---|---|---| +| `py_get_skeleton` | `path` | Signatures + docstrings, bodies replaced with `...`. Uses tree-sitter. | +| `py_get_code_outline` | `path` | Hierarchical outline: `[Class] Name (Lines X-Y)` with nested methods. Uses stdlib `ast`. | +| `py_get_definition` | `path`, `name` | Full source of a specific class/function/method. Supports `ClassName.method` dot notation. | +| `py_update_definition` | `path`, `name`, `new_content` | Surgical replacement: locates symbol via `ast`, delegates to `set_file_slice`. 
| +| `py_get_signature` | `path`, `name` | Only the `def` line through the colon. | +| `py_set_signature` | `path`, `name`, `new_signature` | Replaces only the signature, preserving body. | +| `py_get_class_summary` | `path`, `name` | Class docstring + list of method signatures. | +| `py_get_var_declaration` | `path`, `name` | Module-level or class-level variable assignment line(s). | +| `py_set_var_declaration` | `path`, `name`, `new_declaration` | Surgical variable replacement. | +| `py_find_usages` | `path`, `name` | Exact string match search across a file or directory. | +| `py_get_imports` | `path` | Parses AST, returns strict dependency list. | +| `py_check_syntax` | `path` | Quick syntax validation via `ast.parse()`. | +| `py_get_hierarchy` | `path`, `class_name` | Scans directory for subclasses of a given class. | +| `py_get_docstring` | `path`, `name` | Extracts docstring for module, class, or function. | + +### Analysis Tools + +| Tool | Parameters | Description | +|---|---|---| +| `get_file_summary` | `path` | Heuristic summary via `summarize.py`: imports, classes, functions, constants for `.py`; table keys for `.toml`; headings for `.md`. | +| `get_git_diff` | `path`, `base_rev`, `head_rev` | Git diff output for a file or directory. | + +### Network Tools + +| Tool | Parameters | Description | +|---|---|---| +| `web_search` | `query` | Scrapes DuckDuckGo HTML via dependency-free `_DDGParser` (HTMLParser subclass). Returns top 5 results with title, URL, snippet. | +| `fetch_url` | `url` | Fetches URL content, strips HTML tags via `_TextExtractor`. | + +### Runtime Tools + +| Tool | Parameters | Description | +|---|---|---| +| `get_ui_performance` | (none) | Returns FPS, Frame Time, CPU, Input Lag via injected `perf_monitor_callback`. No security check (no filesystem access). 
| + +### Tool Implementation Patterns + +**AST-based read tools** follow this pattern: +```python +def py_get_skeleton(path: str) -> str: + p, err = _resolve_and_check(path) + if err: return err + if not p.exists(): return f"ERROR: file not found: {path}" + if not p.is_file() or p.suffix != ".py": return f"ERROR: not a python file: {path}" + from file_cache import ASTParser + code = p.read_text(encoding="utf-8") + parser = ASTParser("python") + return parser.get_skeleton(code) +``` + +**AST-based write tools** use stdlib `ast` (not tree-sitter) to locate symbols, then delegate to `set_file_slice`: +```python +def py_update_definition(path: str, name: str, new_content: str) -> str: + p, err = _resolve_and_check(path) + if err: return err + code = p.read_text(encoding="utf-8").lstrip(chr(0xFEFF)) # Strip BOM + tree = ast.parse(code) + node = _get_symbol_node(tree, name) # Walks AST for matching node + if not node: return f"ERROR: could not find definition '{name}'" + start = getattr(node, "lineno") + end = getattr(node, "end_lineno") + return set_file_slice(path, start, end, new_content) +``` + +The `_get_symbol_node` helper supports dot notation (`ClassName.method_name`) by first finding the class, then searching its body for the method. --- -## 3. The HITL IPC Flow: `ask/respond` +## The Hook API: Remote Control & Telemetry -Manual Slop supports a synchronous "Human-in-the-Loop" request pattern for operations requiring explicit confirmation or manual data mutation. +Manual Slop exposes a REST-based IPC interface on `127.0.0.1:8999` using Python's `ThreadingHTTPServer`. Each incoming request gets its own thread. -### Sequence of Operation -1. **Request:** A background agent (e.g., a Tier 3 Worker) calls `/api/ask` with a JSON payload. -2. **Intercept:** the `HookServer` generates a unique `request_id` and pushes a `type: "ask"` event to the GUI's `_pending_gui_tasks`. -3. **Modal Display:** The GUI renders an `Approve/Reject` modal with the payload details. -4. 
**Response:** Upon user action, the GUI thread `POST`s to `/api/ask/respond`. -5. **Resume:** The original agent call to `/api/ask` (which was polling for completion) unblocks and receives the user's response. +### Server Architecture -This pattern is the foundation of the **Execution Clutch**, ensuring that no destructive action occurs without an auditable human signal. +```python +class HookServerInstance(ThreadingHTTPServer): + app: Any # Reference to main App instance + +class HookHandler(BaseHTTPRequestHandler): + # Accesses self.server.app for all state + +class HookServer: + app: Any + port: int = 8999 + server: HookServerInstance | None + thread: threading.Thread | None +``` + +**Start conditions**: Only starts if `app.test_hooks_enabled == True` OR current provider is `'gemini_cli'`. Otherwise `start()` silently returns. + +**Initialization**: On start, ensures the app has `_pending_gui_tasks` + lock, `_pending_asks` + `_ask_responses` dicts, and `_api_event_queue` + lock. + +### GUI Thread Trampoline Pattern + +The HookServer **never reads GUI state directly** (thread safety). For state reads, it uses a trampoline: + +1. Create a `threading.Event()` and a `result` dict. +2. Push a `custom_callback` closure into `_pending_gui_tasks` that reads state and calls `event.set()`. +3. Block on `event.wait(timeout=60)`. +4. Return `result` as JSON, or 504 on timeout. + +This ensures all state reads happen on the GUI main thread during `_process_pending_gui_tasks`. 
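Stripped of the HTTP plumbing, the trampoline above is an `Event` plus a queued closure. A self-contained sketch (the module-level queue and `read_state_via_trampoline` name are stand-ins for the app's `_pending_gui_tasks` machinery):

```python
import threading

pending_gui_tasks: list = []
tasks_lock = threading.Lock()

def read_state_via_trampoline(read_fn, timeout: float = 60.0):
    """Queue a closure for the GUI thread; block until it has run (504 on timeout)."""
    done, result = threading.Event(), {}

    def task():
        result["value"] = read_fn()    # executes on the GUI main thread
        done.set()

    with tasks_lock:
        pending_gui_tasks.append(task)
    if not done.wait(timeout):
        raise TimeoutError("GUI thread did not service the task")  # -> HTTP 504
    return result["value"]
```

The HTTP handler thread never touches GUI state itself; it only observes the closure's result after the GUI thread has set the event.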
+
+### GET Endpoints
+
+| Endpoint | Thread Safety | Response |
+|---|---|---|
+| `GET /status` | Direct (stateless) | `{"status": "ok"}` |
+| `GET /api/project` | Direct read | `{"project": <config>}` via `project_manager.flat_config()` |
+| `GET /api/session` | Direct read | `{"session": {"entries": [...]}}` from `app.disc_entries` |
+| `GET /api/performance` | Direct read | `{"performance": <metrics>}` from `app.perf_monitor.get_metrics()` |
+| `GET /api/events` | Lock-guarded drain | `{"events": [...]}` — drains and clears `_api_event_queue` |
+| `GET /api/gui/value` | GUI trampoline | `{"value": <value>}` — reads from the `_settable_fields` map |
+| `GET /api/gui/value/<item>` | GUI trampoline | Same, via URL path param |
+| `GET /api/gui/mma_status` | GUI trampoline | Full MMA state dict (see below) |
+| `GET /api/gui/diagnostics` | GUI trampoline | `{thinking, live, prior}` booleans |
+
+**`/api/gui/mma_status` response fields:**
+
+```python
+{
+    "mma_status": str,                  # "idle" | "planning" | "executing" | "done"
+    "ai_status": str,                   # "idle" | "sending..." | etc.
+    "active_tier": str | None,
+    "active_track": str,                # Track ID or raw value
+    "active_tickets": list,             # Serialized ticket dicts
+    "mma_step_mode": bool,
+    "pending_tool_approval": bool,      # _pending_ask_dialog
+    "pending_mma_step_approval": bool,  # _pending_mma_approval is not None
+    "pending_mma_spawn_approval": bool, # _pending_mma_spawn is not None
+    "pending_approval": bool,           # Backward compat: step OR tool
+    "pending_spawn": bool,              # Alias for spawn approval
+    "tracks": list,
+    "proposed_tracks": list,
+    "mma_streams": dict,                # {stream_id: output_text}
+}
+```
+
+**`/api/gui/diagnostics` response fields:**
+
+```python
+{
+    "thinking": bool,  # ai_status in ["sending...", "running powershell..."]
+    "live": bool,      # ai_status in ["running powershell...", "fetching url...", ...]
+ "prior": bool, # app.is_viewing_prior_session +} +``` + +### POST Endpoints + +| Endpoint | Body | Response | Effect | +|---|---|---|---| +| `POST /api/project` | `{"project": {...}}` | `{"status": "updated"}` | Sets `app.project` | +| `POST /api/session` | `{"session": {"entries": [...]}}` | `{"status": "updated"}` | Sets `app.disc_entries` | +| `POST /api/gui` | Any JSON dict | `{"status": "queued"}` | Appends to `_pending_gui_tasks` | +| `POST /api/ask` | Any JSON dict | `{"status": "ok", "response": ...}` or 504 | Blocking ask dialog | +| `POST /api/ask/respond` | `{"request_id": ..., "response": ...}` | `{"status": "ok"}` or 404 | Resolves a pending ask | + +### The `/api/ask` Protocol (Synchronous HITL via HTTP) + +This is the most complex endpoint — it implements a blocking request-response dialog over HTTP: + +1. Generate a UUID `request_id`. +2. Create a `threading.Event`. +3. Register in `app._pending_asks[request_id] = event`. +4. Push an `ask_received` event to `_api_event_queue` (for client discovery). +5. Append `{"type": "ask", "request_id": ..., "data": ...}` to `_pending_gui_tasks`. +6. Block on `event.wait(timeout=60.0)`. +7. On signal: read `app._ask_responses[request_id]`, clean up, return 200. +8. On timeout: clean up, return 504. + +The counterpart `/api/ask/respond`: + +1. Look up `request_id` in `app._pending_asks`. +2. Store `response` in `app._ask_responses[request_id]`. +3. Signal the event (`event.set()`). +4. Queue a `clear_ask` GUI task. +5. Return 200 (or 404 if `request_id` not found). --- -## 4. Synthetic Context Refresh +## ApiHookClient: The Automation Interface -To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution refresh: -1. **Detection:** Triggered after the final tool call in a reasoning round. -2. **Collection:** re-reads all project-tracked files from disk. -3. **Injection:** The updated content is injected into the next LLM turn as a `[SYSTEM: FILES UPDATED]` block. -4. 
**Pruning:** Older snapshots are stripped from history in subsequent rounds to maintain a lean context window. +`api_hook_client.py` provides a synchronous Python client for the Hook API, used by test scripts and external tooling. + +```python +class ApiHookClient: + def __init__(self, base_url="http://127.0.0.1:8999", max_retries=5, retry_delay=0.2) +``` + +### Connection Methods + +| Method | Description | +|---|---| +| `wait_for_server(timeout=3)` | Polls `/status` with exponential backoff until server is ready. | +| `_make_request(method, endpoint, data, timeout)` | Core HTTP client with retry logic. | + +### State Query Methods + +| Method | Endpoint | Description | +|---|---|---| +| `get_status()` | `GET /status` | Health check | +| `get_project()` | `GET /api/project` | Full project config | +| `get_session()` | `GET /api/session` | Discussion entries | +| `get_mma_status()` | `GET /api/gui/mma_status` | Full MMA orchestration state | +| `get_performance()` | `GET /api/performance` | UI metrics (FPS, CPU, etc.) 
| +| `get_value(item)` | `GET /api/gui/value/` | Read any `_settable_fields` value | +| `get_text_value(item_tag)` | Wraps `get_value` | Returns string representation or None | +| `get_events()` | `GET /api/events` | Fetches and clears the event queue | +| `get_indicator_state(tag)` | `GET /api/gui/diagnostics` | Checks if an indicator is shown | +| `get_node_status(node_tag)` | Two-phase: `get_value` then `diagnostics` | DAG node status with fallback | + +### GUI Manipulation Methods + +| Method | Endpoint | Description | +|---|---|---| +| `set_value(item, value)` | `POST /api/gui` | Sets any `_settable_fields` value; special-cases `current_provider` and `gcli_path` | +| `click(item, *args, **kwargs)` | `POST /api/gui` | Simulates button click; passes optional `user_data` | +| `select_tab(tab_bar, tab)` | `POST /api/gui` | Switches to a specific tab | +| `select_list_item(listbox, item_value)` | `POST /api/gui` | Selects an item in a listbox | +| `push_event(event_type, payload)` | `POST /api/gui` | Pushes event into `AsyncEventQueue` | +| `post_gui(gui_data)` | `POST /api/gui` | Raw task dict injection | +| `reset_session()` | Clicks `btn_reset_session` | Simulates clicking the Reset Session button | + +### Polling Methods + +| Method | Description | +|---|---| +| `wait_for_event(event_type, timeout=5)` | Polls `get_events()` until a matching event type appears. | +| `wait_for_value(item, expected, timeout=5)` | Polls `get_value(item)` until it equals `expected`. | + +### HITL Method + +| Method | Description | +|---|---| +| `request_confirmation(tool_name, args)` | Sends to `/api/ask`, blocks until user responds via the GUI dialog. | + +--- + +## Synthetic Context Refresh + +To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution context refresh. See [guide_architecture.md](guide_architecture.md#context-refresh-mechanism) for the full algorithm. + +Summary: +1. 
**Detection**: Triggered after the final tool call in each reasoning round.
+2. **Collection**: Re-reads all project-tracked files, comparing mtimes.
+3. **Injection**: Changed files are diffed and appended as `[SYSTEM: FILES UPDATED]` to the last tool output.
+4. **Pruning**: Older `[FILES UPDATED]` blocks are stripped from history in subsequent rounds.
+
+---
+
+## Session Logging
+
+`session_logger.py` opens timestamped log files at GUI startup and keeps them open for the process lifetime.
+
+### File Layout
+
+```
+logs/sessions/<timestamp>/
+    comms.log       # JSON-L: every API interaction (direction, kind, payload)
+    toolcalls.log   # Markdown: sequential tool invocation records
+    apihooks.log    # API hook invocations
+    clicalls.log    # JSON-L: CLI subprocess details (command, stdin, stdout, stderr, latency)
+
+scripts/generated/
+    <index>_<name>.ps1   # Each AI-generated PowerShell script, preserved in order
+```
+
+### Logging Functions
+
+| Function | Target | Format |
+|---|---|---|
+| `log_comms(entry)` | `comms.log` | JSON-L line per entry |
+| `log_tool_call(script, result, script_path)` | `toolcalls.log` + `scripts/generated/` | Markdown record + preserved `.ps1` file |
+| `log_api_hook(method, path, body)` | `apihooks.log` | Timestamped text line |
+| `log_cli_call(command, stdin, stdout, stderr, latency)` | `clicalls.log` | JSON-L with latency tracking |
+
+### Lifecycle
+
+- `open_session(label)`: Called once at GUI startup. Idempotent (checks if already open). Registers `atexit.register(close_session)`.
+- `close_session()`: Flushes and closes all file handles.
+
+---
+
+## Shell Runner
+
+`shell_runner.py` executes PowerShell scripts with environment configuration, timeout handling, and optional QA integration.
+
+### Environment Configuration via `mcp_env.toml`
+
+```toml
+[path]
+prepend = ["C:/custom/bin", "C:/other/tools"]
+
+[env]
+MY_VAR = "some_value"
+EXPANDED = "${HOME}/subdir"
+```
+
+`_build_subprocess_env()` copies `os.environ`, prepends `[path].prepend` entries to `PATH`, and sets `[env]` key-value pairs with `${VAR}` expansion.
+
+### `run_powershell(script, base_dir, qa_callback=None) -> str`
+
+1. Prepends `Set-Location -LiteralPath '<base_dir>'` (with escaped single quotes).
+2. Locates PowerShell: tries `powershell.exe`, `pwsh.exe`, `powershell`, `pwsh` in order.
+3. Runs via `subprocess.Popen([exe, "-NoProfile", "-NonInteractive", "-Command", full_script])`.
+4. `process.communicate(timeout=60)` — 60-second hard timeout.
+5. On `TimeoutExpired`: kills the process tree via `taskkill /F /T /PID <pid>`, returns `"ERROR: timed out after 60s"`.
+6. Returns combined output: `STDOUT:\n<stdout>\nSTDERR:\n<stderr>\nEXIT CODE: <code>`.
+7. If `qa_callback` is provided and the command failed: appends `QA ANALYSIS:\n<analysis>` — integrates Tier 4 QA error analysis directly.