docs: Complete documentation rewrite at gencpp/VEFontCache reference quality
Rewrites all docs from Gemini's 330-line executive summaries to 1874 lines of expert-level architectural reference matching the pedagogical depth of gencpp (Parser_Algo.md, AST_Types.md) and VEFontCache-Odin (guide_architecture.md).

Changes:
- guide_architecture.md: 73 -> 542 lines. Adds inline data structures for all dialog classes, cross-thread communication patterns, complete action type catalog, provider comparison table, 4-breakpoint Anthropic cache strategy, Gemini server-side cache lifecycle, context refresh algorithm.
- guide_tools.md: 66 -> 385 lines. Full 26-tool inventory with parameters, 3-layer MCP security model walkthrough, all Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference, /api/ask synchronous HITL protocol, shell runner with env config.
- guide_mma.md: NEW (368 lines). Fills a major documentation gap — complete Ticket/Track/WorkerContext data structures, DAG engine algorithms (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia, token firewalling.
- guide_simulations.md: 64 -> 377 lines. 8-stage Puppeteer simulation lifecycle, mock_gemini_cli.py JSON-L protocol, approval automation pattern, ASTParser tree-sitter vs stdlib ast comparison, VerificationLogger.
- Readme.md: Rewritten with module map, architecture summary, config examples.
- docs/Readme.md: Proper index with guide contents table and GUI panel docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Readme.md

@@ -1,66 +1,128 @@
# Manual Slop

A GUI orchestrator for local LLM-driven coding sessions. Manual Slop bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe asynchronous pipeline, ensuring every AI-generated payload passes through a human-auditable gate before execution.

**Tech Stack**: Python 3.11+, Dear PyGui / ImGui, FastAPI, Uvicorn
**Providers**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless)
**Platform**: Windows (PowerShell) — single developer, local use

---

## Architecture at a Glance

Four thread domains operate concurrently: the ImGui main loop, an asyncio worker for AI calls, a `HookServer` (HTTP on `:8999`) for external automation, and transient threads for model fetching. Background threads never write GUI state directly — they serialize task dicts into lock-guarded lists that the main thread drains once per frame ([details](./docs/guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)).

The **Execution Clutch** suspends the AI execution thread on a `threading.Condition` when a destructive action (PowerShell script, sub-agent spawn) is requested. The GUI renders a modal where the user can read, edit, or reject the payload. On approval, the condition is signaled and execution resumes ([details](./docs/guide_architecture.md#the-execution-clutch-human-in-the-loop)).

The **MMA (Multi-Model Agent)** system decomposes epics into tracks, tracks into DAG-ordered tickets, and executes each ticket with a stateless Tier 3 worker that starts from `ai_client.reset_session()` — no conversational bleed between tickets ([details](./docs/guide_mma.md)).

---

## Documentation

| Guide | Scope |
|---|---|
| [Architecture](./docs/guide_architecture.md) | Threading model, event system, AI client multi-provider architecture, HITL mechanism, comms logging |
| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge security model, all 26 native tools, Hook API endpoints, ApiHookClient reference, shell runner |
| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track data structures, DAG engine, ConductorEngine execution loop, worker lifecycle |
| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification patterns, ASTParser / summarizer |

---

## Module Map

| File | Lines | Role |
|---|---|---|
| `gui_2.py` | ~3080 | Primary ImGui interface — App class, frame-sync, HITL dialogs |
| `ai_client.py` | ~1800 | Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, Gemini CLI) |
| `mcp_client.py` | ~870 | 26 MCP tools with filesystem sandboxing and tool dispatch |
| `api_hooks.py` | ~330 | HookServer — REST API for external automation on `:8999` |
| `api_hook_client.py` | ~245 | Python client for the Hook API (used by tests and external tooling) |
| `multi_agent_conductor.py` | ~250 | ConductorEngine — Tier 2 orchestration loop with DAG execution |
| `conductor_tech_lead.py` | ~100 | Tier 2 ticket generation from track briefs |
| `dag_engine.py` | ~100 | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) |
| `models.py` | ~100 | Ticket, Track, WorkerContext dataclasses |
| `events.py` | ~89 | EventEmitter, AsyncEventQueue, UserRequestEvent |
| `project_manager.py` | ~300 | TOML config persistence, discussion management, track state |
| `session_logger.py` | ~200 | JSON-L + markdown audit trails (comms, tools, CLI, hooks) |
| `shell_runner.py` | ~100 | PowerShell execution with timeout, env config, QA callback |
| `file_cache.py` | ~150 | ASTParser (tree-sitter) — skeleton and curated views |
| `summarize.py` | ~120 | Heuristic file summaries (imports, classes, functions) |
| `outline_tool.py` | ~80 | Hierarchical code outline via stdlib `ast` |

---

## Setup

### Prerequisites

- Python 3.11+
- [`uv`](https://github.com/astral-sh/uv) for package management

### Installation

```powershell
git clone <repo>
cd manual_slop
uv sync
```

### Credentials

Configure in `credentials.toml`:

```toml
[gemini]
api_key = "YOUR_KEY"

[anthropic]
api_key = "YOUR_KEY"

[deepseek]
api_key = "YOUR_KEY"
```

### Running

```powershell
uv run gui_2.py                      # Normal mode
uv run gui_2.py --enable-test-hooks  # With Hook API on :8999
```

### Running Tests

```powershell
uv run pytest tests/ -v
```

---

## Project Configuration

Projects are stored as `<name>.toml` files. The discussion history is split into a sibling `<name>_history.toml` to keep the main config lean.

```toml
[project]
name = "my_project"
git_dir = "./my_repo"
system_prompt = ""

[files]
base_dir = "./my_repo"
paths = ["src/**/*.py", "README.md"]

[screenshots]
base_dir = "./my_repo"
paths = []

[output]
output_dir = "./md_gen"

[gemini_cli]
binary_path = "gemini"

[agent.tools]
run_powershell = true
read_file = true
# ... 26 tool flags
```

docs/Readme.md

@@ -1,59 +1,74 @@

# Documentation Index

[Top](../Readme.md)

---

## Guides

| Guide | Contents |
|---|---|
| [Architecture](guide_architecture.md) | Thread domains, cross-thread data structures, event system, application lifetime, task pipeline (producer-consumer), Execution Clutch (HITL), AI client multi-provider architecture, Anthropic/Gemini caching strategies, context refresh, comms logging, state machines |
| [Tools & IPC](guide_tools.md) | MCP Bridge 3-layer security model, all 26 native tool signatures, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference, `/api/ask` synchronous HITL protocol, session logging, shell runner |
| [MMA Orchestration](guide_mma.md) | Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia, Tier 4 QA integration, token firewalling, track state persistence |
| [Simulations](guide_simulations.md) | `live_gui` pytest fixture lifecycle, `VerificationLogger`, process cleanup, Puppeteer pattern (8-stage MMA simulation), approval automation, mock provider (`mock_gemini_cli.py`) with JSON-L protocol, visual verification patterns, ASTParser (tree-sitter) vs summarizer (stdlib `ast`) |

---

## GUI Panels

### Projects Panel

Configuration and context management. Specifies the Git Directory (for commit tracking) and tracked file paths. Project switching swaps the active file list, discussion history, and settings via `<project>.toml` profiles.

> **Note:** The Config panel has been removed. Output directory and auto-add history settings are now integrated into the Projects and Discussion History panels respectively.

- **Word-Wrap Toggle**: Dynamically swaps text rendering in large read-only panels (Responses, Comms Log) between unwrapped (code formatting) and wrapped (prose).

### Discussion History

Manages conversational branches to prevent context poisoning across tasks.

- **Discussions Sub-Menu**: Create separate timelines for different tasks (e.g., "Refactoring Auth" vs. "Adding API Endpoints").
- **Git Commit Tracking**: "Update Commit" reads HEAD from the project's git directory and stamps the discussion.
- **Entry Management**: Each turn has a Role (User, AI, System). Toggle between Read/Edit modes, collapse entries, or open in the Global Text Viewer via `[+ Max]`.
- **Auto-Add**: When toggled, Message panel sends and Response panel returns are automatically appended to the current discussion.

### Files & Screenshots

Controls what is fed into the context compiler.

- **Base Dir**: Defines the root for path resolution and MCP tool constraints.
- **Paths**: Explicit files or wildcard globs (`src/**/*.rs`).
- Full file contents are inlined by default (`summary_only=False`). The AI can call `get_file_summary` for compact structural views.

### Provider

Switches between API backends (Gemini, Anthropic, DeepSeek, Gemini CLI). "Fetch Models" queries the active provider for the latest model list.

### Message & Response

- **Message**: User input field.
- **Gen + Send**: Compiles markdown context and dispatches to the AI via `AsyncEventQueue`.
- **MD Only**: Dry-runs the compiler for context inspection without API cost.
- **Response**: Read-only output; flashes green on new response.

### Global Text Viewer & Script Outputs

- **Last Script Output**: Pops up (flashing blue) whenever the AI executes a script. Shows both the executed script and stdout/stderr. `[+ Maximize]` reads from stored instance variables (`_last_script`, `_last_output`), not DPG widget tags, so it works regardless of word-wrap state.
- **Text Viewer**: Large resizable popup invoked by `[+]` / `[+ Maximize]` buttons. For deep-reading long logs, discussion entries, or script bodies.
- **Confirm Dialog**: The `[+ Maximize]` button in the script approval modal passes script text as `user_data` at button-creation time — safe to click even after the dialog is dismissed.

### Tool Calls & Comms History

Real-time display of MCP tool invocations and raw API traffic. Each comms entry records timestamp, direction (OUT/IN), kind, provider, model, and payload.

### MMA Dashboard

Displays the 4-tier orchestration state: active track, ticket DAG with status indicators, per-tier token usage, output streams. Approval buttons for spawn/step/tool gates.

### System Prompts

Two text inputs for instruction overrides:

1. **Global**: Applied across every project.
2. **Project**: Specific to the active workspace.

Concatenated onto the base tool-usage guidelines.

docs/guide_architecture.md

@@ -1,72 +1,542 @@

# Architecture

[Top](../Readme.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md)

---

## Philosophy: The Decoupled State Machine

Manual Slop solves a single tension: **AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive.** The engine enforces strict decoupling between its thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution.

---

## Thread Domains

Four distinct thread domains operate concurrently:

| Domain | Created By | Purpose | Lifecycle |
|---|---|---|---|
| **Main / GUI** | `immapp.run()` | Dear ImGui retained-mode render loop; sole writer of GUI state | App lifetime |
| **Asyncio Worker** | `App.__init__` via `threading.Thread(daemon=True)` | Event queue processing, AI client calls | Daemon (dies with process) |
| **HookServer** | `api_hooks.HookServer.start()` | HTTP API on `:8999` for external automation and IPC | Daemon thread |
| **Ad-hoc** | Transient `threading.Thread` calls | Model-fetching, legacy send paths | Short-lived |

The asyncio worker is **not** the main thread's event loop. It runs a dedicated `asyncio.new_event_loop()` on its own daemon thread:

```python
# App.__init__:
self._loop = asyncio.new_event_loop()
self._loop_thread = threading.Thread(target=self._run_event_loop, daemon=True)
self._loop_thread.start()

# _run_event_loop:
def _run_event_loop(self) -> None:
    asyncio.set_event_loop(self._loop)
    self._loop.create_task(self._process_event_queue())
    self._loop.run_forever()
```

The GUI thread uses `asyncio.run_coroutine_threadsafe(coro, self._loop)` to push work into this loop.
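
In isolation, that handoff looks like the following standalone sketch; `compile_and_send` is a stand-in for the real AI call, not code from `gui_2.py`:

```python
import asyncio
import threading

# Dedicated loop on a daemon thread, as in App.__init__.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def compile_and_send(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a multi-second LLM call
    return f"echo: {prompt}"

# From the GUI (or any other non-loop) thread:
future = asyncio.run_coroutine_threadsafe(compile_and_send("hi"), loop)
result = future.result(timeout=5)  # blocks only the calling thread
```

`future.result()` here blocks deliberately to show the mechanics; the real app instead lets the coroutine push a `"response"` event back through the queue so the GUI thread never waits.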

---

## Cross-Thread Data Structures

All cross-thread communication uses one of three patterns:

### Pattern A: AsyncEventQueue (GUI -> Asyncio)

```python
# events.py
class AsyncEventQueue:
    _queue: asyncio.Queue  # holds Tuple[str, Any] items

    async def put(self, event_name: str, payload: Any = None) -> None
    async def get(self) -> Tuple[str, Any]
```

The central event bus. Uses `asyncio.Queue`, so non-asyncio threads must enqueue via `asyncio.run_coroutine_threadsafe()`. The consumer is `App._process_event_queue()`, running as a long-lived coroutine on the asyncio loop.

### Pattern B: Guarded Lists (Any Thread -> GUI)

Background threads cannot write GUI state directly. They append task dicts to lock-guarded lists; the main thread drains these once per frame:

```python
# App.__init__:
self._pending_gui_tasks: list[dict[str, Any]] = []
self._pending_gui_tasks_lock = threading.Lock()

self._pending_comms: list[dict[str, Any]] = []
self._pending_comms_lock = threading.Lock()

self._pending_tool_calls: list[tuple[str, str, float]] = []
self._pending_tool_calls_lock = threading.Lock()

self._pending_history_adds: list[dict[str, Any]] = []
self._pending_history_adds_lock = threading.Lock()
```

Additional locks:

```python
self._send_thread_lock = threading.Lock()     # Guards send_thread creation
self._pending_dialog_lock = threading.Lock()  # Guards _pending_dialog + _pending_actions dict
```

### Pattern C: Condition-Variable Dialogs (Bidirectional Blocking)

Used for Human-in-the-Loop (HITL) approval. The background thread blocks on a `threading.Condition`; the GUI thread signals after user action. See the [HITL section](#the-execution-clutch-human-in-the-loop) below.
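
Stripped of GUI specifics, the handshake reduces to this standalone sketch; the `pending` dict and the approve flow are illustrative names, not the `gui_2.py` internals:

```python
import threading

cond = threading.Condition()
pending = {"decision": None, "payload": "Remove-Item temp/"}
executed = []

def ai_worker() -> None:
    # Background execution thread: request approval, then block.
    with cond:
        cond.wait_for(lambda: pending["decision"] is not None)
    if pending["decision"] == "approve":
        executed.append(pending["payload"])  # runs the possibly-edited payload

t = threading.Thread(target=ai_worker)
t.start()

# GUI thread, after the user edits the script and clicks "Approve":
with cond:
    pending["payload"] = "Remove-Item temp/ -WhatIf"  # user mutation
    pending["decision"] = "approve"
    cond.notify_all()
t.join()
```

`wait_for` re-checks the predicate on every wakeup, so spurious notifications cannot release the worker before a decision exists.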

---

## Event System

Three classes in `events.py` (89 lines, no external dependencies beyond `asyncio` and `typing`):

### EventEmitter

```python
class EventEmitter:
    _listeners: Dict[str, List[Callable]]

    def on(self, event_name: str, callback: Callable) -> None
    def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None
```

Synchronous pub-sub. Callbacks execute in the caller's thread. Used by `ai_client.events` for lifecycle hooks (`request_start`, `response_received`, `tool_execution`). No thread safety — relies on consistent single-thread usage.
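
A minimal working version of that interface behaves as follows (an illustrative reimplementation, not the `events.py` source):

```python
from collections import defaultdict
from typing import Any, Callable

class EventEmitter:
    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event_name: str, callback: Callable) -> None:
        self._listeners[event_name].append(callback)

    def emit(self, event_name: str, *args: Any, **kwargs: Any) -> None:
        # Synchronous: every callback runs in the emitter's thread, in order.
        for callback in self._listeners[event_name]:
            callback(*args, **kwargs)

events = EventEmitter()
calls: list[str] = []
events.on("request_start", calls.append)
events.emit("request_start", "gemini-2.0-flash")  # calls == ["gemini-2.0-flash"]
```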

### AsyncEventQueue

Described above in Pattern A.

### UserRequestEvent

```python
class UserRequestEvent:
    prompt: str            # User's raw input text
    stable_md: str         # Generated markdown context (files, screenshots)
    file_items: List[Any]  # File attachment items for dynamic refresh
    disc_text: str         # Serialized discussion history
    base_dir: str          # Working directory for shell commands

    def to_dict(self) -> Dict[str, Any]
```

Pure data carrier. Created on the GUI thread in `_handle_generate_send`, consumed on the asyncio thread in `_handle_request_event`.
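
As a frozen snapshot of GUI state, the carrier is naturally a dataclass; a sketch under that assumption (field ordering and defaults here are illustrative, not the `events.py` definition):

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

@dataclass
class UserRequestEvent:
    prompt: str      # User's raw input text
    stable_md: str   # Generated markdown context (files, screenshots)
    disc_text: str   # Serialized discussion history
    base_dir: str    # Working directory for shell commands
    file_items: List[Any] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

event = UserRequestEvent(prompt="refactor auth", stable_md="# context",
                         disc_text="", base_dir=".")
```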

---

## Application Lifetime

### Boot Sequence

`App.__init__` (lines 152-296) follows this precise order:

1. **Config hydration**: Reads `config.toml` (global) and `<project>.toml` (local). Builds the initial "world view" — tracked files, discussion history, active models.
2. **Thread bootstrapping**:
   - The asyncio event loop thread starts (`_loop_thread`).
   - `HookServer` starts as a daemon if `test_hooks_enabled` or the provider is `gemini_cli`.
3. **Callback wiring** (`_init_ai_and_hooks`): Connects `ai_client.confirm_and_run_callback`, `comms_log_callback`, and `tool_log_callback` to GUI handlers.
4. **UI entry**: The main thread enters `immapp.run()`. The GUI is now alive; background threads are ready.

### Shutdown Sequence

When `immapp.run()` returns (user closed the window):

1. `hook_server.stop()` — shuts down the HTTP server, joins the thread.
2. `perf_monitor.stop()`.
3. `ai_client.cleanup()` — destroys server-side API caches (Gemini `CachedContent`).
4. **Dual-Flush persistence**: `_flush_to_project()`, `_save_active_project()`, `_flush_to_config()`, `save_config()` — commits state back to both project and global configs.
5. `session_logger.close_session()`.

The asyncio loop thread is a daemon — it dies with the process. `App.shutdown()` exists for explicit cleanup in test scenarios:

```python
def shutdown(self) -> None:
    if self._loop.is_running():
        self._loop.call_soon_threadsafe(self._loop.stop)
    if self._loop_thread.is_alive():
        self._loop_thread.join(timeout=2.0)
```

---

## The Task Pipeline: Producer-Consumer Synchronization

### Request Flow

```
GUI Thread                     Asyncio Thread                 GUI Thread (next frame)
──────────                     ──────────────                 ──────────────────────
1. User clicks "Gen + Send"
2. _handle_generate_send():
   - Compiles md context
   - Creates UserRequestEvent
   - Enqueues via
     run_coroutine_threadsafe ──> 3. _process_event_queue():
                                      awaits event_queue.get()
                                      routes "user_request" to
                                      _handle_request_event()
                                  4. Configures ai_client
                                  5. ai_client.send() BLOCKS
                                     (seconds to minutes)
                                  6. On completion, enqueues
                                     "response" event back ──> 7. _process_pending_gui_tasks():
                                                                   Drains task list under lock
                                                                   Sets ai_response text
                                                                   Triggers terminal blink
```
|
||||
|
||||
### Event Types Routed by `_process_event_queue`
|
||||
|
||||
| Event Name | Action |
|
||||
|---|---|
|
||||
| `"user_request"` | Calls `_handle_request_event(payload)` — synchronous blocking AI call |
|
||||
| `"response"` | Appends `{"action": "handle_ai_response", ...}` to `_pending_gui_tasks` |
|
||||
| `"mma_state_update"` | Appends `{"action": "mma_state_update", ...}` to `_pending_gui_tasks` |
|
||||
| `"mma_spawn_approval"` | Appends the raw payload for HITL dialog creation |
|
||||
| `"mma_step_approval"` | Appends the raw payload for HITL dialog creation |
|
||||
|
||||
The pattern: events arriving on the asyncio thread that need GUI state changes are **serialized into `_pending_gui_tasks`** for consumption on the next render frame.
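The GUI-to-asyncio handoff can be sketched with `asyncio.run_coroutine_threadsafe`. This is a minimal, self-contained illustration of the pattern, not the application's actual API: `EventBridge`, `enqueue`, and `drain_one` are hypothetical names.

```python
import asyncio
import threading

class EventBridge:
    """Runs an asyncio loop on a daemon thread; the GUI thread enqueues into it."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.queue: asyncio.Queue = asyncio.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def enqueue(self, name: str, payload: dict) -> None:
        # Called from the GUI thread: thread-safe handoff into the loop thread.
        asyncio.run_coroutine_threadsafe(
            self.queue.put((name, payload)), self.loop
        ).result(timeout=2.0)

    def drain_one(self) -> tuple:
        # Demonstration helper: fetch one event back out of the loop thread.
        fut = asyncio.run_coroutine_threadsafe(self.queue.get(), self.loop)
        return fut.result(timeout=2.0)

bridge = EventBridge()
bridge.enqueue("user_request", {"message": "hello"})
event = bridge.drain_one()
bridge.loop.call_soon_threadsafe(bridge.loop.stop)
```

The key property is that `run_coroutine_threadsafe` is the only safe way to schedule coroutine work on a loop owned by another thread; a plain `queue.put_nowait` from the GUI thread would race the loop's internals.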
### Frame-Sync Mechanism: `_process_pending_gui_tasks`

Called once per ImGui frame on the **main GUI thread**. This is the sole safe point for mutating GUI-visible state.

**Locking strategy** — copy-and-clear:

```python
def _process_pending_gui_tasks(self) -> None:
    if not self._pending_gui_tasks:
        return
    with self._pending_gui_tasks_lock:
        tasks = self._pending_gui_tasks[:]  # Snapshot
        self._pending_gui_tasks.clear()     # Release lock fast
    for task in tasks:
        ...  # Process each task outside the lock
```

Acquires the lock briefly to snapshot the task list, then processes outside the lock. Minimizes lock contention with producer threads.
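The copy-and-clear pattern can be exercised in isolation. This is a standalone sketch with illustrative names (`pending`, `drain_once`), not the application's code; it demonstrates that concurrent producers never lose tasks even though the drain's initial emptiness check is unlocked.

```python
import threading

pending: list = []
pending_lock = threading.Lock()
processed: list = []

def producer(n: int) -> None:
    for i in range(n):
        with pending_lock:
            pending.append({"action": "demo", "seq": i})

def drain_once() -> None:
    if not pending:          # Cheap unlocked check; worst case: one extra frame of delay
        return
    with pending_lock:
        tasks = pending[:]   # Snapshot under the lock
        pending.clear()      # Release the lock quickly
    for task in tasks:       # Process outside the lock
        processed.append(task)

threads = [threading.Thread(target=producer, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
drain_once()                 # All 4 x 100 tasks drained in one pass
```

The unlocked `if not pending` check is safe here because a false negative only defers processing to the next frame, and a false positive merely acquires the lock on an empty list.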
### Complete Action Type Catalog

| Action | Source | Effect |
|---|---|---|
| `"refresh_api_metrics"` | asyncio/hooks | Updates API metrics display |
| `"handle_ai_response"` | asyncio | Sets `ai_response`, `ai_status`, `mma_streams[stream_id]`; triggers blink; optionally auto-adds to discussion history |
| `"show_track_proposal"` | asyncio | Sets `proposed_tracks` list, opens modal |
| `"mma_state_update"` | asyncio | Updates `mma_status`, `active_tier`, `mma_tier_usage`, `active_tickets`, `active_track` |
| `"set_value"` | HookServer | Sets any field in `_settable_fields` map via `setattr`; special-cases `current_provider`/`current_model` to reconfigure AI client |
| `"click"` | HookServer | Dispatches to `_clickable_actions` map; introspects signatures to decide whether to pass `user_data` |
| `"select_list_item"` | HookServer | Routes to `_switch_discussion()` for discussion listbox |
| `{"type": "ask"}` | HookServer | Opens ask dialog: sets `_pending_ask_dialog = True`, stores `_ask_request_id` and `_ask_tool_data` |
| `"clear_ask"` | HookServer | Clears ask dialog state if request_id matches |
| `"custom_callback"` | HookServer | Executes an arbitrary callable with args |
| `"mma_step_approval"` | asyncio (MMA engine) | Creates `MMAApprovalDialog`, stores in `_pending_mma_approval` |
| `"mma_spawn_approval"` | asyncio (MMA engine) | Creates `MMASpawnApprovalDialog`, stores in `_pending_mma_spawn` |
| `"refresh_from_project"` | HookServer/internal | Reloads all UI state from project dict |

---

## The Execution Clutch: Human-in-the-Loop

The "Execution Clutch" ensures every destructive AI action passes through an auditable human gate. Three dialog types implement this, all sharing the same blocking pattern.
### Dialog Classes

**`ConfirmDialog`** — PowerShell script execution approval:

```python
class ConfirmDialog:
    _uid: str                        # uuid4 identifier
    _script: str                     # The PowerShell script text (editable)
    _base_dir: str                   # Working directory
    _condition: threading.Condition  # Blocking primitive
    _done: bool                      # Signal flag
    _approved: bool                  # User's decision

    def wait(self) -> tuple[bool, str]  # Blocks until _done; returns (approved, script)
```

**`MMAApprovalDialog`** — MMA tier step approval:

```python
class MMAApprovalDialog:
    _ticket_id: str
    _payload: str                    # The step payload (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool

    def wait(self) -> tuple[bool, str]  # Returns (approved, payload)
```

**`MMASpawnApprovalDialog`** — Sub-agent spawn approval:

```python
class MMASpawnApprovalDialog:
    _ticket_id: str
    _role: str                       # tier3-worker, tier4-qa, etc.
    _prompt: str                     # Spawn prompt (editable)
    _context_md: str                 # Context document (editable)
    _condition: threading.Condition
    _done: bool
    _approved: bool
    _abort: bool                     # Can abort entire track

    def wait(self) -> dict[str, Any]  # Returns {approved, abort, prompt, context_md}
```
### Blocking Flow

Using `ConfirmDialog` as exemplar:

```
ASYNCIO THREAD (ai_client tool callback)       GUI MAIN THREAD
────────────────────────────────────────       ───────────────
1. ai_client calls _confirm_and_run(script)
2. Creates ConfirmDialog(script, base_dir)
3. Stores dialog:
   - Headless: _pending_actions[uid] = dialog
   - GUI mode: _pending_dialog = dialog
4. If test_hooks_enabled:
   pushes to _api_event_queue
5. dialog.wait() BLOCKS on _condition
                                               6. Next frame: ImGui renders
                                                  _pending_dialog in modal
                                               7. User clicks Approve/Reject
                                               8. _handle_approve_script():
                                                  with dialog._condition:
                                                      dialog._approved = True
                                                      dialog._done = True
                                                      dialog._condition.notify_all()
9. wait() returns (True, potentially_edited_script)
10. Executes shell_runner.run_powershell()
11. Returns output to ai_client
```

The `_condition.wait(timeout=0.1)` uses a 100ms polling interval inside a loop — a polling-with-condition hybrid that ensures the blocking thread wakes periodically.
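The wait pattern above can be reduced to a minimal runnable sketch. `Gate` and `resolve` are illustrative stand-ins for the dialog classes; the essential part is the `Condition` polled with a 100ms timeout inside a `while not done` loop, so the blocked thread wakes periodically even if a notification is missed.

```python
import threading
import time

class Gate:
    def __init__(self) -> None:
        self._condition = threading.Condition()
        self._done = False
        self._approved = False

    def wait(self) -> bool:
        with self._condition:
            while not self._done:
                self._condition.wait(timeout=0.1)  # Poll every 100 ms
        return self._approved

    def resolve(self, approved: bool) -> None:
        with self._condition:
            self._approved = approved
            self._done = True
            self._condition.notify_all()

gate = Gate()
# Simulate the GUI thread approving ~50 ms later.
t = threading.Thread(target=lambda: (time.sleep(0.05), gate.resolve(True)))
t.start()
result = gate.wait()   # Blocks briefly, then returns True
t.join()
```

Rechecking the flag inside the `while` loop guards against spurious wakeups, which `Condition.wait` explicitly permits.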
### Resolution Paths

**GUI button path** (normal interactive use):
`_handle_approve_script()` / `_handle_approve_mma_step()` / `_handle_approve_spawn()` directly manipulate the dialog's condition variable from the GUI thread.

**HTTP API path** (headless/automation):
`resolve_pending_action(action_id, approved)` looks up the dialog by UUID in the `_pending_actions` dict (headless) or `_pending_dialog` (GUI), then signals the condition:

```python
def resolve_pending_action(self, action_id: str, approved: bool) -> bool:
    with self._pending_dialog_lock:
        if action_id in self._pending_actions:
            dialog = self._pending_actions[action_id]
            with dialog._condition:
                dialog._approved = approved
                dialog._done = True
                dialog._condition.notify_all()
            return True
```

**MMA approval path**:
`_handle_mma_respond(approved, payload, abort, prompt, context_md)` is the unified resolver. It uses a `dialog_container` — a one-element list `[None]` used as a mutable reference shared between the MMA engine (which creates the container) and the GUI (which populates it via `_process_pending_gui_tasks`).
---

## AI Client: Multi-Provider Architecture

`ai_client.py` operates as a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.

### Module-Level State

```python
_provider: str = "gemini"          # "gemini" | "anthropic" | "deepseek" | "gemini_cli"
_model: str = "gemini-2.5-flash-lite"
_temperature: float = 0.0
_max_tokens: int = 8192
_history_trunc_limit: int = 8000   # Char limit for truncating old tool outputs

_send_lock: threading.Lock         # Serializes ALL send() calls across providers
```
Per-provider client objects:

```python
# Gemini (SDK-managed stateful chat)
_gemini_client: genai.Client | None
_gemini_chat: Any                  # Holds history internally
_gemini_cache: Any                 # Server-side CachedContent
_gemini_cache_md_hash: int | None  # For cache invalidation
_GEMINI_CACHE_TTL: int = 3600      # 1 hour; rebuilt at 90% (3240s)

# Anthropic (client-managed history)
_anthropic_client: anthropic.Anthropic | None
_anthropic_history: list[dict]     # Mutable [{role, content}, ...]
_anthropic_history_lock: threading.Lock

# DeepSeek (raw HTTP, client-managed history)
_deepseek_history: list[dict]
_deepseek_history_lock: threading.Lock

# Gemini CLI (adapter wrapper)
_gemini_cli_adapter: GeminiCliAdapter | None
```

Safety limits:

```python
MAX_TOOL_ROUNDS: int = 10                    # Max tool-call loop iterations per send()
_MAX_TOOL_OUTPUT_BYTES: int = 500_000        # 500KB cumulative tool output budget
_ANTHROPIC_CHUNK_SIZE: int = 120_000         # Max chars per system text block
_ANTHROPIC_MAX_PROMPT_TOKENS: int = 180_000  # 200k limit minus headroom
_GEMINI_MAX_INPUT_TOKENS: int = 900_000      # 1M window minus headroom
```
### The `send()` Dispatcher

```python
def send(md_content, user_message, base_dir=".", file_items=None,
         discussion_history="", stream=False,
         pre_tool_callback=None, qa_callback=None) -> str:
    with _send_lock:
        if _provider == "gemini":       return _send_gemini(...)
        elif _provider == "gemini_cli": return _send_gemini_cli(...)
        elif _provider == "anthropic":  return _send_anthropic(...)
        elif _provider == "deepseek":   return _send_deepseek(..., stream=stream)
```

`_send_lock` serializes all API calls — only one provider call can be in-flight at a time. All providers share the same callback signatures. The return type is always `str`.
### Provider Comparison

| Aspect | Gemini SDK | Anthropic | DeepSeek | Gemini CLI |
|---|---|---|---|---|
| **Client** | `genai.Client` | `anthropic.Anthropic` | Raw `requests.post` | `GeminiCliAdapter` (subprocess) |
| **History** | SDK-managed (`_gemini_chat._history`) | Client-managed list | Client-managed list | CLI-managed (session ID) |
| **Caching** | Server-side `CachedContent` with TTL | Prompt caching via `cache_control: ephemeral` (4 breakpoints) | None | None |
| **Tool format** | `types.FunctionDeclaration` | JSON Schema dict | Not declared | Same as SDK via adapter |
| **Tool results** | `Part.from_function_response(response={"output": ...})` | `{"type": "tool_result", "tool_use_id": ..., "content": ...}` | `{"role": "tool", "tool_call_id": ..., "content": ...}` | `{"role": "tool", ...}` |
| **History trimming** | In-place at 40% of 900K token estimate | 2-phase: strip stale file refreshes, then drop turn pairs at 180K | None | None |
| **Streaming** | No | No | Yes | No |
### Tool-Call Loop (common pattern across providers)

All providers follow the same high-level loop, iterated up to `MAX_TOOL_ROUNDS + 2` times:

1. Send message (or tool results from the prior round) to the API.
2. Extract text response and any function calls.
3. Log to comms log; emit events.
4. If no function calls or max rounds exceeded: **break**.
5. For each function call:
   - If `pre_tool_callback` rejects: return rejection text.
   - Dispatch to `mcp_client.dispatch()` or `shell_runner.run_powershell()`.
   - After the **last** call of this round: run `_reread_file_items()` for context refresh.
   - Truncate tool output at `_history_trunc_limit` chars.
   - Accumulate `_cumulative_tool_bytes`.
6. If cumulative bytes > 500KB: inject a warning.
7. Package tool results in provider-specific format; loop.
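The loop above can be sketched provider-agnostically. `run_loop`, `fake_model`, and the `(text, calls)` return shape are illustrative assumptions, not the real `ai_client` API; the limits mirror `MAX_TOOL_ROUNDS` and `_history_trunc_limit`.

```python
MAX_TOOL_ROUNDS = 10
TRUNC_LIMIT = 8000

def run_loop(model, dispatch, message):
    """Drive a model/tool round-trip until the model stops calling tools."""
    payload = message
    text = ""
    for round_no in range(MAX_TOOL_ROUNDS + 2):
        text, calls = model(payload)            # (response text, [(name, args), ...])
        if not calls or round_no >= MAX_TOOL_ROUNDS:
            break                               # No tools requested, or budget exhausted
        results = []
        for name, args in calls:
            output = dispatch(name, args)
            if len(output) > TRUNC_LIMIT:       # Truncate oversized tool output
                output = output[:TRUNC_LIMIT] + "\n[truncated]"
            results.append((name, output))
        payload = results                       # Fed back as tool results next round
    return text

# Fake model: one tool round, then a final answer.
state = {"rounds": 0}
def fake_model(payload):
    state["rounds"] += 1
    if state["rounds"] == 1:
        return "", [("read_file", {"path": "a.py"})]
    return "done", []

out = run_loop(fake_model, lambda name, args: "contents", "hi")
```

The `MAX_TOOL_ROUNDS + 2` iteration bound gives the model one final text-only turn after the tool budget is spent.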
### Context Refresh Mechanism

After the last tool call in each round, `_reread_file_items(file_items)` checks mtimes of all tracked files:

1. For each file item: compare `Path.stat().st_mtime` against the stored `mtime`.
2. If unchanged: pass through as-is.
3. If changed: re-read content, store `old_content` for diffing, update `mtime`.
4. Changed files are diffed via `_build_file_diff_text`:
   - Files <= 200 lines: emit full content.
   - Files > 200 lines with `old_content`: emit `difflib.unified_diff`.
5. The diff is appended to the last tool's output as `[SYSTEM: FILES UPDATED]\n\n{diff}`.
6. Stale `[FILES UPDATED]` blocks are stripped from older history turns by `_strip_stale_file_refreshes` to prevent context bloat.
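The mtime-plus-diff check can be sketched as follows. The `item` dict shape (`path`/`content`/`mtime`) is an assumption for illustration; the real file-item structure may differ, but the `difflib.unified_diff` usage matches the mechanism described above.

```python
import difflib
import os
import tempfile
from pathlib import Path

def reread(item: dict):
    """Return a unified diff if the tracked file changed on disk, else None."""
    p = Path(item["path"])
    mtime = p.stat().st_mtime
    if mtime == item["mtime"]:
        return None                            # Unchanged: pass through as-is
    old = item["content"]
    new = p.read_text(encoding="utf-8")
    item.update(content=new, mtime=mtime)      # Refresh stored content + mtime
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=item["path"], tofile=item["path"],
    ))

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mod.py")
    Path(path).write_text("x = 1\n")
    item = {"path": path, "content": "x = 1\n",
            "mtime": Path(path).stat().st_mtime}
    assert reread(item) is None                # Unchanged file: no diff
    Path(path).write_text("x = 2\n")
    os.utime(path, (0, Path(path).stat().st_mtime + 1))  # Force a distinct mtime
    diff = reread(item)
```

The explicit `os.utime` in the demo sidesteps filesystem mtime granularity, which is also why mtime comparison alone can miss sub-second rewrites in practice.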
### Anthropic Cache Strategy (4-Breakpoint System)

Anthropic allows a maximum of 4 `cache_control: ephemeral` breakpoints:

| # | Location | Purpose |
|---|---|---|
| 1 | Last block of stable system prompt | Cache base instructions |
| 2 | Last block of context chunks | Cache file context |
| 3 | Last tool definition | Cache tool schema |
| 4 | Second-to-last user message | Cache conversation prefix |

Before placing breakpoint 4, all existing `cache_control` is stripped from history to prevent exceeding the limit.
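The strip-then-mark step for breakpoint 4 is pure data manipulation and can be sketched directly. The block shapes follow Anthropic's documented message format (`{"type": "text", ..., "cache_control": {"type": "ephemeral"}}`), but `place_history_breakpoint` itself is an illustrative helper, not the application's function.

```python
EPHEMERAL = {"type": "ephemeral"}

def place_history_breakpoint(history: list) -> None:
    # Strip stale markers first so the 4-breakpoint limit is never exceeded.
    for msg in history:
        for block in msg["content"]:
            block.pop("cache_control", None)
    # Mark the last block of the second-to-last user message.
    user_idxs = [i for i, m in enumerate(history) if m["role"] == "user"]
    if len(user_idxs) >= 2:
        history[user_idxs[-2]]["content"][-1]["cache_control"] = EPHEMERAL

history = [
    {"role": "user", "content": [{"type": "text", "text": "q1",
                                  "cache_control": {"type": "ephemeral"}}]},
    {"role": "assistant", "content": [{"type": "text", "text": "a1"}]},
    {"role": "user", "content": [{"type": "text", "text": "q2"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "a2"}]},
    {"role": "user", "content": [{"type": "text", "text": "q3"}]},
]
place_history_breakpoint(history)
```

Marking the second-to-last user message (rather than the last) keeps the cached prefix stable across the in-flight turn, so the next request gets a cache hit on everything before it.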
### Gemini Cache Strategy (Server-Side TTL)

System instruction content is hashed. On each call, a 3-way decision:

- **Hash changed**: Delete the old cache, rebuild with new content.
- **Cache age > 90% of TTL**: Proactive renewal (delete + rebuild).
- **No cache exists**: Create a new `CachedContent` if the token count >= 2048; otherwise inline.
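The decision can be expressed as a pure function. `cache_decision` and its dict-based cache record are illustrative; the thresholds mirror the documented 1-hour TTL, 90% renewal point, and 2048-token minimum.

```python
import time

TTL = 3600                  # 1-hour server-side cache lifetime
RENEW_AT = int(TTL * 0.9)   # Proactive rebuild threshold: 3240 s
MIN_TOKENS = 2048           # Below this, inline the content instead of caching

def cache_decision(cache, now: float, content_hash: int, tokens: int) -> str:
    if cache is not None and cache["md_hash"] != content_hash:
        return "rebuild"    # Content changed: delete + recreate
    if cache is not None and now - cache["created"] > RENEW_AT:
        return "rebuild"    # Past 90% of TTL: renew before expiry
    if cache is None:
        return "create" if tokens >= MIN_TOKENS else "inline"
    return "reuse"

now = time.time()
fresh = {"md_hash": 42, "created": now}
assert cache_decision(fresh, now, 42, 5000) == "reuse"
assert cache_decision(fresh, now, 43, 5000) == "rebuild"         # Hash changed
assert cache_decision(fresh, now + 3300, 42, 5000) == "rebuild"  # Past 90% TTL
assert cache_decision(None, now, 42, 5000) == "create"
assert cache_decision(None, now, 42, 100) == "inline"            # Below 2048 tokens
```

Renewing at 90% rather than on expiry avoids the uncached (full-price) request that would otherwise occur on the first call after the TTL lapses.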
---

## Comms Log System

Every API interaction is logged to a module-level list with real-time GUI push:

```python
def _append_comms(direction: str, kind: str, payload: dict[str, Any]) -> None:
    entry = {
        "ts": datetime.now().strftime("%H:%M:%S"),
        "direction": direction,  # "OUT" (to API) or "IN" (from API)
        "kind": kind,            # "request" | "response" | "tool_call" | "tool_result"
        "provider": _provider,
        "model": _model,
        "payload": payload,
    }
    _comms_log.append(entry)
    if comms_log_callback:
        comms_log_callback(entry)  # Real-time push to GUI
```
---

## State Machines

### `ai_status` (Informal)

```
"idle" -> "sending..." -> [AI call in progress]
       -> "running powershell..." -> "powershell done, awaiting AI..."
       -> "fetching url..." | "searching web..."
       -> "done" | "error"
       -> "idle" (on reset)
```

### HITL Dialog State (Binary per type)

- `_pending_dialog is not None` — script confirmation active
- `_pending_mma_approval is not None` — MMA step approval active
- `_pending_mma_spawn is not None` — spawn approval active
- `_pending_ask_dialog == True` — tool ask dialog active

---

## Security: The MCP Allowlist

Every filesystem tool (read, list, search, write) is gated by the MCP Bridge (`mcp_client.py`). See [guide_tools.md](guide_tools.md) for the complete security model, tool inventory, and endpoint reference.

Summary: Every path is resolved to an absolute path and checked against a dynamically-built allowlist constructed from the project's tracked files and base directories. Files named `history.toml` or `*_history.toml` are hard-blacklisted.
---

## Telemetry & Auditing

Every interaction is designed to be auditable:

- **JSON-L Comms Logs**: Raw API traffic logged to `logs/sessions/<id>/comms.log` for debugging and token cost analysis.
- **Tool Call Logs**: Markdown-formatted sequential records to `toolcalls.log`.
- **Generated Scripts**: Every PowerShell script that passes through the Execution Clutch is saved to `scripts/generated/<ts>_<seq>.ps1`.
- **API Hook Logs**: All HTTP hook invocations logged to `apihooks.log`.
- **CLI Call Logs**: Subprocess execution details (command, stdin, stdout, stderr, latency) to `clicalls.log` as JSON-L.
- **Performance Monitor**: Real-time FPS, Frame Time, CPU, Input Lag tracked and queryable via the Hook API.

---

## Architectural Invariants

1. **Single-writer principle**: All GUI state mutations happen on the main thread via `_process_pending_gui_tasks`. Background threads never write GUI state directly.
2. **Copy-and-clear lock pattern**: `_process_pending_gui_tasks` snapshots and clears the task list under the lock, then processes outside the lock.
3. **Context Amnesia**: Each MMA Tier 3 Worker starts with `ai_client.reset_session()`. No conversational bleed between tickets.
4. **Send serialization**: `_send_lock` ensures only one provider call is in-flight at a time across all threads.
5. **Dual-Flush persistence**: On exit, state is committed to both project-level and global-level config files.
---

**File: `docs/guide_mma.md`** (new file, 368 lines)
# MMA: 4-Tier Multi-Model Agent Orchestration

[Top](../Readme.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [Simulations](guide_simulations.md)

---

## Overview

The MMA (Multi-Model Agent) system is a hierarchical task decomposition and execution engine. A high-level "epic" is broken into tracks, tracks are decomposed into tickets with dependency relationships, and tickets are executed by stateless workers with human-in-the-loop approval at every destructive boundary.

```
Tier 1: Orchestrator — product alignment, epic → tracks
Tier 2: Tech Lead    — track → tickets (DAG), architectural oversight
Tier 3: Worker       — stateless TDD implementation per ticket
Tier 4: QA           — stateless error analysis, no fixes
```
---

## Data Structures (`models.py`)

### Ticket

The atomic unit of work. All MMA execution revolves around transitioning tickets through their state machine.

```python
@dataclass
class Ticket:
    id: str                            # e.g., "T-001"
    description: str                   # Human-readable task description
    status: str                        # "todo" | "in_progress" | "completed" | "blocked"
    assigned_to: str                   # Tier assignment: "tier3-worker", "tier4-qa"
    target_file: Optional[str] = None  # File this ticket modifies
    context_requirements: List[str] = field(default_factory=list)  # Files needed for context injection
    depends_on: List[str] = field(default_factory=list)            # Ticket IDs that must complete first
    blocked_reason: Optional[str] = None  # Why this ticket is blocked
    step_mode: bool = False               # If True, requires manual approval before execution

    def mark_blocked(self, reason: str) -> None  # Sets status="blocked", stores reason
    def mark_complete(self) -> None              # Sets status="completed"
    def to_dict(self) -> Dict[str, Any]
    @classmethod
    def from_dict(cls, data) -> "Ticket"
```

**Status state machine:**

```
todo ──> in_progress ──> completed
  |           |
  v           v
blocked    blocked
```
### Track

A collection of tickets with a shared goal.

```python
@dataclass
class Track:
    id: str           # Track identifier
    description: str  # Track-level brief
    tickets: List[Ticket] = field(default_factory=list)  # Ordered list of tickets

    def get_executable_tickets(self) -> List[Ticket]
        # Returns all 'todo' tickets whose depends_on are all 'completed'
```

### WorkerContext

```python
@dataclass
class WorkerContext:
    ticket_id: str        # Which ticket this worker is processing
    model_name: str       # LLM model to use (e.g., "gemini-2.5-flash-lite")
    messages: List[dict]  # Conversation history for this worker
```
---

## DAG Engine (`dag_engine.py`)

Two classes: `TrackDAG` (graph) and `ExecutionEngine` (state machine).

### TrackDAG

```python
class TrackDAG:
    def __init__(self, tickets: List[Ticket]):
        self.tickets = tickets
        self.ticket_map = {t.id: t for t in tickets}  # O(1) lookup by ID
```

**`get_ready_tasks()`**: Returns tickets where `status == 'todo'` AND all `depends_on` have `status == 'completed'`. Missing dependencies are treated as NOT completed (fail-safe).
**`has_cycle()`**: Classic DFS cycle detection using a visited set plus a recursion stack:

```python
def has_cycle(self) -> bool:
    visited = set()
    rec_stack = set()

    def is_cyclic(ticket_id):
        if ticket_id in rec_stack:  # Back edge = cycle
            return True
        if ticket_id in visited:    # Already explored
            return False
        visited.add(ticket_id)
        rec_stack.add(ticket_id)
        for neighbor in self.ticket_map[ticket_id].depends_on:
            if is_cyclic(neighbor):
                return True
        rec_stack.remove(ticket_id)
        return False

    for ticket in self.tickets:
        if ticket.id not in visited:
            if is_cyclic(ticket.id):
                return True
    return False
```

**`topological_sort()`**: Calls `has_cycle()` first — raises `ValueError` if a cycle is found. Standard DFS post-order topological sort. Returns a list of ticket ID strings in dependency order.
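The sort can be sketched as a standalone function over the `Ticket` fields shown above; this sketch folds cycle detection into the same DFS via a recursion-stack check, which is one way the `has_cycle()`-then-sort contract could be realized.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    id: str
    depends_on: list = field(default_factory=list)

def topological_sort(tickets: list) -> list:
    ticket_map = {t.id: t for t in tickets}
    order, visited, stack = [], set(), set()

    def visit(tid: str) -> None:
        if tid in stack:
            raise ValueError(f"Cycle detected at {tid}")
        if tid in visited or tid not in ticket_map:
            return                      # Missing deps are skipped, not sorted
        stack.add(tid)
        for dep in ticket_map[tid].depends_on:
            visit(dep)
        stack.remove(tid)
        visited.add(tid)
        order.append(tid)               # Post-order: dependencies land first

    for t in tickets:
        visit(t.id)
    return order

order = topological_sort([
    Ticket("T-003", depends_on=["T-001", "T-002"]),
    Ticket("T-002", depends_on=["T-001"]),
    Ticket("T-001"),
])
# order == ["T-001", "T-002", "T-003"]
```

Post-order emission is what guarantees every ticket appears after all of its dependencies, which is exactly the order the execution engine needs.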
### ExecutionEngine

```python
class ExecutionEngine:
    def __init__(self, dag: TrackDAG, auto_queue: bool = False):
        self.dag = dag
        self.auto_queue = auto_queue
```

**`tick()`** — the heartbeat. On each call:

1. Queries `dag.get_ready_tasks()` for eligible tickets.
2. If `auto_queue` is enabled: non-`step_mode` tasks are automatically promoted to `in_progress`.
3. `step_mode` tasks remain in `todo` until `approve_task()` is called.
4. Returns the list of ready tasks.

**`approve_task(task_id)`**: Manually transitions `todo` → `in_progress` if all dependencies are met.

**`update_task_status(task_id, status)`**: Force-sets status (used by workers to mark `completed` or `blocked`).
---

## ConductorEngine (`multi_agent_conductor.py`)

The Tier 2 orchestrator. Owns the execution loop that drives tickets through the DAG.

```python
class ConductorEngine:
    def __init__(self, track: Track, event_queue=None, auto_queue=False):
        self.track = track
        self.event_queue = event_queue
        self.tier_usage = {
            "Tier 1": {"input": 0, "output": 0},
            "Tier 2": {"input": 0, "output": 0},
            "Tier 3": {"input": 0, "output": 0},
            "Tier 4": {"input": 0, "output": 0},
        }
        self.dag = TrackDAG(self.track.tickets)
        self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue)
```
### State Broadcast (`_push_state`)

On every state change, the engine pushes the full orchestration state to the GUI via `AsyncEventQueue`:

```python
async def _push_state(self, status="running", active_tier=None):
    payload = {
        "status": status,            # "running" | "done" | "blocked"
        "active_tier": active_tier,  # e.g., "Tier 2 (Tech Lead)", "Tier 3 (Worker): T-001"
        "tier_usage": self.tier_usage,
        "track": {"id": self.track.id, "title": self.track.description},
        "tickets": [asdict(t) for t in self.track.tickets],
    }
    await self.event_queue.put("mma_state_update", payload)
```

This payload is consumed by the GUI's `_process_pending_gui_tasks` handler for `"mma_state_update"`, which updates `mma_status`, `active_tier`, `mma_tier_usage`, `active_tickets`, and `active_track`.

### Ticket Ingestion (`parse_json_tickets`)

Parses a JSON array of ticket dicts (from Tier 2 LLM output) into `Ticket` objects, appends to `self.track.tickets`, then rebuilds the `TrackDAG` and `ExecutionEngine`.
### Main Execution Loop (`run`)

```python
async def run(self):
    while True:
        ready_tasks = self.engine.tick()

        if not ready_tasks:
            if all(t.status == "completed" for t in self.track.tickets):
                await self._push_state("done")
                break
            if any(t.status == "in_progress" for t in self.track.tickets):
                await asyncio.sleep(1)   # Waiting for async workers
                continue
            await self._push_state("blocked")
            break

        for ticket in ready_tasks:
            if ticket.status == "in_progress" or (self.auto_queue and not ticket.step_mode):
                ticket.status = "in_progress"
                await self._push_state("running", f"Tier 3 (Worker): {ticket.id}")

                # Create worker context
                context = WorkerContext(
                    ticket_id=ticket.id,
                    model_name="gemini-2.5-flash-lite",
                    messages=[],
                )

                # Execute in thread pool (blocking AI call)
                await loop.run_in_executor(
                    None, run_worker_lifecycle, ticket, context, ...
                )

                await self._push_state("running", "Tier 2 (Tech Lead)")

            elif ticket.status == "todo" and (ticket.step_mode or not self.auto_queue):
                await self._push_state("running", f"Awaiting Approval: {ticket.id}")
                await asyncio.sleep(1)   # Pause for HITL approval
```
---

## Tier 2: Tech Lead (`conductor_tech_lead.py`)

The Tier 2 AI call converts a high-level Track brief into discrete Tier 3 tickets.

### `generate_tickets(track_brief, module_skeletons) -> list[dict]`

```python
def generate_tickets(track_brief: str, module_skeletons: str) -> list[dict]:
    system_prompt = mma_prompts.PROMPTS.get("tier2_sprint_planning")
    user_message = (
        f"### TRACK BRIEF:\n{track_brief}\n\n"
        f"### MODULE SKELETONS:\n{module_skeletons}\n\n"
        "Please generate the implementation tickets for this track."
    )
    # Temporarily override the system prompt
    old_system_prompt = ai_client._custom_system_prompt
    ai_client.set_custom_system_prompt(system_prompt)
    try:
        response = ai_client.send(md_content="", user_message=user_message)
        # Multi-layer JSON extraction:
        #   1. Try ```json ... ``` blocks
        #   2. Try ``` ... ``` blocks
        #   3. Regex search for [ { ... } ] pattern
        tickets = json.loads(json_match)
        return tickets
    finally:
        ai_client.set_custom_system_prompt(old_system_prompt)
```

The JSON extraction is defensive — it handles markdown code fences, bare JSON, and a regex fallback for embedded arrays.
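The layered extraction can be sketched concretely. `extract_tickets` is an illustrative stand-in for the inline logic, but the three layers match the strategy above: fenced ` ```json ` blocks, bare fences, then a regex fallback for an embedded `[ { ... } ]` array.

```python
import json
import re

def extract_tickets(response: str) -> list:
    # Layers 1 and 2: fenced code blocks, json-tagged first.
    for pattern in (r"```json\s*(.*?)```", r"```\s*(.*?)```"):
        m = re.search(pattern, response, re.DOTALL)
        if m:
            try:
                return json.loads(m.group(1))
            except json.JSONDecodeError:
                continue                    # Fall through to the next layer
    # Layer 3: first [ { ... } ] array embedded anywhere in the text.
    m = re.search(r"\[\s*\{.*\}\s*\]", response, re.DOTALL)
    if m:
        return json.loads(m.group(0))
    raise ValueError("No JSON ticket array found in response")

fenced = 'Here are the tickets:\n```json\n[{"id": "T-001"}]\n```'
bare = 'Tickets: [ {"id": "T-002"} ] done.'
```

Trying the `json`-tagged fence before the bare fence matters: the bare pattern would otherwise capture the literal `json` language tag as part of the payload.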
### `topological_sort(tickets: list[dict]) -> list[dict]`

Convenience wrapper: converts raw dicts to `Ticket` objects, builds a `TrackDAG`, calls `dag.topological_sort()`, and returns the original dicts reordered by the sorted IDs.

---
## Tier 3: Worker Lifecycle (`run_worker_lifecycle`)

This free function executes a single ticket. Key behaviors:

### Context Amnesia

```python
ai_client.reset_session()  # Each ticket starts with a clean slate
```

No conversational bleed between tickets. Every worker is stateless.

### Context Injection

For `context_requirements` files:

- First file: `parser.get_curated_view(content)` — full skeleton with `@core_logic` and `[HOT]` bodies preserved.
- Subsequent files: `parser.get_skeleton(content)` — cheaper, signatures + docstrings only.

### Prompt Construction

```python
user_message = (
    f"You are assigned to Ticket {ticket.id}.\n"
    f"Task Description: {ticket.description}\n"
    f"\nContext Files:\n{context_injection}\n"
    "Please complete this task. If you are blocked and cannot proceed, "
    "start your response with 'BLOCKED' and explain why."
)
```

### HITL Clutch Integration

If `event_queue` is provided, `confirm_spawn()` is called before executing, allowing the user to:

- Read the prompt and context.
- Edit both the prompt and the context markdown.
- Approve, reject, or abort the entire track.

The `confirm_spawn` function uses the `dialog_container` pattern:

1. Create `dialog_container = [None]` (a mutable container for thread communication).
2. Push an `"mma_spawn_approval"` task to the event queue with the container.
3. Poll `dialog_container[0]` every 100ms for up to 60 seconds.
4. When the GUI fills in the dialog, call `.wait()` to get the result.
5. Returns `(approved, modified_prompt, modified_context)`.
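The steps above can be sketched end-to-end. `Dialog` is a stand-in for `MMASpawnApprovalDialog`, and the GUI side is simulated by a thread; the essential mechanics are the one-element list shared across threads and the poll-then-block handoff.

```python
import threading
import time

class Dialog:
    """Stand-in for the spawn-approval dialog: condition-gated result holder."""
    def __init__(self) -> None:
        self._condition = threading.Condition()
        self._done = False
        self._result: dict = {}

    def wait(self) -> dict:
        with self._condition:
            while not self._done:
                self._condition.wait(timeout=0.1)
        return self._result

    def resolve(self, **result) -> None:
        with self._condition:
            self._result = result
            self._done = True
            self._condition.notify_all()

def confirm_spawn(container: list, timeout: float = 60.0) -> dict:
    deadline = time.monotonic() + timeout
    while container[0] is None:          # Poll the container every 100 ms
        if time.monotonic() > deadline:
            return {"approved": False, "abort": True}   # Timed out: fail safe
        time.sleep(0.1)
    return container[0].wait()           # Then block on the dialog itself

container = [None]

def gui_side() -> None:
    time.sleep(0.05)
    dialog = Dialog()
    container[0] = dialog                # GUI populates the shared container
    dialog.resolve(approved=True, abort=False, prompt="edited")

t = threading.Thread(target=gui_side)
t.start()
decision = confirm_spawn(container, timeout=5.0)
t.join()
```

The timeout branch returning an abort-style decision is an assumption about failure handling; the source only states the 60-second polling window.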
---

## Tier 4: QA Error Analysis

Stateless error analysis. Invoked via the `qa_callback` parameter in `shell_runner.run_powershell()` when a command fails.

```python
def run_tier4_analysis(error_message: str) -> str:
    """Stateless Tier 4 QA analysis of an error message."""
    # Uses a dedicated system prompt for error triage.
    # Returns analysis text (root cause, suggested fix).
    # Does NOT modify any code — analysis only.
    ...
```

Integrated directly into the shell execution pipeline: if `qa_callback` is provided and the command has a non-zero exit code or stderr output, the callback result is appended to the tool output as `QA ANALYSIS:\n<result>`.
---

## Cross-System Data Flow

The full MMA lifecycle from epic to completion:

1. **Tier 1 (Orchestrator)**: User enters an epic description in the GUI. Creates a `Track` with a brief.
2. **Tier 2 (Tech Lead)**: `conductor_tech_lead.generate_tickets()` calls `ai_client.send()` with the `tier2_sprint_planning` prompt, producing a JSON ticket list.
3. **Ingestion**: `ConductorEngine.parse_json_tickets()` ingests the JSON, builds `Ticket` objects, constructs `TrackDAG` + `ExecutionEngine`.
4. **Execution loop**: `ConductorEngine.run()` enters the async loop, calling `engine.tick()` each iteration.
5. **Worker dispatch**: For each ready ticket, `run_worker_lifecycle()` is called in a thread executor. It uses `ai_client.send()` with MCP tools (dispatched through `mcp_client.dispatch()`).
6. **Security enforcement**: MCP tools enforce the allowlist via `_resolve_and_check()` on every filesystem operation.
7. **State broadcast**: `_push_state()` → `AsyncEventQueue` → GUI renders DAG + ticket status.
8. **External visibility**: `ApiHookClient.get_mma_status()` queries the Hook API for the full orchestration state.
9. **HITL gates**: `confirm_spawn()` pushes to the event queue → GUI renders the dialog → user approves/edits → `dialog_container[0].wait()` returns the decision.

---

## Token Firewalling

Each tier operates within its own token budget:

- **Tier 3 workers** use lightweight models (default: `gemini-2.5-flash-lite`) and receive only the files listed in `context_requirements`.
- **Context Amnesia** ensures no accumulated history bleeds between tickets.
- **Tier 2** tracks cumulative `tier_usage` per tier: `{"input": N, "output": N}` for token cost monitoring.
- **First file vs subsequent files**: The first `context_requirements` file gets a curated view (preserving hot paths); subsequent files get only skeletons.
---

## Track State Persistence

Track state can be persisted to disk via `project_manager.py`:

```
conductor/tracks/<track_id>/
    spec.md        # Track specification (human-authored)
    plan.md        # Implementation plan with checkbox tasks
    metadata.json  # Track metadata (id, type, status, timestamps)
    state.toml     # Structured TrackState with task list
```

`project_manager.get_all_tracks(base_dir)` scans the tracks directory with a three-tier metadata fallback:

1. `state.toml` (structured `TrackState`) — counts tasks with `status == "completed"`.
2. `metadata.json` (legacy) — gets id/title/status only.
3. `plan.md` (regex) — counts `- [x]` vs `- [ ]` checkboxes for progress.
|
||||
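The fallback chain can be sketched against the layout above. Parsing is deliberately simplified here (substring counts instead of a real TOML parser), and the returned dict shape is an assumption, not the real `get_all_tracks` return type:

```python
import json
import re
from pathlib import Path

def read_track_progress(track_dir: Path) -> dict:
    """Sketch of the three-tier metadata fallback."""
    state = track_dir / "state.toml"
    meta = track_dir / "metadata.json"
    plan = track_dir / "plan.md"
    if state.exists():  # Tier 1: structured TrackState
        done = state.read_text(encoding="utf-8").count('status = "completed"')
        return {"source": "state.toml", "completed": done}
    if meta.exists():   # Tier 2: legacy metadata (id/title/status only)
        data = json.loads(meta.read_text(encoding="utf-8"))
        return {"source": "metadata.json", "id": data.get("id")}
    if plan.exists():   # Tier 3: regex over checkbox tasks
        text = plan.read_text(encoding="utf-8")
        done = len(re.findall(r"- \[x\]", text))
        total = done + len(re.findall(r"- \[ \]", text))
        return {"source": "plan.md", "completed": done, "total": total}
    return {"source": "none"}
```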

---
# Verification & Simulation Framework

Detailed specification of the live GUI testing infrastructure, simulation lifecycle, and the mock provider strategy.

[Top](../Readme.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md)

---

## Infrastructure

To verify complex UI state and asynchronous interactions, Manual Slop employs a **Live Verification** strategy using the application's built-in API hooks.

### `--enable-test-hooks`
When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated verification. Without this flag, the Hook API is only available when the provider is `gemini_cli`.

### The `live_gui` pytest Fixture

Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test.
**Spawning:**

```python
@pytest.fixture(scope="session")
def live_gui() -> Generator[tuple[subprocess.Popen, str], None, None]:
    process = subprocess.Popen(
        ["uv", "run", "python", "-u", gui_script, "--enable-test-hooks"],
        stdout=log_file, stderr=log_file, text=True,
        creationflags=subprocess.CREATE_NEW_PROCESS_GROUP if os.name == 'nt' else 0
    )
```

- **`-u` flag**: Disables output buffering for real-time log capture.
- **Process group**: On Windows, uses `CREATE_NEW_PROCESS_GROUP` so the entire tree (GUI + child processes) can be killed cleanly.
- **Logging**: Stdout/stderr redirected to `logs/gui_2_py_test.log`.

**Readiness polling:**

```python
max_retries = 15  # seconds
while time.time() - start_time < max_retries:
    response = requests.get("http://127.0.0.1:8999/status", timeout=0.5)
    if response.status_code == 200:
        ready = True; break
    if process.poll() is not None: break  # Process died early
    time.sleep(0.5)
```

Polls `GET /status` every 500ms for up to 15 seconds. Checks `process.poll()` each iteration to detect early crashes (avoids waiting the full timeout if the GUI exits). Pre-check: tests if port 8999 is already occupied.

**Failure path:** If the hook server never responds, kills the process tree and calls `pytest.fail()` to abort the entire test session. Diagnostic telemetry (startup time, PID, success/fail) is written via `VerificationLogger`.

**Teardown:**

```python
finally:
    client = ApiHookClient()
    client.reset_session()  # Clean GUI state before killing
    time.sleep(0.5)
    kill_process_tree(process.pid)
    log_file.close()
```

Sends `reset_session()` via `ApiHookClient` before killing to prevent stale state files.

**Yield value:** `(process: subprocess.Popen, gui_script: str)`.

### Session Isolation

```python
@pytest.fixture(autouse=True)
def reset_ai_client() -> Generator[None, None, None]:
    ai_client.reset_session()
    ai_client.set_provider("gemini", "gemini-2.5-flash-lite")
    yield
```

Runs automatically before every test. Resets the `ai_client` module state and defaults to a safe model, preventing state pollution between tests.

### Process Cleanup

```python
def kill_process_tree(pid: int | None) -> None:
```

- **Windows**: `taskkill /F /T /PID <pid>` — force-kills the process and all children (`/T` is critical since the GUI spawns child processes).
- **Unix**: `os.killpg(os.getpgid(pid), SIGKILL)` to kill the entire process group.

### VerificationLogger

Structured diagnostic logging for test telemetry:

```python
class VerificationLogger:
    def __init__(self, test_name: str, script_name: str):
        self.logs_dir = Path(f"logs/test/{datetime.now().strftime('%Y%m%d_%H%M%S')}")

    def log_state(self, field: str, before: Any, after: Any, delta: Any = None)
    def finalize(self, description: str, status: str, result_msg: str)
```

Output format: fixed-width column table (`Field | Before | After | Delta`) written to `logs/test/<timestamp>/<script_name>.txt`. Dual output: file + tagged stdout lines for CI visibility.
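The fixed-width table layout can be illustrated with a hedged mini-formatter. This is not the real class; the column-sizing strategy here is an assumption:

```python
def format_state_rows(rows: list[tuple]) -> str:
    """Sketch of VerificationLogger's Field | Before | After | Delta table."""
    header = ("Field", "Before", "After", "Delta")
    table = [header] + [tuple(str(c) for c in r) for r in rows]
    # Size each column to its widest cell, then pad with ljust.
    widths = [max(len(row[i]) for row in table) for i in range(4)]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths))
             for row in table]
    return "\n".join(lines)
```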

---

## Simulation Lifecycle: The "Puppeteer" Pattern

Simulations act as external puppeteers, driving the GUI through the `ApiHookClient` HTTP interface. The canonical example is `tests/visual_sim_mma_v2.py`.

### Stage 1: Mock Provider Setup

```python
client = ApiHookClient()
client.set_value('current_provider', 'gemini_cli')
mock_cli_path = f'{sys.executable} {os.path.abspath("tests/mock_gemini_cli.py")}'
client.set_value('gcli_path', mock_cli_path)
client.set_value('files_base_dir', 'tests/artifacts/temp_workspace')
client.click('btn_project_save')
```

- Switches the GUI's LLM provider to `gemini_cli` (the CLI adapter).
- Points the CLI binary to `python tests/mock_gemini_cli.py` — all LLM calls go to the mock.
- Redirects `files_base_dir` to a temp workspace to prevent polluting real project directories.
- Saves the project configuration.

### Stage 2: Epic Planning

```python
client.set_value('mma_epic_input', 'Develop a new feature')
client.click('btn_mma_plan_epic')
```

Enters an epic description and triggers planning. The GUI invokes the LLM (which hits the mock).

### Stage 3: Poll for Proposed Tracks (60s timeout)

```python
for _ in range(60):
    status = client.get_mma_status()
    if status.get('pending_mma_spawn_approval'): client.click('btn_approve_spawn')
    elif status.get('pending_mma_step_approval'): client.click('btn_approve_mma_step')
    elif status.get('pending_tool_approval'): client.click('btn_approve_tool')
    if status.get('proposed_tracks') and len(status['proposed_tracks']) > 0: break
    time.sleep(1)
```

The **approval automation** is a critical pattern repeated in every polling loop. The MMA engine has three approval gates:

- **Spawn approval**: Permission to create a new worker subprocess.
- **Step approval**: Permission to proceed with the next orchestration step.
- **Tool approval**: Permission to execute a tool call.

All three are auto-approved by clicking the corresponding button. Without this, the engine would block indefinitely at each gate.

### Stage 4: Accept Tracks

```python
client.click('btn_mma_accept_tracks')
```

### Stage 5: Poll for Tracks Populated (30s timeout)

Waits until `status['tracks']` contains a track with `'Mock Goal 1'` in its title.
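Stages 3 through 8 all share the same poll-until-predicate shape. A generic helper (illustrative only, not part of the real test suite) captures it:

```python
import time

def poll_until(predicate, on_tick=None, timeout_s: float = 30.0,
               interval_s: float = 1.0):
    """Poll until predicate() is truthy; return its value, or None on timeout.

    on_tick runs every iteration -- the natural place to auto-approve
    any pending HITL gates before re-checking the expected state.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if on_tick:
            on_tick()
        result = predicate()
        if result:
            return result
        time.sleep(interval_s)
    return None
```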

### Stage 6: Load Track and Verify Tickets (60s timeout)

```python
client.click('btn_mma_load_track', user_data=track_id_to_load)
```

Then polls until:

- `active_track` matches the loaded track ID.
- `active_tickets` list is non-empty.

### Stage 7: Verify MMA Status Transitions (120s timeout)

Polls until `mma_status == 'running'` or `'done'`. Continues auto-approving all gates.

### Stage 8: Verify Worker Output in Streams (60s timeout)

```python
streams = status.get('mma_streams', {})
if any("Tier 3" in k for k in streams.keys()):
    tier3_key = [k for k in streams.keys() if "Tier 3" in k][0]
    if "SUCCESS: Mock Tier 3 worker" in streams[tier3_key]:
        streams_found = True
```

Verifies that `mma_streams` contains a key with "Tier 3" and the value contains the exact mock output string.

### Assertions Summary

1. Mock provider setup succeeds (try/except with `pytest.fail`).
2. `proposed_tracks` appears within 60 seconds.
3. `'Mock Goal 1'` track exists in the tracks list within 30 seconds.
4. Track loads and `active_tickets` populate within 60 seconds.
5. MMA status becomes `'running'` or `'done'` within 120 seconds.
6. Tier 3 worker output with specific mock content appears in `mma_streams` within 60 seconds.

---
## Mock Provider Strategy

### `tests/mock_gemini_cli.py`

A fake Gemini CLI executable that replaces the real `gemini` binary during integration tests. Outputs JSON-L messages matching the real CLI's streaming output protocol.

**Input mechanism:**

```python
prompt = sys.stdin.read()                  # Primary: prompt via stdin
sys.argv                                   # Secondary: management command detection
os.environ.get('GEMINI_CLI_HOOK_CONTEXT')  # Tertiary: environment variable
```

**Management command bypass:**

```python
if len(sys.argv) > 1 and sys.argv[1] in ["mcp", "extensions", "skills", "hooks"]:
    return  # Silent exit
```

**Response routing** — keyword matching on stdin content:

| Prompt Contains | Response | Session ID |
|---|---|---|
| `'PATH: Epic Initialization'` | Two mock Track objects (`mock-track-1`, `mock-track-2`) | `mock-session-epic` |
| `'PATH: Sprint Planning'` | Two mock Ticket objects (`mock-ticket-1` independent, `mock-ticket-2` depends on `mock-ticket-1`) | `mock-session-sprint` |
| `'"role": "tool"'` or `'"tool_call_id"'` | Success message (simulates post-tool-call final answer) | `mock-session-final` |
| Default (Tier 3 worker prompts) | `"SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]"` | `mock-session-default` |

**Output protocol** — every response is exactly two JSON-L lines:

```json
{"type": "message", "role": "assistant", "content": "<response>"}
{"type": "result", "status": "success", "stats": {"total_tokens": N, ...}, "session_id": "mock-session-*"}
```

This matches the real Gemini CLI's streaming output format. `flush=True` on every `print()` ensures the consuming process receives data immediately.
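Putting the routing table and the two-line protocol together, the mock's core can be sketched as below. The keyword routes and session IDs come from the table above; the exact payload bodies are illustrative, and the real script additionally handles argv management commands, the `stats` block, and stderr logging:

```python
import json

def mock_reply(prompt: str) -> list[str]:
    """Sketch of mock_gemini_cli.py routing: prompt text -> two JSON-L lines."""
    if "PATH: Epic Initialization" in prompt:
        content, session = '[{"id": "mock-track-1"}, {"id": "mock-track-2"}]', "mock-session-epic"
    elif "PATH: Sprint Planning" in prompt:
        content, session = '[{"id": "mock-ticket-1"}, {"id": "mock-ticket-2"}]', "mock-session-sprint"
    elif '"role": "tool"' in prompt or '"tool_call_id"' in prompt:
        content, session = "Tool result received.", "mock-session-final"
    else:  # Default: Tier 3 worker prompts
        content, session = "SUCCESS: Mock Tier 3 worker implemented the change. [MOCK OUTPUT]", "mock-session-default"
    return [
        json.dumps({"type": "message", "role": "assistant", "content": content}),
        json.dumps({"type": "result", "status": "success", "session_id": session}),
    ]
```

In the real script each line is printed with `flush=True` so the consuming process sees it immediately.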

**Tool call simulation:** The mock does **not** emit tool calls. It detects tool results in the prompt (the `'"role": "tool"'` check) and responds with a final answer, simulating the second turn of a tool-call conversation without actually issuing calls.

**Debug output:** All debug information goes to stderr, keeping stdout clean for the JSON-L protocol.

---

## Visual Verification Patterns

Tests in this framework don't just check return values — they verify the **rendered state** of the application via the Hook API.

### DAG Integrity

Verify that `active_tickets` in the MMA status matches the expected task graph:

```python
status = client.get_mma_status()
tickets = status.get('active_tickets', [])
assert len(tickets) >= 2
assert any(t['id'] == 'mock-ticket-1' for t in tickets)
```

### Stream Telemetry

Check `mma_streams` to ensure output from multiple tiers is correctly captured and routed:

```python
streams = status.get('mma_streams', {})
tier3_keys = [k for k in streams.keys() if "Tier 3" in k]
assert len(tier3_keys) > 0
assert "SUCCESS" in streams[tier3_keys[0]]
```

### Modal State

Assert that the correct dialog is active during a pending tool call:

```python
status = client.get_mma_status()
assert status.get('pending_tool_approval') == True
# or
diag = client.get_indicator_state('thinking')
assert diag.get('thinking') == True
```

### Performance Monitoring

Verify UI responsiveness under load:

```python
perf = client.get_performance()
assert perf['fps'] > 30
assert perf['input_lag_ms'] < 100
```

---

## Supporting Analysis Modules

### `file_cache.py` — ASTParser (tree-sitter)

```python
class ASTParser:
    def __init__(self, language: str = "python"):
        self.language = tree_sitter.Language(tree_sitter_python.language())
        self.parser = tree_sitter.Parser(self.language)

    def parse(self, code: str) -> tree_sitter.Tree
    def get_skeleton(self, code: str) -> str
    def get_curated_view(self, code: str) -> str
```

**`get_skeleton` algorithm:**

1. Parse code to tree-sitter AST.
2. Walk all `function_definition` nodes.
3. For each body (`block` node):
   - If first non-comment child is a docstring: preserve docstring, replace rest with `...`.
   - Otherwise: replace entire body with `...`.
4. Apply edits in reverse byte order (maintains valid offsets).
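Step 4 — applying edits in reverse — is the standard trick for keeping offsets valid while splicing a source buffer. A minimal illustration on plain strings, independent of tree-sitter:

```python
def apply_edits(text: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (start, end, replacement) span edits to text.

    Sorting by start offset descending means each splice only shifts
    text *after* itself, so the offsets of edits not yet applied
    (which all lie earlier in the buffer) stay valid.
    """
    for start, end, repl in sorted(edits, reverse=True):
        text = text[:start] + repl + text[end:]
    return text
```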

**`get_curated_view` algorithm:**

Enhanced skeleton that preserves bodies under two conditions:

- Function has `@core_logic` decorator.
- Function body contains a `# [HOT]` comment anywhere in its descendants.

If either condition is true, the body is preserved verbatim. This enables a two-tier code view: hot paths shown in full, boilerplate compressed.

### `summarize.py` — Heuristic File Summaries

Token-efficient structural descriptions without AI calls:

```python
_SUMMARISERS: dict[str, Callable] = {
    ".py": _summarise_python,      # imports, classes, methods, functions, constants
    ".toml": _summarise_toml,      # table keys + array lengths
    ".md": _summarise_markdown,    # h1-h3 headings
    ".ini": _summarise_generic,    # line count + preview
}
```

**`_summarise_python`** uses stdlib `ast`:

1. Parse with `ast.parse()`.
2. Extract deduplicated imports (top-level module names only).
3. Extract `ALL_CAPS` constants (both `Assign` and `AnnAssign`).
4. Extract classes with their method names.
5. Extract top-level function names.

Output:

```
**Python** — 150 lines
imports: ast, json, pathlib
constants: TIMEOUT_SECONDS
class ASTParser: __init__, parse, get_skeleton
functions: summarise_file, build_summary_markdown
```
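The extraction steps above can be sketched with stdlib `ast`. This is a simplified approximation of `_summarise_python` (it skips `AnnAssign` constants and nested cases), assuming the return shape shown here rather than the module's real output:

```python
import ast

def summarise_python(code: str) -> dict:
    """Sketch of summarize.py's structural extraction via stdlib ast."""
    tree = ast.parse(code)
    imports, constants, classes, functions = [], [], {}, []
    for node in tree.body:
        if isinstance(node, ast.Import):
            # Keep only the top-level module name, e.g. "os.path" -> "os".
            imports += [a.name.split(".")[0] for a in node.names]
        elif isinstance(node, ast.ImportFrom):
            imports.append((node.module or "").split(".")[0])
        elif isinstance(node, ast.Assign):
            constants += [t.id for t in node.targets
                          if isinstance(t, ast.Name) and t.id.isupper()]
        elif isinstance(node, ast.ClassDef):
            classes[node.name] = [n.name for n in node.body
                                  if isinstance(n, ast.FunctionDef)]
        elif isinstance(node, ast.FunctionDef):
            functions.append(node.name)
    return {"imports": sorted(set(imports)), "constants": constants,
            "classes": classes, "functions": functions}
```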

### `outline_tool.py` — Hierarchical Code Outline

```python
class CodeOutliner:
    def outline(self, code: str) -> str
```

Walks top-level `ast` nodes:

- `ClassDef` → `[Class] Name (Lines X-Y)` + docstring + recurse for methods
- `FunctionDef` → `[Func] Name (Lines X-Y)` or `[Method] Name` if nested
- `AsyncFunctionDef` → `[Async Func] Name (Lines X-Y)`

Only extracts the first line of docstrings. Uses indentation depth as a heuristic for method vs function.

---

## Two Parallel Code Analysis Implementations

The codebase has two parallel approaches for structural code analysis:

| Aspect | `file_cache.py` (tree-sitter) | `summarize.py` / `outline_tool.py` (stdlib `ast`) |
|---|---|---|
| Parser | tree-sitter with `tree_sitter_python` | Python's built-in `ast` module |
| Precision | Byte-accurate, preserves exact syntax | Line-level, may lose formatting nuance |
| `@core_logic` / `[HOT]` | Supported (selective body preservation) | Not supported |
| Used by | `py_get_skeleton` MCP tool, worker context injection | `get_file_summary` MCP tool, `py_get_code_outline` |
| Performance | Slightly slower (C extension + tree walk) | Faster (pure Python, simpler walk) |

---
# Tooling & IPC Technical Reference

A deep-dive into the Model Context Protocol (MCP) bridge, the Hook API, and the "Human-in-the-Loop" communication protocol.

[Top](../Readme.md) | [Architecture](guide_architecture.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md)

---

## The MCP Bridge: Filesystem Security

The AI's ability to interact with the filesystem is mediated by a three-layer security model in `mcp_client.py`. Every tool accessing the disk passes through `_resolve_and_check(path)` before any I/O occurs.

### Global State

```python
_allowed_paths: set[Path] = set()        # Explicit file allowlist (resolved absolutes)
_base_dirs: set[Path] = set()            # Directory roots for containment checks
_primary_base_dir: Path | None = None    # Used for resolving relative paths
perf_monitor_callback: Optional[Callable[[], dict[str, Any]]] = None
```

### Layer 1: Allowlist Construction (`configure`)

Called by `ai_client` before each send cycle. Takes `file_items` (from `aggregate.build_file_items()`) and optional `extra_base_dirs`.

1. Resets `_allowed_paths` and `_base_dirs` to empty sets on every call.
2. Sets `_primary_base_dir` from `extra_base_dirs[0]` (resolved) or falls back to `Path.cwd()`.
3. Iterates all `file_items`, resolving each `item["path"]` to an absolute path. Each resolved path is added to `_allowed_paths`; its parent directory is added to `_base_dirs`.
4. Any entries in `extra_base_dirs` that are valid directories are also added to `_base_dirs`.

### Layer 2: Path Validation (`_is_allowed`)

Checks run in this exact order:

1. **Blacklist** (hard deny): If the filename is `history.toml` or ends with `_history.toml`, return `False`. Prevents the AI from reading conversation history.
2. **Explicit allowlist**: If the resolved path is in `_allowed_paths`, return `True`.
3. **CWD fallback**: If `_base_dirs` is empty, any path under `cwd()` is allowed.
4. **Base directory containment**: Path must be a subpath of at least one entry in `_base_dirs` (via `relative_to()`).
5. **Default deny**: All other paths are rejected.

All paths are resolved (following symlinks) before comparison, preventing symlink-based traversal.

### Layer 3: Resolution Gate (`_resolve_and_check`)

Every tool call passes through this:

1. Convert the raw path string to a `Path`.
2. If not absolute, prepend `_primary_base_dir`.
3. Resolve to absolute.
4. Call `_is_allowed()`.
5. Return `(resolved_path, "")` on success or `(None, error_message)` on failure.

The error message includes the full list of allowed base directories for debugging.
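The blacklist, allowlist, and containment checks can be sketched together as one resolution gate. This is a simplified approximation of the real `_resolve_and_check` (the empty-`_base_dirs` CWD fallback is omitted, and the parameter shapes here are assumptions — the real module uses globals):

```python
from pathlib import Path

def resolve_and_check(raw: str, allowed: set[Path], base_dirs: set[Path],
                      primary: Path):
    """Sketch of the resolution gate: returns (path, "") or (None, error)."""
    p = Path(raw)
    if not p.is_absolute():
        p = primary / p          # Relative paths anchor at the primary base dir
    p = p.resolve()              # Follow symlinks before any comparison
    name = p.name
    if name == "history.toml" or name.endswith("_history.toml"):
        return None, "ACCESS DENIED: history files are blacklisted"
    if p in allowed:             # Explicit per-file allowlist
        return p, ""
    for base in base_dirs:       # Containment: must be under some base dir
        try:
            p.relative_to(base)
            return p, ""
        except ValueError:
            pass
    return None, f"ACCESS DENIED: {p} outside allowed base dirs"
```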

---

## Native Tool Inventory

The `dispatch` function (line 806) is a flat if/elif chain mapping 26 tool names to implementations. All tools are categorized below with their parameters and behavior.

### File I/O Tools

| Tool | Parameters | Description |
|---|---|---|
| `read_file` | `path` | UTF-8 file content extraction |
| `list_directory` | `path` | Compact table: `[file/dir] name size`. Applies blacklist filter to entries. |
| `search_files` | `path`, `pattern` | Glob pattern matching within an allowed directory. Applies blacklist filter. |
| `get_file_slice` | `path`, `start_line`, `end_line` | Returns specific line range (1-based, inclusive) |
| `set_file_slice` | `path`, `start_line`, `end_line`, `new_content` | Replaces a line range with new content (surgical edit) |
| `get_tree` | `path`, `max_depth` | Directory structure up to `max_depth` levels |

### AST-Based Tools (Python only)

These use `file_cache.ASTParser` (tree-sitter) or stdlib `ast` for structural code analysis:

| Tool | Parameters | Description |
|---|---|---|
| `py_get_skeleton` | `path` | Signatures + docstrings, bodies replaced with `...`. Uses tree-sitter. |
| `py_get_code_outline` | `path` | Hierarchical outline: `[Class] Name (Lines X-Y)` with nested methods. Uses stdlib `ast`. |
| `py_get_definition` | `path`, `name` | Full source of a specific class/function/method. Supports `ClassName.method` dot notation. |
| `py_update_definition` | `path`, `name`, `new_content` | Surgical replacement: locates symbol via `ast`, delegates to `set_file_slice`. |
| `py_get_signature` | `path`, `name` | Only the `def` line through the colon. |
| `py_set_signature` | `path`, `name`, `new_signature` | Replaces only the signature, preserving body. |
| `py_get_class_summary` | `path`, `name` | Class docstring + list of method signatures. |
| `py_get_var_declaration` | `path`, `name` | Module-level or class-level variable assignment line(s). |
| `py_set_var_declaration` | `path`, `name`, `new_declaration` | Surgical variable replacement. |
| `py_find_usages` | `path`, `name` | Exact string match search across a file or directory. |
| `py_get_imports` | `path` | Parses AST, returns strict dependency list. |
| `py_check_syntax` | `path` | Quick syntax validation via `ast.parse()`. |
| `py_get_hierarchy` | `path`, `class_name` | Scans directory for subclasses of a given class. |
| `py_get_docstring` | `path`, `name` | Extracts docstring for module, class, or function. |

### Analysis Tools

| Tool | Parameters | Description |
|---|---|---|
| `get_file_summary` | `path` | Heuristic summary via `summarize.py`: imports, classes, functions, constants for `.py`; table keys for `.toml`; headings for `.md`. |
| `get_git_diff` | `path`, `base_rev`, `head_rev` | Git diff output for a file or directory. |

### Network Tools

| Tool | Parameters | Description |
|---|---|---|
| `web_search` | `query` | Scrapes DuckDuckGo HTML via dependency-free `_DDGParser` (HTMLParser subclass). Returns top 5 results with title, URL, snippet. |
| `fetch_url` | `url` | Fetches URL content, strips HTML tags via `_TextExtractor`. |

### Runtime Tools

| Tool | Parameters | Description |
|---|---|---|
| `get_ui_performance` | (none) | Returns FPS, Frame Time, CPU, Input Lag via injected `perf_monitor_callback`. No security check (no filesystem access). |

### Tool Implementation Patterns

**AST-based read tools** follow this pattern:

```python
def py_get_skeleton(path: str) -> str:
    p, err = _resolve_and_check(path)
    if err: return err
    if not p.exists(): return f"ERROR: file not found: {path}"
    if not p.is_file() or p.suffix != ".py": return f"ERROR: not a python file: {path}"
    from file_cache import ASTParser
    code = p.read_text(encoding="utf-8")
    parser = ASTParser("python")
    return parser.get_skeleton(code)
```

**AST-based write tools** use stdlib `ast` (not tree-sitter) to locate symbols, then delegate to `set_file_slice`:

```python
def py_update_definition(path: str, name: str, new_content: str) -> str:
    p, err = _resolve_and_check(path)
    if err: return err
    code = p.read_text(encoding="utf-8").lstrip(chr(0xFEFF))  # Strip BOM
    tree = ast.parse(code)
    node = _get_symbol_node(tree, name)  # Walks AST for matching node
    if not node: return f"ERROR: could not find definition '{name}'"
    start = getattr(node, "lineno")
    end = getattr(node, "end_lineno")
    return set_file_slice(path, start, end, new_content)
```

The `_get_symbol_node` helper supports dot notation (`ClassName.method_name`) by first finding the class, then searching its body for the method.

---

## The Hook API: Remote Control & Telemetry

Manual Slop exposes a REST-based IPC interface on `127.0.0.1:8999` using Python's `ThreadingHTTPServer`. Each incoming request gets its own thread.

### Server Architecture

```python
class HookServerInstance(ThreadingHTTPServer):
    app: Any  # Reference to main App instance

class HookHandler(BaseHTTPRequestHandler):
    # Accesses self.server.app for all state
    ...

class HookServer:
    app: Any
    port: int = 8999
    server: HookServerInstance | None
    thread: threading.Thread | None
```
**Start conditions**: Only starts if `app.test_hooks_enabled == True` OR current provider is `'gemini_cli'`. Otherwise `start()` silently returns.

**Initialization**: On start, ensures the app has `_pending_gui_tasks` + lock, `_pending_asks` + `_ask_responses` dicts, and `_api_event_queue` + lock.

### GUI Thread Trampoline Pattern

The HookServer **never reads GUI state directly** (thread safety). For state reads, it uses a trampoline:

1. Create a `threading.Event()` and a `result` dict.
2. Push a `custom_callback` closure into `_pending_gui_tasks` that reads state and calls `event.set()`.
3. Block on `event.wait(timeout=60)`.
4. Return `result` as JSON, or 504 on timeout.

This ensures all state reads happen on the GUI main thread during `_process_pending_gui_tasks`.
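The trampoline described above can be sketched like this. The field names (`_pending_gui_tasks` and its lock) follow the guide; `reader` is any closure that touches GUI state, and the exact handler plumbing is simplified:

```python
import threading

def gui_trampoline_read(app, reader, timeout=60.0):
    """Sketch: marshal a GUI-state read onto the main thread, block for it."""
    event = threading.Event()
    result = {}

    def custom_callback():
        # Runs on the GUI main thread during _process_pending_gui_tasks.
        result["value"] = reader()
        event.set()

    with app._pending_gui_tasks_lock:
        app._pending_gui_tasks.append({"type": "custom_callback", "fn": custom_callback})

    if not event.wait(timeout=timeout):
        return None, 504  # GUI thread never serviced the task
    return result, 200
```

Because only the GUI loop ever executes `custom_callback`, no GUI state is touched from the HTTP thread.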
### GET Endpoints

| Endpoint | Thread Safety | Response |
|---|---|---|
| `GET /status` | Direct (stateless) | `{"status": "ok"}` |
| `GET /api/project` | Direct read | `{"project": <flat_config>}` via `project_manager.flat_config()` |
| `GET /api/session` | Direct read | `{"session": {"entries": [...]}}` from `app.disc_entries` |
| `GET /api/performance` | Direct read | `{"performance": <metrics>}` from `app.perf_monitor.get_metrics()` |
| `GET /api/events` | Lock-guarded drain | `{"events": [...]}` — drains and clears `_api_event_queue` |
| `GET /api/gui/value` | GUI trampoline | `{"value": <val>}` — reads from `_settable_fields` map |
| `GET /api/gui/value/<tag>` | GUI trampoline | Same, via URL path param |
| `GET /api/gui/mma_status` | GUI trampoline | Full MMA state dict (see below) |
| `GET /api/gui/diagnostics` | GUI trampoline | `{thinking, live, prior}` booleans |
**`/api/gui/mma_status` response fields:**

```python
{
    "mma_status": str,          # "idle" | "planning" | "executing" | "done"
    "ai_status": str,           # "idle" | "sending..." | etc.
    "active_tier": str | None,
    "active_track": str,        # Track ID or raw value
    "active_tickets": list,     # Serialized ticket dicts
    "mma_step_mode": bool,
    "pending_tool_approval": bool,       # _pending_ask_dialog
    "pending_mma_step_approval": bool,   # _pending_mma_approval is not None
    "pending_mma_spawn_approval": bool,  # _pending_mma_spawn is not None
    "pending_approval": bool,            # Backward compat: step OR tool
    "pending_spawn": bool,               # Alias for spawn approval
    "tracks": list,
    "proposed_tracks": list,
    "mma_streams": dict,        # {stream_id: output_text}
}
```

**`/api/gui/diagnostics` response fields:**

```python
{
    "thinking": bool,  # ai_status in ["sending...", "running powershell..."]
    "live": bool,      # ai_status in ["running powershell...", "fetching url...", ...]
    "prior": bool,     # app.is_viewing_prior_session
}
```
### POST Endpoints

| Endpoint | Body | Response | Effect |
|---|---|---|---|
| `POST /api/project` | `{"project": {...}}` | `{"status": "updated"}` | Sets `app.project` |
| `POST /api/session` | `{"session": {"entries": [...]}}` | `{"status": "updated"}` | Sets `app.disc_entries` |
| `POST /api/gui` | Any JSON dict | `{"status": "queued"}` | Appends to `_pending_gui_tasks` |
| `POST /api/ask` | Any JSON dict | `{"status": "ok", "response": ...}` or 504 | Blocking ask dialog |
| `POST /api/ask/respond` | `{"request_id": ..., "response": ...}` | `{"status": "ok"}` or 404 | Resolves a pending ask |
### The `/api/ask` Protocol (Synchronous HITL via HTTP)

This is the most complex endpoint — it implements a blocking request-response dialog over HTTP:

1. Generate a UUID `request_id`.
2. Create a `threading.Event`.
3. Register in `app._pending_asks[request_id] = event`.
4. Push an `ask_received` event to `_api_event_queue` (for client discovery).
5. Append `{"type": "ask", "request_id": ..., "data": ...}` to `_pending_gui_tasks`.
6. Block on `event.wait(timeout=60.0)`.
7. On signal: read `app._ask_responses[request_id]`, clean up, return 200.
8. On timeout: clean up, return 504.

The counterpart `/api/ask/respond`:

1. Look up `request_id` in `app._pending_asks`.
2. Store `response` in `app._ask_responses[request_id]`.
3. Signal the event (`event.set()`).
4. Queue a `clear_ask` GUI task.
5. Return 200 (or 404 if `request_id` not found).
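The two handler flows above can be sketched as plain functions. This is an illustrative model, not the HookHandler's actual code: HTTP serialization is omitted and the app is reduced to the dicts and queues the guide names:

```python
import threading, uuid

def handle_ask(app, data, timeout=60.0):
    """Sketch of POST /api/ask: register, notify, block, collect response."""
    request_id = str(uuid.uuid4())
    event = threading.Event()
    app._pending_asks[request_id] = event                         # step 3
    app._api_event_queue.append({"type": "ask_received",          # step 4
                                 "request_id": request_id})
    app._pending_gui_tasks.append({"type": "ask",                 # step 5
                                   "request_id": request_id,
                                   "data": data})
    if not event.wait(timeout=timeout):                           # step 6
        app._pending_asks.pop(request_id, None)
        return 504, None                                          # step 8
    response = app._ask_responses.pop(request_id, None)           # step 7
    app._pending_asks.pop(request_id, None)
    return 200, response

def handle_ask_respond(app, request_id, response):
    """Sketch of POST /api/ask/respond: store the answer, wake the waiter."""
    event = app._pending_asks.get(request_id)
    if event is None:
        return 404
    app._ask_responses[request_id] = response
    event.set()
    return 200
```

The agent thread blocks inside `handle_ask` while the GUI thread, acting on the queued `ask` task, eventually calls `handle_ask_respond`.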
---

## ApiHookClient: The Automation Interface

`api_hook_client.py` provides a synchronous Python client for the Hook API, used by test scripts and external tooling.

```python
class ApiHookClient:
    def __init__(self, base_url="http://127.0.0.1:8999", max_retries=5, retry_delay=0.2)
```
### Connection Methods

| Method | Description |
|---|---|
| `wait_for_server(timeout=3)` | Polls `/status` with exponential backoff until server is ready. |
| `_make_request(method, endpoint, data, timeout)` | Core HTTP client with retry logic. |
### State Query Methods

| Method | Endpoint | Description |
|---|---|---|
| `get_status()` | `GET /status` | Health check |
| `get_project()` | `GET /api/project` | Full project config |
| `get_session()` | `GET /api/session` | Discussion entries |
| `get_mma_status()` | `GET /api/gui/mma_status` | Full MMA orchestration state |
| `get_performance()` | `GET /api/performance` | UI metrics (FPS, CPU, etc.) |
| `get_value(item)` | `GET /api/gui/value/<item>` | Read any `_settable_fields` value |
| `get_text_value(item_tag)` | Wraps `get_value` | Returns string representation or None |
| `get_events()` | `GET /api/events` | Fetches and clears the event queue |
| `get_indicator_state(tag)` | `GET /api/gui/diagnostics` | Checks if an indicator is shown |
| `get_node_status(node_tag)` | Two-phase: `get_value` then `diagnostics` | DAG node status with fallback |
### GUI Manipulation Methods

| Method | Endpoint | Description |
|---|---|---|
| `set_value(item, value)` | `POST /api/gui` | Sets any `_settable_fields` value; special-cases `current_provider` and `gcli_path` |
| `click(item, *args, **kwargs)` | `POST /api/gui` | Simulates button click; passes optional `user_data` |
| `select_tab(tab_bar, tab)` | `POST /api/gui` | Switches to a specific tab |
| `select_list_item(listbox, item_value)` | `POST /api/gui` | Selects an item in a listbox |
| `push_event(event_type, payload)` | `POST /api/gui` | Pushes event into `AsyncEventQueue` |
| `post_gui(gui_data)` | `POST /api/gui` | Raw task dict injection |
| `reset_session()` | Clicks `btn_reset_session` | Simulates clicking the Reset Session button |
### Polling Methods

| Method | Description |
|---|---|
| `wait_for_event(event_type, timeout=5)` | Polls `get_events()` until a matching event type appears. |
| `wait_for_value(item, expected, timeout=5)` | Polls `get_value(item)` until it equals `expected`. |
### HITL Method

| Method | Description |
|---|---|
| `request_confirmation(tool_name, args)` | Sends to `/api/ask`, blocks until user responds via the GUI dialog. |
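The polling methods above share one loop shape. A minimal sketch, with `get_value` standing in for the client's `GET /api/gui/value/<item>` call (the real `wait_for_value` adds retries and error handling):

```python
import time

def wait_for(get_value, item, expected, timeout=5.0, interval=0.05):
    """Poll get_value(item) until it equals `expected` or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_value(item) == expected:
            return True
        time.sleep(interval)
    return False
```

A test script would call this as `wait_for(client.get_value, "mma_status", "done")` to block until the orchestrator finishes.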
---

## Synthetic Context Refresh

To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution context refresh. See [guide_architecture.md](guide_architecture.md#context-refresh-mechanism) for the full algorithm.

Summary:

1. **Detection**: Triggered after the final tool call in each reasoning round.
2. **Collection**: Re-reads all project-tracked files, comparing mtimes.
3. **Injection**: Changed files are diffed and appended as `[SYSTEM: FILES UPDATED]` to the last tool output.
4. **Pruning**: Older `[FILES UPDATED]` blocks are stripped from history in subsequent rounds.
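The Collection step can be sketched with a simple mtime comparison. This is illustrative only: the function name and the `last_seen_mtimes` cache are assumptions, not the `ai_client`'s actual internals:

```python
import os

def collect_changed_files(tracked, last_seen_mtimes):
    """Return tracked paths whose mtime advanced since the last refresh."""
    changed = []
    for path in tracked:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            continue  # deleted or unreadable; skip silently
        if mtime > last_seen_mtimes.get(path, 0.0):
            changed.append(path)
            last_seen_mtimes[path] = mtime
    return changed
```

Only the paths returned here need to be re-read and diffed into the `[SYSTEM: FILES UPDATED]` block.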
---

## Session Logging

`session_logger.py` opens timestamped log files at GUI startup and keeps them open for the process lifetime.

### File Layout

```
logs/sessions/<session_id>/
  comms.log       # JSON-L: every API interaction (direction, kind, payload)
  toolcalls.log   # Markdown: sequential tool invocation records
  apihooks.log    # API hook invocations
  clicalls.log    # JSON-L: CLI subprocess details (command, stdin, stdout, stderr, latency)

scripts/generated/
  <ts>_<seq:04d>.ps1  # Each AI-generated PowerShell script, preserved in order
```
### Logging Functions

| Function | Target | Format |
|---|---|---|
| `log_comms(entry)` | `comms.log` | JSON-L line per entry |
| `log_tool_call(script, result, script_path)` | `toolcalls.log` + `scripts/generated/` | Markdown record + preserved `.ps1` file |
| `log_api_hook(method, path, body)` | `apihooks.log` | Timestamped text line |
| `log_cli_call(command, stdin, stdout, stderr, latency)` | `clicalls.log` | JSON-L with latency tracking |
### Lifecycle

- `open_session(label)`: Called once at GUI startup. Idempotent (checks if already open). Registers `atexit.register(close_session)`.
- `close_session()`: Flushes and closes all file handles.
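A minimal sketch of this lifecycle, reduced to the `comms.log` stream. The class shape and `ts` field are illustrative assumptions; the real module is function-based and manages all four log files:

```python
import atexit, datetime, json, os

class SessionLogger:
    """Illustrative open/log/close lifecycle for a JSON-L session log."""
    def __init__(self):
        self._comms = None

    def open_session(self, log_dir):
        if self._comms is not None:
            return  # idempotent: already open
        self._comms = open(os.path.join(log_dir, "comms.log"), "a", encoding="utf-8")
        atexit.register(self.close_session)  # guarantee flush on exit

    def log_comms(self, entry):
        entry = dict(entry, ts=datetime.datetime.now().isoformat())
        self._comms.write(json.dumps(entry) + "\n")
        self._comms.flush()

    def close_session(self):
        if self._comms:
            self._comms.close()
            self._comms = None
```

Keeping the handle open for the process lifetime avoids re-opening the file on every AI round-trip.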
---

## Shell Runner

`shell_runner.py` executes PowerShell scripts with environment configuration, timeout handling, and optional QA integration.

### Environment Configuration via `mcp_env.toml`

```toml
[path]
prepend = ["C:/custom/bin", "C:/other/tools"]

[env]
MY_VAR = "some_value"
EXPANDED = "${HOME}/subdir"
```

`_build_subprocess_env()` copies `os.environ`, prepends `[path].prepend` entries to `PATH`, and sets `[env]` key-value pairs with `${VAR}` expansion.
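That env-merging step can be sketched as below. `cfg` is the parsed `mcp_env.toml` dict; the function name and the use of `string.Template` for `${VAR}` expansion are assumptions about the implementation:

```python
import os
from string import Template

def build_subprocess_env(cfg):
    """Merge mcp_env.toml settings into a copy of os.environ (sketch)."""
    env = dict(os.environ)
    prepend = cfg.get("path", {}).get("prepend", [])
    if prepend:
        # Prepend entries take priority over the inherited PATH.
        env["PATH"] = os.pathsep.join(list(prepend) + [env.get("PATH", "")])
    for key, value in cfg.get("env", {}).items():
        env[key] = Template(value).safe_substitute(env)  # ${VAR} expansion
    return env
```

`safe_substitute` leaves unknown `${VAR}` references intact rather than raising, which is the forgiving behavior you want for a config file.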
### `run_powershell(script, base_dir, qa_callback=None) -> str`

1. Prepends `Set-Location -LiteralPath '<base_dir>'` (with escaped single quotes).
2. Locates PowerShell: tries `powershell.exe`, `pwsh.exe`, `powershell`, `pwsh` in order.
3. Runs via `subprocess.Popen([exe, "-NoProfile", "-NonInteractive", "-Command", full_script])`.
4. `process.communicate(timeout=60)` — 60-second hard timeout.
5. On `TimeoutExpired`: kills process tree via `taskkill /F /T /PID`, returns `"ERROR: timed out after 60s"`.
6. Returns combined output: `STDOUT:\n<out>\nSTDERR:\n<err>\nEXIT CODE: <code>`.
7. If `qa_callback` provided and command failed: appends `QA ANALYSIS:\n<qa_callback(stderr)>` — integrates Tier 4 QA error analysis directly.
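The seven steps above sketch out as follows. This is a simplified model, not the runner's code: single-quote escaping, `mcp_env.toml` env construction, and the Windows `taskkill /F /T` tree kill are elided, and the "no executable found" branch is an added assumption:

```python
import shutil, subprocess

def run_powershell(script, base_dir, qa_callback=None, timeout=60):
    """Sketch of the run_powershell flow: locate, run, time-box, format."""
    full_script = f"Set-Location -LiteralPath '{base_dir}'\n{script}"      # step 1
    exe = next((e for e in ("powershell.exe", "pwsh.exe", "powershell", "pwsh")
                if shutil.which(e)), None)                                 # step 2
    if exe is None:
        return "ERROR: no PowerShell executable found"
    proc = subprocess.Popen(                                               # step 3
        [exe, "-NoProfile", "-NonInteractive", "-Command", full_script],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    try:
        out, err = proc.communicate(timeout=timeout)                       # step 4
    except subprocess.TimeoutExpired:
        proc.kill()  # real runner uses taskkill /F /T /PID for the tree   # step 5
        return f"ERROR: timed out after {timeout}s"
    result = f"STDOUT:\n{out}\nSTDERR:\n{err}\nEXIT CODE: {proc.returncode}"  # step 6
    if qa_callback and proc.returncode != 0:
        result += f"\nQA ANALYSIS:\n{qa_callback(err)}"                    # step 7
    return result
```

Combining stdout, stderr, and the exit code into one string gives the LLM a single tool-output payload to reason over.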