docs(conductor): Expert-level architectural documentation refresh

2026-03-01 09:19:48 -05:00
parent 7384df1e29
commit bf4468f125
8 changed files with 263 additions and 195 deletions


@@ -1,87 +1,72 @@
# Manual Slop: Architectural Technical Reference
A deep-dive into the asynchronous orchestration, state synchronization, and the "Linear Execution Clutch" of the Manual Slop engine. This document is designed to move the reader from a high-level mental model to a low-level implementation understanding.
---
The purpose of this software is to alleviate the pain points of using AI as a local co-pilot by encapsulating the workflow in a resilient, strictly controlled state machine. It manages context generation, API throttling, human-in-the-loop tool execution, and session-long logging.
## 1. Philosophy: The Decoupled State Machine
Manual Slop is built on a single, core realization: **AI reasoning is high-latency and non-deterministic, while GUI interaction must be low-latency and responsive.**
To solve this, the engine enforces a strict decoupling between three distinct boundaries:
* **The GUI Boundary (Main Thread):** A retained-mode loop (ImGui) that must never block. It handles visual telemetry and user "Seal of Approval" actions.
* **The AI Boundary (Daemon Threads):** Stateless execution loops that handle the "heavy lifting" of context aggregation, LLM communication, and tool reasoning.
* **The Orchestration Boundary (Asyncio):** A background thread that manages the flow of data between the other two, ensuring thread-safe communication without blocking the UI.
All synchronization between these boundaries is managed via lock-protected queues and events.
---
## 2. System Lifetime & Initialization
The application lifecycle, managed by `App` in `gui_2.py`, follows a precise sequence to ensure the environment is ready before the first frame:
1. **Context Hydration:** The engine reads `config.toml` (global) and `<project>.toml` (local). This builds the initial "world view" of the project—what files are tracked, what the discussion history is, and which AI models are active.
2. **Thread Bootstrapping:**
* The `Asyncio` event loop thread is started (`_loop_thread`).
* The `HookServer` (FastAPI) is started as a daemon to handle IPC.
3. **UI Entry:** The main thread enters `immapp.run()`. At this point, the GUI is "alive," and the background threads are ready to receive tasks.
4. **The Dual-Flush Shutdown:** On exit, the system commits state back to both project and global configs. This ensures that your window positions, active discussions, and even pending tool results are preserved for the next session.
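Under the assumption that the bootstrapping follows the standard asyncio-on-a-thread pattern (class and method names below are illustrative, not the real `App` API), steps 1-3 can be sketched as:

```python
import asyncio
import threading

class AppSketch:
    """Minimal sketch of steps 1-3 above. Context hydration is stubbed,
    and the HookServer daemon is omitted for brevity."""

    def __init__(self):
        # 1. Context hydration: stand-in for parsing config.toml / <project>.toml.
        self.config = {"provider": "anthropic", "files": []}
        # 2. Thread bootstrapping: the asyncio loop lives on a daemon thread
        #    so the GUI render loop can own the main thread.
        self.loop = asyncio.new_event_loop()
        self._loop_thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def start_background(self) -> str:
        self._loop_thread.start()
        # Work is handed to the loop from any thread without blocking the caller.
        fut = asyncio.run_coroutine_threadsafe(self._ready(), self.loop)
        return fut.result(timeout=5)

    async def _ready(self) -> str:
        return "ready"

app = AppSketch()
print(app.start_background())  # -> ready
```

The key design point is that the GUI thread never calls `loop.run_until_complete`; it only schedules coroutines via `run_coroutine_threadsafe`, which is the one thread-safe entry point into a running loop.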
---
## 3. The Task Pipeline: Producer-Consumer Synchronization
Because ImGui state cannot be safely modified from a background thread, Manual Slop uses a **Producer-Consumer** model for all updates.
### The Flow of an AI Request
1. **Produce:** When you click "Gen + Send," the GUI thread produces a `UserRequestEvent` and pushes it into the `AsyncEventQueue`.
2. **Consume:** The background `asyncio` loop pops this event and dispatches it to the `ai_client`. The GUI thread remains free to render and respond to other inputs.
3. **Task Backlog:** When the AI responds, the background thread *cannot* update the UI text boxes directly. Instead, it appends a **Task Dictionary** to the `_pending_gui_tasks` list.
4. **Sync:** On every frame, the GUI thread checks this list. If tasks exist, it acquires a lock, clears the list, and executes the updates (e.g., "Set AI response text," "Blink the terminal indicator").
---
## 4. The Execution Clutch: Human-In-The-Loop (HITL)
The "Execution Clutch" is our answer to the "Black Box" problem of AI. It allows you to shift from automatic execution to a manual, deterministic step-through mode.
### How the "Shifting" Works
When the AI requests a destructive action (like running a PowerShell script), the background execution thread is **suspended** using a `threading.Condition`:
1. **The Pause:** The thread enters a `.wait()` state. It is physically blocked.
2. **The Modal:** A task is sent to the GUI to open a modal dialog.
3. **The Mutation:** The user can read the script, edit it, or reject it.
4. **The Unleash:** When the user clicks "Approve," the GUI thread updates the shared state and calls `.notify_all()`. The background thread "wakes up," executes the (potentially modified) script, and reports the result back to the AI.
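A minimal sketch of this suspend/approve/wake handshake, assuming a `threading.Condition` as described (the class and method names are invented for illustration):

```python
import threading

class PendingScript:
    """Sketch of the clutch suspension; not the engine's actual API."""

    def __init__(self, script: str):
        self.script = script
        self.approved = False
        self._decided = False
        self._cond = threading.Condition()

    def wait_for_user(self, timeout=None) -> bool:
        # Background thread: physically blocked until the GUI decides.
        with self._cond:
            self._cond.wait_for(lambda: self._decided, timeout=timeout)
            return self.approved

    def resolve(self, approved: bool, edited_script=None) -> None:
        # GUI thread: record the (possibly edited) script, then wake the worker.
        with self._cond:
            if edited_script is not None:
                self.script = edited_script
            self.approved = approved
            self._decided = True
            self._cond.notify_all()

pending = PendingScript("Get-ChildItem")
result = []
worker = threading.Thread(target=lambda: result.append(pending.wait_for_user(timeout=5)))
worker.start()
pending.resolve(True, "Get-ChildItem -Recurse")  # user edited, then approved
worker.join()
print(result, pending.script)  # -> [True] Get-ChildItem -Recurse
```

Using `wait_for` with a `_decided` predicate (instead of a bare `wait`) avoids the lost-wakeup race where the GUI notifies before the worker has reached its wait.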
---
## 5. Security: The MCP Allowlist
To prevent "hallucinated" file access, every filesystem tool (read, list, search) is gated by the **MCP (Model Context Protocol) Bridge**:
* **Resolution:** Every path requested by the AI is resolved to an absolute path.
* **Checking:** It is verified against the project's `base_dir`. If the AI tries to `read_file("C:/Windows/System32/...")`, the bridge intercepts the call and returns an `ACCESS DENIED` error to the model before the OS is ever touched.
---
## 6. Telemetry & Auditing
Every interaction in Manual Slop is designed to be auditable:
* **JSON-L Comms Logs:** Raw API traffic is logged for debugging and token cost analysis.
* **Generated Scripts:** Every script that passes through the "Clutch" is saved to `scripts/generated/`.
* **Performance Monitor:** Real-time metrics (FPS, Frame Time, Input Lag) are tracked and can be queried via the Hook API to ensure the UI remains "fluid" under load.
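As a rough illustration of the JSON-L convention used by the comms logs (the field names here are invented, not the engine's actual schema):

```python
import json
import os
import tempfile
import time

def log_comms(path: str, direction: str, payload: dict) -> None:
    """Append one JSON object per line (the JSON-L convention)."""
    record = {"ts": time.time(), "dir": direction, "payload": payload}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Usage: write two records to a throwaway file, then read them back.
path = os.path.join(tempfile.mkdtemp(), "comms_demo.log")
log_comms(path, "send", {"model": "claude", "tokens": 1234})
log_comms(path, "recv", {"stop_reason": "end_turn"})
with open(path, encoding="utf-8") as f:
    print([json.loads(line)["dir"] for line in f])  # -> ['send', 'recv']
```

Because each line is an independent JSON object, a log truncated by a crash loses at most its final line, and token-cost analysis can stream the file without loading it whole.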

docs/guide_simulations.md Normal file

@@ -0,0 +1,63 @@
# Manual Slop: Verification & Simulation Framework
Detailed specification of the live GUI testing infrastructure, simulation lifecycle, and the mock provider strategy.
---
## 1. Live GUI Verification Infrastructure
To verify complex UI state and asynchronous interactions, Manual Slop employs a **Live Verification** strategy using the application's built-in API hooks.
### `--enable-test-hooks`
When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated visual verification.
### The `live_gui` pytest Fixture
Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test:
1. **Startup:** Spawns `gui_2.py` in a separate process with `--enable-test-hooks`.
2. **Telemetry:** Polls `/status` until the hook server is ready.
3. **Isolation:** Resets the AI session and clears comms logs between tests to prevent state pollution.
4. **Teardown:** Robustly kills the process tree on completion or failure.
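A sketch of this lifecycle, shown as a plain context manager rather than the real session-scoped `pytest.fixture` so it stays self-contained (the probe and process arguments are illustrative):

```python
import subprocess
import sys
import time
from contextlib import contextmanager

def wait_until(probe, timeout=30.0, interval=0.25):
    """Poll `probe` (e.g. an HTTP GET on /status) until it returns True."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    raise TimeoutError("hook server never became ready")

@contextmanager
def live_gui(app="gui_2.py", probe=lambda: False):
    # Startup: spawn the application under test with hooks enabled.
    proc = subprocess.Popen([sys.executable, app, "--enable-test-hooks"])
    try:
        wait_until(probe)  # Telemetry: block until the hook server is ready.
        yield proc
    finally:
        proc.kill()        # Teardown: kill the process even on failure.
        proc.wait()
```

The real fixture additionally resets the AI session and clears comms logs between tests; that isolation step is omitted here.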
---
## 2. Simulation Lifecycle: The "Puppeteer" Pattern
Simulations (like `tests/visual_sim_mma_v2.py`) act as a "Puppeteer," driving the GUI through the `ApiHookClient`.
### Phase 1: Environment Setup
* **Provider Mocking:** The simulation sets the `current_provider` to `gemini_cli` and redirects the `gcli_path` to a mock script (e.g., `tests/mock_gemini_cli.py`).
* **Workspace Isolation:** The `files_base_dir` is pointed to a temporary artifacts directory to prevent accidental modification of the host project.
### Phase 2: User Interaction Loop
The simulation replicates a human workflow by invoking client methods:
1. `client.set_value('mma_epic_input', '...')`: Injects the epic description.
2. `client.click('btn_mma_plan_epic')`: Triggers the orchestration engine.
### Phase 3: Polling & Assertion
Because AI orchestration is asynchronous, simulations use a **Polling with Multi-Modal Approval** loop:
* **State Polling:** The script polls `client.get_mma_status()` in a loop.
* **Auto-Approval:** If the status indicates a pending tool or spawn request, the simulation automatically clicks the approval buttons (`btn_approve_spawn`, `btn_approve_tool`).
* **Verification:** Once the expected state (e.g., "Mock Goal 1" appears in the track list) is detected, the simulation proceeds to the next phase or asserts success.
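The polling loop can be sketched as below, with a stand-in client so it runs offline (the status keys and button tags are assumptions, loosely modeled on the names in the text):

```python
import time

def poll_until(client, done, timeout=60.0, interval=0.5):
    """Poll MMA status, auto-approving any pending gates along the way."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.get_mma_status()
        # Multi-modal auto-approval: clear any gate the orchestrator raised.
        if status.get("pending_spawn"):
            client.click("btn_approve_spawn")
        if status.get("pending_tool"):
            client.click("btn_approve_tool")
        if done(status):
            return status
        time.sleep(interval)
    raise TimeoutError("expected state never reached")

class FakeClient:
    """Stand-in for ApiHookClient so the loop can be exercised offline."""
    def __init__(self):
        self._statuses = [
            {"pending_spawn": True},
            {"pending_tool": True},
            {"tracks": ["Mock Goal 1"]},
        ]
        self.clicks = []
    def get_mma_status(self):
        return self._statuses.pop(0) if len(self._statuses) > 1 else self._statuses[0]
    def click(self, item):
        self.clicks.append(item)

fake = FakeClient()
final = poll_until(fake, lambda s: "Mock Goal 1" in s.get("tracks", []), interval=0.01)
print(fake.clicks)  # -> ['btn_approve_spawn', 'btn_approve_tool']
```

The `done` predicate is passed in by each test, so the same loop serves DAG checks, stream checks, and modal-state checks.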
---
## 3. Mock Provider Strategy
To test the 4-Tier MMA hierarchy without incurring API costs or latency, Manual Slop uses a **Script-Based Mocking** strategy via the `gemini_cli` adapter.
### `tests/mock_gemini_cli.py`
This script simulates the behavior of the `gemini` CLI by:
1. **Input Parsing:** Reading the system prompt and user message from the environment/stdin.
2. **Deterministic Response:** Returning pre-defined JSON payloads (e.g., track definitions, worker implementation scripts) based on keywords in the prompt.
3. **Tool Simulation:** Mimicking function-call responses to trigger the "Execution Clutch" within the GUI.
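A hypothetical keyword-routed responder in the spirit of `tests/mock_gemini_cli.py` (the canned payloads and keywords are invented):

```python
import json
import sys

# Deterministic payloads keyed by a keyword expected in the prompt.
CANNED = {
    "plan": {"tracks": [{"name": "Mock Goal 1"}]},
    "implement": {"script": "print('worker output')"},
}

def respond(prompt: str) -> str:
    """Pick a pre-defined JSON payload based on keywords in the prompt."""
    for keyword, payload in CANNED.items():
        if keyword in prompt.lower():
            return json.dumps(payload)
    return json.dumps({"text": "no-op"})

if __name__ == "__main__":
    # Input parsing: the real mock also reads the system prompt from the env.
    print(respond(sys.stdin.read()))
```

Because the routing is purely lexical, the same simulation run always produces the same track definitions, which is what makes visual assertions like "Mock Goal 1 appears in the track list" reliable.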
---
## 4. Visual Verification Examples
Tests in this framework don't just check return values; they verify the **rendered state** of the application:
* **DAG Integrity:** Verifying that `active_tickets` in the MMA status matches the expected task graph.
* **Stream Telemetry:** Checking `mma_streams` to ensure that output from multiple tiers is correctly captured and displayed in the terminal.
* **Modal State:** Asserting that the correct dialog (e.g., `ConfirmDialog`) is active during a pending tool call.
By combining these techniques, Manual Slop achieves a level of verification rigor usually reserved for high-stakes embedded systems or complex graphics engines.


@@ -1,58 +1,65 @@
# Manual Slop: Tooling & IPC Technical Reference
A deep-dive into the Model Context Protocol (MCP) bridge, the Hook API, and the "Human-in-the-Loop" communication protocol.
---
## 1. The MCP Bridge: Filesystem Security
The AI's ability to interact with your filesystem is mediated by a strict security allowlist.
### Path Resolution & Sandboxing
Every tool accessing the disk (e.g., `read_file`, `list_directory`, `search_files`) executes `_resolve_and_check(path)`:
1. **Normalization:** The requested path is converted to an absolute path.
2. **Constraint Check:** The path must reside within the project's `base_dir`.
3. **Enforcement:** Violations trigger a `PermissionError`, returned to the model as an `ACCESS DENIED` status.
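The three steps can be sketched as a single gate function, assuming Python 3.9+ for `Path.is_relative_to` (the real `_resolve_and_check` signature and error text may differ):

```python
import tempfile
from pathlib import Path

def resolve_and_check(path: str, base_dir: str) -> Path:
    """Sketch of the allowlist gate described above."""
    resolved = Path(path).resolve()        # 1. normalization to an absolute path
    base = Path(base_dir).resolve()
    if not resolved.is_relative_to(base):  # 2. constraint check against base_dir
        # 3. enforcement: surfaced to the model as an ACCESS DENIED status
        raise PermissionError(f"ACCESS DENIED: {resolved} is outside {base}")
    return resolved

# Usage: paths under base_dir pass, everything else is rejected.
base = tempfile.mkdtemp()
ok = resolve_and_check(f"{base}/src/main.py", base)
try:
    resolve_and_check("/Windows/System32/config", base)
except PermissionError:
    print("blocked")  # -> blocked
```

Resolving before checking is the load-bearing detail: it collapses `..` segments and symlink tricks, so `base_dir/../../etc/passwd` cannot slip past a naive prefix comparison.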
### Native Toolset
* **`read_file(path)`:** UTF-8 extraction, clamped by token budgets.
* **`list_directory(path)`:** Returns a structural map (Name, Type, Size).
* **`search_files(path, pattern)`:** Executes a glob search (e.g., `**/*.py`) within an allowed directory.
* **`get_file_summary(path)`:** AST-based heuristic parsing for high-signal architectural mapping without full-file read costs.
* **`web_search(query)`:** Scrapes DuckDuckGo raw HTML via a dependency-free parser.
* **`fetch_url(url)`:** Downloads a target webpage, strips scripts and markup via `_TextExtractor`, and returns the raw prose content (clamped to 40,000 characters).
The two **web tools** (`web_search`, `fetch_url`) bypass this check entirely — they have no filesystem access and are unrestricted.
---
## 2. The Hook API: Remote Control & Telemetry
Manual Slop exposes a REST-based IPC interface (running by default on port `8999`) to facilitate automated verification and external monitoring.
### Core Endpoints
* `GET /status`: Engine health and hook server readiness.
* `GET /mma_status`: Retrieves the 4-Tier state, active track metadata, and current ticket DAG status.
* `POST /api/gui`: Pushes events into the `AsyncEventQueue`.
* Payload example: `{"action": "set_value", "item": "current_provider", "value": "anthropic"}`
* `GET /diagnostics`: High-frequency telemetry for UI performance (FPS, CPU, Input Lag).
### ApiHookClient Implementation
The `api_hook_client.py` provides a robust wrapper for the Hook API:
* **Synchronous Wait:** `wait_for_server()` polls `/status` with exponential backoff.
* **State Polling:** `wait_for_value()` blocks until a specific GUI element matches an expected state.
* **Remote Interaction:** `click()`, `set_value()`, and `select_tab()` methods allow external agents to drive the GUI.
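Assuming the documented payload shape and default port, a request to `/api/gui` can be built with nothing but the standard library (the function names here are illustrative):

```python
import json
import urllib.request

def build_gui_request(action, item, value, host="127.0.0.1", port=8999):
    """Build the POST /api/gui call with the documented payload shape."""
    body = json.dumps({"action": action, "item": item, "value": value}).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/api/gui",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_gui_event(action, item, value, **kw):
    """Send the event; only usable while the app runs with --enable-test-hooks."""
    req = build_gui_request(action, item, value, **kw)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)

req = build_gui_request("set_value", "current_provider", "anthropic")
print(req.get_method(), req.full_url)  # -> POST http://127.0.0.1:8999/api/gui
```

Separating request construction from sending keeps the payload shape testable without a running hook server.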
---
## 3. The HITL IPC Flow: `ask/respond`
Manual Slop supports a synchronous "Human-in-the-Loop" request pattern for operations requiring explicit confirmation or manual data mutation.
### Sequence of Operation
1. **Request:** A background agent (e.g., a Tier 3 Worker) calls `/api/ask` with a JSON payload.
2. **Intercept:** The `HookServer` generates a unique `request_id` and pushes a `type: "ask"` event to the GUI's `_pending_gui_tasks`.
3. **Modal Display:** The GUI renders an `Approve/Reject` modal with the payload details.
4. **Response:** Upon user action, the GUI thread `POST`s to `/api/ask/respond`.
5. **Resume:** The original agent call to `/api/ask` (which was polling for completion) unblocks and receives the user's response.
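The pairing of a blocked `ask` with a later `respond` can be modeled in-process (the real flow crosses HTTP between `/api/ask` and `/api/ask/respond`, and all names below are illustrative):

```python
import threading
import uuid

class AskBroker:
    """In-process model of the ask/respond pairing; the synchronization
    shape matches the sequence above, the API does not."""

    def __init__(self):
        self._lock = threading.Lock()
        self.pending = {}  # request_id -> (Event, answer slot)

    def ask(self, payload, timeout=None):
        request_id = uuid.uuid4().hex          # step 2: unique request_id
        event, answer = threading.Event(), {}
        with self._lock:
            self.pending[request_id] = (event, answer)
        # ...step 3 would push an Approve/Reject modal to the GUI here...
        event.wait(timeout)                    # agent blocks until step 4
        return answer.get("value")

    def respond(self, request_id, value):
        with self._lock:
            event, answer = self.pending.pop(request_id)
        answer["value"] = value
        event.set()                            # step 5: unblock the agent

broker = AskBroker()
result = []
agent = threading.Thread(target=lambda: result.append(broker.ask({"op": "write"}, timeout=5)))
agent.start()
while not broker.pending:                      # wait for the ask to register
    pass
broker.respond(next(iter(broker.pending)), "approved")
agent.join()
print(result)  # -> ['approved']
```

Keying each pending ask by `request_id` is what lets multiple background agents block concurrently while the GUI answers them in any order.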
This pattern is the foundation of the **Execution Clutch**, ensuring that no destructive action occurs without an auditable human signal.
---
## 4. Synthetic Context Refresh
To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution refresh:
1. **Detection:** Triggered after the final tool call in a reasoning round.
2. **Collection:** Re-reads all project-tracked files from disk.
3. **Injection:** The updated content is injected into the next LLM turn as a `[SYSTEM: FILES UPDATED]` block.
4. **Pruning:** Older snapshots are stripped from history in subsequent rounds to maintain a lean context window.
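A sketch of assembling the refresh injection, assuming the `[SYSTEM: FILES UPDATED]` header from the text (the per-file formatting is invented):

```python
import tempfile
from pathlib import Path

def build_refresh_block(paths) -> str:
    """Assemble the post-tool refresh block for the next LLM turn."""
    parts = ["[SYSTEM: FILES UPDATED]"]
    for p in paths:
        # Collection: always re-read from disk so tool writes are visible.
        parts.append(f"--- {p} ---\n{Path(p).read_text(encoding='utf-8')}")
    return "\n".join(parts)

# Usage: a tool writes a file, then the next turn carries the fresh snapshot.
tmp = Path(tempfile.mkdtemp()) / "main.py"
tmp.write_text("print('v2')", encoding="utf-8")
block = build_refresh_block([tmp])
print(block.splitlines()[0])  # -> [SYSTEM: FILES UPDATED]
```

Because older snapshots are pruned each round (step 4), only the most recent block of this form survives in history, so the token cost of carrying full file contents stays bounded.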