docs(conductor): Expert-level architectural documentation refresh

Date: 2026-03-01 09:19:48 -05:00
Parent: 7384df1e29
Commit: bf4468f125
8 changed files with 263 additions and 195 deletions


@@ -1,54 +1,66 @@
 # Manual Slop
-Vibe coding.. but more manual
-![img](./gallery/python_2026-02-21_23-37-29.png)
-This tool is designed to work as an auxiliary assistant that natively interacts with your codebase via PowerShell and MCP-like file tools, supporting both Anthropic and Gemini APIs.
-Features:
-* Multi-provider support (Anthropic & Gemini).
-* Multi-project workspace management via TOML configuration.
-* Rich discussion history with branching and timestamps.
-* Real-time file context aggregation and summarization.
-* Integrated tool execution:
-* PowerShell scripting for file modifications.
-* MCP-like filesystem tools (read, list, search, summarize).
-* Web search and URL fetching.
-* Extensive UI features:
-* Word-wrap toggles.
-* Popup text viewers for large script/output inspection.
-* Color theming and UI scaling.
-## Session-Based Logging and Management
-Manual Slop organizes all communications and tool interactions into session-based directories under `logs/`. This ensures a clean history and easy debugging.
-* **Organized Storage:** Each session is assigned a unique ID and its own sub-directory containing communication logs (`comms.log`) and metadata.
-* **Log Management Panel:** The GUI includes a dedicated 'Log Management' panel where you can view session history, inspect metadata (message counts, errors, size), and protect important sessions.
-* **Automated Pruning:** To keep the workspace clean, the application automatically prunes insignificant logs. Sessions older than 24 hours that are not "whitelisted" and are smaller than 2KB are automatically deleted.
-* **Whitelisting:** Sessions containing errors, high activity, or significant changes are automatically whitelisted. Users can also manually whitelist sessions via the GUI to prevent them from being pruned.
-## Documentation
-* [docs/Readme.md](docs/Readme.md) for the interface and usage guide
-* [docs/guide_tools.md](docs/guide_tools.md) for information on the AI tooling capabilities
-* [docs/guide_architecture.md](docs/guide_architecture.md) for an in-depth breakdown of the codebase architecture
-## Instructions
-1. Make a credentials.toml in the immediate directory of your clone:
+An experimental, high-density AI orchestration engine designed for expert developers. Manual Slop provides a strictly controlled environment for executing complex, multi-tier AI workflows with deterministic human-in-the-loop (HITL) overrides.
+---
+## 1. Technical Philosophy
+Manual Slop is not a chat interface. It is a **Decoupled State Machine** built on the principle that AI reasoning should be observable, mutable, and interruptible. It bridges high-latency AI execution with a low-latency, retained-mode GUI via a thread-safe asynchronous pipeline.
+### Core Features
+* **Hierarchical MMA (4-Tier Architecture):** Orchestrate complex tracks using a tiered model (Orchestrator -> Tech Lead -> Worker -> QA) with explicit token firewalling.
+* **The Execution Clutch:** A deterministic "gear-shifting" mechanism that pauses execution for human inspection and mutation of AI-generated payloads.
+* **MCP-Bridge & Tooling:** Integrated filesystem sandboxing and native search/fetch tools with project-wide security allowlists.
+* **Live Simulation Framework:** A robust verification suite using API hooks for automated visual and state assertions.
+---
+## 2. Deep-Dive Documentation
+For expert-level technical details, refer to our specialized guides:
+* **[Architectural Technical Reference](./docs/guide_architecture.md):** Deep-dive into thread synchronization, the task pipeline, and the decoupled state machine.
+* **[Tooling & IPC Reference](./docs/guide_tools.md):** Specification of the Hook API, MCP bridge, and the HITL communication protocol.
+* **[Verification & Simulation Framework](./docs/guide_simulations.md):** Detailed breakdown of the live GUI testing infrastructure and simulation lifecycle.
+---
+## 3. Setup & Environment
+### Prerequisites
+* Python 3.11+
+* [`uv`](https://github.com/astral-sh/uv) for high-speed package management.
+### Installation
+1. Clone the repository.
+2. Install dependencies:
+```powershell
+uv sync
+```
+3. Configure credentials in `credentials.toml`:
 ```toml
 [gemini]
-api_key = "****"
+api_key = "YOUR_KEY"
 [anthropic]
-api_key = "****"
+api_key = "YOUR_KEY"
 ```
-2. Have fun. This is experiemntal slop.
-```ps1
-uv run .\gui_2.py
-```
+### Running the Engine
+Launch the main GUI application:
+```powershell
+uv run gui_2.py
+```
+To enable the Hook API for external telemetry or testing:
+```powershell
+uv run gui_2.py --enable-test-hooks
+```
+---
+## 4. Feature Roadmap (2026)
+* **DAG-Based Task Execution:** Real-time visual tracking of multi-agent ticket dependencies.
+* **Token Budgeting & Throttling:** Granular control over cost and context accumulation per tier.
+* **Advanced Simulation Suite:** Expanded visual verification for multi-modal reasoning tracks.


@@ -13,3 +13,10 @@ This file tracks all major tracks for the project. Each track has its own detail
 - [ ] **Track: Comprehensive Conductor & MMA GUI UX**
 *Link: [./tracks/comprehensive_gui_ux_20260228/](./tracks/comprehensive_gui_ux_20260228/)*
+---
+- [x] **Track: Deep Architectural Documentation Refresh**
+*Link: [./tracks/documentation_refresh_20260224/](./tracks/documentation_refresh_20260224/)*


@@ -1,38 +1,38 @@
 # Implementation Plan: Deep Architectural Documentation Refresh
 ## Phase 1: Context Cleanup & Research
-- [ ] Task: Audit references to `MainContext.md` across the project.
-- [ ] Task: Delete `MainContext.md` and update any identified references.
-- [ ] Task: Execute `py_get_skeleton` and `py_get_code_outline` for `events.py`, `api_hooks.py`, `api_hook_client.py`, and `gui_2.py` to create a technical map for the guides.
-- [ ] Task: Analyze the `live_gui` fixture in `tests/conftest.py` and the simulation loop in `tests/visual_sim_mma_v2.py`.
+- [x] Task: Audit references to `MainContext.md` across the project.
+- [x] Task: Delete `MainContext.md` and update any identified references.
+- [x] Task: Execute `py_get_skeleton` and `py_get_code_outline` for `events.py`, `api_hooks.py`, `api_hook_client.py`, and `gui_2.py` to create a technical map for the guides.
+- [x] Task: Analyze the `live_gui` fixture in `tests/conftest.py` and the simulation loop in `tests/visual_sim_mma_v2.py`.
 ## Phase 2: Core Architecture Deep Dive
 Update `docs/guide_architecture.md` with expert-level detail.
-- [ ] Task: Document the Dual-Threaded App Lifetime: Main GUI loop vs. Daemon execution threads.
-- [ ] Task: Detail the `AsyncEventQueue` and `EventEmitter` roles in the decoupling strategy.
-- [ ] Task: Explain the `_pending_gui_tasks` synchronization mechanism for bridging the Hook Server and GUI.
-- [ ] Task: Document the "Linear Execution Clutch" and its deterministic state machine.
-- [ ] Task: Verify the architectural descriptions against the actual implementation.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Architecture Deep Dive' (Protocol in workflow.md)
+- [x] Task: Document the Dual-Threaded App Lifetime: Main GUI loop vs. Daemon execution threads.
+- [x] Task: Detail the `AsyncEventQueue` and `EventEmitter` roles in the decoupling strategy.
+- [x] Task: Explain the `_pending_gui_tasks` synchronization mechanism for bridging the Hook Server and GUI.
+- [x] Task: Document the "Linear Execution Clutch" and its deterministic state machine.
+- [x] Task: Verify the architectural descriptions against the actual implementation.
+- [x] Task: Conductor - User Manual Verification 'Phase 2: Core Architecture Deep Dive' (Protocol in workflow.md)
 ## Phase 3: Hook System & Tooling Technical Reference
 Update `docs/guide_tools.md` to include low-level API details.
-- [ ] Task: Create a comprehensive API reference for all `HookServer` endpoints.
-- [ ] Task: Document the `ApiHookClient` implementation, including retries and polling strategies.
-- [ ] Task: Update the MCP toolset guide with current native tool implementations.
-- [ ] Task: Document the `ask/respond` IPC flow for "Human-in-the-Loop" confirmations.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3: Hook System & Tooling Technical Reference' (Protocol in workflow.md)
+- [x] Task: Create a comprehensive API reference for all `HookServer` endpoints.
+- [x] Task: Document the `ApiHookClient` implementation, including retries and polling strategies.
+- [x] Task: Update the MCP toolset guide with current native tool implementations.
+- [x] Task: Document the `ask/respond` IPC flow for "Human-in-the-Loop" confirmations.
+- [x] Task: Conductor - User Manual Verification 'Phase 3: Hook System & Tooling Technical Reference' (Protocol in workflow.md)
 ## Phase 4: Verification & Simulation Framework
 Create the new `docs/guide_simulations.md` guide.
-- [ ] Task: Detail the Live GUI testing infrastructure: `--enable-test-hooks` and the `live_gui` fixture.
-- [ ] Task: Breakdown the Simulation Lifecycle: Startup, Polling, Interaction, and Assertion.
-- [ ] Task: Document the mock provider strategy using `tests/mock_gemini_cli.py`.
-- [ ] Task: Provide examples of visual verification tests (e.g., MMA lifecycle).
-- [ ] Task: Conductor - User Manual Verification 'Phase 4: Verification & Simulation Framework' (Protocol in workflow.md)
+- [x] Task: Detail the Live GUI testing infrastructure: `--enable-test-hooks` and the `live_gui` fixture.
+- [x] Task: Breakdown the Simulation Lifecycle: Startup, Polling, Interaction, and Assertion.
+- [x] Task: Document the mock provider strategy using `tests/mock_gemini_cli.py`.
+- [x] Task: Provide examples of visual verification tests (e.g., MMA lifecycle).
+- [x] Task: Conductor - User Manual Verification 'Phase 4: Verification & Simulation Framework' (Protocol in workflow.md)
 ## Phase 5: README & Roadmap Update
-- [ ] Task: Update `Readme.md` with current setup (`uv`, `credentials.toml`) and vision.
-- [ ] Task: Perform a project-wide link validation of all Markdown files.
-- [ ] Task: Verify the high-density information style across all documentation.
-- [ ] Task: Conductor - User Manual Verification 'Phase 5: README & Roadmap Update' (Protocol in workflow.md)
+- [x] Task: Update `Readme.md` with current setup (`uv`, `credentials.toml`) and vision.
+- [x] Task: Perform a project-wide link validation of all Markdown files.
+- [x] Task: Verify the high-density information style across all documentation.
+- [x] Task: Conductor - User Manual Verification 'Phase 5: README & Roadmap Update' (Protocol in workflow.md)


@@ -1,49 +1,45 @@
 # Track Specification: Deep Architectural Documentation Refresh
 ## Overview
-This track implements a high-density, expert-level documentation suite for the Manual Slop project, focusing on the complex asynchronous orchestration, the 4-Tier MMA (Hierarchical Multi-Model Architecture), and the robust visual simulation framework. The documentation style adheres to the "USA Graphics Company" values of high information density, tactile interaction, and deep technical transparency, inspired by the architectural guides of `VEFontCache-Odin` and `gencpp`.
-## Goals
-1. **Architectural Deep Dive:** Document the core engine, focusing on state management, asynchronous event loops, and the decoupling of the GUI from the AI client.
-2. **Hook System Technical Reference:** Provide a low-level guide to the `HookServer`, `HookHandler`, and the IPC mechanism used for automated verification.
-3. **Simulation & Testing Framework:** Detail the live GUI simulation strategy, including the use of `pytest` fixtures, `ApiHookClient`, and mock providers.
-4. **MMA Orchestration Logic:** Document the 4-Tier hierarchy (Orchestrator, Tech Lead, Worker, QA), token firewalling, and the DAG-based task execution engine.
-5. **Event Handling System:** Explain the dual event architecture using both synchronous `EventEmitter` and asynchronous `AsyncEventQueue`.
+This track implements a high-density, expert-level documentation suite for the Manual Slop project. The documentation style is strictly modeled after the **pedagogical and narrative standards** of `gencpp` and `VEFontCache-Odin`. It moves beyond simple "User Guides" to provide a **"USA Graphics Company"** architectural reference: high information density, tactical technical transparency, and a narrative intent that guides a developer from high-level philosophy to low-level implementation.
+## Pedagogical Goals
+1. **Narrative Intent:** Documentation must transition the reader through a logical learning journey: **Philosophy/Mental Model -> Architectural Boundaries -> Implementation Logic -> Verification/Simulation.**
+2. **High Information Density:** Eliminate conversational filler and "fluff." Every sentence must provide architectural signal (state transitions, data flows, constraints).
+3. **Technical Transparency:** Document the "How" and "Why" behind design decisions (e.g., *Why* the dual-threaded `Asyncio` loop? *How* does the "Execution Clutch" bridge the thread gap?).
+4. **Architectural Mapping:** Use precise symbol names (`AsyncEventQueue`, `_pending_gui_tasks`, `HookServer`) to map the documentation directly to the source code.
+5. **Multi-Layered Depth:** Each major component (Architecture, Tools, Simulations) must have its own dedicated, expert-level guide. No consolidation into single, shallow files.
 ## Functional Requirements (Documentation Areas)
 ### 1. Core Architecture (`docs/guide_architecture.md`)
-- **Lifetime & Execution Loop:** Detailed breakdown of the application startup, the interaction between the main thread and daemon threads, and the event-driven updates.
-- **Asynchronous Orchestration:** Documentation of the `AsyncEventQueue` and how it prevents GUI blocking during long-running AI operations.
-- **State Synchronization:** How the `_pending_gui_tasks` queue and `threading.Lock` are used to safely bridge the Hook Server thread and the Main GUI thread.
-- **Linear Execution Clutch:** Explanation of the deterministic debugging mode and the step-through execution logic.
-### 2. Tooling & Hooks (`docs/guide_tools.md`)
-- **IPC & Hook API:** A complete reference for all `/api/` endpoints, including data structures for `mma_status`, `diagnostics`, and `ask/respond`.
-- **ApiHookClient:** Usage guide for the requests-based client, detailing polling strategies and retry logic.
-- **MCP Toolset:** Updated documentation for the native MCP-like tools (file I/O, search, web fetch) and their security allowlist.
-### 3. Verification & Simulation (`docs/guide_simulations.md` - NEW)
-- **Live GUI Verification:** How to run the app with `--enable-test-hooks` and use the `live_gui` fixture for end-to-end testing.
-- **Visual Simulation Lifecycle:** Detailed breakdown of `tests/visual_sim_mma_v2.py`, explaining how it simulates user interaction and verifies rendered state (e.g., DAG nodes, stream output).
-- **Mocking AI Providers:** How to use `tests/mock_gemini_cli.py` to test the full orchestration loop without incurring API costs.
-### 4. README & Vision (`Readme.md`)
-- **Setup & Telemetry:** Updated instructions for `uv` and `credentials.toml`.
-- **Feature Roadmap:** Highlighting the MMA Dashboard, DAG engine, and the 4-Tier delegation model.
-## Non-Functional Requirements
-- **Indentation & Formatting:** Follow the project's AI-optimized compact style (1-space indentation for code snippets).
-- **High Information Density:** Avoid conversational filler; focus on state transitions, data flow, and architectural constraints.
-- **Accurate Code Linkage:** Documentation must include specific file names and symbol references (e.g., `AsyncEventQueue`, `_cb_accept_tracks`).
-## Acceptance Criteria
-- [ ] `docs/guide_architecture.md` updated with "Odin-style" depth.
-- [ ] `docs/guide_tools.md` contains a full API reference for the Hook system.
-- [ ] `docs/guide_simulations.md` created, detailing the visual verification suite.
-- [ ] `Readme.md` reflects the current high-density product vision.
-- [ ] `MainContext.md` decommissioned and references removed.
+- **System Philosophy:** The "Decoupled State Machine" mental model.
+- **Application Lifetime:** The multi-threaded boot sequence and the "Dual-Flush" persistence model.
+- **The Task Pipeline:** Detailed producer-consumer synchronization between the GUI (Main) and AI (Daemon) threads.
+- **The Execution Clutch (HITL):** Detailed state machine for human-in-the-loop interception and payload mutation.
+### 2. Tooling & IPC Reference (`docs/guide_tools.md`)
+- **MCP Bridge:** Low-level security constraints and filesystem sandboxing.
+- **Hook API:** A full technical reference for the REST/IPC interface (endpoints, payloads, diagnostics).
+- **IPC Flow:** The `ask/respond` sequence for synchronous human-in-the-loop requests.
+### 3. Verification & Simulation Framework (`docs/guide_simulations.md`)
+- **Infrastructure:** The `--enable-test-hooks` flag and the `live_gui` pytest fixture.
+- **Lifecycle:** The "Puppeteer" pattern for driving the GUI via automated clients.
+- **Mocking Strategy:** Script-based AI provider mocking via `mock_gemini_cli.py`.
+- **Visual Assertion:** Examples of verifying the rendered state (DAG, Terminal streams) rather than just API returns.
+### 4. Product Vision & Roadmap (`Readme.md`)
+- **Technological Identity:** High-density experimental tool for local AI orchestration.
+- **Pedagogical Landing:** Direct links to the deep-dive guides to establish the project's expert-level tone immediately.
+## Acceptance Criteria for Expert Review (Claude Opus)
+- [ ] **Zero Filler:** No introductory "In this section..." or "Now we will..." conversational markers.
+- [ ] **Structural Parity:** Documentation follows the `gencpp` pattern (Philosophy -> Code Paths -> Interface).
+- [ ] **Expert-Level Detail:** Includes data structures, locking mechanisms, and thread-safety constraints.
+- [ ] **Narrative Cohesion:** The documents feel like a single, expert-authored manual for a complex graphics or systems engine.
+- [ ] **Tactile Interaction:** Explains the "Linear Execution Clutch" as a physical shift in the application's processing gears.
 ## Out of Scope
 - Documenting legacy `gui_legacy.py` code beyond its role as a fallback.
-- Visual diagram generation (focusing on high-quality text-based architectural mapping).
+- Visual diagram generation (focusing on high-signal text-based architectural mapping).


@@ -1,87 +1,72 @@
-# Guide: Architecture
-Overview of the package design, state management, and code-path layout.
+# Manual Slop: Architectural Technical Reference
+A deep-dive into the asynchronous orchestration, state synchronization, and the "Linear Execution Clutch" of the Manual Slop engine. This document is designed to move the reader from a high-level mental model to a low-level implementation understanding.
 ---
-The purpose of this software is to alleviate the pain points of using AI as a local co-pilot by encapsulating the workflow into a resilient, strictly controlled state machine. It manages context generation, API throttling, human-in-the-loop tool execution, and session-long logging.
-There are two primary state boundaries used:
-* The GUI State (Main Thread, Retained-Mode via Dear PyGui)
-* The AI State (Daemon Thread, stateless execution loop)
-All synchronization between these boundaries is managed via lock-protected queues and events.
-## Code Paths
-### Lifetime & Application Boot
-The application lifetime is localized within App.run in gui_legacy.py.
-1. __init__ parses the global config.toml (which sets the active provider, theme, and project paths).
-2. It immediately hands off to project_manager.py to deserialize the active <project>.toml which hydrates the session's files, discussion histories, and prompts.
-3. Dear PyGui's dpg contexts are bootstrapped with docking_viewport=True, allowing individual GUI panels to exist as native OS windows.
-4. The main thread enters a blocking while dpg.is_dearpygui_running() render loop.
-5. On shutdown (clean exit), it performs a dual-flush: _flush_to_project() commits the UI state back to the <project>.toml, and _flush_to_config() commits the global state to config.toml. The viewport layout is automatically serialized to dpg_layout.ini.
+## 1. Philosophy: The Decoupled State Machine
+Manual Slop is built on a single, core realization: **AI reasoning is high-latency and non-deterministic, while GUI interaction must be low-latency and responsive.**
+To solve this, the engine enforces a strict decoupling between three distinct boundaries:
+* **The GUI Boundary (Main Thread):** A retained-mode loop (ImGui) that must never block. It handles visual telemetry and user "Seal of Approval" actions.
+* **The AI Boundary (Daemon Threads):** Stateless execution loops that handle the "heavy lifting" of context aggregation, LLM communication, and tool reasoning.
+* **The Orchestration Boundary (Asyncio):** A background thread that manages the flow of data between the other two, ensuring thread-safe communication without blocking the UI.
+---
+## 2. System Lifetime & Initialization
+The application lifecycle, managed by `App` in `gui_2.py`, follows a precise sequence to ensure the environment is ready before the first frame:
+1. **Context Hydration:** The engine reads `config.toml` (global) and `<project>.toml` (local). This builds the initial "world view" of the project—what files are tracked, what the discussion history is, and which AI models are active.
+2. **Thread Bootstrapping:**
+* The `Asyncio` event loop thread is started (`_loop_thread`).
+* The `HookServer` (FastAPI) is started as a daemon to handle IPC.
+3. **UI Entry:** The main thread enters `immapp.run()`. At this point, the GUI is "alive," and the background threads are ready to receive tasks.
+4. **The Dual-Flush Shutdown:** On exit, the system commits state back to both project and global configs. This ensures that your window positions, active discussions, and even pending tool results are preserved for the next session.
-### Context Shaping & Aggregation
-Before making a call to an AI Provider, the current state of the workspace is resolved into a dense Markdown representation.
-This occurs inside aggregate.run.
-If using the default workflow, aggregate.py hashes through the following process:
-1. **Glob Resolution:** Iterates through config["files"]["paths"] and unpacks any wildcards (e.g., src/**/*.rs) against the designated base_dir.
-2. **File Item Build:** `build_file_items()` reads each resolved file once, storing path, content, and `mtime`. This list is returned alongside the markdown so `ai_client.py` can use it for dynamic context refresh after tool calls without re-reading from disk.
-3. **Markdown Generation:** `build_markdown_from_items()` assembles the final `<project>_00N.md` string. By default (`summary_only=False`) it inlines full file contents. If `summary_only=True`, it delegates to `summarize.build_summary_markdown()` which uses AST-based heuristics to produce compact structural summaries instead.
-4. The Markdown file is persisted to disk (`./md_gen/` by default) for auditing. `run()` returns a 3-tuple `(markdown_str, output_path, file_items)`.
+---
+## 3. The Task Pipeline: Producer-Consumer Synchronization
+Because ImGui state cannot be safely modified from a background thread, Manual Slop uses a **Producer-Consumer** model for all updates.
+### The Flow of an AI Request
+1. **Produce:** When you click "Gen + Send," the GUI thread produces a `UserRequestEvent` and pushes it into the `AsyncEventQueue`.
+2. **Consume:** The background `asyncio` loop pops this event and dispatches it to the `ai_client`. The GUI thread remains free to render and respond to other inputs.
+3. **Task Backlog:** When the AI responds, the background thread *cannot* update the UI text boxes directly. Instead, it appends a **Task Dictionary** to the `_pending_gui_tasks` list.
+4. **Sync:** On every frame, the GUI thread checks this list. If tasks exist, it acquires a lock, clears the list, and executes the updates (e.g., "Set AI response text," "Blink the terminal indicator").
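Steps 3 and 4 of the task pipeline above describe a producer-consumer hand-off, sketched here in plain Python. This is an illustrative model, not the code in `gui_2.py`. Only the name `_pending_gui_tasks` comes from the guide. The lock attribute, the task-dict keys (`action`, `value`), and `process_pending_gui_tasks` are assumptions. The sketch uses a swap-under-lock drain: the main thread holds the lock just long enough to take the list, then applies the updates without blocking producers.

```python
import threading
from typing import Any


class GuiState:
    """Toy stand-in for GUI-owned state that only the main thread may mutate."""

    def __init__(self) -> None:
        self.ai_response = ""
        self._pending_gui_tasks: list[dict[str, Any]] = []
        self._pending_gui_tasks_lock = threading.Lock()  # hypothetical name

    def push_task(self, task: dict[str, Any]) -> None:
        """Producer side (background threads): enqueue, never touch GUI fields."""
        with self._pending_gui_tasks_lock:
            self._pending_gui_tasks.append(task)

    def process_pending_gui_tasks(self) -> int:
        """Consumer side (main thread, once per frame): drain and apply tasks."""
        with self._pending_gui_tasks_lock:
            # Take the whole backlog and leave an empty list for producers.
            tasks, self._pending_gui_tasks = self._pending_gui_tasks, []
        for task in tasks:  # applied outside the lock
            if task.get("action") == "set_ai_response":
                self.ai_response = task["value"]
        return len(tasks)
```

In the real loop, `process_pending_gui_tasks()` would be called at the top of each frame. A frame with no pending work costs one uncontended lock acquisition.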
-### AI Communication & The Tool Loop
-The communication model is unified under ai_client.py, which normalizes the Gemini and Anthropic SDKs into a singular interface send(md_content, user_message, base_dir, file_items).
-The loop is defined as follows:
-1. **Prompt Injection:** The aggregated Markdown context and system prompt are injected. For Gemini, the system_instruction and tools are stored in an explicit cache via `client.caches.create()` with a 1-hour TTL; if cache creation fails (under minimum token threshold), it falls back to inline system_instruction. When context changes mid-session, the old cache is deleted and a new one is created. For Anthropic, the system prompt + context are sent as `system=` blocks with `cache_control: ephemeral` on the last chunk, and tools carry `cache_control: ephemeral` on the last tool definition.
-2. **Execution Loop:** A MAX_TOOL_ROUNDS (default 10) bounded loop begins. The tools list for Anthropic is built once per session and reused.
-3. The AI provider is polled.
-4. If the provider's stop_reason is tool_use:
-1. The loop parses the requested tool (either a read-only MCP tool or the destructive PowerShell tool).
-2. If PowerShell, it dispatches a blocking event to the Main Thread (see *On Tool Execution & Concurrency*).
-3. Once the last tool result in the batch is retrieved, the loop executes a **Dynamic Refresh** (`_reread_file_items`). Any files currently tracked by the project are pulled from disk fresh. The `file_items` variable is reassigned so subsequent tool rounds see the updated content.
-4. For Anthropic: the refreshed file contents are appended as a text block to the tool_results user message. For Gemini: the refreshed contents are appended to the last function response's output string. In both cases, the block is prefixed with `[FILES UPDATED]` / `[SYSTEM: FILES UPDATED]`.
-5. On subsequent rounds, stale file-refresh blocks from previous turns are stripped from history to prevent token accumulation. For Gemini, old tool outputs exceeding `_history_trunc_limit` characters are also truncated.
-5. Once the model outputs standard text, the loop terminates and yields the string back to the GUI callback.
-### On Tool Execution & Concurrency
+---
+## 4. The Execution Clutch: Human-In-The-Loop (HITL)
+The "Execution Clutch" is our answer to the "Black Box" problem of AI. It allows you to shift from automatic execution to a manual, deterministic step-through mode.
+### How the "Shifting" Works
+When the AI requests a destructive action (like running a PowerShell script), the background execution thread is **suspended** using a `threading.Condition`:
+1. **The Pause:** The thread enters a `.wait()` state. It is physically blocked.
+2. **The Modal:** A task is sent to the GUI to open a modal dialog.
+3. **The Mutation:** The user can read the script, edit it, or reject it.
+4. **The Unleash:** When the user clicks "Approve," the GUI thread updates the shared state and calls `.notify_all()`. The background thread "wakes up," executes the (potentially modified) script, and reports the result back to the AI.
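The four clutch steps above map onto a single `threading.Condition`. This is a minimal sketch that assumes one pending request at a time. `ExecutionClutch`, `request_approval`, `approve`, `reject`, and `wait_for_pending` are hypothetical names, not the engine's API. The worker waits with `Condition.wait_for` rather than a bare `wait()`, so it re-checks the decision flag on every wakeup. A spurious or premature `notify_all()` therefore cannot release it early.

```python
import threading
from typing import Optional


class ExecutionClutch:
    """Suspends a worker thread until the GUI thread approves or rejects a payload."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.pending: Optional[str] = None   # payload currently awaiting review
        self._decided = False
        self._result: Optional[str] = None   # edited payload, or None when rejected

    def request_approval(self, payload: str) -> Optional[str]:
        """Worker side: publish the payload, then block until a decision arrives."""
        with self._cond:
            self.pending, self._decided, self._result = payload, False, None
            self._cond.notify_all()  # wake anything waiting for a pending request
            # wait_for releases the lock while blocked and re-checks the predicate.
            self._cond.wait_for(lambda: self._decided)
            self.pending = None
            return self._result

    def approve(self, edited_payload: str) -> None:
        """GUI side: store the (possibly edited) payload, then wake the worker."""
        self._decide(edited_payload)

    def reject(self) -> None:
        """GUI side: release the worker with no payload to execute."""
        self._decide(None)

    def _decide(self, result: Optional[str]) -> None:
        with self._cond:
            if self.pending is None:
                return  # no request outstanding; ignore stray clicks
            self._result, self._decided = result, True
            self._cond.notify_all()

    def wait_for_pending(self, timeout: float) -> bool:
        """Block until a request is pending (for tests; a GUI would just poll `pending`)."""
        with self._cond:
            return self._cond.wait_for(lambda: self.pending is not None, timeout=timeout)
```

In a render loop, the main thread should not call `wait_for_pending`, because that would block the frame. Instead it would check `pending` each frame, open the modal when it is set, and call `approve` or `reject` from the button callbacks.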
-When the AI calls a safe MCP tool (like read_file or search_files), the daemon thread immediately executes it via mcp_client.py and returns the result.
-However, when the AI requests run_powershell, the operation halts:
-1. The Daemon Thread instantiates a ConfirmDialog object containing the payload and calls .wait(). This blocks the thread on a threading.Event().
-2. The ConfirmDialog instance is safely placed in a _pending_dialog_lock.
-3. The Main Thread, during its next frame cycle, pops the dialog from the lock and renders an OS-level modal window using dpg.window(modal=True).
-4. The user can inspect the script, modify it in the text box, or reject it entirely.
-5. Upon the user clicking "Approve & Run", the main thread triggers the threading.Event, unblocking the Daemon Thread.
-6. The Daemon Thread passes the script to shell_runner.py, captures stdout, stderr, and exit_code, logs it to session_logger.py, and returns it to the LLM.
-### On Context History Pruning (Anthropic)
+---
+## 5. Security: The MCP Allowlist
+To prevent "hallucinated" file access, every filesystem tool (read, list, search) is gated by the **MCP (Model Context Protocol) Bridge**:
+* **Resolution:** Every path requested by the AI is resolved to an absolute path.
+* **Checking:** It is verified against the project's `base_dir`. If the AI tries to `read_file("C:/Windows/System32/...")`, the bridge intercepts the call and returns an `ACCESS DENIED` error to the model before the OS is ever touched.
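The resolve-then-check rule above can be sketched with `pathlib`. Assumptions: `check_path_allowed` is a hypothetical name, relative paths are interpreted against `base_dir`, and the `ACCESS DENIED` string only mirrors the wording above. The order matters: resolving first collapses `..` segments and follows symlinks. That is why `base_dir/../secrets` is rejected even though it starts with `base_dir` as text.

```python
from pathlib import Path


def check_path_allowed(requested: str, base_dir: str) -> tuple[bool, str]:
    """Return (True, resolved_path) if `requested` stays inside base_dir, else (False, error)."""
    base = Path(base_dir).resolve()
    candidate = Path(requested)
    if not candidate.is_absolute():
        # Assumption: relative tool paths are interpreted against the project base_dir.
        candidate = base / candidate
    # resolve() normalises ".." segments and follows symlinks before the check.
    resolved = candidate.resolve()
    # is_relative_to (Python 3.9+) compares whole path components, so a sibling
    # such as "/proj-evil" is not mistaken for a child of "/proj".
    if resolved.is_relative_to(base):
        return True, str(resolved)
    return False, f"ACCESS DENIED: {requested} is outside {base}"
```

The returned error string can be handed back to the model as the tool result. The filesystem call never happens.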
-Because the Anthropic API requires sending the entire conversation history on every request, long sessions will inevitably hit the invalid_request_error: prompt is too long.
-To solve this, ai_client.py implements an aggressive pruning algorithm:
-1. _strip_stale_file_refreshes: It recursively sweeps backward through the history dict and strips out large [FILES UPDATED] data blocks from old turns, preserving only the most recent snapshot.
-2. _trim_anthropic_history: If the estimated token count still exceeds _ANTHROPIC_MAX_PROMPT_TOKENS (~180,000), it slices off the oldest user/assistant message pairs from the beginning of the history array.
-3. The loop guarantees that at least the System prompt, Tool Definitions, and the final user prompt are preserved.
-### Session Persistence
-All I/O bound session data is recorded sequentially. session_logger.py hooks into the execution loops and records:
-- logs/comms_<ts>.log: A JSON-L structured timeline of every raw payload sent/received.
-- logs/toolcalls_<ts>.log: A sequential markdown record detailing every AI tool invocation and its exact stdout result.
-- scripts/generated/: Every .ps1 script approved and executed by the shell runner is physically written to disk for version control transparency.
+---
+## 6. Telemetry & Auditing
+Every interaction in Manual Slop is designed to be auditable:
+* **JSON-L Comms Logs:** Raw API traffic is logged for debugging and token cost analysis.
+* **Generated Scripts:** Every script that passes through the "Clutch" is saved to `scripts/generated/`.
+* **Performance Monitor:** Real-time metrics (FPS, Frame Time, Input Lag) are tracked and can be queried via the Hook API to ensure the UI remains "fluid" under load.

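The comms logs above are JSON-L: one JSON object per line, which keeps the log append-only and readable as a stream. Below is a minimal writer/reader sketch. The record fields (`ts`, `direction`, `provider`, `payload`) are assumptions, not the schema `session_logger.py` actually writes. `json.dumps` escapes embedded newlines, so a multi-line payload still takes up exactly one line.

```python
import json
import time
from pathlib import Path
from typing import Any, Iterator


def append_comms_record(log_path: Path, direction: str, provider: str, payload: Any) -> None:
    """Append one record as a single compact JSON line (hypothetical schema)."""
    record = {"ts": time.time(), "direction": direction, "provider": provider, "payload": payload}
    # Append mode leaves earlier records untouched.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_comms_records(log_path: Path) -> Iterator[dict[str, Any]]:
    """Yield records one at a time without loading the whole file."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```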
docs/guide_simulations.md (new file, +63)

@@ -0,0 +1,63 @@
# Manual Slop: Verification & Simulation Framework
Detailed specification of the live GUI testing infrastructure, simulation lifecycle, and the mock provider strategy.
---
## 1. Live GUI Verification Infrastructure
To verify complex UI state and asynchronous interactions, Manual Slop employs a **Live Verification** strategy using the application's built-in API hooks.
### `--enable-test-hooks`
When launched with this flag, the application starts the `HookServer` on port `8999`, exposing its internal state to external HTTP requests. This is the foundation for all automated visual verification.
### The `live_gui` pytest Fixture
Defined in `tests/conftest.py`, this session-scoped fixture manages the lifecycle of the application under test:
1. **Startup:** Spawns `gui_2.py` in a separate process with `--enable-test-hooks`.
2. **Telemetry:** Polls `/status` until the hook server is ready.
3. **Isolation:** Resets the AI session and clears comms logs between tests to prevent state pollution.
4. **Teardown:** Robustly kills the process tree on completion or failure.
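The fixture's readiness step can be approximated with the standard library. This sketch assumes `/status` answers HTTP 200 once the hook server is up; `wait_for_status` and its backoff constants are hypothetical, and process spawning and teardown are omitted.

```python
import http.client
import time
import urllib.request


def wait_for_status(url: str, timeout: float = 30.0,
                    initial_delay: float = 0.1, max_delay: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200, backing off exponentially.

    Returns True once the server answers 200, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timeout, or an HTTP error status
            # (urllib's HTTPError is an OSError subclass): not ready yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

For example, `wait_for_status("http://127.0.0.1:8999/status")` would block for up to 30 seconds while the GUI process starts.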
---
## 2. Simulation Lifecycle: The "Puppeteer" Pattern
Simulations (like `tests/visual_sim_mma_v2.py`) act as a "Puppeteer," driving the GUI through the `ApiHookClient`.
### Phase 1: Environment Setup
* **Provider Mocking:** The simulation sets the `current_provider` to `gemini_cli` and redirects the `gcli_path` to a mock script (e.g., `tests/mock_gemini_cli.py`).
* **Workspace Isolation:** The `files_base_dir` is pointed to a temporary artifacts directory to prevent accidental modification of the host project.
### Phase 2: User Interaction Loop
The simulation replicates a human workflow by invoking client methods:
1. `client.set_value('mma_epic_input', '...')`: Injects the epic description.
2. `client.click('btn_mma_plan_epic')`: Triggers the orchestration engine.
### Phase 3: Polling & Assertion
Because AI orchestration is asynchronous, simulations use a **Polling with Multi-Modal Approval** loop:
* **State Polling:** The script polls `client.get_mma_status()` in a loop.
* **Auto-Approval:** If the status indicates a pending tool or spawn request, the simulation automatically clicks the approval buttons (`btn_approve_spawn`, `btn_approve_tool`).
* **Verification:** Once the expected state (e.g., "Mock Goal 1" appears in the track list) is detected, the simulation proceeds to the next phase or asserts success.
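The polling-and-approval loop can be sketched against a duck-typed client. Only `get_mma_status()` and `click()` are taken from the text above; the `pending_spawn` / `pending_tool` status keys and the `drive_until` helper are assumptions for illustration, since the real status payload is not documented here.

```python
import time
from typing import Any, Callable, Protocol


class HookClient(Protocol):
    """The subset of ApiHookClient this sketch relies on (assumed signatures)."""

    def get_mma_status(self) -> dict[str, Any]: ...

    def click(self, item: str) -> Any: ...


def drive_until(client: HookClient, done: Callable[[dict[str, Any]], bool],
                timeout: float = 60.0, interval: float = 0.5) -> dict[str, Any]:
    """Poll MMA status, auto-approving pending requests, until `done(status)` holds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.get_mma_status()
        if done(status):
            return status
        # Hypothetical status flags; the real payload's field names may differ.
        if status.get("pending_spawn"):
            client.click("btn_approve_spawn")
        if status.get("pending_tool"):
            client.click("btn_approve_tool")
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```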
---
## 3. Mock Provider Strategy
To test the 4-Tier MMA hierarchy without incurring API costs or latency, Manual Slop uses a **Script-Based Mocking** strategy via the `gemini_cli` adapter.
### `tests/mock_gemini_cli.py`
This script simulates the behavior of the `gemini` CLI by:
1. **Input Parsing:** Reading the system prompt and user message from the environment/stdin.
2. **Deterministic Response:** Returning pre-defined JSON payloads (e.g., track definitions, worker implementation scripts) based on keywords in the prompt.
3. **Tool Simulation:** Mimicking function-call responses to trigger the "Execution Clutch" within the GUI.
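Keyword dispatch of this kind fits in a small script. This is a hypothetical sketch, not `tests/mock_gemini_cli.py`: the keywords, payload shapes, and the choice to read the whole prompt from stdin are assumptions, and a real mock must emit whatever output format the `gemini_cli` adapter actually parses.

```python
import json
import sys
from typing import Any, TextIO

# Canned responses keyed by a prompt keyword (hypothetical keywords and shapes).
CANNED: list[tuple[str, dict[str, Any]]] = [
    ("epic", {"tracks": [{"id": "track_1", "title": "Mock Goal 1"}]}),
    ("ticket", {"tool_call": {"name": "run_powershell",
                              "args": {"script": "Write-Output 'ok'"}}}),
]


def respond(prompt: str) -> dict[str, Any]:
    """Return the first canned payload whose keyword appears in the prompt."""
    lowered = prompt.lower()
    for keyword, payload in CANNED:
        if keyword in lowered:
            return payload
    return {"text": "mock: no matching fixture"}


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read the full prompt, write one deterministic JSON response."""
    stdout.write(json.dumps(respond(stdin.read())))
```

To run it as a script, a caller would add an `if __name__ == "__main__": main()` guard at the bottom; it is left out here so importing the module never blocks on stdin.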
---
## 4. Visual Verification Examples
Tests in this framework don't just check return values; they verify the **rendered state** of the application:
* **DAG Integrity:** Verifying that `active_tickets` in the MMA status matches the expected task graph.
* **Stream Telemetry:** Checking `mma_streams` to ensure that output from multiple tiers is correctly captured and displayed in the terminal.
* **Modal State:** Asserting that the correct dialog (e.g., `ConfirmDialog`) is active during a pending tool call.
Together, these checks assert on the application's rendered state (track lists, ticket graphs, stream buffers, and modal dialogs) rather than only on function return values.


@@ -1,58 +1,65 @@
-# Guide: Tooling
-Overview of the tool dispatch and execution model.
+# Manual Slop: Tooling & IPC Technical Reference
+A deep-dive into the Model Context Protocol (MCP) bridge, the Hook API, and the "Human-in-the-Loop" communication protocol.
 ---
-The agent is provided two classes of tools: Read-Only MCP Tools, and a Destructive Execution Loop.
-## 1. Read-Only Context (MCP Tools)
-Implemented in mcp_client.py. These tools allow the AI to selectively expand its knowledge of the codebase without requiring the user to dump entire 10,000-line files into the static context prefix.
+## 1. The MCP Bridge: Filesystem Security
+The AI's ability to interact with your filesystem is mediated by a strict security allowlist.
+### Path Resolution & Sandboxing
+Every tool accessing the disk (e.g., `read_file`, `list_directory`, `search_files`) executes `_resolve_and_check(path)`:
+1. **Normalization:** The requested path is converted to an absolute path.
+2. **Constraint Check:** The path must reside within the project's `base_dir`.
+3. **Enforcement:** Violations trigger a `PermissionError`, returned to the model as an `ACCESS DENIED` status.
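The normalization and containment check can be sketched with `pathlib`. This illustration assumes a list of allowed base directories, resolves relative paths against the current working directory (the real tool may resolve them against `base_dir` instead), and uses the hypothetical names `resolve_and_check` and `read_file_tool`; it requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_and_check(path: str, allowed_dirs: list[Path]) -> Path:
    """Normalize `path`; raise PermissionError unless it lies inside an allowed dir."""
    resolved = Path(path).resolve()  # absolute, with ".." and symlinks collapsed
    for base in allowed_dirs:
        # Component-wise containment test, not a string-prefix comparison.
        if resolved.is_relative_to(base.resolve()):
            return resolved
    raise PermissionError(f"{resolved} is outside the allowed directories")


def read_file_tool(path: str, allowed_dirs: list[Path]) -> str:
    """Tool wrapper: turn a violation into a status string for the model."""
    try:
        target = resolve_and_check(path, allowed_dirs)
    except PermissionError as exc:
        return f"ACCESS DENIED: {exc}"
    return target.read_text(encoding="utf-8")
```

Resolving before comparing is what defeats `..` segments and symlinks pointing outside the sandbox, and comparing path components keeps a sibling such as `/proj2` from passing a check meant for `/proj`.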
-### Security & Scope
-Every **filesystem** MCP tool passes its arguments through `_resolve_and_check`. This function ensures that the requested path falls under one of the allowed directories defined in the GUI's Base Dir configurations.
-If the AI attempts to read or search a path outside the project bounds, the tool safely catches the constraint violation and returns ACCESS DENIED.
-The two **web tools** (`web_search`, `fetch_url`) bypass this check entirely — they have no filesystem access and are unrestricted.
+### Native Toolset
+* **`read_file(path)`:** UTF-8 extraction, clamped by token budgets.
+* **`list_directory(path)`:** Returns a structural map (Name, Type, Size).
+* **`get_file_summary(path)`:** AST-based heuristic parsing for high-signal architectural mapping without full-file read costs.
+* **`web_search(query)`:** Scrapes DuckDuckGo raw HTML via a dependency-free parser.
+---
+## 2. The Hook API: Remote Control & Telemetry
-### Supplied Tools:
-**Filesystem tools** (access-controlled via `_resolve_and_check`):
-* `read_file(path)`: Returns the raw UTF-8 text of a file.
-* `list_directory(path)`: Returns a formatted table of a directory's contents, showing file vs dir and byte sizes.
-* `search_files(path, pattern)`: Executes a glob search (e.g., `**/*.py`) within an allowed directory.
-* `get_file_summary(path)`: Invokes the local `summarize.py` heuristic parser to get the AST structure of a file without reading the whole body.
+Manual Slop exposes a REST-based IPC interface (running by default on port `8999`) to facilitate automated verification and external monitoring.
+### Core Endpoints
+* `GET /status`: Engine health and hook server readiness.
+* `GET /mma_status`: Retrieves the 4-Tier state, active track metadata, and current ticket DAG status.
+* `POST /api/gui`: Pushes events into the `AsyncEventQueue`.
+  * Payload example: `{"action": "set_value", "item": "current_provider", "value": "anthropic"}`
+* `GET /diagnostics`: High-frequency telemetry for UI performance (FPS, CPU, Input Lag).
-**Web tools** (unrestricted — no filesystem access):
-* `web_search(query)`: Queries DuckDuckGo's raw HTML endpoint and returns the top 5 results (title, URL, snippet) using a native `_DDGParser` (HTMLParser subclass) to avoid heavy dependencies.
-* `fetch_url(url)`: Downloads a target webpage and strips out all scripts, styling, and structural HTML via `_TextExtractor`, returning only the raw prose content (clamped to 40,000 characters). Automatically resolves DuckDuckGo redirect links.
+### ApiHookClient Implementation
+The `api_hook_client.py` provides a robust wrapper for the Hook API:
+* **Synchronous Wait:** `wait_for_server()` polls `/status` with exponential backoff.
+* **State Polling:** `wait_for_value()` blocks until a specific GUI element matches an expected state.
+* **Remote Interaction:** `click()`, `set_value()`, and `select_tab()` methods allow external agents to drive the GUI.
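The `set_value` payload listed under `POST /api/gui` can be built with `urllib.request` alone. In this sketch `build_gui_request` and `set_value` are hypothetical helpers rather than the `ApiHookClient` API, and a `click` event is assumed to reuse the `action`/`item` shape without a `value`, which the endpoint description does not confirm.

```python
import json
import urllib.request
from typing import Any


def build_gui_request(base_url: str, action: str, item: str,
                      value: Any = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/gui request carrying one GUI event."""
    event: dict[str, Any] = {"action": action, "item": item}
    if value is not None:
        event["value"] = value  # omitted for value-less actions (assumed for click)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/gui",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_value(base_url: str, item: str, value: Any) -> int:
    """Send a set_value event and return the HTTP status code."""
    req = build_gui_request(base_url, "set_value", item, value)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```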
-## 2. Destructive Execution (run_powershell)
-The core manipulation mechanism. This is a single, heavily guarded tool.
-### Flow
-1. The AI generates a 'run_powershell' payload containing a PowerShell script.
-2. The AI background thread calls confirm_and_run_callback (injected by gui_legacy.py).
-3. The background thread blocks completely, creating a modal popup on the main GUI thread.
-4. The user reads the script and chooses to Approve or Reject.
-5. If Approved, shell_runner.py executes the script using -NoProfile -NonInteractive -Command within the specified base_dir.
-6. The combined stdout, stderr, and EXIT CODE are captured and returned to the AI in the tool result block.
-### AI Guidelines
+---
+## 3. The HITL IPC Flow: `ask/respond`
+Manual Slop supports a synchronous "Human-in-the-Loop" request pattern for operations requiring explicit confirmation or manual data mutation.
+### Sequence of Operation
+1. **Request:** A background agent (e.g., a Tier 3 Worker) calls `/api/ask` with a JSON payload.
+2. **Intercept:** The `HookServer` generates a unique `request_id` and pushes a `type: "ask"` event to the GUI's `_pending_gui_tasks`.
+3. **Modal Display:** The GUI renders an `Approve/Reject` modal with the payload details.
+4. **Response:** Upon user action, the GUI thread `POST`s to `/api/ask/respond`.
+5. **Resume:** The original agent call to `/api/ask` (which was polling for completion) unblocks and receives the user's response.
+This pattern is the foundation of the **Execution Clutch**, ensuring that no destructive action occurs without an auditable human signal.
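The request/response pairing can be modeled in-process. This sketch is hypothetical: `AskBroker` stands in for the `HookServer`, its `pending_gui_tasks` queue for `_pending_gui_tasks`, and a blocking `threading.Event` wait replaces the agent's HTTP polling of `/api/ask`; the HTTP layer is omitted entirely.

```python
import queue
import threading
import uuid
from typing import Any, Optional


class AskBroker:
    """Pairs blocking agent requests with GUI responses via a request_id."""

    def __init__(self) -> None:
        self.pending_gui_tasks: "queue.Queue[dict[str, Any]]" = queue.Queue()
        self._events: dict[str, threading.Event] = {}
        self._answers: dict[str, Any] = {}
        self._lock = threading.Lock()

    def ask(self, payload: Any, timeout: Optional[float] = None) -> Any:
        """Agent side: enqueue an 'ask' event and block until the GUI responds."""
        request_id = uuid.uuid4().hex
        done = threading.Event()
        with self._lock:
            self._events[request_id] = done
        self.pending_gui_tasks.put({"type": "ask", "request_id": request_id,
                                    "payload": payload})
        answered = done.wait(timeout)
        with self._lock:
            self._events.pop(request_id, None)
            answer = self._answers.pop(request_id, None)
        if not answered:
            raise TimeoutError(f"no response for {request_id}")
        return answer

    def respond(self, request_id: str, answer: Any) -> None:
        """GUI side: record the user's decision and wake the waiting agent."""
        with self._lock:
            done = self._events.get(request_id)
            if done is None:
                raise KeyError(request_id)  # unknown or already timed out
            self._answers[request_id] = answer
        done.set()
```

Keying every wait on a fresh `request_id` lets several workers block on separate approvals at once without one response waking the wrong caller.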
-The core system prompt explicitly guides the AI on how to use this tool safely:
-* Prefer targeted replacements (using PowerShell's .Replace()) over full rewrites where possible.
-* If a file is large and complex (requiring specific escape characters), do not attempt an inline python -c script. Instead, use a PowerShell here-string (@'...'@) to write a temporary python helper script to disk, execute the python script, and then delete it.
-### Synthetic Context Refresh
-After the **last** tool call in each round finishes (when multiple tools are called in a single round, the refresh happens once after all of them), ai_client runs `_reread_file_items`. It fetches the latest disk state of all files in the current project context. The `file_items` variable is reassigned so subsequent tool rounds within the same request use the fresh content.
-For Anthropic, the refreshed contents are injected as a text block in the `tool_results` user message. For Gemini, they are appended to the last function response's output string. In both cases, the block is prefixed with `[FILES UPDATED]` / `[SYSTEM: FILES UPDATED]`.
-On the next tool round, stale file-refresh blocks from previous rounds are stripped from history to prevent token accumulation. This means if the AI writes to a file, it instantly "sees" the modification in its next turn without having to waste a cycle calling `read_file`, and the cost of carrying the full file snapshot is limited to one round.
+---
+## 4. Synthetic Context Refresh
+To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution refresh:
+1. **Detection:** Triggered after the final tool call in a reasoning round.
+2. **Collection:** Re-reads all project-tracked files from disk.
+3. **Injection:** The updated content is injected into the next LLM turn as a `[SYSTEM: FILES UPDATED]` block.
+4. **Pruning:** Older snapshots are stripped from history in subsequent rounds to maintain a lean context window.
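The collection and injection steps can be sketched as two small functions. The per-file `--- name ---` separator, the placeholder for unreadable files, and the names `reread_files` / `append_refresh` are assumptions; appending to the tool output string mirrors the Gemini path (the last function response's output), whereas the Anthropic path uses a separate text block.

```python
from pathlib import Path

REFRESH_HEADER = "[SYSTEM: FILES UPDATED]"


def reread_files(paths: list[Path]) -> dict[str, str]:
    """Collection: read the current on-disk contents of every tracked file."""
    snapshot: dict[str, str] = {}
    for p in paths:
        try:
            snapshot[str(p)] = p.read_text(encoding="utf-8")
        except OSError as exc:  # deleted or unreadable since the last round
            snapshot[str(p)] = f"<unreadable: {exc.strerror}>"
    return snapshot


def append_refresh(tool_output: str, snapshot: dict[str, str]) -> str:
    """Injection: append one refresh block after the round's last tool result."""
    parts = [tool_output, "", REFRESH_HEADER]
    for name, text in snapshot.items():
        parts.append(f"--- {name} ---")  # assumed per-file separator
        parts.append(text)
    return "\n".join(parts)
```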


@@ -1,7 +1,6 @@
 [project]
 name = "manual_slop"
 git_dir = "C:/projects/manual_slop"
-main_context = "MainContext.md"
 system_prompt = ""
 word_wrap = true
 summary_only = false
@@ -22,7 +21,6 @@ paths = [
 "project_manager.py",
 "config.toml",
 "manual_slop.toml",
-"MainContext.md",
 "tests/test_agent_tools_wiring.py",
 "pyproject.toml",
 "events.py",