diff --git a/Readme.md b/Readme.md index 972485d..dee5040 100644 --- a/Readme.md +++ b/Readme.md @@ -1,14 +1,56 @@ -# Sloppy +# Manual Slop ![img](./gallery/splash.png) -A GUI orchestrator for local LLM-driven coding sessions. Manual Slop bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe asynchronous pipeline, ensuring every AI-generated payload passes through a human-auditable gate before execution. +A high-density GUI orchestrator for local LLM-driven coding sessions. Manual Slop bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe asynchronous pipeline, ensuring every AI-generated payload passes through a human-auditable gate before execution. -**Tech Stack**: Python 3.11+, Dear PyGui / ImGui, FastAPI, Uvicorn -**Providers**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless) +**Design Philosophy**: Full manual control over vendor API metrics, agent capabilities, and context memory usage. High information density, tactile interactions, and explicit confirmation for destructive actions. 
+ +**Tech Stack**: Python 3.11+, Dear PyGui / ImGui Bundle, FastAPI, Uvicorn, tree-sitter +**Providers**: Gemini API, Anthropic API, DeepSeek, Gemini CLI (headless), MiniMax **Platform**: Windows (PowerShell) — single developer, local use -![img](./gallery/python_2026-03-01_23-45-34.png) +![img](./gallery/python_2026-03-07_14-32-50.png) + +--- + +## Key Features + +### Multi-Provider Integration +- **Gemini SDK**: Server-side context caching with TTL management, automatic cache rebuilding at 90% TTL +- **Anthropic**: Ephemeral prompt caching with 4-breakpoint system, automatic history truncation at 180K tokens +- **DeepSeek**: Dedicated SDK for code-optimized reasoning +- **Gemini CLI**: Headless adapter with full functional parity, synchronous HITL bridge +- **MiniMax**: Alternative provider support + +### 4-Tier MMA Orchestration +Hierarchical task decomposition with specialized models and strict token firewalling: +- **Tier 1 (Orchestrator)**: Product alignment, epic → tracks +- **Tier 2 (Tech Lead)**: Track → tickets (DAG), persistent context +- **Tier 3 (Worker)**: Stateless TDD implementation, context amnesia +- **Tier 4 (QA)**: Stateless error analysis, no fixes + +### Strict Human-in-the-Loop (HITL) +- **Execution Clutch**: All destructive actions suspend on `threading.Condition` pending GUI approval +- **Three Dialog Types**: ConfirmDialog (scripts), MMAApprovalDialog (steps), MMASpawnApprovalDialog (workers) +- **Editable Payloads**: Review, modify, or reject any AI-generated content before execution + +### 26 MCP Tools with Sandboxing +Three-layer security model: Allowlist Construction → Path Validation → Resolution Gate +- **File I/O**: read, list, search, slice, edit, tree +- **AST-Based (Python)**: skeleton, outline, definition, signature, class summary, docstring +- **Analysis**: summary, git diff, find usages, imports, syntax check, hierarchy +- **Network**: web search, URL fetch +- **Runtime**: UI performance metrics + +### Parallel Tool Execution 
+Multiple independent tool calls within a single AI turn execute concurrently via `asyncio.gather`, significantly reducing latency. + +### AST-Based Context Management +- **Skeleton View**: Signatures + docstrings, bodies replaced with `...` +- **Curated View**: Preserves `@core_logic` decorated functions and `[HOT]` comment blocks +- **Targeted View**: Extracts only specified symbols and their dependencies +- **Heuristic Summaries**: Token-efficient structural descriptions without AI calls --- @@ -26,35 +68,12 @@ The **MMA (Multi-Model Agent)** system decomposes epics into tracks, tracks into | Guide | Scope | |---|---| +| [Readme](./docs/Readme.md) | Documentation index, GUI panel reference, configuration files, environment variables | | [Architecture](./docs/guide_architecture.md) | Threading model, event system, AI client multi-provider architecture, HITL mechanism, comms logging | -| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge security model, all 26 native tools, Hook API endpoints, ApiHookClient reference, shell runner | -| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track data structures, DAG engine, ConductorEngine execution loop, worker lifecycle | -| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification patterns, ASTParser / summarizer | - ---- - -## Module Map - -Core implementation resides in the `src/` directory. 
- -| File | Role | -|---|---| -| `src/gui_2.py` | Primary ImGui interface — App class, frame-sync, HITL dialogs | -| `src/ai_client.py` | Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, Gemini CLI) | -| `src/mcp_client.py` | 26 MCP tools with filesystem sandboxing and tool dispatch | -| `src/api_hooks.py` | HookServer — REST API for external automation on `:8999` | -| `src/api_hook_client.py` | Python client for the Hook API (used by tests and external tooling) | -| `src/multi_agent_conductor.py` | ConductorEngine — Tier 2 orchestration loop with DAG execution | -| `src/conductor_tech_lead.py` | Tier 2 ticket generation from track briefs | -| `src/dag_engine.py` | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) | -| `src/models.py` | Ticket, Track, WorkerContext dataclasses | -| `src/events.py` | EventEmitter, AsyncEventQueue, UserRequestEvent | -| `src/project_manager.py` | TOML config persistence, discussion management, track state | -| `src/session_logger.py` | JSON-L + markdown audit trails (comms, tools, CLI, hooks) | -| `src/shell_runner.py` | PowerShell execution with timeout, env config, QA callback | -| `src/file_cache.py` | ASTParser (tree-sitter) — skeleton and curated views | -| `src/summarize.py` | Heuristic file summaries (imports, classes, functions) | -| `src/outline_tool.py` | Hierarchical code outline via stdlib `ast` | +| [Tools & IPC](./docs/guide_tools.md) | MCP Bridge 3-layer security, 26 tool inventory, Hook API endpoints, ApiHookClient reference, shell runner | +| [MMA Orchestration](./docs/guide_mma.md) | 4-tier hierarchy, Ticket/Track data structures, DAG engine, ConductorEngine, worker lifecycle, abort propagation | +| [Simulations](./docs/guide_simulations.md) | `live_gui` fixture, Puppeteer pattern, mock provider, visual verification, ASTParser / summarizer | +| [Meta-Boundary](./docs/guide_meta_boundary.md) | Application vs Meta-Tooling domains, inter-domain bridges, safety model separation | --- 
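The parallel tool execution listed under Key Features can be sketched with `asyncio.gather`. This is an illustrative model only — the tool names come from the tool inventory, but the dispatch code here is not the project's actual implementation:

```python
import asyncio

async def run_tool(name: str, args: dict) -> dict:
    # Stand-in for a real MCP tool invocation (illustrative only).
    await asyncio.sleep(0.01)  # simulate I/O-bound tool latency
    return {"tool": name, "args": args, "ok": True}

async def run_turn(tool_calls: list[tuple[str, dict]]) -> list[dict]:
    # Independent calls from a single AI turn run concurrently,
    # so total latency approaches the slowest call, not the sum.
    return list(await asyncio.gather(*(run_tool(n, a) for n, a in tool_calls)))

results = asyncio.run(run_turn([
    ("read_file", {"path": "src/models.py"}),
    ("get_git_diff", {}),
]))
```

Results are returned in call order, so the caller can pair each output with its originating tool call.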
@@ -105,6 +124,151 @@ uv run pytest tests/ -v --- +## MMA 4-Tier Architecture + +The Multi-Model Agent system uses hierarchical task decomposition with specialized models at each tier: + +| Tier | Role | Model | Responsibility | +|------|------|-------|----------------| +| **Tier 1** | Orchestrator | `gemini-3.1-pro-preview` | Product alignment, epic → tracks, track initialization | +| **Tier 2** | Tech Lead | `gemini-3-flash-preview` | Track → tickets (DAG), architectural oversight, persistent context | +| **Tier 3** | Worker | `gemini-2.5-flash-lite` / `deepseek-v3` | Stateless TDD implementation per ticket, context amnesia | +| **Tier 4** | QA | `gemini-2.5-flash-lite` / `deepseek-v3` | Stateless error analysis, diagnostics only (no fixes) | + +**Key Principles:** +- **Context Amnesia**: Tier 3/4 workers start with `ai_client.reset_session()` — no history bleed +- **Token Firewalling**: Each tier receives only the context it needs +- **Model Escalation**: Failed tickets automatically retry with more capable models +- **WorkerPool**: Bounded concurrency (default: 4 workers) with semaphore gating + +--- + +## Module by Domain + +### src/ — Core implementation + +| File | Role | +|---|---| +| `src/gui_2.py` | Primary ImGui interface — App class, frame-sync, HITL dialogs, event system | +| `src/ai_client.py` | Multi-provider LLM abstraction (Gemini, Anthropic, DeepSeek, MiniMax) | +| `src/mcp_client.py` | 26 MCP tools with filesystem sandboxing and tool dispatch | +| `src/api_hooks.py` | HookServer — REST API on `127.0.0.1:8999` for external automation | +| `src/api_hook_client.py` | Python client for the Hook API (used by tests and external tooling) | +| `src/multi_agent_conductor.py` | ConductorEngine — Tier 2 orchestration loop with DAG execution | +| `src/conductor_tech_lead.py` | Tier 2 ticket generation from track briefs | +| `src/dag_engine.py` | TrackDAG (dependency graph) + ExecutionEngine (tick-based state machine) | +| `src/models.py` | Ticket, Track, 
WorkerContext, Metadata, and TrackState dataclasses | +| `src/events.py` | EventEmitter, AsyncEventQueue, UserRequestEvent | +| `src/project_manager.py` | TOML config persistence, discussion management, track state | +| `src/session_logger.py` | JSON-L + markdown audit trails (comms, tools, CLI, hooks) | +| `src/shell_runner.py` | PowerShell execution with timeout, env config, QA callback | +| `src/file_cache.py` | ASTParser (tree-sitter) — skeleton, curated, and targeted views | +| `src/summarize.py` | Heuristic file summaries (imports, classes, functions) | +| `src/outline_tool.py` | Hierarchical code outline via stdlib `ast` | +| `src/performance_monitor.py` | FPS, frame time, CPU, input lag tracking | +| `src/log_registry.py` | Session metadata persistence | +| `src/log_pruner.py` | Automated log cleanup based on age and whitelist | +| `src/paths.py` | Centralized path resolution with environment variable overrides | +| `src/cost_tracker.py` | Token cost estimation for API calls | +| `src/gemini_cli_adapter.py` | CLI subprocess adapter with session management | +| `src/mma_prompts.py` | Tier-specific system prompts for MMA orchestration | +| `src/theme_*.py` | UI theming (dark, light modes) | + +Simulation modules in `simulation/`: +| File | Role | +|---|---| +| `simulation/sim_base.py` | BaseSimulation class with setup/teardown lifecycle | +| `simulation/workflow_sim.py` | WorkflowSimulator — high-level GUI automation | +| `simulation/user_agent.py` | UserSimAgent — simulated user behavior (reading time, thinking delays) | + +--- + +## MCP Bridge Security Model + +The MCP Bridge implements a three-layer security model in `mcp_client.py`. Every tool accessing the filesystem passes through `_resolve_and_check(path)` before any I/O. + +### Layer 1: Allowlist Construction (`configure`) +Called by `ai_client` before each send cycle: +1. Resets `_allowed_paths` and `_base_dirs` to empty sets +2. Sets `_primary_base_dir` from `extra_base_dirs[0]` (resolved) or falls back to `cwd()` +3. 
Iterates `file_items`, resolving each path to an absolute path and adding it to `_allowed_paths`; each file's parent directory is added to `_base_dirs` +4. Adds any entries in `extra_base_dirs` that are valid directories to `_base_dirs` +5. Blacklist check: `history.toml`, `*_history.toml`, `config.toml`, `credentials.toml` are NEVER allowed + +### Layer 2: Path Validation (`_is_allowed`) +Checks run in this exact order: +1. **Blacklist**: `history.toml`, `*_history.toml`, `config.toml`, `credentials.toml` → hard deny +2. **Explicit allowlist**: Path in `_allowed_paths` → allow +3. **CWD fallback**: If no base dirs are configured, any path under `cwd()` is allowed (fail-safe for projects without explicit base dirs) +4. **Base containment**: Must be a subpath of at least one entry in `_base_dirs` (via `relative_to()`) +5. **Default deny**: All other paths rejected + +### Layer 3: Resolution Gate (`_resolve_and_check`) +Every tool call passes through this gate: +1. Convert the raw path string to a `Path` +2. If not absolute, prepend `_primary_base_dir` +3. Resolve to absolute (follows symlinks) +4. Call `_is_allowed()` +5. Return `(resolved_path, "")` on success or `(None, error_message)` on failure + +All paths are resolved (following symlinks) before comparison, preventing symlink-based traversal attacks. + +--- + +## Conductor System + +The project uses a spec-driven track system in `conductor/` for structured development: + +``` +conductor/ +├── workflow.md # Task lifecycle, TDD protocol, phase verification +├── tech-stack.md # Technology constraints and patterns +├── product.md # Product vision and guidelines +├── product-guidelines.md # Code standards, UX principles +└── tracks/ + └── _/ + ├── spec.md # Track specification + ├── plan.md # Implementation plan with checkbox tasks + ├── metadata.json # Track metadata + └── state.toml # Structured state with task list +``` + +**Key Concepts:** +- **Tracks**: Self-contained implementation units with spec, plan, and state +- **TDD Protocol**: Red (failing tests) → Green (pass) → Refactor +- **Phase Checkpoints**: Verification gates with git notes for audit trails +- **MMA Delegation**: Tracks are executed via the 4-tier agent hierarchy + +See `conductor/workflow.md` for the full development workflow. + +--- + ## Project Configuration Projects are stored as `.toml` files. The discussion history is split into a sibling `_history.toml` to keep the main config lean. @@ -134,3 +298,31 @@ run_powershell = true read_file = true # ... 
26 tool flags ``` + +--- + +## Quick Reference + +### Hook API Endpoints (port 8999) + +| Endpoint | Method | Description | +|----------|--------|-------------| +| `/status` | GET | Health check | +| `/api/project` | GET/POST | Project config | +| `/api/session` | GET/POST | Discussion entries | +| `/api/gui` | POST | GUI task queue | +| `/api/gui/mma_status` | GET | Full MMA state | +| `/api/gui/value/` | GET | Read GUI field | +| `/api/ask` | POST | Blocking HITL dialog | + +### MCP Tool Categories + +| Category | Tools | +|----------|-------| +| **File I/O** | `read_file`, `list_directory`, `search_files`, `get_tree`, `get_file_slice`, `set_file_slice`, `edit_file` | +| **AST (Python)** | `py_get_skeleton`, `py_get_code_outline`, `py_get_definition`, `py_update_definition`, `py_get_signature`, `py_set_signature`, `py_get_class_summary`, `py_get_var_declaration`, `py_set_var_declaration`, `py_get_docstring` | +| **Analysis** | `get_file_summary`, `get_git_diff`, `py_find_usages`, `py_get_imports`, `py_check_syntax`, `py_get_hierarchy` | +| **Network** | `web_search`, `fetch_url` | +| **Runtime** | `get_ui_performance` | + +--- diff --git a/docs/Readme.md b/docs/Readme.md index 677c06c..e615d9b 100644 --- a/docs/Readme.md +++ b/docs/Readme.md @@ -1,6 +1,12 @@ # Documentation Index -[Top](../Readme.md) +[Top](../README.md) + +--- + +## Overview + +This documentation suite provides comprehensive technical reference for the Manual Slop application — a GUI orchestrator for local LLM-driven coding sessions. The guides follow a strict old-school technical documentation style, emphasizing architectural depth, state management details, algorithmic breakdowns, and structural formats. 
--- @@ -8,68 +14,341 @@ | Guide | Contents | |---|---| -| [Architecture](guide_architecture.md) | Thread domains, cross-thread data structures, event system, application lifetime, task pipeline (producer-consumer), Execution Clutch (HITL), AI client multi-provider architecture, Anthropic/Gemini caching strategies, context refresh, comms logging, state machines | -| [Meta-Boundary](guide_meta_boundary.md) | Explicit distinction between the Application's domain (Strict HITL) and the Meta-Tooling domain (autonomous agents), preventing feature bleed and safety bypasses via shared bridges like `mcp_client.py`. | -| [Tools & IPC](guide_tools.md) | MCP Bridge 3-layer security model, all 26 native tool signatures, Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference, `/api/ask` synchronous HITL protocol, session logging, shell runner | -| [MMA Orchestration](guide_mma.md) | Ticket/Track/WorkerContext data structures, DAG engine (cycle detection, topological sort), ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia, Tier 4 QA integration, token firewalling, track state persistence | -| [Simulations](guide_simulations.md) | `live_gui` pytest fixture lifecycle, `VerificationLogger`, process cleanup, Puppeteer pattern (8-stage MMA simulation), approval automation, mock provider (`mock_gemini_cli.py`) with JSON-L protocol, visual verification patterns, ASTParser (tree-sitter) vs summarizer (stdlib `ast`) | +| [Architecture](guide_architecture.md) | Thread domains (GUI Main, Asyncio Worker, HookServer, Ad-hoc), cross-thread data structures (AsyncEventQueue, Guarded Lists, Condition-Variable Dialogs), event system (EventEmitter, SyncEventQueue, UserRequestEvent), application lifetime (boot sequence, shutdown sequence), task pipeline (producer-consumer synchronization), Execution Clutch (HITL mechanism with ConfirmDialog, MMAApprovalDialog, MMASpawnApprovalDialog), AI client multi-provider 
architecture (Gemini SDK, Anthropic, DeepSeek, Gemini CLI, MiniMax), Anthropic/Gemini caching strategies (4-breakpoint system, server-side TTL), context refresh mechanism (mtime-based file re-reading, diff injection), comms logging (JSON-L format), state machines (ai_status, HITL dialog state) | +| [Meta-Boundary](guide_meta_boundary.md) | Explicit distinction between the Application's domain (Strict HITL — `gui_2.py`, `ai_client.py`, `multi_agent_conductor.py`, `dag_engine.py`) and the Meta-Tooling domain (`scripts/mma_exec.py`, `scripts/claude_mma_exec.py`, `scripts/tool_call.py`, `scripts/mcp_server.py`, `.gemini/`, `.claude/`), preventing feature bleed and safety bypasses via shared bridges like `mcp_client.py`. Documents the Inter-Domain Bridges (`cli_tool_bridge.py`, `claude_tool_bridge.py`) and the `GEMINI_CLI_HOOK_CONTEXT` environment variable. | +| [Tools & IPC](guide_tools.md) | MCP Bridge 3-layer security model (Allowlist Construction, Path Validation, Resolution Gate), all 26 native tool signatures with parameters and behavior (File I/O, AST-Based, Analysis, Network, Runtime), Hook API GET/POST endpoints with request/response formats, ApiHookClient method reference (Connection Methods, State Query Methods, GUI Manipulation Methods, Polling Methods, HITL Method), `/api/ask` synchronous HITL protocol (blocking request-response over HTTP), session logging (comms.log, toolcalls.log, apihooks.log, clicalls.log, scripts/generated/*.ps1), shell runner (mcp_env.toml configuration, run_powershell function with timeout handling and QA callback integration) | +| [MMA Orchestration](guide_mma.md) | Ticket/Track/WorkerContext data structures (from `models.py`), DAG engine (TrackDAG class with cycle detection, topological sort, cascade_blocks; ExecutionEngine class with tick-based state machine), ConductorEngine execution loop (run method, _push_state for state broadcast, parse_json_tickets for ingestion), Tier 2 ticket generation (generate_tickets, 
topological_sort), Tier 3 worker lifecycle (run_worker_lifecycle with Context Amnesia, AST skeleton injection, HITL clutch integration via confirm_spawn and confirm_execution), Tier 4 QA integration (run_tier4_analysis, run_tier4_patch_callback), token firewalling (tier_usage tracking, model escalation), track state persistence (TrackState, save_track_state, load_track_state, get_all_tracks) | +| [Simulations](guide_simulations.md) | Structural Testing Contract (Ban on Arbitrary Core Mocking, `live_gui` Standard, Artifact Isolation), `live_gui` pytest fixture lifecycle (spawning, readiness polling, failure path, teardown, session isolation via reset_ai_client), VerificationLogger for structured diagnostic logging, process cleanup (kill_process_tree for Windows/Unix), Puppeteer pattern (8-stage MMA simulation with mock provider setup, epic planning, track acceptance, ticket loading, status transitions, worker output verification), mock provider strategy (`tests/mock_gemini_cli.py` with JSON-L protocol, input mechanisms, response routing, output protocol), visual verification patterns (DAG integrity, stream telemetry, modal state, performance monitoring), supporting analysis modules (ASTParser with tree-sitter, summarize.py heuristic summaries, outline_tool.py hierarchical outlines) | --- ## GUI Panels -### Projects Panel +### Context Hub -Configuration and context management. Specifies the Git Directory (for commit tracking) and tracked file paths. Project switching swaps the active file list, discussion history, and settings via `.toml` profiles. +The primary panel for project and file management. -- **Word-Wrap Toggle**: Dynamically swaps text rendering in large read-only panels (Responses, Comms Log) between unwrapped (code formatting) and wrapped (prose). +- **Project Selector**: Switch between `.toml` configurations. Changing projects swaps the active file list, discussion history, and settings. 
+- **Git Directory**: Path to the repository for commit tracking and git operations. +- **Main Context File**: Optional primary context document for the project. +- **Output Dir**: Directory where generated markdown files are written. +- **Word-Wrap Toggle**: Dynamically swaps text rendering in large read-only panels between unwrapped (code formatting) and wrapped (prose). +- **Summary Only**: When enabled, sends file structure summaries instead of full content to reduce token usage. +- **Auto-Scroll Comms/Tool History**: Automatically scrolls to the bottom when new entries arrive. -### Discussion History +### Files & Media Panel + +Controls what context is compiled and sent to the AI. + +- **Base Dir**: Root directory for path resolution and MCP tool constraints. +- **Paths**: Explicit files or wildcard globs (`src/**/*.py`). +- **File Flags**: + - **Auto-Aggregate**: Include in context compilation. + - **Force Full**: Bypass summary-only mode for this file. +- **Cache Indicator**: Green dot (●) indicates file is in provider's context cache. + +### Discussion Hub Manages conversational branches to prevent context poisoning across tasks. - **Discussions Sub-Menu**: Create separate timelines for different tasks (e.g., "Refactoring Auth" vs. "Adding API Endpoints"). - **Git Commit Tracking**: "Update Commit" reads HEAD from the project's git directory and stamps the discussion. -- **Entry Management**: Each turn has a Role (User, AI, System). Toggle between Read/Edit modes, collapse entries, or open in the Global Text Viewer via `[+ Max]`. +- **Entry Management**: Each turn has a Role (User, AI, System, Context, Tool, Vendor API). Toggle between Read/Edit modes, collapse entries, or open in the Global Text Viewer via `[+ Max]`. - **Auto-Add**: When toggled, Message panel sends and Response panel returns are automatically appended to the current discussion. +- **Truncate History**: Reduces history to N most recent User/AI pairs. 
-### Files & Screenshots +### AI Settings Panel -Controls what is fed into the context compiler. +- **Provider**: Switch between API backends (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax). +- **Model**: Select from available models for the current provider. +- **Fetch Models**: Queries the active provider for the latest model list. +- **Temperature / Max Tokens**: Generation parameters. +- **History Truncation Limit**: Character limit for truncating old tool outputs. -- **Base Dir**: Defines the root for path resolution and MCP tool constraints. -- **Paths**: Explicit files or wildcard globs (`src/**/*.rs`). -- Full file contents are inlined by default. The AI can call `get_file_summary` for compact structural views. +### Token Budget Panel -### Provider +- **Current Usage**: Real-time token counts (input, output, cache read, cache creation). +- **Budget Percentage**: Visual indicator of context window utilization. +- **Provider-Specific Limits**: Anthropic (180K prompt), Gemini (900K input). -Switches between API backends (Gemini, Anthropic, DeepSeek, Gemini CLI). "Fetch Models" queries the active provider for the latest model list. +### Cache Panel -### Message & Response +- **Gemini Cache Stats**: Count, total size, and list of cached files. +- **Clear Cache**: Forces cache invalidation on next send. -- **Message**: User input field. +### Tool Analytics Panel + +- **Per-Tool Statistics**: Call count, total time, failure count for each tool. +- **Session Insights**: Burn rate estimation, average latency. + +### Message & Response Panels + +- **Message**: User input field with auto-expanding height. - **Gen + Send**: Compiles markdown context and dispatches to the AI via `AsyncEventQueue`. - **MD Only**: Dry-runs the compiler for context inspection without API cost. - **Response**: Read-only output; flashes green on new response. 
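The Token Budget Panel's percentage is simple to derive from the usage counters; a minimal sketch using the limits quoted above (the limits are as documented here, not authoritative for the providers):

```python
# Provider context limits as quoted in the Token Budget Panel description.
PROVIDER_LIMITS = {"anthropic": 180_000, "gemini": 900_000}

def budget_percent(used_tokens: int, provider: str) -> float:
    """Context-window utilization as a percentage of the provider limit."""
    limit = PROVIDER_LIMITS[provider]
    return 100.0 * used_tokens / limit
```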
-### Global Text Viewer & Script Outputs +### Operations Hub -- **Last Script Output**: Pops up (flashing blue) whenever the AI executes a script. Shows both the executed script and stdout/stderr. `[+ Maximize]` reads from stored instance variables, not DPG widget tags, so it works regardless of word-wrap state. -- **Text Viewer**: Large resizable popup invoked by `[+]` / `[+ Maximize]` buttons. For deep-reading long logs, discussion entries, or script bodies. -- **Confirm Dialog**: The `[+ Maximize]` button in the script approval modal passes script text as `user_data` at button-creation time — safe to click even after the dialog is dismissed. - -### Tool Calls & Comms History - -Real-time display of MCP tool invocations and raw API traffic. Each comms entry: timestamp, direction (OUT/IN), kind, provider, model, payload. +- **Focus Agent Filter**: Show comms/tool history for specific tier (All, Tier 2, Tier 3, Tier 4). +- **Comms History**: Real-time display of raw API traffic (timestamp, direction, kind, provider, model, payload preview). +- **Tool Calls**: Sequential log of tool invocations with script/args and result preview. ### MMA Dashboard -Displays the 4-tier orchestration state: active track, ticket DAG with status indicators, per-tier token usage, output streams. Approval buttons for spawn/step/tool gates. +The 4-tier orchestration control center. -### System Prompts +- **Track Browser**: List of all tracks with status, progress, and actions (Load, Delete). +- **Active Track Summary**: Color-coded progress bar, ticket status breakdown (Completed, In Progress, Blocked, Todo), ETA estimation. +- **Visual Task DAG**: Node-based visualization using `imgui-node-editor` with color-coded states (Ready, Running, Blocked, Done). +- **Ticket Queue Management**: Bulk operations (Execute, Skip, Block), drag-and-drop reordering, priority assignment. +- **Tier Streams**: Real-time output from Tier 1/2/3/4 agents. -Two text inputs for instruction overrides: -1. 
**Global**: Applied across every project. -2. **Project**: Specific to the active workspace. +### Tier Stream Panels -Concatenated onto the base tool-usage guidelines. +Dedicated windows for each MMA tier: + +- **Tier 1: Strategy**: Orchestrator output for epic planning and track initialization. +- **Tier 2: Tech Lead**: Architectural decisions and ticket generation. +- **Tier 3: Workers**: Individual worker output streams (one per active ticket). +- **Tier 4: QA**: Error analysis and diagnostic summaries. + +### Log Management + +- **Session Registry**: Table of all session logs with metadata (start time, message count, size, whitelist status). +- **Star/Unstar**: Mark sessions for preservation during pruning. +- **Force Prune**: Manually trigger aggressive log cleanup. + +### Diagnostics Panel + +- **Performance Telemetry**: FPS, Frame Time, CPU %, Input Lag with moving averages. +- **Detailed Component Timings**: Per-panel rendering times with threshold alerts. +- **Performance Graphs**: Historical plots for selected metrics. + +--- + +## Configuration Files + +### config.toml (Global) + +```toml +[ai] +provider = "gemini" +model = "gemini-2.5-flash-lite" +temperature = 0.0 +max_tokens = 8192 +history_trunc_limit = 8000 +system_prompt = "" + +[projects] +active = "path/to/project.toml" +paths = ["path/to/project.toml"] + +[gui] +separate_message_panel = false +separate_response_panel = false +separate_tool_calls_panel = false +show_windows = { "Context Hub": true, ... } + +[paths] +logs_dir = "logs/sessions" +scripts_dir = "scripts/generated" +conductor_dir = "conductor" + +[mma] +max_workers = 4 +``` + +### .toml (Per-Project) + +```toml +[project] +name = "my_project" +git_dir = "./my_repo" +system_prompt = "" +main_context = "" + +[files] +base_dir = "." +paths = ["src/**/*.py"] +tier_assignments = { "src/core.py" = 1 } + +[screenshots] +base_dir = "." 
+paths = [] + +[output] +output_dir = "./md_gen" + +[gemini_cli] +binary_path = "gemini" + +[deepseek] +reasoning_effort = "medium" + +[agent.tools] +run_powershell = true +read_file = true +list_directory = true +search_files = true +get_file_summary = true +web_search = true +fetch_url = true +py_get_skeleton = true +py_get_code_outline = true +get_file_slice = true +set_file_slice = false +edit_file = false +py_get_definition = true +py_update_definition = false +py_get_signature = true +py_set_signature = false +py_get_class_summary = true +py_get_var_declaration = true +py_set_var_declaration = false +get_git_diff = true +py_find_usages = true +py_get_imports = true +py_check_syntax = true +py_get_hierarchy = true +py_get_docstring = true +get_tree = true +get_ui_performance = true + +[mma] +epic = "" +active_track_id = "" +tracks = [] +``` + +### credentials.toml + +```toml +[gemini] +api_key = "YOUR_KEY" + +[anthropic] +api_key = "YOUR_KEY" + +[deepseek] +api_key = "YOUR_KEY" + +[minimax] +api_key = "YOUR_KEY" +``` + +### mcp_env.toml (Optional) + +```toml +[path] +prepend = ["C:/custom/bin"] + +[env] +MY_VAR = "some_value" +EXPANDED = "${HOME}/subdir" +``` + +--- + +## Environment Variables + +| Variable | Purpose | +|---|---| +| `SLOP_CONFIG` | Override path to `config.toml` | +| `SLOP_CREDENTIALS` | Override path to `credentials.toml` | +| `SLOP_MCP_ENV` | Override path to `mcp_env.toml` | +| `SLOP_TEST_HOOKS` | Set to `"1"` to enable test hooks | +| `SLOP_LOGS_DIR` | Override logs directory | +| `SLOP_SCRIPTS_DIR` | Override generated scripts directory | +| `SLOP_CONDUCTOR_DIR` | Override conductor directory | +| `GEMINI_CLI_HOOK_CONTEXT` | Set by bridge scripts to bypass HITL for sub-agents | +| `CLAUDE_CLI_HOOK_CONTEXT` | Set by bridge scripts to bypass HITL for sub-agents | + +--- + +## Exit Codes + +| Code | Meaning | +|---|---| +| 0 | Normal exit | +| 1 | General error | +| 2 | Configuration error | +| 3 | API error | +| 4 | Test failure | + +--- + 
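Override resolution typically checks the environment first and falls back to a repo-relative default; a minimal sketch using the variable names from the table above (the fallback locations are assumptions — `src/paths.py` holds the real logic):

```python
import os
from pathlib import Path

def resolve_override(env_var: str, default: str) -> Path:
    """Prefer the environment override when set, else the repo-relative default."""
    override = os.environ.get(env_var)
    return Path(override) if override else Path(default)

# Hypothetical usage with names from the table above.
config_path = resolve_override("SLOP_CONFIG", "config.toml")
logs_dir = resolve_override("SLOP_LOGS_DIR", "logs/sessions")
```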
+## File Layout + +``` +manual_slop/ +├── conductor/ # Conductor system +│ ├── tracks/ # Track directories +│ │ └── / # Per-track files +│ │ ├── spec.md +│ │ ├── plan.md +│ │ ├── metadata.json +│ │ └── state.toml +│ ├── archive/ # Completed tracks +│ ├── product.md # Product definition +│ ├── product-guidelines.md +│ ├── tech-stack.md +│ └── workflow.md +├── docs/ # Deep-dive documentation +│ ├── guide_architecture.md +│ ├── guide_meta_boundary.md +│ ├── guide_mma.md +│ ├── guide_simulations.md +│ └── guide_tools.md +├── logs/ # Runtime logs +│ ├── sessions/ # Session logs +│ │ └── / # Per-session files +│ │ ├── comms.log +│ │ ├── toolcalls.log +│ │ ├── apihooks.log +│ │ └── clicalls.log +│ ├── agents/ # Sub-agent logs +│ ├── errors/ # Error logs +│ └── test/ # Test logs +├── scripts/ # Utility scripts +│ ├── generated/ # AI-generated scripts +│ └── *.py # Build/execution scripts +├── src/ # Core implementation +│ ├── gui_2.py # Primary ImGui interface +│ ├── app_controller.py # Headless controller +│ ├── ai_client.py # Multi-provider LLM abstraction +│ ├── mcp_client.py # 26 MCP tools +│ ├── api_hooks.py # HookServer REST API +│ ├── api_hook_client.py # Hook API client +│ ├── multi_agent_conductor.py # ConductorEngine +│ ├── conductor_tech_lead.py # Tier 2 ticket generation +│ ├── dag_engine.py # TrackDAG + ExecutionEngine +│ ├── models.py # Ticket, Track, WorkerContext +│ ├── events.py # EventEmitter, SyncEventQueue +│ ├── project_manager.py # TOML persistence +│ ├── session_logger.py # JSON-L logging +│ ├── shell_runner.py # PowerShell execution +│ ├── file_cache.py # ASTParser (tree-sitter) +│ ├── summarize.py # Heuristic summaries +│ ├── outline_tool.py # Code outlining +│ ├── performance_monitor.py # FPS/CPU tracking +│ ├── log_registry.py # Session metadata +│ ├── log_pruner.py # Log cleanup +│ ├── paths.py # Path resolution +│ ├── cost_tracker.py # Token cost estimation +│ ├── gemini_cli_adapter.py # CLI subprocess adapter +│ ├── mma_prompts.py # Tier 
system prompts +│ └── theme*.py # UI theming +├── simulation/ # Test simulations +│ ├── sim_base.py # BaseSimulation class +│ ├── workflow_sim.py # WorkflowSimulator +│ ├── user_agent.py # UserSimAgent +│ └── sim_*.py # Specific simulations +├── tests/ # Test suite +│ ├── conftest.py # Fixtures (live_gui) +│ ├── artifacts/ # Test outputs +│ └── test_*.py # Test files +├── sloppy.py # Main entry point +├── config.toml # Global configuration +└── credentials.toml # API keys +``` diff --git a/docs/guide_architecture.md b/docs/guide_architecture.md index 590a4f2..7133b65 100644 --- a/docs/guide_architecture.md +++ b/docs/guide_architecture.md @@ -1,12 +1,18 @@ # Architecture -[Top](../Readme.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md) +[Top](../README.md) | [Tools & IPC](guide_tools.md) | [MMA Orchestration](guide_mma.md) | [Simulations](guide_simulations.md) --- ## Philosophy: The Decoupled State Machine -Manual Slop solves a single tension: **AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive.** The engine enforces strict decoupling between three thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution. +Manual Slop solves a single tension: **AI reasoning is high-latency and non-deterministic; GUI interaction must be low-latency and responsive.** The engine enforces strict decoupling between four thread domains so that multi-second LLM calls never block the render loop, and every AI-generated payload passes through a human-auditable gate before execution. 
+ +The architectural philosophy follows data-oriented design principles: +- The GUI (`gui_2.py`, `app_controller.py`) remains a pure visualization of application state +- State mutations occur only through lock-guarded queues consumed on the main render thread +- Background threads never write GUI state directly — they serialize task dicts for later consumption +- All cross-thread communication uses explicit synchronization primitives (Locks, Conditions, Events) ## Project Structure @@ -36,17 +42,17 @@ manual_slop/ Four distinct thread domains operate concurrently: -| Domain | Created By | Purpose | Lifecycle | -|---|---|---|---| -| **Main / GUI** | `immapp.run()` | Dear ImGui retained-mode render loop; sole writer of GUI state | App lifetime | -| **Asyncio Worker** | `App.__init__` via `threading.Thread(daemon=True)` | Event queue processing, AI client calls | Daemon (dies with process) | -| **HookServer** | `api_hooks.HookServer.start()` | HTTP API on `:8999` for external automation and IPC | Daemon thread | -| **Ad-hoc** | Transient `threading.Thread` calls | Model-fetching, legacy send paths | Short-lived | +| Domain | Created By | Purpose | Lifecycle | Key Synchronization Primitives | +|---|---|---|---|---| +| **Main / GUI** | `immapp.run()` | Dear ImGui retained-mode render loop; sole writer of GUI state | App lifetime | None (consumer of queues) | +| **Asyncio Worker** | `App.__init__` via `threading.Thread(daemon=True)` | Event queue processing, AI client calls | Daemon (dies with process) | `AsyncEventQueue`, `threading.Lock` | +| **HookServer** | `api_hooks.HookServer.start()` | HTTP API on `:8999` for external automation and IPC | Daemon thread | `threading.Lock`, `threading.Event` | +| **Ad-hoc** | Transient `threading.Thread` calls | Model-fetching, legacy send paths, log pruning | Short-lived | Task-specific locks | The asyncio worker is **not** the main thread's event loop. 
It runs a dedicated `asyncio.new_event_loop()` on its own daemon thread: ```python -# App.__init__: +# AppController.__init__: self._loop = asyncio.new_event_loop() self._loop_thread = threading.Thread(target=self._run_event_loop, daemon=True) self._loop_thread.start() @@ -60,6 +66,25 @@ def _run_event_loop(self) -> None: The GUI thread uses `asyncio.run_coroutine_threadsafe(coro, self._loop)` to push work into this loop. +### Thread-Local Context Isolation + +For concurrent multi-agent execution, the application uses `threading.local()` to manage per-thread context: + +```python +# ai_client.py +_local_storage = threading.local() + +def get_current_tier() -> Optional[str]: + """Returns the current tier from thread-local storage.""" + return getattr(_local_storage, "current_tier", None) + +def set_current_tier(tier: Optional[str]) -> None: + """Sets the current tier in thread-local storage.""" + _local_storage.current_tier = tier +``` + +This ensures that comms log entries and tool calls are correctly tagged with their source tier even when multiple workers execute concurrently. + --- ## Cross-Thread Data Structures @@ -553,12 +578,247 @@ Every interaction is designed to be auditable: - **CLI Call Logs**: Subprocess execution details (command, stdin, stdout, stderr, latency) to `clicalls.log` as JSON-L. - **Performance Monitor**: Real-time FPS, Frame Time, CPU, Input Lag tracked and queryable via Hook API. 
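As an illustration, a minimal JSON-L appender in the same shape as these logs (`JsonlLogger` is a hypothetical name, not the actual `session_logger.py` API):

```python
import json
import threading
import time
from pathlib import Path

class JsonlLogger:
    """Append-only JSON-L writer: one dict per line, lock-guarded so any
    thread (GUI, worker, HookServer) can log without interleaving lines."""

    def __init__(self, path: Path):
        self.path = path
        self._lock = threading.Lock()

    def log(self, direction: str, kind: str, payload: dict) -> None:
        entry = {
            "ts": time.strftime("%H:%M:%S"),
            "direction": direction,
            "kind": kind,
            "payload": payload,
            "local_ts": time.time(),
        }
        with self._lock:
            with self.path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(entry) + "\n")
```

The real entries carry additional fields (provider, model, `source_tier`), but the append-under-lock shape is the same.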
+### Telemetry Data Structures + +```python +# Comms log entry (JSON-L) +{ + "ts": "14:32:05", + "direction": "OUT", + "kind": "tool_call", + "provider": "gemini", + "model": "gemini-2.5-flash-lite", + "payload": { + "name": "run_powershell", + "id": "call_abc123", + "script": "Get-ChildItem" + }, + "source_tier": "Tier 3", + "local_ts": 1709875925.123 +} + +# Performance metrics (via get_metrics()) +{ + "fps": 60.0, + "fps_avg": 58.5, + "last_frame_time_ms": 16.67, + "frame_time_ms_avg": 17.1, + "cpu_percent": 12.5, + "cpu_percent_avg": 15.2, + "input_lag_ms": 2.3, + "input_lag_ms_avg": 3.1, + "time_render_mma_dashboard_ms": 5.2, + "time_render_mma_dashboard_ms_avg": 4.8 +} +``` + +--- + +## MMA Engine Architecture + +### WorkerPool: Concurrent Worker Management + +The `WorkerPool` class in `multi_agent_conductor.py` manages a bounded pool of worker threads: + +```python +class WorkerPool: + def __init__(self, max_workers: int = 4): + self.max_workers = max_workers + self._active: dict[str, threading.Thread] = {} + self._lock = threading.Lock() + self._semaphore = threading.Semaphore(max_workers) + + def spawn(self, ticket_id: str, target: Callable, args: tuple) -> Optional[threading.Thread]: + with self._lock: + if len(self._active) >= self.max_workers: + return None + + def wrapper(*a, **kw): + try: + with self._semaphore: + target(*a, **kw) + finally: + with self._lock: + self._active.pop(ticket_id, None) + + t = threading.Thread(target=wrapper, args=args, daemon=True) + with self._lock: + self._active[ticket_id] = t + t.start() + return t +``` + +**Key behaviors**: +- **Bounded concurrency**: `max_workers` (default 4) limits parallel ticket execution +- **Semaphore gating**: Ensures no more than `max_workers` can execute simultaneously +- **Automatic cleanup**: Thread removes itself from `_active` dict on completion +- **Non-blocking spawn**: Returns `None` if pool is full, allowing the engine to defer + +### ConductorEngine: Orchestration Loop + +The 
`ConductorEngine` orchestrates ticket execution within a track: + +```python +class ConductorEngine: + def __init__(self, track: Track, event_queue: Optional[SyncEventQueue] = None, + auto_queue: bool = False) -> None: + self.track = track + self.event_queue = event_queue + self.dag = TrackDAG(self.track.tickets) + self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue) + self.pool = WorkerPool(max_workers=4) + self._abort_events: dict[str, threading.Event] = {} + self._pause_event = threading.Event() + self._tier_usage_lock = threading.Lock() + self.tier_usage = { + "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"}, + "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"}, + "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, + "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, + } +``` + +**Main execution loop** (`run` method): + +1. **Pause check**: If `_pause_event` is set, sleep and broadcast "paused" status +2. **DAG tick**: Call `engine.tick()` to get ready tasks +3. **Completion check**: If no ready tasks and all completed, break with "done" status +4. **Wait for workers**: If tasks in-progress or pool active, sleep and continue +5. **Blockage detection**: If no ready, no in-progress, and not all done, break with "blocked" status +6. **Spawn workers**: For each ready task, spawn a worker via `pool.spawn()` +7. **Model escalation**: Workers use `models_list[min(retry_count, 2)]` for capability upgrade on retries + +### Abort Event Propagation + +Each ticket has an associated `threading.Event` for abort signaling: + +```python +# Before spawning worker +self._abort_events[ticket.id] = threading.Event() + +# Worker checks abort at three points: +# 1. Before major work +if abort_event.is_set(): + ticket.status = "killed" + return "ABORTED" + +# 2. Before tool execution (in clutch_callback) +if abort_event.is_set(): + return False # Reject tool + +# 3. 
After blocking send() returns +if abort_event.is_set(): + ticket.status = "killed" + return "ABORTED" +``` + --- ## Architectural Invariants 1. **Single-writer principle**: All GUI state mutations happen on the main thread via `_process_pending_gui_tasks`. Background threads never write GUI state directly. + 2. **Copy-and-clear lock pattern**: `_process_pending_gui_tasks` snapshots and clears the task list under the lock, then processes outside the lock. + 3. **Context Amnesia**: Each MMA Tier 3 Worker starts with `ai_client.reset_session()`. No conversational bleed between tickets. + 4. **Send serialization**: `_send_lock` ensures only one provider call is in-flight at a time across all threads. + 5. **Dual-Flush persistence**: On exit, state is committed to both project-level and global-level config files. + +6. **No cross-thread GUI mutation**: Background threads must push tasks to `_pending_gui_tasks` rather than calling GUI methods directly. + +7. **Abort-before-execution**: Workers check abort events before major work phases, enabling clean cancellation. + +8. **Bounded worker pool**: `WorkerPool` enforces `max_workers` limit to prevent resource exhaustion. 
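Invariants 1 and 2 together form the copy-and-clear pattern; a self-contained sketch (class name hypothetical, simplified from the behavior described above):

```python
import threading

class PendingGuiTasks:
    """Single-writer pattern: background threads append under a lock; the
    render thread snapshots and clears under the lock, then processes the
    snapshot outside the lock so heavy work never blocks producers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []

    def push(self, task) -> None:
        # Called from background threads; never touches GUI state directly.
        with self._lock:
            self._pending.append(task)

    def process(self, apply) -> int:
        # Called on the main/render thread each frame.
        with self._lock:  # copy-and-clear under the lock
            batch, self._pending = self._pending, []
        for task in batch:  # mutation happens outside the lock
            apply(task)
        return len(batch)
```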
+
+---
+
+## Error Classification & Recovery
+
+### ProviderError Taxonomy
+
+The `ProviderError` class provides structured error classification:
+
+```python
+class ProviderError(Exception):
+    def __init__(self, kind: str, provider: str, original: Exception):
+        self.kind = kind  # "quota" | "rate_limit" | "auth" | "balance" | "network" | "unknown"
+        self.provider = provider
+        self.original = original
+
+    def ui_message(self) -> str:
+        labels = {
+            "quota": "QUOTA EXHAUSTED",
+            "rate_limit": "RATE LIMITED",
+            "auth": "AUTH / API KEY ERROR",
+            "balance": "BALANCE / BILLING ERROR",
+            "network": "NETWORK / CONNECTION ERROR",
+            "unknown": "API ERROR",
+        }
+        return f"[{self.provider.upper()} {labels.get(self.kind, 'API ERROR')}]\n\n{self.original}"
+```
+
+### Error Recovery Patterns
+
+| Error Kind | Recovery Strategy |
+|---|---|
+| `quota` | Display in UI, await user intervention |
+| `rate_limit` | Exponential backoff (not yet implemented) |
+| `auth` | Prompt for credential verification |
+| `balance` | Display billing alert |
+| `network` | Auto-retry with timeout |
+| `unknown` | Log full traceback, display in UI |
+
+---
+
+## Memory Management
+
+### History Trimming Strategies
+
+**Gemini (40% threshold)**:
+```python
+if total_in > _GEMINI_MAX_INPUT_TOKENS * 0.4:
+    while len(hist) > 4 and total_in > _GEMINI_MAX_INPUT_TOKENS * 0.3:
+        # Drop the oldest message pair (re-estimation of total_in elided)
+        hist.pop(0)  # User
+        hist.pop(0)  # Assistant
+```
+
+**Anthropic (180K limit)**:
+```python
+def _trim_anthropic_history(system_blocks, history):
+    est = _estimate_prompt_tokens(system_blocks, history)
+    while len(history) > 3 and est > _ANTHROPIC_MAX_PROMPT_TOKENS:
+        # Drop turn pairs, preserving tool_result chains
+        ...
+```
+
+### Tool Output Budget
+
+```python
+_MAX_TOOL_OUTPUT_BYTES: int = 500_000  # 500KB cumulative
+
+if _cumulative_tool_bytes > _MAX_TOOL_OUTPUT_BYTES:
+    # Inject warning, force final answer
+    parts.append("SYSTEM WARNING: Cumulative tool output exceeded 500KB budget.")
+```
+
+### AST Cache (file_cache.py)
+
+```python
+_ast_cache: Dict[str, Tuple[float, tree_sitter.Tree]] = {}
+
+def get_cached_tree(self, path: Optional[str], code: str) -> tree_sitter.Tree:
+    if not path:
+        return self.parse(code)  # unsaved buffers bypass the cache
+    p = Path(path)
+    mtime = p.stat().st_mtime if p.exists() else 0.0
+    if path in _ast_cache:
+        cached_mtime, tree = _ast_cache[path]
+        if cached_mtime == mtime:
+            return tree
+    # Parse and cache with simple FIFO eviction (max 10 entries)
+    if len(_ast_cache) >= 10:
+        del _ast_cache[next(iter(_ast_cache))]
+    tree = self.parse(code)
+    _ast_cache[path] = (mtime, tree)
+    return tree
+```
diff --git a/docs/guide_mma.md b/docs/guide_mma.md
index fac825e..ee47884 100644
--- a/docs/guide_mma.md
+++ b/docs/guide_mma.md
@@ -138,6 +138,31 @@ class ExecutionEngine:
 
 ---
 
+## WorkerPool (`multi_agent_conductor.py`)
+
+Bounded concurrent worker pool with semaphore gating.
+
+```python
+class WorkerPool:
+    def __init__(self, max_workers: int = 4):
+        self.max_workers = max_workers
+        self._active: dict[str, threading.Thread] = {}
+        self._lock = threading.Lock()
+        self._semaphore = threading.Semaphore(max_workers)
+```
+
+**Key Methods:**
+- `spawn(ticket_id, target, args)` — Spawns a worker thread if pool has capacity. Returns `None` if full.
+- `join_all(timeout)` — Waits for all active workers to complete.
+- `get_active_count()` — Returns current number of active workers.
+- `is_full()` — Returns `True` if at capacity.
+
+**Thread Safety:** All state mutations are protected by `_lock`. The semaphore ensures at most `max_workers` threads execute concurrently.
+
+**Configuration:** `max_workers` is loaded from `config.toml` → `[mma].max_workers` (default: 4).
+
+---
+
 ## ConductorEngine (`multi_agent_conductor.py`)
 
 The Tier 2 orchestrator.
Owns the execution loop that drives tickets through the DAG. @@ -148,13 +173,16 @@ class ConductorEngine: self.track = track self.event_queue = event_queue self.tier_usage = { - "Tier 1": {"input": 0, "output": 0}, - "Tier 2": {"input": 0, "output": 0}, - "Tier 3": {"input": 0, "output": 0}, - "Tier 4": {"input": 0, "output": 0}, + "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"}, + "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"}, + "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, + "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"}, } self.dag = TrackDAG(self.track.tickets) self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue) + self.pool = WorkerPool(max_workers=max_workers) + self._abort_events: dict[str, threading.Event] = {} + self._pause_event: threading.Event = threading.Event() ``` ### State Broadcast (`_push_state`) @@ -350,6 +378,80 @@ Each tier operates within its own token budget: --- +## Abort Event Propagation + +Workers can be killed mid-execution via abort events: + +```python +# In ConductorEngine.__init__: +self._abort_events: dict[str, threading.Event] = {} + +# When spawning a worker: +self._abort_events[ticket.id] = threading.Event() + +# To kill a worker: +def kill_worker(self, ticket_id: str) -> None: + if ticket_id in self._abort_events: + self._abort_events[ticket_id].set() # Signal abort + thread = self._active_workers.get(ticket_id) + if thread: + thread.join(timeout=1.0) # Wait for graceful shutdown +``` + +**Abort Check Points in `run_worker_lifecycle`:** +1. **Before major work** — checked immediately after `ai_client.reset_session()` +2. **During clutch_callback** — checked before each tool execution +3. **After blocking send()** — checked after AI call returns + +When abort is detected, the ticket status is set to `"killed"` and the worker exits immediately. 
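The kill path can be exercised in miniature (names hypothetical; the real worker also re-checks the event inside its clutch callback and after the blocking `send()`):

```python
import threading
import time

def worker_lifecycle(ticket: dict, abort_event: threading.Event) -> None:
    ticket["status"] = "in_progress"
    while not abort_event.is_set():  # checkpoint before each unit of work
        time.sleep(0.01)             # stand-in for a blocking AI call / tool step
    ticket["status"] = "killed"      # abort detected: mark and exit

def kill_worker(thread: threading.Thread, abort_event: threading.Event) -> None:
    abort_event.set()                # signal abort
    thread.join(timeout=1.0)         # wait for graceful shutdown
```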
+ +--- + +## Pause/Resume Control + +The engine supports pausing the entire orchestration pipeline: + +```python +def pause(self) -> None: + self._pause_event.set() + +def resume(self) -> None: + self._pause_event.clear() +``` + +In the main `run()` loop: + +```python +while True: + if self._pause_event.is_set(): + self._push_state(status="paused", active_tier="Paused") + time.sleep(0.5) + continue + # ... normal execution +``` + +This allows the user to pause execution without killing workers. + +--- + +## Model Escalation + +Workers automatically escalate to more capable models on retry: + +```python +models_list = [ + "gemini-2.5-flash-lite", # First attempt + "gemini-2.5-flash", # Second attempt + "gemini-3.1-pro-preview" # Third+ attempt +] +model_idx = min(ticket.retry_count, len(models_list) - 1) +model_name = models_list[model_idx] +``` + +The `ticket.model_override` field can bypass this logic with a specific model. + +--- + ## Track State Persistence Track state can be persisted to disk via `project_manager.py`: diff --git a/docs/guide_simulations.md b/docs/guide_simulations.md index 4b3aa73..b2602f0 100644 --- a/docs/guide_simulations.md +++ b/docs/guide_simulations.md @@ -310,8 +310,9 @@ class ASTParser: self.parser = tree_sitter.Parser(self.language) def parse(self, code: str) -> tree_sitter.Tree - def get_skeleton(self, code: str) -> str - def get_curated_view(self, code: str) -> str + def get_skeleton(self, code: str, path: str = "") -> str + def get_curated_view(self, code: str, path: str = "") -> str + def get_targeted_view(self, code: str, symbols: List[str], path: str = "") -> str ``` **`get_skeleton` algorithm:** @@ -329,6 +330,13 @@ Enhanced skeleton that preserves bodies under two conditions: If either condition is true, the body is preserved verbatim. This enables a two-tier code view: hot paths shown in full, boilerplate compressed. +**`get_targeted_view` algorithm:** +Extracts only the specified symbols and their dependencies: +1. 
Find all requested symbol definitions (classes, functions, methods).
+2. For each symbol, traverse its body to find referenced names.
+3. Include only the definitions that are directly referenced.
+4. Used for surgical context injection when `target_symbols` is specified on a Ticket.
+
 ### `summarize.py` — Heuristic File Summaries
 
 Token-efficient structural descriptions without AI calls:
diff --git a/docs/guide_tools.md b/docs/guide_tools.md
index 4d6bdb1..2246fa6 100644
--- a/docs/guide_tools.md
+++ b/docs/guide_tools.md
@@ -141,6 +141,33 @@ The `_get_symbol_node` helper supports dot notation (`ClassName.method_name`) by
 
 ---
 
+## Parallel Tool Execution
+
+Tools can be executed concurrently via `async_dispatch`:
+
+```python
+async def async_dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
+    """Dispatch an MCP tool call asynchronously."""
+    return await asyncio.to_thread(dispatch, tool_name, tool_input)
+```
+
+In `ai_client.py`, multiple tool calls within a single AI turn are executed in parallel:
+
+```python
+async def _execute_tool_calls_concurrently(calls, base_dir, ...):
+    tasks = []
+    for fc in calls:
+        tasks.append(_execute_single_tool_call_async(name, args, ...))
+    results = await asyncio.gather(*tasks)
+    return results
+```
+
+This significantly reduces latency when the AI makes multiple independent file reads in a single turn.
+
+**Thread Safety Note:** The `configure()` function resets global state. In concurrent environments, ensure configuration is complete before dispatching tools.
+
+---
+
 ## The Hook API: Remote Control & Telemetry
 
 Manual Slop exposes a REST-based IPC interface on `127.0.0.1:8999` using Python's `ThreadingHTTPServer`. Each incoming request gets its own thread.
@@ -312,6 +339,47 @@ class ApiHookClient:
 
 ---
 
+## Parallel Tool Execution in Practice
+
+Tool calls within a single AI turn are executed concurrently using `asyncio.gather`. This significantly reduces latency when multiple independent tools need to be called.
+
+All tool dispatches are wrapped in `asyncio.to_thread()` to prevent blocking the event loop. This enables `ai_client.py` to execute multiple tools via `asyncio.gather()`:
+
+```python
+results = await asyncio.gather(
+    async_dispatch("read_file", {"path": "src/module_a.py"}),
+    async_dispatch("read_file", {"path": "src/module_b.py"}),
+    async_dispatch("get_file_summary", {"path": "src/module_c.py"}),
+)
+```
+
+### Concurrency Benefits
+
+| Scenario | Sequential | Parallel |
+|----------|------------|----------|
+| 3 file reads (100ms each) | 300ms | ~100ms |
+| 5 file reads + 1 web fetch (200ms each) | 1200ms | ~200ms |
+| Mixed I/O operations | Sum of all | Max of all |
+
+The parallel execution model is particularly effective for:
+- Reading multiple source files simultaneously
+- Fetching URLs while performing local file operations
+- Running syntax checks across multiple files
+
+---
+
 ## Synthetic Context Refresh
 
 To minimize token churn and redundant `read_file` calls, the `ai_client` performs a post-tool-execution context refresh. See [guide_architecture.md](guide_architecture.md#context-refresh-mechanism) for the full algorithm.
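The latency figures in the Concurrency Benefits table are easy to reproduce with a stand-in `dispatch` that sleeps 50 ms per call (only `async_dispatch` below matches the documented code; the blocking `dispatch` and the harness are illustrative):

```python
import asyncio
import time

def dispatch(tool_name: str, tool_input: dict) -> str:
    time.sleep(0.05)  # stand-in for a blocking 50 ms tool call
    return f"{tool_name}:ok"

async def async_dispatch(tool_name: str, tool_input: dict) -> str:
    # Run the blocking tool in a thread so calls can overlap
    return await asyncio.to_thread(dispatch, tool_name, tool_input)

async def run_parallel():
    start = time.perf_counter()
    results = await asyncio.gather(
        async_dispatch("read_file", {"path": "a.py"}),
        async_dispatch("read_file", {"path": "b.py"}),
        async_dispatch("read_file", {"path": "c.py"}),
    )
    return results, time.perf_counter() - start
```

Three 50 ms calls complete in roughly 50 ms total rather than 150 ms, since each runs in its own thread while `gather` awaits them together.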
diff --git a/gallery/mpc-hc64_2026-03-08_01-24-13.png b/gallery/mpc-hc64_2026-03-08_01-24-13.png new file mode 100644 index 0000000..7d81c5d Binary files /dev/null and b/gallery/mpc-hc64_2026-03-08_01-24-13.png differ diff --git a/gallery/python_2026-03-07_14-32-50.png b/gallery/python_2026-03-07_14-32-50.png new file mode 100644 index 0000000..cb44d37 Binary files /dev/null and b/gallery/python_2026-03-07_14-32-50.png differ diff --git a/simulation/sim_base.py b/simulation/sim_base.py index 1339a27..2dc155e 100644 --- a/simulation/sim_base.py +++ b/simulation/sim_base.py @@ -1,3 +1,45 @@ +""" +Base Simulation Framework - Abstract base class for GUI automation tests. + +This module provides the foundation for all simulation-based tests in the +Manual Slop test suite. Simulations act as external "puppeteers" that drive +the GUI through the ApiHookClient HTTP interface. + +Architecture: + - BaseSimulation: Abstract base class with setup/teardown lifecycle + - WorkflowSimulator: High-level workflow operations (project setup, file mgmt) + - ApiHookClient: Low-level HTTP client for Hook API communication + +Typical Usage: + class MySimulation(BaseSimulation): + def run(self) -> None: + self.client.set_value('mma_epic_input', 'My epic description') + self.client.click('btn_mma_plan_epic') + # Poll for completion... + status = self.client.get_mma_status() + assert status['mma_status'] == 'done' + + if __name__ == '__main__': + run_sim(MySimulation) + +Lifecycle: + 1. setup() - Connects to GUI, resets session, scaffolds temp project + 2. run() - Implemented by subclass with simulation logic + 3. 
teardown() - Cleanup (optional file retention for debugging) + +Prerequisites: + - GUI must be running with --enable-test-hooks flag + - HookServer must be listening on http://127.0.0.1:8999 + +Thread Safety: + - Simulations are designed to run in the main thread + - ApiHookClient handles its own connection pooling + +See Also: + - simulation/workflow_sim.py for WorkflowSimulator + - tests/conftest.py for live_gui pytest fixture + - docs/guide_simulations.md for full simulation documentation +""" import sys import os import time diff --git a/simulation/workflow_sim.py b/simulation/workflow_sim.py index 45780d2..2d16170 100644 --- a/simulation/workflow_sim.py +++ b/simulation/workflow_sim.py @@ -1,3 +1,44 @@ +""" +Workflow Simulator - High-level GUI workflow automation for testing. + +This module provides the WorkflowSimulator class which orchestrates complex +multi-step workflows through the GUI via the ApiHookClient. It is designed +for integration testing and automated verification of GUI behavior. + +Key Capabilities: + - Project setup and configuration + - Discussion creation and switching + - AI turn execution with stall detection + - Context file management + - MMA (Multi-Model Agent) orchestration simulation + +Stall Detection: + The run_discussion_turn() method implements intelligent stall detection: + - Monitors ai_status for transitions from busy -> idle + - Detects stalled Tool results (non-busy state with Tool as last role) + - Automatically triggers btn_gen_send to recover from stalls + +Integration with UserSimAgent: + WorkflowSimulator delegates user simulation behavior (reading time, delays) + to UserSimAgent for realistic interaction patterns. + +Thread Safety: + This class is NOT thread-safe. All methods should be called from a single + thread (typically the main test thread). 
+ +Example Usage: + client = ApiHookClient() + sim = WorkflowSimulator(client) + sim.setup_new_project("TestProject", "/path/to/git/dir") + sim.create_discussion("Feature A") + result = sim.run_discussion_turn("Please implement feature A") + +See Also: + - simulation/sim_base.py for BaseSimulation class + - simulation/user_agent.py for UserSimAgent + - api_hook_client.py for ApiHookClient + - docs/guide_simulations.md for full simulation documentation +""" import time from api_hook_client import ApiHookClient from simulation.user_agent import UserSimAgent diff --git a/src/dag_engine.py b/src/dag_engine.py index 7bf5b7b..1d56334 100644 --- a/src/dag_engine.py +++ b/src/dag_engine.py @@ -1,3 +1,31 @@ +""" +DAG Engine - Directed Acyclic Graph execution for MMA ticket orchestration. + +This module provides the core graph data structures and state machine logic +for executing implementation tickets in dependency order within the MMA +(Multi-Model Agent) system. + +Key Classes: + - TrackDAG: Graph representation with cycle detection, topological sorting, + and transitive blocking propagation. + - ExecutionEngine: Tick-based state machine that evaluates the DAG and + manages task status transitions. + +Architecture Integration: + - TrackDAG is constructed from a list of Ticket objects (from models.py) + - ExecutionEngine is consumed by ConductorEngine (multi_agent_conductor.py) + - The tick() method is called in the main orchestration loop to determine + which tasks are ready for execution + +Thread Safety: + - This module is NOT thread-safe. Callers must synchronize access if used + from multiple threads (e.g., the ConductorEngine's async loop). 
+
+See Also:
+    - docs/guide_mma.md for the full MMA orchestration documentation
+    - src/models.py for Ticket and Track data structures
+    - src/multi_agent_conductor.py for ConductorEngine integration
+"""
 from typing import List
 from src.models import Ticket
diff --git a/src/events.py b/src/events.py
index 79bd4df..2a3aa4d 100644
--- a/src/events.py
+++ b/src/events.py
@@ -1,5 +1,33 @@
 """
-Decoupled event emission system for cross-module communication.
+Events - Decoupled event emission and queuing for cross-thread communication.
+
+This module provides three complementary patterns for communication between
+the GUI main thread and background workers:
+
+1. EventEmitter: Pub/sub pattern for synchronous event broadcast
+   - Used for: API lifecycle events (request_start, response_received, tool_execution)
+   - Threading: callbacks run synchronously on whichever thread calls emit()
+   - Example: ai_client.py emits 'request_start' and 'response_received' events
+
+2. SyncEventQueue: Producer-consumer pattern via queue.Queue
+   - Used for: Decoupled task submission where consumer polls at its own pace
+   - Thread-safe: Built on Python's thread-safe queue.Queue
+   - Example: Background workers submit tasks, main thread drains queue
+
+3.
UserRequestEvent: Structured payload for AI request data + - Used for: Bundling prompt, context, files, and base_dir into single object + - Immutable data transfer object for cross-thread handoff + +Integration Points: + - ai_client.py: EventEmitter for API lifecycle events + - gui_2.py: Consumes events via _process_event_queue() + - multi_agent_conductor.py: Uses SyncEventQueue for state updates + - api_hooks.py: Pushes events to _api_event_queue for external visibility + +Thread Safety: + - EventEmitter: NOT thread-safe for concurrent on/emit (use from single thread) + - SyncEventQueue: FULLY thread-safe (built on queue.Queue) + - UserRequestEvent: Immutable, safe for concurrent access """ import queue from typing import Callable, Any, Dict, List, Tuple diff --git a/src/mcp_client.py b/src/mcp_client.py index 12cd56b..64b6c17 100644 --- a/src/mcp_client.py +++ b/src/mcp_client.py @@ -1,34 +1,56 @@ # mcp_client.py """ -Note(Gemini): -MCP-style file context tools for manual_slop. -Exposes read-only filesystem tools the AI can call to selectively fetch file -content on demand, instead of having everything inlined into the context block. +MCP Client - Multi-tool filesystem and network operations with sandboxing. -All access is restricted to paths that are either: - - Explicitly listed in the project's allowed_paths set, OR - - Contained within an allowed base_dir (must resolve to a subpath of it) +This module implements a Model Context Protocol (MCP)-like interface for AI +agents to interact with the filesystem and network. It provides 26 tools +with a three-layer security model to prevent unauthorized access. -This is heavily inspired by Claude's own tooling limits. We enforce safety here -so the AI doesn't wander outside the project workspace. +Three-Layer Security Model: + 1. 
Allowlist Construction (configure()): + - Builds _allowed_paths from project file_items + - Populates _base_dirs from file parents and extra_base_dirs + - Sets _primary_base_dir for relative path resolution + + 2. Path Validation (_is_allowed()): + - Blacklist check: history.toml, *_history.toml, config, credentials + - Explicit allowlist check: _allowed_paths membership + - CWD fallback: allows cwd() subpaths if no base_dirs configured + - Base directory containment: must be subpath of _base_dirs + + 3. Resolution Gate (_resolve_and_check()): + - Converts relative paths using _primary_base_dir + - Resolves symlinks to prevent traversal attacks + - Returns (resolved_path, error_message) tuple + +Tool Categories: + - File I/O: read_file, list_directory, search_files, get_tree + - Surgical Edits: set_file_slice, edit_file + - AST-Based (Python): py_get_skeleton, py_get_code_outline, py_get_definition, + py_update_definition, py_get_signature, py_set_signature, py_get_class_summary, + py_get_var_declaration, py_set_var_declaration + - Analysis: get_file_summary, get_git_diff, py_find_usages, py_get_imports, + py_check_syntax, py_get_hierarchy, py_get_docstring + - Network: web_search, fetch_url + - Runtime: get_ui_performance + +Mutating Tools: + The MUTATING_TOOLS frozenset defines tools that modify files. ai_client.py + checks this set and routes to pre_tool_callback (GUI approval) if present. + +Thread Safety: + This module uses module-level global state (_allowed_paths, _base_dirs). + Call configure() before dispatch() in multi-threaded environments. + +See Also: + - docs/guide_tools.md for complete tool inventory and security model + - src/ai_client.py for tool dispatch integration + - src/shell_runner.py for PowerShell execution """ # mcp_client.py #MCP-style file context tools for manual_slop. -# Exposes read-only filesystem tools the AI can call to selectively fetch file -# content on demand, instead of having everything inlined into the context block. 
-# All access is restricted to paths that are either: -# - Explicitly listed in the project's allowed_paths set, OR -# - Contained within an allowed base_dir (must resolve to a subpath of it) - -# Tools exposed: -# read_file(path) - return full UTF-8 content of a file -# list_directory(path) - list entries in a directory (names + type) -# search_files(path, pattern) - glob pattern search within an allowed dir -# get_file_summary(path) - return the summarize.py heuristic summary -# - from __future__ import annotations import asyncio from pathlib import Path diff --git a/src/models.py b/src/models.py index a892c20..a8ba7a3 100644 --- a/src/models.py +++ b/src/models.py @@ -1,3 +1,41 @@ +""" +Models - Core data structures for MMA orchestration and project configuration. + +This module defines the primary dataclasses used throughout the Manual Slop +application for representing tasks, tracks, and execution context. + +Key Data Structures: + - Ticket: Atomic unit of work with status, dependencies, and context requirements + - Track: Collection of tickets with a shared goal + - WorkerContext: Execution context for a Tier 3 worker + - Metadata: Track metadata (id, name, status, timestamps) + - TrackState: Serializable track state with discussion history + - FileItem: File configuration with auto-aggregate and force-full flags + +Status Machine (Ticket): + todo -> in_progress -> completed + | | + v v + blocked blocked + +Serialization: + All dataclasses provide to_dict() and from_dict() class methods for TOML/JSON + persistence via project_manager.py. + +Thread Safety: + These dataclasses are NOT thread-safe. Callers must synchronize mutations + if sharing instances across threads (e.g., during ConductorEngine execution). 
+ +Configuration Integration: + - load_config() / save_config() read/write the global config.toml + - AGENT_TOOL_NAMES defines the canonical list of MCP tools available to agents + +See Also: + - docs/guide_mma.md for MMA orchestration documentation + - src/dag_engine.py for TrackDAG and ExecutionEngine + - src/multi_agent_conductor.py for ConductorEngine + - src/project_manager.py for persistence layer +""" from __future__ import annotations import tomllib import datetime