diff --git a/docs/guide_mcp_client.md b/docs/guide_mcp_client.md
new file mode 100644
index 00000000..28c932ea
--- /dev/null
+++ b/docs/guide_mcp_client.md
@@ -0,0 +1,410 @@
+# `src/mcp_client.py`: MCP Tools (45 tools, 3-layer security)
+
+[Top](../README.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [Testing](guide_testing.md)
+
+---
+
+## Overview
+
+`src/mcp_client.py` (~81KB) is the **MCP (Model Context Protocol) tool implementation** for Manual Slop. It provides 45 tools that the AI can invoke to read/write files, analyze code structure, search symbols, and more.
+
+The module implements the **client side** of MCP: it provides the tools that an AI model can call during a conversation. It also implements the project's strict filesystem security model.
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────┐
+│ ai_client.send(...)                             │
+│ AI returns: { "name": "read_file", "args": {...} } │
+└─────────────────┬───────────────────────────────┘
+                  │ calls
+                  ▼
+┌─────────────────────────────────────────────────┐
+│ mcp_client.dispatch(tool_name, tool_input)      │
+│ Routes to the registered tool function          │
+│ Returns tool result as string                   │
+└─────────────────┬───────────────────────────────┘
+                  │ calls (with security check)
+                  ▼
+┌─────────────────────────────────────────────────┐
+│ 3-Layer Security: Allowlist → Validate → Resolve │
+└─────────────────┬───────────────────────────────┘
+                  │ passes
+                  ▼
+┌─────────────────────────────────────────────────┐
+│ The actual tool function (45 of them)           │
+└─────────────────────────────────────────────────┘
+```
+
+---
+
+## The 3-Layer Security Model
+
+Every filesystem access passes through 3 layers:
+
+### Layer 1: Allowlist Construction (`configure`)
+
+Called by `ai_client` before each send cycle (and on project load):
+
+```python
+def configure(file_items: list[dict], base_dirs: list[str]) -> None:
+    """Build the allowlist from the project's tracked files and base dirs."""
+    global _primary_base_dir  # rebinding a module-level name requires `global`
+    _allowed_paths.clear()
+    _base_dirs.clear()
+    if base_dirs:
+        _primary_base_dir = Path(base_dirs[0]).resolve()
+    for f in file_items:
+        if isinstance(f, dict) and "path" in f:
+            _allowed_paths.add(Path(f["path"]).resolve())
+    # Blacklist: history.toml and config.toml are NEVER allowed
+```
+
+After this call, `_allowed_paths` contains every file the AI is allowed to touch.
+
+### Layer 2: Path Validation (`_is_allowed`)
+
+Called on every path before any I/O:
+
+```python
+def _is_allowed(path: Path) -> bool:
+    """Return True if `path` is within the allowlist."""
+    if path.name in {"history.toml", "config.toml", "credentials.toml"}:
+        return False
+    # Matches the `*_history.toml` blacklist pattern
+    if path.name.endswith("_history.toml"):
+        return False
+    # ... checks against _allowed_paths and _base_dirs
+```
+
+Returns `False` for any path the AI is not allowed to touch.
+
+### Layer 3: Resolution Gate (`_resolve_and_check`)
+
+The final gate. Resolves the path (handling symlinks, relative paths) and re-checks.
+
+```python
+def _resolve_and_check(raw_path: str) -> tuple[Path | None, str]:
+    """Resolve raw_path and verify it passes the allowlist check."""
+    p = Path(raw_path).resolve()
+    if not _is_allowed(p):
+        return None, f"ERROR: path not in allowlist: {raw_path}"
+    return p, ""
+```
+
+Every tool function calls this first. If it returns an error, the tool returns the error string to the AI.
+
+---
+
+## The 45 Tools
+
+The tools are organized by category. The full registered count is 45.
+
+### File I/O Tools (4)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `read_file` | `path` | UTF-8 file content extraction |
+| `list_directory` | `path` | Compact table: `[file/dir] name size`. Applies blacklist filter. |
+| `search_files` | `path`, `pattern` | Glob pattern matching within an allowed directory |
+| `get_file_summary` | `path` | Heuristic summary via `summarize.py` (imports, classes, etc.) |
+
+### File Edit Tools (3)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `get_file_slice` | `path`, `start_line`, `end_line` | Returns specific line range (1-based, inclusive) |
+| `set_file_slice` | `path`, `start_line`, `end_line`, `new_content` | Replaces a line range with new content |
+| `edit_file` | `path`, `old_string`, `new_string`, `replace_all` | Exact string match replace. Preserves indentation. |
+
+### Python AST Tools (15)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `py_get_skeleton` | `path` | Skeleton: signatures + docstrings, bodies replaced with `...` |
+| `py_get_code_outline` | `path` | Hierarchical outline: classes, functions, methods with line ranges |
+| `py_get_symbol_info` | `path`, `name` | (source, line_number) for a class/function/method |
+| `py_get_definition` | `path`, `name` | Full source for a class/function/method |
+| `py_update_definition` | `path`, `name`, `new_content` | Surgical replacement (locates via ast, delegates to set_file_slice) |
+| `py_get_signature` | `path`, `name` | Just the `def` line through the colon |
+| `py_set_signature` | `path`, `name`, `new_signature` | Replaces only the signature, preserving body |
+| `py_get_class_summary` | `path`, `name` | Class docstring + method signatures |
+| `py_get_var_declaration` | `path`, `name` | Module/class-level variable assignment line(s) |
+| `py_set_var_declaration` | `path`, `name`, `new_declaration` | Surgical variable replacement |
+| `py_get_hierarchy` | `path`, `class_name` | Subclasses of a given class |
+| `py_get_docstring` | `path`, `name` | Docstring for module/class/function |
+| `py_get_imports` | `path` | AST-parsed dependency list |
+| `py_find_usages` | `path`, `name` | Exact string match search |
+| `py_check_syntax` | `path` | Syntax validation via `ast.parse()` |
+| `py_remove_def` | `path`, `name` | Excises a definition |
+| `py_add_def` | `path`, `name`, `new_content`, `anchor_type`, `anchor_symbol` | Inserts with 1-space indent normalization |
+| `py_move_def` | `src_path`, `dest_path`, `name`, `dest_name`, `anchor_type`, `anchor_symbol` | Relocates code |
+| `py_region_wrap` | `path`, `start_line`, `end_line`, `region_name` | Wraps line range in `#region` / `#endregion` |
+
+### C/C++ AST Tools (10)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `ts_c_get_skeleton` | `path` | C function signatures and struct definitions, bodies replaced |
+| `ts_cpp_get_skeleton` | `path` | C++ class/struct/method signatures, inheritance |
+| `ts_c_get_code_outline` | `path` | C outline |
+| `ts_cpp_get_code_outline` | `path` | C++ outline with classes and inheritance |
+| `ts_c_get_definition` | `path`, `name` | C struct or function source |
+| `ts_cpp_get_definition` | `path`, `name` | C++ class, struct, or method source (supports `Class::method`) |
+| `ts_c_update_definition` | `path`, `name`, `new_content` | Surgical C replacement |
+| `ts_cpp_update_definition` | `path`, `name`, `new_content` | Surgical C++ replacement |
+| `ts_c_get_signature` | `path`, `name` | C function/struct declaration |
+| `ts_cpp_get_signature` | `path`, `name` | C++ method/function declaration |
+
+All C/C++ tools use **tree-sitter** (via `src/file_cache.py`'s `ASTParser`).
+
+### Analysis Tools (3)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `derive_code_path` | `target`, `max_depth` | Traces execution path of a function across multiple files |
+| `py_get_imports` | `path` | AST-parsed dependency list |
+| `py_find_usages` | `path`, `name` | String match search |
+
+### Network Tools (2)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `web_search` | `query` | DuckDuckGo HTML scrape via `_DDGParser` (HTMLParser subclass). Returns top 5 results. |
+| `fetch_url` | `url` | Fetches URL, strips HTML via `_TextExtractor` |
+
+### Runtime Tools (1)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `get_ui_performance` | (none) | FPS, Frame Time, CPU, Input Lag via injected `perf_monitor_callback` |
+
+### Beads Tools (4)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `bd_list` | (none) | Lists all beads in active `.beads/` repo |
+| `bd_create` | `title`, `description` | Creates a new bead |
+| `bd_update` | `bead_id`, `status` | Updates bead status |
+| `bd_ready` | (none) | Lists beads with no unresolved dependencies |
+
+---
+
+## The `dispatch` Function
+
+The single entry point for all tool calls.
+
+```python
+def dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
+    """Dispatch an MCP tool call by name. Returns the result as a string."""
+```
+
+`dispatch` always returns a string, errors included. The AI client receives this string and appends it to the conversation history.
+
+### Dispatch Flow
+
+```python
+def dispatch(tool_name, tool_input):
+    if tool_name.startswith("bd_"):
+        return _dispatch_beads(tool_name, tool_input)
+    if tool_name == "read_file":
+        return _read_file(tool_input["path"])
+    if tool_name == "py_get_skeleton":
+        return _py_get_skeleton(tool_input["path"])
+    # ... etc, one branch per tool ...
+    return f"ERROR: unknown tool: {tool_name}"
+```
+
+The `bd_*` tools are dispatched separately because they require an active `.beads/` repository.
+
+### Async Dispatch
+
+```python
+async def async_dispatch(tool_name, tool_input) -> str:
+    """Async version of dispatch. Uses asyncio.to_thread for blocking I/O."""
+    return await asyncio.to_thread(dispatch, tool_name, tool_input)
+```
+
+For concurrent tool execution (when the AI emits multiple calls in one turn), the AI client uses `asyncio.gather` over `async_dispatch`.
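To make the concurrency pattern concrete, here is a minimal, self-contained sketch of fanning several tool calls out with `asyncio.gather`. The `dispatch` below is a stand-in that only simulates blocking work, and `run_tool_calls` is a hypothetical helper name; neither is taken from `src/mcp_client.py`. The call shape (`name` / `args`) follows the architecture diagram.

```python
import asyncio
import time
from typing import Any


def dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
    """Stand-in for mcp_client.dispatch (assumption): blocks, then returns a string."""
    time.sleep(0.05)  # simulate blocking file I/O
    if tool_name == "read_file":
        return f"contents of {tool_input['path']}"
    return f"ERROR: unknown tool: {tool_name}"


async def async_dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
    """Run the blocking dispatch in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(dispatch, tool_name, tool_input)


async def run_tool_calls(calls: list[dict[str, Any]]) -> list[str]:
    """Hypothetical helper: run one turn's tool calls concurrently.

    asyncio.gather returns results in the order the awaitables were passed,
    so each result lines up with the call that produced it.
    """
    return await asyncio.gather(
        *(async_dispatch(call["name"], call["args"]) for call in calls)
    )


calls = [
    {"name": "read_file", "args": {"path": "a.py"}},
    {"name": "read_file", "args": {"path": "b.py"}},
    {"name": "no_such_tool", "args": {}},
]
results = asyncio.run(run_tool_calls(calls))
for call, result in zip(calls, results):
    print(call["name"], "->", result)
```

Because errors come back as ordinary strings rather than exceptions, one failing call does not cancel the others in the same batch.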
+
+---
+
+## The 3-Layer Security Details
+
+### Blacklist
+
+Always blocked, regardless of allowlist:
+- `history.toml`
+- `*_history.toml`
+- `config.toml`
+- `credentials.toml`
+
+These are matched against the filename only (no path component) in `_is_allowed`. The three literal names must match exactly; `*_history.toml` covers any filename ending in `_history.toml`.
+
+### Allowlist Construction Order
+
+```python
+def configure(file_items, base_dirs):
+    global _primary_base_dir  # rebinding a module-level name requires `global`
+
+    # Reset
+    _allowed_paths.clear()
+    _base_dirs.clear()
+
+    # Primary base dir from first entry
+    if base_dirs:
+        _primary_base_dir = Path(base_dirs[0]).resolve()
+        _base_dirs.add(_primary_base_dir)
+
+    # Add all file item paths
+    for f in file_items:
+        if isinstance(f, dict) and "path" in f:
+            try:
+                _allowed_paths.add(Path(f["path"]).resolve())
+            except (OSError, ValueError):
+                pass
+```
+
+`_primary_base_dir` is the first entry of `base_dirs`; it is used for relative-path resolution.
+
+### Resolution Edge Cases
+
+`_resolve_and_check` handles:
+- Absolute paths (used as-is)
+- Relative paths (resolved against `_primary_base_dir`)
+- Symlinks (resolved via `Path.resolve()`)
+- Windows path separators
+- UNC paths
+
+If resolution fails (e.g., the path doesn't exist), it returns the error to the AI.
+
+---
+
+## External MCP Servers
+
+In addition to the 45 native tools, `mcp_client.py` manages **external MCP servers** via `ExternalMCPManager`:
+
+### `StdioMCPServer` (in `src/mcp_client.py`)
+
+Manages a local MCP server via subprocess (stdin/stdout).
+
+```python
+server = StdioMCPServer(
+    name="my_server",
+    command=["python", "-m", "my_mcp_server"],
+    cwd="/path/to/server",
+)
+server.start()
+tools = server.list_tools()  # Get the server's tool schemas
+result = server.call_tool("tool_name", {"arg": "value"})
+server.stop()
+```
+
+### `RemoteMCPServer` (in `src/mcp_client.py`)
+
+SSE-based remote MCP server integration. Foundation for connecting to remote MCP services.
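The guide does not show how `RemoteMCPServer` reads its stream, so the following is only a sketch of the general technique: decoding Server-Sent Events whose `data:` payloads are JSON-RPC 2.0 messages. The function name `iter_sse_json` and the sample stream are hypothetical, and nothing here is taken from `src/mcp_client.py`.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded JSON object per SSE event (hypothetical helper).

    SSE framing: `field: value` lines, a blank line ends an event, lines
    starting with ':' are comments, and multiple `data:` lines in one event
    are joined with newlines.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if it carried data.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
    # An event not terminated by a blank line is discarded.


# Sample stream (made up for illustration): one result, one multi-line error.
stream = [
    ": keep-alive\n",
    "event: message\n",
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}\n',
    "\n",
    'data: {"jsonrpc": "2.0", "id": 2,\n',
    'data:  "error": {"code": -32601, "message": "Method not found"}}\n',
    "\n",
]
messages = list(iter_sse_json(stream))
for message in messages:
    print(message["id"], "error" in message)
```

A real client would feed this from an HTTP response body and match each `id` against its pending requests; both steps are left out here.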
+
+### `ExternalMCPManager`
+
+Manages multiple server lifecycles:
+
+```python
+manager = ExternalMCPManager()
+manager.add_server(server_config)  # Stdio or Remote
+tools = manager.get_all_tools()  # All tools from all servers
+manager.stop_all()
+```
+
+The `dispatch` function transparently routes calls to external server tools as well as native ones.
+
+### JSON-RPC 2.0 Engine
+
+External MCP servers use JSON-RPC 2.0 over their respective transports (stdio or SSE). The MCP client implements:
+- Request ID generation
+- Async request/response matching
+- Timeout handling
+- Error code mapping (JSON-RPC error codes → string error messages)
+
+---
+
+## Public API
+
+| Function | Purpose |
+|---|---|
+| `configure(file_items, base_dirs)` | Build the allowlist |
+| `dispatch(tool_name, tool_input)` | Call a tool by name |
+| `async_dispatch(tool_name, tool_input)` | Async version |
+| `get_tool_schemas() -> list[dict]` | All tool schemas (for AI capability declaration) |
+| `is_allowed(path: str) -> bool` | Check if a path is allowed (for testing) |
+| `get_external_mcp_manager() -> ExternalMCPManager` | Get the singleton manager |
+
+---
+
+## Configuration
+
+```toml
+# config.toml
+[mcp]
+enabled = true
+blacklist = ["*.pem", "*.key", ".env"]  # Additional patterns to always block
+allow_symlinks = false
+```
+
+External MCP server config (`mcp_config.json`, standard format):
+
+```json
+{
+  "servers": [
+    {
+      "name": "filesystem",
+      "command": ["python", "-m", "filesystem_mcp"],
+      "env": {}
+    }
+  ]
+}
+```
+
+Located at `/mcp_config.json` or `/mcp_config.json`.
+
+---
+
+## Testing
+
+### Unit Tests
+
+`tests/test_arch_boundary_phase1.py::test_mcp_client_whitelist_enforcement`: tests the security model.
+
+`tests/test_mcp_ts_integration.py`: tests tree-sitter integration.
+
+### Mocking MCP
+
+Tests that should not touch the real filesystem can replace `dispatch` with a fake via pytest's `monkeypatch` fixture:
+
+```python
+def test_my_code(monkeypatch):
+    def fake_dispatch(name, args):
+        if name == "read_file":
+            return "mocked content"
+        return ""
+    # Patch by dotted path so callers that look up src.mcp_client.dispatch get the fake
+    monkeypatch.setattr("src.mcp_client.dispatch", fake_dispatch)
+    # ... test code ...
+```
+
+---
+
+## Performance
+
+- **File I/O**: synchronous (blocks the calling thread). Use `async_dispatch` for parallel calls.
+- **Tree-sitter parsing**: ~10-50ms per file for typical Python files. Cached in `_ast_cache` (mtime-based).
+- **Network tools** (`web_search`, `fetch_url`): 100ms-2s depending on the network.
+
+---
+
+## See Also
+
+- **[guide_architecture.md](guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)**: Threading model
+- **[guide_ai_client.md](guide_ai_client.md)**: How `ai_client` calls `dispatch`
+- **[guide_mma.md](guide_mma.md)**: How Tier 3 workers use these tools
+- **[conductor/tech-stack.md](../../conductor/tech-stack.md#srcmcp_clientpy)**: The architecture reference
+- **[tests/test_arch_boundary_phase1.py](../../tests/test_arch_boundary_phase1.py)**: Security model tests
+- **[docs/superpowers/specs/2026-06-02-clean-install-test-design.md](superpowers/specs/2026-06-02-clean-install-test-design.md)**: Opt-in clean install test that exercises `bd_*` tools