docs(mcp-client): add guide_mcp_client.md

2026-06-02 23:33:17 -04:00
parent f7663ab2e8
commit a58a2fd887
1 changed files with 410 additions and 0 deletions
@@ -0,0 +1,410 @@
+# `src/mcp_client.py` — MCP Tools (45 tools, 3-layer security)
+
+[Top](../README.md) | [Architecture](guide_architecture.md) | [Tools & IPC](guide_tools.md) | [Testing](guide_testing.md)
+
+---
+
+## Overview
+
+`src/mcp_client.py` (~81KB) is the **MCP (Model Context Protocol) tool implementation** for Manual Slop. It provides 45 tools that the AI can invoke to read/write files, analyze code structure, search symbols, and more.
+
+The module implements the **client side** of MCP — it provides the tools that an AI model can call during a conversation. It also implements the project's strict filesystem security model.
+
+---
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────┐
+│  ai_client.send(...)                              │
+│  AI returns: { "name": "read_file", "args": {...} } │
+└─────────────────┬───────────────────────────────┘
+                  │ calls
+                  ▼
+┌─────────────────────────────────────────────────┐
+│  mcp_client.dispatch(tool_name, tool_input)      │
+│  Routes to the registered tool function           │
+│  Returns tool result as string                    │
+└─────────────────┬───────────────────────────────┘
+                  │ calls (with security check)
+                  ▼
+┌─────────────────────────────────────────────────┐
+│  3-Layer Security: Allowlist → Validate → Resolve │
+└─────────────────┬───────────────────────────────┘
+                  │ passes
+                  ▼
+┌─────────────────────────────────────────────────┐
+│  The actual tool function (45 of them)            │
+└─────────────────────────────────────────────────┘
+```
+
+---
+
+## The 3-Layer Security Model
+
+Every filesystem access passes through 3 layers:
+
+### Layer 1: Allowlist Construction (`configure`)
+
+Called by `ai_client` before each send cycle (and on project load):
+
+```python
+def configure(file_items: list[dict], base_dirs: list[str]) -> None:
+    """Build the allowlist from the project's tracked files and base dirs."""
+    global _primary_base_dir  # rebind the module-level name instead of creating a local
+    _allowed_paths.clear()
+    _base_dirs.clear()
+    if base_dirs:
+        _primary_base_dir = Path(base_dirs[0]).resolve()
+        _base_dirs.add(_primary_base_dir)
+    for f in file_items:
+        if isinstance(f, dict) and "path" in f:
+            _allowed_paths.add(Path(f["path"]).resolve())
+    # Blacklist: history.toml and config.toml are NEVER allowed
+```
+
+After this call, `_allowed_paths` contains every file the AI is allowed to touch.
+
+### Layer 2: Path Validation (`_is_allowed`)
+
+Called on every path before any I/O:
+
+```python
+def _is_allowed(path: Path) -> bool:
+    """Return True if `path` is within the allowlist."""
+    if path.name in {"history.toml", "config.toml", "credentials.toml"}:
+        return False
+    if path.name.endswith("_history.toml"):  # implements the `*_history.toml` pattern
+        return False
+    # ... checks against _allowed_paths and _base_dirs
+```
+
+Returns `False` for any path the AI is not allowed to touch.
+
+### Layer 3: Resolution Gate (`_resolve_and_check`)
+
+The final gate. Resolves the path (handling symlinks, relative paths) and re-checks.
+
+```python
+def _resolve_and_check(raw_path: str) -> tuple[Path | None, str]:
+    """Resolve raw_path and verify it passes the allowlist check."""
+    p = Path(raw_path).resolve()
+    if not _is_allowed(p):
+        return None, f"ERROR: path not in allowlist: {raw_path}"
+    return p, ""
+```
+
+Every tool function calls this first. If it returns an error, the tool returns the error string to the AI.
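+
+The three layers can be exercised end to end with a small stand-alone sketch. It is not the module's actual code: the names below drop the leading underscore to mark them as hypothetical, and it assumes that a path inside any configured base dir is allowed, which the abbreviated excerpts above do not confirm.
+
+```python
+from __future__ import annotations
+
+from pathlib import Path
+
+# Hypothetical stand-ins for the module-level state described above.
+allowed_paths: set[Path] = set()
+base_dirs: set[Path] = set()
+BLOCKED_NAMES = {"history.toml", "config.toml", "credentials.toml"}
+
+
+def configure(file_items: list[dict], dirs: list[str]) -> None:
+    """Layer 1: rebuild the allowlist from tracked files and base dirs."""
+    allowed_paths.clear()
+    base_dirs.clear()
+    base_dirs.update(Path(d).resolve() for d in dirs)
+    for item in file_items:
+        if isinstance(item, dict) and "path" in item:
+            allowed_paths.add(Path(item["path"]).resolve())
+
+
+def is_allowed(path: Path) -> bool:
+    """Layer 2: blacklist first, then allowlist or base-dir containment."""
+    if path.name in BLOCKED_NAMES or path.name.endswith("_history.toml"):
+        return False
+    if path in allowed_paths:
+        return True
+    # Assumption: anything under a configured base dir is allowed.
+    return any(path.is_relative_to(base) for base in base_dirs)
+
+
+def resolve_and_check(raw_path: str) -> tuple[Path | None, str]:
+    """Layer 3: resolve symlinks and '..' segments, then re-check."""
+    resolved = Path(raw_path).resolve()
+    if not is_allowed(resolved):
+        return None, f"ERROR: path not in allowlist: {raw_path}"
+    return resolved, ""
+```
+
+Because the blacklist check runs before the allowlist check, a tracked `config.toml` is still rejected, and a `..` segment that climbs out of the base dir is caught only after resolution.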
+
+---
+
+## The 45 Tools
+
+The tools are organized by category. The full registered count is 45.
+
+### File I/O Tools (4)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `read_file` | `path` | UTF-8 file content extraction |
+| `list_directory` | `path` | Compact table: `[file/dir] name  size`. Applies blacklist filter. |
+| `search_files` | `path`, `pattern` | Glob pattern matching within an allowed directory |
+| `get_file_summary` | `path` | Heuristic summary via `summarize.py` (imports, classes, etc.) |
+
+### File Edit Tools (3)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `get_file_slice` | `path`, `start_line`, `end_line` | Returns specific line range (1-based, inclusive) |
+| `set_file_slice` | `path`, `start_line`, `end_line`, `new_content` | Replaces a line range with new content |
+| `edit_file` | `path`, `old_string`, `new_string`, `replace_all` | Exact string match replace. Preserves indentation. |
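+
+The 1-based, inclusive line range used by `get_file_slice` and `set_file_slice` maps onto Python slicing as sketched below. The helper names are hypothetical and operate on an in-memory string; the real tools also perform the path checks and file I/O.
+
+```python
+def get_slice(text: str, start_line: int, end_line: int) -> str:
+    """Return lines start_line..end_line (1-based, inclusive)."""
+    lines = text.splitlines(keepends=True)
+    # 1-based inclusive [start, end] equals 0-based half-open [start - 1, end).
+    return "".join(lines[start_line - 1:end_line])
+
+
+def set_slice(text: str, start_line: int, end_line: int, new_content: str) -> str:
+    """Replace lines start_line..end_line with new_content."""
+    lines = text.splitlines(keepends=True)
+    if new_content and not new_content.endswith("\n"):
+        new_content += "\n"  # keep the following line on its own line
+    lines[start_line - 1:end_line] = [new_content]
+    return "".join(lines)
+```
+
+With `text = "a\nb\nc\nd\n"`, `get_slice(text, 2, 3)` covers lines `b` and `c`, and `set_slice(text, 2, 3, "X")` replaces both with a single line.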
+
+### Python AST Tools (19)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `py_get_skeleton` | `path` | Skeleton: signatures + docstrings, bodies replaced with `...` |
+| `py_get_code_outline` | `path` | Hierarchical outline: classes, functions, methods with line ranges |
+| `py_get_symbol_info` | `path`, `name` | (source, line_number) for a class/function/method |
+| `py_get_definition` | `path`, `name` | Full source for a class/function/method |
+| `py_update_definition` | `path`, `name`, `new_content` | Surgical replacement (locates via ast, delegates to set_file_slice) |
+| `py_get_signature` | `path`, `name` | Just the `def` line through the colon |
+| `py_set_signature` | `path`, `name`, `new_signature` | Replaces only the signature, preserving body |
+| `py_get_class_summary` | `path`, `name` | Class docstring + method signatures |
+| `py_get_var_declaration` | `path`, `name` | Module/class-level variable assignment line(s) |
+| `py_set_var_declaration` | `path`, `name`, `new_declaration` | Surgical variable replacement |
+| `py_get_hierarchy` | `path`, `class_name` | Subclasses of a given class |
+| `py_get_docstring` | `path`, `name` | Docstring for module/class/function |
+| `py_get_imports` | `path` | AST-parsed dependency list |
+| `py_find_usages` | `path`, `name` | Exact string match search |
+| `py_check_syntax` | `path` | Syntax validation via `ast.parse()` |
+| `py_remove_def` | `path`, `name` | Excises a definition |
+| `py_add_def` | `path`, `name`, `new_content`, `anchor_type`, `anchor_symbol` | Inserts with 1-space indent normalization |
+| `py_move_def` | `src_path`, `dest_path`, `name`, `dest_name`, `anchor_type`, `anchor_symbol` | Relocates code |
+| `py_region_wrap` | `path`, `start_line`, `end_line`, `region_name` | Wraps line range in `#region` / `#endregion` |
+
+### C/C++ AST Tools (10)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `ts_c_get_skeleton` | `path` | C function signatures and struct definitions, bodies replaced |
+| `ts_cpp_get_skeleton` | `path` | C++ class/struct/method signatures, inheritance |
+| `ts_c_get_code_outline` | `path` | C outline |
+| `ts_cpp_get_code_outline` | `path` | C++ outline with classes and inheritance |
+| `ts_c_get_definition` | `path`, `name` | C struct or function source |
+| `ts_cpp_get_definition` | `path`, `name` | C++ class, struct, or method source (supports `Class::method`) |
+| `ts_c_update_definition` | `path`, `name`, `new_content` | Surgical C replacement |
+| `ts_cpp_update_definition` | `path`, `name`, `new_content` | Surgical C++ replacement |
+| `ts_c_get_signature` | `path`, `name` | C function/struct declaration |
+| `ts_cpp_get_signature` | `path`, `name` | C++ method/function declaration |
+
+All C/C++ tools use **tree-sitter** (via `src/file_cache.py`'s `ASTParser`).
+
+### Analysis Tools (3)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `derive_code_path` | `target`, `max_depth` | Traces execution path of a function across multiple files |
+| `py_get_imports` | `path` | AST-parsed dependency list (also listed under Python AST Tools) |
+| `py_find_usages` | `path`, `name` | Exact string match search (also listed under Python AST Tools) |
+
+### Network Tools (2)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `web_search` | `query` | DuckDuckGo HTML scrape via `_DDGParser` (HTMLParser subclass). Returns top 5 results. |
+| `fetch_url` | `url` | Fetches URL, strips HTML via `_TextExtractor` |
+
+### Runtime Tools (1)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `get_ui_performance` | (none) | FPS, Frame Time, CPU, Input Lag via injected `perf_monitor_callback` |
+
+### Beads Tools (4)
+
+| Tool | Parameters | Description |
+|---|---|---|
+| `bd_list` | (none) | Lists all beads in active `.beads/` repo |
+| `bd_create` | `title`, `description` | Creates a new bead |
+| `bd_update` | `bead_id`, `status` | Updates bead status |
+| `bd_ready` | (none) | Lists beads with no unresolved dependencies |
+
+---
+
+## The `dispatch` Function
+
+The single entry point for all tool calls.
+
+```python
+def dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
+    """Dispatch an MCP tool call by name. Returns the result as a string."""
+```
+
+Returns the result as a string (errors included). The AI client receives this string and appends it to the conversation history.
+
+### Dispatch Flow
+
+```python
+def dispatch(tool_name, tool_input):
+    if tool_name.startswith("bd_"):
+        return _dispatch_beads(tool_name, tool_input)
+    if tool_name == "read_file":
+        return _read_file(tool_input["path"])
+    if tool_name == "py_get_skeleton":
+        return _py_get_skeleton(tool_input["path"])
+    # ... etc, one branch per tool ...
+    return f"ERROR: unknown tool: {tool_name}"
+```
+
+The `bd_*` tools are dispatched separately because they require an active `.beads/` repository.
+
+### Async Dispatch
+
+```python
+async def async_dispatch(tool_name, tool_input) -> str:
+    """Async version of dispatch. Uses asyncio.to_thread for blocking I/O."""
+    return await asyncio.to_thread(dispatch, tool_name, tool_input)
+```
+
+For concurrent tool execution (when the AI emits multiple calls in one turn), the AI client uses `asyncio.gather` over `async_dispatch`.
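+
+A minimal sketch of that fan-out is shown below. The `dispatch` here is a stand-in that only sleeps, and `run_tool_calls` plus the `{"name": ..., "args": ...}` call shape are assumptions for illustration, not the AI client's actual code.
+
+```python
+import asyncio
+import time
+
+
+def dispatch(tool_name: str, tool_input: dict) -> str:
+    """Stand-in for mcp_client.dispatch: a blocking call."""
+    time.sleep(0.05)  # simulate blocking file or network I/O
+    return f"{tool_name}:{tool_input.get('path', '')}"
+
+
+async def async_dispatch(tool_name: str, tool_input: dict) -> str:
+    """Run the blocking dispatch in a worker thread."""
+    return await asyncio.to_thread(dispatch, tool_name, tool_input)
+
+
+async def run_tool_calls(calls: list[dict]) -> list[str]:
+    """Run all calls concurrently; results keep the order of `calls`."""
+    return await asyncio.gather(
+        *(async_dispatch(c["name"], c["args"]) for c in calls)
+    )
+```
+
+`asyncio.gather` returns results in the order the awaitables were passed, so each result can be matched back to its tool call by index even when the calls finish out of order.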
+
+---
+
+## The 3-Layer Security Details
+
+### Blacklist
+
+Always blocked, regardless of allowlist:
+- `history.toml`
+- `*_history.toml`
+- `config.toml`
+- `credentials.toml`
+
+These are matched against the filename only (the final path component) in `_is_allowed`: the three fixed names by exact comparison and `*_history.toml` as a suffix pattern.
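+
+One way to express all four entries uniformly is glob matching on the filename, sketched here with the standard library's `fnmatch`. `BLOCKED_PATTERNS` and `is_blacklisted` are hypothetical names, and the module itself may use plain string comparisons instead.
+
+```python
+from fnmatch import fnmatchcase
+from pathlib import Path
+
+# The four blacklist entries from the list above, treated as glob patterns.
+BLOCKED_PATTERNS = (
+    "history.toml",
+    "*_history.toml",
+    "config.toml",
+    "credentials.toml",
+)
+
+
+def is_blacklisted(path: Path) -> bool:
+    """Match only the final path component against each pattern."""
+    return any(fnmatchcase(path.name, pattern) for pattern in BLOCKED_PATTERNS)
+```
+
+`fnmatchcase` is case-sensitive on every platform. On a case-insensitive filesystem a name such as `Config.toml` would slip past it, so a stricter variant might lowercase the name first.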
+
+### Allowlist Construction Order
+
+```python
+def configure(file_items, base_dirs):
+    global _primary_base_dir  # rebind the module-level name instead of creating a local
+    # Reset
+    _allowed_paths.clear()
+    _base_dirs.clear()
+
+    # Primary base dir from first entry
+    if base_dirs:
+        _primary_base_dir = Path(base_dirs[0]).resolve()
+        _base_dirs.add(_primary_base_dir)
+
+    # Add all file item paths
+    for f in file_items:
+        if isinstance(f, dict) and "path" in f:
+            try:
+                _allowed_paths.add(Path(f["path"]).resolve())
+            except (OSError, ValueError):
+                pass
+```
+
+`_primary_base_dir` is the first base_dir; it's used for relative-path resolution.
+
+### Resolution Edge Cases
+
+`_resolve_and_check` handles:
+- Absolute paths (used as-is)
+- Relative paths (resolved against `_primary_base_dir`)
+- Symlinks (resolved via `Path.resolve()`)
+- Windows path separators
+- UNC paths
+
+If resolution fails (e.g., path doesn't exist), returns the error to the AI.
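+
+The relative-path case can be sketched as follows. `resolve_against` and `stays_inside` are hypothetical helpers, with the base dir passed explicitly instead of read from `_primary_base_dir`.
+
+```python
+from pathlib import Path
+
+
+def resolve_against(raw_path: str, base_dir: Path) -> Path:
+    """Resolve raw_path, anchoring relative paths at base_dir."""
+    candidate = Path(raw_path)
+    if not candidate.is_absolute():
+        candidate = base_dir / candidate
+    # resolve() follows symlinks and collapses '..' segments.
+    return candidate.resolve()
+
+
+def stays_inside(raw_path: str, base_dir: Path) -> bool:
+    """True if the resolved path is still located under base_dir."""
+    base = base_dir.resolve()
+    return resolve_against(raw_path, base).is_relative_to(base)
+```
+
+`Path.resolve()` defaults to `strict=False` and does not raise for a path that does not exist, so a sketch like this would need a separate existence check to report missing files.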
+
+---
+
+## External MCP Servers
+
+In addition to the 45 native tools, `mcp_client.py` manages **external MCP servers** via `ExternalMCPManager`:
+
+### `StdioMCPServer` (in `src/mcp_client.py`)
+
+Manages a local MCP server via subprocess (stdin/stdout).
+
+```python
+server = StdioMCPServer(
+    name="my_server",
+    command=["python", "-m", "my_mcp_server"],
+    cwd="/path/to/server",
+)
+server.start()
+tools = server.list_tools()  # Get the server's tool schemas
+result = server.call_tool("tool_name", {"arg": "value"})
+server.stop()
+```
+
+### `RemoteMCPServer` (in `src/mcp_client.py`)
+
+SSE-based remote MCP server integration. Foundation for connecting to remote MCP services.
+
+### `ExternalMCPManager`
+
+Manages multiple server lifecycles:
+
+```python
+manager = ExternalMCPManager()
+manager.add_server(server_config)  # Stdio or Remote
+tools = manager.get_all_tools()  # All tools from all servers
+manager.stop_all()
+```
+
+The `dispatch` function transparently routes calls to external server tools as well as native ones.
+
+### JSON-RPC 2.0 Engine
+
+External MCP servers use JSON-RPC 2.0 over their respective transports (stdio or SSE). The MCP client implements:
+- Request ID generation
+- Async request/response matching
+- Timeout handling
+- Error code mapping (JSON-RPC error codes → string error messages)
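+
+The ID generation, response matching, and error mapping steps above can be sketched as a small synchronous session object. `JsonRpcSession` is a hypothetical name, the transport and timeout handling are left out, and the real engine matches responses asynchronously.
+
+```python
+import itertools
+import json
+
+
+class JsonRpcSession:
+    """Tracks outstanding JSON-RPC 2.0 requests by ID (illustrative only)."""
+
+    def __init__(self) -> None:
+        self._ids = itertools.count(1)      # monotonically increasing request IDs
+        self._pending: dict[int, str] = {}  # request ID -> method name
+
+    def build_request(self, method: str, params: dict) -> str:
+        """Serialize a request and remember its ID until a response arrives."""
+        request_id = next(self._ids)
+        self._pending[request_id] = method
+        return json.dumps(
+            {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
+        )
+
+    def handle_response(self, raw: str) -> str:
+        """Match a response to its request and return a result or error string."""
+        message = json.loads(raw)
+        method = self._pending.pop(message.get("id"), None)
+        if method is None:
+            return "ERROR: response for unknown request id"
+        if "error" in message:
+            err = message["error"]
+            return f"ERROR: {method} failed ({err.get('code')}): {err.get('message')}"
+        return json.dumps(message.get("result"))
+```
+
+Errors are returned as strings, not raised, which mirrors how `dispatch` reports failures to the AI.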
+
+---
+
+## Public API
+
+| Function | Purpose |
+|---|---|
+| `configure(file_items, base_dirs)` | Build the allowlist |
+| `dispatch(tool_name, tool_input)` | Call a tool by name |
+| `async_dispatch(tool_name, tool_input)` | Async version |
+| `get_tool_schemas() -> list[dict]` | All tool schemas (for AI capability declaration) |
+| `is_allowed(path: str) -> bool` | Check if a path is allowed (for testing) |
+| `get_external_mcp_manager() -> ExternalMCPManager` | Get the singleton manager |
+
+---
+
+## Configuration
+
+```toml
+# config.toml
+[mcp]
+enabled = true
+blacklist = ["*.pem", "*.key", ".env"]  # Additional patterns to always block
+allow_symlinks = false
+```
+
+External MCP server config (`mcp_config.json`, standard format):
+
+```json
+{
+    "servers": [
+        {
+            "name": "filesystem",
+            "command": ["python", "-m", "filesystem_mcp"],
+            "env": {}
+        }
+    ]
+}
+```
+
+Located at `<user_config>/mcp_config.json` or `<project_root>/mcp_config.json`.
+
+---
+
+## Testing
+
+### Unit Tests
+
+`tests/test_arch_boundary_phase1.py::test_mcp_client_whitelist_enforcement` — tests the security model.
+
+`tests/test_mcp_ts_integration.py` — tests tree-sitter integration.
+
+### Mocking MCP
+
+Tests that don't want real filesystem access can mock the dispatch:
+
+```python
+def test_my_code(monkeypatch):
+    # `monkeypatch` is pytest's built-in fixture, so no unittest.mock import is needed.
+    def fake_dispatch(name, args):
+        if name == "read_file":
+            return "mocked content"
+        return ""
+    monkeypatch.setattr("src.mcp_client.dispatch", fake_dispatch)
+    # ... test code ...
+```
+
+---
+
+## Performance
+
+- **File I/O**: synchronous (blocks the calling thread). Use `async_dispatch` for parallel calls.
+- **Tree-sitter parsing**: ~10-50ms per file for typical Python files. Cached in `_ast_cache` (mtime-based).
+- **Network tools** (`web_search`, `fetch_url`): 100ms-2s depending on the network.
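+
+An mtime-based parse cache can be sketched as below. This is an illustration of the caching idea only: it uses Python's built-in `ast` module in place of tree-sitter so the example stays self-contained, and `parse_cache` and `parse_cached` are hypothetical names, not the module's `_ast_cache` code.
+
+```python
+import ast
+import os
+
+# Maps a file path to (modification time in ns, parsed tree).
+parse_cache: dict[str, tuple[int, ast.Module]] = {}
+
+
+def parse_cached(path: str) -> ast.Module:
+    """Parse a Python file, reusing the cached tree while its mtime is unchanged."""
+    mtime_ns = os.stat(path).st_mtime_ns
+    cached = parse_cache.get(path)
+    if cached is not None and cached[0] == mtime_ns:
+        return cached[1]  # cache hit: the file has not been modified
+    with open(path, encoding="utf-8") as handle:
+        tree = ast.parse(handle.read(), filename=path)
+    parse_cache[path] = (mtime_ns, tree)
+    return tree
+```
+
+A cache hit costs one `os.stat` call and no parsing. The trade-off is that an edit which leaves the modification time unchanged would go unnoticed.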
+
+---
+
+## See Also
+
+- **[guide_architecture.md](guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)** — Threading model
+- **[guide_ai_client.md](guide_ai_client.md)** — How `ai_client` calls `dispatch`
+- **[guide_mma.md](guide_mma.md)** — How Tier 3 workers use these tools
+- **[conductor/tech-stack.md](../../conductor/tech-stack.md#srcmcp_clientpy)** — The architecture reference
+- **[tests/test_arch_boundary_phase1.py](../../tests/test_arch_boundary_phase1.py)** — Security model tests
+- **[docs/superpowers/specs/2026-06-02-clean-install-test-design.md](superpowers/specs/2026-06-02-clean-install-test-design.md)** — Opt-in clean install test that exercises `bd_*` tools