From 0f74705d01284c0cdf15d01ad329ff75688aadf3 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Sat, 6 Jun 2026 20:56:12 -0400 Subject: [PATCH] docs(reports): add planning digest covering 5 tracks from 2026-06-06 session MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Single-session planning digest that captures: - The 5 tracks fully specced + planned (test_batching, qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) - Cross-cutting design themes (data-oriented, audit-driven, per-track commit + git note, out-of-scope-by-default) - The audit + data foundation (scripts/audit_weak_types.py; 430 -> ~60 findings targeted; 0 strong patterns; 26 unique type strings; ~80% (345/430) concentrated in 6 files) - The dependency graph + recommended execution order - Follow-up tracks already named in §12.1 of the relevant specs - Recommended future tracks (post-tracks documentation is the top pick) - Risks, open questions, and a complete file index It is a reference document that: - Future planners consult to understand the codebase's current state - The implementing agent uses to coordinate across tracks - The user reviews as a digest of the planning work It lives in the project's docs/reports/ directory alongside the existing Phase 5 reports (PHASE5_STABILISATION_REPORT.md, MUTATION_MATRIX_PHASE5.md, etc.).
--- docs/reports/PLANNING_DIGEST_20260606.md | 468 +++++++++++++++++++++++ 1 file changed, 468 insertions(+) create mode 100644 docs/reports/PLANNING_DIGEST_20260606.md diff --git a/docs/reports/PLANNING_DIGEST_20260606.md b/docs/reports/PLANNING_DIGEST_20260606.md new file mode 100644 index 00000000..c6439083 --- /dev/null +++ b/docs/reports/PLANNING_DIGEST_20260606.md @@ -0,0 +1,468 @@ +# Planning Digest: 5-Track Architectural Refactor (2026-06-06) + +**Status:** Planning complete; implementation in flight +**Author:** Tier 2 Tech Lead (brainstorming + spec + plan for all 5 tracks) +**Date:** 2026-06-06 +**Audience:** Future planners, the implementing agent, the user (as a reference / digest) + +--- + +## 1. Executive Summary + +In a single planning session, **5 architectural refactor tracks** were specced and planned end-to-end. Together they reshape the `manual_slop` codebase around three foundational design principles — **data-oriented error handling** (Fleury), **data-oriented types** (named, documented, generated), and **modular MCP architecture** (sub-MCPs by category). All 5 tracks share a common ancestor in the **startup_speedup_20260606** track (already shipped as of `12cec6ae`), which established the lazy-SDK-import convention the other tracks depend on. + +| # | Track | Status | Phases | Key new files | What it does | +|---|---|---|---|---|---| +| 1 | `test_batching_refactor_20260606` | Planned | 4 | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py` | Replaces alphabetical 4-at-a-time batching with tiered batching (Tier 1 unit + xdist, Tier 3 live_gui in one session, etc.) | +| 2 | `qwen_llama_grok_integration_20260606` | Planned | 6 | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` | Adds Qwen (DashScope), Llama (Ollama + OpenRouter + custom URL), Grok (xAI). Introduces the Vendor Capability Matrix. 
| +| 3 | `data_oriented_error_handling_20260606` | Planned | 5 | `src/result_types.py` | Introduces `Result[T]`, `ErrorInfo`, `NilPath` per Fleury. Removes the `ProviderError` exception. Marks `send()` `@deprecated`; adds `send_result()`. | +| 4 | `data_structure_strengthening_20260606` | Planned | 2 | `src/type_aliases.py`, `scripts/generate_type_registry.py` | Introduces 10 `TypeAlias` definitions + 1 `NamedTuple` to name the anonymous `dict[str, Any]` / `list[dict[...]]` sites (430 at baseline). Adds auto-generated `docs/type_registry/`. | +| 5 | `mcp_architecture_refactor_20260606` | Planned | 7 | `src/mcp_.py` (7 files), `src/mcp_client_security.py` | Splits the 2,205-line `mcp_client.py` into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. | + +**Combined impact:** ~5 new framework files; ~6 modified framework files; ~6 modified high-traffic files (for the type-aliases refactor); 1 monolithic file split into a slim controller plus 9 new focused files; 1 new CI gate script; 1 new docs directory. + +--- + +## 2. Session Context + +### 2.1 Workflow model + +The user is operating in a **planning / execution split** mode: +- **This session:** Tier 2 Tech Lead (me) does brainstorming → spec → plan for each track. No track implementation code is written or executed. +- **External session:** Another agent does the implementation. It picks up each `plan.md` and executes it task by task via the project's MMA tier system. + +This split lets the user focus on strategy (planning) while the heavy lifting (execution) happens in parallel.
+ +### 2.2 The pre-existing baseline + +Before this session, the project had: +- **277 test files** in `tests/` (`test_*.py` + `*_sim.py`) +- **53 src files** (`src/*.py`) +- **14 deep-dive guides** (`docs/guide_*.md`) +- **The startup_speedup_20260606 track was in flight** (Phase 6 complete per `253e1798`; track SHIPPED per `12cec6ae` in the same window as this planning session) +- **The test_batching_refactor_20260606 track had been planned** (spec + plan were in the folder but execution hadn't started) +- **Conductor convention was in place** — every track has `spec.md` + `metadata.json` + `state.toml`; the `tracks.md` registry lists all tracks with their `[track-created: ]` references + +### 2.3 What changed during this session + +The user asked for 5 different refactor specs in sequence: +1. **Test batching refactor** — already-planned track; I reviewed and committed +2. **Qwen/Llama/Grok vendors + capability matrix** — new spec; multiple design questions resolved +3. **Data-oriented error handling (Fleury pattern)** — new spec; user brought the article + friend's notes +4. **Data structure strengthening (type aliases + named tuples)** — new spec; user proposed auto-generated docs over TypedDict migration +5. **MCP architecture refactor (sub-MCPs)** — new spec; user proposed `mcp_.py` naming + the DSL future idea + +For each, I followed the **brainstorming → spec → plan** flow per the user's stated preference. + +--- + +## 3. Cross-Cutting Design Themes + +Five design themes run through all the tracks. Understanding them makes each track's individual decisions coherent. + +### 3.1 Data-Oriented Design (Fleury / Acton / Lottes) + +The user explicitly references this in two of the five tracks (`data_oriented_error_handling_20260606` for errors; `mcp_architecture_refactor_20260606` for module boundaries). The framing is: + +- **Errors are just cases**, not special control-flow primitives. Use `Result[T]` with side-channel error lists, not exceptions. 
+- **Algorithms on data**, not methods on objects. The `MCPController` is a data structure; sub-MCPs are data; the dispatch is a function from data to data. +- **Stable names, not types**. Type aliases (`Metadata`, `FileItem`, etc.) name data roles; they don't enforce structure (enforcement is deferred to `TypedDict`, if that is ever adopted). +- **Shared code where possible**; unique code only where vendor-specific. The `_send__result()` functions in `ai_client.py` are thin boundary adapters; the `send_openai_compatible()` helper in `src/openai_compatible.py` is the shared algorithm. + +### 3.2 Capability / Pattern / Convention as first-class docs + +The user values explicit, discoverable conventions over implicit understanding. The tracks introduce these canonical references: +- `conductor/code_styleguides/error_handling.md` (Fleury patterns) +- `conductor/code_styleguides/type_aliases.md` (type alias conventions) +- `docs/type_registry/` (auto-generated per-source-file schema docs) +- The `mcp_.py` module naming convention (implicit: carried by the file names rather than a dedicated styleguide) + +`conductor/product-guidelines.md` is the umbrella; the styleguides are the detailed references. Any future track that introduces a new convention should follow this pattern. + +### 3.3 Audit + data-driven decisions + +Two of the five tracks are data-grounded: +- `test_batching_refactor_20260606`: addresses the actual problem (alphabetical 4-at-a-time batching) and is explicitly designed around the test categories the project already uses (Tier 1 unit, Tier 2 mock_app, Tier 3 live_gui, etc.). +- `data_structure_strengthening_20260606`: driven by the `scripts/audit_weak_types.py` findings (430 weak sites; ~80% (345) concentrated in 6 high-traffic files; 0 strong patterns; 26 unique type strings; top 4 = 86% of findings). + +The audit data is the source of truth. The data_structure_strengthening track's success criterion is a measurable drop in the audit count (430 → ~60 = 86% reduction).
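The sketch below shows what an audit of this kind measures. It is not `scripts/audit_weak_types.py`, which reportedly checks 14 patterns and also reports positive patterns. The three-shape regex, the function names, and the restriction to parameter, return, and annotated-assignment annotations are assumptions made for illustration. The scan parses the source with `ast`, normalizes each annotation with `ast.unparse()`, and counts the normalized strings that match a weak shape:

```python
import ast
import re
from collections import Counter

# Assumed subset of weak shapes: dict[str, Any] and list[dict[str, Any]],
# in either builtin or typing-module capitalization.
WEAK_TYPE = re.compile(r"[Dd]ict\[str, Any\]|[Ll]ist\[[Dd]ict\[str, Any\]\]")


def _annotations(tree: ast.AST):
    """Yield every parameter, return, and annotated-assignment annotation."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            a = node.args
            params = a.posonlyargs + a.args + a.kwonlyargs + [p for p in (a.vararg, a.kwarg) if p]
            for p in params:
                if p.annotation is not None:
                    yield p.annotation
            if node.returns is not None:
                yield node.returns
        elif isinstance(node, ast.AnnAssign):
            yield node.annotation


def count_weak_annotations(source: str) -> Counter:
    """Count annotations whose normalized text is one of the weak shapes."""
    found: Counter = Counter()
    for ann in _annotations(ast.parse(source)):
        text = ast.unparse(ann)  # normalized text, e.g. "dict[str, Any]" (Python 3.9+)
        if WEAK_TYPE.fullmatch(text):
            found[text] += 1
    return found


SAMPLE = '''
from typing import Any, Dict, List

def load(meta: dict[str, Any], *rest: Dict[str, Any]) -> list[dict[str, Any]]:
    items: List[Dict[str, Any]] = []
    ok: list[str] = []
    return items
'''

print(count_weak_annotations(SAMPLE))
```

Grouping findings by normalized string is what makes a claim such as "26 unique type strings" measurable. It also means a before/after count can serve as the success criterion.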
+ +### 3.4 Process: per-track commit + git note + checkpoint + +Every plan follows the same template: +- **Per-task commit**: 1 commit per Red-Green-Refactor step +- **Per-checkpoint git note**: `git notes add -m "..."` summarizing what the phase delivered +- **Per-checkpoint state.toml update**: `current_phase` advanced; `checkpointsha` filled in + +This template comes from the project's `conductor/workflow.md` and is applied consistently. The next planner or implementer should follow it. + +### 3.5 Out-of-scope-by-default; follow-up tracks for the next round + +The tracks explicitly defer out-of-scope work to follow-up tracks, which are documented in §12.1 of the parent spec: +- `public_api_migration_20260606`: removes the deprecated `send()` (from data_oriented_error_handling) +- `type_registry_ci_20260606`: wires `generate_type_registry.py --check` into CI (from data_structure_strengthening) +- `mcp_dsl_20260606`: per-MCP compact DSL for tool calls (from mcp_architecture_refactor) +- `typed_dict_migration_20260606`: converts the most-used aliases to `TypedDict` (initially planned; later replaced by the docs approach; kept as a future option) + +These follow-ups are listed in `conductor/tracks.md` as `[ ]` placeholders (item 0f etc.). They should be sequenced AFTER the 5 main tracks ship. + +--- + +## 4. The 5 Tracks in Detail + +### 4.1 `test_batching_refactor_20260606` + +**Goal:** Replace alphabetical 4-at-a-time batching with tiered batching that respects fixture-class boundaries. + +**Architecture:** +- `scripts/test_categorizer.py`: AST-based classifier that determines each test file's `FixtureClass` (UNIT, MOCK_APP, LIVE_GUI, HEADLESS, OPT_IN, PERFORMANCE) and its `batch_group` (e.g., `core`, `gui`, `mma`). +- `scripts/test_batcher.py`: Pure scheduler. `plan(records, options) -> list[Batch]` deterministically produces batches. +- `scripts/pytest_collection_order.py`: Conftest-loaded plugin for per-test ordering (opt-in per file).
+- `scripts/run_tests_batched.py`: Modified CLI orchestrator with `--tiers`, `--include-opt-in`, `--plan`, `--audit` modes. + +**Key decisions:** +- **Tier 3 (live_gui) is one pytest invocation**, not many. This is the single biggest runtime saving: the ~15s GUI startup is paid once instead of once per batch. +- **Tier 1 (unit) uses pytest-xdist** for parallelism. +- **Tier 0 (opt-in) is gated on BOTH env var AND CLI flag** (defense-in-depth: setting the env var alone shouldn't accidentally enable docker tests). +- **Hybrid classification**: auto-infer from filename + AST fixture scan; hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files. + +**Out of scope:** The scripts do NOT modify test files or fixtures; they only categorize and batch. New tests get sensible defaults automatically. + +**Current state:** Plan complete (`7fdab705` spec, `f7b11f7f` plan). Ready for execution. + +--- + +### 4.2 `qwen_llama_grok_integration_20260606` + +**Goal:** Add first-class support for Qwen, Llama, and Grok. Introduce the Vendor Capability Matrix. + +**Architecture:** +- `src/vendor_capabilities.py`: `VendorCapabilities` dataclass; `_REGISTRY` populated per (vendor, model). +- `src/openai_compatible.py`: shared `send_openai_compatible()` helper (data-oriented design: it operates on normalized data). +- `src/qwen_adapter.py`: DashScope-specific tool format translation + error classification. + +**Key decisions:** +- **Naming convention:** `_send__result()` returning `Result[str, ErrorInfo]`, applied across 8 vendors (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Llama, Grok). +- **Capability Matrix v1:** 7 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking. Audio and server-side code_execution are deferred to a future track. +- **UX adaptation:** 9 UI elements read the matrix, including the screenshot button, tools toggle, cache panel, stream progress, fetch models button, token budget max, and cost panel.
+- **Exceptions stay at the SDK boundary:** the OpenAI-compatible SDK keeps raising; the new `_send__result()` functions catch those exceptions and convert them to `ErrorInfo`. Per Fleury: "exceptions are reserved for the SDK boundary." + +**Coordination with `startup_speedup_20260606`:** Qwen's DashScope SDK adds a new import; the audit script `scripts/audit_main_thread_imports.py` checks that it stays off the main thread and is imported only on a worker thread. This is verified against the baseline in Phase 1 of the track. + +**Current state:** Plan complete (`b17cbbde` plan). Ready for execution. + +--- + +### 4.3 `data_oriented_error_handling_20260606` + +**Goal:** Adopt Ryan Fleury's "errors are just cases" framework as a project convention. + +**Architecture:** +- `src/result_types.py`: `ErrorKind` enum, `ErrorInfo` dataclass, `Result[T]` generic, `NilPath` + `NilRAGState` sentinel singletons. +- `src/mcp_client.py` (this track's data-oriented pass over MCP): (p, err) tuples → `Result[Path]`; `assert p is not None` → nil-sentinel. +- `src/ai_client.py`: `ProviderError` exception REMOVED; `_classify__error()` returns `ErrorInfo`; `_send_()` renamed to `_send__result()` returning `Result[str]`. +- `src/rag_engine.py`: methods return `Result` instead of raising. + +**Key decisions:** +- **The public API is preserved; only internals change.** The private `_send_()` functions are renamed to `_send__result()` and now return `Result[str]`. The public `send()` is kept but marked `@typing_extensions.deprecated`; the new `send_result()` returns `Result[str]`. The actual breaking change happens in the follow-up `public_api_migration_20260606` track. +- **`ProviderError` is FULLY REMOVED**, not kept as a thin internal exception. Per Fleury, exceptions belong at the SDK boundary only; once the boundary converts them to `ErrorInfo`, no exception type is needed. +- **Deprecation warnings silenced in tests:** `tests/conftest.py` adds `filterwarnings("ignore::DeprecationWarning:src.ai_client")` during the transition. If this is applied as a standard warnings filter, note that `typing_extensions.deprecated` attributes the warning to the *calling* module. A filter keyed on `src.ai_client` therefore matches only calls made from inside that module; calls from tests or other `src/` modules may need a message-based filter.
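The sketch below shows the Result/ErrorInfo shape and the boundary-adapter idea, under stated assumptions. The class names `ErrorKind`, `ErrorInfo`, and `Result` come from this section. Every field, the `ErrorKind` members, the `TimeoutError` mapping, the `_classify_error` name, and the injected `sdk_call` parameter are hypothetical. The digest writes both `Result[T]` and `Result[str, ErrorInfo]`; this sketch uses the single-parameter form and carries errors in a side-channel list. `NilPath`/`NilRAGState` and the deprecated `send()` wrapper are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical cases; the real enum is defined by the spec.
    TIMEOUT = auto()
    UNKNOWN = auto()


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    """A value plus a side-channel list of errors: errors are data, not control flow."""
    value: T | None = None
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def _classify_error(exc: Exception) -> ErrorInfo:
    # Hypothetical mapping from SDK exceptions to error cases.
    if isinstance(exc, TimeoutError):
        return ErrorInfo(ErrorKind.TIMEOUT, str(exc))
    return ErrorInfo(ErrorKind.UNKNOWN, f"{type(exc).__name__}: {exc}")


def send_result(prompt: str, sdk_call: Callable[[str], str]) -> Result[str]:
    """Boundary adapter: the only place that catches SDK exceptions."""
    try:
        return Result(value=sdk_call(prompt))
    except Exception as exc:
        return Result(errors=[_classify_error(exc)])


def _timeout(_prompt: str) -> str:
    raise TimeoutError("upstream timed out")


good = send_result("hi", str.upper)
bad = send_result("hi", _timeout)
print(good.ok, good.value)          # True HI
print(bad.ok, bad.errors[0].kind)   # False ErrorKind.TIMEOUT
```

Callers branch on `result.ok` instead of wrapping calls in `try`/`except`. This is what allows `ProviderError` to disappear once every vendor path goes through an adapter of this kind.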
+ +**Coordination with pending tracks:** +- `mcp_architecture_refactor_20260606` assumes the `Result` pattern is in place (the new sub-MCPs return `Result[str, ErrorInfo]` from `invoke()`). +- `data_structure_strengthening_20260606` supplies the `Metadata` family aliases, and the result types refer to them by name (e.g., `Result[FileItems]`). +- Both track specs have a §10 "Coordination with Pending Tracks" section that documents the expected post-track state and verifies it before proceeding. + +**Current state:** Plan complete (`f7b11f7f` plan). Ready for execution. + +--- + +### 4.4 `data_structure_strengthening_20260606` + +**Goal:** Name the anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types in the codebase (430 sites at baseline). + +**Architecture:** +- `src/type_aliases.py`: 10 `TypeAlias` definitions + 1 `NamedTuple` (`FileItemsDiff`). + - `Metadata` (root), `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback` +- `scripts/audit_weak_types.py` (already committed `84fd9ac9`): AST-based static analyzer. `Finding` dataclass; `--json`, `--top N`, `--verbose` modes. After this track: also `--strict` mode (CI gate; exits 1 if new weak sites are introduced). +- `scripts/generate_type_registry.py` (Phase 2): AST-based registry generator with 3 modes: default (regenerate), `--check` (CI; exits 1 on drift), `--diff` (dry run). Writes `docs/type_registry/.md` per source file. +- `docs/type_registry/`: auto-generated per-source-file markdown references for the LLM to consult.
+ +**The data that drove the design:** +- 430 weak sites across 29 of 61 files in `src/` +- 0 strong patterns currently (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` in the relevant shapes) +- 26 unique type strings after normalization +- Top 4 unique strings = 86% of findings (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`) +- File distribution: ai_client.py (139), app_controller.py (86), models.py (51), api_hook_client.py (32), project_manager.py (20), aggregate.py (17) = 345 in 6 files; the rest in 23 lower-impact files + +**The "docs over TypedDict" decision (key user feedback mid-track):** +- Original draft proposed a follow-up track to convert aliases to `TypedDict`s. +- User pushed back: pay the token cost (LLM reads the docs) instead of the upfront cost (designing `TypedDict` schemas for every type). +- The `docs/type_registry/` generator is the result: an LLM can `cat docs/type_registry/ai_client.md` to see the fields of every struct in `src/ai_client.py` without the code having to enforce the structure at runtime. +- The 5-pattern structure (Nil sentinel, Zero-init, Fail-early, AND-over-OR, Side-channel errors) is documented in the styleguide. + +**Coordination:** +- This track's aliases compose with the `Result[T]` from `data_oriented_error_handling_20260606`: `Result[FileItems]`, `Result[CommsLogEntry]`, etc. are valid generics. +- The audit script is the **permanent CI gate** for this convention. New `dict[str, Any]` in a PR fails `--strict` mode. + +**Current state:** Plan complete (`91475781` plan). Ready for execution. + +--- + +### 4.5 `mcp_architecture_refactor_20260606` + +**Goal:** Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. 
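The controller shape described in the rest of this section can be sketched as follows: a `SubMCP` protocol, explicit `register()` calls, an inverted `_tool_index` for O(1) dispatch, and an external sub-controller that is consulted only after the native sub-MCPs miss. This is a hedged illustration, not the planned `src/mcp_client.py`. `EchoMCP`, the simplified string-error `Result`, and the duplicate-name check are hypothetical, and the 3-layer path security that runs before delegation is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


@dataclass
class Result:
    # Simplified stand-in for the Result type from data_oriented_error_handling.
    value: str = ""
    errors: list[str] = field(default_factory=list)


class SubMCP(Protocol):
    name: str
    description: str
    tools: dict[str, Callable[..., str]]

    def invoke(self, tool: str, args: dict[str, Any]) -> Result: ...


class EchoMCP:
    """Hypothetical sub-MCP with a single tool."""

    name = "echo"
    description = "Returns its input unchanged."

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., str]] = {"echo": lambda text: text}

    def invoke(self, tool: str, args: dict[str, Any]) -> Result:
        return Result(value=self.tools[tool](**args))


class MCPController:
    def __init__(self, external: SubMCP | None = None) -> None:
        self._tool_index: dict[str, SubMCP] = {}  # tool name -> owning sub-MCP
        self._external = external  # consulted only after native sub-MCPs miss

    def register(self, sub: SubMCP) -> None:
        clashes = self._tool_index.keys() & sub.tools.keys()
        if clashes:  # fail early on name collisions (assumed policy)
            raise ValueError(f"tools already registered: {sorted(clashes)}")
        for tool in sub.tools:
            self._tool_index[tool] = sub

    def dispatch(self, tool: str, args: dict[str, Any]) -> Result:
        sub = self._tool_index.get(tool)  # O(1) lookup instead of an if/elif chain
        if sub is not None:
            return sub.invoke(tool, args)
        if self._external is not None and tool in self._external.tools:
            return self._external.invoke(tool, args)
        return Result(errors=[f"unknown tool: {tool}"])


controller = MCPController()
controller.register(EchoMCP())
print(controller.dispatch("echo", {"text": "hi"}).value)  # hi
print(controller.dispatch("missing", {}).errors)          # ['unknown tool: missing']
```

Adding a sub-MCP in this shape means writing the class and adding one `register()` call. An unknown tool comes back as data in `errors` rather than as an exception.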
+ +**Architecture:** +- `src/mcp_client.py` (modified, slim): `SubMCP` Protocol + `MCPController` class + module-level `controller` singleton + `ALL_SUB_MCPS` registration list + re-export shim from `mcp_client_legacy`. +- `src/mcp_client_legacy.py` (NEW): the OLD `mcp_client.py` content. Re-exported for backward compat. +- `src/mcp_client_security.py` (NEW): 3-layer security (Allowlist → Resolve → Validate) returning `Result[Path]`. +- `src/mcp_file_io.py` (9 tools), `src/mcp_python.py` (14), `src/mcp_c.py` (5), `src/mcp_cpp.py` (5), `src/mcp_web.py` (2), `src/mcp_analysis.py` (2): native sub-MCPs. +- `src/mcp_external.py`: the existing `ExternalMCPManager` extracted; class name preserved as `ExternalMCP` for compat. + +**Naming convention (per user direction):** `mcp_.py` for native MCPs. The user explicitly said this; the convention is locked in. + +**Key design decisions:** +- **Sub-MCP shape:** class with `name` / `description` / `tools` (dict) / `invoke()` (returns `Result[str, ErrorInfo]`). +- **Registration mechanism:** explicit `controller.register(FileIOMCP())` at the bottom of `mcp_client.py`. New sub-MCP = create the file + add 2 lines to the registration. No magic, no auto-discovery. +- **Controller-level security:** the 3-layer security runs BEFORE delegating to sub-MCPs. Sub-MCPs receive already-validated paths. Testable in isolation. +- **Dispatch inversion:** the controller uses an inverted-dict `self._tool_index[tool_name] -> sub_mcp` for O(1) lookup. The current if/elif chain is O(n) per dispatch. +- **External MCP is NOT in `ALL_SUB_MCPS`** — it's a sub-controller. The main controller delegates to it AFTER native sub-MCPs miss. + +**The "thin adapter" approach for v1:** +- Each sub-MCP's methods (e.g., `read_file`, `py_get_skeleton`) **delegate to the corresponding function in `mcp_client_legacy.py`**. 
This keeps the legacy module as the source of truth for the implementation; the new `mcp_.py` is a thin adapter that adds the class shape and the `Result` wrapping (path security runs in the controller before delegation). +- A future track can move the actual implementations into the sub-MCP files once the architecture is established. For v1, delegation is the safer path. + +**Backward compatibility:** +- `src/mcp_client_legacy.py` keeps all 45+ old function definitions. +- `src/mcp_client.py` re-exports them from the legacy module, so it doubles as a slim compatibility shim. +- The 4 existing test files (`test_mcp_client_beads.py`, `test_mcp_config.py`, `test_mcp_perf_tool.py`, `test_mcp_ts_integration.py`) and `src/app_controller.py:61` (the direct `mcp_client.py_get_symbol_info` call) continue to work unchanged. + +**The DSL future (per the user's notes on APL/K/Cosy):** +- The user shared a friend's idea: per-MCP compact dialects (like command-line syntax, but more flexible) instead of JSON. +- Acknowledged in the spec as out of scope for this track ("no time for that"). +- Documented as the `mcp_dsl_20260606` follow-up in spec §12.1. +- The sub-MCP architecture is the natural unit to pair with a DSL emitter in the future. + +**Current state:** Plan complete (`cf01870b` plan). Ready for execution. + +--- + +## 5. The Audit & Data Foundation + +The most data-grounded track is `data_structure_strengthening_20260606`. The audit that drove it is committed at `84fd9ac9`: + +``` +File: scripts/audit_weak_types.py +Size: 281 lines +Modes: default (human-readable), --json, --top N, --verbose +Detection: AST-based; regex over ast.unparse() of type annotations +Patterns detected: 14 (Dict[str, Any], list[dict[...]], Tuple[...], Optional[...], assign-tuple-literal, ...)
+Positive patterns detected: TypeAlias, NamedTuple, @dataclass, pydantic.BaseModel +Exit codes: 0 = informational, 1 = usage error +``` + +**Pre-track findings (baseline):** +- 430 weak sites in 29 of 61 files +- 0 strong patterns +- 26 unique type strings +- Top 4 unique strings = 86% of findings + +**Post-track target:** +- ~60 weak sites, all in the 23 lower-impact files (the 6 high-traffic files contribute 0). At baseline those 23 files hold 85 sites (430 - 345), so the target also implies converting roughly 25 of them. +- 10 `TypeAlias` definitions + 1 `NamedTuple` in use +- `--strict` mode + baseline file as permanent CI gate + +This is **the most measurable track** in the planning session. Success is a concrete drop in the audit count. + +--- + +## 6. The Coordination Picture (dependencies) + +The 5 tracks form a dependency graph. An arrow means "blocks" (the upper track must land before the lower one): + +``` +startup_speedup_20260606 (SHIPPED) + ↓ + ├── test_batching_refactor_20260606 (planned) + │ + ├── qwen_llama_grok_integration_20260606 (planned) + │ ↓ + │ ├── data_oriented_error_handling_20260606 (planned) + │ │ ↓ + │ │ ├── public_api_migration_20260606 (follow-up; not yet specced) + │ │ └── type_registry_ci_20260606 (follow-up; not yet specced) + │ │ + │ └── data_structure_strengthening_20260606 (planned) + │ ↓ + │ └── type_registry_ci_20260606 (follow-up; not yet specced) + │ + └── mcp_architecture_refactor_20260606 (planned; depends on data_oriented + data_structure tracks) + ↓ + └── mcp_dsl_20260606 (follow-up; not yet specced) +``` + +**Critical insight:** `mcp_architecture_refactor_20260606` depends on BOTH `data_oriented_error_handling_20260606` (for `Result`) and `data_structure_strengthening_20260606` (for the `Metadata` aliases). If the implementing agent executes tracks in arbitrary order, it can break this dependency.
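The execution order is a topological sort of the graph above, which the standard-library `graphlib` module (Python 3.9+) can compute. The sketch transcribes the diagram's edges with the date suffixes dropped, and gives `type_registry_ci` both of the parents it is drawn under. It groups tracks into "waves": each track in a wave has all of its blockers done, so tracks within one wave could in principle run in parallel. It is an illustration only, not a tool any of the tracks defines.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# track -> tracks it waits on, transcribed from the diagram (date suffixes dropped)
DEPENDS_ON: dict[str, set[str]] = {
    "test_batching_refactor": {"startup_speedup"},
    "qwen_llama_grok_integration": {"startup_speedup"},
    "data_oriented_error_handling": {"qwen_llama_grok_integration"},
    "data_structure_strengthening": {"qwen_llama_grok_integration"},
    "mcp_architecture_refactor": {"data_oriented_error_handling", "data_structure_strengthening"},
    "public_api_migration": {"data_oriented_error_handling"},
    "type_registry_ci": {"data_oriented_error_handling", "data_structure_strengthening"},
    "mcp_dsl": {"mcp_architecture_refactor"},
}


def execution_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tracks into waves; every track in a wave has all its blockers done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
    print(number, wave)
```

With these edges the sketch yields five waves. `test_batching_refactor` lands in the same wave as `qwen_llama_grok_integration`, consistent with the note that it can run at any time once `startup_speedup` has shipped. `mcp_architecture_refactor` appears only after both data tracks.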
+ +The recommended execution order is the topological order: `startup_speedup` (done) → `qwen_llama_grok` → `data_oriented_error_handling` + `data_structure_strengthening` (in parallel) → `mcp_architecture_refactor` → `test_batching_refactor` (depends only on the already-shipped `startup_speedup`; can run anytime) → follow-up tracks. + +--- + +## 7. Follow-up Tracks Already Planned (Not in This Session's 5) + +Three of the specs name a follow-up in §12.1. Aggregated: + +| Follow-up | Parent track | Scope | +|---|---|---| +| `public_api_migration_20260606` | data_oriented_error_handling | Remove deprecated `ai_client.send()`; migrate all callers (multi_agent_conductor, app_controller, ~50 tests) to `send_result()` | +| `type_registry_ci_20260606` | data_structure_strengthening | Wire `generate_type_registry.py --check` into CI; add pre-commit hook; document per-track commit workflow | +| `mcp_dsl_20260606` | mcp_architecture_refactor | Per-MCP compact dialect for tool calls (APL/K/Cosy-inspired); an estimated ~5x token reduction per call | + +All three are listed in `conductor/tracks.md` as `[ ]` placeholders. They should be sequenced AFTER the 5 main tracks ship. None are urgent; all are improvements. + +--- + +## 8. Recommended Future Tracks (Beyond What's Planned) + +These are tracks I identified during this session but didn't fully spec, ranked by what I think is most important. + +### 8.1 Post-Tracks Documentation Synchronization (top pick) + +**Why:** The 5 planned tracks add 10+ new modules and change the architecture significantly. The existing docs (`docs/guide_*.md`) were last updated in the 2026-06-02 comprehensive docs refresh and will fall further behind once these tracks land. Stale docs are a major threat to AI readability: an LLM that reads a `guide_ai_client.md` written before `Result`/`ErrorInfo` existed is likely to assume the wrong shapes. + +**Scope (1-2 phases):** +- Phase 1: Update all existing guides (`guide_ai_client.md`, `guide_mcp_client.md`, etc.) to reflect the post-track state.
+- Phase 2: Add cookbooks ("How to add a new sub-MCP", "How to add a new AI vendor", "How to add a new result type") + a `docs/type_registry.md` index. + +**Why first:** Bounded and achievable. Closes the loop on all the planning work — each track ships a module; this track ships the docs that explain those modules. + +### 8.2 Test Coverage Audit & Improvement (runner-up) + +**Why:** The project has a stated >80% coverage target per `conductor/workflow.md`, but the actual current state is unknown. Under-tested areas are likely `app_controller.py` (4,153 lines; the orchestrator that touches everything) and `multi_agent_conductor.py` (the most complex control flow). The new modules from the 5 planned tracks each get unit tests in their respective tracks, but integration tests are sparse. + +**Scope (1-2 phases):** +- Phase 1: Run `pytest --cov=src --cov-report=html`; identify the bottom-10 modules by coverage; write tests to bring each to >80%. +- Phase 2: Add a coverage threshold to CI (e.g., `--cov-fail-under=80`); add per-module coverage badges to `docs/Readme.md`. + +### 8.3 Security Audit / Hardening + +**Why:** The 3-layer MCP security model is solid, but there are adjacent concerns: +- **Command injection in `run_powershell`** — the AI generates PowerShell commands; how is the risk of a malicious model call mitigated? The HITL dialog exists, but is it consistently applied? +- **Prompt injection** — the AI sees file content, web search results, Beads queries. A malicious file could inject instructions that the AI then follows. How is this sanitized? +- **Sensitive data in logs** — the `comms_log` records full API requests/responses. If a user includes an API key or password in a message, it ends up in the log. What's the redaction policy? + +**Scope (1-2 phases):** +- Phase 1: Threat model the AI tool-calling surface; document the existing mitigations; identify gaps. 
+- Phase 2: Add log redaction for known secret patterns; add a "dangerous command" detector for `run_powershell`; add an "untrusted content" marker for content from external sources. + +### 8.4 Dependency Hygiene + +**Why:** `pyproject.toml` has a long dependency list, and no track yet covers: +- Version pinning strategy (e.g., `>=` lower bounds vs `~=` compatible-release vs `==` exact pins) +- Deprecation monitoring (track when a vendor SDK announces EOL) +- License audit (any GPL contamination?) +- CVE scanning + +This is a "track for the person who maintains the project 6 months from now." + +--- + +## 9. Risks & Open Questions (Cross-Track) + +### 9.1 Risks + +| Risk | Likelihood | Impact | Mitigation | +|---|---|---|---| +| The implementing agent executes tracks in the wrong order, breaking the dependency chain (especially for `mcp_architecture_refactor_20260606`, which depends on both data_oriented_error_handling and data_structure_strengthening). | Medium | High (broken tests; confusing failures) | The recommended execution order in §6 is explicit. The plan files note the dependencies in their "blocked_by" sections. | +| The 5 tracks add 10+ new files, and `scripts/audit_main_thread_imports.py` misses a heavy import in one of the new modules. | Low | Medium (regresses the startup_speedup invariant) | Each new module's Phase 1 task includes an import-time check (`uv run python -c "import time; ..."`). | +| A future contributor adds a new `dict[str, Any]` after the data_structure_strengthening track; the audit `--strict` mode catches it, but the contributor doesn't understand why. | Medium | Low (process friction) | The styleguide + the message emitted by `--strict` mode explain the rule. | +| The `mcp_client_legacy.py` shim becomes permanent and never gets removed. | Medium | Low (acceptable) | The `public_api_migration_20260606` follow-up (and any future MCP-API changes) is the natural place to remove the shim. | +| The DSL idea becomes a "we have to do it now" priority before the architecture track is done. | Low | Low | The DSL is explicitly out of scope. The sub-MCP architecture is compatible with a future DSL layer. | + +### 9.2 Open questions for the next planning round + +- **Where do the implementation agents' session notes / handoffs go?** Each track has `metadata.json` + `state.toml` for the planning side; there's no equivalent for the implementation side. (The `startup_speedup_20260606` track's recent commits `253e1798`, `88fc42bb`, `8c4791d0` suggest that handoffs currently happen via commit messages; a structured format would be more reliable.) +- **What happens when a track's implementation diverges from the plan?** Per `conductor/workflow.md`, "implementation differs from spec" is handled by updating the spec, but the plan files don't have a clear "deviations" section. Consider adding one to future plans. +- **How are plan review comments captured?** The plan files are committed (e.g., `cf01870b`), but there's no `conductor/plan_reviews/` directory. If the implementing agent has questions or disagreements, where do they go? + +--- + +## 10. File Index + +For the implementing agent (and any future planner), here's the canonical file index. + +### 10.1 Conductor convention files (the project-level structure) + +| File | Purpose | +|---|---| +| `conductor/tracks.md` | Master track registry. Lists all tracks with their status (`[ ]` planned, `[~]` in progress, `[x]` done) and `[track-created: ]` references. | +| `conductor/workflow.md` | The project's TDD + per-track commit + git note workflow. | +| `conductor/product-guidelines.md` | The project's design principles (1-space indent, 1 commit per task, type hints, etc.). | +| `conductor/product.md` | The project's product vision and use cases. | +| `conductor/tech-stack.md` | The project's tech stack. | +| `conductor/code_styleguides/python.md` | Python style guide. | +| `conductor/code_styleguides/error_handling.md` | (created in data_oriented_error_handling) Data-Oriented Error Handling convention.
| +| `conductor/code_styleguides/type_aliases.md` | (created in data_structure_strengthening) Type Aliases convention. | + +### 10.2 The 5 new tracks (this session's planning output) + +| Track | Spec SHA | Plan SHA | Files | +|---|---|---|---| +| `test_batching_refactor_20260606` | `b7a97374` | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md | +| `qwen_llama_grok_integration_20260606` | `7c1d597e` (track init), `97daaff2` (consistency) | `b17cbbde` | spec.md, metadata.json, state.toml, plan.md | +| `data_oriented_error_handling_20260606` | `494f68f9` (init), `cbc3b075` (track + tracks.md), `f7b11f7f` (plan) | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md | +| `data_structure_strengthening_20260606` | `ed42a97a` (init), `aba35f9f` (registry), `432c7895` (risk) | `91475781` | spec.md, metadata.json, state.toml, plan.md | +| `mcp_architecture_refactor_20260606` | `2720a894` (init), `dd137df7` (backfill) | `cf01870b` | spec.md, metadata.json, state.toml, plan.md | + +### 10.3 The 5 new module families (what the tracks will create) + +| Module family | Created by | Files | +|---|---|---| +| Test batching | `test_batching_refactor_20260606` | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py`, `scripts/run_tests_batched.py`, `tests/test_categories.toml` | +| Vendor capability matrix | `qwen_llama_grok_integration_20260606` | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` | +| Result types | `data_oriented_error_handling_20260606` | `src/result_types.py` | +| Type aliases + registry | `data_structure_strengthening_20260606` | `src/type_aliases.py`, `scripts/generate_type_registry.py`, `docs/type_registry/` | +| Sub-MCPs | `mcp_architecture_refactor_20260606` | `src/mcp_.py` (7 files), `src/mcp_client_security.py`, `src/mcp_client_legacy.py` | + +### 10.4 The audit script (data-driven decisions) + +| File | Purpose | +|---|---| +| `scripts/audit_weak_types.py` (committed `84fd9ac9`) | AST analyzer that found the 430 weak sites driving data_structure_strengthening. | + +### 10.5 The startup_speedup predecessor + +| Track | Status | Key outputs | +|---|---|---| +| `startup_speedup_20260606` | SHIPPED (commits `12cec6ae`, `bb2ac6c9`, `253e1798`, `88fc42bb`, `8c4791d0`) | `_io_pool` ThreadPoolExecutor; warmup mechanism; lazy SDK imports; `scripts/audit_main_thread_imports.py` CI gate | + +This is the **predecessor for all 5 tracks**: its lazy-SDK-import convention means the new modules can use `from src.openai_compatible import send_openai_compatible` at the top without paying the SDK import cost on the main thread. + +--- + +## 11. Closing Notes + +### 11.1 What the user achieved in this session + +In a single multi-hour planning session, the user: +- Approved 5 architectural refactor tracks end-to-end (brainstorming → spec → plan) +- Made 3 major design decisions with significant impact: (1) the `mcp_.py` naming convention, (2) the "docs over TypedDict" tradeoff, (3) deprecating rather than removing the public `send()` API +- Brought in external inspiration: Ryan Fleury's data-oriented error handling and the user's friend's DSL idea +- Established a pattern for **data-grounded planning**: where possible, a spec is preceded by an audit (or an inventory) that drives its design decisions + +### 11.2 What the implementing agent inherits + +- 5 fully specced and planned tracks, each with a TDD task breakdown +- A clear execution order (topological sort of the dependency graph) +- ~25+ unit tests per track (pre-existing + new) that serve as regression coverage +- A permanent audit + CI gate (`scripts/audit_weak_types.py --strict`) for the type-alias convention +- Styleguides + product-guidelines + a new docs directory (`docs/type_registry/`) that serve as living documentation + +### 11.3 What I would do differently if I could start over + +- **Earlier on the data-oriented framing:** The user brought Fleury's article mid-session (for the error-handling track).
It would have been useful to surface the data-oriented design philosophy in the FIRST track (test_batching_refactor) and apply it there. Going forward, this is a thread to weave into every track. +- **The "richest context" claim is half-true:** I have deep visibility into architecture and code quality concerns but little visibility into operational / production concerns (observability, telemetry, error rates in the field, user experience metrics). The recommended future tracks in §8 reflect this bias. + +### 11.4 One last recommendation + +**The post-tracks documentation track (§8.1) is the single most important thing to do NEXT** — after the 5 tracks ship, the docs are out of date. Plan it BEFORE the user starts working on the next big feature, so the codebase stays maintainable.