docs(reports): add planning digest covering 5 tracks from 2026-06-06 session

Single-session planning digest that captures:

- The 5 tracks fully specced + planned (test_batching, qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor)
- Cross-cutting design themes (data-oriented, audit-driven, per-track commit + git note, out-of-scope-by-default)
- The audit + data foundation (scripts/audit_weak_types.py; 430 -> 60 findings; 0 strong patterns; 26 unique type strings; about 80% concentrated in 6 files)
- The dependency graph + recommended execution order
- Follow-up tracks already planned in spec §12.1 of each track
- Recommended future tracks (post-tracks documentation is the top pick)
- Risks, open questions, and a complete file index

This is the kind of reference document that:

- Future planners consult to understand the codebase's current state
- The implementing agent uses to coordinate across tracks
- The user reviews as a digest of the planning work

Written in the project's docs/reports/ directory alongside the existing Phase 5 reports (PHASE5_STABILISATION_REPORT.md, MUTATION_MATRIX_PHASE5.md, etc.).
2026-06-06 20:56:12 -04:00
parent 530a29f0d2
commit 0f74705d01
1 changed files with 468 additions and 0 deletions
@@ -0,0 +1,468 @@
+# Planning Digest: 5-Track Architectural Refactor (2026-06-06)
+
+**Status:** Planning complete; implementation in flight
+**Author:** Tier 2 Tech Lead (brainstorming + spec + plan for all 5 tracks)
+**Date:** 2026-06-06
+**Audience:** Future planners, the implementing agent, the user (as a reference / digest)
+
+---
+
+## 1. Executive Summary
+
+In a single planning session, **5 architectural refactor tracks** were specced and planned end-to-end. Together they reshape the `manual_slop` codebase around three foundational design principles — **data-oriented error handling** (Fleury), **data-oriented types** (named, documented, generated), and **modular MCP architecture** (sub-MCPs by category). All 5 tracks share a common ancestor in the **startup_speedup_20260606** track (already shipped as of `12cec6ae`), which established the lazy-SDK-import convention the other tracks depend on.
+
+| # | Track | Status | Phases | Key new files | What it does |
+|---|---|---|---|---|---|
+| 1 | `test_batching_refactor_20260606` | Planned | 4 | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py` | Replaces alphabetical 4-at-a-time batching with tiered batching (Tier 1 unit + xdist, Tier 3 live_gui in one session, etc.) |
+| 2 | `qwen_llama_grok_integration_20260606` | Planned | 6 | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` | Adds Qwen (DashScope), Llama (Ollama + OpenRouter + custom URL), Grok (xAI). Introduces the Vendor Capability Matrix. |
+| 3 | `data_oriented_error_handling_20260606` | Planned | 5 | `src/result_types.py` | Introduces `Result[T]`, `ErrorInfo`, `NilPath` per Fleury. Removes `ProviderError` exception. Marks `send()` `@deprecated`; adds `send_result()`. |
+| 4 | `data_structure_strengthening_20260606` | Planned | 2 | `src/type_aliases.py`, `scripts/generate_type_registry.py` | Introduces 10 `TypeAlias` for the 430 anonymous `dict[str, Any]` / `list[dict[...]]` sites. Adds auto-generated `docs/type_registry/`. |
+| 5 | `mcp_architecture_refactor_20260606` | Planned | 7 | `src/mcp_<type>.py` (7 files), `src/mcp_client_security.py` | Splits 2,205-line `mcp_client.py` into slim controller + 6 native sub-MCPs + 1 external sub-MCP. |
+
+**Combined impact:** ~5 new framework files; ~6 modified framework files; ~6 modified high-traffic files (for the type-aliases refactor); 1 monolithic file split into 9 focused files; 1 new CI gate script; 1 new docs directory.
+
+---
+
+## 2. Session Context
+
+### 2.1 Workflow model
+
+The user is operating in a **planning / execution split** mode:
+- **This session:** Tier 2 Tech Lead (me) does brainstorming → spec → plan for each track. No code is written or executed.
+- **External session:** Another agent does the implementation. It picks up each `plan.md` and executes task-by-task via the project's MMA tier system.
+
+This split lets the user think strategically (planning) while the heavy lifting (executing) happens in parallel.
+
+### 2.2 The pre-existing baseline
+
+Before this session, the project had:
+- **277 test files** in `tests/` (`test_*.py` + `*_sim.py`)
+- **53 src files** (`src/*.py`)
+- **14 deep-dive guides** (`docs/guide_*.md`)
+- **The startup_speedup_20260606 track was in flight** (Phase 6 complete per `253e1798`; track SHIPPED per `12cec6ae` in the same window as this planning session)
+- **The test_batching_refactor_20260606 track had been planned** (spec + plan were in the folder but execution hadn't started)
+- **Conductor convention was in place** — every track has `spec.md` + `metadata.json` + `state.toml`; the `tracks.md` registry lists all tracks with their `[track-created: <sha>]` references
+
+### 2.3 What changed during this session
+
+The user asked for 5 different refactor specs in sequence:
+1. **Test batching refactor** — already-planned track; I reviewed and committed
+2. **Qwen/Llama/Grok vendors + capability matrix** — new spec; multiple design questions resolved
+3. **Data-oriented error handling (Fleury pattern)** — new spec; user brought the article + friend's notes
+4. **Data structure strengthening (type aliases + named tuples)** — new spec; user proposed auto-generated docs over TypedDict migration
+5. **MCP architecture refactor (sub-MCPs)** — new spec; user proposed `mcp_<type>.py` naming + the DSL future idea
+
+For each, I followed the **brainstorming → spec → plan** flow per the user's stated preference.
+
+---
+
+## 3. Cross-Cutting Design Themes
+
+Five design themes run through all the tracks. Understanding them makes each track's individual decisions coherent.
+
+### 3.1 Data-Oriented Design (Fleury / Acton / Lottes)
+
+The user explicitly references this in two of the five tracks (`data_oriented_error_handling_20260606` for errors; `mcp_architecture_refactor_20260606` for module boundaries). The framing is:
+
+- **Errors are just cases**, not special control-flow primitives. Use `Result[T]` with side-channel error lists, not exceptions.
+- **Algorithms on data**, not methods on objects. The `MCPController` is a data structure; sub-MCPs are data; the dispatch is a function from data to data.
+- **Stable names, not types**. Type aliases (`Metadata`, `FileItem`, etc.) name data roles; they don't enforce structure (that's deferred to TypedDict if ever).
+- **Shared code where possible**; unique code only where vendor-specific. The `_send_<vendor>_result()` functions in `ai_client.py` are thin boundary adapters; the `send_openai_compatible()` helper is the shared algorithm.
+
+### 3.2 Capability / Pattern / Convention as first-class docs
+
+The user values explicit, discoverable conventions over implicit understanding. Each track introduces at least one canonical document:
+- `conductor/code_styleguides/error_handling.md` (Fleury patterns)
+- `conductor/code_styleguides/type_aliases.md` (type alias conventions)
+- `docs/type_registry/` (auto-generated per-source-file schema docs)
+- The `src/mcp_<type>.py` naming convention (implicit; it has no dedicated styleguide file)
+
+`conductor/product-guidelines.md` is the umbrella; the styleguides are the detailed references. This pattern should be followed for any future track that introduces a new convention.
+
+### 3.3 Audit + data-driven decisions
+
+Two of the five tracks are data-grounded:
+- `test_batching_refactor_20260606`: addressed the actual problem (alphabetical 4-at-a-time batching) and explicitly designed the solution around the test categories the project already uses (Tier 1 unit, Tier 2 mock_app, Tier 3 live_gui, etc.).
+- `data_structure_strengthening_20260606`: driven by the `scripts/audit_weak_types.py` findings (430 weak sites; 345 of them, about 80%, concentrated in 6 high-traffic files; 0 strong patterns; 26 unique type strings; top 4 = 86% of findings).
+
+The audit data is the source of truth. The track's success criterion is a measurable drop in the audit count (430 → ~60 = 86% reduction).
+
+### 3.4 Process: per-track commit + git note + checkpoint
+
+Every plan follows the same template:
+- **Per-task commit**: 1 commit per Red-Green-Refactor step
+- **Per-checkpoint git note**: `git notes add -m "..."` summarizing what the phase delivered
+- **Per-checkpoint state.toml update**: `current_phase` advanced; `checkpointsha` filled in
+
+This is a feature of the project's `conductor/workflow.md` and is consistently applied. The next planner / implementer should follow the same template.
+
+### 3.5 Out-of-scope-by-default; follow-up tracks for the next round
+
+Each of the 5 tracks explicitly defers work to follow-up tracks. The follow-ups are documented in each spec's §12.1:
+- `public_api_migration_20260606` — removes deprecated `send()` (from data_oriented_error_handling)
+- `type_registry_ci_20260606` — wires `generate_type_registry.py --check` into CI (from data_structure_strengthening)
+- `mcp_dsl_20260606` — per-MCP compact DSL for tool calls (from mcp_architecture_refactor)
+- `typed_dict_migration_20260606` — convert most-used aliases to `TypedDict` (initially planned; later replaced by the docs approach; kept as a future option)
+
+These follow-ups are listed in `conductor/tracks.md` as `[ ]` placeholders (item 0f etc.). They should be sequenced AFTER the 5 main tracks ship.
+
+---
+
+## 4. The 5 Tracks in Detail
+
+### 4.1 `test_batching_refactor_20260606`
+
+**Goal:** Replace alphabetical 4-at-a-time batching with tiered batching that respects fixture-class boundaries.
+
+**Architecture:**
+- `scripts/test_categorizer.py`: AST-based classifier that determines each test file's `FixtureClass` (UNIT, MOCK_APP, LIVE_GUI, HEADLESS, OPT_IN, PERFORMANCE) and its `batch_group` (e.g., `core`, `gui`, `mma`).
+- `scripts/test_batcher.py`: Pure scheduler. `plan(records, options) -> list[Batch]` deterministically produces batches.
+- `scripts/pytest_collection_order.py`: Conftest-loaded plugin for the per-test order control (opt-in per file).
+- `scripts/run_tests_batched.py`: Modified CLI orchestrator with `--tiers`, `--include-opt-in`, `--plan`, `--audit` modes.
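+
The scheduler described above can be sketched as a pure function. This is a minimal illustration, not the planned `scripts/test_batcher.py`: the `FileRecord` and `Batch` field names are assumptions, only three of the six fixture classes are shown, and the `options` argument from the planned `plan(records, options)` signature is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class FixtureClass(Enum):
    # Subset of the fixture classes named above; definition order is the tier order.
    UNIT = "unit"
    MOCK_APP = "mock_app"
    LIVE_GUI = "live_gui"


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical record shape produced by the categorizer.
    path: str
    fixture_class: FixtureClass
    batch_group: str


@dataclass(frozen=True)
class Batch:
    fixture_class: FixtureClass
    label: str
    paths: tuple[str, ...]


def plan(records: list[FileRecord]) -> list[Batch]:
    """Group records into batches; the output depends only on the input set."""
    batches: list[Batch] = []
    for fixture_class in FixtureClass:
        members = sorted(
            (r for r in records if r.fixture_class is fixture_class),
            key=lambda r: (r.batch_group, r.path),
        )
        if not members:
            continue
        if fixture_class is FixtureClass.LIVE_GUI:
            # The whole live_gui tier becomes one batch (one pytest invocation).
            batches.append(Batch(fixture_class, "live_gui", tuple(r.path for r in members)))
            continue
        # Other tiers get one batch per batch_group.
        groups: dict[str, list[str]] = {}
        for record in members:
            groups.setdefault(record.batch_group, []).append(record.path)
        for group, paths in groups.items():
            batches.append(Batch(fixture_class, f"{fixture_class.value}:{group}", tuple(paths)))
    return batches
```

Because the function sorts its input and has no side effects, shuffling the records yields the same batches, which is what makes a `--plan` dry run trustworthy.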
+
+**Key decisions:**
+- **Tier 3 (live_gui) is one pytest invocation**, not many. This is the single biggest runtime saving (the 15s startup cost is amortized across the whole tier).
+- **Tier 1 (unit) uses pytest-xdist** for parallelism.
+- **Tier 0 (opt-in) is gated on BOTH env var AND CLI flag** (defense-in-depth: setting the env var alone shouldn't accidentally enable docker tests).
+- **Hybrid classification**: auto-infer from filename + AST fixture scan; hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files.
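+
The double gate on opt-in tests can be illustrated in a few lines. The environment variable name `RUN_OPT_IN_TESTS` and the `"1"` value are hypothetical; the source only states that both an env var and the CLI flag are required.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; the real one is defined by the track's spec.
OPT_IN_ENV_VAR = "RUN_OPT_IN_TESTS"


def opt_in_enabled(cli_flag: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when BOTH the CLI flag and the env var are set."""
    env = os.environ if env is None else env
    return cli_flag and env.get(OPT_IN_ENV_VAR, "") == "1"
```

Passing `env` explicitly keeps the check testable without mutating the real process environment.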
+
+**What's NOT done:** The script does NOT modify test files or fixtures; it only categorizes and batches. New tests get sensible defaults automatically.
+
+**Current state:** Plan complete (`7fdab705` spec, `f7b11f7f` plan). Ready for execution.
+
+---
+
+### 4.2 `qwen_llama_grok_integration_20260606`
+
+**Goal:** Add first-class support for Qwen, Llama, Grok. Introduce the Vendor Capability Matrix.
+
+**Architecture:**
+- `src/vendor_capabilities.py`: `VendorCapabilities` dataclass, `_REGISTRY` populated per-(vendor, model).
+- `src/openai_compatible.py`: shared `send_openai_compatible()` helper (data-oriented design — operates on normalized data).
+- `src/qwen_adapter.py`: DashScope-specific tool format translation + error classification.
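+
A rough sketch of what a per-(vendor, model) capability registry might look like. The field names follow the seven v1 capabilities listed in this section; the field types, the `get_capabilities()` helper, the registry entry, and its values are all placeholders, not real model data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    # One field per v1 capability; defaults describe a vendor that supports nothing.
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0
    cost_tracking: bool = False


# Placeholder entry for illustration only; not a real model or real limits.
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {
    ("example_vendor", "example-model"): VendorCapabilities(
        tool_calling=True, streaming=True, context_window=8192
    ),
}


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Hypothetical lookup: unknown pairs fall back to the all-off default."""
    return _REGISTRY.get((vendor, model), VendorCapabilities())
```

Returning an all-off default for unknown pairs means UI code can read a capability without a `None` check.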
+
+**Key decisions:**
+- **Naming convention:** `_send_<vendor>_result()` returning `Result[str, ErrorInfo]` (8 vendors: Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Llama, Grok).
+- **Capability Matrix v1:** 7 capabilities — vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking. Audio and server-side code_execution deferred to a future track.
+- **UX adaptation:** 7 UI elements read the matrix, one per capability (screenshot button, tools toggle, cache panel, stream progress, fetch models button, token budget max, cost panel).
+- **Exceptions stay at the SDK boundary:** the OpenAI-compatible SDK calls keep raising; the new `_send_<vendor>_result()` functions catch those exceptions and convert them to `ErrorInfo`. Per Fleury: "exceptions are reserved for the SDK boundary."
+
+**Coordination with `startup_speedup_20260606`:** Qwen's DashScope SDK adds a new import; the audit script `scripts/audit_main_thread_imports.py` ensures the import is gated to a worker thread, not the main thread. Verified at the baseline in Phase 1 of the track.
+
+**Current state:** Plan complete (`b17cbbde` plan). Ready for execution.
+
+---
+
+### 4.3 `data_oriented_error_handling_20260606`
+
+**Goal:** Introduce Ryan Fleury's "errors are just cases" framework as a project convention.
+
+**Architecture:**
+- `src/result_types.py`: `ErrorKind` enum, `ErrorInfo` dataclass, `Result[T]` generic, `NilPath` + `NilRAGState` sentinel singletons.
+- `src/mcp_client.py` (the data_oriented refactor for MCP): (p, err) tuples → `Result[Path]`; `assert p is not None` → nil-sentinel.
+- `src/ai_client.py`: `ProviderError` exception REMOVED; `_classify_<vendor>_error()` returns `ErrorInfo`; `_send_<vendor>()` renamed to `_send_<vendor>_result()` returning `Result[str]`.
+- `src/rag_engine.py`: methods return `Result` instead of raising.
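+
To make the shape of these types concrete, here is a minimal sketch under stated assumptions. The `ErrorKind` members, the `ErrorInfo` fields, the `ok` property, and the `call_at_boundary()` helper are all hypothetical; only the type names come from the track.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical members; the real enum is defined by the track.
    NETWORK = "network"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    # The value is always present (zero value on failure); errors ride alongside.
    value: T
    errors: tuple[ErrorInfo, ...] = ()

    @property
    def ok(self) -> bool:
        return not self.errors


def call_at_boundary(sdk_call: Callable[[], str]) -> Result[str]:
    """Catch exceptions at the SDK boundary and convert them to data."""
    try:
        return Result(sdk_call())
    except ConnectionError as exc:
        return Result("", (ErrorInfo(ErrorKind.NETWORK, str(exc)),))
    except Exception as exc:
        return Result("", (ErrorInfo(ErrorKind.UNKNOWN, str(exc)),))
```

Callers past the boundary branch on `result.ok` instead of wrapping every call in `try`/`except`.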
+
+**Key decisions:**
+- **Internal-only refactor; the public API is preserved.** `_send_<vendor>()` is renamed to `_send_<vendor>_result()` and its return type changes to `Result[str]`. The public `send()` is preserved, marked `@typing_extensions.deprecated`; the new `send_result()` returns `Result[str]`. The actual breaking change happens in the follow-up `public_api_migration_20260606` track.
+- **`ProviderError` is FULLY REMOVED**, not kept as a thin internal exception. Per Fleury, exceptions are for the SDK boundary only; once the boundary converts to `ErrorInfo`, no exception is needed.
+- **Deprecation warnings are suppressed in tests:** `tests/conftest.py` adds `filterwarnings("ignore::DeprecationWarning:src.ai_client")` during the transition.
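+
The deprecate-then-migrate pattern can be sketched with the standard library alone. The track uses `@typing_extensions.deprecated`; this sketch swaps in a plain `warnings.warn()` call so it runs without extra dependencies. The function bodies and the simplified, non-generic `Result` are placeholders.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for the track's Result[str].
    value: str
    errors: tuple = ()


def send_result(prompt: str) -> Result:
    """New API: failures are returned as data, never raised."""
    if not prompt:
        return Result("", ("empty prompt",))
    return Result(f"echo: {prompt}")  # placeholder for the real vendor call


def send(prompt: str) -> str:
    """Old API: kept as a thin wrapper that warns and delegates."""
    warnings.warn(
        "send() is deprecated; use send_result()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return send_result(prompt).value
```

Existing callers keep working while the warning points them at `send_result()`.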
+
+**Coordination with pending tracks:**
+- `mcp_architecture_refactor_20260606` assumes the `Result` pattern is in place (the new sub-MCPs return `Result[str, ErrorInfo]` from `invoke()`).
+- `data_structure_strengthening_20260606` assumes the `Metadata` family aliases are in place (the result types are referenced by name).
+- Both track specs have a §10 "Coordination with Pending Tracks" section that documents the post-tracks state and verifies it before proceeding.
+
+**Current state:** Plan complete (`f7b11f7f` plan). Ready for execution.
+
+---
+
+### 4.4 `data_structure_strengthening_20260606`
+
+**Goal:** Name the 430 anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types in the codebase.
+
+**Architecture:**
+- `src/type_aliases.py`: 10 `TypeAlias` definitions + 1 `NamedTuple` (`FileItemsDiff`).
+  - `Metadata` (root), `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`
+- `scripts/audit_weak_types.py` (already committed `84fd9ac9`): AST-based static analyzer. `Finding` dataclass; `--json`, `--top N`, `--verbose` modes. After this track: also `--strict` mode (CI gate; exits 1 if new weak sites are introduced).
+- `scripts/generate_type_registry.py` (Phase 2): AST-based registry generator. 3 modes — default (regenerate), `--check` (CI; exits 1 if drift), `--diff` (dry run). Writes `docs/type_registry/<source_module>.md` per source file.
+- `docs/type_registry/`: auto-generated per-source-file markdown references for the LLM to consult.
+
+**The data that drove the design:**
+- 430 weak sites across 29 of 61 files in `src/`
+- 0 strong patterns currently (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` in the relevant shapes)
+- 26 unique type strings after normalization
+- Top 4 unique strings = 86% of findings (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`)
+- File distribution: ai_client.py (139), app_controller.py (86), models.py (51), api_hook_client.py (32), project_manager.py (20), aggregate.py (17) = 345 in 6 files; the rest in 23 lower-impact files
+
+**The "docs over TypedDict" decision (key user feedback mid-track):**
+- Original draft proposed a follow-up track to convert aliases to `TypedDict`s.
+- User pushed back: pay the token cost (LLM reads the docs) instead of the upfront cost (designing `TypedDict` schemas for every type).
+- The `docs/type_registry/` generator is the result: an LLM can `cat docs/type_registry/ai_client.md` to see the fields of every struct in `src/ai_client.py` without the code having to enforce the structure at runtime.
+- The 5-pattern structure (Nil sentinel, Zero-init, Fail-early, AND-over-OR, Side-channel errors) is documented in the styleguide.
+
+**Coordination:**
+- This track's aliases compose with the `Result[T]` from `data_oriented_error_handling_20260606`: `Result[FileItems]`, `Result[CommsLogEntry]`, etc. are valid generics.
+- The audit script is the **permanent CI gate** for this convention. New `dict[str, Any]` in a PR fails `--strict` mode.
+
+**Current state:** Plan complete (`91475781` plan). Ready for execution.
+
+---
+
+### 4.5 `mcp_architecture_refactor_20260606`
+
+**Goal:** Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP.
+
+**Architecture:**
+- `src/mcp_client.py` (modified, slim): `SubMCP` Protocol + `MCPController` class + module-level `controller` singleton + `ALL_SUB_MCPS` registration list + re-export shim from `mcp_client_legacy`.
+- `src/mcp_client_legacy.py` (NEW): the OLD `mcp_client.py` content. Re-exported for backward compat.
+- `src/mcp_client_security.py` (NEW): 3-layer security (Allowlist → Resolve → Validate) returning `Result[Path]`.
+- `src/mcp_file_io.py` (9 tools), `src/mcp_python.py` (14), `src/mcp_c.py` (5), `src/mcp_cpp.py` (5), `src/mcp_web.py` (2), `src/mcp_analysis.py` (2): native sub-MCPs.
+- `src/mcp_external.py`: the existing `ExternalMCPManager` extracted; class name preserved as `ExternalMCP` for compat.
+
+**Naming convention (per user direction):** `mcp_<type>.py` for native MCPs. The user explicitly said this; the convention is locked in.
+
+**Key design decisions:**
+- **Sub-MCP shape:** class with `name` / `description` / `tools` (dict) / `invoke()` (returns `Result[str, ErrorInfo]`).
+- **Registration mechanism:** explicit `controller.register(FileIOMCP())` at the bottom of `mcp_client.py`. New sub-MCP = create the file + add 2 lines to the registration. No magic, no auto-discovery.
+- **Controller-level security:** the 3-layer security runs BEFORE delegating to sub-MCPs. Sub-MCPs receive already-validated paths. Testable in isolation.
+- **Dispatch inversion:** the controller uses an inverted-dict `self._tool_index[tool_name] -> sub_mcp` for O(1) lookup. The current if/elif chain is O(n) per dispatch.
+- **External MCP is NOT in `ALL_SUB_MCPS`** — it's a sub-controller. The main controller delegates to it AFTER native sub-MCPs miss.
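+
The registration and dispatch decisions above can be sketched as follows. This is a simplified illustration, not the planned module: `Result` is reduced to a value plus an error string, `EchoMCP` is a made-up sub-MCP, the `description` attribute is left out, and the `invoke()` signature is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for the track's Result type.
    value: str
    error: str = ""


class SubMCP(Protocol):
    name: str
    tools: dict[str, Callable[..., str]]

    def invoke(self, tool_name: str, args: dict) -> Result: ...


class EchoMCP:
    """Hypothetical sub-MCP with a single tool."""

    name = "echo"

    def __init__(self) -> None:
        self.tools = {"echo_text": lambda text: text}

    def invoke(self, tool_name: str, args: dict) -> Result:
        return Result(self.tools[tool_name](**args))


class MCPController:
    def __init__(self) -> None:
        # Inverted index: tool name -> owning sub-MCP, for O(1) dispatch.
        self._tool_index: dict[str, SubMCP] = {}

    def register(self, sub_mcp: SubMCP) -> None:
        for tool_name in sub_mcp.tools:
            if tool_name in self._tool_index:
                # A clash is a startup-time programming error, so fail loudly.
                raise ValueError(f"duplicate tool name: {tool_name}")
            self._tool_index[tool_name] = sub_mcp

    def dispatch(self, tool_name: str, args: dict) -> Result:
        sub_mcp = self._tool_index.get(tool_name)
        if sub_mcp is None:
            return Result("", f"unknown tool: {tool_name}")
        return sub_mcp.invoke(tool_name, args)
```

An unknown tool comes back as an error value rather than an exception, in line with the `Result` convention.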
+
+**The "thin adapter" approach for v1:**
+- Each sub-MCP's methods (e.g., `read_file`, `py_get_skeleton`) **delegate to the corresponding function in `mcp_client_legacy.py`**. This keeps the legacy module as the source of truth for the implementation; the new `mcp_<type>.py` is a thin adapter that adds the class shape, the security check, and the `Result` wrapping.
+- A future track can move the actual implementations into the sub-MCP files directly once the architecture is established. For v1, delegation is the safer path.
+
+**Backward compatibility:**
+- `src/mcp_client_legacy.py` keeps all 45+ old function names and their implementations.
+- `src/mcp_client.py` is now a slim shim that re-exports those names from legacy.
+- The 4 existing test files (`test_mcp_client_beads.py`, `test_mcp_config.py`, `test_mcp_perf_tool.py`, `test_mcp_ts_integration.py`) and `src/app_controller.py:61` (the direct `mcp_client.py_get_symbol_info` call) continue to work unchanged.
+
+**The DSL future (per user's notes on APL/K/Cosy):**
+- The user shared a friend's idea: per-MCP compact dialects (like command line but more flexible) instead of JSON.
+- Acknowledged in the spec as out of scope for this track ("no time for that").
+- Documented as `mcp_dsl_20260606` follow-up in spec §12.1.
+- The sub-MCP architecture is the natural unit to pair with a DSL emitter in the future.
+
+**Current state:** Plan complete (`cf01870b` plan). Ready for execution.
+
+---
+
+## 5. The Audit & Data Foundation
+
+The most data-grounded track is `data_structure_strengthening_20260606`. The audit that drove it is committed at `84fd9ac9`:
+
+```
+File: scripts/audit_weak_types.py
+Size: 281 lines
+Modes: default (human-readable), --json, --top N, --verbose
+Detection: AST-based; regex over ast.unparse() of type annotations
+Patterns detected: 14 (Dict[str, Any], list[dict[...]], Tuple[...], Optional[...], assign-tuple-literal, ...)
+Positive patterns detected: TypeAlias, NamedTuple, @dataclass, pydantic.BaseModel
+Exit codes: 0 = informational, 1 = usage error
+```
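+
The detection approach (regex over `ast.unparse()` of annotations) can be illustrated with a much smaller analyzer. This is not the committed script: it checks a single pattern instead of 14, and the function name and return shape are made up for the example. It needs Python 3.9+ for `ast.unparse()`.

```python
import ast
import re

# One illustrative pattern; the real script detects 14.
WEAK_ANNOTATION = re.compile(r"\b[Dd]ict\[str, Any\]")


def find_weak_annotations(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, annotation text) pairs for weak annotations."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        annotations = []
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # *args / **kwargs annotations are skipped to keep the sketch short.
            for arg in args.posonlyargs + args.args + args.kwonlyargs:
                annotations.append(arg.annotation)
            annotations.append(node.returns)
        elif isinstance(node, ast.AnnAssign):
            annotations.append(node.annotation)
        for annotation in annotations:
            if annotation is None:
                continue
            text = ast.unparse(annotation)
            if WEAK_ANNOTATION.search(text):
                findings.append((annotation.lineno, text))
    return sorted(findings)
```

Unparsing first normalizes whitespace and quoting, so a single regex covers annotations that are spelled differently in the source text.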
+
+**Pre-track findings (baseline):**
+- 430 weak sites in 29 of 61 files
+- 0 strong patterns
+- 26 unique type strings
+- Top 4 unique strings = 86% of findings
+
+**Post-track target:**
+- ~60 weak sites in the 23 lower-impact files (the 6 high-traffic files contribute 0)
+- 10 `TypeAlias` definitions + 1 `NamedTuple` in use
+- `--strict` mode + baseline file as permanent CI gate
+
+This is **the most measurable track** in the planning session. Success = a concrete number drop in the audit count.
+
+---
+
+## 6. The Coordination Picture (dependencies)
+
+The 5 tracks form a dependency graph. The arrows are "blocks":
+
+```
+startup_speedup_20260606  (SHIPPED)
+  ↓
+  ├── test_batching_refactor_20260606  (planned)
+  │
+  ├── qwen_llama_grok_integration_20260606  (planned)
+  │      ↓
+  │      ├── data_oriented_error_handling_20260606  (planned)
+  │      │      ↓
+  │      │      └── public_api_migration_20260606  (follow-up; not yet specced)
+  │      │
+  │      └── data_structure_strengthening_20260606  (planned)
+  │             ↓
+  │             └── type_registry_ci_20260606  (follow-up; not yet specced)
+  │
+  └── mcp_architecture_refactor_20260606  (planned; depends on data_oriented + data_structure tracks)
+         ↓
+         └── mcp_dsl_20260606  (follow-up; not yet specced)
+```
+
+**Critical insight:** `mcp_architecture_refactor_20260606` depends on BOTH `data_oriented_error_handling_20260606` (for `Result`) and `data_structure_strengthening_20260606` (for the `Metadata` aliases). If the implementing agent executes tracks in arbitrary order, it can start the MCP refactor before those prerequisites exist.
+
+The recommended execution order is the topological order: `startup_speedup` (done) → `qwen_llama_grok` → `data_oriented_error_handling` + `data_structure_strengthening` (in parallel) → `mcp_architecture_refactor` → `test_batching_refactor` (no dependencies; can run anytime) → follow-up tracks.
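+
The ordering can be checked mechanically with the standard library's `graphlib` (Python 3.9+). The edges below are transcribed from the graph in this section with the date suffixes dropped; the `execution_waves()` helper is an illustration, not part of any track.

```python
from graphlib import TopologicalSorter

# Maps each track to the set of tracks that block it.
BLOCKED_BY = {
    "test_batching_refactor": {"startup_speedup"},
    "qwen_llama_grok_integration": {"startup_speedup"},
    "data_oriented_error_handling": {"qwen_llama_grok_integration"},
    "data_structure_strengthening": {"qwen_llama_grok_integration"},
    "mcp_architecture_refactor": {
        "data_oriented_error_handling",
        "data_structure_strengthening",
    },
}


def execution_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tracks into waves; tracks within a wave can run in parallel."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the graph above this yields four waves, with the two data-oriented tracks sharing a wave and `mcp_architecture_refactor` last.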
+
+---
+
+## 7. Follow-up Tracks Already Planned (Not in This Session's 5)
+
+Each track's spec §12.1 names a follow-up. Aggregated:
+
+| Follow-up | Parent track | Scope |
+|---|---|---|
+| `public_api_migration_20260606` | data_oriented_error_handling | Remove deprecated `ai_client.send()`; migrate all callers (multi_agent_conductor, app_controller, ~50 tests) to `send_result()` |
+| `type_registry_ci_20260606` | data_structure_strengthening | Wire `generate_type_registry.py --check` into CI; add pre-commit hook; document per-track commit workflow |
+| `mcp_dsl_20260606` | mcp_architecture_refactor | Per-MCP compact dialect for tool calls (APL/K/Cosy-inspired); ~5x token reduction per call |
+
+All three are listed in `conductor/tracks.md` as `[ ]` placeholders. They should be sequenced AFTER the 5 main tracks ship. None are urgent; all are improvements.
+
+---
+
+## 8. Recommended Future Tracks (Beyond What's Planned)
+
+These are tracks I identified during this session but didn't fully spec. They're ranked by what I think is most important.
+
+### 8.1 Post-Tracks Documentation Synchronization (top pick)
+
+**Why:** The 5 planned tracks add 10+ new modules and change the architecture significantly. The existing docs (`docs/guide_*.md`) were last updated in the 2026-06-02 comprehensive docs refresh and will fall further out of date once the tracks land. Stale docs are the #1 enemy of AI readability (an LLM reading `guide_ai_client.md` and finding it pre-dates `Result`/`ErrorInfo` will hallucinate the wrong shape).
+
+**Scope (1-2 phases):**
+- Phase 1: Update all existing guides (`guide_ai_client.md`, `guide_mcp_client.md`, etc.) to reflect the post-tracks state.
+- Phase 2: Add cookbooks ("How to add a new sub-MCP", "How to add a new AI vendor", "How to add a new result type") + a `docs/type_registry.md` index.
+
+**Why first:** Bounded and achievable. Closes the loop on all the planning work — each track ships a module; this track ships the docs that explain those modules.
+
+### 8.2 Test Coverage Audit & Improvement (runner-up)
+
+**Why:** The project has a stated >80% coverage target per `conductor/workflow.md`, but the actual current state is unknown. Under-tested areas are likely `app_controller.py` (4,153 lines; the orchestrator that touches everything) and `multi_agent_conductor.py` (the most complex control flow). The new modules from the 5 planned tracks each get unit tests in their respective tracks, but integration tests are sparse.
+
+**Scope (1-2 phases):**
+- Phase 1: Run `pytest --cov=src --cov-report=html`; identify the bottom-10 modules by coverage; write tests to bring each to >80%.
+- Phase 2: Add a coverage threshold to CI (e.g., `--cov-fail-under=80`); add per-module coverage badges to `docs/Readme.md`.
+
+### 8.3 Security Audit / Hardening
+
+**Why:** The 3-layer MCP security model is solid, but there are adjacent concerns:
+- **Command injection in `run_powershell`** — the AI generates PowerShell commands; how is the risk of a malicious model call mitigated? The HITL dialog exists, but is it consistently applied?
+- **Prompt injection** — the AI sees file content, web search results, Beads queries. A malicious file could inject instructions that the AI then follows. How is this sanitized?
+- **Sensitive data in logs** — the `comms_log` records full API requests/responses. If a user includes an API key or password in a message, it ends up in the log. What's the redaction policy?
+
+**Scope (1-2 phases):**
+- Phase 1: Threat model the AI tool-calling surface; document the existing mitigations; identify gaps.
+- Phase 2: Add log redaction for known secret patterns; add a "dangerous command" detector for `run_powershell`; add an "untrusted content" marker for content from external sources.
+
+### 8.4 Dependency Hygiene
+
+**Why:** `pyproject.toml` has a long dep list. No track for:
+- Version pinning strategy (caret vs tilde vs exact)
+- Deprecation monitoring (track when a vendor SDK announces EOL)
+- License audit (any GPL contamination?)
+- CVE scanning
+
+This is a "track for the person who maintains the project 6 months from now."
+
+---
+
+## 9. Risks & Open Questions (Cross-Track)
+
+### 9.1 Risks
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| The implementing agent executes tracks in the wrong order, breaking the dependency chain (especially for `mcp_architecture_refactor_20260606` which depends on the other two). | Medium | High (broken tests; confusing failures) | The recommended execution order in §6 is explicit. The plan files note the dependencies in their "blocked_by" sections. |
+| The 5 tracks add 10+ new files but the `scripts/audit_main_thread_imports.py` doesn't catch a heavy import in one of the new modules. | Low | Medium (regresses the startup_speedup invariant) | Each new module's Phase 1 task includes an import-time check (`uv run python -c "import time; ..."`). |
+| A future contributor adds a new `dict[str, Any]` after the data_structure_strengthening track; the audit `--strict` mode catches it, but they're confused about why. | Medium | Low (process friction) | The styleguide + the deprecation warning in `--strict` mode explain the rule. |
+| The `mcp_client_legacy.py` shim becomes permanent and never gets removed. | Medium | Low (acceptable) | The `public_api_migration_20260606` follow-up (and any future MCP-API changes) is the natural place to remove the shim. |
+| The DSL idea becomes a "we have to do it now" before the architecture track is done. | Low | Low | The DSL is explicitly out of scope. The sub-MCP architecture is compatible with a future DSL layer. |
+
+### 9.2 Open questions for the next planning round
+
+- **Where do the implementation agents' session notes / handoffs go?** Each track has `metadata.json` + `state.toml` for the planning side. There's no equivalent for the implementation side. (The `startup_speedup_20260606` track's recent commits `253e1798`, `88fc42bb`, `8c4791d0` suggest they do handoff via commit messages, but a structured format would be nice.)
+- **What happens when a track's implementation diverges from the plan?** Per `conductor/workflow.md`, "implementation differs from spec" is handled by updating the spec. But the plan files don't have a clear "deviations" section. Consider adding one to future plans.
+- **How are plan review comments captured?** The plan files are committed at `cf01870b` (and the others). But there's no `conductor/plan_reviews/` directory. If the implementing agent has questions or disagreements, where do they go?
+
+---
+
+## 10. File Index
+
+For the implementing agent (and any future planner), here's the canonical file index.
+
+### 10.1 Conductor convention files (the project-level structure)
+
+| File | Purpose |
+|---|---|
+| `conductor/tracks.md` | Master track registry. Lists all tracks with their status (`[ ]` planned, `[~]` in progress, `[x]` done) and `[track-created: <sha>]` references. |
+| `conductor/workflow.md` | The project's TDD + per-track commit + git note workflow. |
+| `conductor/product-guidelines.md` | The project's design principles (1-space indent, 1 commit per task, type hints, etc.). |
+| `conductor/product.md` | The project's product vision and use cases. |
+| `conductor/tech-stack.md` | The project's tech stack. |
+| `conductor/code_styleguides/python.md` | Language-specific style guide. |
+| `conductor/code_styleguides/error_handling.md` | (created in data_oriented_error_handling) Data-Oriented Error Handling convention. |
+| `conductor/code_styleguides/type_aliases.md` | (created in data_structure_strengthening) Type Aliases convention. |
+
+### 10.2 The 5 new tracks (this session's planning output)
+
+| Track | Spec SHA | Plan SHA | Files |
+|---|---|---|---|
+| `test_batching_refactor_20260606` | `b7a97374` | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md |
+| `qwen_llama_grok_integration_20260606` | `7c1d597e` (track init), `97daaff2` (consistency) | `b17cbbde` | spec.md, metadata.json, state.toml, plan.md |
+| `data_oriented_error_handling_20260606` | `494f68f9` (init), `cbc3b075` (track + tracks.md), `f7b11f7f` (plan) | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md |
+| `data_structure_strengthening_20260606` | `ed42a97a` (init), `aba35f9f` (registry), `432c7895` (risk) | `91475781` | spec.md, metadata.json, state.toml, plan.md |
+| `mcp_architecture_refactor_20260606` | `2720a894` (init), `dd137df7` (backfill) | `cf01870b` | spec.md, metadata.json, state.toml, plan.md |
+
+### 10.3 The 5 new module families (what the tracks will create)
+
+| Module family | Created by | Files |
+|---|---|---|
+| Test batching | `test_batching_refactor_20260606` | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py`, `scripts/run_tests_batched.py`, `tests/test_categories.toml` |
+| Vendor capability matrix | `qwen_llama_grok_integration_20260606` | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` |
+| Result types | `data_oriented_error_handling_20260606` | `src/result_types.py` |
+| Type aliases + registry | `data_structure_strengthening_20260606` | `src/type_aliases.py`, `scripts/generate_type_registry.py`, `docs/type_registry/` |
+| Sub-MCPs | `mcp_architecture_refactor_20260606` | `src/mcp_<type>.py` (7 files), `src/mcp_client_security.py`, `src/mcp_client_legacy.py` |
+
+### 10.4 The audit script (data-driven decisions)
+
+| File | Purpose |
+|---|---|
+| `scripts/audit_weak_types.py` (committed `84fd9ac9`) | AST analyzer that found the 430 weak sites driving data_structure_strengthening. |
+
+### 10.5 The startup_speedup predecessor
+
+| Track | Status | Key outputs |
+|---|---|---|
+| `startup_speedup_20260606` | SHIPPED (commits `12cec6ae`, `bb2ac6c9`, `253e1798`, `88fc42bb`, `8c4791d0`) | `_io_pool` ThreadPoolExecutor; warmup mechanism; lazy SDK imports; `scripts/audit_main_thread_imports.py` CI gate |
+
+This is the **predecessor for all 5 tracks** — the lazy-SDK-import convention means the new modules can use `from src.openai_compatible import send_openai_compatible` at the top without paying the SDK import cost on the main thread.
+
+---
+
+## 11. Closing Notes
+
+### 11.1 What the user achieved in this session
+
+In a single multi-hour planning session, the user:
+- Approved 5 architectural refactor tracks end-to-end (brainstorming → spec → plan)
+- Made 3 major design decisions with significant impact: (1) the `mcp_<type>.py` naming convention, (2) the "docs over TypedDict" tradeoff, (3) the deprecation-not-removal of the public `send()` API
+- Brought in external inspiration: Ryan Fleury's data-oriented error handling, the user's friend's DSL idea
+- Established a pattern for **data-grounded planning**: every spec is preceded by an audit (or an inventory) that drives the design decisions
+
+### 11.2 What the implementing agent inherits
+
+- 5 fully-specced + planned tracks, each with TDD task breakdown
+- A clear execution order (topological sort of the dependency graph)
+- 25+ unit tests per track (pre-existing + new) that serve as regression coverage
+- A permanent audit + CI gate (`scripts/audit_weak_types.py --strict`) for the type-alias convention
+- Styleguides + product-guidelines + a new docs directory (`docs/type_registry/`) that serve as living documentation
+
+### 11.3 What I would do differently if I could start over
+
+- **Earlier on the data-oriented framing:** The user brought Fleury's article mid-session (for the error-handling track). It would have been useful to surface the data-oriented design philosophy in the FIRST track (test_batching_refactor) and apply it there. Going forward, this is a thread to weave into every track.
+- **The "richest context" claim is half-true:** I have deep visibility into architecture and code quality concerns but little visibility into operational / production concerns (observability, telemetry, error rates in the field, user experience metrics). The recommended future tracks in §8 reflect this bias.
+
+### 11.4 One last recommendation
+
+**The post-tracks documentation track (§8.1) is the single most important thing to do NEXT** — after the 5 tracks ship, the docs are out of date. Plan it BEFORE the user starts working on the next big feature, so the codebase stays maintainable.