docs(reports): add planning digest covering 5 tracks from 2026-06-06 session

Single-session planning digest that captures:
- The 5 tracks fully specced + planned (test_batching, qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor)
- Cross-cutting design themes (data-oriented, audit-driven, per-track commit + git note, out-of-scope-by-default)
- The audit + data foundation (scripts/audit_weak_types.py; 430 -> 60 findings; 0 strong patterns; 26 unique type strings; 86% concentrated in 6 files)
- The dependency graph + recommended execution order
- Follow-up tracks already planned in spec §12.1 of each track
- Recommended future tracks (post-tracks documentation is the top pick)
- Risks, open questions, and a complete file index

This is the kind of reference document that:
- Future planners consult to understand the codebase's current state
- The implementing agent uses to coordinate across tracks
- The user reviews as a digest of the planning work

Written in the project's docs/reports/ directory alongside the existing Phase 5 reports (PHASE5_STABILISATION_REPORT.md, MUTATION_MATRIX_PHASE5.md, etc.).
@@ -0,0 +1,468 @@
# Planning Digest: 5-Track Architectural Refactor (2026-06-06)

**Status:** Planning complete; implementation in flight
**Author:** Tier 2 Tech Lead (brainstorming + spec + plan for all 5 tracks)
**Date:** 2026-06-06
**Audience:** Future planners, the implementing agent, the user (as a reference / digest)

---

## 1. Executive Summary

In a single planning session, **5 architectural refactor tracks** were specced and planned end-to-end. Together they reshape the `manual_slop` codebase around three foundational design principles: **data-oriented error handling** (Fleury), **data-oriented types** (named, documented, generated), and **modular MCP architecture** (sub-MCPs by category). All 5 tracks share a common ancestor in the **startup_speedup_20260606** track (already shipped as of `12cec6ae`), which established the lazy-SDK-import convention the other tracks depend on.

| # | Track | Status | Phases | Key new files | What it does |
|---|---|---|---|---|---|
| 1 | `test_batching_refactor_20260606` | Planned | 4 | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py` | Replaces alphabetical 4-at-a-time batching with tiered batching (Tier 1 unit + xdist, Tier 3 live_gui in one session, etc.) |
| 2 | `qwen_llama_grok_integration_20260606` | Planned | 6 | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` | Adds Qwen (DashScope), Llama (Ollama + OpenRouter + custom URL), Grok (xAI). Introduces the Vendor Capability Matrix. |
| 3 | `data_oriented_error_handling_20260606` | Planned | 5 | `src/result_types.py` | Introduces `Result[T]`, `ErrorInfo`, `NilPath` per Fleury. Removes `ProviderError` exception. Marks `send()` `@deprecated`; adds `send_result()`. |
| 4 | `data_structure_strengthening_20260606` | Planned | 2 | `src/type_aliases.py`, `scripts/generate_type_registry.py` | Introduces 10 `TypeAlias` for the 430 anonymous `dict[str, Any]` / `list[dict[...]]` sites. Adds auto-generated `docs/type_registry/`. |
| 5 | `mcp_architecture_refactor_20260606` | Planned | 7 | `src/mcp_<type>.py` (7 files), `src/mcp_client_security.py` | Splits 2,205-line `mcp_client.py` into slim controller + 6 native sub-MCPs + 1 external sub-MCP. |

**Combined impact:** ~5 new framework files; ~6 modified framework files; ~6 modified high-traffic files (for the type-aliases refactor); 1 monolithic file split into 9 focused files; 1 new CI gate script; 1 new docs directory.

---

## 2. Session Context

### 2.1 Workflow model

The user is operating in a **planning / execution split** mode:
- **This session:** Tier 2 Tech Lead (me) does brainstorming → spec → plan for each track. No code is written or executed.
- **External session:** Another agent does the implementation. It picks up each `plan.md` and executes task-by-task via the project's MMA tier system.

This split lets the user think strategically (planning) while the heavy lifting (executing) happens in parallel.

### 2.2 The pre-existing baseline

Before this session, the project had:
- **277 test files** in `tests/` (`test_*.py` + `*_sim.py`)
- **53 src files** (`src/*.py`)
- **14 deep-dive guides** (`docs/guide_*.md`)
- **The startup_speedup_20260606 track was in flight** (Phase 6 complete per `253e1798`; track SHIPPED per `12cec6ae` in the same window as this planning session)
- **The test_batching_refactor_20260606 track had been planned** (spec + plan were in the folder but execution hadn't started)
- **Conductor convention was in place**: every track has `spec.md` + `metadata.json` + `state.toml`; the `tracks.md` registry lists all tracks with their `[track-created: <sha>]` references

### 2.3 What changed during this session

The user asked for 5 different refactor specs in sequence:
1. **Test batching refactor**: already-planned track; I reviewed and committed
2. **Qwen/Llama/Grok vendors + capability matrix**: new spec; multiple design questions resolved
3. **Data-oriented error handling (Fleury pattern)**: new spec; user brought the article + friend's notes
4. **Data structure strengthening (type aliases + named tuples)**: new spec; user proposed auto-generated docs over TypedDict migration
5. **MCP architecture refactor (sub-MCPs)**: new spec; user proposed `mcp_<type>.py` naming + the DSL future idea

For each, I followed the **brainstorming → spec → plan** flow per the user's stated preference.

---

## 3. Cross-Cutting Design Themes

Five design themes run through all the tracks. Understanding them makes each track's individual decisions coherent.

### 3.1 Data-Oriented Design (Fleury / Acton / Lottes)

The user explicitly references this in two of the five tracks (`data_oriented_error_handling_20260606` for errors; `mcp_architecture_refactor_20260606` for module boundaries). The framing is:

- **Errors are just cases**, not special control-flow primitives. Use `Result[T]` with side-channel error lists, not exceptions.
- **Algorithms on data**, not methods on objects. The `MCPController` is a data structure; sub-MCPs are data; the dispatch is a function from data to data.
- **Stable names, not types**. Type aliases (`Metadata`, `FileItem`, etc.) name data roles; they don't enforce structure (that's deferred to TypedDict if ever).
- **Shared code where possible**; unique code only where vendor-specific. The `_send_<vendor>_result()` functions in `ai_client.py` are thin boundary adapters; the `send_openai_compatible()` helper is the shared algorithm.

### 3.2 Capability / Pattern / Convention as first-class docs

The user values explicit, discoverable conventions over implicit understanding. Each track introduces at least one canonical document:
- `conductor/code_styleguides/error_handling.md` (Fleury patterns)
- `conductor/code_styleguides/type_aliases.md` (type alias conventions)
- `docs/type_registry/` (auto-generated per-source-file schema docs)
- `conductor/code_styleguides/mcp_<type>.py` (implicit, via the naming convention)

`conductor/product-guidelines.md` is the umbrella; the styleguides are the detailed references. This pattern should be followed for any future track that introduces a new convention.

### 3.3 Audit + data-driven decisions

Two of the five tracks are data-grounded:
- `test_batching_refactor_20260606`: addressed the actual problem (alphabetical 4-at-a-time batching) and explicitly designed the solution around the test categories the project already uses (Tier 1 unit, Tier 2 mock_app, Tier 3 live_gui, etc.).
- `data_structure_strengthening_20260606`: was driven by the `scripts/audit_weak_types.py` findings (430 weak sites; 86% concentrated in 6 high-traffic files; 0 strong patterns; 26 unique type strings; top 4 = 86% of findings).

The audit data is the source of truth. The track's success criterion is a measurable drop in the audit count (430 → ~60 = 86% reduction).

### 3.4 Process: per-track commit + git note + checkpoint

Every plan follows the same template:
- **Per-task commit**: 1 commit per Red-Green-Refactor step
- **Per-checkpoint git note**: `git notes add -m "..."` summarizing what the phase delivered
- **Per-checkpoint state.toml update**: `current_phase` advanced; `checkpointsha` filled in

This template comes from the project's `conductor/workflow.md` and is consistently applied. The next planner / implementer should follow the same template.

### 3.5 Out-of-scope-by-default; follow-up tracks for the next round

Each of the 5 tracks explicitly defers work to follow-up tracks. The follow-ups are documented in each spec's §12.1:
- `public_api_migration_20260606`: removes deprecated `send()` (from data_oriented_error_handling)
- `type_registry_ci_20260606`: wires `generate_type_registry.py --check` into CI (from data_structure_strengthening)
- `mcp_dsl_20260606`: per-MCP compact DSL for tool calls (from mcp_architecture_refactor)
- `typed_dict_migration_20260606`: convert most-used aliases to `TypedDict` (initially planned; later replaced by the docs approach; kept as a future option)

These follow-ups are listed in `conductor/tracks.md` as `[ ]` placeholders (item 0f etc.). They should be sequenced AFTER the 5 main tracks ship.

---

## 4. The 5 Tracks in Detail

### 4.1 `test_batching_refactor_20260606`

**Goal:** Replace alphabetical 4-at-a-time batching with tiered batching that respects fixture-class boundaries.

**Architecture:**
- `scripts/test_categorizer.py`: AST-based classifier that determines each test file's `FixtureClass` (UNIT, MOCK_APP, LIVE_GUI, HEADLESS, OPT_IN, PERFORMANCE) and its `batch_group` (e.g., `core`, `gui`, `mma`).
- `scripts/test_batcher.py`: Pure scheduler. `plan(records, options) -> list[Batch]` deterministically produces batches.
- `scripts/pytest_collection_order.py`: Conftest-loaded plugin for the per-test order control (opt-in per file).
- `scripts/run_tests_batched.py`: Modified CLI orchestrator with `--tiers`, `--include-opt-in`, `--plan`, `--audit` modes.
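
A minimal sketch of what a pure, deterministic `plan()` could look like. The `FixtureClass` member names come from the list above; the `FileRecord` and `Batch` field names, the `include_opt_in` parameter (standing in for `options`), and the grouping rule are assumptions for illustration, not the planned implementation.

```python
from dataclasses import dataclass
from enum import Enum


class FixtureClass(Enum):
    UNIT = "unit"
    MOCK_APP = "mock_app"
    LIVE_GUI = "live_gui"
    HEADLESS = "headless"
    OPT_IN = "opt_in"
    PERFORMANCE = "performance"


@dataclass(frozen=True)
class FileRecord:  # hypothetical record shape
    path: str
    fixture_class: FixtureClass
    batch_group: str


@dataclass(frozen=True)
class Batch:  # hypothetical batch shape
    fixture_class: FixtureClass
    batch_group: str
    paths: tuple[str, ...]


def plan(records: list[FileRecord], include_opt_in: bool = False) -> list[Batch]:
    """Group records into batches; the same input set always yields the same plan."""
    grouped: dict[tuple[FixtureClass, str], list[str]] = {}
    for rec in records:
        if rec.fixture_class is FixtureClass.OPT_IN and not include_opt_in:
            continue
        # Assumption: all LIVE_GUI files share one batch so the GUI starts once.
        group = "all" if rec.fixture_class is FixtureClass.LIVE_GUI else rec.batch_group
        grouped.setdefault((rec.fixture_class, group), []).append(rec.path)
    order = list(FixtureClass)
    keys = sorted(grouped, key=lambda k: (order.index(k[0]), k[1]))
    return [Batch(fc, grp, tuple(sorted(grouped[(fc, grp)]))) for fc, grp in keys]
```

Because the function only sorts and groups its input, shuffling `records` does not change the result, which is what makes a `--plan` dry run reproducible.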

**Key decisions:**
- **Tier 3 (live_gui) is one pytest invocation**, not many. This is THE single biggest runtime savings (15s startup amortized).
- **Tier 1 (unit) uses pytest-xdist** for parallelism.
- **Tier 0 (opt-in) is gated on BOTH env var AND CLI flag** (defense-in-depth: setting the env var alone shouldn't accidentally enable docker tests).
- **Hybrid classification**: auto-infer from filename + AST fixture scan; hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files.
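
The double gate for opt-in tests can be sketched as a single predicate. The environment variable name `RUN_OPT_IN_TESTS` and the accepted truthy strings are hypothetical; only the "both must be set" rule comes from the decision above.

```python
import os
from collections.abc import Mapping


def opt_in_enabled(cli_flag: bool,
                   env: Mapping[str, str] = os.environ,
                   var: str = "RUN_OPT_IN_TESTS") -> bool:  # var name is hypothetical
    """Return True only when BOTH the CLI flag and the env var are set."""
    env_on = env.get(var, "").strip().lower() in {"1", "true", "yes"}
    return cli_flag and env_on
```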

**What's NOT done:** The script does NOT modify test files or fixtures; it only categorizes and batches. New tests get sensible defaults automatically.

**Current state:** Plan complete (`7fdab705` spec, `f7b11f7f` plan). Ready for execution.

---

### 4.2 `qwen_llama_grok_integration_20260606`

**Goal:** Add first-class support for Qwen, Llama, Grok. Introduce the Vendor Capability Matrix.

**Architecture:**
- `src/vendor_capabilities.py`: `VendorCapabilities` dataclass, `_REGISTRY` populated per-(vendor, model).
- `src/openai_compatible.py`: shared `send_openai_compatible()` helper (data-oriented design: it operates on normalized data).
- `src/qwen_adapter.py`: DashScope-specific tool format translation + error classification.
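
One way the capability matrix might be laid out, as a sketch. The seven field names follow the v1 capability list in this section; the registry entries, the `"*"` wildcard fallback, and the `get_capabilities()` helper are illustrative assumptions rather than the spec's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0  # tokens; 0 is used here to mean "unknown"
    cost_tracking: bool = False


# Keyed per (vendor, model). The entries below are placeholders, not real data.
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {
    ("qwen", "example-model"): VendorCapabilities(
        tool_calling=True, streaming=True, context_window=32_000),
    ("grok", "*"): VendorCapabilities(streaming=True),
}


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Exact (vendor, model) match, then vendor wildcard, then all-off defaults."""
    return (_REGISTRY.get((vendor, model))
            or _REGISTRY.get((vendor, "*"))
            or VendorCapabilities())
```

A UI element would then read a single field, for example hiding the screenshot button when `get_capabilities(vendor, model).vision` is false.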

**Key decisions:**
- **Naming convention:** `_send_<vendor>_result()` returning `Result[str, ErrorInfo]` (8 vendors: Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Llama, Grok).
- **Capability Matrix v1:** 7 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking. Audio and server-side code_execution deferred to a future track.
- **UX adaptation:** 9 UI elements read the matrix (including the screenshot button, tools toggle, cache panel, stream progress, fetch models button, token budget max, and cost panel).
- **The OpenAI-compatible SDK boundary keeps raising**; the new `_send_<vendor>_result()` functions catch the exception and convert it to `ErrorInfo`. Per Fleury: "exceptions are reserved for the SDK boundary."

**Coordination with `startup_speedup_20260606`:** Qwen's DashScope SDK adds a new import; the audit script `scripts/audit_main_thread_imports.py` ensures the import is gated to a worker thread, not the main thread. Verified at the baseline in Phase 1 of the track.

**Current state:** Plan complete (`b17cbbde` plan). Ready for execution.

---

### 4.3 `data_oriented_error_handling_20260606`

**Goal:** Introduce Ryan Fleury's "errors are just cases" framework as a project convention.

**Architecture:**
- `src/result_types.py`: `ErrorKind` enum, `ErrorInfo` dataclass, `Result[T]` generic, `NilPath` + `NilRAGState` sentinel singletons.
- `src/mcp_client.py` (the data_oriented refactor for MCP): `(p, err)` tuples → `Result[Path]`; `assert p is not None` → nil-sentinel.
- `src/ai_client.py`: `ProviderError` exception REMOVED; `_classify_<vendor>_error()` returns `ErrorInfo`; `_send_<vendor>()` renamed to `_send_<vendor>_result()` returning `Result[str]`.
- `src/rag_engine.py`: methods return `Result` instead of raising.
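
The shape of these types, and the "catch at the SDK boundary, return data everywhere else" rule, might look roughly like the sketch below. The type names `ErrorKind`, `ErrorInfo`, and `Result` come from the architecture list; the enum members, field names, `classify_error()`, and `send_vendor_result()` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):  # members are hypothetical
    NETWORK = "network"
    AUTH = "auth"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    value: Optional[T] = None
    errors: tuple[ErrorInfo, ...] = ()  # side-channel error list

    @property
    def ok(self) -> bool:
        return not self.errors


def classify_error(exc: Exception) -> ErrorInfo:
    """Turn an exception raised by an SDK into plain data."""
    if isinstance(exc, ConnectionError):
        return ErrorInfo(ErrorKind.NETWORK, str(exc))
    if isinstance(exc, PermissionError):
        return ErrorInfo(ErrorKind.AUTH, str(exc))
    return ErrorInfo(ErrorKind.UNKNOWN, str(exc))


def send_vendor_result(sdk_call: Callable[[str], str], prompt: str) -> Result[str]:
    """Boundary adapter: the SDK may raise, but callers only ever see a Result."""
    try:
        return Result(value=sdk_call(prompt))
    except Exception as exc:  # deliberately broad: this is the SDK boundary
        return Result(errors=(classify_error(exc),))
```

Callers branch on `result.ok` instead of wrapping every call in `try`/`except`.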

**Key decisions:**
- **Internal-only refactor for the public API.** `_send_<vendor>()` is renamed to `_send_<vendor>_result()` and retyped to return `Result[str]`. The public `send()` is preserved, marked `@typing_extensions.deprecated`; the new `send_result()` returns `Result[str]`. The actual breaking change happens in the follow-up `public_api_migration_20260606` track.
- **`ProviderError` is FULLY REMOVED**, not kept as a thin internal exception. Per Fleury, exceptions are for the SDK boundary only; once the boundary converts to `ErrorInfo`, no exception is needed.
- **Deprecation warnings suppressed in tests:** `tests/conftest.py` adds `filterwarnings("ignore::DeprecationWarning:src.ai_client")` during the transition.

**Coordination with pending tracks:**
- `mcp_architecture_refactor_20260606` assumes the `Result` pattern is in place (the new sub-MCPs return `Result[str, ErrorInfo]` from `invoke()`).
- `data_structure_strengthening_20260606` assumes the `Metadata` family aliases are in place (the result types are referenced by name).
- Both track specs have a §10 "Coordination with Pending Tracks" section that documents the post-tracks state and verifies it before proceeding.

**Current state:** Plan complete (`f7b11f7f` plan). Ready for execution.

---

### 4.4 `data_structure_strengthening_20260606`

**Goal:** Name the 430 anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types in the codebase.

**Architecture:**
- `src/type_aliases.py`: 10 `TypeAlias` definitions + 1 `NamedTuple` (`FileItemsDiff`).
  - `Metadata` (root), `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`
- `scripts/audit_weak_types.py` (already committed `84fd9ac9`): AST-based static analyzer. `Finding` dataclass; `--json`, `--top N`, `--verbose` modes. After this track: also `--strict` mode (CI gate; exits 1 if new weak sites are introduced).
- `scripts/generate_type_registry.py` (Phase 2): AST-based registry generator. 3 modes: default (regenerate), `--check` (CI; exits 1 if drift), `--diff` (dry run). Writes `docs/type_registry/<source_module>.md` per source file.
- `docs/type_registry/`: auto-generated per-source-file markdown references for the LLM to consult.

**The data that drove the design:**
- 430 weak sites across 29 of 61 files in `src/`
- 0 strong patterns currently (no `TypeAlias`, no `NamedTuple`, no `pydantic.BaseModel` in the relevant shapes)
- 26 unique type strings after normalization
- Top 4 unique strings = 86% of findings (`list[dict[str, Any]]`, `dict[str, Any]`, `Dict[str, Any]`, `List[Dict[str, Any]]`)
- File distribution: ai_client.py (139), app_controller.py (86), models.py (51), api_hook_client.py (32), project_manager.py (20), aggregate.py (17) = 345 in 6 files; the rest in 23 lower-impact files

**The "docs over TypedDict" decision (key user feedback mid-track):**
- Original draft proposed a follow-up track to convert aliases to `TypedDict`s.
- User pushed back: pay the token cost (LLM reads the docs) instead of the upfront cost (designing `TypedDict` schemas for every type).
- The `docs/type_registry/` generator is the result: an LLM can `cat docs/type_registry/ai_client.md` to see the fields of every struct in `src/ai_client.py` without the code having to enforce the structure at runtime.
- The 5-pattern structure (Nil sentinel, Zero-init, Fail-early, AND-over-OR, Side-channel errors) is documented in the styleguide.

**Coordination:**
- This track's aliases compose with the `Result[T]` from `data_oriented_error_handling_20260606`: `Result[FileItems]`, `Result[CommsLogEntry]`, etc. are valid generics.
- The audit script is the **permanent CI gate** for this convention. New `dict[str, Any]` in a PR fails `--strict` mode.

**Current state:** Plan complete (`91475781` plan). Ready for execution.

---

### 4.5 `mcp_architecture_refactor_20260606`

**Goal:** Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP.

**Architecture:**
- `src/mcp_client.py` (modified, slim): `SubMCP` Protocol + `MCPController` class + module-level `controller` singleton + `ALL_SUB_MCPS` registration list + re-export shim from `mcp_client_legacy`.
- `src/mcp_client_legacy.py` (NEW): the OLD `mcp_client.py` content. Re-exported for backward compat.
- `src/mcp_client_security.py` (NEW): 3-layer security (Allowlist → Resolve → Validate) returning `Result[Path]`.
- `src/mcp_file_io.py` (9 tools), `src/mcp_python.py` (14), `src/mcp_c.py` (5), `src/mcp_cpp.py` (5), `src/mcp_web.py` (2), `src/mcp_analysis.py` (2): native sub-MCPs.
- `src/mcp_external.py`: the existing `ExternalMCPManager` extracted; class name preserved as `ExternalMCP` for compat.
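
The three security layers are only named here (Allowlist → Resolve → Validate), so the sketch below is one plausible reading, not the module's design: the allowlist is a set of permitted root directories, resolving collapses `..` segments and symlinks, and validating checks that the resolved path still sits under a permitted root. `PathResult` is a simplified stand-in for `Result[Path]`, and `check_path()` is a hypothetical name.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PathResult:  # simplified stand-in for Result[Path]
    path: Optional[Path] = None
    error: str = ""


def check_path(raw: str, allowed_roots: list[Path]) -> PathResult:
    # Layer 1 (allowlist): refuse to do anything without configured roots.
    if not allowed_roots:
        return PathResult(error="no allowed roots configured")
    # Layer 2 (resolve): collapse "..", symlinks, and relative segments.
    resolved = Path(raw).resolve()
    # Layer 3 (validate): the resolved path must sit under an allowed root.
    for root in allowed_roots:
        if resolved.is_relative_to(root.resolve()):
            return PathResult(path=resolved)
    return PathResult(error=f"path outside allowed roots: {resolved}")
```

Resolving before validating matters: a raw string such as `root/../secret` starts with the allowed root textually but resolves outside it.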

**Naming convention (per user direction):** `mcp_<type>.py` for native MCPs. The user explicitly said this; the convention is locked in.

**Key design decisions:**
- **Sub-MCP shape:** class with `name` / `description` / `tools` (dict) / `invoke()` (returns `Result[str, ErrorInfo]`).
- **Registration mechanism:** explicit `controller.register(FileIOMCP())` at the bottom of `mcp_client.py`. New sub-MCP = create the file + add 2 lines to the registration. No magic, no auto-discovery.
- **Controller-level security:** the 3-layer security runs BEFORE delegating to sub-MCPs. Sub-MCPs receive already-validated paths. Testable in isolation.
- **Dispatch inversion:** the controller uses an inverted-dict `self._tool_index[tool_name] -> sub_mcp` for O(1) lookup. The current if/elif chain is O(n) per dispatch.
- **External MCP is NOT in `ALL_SUB_MCPS`**: it's a sub-controller. The main controller delegates to it AFTER native sub-MCPs miss.
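
The registration and dispatch-inversion decisions can be sketched together. `SubMCP`, `MCPController`, `register()`, and `_tool_index` are names from this section; `EchoMCP`, `dispatch()`, the duplicate-name check, and the plain `str` return type (standing in for `Result[str, ErrorInfo]`) are simplifying assumptions.

```python
from typing import Callable, Protocol


class SubMCP(Protocol):
    name: str
    tools: dict[str, Callable[..., str]]

    def invoke(self, tool_name: str, **kwargs: object) -> str: ...


class EchoMCP:
    """Hypothetical sub-MCP that exists only for this illustration."""
    name = "echo"

    def __init__(self) -> None:
        self.tools = {"echo": self._echo}

    def _echo(self, text: str = "") -> str:
        return text

    def invoke(self, tool_name: str, **kwargs: object) -> str:
        return self.tools[tool_name](**kwargs)


class MCPController:
    def __init__(self) -> None:
        # Inverted index: tool name -> owning sub-MCP, for O(1) dispatch.
        self._tool_index: dict[str, SubMCP] = {}

    def register(self, sub_mcp: SubMCP) -> None:
        for tool_name in sub_mcp.tools:
            if tool_name in self._tool_index:
                raise ValueError(f"duplicate tool name: {tool_name}")
            self._tool_index[tool_name] = sub_mcp

    def dispatch(self, tool_name: str, **kwargs: object) -> str:
        sub_mcp = self._tool_index.get(tool_name)
        if sub_mcp is None:
            return f"ERROR: unknown tool {tool_name!r}"
        return sub_mcp.invoke(tool_name, **kwargs)


controller = MCPController()
controller.register(EchoMCP())  # explicit registration, no auto-discovery
```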

**The "thin adapter" approach for v1:**
- Each sub-MCP's methods (e.g., `read_file`, `py_get_skeleton`) **delegate to the corresponding function in `mcp_client_legacy.py`**. This keeps the legacy module as the source of truth for the implementation; the new `mcp_<type>.py` is a thin adapter that adds the class shape, the security check, and the `Result` wrapping.
- A future track can move the actual implementations into the sub-MCP files directly once the architecture is established. For v1, delegation is the safer path.

**Backward compatibility:**
- `src/mcp_client_legacy.py` keeps all 45+ old function names.
- `src/mcp_client.py` is now a slim shim that re-exports them from the legacy module.
- The 4 existing test files (`test_mcp_client_beads.py`, `test_mcp_config.py`, `test_mcp_perf_tool.py`, `test_mcp_ts_integration.py`) and `src/app_controller.py:61` (the direct `mcp_client.py_get_symbol_info` call) continue to work unchanged.

**The DSL future (per user's notes on APL/K/Cosy):**
- The user shared a friend's idea: per-MCP compact dialects (like command line but more flexible) instead of JSON.
- Acknowledged in the spec as out of scope for this track ("no time for that").
- Documented as `mcp_dsl_20260606` follow-up in spec §12.1.
- The sub-MCP architecture is the natural unit to pair with a DSL emitter in the future.

**Current state:** Plan complete (`cf01870b` plan). Ready for execution.

---

## 5. The Audit & Data Foundation

The most data-grounded track is `data_structure_strengthening_20260606`. The audit that drove it is committed at `84fd9ac9`:

```
File: scripts/audit_weak_types.py
Size: 281 lines
Modes: default (human-readable), --json, --top N, --verbose
Detection: AST-based; regex over ast.unparse() of type annotations
Patterns detected: 14 (Dict[str, Any], list[dict[...]], Tuple[...], Optional[...], assign-tuple-literal, ...)
Positive patterns detected: TypeAlias, NamedTuple, @dataclass, pydantic.BaseModel
Exit codes: 0 = informational, 1 = usage error
```
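
The detection approach named in the summary (walk the AST, apply a regex to `ast.unparse()` of each annotation) can be illustrated with a much smaller sketch. It checks a single pattern instead of the script's 14, and the function name and return shape are assumptions, not the script's actual interface.

```python
import ast
import re

# One illustrative pattern; the real script is described as detecting 14.
WEAK = re.compile(r"\b[Dd]ict\[str, Any\]")


def find_weak_annotations(source: str) -> list[tuple[int, str]]:
    """Return (line, annotation) for each annotation matching the weak pattern."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        annotations = []
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            all_args = args.posonlyargs + args.args + args.kwonlyargs
            annotations += [arg.annotation for arg in all_args]
            annotations.append(node.returns)
        elif isinstance(node, ast.AnnAssign):
            annotations.append(node.annotation)
        for ann in annotations:
            if ann is None:
                continue
            text = ast.unparse(ann)  # normalized source text of the annotation
            if WEAK.search(text):
                findings.append((ann.lineno, text))
    return sorted(findings)
```

Matching on the unparsed text rather than the raw source means whitespace and line-break differences between equivalent annotations do not produce distinct type strings.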

**Pre-track findings (baseline):**
- 430 weak sites in 29 of 61 files
- 0 strong patterns
- 26 unique type strings
- Top 4 unique strings = 86% of findings

**Post-track target:**
- ~60 weak sites in the 23 lower-impact files (the 6 high-traffic files contribute 0)
- 10 `TypeAlias` definitions + 1 `NamedTuple` in use
- `--strict` mode + baseline file as permanent CI gate

This is **the most measurable track** in the planning session. Success = a concrete number drop in the audit count.

---

## 6. The Coordination Picture (dependencies)

The 5 tracks form a dependency graph. Each arrow means "blocks":

```
startup_speedup_20260606 (SHIPPED)
    ↓
    ├── test_batching_refactor_20260606 (planned)
    │
    ├── qwen_llama_grok_integration_20260606 (planned)
    │       ↓
    │       ├── data_oriented_error_handling_20260606 (planned)
    │       │       ↓
    │       │       ├── public_api_migration_20260606 (follow-up; not yet specced)
    │       │       └── type_registry_ci_20260606 (follow-up; not yet specced)
    │       │
    │       └── data_structure_strengthening_20260606 (planned)
    │               ↓
    │               └── type_registry_ci_20260606 (follow-up; not yet specced)
    │
    └── mcp_architecture_refactor_20260606 (planned; depends on data_oriented + data_structure tracks)
            ↓
            └── mcp_dsl_20260606 (follow-up; not yet specced)
```

**Critical insight:** `mcp_architecture_refactor_20260606` depends on BOTH `data_oriented_error_handling_20260606` (for `Result`) and `data_structure_strengthening_20260606` (for the `Metadata` aliases). If the implementing agent executes tracks in arbitrary order, this dependency is broken.
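
The graph above can also be walked mechanically. A minimal sketch using the standard-library `graphlib` module: the `BLOCKED_BY` mapping is a hand transcription of the diagram (follow-up tracks omitted, date suffixes dropped), and `execution_waves()` is a hypothetical helper, not part of any planned script.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tracks that must ship before it.
BLOCKED_BY = {
    "startup_speedup": set(),
    "test_batching_refactor": {"startup_speedup"},
    "qwen_llama_grok": {"startup_speedup"},
    "data_oriented_error_handling": {"qwen_llama_grok"},
    "data_structure_strengthening": {"qwen_llama_grok"},
    "mcp_architecture_refactor": {
        "data_oriented_error_handling",
        "data_structure_strengthening",
    },
}


def execution_waves(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tracks into waves; tracks within one wave can run in parallel."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```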

The recommended execution order is the topological order: `startup_speedup` (done) → `qwen_llama_grok` → `data_oriented_error_handling` + `data_structure_strengthening` (in parallel) → `mcp_architecture_refactor` → `test_batching_refactor` (no dependencies; can run anytime) → follow-up tracks.

---

## 7. Follow-up Tracks Already Planned (Not in This Session's 5)

Each track's spec §12.1 names a follow-up. Aggregated:

| Follow-up | Parent track | Scope |
|---|---|---|
| `public_api_migration_20260606` | data_oriented_error_handling | Remove deprecated `ai_client.send()`; migrate all callers (multi_agent_conductor, app_controller, ~50 tests) to `send_result()` |
| `type_registry_ci_20260606` | data_structure_strengthening | Wire `generate_type_registry.py --check` into CI; add pre-commit hook; document per-track commit workflow |
| `mcp_dsl_20260606` | mcp_architecture_refactor | Per-MCP compact dialect for tool calls (APL/K/Cosy-inspired); ~5x token reduction per call |

All three are listed in `conductor/tracks.md` as `[ ]` placeholders. They should be sequenced AFTER the 5 main tracks ship. None are urgent; all are improvements.

---

## 8. Recommended Future Tracks (Beyond What's Planned)

These are tracks I identified during this session but didn't fully spec. They're ranked by what I think is most important.

### 8.1 Post-Tracks Documentation Synchronization (top pick)

**Why:** The 5 planned tracks add 10+ new modules and change the architecture significantly. The existing docs (`docs/guide_*.md`) were last updated in the 2026-06-02 comprehensive docs refresh, and they are about to be more out of date than they are now. Stale docs are the #1 enemy of AI readability (an LLM reading `guide_ai_client.md` and finding it pre-dates `Result`/`ErrorInfo` will hallucinate the wrong shape).

**Scope (1-2 phases):**
- Phase 1: Update all existing guides (`guide_ai_client.md`, `guide_mcp_client.md`, etc.) to reflect the post-tracks state.
- Phase 2: Add cookbooks ("How to add a new sub-MCP", "How to add a new AI vendor", "How to add a new result type") + a `docs/type_registry.md` index.

**Why first:** Bounded and achievable. Closes the loop on all the planning work: each track ships a module; this track ships the docs that explain those modules.

### 8.2 Test Coverage Audit & Improvement (runner-up)

**Why:** The project has a stated >80% coverage target per `conductor/workflow.md`, but the actual current state is unknown. Under-tested areas are likely `app_controller.py` (4,153 lines; the orchestrator that touches everything) and `multi_agent_conductor.py` (the most complex control flow). The new modules from the 5 planned tracks each get unit tests in their respective tracks, but integration tests are sparse.

**Scope (1-2 phases):**
- Phase 1: Run `pytest --cov=src --cov-report=html`; identify the bottom-10 modules by coverage; write tests to bring each to >80%.
- Phase 2: Add a coverage threshold to CI (e.g., `--cov-fail-under=80`); add per-module coverage badges to `docs/Readme.md`.

### 8.3 Security Audit / Hardening

**Why:** The 3-layer MCP security model is solid, but there are adjacent concerns:
- **Command injection in `run_powershell`**: the AI generates PowerShell commands; how is the risk of a malicious model call mitigated? The HITL dialog exists, but is it consistently applied?
- **Prompt injection**: the AI sees file content, web search results, Beads queries. A malicious file could inject instructions that the AI then follows. How is this sanitized?
- **Sensitive data in logs**: the `comms_log` records full API requests/responses. If a user includes an API key or password in a message, it ends up in the log. What's the redaction policy?

**Scope (1-2 phases):**
- Phase 1: Threat model the AI tool-calling surface; document the existing mitigations; identify gaps.
- Phase 2: Add log redaction for known secret patterns; add a "dangerous command" detector for `run_powershell`; add an "untrusted content" marker for content from external sources.

### 8.4 Dependency Hygiene

**Why:** `pyproject.toml` has a long dependency list. No track currently covers:
- Version pinning strategy (caret vs tilde vs exact)
- Deprecation monitoring (track when a vendor SDK announces EOL)
- License audit (any GPL contamination?)
- CVE scanning

This is a "track for the person who maintains the project 6 months from now."

---
|
||||
|
||||
## 9. Risks & Open Questions (Cross-Track)
|
||||
|
||||
### 9.1 Risks
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| The implementing agent executes tracks in the wrong order, breaking the dependency chain (especially for `mcp_architecture_refactor_20260606` which depends on the other two). | Medium | High (broken tests; confusing failures) | The recommended execution order in §6 is explicit. The plan files note the dependencies in their "blocked_by" sections. |
|
||||
| The 5 tracks add 10+ new files but the `scripts/audit_main_thread_imports.py` doesn't catch a heavy import in one of the new modules. | Low | Medium (regresses the startup_speedup invariant) | Each new module's Phase 1 task includes an import-time check (`uv run python -c "import time; ..."`). |
|
||||
| A future contributor adds a new `dict[str, Any]` after the data_structure_strengthening track; the audit `--strict` mode catches it, but they're confused about why. | Medium | Low (process friction) | The styleguide + the deprecation warning in `--strict` mode explain the rule. |
|
||||
| The `mcp_client_legacy.py` shim becomes permanent and never gets removed. | Medium | Low (acceptable) | The `public_api_migration_20260606` follow-up (and any future MCP-API changes) is the natural place to remove the shim. |
|
||||
| The DSL idea becomes a "we have to do it now" before the architecture track is done. | Low | Low | The DSL is explicitly out of scope. The sub-MCP architecture is compatible with a future DSL layer. |
|
||||
|
||||
### 9.2 Open questions for the next planning round
|
||||
|
||||
- **Where do the implementation agents' session notes / handoffs go?** Each track has `metadata.json` + `state.toml` for the planning side. There's no equivalent for the implementation side. (The `startup_speedup_20260606` track's recent commits `253e1798`, `88fc42bb`, `8c4791d0` suggest they do handoff via commit messages, but a structured format would be nice.)
|
||||
- **What happens when a track's implementation diverges from the plan?** Per `conductor/workflow.md`, "implementation differs from spec" is handled by updating the spec. But the plan files don't have a clear "deviations" section. Consider adding one to future plans.
|
||||
- **How are plan review comments captured?** The plan files are committed at `cf01870b` (and the others). But there's no `conductor/plan_reviews/` directory. If the implementing agent has questions or disagreements, where do they go?
|
||||
|
||||
---
|
||||
|
||||
## 10. File Index
|
||||
|
||||
For the implementing agent (and any future planner), here's the canonical file index.
|
||||
|
||||
### 10.1 Conductor convention files (the project-level structure)
|
||||
|
||||
| File | Purpose |
|---|---|
| `conductor/tracks.md` | Master track registry. Lists all tracks with their status (`[ ]` planned, `[~]` in progress, `[x]` done) and `[track-created: <sha>]` references. |
| `conductor/workflow.md` | The project's TDD + per-track commit + git note workflow. |
| `conductor/product-guidelines.md` | The project's design principles (1-space indent, 1 commit per task, type hints, etc.). |
| `conductor/product.md` | The project's product vision and use cases. |
| `conductor/tech-stack.md` | The project's tech stack. |
| `conductor/code_styleguides/python.md` | Language-specific style guide. |
| `conductor/code_styleguides/error_handling.md` | (created in data_oriented_error_handling) Data-Oriented Error Handling convention. |
| `conductor/code_styleguides/type_aliases.md` | (created in data_structure_strengthening) Type Aliases convention. |
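
As a rough illustration of how the status markers in `conductor/tracks.md` could be read programmatically, here is a minimal sketch. The line layout it parses (`- [~] <track_name> [track-created: <sha>]`), the `TrackEntry` class, the `parse_tracks` helper, and the sample SHAs are all assumptions made for illustration; the real registry may be laid out differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Status markers as described in the table above.
STATUS_BY_MARK = {" ": "planned", "~": "in progress", "x": "done"}

# ASSUMED line shape (hypothetical): "- [~] track_name [track-created: <sha>]"
_TRACK_LINE = re.compile(
    r"^\s*[-*]\s*\[(?P<mark>[ ~x])\]\s*(?P<name>\S+)"
    r"(?:.*\[track-created:\s*(?P<sha>[0-9a-f]{7,40})\])?"
)


@dataclass(frozen=True)
class TrackEntry:
    name: str
    status: str
    created_sha: Optional[str]  # None when no [track-created: ...] tag is present


def parse_tracks(text: str) -> list[TrackEntry]:
    """Return one TrackEntry per line that looks like a track registry entry."""
    entries = []
    for line in text.splitlines():
        match = _TRACK_LINE.match(line)
        if match is None:
            continue  # headings, prose, blank lines
        entries.append(
            TrackEntry(match["name"], STATUS_BY_MARK[match["mark"]], match["sha"])
        )
    return entries


# Placeholder content; the SHAs below are made up.
sample = """\
# Tracks
- [x] startup_speedup_20260606 [track-created: aaaaaaa1]
- [~] test_batching_refactor_20260606 [track-created: bbbbbbb2]
- [ ] mcp_architecture_refactor_20260606
"""
tracks = parse_tracks(sample)
```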

### 10.2 The 5 new tracks (this session's planning output)

| Track | Spec SHA | Plan SHA | Files |
|---|---|---|---|
| `test_batching_refactor_20260606` | `b7a97374` | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md |
| `qwen_llama_grok_integration_20260606` | `7c1d597e` (track init), `97daaff2` (consistency) | `b17cbbde` | spec.md, metadata.json, state.toml, plan.md |
| `data_oriented_error_handling_20260606` | `494f68f9` (init), `cbc3b075` (track + tracks.md), `f7b11f7f` (plan) | `f7b11f7f` | spec.md, metadata.json, state.toml, plan.md |
| `data_structure_strengthening_20260606` | `ed42a97a` (init), `aba35f9f` (registry), `432c7895` (risk) | `91475781` | spec.md, metadata.json, state.toml, plan.md |
| `mcp_architecture_refactor_20260606` | `2720a894` (init), `dd137df7` (backfill) | `cf01870b` | spec.md, metadata.json, state.toml, plan.md |

### 10.3 The 5 new module families (what the tracks will create)

| Module family | Created by | Files |
|---|---|---|
| Test batching | `test_batching_refactor_20260606` | `scripts/{test_categorizer,test_batcher,pytest_collection_order}.py`, `scripts/run_tests_batched.py`, `tests/test_categories.toml` |
| Vendor capability matrix | `qwen_llama_grok_integration_20260606` | `src/{vendor_capabilities,openai_compatible,qwen_adapter}.py` |
| Result types | `data_oriented_error_handling_20260606` | `src/result_types.py` |
| Type aliases + registry | `data_structure_strengthening_20260606` | `src/type_aliases.py`, `scripts/generate_type_registry.py`, `docs/type_registry/` |
| Sub-MCPs | `mcp_architecture_refactor_20260606` | `src/mcp_<type>.py` (7 files), `src/mcp_client_security.py`, `src/mcp_client_legacy.py` |

### 10.4 The audit script (data-driven decisions)

| File | Purpose |
|---|---|
| `scripts/audit_weak_types.py` (committed `84fd9ac9`) | AST analyzer that found the 430 weak sites driving data_structure_strengthening. |
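
To make the idea of an AST-based weak-type audit concrete, here is a minimal sketch of how such an analyzer could flag `dict[str, Any]` annotations. It is not the contents of `scripts/audit_weak_types.py`: the `WEAK_PATTERNS` set, `find_weak_sites`, and `strict_exit_code` are hypothetical names, and the real script's rules and `--strict` behavior may differ. It relies on `ast.unparse`, so it assumes Python 3.9 or later.

```python
import ast

# ASSUMPTION: the annotation spellings treated as "weak" here are illustrative only.
WEAK_PATTERNS = {"dict[str, Any]", "Dict[str, Any]"}


def _annotations(tree: ast.AST):
    """Yield every annotation expression: variables, parameters, and return types."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.AnnAssign, ast.arg)) and node.annotation is not None:
            yield node.annotation
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns is not None:
            yield node.returns


def find_weak_sites(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, annotation text) pairs for each weak annotation."""
    sites = []
    for annotation in _annotations(ast.parse(source)):
        # Walk nested subscripts too, so list[dict[str, Any]] is also caught.
        for sub in ast.walk(annotation):
            if isinstance(sub, ast.Subscript):
                text = ast.unparse(sub)
                if text in WEAK_PATTERNS:
                    sites.append((sub.lineno, text))
    return sorted(sites)


def strict_exit_code(sites: list[tuple[int, str]]) -> int:
    """Hypothetical strict-mode policy: any finding fails the gate."""
    return 1 if sites else 0


sample = (
    "from typing import Any\n"
    "def load(path: str) -> dict[str, Any]:\n"
    "    cache: list[dict[str, Any]] = []\n"
    "    return {}\n"
)
sites = find_weak_sites(sample)
```

Normalizing each annotation with `ast.unparse` and comparing strings keeps the sketch short; a production audit would more likely inspect the node structure directly so that aliases and qualified names such as `typing.Any` are handled.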

### 10.5 The startup_speedup predecessor

| Track | Status | Key outputs |
|---|---|---|
| `startup_speedup_20260606` | SHIPPED (commits `12cec6ae`, `bb2ac6c9`, `253e1798`, `88fc42bb`, `8c4791d0`) | `_io_pool` ThreadPoolExecutor; warmup mechanism; lazy SDK imports; `scripts/audit_main_thread_imports.py` CI gate |

This is the **predecessor for all 5 tracks**: the lazy-SDK-import convention means the new modules can use `from src.openai_compatible import send_openai_compatible` at the top without paying the SDK import cost on the main thread.

---

## 11. Closing Notes

### 11.1 What the user achieved in this session

In a single multi-hour planning session, the user:
- Approved 5 architectural refactor tracks end-to-end (brainstorming → spec → plan)
- Made 3 major design decisions: (1) the `mcp_<type>.py` naming convention, (2) the "docs over TypedDict" tradeoff, (3) deprecation rather than removal of the public `send()` API
- Brought in external inspiration: Ryan Fleury's data-oriented error handling and a friend's DSL idea
- Established a pattern for **data-grounded planning**: every spec is preceded by an audit (or an inventory) that drives the design decisions

### 11.2 What the implementing agent inherits

- 5 fully specced and planned tracks, each with a TDD task breakdown
- A clear execution order (topological sort of the dependency graph)
- Roughly 25 or more unit tests per track (pre-existing + new) that serve as regression coverage
- A permanent audit + CI gate (`scripts/audit_weak_types.py --strict`) for the type-alias convention
- Styleguides + product-guidelines + a new docs directory (`docs/type_registry/`) that serve as living documentation
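
The "topological sort of the dependency graph" in the list above can be reproduced with the standard library. The sketch below encodes only the one dependency this section states (every track builds on `startup_speedup_20260606`); any inter-track edges from the digest's dependency-graph section are not reproduced here and would need to be added to `depends_on`, which is a hypothetical name.

```python
from graphlib import TopologicalSorter

PREDECESSOR = "startup_speedup_20260606"
TRACKS = [
    "test_batching_refactor_20260606",
    "qwen_llama_grok_integration_20260606",
    "data_oriented_error_handling_20260606",
    "data_structure_strengthening_20260606",
    "mcp_architecture_refactor_20260606",
]

# Maps each track to the set of tracks that must ship before it.
# ASSUMPTION: only the shared predecessor is encoded; inter-track edges are omitted.
depends_on = {track: {PREDECESSOR} for track in TRACKS}

# static_order() yields every node after all of its predecessors.
execution_order = list(TopologicalSorter(depends_on).static_order())
```

With only the shared predecessor encoded, the sort guarantees that `startup_speedup_20260606` comes first and leaves the relative order of the 5 tracks unconstrained.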

### 11.3 What I would do differently if I could start over

- **Introduce the data-oriented framing earlier:** The user brought in Fleury's article mid-session (for the error-handling track). It would have been useful to surface the data-oriented design philosophy in the FIRST track (test_batching_refactor) and apply it there. Going forward, this is a thread to weave into every track.
- **The "richest context" claim is half-true:** I have deep visibility into architecture and code quality concerns but little visibility into operational / production concerns (observability, telemetry, error rates in the field, user experience metrics). The recommended future tracks in §8 reflect this bias.

### 11.4 One last recommendation

**The post-tracks documentation track (§8.1) is the single most important thing to do NEXT.** Once the 5 tracks ship, the docs will be out of date. Plan it BEFORE the user starts working on the next big feature, so the codebase stays maintainable.