conductor(tracks): add test_infrastructure_hardening_20260609 as foundation track + supersede 4 placeholder test tracks

2026-06-09 15:18:20 -04:00
parent 566cf08cb8
commit fcb161fd2e
1 changed file with 37 additions and 29 deletions
@@ -16,31 +16,32 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked

 | # | Priority | Track | Status | Blocked By |
 |---|---|---|---|---|
-| 1 | A | [Qwen, Llama & Grok Vendor Integration + Capability Matrix](#track-qwen-llama-grok-vendor-integration--capability-matrix) | spec ✓, plan pending | (none — foundation track) |
-| 2 | A | [Data-Oriented Error Handling (Fleury Pattern)](#track-data-oriented-error-handling-fleury-pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, **qwen_llama_grok** (this is the upstream) |
-| 3 | A | [Data Structure Strengthening (Type Aliases + NamedTuples)](#track-data-structure-strengthening-type-aliases--namedtuples) | spec ✓, plan pending | (none — independent) |
-| 4 | A | [MCP Architecture Refactor (Sub-MCP Extraction)](#track-mcp-architecture-refactor-sub-mcp-extraction) | spec ✓, plan pending | data_oriented_error_handling, data_structure_strengthening |
-| 5 | D | [Public API Result Migration](#track-public-api-result-migration-followup) | placeholder; not yet specced | data_oriented_error_handling (deprecated `send()`) |
-| 6 | — | [UI Polish (Five Issues)](#track-ui-polish-five-issues) | spec ✓, plan ✓, ready to start | (none — independent) |
-| 7 | — | [Bootstrap gencpp Python Bindings](#track-bootstrap-gencpp-python-bindings) | spec TBD | (none — independent) |
-| 8 | — | [Tree-Sitter Lua MCP Tools](#track-tree-sitter-lua-mcp-tools) | spec TBD | (none — independent) |
-| 9 | — | [GDScript Language Support Tools](#track-gdscript-language-support-tools) | spec TBD | (none — independent) |
-| 10 | — | [C# Language Support Tools](#track-c-language-support-tools) | spec TBD | (none — independent) |
-| 11 | — | [OpenAI Provider Integration](#track-openai-provider-integration) | spec TBD | (none — independent) |
-| 12 | — | [Zhipu AI (GLM) Provider Integration](#track-zhipu-ai-glm-provider-integration) | spec TBD | (none — independent) |
-| 13 | — | [AI Provider Caching Optimization](#track-ai-provider-caching-optimization) | spec TBD | (none — independent) |
-| 14 | — | [Manual UX Validation & Review](#track-manual-ux-validation--review) | spec TBD | (none — independent) |
-| 14a | — | [Manual UX Validation — ASCII-Sketch Workflow](#track-manual-ux-validation--ascii-sketch-workflow-new-2026-06-08) | spec ✓, plan ✓, ready to start | (none — independent; NEW 2026-06-08) |
-| 14b | — | [Chunkification Optimization (Contingency)](#track-chunkification-optimization-new-2026-06-08-contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
-| 15 | — | [GenCpp Dogfood Feedback Loop](#track-gencpp-dogfood-feedback-loop) | spec TBD | (none — independent; oldest pending track) |
-| 16 | — | [Code Path Audit](#track-code-path-audit) | spec TBD | (none — investigation track) |
-| 17 | — | [GUI Architecture Refinement](#track-gui-architecture-refinement) | (no spec.md) | (TBD) |
-| 18 | — | [Context First Message Fix](#track-context-first-message-fix) | spec TBD | (none — independent) |
-| 19 | — | [Fix Remaining Tests](#track-fix-remaining-tests) | spec TBD | (none — independent) |
-| 20 | — | [Test Harness Hardening](#track-test-harness-hardening) | spec TBD | (none — independent) |
-| 21 | — | [Test Patch Fixes](#track-test-patch-fixes) | spec TBD | (none — independent) |
-| 22 | — | [Test Batching Post-Refactor Polish](#track-test-batching-post-refactor-polish) | spec TBD | test_batching_refactor (COMPLETE) |
-| 23 | — | [Prior Session Test Harden (20260605)](#track-prior-session-test-harden-20260605-superseded) | superseded; no action needed | — |
+| 1 | A | [Test Infrastructure Hardening (2026-06-09)](#track-test-infrastructure-hardening-2026-06-09) | spec ✓, plan ✓, ready to start | (none — foundation track; SUPERSEDES former tracks 19, 20, 21, 22) |
+| 2 | A | [Qwen, Llama & Grok Vendor Integration + Capability Matrix](#track-qwen-llama-grok-vendor-integration--capability-matrix) | spec ✓, plan pending | **test_infrastructure_hardening_20260609** (was: none) |
+| 3 | A | [Data-Oriented Error Handling (Fleury Pattern)](#track-data-oriented-error-handling-fleury-pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, **test_infrastructure_hardening_20260609**, qwen_llama_grok |
+| 4 | A | [Data Structure Strengthening (Type Aliases + NamedTuples)](#track-data-structure-strengthening-type-aliases--namedtuples) | spec ✓, plan pending | **test_infrastructure_hardening_20260609** (was: none) |
+| 5 | A | [MCP Architecture Refactor (Sub-MCP Extraction)](#track-mcp-architecture-refactor-sub-mcp-extraction) | spec ✓, plan pending | test_infrastructure_hardening_20260609, data_oriented_error_handling, data_structure_strengthening |
+| 6 | D | [Public API Result Migration](#track-public-api-result-migration-followup) | placeholder; not yet specced | data_oriented_error_handling (deprecated `send()`) |
+| 7 | — | [UI Polish (Five Issues)](#track-ui-polish-five-issues) | spec ✓, plan ✓, ready to start | (none — independent) |
+| 8 | — | [Bootstrap gencpp Python Bindings](#track-bootstrap-gencpp-python-bindings) | spec TBD | (none — independent) |
+| 9 | — | [Tree-Sitter Lua MCP Tools](#track-tree-sitter-lua-mcp-tools) | spec TBD | (none — independent) |
+| 10 | — | [GDScript Language Support Tools](#track-gdscript-language-support-tools) | spec TBD | (none — independent) |
+| 11 | — | [C# Language Support Tools](#track-c-language-support-tools) | spec TBD | (none — independent) |
+| 12 | — | [OpenAI Provider Integration](#track-openai-provider-integration) | spec TBD | (none — independent) |
+| 13 | — | [Zhipu AI (GLM) Provider Integration](#track-zhipu-ai-glm-provider-integration) | spec TBD | (none — independent) |
+| 14 | — | [AI Provider Caching Optimization](#track-ai-provider-caching-optimization) | spec TBD | (none — independent) |
+| 15 | — | [Manual UX Validation & Review](#track-manual-ux-validation--review) | spec TBD | (none — independent) |
+| 15a | — | [Manual UX Validation — ASCII-Sketch Workflow](#track-manual-ux-validation--ascii-sketch-workflow-new-2026-06-08) | spec ✓, plan ✓, ready to start | (none — independent; NEW 2026-06-08) |
+| 15b | — | [Chunkification Optimization (Contingency)](#track-chunkification-optimization-new-2026-06-08-contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
+| 16 | — | [GenCpp Dogfood Feedback Loop](#track-gencpp-dogfood-feedback-loop) | spec TBD | (none — independent; oldest pending track) |
+| 17 | — | [Code Path Audit](#track-code-path-audit) | spec TBD | test_infrastructure_hardening_20260609 (was: none) |
+| 18 | — | [GUI Architecture Refinement](#track-gui-architecture-refinement) | (no spec.md) | (TBD) |
+| 19 | — | [Context First Message Fix](#track-context-first-message-fix) | spec TBD | (none — independent) |
+| ~~19~~ | — | ~~[Fix Remaining Tests](#track-fix-remaining-tests)~~ | ~~SUPERSEDED by track 1~~ | — |
+| ~~20~~ | — | ~~[Test Harness Hardening](#track-test-harness-hardening)~~ | ~~SUPERSEDED by track 1~~ | — |
+| ~~21~~ | — | ~~[Test Patch Fixes](#track-test-patch-fixes)~~ | ~~SUPERSEDED by track 1~~ | — |
+| ~~22~~ | — | ~~[Test Batching Post-Refactor Polish](#track-test-batching-post-refactor-polish)~~ | ~~SUPERSEDED by track 1 (FR1 + FR2)~~ | — |
+| 20 | — | [Prior Session Test Harden (20260605)](#track-prior-session-test-harden-20260605-superseded) | superseded; no action needed | — |

 **Note on numbering:** the legacy file used `0a`, `0b`, `0c`... and `0d`, `0e`, `0f`, `0g` for tracks created 2026-06-06+. This is the **git-blame sort order**, not a logical execution order. The new structure re-orders by dependency.
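
The dependency ordering described for the table above can be checked mechanically. The following is a minimal sketch, not part of the commit: it transcribes the "Blocked By" column into a dictionary using the short track names from the table and derives one valid start order with the standard-library `graphlib` module. The names `BLOCKED_BY` and `execution_order` are hypothetical, and `startup_speedup` and `test_batching_refactor` are left out to keep the sketch short.

```python
from graphlib import TopologicalSorter

# Track -> set of tracks it is blocked by, transcribed from the
# "Blocked By" column of the table (short names as used there).
# BLOCKED_BY is a hypothetical name introduced for this illustration.
BLOCKED_BY = {
    "test_infrastructure_hardening": set(),
    "qwen_llama_grok": {"test_infrastructure_hardening"},
    "data_oriented_error_handling": {
        "test_infrastructure_hardening",
        "qwen_llama_grok",
    },
    "data_structure_strengthening": {"test_infrastructure_hardening"},
    "mcp_architecture_refactor": {
        "test_infrastructure_hardening",
        "data_oriented_error_handling",
        "data_structure_strengthening",
    },
    "public_api_migration": {"data_oriented_error_handling"},
    "code_path_audit": {"test_infrastructure_hardening"},
}


def execution_order(blocked_by):
    """Return one valid start order for the tracks.

    TopologicalSorter expects a mapping of node -> predecessors, which is
    exactly the shape of a "blocked by" table. A dependency cycle raises
    graphlib.CycleError instead of returning a bogus order.
    """
    return list(TopologicalSorter(blocked_by).static_order())


order = execution_order(BLOCKED_BY)
print(order)
```

Several orders are valid for this graph, so the exact list may vary; the only guarantee is that every track appears after all of its blockers, which puts the foundation track first.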

@@ -464,28 +465,35 @@ User review surfaced five outstanding UI issues, each previously attempted witho
 *Goal: Replace alphabetical 4-at-a-time batching in `scripts/run_tests_batched.py` with fixture-class-isolated tiers: 0 (opt-in: clean_install/docker, gated on env var + --include-opt-in flag), 1 (unit, grouped by subsystem batch_group, pytest-xdist), 2 (mock_app, grouped), 3 (live_gui, all in one pytest invocation to amortize 15s startup), H (headless), P (performance, last). Hybrid classification: auto-infer from filename + AST fixture scan, hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files. Opt-in per-test order control via `[[files.X.test_order]]` sub-tables, gated on a conftest-loaded pytest plugin (no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library+dry-run, shadow run, switch default, cleanup.*
 *Goal: Reduce `sloppy.py` startup time by ~2000-2400ms. **Main Thread Purity Invariant**: main thread (entering `immapp.run()`) never imports a module heavier than `imgui_bundle` + lean `gui_2` skeleton. **No-prefetch rule**: heavy SDKs (`google.genai` 955ms, `anthropic` 430ms, `openai` 445ms, `fastapi` 470ms) are lazy-only — paid once on first use, on the asyncio thread, not in the background. **No-new-threads rule**: all background work goes through `AppController._io_pool` (4-thread `ThreadPoolExecutor`, named `controller-io-N`); zero new `threading.Thread(...)` calls in `src/`. **Enforcement**: static `scripts/audit_main_thread_imports.py` CI gate + runtime `tests/test_main_thread_purity.py` (`sys.addaudithook` test). 9 phases, 57 tasks. Target: `import src.ai_client` < 50ms (from ~1800ms), `import src.gui_2` < 500ms (from ~3000ms), `live_gui.wait_for_server(timeout=15)` no longer times out.*

+### Active
+
+#### Track: Test Infrastructure Hardening (2026-06-09) `[track-created: 566cf08c]`
+*Link: [./tracks/test_infrastructure_hardening_20260609/](./tracks/test_infrastructure_hardening_20260609/), Spec: [./tracks/test_infrastructure_hardening_20260609/spec.md](./tracks/test_infrastructure_hardening_20260609/spec.md), Plan: [./tracks/test_infrastructure_hardening_20260609/plan.md](./tracks/test_infrastructure_hardening_20260609/plan.md), Metadata: [./tracks/test_infrastructure_hardening_20260609/metadata.json](./tracks/test_infrastructure_hardening_20260609/metadata.json), State: [./tracks/test_infrastructure_hardening_20260609/state.toml](./tracks/test_infrastructure_hardening_20260609/state.toml)*
+
+*Goal: **Kill the test regression nightmare** that has consumed 4+ days of Tier 2 work. Fix 3 root causes of test regression churn: (1) subprocess state pollution via autouse `_check_live_gui_health` respawn (FR1), (2) filesystem path hygiene via `tmp_path_factory` + `live_gui_workspace` fixture (FR2), (3) `_sync_rag_engine` io_pool race via token + dirty flag coalescing (FR3). Plus 2 related fixes: `set_value` hook routing for `ai_input` (FR4), and an opt-in `clean_baseline` marker (FR5). 8 phases, ~60 surgical tasks, 6.5 days. Produces `docs/reports/test_bed_health_20260609.md` as the green baseline for the 4 upcoming tracks. **Inherits from** `test_infra_hardening_foundation_20260608` + `batch_resilience_plan_20260608` + `rag_test_batch_failure_status_20260609_pm3` + `rag_work_final_20260609_pm`. **Supersedes** the placeholder tracks `fix_remaining_tests_20260513`, `test_harness_hardening_20260310`, `test_patch_fixes_20260513`, and `test_batching_post_refactor_polish_20260607` (whose work is now scoped in FR1+FR2+FR3). **Blocks** the 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) and code_path_audit_20260607. **Tier 2 supervision required for** Phases 1, 3, 4 (audit review, conftest refactor, io_pool race fix).*
+
 ### In Plan (or Pending Spec)

 #### Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix `[track-created: 7c1d597e]`
 *Link: [./tracks/qwen_llama_grok_integration_20260606/](./tracks/qwen_llama_grok_integration_20260606/), Spec: [./tracks/qwen_llama_grok_integration_20260606/spec.md](./tracks/qwen_llama_grok_integration_20260606/spec.md), Plan: [./tracks/qwen_llama_grok_integration_20260606/plan.md](./tracks/qwen_llama_grok_integration_20260606/plan.md) (to be authored by writing-plans skill)*

-*Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a **Vendor Capability Matrix** (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in `src/vendor_capabilities.py`. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared `send_openai_compatible()` helper in `src/openai_compatible.py` that operates on a normalized request/response data structure; each `_send_<vendor>()` is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor `_send_minimax()` to use the helper (~250 lines → ~50). **Out of scope** (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive.*
+*Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a **Vendor Capability Matrix** (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in `src/vendor_capabilities.py`. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared `send_openai_compatible()` helper in `src/openai_compatible.py` that operates on a normalized request/response data structure; each `_send_<vendor>()` is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor `_send_minimax()` to use the helper (~250 lines → ~50). **Out of scope** (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. **Now blocked by** test_infrastructure_hardening_20260609 (was: none).*

 #### Track: Data-Oriented Error Handling (Fleury Pattern) `[track-created: 494f68f9]`
 *Link: [./tracks/data_oriented_error_handling_20260606/](./tracks/data_oriented_error_handling_20260606/), Spec: [./tracks/data_oriented_error_handling_20260606/spec.md](./tracks/data_oriented_error_handling_20260606/spec.md), Plan: [./tracks/data_oriented_error_handling_20260606/plan.md](./tracks/data_oriented_error_handling_20260606/plan.md)*

-*Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_<vendor>()` → `_send_<vendor>_result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.*
+*Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_<vendor>()` → `_send_<vendor>_result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.*
 *Follow-up: **`public_api_migration_20260606`** (planned; not yet specced; no directory yet) — removes the deprecated `ai_client.send()` and migrates all callers. Detailed in the parent track's spec §12.1.*

 #### Track: Data Structure Strengthening (Type Aliases + NamedTuples) `[track-created: ed42a97a]`
 *Link: [./tracks/data_structure_strengthening_20260606/](./tracks/data_structure_strengthening_20260606/), Spec: [./tracks/data_structure_strengthening_20260606/spec.md](./tracks/data_structure_strengthening_20260606/spec.md), Plan: [./tracks/data_structure_strengthening_20260606/plan.md](./tracks/data_structure_strengthening_20260606/plan.md) (to be authored by writing-plans skill)*

-*Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk.*
+*Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. **Now blocked by** test_infrastructure_hardening_20260609 (was: none).*

 #### Track: MCP Architecture Refactor (Sub-MCP Extraction) `[track-created: 2720a894]`
 *Link: [./tracks/mcp_architecture_refactor_20260606/](./tracks/mcp_architecture_refactor_20260606/), Spec: [./tracks/mcp_architecture_refactor_20260606/spec.md](./tracks/mcp_architecture_refactor_20260606/spec.md), Plan: [./tracks/mcp_architecture_refactor_20260606/plan.md](./tracks/mcp_architecture_refactor_20260606/plan.md) (to be authored by writing-plans skill)*

-*Goal: Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention `mcp_<type>.py` for native MCPs: `mcp_file_io.py` (9 tools), `mcp_python.py` (14), `mcp_c.py` (5), `mcp_cpp.py` (5), `mcp_web.py` (2), `mcp_analysis.py` (2). The existing `ExternalMCPManager` is extracted to `mcp_external.py` (class name preserved). New `MCPController` class in `src/mcp_client.py` holds the 3-layer security model (extracted to `src/mcp_client_security.py`), the `ALL_SUB_MCPS` registration list, and the inverted-dict dispatch lookup. New `src/mcp_client_legacy.py` re-exports all 45+ old symbols for backward compat (the 4 existing test files + `src/app_controller.py:61` continue to work). Each sub-MCP's `invoke()` returns `Result[str, ErrorInfo]` (Fleury pattern). Path parameters use the `Metadata` family aliases. **Blocked by** `data_oriented_error_handling_20260606` (for `Result`/`ErrorInfo`) and `data_structure_strengthening_20260606` (for `Metadata` aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. **Out of scope** (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to `mcp_dsl_20260606` follow-up. JSON-only for now.*
+*Goal: Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention `mcp_<type>.py` for native MCPs: `mcp_file_io.py` (9 tools), `mcp_python.py` (14), `mcp_c.py` (5), `mcp_cpp.py` (5), `mcp_web.py` (2), `mcp_analysis.py` (2). The existing `ExternalMCPManager` is extracted to `mcp_external.py` (class name preserved). New `MCPController` class in `src/mcp_client.py` holds the 3-layer security model (extracted to `src/mcp_client_security.py`), the `ALL_SUB_MCPS` registration list, and the inverted-dict dispatch lookup. New `src/mcp_client_legacy.py` re-exports all 45+ old symbols for backward compat (the 4 existing test files + `src/app_controller.py:61` continue to work). Each sub-MCP's `invoke()` returns `Result[str, ErrorInfo]` (Fleury pattern). Path parameters use the `Metadata` family aliases. **Blocked by** test_infrastructure_hardening_20260609, `data_oriented_error_handling_20260606` (for `Result`/`ErrorInfo`), and `data_structure_strengthening_20260606` (for `Metadata` aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. **Out of scope** (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to `mcp_dsl_20260606` follow-up. JSON-only for now.*

 #### Track: RAG Phase 4 Stress Test Fix `[x] — fixed 16412ad5`
 *Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
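
The dimension check described in the RAG stress test fix can be sketched as follows. This is an illustration under stated assumptions, not the code from commit 16412ad5: `InMemoryCollection` is a hypothetical stand-in for the persistent collection (no ChromaDB calls are used), and `validate_collection_dim` only mirrors the described behaviour of inspecting the first stored vector, comparing its length to the current provider's output dimension, and recreating the collection on mismatch with a stderr warning.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class InMemoryCollection:
    """Hypothetical stand-in for a persistent vector collection."""
    name: str
    vectors: dict[str, list[float]] = field(default_factory=dict)


def validate_collection_dim(collection, provider_dim):
    """Return a collection that is safe to upsert into.

    If the collection is empty or its first stored vector already has
    provider_dim components, the same object is returned. Otherwise a
    fresh empty collection with the same name is returned and a warning
    is written to stderr.
    """
    first = next(iter(collection.vectors.values()), None)
    if first is None or len(first) == provider_dim:
        return collection
    print(
        f"warning: collection {collection.name!r} holds {len(first)}-dim "
        f"vectors but the provider emits {provider_dim}-dim; recreating",
        file=sys.stderr,
    )
    return InMemoryCollection(name=collection.name)


# A collection left behind by a 3072-dim provider, reopened with a 384-dim one.
stale = InMemoryCollection("chroma_test_stress", {"doc-1": [0.0] * 3072})
fresh = validate_collection_dim(stale, provider_dim=384)

# A collection whose vectors already match is preserved as-is.
matching = InMemoryCollection("chroma_test_stress", {"doc-1": [0.0] * 384})
kept = validate_collection_dim(matching, provider_dim=384)
```

Checking before the first upsert is the point of the design: the mismatch is caught at initialisation, so a later search never sees a collection with mixed dimensions.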