conductor(track): spec for public_api_migration_and_ui_polish_20260615 (3 prod + 12 test migrations + 2 UI Polish test fixes)

2026-06-15 15:20:44 -04:00
parent 431ebce2b9
commit 3febdab42c
1 changed files with 585 additions and 0 deletions
@@ -0,0 +1,585 @@
+# Track Specification: Public API Migration + UI Polish Test Cleanup
+
+**Track ID:** `public_api_migration_and_ui_polish_20260615`
+**Status:** Active (spec approved 2026-06-15)
+**Priority:** A (foundational; precedes `data_structure_strengthening_20260606`)
+**Owner:** Tier 2 Tech Lead
+**Type:** refactor + bugfix + test_cleanup + documentation
+**Estimated effort:** 2-3 days Tier 2 work (16-24 hours)
+**Parent tracks:** `data_oriented_error_handling_20260606` (shipped 2026-06-12), `ai_loop_regressions_20260614` (shipped 2026-06-15), `doeh_test_thinking_cleanup_20260615` (shipped 2026-06-15)
+**Blocks:** `data_structure_strengthening_20260606` (cleaner `Result` API usage makes type-alias replacement easier), `mcp_architecture_refactor_20260606` (transitively)
+
+---
+
+## 0. TL;DR
+
+This is a **stability track** that finishes the cleanup work started by `data_oriented_error_handling_20260606` and `doeh_test_thinking_cleanup_20260615`. Two concerns, one track:
+
+1. **Public API Migration**: remove the deprecated `ai_client.send()` legacy wrapper; migrate 3 remaining production call sites + 12 test files to `send_result()`; fix 4 of the 10 pre-existing test failures (2 Qwen + 2 symbol_parsing) as a side effect of the migration.
+2. **UI Polish Test Cleanup**: fix 2 broken test assertions in `test_discussion_truncate_layout.py` and `test_log_management_refresh.py` (the production code was already fixed by user commits `d0b06575` and `df7bda6e`; the tests use `find()` which locates the comment block instead of the actual code).
+
+**Result:** 6 of 10 pre-existing test failures fixed. Remaining 4 RAG failures are deferred to the next track (a separate RAG subsystem track — out of scope for this one). Project reaches a stable state suitable for the `data_structure_strengthening_20260606` track.
+
+---
+
+## 1. Overview
+
+### 1.1 Current State (as of 2026-06-15)
+
+The `data_oriented_error_handling_20260606` track (shipped 2026-06-12) introduced the `Result[T, ErrorInfo]` pattern and `send_result()` as the new public API. The legacy `ai_client.send()` was marked `@deprecated` and routed through `send_result()` internally. Two follow-up tracks shipped fixes for the immediate user-blocking issues (`ai_loop_regressions_20260614`) and the easy test mock bugs (`doeh_test_thinking_cleanup_20260615`).
+
+**As of 2026-06-15:**
+- 3 production call sites of the deprecated `send()` remain in `src/`
+- 12 test files use `ai_client.send()` directly
+- 1 test file uses `_send_<vendor>()` with the new `Result` return type but the old assertion pattern (causing 2 of 10 pre-existing failures)
+- 2 test files mock `ai_client.send` directly (causing 2 of 10 pre-existing failures)
+- 2 UI Polish test files use `find()` to locate a comment block instead of the actual code (causing 2 of 10 pre-existing failures)
+- 4 RAG test files fail (separate subsystem; deferred to a follow-up RAG track)
+
+### 1.2 Gaps to Fill (this Track's Scope)
+
+| Gap | Count | Type | Spec Section |
+|---|---|---|---|
+| Production `ai_client.send()` callers | 3 | refactor (deprecation removal) | §3.1 |
+| Test files using `ai_client.send()` | 12 | refactor (deprecation removal) | §3.2 |
+| Test files using `_send_<vendor>()` with old assertions | 1 | test fix (G3 in pre-existing failures) | §3.3 |
+| Test files mocking `ai_client.send` | 1 | test fix (G4 in pre-existing failures) | §3.4 |
+| UI Polish test bugs (`find()` not `rfind()`) | 2 | test fix (G6, G7 in pre-existing failures) | §3.5 |
+| Deprecation marker + legacy `send()` function | 1 | refactor (deprecation removal) | §3.6 |
+| `filterwarnings` conftest entry | 1 | housekeeping (deprecation removal) | §3.6 |
+| `test_deprecation_warnings.py` | 1 file (2 tests) | delete (tests obsolete) | §3.6 |
+| `docs/guide_ai_client.md` deprecation references | multiple | documentation | §3.7 |
+| `conductor/product-guidelines.md` deprecation language | multiple | documentation | §3.7 |
+
+### 1.3 Already Implemented (DO NOT re-implement)
+
+Verified by code audit (2026-06-15) — the following already work and are NOT in this track's scope:
+
+- **`send_result()` public API** — added in commit `9f86b2be` by `data_oriented_error_handling_20260606`
+- **`_send_<vendor>()` returning `Result[str]`** — all 6 vendors (`_send_gemini`, `_send_gemini_cli`, `_send_grok`, `_send_minimax`, `_send_qwen`, `_send_llama`, `_send_llama_native`) already return `Result[str]` (refactored in commits `0282f9ff`, `943a21bf`, `e384afce`, `64d6ba2d`)
+- **The 2 in-flight `_api_generate` and `_handle_request_event` migrations in `app_controller.py`** — already done by `doeh_test_thinking_cleanup_20260615` (commits `24ba2499` and `7b323e3e`)
+- **`test_ai_client_result.py::test_send_result_does_not_emit_deprecation`** — passes; the deprecation warning filter works
+- **11 test mock fixes from `doeh_test_thinking_cleanup_20260615`** — 29/29 tests in 5 files (`test_grok_provider`, `test_llama_provider`, `test_llama_ollama_native`, `test_ai_client_tool_loop_builder`, `test_headless_service`) now use the `Result` API
+- **UI Polish Phase 1 (markdown tables) — `src/markdown_table.py`** — shipped by commit `79ac9210`
+- **UI Polish Phase 2 (Keep Pairs input)** — code fix shipped by user commit `d0b06575` (`src/gui_2.py:5130-5131`); test bug remains (this track fixes)
+- **UI Polish Phase 3 (Refresh Registry)** — code fix shipped by user commit `df7bda6e` (`src/gui_2.py:2111-2112`); test bug remains (this track fixes)
+- **UI Polish Phase 4 (Vendor State tab)** — shipped by commit `3a864076` (`src/vendor_state.py`)
+- **UI Polish Phase 5 (Files & Media directory tree)** — shipped by commit `74e02485` (`src/gui_2.py:render_files_and_media`)
+
+---
+
+## 2. Goals
+
+### 2.1 Functional Goals
+
+| ID | Goal | Acceptance Criterion |
+|---|---|---|
+| **G1** | Migrate `src/conductor_tech_lead.py:68` to `send_result()` + Result handling | `uv run pytest tests/test_conductor_tech_lead.py` (or nearest tests) passes; no regression in tier-2 dispatch |
+| **G2** | Migrate `src/orchestrator_pm.py:86` to `send_result()` + Result handling | No regression in tier-1 dispatch tests |
+| **G3** | Migrate `src/multi_agent_conductor.py:591` to `send_result()` + Result handling | `test_mma_concurrent_tracks_sim`, `test_mma_step_mode_sim`, `test_undo_redo_sim`, and 30+ MMA live_gui tests pass |
+| **G4** | Migrate 12 test files using `ai_client.send()` to use `send_result()` | All migrated tests pass; no test calls `ai_client.send()` after this track |
+| **G5** | Fix `test_qwen_provider.py` (2 tests) to use `Result` API assertion pattern | 2/2 tests pass; same approach as `doeh_test_thinking_cleanup_20260615` used for grok/llama |
+| **G6** | Fix `test_symbol_parsing.py` (2 tests) to mock `send_result` not `send` | 2/2 tests pass |
+| **G7** | Fix `test_discussion_truncate_layout.py` (1 test) to use `rfind()` not `find()` | 1/1 test passes |
+| **G8** | Fix `test_log_management_refresh.py` (1 test) to use `rfind()` not `find()` | 1/1 test passes |
+| **G9** | Remove `@deprecated` decorator + legacy `send()` function in `src/ai_client.py` | `ai_client.send` AttributeError if called; `rg "ai_client\.send\(" src/ tests/` returns 0 hits |
+| **G10** | Delete `tests/test_deprecation_warnings.py` (2 obsolete tests) | File removed; no test imports or calls `ai_client.send` |
+| **G11** | Remove `filterwarnings` entry in `pyproject.toml:46-47` | `rg "ignore:Use ai_client.send_result" pyproject.toml` returns 0 hits |
+| **G12** | Update `docs/guide_ai_client.md` to remove deprecation references | No `@deprecated` mention; Result API section no longer notes "send() is deprecated" |
+| **G13** | Update `conductor/product-guidelines.md` to remove deprecation language | No "send() is deprecated; use send_result()" in product guidelines |
+
+### 2.2 Non-Functional Goals
+
+| ID | Goal | Acceptance Criterion |
+|---|---|---|
+| **NF1** | Zero new test regressions | `uv run pytest tests/` shows 4 fewer failures than the pre-track baseline (10 - 6 = 4 remaining; all RAG) |
+| **NF2** | All 28 production changes atomic per-task | 28 git commits; each commit is buildable + testable |
+| **NF3** | All changes follow the project's 1-space indentation, no-comments, type-hinting rules | `uv run python -c "import ast; ast.parse(open('src/ai_client.py').read())"` succeeds; production code has zero `#` comments in changed lines |
+| **NF4** | Per-commit git notes attached | `git log --format='%H %s' --grep="^public_api_migration_and_ui_polish_20260615" \| wc -l` matches task count |
+| **NF5** | `doeh_test_thinking_cleanup_20260615` state.toml remains parseable | `python -c "import tomllib; tomllib.load(open('conductor/tracks/doeh_test_thinking_cleanup_20260615/state.toml','rb'))"` succeeds |
+
+---
+
+## 3. Per-File Design
+
+### 3.1 Production call sites to migrate
+
+**Why these 3 only:** `data_oriented_error_handling_20260606` spec §12.1 lists 5 production call sites. Two of the five (`src/app_controller.py:282` and `src/app_controller.py:3674`) were already migrated by `doeh_test_thinking_cleanup_20260615` (commits `7b323e3e` and `24ba2499`). One was a misidentification — `src/mcp_client.py:2274` is an MCP tool schema for `py_check_syntax`, not a `send()` call. The remaining 3 are real.
+
+| File:Line | Current code | After this track | Difficulty |
+|---|---|---|---|
+| `src/conductor_tech_lead.py:68` | `response = ai_client.send(md_content="", user_message=user_message)` (2-arg) | `result = ai_client.send_result(md_content="", user_message=user_message); if not result.ok: <log warn + return None>; response = result.data` | Easy (2-arg call) |
+| `src/orchestrator_pm.py:86` | `response = ai_client.send(md_content="", user_message=user_message, enable_tools=False)` (3-arg) | `result = ai_client.send_result(md_content="", user_message=user_message, enable_tools=False); if not result.ok: <log warn + return None>; response = result.data` | Easy (3-arg call) |
+| `src/multi_agent_conductor.py:591` | `response = ai_client.send(md_content=..., user_message=..., base_dir=".", pre_tool_callback=..., qa_callback=..., patch_callback=..., stream_callback=...)` (8-arg, with 5 callbacks) | `result = ai_client.send_result(md_content=..., ...); if not result.ok: <log warn via comms + return per-ticket error>; response = result.data` | Hard (5 callbacks; per-ticket error handling needed in MMA) |
+
+**MMA per-ticket error handling:** the existing `_handle_request_event` pattern in `app_controller.py:3674` (already migrated by `doeh_test_thinking_cleanup_20260615`) uses `raise HTTPException(status_code=502, detail=err.ui_message())`. The MMA worker does not have an HTTP layer; the per-ticket error should be:
+- Logged to the comms log as `WARN/deprecated_send_with_errors` (or `WARN/worker_send_failed`)
+- Returned via `worker_comms_callback` as a status entry (per `multi_agent_conductor.py:584` callback)
+- The worker exits with a non-zero status so the DAG engine marks the ticket as failed
+
+**Reference:** `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §12.1 lines 677-688; `doeh_test_thinking_cleanup_20260615/spec.md` §3.1 (the G1 fix pattern at `src/app_controller.py:265-295` is the canonical reference for Result handling).
+
+### 3.2 Test files using `ai_client.send()` to migrate
+
+**Why 12 not 63:** the parent spec claimed "63 test files (verified 2026-06-11)". The current count (rg verified 2026-06-15) is **12 files with 20 call sites**. The discrepancy is because the spec was written when there were more legacy call sites; `data_oriented_error_handling_20260606` Phase 3 + `doeh_test_thinking_cleanup_20260615` Phase 2 already migrated the rest.
+
+| File | Call sites | Migration pattern |
+|---|---|---|
+| `tests/test_ai_client_cli.py` | 1 | `response = ai_client.send(...)` → `result = ai_client.send_result(...); assert result.ok; response = result.data` |
+| `tests/test_ai_cache_tracking.py` | 1 | Same pattern |
+| `tests/test_ai_client_result.py` | 3 | One is the existing `test_send_deprecated_emits_warning` (will be DELETED in Phase 6); the other two are pre-existing tests that test `send()` directly — they need to be rewritten to test `send_result()` semantics |
+| `tests/test_api_events.py` | 2 | Same pattern |
+| `tests/test_deepseek_provider.py` | 6 | Same pattern (per-call-site migration; 6 commits is too many; consolidate to 1-2 commits) |
+| `tests/test_gemini_cli_edge_cases.py` | 1 | Same pattern |
+| `tests/test_gemini_cli_integration.py` | 2 | Same pattern |
+| `tests/test_gemini_cli_parity_regression.py` | 1 | Same pattern |
+| `tests/test_gui2_mcp.py` | 1 | Same pattern |
+| `tests/test_tier4_interceptor.py` | 1 | Same pattern |
+| `tests/test_token_usage.py` | 1 | Same pattern |
+| **Total** | **20 call sites in 11 files** | The 12th file is `test_symbol_parsing.py` which mocks `send` not calls it; handled separately in §3.4 |
+
+**Migration pattern (canonical):**
+```python
+# Before:
+result = ai_client.send(md_content, user_message, base_dir)
+assert result == "expected text"
+
+# After:
+result = ai_client.send_result(md_content, user_message, base_dir)
+assert result.ok, f"send_result failed: {result.errors[0].ui_message() if result.errors else 'no error info'}"
+assert result.data == "expected text"
+```
+
+**Special case: `test_ai_client_result.py`:** The current file has 3 tests for the deprecated `send()`. The track DELETES the `test_send_deprecated_emits_warning` test (send() is removed in Phase 6) and KEEPS the `test_send_result_does_not_emit_deprecation` test (it remains a regression test for the new API). The 3rd test (`test_send_result_does_not_emit_deprecation` is the 2nd) needs review — see the file directly.
+
+**Test isolation:** Group migration by file. Per-file atomic commits preserve the file as a rollback unit. 11 files = 11 atomic commits (consolidate `test_ai_client_result.py` and `test_deepseek_provider.py` since they have multiple sites per file).
+
+**Reference:** `doeh_test_thinking_cleanup_20260615/plan.md` Phase 2 (Tasks 2.1-2.5) for the exact migration pattern; `conductor/code_styleguides/error_handling.md` §3.1 (AND over OR pattern).
+
+### 3.3 `test_qwen_provider.py` fix (2 tests)
+
+**Current state (verified 2026-06-15):**
+- `_send_qwen()` returns `Result[str]` (refactored by `data_oriented_error_handling_20260606` commit `64d6ba2d`)
+- Tests at `tests/test_qwen_provider.py:17-19` and `:27-28` assert against raw `str`:
+  ```python
+  result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from qwen"  # FAILS: result is Result(data="hi from qwen")
+  ```
+
+**Fix:** Mirror the pattern used by `doeh_test_thinking_cleanup_20260615` for `test_grok_provider`, `test_llama_provider`, `test_llama_ollama_native`:
+```python
+result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None)
+assert result.ok and result.data == "hi from qwen"
+```
+
+And for the image test:
+```python
+result = ai_client._send_qwen("system", "describe this image", ".", file_items, "", False, None, None, None)
+assert result.ok and "cat" in result.data.lower()
+```
+
+**Why this approach and not renaming `_send_qwen` → `_send_qwen_result`:** the parent spec at line 611 planned the rename, but commit `64d6ba2d` only changed the return type (not the name). The function name `_send_qwen` is stable; only the return type changed. Migrating the tests to handle `Result` is the right scope for this track. A future "rename to `_send_qwen_result`" track could be planned separately if needed.
+
+**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity).
+
+**Reference:** `doeh_test_thinking_cleanup_20260615/spec.md` §1.1 (G2-G11 test mock bugs); `doeh_test_thinking_cleanup_20260615/plan.md` Phase 2.1-2.3 (the grok/llama/llama_native patterns).
+
+### 3.4 `test_symbol_parsing.py` fix (2 tests)
+
+**Current state (verified 2026-06-15):**
+- `tests/test_symbol_parsing.py:45,74` mock `src.ai_client.send`
+- Production now calls `src.ai_client.send_result` (per the migration done by `doeh_test_thinking_cleanup_20260615` commit `24ba2499`)
+- Mock receives 0 calls; test fails with `Expected 'send' to have been called once. Called 0 times.`
+
+**Fix:**
+```python
+# Before:
+with patch('src.ai_client.send') as mock_send:
+    ...
+    mock_send.assert_called_once()
+
+# After:
+with patch('src.ai_client.send_result') as mock_send_result:
+    mock_send_result.return_value = Result(data="mocked response")
+    ...
+    mock_send_result.assert_called_once()
+```
+
+**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity).
+
+**Reference:** `doeh_test_thinking_cleanup_20260615/plan.md` Task 2.7 (the headless_service `test_generate_endpoint` mock migration is the canonical reference).
+
+### 3.5 UI Polish test fixes (2 tests)
+
+**Current state (verified 2026-06-15):**
+
+The UI Polish Five Issues track (`docs/superpowers/specs/2026-06-03-ui-polish-design.md`) has 5 phases. Per the code audit (2026-06-15):
+
+| Phase | Status | Code location | Test status |
+|---|---|---|---|
+| 1. Markdown tables | SHIPPED (commit `79ac9210`) | `src/markdown_table.py` | passing |
+| 2. Keep Pairs input | SHIPPED (user commit `d0b06575`) | `src/gui_2.py:5130-5131` (now `set_next_item_width(140)` + `drag_int`) | FAILING (test bug — see below) |
+| 3. Refresh Registry | SHIPPED (user commit `df7bda6e`) | `src/gui_2.py:2111-2112` (in-place `load_registry()`) | FAILING (test bug — see below) |
+| 4. Vendor State tab | SHIPPED (commit `3a864076`) | `src/vendor_state.py` | passing |
+| 5. Files & Media directory tree | SHIPPED (commit `74e02485`) | `src/gui_2.py:render_files_and_media` | passing |
+
+**Test bug in Phase 2 (`test_discussion_truncate_layout.py`):**
+```python
+def test_keep_pairs_input_uses_adequate_width():
+ src = inspect.getsource(gui_2)
+ marker = "Keep Pairs:"
+ idx = src.find(marker)  # ← BUG: finds comment block at line 5113
+ assert idx != -1, "Could not find Keep Pairs label in gui_2.py"
+ snippet = src[idx:idx + 200]  # ← snippet window doesn't reach line 5130
+ assert "set_next_item_width(80)" not in snippet, ...  # passes (vacuously)
+ assert "set_next_item_width(140)" in snippet, ...  # FAILS: snippet ends at the comment
+ assert "drag_int" in snippet, ...  # FAILS: snippet ends at the comment
+```
+
+The first occurrence of "Keep Pairs:" is in a comment at line 5113 (in the docstring of `render_discussion_entry_controls`). The actual code is at line 5130. The 200-char snippet window only reaches into the docstring.
+
+**Fix:** Use `rfind()` instead of `find()` to find the LAST occurrence (the actual code):
+```python
+def test_keep_pairs_input_uses_adequate_width():
+ src = inspect.getsource(gui_2)
+ marker = "Keep Pairs:"
+ idx = src.rfind(marker)  # ← finds the code at line 5130
+ assert idx != -1, "Could not find Keep Pairs label in gui_2.py"
+ snippet = src[idx:idx + 200]  # ← snippet now includes line 5130-5131
+ assert "set_next_item_width(80)" not in snippet, ...
+ assert "set_next_item_width(140)" in snippet, ...  # passes
+ assert "drag_int" in snippet, ...  # passes
+```
+
+**Test bug in Phase 3 (`test_log_management_refresh.py`):**
+```python
+def test_refresh_registry_button_calls_load_registry():
+ src = inspect.getsource(gui_2)
+ marker = "Refresh Registry"
+ idx = src.find(marker)  # ← BUG: finds comment block at line 2090
+ assert idx != -1, "Could not find Refresh Registry button in gui_2.py"
+ snippet = src[idx:idx + 400]  # ← snippet window doesn't reach line 2111
+ assert "load_registry" in snippet, ...  # FAILS
+```
+
+The first occurrence of "Refresh Registry" is in a comment at line 2090. The actual code is at line 2111. The 400-char snippet window doesn't reach the code.
+
+**Fix:** Same pattern — use `rfind()` to find the actual code:
+```python
+def test_refresh_registry_button_calls_load_registry():
+ src = inspect.getsource(gui_2)
+ marker = "Refresh Registry"
+ idx = src.rfind(marker)  # ← finds the code at line 2111
+ assert idx != -1, "Could not find Refresh Registry button in gui_2.py"
+ snippet = src[idx:idx + 400]
+ assert "load_registry" in snippet, ...  # passes
+ assert snippet.count("log_registry.LogRegistry(") <= 1, ...  # passes
+```
+
+**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity; they're both 1-character changes in the same test fixture style).
+
+**Reference:** `docs/superpowers/specs/2026-06-03-ui-polish-design.md` §3.2 (Phase 2 design) and §3.3 (Phase 3 design).
+
+### 3.6 Deprecation removal
+
+**Files to modify:**
+
+1. **`src/ai_client.py:2939-3040`** — Remove the `@deprecated` decorator on `def send(...)` and the entire function body. The function is replaced by `send_result()` (which already exists at `src/ai_client.py:3002`).
+   - Verify: `rg "def send\(" src/ai_client.py` returns 0 hits (only `def send_result(` should remain).
+
+2. **`tests/test_deprecation_warnings.py`** — Delete the file. Both tests are obsolete:
+   - `test_send_deprecated_warning_emitted_once_per_site` — tests `send()`; can't run after `send()` is removed
+   - `test_send_result_does_not_emit_deprecation` — tests `send_result()` doesn't emit a deprecation; trivially true after `send()` is removed (no deprecation source)
+
+3. **`pyproject.toml:46-47`** — Remove the `filterwarnings` entry:
+   ```toml
+   filterwarnings = [
+       "ignore:Use ai_client.send_result.*:DeprecationWarning",  # DELETE THIS LINE
+   ]
+   ```
+   - Verify: `rg "ignore:Use ai_client.send_result" pyproject.toml` returns 0 hits.
+
+**Test isolation:** 1 atomic commit for the 3 changes (consecutive cleanup; the changes are meaningless without each other).
+
+**Reference:** `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 (deprecation strategy); `pyproject.toml:46-47` (current entry).
+
+### 3.7 Documentation updates
+
+**1. `docs/guide_ai_client.md` — remove deprecation references:**
+
+Search for "deprecat" (case-insensitive) and remove:
+- "Use ai_client.send_result() instead" mentions
+- "The deprecated send() will be removed in..." warnings
+- The entire deprecation warning table at the bottom of the `send_result` section
+
+**2. `conductor/product-guidelines.md` — remove deprecation language:**
+
+Search for "deprecat" (case-insensitive) and remove or update:
+- "send() is deprecated" mentions
+- "Use send_result()" instructions (the deprecation is being removed)
+- Update the "Public API deprecation" section to mark as resolved
+
+**Test isolation:** 1 atomic commit for the 2 doc updates (consecutive cleanup).
+
+**Reference:** `conductor/product-guidelines.md` "Data-Oriented Error Handling > Public API deprecation" section (search for the heading; mark as RESOLVED).
+
+---
+
+## 4. Architecture Reference
+
+### 4.1 The Result API (Fleury Pattern)
+
+The `Result[T, ErrorInfo]` pattern from `conductor/code_styleguides/error_handling.md` is the foundation. This track is the **removal of the deprecation** that the data_oriented_error_handling track introduced; the new API is the permanent one.
+
+**Key files:**
+- `src/result_types.py` — `Result`, `ErrorInfo`, `ErrorKind`, `NilPath`, `NilRAGState`
+- `src/ai_client.py:3002` — `def send_result(...)` (the permanent public API after this track)
+- `src/ai_client.py:_send_<vendor>()` (6 vendors) — return `Result[str]`
+
+**Per-call-site error handling pattern (canonical):**
+```python
+result = ai_client.send_result(md_content, user_message, base_dir, ...)
+if not result.ok:
+    err = result.errors[0]
+    # call-site-specific error handling:
+    # - HTTP layer (app_controller:_api_generate): raise HTTPException(502, detail=err.ui_message())
+    # - GUI layer (app_controller:_handle_request_event): log to comms + add error entry
+    # - MMA worker (multi_agent_conductor): log to comms + return per-ticket error
+    # - Tier 1/2 sub-agents (orchestrator_pm, conductor_tech_lead): log warn + return None or empty
+response = result.data
+```
+
+### 4.2 The deprecated send() function
+
+Per `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 lines 183-206, the `send()` function:
+- Was added in `data_oriented_error_handling_20260606` Phase 3 (commit `73cf321c`)
+- Wraps `send_result()` and unwraps the `Result` to return `str`
+- Is marked `@deprecated` via `typing_extensions.deprecated` (Python 3.11+ backport)
+- Emits a `DeprecationWarning` at runtime (cached per call site)
+
+The `filterwarnings` entry in `pyproject.toml:46-47` silences the warning during the transition period. This track removes both the function and the filter entry.
+
+### 4.3 Threading & Locking
+
+The production call site migrations MUST preserve the existing locking:
+- `multi_agent_conductor.py:591` — runs in a worker thread; the `set_comms_log_callback` and `set_current_tier` calls before the `send()` call MUST be preserved
+- `orchestrator_pm.py:86` — runs in the orchestrator thread; lock acquisition patterns must be preserved
+- `conductor_tech_lead.py:68` — runs in a sub-agent thread; the `set_custom_system_prompt` and `set_current_tier` calls before the `send()` call MUST be preserved
+
+**Reference:** `docs/guide_ai_client.md` "Threading Model" section; `docs/guide_app_controller.md` "AI Loop Lifecycle" section.
+
+### 4.4 The MMA per-ticket error handling
+
+The MMA worker (`multi_agent_conductor.py:run_worker_lifecycle`) currently does NOT have per-ticket error handling — it expects `send()` to return a `str` (and raises an exception on internal errors which the worker catches). After this track, `send_result()` returns a `Result[str]` with the errors in `result.errors`. The migration must:
+
+1. Check `result.ok` immediately after the call
+2. If `!result.ok`:
+   - Log the error to the comms log via `worker_comms_callback` (status entry with `err.ui_message()`)
+   - Return a sentinel value that the DAG engine marks as failed (e.g., return `None` and the worker exits with non-zero status)
+3. If `result.ok`: continue with `result.data` as before
+
+**Reference:** `docs/guide_mma.md` "Worker Lifecycle" section; the `multi_agent_conductor.py:584` `worker_comms_callback` (already wired up).
+
+---
+
+## 5. Test Plan
+
+### 5.1 Per-phase test verification
+
+Each phase must pass targeted tests before moving to the next:
+
+| Phase | Test command | Expected |
+|---|---|---|
+| 1 | `uv run pytest tests/test_conductor_tech_lead.py tests/test_orchestrator_pm.py tests/test_mma_concurrent_tracks_sim.py tests/test_mma_step_mode_sim.py tests/test_undo_redo_sim.py -v 2>&1 \| tee tests/artifacts/public_api_phase1.log` | All pass |
+| 2 | `uv run pytest tests/test_ai_client_cli.py tests/test_ai_cache_tracking.py tests/test_ai_client_result.py tests/test_api_events.py tests/test_deepseek_provider.py tests/test_gemini_cli_*.py tests/test_gui2_mcp.py tests/test_tier4_interceptor.py tests/test_token_usage.py -v 2>&1 \| tee tests/artifacts/public_api_phase2.log` | All pass |
+| 3 | `uv run pytest tests/test_qwen_provider.py -v` | 5/5 pass (2 of which were the pre-existing failures) |
+| 4 | `uv run pytest tests/test_symbol_parsing.py -v` | 2/2 pass (which were the pre-existing failures) |
+| 5 | `uv run pytest tests/test_discussion_truncate_layout.py tests/test_log_management_refresh.py -v` | 2/2 pass (which were the pre-existing failures) |
+| 6 | `uv run pytest tests/test_deprecation_warnings.py -v 2>&1` (should fail — file is deleted) + `uv run rg "ai_client\.send\(" src/ tests/` (should return 0) | File deleted; 0 rg hits |
+| 7 | `uv run pytest tests/ 2>&1 \| tee tests/artifacts/public_api_phase7_full.log` | 4 fewer failures than pre-track (10 - 6 = 4 RAG failures remain) |
+
+### 5.2 Per-task TDD red verification
+
+For each task that introduces a new test, the implementer MUST:
+1. Verify the test FAILS as expected (red phase)
+2. Implement the fix
+3. Verify the test PASSES (green phase)
+4. Commit
+
+**Anti-pattern guard:** per `AGENTS.md` "Critical Anti-Patterns", no skipping tests just because they fail. If a test fails for an unexpected reason, the implementer MUST investigate before committing.
+
+### 5.3 Test isolation
+
+Per `docs/guide_testing.md` "Structural Testing Contract":
+- No `unittest.mock.patch` on core infrastructure (event queues, `ai_client` internals, threading primitives) unless explicitly authorized
+- All integration tests use `live_gui` fixture
+- Test artifacts in `tests/artifacts/` or `tests/logs/` (gitignored)
+
+This track's tests are mostly UNIT tests (no `live_gui` needed). The MMA migration test (Phase 1) MAY need `live_gui` for the worker dispatch path; verify by running targeted tests first.
+
+---
+
+## 6. Migration Strategy
+
+### 6.1 The order matters
+
+**Phase 1 must complete before Phase 6:**
+- Phase 1 migrates the 3 production call sites to `send_result()`
+- Phase 6 removes the legacy `send()` function
+- If Phase 6 runs first, the production code (still using `send()`) crashes
+
+**Phase 2 must complete before Phase 6:**
+- Phase 2 migrates the 12 test files to `send_result()`
+- Phase 6 removes the legacy `send()` function
+- If Phase 6 runs first, the tests (still using `send()`) crash
+
+**Phase 3, 4, 5 can run in any order after Phase 1** (they're independent test fixes).
+
+**Phase 7 is the final sweep** (docs + tracks.md + full suite).
+
+### 6.2 Per-commit safety
+
+Each atomic commit must:
+- Be buildable (`python -c "import src.ai_client"` succeeds)
+- Pass its targeted tests
+- Not introduce a regression in the previously-passing tests
+- Have a clear commit message with the task number
+
+The per-task commit pattern (per `conductor/workflow.md`):
+```
+fix(ai_client): migrate conductor_tech_lead.py:68 to send_result() (G1, public_api_migration_and_ui_polish_20260615 Phase 1.1)
+```
+
+The per-phase checkpoint pattern:
+```
+conductor(checkpoint): Phase 1 complete - 3 production call sites migrated
+```
+
+---
+
+## 7. Out of Scope
+
+### 7.1 Deferred to separate tracks
+
+| ID | Item | Defer to | Why |
+|---|---|---|---|
+| OOS1 | 4 RAG test failures (test_rag_integration, test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim) | RAG subsystem track (planned; not yet specced) | Pre-existing RAG subsystem issues; error is in RAG config lookup code, not AI client code. A partial fix was attempted in commit `16412ad5`; the remaining issue is a different code path. |
+| OOS2 | The `_send_<vendor>()` → `_send_<vendor>_result()` rename per the data_oriented_error_handling spec §3.4 line 611 | Separate "private API rename" track (if needed) | Not blocking; tests work with current names. The function names are stable; only the return type changed. |
+| OOS3 | The 23 lower-impact files with weak types (per `data_structure_strengthening_20260606/spec.md` §1 line 20) | `data_structure_strengthening_20260606` (the next major track after this) | That's exactly what data_structure_strengthening is for. |
+| OOS4 | The 4 remaining UI Polish track phases that ARE NOT in this scope (none — all 5 are either shipped or addressed by this track's test fixes) | N/A | All 5 UI Polish phases are accounted for. |
+| OOS5 | `live_gui_mock_injection_20260615` infrastructure | Separate infrastructure track | Not blocking. Recommended but not required. |
+
+### 7.2 Explicitly NOT in this track
+
+- **Renaming `_send_<vendor>()` to `_send_<vendor>_result()`** — not needed; tests work with current names after assertion pattern fix
+- **Adding TypedDict / @dataclass schemas** — that's data_structure_strengthening's scope
+- **MMA per-ticket Result returns (per `data_oriented_error_handling_20260606/spec.md` §12.1 line 677 "Adds any new public API surface needed (e.g., per-ticket Result returns in the MMA conductor)")** — the MMA worker already gets `Result[str]` from `send_result()`; the existing `worker_comms_callback` already handles per-ticket status updates. The spec's mention of "per-ticket Result returns" was speculative; the current Result-based flow is sufficient.
+- **Removing the `filterwarnings` for the `Optional[T]` ban** — the `audit_optional_in_3_files.py` audit (per `data_oriented_error_handling_20260606/spec.md`) is unrelated to this track's deprecation removal.
+
+---
+
+## 8. Risks & Mitigations
+
+| ID | Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|---|
+| **R1** | `multi_agent_conductor.py:591` migration breaks MMA worker dispatch (5 callbacks) | Medium | High | TDD red first: verify a known MMA test fails before the fix; verify it passes after. The existing `doeh_test_thinking_cleanup_20260615` G1 fix pattern is the canonical reference for Result handling. |
+| **R2** | Removing `send()` breaks a test that imports it indirectly | Low | Medium | Run `rg "ai_client\.send\(" src/ tests/` before AND after Phase 6 to confirm 0 hits. |
+| **R3** | `pyproject.toml` filterwarnings removal causes test suite to fail with `DeprecationWarning` (e.g., from another library) | Low | Low | The filter was added in `data_oriented_error_handling_20260606` specifically to silence `send()` deprecation; no other deprecation in the codebase is silenced by it. Verified by checking the rg history. |
+| **R4** | UI Polish test fixes (`find()` → `rfind()`) mask a real production bug | Low | Medium | The production code at `src/gui_2.py:5130-5131` and `:2111-2112` was already verified to have the correct values (`set_next_item_width(140)` + `drag_int` and in-place `load_registry()`). The test bug is just the search logic. |
+| **R5** | Qwen test fix uses a different pattern than grok/llama/llama_native | Low | Low | The plan uses the same `assert result.ok and result.data == "x"` pattern as `doeh_test_thinking_cleanup_20260615` (commits `d7e42a4a`, `439a0ac0`, `dbdf9ba9`). |
+| **R6** | `test_deprecation_warnings.py` deletion is misinterpreted as "deleting tests instead of fixing them" | Low | Low | Both tests in the file are obsolete after `send()` removal. The first test (test_send_deprecated) literally cannot run without `send()`. The second test (test_send_result_does_not_emit_deprecation) is trivially true. Document in the commit message. |
+| **R7** | The 4 RAG test failures get introduced or regressed during this track | Low | Medium | Run full test suite in Phase 7 and compare to the pre-track baseline. The 4 RAG failures are documented as "pre-existing" with their defer-to track recorded. |
+
+---
+
+## 9. Verification Criteria (definition of "done")
+
+The track is DONE when **ALL** of the following are true:
+
+1. **G1-G3 production migrations complete**: 3 call sites use `send_result()`; no `ai_client.send(` in `src/`
+2. **G4 test migration complete**: 12 test files use `send_result()`; no `ai_client.send(` in `tests/`
+3. **G5 Qwen test fix complete**: `test_qwen_provider.py` 5/5 pass
+4. **G6 symbol_parsing test fix complete**: `test_symbol_parsing.py` 2/2 pass
+5. **G7-G8 UI Polish test fixes complete**: `test_discussion_truncate_layout.py` 1/1 + `test_log_management_refresh.py` 1/1 pass
+6. **G9 deprecation removed**: `@deprecated` decorator and `send()` function gone from `src/ai_client.py`
+7. **G10 test_deprecation_warnings.py deleted**: file does not exist
+8. **G11 filterwarnings removed**: no `ignore:Use ai_client.send_result` in `pyproject.toml`
+9. **G12-G13 docs updated**: no `@deprecated` or "send is deprecated" mentions in `docs/guide_ai_client.md` or `conductor/product-guidelines.md`
+10. **NF1 no regressions**: full test suite has 4 RAG failures remaining (down from 10); no new failures
+11. **NF2 per-task commits**: ~28 atomic commits with clear messages
+12. **NF3 style preserved**: 1-space indentation, no comments, type hints in all changed code
+13. **NF4 per-commit git notes**: all 28 commits have git notes summarizing the task
+14. **NF5 doeh state.toml parseable**: `tomllib.load()` succeeds (unchanged from previous track; sanity check)
+15. **Final state**: 1280 + 6 newly-passing = 1286 tests pass; 4 RAG failures documented as deferred
+
+**Test count math:**
+- Pre-track baseline: 1280 pass + 4 skip + 10 fail (verified 2026-06-15)
+- After this track: 1286 pass + 4 skip + 4 fail (6 newly-passing: 2 Qwen + 2 symbol_parsing + 1 truncate + 1 refresh)
+- The 4 remaining failures are all RAG subsystem; deferred to the next track
+
+---
+
+## 10. Execution Order & Dependencies
+
+**No external blockers.** This track can start immediately after the Tier 1 review approves the spec.
+
+**Execution order (the plan):**
+1. Phase 1 (production migration) — 1 day
+2. Phase 2 (test migration, 12 files) — 1 day
+3. Phase 3 (Qwen test fix) — 1 hour (can be combined with Phase 2)
+4. Phase 4 (symbol_parsing test fix) — 30 min (can be combined with Phase 2)
+5. Phase 5 (UI Polish test fixes) — 30 min (independent)
+6. Phase 6 (deprecation removal) — 30 min (MUST be after Phases 1 + 2)
+7. Phase 7 (docs + housekeep) — 1 hour (after Phase 6)
+
+**Total:** 2-3 days Tier 2 work (the estimate accounts for the per-commit overhead + per-task git notes + 7 phase checkpoints).
+
+**Followed by:** the user can start `data_structure_strengthening_20260606` track (already has spec, plan pending).
+
+---
+
+## 11. References
+
+### Architecture docs
+- `docs/guide_ai_client.md` — multi-provider LLM client; `send_result()` is the canonical public API
+- `docs/guide_app_controller.md` — headless controller; `app_controller.py:_handle_request_event` was migrated by `doeh_test_thinking_cleanup_20260615`
+- `docs/guide_mma.md` — 4-tier MMA orchestration; `multi_agent_conductor.py:run_worker_lifecycle` is the worker entry point
+- `docs/guide_mcp_client.md` — MCP tool registry (note: `mcp_client.py:2274` was a misidentification in the parent spec)
+- `docs/guide_testing.md` — `live_gui` fixture + structural testing contract
+
+### Styleguides
+- `conductor/code_styleguides/error_handling.md` — `Result[T]` pattern + the AND-over-OR convention
+- `conductor/code_styleguides/data_oriented_design.md` — canonical DOD reference
+- `conductor/product-guidelines.md` — 1-space indentation, no comments, type hints, SDM tags
+
+### Parent tracks
+- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 (deprecation strategy), §12.1 (follow-up scope)
+- `conductor/tracks/data_oriented_error_handling_20260606/state.toml` — the parent track's state
+- `conductor/tracks/doeh_test_thinking_cleanup_20260615/spec.md` — the previous track; the migration pattern reference
+- `conductor/tracks/doeh_test_thinking_cleanup_20260615/plan.md` Phase 2 — exact test mock fix pattern (Tasks 2.1-2.5)
+- `docs/reports/TRACK_COMPLETION_doeh_test_thinking_cleanup_20260615.md` — the 11 mock fixes that established the pattern
+
+### UI Polish track
+- `docs/superpowers/specs/2026-06-03-ui-polish-design.md` — the 5-phase UI Polish spec
+- `docs/superpowers/plans/2026-06-03-ui-polish.md` — the 5-phase UI Polish plan
+- User commits: `d0b06575` (Phase 2 code fix), `df7bda6e` (Phase 3 code fix)
+- Track commits: `79ac9210` (Phase 1), `3a864076` (Phase 4), `74e02485` (Phase 5)
+
+### Test files (the 12 + 1 to migrate, the 4 UI Polish fixes)
+- 12 send() test files: `test_ai_client_cli`, `test_ai_cache_tracking`, `test_ai_client_result`, `test_api_events`, `test_deepseek_provider`, `test_gemini_cli_edge_cases`, `test_gemini_cli_integration`, `test_gemini_cli_parity_regression`, `test_gui2_mcp`, `test_tier4_interceptor`, `test_token_usage`, `test_symbol_parsing`
+- 1 _send_ test file: `test_qwen_provider`
+- 2 UI Polish test files: `test_discussion_truncate_layout`, `test_log_management_refresh`
+- 1 file to delete: `test_deprecation_warnings`
+
+### Production call sites (3 to migrate)
+- `src/conductor_tech_lead.py:68`
+- `src/orchestrator_pm.py:86`
+- `src/multi_agent_conductor.py:591`
+
+### Codebase locations
+- `src/ai_client.py:2939-3040` — the deprecated `send()` function (to be deleted)
+- `src/ai_client.py:3002` — the new `send_result()` public API (kept)
+- `pyproject.toml:46-47` — the `filterwarnings` entry (to be deleted)
+- `tests/test_deprecation_warnings.py` — the 2 obsolete tests (to be deleted)
+- `docs/guide_ai_client.md` — deprecation references (to be removed)
+- `conductor/product-guidelines.md` — deprecation language (to be removed)