From 3febdab42c8cb07c1621823687763e5d7a105d48 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Mon, 15 Jun 2026 15:20:44 -0400 Subject: [PATCH] conductor(track): spec for public_api_migration_and_ui_polish_20260615 (3 prod + 12 test migrations + 2 UI Polish test fixes) --- .../spec.md | 585 ++++++++++++++++++ 1 file changed, 585 insertions(+) create mode 100644 conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md diff --git a/conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md b/conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md new file mode 100644 index 00000000..ea567a17 --- /dev/null +++ b/conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md @@ -0,0 +1,585 @@ +# Track Specification: Public API Migration + UI Polish Test Cleanup + +**Track ID:** `public_api_migration_and_ui_polish_20260615` +**Status:** Active (spec approved 2026-06-15) +**Priority:** A (foundational; precedes `data_structure_strengthening_20260606`) +**Owner:** Tier 2 Tech Lead +**Type:** refactor + bugfix + test_cleanup + documentation +**Estimated effort:** 2-3 days Tier 2 work (16-24 hours) +**Parent tracks:** `data_oriented_error_handling_20260606` (shipped 2026-06-12), `ai_loop_regressions_20260614` (shipped 2026-06-15), `doeh_test_thinking_cleanup_20260615` (shipped 2026-06-15) +**Blocks:** `data_structure_strengthening_20260606` (cleaner `Result` API usage makes type-alias replacement easier), `mcp_architecture_refactor_20260606` (transitively) + +--- + +## 0. TL;DR + +This is a **stability track** that finishes the cleanup work started by `data_oriented_error_handling_20260606` and `doeh_test_thinking_cleanup_20260615`. Two concerns, one track: + +1. **Public API Migration**: remove the deprecated `ai_client.send()` legacy wrapper; migrate 3 remaining production call sites + 12 test files to `send_result()`; fix 4 of the 10 pre-existing test failures (2 Qwen + 2 symbol_parsing) as a side effect of the migration. +2. **UI Polish Test Cleanup**: fix 2 broken test assertions in `test_discussion_truncate_layout.py` and `test_log_management_refresh.py` (the production code was already fixed by user commits `d0b06575` and `df7bda6e`; the tests use `find()` which locates the comment block instead of the actual code). + +**Result:** 6 of 10 pre-existing test failures fixed. Remaining 4 RAG failures are deferred to the next track (a separate RAG subsystem track — out of scope for this one). Project reaches a stable state suitable for the `data_structure_strengthening_20260606` track. + +--- + +## 1. Overview + +### 1.1 Current State (as of 2026-06-15) + +The `data_oriented_error_handling_20260606` track (shipped 2026-06-12) introduced the `Result[T, ErrorInfo]` pattern and `send_result()` as the new public API. The legacy `ai_client.send()` was marked `@deprecated` and routed through `send_result()` internally. Two follow-up tracks shipped fixes for the immediate user-blocking issues (`ai_loop_regressions_20260614`) and the easy test mock bugs (`doeh_test_thinking_cleanup_20260615`). + +**As of 2026-06-15:** +- 3 production call sites of the deprecated `send()` remain in `src/` +- 12 test files use `ai_client.send()` directly +- 1 test file uses `_send_()` with the new `Result` return type but the old assertion pattern (causing 2 of 10 pre-existing failures) +- 2 test files mock `ai_client.send` directly (causing 2 of 10 pre-existing failures) +- 2 UI Polish test files use `find()` to locate a comment block instead of the actual code (causing 2 of 10 pre-existing failures) +- 4 RAG test files fail (separate subsystem; deferred to a follow-up RAG track) + +### 1.2 Gaps to Fill (this Track's Scope) + +| Gap | Count | Type | Spec Section | +|---|---|---|---| +| Production `ai_client.send()` callers | 3 | refactor (deprecation removal) | §3.1 | +| Test files using `ai_client.send()` | 12 | refactor (deprecation removal) | §3.2 | +| Test files using `_send_()` with old assertions | 1 | test fix (G3 in pre-existing failures) | §3.3 | +| Test files mocking `ai_client.send` | 1 | test fix (G4 in pre-existing failures) | §3.4 | +| UI Polish test bugs (`find()` not `rfind()`) | 2 | test fix (G6, G7 in pre-existing failures) | §3.5 | +| Deprecation marker + legacy `send()` function | 1 | refactor (deprecation removal) | §3.6 | +| `filterwarnings` conftest entry | 1 | housekeeping (deprecation removal) | §3.6 | +| `test_deprecation_warnings.py` | 1 file (2 tests) | delete (tests obsolete) | §3.6 | +| `docs/guide_ai_client.md` deprecation references | multiple | documentation | §3.7 | +| `conductor/product-guidelines.md` deprecation language | multiple | documentation | §3.7 | + +### 1.3 Already Implemented (DO NOT re-implement) + +Verified by code audit (2026-06-15) — the following already work and are NOT in this track's scope: + +- **`send_result()` public API** — added in commit `9f86b2be` by `data_oriented_error_handling_20260606` +- **`_send_()` returning `Result[str]`** — all 6 vendors (`_send_gemini`, `_send_gemini_cli`, `_send_grok`, `_send_minimax`, `_send_qwen`, `_send_llama`, `_send_llama_native`) already return `Result[str]` (refactored in commits `0282f9ff`, `943a21bf`, `e384afce`, `64d6ba2d`) +- **The 2 in-flight `_api_generate` and `_handle_request_event` migrations in `app_controller.py`** — already done by `doeh_test_thinking_cleanup_20260615` (commits `24ba2499` and `7b323e3e`) +- **`test_ai_client_result.py::test_send_result_does_not_emit_deprecation`** — passes; the deprecation warning filter works +- **11 test mock fixes from `doeh_test_thinking_cleanup_20260615`** — 29/29 tests in 5 files (`test_grok_provider`, `test_llama_provider`, `test_llama_ollama_native`, `test_ai_client_tool_loop_builder`, `test_headless_service`) now use the `Result` API +- **UI Polish Phase 1 (markdown tables) — `src/markdown_table.py`** — shipped by commit `79ac9210` +- **UI Polish Phase 2 (Keep Pairs input)** — code fix shipped by user commit `d0b06575` (`src/gui_2.py:5130-5131`); test bug remains (this track fixes) +- **UI Polish Phase 3 (Refresh Registry)** — code fix shipped by user commit `df7bda6e` (`src/gui_2.py:2111-2112`); test bug remains (this track fixes) +- **UI Polish Phase 4 (Vendor State tab)** — shipped by commit `3a864076` (`src/vendor_state.py`) +- **UI Polish Phase 5 (Files & Media directory tree)** — shipped by commit `74e02485` (`src/gui_2.py:render_files_and_media`) + +--- + +## 2. Goals + +### 2.1 Functional Goals + +| ID | Goal | Acceptance Criterion | +|---|---|---| +| **G1** | Migrate `src/conductor_tech_lead.py:68` to `send_result()` + Result handling | `uv run pytest tests/test_conductor_tech_lead.py` (or nearest tests) passes; no regression in tier-2 dispatch | +| **G2** | Migrate `src/orchestrator_pm.py:86` to `send_result()` + Result handling | No regression in tier-1 dispatch tests | +| **G3** | Migrate `src/multi_agent_conductor.py:591` to `send_result()` + Result handling | `test_mma_concurrent_tracks_sim`, `test_mma_step_mode_sim`, `test_undo_redo_sim`, and 30+ MMA live_gui tests pass | +| **G4** | Migrate 12 test files using `ai_client.send()` to use `send_result()` | All migrated tests pass; no test calls `ai_client.send()` after this track | +| **G5** | Fix `test_qwen_provider.py` (2 tests) to use `Result` API assertion pattern | 2/2 tests pass; same approach as `doeh_test_thinking_cleanup_20260615` used for grok/llama | +| **G6** | Fix `test_symbol_parsing.py` (2 tests) to mock `send_result` not `send` | 2/2 tests pass | +| **G7** | Fix `test_discussion_truncate_layout.py` (1 test) to use `rfind()` not `find()` | 1/1 test passes | +| **G8** | Fix `test_log_management_refresh.py` (1 test) to use `rfind()` not `find()` | 1/1 test passes | +| **G9** | Remove `@deprecated` decorator + legacy `send()` function in `src/ai_client.py` | `ai_client.send` AttributeError if called; `rg "ai_client\.send\(" src/ tests/` returns 0 hits | +| **G10** | Delete `tests/test_deprecation_warnings.py` (2 obsolete tests) | File removed; no test imports or calls `ai_client.send` | +| **G11** | Remove `filterwarnings` entry in `pyproject.toml:46-47` | `rg "ignore:Use ai_client.send_result" pyproject.toml` returns 0 hits | +| **G12** | Update `docs/guide_ai_client.md` to remove deprecation references | No `@deprecated` mention; Result API section no longer notes "send() is deprecated" | +| **G13** | Update `conductor/product-guidelines.md` to remove deprecation language | No "send() is deprecated; use send_result()" in product guidelines | + +### 2.2 Non-Functional Goals + +| ID | Goal | Acceptance Criterion | +|---|---|---| +| **NF1** | Zero new test regressions | `uv run pytest tests/` shows 4 fewer failures than the pre-track baseline (10 - 6 = 4 remaining; all RAG) | +| **NF2** | All 28 production changes atomic per-task | 28 git commits; each commit is buildable + testable | +| **NF3** | All changes follow the project's 1-space indentation, no-comments, type-hinting rules | `uv run python -c "import ast; ast.parse(open('src/ai_client.py').read())"` succeeds; production code has zero `#` comments in changed lines | +| **NF4** | Per-commit git notes attached | `git log --format='%H %s' --grep="^public_api_migration_and_ui_polish_20260615" \| wc -l` matches task count | +| **NF5** | `doeh_test_thinking_cleanup_20260615` state.toml remains parseable | `python -c "import tomllib; tomllib.load(open('conductor/tracks/doeh_test_thinking_cleanup_20260615/state.toml','rb'))"` succeeds | + +--- + +## 3. Per-File Design + +### 3.1 Production call sites to migrate + +**Why these 3 only:** `data_oriented_error_handling_20260606` spec §12.1 lists 5 production call sites. Two of the five (`src/app_controller.py:282` and `src/app_controller.py:3674`) were already migrated by `doeh_test_thinking_cleanup_20260615` (commits `7b323e3e` and `24ba2499`). One was a misidentification — `src/mcp_client.py:2274` is an MCP tool schema for `py_check_syntax`, not a `send()` call. The remaining 3 are real. + +| File:Line | Current code | After this track | Difficulty | +|---|---|---|---| +| `src/conductor_tech_lead.py:68` | `response = ai_client.send(md_content="", user_message=user_message)` (2-arg) | `result = ai_client.send_result(md_content="", user_message=user_message); if not result.ok: ; response = result.data` | Easy (2-arg call) | +| `src/orchestrator_pm.py:86` | `response = ai_client.send(md_content="", user_message=user_message, enable_tools=False)` (3-arg) | `result = ai_client.send_result(md_content="", user_message=user_message, enable_tools=False); if not result.ok: ; response = result.data` | Easy (3-arg call) | +| `src/multi_agent_conductor.py:591` | `response = ai_client.send(md_content=..., user_message=..., base_dir=".", pre_tool_callback=..., qa_callback=..., patch_callback=..., stream_callback=...)` (8-arg, with 5 callbacks) | `result = ai_client.send_result(md_content=..., ...); if not result.ok: ; response = result.data` | Hard (5 callbacks; per-ticket error handling needed in MMA) | + +**MMA per-ticket error handling:** the existing `_handle_request_event` pattern in `app_controller.py:3674` (already migrated by `doeh_test_thinking_cleanup_20260615`) uses `raise HTTPException(status_code=502, detail=err.ui_message())`. The MMA worker does not have an HTTP layer; the per-ticket error should be: +- Logged to the comms log as `WARN/deprecated_send_with_errors` (or `WARN/worker_send_failed`) +- Returned via `worker_comms_callback` as a status entry (per `multi_agent_conductor.py:584` callback) +- The worker exits with a non-zero status so the DAG engine marks the ticket as failed + +**Reference:** `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §12.1 lines 677-688; `doeh_test_thinking_cleanup_20260615/spec.md` §3.1 (the G1 fix pattern at `src/app_controller.py:265-295` is the canonical reference for Result handling). + +### 3.2 Test files using `ai_client.send()` to migrate + +**Why 12 not 63:** the parent spec claimed "63 test files (verified 2026-06-11)". The current count (rg verified 2026-06-15) is **12 files with 20 call sites**. The discrepancy is because the spec was written when there were more legacy call sites; `data_oriented_error_handling_20260606` Phase 3 + `doeh_test_thinking_cleanup_20260615` Phase 2 already migrated the rest. + +| File | Call sites | Migration pattern | +|---|---|---| +| `tests/test_ai_client_cli.py` | 1 | `response = ai_client.send(...)` → `result = ai_client.send_result(...); assert result.ok; response = result.data` | +| `tests/test_ai_cache_tracking.py` | 1 | Same pattern | +| `tests/test_ai_client_result.py` | 3 | One is the existing `test_send_deprecated_emits_warning` (will be DELETED in Phase 6); the other two are pre-existing tests that test `send()` directly — they need to be rewritten to test `send_result()` semantics | +| `tests/test_api_events.py` | 2 | Same pattern | +| `tests/test_deepseek_provider.py` | 6 | Same pattern (per-call-site migration; 6 commits is too many; consolidate to 1-2 commits) | +| `tests/test_gemini_cli_edge_cases.py` | 1 | Same pattern | +| `tests/test_gemini_cli_integration.py` | 2 | Same pattern | +| `tests/test_gemini_cli_parity_regression.py` | 1 | Same pattern | +| `tests/test_gui2_mcp.py` | 1 | Same pattern | +| `tests/test_tier4_interceptor.py` | 1 | Same pattern | +| `tests/test_token_usage.py` | 1 | Same pattern | +| **Total** | **20 call sites in 11 files** | The 12th file is `test_symbol_parsing.py` which mocks `send` not calls it; handled separately in §3.4 | + +**Migration pattern (canonical):** +```python +# Before: +result = ai_client.send(md_content, user_message, base_dir) +assert result == "expected text" + +# After: +result = ai_client.send_result(md_content, user_message, base_dir) +assert result.ok, f"send_result failed: {result.errors[0].ui_message() if result.errors else 'no error info'}" +assert result.data == "expected text" +``` + +**Special case: `test_ai_client_result.py`:** The current file has 3 tests for the deprecated `send()`. The track DELETES the `test_send_deprecated_emits_warning` test (send() is removed in Phase 6) and KEEPS the `test_send_result_does_not_emit_deprecation` test (it remains a regression test for the new API). The 3rd test (`test_send_result_does_not_emit_deprecation` is the 2nd) needs review — see the file directly. + +**Test isolation:** Group migration by file. Per-file atomic commits preserve the file as a rollback unit. 11 files = 11 atomic commits (consolidate `test_ai_client_result.py` and `test_deepseek_provider.py` since they have multiple sites per file). + +**Reference:** `doeh_test_thinking_cleanup_20260615/plan.md` Phase 2 (Tasks 2.1-2.5) for the exact migration pattern; `conductor/code_styleguides/error_handling.md` §3.1 (AND over OR pattern). + +### 3.3 `test_qwen_provider.py` fix (2 tests) + +**Current state (verified 2026-06-15):** +- `_send_qwen()` returns `Result[str]` (refactored by `data_oriented_error_handling_20260606` commit `64d6ba2d`) +- Tests at `tests/test_qwen_provider.py:17-19` and `:27-28` assert against raw `str`: + ```python + result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None) + assert result == "hi from qwen" # FAILS: result is Result(data="hi from qwen") + ``` + +**Fix:** Mirror the pattern used by `doeh_test_thinking_cleanup_20260615` for `test_grok_provider`, `test_llama_provider`, `test_llama_ollama_native`: +```python +result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None) +assert result.ok and result.data == "hi from qwen" +``` + +And for the image test: +```python +result = ai_client._send_qwen("system", "describe this image", ".", file_items, "", False, None, None, None) +assert result.ok and "cat" in result.data.lower() +``` + +**Why this approach and not renaming `_send_qwen` → `_send_qwen_result`:** the parent spec at line 611 planned the rename, but commit `64d6ba2d` only changed the return type (not the name). The function name `_send_qwen` is stable; only the return type changed. Migrating the tests to handle `Result` is the right scope for this track. A future "rename to `_send_qwen_result`" track could be planned separately if needed. + +**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity). + +**Reference:** `doeh_test_thinking_cleanup_20260615/spec.md` §1.1 (G2-G11 test mock bugs); `doeh_test_thinking_cleanup_20260615/plan.md` Phase 2.1-2.3 (the grok/llama/llama_native patterns). + +### 3.4 `test_symbol_parsing.py` fix (2 tests) + +**Current state (verified 2026-06-15):** +- `tests/test_symbol_parsing.py:45,74` mock `src.ai_client.send` +- Production now calls `src.ai_client.send_result` (per the migration done by `doeh_test_thinking_cleanup_20260615` commit `24ba2499`) +- Mock receives 0 calls; test fails with `Expected 'send' to have been called once. Called 0 times.` + +**Fix:** +```python +# Before: +with patch('src.ai_client.send') as mock_send: + ... + mock_send.assert_called_once() + +# After: +with patch('src.ai_client.send_result') as mock_send_result: + mock_send_result.return_value = Result(data="mocked response") + ... + mock_send_result.assert_called_once() +``` + +**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity). + +**Reference:** `doeh_test_thinking_cleanup_20260615/plan.md` Task 2.7 (the headless_service `test_generate_endpoint` mock migration is the canonical reference). + +### 3.5 UI Polish test fixes (2 tests) + +**Current state (verified 2026-06-15):** + +The UI Polish Five Issues track (`docs/superpowers/specs/2026-06-03-ui-polish-design.md`) has 5 phases. Per the code audit (2026-06-15): + +| Phase | Status | Code location | Test status | +|---|---|---|---| +| 1. Markdown tables | SHIPPED (commit `79ac9210`) | `src/markdown_table.py` | passing | +| 2. Keep Pairs input | SHIPPED (user commit `d0b06575`) | `src/gui_2.py:5130-5131` (now `set_next_item_width(140)` + `drag_int`) | FAILING (test bug — see below) | +| 3. Refresh Registry | SHIPPED (user commit `df7bda6e`) | `src/gui_2.py:2111-2112` (in-place `load_registry()`) | FAILING (test bug — see below) | +| 4. Vendor State tab | SHIPPED (commit `3a864076`) | `src/vendor_state.py` | passing | +| 5. Files & Media directory tree | SHIPPED (commit `74e02485`) | `src/gui_2.py:render_files_and_media` | passing | + +**Test bug in Phase 2 (`test_discussion_truncate_layout.py`):** +```python +def test_keep_pairs_input_uses_adequate_width(): + src = inspect.getsource(gui_2) + marker = "Keep Pairs:" + idx = src.find(marker) # ← BUG: finds comment block at line 5113 + assert idx != -1, "Could not find Keep Pairs label in gui_2.py" + snippet = src[idx:idx + 200] # ← snippet window doesn't reach line 5130 + assert "set_next_item_width(80)" not in snippet, ... # passes (vacuously) + assert "set_next_item_width(140)" in snippet, ... # FAILS: snippet ends at the comment + assert "drag_int" in snippet, ... # FAILS: snippet ends at the comment +``` + +The first occurrence of "Keep Pairs:" is in a comment at line 5113 (in the docstring of `render_discussion_entry_controls`). The actual code is at line 5130. The 200-char snippet window only reaches into the docstring. + +**Fix:** Use `rfind()` instead of `find()` to find the LAST occurrence (the actual code): +```python +def test_keep_pairs_input_uses_adequate_width(): + src = inspect.getsource(gui_2) + marker = "Keep Pairs:" + idx = src.rfind(marker) # ← finds the code at line 5130 + assert idx != -1, "Could not find Keep Pairs label in gui_2.py" + snippet = src[idx:idx + 200] # ← snippet now includes line 5130-5131 + assert "set_next_item_width(80)" not in snippet, ... + assert "set_next_item_width(140)" in snippet, ... # passes + assert "drag_int" in snippet, ... # passes +``` + +**Test bug in Phase 3 (`test_log_management_refresh.py`):** +```python +def test_refresh_registry_button_calls_load_registry(): + src = inspect.getsource(gui_2) + marker = "Refresh Registry" + idx = src.find(marker) # ← BUG: finds comment block at line 2090 + assert idx != -1, "Could not find Refresh Registry button in gui_2.py" + snippet = src[idx:idx + 400] # ← snippet window doesn't reach line 2111 + assert "load_registry" in snippet, ... # FAILS +``` + +The first occurrence of "Refresh Registry" is in a comment at line 2090. The actual code is at line 2111. The 400-char snippet window doesn't reach the code. + +**Fix:** Same pattern — use `rfind()` to find the actual code: +```python +def test_refresh_registry_button_calls_load_registry(): + src = inspect.getsource(gui_2) + marker = "Refresh Registry" + idx = src.rfind(marker) # ← finds the code at line 2111 + assert idx != -1, "Could not find Refresh Registry button in gui_2.py" + snippet = src[idx:idx + 400] + assert "load_registry" in snippet, ... # passes + assert snippet.count("log_registry.LogRegistry(") <= 1, ... # passes +``` + +**Test isolation:** 1 atomic commit for both test fixes (per-file atomicity; they're both 1-character changes in the same test fixture style). + +**Reference:** `docs/superpowers/specs/2026-06-03-ui-polish-design.md` §3.2 (Phase 2 design) and §3.3 (Phase 3 design). + +### 3.6 Deprecation removal + +**Files to modify:** + +1. **`src/ai_client.py:2939-3040`** — Remove the `@deprecated` decorator on `def send(...)` and the entire function body. The function is replaced by `send_result()` (which already exists at `src/ai_client.py:3002`). + - Verify: `rg "def send\(" src/ai_client.py` returns 0 hits (only `def send_result(` should remain). + +2. **`tests/test_deprecation_warnings.py`** — Delete the file. Both tests are obsolete: + - `test_send_deprecated_warning_emitted_once_per_site` — tests `send()`; can't run after `send()` is removed + - `test_send_result_does_not_emit_deprecation` — tests `send_result()` doesn't emit a deprecation; trivially true after `send()` is removed (no deprecation source) + +3. **`pyproject.toml:46-47`** — Remove the `filterwarnings` entry: + ```toml + filterwarnings = [ + "ignore:Use ai_client.send_result.*:DeprecationWarning", # DELETE THIS LINE + ] + ``` + - Verify: `rg "ignore:Use ai_client.send_result" pyproject.toml` returns 0 hits. + +**Test isolation:** 1 atomic commit for the 3 changes (consecutive cleanup; the changes are meaningless without each other). + +**Reference:** `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 (deprecation strategy); `pyproject.toml:46-47` (current entry). + +### 3.7 Documentation updates + +**1. `docs/guide_ai_client.md` — remove deprecation references:** + +Search for "deprecat" (case-insensitive) and remove: +- "Use ai_client.send_result() instead" mentions +- "The deprecated send() will be removed in..." warnings +- The entire deprecation warning table at the bottom of the `send_result` section + +**2. `conductor/product-guidelines.md` — remove deprecation language:** + +Search for "deprecat" (case-insensitive) and remove or update: +- "send() is deprecated" mentions +- "Use send_result()" instructions (the deprecation is being removed) +- Update the "Public API deprecation" section to mark as resolved + +**Test isolation:** 1 atomic commit for the 2 doc updates (consecutive cleanup). + +**Reference:** `conductor/product-guidelines.md` "Data-Oriented Error Handling > Public API deprecation" section (search for the heading; mark as RESOLVED). + +--- + +## 4. Architecture Reference + +### 4.1 The Result API (Fleury Pattern) + +The `Result[T, ErrorInfo]` pattern from `conductor/code_styleguides/error_handling.md` is the foundation. This track is the **removal of the deprecation** that the data_oriented_error_handling track introduced; the new API is the permanent one. + +**Key files:** +- `src/result_types.py` — `Result`, `ErrorInfo`, `ErrorKind`, `NilPath`, `NilRAGState` +- `src/ai_client.py:3002` — `def send_result(...)` (the permanent public API after this track) +- `src/ai_client.py:_send_()` (6 vendors) — return `Result[str]` + +**Per-call-site error handling pattern (canonical):** +```python +result = ai_client.send_result(md_content, user_message, base_dir, ...) +if not result.ok: + err = result.errors[0] + # call-site-specific error handling: + # - HTTP layer (app_controller:_api_generate): raise HTTPException(502, detail=err.ui_message()) + # - GUI layer (app_controller:_handle_request_event): log to comms + add error entry + # - MMA worker (multi_agent_conductor): log to comms + return per-ticket error + # - Tier 1/2 sub-agents (orchestrator_pm, conductor_tech_lead): log warn + return None or empty +response = result.data +``` + +### 4.2 The deprecated send() function + +Per `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 lines 183-206, the `send()` function: +- Was added in `data_oriented_error_handling_20260606` Phase 3 (commit `73cf321c`) +- Wraps `send_result()` and unwraps the `Result` to return `str` +- Is marked `@deprecated` via `typing_extensions.deprecated` (Python 3.11+ backport) +- Emits a `DeprecationWarning` at runtime (cached per call site) + +The `filterwarnings` entry in `pyproject.toml:46-47` silences the warning during the transition period. This track removes both the function and the filter entry. + +### 4.3 Threading & Locking + +The production call site migrations MUST preserve the existing locking: +- `multi_agent_conductor.py:591` — runs in a worker thread; the `set_comms_log_callback` and `set_current_tier` calls before the `send()` call MUST be preserved +- `orchestrator_pm.py:86` — runs in the orchestrator thread; lock acquisition patterns must be preserved +- `conductor_tech_lead.py:68` — runs in a sub-agent thread; the `set_custom_system_prompt` and `set_current_tier` calls before the `send()` call MUST be preserved + +**Reference:** `docs/guide_ai_client.md` "Threading Model" section; `docs/guide_app_controller.md` "AI Loop Lifecycle" section. + +### 4.4 The MMA per-ticket error handling + +The MMA worker (`multi_agent_conductor.py:run_worker_lifecycle`) currently does NOT have per-ticket error handling — it expects `send()` to return a `str` (and raises an exception on internal errors which the worker catches). After this track, `send_result()` returns a `Result[str]` with the errors in `result.errors`. The migration must: + +1. Check `result.ok` immediately after the call +2. If `!result.ok`: + - Log the error to the comms log via `worker_comms_callback` (status entry with `err.ui_message()`) + - Return a sentinel value that the DAG engine marks as failed (e.g., return `None` and the worker exits with non-zero status) +3. If `result.ok`: continue with `result.data` as before + +**Reference:** `docs/guide_mma.md` "Worker Lifecycle" section; the `multi_agent_conductor.py:584` `worker_comms_callback` (already wired up). + +--- + +## 5. Test Plan + +### 5.1 Per-phase test verification + +Each phase must pass targeted tests before moving to the next: + +| Phase | Test command | Expected | +|---|---|---| +| 1 | `uv run pytest tests/test_conductor_tech_lead.py tests/test_orchestrator_pm.py tests/test_mma_concurrent_tracks_sim.py tests/test_mma_step_mode_sim.py tests/test_undo_redo_sim.py -v 2>&1 \| tee tests/artifacts/public_api_phase1.log` | All pass | +| 2 | `uv run pytest tests/test_ai_client_cli.py tests/test_ai_cache_tracking.py tests/test_ai_client_result.py tests/test_api_events.py tests/test_deepseek_provider.py tests/test_gemini_cli_*.py tests/test_gui2_mcp.py tests/test_tier4_interceptor.py tests/test_token_usage.py -v 2>&1 \| tee tests/artifacts/public_api_phase2.log` | All pass | +| 3 | `uv run pytest tests/test_qwen_provider.py -v` | 5/5 pass (2 of which were the pre-existing failures) | +| 4 | `uv run pytest tests/test_symbol_parsing.py -v` | 2/2 pass (which were the pre-existing failures) | +| 5 | `uv run pytest tests/test_discussion_truncate_layout.py tests/test_log_management_refresh.py -v` | 2/2 pass (which were the pre-existing failures) | +| 6 | `uv run pytest tests/test_deprecation_warnings.py -v 2>&1` (should fail — file is deleted) + `uv run rg "ai_client\.send\(" src/ tests/` (should return 0) | File deleted; 0 rg hits | +| 7 | `uv run pytest tests/ 2>&1 \| tee tests/artifacts/public_api_phase7_full.log` | 4 fewer failures than pre-track (10 - 6 = 4 RAG failures remain) | + +### 5.2 Per-task TDD red verification + +For each task that introduces a new test, the implementer MUST: +1. Verify the test FAILS as expected (red phase) +2. Implement the fix +3. Verify the test PASSES (green phase) +4. Commit + +**Anti-pattern guard:** per `AGENTS.md` "Critical Anti-Patterns", no skipping tests just because they fail. If a test fails for an unexpected reason, the implementer MUST investigate before committing. + +### 5.3 Test isolation + +Per `docs/guide_testing.md` "Structural Testing Contract": +- No `unittest.mock.patch` on core infrastructure (event queues, `ai_client` internals, threading primitives) unless explicitly authorized +- All integration tests use `live_gui` fixture +- Test artifacts in `tests/artifacts/` or `tests/logs/` (gitignored) + +This track's tests are mostly UNIT tests (no `live_gui` needed). The MMA migration test (Phase 1) MAY need `live_gui` for the worker dispatch path; verify by running targeted tests first. + +--- + +## 6. Migration Strategy + +### 6.1 The order matters + +**Phase 1 must complete before Phase 6:** +- Phase 1 migrates the 3 production call sites to `send_result()` +- Phase 6 removes the legacy `send()` function +- If Phase 6 runs first, the production code (still using `send()`) crashes + +**Phase 2 must complete before Phase 6:** +- Phase 2 migrates the 12 test files to `send_result()` +- Phase 6 removes the legacy `send()` function +- If Phase 6 runs first, the tests (still using `send()`) crash + +**Phase 3, 4, 5 can run in any order after Phase 1** (they're independent test fixes). + +**Phase 7 is the final sweep** (docs + tracks.md + full suite). + +### 6.2 Per-commit safety + +Each atomic commit must: +- Be buildable (`python -c "import src.ai_client"` succeeds) +- Pass its targeted tests +- Not introduce a regression in the previously-passing tests +- Have a clear commit message with the task number + +The per-task commit pattern (per `conductor/workflow.md`): +``` +fix(ai_client): migrate conductor_tech_lead.py:68 to send_result() (G1, public_api_migration_and_ui_polish_20260615 Phase 1.1) +``` + +The per-phase checkpoint pattern: +``` +conductor(checkpoint): Phase 1 complete - 3 production call sites migrated +``` + +--- + +## 7. Out of Scope + +### 7.1 Deferred to separate tracks + +| ID | Item | Defer to | Why | +|---|---|---|---| +| OOS1 | 4 RAG test failures (test_rag_integration, test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim) | RAG subsystem track (planned; not yet specced) | Pre-existing RAG subsystem issues; error is in RAG config lookup code, not AI client code. A partial fix was attempted in commit `16412ad5`; the remaining issue is a different code path. | +| OOS2 | The `_send_()` → `_send__result()` rename per the data_oriented_error_handling spec §3.4 line 611 | Separate "private API rename" track (if needed) | Not blocking; tests work with current names. The function names are stable; only the return type changed. | +| OOS3 | The 23 lower-impact files with weak types (per `data_structure_strengthening_20260606/spec.md` §1 line 20) | `data_structure_strengthening_20260606` (the next major track after this) | That's exactly what data_structure_strengthening is for. | +| OOS4 | The 4 remaining UI Polish track phases that ARE NOT in this scope (none — all 5 are either shipped or addressed by this track's test fixes) | N/A | All 5 UI Polish phases are accounted for. | +| OOS5 | `live_gui_mock_injection_20260615` infrastructure | Separate infrastructure track | Not blocking. Recommended but not required. | + +### 7.2 Explicitly NOT in this track + +- **Renaming `_send_()` to `_send__result()`** — not needed; tests work with current names after assertion pattern fix +- **Adding TypedDict / @dataclass schemas** — that's data_structure_strengthening's scope +- **MMA per-ticket Result returns (per `data_oriented_error_handling_20260606/spec.md` §12.1 line 677 "Adds any new public API surface needed (e.g., per-ticket Result returns in the MMA conductor)")** — the MMA worker already gets `Result[str]` from `send_result()`; the existing `worker_comms_callback` already handles per-ticket status updates. The spec's mention of "per-ticket Result returns" was speculative; the current Result-based flow is sufficient. +- **Removing the `filterwarnings` for the `Optional[T]` ban** — the `audit_optional_in_3_files.py` audit (per `data_oriented_error_handling_20260606/spec.md`) is unrelated to this track's deprecation removal. + +--- + +## 8. Risks & Mitigations + +| ID | Risk | Likelihood | Impact | Mitigation | +|---|---|---|---|---| +| **R1** | `multi_agent_conductor.py:591` migration breaks MMA worker dispatch (5 callbacks) | Medium | High | TDD red first: verify a known MMA test fails before the fix; verify it passes after. The existing `doeh_test_thinking_cleanup_20260615` G1 fix pattern is the canonical reference for Result handling. | +| **R2** | Removing `send()` breaks a test that imports it indirectly | Low | Medium | Run `rg "ai_client\.send\(" src/ tests/` before AND after Phase 6 to confirm 0 hits. | +| **R3** | `pyproject.toml` filterwarnings removal causes test suite to fail with `DeprecationWarning` (e.g., from another library) | Low | Low | The filter was added in `data_oriented_error_handling_20260606` specifically to silence `send()` deprecation; no other deprecation in the codebase is silenced by it. Verified by checking the rg history. | +| **R4** | UI Polish test fixes (`find()` → `rfind()`) mask a real production bug | Low | Medium | The production code at `src/gui_2.py:5130-5131` and `:2111-2112` was already verified to have the correct values (`set_next_item_width(140)` + `drag_int` and in-place `load_registry()`). The test bug is just the search logic. | +| **R5** | Qwen test fix uses a different pattern than grok/llama/llama_native | Low | Low | The plan uses the same `assert result.ok and result.data == "x"` pattern as `doeh_test_thinking_cleanup_20260615` (commits `d7e42a4a`, `439a0ac0`, `dbdf9ba9`). | +| **R6** | `test_deprecation_warnings.py` deletion is misinterpreted as "deleting tests instead of fixing them" | Low | Low | Both tests in the file are obsolete after `send()` removal. The first test (test_send_deprecated) literally cannot run without `send()`. The second test (test_send_result_does_not_emit_deprecation) is trivially true. Document in the commit message. | +| **R7** | The 4 RAG test failures get introduced or regressed during this track | Low | Medium | Run full test suite in Phase 7 and compare to the pre-track baseline. The 4 RAG failures are documented as "pre-existing" with their defer-to track recorded. | + +--- + +## 9. Verification Criteria (definition of "done") + +The track is DONE when **ALL** of the following are true: + +1. **G1-G3 production migrations complete**: 3 call sites use `send_result()`; no `ai_client.send(` in `src/` +2. **G4 test migration complete**: 12 test files use `send_result()`; no `ai_client.send(` in `tests/` +3. **G5 Qwen test fix complete**: `test_qwen_provider.py` 5/5 pass +4. **G6 symbol_parsing test fix complete**: `test_symbol_parsing.py` 2/2 pass +5. **G7-G8 UI Polish test fixes complete**: `test_discussion_truncate_layout.py` 1/1 + `test_log_management_refresh.py` 1/1 pass +6. **G9 deprecation removed**: `@deprecated` decorator and `send()` function gone from `src/ai_client.py` +7. **G10 test_deprecation_warnings.py deleted**: file does not exist +8. **G11 filterwarnings removed**: no `ignore:Use ai_client.send_result` in `pyproject.toml` +9. **G12-G13 docs updated**: no `@deprecated` or "send is deprecated" mentions in `docs/guide_ai_client.md` or `conductor/product-guidelines.md` +10. **NF1 no regressions**: full test suite has 4 RAG failures remaining (down from 10); no new failures +11. **NF2 per-task commits**: ~28 atomic commits with clear messages +12. **NF3 style preserved**: 1-space indentation, no comments, type hints in all changed code +13. **NF4 per-commit git notes**: all 28 commits have git notes summarizing the task +14. **NF5 doeh state.toml parseable**: `tomllib.load()` succeeds (unchanged from previous track; sanity check) +15. **Final state**: 1280 + 6 newly-passing = 1286 tests pass; 4 RAG failures documented as deferred + +**Test count math:** +- Pre-track baseline: 1280 pass + 4 skip + 10 fail (verified 2026-06-15) +- After this track: 1286 pass + 4 skip + 4 fail (6 newly-passing: 2 Qwen + 2 symbol_parsing + 1 truncate + 1 refresh) +- The 4 remaining failures are all RAG subsystem; deferred to the next track + +--- + +## 10. Execution Order & Dependencies + +**No external blockers.** This track can start immediately after the Tier 1 review approves the spec. + +**Execution order (the plan):** +1. Phase 1 (production migration) — 1 day +2. Phase 2 (test migration, 12 files) — 1 day +3. Phase 3 (Qwen test fix) — 1 hour (can be combined with Phase 2) +4. Phase 4 (symbol_parsing test fix) — 30 min (can be combined with Phase 2) +5. Phase 5 (UI Polish test fixes) — 30 min (independent) +6. Phase 6 (deprecation removal) — 30 min (MUST be after Phases 1 + 2) +7. Phase 7 (docs + housekeep) — 1 hour (after Phase 6) + +**Total:** 2-3 days Tier 2 work (the estimate accounts for the per-commit overhead + per-task git notes + 7 phase checkpoints). + +**Followed by:** the user can start `data_structure_strengthening_20260606` track (already has spec, plan pending). + +--- + +## 11. References + +### Architecture docs +- `docs/guide_ai_client.md` — multi-provider LLM client; `send_result()` is the canonical public API +- `docs/guide_app_controller.md` — headless controller; `app_controller.py:_handle_request_event` was migrated by `doeh_test_thinking_cleanup_20260615` +- `docs/guide_mma.md` — 4-tier MMA orchestration; `multi_agent_conductor.py:run_worker_lifecycle` is the worker entry point +- `docs/guide_mcp_client.md` — MCP tool registry (note: `mcp_client.py:2274` was a misidentification in the parent spec) +- `docs/guide_testing.md` — `live_gui` fixture + structural testing contract + +### Styleguides +- `conductor/code_styleguides/error_handling.md` — `Result[T]` pattern + the AND-over-OR convention +- `conductor/code_styleguides/data_oriented_design.md` — canonical DOD reference +- `conductor/product-guidelines.md` — 1-space indentation, no comments, type hints, SDM tags + +### Parent tracks +- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §3.5 (deprecation strategy), §12.1 (follow-up scope) +- `conductor/tracks/data_oriented_error_handling_20260606/state.toml` — the parent track's state +- `conductor/tracks/doeh_test_thinking_cleanup_20260615/spec.md` — the previous track; the migration pattern reference +- `conductor/tracks/doeh_test_thinking_cleanup_20260615/plan.md` Phase 2 — exact test mock fix pattern (Tasks 2.1-2.5) +- `docs/reports/TRACK_COMPLETION_doeh_test_thinking_cleanup_20260615.md` — the 11 mock fixes that established the pattern + +### UI Polish track +- `docs/superpowers/specs/2026-06-03-ui-polish-design.md` — the 5-phase UI Polish spec +- `docs/superpowers/plans/2026-06-03-ui-polish.md` — the 5-phase UI Polish plan +- User commits: `d0b06575` (Phase 2 code fix), `df7bda6e` (Phase 3 code fix) +- Track commits: `79ac9210` (Phase 1), `3a864076` (Phase 4), `74e02485` (Phase 5) + +### Test files (the 12 + 1 to migrate, the 4 UI Polish fixes) +- 12 send() test files: `test_ai_client_cli`, `test_ai_cache_tracking`, `test_ai_client_result`, `test_api_events`, `test_deepseek_provider`, `test_gemini_cli_edge_cases`, `test_gemini_cli_integration`, `test_gemini_cli_parity_regression`, `test_gui2_mcp`, `test_tier4_interceptor`, `test_token_usage`, `test_symbol_parsing` +- 1 _send_ test file: `test_qwen_provider` +- 2 UI Polish test files: `test_discussion_truncate_layout`, `test_log_management_refresh` +- 1 file to delete: `test_deprecation_warnings` + +### Production call sites (3 to migrate) +- `src/conductor_tech_lead.py:68` +- `src/orchestrator_pm.py:86` +- `src/multi_agent_conductor.py:591` + +### Codebase locations +- `src/ai_client.py:2939-3040` — the deprecated `send()` function (to be deleted) +- `src/ai_client.py:3002` — the new `send_result()` public API (kept) +- `pyproject.toml:46-47` — the `filterwarnings` entry (to be deleted) +- `tests/test_deprecation_warnings.py` — the 2 obsolete tests (to be deleted) +- `docs/guide_ai_client.md` — deprecation references (to be removed) +- `conductor/product-guidelines.md` — deprecation language (to be removed)