diff --git a/conductor/tracks/doeh_test_thinking_cleanup_20260615/spec.md b/conductor/tracks/doeh_test_thinking_cleanup_20260615/spec.md
new file mode 100644
index 00000000..c82c321b
--- /dev/null
+++ b/conductor/tracks/doeh_test_thinking_cleanup_20260615/spec.md
@@ -0,0 +1,277 @@
+# Track: Data-Oriented Error Handling Test & Thinking-Parser Cleanup
+
+**Status:** Active (spec approved 2026-06-15)
+**Initialized:** 2026-06-15
+**Owner:** Tier 2 Tech Lead
+**Priority:** High (1 critical production regression + 10+ test mock fixes + 2 deferred bugs)
+
+---
+
+## 1. Overview
+
+This track is the **cleanup follow-up** to two previously-completed tracks: `data_oriented_error_handling_20260606` (shipped 2026-06-12) and `ai_loop_regressions_20260614` (shipped 2026-06-15). It consolidates 3 categories of remaining work into a single deliverable:
+
+1. **A new production regression** introduced by `ai_loop_regressions_20260614` commit `2b7b571a` (FR2 fix): the `_api_generate` function in `src/app_controller.py:265-295` references an undefined variable `context_to_send`, causing `/api/v1/generate` to return HTTP 500 on every call. This bug was not caught by the previous track's smoke tests (which only verified Hook API substrate reachability) and was missed in the Tier 1 review (which relied on the test pass count, not direct code inspection of the FR2 diff).
+
+2. **10 pre-existing test mock bugs** from the `data_oriented_error_handling_20260606` refactor: tests that call `_send_<vendor>()` and assert against raw `str` return values, while the production code now returns `Result[str]`. Mechanical fixes (`assert result.ok and result.data == "x"` instead of `assert result == "x"`).
+
+3. **2 deferred bugs** from `ai_loop_regressions_20260614` spec §13: Gemini / Gemini CLI thinking-format compatibility (Bug #4) and `<think>` (half-width) marker support in `thinking_parser` (Bug #5).
+
+Plus 2 housekeeping items discovered during Tier 1 review of `ai_loop_regressions_20260614`: the duplicate-key bug in that track's `state.toml` (which makes the file unparseable by Python's `tomllib`), and the `tracks.md` row 24 that was never updated to mark the track complete.
+
+This track does NOT include (deferred to separate tracks — see §13):
+- The `public_api_migration_20260606` follow-up (5 production + 63 test call sites not migrated to `send_result()`)
+- A `live_gui_mock_injection` infrastructure track (would unblock proper end-to-end live_gui + AI client tests)
+- Pre-existing RAG flakiness (`test_rag_phase4_final_verify`)
+- The UI Polish Five Issues phases (2 unrelated test failures covered by that track)
+
+## 2. Goals (Priority Order)
+
+| Priority | Goal | Rationale |
+|---|---|---|
+| **A (critical)** | Fix the `_api_generate` `NameError` regression introduced by `ai_loop_regressions_20260614` commit `2b7b571a` | Production bug: `/api/v1/generate` returns HTTP 500 on every call. The fix is small (~3 lines: add back the `_disc_entries_lock` acquisition and `context_to_send = stable_md if not has_ai_response else ""`). A failing test (`test_headless_service.test_generate_endpoint`) is the canary. |
+| **A (primary value)** | Fix the 10 pre-existing test mock bugs from `data_oriented_error_handling_20260606` | The test suite has 10+ red tests that are all the same mechanical pattern. Fixing them gets the test suite back to green. Each test is a 1-line change (use `result.data` or `result.ok` checks). |
+| **B (architectural)** | Investigate and fix the Gemini / Gemini CLI thinking-format compatibility (Bug #4) | The user complained that thinking monologues don't render for Gemini. Empirical investigation needed: run a Gemini request, inspect `resp.text`, determine if a normalization pass is needed in `_send_gemini*`. |
+| **B (architectural)** | Add `<think>` (half-width) marker support to `thinking_parser.py` | User screenshot 1 showed `<think>...</think>` format. The current regex at `src/thinking_parser.py:9` requires the full-width `<thinking>`. Small change (~3 lines + tests). |
+| **C (housekeeping)** | Fix the `state.toml` duplicate-key bug in `ai_loop_regressions_20260614` | The state file is unparseable by Python's `tomllib` due to TOML §3.3.1 "Cannot overwrite a value". The fix is deleting lines 23-26 and 46-58. This blocks archival of the parent track. |
+| **C (housekeeping)** | Update `conductor/tracks.md` row 24 to reflect completion of `ai_loop_regressions_20260614` | The track was completed on 2026-06-15 but the row still says "spec ✓, plan ✓, ready to start". |
+| **C (verification)** | Full test suite sweep + `docs/guide_ai_client.md` "See Also" section update | Document the new `Result` API test patterns and the deferred items. |
+
+### 2.1 Non-Goals (this track)
+
+- **Not** migrating the remaining 5 production + 63 test call sites to `send_result()`. That is `public_api_migration_20260606`, a separate planned track with its own scope. This track only fixes the broken `_api_generate` site (which is the only newly-introduced production regression) and the 10+ tests that would be touched by the public_api migration.
+- **Not** introducing a `live_gui_mock_injection` infrastructure. That's a separate concern (test infrastructure) requiring subprocess mock injection. Recommended as its own track.
+- **Not** fixing the pre-existing RAG flakiness in `test_rag_phase4_final_verify`. That test had a partial fix in commit `16412ad5` (RAG Phase 4 dim-mismatch) and a subsequent failure with `'NoneType' object has no attribute 'get'`. This is a RAG subsystem concern, not an AI client test mock concern.
+- **Not** fixing `test_discussion_truncate_layout.py::test_keep_pairs_input_uses_adequate_width` and `test_log_management_refresh.py::test_refresh_registry_button_calls_load_registry`. These are Phase 2 and Phase 3 of the UI Polish Five Issues track, which has its own plan and spec. The 2 failing tests are correctly identified as out-of-scope here.
+- **Not** adding a CI gate or audit script. The existing `scripts/audit_*.py` scripts don't check for this category of regression (test mocks that don't match the new return types).
+- **Not** removing the deprecated `ai_client.send()` shim. That's `public_api_migration_20260606`.
+
+## 3. Current State Audit (as of commit `515ef933`)
+
+### 3.1 Already Implemented (DO NOT re-implement)
+
+- **`src/result_types.py`**: `Result[T]`, `ErrorInfo`, `ErrorKind` dataclasses exist; the new convention is fully established.
+- **`src/ai_client.py:send_result()`** (lines 2970-3092): the new public entry point, returns `Result[str]`. Routes to `_send_<vendor>_result()` per provider.
+- **`src/ai_client.py:send()`** (lines 2907-2968): the `@deprecated` shim, returns `result.data` (empty string on error).
+- **`src/ai_client.py:_send_*_result()`** (9 vendors): all return `Result[str]`.
+- **`src/ai_client.py:run_with_tool_loop()`** (lines 734-836): now has `wrap_reasoning_in_text: bool = False` kwarg (added by `ai_loop_regressions_20260614` FR3 fix).
+- **`src/app_controller.py:_handle_request_event`** (lines 3673-3697): uses `send_result()` + `result.ok` branching (fixed by `ai_loop_regressions_20260614` FR1).
+- **`src/app_controller.py:_api_generate_sync`** (line 3692): also updated by FR1 (the 2nd `except ProviderError` site was already replaced; the `try`/`except` was also restructured).
+- **`src/thinking_parser.py:parse_thinking_trace()`** (lines 8-54): supports `<thinking>`, `<thought>`, and `Thinking:` prefix markers.
+
+### 3.2 Gaps to Fill (This Track's Scope)
+
+#### G1: `_api_generate` NameError regression (CRITICAL)
+
+**File:line**: `src/app_controller.py:265-295` (the `_api_generate` function)
+**Bug introduced by**: `ai_loop_regressions_20260614` commit `2b7b571a` (FR2 fix)
+**Symptom**: `/api/v1/generate` returns HTTP 500 with `NameError: name 'context_to_send' is not defined`
+**Root cause**: The FR2 fix removed the `try:` block (which contained the `with controller._disc_entries_lock:` acquisition and the `context_to_send = stable_md if not has_ai_response else ""` assignment) and replaced it with a `send_result()` call that still references `context_to_send`. The variable definition was lost.
+
+The current state at `src/app_controller.py:278`:
+```python
+result = ai_client.send_result(context_to_send, user_msg, base_dir, ...)  # context_to_send is undefined
+```
+
+The fix needs to add back the 2 lines BEFORE line 278:
+```python
+with controller._disc_entries_lock:
+    has_ai_response = any(e.get("role") == "AI" for e in controller.disc_entries)
+context_to_send = stable_md if not has_ai_response else ""
+```
+
+**Failing test**: `tests/test_headless_service.py::TestHeadlessAPI::test_generate_endpoint` (currently returns 500).
+
+#### G2-G11: 10 pre-existing test mock bugs from `data_oriented_error_handling_20260606`
+
+All have the same root cause: the tests were written before the refactor when `_send_<vendor>()` returned `str`; the production code now returns `Result[str]`. The fix is mechanical: change `assert result == "x"` to `assert result.ok and result.data == "x"`, and `assert "text" in result` to `assert result.ok and "text" in result.data`.
+
+| # | File:line | Test | Current assertion | Fix |
+|---|---|---|---|---|
+| **G2** | `tests/test_llama_provider.py:22` | `test_send_grok_uses_xai_endpoint` (wait, this is in test_grok_provider) | `assert result == "hi from grok"` | `assert result.ok and result.data == "hi from grok"` |
+| **G3** | `tests/test_grok_provider.py:13` | `test_send_grok_uses_xai_endpoint` | `assert result == "hi from grok"` | `assert result.ok and result.data == "hi from grok"` |
+| **G4** | `tests/test_grok_provider.py:30` | `test_grok_web_search_adds_search_parameters_to_extra_body` | `assert len(captured_kwargs) == 1` (got 12) | Loop now calls the mock 12 times; update to `assert any(kw["extra_body"] is not None and kw["extra_body"].get("search_parameters", {}).get("mode") == "auto" for kw in captured_kwargs)` |
+| **G5** | `tests/test_grok_provider.py:46` | `test_grok_x_search_adds_x_source_to_extra_body` | `assert captured_kwargs[0]["extra_body"]["search_parameters"]["sources"] == [{"type": "x"}]` | Same as G4 — change to check across all kwargs |
+| **G6** | `tests/test_llama_provider.py:24` | `test_send_llama_openrouter_backend` | `assert result == "hi from openrouter"` | `assert result.ok and result.data == "hi from openrouter"` |
+| **G7** | `tests/test_llama_provider.py:43` | `test_send_llama_custom_url` | `assert result == "hi from custom"` | `assert result.ok and result.data == "hi from custom"` |
+| **G8** | `tests/test_llama_provider.py:62` | `test_send_llama_ollama_backend` | `assert "hi from ollama" in result` | `assert result.ok and "hi from ollama" in result.data` |
+| **G9** | `tests/test_llama_ollama_native.py:70` | `test_send_llama_native_calls_ollama_chat_when_localhost` | `assert "hi from native ollama" in result` | `assert result.ok and "hi from native ollama" in result.data` |
+| **G10** | `tests/test_llama_ollama_native.py:88` | `test_send_llama_native_preserves_thinking_field` | `assert "I thought about it" in result` | `assert result.ok and "I thought about it" in result.data` |
+| **G11** | `tests/test_llama_ollama_native.py:107` | `test_send_llama_routes_to_native_when_localhost` | `assert "via native" in result` | `assert result.ok and "via native" in result.data` |
+| **G12** | `tests/test_llama_ollama_native.py:122` | `test_send_llama_keeps_openai_path_for_non_local` | `assert "via openrouter" in result` | `assert result.ok and "via openrouter" in result.data` |
+| **G13** | `tests/test_ai_client_tool_loop_builder.py:22` | `test_run_with_tool_loop_calls_request_builder_each_round` | Mock returns raw `NormalizedResponse`; `_default_send` now does `if not res.ok:` expecting `Result[NormalizedResponse]` | Wrap the mock return in `Result(data=...)` |
+| **G14** | `tests/test_headless_service.py:57` | `test_generate_endpoint` | Mocks `ai_client.send` (deprecated); production now uses `send_result`. Plus the G1 NameError. | Update mock to `ai_client.send_result` returning `Result(data="AI Response")`; this test will pass after G1 is fixed |
+
+#### G15: Gemini / Gemini CLI thinking-format compatibility (Bug #4 deferred from `ai_loop_regressions_20260614`)
+
+**File:line**: `src/ai_client.py:_send_gemini` (lines 1538-1781) and `src/ai_client.py:_send_gemini_cli` (lines 1783-1897), possibly `src/thinking_parser.py:9`
+**Symptom**: User reported thinking monologues don't render for Gemini. The current `parse_thinking_trace` regex matches `<thinking>`, `<thought>`, and `Thinking:` prefix. The Gemini SDK may emit a different format.
+**Investigation needed**: empirically run a Gemini request that produces reasoning and inspect the raw `resp.text`. If the format is incompatible, add a normalization pass.
+
+#### G16: `<think>` (half-width) marker support (Bug #5 deferred from `ai_loop_regressions_20260614`)
+
+**File:line**: `src/thinking_parser.py:9` (the regex at line 9)
+**Symptom**: User screenshot 1 showed `<think>This is DWARF debug info, not the actual disassembly...</think>` — the half-width form. The current regex doesn't match this.
+**Fix**: extend the `tag_pattern` to also match `<think>...</think>` (the closing tag is the same).
+
+#### G17: `state.toml` duplicate-key bug (housekeeping, blocks `ai_loop_regressions_20260614` archival)
+
+**File:line**: `conductor/tracks/ai_loop_regressions_20260614/state.toml` lines 23-26 and 46-58
+**Symptom**: Python's `tomllib.load()` raises `TOMLDecodeError: Cannot overwrite a value (at line 23, column 123)`
+**Fix**: Delete the duplicate `phase_2..5` and `t2_1..t5_4` entries (the "pending" duplicates of the "completed" entries that already have the correct commit SHAs).
+
+#### G18: `tracks.md` row 24 not updated (housekeeping)
+
+**File:line**: `conductor/tracks.md:41`
+**Symptom**: Track 24 still shows "spec ✓, plan ✓, ready to start" though the track shipped on 2026-06-15.
+**Fix**: Update the status column to reflect completion, OR move the row to a "Recently Completed" section (per existing convention used by `qwen_llama_grok_integration_20260606`).
+
+## 4. Functional Requirements
+
+### FR1: Fix `_api_generate` NameError (G1)
+
+`_api_generate` in `src/app_controller.py:265-295` must:
+1. Have `context_to_send` properly defined before the `send_result()` call.
+2. Continue to use the `_disc_entries_lock` for thread-safe access to `disc_entries`.
+3. Continue to use the `if not result.ok: raise HTTPException(502, ...)` pattern from the FR2 fix.
+
+The fix is 2-3 lines added before line 278:
+```python
+with controller._disc_entries_lock:
+    has_ai_response = any(e.get("role") == "AI" for e in controller.disc_entries)
+context_to_send = stable_md if not has_ai_response else ""
+```
+
+### FR2: Fix the 11 pre-existing test mock bugs (G2-G12, G14)
+
+For each of the 11 tests, change the assertion pattern to handle `Result[str]`:
+- `assert result == "x"` → `assert result.ok and result.data == "x"`
+- `assert "text" in result` → `assert result.ok and "text" in result.data`
+
+For the Grok web_search / x_search tests (G4, G5), the test now goes through the tool loop and the mock is called multiple times. Change `assert captured_kwargs[0]...` to `assert any(kw["extra_body"]... for kw in captured_kwargs)`.
+
+For `test_headless_service.test_generate_endpoint` (G14): change the mock from `ai_client.send` to `ai_client.send_result` returning `Result(data="AI Response")`.
+
+### FR3: Fix `test_ai_client_tool_loop_builder` mock shape (G13)
+
+The mock at `tests/test_ai_client_tool_loop_builder.py:33` uses `patch("src.openai_compatible.send_openai_compatible", side_effect=[tool_response, final])` and returns raw `NormalizedResponse` objects. Since `run_with_tool_loop._default_send` now does `if not res.ok:` expecting a `Result[NormalizedResponse]`, the mock must return `Result(data=tool_response)` and `Result(data=final)`.
+
+### FR4: Investigate and fix Gemini thinking format (G15)
+
+Phase 3 task. Empirically investigate:
+1. Run a Gemini request (real or mocked) that produces thinking content.
+2. Inspect the raw `resp.text` to see what format it uses.
+3. If the format is not `<thinking>...</thinking>` or `Thinking:`, decide:
+   - **Option A**: Add a normalization pass in `_send_gemini` and `_send_gemini_cli` to wrap the thinking in `<thinking>` tags before returning.
+   - **Option B**: Extend `parse_thinking_trace` to match the new format.
+
+The empirical finding determines the approach. Document the result in the commit message.
+
+### FR5: Add `<think>` half-width marker support (G16)
+
+Extend the `tag_pattern` regex at `src/thinking_parser.py:9` to also match `<think>...</think>` (half-width). The fix is a single regex addition to the existing pattern. Update the 5+ existing tests in `tests/test_thinking_trace.py` to verify the new pattern works.
+
+### FR6: Fix `state.toml` duplicate keys (G17)
+
+Delete lines 23-26 and 46-58 from `conductor/tracks/ai_loop_regressions_20260614/state.toml`. The "completed" entries at lines 18-22 and 29-45 are correct; the "pending" duplicates must be removed.
+
+### FR7: Update `tracks.md` row 24 (G18)
+
+Update the status column at `conductor/tracks.md:41` to reflect the track's completion. The user preferred pattern (move to "Recently Completed" or just update status) is a Tier 1 review decision; either is acceptable.
+
+### FR8: Regression sweep + doc update
+
+Phase 5 task. Run the full test suite (`uv run pytest tests/`) and verify all G1-G13 + FR1-FR5 fixes are green. Update `docs/guide_ai_client.md` "See Also" section with cross-references to this track (similar to what was done in `ai_loop_regressions_20260614`).
+
+## 5. Non-Functional Requirements
+
+- **NFR1 (Atomic per-task commits)**: each plan task is one commit; no batching. Use the project's "1 commit per task" discipline (see `conductor/workflow.md`).
+- **NFR2 (1-space indentation)**: enforced by the project's AI-Optimized Python style.
+- **NFR3 (No diagnostic noise in production)**: no `sys.stderr.write("[XYZ_DIAG] ...")` lines in committed code. If instrumentation is needed for the TDD test, it goes to `tests/artifacts/<test_name>.diag.log`.
+- **NFR4 (Test isolation)**: the 11 test mock fixes must NOT use `unittest.mock.patch` to bypass the new Result API; they must correctly unwrap `result.data` or check `result.ok`. Per the project's "No Mock Patches to Pseudo API" anti-pattern rule.
+- **NFR5 (No regression in other providers)**: the 5 unaffected providers (Anthropic, Qwen, Grok non-thinking tests, Llama non-mock tests, Llama native non-mock tests) must continue to pass their existing tests.
+- **NFR6 (Thread safety)**: the FR1 fix in `_api_generate` must use `_disc_entries_lock` (the same lock the original code used) to avoid races with the GUI's discussion updates.
+
+## 6. Architecture Reference
+
+For implementation details, consult:
+
+- **`docs/guide_ai_client.md`**: the canonical guide for `src/ai_client.py`; the new `send_result()` API is documented in the "Data-Oriented Error Handling (Fleury Pattern) > Public API" section. The test mock fixes (FR2, FR3) follow the patterns shown there.
+- **`docs/guide_app_controller.md`**: the canonical guide for `src/app_controller.py`; the `_api_generate` and `_handle_request_event` flows are described in §"AI Loop Lifecycle". The FR1 fix lives in this subsystem.
+- **`docs/guide_thinking.md`** (or `docs/guide_discussions.md`): the canonical guide for thinking-mono rendering; the `parse_thinking_trace` markers are documented. FR4 (Gemini format) and FR5 (half-width marker) are in this subsystem.
+- **`conductor/code_styleguides/error_handling.md`**: the canonical reference for the Result/ErrorInfo pattern; the new FR2 test assertions follow §3.1 "AND over OR (Result struct with side-channel errors)".
+- **`docs/reports/TRACK_COMPLETION_ai_loop_regressions_20260615.md`**: the parent track's completion report. The G17 state.toml bug and the G18 tracks.md row issue are documented in the Tier 1 review §"Critical Issues" of that track.
+
+## 7. Out of Scope
+
+The following items are **explicitly out of scope** and tracked elsewhere:
+
+- **`public_api_migration_20260606`** (planned, separate track): removes the deprecated `ai_client.send()` and migrates 5 production + 63 test call sites to `send_result()`. This track only fixes the broken `_api_generate` site (G1) and the test mock bugs that the public_api migration would touch (G2-G12). The other 50+ test call sites are deferred to public_api.
+- **`live_gui_mock_injection_20260615`** (not yet specced): infrastructure for mock injection into the live_gui subprocess. Recommended as a separate track because it requires infrastructure work (subprocess mock protocol, conftest changes) and unblocks future live_gui + AI client tests.
+- **`test_rag_phase4_final_verify` flakiness**: pre-existing RAG subsystem issue (not caused by the data_oriented_error_handling or ai_loop_regressions tracks). The `'NoneType' object has no attribute 'get'` error is in RAG config lookup code, not AI client code. Recommended as a separate RAG track.
+- **`test_discussion_truncate_layout.py::test_keep_pairs_input_uses_adequate_width`**: Phase 2 of the UI Polish Five Issues track (`ui_polish_five_issues_20260302`). The track spec is at `docs/superpowers/specs/2026-06-03-ui-polish-design.md`.
+- **`test_log_management_refresh.py::test_refresh_registry_button_calls_load_registry`**: Phase 3 of the same UI Polish track. Both are out of scope here.
+- **The deprecated `ai_client.send()` removal**: that's the public_api_migration_20260606 track.
+
+## 8. Phases (Summary)
+
+| Phase | Name | Tasks | Verification |
+|---|---|---|---|
+| **Phase 1** | **CRITICAL: Fix `_api_generate` NameError (G1)** | 2 tasks: write failing test (`test_generate_endpoint` already exists; verify it fails for the NameError reason), fix the production code | `test_headless_service.test_generate_endpoint` returns 200 |
+| **Phase 2** | **Fix 10 test mock bugs (G2-G12, G14) + 1 mock shape fix (G13)** | 11 tasks: one per test file (4-5 per file group), TDD-red + green per file | Full suite has 11 fewer failures |
+| **Phase 3** | **Fix Gemini / Gemini CLI thinking-format (G15)** | 3 tasks: empirical investigation, fix the format mismatch (either normalization pass or parser extension), live_gui verification | Gemini thinking mono renders in Discussion Hub |
+| **Phase 4** | **Add `<think>` half-width marker (G16)** | 2 tasks: extend regex in `thinking_parser.py:9`, add 1+ new tests in `test_thinking_trace.py` | `parse_thinking_trace` extracts 1 segment from `<think>...</think>` text |
+| **Phase 5** | **Housekeeping + regression sweep + docs (G17, G18, FR8)** | 4 tasks: fix `state.toml` duplicates, update `tracks.md`, full suite sweep, doc update | Full suite green; state.toml parseable; tracks.md row 24 updated |
+
+## 9. Risk Analysis
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| **R1**: The FR1 `_api_generate` fix accidentally introduces a regression in the existing FR2/FR3 logic | Low | High | The fix only ADDS lines, doesn't modify any existing logic. After the fix, the function matches the original (pre-`ai_loop_regressions_20260614`) semantics. |
+| **R2**: The 11 test mock fixes have subtle differences in `result.ok` semantics that cause new test failures | Low | Low | The pattern is mechanical (`assert result.ok` then `assert result.data == "x"`). If a test is `assert result.ok` and `result.ok` is False, the failure message is clear (shows the ErrorInfo). |
+| **R3**: The Gemini thinking format investigation (Phase 3) requires running a real Gemini request, which the user may not have credentials for | Medium | Medium | If real Gemini credentials are unavailable, use a mock client that returns a realistic Gemini response with thinking content. Document the format assumption. |
+| **R4**: The `<think>` regex extension accidentally matches too much (e.g., greedy matching across multiple segments) | Low | Low | Use `re.DOTALL` + non-greedy `.*?` (consistent with the existing pattern). The existing 5+ tests in `test_thinking_trace.py` will catch regressions. |
+| **R5**: The `state.toml` cleanup (Phase 5) accidentally deletes the wrong lines | Very Low | High | Only delete the duplicate "pending" entries; the "completed" entries with commit SHAs must be preserved. The fix is mechanical and verifiable by re-running `tomllib.load()`. |
+
+## 10. Coordination with Pending Tracks
+
+This track is **independent** (no `blocked_by`) but interacts with:
+
+- **`ai_loop_regressions_20260614`** (shipped 2026-06-15): this track fixes the production regression (G1) and housekeeping issues (G17, G18) that the parent track left behind. It also picks up the 2 deferred bugs (G15, G16) from the parent's spec §13. No direct dependency — the parent track is shipped; this track is cleanup.
+- **`public_api_migration_20260606`** (planned, not yet specced): this track's G2-G12 test mock fixes overlap with the public_api track's test migration scope. After this track ships, the public_api track will have 11 fewer tests to migrate. The public_api track is responsible for the remaining 50+ test call sites and the 5 production call sites.
+- **`data_oriented_error_handling_20260606`** (shipped 2026-06-12): the root cause of the G2-G14 test mock bugs. This track is the test-cleanup follow-up to the parent refactor. No direct interaction — the parent track is shipped; this track fixes the remaining test fallout.
+- **UI Polish Five Issues track** (`ui_polish_five_issues_20260302`): the 2 out-of-scope test failures (`test_discussion_truncate_layout`, `test_log_management_refresh`) are Phase 2 and Phase 3 of that track. That track has its own plan and is ready to start; this track does not touch it.
+
+## 11. Verification Criteria (definition of "done")
+
+The track is complete when ALL of the following are true:
+
+- [ ] `test_headless_service::TestHeadlessAPI::test_generate_endpoint` returns 200 (proves the G1 fix).
+- [ ] All 11 test mock fixes (G2-G12) pass: full batched test suite has 11 fewer failures than before.
+- [ ] `test_ai_client_tool_loop_builder::test_run_with_tool_loop_calls_request_builder_each_round` passes (G13).
+- [ ] Phase 3 Gemini investigation produces a finding: either a normalization pass in `_send_gemini*` is added OR the parser is extended, AND a live_gui test or unit test demonstrates Gemini thinking-mono rendering.
+- [ ] `parse_thinking_trace` correctly extracts 1 ThinkingSegment from `<think>...</think>` text (G16).
+- [ ] `tests/test_thinking_trace.py` has 1+ new test for the half-width marker; all existing 5+ tests still pass.
+- [ ] Python's `tomllib.load()` on `conductor/tracks/ai_loop_regressions_20260614/state.toml` succeeds (G17).
+- [ ] `conductor/tracks.md` row 24 reflects the track's completion (G18).
+- [ ] Full test suite is green (no new failures beyond the deferred test_rag_phase4_final_verify and UI Polish tests).
+- [ ] `docs/guide_ai_client.md` "See Also" section has 2 new cross-references: (1) this cleanup track; (2) reference to `public_api_migration_20260606`.
+- [ ] `metadata.json` `verification_criteria` field is updated to reflect completion.
+
+## 12. See Also — Follow-up Notes
+
+### 12.1 `public_api_migration_20260606` (planned, separate track)
+
+Migrates the remaining 5 production call sites and 63 test call sites to `send_result()`. This track fixes only the broken `_api_generate` site (G1) and the 11 test mock bugs that the public_api track would have touched (G2-G12). The remaining ~50 test call sites and 5 production call sites are deferred.
+
+### 12.2 `live_gui_mock_injection_20260615` (not yet specced)
+
+Infrastructure for mock injection into the live_gui subprocess. The `ai_loop_regressions_20260614` Tier 2 review (§9 of the report) recommended this as a follow-up because the live_gui smoke tests only verify the Hook API substrate is reachable — they don't exercise the full request → AI client → discussion pipeline end-to-end. Without this infrastructure, future tracks hitting live_gui + AI client will hit the same wall.
+
+### 12.3 `test_rag_phase4_final_verify` flakiness (separate RAG concern)
+
+Pre-existing RAG subsystem issue not caused by the data_oriented_error_handling or ai_loop_regressions tracks. The error `'NoneType' object has no attribute 'get'` is in RAG config lookup code, not AI client code. A partial fix was attempted in commit `16412ad5` (RAG Phase 4 dim-mismatch recovery). Recommended as a separate RAG track.
+
+### 12.4 UI Polish Five Issues track (separate track)
+
+The 2 unrelated test failures in the full suite (`test_discussion_truncate_layout` and `test_log_management_refresh`) are Phase 2 and Phase 3 of the UI Polish track (`ui_polish_five_issues_20260302`). That track has its own spec and plan. Not in scope here.