conductor: register ai_loop_regressions_20260614 in tracks.md (priority A, ready for Tier 2)

2026-06-15 00:48:12 -04:00
parent acc294ae4e
commit f4c497b1e8
1 changed file with 14 additions and 0 deletions
@@ -38,6 +38,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
 | 16 | — | [GenCpp Dogfood Feedback Loop](#track-gencpp-dogfood-feedback-loop) | spec TBD | (none — independent; oldest pending track) |
 | 17 | — | [Code Path Audit](#track-code-path-audit) | spec TBD | test_infrastructure_hardening_20260609 (merged) |
 | 23 | A (research) | [Intent-Based Scripting Languages Survey](#track-intent-based-scripting-languages-survey-new-2026-06-12) | spec ✓, plan pending | (none — independent; NEW 2026-06-12; **non-impl research track**, **time-sensitive: report must complete before nagent v2.2**) |
+| 24 | A (bugfix) | [AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek)](#track-ai-loop-regressions-minimax-gemini-gemini-cli-deepseek-new-2026-06-14) | spec ✓, plan ✓, ready to start | (none — independent; **NEW 2026-06-14**; user-blocking; 3 bugs from `data_oriented_error_handling_20260606`) |
 | 18 | — | [GUI Architecture Refinement](#track-gui-architecture-refinement) | (no spec.md) | (TBD) |
 | 19 | — | [Context First Message Fix](#track-context-first-message-fix) | spec TBD | (none — independent) |
 | ~~19~~ | — | ~~[Fix Remaining Tests](#track-fix-remaining-tests)~~ | ~~SUPERSEDED by track 1~~ | — |
@@ -489,6 +490,19 @@ Lightweight chronology; full spec/plan/state per track is in the linked folder.

 *Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. **Now blocked by** test_infrastructure_hardening_20260609 (was: none).*

+#### Track: AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek) `[track-created: pending]`
+*Link: [./tracks/ai_loop_regressions_20260614/](./tracks/ai_loop_regressions_20260614/), Spec: [./tracks/ai_loop_regressions_20260614/spec.md](./tracks/ai_loop_regressions_20260614/spec.md), Plan: [./tracks/ai_loop_regressions_20260614/plan.md](./tracks/ai_loop_regressions_20260614/plan.md), Metadata: [./tracks/ai_loop_regressions_20260614/metadata.json](./tracks/ai_loop_regressions_20260614/metadata.json)*
+
+*Status: 2026-06-14 — Active, ready for Tier 2 implementation. User-blocking diagnostic track. 3 root causes identified via investigation; each gets its own phase with TDD tests + atomic fix commit.*
+
+*Goal: Diagnose and fix the user-blocking AI loop regressions for the 4 providers (MiniMax, Gemini, Gemini CLI, DeepSeek) most heavily touched by the `data_oriented_error_handling_20260606` track (shipped 2026-06-12) and the subsequent `ai client pass` commit `5030bd84` (2026-06-13, 503-line `src/ai_client.py` refactor). 3 distinct bugs: **Bug #1** (3 dead `except ai_client.ProviderError` clauses in `src/app_controller.py:305, 313, 3692`; the class was removed in commit `64b787b8`. Python evaluates the `except` expression only when a raised exception reaches the clause, so the missing attribute raises `AttributeError` at that point and the error path breaks). **Bug #2** (`_handle_request_event` calls the deprecated `ai_client.send()`, which now returns `""` on error; the empty string is queued as a `response` comms entry, but `_on_comms_entry` at line 3801 filters it out via `if text_content.strip():`, so no discussion entry is added. This is the "AI turns are not getting proper entries" symptom). **Bug #3** (`_send_minimax` uses `reasoning_extractor` to extract reasoning into `history[].reasoning_content`, but the returned `response_text` (and thus `Result.data`) does not include `<thinking>` tags. DeepSeek correctly wraps reasoning in `<thinking>` tags at lines 2117-2118, but MiniMax does not, so `parse_thinking_trace` finds no thinking blocks and no thinking segments are added to the discussion entry. This is the "thinking monologues no longer rendering" symptom).*
+
+*5 phases: Phase 1 (TDD red — 3 test files reproducing each bug), Phase 2 (fix Bug #2 in `src/app_controller.py:_handle_request_event` + live_gui regression test), Phase 3 (fix Bug #1 — replace 3 dead except clauses with `send_result()` pattern + AST scan verification), Phase 4 (fix Bug #3 — wrap MiniMax reasoning in `<thinking>` tags via a new `wrap_reasoning_in_text` kwarg in `run_with_tool_loop` + live_gui regression test), Phase 5 (full suite sweep + add 2 follow-up notes to `docs/guide_ai_client.md` "See Also" section). 17 tasks total, ~14 atomic commits, 1-2 days of Tier 2 work.*
+
+*Deferred to follow-up tracks (per user direction 2026-06-14): (1) Gemini / Gemini CLI thinking-format compatibility investigation (Bug #4) — the user's complaint includes Gemini but the format issue is plausibly a pre-existing limitation, not a new regression. (2) `<think>` (half-width) marker support in `thinking_parser.py` (Bug #5) — user screenshot showed `<think>...</think>` format which `parse_thinking_trace` doesn't currently match. Both documented in spec §13.1 and §13.2.*
+
+*`blocks: public_api_migration_20260606` (this track migrates 3 broken sites in `_handle_request_event` + 2 API endpoints; the public_api track picks up the remaining 5 production + 63 test call sites).*
+
 #### Track: MCP Architecture Refactor (Sub-MCP Extraction) `[track-created: 2720a894]`
 *Link: [./tracks/mcp_architecture_refactor_20260606/](./tracks/mcp_architecture_refactor_20260606/), Spec: [./tracks/mcp_architecture_refactor_20260606/spec.md](./tracks/mcp_architecture_refactor_20260606/spec.md), Plan: [./tracks/mcp_architecture_refactor_20260606/plan.md](./tracks/mcp_architecture_refactor_20260606/plan.md) (to be authored by writing-plans skill)*
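
Note on Bug #1 in the `ai_loop_regressions_20260614` entry added by this commit: the following is a minimal, self-contained sketch of why an `except` clause that names a removed attribute breaks the error path. The `ai_client` object is a hypothetical stand-in built with `types.SimpleNamespace`, and `call_provider` and `probe` are illustrative names, not code from `src/app_controller.py`. It assumes only standard Python semantics: the expression after `except` is evaluated lazily, when an exception actually reaches the clause.

```python
import types

# Hypothetical stand-in for the real `ai_client` module after
# `ProviderError` was removed: the attribute simply does not exist.
ai_client = types.SimpleNamespace()


def call_provider(fail: bool) -> str:
    """Mimic a call site that still names the removed exception class."""
    try:
        if fail:
            raise RuntimeError("provider request failed")
        return "ok"
    except ai_client.ProviderError as exc:  # looked up only when an exception arrives
        return f"handled: {exc}"


def probe(fail: bool) -> str:
    """Report what escapes call_provider, by exception type name."""
    try:
        return call_provider(fail)
    except Exception as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(probe(False))  # success path: the except expression is never evaluated
    print(probe(True))   # failure path: the attribute lookup itself raises
```

On the success path the dead clause is never evaluated, which is plausibly why such clauses go unnoticed until a provider call fails. On the failure path the caller sees an `AttributeError` from the lookup rather than the original error, so a static check such as the AST scan planned for Phase 3 can flag these sites without having to trigger a failure.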