Replaces deprecated ai_client.send(md_content='', user_message=user_message,
enable_tools=False) with ai_client.send_result(...) and branches on
result.ok. On error, logs the ui_message() and returns [] (the function
returns a list of track definitions or [] on failure).
The 3 tests in test_orchestrator_pm.py + 1 in test_orchestrator_pm_history.py
now fail because they mock src.ai_client.send. These will be fixed in
Phase 2.14-2.15 by mocking send_result instead.
Replaces deprecated ai_client.send(md_content='', user_message=user_message)
with ai_client.send_result(...) and branches on result.ok. On error, logs
the ui_message() and returns None (the function returns a list of ticket
definitions or None on failure).
The previous code called the @deprecated send() shim which silently
returns '' on error. The empty string would then be passed to json.loads,
causing JSONDecodeError and 3 retry attempts. The new code short-circuits
on the first error and returns None immediately.
This is the easiest of the 3 production migrations (2-arg call with no
callbacks). See plan.md Phase 1.1. Test fixes for the production-affected
mocks in test_conductor_tech_lead.py and test_orchestration_logic.py are
in Phase 2.12 and Phase 2.13.
NOTE: 4 tests now fail (3 in test_conductor_tech_lead.py + 1 in
test_orchestration_logic.py) because they mock src.ai_client.send.
These will be fixed in Phase 2.12/2.13 by mocking send_result instead.
The original Phase 2 covered 12 test files that *call* ai_client.send(...).
Phase 1.1 implementation revealed 7 additional test files that *mock*
ai_client.send (via patch()) for tests of the production code paths.
When production migrates to send_result(), these mocks receive 0 calls
and the tests fail with 'send was called 0 times'.
Adding Phase 2.12-2.18 to cover:
- test_conductor_tech_lead.py (3 mocks; breaks after Phase 1.1)
- test_orchestration_logic.py (1 mock; breaks after Phase 1.1)
- test_orchestrator_pm.py (3 mocks; pre-empt Phase 1.2)
- test_orchestrator_pm_history.py (1 mock; pre-empt Phase 1.2)
- test_phase6_engine.py (1 mock; pre-empt Phase 1.3)
- test_run_worker_lifecycle_abort.py (1 mock; pre-empt Phase 1.3)
- test_spawn_interception_v2.py (1 mock; pre-empt Phase 1.3)
test_rag_integration.py mock migration deferred to RAG track (OOS1).
Also adds state.toml for the track (7 phases, 28 tasks, audit fields).
In-depth handoff for Tier 1 review covering:
- Executive summary with TL;DR
- Goal & scope (planned vs delivered)
- Per-phase delivery summary
- Test coverage analysis (7 new + 2 adapted + 2 smoke)
- Deferred items documentation (3 cross-references)
- Pre-existing failures (14, verified not caused by this track)
- Plan deviations (6 items, with rationale)
- Post-ship risk register
- Commit inventory with diff stat
- 7 recommendations for the Tier 1 reviewer
- Handoff checklist
Working tree was clean before adding the report (no other changes to commit).
Updates status: active -> completed, adds completed_at date,
updates verification_criteria with the actual verification results.
7 regression tests pass; 14 pre-existing failures (parent track's
state.toml [regressions_20260612]) are not caused by these changes.
Adds 3 entries to the See Also section:
1. Gemini / Gemini CLI thinking-format compatibility (deferred from
ai_loop_regressions_20260614) - investigate empirically
2. <think> (half-width) marker support in thinking_parser (deferred)
3. Public API Result Migration (planned, separate track public_api_migration_20260606)
Each entry links to the corresponding spec section for traceability.
Mirrors the FR1 live_gui smoke test: the full end-to-end live_gui FR3
test would require mock injection into the live_gui subprocess. The
mock-based regression coverage for FR3 is already in
test_ai_loop_regressions_20260614.py::test_fr3_minimax_thinking_in_returned_text.
This smoke test verifies the disc_entries field is exposed via the
Hook API, establishing the integration substrate for follow-up work.
Adds a new wrap_reasoning_in_text: bool = False keyword argument to
run_with_tool_loop. When True and reasoning_content is non-empty, the
returned text is prepended with <thinking>...</thinking> tags so
thinking_parser.parse_thinking_trace can extract a ThinkingSegment
for the discussion entry.
The wrap is conditional (default False) so it doesn't break providers
that already wrap inline (e.g. DeepSeek, which wraps at line 2117-2118
before run_with_tool_loop sees the response).
_send_minimax now passes wrap_reasoning_in_text=bool(caps.reasoning).
When caps.reasoning is True (M2.5/M2.7), the reasoning is wrapped in
<thinking> tags. When False (M2/M2.1), the parameter is False and
no wrap happens (avoids useless getattr on non-reasoning models).
Also fixes a bug in the test_fr3_minimax_thinking_in_returned_text
test mock: it was returning a raw MagicMock instead of a Result
object, which caused the test to see auto-created MagicMock attributes
instead of the expected text. Now wraps in Result(data=MagicMock(...))
and sets ai_client._model to ensure get_capabilities('minimax', _model)
resolves to the M2.7 capabilities (reasoning=True).
Replaces 3 dead 'except ai_client.ProviderError' clauses (the class was
removed in commit 64b787b8) with the new send_result() + result.ok
pattern. Removes the inner try/except block entirely (replaced by
'if not result.ok: raise HTTPException(502, ...)').
Sites fixed:
- _api_generate: send() -> send_result() + result.ok branch
- _handle_request_event (already fixed in FR1 commit 24ba2499)
AST scan via test_fr2_no_provider_error_in_source now passes: zero
remaining references to ai_client.ProviderError in src/app_controller.py.
The single remaining 'except Exception as e: import traceback;
traceback.print_exc(); raise HTTPException(500, str(e))' is the
legitimate outer except for unexpected in-flight errors.
Added a one-line comment per the plan referencing the data-oriented
error handling styleguide, so future migrations follow the same pattern.
The full end-to-end live_gui FR1 test would require mock injection into
the live_gui subprocess (patches in the test process do NOT propagate).
The mock-based regression coverage for FR1 is already in:
- tests/test_live_gui_integration_v2.py::test_user_request_error_handling
(full controller flow with mock_app fixture)
- tests/test_ai_loop_regressions_20260614.py::test_fr1_*
(unit-level)
This smoke test verifies the live_gui's ai_status field is reachable via
the Hook API, establishing the integration substrate exists for
follow-up work to add subprocess mock injection.
The 2 tests in test_live_gui_integration_v2.py were mocking the old
ai_client.send() and asserting on the old error format. The FR1 fix
migrated _handle_request_event to ai_client.send_result() and routes
errors via ErrorInfo.ui_message() instead of f'ERROR: {e}'.
Updated:
- test_user_request_integration_flow: mock send_result instead of send
- test_user_request_error_handling: mock send_result returning an error
Result; assert new error format (just the message, no 'ERROR:' prefix)
Per AGENTS.md 'do not skip tests just because they fail' -- adapted
the tests to test the new (correct) behavior, not skipped or simplified.
Replaces deprecated ai_client.send() in _handle_request_event with
send_result() and branches on result.ok. On error, the first ErrorInfo
is routed to the event_queue as a 'response' with status='error',
allowing _on_comms_entry to add it to the discussion history.
The previous code called the @deprecated send() shim which silently
returns '' on error. The empty string was then filtered out by
_on_comms_entry (text_content.strip() check at line 3801), so users
saw no discussion entry for failed AI requests.
This also removes the dead 'except ai_client.ProviderError' clause at
line 3692 (the class was removed in commit 64b787b8). The 2 remaining
dead clauses at lines 305, 313 are fixed in the next commit (FR2).