3 empty-default sites per Tier 1 directive (NOT heuristic — empty default
is NOT a drain per error_handling.md:528-531):
1. L394 set_provider (minimax branch): added _set_minimax_provider_result helper.
The helper returns Result[list[str], ErrorInfo] with structured errors.
Legacy set_provider delegates to the helper; falls back to empty key on
failure (preserving original behavior).
2. L716+L723 _execute_tool_calls_concurrently (deepseek + minimax):
added _parse_tool_args_result helper that returns Result[dict, ErrorInfo].
The for-loop accumulates per-call errors into a local file_errors list.
3. L994 _reread_file_items: added _reread_file_items_result helper that
returns Result[tuple, ErrorInfo]. Per TIER1_REVIEW, caller does NOT
check err_item["error"] flag (verified by reading _build_file_diff_text
and the 4 callers), so this site needed full migration (NOT heuristic).
Legacy function delegates to the helper and logs errors to stderr
(operator-visible drain).
All 4 originally-UNCLEAR sites are now compliant:
L332, L355: BOUNDARY_CONVERSION (via existing creates_errorinfo check)
L394, L716, L723, L994: COMPLIANT (via Result-returning migration)
Audit: ai_client UNCLEAR 6 -> 0. Total: 19 INTERNAL_COMPLIANT.
Tests: 51 pass (28 baseline + 16 audit heuristics + 5 ai_client + 2 async_tools).
Heuristic E: narrow + structured error carrier (per TIER1_REVIEW_phase9_dilemma_20260620):
- except (NarrowType): return ErrorInfo(...) -> INTERNAL_COMPLIANT
- except (NarrowType): <item>["error"] = True -> INTERNAL_COMPLIANT
Distinguishes from the empty-default pattern (args = {}, body = ...) which
is explicitly NOT a drain per error_handling.md:528-531.
Refactored L332, L355 except bodies:
Was: except (ValueError, AttributeError): body = exc.response.text
Now: except (ValueError, AttributeError) as e: return ErrorInfo(...)
The function still returns ErrorInfo either way. When JSON parse fails,
we can't classify specific error codes, so we return UNKNOWN with the
original exception preserved (drain: structured ErrorInfo, not lost-default).
Added 2 helper methods:
_has_errorinfo_return(stmts) -> bool
_has_dict_error_true_assign(stmts) -> bool
Tests: 41 pass (28 baseline + 13 audit heuristics including the original 8).
Audit: ai_client UNCLEAR 6 -> 4 (L332+L355 now BOUNDARY_CONVERSION).
Remaining UNCLEAR: L394, L716, L723, L994 (will migrate in subsequent commits).
Was: except Exception as e (broad)
Now: except (OSError, UnicodeDecodeError) as e
The err_item drain (returned via the refreshed list with error: True flag)
is preserved. Only specific file I/O errors are caught now.
Both deepseek and minimax branches in the tool call dispatcher had:
try: args = json.loads(tool_args_str)
except: args = {}
json.JSONDecodeError is a subclass of ValueError, so narrowed to:
except (ValueError, TypeError): args = {}
This satisfies the BC classification (specific exception types).
Narrowed 3 INTERNAL_BROAD_CATCH sites to specific exception types:
1. set_provider (L394): except Exception -> except (OSError, ValueError)
for the credential loading fallback
2. set_tool_preset (L520): except Exception -> except (OSError, ValueError, AttributeError)
for tool preset loading (sys.stderr.write + flush preserved)
3. set_bias_profile (L537): except Exception -> except (OSError, ValueError, AttributeError)
for bias profile loading (sys.stderr.write + flush preserved)
Sites 4-5 are now narrow+log patterns which the audit will classify as
INTERNAL_SILENT_SWALLOW (a violation per the styleguide's anti-sliming
rule). They will be addressed in Phase 11 (silent-swallow cleanup).
The bare 'except:' in _classify_deepseek_error (L332) and _classify_minimax_error (L355)
was classified as INTERNAL_BROAD_CATCH. Narrowed to 'except (ValueError, AttributeError)'
since the only realistic exceptions from exc.response.json() are JSONDecodeError (subclass of ValueError)
and AttributeError (if exc.response is None or .json() is missing).
The TDD red moment. The implementation is renamed but the call sites
in src/, tests/, and docs still use send_result. Subsequent commits
rename the call sites and progressively move the test suite back to
green.
10 references renamed in src/ai_client.py:
- 4 'Called by: send_result' docstring tags in private provider helpers
- 1 function definition (def send_result -> def send)
- 1 [C: ...] SDM tag referencing test function names
- 2 monitor component names (start_component / end_component)
- 2 error source strings (CONFIG + INTERNAL)
Also adds scripts/tier2/apply_t1_1_edits.py - the helper script that
applied the 10 edits. Kept in scripts/tier2/ as a record of the
mechanical change pattern.
Refs: conductor/tracks/send_result_to_send_20260616/
Removes the @deprecated send() function (was at src/ai_client.py:2939-3000)
and the from typing_extensions import deprecated import (line 38). The
function is replaced by send_result() which has been the canonical public
API since the data_oriented_error_handling_20260606 track (commit 9f86b2be).
All 3 production call sites (src/conductor_tech_lead.py:68,
src/orchestrator_pm.py:86, src/multi_agent_conductor.py:591) and 18 test
files were migrated in Phases 1-2; 4 pre-existing failures were fixed in
Phases 3-4. No remaining callers of ai_client.send(.
Verification:
- uv run rg 'def send\\(' src/ai_client.py returns 0 hits
- import src.ai_client; hasattr(ai, 'send') is False
- 73/73 migrated tests pass
Adds a new wrap_reasoning_in_text: bool = False keyword argument to
run_with_tool_loop. When True and reasoning_content is non-empty, the
returned text is prepended with <thinking>...</thinking> tags so
thinking_parser.parse_thinking_trace can extract a ThinkingSegment
for the discussion entry.
The wrap is conditional (default False) so it doesn't break providers
that already wrap inline (e.g. DeepSeek, which wraps at line 2117-2118
before run_with_tool_loop sees the response).
_send_minimax now passes wrap_reasoning_in_text=bool(caps.reasoning).
When caps.reasoning is True (M2.5/M2.7), the reasoning is wrapped in
<thinking> tags. When False (M2/M2.1), the parameter is False and
no wrap happens (avoids useless getattr on non-reasoning models).
Also fixes a bug in the test_fr3_minimax_thinking_in_returned_text
test mock: it was returning a raw MagicMock instead of a Result
object, which caused the test to see auto-created MagicMock attributes
instead of the expected text. Now wraps in Result(data=MagicMock(...))
and sets ai_client._model to ensure get_capabilities('minimax', _model)
resolves to the M2.7 capabilities (reasoning=True).
This resolves the 401 Unauthorized/invalid api_id error by letting the MiniMax client default to api.minimax.io/v1 (like the model listing logic) or read a custom base_url from credentials.toml.
This resolves the issue where calling 'send_openai_compatible' discarded the NormalizedResponse details, resulting in an AttributeError when accessing 'raw_response' inside the tool loop.
The 6 error-classifier functions in ai_client.py, openai_compatible.py,
and qwen_adapter.py now return ErrorInfo (data-oriented) instead of
ProviderError. Each takes a source: str parameter for telemetry
provenance. ProviderError class is still used in production code paths
(Task 3.4) and will be removed in Task 3.7.
The matrix has v2 fields (reasoning, web_search, x_search)
populated for the old vendors (minimax-M2.5/M2.7, grok-*),
but the send functions didn't consult them. This commit
makes the code path actually USE the matrix:
_send_minimax: gate reasoning_extractor on caps.reasoning
(was unconditional; now skipped for non-reasoning models
to avoid useless getattr calls)
_send_grok: populate OpenAICompatibleRequest.extra_body with
search_parameters when caps.web_search or caps.x_search is
True. caps.web_search -> {mode: auto}; caps.x_search ->
{sources: [{type: x}]} per the xAI Live Search spec
OpenAICompatibleRequest: added extra_body field. Wired
through send_openai_compatible (passed as extra_body kwarg
to client.chat.completions.create).
Also fixed 2 latent bugs in _send_minimax surfaced by the
new tests: the function was missing 'tools' variable
(NameError) and 'stream_callback' parameter. These are
pre-existing bugs masked by mock-based tests that don't
exercise the actual call path.
Also cancelled t5_6/7/8 (the invented 'deferred tool-loop
conversion' work). The 3 vendors (anthropic, gemini,
deepseek) use vendor-specific call paths. Their inline
loops are NOT defects. The '3-5 days' / '1-2 weeks'
estimates were made up by the agent. The audit script's
DEFERRED_VENDORS exclusion is permanent.
Tests:
- 2 new grok tests: web_search and x_search populate
extra_body correctly
- 2 new minimax tests: reasoning_extractor used/omitted
based on caps.reasoning
- 122/122 vendor+tool+provider+import-isolation tests pass
(no regressions; +4 new tests this commit)
- 3 audit scripts pass
When _llama_base_url is localhost/127.0.0.1, _send_llama now
calls _send_llama_native (the native /api/chat adapter)
instead of the OpenAI-compat path. The native adapter
supports Ollama's vendor-specific fields: think, images,
thinking.
Functions added (in src/ai_client.py, per the naming
convention HARD RULE on no new src/*.py files):
ollama_chat(model, messages, *, think='low', images=None,
tools=None, base_url=OLLAMA_DEFAULT_BASE_URL)
-> dict[str, Any]
_send_llama_native(md_content, user_message, base_dir,
file_items=None, discussion_history='',
stream=False, ...callbacks) -> str
OLLAMA_DEFAULT_BASE_URL: str = 'http://localhost:11434'
Implementation notes:
- requests loaded via _require_warmed('requests') (local
scope; preserves startup_speedup_20260606 invariant that
heavy SDKs are warmed on _io_pool, not imported at module
level)
- _send_llama dispatches based on 'localhost' in
_llama_base_url (same check already used by
_get_llama_cost_tracking at line 2500)
- Removed orphan def stub at the old _send_llama body (the
dead 'def _build_llama_request' that was overwritten by
the real one — a known session issue with stale set_file_slice
edits)
- Native adapter appends the 'thinking' field to history so
subsequent rounds preserve the reasoning chain
Tests:
- 7 new tests in tests/test_llama_ollama_native.py:
* ollama_chat hits /api/chat (not /v1/chat/completions)
* ollama_chat includes 'think' param in payload
* ollama_chat includes 'images' in payload
* _send_llama_native wraps ollama_chat
* _send_llama_native preserves 'thinking' field
* _send_llama routes localhost to native (no openai client)
* _send_llama keeps openai path for non-local (no POST)
- Updated test_send_llama_ollama_backend in test_llama_provider.py
to mock the native path (was: mocked openai-compat; now:
mocked requests.post)
- 103/103 vendor+tool+provider+import-isolation tests pass
(no regressions; +7 new tests this commit)
- 4 audit scripts pass
Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track.
PROVIDERS now lives in src/ai_client.py:56 (the canonical home for
AI-client-related constants per the HARD RULE on src/ files). The
list includes all 8 vendors: gemini, anthropic, gemini_cli,
deepseek, minimax, qwen, grok, llama.
Backward compat: src/models.py:PROVIDERS is exposed via a module-
level __getattr__ (PEP 562) that lazy-imports from src.ai_client.
The lazy approach was needed because src.ai_client imports
ToolPreset/BiasProfile/Tool from src.models at line 50, so a
top-level 'from src.ai_client import PROVIDERS' in models.py
would deadlock. Adding a branch to the existing __getattr__
in models.py (which also handles pydantic class factories) is
the surgical fix.
tests/test_provider_curation.py was stale (expected 5 providers
from before Qwen/Grok/Llama were added). Updated to 8.
New test: tests/test_providers_source_of_truth.py asserts:
- src.ai_client.PROVIDERS exists and matches the 8-provider list
- src.models.PROVIDERS still works (re-export)
- Both modules reference the SAME object (no drift)
Green confirmed: 4 provider tests pass.
The follow-up track's tool-loop refactor moved
'from src.openai_compatible import send_openai_compatible,
OpenAICompatibleRequest, NormalizedResponse' to MODULE level
in src/ai_client.py. This violates the startup_speedup_20260606
invariant: heavy SDKs must not be loaded at module level because
ai_client.py is on the main thread's import chain.
src/openai_compatible.py line 5 does 'from openai import
OpenAIError, ...', so any import from it triggers the openai SDK
to load. test_ai_client_does_not_import_openai_at_module_level
guards this invariant and was failing.
Fix: move the imports back to local scope inside the function
bodies that need them:
- _default_send closure inside run_with_tool_loop
(imports send_openai_compatible)
- _send_grok (imports OpenAICompatibleRequest)
- _send_minimax (imports OpenAICompatibleRequest)
- _send_llama (imports OpenAICompatibleRequest)
- _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse)
Test patches: tests that previously patched
'src.ai_client.send_openai_compatible' now patch
'src.openai_compatible.send_openai_compatible' (the actual
import source). _execute_tool_calls_concurrently patches
unchanged (it's defined in src/ai_client.py itself).
Green confirmed: 62 vendor + tool + import-isolation tests
pass. 0 regressions.
Task 1.7 of the follow-up track. Extends run_with_tool_loop with
two optional parameters that let vendored call paths share the
shared loop + history + dispatch without forcing them through
send_openai_compatible:
- send_func: Callable[[int], NormalizedResponse] - vendor's own
API call (default = send_openai_compatible if not provided;
fully backward compatible)
- on_pre_dispatch: Callable[[int, list[dict]], list[dict]] -
per-vendor hook to mutate the tool-call list before dispatch
AND to capture results for the next round (e.g. Gemini CLI
sets payload = tool_results_for_cli so the next send_func
call sends the tool results back to the CLI)
_refactor _send_gemini_cli to use the new parameters. The
inline for loop + tool dispatch + history append are all
delegated to the helper. The vendor's send_func closure
handles:
- adapter.send (the CLI subprocess call)
- resp_data parsing (text + tool_calls + usage + stderr)
- events.emit for request_start + response_received
- _append_comms for IN/OUT comms logging
- The 'txt + calls -> history_add' special case
The vendor's on_pre_dispatch closure handles:
- _execute_tool_calls_concurrently (re-invoked here because
the helper's call passes raw tool_calls but the vendor
needs to mutate payload AND log results)
- _reread_file_items + _build_file_diff_text (file diff
re-read at last tool result)
- MAX_ROUNDS system message
- _truncate_tool_output
- _MAX_TOOL_OUTPUT_BYTES budget warning
- Payload mutation for the next round
Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI
+ 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax +
2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.
Task 1.6 of the follow-up track. _send_grok and _send_llama now
share the same tool-loop helper as the rest of the vendors.
Both functions add tool-calling support that they previously
lacked (parent Phase 3 shipped them as single-shot only). The
plan's Task 1.6 title says 'add missing loop' which matches
this scope. tool_choice='auto' if tools else 'auto' matches
the MiniMax pattern.
Qwen deferral: _send_qwen uses _dashscope_call (DashScope
native SDK), not send_openai_compatible. run_with_tool_loop
hard-codes send_openai_compatible. Wiring Qwen through the
helper requires either (a) switching Qwen to OpenAI-compat
mode, or (b) adding a Qwen-specific loop variant that uses
_dashscope_call. Both are non-trivial and out of scope for
Task 1.6. Tracked as a follow-up note in the state.toml.
Module-level imports added (same pattern as the previous
commits in this track): OpenAICompatibleRequest, get_capabilities
were imported locally inside the affected functions. Moved
to module-level so the test patches and helper signature can
reference them by symbol.
Green confirmed: 51 vendor + tool tests pass.
Task 1.3 of the follow-up track. _send_minimax now uses
run_with_tool_loop with a per-round request_builder callback
that re-reads _minimax_history under _minimax_history_lock.
The plan's Task 1.3 example builds the request once before the
loop. That would break MiniMax tool flows because the API
would not see the tool results appended to _minimax_history
on later rounds. The fix: extend run_with_tool_loop's 2nd arg
to accept Union[OpenAICompatibleRequest, Callable[[int],
OpenAICompatibleRequest]] (backward compatible; static-request
vendors pass a single request). MiniMax now passes a closure
that rebuilds messages from history each round.
Reasoning extraction: MiniMax exposes its chain-of-thought via
response.raw_response.choices[0].message.reasoning_details[0].
get('text'). Lifted to a _extract_minimax_reasoning callback
passed as reasoning_extractor=... (the new parameter added
in the previous commit).
Trim callback: wraps _trim_minimax_history so it can be called
from run_with_tool_loop after each tool-result append.
Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5
tool_loop core + 1 tool_loop builder + 39 others); the new
test_ai_client_tool_loop_builder.py locks in the per-round
builder contract.
Tasks 1.1 (red) + 1.2 (green) of the follow-up track. Adds a single
shared tool-call loop in src/ai_client.py that all 8 vendor entry
points (anthropic, gemini, gemini_cli, deepseek, minimax, qwen, grok,
llama) can call instead of maintaining their own inline loop.
Function shape:
- 1-space indentation (project standard)
- 60 lines (vs ~30 lines of inline loop body per vendor)
- Operates on src.openai_compatible.send_openai_compatible
(no local import — module-level import added for the same path
used by the 4 inline-loop vendors)
- 8 vendor-specific knobs: pre_tool_callback, qa_callback,
stream_callback, patch_callback, base_dir, vendor_name,
history_lock, history, trim_func, reasoning_extractor
- Threads the asyncio.get_running_loop / RuntimeError fallback
to handle the no-event-loop case (matches the existing
inline pattern from _send_minimax)
- Uses _execute_tool_calls_concurrently (the existing concurrent
dispatcher) — no new dispatch code
Deviations from plan/Task 1.1:
- The plan's test code patched src.tool_loop.send_openai_compatible
and the plan's Task 1.3 vendor wrapper imported 'from
src.tool_loop import run_with_tool_loop'. The plan predates the
AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up
track's Naming Convention section, run_with_tool_loop lives IN
src/ai_client.py. Tests patch src.ai_client.send_openai_compatible
and the vendor wrapper imports 'from src.ai_client import
run_with_tool_loop' (next task).
- Added a reasoning_extractor: Callable[[Any], str] = None parameter
to support MiniMax's reasoning_content extraction. Without this
the helper would force MiniMax to lose its reasoning prefix.
Green confirmed: 50 vendor + tool tests pass; 4 audit scripts pass.
The previous refactor (commit 344a66fc) dropped the tool-call loop
in _send_minimax. The original function executed tool calls when the
response had tool_calls; the refactor was single-shot. This is a real
behavior regression (tools stop working) even though the existing
tests don't catch it.
Restore the tool loop:
- For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible
with tools=_get_deepseek_tools() and tool_choice='auto'
- If response has tool_calls: dispatch each via
_execute_tool_calls_concurrently (handles both async context and
sync via run_coroutine_threadsafe / asyncio.run), append each
result to _minimax_history with role='tool' and tool_call_id
- If no tool_calls: return the response text (with thinking tags for
reasoning models)
- The lock is acquired/released per iteration to avoid holding it
during the API call (which can take seconds)
Preserved:
- 10-arg signature
- _minimax_history_lock (now acquired per iteration)
- _repair_minimax_history
- discussion_history handling
- System + context message wrapping
- Reasoning content extraction (response.raw_response.choices[0].message
.reasoning_details[0].get('text', ''))
- <thinking> tags wrap on the final response
Dropped (still):
- extra_body={reasoning_split: True} (not supported by send_openai_compatible;
would be a Phase 5 adapter addition if minimax-reasoner models need it)
New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor).
Net effect: 231 -> 75 = 68% reduction; tool loop preserved.
Verification: 38/38 tests pass (no regressions).