Keeps the ASCII layout map previews, baseline summaries, and state mutation blocks, while cleanly removing Threading & Safety sections and replacing DAG references with SSDL Shape notations.
Add SQLite-style inline docstrings to render_ai_settings_hub, render_agent_tools_panel, and render_diagnostics_panel under simplified granularity per user request. Mark track sqlite_docs_gui_2_20260612 as complete.
The 6 error-classifier functions in ai_client.py, openai_compatible.py,
and qwen_adapter.py now return ErrorInfo (data-oriented) instead of
ProviderError. Each takes a source: str parameter for telemetry
provenance. ProviderError class is still used in production code paths
(Task 3.4) and will be removed in Task 3.7.
Strictly additive: existing _resolve_and_check, read_file, list_directory,
and search_files are unchanged. The new variants return Result[Path] or
Result[str] using the data-oriented ErrorInfo/ErrorKind convention.
The matrix has v2 fields (reasoning, web_search, x_search)
populated for the old vendors (minimax-M2.5/M2.7, grok-*),
but the send functions didn't consult them. This commit
makes the code path actually USE the matrix:
_send_minimax: gate reasoning_extractor on caps.reasoning
(was unconditional; now skipped for non-reasoning models
to avoid useless getattr calls)
_send_grok: populate OpenAICompatibleRequest.extra_body with
search_parameters when caps.web_search or caps.x_search is
True. caps.web_search -> {mode: auto}; caps.x_search ->
{sources: [{type: x}]} per the xAI Live Search spec
OpenAICompatibleRequest: added extra_body field. Wired
through send_openai_compatible (passed as extra_body kwarg
to client.chat.completions.create).
Also fixed 2 latent bugs in _send_minimax surfaced by the
new tests: the function was missing 'tools' variable
(NameError) and 'stream_callback' parameter. These are
pre-existing bugs masked by mock-based tests that don't
exercise the actual call path.
Also cancelled t5_6/7/8 (the invented 'deferred tool-loop
conversion' work). The 3 vendors (anthropic, gemini,
deepseek) use vendor-specific call paths. Their inline
loops are NOT defects. The '3-5 days' / '1-2 weeks'
estimates were made up by the agent. The audit script's
DEFERRED_VENDORS exclusion is permanent.
Tests:
- 2 new grok tests: web_search and x_search populate
extra_body correctly
- 2 new minimax tests: reasoning_extractor used/omitted
based on caps.reasoning
- 122/122 vendor+tool+provider+import-isolation tests pass
(no regressions; +4 new tests this commit)
- 3 audit scripts pass
Phase 5 t5_4 (UI adaptations for 11 v2 fields): the simplest
honest adaptation — render small colored badges for the 11
v2 fields where the active vendor+model supports them. Each
badge has a tooltip showing the field name.
The 11 fields:
reasoning, structured_output, code_execution, web_search,
x_search, file_search, mcp_support, audio, video,
grounding, computer_use
A new module-level function _render_v2_capability_badges(caps)
is added to src/gui_2.py (per the HARD RULE on no new
src/<thing>.py files). It's called from render_provider_panel
right after the existing '[Local]' badge (which uses the
runtime override for caps.local).
What this is NOT: a full UI for the 11 fields (per-field
toggles, panels, attachment buttons). Those are design-heavy
work and need their own track. This change gives the user
visibility into which capabilities the active vendor+model
supports, so they can make informed decisions about which
prompts/features to use.
For example, when the user selects qwen-audio, they'll see:
Provider: qwen [Local] Capabilities [Audio]
Which makes it obvious they can attach audio files.
Tests:
- 2 new tests in tests/test_vendor_capabilities.py:
* All 11 v2 fields are present in the helper (drift guard)
* Helper is a no-op on empty caps (no fields True)
- 118/118 vendor+tool+provider+import-isolation tests pass
(no regressions; +2 new tests this commit)
- 3 audit scripts pass
Updates per-model registry entries to populate the 12 v2
fields where the capability is genuinely supported:
minimax-M2.5/M2.7: reasoning=True (uses reasoning_details)
grok-2-vision: web_search=True, x_search=True (Live Search)
grok-2: web_search=True, x_search=True
grok-beta: web_search=True, x_search=True
llama-3.1-405b: reasoning=True (explicitly in model name)
qwen-long: caching=True (custom long-context chunking)
qwen-audio: audio=True (was 'deferred' in v1 notes)
Adds the runtime override helper:
_apply_runtime_caps_override(app, caps)
-> caps with local=True if app.current_provider=='llama'
AND _llama_base_url contains 'localhost' or '127.0.0.1'
The 'local' flag is the only v2 field that is runtime-state,
not a static per-model property (OpenRouter llama is cloud;
Ollama llama is local — same model name, different backend).
The override uses dataclasses.replace() to mutate the
frozen dataclass. Implemented in src/gui_2.py (per the
HARD RULE on no new src/*.py files).
The override is wired into App._get_active_capabilities()
so the GUI sees caps.local=True when the active backend
is Ollama and caps.local=False otherwise.
Also: cost panel in src/gui_2.py (per-tier + session-total
columns) now renders 'Free (local)' when caps.local=True
(both the per-tier cost column and the session-total line).
This is t3_7 (moved from Phase 3 per the user's request;
naturally belongs after t4_1 which adds caps.local).
Tests:
- 3 new tests in tests/test_vendor_capabilities.py:
* per-model population (reasoning, audio, caching, vision)
* runtime override for llama+localhost
* runtime override does NOT touch other vendors
- 107/107 vendor+tool+provider+import-isolation tests pass
(no regressions; +4 new tests this commit)
- 3 audit scripts pass
When the active vendor+model has caps.local=True (per the
v2 capability matrix), the provider panel now shows a green
' [Local]' badge next to the provider combo. The tooltip
shows the Ollama base URL (when the active provider is
llama; otherwise the bare 'Local backend' tooltip).
Implements t4_4 of qwen_llama_grok_followup_20260611
Phase 4. Future use: Phase 4 t3_7 (moved from Phase 3)
will use caps.local to render 'Free (local)' in the cost
column.
The badge uses theme.get_color('status_success') (same
green used by C_IN / C_NUM / other 'success' indicators).
Renders inside the existing render_provider_panel function
at src/gui_2.py:2308.
Verification:
- import src.gui_2 OK (no syntax errors)
- 44/44 vendor+capability+provider tests pass (no regressions)
- 4 audit scripts pass
When _llama_base_url is localhost/127.0.0.1, _send_llama now
calls _send_llama_native (the native /api/chat adapter)
instead of the OpenAI-compat path. The native adapter
supports Ollama's vendor-specific fields: think, images,
thinking.
Functions added (in src/ai_client.py, per the naming
convention HARD RULE on no new src/*.py files):
ollama_chat(model, messages, *, think='low', images=None,
tools=None, base_url=OLLAMA_DEFAULT_BASE_URL)
-> dict[str, Any]
_send_llama_native(md_content, user_message, base_dir,
file_items=None, discussion_history='',
stream=False, ...callbacks) -> str
OLLAMA_DEFAULT_BASE_URL: str = 'http://localhost:11434'
Implementation notes:
- requests loaded via _require_warmed('requests') (local
scope; preserves startup_speedup_20260606 invariant that
heavy SDKs are warmed on _io_pool, not imported at module
level)
- _send_llama dispatches based on 'localhost' in
_llama_base_url (same check already used by
_get_llama_cost_tracking at line 2500)
- Removed orphan def stub at the old _send_llama body (the
dead 'def _build_llama_request' that was overwritten by
the real one — a known session issue with stale set_file_slice
edits)
- Native adapter appends the 'thinking' field to history so
subsequent rounds preserve the reasoning chain
Tests:
- 7 new tests in tests/test_llama_ollama_native.py:
* ollama_chat hits /api/chat (not /v1/chat/completions)
* ollama_chat includes 'think' param in payload
* ollama_chat includes 'images' in payload
* _send_llama_native wraps ollama_chat
* _send_llama_native preserves 'thinking' field
* _send_llama routes localhost to native (no openai client)
* _send_llama keeps openai path for non-local (no POST)
- Updated test_send_llama_ollama_backend in test_llama_provider.py
to mock the native path (was: mocked openai-compat; now:
mocked requests.post)
- 103/103 vendor+tool+provider+import-isolation tests pass
(no regressions; +7 new tests this commit)
- 4 audit scripts pass
Task t3.3 (stream progress) + t3.4 (fetch models) of the follow-up
track's Phase 3. These were originally deferred in commit
26becf2b; both fit in this session after the side-track report
was written.
t3.3 (stream progress):
- _on_ai_stream now also sets self._ai_status = 'streaming...'
when caps.streaming is True (or vendor un-registered)
- The 3 'done' / 'error' event dispatches in _handle_generate_send
reset self._ai_status accordingly so the status bar doesn't
get stuck on 'streaming...'
- The 'streaming...' text is already rendered in the post-FX
status bar via theme.render_post_fx in gui_2.py:1030
(ai_status field), so no GUI changes needed
- Local import of get_capabilities inside _on_ai_stream to
avoid loading vendor_capabilities at module level (heavy SDK
isolation invariant from startup_speedup_20260606)
t3.4 (fetch models iff model_discovery):
- Line 1860 (_init_ai_and_hooks / _refresh_from_project):
_fetch_models call is now gated on caps.model_discovery.
If False, all_available_models stays empty (no network call).
- Same pattern applied at the other 2 call sites
(start_warmup line 2284, current_provider setter line 2429).
The edits were applied (tests pass) but the line numbers in the
original audit had drifted; the gating is now in all 3 sites
with the same try/except pattern.
Test results: 53 tests pass (Minimax + Grok + Llama + DeepSeek + Gemini
CLI + tool_loop + openai import + audit scripts).
t3.7 ('Free local' for localhost) remains DEFERRED: requires the
caps.local field (Phase 4 t4.1). Documented in deferred_work
section of state.toml.
Phase 3 of the follow-up track. Applies the _get_active_capabilities()
pattern (established in parent Phase 5 adaptation #1: Screenshot
button iff caps.vision) to 4 more UI elements.
Adaptations applied:
- #2 Tools toggle: 'Active Tool Presets & Biases' panel
(line 2224) is now hidden + shows '(tools not supported
by X/Y)' hint when caps.tool_calling is False
- #3 Cache panel: 'Cache Usage' display (line 1911) now shows
'Cache Usage: N/A (not supported by X/Y)' when caps.caching
is False
- #6 Token budget max: the max_tokens slider (line 2327) now
caps at caps.context_window (was hardcoded 32768)
- #9 Cost display '-': the per-tier cost column (line 1890) +
session total (line 1894) now show '-' instead of '\.0000'
when caps.cost_tracking is False
Adaptations deferred (not in this commit):
- #4 Stream progress iff streaming: needs a NEW 'streaming...'
UI element; the codebase has no existing widget to gate.
Recommend adding a small spinner in the status bar during
active streams, gated on caps.streaming.
- #5 Fetch models iff model_discovery: do_fetch is in
app_controller.py, not gui_2.py. The 'Refresh models'
button on the provider combo could be gated here.
- #7 Cost panel: estimate: ALREADY DONE. The cost column
shows \ (Phase 0 of the follow-up inherited this
from parent Phase 5; adaptation #7 is effectively completed).
- #8 Cost panel: 'Free (local)' for localhost: requires the
caps.local field (Phase 4 t4_1). Deferred.
Side note: a secondary cost display in render_mma_usage_section
(line 5382) is unchanged; it's a 1-line function that would
require restructuring to gate. Deferred.
The 4 applied adaptations cover the patterns where the
capability matrix maps directly to an existing UI element
that can be wrapped. The 4 deferred ones require either
new UI (#4, #5) or new capability matrix fields (#8, with
Phase 4 prerequisite).
No tests broken; no imports added.
Phase 2 tasks 2.3 (update 4 import sites) + 2.4 (audit script).
The 4 call sites in src/app_controller.py:3093 and src/gui_2.py
{2293, 2849, 5377} were using models.PROVIDERS (which still
works via the __getattr__ re-export added in the previous
commit). Updated them to use ai_client.PROVIDERS directly:
- Models.PROVIDERS goes through the lazy __getattr__ every call
(small per-call cost)
- ai_client.PROVIDERS is a direct module-level lookup
Both files already had 'from src import ai_client' at the top,
so no new imports were needed.
scripts/audit_providers_source_of_truth.py enforces the
invariant: PROVIDERS is declared as a literal only in
src/ai_client.py. Catches accidental declarations creeping
back into src/models.py or other modules. Catches the
literal pattern 'PROVIDERS: List[str] = [' specifically,
which the __getattr__ re-export in src/models.py does not
match (it's 'from src.ai_client import PROVIDERS').
All 5 audit scripts pass:
- audit_main_thread_imports.py
- audit_weak_types.py
- audit_no_models_config_io.py
- audit_no_inline_tool_loops.py
- audit_providers_source_of_truth.py (new)
63 vendor + tool + provider + import-isolation tests pass.
Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track.
PROVIDERS now lives in src/ai_client.py:56 (the canonical home for
AI-client-related constants per the HARD RULE on src/ files). The
list includes all 8 vendors: gemini, anthropic, gemini_cli,
deepseek, minimax, qwen, grok, llama.
Backward compat: src/models.py:PROVIDERS is exposed via a module-
level __getattr__ (PEP 562) that lazy-imports from src.ai_client.
The lazy approach was needed because src.ai_client imports
ToolPreset/BiasProfile/Tool from src.models at line 50, so a
top-level 'from src.ai_client import PROVIDERS' in models.py
would deadlock. Adding a branch to the existing __getattr__
in models.py (which also handles pydantic class factories) is
the surgical fix.
tests/test_provider_curation.py was stale (expected 5 providers
from before Qwen/Grok/Llama were added). Updated to 8.
New test: tests/test_providers_source_of_truth.py asserts:
- src.ai_client.PROVIDERS exists and matches the 8-provider list
- src.models.PROVIDERS still works (re-export)
- Both modules reference the SAME object (no drift)
Green confirmed: 4 provider tests pass.
The follow-up track's tool-loop refactor moved
'from src.openai_compatible import send_openai_compatible,
OpenAICompatibleRequest, NormalizedResponse' to MODULE level
in src/ai_client.py. This violates the startup_speedup_20260606
invariant: heavy SDKs must not be loaded at module level because
ai_client.py is on the main thread's import chain.
src/openai_compatible.py line 5 does 'from openai import
OpenAIError, ...', so any import from it triggers the openai SDK
to load. test_ai_client_does_not_import_openai_at_module_level
guards this invariant and was failing.
Fix: move the imports back to local scope inside the function
bodies that need them:
- _default_send closure inside run_with_tool_loop
(imports send_openai_compatible)
- _send_grok (imports OpenAICompatibleRequest)
- _send_minimax (imports OpenAICompatibleRequest)
- _send_llama (imports OpenAICompatibleRequest)
- _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse)
Test patches: tests that previously patched
'src.ai_client.send_openai_compatible' now patch
'src.openai_compatible.send_openai_compatible' (the actual
import source). _execute_tool_calls_concurrently patches
unchanged (it's defined in src/ai_client.py itself).
Green confirmed: 62 vendor + tool + import-isolation tests
pass. 0 regressions.
Task 1.7 of the follow-up track. Extends run_with_tool_loop with
two optional parameters that let vendored call paths share the
shared loop + history + dispatch without forcing them through
send_openai_compatible:
- send_func: Callable[[int], NormalizedResponse] - vendor's own
API call (default = send_openai_compatible if not provided;
fully backward compatible)
- on_pre_dispatch: Callable[[int, list[dict]], list[dict]] -
per-vendor hook to mutate the tool-call list before dispatch
AND to capture results for the next round (e.g. Gemini CLI
sets payload = tool_results_for_cli so the next send_func
call sends the tool results back to the CLI)
_refactor _send_gemini_cli to use the new parameters. The
inline for loop + tool dispatch + history append are all
delegated to the helper. The vendor's send_func closure
handles:
- adapter.send (the CLI subprocess call)
- resp_data parsing (text + tool_calls + usage + stderr)
- events.emit for request_start + response_received
- _append_comms for IN/OUT comms logging
- The 'txt + calls -> history_add' special case
The vendor's on_pre_dispatch closure handles:
- _execute_tool_calls_concurrently (re-invoked here because
the helper's call passes raw tool_calls but the vendor
needs to mutate payload AND log results)
- _reread_file_items + _build_file_diff_text (file diff
re-read at last tool result)
- MAX_ROUNDS system message
- _truncate_tool_output
- _MAX_TOOL_OUTPUT_BYTES budget warning
- Payload mutation for the next round
Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI
+ 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax +
2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.
Task 1.6 of the follow-up track. _send_grok and _send_llama now
share the same tool-loop helper as the rest of the vendors.
Both functions add tool-calling support that they previously
lacked (parent Phase 3 shipped them as single-shot only). The
plan's Task 1.6 title says 'add missing loop' which matches
this scope. tool_choice='auto' if tools else 'auto' matches
the MiniMax pattern.
Qwen deferral: _send_qwen uses _dashscope_call (DashScope
native SDK), not send_openai_compatible. run_with_tool_loop
hard-codes send_openai_compatible. Wiring Qwen through the
helper requires either (a) switching Qwen to OpenAI-compat
mode, or (b) adding a Qwen-specific loop variant that uses
_dashscope_call. Both are non-trivial and out of scope for
Task 1.6. Tracked as a follow-up note in the state.toml.
Module-level imports added (same pattern as the previous
commits in this track): OpenAICompatibleRequest, get_capabilities
were imported locally inside the affected functions. Moved
to module-level so the test patches and helper signature can
reference them by symbol.
Green confirmed: 51 vendor + tool tests pass.
Task 1.3 of the follow-up track. _send_minimax now uses
run_with_tool_loop with a per-round request_builder callback
that re-reads _minimax_history under _minimax_history_lock.
The plan's Task 1.3 example builds the request once before the
loop. That would break MiniMax tool flows because the API
would not see the tool results appended to _minimax_history
on later rounds. The fix: extend run_with_tool_loop's 2nd arg
to accept Union[OpenAICompatibleRequest, Callable[[int],
OpenAICompatibleRequest]] (backward compatible; static-request
vendors pass a single request). MiniMax now passes a closure
that rebuilds messages from history each round.
Reasoning extraction: MiniMax exposes its chain-of-thought via
response.raw_response.choices[0].message.reasoning_details[0].
get('text'). Lifted to a _extract_minimax_reasoning callback
passed as reasoning_extractor=... (the new parameter added
in the previous commit).
Trim callback: wraps _trim_minimax_history so it can be called
from run_with_tool_loop after each tool-result append.
Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5
tool_loop core + 1 tool_loop builder + 39 others); the new
test_ai_client_tool_loop_builder.py locks in the per-round
builder contract.
Tasks 1.1 (red) + 1.2 (green) of the follow-up track. Adds a single
shared tool-call loop in src/ai_client.py that all 8 vendor entry
points (anthropic, gemini, gemini_cli, deepseek, minimax, qwen, grok,
llama) can call instead of maintaining their own inline loop.
Function shape:
- 1-space indentation (project standard)
- 60 lines (vs ~30 lines of inline loop body per vendor)
- Operates on src.openai_compatible.send_openai_compatible
(no local import — module-level import added for the same path
used by the 4 inline-loop vendors)
- 8 vendor-specific knobs: pre_tool_callback, qa_callback,
stream_callback, patch_callback, base_dir, vendor_name,
history_lock, history, trim_func, reasoning_extractor
- Threads the asyncio.get_running_loop / RuntimeError fallback
to handle the no-event-loop case (matches the existing
inline pattern from _send_minimax)
- Uses _execute_tool_calls_concurrently (the existing concurrent
dispatcher) — no new dispatch code
Deviations from plan/Task 1.1:
- The plan's test code patched src.tool_loop.send_openai_compatible
and the plan's Task 1.3 vendor wrapper imported 'from
src.tool_loop import run_with_tool_loop'. The plan predates the
AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up
track's Naming Convention section, run_with_tool_loop lives IN
src/ai_client.py. Tests patch src.ai_client.send_openai_compatible
and the vendor wrapper imports 'from src.ai_client import
run_with_tool_loop' (next task).
- Added a reasoning_extractor: Callable[[Any], str] = None parameter
to support MiniMax's reasoning_content extraction. Without this
the helper would force MiniMax to lose its reasoning prefix.
Green confirmed: 50 vendor + tool tests pass; 4 audit scripts pass.
Phase 5 t5.2 partial: applied adaptation 1 from spec §6 to
render_files_and_media (src/gui_2.py:3030).
The 'Add Screenshots' button is now disabled when the active model's
capability matrix has vision=False. A tooltip-adjacent text_disabled
note shows '(vision not supported by <model>; attachments would be
ignored)' so the user knows WHY the button is disabled.
Pattern established for the remaining 8 adaptations (t5.2.2 through
t5.2.9 per spec §6):
caps = app._get_active_capabilities()
imgui.begin_disabled(not caps.<field>)
... UI ...
imgui.end_disabled()
if not caps.<field>:
imgui.same_line()
imgui.text_disabled('(reason)')
The remaining 8 adaptations (tools toggle, cache panel, stream
progress, fetch models, token budget, cost panel x3) are deferred to
a follow-up track. The pattern is established; the work is
mechanical application of it.
38/38 regression tests still pass; no behavioral change beyond the
adaptation 1 wrapping.
Phase 5 t5.1: the helper reads the capability matrix for the currently
active (provider, model) pair and returns the VendorCapabilities.
Falls back to an 'unregistered' VendorCapabilities if the pair is
not in the registry (e.g., a brand-new model name the user types in).
The 9 UX adaptations in spec §6 will call this helper to read the
capability flags (vision, tool_calling, caching, streaming, etc.)
and adapt the GUI accordingly.
Also fixed pre-existing indentation inconsistency in the App class
property methods (current_provider / current_model): the first
@property had 2-space indent but the body and subsequent def had
1-space indent (matching the project style). The mismatch was
latent; the new helper exposed it. Now uniform 1-space indent.
38/38 regression tests still pass; no behavioral change beyond the
helper addition.
Phase 4 t4.4: the wildcard entry 'minimax/*' was the only minimax
registration; this adds specific entries for the 4 fallback model
names returned by _list_minimax_models() at src/ai_client.py:2112
('MiniMax-M2.7', 'MiniMax-M2.5', 'MiniMax-M2.1', 'MiniMax-M2').
Each per-model entry mirrors the wildcard defaults (context_window=131072,
cost=0.20/0.20 per Mtok). Per-model entries let the matrix return
exact capability data for known models; the '*' wildcard still catches
new / future model names that aren't in the registry.
State [openai_compatible_models] minimax_models_refactored flag
flips to true (in the next state commit) -- this is the model-level
coverage the flag tracks.
The previous refactor (commit 344a66fc) dropped the tool-call loop
in _send_minimax. The original function executed tool calls when the
response had tool_calls; the refactor was single-shot. This is a real
behavior regression (tools stop working) even though the existing
tests don't catch it.
Restore the tool loop:
- For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible
with tools=_get_deepseek_tools() and tool_choice='auto'
- If response has tool_calls: dispatch each via
_execute_tool_calls_concurrently (handles both async context and
sync via run_coroutine_threadsafe / asyncio.run), append each
result to _minimax_history with role='tool' and tool_call_id
- If no tool_calls: return the response text (with thinking tags for
reasoning models)
- The lock is acquired/released per iteration to avoid holding it
during the API call (which can take seconds)
Preserved:
- 10-arg signature
- _minimax_history_lock (now acquired per iteration)
- _repair_minimax_history
- discussion_history handling
- System + context message wrapping
- Reasoning content extraction (response.raw_response.choices[0].message
.reasoning_details[0].get('text', ''))
- <thinking> tags wrap on the final response
Dropped (still):
- extra_body={reasoning_split: True} (not supported by send_openai_compatible;
would be a Phase 5 adapter addition if minimax-reasoner models need it)
New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor).
Net effect: 231 -> 75 = 68% reduction; tool loop preserved.
Verification: 38/38 tests pass (no regressions).