docs(guides): document run_with_tool_loop, native Ollama, v2 matrix, PROVIDERS

Updates docs/guide_ai_client.md and docs/guide_models.md to document the follow-up track's Phase 1-4 work: guide_ai_client.md (added 3 sections + 1 inline note): - run_with_tool_loop shared helper (signature, the 2 extensions for vendored call paths, the 4 applied + 3 deferred vendors, audit script) - Native Ollama adapter (the dispatcher check in _send_llama, the think/images/thinking fields, the /api/chat endpoint difference) - V2 Capability Matrix (12 fields, GUI rendering, static vs runtime caps.local) - PROVIDERS Location (Phase 2 move, PEP 562 re-export) guide_models.md (added 2 sections): - PROVIDERS Constant (location change + circular import rationale + audit) - V2 Capability Matrix (v2 field list, how to add a new v2 field per the HARD RULE on no new src/<thing>.py files) These docs were previously stale; they still described the v1 matrix only and the old 'inline tool loop' pattern. Phase 5 t5_5 is the docs step that brings them in sync with the current code. Verification: 118/118 vendor+tool+provider+import-isolation tests pass (no regressions; docs changes do not affect code)
2026-06-11 21:51:55 -04:00
parent c9135b0565
commit 88aea3199c
2 changed files with 138 additions and 1 deletions
@@ -518,7 +518,99 @@ Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashSco
 - **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
 - **Custom URL** (escape hatch): any OpenAI-compatible endpoint

-The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
+
+
+### `run_with_tool_loop` — Shared Tool-Call Loop Helper
+
+Added 2026-06-11 by the `qwen_llama_grok_followup_20260611` track. Wraps `send_openai_compatible` with the tool-call loop, so 4+ OpenAI-compatible vendors share the same dispatch + history logic instead of each having their own inline loop.
+
+**Signature** (in `src/ai_client.py:806`):
+
+```python
+def run_with_tool_loop(
+    client: Any,
+    request: OpenAICompatibleRequest | Callable[[int], OpenAICompatibleRequest],
+    *,
+    capabilities: "VendorCapabilities",
+    pre_tool_callback: Optional[Callable] = None,
+    qa_callback: Optional[Callable] = None,
+    stream_callback: Optional[Callable[[str], None]] = None,
+    patch_callback: Optional[Callable] = None,
+    base_dir: str,
+    vendor_name: str,
+    history_lock: Optional[threading.Lock] = None,
+    history: Optional[list] = None,
+    trim_func: Optional[Callable] = None,
+    send_func: Optional[Callable[[int], "NormalizedResponse"]] = None,
+    on_pre_dispatch: Optional[Callable] = None,
+) -> str:
+```
+
+**Two extensions** were added beyond the original signature:
+
+1. `request` accepts a `Callable[[int], OpenAICompatibleRequest]` (per-round history rebuild). Use this when the vendor mutates history between rounds (e.g., MiniMax's per-round append).
+2. `send_func + on_pre_dispatch` allows vendored call paths (e.g., Gemini CLI's `GeminiCliAdapter`) to share the loop + dispatch without going through `send_openai_compatible`.
+
+**Vendors applied** (as of 2026-06-11):
+- `_send_minimax` (was inline, now uses helper)
+- `_send_grok` (was single-shot, now has loop)
+- `_send_llama` (was single-shot, now has loop)
+- `_send_gemini_cli` (uses `send_func` + `on_pre_dispatch`)
+
+**Vendors still deferred** (multi-day refactor; see `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` t5_6/7/8):
+- `_send_anthropic` (uses anthropic SDK)
+- `_send_gemini` (uses google-genai streaming)
+- `_send_deepseek` (uses requests.post)
+
+**Audit enforcement**: `scripts/audit_no_inline_tool_loops.py` fails if any non-deferred `_send_<vendor>()` has an inline `for ... in range(MAX_TOOL_ROUNDS)` loop.
+
+### Native Ollama Adapter (Phase 4)
+
+Added 2026-06-11. When `_llama_base_url` is `localhost` / `127.0.0.1` (Ollama default), `_send_llama` routes to `_send_llama_native` (which wraps `ollama_chat`). The native adapter POSTs to `/api/chat` (NOT `/v1/chat/completions`) and supports Ollama's vendor-specific fields:
+
+- `think`: `low` | `medium` | `high` — reasoning depth hint
+- `images`: list of base64-encoded images (for vision-capable models)
+- `thinking`: returned field; captured in history for subsequent rounds
+
+The dispatcher check is in `_send_llama` at the function head:
+```python
+if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
+    return _send_llama_native(...)
+```
+
+For OpenRouter, custom URLs, and other cloud Llama endpoints, the existing OpenAI-compat path is unchanged.
+
+### V2 Capability Matrix (Phase 4)
+
+Added 2026-06-11. The `VendorCapabilities` dataclass in `src/vendor_capabilities.py` now has 12 v2 fields beyond the original 7 v1 fields:
+
+**V1 fields** (unchanged):
+- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
+
+**V2 fields** (added):
+- `local` — backend is on-device (Ollama, etc.); consumed by `_apply_runtime_caps_override` for llama+localhost
+- `reasoning` — model supports `thinking` / reasoning traces (e.g., MiniMax-M2.5/M2.7, DeepSeek R1, llama-3.1-405b-reasoning)
+- `structured_output` — model supports JSON / tool-use output format
+- `code_execution` — model can run code (server-side; e.g., gemini-2.0-experimental)
+- `web_search` — model can do live web search (e.g., grok-2, gemini-grounded)
+- `x_search` — X/Twitter search (grok-specific)
+- `file_search` — model has a file_search tool (Anthropic)
+- `mcp_support` — model supports the Model Context Protocol (Anthropic, gemini)
+- `audio` — model accepts audio input (gemini-2.5+, qwen-audio)
+- `video` — model accepts video input (gemini-2.5+, qwen-vl-max)
+- `grounding` — model supports grounding (gemini)
+- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
+
+**GUI rendering**: `src/gui_2.py:_render_v2_capability_badges` renders small green badges in the provider panel for each field where `caps.<field> = True`. The user can see at a glance which capabilities their active vendor+model supports.
+
+**Static + runtime**: Most v2 fields are per-model properties in the registry. `caps.local` is unique — it's runtime state (URL-dependent), so the GUI uses `dataclasses.replace(caps, local=True)` to override when the active backend is Ollama.
+
+### PROVIDERS Location (Phase 2)
+
+The `PROVIDERS` list moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/<thing>.py` files). A PEP 562 `__getattr__` re-export in `src/models.py:261` maintains backward compatibility (lazy import; breaks the circular dependency where `src/ai_client.py` imports `ToolPreset` from `src/models.py`).
+
+Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared in `src/models.py`.
+

 ### Tests