docs(guides): document run_with_tool_loop, native Ollama, v2 matrix, PROVIDERS
Updates docs/guide_ai_client.md and docs/guide_models.md
to document the follow-up track's Phase 1-4 work:
guide_ai_client.md (added 3 sections + 1 inline note):
- run_with_tool_loop shared helper (signature, the
2 extensions for vendored call paths, the
4 applied + 3 deferred vendors, audit script)
- Native Ollama adapter (the dispatcher check in
_send_llama, the think/images/thinking fields,
the /api/chat endpoint difference)
- V2 Capability Matrix (12 fields, GUI rendering,
static vs runtime caps.local)
- PROVIDERS Location (Phase 2 move, PEP 562 re-export)
guide_models.md (added 2 sections):
- PROVIDERS Constant (location change + circular
import rationale + audit)
- V2 Capability Matrix (v2 field list, how to add
a new v2 field per the HARD RULE on no new
src/<thing>.py files)
These docs were previously stale; they still described the
v1 matrix only and the old 'inline tool loop' pattern.
Phase 5 t5_5 is the docs step that brings them in sync
with the current code.
Verification: 118/118 vendor+tool+provider+import-isolation
tests pass (no regressions; docs changes do not affect code)
This commit is contained in:
+93
-1
@@ -518,7 +518,99 @@ Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashSco
|
||||
- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
|
||||
- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
|
||||
|
||||
The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
|
||||
|
||||
|
||||
### `run_with_tool_loop` — Shared Tool-Call Loop Helper
|
||||
|
||||
Added 2026-06-11 by the `qwen_llama_grok_followup_20260611` track. Wraps `send_openai_compatible` with the tool-call loop, so 4+ OpenAI-compatible vendors share the same dispatch + history logic instead of each having their own inline loop.
|
||||
|
||||
**Signature** (in `src/ai_client.py:806`):
|
||||
|
||||
```python
|
||||
def run_with_tool_loop(
|
||||
client: Any,
|
||||
request: OpenAICompatibleRequest | Callable[[int], OpenAICompatibleRequest],
|
||||
*,
|
||||
capabilities: "VendorCapabilities",
|
||||
pre_tool_callback: Optional[Callable] = None,
|
||||
qa_callback: Optional[Callable] = None,
|
||||
stream_callback: Optional[Callable[[str], None]] = None,
|
||||
patch_callback: Optional[Callable] = None,
|
||||
base_dir: str,
|
||||
vendor_name: str,
|
||||
history_lock: Optional[threading.Lock] = None,
|
||||
history: Optional[list] = None,
|
||||
trim_func: Optional[Callable] = None,
|
||||
send_func: Optional[Callable[[int], "NormalizedResponse"]] = None,
|
||||
on_pre_dispatch: Optional[Callable] = None,
|
||||
) -> str:
|
||||
```
|
||||
|
||||
**Two extensions** were added beyond the original signature:
|
||||
|
||||
1. `request` accepts a `Callable[[int], OpenAICompatibleRequest]` (per-round history rebuild). Use this when the vendor mutates history between rounds (e.g., MiniMax's per-round append).
|
||||
2. `send_func + on_pre_dispatch` allows vendored call paths (e.g., Gemini CLI's `GeminiCliAdapter`) to share the loop + dispatch without going through `send_openai_compatible`.
|
||||
|
||||
**Vendors applied** (as of 2026-06-11):
|
||||
- `_send_minimax` (was inline, now uses helper)
|
||||
- `_send_grok` (was single-shot, now has loop)
|
||||
- `_send_llama` (was single-shot, now has loop)
|
||||
- `_send_gemini_cli` (uses `send_func` + `on_pre_dispatch`)
|
||||
|
||||
**Vendors still deferred** (multi-day refactor; see `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` t5_6/7/8):
|
||||
- `_send_anthropic` (uses anthropic SDK)
|
||||
- `_send_gemini` (uses google-genai streaming)
|
||||
- `_send_deepseek` (uses requests.post)
|
||||
|
||||
**Audit enforcement**: `scripts/audit_no_inline_tool_loops.py` fails if any non-deferred `_send_<vendor>()` has an inline `for ... in range(MAX_TOOL_ROUNDS)` loop.
|
||||
|
||||
### Native Ollama Adapter (Phase 4)
|
||||
|
||||
Added 2026-06-11. When `_llama_base_url` is `localhost` / `127.0.0.1` (Ollama default), `_send_llama` routes to `_send_llama_native` (which wraps `ollama_chat`). The native adapter POSTs to `/api/chat` (NOT `/v1/chat/completions`) and supports Ollama's vendor-specific fields:
|
||||
|
||||
- `think`: `low` | `medium` | `high` — reasoning depth hint
|
||||
- `images`: list of base64-encoded images (for vision-capable models)
|
||||
- `thinking`: returned field; captured in history for subsequent rounds
|
||||
|
||||
The dispatcher check is in `_send_llama` at the function head:
|
||||
```python
|
||||
if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
|
||||
return _send_llama_native(...)
|
||||
```
|
||||
|
||||
For OpenRouter, custom URLs, and other cloud Llama endpoints, the existing OpenAI-compat path is unchanged.
|
||||
|
||||
### V2 Capability Matrix (Phase 4)
|
||||
|
||||
Added 2026-06-11. The `VendorCapabilities` dataclass in `src/vendor_capabilities.py` now has 12 v2 fields beyond the original 7 v1 fields:
|
||||
|
||||
**V1 fields** (unchanged):
|
||||
- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
|
||||
|
||||
**V2 fields** (added):
|
||||
- `local` — backend is on-device (Ollama, etc.); consumed by `_apply_runtime_caps_override` for llama+localhost
|
||||
- `reasoning` — model supports `thinking` / reasoning traces (e.g., MiniMax-M2.5/M2.7, DeepSeek R1, llama-3.1-405b-reasoning)
|
||||
- `structured_output` — model supports JSON / tool-use output format
|
||||
- `code_execution` — model can run code (server-side; e.g., gemini-2.0-experimental)
|
||||
- `web_search` — model can do live web search (e.g., grok-2, gemini-grounded)
|
||||
- `x_search` — X/Twitter search (grok-specific)
|
||||
- `file_search` — model has a file_search tool (Anthropic)
|
||||
- `mcp_support` — model supports the Model Context Protocol (Anthropic, gemini)
|
||||
- `audio` — model accepts audio input (gemini-2.5+, qwen-audio)
|
||||
- `video` — model accepts video input (gemini-2.5+, qwen-vl-max)
|
||||
- `grounding` — model supports grounding (gemini)
|
||||
- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
|
||||
|
||||
**GUI rendering**: `src/gui_2.py:_render_v2_capability_badges` renders small green badges in the provider panel for each field where `caps.<field> = True`. The user can see at a glance which capabilities their active vendor+model supports.
|
||||
|
||||
**Static + runtime**: Most v2 fields are per-model properties in the registry. `caps.local` is unique — it's runtime state (URL-dependent), so the GUI uses `dataclasses.replace(caps, local=True)` to override when the active backend is Ollama.
|
||||
|
||||
### PROVIDERS Location (Phase 2)
|
||||
|
||||
The `PROVIDERS` list moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/<thing>.py` files). A PEP 562 `__getattr__` re-export in `src/models.py:261` maintains backward compatibility (lazy import; breaks the circular dependency where `src/ai_client.py` imports `ToolPreset` from `src/models.py`).
|
||||
|
||||
Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared in `src/models.py`.
|
||||
|
||||
|
||||
### Tests
|
||||
|
||||
|
||||
Reference in New Issue
Block a user