Private
Public Access
0
0

docs(guides): document run_with_tool_loop, native Ollama, v2 matrix, PROVIDERS

Updates docs/guide_ai_client.md and docs/guide_models.md
to document the follow-up track's Phase 1-4 work:

guide_ai_client.md (added 3 sections + 1 inline note):
  - run_with_tool_loop shared helper (signature, the
    2 extensions for vendored call paths, the
    4 applied + 3 deferred vendors, audit script)
  - Native Ollama adapter (the dispatcher check in
    _send_llama, the think/images/thinking fields,
    the /api/chat endpoint difference)
  - V2 Capability Matrix (12 fields, GUI rendering,
    static vs runtime caps.local)
  - PROVIDERS Location (Phase 2 move, PEP 562 re-export)

guide_models.md (added 2 sections):
  - PROVIDERS Constant (location change + circular
    import rationale + audit)
  - V2 Capability Matrix (v2 field list, how to add
    a new v2 field per the HARD RULE on no new
    src/<thing>.py files)

These docs were previously stale; they still described the
v1 matrix only and the old 'inline tool loop' pattern.
Phase 5 t5_5 is the docs step that brings them in sync
with the current code.

Verification: 118/118 vendor+tool+provider+import-isolation
tests pass (no regressions; docs changes do not affect code)
This commit is contained in:
2026-06-11 21:51:55 -04:00
parent c9135b0565
commit 88aea3199c
2 changed files with 138 additions and 1 deletions
+93 -1
View File
@@ -518,7 +518,99 @@ Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashSco
- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
### `run_with_tool_loop` — Shared Tool-Call Loop Helper
Added 2026-06-11 by the `qwen_llama_grok_followup_20260611` track. Wraps `send_openai_compatible` with the tool-call loop, so 4+ OpenAI-compatible vendors share the same dispatch + history logic instead of each having their own inline loop.
**Signature** (in `src/ai_client.py:806`):
```python
def run_with_tool_loop(
client: Any,
request: OpenAICompatibleRequest | Callable[[int], OpenAICompatibleRequest],
*,
capabilities: "VendorCapabilities",
pre_tool_callback: Optional[Callable] = None,
qa_callback: Optional[Callable] = None,
stream_callback: Optional[Callable[[str], None]] = None,
patch_callback: Optional[Callable] = None,
base_dir: str,
vendor_name: str,
history_lock: Optional[threading.Lock] = None,
history: Optional[list] = None,
trim_func: Optional[Callable] = None,
send_func: Optional[Callable[[int], "NormalizedResponse"]] = None,
on_pre_dispatch: Optional[Callable] = None,
) -> str:
```
**Two extensions** were added beyond the original signature:
1. `request` accepts a `Callable[[int], OpenAICompatibleRequest]` (per-round history rebuild). Use this when the vendor mutates history between rounds (e.g., MiniMax's per-round append).
2. `send_func + on_pre_dispatch` allows vendored call paths (e.g., Gemini CLI's `GeminiCliAdapter`) to share the loop + dispatch without going through `send_openai_compatible`.
**Vendors applied** (as of 2026-06-11):
- `_send_minimax` (was inline, now uses helper)
- `_send_grok` (was single-shot, now has loop)
- `_send_llama` (was single-shot, now has loop)
- `_send_gemini_cli` (uses `send_func` + `on_pre_dispatch`)
**Vendors still deferred** (multi-day refactor; see `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` t5_6/7/8):
- `_send_anthropic` (uses anthropic SDK)
- `_send_gemini` (uses google-genai streaming)
- `_send_deepseek` (uses requests.post)
**Audit enforcement**: `scripts/audit_no_inline_tool_loops.py` fails if any non-deferred `_send_<vendor>()` has an inline `for ... in range(MAX_TOOL_ROUNDS)` loop.
### Native Ollama Adapter (Phase 4)
Added 2026-06-11. When `_llama_base_url` is `localhost` / `127.0.0.1` (Ollama default), `_send_llama` routes to `_send_llama_native` (which wraps `ollama_chat`). The native adapter POSTs to `/api/chat` (NOT `/v1/chat/completions`) and supports Ollama's vendor-specific fields:
- `think`: `low` | `medium` | `high` — reasoning depth hint
- `images`: list of base64-encoded images (for vision-capable models)
- `thinking`: returned field; captured in history for subsequent rounds
The dispatcher check is in `_send_llama` at the function head:
```python
if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
return _send_llama_native(...)
```
For OpenRouter, custom URLs, and other cloud Llama endpoints, the existing OpenAI-compat path is unchanged.
### V2 Capability Matrix (Phase 4)
Added 2026-06-11. The `VendorCapabilities` dataclass in `src/vendor_capabilities.py` now has 12 v2 fields beyond the original 7 v1 fields:
**V1 fields** (unchanged):
- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
**V2 fields** (added):
- `local` — backend is on-device (Ollama, etc.); consumed by `_apply_runtime_caps_override` for llama+localhost
- `reasoning` — model supports `thinking` / reasoning traces (e.g., MiniMax-M2.5/M2.7, DeepSeek R1, llama-3.1-405b-reasoning)
- `structured_output` — model supports JSON / tool-use output format
- `code_execution` — model can run code (server-side; e.g., gemini-2.0-experimental)
- `web_search` — model can do live web search (e.g., grok-2, gemini-grounded)
- `x_search` — X/Twitter search (grok-specific)
- `file_search` — model has a file_search tool (Anthropic)
- `mcp_support` — model supports the Model Context Protocol (Anthropic, gemini)
- `audio` — model accepts audio input (gemini-2.5+, qwen-audio)
- `video` — model accepts video input (gemini-2.5+, qwen-vl-max)
- `grounding` — model supports grounding (gemini)
- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
**GUI rendering**: `src/gui_2.py:_render_v2_capability_badges` renders small green badges in the provider panel for each field where `caps.<field> = True`. The user can see at a glance which capabilities their active vendor+model supports.
**Static + runtime**: Most v2 fields are per-model properties in the registry. `caps.local` is unique — it's runtime state (URL-dependent), so the GUI uses `dataclasses.replace(caps, local=True)` to override when the active backend is Ollama.
### PROVIDERS Location (Phase 2)
The `PROVIDERS` list moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/<thing>.py` files). A PEP 562 `__getattr__` re-export in `src/models.py:261` maintains backward compatibility (lazy import; breaks the circular dependency where `src/ai_client.py` imports `ToolPreset` from `src/models.py`).
Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared in `src/models.py`.
### Tests
+45
View File
@@ -533,8 +533,53 @@ Tests live in `tests/test_models.py` and module-specific test files (e.g., `test
5. Add tests in `tests/test_models.py` (round-trip + validation).
6. Update `docs/guide_models.md` (this file) to document the new model.
---
## PROVIDERS Constant (Location Change 2026-06-11)
The `PROVIDERS` list was moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/<thing>.py` files; system code lives in the system module).
**Current location**: `src/ai_client.py` (import as `from src.ai_client import PROVIDERS`)
**Backward compat**: `src/models.py:261-264` has a PEP 562 `__getattr__` that re-exports `PROVIDERS` via lazy import. This breaks the circular dependency where `src/ai_client.py:50` imports `ToolPreset` from `src/models.py` (a top-level `from src.ai_client import PROVIDERS` in `models.py` would deadlock).
**Audit**: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared as a literal in `src/models.py`.
The 4 internal import sites were updated in commit `6c6a4aef`:
- `src/app_controller.py:3093`
- `src/gui_2.py:2293, 2849, 5377`
---
## V2 Capability Matrix (Added 2026-06-11)
`src/vendor_capabilities.py` defines the `VendorCapabilities` dataclass (NOT in `src/models.py` — it's in its own file because it's not a "data model" but a "capability registry"). The dataclass was extended with 12 v2 fields:
**V1 fields** (unchanged from parent track):
- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
**V2 fields** (added in `qwen_llama_grok_followup_20260611` Phase 4):
- `local` — backend is on-device (Ollama, etc.)
- `reasoning` — model supports `thinking` / reasoning traces
- `structured_output` — model supports JSON / tool-use output
- `code_execution` — model can run code (server-side)
- `web_search` — model can do live web search
- `x_search` — X/Twitter search (grok-specific)
- `file_search` — model has a file_search tool (Anthropic)
- `mcp_support` — model supports the Model Context Protocol
- `audio` — model accepts audio input
- `video` — model accepts video input
- `grounding` — model supports grounding (gemini)
- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
All v2 fields default to `False`. The dataclass is `frozen=True`; per-vendor entries use `register()` at module-import time. The GUI reads the matrix via `get_capabilities(vendor, model)` and adapts 9+ UI elements accordingly (see [guide_ai_client.md §V2 Capability Matrix](guide_ai_client.md#v2-capability-matrix-phase-4)).
**Adding a new v2 field**: The HARD RULE is that all AI-client code lives in `src/ai_client.py`. New v2 fields go in `src/vendor_capabilities.py` (existing file) — NOT in a new `src/<v2_thing>.py` file. Update the dataclass, populate per-model in the registry, add a small rendering helper in `src/gui_2.py` (e.g., `_render_v2_capability_badges` for the existing 11 v2 fields).
---
## See Also
- **[guide_architecture.md](guide_architecture.md)** — How models flow through the system