diff --git a/docs/guide_ai_client.md b/docs/guide_ai_client.md
index d8251d0e..eb2dc6ed 100644
--- a/docs/guide_ai_client.md
+++ b/docs/guide_ai_client.md
@@ -518,7 +518,99 @@ Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashSco
 - **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
 - **Custom URL** (escape hatch): any OpenAI-compatible endpoint
 
-The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
+
+
+### `run_with_tool_loop` — Shared Tool-Call Loop Helper
+
+Added 2026-06-11 by the `qwen_llama_grok_followup_20260611` track. Wraps `send_openai_compatible` with the tool-call loop, so 4+ OpenAI-compatible vendors share the same dispatch + history logic instead of each having their own inline loop.
+
+**Signature** (in `src/ai_client.py:806`):
+
+```python
+def run_with_tool_loop(
+    client: Any,
+    request: OpenAICompatibleRequest | Callable[[int], OpenAICompatibleRequest],
+    *,
+    capabilities: "VendorCapabilities",
+    pre_tool_callback: Optional[Callable] = None,
+    qa_callback: Optional[Callable] = None,
+    stream_callback: Optional[Callable[[str], None]] = None,
+    patch_callback: Optional[Callable] = None,
+    base_dir: str,
+    vendor_name: str,
+    history_lock: Optional[threading.Lock] = None,
+    history: Optional[list] = None,
+    trim_func: Optional[Callable] = None,
+    send_func: Optional[Callable[[int], "NormalizedResponse"]] = None,
+    on_pre_dispatch: Optional[Callable] = None,
+) -> str:
+```
+
+**Two extensions** were added beyond the original signature:
+
+1. `request` accepts a `Callable[[int], OpenAICompatibleRequest]` (per-round history rebuild). Use this when the vendor mutates history between rounds (e.g., MiniMax's per-round append).
+2. `send_func + on_pre_dispatch` allows vendored call paths (e.g., Gemini CLI's `GeminiCliAdapter`) to share the loop + dispatch without going through `send_openai_compatible`.
+
+**Vendors applied** (as of 2026-06-11):
+- `_send_minimax` (was inline, now uses helper)
+- `_send_grok` (was single-shot, now has loop)
+- `_send_llama` (was single-shot, now has loop)
+- `_send_gemini_cli` (uses `send_func` + `on_pre_dispatch`)
+
+**Vendors still deferred** (multi-day refactor; see `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` t5_6/7/8):
+- `_send_anthropic` (uses anthropic SDK)
+- `_send_gemini` (uses google-genai streaming)
+- `_send_deepseek` (uses requests.post)
+
+**Audit enforcement**: `scripts/audit_no_inline_tool_loops.py` fails if any non-deferred `_send_()` has an inline `for ... in range(MAX_TOOL_ROUNDS)` loop.
+
+### Native Ollama Adapter (Phase 4)
+
+Added 2026-06-11. When `_llama_base_url` is `localhost` / `127.0.0.1` (Ollama default), `_send_llama` routes to `_send_llama_native` (which wraps `ollama_chat`). The native adapter POSTs to `/api/chat` (NOT `/v1/chat/completions`) and supports Ollama's vendor-specific fields:
+
+- `think`: `low` | `medium` | `high` — reasoning depth hint
+- `images`: list of base64-encoded images (for vision-capable models)
+- `thinking`: returned field; captured in history for subsequent rounds
+
+The dispatcher check is in `_send_llama` at the function head:
+```python
+if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
+    return _send_llama_native(...)
+```
+
+For OpenRouter, custom URLs, and other cloud Llama endpoints, the existing OpenAI-compat path is unchanged.
+
+### V2 Capability Matrix (Phase 4)
+
+Added 2026-06-11. The `VendorCapabilities` dataclass in `src/vendor_capabilities.py` now has 12 v2 fields beyond the original 7 v1 fields:
+
+**V1 fields** (unchanged):
+- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
+
+**V2 fields** (added):
+- `local` — backend is on-device (Ollama, etc.); consumed by `_apply_runtime_caps_override` for llama+localhost
+- `reasoning` — model supports `thinking` / reasoning traces (e.g., MiniMax-M2.5/M2.7, DeepSeek R1, llama-3.1-405b-reasoning)
+- `structured_output` — model supports JSON / tool-use output format
+- `code_execution` — model can run code (server-side; e.g., gemini-2.0-experimental)
+- `web_search` — model can do live web search (e.g., grok-2, gemini-grounded)
+- `x_search` — X/Twitter search (grok-specific)
+- `file_search` — model has a file_search tool (Anthropic)
+- `mcp_support` — model supports the Model Context Protocol (Anthropic, gemini)
+- `audio` — model accepts audio input (gemini-2.5+, qwen-audio)
+- `video` — model accepts video input (gemini-2.5+, qwen-vl-max)
+- `grounding` — model supports grounding (gemini)
+- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
+
+**GUI rendering**: `src/gui_2.py:_render_v2_capability_badges` renders small green badges in the provider panel for each field where `caps. = True`. The user can see at a glance which capabilities their active vendor+model supports.
+
+**Static + runtime**: Most v2 fields are per-model properties in the registry. `caps.local` is unique — it's runtime state (URL-dependent), so the GUI uses `dataclasses.replace(caps, local=True)` to override when the active backend is Ollama.
+
+### PROVIDERS Location (Phase 2)
+
+The `PROVIDERS` list moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/.py` files). A PEP 562 `__getattr__` re-export in `src/models.py:261` maintains backward compatibility (lazy import; breaks the circular dependency where `src/ai_client.py` imports `ToolPreset` from `src/models.py`).
+
+Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared in `src/models.py`.
+
 
 ### Tests
 
diff --git a/docs/guide_models.md b/docs/guide_models.md
index e24736b5..90d26f12 100644
--- a/docs/guide_models.md
+++ b/docs/guide_models.md
@@ -533,8 +533,53 @@ Tests live in `tests/test_models.py` and module-specific test files (e.g., `test
 5. Add tests in `tests/test_models.py` (round-trip + validation).
 6. Update `docs/guide_models.md` (this file) to document the new model.
 
+
 ---
 
+## PROVIDERS Constant (Location Change 2026-06-11)
+
+The `PROVIDERS` list was moved from `src/models.py` to `src/ai_client.py:56` per the AGENTS.md HARD RULE (no new `src/.py` files; system code lives in the system module).
+
+**Current location**: `src/ai_client.py` (import as `from src.ai_client import PROVIDERS`)
+
+**Backward compat**: `src/models.py:261-264` has a PEP 562 `__getattr__` that re-exports `PROVIDERS` via lazy import. This breaks the circular dependency where `src/ai_client.py:50` imports `ToolPreset` from `src/models.py` (a top-level `from src.ai_client import PROVIDERS` in `models.py` would deadlock).
+
+**Audit**: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is declared as a literal in `src/models.py`.
+
+The 4 internal import sites were updated in commit `6c6a4aef`:
+- `src/app_controller.py:3093`
+- `src/gui_2.py:2293, 2849, 5377`
+
+---
+
+## V2 Capability Matrix (Added 2026-06-11)
+
+`src/vendor_capabilities.py` defines the `VendorCapabilities` dataclass (NOT in `src/models.py` — it's in its own file because it's not a "data model" but a "capability registry"). The dataclass was extended with 12 v2 fields:
+
+**V1 fields** (unchanged from parent track):
+- `vision`, `tool_calling`, `caching`, `streaming`, `model_discovery`, `context_window`, `cost_tracking`
+
+**V2 fields** (added in `qwen_llama_grok_followup_20260611` Phase 4):
+- `local` — backend is on-device (Ollama, etc.)
+- `reasoning` — model supports `thinking` / reasoning traces
+- `structured_output` — model supports JSON / tool-use output
+- `code_execution` — model can run code (server-side)
+- `web_search` — model can do live web search
+- `x_search` — X/Twitter search (grok-specific)
+- `file_search` — model has a file_search tool (Anthropic)
+- `mcp_support` — model supports the Model Context Protocol
+- `audio` — model accepts audio input
+- `video` — model accepts video input
+- `grounding` — model supports grounding (gemini)
+- `computer_use` — model can drive a computer (Anthropic claude-3.5+)
+
+All v2 fields default to `False`. The dataclass is `frozen=True`; per-vendor entries use `register()` at module-import time. The GUI reads the matrix via `get_capabilities(vendor, model)` and adapts 9+ UI elements accordingly (see [guide_ai_client.md §V2 Capability Matrix](guide_ai_client.md#v2-capability-matrix-phase-4)).
+
+**Adding a new v2 field**: The HARD RULE is that all AI-client code lives in `src/ai_client.py`. New v2 fields go in `src/vendor_capabilities.py` (existing file) — NOT in a new `src/.py` file. Update the dataclass, populate per-model in the registry, add a small rendering helper in `src/gui_2.py` (e.g., `_render_v2_capability_badges` for the existing 11 v2 fields).
+
+---
+
+
 ## See Also
 
 - **[guide_architecture.md](guide_architecture.md)** — How models flow through the system