refactor(minimax): use send_openai_compatible helper (231 -> 41 lines)

conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4
conductor(checkpoint): Phase 3 complete - Grok (xAI) + Llama (multi-backend) via shared helper
2026-06-11 02:21:28 -04:00 · 2026-06-11 02:06:13 -04:00 · 2026-06-11 02:05:37 -04:00 · 2026-06-11 02:05:07 -04:00 · 2026-06-11 02:04:09 -04:00 · 2026-06-11 02:02:56 -04:00
16 changed files with 929 additions and 464 deletions
@@ -1,158 +0,0 @@
-# TASKS.md
-<!-- Quick-read pointer to active and planned conductor tracks -->
-<!-- Source of truth for task state is conductor/tracks/*/plan.md -->
-
-## Active Tracks
-*(none — all planned tracks queued below)*
-*See tracks.md for active track status*
-
-## Completed This Session
-*(See archive: strict_execution_queue_completed_20260306)*
-
---
-
-#### 0. conductor_path_configurable_20260306
- **Status:** Planned
- **Priority:** CRITICAL
- **Goal:** Eliminate hardcoded conductor paths. Make path configurable via config.toml or CONDUCTOR_DIR env var. Allow running app to use separate directory from development tracks.
-
-## Phase 3: Future Horizons (Tracks 1-20)
-*Initialized: 2026-03-06*
-
-### Architecture & Backend
-
-#### 1. true_parallel_worker_execution_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Implement true concurrency for the DAG engine. Once threading.local() is in place, the ExecutionEngine should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
-
-#### 2. deep_ast_context_pruning_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Before dispatching a Tier 3 worker, use tree_sitter to automatically parse the target file AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker prompt. Guarantees the AI only sees what it needs to edit, drastically reducing token burn.
-
-#### 3. visual_dag_ticket_editing_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking Execute Pipeline.
-
-#### 4. tier4_auto_patching_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a .patch file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks Apply Patch to instantly resume the pipeline.
-
-#### 5. native_orchestrator_20260306
- **Status:** Planned
- **Priority:** Low
- **Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write plan.md, manage the metadata.json, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (mma_exec.py).
-
---
-
-### GUI Overhauls & Visualizations
-
-#### 6. cost_token_analytics_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Real-time cost tracking panel displaying cost per model, session totals, and breakdown by tier. Uses existing cost_tracker.py which is implemented but has no GUI.
-
-#### 7. performance_dashboard_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Expand performance metrics panel with CPU/RAM usage, frame time, input lag with historical graphs. Uses existing performance_monitor.py which has basic metrics but no detailed visualization.
-
-#### 8. mma_multiworker_viz_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Split-view GUI for parallel worker streams per tier. Visualize multiple concurrent workers with individual status, output tabs, and resource usage. Enable kill/restart per worker.
-
-#### 9. cache_analytics_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Gemini cache hit/miss visualization, memory usage, TTL status display. Uses existing ai_client.get_gemini_cache_stats() which is not displayed in GUI.
-
-#### 10. tool_usage_analytics_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Analytics panel showing most-used tools, average execution time, and failure rates. Uses existing tool_log_callback data.
-
-#### 11. session_insights_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Token usage over time, cost projections, session summary with efficiency scores. Visualize session_logger data.
-
-#### 12. track_progress_viz_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Progress bars and percentage completion for active tracks and tickets. Better visualization of DAG execution state.
-
-#### 13. manual_skeleton_injection_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add UI controls to manually flag files for skeleton injection in discussions. Allow agent to request full file reads or specific def/class definitions on-demand.
-
-#### 14. on_demand_def_lookup_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add ability for agent to request specific class/function definitions during discussion. User can @mention a symbol and get its full definition inline.
-
---
-
-### Manual UX Controls
-
-#### 15. ticket_queue_mgmt_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Allow user to manually reorder, prioritize, or requeue tickets in the DAG. Add drag-drop reordering, priority tags, and bulk selection.
-
-#### 16. kill_abort_workers_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button.
-
-#### 17. manual_block_control_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Allow user to manually block or unblock tickets with custom reasons. Currently blocked tickets rely on dependency resolution; add manual override.
-
-#### 18. pipeline_pause_resume_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add global pause/resume for the entire DAG execution pipeline. Allow user to freeze all worker activity and resume later.
-
-#### 19. per_ticket_model_20260306
- **Status:** Planned
- **Priority:** Low
- **Goal:** Allow user to manually select which model to use for a specific ticket, overriding the default tier model.
-
-#### 20. manual_ux_validation_20260302
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures.
-
---
-
-### C/C++ Language Support
-
-#### 25. ts_cpp_tree_sitter_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Add tree-sitter C and C++ grammars. Extend ASTParser to support C/C++ skeleton and outline extraction. Add MCP tools ts_c_get_skeleton, ts_cpp_get_skeleton, ts_c_get_code_outline, ts_cpp_get_code_outline.
-
-#### 26. gencpp_python_bindings_20260308
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Bootstrap standalone Python project with CFFI bindings for gencpp C library. Provides foundation for richer C++ AST parsing in future (beyond tree-sitter syntax).
-
---
-
-### Path Configuration
-
-#### 27. project_conductor_dir_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Make conductor directory per-project. Each project TOML can specify custom conductor dir for isolated track/state management. Extends existing global path config.
-
-#### 28. gui_path_config_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Add path configuration UI to Context Hub. Allow users to view and edit configurable paths (conductor, logs, scripts) directly from the GUI.
@@ -59,6 +59,40 @@ This means:
 - **Anthropic/Gemini/DeepSeek** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
 - **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.

+### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11, revised after Grok consultation)
+
+Per the user's correction, the track's prior assumption ("all OpenAI-compatible") was incomplete. The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
+
+The OpenAI-compatible shim (the `send_openai_compatible` helper) is the highest-leverage part of the spec: every vendor that uses it gets the same request/response/tool-calling/error/streaming logic with zero duplication. The question is **which vendors should use it** vs. which should have a native adapter.
+
+**Confirmed best API per vendor (Grok-consulted 2026-06-11):**
+
+| Vendor | API / Approach | Decision |
+|---|---|---|
+| **Qwen** | Alibaba DashScope native SDK (not OpenAI-compatible) | **NATIVE** — OpenAI-compatible mode drops Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision. Phase 2 ships this. |
+| **xAI (Grok)** | xAI official OpenAI-compatible (`https://api.x.ai/v1`) | **OPENAI-COMPATIBLE** — Per Grok's own confirmation, the OpenAI-compatible endpoint is "fully compatible and clean" with "no meaningful unique native surface lost." Phase 3 ships this. |
+| **MiniMax** | OpenAI-compatible (`https://api.minimax.io/v1`) | **OPENAI-COMPATIBLE** — Already fully compatible. Phase 4 refactor is a pure win. |
+| **DeepSeek** | OpenAI-compatible (`https://api.deepseek.com`) | **OPENAI-COMPATIBLE** — Drop-in compatible by design; offers an `/anthropic`-compatible path too. Follow-up track. |
+| **Ollama** (Llama local backend) | Ollama's `/v1/chat/completions` (OpenAI-compatible) is the v1 choice; native `/api/chat` is a possible v2 | **OPENAI-COMPATIBLE in v1** — Ollama's compat endpoint supports streaming, tools, vision, JSON mode. Native `/api/chat` has extras (`think` param, `images: list[str]`, structured outputs); deferred to follow-up. |
+| **Meta Llama API** (Llama cloud-native) | Meta's native REST API | **NATIVE (NEW BACKEND, FOLLOW-UP)** — Add as a 4th Llama backend. Deferred pending verification of Meta's API spec. |
+| **Gemini** | Google `genai` SDK / Gemini native API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — OpenAI-compatible mode loses explicit context caching (big cost win), Grounding with Google Search, native video/multimodal. Covered by the deferred follow-up track. |
+| **Anthropic** | Anthropic official SDK / Messages API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — Native gives prompt caching (`cache_control` ephemeral, 50-90% savings), PDF processing, citations, extended thinking, Computer Use. An OpenAI-compatible layer exists but loses too much. Covered by the deferred follow-up track. |
+
+**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3, refined by Grok's consultation) will add:
+
+- `audio` (Qwen-Audio, others)
+- `video` (Gemini native, others)
+- `grounding` / `search` (Gemini Grounding with Google Search, Grok's `x_search` and `web_search`)
+- `computer_use` (Anthropic, beta/agentic)
+- `local` (boolean — true for Ollama; useful for UX "free local" badge)
+- `reasoning` / `extended_thinking` (Grok `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
+- `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support` (per-vendor server-side tools)
+- `structured_output` (response_format / format support)
+
+The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
+
+**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as the canonical implementation per Grok's confirmation; the native-API work for Llama (Ollama native, Meta Llama API) is deferred to follow-up tracks documented in §13.1.**
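Review note: a minimal sketch of the lookup-then-filter idea from §3.1.1, assuming a registry keyed by `(vendor, model)` with a `"*"` entry as the vendor default. The names `VendorCapabilities` and `get_capabilities` come from the task list; the field defaults, the `"*"` key convention, the `grok` wildcard values, the `KeyError` choice, and the `visible_controls` helper are assumptions for illustration and are not taken from `src/vendor_capabilities.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    # The seven v1 fields named in the spec; the defaults are assumptions.
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = True
    model_discovery: bool = False
    context_window: int = 8192
    cost_tracking: bool = True


# Hypothetical registry: ("vendor", "*") is the vendor-wide fallback entry.
_REGISTRY = {
    ("grok", "*"): VendorCapabilities(tool_calling=True, context_window=131_072),
    ("grok", "grok-2-vision"): VendorCapabilities(
        vision=True, tool_calling=True, context_window=32_768
    ),
}


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Return the exact model entry, else the vendor default, else raise."""
    if (vendor, model) in _REGISTRY:
        return _REGISTRY[(vendor, model)]
    if (vendor, "*") in _REGISTRY:
        return _REGISTRY[(vendor, "*")]
    raise KeyError(f"unknown vendor: {vendor!r}")


def visible_controls(vendor: str, model: str) -> list:
    """Hypothetical GUI helper: derive controls from the matrix only,
    with no per-vendor branches."""
    caps = get_capabilities(vendor, model)
    controls = []
    if caps.vision:
        controls.append("attach_image")
    if caps.tool_calling:
        controls.append("tools_panel")
    if caps.caching:
        controls.append("cache_stats")
    return controls
```

Under these assumptions, adding a capability means adding one field and one registry value; the GUI helper never needs to know which vendor it is rendering for.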
+
 ### 3.2 Module Layout

 ```
@@ -222,9 +256,11 @@ _llama_api_key: str = "ollama"                      # Ollama doesn't require aut

 **Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, it falls back to the hardcoded registry.
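Review note: the probe-and-union step could look roughly like the sketch below. It assumes the usual response shapes (`{"models": [{"name": ...}]}` for Ollama's `/api/tags`, `{"data": [{"id": ...}]}` for an OpenAI-style `/v1/models`) and takes the fetchers as injected callables so no network call is made here. All function names are hypothetical, and falling back whenever nothing is discovered is a simplification of the "custom URLs" rule in the spec.

```python
from __future__ import annotations

from typing import Callable, Iterable


def parse_ollama_tags(payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def parse_openai_models(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def discover_llama_models(
    fetchers: Iterable[Callable[[], list[str]]],
    fallback: list[str],
) -> list[str]:
    """Union the results of every fetcher that succeeds.

    A fetcher that raises (endpoint missing, server down) is skipped.
    If nothing is discovered at all, the hardcoded fallback is returned.
    """
    found: set[str] = set()
    for fetch in fetchers:
        try:
            found.update(fetch())
        except Exception:
            continue  # treat a failed probe as "no models from this backend"
    return sorted(found) if found else sorted(fallback)
```

Injecting the fetchers keeps the union logic testable with mocked payloads, in the same spirit as the `test_llama_model_discovery_unions_ollama_and_openrouter` Red test.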

-### 4.3 Grok via xAI (OpenAI-Compatible)
+### 4.3 Grok via xAI (OpenAI-Compatible) — confirmed 2026-06-11

-**SDK:** `openai` (already a dependency).
+**Per Grok's consultation (2026-06-11): the OpenAI-compatible endpoint at `https://api.x.ai/v1` is the canonical, fully-featured approach.** xAI's API is "fully compatible and clean" with "no meaningful unique native surface lost" by using the OpenAI-compatible shim. This section was previously labeled "Native REST API" based on a user impression that the native endpoint had unique features (prompt_cache_key, reasoning_effort, server-side tools, cost_in_usd_ticks) that the shim loses; Grok's actual recommendation is that the shim is fine.
+
+**SDK:** `openai` (already a dependency). Set `base_url="https://api.x.ai/v1"` and pass the xAI API key as the Bearer token (handled automatically by the OpenAI SDK).

 **State:**
 ```python
@@ -239,15 +275,15 @@ _grok_history_lock: threading.Lock = threading.Lock()

 **Models shipped in the capability registry (v1):**

-| Model | vision | tool_calling | caching | context_window | cost_input | cost_output |
-|---|---|---|---|---|---|---|
-| `grok-2` | false | true | false | 131,072 | $2.00 | $10.00 |
-| `grok-2-vision` | true | true | false | 32,768 | $2.00 | $10.00 |
-| `grok-beta` | false | true | false | 131,072 | $5.00 | $15.00 |
+| Model | vision | tool_calling | context_window | cost_input | cost_output |
+|---|---|---|---|---|---|
+| `grok-2` | false | true | 131,072 | $2.00 | $10.00 |
+| `grok-2-vision` | true | true | 32,768 | $2.00 | $10.00 |
+| `grok-beta` | false | true | 131,072 | $5.00 | $15.00 |

-(Pricing from x.ai public pricing as of 2026-06-06; update if needed.)
+(Pricing from x.ai public pricing as of 2026-06-06; update if needed. `caching` stays `False` in v1 since Grok's OpenAI-compatible shim doesn't expose `prompt_cache_key`.)

-**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL.
+**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL (via the OpenAI SDK).

 **Tool format:** Native OpenAI. No translation needed.
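Review note: a small sketch of what "the shim is fine" means in practice, assuming the stock `openai` package is pointed at the xAI base URL given above. The helper names are hypothetical and are not the signatures of `_ensure_grok_client()` or `send_openai_compatible()`; the helpers only build arguments, so nothing here touches the network.

```python
XAI_BASE_URL = "https://api.x.ai/v1"  # base URL quoted in section 4.3


def grok_client_kwargs(api_key: str) -> dict:
    """Keyword arguments for constructing an OpenAI SDK client against xAI."""
    if not api_key:
        raise ValueError("an xAI API key is required")
    return {"api_key": api_key, "base_url": XAI_BASE_URL}


def grok_chat_payload(model: str, user_text: str) -> dict:
    """A plain OpenAI-format chat request; no Grok-specific translation."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


# Usage with the openai package (not executed here):
#   from openai import OpenAI
#   client = OpenAI(**grok_client_kwargs(key))
#   reply = client.chat.completions.create(**grok_chat_payload("grok-2", "hello"))
```

Because both the client and the payload are standard OpenAI shapes, the only per-vendor inputs are the base URL, the key, and the model name.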

@@ -466,9 +502,15 @@ Each phase has its own checkpoint commit and git note.

 ## 13. See Also

-### 13.1 Follow-up Track (separate plan)
+### 13.1 Follow-up Tracks (separate plans)

-**"Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+**A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+
+**B. "Llama Native APIs (Ollama native + Meta Llama API)"** — Per §3.1.1's revised assessment (after Grok's consultation), xAI's OpenAI-compatible endpoint is the canonical full-featured approach — NO Grok native refactor is needed. The follow-up for Llama backends is:
+- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. The Phase 3 Red tests are written for the OpenAI-compatible shim; the native tests would mock `requests.post` to `/api/chat`.
+- **Llama (Meta Llama API backend)** → New 4th Llama backend; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
+- **Capability matrix expansion** → Add fields for the new native features per Grok's consultation: `audio`, `video`, `grounding`/`search`, `computer_use`, `local`, `reasoning`/`extended_thinking`, `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
+- **Test rewrites** → The Phase 3 Llama Red tests in `test_llama_provider.py` would be extended with 2 more tests: native Ollama (`/api/chat` with `think` param, `images: list[str]`) and Meta Llama API. The Grok Red tests do NOT need rewriting.
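Review note: to make the difference in the Ollama bullet concrete, the sketch below builds the same image message in both shapes: OpenAI's `image_url` content part with a data URL, and the native `/api/chat` message with an `images` list of bare base64 strings. The function names are hypothetical, and passing `think` straight through follows the bullet above rather than a verified Ollama contract.

```python
import base64
from typing import Optional


def openai_compatible_image_message(
    text: str, image_bytes: bytes, mime: str = "image/png"
) -> dict:
    """User message in OpenAI chat format: the image travels as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


def ollama_native_chat_payload(
    model: str, text: str, image_bytes: bytes, think: Optional[str] = None
) -> dict:
    """Request body for Ollama's native /api/chat: images are bare base64."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": text, "images": [b64]}],
        "stream": False,
    }
    if think is not None:
        payload["think"] = think  # e.g. "low" / "medium" / "high" per the bullet above
    return payload
```

A native-backend test could assert on the second shape by mocking `requests.post` and inspecting the JSON body, as the bullet on Red tests suggests.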

 ### 13.2 Project References

@@ -5,14 +5,15 @@
 track_id = "qwen_llama_grok_integration_20260606"
 name = "Qwen, Llama & Grok Vendor Integration + Capability Matrix"
 status = "active"
-current_phase = 0
-last_updated = "2026-06-06"
+current_phase = 3
+last_updated = "2026-06-11"
+

 [phases]
 # Phase 1: Capability matrix framework + shared helper (no user-facing changes)
-phase_1 = { status = "pending", checkpoint_sha = "", name = "Capability matrix framework + shared helper" }
+phase_1 = { status = "completed", checkpoint_sha = "03da130", name = "Capability matrix framework + shared helper" }
 # Phase 2: Qwen via DashScope
-phase_2 = { status = "pending", checkpoint_sha = "", name = "Qwen via DashScope" }
+phase_2 = { status = "completed", checkpoint_sha = "0f2541a", name = "Qwen via DashScope" }
 # Phase 3: Grok + Llama via shared helper
 phase_3 = { status = "pending", checkpoint_sha = "", name = "Grok + Llama via shared helper" }
 # Phase 4: MiniMax refactor
@@ -25,49 +26,49 @@ phase_6 = { status = "pending", checkpoint_sha = "", name = "Docs + archive" }
 [tasks]
 # Phase 1: Capability matrix framework + shared helper
 # (Tasks TBD by writing-plans; placeholder structure only)
-t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_registry_lookup_known_model" }
-t1_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_fallback_to_vendor_default" }
-t1_3 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_unknown_vendor_raises" }
-t1_4 = { status = "pending", commit_sha = "", description = "Green: implement src/vendor_capabilities.py with VendorCapabilities + get_capabilities + initial registry" }
-t1_5 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_send_non_streaming" }
-t1_6 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_send_streaming_aggregates_chunks" }
-t1_7 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_tool_call_detection" }
-t1_8 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_vision_multimodal_message" }
-t1_9 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_error_classification_429_to_rate_limit" }
-t1_10 = { status = "pending", commit_sha = "", description = "Green: implement src/openai_compatible.py with NormalizedResponse + OpenAICompatibleRequest + send_openai_compatible" }
-t1_11 = { status = "pending", commit_sha = "", description = "Add dashscope>=1.14.0,<2.0.0 to pyproject.toml dependencies" }
-t1_12 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" }
+t1_1 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_registry_lookup_known_model" }
+t1_2 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_fallback_to_vendor_default" }
+t1_3 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_unknown_vendor_raises" }
+t1_4 = { status = "completed", commit_sha = "6be04bc", description = "Green: implement src/vendor_capabilities.py with VendorCapabilities + get_capabilities + initial registry" }
+t1_5 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_send_non_streaming" }
+t1_6 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_send_streaming_aggregates_chunks" }
+t1_7 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_tool_call_detection" }
+t1_8 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_vision_multimodal_message" }
+t1_9 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_error_classification_429_to_rate_limit" }
+t1_10 = { status = "completed", commit_sha = "d7d7d5c", description = "Green: implement src/openai_compatible.py with NormalizedResponse + OpenAICompatibleRequest + send_openai_compatible" }
+t1_11 = { status = "in_progress", commit_sha = "", description = "Add dashscope>=1.14.0,<2.0.0 to pyproject.toml dependencies" }
+t1_12 = { status = "completed", commit_sha = "03da130", description = "Phase 1 checkpoint commit + git note" }
 # Phase 2: Qwen via DashScope
-t2_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_send_qwen_routes_to_dashscope" }
-t2_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_tool_format_translation" }
-t2_3 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_vl_vision_image_base64" }
-t2_4 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_error_classification" }
-t2_5 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_list_qwen_models" }
-t2_6 = { status = "pending", commit_sha = "", description = "Green: implement _send_qwen, _ensure_qwen_client, _classify_qwen_error, _list_qwen_models in src/ai_client.py" }
-t2_7 = { status = "pending", commit_sha = "", description = "Add [qwen] section to credentials_template.toml" }
-t2_8 = { status = "pending", commit_sha = "", description = "Add qwen to PROVIDERS in src/gui_2.py and src/app_controller.py" }
-t2_9 = { status = "pending", commit_sha = "", description = "Add Qwen models to capability registry in src/vendor_capabilities.py" }
-t2_10 = { status = "pending", commit_sha = "", description = "Add Qwen pricing to src/cost_tracker.py" }
-t2_11 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note" }
+t2_1 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_send_qwen_routes_to_dashscope" }
+t2_2 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_tool_format_translation" }
+t2_3 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_vl_vision_image_base64" }
+t2_4 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_error_classification" }
+t2_5 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_list_qwen_models" }
+t2_6 = { status = "completed", commit_sha = "bc2cce1", description = "Green: implement _send_qwen, _ensure_qwen_client, _classify_qwen_error, _list_qwen_models in src/ai_client.py" }
+t2_7 = { status = "cancelled", commit_sha = "ab6b53f", description = "SKIPPED: no credentials_template.toml exists in project; user maintains single credentials.toml directly" }
+t2_8 = { status = "completed", commit_sha = "ab6b53f", description = "Add qwen to PROVIDERS (centralized in src/models.py; gui_2.py and app_controller.py import from there)" }
+t2_9 = { status = "completed", commit_sha = "6be04bc", description = "Add Qwen models to capability registry (DONE in Phase 1 initial population; 8 qwen entries: 1 wildcard + 7 specific)" }
+t2_10 = { status = "completed", commit_sha = "ab6b53f", description = "Add Qwen pricing to src/cost_tracker.py" }
+t2_11 = { status = "completed", commit_sha = "0f2541a", description = "Phase 2 checkpoint commit + git note" }
 # Phase 3: Grok + Llama via shared helper
-t3_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_grok_provider.py::test_send_grok_uses_xai_endpoint" }
-t3_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_grok_provider.py::test_grok_2_vision_vision_support" }
-t3_3 = { status = "pending", commit_sha = "", description = "Green: implement _send_grok, _ensure_grok_client in src/ai_client.py" }
-t3_4 = { status = "pending", commit_sha = "", description = "Add [grok] section to credentials_template.toml" }
-t3_5 = { status = "pending", commit_sha = "", description = "Add grok to PROVIDERS in src/gui_2.py and src/app_controller.py" }
-t3_6 = { status = "pending", commit_sha = "", description = "Add Grok models to capability registry" }
-t3_7 = { status = "pending", commit_sha = "", description = "Add Grok pricing to src/cost_tracker.py" }
-t3_8 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_ollama_backend" }
-t3_9 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_openrouter_backend" }
-t3_10 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_custom_url" }
-t3_11 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_model_discovery_unions_ollama_and_openrouter" }
-t3_12 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_3_2_vision_vision_support" }
-t3_13 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_local_backend_cost_tracking_false" }
-t3_14 = { status = "pending", commit_sha = "", description = "Green: implement _send_llama, _ensure_llama_client, _list_llama_models in src/ai_client.py" }
-t3_15 = { status = "pending", commit_sha = "", description = "Add [llama] section to credentials_template.toml" }
-t3_16 = { status = "pending", commit_sha = "", description = "Add llama to PROVIDERS in src/gui_2.py and src/app_controller.py" }
-t3_17 = { status = "pending", commit_sha = "", description = "Add Llama models to capability registry" }
-t3_18 = { status = "pending", commit_sha = "", description = "Phase 3 checkpoint commit + git note" }
+t3_1 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_grok_provider.py::test_send_grok_uses_xai_endpoint" }
+t3_2 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_grok_provider.py::test_grok_2_vision_vision_support" }
+t3_3 = { status = "completed", commit_sha = "29a96cc", description = "Green: implement _send_grok, _ensure_grok_client in src/ai_client.py" }
+t3_4 = { status = "cancelled", commit_sha = "f9b5c93", description = "SKIPPED: no credentials_template.toml exists; user maintains single credentials.toml directly" }
+t3_5 = { status = "completed", commit_sha = "f9b5c93", description = "Add grok to PROVIDERS (centralized in src/models.py)" }
+t3_6 = { status = "completed", commit_sha = "6be04bc", description = "Add Grok models to capability registry (DONE in Phase 1)" }
+t3_7 = { status = "completed", commit_sha = "f9b5c93", description = "Add Grok pricing to src/cost_tracker.py (3 entries)" }
+t3_8 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_ollama_backend" }
+t3_9 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_openrouter_backend" }
+t3_10 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_custom_url" }
+t3_11 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_model_discovery_unions_ollama_and_openrouter" }
+t3_12 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_3_2_vision_vision_support" }
+t3_13 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_local_backend_cost_tracking_false" }
+t3_14 = { status = "completed", commit_sha = "29a96cc", description = "Green: implement _send_llama, _ensure_llama_client, _list_llama_models, _get_llama_cost_tracking" }
+t3_15 = { status = "cancelled", commit_sha = "f9b5c93", description = "SKIPPED: no credentials_template.toml exists; user maintains single credentials.toml directly" }
+t3_16 = { status = "completed", commit_sha = "f9b5c93", description = "Add llama to PROVIDERS (centralized in src/models.py)" }
+t3_17 = { status = "completed", commit_sha = "6be04bc", description = "Add Llama models to capability registry (DONE in Phase 1; 9 entries: 1 wildcard + 8 models)" }
+t3_18 = { status = "completed", commit_sha = "21adb4a", description = "Phase 3 checkpoint commit + git note" }
 # Phase 4: MiniMax refactor
 t4_1 = { status = "pending", commit_sha = "", description = "Baseline: run tests/test_minimax_provider.py; all pass (green)" }
 t4_2 = { status = "pending", commit_sha = "", description = "Refactor _send_minimax to use send_openai_compatible helper" }
@@ -93,7 +94,7 @@ t6_5 = { status = "pending", commit_sha = "", description = "Final checkpoint co
 # Filled as phases complete
 phase_1_capability_registry_complete = false
 phase_1_shared_helper_complete = false
-phase_2_qwen_dashscope_complete = false
+phase_2_qwen_dashscope_complete = true
 phase_3_grok_complete = false
 phase_3_llama_complete = false
 phase_4_minimax_refactor_preserves_tests = false
@@ -20,6 +20,7 @@ dependencies = [
    "uvicorn~=0.41.0",

    "anthropic~=0.83.0",
+    "dashscope>=1.14.0,<2.0.0",
    "google-genai~=1.64.0",
    "openai~=2.26.0",

@@ -1,30 +0,0 @@
-$total = 0
-$passed = 0
-$failed = 0
-
-$testFiles = Get-ChildItem tests/test_*.py | Select-Object -ExpandProperty Name
-
-Write-Host "Running full test suite..."
-Write-Host "==========================="
-
-foreach ($file in $testFiles) {
-    Write-Host "Testing: $file"
-    $result = uv run pytest "tests/$file" -q --tb=no 2>&1 | Select-String -Pattern "passed|failed"
-    
-    if ($result -match "(\d+) passed") {
-        $p = [int]$matches[1]
-        $passed += $p
-        $total += $p
-    }
-    if ($result -match "(\d+) failed") {
-        $f = [int]$matches[1]
-        $failed += $f
-        $total += $f
-    }
-}
-
-Write-Host ""
-Write-Host "==========================="
-Write-Host "TOTAL: $total tests"
-Write-Host "PASSED: $passed"
-Write-Host "FAILED: $failed"
@@ -131,6 +131,21 @@ _minimax_client:  Any = None
 _minimax_history: list[dict[str, Any]] = []
 _minimax_history_lock: threading.Lock = threading.Lock()

+_qwen_client: Any = None
+_qwen_history: list[dict[str, Any]] = []
+_qwen_history_lock: threading.Lock = threading.Lock()
+_qwen_region: str = "china"
+
+_grok_client: Any = None
+_grok_history: list[dict[str, Any]] = []
+_grok_history_lock: threading.Lock = threading.Lock()
+
+_llama_client: Any = None
+_llama_history: list[dict[str, Any]] = []
+_llama_history_lock: threading.Lock = threading.Lock()
+_llama_base_url: str = "http://localhost:11434/v1"
+_llama_api_key: str = "ollama"
+
 _send_lock: threading.Lock = threading.Lock()

 _BIAS_ENGINE = ToolBiasEngine()
@@ -486,6 +501,7 @@ def reset_session() -> None:
 global _anthropic_client, _anthropic_history
 global _deepseek_client, _deepseek_history
 global _minimax_client, _minimax_history
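+ global _qwen_client, _qwen_history
+ global _grok_client, _grok_history
+ global _llama_client, _llama_history, _llama_base_url, _llama_api_key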
 global _CACHED_ANTHROPIC_TOOLS, _CACHED_DEEPSEEK_TOOLS
 global _gemini_cli_adapter
 if _gemini_client and _gemini_cache:
@@ -513,6 +529,17 @@ def reset_session() -> None:
 _minimax_client    = None
 with _minimax_history_lock:
  _minimax_history       = []
+ _qwen_client    = None
+ with _qwen_history_lock:
+  _qwen_history       = []
+ _grok_client = None
+ with _grok_history_lock:
+  _grok_history = []
+ _llama_client = None
+ with _llama_history_lock:
+  _llama_history = []
+ _llama_base_url = "http://localhost:11434/v1"
+ _llama_api_key = "ollama"
 _CACHED_ANTHROPIC_TOOLS = None
 _CACHED_DEEPSEEK_TOOLS  = None
 file_cache.reset_client()
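reset_session rebinds module-level names, so each name it assigns has to appear in a `global` statement. Without that, Python treats the assignment as a new local and leaves the module state untouched. A minimal, self-contained sketch of that behaviour follows; `_client`, `reset_without_global` and `reset_with_global` are hypothetical names used only for illustration and are not part of this change.

```python
from typing import Any

_client: Any = "stale-client"  # stand-in for a cached provider client


def reset_without_global() -> None:
    # No `global` statement: this assignment creates a function-local name
    # and leaves the module-level `_client` unchanged.
    _client = None


def reset_with_global() -> None:
    # Declaring the name global makes the assignment rebind module state.
    global _client
    _client = None


reset_without_global()
after_local_assignment = _client   # still the stale value

reset_with_global()
after_global_assignment = _client  # now actually reset
```

The same reasoning applies to the history lists: rebinding with `= []` needs the declaration, whereas an in-place `.clear()` would not.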
@@ -527,6 +554,9 @@ def list_models(provider: str) -> list[str]:
 elif provider == "deepseek":   return _list_deepseek_models(creds["deepseek"]["api_key"])
 elif provider == "gemini_cli": return _list_gemini_cli_models()
 elif provider == "minimax":    return _list_minimax_models(creds["minimax"]["api_key"])
+ elif provider == "qwen":       return _list_qwen_models()
+ elif provider == "grok":       return _list_grok_models()
+ elif provider == "llama":      return _list_llama_models()
 return []

 #endregion: Comms Log
@@ -2140,6 +2170,58 @@ def _ensure_minimax_client() -> None:
   raise ValueError("MiniMax API key not found in credentials.toml")
  _minimax_client = OpenAI(api_key=api_key, base_url="https://api.minimax.chat/v1")

+def _ensure_grok_client() -> Any:
+ global _grok_client
+ if _grok_client is None:
+  openai = _require_warmed("openai")
+  creds = _load_credentials()
+  api_key = creds.get("grok", {}).get("api_key")
+  if not api_key:
+   raise ValueError("Grok API key not found in credentials.toml")
+  _grok_client = openai.OpenAI(api_key=api_key, base_url="https://api.x.ai/v1")
+ return _grok_client
+
+def _send_grok(md_content: str, user_message: str, base_dir: str,
+ file_items: list[dict[str, Any]] | None = None,
+ discussion_history: str = "",
+ stream: bool = False,
+ pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
+ qa_callback: Optional[Callable[[str], str]] = None,
+ stream_callback: Optional[Callable[[str], None]] = None,
+ patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
+ client = _ensure_grok_client()
+ from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
+ from src.vendor_capabilities import get_capabilities
+ with _grok_history_lock:
+  user_content = user_message
+  if file_items:
+   for fi in file_items:
+    if fi.get("is_image") and fi.get("base64_data"):
+     user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
+  if discussion_history and not _grok_history:
+   _grok_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_content}"})
+  else:
+   _grok_history.append({"role": "user", "content": user_content})
+  messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
+  messages.extend(_grok_history)
+ request = OpenAICompatibleRequest(
+  messages=messages,
+  model=_model,
+  temperature=_temperature,
+  top_p=_top_p,
+  max_tokens=_max_tokens,
+  stream=stream,
+  stream_callback=stream_callback,
+ )
+ caps = get_capabilities("grok", _model)
+ response = send_openai_compatible(client, request, capabilities=caps)
+ with _grok_history_lock:
+  _grok_history.append({"role": "assistant", "content": response.text})
+ return response.text
+
+def _list_grok_models() -> list[str]:
+ from src.vendor_capabilities import list_models_for_vendor
+ return list_models_for_vendor("grok")
+
 def _send_minimax(md_content: str, user_message: str, base_dir: str,
 file_items: list[dict[str, Any]] | None = None,
 discussion_history: str = "",
@@ -2148,227 +2230,221 @@ def _send_minimax(md_content: str, user_message: str, base_dir: str,
 qa_callback: Optional[Callable[[str], str]] = None,
 stream_callback: Optional[Callable[[str], None]] = None,
 patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
- """
- [C: src/ai_server.py:_handle_send]
- """
- openai = _require_warmed("openai")
- requests = _require_warmed("requests")
- try:
-  mcp_client.configure(file_items or [], [base_dir])
-  creds = _load_credentials()
-  api_key = creds.get("minimax", {}).get("api_key")
-  if not api_key:
-   raise ValueError("MiniMax API key not found in credentials.toml")
-  
-  client = OpenAI(api_key=api_key, base_url="https://api.minimax.io/v1")
-  
-  with _minimax_history_lock:
-   _repair_minimax_history(_minimax_history)
-   if discussion_history and not _minimax_history:
-    user_content = f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"
-   else:
-    user_content = user_message
-   _minimax_history.append({"role": "user", "content": user_content})
-  
-  all_text_parts: list[str] = []
-  _cumulative_tool_bytes = 0
-  
-  for round_idx in range(MAX_TOOL_ROUNDS + 2):
-   current_api_messages: list[dict[str, Any]] = []
-   
-   sys_msg = {"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}
-   current_api_messages.append(sys_msg)
-   
-   with _minimax_history_lock:
-    dropped = _trim_minimax_history([sys_msg], _minimax_history)
-    if dropped > 0:
-     _append_comms("OUT", "request", {"message": f"[MINIMAX HISTORY TRIMMED: dropped {dropped} old messages]"})
-
-    for i, msg in enumerate(_minimax_history):
-     role = msg.get("role")
-     api_msg = {"role": role}
-     
-     content = msg.get("content")
-     if role == "assistant":
-      if msg.get("tool_calls"):
-       api_msg["content"] = content or None
-       api_msg["tool_calls"] = msg["tool_calls"]
-      else:
-       api_msg["content"] = content or ""
-     elif role == "tool":
-      api_msg["content"] = content or ""
-      api_msg["tool_call_id"] = msg.get("tool_call_id")
-     else:
-      api_msg["content"] = content or ""
-     
-     current_api_messages.append(api_msg)
-   
-   request_payload: dict[str, Any] = {
-    "model": _model,
-    "messages": current_api_messages,
-    "stream": stream,
-    "extra_body": {"reasoning_split": True},
-   }
-   
-   if stream:
-    request_payload["stream_options"] = {"include_usage": True}
-   
-   request_payload["temperature"] = 1.0
-   request_payload["top_p"] = _top_p
-   request_payload["max_tokens"] = min(_max_tokens, 8192)
-   
-   tools = _get_deepseek_tools()
-   if tools:
-    request_payload["tools"] = tools
-   
-   events.emit("request_start", payload={"provider": "minimax", "model": _model, "round": round_idx, "streaming": stream})
-   
-   try:
-    response = client.chat.completions.create(**request_payload, timeout=120)
-   except Exception as e:
-    raise _classify_minimax_error(e) from e
-   
-   assistant_text = ""
-   tool_calls_raw = []
-   reasoning_content = ""
-   finish_reason = "stop"
-   usage = {}
-   
-   if stream:
-    aggregated_content = ""
-    aggregated_tool_calls: list[dict[str, Any]] = []
-    aggregated_reasoning = ""
-    current_usage: dict[str, Any] = {}
-    final_finish_reason = "stop"
-    
-    for chunk in response:
-     if not chunk.choices:
-      if chunk.usage:
-       current_usage = chunk.usage.model_dump()
-      continue
-     
-     delta = chunk.choices[0].delta
-     if delta.content:
-      content_chunk = delta.content
-      aggregated_content += content_chunk
-      if stream_callback:
-       stream_callback(content_chunk)
-     
-     if hasattr(delta, "reasoning_details") and delta.reasoning_details:
-      for detail in delta.reasoning_details:
-       if "text" in detail:
-        aggregated_reasoning += detail["text"]
-     
-     if delta.tool_calls:
-      for tc_delta in delta.tool_calls:
-       idx = tc_delta.index
-       while len(aggregated_tool_calls) <= idx:
-        aggregated_tool_calls.append({"id": "", "type": "function", "function": {"name": "", "arguments": ""}})
-       target = aggregated_tool_calls[idx]
-       if tc_delta.id:
-        target["id"] = tc_delta.id
-       if tc_delta.function and tc_delta.function.name:
-        target["function"]["name"] += tc_delta.function.name
-       if tc_delta.function and tc_delta.function.arguments:
-        target["function"]["arguments"] += tc_delta.function.arguments
-     
-     if chunk.choices[0].finish_reason:
-      final_finish_reason = chunk.choices[0].finish_reason
-     if chunk.usage:
-      current_usage = chunk.usage.model_dump()
-    
-    assistant_text = aggregated_content
-    tool_calls_raw = aggregated_tool_calls
-    reasoning_content = aggregated_reasoning
-    finish_reason = final_finish_reason
-    usage = current_usage
-   else:
-    choice = response.choices[0]
-    message = choice.message
-    assistant_text = message.content or ""
-    tool_calls_raw = message.tool_calls or []
-    if hasattr(message, "reasoning_details") and message.reasoning_details:
-     reasoning_content = message.reasoning_details[0].get("text", "") if message.reasoning_details else ""
-    finish_reason = choice.finish_reason or "stop"
-    usage = response.usage.model_dump() if response.usage else {}
-   
-   thinking_tags = ""
-   if reasoning_content:
-    thinking_tags = f"<thinking>\n{reasoning_content}\n</thinking>\n"
-   full_assistant_text = thinking_tags + assistant_text
-   
-   with _minimax_history_lock:
-    msg_to_store: dict[str, Any] = {"role": "assistant", "content": assistant_text or None}
-    if reasoning_content:
-     msg_to_store["reasoning_content"] = reasoning_content
-    if tool_calls_raw:
-     msg_to_store["tool_calls"] = tool_calls_raw
-    _minimax_history.append(msg_to_store)
-   
-   if full_assistant_text:
-    all_text_parts.append(full_assistant_text)
-   
-   _append_comms("IN", "response", {
-     "round": round_idx,
-     "stop_reason": finish_reason,
-     "text": full_assistant_text,
-     "tool_calls": tool_calls_raw,
-     "usage": usage,
-     "streaming": stream
-    })
-   
-   if finish_reason != "tool_calls" and not tool_calls_raw:
-    break
-   if round_idx > MAX_TOOL_ROUNDS:
-    break
-   
-   try:
-    loop = asyncio.get_running_loop()
-    results = asyncio.run_coroutine_threadsafe(
-     _execute_tool_calls_concurrently(tool_calls_raw, base_dir, pre_tool_callback, qa_callback, round_idx, "minimax", patch_callback),
-     loop
-    ).result()
-   except RuntimeError:
-    results = asyncio.run(_execute_tool_calls_concurrently(tool_calls_raw, base_dir, pre_tool_callback, qa_callback, round_idx, "minimax", patch_callback))
-   
-   tool_results_for_history: list[dict[str, Any]] = []
-   for i, (name, call_id, out, _) in enumerate(results):
-    if i == len(results) - 1:
-     if file_items:
-      file_items, changed = _reread_file_items(file_items)
-      ctx = _build_file_diff_text(changed)
-      if ctx:
-       out += f"\n\n{_get_context_marker()}\n\n{ctx}"
-     if round_idx == MAX_TOOL_ROUNDS:
-      out += "\n\n[SYSTEM: MAX ROUNDS. PROVIDE FINAL ANSWER.]"
-    
-    truncated = _truncate_tool_output(out)
-    _cumulative_tool_bytes += len(truncated)
-    tool_results_for_history.append({
-      "role": "tool",
-      "tool_call_id": call_id,
-      "content": truncated,
-     })
-    _append_comms("IN", "tool_result", {"name": name, "id": call_id, "output": out})
-    events.emit("tool_execution", payload={"status": "completed", "tool": name, "result": out, "round": round_idx})
-   
-   if _cumulative_tool_bytes > _MAX_TOOL_OUTPUT_BYTES:
-    tool_results_for_history.append({
-      "role": "user",
-      "content": f"SYSTEM WARNING: Cumulative tool output exceeded {_MAX_TOOL_OUTPUT_BYTES // 1000}KB budget. Provide your final answer now."
-     })
-    _append_comms("OUT", "request", {"message": f"[TOOL OUTPUT BUDGET EXCEEDED: {_cumulative_tool_bytes} bytes]"})
-   
-   with _minimax_history_lock:
-    for tr in tool_results_for_history:
-     _minimax_history.append(tr)
-   
-  return "\n\n".join(all_text_parts) if all_text_parts else "(No text returned)"
- except Exception as e:
-  raise _classify_minimax_error(e) from e
+ _ensure_minimax_client()
+ from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
+ from src.vendor_capabilities import get_capabilities
+ with _minimax_history_lock:
+  _repair_minimax_history(_minimax_history)
+  if discussion_history and not _minimax_history:
+   _minimax_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"})
+  else:
+   _minimax_history.append({"role": "user", "content": user_message})
+  messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
+  messages.extend(_minimax_history)
+ request = OpenAICompatibleRequest(
+  messages=messages,
+  model=_model,
+  temperature=_temperature,
+  top_p=_top_p,
+  max_tokens=min(_max_tokens, 8192),
+  stream=stream,
+  stream_callback=stream_callback,
+ )
+ caps = get_capabilities("minimax", _model)
+ response = send_openai_compatible(_minimax_client, request, capabilities=caps)
+ reasoning_content = ""
+ if response.raw_response and hasattr(response.raw_response, "choices"):
+  choice = response.raw_response.choices[0]
+  if hasattr(choice.message, "reasoning_details") and choice.message.reasoning_details:
+   reasoning_content = choice.message.reasoning_details[0].get("text", "")
+ thinking_tags = ""
+ if reasoning_content:
+  thinking_tags = f"<thinking>\n{reasoning_content}\n</thinking>\n"
+ full_text = thinking_tags + response.text
+ with _minimax_history_lock:
+  msg_to_store: dict[str, Any] = {"role": "assistant", "content": response.text or None}
+  if reasoning_content:
+   msg_to_store["reasoning_content"] = reasoning_content
+  _minimax_history.append(msg_to_store)
+ return full_text

 #endregion: MiniMax Provider

+#region: Qwen Provider
+
+def _ensure_qwen_client() -> None:
+ global _qwen_client, _qwen_region
+ if _qwen_client is None:
+  import dashscope
+  creds = _load_credentials()
+  api_key = creds.get("qwen", {}).get("api_key")
+  if not api_key:
+   raise ValueError("Qwen API key not found in credentials.toml")
+  _qwen_region = creds.get("qwen", {}).get("region", "china")
+  if _qwen_region == "international":
+   dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
+  else:
+   dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
+  dashscope.api_key = api_key
+  _qwen_client = dashscope.Generation
+
+def _dashscope_call(
+ model: str,
+ messages: list[dict[str, Any]],
+ tools: list[dict[str, Any]] | None,
+ *,
+ max_tokens: int,
+ temperature: float,
+ top_p: float,
+) -> dict[str, Any]:
+ import dashscope
+ from src.qwen_adapter import build_dashscope_tools
+ kwargs: dict[str, Any] = {
+  "model": model,
+  "messages": messages,
+  "max_tokens": max_tokens,
+  "temperature": temperature,
+  "top_p": top_p,
+  "result_format": "message",
+ }
+ if tools:
+  kwargs["tools"] = build_dashscope_tools(tools)
+ resp = dashscope.Generation.call(**kwargs)
+ if getattr(resp, "status_code", 200) != 200:
+  from src.qwen_adapter import classify_dashscope_error
+  raise classify_dashscope_error(_dashscope_exception_from_response(resp))
+ return {
+  "text": resp.output.text if hasattr(resp, "output") and resp.output else "",
+  "tool_calls": _extract_dashscope_tool_calls(resp),
+  "usage": {
+   "input_tokens": getattr(resp.usage, "input_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0,
+   "output_tokens": getattr(resp.usage, "output_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0,
+  },
+ }
+
+def _dashscope_exception_from_response(resp: Any) -> Exception:
+ msg = getattr(resp, "message", "unknown dashscope error")
+ return RuntimeError(msg)
+
+def _extract_dashscope_tool_calls(resp: Any) -> list[dict[str, Any]]:
+ out: list[dict[str, Any]] = []
+ if not (hasattr(resp, "output") and resp.output and getattr(resp.output, "tool_calls", None)):
+  return out
+ for tc in resp.output.tool_calls:
+  out.append({
+   "id": getattr(tc, "id", ""),
+   "type": "function",
+   "function": {
+    "name": getattr(tc.function, "name", "") if hasattr(tc, "function") else "",
+    "arguments": getattr(tc.function, "arguments", "{}") if hasattr(tc, "function") else "{}",
+   },
+  })
+ return out
+
+def _list_qwen_models() -> list[str]:
+ from src.vendor_capabilities import list_models_for_vendor
+ return list_models_for_vendor("qwen")
+
+def _send_qwen(md_content: str, user_message: str, base_dir: str,
+ file_items: list[dict[str, Any]] | None = None,
+ discussion_history: str = "",
+ stream: bool = False,
+ pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
+ qa_callback: Optional[Callable[[str], str]] = None,
+ stream_callback: Optional[Callable[[str], None]] = None,
+ patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
+ _ensure_qwen_client()
+ with _qwen_history_lock:
+  user_content = user_message
+  if file_items:
+   for fi in file_items:
+    if fi.get("is_image") and fi.get("base64_data"):
+     user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
+  if discussion_history and not _qwen_history:
+   _qwen_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_content}"})
+  else:
+   _qwen_history.append({"role": "user", "content": user_content})
+  messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
+  messages.extend(_qwen_history)
+ resp = _dashscope_call(
+  model=_model,
+  messages=messages,
+  tools=None,
+  max_tokens=_max_tokens,
+  temperature=_temperature,
+  top_p=_top_p,
+ )
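+ text = resp.get("text", "")
+ with _qwen_history_lock:
+  _qwen_history.append({"role": "assistant", "content": text})
+ return text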
+
+#endregion: Qwen Provider
+
+#region: Llama Provider
+
+def _ensure_llama_client() -> Any:
+ global _llama_client, _llama_base_url, _llama_api_key
+ if _llama_client is None:
+  openai = _require_warmed("openai")
+  creds = _load_credentials()
+  configured_url = creds.get("llama", {}).get("base_url")
+  configured_key = creds.get("llama", {}).get("api_key")
+  if configured_url:
+   _llama_base_url = configured_url
+  if configured_key is not None:
+   _llama_api_key = configured_key or "ollama"
+  _llama_client = openai.OpenAI(api_key=_llama_api_key, base_url=_llama_base_url)
+ return _llama_client
+
+def _send_llama(md_content: str, user_message: str, base_dir: str,
+ file_items: list[dict[str, Any]] | None = None,
+ discussion_history: str = "",
+ stream: bool = False,
+ pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
+ qa_callback: Optional[Callable[[str], str]] = None,
+ stream_callback: Optional[Callable[[str], None]] = None,
+ patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
+ client = _ensure_llama_client()
+ from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
+ from src.vendor_capabilities import get_capabilities
+ with _llama_history_lock:
+  user_content = user_message
+  if file_items:
+   for fi in file_items:
+    if fi.get("is_image") and fi.get("base64_data"):
+     user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
+  if discussion_history and not _llama_history:
+   _llama_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_content}"})
+  else:
+   _llama_history.append({"role": "user", "content": user_content})
+  messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
+  messages.extend(_llama_history)
+ request = OpenAICompatibleRequest(
+  messages=messages,
+  model=_model,
+  temperature=_temperature,
+  top_p=_top_p,
+  max_tokens=_max_tokens,
+  stream=stream,
+  stream_callback=stream_callback,
+ )
+ caps = get_capabilities("llama", _model)
+ response = send_openai_compatible(client, request, capabilities=caps)
+ with _llama_history_lock:
+  _llama_history.append({"role": "assistant", "content": response.text})
+ return response.text
+
+def _list_llama_models() -> list[str]:
+ from src.vendor_capabilities import list_models_for_vendor
+ return list_models_for_vendor("llama")
+
+def _get_llama_cost_tracking() -> bool:
+ if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
+  return False
+ from src.vendor_capabilities import get_capabilities
+ try:
+  caps = get_capabilities("llama", _model)
+  return caps.cost_tracking
+ except KeyError:
+  return True
+
+#endregion: Llama Provider
+
 #region: Tier 4 Analysis

 def run_tier4_analysis(stderr: str) -> str:
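_get_llama_cost_tracking treats a backend as local when the base URL contains the substring "localhost" or "127.0.0.1". A stricter variant could compare the parsed hostname instead, so that a remote host whose name merely contains "localhost" is not misclassified. The sketch below is one possible approach, not the code in this change; `is_local_endpoint` and `_LOCAL_HOSTS` are hypothetical names, and the example URLs are placeholders.

```python
from urllib.parse import urlparse

# Hostnames treated as "local" in this sketch (assumption: loopback only).
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_endpoint(base_url: str) -> bool:
    """Return True when the URL's hostname is a loopback name or address."""
    host = urlparse(base_url).hostname  # lower-cased, port and brackets stripped
    return host in _LOCAL_HOSTS


local_default = is_local_endpoint("http://localhost:11434/v1")
local_ipv6 = is_local_endpoint("http://[::1]:8080/v1")
remote_lookalike = is_local_endpoint("https://localhost.example.com/v1")
remote_plain = is_local_endpoint("https://example.com/v1")
```

A plain substring test would report the `localhost.example.com` URL as local; the hostname comparison does not.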
@@ -43,6 +43,24 @@ MODEL_PRICING = [
 (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
 (r"claude-.*-opus", {"input_per_mtok": 15.0, "output_per_mtok": 75.0}),
 (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
+ (r"qwen-turbo", {"input_per_mtok": 0.05, "output_per_mtok": 0.10}),
+ (r"qwen-plus", {"input_per_mtok": 0.40, "output_per_mtok": 1.20}),
+ (r"qwen-max", {"input_per_mtok": 2.00, "output_per_mtok": 6.00}),
+ (r"qwen-long", {"input_per_mtok": 0.07, "output_per_mtok": 0.28}),
+ (r"qwen-vl-plus", {"input_per_mtok": 0.21, "output_per_mtok": 0.63}),
+ (r"qwen-vl-max", {"input_per_mtok": 0.50, "output_per_mtok": 1.50}),
+ (r"qwen-audio", {"input_per_mtok": 0.10, "output_per_mtok": 0.30}),
+ (r"grok-2", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
+ (r"grok-2-vision", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
+ (r"grok-beta", {"input_per_mtok": 5.00, "output_per_mtok": 15.00}),
+ (r"llama-3\.1-8b-instant", {"input_per_mtok": 0.05, "output_per_mtok": 0.08}),
+ (r"llama-3\.1-70b-versatile", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
+ (r"llama-3\.1-405b-reasoning", {"input_per_mtok": 3.00, "output_per_mtok": 3.00}),
+ (r"llama-3\.2-1b-preview", {"input_per_mtok": 0.04, "output_per_mtok": 0.04}),
+ (r"llama-3\.2-3b-preview", {"input_per_mtok": 0.06, "output_per_mtok": 0.06}),
+ (r"llama-3\.2-11b-vision-preview", {"input_per_mtok": 0.18, "output_per_mtok": 0.18}),
+ (r"llama-3\.2-90b-vision-preview", {"input_per_mtok": 0.90, "output_per_mtok": 0.90}),
+ (r"llama-3\.3-70b-specdec", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
 ]

 def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
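The body of estimate_cost is not shown in this hunk. If the lookup walks MODEL_PRICING in order and stops at the first pattern that matches, then `grok-2` will also match `grok-2-vision` and the later, more specific entry is never reached. That is harmless here because both entries carry the same rates, but it matters once they diverge. The sketch below assumes a first-match `re.search` lookup and a 0.0 fallback for unknown models; `first_matching_pattern` and `estimate` are hypothetical helpers, and only the three Grok rows are copied from the table above.

```python
import re
from typing import Optional

# Subset of the pricing table, in the same order as in the diff.
PRICING = [
    (r"grok-2", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
    (r"grok-2-vision", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
    (r"grok-beta", {"input_per_mtok": 5.00, "output_per_mtok": 15.00}),
]


def first_matching_pattern(model: str) -> Optional[str]:
    """Return the first pattern that matches, mimicking a first-match lookup."""
    for pattern, _rates in PRICING:
        if re.search(pattern, model):
            return pattern
    return None


def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD under the first-match assumption; 0.0 for unknown models."""
    for pattern, rates in PRICING:
        if re.search(pattern, model):
            return (input_tokens / 1_000_000) * rates["input_per_mtok"] \
                + (output_tokens / 1_000_000) * rates["output_per_mtok"]
    return 0.0


shadowing_pattern = first_matching_pattern("grok-2-vision")
beta_cost = estimate("grok-beta", 1_000_000, 500_000)
unknown_cost = estimate("not-a-model", 1_000, 1_000)
```

Listing specific patterns before general ones, or anchoring patterns with `$`, would avoid the shadowing under this assumption.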
@@ -53,7 +53,7 @@ from src.paths      import get_config_path

 #region: Constants

-PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax"]
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]

 AGENT_TOOL_NAMES: List[str] = [
 "run_powershell",
@@ -0,0 +1,144 @@
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Any, Callable, Optional
+
+from openai import OpenAIError, RateLimitError, AuthenticationError, PermissionDeniedError, APIConnectionError, APIStatusError, BadRequestError
+
+@dataclass(frozen=True)
+class NormalizedResponse:
+ text: str
+ tool_calls: list[dict[str, Any]]
+ usage_input_tokens: int
+ usage_output_tokens: int
+ usage_cache_read_tokens: int
+ usage_cache_creation_tokens: int
+ raw_response: Any
+
+@dataclass
+class OpenAICompatibleRequest:
+ messages: list[dict[str, Any]]
+ model: str
+ temperature: float = 0.0
+ top_p: float = 1.0
+ max_tokens: int = 8192
+ tools: Optional[list[dict[str, Any]]] = None
+ tool_choice: str = "auto"
+ stream: bool = False
+ stream_callback: Optional[Callable[[str], None]] = None
+
+def _to_dict_tool_call(tc: Any) -> dict[str, Any]:
+ return {
+  "id": getattr(tc, "id", None),
+  "type": getattr(tc, "type", "function"),
+  "function": {
+   "name": getattr(tc.function, "name", None),
+   "arguments": getattr(tc.function, "arguments", "{}"),
+  },
+ }
+
+def _classify_openai_compatible_error(exc: Exception) -> "ProviderError":
+ from src.ai_client import ProviderError
+ if isinstance(exc, RateLimitError):
+  return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc)
+ if isinstance(exc, (AuthenticationError, PermissionDeniedError)):
+  return ProviderError(kind="auth", provider="openai_compatible", original=exc)
+ if isinstance(exc, APIConnectionError):
+  return ProviderError(kind="network", provider="openai_compatible", original=exc)
+ if isinstance(exc, APIStatusError):
+  code = getattr(exc, "status_code", 0)
+  if code == 402:
+   return ProviderError(kind="balance", provider="openai_compatible", original=exc)
+  if code == 429:
+   return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc)
+  if code in (401, 403):
+   return ProviderError(kind="auth", provider="openai_compatible", original=exc)
+  if code in (500, 502, 503, 504):
+   return ProviderError(kind="network", provider="openai_compatible", original=exc)
+ if isinstance(exc, BadRequestError):
+  return ProviderError(kind="quota", provider="openai_compatible", original=exc)
+ return ProviderError(kind="unknown", provider="openai_compatible", original=exc)
+
+def send_openai_compatible(
+ client: Any,
+ request: OpenAICompatibleRequest,
+ *,
+ capabilities: Any,
+) -> NormalizedResponse:
+ kwargs: dict[str, Any] = {
+  "model": request.model,
+  "messages": request.messages,
+  "temperature": request.temperature,
+  "top_p": request.top_p,
+  "max_tokens": request.max_tokens,
+  "stream": request.stream,
+ }
+ if request.tools is not None:
+  kwargs["tools"] = request.tools
+  kwargs["tool_choice"] = request.tool_choice
+ try:
+  if request.stream:
+   return _send_streaming(client, kwargs, request.stream_callback)
+  return _send_blocking(client, kwargs)
+ except OpenAIError as exc:
+  raise _classify_openai_compatible_error(exc) from exc
+
+def _send_blocking(client: Any, kwargs: dict[str, Any]) -> NormalizedResponse:
+ resp = client.chat.completions.create(**kwargs)
+ msg = resp.choices[0].message
+ tool_calls_raw = msg.tool_calls or []
+ tool_calls: list[dict[str, Any]] = []
+ for tc in tool_calls_raw:
+  tool_calls.append(_to_dict_tool_call(tc))
+ usage = getattr(resp, "usage", None)
+ return NormalizedResponse(
+  text=msg.content or "",
+  tool_calls=tool_calls,
+  usage_input_tokens=int(getattr(usage, "prompt_tokens", 0) or 0),
+  usage_output_tokens=int(getattr(usage, "completion_tokens", 0) or 0),
+  usage_cache_read_tokens=0,
+  usage_cache_creation_tokens=0,
+  raw_response=resp,
+ )
+
+def _send_streaming(client: Any, kwargs: dict[str, Any], callback: Optional[Callable[[str], None]]) -> NormalizedResponse:
+ kwargs_stream = dict(kwargs)
+ kwargs_stream["stream"] = True
+ kwargs_stream["stream_options"] = {"include_usage": True}
+ chunks_iter = client.chat.completions.create(**kwargs_stream)
+ text_parts: list[str] = []
+ tool_calls_acc: dict[int, dict[str, Any]] = {}
+ usage_input = 0
+ usage_output = 0
+ for chunk in chunks_iter:
+  for choice in getattr(chunk, "choices", []) or []:
+   delta = getattr(choice, "delta", None)
+   if delta is None:
+    continue
+   if delta.content:
+    text_parts.append(delta.content)
+    if callback:
+     callback(delta.content)
+   for tc in getattr(delta, "tool_calls", None) or []:
+    idx = getattr(tc, "index", 0)
+    if idx not in tool_calls_acc:
+     tool_calls_acc[idx] = {"id": None, "type": "function", "function": {"name": None, "arguments": ""}}
+    if getattr(tc, "id", None):
+     tool_calls_acc[idx]["id"] = tc.id
+    if getattr(tc, "function", None):
+     if tc.function.name:
+      tool_calls_acc[idx]["function"]["name"] = tc.function.name
+     if tc.function.arguments:
+      tool_calls_acc[idx]["function"]["arguments"] += tc.function.arguments
+  chunk_usage = getattr(chunk, "usage", None)
+  if chunk_usage is not None:
+   usage_input = int(getattr(chunk_usage, "prompt_tokens", 0) or 0)
+   usage_output = int(getattr(chunk_usage, "completion_tokens", 0) or 0)
+ return NormalizedResponse(
+  text="".join(text_parts),
+  tool_calls=[tool_calls_acc[k] for k in sorted(tool_calls_acc.keys())],
+  usage_input_tokens=usage_input,
+  usage_output_tokens=usage_output,
+  usage_cache_read_tokens=0,
+  usage_cache_creation_tokens=0,
+  raw_response=None,
+ )
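_send_streaming rebuilds tool calls from streamed deltas by keying on each delta's `index`: the `id` and function name are set when they appear, and the argument fragments are concatenated in arrival order. The sketch below isolates that accumulation step so it can be run without a client. `accumulate_tool_calls` is a hypothetical helper, and the `SimpleNamespace` objects only imitate the attribute shape of streamed tool-call deltas.

```python
import json
from types import SimpleNamespace
from typing import Any, Iterable


def accumulate_tool_calls(deltas: Iterable[Any]) -> list[dict[str, Any]]:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    acc: dict[int, dict[str, Any]] = {}
    for tc in deltas:
        idx = getattr(tc, "index", 0)
        slot = acc.setdefault(
            idx,
            {"id": None, "type": "function", "function": {"name": None, "arguments": ""}},
        )
        if getattr(tc, "id", None):
            slot["id"] = tc.id
        fn = getattr(tc, "function", None)
        if fn is not None:
            if getattr(fn, "name", None):
                slot["function"]["name"] = fn.name
            if getattr(fn, "arguments", None):
                # Argument JSON arrives in pieces; concatenate in arrival order.
                slot["function"]["arguments"] += fn.arguments
    return [acc[k] for k in sorted(acc)]


# Fake deltas: call 0 has its arguments split across two chunks.
deltas = [
    SimpleNamespace(index=0, id="call_1",
                    function=SimpleNamespace(name="read_file", arguments='{"pa')),
    SimpleNamespace(index=0, id=None,
                    function=SimpleNamespace(name=None, arguments='th": "a.py"}')),
    SimpleNamespace(index=1, id="call_2",
                    function=SimpleNamespace(name="list_dir", arguments="{}")),
]
calls = accumulate_tool_calls(deltas)
first_args = json.loads(calls[0]["function"]["arguments"])
```

The argument string is only valid JSON once the stream has finished, so parsing is deferred until after accumulation.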
@@ -0,0 +1,37 @@
+from __future__ import annotations
+from typing import Any
+import dashscope
+from dashscope.common.error import (
+ AuthenticationError,
+ InvalidParameter,
+ RequestFailure,
+ ServiceUnavailableError,
+ TimeoutException,
+)
+from src.ai_client import ProviderError
+
+def build_dashscope_tools(openai_tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
+ out: list[dict[str, Any]] = []
+ for t in openai_tools:
+  if t.get("type") != "function":
+   continue
+  fn = t.get("function", {})
+  out.append({
+   "name": fn.get("name", ""),
+   "description": fn.get("description", ""),
+   "parameters": fn.get("parameters", {"type": "object", "properties": {}}),
+  })
+ return out
+
+def classify_dashscope_error(exc: Exception) -> ProviderError:
+ if isinstance(exc, AuthenticationError):
+  return ProviderError(kind="auth", provider="qwen", original=exc)
+ if isinstance(exc, TimeoutException):
+  return ProviderError(kind="network", provider="qwen", original=exc)
+ if isinstance(exc, ServiceUnavailableError):
+  return ProviderError(kind="network", provider="qwen", original=exc)
+ if isinstance(exc, InvalidParameter):
+  return ProviderError(kind="quota", provider="qwen", original=exc)
+ if isinstance(exc, RequestFailure):
+  return ProviderError(kind="network", provider="qwen", original=exc)
+ return ProviderError(kind="unknown", provider="qwen", original=exc)
@@ -0,0 +1,55 @@
+from __future__ import annotations
+from dataclasses import dataclass
+
+@dataclass(frozen=True)
+class VendorCapabilities:
+ vendor: str
+ model: str
+ vision: bool = False
+ tool_calling: bool = True
+ caching: bool = False
+ streaming: bool = True
+ model_discovery: bool = True
+ context_window: int = 8192
+ cost_tracking: bool = True
+ cost_input_per_mtok: float = 0.0
+ cost_output_per_mtok: float = 0.0
+ notes: str = ''
+
+_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}
+
+def register(cap: VendorCapabilities) -> None:
+ _REGISTRY[(cap.vendor, cap.model)] = cap
+
+def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
+ if (vendor, model) in _REGISTRY:
+  return _REGISTRY[(vendor, model)]
+ if (vendor, '*') in _REGISTRY:
+  return _REGISTRY[(vendor, '*')]
+ raise KeyError(f'No capabilities registered for vendor={vendor!r} model={model!r}')
+
+def list_models_for_vendor(vendor: str) -> list[str]:
+ return sorted({m for v, m in _REGISTRY if v == vendor and m != '*'})
+
+register(VendorCapabilities(vendor='minimax', model='*', context_window=131072, cost_input_per_mtok=0.20, cost_output_per_mtok=0.20))
+register(VendorCapabilities(vendor='grok', model='*', context_window=131072, cost_input_per_mtok=2.00, cost_output_per_mtok=10.00))
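+register(VendorCapabilities(vendor='grok', model='grok-2', context_window=131072, cost_input_per_mtok=2.00, cost_output_per_mtok=10.00))
+register(VendorCapabilities(vendor='grok', model='grok-2-vision', vision=True, context_window=32768, cost_input_per_mtok=2.00, cost_output_per_mtok=10.00))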
+register(VendorCapabilities(vendor='grok', model='grok-beta', context_window=131072, cost_input_per_mtok=5.00, cost_output_per_mtok=15.00))
+register(VendorCapabilities(vendor='llama', model='*', context_window=131072))
+register(VendorCapabilities(vendor='llama', model='llama-3.1-8b-instant', context_window=131072, cost_input_per_mtok=0.05, cost_output_per_mtok=0.08))
+register(VendorCapabilities(vendor='llama', model='llama-3.1-70b-versatile', context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79))
+register(VendorCapabilities(vendor='llama', model='llama-3.1-405b-reasoning', context_window=131072, cost_input_per_mtok=3.00, cost_output_per_mtok=3.00))
+register(VendorCapabilities(vendor='llama', model='llama-3.2-1b-preview', context_window=131072, cost_input_per_mtok=0.04, cost_output_per_mtok=0.04))
+register(VendorCapabilities(vendor='llama', model='llama-3.2-3b-preview', context_window=131072, cost_input_per_mtok=0.06, cost_output_per_mtok=0.06))
+register(VendorCapabilities(vendor='llama', model='llama-3.2-11b-vision-preview', vision=True, context_window=131072, cost_input_per_mtok=0.18, cost_output_per_mtok=0.18))
+register(VendorCapabilities(vendor='llama', model='llama-3.2-90b-vision-preview', vision=True, context_window=131072, cost_input_per_mtok=0.90, cost_output_per_mtok=0.90))
+register(VendorCapabilities(vendor='llama', model='llama-3.3-70b-specdec', context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79))
+register(VendorCapabilities(vendor='qwen', model='*', context_window=32768))
+register(VendorCapabilities(vendor='qwen', model='qwen-turbo', context_window=1000000, cost_input_per_mtok=0.05, cost_output_per_mtok=0.10))
+register(VendorCapabilities(vendor='qwen', model='qwen-plus', context_window=131072, cost_input_per_mtok=0.40, cost_output_per_mtok=1.20))
+register(VendorCapabilities(vendor='qwen', model='qwen-max', context_window=32768, cost_input_per_mtok=2.00, cost_output_per_mtok=6.00))
+register(VendorCapabilities(vendor='qwen', model='qwen-long', context_window=1000000, cost_input_per_mtok=0.07, cost_output_per_mtok=0.28))
+register(VendorCapabilities(vendor='qwen', model='qwen-vl-plus', vision=True, context_window=131072, cost_input_per_mtok=0.21, cost_output_per_mtok=0.63))
+register(VendorCapabilities(vendor='qwen', model='qwen-vl-max', vision=True, context_window=32768, cost_input_per_mtok=0.50, cost_output_per_mtok=1.50))
+register(VendorCapabilities(vendor='qwen', model='qwen-audio', context_window=32768, cost_input_per_mtok=0.10, cost_output_per_mtok=0.30, notes='Text-only in v1; audio input deferred'))
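Review note: a short sketch of how the lookup order in `get_capabilities` plays out, and how the per-million-token prices could feed a cost estimate. It redefines a trimmed-down registry locally so it runs on its own; `Caps` is a stand-in for `VendorCapabilities`, and `estimate_cost_usd` is a hypothetical helper that does not appear in this commit.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Caps:  # trimmed stand-in for VendorCapabilities
 vendor: str
 model: str
 context_window: int = 8192
 cost_input_per_mtok: float = 0.0
 cost_output_per_mtok: float = 0.0

_REG: dict[tuple[str, str], Caps] = {}

def register(cap: Caps) -> None:
 _REG[(cap.vendor, cap.model)] = cap

def get_capabilities(vendor: str, model: str) -> Caps:
 # Exact (vendor, model) match first, then the vendor wildcard, else KeyError.
 for key in ((vendor, model), (vendor, '*')):
  if key in _REG:
   return _REG[key]
 raise KeyError(f'No capabilities registered for vendor={vendor!r} model={model!r}')

def estimate_cost_usd(cap: Caps, input_tokens: int, output_tokens: int) -> float:
 # Hypothetical helper: prices are treated as USD per million tokens.
 return (input_tokens * cap.cost_input_per_mtok + output_tokens * cap.cost_output_per_mtok) / 1_000_000

register(Caps(vendor='qwen', model='*', context_window=32768))
register(Caps(vendor='qwen', model='qwen-plus', context_window=131072, cost_input_per_mtok=0.40, cost_output_per_mtok=1.20))

exact = get_capabilities('qwen', 'qwen-plus')         # exact entry
fallback = get_capabilities('qwen', 'qwen-unlisted')  # resolves to the '*' entry
```

An exact entry fully replaces the wildcard: any field left at its default on the exact entry is not inherited from the `'*'` entry for that vendor.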
@@ -0,0 +1,28 @@
+from unittest.mock import MagicMock, patch
+import pytest
+from src import ai_client
+
+@pytest.fixture(autouse=True)
+def _reset_grok_state():
+ if hasattr(ai_client, '_grok_client'):
+  ai_client._grok_client = None
+ if hasattr(ai_client, '_grok_history'):
+  ai_client._grok_history = []
+ yield
+
+def test_send_grok_uses_xai_endpoint() -> None:
+ ai_client.set_provider("grok", "grok-2")
+ mock_client = MagicMock()
+ mock_client.chat.completions.create.return_value = MagicMock(
+  choices=[MagicMock(message=MagicMock(content="hi from grok", tool_calls=[]))],
+  usage=MagicMock(prompt_tokens=10, completion_tokens=5),
+ )
+ with patch("src.ai_client._ensure_grok_client", return_value=mock_client):
+  result = ai_client._send_grok("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from grok"
+  assert mock_client.chat.completions.create.called
+
+def test_grok_2_vision_supports_image() -> None:
+ from src.vendor_capabilities import get_capabilities
+ caps = get_capabilities("grok", "grok-2-vision")
+ assert caps.vision is True
@@ -0,0 +1,68 @@
+from unittest.mock import MagicMock, patch
+import pytest
+from src import ai_client
+
+@pytest.fixture(autouse=True)
+def _reset_llama_state():
+ if hasattr(ai_client, '_llama_client'):
+  ai_client._llama_client = None
+ if hasattr(ai_client, '_llama_history'):
+  ai_client._llama_history = []
+ if hasattr(ai_client, '_llama_base_url'):
+  ai_client._llama_base_url = "http://localhost:11434/v1"
+ if hasattr(ai_client, '_llama_api_key'):
+  ai_client._llama_api_key = "ollama"
+ yield
+
+def test_send_llama_ollama_backend() -> None:
+ ai_client._llama_base_url = "http://localhost:11434/v1"
+ ai_client.set_provider("llama", "llama-3.2-3b-preview")
+ mock_client = MagicMock()
+ mock_client.chat.completions.create.return_value = MagicMock(
+  choices=[MagicMock(message=MagicMock(content="hi from ollama", tool_calls=[]))],
+  usage=MagicMock(prompt_tokens=5, completion_tokens=3),
+ )
+ with patch("src.ai_client._ensure_llama_client", return_value=mock_client):
+  result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from ollama"
+
+def test_send_llama_openrouter_backend() -> None:
+ ai_client._llama_base_url = "https://openrouter.ai/api/v1"
+ ai_client.set_provider("llama", "llama-3.1-70b-versatile")
+ captured_client = MagicMock()
+ captured_client.chat.completions.create.return_value = MagicMock(
+  choices=[MagicMock(message=MagicMock(content="hi from openrouter", tool_calls=[]))],
+  usage=MagicMock(prompt_tokens=5, completion_tokens=3),
+ )
+ with patch("src.ai_client._ensure_llama_client", return_value=captured_client) as ensure:
+  result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from openrouter"
+  assert ensure.called
+
+def test_send_llama_custom_url() -> None:
+ ai_client._llama_base_url = "http://my-server:9999/v1"
+ mock_client = MagicMock()
+ mock_client.chat.completions.create.return_value = MagicMock(
+  choices=[MagicMock(message=MagicMock(content="hi from custom", tool_calls=[]))],
+  usage=MagicMock(prompt_tokens=5, completion_tokens=3),
+ )
+ with patch("src.ai_client._ensure_llama_client", return_value=mock_client):
+  result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from custom"
+
+def test_llama_model_discovery_unions_ollama_and_openrouter() -> None:
+ from src.ai_client import _list_llama_models
+ models = _list_llama_models()
+ assert "llama-3.1-8b-instant" in models
+ assert "llama-3.2-11b-vision-preview" in models
+ assert "llama-3.3-70b-specdec" in models
+
+def test_llama_3_2_vision_capability() -> None:
+ from src.vendor_capabilities import get_capabilities
+ caps = get_capabilities("llama", "llama-3.2-11b-vision-preview")
+ assert caps.vision is True
+
+def test_llama_local_backend_cost_tracking_false_for_ollama() -> None:
+ ai_client._llama_base_url = "http://localhost:11434/v1"
+ from src.ai_client import _get_llama_cost_tracking
+ assert _get_llama_cost_tracking() is False
@@ -0,0 +1,88 @@
+from unittest.mock import MagicMock
+import pytest
+from src.openai_compatible import (
+ NormalizedResponse,
+ OpenAICompatibleRequest,
+ send_openai_compatible,
+)
+from src.vendor_capabilities import VendorCapabilities
+
+@pytest.fixture
+def caps() -> VendorCapabilities:
+ return VendorCapabilities(vendor="test", model="test-model", context_window=8192, cost_input_per_mtok=1.0, cost_output_per_mtok=2.0)
+
+def _mock_completion(text: str = "hello", tool_calls=None, usage_input: int = 10, usage_output: int = 5):
+ m = MagicMock()
+ m.choices = [MagicMock()]
+ m.choices[0].message.content = text
+ m.choices[0].message.tool_calls = tool_calls or []
+ m.usage.prompt_tokens = usage_input
+ m.usage.completion_tokens = usage_output
+ m.usage.prompt_tokens_details = None
+ m.usage.completion_tokens_details = None
+ return m
+
+def test_send_non_streaming_returns_normalized_response(caps: VendorCapabilities) -> None:
+ client = MagicMock()
+ client.chat.completions.create.return_value = _mock_completion("hi", usage_input=20, usage_output=10)
+ request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", max_tokens=100)
+ response = send_openai_compatible(client, request, capabilities=caps)
+ assert response.text == "hi"
+ assert response.tool_calls == []
+ assert response.usage_input_tokens == 20
+ assert response.usage_output_tokens == 10
+
+def test_send_streaming_aggregates_chunks(caps: VendorCapabilities) -> None:
+ client = MagicMock()
+ chunks = [
+  MagicMock(choices=[MagicMock(delta=MagicMock(content="hel", tool_calls=None))]),
+  MagicMock(choices=[MagicMock(delta=MagicMock(content="lo", tool_calls=None))]),
+  MagicMock(choices=[MagicMock(delta=MagicMock(content="", tool_calls=None))], usage=MagicMock(prompt_tokens=15, completion_tokens=5)),
+ ]
+ client.chat.completions.create.return_value = iter(chunks)
+ received: list = []
+ request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", stream=True, stream_callback=received.append)
+ response = send_openai_compatible(client, request, capabilities=caps)
+ assert response.text == "hello"
+ assert received == ["hel", "lo"]
+ assert response.usage_input_tokens == 15
+
+def test_tool_call_detection_in_response(caps: VendorCapabilities) -> None:
+ tool_call = MagicMock()
+ tool_call.id = "call_1"
+ tool_call.function.name = "read_file"
+ tool_call.function.arguments = '{"path": "/tmp/x"}'
+ completion = _mock_completion(text="", tool_calls=[tool_call])
+ client = MagicMock()
+ client.chat.completions.create.return_value = completion
+ request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m")
+ response = send_openai_compatible(client, request, capabilities=caps)
+ assert len(response.tool_calls) == 1
+ assert response.tool_calls[0]["function"]["name"] == "read_file"
+ assert response.tool_calls[0]["id"] == "call_1"
+
+def test_vision_multimodal_message(caps: VendorCapabilities) -> None:
+ client = MagicMock()
+ client.chat.completions.create.return_value = _mock_completion("looks like a cat")
+ messages = [{"role": "user", "content": [{"type": "text", "text": "what is this?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]}]
+ request = OpenAICompatibleRequest(messages=messages, model="m")
+ response = send_openai_compatible(client, request, capabilities=caps)
+ sent_messages = client.chat.completions.create.call_args.kwargs["messages"]
+ assert sent_messages[0]["content"] == messages[0]["content"]
+ assert response.text == "looks like a cat"
+
+def test_error_classification_429_to_rate_limit(caps: VendorCapabilities) -> None:
+ from openai import RateLimitError
+ from src.ai_client import ProviderError
+ client = MagicMock()
+ client.chat.completions.create.side_effect = RateLimitError("rate limited", response=MagicMock(status_code=429), body=None)
+ request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m")
+ with pytest.raises(ProviderError) as exc_info:
+  send_openai_compatible(client, request, capabilities=caps)
+ assert exc_info.value.kind == "rate_limit"
+
+def test_normalized_response_is_frozen_dataclass() -> None:
+ from dataclasses import FrozenInstanceError
+ r = NormalizedResponse(text="x", tool_calls=[], usage_input_tokens=0, usage_output_tokens=0, usage_cache_read_tokens=0, usage_cache_creation_tokens=0, raw_response=None)
+ with pytest.raises(FrozenInstanceError):
+  r.text = "y"
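Review note: `test_send_streaming_aggregates_chunks` implies a particular aggregation behaviour, namely that non-empty deltas are forwarded to the callback and concatenated, and that usage is read from whichever chunk carries it. A minimal sketch of such a loop follows, assuming chunk objects shaped like the OpenAI SDK's streaming chunks. `aggregate_stream` and `_chunk` are hypothetical names; the actual internals of `send_openai_compatible` are not shown in this diff and may differ.

```python
from __future__ import annotations
from types import SimpleNamespace
from typing import Callable, Iterable, Optional

def aggregate_stream(chunks: Iterable, on_text: Optional[Callable[[str], None]] = None) -> tuple[str, int, int]:
 parts: list[str] = []
 input_tokens = output_tokens = 0
 for chunk in chunks:
  # Usage typically arrives on a late chunk, if at all.
  usage = getattr(chunk, 'usage', None)
  if usage is not None:
   input_tokens = usage.prompt_tokens
   output_tokens = usage.completion_tokens
  if not chunk.choices:
   continue
  text = chunk.choices[0].delta.content
  if text:  # skip None and empty deltas so the callback only sees real text
   parts.append(text)
   if on_text is not None:
    on_text(text)
 return ''.join(parts), input_tokens, output_tokens

def _chunk(text, usage=None):
 # Build a plain object with the attributes the loop reads.
 delta = SimpleNamespace(content=text, tool_calls=None)
 return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=usage)

received: list[str] = []
stream = [_chunk('hel'), _chunk('lo'), _chunk('', SimpleNamespace(prompt_tokens=15, completion_tokens=5))]
text, tokens_in, tokens_out = aggregate_stream(stream, received.append)
```

Filtering out empty deltas before invoking the callback is what makes the test's `received == ["hel", "lo"]` expectation hold even though a third, empty chunk is streamed.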
@@ -0,0 +1,55 @@
+from unittest.mock import MagicMock, patch
+import pytest
+from src import ai_client
+
+@pytest.fixture(autouse=True)
+def _reset_qwen_state():
+ if hasattr(ai_client, '_qwen_client'):
+  ai_client._qwen_client = None
+ if hasattr(ai_client, '_qwen_history'):
+  ai_client._qwen_history = []
+ yield
+
+def test_send_qwen_routes_to_dashscope() -> None:
+ ai_client.set_provider("qwen", "qwen-max")
+ with patch("src.ai_client._ensure_qwen_client") as ensure, \
+  patch("src.ai_client._dashscope_call", return_value={"text": "hi from qwen", "tool_calls": [], "usage": {"input_tokens": 10, "output_tokens": 5}}) as call:
+  result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None)
+  assert result == "hi from qwen"
+  call.assert_called_once()
+  ensure.assert_called_once()
+
+def test_qwen_vision_vl_model_accepts_image() -> None:
+ ai_client.set_provider("qwen", "qwen-vl-max")
+ with patch("src.ai_client._ensure_qwen_client"), \
+  patch("src.ai_client._dashscope_call", return_value={"text": "I see a cat", "tool_calls": [], "usage": {"input_tokens": 10, "output_tokens": 5}}) as call:
+  file_items = [{"path": "/tmp/cat.png", "is_image": True, "base64_data": "iVBOR..."}]
+  result = ai_client._send_qwen("system", "describe this image", ".", file_items, "", False, None, None, None)
+  assert "cat" in result.lower()
+  kwargs = call.call_args.kwargs
+  msgs_str = str(kwargs.get("messages", [])).lower()
+  assert "image" in msgs_str or "cat.png" in msgs_str
+
+def test_qwen_tool_format_translation() -> None:
+ from src.qwen_adapter import build_dashscope_tools
+ openai_tools = [{"type": "function", "function": {"name": "read_file", "description": "Read a file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}}]
+ ds_tools = build_dashscope_tools(openai_tools)
+ assert len(ds_tools) == 1
+ assert ds_tools[0]["name"] == "read_file"
+ assert "parameters" in ds_tools[0]
+
+def test_qwen_error_classification() -> None:
+ from src.ai_client import ProviderError
+ from src.qwen_adapter import classify_dashscope_error
+ from dashscope.common.error import AuthenticationError
+ err = classify_dashscope_error(AuthenticationError("bad key"))
+ assert err.kind == "auth"
+ assert err.provider == "qwen"
+
+def test_list_qwen_models_returns_hardcoded_registry() -> None:
+ from src.ai_client import _list_qwen_models
+ models = _list_qwen_models()
+ assert "qwen-max" in models
+ assert "qwen-vl-max" in models
+ assert "qwen-turbo" in models
+ assert "qwen-audio" in models
@@ -0,0 +1,40 @@
+import pytest
+from src.vendor_capabilities import VendorCapabilities, get_capabilities, register
+
+@pytest.fixture(autouse=True)
+def _clean_registry():
+ import src.vendor_capabilities
+ snapshot = src.vendor_capabilities._REGISTRY.copy()
+ yield
+ src.vendor_capabilities._REGISTRY.clear()
+ src.vendor_capabilities._REGISTRY.update(snapshot)
+
+def test_registry_lookup_known_model():
+ caps = VendorCapabilities(
+  vendor='qwen',
+  model='qwen-max',
+  vision=False,
+  context_window=32768
+ )
+ register(caps)
+ retrieved = get_capabilities('qwen', 'qwen-max')
+ assert retrieved.vendor == 'qwen'
+ assert retrieved.model == 'qwen-max'
+ assert retrieved.context_window == 32768
+ assert retrieved.vision is False
+
+def test_fallback_to_vendor_default():
+ caps = VendorCapabilities(
+  vendor='llama',
+  model='*',
+  context_window=131072,
+  cost_tracking=False
+ )
+ register(caps)
+ retrieved = get_capabilities('llama', 'llama-3.3-future-unregistered')
+ assert retrieved.context_window == 131072
+ assert retrieved.cost_tracking is False
+
+def test_unknown_vendor_raises():
+ with pytest.raises(KeyError, match='No capabilities registered'):
+  get_capabilities('nonexistent_vendor', 'anymodel')