diff --git a/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md b/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
index 165529c0..c85544a3 100644
--- a/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
+++ b/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
@@ -59,6 +59,23 @@ This means:
 - **Anthropic/Gemini/DeepSeek** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
 - **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.
 
+### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11)
+
+**Per the user's correction, the track's prior assumption — "all OpenAI-compatible" — is wrong.** The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
+
+The OpenAI-compatible shim (the `send_openai_compatible` helper) loses vendor-specific features. Concrete examples discovered in this session:
+
+- **xAI (Grok)**: native REST has `prompt_cache_key` (prompt caching), `reasoning_effort` (reasoning model control), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp_calls`), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. The OpenAI SDK shim loses all of these. **Decision: Grok uses xAI's native REST API (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`), not the OpenAI SDK.**
+- **Ollama** (used as Llama's local backend): native `/api/chat` has `think` param (low/medium/high for thinking traces), `images: list[str]` in messages (cleaner base64 array vs OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. **Decision (FOLLOW-UP): Ollama should use native `/api/chat` instead of the OpenAI-compatible `/v1/chat/completions`.** Deferred to a follow-up track because the Phase 3 Red tests are already written for the OpenAI-compatible shim.
+- **Meta Llama API** (separate from Ollama): a hosted cloud API for Llama models. The OpenAI-compatible shim loses whatever Meta-native features it offers. **Decision (FOLLOW-UP): Add Meta's Llama API as a 4th Llama backend (alongside Ollama, OpenRouter, custom_url).** Deferred to a follow-up track pending verification of Meta's API spec.
+- **Qwen (DashScope)**: already uses native SDK (correct from the start).
+- **MiniMax**: no native SDK other than the OpenAI-compatible endpoint. Keep as-is.
+- **Anthropic / Gemini / DeepSeek**: have native SDKs or HTTP APIs (anthropic SDK, google-genai SDK, raw HTTP). These stay per-vendor per the deferred `anthropic_gemini_deepseek_capability_matrix_20260606` follow-up track.
+
+**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3) will add: `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
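The matrix-driven filtering described above can be sketched roughly as follows. This is a minimal illustration, not the track's actual registry: `ModelCapabilities`, `CAPABILITY_REGISTRY`, and `visible_controls` are hypothetical names, the seven fields mirror the v1 list in the text, and the registry values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelCapabilities:
    """Hypothetical per-model entry mirroring the seven v1 matrix fields."""
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0
    cost_tracking: bool = False


# Hypothetical registry keyed by (vendor, model); values are illustrative only.
CAPABILITY_REGISTRY: dict[tuple[str, str], ModelCapabilities] = {
    ("grok", "grok-2"): ModelCapabilities(
        tool_calling=True, caching=True, streaming=True,
        model_discovery=True, context_window=131_072,
    ),
    ("grok", "grok-2-vision"): ModelCapabilities(
        vision=True, tool_calling=True, caching=True, streaming=True,
        model_discovery=True, context_window=32_768,
    ),
}


def visible_controls(vendor: str, model: str) -> set[str]:
    """Return the names of boolean capabilities enabled for a model.

    A UI layer would show a control only if its capability name is in this
    set, so it needs no per-vendor branch. Unknown models get an empty set.
    """
    caps = CAPABILITY_REGISTRY.get((vendor, model))
    if caps is None:
        return set()
    return {
        f.name
        for f in fields(caps)
        # Skip non-boolean fields such as context_window.
        if isinstance(getattr(caps, f.name), bool) and getattr(caps, f.name)
    }
```

Under this sketch, adding a future field such as `web_search` would be a new dataclass attribute plus registry entries; the filtering function itself would not change.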
+
+**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as a placeholder; the native-API work is deferred to follow-up tracks documented in §13.1.**
+
 ### 3.2 Module Layout
 
 ```
@@ -222,40 +239,48 @@ _llama_api_key: str = "ollama" # Ollama doesn't require aut
 
 **Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry.
 
-### 4.3 Grok via xAI (OpenAI-Compatible)
+### 4.3 Grok via xAI (Native REST API) — corrected 2026-06-11
 
-**SDK:** `openai` (already a dependency).
+**Why native (not OpenAI-compatible):** Per §3.1.1, the OpenAI SDK shim loses xAI's native features: `prompt_cache_key` (prompt caching — sets `caching: true` in the matrix), `reasoning_effort` (reasoning model control — sets `reasoning: true`), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp` — each a future matrix field), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. **Grok uses `requests.post` directly to xAI's native REST API.**
+
+**Two native endpoints are available:**
+- `POST https://api.x.ai/v1/chat/completions` — OpenAI-request-shape; supports tools, streaming, vision. Use this for v1 (matches the test signature; simpler).
+- `POST https://api.x.ai/v1/responses` — xAI's newer native endpoint; supports reasoning, server-side tools, response chains via `previous_response_id`. Use this in a follow-up track for the full server-side feature set.
 
 **State:**
 ```python
-_grok_client: OpenAI | None = None
 _grok_history: list[dict[str, Any]] = []
 _grok_history_lock: threading.Lock = threading.Lock()
+_grok_api_key: str = ""
 ```
 
+(No persistent client; each call uses `requests.post` with the auth header.)
+
 **Credentials:** `credentials.toml` `[grok]` section with `api_key`. (xAI's `base_url` is hardcoded to `https://api.x.ai/v1`.)
 
 **Configuration per-project (TOML):** `provider = "grok"`, `grok_model = "grok-2"`.
 
-**Models shipped in the capability registry (v1):**
+**Models shipped in the capability registry (v1) — updated with native features:**
 
-| Model | vision | tool_calling | caching | context_window | cost_input | cost_output |
-|---|---|---|---|---|---|---|
-| `grok-2` | false | true | false | 131,072 | $2.00 | $10.00 |
-| `grok-2-vision` | true | true | false | 32,768 | $2.00 | $10.00 |
-| `grok-beta` | false | true | false | 131,072 | $5.00 | $15.00 |
+| Model | vision | tool_calling | caching | streaming | context_window | cost_input | cost_output |
+|---|---|---|---|---|---|---|---|
+| `grok-2` | false | true | true (prompt_cache_key) | true | 131,072 | $2.00 | $10.00 |
+| `grok-2-vision` | true | true | true (prompt_cache_key) | true | 32,768 | $2.00 | $10.00 |
+| `grok-beta` | false | true | true (prompt_cache_key) | true | 131,072 | $5.00 | $15.00 |
 
-(Pricing from x.ai public pricing as of 2026-06-06; update if needed.)
+(Pricing from x.ai public pricing as of 2026-06-06; update if needed. The `caching: true` entry acknowledges xAI's `prompt_cache_key` support.)
 
-**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL.
+**Entry point:** `_send_grok()` in `src/ai_client.py`. POSTs to `https://api.x.ai/v1/chat/completions` directly via `requests.post` (or `httpx`). NOT `client.chat.completions.create()` (the OpenAI SDK shim).
 
-**Tool format:** Native OpenAI. No translation needed.
+**Tool format:** xAI's native format matches OpenAI's `tool_calls` (id, type=function, function={name, arguments}). No translation needed.
 
-**Vision:** Grok-2-Vision accepts image URLs or base64. The OpenAI-compatible helper already handles vision via the OpenAI SDK's multimodal message format.
+**Vision:** Grok-2-Vision accepts image URLs or base64 via the same `content: list[dict]` shape as OpenAI. Pass through unchanged.
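As a rough sketch of the native entry point described above, the helper below assembles the URL, headers, and JSON body for a chat-completions call without sending it. `build_grok_request` and its parameters are hypothetical names, not part of `src/ai_client.py`. The endpoint URL comes from this spec; the Bearer auth header and the top-level placement of vendor-specific fields passed through `extra` are assumptions that should be checked against xAI's API reference.

```python
from __future__ import annotations

from typing import Any

# Endpoint named in this spec for the v1 native path.
XAI_CHAT_URL = "https://api.x.ai/v1/chat/completions"


def build_grok_request(
    api_key: str,
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]] | None = None,
    stream: bool = False,
    extra: dict[str, Any] | None = None,
) -> tuple[str, dict[str, str], dict[str, Any]]:
    """Assemble (url, headers, json_body) for a native chat-completions POST.

    Hypothetical helper: `extra` is a pass-through for vendor-specific
    fields; merging them at the top level of the body is an assumption.
    """
    if not api_key:
        raise ValueError("grok api_key is empty; check credentials.toml [grok]")
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if tools:
        body["tools"] = tools  # OpenAI-style tool definitions, passed unchanged
    if extra:
        body.update(extra)
    return XAI_CHAT_URL, headers, body
```

A caller could then send the result with something like `requests.post(url, headers=headers, json=body, timeout=60)`. Keeping request assembly separate from the network call also means tests can check the body shape and headers without mocking HTTP at all.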
 
-**Error classification:** Same as OpenAI-compatible vendors (uniform error shape via the openai SDK).
+**Error classification:** New `_classify_grok_error()` maps xAI's HTTP status codes (401, 403, 429, 500+) to `ProviderError` kinds. xAI returns JSON error bodies with `code` and `message` fields (e.g., 401 → `code="InvalidApiKey"`).
 
-**Model discovery:** xAI exposes `GET /v1/models`. Standard OpenAI-compatible discovery.
+**Model discovery:** xAI exposes `GET /v1/models`. The Grok adapter calls this and returns the model IDs.
+
+**Phase 3 placeholder behavior:** This track's Phase 3 ships the OpenAI-compatible Grok (mocking `chat.completions.create`) as a placeholder, NOT the native REST approach. The OpenAI-compatible implementation works against xAI's `https://api.x.ai/v1` endpoint (which IS OpenAI-compatible) but loses the native features listed above. The native refactor is documented as a follow-up in §13.1.
 
 ## 5. Shared OpenAI-Compatible Helper
 
@@ -466,9 +491,16 @@ Each phase has its own checkpoint commit and git note.
 
 ## 13. See Also
 
-### 13.1 Follow-up Track (separate plan)
+### 13.1 Follow-up Tracks (separate plans)
 
-**"Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+**A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+
+**B. "Native Vendor APIs (post-OpenAI-compatible-placeholder)"** — Replaces the OpenAI-compatible shim used in this track's Phase 3 (Grok + Llama) with the vendors' native SDKs/REST APIs. Per §3.1.1, the OpenAI-compatible approach loses native features. Concretely:
+- **Grok** → xAI native REST (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`); adds `prompt_cache_key` (caching), `reasoning_effort` (reasoning), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp`), and `cost_in_usd_ticks` (native cost reporting).
+- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs.
+- **Llama (Meta Llama API backend)** → New 4th backend option; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
+- **Capability matrix expansion** → Add fields for the new native features: `caching` (already in v1, just enable for Grok), `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
+- **Test rewrites** → The Phase 3 Red tests in `test_grok_provider.py` and `test_llama_provider.py` mock `chat.completions.create` (OpenAI SDK pattern). Native tests would mock `requests.post` (or `httpx`) and verify the JSON body shape, headers, and response parsing.
 
 ### 13.2 Project References