From 8e3543d875cbda0198d0bc39a4b9d02bf975c651 Mon Sep 17 00:00:00 2001
From: Ed_ <edwardgz@gmail.com>
Date: Thu, 11 Jun 2026 02:01:08 -0400
Subject: [PATCH] docs(spec): revise 'best API per vendor' after Grok
 consultation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Grok's own recommendation (consulted 2026-06-11):

  'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) |
   Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No
   meaningful unique native surface lost by using the compatible
   endpoint.'

This REVERSES the earlier 'xAI native' correction. The OpenAI-
compatible approach for Grok is the canonical full-featured path;
the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1
+ send_openai_compatible helper) is correct as-is.

Updates to the spec:

1. §3.1.1: replaced the 'use xAI native' decision with the confirmed
   per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's
   own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-
   Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2),
   Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native
   (follow-up), Anthropic=Native (follow-up). Also added Grok's
   recommended v2 matrix field expansion: audio, video, grounding,
   computer_use, local, reasoning/extended_thinking, web_search,
   x_search, code_execution, file_search, mcp_support, structured_output.

2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to
   'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The
   implementation does NOT need a native refactor; the OpenAI SDK
   at https://api.x.ai/v1 is the canonical approach. Removed the
   earlier 'caching: true' entry from the registry (since the
   OpenAI-compat shim doesn't expose prompt_cache_key) and the
   'no persistent client' state struct (back to the OpenAI SDK
   pattern).

3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs
   (Ollama native + Meta Llama API)' and removed the Grok native
   refactor item (Grok says OpenAI-compat is fine). Kept the Ollama
   native + Meta Llama API items + matrix expansion. Clarified that
   Grok tests do NOT need rewriting; only Llama tests get 2 more
   (native Ollama, Meta Llama API).

Net effect: the Phase 3 work that just shipped (Grok+Llama Green
using OpenAI-compat shim) is CORRECT as-is. The implementation
matches Grok's actual recommendation. No code rollback needed.
---
 .../spec.md                                   | 88 +++++++++++--------
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md b/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
index c85544a3..141e428f 100644
--- a/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
+++ b/conductor/tracks/qwen_llama_grok_integration_20260606/spec.md
@@ -59,22 +59,39 @@ This means:
 - **Anthropic/Gemini/DeepKeep** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
 - **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.
 
-### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11)
+### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11, revised after Grok consultation)
 
-**Per the user's correction, the track's prior assumption — "all OpenAI-compatible" — is wrong.** The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
+**Per the user's correction, the track's prior assumption — "all OpenAI-compatible" — was incomplete. The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
 
-The OpenAI-compatible shim (the `send_openai_compatible` helper) loses vendor-specific features. Concrete examples discovered in this session:
+The OpenAI-compatible shim (the `send_openai_compatible` helper) is the highest-leverage part of the spec: every vendor that uses it gets the same request/response/tool-calling/error/streaming logic with zero duplication. The question is **which vendors should use it** vs. which should have a native adapter.
 
-- **xAI (Grok)**: native REST has `prompt_cache_key` (prompt caching), `reasoning_effort` (reasoning model control), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp_calls`), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. The OpenAI SDK shim loses all of these. **Decision: Grok uses xAI's native REST API (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`), not the OpenAI SDK.**
-- **Ollama** (used as Llama's local backend): native `/api/chat` has `think` param (low/medium/high for thinking traces), `images: list[str]` in messages (cleaner base64 array vs OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. **Decision (FOLLOW-UP): Ollama should use native `/api/chat` instead of the OpenAI-compatible `/v1/chat/completions`.** Deferred to a follow-up track because the Phase 3 Red tests are already written for the OpenAI-compatible shim.
-- **Meta Llama API** (separate from Ollama): a hosted cloud API for Llama models. The OpenAI-compatible shim loses whatever Meta-native features it offers. **Decision (FOLLOW-UP): Add Meta's Llama API as a 4th Llama backend (alongside Ollama, OpenRouter, custom_url).** Deferred to a follow-up track pending verification of Meta's API spec.
-- **Qwen (DashScope)**: already uses native SDK (correct from the start).
-- **MiniMax**: no native SDK other than the OpenAI-compatible endpoint. Keep as-is.
-- **Anthropic / Gemini / DeepSeek**: have native SDKs (anthropic SDK, google-genai SDK, raw HTTP). These stay per-vendor per the deferred `anthropic_gemini_deepseek_capability_matrix_20260606` follow-up track.
+**Confirmed best API per vendor (Grok-consulted 2026-06-11):**
 
-**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3) will add: `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
+| Vendor | API / Approach | Decision |
+|---|---|---|
+| **Qwen** | Alibaba DashScope native SDK (not OpenAI-compatible) | **NATIVE** — OpenAI-compatible mode drops Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision. Phase 2 ships this. |
+| **xAI (Grok)** | xAI official OpenAI-compatible (`https://api.x.ai/v1`) | **OPENAI-COMPATIBLE** — Per Grok's own confirmation, the OpenAI-compatible endpoint is "fully compatible and clean" with "no meaningful unique native surface lost." Phase 3 ships this. |
+| **MiniMax** | OpenAI-compatible (`https://api.minimax.io/v1`) | **OPENAI-COMPATIBLE** — Already fully compatible. Phase 4 refactor is a pure win. |
+| **DeepSeek** | OpenAI-compatible (`https://api.deepseek.com`) | **OPENAI-COMPATIBLE** — Drop-in compatible by design; offers an `/anthropic`-compatible path too. Follow-up track. |
+| **Ollama** (Llama local backend) | Ollama's `/v1/chat/completions` (OpenAI-compatible) is the v1 choice; native `/api/chat` is a possible v2 | **OPENAI-COMPATIBLE in v1** — Ollama's compat endpoint supports streaming, tools, vision, JSON mode. Native `/api/chat` has extras (`think` param, `images: list[str]`, structured outputs); deferred to follow-up. |
+| **Meta Llama API** (Llama cloud-native) | Meta's native REST API | **NATIVE (NEW BACKEND, FOLLOW-UP)** — Add as a 4th Llama backend. Deferred pending verification of Meta's API spec. |
+| **Gemini** | Google `genai` SDK / Gemini native API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — OpenAI-comp loses explicit context caching (big cost win), Grounding with Google Search, native video/multimodal. The deferred follow-up track. |
+| **Anthropic** | Anthropic official SDK / Messages API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — Native gives prompt caching (`cache_control` ephemeral, 50-90% savings), PDF processing, citations, extended thinking, Computer Use. OpenAI-comp layer exists but loses too much. The deferred follow-up track. |
 
-**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as a placeholder; the native-API work is deferred to follow-up tracks documented in §13.1.**
+**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3, refined by Grok's consultation) will add:
+
+- `audio` (Qwen-Audio, others)
+- `video` (Gemini native, others)
+- `grounding` / `search` (Gemini Grounding with Google Search, Grok's `x_search` and `web_search`)
+- `computer_use` (Anthropic, beta/agentic)
+- `local` (boolean — true for Ollama; useful for UX "free local" badge)
+- `reasoning` / `extended_thinking` (Grok `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
+- `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support` (per-vendor server-side tools)
+- `structured_output` (response_format / format support)
+
+The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
+
+**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as the canonical implementation per Grok's confirmation; the native-API work for Llama (Ollama native, Meta Llama API) is deferred to follow-up tracks documented in §13.1.**
 
 ### 3.2 Module Layout
 
@@ -239,48 +256,42 @@ _llama_api_key: str = "ollama"                      # Ollama doesn't require aut
 
 **Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry.
 
-### 4.3 Grok via xAI (Native REST API) — corrected 2026-06-11
+### 4.3 Grok via xAI (OpenAI-Compatible) — confirmed 2026-06-11
 
-**Why native (not OpenAI-compatible):** Per §3.1.1, the OpenAI SDK shim loses xAI's native features: `prompt_cache_key` (prompt caching — sets `caching: true` in the matrix), `reasoning_effort` (reasoning model control — sets `reasoning: true`), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp` — each a future matrix field), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. **Grok uses `requests.post` directly to xAI's native REST API.**
+**Per Grok's consultation (2026-06-11): the OpenAI-compatible endpoint at `https://api.x.ai/v1` is the canonical, fully-featured approach.** xAI's API is "fully compatible and clean" with "no meaningful unique native surface lost" by using the OpenAI-compatible shim. This section was previously labeled "Native REST API" based on a user impression that the native endpoint had unique features (prompt_cache_key, reasoning_effort, server-side tools, cost_in_usd_ticks) that the shim loses; Grok's actual recommendation is that the shim is fine.
 
-**Two native endpoints are available:**
-- `POST https://api.x.ai/v1/chat/completions` — OpenAI-request-shape; supports tools, streaming, vision. Use this for v1 (matches the test signature; simpler).
-- `POST https://api.x.ai/v1/responses` — xAI's newer native endpoint; supports reasoning, server-side tools, response chains via `previous_response_id`. Use this in a follow-up track for the full server-side feature set.
+**SDK:** `openai` (already a dependency). Set `base_url="https://api.x.ai/v1"` and pass the xAI API key as the Bearer token (handled automatically by the OpenAI SDK).
 
 **State:**
 ```python
+_grok_client: OpenAI | None = None
 _grok_history: list[dict[str, Any]] = []
 _grok_history_lock: threading.Lock = threading.Lock()
-_grok_api_key: str = ""
 ```
 
-(No persistent client; each call uses `requests.post` with the auth header.)
-
 **Credentials:** `credentials.toml` `[grok]` section with `api_key`. (xAI's `base_url` is hardcoded to `https://api.x.ai/v1`.)
 
 **Configuration per-project (TOML):** `provider = "grok"`, `grok_model = "grok-2"`.
 
-**Models shipped in the capability registry (v1) — updated with native features:**
+**Models shipped in the capability registry (v1):**
 
-| Model | vision | tool_calling | caching | streaming | context_window | cost_input | cost_output |
-|---|---|---|---|---|---|---|---|
-| `grok-2` | false | true | true (prompt_cache_key) | true | 131,072 | $2.00 | $10.00 |
-| `grok-2-vision` | true | true | true (prompt_cache_key) | true | 32,768 | $2.00 | $10.00 |
-| `grok-beta` | false | true | true (prompt_cache_key) | true | 131,072 | $5.00 | $15.00 |
+| Model | vision | tool_calling | context_window | cost_input | cost_output |
+|---|---|---|---|---|---|
+| `grok-2` | false | true | 131,072 | $2.00 | $10.00 |
+| `grok-2-vision` | true | true | 32,768 | $2.00 | $10.00 |
+| `grok-beta` | false | true | 131,072 | $5.00 | $15.00 |
 
-(Pricing from x.ai public pricing as of 2026-06-06; update if needed. The `caching: true` entry acknowledges xAI's `prompt_cache_key` support.)
+(Pricing from x.ai public pricing as of 2026-06-06; update if needed. `caching` stays `False` in v1 since Grok's OpenAI-compatible shim doesn't expose `prompt_cache_key`.)
 
-**Entry point:** `_send_grok()` in `src/ai_client.py`. POSTs to `https://api.x.ai/v1/chat/completions` directly via `requests.post` (or `httpx`). NOT `client.chat.completions.create()` (the OpenAI SDK shim).
+**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL (via the OpenAI SDK).
 
-**Tool format:** xAI's native format matches OpenAI's `tool_calls` (id, type=function, function={name, arguments}). No translation needed.
+**Tool format:** Native OpenAI. No translation needed.
 
-**Vision:** Grok-2-Vision accepts image URLs or base64 via the same `content: list[dict]` shape as OpenAI. Pass through unchanged.
+**Vision:** Grok-2-Vision accepts image URLs or base64. The OpenAI-compatible helper already handles vision via the OpenAI SDK's multimodal message format.
 
-**Error classification:** New `_classify_grok_error()` maps xAI's HTTP status codes (401, 403, 429, 500+) to `ProviderError` kinds. xAI returns JSON error bodies with `code` and `message` fields (e.g., 401 → `code="InvalidApiKey"`).
+**Error classification:** Same as OpenAI-compatible vendors (uniform error shape via the openai SDK).
 
-**Model discovery:** xAI exposes `GET /v1/models`. The Grok adapter calls this and returns the model IDs.
-
-**Phase 3 placeholder behavior:** This track's Phase 3 ships the OpenAI-compatible Grok (mocking `chat.completions.create`) as a placeholder, NOT the native REST approach. The OpenAI-compatible implementation works against xAI's `https://api.x.ai/v1` endpoint (which IS OpenAI-compatible) but loses the native features listed above. The native refactor is documented as a follow-up in §13.1.
+**Model discovery:** xAI exposes `GET /v1/models`. Standard OpenAI-compatible discovery.
 
 ## 5. Shared OpenAI-Compatible Helper
 
@@ -495,12 +506,11 @@ Each phase has its own checkpoint commit and git note.
 
 **A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
 
-**B. "Native Vendor APIs (post-OpenAI-compatible-placeholder)"** — Replaces the OpenAI-compatible shim used in this track's Phase 3 (Grok + Llama) with the vendors' native SDKs/REST APIs. Per §3.1.1, the OpenAI-compatible approach loses native features. Concretely:
-- **Grok** → xAI native REST (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`); adds `prompt_cache_key` (caching), `reasoning_effort` (reasoning), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp`), and `cost_in_usd_ticks` (native cost reporting).
-- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs.
-- **Llama (Meta Llama API backend)** → New 4th backend option; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
-- **Capability matrix expansion** → Add fields for the new native features: `caching` (already in v1, just enable for Grok), `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
-- **Test rewrites** → The Phase 3 Red tests in `test_grok_provider.py` and `test_llama_provider.py` mock `chat.completions.create` (OpenAI SDK pattern). Native tests would mock `requests.post` (or `httpx`) and verify the JSON body shape, headers, and response parsing.
+**B. "Llama Native APIs (Ollama native + Meta Llama API)"** — Per §3.1.1's revised assessment (after Grok's consultation), xAI's OpenAI-compatible endpoint is the canonical full-featured approach — NO Grok native refactor is needed. The follow-up for Llama backends is:
+- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. The Phase 3 Red tests are written for the OpenAI-compatible shim; the native tests would mock `requests.post` to `/api/chat`.
+- **Llama (Meta Llama API backend)** → New 4th Llama backend; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
+- **Capability matrix expansion** → Add fields for the new native features per Grok's consultation: `audio`, `video`, `grounding`/`search`, `computer_use`, `local`, `reasoning`/`extended_thinking`, `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
+- **Test rewrites** → The Phase 3 Llama Red tests in `test_llama_provider.py` would be extended with 2 more tests: native Ollama (`/api/chat` with `think` param, `images: list[str]`) and Meta Llama API. The Grok Red tests do NOT need rewriting.
 
 ### 13.2 Project References