docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups

Three additions to the spec, per the user's architectural correction in this session:

1. NEW section 3.1.1: 'Architectural principle: Use the best API per vendor'. Explains why the OpenAI-compatible shim loses vendor-specific features (xAI: prompt_cache_key, reasoning_effort, server-side tools, cost_in_usd_ticks; Ollama: think param, images array, thinking field, structured outputs) and states the principle: 'use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.' Also notes that the capability matrix IS the aggregate tracker; future native features go into the matrix, and the GUI filters based on it (no per-vendor UI branches).

2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)', was 'OpenAI-Compatible'. Now specifies two native endpoints (/v1/chat/completions and /v1/responses), the native features that matter, the updated capability registry (caching=true for Grok via prompt_cache_key), and a 'Phase 3 placeholder behavior' note that this track's Phase 3 ships the OpenAI-compatible Grok as a placeholder. The native refactor is deferred to follow-up B.

3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs (post-OpenAI-compatible-placeholder)' which documents:
   - Grok → xAI native REST
   - Llama (Ollama) → native /api/chat
   - Llama (Meta Llama API) → new 4th backend (deferred pending verification of Meta's API spec; llama.developer.meta.com/docs/overview returned 400 on fetch this session)
   - Capability matrix expansion (web_search, x_search, code_execution, file_search, mcp_support, reasoning_effort, structured_output)
   - Test rewrites (mock requests.post instead of chat.completions.create)

This is a docs-only commit; no code changes. The Phase 3 Green work continues with the OpenAI-compatible approach as planned in the existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track B handles the native refactor when prioritized.
2026-06-11 01:49:36 -04:00
parent 891c008f0c
commit 06716252f1
1 changed files with 49 additions and 17 deletions
@@ -59,6 +59,23 @@ This means:
 - **Anthropic/Gemini/DeepSeek** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
 - **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.

+### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11)
+
+**Per the user's correction, the track's prior assumption — "all OpenAI-compatible" — is wrong.** The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
+
+The OpenAI-compatible shim (the `send_openai_compatible` helper) loses vendor-specific features. Concrete examples discovered in this session:
+
+- **xAI (Grok)**: native REST has `prompt_cache_key` (prompt caching), `reasoning_effort` (reasoning model control), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp_calls`), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. The OpenAI SDK shim loses all of these. **Decision: Grok uses xAI's native REST API (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`), not the OpenAI SDK.**
+- **Ollama** (used as Llama's local backend): native `/api/chat` has `think` param (low/medium/high for thinking traces), `images: list[str]` in messages (cleaner base64 array vs OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. **Decision (FOLLOW-UP): Ollama should use native `/api/chat` instead of the OpenAI-compatible `/v1/chat/completions`.** Deferred to a follow-up track because the Phase 3 Red tests are already written for the OpenAI-compatible shim.
+- **Meta Llama API** (separate from Ollama): a hosted cloud API for Llama models. The OpenAI-compatible shim loses whatever Meta-native features it offers. **Decision (FOLLOW-UP): Add Meta's Llama API as a 4th Llama backend (alongside Ollama, OpenRouter, custom_url).** Deferred to a follow-up track pending verification of Meta's API spec.
+- **Qwen (DashScope)**: already uses native SDK (correct from the start).
+- **MiniMax**: offers no native API beyond its OpenAI-compatible endpoint. Keep as-is.
+- **Anthropic / Gemini / DeepSeek**: already use vendor-native access (anthropic SDK, google-genai SDK, and raw HTTP, respectively). These stay per-vendor per the deferred `anthropic_gemini_deepseek_capability_matrix_20260606` follow-up track.
+
+**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3) will add: `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. **The matrix is the aggregate tracker and the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix, and the GUI shows or hides UI elements based on it.
+
+**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as a placeholder; the native-API work is deferred to follow-up tracks documented in §13.1.**
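
The matrix-driven filtering described in this section could look roughly like the sketch below. The `Capabilities` dataclass, the `REGISTRY` dict, and `visible_controls()` are hypothetical names chosen for illustration; the real registry layout is not shown in this diff, and only four boolean fields are included for brevity.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical per-model capability record (boolean fields only)."""
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False


# Hypothetical registry keyed by (provider, model).
# The values mirror the updated Grok table in section 4.3.
REGISTRY: dict = {
    ("grok", "grok-2"): Capabilities(tool_calling=True, caching=True, streaming=True),
    ("grok", "grok-2-vision"): Capabilities(
        vision=True, tool_calling=True, caching=True, streaming=True
    ),
}


def visible_controls(provider: str, model: str) -> list:
    """Return the capability names the GUI should expose for this model.

    Unknown models get an empty list, so the GUI hides every optional
    control instead of branching on the vendor name.
    """
    caps = REGISTRY.get((provider, model), Capabilities())
    return [f.name for f in fields(caps) if getattr(caps, f.name)]
```

With this shape, enabling a new native feature is a registry edit: the GUI code calls `visible_controls()` and never inspects the provider string itself.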
+
 ### 3.2 Module Layout

 ```
@@ -222,40 +239,48 @@ _llama_api_key: str = "ollama"                      # Ollama doesn't require aut

 **Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry.
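
A minimal sketch of the probe-and-union step, assuming the usual response shapes (`{"models": [{"name": ...}]}` for Ollama's `/api/tags` and `{"data": [{"id": ...}]}` for an OpenAI-style `/v1/models`). The function names are hypothetical, the HTTP calls are left out, and falling back only when every probe is empty is an assumption of this sketch rather than the adapter's documented behavior.

```python
from __future__ import annotations

from typing import Any


def parse_ollama_tags(body: dict[str, Any]) -> list[str]:
    """Extract model names from an Ollama `GET /api/tags` response body."""
    return [m["name"] for m in body.get("models", []) if "name" in m]


def parse_openai_models(body: dict[str, Any]) -> list[str]:
    """Extract model IDs from an OpenAI-style `GET /v1/models` response body."""
    return [m["id"] for m in body.get("data", []) if "id" in m]


def union_models(*model_lists: list[str], fallback: list[str] | None = None) -> list[str]:
    """Union several discovery results, preserving first-seen order.

    Returns the hardcoded fallback list when every probe came back empty.
    """
    seen: dict[str, None] = {}
    for models in model_lists:
        for name in models:
            seen.setdefault(name, None)
    return list(seen) or list(fallback or [])
```

A failed probe can simply contribute an empty list, so one unreachable endpoint does not hide the models found on the other.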

-### 4.3 Grok via xAI (OpenAI-Compatible)
+### 4.3 Grok via xAI (Native REST API) — corrected 2026-06-11

-**SDK:** `openai` (already a dependency).
+**Why native (not OpenAI-compatible):** Per §3.1.1, the OpenAI SDK shim loses xAI's native features: `prompt_cache_key` (prompt caching — sets `caching: true` in the matrix), `reasoning_effort` (reasoning model control — sets `reasoning: true`), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp` — each a future matrix field), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. **Grok uses `requests.post` directly to xAI's native REST API.**
+
+**Two native endpoints are available:**
+- `POST https://api.x.ai/v1/chat/completions` — OpenAI-request-shape; supports tools, streaming, vision. Use this for v1 (matches the test signature; simpler).
+- `POST https://api.x.ai/v1/responses` — xAI's newer native endpoint; supports reasoning, server-side tools, response chains via `previous_response_id`. Use this in a follow-up track for the full server-side feature set.

 **State:**
 ```python
-_grok_client: OpenAI | None = None
 _grok_history: list[dict[str, Any]] = []
 _grok_history_lock: threading.Lock = threading.Lock()
+_grok_api_key: str = ""
 ```

+(No persistent client; each call uses `requests.post` with the auth header.)
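
A stateless call along these lines might be sketched as follows. `build_grok_request()` and `send_grok()` are hypothetical helper names, the 60-second timeout is an arbitrary choice, and the HTTP function is injected (pass `requests.post` in real use) so the sketch runs without a network connection. Native-only fields such as `prompt_cache_key` are deliberately omitted because their exact request placement is not specified here.

```python
from __future__ import annotations

from typing import Any, Callable

XAI_CHAT_URL = "https://api.x.ai/v1/chat/completions"


def build_grok_request(
    api_key: str,
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for a `requests.post`-style call."""
    payload: dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        payload["tools"] = tools
    return {
        "url": XAI_CHAT_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": payload,
        "timeout": 60,  # seconds; arbitrary value for this sketch
    }


def send_grok(
    post: Callable[..., Any],
    api_key: str,
    model: str,
    messages: list[dict[str, Any]],
) -> str:
    """POST one chat turn and return the assistant's text content.

    `post` is any callable with the keyword interface of `requests.post`.
    """
    response = post(**build_grok_request(api_key, model, messages))
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

Injecting `post` also keeps the later test rewrite simple: a test can pass a stub and assert on the URL, headers, and JSON body it received.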
+
 **Credentials:** `credentials.toml` `[grok]` section with `api_key`. (xAI's `base_url` is hardcoded to `https://api.x.ai/v1`.)

 **Configuration per-project (TOML):** `provider = "grok"`, `grok_model = "grok-2"`.

-**Models shipped in the capability registry (v1):**
+**Models shipped in the capability registry (v1) — updated with native features:**

-| Model | vision | tool_calling | caching | context_window | cost_input | cost_output |
-|---|---|---|---|---|---|---|
-| `grok-2` | false | true | false | 131,072 | $2.00 | $10.00 |
-| `grok-2-vision` | true | true | false | 32,768 | $2.00 | $10.00 |
-| `grok-beta` | false | true | false | 131,072 | $5.00 | $15.00 |
+| Model | vision | tool_calling | caching | streaming | context_window | cost_input | cost_output |
+|---|---|---|---|---|---|---|---|
+| `grok-2` | false | true | true (prompt_cache_key) | true | 131,072 | $2.00 | $10.00 |
+| `grok-2-vision` | true | true | true (prompt_cache_key) | true | 32,768 | $2.00 | $10.00 |
+| `grok-beta` | false | true | true (prompt_cache_key) | true | 131,072 | $5.00 | $15.00 |

-(Pricing from x.ai public pricing as of 2026-06-06; update if needed.)
+(Pricing from x.ai public pricing as of 2026-06-06; update if needed. The `caching: true` entry acknowledges xAI's `prompt_cache_key` support.)

-**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL.
+**Entry point:** `_send_grok()` in `src/ai_client.py`. POSTs to `https://api.x.ai/v1/chat/completions` directly via `requests.post` (or `httpx`). NOT `client.chat.completions.create()` (the OpenAI SDK shim).

-**Tool format:** Native OpenAI. No translation needed.
+**Tool format:** xAI's native format matches OpenAI's `tool_calls` (id, type=function, function={name, arguments}). No translation needed.

-**Vision:** Grok-2-Vision accepts image URLs or base64. The OpenAI-compatible helper already handles vision via the OpenAI SDK's multimodal message format.
+**Vision:** Grok-2-Vision accepts image URLs or base64 via the same `content: list[dict]` shape as OpenAI. Pass through unchanged.

-**Error classification:** Same as OpenAI-compatible vendors (uniform error shape via the openai SDK).
+**Error classification:** New `_classify_grok_error()` maps xAI's HTTP status codes (401, 403, 429, 500+) to `ProviderError` kinds. xAI returns JSON error bodies with `code` and `message` fields (e.g., 401 → `code="InvalidApiKey"`).
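
One possible shape for that mapping is sketched below. The kind strings (`"auth"`, `"rate_limit"`, `"server"`, `"unknown"`) are placeholders, since the actual `ProviderError` kinds are not listed in this section, and the function returns a plain tuple instead of constructing a `ProviderError`.

```python
from __future__ import annotations

import json


def classify_grok_error(status_code: int, body_text: str) -> tuple[str, str]:
    """Map a failed xAI HTTP response to a (kind, message) pair.

    The kind names are hypothetical; substitute the project's real
    ProviderError kinds.
    """
    try:
        body = json.loads(body_text)
    except ValueError:  # also covers json.JSONDecodeError
        body = {}
    if not isinstance(body, dict):
        body = {}

    # Prefer the JSON `message` field, then the raw body, then a generic label.
    message = str(body.get("message") or body_text or f"HTTP {status_code}")

    if status_code in (401, 403):
        kind = "auth"
    elif status_code == 429:
        kind = "rate_limit"
    elif status_code >= 500:
        kind = "server"
    else:
        kind = "unknown"
    return kind, message
```

Classifying on the status code first and treating the JSON body as optional keeps the function safe when a proxy or gateway returns a non-JSON error page.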

-**Model discovery:** xAI exposes `GET /v1/models`. Standard OpenAI-compatible discovery.
+**Model discovery:** xAI exposes `GET /v1/models`. The Grok adapter calls this and returns the model IDs.
+
+**Phase 3 placeholder behavior:** This track's Phase 3 ships the OpenAI-compatible Grok (mocking `chat.completions.create`) as a placeholder, NOT the native REST approach. The OpenAI-compatible implementation works against xAI's `https://api.x.ai/v1` endpoint (which IS OpenAI-compatible) but loses the native features listed above. The native refactor is documented as a follow-up in §13.1.

 ## 5. Shared OpenAI-Compatible Helper

@@ -466,9 +491,16 @@ Each phase has its own checkpoint commit and git note.

 ## 13. See Also

-### 13.1 Follow-up Track (separate plan)
+### 13.1 Follow-up Tracks (separate plans)

-**"Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+**A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
+
+**B. "Native Vendor APIs (post-OpenAI-compatible-placeholder)"** — Replaces the OpenAI-compatible shim used in this track's Phase 3 (Grok + Llama) with the vendors' native SDKs/REST APIs. Per §3.1.1, the OpenAI-compatible approach loses native features. Concretely:
+- **Grok** → xAI native REST (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`); adds `prompt_cache_key` (caching), `reasoning_effort` (reasoning), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp`), and `cost_in_usd_ticks` (native cost reporting).
+- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs.
+- **Llama (Meta Llama API backend)** → New 4th backend option; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
+- **Capability matrix expansion** → Add fields for the new native features: `caching` (already in v1, just enable for Grok), `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
+- **Test rewrites** → The Phase 3 Red tests in `test_grok_provider.py` and `test_llama_provider.py` mock `chat.completions.create` (OpenAI SDK pattern). Native tests would mock `requests.post` (or `httpx`) and verify the JSON body shape, headers, and response parsing.
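
To make the Ollama item above concrete, here is a rough sketch of a native `/api/chat` request body using the fields named in that bullet (`think`, `images`, `format`). `build_ollama_chat_payload()` is a hypothetical helper, and attaching images to the most recent user message is an assumption of this sketch; check the field semantics against Ollama's API documentation before relying on them.

```python
from __future__ import annotations

from typing import Any


def build_ollama_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    images_b64: list[str] | None = None,
    think: bool | str | None = None,
    json_schema: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Build a JSON body for Ollama's native `POST /api/chat`."""
    # Copy each message so the caller's history is never mutated.
    payload_messages = [dict(m) for m in messages]

    if images_b64:
        # Assumption: images belong to the most recent user message.
        for msg in reversed(payload_messages):
            if msg.get("role") == "user":
                msg["images"] = list(images_b64)
                break

    payload: dict[str, Any] = {
        "model": model,
        "messages": payload_messages,
        "stream": False,
    }
    if think is not None:
        payload["think"] = think
    if json_schema is not None:
        payload["format"] = json_schema
    return payload
```

A native test in the style described by the last bullet would assert on this dict (and on the headers passed alongside it) instead of mocking `chat.completions.create`.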

 ### 13.2 Project References