Private
Public Access
0
0

docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups

Three additions to the spec, per the user's architectural correction
in this session:

1. NEW section 3.1.1: 'Architectural principle: Use the best API per
   vendor' — explains why the OpenAI-compatible shim loses vendor-
   specific features (xAI: prompt_cache_key, reasoning_effort, server-
   side tools, cost_in_usd_ticks; Ollama: think param, images array,
   thinking field, structured outputs) and states the principle:
   'use each vendor's native SDK or REST API when one exists, falling
   back to OpenAI-compatible only when no native option exists.'

   Also notes that the capability matrix IS the aggregate tracker;
   future native features go into the matrix, and the GUI filters
   based on it (no per-vendor UI branches).

2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was
   'OpenAI-Compatible'. Now specifies two native endpoints
   (/v1/chat/completions and /v1/responses), the native features that
   matter, the updated capability registry (caching=true for Grok
   via prompt_cache_key), and a 'Phase 3 placeholder behavior' note
   that this track's Phase 3 ships the OpenAI-compatible Grok as a
   placeholder. The native refactor is deferred to follow-up B.

3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs
   (post-OpenAI-compatible-placeholder)' which documents:
   - Grok → xAI native REST
   - Llama (Ollama) → native /api/chat
   - Llama (Meta Llama API) → new 4th backend (deferred pending
     verification of Meta's API spec; llama.developer.meta.com/docs/overview
     returned 400 on fetch this session)
   - Capability matrix expansion (web_search, x_search, code_execution,
     file_search, mcp_support, reasoning_effort, structured_output)
   - Test rewrites (mock requests.post instead of chat.completions.create)

This is a docs-only commit; no code changes. The Phase 3 Green work
continues with the OpenAI-compatible approach as planned in the
existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track
B handles the native refactor when prioritized.
This commit is contained in:
2026-06-11 01:49:36 -04:00
parent 891c008f0c
commit 06716252f1
@@ -59,6 +59,23 @@ This means:
- **Anthropic/Gemini/DeepKeep** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
- **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.
### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11)
**Per the user's correction, the track's prior assumption — "all OpenAI-compatible" — is wrong.** The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
The OpenAI-compatible shim (the `send_openai_compatible` helper) loses vendor-specific features. Concrete examples discovered in this session:
- **xAI (Grok)**: native REST has `prompt_cache_key` (prompt caching), `reasoning_effort` (reasoning model control), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp_calls`), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. The OpenAI SDK shim loses all of these. **Decision: Grok uses xAI's native REST API (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`), not the OpenAI SDK.**
- **Ollama** (used as Llama's local backend): native `/api/chat` has `think` param (low/medium/high for thinking traces), `images: list[str]` in messages (cleaner base64 array vs OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. **Decision (FOLLOW-UP): Ollama should use native `/api/chat` instead of the OpenAI-compatible `/v1/chat/completions`.** Deferred to a follow-up track because the Phase 3 Red tests are already written for the OpenAI-compatible shim.
- **Meta Llama API** (separate from Ollama): a hosted cloud API for Llama models. The OpenAI-compatible shim loses whatever Meta-native features it offers. **Decision (FOLLOW-UP): Add Meta's Llama API as a 4th Llama backend (alongside Ollama, OpenRouter, custom_url).** Deferred to a follow-up track pending verification of Meta's API spec.
- **Qwen (DashScope)**: already uses native SDK (correct from the start).
- **MiniMax**: no native SDK other than the OpenAI-compatible endpoint. Keep as-is.
- **Anthropic / Gemini / DeepSeek**: have native SDKs (anthropic SDK, google-genai SDK, raw HTTP). These stay per-vendor per the deferred `anthropic_gemini_deepseek_capability_matrix_20260606` follow-up track.
**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3) will add: `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as a placeholder; the native-API work is deferred to follow-up tracks documented in §13.1.**
### 3.2 Module Layout
```
@@ -222,40 +239,48 @@ _llama_api_key: str = "ollama" # Ollama doesn't require aut
**Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry.
### 4.3 Grok via xAI (OpenAI-Compatible)
### 4.3 Grok via xAI (Native REST API) — corrected 2026-06-11
**SDK:** `openai` (already a dependency).
**Why native (not OpenAI-compatible):** Per §3.1.1, the OpenAI SDK shim loses xAI's native features: `prompt_cache_key` (prompt caching — sets `caching: true` in the matrix), `reasoning_effort` (reasoning model control — sets `reasoning: true`), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp` — each a future matrix field), `cost_in_usd_ticks` (native cost reporting), and the newer `/v1/responses` endpoint. **Grok uses `requests.post` directly to xAI's native REST API.**
**Two native endpoints are available:**
- `POST https://api.x.ai/v1/chat/completions` — OpenAI-request-shape; supports tools, streaming, vision. Use this for v1 (matches the test signature; simpler).
- `POST https://api.x.ai/v1/responses` — xAI's newer native endpoint; supports reasoning, server-side tools, response chains via `previous_response_id`. Use this in a follow-up track for the full server-side feature set.
**State:**
```python
_grok_client: OpenAI | None = None
_grok_history: list[dict[str, Any]] = []
_grok_history_lock: threading.Lock = threading.Lock()
_grok_api_key: str = ""
```
(No persistent client; each call uses `requests.post` with the auth header.)
**Credentials:** `credentials.toml` `[grok]` section with `api_key`. (xAI's `base_url` is hardcoded to `https://api.x.ai/v1`.)
**Configuration per-project (TOML):** `provider = "grok"`, `grok_model = "grok-2"`.
**Models shipped in the capability registry (v1):**
**Models shipped in the capability registry (v1) — updated with native features:**
| Model | vision | tool_calling | caching | context_window | cost_input | cost_output |
|---|---|---|---|---|---|---|
| `grok-2` | false | true | false | 131,072 | $2.00 | $10.00 |
| `grok-2-vision` | true | true | false | 32,768 | $2.00 | $10.00 |
| `grok-beta` | false | true | false | 131,072 | $5.00 | $15.00 |
| Model | vision | tool_calling | caching | streaming | context_window | cost_input | cost_output |
|---|---|---|---|---|---|---|---|
| `grok-2` | false | true | true (prompt_cache_key) | true | 131,072 | $2.00 | $10.00 |
| `grok-2-vision` | true | true | true (prompt_cache_key) | true | 32,768 | $2.00 | $10.00 |
| `grok-beta` | false | true | true (prompt_cache_key) | true | 131,072 | $5.00 | $15.00 |
(Pricing from x.ai public pricing as of 2026-06-06; update if needed.)
(Pricing from x.ai public pricing as of 2026-06-06; update if needed. The `caching: true` entry acknowledges xAI's `prompt_cache_key` support.)
**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL.
**Entry point:** `_send_grok()` in `src/ai_client.py`. POSTs to `https://api.x.ai/v1/chat/completions` directly via `requests.post` (or `httpx`). NOT `client.chat.completions.create()` (the OpenAI SDK shim).
**Tool format:** Native OpenAI. No translation needed.
**Tool format:** xAI's native format matches OpenAI's `tool_calls` (id, type=function, function={name, arguments}). No translation needed.
**Vision:** Grok-2-Vision accepts image URLs or base64. The OpenAI-compatible helper already handles vision via the OpenAI SDK's multimodal message format.
**Vision:** Grok-2-Vision accepts image URLs or base64 via the same `content: list[dict]` shape as OpenAI. Pass through unchanged.
**Error classification:** Same as OpenAI-compatible vendors (uniform error shape via the openai SDK).
**Error classification:** New `_classify_grok_error()` maps xAI's HTTP status codes (401, 403, 429, 500+) to `ProviderError` kinds. xAI returns JSON error bodies with `code` and `message` fields (e.g., 401 → `code="InvalidApiKey"`).
**Model discovery:** xAI exposes `GET /v1/models`. Standard OpenAI-compatible discovery.
**Model discovery:** xAI exposes `GET /v1/models`. The Grok adapter calls this and returns the model IDs.
**Phase 3 placeholder behavior:** This track's Phase 3 ships the OpenAI-compatible Grok (mocking `chat.completions.create`) as a placeholder, NOT the native REST approach. The OpenAI-compatible implementation works against xAI's `https://api.x.ai/v1` endpoint (which IS OpenAI-compatible) but loses the native features listed above. The native refactor is documented as a follow-up in §13.1.
## 5. Shared OpenAI-Compatible Helper
@@ -466,9 +491,16 @@ Each phase has its own checkpoint commit and git note.
## 13. See Also
### 13.1 Follow-up Track (separate plan)
### 13.1 Follow-up Tracks (separate plans)
**"Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
**A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
**B. "Native Vendor APIs (post-OpenAI-compatible-placeholder)"** — Replaces the OpenAI-compatible shim used in this track's Phase 3 (Grok + Llama) with the vendors' native SDKs/REST APIs. Per §3.1.1, the OpenAI-compatible approach loses native features. Concretely:
- **Grok** → xAI native REST (`requests.post` to `https://api.x.ai/v1/chat/completions` or `/v1/responses`); adds `prompt_cache_key` (caching), `reasoning_effort` (reasoning), server-side tools (`web_search`, `x_search`, `code_interpreter`, `file_search`, `mcp`), and `cost_in_usd_ticks` (native cost reporting).
- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs.
- **Llama (Meta Llama API backend)** → New 4th backend option; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
- **Capability matrix expansion** → Add fields for the new native features: `caching` (already in v1, just enable for Grok), `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `reasoning_effort`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
- **Test rewrites** → The Phase 3 Red tests in `test_grok_provider.py` and `test_llama_provider.py` mock `chat.completions.create` (OpenAI SDK pattern). Native tests would mock `requests.post` (or `httpx`) and verify the JSON body shape, headers, and response parsing.
### 13.2 Project References