docs(phase-6): update ai_client+models guides; report + follow-up track setup

Phase 6 t6.1 + t6.2 (no archive per user directive):

- docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend.
- docs/guide_models.md: update PROVIDERS list to 8 entries (was 5).
- conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report.
- docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned).
- conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration.

Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.
2026-06-11 09:33:18 -04:00
parent 457255bcd4
commit 691dc584eb
8 changed files with 745 additions and 3 deletions
@@ -6,10 +6,17 @@

 ## Overview

-`src/ai_client.py` (~116KB) is the **unified LLM client** for 5 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI) behind a single `send()` function.
+`src/ai_client.py` (~116KB) is the **unified LLM client** for 8 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Grok, Llama) behind a single `send()` function.

 The module is a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.

+The 8 providers split into 3 API shapes:
+- **Native SDK**: Gemini (google-genai), Anthropic (anthropic), Qwen (DashScope)
+- **OpenAI-compatible**: MiniMax, Grok, Llama (Ollama/OpenRouter/custom), DeepSeek
+- **Subprocess**: Gemini CLI
+
+The OpenAI-compatible vendors all call the shared helper in `src/openai_compatible.py` (added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track; see "Shared OpenAI-Compatible Helper" section below). The MiniMax provider's `_send_minimax` was refactored to use this helper (Phase 4 of the same track, 231 → 75 lines, 68% reduction).
+
 ---

 ## Module-Level Imports
@@ -430,4 +437,91 @@ Gated by env var (e.g., `RUN_REAL_AI_TESTS=1`). Hits the real API. Not in defaul
 - **[guide_state_lifecycle.md](guide_state_lifecycle.md)** — The per-provider history globals (`_anthropic_history`, etc.) are managed here; their locking and reset behavior is documented
 - **[guide_context_aggregation.md](guide_context_aggregation.md)** — The `aggregate.py` pipeline that produces the markdown the AI client sends
 - **[conductor/product.md](../conductor/product.md#multi-provider-integration)** — Product-level overview of providers
+- **[docs/reports/qwen_llama_grok_followup_audit_20260611.md](reports/qwen_llama_grok_followup_audit_20260611.md)** — Audit of the parent track's gaps; follow-up track `qwen_llama_grok_followup_20260611` covers them
+
+---
+
+## Shared OpenAI-Compatible Helper (`src/openai_compatible.py`)
+
+Added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track. The helper operates on normalized request and response dataclasses so that the 4 OpenAI-compatible vendors (MiniMax, Grok, Llama, DeepSeek) share the same request building, response parsing, streaming aggregation, tool call detection, and error classification logic.
+
+### Data Structures
+
+```python
+@dataclass(frozen=True)
+class NormalizedResponse:
+    text: str
+    tool_calls: list[dict[str, Any]]
+    usage_input_tokens: int
+    usage_output_tokens: int
+    usage_cache_read_tokens: int
+    usage_cache_creation_tokens: int
+    raw_response: Any
+
+@dataclass
+class OpenAICompatibleRequest:
+    messages: list[dict[str, Any]]
+    model: str
+    temperature: float = 0.0
+    top_p: float = 1.0
+    max_tokens: int = 8192
+    tools: Optional[list[dict[str, Any]]] = None
+    tool_choice: str = "auto"
+    stream: bool = False
+    stream_callback: Optional[Callable[[str], None]] = None
+```
+
+### The Function
+
+```python
+def send_openai_compatible(
+    client: Any,        # openai.OpenAI client with vendor-specific base_url + auth
+    request: OpenAICompatibleRequest,
+    *, capabilities: "VendorCapabilities",  # from src/vendor_capabilities.py
+) -> NormalizedResponse: ...
+```
+
+The function:
+1. Translates `request.messages` into the OpenAI SDK's `messages` parameter (passthrough — already in OpenAI shape).
+2. Translates `request.tools` if non-None (passthrough for now; future: strip unsupported fields based on `capabilities`).
+3. Calls `client.chat.completions.create(...)` with the right parameters.
+4. If streaming: aggregates chunks; calls `stream_callback(text_chunk)` for each text delta; collects final usage from the last chunk.
+5. If non-streaming: parses the response in one shot.
+6. Returns a `NormalizedResponse` with text, tool calls (in OpenAI shape), usage stats.
+7. On exception: classifies the OpenAI exception and re-raises as `ProviderError`.
+
+### Usage Pattern (per vendor)
+
+```python
+# _send_grok, _send_llama (single-shot placeholders), _send_minimax (with restored tool loop)
+def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, ...):
+    client = _ensure_grok_client()  # openai.OpenAI(api_key=..., base_url="https://api.x.ai/v1")
+    with _grok_history_lock:
+        # ... build messages, append user, system + context ...
+        request = OpenAICompatibleRequest(
+            messages=messages, model=_model, stream=stream,
+            stream_callback=stream_callback,
+        )
+        caps = get_capabilities("grok", _model)
+        response = send_openai_compatible(client, request, capabilities=caps)
+        # ... append to history, return response.text ...
+```
+
+### Qwen Adapter (`src/qwen_adapter.py`)
+
+Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashScope's OpenAI-compatible mode drops important features (Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). The adapter normalizes DashScope tool format to OpenAI shape via `build_dashscope_tools()` and classifies DashScope exceptions via `classify_dashscope_error()`.
+
+### Llama Multi-Backend
+
+`_send_llama` supports 3 backends via the state globals `_llama_base_url` and `_llama_api_key`:
+- **Ollama** (local): `http://localhost:11434/v1`; no auth
+- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
+- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
+
+The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
+
+### Tests
+
+- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
+- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
 - **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient