docs(phase-6): update ai_client+models guides; report + follow-up track setup
Phase 6 t6.1 + t6.2 (no archive per user directive): - docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend. - docs/guide_models.md: update PROVIDERS list to 8 entries (was 5). - conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report. - docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned). - conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration. Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.
This commit is contained in:
+95
-1
@@ -6,10 +6,17 @@
|
||||
|
||||
## Overview
|
||||
|
||||
`src/ai_client.py` (~116KB) is the **unified LLM client** for 5 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI) behind a single `send()` function.
|
||||
`src/ai_client.py` (~116KB) is the **unified LLM client** for 8 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Grok, Llama) behind a single `send()` function.
|
||||
|
||||
The module is a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.
|
||||
|
||||
The 8 providers split into 3 API shapes:
|
||||
- **Native SDK**: Gemini (google-genai), Anthropic (anthropic), Qwen (DashScope)
|
||||
- **OpenAI-compatible**: MiniMax, Grok, Llama (Ollama/OpenRouter/custom), DeepSeek
|
||||
- **Subprocess**: Gemini CLI
|
||||
|
||||
The OpenAI-compatible vendors all call the shared helper in `src/openai_compatible.py` (added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track; see "Shared OpenAI-Compatible Helper" section below). The MiniMax provider's `_send_minimax` was refactored to use this helper (Phase 4 of the same track, 231 → 75 lines, 68% reduction).
|
||||
|
||||
---
|
||||
|
||||
## Module-Level Imports
|
||||
@@ -430,4 +437,91 @@ Gated by env var (e.g., `RUN_REAL_AI_TESTS=1`). Hits the real API. Not in defaul
|
||||
- **[guide_state_lifecycle.md](guide_state_lifecycle.md)** — The per-provider history globals (`_anthropic_history`, etc.) are managed here; their locking and reset behavior is documented
|
||||
- **[guide_context_aggregation.md](guide_context_aggregation.md)** — The `aggregate.py` pipeline that produces the markdown the AI client sends
|
||||
- **[conductor/product.md](../conductor/product.md#multi-provider-integration)** — Product-level overview of providers
|
||||
- **[docs/reports/qwen_llama_grok_followup_audit_20260611.md](qwen_llama_grok_followup_audit_20260611.md)** — Audit of the parent track's gaps; follow-up track `qwen_llama_grok_followup_20260611` covers them
|
||||
|
||||
---
|
||||
|
||||
## Shared OpenAI-Compatible Helper (`src/openai_compatible.py`)
|
||||
|
||||
Added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track. Operates on a normalized request/response data structure so 4 OpenAI-compatible vendors (MiniMax, Grok, Llama, DeepSeek) can share the same request building, response parsing, streaming aggregation, tool call detection, and error classification logic.
|
||||
|
||||
### Data Structures
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class NormalizedResponse:
|
||||
text: str
|
||||
tool_calls: list[dict[str, Any]]
|
||||
usage_input_tokens: int
|
||||
usage_output_tokens: int
|
||||
usage_cache_read_tokens: int
|
||||
usage_cache_creation_tokens: int
|
||||
raw_response: Any
|
||||
|
||||
@dataclass
|
||||
class OpenAICompatibleRequest:
|
||||
messages: list[dict[str, Any]]
|
||||
model: str
|
||||
temperature: float = 0.0
|
||||
top_p: float = 1.0
|
||||
max_tokens: int = 8192
|
||||
tools: Optional[list[dict[str, Any]]] = None
|
||||
tool_choice: str = "auto"
|
||||
stream: bool = False
|
||||
stream_callback: Optional[Callable[[str], None]] = None
|
||||
```
|
||||
|
||||
### The Function
|
||||
|
||||
```python
|
||||
def send_openai_compatible(
|
||||
client: Any, # openai.OpenAI client with vendor-specific base_url + auth
|
||||
request: OpenAICompatibleRequest,
|
||||
*, capabilities: "VendorCapabilities", # from src/vendor_capabilities.py
|
||||
) -> NormalizedResponse:
|
||||
```
|
||||
|
||||
The function:
|
||||
1. Translates `request.messages` into the OpenAI SDK's `messages` parameter (passthrough — already in OpenAI shape).
|
||||
2. Translates `request.tools` if non-None (passthrough for now; future: strip unsupported fields based on `capabilities`).
|
||||
3. Calls `client.chat.completions.create(...)` with the right parameters.
|
||||
4. If streaming: aggregates chunks; calls `stream_callback(text_chunk)` for each text delta; collects final usage from the last chunk.
|
||||
5. If non-streaming: parses the response in one shot.
|
||||
6. Returns a `NormalizedResponse` with text, tool calls (in OpenAI shape), usage stats.
|
||||
7. On exception: classifies the OpenAI exception and re-raises as `ProviderError`.
|
||||
|
||||
### Usage Pattern (per vendor)
|
||||
|
||||
```python
|
||||
# _send_grok, _send_llama (single-shot placeholders), _send_minimax (with restored tool loop)
|
||||
def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, ...):
|
||||
client = _ensure_grok_client() # openai.OpenAI(api_key=..., base_url="https://api.x.ai/v1")
|
||||
with _grok_history_lock:
|
||||
# ... build messages, append user, system + context ...
|
||||
request = OpenAICompatibleRequest(
|
||||
messages=messages, model=_model, stream=stream,
|
||||
stream_callback=stream_callback,
|
||||
)
|
||||
caps = get_capabilities("grok", _model)
|
||||
response = send_openai_compatible(client, request, capabilities=caps)
|
||||
# ... append to history, return response.text ...
|
||||
```
|
||||
|
||||
### Qwen Adapter (`src/qwen_adapter.py`)
|
||||
|
||||
Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashScope's OpenAI-compatible mode drops important features (Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). The adapter normalizes DashScope tool format to OpenAI shape via `build_dashscope_tools()` and classifies DashScope exceptions via `classify_dashscope_error()`.
|
||||
|
||||
### Llama Multi-Backend
|
||||
|
||||
`_send_llama` supports 3 backends via the state globals `_llama_base_url` and `_llama_api_key`:
|
||||
- **Ollama** (local): `http://localhost:11434/v1`; no auth
|
||||
- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
|
||||
- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
|
||||
|
||||
The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
|
||||
|
||||
### Tests
|
||||
|
||||
- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
|
||||
- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
|
||||
- **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient
|
||||
|
||||
Reference in New Issue
Block a user