docs(phase-6): update ai_client+models guides; report + follow-up track setup
Phase 6 t6.1 + t6.2 (no archive per user directive): - docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend. - docs/guide_models.md: update PROVIDERS list to 8 entries (was 5). - conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report. - docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned). - conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration. Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.
This commit is contained in:
+95
-1
@@ -6,10 +6,17 @@
|
||||
|
||||
## Overview
|
||||
|
||||
`src/ai_client.py` (~116KB) is the **unified LLM client** for 5 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI) behind a single `send()` function.
|
||||
`src/ai_client.py` (~116KB) is the **unified LLM client** for 8 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Grok, Llama) behind a single `send()` function.
|
||||
|
||||
The module is a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.
|
||||
|
||||
The 8 providers split into 3 API shapes:
|
||||
- **Native SDK**: Gemini (google-genai), Anthropic (anthropic), Qwen (DashScope)
|
||||
- **OpenAI-compatible**: MiniMax, Grok, Llama (Ollama/OpenRouter/custom), DeepSeek
|
||||
- **Subprocess**: Gemini CLI
|
||||
|
||||
The OpenAI-compatible vendors all call the shared helper in `src/openai_compatible.py` (added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track; see "Shared OpenAI-Compatible Helper" section below). The MiniMax provider's `_send_minimax` was refactored to use this helper (Phase 4 of the same track, 231 → 75 lines, 68% reduction).
|
||||
|
||||
---
|
||||
|
||||
## Module-Level Imports
|
||||
@@ -430,4 +437,91 @@ Gated by env var (e.g., `RUN_REAL_AI_TESTS=1`). Hits the real API. Not in defaul
|
||||
- **[guide_state_lifecycle.md](guide_state_lifecycle.md)** — The per-provider history globals (`_anthropic_history`, etc.) are managed here; their locking and reset behavior is documented
|
||||
- **[guide_context_aggregation.md](guide_context_aggregation.md)** — The `aggregate.py` pipeline that produces the markdown the AI client sends
|
||||
- **[conductor/product.md](../conductor/product.md#multi-provider-integration)** — Product-level overview of providers
|
||||
- **[docs/reports/qwen_llama_grok_followup_audit_20260611.md](qwen_llama_grok_followup_audit_20260611.md)** — Audit of the parent track's gaps; follow-up track `qwen_llama_grok_followup_20260611` covers them
|
||||
|
||||
---
|
||||
|
||||
## Shared OpenAI-Compatible Helper (`src/openai_compatible.py`)
|
||||
|
||||
Added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track. Operates on a normalized request/response data structure so 4 OpenAI-compatible vendors (MiniMax, Grok, Llama, DeepSeek) can share the same request building, response parsing, streaming aggregation, tool call detection, and error classification logic.
|
||||
|
||||
### Data Structures
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class NormalizedResponse:
|
||||
text: str
|
||||
tool_calls: list[dict[str, Any]]
|
||||
usage_input_tokens: int
|
||||
usage_output_tokens: int
|
||||
usage_cache_read_tokens: int
|
||||
usage_cache_creation_tokens: int
|
||||
raw_response: Any
|
||||
|
||||
@dataclass
|
||||
class OpenAICompatibleRequest:
|
||||
messages: list[dict[str, Any]]
|
||||
model: str
|
||||
temperature: float = 0.0
|
||||
top_p: float = 1.0
|
||||
max_tokens: int = 8192
|
||||
tools: Optional[list[dict[str, Any]]] = None
|
||||
tool_choice: str = "auto"
|
||||
stream: bool = False
|
||||
stream_callback: Optional[Callable[[str], None]] = None
|
||||
```
|
||||
|
||||
### The Function
|
||||
|
||||
```python
|
||||
def send_openai_compatible(
|
||||
client: Any, # openai.OpenAI client with vendor-specific base_url + auth
|
||||
request: OpenAICompatibleRequest,
|
||||
*, capabilities: "VendorCapabilities", # from src/vendor_capabilities.py
|
||||
) -> NormalizedResponse:
|
||||
```
|
||||
|
||||
The function:
|
||||
1. Translates `request.messages` into the OpenAI SDK's `messages` parameter (passthrough — already in OpenAI shape).
|
||||
2. Translates `request.tools` if non-None (passthrough for now; future: strip unsupported fields based on `capabilities`).
|
||||
3. Calls `client.chat.completions.create(...)` with the right parameters.
|
||||
4. If streaming: aggregates chunks; calls `stream_callback(text_chunk)` for each text delta; collects final usage from the last chunk.
|
||||
5. If non-streaming: parses the response in one shot.
|
||||
6. Returns a `NormalizedResponse` with text, tool calls (in OpenAI shape), usage stats.
|
||||
7. On exception: classifies the OpenAI exception and re-raises as `ProviderError`.
|
||||
|
||||
### Usage Pattern (per vendor)
|
||||
|
||||
```python
|
||||
# _send_grok, _send_llama (single-shot placeholders), _send_minimax (with restored tool loop)
|
||||
def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, ...):
|
||||
client = _ensure_grok_client() # openai.OpenAI(api_key=..., base_url="https://api.x.ai/v1")
|
||||
with _grok_history_lock:
|
||||
# ... build messages, append user, system + context ...
|
||||
request = OpenAICompatibleRequest(
|
||||
messages=messages, model=_model, stream=stream,
|
||||
stream_callback=stream_callback,
|
||||
)
|
||||
caps = get_capabilities("grok", _model)
|
||||
response = send_openai_compatible(client, request, capabilities=caps)
|
||||
# ... append to history, return response.text ...
|
||||
```
|
||||
|
||||
### Qwen Adapter (`src/qwen_adapter.py`)
|
||||
|
||||
Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashScope's OpenAI-compatible mode drops important features (Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). The adapter normalizes DashScope tool format to OpenAI shape via `build_dashscope_tools()` and classifies DashScope exceptions via `classify_dashscope_error()`.
|
||||
|
||||
### Llama Multi-Backend
|
||||
|
||||
`_send_llama` supports 3 backends via the state globals `_llama_base_url` and `_llama_api_key`:
|
||||
- **Ollama** (local): `http://localhost:11434/v1`; no auth
|
||||
- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
|
||||
- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
|
||||
|
||||
The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
|
||||
|
||||
### Tests
|
||||
|
||||
- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
|
||||
- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
|
||||
- **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient
|
||||
|
||||
@@ -363,7 +363,7 @@ The file also defines several module-level constants used across the app:
|
||||
|
||||
```python
|
||||
# Provider routing
|
||||
PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]
|
||||
PROVIDERS: list[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
|
||||
|
||||
# Tool categories (for Tool Bias)
|
||||
TOOL_CATEGORIES: list[str] = [
|
||||
|
||||
@@ -0,0 +1,165 @@
|
||||
# Qwen/Llama/Grok Follow-Up Audit Report (2026-06-11)
|
||||
|
||||
**Date:** 2026-06-11
|
||||
**Author:** Tier 2 Tech Lead
|
||||
**Subject:** Why a follow-up track is needed after `qwen_llama_grok_integration_20260606` Phase 5
|
||||
|
||||
## TL;DR
|
||||
|
||||
The parent track shipped 5 of 6 phases with 50/79 tasks done. The Tech Lead **did not surface the gaps at the checkpoints**; the user discovered them only at the Phase 5 checkpoint. The user is right: the Tech Lead's "footnote for now" pattern is bad — it looks like the work was hidden until called out.
|
||||
|
||||
**7 categories of gap** are documented here. Each is captured in the new follow-up track `qwen_llama_grok_followup_20260611`.
|
||||
|
||||
---
|
||||
|
||||
## 1. Phase 5 partial: 1 of 9 UX adaptations shipped
|
||||
|
||||
**What shipped:** Adaptation 1 (Screenshot button iff vision) at `src/gui_2.py:3030` + the helper `_get_active_capabilities()` at `src/gui_2.py:733`.
|
||||
|
||||
**What didn't ship:** Adaptations 2-9:
|
||||
- Tools toggle iff tool_calling
|
||||
- Cache panel iff caching
|
||||
- Stream progress iff streaming
|
||||
- Fetch Models button iff model_discovery
|
||||
- Token budget max = context_window
|
||||
- Cost panel × 3 (estimate / "Free (local)" for localhost / "—" for other cost_tracking=false)
|
||||
|
||||
**The right move:** All 9 at once, OR explicit user-facing "I'm shipping 1 of 9; the other 8 are deferred" BEFORE doing adaptation 1. The Tech Lead did the latter in a footnote, which the user called out as bad UX.
|
||||
|
||||
---
|
||||
|
||||
## 2. Tool-call loop regression: only MiniMax works
|
||||
|
||||
**What shipped:** `_send_minimax` has a working tool loop. The other 7 vendor entry points do not.
|
||||
|
||||
| Vendor | Tool loop? | Why |
|
||||
|---|---|---|
|
||||
| `_send_minimax` | ✅ Works (231 → 75 lines after refactor + tool loop restoration) | Worker did the refactor; I added the tool loop back manually |
|
||||
| `_send_qwen` | ❌ Single-shot | Phase 2 worker omitted it (Qwen has DashScope-specific tool format) |
|
||||
| `_send_grok` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
|
||||
| `_send_llama` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
|
||||
| `_send_anthropic` | ✅ Inline (4-way duplication with the other 3) | Pre-existing pattern |
|
||||
| `_send_gemini` | ✅ Inline | Pre-existing pattern |
|
||||
| `_send_gemini_cli` | ✅ Inline | Pre-existing pattern |
|
||||
| `_send_deepseek` | ✅ Inline | Pre-existing pattern |
|
||||
|
||||
**The right move:** Lift the loop into a shared `run_with_tool_loop` helper that takes history management as injected parameters. Apply to all 8 vendors. This is a single-fix, 8-call-site refactor — much smaller than letting the duplication grow.
|
||||
|
||||
The Tech Lead caught this at the end of Phase 4 (during the MiniMax refactor) but should have caught it at the end of Phase 2 (when the Qwen worker shipped single-shot) or the end of Phase 3 (when Grok+Llama workers shipped single-shot).
|
||||
|
||||
---
|
||||
|
||||
## 3. `src/models.py` has a PROVIDERS list — the user is right that this is sprawl
|
||||
|
||||
**What's there now:**
|
||||
```python
|
||||
# src/models.py:79
|
||||
PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
|
||||
```
|
||||
|
||||
**The problem:** `src/models.py` is for **MMA data models** (Tickets, Tracks, FileItem, WorkerContext, etc.). The vendor list is an **AI client concern**. The audit script `audit_no_models_config_io.py` enforces config I/O rules; PROVIDERS has no analogous enforcement.
|
||||
|
||||
**The right move:** Move PROVIDERS to `src/ai_client.py` (or a new `src/ai_client_providers.py`). Add `scripts/audit_providers_source_of_truth.py` that fails the build if PROVIDERS is declared in models.py.
|
||||
|
||||
The Tech Lead justified keeping it in models.py with "the centralized registry pattern" without asking whether models.py was the right home.
|
||||
|
||||
---
|
||||
|
||||
## 4. `src/ai_client.py` is 2784 lines and growing
|
||||
|
||||
**What's there:** 8 vendor entry points (`_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek`, `_send_minimax`, `_send_qwen`, `_send_grok`, `_send_llama`) plus all the supporting machinery (client init, history management, error classification, reasoning content extraction).
|
||||
|
||||
**The 8 vendors' inline patterns are 70% similar.** Each has:
|
||||
- Client init (credentials + SDK setup)
|
||||
- History management (per-vendor lock + history list + repair + trim)
|
||||
- Message building (system + context + user content)
|
||||
- API call (via SDK or HTTP)
|
||||
- Tool loop (or single-shot — see gap #2)
|
||||
- Reasoning content extraction
|
||||
- Error classification
|
||||
|
||||
**The right move:** Codepath consolidation. The shared `send_openai_compatible` covers the API call. A future `run_with_tool_loop` covers the tool loop (gap #2). What's left:
|
||||
- History management as a `VendorHistory` class or per-vendor thin wrapper
|
||||
- Reasoning content extraction as a uniform helper
|
||||
- Error classification as a per-HTTP-code helper
|
||||
|
||||
Could cut `src/ai_client.py` by 30-40% (~1000 lines).
|
||||
|
||||
---
|
||||
|
||||
## 5. Local models deserve more emphasis
|
||||
|
||||
**What's there now:** Ollama is one of 3 Llama backends (Ollama, OpenRouter, custom_url). The `cost_tracking: False` for localhost is a small signal.
|
||||
|
||||
**The user feedback (verbatim):** "I want to put more emphasis and supporting local models and separating local model vending vis online/cloud vendors of models."
|
||||
|
||||
**The right architecture:**
|
||||
- Add `local: bool` to VendorCapabilities (separate from `cost_tracking`)
|
||||
- Native Ollama (`/api/chat`) as the **default** for Llama (not the OpenAI-compatible fallback)
|
||||
- Meta Llama API as a 4th backend (the docs URL returned 400 last session; needs re-verification)
|
||||
- GUI: "Local Model" badge per-vendor
|
||||
- Cost panel: 4th state "Local (no cost)" distinct from "Free (local)" and "—"
|
||||
- vLLM, LM Studio, llama.cpp as additional custom-URL backends with discoverable presets
|
||||
|
||||
This is a significant priority shift. The follow-up track's Phase 4 leads with this.
|
||||
|
||||
---
|
||||
|
||||
## 6. V2 matrix field expansion documented but not implemented
|
||||
|
||||
**What the spec says (per Grok's consultation):** Add 12 new fields to VendorCapabilities:
|
||||
- `local: bool`
|
||||
- `reasoning: bool` (xAI `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
|
||||
- `structured_output: bool` (response_format / format)
|
||||
- `code_execution: bool` (xAI code_interpreter, Anthropic Computer Use, Gemini Code Execution)
|
||||
- `web_search: bool` (xAI web_search, Gemini Grounding)
|
||||
- `x_search: bool` (xAI X/Twitter search)
|
||||
- `file_search: bool` (xAI file_search, Anthropic PDF, Gemini file API)
|
||||
- `mcp_support: bool` (xAI mcp_calls, Anthropic MCP)
|
||||
- `audio: bool` (Qwen-Audio, Gemini audio)
|
||||
- `video: bool` (Gemini video)
|
||||
- `grounding: bool` (Gemini Grounding with Google Search)
|
||||
- `computer_use: bool` (Anthropic Computer Use)
|
||||
|
||||
**What shipped:** 0 of 12. None wired. No UI adaptations.
|
||||
|
||||
The follow-up track's Phase 4 lands these.
|
||||
|
||||
---
|
||||
|
||||
## 7. Anthropic / Gemini / DeepSeek still not on the matrix
|
||||
|
||||
**What's there:** These 3 vendors have unique APIs (4-breakpoint caching, genai SDK, raw HTTP) and the migration to the matrix is non-trivial. The follow-up track is documented (`parent spec §13.1.A`) but never scheduled.
|
||||
|
||||
**The value:** Anthropic has prompt caching, extended thinking, Computer Use (big UX wins). Gemini has Grounding with Google Search, native video. DeepSeek has reasoning models.
|
||||
|
||||
The follow-up track's Phase 5 lands these.
|
||||
|
||||
---
|
||||
|
||||
## Lessons (Tech Lead Process)
|
||||
|
||||
1. **Surface gaps as they appear, not at the checkpoint.** If a task is going to be deferred mid-phase, say so immediately — don't footnote it later.
|
||||
2. **Be explicit about architectural deviations.** The `src/models.py` PROVIDERS sprawl should have been raised at Phase 2, not at Phase 5.
|
||||
3. **Plan for the test infrastructure before coding.** The tool-loop regression wasn't caught because no test exercised the loop.
|
||||
4. **The "footnote for now" pattern is bad UX.** It looks like the work was hidden until called out. Either ship the work or be explicit about deferring it BEFORE doing the work.
|
||||
|
||||
## Follow-Up Track
|
||||
|
||||
`conductor/tracks/qwen_llama_grok_followup_20260611/` — 5 phases:
|
||||
- Phase 1: Tool loop lift (run_with_tool_loop helper for 8 vendors)
|
||||
- Phase 2: PROVIDERS move (out of src/models.py)
|
||||
- Phase 3: UX adaptations 2-9 (8 of 9 deferred from parent Phase 5)
|
||||
- Phase 4: Local-first + matrix v2 expansion (12 new fields)
|
||||
- Phase 5: Anthropic / Gemini / DeepSeek migration
|
||||
|
||||
## Parent Track Status
|
||||
|
||||
`qwen_llama_grok_integration_20260606` is **NOT being archived** (per user directive). It stays open in `conductor/tracks/` for the follow-up to use as a reference. Phase 6 docs are being done now; the track folder remains at the same path.
|
||||
|
||||
## See Also
|
||||
|
||||
- `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md` — the follow-up spec
|
||||
- `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` — the follow-up state
|
||||
- `conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md` — the setup checklist
|
||||
- `conductor/tracks/qwen_llama_grok_integration_20260606/` — the parent track
|
||||
Reference in New Issue
Block a user