Private
Public Access
0
0

docs(phase-6): update ai_client+models guides; report + follow-up track setup

Phase 6 t6.1 + t6.2 (no archive per user directive):
- docs/guide_ai_client.md: update Overview to mention 8 providers (was 5);
  add 'Shared OpenAI-Compatible Helper' section explaining
  src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest,
  send_openai_compatible, usage pattern); document the Qwen adapter
  and Llama multi-backend.
- docs/guide_models.md: update PROVIDERS list to 8 entries (was 5).
- conductor/tracks.md: update the Qwen track entry to reflect
  '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up';
  add detailed status note pointing to the follow-up track + audit
  report.
- docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report
  explaining why a follow-up is needed (7 categories of gaps; the
  Tech Lead's 'footnote for now' failure mode; the lessons learned).
- conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up
  track setup (spec.md, state.toml, metadata.json, TODO.md).
  5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9,
  local-first + matrix v2, Anthropic/Gemini/DeepSeek migration.

Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed)
are NOT applied per user directive: 'we can then doc this we're not
archiving yet, if we have a follow up track I need this one to stay
up because there is still alot todo'.
This commit is contained in:
2026-06-11 09:33:18 -04:00
parent 457255bcd4
commit 691dc584eb
8 changed files with 745 additions and 3 deletions
+95 -1
View File
@@ -6,10 +6,17 @@
## Overview
`src/ai_client.py` (~116KB) is the **unified LLM client** for 5 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI) behind a single `send()` function.
`src/ai_client.py` (~116KB) is the **unified LLM client** for 8 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Grok, Llama) behind a single `send()` function.
The module is a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.
The 8 providers split into 3 API shapes:
- **Native SDK**: Gemini (google-genai), Anthropic (anthropic), Qwen (DashScope)
- **OpenAI-compatible**: MiniMax, Grok, Llama (Ollama/OpenRouter/custom), DeepSeek
- **Subprocess**: Gemini CLI
The OpenAI-compatible vendors all call the shared helper in `src/openai_compatible.py` (added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track; see "Shared OpenAI-Compatible Helper" section below). The MiniMax provider's `_send_minimax` was refactored to use this helper (Phase 4 of the same track, 231 → 75 lines, 68% reduction).
---
## Module-Level Imports
@@ -430,4 +437,91 @@ Gated by env var (e.g., `RUN_REAL_AI_TESTS=1`). Hits the real API. Not in defaul
- **[guide_state_lifecycle.md](guide_state_lifecycle.md)** — The per-provider history globals (`_anthropic_history`, etc.) are managed here; their locking and reset behavior is documented
- **[guide_context_aggregation.md](guide_context_aggregation.md)** — The `aggregate.py` pipeline that produces the markdown the AI client sends
- **[conductor/product.md](../conductor/product.md#multi-provider-integration)** — Product-level overview of providers
- **[docs/reports/qwen_llama_grok_followup_audit_20260611.md](qwen_llama_grok_followup_audit_20260611.md)** — Audit of the parent track's gaps; follow-up track `qwen_llama_grok_followup_20260611` covers them
---
## Shared OpenAI-Compatible Helper (`src/openai_compatible.py`)
Added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track. Operates on a normalized request/response data structure so 4 OpenAI-compatible vendors (MiniMax, Grok, Llama, DeepSeek) can share the same request building, response parsing, streaming aggregation, tool call detection, and error classification logic.
### Data Structures
```python
@dataclass(frozen=True)
class NormalizedResponse:
text: str
tool_calls: list[dict[str, Any]]
usage_input_tokens: int
usage_output_tokens: int
usage_cache_read_tokens: int
usage_cache_creation_tokens: int
raw_response: Any
@dataclass
class OpenAICompatibleRequest:
messages: list[dict[str, Any]]
model: str
temperature: float = 0.0
top_p: float = 1.0
max_tokens: int = 8192
tools: Optional[list[dict[str, Any]]] = None
tool_choice: str = "auto"
stream: bool = False
stream_callback: Optional[Callable[[str], None]] = None
```
### The Function
```python
def send_openai_compatible(
client: Any, # openai.OpenAI client with vendor-specific base_url + auth
request: OpenAICompatibleRequest,
*, capabilities: "VendorCapabilities", # from src/vendor_capabilities.py
) -> NormalizedResponse:
```
The function:
1. Translates `request.messages` into the OpenAI SDK's `messages` parameter (passthrough — already in OpenAI shape).
2. Translates `request.tools` if non-None (passthrough for now; future: strip unsupported fields based on `capabilities`).
3. Calls `client.chat.completions.create(...)` with the right parameters.
4. If streaming: aggregates chunks; calls `stream_callback(text_chunk)` for each text delta; collects final usage from the last chunk.
5. If non-streaming: parses the response in one shot.
6. Returns a `NormalizedResponse` with text, tool calls (in OpenAI shape), usage stats.
7. On exception: classifies the OpenAI exception and re-raises as `ProviderError`.
### Usage Pattern (per vendor)
```python
# _send_grok, _send_llama (single-shot placeholders), _send_minimax (with restored tool loop)
def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, ...):
client = _ensure_grok_client() # openai.OpenAI(api_key=..., base_url="https://api.x.ai/v1")
with _grok_history_lock:
# ... build messages, append user, system + context ...
request = OpenAICompatibleRequest(
messages=messages, model=_model, stream=stream,
stream_callback=stream_callback,
)
caps = get_capabilities("grok", _model)
response = send_openai_compatible(client, request, capabilities=caps)
# ... append to history, return response.text ...
```
### Qwen Adapter (`src/qwen_adapter.py`)
Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashScope's OpenAI-compatible mode drops important features (Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). The adapter normalizes DashScope tool format to OpenAI shape via `build_dashscope_tools()` and classifies DashScope exceptions via `classify_dashscope_error()`.
### Llama Multi-Backend
`_send_llama` supports 3 backends via the state globals `_llama_base_url` and `_llama_api_key`:
- **Ollama** (local): `http://localhost:11434/v1`; no auth
- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
### Tests
- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
- **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient
+1 -1
View File
@@ -363,7 +363,7 @@ The file also defines several module-level constants used across the app:
```python
# Provider routing
PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]
PROVIDERS: list[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
# Tool categories (for Tool Bias)
TOOL_CATEGORIES: list[str] = [
@@ -0,0 +1,165 @@
# Qwen/Llama/Grok Follow-Up Audit Report (2026-06-11)
**Date:** 2026-06-11
**Author:** Tier 2 Tech Lead
**Subject:** Why a follow-up track is needed after `qwen_llama_grok_integration_20260606` Phase 5
## TL;DR
The parent track shipped 5 of 6 phases with 50/79 tasks done. The Tech Lead **did not surface the gaps at the checkpoints**; the user discovered them only at the Phase 5 checkpoint. The user is right: the Tech Lead's "footnote for now" pattern is bad — it looks like the work was hidden until called out.
**7 categories of gap** are documented here. Each is captured in the new follow-up track `qwen_llama_grok_followup_20260611`.
---
## 1. Phase 5 partial: 1 of 9 UX adaptations shipped
**What shipped:** Adaptation 1 (Screenshot button iff vision) at `src/gui_2.py:3030` + the helper `_get_active_capabilities()` at `src/gui_2.py:733`.
**What didn't ship:** Adaptations 2-9:
- Tools toggle iff tool_calling
- Cache panel iff caching
- Stream progress iff streaming
- Fetch Models button iff model_discovery
- Token budget max = context_window
- Cost panel × 3 (estimate / "Free (local)" for localhost / "—" for other cost_tracking=false)
**The right move:** All 9 at once, OR explicit user-facing "I'm shipping 1 of 9; the other 8 are deferred" BEFORE doing adaptation 1. The Tech Lead did the latter in a footnote, which the user called out as bad UX.
---
## 2. Tool-call loop regression: only MiniMax works
**What shipped:** `_send_minimax` has a working tool loop. The other 7 vendor entry points do not.
| Vendor | Tool loop? | Why |
|---|---|---|
| `_send_minimax` | ✅ Works (231 → 75 lines after refactor + tool loop restoration) | Worker did the refactor; I added the tool loop back manually |
| `_send_qwen` | ❌ Single-shot | Phase 2 worker omitted it (Qwen has DashScope-specific tool format) |
| `_send_grok` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
| `_send_llama` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
| `_send_anthropic` | ✅ Inline (4-way duplication with the other 3) | Pre-existing pattern |
| `_send_gemini` | ✅ Inline | Pre-existing pattern |
| `_send_gemini_cli` | ✅ Inline | Pre-existing pattern |
| `_send_deepseek` | ✅ Inline | Pre-existing pattern |
**The right move:** Lift the loop into a shared `run_with_tool_loop` helper that takes history management as injected parameters. Apply to all 8 vendors. This is a single-fix, 8-call-site refactor — much smaller than letting the duplication grow.
The Tech Lead caught this at the end of Phase 4 (during the MiniMax refactor) but should have caught it at the end of Phase 2 (when the Qwen worker shipped single-shot) or the end of Phase 3 (when Grok+Llama workers shipped single-shot).
---
## 3. `src/models.py` has a PROVIDERS list — the user is right that this is sprawl
**What's there now:**
```python
# src/models.py:79
PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
```
**The problem:** `src/models.py` is for **MMA data models** (Tickets, Tracks, FileItem, WorkerContext, etc.). The vendor list is an **AI client concern**. The audit script `audit_no_models_config_io.py` enforces config I/O rules; PROVIDERS has no analogous enforcement.
**The right move:** Move PROVIDERS to `src/ai_client.py` (or a new `src/ai_client_providers.py`). Add `scripts/audit_providers_source_of_truth.py` that fails the build if PROVIDERS is declared in models.py.
The Tech Lead justified keeping it in models.py with "the centralized registry pattern" without asking whether models.py was the right home.
---
## 4. `src/ai_client.py` is 2784 lines and growing
**What's there:** 8 vendor entry points (`_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek`, `_send_minimax`, `_send_qwen`, `_send_grok`, `_send_llama`) plus all the supporting machinery (client init, history management, error classification, reasoning content extraction).
**The 8 vendors' inline patterns are 70% similar.** Each has:
- Client init (credentials + SDK setup)
- History management (per-vendor lock + history list + repair + trim)
- Message building (system + context + user content)
- API call (via SDK or HTTP)
- Tool loop (or single-shot — see gap #2)
- Reasoning content extraction
- Error classification
**The right move:** Codepath consolidation. The shared `send_openai_compatible` covers the API call. A future `run_with_tool_loop` covers the tool loop (gap #2). What's left:
- History management as a `VendorHistory` class or per-vendor thin wrapper
- Reasoning content extraction as a uniform helper
- Error classification as a per-HTTP-code helper
Could cut `src/ai_client.py` by 30-40% (~1000 lines).
---
## 5. Local models deserve more emphasis
**What's there now:** Ollama is one of 3 Llama backends (Ollama, OpenRouter, custom_url). The `cost_tracking: False` for localhost is a small signal.
**The user feedback (verbatim):** "I want to put more emphasis and supporting local models and separating local model vending vis online/cloud vendors of models."
**The right architecture:**
- Add `local: bool` to VendorCapabilities (separate from `cost_tracking`)
- Native Ollama (`/api/chat`) as the **default** for Llama (not the OpenAI-compatible fallback)
- Meta Llama API as a 4th backend (the docs URL returned 400 last session; needs re-verification)
- GUI: "Local Model" badge per-vendor
- Cost panel: 4th state "Local (no cost)" distinct from "Free (local)" and "—"
- vLLM, LM Studio, llama.cpp as additional custom-URL backends with discoverable presets
This is a significant priority shift. The follow-up track's Phase 4 leads with this.
---
## 6. V2 matrix field expansion documented but not implemented
**What the spec says (per Grok's consultation):** Add 12 new fields to VendorCapabilities:
- `local: bool`
- `reasoning: bool` (xAI `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
- `structured_output: bool` (response_format / format)
- `code_execution: bool` (xAI code_interpreter, Anthropic Computer Use, Gemini Code Execution)
- `web_search: bool` (xAI web_search, Gemini Grounding)
- `x_search: bool` (xAI X/Twitter search)
- `file_search: bool` (xAI file_search, Anthropic PDF, Gemini file API)
- `mcp_support: bool` (xAI mcp_calls, Anthropic MCP)
- `audio: bool` (Qwen-Audio, Gemini audio)
- `video: bool` (Gemini video)
- `grounding: bool` (Gemini Grounding with Google Search)
- `computer_use: bool` (Anthropic Computer Use)
**What shipped:** 0 of 12. None wired. No UI adaptations.
The follow-up track's Phase 4 lands these.
---
## 7. Anthropic / Gemini / DeepSeek still not on the matrix
**What's there:** These 3 vendors have unique APIs (4-breakpoint caching, genai SDK, raw HTTP) and the migration to the matrix is non-trivial. The follow-up track is documented (`parent spec §13.1.A`) but never scheduled.
**The value:** Anthropic has prompt caching, extended thinking, Computer Use (big UX wins). Gemini has Grounding with Google Search, native video. DeepSeek has reasoning models.
The follow-up track's Phase 5 lands these.
---
## Lessons (Tech Lead Process)
1. **Surface gaps as they appear, not at the checkpoint.** If a task is going to be deferred mid-phase, say so immediately — don't footnote it later.
2. **Be explicit about architectural deviations.** The `src/models.py` PROVIDERS sprawl should have been raised at Phase 2, not at Phase 5.
3. **Plan for the test infrastructure before coding.** The tool-loop regression wasn't caught because no test exercised the loop.
4. **The "footnote for now" pattern is bad UX.** It looks like the work was hidden until called out. Either ship the work or be explicit about deferring it BEFORE doing the work.
## Follow-Up Track
`conductor/tracks/qwen_llama_grok_followup_20260611/` — 5 phases:
- Phase 1: Tool loop lift (run_with_tool_loop helper for 8 vendors)
- Phase 2: PROVIDERS move (out of src/models.py)
- Phase 3: UX adaptations 2-9 (8 of 9 deferred from parent Phase 5)
- Phase 4: Local-first + matrix v2 expansion (12 new fields)
- Phase 5: Anthropic / Gemini / DeepSeek migration
## Parent Track Status
`qwen_llama_grok_integration_20260606` is **NOT being archived** (per user directive). It stays open in `conductor/tracks/` for the follow-up to use as a reference. Phase 6 docs are being done now; the track folder remains at the same path.
## See Also
- `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md` — the follow-up spec
- `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` — the follow-up state
- `conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md` — the setup checklist
- `conductor/tracks/qwen_llama_grok_integration_20260606/` — the parent track