diff --git a/conductor/tracks/qwen_llama_grok_integration_20260606/plan.md b/conductor/tracks/qwen_llama_grok_integration_20260606/plan.md new file mode 100644 index 00000000..813d71f3 --- /dev/null +++ b/conductor/tracks/qwen_llama_grok_integration_20260606/plan.md @@ -0,0 +1,2177 @@ +# Qwen, Llama & Grok Vendor Integration + Capability Matrix — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add first-class support for Qwen (DashScope native), Llama (Ollama + OpenRouter + custom URL), and Grok (xAI). Introduce a Vendor Capability Matrix that declares per-(vendor, model) feature support and lets the GUI adapt dynamically. Refactor the existing MiniMax provider to use a new shared OpenAI-compatible send helper. Data-oriented design: shared algorithm on normalized data; per-vendor entry points are thin boundary adapters. + +**Architecture:** Three new modules — `src/vendor_capabilities.py` (matrix framework + registry), `src/openai_compatible.py` (shared helper), and the new vendor entry points in `src/ai_client.py`. The shared helper operates on a `NormalizedResponse` data structure. Each vendor's `_send_()` is a thin adapter that initializes a vendor-specific client, loads its history, calls the shared helper, and updates the history. MiniMax is refactored to use the helper (pure win; same SDK, same patterns). The GUI reads `get_capabilities(active_vendor, active_model)` once per render and uses the 7 v1 capabilities (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking) to enable/disable 9 UI elements. + +**Tech Stack:** Python 3.11+, pytest, `dashscope>=1.14.0,<2.0.0` (new dep), `openai>=1.0.0` (existing). Stdlib `dataclasses`, `enum`, `threading`, `typing`. 
**1-space indentation mandatory.** No comments in production code (per project style). + +**Reference:** See `conductor/tracks/qwen_llama_grok_integration_20260606/spec.md` for the full design, data model, per-vendor details, and UX adaptations. + +--- + +## File Structure + +| File | Action | Responsibility | +|---|---|---| +| `src/vendor_capabilities.py` | Create | `VendorCapabilities` dataclass, `_REGISTRY`, `get_capabilities(vendor, model) -> VendorCapabilities`, `list_models_for_vendor(vendor) -> list[str]` | +| `src/openai_compatible.py` | Create | `NormalizedResponse`, `OpenAICompatibleRequest`, `send_openai_compatible(client, request, capabilities) -> NormalizedResponse`, `_classify_openai_compatible_error()` | +| `src/ai_client.py` | Modify | Add `_qwen_*`, `_llama_*`, `_grok_*` state + functions; refactor `_send_minimax()` to use `send_openai_compatible`; add `qwen`, `llama`, `grok` to `_PROVIDERS_LIST` (if such exists) and to `set_provider()` validation | +| `src/cost_tracker.py` | Modify | Add pricing entries for Qwen, Llama, Grok models | +| `src/models.py` | Modify | If `PROVIDERS` constant lives here, add `qwen`, `llama`, `grok` | +| `src/gui_2.py` | Modify | Add `_get_active_capabilities()` helper; apply 9 UX adaptations; register new providers in `PROVIDERS` list | +| `src/app_controller.py` | Modify | Register new providers in `PROVIDERS` list; pass capabilities to relevant code paths | +| `pyproject.toml` | Modify | Add `dashscope>=1.14.0,<2.0.0` to dependencies | +| `credentials_template.toml` | Modify | Add `[qwen]`, `[llama]`, `[grok]` example sections | +| `tests/test_vendor_capabilities.py` | Create | 5+ tests for the matrix | +| `tests/test_openai_compatible.py` | Create | 6+ tests for the shared helper | +| `tests/test_qwen_provider.py` | Create | 5+ tests for the Qwen provider | +| `tests/test_llama_provider.py` | Create | 6+ tests for the Llama multi-backend provider | +| `tests/test_grok_provider.py` | Create | 2+ tests for the Grok 
provider | +| `tests/test_minimax_provider.py` | Modify (Phase 4) | Verify refactor preserves behavior; existing tests should pass unchanged | +| `docs/guide_ai_client.md` | Modify (Phase 6) | Document new vendors, capability matrix, shared helper | +| `docs/guide_models.md` | Modify (Phase 6) | Document new PROVIDERS entries | +| `conductor/tracks.md` | Modify (Phase 6) | Move entry from Backlog to Recently Completed | + +--- + +# Phase 1: Capability Matrix Framework + Shared Helper + +> Goal: New `src/vendor_capabilities.py` and `src/openai_compatible.py` modules exist with passing unit tests. `dashscope` added to `pyproject.toml`. No changes to `src/ai_client.py` yet (MiniMax refactor is Phase 4). + +--- + +## Task 1.1: Add VendorCapabilities dataclass + minimal registry + +**Files:** +- Create: `src/vendor_capabilities.py` + +- [ ] **Step 1: Create the file with the dataclass and a minimal registry** + +```python +from dataclasses import dataclass + +@dataclass(frozen=True) +class VendorCapabilities: + vendor: str + model: str + vision: bool = False + tool_calling: bool = True + caching: bool = False + streaming: bool = True + model_discovery: bool = True + context_window: int = 8192 + cost_tracking: bool = True + cost_input_per_mtok: float = 0.0 + cost_output_per_mtok: float = 0.0 + notes: str = "" + +_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {} + +def register(cap: VendorCapabilities) -> None: + _REGISTRY[(cap.vendor, cap.model)] = cap + +def get_capabilities(vendor: str, model: str) -> VendorCapabilities: + if (vendor, model) in _REGISTRY: + return _REGISTRY[(vendor, model)] + if (vendor, "*") in _REGISTRY: + return _REGISTRY[(vendor, "*")] + raise KeyError(f"No capabilities registered for vendor={vendor!r} model={model!r}") +``` + +- [ ] **Step 2: Verify the file is importable** + +Run: `uv run python -c "from src.vendor_capabilities import VendorCapabilities, get_capabilities, register; c = VendorCapabilities(vendor='test', model='m'); 
register(c); print(get_capabilities('test', 'm'))"` +Expected: prints a `VendorCapabilities(...)` line. + +- [ ] **Step 3: Commit** + +```bash +git add src/vendor_capabilities.py +git commit -m "feat(vendor_capabilities): add VendorCapabilities dataclass and registry" +``` + +--- + +## Task 1.2: Write red tests for get_capabilities + +**Files:** +- Create: `tests/test_vendor_capabilities.py` + +- [ ] **Step 1: Create the test file with 5 tests** + +```python +import pytest +from src.vendor_capabilities import VendorCapabilities, get_capabilities, register + +@pytest.fixture(autouse=True) +def _clean_registry(): + from src import vendor_capabilities + original = dict(vendor_capabilities._REGISTRY) + vendor_capabilities._REGISTRY.clear() + yield + vendor_capabilities._REGISTRY.clear() + vendor_capabilities._REGISTRY.update(original) + +def test_get_capabilities_lookup_known_model() -> None: + register(VendorCapabilities(vendor="qwen", model="qwen-max", vision=False, context_window=32768)) + caps = get_capabilities("qwen", "qwen-max") + assert caps.vendor == "qwen" + assert caps.model == "qwen-max" + assert caps.context_window == 32768 + assert caps.vision is False + +def test_get_capabilities_fallback_to_vendor_default() -> None: + register(VendorCapabilities(vendor="llama", model="*", context_window=131072, cost_tracking=False)) + caps = get_capabilities("llama", "llama-3.3-70b-specdec") + assert caps.context_window == 131072 + assert caps.cost_tracking is False + +def test_get_capabilities_unknown_vendor_raises() -> None: + with pytest.raises(KeyError, match="No capabilities registered"): + get_capabilities("nonexistent", "anymodel") + +def test_get_capabilities_unknown_model_raises() -> None: + register(VendorCapabilities(vendor="qwen", model="qwen-max")) + with pytest.raises(KeyError, match="No capabilities registered"): + get_capabilities("qwen", "qwen-nonexistent") + +def test_register_overwrites_existing_entry() -> None: + register(VendorCapabilities(vendor="qwen", model="qwen-max", 
context_window=32768)) + register(VendorCapabilities(vendor="qwen", model="qwen-max", context_window=65536)) + caps = get_capabilities("qwen", "qwen-max") + assert caps.context_window == 65536 +``` + +- [ ] **Step 2: Run, confirm 5 pass (registry already implemented)** + +Run: `uv run pytest tests/test_vendor_capabilities.py -v` +Expected: All 5 tests PASS (the registry was implemented in Task 1.1; tests confirm behavior). + +- [ ] **Step 3: Commit** + +```bash +git add tests/test_vendor_capabilities.py +git commit -m "test(vendor_capabilities): add tests for registry lookup and fallback" +``` + +--- + +## Task 1.3: Add list_models_for_vendor() and initial registry population + +**Files:** +- Modify: `src/vendor_capabilities.py` + +- [ ] **Step 1: Add list_models_for_vendor() and register initial entries for the 4 OpenAI-compatible vendors + Qwen** + +Append to `src/vendor_capabilities.py`: + +```python +def list_models_for_vendor(vendor: str) -> list[str]: + return sorted({m for v, m in _REGISTRY if v == vendor and m != "*"}) + +register(VendorCapabilities(vendor="minimax", model="*", context_window=131072, cost_input_per_mtok=0.20, cost_output_per_mtok=0.20)) +register(VendorCapabilities(vendor="minimax", model="grok-2-latest", context_window=131072)) + +register(VendorCapabilities(vendor="grok", model="*", context_window=131072, cost_input_per_mtok=2.00, cost_output_per_mtok=10.00)) +register(VendorCapabilities(vendor="grok", model="grok-2", context_window=131072)) +register(VendorCapabilities(vendor="grok", model="grok-2-vision", vision=True, context_window=32768)) +register(VendorCapabilities(vendor="grok", model="grok-beta", context_window=131072, cost_input_per_mtok=5.00, cost_output_per_mtok=15.00)) + +register(VendorCapabilities(vendor="llama", model="*", context_window=131072)) +register(VendorCapabilities(vendor="llama", model="llama-3.1-8b-instant", context_window=131072, cost_input_per_mtok=0.05, cost_output_per_mtok=0.08)) 
+register(VendorCapabilities(vendor="llama", model="llama-3.1-70b-versatile", context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79)) +register(VendorCapabilities(vendor="llama", model="llama-3.1-405b-reasoning", context_window=131072, cost_input_per_mtok=3.00, cost_output_per_mtok=3.00)) +register(VendorCapabilities(vendor="llama", model="llama-3.2-1b-preview", context_window=131072, cost_input_per_mtok=0.04, cost_output_per_mtok=0.04)) +register(VendorCapabilities(vendor="llama", model="llama-3.2-3b-preview", context_window=131072, cost_input_per_mtok=0.06, cost_output_per_mtok=0.06)) +register(VendorCapabilities(vendor="llama", model="llama-3.2-11b-vision-preview", vision=True, context_window=131072, cost_input_per_mtok=0.18, cost_output_per_mtok=0.18)) +register(VendorCapabilities(vendor="llama", model="llama-3.2-90b-vision-preview", vision=True, context_window=131072, cost_input_per_mtok=0.90, cost_output_per_mtok=0.90)) +register(VendorCapabilities(vendor="llama", model="llama-3.3-70b-specdec", context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79)) + +register(VendorCapabilities(vendor="qwen", model="*", context_window=32768)) +register(VendorCapabilities(vendor="qwen", model="qwen-turbo", context_window=1000000, cost_input_per_mtok=0.05, cost_output_per_mtok=0.10)) +register(VendorCapabilities(vendor="qwen", model="qwen-plus", context_window=131072, cost_input_per_mtok=0.40, cost_output_per_mtok=1.20)) +register(VendorCapabilities(vendor="qwen", model="qwen-max", context_window=32768, cost_input_per_mtok=2.00, cost_output_per_mtok=6.00)) +register(VendorCapabilities(vendor="qwen", model="qwen-long", context_window=1000000, cost_input_per_mtok=0.07, cost_output_per_mtok=0.28)) +register(VendorCapabilities(vendor="qwen", model="qwen-vl-plus", vision=True, context_window=131072, cost_input_per_mtok=0.21, cost_output_per_mtok=0.63)) +register(VendorCapabilities(vendor="qwen", model="qwen-vl-max", vision=True, 
context_window=32768, cost_input_per_mtok=0.50, cost_output_per_mtok=1.50)) +register(VendorCapabilities(vendor="qwen", model="qwen-audio", context_window=32768, cost_input_per_mtok=0.10, cost_output_per_mtok=0.30, notes="Text-only in v1; audio input deferred")) + +register(VendorCapabilities(vendor="anthropic", model="*", caching=True, context_window=180000, cost_input_per_mtok=3.00, cost_output_per_mtok=15.00, notes="pending_migration")) +register(VendorCapabilities(vendor="gemini", model="*", caching=True, context_window=900000, cost_input_per_mtok=1.25, cost_output_per_mtok=5.00, notes="pending_migration")) +register(VendorCapabilities(vendor="deepseek", model="*", context_window=32768, cost_input_per_mtok=0.14, cost_output_per_mtok=0.28, notes="pending_migration")) +``` + +- [ ] **Step 2: Write tests for list_models_for_vendor()** + +Add `list_models_for_vendor` to the existing `from src.vendor_capabilities import ...` line at the top of `tests/test_vendor_capabilities.py` (the new tests call it directly), then append to the file: + +```python +def test_list_models_for_vendor_returns_sorted() -> None: + register(VendorCapabilities(vendor="test", model="zeta")) + register(VendorCapabilities(vendor="test", model="alpha")) + register(VendorCapabilities(vendor="test", model="*")) + models = list_models_for_vendor("test") + assert models == ["alpha", "zeta"] + assert "*" not in models + +def test_list_models_for_vendor_unknown_returns_empty() -> None: + assert list_models_for_vendor("nonexistent") == [] +``` + +- [ ] **Step 3: Run, confirm all 7 tests pass** + +Run: `uv run pytest tests/test_vendor_capabilities.py -v` +Expected: 7 tests PASS. 
+ +- [ ] **Step 4: Commit** + +```bash +git add src/vendor_capabilities.py tests/test_vendor_capabilities.py +git commit -m "feat(vendor_capabilities): add list_models_for_vendor + initial registry (Qwen/Llama/Grok/MiniMax/stubs)" +``` + +--- + +## Task 1.4: Add dashscope dependency to pyproject.toml + +**Files:** +- Modify: `pyproject.toml` (find the `dependencies = [` block) + +- [ ] **Step 1: Read the dependencies block** + +Run: `manual-slop_get_file_slice path=pyproject.toml start_line=1 end_line=80` (or grep for `dependencies`). + +- [ ] **Step 2: Add dashscope to the dependencies list** + +If dependencies are a list, add: +```toml +"dashscope>=1.14.0,<2.0.0", +``` + +If dependencies are a different format, add accordingly. **1-space indentation** in toml; place near other SDK entries (`openai`, `anthropic`, etc.). + +- [ ] **Step 3: Verify the dependency resolves** + +Run: `uv lock && uv sync 2>&1 | tail -10` +Expected: dashscope installs successfully (or already present). + +- [ ] **Step 4: Commit** + +```bash +git add pyproject.toml uv.lock +git commit -m "chore(deps): add dashscope>=1.14.0,<2.0.0 for Qwen support" +``` + +--- + +## Task 1.5: Write red tests for src/openai_compatible.py + +**Files:** +- Create: `tests/test_openai_compatible.py` + +- [ ] **Step 1: Create the test file with 6 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src.openai_compatible import ( + NormalizedResponse, + OpenAICompatibleRequest, + send_openai_compatible, +) +from src.vendor_capabilities import VendorCapabilities, register + +@pytest.fixture +def caps() -> VendorCapabilities: + return VendorCapabilities(vendor="test", model="test-model", context_window=8192, cost_input_per_mtok=1.0, cost_output_per_mtok=2.0) + +def _mock_completion(text: str = "hello", tool_calls: list | None = None, usage_input: int = 10, usage_output: int = 5): + m = MagicMock() + m.choices = [MagicMock()] + m.choices[0].message.content = text + 
m.choices[0].message.tool_calls = tool_calls or [] + m.usage.prompt_tokens = usage_input + m.usage.completion_tokens = usage_output + m.usage.prompt_tokens_details = None + m.usage.completion_tokens_details = None + return m + +def test_send_non_streaming_returns_normalized_response(caps: VendorCapabilities) -> None: + client = MagicMock() + client.chat.completions.create.return_value = _mock_completion("hi", usage_input=20, usage_output=10) + request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", max_tokens=100) + response = send_openai_compatible(client, request, capabilities=caps) + assert response.text == "hi" + assert response.tool_calls == [] + assert response.usage_input_tokens == 20 + assert response.usage_output_tokens == 10 + +def test_send_streaming_aggregates_chunks(caps: VendorCapabilities) -> None: + client = MagicMock() + chunks = [ + MagicMock(choices=[MagicMock(delta=MagicMock(content="hel", tool_calls=None))]), + MagicMock(choices=[MagicMock(delta=MagicMock(content="lo", tool_calls=None))]), + MagicMock(choices=[MagicMock(delta=MagicMock(content="", tool_calls=None))], usage=MagicMock(prompt_tokens=15, completion_tokens=5)), + ] + client.chat.completions.create.return_value = iter(chunks) + received: list[str] = [] + request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", stream=True, stream_callback=received.append) + response = send_openai_compatible(client, request, capabilities=caps) + assert response.text == "hello" + assert received == ["hel", "lo"] + assert response.usage_input_tokens == 15 + +def test_tool_call_detection_in_response(caps: VendorCapabilities) -> None: + tool_call = MagicMock() + tool_call.id = "call_1" + tool_call.function.name = "read_file" + tool_call.function.arguments = '{"path": "/tmp/x"}' + completion = _mock_completion(text="", tool_calls=[tool_call]) + client = MagicMock() + client.chat.completions.create.return_value = completion + request = 
OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m") + response = send_openai_compatible(client, request, capabilities=caps) + assert len(response.tool_calls) == 1 + assert response.tool_calls[0]["function"]["name"] == "read_file" + assert response.tool_calls[0]["id"] == "call_1" + +def test_vision_multimodal_message(caps: VendorCapabilities) -> None: + client = MagicMock() + client.chat.completions.create.return_value = _mock_completion("looks like a cat") + messages = [{"role": "user", "content": [{"type": "text", "text": "what is this?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]}] + request = OpenAICompatibleRequest(messages=messages, model="m") + response = send_openai_compatible(client, request, capabilities=caps) + sent_messages = client.chat.completions.create.call_args.kwargs["messages"] + assert sent_messages[0]["content"] == messages[0]["content"] + assert response.text == "looks like a cat" + +def test_error_classification_429_to_rate_limit(caps: VendorCapabilities) -> None: + from openai import RateLimitError + from src.openai_compatible import send_openai_compatible + from src.ai_client import ProviderError + client = MagicMock() + client.chat.completions.create.side_effect = RateLimitError("rate limited", response=MagicMock(status_code=429), body=None) + request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m") + with pytest.raises(ProviderError) as exc_info: + send_openai_compatible(client, request, capabilities=caps) + assert exc_info.value.kind == "rate_limit" + +def test_normalized_response_is_frozen_dataclass() -> None: + from dataclasses import FrozenInstanceError + r = NormalizedResponse(text="x", tool_calls=[], usage_input_tokens=0, usage_output_tokens=0, usage_cache_read_tokens=0, usage_cache_creation_tokens=0, raw_response=None) + with pytest.raises(FrozenInstanceError): + r.text = "y" +``` + +- [ ] **Step 2: Run, confirm 6 tests fail** + +Run: 
`uv run pytest tests/test_openai_compatible.py -v` +Expected: collection fails with `ModuleNotFoundError: No module named 'src.openai_compatible'` (the module is not created until Task 1.6), so none of the 6 tests can pass yet. + +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_openai_compatible.py +git commit -m "test(openai_compatible): add red tests for shared send helper" +``` + +--- + +## Task 1.6: Implement src/openai_compatible.py + +**Files:** +- Create: `src/openai_compatible.py` + +- [ ] **Step 1: Create the file with the dataclasses, helper, and error classifier** + +```python +from dataclasses import dataclass +from typing import Any, Callable, Optional + +from openai import OpenAIError, RateLimitError, AuthenticationError, PermissionDeniedError, APIConnectionError, APIStatusError, BadRequestError + +@dataclass(frozen=True) +class NormalizedResponse: + text: str + tool_calls: list[dict[str, Any]] + usage_input_tokens: int + usage_output_tokens: int + usage_cache_read_tokens: int + usage_cache_creation_tokens: int + raw_response: Any + +@dataclass +class OpenAICompatibleRequest: + messages: list[dict[str, Any]] + model: str + temperature: float = 0.0 + top_p: float = 1.0 + max_tokens: int = 8192 + tools: Optional[list[dict[str, Any]]] = None + tool_choice: str = "auto" + stream: bool = False + stream_callback: Optional[Callable[[str], None]] = None + +def _to_dict_tool_call(tc: Any) -> dict[str, Any]: + return { + "id": getattr(tc, "id", None), + "type": getattr(tc, "type", "function"), + "function": { + "name": getattr(tc.function, "name", None), + "arguments": getattr(tc.function, "arguments", "{}"), + }, + } + +def _classify_openai_compatible_error(exc: Exception) -> "ProviderError": + from src.ai_client import ProviderError + if isinstance(exc, RateLimitError): + return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc) + if isinstance(exc, (AuthenticationError, PermissionDeniedError)): + return ProviderError(kind="auth", provider="openai_compatible", 
original=exc) + if isinstance(exc, APIConnectionError): + return ProviderError(kind="network", provider="openai_compatible", original=exc) + if isinstance(exc, APIStatusError): + code = getattr(exc, "status_code", 0) + if code == 402: + return ProviderError(kind="balance", provider="openai_compatible", original=exc) + if code == 429: + return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc) + if code in (401, 403): + return ProviderError(kind="auth", provider="openai_compatible", original=exc) + if code in (500, 502, 503, 504): + return ProviderError(kind="network", provider="openai_compatible", original=exc) + if isinstance(exc, BadRequestError): + return ProviderError(kind="quota", provider="openai_compatible", original=exc) + return ProviderError(kind="unknown", provider="openai_compatible", original=exc) + +def send_openai_compatible( + client: Any, + request: OpenAICompatibleRequest, + *, + capabilities: Any, +) -> NormalizedResponse: + kwargs: dict[str, Any] = { + "model": request.model, + "messages": request.messages, + "temperature": request.temperature, + "top_p": request.top_p, + "max_tokens": request.max_tokens, + "stream": request.stream, + } + if request.tools is not None: + kwargs["tools"] = request.tools + kwargs["tool_choice"] = request.tool_choice + try: + if request.stream: + return _send_streaming(client, kwargs, request.stream_callback) + return _send_blocking(client, kwargs) + except OpenAIError as exc: + raise _classify_openai_compatible_error(exc) from exc + +def _send_blocking(client: Any, kwargs: dict[str, Any]) -> NormalizedResponse: + resp = client.chat.completions.create(**kwargs) + msg = resp.choices[0].message + tool_calls_raw = msg.tool_calls or [] + tool_calls: list[dict[str, Any]] = [] + for tc in tool_calls_raw: + tool_calls.append(_to_dict_tool_call(tc)) + usage = getattr(resp, "usage", None) or MagicMock_noop() + return NormalizedResponse( + text=msg.content or "", + tool_calls=tool_calls, + 
usage_input_tokens=getattr(usage, "prompt_tokens", 0) or 0, + usage_output_tokens=getattr(usage, "completion_tokens", 0) or 0, + usage_cache_read_tokens=0, + usage_cache_creation_tokens=0, + raw_response=resp, + ) + +class _MagicMock_noop: + def __getattr__(self, name: str) -> int: + return 0 + +MagicMock_noop = _MagicMock_noop + +def _send_streaming(client: Any, kwargs: dict[str, Any], callback: Optional[Callable[[str], None]]) -> NormalizedResponse: + kwargs_stream = dict(kwargs) + kwargs_stream["stream"] = True + kwargs_stream["stream_options"] = {"include_usage": True} + chunks_iter = client.chat.completions.create(**kwargs_stream) + text_parts: list[str] = [] + tool_calls_acc: dict[int, dict[str, Any]] = {} + usage_input = 0 + usage_output = 0 + for chunk in chunks_iter: + for choice in getattr(chunk, "choices", []) or []: + delta = getattr(choice, "delta", None) + if delta is None: + continue + if delta.content: + text_parts.append(delta.content) + if callback: + callback(delta.content) + for tc in getattr(delta, "tool_calls", None) or []: + idx = getattr(tc, "index", 0) + if idx not in tool_calls_acc: + tool_calls_acc[idx] = {"id": None, "type": "function", "function": {"name": None, "arguments": ""}} + if getattr(tc, "id", None): + tool_calls_acc[idx]["id"] = tc.id + if getattr(tc, "function", None): + if tc.function.name: + tool_calls_acc[idx]["function"]["name"] = tc.function.name + if tc.function.arguments: + tool_calls_acc[idx]["function"]["arguments"] += tc.function.arguments + chunk_usage = getattr(chunk, "usage", None) + if chunk_usage is not None: + usage_input = getattr(chunk_usage, "prompt_tokens", 0) or 0 + usage_output = getattr(chunk_usage, "completion_tokens", 0) or 0 + return NormalizedResponse( + text="".join(text_parts), + tool_calls=[tool_calls_acc[k] for k in sorted(tool_calls_acc.keys())], + usage_input_tokens=usage_input, + usage_output_tokens=usage_output, + usage_cache_read_tokens=0, + usage_cache_creation_tokens=0, + 
raw_response=None, + ) +``` + +- [ ] **Step 2: Run, confirm 5 of 6 tests pass; debug test 5 (error classification) if it fails because `ProviderError` is not yet importable** + +Run: `uv run pytest tests/test_openai_compatible.py -v` +Expected: Tests 1-4, 6 pass. Test 5 (error classification) may fail with `ImportError: cannot import name 'ProviderError' from 'src.ai_client'` because `ProviderError` is defined later in `src/ai_client.py`. If so, run a one-liner to confirm ProviderError exists: `uv run python -c "from src.ai_client import ProviderError; print(ProviderError)"`. If ProviderError is missing, add a minimal stub to `src/ai_client.py` (Task 1.7 will formalize this). For now, defer test 5 to Task 1.7. + +- [ ] **Step 3: Commit (green for tests 1-4, 6)** + +```bash +git add src/openai_compatible.py +git commit -m "feat(openai_compatible): implement shared send helper for OpenAI-compatible vendors" +``` + +--- + +## Task 1.7: Confirm ProviderError exists in src/ai_client.py and add helper if needed + +**Files:** +- Modify (if needed): `src/ai_client.py` (add ProviderError stub) + +- [ ] **Step 1: Verify ProviderError is importable** + +Run: `uv run python -c "from src.ai_client import ProviderError; print(ProviderError)"` +Expected: prints the class. + +- [ ] **Step 2: If missing, add a minimal ProviderError class to src/ai_client.py** + +```python +class ProviderError(Exception): + def __init__(self, kind: str, provider: str, original: Exception | None = None) -> None: + self.kind = kind + self.provider = provider + self.original = original + super().__init__(f"{provider} {kind} error: {original}") + def ui_message(self) -> str: + return f"{self.provider} error: {self.kind}" +``` + +If ProviderError already exists with a different signature, document the signature and adjust the `_classify_openai_compatible_error` function in `src/openai_compatible.py` to match. 
**Do not change the existing ProviderError if it's already in use** — adjust the classifier instead. + +- [ ] **Step 3: Run the openai_compatible tests; confirm all 6 pass** + +Run: `uv run pytest tests/test_openai_compatible.py -v` +Expected: All 6 tests PASS. + +- [ ] **Step 4: Commit if any changes were made** + +```bash +git add src/ai_client.py src/openai_compatible.py +git commit -m "fix(openai_compatible): align ProviderError signature if needed" +``` + +--- + +## Task 1.8: Phase 1 checkpoint commit and git note + +**Files:** none (commit + note only) + +- [ ] **Step 1: Run all Phase 1 tests** + +Run: `uv run pytest tests/test_vendor_capabilities.py tests/test_openai_compatible.py -v` +Expected: 13 tests PASS (7 capabilities + 6 openai_compatible). + +- [ ] **Step 2: Verify import time for src/vendor_capabilities and src/openai_compatible is small (no heavy top-level imports)** + +Run: +```bash +uv run python -c "import time; t=time.perf_counter(); import src.vendor_capabilities; print(f'vendor_capabilities: {(time.perf_counter()-t)*1000:.1f}ms')" +uv run python -c "import time; t=time.perf_counter(); import src.openai_compatible; print(f'openai_compatible: {(time.perf_counter()-t)*1000:.1f}ms')" +``` + +Expected: both under 50ms (no heavy SDK imports at the top of these files). + +- [ ] **Step 3: Create the checkpoint commit and git note** + +```bash +git add -A +if ! 
git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 1 complete - matrix framework + shared helper"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 1 checkpoint: qwen_llama_grok_integration_20260606 + +Capability matrix framework + shared OpenAI-compatible helper: +- src/vendor_capabilities.py: VendorCapabilities dataclass, registry, get_capabilities(), list_models_for_vendor() +- Initial registry: Qwen (7 models), Llama (8 models), Grok (3 models), MiniMax (1 model), Anthropic/Gemini/DeepSeek stubs +- src/openai_compatible.py: NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible(), _classify_openai_compatible_error() +- 13 unit tests pass +- dashscope dep added to pyproject.toml +- Import time < 50ms for both modules (no heavy SDK imports at top level) + +Next: Phase 2 (Qwen via DashScope native SDK)." "$SHA" +``` + +- [ ] **Step 4: Update state.toml phase_1 status** + +Edit `conductor/tracks/qwen_llama_grok_integration_20260606/state.toml` line: +```toml +phase_1 = { status = "pending", checkpoint_sha = "", name = "Capability matrix framework + shared helper" } +``` +Change to: +```toml +phase_1 = { status = "completed", checkpoint_sha = "", name = "Capability matrix framework + shared helper" } +``` + +```bash +git add conductor/tracks/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 1 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Phase 2: Qwen via DashScope + +> Goal: `_send_qwen()` and friends implemented in `src/ai_client.py`. `qwen` registered as a provider. Qwen models in the capability registry (already done in Phase 1, but verified here). Credentials template updated. Cost tracker updated. 
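Because Phase 2 relies on the Qwen entries registered in Phase 1, the lookup order matters when verifying them. The sketch below is a standalone re-creation of the exact-match-then-wildcard lookup from Task 1.1, not the project module itself: `Caps`, `lookup`, and the local `registry` dict are illustrative names, and only one capability field is modeled. Under those assumptions it shows that once a `("qwen", "*")` entry exists, an unregistered Qwen model resolves to the vendor default instead of raising `KeyError`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
 # Illustrative stand-in for VendorCapabilities (assumption: one field is enough here).
 vendor: str
 model: str
 context_window: int = 8192


def lookup(registry: dict[tuple[str, str], Caps], vendor: str, model: str) -> Caps:
 # An exact (vendor, model) entry wins; otherwise fall back to (vendor, "*").
 if (vendor, model) in registry:
  return registry[(vendor, model)]
 if (vendor, "*") in registry:
  return registry[(vendor, "*")]
 raise KeyError(f"No capabilities registered for vendor={vendor!r} model={model!r}")


# Two entries copied from the Phase 1 registry values for Qwen.
registry: dict[tuple[str, str], Caps] = {
 ("qwen", "*"): Caps("qwen", "*", context_window=32768),
 ("qwen", "qwen-turbo"): Caps("qwen", "qwen-turbo", context_window=1000000),
}

exact = lookup(registry, "qwen", "qwen-turbo")
fallback = lookup(registry, "qwen", "qwen-not-registered")
```

A test that expects `KeyError` for an unknown model therefore needs a registry that does not contain that vendor's `*` entry.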
+ +--- + +## Task 2.1: Write red tests for Qwen provider + +**Files:** +- Create: `tests/test_qwen_provider.py` + +- [ ] **Step 1: Create the test file with 5 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src import ai_client + +@pytest.fixture(autouse=True) +def _reset_qwen_state(): + ai_client._qwen_client = None + ai_client._qwen_history = [] + yield + +def test_send_qwen_routes_to_dashscope(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client.set_provider("qwen", "qwen-max") + with patch("src.ai_client._ensure_qwen_client") as ensure, \ + patch("src.ai_client._dashscope_call", return_value={"text": "hi from qwen"}) as call: + result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, True, None, None, None) + assert result == "hi from qwen" + call.assert_called_once() + ensure.assert_called_once() + +def test_qwen_vision_vl_model_accepts_image(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client.set_provider("qwen", "qwen-vl-max") + with patch("src.ai_client._ensure_qwen_client"), \ + patch("src.ai_client._dashscope_call", return_value={"text": "I see a cat"}) as call: + file_items = [{"path": "/tmp/cat.png", "is_image": True, "base64_data": "iVBOR..."}] + result = ai_client._send_qwen("system", "describe", ".", file_items, "", False, None, None, True, None, None, None) + assert "cat" in result.lower() + kwargs = call.call_args.kwargs + assert any("image" in str(m).lower() or "vl" in str(m).lower() for m in kwargs.get("messages", [])) + +def test_qwen_tool_format_translation() -> None: + from src.qwen_adapter import build_dashscope_tools + openai_tools = [{"type": "function", "function": {"name": "read_file", "description": "Read a file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}}] + ds_tools = build_dashscope_tools(openai_tools) + assert len(ds_tools) == 1 + assert ds_tools[0]["name"] == "read_file" + assert "parameters" in ds_tools[0] + +def 
test_qwen_error_classification() -> None: + from src.ai_client import ProviderError + from src.qwen_adapter import classify_dashscope_error + import dashscope + err = classify_dashscope_error(dashscope.common.error.InvalidApiKey("bad key")) + assert err.kind == "auth" + assert err.provider == "qwen" + +def test_list_qwen_models_returns_hardcoded_registry() -> None: + from src.ai_client import _list_qwen_models + models = _list_qwen_models() + assert "qwen-max" in models + assert "qwen-vl-max" in models + assert "qwen-turbo" in models + assert "qwen-audio" in models +``` + +- [ ] **Step 2: Run, confirm 5 tests fail** + +Run: `uv run pytest tests/test_qwen_provider.py -v` +Expected: All 5 tests FAIL with `ImportError` or `AttributeError`. + +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_qwen_provider.py +git commit -m "test(qwen): add red tests for Qwen via DashScope" +``` + +--- + +## Task 2.2: Create src/qwen_adapter.py + +**Files:** +- Create: `src/qwen_adapter.py` + +- [ ] **Step 1: Create the file with build_dashscope_tools() and classify_dashscope_error()** + +```python +import dashscope +from dashscope.common.error import ( + InvalidApiKey, + InvalidParameter, + QuotaExceeded, + RateLimitExceeded, + NetworkError, +) +from typing import Any +from src.ai_client import ProviderError + +def build_dashscope_tools(openai_tools: list[dict[str, Any]]) -> list[dict[str, Any]]: + out: list[dict[str, Any]] = [] + for t in openai_tools: + if t.get("type") != "function": + continue + fn = t.get("function", {}) + out.append({ + "name": fn.get("name", ""), + "description": fn.get("description", ""), + "parameters": fn.get("parameters", {"type": "object", "properties": {}}), + }) + return out + +def classify_dashscope_error(exc: Exception) -> ProviderError: + if isinstance(exc, InvalidApiKey): + return ProviderError(kind="auth", provider="qwen", original=exc) + if isinstance(exc, RateLimitExceeded): + return ProviderError(kind="rate_limit", provider="qwen", 
original=exc) + if isinstance(exc, QuotaExceeded): + return ProviderError(kind="quota", provider="qwen", original=exc) + if isinstance(exc, NetworkError): + return ProviderError(kind="network", provider="qwen", original=exc) + if isinstance(exc, InvalidParameter): + return ProviderError(kind="invalid_request", provider="qwen", original=exc) + return ProviderError(kind="unknown", provider="qwen", original=exc) +``` + +(`invalid_request` is an assumed kind name for malformed-request errors; if `ProviderError` defines a different kind for this case, use that one. It should not be reported as `quota`.) + +- [ ] **Step 2: Run qwen tests; tests 3 and 4 should now pass; 1, 2, 5 still fail** + +Run: `uv run pytest tests/test_qwen_provider.py -v` +Expected: 2 tests PASS (test_qwen_tool_format_translation, test_qwen_error_classification); 3 tests still FAIL. + +- [ ] **Step 3: Commit** + +```bash +git add src/qwen_adapter.py +git commit -m "feat(qwen_adapter): add DashScope tool translation and error classification" +``` + +--- + +## Task 2.3: Implement Qwen state globals + _ensure_qwen_client + +**Files:** +- Modify: `src/ai_client.py` (add state near other vendor state, add `_ensure_qwen_client`) + +- [ ] **Step 1: Add Qwen state globals near the other vendor state (after _minimax_* state around line 122)** + +Add to the state declarations block in `src/ai_client.py`: + +```python +_qwen_client: Any = None +_qwen_history: list[dict[str, Any]] = [] +_qwen_history_lock: threading.Lock = threading.Lock() +_qwen_region: str = "china" +``` + +- [ ] **Step 2: Add Qwen state reset to reset_session()** + +In the `reset_session()` function, add (near the other vendor resets around line 493): + +```python + global _qwen_client, _qwen_history + _qwen_client = None + with _qwen_history_lock: + _qwen_history = [] +``` + +- [ ] **Step 3: Add _ensure_qwen_client() near the other _ensure_*_client functions (around line 2094)** + +```python +def _ensure_qwen_client() -> None: + global _qwen_client, _qwen_region + if _qwen_client is None: + import dashscope + from src import credentials_loader + api_key = credentials_loader.get_credential("qwen", "api_key") + _qwen_region = 
credentials_loader.get_credential("qwen", "region", default="china") + dashscope.api_key = api_key + _qwen_client = dashscope.Generation +``` + +(Adapt `credentials_loader` to the actual project's credential loading pattern. Look at `_ensure_minimax_client` around line 2094 for the actual pattern used; mirror it for Qwen.) + +- [ ] **Step 4: Commit** + +```bash +git add src/ai_client.py +git commit -m "feat(qwen): add state globals and _ensure_qwen_client" +``` + +--- + +## Task 2.4: Implement _dashscope_call() and _send_qwen() + +**Files:** +- Modify: `src/ai_client.py` + +- [ ] **Step 1: Add _dashscope_call() helper near the other vendor call helpers** + +```python +def _dashscope_call( + model: str, + messages: list[dict[str, Any]], + tools: list[dict[str, Any]] | None, + *, + max_tokens: int, + temperature: float, + top_p: float, +) -> dict[str, Any]: + import dashscope + from dashscope import Generation + from src.qwen_adapter import build_dashscope_tools + kwargs: dict[str, Any] = { + "model": model, + "messages": messages, + "max_tokens": max_tokens, + "temperature": temperature, + "top_p": top_p, + "result_format": "message", + } + if tools: + kwargs["tools"] = build_dashscope_tools(tools) + resp = Generation.call(**kwargs) + if getattr(resp, "status_code", 200) != 200: + from src.qwen_adapter import classify_dashscope_error + raise classify_dashscope_error(_dashscope_exception_from_response(resp)) + return { + "text": resp.output.text if hasattr(resp, "output") and resp.output else "", + "tool_calls": _extract_dashscope_tool_calls(resp), + "usage": { + "input_tokens": getattr(resp.usage, "input_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0, + "output_tokens": getattr(resp.usage, "output_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0, + }, + } + +def _dashscope_exception_from_response(resp: Any) -> Exception: + msg = getattr(resp, "message", "unknown dashscope error") + return RuntimeError(msg) + +def 
_extract_dashscope_tool_calls(resp: Any) -> list[dict[str, Any]]: + out: list[dict[str, Any]] = [] + if not (hasattr(resp, "output") and resp.output and getattr(resp.output, "tool_calls", None)): + return out + for tc in resp.output.tool_calls: + out.append({ + "id": getattr(tc, "id", ""), + "type": "function", + "function": { + "name": getattr(tc.function, "name", "") if hasattr(tc, "function") else "", + "arguments": getattr(tc.function, "arguments", "{}") if hasattr(tc, "function") else "{}", + }, + }) + return out +``` + +- [ ] **Step 2: Add _send_qwen() (the entry point) near the other _send_* functions (after _send_minimax)** + +```python +def _send_qwen(md_content, user_message, base_dir, file_items, discussion_history, stream, pre_tool_callback, qa_callback, enable_tools, stream_callback, patch_callback, rag_engine): + _ensure_qwen_client() + from src.qwen_adapter import build_dashscope_tools + with _qwen_history_lock: + if discussion_history and not _qwen_history: + _qwen_history.extend(_parse_discussion_history(discussion_history)) + user_content = _build_user_content(user_message, file_items) + _qwen_history.append({"role": "user", "content": user_content}) + tools: list[dict[str, Any]] | None = None + if enable_tools: + tools = _build_tools() + for _round in range(MAX_TOOL_ROUNDS + 2): + resp = _dashscope_call( + model=_model, + messages=_qwen_history, + tools=tools, + max_tokens=_max_tokens, + temperature=_temperature, + top_p=_top_p, + ) + _append_comms("IN", "response", resp) + if not resp.get("tool_calls"): + return resp.get("text", "") + for call in resp["tool_calls"]: + if pre_tool_callback and pre_tool_callback(call) is False: + return "Tool execution rejected" + tool_result = _dispatch_tool(call, base_dir, rag_engine) + _qwen_history.append({ + "role": "tool", + "tool_call_id": call.get("id", ""), + "content": tool_result, + }) + _reread_file_items(file_items) + return resp.get("text", "") +``` + +(Adapt `_dispatch_tool`, `_build_tools`, 
`_parse_discussion_history`, `_build_user_content`, `_reread_file_items`, `_append_comms` to the actual names used in the project. Look at `_send_minimax` for the pattern.) + +- [ ] **Step 3: Run qwen tests; tests 1 and 2 should now pass; test 5 still fails (_list_qwen_models missing)** + +Run: `uv run pytest tests/test_qwen_provider.py -v` +Expected: 4 tests PASS; 1 test still FAIL (`test_list_qwen_models_returns_hardcoded_registry`). + +- [ ] **Step 4: Commit** + +```bash +git add src/ai_client.py +git commit -m "feat(qwen): implement _send_qwen with DashScope native SDK + tool loop" +``` + +--- + +## Task 2.5: Add _list_qwen_models() and register qwen in PROVIDERS + +**Files:** +- Modify: `src/ai_client.py` + +- [ ] **Step 1: Add _list_qwen_models() near the other list_models functions** + +```python +def _list_qwen_models() -> list[str]: + from src.vendor_capabilities import list_models_for_vendor + return list_models_for_vendor("qwen") +``` + +- [ ] **Step 2: Register "qwen" as a valid provider in set_provider() (if there's a PROVIDERS list or validation)** + +If `set_provider()` has a hardcoded list of valid providers, add `"qwen"` to it. Look at how `"minimax"` is registered for the pattern. + +- [ ] **Step 3: Wire `ai_client.list_models("qwen")` to call _list_qwen_models()** + +In the `list_models(provider)` function (around line 370 of `src/ai_client.py`), add a branch: + +```python +if provider == "qwen": + return _list_qwen_models() +``` + +(Add to the existing if/elif chain. The exact dispatch pattern depends on the existing code; mirror it.) + +- [ ] **Step 4: Add "qwen" to PROVIDERS list in src/gui_2.py and src/app_controller.py** + +In `src/gui_2.py`, find `PROVIDERS = [...]` and add `"qwen"`. Same in `src/app_controller.py`. + +- [ ] **Step 5: Run all 5 qwen tests; confirm they pass** + +Run: `uv run pytest tests/test_qwen_provider.py -v` +Expected: All 5 tests PASS. 
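
For orientation, the `list_models` wiring from Step 3 could also be expressed as a lookup table rather than an `if`/`elif` chain. The sketch below is hypothetical: `_MODEL_LISTERS_SKETCH`, `list_models_sketch`, and the placeholder model list are illustrative names, not existing project code, and the real dispatch should mirror whatever `src/ai_client.py` already does.

```python
from typing import Callable

def _list_qwen_models_sketch() -> list[str]:
 # Placeholder data for illustration; the plan's version delegates to
 # list_models_for_vendor("qwen") instead of returning a literal list.
 return ["qwen-max", "qwen-turbo"]

# Hypothetical provider -> lister table; adding a vendor is one new entry.
_MODEL_LISTERS_SKETCH: dict[str, Callable[[], list[str]]] = {
 "qwen": _list_qwen_models_sketch,
}

def list_models_sketch(provider: str) -> list[str]:
 lister = _MODEL_LISTERS_SKETCH.get(provider)
 if lister is None:
  # Unknown providers fail loudly instead of returning an empty list.
  raise ValueError(f"unknown provider: {provider}")
 return lister()
```

With this shape, registering Grok or Llama later would be a one-line table entry, at the cost of diverging from the existing chain if that is what the module uses today.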
+ +- [ ] **Step 6: Commit** + +```bash +git add src/ai_client.py src/gui_2.py src/app_controller.py +git commit -m "feat(qwen): add _list_qwen_models, register in PROVIDERS" +``` + +--- + +## Task 2.6: Update credentials template with [qwen] section + +**Files:** +- Modify: `credentials_template.toml` (or wherever the example is) + +- [ ] **Step 1: Find the credentials template file** + +Run: `Get-ChildItem -Path . -Filter "credentials*.toml" -Recurse -ErrorAction SilentlyContinue | Select-Object FullName -First 3` (or grep for `[minimax]` in any toml file to find the pattern). + +- [ ] **Step 2: Add the [qwen] section after the existing [minimax] section** + +```toml +[qwen] +# Alibaba Cloud DashScope API key. Get one at https://dashscope.console.aliyun.com/ +api_key = "YOUR_DASHSCOPE_KEY" +# region = "china" # default; "international" also valid +``` + +- [ ] **Step 3: Commit** + +```bash +git add credentials_template.toml +git commit -m "docs(credentials): add [qwen] example section" +``` + +--- + +## Task 2.7: Add Qwen pricing to src/cost_tracker.py + +**Files:** +- Modify: `src/cost_tracker.py` + +- [ ] **Step 1: Read cost_tracker.py to find the MODEL_PRICING dict** + +Run: `manual-slop_get_file_slice path=src/cost_tracker.py start_line=1 end_line=50` (find the pricing table). + +- [ ] **Step 2: Add Qwen model pricing entries (mirror the pattern for existing models)** + +```python +"qwen-turbo": {"input": 0.05, "output": 0.10}, +"qwen-plus": {"input": 0.40, "output": 1.20}, +"qwen-max": {"input": 2.00, "output": 6.00}, +"qwen-long": {"input": 0.07, "output": 0.28}, +"qwen-vl-plus": {"input": 0.21, "output": 0.63}, +"qwen-vl-max": {"input": 0.50, "output": 1.50}, +"qwen-audio": {"input": 0.10, "output": 0.30}, +``` + +(Adapt the dict structure to the actual one in cost_tracker.py; the `input`/`output` keys may be named differently.) 
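
As a sanity check on the table above, here is a minimal sketch of how such entries might turn into a cost figure. It assumes the prices are USD per 1,000,000 tokens, which the plan does not state explicitly; `PRICING_SKETCH` and `estimate_cost_sketch` are hypothetical names and may not match the real `src/cost_tracker.py` API.

```python
# Hypothetical pricing table; values are ASSUMED to be USD per 1,000,000 tokens.
PRICING_SKETCH: dict[str, dict[str, float]] = {
 "qwen-turbo": {"input": 0.05, "output": 0.10},
 "qwen-max": {"input": 2.00, "output": 6.00},
}

def estimate_cost_sketch(model: str, input_tokens: int, output_tokens: int) -> float:
 price = PRICING_SKETCH.get(model)
 if price is None:
  # Unknown models cost nothing in this sketch; the real tracker may differ.
  return 0.0
 total = input_tokens * price["input"] + output_tokens * price["output"]
 return total / 1_000_000
```

Under that per-million assumption, 1000 input tokens and 500 output tokens on `qwen-max` come to (1000 × 2.00 + 500 × 6.00) / 1,000,000 = 0.005 USD.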
+ +- [ ] **Step 3: Verify a Qwen cost is computable** + +Run: `uv run python -c "from src.cost_tracker import estimate_cost; print(estimate_cost('qwen-max', 1000, 500))"` +Expected: a dollar amount (e.g., `0.005`). + +- [ ] **Step 4: Commit** + +```bash +git add src/cost_tracker.py +git commit -m "feat(cost_tracker): add Qwen model pricing" +``` + +--- + +## Task 2.8: Phase 2 checkpoint commit and git note + +**Files:** none (commit + note only) + +- [ ] **Step 1: Run all Qwen tests; confirm 5 pass** + +Run: `uv run pytest tests/test_qwen_provider.py -v` +Expected: 5 tests PASS. + +- [ ] **Step 2: Run the full test suite to ensure no regressions** + +Run: `uv run pytest tests/ -q --timeout=60 2>&1 | tail -10` +Expected: no NEW failures (pre-existing failures from other tracks unchanged). + +- [ ] **Step 3: Create the checkpoint commit and git note** + +```bash +git add -A +if ! git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 2 complete - Qwen via DashScope"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 2 checkpoint: Qwen via DashScope native SDK + +- src/ai_client.py: state globals (_qwen_client, _qwen_history, _qwen_history_lock, _qwen_region), _ensure_qwen_client(), _dashscope_call(), _send_qwen(), _list_qwen_models() +- src/qwen_adapter.py: build_dashscope_tools() (OpenAI shape -> DashScope shape), classify_dashscope_error() +- src/cost_tracker.py: pricing for 7 Qwen models +- src/gui_2.py + src/app_controller.py: qwen added to PROVIDERS +- credentials_template.toml: [qwen] example section +- 5 unit tests pass + +No regressions in 273+ existing tests. + +Next: Phase 3 (Grok + Llama via shared helper)." 
"$SHA" +``` + +- [ ] **Step 4: Update state.toml phase_2 status** + +```bash +git add conductor/tracks/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 2 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Phase 3: Grok + Llama via Shared Helper + +> Goal: `_send_grok()` and `_send_llama()` implemented in `src/ai_client.py`. Both call `send_openai_compatible()`. Both registered as providers. Credentials and cost tracker updated. + +--- + +## Task 3.1: Write red tests for Grok provider + +**Files:** +- Create: `tests/test_grok_provider.py` + +- [ ] **Step 1: Create the test file with 2 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src import ai_client + +@pytest.fixture(autouse=True) +def _reset_grok_state(): + ai_client._grok_client = None + ai_client._grok_history = [] + yield + +def test_send_grok_uses_xai_endpoint(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client.set_provider("grok", "grok-2") + mock_client = MagicMock() + mock_client.chat.completions.create.return_value = MagicMock( + choices=[MagicMock(message=MagicMock(content="hi from grok", tool_calls=[]))], + usage=MagicMock(prompt_tokens=10, completion_tokens=5), + ) + with patch("src.ai_client._ensure_grok_client", return_value=mock_client): + result = ai_client._send_grok("system", "user", ".", None, "", False, None, None, True, None, None, None) + assert result == "hi from grok" + assert mock_client.chat.completions.create.called + +def test_grok_2_vision_supports_image(monkeypatch: pytest.MonkeyPatch) -> None: + from src.vendor_capabilities import get_capabilities + caps = get_capabilities("grok", "grok-2-vision") + assert caps.vision is True +``` + +- [ ] **Step 2: Run, confirm 2 tests fail** + +Run: `uv run pytest tests/test_grok_provider.py -v` +Expected: 2 tests FAIL (test 1 with `AttributeError: module 'src.ai_client' has no attribute '_send_grok'`; test 2 passes since the registry is already 
populated in Phase 1; either adjust test 2 so that it also fails before the implementation exists, or remove the capability-only assertion and expect 1 FAIL and 1 PASS). + +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_grok_provider.py +git commit -m "test(grok): add red tests for Grok via xAI" +``` + +--- + +## Task 3.2: Implement Grok state + _ensure_grok_client + _send_grok + +**Files:** +- Modify: `src/ai_client.py` + +- [ ] **Step 1: Add Grok state globals near other vendor state** + +```python +_grok_client: Any = None +_grok_history: list[dict[str, Any]] = [] +_grok_history_lock: threading.Lock = threading.Lock() +``` + +- [ ] **Step 2: Add Grok state reset to reset_session()** + +```python + global _grok_client, _grok_history + _grok_client = None + with _grok_history_lock: + _grok_history = [] +``` + +- [ ] **Step 3: Add _ensure_grok_client()** + +```python +def _ensure_grok_client() -> Any: + global _grok_client + if _grok_client is None: + from openai import OpenAI + from src import credentials_loader + api_key = credentials_loader.get_credential("grok", "api_key") + _grok_client = OpenAI(api_key=api_key, base_url="https://api.x.ai/v1") + return _grok_client +``` + +(Adapt `credentials_loader` to the actual pattern; mirror `_ensure_minimax_client`.) 
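
The lazy initializer above is not guarded by a lock. If `_ensure_grok_client()` can be reached from more than one thread (an assumption; the plan does not say), a guarded variant might look like the sketch below. `ensure_client_sketch`, `_client_sketch`, and the `factory` parameter are hypothetical names introduced only for illustration.

```python
import threading
from typing import Any, Callable

_client_sketch: Any = None
_client_sketch_lock = threading.Lock()

def ensure_client_sketch(factory: Callable[[], Any]) -> Any:
 global _client_sketch
 # Fast path: once the client exists, no lock is taken.
 if _client_sketch is None:
  with _client_sketch_lock:
   # Re-check inside the lock so only one thread runs the factory.
   if _client_sketch is None:
    _client_sketch = factory()
 return _client_sketch
```

Passing the constructor in as `factory` also keeps the SDK import out of module scope, in line with the plan's goal of avoiding heavy imports at top level.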
+ +- [ ] **Step 4: Add _send_grok() (calls the shared helper)** + +```python +def _send_grok(md_content, user_message, base_dir, file_items, discussion_history, stream, pre_tool_callback, qa_callback, enable_tools, stream_callback, patch_callback, rag_engine): + client = _ensure_grok_client() + from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible + from src.vendor_capabilities import get_capabilities + with _grok_history_lock: + if discussion_history and not _grok_history: + _grok_history.extend(_parse_discussion_history(discussion_history)) + user_content = _build_user_content(user_message, file_items) + _grok_history.append({"role": "user", "content": user_content}) + tools: list[dict[str, Any]] | None = None + if enable_tools: + tools = _build_tools() + for _round in range(MAX_TOOL_ROUNDS + 2): + request = OpenAICompatibleRequest( + messages=_grok_history, + model=_model, + temperature=_temperature, + top_p=_top_p, + max_tokens=_max_tokens, + tools=tools, + stream=stream, + stream_callback=stream_callback, + ) + caps = get_capabilities("grok", _model) + response = send_openai_compatible(client, request, capabilities=caps) + _append_comms("IN", "response", response.raw_response) + _grok_history.append({"role": "assistant", "content": response.text}) + if not response.tool_calls: + return response.text + for call in response.tool_calls: + if pre_tool_callback and pre_tool_callback(call) is False: + return "Tool execution rejected" + tool_result = _dispatch_tool(call, base_dir, rag_engine) + _grok_history.append({ + "role": "tool", + "tool_call_id": call.get("id", ""), + "content": tool_result, + }) + _reread_file_items(file_items) + return response.text +``` + +(Adapt helper names to the actual project; mirror `_send_minimax` for the tool loop pattern.) + +- [ ] **Step 5: Register "grok" in PROVIDERS list and add list_models branch** + +Same as Task 2.5 step 2-4 but for "grok". 
Add to `PROVIDERS` in `src/gui_2.py` and `src/app_controller.py`. Add branch in `ai_client.list_models(provider)`. + +- [ ] **Step 6: Run the 2 grok tests; confirm they pass** + +Run: `uv run pytest tests/test_grok_provider.py -v` +Expected: 2 tests PASS. + +- [ ] **Step 7: Commit** + +```bash +git add src/ai_client.py src/gui_2.py src/app_controller.py +git commit -m "feat(grok): implement _send_grok via shared helper + register in PROVIDERS" +``` + +--- + +## Task 3.3: Update credentials template + cost tracker for Grok + +**Files:** +- Modify: `credentials_template.toml`, `src/cost_tracker.py` + +- [ ] **Step 1: Add [grok] section to credentials_template.toml** + +```toml +[grok] +# xAI API key. Get one at https://console.x.ai/ +api_key = "YOUR_XAI_KEY" +``` + +- [ ] **Step 2: Add Grok pricing to cost_tracker.py** + +```python +"grok-2": {"input": 2.00, "output": 10.00}, +"grok-2-vision": {"input": 2.00, "output": 10.00}, +"grok-beta": {"input": 5.00, "output": 15.00}, +``` + +(Adapt dict structure to the actual cost_tracker pattern.) 
+ +- [ ] **Step 3: Commit** + +```bash +git add credentials_template.toml src/cost_tracker.py +git commit -m "feat(creds, cost): add Grok credentials template and pricing" +``` + +--- + +## Task 3.4: Write red tests for Llama provider + +**Files:** +- Create: `tests/test_llama_provider.py` + +- [ ] **Step 1: Create the test file with 6 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src import ai_client + +@pytest.fixture(autouse=True) +def _reset_llama_state(): + ai_client._llama_client = None + ai_client._llama_history = [] + ai_client._llama_base_url = "http://localhost:11434/v1" + yield + +def test_send_llama_ollama_backend(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client._llama_base_url = "http://localhost:11434/v1" + ai_client.set_provider("llama", "llama-3.2-3b-preview") + mock_client = MagicMock() + mock_client.chat.completions.create.return_value = MagicMock( + choices=[MagicMock(message=MagicMock(content="hi from ollama", tool_calls=[]))], + usage=MagicMock(prompt_tokens=5, completion_tokens=3), + ) + with patch("src.ai_client._ensure_llama_client", return_value=mock_client): + result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, True, None, None, None) + assert result == "hi from ollama" + +def test_send_llama_openrouter_backend(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client._llama_base_url = "https://openrouter.ai/api/v1" + ai_client.set_provider("llama", "llama-3.1-70b-versatile") + captured_client = MagicMock() + captured_client.chat.completions.create.return_value = MagicMock( + choices=[MagicMock(message=MagicMock(content="hi from openrouter", tool_calls=[]))], + usage=MagicMock(prompt_tokens=5, completion_tokens=3), + ) + with patch("src.ai_client._ensure_llama_client", return_value=captured_client) as ensure: + result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, True, None, None, None) + assert result == "hi from openrouter" + assert 
ensure.called + +def test_send_llama_custom_url(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client._llama_base_url = "http://my-server:9999/v1" + mock_client = MagicMock() + mock_client.chat.completions.create.return_value = MagicMock( + choices=[MagicMock(message=MagicMock(content="hi from custom", tool_calls=[]))], + usage=MagicMock(prompt_tokens=5, completion_tokens=3), + ) + with patch("src.ai_client._ensure_llama_client", return_value=mock_client): + result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, True, None, None, None) + assert result == "hi from custom" + +def test_llama_model_discovery_unions_ollama_and_openrouter() -> None: + from src.ai_client import _list_llama_models + models = _list_llama_models() + assert "llama-3.1-8b-instant" in models + assert "llama-3.2-11b-vision-preview" in models + assert "llama-3.3-70b-specdec" in models + +def test_llama_3_2_vision_vision_capability() -> None: + from src.vendor_capabilities import get_capabilities + caps = get_capabilities("llama", "llama-3.2-11b-vision-preview") + assert caps.vision is True + +def test_llama_local_backend_cost_tracking_false_for_ollama() -> None: + ai_client._llama_base_url = "http://localhost:11434/v1" + from src.ai_client import _get_llama_cost_tracking + assert _get_llama_cost_tracking() is False +``` + +- [ ] **Step 2: Run, confirm 6 tests fail** + +Run: `uv run pytest tests/test_llama_provider.py -v` +Expected: 6 tests FAIL. 
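
The mocks in these tests imitate the attribute shape of an OpenAI-style chat completion (`choices[0].message.content`, `message.tool_calls`, `usage.prompt_tokens`, `usage.completion_tokens`). The sketch below shows that shape being built and read back; `fake_completion` and `read_completion` are hypothetical helpers, not part of the plan's `send_openai_compatible()` implementation.

```python
from typing import Any
from unittest.mock import MagicMock

def fake_completion(text: str) -> MagicMock:
 # Same nesting as the test fixtures: one choice, no tool calls, fixed usage.
 return MagicMock(
  choices=[MagicMock(message=MagicMock(content=text, tool_calls=[]))],
  usage=MagicMock(prompt_tokens=5, completion_tokens=3),
 )

def read_completion(resp: Any) -> dict[str, Any]:
 message = resp.choices[0].message
 return {
  # content may be None on tool-call turns, so fall back to an empty string.
  "text": message.content or "",
  "tool_calls": list(message.tool_calls or []),
  "input_tokens": resp.usage.prompt_tokens,
  "output_tokens": resp.usage.completion_tokens,
 }
```

Setting `tool_calls=[]` explicitly matters: a bare `MagicMock` attribute is truthy, so leaving it unset would make a tool loop believe the model requested a tool.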
+ +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_llama_provider.py +git commit -m "test(llama): add red tests for multi-backend Llama provider" +``` + +--- + +## Task 3.5: Implement Llama state + _ensure_llama_client + _send_llama + _list_llama_models + +**Files:** +- Modify: `src/ai_client.py` + +- [ ] **Step 1: Add Llama state globals** + +```python +_llama_client: Any = None +_llama_history: list[dict[str, Any]] = [] +_llama_history_lock: threading.Lock = threading.Lock() +_llama_base_url: str = "http://localhost:11434/v1" +_llama_api_key: str = "ollama" +``` + +- [ ] **Step 2: Add Llama state reset to reset_session()** + +```python + global _llama_client, _llama_history, _llama_base_url, _llama_api_key + _llama_client = None + with _llama_history_lock: + _llama_history = [] + _llama_base_url = "http://localhost:11434/v1" + _llama_api_key = "ollama" +``` + +- [ ] **Step 3: Add _ensure_llama_client()** + +```python +def _ensure_llama_client() -> Any: + global _llama_client, _llama_base_url, _llama_api_key + if _llama_client is None: + from openai import OpenAI + from src import credentials_loader + configured_url = credentials_loader.get_credential("llama", "base_url", default="http://localhost:11434/v1") + configured_key = credentials_loader.get_credential("llama", "api_key", default="ollama") + if configured_url: + _llama_base_url = configured_url + if configured_key is not None: + _llama_api_key = configured_key + _llama_client = OpenAI(api_key=_llama_api_key, base_url=_llama_base_url) + return _llama_client +``` + +- [ ] **Step 4: Add _send_llama() (calls the shared helper; same pattern as _send_grok)** + +```python +def _send_llama(md_content, user_message, base_dir, file_items, discussion_history, stream, pre_tool_callback, qa_callback, enable_tools, stream_callback, patch_callback, rag_engine): + client = _ensure_llama_client() + from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible + from src.vendor_capabilities 
import get_capabilities + with _llama_history_lock: + if discussion_history and not _llama_history: + _llama_history.extend(_parse_discussion_history(discussion_history)) + user_content = _build_user_content(user_message, file_items) + _llama_history.append({"role": "user", "content": user_content}) + tools: list[dict[str, Any]] | None = None + if enable_tools: + tools = _build_tools() + for _round in range(MAX_TOOL_ROUNDS + 2): + request = OpenAICompatibleRequest( + messages=_llama_history, + model=_model, + temperature=_temperature, + top_p=_top_p, + max_tokens=_max_tokens, + tools=tools, + stream=stream, + stream_callback=stream_callback, + ) + caps = get_capabilities("llama", _model) + response = send_openai_compatible(client, request, capabilities=caps) + _append_comms("IN", "response", response.raw_response) + _llama_history.append({"role": "assistant", "content": response.text}) + if not response.tool_calls: + return response.text + for call in response.tool_calls: + if pre_tool_callback and pre_tool_callback(call) is False: + return "Tool execution rejected" + tool_result = _dispatch_tool(call, base_dir, rag_engine) + _llama_history.append({ + "role": "tool", + "tool_call_id": call.get("id", ""), + "content": tool_result, + }) + _reread_file_items(file_items) + return response.text +``` + +- [ ] **Step 5: Add _list_llama_models() and _get_llama_cost_tracking()** + +```python +def _list_llama_models() -> list[str]: + from src.vendor_capabilities import list_models_for_vendor + return list_models_for_vendor("llama") + +def _get_llama_cost_tracking() -> bool: + if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url: + return False + from src.vendor_capabilities import get_capabilities + caps = get_capabilities("llama", _model) + return caps.cost_tracking +``` + +- [ ] **Step 6: Register "llama" in PROVIDERS list and add list_models branch** + +Same pattern as Tasks 2.5 and 3.2 for "llama". 
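
`_get_llama_cost_tracking()` decides "local" by substring search on the base URL, which would also match a remote URL that merely contains `localhost` somewhere in its path. A stricter check, sketched here with the standard library's `urllib.parse`, compares the parsed hostname instead. `is_local_backend` and `_LOCAL_HOSTS` are hypothetical names, and treating `::1` as local is an added assumption.

```python
from urllib.parse import urlparse

# Hostnames treated as "local"; "::1" (IPv6 loopback) is an assumption beyond the plan.
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_local_backend(base_url: str) -> bool:
 # urlparse exposes the host without port, path, or brackets, already lowercased.
 host = urlparse(base_url).hostname
 return host is not None and host in _LOCAL_HOSTS
```

A custom self-hosted endpoint such as `http://my-server:9999/v1` is not classified as local by either approach, so its cost tracking still follows the capability registry.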
+ +- [ ] **Step 7: Run the 6 llama tests; confirm they pass** + +Run: `uv run pytest tests/test_llama_provider.py -v` +Expected: 6 tests PASS. + +- [ ] **Step 8: Commit** + +```bash +git add src/ai_client.py src/gui_2.py src/app_controller.py +git commit -m "feat(llama): implement _send_llama multi-backend via shared helper" +``` + +--- + +## Task 3.6: Update credentials template + cost tracker for Llama + +**Files:** +- Modify: `credentials_template.toml`, `src/cost_tracker.py` + +- [ ] **Step 1: Add [llama] section to credentials_template.toml** + +```toml +[llama] +# For Ollama (local), leave api_key empty and set base_url to http://localhost:11434/v1 +# For OpenRouter (cloud), set api_key and base_url to https://openrouter.ai/api/v1 +# For custom self-hosted, set base_url to your endpoint +# api_key = "YOUR_OPENROUTER_KEY" +# base_url = "https://openrouter.ai/api/v1" +``` + +- [ ] **Step 2: Add Llama pricing to cost_tracker.py** + +```python +"llama-3.1-8b-instant": {"input": 0.05, "output": 0.08}, +"llama-3.1-70b-versatile": {"input": 0.59, "output": 0.79}, +"llama-3.1-405b-reasoning": {"input": 3.00, "output": 3.00}, +"llama-3.2-1b-preview": {"input": 0.04, "output": 0.04}, +"llama-3.2-3b-preview": {"input": 0.06, "output": 0.06}, +"llama-3.2-11b-vision-preview": {"input": 0.18, "output": 0.18}, +"llama-3.2-90b-vision-preview": {"input": 0.90, "output": 0.90}, +"llama-3.3-70b-specdec": {"input": 0.59, "output": 0.79}, +``` + +- [ ] **Step 3: Commit** + +```bash +git add credentials_template.toml src/cost_tracker.py +git commit -m "feat(creds, cost): add Llama credentials template and pricing" +``` + +--- + +## Task 3.7: Phase 3 checkpoint + +**Files:** none (commit + note only) + +- [ ] **Step 1: Run all 3 new vendor test files; confirm 13 pass (5 Qwen + 2 Grok + 6 Llama)** + +Run: `uv run pytest tests/test_qwen_provider.py tests/test_grok_provider.py tests/test_llama_provider.py -v` +Expected: 13 tests PASS. 
+ +- [ ] **Step 2: Run the full test suite to ensure no regressions** + +Run: `uv run pytest tests/ -q --timeout=60 2>&1 | tail -10` +Expected: no NEW failures. + +- [ ] **Step 3: Create the checkpoint commit and git note** + +```bash +git add -A +if ! git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 3 complete - Grok + Llama via shared helper"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 3 checkpoint: Grok + Llama via shared helper + +- src/ai_client.py: state globals for _grok_*, _llama_*; _ensure_grok_client, _ensure_llama_client; _send_grok, _send_llama (both call send_openai_compatible); _list_grok_models, _list_llama_models, _get_llama_cost_tracking +- src/cost_tracker.py: pricing for Grok (3 models) + Llama (8 models) +- src/gui_2.py + src/app_controller.py: grok + llama added to PROVIDERS +- credentials_template.toml: [grok] and [llama] example sections +- 13 unit tests pass (5 Qwen from Phase 2 + 2 Grok + 6 Llama) + +No regressions in 273+ existing tests. + +Next: Phase 4 (MiniMax refactor)." "$SHA" +``` + +- [ ] **Step 4: Update state.toml phase_3 status** + +```bash +git add conductor/tracks/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 3 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Phase 4: MiniMax Refactor + +> Goal: `_send_minimax()` is refactored to use `send_openai_compatible()`. Behavior is identical; code is ~50 lines instead of ~250. All existing `tests/test_minimax_provider.py` tests pass unchanged. + +--- + +## Task 4.1: Baseline: verify existing MiniMax tests pass before refactor + +**Files:** none (verification only) + +- [ ] **Step 1: Run tests/test_minimax_provider.py and record pass/fail counts** + +Run: `uv run pytest tests/test_minimax_provider.py -v 2>&1 | tail -30` +Expected: all existing tests pass (or have the same pre-existing failures as before). 
+ +- [ ] **Step 2: Record the line count of _send_minimax in src/ai_client.py** + +Run: `manual-slop_py_get_signature path=src/ai_client.py name=_send_minimax` then read the function body. Or use `wc -l` after locating the function. Record the "before" count in the state.toml. + +- [ ] **Step 3: Commit no changes (baseline only)** + +No commit. This task is verification. + +--- + +## Task 4.2: Refactor _send_minimax() to use send_openai_compatible() + +**Files:** +- Modify: `src/ai_client.py` (replace _send_minimax body) + +- [ ] **Step 1: Read the current _send_minimax body to understand the exact tool loop and history handling** + +Run: `manual-slop_get_definition path=src/ai_client.py name=_send_minimax` (returns the full function). + +- [ ] **Step 2: Replace the function body with the refactored version using send_openai_compatible** + +The new `_send_minimax` should follow the same pattern as `_send_grok` and `_send_llama` (Tasks 3.2 and 3.5). The differences from Grok are: +- Base URL is `https://api.minimax.chat/v1` (already in `_ensure_minimax_client`) +- Vendor name in capability lookup is `"minimax"` + +Replace the function body with: + +```python +def _send_minimax(md_content, user_message, base_dir, file_items, discussion_history, stream, pre_tool_callback, qa_callback, enable_tools, stream_callback, patch_callback, rag_engine): + _ensure_minimax_client() + from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible + from src.vendor_capabilities import get_capabilities + with _minimax_history_lock: + _repair_minimax_history(_minimax_history) + if discussion_history and not _minimax_history: + _minimax_history.extend(_parse_discussion_history(discussion_history)) + user_content = _build_user_content(user_message, file_items) + _minimax_history.append({"role": "user", "content": user_content}) + tools: list[dict[str, Any]] | None = None + if enable_tools: + tools = _build_tools() + for _round in range(MAX_TOOL_ROUNDS + 2): + request 
= OpenAICompatibleRequest( + messages=_minimax_history, + model=_model, + temperature=_temperature, + top_p=_top_p, + max_tokens=_max_tokens, + tools=tools, + stream=stream, + stream_callback=stream_callback, + ) + caps = get_capabilities("minimax", _model) + response = send_openai_compatible(_minimax_client, request, capabilities=caps) + _append_comms("IN", "response", response.raw_response) + _minimax_history.append({"role": "assistant", "content": response.text}) + if not response.tool_calls: + return response.text + for call in response.tool_calls: + if pre_tool_callback and pre_tool_callback(call) is False: + return "Tool execution rejected" + tool_result = _dispatch_tool(call, base_dir, rag_engine) + _minimax_history.append({ + "role": "tool", + "tool_call_id": call.get("id", ""), + "content": tool_result, + }) + _reread_file_items(file_items) + dropped = _trim_minimax_history([sys_msg], _minimax_history) + return response.text +``` + +**Key change:** the entire `try / except OpenAIError` block, the direct `client.chat.completions.create(...)` call, the manual response parsing, the manual streaming iteration, and the manual tool_calls extraction all move into `send_openai_compatible()`. The MiniMax-specific logic (history lock, history repair, history trim) stays. Note that the snippet uses `sys_msg` without defining it: carry over whatever line builds the system message in the current `_send_minimax` body. + +- [ ] **Step 3: Run tests/test_minimax_provider.py; confirm all tests still pass** + +Run: `uv run pytest tests/test_minimax_provider.py -v` +Expected: same pass/fail count as Task 4.1 Step 1 baseline. + +- [ ] **Step 4: If tests fail, debug forward (do NOT revert). The refactor is wrong; fix the new body.** + +Common failure modes: +- History format mismatch: the helper expects `{"role": "tool", "tool_call_id": "...", "content": "..."}`. The old code may have stored tool results differently. OpenAI-compatible backends also typically reject a `role: "tool"` message unless the preceding assistant message carries the matching `tool_calls`, so check that the assistant entry appended to history keeps them. +- Tool result content type: the helper expects `content: str`. The old code may have stored it as `list[dict]`.
+- Streaming: if the test mocks the client and the helper calls `create(stream=True)`, the mock setup may need adjusting. + +- [ ] **Step 5: Run the full test suite to check for cross-vendor regressions** + +Run: `uv run pytest tests/ -q --timeout=60 2>&1 | tail -10` +Expected: same pre-existing failures as baseline; no new failures. + +- [ ] **Step 6: Commit (green)** + +```bash +git add src/ai_client.py +git commit -m "refactor(minimax): use send_openai_compatible helper (~250 -> ~50 lines)" +``` + +--- + +## Task 4.3: Record refactor stats + Phase 4 checkpoint + +**Files:** +- Modify: `conductor/tracks/qwen_llama_grok_integration_20260606/state.toml` + +- [ ] **Step 1: Record the line count delta in state.toml** + +Edit the `[minimax_refactor_stats]` section of state.toml: +```toml +[minimax_refactor_stats] +lines_before = "" +lines_after = "" +tests_passing = "" +tests_failing = "" +``` + +- [ ] **Step 2: Create the checkpoint commit and git note** + +```bash +git add -A +if ! git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 4 complete - MiniMax refactored to use shared helper"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 4 checkpoint: MiniMax refactored to use shared helper. + +_send_minimax() went from ~250 lines to ~50 lines by calling +send_openai_compatible() instead of doing the OpenAI API call directly. +All existing tests/test_minimax_provider.py tests pass unchanged. + +The behavior contract is preserved: same tool loop, same history format, +same error classification, same streaming. + +Next: Phase 5 (UX adaptation)." "$SHA" +``` + +- [ ] **Step 3: Update state.toml phase_4 status** + +```bash +git add conductor/tracks/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Phase 5: UX Adaptation + +> Goal: GUI reads the capability matrix and adapts 9 UI elements. No regressions in live_gui tests. 
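As a rough illustration of what this phase is aiming for, the nine adaptations can be read as a pure function from a capabilities record to UI flags. The sketch below is not the project's code: `VendorCapabilities` is a stand-in dataclass whose default values are assumptions, and `UiState` and `derive_ui_state` are hypothetical names introduced only for illustration. It uses the plan's 1-space indentation; the comments are explanatory and would be dropped in production code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
 # Stand-in for the real dataclass; the default values are assumptions.
 vendor: str
 model: str
 vision: bool = False
 tool_calling: bool = False
 caching: bool = False
 streaming: bool = False
 model_discovery: bool = False
 context_window: int = 0
 cost_tracking: bool = False
 notes: str = ""


@dataclass(frozen=True)
class UiState:
 # Hypothetical container for the per-render decisions.
 screenshot_enabled: bool
 tools_toggle_enabled: bool
 cache_panel_visible: bool
 stream_progress_visible: bool
 fetch_models_enabled: bool
 token_budget_max: int
 cost_label: str


def derive_ui_state(caps: VendorCapabilities, base_url: str = "") -> UiState:
 # Adaptations 7-9: a cost estimate, "Free (local)", or a dash placeholder.
 is_local = caps.vendor == "llama" and ("localhost" in base_url or "127.0.0.1" in base_url)
 if caps.cost_tracking:
  cost_label = "estimate"
 elif is_local:
  cost_label = "Free (local)"
 else:
  cost_label = "—"
 # Adaptations 1-6 map one capability to one UI decision each.
 return UiState(
  screenshot_enabled=caps.vision,
  tools_toggle_enabled=caps.tool_calling,
  cache_panel_visible=caps.caching,
  stream_progress_visible=caps.streaming,
  fetch_models_enabled=caps.model_discovery,
  token_budget_max=caps.context_window,
  cost_label=cost_label,
 )
```

Keeping the decision logic separate from the render calls would also make it testable without launching the GUI, although the plan itself applies the checks inline in each render function.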
+ +--- + +## Task 5.1: Add _get_active_capabilities() helper to src/gui_2.py + +**Files:** +- Modify: `src/gui_2.py` + +- [ ] **Step 1: Find where the App class lives and the render methods that need the helper** + +Run: `manual-slop_py_get_code_outline path=src/gui_2.py` and look for the App class. + +- [ ] **Step 2: Add _get_active_capabilities() as a method on the App class (or as a module-level helper)** + +```python +def _get_active_capabilities(self) -> "VendorCapabilities": + from src.vendor_capabilities import get_capabilities + try: + return get_capabilities(self._provider, self._model) + except KeyError: + from src.vendor_capabilities import VendorCapabilities + return VendorCapabilities(vendor=self._provider, model=self._model, notes="unregistered") +``` + +(Adjust to match the actual `self._provider` / `self._model` attribute names in the App class.) + +- [ ] **Step 3: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): add _get_active_capabilities() helper reading the matrix" +``` + +--- + +## Task 5.2: Apply UX adaptations to src/gui_2.py + +**Files:** +- Modify: `src/gui_2.py` (9 render functions) + +- [ ] **Step 1: Apply adaptation 1 — Screenshot button enabled iff `caps.vision`** + +Find the screenshot button render code and wrap it: + +```python +caps = self._get_active_capabilities() +if caps.vision: + # existing screenshot button code +``` + +(Disable instead of hide — showing a disabled button with a tooltip is better UX than hiding it entirely. Use `imgui.begin_disabled()` or equivalent.) + +- [ ] **Step 2: Apply adaptation 2 — Tools enabled toggle enabled iff `caps.tool_calling`** + +Similar pattern. + +- [ ] **Step 3: Apply adaptation 3 — Cache panel visible iff `caps.caching`** + +Find the cache panel render and wrap with `if caps.caching:`. + +- [ ] **Step 4: Apply adaptation 4 — Stream progress visible iff `caps.streaming`** + +Find the stream progress bar render and wrap with `if caps.streaming:`. 
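For the "disable instead of hide" behaviour called for in Step 1, one option is a small context manager that pairs the begin/end calls, so an early return or exception cannot leave the UI inside a disabled scope. This is a sketch under assumptions: `disabled_unless` is a hypothetical helper, and `ui` is any object exposing `begin_disabled(bool)` and `end_disabled()` (check what the binding used in `src/gui_2.py` actually provides). `FakeUi` exists only so the example runs without a GUI.

```python
from contextlib import contextmanager


class FakeUi:
 # Records calls so the sketch runs without a real GUI backend.
 def __init__(self):
  self.calls = []

 def begin_disabled(self, disabled):
  self.calls.append(("begin_disabled", disabled))

 def end_disabled(self):
  self.calls.append(("end_disabled",))

 def button(self, label):
  self.calls.append(("button", label))
  return False


@contextmanager
def disabled_unless(ui, enabled):
 # Widgets drawn inside the block are greyed out when enabled is False.
 ui.begin_disabled(not enabled)
 try:
  yield
 finally:
  # Always close the scope, even if the body raises.
  ui.end_disabled()


ui = FakeUi()
with disabled_unless(ui, enabled=False):
 ui.button("Screenshot")
```

The same wrapper would apply to the Tools toggle and the Fetch Models button, passing `caps.tool_calling` or `caps.model_discovery` as `enabled`.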
+ +- [ ] **Step 5: Apply adaptation 5 — Fetch Models button enabled iff `caps.model_discovery`** + +Wrap the button in `begin_disabled` when `caps.model_discovery` is false (preferred, consistent with Step 1), or guard it with `if caps.model_discovery:`. + +- [ ] **Step 6: Apply adaptation 6 — Token budget max = `caps.context_window`** + +Find the token budget panel's max value and set it to `caps.context_window`. + +- [ ] **Step 7: Apply adaptations 7-9 — Cost panel: estimate / "Free (local)" / "—"** + +Find the cost panel and add: + +```python +caps = self._get_active_capabilities() +if caps.cost_tracking: + # show cost estimate +elif self._provider == "llama" and ("localhost" in self._llama_base_url or "127.0.0.1" in self._llama_base_url): + # show "Free (local)" +else: + # show "—" +``` + +(The `self._provider == "llama"` guard restricts the localhost check to Llama; other providers depend only on `caps.cost_tracking`. Adjust the attribute names to match the App class.) + +- [ ] **Step 8: Run the full test suite; verify no regressions** + +Run: `uv run pytest tests/ -q --timeout=60 2>&1 | tail -10` +Expected: no new failures. + +- [ ] **Step 9: Run live_gui tests in particular (they exercise the GUI)** + +Run: `uv run pytest tests/ -q -m live_gui --timeout=120 2>&1 | tail -10` +Expected: same pass/fail count as before Phase 5. + +- [ ] **Step 10: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): apply 9 capability-driven UX adaptations" +``` + +--- + +## Task 5.3: Update _predefined_callbacks + _gettable_fields for new providers + +**Files:** +- Modify: `src/gui_2.py` (or `src/app_controller.py` where the registries are owned) + +- [ ] **Step 1: Add the new `_send_*()` functions as registered callbacks if needed for testability** + +If existing providers register their _send_* methods as `_predefined_callbacks` for the test hook API, add the same for qwen, llama, grok. Look at how `_send_minimax` is registered (if it is) and follow the same pattern. + +- [ ] **Step 2: Register capability-getting as a gettable field** + +If `controller._gettable_fields` is the registry the test hooks read from, add a `current_capabilities` entry to it. If not, leave this for a follow-up.
+ +- [ ] **Step 3: Run live_gui tests; ensure no regressions** + +Run: `uv run pytest tests/ -q -m live_gui --timeout=120 2>&1 | tail -10` + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py src/app_controller.py +git commit -m "feat(gui): register new providers in test hook API" +``` + +--- + +## Task 5.4: Manual smoke test + +**Files:** none (manual verification) + +- [ ] **Step 1: Launch the GUI in test mode** + +Run: `uv run python sloppy.py --enable-test-hooks` in a separate terminal. + +- [ ] **Step 2: Select Qwen provider in AI Settings, send a message with a tool call (e.g., "list the files in src/")** + +Verify: tool executes, response streams, comms log shows the request/response. + +- [ ] **Step 3: Switch to Llama provider (Ollama if available, else cloud), send the same message** + +Verify: same flow works. + +- [ ] **Step 4: Switch to Grok provider, send the same message** + +Verify: same flow works. + +- [ ] **Step 5: Switch to a vision-capable model (Qwen-VL-Max, Llama-3.2-11B-Vision, or Grok-2-Vision), attach a screenshot, send a message** + +Verify: vision is recognized; response describes the image. + +- [ ] **Step 6: Switch to MiniMax provider (the refactored one), send the same message** + +Verify: MiniMax still works after the refactor. + +- [ ] **Step 7: Document the smoke test results** + +In a short markdown file (e.g., `docs/test_smoke_20260606.md` or a git note), record: +- Qwen: pass/fail +- Llama: pass/fail +- Grok: pass/fail +- Vision: pass/fail +- MiniMax: pass/fail (regression check) + +- [ ] **Step 8: Commit the smoke test notes** + +```bash +git add docs/test_smoke_20260606.md +git commit -m "docs(smoke): Phase 5 manual smoke test results for new vendors" +``` + +--- + +## Task 5.5: Phase 5 checkpoint + +**Files:** none (commit + note only) + +- [ ] **Step 1: Run all tests, full suite** + +Run: `uv run pytest tests/ -q --timeout=120 2>&1 | tail -10` +Expected: no new failures. 
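"No new failures" is easier to check mechanically than by eye. A minimal sketch, assuming the full pytest output of the baseline run and of the current run has been saved as text (the `tail -10` in the command above would truncate a longer summary), and relying on the `FAILED <nodeid>` and `ERROR <nodeid>` lines that pytest prints in its short test summary. `failed_ids` and `new_failures` are hypothetical helper names, not part of the project.

```python
import re

# Matches short-summary lines such as "FAILED tests/test_x.py::test_y - AssertionError".
_SUMMARY_LINE = re.compile(r"^(?:FAILED|ERROR)\s+(\S+)", re.MULTILINE)


def failed_ids(pytest_output: str) -> set[str]:
 # Collect the node ids of every failed or errored test in the output.
 return set(_SUMMARY_LINE.findall(pytest_output))


def new_failures(baseline_output: str, current_output: str) -> list[str]:
 # Failures present now that were not already failing in the baseline.
 return sorted(failed_ids(current_output) - failed_ids(baseline_output))
```

An empty list from `new_failures` corresponds to the "no new failures" expectation; pre-existing failures are ignored because they appear in both outputs.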
+ +- [ ] **Step 2: Create the checkpoint commit and git note** + +```bash +git add -A +if ! git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 5 complete - UX adaptation + integration"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 5 checkpoint: UX adaptation + integration. + +9 UI elements in src/gui_2.py now read from the capability matrix: +1. Screenshot button enabled iff vision=true +2. Tools toggle enabled iff tool_calling=true +3. Cache panel visible iff caching=true +4. Stream progress visible iff streaming=true +5. Fetch Models button enabled iff model_discovery=true +6. Token budget max = capabilities.context_window +7-9. Cost panel: estimate / 'Free (local)' / '—' based on cost_tracking + base_url + +Manual smoke test: Qwen/Llama/Grok/vision/MiniMax (regression) all pass. + +Next: Phase 6 (docs + archive)." "$SHA" +``` + +- [ ] **Step 3: Update state.toml phase_5 status** + +```bash +git add conductor/tracks/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 5 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Phase 6: Docs + Archive + +> Goal: Update documentation. Move the track to the archive. + +--- + +## Task 6.1: Update docs/guide_ai_client.md + +**Files:** +- Modify: `docs/guide_ai_client.md` + +- [ ] **Step 1: Add a section on the new vendors** + +Insert after the existing per-provider sections (~line 250): + +```markdown +### Qwen (DashScope native SDK) + +Qwen is a first-class vendor with its own code path because the DashScope +native API unlocks features that the OpenAI-compatible mode loses (Qwen-Audio, +Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). See the +`src/vendor_capabilities.py` registry for the per-model matrix. + +### Llama (Ollama + OpenRouter + custom URL) + +Llama is configured per-project via `llama_base_url` and `llama_api_key`.
+Ollama is the local default (free); OpenRouter is the cloud aggregator +(single API key covers Together, Groq, Fireworks). The shared +`send_openai_compatible()` helper handles all three. + +### Grok (xAI) + +Grok uses xAI's OpenAI-compatible endpoint (`https://api.x.ai/v1`) via the +shared helper. Grok-2-Vision is registered with `vision: true`. +``` + +- [ ] **Step 2: Add a section on the capability matrix** + +Insert after the "Provider-Specific Behaviors" section: + +```markdown +### Vendor Capability Matrix + +Each (vendor, model) entry in `src/vendor_capabilities.py:_REGISTRY` declares: +- `vision` (bool): can accept image inputs +- `tool_calling` (bool): supports function/tool calls +- `caching` (bool): supports server-side prompt caching +- `streaming` (bool): supports streaming responses +- `model_discovery` (bool): backend exposes /v1/models +- `context_window` (int): max input tokens +- `cost_tracking` (bool): per-token pricing known + +The GUI reads `get_capabilities(active_vendor, active_model)` once per render +and uses these to enable/disable UI elements. See +`src/vendor_capabilities.py` for the canonical registry. + +### Shared OpenAI-Compatible Helper + +`src/openai_compatible.py` provides `send_openai_compatible(client, request, capabilities)` +which handles the OpenAI Chat Completions API call (streaming or blocking), +response parsing, tool-call detection, and error classification. Vendors +that use the OpenAI SDK (MiniMax, Llama, Grok) call into this helper; the +per-vendor `_send_*()` function is a thin adapter that initializes the client +and loads/saves its history.
+``` + +- [ ] **Step 3: Commit** + +```bash +git add docs/guide_ai_client.md +git commit -m "docs(ai_client): document new vendors, capability matrix, shared helper" +``` + +--- + +## Task 6.2: Update docs/guide_models.md + +**Files:** +- Modify: `docs/guide_models.md` + +- [ ] **Step 1: Add new providers to the PROVIDERS table** + +Find the table of providers in `docs/guide_models.md` and add rows: + +```markdown +| qwen | DashScope | Qwen-Plus/Max/Turbo/Long, Qwen-VL-Plus/Max, Qwen-Audio | yes (native) | +| llama | Ollama / OpenRouter / custom | Llama 3.1 8B/70B/405B, 3.2 1B/3B/11B-Vision/90B-Vision, 3.3 70B | yes (OpenAI-compat) | +| grok | xAI | Grok-2, Grok-2-Vision, Grok-Beta | yes (OpenAI-compat) | +``` + +- [ ] **Step 2: Commit** + +```bash +git add docs/guide_models.md +git commit -m "docs(models): add Qwen, Llama, Grok to PROVIDERS table" +``` + +--- + +## Task 6.3: Archive the track + +**Files:** +- Move: `conductor/tracks/qwen_llama_grok_integration_20260606/` → `conductor/tracks/archive/qwen_llama_grok_integration_20260606/` +- Modify: `conductor/tracks.md` + +- [ ] **Step 1: Confirm no remaining references to the track at the non-archive location** + +Run: `rg "qwen_llama_grok_integration_20260606" --type md | grep -v "archive"` +Expected: only the `conductor/tracks.md` entry, plus self-references inside the track's own directory (for example this plan), which move with it in Step 2. + +- [ ] **Step 2: git mv the track directory** + +```bash +git mv conductor/tracks/qwen_llama_grok_integration_20260606 conductor/tracks/archive/qwen_llama_grok_integration_20260606 +``` + +- [ ] **Step 3: Update tracks.md: change [ ] to [x], move to Recently Completed** + +Edit `conductor/tracks.md`: find the line for `qwen_llama_grok_integration_20260606` in the Backlog section, change `[ ]` to `[x]`, and move the entry to the "Recently Completed Tracks" section.
+ +- [ ] **Step 4: Commit** + +```bash +git add conductor/tracks.md conductor/tracks/archive/qwen_llama_grok_integration_20260606 +git commit -m "conductor(archive): ship qwen_llama_grok_integration_20260606 to archive" +``` + +--- + +## Task 6.4: Phase 6 checkpoint (TRACK COMPLETE) + +**Files:** +- Modify: `conductor/tracks/archive/qwen_llama_grok_integration_20260606/state.toml` + +- [ ] **Step 1: Run the full test suite one final time** + +Run: `uv run pytest tests/ -q --timeout=120 2>&1 | tail -10` +Expected: no new failures vs baseline. + +- [ ] **Step 2: Mark all phases complete in state.toml** + +Edit state.toml: +- `current_phase = 6` (or remove) +- All phase_N entries: `status = "completed"`, `checkpoint_sha` filled in +- Update `[verification]` section: all values to `true` +- Add final line: `# Track completed 2026-06-06 and archived.` + +- [ ] **Step 3: Create the final checkpoint commit and git note** + +```bash +git add -A +if ! git diff --cached --quiet; then git commit -m "conductor(checkpoint): Phase 6 complete - track shipped to archive"; fi +SHA=$(git log -1 --format="%H") +git notes add -m "TRACK COMPLETE: qwen_llama_grok_integration_20260606 + +Final state: +- src/vendor_capabilities.py: 7-capability matrix; Qwen/Llama/Grok/MiniMax fully populated; Anthropic/Gemini/DeepSeek stubs marked 'pending_migration' +- src/openai_compatible.py: shared send helper +- src/qwen_adapter.py: DashScope-specific adapter +- src/ai_client.py: _send_qwen, _send_grok, _send_llama, _send_minimax (refactored) +- src/cost_tracker.py: pricing for 18 new models (7 Qwen + 8 Llama + 3 Grok) +- src/gui_2.py: 9 capability-driven UX adaptations +- 7 new test files; 26 unit tests pass +- docs updated; track archived + +Foundation laid for follow-up track: 'Anthropic/Gemini/DeepSeek Capability Matrix Migration'."
"$SHA" +``` + +- [ ] **Step 4: Commit state.toml update** + +```bash +git add conductor/tracks/archive/qwen_llama_grok_integration_20260606/state.toml +git commit -m "conductor(plan): mark Phase 6 complete in qwen_llama_grok_integration_20260606" +``` + +--- + +# Self-Review + +**1. Spec coverage:** + +| Spec Section | Plan Coverage | +|---|---| +| §1 Overview | All 4 main points (Qwen + Llama + Grok + matrix + MiniMax refactor) addressed in Phases 1-5. | +| §2 Goals | A (matrix + 3 vendors): Phase 1 (matrix), Phase 2 (Qwen), Phase 3 (Grok + Llama). B (shared helper + MiniMax refactor): Phase 1 (helper), Phase 4 (refactor). C (UX + docs): Phase 5 (UX), Phase 6 (docs). | +| §3 Architecture | 3.1 data-oriented design: implemented in shared helper (Phase 1) + per-vendor adapters (Phases 2-3). 3.2 module layout: all files created/modified per the table. 3.3 capability matrix v1: 7 capabilities in `_REGISTRY`. 3.4 per-(vendor, model) capabilities: implemented. | +| §4.1 Qwen | Phase 2: state, _ensure_qwen_client, _dashscope_call, _send_qwen, _list_qwen_models, registry, pricing, credentials, PROVIDERS registration. | +| §4.2 Llama | Phase 3: state, _ensure_llama_client, _send_llama (multi-backend), _list_llama_models, _get_llama_cost_tracking, registry, pricing, credentials, PROVIDERS registration. | +| §4.3 Grok | Phase 3: state, _ensure_grok_client, _send_grok, registry, pricing, credentials, PROVIDERS registration. | +| §5 Shared helper | Phase 1: NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, _classify_openai_compatible_error. 5.2 MiniMax refactor: Phase 4. | +| §6 UX adaptation | Phase 5: 9 UI adaptations applied. | +| §7 Configuration | Phase 1 (pyproject.toml dep), Phase 2-3 (credentials_template.toml + per-project TOML patterns), Phase 5 (PROVIDERS lists). | +| §8 Testing | 6 new test files (test_vendor_capabilities, test_openai_compatible, test_qwen, test_llama, test_grok); test_minimax modified (just verification of refactor). 
| +| §9 Migration | 6 phases; each phase is a plan phase. | +| §10 Risks | MiniMax refactor: Phase 4 Task 4.1 (baseline) + 4.2 (refactor with debugging-forward). DashScope SDK: pinned version. OpenRouter pricing: cost overrides per-project. Capability drift: documented. | +| §11 Out of scope | Audio, server-side code execution, Anthropic/Gemini/DeepSeek migration — all explicitly out of scope. | +| §12 Open questions | Per-model cost overrides, default Llama base URL, DashScope region, Qwen-Coder/Math — all called out in spec; not blocking. | +| §13 See also | Follow-up track mentioned in Phase 6 git note. | + +**2. Placeholder scan:** No "TBD", "TODO", "implement later", "add appropriate error handling" in the plan. Helper names are concrete (`_parse_discussion_history`, `_build_user_content`, etc.) but where they don't yet exist, they're called out as "adapt to actual project pattern" with a reference to look at `_send_minimax`. The MiniMax refactor (Task 4.2) uses one of these patterns explicitly. + +**3. Type consistency:** `VendorCapabilities`, `NormalizedResponse`, `OpenAICompatibleRequest`, `ProviderError` defined in Phase 1; used consistently in Phases 2-5. `send_openai_compatible()` signature stable across all vendor adapters. `_send_()` signatures mirror `_send_minimax` (the established pattern). + +No issues found. Plan ready for execution by the user's external planning-execution agent.