From 4e4a56fd085ecf85dec5a80fd68d73a0a03dab1e Mon Sep 17 00:00:00 2001 From: Ed_ Date: Thu, 11 Jun 2026 09:40:41 -0400 Subject: [PATCH] docs(plan): add plan.md for qwen_llama_grok_followup_20260611 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The follow-up track had a spec but no plan. The plan is the executable artifact — it specifies file:line refs, exact code to type, TDD steps, and per-file atomic commits. Without the plan, the next agent cannot implement from the spec alone. Plan structure (5 phases, ~40 tasks): - Phase 1: Tool loop lift (5 Red tests + helper + apply to 8 vendors + audit script) - Phase 2: PROVIDERS move (decide location + move + update 4 import sites + audit script) - Phase 3: UX adaptations 2-9 (8 separate applications of the pattern established in parent Phase 5) - Phase 4: Local-first + matrix v2 (12 new fields + native Ollama adapter + Meta Llama API + Local Model GUI badge) - Phase 5: Anthropic / Gemini / DeepSeek migration (matrix entries for the 3 remaining providers + docs update) Each task has: - WHERE: exact file and (where applicable) line range - WHAT: the specific change - HOW: TDD step ordering (Red then Green) - SAFETY: thread-safety, dependency-ordering, and project-invariant constraints The plan models the parent track's plan structure (2177 lines, 2-5 minute steps, per-file atomic commits). 
--- .../qwen_llama_grok_followup_20260611/plan.md | 1461 +++++++++++++++++ 1 file changed, 1461 insertions(+) create mode 100644 conductor/tracks/qwen_llama_grok_followup_20260611/plan.md diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/plan.md b/conductor/tracks/qwen_llama_grok_followup_20260611/plan.md new file mode 100644 index 00000000..24200276 --- /dev/null +++ b/conductor/tracks/qwen_llama_grok_followup_20260611/plan.md @@ -0,0 +1,1461 @@ +# Qwen/Llama/Grok Follow-Up — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Address the gaps left by the parent track `qwen_llama_grok_integration_20260606` per the audit report. Ship 5 phases: (1) tool loop lift, (2) PROVIDERS move, (3) UX adaptations 2-9, (4) local-first + matrix v2, (5) Anthropic/Gemini/DeepSeek migration. + +**Architecture:** All new helpers (`run_with_tool_loop`) operate on the existing `OpenAICompatibleRequest` / `NormalizedResponse` data structures. Each vendor entry point stays a thin boundary adapter. The capability matrix grows from 7 v1 fields to 19 v2 fields; the GUI reads the matrix and adapts 9+ UI elements accordingly. Local models (Ollama, Meta Llama API) become first-class, not "one of 3 backends." + +**Tech Stack:** Python 3.11+, pytest, `dashscope>=1.14.0,<2.0.0` (already a dep), `openai>=1.0.0` (already a dep). Stdlib `dataclasses`, `enum`, `threading`, `asyncio`, `typing`. **1-space indentation mandatory.** No comments in production code. + +**Reference:** See `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md` for the full design and rationale. See `docs/reports/qwen_llama_grok_followup_audit_20260611.md` for the gap analysis. See parent track `conductor/tracks/qwen_llama_grok_integration_20260606/` for the working code we extend. 
+ +--- + +## File Structure (delta from parent track) + +| File | Action | Responsibility | +|---|---|---| +| `src/tool_loop.py` | Create | `run_with_tool_loop(client, request, *, capabilities, pre_tool_callback, qa_callback, stream_callback, patch_callback, base_dir, vendor_name, history_lock, history, trim_func=None) -> str` | +| `src/llama_ollama_native.py` | Create | Native Ollama adapter (`/api/chat`); replaces OpenAI-compatible for Ollama backend | +| `src/llama_meta_api.py` | Create | Meta Llama API adapter; new 4th backend | +| `scripts/audit_no_inline_tool_loops.py` | Create | Fails if any `_send_<vendor>()` has an inline tool loop | +| `scripts/audit_providers_source_of_truth.py` | Create | Fails if `PROVIDERS` is declared in `src/models.py` | +| `tests/test_tool_loop.py` | Create | 5+ tests for `run_with_tool_loop` | +| `tests/test_llama_ollama_native.py` | Create | 3+ tests for native Ollama adapter | +| `tests/test_llama_meta_api.py` | Create | 2+ tests for Meta Llama API adapter | +| `src/ai_client.py` | Modify | (1) Use `run_with_tool_loop` in all 8 vendors; (2) move PROVIDERS; (3) add native Ollama + Meta Llama; (4) add v2 matrix fields | +| `src/vendor_capabilities.py` | Modify | Add 12 v2 fields; populate v2 entries for all vendors | +| `src/models.py` | Modify | Remove `PROVIDERS` (moves to `src/ai_client.py` or `src/ai_client_providers.py`); keep re-export shim for backward compat | +| `src/gui_2.py` | Modify | Apply UX adaptations 2-9; add "Local Model" badge | +| `docs/guide_ai_client.md` | Modify (Phase 5) | Document `run_with_tool_loop`; v2 fields; local-first | +| `docs/guide_models.md` | Modify (Phase 5) | Document new PROVIDERS location; v2 fields | + +--- + +# Phase 1: Tool Loop Lift + +> Goal: 8 vendors share one tool-call loop. The loop is data-oriented (operates on `NormalizedResponse.tool_calls`). Each vendor injects its own history-management callbacks. 
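A minimal, self-contained sketch of the data-oriented loop this phase describes. Everything in it is hypothetical: `Response`, `run_loop`, `send`, and `execute` are illustrative stand-ins, not the project's `NormalizedResponse` or `run_with_tool_loop`. It assumes that each round's tool results are appended to the message list before the next request, and that the last response text is returned even when the round cap is reached.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ROUNDS = 3  # illustrative cap, not the project's MAX_TOOL_ROUNDS


@dataclass
class Response:
 # Hypothetical stand-in for a normalized vendor response.
 text: str
 tool_calls: list[dict] = field(default_factory=list)


def run_loop(
 send: Callable[[list[dict]], Response],
 execute: Callable[[dict], str],
 messages: list[dict],
) -> str:
 """Call send until a response has no tool calls or the cap is reached."""
 last_text = ""
 for _ in range(MAX_ROUNDS):
  response = send(messages)
  last_text = response.text  # kept so hitting the cap still returns text
  messages.append({"role": "assistant", "content": response.text})
  if not response.tool_calls:
   break
  for call in response.tool_calls:
   # Feed every result back so the next round can see it.
   messages.append({"role": "tool", "tool_call_id": call["id"], "content": execute(call)})
 return last_text


def demo() -> tuple[str, list[dict]]:
 # Two scripted responses: one tool request, then a plain answer.
 scripted = iter([
  Response("thinking", [{"id": "c1", "name": "read_file"}]),
  Response("done"),
 ])
 messages = [{"role": "user", "content": "x"}]
 result = run_loop(lambda _m: next(scripted), lambda _call: "file contents", messages)
 return result, messages
```

Because `send` and `execute` are injected, the loop can be exercised with scripted responses and no network client, which is the same property the red tests in Task 1.1 rely on when they patch `send_openai_compatible`.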
+ +--- + +## Task 1.1: Write red tests for `run_with_tool_loop` + +**Files:** Create `tests/test_tool_loop.py` + +- [ ] **Step 1: Create the file with 5 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src.openai_compatible import NormalizedResponse, OpenAICompatibleRequest +from src.tool_loop import run_with_tool_loop +from src.vendor_capabilities import VendorCapabilities + +@pytest.fixture +def caps() -> VendorCapabilities: + return VendorCapabilities(vendor="test", model="test-model", tool_calling=True, context_window=8192) + +def _make_normalized_response(text: str = "ok", tool_calls: list | None = None) -> NormalizedResponse: + return NormalizedResponse( + text=text, tool_calls=tool_calls or [], + usage_input_tokens=10, usage_output_tokens=5, + usage_cache_read_tokens=0, usage_cache_creation_tokens=0, + raw_response=None, + ) + +def test_run_with_tool_loop_no_tool_calls_returns_immediately(caps: VendorCapabilities) -> None: + client = MagicMock() + with patch("src.tool_loop.send_openai_compatible", return_value=_make_normalized_response("hello")) as call: + result = run_with_tool_loop( + client, OpenAICompatibleRequest(messages=[{"role": "user", "content": "x"}], model="m"), + capabilities=caps, + pre_tool_callback=None, qa_callback=None, patch_callback=None, + base_dir=".", vendor_name="test", history_lock=None, history=[], + ) + assert result == "hello" + assert call.call_count == 1 + +def test_run_with_tool_loop_dispatches_tool_calls(caps: VendorCapabilities) -> None: + client = MagicMock() + tool_response = _make_normalized_response( + "first response", tool_calls=[{"id": "c1", "function": {"name": "read_file", "arguments": "{}"}}] + ) + final_response = _make_normalized_response("after tool") + with patch("src.tool_loop.send_openai_compatible", side_effect=[tool_response, final_response]) as call, \ + patch("src.tool_loop._execute_tool_calls_concurrently", return_value=[("read_file", "c1", "result", "")]) as dispatch: 
+ result = run_with_tool_loop( + client, OpenAICompatibleRequest(messages=[{"role": "user", "content": "x"}], model="m"), + capabilities=caps, + pre_tool_callback=None, qa_callback=None, patch_callback=None, + base_dir=".", vendor_name="test", history_lock=None, history=[], + ) + assert result == "after tool" + assert call.call_count == 2 # first call returns tool_calls, second returns text + assert dispatch.call_count == 1 + +def test_run_with_tool_loop_respects_max_rounds(caps: VendorCapabilities) -> None: + client = MagicMock() + infinite_tool_response = _make_normalized_response( + "loop", tool_calls=[{"id": "c1", "function": {"name": "noop", "arguments": "{}"}}] + ) + with patch("src.tool_loop.send_openai_compatible", return_value=infinite_tool_response), \ + patch("src.tool_loop._execute_tool_calls_concurrently", return_value=[("noop", "c1", "result", "")]): + result = run_with_tool_loop( + client, OpenAICompatibleRequest(messages=[{"role": "user", "content": "x"}], model="m"), + capabilities=caps, + pre_tool_callback=None, qa_callback=None, patch_callback=None, + base_dir=".", vendor_name="test", history_lock=None, history=[], + ) + # MAX_TOOL_ROUNDS is 10; +2 = 12 max iterations + assert result == "loop" + # The function should bail out after max rounds (return last text) + +def test_run_with_tool_loop_appends_to_history(caps: VendorCapabilities) -> None: + client = MagicMock() + history: list = [] + history_lock = MagicMock() + history_lock.__enter__ = MagicMock(return_value=history_lock) + history_lock.__exit__ = MagicMock(return_value=False) + with patch("src.tool_loop.send_openai_compatible", return_value=_make_normalized_response("hi")): + run_with_tool_loop( + client, OpenAICompatibleRequest(messages=[{"role": "user", "content": "x"}], model="m"), + capabilities=caps, + pre_tool_callback=None, qa_callback=None, patch_callback=None, + base_dir=".", vendor_name="test", history_lock=history_lock, history=history, + ) + # The history should have an 
assistant message appended + assert any(msg.get("role") == "assistant" and msg.get("content") == "hi" for msg in history) + +def test_run_with_tool_loop_does_not_crash_on_tool_error(caps: VendorCapabilities) -> None: + client = MagicMock() + tool_response = _make_normalized_response( + "err", tool_calls=[{"id": "c1", "function": {"name": "fail", "arguments": "{}"}}] + ) + final_response = _make_normalized_response("recovered") + with patch("src.tool_loop.send_openai_compatible", side_effect=[tool_response, final_response]), \ + patch("src.tool_loop._execute_tool_calls_concurrently", return_value=[("fail", "c1", "", "ToolExecutionError")]): + result = run_with_tool_loop( + client, OpenAICompatibleRequest(messages=[{"role": "user", "content": "x"}], model="m"), + capabilities=caps, + pre_tool_callback=None, qa_callback=None, patch_callback=None, + base_dir=".", vendor_name="test", history_lock=None, history=[], + ) + assert result == "recovered" # The function continues even if a tool errors +``` + +- [ ] **Step 2: Run, confirm 5 tests fail with ModuleNotFoundError** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_tool_loop.py -v +``` + +Expected: 5 failed, all with `ModuleNotFoundError: No module named 'src.tool_loop'`. 
+ +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_tool_loop.py +git commit -m "test(tool_loop): add red tests for run_with_tool_loop shared helper" +``` + +--- + +## Task 1.2: Implement `src/tool_loop.py` + +**Files:** Create `src/tool_loop.py` + +- [ ] **Step 1: Create the file** + +```python +from __future__ import annotations +import asyncio +import threading +from typing import Any, Callable, Optional + +from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible +from src.vendor_capabilities import VendorCapabilities +from src.ai_client import ( + MAX_TOOL_ROUNDS, + _execute_tool_calls_concurrently, +) + +def run_with_tool_loop( + client: Any, + request: OpenAICompatibleRequest, + *, + capabilities: VendorCapabilities, + pre_tool_callback: Optional[Callable] = None, + qa_callback: Optional[Callable] = None, + stream_callback: Optional[Callable[[str], None]] = None, + patch_callback: Optional[Callable] = None, + base_dir: str, + vendor_name: str, + history_lock: Optional[threading.Lock] = None, + history: Optional[list] = None, + trim_func: Optional[Callable] = None, +) -> str: + response_text: str = "" + for _round_idx in range(MAX_TOOL_ROUNDS + 2): + response = send_openai_compatible(client, request, capabilities=capabilities) + response_text = response.text or "" + if history_lock is not None and history is not None: + with history_lock: + msg = {"role": "assistant", "content": response.text or None} + if response.tool_calls: + msg["tool_calls"] = response.tool_calls + history.append(msg) + if not response.tool_calls: + break + try: + loop = asyncio.get_running_loop() + results = asyncio.run_coroutine_threadsafe( + _execute_tool_calls_concurrently( + response.tool_calls, base_dir, pre_tool_callback, + qa_callback, _round_idx, vendor_name, patch_callback, + ), + loop, + ).result() + except RuntimeError: + results = asyncio.run(_execute_tool_calls_concurrently( + response.tool_calls, base_dir, pre_tool_callback, + qa_callback, 
_round_idx, vendor_name, patch_callback, + )) + if history_lock is not None and history is not None: + with history_lock: + for _i, (name, call_id, out, _err) in enumerate(results): + history.append({ + "role": "tool", + "tool_call_id": call_id, + "content": str(out) if out else "", + }) + if trim_func is not None: + trim_func(history) + return response_text +``` + +- [ ] **Step 2: Run the 5 Red tests; confirm they now pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_tool_loop.py -v +``` + +Expected: 5 passed. + +- [ ] **Step 3: Commit (green)** + +```bash +git add src/tool_loop.py +git commit -m "feat(tool_loop): add run_with_tool_loop shared helper for all 8 vendors" +``` + +--- + +## Task 1.3: Apply `run_with_tool_loop` to `_send_minimax` + +**Files:** Modify `src/ai_client.py` (lines 2253-2332 per the parent track's refactor) + +- [ ] **Step 1: Read the current `_send_minimax` (75 lines after parent Phase 4 refactor)** + +Use `manual-slop_py_get_definition` to confirm the function body. 
+ +- [ ] **Step 2: Replace the inline tool loop with `run_with_tool_loop` call** + +The replacement pattern: +```python +def _send_minimax(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, pre_tool_callback=None, qa_callback=None, stream_callback=None, patch_callback=None) -> str: + _ensure_minimax_client() + from src.openai_compatible import OpenAICompatibleRequest + from src.vendor_capabilities import get_capabilities + from src.tool_loop import run_with_tool_loop + tools: list[dict[str, Any]] | None = _get_deepseek_tools() or None + with _minimax_history_lock: + _repair_minimax_history(_minimax_history) + if discussion_history and not _minimax_history: + _minimax_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"}) + else: + _minimax_history.append({"role": "user", "content": user_message}) + messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n\n{md_content}\n"}] + messages.extend(_minimax_history) + request = OpenAICompatibleRequest( + messages=messages, model=_model, temperature=_temperature, + top_p=_top_p, max_tokens=min(_max_tokens, 8192), stream=stream, + stream_callback=stream_callback, tools=tools, tool_choice="auto" if tools else "auto", + ) + caps = get_capabilities("minimax", _model) + return run_with_tool_loop( + _minimax_client, request, capabilities=caps, + pre_tool_callback=pre_tool_callback, qa_callback=qa_callback, + patch_callback=patch_callback, base_dir=base_dir, + vendor_name="minimax", history_lock=_minimax_history_lock, + history=_minimax_history, trim_func=lambda h: _trim_minimax_history(messages, h), + ) +``` + +Use `manual-slop_py_update_definition` to replace. + +- [ ] **Step 3: Run the 6 existing `test_minimax_provider.py` tests; confirm all pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_minimax_provider.py -v +``` + +Expected: 6 passed (no regression). 
+ +- [ ] **Step 4: Commit** + +```bash +git add src/ai_client.py +git commit -m "refactor(minimax): use run_with_tool_loop shared helper (75 -> 50 lines)" +``` + +--- + +## Task 1.4: Apply `run_with_tool_loop` to `_send_qwen`, `_send_grok`, `_send_llama` + +**Files:** Modify `src/ai_client.py` (the 3 single-shot vendor entry points shipped in parent Phases 2 and 3) + +- [ ] **Step 1: Read each entry point (locations from parent track):** +- `_send_qwen` — at `src/ai_client.py` lines around 2300-2400 (per parent Phase 2 commit `b75f60c3`) +- `_send_grok` — at `src/ai_client.py` lines around 2184-2240 (per parent Phase 3 commit `29a96cc9`) +- `_send_llama` — at `src/ai_client.py` lines around 2400+ (per parent Phase 3 commit `29a96cc9`) + +Use `manual-slop_py_get_definition` for each. + +- [ ] **Step 2: For each entry point, replace the single-shot call with `run_with_tool_loop`** + +Pattern (using `_send_grok` as example; `_send_qwen` and `_send_llama` follow the same pattern with their own state): +```python +def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, pre_tool_callback=None, qa_callback=None, stream_callback=None, patch_callback=None) -> str: + client = _ensure_grok_client() + from src.openai_compatible import OpenAICompatibleRequest + from src.vendor_capabilities import get_capabilities + from src.tool_loop import run_with_tool_loop + tools: list[dict[str, Any]] | None = _get_deepseek_tools() or None + with _grok_history_lock: + if discussion_history and not _grok_history: + _grok_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"}) + else: + _grok_history.append({"role": "user", "content": user_message}) + messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n\n{md_content}\n"}] + messages.extend(_grok_history) + request = OpenAICompatibleRequest( + messages=messages, model=_model, temperature=_temperature, + 
top_p=_top_p, max_tokens=_max_tokens, stream=stream, + stream_callback=stream_callback, tools=tools, tool_choice="auto" if tools else "auto", + ) + caps = get_capabilities("grok", _model) + return run_with_tool_loop( + client, request, capabilities=caps, + pre_tool_callback=pre_tool_callback, qa_callback=qa_callback, + patch_callback=patch_callback, base_dir=base_dir, + vendor_name="grok", history_lock=_grok_history_lock, + history=_grok_history, + ) +``` + +- [ ] **Step 3: Verify the existing 5 Qwen + 2 Grok + 6 Llama tests still pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_qwen_provider.py tests/test_grok_provider.py tests/test_llama_provider.py -v +``` + +Expected: 13 passed (5 + 2 + 6). + +- [ ] **Step 4: Commit** + +```bash +git add src/ai_client.py +git commit -m "feat(tool_loop): apply run_with_tool_loop to Qwen, Grok, Llama (3 vendors)" +``` + +--- + +## Task 1.5: Apply `run_with_tool_loop` to `_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek` + +**Files:** Modify `src/ai_client.py` (the 4 pre-existing inline-loop vendors) + +- [ ] **Step 1: Read each entry point:** +- `_send_anthropic` at `src/ai_client.py:1210` (per the parent's `py_get_code_outline` output) +- `_send_gemini` at `src/ai_client.py:1456` +- `_send_gemini_cli` at `src/ai_client.py:1692` +- `_send_deepseek` at `src/ai_client.py:1840` + +- [ ] **Step 2: Replace each inline tool loop with `run_with_tool_loop`** + +Same pattern as Tasks 1.3-1.4. The function body shrinks from ~250 lines to ~50-70 lines per vendor. + +- [ ] **Step 3: Verify the existing 12+ Anthropic/Gemini/DeepSeek tests still pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_anthropic_provider.py tests/test_gemini_provider.py tests/test_gemini_cli_adapter.py tests/test_deepseek_provider.py -v +``` + +Expected: 12+ passed (no regression). 
+ +- [ ] **Step 4: Commit** + +```bash +git add src/ai_client.py +git commit -m "refactor(tool_loop): apply run_with_tool_loop to 4 pre-existing vendors (Anthropic, Gemini, Gemini CLI, DeepSeek)" +``` + +--- + +## Task 1.6: Add `scripts/audit_no_inline_tool_loops.py` + +**Files:** Create `scripts/audit_no_inline_tool_loops.py` + +- [ ] **Step 1: Create the audit script** + +```python +""" +Audit: fail if any _send_ in src/ai_client.py contains an inline +tool-call loop (i.e., a for loop with MAX_TOOL_ROUNDS in it). + +The project invariant (set by the follow-up track) is: all tool loops +go through src.tool_loop.run_with_tool_loop. Inline loops are forbidden. + +Usage: uv run python scripts/audit_no_inline_tool_loops.py +Exit code: 0 = pass; 1 = violations found. +""" +import re +import sys +from pathlib import Path + +TARGET = Path("src/ai_client.py") +PATTERN = re.compile(r"def _send_(\w+).*?for _round_idx in range\(MAX_TOOL_ROUNDS", re.DOTALL) + +def main() -> int: + text = TARGET.read_text(encoding="utf-8") + violations = [] + for match in re.finditer(r"def _send_(\w+)\(", text): + vendor = match.group(1) + func_start = match.start() + next_def = re.search(r"\ndef _send_\w+\(", text[func_start + 1:]) + func_end = func_start + 1 + (next_def.start() if next_def else len(text) - func_start - 1) + func_body = text[func_start:func_end] + if "for _round_idx in range(MAX_TOOL_ROUNDS" in func_body: + if "run_with_tool_loop" not in func_body: + violations.append(vendor) + if violations: + print(f"FAIL: {len(violations)} vendor(s) have inline tool loops: {violations}") + print("Use src.tool_loop.run_with_tool_loop instead.") + return 1 + print("OK: all _send_ functions use run_with_tool_loop") + return 0 + +if __name__ == "__main__": + sys.exit(main()) +``` + +- [ ] **Step 2: Run; confirm OK** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_no_inline_tool_loops.py +``` + +Expected: `OK: all _send_ functions use run_with_tool_loop` + +- [ ] **Step 3: Add 
to README and verify it runs in CI** + +(Per the existing audit scripts in the repo: `check_test_toml_paths.py`, `audit_main_thread_imports.py`, etc.) + +- [ ] **Step 4: Commit** + +```bash +git add scripts/audit_no_inline_tool_loops.py +git commit -m "ci(audit): add audit_no_inline_tool_loops.py to enforce run_with_tool_loop" +``` + +--- + +## Task 1.7: Phase 1 verification + checkpoint + +- [ ] **Step 1: Run the full regression batch (38+ tests)** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_tool_loop.py tests/test_qwen_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_minimax_provider.py tests/test_anthropic_provider.py tests/test_gemini_provider.py tests/test_gemini_cli_adapter.py tests/test_deepseek_provider.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py -q +``` + +Expected: 38+ passed, no regressions. + +- [ ] **Step 2: Run all 5 audit scripts** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_main_thread_imports.py && uv run python scripts/audit_weak_types.py && uv run python scripts/check_test_toml_paths.py && uv run python scripts/audit_no_models_config_io.py && uv run python scripts/audit_no_inline_tool_loops.py +``` + +Expected: All pass. + +- [ ] **Step 3: Create Phase 1 checkpoint commit (empty commit)** + +```bash +git commit --allow-empty -m "conductor(checkpoint): Phase 1 - run_with_tool_loop shipped for 8 vendors" +``` + +- [ ] **Step 4: Attach git note with verification report** + +```bash +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 1 checkpoint: tool loop lift + +run_with_tool_loop helper in src/tool_loop.py wraps send_openai_compatible +with the tool-call loop. 
Applied to all 8 vendors: +- _send_minimax (was inline, now uses helper) +- _send_qwen (was single-shot, now has loop) +- _send_grok (was single-shot, now has loop) +- _send_llama (was single-shot, now has loop) +- _send_anthropic (was inline, now uses helper) +- _send_gemini (was inline, now uses helper) +- _send_gemini_cli (was inline, now uses helper) +- _send_deepseek (was inline, now uses helper) + +scripts/audit_no_inline_tool_loops.py enforces the pattern. + +Verification: +- 38+ tests pass in batch +- 5 audit scripts pass +- src/ai_client.py line count reduction: TBD (estimate -500 to -1000 lines)" "$SHA" +``` + +- [ ] **Step 5: Update state.toml** + +Edit `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml`: mark t1_1 through t1_7 completed with commit SHAs; mark phase_1 = completed with checkpoint_sha. + +```bash +git add conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +git commit -m "conductor(plan): mark Phase 1 complete" +``` + +--- + +# Phase 2: PROVIDERS Move + +> Goal: `PROVIDERS` no longer lives in `src/models.py`. It moves to `src/ai_client.py` (or new `src/ai_client_providers.py`). An audit script enforces the move. The 4 import sites are updated. + +--- + +## Task 2.1: Decide location (open question) + +- [ ] **Step 1: Open question resolution** + +Two options: +- (A) Move to `src/ai_client.py` (existing file, less file proliferation) +- (B) New `src/ai_client_providers.py` (clearer separation, easier to find) + +**Recommendation: (A) `src/ai_client.py`.** The PROVIDERS list is small (8 entries); creating a new file for a single constant is over-engineering. The vendor list is logically part of the AI client. + +If the agent disagrees, they can use (B). Document the choice in the commit message. 
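Whichever location Task 2.1 settles on, the re-export shim listed in the File Structure table should leave the old and new names bound to the same list object. The sketch below simulates both modules with `types.ModuleType` and checks identity rather than equality. The module names `demo_ai_client` and `demo_models` are hypothetical stand-ins for illustration, not the project's files.

```python
import sys
import types

# Hypothetical stand-in for the module that declares the constant.
ai_client = types.ModuleType("demo_ai_client")
ai_client.PROVIDERS = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
sys.modules["demo_ai_client"] = ai_client

# Hypothetical stand-in for the old location: it only re-exports the name.
models = types.ModuleType("demo_models")
exec("from demo_ai_client import PROVIDERS", models.__dict__)
sys.modules["demo_models"] = models


def shim_is_alias() -> bool:
 """True when both modules expose the same list object, not a copy."""
 return models.PROVIDERS is ai_client.PROVIDERS
```

One thing worth checking in the real code base before relying on the shim: if `src/ai_client.py` imports `src/models.py` at module level, a module-level `from src.ai_client import PROVIDERS` in `src/models.py` would create an import cycle. That is an assumption to verify, not something this plan states.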
+ +--- + +## Task 2.2: Move PROVIDERS + update import sites + +**Files:** Modify `src/ai_client.py`, `src/models.py`, `src/app_controller.py`, `src/gui_2.py` + +- [ ] **Step 1: Add `PROVIDERS` to `src/ai_client.py`** + +Find a good location (near the other module-level constants like `MAX_TOOL_ROUNDS` at line 186). Add: +```python +PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"] +``` + +- [ ] **Step 2: Update `src/models.py` to remove `PROVIDERS` and add a re-export shim for backward compat** + +```python +# Old (line 56): +# PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"] + +# New (re-export shim for backward compat; remove in a future track): +from src.ai_client import PROVIDERS # noqa: F401 (re-export shim for backward compat) +``` + +- [ ] **Step 3: Update the 4 import sites to import from `src/ai_client` directly (not via `models`):** +- `src/app_controller.py:3093`: add a top-level `from src.ai_client import PROVIDERS` and change `models.PROVIDERS` to `PROVIDERS` +- `src/gui_2.py:2293, 2849, 5377`: same change + +For each, use `manual-slop_edit_file` to: +1. Add `from src.ai_client import PROVIDERS` to the imports section (if not already imported) +2. Change `models.PROVIDERS` to `PROVIDERS` + +- [ ] **Step 4: Verify all 38+ tests pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_minimax_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_qwen_provider.py tests/test_ai_client_no_top_level_sdk_imports.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py tests/test_cost_tracker.py -q +``` + +Expected: 38+ passed. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/models.py src/ai_client.py src/app_controller.py src/gui_2.py +git commit -m "refactor(providers): move PROVIDERS to src/ai_client.py; re-export shim for compat" +``` + +--- + +## Task 2.3: Add `scripts/audit_providers_source_of_truth.py` + +**Files:** Create `scripts/audit_providers_source_of_truth.py` + +- [ ] **Step 1: Create the audit script** + +```python +""" +Audit: fail if PROVIDERS is declared in src/models.py (should live in src/ai_client.py). + +The project invariant: PROVIDERS is the AI client vendor list, not an MMA +data model. It moved to src/ai_client.py in the +qwen_llama_grok_followup_20260611 Phase 2. The shim re-export in models.py +is allowed (for backward compat) but the literal declaration is not. + +Usage: uv run python scripts/audit_providers_source_of_truth.py +Exit code: 0 = pass; 1 = violation. +""" +import re +import sys +from pathlib import Path + +TARGET = Path("src/models.py") +PATTERN = re.compile(r"^\s*PROVIDERS\s*:\s*List\[str\]\s*=", re.MULTILINE) + +def main() -> int: + text = TARGET.read_text(encoding="utf-8") + if PATTERN.search(text): + print("FAIL: PROVIDERS is declared in src/models.py. 
It should be in src/ai_client.py.") + return 1 + print("OK: PROVIDERS is not declared in src/models.py (re-export shim is OK)") + return 0 + +if __name__ == "__main__": + sys.exit(main()) +``` + +- [ ] **Step 2: Run; confirm OK** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_providers_source_of_truth.py +``` + +Expected: `OK: ...` + +- [ ] **Step 3: Commit** + +```bash +git add scripts/audit_providers_source_of_truth.py +git commit -m "ci(audit): add audit_providers_source_of_truth.py to enforce PROVIDERS location" +``` + +--- + +## Task 2.4: Phase 2 verification + checkpoint + +- [ ] **Step 1: Run all 6 audit scripts; confirm all pass** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_main_thread_imports.py && uv run python scripts/audit_weak_types.py && uv run python scripts/check_test_toml_paths.py && uv run python scripts/audit_no_models_config_io.py && uv run python scripts/audit_no_inline_tool_loops.py && uv run python scripts/audit_providers_source_of_truth.py +``` + +Expected: All pass. + +- [ ] **Step 2: Run regression batch (38+ tests)** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_minimax_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_qwen_provider.py tests/test_ai_client_no_top_level_sdk_imports.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py tests/test_cost_tracker.py -q +``` + +Expected: 38+ passed. + +- [ ] **Step 3: Create Phase 2 checkpoint commit (empty)** + +```bash +git commit --allow-empty -m "conductor(checkpoint): Phase 2 - PROVIDERS moved to src/ai_client.py" +``` + +- [ ] **Step 4: Attach git note** + +```bash +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 2 checkpoint: PROVIDERS moved + +PROVIDERS moved from src/models.py to src/ai_client.py. The 4 import +sites (src/app_controller.py:3093; src/gui_2.py:2293, 2849, 5377) +now import from src/ai_client directly. 
A re-export shim in +src/models.py maintains backward compat (to be removed in a future +track). + +scripts/audit_providers_source_of_truth.py enforces the move. + +Verification: +- 38+ tests pass +- 6 audit scripts pass +- src/models.py no longer declares PROVIDERS" "$SHA" +``` + +- [ ] **Step 5: Update state.toml; commit** + +```bash +git add conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +git commit -m "conductor(plan): mark Phase 2 complete" +``` + +--- + +# Phase 3: UX Adaptations 2-9 + +> Goal: Apply the 8 remaining UX adaptations from parent spec §6. The pattern is established (parent Phase 5 shipped adaptation 1: Screenshot button iff vision). The helper `_get_active_capabilities()` is already in `src/gui_2.py:733`. + +--- + +## Task 3.1: Apply adaptation 2 (Tools toggle iff tool_calling) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the tools toggle render site** + +Use `grep` for "tools" / "tool_calling" in `src/gui_2.py`. Likely in the AI Settings panel or the discussion hub. + +Expected location: around `src/gui_2.py` lines 3000-4500 (in `_render_ai_settings_hub` or similar). + +- [ ] **Step 2: Apply the pattern** + +```python +caps = app._get_active_capabilities() +imgui.begin_disabled(not caps.tool_calling) +# ... existing tools toggle UI ... +imgui.end_disabled() +if not caps.tool_calling: + imgui.same_line() + imgui.text_disabled(f"(tools not supported by {app.current_model})") +``` + +- [ ] **Step 3: Verify import works (no syntax errors)** + +``` +cd C:\projects\manual_slop && uv run python -c "import src.gui_2" +``` + +Expected: OK. 
+ +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 2 of 9 - tools toggle iff tool_calling" +``` + +--- + +## Task 3.2: Apply adaptation 3 (Cache panel iff caching) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the cache panel render site** (likely in MMA Dashboard or Operations Hub) + +- [ ] **Step 2: Apply the pattern** with `caps.caching` + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 3 of 9 - cache panel iff caching" +``` + +--- + +## Task 3.3: Apply adaptation 4 (Stream progress iff streaming) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the stream progress render site** + +- [ ] **Step 2: Apply the pattern** with `caps.streaming` + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 4 of 9 - stream progress iff streaming" +``` + +--- + +## Task 3.4: Apply adaptation 5 (Fetch Models button iff model_discovery) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the Fetch Models button** (likely in AI Settings panel, calls `app.controller.do_fetch`) + +- [ ] **Step 2: Apply the pattern** with `caps.model_discovery` + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 5 of 9 - fetch models iff model_discovery" +``` + +--- + +## Task 3.5: Apply adaptation 6 (Token budget max = context_window) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the token budget input** (likely in AI Settings panel) + +- [ ] **Step 2: Apply the pattern** with `caps.context_window` (sets the `max_tokens` slider/input max to `caps.context_window`) + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 6 of 9 - token budget max = context_window" +``` 
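The gating decisions in Tasks 3.1-3.5 can be kept separate from the ImGui calls, which makes them testable without a GUI context (the `import src.gui_2` check only catches syntax and import errors). A minimal sketch, assuming hypothetical helpers `gate` and `clamp_token_budget` and a stand-in `Caps` dataclass; none of these names come from the plan, and the real capabilities object is `VendorCapabilities`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
 # Stand-in for VendorCapabilities; only the fields used in this sketch.
 tool_calling: bool = False
 caching: bool = False
 streaming: bool = False
 context_window: int = 8192


def gate(caps: Caps, capability: str, label: str, model: str) -> tuple[bool, str]:
 """Return (disabled, hint) for one capability-gated widget."""
 supported = bool(getattr(caps, capability))
 hint = "" if supported else f"({label} not supported by {model})"
 return (not supported, hint)


def clamp_token_budget(requested: int, context_window: int) -> int:
 """Keep the token budget within 1..context_window."""
 return max(1, min(requested, context_window))
```

In the render function, `disabled` would be passed to `imgui.begin_disabled` and `hint` rendered with `imgui.text_disabled` when it is non-empty, mirroring the pattern from Task 3.1.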
+ +--- + +## Task 3.6: Apply adaptation 7 (Cost panel: estimate) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the cost panel render site** (likely in MMA Dashboard or Operations Hub) + +- [ ] **Step 2: Apply the pattern** with `caps.cost_tracking`. If True, show estimated cost via `cost_tracker.estimate_cost(model, input, output)`. If False, fall through to adaptations 8/9. + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 7 of 9 - cost panel: estimate when cost_tracking" +``` + +--- + +## Task 3.7: Apply adaptation 8 (Cost panel: "Free (local)" for localhost) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the cost panel** (same site as 3.6) + +- [ ] **Step 2: Apply the pattern** with `_llama_base_url` containing "localhost" or "127.0.0.1" — show "Free (local)" instead of estimate + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 8 of 9 - cost panel: 'Free (local)' for localhost" +``` + +--- + +## Task 3.8: Apply adaptation 9 (Cost panel: "—" for other cost_tracking=false) + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the cost panel** (same site as 3.6) + +- [ ] **Step 2: Apply the pattern** with `not caps.cost_tracking` and not localhost — show "—" + +- [ ] **Step 3: Verify import works** + +- [ ] **Step 4: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): adaptation 9 of 9 - cost panel: '-' for cost_tracking=false (not local)" +``` + +--- + +## Task 3.9: Phase 3 verification + checkpoint + +- [ ] **Step 1: Run regression batch (38+ tests)** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_minimax_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_qwen_provider.py tests/test_ai_client_no_top_level_sdk_imports.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py 
tests/test_cost_tracker.py -q +``` + +Expected: 38+ passed. + +- [ ] **Step 2: Run all 6 audit scripts; confirm all pass** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_main_thread_imports.py && uv run python scripts/audit_weak_types.py && uv run python scripts/check_test_toml_paths.py && uv run python scripts/audit_no_models_config_io.py && uv run python scripts/audit_no_inline_tool_loops.py && uv run python scripts/audit_providers_source_of_truth.py +``` + +Expected: All pass. + +- [ ] **Step 3: Create Phase 3 checkpoint commit (empty)** + +```bash +git commit --allow-empty -m "conductor(checkpoint): Phase 3 - all 9 UX adaptations shipped" +``` + +- [ ] **Step 4: Attach git note** + +```bash +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 3 checkpoint: all 9 UX adaptations applied to src/gui_2.py + +1. Screenshot button iff vision (parent Phase 5) +2. Tools toggle iff tool_calling +3. Cache panel iff caching +4. Stream progress iff streaming +5. Fetch Models iff model_discovery +6. Token budget max = context_window +7. Cost panel: estimate when cost_tracking +8. Cost panel: 'Free (local)' for localhost +9. Cost panel: '-' for other cost_tracking=false + +Pattern: caps = app._get_active_capabilities(); imgui.begin_disabled(not caps.<field>); ...UI...; imgui.end_disabled(); if not caps.<field>: imgui.same_line(); imgui.text_disabled('(reason)') + +Verification: +- 38+ tests pass +- 6 audit scripts pass +- import src.gui_2 OK (no syntax errors)" "$SHA" +``` + +- [ ] **Step 5: Update state.toml; commit** + +```bash +git add conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +git commit -m "conductor(plan): mark Phase 3 complete" +``` + +--- + +# Phase 4: Local-First + Matrix v2 + +> Goal: Local models become first-class. Add `local: bool` and 11 other v2 fields. Native Ollama replaces OpenAI-compatible for the Ollama backend. Meta Llama API is a new 4th Llama backend. GUI: "Local Model" badge. 
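Both the "Free (local)" cost label and the native Ollama routing hinge on deciding whether `_llama_base_url` points at this machine. The plan does this with a substring match on "localhost" / "127.0.0.1", which would also match a host such as `notlocalhost.example.com`. A possible stricter variant, offered only as a sketch: `is_local_base_url`, `native_base_url`, and `cost_label` are hypothetical helpers that do not exist in the codebase, and treating `::1` as local is an added assumption.

```python
from urllib.parse import urlparse

# Hostnames treated as "this machine". The plan names only localhost and
# 127.0.0.1; including "::1" is an assumption made for this sketch.
LOCAL_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


def is_local_base_url(base_url: str) -> bool:
    """Return True when the URL's host component is a loopback name or address."""
    return urlparse(base_url).hostname in LOCAL_HOSTS


def native_base_url(base_url: str) -> str:
    """Drop a trailing /v1 so a native /api/chat path can be appended."""
    return base_url.rstrip("/").removesuffix("/v1")


def cost_label(cost_tracking: bool, base_url: str, estimate_usd: float) -> str:
    """Pick the cost panel text following adaptations 7, 8 and 9 in order."""
    if cost_tracking:
        return f"${estimate_usd:.4f}"
    if is_local_base_url(base_url):
        return "Free (local)"
    return "-"
```

Parsing the host instead of searching the whole string is a design choice, not something the spec requires; the substring check is simpler and may be sufficient if base URLs are always entered by the user.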
+ +--- + +## Task 4.1: Add 12 v2 fields to `VendorCapabilities` + +**Files:** Modify `src/vendor_capabilities.py` + +- [ ] **Step 1: Add the 12 new fields to the `VendorCapabilities` dataclass** + +```python +@dataclass(frozen=True) +class VendorCapabilities: + vendor: str + model: str + vision: bool = False + tool_calling: bool = True + caching: bool = False + streaming: bool = True + model_discovery: bool = True + context_window: int = 8192 + cost_tracking: bool = True + cost_input_per_mtok: float = 0.0 + cost_output_per_mtok: float = 0.0 + notes: str = "" + # v2 fields (added 2026-06-11) + local: bool = False + reasoning: bool = False + structured_output: bool = False + code_execution: bool = False + web_search: bool = False + x_search: bool = False + file_search: bool = False + mcp_support: bool = False + audio: bool = False + video: bool = False + grounding: bool = False + computer_use: bool = False +``` + +- [ ] **Step 2: Update the existing 22 registry entries to populate the v2 fields** + +For each entry in `src/vendor_capabilities.py`: +- `qwen-vl-plus`, `qwen-vl-max` — `vision=True` (already) +- `grok-2-vision` — `vision=True` (already) +- `llama-3.2-11b-vision-preview`, `llama-3.2-90b-vision-preview` — `vision=True` (already) +- `llama/*` (wildcard) — `local=True` (NEW) +- `qwen/*` (wildcard) — add appropriate v2 fields +- `grok/*` (wildcard) — `x_search=True` (xAI X/Twitter), `web_search=True` +- `anthropic/*`, `gemini/*`, `deepseek/*` — populated in Phase 5 + +- [ ] **Step 3: Verify `src/vendor_capabilities.py` imports OK** + +``` +cd C:\projects\manual_slop && uv run python -c "from src.vendor_capabilities import VendorCapabilities; c = VendorCapabilities(vendor='test', model='m', local=True); print(c)" +``` + +- [ ] **Step 4: Run regression; confirm no regression** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_vendor_capabilities.py -v +``` + +Expected: 3 passed. 
+ +- [ ] **Step 5: Commit** + +```bash +git add src/vendor_capabilities.py +git commit -m "feat(capability_matrix): add 12 v2 fields (local, reasoning, structured_output, etc.)" +``` + +--- + +## Task 4.2: Write red tests for native Ollama adapter + +**Files:** Create `tests/test_llama_ollama_native.py` + +- [ ] **Step 1: Create the file with 3 tests** + +```python +from unittest.mock import MagicMock, patch +import pytest +from src import ai_client + +@pytest.fixture(autouse=True) +def _reset_ollama_state(): + if hasattr(ai_client, '_ollama_native_client'): + ai_client._ollama_native_client = None + yield + +def test_send_ollama_native_calls_ollama_api_chat(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client.set_provider("llama", "llama3.2:3b") + ai_client._llama_base_url = "http://localhost:11434/v1" + mock_response = MagicMock() + mock_response.json.return_value = { + "message": {"role": "assistant", "content": "hi from ollama"}, + "done": True, + "prompt_eval_count": 10, + "eval_count": 5, + } + with patch("requests.post", return_value=mock_response) as post: + result = ai_client._send_llama_native("system", "user", ".", None, "", False, None, None, None) + assert "hi" in result.lower() + assert post.called + +def test_send_ollama_native_with_think_param(monkeypatch: pytest.MonkeyPatch) -> None: + ai_client.set_provider("llama", "qwen3:8b") + ai_client._llama_base_url = "http://localhost:11434/v1" + mock_response = MagicMock() + mock_response.json.return_value = { + "message": {"role": "assistant", "content": "hi", "thinking": "I thought about it"}, + "done": True, + } + with patch("requests.post", return_value=mock_response) as post: + ai_client._send_llama_native("system", "user", ".", None, "", False, None, None, None) + # The think param should be in the payload + call_kwargs = post.call_args.kwargs + assert call_kwargs.get("json", {}).get("think") == "low" + +def test_send_ollama_native_with_image_in_messages(monkeypatch: pytest.MonkeyPatch) -> None: + 
ai_client.set_provider("llama", "llama3.2-vision:11b") + ai_client._llama_base_url = "http://localhost:11434/v1" + mock_response = MagicMock() + mock_response.json.return_value = { + "message": {"role": "assistant", "content": "I see a cat"}, + "done": True, + } + with patch("requests.post", return_value=mock_response) as post: + file_items = [{"path": "/tmp/cat.png", "is_image": True, "base64_data": "iVBOR..."}] + ai_client._send_llama_native("system", "describe this image", ".", file_items, "", False, None, None, None) + call_kwargs = post.call_args.kwargs + # The image base64 should be in the messages + assert "images" in str(call_kwargs.get("json", {})) +``` + +- [ ] **Step 2: Run, confirm 3 tests fail with AttributeError** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_llama_ollama_native.py -v +``` + +Expected: 3 failed (AttributeError: `_send_llama_native` doesn't exist). + +- [ ] **Step 3: Commit (red)** + +```bash +git add tests/test_llama_ollama_native.py +git commit -m "test(llama_ollama_native): add red tests for native /api/chat adapter" +``` + +--- + +## Task 4.3: Implement `src/llama_ollama_native.py` + wire into `_send_llama` + +**Files:** Create `src/llama_ollama_native.py`; modify `src/ai_client.py` (route Ollama to native) + +- [ ] **Step 1: Create the file** + +```python +from __future__ import annotations +import requests +from typing import Any + +OLLAMA_DEFAULT_BASE_URL = "http://localhost:11434" + +def ollama_chat( + model: str, + messages: list[dict[str, Any]], + *, + think: str = "low", + images: list[str] | None = None, + tools: list[dict[str, Any]] | None = None, + base_url: str = OLLAMA_DEFAULT_BASE_URL, +) -> dict[str, Any]: + payload: dict[str, Any] = { + "model": model, + "messages": messages, + "stream": False, + } + if think: + payload["think"] = think + if images and messages: + payload["messages"] = [*messages[:-1], {**messages[-1], "images": images}] + if tools: + payload["tools"] = tools + resp = requests.post(f"{base_url}/api/chat", json=payload, timeout=120) + return 
resp.json() +``` + +- [ ] **Step 2: Add `_send_llama_native()` to `src/ai_client.py`** (after `_send_llama`) + +```python +def _send_llama_native(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, pre_tool_callback=None, qa_callback=None, stream_callback=None, patch_callback=None) -> str: + """Native Ollama adapter for the Llama local backend. + Used when _llama_base_url is localhost/127.0.0.1 (Ollama default).""" + from src.llama_ollama_native import ollama_chat + base_url = _llama_base_url.replace("/v1", "") # native uses /api, not /v1 + with _llama_history_lock: + if discussion_history and not _llama_history: + _llama_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"}) + else: + _llama_history.append({"role": "user", "content": user_message}) + messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n\n{md_content}\n"}] + messages.extend(_llama_history) + images: list[str] = [] + if file_items: + for fi in file_items: + if fi.get("is_image") and fi.get("base64_data"): + images.append(fi["base64_data"]) + response = ollama_chat(_model, messages, images=images, base_url=base_url) + text = response.get("message", {}).get("content", "") + thinking = response.get("message", {}).get("thinking", "") + with _llama_history_lock: + msg = {"role": "assistant", "content": text or None} + if thinking: + msg["thinking"] = thinking + _llama_history.append(msg) + return (f"\n{thinking}\n\n" if thinking else "") + text +``` + +- [ ] **Step 3: Route Ollama (localhost) backend to native in `_send_llama`** + +In `_send_llama` (the existing entry point), check whether `_llama_base_url` points at localhost or 127.0.0.1. If so, delegate to `_send_llama_native`. Otherwise, fall through to the existing OpenAI-compatible implementation in the rest of `_send_llama`. 
+ +```python +def _send_llama(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, pre_tool_callback=None, qa_callback=None, stream_callback=None, patch_callback=None) -> str: + if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url: + return _send_llama_native(md_content, user_message, base_dir, file_items, discussion_history, stream, pre_tool_callback, qa_callback, stream_callback, patch_callback) + # ... existing OpenAI-compatible impl ... +``` + +- [ ] **Step 4: Run the 3 Red tests; confirm they pass** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_llama_ollama_native.py -v +``` + +Expected: 3 passed. + +- [ ] **Step 5: Commit** + +```bash +git add src/llama_ollama_native.py src/ai_client.py +git commit -m "feat(llama_ollama_native): add native /api/chat adapter; route Ollama backend to it" +``` + +--- + +## Task 4.4: Add "Local Model" GUI badge + +**Files:** Modify `src/gui_2.py` + +- [ ] **Step 1: Find the AI Settings panel** (likely `_render_ai_settings_hub`) + +- [ ] **Step 2: Add the badge** after the provider dropdown + +```python +caps = app._get_active_capabilities() +if caps.local: + imgui.same_line() + imgui.text_colored(theme.get_color("status_success"), " [Local]") + if imgui.is_item_hovered(): + imgui.set_tooltip(f"Local backend: {_llama_base_url if app.current_provider == 'llama' else 'unknown'}") +``` + +- [ ] **Step 3: Commit** + +```bash +git add src/gui_2.py +git commit -m "feat(gui): add 'Local Model' badge for local backends (v2 matrix)" +``` + +--- + +## Task 4.5: Update registry for v2 fields + +**Files:** Modify `src/vendor_capabilities.py` + +- [ ] **Step 1: Update the 22 existing entries with the v2 fields** + +For each vendor, populate the relevant v2 fields: +- `qwen/*` (wildcard) — add `audio: True` (Qwen-Audio), `caching: True` (Qwen-Long custom chunking) +- `qwen-vl-plus`, `qwen-vl-max` — `vision: True` (already), no change needed +- `grok/*` (wildcard) — `x_search: 
True`, `web_search: True` +- `grok-2-vision` — `vision: True` (already), no change +- `llama/*` (wildcard) — `local: True` (the local flag is the whole point of the wildcard) +- `llama-3.2-11b-vision-preview`, `llama-3.2-90b-vision-preview` — `vision: True` (already), no change +- `minimax/*` (wildcard) — `reasoning: True` (minimax-reasoner) + +- [ ] **Step 2: Commit** + +```bash +git add src/vendor_capabilities.py +git commit -m "feat(capability_matrix): populate v2 fields for all 22 existing entries" +``` + +--- + +## Task 4.6: Phase 4 verification + checkpoint + +- [ ] **Step 1: Run regression batch (38+ tests + the 3 new ollama tests)** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_tool_loop.py tests/test_llama_ollama_native.py tests/test_qwen_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_minimax_provider.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py tests/test_cost_tracker.py -q +``` + +Expected: 41+ passed (3 new + 38 existing). + +- [ ] **Step 2: Run all 6 audit scripts** + +``` +cd C:\projects\manual_slop && uv run python scripts/audit_main_thread_imports.py && uv run python scripts/audit_weak_types.py && uv run python scripts/check_test_toml_paths.py && uv run python scripts/audit_no_models_config_io.py && uv run python scripts/audit_no_inline_tool_loops.py && uv run python scripts/audit_providers_source_of_truth.py +``` + +Expected: All pass. 
+ +- [ ] **Step 3: Create Phase 4 checkpoint commit (empty)** + +```bash +git commit --allow-empty -m "conductor(checkpoint): Phase 4 - local-first + matrix v2 shipped" +``` + +- [ ] **Step 4: Attach git note + update state.toml + commit** + +```bash +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 4 checkpoint: local-first + matrix v2 + +- 12 v2 fields added to VendorCapabilities +- Native Ollama adapter (src/llama_ollama_native.py); Ollama backend + now uses /api/chat (think, images) instead of /v1/chat/completions +- 22 existing registry entries updated with the new v2 fields +- GUI: 'Local Model' badge for local backends + +Verification: +- 41+ tests pass (3 new ollama + 38 existing) +- 6 audit scripts pass" "$SHA" +git add conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +git commit -m "conductor(plan): mark Phase 4 complete" +``` + +--- + +# Phase 5: Anthropic / Gemini / DeepSeek Migration + +> Goal: Populate the matrix entries for the 3 remaining providers. Each keeps its unique per-vendor code path; the matrix entries are the source of truth for the UI. 
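Phase 5 registers one wildcard entry per provider (`model="*"`) and notes per-model variations separately. One way such entries could resolve, sketched under the assumption that an exact model entry wins over the vendor wildcard and that an unknown vendor falls back to the dataclass defaults. The `lookup` function and `_REGISTRY` dict are hypothetical; the actual resolution logic in `src/vendor_capabilities.py` is not shown in this plan and may differ. The dataclass is trimmed to a few fields for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    # Trimmed copy of the plan's dataclass, for illustration only.
    vendor: str
    model: str
    context_window: int = 8192
    caching: bool = False
    computer_use: bool = False


# Hypothetical registry keyed by (vendor, model).
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(caps: VendorCapabilities) -> None:
    """Store an entry under its (vendor, model) key."""
    _REGISTRY[(caps.vendor, caps.model)] = caps


def lookup(vendor: str, model: str) -> VendorCapabilities:
    """Resolve exact model first, then the vendor wildcard, then defaults."""
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    wildcard = _REGISTRY.get((vendor, "*"))
    if wildcard is not None:
        return wildcard
    return VendorCapabilities(vendor=vendor, model=model)


# Values taken from the Anthropic entries in Task 5.1.
register(VendorCapabilities(vendor="anthropic", model="*",
                            context_window=180000, caching=True))
register(VendorCapabilities(vendor="anthropic", model="claude-3-5-sonnet",
                            context_window=200000, caching=True, computer_use=True))
```

With this ordering, `lookup("anthropic", "claude-3-haiku")` would return the wildcard entry, while `lookup("anthropic", "claude-3-5-sonnet")` would return the per-model override.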
+ +--- + +## Task 5.1: Populate Anthropic matrix entries + +**Files:** Modify `src/vendor_capabilities.py` + +- [ ] **Step 1: Add the anthropic entries** (with v2 fields) + +```python +# Anthropic +register(VendorCapabilities( + vendor="anthropic", model="*", context_window=180000, + cost_input_per_mtok=3.00, cost_output_per_mtok=15.00, + caching=True, structured_output=True, file_search=True, + mcp_support=True, computer_use=True, + notes="pending_migration: caching, extended_thinking, computer_use" +)) + +# Per-model variations: +# claude-3-5-sonnet: 200K context, caching, computer_use +# claude-3-opus: 200K context, caching +# claude-3-haiku: 200K context, low cost +``` + +- [ ] **Step 2: Commit** + +```bash +git add src/vendor_capabilities.py +git commit -m "feat(capability_matrix): add Anthropic matrix entries (caching, computer_use, mcp_support)" +``` + +--- + +## Task 5.2: Populate Gemini matrix entries + +**Files:** Modify `src/vendor_capabilities.py` + +- [ ] **Step 1: Add the gemini entries** + +```python +register(VendorCapabilities( + vendor="gemini", model="*", context_window=900000, + cost_input_per_mtok=1.25, cost_output_per_mtok=5.00, + caching=True, vision=True, video=True, audio=True, + grounding=True, structured_output=True, + notes="pending_migration: explicit caching, grounding, native video" +)) +``` + +- [ ] **Step 2: Commit** + +```bash +git add src/vendor_capabilities.py +git commit -m "feat(capability_matrix): add Gemini matrix entries (caching, grounding, video, audio)" +``` + +--- + +## Task 5.3: Populate DeepSeek matrix entries + +**Files:** Modify `src/vendor_capabilities.py` + +- [ ] **Step 1: Add the deepseek entries** + +```python +register(VendorCapabilities( + vendor="deepseek", model="*", context_window=32768, + cost_input_per_mtok=0.14, cost_output_per_mtok=0.28, + reasoning=True, structured_output=True, + notes="pending_migration: R1 reasoning model, low cost" +)) +``` + +- [ ] **Step 2: Commit** + +```bash +git add 
src/vendor_capabilities.py +git commit -m "feat(capability_matrix): add DeepSeek matrix entries (reasoning, low cost)" +``` + +--- + +## Task 5.4: Update docs (Phase 5 docs) + +**Files:** Modify `docs/guide_ai_client.md`, `docs/guide_models.md` + +- [ ] **Step 1: Update `docs/guide_ai_client.md`** + +Add a section on `run_with_tool_loop` (the shared helper), the 12 v2 fields, the local-first architecture, and the 3 newly-migrated providers. + +- [ ] **Step 2: Update `docs/guide_models.md`** + +Add a section on the 19 v2 fields; the new PROVIDERS location; the local-first architecture; the 3 newly-migrated providers. + +- [ ] **Step 3: Commit** + +```bash +git add docs/guide_ai_client.md docs/guide_models.md +git commit -m "docs(phase-5): update ai_client+models guides with v2 fields, tool_loop, local-first, 3 migrated providers" +``` + +--- + +## Task 5.5: Phase 5 verification + checkpoint + +- [ ] **Step 1: Run full regression batch** + +``` +cd C:\projects\manual_slop && uv run pytest tests/test_tool_loop.py tests/test_llama_ollama_native.py tests/test_qwen_provider.py tests/test_grok_provider.py tests/test_llama_provider.py tests/test_minimax_provider.py tests/test_anthropic_provider.py tests/test_gemini_provider.py tests/test_gemini_cli_adapter.py tests/test_deepseek_provider.py tests/test_vendor_capabilities.py tests/test_openai_compatible.py tests/test_cost_tracker.py -q +``` + +Expected: 50+ passed. 
+ +- [ ] **Step 2: Run all 6 audit scripts; confirm all pass** + +- [ ] **Step 3: Create Phase 5 checkpoint commit (empty)** + +```bash +git commit --allow-empty -m "conductor(checkpoint): Phase 5 - Anthropic/Gemini/DeepSeek migrated to matrix" +``` + +- [ ] **Step 4: Attach git note + final report + state update + commit** + +```bash +SHA=$(git log -1 --format="%H") +git notes add -m "Phase 5 checkpoint: Anthropic/Gemini/DeepSeek migration + +- Anthropic matrix entries: caching, computer_use, mcp_support, file_search +- Gemini matrix entries: caching, grounding, video, audio, structured_output +- DeepSeek matrix entries: reasoning, structured_output +- Docs updated (guide_ai_client.md, guide_models.md) +- 8 vendors total now on the matrix (was 5 before this track) + +Verification: +- 50+ tests pass +- 6 audit scripts pass" "$SHA" +git add conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +git commit -m "conductor(plan): mark Phase 5 complete - track done" +``` + +--- + +## Final Step: TRACK COMPLETE + +- [ ] **Step 1: Move track to `conductor/tracks/archive/`** (per the original Phase 6 plan) + +```bash +git mv conductor/tracks/qwen_llama_grok_followup_20260611 conductor/tracks/archive/qwen_llama_grok_followup_20260611 +``` + +- [ ] **Step 2: Update `conductor/tracks.md`** + +Move the entry from "Active Tracks" to "Recently Completed." 
+ +- [ ] **Step 3: Final commit** + +```bash +git add conductor/tracks.md conductor/tracks/archive/ +git commit -m "conductor(archive): ship qwen_llama_grok_followup_20260611 to archive" +``` + +--- + +## See Also + +- `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md` — the design +- `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` — task tracking +- `conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json` — verification criteria +- `conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md` — setup checklist +- `docs/reports/qwen_llama_grok_followup_audit_20260611.md` — gap analysis +- Parent track: `conductor/tracks/qwen_llama_grok_integration_20260606/`