From 691dc584eb7050fe301e5c2e60940f27472fd07b Mon Sep 17 00:00:00 2001
From: Ed_ <edwardgz@gmail.com>
Date: Thu, 11 Jun 2026 09:33:18 -0400
Subject: [PATCH] docs(phase-6): update ai_client+models guides; report +
 follow-up track setup

Phase 6 t6.1 + t6.2 (no archive per user directive):
- docs/guide_ai_client.md: update Overview to mention 8 providers (was 5);
  add 'Shared OpenAI-Compatible Helper' section explaining
  src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest,
  send_openai_compatible, usage pattern); document the Qwen adapter
  and Llama multi-backend.
- docs/guide_models.md: update PROVIDERS list to 8 entries (was 5).
- conductor/tracks.md: update the Qwen track entry to reflect
  '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up';
  add detailed status note pointing to the follow-up track + audit
  report.
- docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report
  explaining why a follow-up is needed (7 categories of gaps; the
  Tech Lead's 'footnote for now' failure mode; the lessons learned).
- conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up
  track setup (spec.md, state.toml, metadata.json, TODO.md).
  5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9,
  local-first + matrix v2, Anthropic/Gemini/DeepSeek migration.

Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed)
are NOT applied per user directive: 'we can then doc this we're not
archiving yet, if we have a follow up track I need this one to stay
up because there is still alot todo'.
---
 conductor/tracks.md                           |   2 +-
 .../qwen_llama_grok_followup_20260611/TODO.md |  81 ++++++
 .../metadata.json                             |  79 ++++++
 .../qwen_llama_grok_followup_20260611/spec.md | 237 ++++++++++++++++++
 .../state.toml                                |  86 +++++++
 docs/guide_ai_client.md                       |  96 ++++++-
 docs/guide_models.md                          |   2 +-
 ...qwen_llama_grok_followup_audit_20260611.md | 165 ++++++++++++
 8 files changed, 745 insertions(+), 3 deletions(-)
 create mode 100644 conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md
 create mode 100644 conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json
 create mode 100644 conductor/tracks/qwen_llama_grok_followup_20260611/spec.md
 create mode 100644 conductor/tracks/qwen_llama_grok_followup_20260611/state.toml
 create mode 100644 docs/reports/qwen_llama_grok_followup_audit_20260611.md

diff --git a/conductor/tracks.md b/conductor/tracks.md
index 8af184d6..65135c9d 100644
--- a/conductor/tracks.md
+++ b/conductor/tracks.md
@@ -16,7 +16,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
 
 | # | Priority | Track | Status | Blocked By |
 |---|---|---|---|---|
-| 2 | A | [Qwen, Llama & Grok Vendor Integration + Capability Matrix](#track-qwen-llama-grok-vendor-integration--capability-matrix) | spec ✓, plan pending | **test_infrastructure_hardening_20260609 (merged)** |
+| 2 | A | [Qwen, Llama & Grok Vendor Integration + Capability Matrix](#track-qwen-llama-grok-vendor-integration--capability-matrix) | spec ✓, plan ✓, 50/79 tasks done; **Phase 6 in progress (docs); NOT archiving — has follow-up track** | **test_infrastructure_hardening_20260609 (merged)** |
 | 3 | A | [Data-Oriented Error Handling (Fleury Pattern)](#track-data-oriented-error-handling-fleury-pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, **test_infrastructure_hardening_20260609 (merged)**, qwen_llama_grok |
 | 4 | A | [Data Structure Strengthening (Type Aliases + NamedTuples)](#track-data-structure-strengthening-type-aliases--namedtuples) | spec ✓, plan pending | **test_infrastructure_hardening_20260609 (merged)** |
 | 5 | A | [MCP Architecture Refactor (Sub-MCP Extraction)](#track-mcp-architecture-refactor-sub-mcp-extraction) | spec ✓, plan pending | test_infrastructure_hardening_20260609 (merged), data_oriented_error_handling, data_structure_strengthening |
diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md b/conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md
new file mode 100644
index 00000000..f7077054
--- /dev/null
+++ b/conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md
@@ -0,0 +1,81 @@
+# Track: Qwen, Llama & Grok Follow-Up (Post-Phase 5)
+
+This is a TODO list for setting up the follow-up track. The Tier 2 Tech Lead will execute items in order.
+
+## Status
+
+- [x] Spec drafted: `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md`
+- [ ] state.toml initialized
+- [ ] metadata.json created
+- [ ] Phase 1 ready to start
+
+## Immediate TODOs (in order)
+
+1. **Read parent track state**
+   - [ ] Read `conductor/tracks/qwen_llama_grok_integration_20260606/state.toml` to confirm Phase 6 is complete
+   - [ ] Read `conductor/tracks/qwen_llama_grok_integration_20260606/plan.md` and find tasks tagged t6.* to confirm Phase 6 done
+
+2. **Create the follow-up track structure**
+   - [ ] Create `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` with 5 phases × ~7 tasks
+   - [ ] Create `conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json` with verification_criteria
+
+3. **Phase 1: Tool Loop Lift (first concrete work)**
+   - [ ] Read current tool-loop patterns in `_send_minimax` (231 → 75 lines after refactor) and `_send_anthropic/_send_gemini/_send_gemini_cli/_send_deepseek` (inline loops)
+   - [ ] Design `run_with_tool_loop(client, request, capabilities, *, pre_tool_callback, qa_callback, patch_callback, base_dir, vendor_name, history_lock, history, trim_func)` helper
+   - [ ] Write 5 Red tests: no-tool-calls returns immediately, tool-calls dispatch, max-rounds limit, history appending, error-in-tool-call doesn't crash
+   - [ ] Implement helper in `src/tool_loop.py`
+   - [ ] Apply to all 8 vendors
+   - [ ] Audit script `scripts/audit_no_inline_tool_loops.py` to enforce the pattern
+   - [ ] Verify all 38+ existing tests still pass
+   - [ ] Phase 1 checkpoint
+
+4. **Phase 2: PROVIDERS Move**
+   - [ ] Decide: `src/ai_client.py` vs new `src/ai_client_providers.py` (open question in spec)
+   - [ ] Move PROVIDERS constant
+   - [ ] Update 5 import sites
+   - [ ] Add `scripts/audit_providers_source_of_truth.py`
+   - [ ] Verify all 38+ tests pass
+   - [ ] Phase 2 checkpoint
+
+5. **Phase 3: UX Adaptations 2-9**
+   - [ ] Apply each adaptation one at a time, 1-2 per commit
+   - [ ] Run live_gui tests in batch after each commit
+   - [ ] Phase 3 checkpoint when all 9 adaptations done
+
+6. **Phase 4: Local-First + Matrix Expansion**
+   - [ ] Add `local: bool` to VendorCapabilities
+   - [ ] Native Ollama adapter (verify URL https://docs.ollama.com/api/chat is up)
+   - [ ] Meta Llama API adapter (verify URL https://llama.developer.meta.com/docs/overview is up — was 400 last session)
+   - [ ] GUI: "Local Model" badge
+   - [ ] Add 12 v2 fields to VendorCapabilities
+   - [ ] Update all vendor registry entries
+   - [ ] UI adaptations for the new fields
+   - [ ] Phase 4 checkpoint
+
+7. **Phase 5: Anthropic / Gemini / DeepSeek Migration**
+   - [ ] Populate Anthropic matrix entries
+   - [ ] Populate Gemini matrix entries
+   - [ ] Populate DeepSeek matrix entries
+   - [ ] UI adaptations
+   - [ ] Docs + archive
+
+## Pre-Work Prerequisites
+
+Before starting Phase 1, confirm the parent track's Phase 6 is complete:
+- `docs/guide_ai_client.md` updated with new vendors, matrix, helper
+- `docs/guide_models.md` updated with new PROVIDERS entries
+- Parent track folder **stays open** in `conductor/tracks/` (not archived)
+- `conductor/tracks.md` reflects active status
+
+## Lessons from Parent Track (apply to this one)
+
+- **Surface gaps as they appear, not at the checkpoint.** If a task is going to be deferred mid-phase, say so immediately — don't footnote it later.
+- **Be explicit about architectural deviations.** The `src/models.py` PROVIDERS sprawl should have been raised at Phase 2, not at Phase 5.
+- **Plan for the test infrastructure before coding.** The parent track's tool-loop regression wasn't caught because no test exercised the loop. Future work: every helper gets tests BEFORE implementation.
+
+## Status
+
+- T0: Spec drafted (`spec.md`) — DONE
+- T1: Parent track Phase 6 verification — TODO
+- T2: Follow-up track files created — TODO
+- T3: Phase 1 (tool loop lift) — TODO
diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json b/conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json
new file mode 100644
index 00000000..e68e57f8
--- /dev/null
+++ b/conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json
@@ -0,0 +1,79 @@
+{
+  "track_id": "qwen_llama_grok_followup_20260611",
+  "name": "Qwen/Llama/Grok Follow-Up (tool loop, PROVIDERS move, UX adaptations 2-9, local-first, matrix v2, Anthropic/Gemini/DeepSeek migration)",
+  "initialized": "2026-06-11",
+  "owner": "tier2-tech-lead",
+  "priority": "high",
+  "status": "active",
+  "type": "refactor + feature",
+  "scope": {
+    "new_files": [
+      "src/tool_loop.py",
+      "src/llama_ollama_native.py",
+      "src/llama_meta_api.py",
+      "tests/test_tool_loop.py",
+      "scripts/audit_no_inline_tool_loops.py",
+      "scripts/audit_providers_source_of_truth.py"
+    ],
+    "modified_files": [
+      "src/ai_client.py",
+      "src/vendor_capabilities.py",
+      "src/gui_2.py",
+      "src/models.py",
+      "tests/test_minimax_provider.py",
+      "tests/test_grok_provider.py",
+      "tests/test_llama_provider.py",
+      "tests/test_qwen_provider.py",
+      "tests/test_anthropic_provider.py",
+      "tests/test_gemini_provider.py",
+      "tests/test_deepseek_provider.py",
+      "docs/guide_ai_client.md",
+      "docs/guide_models.md"
+    ]
+  },
+  "blocked_by": {
+    "qwen_llama_grok_integration_20260606": "phase_6_in_progress"
+  },
+  "blocks": [
+    "anthropic_gemini_deepseek_capability_matrix_20260606"
+  ],
+  "estimated_phases": 5,
+  "spec": "spec.md",
+  "plan": "plan.md",
+  "state": "state.toml",
+  "todo": "TODO.md",
+  "priority_order": "A (tool loop lift + PROVIDERS move + UX 2-9) > B (local-first + matrix v2) > C (Anthropic/Gemini/DeepSeek migration)",
+  "user_directions": [
+    "2026-06-11: User wants REPORT explaining why a follow-up is needed (gaps in parent track).",
+    "2026-06-11: User wants LOCAL MODELS prioritized as first-class; current implementation treats Ollama as 'one of 3 backends' which under-emphasizes local.",
+    "2026-06-11: User wants the source-of-truth sprawl cleaned up (PROVIDERS in models.py is wrong; should be elsewhere).",
+    "2026-06-11: User wants further codepath consolidation in ai_client.py; new files need review."
+  ],
+  "verification_criteria": [
+    "src/tool_loop.py:run_with_tool_loop handles no-tool-calls, dispatches tool calls, respects max-rounds, appends to history, doesn't crash on tool error",
+    "All 8 vendors (_send_minimax, _send_qwen, _send_grok, _send_llama, _send_anthropic, _send_gemini, _send_gemini_cli, _send_deepseek) use run_with_tool_loop",
+    "scripts/audit_no_inline_tool_loops.py passes (no inline tool loops in any _send_<vendor>)",
+    "PROVIDERS is no longer declared in src/models.py",
+    "scripts/audit_providers_source_of_truth.py passes",
+    "All 9 UX adaptations from parent spec §6 are applied to src/gui_2.py (1 from parent Phase 5 + 8 from this track's Phase 3)",
+    "src/llama_ollama_native.py: native Ollama adapter replaces OpenAI-compatible for Ollama backend (or used by default)",
+    "src/llama_meta_api.py: Meta Llama API adapter; new 4th backend",
+    "src/vendor_capabilities.py: 12 new v2 fields added (local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use)",
+    "All vendor registry entries updated with the new fields",
+    "Anthropic matrix entries populated (caching, extended_thinking, pdf, computer_use)",
+    "Gemini matrix entries populated (caching, grounding, video, audio)",
+    "DeepSeek matrix entries populated (reasoning, low_cost)",
+    "GUI: 'Local Model' badge added to AI Settings panel",
+    "GUI: 4 cost panel states (estimate / 'Free (local)' / '-' / new local-no-cost state)",
+    "All existing tests still pass (38+ in batch; full suite has pre-existing live_gui flakes)",
+    "No new threading.Thread calls",
+    "docs/guide_ai_client.md + docs/guide_models.md updated"
+  ],
+  "links": {
+    "parent_track": "conductor/tracks/qwen_llama_grok_integration_20260606/",
+    "parent_spec": "conductor/tracks/qwen_llama_grok_integration_20260606/spec.md",
+    "ai_client_guide": "docs/guide_ai_client.md",
+    "models_guide": "docs/guide_models.md",
+    "follow_up_audit_report": "docs/reports/qwen_llama_grok_followup_audit_20260611.md"
+  }
+}
diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/spec.md b/conductor/tracks/qwen_llama_grok_followup_20260611/spec.md
new file mode 100644
index 00000000..07f9b556
--- /dev/null
+++ b/conductor/tracks/qwen_llama_grok_followup_20260611/spec.md
@@ -0,0 +1,237 @@
+# Track: Qwen, Llama & Grok Follow-Up (Post-Phase 5)
+
+**Status:** Active (initializing)
+**Initialized:** 2026-06-11
+**Owner:** Tier 2 Tech Lead
+**Priority:** High (architectural consolidation + UX payoff; user is rightly concerned that the parent track shipped with gaps)
+
+---
+
+## Why This Track Exists
+
+The parent track `qwen_llama_grok_integration_20260606` (status: 50/79 tasks done, Phase 6 in progress) shipped 5 phases cleanly but **left meaningful gaps** that the Tier 2 Tech Lead did not surface until the Phase 5 checkpoint. This track captures the deferred work, ordered by impact.
+
+**The Tier 2's failure mode** (called out by the user 2026-06-11): "you never even told me until now and then you just say 'oh yeah we're done btw, fuck you' thats what it feels like." Rightly called. This track exists to fix that.
+
+---
+
+## Goals (Priority Order)
+
+| Priority | Goal | Rationale |
+|---|---|---|
+| **A (architectural)** | Lift the tool-call loop into a shared `run_with_tool_loop()` helper. Apply it to all 8 vendors: MiniMax, Qwen, Grok, and Llama, plus the 4 vendors that already have inline loops. | Of the first four, only `_send_minimax` has a working tool loop; Qwen/Grok/Llama are single-shot (regression). Anthropic/Gemini/Gemini CLI/DeepSeek each carry their own inline tool loop (4-way duplication). Lifting the loop gives one place to fix bugs and add new behavior. |
+| **A (architectural)** | Move `PROVIDERS` out of `src/models.py`. | `src/models.py` is for MMA data models (Tickets, Tracks, FileItem). The vendor list is an AI client concern. The audit script `audit_no_models_config_io.py` enforces config I/O rules; PROVIDERS has no analogous enforcement. Move to `src/ai_client.py` (or new `src/ai_client_providers.py`); add an audit script that enforces the move. |
+| **A (UX payoff)** | Apply the remaining 8 of 9 UX adaptations from parent track spec §6: tools toggle (tool_calling), cache panel (caching), stream progress (streaming), fetch models (model_discovery), token budget max (context_window), cost panel × 3. | The pattern is established (adaptation 1 shipped in parent Phase 5); the helper `_get_active_capabilities()` is in place; the remaining 8 are mechanical applications. |
+| **B (local-first)** | Promote local models from "one of 3 backends" to first-class. | Add a `local: bool` capability field (separate from `cost_tracking`). Make native Ollama (`/api/chat`) the default for Llama instead of the OpenAI-compatible fallback. Add Meta Llama API as a 4th backend. Add a "Local Model" UI badge. |
+| **B (matrix expansion)** | Land the v2 matrix fields: `local`, `reasoning`, `structured_output`, `code_execution`, `web_search`, `x_search`, `file_search`, `mcp_support`, `audio`, `video`, `grounding`, `computer_use`. | These are the 12 fields documented in parent spec §3.1.1 after the Grok consultation. None wired today. Each addition is registry + UI adaptation. |
+| **C (provider coverage)** | Migrate Anthropic / Gemini / DeepSeek onto the capability matrix. | Anthropic has prompt caching, extended thinking, Computer Use (high-value UX). Gemini has Grounding with Google Search, native video. DeepSeek has reasoning models. None of these capabilities are exposed in the GUI today. |
+| **C (codepath consolidation)** | Reduce `src/ai_client.py` line count (currently 2784). | The 8 vendors' inline patterns have grown. Lifting history management, reasoning content extraction, error classification per HTTP code into shared helpers would cut ~30-40% of the file. |
+
+### Non-Goals (this track)
+
+- Not changing the matrix schema (the 7 v1 fields are good; v2 is additive)
+- Not changing the shared `send_openai_compatible` helper (it works; the tool loop is separate)
+- Not changing the `vendor_capabilities.py` lookup pattern (it works; registry is the source of truth)
+- Not adding new vendors (the parent track added Qwen/Grok/Llama; this track only consolidates what's there)
+
+---
+
+## Architecture
+
+### A.1 Tool Loop Lift
+
+Today:
+```python
+# in _send_minimax (only):
+for _round in range(MAX_TOOL_ROUNDS + 2):
+    request = OpenAICompatibleRequest(...)
+    response = send_openai_compatible(client, request, capabilities=caps)
+    if not response.tool_calls: return response.text
+    results = asyncio.run(_execute_tool_calls_concurrently(response.tool_calls, ...))
+    # ... append results to history ...
+
+# in _send_qwen, _send_grok, _send_llama: no loop (single-shot, regression)
+# in _send_anthropic, _send_gemini, _send_gemini_cli, _send_deepseek: inline loop (4-way duplication)
+```
+
+After:
+```python
+# new src/tool_loop.py:
+def run_with_tool_loop(
+    client, request, capabilities, *,
+    pre_tool_callback, qa_callback, patch_callback,
+    base_dir, vendor_name, history_lock, history, trim_func,
+) -> str:
+    """Wraps send_openai_compatible with a tool-call loop. Works for any
+    OpenAI-compatible vendor; vendor-specific logic (history mgmt,
+    trim, message format) is injected via parameters."""
+    ...
+
+# in each _send_<vendor>:
+response = run_with_tool_loop(
+    client=_ensure_<vendor>_client(),
+    request=OpenAICompatibleRequest(...),
+    capabilities=get_capabilities(vendor, _model),
+    pre_tool_callback=..., qa_callback=..., patch_callback=...,
+    base_dir=base_dir, vendor_name="<vendor>",
+    history_lock=_<vendor>_history_lock,
+    history=_<vendor>_history,
+    trim_func=_<vendor>_trim_history,
+)
+```
+
+The helper takes history management as injected parameters (each vendor has its own lock and history list). The tool dispatch (`_execute_tool_calls_concurrently`) takes a `vendor_name` string.
+
+### A.2 PROVIDERS Move
+
+Today:
+```python
+# src/models.py:79
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
+```
+
+After:
+```python
+# src/ai_client.py (new location) or src/ai_client_providers.py (new file)
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
+
+# src/models.py: import from src.ai_client or keep as re-export shim for backward compat
+```
+
+Add an audit script, `scripts/audit_providers_source_of_truth.py`, that verifies `PROVIDERS` is not declared in `src/models.py` and fails the build if the declaration reappears there.
+
+### A.3 UX Adaptations 2-9
+
+Same pattern as the shipped adaptation 1 (Screenshot button iff vision). For each render site:
+```python
+caps = app._get_active_capabilities()
+imgui.begin_disabled(not caps.<field>)
+... UI ...
+imgui.end_disabled()
+if not caps.<field>:
+    imgui.same_line()
+    imgui.text_disabled("(reason)")
+```
+
+### B.1 Local-First Architecture
+
+- Add `local: bool` to `VendorCapabilities` (default False)
+- Set True for Llama (when base_url is localhost/127.0.0.1)
+- Native Ollama adapter: `src/llama_ollama_native.py` (separate from openai_compatible)
+- Meta Llama API adapter: `src/llama_meta_api.py` (verify docs first)
+- GUI: new "Local Model" badge in the AI Settings panel
+- Cost panel: 4th state "Local (no cost)" distinct from "Free (local)" and "—"
+
+### B.2 Matrix Expansion (v2)
+
+Add to `VendorCapabilities`:
+- `local: bool` (B.1)
+- `reasoning: bool` (xAI `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
+- `structured_output: bool` (response_format / format)
+- `code_execution: bool` (xAI code_interpreter, Anthropic Computer Use, Gemini Code Execution)
+- `web_search: bool` (xAI web_search, Gemini Grounding)
+- `x_search: bool` (xAI X/Twitter search, xAI-specific)
+- `file_search: bool` (xAI file_search, Anthropic PDF, Gemini file API)
+- `mcp_support: bool` (xAI mcp_calls, Anthropic MCP)
+- `audio: bool` (Qwen-Audio, Gemini audio)
+- `video: bool` (Gemini video)
+- `grounding: bool` (Gemini Grounding with Google Search)
+- `computer_use: bool` (Anthropic Computer Use)
+
+Each new field is a registry update + a UI adaptation. The matrix schema grows; the GUI filters based on the matrix.
+
+### C.1 Anthropic / Gemini / DeepSeek Migration
+
+Per the deferred follow-up track `anthropic_gemini_deepseek_capability_matrix_20260606` (parent spec §13.1.A). The capability matrix entries for these vendors can be populated:
+- `anthropic/*` with `caching: True` (prompt caching), `extended_thinking: True`, `pdf: True`, `computer_use: True`
+- `gemini/*` with `caching: True` (explicit cache), `grounding: True`, `video: True`, `audio: True`
+- `deepseek/*` with `reasoning: True` (R1), `low_cost: True`
+
+The implementations (`_send_anthropic`, `_send_gemini`, `_send_deepseek`) keep their unique per-vendor code paths. The matrix entries are the source of truth for the UI.
+
+---
+
+## Phase Plan (5 phases, roughly 5-9 weeks of work)
+
+### Phase 1: Tool Loop Lift (1-2 weeks)
+- T1.1: Write red tests for `run_with_tool_loop` (5 tests covering: no tool calls returns immediately, tool calls dispatch, max rounds limit, history appending, error in tool call doesn't crash)
+- T1.2: Implement `src/tool_loop.py` with `run_with_tool_loop`
+- T1.3: Apply to `_send_minimax` (replace inline loop)
+- T1.4: Apply to `_send_qwen`, `_send_grok`, `_send_llama` (add the missing loop)
+- T1.5: Apply to `_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek` (consolidate)
+- T1.6: Verify all 8 vendors' existing tests still pass
+- T1.7: Audit script `scripts/audit_no_inline_tool_loops.py` to enforce the pattern
+
+### Phase 2: PROVIDERS Move (1 week)
+- T2.1: Move `PROVIDERS` to `src/ai_client.py` (or new `src/ai_client_providers.py`)
+- T2.2: Update all 5 import sites (gui_2.py, app_controller.py, etc.) to point to new location
+- T2.3: Add `scripts/audit_providers_source_of_truth.py` to enforce the move
+- T2.4: Verify all 38+ tests pass
+
+### Phase 3: UX Adaptations 2-9 (1-2 weeks)
+- T3.1: Apply adaptation 2 (tools toggle iff tool_calling)
+- T3.2: Apply adaptation 3 (cache panel iff caching)
+- T3.3: Apply adaptation 4 (stream progress iff streaming)
+- T3.4: Apply adaptation 5 (fetch models iff model_discovery)
+- T3.5: Apply adaptation 6 (token budget max = context_window)
+- T3.6: Apply adaptation 7 (cost panel: estimate)
+- T3.7: Apply adaptation 8 (cost panel: "Free (local)" for localhost)
+- T3.8: Apply adaptation 9 (cost panel: "—" for other cost_tracking=false)
+- T3.9: Verify live_gui tests pass
+
+### Phase 4: Local-First + Matrix Expansion (1-2 weeks)
+- T4.1: Add `local: bool` to VendorCapabilities; update registry for Llama
+- T4.2: Native Ollama adapter (`src/llama_ollama_native.py`); replace OpenAI-compatible for Ollama backend
+- T4.3: Meta Llama API adapter (`src/llama_meta_api.py`); add as 4th Llama backend
+- T4.4: GUI: "Local Model" badge
+- T4.5: Add v2 fields (local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use)
+- T4.6: Update all vendor registry entries with the new fields
+- T4.7: Add UI adaptations for the new fields (e.g., "Reasoning" toggle, "Code execution" panel)
+
+### Phase 5: Anthropic / Gemini / DeepSeek Migration (1-2 weeks)
+- T5.1: Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)
+- T5.2: Populate Gemini matrix entries (caching, grounding, video, audio)
+- T5.3: Populate DeepSeek matrix entries (reasoning, low_cost)
+- T5.4: UI adaptations for the new capabilities
+- T5.5: Docs + archive
+
+---
+
+## Testing Strategy
+
+- All new helpers (`run_with_tool_loop`) get TDD: Red tests first, then implementation
+- All UX adaptations get a test that verifies the render function reads the capability
+- All audit scripts get a self-test (the script can detect its own absence)
+- Live_gui tests run in batch (per the docs_sync lessons: bisect in batch, not isolation)
+
+---
+
+## Risks
+
+- **Tool loop lift risk:** Anthropic and Gemini have unique tool-use formats (Anthropic uses `tool_use` blocks; Gemini uses `functionCall`). Lifting requires careful preservation. Mitigation: keep the per-vendor `tool_format_converter` injection as a parameter.
+- **PROVIDERS move risk:** 5 import sites to update; some might use `from src.models import PROVIDERS` and break. Mitigation: search-and-replace audit, run full test suite after.
+- **UX adaptation risk:** Same as parent Phase 5 — touching 260KB of GUI code is high risk. Mitigation: ship 1-2 per commit, run live_gui batch after each.
+
+---
+
+## Open Questions
+
+1. **`src/ai_client_providers.py` vs `src/ai_client.py`?** Should PROVIDERS go in a new file (clearer separation) or stay in the main ai_client module (less file proliferation)?
+2. **Meta Llama API spec verification:** The 400 error on the docs URL last session — is it back up? If not, defer the Meta backend.
+3. **Local model as separate UI mode?** Should the GUI have a "Local / Cloud / All" filter on the provider dropdown, or just show the local badge per-vendor?
+
+---
+
+## See Also
+
+- Parent track: `conductor/tracks/qwen_llama_grok_integration_20260606/`
+- Parent spec: `conductor/tracks/qwen_llama_grok_integration_20260606/spec.md`
+- Parent Phase 5 report: `docs/reports/qwen_llama_grok_integration_20260610.md` (TBD)
+- `docs/guide_ai_client.md` — the doc that needs updating in Phase 6 of the parent track
+
+---
+
+## Status
+
+- T0: Spec drafted (this file)
+- T1: Phase 1 (tool loop lift) ready to start
diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml b/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml
new file mode 100644
index 00000000..cf1473e4
--- /dev/null
+++ b/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml
@@ -0,0 +1,86 @@
+# Track state for qwen_llama_grok_followup_20260611
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = "qwen_llama_grok_followup_20260611"
+name = "Qwen/Llama/Grok Follow-Up (tool loop, PROVIDERS move, UX adaptations 2-9, local-first, matrix v2, Anthropic/Gemini/DeepSeek migration)"
+status = "active"
+current_phase = 0
+last_updated = "2026-06-11"
+
+[blocked_by]
+# This follow-up is blocked on the parent track's Phase 6 (docs) completing.
+qwen_llama_grok_integration_20260606 = "phase_6_in_progress"
+
+[phases]
+phase_1 = { status = "pending", checkpoint_sha = "", name = "Tool loop lift (run_with_tool_loop helper for 8 vendors)" }
+phase_2 = { status = "pending", checkpoint_sha = "", name = "PROVIDERS move (out of src/models.py)" }
+phase_3 = { status = "pending", checkpoint_sha = "", name = "UX adaptations 2-9 (8 of 9 deferred from parent Phase 5)" }
+phase_4 = { status = "pending", checkpoint_sha = "", name = "Local-first + matrix v2 expansion (12 new fields)" }
+phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
+
+[tasks]
+# Phase 1: Tool loop lift
+t1_1 = { status = "pending", commit_sha = "", description = "Read tool-loop patterns in _send_minimax + the 4 inline-loop vendors" }
+t1_2 = { status = "pending", commit_sha = "", description = "Design run_with_tool_loop helper signature" }
+t1_3 = { status = "pending", commit_sha = "", description = "Red: 5 tests for run_with_tool_loop in tests/test_tool_loop.py" }
+t1_4 = { status = "pending", commit_sha = "", description = "Green: implement run_with_tool_loop in src/tool_loop.py" }
+t1_5 = { status = "pending", commit_sha = "", description = "Apply to _send_minimax (replace inline loop)" }
+t1_6 = { status = "pending", commit_sha = "", description = "Apply to _send_qwen + _send_grok + _send_llama (add missing loop)" }
+t1_7 = { status = "pending", commit_sha = "", description = "Apply to _send_anthropic + _send_gemini + _send_gemini_cli + _send_deepseek (consolidate inline)" }
+t1_8 = { status = "pending", commit_sha = "", description = "Add scripts/audit_no_inline_tool_loops.py" }
+t1_9 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint + git note" }
+# Phase 2: PROVIDERS move
+t2_1 = { status = "pending", commit_sha = "", description = "Decide: src/ai_client.py vs new src/ai_client_providers.py" }
+t2_2 = { status = "pending", commit_sha = "", description = "Move PROVIDERS to new location" }
+t2_3 = { status = "pending", commit_sha = "", description = "Update 5 import sites" }
+t2_4 = { status = "pending", commit_sha = "", description = "Add scripts/audit_providers_source_of_truth.py" }
+t2_5 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint + git note" }
+# Phase 3: UX adaptations 2-9
+t3_1 = { status = "pending", commit_sha = "", description = "Adaptation 2: tools toggle iff tool_calling" }
+t3_2 = { status = "pending", commit_sha = "", description = "Adaptation 3: cache panel iff caching" }
+t3_3 = { status = "pending", commit_sha = "", description = "Adaptation 4: stream progress iff streaming" }
+t3_4 = { status = "pending", commit_sha = "", description = "Adaptation 5: fetch models iff model_discovery" }
+t3_5 = { status = "pending", commit_sha = "", description = "Adaptation 6: token budget max = context_window" }
+t3_6 = { status = "pending", commit_sha = "", description = "Adaptation 7: cost panel: estimate" }
+t3_7 = { status = "pending", commit_sha = "", description = "Adaptation 8: cost panel: 'Free (local)' for localhost" }
+t3_8 = { status = "pending", commit_sha = "", description = "Adaptation 9: cost panel: '-' for other cost_tracking=false" }
+t3_9 = { status = "pending", commit_sha = "", description = "Phase 3 checkpoint + git note" }
+# Phase 4: Local-first + matrix v2
+t4_1 = { status = "pending", commit_sha = "", description = "Add local: bool to VendorCapabilities" }
+t4_2 = { status = "pending", commit_sha = "", description = "Native Ollama adapter src/llama_ollama_native.py" }
+t4_3 = { status = "pending", commit_sha = "", description = "Meta Llama API adapter src/llama_meta_api.py" }
+t4_4 = { status = "pending", commit_sha = "", description = "GUI: 'Local Model' badge" }
+t4_5 = { status = "pending", commit_sha = "", description = "Add 12 v2 fields to VendorCapabilities" }
+t4_6 = { status = "pending", commit_sha = "", description = "Update all vendor registry entries" }
+t4_7 = { status = "pending", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.)" }
+t4_8 = { status = "pending", commit_sha = "", description = "Phase 4 checkpoint + git note" }
+# Phase 5: Anthropic / Gemini / DeepSeek migration
+t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
+t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
+t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
+t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
+t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
+
+[verification]
+phase_1_tool_loop_lifted = false
+phase_2_providers_moved = false
+phase_3_all_9_ux_adaptations = false
+phase_4_local_first_and_matrix_v2 = false
+phase_5_anthropic_gemini_deepseek_matrix = false
+full_test_suite_passes = false
+no_inline_tool_loops = false
+no_providers_in_models_py = false
+
+[open_questions]
+# Phase 2
+where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"
+
+[local_first_priority]
+# Per user feedback 2026-06-11: emphasize local models as first-class
+# vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.
+local_model_as_first_class = true
+native_ollama_default_for_llama = true
+meta_llama_api_4th_backend = true
+local_badge_in_gui = true
+distinct_cost_state_for_local = true
diff --git a/docs/guide_ai_client.md b/docs/guide_ai_client.md
index 8993157d..d8251d0e 100644
--- a/docs/guide_ai_client.md
+++ b/docs/guide_ai_client.md
@@ -6,10 +6,17 @@
 
 ## Overview
 
-`src/ai_client.py` (~116KB) is the **unified LLM client** for 5 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI) behind a single `send()` function.
+`src/ai_client.py` (~116KB) is the **unified LLM client** for 8 providers. It abstracts the differences between providers (Gemini, Anthropic, DeepSeek, MiniMax, Gemini CLI, Qwen, Grok, Llama) behind a single `send()` function.
 
 The module is a **stateful singleton** — all provider state is held in module-level globals. There is no class wrapping; the module itself is the abstraction layer.
 
+The 8 providers split into 3 API shapes:
+- **Native SDK**: Gemini (google-genai), Anthropic (anthropic), Qwen (DashScope)
+- **OpenAI-compatible**: MiniMax, Grok, Llama (Ollama/OpenRouter/custom), DeepSeek
+- **Subprocess**: Gemini CLI
+
+MiniMax, Grok, and Llama call the shared helper in `src/openai_compatible.py` (added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track; see the "Shared OpenAI-Compatible Helper" section below). DeepSeek speaks the same API shape but still runs its pre-existing inline path. The MiniMax provider's `_send_minimax` was refactored to use this helper in Phase 4 of the same track (231 → 75 lines, a 68% reduction).
+
 ---
 
 ## Module-Level Imports
@@ -430,4 +437,91 @@ Gated by env var (e.g., `RUN_REAL_AI_TESTS=1`). Hits the real API. Not in defaul
 - **[guide_state_lifecycle.md](guide_state_lifecycle.md)** — The per-provider history globals (`_anthropic_history`, etc.) are managed here; their locking and reset behavior is documented
 - **[guide_context_aggregation.md](guide_context_aggregation.md)** — The `aggregate.py` pipeline that produces the markdown the AI client sends
 - **[conductor/product.md](../conductor/product.md#multi-provider-integration)** — Product-level overview of providers
+- **[docs/reports/qwen_llama_grok_followup_audit_20260611.md](reports/qwen_llama_grok_followup_audit_20260611.md)** — Audit of the parent track's gaps; follow-up track `qwen_llama_grok_followup_20260611` covers them
+
+---
+
+## Shared OpenAI-Compatible Helper (`src/openai_compatible.py`)
+
+Added 2026-06-06 by the `qwen_llama_grok_integration_20260606` track. Operates on a normalized request/response data structure so 4 OpenAI-compatible vendors (MiniMax, Grok, Llama, DeepSeek) can share the same request building, response parsing, streaming aggregation, tool call detection, and error classification logic.
+
+### Data Structures
+
+```python
+@dataclass(frozen=True)
+class NormalizedResponse:
+    text: str
+    tool_calls: list[dict[str, Any]]
+    usage_input_tokens: int
+    usage_output_tokens: int
+    usage_cache_read_tokens: int
+    usage_cache_creation_tokens: int
+    raw_response: Any
+
+@dataclass
+class OpenAICompatibleRequest:
+    messages: list[dict[str, Any]]
+    model: str
+    temperature: float = 0.0
+    top_p: float = 1.0
+    max_tokens: int = 8192
+    tools: Optional[list[dict[str, Any]]] = None
+    tool_choice: str = "auto"
+    stream: bool = False
+    stream_callback: Optional[Callable[[str], None]] = None
+```
+
+### The Function
+
+```python
+def send_openai_compatible(
+    client: Any,        # openai.OpenAI client with vendor-specific base_url + auth
+    request: OpenAICompatibleRequest,
+    *, capabilities: "VendorCapabilities",  # from src/vendor_capabilities.py
+) -> NormalizedResponse: ...
+```
+
+The function:
+1. Translates `request.messages` into the OpenAI SDK's `messages` parameter (passthrough — already in OpenAI shape).
+2. Translates `request.tools` if non-None (passthrough for now; future: strip unsupported fields based on `capabilities`).
+3. Calls `client.chat.completions.create(...)` with the right parameters.
+4. If streaming: aggregates chunks; calls `stream_callback(text_chunk)` for each text delta; collects final usage from the last chunk.
+5. If non-streaming: parses the response in one shot.
+6. Returns a `NormalizedResponse` with text, tool calls (in OpenAI shape), usage stats.
+7. On exception: classifies the OpenAI exception and re-raises as `ProviderError`.
+
+### Usage Pattern (per vendor)
+
+```python
+# _send_grok, _send_llama (single-shot placeholders), _send_minimax (with restored tool loop)
+def _send_grok(md_content, user_message, base_dir, file_items=None, discussion_history="", stream=False, ...):
+    client = _ensure_grok_client()  # openai.OpenAI(api_key=..., base_url="https://api.x.ai/v1")
+    with _grok_history_lock:
+        # ... build messages, append user, system + context ...
+        request = OpenAICompatibleRequest(
+            messages=messages, model=_model, stream=stream,
+            stream_callback=stream_callback,
+        )
+        caps = get_capabilities("grok", _model)
+        response = send_openai_compatible(client, request, capabilities=caps)
+        # ... append to history, return response.text ...
+```
+
+### Qwen Adapter (`src/qwen_adapter.py`)
+
+Qwen uses Alibaba's DashScope native SDK (not OpenAI-compatible) because DashScope's OpenAI-compatible mode drops important features (Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision). The adapter normalizes DashScope tool format to OpenAI shape via `build_dashscope_tools()` and classifies DashScope exceptions via `classify_dashscope_error()`.
+
+### Llama Multi-Backend
+
+`_send_llama` supports 3 backends via the state globals `_llama_base_url` and `_llama_api_key`:
+- **Ollama** (local): `http://localhost:11434/v1`; no auth
+- **OpenRouter** (cloud aggregator): `https://openrouter.ai/api/v1`
+- **Custom URL** (escape hatch): any OpenAI-compatible endpoint
+
+The local-LLM signal is `_get_llama_cost_tracking()` (returns False for localhost/127.0.0.1).
+
+### Tests
+
+- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
+- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
 - **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient
diff --git a/docs/guide_models.md b/docs/guide_models.md
index 3ab9e618..e24736b5 100644
--- a/docs/guide_models.md
+++ b/docs/guide_models.md
@@ -363,7 +363,7 @@ The file also defines several module-level constants used across the app:
 
 ```python
 # Provider routing
-PROVIDERS: list[str] = ["gemini", "anthropic", "deepseek", "MiniMax", "gemini-cli"]
+PROVIDERS: list[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
 
 # Tool categories (for Tool Bias)
 TOOL_CATEGORIES: list[str] = [
diff --git a/docs/reports/qwen_llama_grok_followup_audit_20260611.md b/docs/reports/qwen_llama_grok_followup_audit_20260611.md
new file mode 100644
index 00000000..8789f24e
--- /dev/null
+++ b/docs/reports/qwen_llama_grok_followup_audit_20260611.md
@@ -0,0 +1,165 @@
+# Qwen/Llama/Grok Follow-Up Audit Report (2026-06-11)
+
+**Date:** 2026-06-11
+**Author:** Tier 2 Tech Lead
+**Subject:** Why a follow-up track is needed after `qwen_llama_grok_integration_20260606` Phase 5
+
+## TL;DR
+
+The parent track shipped 5 of 6 phases with 50/79 tasks done. The Tech Lead **did not surface the gaps at the checkpoints**; the user discovered them only at the Phase 5 checkpoint. The user is right: the Tech Lead's "footnote for now" pattern is bad — it looks like the work was hidden until called out.
+
+**7 categories of gap** are documented here. Each is captured in the new follow-up track `qwen_llama_grok_followup_20260611`.
+
+---
+
+## 1. Phase 5 partial: 1 of 9 UX adaptations shipped
+
+**What shipped:** Adaptation 1 (Screenshot button iff vision) at `src/gui_2.py:3030` + the helper `_get_active_capabilities()` at `src/gui_2.py:733`.
+
+**What didn't ship:** Adaptations 2-9:
+- Tools toggle iff tool_calling
+- Cache panel iff caching
+- Stream progress iff streaming
+- Fetch Models button iff model_discovery
+- Token budget max = context_window
+- Cost panel × 3 (estimate / "Free (local)" for localhost / "—" for other cost_tracking=false)
+
+**The right move:** Ship all 9 at once, OR tell the user explicitly "I'm shipping 1 of 9; the other 8 are deferred" BEFORE doing adaptation 1. The Tech Lead instead disclosed the deferral only in a footnote, which the user called out as bad UX.
+
+---
+
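+## 2. Tool-call loop regression: Qwen, Grok, and Llama are single-shot
+
+**What shipped:** Of the 4 vendor entry points the parent track touched, only `_send_minimax` has a working tool loop; `_send_qwen`, `_send_grok`, and `_send_llama` are single-shot. The 4 pre-existing vendors keep their inline loops.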
+
+| Vendor | Tool loop? | Why |
+|---|---|---|
+| `_send_minimax` | ✅ Works (231 → 75 lines after refactor + tool loop restoration) | Worker did the refactor; the Tech Lead restored the tool loop manually |
+| `_send_qwen` | ❌ Single-shot | Phase 2 worker omitted it (Qwen has DashScope-specific tool format) |
+| `_send_grok` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
+| `_send_llama` | ❌ Single-shot | Phase 3 worker omitted it (placeholder) |
+| `_send_anthropic` | ✅ Inline (4-way duplication with the other 3) | Pre-existing pattern |
+| `_send_gemini` | ✅ Inline | Pre-existing pattern |
+| `_send_gemini_cli` | ✅ Inline | Pre-existing pattern |
+| `_send_deepseek` | ✅ Inline | Pre-existing pattern |
+
+**The right move:** Lift the loop into a shared `run_with_tool_loop` helper that takes history management as injected parameters. Apply to all 8 vendors. This is a single-fix, 8-call-site refactor — much smaller than letting the duplication grow.
+
+The Tech Lead caught this at the end of Phase 4 (during the MiniMax refactor) but should have caught it at the end of Phase 2 (when the Qwen worker shipped single-shot) or the end of Phase 3 (when Grok+Llama workers shipped single-shot).
+
+---
+
+## 3. `src/models.py` has a PROVIDERS list — the user is right that this is sprawl
+
+**What's there now:**
+```python
+# src/models.py:79
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
+```
+
+**The problem:** `src/models.py` is for **MMA data models** (Tickets, Tracks, FileItem, WorkerContext, etc.). The vendor list is an **AI client concern**. The audit script `audit_no_models_config_io.py` enforces config I/O rules; PROVIDERS has no analogous enforcement.
+
+**The right move:** Move PROVIDERS to `src/ai_client.py` (or a new `src/ai_client_providers.py`). Add `scripts/audit_providers_source_of_truth.py` that fails the build if PROVIDERS is declared in models.py.
+
+The Tech Lead justified keeping it in models.py with "the centralized registry pattern" without asking whether models.py was the right home.
+
+---
+
+## 4. `src/ai_client.py` is 2784 lines and growing
+
+**What's there:** 8 vendor entry points (`_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek`, `_send_minimax`, `_send_qwen`, `_send_grok`, `_send_llama`) plus all the supporting machinery (client init, history management, error classification, reasoning content extraction).
+
+**The 8 vendors' inline patterns are 70% similar.** Each has:
+- Client init (credentials + SDK setup)
+- History management (per-vendor lock + history list + repair + trim)
+- Message building (system + context + user content)
+- API call (via SDK or HTTP)
+- Tool loop (or single-shot — see gap #2)
+- Reasoning content extraction
+- Error classification
+
+**The right move:** Codepath consolidation. The shared `send_openai_compatible` covers the API call. A future `run_with_tool_loop` covers the tool loop (gap #2). What's left:
+- History management as a `VendorHistory` class or per-vendor thin wrapper
+- Reasoning content extraction as a uniform helper
+- Error classification as a per-HTTP-code helper
+
+This could cut `src/ai_client.py` by 30-40% (~1000 lines).
+
+---
+
+## 5. Local models deserve more emphasis
+
+**What's there now:** Ollama is one of 3 Llama backends (Ollama, OpenRouter, custom_url). The `cost_tracking: False` for localhost is a small signal.
+
+**The user feedback (verbatim):** "I want to put more emphasis and supporting local models and separating local model vending vis online/cloud vendors of models."
+
+**The right architecture:**
+- Add `local: bool` to VendorCapabilities (separate from `cost_tracking`)
+- Native Ollama (`/api/chat`) as the **default** for Llama (not the OpenAI-compatible fallback)
+- Meta Llama API as a 4th backend (the docs URL returned 400 last session; needs re-verification)
+- GUI: "Local Model" badge per-vendor
+- Cost panel: 4th state "Local (no cost)" distinct from "Free (local)" and "—"
+- vLLM, LM Studio, llama.cpp as additional custom-URL backends with discoverable presets
+
+This is a significant priority shift. The follow-up track's Phase 4 leads with this.
+
+---
+
+## 6. V2 matrix field expansion documented but not implemented
+
+**What the spec says (per Grok's consultation):** Add 12 new fields to VendorCapabilities:
+- `local: bool`
+- `reasoning: bool` (xAI `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
+- `structured_output: bool` (response_format / format)
+- `code_execution: bool` (xAI code_interpreter, Anthropic Computer Use, Gemini Code Execution)
+- `web_search: bool` (xAI web_search, Gemini Grounding)
+- `x_search: bool` (xAI X/Twitter search)
+- `file_search: bool` (xAI file_search, Anthropic PDF, Gemini file API)
+- `mcp_support: bool` (xAI mcp_calls, Anthropic MCP)
+- `audio: bool` (Qwen-Audio, Gemini audio)
+- `video: bool` (Gemini video)
+- `grounding: bool` (Gemini Grounding with Google Search)
+- `computer_use: bool` (Anthropic Computer Use)
+
+**What shipped:** 0 of 12. None wired. No UI adaptations.
+
+The follow-up track's Phase 4 lands these.
+
+---
+
+## 7. Anthropic / Gemini / DeepSeek still not on the matrix
+
+**What's there:** These 3 vendors have unique APIs (4-breakpoint caching, genai SDK, raw HTTP) and the migration to the matrix is non-trivial. The migration is documented (`parent spec §13.1.A`) but was never scheduled.
+
+**The value:** Anthropic has prompt caching, extended thinking, Computer Use (big UX wins). Gemini has Grounding with Google Search, native video. DeepSeek has reasoning models.
+
+The follow-up track's Phase 5 lands these.
+
+---
+
+## Lessons (Tech Lead Process)
+
+1. **Surface gaps as they appear, not at the checkpoint.** If a task is going to be deferred mid-phase, say so immediately — don't footnote it later.
+2. **Be explicit about architectural deviations.** The `src/models.py` PROVIDERS sprawl should have been raised at Phase 2, not at Phase 5.
+3. **Plan for the test infrastructure before coding.** The tool-loop regression wasn't caught because no test exercised the loop.
+4. **The "footnote for now" pattern is bad UX.** It looks like the work was hidden until called out. Either ship the work or be explicit about deferring it BEFORE doing the work.
+
+## Follow-Up Track
+
+`conductor/tracks/qwen_llama_grok_followup_20260611/` — 5 phases:
+- Phase 1: Tool loop lift (run_with_tool_loop helper for 8 vendors)
+- Phase 2: PROVIDERS move (out of src/models.py)
+- Phase 3: UX adaptations 2-9 (8 of 9 deferred from parent Phase 5)
+- Phase 4: Local-first + matrix v2 expansion (12 new fields)
+- Phase 5: Anthropic / Gemini / DeepSeek migration
+
+## Parent Track Status
+
+`qwen_llama_grok_integration_20260606` is **NOT being archived** (per user directive). It stays open in `conductor/tracks/` for the follow-up to use as a reference. Phase 6 docs are being done now; the track folder remains at the same path.
+
+## See Also
+
+- `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md` — the follow-up spec
+- `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` — the follow-up state
+- `conductor/tracks/qwen_llama_grok_followup_20260611/TODO.md` — the setup checklist
+- `conductor/tracks/qwen_llama_grok_integration_20260606/` — the parent track