docs(phase-6): update ai_client+models guides; report + follow-up track setup

Phase 6 t6.1 + t6.2 (no archive per user directive): - docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend. - docs/guide_models.md: update PROVIDERS list to 8 entries (was 5). - conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report. - docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned). - conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration. Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.
2026-06-11 09:33:18 -04:00
parent 457255bcd4
commit 691dc584eb
8 changed files with 745 additions and 3 deletions
@@ -0,0 +1,81 @@
+# Track: Qwen, Llama & Grok Follow-Up (Post-Phase 5)
+
+This is a TODO list for setting up the follow-up track. The Tier 2 Tech Lead will execute items in order.
+
+## Status
+
+- [x] Spec drafted: `conductor/tracks/qwen_llama_grok_followup_20260611/spec.md`
+- [ ] state.toml initialized
+- [ ] metadata.json created
+- [ ] Phase 1 ready to start
+
+## Immediate TODOs (in order)
+
+1. **Read parent track state**
+   - [ ] Read `conductor/tracks/qwen_llama_grok_integration_20260606/state.toml` to confirm Phase 6 is complete
+   - [ ] Read `conductor/tracks/qwen_llama_grok_integration_20260606/plan.md` and find tasks tagged t6.* to confirm Phase 6 done
+
+2. **Create the follow-up track structure**
+   - [ ] Create `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` with 5 phases × ~7 tasks
+   - [ ] Create `conductor/tracks/qwen_llama_grok_followup_20260611/metadata.json` with verification_criteria
+
+3. **Phase 1: Tool Loop Lift (first concrete work)**
+   - [ ] Read current tool-loop patterns in `_send_minimax` (231 → 75 lines after refactor) and `_send_anthropic/_send_gemini/_send_gemini_cli/_send_deepseek` (inline loops)
+   - [ ] Design `run_with_tool_loop(client, request, capabilities, *, pre_tool_callback, qa_callback, patch_callback, base_dir, vendor_name, history_lock, history, trim_func)` helper
+   - [ ] Write 5 Red tests: no-tool-calls returns immediately, tool-calls dispatch, max-rounds limit, history appending, error-in-tool-call doesn't crash
+   - [ ] Implement helper in `src/tool_loop.py`
+   - [ ] Apply to all 8 vendors
+   - [ ] Audit script `scripts/audit_no_inline_tool_loops.py` to enforce the pattern
+   - [ ] Verify all 38+ existing tests still pass
+   - [ ] Phase 1 checkpoint
+
+4. **Phase 2: PROVIDERS Move**
+   - [ ] Decide: `src/ai_client.py` vs new `src/ai_client_providers.py` (open question in spec)
+   - [ ] Move PROVIDERS constant
+   - [ ] Update 5 import sites
+   - [ ] Add `scripts/audit_providers_source_of_truth.py`
+   - [ ] Verify all 38+ tests pass
+   - [ ] Phase 2 checkpoint
+
+5. **Phase 3: UX Adaptations 2-9**
+   - [ ] Apply each adaptation one at a time, 1-2 per commit
+   - [ ] Run live_gui tests in batch after each commit
+   - [ ] Phase 3 checkpoint when all 9 adaptations done
+
+6. **Phase 4: Local-First + Matrix Expansion**
+   - [ ] Add `local: bool` to VendorCapabilities
+   - [ ] Native Ollama adapter (verify URL https://docs.ollama.com/api/chat is up)
+   - [ ] Meta Llama API adapter (verify URL https://llama.developer.meta.com/docs/overview is up — was 400 last session)
+   - [ ] GUI: "Local Model" badge
+   - [ ] Add 12 v2 fields to VendorCapabilities
+   - [ ] Update all vendor registry entries
+   - [ ] UI adaptations for the new fields
+   - [ ] Phase 4 checkpoint
+
+7. **Phase 5: Anthropic / Gemini / DeepSeek Migration**
+   - [ ] Populate Anthropic matrix entries
+   - [ ] Populate Gemini matrix entries
+   - [ ] Populate DeepSeek matrix entries
+   - [ ] UI adaptations
+   - [ ] Docs + archive
+
+## Pre-Work Prerequisites
+
+Before starting Phase 1, confirm the parent track's Phase 6 is complete:
+- `docs/guide_ai_client.md` updated with new vendors, matrix, helper
+- `docs/guide_models.md` updated with new PROVIDERS entries
+- Parent track folder **stays open** in `conductor/tracks/` (not archived)
+- `conductor/tracks.md` reflects active status
+
+## Lessons from Parent Track (apply to this one)
+
+- **Surface gaps as they appear, not at the checkpoint.** If a task is going to be deferred mid-phase, say so immediately — don't footnote it later.
+- **Be explicit about architectural deviations.** The `src/models.py` PROVIDERS sprawl should have been raised at Phase 2, not at Phase 5.
+- **Plan for the test infrastructure before coding.** The parent track's tool-loop regression wasn't caught because no test exercised the loop. Future work: every helper gets tests BEFORE implementation.
+
+## Status
+
+- T0: Spec drafted (this file) — DONE
+- T1: Parent track Phase 6 verification — TODO
+- T2: Follow-up track files created — TODO
+- T3: Phase 1 (tool loop lift) — TODO
@@ -0,0 +1,79 @@
+{
+  "track_id": "qwen_llama_grok_followup_20260611",
+  "name": "Qwen/Llama/Grok Follow-Up (tool loop, PROVIDERS move, UX adaptations 2-9, local-first, matrix v2, Anthropic/Gemini/DeepSeek migration)",
+  "initialized": "2026-06-11",
+  "owner": "tier2-tech-lead",
+  "priority": "high",
+  "status": "active",
+  "type": "refactor + feature",
+  "scope": {
+    "new_files": [
+      "src/tool_loop.py",
+      "src/llama_ollama_native.py",
+      "src/llama_meta_api.py",
+      "tests/test_tool_loop.py",
+      "scripts/audit_no_inline_tool_loops.py",
+      "scripts/audit_providers_source_of_truth.py"
+    ],
+    "modified_files": [
+      "src/ai_client.py",
+      "src/vendor_capabilities.py",
+      "src/gui_2.py",
+      "src/models.py",
+      "tests/test_minimax_provider.py",
+      "tests/test_grok_provider.py",
+      "tests/test_llama_provider.py",
+      "tests/test_qwen_provider.py",
+      "tests/test_anthropic_provider.py",
+      "tests/test_gemini_provider.py",
+      "tests/test_deepseek_provider.py",
+      "docs/guide_ai_client.md",
+      "docs/guide_models.md"
+    ]
+  },
+  "blocked_by": {
+    "qwen_llama_grok_integration_20260606": "phase_6_in_progress"
+  },
+  "blocks": [
+    "anthropic_gemini_deepseek_capability_matrix_20260606"
+  ],
+  "estimated_phases": 5,
+  "spec": "spec.md",
+  "plan": "plan.md",
+  "state": "state.toml",
+  "todo": "TODO.md",
+  "priority_order": "A (tool loop lift + PROVIDERS move + UX 2-9) > B (local-first + matrix v2) > C (Anthropic/Gemini/DeepSeek migration)",
+  "user_directions": [
+    "2026-06-11: User wants REPORT explaining why a follow-up is needed (gaps in parent track).",
+    "2026-06-11: User wants LOCAL MODELS prioritized as first-class; current implementation treats Ollama as 'one of 3 backends' which under-emphasizes local.",
+    "2026-06-11: User wants the source-of-truth sprawl cleaned up (PROVIDERS in models.py is wrong; should be elsewhere).",
+    "2026-06-11: User wants ai_client.py further codepath consolidation; new files need review."
+  ],
+  "verification_criteria": [
+    "src/tool_loop.py:run_with_tool_loop handles no-tool-calls, dispatches tool calls, respects max-rounds, appends to history, doesn't crash on tool error",
+    "All 8 vendors (_send_minimax, _send_qwen, _send_grok, _send_llama, _send_anthropic, _send_gemini, _send_gemini_cli, _send_deepseek) use run_with_tool_loop",
+    "scripts/audit_no_inline_tool_loops.py passes (no inline tool loops in any _send_<vendor>)",
+    "PROVIDERS is no longer declared in src/models.py",
+    "scripts/audit_providers_source_of_truth.py passes",
+    "All 9 UX adaptations from parent spec §6 are applied to src/gui_2.py (1 from parent Phase 5 + 8 from this track's Phase 3)",
+    "src/llama_ollama_native.py: native Ollama adapter replaces OpenAI-compatible for Ollama backend (or used by default)",
+    "src/llama_meta_api.py: Meta Llama API adapter; new 4th backend",
+    "src/vendor_capabilities.py: 12 new v2 fields added (local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use)",
+    "All vendor registry entries updated with the new fields",
+    "Anthropic matrix entries populated (caching, extended_thinking, pdf, computer_use)",
+    "Gemini matrix entries populated (caching, grounding, video, audio)",
+    "DeepSeek matrix entries populated (reasoning, low_cost)",
+    "GUI: 'Local Model' badge added to AI Settings panel",
+    "GUI: 4 cost panel states (estimate / 'Free (local)' / '-' / new local-no-cost state)",
+    "All existing tests still pass (38+ in batch; full suite has pre-existing live_gui flakes)",
+    "No new threading.Thread calls",
+    "docs/guide_ai_client.md + docs/guide_models.md updated"
+  ],
+  "links": {
+    "parent_track": "conductor/tracks/qwen_llama_grok_integration_20260606/",
+    "parent_spec": "conductor/tracks/qwen_llama_grok_integration_20260606/spec.md",
+    "ai_client_guide": "docs/guide_ai_client.md",
+    "models_guide": "docs/guide_models.md",
+    "follow_up_audit_report": "docs/reports/qwen_llama_grok_followup_20260611.md (TBD; will be created in Phase 5)"
+  }
+}
@@ -0,0 +1,237 @@
+# Track: Qwen, Llama & Grok Follow-Up (Post-Phase 5)
+
+**Status:** Active (initializing)
+**Initialized:** 2026-06-11
+**Owner:** Tier 2 Tech Lead
+**Priority:** High (architectural consolidation + UX payoff; user is rightly concerned that the parent track shipped with gaps)
+
+---
+
+## Why This Track Exists
+
+The parent track `qwen_llama_grok_integration_20260606` (status: 50/79 tasks done, Phase 6 in progress) shipped 5 phases cleanly but **left meaningful gaps** that the Tier 2 Tech Lead did not surface until the Phase 5 checkpoint. This track captures the deferred work, ordered by impact.
+
+**The Tier 2's failure mode** (called out by the user 2026-06-11): "you never even told me until now and then you just say 'oh yeah we're done btw, fuck you' thats what it feels like." Rightly called. This track exists to fix that.
+
+---
+
+## Goals (Priority Order)
+
+| Priority | Goal | Rationale |
+|---|---|---|
+| **A (architectural)** | Lift the tool-call loop into a shared `run_with_tool_loop()` helper. Apply to all 4 new vendors + the 4 existing vendors. | Today only `_send_minimax` has a working tool loop. Qwen/Grok/Llama are single-shot (regression). Anthropic/Gemini/Gemini-cli/DeepSeek already have inline tool loops (4-way duplication). Lifting gives one place to fix bugs + add new behavior. |
+| **A (architectural)** | Move `PROVIDERS` out of `src/models.py`. | `src/models.py` is for MMA data models (Tickets, Tracks, FileItem). The vendor list is an AI client concern. The audit script `audit_no_models_config_io.py` enforces config I/O rules; PROVIDERS has no analogous enforcement. Move to `src/ai_client.py` (or new `src/ai_client_providers.py`); add an audit script that enforces the move. |
+| **A (UX payoff)** | Apply the remaining 8 of 9 UX adaptations from parent track spec §6: tools toggle (tool_calling), cache panel (caching), stream progress (streaming), fetch models (model_discovery), token budget max (context_window), cost panel × 3. | The pattern is established (adaptation 1 shipped in parent Phase 5); the helper `_get_active_capabilities()` is in place; the remaining 8 are mechanical applications. |
+| **B (local-first)** | Promote local models from "one of 3 backends" to first-class. | Add `local_backend: bool` capability field (separate from `cost_tracking`). Native Ollama (`/api/chat`) as the default for Llama (not the OpenAI-compatible fallback). Add Meta Llama API as a 4th backend. Add a "Local Model" UI badge. |
+| **B (matrix expansion)** | Land the v2 matrix fields: `local`, `reasoning`, `structured_output`, `code_execution`, `web_search`, `x_search`, `file_search`, `mcp_support`, `audio`, `video`, `grounding`, `computer_use`. | These are the 12 fields documented in parent spec §3.1.1 after the Grok consultation. None wired today. Each addition is registry + UI adaptation. |
+| **C (provider coverage)** | Migrate Anthropic / Gemini / DeepSeek onto the capability matrix. | Anthropic has prompt caching, extended thinking, Computer Use (high-value UX). Gemini has Grounding with Google Search, native video. DeepSeek has reasoning models. None of these capabilities are exposed in the GUI today. |
+| **C (codepath consolidation)** | Reduce `src/ai_client.py` line count (currently 2784). | The 8 vendors' inline patterns have grown. Lifting history management, reasoning content extraction, error classification per HTTP code into shared helpers would cut ~30-40% of the file. |
+
+### Non-Goals (this track)
+
+- Not changing the matrix schema (the 7 v1 fields are good; v2 is additive)
+- Not changing the shared `send_openai_compatible` helper (it works; the tool loop is separate)
+- Not changing the `vendor_capabilities.py` lookup pattern (it works; registry is the source of truth)
+- Not adding new vendors (the parent track added Qwen/Grok/Llama; this track only consolidates what's there)
+
+---
+
+## Architecture
+
+### A.1 Tool Loop Lift
+
+Today:
+```python
+# in _send_minimax (only):
+for _round in range(MAX_TOOL_ROUNDS + 2):
+    request = OpenAICompatibleRequest(...)
+    response = send_openai_compatible(client, request, capabilities=caps)
+    if not response.tool_calls: return response.text
+    results = asyncio.run(_execute_tool_calls_concurrently(response.tool_calls, ...))
+    # ... append results to history ...
+
+# in _send_qwen, _send_grok, _send_llama: no loop (single-shot, regression)
+# in _send_anthropic, _send_gemini, _send_gemini_cli, _send_deepseek: inline loop (4-way duplication)
+```
+
+After:
+```python
+# new src/tool_loop.py:
+def run_with_tool_loop(
+    client, request, capabilities, *,
+    pre_tool_callback, qa_callback, patch_callback,
+    base_dir, vendor_name, history_lock, history, trim_func,
+) -> str:
+    """Wraps send_openai_compatible with a tool-call loop. Works for any
+    OpenAI-compatible vendor; vendor-specific logic (history mgmt,
+    trim, message format) is injected via parameters."""
+    ...
+
+# in each _send_<vendor>:
+response = run_with_tool_loop(
+    client=_ensure_<vendor>_client(),
+    request=OpenAICompatibleRequest(...),
+    capabilities=get_capabilities(vendor, _model),
+    pre_tool_callback=..., qa_callback=..., patch_callback=...,
+    base_dir=base_dir, vendor_name="<vendor>",
+    history_lock=_<vendor>_history_lock,
+    history=_<vendor>_history,
+    trim_func=_(vendor)_trim_history,
+)
+```
+
+The helper takes history management as injected parameters (each vendor has its own lock and history list). The tool dispatch (`_execute_tool_calls_concurrently`) takes a `vendor_name` string.
+
+### A.2 PROVIDERS Move
+
+Today:
+```python
+# src/models.py:79
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
+```
+
+After:
+```python
+# src/ai_client.py (new location) or src/ai_client_providers.py (new file)
+PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
+
+# src/models.py: import from src.ai_client or keep as re-export shim for backward compat
+```
+
+The audit script: add `scripts/audit_providers_source_of_truth.py` that verifies PROVIDERS is not declared in `src/models.py`. Fails the build if regressed.
+
+### A.3 UX Adaptations 2-9
+
+Same pattern as the shipped adaptation 1 (Screenshot button iff vision). For each render site:
+```python
+caps = app._get_active_capabilities()
+imgui.begin_disabled(not caps.<field>)
+... UI ...
+imgui.end_disabled()
+if not caps.<field>:
+    imgui.same_line()
+    imgui.text_disabled("(reason)")
+```
+
+### B.1 Local-First Architecture
+
+- Add `local_backend: bool` to `VendorCapabilities` (default False)
+- Set True for Llama (when base_url is localhost/127.0.0.1)
+- Native Ollama adapter: `src/llama_ollama_native.py` (separate from openai_compatible)
+- Meta Llama API adapter: `src/llama_meta_api.py` (verify docs first)
+- GUI: new "Local Model" badge in the AI Settings panel
+- Cost panel: 4th state "Local (no cost)" distinct from "Free (local)" and "—"
+
+### B.2 Matrix Expansion (v2)
+
+Add to `VendorCapabilities`:
+- `local: bool` (B.1)
+- `reasoning: bool` (xAI `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
+- `structured_output: bool` (response_format / format)
+- `code_execution: bool` (xAI code_interpreter, Anthropic Computer Use, Gemini Code Execution)
+- `web_search: bool` (xAI web_search, Gemini Grounding)
+- `x_search: bool` (xAI X/Twitter search, xAI-specific)
+- `file_search: bool` (xAI file_search, Anthropic PDF, Gemini file API)
+- `mcp_support: bool` (xAI mcp_calls, Anthropic MCP)
+- `audio: bool` (Qwen-Audio, Gemini audio)
+- `video: bool` (Gemini video)
+- `grounding: bool` (Gemini Grounding with Google Search)
+- `computer_use: bool` (Anthropic Computer Use)
+
+Each new field is a registry update + a UI adaptation. The matrix schema grows; the GUI filters based on the matrix.
+
+### C.1 Anthropic / Gemini / DeepSeek Migration
+
+Per the deferred follow-up track `anthropic_gemini_deepseek_capability_matrix_20260606` (parent spec §13.1.A). The capability matrix entries for these vendors can be populated:
+- `anthropic/*` with `caching: True` (prompt caching), `extended_thinking: True`, `pdf: True`, `computer_use: True`
+- `gemini/*` with `caching: True` (explicit cache), `grounding: True`, `video: True`, `audio: True`
+- `deepseek/*` with `reasoning: True` (R1), `low_cost: True`
+
+The implementations (`_send_anthropic`, `_send_gemini`, `_send_deepseek`) keep their unique per-vendor code paths. The matrix entries are the source of truth for the UI.
+
+---
+
+## Phase Plan (5 phases, 4 weeks of work)
+
+### Phase 1: Tool Loop Lift (1-2 weeks)
+- T1.1: Write red tests for `run_with_tool_loop` (5 tests covering: no tool calls returns immediately, tool calls dispatch, max rounds limit, history appending, error in tool call doesn't crash)
+- T1.2: Implement `src/tool_loop.py` with `run_with_tool_loop`
+- T1.3: Apply to `_send_minimax` (replace inline loop)
+- T1.4: Apply to `_send_qwen`, `_send_grok`, `_send_llama` (add the missing loop)
+- T1.5: Apply to `_send_anthropic`, `_send_gemini`, `_send_gemini_cli`, `_send_deepseek` (consolidate)
+- T1.6: Verify all 8 vendors' existing tests still pass
+- T1.7: Audit script `scripts/audit_no_inline_tool_loops.py` to enforce the pattern
+
+### Phase 2: PROVIDERS Move (1 week)
+- T2.1: Move `PROVIDERS` to `src/ai_client.py` (or new `src/ai_client_providers.py`)
+- T2.2: Update all 5 import sites (gui_2.py, app_controller.py, etc.) to point to new location
+- T2.3: Add `scripts/audit_providers_source_of_truth.py` to enforce the move
+- T2.4: Verify all 38+ tests pass
+
+### Phase 3: UX Adaptations 2-9 (1-2 weeks)
+- T3.1: Apply adaptation 2 (tools toggle iff tool_calling)
+- T3.2: Apply adaptation 3 (cache panel iff caching)
+- T3.3: Apply adaptation 4 (stream progress iff streaming)
+- T3.4: Apply adaptation 5 (fetch models iff model_discovery)
+- T3.5: Apply adaptation 6 (token budget max = context_window)
+- T3.6: Apply adaptation 7 (cost panel: estimate)
+- T3.7: Apply adaptation 8 (cost panel: "Free (local)" for localhost)
+- T3.8: Apply adaptation 9 (cost panel: "—" for other cost_tracking=false)
+- T3.9: Verify live_gui tests pass
+
+### Phase 4: Local-First + Matrix Expansion (1-2 weeks)
+- T4.1: Add `local: bool` to VendorCapabilities; update registry for Llama
+- T4.2: Native Ollama adapter (`src/llama_ollama_native.py`); replace OpenAI-compatible for Ollama backend
+- T4.3: Meta Llama API adapter (`src/llama_meta_api.py`); add as 4th Llama backend
+- T4.4: GUI: "Local Model" badge
+- T4.5: Add v2 fields (local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use)
+- T4.6: Update all vendor registry entries with the new fields
+- T4.7: Add UI adaptations for the new fields (e.g., "Reasoning" toggle, "Code execution" panel)
+
+### Phase 5: Anthropic / Gemini / DeepSeek Migration (1-2 weeks)
+- T5.1: Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)
+- T5.2: Populate Gemini matrix entries (caching, grounding, video, audio)
+- T5.3: Populate DeepSeek matrix entries (reasoning, low_cost)
+- T5.4: UI adaptations for the new capabilities
+- T5.5: Docs + archive
+
+---
+
+## Testing Strategy
+
+- All new helpers (`run_with_tool_loop`) get TDD: Red tests first, then implementation
+- All UX adaptations get a test that verifies the render function reads the capability
+- All audit scripts get a self-test (the script can detect its own absence)
+- Live_gui tests run in batch (per the docs_sync lessons: bisect in batch, not isolation)
+
+---
+
+## Risks
+
+- **Tool loop lift risk:** Anthropic and Gemini have unique tool-use formats (Anthropic uses `tool_use` blocks; Gemini uses `functionCall`). Lifting requires careful preservation. Mitigation: keep the per-vendor `tool_format_converter` injection as a parameter.
+- **PROVIDERS move risk:** 5 import sites to update; some might use `from src.models import PROVIDERS` and break. Mitigation: search-and-replace audit, run full test suite after.
+- **UX adaptation risk:** Same as parent Phase 5 — touching 260KB of GUI code is high risk. Mitigation: ship 1-2 per commit, run live_gui batch after each.
+
+---
+
+## Open Questions
+
+1. **`src/ai_client_providers.py` vs `src/ai_client.py`?** Should PROVIDERS go in a new file (clearer separation) or stay in the main ai_client module (less file proliferation)?
+2. **Meta Llama API spec verification:** The 400 error on the docs URL last session — is it back up? If not, defer the Meta backend.
+3. **Local model as separate UI mode?** Should the GUI have a "Local / Cloud / All" filter on the provider dropdown, or just show the local badge per-vendor?
+
+---
+
+## See Also
+
+- Parent track: `conductor/tracks/qwen_llama_grok_integration_20260606/`
+- Parent spec: `conductor/tracks/qwen_llama_grok_integration_20260606/spec.md`
+- Parent Phase 5 report: `docs/reports/qwen_llama_grok_integration_20260610.md` (TBD)
+- `docs/guide_ai_client.md` — the doc that needs updating in Phase 6 of the parent track
+
+---
+
+## Status
+
+- T0: Spec drafted (this file)
+- T1: Phase 1 (tool loop lift) ready to start
@@ -0,0 +1,86 @@
+# Track state for qwen_llama_grok_followup_20260611
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = "qwen_llama_grok_followup_20260611"
+name = "Qwen/Llama/Grok Follow-Up (tool loop, PROVIDERS move, UX adaptations 2-9, local-first, matrix v2, Anthropic/Gemini/DeepSeek migration)"
+status = "active"
+current_phase = 0
+last_updated = "2026-06-11"
+
+[blocked_by]
+# This follow-up is blocked on the parent track's Phase 6 (docs) completing.
+qwen_llama_grok_integration_20260606 = "phase_6_in_progress"
+
+[phases]
+phase_1 = { status = "pending", checkpoint_sha = "", name = "Tool loop lift (run_with_tool_loop helper for 8 vendors)" }
+phase_2 = { status = "pending", checkpoint_sha = "", name = "PROVIDERS move (out of src/models.py)" }
+phase_3 = { status = "pending", checkpoint_sha = "", name = "UX adaptations 2-9 (8 of 9 deferred from parent Phase 5)" }
+phase_4 = { status = "pending", checkpoint_sha = "", name = "Local-first + matrix v2 expansion (12 new fields)" }
+phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
+
+[tasks]
+# Phase 1: Tool loop lift
+t1_1 = { status = "pending", commit_sha = "", description = "Read tool-loop patterns in _send_minimax + the 4 inline-loop vendors" }
+t1_2 = { status = "pending", commit_sha = "", description = "Design run_with_tool_loop helper signature" }
+t1_3 = { status = "pending", commit_sha = "", description = "Red: 5 tests for run_with_tool_loop in tests/test_tool_loop.py" }
+t1_4 = { status = "pending", commit_sha = "", description = "Green: implement run_with_tool_loop in src/tool_loop.py" }
+t1_5 = { status = "pending", commit_sha = "", description = "Apply to _send_minimax (replace inline loop)" }
+t1_6 = { status = "pending", commit_sha = "", description = "Apply to _send_qwen + _send_grok + _send_llama (add missing loop)" }
+t1_7 = { status = "pending", commit_sha = "", description = "Apply to _send_anthropic + _send_gemini + _send_gemini_cli + _send_deepseek (consolidate inline)" }
+t1_8 = { status = "pending", commit_sha = "", description = "Add scripts/audit_no_inline_tool_loops.py" }
+t1_9 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint + git note" }
+# Phase 2: PROVIDERS move
+t2_1 = { status = "pending", commit_sha = "", description = "Decide: src/ai_client.py vs new src/ai_client_providers.py" }
+t2_2 = { status = "pending", commit_sha = "", description = "Move PROVIDERS to new location" }
+t2_3 = { status = "pending", commit_sha = "", description = "Update 5 import sites" }
+t2_4 = { status = "pending", commit_sha = "", description = "Add scripts/audit_providers_source_of_truth.py" }
+t2_5 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint + git note" }
+# Phase 3: UX adaptations 2-9
+t3_1 = { status = "pending", commit_sha = "", description = "Adaptation 2: tools toggle iff tool_calling" }
+t3_2 = { status = "pending", commit_sha = "", description = "Adaptation 3: cache panel iff caching" }
+t3_3 = { status = "pending", commit_sha = "", description = "Adaptation 4: stream progress iff streaming" }
+t3_4 = { status = "pending", commit_sha = "", description = "Adaptation 5: fetch models iff model_discovery" }
+t3_5 = { status = "pending", commit_sha = "", description = "Adaptation 6: token budget max = context_window" }
+t3_6 = { status = "pending", commit_sha = "", description = "Adaptation 7: cost panel: estimate" }
+t3_7 = { status = "pending", commit_sha = "", description = "Adaptation 8: cost panel: 'Free (local)' for localhost" }
+t3_8 = { status = "pending", commit_sha = "", description = "Adaptation 9: cost panel: '-' for other cost_tracking=false" }
+t3_9 = { status = "pending", commit_sha = "", description = "Phase 3 checkpoint + git note" }
+# Phase 4: Local-first + matrix v2
+t4_1 = { status = "pending", commit_sha = "", description = "Add local: bool to VendorCapabilities" }
+t4_2 = { status = "pending", commit_sha = "", description = "Native Ollama adapter src/llama_ollama_native.py" }
+t4_3 = { status = "pending", commit_sha = "", description = "Meta Llama API adapter src/llama_meta_api.py" }
+t4_4 = { status = "pending", commit_sha = "", description = "GUI: 'Local Model' badge" }
+t4_5 = { status = "pending", commit_sha = "", description = "Add 12 v2 fields to VendorCapabilities" }
+t4_6 = { status = "pending", commit_sha = "", description = "Update all vendor registry entries" }
+t4_7 = { status = "pending", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.)" }
+t4_8 = { status = "pending", commit_sha = "", description = "Phase 4 checkpoint + git note" }
+# Phase 5: Anthropic / Gemini / DeepSeek migration
+t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
+t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
+t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
+t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
+t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
+
+[verification]
+phase_1_tool_loop_lifted = false
+phase_2_providers_moved = false
+phase_3_all_9_ux_adaptations = false
+phase_4_local_first_and_matrix_v2 = false
+phase_5_anthropic_gemini_deepseek_matrix = false
+full_test_suite_passes = false
+no_inline_tool_loops = false
+no_providers_in_models_py = false
+
+[open_questions]
+# Phase 4
+where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"
+
+[local_first_priority]
+# Per user feedback 2026-06-11: emphasize local models as first-class
+# vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.
+local_model_as_first_class = true
+native_ollama_default_for_llama = true
+meta_llama_api_4th_backend = true
+local_badge_in_gui = true
+distinct_cost_state_for_local = true