diff --git a/docs/reports/qwen_llama_grok_followup_phase5_partial_20260611.md b/docs/reports/qwen_llama_grok_followup_phase5_partial_20260611.md
new file mode 100644
index 00000000..4497209f
--- /dev/null
+++ b/docs/reports/qwen_llama_grok_followup_phase5_partial_20260611.md
@@ -0,0 +1,132 @@
+# qwen_llama_grok_followup_20260611 — Phase 5 Partial Session Report (2026-06-11)
+
+## TL;DR
+
+This session shipped 5 of 8 Phase 5 tasks. The remaining 3
+(t5_6, t5_7, t5_8 — vendor tool-loop conversion for
+anthropic, gemini, and deepseek) are multi-day refactors and
+are deferred to a follow-up track.
+
+## Phase 5 status
+
+| Task | Status | Commit | What |
+|---|---|---|---|
+| t5_1 | ✓ | 7fee76f4 | Anthropic matrix entries (12) |
+| t5_2 | ✓ | 7fee76f4 | Gemini matrix entries (5) |
+| t5_3 | ✓ | 7fee76f4 | DeepSeek matrix entries (4) |
+| t5_4 | ✓ | c9135b05 | UI: v2 capability badges (visibility-only) |
+| t5_5 | ✓ | 88aea319 | Phase 5 docs |
+| t5_6 | ⏸ | — | anthropic tool-loop (3-5 days; follow-up track) |
+| t5_7 | ⏸ | — | gemini tool-loop (3-5 days; follow-up track) |
+| t5_8 | ⏸ | — | deepseek tool-loop (1-2 days; follow-up track) |
+
+Phase 5 checkpoint: `3a4b476` (5 of 8 tasks done).
+
+## What this session added
+
+### Matrix entries for 3 vendors (commit 7fee76f4)
+
+Previously, these 3 vendors had no registry entries, so
+`get_capabilities('anthropic', ...)` raised `KeyError` and
+the GUI fell back to the "unregistered" defaults
+(vision=False, no caching, etc.). Now all 8 vendors in
+`PROVIDERS` are on the matrix:
+
+- **Anthropic** (12 entries): wildcard + 4 sonnet + 6 opus +
+  haiku + claude-fable-5. Caching, structured_output,
+  file_search, mcp_support, and computer_use are all True.
+- **Gemini** (5 entries): wildcard + 3.1-pro-preview +
+  3-flash-preview + 2.5-flash + 2.5-flash-lite. Caching,
+  vision, grounding, structured_output, video, and audio are
+  set per model to match Gemini's actual capabilities.
+- **DeepSeek** (4 entries): wildcard + v3 + reasoner + r1.
+  Reasoning for r1/reasoner; structured_output for all.
+
+### V2 capability badges in GUI (commit c9135b05)
+
+A new module-level function `_render_v2_capability_badges(caps)`
+in `src/gui_2.py` renders a small green badge in the provider
+panel for each of the 11 v2 fields that is `True` on `caps`.
+The user can see at a glance which capabilities the active
+vendor+model combination supports.
+
+This is **visibility-only**: there are no interactive toggles,
+panels, or attachment buttons. The interactive UI for the 11
+fields is design work deferred to a follow-up track.
+
+### Audit script fix (commit 1577cca5)
+
+`scripts/audit_no_inline_tool_loops.py` had a stale
+exclusion-list entry, `'gemini_native'` (a nonexistent
+function name), which has been removed. The script now
+correctly excludes `anthropic`, `gemini`, and `deepseek`
+(the 3 vendors actually deferred).
+
+### Docs updates (commit 88aea319)
+
+- `docs/guide_ai_client.md`: new sections on
+  `run_with_tool_loop`, the native Ollama adapter, the V2
+  Capability Matrix, and the `PROVIDERS` location.
+- `docs/guide_models.md`: new sections on the `PROVIDERS`
+  constant (location change) and the V2 Capability Matrix
+  (how to add a new v2 field per the HARD RULE).
+
+Both guides were stale: they still described the v1 matrix
+and the old "inline tool loop" pattern.
+
+## Verification
+
+| Metric | Before | After |
+|---|---|---|
+| Total tests | 107 | 118 (+11) |
+| Vendors with matrix entries | 5 of 8 | 8 of 8 |
+| Vendors using `run_with_tool_loop` | 4 of 8 | 4 of 8 (unchanged; gemini_cli was already migrated last session) |
+| Audit scripts passing | 3 | 3 |
+
+The 11 new tests are 9 matrix-entry tests (anthropic × 4,
+gemini × 3, deepseek × 2) and 2 badge-helper tests.
+
+## What's deferred to a follow-up track
+
+The remaining 3 Phase 5 tasks all fall in the "vendor tool-loop
+conversion" category. Each is a multi-day refactor, judging by the
+complexity of the Groq, Llama, and Qwen conversions in the parent track:
+
+| Task | Vendor | Estimated work |
+|---|---|---|
+| t5_6 | anthropic | 3-5 days |
+| t5_7 | gemini | 3-5 days |
+| t5_8 | deepseek | 1-2 days |
+
+**Recommended approach**: Plan these as a separate track with
+its own `spec.md` and `plan.md`. Each vendor should get its own
+TDD cycle (Red → Green → Refactor), with one vendor per phase.
+The audit script's `DEFERRED_VENDORS` frozenset can be emptied
+incrementally as each phase ships.
+
+## State file summary
+
+After this session, `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` has:
+- 41 tasks (same count as before; statuses updated)
+- 6 phases (phase_1-4 completed; phase_5 in_progress; phase_6 pending)
+- 12 verification fields (`phase_4_local_first_and_matrix_v2: true`; the rest false)
+- Phase 5 checkpoint SHA: `3a4b476`
+
+## Commits this session (8 total)
+
+1. `ab9f65da` — set current_phase=5
+2. `1577cca5` — fix(audit): remove stale gemini_native
+3. `7fee76f4` — feat(capability_matrix): anthropic, gemini, deepseek entries
+4. `c9135b05` — feat(gui): v2 capability badges
+5. `88aea319` — docs(guides): run_with_tool_loop, native Ollama, v2 matrix, PROVIDERS
+6. `b3cfb51e` — conductor(plan): mark t5_5 complete
+7. `3a4b476` — conductor(checkpoint): Phase 5 partial
+8. `8519df16` — conductor(plan): Phase 5 checkpoint SHA recorded
+
+## See Also
+
+- Phase 1-4 session-end report: `docs/reports/qwen_llama_grok_followup_session_end_20260611.md`
+- Deferred work resolution: `docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md`
+- Meta Llama API verification: `docs/reports/meta_llama_api_verification_20260611.md`
+- State file: `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml`
+- Track folder: `conductor/tracks/qwen_llama_grok_followup_20260611/`
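The report describes three capability-matrix behaviours: a wildcard entry per vendor, a `KeyError` for unregistered vendors, and one badge per `True` v2 field. The sketch below illustrates that lookup under stated assumptions and is not the repository's code. Only the name `get_capabilities` and its `KeyError` on an unregistered vendor come from the report. The `Capabilities` dataclass and its four-field subset are hypothetical, as are the `(vendor, model)` keying, the `"*"` wildcard key, the `MATRIX` dict and the `badge_fields` helper.

```python
from dataclasses import dataclass, fields


# Hypothetical subset of v2 capability fields (the real matrix has 11).
@dataclass(frozen=True)
class Capabilities:
    vision: bool = False
    caching: bool = False
    structured_output: bool = False
    reasoning: bool = False


# Hypothetical layout: keyed by (vendor, model); "*" is the vendor-wide wildcard.
MATRIX: dict[tuple[str, str], Capabilities] = {
    ("deepseek", "*"): Capabilities(structured_output=True),
    ("deepseek", "reasoner"): Capabilities(structured_output=True, reasoning=True),
    ("deepseek", "r1"): Capabilities(structured_output=True, reasoning=True),
}


def get_capabilities(vendor: str, model: str) -> Capabilities:
    """Return the exact (vendor, model) entry, else the vendor's wildcard entry.

    Raises KeyError when neither exists, e.g. for a vendor with no entries.
    """
    exact = MATRIX.get((vendor, model))
    if exact is not None:
        return exact
    return MATRIX[(vendor, "*")]  # KeyError if the vendor is unregistered


def badge_fields(caps: Capabilities) -> list[str]:
    """Names of the fields that are True, i.e. the ones that would get a badge."""
    return [f.name for f in fields(caps) if getattr(caps, f.name) is True]


if __name__ == "__main__":
    print(badge_fields(get_capabilities("deepseek", "r1")))
    print(badge_fields(get_capabilities("deepseek", "unlisted-model")))  # wildcard
    try:
        get_capabilities("unregistered_vendor", "any-model")
    except KeyError:
        print("unregistered vendor -> KeyError")
```

In this sketch the wildcard is an ordinary matrix entry. A model name missing from the matrix therefore inherits the vendor-wide defaults without a code change, while a vendor with no entries still fails loudly.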