docs(reports): add Phase 5 partial session-end report

5 of 8 Phase 5 tasks done in this session: - t5_1/2/3: matrix entries for the 3 remaining vendors (anthropic, gemini, deepseek) - 21 new entries - t5_4: visibility-only v2 capability badges in GUI - t5_5: docs updated (guide_ai_client.md + guide_models.md) Remaining 3 tasks (t5_6/7/8: tool-loop conversion for anthropic/gemini/deepseek) are multi-day refactors deferred to a follow-up track. 11 new tests (118 total, was 107); 3 audit scripts pass.
2026-06-11 21:55:54 -04:00
parent 8519df1643
commit 740762b3a7
1 changed files with 132 additions and 0 deletions
@@ -0,0 +1,132 @@
+# qwen_llama_grok_followup_20260611 — Phase 5 Partial Session Report (2026-06-11)
+
+## TL;DR
+
+This session shipped 5 of 8 Phase 5 tasks. The remaining 3
+(t5_6, t5_7, t5_8 — vendor tool-loop conversion for
+anthropic, gemini, deepseek) are multi-day refactors and
+are deferred to a follow-up track.
+
+## Phase 5 status
+
+| Task | Status | Commit | What |
+|---|---|---|---|
+| t5_1 | ✓ | 7fee76f4 | Anthropic matrix entries (12) |
+| t5_2 | ✓ | 7fee76f4 | Gemini matrix entries (5) |
+| t5_3 | ✓ | 7fee76f4 | DeepSeek matrix entries (4) |
+| t5_4 | ✓ | c9135b05 | UI: v2 capability badges (visibility-only) |
+| t5_5 | ✓ | 88aea319 | Phase 5 docs |
+| t5_6 | ⏸ | — | anthropic tool-loop (3-5 days; follow-up track) |
+| t5_7 | ⏸ | — | gemini tool-loop (3-5 days; follow-up track) |
+| t5_8 | ⏸ | — | deepseek tool-loop (1-2 days; follow-up track) |
+
+Phase 5 checkpoint: `3a4b476` (5 of 8 tasks done).
+
+## What this session added
+
+### Matrix entries for 3 vendors (commit 7fee76f4)
+
+Previously the 3 vendors had no registry entries and
+`get_capabilities('anthropic', ...)` raised `KeyError`,
+causing the GUI to fall back to the "unregistered" defaults
+(vision=False, no caching, etc.). Now all 8 vendors in
+PROVIDERS are on the matrix:
+
+- **Anthropic** (12 entries): wildcard + 4 sonnet + 6 opus
+  + haiku + claude-fable-5. Caching, structured_output,
+  file_search, mcp_support, computer_use all True.
+- **Gemini** (5 entries): wildcard + 3.1-pro-preview +
+  3-flash-preview + 2.5-flash + 2.5-flash-lite. Caching,
+  vision, grounding, structured_output, video, audio all
+  per the actual Gemini capabilities.
+- **DeepSeek** (4 entries): wildcard + v3 + reasoner + r1.
+  Reasoning for r1/reasoner, structured_output for all.
+
+### V2 capability badges in GUI (commit c9135b05)
+
+A new module-level function `_render_v2_capability_badges(caps)`
+in `src/gui_2.py` renders small green badges in the provider
+panel for each of the 11 v2 fields where `caps.<field> = True`.
+The user can see at a glance which capabilities their active
+vendor+model supports.
+
+This is **visibility-only** — not interactive toggles, panels,
+or attachment buttons. The interactive UI for the 11 fields
+is design work deferred to a follow-up track.
+
+### Audit script fix (commit 1577cca5)
+
+The `scripts/audit_no_inline_tool_loops.py` had a stale
+exclusion list entry `'gemini_native'` (a non-existent
+function name). Removed. Now correctly excludes
+`anthropic`, `gemini`, `deepseek` (the 3 actually-deferred
+vendors).
+
+### Docs updates (commit 88aea319)
+
+- `docs/guide_ai_client.md`: new sections on
+  `run_with_tool_loop`, native Ollama adapter, V2
+  Capability Matrix, PROVIDERS location.
+- `docs/guide_models.md`: new sections on PROVIDERS
+  Constant (location change) and V2 Capability Matrix
+  (how to add a new v2 field per the HARD RULE).
+
+These were stale; they still described the v1 matrix and
+the old "inline tool loop" pattern.
+
+## Verification
+
+| Test | Before | After |
+|---|---|---|
+| Total tests | 107 | 118 (+11) |
+| Vendors with matrix entries | 5 of 8 | 8 of 8 |
+| Vendors using `run_with_tool_loop` | 4 of 8 | 4 of 8 (unchanged; gemini_cli was already migrated last session) |
+| Audit scripts passing | 3 | 3 |
+
+The 11 new tests: 9 matrix-entry tests (anthropic × 4,
+gemini × 3, deepseek × 2) + 2 badge-helper tests.
+
+## What's deferred to a follow-up track
+
+The remaining 3 Phase 5 tasks are all in the "vendor tool-loop
+conversion" category. Each is a multi-day refactor (per the
+Groq+Llama+Qwen conversion complexity in the parent track):
+
+| Task | Vendor | Estimated work |
+|---|---|---|
+| t5_6 | anthropic | 3-5 days |
+| t5_7 | gemini | 3-5 days |
+| t5_8 | deepseek | 1-2 days |
+
+**Recommended approach**: Plan these as a separate track with
+its own `spec.md` + `plan.md`. Each vendor should have its own
+TDD cycle (Red → Green → Refactor) with one vendor per phase.
+The audit script's `DEFERRED_VENDORS` frozenset can be emptied
+incrementally as each phase ships.
+
+## State file summary
+
+After this session, `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml` has:
+- 41 tasks (was 41; same count, statuses updated)
+- 6 phases (phase_1-4 completed; phase_5 in_progress; phase_6 pending)
+- 12 verification fields (`phase_4_local_first_and_matrix_v2: true`; the rest false)
+- Phase 5 checkpoint SHA: `3a4b476`
+
+## Commits this session (8 total)
+
+1. `ab9f65da` — set current_phase=5
+2. `1577cca5` — fix(audit): remove stale gemini_native
+3. `7fee76f4` — feat(capability_matrix): anthropic, gemini, deepseek entries
+4. `c9135b05` — feat(gui): v2 capability badges
+5. `88aea319` — docs(guides): run_with_tool_loop, native Ollama, v2 matrix, PROVIDERS
+6. `b3cfb51e` — conductor(plan): mark t5_5 complete
+7. `3a4b476` — conductor(checkpoint): Phase 5 partial
+8. `8519df16` — conductor(plan): Phase 5 checkpoint SHA recorded
+
+## See Also
+
+- Phase 1-4 session-end report: `docs/reports/qwen_llama_grok_followup_session_end_20260611.md`
+- Deferred work resolution: `docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md`
+- Meta Llama API verification: `docs/reports/meta_llama_api_verification_20260611.md`
+- State file: `conductor/tracks/qwen_llama_grok_followup_20260611/state.toml`
+- Track folder: `conductor/tracks/qwen_llama_grok_followup_20260611/`