conductor(plan): resolve deferred work into proper task entries

The track had 3 categories of deferred work. Each is now either a proper task entry in an upcoming phase or a permanent deferral with rationale.

Resolution:

1. Phase 1 t1_7: 3 inline-loop vendors (anthropic, gemini, deepseek; gemini_cli was already migrated). Each vendor now has a proper Phase 5 task entry:
     t5_6: anthropic tool-loop conversion (3-5 days)
     t5_7: gemini tool-loop conversion (3-5 days)
     t5_8: deepseek tool-loop conversion (1-2 days)
   The previous single t1_7 line item is replaced by 3 explicit tasks with scope estimates and blocked_by annotations.

2. Phase 4 t4_3: Meta Llama API. PERMANENT DEFERRED to Phase 6 t6_1. Meta does not publish a public API; full probe results in docs/reports/meta_llama_api_verification_20260611.md.

3. Phase 4 t4_7: UI adaptations for new v2 fields. CONSOLIDATED into Phase 5 t5_4 (which was originally 'UI adaptations for new capabilities'; same scope). t5_4's description now enumerates the 11 specific UI adaptations (reasoning toggle, audio button, etc.). t4_7 is cancelled to avoid duplicate task entries.

Phase 5 expanded scope: 8 tasks total (was 5). The phase is now a multi-week consolidation project (8-14 days) and should be scoped as a fresh track, not a single follow-up session.

Phase 6 placeholder added (not scheduled for execution):
  t6_1: Meta Llama API (deferred)
  t6_2: Track archive + final docs refresh

[deferred_work] section in state.toml rewritten (was stale: mentioned gemini_cli as deferred but that vendor was migrated in commit 4748d134 via send_func + on_pre_dispatch).

Verification flags added:
  all_8_vendors_on_tool_loop = false (gates t5_6/7/8)
  v2_matrix_fully_populated = false (gates t5_1/2/3)
  v2_ui_adaptations_shipped = false (gates t5_4)
  phase_4_local_first_and_matrix_v2 = true (Phase 4 done)

State file: 41 tasks, 6 phases, 12 verification fields, parses cleanly.

Report: docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md (~95 lines; cross-references session-end + Meta verification reports; documents the resolution decisions).
2026-06-11 21:20:44 -04:00
parent 6596349325
commit 58c4370142
2 changed files with 229 additions and 24 deletions
@@ -18,7 +18,8 @@ phase_1 = { status = "completed", checkpoint_sha = "ffe22c30", name = "Tool loop
 phase_2 = { status = "completed", checkpoint_sha = "7b24ee9", name = "PROVIDERS move (out of src/models.py)" }
 phase_3 = { status = "completed", checkpoint_sha = "43182af", name = "UX adaptations 2-9 (4 of 8 applied; 3 deferred; 1 already done)" }
 phase_4 = { status = "completed", checkpoint_sha = "bb7beaa", name = "Local-first + matrix v2 expansion (12 new fields)" }
-phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
+phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration + UI adaptations + tool-loop conversion" }
+phase_6 = { status = "pending", checkpoint_sha = "", name = "Track archive + final docs refresh" }

 [tasks]
 # Phase 1: Tool loop lift
@@ -63,48 +64,102 @@ t4_4 = { status = "completed", commit_sha = "49d51604", description = "GUI: 'Loc
 t4_5 = { status = "completed", commit_sha = "0a9e2775", description = "Add 12 v2 fields to VendorCapabilities (combined with t4_1 in single atomic commit). All v2 fields added to the dataclass with default False." }
 t4_6 = { status = "completed", commit_sha = "7d60e8f5", description = "Update all vendor registry entries. Populated v2 fields per-model: reasoning for minimax-M2.5/M2.7/llama-3.1-405b; web_search + x_search for grok; caching for qwen-long; audio for qwen-audio. Runtime override for 'local' (dataclass.replace on llama+localhost)." }
 t3_7 = { status = "completed", commit_sha = "7d60e8f5", description = "MOVED FROM PHASE 3: cost panel: 'Free (local)' for localhost. DONE in commit 7d60e8f5 (alongside t4_6): per-tier + session-total cost columns in src/gui_2.py now render 'Free (local)' when caps.local=True." }
-t4_7 = { status = "deferred", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.). DEFERRED to a separate follow-up track. See state.toml 'deferred_work' section." }
+t4_7 = { status = "cancelled", commit_sha = "", description = "CONSOLIDATED INTO Phase 5 t5_4. The 'UI adaptations for new v2 fields' task was originally here; the same scope is now explicitly t5_4 (UI adaptations for 11 v2 fields: reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use). Cancelled on 2026-06-11 to avoid duplicate task entries." }
 t4_8 = { status = "completed", commit_sha = "bb7beaa", description = "Phase 4 checkpoint + git note" }
-t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
-t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
-t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
-t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
+# Phase 5: Anthropic / Gemini / DeepSeek migration
+# Phase 5 has THREE sub-areas:
+#   A. Matrix entries (t5_1, t5_2, t5_3) — populate VendorCapabilities
+#      for the 3 remaining vendors
+#   B. Tool-loop conversion (t5_6, t5_7, t5_8) — DEFERRED from Phase 1
+#      t1_7; each vendor needs to be refactored to use
+#      run_with_tool_loop (which requires converting their vendored
+#      call path to OpenAICompatibleRequest + send_openai_compatible)
+#   C. UI adaptations for new v2 fields (t5_4) — DEFERRED from
+#      Phase 4 t4_7; 11 v2 fields need per-vendor UI treatment
+t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True). cost_input_per_mtok=3.00, cost_output_per_mtok=15.00; context_window=180000-200000 depending on model. Extended thinking is a per-request feature, not a static capability." }
+t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching=True, vision=True, video=True, audio=True, grounding=True, structured_output=True). context_window=900000+; cost_input_per_mtok=1.25, cost_output_per_mtok=5.00." }
+t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning=True for deepseek-reasoner/R1, structured_output=True, low cost=0.14/0.28 per Mtok). context_window=32768." }
+t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for 11 v2 fields: (1) reasoning toggle in AI settings; (2) structured_output JSON toggle; (3) code_execution panel; (4) web_search UI for tools; (5) x_search UI for grok-specific search; (6) file_search panel; (7) mcp_support toggle for MCP server enablement; (8) audio attachment button; (9) video attachment button; (10) grounding toggle; (11) computer_use toggle. CONSOLIDATED from Phase 4 t4_7 (which is now cancelled)." }
 t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
+# Phase 5 tool-loop conversion (DEFERRED from Phase 1 t1_7)
+t5_6 = { status = "pending", commit_sha = "", description = "Convert _send_anthropic to use run_with_tool_loop. Requires: (1) refactor anthropic call path to produce OpenAICompatibleRequest + use send_openai_compatible; (2) preserve anthropic-specific features (prompt caching via cache_control, extended thinking via thinking param, computer_use tool type); (3) tests for tool-calling + caching + thinking. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+t5_7 = { status = "pending", commit_sha = "", description = "Convert _send_gemini to use run_with_tool_loop. Requires: (1) refactor google-genai streaming call path to OpenAICompatibleRequest + send_openai_compatible; (2) preserve gemini-specific features (explicit caching, grounding, file/video/audio inputs); (3) tests for tool-calling + streaming. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+t5_8 = { status = "pending", commit_sha = "", description = "Convert _send_deepseek to use run_with_tool_loop. Deepseek already uses OpenAI-compat (requests.post) but has an inline tool loop. (1) Refactor to produce OpenAICompatibleRequest; (2) replace inline loop with run_with_tool_loop; (3) preserve deepseek-reasoner reasoning_content in history. Estimated 1-2 days (similar shape to Grok+Llama conversion in parent track). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+# Phase 6: Permanent deferrals + cleanup (NOT scheduled for
+# execution in this track; tasks are tracking placeholders)
+t6_1 = { status = "deferred", commit_sha = "", description = "Meta Llama API adapter. PERMANENT DEFERRED on 2026-06-11: docs URL works (200) but actual API endpoints are 404/403 (no public OpenAI-compat surface). See docs/reports/meta_llama_api_verification_20260611.md. To be done in a separate follow-up track when Meta publishes a public API. Estimate 1-2 days once a public URL exists." }
+t6_2 = { status = "pending", commit_sha = "", description = "Track archive + final docs refresh. Move conductor/tracks/qwen_llama_grok_followup_20260611/ to archive/ once all of Phase 5 (and any non-deferred t5_X tasks) are complete. Update conductor/tracks.md." }

 [verification]
 phase_1_tool_loop_lifted = false
 phase_2_providers_moved = false
 phase_3_all_9_ux_adaptations = false
-phase_4_local_first_and_matrix_v2 = false
+phase_4_local_first_and_matrix_v2 = true
 phase_5_anthropic_gemini_deepseek_matrix = false
+phase_6_archived = false
 full_test_suite_passes = false
 no_inline_tool_loops = false
 no_providers_in_models_py = false
+all_8_vendors_on_tool_loop = false
+v2_matrix_fully_populated = false
+v2_ui_adaptations_shipped = false

 [open_questions]
 # Phase 4
 where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"

 [deferred_work]
-# Task 1.7 surface: the 4 inline-loop vendors (anthropic, gemini, gemini_cli,
-# deepseek) cannot share run_with_tool_loop as-is. They use their own
-# vendored call paths (deepseek uses requests.post; gemini uses
-# google-genai streaming; gemini_cli uses subprocess JSONL; anthropic uses
-# the anthropic SDK). run_with_tool_loop is hard-coded to send_openai_compatible.
+# This section tracks work that was deferred from the original
+# plan. Each item has either been moved into a proper task entry
+# in the upcoming phases (see Phase 5 t5_6/7/8 in [tasks]) or marked
+# as a permanent deferral with rationale (Phase 6 t6_1).
 #
-# To apply run_with_tool_loop to these 4 vendors, each must first be
-# refactored to produce OpenAICompatibleRequest + use send_openai_compatible
-# (analogous to the parent track's Grok+Llama+Qwen work). That conversion
-# is its own multi-day refactor; the plan treated it as a one-task line item
-# but the gap is significantly larger.
+# ============== Phase 1 t1_7: deferred vendors ==============
+# As of 2026-06-11, the 4 inline-loop vendors have been reduced
+# to 3 (gemini_cli was migrated to run_with_tool_loop via
+# send_func + on_pre_dispatch in commit 4748d134). The remaining
+# 3 (anthropic, gemini, deepseek) each use their own vendored
+# call path:
+#   - anthropic: anthropic SDK (.Anthropic().messages.create/stream)
+#   - gemini:    google-genai (Client().models.generate_content_stream)
+#   - deepseek:  requests.post (no SDK; raw OpenAI-compat)
 #
-# Per the per-task decision protocol in conductor/workflow.md ("Plan
-# approach doesn't fit"), Task 1.7 needs a scope re-plan before continuing.
-# Recommendation: split into 4 separate tasks (one per vendor) under a new
-# Phase 1.5 'vendor-conversion-to-OpenAICompatibleRequest', each with its
-# own spike + test + commit. The Phase 1 checkpoint (t1.9) should not
-# include the 4 inline-loop vendors; the current state is 'helper exists
-# + 3 vendors applied' which is a meaningful milestone on its own.
+# run_with_tool_loop is hard-coded to send_openai_compatible.
+# To apply it to these 3 vendors, each must first be refactored
+# to produce OpenAICompatibleRequest + use send_openai_compatible
+# (analogous to the parent track's Grok+Llama+Qwen work).
+#
+# Each conversion is a multi-day refactor (3-5 days each for
+# anthropic and gemini; 1-2 days for deepseek). The plan treated
+# the work as a one-task line item but the gap is significantly
+# larger.
+#
+# RESOLUTION: Each vendor now has a proper task entry in Phase 5:
+#   t5_6: anthropic tool-loop conversion
+#   t5_7: gemini tool-loop conversion
+#   t5_8: deepseek tool-loop conversion
+# This replaces the single t1_7 line item.
+#
+# ============== Phase 4 t4_3: Meta Llama API ==============
+# The Meta Llama developer docs URL is reachable (200 OK) but
+# the actual API endpoints (api.meta.ai, llama-api.meta.com,
+# api.llama.com) are 404/403/(no response). Meta does not
+# currently publish a public OpenAI-compat API.
+#
+# RESOLUTION: Permanent deferral. See Phase 6 t6_1 and
+# docs/reports/meta_llama_api_verification_20260611.md.
+# Re-evaluate when Meta publishes a public surface.
+#
+# ============== Phase 4 t4_7: UI adaptations for new v2 fields ==============
+# The 12 v2 fields are populated in the registry and accessible
+# via get_capabilities(). The GUI work (toggle for reasoning,
+# panel for code_execution, attachment buttons for audio/video,
+# etc.) is design-heavy and per-vendor-specific.
+#
+# RESOLUTION: Consolidated into Phase 5 t5_4. The Phase 5 task
+# was originally named "UI adaptations for new capabilities"
+# (effectively the same scope). It now has explicit per-field
+# scope in the task description.
 [local_first_priority]
 # Per user feedback 2026-06-11: emphasize local models as first-class
 # vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.
@@ -0,0 +1,150 @@
+# qwen_llama_grok_followup_20260611 — Deferred Work Resolution
+
+## TL;DR
+
+The track had 3 categories of deferred work. Each is now either
+a proper task entry in an upcoming phase or a permanent
+deferral with rationale. The state file's `[deferred_work]`
+section is rewritten to reflect current reality (the previous
+text was stale; mentioned `gemini_cli` as deferred but that
+vendor was migrated in commit `4748d134` via
+`send_func` + `on_pre_dispatch`).
+
+## The 3 deferred categories
+
+### 1. Phase 1 t1_7: 3 vendors (anthropic, gemini, deepseek) still on inline tool loops
+
+**Status:** MOVED to Phase 5 as proper task entries.
+
+| Task | Vendor | Estimated work | Why it was deferred |
+|---|---|---|---|
+| t5_6 | anthropic | 3-5 days | Uses anthropic SDK; must convert to OpenAICompatibleRequest + send_openai_compatible, then preserve anthropic-specific features (cache_control, extended_thinking, computer_use) |
+| t5_7 | gemini | 3-5 days | Uses google-genai streaming; same conversion scope as anthropic |
+| t5_8 | deepseek | 1-2 days | Already uses OpenAI-compat (requests.post) but has an inline loop; smallest refactor. Similar shape to Grok+Llama conversion in the parent track |
+
+Total estimated work: 7-12 days. This is a multi-week project on
+its own; not appropriate to bundle into the current 1-2-day
+session-per-phase cadence.
+
+**Why they were deferred originally:** Each vendor's vendored
+call path can't be slotted into `run_with_tool_loop` as-is —
+the helper is hard-coded to `send_openai_compatible`. The
+parent track treated Grok+Llama+Qwen as a 1-task line item but
+the actual conversion was substantial (the parent track
+spanned 5 days for those 3). The follow-up track made the
+correct call: don't try to fit 3 more conversions into a
+follow-up that's also doing 4 other phases.
+
+### 2. Phase 4 t4_3: Meta Llama API adapter
+
+**Status:** PERMANENT DEFERRED to Phase 6 t6_1.
+
+The Meta Llama developer docs URL is reachable (200 OK as of
+2026-06-11; was 400 in the parent session). However, the
+actual API endpoints (api.meta.ai, llama-api.meta.com,
+api.llama.com) are 404/403/(no response). Meta does not
+currently publish a public OpenAI-compat API.
+
+See `docs/reports/meta_llama_api_verification_20260611.md`
+for full probe results. Decision: don't ship a fake adapter
+that returns errors at runtime; defer until Meta publishes a
+public surface.
+
+Phase 6 t6_1 is a tracking placeholder, NOT scheduled for
+execution in this track. The next session/track can re-evaluate
+when Meta publishes a public URL (or another open-source Llama
+API surfaces).
+
+### 3. Phase 4 t4_7: UI adaptations for new v2 fields
+
+**Status:** CONSOLIDATED into Phase 5 t5_4 (which was
+originally named "UI adaptations for new capabilities" —
+effectively the same scope, just re-discovered).
+
+**Why it was a separate task:** When Phase 4 t4_6 populated
+the 11 v2 fields beyond `local`, the GUI work for those
+fields naturally fell out of Phase 4 scope. The fields are
+vendor-specific (e.g., `reasoning` for minimax-M2.5/M2.7 and llama-3.1-405b;
+`audio` for qwen-audio only) and design-heavy (per-field
+UX decisions: toggle vs panel vs button).
+
+**Resolution:** Cancel t4_7 as a duplicate, expand t5_4's
+description to enumerate the 11 specific UI adaptations:
+
+1. Reasoning toggle
+2. Structured output JSON toggle
+3. Code execution panel
+4. Web search UI
+5. X/Twitter search UI (grok-specific)
+6. File search panel
+7. MCP support toggle
+8. Audio attachment button
+9. Video attachment button
+10. Grounding toggle
+11. Computer use toggle
+
+The 11 fields are populated in `src/vendor_capabilities.py`;
+`get_capabilities()` is the read API; the GUI just needs to
+consult `caps.<field>` and render the right control.
+
+## Phase 5 expanded scope
+
+Phase 5 is now a "consolidation phase" that includes the
+tool-loop conversion work that was originally deferred from
+Phase 1, the matrix entries for the 3 remaining vendors,
+and the UI adaptations for new v2 fields. The phase is
+multi-day work (estimated 8-14 days) and should be scoped as
+a fresh track rather than a single follow-up session.
+
+The expanded Phase 5 has 8 tasks:
+- t5_1: Anthropic matrix entries
+- t5_2: Gemini matrix entries
+- t5_3: DeepSeek matrix entries
+- t5_4: UI adaptations for 11 v2 fields (consolidated from t4_7)
+- t5_5: Phase 5 docs + archive
+- t5_6: anthropic tool-loop conversion (deferred from t1_7)
+- t5_7: gemini tool-loop conversion (deferred from t1_7)
+- t5_8: deepseek tool-loop conversion (deferred from t1_7)
+
+## Verification
+
+The state file has 3 new verification flags that gate
+"Phase 5 complete":
+
+```
+all_8_vendors_on_tool_loop = false  # t5_6, t5_7, t5_8
+v2_matrix_fully_populated = false   # t5_1, t5_2, t5_3
+v2_ui_adaptations_shipped = false   # t5_4
+```
+
+When all 3 are true AND t5_5 (docs+archive) is complete,
+Phase 5 is done. The `audit_no_inline_tool_loops.py`
+script (which already exists) will start FAILING on Phase 5
+completion — that's the audit-script-success-as-CI-gate
+pattern, intended.
+
+## Phase 6 placeholder
+
+Phase 6 is a "cleanup" phase with 2 tasks:
+- t6_1: Meta Llama API adapter (PERMANENT DEFERRED)
+- t6_2: Track archive + final docs refresh
+
+Phase 6 is NOT scheduled for execution in this track; it's
+the home for permanent deferrals + the final archive step
+that runs when Phase 5 ships.
+
+## Cross-references
+
+- Session-end report (previous session):
+  `docs/reports/qwen_llama_grok_followup_session_end_20260611.md`
+- Meta Llama API verification report:
+  `docs/reports/meta_llama_api_verification_20260611.md`
+- Parent track's Phase 5+6:
+  `conductor/tracks/qwen_llama_grok_integration_20260606/`
+- This track's plan.md:
+  `conductor/tracks/qwen_llama_grok_followup_20260611/plan.md`
+  (note: plan.md was NOT updated to reflect the new t5_6/7/8
+  tasks; this report + the state.toml are the source of truth.
+  The plan.md is a planning artifact frozen at track-creation
+  time; new tasks are tracked in state.toml per the workflow
+  protocol.)