conductor(plan): resolve deferred work into proper task entries

The track had 3 categories of deferred work. Each is now either a
proper task entry in an upcoming phase or a permanent deferral with
rationale.

Resolution:

1. Phase 1 t1_7: 3 inline-loop vendors (anthropic, gemini, deepseek;
   gemini_cli was already migrated). Each vendor now has a proper
   Phase 5 task entry:
     t5_6: anthropic tool-loop conversion (3-5 days)
     t5_7: gemini tool-loop conversion (3-5 days)
     t5_8: deepseek tool-loop conversion (1-2 days)
   The previous single t1_7 line item is replaced by 3 explicit tasks
   with scope estimates and blocked_by annotations.

2. Phase 4 t4_3: Meta Llama API. PERMANENT DEFERRED to Phase 6 t6_1.
   Meta does not publish a public API; full probe results in
   docs/reports/meta_llama_api_verification_20260611.md.

3. Phase 4 t4_7: UI adaptations for new v2 fields. CONSOLIDATED into
   Phase 5 t5_4 (which was originally 'UI adaptations for new
   capabilities', the same scope). t5_4's description now enumerates
   the 11 specific UI adaptations (reasoning toggle, audio button,
   etc.). t4_7 is cancelled to avoid duplicate task entries.

Phase 5 expanded scope: 8 tasks total (was 5). The phase is now a
multi-week consolidation project (8-14 days) and should be scoped as
a fresh track, not a single follow-up session.

Phase 6 placeholder added (not scheduled for execution):
  t6_1: Meta Llama API (deferred)
  t6_2: Track archive + final docs refresh

[deferred_work] section in state.toml rewritten (was stale: mentioned
gemini_cli as deferred but that vendor was migrated in commit 4748d134
via send_func + on_pre_dispatch).

Verification flags added:
  all_8_vendors_on_tool_loop = false (gates t5_6/7/8)
  v2_matrix_fully_populated = false (gates t5_1/2/3)
  v2_ui_adaptations_shipped = false (gates t5_4)
  phase_4_local_first_and_matrix_v2 = true (Phase 4 done)

State file: 41 tasks, 6 phases, 12 verification fields, parses cleanly.

Report: docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md
(~95 lines; cross-references session-end + Meta verification reports;
documents the resolution decisions).
2026-06-11 21:20:44 -04:00
parent 6596349325
commit 58c4370142
2 changed files with 229 additions and 24 deletions
@@ -18,7 +18,8 @@ phase_1 = { status = "completed", checkpoint_sha = "ffe22c30", name = "Tool loop
 phase_2 = { status = "completed", checkpoint_sha = "7b24ee9", name = "PROVIDERS move (out of src/models.py)" }
 phase_3 = { status = "completed", checkpoint_sha = "43182af", name = "UX adaptations 2-9 (4 of 8 applied; 3 deferred; 1 already done)" }
 phase_4 = { status = "completed", checkpoint_sha = "bb7beaa", name = "Local-first + matrix v2 expansion (12 new fields)" }
-phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
+phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration + UI adaptations + tool-loop conversion" }
+phase_6 = { status = "pending", checkpoint_sha = "", name = "Track archive + final docs refresh" }

 [tasks]
 # Phase 1: Tool loop lift
@@ -63,48 +64,102 @@ t4_4 = { status = "completed", commit_sha = "49d51604", description = "GUI: 'Loc
 t4_5 = { status = "completed", commit_sha = "0a9e2775", description = "Add 12 v2 fields to VendorCapabilities (combined with t4_1 in single atomic commit). All v2 fields added to the dataclass with default False." }
 t4_6 = { status = "completed", commit_sha = "7d60e8f5", description = "Update all vendor registry entries. Populated v2 fields per-model: reasoning for minimax-M2.5/M2.7/llama-3.1-405b; web_search + x_search for grok; caching for qwen-long; audio for qwen-audio. Runtime override for 'local' (dataclass.replace on llama+localhost)." }
 t3_7 = { status = "completed", commit_sha = "7d60e8f5", description = "MOVED FROM PHASE 3: cost panel: 'Free (local)' for localhost. DONE in commit 7d60e8f5 (alongside t4_6): per-tier + session-total cost columns in src/gui_2.py now render 'Free (local)' when caps.local=True." }
-t4_7 = { status = "deferred", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.). DEFERRED to a separate follow-up track. See state.toml 'deferred_work' section." }
+t4_7 = { status = "cancelled", commit_sha = "", description = "CONSOLIDATED INTO Phase 5 t5_4. The 'UI adaptations for new v2 fields' task was originally here; the same scope is now explicitly t5_4 (UI adaptations for 11 v2 fields: reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use). Cancelled on 2026-06-11 to avoid duplicate task entries." }
 t4_8 = { status = "completed", commit_sha = "bb7beaa", description = "Phase 4 checkpoint + git note" }
-t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
-t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
-t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
-t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
+# Phase 5: Anthropic / Gemini / DeepSeek migration
+# Phase 5 has THREE sub-areas:
+#   A. Matrix entries (t5_1, t5_2, t5_3) — populate VendorCapabilities
+#      for the 3 remaining vendors
+#   B. Tool-loop conversion (t5_6, t5_7, t5_8) — DEFERRED from Phase 1
+#      t1_7; each vendor needs to be refactored to use
+#      run_with_tool_loop (which requires converting their vendored
+#      call path to OpenAICompatibleRequest + send_openai_compatible)
+#   C. UI adaptations for new v2 fields (t5_4) — DEFERRED from
+#      Phase 4 t4_7; 11 v2 fields need per-vendor UI treatment
+t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True). cost_input_per_mtok=3.00, cost_output_per_mtok=15.00; context_window=180000-200000 depending on model. Extended thinking is a per-request feature, not a static capability." }
+t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching=True, vision=True, video=True, audio=True, grounding=True, structured_output=True). context_window=900000+; cost_input_per_mtok=1.25, cost_output_per_mtok=5.00." }
+t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning=True for deepseek-reasoner/R1, structured_output=True, low cost=0.14/0.28 per Mtok). context_window=32768." }
+t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for 11 v2 fields: (1) reasoning toggle in AI settings; (2) structured_output JSON toggle; (3) code_execution panel; (4) web_search UI for tools; (5) x_search UI for grok-specific search; (6) file_search panel; (7) mcp_support toggle for MCP server enablement; (8) audio attachment button; (9) video attachment button; (10) grounding toggle; (11) computer_use toggle. CONSOLIDATED from Phase 4 t4_7 (which is now cancelled)." }
 t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
+# Phase 5 tool-loop conversion (DEFERRED from Phase 1 t1_7)
+t5_6 = { status = "pending", commit_sha = "", description = "Convert _send_anthropic to use run_with_tool_loop. Requires: (1) refactor anthropic call path to produce OpenAICompatibleRequest + use send_openai_compatible; (2) preserve anthropic-specific features (prompt caching via cache_control, extended thinking via thinking param, computer_use tool type); (3) tests for tool-calling + caching + thinking. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+t5_7 = { status = "pending", commit_sha = "", description = "Convert _send_gemini to use run_with_tool_loop. Requires: (1) refactor google-genai streaming call path to OpenAICompatibleRequest + send_openai_compatible; (2) preserve gemini-specific features (explicit caching, grounding, file/video/audio inputs); (3) tests for tool-calling + streaming. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+t5_8 = { status = "pending", commit_sha = "", description = "Convert _send_deepseek to use run_with_tool_loop. Deepseek already uses OpenAI-compat (requests.post) but has an inline tool loop. (1) Refactor to produce OpenAICompatibleRequest; (2) replace inline loop with run_with_tool_loop; (3) preserve deepseek-reasoner reasoning_content in history. Estimated 1-2 days (similar shape to Grok+Llama conversion in parent track). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
+# Phase 6: Permanent deferrals + cleanup (NOT scheduled for
+# execution in this track; tasks are tracking placeholders)
+t6_1 = { status = "deferred", commit_sha = "", description = "Meta Llama API adapter. PERMANENT DEFERRED on 2026-06-11: docs URL works (200) but actual API endpoints are 404/403 (no public OpenAI-compat surface). See docs/reports/meta_llama_api_verification_20260611.md. To be done in a separate follow-up track when Meta publishes a public API. Estimate 1-2 days once a public URL exists." }
+t6_2 = { status = "pending", commit_sha = "", description = "Track archive + final docs refresh. Move conductor/tracks/qwen_llama_grok_followup_20260611/ to archive/ once all of Phase 5 (and any non-deferred t5_X tasks) are complete. Update conductor/tracks.md." }

 [verification]
 phase_1_tool_loop_lifted = false
 phase_2_providers_moved = false
 phase_3_all_9_ux_adaptations = false
-phase_4_local_first_and_matrix_v2 = false
+phase_4_local_first_and_matrix_v2 = true
 phase_5_anthropic_gemini_deepseek_matrix = false
+phase_6_archived = false
 full_test_suite_passes = false
 no_inline_tool_loops = false
 no_providers_in_models_py = false
+all_8_vendors_on_tool_loop = false
+v2_matrix_fully_populated = false
+v2_ui_adaptations_shipped = false

 [open_questions]
 # Phase 4
 where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"

 [deferred_work]
-# Task 1.7 surface: the 4 inline-loop vendors (anthropic, gemini, gemini_cli,
-# deepseek) cannot share run_with_tool_loop as-is. They use their own
-# vendored call paths (deepseek uses requests.post; gemini uses
-# google-genai streaming; gemini_cli uses subprocess JSONL; anthropic uses
-# the anthropic SDK). run_with_tool_loop is hard-coded to send_openai_compatible.
+# This section tracks work that was deferred from the original
+# plan. Each item has either been moved into a proper task entry
+# in the upcoming phases (see Phase 5 t5_6/7/8 below) or marked
+# as a permanent deferral with rationale (Phase 6 t6_1).
 #
-# To apply run_with_tool_loop to these 4 vendors, each must first be
-# refactored to produce OpenAICompatibleRequest + use send_openai_compatible
-# (analogous to the parent track's Grok+Llama+Qwen work). That conversion
-# is its own multi-day refactor; the plan treated it as a one-task line item
-# but the gap is significantly larger.
+# ============== Phase 1 t1_7: deferred vendors ==============
+# As of 2026-06-11, the 4 inline-loop vendors have been reduced
+# to 3 (gemini_cli was migrated to run_with_tool_loop via
+# send_func + on_pre_dispatch in commit 4748d134). The remaining
+# 3 (anthropic, gemini, deepseek) each use their own vendored
+# call path:
+#   - anthropic: anthropic SDK (.Anthropic().messages.create/stream)
+#   - gemini:    google-genai (Client().models.generate_content_stream)
+#   - deepseek:  requests.post (no SDK; raw OpenAI-compat)
 #
-# Per the per-task decision protocol in conductor/workflow.md ("Plan
-# approach doesn't fit"), Task 1.7 needs a scope re-plan before continuing.
-# Recommendation: split into 4 separate tasks (one per vendor) under a new
-# Phase 1.5 'vendor-conversion-to-OpenAICompatibleRequest', each with its
-# own spike + test + commit. The Phase 1 checkpoint (t1.9) should not
-# include the 4 inline-loop vendors; the current state is 'helper exists
-# + 3 vendors applied' which is a meaningful milestone on its own.
+# run_with_tool_loop is hard-coded to send_openai_compatible.
+# To apply it to these 3 vendors, each must first be refactored
+# to produce OpenAICompatibleRequest + use send_openai_compatible
+# (analogous to the parent track's Grok+Llama+Qwen work).
+#
+# Each conversion is a multi-day refactor (3-5 days each for
+# anthropic and gemini, 1-2 days for deepseek, based on the
+# Grok/Llama/Qwen conversion complexity). The plan treated it
+# as a one-task line item but the gap is significantly larger.
+#
+# RESOLUTION: Each vendor now has a proper task entry in Phase 5:
+#   t5_6: anthropic tool-loop conversion
+#   t5_7: gemini tool-loop conversion
+#   t5_8: deepseek tool-loop conversion
+# This replaces the single t1_7 line item.
+#
+# ============== Phase 4 t4_3: Meta Llama API ==============
+# The Meta Llama developer docs URL is reachable (200 OK) but
+# the actual API endpoints (api.meta.ai, llama-api.meta.com,
+# api.llama.com) are 404/403/(no response). Meta does not
+# currently publish a public OpenAI-compat API.
+#
+# RESOLUTION: Permanent deferral. See Phase 6 t6_1 and
+# docs/reports/meta_llama_api_verification_20260611.md.
+# Re-evaluate when Meta publishes a public API surface.
+#
+# ============== Phase 4 t4_7: UI adaptations for new v2 fields ==============
+# The 12 v2 fields are populated in the registry and accessible
+# via get_capabilities(). The GUI work (toggle for reasoning,
+# panel for code_execution, attachment buttons for audio/video,
+# etc.) is design-heavy and per-vendor-specific.
+#
+# RESOLUTION: Consolidated into Phase 5 t5_4. The Phase 5 task
+# was originally named "UI adaptations for new capabilities"
+# (effectively the same scope). It now has explicit per-field
+# scope in the task description.
 [local_first_priority]
 # Per user feedback 2026-06-11: emphasize local models as first-class
 # vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.
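
The commit message pairs each new verification flag with the tasks it "gates". One way to read that pairing is that a flag flips to true only once every listed task is completed. The sketch below works under that reading, which is an assumption. The `GATES` mapping is taken from the commit message, while `derive_flags` and the sample task statuses are hypothetical and not part of the repository.

```python
# Flag -> gated task ids, as listed in the commit message.
GATES = {
    "all_8_vendors_on_tool_loop": ("t5_6", "t5_7", "t5_8"),
    "v2_matrix_fully_populated": ("t5_1", "t5_2", "t5_3"),
    "v2_ui_adaptations_shipped": ("t5_4",),
}


def derive_flags(tasks: dict[str, dict]) -> dict[str, bool]:
    """Return each flag as True only if all of its gated tasks are completed.

    Assumption: a missing, pending, deferred, or cancelled task keeps
    the flag False.
    """
    return {
        flag: all(
            tasks.get(task_id, {}).get("status") == "completed"
            for task_id in task_ids
        )
        for flag, task_ids in GATES.items()
    }


# Hypothetical mid-Phase-5 snapshot: matrix work done, one tool-loop
# conversion done, UI work not started.
tasks = {
    "t5_1": {"status": "completed"},
    "t5_2": {"status": "completed"},
    "t5_3": {"status": "completed"},
    "t5_4": {"status": "pending"},
    "t5_6": {"status": "completed"},
    "t5_7": {"status": "pending"},
    "t5_8": {"status": "pending"},
}
flags = derive_flags(tasks)
print(flags)
```

Deriving the flags from task status, rather than editing them by hand, would avoid the kind of staleness the rewritten [deferred_work] section had to correct. Whether the track wants that automation is a design decision this commit does not make.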