conductor(plan): resolve deferred work into proper task entries

The track had 3 categories of deferred work. Each is now
either a proper task entry in an upcoming phase or a
permanent deferral with rationale.

Resolution:

1. Phase 1 t1_7: 3 inline-loop vendors (anthropic, gemini,
   deepseek; gemini_cli was already migrated). Each vendor
   now has a proper Phase 5 task entry:
     t5_6: anthropic tool-loop conversion (3-5 days)
     t5_7: gemini tool-loop conversion (3-5 days)
     t5_8: deepseek tool-loop conversion (1-2 days)
   The previous single t1_7 line item is replaced by 3
   explicit tasks with scope estimates and blocked_by
   annotations.

2. Phase 4 t4_3: Meta Llama API. PERMANENTLY DEFERRED to
   Phase 6 t6_1. Meta does not currently publish a public
   OpenAI-compat API (the docs URL returns 200, but the API
   endpoints return 404/403); full probe results in
   docs/reports/meta_llama_api_verification_20260611.md.

3. Phase 4 t4_7: UI adaptations for new v2 fields.
   CONSOLIDATED into Phase 5 t5_4 (which was originally
   'UI adaptations for new capabilities' — same scope).
   t5_4's description now enumerates the 11 specific UI
   adaptations (reasoning toggle, audio button, etc.).
   t4_7 is cancelled to avoid duplicate task entries.

Phase 5 expanded scope: 8 tasks total (was 5). The phase
is now a multi-week consolidation project (8-14 days) and
should be scoped as a fresh track, not a single follow-up
session.

Phase 6 placeholder added (not scheduled for execution):
  t6_1: Meta Llama API (deferred)
  t6_2: Track archive + final docs refresh

[deferred_work] section in state.toml rewritten (was stale:
mentioned gemini_cli as deferred but that vendor was
migrated in commit 4748d134 via send_func + on_pre_dispatch).

Verification flags added:
  all_8_vendors_on_tool_loop = false  (gates t5_6/7/8)
  v2_matrix_fully_populated = false   (gates t5_1/2/3)
  v2_ui_adaptations_shipped = false   (gates t5_4)
  phase_4_local_first_and_matrix_v2 = true  (Phase 4 done)

State file: 41 tasks, 6 phases, 12 verification fields,
parses cleanly.

Report: docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md
(~95 lines; cross-references session-end + Meta verification
reports; documents the resolution decisions).
2026-06-11 21:20:44 -04:00
parent 6596349325
commit 58c4370142
2 changed files with 229 additions and 24 deletions
@@ -18,7 +18,8 @@ phase_1 = { status = "completed", checkpoint_sha = "ffe22c30", name = "Tool loop
phase_2 = { status = "completed", checkpoint_sha = "7b24ee9", name = "PROVIDERS move (out of src/models.py)" }
phase_3 = { status = "completed", checkpoint_sha = "43182af", name = "UX adaptations 2-9 (4 of 8 applied; 3 deferred; 1 already done)" }
phase_4 = { status = "completed", checkpoint_sha = "bb7beaa", name = "Local-first + matrix v2 expansion (12 new fields)" }
phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration + UI adaptations + tool-loop conversion" }
phase_6 = { status = "pending", checkpoint_sha = "", name = "Track archive + final docs refresh" }
[tasks]
# Phase 1: Tool loop lift
@@ -63,48 +64,102 @@ t4_4 = { status = "completed", commit_sha = "49d51604", description = "GUI: 'Loc
t4_5 = { status = "completed", commit_sha = "0a9e2775", description = "Add 12 v2 fields to VendorCapabilities (combined with t4_1 in single atomic commit). All v2 fields added to the dataclass with default False." }
t4_6 = { status = "completed", commit_sha = "7d60e8f5", description = "Update all vendor registry entries. Populated v2 fields per-model: reasoning for minimax-M2.5/M2.7/llama-3.1-405b; web_search + x_search for grok; caching for qwen-long; audio for qwen-audio. Runtime override for 'local' (dataclass.replace on llama+localhost)." }
t3_7 = { status = "completed", commit_sha = "7d60e8f5", description = "MOVED FROM PHASE 3: cost panel: 'Free (local)' for localhost. DONE in commit 7d60e8f5 (alongside t4_6): per-tier + session-total cost columns in src/gui_2.py now render 'Free (local)' when caps.local=True." }
t4_7 = { status = "deferred", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.). DEFERRED to a separate follow-up track. See state.toml 'deferred_work' section." }
t4_7 = { status = "cancelled", commit_sha = "", description = "CONSOLIDATED INTO Phase 5 t5_4. The 'UI adaptations for new v2 fields' task was originally here; the same scope is now explicitly t5_4 (UI adaptations for 11 v2 fields: reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use). Cancelled on 2026-06-11 to avoid duplicate task entries." }
t4_8 = { status = "completed", commit_sha = "bb7beaa", description = "Phase 4 checkpoint + git note" }
t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
# Phase 5: Anthropic / Gemini / DeepSeek migration
# Phase 5 has THREE sub-areas:
# A. Matrix entries (t5_1, t5_2, t5_3) — populate VendorCapabilities
# for the 3 remaining vendors
# B. Tool-loop conversion (t5_6, t5_7, t5_8) — DEFERRED from Phase 1
# t1_7; each vendor needs to be refactored to use
# run_with_tool_loop (which requires converting their vendored
# call path to OpenAICompatibleRequest + send_openai_compatible)
# C. UI adaptations for new v2 fields (t5_4) — DEFERRED from
# Phase 4 t4_7; 11 v2 fields need per-vendor UI treatment
t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True). cost_input_per_mtok=3.00, cost_output_per_mtok=15.00; context_window=180000-200000 depending on model. Extended thinking is a per-request feature, not a static capability." }
t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching=True, vision=True, video=True, audio=True, grounding=True, structured_output=True). context_window=900000+; cost_input_per_mtok=1.25, cost_output_per_mtok=5.00." }
t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning=True for deepseek-reasoner/R1, structured_output=True, low cost=0.14/0.28 per Mtok). context_window=32768." }
t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for 11 v2 fields: (1) reasoning toggle in AI settings; (2) structured_output JSON toggle; (3) code_execution panel; (4) web_search UI for tools; (5) x_search UI for grok-specific search; (6) file_search panel; (7) mcp_support toggle for MCP server enablement; (8) audio attachment button; (9) video attachment button; (10) grounding toggle; (11) computer_use toggle. CONSOLIDATED from Phase 4 t4_7 (which is now cancelled)." }
t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
# Phase 5 tool-loop conversion (DEFERRED from Phase 1 t1_7)
t5_6 = { status = "pending", commit_sha = "", description = "Convert _send_anthropic to use run_with_tool_loop. Requires: (1) refactor anthropic call path to produce OpenAICompatibleRequest + use send_openai_compatible; (2) preserve anthropic-specific features (prompt caching via cache_control, extended thinking via thinking param, computer_use tool type); (3) tests for tool-calling + caching + thinking. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
t5_7 = { status = "pending", commit_sha = "", description = "Convert _send_gemini to use run_with_tool_loop. Requires: (1) refactor google-genai streaming call path to OpenAICompatibleRequest + send_openai_compatible; (2) preserve gemini-specific features (explicit caching, grounding, file/video/audio inputs); (3) tests for tool-calling + streaming. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
t5_8 = { status = "pending", commit_sha = "", description = "Convert _send_deepseek to use run_with_tool_loop. Deepseek already uses OpenAI-compat (requests.post) but has an inline tool loop. (1) Refactor to produce OpenAICompatibleRequest; (2) replace inline loop with run_with_tool_loop; (3) preserve deepseek-reasoner reasoning_content in history. Estimated 1-2 days (similar shape to Grok+Llama conversion in parent track). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
# Phase 6: Permanent deferrals + cleanup (NOT scheduled for
# execution in this track; tasks are tracking placeholders)
t6_1 = { status = "deferred", commit_sha = "", description = "Meta Llama API adapter. PERMANENT DEFERRED on 2026-06-11: docs URL works (200) but actual API endpoints are 404/403 (no public OpenAI-compat surface). See docs/reports/meta_llama_api_verification_20260611.md. To be done in a separate follow-up track when Meta publishes a public API. Estimate 1-2 days once a public URL exists." }
t6_2 = { status = "pending", commit_sha = "", description = "Track archive + final docs refresh. Move conductor/tracks/qwen_llama_grok_followup_20260611/ to archive/ once all of Phase 5 (and any non-deferred t5_X tasks) are complete. Update conductor/tracks.md." }
[verification]
phase_1_tool_loop_lifted = false
phase_2_providers_moved = false
phase_3_all_9_ux_adaptations = false
phase_4_local_first_and_matrix_v2 = false
phase_4_local_first_and_matrix_v2 = true
phase_5_anthropic_gemini_deepseek_matrix = false
phase_6_archived = false
full_test_suite_passes = false
no_inline_tool_loops = false
no_providers_in_models_py = false
all_8_vendors_on_tool_loop = false
v2_matrix_fully_populated = false
v2_ui_adaptations_shipped = false
[open_questions]
# Phase 4
where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"
[deferred_work]
# Task 1.7 surface: the 4 inline-loop vendors (anthropic, gemini, gemini_cli,
# deepseek) cannot share run_with_tool_loop as-is. They use their own
# vendored call paths (deepseek uses requests.post; gemini uses
# google-genai streaming; gemini_cli uses subprocess JSONL; anthropic uses
# the anthropic SDK). run_with_tool_loop is hard-coded to send_openai_compatible.
# This section tracks work that was deferred from the original
# plan. Each item has either been moved into a proper task entry
# in the upcoming phases (see Phase 5 t5_6/7/8 below) or marked
# as a permanent deferral with rationale (Phase 6 t6_1).
#
# To apply run_with_tool_loop to these 4 vendors, each must first be
# refactored to produce OpenAICompatibleRequest + use send_openai_compatible
# (analogous to the parent track's Grok+Llama+Qwen work). That conversion
# is its own multi-day refactor; the plan treated it as a one-task line item
# but the gap is significantly larger.
# ============== Phase 1 t1_7: deferred vendors ==============
# As of 2026-06-11, the 4 inline-loop vendors have been reduced
# to 3 (gemini_cli was migrated to run_with_tool_loop via
# send_func + on_pre_dispatch in commit 4748d134). The remaining
# 3 (anthropic, gemini, deepseek) each use their own vendored
# call path:
# - anthropic: anthropic SDK (.Anthropic().messages.create/stream)
# - gemini: google-genai (Client().models.generate_content_stream)
# - deepseek: requests.post (no SDK; raw OpenAI-compat)
#
# Per the per-task decision protocol in conductor/workflow.md ("Plan
# approach doesn't fit"), Task 1.7 needs a scope re-plan before continuing.
# Recommendation: split into 4 separate tasks (one per vendor) under a new
# Phase 1.5 'vendor-conversion-to-OpenAICompatibleRequest', each with its
# own spike + test + commit. The Phase 1 checkpoint (t1.9) should not
# include the 4 inline-loop vendors; the current state is 'helper exists
# + 3 vendors applied' which is a meaningful milestone on its own.
# run_with_tool_loop is hard-coded to send_openai_compatible.
# To apply it to these 3 vendors, each must first be refactored
# to produce OpenAICompatibleRequest + use send_openai_compatible
# (analogous to the parent track's Grok+Llama+Qwen work).
#
# Each conversion is a multi-day refactor (3-5 days each for
# anthropic and gemini, 1-2 days for deepseek, based on the
# Grok/Llama/Qwen conversion complexity). The plan treated it as
# a one-task line item but the gap is significantly larger.
#
# RESOLUTION: Each vendor now has a proper task entry in Phase 5:
# t5_6: anthropic tool-loop conversion
# t5_7: gemini tool-loop conversion
# t5_8: deepseek tool-loop conversion
# This replaces the single t1_7 line item.
#
# ============== Phase 4 t4_3: Meta Llama API ==============
# The Meta Llama developer docs URL is reachable (200 OK) but
# the actual API endpoints (api.meta.ai, llama-api.meta.com,
# api.llama.com) are 404/403/(no response). Meta does not
# currently publish a public OpenAI-compat API.
#
# RESOLUTION: Permanent deferral. See Phase 6 t6_1 and
# docs/reports/meta_llama_api_verification_20260611.md.
# Re-evaluate when Meta publishes a public surface.
#
# ============== Phase 4 t4_7: UI adaptations for new v2 fields ==============
# The 12 v2 fields are populated in the registry and accessible
# via get_capabilities(). The GUI work (toggle for reasoning,
# panel for code_execution, attachment buttons for audio/video,
# etc.) is design-heavy and per-vendor-specific.
#
# RESOLUTION: Consolidated into Phase 5 t5_4. The Phase 5 task
# was originally named "UI adaptations for new capabilities"
# (effectively the same scope). It now has explicit per-field
# scope in the task description.
[local_first_priority]
# Per user feedback 2026-06-11: emphasize local models as first-class
# vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.