From 58c4370142c637aa73752d95b4e92169ed363627 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Thu, 11 Jun 2026 21:20:44 -0400 Subject: [PATCH] conductor(plan): resolve deferred work into proper task entries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The track had 3 categories of deferred work. Each is now either a proper task entry in an upcoming phase or a permanent deferral with rationale. Resolution: 1. Phase 1 t1_7: 3 inline-loop vendors (anthropic, gemini, deepseek; gemini_cli was already migrated). Each vendor now has a proper Phase 5 task entry: t5_6: anthropic tool-loop conversion (3-5 days) t5_7: gemini tool-loop conversion (3-5 days) t5_8: deepseek tool-loop conversion (1-2 days) The previous single t1_7 line item is replaced by 3 explicit tasks with scope estimates and blocked_by annotations. 2. Phase 4 t4_3: Meta Llama API. PERMANENT DEFERRED to Phase 6 t6_1. Meta does not publish a public API; full probe results in docs/reports/meta_llama_api_verification_20260611.md. 3. Phase 4 t4_7: UI adaptations for new v2 fields. CONSOLIDATED into Phase 5 t5_4 (which was originally 'UI adaptations for new capabilities' — same scope). t5_4's description now enumerates the 11 specific UI adaptations (reasoning toggle, audio button, etc.). t4_7 is cancelled to avoid duplicate task entries. Phase 5 expanded scope: 8 tasks total (was 5). The phase is now a multi-week consolidation project (8-14 days) and should be scoped as a fresh track, not a single follow-up session. Phase 6 placeholder added (not scheduled for execution): t6_1: Meta Llama API (deferred) t6_2: Track archive + final docs refresh [deferred_work] section in state.toml rewritten (was stale: mentioned gemini_cli as deferred but that vendor was migrated in commit 4748d134 via send_func + on_pre_dispatch). Verification flags added: all_8_vendors_on_tool_loop = false (gates t5_6/7/8) v2_matrix_fully_populated = false (gates t5_1/2/3) v2_ui_adaptations_shipped = false (gates t5_4) phase_4_local_first_and_matrix_v2 = true (Phase 4 done) State file: 41 tasks, 6 phases, 12 verification fields, parses cleanly. Report: docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md (~95 lines; cross-references session-end + Meta verification reports; documents the resolution decisions). --- .../state.toml | 103 +++++++++--- ...ma_grok_followup_deferred_work_20260611.md | 150 ++++++++++++++++++ 2 files changed, 229 insertions(+), 24 deletions(-) create mode 100644 docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md diff --git a/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml b/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml index 4c57ab15..6b883db4 100644 --- a/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml +++ b/conductor/tracks/qwen_llama_grok_followup_20260611/state.toml @@ -18,7 +18,8 @@ phase_1 = { status = "completed", checkpoint_sha = "ffe22c30", name = "Tool loop phase_2 = { status = "completed", checkpoint_sha = "7b24ee9", name = "PROVIDERS move (out of src/models.py)" } phase_3 = { status = "completed", checkpoint_sha = "43182af", name = "UX adaptations 2-9 (4 of 8 applied; 3 deferred; 1 already done)" } phase_4 = { status = "completed", checkpoint_sha = "bb7beaa", name = "Local-first + matrix v2 expansion (12 new fields)" } -phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" } +phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration + UI adaptations + tool-loop conversion" } +phase_6 = { status = "pending", checkpoint_sha = "", name = "Track archive + final docs refresh" } [tasks] # Phase 1: Tool loop lift @@ -63,48 +64,102 @@ t4_4 = { status = "completed", commit_sha = "49d51604", description = "GUI: 'Loc t4_5 = { status = "completed", commit_sha = "0a9e2775", description = "Add 12 v2 fields to VendorCapabilities (combined with t4_1 in single atomic commit). All v2 fields added to the dataclass with default False." } t4_6 = { status = "completed", commit_sha = "7d60e8f5", description = "Update all vendor registry entries. Populated v2 fields per-model: reasoning for minimax-M2.5/M2.7/llama-3.1-405b; web_search + x_search for grok; caching for qwen-long; audio for qwen-audio. Runtime override for 'local' (dataclass.replace on llama+localhost)." } t3_7 = { status = "completed", commit_sha = "7d60e8f5", description = "MOVED FROM PHASE 3: cost panel: 'Free (local)' for localhost. DONE in commit 7d60e8f5 (alongside t4_6): per-tier + session-total cost columns in src/gui_2.py now render 'Free (local)' when caps.local=True." } -t4_7 = { status = "deferred", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.). DEFERRED to a separate follow-up track. See state.toml 'deferred_work' section." } +t4_7 = { status = "cancelled", commit_sha = "", description = "CONSOLIDATED INTO Phase 5 t5_4. The 'UI adaptations for new v2 fields' task was originally here; the same scope is now explicitly t5_4 (UI adaptations for 11 v2 fields: reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use). Cancelled on 2026-06-11 to avoid duplicate task entries." } t4_8 = { status = "completed", commit_sha = "bb7beaa", description = "Phase 4 checkpoint + git note" } -t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" } -t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" } -t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" } -t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" } +# Phase 5: Anthropic / Gemini / DeepSeek migration +# Phase 5 has TWO sub-areas: +# A. Matrix entries (t5_1, t5_2, t5_3) — populate VendorCapabilities +# for the 3 remaining vendors +# B. Tool-loop conversion (t5_6, t5_7, t5_8) — DEFERRED from Phase 1 +# t1_7; each vendor needs to be refactored to use +# run_with_tool_loop (which requires converting their vendored +# call path to OpenAICompatibleRequest + send_openai_compatible) +# C. UI adaptations for new v2 fields (t5_4) — DEFERRED from +# Phase 4 t4_7; 11 v2 fields need per-vendor UI treatment +t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True). cost_input_per_mtok=3.00, cost_output_per_mtok=15.00; context_window=180000-200000 depending on model. Extended thinking is a per-request feature, not a static capability." } +t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching=True, vision=True, video=True, audio=True, grounding=True, structured_output=True). context_window=900000+; cost_input_per_mtok=1.25, cost_output_per_mtok=5.00." } +t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning=True for deepseek-reasoner/R1, structured_output=True, low cost=0.14/0.28 per Mtok). context_window=32768." } +t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for 11 v2 fields: (1) reasoning toggle in AI settings; (2) structured_output JSON toggle; (3) code_execution panel; (4) web_search UI for tools; (5) x_search UI for grok-specific search; (6) file_search panel; (7) mcp_support toggle for MCP server enablement; (8) audio attachment button; (9) video attachment button; (10) grounding toggle; (11) computer_use toggle. CONSOLIDATED from Phase 4 t4_7 (which is now cancelled)." } t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" } +# Phase 5 tool-loop conversion (DEFERRED from Phase 1 t1_7) +t5_6 = { status = "pending", commit_sha = "", description = "Convert _send_anthropic to use run_with_tool_loop. Requires: (1) refactor anthropic call path to produce OpenAICompatibleRequest + use send_openai_compatible; (2) preserve anthropic-specific features (prompt caching via cache_control, extended thinking via thinking param, computer_use tool type); (3) tests for tool-calling + caching + thinking. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." } +t5_7 = { status = "pending", commit_sha = "", description = "Convert _send_gemini to use run_with_tool_loop. Requires: (1) refactor google-genai streaming call path to OpenAICompatibleRequest + send_openai_compatible; (2) preserve gemini-specific features (explicit caching, grounding, file/video/audio inputs); (3) tests for tool-calling + streaming. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." } +t5_8 = { status = "pending", commit_sha = "", description = "Convert _send_deepseek to use run_with_tool_loop. Deepseek already uses OpenAI-compat (requests.post) but has an inline tool loop. (1) Refactor to produce OpenAICompatibleRequest; (2) replace inline loop with run_with_tool_loop; (3) preserve deepseek-reasoner reasoning_content in history. Estimated 1-2 days (similar shape to Grok+Llama conversion in parent track). blocked_by = Phase 1 run_with_tool_loop (DONE)." } +# Phase 6: Permanent deferrals + cleanup (NOT scheduled for +# execution in this track; tasks are tracking placeholders) +t6_1 = { status = "deferred", commit_sha = "", description = "Meta Llama API adapter. PERMANENT DEFERRED on 2026-06-11: docs URL works (200) but actual API endpoints are 404/403 (no public OpenAI-compat surface). See docs/reports/meta_llama_api_verification_20260611.md. To be done in a separate follow-up track when Meta publishes a public API. Estimate 1-2 days once a public URL exists." } +t6_2 = { status = "pending", commit_sha = "", description = "Track archive + final docs refresh. Move conductor/tracks/qwen_llama_grok_followup_20260611/ to archive/ once all of Phase 5 (and any non-deferred t5_X tasks) are complete. Update conductor/tracks.md." } [verification] phase_1_tool_loop_lifted = false phase_2_providers_moved = false phase_3_all_9_ux_adaptations = false -phase_4_local_first_and_matrix_v2 = false +phase_4_local_first_and_matrix_v2 = true phase_5_anthropic_gemini_deepseek_matrix = false +phase_6_archived = false full_test_suite_passes = false no_inline_tool_loops = false no_providers_in_models_py = false +all_8_vendors_on_tool_loop = false +v2_matrix_fully_populated = false +v2_ui_adaptations_shipped = false [open_questions] # Phase 4 where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?" [deferred_work] -# Task 1.7 surface: the 4 inline-loop vendors (anthropic, gemini, gemini_cli, -# deepseek) cannot share run_with_tool_loop as-is. They use their own -# vendored call paths (deepseek uses requests.post; gemini uses -# google-genai streaming; gemini_cli uses subprocess JSONL; anthropic uses -# the anthropic SDK). run_with_tool_loop is hard-coded to send_openai_compatible. +# This section tracks work that was deferred from the original +# plan. Each item has either been moved into a proper task entry +# in the upcoming phases (see Phase 5 t5_6/7/8 below) or marked +# as a permanent deferral with rationale (Phase 6 t6_1). # -# To apply run_with_tool_loop to these 4 vendors, each must first be -# refactored to produce OpenAICompatibleRequest + use send_openai_compatible -# (analogous to the parent track's Grok+Llama+Qwen work). That conversion -# is its own multi-day refactor; the plan treated it as a one-task line item -# but the gap is significantly larger. +# ============== Phase 1 t1_7: deferred vendors ============== +# As of 2026-06-11, the 4 inline-loop vendors have been reduced +# to 3 (gemini_cli was migrated to run_with_tool_loop via +# send_func + on_pre_dispatch in commit 4748d134). The remaining +# 3 (anthropic, gemini, deepseek) each use their own vendored +# call path: +# - anthropic: anthropic SDK (.Anthropic().messages.create/stream) +# - gemini: google-genai (Client().models.generate_content_stream) +# - deepseek: requests.post (no SDK; raw OpenAI-compat) # -# Per the per-task decision protocol in conductor/workflow.md ("Plan -# approach doesn't fit"), Task 1.7 needs a scope re-plan before continuing. -# Recommendation: split into 4 separate tasks (one per vendor) under a new -# Phase 1.5 'vendor-conversion-to-OpenAICompatibleRequest', each with its -# own spike + test + commit. The Phase 1 checkpoint (t1.9) should not -# include the 4 inline-loop vendors; the current state is 'helper exists -# + 3 vendors applied' which is a meaningful milestone on its own. +# run_with_tool_loop is hard-coded to send_openai_compatible. +# To apply it to these 3 vendors, each must first be refactored +# to produce OpenAICompatibleRequest + use send_openai_compatible +# (analogous to the parent track's Grok+Llama+Qwen work). +# +# Each conversion is a multi-day refactor (3-5 days per vendor +# based on the Grok/Llama/Qwen conversion complexity). The plan +# treated it as a one-task line item but the gap is significantly +# larger. +# +# RESOLUTION: Each vendor now has a proper task entry in Phase 5: +# t5_6: anthropic tool-loop conversion +# t5_7: gemini tool-loop conversion +# t5_8: deepseek tool-loop conversion +# This replaces the single t1_7 line item. +# +# ============== Phase 4 t4_3: Meta Llama API ============== +# The Meta Llama developer docs URL is reachable (200 OK) but +# the actual API endpoints (api.meta.ai, llama-api.meta.com, +# api.llama.com) are 404/403/(no response). Meta does not +# currently publish a public OpenAI-compat API. +# +# RESOLUTION: Permanent deferral. See Phase 6 t6_1 and +# docs/reports/meta_llama_api_verification_20260611.md. +# Re-evaluates when Meta publishes a public surface. +# +# ============== Phase 4 t4_7: UI adaptations for new v2 fields ============== +# The 12 v2 fields are populated in the registry and accessible +# via get_capabilities(). The GUI work (toggle for reasoning, +# panel for code_execution, attachment buttons for audio/video, +# etc.) is design-heavy and per-vendor-specific. +# +# RESOLUTION: Consolidated into Phase 5 t5_4. The Phase 5 task +# was originally named "UI adaptations for new capabilities" +# (effectively the same scope). It now has explicit per-field +# scope in the task description. [local_first_priority] # Per user feedback 2026-06-11: emphasize local models as first-class # vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama. diff --git a/docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md b/docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md new file mode 100644 index 00000000..5bae9a30 --- /dev/null +++ b/docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md @@ -0,0 +1,150 @@ +# qwen_llama_grok_followup_20260611 — Deferred Work Resolution + +## TL;DR + +The track had 3 categories of deferred work. Each is now either +a proper task entry in an upcoming phase or a permanent +deferral with rationale. The state file's `[deferred_work]` +section is rewritten to reflect current reality (the previous +text was stale; mentioned `gemini_cli` as deferred but that +vendor was migrated in commit `4748d134` via +`send_func` + `on_pre_dispatch`). + +## The 3 deferred categories + +### 1. Phase 1 t1_7: 3 vendors (anthropic, gemini, deepseek) still on inline tool loops + +**Status:** MOVED to Phase 5 as proper task entries. + +| Task | Vendor | Estimated work | Why it was deferred | +|---|---|---|---| +| t5_6 | anthropic | 3-5 days | Uses anthropic SDK; must convert to OpenAICompatibleRequest + send_openai_compatible, then preserve anthropic-specific features (cache_control, extended_thinking, computer_use) | +| t5_7 | gemini | 3-5 days | Uses google-genai streaming; same conversion scope as anthropic | +| t5_8 | deepseek | 1-2 days | Already uses OpenAI-compat (requests.post) but has an inline loop; smallest refactor. Similar shape to Grok+Llama conversion in the parent track | + +Total estimated work: 7-12 days. This is a multi-week project on +its own; not appropriate to bundle into the current 1-2-day +session-per-phase cadence. + +**Why they were deferred originally:** Each vendor's vendored +call path can't be slotted into `run_with_tool_loop` as-is — +the helper is hard-coded to `send_openai_compatible`. The +parent track treated Grok+Llama+Qwen as a 1-task line item but +the actual conversion was substantial (the parent track +spanned 5 days for those 3). The follow-up track made the +correct call: don't try to fit 3 more conversions into a +follow-up that's also doing 4 other phases. + +### 2. Phase 4 t4_3: Meta Llama API adapter + +**Status:** PERMANENT DEFERRED to Phase 6 t6_1. + +The Meta Llama developer docs URL is reachable (200 OK as of +2026-06-11; was 400 in the parent session). However, the +actual API endpoints (api.meta.ai, llama-api.meta.com, +api.llama.com) are 404/403/(no response). Meta does not +currently publish a public OpenAI-compat API. + +See `docs/reports/meta_llama_api_verification_20260611.md` +for full probe results. Decision: don't ship a fake adapter +that returns errors at runtime; defer until Meta publishes a +public surface. + +Phase 6 t6_1 is a tracking placeholder, NOT scheduled for +execution in this track. The next session/track can re-evaluate +when Meta publishes a public URL (or another open-source Llama +API surfaces). + +### 3. Phase 4 t4_7: UI adaptations for new v2 fields + +**Status:** CONSOLIDATED into Phase 5 t5_4 (which was +originally named "UI adaptations for new capabilities" — +effectively the same scope, just re-discovered). + +**Why it was a separate task:** When Phase 4 t4_6 populated +the 11 v2 fields beyond `local`, the GUI work for those +fields naturally fell out of Phase 4 scope. The fields are +vendor-specific (e.g., `reasoning` for grok-2-reasoner only; +`audio` for qwen-audio only) and design-heavy (per-field +UX decisions: toggle vs panel vs button). + +**Resolution:** Cancel t4_7 as a duplicate, expand t5_4's +description to enumerate the 11 specific UI adaptations: + +1. Reasoning toggle +2. Structured output JSON toggle +3. Code execution panel +4. Web search UI +5. X/Twitter search UI (grok-specific) +6. File search panel +7. MCP support toggle +8. Audio attachment button +9. Video attachment button +10. Grounding toggle +11. Computer use toggle + +The 11 fields are populated in `src/vendor_capabilities.py`; +`get_capabilities()` is the read API; the GUI just needs to +consult `caps.` and render the right control. + +## Phase 5 expanded scope + +Phase 5 is now a "consolidation phase" that includes the +tool-loop conversion work that was originally deferred from +Phase 1, the matrix entries for the 3 remaining vendors, +and the UI adaptations for new v2 fields. The phase is +multi-day work (estimated 8-14 days) and should be scoped as +a fresh track rather than a single follow-up session. + +The expanded Phase 5 has 8 tasks: +- t5_1: Anthropic matrix entries +- t5_2: Gemini matrix entries +- t5_3: DeepSeek matrix entries +- t5_4: UI adaptations for 11 v2 fields (consolidated from t4_7) +- t5_5: Phase 5 docs + archive +- t5_6: anthropic tool-loop conversion (deferred from t1_7) +- t5_7: gemini tool-loop conversion (deferred from t1_7) +- t5_8: deepseek tool-loop conversion (deferred from t1_7) + +## Verification + +The state file has 3 new verification flags that gate +"Phase 5 complete": + +``` +all_8_vendors_on_tool_loop = false # t5_6, t5_7, t5_8 +v2_matrix_fully_populated = false # t5_1, t5_2, t5_3 +v2_ui_adaptations_shipped = false # t5_4 +``` + +When all 3 are true AND t5_5 (docs+archive) is complete, +Phase 5 is done. The `audit_no_inline_tool_loops.py` +script (which already exists) will start FAILING on Phase 5 +completion — that's the audit-script-success-as-CI-gate +pattern, intended. + +## Phase 6 placeholder + +Phase 6 is a "cleanup" phase with 2 tasks: +- t6_1: Meta Llama API adapter (PERMANENT DEFERRED) +- t6_2: Track archive + final docs refresh + +Phase 6 is NOT scheduled for execution in this track; it's +the home for permanent deferrals + the final archive step +that runs when Phase 5 ships. + +## Cross-references + +- Session-end report (previous session): + `docs/reports/qwen_llama_grok_followup_session_end_20260611.md` +- Meta Llama API verification report: + `docs/reports/meta_llama_api_verification_20260611.md` +- Parent track's Phase 5+6: + `conductor/tracks/qwen_llama_grok_integration_20260606/` +- This track's plan.md: + `conductor/tracks/qwen_llama_grok_followup_20260611/plan.md` + (note: plan.md was NOT updated to reflect the new t5_6/7/8 + tasks; this report + the state.toml are the source of truth. + The plan.md is a planning artifact frozen at track-creation + time; new tasks are tracked in state.toml per the workflow + protocol.)