Private
Public Access
0
0

conductor(plan): resolve deferred work into proper task entries

The track had 3 categories of deferred work. Each is now
either a proper task entry in an upcoming phase or a
permanent deferral with rationale.

Resolution:

1. Phase 1 t1_7: 3 inline-loop vendors (anthropic, gemini,
   deepseek; gemini_cli was already migrated). Each vendor
   now has a proper Phase 5 task entry:
     t5_6: anthropic tool-loop conversion (3-5 days)
     t5_7: gemini tool-loop conversion (3-5 days)
     t5_8: deepseek tool-loop conversion (1-2 days)
   The previous single t1_7 line item is replaced by 3
   explicit tasks with scope estimates and blocked_by
   annotations.

2. Phase 4 t4_3: Meta Llama API. PERMANENT DEFERRED to
   Phase 6 t6_1. Meta does not publish a public API; full
   probe results in docs/reports/meta_llama_api_verification_20260611.md.

3. Phase 4 t4_7: UI adaptations for new v2 fields.
   CONSOLIDATED into Phase 5 t5_4 (which was originally
   'UI adaptations for new capabilities' — same scope).
   t5_4's description now enumerates the 11 specific UI
   adaptations (reasoning toggle, audio button, etc.).
   t4_7 is cancelled to avoid duplicate task entries.

Phase 5 expanded scope: 8 tasks total (was 5). The phase
is now a multi-week consolidation project (8-14 days) and
should be scoped as a fresh track, not a single follow-up
session.

Phase 6 placeholder added (not scheduled for execution):
  t6_1: Meta Llama API (deferred)
  t6_2: Track archive + final docs refresh

[deferred_work] section in state.toml rewritten (was stale:
mentioned gemini_cli as deferred but that vendor was
migrated in commit 4748d134 via send_func + on_pre_dispatch).

Verification flags added:
  all_8_vendors_on_tool_loop = false  (gates t5_6/7/8)
  v2_matrix_fully_populated = false   (gates t5_1/2/3)
  v2_ui_adaptations_shipped = false   (gates t5_4)
  phase_4_local_first_and_matrix_v2 = true  (Phase 4 done)

State file: 41 tasks, 6 phases, 12 verification fields,
parses cleanly.

Report: docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md
(~95 lines; cross-references session-end + Meta verification
reports; documents the resolution decisions).
This commit is contained in:
2026-06-11 21:20:44 -04:00
parent 6596349325
commit 58c4370142
2 changed files with 229 additions and 24 deletions
@@ -18,7 +18,8 @@ phase_1 = { status = "completed", checkpoint_sha = "ffe22c30", name = "Tool loop
phase_2 = { status = "completed", checkpoint_sha = "7b24ee9", name = "PROVIDERS move (out of src/models.py)" }
phase_3 = { status = "completed", checkpoint_sha = "43182af", name = "UX adaptations 2-9 (4 of 8 applied; 3 deferred; 1 already done)" }
phase_4 = { status = "completed", checkpoint_sha = "bb7beaa", name = "Local-first + matrix v2 expansion (12 new fields)" }
phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration" }
phase_5 = { status = "pending", checkpoint_sha = "", name = "Anthropic/Gemini/DeepSeek capability matrix migration + UI adaptations + tool-loop conversion" }
phase_6 = { status = "pending", checkpoint_sha = "", name = "Track archive + final docs refresh" }
[tasks]
# Phase 1: Tool loop lift
@@ -63,48 +64,102 @@ t4_4 = { status = "completed", commit_sha = "49d51604", description = "GUI: 'Loc
t4_5 = { status = "completed", commit_sha = "0a9e2775", description = "Add 12 v2 fields to VendorCapabilities (combined with t4_1 in single atomic commit). All v2 fields added to the dataclass with default False." }
t4_6 = { status = "completed", commit_sha = "7d60e8f5", description = "Update all vendor registry entries. Populated v2 fields per-model: reasoning for minimax-M2.5/M2.7/llama-3.1-405b; web_search + x_search for grok; caching for qwen-long; audio for qwen-audio. Runtime override for 'local' (dataclass.replace on llama+localhost)." }
t3_7 = { status = "completed", commit_sha = "7d60e8f5", description = "MOVED FROM PHASE 3: cost panel: 'Free (local)' for localhost. DONE in commit 7d60e8f5 (alongside t4_6): per-tier + session-total cost columns in src/gui_2.py now render 'Free (local)' when caps.local=True." }
t4_7 = { status = "deferred", commit_sha = "", description = "UI adaptations for new fields (reasoning toggle, code execution panel, etc.). DEFERRED to a separate follow-up track. See state.toml 'deferred_work' section." }
t4_7 = { status = "cancelled", commit_sha = "", description = "CONSOLIDATED INTO Phase 5 t5_4. The 'UI adaptations for new v2 fields' task was originally here; the same scope is now explicitly t5_4 (UI adaptations for 11 v2 fields: reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use). Cancelled on 2026-06-11 to avoid duplicate task entries." }
t4_8 = { status = "completed", commit_sha = "bb7beaa", description = "Phase 4 checkpoint + git note" }
t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching, extended_thinking, pdf, computer_use)" }
t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching, grounding, video, audio)" }
t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning, low_cost)" }
t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for new capabilities" }
# Phase 5: Anthropic / Gemini / DeepSeek migration
# Phase 5 has TWO sub-areas:
# A. Matrix entries (t5_1, t5_2, t5_3) — populate VendorCapabilities
# for the 3 remaining vendors
# B. Tool-loop conversion (t5_6, t5_7, t5_8) — DEFERRED from Phase 1
# t1_7; each vendor needs to be refactored to use
# run_with_tool_loop (which requires converting their vendored
# call path to OpenAICompatibleRequest + send_openai_compatible)
# C. UI adaptations for new v2 fields (t5_4) — DEFERRED from
# Phase 4 t4_7; 11 v2 fields need per-vendor UI treatment
t5_1 = { status = "pending", commit_sha = "", description = "Populate Anthropic matrix entries (caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True). cost_input_per_mtok=3.00, cost_output_per_mtok=15.00; context_window=180000-200000 depending on model. Extended thinking is a per-request feature, not a static capability." }
t5_2 = { status = "pending", commit_sha = "", description = "Populate Gemini matrix entries (caching=True, vision=True, video=True, audio=True, grounding=True, structured_output=True). context_window=900000+; cost_input_per_mtok=1.25, cost_output_per_mtok=5.00." }
t5_3 = { status = "pending", commit_sha = "", description = "Populate DeepSeek matrix entries (reasoning=True for deepseek-reasoner/R1, structured_output=True, low cost=0.14/0.28 per Mtok). context_window=32768." }
t5_4 = { status = "pending", commit_sha = "", description = "UI adaptations for 11 v2 fields: (1) reasoning toggle in AI settings; (2) structured_output JSON toggle; (3) code_execution panel; (4) web_search UI for tools; (5) x_search UI for grok-specific search; (6) file_search panel; (7) mcp_support toggle for MCP server enablement; (8) audio attachment button; (9) video attachment button; (10) grounding toggle; (11) computer_use toggle. CONSOLIDATED from Phase 4 t4_7 (which is now cancelled)." }
t5_5 = { status = "pending", commit_sha = "", description = "Phase 5 docs + archive" }
# Phase 5 tool-loop conversion (DEFERRED from Phase 1 t1_7)
t5_6 = { status = "pending", commit_sha = "", description = "Convert _send_anthropic to use run_with_tool_loop. Requires: (1) refactor anthropic call path to produce OpenAICompatibleRequest + use send_openai_compatible; (2) preserve anthropic-specific features (prompt caching via cache_control, extended thinking via thinking param, computer_use tool type); (3) tests for tool-calling + caching + thinking. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
t5_7 = { status = "pending", commit_sha = "", description = "Convert _send_gemini to use run_with_tool_loop. Requires: (1) refactor google-genai streaming call path to OpenAICompatibleRequest + send_openai_compatible; (2) preserve gemini-specific features (explicit caching, grounding, file/video/audio inputs); (3) tests for tool-calling + streaming. Multi-day refactor (3-5 days). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
t5_8 = { status = "pending", commit_sha = "", description = "Convert _send_deepseek to use run_with_tool_loop. Deepseek already uses OpenAI-compat (requests.post) but has an inline tool loop. (1) Refactor to produce OpenAICompatibleRequest; (2) replace inline loop with run_with_tool_loop; (3) preserve deepseek-reasoner reasoning_content in history. Estimated 1-2 days (similar shape to Grok+Llama conversion in parent track). blocked_by = Phase 1 run_with_tool_loop (DONE)." }
# Phase 6: Permanent deferrals + cleanup (NOT scheduled for
# execution in this track; tasks are tracking placeholders)
t6_1 = { status = "deferred", commit_sha = "", description = "Meta Llama API adapter. PERMANENT DEFERRED on 2026-06-11: docs URL works (200) but actual API endpoints are 404/403 (no public OpenAI-compat surface). See docs/reports/meta_llama_api_verification_20260611.md. To be done in a separate follow-up track when Meta publishes a public API. Estimate 1-2 days once a public URL exists." }
t6_2 = { status = "pending", commit_sha = "", description = "Track archive + final docs refresh. Move conductor/tracks/qwen_llama_grok_followup_20260611/ to archive/ once all of Phase 5 (and any non-deferred t5_X tasks) are complete. Update conductor/tracks.md." }
[verification]
phase_1_tool_loop_lifted = false
phase_2_providers_moved = false
phase_3_all_9_ux_adaptations = false
phase_4_local_first_and_matrix_v2 = false
phase_4_local_first_and_matrix_v2 = true
phase_5_anthropic_gemini_deepseek_matrix = false
phase_6_archived = false
full_test_suite_passes = false
no_inline_tool_loops = false
no_providers_in_models_py = false
all_8_vendors_on_tool_loop = false
v2_matrix_fully_populated = false
v2_ui_adaptations_shipped = false
[open_questions]
# Phase 4
where_should_providers_live = "src/ai_client.py (existing file) or new src/ai_client_providers.py (new file)?"
[deferred_work]
# Task 1.7 surface: the 4 inline-loop vendors (anthropic, gemini, gemini_cli,
# deepseek) cannot share run_with_tool_loop as-is. They use their own
# vendored call paths (deepseek uses requests.post; gemini uses
# google-genai streaming; gemini_cli uses subprocess JSONL; anthropic uses
# the anthropic SDK). run_with_tool_loop is hard-coded to send_openai_compatible.
# This section tracks work that was deferred from the original
# plan. Each item has either been moved into a proper task entry
# in the upcoming phases (see Phase 5 t5_6/7/8 below) or marked
# as a permanent deferral with rationale (Phase 6 t6_1).
#
# To apply run_with_tool_loop to these 4 vendors, each must first be
# refactored to produce OpenAICompatibleRequest + use send_openai_compatible
# (analogous to the parent track's Grok+Llama+Qwen work). That conversion
# is its own multi-day refactor; the plan treated it as a one-task line item
# but the gap is significantly larger.
# ============== Phase 1 t1_7: deferred vendors ==============
# As of 2026-06-11, the 4 inline-loop vendors have been reduced
# to 3 (gemini_cli was migrated to run_with_tool_loop via
# send_func + on_pre_dispatch in commit 4748d134). The remaining
# 3 (anthropic, gemini, deepseek) each use their own vendored
# call path:
# - anthropic: anthropic SDK (.Anthropic().messages.create/stream)
# - gemini: google-genai (Client().models.generate_content_stream)
# - deepseek: requests.post (no SDK; raw OpenAI-compat)
#
# Per the per-task decision protocol in conductor/workflow.md ("Plan
# approach doesn't fit"), Task 1.7 needs a scope re-plan before continuing.
# Recommendation: split into 4 separate tasks (one per vendor) under a new
# Phase 1.5 'vendor-conversion-to-OpenAICompatibleRequest', each with its
# own spike + test + commit. The Phase 1 checkpoint (t1.9) should not
# include the 4 inline-loop vendors; the current state is 'helper exists
# + 3 vendors applied' which is a meaningful milestone on its own.
# run_with_tool_loop is hard-coded to send_openai_compatible.
# To apply it to these 3 vendors, each must first be refactored
# to produce OpenAICompatibleRequest + use send_openai_compatible
# (analogous to the parent track's Grok+Llama+Qwen work).
#
# Each conversion is a multi-day refactor (3-5 days per vendor
# based on the Grok/Llama/Qwen conversion complexity). The plan
# treated it as a one-task line item but the gap is significantly
# larger.
#
# RESOLUTION: Each vendor now has a proper task entry in Phase 5:
# t5_6: anthropic tool-loop conversion
# t5_7: gemini tool-loop conversion
# t5_8: deepseek tool-loop conversion
# This replaces the single t1_7 line item.
#
# ============== Phase 4 t4_3: Meta Llama API ==============
# The Meta Llama developer docs URL is reachable (200 OK) but
# the actual API endpoints (api.meta.ai, llama-api.meta.com,
# api.llama.com) are 404/403/(no response). Meta does not
# currently publish a public OpenAI-compat API.
#
# RESOLUTION: Permanent deferral. See Phase 6 t6_1 and
# docs/reports/meta_llama_api_verification_20260611.md.
# Re-evaluates when Meta publishes a public surface.
#
# ============== Phase 4 t4_7: UI adaptations for new v2 fields ==============
# The 12 v2 fields are populated in the registry and accessible
# via get_capabilities(). The GUI work (toggle for reasoning,
# panel for code_execution, attachment buttons for audio/video,
# etc.) is design-heavy and per-vendor-specific.
#
# RESOLUTION: Consolidated into Phase 5 t5_4. The Phase 5 task
# was originally named "UI adaptations for new capabilities"
# (effectively the same scope). It now has explicit per-field
# scope in the task description.
[local_first_priority]
# Per user feedback 2026-06-11: emphasize local models as first-class
# vs cloud/online vendors. Add UI badge, distinct cost state, native Ollama.
@@ -0,0 +1,150 @@
# qwen_llama_grok_followup_20260611 — Deferred Work Resolution
## TL;DR
The track had 3 categories of deferred work. Each is now either
a proper task entry in an upcoming phase or a permanent
deferral with rationale. The state file's `[deferred_work]`
section is rewritten to reflect current reality (the previous
text was stale; mentioned `gemini_cli` as deferred but that
vendor was migrated in commit `4748d134` via
`send_func` + `on_pre_dispatch`).
## The 3 deferred categories
### 1. Phase 1 t1_7: 3 vendors (anthropic, gemini, deepseek) still on inline tool loops
**Status:** MOVED to Phase 5 as proper task entries.
| Task | Vendor | Estimated work | Why it was deferred |
|---|---|---|---|
| t5_6 | anthropic | 3-5 days | Uses anthropic SDK; must convert to OpenAICompatibleRequest + send_openai_compatible, then preserve anthropic-specific features (cache_control, extended_thinking, computer_use) |
| t5_7 | gemini | 3-5 days | Uses google-genai streaming; same conversion scope as anthropic |
| t5_8 | deepseek | 1-2 days | Already uses OpenAI-compat (requests.post) but has an inline loop; smallest refactor. Similar shape to Grok+Llama conversion in the parent track |
Total estimated work: 7-12 days. This is a multi-week project on
its own; not appropriate to bundle into the current 1-2-day
session-per-phase cadence.
**Why they were deferred originally:** Each vendor's vendored
call path can't be slotted into `run_with_tool_loop` as-is —
the helper is hard-coded to `send_openai_compatible`. The
parent track treated Grok+Llama+Qwen as a 1-task line item but
the actual conversion was substantial (the parent track
spanned 5 days for those 3). The follow-up track made the
correct call: don't try to fit 3 more conversions into a
follow-up that's also doing 4 other phases.
### 2. Phase 4 t4_3: Meta Llama API adapter
**Status:** PERMANENT DEFERRED to Phase 6 t6_1.
The Meta Llama developer docs URL is reachable (200 OK as of
2026-06-11; was 400 in the parent session). However, the
actual API endpoints (api.meta.ai, llama-api.meta.com,
api.llama.com) are 404/403/(no response). Meta does not
currently publish a public OpenAI-compat API.
See `docs/reports/meta_llama_api_verification_20260611.md`
for full probe results. Decision: don't ship a fake adapter
that returns errors at runtime; defer until Meta publishes a
public surface.
Phase 6 t6_1 is a tracking placeholder, NOT scheduled for
execution in this track. The next session/track can re-evaluate
when Meta publishes a public URL (or another open-source Llama
API surfaces).
### 3. Phase 4 t4_7: UI adaptations for new v2 fields
**Status:** CONSOLIDATED into Phase 5 t5_4 (which was
originally named "UI adaptations for new capabilities" —
effectively the same scope, just re-discovered).
**Why it was a separate task:** When Phase 4 t4_6 populated
the 11 v2 fields beyond `local`, the GUI work for those
fields naturally fell out of Phase 4 scope. The fields are
vendor-specific (e.g., `reasoning` for grok-2-reasoner only;
`audio` for qwen-audio only) and design-heavy (per-field
UX decisions: toggle vs panel vs button).
**Resolution:** Cancel t4_7 as a duplicate, expand t5_4's
description to enumerate the 11 specific UI adaptations:
1. Reasoning toggle
2. Structured output JSON toggle
3. Code execution panel
4. Web search UI
5. X/Twitter search UI (grok-specific)
6. File search panel
7. MCP support toggle
8. Audio attachment button
9. Video attachment button
10. Grounding toggle
11. Computer use toggle
The 11 fields are populated in `src/vendor_capabilities.py`;
`get_capabilities()` is the read API; the GUI just needs to
consult `caps.<field>` and render the right control.
## Phase 5 expanded scope
Phase 5 is now a "consolidation phase" that includes the
tool-loop conversion work that was originally deferred from
Phase 1, the matrix entries for the 3 remaining vendors,
and the UI adaptations for new v2 fields. The phase is
multi-day work (estimated 8-14 days) and should be scoped as
a fresh track rather than a single follow-up session.
The expanded Phase 5 has 8 tasks:
- t5_1: Anthropic matrix entries
- t5_2: Gemini matrix entries
- t5_3: DeepSeek matrix entries
- t5_4: UI adaptations for 11 v2 fields (consolidated from t4_7)
- t5_5: Phase 5 docs + archive
- t5_6: anthropic tool-loop conversion (deferred from t1_7)
- t5_7: gemini tool-loop conversion (deferred from t1_7)
- t5_8: deepseek tool-loop conversion (deferred from t1_7)
## Verification
The state file has 3 new verification flags that gate
"Phase 5 complete":
```
all_8_vendors_on_tool_loop = false # t5_6, t5_7, t5_8
v2_matrix_fully_populated = false # t5_1, t5_2, t5_3
v2_ui_adaptations_shipped = false # t5_4
```
When all 3 are true AND t5_5 (docs+archive) is complete,
Phase 5 is done. The `audit_no_inline_tool_loops.py`
script (which already exists) will start FAILING on Phase 5
completion — that's the audit-script-success-as-CI-gate
pattern, intended.
## Phase 6 placeholder
Phase 6 is a "cleanup" phase with 2 tasks:
- t6_1: Meta Llama API adapter (PERMANENT DEFERRED)
- t6_2: Track archive + final docs refresh
Phase 6 is NOT scheduled for execution in this track; it's
the home for permanent deferrals + the final archive step
that runs when Phase 5 ships.
## Cross-references
- Session-end report (previous session):
`docs/reports/qwen_llama_grok_followup_session_end_20260611.md`
- Meta Llama API verification report:
`docs/reports/meta_llama_api_verification_20260611.md`
- Parent track's Phase 5+6:
`conductor/tracks/qwen_llama_grok_integration_20260606/`
- This track's plan.md:
`conductor/tracks/qwen_llama_grok_followup_20260611/plan.md`
(note: plan.md was NOT updated to reflect the new t5_6/7/8
tasks; this report + the state.toml are the source of truth.
The plan.md is a planning artifact frozen at track-creation
time; new tasks are tracked in state.toml per the workflow
protocol.)