manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	94aeecd2d3	docs(reports): add namespace_cleanup_sidetrack_report_20260611.md Documents the side-track surfaced during Phase 2 of qwen_llama_grok_followup_20260611: src/models.py is bloated with ~10 non-MMA types (Tool, ToolPreset, BiasProfile, MCPConfiguration, ContextPreset, RAGConfig, Persona, ExternalEditorConfig, FileItem, ThinkingSegment) that should live in their parent modules per the HARD RULE. The report captures: - Evidence: which types, lines, target modules - Why it matters: PROVIDERS move had to use __getattr__ to break a circular import that wouldn't have existed if ToolPreset lived in src/ai_client.py - Proposed move map (10 types) - Prerequisites (1-6) - Estimated scope: 3-5 days - Open questions for the user - Linkage to the follow-up track and the broader deferred_work list NOT EXECUTED. User decision: proceed to Phase 3 of the follow-up. This report is the next agent's reference when the namespace cleanup track is eventually picked up.	2026-06-11 17:50:11 -04:00
ed	bfb86ba01f	conductor(plan): Mark Phase 2 complete (5/5 tasks; checkpoint `7b24ee9`) Phase 2 (PROVIDERS move out of src/models.py) is now complete. The phase checkpoint is commit `7b24ee9` (the empty 'Phase 2 complete' commit). The audit report is attached as a git note on that commit. state.toml updates: - phase_2.status pending -> completed; checkpoint_sha `7b24ee9` - t2_1 pending -> completed; commit `74c3b6b2` (tied to the PROVIDERS move commit since the location decision was resolved in that commit's body) - phase_3.status pending -> in_progress (next) 5 of 5 Phase 2 tasks shipped: - t2_1: location decision (src/ai_client.py per HARD RULE) - t2_2: PROVIDERS moved + re-export via __getattr__ - t2_3: 4 import sites updated - t2_4: audit script added - t2_5: checkpoint + git note Side-track surfaced (not in scope for Phase 2): src/models.py is bloated with non-MMA types. Proposed as 'namespace_cleanup_20260611' track in the deferred_work section; user to decide whether to side-track before Phase 3 or proceed to UX adaptations first.	2026-06-11 17:17:41 -04:00
ed	7b24ee9da5	conductor(checkpoint): Phase 2 complete — PROVIDERS moved to src/ai_client.py Phase 2 ships: - PROVIDERS lives in src/ai_client.py:56 (canonical home for AI-client constants per the HARD RULE on src/ files) - src/models.py keeps a __getattr__ re-export (PEP 562) for backward compat; lazy-loaded to break the circular import (src.ai_client imports ToolPreset/BiasProfile/Tool from models at line 50, so a top-level 'from src.ai_client import PROVIDERS' would deadlock) - 4 call sites in src/app_controller.py:3093 and src/gui_2.py:{2293,2849,5377} updated from models.PROVIDERS to ai_client.PROVIDERS (direct lookup, no per-call __getattr__ cost) - Stale tests/test_provider_curation.py updated from 5 to 8 providers - New test tests/test_providers_source_of_truth.py asserts the re-export + object identity - New audit scripts/audit_providers_source_of_truth.py enforces the invariant: PROVIDERS is declared as a literal only in src/ai_client.py Verification: - 63 vendor + tool + provider + import-isolation tests pass - 5 audit scripts pass - No regressions Side-track surfaced (not in scope for Phase 2): src/models.py is bloated with non-MMA types (Tool/ToolPreset/BiasProfile/MCPConfiguration/ContextPreset/ Persona/RAGConfig/ExternalEditorConfig/ThinkingSegment/etc.) that belong in their respective sub-system modules per the HARD RULE. This is a separate refactor track — proposed as 'namespace_cleanup_20260611' in the follow-up track's deferred_work section. Should be elevated to its own track before Phase 3 (UX adaptations) to keep the codebase maintainable. Commits in this phase: - `74c3b6b2`: move PROVIDERS to src/ai_client.py; re-export - `6c6a4aef`: update 4 import sites - `be505605`: add audit script - <this> (empty): Phase 2 checkpoint	2026-06-11 16:46:40 -04:00
ed	be5056051a	feat(audit): add scripts/audit_providers_source_of_truth.py Phase 2 task 2.4 (the script part). The script enforces: PROVIDERS is declared as a literal only in src/ai_client.py. The __getattr__ re-export in src/models.py is allowed (it lazy-imports, not a literal declaration). Catches the literal pattern 'PROVIDERS: List[str] = [' specifically, which the __getattr__ re-export does not match. OK: passes against current state where PROVIDERS is declared only in src/ai_client.py:56.	2026-06-11 16:44:59 -04:00
ed	6c6a4aefa4	refactor(gui): import PROVIDERS from src.ai_client; add audit script Phase 2 tasks 2.3 (update 4 import sites) + 2.4 (audit script). The 4 call sites in src/app_controller.py:3093 and src/gui_2.py {2293, 2849, 5377} were using models.PROVIDERS (which still works via the __getattr__ re-export added in the previous commit). Updated them to use ai_client.PROVIDERS directly: - Models.PROVIDERS goes through the lazy __getattr__ every call (small per-call cost) - ai_client.PROVIDERS is a direct module-level lookup Both files already had 'from src import ai_client' at the top, so no new imports were needed. scripts/audit_providers_source_of_truth.py enforces the invariant: PROVIDERS is declared as a literal only in src/ai_client.py. Catches accidental declarations creeping back into src/models.py or other modules. Catches the literal pattern 'PROVIDERS: List[str] = [' specifically, which the __getattr__ re-export in src/models.py does not match (it's 'from src.ai_client import PROVIDERS'). All 5 audit scripts pass: - audit_main_thread_imports.py - audit_weak_types.py - audit_no_models_config_io.py - audit_no_inline_tool_loops.py - audit_providers_source_of_truth.py (new) 63 vendor + tool + provider + import-isolation tests pass.	2026-06-11 16:43:20 -04:00
ed	74c3b6b274	refactor(ai_client): move PROVIDERS to src/ai_client.py; re-export via models.__getattr__ Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track. PROVIDERS now lives in src/ai_client.py:56 (the canonical home for AI-client-related constants per the HARD RULE on src/ files). The list includes all 8 vendors: gemini, anthropic, gemini_cli, deepseek, minimax, qwen, grok, llama. Backward compat: src/models.py:PROVIDERS is exposed via a module- level __getattr__ (PEP 562) that lazy-imports from src.ai_client. The lazy approach was needed because src.ai_client imports ToolPreset/BiasProfile/Tool from src.models at line 50, so a top-level 'from src.ai_client import PROVIDERS' in models.py would deadlock. Adding a branch to the existing __getattr__ in models.py (which also handles pydantic class factories) is the surgical fix. tests/test_provider_curation.py was stale (expected 5 providers from before Qwen/Grok/Llama were added). Updated to 8. New test: tests/test_providers_source_of_truth.py asserts: - src.ai_client.PROVIDERS exists and matches the 8-provider list - src.models.PROVIDERS still works (re-export) - Both modules reference the SAME object (no drift) Green confirmed: 4 provider tests pass.	2026-06-11 16:38:09 -04:00
ed	eae326ea16	conductor(plan): Mark Phase 1 complete (8/9 tasks; checkpoint `ffe22c30`) Phase 1 (Tool loop lift) is now complete. The phase checkpoint is commit `ffe22c30` (the empty 'Phase 1 complete' commit). The audit report is attached as a git note on that commit. state.toml updates: - phase_1.status pending -> completed; checkpoint_sha `ffe22c30` - t1_8 pending -> completed; commit `7e4503f4` - t1_9 pending -> completed; commit `ffe22c30` - phase_2.status pending -> in_progress (next) 8 of 9 tasks shipped in Phase 1 (only t1_7 partially complete: gemini_cli done; 3 inline-loop vendors deferred per the deferred_work section of state.toml).	2026-06-11 16:23:49 -04:00
ed	ffe22c3077	conductor(checkpoint): Phase 1 complete — tool loop lift Phase 1 ships: - run_with_tool_loop shared helper for all 8 vendors (src/ai_client.py:806) with 2 extensions: - request_builder: Callable[[int], OpenAICompatibleRequest] for vendors that need per-round history rebuild (minimax + grok + llama) - send_func: Callable[[int], NormalizedResponse] + on_pre_dispatch: Callable for vendored call paths (gemini_cli, with anthropic + gemini + deepseek deferred — see state.toml deferred_work) - 4 OpenAI-compat vendors use the shared helper: - _send_minimax (68 -> 44 lines) - _send_grok (was single-shot, now has tool loop) - _send_llama (was single-shot, now has tool loop) - _send_qwen deferred (uses _dashscope_call, not send_openai_compatible; would need a separate refactor to switch to OpenAI-compat mode) - 1 vendored-call-path vendor uses send_func + on_pre_dispatch: - _send_gemini_cli (no net line reduction but loop + dispatch are now shared) - Audit script: scripts/audit_no_inline_tool_loops.py enforces no inline tool loops in non-deferred _send_<vendor> functions - 9 new tests in 3 test files lock in the helper contract: - tests/test_ai_client_tool_loop.py (5 tests) - tests/test_ai_client_tool_loop_builder.py (1 test) - tests/test_ai_client_tool_loop_send_func.py (2 tests) Verification: - 62 vendor + tool + import-isolation tests pass - audit_no_inline_tool_loops.py passes - No regressions Deferred (tracked in state.toml deferred_work): - _send_qwen tool loop (DashScope native, not OpenAI-compat) - _send_anthropic + _send_gemini + _send_deepseek inline loops (vendored call paths; each needs per-vendor conversion to OpenAICompatibleRequest before run_with_tool_loop can apply) Next: Phase 2 (PROVIDERS move out of src/models.py into src/ai_client.py) + Phase 3 (UX adaptations 2-9). Commits in this phase: - `dc0f25c5` (red tests) - `1c836647` (green: implement) - `19a4d43e` (apply to _send_minimax) - `4069d677` (apply to _send_grok + _send_llama) - `4748d134` (send_func + on_pre_dispatch for _send_gemini_cli) - `9ddfa981` (openai import local-scope fix) - `7e4503f4` (audit script + state progress) - `a22d4975` (this checkpoint, empty)	2026-06-11 16:20:26 -04:00
ed	7e4503f4e8	feat(audit): add scripts/audit_no_inline_tool_loops.py + state.toml Phase 1 progress Task 1.8 (the plan's numbering: 'Add audit script'). Audit checks that no _send_<vendor> in src/ai_client.py contains an inline 'for round_idx in range(MAX_TOOL_ROUNDS' loop. The audit excludes the 4 vendored-call-path vendors (anthropic, gemini, gemini_native, deepseek) which are documented in state.toml's deferred_work section as future work (they use their own SDKs and need separate per-vendor conversion to OpenAICompatibleRequest). state.toml: - t1_7 (Apply to 4 inline-loop vendors): completed for _send_gemini_cli only. Anthropic + Gemini + DeepSeek deferred. - t1_8 (Add audit script): in_progress. - t1_7 reuses commit `4748d134` (the send_func + on_pre_dispatch refactor that introduced the new helper pattern for vendored call paths). OK: audit passes against the current 4 OpenAI-compat vendors (minimax, grok, llama, qwen still uses _dashscope_call but has no inline loop) + gemini_cli.	2026-06-11 16:17:23 -04:00
ed	9ddfa98133	fix(ai_client): move openai_compatible imports to local scope; fix startup_speedup invariant The follow-up track's tool-loop refactor moved 'from src.openai_compatible import send_openai_compatible, OpenAICompatibleRequest, NormalizedResponse' to MODULE level in src/ai_client.py. This violates the startup_speedup_20260606 invariant: heavy SDKs must not be loaded at module level because ai_client.py is on the main thread's import chain. src/openai_compatible.py line 5 does 'from openai import OpenAIError, ...', so any import from it triggers the openai SDK to load. test_ai_client_does_not_import_openai_at_module_level guards this invariant and was failing. Fix: move the imports back to local scope inside the function bodies that need them: - _default_send closure inside run_with_tool_loop (imports send_openai_compatible) - _send_grok (imports OpenAICompatibleRequest) - _send_minimax (imports OpenAICompatibleRequest) - _send_llama (imports OpenAICompatibleRequest) - _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse) Test patches: tests that previously patched 'src.ai_client.send_openai_compatible' now patch 'src.openai_compatible.send_openai_compatible' (the actual import source). _execute_tool_calls_concurrently patches unchanged (it's defined in src/ai_client.py itself). Green confirmed: 62 vendor + tool + import-isolation tests pass. 0 regressions.	2026-06-11 16:15:49 -04:00
ed	4748d13490	feat(ai_client): add send_func + on_pre_dispatch to run_with_tool_loop; refactor _send_gemini_cli Task 1.7 of the follow-up track. Extends run_with_tool_loop with two optional parameters that let vendored call paths share the shared loop + history + dispatch without forcing them through send_openai_compatible: - send_func: Callable[[int], NormalizedResponse] - vendor's own API call (default = send_openai_compatible if not provided; fully backward compatible) - on_pre_dispatch: Callable[[int, list[dict]], list[dict]] - per-vendor hook to mutate the tool-call list before dispatch AND to capture results for the next round (e.g. Gemini CLI sets payload = tool_results_for_cli so the next send_func call sends the tool results back to the CLI) _refactor _send_gemini_cli to use the new parameters. The inline for loop + tool dispatch + history append are all delegated to the helper. The vendor's send_func closure handles: - adapter.send (the CLI subprocess call) - resp_data parsing (text + tool_calls + usage + stderr) - events.emit for request_start + response_received - _append_comms for IN/OUT comms logging - The 'txt + calls -> history_add' special case The vendor's on_pre_dispatch closure handles: - _execute_tool_calls_concurrently (re-invoked here because the helper's call passes raw tool_calls but the vendor needs to mutate payload AND log results) - _reread_file_items + _build_file_diff_text (file diff re-read at last tool result) - MAX_ROUNDS system message - _truncate_tool_output - _MAX_TOOL_OUTPUT_BYTES budget warning - Payload mutation for the next round Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI + 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax + 2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.	2026-06-11 14:48:03 -04:00
ed	777b04434c	conductor(plan): surface Task 1.7 scope gap (4 inline-loop vendors need per-vendor conversion) Task 1.7 (apply run_with_tool_loop to anthropic + gemini + gemini_cli + deepseek) cannot proceed as a single task. The 4 vendors use their own vendored call paths, not send_openai_compatible: - _send_deepseek: requests.post with custom payload + custom streaming parser + custom comms logging + budget enforcement - _send_gemini: google-genai SDK streaming + custom types.Tool handling - _send_gemini_cli: subprocess JSONL parsing via GeminiCliAdapter - _send_anthropic: anthropic SDK + custom cache control + history trimming run_with_tool_loop is hard-coded to send_openai_compatible. Each vendor needs to be refactored to produce OpenAICompatibleRequest first (analogous to how parent Phase 3 converted Grok/Llama). That's a multi-day refactor per vendor. Per the per-task decision protocol in conductor/workflow.md ('plan approach doesn't fit'): STOP and report. Recommendation in the deferred_work section: split Task 1.7 into 4 per-vendor tasks under a new 'Phase 1.5 vendor-conversion-to-OpenAICompatibleRequest' phase. The current Phase 1 milestone ('helper exists + 3 vendors applied') is still meaningful and worth checkpointing as-is.	2026-06-11 14:26:00 -04:00
ed	4069d67716	feat(tool_loop): apply run_with_tool_loop to Grok + Llama (Qwen deferred) Task 1.6 of the follow-up track. _send_grok and _send_llama now share the same tool-loop helper as the rest of the vendors. Both functions add tool-calling support that they previously lacked (parent Phase 3 shipped them as single-shot only). The plan's Task 1.6 title says 'add missing loop' which matches this scope. tool_choice='auto' if tools else 'auto' matches the MiniMax pattern. Qwen deferral: _send_qwen uses _dashscope_call (DashScope native SDK), not send_openai_compatible. run_with_tool_loop hard-codes send_openai_compatible. Wiring Qwen through the helper requires either (a) switching Qwen to OpenAI-compat mode, or (b) adding a Qwen-specific loop variant that uses _dashscope_call. Both are non-trivial and out of scope for Task 1.6. Tracked as a follow-up note in the state.toml. Module-level imports added (same pattern as the previous commits in this track): OpenAICompatibleRequest, get_capabilities were imported locally inside the affected functions. Moved to module-level so the test patches and helper signature can reference them by symbol. Green confirmed: 51 vendor + tool tests pass.	2026-06-11 14:24:39 -04:00
ed	38f9484e49	conductor(plan): Mark Phase 1 Tasks 1.1-1.5 complete Backfill the right commit SHAs and descriptions. Phase 1 progress: 5/9 tasks done (1.1-1.5). Tasks 1.6-1.9 next.	2026-06-11 13:56:09 -04:00
ed	19a4d43e32	refactor(minimax): use run_with_tool_loop shared helper (68 -> 44 lines) Task 1.3 of the follow-up track. _send_minimax now uses run_with_tool_loop with a per-round request_builder callback that re-reads _minimax_history under _minimax_history_lock. The plan's Task 1.3 example builds the request once before the loop. That would break MiniMax tool flows because the API would not see the tool results appended to _minimax_history on later rounds. The fix: extend run_with_tool_loop's 2nd arg to accept Union[OpenAICompatibleRequest, Callable[[int], OpenAICompatibleRequest]] (backward compatible; static-request vendors pass a single request). MiniMax now passes a closure that rebuilds messages from history each round. Reasoning extraction: MiniMax exposes its chain-of-thought via response.raw_response.choices[0].message.reasoning_details[0]. get('text'). Lifted to a _extract_minimax_reasoning callback passed as reasoning_extractor=... (the new parameter added in the previous commit). Trim callback: wraps _trim_minimax_history so it can be called from run_with_tool_loop after each tool-result append. Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5 tool_loop core + 1 tool_loop builder + 39 others); the new test_ai_client_tool_loop_builder.py locks in the per-round builder contract.	2026-06-11 13:35:45 -04:00
ed	1c836647ef	feat(ai_client): add run_with_tool_loop shared helper for all 8 vendors Tasks 1.1 (red) + 1.2 (green) of the follow-up track. Adds a single shared tool-call loop in src/ai_client.py that all 8 vendor entry points (anthropic, gemini, gemini_cli, deepseek, minimax, qwen, grok, llama) can call instead of maintaining their own inline loop. Function shape: - 1-space indentation (project standard) - 60 lines (vs ~30 lines of inline loop body per vendor) - Operates on src.openai_compatible.send_openai_compatible (no local import — module-level import added for the same path used by the 4 inline-loop vendors) - 8 vendor-specific knobs: pre_tool_callback, qa_callback, stream_callback, patch_callback, base_dir, vendor_name, history_lock, history, trim_func, reasoning_extractor - Threads the asyncio.get_running_loop / RuntimeError fallback to handle the no-event-loop case (matches the existing inline pattern from _send_minimax) - Uses _execute_tool_calls_concurrently (the existing concurrent dispatcher) — no new dispatch code Deviations from plan/Task 1.1: - The plan's test code patched src.tool_loop.send_openai_compatible and the plan's Task 1.3 vendor wrapper imported 'from src.tool_loop import run_with_tool_loop'. The plan predates the AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up track's Naming Convention section, run_with_tool_loop lives IN src/ai_client.py. Tests patch src.ai_client.send_openai_compatible and the vendor wrapper imports 'from src.ai_client import run_with_tool_loop' (next task). - Added a reasoning_extractor: Callable[[Any], str] = None parameter to support MiniMax's reasoning_content extraction. Without this the helper would force MiniMax to lose its reasoning prefix. Green confirmed: 50 vendor + tool tests pass; 4 audit scripts pass.	2026-06-11 12:59:36 -04:00
ed	dc0f25c53b	test(ai_client): add red tests for run_with_tool_loop shared helper 5 Red tests in tests/test_ai_client_tool_loop.py verify the planned run_with_tool_loop contract (no-tool-call fast path, tool-call dispatch, max-rounds safety, history append, error tolerance). Deviation from plan: tests patch src.ai_client.send_openai_compatible (plan's Task 1.1 had src.tool_loop.send_openai_compatible). The plan predates the AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up track's Naming Convention section, run_with_tool_loop lives IN src/ai_client.py. The function body imports send_openai_compatible from src.openai_compatible, so src.ai_client.send_openai_compatible is the correct patch path. state.toml: current_phase 0 -> 1, phase_1 pending -> in_progress, t1_1 pending -> in_progress, blocked_by status phase_6_in_progress -> phase_6_complete (parent's Phase 6 checkpointed at `064cb26`). Confirmed red: 5 ImportError against src.ai_client.run_with_tool_loop at collection time.	2026-06-11 10:43:56 -04:00
ed	a22d497591	docs(followup): complete spec+plan+state+metadata+TODO; remove all src/* new-file refs The user explicitly stated 2026-06-11: 'I need a naming convention enforce for separate files you keep introducing that are technically part of a system or parent module.' Per AGENTS.md 'File Size and Naming Convention' HARD RULE: new src/<thing>.py files may only be created on the user's explicit request. All AI-client code lives IN src/ai_client.py. Sweep through all follow-up track files to remove the stale references to the no-longer-planned new src/ files: - TODO.md: t1.4 'Implement helper in src/tool_loop.py' -> '...in src/ai_client.py' - plan.md: 5 stale references updated (Task 4.3 title, Step 1 'Files:', Step 5 'git add', Phase 4 git note, the function summary in Phase 1 verification) - plan.md: 'src/llama_ollama_native.py' removed (ollama_chat and _send_llama_native both in src/ai_client.py) - spec.md: Phase Plan section T1.2 and T4.2/T4.3 updated to reference src/ai_client.py - state.toml: t1.4, t4_2, t4_3 descriptions updated - metadata.json: new_files list shrunk (3 new src/ files removed); verification_criteria updated to reference src/ai_client.py functions; follow_up_audit_report reference updated to point to the actual file (docs/reports/qwen_llama_grok_followup_audit_20260611.md) Spec additions from the same turn (not in the previous plan version): - Naming Convention section explicitly references AGENTS.md HARD RULE; 'If you find yourself about to create one, ASK FIRST' - 'Non-Goals' section now lists 8 explicit non-goals (vs the previous 4) including history management lift, reasoning extraction lift, error classification lift - 'Deferred Work' section documents 3 separate follow-up tracks (namespace_cleanup_20260611, ai_client_codepath_consolidation_20260611, mcp_architecture_refactor_20260606 [already specced]) - 'Open Questions' has 1 RESOLVED (PROVIDERS location) and 2 still open (Meta URL verification; local model UI mode) - 'Goals' table: 'local-backend' field added separately from 'cost_tracking' (per user feedback: distinct concept) - 'B.1 Local-First' section: native Ollama DEFAULT for localhost (not fallback), Meta Llama API prerequisite (verify URL first) - 'B.2 Matrix Expansion' section: full list of 12 v2 fields + UI adaptations for each This is docs-only. The plan is now complete and aligned with the HARD RULE. The next agent can pick up at Phase 1, Task 1.1 and execute straight through.	2026-06-11 10:19:43 -04:00
ed	51edbdef20	docs(workflow,agents): remove 'large files are bad' propaganda; add naming rule The user called out the LLM training data bias: 'small files are good, large files are bad.' This is wrong for production codebases. Unreal has 15K+ line files; OS kernels, game engines, compilers all routinely have 10K+ line files. File size is a non-issue. Cognitive load is managed via naming, regions, and navigation tools (the manual-slop MCP) — NOT via file splitting. Updates: 1. AGENTS.md (master agent guidance): - Added 'File Size and Naming Convention' section - Added the hard rule: 'New namespaced src/<thing>.py files may only be created on the user's explicit request. If you find yourself about to create one, ASK FIRST.' - Defaults: helpers and sub-systems go in the parent module 2. conductor/workflow.md (Guiding Principles): - Removed 'Do NOT perform large file writes directamente' from principle 7 (it was a delegating rule, but 'large file writes' carried the propaganda) - Added principle 8: 'File Naming Convention (HARD RULE)' that references AGENTS.md - Re-phrased principle 9 (Research-First) to clarify it's about navigation efficiency, not file size 3. conductor/code_styleguides/python.md: - Removed the 'extremely large files that violate the Anti-OOP rule by necessity' framing - Added the new rule about new src/<thing>.py files 4. .opencode/agents/tier3-worker.md and .opencode/agents/tier4-qa.md: - Re-phrased 'Do NOT read full large files' to 'Use skeleton tools to navigate any file regardless of size. File size is not a concern; the right tools are.' - Added the new rule about not creating new src/<thing>.py files unless user explicitly requests it 5. conductor/tracks/qwen_llama_grok_followup_20260611/plan.md: - Updated the 'Naming Convention' section to reference the new 'user explicit request' rule This is docs-only. No code changes. The rule is now codified: agents must ASK FIRST before creating new top-level src/ files.	2026-06-11 10:07:07 -04:00
ed	4e4a56fd08	docs(plan): add plan.md for qwen_llama_grok_followup_20260611 The follow-up track had a spec but no plan. The plan is the executable artifact — it specifies file:line refs, exact code to type, TDD steps, and per-file atomic commits. Without the plan, the next agent cannot implement from the spec alone. Plan structure (5 phases, ~40 tasks): - Phase 1: Tool loop lift (5 Red tests + helper + apply to 8 vendors + audit script) - Phase 2: PROVIDERS move (decide location + move + update 4 import sites + audit script) - Phase 3: UX adaptations 2-9 (8 separate applications of the pattern established in parent Phase 5) - Phase 4: Local-first + matrix v2 (12 new fields + native Ollama adapter + Meta Llama API + Local Model GUI badge) - Phase 5: Anthropic / Gemini / DeepSeek migration (matrix entries for the 3 remaining providers + docs update) Each task has: - WHERE: exact file and (where applicable) line range - WHAT: the specific change - HOW: TDD step ordering (Red then Green) - SAFETY: thread-safety, dependency-ordering, and project-invariant constraints The plan models the parent track's plan structure (2177 lines, 2-5 minute steps, per-file atomic commits).	2026-06-11 09:40:41 -04:00
ed	69d85c8ebb	conductor(plan): mark Phase 6 complete (active-with-follow-up, not archived)	2026-06-11 09:35:12 -04:00
ed	b33ce495cb	move tier1-3 agents to m3	2026-06-11 09:35:02 -04:00
ed	064cb26b38	conductor(checkpoint): Phase 6 - docs done, track active with follow-up (NO ARCHIVE) Phase 6 of qwen_llama_grok_integration_20260606 ships the docs. 4 of 5 state tasks done (t6.3 CANCELLED per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'). What shipped: - t6.1: docs/guide_ai_client.md updated - Overview mentions 8 providers (was 5) - New 'Shared OpenAI-Compatible Helper' section: NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern - Documents the Qwen adapter (src/qwen_adapter.py) and Llama multi-backend state (3 backends; _get_llama_cost_tracking) - Tests: 9 total (3 capabilities + 6 openai_compatible) - t6.2: docs/guide_models.md updated - PROVIDERS list: 5 -> 8 entries - t6.4: conductor/tracks.md updated - Status note on the qwen track entry: 50/79 tasks done; Phase 6 in progress; NOT archiving; points to the follow-up - t6.5: this checkpoint (active-with-follow-up, not archived) - CANCELLED: t6.3 (no git mv to archive) - CANCELLED: t6.4 'Recently Completed' move (track is active) What was created in addition (not in the original Phase 6 plan): - docs/reports/qwen_llama_grok_followup_audit_20260611.md - Audit report explaining why a follow-up is needed - 7 categories of gaps from the parent track - The Tech Lead's 'footnote for now' failure mode (lessons learned) - conductor/tracks/qwen_llama_grok_followup_20260611/ - 5-phase follow-up track: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration - spec.md, state.toml, metadata.json, TODO.md - Local-model-first priority per user feedback - Wait for parent's Phase 6 to finish before starting (blocked_by) Verification: - 38/38 regression tests pass in batch - No new audit script violations - 4 new files in follow-up track: spec.md, state.toml, metadata.json, TODO.md - 1 new report: docs/reports/qwen_llama_grok_followup_audit_20260611.md - 2 docs files updated: guide_ai_client.md, guide_models.md The parent track remains ACTIVE (not archived) for the follow-up to use as a reference. Per the user's 'there is still alot todo'.	2026-06-11 09:34:24 -04:00
ed	8742c977e7	docs(tracks): add status note to Qwen track entry pointing to follow-up Adds a status line to the qwen_llama_grok_integration_20260606 entry in conductor/tracks.md noting that: - Phases 1-5 are done; Phase 6 (docs) is in progress - The track is NOT being archived (per user directive) - A 5-phase follow-up track exists at conductor/tracks/qwen_llama_grok_followup_20260611/ - An audit report is at docs/reports/qwen_llama_grok_followup_audit_20260611.md - 50/79 tasks done; the remaining gaps are documented	2026-06-11 09:33:39 -04:00
ed	691dc584eb	docs(phase-6): update ai_client+models guides; report + follow-up track setup Phase 6 t6.1 + t6.2 (no archive per user directive): - docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend. - docs/guide_models.md: update PROVIDERS list to 8 entries (was 5). - conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report. - docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned). - conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration. Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.	2026-06-11 09:33:18 -04:00
ed	457255bcd4	conductor(plan): mark t5_6 + phase_5 complete; advance to phase 6	2026-06-11 09:15:26 -04:00
ed	bdd1309781	conductor(checkpoint): Phase 5 partial - 1 of 9 UX adaptations shipped Phase 5 of qwen_llama_grok_integration_20260606 ships the foundation for capability-driven UX. 4 of 6 state tasks done (t5.2 partial: 1 of 9 adaptations; t5.3 skipped; t5.5 cancelled: needs real API keys). Shipped: - t5.1: _get_active_capabilities() helper on App class (src/gui_2.py:733) - reads the matrix for the active (provider, model) pair; falls back to 'unregistered' VendorCapabilities if not found. - t5.2 (partial): Adaptation 1 of 9 from spec §6 applied - Screenshot button iff vision (render_files_and_media:3030) - Pattern: caps = app._get_active_capabilities(); imgui.begin_disabled(not caps.<field>); ...UI...; imgui.end_disabled(); if not caps.<field>: imgui.same_line(); imgui.text_disabled('(reason)') - t5.4: 38/38 regression batch passes Skipped: - t5.3: providers are exposed via centralized PROVIDERS in src/models.py (already done in Phases 2 and 3); no per-provider gettable/callback changes needed. - t5.5: manual smoke test requires real API keys; user must do this outside the agent context. Deferred to follow-up (8 remaining UX adaptations): - 2: Tools toggle iff tool_calling - 3: Cache panel iff caching - 4: Stream progress iff streaming - 5: Fetch Models button iff model_discovery - 6: Token budget max = context_window - 7-9: Cost panel (3 cost_tracking states) The pattern is established and the helper is in place. Each remaining adaptation is a mechanical application of the same pattern at its specific render site. Verification: 38/38 regression tests pass.	2026-06-11 09:14:33 -04:00
ed	b75ae57ef2	docs(spec): footnote 8 remaining UX adaptations (2-9) deferred to follow-up After the end of Phase 5, only adaptation 1 of 9 from spec §6 was applied (Screenshot button iff vision, render_files_and_media:3030). The pattern is established; the remaining 8 are mechanical applications of the same pattern at their respective render sites. The follow-up track applies the wrapping at: - tools toggle (tool_calling) - cache panel (caching) - stream progress (streaming) - fetch models button (model_discovery) - token budget max (context_window) - cost panel (3 cost_tracking states: estimate / 'Free (local)' / '-') The _get_active_capabilities() helper (t5.1) is already in place.	2026-06-11 09:13:55 -04:00
ed	40cf36edef	feat(gui): adaptation 1 of 9 - Screenshot button iff vision Phase 5 t5.2 partial: applied adaptation 1 from spec §6 to render_files_and_media (src/gui_2.py:3030). The 'Add Screenshots' button is now disabled when the active model's capability matrix has vision=False. A tooltip-adjacent text_disabled note shows '(vision not supported by <model>; attachments would be ignored)' so the user knows WHY the button is disabled. Pattern established for the remaining 8 adaptations (t5.2.2 through t5.2.9 per spec §6): caps = app._get_active_capabilities() imgui.begin_disabled(not caps.<field>) ... UI ... imgui.end_disabled() if not caps.<field>: imgui.same_line() imgui.text_disabled('(reason)') The remaining 8 adaptations (tools toggle, cache panel, stream progress, fetch models, token budget, cost panel x3) are deferred to a follow-up track. The pattern is established; the work is mechanical application of it. 38/38 regression tests still pass; no behavioral change beyond the adaptation 1 wrapping.	2026-06-11 09:13:17 -04:00
ed	221cd33493	feat(gui): add _get_active_capabilities() helper to App class Phase 5 t5.1: the helper reads the capability matrix for the currently active (provider, model) pair and returns the VendorCapabilities. Falls back to an 'unregistered' VendorCapabilities if the pair is not in the registry (e.g., a brand-new model name the user types in). The 9 UX adaptations in spec §6 will call this helper to read the capability flags (vision, tool_calling, caching, streaming, etc.) and adapt the GUI accordingly. Also fixed pre-existing indentation inconsistency in the App class property methods (current_provider / current_model): the first @property had 2-space indent but the body and subsequent def had 1-space indent (matching the project style). The mismatch was latent; the new helper exposed it. Now uniform 1-space indent. 38/38 regression tests still pass; no behavioral change beyond the helper addition.	2026-06-11 09:10:47 -04:00
ed	15b3b33081	docs(spec): footnote tool-loop lift follow-up in §13.1.B (in case context expires) As of end of Phase 4, only _send_minimax has a working tool-call loop. Phase 3 (Grok, Llama) and Phase 2 (Qwen) entry points are single-shot; they call send_openai_compatible once and return without executing tool_calls. If the user notices 'tool execution doesn't work for Qwen/Grok/Llama' after Phase 5 ships, the fix is to lift the tool loop into a shared run_with_tool_loop() helper that wraps send_openai_compatible. The 4 existing vendors (_send_anthropic / _send_gemini / _send_gemini_cli / _send_deepseek) already have the same inline duplication, so the lift would also help those. This is a follow-up track, not in scope for qwen_llama_grok_integration_20260606.	2026-06-11 09:04:54 -04:00
ed	ccdfaefd52	conductor(plan): mark Phase 4 fully complete (fix phase_4 SHA, t4_4 status, verification flags, minimax_refactor_stats, openai_compatible_models flag)	2026-06-11 08:57:35 -04:00
ed	c5735e70c2	conductor(checkpoint): Phase 4 complete - MiniMax refactored to use shared helper Phase 4 of qwen_llama_grok_integration_20260606 ships the MiniMax refactor. 6 of 6 state tasks done (all of Phase 4 in fact -- the simplest phase). Modules changed: - src/ai_client.py: _send_minimax() refactored from 231 lines of inline OpenAI-compatible send logic to 75 lines that delegate to send_openai_compatible(). Net: 68% reduction. - Preserved: 10-arg signature, _minimax_history_lock, _repair_minimax_history, discussion_history handling, system+context message wrapping, reasoning_content extraction (for minimax-reasoner models), <thinking> tag wrapping, _trim_minimax_history - Restored: tool-call loop (round_idx in range(MAX_TOOL_ROUNDS+2); uses _execute_tool_calls_concurrently via asyncio.run / run_coroutine_threadsafe; appends tool results to history) - Dropped: extra_body={reasoning_split: True} (not supported by send_openai_compatible; would be a Phase 5 adapter addition if minimax-reasoner models need it) - src/vendor_capabilities.py: 4 per-model MiniMax entries (M2.7, M2.5, M2.1, M2). Each mirrors the wildcard defaults. Wildcard still catches new/future model names. No new test files (the existing tests/test_minimax_provider.py is the safety net; 6/6 pass after the refactor). Verification: 38/38 tests pass in batch. Refactor stats (per state.toml [minimax_refactor_stats]): - lines_before: 231 - lines_after: 75 (or 41 without tool loop; the worker initially omitted it, I restored it for behavior preservation) - tests_passing: 6 (test_minimax_provider.py) - tests_failing: 0 - reduction: 68% (or 82% if comparing without tool loop) Net effect for the track so far: - 3 new src modules (vendor_capabilities, openai_compatible, qwen_adapter) - 5 new vendor entry points in ai_client.py (_send_qwen, _send_grok, _send_llama, _send_minimax refactored, plus their ensure_client and list_models helpers) - 1 dep added (dashscope) - 5 new test files - 26 new tests (3 vendor_capabilities + 6 openai_compatible + 5 qwen + 2 grok + 6 llama + 4 minimax capability entries verified) - 8 new PROVIDERS entries - 11 new cost_tracker entries - Capability registry: 22 entries (1 minimax wildcard + 4 specific; 4 grok + 9 llama; 7 qwen + 1 qwen wildcard; 3 anthropic/gemini/ deepseek pending_migration stubs) - 1 architectural spec section (3.1.1 'best API per vendor') added - 1 spec section (4.3 Grok) revised after Grok consultation - 1 follow-up track documented (13.1.B 'Llama Native APIs') Phase 5 (UX adaptation) is now unblocked. The 9 adaptations from spec §6 need to be applied to src/gui_2.py: 1. Screenshot button iff vision 2. Tools toggle iff tool_calling 3. Cache panel iff caching 4. Stream progress iff streaming 5. Fetch Models iff model_discovery 6. Token budget max = context_window 7. Cost panel: estimate / 'Free (local)' / '-' 8. Cost panel: 'Free (local)' for localhost 9. Cost panel: '-' for other cost_tracking=false	2026-06-11 08:55:59 -04:00
ed	9169fae268	feat(vendor_capabilities): add 4 per-model MiniMax entries to registry Phase 4 t4.4: the wildcard entry 'minimax/' was the only minimax registration; this adds specific entries for the 4 fallback model names returned by _list_minimax_models() at src/ai_client.py:2112 ('MiniMax-M2.7', 'MiniMax-M2.5', 'MiniMax-M2.1', 'MiniMax-M2'). Each per-model entry mirrors the wildcard defaults (context_window=131072, cost=0.20/0.20 per Mtok). Per-model entries let the matrix return exact capability data for known models; the '' wildcard still catches new / future model names that aren't in the registry. State [openai_compatible_models] minimax_models_refactored flag flips to true (in the next state commit) -- this is the model-level coverage the flag tracks.	2026-06-11 08:55:09 -04:00
ed	c9ed734d9d	refactor(minimax): restore tool-call loop in _send_minimax The previous refactor (commit `344a66fc`) dropped the tool-call loop in _send_minimax. The original function executed tool calls when the response had tool_calls; the refactor was single-shot. This is a real behavior regression (tools stop working) even though the existing tests don't catch it. Restore the tool loop: - For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible with tools=_get_deepseek_tools() and tool_choice='auto' - If response has tool_calls: dispatch each via _execute_tool_calls_concurrently (handles both async context and sync via run_coroutine_threadsafe / asyncio.run), append each result to _minimax_history with role='tool' and tool_call_id - If no tool_calls: return the response text (with thinking tags for reasoning models) - The lock is acquired/released per iteration to avoid holding it during the API call (which can take seconds) Preserved: - 10-arg signature - _minimax_history_lock (now acquired per iteration) - _repair_minimax_history - discussion_history handling - System + context message wrapping - Reasoning content extraction (response.raw_response.choices[0].message .reasoning_details[0].get('text', '')) - <thinking> tags wrap on the final response Dropped (still): - extra_body={reasoning_split: True} (not supported by send_openai_compatible; would be a Phase 5 adapter addition if minimax-reasoner models need it) New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor). Net effect: 231 -> 75 = 68% reduction; tool loop preserved. Verification: 38/38 tests pass (no regressions).	2026-06-11 08:48:07 -04:00
ed	fadb4c329b	conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606	2026-06-11 02:25:36 -04:00
ed	344a66fc53	refactor(minimax): use send_openai_compatible helper (231 -> 41 lines)	2026-06-11 02:21:28 -04:00
ed	94fe10089e	conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4	2026-06-11 02:06:13 -04:00
ed	21adb4a6f4	conductor(checkpoint): Phase 3 complete - Grok (xAI) + Llama (multi-backend) via shared helper Phase 3 of qwen_llama_grok_integration_20260606 ships Grok and Llama provider support. 16 of 18 state tasks done (t3.4 and t3.15 cancelled: no credentials_template.toml exists; t3.6 and t3.17 completed in Phase 1's initial registry population). Modules shipped: - src/ai_client.py: state globals (_grok_, _llama_ including _llama_base_url and _llama_api_key), _ensure_grok_client() (OpenAI SDK with base_url https://api.x.ai/v1), _ensure_llama_client() (OpenAI SDK with configurable base_url + api_key for Ollama/OpenRouter/custom backends), _send_grok() and _send_llama() (both 10-param signature matching _send_minimax, both call send_openai_compatible), _list_grok_models() and _list_llama_models() (return from capability registry), _get_llama_cost_tracking() (the local-LLM signal: returns False when base_url is localhost/127.0.0.1), 2 new branches in list_models(), Grok + Llama state reset in reset_session() - src/models.py: 'grok' and 'llama' added to PROVIDERS (centralized; gui_2.py and app_controller.py import from this list) - src/cost_tracker.py: 11 new regex pricing entries (3 Grok + 8 Llama) Tests shipped: - tests/test_grok_provider.py (28 lines, 2 tests) - tests/test_llama_provider.py (68 lines, 6 tests) - Total new tests this phase: 8 (all passing) - Cumulative: 38 tests in batch (qwen + grok + llama + minimax + caps + openai_compat + cost + no_top_level_sdk_imports) Architectural correction (Grok-consulted 2026-06-11): - Spec section 3.1.1 added: 'best API per vendor' principle - Spec section 4.3 reverted from 'Native REST API' to 'OpenAI-Compatible' per Grok's own confirmation: 'the OpenAI-compatible endpoint is fully compatible and clean with no meaningful unique native surface lost' - Follow-up track B renamed: 'Llama Native APIs' (Ollama native + Meta Llama API), not 'Native Vendor APIs' (no Grok native refactor needed) - v2 matrix field expansion documented (per Grok's recommendation): audio, video, grounding, computer_use, local, reasoning, web_search, x_search, code_execution, file_search, mcp_support, structured_output Deviations from plan (consistent with Phase 1 and Phase 2): - Test signatures use 10-arg (real _send_minimax shape), not 12-arg - PROVIDERS change is at src/models.py:56 (centralized), not in gui_2.py and app_controller.py (which import from models) - t3.4 and t3.15 (credentials template) skipped: no template file exists; the user maintains their own credentials.toml directly Phase 4 (MiniMax refactor) is now unblocked. The refactor replaces ~250 lines of inline OpenAI-compatible send logic in _send_minimax with a thin wrapper around the shared send_openai_compatible helper (per the spec §5.2 target: ~50 lines).	2026-06-11 02:05:37 -04:00
ed	9be228f620	conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint)	2026-06-11 02:05:07 -04:00
ed	07bac1c6a7	conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template)	2026-06-11 02:04:09 -04:00
ed	f9b5c9372d	feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama) Side concerns for Phase 3: 1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside the 6 existing vendors. Centralized registry; gui_2.py and app_controller.py import from here. State tasks t3.5 and t3.16 were scoped to gui_2.py/app_controller.py but the actual change is at the centralized registry, per the project's single-source-of- truth pattern (per src/models.py module docstring and the Phase 5 audit script audit_no_models_config_io.py which enforces that PROVIDERS lives in models.py). 2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama): Grok (per xAI public pricing): - grok-2: 2.00 / 10.00 - grok-2-vision: 2.00 / 10.00 - grok-beta: 5.00 / 15.00 Llama (per Grok's consultation: pricing varies by backend; registry entries represent the most common case): - llama-3.1-8b-instant: 0.05 / 0.08 (Groq) - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq) - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg) - llama-3.2-1b-preview: 0.04 / 0.04 - llama-3.2-3b-preview: 0.06 / 0.06 - llama-3.2-11b-vision-preview: 0.18 / 0.18 - llama-3.2-90b-vision-preview: 0.90 / 0.90 - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq) (all per 1M tokens, USD; matches the structure of existing entries; note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to allow future model variants in the same family.) Spot check: - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005) - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985 3. SKIPPED t3.4 and t3.15 (credentials templates): no credentials_template.toml exists in the project (Phase 2 established this). The user maintains their own credentials.toml directly. 4. t3.6 and t3.17 (Grok/Llama models in capability registry) were completed in Phase 1's initial population of 22 entries (commit `6be04bc`). Grok has 4 entries (1 wildcard + 3 models); Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has vision=True; Llama 3.2-11b/90b vision variants have vision=True. Verification: 38/38 tests pass in batch.	2026-06-11 02:02:56 -04:00
ed	8e3543d875	docs(spec): revise 'best API per vendor' after Grok consultation Grok's own recommendation (consulted 2026-06-11): 'xAI (Grok) \| xAI official OpenAI-compatible (https://api.x.ai/v1) \| Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No meaningful unique native surface lost by using the compatible endpoint.' This REVERSES the earlier 'xAI native' correction. The OpenAI- compatible approach for Grok is the canonical full-featured path; the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1 + send_openai_compatible helper) is correct as-is. Updates to the spec: 1. §3.1.1: replaced the 'use xAI native' decision with the confirmed per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI- Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2), Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native (follow-up), Anthropic=Native (follow-up). Also added Grok's recommended v2 matrix field expansion: audio, video, grounding, computer_use, local, reasoning/extended_thinking, web_search, x_search, code_execution, file_search, mcp_support, structured_output. 2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to 'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The implementation does NOT need a native refactor; the OpenAI SDK at https://api.x.ai/v1 is the canonical approach. Removed the earlier 'caching: true' entry from the registry (since the OpenAI-compat shim doesn't expose prompt_cache_key) and the 'no persistent client' state struct (back to the OpenAI SDK pattern). 3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs (Ollama native + Meta Llama API)' and removed the Grok native refactor item (Grok says OpenAI-compat is fine). Kept the Ollama native + Meta Llama API items + matrix expansion. Clarified that Grok tests do NOT need rewriting; only Llama tests get 2 more (native Ollama, Meta Llama API). Net effect: the Phase 3 work that just shipped (Grok+Llama Green using OpenAI-compat shim) is CORRECT as-is. The implementation matches Grok's actual recommendation. No code rollback needed.	2026-06-11 02:01:08 -04:00
ed	29a96cc9f5	feat(ai_client): Add Grok (xAI) OpenAI-compatible provider	2026-06-11 01:56:21 -04:00
ed	06716252f1	docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups Three additions to the spec, per the user's architectural correction in this session: 1. NEW section 3.1.1: 'Architectural principle: Use the best API per vendor' — explains why the OpenAI-compatible shim loses vendor- specific features (xAI: prompt_cache_key, reasoning_effort, server- side tools, cost_in_usd_ticks; Ollama: think param, images array, thinking field, structured outputs) and states the principle: 'use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.' Also notes that the capability matrix IS the aggregate tracker; future native features go into the matrix, and the GUI filters based on it (no per-vendor UI branches). 2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was 'OpenAI-Compatible'. Now specifies two native endpoints (/v1/chat/completions and /v1/responses), the native features that matter, the updated capability registry (caching=true for Grok via prompt_cache_key), and a 'Phase 3 placeholder behavior' note that this track's Phase 3 ships the OpenAI-compatible Grok as a placeholder. The native refactor is deferred to follow-up B. 3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs (post-OpenAI-compatible-placeholder)' which documents: - Grok → xAI native REST - Llama (Ollama) → native /api/chat - Llama (Meta Llama API) → new 4th backend (deferred pending verification of Meta's API spec; llama.developer.meta.com/docs/overview returned 400 on fetch this session) - Capability matrix expansion (web_search, x_search, code_execution, file_search, mcp_support, reasoning_effort, structured_output) - Test rewrites (mock requests.post instead of chat.completions.create) This is a docs-only commit; no code changes. The Phase 3 Green work continues with the OpenAI-compatible approach as planned in the existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track B handles the native refactor when prioritized.	2026-06-11 01:49:36 -04:00
ed	891c008f0c	conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green)	2026-06-11 01:42:13 -04:00
ed	90f2be94af	test(grok,llama): red phase for Grok (xAI) + Llama (multi-backend) (8 tests, 6 fail) 8 failing tests in 2 new files for the upcoming Grok and Llama provider implementations. Grok (tests/test_grok_provider.py, 2 tests): 1. test_send_grok_uses_xai_endpoint: _send_grok calls _ensure_grok_client and uses an xAI client (base_url https://api.x.ai/v1) 2. test_grok_2_vision_supports_image: structural check that the capability registry has vision=True for grok-2-vision (already populated in Phase 1, so this test passes in Red phase; it is a regression guard for the registry, not an implementation test) Llama (tests/test_llama_provider.py, 6 tests): 1. test_send_llama_ollama_backend: _send_llama with localhost:11434 (Ollama) base URL 2. test_send_llama_openrouter_backend: _send_llama with OpenRouter URL 3. test_send_llama_custom_url: _send_llama with custom URL (escape hatch for self-hosted) 4. test_llama_model_discovery_unions_ollama_and_openrouter: _list_llama_models returns the 8 models from the capability registry 5. test_llama_3_2_vision_vision_capability: structural check for llama-3.2-11b-vision-preview (passes in Red phase) 6. test_llama_local_backend_cost_tracking_false_for_ollama: the local-LLM signal -- when base_url is localhost, _get_llama_cost_tracking() returns False. This is the first test that exercises the local LLM support that the capability matrix was designed for. Both _reset_grok_state and _reset_llama_state fixtures use hasattr() to be no-ops when the state doesn't exist (Red phase). Test signatures use the real 10-arg _send_minimax signature, NOT the plan's 12-arg with enable_tools / rag_engine. Red phase: 6/8 tests fail (4 AttributeError on missing _send_, 2 ImportError on missing _list_/_get_*). 2/8 pass (registry structural checks). Next: Green phase - implement _send_grok + _ensure_grok_client + _send_llama + _ensure_llama_client + _list_llama_models + _get_llama_cost_tracking in src/ai_client.py.	2026-06-11 01:41:47 -04:00
ed	4204116c66	conductor(plan): mark t2.11 completed (Phase 2 checkpoint)	2026-06-11 01:36:44 -04:00
ed	4d70dcc7ce	conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3	2026-06-11 01:35:22 -04:00
ed	0f2541a3a1	conductor(checkpoint): Phase 2 complete - Qwen via DashScope Phase 2 of qwen_llama_grok_integration_20260606 ships Qwen support via the Alibaba Cloud DashScope native SDK. 10 of 11 state tasks done (t2.7 cancelled: no credentials_template.toml exists in the project; t2.9 was completed in Phase 1's initial registry population). Modules shipped: - src/qwen_adapter.py (31 lines): build_dashscope_tools() (OpenAI shape -> DashScope shape), classify_dashscope_error() (5 exception classes -> ProviderError kinds: auth/network/quota) - src/ai_client.py: state globals (_qwen_client, _qwen_history, _qwen_history_lock, _qwen_region), _ensure_qwen_client() (sets dashscope.base_http_api_url based on region: china vs international), _dashscope_call() + _dashscope_exception_from_response() + _extract_dashscope_tool_calls(), _send_qwen() (10-param signature matching _send_minimax), _list_qwen_models() - src/models.py: 'qwen' added to PROVIDERS (centralized; gui_2.py and app_controller.py import from this list) - src/cost_tracker.py: 7 Qwen pricing entries (regex-matched, USD per 1M tokens) Tests shipped: tests/test_qwen_provider.py (55 lines, 5 tests, all passing) Total new tests this phase: 5 Total tests in new modules: 30 (qwen + minimax + capabilities + openai_compatible + cost_tracker + no_top_level_sdk_imports) Verification: - 30/30 tests pass in batch - No regressions - 4/4 audit scripts pass (audit_main_thread_imports, audit_weak_types, check_test_toml_paths, audit_no_models_config_io) DashScope alignment (post-cleanup): - Uses dashscope.common.error.AuthenticationError (real class in 1.25.21) instead of the non-existent InvalidApiKey - Removed the InvalidApiKey -> AuthenticationError monkey-patch - TimeoutException -> network (not rate_limit) - ServiceUnavailableError -> network (not quota) - _ensure_qwen_client sets base_http_api_url per region (china vs international) per the latest DashScope API spec Deviations from the plan: - Test signature adapted from 12-param (plan) to 10-param (matching real _send_minimax) -- the plan's enable_tools / rag_engine params don't exist on _send_minimax - PROVIDERS change is at src/models.py:56 (centralized), not in gui_2.py and app_controller.py (which import from models) - t2.7 (credentials template) skipped: no template file exists; the user maintains their own credentials.toml directly Phase 3 (Grok + Llama) is now unblocked. Local LLM support lands in Phase 3 via Llama's Ollama backend (default base_url http://localhost:11434/v1).	2026-06-11 01:34:48 -04:00

1 2 3 4 5 ...

3009 Commits