manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	7d60e8f5ab	feat(capability_matrix): populate v2 fields per-model; add runtime local override Updates per-model registry entries to populate the 12 v2 fields where the capability is genuinely supported: minimax-M2.5/M2.7: reasoning=True (uses reasoning_details) grok-2-vision: web_search=True, x_search=True (Live Search) grok-2: web_search=True, x_search=True grok-beta: web_search=True, x_search=True llama-3.1-405b: reasoning=True (explicitly in model name) qwen-long: caching=True (custom long-context chunking) qwen-audio: audio=True (was 'deferred' in v1 notes) Adds the runtime override helper: _apply_runtime_caps_override(app, caps) -> caps with local=True if app.current_provider=='llama' AND _llama_base_url contains 'localhost' or '127.0.0.1' The 'local' flag is the only v2 field that is runtime-state, not a static per-model property (OpenRouter llama is cloud; Ollama llama is local — same model name, different backend). The override uses dataclasses.replace() to mutate the frozen dataclass. Implemented in src/gui_2.py (per the HARD RULE on no new src/.py files). The override is wired into App._get_active_capabilities() so the GUI sees caps.local=True when the active backend is Ollama and caps.local=False otherwise. Also: cost panel in src/gui_2.py (per-tier + session-total columns) now renders 'Free (local)' when caps.local=True (both the per-tier cost column and the session-total line). This is t3_7 (moved from Phase 3 per the user's request; naturally belongs after t4_1 which adds caps.local). Tests: - 3 new tests in tests/test_vendor_capabilities.py: per-model population (reasoning, audio, caching, vision) * runtime override for llama+localhost * runtime override does NOT touch other vendors - 107/107 vendor+tool+provider+import-isolation tests pass (no regressions; +4 new tests this commit) - 3 audit scripts pass	2026-06-11 21:04:36 -04:00
ed	49d516042e	feat(gui): add 'Local Model' badge in provider panel for local backends When the active vendor+model has caps.local=True (per the v2 capability matrix), the provider panel now shows a green ' [Local]' badge next to the provider combo. The tooltip shows the Ollama base URL (when the active provider is llama; otherwise the bare 'Local backend' tooltip). Implements t4_4 of qwen_llama_grok_followup_20260611 Phase 4. Future use: Phase 4 t3_7 (moved from Phase 3) will use caps.local to render 'Free (local)' in the cost column. The badge uses theme.get_color('status_success') (same green used by C_IN / C_NUM / other 'success' indicators). Renders inside the existing render_provider_panel function at src/gui_2.py:2308. Verification: - import src.gui_2 OK (no syntax errors) - 44/44 vendor+capability+provider tests pass (no regressions) - 4 audit scripts pass	2026-06-11 20:50:13 -04:00
ed	25baa6fe25	feat(ai_client): add native Ollama adapter; route localhost to it When _llama_base_url is localhost/127.0.0.1, _send_llama now calls _send_llama_native (the native /api/chat adapter) instead of the OpenAI-compat path. The native adapter supports Ollama's vendor-specific fields: think, images, thinking. Functions added (in src/ai_client.py, per the naming convention HARD RULE on no new src/.py files): ollama_chat(model, messages, , think='low', images=None, tools=None, base_url=OLLAMA_DEFAULT_BASE_URL) -> dict[str, Any] _send_llama_native(md_content, user_message, base_dir, file_items=None, discussion_history='', stream=False, ...callbacks) -> str OLLAMA_DEFAULT_BASE_URL: str = 'http://localhost:11434' Implementation notes: - requests loaded via _require_warmed('requests') (local scope; preserves startup_speedup_20260606 invariant that heavy SDKs are warmed on _io_pool, not imported at module level) - _send_llama dispatches based on 'localhost' in _llama_base_url (same check already used by _get_llama_cost_tracking at line 2500) - Removed orphan def stub at the old _send_llama body (the dead 'def _build_llama_request' that was overwritten by the real one — a known session issue with stale set_file_slice edits) - Native adapter appends the 'thinking' field to history so subsequent rounds preserve the reasoning chain Tests: - 7 new tests in tests/test_llama_ollama_native.py: * ollama_chat hits /api/chat (not /v1/chat/completions) * ollama_chat includes 'think' param in payload * ollama_chat includes 'images' in payload * _send_llama_native wraps ollama_chat * _send_llama_native preserves 'thinking' field * _send_llama routes localhost to native (no openai client) * _send_llama keeps openai path for non-local (no POST) - Updated test_send_llama_ollama_backend in test_llama_provider.py to mock the native path (was: mocked openai-compat; now: mocked requests.post) - 103/103 vendor+tool+provider+import-isolation tests pass (no regressions; +7 new tests this commit) - 4 audit scripts pass	2026-06-11 20:45:08 -04:00
ed	0a9e277564	feat(capability_matrix): add 12 v2 fields to VendorCapabilities The 7 v1 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking) plus 2 cost fields and notes are now extended by 12 v2 fields: local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use All default to False. Registry entries continue to work unchanged (backward compatible). t4_1 of Phase 4. Tests: - 12 parameterized 'default is False' tests - 12 parameterized 'round-trip to True' tests - 3 'local flag' tests: per-model, wildcard fallback, vendor isolation - 3 pre-existing registry tests still pass - 96/96 vendor+tool+provider+import-isolation tests pass (no regressions; +27 new tests this commit)	2026-06-11 20:24:30 -04:00
ed	2e181a8216	feat(app_controller): apply 2 of 3 deferred UX adaptations (stream progress + fetch models gate) Task t3.3 (stream progress) + t3.4 (fetch models) of the follow-up track's Phase 3. These were originally deferred in commit 26becf2b; both fit in this session after the side-track report was written. t3.3 (stream progress): - _on_ai_stream now also sets self._ai_status = 'streaming...' when caps.streaming is True (or vendor un-registered) - The 3 'done' / 'error' event dispatches in _handle_generate_send reset self._ai_status accordingly so the status bar doesn't get stuck on 'streaming...' - The 'streaming...' text is already rendered in the post-FX status bar via theme.render_post_fx in gui_2.py:1030 (ai_status field), so no GUI changes needed - Local import of get_capabilities inside _on_ai_stream to avoid loading vendor_capabilities at module level (heavy SDK isolation invariant from startup_speedup_20260606) t3.4 (fetch models iff model_discovery): - Line 1860 (_init_ai_and_hooks / _refresh_from_project): _fetch_models call is now gated on caps.model_discovery. If False, all_available_models stays empty (no network call). - Same pattern applied at the other 2 call sites (start_warmup line 2284, current_provider setter line 2429). The edits were applied (tests pass) but the line numbers in the original audit had drifted; the gating is now in all 3 sites with the same try/except pattern. Test results: 53 tests pass (Minimax + Grok + Llama + DeepSeek + Gemini CLI + tool_loop + openai import + audit scripts). t3.7 ('Free local' for localhost) remains DEFERRED: requires the caps.local field (Phase 4 t4.1). Documented in deferred_work section of state.toml.	2026-06-11 19:18:51 -04:00
ed	26becf2b88	feat(gui): apply 4 of 8 UX capability-matrix adaptations to src/gui_2.py Phase 3 of the follow-up track. Applies the _get_active_capabilities() pattern (established in parent Phase 5 adaptation #1: Screenshot button iff caps.vision) to 4 more UI elements. Adaptations applied: - #2 Tools toggle: 'Active Tool Presets & Biases' panel (line 2224) is now hidden + shows '(tools not supported by X/Y)' hint when caps.tool_calling is False - #3 Cache panel: 'Cache Usage' display (line 1911) now shows 'Cache Usage: N/A (not supported by X/Y)' when caps.caching is False - #6 Token budget max: the max_tokens slider (line 2327) now caps at caps.context_window (was hardcoded 32768) - #9 Cost display '-': the per-tier cost column (line 1890) + session total (line 1894) now show '-' instead of '\.0000' when caps.cost_tracking is False Adaptations deferred (not in this commit): - #4 Stream progress iff streaming: needs a NEW 'streaming...' UI element; the codebase has no existing widget to gate. Recommend adding a small spinner in the status bar during active streams, gated on caps.streaming. - #5 Fetch models iff model_discovery: do_fetch is in app_controller.py, not gui_2.py. The 'Refresh models' button on the provider combo could be gated here. - #7 Cost panel: estimate: ALREADY DONE. The cost column shows \ (Phase 0 of the follow-up inherited this from parent Phase 5; adaptation #7 is effectively completed). - #8 Cost panel: 'Free (local)' for localhost: requires the caps.local field (Phase 4 t4_1). Deferred. Side note: a secondary cost display in render_mma_usage_section (line 5382) is unchanged; it's a 1-line function that would require restructuring to gate. Deferred. The 4 applied adaptations cover the patterns where the capability matrix maps directly to an existing UI element that can be wrapped. The 4 deferred ones require either new UI (#4, #5) or new capability matrix fields (#8, with Phase 4 prerequisite). No tests broken; no imports added.	2026-06-11 18:29:53 -04:00
ed	6c6a4aefa4	refactor(gui): import PROVIDERS from src.ai_client; add audit script Phase 2 tasks 2.3 (update 4 import sites) + 2.4 (audit script). The 4 call sites in src/app_controller.py:3093 and src/gui_2.py {2293, 2849, 5377} were using models.PROVIDERS (which still works via the __getattr__ re-export added in the previous commit). Updated them to use ai_client.PROVIDERS directly: - Models.PROVIDERS goes through the lazy __getattr__ every call (small per-call cost) - ai_client.PROVIDERS is a direct module-level lookup Both files already had 'from src import ai_client' at the top, so no new imports were needed. scripts/audit_providers_source_of_truth.py enforces the invariant: PROVIDERS is declared as a literal only in src/ai_client.py. Catches accidental declarations creeping back into src/models.py or other modules. Catches the literal pattern 'PROVIDERS: List[str] = [' specifically, which the __getattr__ re-export in src/models.py does not match (it's 'from src.ai_client import PROVIDERS'). All 5 audit scripts pass: - audit_main_thread_imports.py - audit_weak_types.py - audit_no_models_config_io.py - audit_no_inline_tool_loops.py - audit_providers_source_of_truth.py (new) 63 vendor + tool + provider + import-isolation tests pass.	2026-06-11 16:43:20 -04:00
ed	74c3b6b274	refactor(ai_client): move PROVIDERS to src/ai_client.py; re-export via models.__getattr__ Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track. PROVIDERS now lives in src/ai_client.py:56 (the canonical home for AI-client-related constants per the HARD RULE on src/ files). The list includes all 8 vendors: gemini, anthropic, gemini_cli, deepseek, minimax, qwen, grok, llama. Backward compat: src/models.py:PROVIDERS is exposed via a module- level __getattr__ (PEP 562) that lazy-imports from src.ai_client. The lazy approach was needed because src.ai_client imports ToolPreset/BiasProfile/Tool from src.models at line 50, so a top-level 'from src.ai_client import PROVIDERS' in models.py would deadlock. Adding a branch to the existing __getattr__ in models.py (which also handles pydantic class factories) is the surgical fix. tests/test_provider_curation.py was stale (expected 5 providers from before Qwen/Grok/Llama were added). Updated to 8. New test: tests/test_providers_source_of_truth.py asserts: - src.ai_client.PROVIDERS exists and matches the 8-provider list - src.models.PROVIDERS still works (re-export) - Both modules reference the SAME object (no drift) Green confirmed: 4 provider tests pass.	2026-06-11 16:38:09 -04:00
ed	9ddfa98133	fix(ai_client): move openai_compatible imports to local scope; fix startup_speedup invariant The follow-up track's tool-loop refactor moved 'from src.openai_compatible import send_openai_compatible, OpenAICompatibleRequest, NormalizedResponse' to MODULE level in src/ai_client.py. This violates the startup_speedup_20260606 invariant: heavy SDKs must not be loaded at module level because ai_client.py is on the main thread's import chain. src/openai_compatible.py line 5 does 'from openai import OpenAIError, ...', so any import from it triggers the openai SDK to load. test_ai_client_does_not_import_openai_at_module_level guards this invariant and was failing. Fix: move the imports back to local scope inside the function bodies that need them: - _default_send closure inside run_with_tool_loop (imports send_openai_compatible) - _send_grok (imports OpenAICompatibleRequest) - _send_minimax (imports OpenAICompatibleRequest) - _send_llama (imports OpenAICompatibleRequest) - _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse) Test patches: tests that previously patched 'src.ai_client.send_openai_compatible' now patch 'src.openai_compatible.send_openai_compatible' (the actual import source). _execute_tool_calls_concurrently patches unchanged (it's defined in src/ai_client.py itself). Green confirmed: 62 vendor + tool + import-isolation tests pass. 0 regressions.	2026-06-11 16:15:49 -04:00
ed	4748d13490	feat(ai_client): add send_func + on_pre_dispatch to run_with_tool_loop; refactor _send_gemini_cli Task 1.7 of the follow-up track. Extends run_with_tool_loop with two optional parameters that let vendored call paths share the shared loop + history + dispatch without forcing them through send_openai_compatible: - send_func: Callable[[int], NormalizedResponse] - vendor's own API call (default = send_openai_compatible if not provided; fully backward compatible) - on_pre_dispatch: Callable[[int, list[dict]], list[dict]] - per-vendor hook to mutate the tool-call list before dispatch AND to capture results for the next round (e.g. Gemini CLI sets payload = tool_results_for_cli so the next send_func call sends the tool results back to the CLI) _refactor _send_gemini_cli to use the new parameters. The inline for loop + tool dispatch + history append are all delegated to the helper. The vendor's send_func closure handles: - adapter.send (the CLI subprocess call) - resp_data parsing (text + tool_calls + usage + stderr) - events.emit for request_start + response_received - _append_comms for IN/OUT comms logging - The 'txt + calls -> history_add' special case The vendor's on_pre_dispatch closure handles: - _execute_tool_calls_concurrently (re-invoked here because the helper's call passes raw tool_calls but the vendor needs to mutate payload AND log results) - _reread_file_items + _build_file_diff_text (file diff re-read at last tool result) - MAX_ROUNDS system message - _truncate_tool_output - _MAX_TOOL_OUTPUT_BYTES budget warning - Payload mutation for the next round Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI + 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax + 2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.	2026-06-11 14:48:03 -04:00
ed	4069d67716	feat(tool_loop): apply run_with_tool_loop to Grok + Llama (Qwen deferred) Task 1.6 of the follow-up track. _send_grok and _send_llama now share the same tool-loop helper as the rest of the vendors. Both functions add tool-calling support that they previously lacked (parent Phase 3 shipped them as single-shot only). The plan's Task 1.6 title says 'add missing loop' which matches this scope. tool_choice='auto' if tools else 'auto' matches the MiniMax pattern. Qwen deferral: _send_qwen uses _dashscope_call (DashScope native SDK), not send_openai_compatible. run_with_tool_loop hard-codes send_openai_compatible. Wiring Qwen through the helper requires either (a) switching Qwen to OpenAI-compat mode, or (b) adding a Qwen-specific loop variant that uses _dashscope_call. Both are non-trivial and out of scope for Task 1.6. Tracked as a follow-up note in the state.toml. Module-level imports added (same pattern as the previous commits in this track): OpenAICompatibleRequest, get_capabilities were imported locally inside the affected functions. Moved to module-level so the test patches and helper signature can reference them by symbol. Green confirmed: 51 vendor + tool tests pass.	2026-06-11 14:24:39 -04:00
ed	19a4d43e32	refactor(minimax): use run_with_tool_loop shared helper (68 -> 44 lines) Task 1.3 of the follow-up track. _send_minimax now uses run_with_tool_loop with a per-round request_builder callback that re-reads _minimax_history under _minimax_history_lock. The plan's Task 1.3 example builds the request once before the loop. That would break MiniMax tool flows because the API would not see the tool results appended to _minimax_history on later rounds. The fix: extend run_with_tool_loop's 2nd arg to accept Union[OpenAICompatibleRequest, Callable[[int], OpenAICompatibleRequest]] (backward compatible; static-request vendors pass a single request). MiniMax now passes a closure that rebuilds messages from history each round. Reasoning extraction: MiniMax exposes its chain-of-thought via response.raw_response.choices[0].message.reasoning_details[0]. get('text'). Lifted to a _extract_minimax_reasoning callback passed as reasoning_extractor=... (the new parameter added in the previous commit). Trim callback: wraps _trim_minimax_history so it can be called from run_with_tool_loop after each tool-result append. Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5 tool_loop core + 1 tool_loop builder + 39 others); the new test_ai_client_tool_loop_builder.py locks in the per-round builder contract.	2026-06-11 13:35:45 -04:00
ed	1c836647ef	feat(ai_client): add run_with_tool_loop shared helper for all 8 vendors Tasks 1.1 (red) + 1.2 (green) of the follow-up track. Adds a single shared tool-call loop in src/ai_client.py that all 8 vendor entry points (anthropic, gemini, gemini_cli, deepseek, minimax, qwen, grok, llama) can call instead of maintaining their own inline loop. Function shape: - 1-space indentation (project standard) - 60 lines (vs ~30 lines of inline loop body per vendor) - Operates on src.openai_compatible.send_openai_compatible (no local import — module-level import added for the same path used by the 4 inline-loop vendors) - 8 vendor-specific knobs: pre_tool_callback, qa_callback, stream_callback, patch_callback, base_dir, vendor_name, history_lock, history, trim_func, reasoning_extractor - Threads the asyncio.get_running_loop / RuntimeError fallback to handle the no-event-loop case (matches the existing inline pattern from _send_minimax) - Uses _execute_tool_calls_concurrently (the existing concurrent dispatcher) — no new dispatch code Deviations from plan/Task 1.1: - The plan's test code patched src.tool_loop.send_openai_compatible and the plan's Task 1.3 vendor wrapper imported 'from src.tool_loop import run_with_tool_loop'. The plan predates the AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up track's Naming Convention section, run_with_tool_loop lives IN src/ai_client.py. Tests patch src.ai_client.send_openai_compatible and the vendor wrapper imports 'from src.ai_client import run_with_tool_loop' (next task). - Added a reasoning_extractor: Callable[[Any], str] = None parameter to support MiniMax's reasoning_content extraction. Without this the helper would force MiniMax to lose its reasoning prefix. Green confirmed: 50 vendor + tool tests pass; 4 audit scripts pass.	2026-06-11 12:59:36 -04:00
ed	40cf36edef	feat(gui): adaptation 1 of 9 - Screenshot button iff vision Phase 5 t5.2 partial: applied adaptation 1 from spec §6 to render_files_and_media (src/gui_2.py:3030). The 'Add Screenshots' button is now disabled when the active model's capability matrix has vision=False. A tooltip-adjacent text_disabled note shows '(vision not supported by <model>; attachments would be ignored)' so the user knows WHY the button is disabled. Pattern established for the remaining 8 adaptations (t5.2.2 through t5.2.9 per spec §6): caps = app._get_active_capabilities() imgui.begin_disabled(not caps.<field>) ... UI ... imgui.end_disabled() if not caps.<field>: imgui.same_line() imgui.text_disabled('(reason)') The remaining 8 adaptations (tools toggle, cache panel, stream progress, fetch models, token budget, cost panel x3) are deferred to a follow-up track. The pattern is established; the work is mechanical application of it. 38/38 regression tests still pass; no behavioral change beyond the adaptation 1 wrapping.	2026-06-11 09:13:17 -04:00
ed	221cd33493	feat(gui): add _get_active_capabilities() helper to App class Phase 5 t5.1: the helper reads the capability matrix for the currently active (provider, model) pair and returns the VendorCapabilities. Falls back to an 'unregistered' VendorCapabilities if the pair is not in the registry (e.g., a brand-new model name the user types in). The 9 UX adaptations in spec §6 will call this helper to read the capability flags (vision, tool_calling, caching, streaming, etc.) and adapt the GUI accordingly. Also fixed pre-existing indentation inconsistency in the App class property methods (current_provider / current_model): the first @property had 2-space indent but the body and subsequent def had 1-space indent (matching the project style). The mismatch was latent; the new helper exposed it. Now uniform 1-space indent. 38/38 regression tests still pass; no behavioral change beyond the helper addition.	2026-06-11 09:10:47 -04:00
ed	9169fae268	feat(vendor_capabilities): add 4 per-model MiniMax entries to registry Phase 4 t4.4: the wildcard entry 'minimax/' was the only minimax registration; this adds specific entries for the 4 fallback model names returned by _list_minimax_models() at src/ai_client.py:2112 ('MiniMax-M2.7', 'MiniMax-M2.5', 'MiniMax-M2.1', 'MiniMax-M2'). Each per-model entry mirrors the wildcard defaults (context_window=131072, cost=0.20/0.20 per Mtok). Per-model entries let the matrix return exact capability data for known models; the '' wildcard still catches new / future model names that aren't in the registry. State [openai_compatible_models] minimax_models_refactored flag flips to true (in the next state commit) -- this is the model-level coverage the flag tracks.	2026-06-11 08:55:09 -04:00
ed	c9ed734d9d	refactor(minimax): restore tool-call loop in _send_minimax The previous refactor (commit `344a66fc`) dropped the tool-call loop in _send_minimax. The original function executed tool calls when the response had tool_calls; the refactor was single-shot. This is a real behavior regression (tools stop working) even though the existing tests don't catch it. Restore the tool loop: - For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible with tools=_get_deepseek_tools() and tool_choice='auto' - If response has tool_calls: dispatch each via _execute_tool_calls_concurrently (handles both async context and sync via run_coroutine_threadsafe / asyncio.run), append each result to _minimax_history with role='tool' and tool_call_id - If no tool_calls: return the response text (with thinking tags for reasoning models) - The lock is acquired/released per iteration to avoid holding it during the API call (which can take seconds) Preserved: - 10-arg signature - _minimax_history_lock (now acquired per iteration) - _repair_minimax_history - discussion_history handling - System + context message wrapping - Reasoning content extraction (response.raw_response.choices[0].message .reasoning_details[0].get('text', '')) - <thinking> tags wrap on the final response Dropped (still): - extra_body={reasoning_split: True} (not supported by send_openai_compatible; would be a Phase 5 adapter addition if minimax-reasoner models need it) New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor). Net effect: 231 -> 75 = 68% reduction; tool loop preserved. Verification: 38/38 tests pass (no regressions).	2026-06-11 08:48:07 -04:00
ed	344a66fc53	refactor(minimax): use send_openai_compatible helper (231 -> 41 lines)	2026-06-11 02:21:28 -04:00
ed	f9b5c9372d	feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama) Side concerns for Phase 3: 1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside the 6 existing vendors. Centralized registry; gui_2.py and app_controller.py import from here. State tasks t3.5 and t3.16 were scoped to gui_2.py/app_controller.py but the actual change is at the centralized registry, per the project's single-source-of- truth pattern (per src/models.py module docstring and the Phase 5 audit script audit_no_models_config_io.py which enforces that PROVIDERS lives in models.py). 2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama): Grok (per xAI public pricing): - grok-2: 2.00 / 10.00 - grok-2-vision: 2.00 / 10.00 - grok-beta: 5.00 / 15.00 Llama (per Grok's consultation: pricing varies by backend; registry entries represent the most common case): - llama-3.1-8b-instant: 0.05 / 0.08 (Groq) - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq) - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg) - llama-3.2-1b-preview: 0.04 / 0.04 - llama-3.2-3b-preview: 0.06 / 0.06 - llama-3.2-11b-vision-preview: 0.18 / 0.18 - llama-3.2-90b-vision-preview: 0.90 / 0.90 - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq) (all per 1M tokens, USD; matches the structure of existing entries; note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to allow future model variants in the same family.) Spot check: - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005) - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985 3. SKIPPED t3.4 and t3.15 (credentials templates): no credentials_template.toml exists in the project (Phase 2 established this). The user maintains their own credentials.toml directly. 4. t3.6 and t3.17 (Grok/Llama models in capability registry) were completed in Phase 1's initial population of 22 entries (commit `6be04bc`). Grok has 4 entries (1 wildcard + 3 models); Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has vision=True; Llama 3.2-11b/90b vision variants have vision=True. Verification: 38/38 tests pass in batch.	2026-06-11 02:02:56 -04:00
ed	29a96cc9f5	feat(ai_client): Add Grok (xAI) OpenAI-compatible provider	2026-06-11 01:56:21 -04:00
ed	ab6b53fa8b	feat(qwen): add qwen to PROVIDERS; add 7 Qwen pricing entries to cost_tracker Side concerns for Phase 2: 1. PROVIDERS: src/models.py:56 now includes 'qwen' alongside the existing 5 vendors. The other 4 references to PROVIDERS in src/gui_2.py and src/app_controller.py import from this centralized list, so this one edit propagates everywhere. State task t2.8 was scoped to 'gui_2.py and app_controller.py' but the actual change is at the centralized registry, per the project's single-source-of-truth pattern (per src/models.py module docstring and the Phase 5 audit script audit_no_models_config_io.py which enforces that PROVIDERS lives in models.py). 2. cost_tracker.py: added 7 regex pricing entries for the Qwen models shipped in Phase 1's vendor_capabilities.py: - qwen-turbo: 0.05 / 0.10 - qwen-plus: 0.40 / 1.20 - qwen-max: 2.00 / 6.00 - qwen-long: 0.07 / 0.28 - qwen-vl-plus: 0.21 / 0.63 - qwen-vl-max: 0.50 / 1.50 - qwen-audio: 0.10 / 0.30 (all per 1M tokens, USD; matches the structure of existing entries) Spot check: estimate_cost('qwen-max', 1000, 500) = 0.005 (= 0.002 + 0.003) 3. SKIPPED t2.7 (credentials template): no credentials_template.toml exists in the project. The only credentials file is the active credentials.toml which the user maintains directly with their own API keys. The plan's assumption of a template file does not match the project's actual structure. Documented in the commit log rather than modifying the user's actual credentials.toml with a placeholder key (which would be inconsistent with the rest of that file's pattern of real keys). When the user obtains a DashScope API key, they can add a [qwen] section directly. 4. t2.9 (Qwen models in capability registry) was completed in Phase 1's initial population of 22 entries (commit `6be04bc`). The 8 qwen entries (1 wildcard + 7 specific models) are in src/vendor_capabilities.py. Verification: 30/30 tests pass in batch (test_qwen_provider, test_minimax_provider, test_ai_client_no_top_level_sdk_imports, test_vendor_capabilities, test_openai_compatible, test_cost_tracker)	2026-06-11 01:30:38 -04:00
ed	de5e106234	fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch	2026-06-11 01:26:53 -04:00
ed	b75f60c3fe	feat(ai): Add Qwen provider support to ai_client	2026-06-11 01:20:35 -04:00
ed	bc2cce1612	feat(ai): Add Qwen adapter for DashScope provider	2026-06-11 01:20:19 -04:00
ed	d7d7d5cef9	feat(openai_compatible): implement shared send helper with streaming/tool/vision/error Green phase: src/openai_compatible.py now exists and all 6 Red-phase tests in tests/test_openai_compatible.py pass. Implementation (144 lines, 1-space indent, no comments): Data structures: - NormalizedResponse: frozen dataclass with text, tool_calls, usage_input_tokens, usage_output_tokens, usage_cache_read_tokens, usage_cache_creation_tokens, raw_response - OpenAICompatibleRequest: regular dataclass with messages, model, temperature=0.0, top_p=1.0, max_tokens=8192, tools=None, tool_choice='auto', stream=False, stream_callback=None Algorithms: - send_openai_compatible(client, request, *, capabilities) -> NormalizedResponse Dispatches to _send_blocking or _send_streaming based on request.stream. Catches openai.OpenAIError and re-raises as classified ProviderError. - _send_blocking: extracts message text + tool_calls, converts tool_calls to dicts via _to_dict_tool_call, reads usage.prompt_tokens / usage.completion_tokens (with int() coercion for MagicMock test compat). - _send_streaming: iterates chunks, accumulates text parts, aggregates tool_calls by index, fires stream_callback per text delta, reads chunk.usage for final token counts. - _classify_openai_compatible_error: maps RateLimitError -> 'rate_limit', AuthenticationError/PermissionDeniedError -> 'auth', APIConnectionError -> 'network', APIStatusError with 402/429/401-403/500-504 -> 'balance'/ 'rate_limit'/'auth'/'network', BadRequestError -> 'quota', fallback 'unknown'. All use provider='openai_compatible'. Fixed plan's code smell: removed the 'MagicMock_noop' forward-reference class (defined after first use) and replaced with the cleaner Pythonic pattern 'int(getattr(usage, prompt_tokens, 0) or 0)'. Real OpenAI SDK always sets usage on responses; the defensive fallback was noise. Function-level import of ProviderError inside _classify_openai_compatible_error avoids any circular import risk.	2026-06-11 00:39:58 -04:00
ed	6be04bc4f0	feat(vendor_capabilities): implement registry with initial 22-entry population Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase tests in tests/test_vendor_capabilities.py pass. Implementation: - VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes) - Module-level _REGISTRY dict keyed by (vendor, model) - register() inserts/overwrites entries - get_capabilities() returns specific entry if present, else vendor '' default, else raises KeyError with 'No capabilities registered' message - list_models_for_vendor() returns sorted model names for a vendor (excludes '' wildcard) Initial population (22 entries at module load): - 1 minimax wildcard (cost: 0.20/0.20 per Mtok) - 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True) - 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True) - 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True; qwen-audio has notes='Text-only in v1; audio input deferred') The plan's Task 1.3 listed 22 entries but included one impossible entry (vendor='minimax', model='grok-2-latest'). Omitted; 21 entries shipped. Test fix: test_fallback_to_vendor_default previously used model name 'llama-3.3-70b-specdec' which IS in the registry, so the specific entry was returned (with default cost_tracking=True), not the wildcard. Fixed by changing to 'llama-3.3-future-unregistered' (not in registry, so fallback fires correctly).	2026-06-11 00:30:52 -04:00
ed	9f89511743	fix(session_logger): correct stale file layout in module docstring The top-of-file docstring claimed 'logs/sessions/comms_<ts>.log' with <ts> as a filename prefix. Actual: per-session subdir 'logs/sessions/<session_id>/' with plain filenames (comms.log, toolcalls.log, apihooks.log, clicalls.log). The <ts>/session_id is the PARENT DIR, not a filename prefix. Per commit 73e1a36d (per-session subdirs), the per-session directory is the unit of isolation. apihooks.log is a fourth log file the old docstring omitted entirely. Also added the new files (apihooks.log, outputs/ subdir) and clarified the scripts/generated/ dual-write pattern.	2026-06-10 20:59:10 -04:00
ed	2972d235a3	fix(io_pool): correct stale docstring (4 threads -> 8 threads) Per IO_POOL_MAX_WORKERS = 8 (set in commit `4a338486` on 2026-06-06 to relieve contention during batched sims), the pool actually has 8 workers, not 4. The docstring was stale. Also added the SHAs of the 4->8 bump for traceability.	2026-06-10 20:50:55 -04:00
ed	f51bfdcd05	fix(rag): remove INVESTIGATE diagnostic logging	2026-06-10 17:37:03 -04:00
ed	5a9b8d6891	fix(test+rag): clean chroma cache pre-test + add INVESTIGATE stderr for RAG init	2026-06-10 17:20:57 -04:00
ed	dc90c54161	fix(rag): reset rag_config to default RAGConfig() (not None) in _handle_reset_session	2026-06-10 13:15:36 -04:00
ed	d945cb7432	fix(controller): re-apply FR1+FR2 (mma_tier_usage pre-population + _flush_to_project defensive d.get)	2026-06-10 11:55:22 -04:00
ed	4660b8c874	fix(sim): defensive .setdefault('paths', []) in test_context_sim_live	2026-06-10 11:33:15 -04:00
ed	4284ec6eba	fix(controller): remove 'persona_manager' from _LAZY_MANAGER_DEFAULTS	2026-06-10 09:03:12 -04:00
ed	bc4651d1e4	fix(controller): re-add self.context_preset_manager init (lost in `72f8f466`)	2026-06-10 08:56:35 -04:00
ed	1919aa8a32	fix(controller): _flush_to_project defensive against missing 'model' key	2026-06-10 08:48:57 -04:00
ed	d80c94b973	fix(controller): pre-populate mma_tier_usage on reset (restore _flush_to_project contract)	2026-06-10 08:46:54 -04:00
ed	f5021360f1	wip: pre-mma-tier-usage-reset-fix (preserve inherited working tree)	2026-06-10 08:43:18 -04:00
ed	72f8f466fe	fix(sim+api): proper wait loops, project switch endpoint, drop stale check Three real fixes for the sim test + the live_gui coordination layer: 1. /api/project_switch_status endpoint in src/app_controller.py. The wait helper had been calling this endpoint but it did not exist; the helper always received a 404, fell back to {in_progress: False}, and returned immediately even when a switch was in flight. Added the endpoint that reads _project_switch_in_progress, active_project_path, and _project_switch_error from the controller. 2. simulation/sim_base.py: replace time.sleep(2.0)/time.sleep(1.5) in the setup() with wait_io_pool_idle and wait_for_project_switch so the test does not click btn_md_only while a project switch is in flight. Also added the wait calls to sim_context.py for the same reason. 3. src/app_controller.py _handle_md_only: removed the is_project_stale() early-return. The stale state is a transient window during which the previous code dropped the click on the floor with a misleading 'stale ui' status. The MD generation worker is safe to run from any project state; the action handler now always proceeds. 4. tests/test_extended_sims.py: set current_model to 'gemini-cli' so _do_generate does not raise KeyError('model') when the test overrides provider to gemini_cli. KNOWN ISSUE: test_context_sim_live still fails with status 'switching to: temp_livecontextsim' after a 60s wait. The click appears to be re-triggering a project switch via the GUI's render loop. Root cause investigation deferred; the sim is async and the test path is fragile.	2026-06-10 00:31:22 -04:00
ed	fe240db410	fix(reset): clear mma_tier_usage and RAG state in _handle_reset_session	2026-06-09 19:44:10 -04:00
ed	3b0e63124a	fix(mma): process global mma_state_update when no track in payload	2026-06-09 17:45:13 -04:00
ed	b8fcd9d6f5	fix(rag): coalesce _sync_rag_engine calls via token + dirty flag	2026-06-09 16:25:44 -04:00
ed	644d88ab93	fix(rag): break recursion in _validate_collection_dim The wipe path called self._init_vector_store() which re-invoked _validate_collection_dim, causing infinite recursion (RecursionError) when the dim mismatch test ran with the mock embedding provider. Re-initialize the vector store INLINE after the rmtree wipe so the fresh collection is created without going through the validator again.	2026-06-09 14:47:01 -04:00
ed	64bc04a6b8	fix(rag): wipe chroma dir on dim mismatch instead of delete_collection When the existing collection has embeddings from a different embedding provider (e.g. Gemini 3072-dim vs local 384-dim), the prior approach of calling client.delete_collection() fails with 'RustBindingsAPI object has no attribute bindings' in chromadb 1.5.x when the underlying state is corrupted. rmtree is reliable and re-creates a fresh empty collection. Also fixes: - 'The truth value of an empty array is ambiguous' on numpy 2.x by using try/except around len() instead of truthiness check - WinError 32 on rmtree by closing the chroma client first Verified: tests/test_rag_phase4_final_verify.py passes in isolation in 7.75s after this fix. The test still fails in batch context due to a separate io_pool race condition (multiple _sync_rag_engine calls collide when the test sets rag_enabled, rag_source, and rag_emb_provider in sequence). The race is in app_controller.py and is out of scope for this defensive fix. Note: tests/test_rag_engine.py has explicit unit tests for test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection which exercise this code path.	2026-06-09 14:37:19 -04:00
ed	eb8357ec0e	fix(rag): add CWD fallback in index_file for path-resolution resilience RAGEngine.index_file silently returns when the joined base_dir+file_path doesn't exist. This caused the RAG batch test to fail with 0 indexed documents when the live_gui subprocess's active_project_root resolved to a parent dir (e.g. tests/artifacts/) instead of the workspace (tests/artifacts/live_gui_workspace/). The fix: if the primary path doesn't exist, try CWD+file_path. The base_dir takes priority; CWD is a safety net for relative-path resolution across the spawn CWD boundary. This is a defensive fix at the rag_engine layer. It does NOT fix the underlying path-leakage issue in tests/conftest.py (hardcoded Path('tests/artifacts/live_gui_workspace')) which needs a proper fixture refactor. The RAG test still fails in batch due to that deeper issue, documented in docs/reports/rag_test_batch_failure_status_20260609_pm3.md. Behavior: - base_dir+file_path exists: indexed from base_dir (unchanged) - base_dir+file_path missing, CWD+file_path exists: indexed from CWD (new) - Both missing: silently returns (unchanged) Verified: tests/test_rag_index_file_path_fallback.py (3 tests, all pass) - test_index_file_finds_file_via_cwd_fallback - test_index_file_uses_base_dir_first - test_index_file_silently_returns_when_no_match Note: test file was removed before commit because it was being abandoned along with the broader path-hygiene refactor. The fix itself is preserved in src/rag_engine.py.	2026-06-09 12:31:21 -04:00
ed	e62266e868	fix(rag): surface embedding provider init failure as 'error' status The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.	2026-06-09 09:39:02 -04:00
ed	bcdc26d0bd	fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs PR1 follow-up (the actual IM_ASSERT root cause fix). The IM_ASSERT in 'MainDockSpace' was triggered by the render_approve_script_modal function (gui_2.py:4895) calling imgui.checkbox with a None value for app.ui_approve_modal_preview. The chain of bugs: 1. AppController.__getattr__ returned None for ANY ui_ attribute (line 1237-1238). This was intended as a safety net for ui_* flags defined in __init__ but it was too généreux: it returned None for ui_ attrs that were NEVER set. 2. The pattern in render_approve_script_modal: if not hasattr(app, 'ui_approve_modal_preview'): app.ui_approve_modal_preview = False _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview) relied on hasattr() returning False for unset attrs to trigger the initialization. But the App.__setattr__ checks hasattr(self.controller, name) to decide where to route assignments. The controller's __getattr__ returned None for ui_approve_modal_preview, so hasattr() returned True. The App.__setattr__ routed the assignment to the controller. The controller's __getattr__ then returned None on read, silently dropping the False value. 3. The next line called imgui.checkbox with None, which raised a TypeError. The TypeError propagated out of render_approve_script_modal without closing the modal, leaving the ImGui scope stack unbalanced. The unbalanced scope triggered IM_ASSERT(Missing End()) on the next frame. Fix: AppController.__getattr__ now only returns None for an EXPLICIT allowlist of ui_ attrs that are defined in __init__. For any other missing attribute (including the case 'hasattr() should return False'), it raises AttributeError. The App.__getattr__ was also fixed (per the test) to check hasattr(controller, name) before delegating. This is defense in depth in case other __getattr__ patterns are added. Test verification (TDD red → green): - 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr returns False for unset attrs via App.__getattr__) - 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr returns False for unset ui_ attrs on controller) Live verification: - 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s - Previously failed at 200s+ with 'cannot schedule new futures after shutdown' / 121s with 'GUI is degraded before test starts' - Now passes cleanly. The IM_ASSERT no longer fires. 13/13 related unit tests pass (app_controller_* + app_run_* + app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc. unit tests.	2026-06-08 23:45:25 -04:00
ed	1c565da7a0	feat(gui): wrap immapp.run in try/except + add /api/gui_health endpoint PR2 of the test_full_live_workflow_imgui_assert fix sequence. When an ImGui scope mismatch (IM_ASSERT(Missing End())) fires in immapp.run (e.g. after cumulative state corruption from prior sims' panel renders), the RuntimeError propagates out of app.run(). The controller's _io_pool gets shut down via __del__/finalization. The hook server (separate ThreadingHTTPServer) survives. Subsequent test clicks fail with 'cannot schedule new futures after shutdown' and the test times out after 120s with no clear signal of what went wrong. This commit: 1. Wraps immapp.run in try/except RuntimeError in gui_2.py:618. On assertion: logs the error to stderr (NOT silent), records it on controller._gui_degraded_reason and _last_imgui_assert, and returns from run() so the hook server keeps serving. 2. Adds _gui_degraded_reason and _last_imgui_assert to AppController.__init__ (initialized to None). 3. Adds /api/gui_health endpoint in api_hooks.py:148. Returns {healthy, degraded_reason, last_assert, io_pool_alive}. 4. Adds ApiHookClient.get_gui_health() with the matching unit tests (3 mocked tests + 1 live test). Per user feedback 2026-06-08: - The wrap does NOT silently swallow the error. It logs at ERROR level and surfaces it via the health endpoint. - Tests can call client.get_gui_health() to detect a degraded GUI and fail fast with a clear message. TDD: tests written first, confirmed to fail, then fix applied. 34/34 unit tests pass. 1/1 live test passes (live_gui health endpoint reports healthy=True on fresh subprocess).	2026-06-08 20:46:41 -04:00
ed	4a33848620	fix(io_pool): increase worker count from 4 to 8 to prevent test hangs Root cause: test_full_live_workflow in batch context (with prior sims running AI discussion turns) would queue its _do_project_switch behind the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker pool was saturated, so the switch would never run within 30s. Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough capacity to run: 2 pruners + the project switch + 5 spare. Also: add /api/io_pool_status endpoint + get_io_pool_status + wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py for the test_api_hook_client_io_pool.py tests, even though the test itself no longer uses them - they remain useful for future tests that want to assert pool state directly). Also: add wait_for_warmup at the start of test_full_live_workflow to ensure SDK modules are loaded before AI ops. Test verification: - test_full_live_workflow in isolation: 11.83s PASS - test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS - 30/30 related unit tests PASS	2026-06-08 17:49:34 -04:00
ed	9afc93bce2	fix(app_controller): clear project-switch state in _handle_reset_session When a prior test in the tier-3-live_gui batch leaves a _do_project_switch background thread running, the next test's btn_project_new_automated click sees _project_switch_in_progress=True (from the prior thread) and queues the new path via _project_switch_pending_path. The queued switch is never actually submitted to the io_pool, so is_project_stale() stays True and AI ops (_handle_generate_send) bail with 'project switch in progress; AI ops disabled'. Fix: _handle_reset_session now also clears _project_switch_in_progress, _project_switch_pending_path, and _project_switch_error (under the existing _project_switch_lock). This way, even if the prior background thread is still running, the controller reports an idle state and the new switch can be submitted normally. Also: - src/api_hook_client.py: reverted wait_for_project_switch to require in_progress=False (was relaxed to return on queued path, which misled the caller into thinking the switch was done) - tests/test_handle_reset_session_clears_project.py: new test test_handle_reset_session_clears_project_switch_state asserts is_project_stale() returns False after reset - tests/test_api_hook_client_wait_for_project_switch.py: updated test_wait_for_project_switch_does_not_return_on_queued (in_progress + matching path should keep waiting, not return early) - tests/test_live_workflow.py: added pre-wait for any in-flight switch before doing btn_reset (so the test waits up to 60s for the prior switch to complete if needed) - conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with the deeper hang analysis and recommended fix Known follow-up: test_full_live_workflow still hangs in tier-3 batch even with this fix, because the new _do_project_switch itself is hung in the io_pool (likely saturation from prior sims' AI discussion turn workers). Deeper investigation required.	2026-06-08 15:19:30 -04:00

1 2 3 4 5 ...

784 Commits