manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	c9ed734d9d	refactor(minimax): restore tool-call loop in _send_minimax The previous refactor (commit `344a66fc`) dropped the tool-call loop in _send_minimax. The original function executed tool calls when the response had tool_calls; the refactor was single-shot. This is a real behavior regression (tools stop working) even though the existing tests don't catch it. Restore the tool loop: - For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible with tools=_get_deepseek_tools() and tool_choice='auto' - If response has tool_calls: dispatch each via _execute_tool_calls_concurrently (handles both async context and sync via run_coroutine_threadsafe / asyncio.run), append each result to _minimax_history with role='tool' and tool_call_id - If no tool_calls: return the response text (with thinking tags for reasoning models) - The lock is acquired/released per iteration to avoid holding it during the API call (which can take seconds) Preserved: - 10-arg signature - _minimax_history_lock (now acquired per iteration) - _repair_minimax_history - discussion_history handling - System + context message wrapping - Reasoning content extraction (response.raw_response.choices[0].message .reasoning_details[0].get('text', '')) - <thinking> tags wrap on the final response Dropped (still): - extra_body={reasoning_split: True} (not supported by send_openai_compatible; would be a Phase 5 adapter addition if minimax-reasoner models need it) New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor). Net effect: 231 -> 75 = 68% reduction; tool loop preserved. Verification: 38/38 tests pass (no regressions).	2026-06-11 08:48:07 -04:00
ed	fadb4c329b	conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606	2026-06-11 02:25:36 -04:00
ed	344a66fc53	refactor(minimax): use send_openai_compatible helper (231 -> 41 lines)	2026-06-11 02:21:28 -04:00
ed	94fe10089e	conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4	2026-06-11 02:06:13 -04:00
ed	21adb4a6f4	conductor(checkpoint): Phase 3 complete - Grok (xAI) + Llama (multi-backend) via shared helper Phase 3 of qwen_llama_grok_integration_20260606 ships Grok and Llama provider support. 16 of 18 state tasks done (t3.4 and t3.15 cancelled: no credentials_template.toml exists; t3.6 and t3.17 completed in Phase 1's initial registry population). Modules shipped: - src/ai_client.py: state globals (_grok_, _llama_ including _llama_base_url and _llama_api_key), _ensure_grok_client() (OpenAI SDK with base_url https://api.x.ai/v1), _ensure_llama_client() (OpenAI SDK with configurable base_url + api_key for Ollama/OpenRouter/custom backends), _send_grok() and _send_llama() (both 10-param signature matching _send_minimax, both call send_openai_compatible), _list_grok_models() and _list_llama_models() (return from capability registry), _get_llama_cost_tracking() (the local-LLM signal: returns False when base_url is localhost/127.0.0.1), 2 new branches in list_models(), Grok + Llama state reset in reset_session() - src/models.py: 'grok' and 'llama' added to PROVIDERS (centralized; gui_2.py and app_controller.py import from this list) - src/cost_tracker.py: 11 new regex pricing entries (3 Grok + 8 Llama) Tests shipped: - tests/test_grok_provider.py (28 lines, 2 tests) - tests/test_llama_provider.py (68 lines, 6 tests) - Total new tests this phase: 8 (all passing) - Cumulative: 38 tests in batch (qwen + grok + llama + minimax + caps + openai_compat + cost + no_top_level_sdk_imports) Architectural correction (Grok-consulted 2026-06-11): - Spec section 3.1.1 added: 'best API per vendor' principle - Spec section 4.3 reverted from 'Native REST API' to 'OpenAI-Compatible' per Grok's own confirmation: 'the OpenAI-compatible endpoint is fully compatible and clean with no meaningful unique native surface lost' - Follow-up track B renamed: 'Llama Native APIs' (Ollama native + Meta Llama API), not 'Native Vendor APIs' (no Grok native refactor needed) - v2 matrix field expansion documented (per Grok's recommendation): audio, video, grounding, computer_use, local, reasoning, web_search, x_search, code_execution, file_search, mcp_support, structured_output Deviations from plan (consistent with Phase 1 and Phase 2): - Test signatures use 10-arg (real _send_minimax shape), not 12-arg - PROVIDERS change is at src/models.py:56 (centralized), not in gui_2.py and app_controller.py (which import from models) - t3.4 and t3.15 (credentials template) skipped: no template file exists; the user maintains their own credentials.toml directly Phase 4 (MiniMax refactor) is now unblocked. The refactor replaces ~250 lines of inline OpenAI-compatible send logic in _send_minimax with a thin wrapper around the shared send_openai_compatible helper (per the spec §5.2 target: ~50 lines).	2026-06-11 02:05:37 -04:00
ed	9be228f620	conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint)	2026-06-11 02:05:07 -04:00
ed	07bac1c6a7	conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template)	2026-06-11 02:04:09 -04:00
ed	f9b5c9372d	feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama) Side concerns for Phase 3: 1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside the 6 existing vendors. Centralized registry; gui_2.py and app_controller.py import from here. State tasks t3.5 and t3.16 were scoped to gui_2.py/app_controller.py but the actual change is at the centralized registry, per the project's single-source-of- truth pattern (per src/models.py module docstring and the Phase 5 audit script audit_no_models_config_io.py which enforces that PROVIDERS lives in models.py). 2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama): Grok (per xAI public pricing): - grok-2: 2.00 / 10.00 - grok-2-vision: 2.00 / 10.00 - grok-beta: 5.00 / 15.00 Llama (per Grok's consultation: pricing varies by backend; registry entries represent the most common case): - llama-3.1-8b-instant: 0.05 / 0.08 (Groq) - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq) - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg) - llama-3.2-1b-preview: 0.04 / 0.04 - llama-3.2-3b-preview: 0.06 / 0.06 - llama-3.2-11b-vision-preview: 0.18 / 0.18 - llama-3.2-90b-vision-preview: 0.90 / 0.90 - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq) (all per 1M tokens, USD; matches the structure of existing entries; note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to allow future model variants in the same family.) Spot check: - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005) - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985 3. SKIPPED t3.4 and t3.15 (credentials templates): no credentials_template.toml exists in the project (Phase 2 established this). The user maintains their own credentials.toml directly. 4. t3.6 and t3.17 (Grok/Llama models in capability registry) were completed in Phase 1's initial population of 22 entries (commit `6be04bc`). Grok has 4 entries (1 wildcard + 3 models); Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has vision=True; Llama 3.2-11b/90b vision variants have vision=True. Verification: 38/38 tests pass in batch.	2026-06-11 02:02:56 -04:00
ed	8e3543d875	docs(spec): revise 'best API per vendor' after Grok consultation Grok's own recommendation (consulted 2026-06-11): 'xAI (Grok) \| xAI official OpenAI-compatible (https://api.x.ai/v1) \| Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No meaningful unique native surface lost by using the compatible endpoint.' This REVERSES the earlier 'xAI native' correction. The OpenAI- compatible approach for Grok is the canonical full-featured path; the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1 + send_openai_compatible helper) is correct as-is. Updates to the spec: 1. §3.1.1: replaced the 'use xAI native' decision with the confirmed per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI- Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2), Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native (follow-up), Anthropic=Native (follow-up). Also added Grok's recommended v2 matrix field expansion: audio, video, grounding, computer_use, local, reasoning/extended_thinking, web_search, x_search, code_execution, file_search, mcp_support, structured_output. 2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to 'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The implementation does NOT need a native refactor; the OpenAI SDK at https://api.x.ai/v1 is the canonical approach. Removed the earlier 'caching: true' entry from the registry (since the OpenAI-compat shim doesn't expose prompt_cache_key) and the 'no persistent client' state struct (back to the OpenAI SDK pattern). 3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs (Ollama native + Meta Llama API)' and removed the Grok native refactor item (Grok says OpenAI-compat is fine). Kept the Ollama native + Meta Llama API items + matrix expansion. Clarified that Grok tests do NOT need rewriting; only Llama tests get 2 more (native Ollama, Meta Llama API). Net effect: the Phase 3 work that just shipped (Grok+Llama Green using OpenAI-compat shim) is CORRECT as-is. The implementation matches Grok's actual recommendation. No code rollback needed.	2026-06-11 02:01:08 -04:00
ed	29a96cc9f5	feat(ai_client): Add Grok (xAI) OpenAI-compatible provider	2026-06-11 01:56:21 -04:00
ed	06716252f1	docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups Three additions to the spec, per the user's architectural correction in this session: 1. NEW section 3.1.1: 'Architectural principle: Use the best API per vendor' — explains why the OpenAI-compatible shim loses vendor- specific features (xAI: prompt_cache_key, reasoning_effort, server- side tools, cost_in_usd_ticks; Ollama: think param, images array, thinking field, structured outputs) and states the principle: 'use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.' Also notes that the capability matrix IS the aggregate tracker; future native features go into the matrix, and the GUI filters based on it (no per-vendor UI branches). 2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was 'OpenAI-Compatible'. Now specifies two native endpoints (/v1/chat/completions and /v1/responses), the native features that matter, the updated capability registry (caching=true for Grok via prompt_cache_key), and a 'Phase 3 placeholder behavior' note that this track's Phase 3 ships the OpenAI-compatible Grok as a placeholder. The native refactor is deferred to follow-up B. 3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs (post-OpenAI-compatible-placeholder)' which documents: - Grok → xAI native REST - Llama (Ollama) → native /api/chat - Llama (Meta Llama API) → new 4th backend (deferred pending verification of Meta's API spec; llama.developer.meta.com/docs/overview returned 400 on fetch this session) - Capability matrix expansion (web_search, x_search, code_execution, file_search, mcp_support, reasoning_effort, structured_output) - Test rewrites (mock requests.post instead of chat.completions.create) This is a docs-only commit; no code changes. The Phase 3 Green work continues with the OpenAI-compatible approach as planned in the existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track B handles the native refactor when prioritized.	2026-06-11 01:49:36 -04:00
ed	891c008f0c	conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green)	2026-06-11 01:42:13 -04:00
ed	90f2be94af	test(grok,llama): red phase for Grok (xAI) + Llama (multi-backend) (8 tests, 6 fail) 8 failing tests in 2 new files for the upcoming Grok and Llama provider implementations. Grok (tests/test_grok_provider.py, 2 tests): 1. test_send_grok_uses_xai_endpoint: _send_grok calls _ensure_grok_client and uses an xAI client (base_url https://api.x.ai/v1) 2. test_grok_2_vision_supports_image: structural check that the capability registry has vision=True for grok-2-vision (already populated in Phase 1, so this test passes in Red phase; it is a regression guard for the registry, not an implementation test) Llama (tests/test_llama_provider.py, 6 tests): 1. test_send_llama_ollama_backend: _send_llama with localhost:11434 (Ollama) base URL 2. test_send_llama_openrouter_backend: _send_llama with OpenRouter URL 3. test_send_llama_custom_url: _send_llama with custom URL (escape hatch for self-hosted) 4. test_llama_model_discovery_unions_ollama_and_openrouter: _list_llama_models returns the 8 models from the capability registry 5. test_llama_3_2_vision_vision_capability: structural check for llama-3.2-11b-vision-preview (passes in Red phase) 6. test_llama_local_backend_cost_tracking_false_for_ollama: the local-LLM signal -- when base_url is localhost, _get_llama_cost_tracking() returns False. This is the first test that exercises the local LLM support that the capability matrix was designed for. Both _reset_grok_state and _reset_llama_state fixtures use hasattr() to be no-ops when the state doesn't exist (Red phase). Test signatures use the real 10-arg _send_minimax signature, NOT the plan's 12-arg with enable_tools / rag_engine. Red phase: 6/8 tests fail (4 AttributeError on missing _send_, 2 ImportError on missing _list_/_get_*). 2/8 pass (registry structural checks). Next: Green phase - implement _send_grok + _ensure_grok_client + _send_llama + _ensure_llama_client + _list_llama_models + _get_llama_cost_tracking in src/ai_client.py.	2026-06-11 01:41:47 -04:00
ed	4204116c66	conductor(plan): mark t2.11 completed (Phase 2 checkpoint)	2026-06-11 01:36:44 -04:00
ed	4d70dcc7ce	conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3	2026-06-11 01:35:22 -04:00
ed	0f2541a3a1	conductor(checkpoint): Phase 2 complete - Qwen via DashScope Phase 2 of qwen_llama_grok_integration_20260606 ships Qwen support via the Alibaba Cloud DashScope native SDK. 10 of 11 state tasks done (t2.7 cancelled: no credentials_template.toml exists in the project; t2.9 was completed in Phase 1's initial registry population). Modules shipped: - src/qwen_adapter.py (31 lines): build_dashscope_tools() (OpenAI shape -> DashScope shape), classify_dashscope_error() (5 exception classes -> ProviderError kinds: auth/network/quota) - src/ai_client.py: state globals (_qwen_client, _qwen_history, _qwen_history_lock, _qwen_region), _ensure_qwen_client() (sets dashscope.base_http_api_url based on region: china vs international), _dashscope_call() + _dashscope_exception_from_response() + _extract_dashscope_tool_calls(), _send_qwen() (10-param signature matching _send_minimax), _list_qwen_models() - src/models.py: 'qwen' added to PROVIDERS (centralized; gui_2.py and app_controller.py import from this list) - src/cost_tracker.py: 7 Qwen pricing entries (regex-matched, USD per 1M tokens) Tests shipped: tests/test_qwen_provider.py (55 lines, 5 tests, all passing) Total new tests this phase: 5 Total tests in new modules: 30 (qwen + minimax + capabilities + openai_compatible + cost_tracker + no_top_level_sdk_imports) Verification: - 30/30 tests pass in batch - No regressions - 4/4 audit scripts pass (audit_main_thread_imports, audit_weak_types, check_test_toml_paths, audit_no_models_config_io) DashScope alignment (post-cleanup): - Uses dashscope.common.error.AuthenticationError (real class in 1.25.21) instead of the non-existent InvalidApiKey - Removed the InvalidApiKey -> AuthenticationError monkey-patch - TimeoutException -> network (not rate_limit) - ServiceUnavailableError -> network (not quota) - _ensure_qwen_client sets base_http_api_url per region (china vs international) per the latest DashScope API spec Deviations from the plan: - Test signature adapted from 12-param (plan) to 10-param (matching real _send_minimax) -- the plan's enable_tools / rag_engine params don't exist on _send_minimax - PROVIDERS change is at src/models.py:56 (centralized), not in gui_2.py and app_controller.py (which import from models) - t2.7 (credentials template) skipped: no template file exists; the user maintains their own credentials.toml directly Phase 3 (Grok + Llama) is now unblocked. Local LLM support lands in Phase 3 via Llama's Ollama backend (default base_url http://localhost:11434/v1).	2026-06-11 01:34:48 -04:00
ed	45d316a0bd	conductor(plan): mark t2.6-t2.10 complete (t2.7 cancelled: no template); advance to t2.11	2026-06-11 01:34:25 -04:00
ed	ab6b53fa8b	feat(qwen): add qwen to PROVIDERS; add 7 Qwen pricing entries to cost_tracker Side concerns for Phase 2: 1. PROVIDERS: src/models.py:56 now includes 'qwen' alongside the existing 5 vendors. The other 4 references to PROVIDERS in src/gui_2.py and src/app_controller.py import from this centralized list, so this one edit propagates everywhere. State task t2.8 was scoped to 'gui_2.py and app_controller.py' but the actual change is at the centralized registry, per the project's single-source-of-truth pattern (per src/models.py module docstring and the Phase 5 audit script audit_no_models_config_io.py which enforces that PROVIDERS lives in models.py). 2. cost_tracker.py: added 7 regex pricing entries for the Qwen models shipped in Phase 1's vendor_capabilities.py: - qwen-turbo: 0.05 / 0.10 - qwen-plus: 0.40 / 1.20 - qwen-max: 2.00 / 6.00 - qwen-long: 0.07 / 0.28 - qwen-vl-plus: 0.21 / 0.63 - qwen-vl-max: 0.50 / 1.50 - qwen-audio: 0.10 / 0.30 (all per 1M tokens, USD; matches the structure of existing entries) Spot check: estimate_cost('qwen-max', 1000, 500) = 0.005 (= 0.002 + 0.003) 3. SKIPPED t2.7 (credentials template): no credentials_template.toml exists in the project. The only credentials file is the active credentials.toml which the user maintains directly with their own API keys. The plan's assumption of a template file does not match the project's actual structure. Documented in the commit log rather than modifying the user's actual credentials.toml with a placeholder key (which would be inconsistent with the rest of that file's pattern of real keys). When the user obtains a DashScope API key, they can add a [qwen] section directly. 4. t2.9 (Qwen models in capability registry) was completed in Phase 1's initial population of 22 entries (commit `6be04bc`). The 8 qwen entries (1 wildcard + 7 specific models) are in src/vendor_capabilities.py. Verification: 30/30 tests pass in batch (test_qwen_provider, test_minimax_provider, test_ai_client_no_top_level_sdk_imports, test_vendor_capabilities, test_openai_compatible, test_cost_tracker)	2026-06-11 01:30:38 -04:00
ed	de5e106234	fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch	2026-06-11 01:26:53 -04:00
ed	b75f60c3fe	feat(ai): Add Qwen provider support to ai_client	2026-06-11 01:20:35 -04:00
ed	bc2cce1612	feat(ai): Add Qwen adapter for DashScope provider	2026-06-11 01:20:19 -04:00
ed	6858dba3f5	remove unused files	2026-06-11 01:02:02 -04:00
ed	3940eb36ac	conductor(plan): mark t2.1-t2.5 complete; advance to t2.6 (Green)	2026-06-11 00:53:58 -04:00
ed	060f471cb9	test(qwen): red phase for Qwen via DashScope (5 failing tests) 5 failing tests in tests/test_qwen_provider.py that establish the core behaviors of the new Qwen (DashScope) provider: 1. test_send_qwen_routes_to_dashscope: _send_qwen calls _ensure_qwen_client and _dashscope_call, returns the text from the DashScope response 2. test_qwen_vision_vl_model_accepts_image: when file_items contains an image, the messages passed to _dashscope_call include the image ref 3. test_qwen_tool_format_translation: build_dashscope_tools converts OpenAI-shaped tool dicts to DashScope shape (name/description/parameters flat structure, not wrapped in function:) 4. test_qwen_error_classification: classify_dashscope_error maps dashscope.common.error.InvalidApiKey -> ProviderError(kind='auth', provider='qwen') 5. test_list_qwen_models_returns_hardcoded_registry: _list_qwen_models returns the 7 Qwen models registered in src/vendor_capabilities.py The autouse _reset_qwen_state fixture uses hasattr() so it is a no-op when _qwen_client / _qwen_history do not exist (yet); this keeps the fixture working in the Red phase. All 5 tests fail: - Tests 1, 2: AttributeError: src.ai_client has no _ensure_qwen_client / _send_qwen / _dashscope_call - Tests 3, 4: ModuleNotFoundError: No module named src.qwen_adapter - Test 5: ImportError: cannot import name _list_qwen_models Test signature adapted to match the real _send_minimax signature at src/ai_client.py:2143-2148 (10 params, no enable_tools / rag_engine) rather than the plan's 12-param signature. Next: Green phase - implement src/qwen_adapter.py + src/ai_client.py state + _ensure_qwen_client + _send_qwen + _list_qwen_models.	2026-06-11 00:53:10 -04:00
ed	d5373e8f94	conductor(plan): mark t1.12 + phase_1 complete; advance to phase 2	2026-06-11 00:48:14 -04:00
ed	03da130780	conductor(checkpoint): Phase 1 complete - capability matrix framework + shared helper Phase 1 of qwen_llama_grok_integration_20260606 ships two new modules and one new dependency, all under TDD discipline (12 tasks, 4 atomic commits, 3+6 failing-then-passing tests). Modules shipped: - src/vendor_capabilities.py (55 lines): VendorCapabilities frozen dataclass with 12 fields, module-level _REGISTRY dict keyed by (vendor, model), register() / get_capabilities() (with vendor '*' wildcard fallback) / list_models_for_vendor() functions, 22 initial registry entries (1 minimax, 4 grok, 9 llama, 8 qwen; plan's typo of minimax/grok-2-latest omitted). - src/openai_compatible.py (144 lines): NormalizedResponse frozen dataclass, OpenAICompatibleRequest dataclass, send_openai_compatible() dispatch, _send_blocking + _send_streaming helpers, _classify_openai_compatible_error error classifier (RateLimitError->rate_limit, AuthenticationError->auth, etc.). Fixed plan's MagicMock_noop forward-reference code smell. Tests shipped (all passing): - tests/test_vendor_capabilities.py (40 lines, 3 tests) - tests/test_openai_compatible.py (88 lines, 6 tests) - Total: 9 new tests, 0 regressions Dependency added: - pyproject.toml: dashscope>=1.14.0,<2.0.0 (installed: 1.25.21) Verification: - 24/24 tests pass in batch (test_minimax_provider, test_ai_client_no_top_level_sdk_imports, test_vendor_capabilities, test_openai_compatible) - 4 audit scripts pass with no new violations: - scripts/audit_main_thread_imports.py: OK - scripts/audit_weak_types.py: OK - scripts/check_test_toml_paths.py: OK - scripts/audit_no_models_config_io.py: OK - src/ai_client.py: NOT modified (Phase 4 will refactor _send_minimax) - src/openai_compatible.py and src/vendor_capabilities.py are importable with no side effects beyond registry population - No threading.Thread calls introduced (per project invariant) - Module-level imports in new files are stdlib + openai (already-used SDK) + a function-level import of ProviderError from src.ai_client inside the error classifier (avoids circular import risk)	2026-06-11 00:46:41 -04:00
ed	67782198b6	conductor(plan): mark t1.11 (dashscope dep) complete; advance to t1.12	2026-06-11 00:46:18 -04:00
ed	f4186f1061	chore(deps): add dashscope>=1.14.0,<2.0.0 for Qwen support	2026-06-11 00:44:08 -04:00
ed	f07e616c38	conductor(plan): mark t1.5-t1.10 complete; advance to t1.11	2026-06-11 00:41:11 -04:00
ed	d7d7d5cef9	feat(openai_compatible): implement shared send helper with streaming/tool/vision/error Green phase: src/openai_compatible.py now exists and all 6 Red-phase tests in tests/test_openai_compatible.py pass. Implementation (144 lines, 1-space indent, no comments): Data structures: - NormalizedResponse: frozen dataclass with text, tool_calls, usage_input_tokens, usage_output_tokens, usage_cache_read_tokens, usage_cache_creation_tokens, raw_response - OpenAICompatibleRequest: regular dataclass with messages, model, temperature=0.0, top_p=1.0, max_tokens=8192, tools=None, tool_choice='auto', stream=False, stream_callback=None Algorithms: - send_openai_compatible(client, request, *, capabilities) -> NormalizedResponse Dispatches to _send_blocking or _send_streaming based on request.stream. Catches openai.OpenAIError and re-raises as classified ProviderError. - _send_blocking: extracts message text + tool_calls, converts tool_calls to dicts via _to_dict_tool_call, reads usage.prompt_tokens / usage.completion_tokens (with int() coercion for MagicMock test compat). - _send_streaming: iterates chunks, accumulates text parts, aggregates tool_calls by index, fires stream_callback per text delta, reads chunk.usage for final token counts. - _classify_openai_compatible_error: maps RateLimitError -> 'rate_limit', AuthenticationError/PermissionDeniedError -> 'auth', APIConnectionError -> 'network', APIStatusError with 402/429/401-403/500-504 -> 'balance'/ 'rate_limit'/'auth'/'network', BadRequestError -> 'quota', fallback 'unknown'. All use provider='openai_compatible'. Fixed plan's code smell: removed the 'MagicMock_noop' forward-reference class (defined after first use) and replaced with the cleaner Pythonic pattern 'int(getattr(usage, prompt_tokens, 0) or 0)'. Real OpenAI SDK always sets usage on responses; the defensive fallback was noise. Function-level import of ProviderError inside _classify_openai_compatible_error avoids any circular import risk.	2026-06-11 00:39:58 -04:00
ed	b53fe39d79	test(openai_compatible): red phase for shared send helper (6 failing tests) 6 failing tests in tests/test_openai_compatible.py that establish the core behaviors of the new send_openai_compatible() shared helper: 1. test_send_non_streaming_returns_normalized_response: blocking call returns text, empty tool_calls, and correct usage token counts 2. test_send_streaming_aggregates_chunks: streaming call aggregates deltas into final text and fires stream_callback per chunk 3. test_tool_call_detection_in_response: tool_calls from the response are converted to dicts with id/type/function/arguments fields 4. test_vision_multimodal_message: messages with multimodal content (text + image_url) are passed through unchanged to the client 5. test_error_classification_429_to_rate_limit: RateLimitError from openai SDK is caught and re-raised as ProviderError(kind='rate_limit') 6. test_normalized_response_is_frozen_dataclass: NormalizedResponse is a frozen dataclass (FrozenInstanceError on attribute assignment) All 6 tests fail with ModuleNotFoundError: No module named 'src.openai_compatible' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). ProviderError confirmed importable from src.ai_client (no stub needed).	2026-06-11 00:35:13 -04:00
ed	6f11e7da14	conductor(plan): mark t1.1-t1.4 complete; advance to phase 1 in_progress	2026-06-11 00:31:57 -04:00
ed	6be04bc4f0	feat(vendor_capabilities): implement registry with initial 22-entry population Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase tests in tests/test_vendor_capabilities.py pass. Implementation: - VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes) - Module-level _REGISTRY dict keyed by (vendor, model) - register() inserts/overwrites entries - get_capabilities() returns specific entry if present, else vendor '' default, else raises KeyError with 'No capabilities registered' message - list_models_for_vendor() returns sorted model names for a vendor (excludes '' wildcard) Initial population (22 entries at module load): - 1 minimax wildcard (cost: 0.20/0.20 per Mtok) - 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True) - 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True) - 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True; qwen-audio has notes='Text-only in v1; audio input deferred') The plan's Task 1.3 listed 22 entries but included one impossible entry (vendor='minimax', model='grok-2-latest'). Omitted; 21 entries shipped. Test fix: test_fallback_to_vendor_default previously used model name 'llama-3.3-70b-specdec' which IS in the registry, so the specific entry was returned (with default cost_tracking=True), not the wildcard. Fixed by changing to 'llama-3.3-future-unregistered' (not in registry, so fallback fires correctly).	2026-06-11 00:30:52 -04:00
ed	6fb6f8653c	test(vendor_capabilities): red phase for registry lookup, fallback, unknown vendor 3 failing tests in tests/test_vendor_capabilities.py that establish the core behaviors of the new VendorCapability matrix: 1. test_registry_lookup_known_model: registering and looking up a specific (vendor, model) entry returns the registered entry 2. test_fallback_to_vendor_default: looking up an unregistered model returns the vendor's '*' default entry 3. test_unknown_vendor_raises: looking up a vendor with no entries raises KeyError with a 'No capabilities registered' message All 3 tests fail with ModuleNotFoundError: No module named 'src.vendor_capabilities' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). The autouse _clean_registry fixture snapshots src.vendor_capabilities._REGISTRY before each test and restores it after, providing test isolation for the module-level state.	2026-06-11 00:19:00 -04:00
ed	cd2557bc4a	config stable-2026-6-11	2026-06-11 00:16:22 -04:00
ed	2fa5a14620	docs(report): append Final Report section to docs_sync closing report Final report for the continuation session that started after the original 25-commit run closed. Covers: Stats: - 17 atomic continuation commits (`db5ab0d9` -> `7d6dbbd3`) plus `03056a4f` for the closure summary itself - 14 unique doc files modified - 0 source files modified (continuation was docs-only) - 11 source files read in full; ~20 outlined - ~250 + lines, ~190 - lines across the doc edits What was done (14 drift clusters with detailed before/after): - guide_hot_reload.md: example registration + trigger_key claim - guide_app_controller.md: filename typo + fictional hot_reload() method - guide_gui_2.md: line 155 -> 285; reload() -> reload_all() - guide_nerv_theme.md: 5 wrong hex values; render_nerv_fx fiction; [nerv] config fiction; 0.5 Hz -> 3.18 Hz; 1.5s pulse -> no decay - guide_shaders_and_window.md: 3 fictional [nerv] config refs - guide_command_palette.md: 11 -> 33 commands - guide_mma.md: 5 algorithm drift points (has_cycle iterative, topological_sort Kahn's, tick no-promote, ConductorEngine.__init__ signature) - guide_beads.md: dispatch line range - guide_multi_agent_conductor.md: wholesale rewrite of pre-refactor architecture - guide_tools.md: run_powershell signature (add patch_callback) - guide_context_curation.md: FuzzyAnchor docstring (replace 'anchor_lines' with real field names) - guide_simulations.md: CodeOutliner doc (add [ImGui Scope], return-type suffix, count guard) - Readme.md: 3 line-level drift (45->46 MCP, 32->33 commands, shell_runner patch_callback) - docs/Readme.md: file tree (24->27 guides with full alphabetical list) - conductor/index.md: 23 -> 27 guides count Drift patterns (6, refined from the 4 in the original handoff): 1. Thread counts 2. Line numbers 3. Removed-class claims 4. Schema fields 5. NEW: Architecture rotations (the most common in this continuation) 6. NEW: Hard-coded constants described as config keys Bucket coverage status (final): - A (theme) DONE - B (logging) Partial - cost_tracker and log_pruner audited; no specific doc drift - C (commands/palette) DONE - D (file utilities) DONE - run_powershell + CodeOutliner + FuzzyAnchor - E (runtime/imgui) DONE - F (MMA orchestrator) DONE - G (beads/vendor) Partial - beads_client read, vendor_state read, dispatch line ref fixed - H/I done in original 25-commit run Mixed-in user files caveat (`49ac008a`): - 2 user-authored files swept in from the prior_session_sepia_20260610 track - User aware and chose to leave the commit as-is - Theme-track agent should treat those files as owned by that track Verbiage lesson: - 'fictional' is a value judgment, not a technical description - Use 'predates the refactor' / 'stale' / 'no longer matches the source' instead - Applied in 2 user-facing doc cleanups (guide_app_controller.md:59, guide_rag.md:322) Recommendations for the theme-track agent: - Read guide_themes.md:87 before touching the theme system - Do NOT touch the guide_nerv_theme.md and guide_shaders_and_window.md updates from this session (re-verified against source) - The theme_2.py:111 comment confirms the per-frame create-and-discard FX pattern - Run all 4 audit scripts before committing any source code change - The markdown_table.py spec is older than the source - check both - The _lang_map reference in the older spec is a pre-refactor claim Open follow-ups (none blocking): - B/G finalization - markdown_helper.py and markdown_table.py source verification (left for theme track) - Test count verification (322 may drift) - Doc freshness signal	2026-06-11 00:02:34 -04:00
ed	7d6dbbd371	docs(conductor/index): fix guide count (23->27), update last-refresh date and add docs_sync_test_era_20260610 reference	2026-06-10 23:58:20 -04:00
ed	d0dec98a18	docs(readme): refresh file tree + summary table (27 guides with full alphabetical list, 45+1=46 MCP tools, 33 commands, shell_runner with patch_callback, 322 test files)	2026-06-10 23:57:47 -04:00
ed	758f5c861e	docs(readme): fix 3 line-level drift in src/ table (45->46 MCP tools, 32->33 commands, add patch_callback to shell_runner)	2026-06-10 23:56:37 -04:00
ed	824f5e9bae	docs(simulations): expand CodeOutliner doc (add get_outline dispatcher, [ImGui Scope] case, return-type suffix, count overflow guard)	2026-06-10 23:47:28 -04:00
ed	de9107db4f	docs(readme): fix tool count in guide_tools summary (26->46 with breakdown) + add patch_callback to shell runner description	2026-06-10 23:46:26 -04:00
ed	99eb434f60	docs(curation): correct FuzzyAnchor docstring (add get_context helper, replace 'anchor_lines' with actual field names)	2026-06-10 23:45:37 -04:00
ed	aa4ec2ed08	docs(tools): fix run_powershell signature (add patch_callback + correct Popen kwargs + qa_callback also fires on stderr-only)	2026-06-10 23:45:02 -04:00
ed	03056a4f4c	docs(report): append continuation summary to docs_sync closing report 12 atomic commits added after the original 25-commit run closed: 6 small drift fixes (db5ab0d9..28172135) - guide_hot_reload.md: example registration + trigger_key claim - guide_app_controller.md: src/hot_reload.py -> src/hot_reloader.py + hot_reload() method - guide_gui_2.md: line 155 -> 285; reload() -> reload_all() - guide_nerv_theme.md: 5 wrong hex values, stale apply_nerv body, stale render_nerv_fx example, [nerv] config that was never wired, 0.5 Hz vs actual 3.18 Hz flicker - guide_shaders_and_window.md: 3 fictional [nerv] config refs - guide_app_controller.md:68: self-referential io_pool docstring claim 1 mid-size fix (`81e88241`) - guide_command_palette.md: command count 11 -> 33 (full source-derived Action column for every @registry.register decorator in src/commands.py) 2 MMA rewrites (`57143b7a`, `394987f8`, `a49e5ffb`, `e0368174`) - guide_mma.md: has_cycle recursive -> iterative; topological_sort DFS -> Kahn's; tick auto-promotion claim; ConductorEngine.__init__ missing max_workers param - guide_beads.md: bd_ tool dispatch line range - guide_multi_agent_conductor.md: rewrote the TrackDAG and ExecutionEngine/ConductorEngine/WorkerPool/mma_exec sections; the prior doc predated the conductor_engine refactor and described a different architecture (MultiAgentConductor class that doesn't exist, ExecutionMode enum that doesn't exist, _dispatch_loop background thread that doesn't exist, ThreadPoolExecutor-backed WorkerPool that is actually a dict[str, Thread] + lock + semaphore) 2 verbiage cleanups - replaced 'fictional' with neutral phrasing ('predates the refactor' / 'stale') in 2 places where the prior session had used it in user-facing doc text. Going forward doc-drift commits use neutral language; 'fictional' was a value judgment on the doc and its author, not a technical description. Bucket coverage after continuation: A (theme), C (commands/palette), E (runtime/imgui), F (MMA orchestrator) fully covered. B (logging) and G (beads/vendor) partial. H/I (mcp_client/ai_client deep) done in original 25-commit run. Still untouched: D (8 file utilities), shaders.py / bg shader.py, summary_cache.py. Caveat for next agent (theme track): commit `49ac008a` accidentally swept in 2 user-authored files from the parallel prior_session_sepia_20260610 work (conductor/tracks/prior_session_sepia_20260610/plan.md and docs/superpowers/plans/2026-06-10-prior-session-sepia.md). The user is aware and chose to leave them in that commit. The next agent should treat those files as owned by the prior_session_sepia_20260610 track and not modify them from the theme-track context.	2026-06-10 23:41:32 -04:00
ed	49ac008a87	docs: replace 2 'fictional' usages with neutral phrasing (predates the refactor / was stale)	2026-06-10 23:34:33 -04:00
ed	e03681741a	docs(mma-conductor): rewrite ExecutionEngine/ConductorEngine/WorkerPool/mma_exec sections to match current src/multi_agent_conductor.py (predates the conductor_engine refactor)	2026-06-10 23:31:43 -04:00
ed	a49e5ffb16	docs(mma-conductor): replace fictional TrackDAG section with actual src/dag_engine.py API	2026-06-10 23:30:04 -04:00
ed	394987f8b3	docs(beads): fix dispatch line ref (1474-1494 -> 1453-1473; add tool-schema block 2224-2268)	2026-06-10 23:29:18 -04:00
ed	57143b7ab2	docs(mma): fix 5 drift points (has_cycle iterative/DFS->iterative, topological_sort DFS->Kahn, tick auto-promotion, ConductorEngine.__init__ signature+max_workers)	2026-06-10 23:27:46 -04:00
ed	81e8824170	docs(command_palette): fix command count (11->33) and expand table with actual source-derived actions	2026-06-10 23:22:06 -04:00

1 2 3 4 5 ...