
32 Commits

Author SHA1 Message Date
ed 344a66fc53 refactor(minimax): use send_openai_compatible helper (231 -> 41 lines) 2026-06-11 02:21:28 -04:00
ed 94fe10089e conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4 2026-06-11 02:06:13 -04:00
ed 21adb4a6f4 conductor(checkpoint): Phase 3 complete - Grok (xAI) + Llama (multi-backend) via shared helper
Phase 3 of qwen_llama_grok_integration_20260606 ships Grok and Llama
provider support. 16 of 18 state tasks done (t3.4 and t3.15 cancelled:
no credentials_template.toml exists; t3.6 and t3.17 completed in
Phase 1's initial registry population).

Modules shipped:
- src/ai_client.py: state globals (_grok_*, _llama_* including _llama_base_url
  and _llama_api_key), _ensure_grok_client() (OpenAI SDK with base_url
  https://api.x.ai/v1), _ensure_llama_client() (OpenAI SDK with
  configurable base_url + api_key for Ollama/OpenRouter/custom backends),
  _send_grok() and _send_llama() (both 10-param signature matching
  _send_minimax, both call send_openai_compatible), _list_grok_models()
  and _list_llama_models() (return from capability registry),
  _get_llama_cost_tracking() (the local-LLM signal: returns False when
  base_url is localhost/127.0.0.1), 2 new branches in list_models(),
  Grok + Llama state reset in reset_session()
- src/models.py: 'grok' and 'llama' added to PROVIDERS (centralized;
  gui_2.py and app_controller.py import from this list)
- src/cost_tracker.py: 11 new regex pricing entries (3 Grok + 8 Llama)
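
The local-LLM signal described for _get_llama_cost_tracking() can be sketched as below. This is a minimal illustration, not the code in src/ai_client.py: the function name is_cost_tracked, the LOCAL_HOSTS set, and the use of urllib.parse are assumptions; only the rule (False for localhost/127.0.0.1) comes from the commit message.

```python
from urllib.parse import urlparse

# Hostnames treated as "local"; the commit message names these two.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def is_cost_tracked(base_url: str) -> bool:
    """Return False for a local backend (nothing to bill), True otherwise."""
    host = urlparse(base_url).hostname or ""
    return host.lower() not in LOCAL_HOSTS


# Example base URLs: a local Ollama endpoint and a hosted backend.
print(is_cost_tracked("http://localhost:11434/v1"))
print(is_cost_tracked("https://openrouter.ai/api/v1"))
```

Parsing the hostname rather than substring-matching the URL avoids misclassifying a remote host whose path or subdomain happens to contain "localhost".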

Tests shipped:
- tests/test_grok_provider.py (28 lines, 2 tests)
- tests/test_llama_provider.py (68 lines, 6 tests)
- Total new tests this phase: 8 (all passing)
- Cumulative: 38 tests in batch (qwen + grok + llama + minimax + caps +
  openai_compat + cost + no_top_level_sdk_imports)

Architectural correction (Grok-consulted 2026-06-11):
- Spec section 3.1.1 added: 'best API per vendor' principle
- Spec section 4.3 reverted from 'Native REST API' to 'OpenAI-Compatible'
  per Grok's own confirmation: 'the OpenAI-compatible endpoint is
  fully compatible and clean with no meaningful unique native surface
  lost'
- Follow-up track B renamed: 'Llama Native APIs' (Ollama native +
  Meta Llama API), not 'Native Vendor APIs' (no Grok native refactor
  needed)
- v2 matrix field expansion documented (per Grok's recommendation):
  audio, video, grounding, computer_use, local, reasoning,
  web_search, x_search, code_execution, file_search, mcp_support,
  structured_output

Deviations from plan (consistent with Phase 1 and Phase 2):
- Test signatures use 10-arg (real _send_minimax shape), not 12-arg
- PROVIDERS change is at src/models.py:56 (centralized), not in
  gui_2.py and app_controller.py (which import from models)
- t3.4 and t3.15 (credentials template) skipped: no template file
  exists; the user maintains their own credentials.toml directly

Phase 4 (MiniMax refactor) is now unblocked. The refactor replaces
~250 lines of inline OpenAI-compatible send logic in _send_minimax
with a thin wrapper around the shared send_openai_compatible helper
(per the spec §5.2 target: ~50 lines).
2026-06-11 02:05:37 -04:00
ed 9be228f620 conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint) 2026-06-11 02:05:07 -04:00
ed 07bac1c6a7 conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template) 2026-06-11 02:04:09 -04:00
ed f9b5c9372d feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama)
Side concerns for Phase 3:

1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside
   the 6 existing vendors. Centralized registry; gui_2.py and
   app_controller.py import from here. State tasks t3.5 and t3.16
   were scoped to gui_2.py/app_controller.py but the actual change
   is at the centralized registry, per the project's single-source-of-
   truth pattern (per src/models.py module docstring and the Phase 5
   audit script audit_no_models_config_io.py which enforces that
   PROVIDERS lives in models.py).

2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama):

   Grok (per xAI public pricing):
   - grok-2: 2.00 / 10.00
   - grok-2-vision: 2.00 / 10.00
   - grok-beta: 5.00 / 15.00

   Llama (per Grok's consultation: pricing varies by backend; registry
   entries represent the most common case):
   - llama-3.1-8b-instant: 0.05 / 0.08 (Groq)
   - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq)
   - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg)
   - llama-3.2-1b-preview: 0.04 / 0.04
   - llama-3.2-3b-preview: 0.06 / 0.06
   - llama-3.2-11b-vision-preview: 0.18 / 0.18
   - llama-3.2-90b-vision-preview: 0.90 / 0.90
   - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq)

   (all per 1M tokens, USD; matches the structure of existing entries;
   note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to
   allow future model variants in the same family.)

   Spot check:
   - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005)
   - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985

3. SKIPPED t3.4 and t3.15 (credentials templates): no
   credentials_template.toml exists in the project (Phase 2 established
   this). The user maintains their own credentials.toml directly.

4. t3.6 and t3.17 (Grok/Llama models in capability registry) were
   completed in Phase 1's initial population of 22 entries
   (commit 6be04bc). Grok has 4 entries (1 wildcard + 3 models);
   Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has
   vision=True; Llama 3.2-11b/90b vision variants have vision=True.
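
The spot checks in item 2 can be reproduced with a small regex-keyed pricing table. This is a sketch under assumptions: the PRICING list layout, the first-match-wins rule, and the 0.0 fallback for unknown models are illustrative and may differ from src/cost_tracker.py; the rates and the estimate_cost(model, input_tokens, output_tokens) call shape are taken from the commit message.

```python
import re

# (pattern, input USD per 1M tokens, output USD per 1M tokens).
# Rates are copied from the commit message; the table layout is assumed.
PRICING = [
    (re.compile(r"grok-2-vision"), 2.00, 10.00),
    (re.compile(r"grok-2"), 2.00, 10.00),
    (re.compile(r"llama-3\.3-70b"), 0.59, 0.79),
]


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost, or 0.0 when no pattern matches (assumed)."""
    for pattern, in_rate, out_rate in PRICING:
        if pattern.search(model):
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return 0.0


print(estimate_cost("grok-2", 1000, 500))
print(estimate_cost("llama-3.3-70b-specdec", 1000, 500))
```

With first-match-wins, more specific patterns (grok-2-vision) need to precede broader ones (grok-2) if their rates ever diverge.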

Verification: 38/38 tests pass in batch.
2026-06-11 02:02:56 -04:00
ed 8e3543d875 docs(spec): revise 'best API per vendor' after Grok consultation
Grok's own recommendation (consulted 2026-06-11):

  'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) |
   Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No
   meaningful unique native surface lost by using the compatible
   endpoint.'

This REVERSES the earlier 'xAI native' correction. The OpenAI-
compatible approach for Grok is the canonical full-featured path;
the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1
+ send_openai_compatible helper) is correct as-is.

Updates to the spec:

1. §3.1.1: replaced the 'use xAI native' decision with the confirmed
   per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's
   own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-
   Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2),
   Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native
   (follow-up), Anthropic=Native (follow-up). Also added Grok's
   recommended v2 matrix field expansion: audio, video, grounding,
   computer_use, local, reasoning/extended_thinking, web_search,
   x_search, code_execution, file_search, mcp_support, structured_output.

2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to
   'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The
   implementation does NOT need a native refactor; the OpenAI SDK
   at https://api.x.ai/v1 is the canonical approach. Removed the
   earlier 'caching: true' entry from the registry (since the
   OpenAI-compat shim doesn't expose prompt_cache_key) and the
   'no persistent client' state struct (back to the OpenAI SDK
   pattern).

3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs
   (Ollama native + Meta Llama API)' and removed the Grok native
   refactor item (Grok says OpenAI-compat is fine). Kept the Ollama
   native + Meta Llama API items + matrix expansion. Clarified that
   Grok tests do NOT need rewriting; only Llama tests get 2 more
   (native Ollama, Meta Llama API).

Net effect: the Phase 3 work that just shipped (Grok+Llama Green
using OpenAI-compat shim) is CORRECT as-is. The implementation
matches Grok's actual recommendation. No code rollback needed.
2026-06-11 02:01:08 -04:00
ed 29a96cc9f5 feat(ai_client): Add Grok (xAI) OpenAI-compatible provider 2026-06-11 01:56:21 -04:00
ed 06716252f1 docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups
Three additions to the spec, per the user's architectural correction
in this session:

1. NEW section 3.1.1: 'Architectural principle: Use the best API per
   vendor' — explains why the OpenAI-compatible shim loses vendor-
   specific features (xAI: prompt_cache_key, reasoning_effort, server-
   side tools, cost_in_usd_ticks; Ollama: think param, images array,
   thinking field, structured outputs) and states the principle:
   'use each vendor's native SDK or REST API when one exists, falling
   back to OpenAI-compatible only when no native option exists.'

   Also notes that the capability matrix IS the aggregate tracker;
   future native features go into the matrix, and the GUI filters
   based on it (no per-vendor UI branches).

2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was
   'OpenAI-Compatible'. Now specifies two native endpoints
   (/v1/chat/completions and /v1/responses), the native features that
   matter, the updated capability registry (caching=true for Grok
   via prompt_cache_key), and a 'Phase 3 placeholder behavior' note
   that this track's Phase 3 ships the OpenAI-compatible Grok as a
   placeholder. The native refactor is deferred to follow-up B.

3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs
   (post-OpenAI-compatible-placeholder)' which documents:
   - Grok → xAI native REST
   - Llama (Ollama) → native /api/chat
   - Llama (Meta Llama API) → new 4th backend (deferred pending
     verification of Meta's API spec; llama.developer.meta.com/docs/overview
     returned 400 on fetch this session)
   - Capability matrix expansion (web_search, x_search, code_execution,
     file_search, mcp_support, reasoning_effort, structured_output)
   - Test rewrites (mock requests.post instead of chat.completions.create)

This is a docs-only commit; no code changes. The Phase 3 Green work
continues with the OpenAI-compatible approach as planned in the
existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track
B handles the native refactor when prioritized.
2026-06-11 01:49:36 -04:00
ed 891c008f0c conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green) 2026-06-11 01:42:13 -04:00
ed 90f2be94af test(grok,llama): red phase for Grok (xAI) + Llama (multi-backend) (8 tests, 6 fail)
8 tests (6 failing in the Red phase) in 2 new files for the upcoming
Grok and Llama provider implementations.

Grok (tests/test_grok_provider.py, 2 tests):
1. test_send_grok_uses_xai_endpoint: _send_grok calls _ensure_grok_client
   and uses an xAI client (base_url https://api.x.ai/v1)
2. test_grok_2_vision_supports_image: structural check that the
   capability registry has vision=True for grok-2-vision (already
   populated in Phase 1, so this test passes in Red phase; it is a
   regression guard for the registry, not an implementation test)

Llama (tests/test_llama_provider.py, 6 tests):
1. test_send_llama_ollama_backend: _send_llama with localhost:11434
   (Ollama) base URL
2. test_send_llama_openrouter_backend: _send_llama with OpenRouter URL
3. test_send_llama_custom_url: _send_llama with custom URL
   (escape hatch for self-hosted)
4. test_llama_model_discovery_unions_ollama_and_openrouter: _list_llama_models
   returns the 8 models from the capability registry
5. test_llama_3_2_vision_vision_capability: structural check for
   llama-3.2-11b-vision-preview (passes in Red phase)
6. test_llama_local_backend_cost_tracking_false_for_ollama: the local-LLM
   signal -- when base_url is localhost, _get_llama_cost_tracking()
   returns False. This is the first test that exercises the local LLM
   support that the capability matrix was designed for.

Both _reset_grok_state and _reset_llama_state fixtures use hasattr() to
be no-ops when the state doesn't exist (Red phase).

Test signatures use the real 10-arg _send_minimax signature, NOT the
plan's 12-arg with enable_tools / rag_engine.

Red phase: 6/8 tests fail (4 AttributeError on missing _send_*,
2 ImportError on missing _list_*/_get_*). 2/8 pass (registry structural
checks).

Next: Green phase - implement _send_grok + _ensure_grok_client +
_send_llama + _ensure_llama_client + _list_llama_models +
_get_llama_cost_tracking in src/ai_client.py.
2026-06-11 01:41:47 -04:00
ed 4204116c66 conductor(plan): mark t2.11 completed (Phase 2 checkpoint) 2026-06-11 01:36:44 -04:00
ed 4d70dcc7ce conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3 2026-06-11 01:35:22 -04:00
ed 0f2541a3a1 conductor(checkpoint): Phase 2 complete - Qwen via DashScope
Phase 2 of qwen_llama_grok_integration_20260606 ships Qwen support via
the Alibaba Cloud DashScope native SDK. 10 of 11 state tasks done
(t2.7 cancelled: no credentials_template.toml exists in the project;
t2.9 was completed in Phase 1's initial registry population).

Modules shipped:
- src/qwen_adapter.py (31 lines): build_dashscope_tools() (OpenAI shape
  -> DashScope shape), classify_dashscope_error() (5 exception classes
  -> ProviderError kinds: auth/network/quota)
- src/ai_client.py: state globals (_qwen_client, _qwen_history,
  _qwen_history_lock, _qwen_region), _ensure_qwen_client() (sets
  dashscope.base_http_api_url based on region: china vs international),
  _dashscope_call() + _dashscope_exception_from_response() +
  _extract_dashscope_tool_calls(), _send_qwen() (10-param signature
  matching _send_minimax), _list_qwen_models()
- src/models.py: 'qwen' added to PROVIDERS (centralized; gui_2.py and
  app_controller.py import from this list)
- src/cost_tracker.py: 7 Qwen pricing entries (regex-matched,
  USD per 1M tokens)

Tests shipped: tests/test_qwen_provider.py (55 lines, 5 tests, all passing)
Total new tests this phase: 5
Total tests in new modules: 30 (qwen + minimax + capabilities +
openai_compatible + cost_tracker + no_top_level_sdk_imports)

Verification:
- 30/30 tests pass in batch
- No regressions
- 4/4 audit scripts pass (audit_main_thread_imports, audit_weak_types,
  check_test_toml_paths, audit_no_models_config_io)

DashScope alignment (post-cleanup):
- Uses dashscope.common.error.AuthenticationError (real class in
  1.25.21) instead of the non-existent InvalidApiKey
- Removed the InvalidApiKey -> AuthenticationError monkey-patch
- TimeoutException -> network (not rate_limit)
- ServiceUnavailableError -> network (not quota)
- _ensure_qwen_client sets base_http_api_url per region (china vs
  international) per the latest DashScope API spec

Deviations from the plan:
- Test signature adapted from 12-param (plan) to 10-param (matching
  real _send_minimax) -- the plan's enable_tools / rag_engine params
  don't exist on _send_minimax
- PROVIDERS change is at src/models.py:56 (centralized), not in
  gui_2.py and app_controller.py (which import from models)
- t2.7 (credentials template) skipped: no template file exists;
  the user maintains their own credentials.toml directly

Phase 3 (Grok + Llama) is now unblocked. Local LLM support lands
in Phase 3 via Llama's Ollama backend (default base_url
http://localhost:11434/v1).
2026-06-11 01:34:48 -04:00
ed 45d316a0bd conductor(plan): mark t2.6-t2.10 complete (t2.7 cancelled: no template); advance to t2.11 2026-06-11 01:34:25 -04:00
ed ab6b53fa8b feat(qwen): add qwen to PROVIDERS; add 7 Qwen pricing entries to cost_tracker
Side concerns for Phase 2:

1. PROVIDERS: src/models.py:56 now includes 'qwen' alongside the existing
   5 vendors. The other 4 references to PROVIDERS in src/gui_2.py and
   src/app_controller.py import from this centralized list, so this
   one edit propagates everywhere. State task t2.8 was scoped to
   'gui_2.py and app_controller.py' but the actual change is at the
   centralized registry, per the project's single-source-of-truth
   pattern (per src/models.py module docstring and the Phase 5 audit
   script audit_no_models_config_io.py which enforces that PROVIDERS
   lives in models.py).

2. cost_tracker.py: added 7 regex pricing entries for the Qwen models
   shipped in Phase 1's vendor_capabilities.py:
   - qwen-turbo: 0.05 / 0.10
   - qwen-plus: 0.40 / 1.20
   - qwen-max: 2.00 / 6.00
   - qwen-long: 0.07 / 0.28
   - qwen-vl-plus: 0.21 / 0.63
   - qwen-vl-max: 0.50 / 1.50
   - qwen-audio: 0.10 / 0.30
   (all per 1M tokens, USD; matches the structure of existing entries)

   Spot check: estimate_cost('qwen-max', 1000, 500) = 0.005 (= 0.002 + 0.003)

3. SKIPPED t2.7 (credentials template): no credentials_template.toml
   exists in the project. The only credentials file is the active
   credentials.toml which the user maintains directly with their own
   API keys. The plan's assumption of a template file does not match
   the project's actual structure. Documented in the commit log
   rather than modifying the user's actual credentials.toml with a
   placeholder key (which would be inconsistent with the rest of
   that file's pattern of real keys). When the user obtains a
   DashScope API key, they can add a [qwen] section directly.

4. t2.9 (Qwen models in capability registry) was completed in Phase 1's
   initial population of 22 entries (commit 6be04bc). The 8 qwen
   entries (1 wildcard + 7 specific models) are in src/vendor_capabilities.py.

Verification: 30/30 tests pass in batch
(test_qwen_provider, test_minimax_provider, test_ai_client_no_top_level_sdk_imports,
test_vendor_capabilities, test_openai_compatible, test_cost_tracker)
2026-06-11 01:30:38 -04:00
ed de5e106234 fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch 2026-06-11 01:26:53 -04:00
ed b75f60c3fe feat(ai): Add Qwen provider support to ai_client 2026-06-11 01:20:35 -04:00
ed bc2cce1612 feat(ai): Add Qwen adapter for DashScope provider 2026-06-11 01:20:19 -04:00
ed 6858dba3f5 remove unused files 2026-06-11 01:02:02 -04:00
ed 3940eb36ac conductor(plan): mark t2.1-t2.5 complete; advance to t2.6 (Green) 2026-06-11 00:53:58 -04:00
ed 060f471cb9 test(qwen): red phase for Qwen via DashScope (5 failing tests)
5 failing tests in tests/test_qwen_provider.py that establish the
core behaviors of the new Qwen (DashScope) provider:

1. test_send_qwen_routes_to_dashscope: _send_qwen calls _ensure_qwen_client
   and _dashscope_call, returns the text from the DashScope response
2. test_qwen_vision_vl_model_accepts_image: when file_items contains an
   image, the messages passed to _dashscope_call include the image ref
3. test_qwen_tool_format_translation: build_dashscope_tools converts
   OpenAI-shaped tool dicts to DashScope shape (name/description/parameters
   flat structure, not wrapped in function:)
4. test_qwen_error_classification: classify_dashscope_error maps
   dashscope.common.error.InvalidApiKey -> ProviderError(kind='auth',
   provider='qwen')
5. test_list_qwen_models_returns_hardcoded_registry: _list_qwen_models
   returns the 7 Qwen models registered in src/vendor_capabilities.py
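
The tool format translation exercised by test 3 might look like the following. This is a hedged sketch: flatten_tools is a hypothetical stand-in for build_dashscope_tools, and the flat target shape is taken only from the description above (name/description/parameters, no function: wrapper), not verified against the DashScope SDK.

```python
from typing import Any


def flatten_tools(openai_tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Lift each tool's name/description/parameters out of the 'function' wrapper."""
    flat = []
    for tool in openai_tools:
        fn = tool.get("function", tool)  # tolerate input that is already flat
        flat.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            "parameters": fn.get("parameters", {}),
        })
    return flat


openai_tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file",
        "parameters": {"type": "object", "properties": {}},
    },
}]
print(flatten_tools(openai_tools))
```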

The autouse _reset_qwen_state fixture uses hasattr() so it is a no-op
when _qwen_client / _qwen_history do not exist (yet); this keeps the
fixture working in the Red phase.

All 5 tests fail:
- Tests 1, 2: AttributeError: src.ai_client has no _ensure_qwen_client /
  _send_qwen / _dashscope_call
- Tests 3, 4: ModuleNotFoundError: No module named src.qwen_adapter
- Test 5: ImportError: cannot import name _list_qwen_models

Test signature adapted to match the real _send_minimax signature at
src/ai_client.py:2143-2148 (10 params, no enable_tools / rag_engine)
rather than the plan's 12-param signature.

Next: Green phase - implement src/qwen_adapter.py + src/ai_client.py
state + _ensure_qwen_client + _send_qwen + _list_qwen_models.
2026-06-11 00:53:10 -04:00
ed d5373e8f94 conductor(plan): mark t1.12 + phase_1 complete; advance to phase 2 2026-06-11 00:48:14 -04:00
ed 03da130780 conductor(checkpoint): Phase 1 complete - capability matrix framework + shared helper
Phase 1 of qwen_llama_grok_integration_20260606 ships two new modules and
one new dependency, all under TDD discipline (12 tasks, 4 atomic commits,
3+6 failing-then-passing tests).

Modules shipped:
- src/vendor_capabilities.py (55 lines): VendorCapabilities frozen dataclass
  with 12 fields, module-level _REGISTRY dict keyed by (vendor, model),
  register() / get_capabilities() (with vendor '*' wildcard fallback) /
  list_models_for_vendor() functions, 22 initial registry entries
  (1 minimax, 4 grok, 9 llama, 8 qwen; plan's typo of minimax/grok-2-latest
  omitted).
- src/openai_compatible.py (144 lines): NormalizedResponse frozen dataclass,
  OpenAICompatibleRequest dataclass, send_openai_compatible() dispatch,
  _send_blocking + _send_streaming helpers, _classify_openai_compatible_error
  error classifier (RateLimitError->rate_limit, AuthenticationError->auth,
  etc.). Fixed plan's MagicMock_noop forward-reference code smell.

Tests shipped (all passing):
- tests/test_vendor_capabilities.py (40 lines, 3 tests)
- tests/test_openai_compatible.py (88 lines, 6 tests)
- Total: 9 new tests, 0 regressions

Dependency added:
- pyproject.toml: dashscope>=1.14.0,<2.0.0 (installed: 1.25.21)

Verification:
- 24/24 tests pass in batch (test_minimax_provider, test_ai_client_no_top_level_sdk_imports,
  test_vendor_capabilities, test_openai_compatible)
- 4 audit scripts pass with no new violations:
  - scripts/audit_main_thread_imports.py: OK
  - scripts/audit_weak_types.py: OK
  - scripts/check_test_toml_paths.py: OK
  - scripts/audit_no_models_config_io.py: OK
- src/ai_client.py: NOT modified (Phase 4 will refactor _send_minimax)
- src/openai_compatible.py and src/vendor_capabilities.py are importable
  with no side effects beyond registry population
- No threading.Thread calls introduced (per project invariant)
- Module-level imports in new files are stdlib + openai (already-used SDK)
  + a function-level import of ProviderError from src.ai_client inside
  the error classifier (avoids circular import risk)
2026-06-11 00:46:41 -04:00
ed 67782198b6 conductor(plan): mark t1.11 (dashscope dep) complete; advance to t1.12 2026-06-11 00:46:18 -04:00
ed f4186f1061 chore(deps): add dashscope>=1.14.0,<2.0.0 for Qwen support 2026-06-11 00:44:08 -04:00
ed f07e616c38 conductor(plan): mark t1.5-t1.10 complete; advance to t1.11 2026-06-11 00:41:11 -04:00
ed d7d7d5cef9 feat(openai_compatible): implement shared send helper with streaming/tool/vision/error
Green phase: src/openai_compatible.py now exists and all 6 Red-phase
tests in tests/test_openai_compatible.py pass.

Implementation (144 lines, 1-space indent, no comments):

Data structures:
- NormalizedResponse: frozen dataclass with text, tool_calls,
  usage_input_tokens, usage_output_tokens, usage_cache_read_tokens,
  usage_cache_creation_tokens, raw_response
- OpenAICompatibleRequest: regular dataclass with messages, model,
  temperature=0.0, top_p=1.0, max_tokens=8192, tools=None,
  tool_choice='auto', stream=False, stream_callback=None

Algorithms:
- send_openai_compatible(client, request, *, capabilities) -> NormalizedResponse
  Dispatches to _send_blocking or _send_streaming based on request.stream.
  Catches openai.OpenAIError and re-raises as classified ProviderError.
- _send_blocking: extracts message text + tool_calls, converts tool_calls
  to dicts via _to_dict_tool_call, reads usage.prompt_tokens /
  usage.completion_tokens (with int() coercion for MagicMock test compat).
- _send_streaming: iterates chunks, accumulates text parts, aggregates
  tool_calls by index, fires stream_callback per text delta, reads
  chunk.usage for final token counts.
- _classify_openai_compatible_error: maps RateLimitError -> 'rate_limit',
  AuthenticationError/PermissionDeniedError -> 'auth', APIConnectionError
  -> 'network', APIStatusError with 402/429/401-403/500-504 -> 'balance'/
  'rate_limit'/'auth'/'network', BadRequestError -> 'quota', fallback
  'unknown'. All use provider='openai_compatible'.
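
The status-code branch of the classifier can be sketched without the SDK. Note that 402 falls inside the 401-403 range, so the order of checks matters; this sketch assumes the exact codes (402, 429) are tested before the ranges. classify_status is a hypothetical name, and the real _classify_openai_compatible_error also inspects exception types.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a ProviderError kind string."""
    if status_code == 402:
        return "balance"      # checked before the 401-403 range below
    if status_code == 429:
        return "rate_limit"
    if 401 <= status_code <= 403:
        return "auth"
    if 500 <= status_code <= 504:
        return "network"
    return "unknown"


for code in (401, 402, 429, 503, 418):
    print(code, classify_status(code))
```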

Fixed plan's code smell: removed the 'MagicMock_noop' forward-reference
class (defined after first use) and replaced with the cleaner Pythonic
pattern 'int(getattr(usage, "prompt_tokens", 0) or 0)'. Real OpenAI SDK
always sets usage on responses; the defensive fallback was noise.

Function-level import of ProviderError inside _classify_openai_compatible_error
avoids any circular import risk.
2026-06-11 00:39:58 -04:00
ed b53fe39d79 test(openai_compatible): red phase for shared send helper (6 failing tests)
6 failing tests in tests/test_openai_compatible.py that establish the
core behaviors of the new send_openai_compatible() shared helper:

1. test_send_non_streaming_returns_normalized_response: blocking call
   returns text, empty tool_calls, and correct usage token counts
2. test_send_streaming_aggregates_chunks: streaming call aggregates
   deltas into final text and fires stream_callback per chunk
3. test_tool_call_detection_in_response: tool_calls from the response
   are converted to dicts with id/type/function/arguments fields
4. test_vision_multimodal_message: messages with multimodal content
   (text + image_url) are passed through unchanged to the client
5. test_error_classification_429_to_rate_limit: RateLimitError from
   openai SDK is caught and re-raised as ProviderError(kind='rate_limit')
6. test_normalized_response_is_frozen_dataclass: NormalizedResponse is
   a frozen dataclass (FrozenInstanceError on attribute assignment)
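
The streaming behavior in tests 2 and 3 (join text deltas, fire a callback per delta, merge tool-call fragments by index) can be sketched as below. This is an illustration, not the _send_streaming implementation: aggregate_stream, _chunk, and _frag are hypothetical names, and SimpleNamespace objects stand in for the SDK's streamed chunk objects (choices[0].delta.content / delta.tool_calls).

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def aggregate_stream(chunks: Iterable, on_text: Optional[Callable[[str], None]] = None):
    """Join text deltas and merge tool-call fragments that share an index."""
    text_parts = []
    calls = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            text_parts.append(delta.content)
            if on_text:
                on_text(delta.content)
        for frag in getattr(delta, "tool_calls", None) or []:
            call = calls.setdefault(frag.index, {"id": None, "name": "", "arguments": ""})
            call["id"] = frag.id or call["id"]
            call["name"] += frag.function.name or ""
            call["arguments"] += frag.function.arguments or ""
    return "".join(text_parts), [calls[i] for i in sorted(calls)]


def _chunk(content=None, tool_calls=None):
    """Build a fake streamed chunk with a single choice."""
    delta = SimpleNamespace(content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def _frag(index, id=None, name=None, arguments=None):
    """Build a fake tool-call fragment."""
    fn = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, id=id, function=fn)


seen = []
text, tool_calls = aggregate_stream(
    [
        _chunk("Hel"),
        _chunk("lo"),
        _chunk(tool_calls=[_frag(0, id="call_1", name="read_file", arguments='{"path":')]),
        _chunk(tool_calls=[_frag(0, arguments=' "a.py"}')]),
    ],
    on_text=seen.append,
)
print(text, tool_calls)
```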

All 6 tests fail with ModuleNotFoundError: No module named
'src.openai_compatible' (confirmed via pytest). The implementation file
will be created in the next commit (Green phase).

ProviderError confirmed importable from src.ai_client (no stub needed).
2026-06-11 00:35:13 -04:00
ed 6f11e7da14 conductor(plan): mark t1.1-t1.4 complete; advance to phase 1 in_progress 2026-06-11 00:31:57 -04:00
ed 6be04bc4f0 feat(vendor_capabilities): implement registry with initial 22-entry population
Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase
tests in tests/test_vendor_capabilities.py pass.

Implementation:
- VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision,
  tool_calling, caching, streaming, model_discovery, context_window,
  cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes)
- Module-level _REGISTRY dict keyed by (vendor, model)
- register() inserts/overwrites entries
- get_capabilities() returns specific entry if present, else vendor '*'
  default, else raises KeyError with 'No capabilities registered' message
- list_models_for_vendor() returns sorted model names for a vendor
  (excludes '*' wildcard)
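
A trimmed sketch of the lookup rules listed above (specific entry, then vendor '*' default, then KeyError). Caps is a hypothetical 4-field stand-in for the 12-field VendorCapabilities dataclass, and the registered values are illustrative rather than the shipped registry contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    """Trimmed stand-in for VendorCapabilities (the real one has 12 fields)."""
    vendor: str
    model: str
    vision: bool = False
    cost_tracking: bool = True


_REGISTRY: dict[tuple[str, str], Caps] = {}


def register(caps: Caps) -> None:
    """Insert or overwrite the entry for (vendor, model)."""
    _REGISTRY[(caps.vendor, caps.model)] = caps


def get_capabilities(vendor: str, model: str) -> Caps:
    """Return the specific entry, else the vendor's '*' default, else raise."""
    for key in ((vendor, model), (vendor, "*")):
        if key in _REGISTRY:
            return _REGISTRY[key]
    raise KeyError(f"No capabilities registered for vendor {vendor!r}")


def list_models_for_vendor(vendor: str) -> list[str]:
    """Return sorted model names for a vendor, excluding the '*' wildcard."""
    return sorted(m for (v, m) in _REGISTRY if v == vendor and m != "*")


# Illustrative entries only.
register(Caps("llama", "*"))
register(Caps("llama", "llama-3.2-11b-vision-preview", vision=True))
print(get_capabilities("llama", "llama-3.3-future-unregistered"))
print(list_models_for_vendor("llama"))
```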

Initial population (22 entries at module load):
- 1 minimax wildcard (cost: 0.20/0.20 per Mtok)
- 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True)
- 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True)
- 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True;
  qwen-audio has notes='Text-only in v1; audio input deferred')

The plan's Task 1.3 also included one impossible entry
(vendor='minimax', model='grok-2-latest'). It was omitted; the 22 entries
counted above are what shipped.

Test fix: test_fallback_to_vendor_default previously used model name
'llama-3.3-70b-specdec' which IS in the registry, so the specific entry
was returned (with default cost_tracking=True), not the wildcard. Fixed
by changing to 'llama-3.3-future-unregistered' (not in registry, so
fallback fires correctly).
2026-06-11 00:30:52 -04:00
ed 6fb6f8653c test(vendor_capabilities): red phase for registry lookup, fallback, unknown vendor
3 failing tests in tests/test_vendor_capabilities.py that establish the
core behaviors of the new VendorCapabilities matrix:

1. test_registry_lookup_known_model: registering and looking up a specific
   (vendor, model) entry returns the registered entry
2. test_fallback_to_vendor_default: looking up an unregistered model returns
   the vendor's '*' default entry
3. test_unknown_vendor_raises: looking up a vendor with no entries raises
   KeyError with a 'No capabilities registered' message

All 3 tests fail with ModuleNotFoundError: No module named
'src.vendor_capabilities' (confirmed via pytest). The implementation file
will be created in the next commit (Green phase).

The autouse _clean_registry fixture snapshots src.vendor_capabilities._REGISTRY
before each test and restores it after, providing test isolation for the
module-level state.
2026-06-11 00:19:00 -04:00
16 changed files with 929 additions and 464 deletions
-158
@@ -1,158 +0,0 @@
# TASKS.md
<!-- Quick-read pointer to active and planned conductor tracks -->
<!-- Source of truth for task state is conductor/tracks/*/plan.md -->
## Active Tracks
*(none — all planned tracks queued below)*
*See tracks.md for active track status*
## Completed This Session
*(See archive: strict_execution_queue_completed_20260306)*
---
#### 0. conductor_path_configurable_20260306
- **Status:** Planned
- **Priority:** CRITICAL
- **Goal:** Eliminate hardcoded conductor paths. Make path configurable via config.toml or CONDUCTOR_DIR env var. Allow running app to use separate directory from development tracks.
## Phase 3: Future Horizons (Tracks 1-20)
*Initialized: 2026-03-06*
### Architecture & Backend
#### 1. true_parallel_worker_execution_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Implement true concurrency for the DAG engine. Once threading.local() is in place, the ExecutionEngine should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
#### 2. deep_ast_context_pruning_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Before dispatching a Tier 3 worker, use tree_sitter to automatically parse the target file AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker prompt. Guarantees the AI only sees what it needs to edit, drastically reducing token burn.
#### 3. visual_dag_ticket_editing_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking Execute Pipeline.
#### 4. tier4_auto_patching_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a .patch file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks Apply Patch to instantly resume the pipeline.
#### 5. native_orchestrator_20260306
- **Status:** Planned
- **Priority:** Low
- **Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write plan.md, manage the metadata.json, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (mma_exec.py).
---
### GUI Overhauls & Visualizations
#### 6. cost_token_analytics_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Real-time cost tracking panel displaying cost per model, session totals, and breakdown by tier. Uses existing cost_tracker.py which is implemented but has no GUI.
#### 7. performance_dashboard_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Expand performance metrics panel with CPU/RAM usage, frame time, input lag with historical graphs. Uses existing performance_monitor.py which has basic metrics but no detailed visualization.
#### 8. mma_multiworker_viz_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Split-view GUI for parallel worker streams per tier. Visualize multiple concurrent workers with individual status, output tabs, and resource usage. Enable kill/restart per worker.
#### 9. cache_analytics_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Gemini cache hit/miss visualization, memory usage, TTL status display. Uses existing ai_client.get_gemini_cache_stats() which is not displayed in GUI.
#### 10. tool_usage_analytics_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Analytics panel showing most-used tools, average execution time, and failure rates. Uses existing tool_log_callback data.
#### 11. session_insights_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Token usage over time, cost projections, session summary with efficiency scores. Visualize session_logger data.
#### 12. track_progress_viz_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Progress bars and percentage completion for active tracks and tickets. Better visualization of DAG execution state.
#### 13. manual_skeleton_injection_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add UI controls to manually flag files for skeleton injection in discussions. Allow agent to request full file reads or specific def/class definitions on-demand.
#### 14. on_demand_def_lookup_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add ability for agent to request specific class/function definitions during discussion. User can @mention a symbol and get its full definition inline.
---
### Manual UX Controls
#### 15. ticket_queue_mgmt_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Allow user to manually reorder, prioritize, or requeue tickets in the DAG. Add drag-drop reordering, priority tags, and bulk selection.
#### 16. kill_abort_workers_20260306
- **Status:** Planned
- **Priority:** High
- **Goal:** Add ability to kill/abort a running Tier 3 worker mid-execution. Currently workers run to completion; add cancel button.
#### 17. manual_block_control_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Allow user to manually block or unblock tickets with custom reasons. Currently blocked tickets rely on dependency resolution; add manual override.
#### 18. pipeline_pause_resume_20260306
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Add global pause/resume for the entire DAG execution pipeline. Allow user to freeze all worker activity and resume later.
#### 19. per_ticket_model_20260306
- **Status:** Planned
- **Priority:** Low
- **Goal:** Allow user to manually select which model to use for a specific ticket, overriding the default tier model.
#### 20. manual_ux_validation_20260302
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures.
---
### C/C++ Language Support
#### 25. ts_cpp_tree_sitter_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Add tree-sitter C and C++ grammars. Extend ASTParser to support C/C++ skeleton and outline extraction. Add MCP tools ts_c_get_skeleton, ts_cpp_get_skeleton, ts_c_get_code_outline, ts_cpp_get_code_outline.
#### 26. gencpp_python_bindings_20260308
- **Status:** Planned
- **Priority:** Medium
- **Goal:** Bootstrap standalone Python project with CFFI bindings for gencpp C library. Provides foundation for richer C++ AST parsing in future (beyond tree-sitter syntax).
---
### Path Configuration
#### 27. project_conductor_dir_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Make conductor directory per-project. Each project TOML can specify custom conductor dir for isolated track/state management. Extends existing global path config.
#### 28. gui_path_config_20260308
- **Status:** Planned
- **Priority:** High
- **Goal:** Add path configuration UI to Context Hub. Allow users to view and edit configurable paths (conductor, logs, scripts) directly from the GUI.
@@ -59,6 +59,40 @@ This means:
- **Anthropic/Gemini/DeepSeek** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped. - **Anthropic/Gemini/DeepSeek** stay per-vendor code paths; the data-oriented refactor doesn't apply to them because their unique APIs are not OpenAI-compatible-shaped.
- **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared. - **"Base paths are unique"** (the user's wording) means: `_send_qwen()`, `_send_llama()`, `_send_grok()`, `_send_minimax()` are the unique entry points; everything they call into is shared.
### 3.1.1 Architectural principle: "Use the best API per vendor" (added 2026-06-11, revised after Grok consultation)
Per the user's correction, the track's prior assumption ("all OpenAI-compatible") was incomplete. The right principle is: **use each vendor's native SDK or REST API when one exists, falling back to OpenAI-compatible only when no native option exists.**
The OpenAI-compatible shim (the `send_openai_compatible` helper) is the highest-leverage part of the spec: every vendor that uses it gets the same request/response/tool-calling/error/streaming logic with zero duplication. The question is **which vendors should use it** vs. which should have a native adapter.
**Confirmed best API per vendor (Grok-consulted 2026-06-11):**
| Vendor | API / Approach | Decision |
|---|---|---|
| **Qwen** | Alibaba DashScope native SDK (not OpenAI-compatible) | **NATIVE** — OpenAI-compatible mode drops Qwen-Audio, Qwen-Long custom chunking, Qwen-VL-Max enhanced vision. Phase 2 ships this. |
| **xAI (Grok)** | xAI official OpenAI-compatible (`https://api.x.ai/v1`) | **OPENAI-COMPATIBLE** — Per Grok's own confirmation, the OpenAI-compatible endpoint is "fully compatible and clean" with "no meaningful unique native surface lost." Phase 3 ships this. |
| **MiniMax** | OpenAI-compatible (`https://api.minimax.io/v1`) | **OPENAI-COMPATIBLE** — Already fully compatible. Phase 4 refactor is a pure win. |
| **DeepSeek** | OpenAI-compatible (`https://api.deepseek.com`) | **OPENAI-COMPATIBLE** — Drop-in compatible by design; offers an `/anthropic`-compatible path too. Follow-up track. |
| **Ollama** (Llama local backend) | Ollama's `/v1/chat/completions` (OpenAI-compatible) is the v1 choice; native `/api/chat` is a possible v2 | **OPENAI-COMPATIBLE in v1** — Ollama's compat endpoint supports streaming, tools, vision, JSON mode. Native `/api/chat` has extras (`think` param, `images: list[str]`, structured outputs); deferred to follow-up. |
| **Meta Llama API** (Llama cloud-native) | Meta's native REST API | **NATIVE (NEW BACKEND, FOLLOW-UP)** — Add as a 4th Llama backend. Deferred pending verification of Meta's API spec. |
| **Gemini** | Google `genai` SDK / Gemini native API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — OpenAI-comp loses explicit context caching (big cost win), Grounding with Google Search, native video/multimodal. The deferred follow-up track. |
| **Anthropic** | Anthropic official SDK / Messages API (NOT OpenAI-compatible) | **NATIVE (FOLLOW-UP)** — Native gives prompt caching (`cache_control` ephemeral, 50-90% savings), PDF processing, citations, extended thinking, Computer Use. OpenAI-comp layer exists but loses too much. The deferred follow-up track. |
**Implications for the capability matrix:** as native APIs add features, the matrix grows. The current v1 matrix has 7 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking). Future expansion (per the deferred list in §3.3, refined by Grok's consultation) will add:
- `audio` (Qwen-Audio, others)
- `video` (Gemini native, others)
- `grounding` / `search` (Gemini Grounding with Google Search, Grok's `x_search` and `web_search`)
- `computer_use` (Anthropic, beta/agentic)
- `local` (boolean — true for Ollama; useful for UX "free local" badge)
- `reasoning` / `extended_thinking` (Grok `reasoning_effort`, Anthropic extended thinking, Ollama `think`)
- `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support` (per-vendor server-side tools)
- `structured_output` (response_format / format support)
The matrix IS the aggregate tracker; the GUI filters UI elements based on what's in the matrix. **The matrix's job is to be the canonical source of truth for "what can this vendor/model do"; the GUI never hard-codes per-vendor branches.** Any new capability a vendor adds (server-side tools, native cost reporting, prompt caching) goes into the matrix; the UI filters based on it.
**This track's Phase 3 ships the OpenAI-compatible Grok + Llama (3 backends) as the canonical implementation per Grok's confirmation; the native-API work for Llama (Ollama native, Meta Llama API) is deferred to follow-up tracks documented in §13.1.**
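The lookup behaviour implied by the matrix (exact model entry, then a vendor-wide default, then an error for an unknown vendor) can be sketched as follows. This is a minimal illustration, not the contents of `src/vendor_capabilities.py`: the names `VendorCapabilities` and `get_capabilities` come from the plan, but the field defaults, the `WILDCARD` key, the `REGISTRY` layout, the `KeyError` choice, and the `visible_controls` helper are all assumptions made for the example. The two Grok entries reuse the values from the §4.3 table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    # The seven v1 fields named above; the default values are illustrative.
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = True
    model_discovery: bool = False
    context_window: int = 0
    cost_tracking: bool = True


WILDCARD = "*"  # hypothetical key for a vendor-wide default entry

# Hypothetical registry layout: (vendor, model) -> capabilities.
REGISTRY: dict[tuple[str, str], VendorCapabilities] = {
    ("grok", WILDCARD): VendorCapabilities(tool_calling=True, context_window=131_072),
    ("grok", "grok-2-vision"): VendorCapabilities(
        vision=True, tool_calling=True, context_window=32_768
    ),
}


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Return the model's entry, else the vendor default; raise for unknown vendors."""
    exact = REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    default = REGISTRY.get((vendor, WILDCARD))
    if default is None:
        raise KeyError(f"unknown vendor: {vendor!r}")
    return default


def visible_controls(vendor: str, model: str) -> list[str]:
    """Derive UI controls from the matrix instead of branching on the vendor name."""
    caps = get_capabilities(vendor, model)
    controls = ["send"]
    if caps.vision:
        controls.append("attach_image")
    if caps.tool_calling:
        controls.append("tools")
    return controls
```

With this shape, adding a capability is a registry change plus one more check in the UI helper; no per-vendor branch is needed.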
### 3.2 Module Layout ### 3.2 Module Layout
``` ```
@@ -222,9 +256,11 @@ _llama_api_key: str = "ollama" # Ollama doesn't require aut
**Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry. **Model discovery:** Ollama exposes `GET /api/tags` (not `/v1/models`); OpenRouter exposes `GET /v1/models`. The Llama adapter probes both endpoints and unions the results. For custom URLs, falls back to the hardcoded registry.
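A rough sketch of the probe-and-union step described above, under stated assumptions: `discover_llama_models`, `FALLBACK_MODELS`, and the injected `fetch_json` callable are hypothetical names, and the fetcher is passed in so the example runs without a network. The response shapes assumed here are Ollama's `{"models": [{"name": ...}]}` for `/api/tags` and the OpenAI-style `{"data": [{"id": ...}]}` for `/v1/models`.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for the hardcoded registry fallback.
FALLBACK_MODELS = ["llama-3.1-8b-instruct"]

JsonFetcher = Callable[[str], Optional[dict[str, Any]]]


def parse_ollama_tags(payload: dict[str, Any]) -> list[str]:
    """Extract model names from an Ollama /api/tags response."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def parse_openai_models(payload: dict[str, Any]) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def discover_llama_models(fetch_json: JsonFetcher, base_url: str) -> list[str]:
    """Probe both endpoints, union the results, fall back if neither answers.

    fetch_json is expected to return None when a request fails.
    base_url is the OpenAI-compatible root, e.g. "http://localhost:11434/v1".
    """
    base = base_url.rstrip("/")
    root = base.removesuffix("/v1")  # Ollama's native API lives beside /v1
    found: set[str] = set()

    tags = fetch_json(f"{root}/api/tags")
    if tags:
        found.update(parse_ollama_tags(tags))

    models = fetch_json(f"{base}/models")
    if models:
        found.update(parse_openai_models(models))

    return sorted(found) if found else list(FALLBACK_MODELS)
```

Injecting the fetcher keeps the union logic testable in isolation; a real adapter would wrap an HTTP client and map any error to `None`.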
### 4.3 Grok via xAI (OpenAI-Compatible) ### 4.3 Grok via xAI (OpenAI-Compatible) — confirmed 2026-06-11
**SDK:** `openai` (already a dependency). **Per Grok's consultation (2026-06-11): the OpenAI-compatible endpoint at `https://api.x.ai/v1` is the canonical, fully-featured approach.** xAI's API is "fully compatible and clean" with "no meaningful unique native surface lost" by using the OpenAI-compatible shim. This section was previously labeled "Native REST API" based on a user impression that the native endpoint had unique features (prompt_cache_key, reasoning_effort, server-side tools, cost_in_usd_ticks) that the shim loses; Grok's actual recommendation is that the shim is fine.
**SDK:** `openai` (already a dependency). Set `base_url="https://api.x.ai/v1"` and pass the xAI API key as the Bearer token (handled automatically by the OpenAI SDK).
**State:** **State:**
```python ```python
@@ -239,15 +275,15 @@ _grok_history_lock: threading.Lock = threading.Lock()
**Models shipped in the capability registry (v1):** **Models shipped in the capability registry (v1):**
| Model | vision | tool_calling | caching | context_window | cost_input | cost_output | | Model | vision | tool_calling | context_window | cost_input | cost_output |
|---|---|---|---|---|---|---| |---|---|---|---|---|---|
| `grok-2` | false | true | false | 131,072 | $2.00 | $10.00 | | `grok-2` | false | true | 131,072 | $2.00 | $10.00 |
| `grok-2-vision` | true | true | false | 32,768 | $2.00 | $10.00 | | `grok-2-vision` | true | true | 32,768 | $2.00 | $10.00 |
| `grok-beta` | false | true | false | 131,072 | $5.00 | $15.00 | | `grok-beta` | false | true | 131,072 | $5.00 | $15.00 |
(Pricing from x.ai public pricing as of 2026-06-06; update if needed.) (Pricing from x.ai public pricing as of 2026-06-06; update if needed. `caching` stays `False` in v1 since Grok's OpenAI-compatible shim doesn't expose `prompt_cache_key`.)
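As a worked example of how the table's prices could feed a cost estimate, here is a small sketch. It assumes the dollar figures are per one million tokens (the table does not state the unit), and the `PRICING` list and `estimate_cost` function are hypothetical illustrations rather than the actual contents of `src/cost_tracker.py`.

```python
import re

# (model pattern, USD per 1M input tokens, USD per 1M output tokens).
# The per-1M unit is an assumption; the figures are copied from the table above.
PRICING = [
    (re.compile(r"^grok-2(-vision)?$"), 2.00, 10.00),
    (re.compile(r"^grok-beta$"), 5.00, 15.00),
]


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost, or 0.0 when no pricing entry matches."""
    for pattern, in_price, out_price in PRICING:
        if pattern.match(model):
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return 0.0
```

Under these assumptions, a `grok-2` call with 1,000,000 input tokens and 500,000 output tokens comes to 2.00 + 5.00 = 7.00 USD.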
**Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL. **Entry point:** `_send_grok()` in `src/ai_client.py`. Calls `send_openai_compatible()` with the xAI base URL (via the OpenAI SDK).
**Tool format:** Native OpenAI. No translation needed. **Tool format:** Native OpenAI. No translation needed.
@@ -466,9 +502,15 @@ Each phase has its own checkpoint commit and git note.
## 13. See Also ## 13. See Also
### 13.1 Follow-up Track (separate plan) ### 13.1 Follow-up Tracks (separate plans)
**"Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high. **A. "Anthropic / Gemini / DeepSeek Capability Matrix Migration"** — Migrates the three remaining providers onto the same capability matrix. Required pre-work: ensure the matrix's per-model lookup pattern handles the `caching: true` (Anthropic 4-breakpoint, Gemini explicit) and `pdf_input: true` (Anthropic, Gemini) capabilities. Each provider keeps its unique per-vendor code path (the 4-breakpoint system, the genai SDK); the matrix entries are populated so the UX can adapt. This is a separate track because the migration of each unique-API provider is non-trivial and the risk of regressing the existing working code is high.
**B. "Llama Native APIs (Ollama native + Meta Llama API)"** — Per §3.1.1's revised assessment (after Grok's consultation), xAI's OpenAI-compatible endpoint is the canonical full-featured approach — NO Grok native refactor is needed. The follow-up for Llama backends is:
- **Llama (Ollama backend)** → Ollama native `/api/chat`; adds `think` param (low/medium/high), `images: list[str]` in messages (cleaner base64 than OpenAI's `image_url` content type), `thinking` field in responses, `format` for structured outputs. The Phase 3 Red tests are written for the OpenAI-compatible shim; the native tests would mock `requests.post` to `/api/chat`.
- **Llama (Meta Llama API backend)** → New 4th Llama backend; uses Meta's native REST API. Currently deferred pending verification of Meta's API spec (the `llama.developer.meta.com/docs/overview` URL returned 400 on fetch this session; needs re-verification when the docs are available).
- **Capability matrix expansion** → Add fields for the new native features per Grok's consultation: `audio`, `video`, `grounding`/`search`, `computer_use`, `local`, `reasoning`/`extended_thinking`, `web_search`, `x_search`, `code_execution`, `file_search`, `mcp_support`, `structured_output`. Each addition is a registry change + a UI adaptation in Phase 5.
- **Test rewrites** → The Phase 3 Llama Red tests in `test_llama_provider.py` would be extended with 2 more tests: native Ollama (`/api/chat` with `think` param, `images: list[str]`) and Meta Llama API. The Grok Red tests do NOT need rewriting.
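To make the Ollama-native follow-up concrete, the request body for `/api/chat` might be assembled roughly as below. This is a sketch under assumptions: `build_ollama_chat_payload` is a hypothetical helper, the payload keys (`think`, `images`, `format`) are the ones named in the list above, and the exact values Ollama accepts for `think` should be checked against its documentation before relying on this.

```python
import base64
from typing import Any, Optional, Union


def build_ollama_chat_payload(
    model: str,
    prompt: str,
    images: Optional[list[bytes]] = None,
    think: Optional[Union[bool, str]] = None,
    response_format: Optional[Union[str, dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build a non-streaming request body for Ollama's native /api/chat."""
    message: dict[str, Any] = {"role": "user", "content": prompt}
    if images:
        # The native API takes bare base64 strings rather than image_url parts.
        message["images"] = [base64.b64encode(raw).decode("ascii") for raw in images]

    payload: dict[str, Any] = {"model": model, "messages": [message], "stream": False}
    if think is not None:
        payload["think"] = think  # assumed values: True/False or "low"/"medium"/"high"
    if response_format is not None:
        payload["format"] = response_format  # "json" or a JSON schema dict
    return payload
```

A native test along the lines described above would mock `requests.post` and assert on a payload of this shape.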
### 13.2 Project References ### 13.2 Project References
@@ -5,14 +5,15 @@
track_id = "qwen_llama_grok_integration_20260606" track_id = "qwen_llama_grok_integration_20260606"
name = "Qwen, Llama & Grok Vendor Integration + Capability Matrix" name = "Qwen, Llama & Grok Vendor Integration + Capability Matrix"
status = "active" status = "active"
current_phase = 0 current_phase = 3
last_updated = "2026-06-06" last_updated = "2026-06-11"
[phases] [phases]
# Phase 1: Capability matrix framework + shared helper (no user-facing changes) # Phase 1: Capability matrix framework + shared helper (no user-facing changes)
phase_1 = { status = "pending", checkpoint_sha = "", name = "Capability matrix framework + shared helper" } phase_1 = { status = "completed", checkpoint_sha = "03da130", name = "Capability matrix framework + shared helper" }
# Phase 2: Qwen via DashScope # Phase 2: Qwen via DashScope
phase_2 = { status = "pending", checkpoint_sha = "", name = "Qwen via DashScope" } phase_2 = { status = "completed", checkpoint_sha = "0f2541a", name = "Qwen via DashScope" }
# Phase 3: Grok + Llama via shared helper # Phase 3: Grok + Llama via shared helper
phase_3 = { status = "pending", checkpoint_sha = "", name = "Grok + Llama via shared helper" } phase_3 = { status = "pending", checkpoint_sha = "", name = "Grok + Llama via shared helper" }
# Phase 4: MiniMax refactor # Phase 4: MiniMax refactor
@@ -25,49 +26,49 @@ phase_6 = { status = "pending", checkpoint_sha = "", name = "Docs + archive" }
[tasks] [tasks]
# Phase 1: Capability matrix framework + shared helper # Phase 1: Capability matrix framework + shared helper
# (Tasks TBD by writing-plans; placeholder structure only) # (Tasks TBD by writing-plans; placeholder structure only)
t1_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_registry_lookup_known_model" } t1_1 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_registry_lookup_known_model" }
t1_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_fallback_to_vendor_default" } t1_2 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_fallback_to_vendor_default" }
t1_3 = { status = "pending", commit_sha = "", description = "Red: tests/test_vendor_capabilities.py::test_unknown_vendor_raises" } t1_3 = { status = "completed", commit_sha = "6fb6f86", description = "Red: tests/test_vendor_capabilities.py::test_unknown_vendor_raises" }
t1_4 = { status = "pending", commit_sha = "", description = "Green: implement src/vendor_capabilities.py with VendorCapabilities + get_capabilities + initial registry" } t1_4 = { status = "completed", commit_sha = "6be04bc", description = "Green: implement src/vendor_capabilities.py with VendorCapabilities + get_capabilities + initial registry" }
t1_5 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_send_non_streaming" } t1_5 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_send_non_streaming" }
t1_6 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_send_streaming_aggregates_chunks" } t1_6 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_send_streaming_aggregates_chunks" }
t1_7 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_tool_call_detection" } t1_7 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_tool_call_detection" }
t1_8 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_vision_multimodal_message" } t1_8 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_vision_multimodal_message" }
t1_9 = { status = "pending", commit_sha = "", description = "Red: tests/test_openai_compatible.py::test_error_classification_429_to_rate_limit" } t1_9 = { status = "completed", commit_sha = "b53fe39", description = "Red: tests/test_openai_compatible.py::test_error_classification_429_to_rate_limit" }
t1_10 = { status = "pending", commit_sha = "", description = "Green: implement src/openai_compatible.py with NormalizedResponse + OpenAICompatibleRequest + send_openai_compatible" } t1_10 = { status = "completed", commit_sha = "d7d7d5c", description = "Green: implement src/openai_compatible.py with NormalizedResponse + OpenAICompatibleRequest + send_openai_compatible" }
t1_11 = { status = "pending", commit_sha = "", description = "Add dashscope>=1.14.0,<2.0.0 to pyproject.toml dependencies" } t1_11 = { status = "in_progress", commit_sha = "", description = "Add dashscope>=1.14.0,<2.0.0 to pyproject.toml dependencies" }
t1_12 = { status = "pending", commit_sha = "", description = "Phase 1 checkpoint commit + git note" } t1_12 = { status = "completed", commit_sha = "03da130", description = "Phase 1 checkpoint commit + git note" }
# Phase 2: Qwen via DashScope # Phase 2: Qwen via DashScope
t2_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_send_qwen_routes_to_dashscope" } t2_1 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_send_qwen_routes_to_dashscope" }
t2_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_tool_format_translation" } t2_2 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_tool_format_translation" }
t2_3 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_vl_vision_image_base64" } t2_3 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_vl_vision_image_base64" }
t2_4 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_qwen_error_classification" } t2_4 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_qwen_error_classification" }
t2_5 = { status = "pending", commit_sha = "", description = "Red: tests/test_qwen_provider.py::test_list_qwen_models" } t2_5 = { status = "completed", commit_sha = "060f471", description = "Red: tests/test_qwen_provider.py::test_list_qwen_models" }
t2_6 = { status = "pending", commit_sha = "", description = "Green: implement _send_qwen, _ensure_qwen_client, _classify_qwen_error, _list_qwen_models in src/ai_client.py" } t2_6 = { status = "completed", commit_sha = "bc2cce1", description = "Green: implement _send_qwen, _ensure_qwen_client, _classify_qwen_error, _list_qwen_models in src/ai_client.py" }
t2_7 = { status = "pending", commit_sha = "", description = "Add [qwen] section to credentials_template.toml" } t2_7 = { status = "cancelled", commit_sha = "ab6b53f", description = "SKIPPED: no credentials_template.toml exists in project; user maintains single credentials.toml directly" }
t2_8 = { status = "pending", commit_sha = "", description = "Add qwen to PROVIDERS in src/gui_2.py and src/app_controller.py" } t2_8 = { status = "completed", commit_sha = "ab6b53f", description = "Add qwen to PROVIDERS (centralized in src/models.py; gui_2.py and app_controller.py import from there)" }
t2_9 = { status = "pending", commit_sha = "", description = "Add Qwen models to capability registry in src/vendor_capabilities.py" } t2_9 = { status = "completed", commit_sha = "6be04bc", description = "Add Qwen models to capability registry (DONE in Phase 1 initial population; 8 qwen entries: 1 wildcard + 7 specific)" }
t2_10 = { status = "pending", commit_sha = "", description = "Add Qwen pricing to src/cost_tracker.py" } t2_10 = { status = "completed", commit_sha = "ab6b53f", description = "Add Qwen pricing to src/cost_tracker.py" }
t2_11 = { status = "pending", commit_sha = "", description = "Phase 2 checkpoint commit + git note" } t2_11 = { status = "completed", commit_sha = "0f2541a", description = "Phase 2 checkpoint commit + git note" }
# Phase 3: Grok + Llama via shared helper # Phase 3: Grok + Llama via shared helper
t3_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_grok_provider.py::test_send_grok_uses_xai_endpoint" } t3_1 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_grok_provider.py::test_send_grok_uses_xai_endpoint" }
t3_2 = { status = "pending", commit_sha = "", description = "Red: tests/test_grok_provider.py::test_grok_2_vision_vision_support" } t3_2 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_grok_provider.py::test_grok_2_vision_vision_support" }
t3_3 = { status = "pending", commit_sha = "", description = "Green: implement _send_grok, _ensure_grok_client in src/ai_client.py" } t3_3 = { status = "completed", commit_sha = "29a96cc", description = "Green: implement _send_grok, _ensure_grok_client in src/ai_client.py" }
t3_4 = { status = "pending", commit_sha = "", description = "Add [grok] section to credentials_template.toml" } t3_4 = { status = "cancelled", commit_sha = "f9b5c93", description = "SKIPPED: no credentials_template.toml exists; user maintains single credentials.toml directly" }
t3_5 = { status = "pending", commit_sha = "", description = "Add grok to PROVIDERS in src/gui_2.py and src/app_controller.py" } t3_5 = { status = "completed", commit_sha = "f9b5c93", description = "Add grok to PROVIDERS (centralized in src/models.py)" }
t3_6 = { status = "pending", commit_sha = "", description = "Add Grok models to capability registry" } t3_6 = { status = "completed", commit_sha = "6be04bc", description = "Add Grok models to capability registry (DONE in Phase 1)" }
t3_7 = { status = "pending", commit_sha = "", description = "Add Grok pricing to src/cost_tracker.py" } t3_7 = { status = "completed", commit_sha = "f9b5c93", description = "Add Grok pricing to src/cost_tracker.py (3 entries)" }
t3_8 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_ollama_backend" } t3_8 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_ollama_backend" }
t3_9 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_openrouter_backend" } t3_9 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_openrouter_backend" }
t3_10 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_send_llama_custom_url" } t3_10 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_send_llama_custom_url" }
t3_11 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_model_discovery_unions_ollama_and_openrouter" } t3_11 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_model_discovery_unions_ollama_and_openrouter" }
t3_12 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_3_2_vision_vision_support" } t3_12 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_3_2_vision_vision_support" }
t3_13 = { status = "pending", commit_sha = "", description = "Red: tests/test_llama_provider.py::test_llama_local_backend_cost_tracking_false" } t3_13 = { status = "completed", commit_sha = "90f2be9", description = "Red: tests/test_llama_provider.py::test_llama_local_backend_cost_tracking_false" }
t3_14 = { status = "pending", commit_sha = "", description = "Green: implement _send_llama, _ensure_llama_client, _list_llama_models in src/ai_client.py" } t3_14 = { status = "completed", commit_sha = "29a96cc", description = "Green: implement _send_llama, _ensure_llama_client, _list_llama_models, _get_llama_cost_tracking" }
t3_15 = { status = "pending", commit_sha = "", description = "Add [llama] section to credentials_template.toml" } t3_15 = { status = "cancelled", commit_sha = "f9b5c93", description = "SKIPPED: no credentials_template.toml exists; user maintains single credentials.toml directly" }
t3_16 = { status = "pending", commit_sha = "", description = "Add llama to PROVIDERS in src/gui_2.py and src/app_controller.py" } t3_16 = { status = "completed", commit_sha = "f9b5c93", description = "Add llama to PROVIDERS (centralized in src/models.py)" }
t3_17 = { status = "pending", commit_sha = "", description = "Add Llama models to capability registry" } t3_17 = { status = "completed", commit_sha = "6be04bc", description = "Add Llama models to capability registry (DONE in Phase 1; 9 entries: 1 wildcard + 8 models)" }
t3_18 = { status = "pending", commit_sha = "", description = "Phase 3 checkpoint commit + git note" } t3_18 = { status = "completed", commit_sha = "21adb4a", description = "Phase 3 checkpoint commit + git note" }
# Phase 4: MiniMax refactor # Phase 4: MiniMax refactor
t4_1 = { status = "pending", commit_sha = "", description = "Baseline: run tests/test_minimax_provider.py; all pass (green)" } t4_1 = { status = "pending", commit_sha = "", description = "Baseline: run tests/test_minimax_provider.py; all pass (green)" }
t4_2 = { status = "pending", commit_sha = "", description = "Refactor _send_minimax to use send_openai_compatible helper" } t4_2 = { status = "pending", commit_sha = "", description = "Refactor _send_minimax to use send_openai_compatible helper" }
@@ -93,7 +94,7 @@ t6_5 = { status = "pending", commit_sha = "", description = "Final checkpoint co
# Filled as phases complete # Filled as phases complete
phase_1_capability_registry_complete = false phase_1_capability_registry_complete = false
phase_1_shared_helper_complete = false phase_1_shared_helper_complete = false
phase_2_qwen_dashscope_complete = false phase_2_qwen_dashscope_complete = true
phase_3_grok_complete = false phase_3_grok_complete = false
phase_3_llama_complete = false phase_3_llama_complete = false
phase_4_minimax_refactor_preserves_tests = false phase_4_minimax_refactor_preserves_tests = false
+1
@@ -20,6 +20,7 @@ dependencies = [
"uvicorn~=0.41.0", "uvicorn~=0.41.0",
"anthropic~=0.83.0", "anthropic~=0.83.0",
"dashscope>=1.14.0,<2.0.0",
"google-genai~=1.64.0", "google-genai~=1.64.0",
"openai~=2.26.0", "openai~=2.26.0",
-30
@@ -1,30 +0,0 @@
$total = 0
$passed = 0
$failed = 0
$testFiles = Get-ChildItem tests/test_*.py | Select-Object -ExpandProperty Name
Write-Host "Running full test suite..."
Write-Host "==========================="
foreach ($file in $testFiles) {
    Write-Host "Testing: $file"
    $result = uv run pytest "tests/$file" -q --tb=no 2>&1 | Select-String -Pattern "passed|failed"
    if ($result -match "(\d+) passed") {
        $p = [int]$matches[1]
        $passed += $p
        $total += $p
    }
    if ($result -match "(\d+) failed") {
        $f = [int]$matches[1]
        $failed += $f
        $total += $f
    }
}
Write-Host ""
Write-Host "==========================="
Write-Host "TOTAL: $total tests"
Write-Host "PASSED: $passed"
Write-Host "FAILED: $failed"
+282 -206
@@ -131,6 +131,21 @@ _minimax_client: Any = None
_minimax_history: list[dict[str, Any]] = [] _minimax_history: list[dict[str, Any]] = []
_minimax_history_lock: threading.Lock = threading.Lock() _minimax_history_lock: threading.Lock = threading.Lock()
_qwen_client: Any = None
_qwen_history: list[dict[str, Any]] = []
_qwen_history_lock: threading.Lock = threading.Lock()
_qwen_region: str = "china"
_grok_client: Any = None
_grok_history: list[dict[str, Any]] = []
_grok_history_lock: threading.Lock = threading.Lock()
_llama_client: Any = None
_llama_history: list[dict[str, Any]] = []
_llama_history_lock: threading.Lock = threading.Lock()
_llama_base_url: str = "http://localhost:11434/v1"
_llama_api_key: str = "ollama"
_send_lock: threading.Lock = threading.Lock() _send_lock: threading.Lock = threading.Lock()
_BIAS_ENGINE = ToolBiasEngine() _BIAS_ENGINE = ToolBiasEngine()
@@ -486,6 +501,7 @@ def reset_session() -> None:
global _anthropic_client, _anthropic_history global _anthropic_client, _anthropic_history
global _deepseek_client, _deepseek_history global _deepseek_client, _deepseek_history
global _minimax_client, _minimax_history global _minimax_client, _minimax_history
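global _qwen_client, _qwen_history
# The Grok and Llama state reassigned below must also be declared global; otherwise the assignments only create locals.
global _grok_client, _grok_history
global _llama_client, _llama_history, _llama_base_url, _llama_api_key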
global _CACHED_ANTHROPIC_TOOLS, _CACHED_DEEPSEEK_TOOLS global _CACHED_ANTHROPIC_TOOLS, _CACHED_DEEPSEEK_TOOLS
global _gemini_cli_adapter global _gemini_cli_adapter
if _gemini_client and _gemini_cache: if _gemini_client and _gemini_cache:
@@ -513,6 +529,17 @@ def reset_session() -> None:
_minimax_client = None _minimax_client = None
with _minimax_history_lock: with _minimax_history_lock:
_minimax_history = [] _minimax_history = []
_qwen_client = None
with _qwen_history_lock:
_qwen_history = []
_grok_client = None
with _grok_history_lock:
_grok_history = []
_llama_client = None
with _llama_history_lock:
_llama_history = []
_llama_base_url = "http://localhost:11434/v1"
_llama_api_key = "ollama"
_CACHED_ANTHROPIC_TOOLS = None _CACHED_ANTHROPIC_TOOLS = None
_CACHED_DEEPSEEK_TOOLS = None _CACHED_DEEPSEEK_TOOLS = None
file_cache.reset_client() file_cache.reset_client()
@@ -527,6 +554,9 @@ def list_models(provider: str) -> list[str]:
elif provider == "deepseek": return _list_deepseek_models(creds["deepseek"]["api_key"]) elif provider == "deepseek": return _list_deepseek_models(creds["deepseek"]["api_key"])
elif provider == "gemini_cli": return _list_gemini_cli_models() elif provider == "gemini_cli": return _list_gemini_cli_models()
elif provider == "minimax": return _list_minimax_models(creds["minimax"]["api_key"]) elif provider == "minimax": return _list_minimax_models(creds["minimax"]["api_key"])
elif provider == "qwen": return _list_qwen_models()
elif provider == "grok": return _list_grok_models()
elif provider == "llama": return _list_llama_models()
return [] return []
#endregion: Comms Log #endregion: Comms Log
@@ -2140,6 +2170,58 @@ def _ensure_minimax_client() -> None:
raise ValueError("MiniMax API key not found in credentials.toml") raise ValueError("MiniMax API key not found in credentials.toml")
_minimax_client = OpenAI(api_key=api_key, base_url="https://api.minimax.chat/v1") _minimax_client = OpenAI(api_key=api_key, base_url="https://api.minimax.chat/v1")
def _ensure_grok_client() -> Any:
global _grok_client
if _grok_client is None:
openai = _require_warmed("openai")
creds = _load_credentials()
api_key = creds.get("grok", {}).get("api_key")
if not api_key:
raise ValueError("Grok API key not found in credentials.toml")
_grok_client = openai.OpenAI(api_key=api_key, base_url="https://api.x.ai/v1")
return _grok_client
def _send_grok(md_content: str, user_message: str, base_dir: str,
file_items: list[dict[str, Any]] | None = None,
discussion_history: str = "",
stream: bool = False,
pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
qa_callback: Optional[Callable[[str], str]] = None,
stream_callback: Optional[Callable[[str], None]] = None,
patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
client = _ensure_grok_client()
from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
from src.vendor_capabilities import get_capabilities
with _grok_history_lock:
user_content = user_message
if file_items:
for fi in file_items:
if fi.get("is_image") and fi.get("base64_data"):
user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
if discussion_history and not _grok_history:
_grok_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"})
else:
_grok_history.append({"role": "user", "content": user_content})
messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
messages.extend(_grok_history)
request = OpenAICompatibleRequest(
messages=messages,
model=_model,
temperature=_temperature,
top_p=_top_p,
max_tokens=_max_tokens,
stream=stream,
stream_callback=stream_callback,
)
caps = get_capabilities("grok", _model)
response = send_openai_compatible(client, request, capabilities=caps)
_grok_history.append({"role": "assistant", "content": response.text})
return response.text
def _list_grok_models() -> list[str]:
from src.vendor_capabilities import list_models_for_vendor
return list_models_for_vendor("grok")
def _send_minimax(md_content: str, user_message: str, base_dir: str, def _send_minimax(md_content: str, user_message: str, base_dir: str,
file_items: list[dict[str, Any]] | None = None, file_items: list[dict[str, Any]] | None = None,
discussion_history: str = "", discussion_history: str = "",
@@ -2148,227 +2230,221 @@ def _send_minimax(md_content: str, user_message: str, base_dir: str,
qa_callback: Optional[Callable[[str], str]] = None, qa_callback: Optional[Callable[[str], str]] = None,
stream_callback: Optional[Callable[[str], None]] = None, stream_callback: Optional[Callable[[str], None]] = None,
patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str: patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
""" _ensure_minimax_client()
[C: src/ai_server.py:_handle_send] from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
""" from src.vendor_capabilities import get_capabilities
openai = _require_warmed("openai")
requests = _require_warmed("requests")
try:
mcp_client.configure(file_items or [], [base_dir])
creds = _load_credentials()
api_key = creds.get("minimax", {}).get("api_key")
if not api_key:
raise ValueError("MiniMax API key not found in credentials.toml")
client = OpenAI(api_key=api_key, base_url="https://api.minimax.io/v1")
with _minimax_history_lock: with _minimax_history_lock:
_repair_minimax_history(_minimax_history) _repair_minimax_history(_minimax_history)
if discussion_history and not _minimax_history: if discussion_history and not _minimax_history:
user_content = f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}" _minimax_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"})
else: else:
user_content = user_message _minimax_history.append({"role": "user", "content": user_message})
_minimax_history.append({"role": "user", "content": user_content}) messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
messages.extend(_minimax_history)
all_text_parts: list[str] = [] request = OpenAICompatibleRequest(
_cumulative_tool_bytes = 0 messages=messages,
model=_model,
for round_idx in range(MAX_TOOL_ROUNDS + 2): temperature=_temperature,
current_api_messages: list[dict[str, Any]] = [] top_p=_top_p,
max_tokens=min(_max_tokens, 8192),
sys_msg = {"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"} stream=stream,
current_api_messages.append(sys_msg) stream_callback=stream_callback,
)
with _minimax_history_lock: caps = get_capabilities("minimax", _model)
dropped = _trim_minimax_history([sys_msg], _minimax_history) response = send_openai_compatible(_minimax_client, request, capabilities=caps)
if dropped > 0:
_append_comms("OUT", "request", {"message": f"[MINIMAX HISTORY TRIMMED: dropped {dropped} old messages]"})
for i, msg in enumerate(_minimax_history):
role = msg.get("role")
api_msg = {"role": role}
content = msg.get("content")
if role == "assistant":
if msg.get("tool_calls"):
api_msg["content"] = content or None
api_msg["tool_calls"] = msg["tool_calls"]
else:
api_msg["content"] = content or ""
elif role == "tool":
api_msg["content"] = content or ""
api_msg["tool_call_id"] = msg.get("tool_call_id")
else:
api_msg["content"] = content or ""
current_api_messages.append(api_msg)
request_payload: dict[str, Any] = {
"model": _model,
"messages": current_api_messages,
"stream": stream,
"extra_body": {"reasoning_split": True},
}
if stream:
request_payload["stream_options"] = {"include_usage": True}
request_payload["temperature"] = 1.0
request_payload["top_p"] = _top_p
request_payload["max_tokens"] = min(_max_tokens, 8192)
tools = _get_deepseek_tools()
if tools:
request_payload["tools"] = tools
events.emit("request_start", payload={"provider": "minimax", "model": _model, "round": round_idx, "streaming": stream})
try:
response = client.chat.completions.create(**request_payload, timeout=120)
except Exception as e:
raise _classify_minimax_error(e) from e
assistant_text = ""
tool_calls_raw = []
reasoning_content = "" reasoning_content = ""
finish_reason = "stop" if response.raw_response and hasattr(response.raw_response, "choices"):
usage = {} choice = response.raw_response.choices[0]
if hasattr(choice.message, "reasoning_details") and choice.message.reasoning_details:
if stream: reasoning_content = choice.message.reasoning_details[0].get("text", "") if choice.message.reasoning_details else ""
aggregated_content = ""
aggregated_tool_calls: list[dict[str, Any]] = []
aggregated_reasoning = ""
current_usage: dict[str, Any] = {}
final_finish_reason = "stop"
for chunk in response:
if not chunk.choices:
if chunk.usage:
current_usage = chunk.usage.model_dump()
continue
delta = chunk.choices[0].delta
if delta.content:
content_chunk = delta.content
aggregated_content += content_chunk
if stream_callback:
stream_callback(content_chunk)
if hasattr(delta, "reasoning_details") and delta.reasoning_details:
for detail in delta.reasoning_details:
if "text" in detail:
aggregated_reasoning += detail["text"]
if delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index
while len(aggregated_tool_calls) <= idx:
aggregated_tool_calls.append({"id": "", "type": "function", "function": {"name": "", "arguments": ""}})
target = aggregated_tool_calls[idx]
if tc_delta.id:
target["id"] = tc_delta.id
if tc_delta.function and tc_delta.function.name:
target["function"]["name"] += tc_delta.function.name
if tc_delta.function and tc_delta.function.arguments:
target["function"]["arguments"] += tc_delta.function.arguments
if chunk.choices[0].finish_reason:
final_finish_reason = chunk.choices[0].finish_reason
if chunk.usage:
current_usage = chunk.usage.model_dump()
assistant_text = aggregated_content
tool_calls_raw = aggregated_tool_calls
reasoning_content = aggregated_reasoning
finish_reason = final_finish_reason
usage = current_usage
else:
choice = response.choices[0]
message = choice.message
assistant_text = message.content or ""
tool_calls_raw = message.tool_calls or []
if hasattr(message, "reasoning_details") and message.reasoning_details:
reasoning_content = message.reasoning_details[0].get("text", "") if message.reasoning_details else ""
finish_reason = choice.finish_reason or "stop"
usage = response.usage.model_dump() if response.usage else {}
thinking_tags = "" thinking_tags = ""
if reasoning_content: if reasoning_content:
thinking_tags = f"<thinking>\n{reasoning_content}\n</thinking>\n" thinking_tags = f"<thinking>\n{reasoning_content}\n</thinking>\n"
full_assistant_text = thinking_tags + assistant_text full_text = thinking_tags + response.text
with _minimax_history_lock: with _minimax_history_lock:
msg_to_store: dict[str, Any] = {"role": "assistant", "content": assistant_text or None} msg_to_store: dict[str, Any] = {"role": "assistant", "content": response.text or None}
if reasoning_content: if reasoning_content:
msg_to_store["reasoning_content"] = reasoning_content msg_to_store["reasoning_content"] = reasoning_content
if tool_calls_raw:
msg_to_store["tool_calls"] = tool_calls_raw
_minimax_history.append(msg_to_store) _minimax_history.append(msg_to_store)
return full_text
if full_assistant_text:
all_text_parts.append(full_assistant_text)
_append_comms("IN", "response", {
"round": round_idx,
"stop_reason": finish_reason,
"text": full_assistant_text,
"tool_calls": tool_calls_raw,
"usage": usage,
"streaming": stream
})
if finish_reason != "tool_calls" and not tool_calls_raw:
break
if round_idx > MAX_TOOL_ROUNDS:
break
try:
loop = asyncio.get_running_loop()
results = asyncio.run_coroutine_threadsafe(
_execute_tool_calls_concurrently(tool_calls_raw, base_dir, pre_tool_callback, qa_callback, round_idx, "minimax", patch_callback),
loop
).result()
except RuntimeError:
results = asyncio.run(_execute_tool_calls_concurrently(tool_calls_raw, base_dir, pre_tool_callback, qa_callback, round_idx, "minimax", patch_callback))
tool_results_for_history: list[dict[str, Any]] = []
for i, (name, call_id, out, _) in enumerate(results):
if i == len(results) - 1:
if file_items:
file_items, changed = _reread_file_items(file_items)
ctx = _build_file_diff_text(changed)
if ctx:
out += f"\n\n{_get_context_marker()}\n\n{ctx}"
if round_idx == MAX_TOOL_ROUNDS:
out += "\n\n[SYSTEM: MAX ROUNDS. PROVIDE FINAL ANSWER.]"
truncated = _truncate_tool_output(out)
_cumulative_tool_bytes += len(truncated)
tool_results_for_history.append({
"role": "tool",
"tool_call_id": call_id,
"content": truncated,
})
_append_comms("IN", "tool_result", {"name": name, "id": call_id, "output": out})
events.emit("tool_execution", payload={"status": "completed", "tool": name, "result": out, "round": round_idx})
if _cumulative_tool_bytes > _MAX_TOOL_OUTPUT_BYTES:
tool_results_for_history.append({
"role": "user",
"content": f"SYSTEM WARNING: Cumulative tool output exceeded {_MAX_TOOL_OUTPUT_BYTES // 1000}KB budget. Provide your final answer now."
})
_append_comms("OUT", "request", {"message": f"[TOOL OUTPUT BUDGET EXCEEDED: {_cumulative_tool_bytes} bytes]"})
with _minimax_history_lock:
for tr in tool_results_for_history:
_minimax_history.append(tr)
return "\n\n".join(all_text_parts) if all_text_parts else "(No text returned)"
except Exception as e:
raise _classify_minimax_error(e) from e
#endregion: MiniMax Provider #endregion: MiniMax Provider
#region: Qwen Provider
def _ensure_qwen_client() -> None:
global _qwen_client, _qwen_region
if _qwen_client is None:
import dashscope
creds = _load_credentials()
api_key = creds.get("qwen", {}).get("api_key")
if not api_key:
raise ValueError("Qwen API key not found in credentials.toml")
_qwen_region = creds.get("qwen", {}).get("region", "china")
if _qwen_region == "international":
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
else:
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"
dashscope.api_key = api_key
_qwen_client = dashscope.Generation
def _dashscope_call(
model: str,
messages: list[dict[str, Any]],
tools: list[dict[str, Any]] | None,
*,
max_tokens: int,
temperature: float,
top_p: float,
) -> dict[str, Any]:
import dashscope
from src.qwen_adapter import build_dashscope_tools
kwargs: dict[str, Any] = {
"model": model,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
"top_p": top_p,
"result_format": "message",
}
if tools:
kwargs["tools"] = build_dashscope_tools(tools)
resp = dashscope.Generation.call(**kwargs)
if getattr(resp, "status_code", 200) != 200:
from src.qwen_adapter import classify_dashscope_error
raise classify_dashscope_error(_dashscope_exception_from_response(resp))
return {
"text": resp.output.text if hasattr(resp, "output") and resp.output else "",
"tool_calls": _extract_dashscope_tool_calls(resp),
"usage": {
"input_tokens": getattr(resp.usage, "input_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0,
"output_tokens": getattr(resp.usage, "output_tokens", 0) if hasattr(resp, "usage") and resp.usage else 0,
},
}
def _dashscope_exception_from_response(resp: Any) -> Exception:
msg = getattr(resp, "message", "unknown dashscope error")
return RuntimeError(msg)
def _extract_dashscope_tool_calls(resp: Any) -> list[dict[str, Any]]:
out: list[dict[str, Any]] = []
if not (hasattr(resp, "output") and resp.output and getattr(resp.output, "tool_calls", None)):
return out
for tc in resp.output.tool_calls:
out.append({
"id": getattr(tc, "id", ""),
"type": "function",
"function": {
"name": getattr(tc.function, "name", "") if hasattr(tc, "function") else "",
"arguments": getattr(tc.function, "arguments", "{}") if hasattr(tc, "function") else "{}",
},
})
return out
def _list_qwen_models() -> list[str]:
from src.vendor_capabilities import list_models_for_vendor
return list_models_for_vendor("qwen")
def _send_qwen(md_content: str, user_message: str, base_dir: str,
file_items: list[dict[str, Any]] | None = None,
discussion_history: str = "",
stream: bool = False,
pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
qa_callback: Optional[Callable[[str], str]] = None,
stream_callback: Optional[Callable[[str], None]] = None,
patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
_ensure_qwen_client()
with _qwen_history_lock:
user_content = user_message
if file_items:
for fi in file_items:
if fi.get("is_image") and fi.get("base64_data"):
user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
if discussion_history and not _qwen_history:
_qwen_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"})
else:
_qwen_history.append({"role": "user", "content": user_content})
messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
messages.extend(_qwen_history)
resp = _dashscope_call(
model=_model,
messages=messages,
tools=None,
max_tokens=_max_tokens,
temperature=_temperature,
top_p=_top_p,
)
return resp.get("text", "")
#endregion: Qwen Provider
def _ensure_llama_client() -> Any:
global _llama_client, _llama_base_url, _llama_api_key
if _llama_client is None:
openai = _require_warmed("openai")
creds = _load_credentials()
configured_url = creds.get("llama", {}).get("base_url")
configured_key = creds.get("llama", {}).get("api_key")
if configured_url:
_llama_base_url = configured_url
if configured_key is not None:
_llama_api_key = configured_key or "ollama"
_llama_client = openai.OpenAI(api_key=_llama_api_key, base_url=_llama_base_url)
return _llama_client
def _send_llama(md_content: str, user_message: str, base_dir: str,
file_items: list[dict[str, Any]] | None = None,
discussion_history: str = "",
stream: bool = False,
pre_tool_callback: Optional[Callable[[str, str, Optional[Callable[[str], str]]], Optional[str]]] = None,
qa_callback: Optional[Callable[[str], str]] = None,
stream_callback: Optional[Callable[[str], None]] = None,
patch_callback: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
client = _ensure_llama_client()
from src.openai_compatible import OpenAICompatibleRequest, send_openai_compatible
from src.vendor_capabilities import get_capabilities
with _llama_history_lock:
user_content = user_message
if file_items:
for fi in file_items:
if fi.get("is_image") and fi.get("base64_data"):
user_content = f"[IMAGE: {fi.get('path', 'attachment')}]\n{user_content}"
if discussion_history and not _llama_history:
_llama_history.append({"role": "user", "content": f"[DISCUSSION HISTORY]\n\n{discussion_history}\n\n---\n\n{user_message}"})
else:
_llama_history.append({"role": "user", "content": user_content})
messages = [{"role": "system", "content": f"{_get_combined_system_prompt()}\n\n<context>\n{md_content}\n</context>"}]
messages.extend(_llama_history)
request = OpenAICompatibleRequest(
messages=messages,
model=_model,
temperature=_temperature,
top_p=_top_p,
max_tokens=_max_tokens,
stream=stream,
stream_callback=stream_callback,
)
caps = get_capabilities("llama", _model)
response = send_openai_compatible(client, request, capabilities=caps)
_llama_history.append({"role": "assistant", "content": response.text})
return response.text
def _list_llama_models() -> list[str]:
from src.vendor_capabilities import list_models_for_vendor
return list_models_for_vendor("llama")
def _get_llama_cost_tracking() -> bool:
if "localhost" in _llama_base_url or "127.0.0.1" in _llama_base_url:
return False
from src.vendor_capabilities import get_capabilities
try:
caps = get_capabilities("llama", _model)
return caps.cost_tracking
except KeyError:
return True
#endregion: Llama Provider
#region: Tier 4 Analysis #region: Tier 4 Analysis
def run_tier4_analysis(stderr: str) -> str: def run_tier4_analysis(stderr: str) -> str:
+18
View File
@@ -43,6 +43,24 @@ MODEL_PRICING = [
(r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}), (r"claude-.*-sonnet", {"input_per_mtok": 3.0, "output_per_mtok": 15.0}),
(r"claude-.*-opus", {"input_per_mtok": 15.0, "output_per_mtok": 75.0}), (r"claude-.*-opus", {"input_per_mtok": 15.0, "output_per_mtok": 75.0}),
(r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}), (r"deepseek-v3", {"input_per_mtok": 0.27, "output_per_mtok": 1.10}),
(r"qwen-turbo", {"input_per_mtok": 0.05, "output_per_mtok": 0.10}),
(r"qwen-plus", {"input_per_mtok": 0.40, "output_per_mtok": 1.20}),
(r"qwen-max", {"input_per_mtok": 2.00, "output_per_mtok": 6.00}),
(r"qwen-long", {"input_per_mtok": 0.07, "output_per_mtok": 0.28}),
(r"qwen-vl-plus", {"input_per_mtok": 0.21, "output_per_mtok": 0.63}),
(r"qwen-vl-max", {"input_per_mtok": 0.50, "output_per_mtok": 1.50}),
(r"qwen-audio", {"input_per_mtok": 0.10, "output_per_mtok": 0.30}),
(r"grok-2", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
(r"grok-2-vision", {"input_per_mtok": 2.00, "output_per_mtok": 10.00}),
(r"grok-beta", {"input_per_mtok": 5.00, "output_per_mtok": 15.00}),
(r"llama-3\.1-8b-instant", {"input_per_mtok": 0.05, "output_per_mtok": 0.08}),
(r"llama-3\.1-70b-versatile", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
(r"llama-3\.1-405b-reasoning", {"input_per_mtok": 3.00, "output_per_mtok": 3.00}),
(r"llama-3\.2-1b-preview", {"input_per_mtok": 0.04, "output_per_mtok": 0.04}),
(r"llama-3\.2-3b-preview", {"input_per_mtok": 0.06, "output_per_mtok": 0.06}),
(r"llama-3\.2-11b-vision-preview", {"input_per_mtok": 0.18, "output_per_mtok": 0.18}),
(r"llama-3\.2-90b-vision-preview", {"input_per_mtok": 0.90, "output_per_mtok": 0.90}),
(r"llama-3\.3-70b-specdec", {"input_per_mtok": 0.59, "output_per_mtok": 0.79}),
] ]
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float: def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
+1 -1
View File
@@ -53,7 +53,7 @@ from src.paths import get_config_path
#region: Constants #region: Constants
PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax"] PROVIDERS: List[str] = ["gemini", "anthropic", "gemini_cli", "deepseek", "minimax", "qwen", "grok", "llama"]
AGENT_TOOL_NAMES: List[str] = [ AGENT_TOOL_NAMES: List[str] = [
"run_powershell", "run_powershell",
+144
View File
@@ -0,0 +1,144 @@
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Callable, Optional
from openai import OpenAIError, RateLimitError, AuthenticationError, PermissionDeniedError, APIConnectionError, APIStatusError, BadRequestError
@dataclass(frozen=True)
class NormalizedResponse:
text: str
tool_calls: list[dict[str, Any]]
usage_input_tokens: int
usage_output_tokens: int
usage_cache_read_tokens: int
usage_cache_creation_tokens: int
raw_response: Any
@dataclass
class OpenAICompatibleRequest:
messages: list[dict[str, Any]]
model: str
temperature: float = 0.0
top_p: float = 1.0
max_tokens: int = 8192
tools: Optional[list[dict[str, Any]]] = None
tool_choice: str = "auto"
stream: bool = False
stream_callback: Optional[Callable[[str], None]] = None
def _to_dict_tool_call(tc: Any) -> dict[str, Any]:
return {
"id": getattr(tc, "id", None),
"type": getattr(tc, "type", "function"),
"function": {
"name": getattr(tc.function, "name", None),
"arguments": getattr(tc.function, "arguments", "{}"),
},
}
def _classify_openai_compatible_error(exc: Exception) -> "ProviderError":
from src.ai_client import ProviderError
if isinstance(exc, RateLimitError):
return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc)
if isinstance(exc, AuthenticationError) or isinstance(exc, PermissionDeniedError):
return ProviderError(kind="auth", provider="openai_compatible", original=exc)
if isinstance(exc, APIConnectionError):
return ProviderError(kind="network", provider="openai_compatible", original=exc)
if isinstance(exc, APIStatusError):
code = getattr(exc, "status_code", 0)
if code == 402:
return ProviderError(kind="balance", provider="openai_compatible", original=exc)
if code == 429:
return ProviderError(kind="rate_limit", provider="openai_compatible", original=exc)
if code in (401, 403):
return ProviderError(kind="auth", provider="openai_compatible", original=exc)
if code in (500, 502, 503, 504):
return ProviderError(kind="network", provider="openai_compatible", original=exc)
if isinstance(exc, BadRequestError):
return ProviderError(kind="quota", provider="openai_compatible", original=exc)
return ProviderError(kind="unknown", provider="openai_compatible", original=exc)
def send_openai_compatible(
client: Any,
request: OpenAICompatibleRequest,
*,
capabilities: Any,
) -> NormalizedResponse:
kwargs: dict[str, Any] = {
"model": request.model,
"messages": request.messages,
"temperature": request.temperature,
"top_p": request.top_p,
"max_tokens": request.max_tokens,
"stream": request.stream,
}
if request.tools is not None:
kwargs["tools"] = request.tools
kwargs["tool_choice"] = request.tool_choice
try:
if request.stream:
return _send_streaming(client, kwargs, request.stream_callback)
return _send_blocking(client, kwargs)
except OpenAIError as exc:
raise _classify_openai_compatible_error(exc) from exc
def _send_blocking(client: Any, kwargs: dict[str, Any]) -> NormalizedResponse:
resp = client.chat.completions.create(**kwargs)
msg = resp.choices[0].message
tool_calls_raw = msg.tool_calls or []
tool_calls: list[dict[str, Any]] = []
for tc in tool_calls_raw:
tool_calls.append(_to_dict_tool_call(tc))
usage = getattr(resp, "usage", None)
return NormalizedResponse(
text=msg.content or "",
tool_calls=tool_calls,
usage_input_tokens=int(getattr(usage, "prompt_tokens", 0) or 0),
usage_output_tokens=int(getattr(usage, "completion_tokens", 0) or 0),
usage_cache_read_tokens=0,
usage_cache_creation_tokens=0,
raw_response=resp,
)
def _send_streaming(client: Any, kwargs: dict[str, Any], callback: Optional[Callable[[str], None]]) -> NormalizedResponse:
kwargs_stream = dict(kwargs)
kwargs_stream["stream"] = True
kwargs_stream["stream_options"] = {"include_usage": True}
chunks_iter = client.chat.completions.create(**kwargs_stream)
text_parts: list[str] = []
tool_calls_acc: dict[int, dict[str, Any]] = {}
usage_input = 0
usage_output = 0
for chunk in chunks_iter:
for choice in getattr(chunk, "choices", []) or []:
delta = getattr(choice, "delta", None)
if delta is None:
continue
if delta.content:
text_parts.append(delta.content)
if callback:
callback(delta.content)
for tc in getattr(delta, "tool_calls", None) or []:
idx = getattr(tc, "index", 0)
if idx not in tool_calls_acc:
tool_calls_acc[idx] = {"id": None, "type": "function", "function": {"name": None, "arguments": ""}}
if getattr(tc, "id", None):
tool_calls_acc[idx]["id"] = tc.id
if getattr(tc, "function", None):
if tc.function.name:
tool_calls_acc[idx]["function"]["name"] = tc.function.name
if tc.function.arguments:
tool_calls_acc[idx]["function"]["arguments"] += tc.function.arguments
chunk_usage = getattr(chunk, "usage", None)
if chunk_usage is not None:
usage_input = int(getattr(chunk_usage, "prompt_tokens", 0) or 0)
usage_output = int(getattr(chunk_usage, "completion_tokens", 0) or 0)
return NormalizedResponse(
text="".join(text_parts),
tool_calls=[tool_calls_acc[k] for k in sorted(tool_calls_acc.keys())],
usage_input_tokens=usage_input,
usage_output_tokens=usage_output,
usage_cache_read_tokens=0,
usage_cache_creation_tokens=0,
raw_response=None,
)
+37
View File
@@ -0,0 +1,37 @@
from __future__ import annotations
from typing import Any
import dashscope
from dashscope.common.error import (
AuthenticationError,
InvalidParameter,
RequestFailure,
ServiceUnavailableError,
TimeoutException,
)
from src.ai_client import ProviderError
def build_dashscope_tools(openai_tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
out: list[dict[str, Any]] = []
for t in openai_tools:
if t.get("type") != "function":
continue
fn = t.get("function", {})
out.append({
"name": fn.get("name", ""),
"description": fn.get("description", ""),
"parameters": fn.get("parameters", {"type": "object", "properties": {}}),
})
return out
def classify_dashscope_error(exc: Exception) -> ProviderError:
if isinstance(exc, AuthenticationError):
return ProviderError(kind="auth", provider="qwen", original=exc)
if isinstance(exc, TimeoutException):
return ProviderError(kind="network", provider="qwen", original=exc)
if isinstance(exc, ServiceUnavailableError):
return ProviderError(kind="network", provider="qwen", original=exc)
if isinstance(exc, InvalidParameter):
return ProviderError(kind="quota", provider="qwen", original=exc)
if isinstance(exc, RequestFailure):
return ProviderError(kind="network", provider="qwen", original=exc)
return ProviderError(kind="unknown", provider="qwen", original=exc)
+55
View File
@@ -0,0 +1,55 @@
from __future__ import annotations
from dataclasses import dataclass
@dataclass(frozen=True)
class VendorCapabilities:
vendor: str
model: str
vision: bool = False
tool_calling: bool = True
caching: bool = False
streaming: bool = True
model_discovery: bool = True
context_window: int = 8192
cost_tracking: bool = True
cost_input_per_mtok: float = 0.0
cost_output_per_mtok: float = 0.0
notes: str = ''
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}
def register(cap: VendorCapabilities) -> None:
_REGISTRY[(cap.vendor, cap.model)] = cap
def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
if (vendor, model) in _REGISTRY:
return _REGISTRY[(vendor, model)]
if (vendor, '*') in _REGISTRY:
return _REGISTRY[(vendor, '*')]
raise KeyError(f'No capabilities registered for vendor={vendor!r} model={model!r}')
def list_models_for_vendor(vendor: str) -> list[str]:
return sorted({m for v, m in _REGISTRY if v == vendor and m != '*'})
register(VendorCapabilities(vendor='minimax', model='*', context_window=131072, cost_input_per_mtok=0.20, cost_output_per_mtok=0.20))
register(VendorCapabilities(vendor='grok', model='*', context_window=131072, cost_input_per_mtok=2.00, cost_output_per_mtok=10.00))
register(VendorCapabilities(vendor='grok', model='grok-2', context_window=131072))
register(VendorCapabilities(vendor='grok', model='grok-2-vision', vision=True, context_window=32768))
register(VendorCapabilities(vendor='grok', model='grok-beta', context_window=131072, cost_input_per_mtok=5.00, cost_output_per_mtok=15.00))
register(VendorCapabilities(vendor='llama', model='*', context_window=131072))
register(VendorCapabilities(vendor='llama', model='llama-3.1-8b-instant', context_window=131072, cost_input_per_mtok=0.05, cost_output_per_mtok=0.08))
register(VendorCapabilities(vendor='llama', model='llama-3.1-70b-versatile', context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79))
register(VendorCapabilities(vendor='llama', model='llama-3.1-405b-reasoning', context_window=131072, cost_input_per_mtok=3.00, cost_output_per_mtok=3.00))
register(VendorCapabilities(vendor='llama', model='llama-3.2-1b-preview', context_window=131072, cost_input_per_mtok=0.04, cost_output_per_mtok=0.04))
register(VendorCapabilities(vendor='llama', model='llama-3.2-3b-preview', context_window=131072, cost_input_per_mtok=0.06, cost_output_per_mtok=0.06))
register(VendorCapabilities(vendor='llama', model='llama-3.2-11b-vision-preview', vision=True, context_window=131072, cost_input_per_mtok=0.18, cost_output_per_mtok=0.18))
register(VendorCapabilities(vendor='llama', model='llama-3.2-90b-vision-preview', vision=True, context_window=131072, cost_input_per_mtok=0.90, cost_output_per_mtok=0.90))
register(VendorCapabilities(vendor='llama', model='llama-3.3-70b-specdec', context_window=131072, cost_input_per_mtok=0.59, cost_output_per_mtok=0.79))
register(VendorCapabilities(vendor='qwen', model='*', context_window=32768))
register(VendorCapabilities(vendor='qwen', model='qwen-turbo', context_window=1000000, cost_input_per_mtok=0.05, cost_output_per_mtok=0.10))
register(VendorCapabilities(vendor='qwen', model='qwen-plus', context_window=131072, cost_input_per_mtok=0.40, cost_output_per_mtok=1.20))
register(VendorCapabilities(vendor='qwen', model='qwen-max', context_window=32768, cost_input_per_mtok=2.00, cost_output_per_mtok=6.00))
register(VendorCapabilities(vendor='qwen', model='qwen-long', context_window=1000000, cost_input_per_mtok=0.07, cost_output_per_mtok=0.28))
register(VendorCapabilities(vendor='qwen', model='qwen-vl-plus', vision=True, context_window=131072, cost_input_per_mtok=0.21, cost_output_per_mtok=0.63))
register(VendorCapabilities(vendor='qwen', model='qwen-vl-max', vision=True, context_window=32768, cost_input_per_mtok=0.50, cost_output_per_mtok=1.50))
register(VendorCapabilities(vendor='qwen', model='qwen-audio', context_window=32768, cost_input_per_mtok=0.10, cost_output_per_mtok=0.30, notes='Text-only in v1; audio input deferred'))
+28
View File
@@ -0,0 +1,28 @@
from unittest.mock import MagicMock, patch
import pytest
from src import ai_client

@pytest.fixture(autouse=True)
def _reset_grok_state():
    if hasattr(ai_client, '_grok_client'):
        ai_client._grok_client = None
    if hasattr(ai_client, '_grok_history'):
        ai_client._grok_history = []
    yield

def test_send_grok_uses_xai_endpoint(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client.set_provider("grok", "grok-2")
    mock_client = MagicMock()
    mock_client.chat.completions.create.return_value = MagicMock(
        choices=[MagicMock(message=MagicMock(content="hi from grok", tool_calls=[]))],
        usage=MagicMock(prompt_tokens=10, completion_tokens=5),
    )
    with patch("src.ai_client._ensure_grok_client", return_value=mock_client):
        result = ai_client._send_grok("system", "user", ".", None, "", False, None, None, None)
    assert result == "hi from grok"
    assert mock_client.chat.completions.create.called

def test_grok_2_vision_supports_image() -> None:
    from src.vendor_capabilities import get_capabilities
    caps = get_capabilities("grok", "grok-2-vision")
    assert caps.vision is True
+68
@@ -0,0 +1,68 @@
from unittest.mock import MagicMock, patch
import pytest
from src import ai_client

@pytest.fixture(autouse=True)
def _reset_llama_state():
    if hasattr(ai_client, '_llama_client'):
        ai_client._llama_client = None
    if hasattr(ai_client, '_llama_history'):
        ai_client._llama_history = []
    if hasattr(ai_client, '_llama_base_url'):
        ai_client._llama_base_url = "http://localhost:11434/v1"
    if hasattr(ai_client, '_llama_api_key'):
        ai_client._llama_api_key = "ollama"
    yield

def test_send_llama_ollama_backend(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client._llama_base_url = "http://localhost:11434/v1"
    ai_client.set_provider("llama", "llama-3.2-3b-preview")
    mock_client = MagicMock()
    mock_client.chat.completions.create.return_value = MagicMock(
        choices=[MagicMock(message=MagicMock(content="hi from ollama", tool_calls=[]))],
        usage=MagicMock(prompt_tokens=5, completion_tokens=3),
    )
    with patch("src.ai_client._ensure_llama_client", return_value=mock_client):
        result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
    assert result == "hi from ollama"

def test_send_llama_openrouter_backend(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client._llama_base_url = "https://openrouter.ai/api/v1"
    ai_client.set_provider("llama", "llama-3.1-70b-versatile")
    captured_client = MagicMock()
    captured_client.chat.completions.create.return_value = MagicMock(
        choices=[MagicMock(message=MagicMock(content="hi from openrouter", tool_calls=[]))],
        usage=MagicMock(prompt_tokens=5, completion_tokens=3),
    )
    with patch("src.ai_client._ensure_llama_client", return_value=captured_client) as ensure:
        result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
    assert result == "hi from openrouter"
    assert ensure.called

def test_send_llama_custom_url(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client._llama_base_url = "http://my-server:9999/v1"
    mock_client = MagicMock()
    mock_client.chat.completions.create.return_value = MagicMock(
        choices=[MagicMock(message=MagicMock(content="hi from custom", tool_calls=[]))],
        usage=MagicMock(prompt_tokens=5, completion_tokens=3),
    )
    with patch("src.ai_client._ensure_llama_client", return_value=mock_client):
        result = ai_client._send_llama("system", "user", ".", None, "", False, None, None, None)
    assert result == "hi from custom"

def test_llama_model_discovery_unions_ollama_and_openrouter() -> None:
    from src.ai_client import _list_llama_models
    models = _list_llama_models()
    assert "llama-3.1-8b-instant" in models
    assert "llama-3.2-11b-vision-preview" in models
    assert "llama-3.3-70b-specdec" in models

def test_llama_3_2_vision_vision_capability() -> None:
    from src.vendor_capabilities import get_capabilities
    caps = get_capabilities("llama", "llama-3.2-11b-vision-preview")
    assert caps.vision is True

def test_llama_local_backend_cost_tracking_false_for_ollama() -> None:
    ai_client._llama_base_url = "http://localhost:11434/v1"
    from src.ai_client import _get_llama_cost_tracking
    assert _get_llama_cost_tracking() is False
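
The last test in this file exercises the local-backend signal described in the commit message: cost tracking is off when the base URL points at localhost or 127.0.0.1. A minimal sketch of such a check follows. The names `cost_tracking_enabled` and `LOCAL_HOSTS` are hypothetical and chosen for illustration; this is not the body of `_get_llama_cost_tracking()`, which the diff does not show.

```python
from urllib.parse import urlparse

# Hypothetical set of hostnames treated as "local" (assumption for this sketch).
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def cost_tracking_enabled(base_url: str) -> bool:
    """Return False when base_url targets a local backend, True otherwise."""
    # urlparse(...).hostname strips the port and lowercases the host.
    host = urlparse(base_url).hostname
    return host not in LOCAL_HOSTS


if __name__ == "__main__":
    for url in ("http://localhost:11434/v1", "https://openrouter.ai/api/v1"):
        print(url, cost_tracking_enabled(url))
```

Parsing the hostname rather than substring-matching the URL avoids false positives such as a remote host whose path happens to contain "localhost". In this sketch a URL without a scheme yields no hostname and is therefore treated as remote.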
+88
@@ -0,0 +1,88 @@
from unittest.mock import MagicMock
import pytest
from src.openai_compatible import (
    NormalizedResponse,
    OpenAICompatibleRequest,
    send_openai_compatible,
)
from src.vendor_capabilities import VendorCapabilities, register

@pytest.fixture
def caps() -> VendorCapabilities:
    return VendorCapabilities(vendor="test", model="test-model", context_window=8192, cost_input_per_mtok=1.0, cost_output_per_mtok=2.0)

def _mock_completion(text: str = "hello", tool_calls=None, usage_input: int = 10, usage_output: int = 5):
    m = MagicMock()
    m.choices = [MagicMock()]
    m.choices[0].message.content = text
    m.choices[0].message.tool_calls = tool_calls or []
    m.usage.prompt_tokens = usage_input
    m.usage.completion_tokens = usage_output
    m.usage.prompt_tokens_details = None
    m.usage.completion_tokens_details = None
    return m

def test_send_non_streaming_returns_normalized_response(caps: VendorCapabilities) -> None:
    client = MagicMock()
    client.chat.completions.create.return_value = _mock_completion("hi", usage_input=20, usage_output=10)
    request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", max_tokens=100)
    response = send_openai_compatible(client, request, capabilities=caps)
    assert response.text == "hi"
    assert response.tool_calls == []
    assert response.usage_input_tokens == 20
    assert response.usage_output_tokens == 10

def test_send_streaming_aggregates_chunks(caps: VendorCapabilities) -> None:
    client = MagicMock()
    chunks = [
        MagicMock(choices=[MagicMock(delta=MagicMock(content="hel", tool_calls=None))]),
        MagicMock(choices=[MagicMock(delta=MagicMock(content="lo", tool_calls=None))]),
        MagicMock(choices=[MagicMock(delta=MagicMock(content="", tool_calls=None))], usage=MagicMock(prompt_tokens=15, completion_tokens=5)),
    ]
    client.chat.completions.create.return_value = iter(chunks)
    received: list = []
    request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m", stream=True, stream_callback=received.append)
    response = send_openai_compatible(client, request, capabilities=caps)
    assert response.text == "hello"
    assert received == ["hel", "lo"]
    assert response.usage_input_tokens == 15

def test_tool_call_detection_in_response(caps: VendorCapabilities) -> None:
    tool_call = MagicMock()
    tool_call.id = "call_1"
    tool_call.function.name = "read_file"
    tool_call.function.arguments = '{"path": "/tmp/x"}'
    completion = _mock_completion(text="", tool_calls=[tool_call])
    client = MagicMock()
    client.chat.completions.create.return_value = completion
    request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m")
    response = send_openai_compatible(client, request, capabilities=caps)
    assert len(response.tool_calls) == 1
    assert response.tool_calls[0]["function"]["name"] == "read_file"
    assert response.tool_calls[0]["id"] == "call_1"

def test_vision_multimodal_message(caps: VendorCapabilities) -> None:
    client = MagicMock()
    client.chat.completions.create.return_value = _mock_completion("looks like a cat")
    messages = [{"role": "user", "content": [{"type": "text", "text": "what is this?"}, {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]}]
    request = OpenAICompatibleRequest(messages=messages, model="m")
    response = send_openai_compatible(client, request, capabilities=caps)
    sent_messages = client.chat.completions.create.call_args.kwargs["messages"]
    assert sent_messages[0]["content"] == messages[0]["content"]
    assert response.text == "looks like a cat"

def test_error_classification_429_to_rate_limit(caps: VendorCapabilities) -> None:
    from openai import RateLimitError
    from src.ai_client import ProviderError
    client = MagicMock()
    client.chat.completions.create.side_effect = RateLimitError("rate limited", response=MagicMock(status_code=429), body=None)
    request = OpenAICompatibleRequest(messages=[{"role": "user", "content": "ping"}], model="m")
    with pytest.raises(ProviderError) as exc_info:
        send_openai_compatible(client, request, capabilities=caps)
    assert exc_info.value.kind == "rate_limit"

def test_normalized_response_is_frozen_dataclass() -> None:
    from dataclasses import FrozenInstanceError
    r = NormalizedResponse(text="x", tool_calls=[], usage_input_tokens=0, usage_output_tokens=0, usage_cache_read_tokens=0, usage_cache_creation_tokens=0, raw_response=None)
    with pytest.raises(FrozenInstanceError):
        r.text = "y"
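
The streaming test above expects three things: text deltas are concatenated in order, the callback receives only non-empty deltas, and usage is read from the chunk that carries it. The following is a rough sketch of that aggregation, not the body of `send_openai_compatible`, which this diff does not show. `aggregate_stream` is a hypothetical name, and the chunks are plain `SimpleNamespace` stand-ins rather than SDK objects.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional, Tuple


def aggregate_stream(
    chunks: Iterable,
    on_text: Optional[Callable[[str], None]] = None,
) -> Tuple[str, int, int]:
    """Join streamed text deltas and pick up token usage when a chunk has it."""
    parts = []
    usage_in = usage_out = 0
    for chunk in chunks:
        for choice in chunk.choices:
            text = choice.delta.content
            if text:  # skip empty or None deltas so the callback sees real text only
                parts.append(text)
                if on_text is not None:
                    on_text(text)
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            usage_in = usage.prompt_tokens
            usage_out = usage.completion_tokens
    return "".join(parts), usage_in, usage_out


def _chunk(content, usage=None):
    # Stand-in for a streamed chunk (assumed shape: choices[].delta.content, usage).
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=usage)


received = []
stream = [
    _chunk("hel"),
    _chunk("lo"),
    _chunk("", usage=SimpleNamespace(prompt_tokens=15, completion_tokens=5)),
]
text, tokens_in, tokens_out = aggregate_stream(stream, received.append)
```

One caveat when porting this idea to `MagicMock` chunks: a `MagicMock` auto-creates a `usage` attribute on access, so a `getattr(..., None)` guard alone would not distinguish chunks with and without usage.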
+55
@@ -0,0 +1,55 @@
from unittest.mock import MagicMock, patch
import pytest
from src import ai_client

@pytest.fixture(autouse=True)
def _reset_qwen_state():
    if hasattr(ai_client, '_qwen_client'):
        ai_client._qwen_client = None
    if hasattr(ai_client, '_qwen_history'):
        ai_client._qwen_history = []
    yield

def test_send_qwen_routes_to_dashscope(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client.set_provider("qwen", "qwen-max")
    with patch("src.ai_client._ensure_qwen_client") as ensure, \
         patch("src.ai_client._dashscope_call", return_value={"text": "hi from qwen", "tool_calls": [], "usage": {"input_tokens": 10, "output_tokens": 5}}) as call:
        result = ai_client._send_qwen("system", "user", ".", None, "", False, None, None, None)
    assert result == "hi from qwen"
    call.assert_called_once()
    ensure.assert_called_once()

def test_qwen_vision_vl_model_accepts_image(monkeypatch: pytest.MonkeyPatch) -> None:
    ai_client.set_provider("qwen", "qwen-vl-max")
    with patch("src.ai_client._ensure_qwen_client"), \
         patch("src.ai_client._dashscope_call", return_value={"text": "I see a cat", "tool_calls": [], "usage": {"input_tokens": 10, "output_tokens": 5}}) as call:
        file_items = [{"path": "/tmp/cat.png", "is_image": True, "base64_data": "iVBOR..."}]
        result = ai_client._send_qwen("system", "describe this image", ".", file_items, "", False, None, None, None)
    assert "cat" in result.lower()
    kwargs = call.call_args.kwargs
    msgs_str = str(kwargs.get("messages", [])).lower()
    assert "image" in msgs_str or "cat.png" in msgs_str

def test_qwen_tool_format_translation() -> None:
    from src.qwen_adapter import build_dashscope_tools
    openai_tools = [{"type": "function", "function": {"name": "read_file", "description": "Read a file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}}]
    ds_tools = build_dashscope_tools(openai_tools)
    assert len(ds_tools) == 1
    assert ds_tools[0]["name"] == "read_file"
    assert "parameters" in ds_tools[0]

def test_qwen_error_classification() -> None:
    from src.ai_client import ProviderError
    from src.qwen_adapter import classify_dashscope_error
    from dashscope.common.error import AuthenticationError
    err = classify_dashscope_error(AuthenticationError("bad key"))
    assert err.kind == "auth"
    assert err.provider == "qwen"

def test_list_qwen_models_returns_hardcoded_registry() -> None:
    from src.ai_client import _list_qwen_models
    models = _list_qwen_models()
    assert "qwen-max" in models
    assert "qwen-vl-max" in models
    assert "qwen-turbo" in models
    assert "qwen-audio" in models
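
The assertions in `test_qwen_tool_format_translation` imply that the translated tools are flattened: `name` and `parameters` sit at the top level instead of under a `function` key. A small sketch of a translation with that output shape is shown here. `flatten_tools` is a hypothetical name, and the fallback values for missing fields are assumptions; the body of `build_dashscope_tools` is not part of this diff, and nothing here is a statement about what the DashScope SDK itself requires.

```python
from typing import Any, Dict, List


def flatten_tools(openai_tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Lift each OpenAI-style function tool's fields to the top level."""
    flat = []
    for tool in openai_tools:
        if tool.get("type") != "function":
            continue  # assumption: non-function tool types are skipped
        fn = tool["function"]
        flat.append({
            "name": fn["name"],
            "description": fn.get("description", ""),
            # assumption: a tool without parameters gets an empty object schema
            "parameters": fn.get("parameters", {"type": "object", "properties": {}}),
        })
    return flat


openai_tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
    },
}]
flat_tools = flatten_tools(openai_tools)
```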
+40
@@ -0,0 +1,40 @@
import pytest
from src.vendor_capabilities import VendorCapabilities, get_capabilities, register

@pytest.fixture(autouse=True)
def _clean_registry():
    import src.vendor_capabilities
    snapshot = src.vendor_capabilities._REGISTRY.copy()
    yield
    src.vendor_capabilities._REGISTRY.clear()
    src.vendor_capabilities._REGISTRY.update(snapshot)

def test_registry_lookup_known_model():
    caps = VendorCapabilities(
        vendor='qwen',
        model='qwen-max',
        vision=False,
        context_window=32768
    )
    register(caps)
    retrieved = get_capabilities('qwen', 'qwen-max')
    assert retrieved.vendor == 'qwen'
    assert retrieved.model == 'qwen-max'
    assert retrieved.context_window == 32768
    assert retrieved.vision is False

def test_fallback_to_vendor_default():
    caps = VendorCapabilities(
        vendor='llama',
        model='*',
        context_window=131072,
        cost_tracking=False
    )
    register(caps)
    retrieved = get_capabilities('llama', 'llama-3.3-future-unregistered')
    assert retrieved.context_window == 131072
    assert retrieved.cost_tracking is False

def test_unknown_vendor_raises():
    with pytest.raises(KeyError, match='No capabilities registered'):
        get_capabilities('nonexistent_vendor', 'anymodel')
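
These three tests pin down the lookup contract: an exact `(vendor, model)` match wins, an unregistered model falls back to the vendor's `'*'` entry, and an unknown vendor raises `KeyError`. One way such a registry could be written is sketched below. The dict keyed by `(vendor, model)` tuples, the field defaults, and the exact error text after "No capabilities registered" are assumptions; only the field names and the dict-like `_REGISTRY` are visible in the diff.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class VendorCapabilities:
    vendor: str
    model: str
    vision: bool = False
    context_window: int = 8192          # default value is an assumption
    cost_tracking: bool = True          # default value is an assumption
    cost_input_per_mtok: float = 0.0
    cost_output_per_mtok: float = 0.0
    notes: str = ""


# Assumed storage layout: one entry per (vendor, model) pair.
_REGISTRY: Dict[Tuple[str, str], VendorCapabilities] = {}


def register(caps: VendorCapabilities) -> None:
    """Add or overwrite the entry for caps.vendor / caps.model."""
    _REGISTRY[(caps.vendor, caps.model)] = caps


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Exact match first, then the vendor-wide '*' default, else KeyError."""
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    default = _REGISTRY.get((vendor, "*"))
    if default is not None:
        return default
    raise KeyError(f"No capabilities registered for {vendor}/{model}")


register(VendorCapabilities(vendor="llama", model="*", context_window=131072, cost_tracking=False))
register(VendorCapabilities(vendor="llama", model="llama-3.2-11b-vision-preview", vision=True, context_window=131072))
```

With this layout a wildcard entry such as `VendorCapabilities(vendor='qwen', model='*', context_window=32768)` acts as the default for any model name that has no entry of its own.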