Private
Public Access
0
0
Commit Graph

807 Commits

Author SHA1 Message Date
ed 6d408c4d03 docs(gui_2): Document App snapshot, profile, and history/undo/redo managers 2026-06-12 21:21:25 -04:00
ed 3b4b55698c docs(gui_2): Add SQLite-granularity docstrings for App init, run, _gui_func, and shutdown 2026-06-12 21:18:18 -04:00
ed 6aafac5d2f starting human review of ai_client 2026-06-12 20:51:55 -04:00
ed 6f5b5f91c4 restore comment 2026-06-12 20:26:48 -04:00
ed ee3c90b865 refactor(rag_engine): Result API + NilRAGState (_init_vector_store, _validate_collection_dim, _get_state) 2026-06-12 20:14:40 -04:00
ed 64b787b881 refactor(ai_client): remove ProviderError class; ErrorInfo is the new error type 2026-06-12 19:41:41 -04:00
ed 73cf321cdf feat(ai_client): mark send() @deprecated; rewire to call send_result() 2026-06-12 19:22:27 -04:00
ed 9f86b2bee3 feat(ai_client): add send_result() public API returning Result[str] 2026-06-12 19:01:50 -04:00
ed d4d7d1ab14 refactor(ai_client): _send_llama_native_result() returns Result[str] 2026-06-12 18:47:14 -04:00
ed 6665152950 refactor(ai_client): _send_llama_result() returns Result[str] 2026-06-12 18:46:29 -04:00
ed 64d6ba2db5 refactor(ai_client): _send_qwen_result() returns Result[str] 2026-06-12 18:45:23 -04:00
ed e384afce9c refactor(ai_client): _send_minimax_result() returns Result[str] 2026-06-12 18:44:33 -04:00
ed 87cac3808f refactor(ai_client): _send_grok_result() returns Result[str] 2026-06-12 18:43:47 -04:00
ed 49923f9b43 refactor(ai_client): _send_deepseek_result() returns Result[str] 2026-06-12 18:42:53 -04:00
ed f840dbe85e refactor(ai_client): _send_anthropic_result() returns Result[str] 2026-06-12 18:41:15 -04:00
ed 943a21bfdc refactor(ai_client): _send_gemini_cli_result() returns Result[str] 2026-06-12 18:39:43 -04:00
ed 0282f9ff61 refactor(ai_client): _send_gemini_result() returns Result[str] 2026-06-12 18:38:43 -04:00
ed 0cad1e161f refactor(ai_client): classifier functions return ErrorInfo instead of ProviderError
The 6 error-classifier functions in ai_client.py, openai_compatible.py,
and qwen_adapter.py now return ErrorInfo (data-oriented) instead of
ProviderError. Each takes a source: str parameter for telemetry
provenance. ProviderError class is still used in production code paths
(Task 3.4) and will be removed in Task 3.7.
2026-06-12 18:32:05 -04:00
ed cf5e7b9925 feat(mcp): add Result-returning variants of resolve, read, list, search
Strictly additive: existing _resolve_and_check, read_file, list_directory,
and search_files are unchanged. The new variants return Result[Path] or
Result[str] using the data-oriented ErrorInfo/ErrorKind convention.
2026-06-12 17:44:55 -04:00
ed 46089e3649 feat(result_types): add Result, ErrorInfo, ErrorKind, NilPath, NilRAGState, OK 2026-06-12 16:38:09 -04:00
ed d7c6d67f69 feat(ai_client): wire v2 matrix fields into old vendor send functions
The matrix has v2 fields (reasoning, web_search, x_search)
populated for the old vendors (minimax-M2.5/M2.7, grok-*),
but the send functions didn't consult them. This commit
makes the code path actually USE the matrix:

  _send_minimax: gate reasoning_extractor on caps.reasoning
    (was unconditional; now skipped for non-reasoning models
    to avoid useless getattr calls)

  _send_grok: populate OpenAICompatibleRequest.extra_body with
    search_parameters when caps.web_search or caps.x_search is
    True. caps.web_search -> {mode: auto}; caps.x_search ->
    {sources: [{type: x}]} per the xAI Live Search spec

  OpenAICompatibleRequest: added extra_body field. Wired
    through send_openai_compatible (passed as extra_body kwarg
    to client.chat.completions.create).

Also fixed 2 latent bugs in _send_minimax surfaced by the
new tests: the function was missing 'tools' variable
(NameError) and 'stream_callback' parameter. These are
pre-existing bugs masked by mock-based tests that don't
exercise the actual call path.

Also cancelled t5_6/7/8 (the invented 'deferred tool-loop
conversion' work). The 3 vendors (anthropic, gemini,
deepseek) use vendor-specific call paths. Their inline
loops are NOT defects. The '3-5 days' / '1-2 weeks'
estimates were made up by the agent. The audit script's
DEFERRED_VENDORS exclusion is permanent.

Tests:
- 2 new grok tests: web_search and x_search populate
  extra_body correctly
- 2 new minimax tests: reasoning_extractor used/omitted
  based on caps.reasoning
- 122/122 vendor+tool+provider+import-isolation tests pass
  (no regressions; +4 new tests this commit)
- 3 audit scripts pass
2026-06-11 22:27:42 -04:00
ed c9135b0565 feat(gui): add v2 capability badges in provider panel
Phase 5 t5_4 (UI adaptations for 11 v2 fields): the simplest
honest adaptation — render small colored badges for the 11
v2 fields where the active vendor+model supports them. Each
badge has a tooltip showing the field name.

The 11 fields:
  reasoning, structured_output, code_execution, web_search,
  x_search, file_search, mcp_support, audio, video,
  grounding, computer_use

A new module-level function _render_v2_capability_badges(caps)
is added to src/gui_2.py (per the HARD RULE on no new
src/<thing>.py files). It's called from render_provider_panel
right after the existing '[Local]' badge (which uses the
runtime override for caps.local).

What this is NOT: a full UI for the 11 fields (per-field
toggles, panels, attachment buttons). Those are design-heavy
work and need their own track. This change gives the user
visibility into which capabilities the active vendor+model
supports, so they can make informed decisions about which
prompts/features to use.

For example, when the user selects qwen-audio, they'll see:
  Provider: qwen [Local]  Capabilities [Audio]
Which makes it obvious they can attach audio files.

Tests:
- 2 new tests in tests/test_vendor_capabilities.py:
  * All 11 v2 fields are present in the helper (drift guard)
  * Helper is a no-op on empty caps (no fields True)
- 118/118 vendor+tool+provider+import-isolation tests pass
  (no regressions; +2 new tests this commit)
- 3 audit scripts pass
2026-06-11 21:46:41 -04:00
ed 7fee76f491 feat(capability_matrix): add anthropic, gemini, deepseek registry entries
Phase 5 t5_1, t5_2, t5_3: populate the v2 capability matrix
for the 3 vendors that had no registry entries. Previously,
get_capabilities('anthropic', ...) raised KeyError and the
GUI fell back to the 'unregistered' defaults. Now all 8
vendors in PROVIDERS are on the matrix.

Entries added:

  anthropic/*  (12 entries)
    - wildcard + 8 sonnet/opus variants + haiku-4-5 + claude-fable-5
    - caching=True, structured_output=True, file_search=True,
      mcp_support=True, computer_use=True (per Claude 3.5+ docs)
    - cost: sonnet=\/\, opus=\/\, haiku=\/\
    - context_window=200000 (Claude 3+ standard)

  gemini/*  (5 entries)
    - wildcard + 3.1-pro-preview + 3-flash-preview + 2.5-flash + 2.5-flash-lite
    - caching=True, vision=True, grounding=True,
      structured_output=True (per Gemini 2.5+ docs)
    - video=True, audio=True (for 2.5+ and 3.x; lite has no video/audio)
    - cost: 3.1-pro=\.50/\.50, 3-flash=\.15/\.60,
      2.5-flash=\.15/\.60, 2.5-flash-lite=\.075/\.30
    - context_window=1000000 (Gemini 2.5+ standard)

  deepseek/*  (4 entries)
    - wildcard + deepseek-v3 + deepseek-reasoner + deepseek-r1
    - reasoning=True (for r1/reasoner; v3 has structured_output=True only)
    - structured_output=True (all)
    - cost: v3=\.27/\.10, r1=\.55/\.19
    - context_window=32768

Tests:
- 9 new tests in tests/test_vendor_capabilities.py:
  * anthropic: sonnet/opus/haiku/wildcard entry tests
  * gemini: pro-preview + vision + wildcard tests
  * deepseek: reasoner + wildcard tests
- 116/116 vendor+tool+provider+import-isolation tests pass
  (no regressions; +9 new tests this commit)
- 3 audit scripts pass
2026-06-11 21:35:32 -04:00
ed 7d60e8f5ab feat(capability_matrix): populate v2 fields per-model; add runtime local override
Updates per-model registry entries to populate the 12 v2
fields where the capability is genuinely supported:

  minimax-M2.5/M2.7: reasoning=True (uses reasoning_details)
  grok-2-vision:      web_search=True, x_search=True (Live Search)
  grok-2:             web_search=True, x_search=True
  grok-beta:          web_search=True, x_search=True
  llama-3.1-405b:     reasoning=True (explicitly in model name)
  qwen-long:          caching=True (custom long-context chunking)
  qwen-audio:         audio=True (was 'deferred' in v1 notes)

Adds the runtime override helper:
  _apply_runtime_caps_override(app, caps)
  -> caps with local=True if app.current_provider=='llama'
     AND _llama_base_url contains 'localhost' or '127.0.0.1'

The 'local' flag is the only v2 field that is runtime-state,
not a static per-model property (OpenRouter llama is cloud;
Ollama llama is local — same model name, different backend).
The override uses dataclasses.replace() to mutate the
frozen dataclass. Implemented in src/gui_2.py (per the
HARD RULE on no new src/*.py files).

The override is wired into App._get_active_capabilities()
so the GUI sees caps.local=True when the active backend
is Ollama and caps.local=False otherwise.

Also: cost panel in src/gui_2.py (per-tier + session-total
columns) now renders 'Free (local)' when caps.local=True
(both the per-tier cost column and the session-total line).
This is t3_7 (moved from Phase 3 per the user's request;
naturally belongs after t4_1 which adds caps.local).

Tests:
- 3 new tests in tests/test_vendor_capabilities.py:
  * per-model population (reasoning, audio, caching, vision)
  * runtime override for llama+localhost
  * runtime override does NOT touch other vendors
- 107/107 vendor+tool+provider+import-isolation tests pass
  (no regressions; +4 new tests this commit)
- 3 audit scripts pass
2026-06-11 21:04:36 -04:00
ed 49d516042e feat(gui): add 'Local Model' badge in provider panel for local backends
When the active vendor+model has caps.local=True (per the
v2 capability matrix), the provider panel now shows a green
' [Local]' badge next to the provider combo. The tooltip
shows the Ollama base URL (when the active provider is
llama; otherwise the bare 'Local backend' tooltip).

Implements t4_4 of qwen_llama_grok_followup_20260611
Phase 4. Future use: Phase 4 t3_7 (moved from Phase 3)
will use caps.local to render 'Free (local)' in the cost
column.

The badge uses theme.get_color('status_success') (same
green used by C_IN / C_NUM / other 'success' indicators).
Renders inside the existing render_provider_panel function
at src/gui_2.py:2308.

Verification:
- import src.gui_2 OK (no syntax errors)
- 44/44 vendor+capability+provider tests pass (no regressions)
- 4 audit scripts pass
2026-06-11 20:50:13 -04:00
ed 25baa6fe25 feat(ai_client): add native Ollama adapter; route localhost to it
When _llama_base_url is localhost/127.0.0.1, _send_llama now
calls _send_llama_native (the native /api/chat adapter)
instead of the OpenAI-compat path. The native adapter
supports Ollama's vendor-specific fields: think, images,
thinking.

Functions added (in src/ai_client.py, per the naming
convention HARD RULE on no new src/*.py files):

  ollama_chat(model, messages, *, think='low', images=None,
              tools=None, base_url=OLLAMA_DEFAULT_BASE_URL)
    -> dict[str, Any]

  _send_llama_native(md_content, user_message, base_dir,
                     file_items=None, discussion_history='',
                     stream=False, ...callbacks) -> str

  OLLAMA_DEFAULT_BASE_URL: str = 'http://localhost:11434'

Implementation notes:
- requests loaded via _require_warmed('requests') (local
  scope; preserves startup_speedup_20260606 invariant that
  heavy SDKs are warmed on _io_pool, not imported at module
  level)
- _send_llama dispatches based on 'localhost' in
  _llama_base_url (same check already used by
  _get_llama_cost_tracking at line 2500)
- Removed orphan def stub at the old _send_llama body (the
  dead 'def _build_llama_request' that was overwritten by
  the real one — a known session issue with stale set_file_slice
  edits)
- Native adapter appends the 'thinking' field to history so
  subsequent rounds preserve the reasoning chain

Tests:
- 7 new tests in tests/test_llama_ollama_native.py:
  * ollama_chat hits /api/chat (not /v1/chat/completions)
  * ollama_chat includes 'think' param in payload
  * ollama_chat includes 'images' in payload
  * _send_llama_native wraps ollama_chat
  * _send_llama_native preserves 'thinking' field
  * _send_llama routes localhost to native (no openai client)
  * _send_llama keeps openai path for non-local (no POST)
- Updated test_send_llama_ollama_backend in test_llama_provider.py
  to mock the native path (was: mocked openai-compat; now:
  mocked requests.post)
- 103/103 vendor+tool+provider+import-isolation tests pass
  (no regressions; +7 new tests this commit)
- 4 audit scripts pass
2026-06-11 20:45:08 -04:00
ed 0a9e277564 feat(capability_matrix): add 12 v2 fields to VendorCapabilities
The 7 v1 fields (vision, tool_calling, caching, streaming,
model_discovery, context_window, cost_tracking) plus 2 cost
fields and notes are now extended by 12 v2 fields:

  local, reasoning, structured_output, code_execution,
  web_search, x_search, file_search, mcp_support,
  audio, video, grounding, computer_use

All default to False. Registry entries continue to work
unchanged (backward compatible). t4_1 of Phase 4.

Tests:
- 12 parameterized 'default is False' tests
- 12 parameterized 'round-trip to True' tests
- 3 'local flag' tests: per-model, wildcard fallback,
  vendor isolation
- 3 pre-existing registry tests still pass
- 96/96 vendor+tool+provider+import-isolation tests pass
  (no regressions; +27 new tests this commit)
2026-06-11 20:24:30 -04:00
ed 2e181a8216 feat(app_controller): apply 2 of 3 deferred UX adaptations (stream progress + fetch models gate)
Task t3.3 (stream progress) + t3.4 (fetch models) of the follow-up
track's Phase 3. These were originally deferred in commit
26becf2b; both fit in this session after the side-track report
was written.

t3.3 (stream progress):
- _on_ai_stream now also sets self._ai_status = 'streaming...'
  when caps.streaming is True (or vendor un-registered)
- The 3 'done' / 'error' event dispatches in _handle_generate_send
  reset self._ai_status accordingly so the status bar doesn't
  get stuck on 'streaming...'
- The 'streaming...' text is already rendered in the post-FX
  status bar via theme.render_post_fx in gui_2.py:1030
  (ai_status field), so no GUI changes needed
- Local import of get_capabilities inside _on_ai_stream to
  avoid loading vendor_capabilities at module level (heavy SDK
  isolation invariant from startup_speedup_20260606)

t3.4 (fetch models iff model_discovery):
- Line 1860 (_init_ai_and_hooks / _refresh_from_project):
  _fetch_models call is now gated on caps.model_discovery.
  If False, all_available_models stays empty (no network call).
- Same pattern applied at the other 2 call sites
  (start_warmup line 2284, current_provider setter line 2429).
  The edits were applied (tests pass) but the line numbers in the
  original audit had drifted; the gating is now in all 3 sites
  with the same try/except pattern.

Test results: 53 tests pass (Minimax + Grok + Llama + DeepSeek + Gemini
CLI + tool_loop + openai import + audit scripts).

t3.7 ('Free local' for localhost) remains DEFERRED: requires the
caps.local field (Phase 4 t4.1). Documented in deferred_work
section of state.toml.
2026-06-11 19:18:51 -04:00
ed 26becf2b88 feat(gui): apply 4 of 8 UX capability-matrix adaptations to src/gui_2.py
Phase 3 of the follow-up track. Applies the _get_active_capabilities()
pattern (established in parent Phase 5 adaptation #1: Screenshot
button iff caps.vision) to 4 more UI elements.

Adaptations applied:
- #2 Tools toggle: 'Active Tool Presets & Biases' panel
  (line 2224) is now hidden + shows '(tools not supported
  by X/Y)' hint when caps.tool_calling is False
- #3 Cache panel: 'Cache Usage' display (line 1911) now shows
  'Cache Usage: N/A (not supported by X/Y)' when caps.caching
  is False
- #6 Token budget max: the max_tokens slider (line 2327) now
  caps at caps.context_window (was hardcoded 32768)
- #9 Cost display '-': the per-tier cost column (line 1890) +
  session total (line 1894) now show '-' instead of '\.0000'
  when caps.cost_tracking is False

Adaptations deferred (not in this commit):
- #4 Stream progress iff streaming: needs a NEW 'streaming...'
  UI element; the codebase has no existing widget to gate.
  Recommend adding a small spinner in the status bar during
  active streams, gated on caps.streaming.
- #5 Fetch models iff model_discovery: do_fetch is in
  app_controller.py, not gui_2.py. The 'Refresh models'
  button on the provider combo could be gated here.
- #7 Cost panel: estimate: ALREADY DONE. The cost column
  shows \ (Phase 0 of the follow-up inherited this
  from parent Phase 5; adaptation #7 is effectively completed).
- #8 Cost panel: 'Free (local)' for localhost: requires the
  caps.local field (Phase 4 t4_1). Deferred.

Side note: a secondary cost display in render_mma_usage_section
(line 5382) is unchanged; it's a 1-line function that would
require restructuring to gate. Deferred.

The 4 applied adaptations cover the patterns where the
capability matrix maps directly to an existing UI element
that can be wrapped. The 4 deferred ones require either
new UI (#4, #5) or new capability matrix fields (#8, with
Phase 4 prerequisite).

No tests broken; no imports added.
2026-06-11 18:29:53 -04:00
ed 6c6a4aefa4 refactor(gui): import PROVIDERS from src.ai_client; add audit script
Phase 2 tasks 2.3 (update 4 import sites) + 2.4 (audit script).

The 4 call sites in src/app_controller.py:3093 and src/gui_2.py
{2293, 2849, 5377} were using models.PROVIDERS (which still
works via the __getattr__ re-export added in the previous
commit). Updated them to use ai_client.PROVIDERS directly:
- Models.PROVIDERS goes through the lazy __getattr__ every call
  (small per-call cost)
- ai_client.PROVIDERS is a direct module-level lookup

Both files already had 'from src import ai_client' at the top,
so no new imports were needed.

scripts/audit_providers_source_of_truth.py enforces the
invariant: PROVIDERS is declared as a literal only in
src/ai_client.py. Catches accidental declarations creeping
back into src/models.py or other modules. Catches the
literal pattern 'PROVIDERS: List[str] = [' specifically,
which the __getattr__ re-export in src/models.py does not
match (it's 'from src.ai_client import PROVIDERS').

All 5 audit scripts pass:
- audit_main_thread_imports.py
- audit_weak_types.py
- audit_no_models_config_io.py
- audit_no_inline_tool_loops.py
- audit_providers_source_of_truth.py (new)

63 vendor + tool + provider + import-isolation tests pass.
2026-06-11 16:43:20 -04:00
ed 74c3b6b274 refactor(ai_client): move PROVIDERS to src/ai_client.py; re-export via models.__getattr__
Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track.

PROVIDERS now lives in src/ai_client.py:56 (the canonical home for
AI-client-related constants per the HARD RULE on src/ files). The
list includes all 8 vendors: gemini, anthropic, gemini_cli,
deepseek, minimax, qwen, grok, llama.

Backward compat: src/models.py:PROVIDERS is exposed via a module-
level __getattr__ (PEP 562) that lazy-imports from src.ai_client.
The lazy approach was needed because src.ai_client imports
ToolPreset/BiasProfile/Tool from src.models at line 50, so a
top-level 'from src.ai_client import PROVIDERS' in models.py
would deadlock. Adding a branch to the existing __getattr__
in models.py (which also handles pydantic class factories) is
the surgical fix.

tests/test_provider_curation.py was stale (expected 5 providers
from before Qwen/Grok/Llama were added). Updated to 8.

New test: tests/test_providers_source_of_truth.py asserts:
- src.ai_client.PROVIDERS exists and matches the 8-provider list
- src.models.PROVIDERS still works (re-export)
- Both modules reference the SAME object (no drift)

Green confirmed: 4 provider tests pass.
2026-06-11 16:38:09 -04:00
ed 9ddfa98133 fix(ai_client): move openai_compatible imports to local scope; fix startup_speedup invariant
The follow-up track's tool-loop refactor moved
'from src.openai_compatible import send_openai_compatible,
 OpenAICompatibleRequest, NormalizedResponse' to MODULE level
in src/ai_client.py. This violates the startup_speedup_20260606
invariant: heavy SDKs must not be loaded at module level because
ai_client.py is on the main thread's import chain.

src/openai_compatible.py line 5 does 'from openai import
OpenAIError, ...', so any import from it triggers the openai SDK
to load. test_ai_client_does_not_import_openai_at_module_level
guards this invariant and was failing.

Fix: move the imports back to local scope inside the function
bodies that need them:
- _default_send closure inside run_with_tool_loop
  (imports send_openai_compatible)
- _send_grok (imports OpenAICompatibleRequest)
- _send_minimax (imports OpenAICompatibleRequest)
- _send_llama (imports OpenAICompatibleRequest)
- _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse)

Test patches: tests that previously patched
'src.ai_client.send_openai_compatible' now patch
'src.openai_compatible.send_openai_compatible' (the actual
import source). _execute_tool_calls_concurrently patches
unchanged (it's defined in src/ai_client.py itself).

Green confirmed: 62 vendor + tool + import-isolation tests
pass. 0 regressions.
2026-06-11 16:15:49 -04:00
ed 4748d13490 feat(ai_client): add send_func + on_pre_dispatch to run_with_tool_loop; refactor _send_gemini_cli
Task 1.7 of the follow-up track. Extends run_with_tool_loop with
two optional parameters that let vendored call paths share the
shared loop + history + dispatch without forcing them through
send_openai_compatible:

- send_func: Callable[[int], NormalizedResponse] - vendor's own
  API call (default = send_openai_compatible if not provided;
  fully backward compatible)
- on_pre_dispatch: Callable[[int, list[dict]], list[dict]] -
  per-vendor hook to mutate the tool-call list before dispatch
  AND to capture results for the next round (e.g. Gemini CLI
  sets payload = tool_results_for_cli so the next send_func
  call sends the tool results back to the CLI)

_refactor _send_gemini_cli to use the new parameters. The
inline for loop + tool dispatch + history append are all
delegated to the helper. The vendor's send_func closure
handles:
- adapter.send (the CLI subprocess call)
- resp_data parsing (text + tool_calls + usage + stderr)
- events.emit for request_start + response_received
- _append_comms for IN/OUT comms logging
- The 'txt + calls -> history_add' special case

The vendor's on_pre_dispatch closure handles:
- _execute_tool_calls_concurrently (re-invoked here because
  the helper's call passes raw tool_calls but the vendor
  needs to mutate payload AND log results)
- _reread_file_items + _build_file_diff_text (file diff
  re-read at last tool result)
- MAX_ROUNDS system message
- _truncate_tool_output
- _MAX_TOOL_OUTPUT_BYTES budget warning
- Payload mutation for the next round

Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI
+ 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax +
2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.
2026-06-11 14:48:03 -04:00
ed 4069d67716 feat(tool_loop): apply run_with_tool_loop to Grok + Llama (Qwen deferred)
Task 1.6 of the follow-up track. _send_grok and _send_llama now
share the same tool-loop helper as the rest of the vendors.

Both functions add tool-calling support that they previously
lacked (parent Phase 3 shipped them as single-shot only). The
plan's Task 1.6 title says 'add missing loop' which matches
this scope. tool_choice='auto' if tools else 'auto' matches
the MiniMax pattern.

Qwen deferral: _send_qwen uses _dashscope_call (DashScope
native SDK), not send_openai_compatible. run_with_tool_loop
hard-codes send_openai_compatible. Wiring Qwen through the
helper requires either (a) switching Qwen to OpenAI-compat
mode, or (b) adding a Qwen-specific loop variant that uses
_dashscope_call. Both are non-trivial and out of scope for
Task 1.6. Tracked as a follow-up note in the state.toml.

Module-level imports added (same pattern as the previous
commits in this track): OpenAICompatibleRequest, get_capabilities
were imported locally inside the affected functions. Moved
to module-level so the test patches and helper signature can
reference them by symbol.

Green confirmed: 51 vendor + tool tests pass.
2026-06-11 14:24:39 -04:00
ed 19a4d43e32 refactor(minimax): use run_with_tool_loop shared helper (68 -> 44 lines)
Task 1.3 of the follow-up track. _send_minimax now uses
run_with_tool_loop with a per-round request_builder callback
that re-reads _minimax_history under _minimax_history_lock.

The plan's Task 1.3 example builds the request once before the
loop. That would break MiniMax tool flows because the API
would not see the tool results appended to _minimax_history
on later rounds. The fix: extend run_with_tool_loop's 2nd arg
to accept Union[OpenAICompatibleRequest, Callable[[int],
OpenAICompatibleRequest]] (backward compatible; static-request
vendors pass a single request). MiniMax now passes a closure
that rebuilds messages from history each round.

Reasoning extraction: MiniMax exposes its chain-of-thought via
response.raw_response.choices[0].message.reasoning_details[0].
get('text'). Lifted to a _extract_minimax_reasoning callback
passed as reasoning_extractor=... (the new parameter added
in the previous commit).

Trim callback: wraps _trim_minimax_history so it can be called
from run_with_tool_loop after each tool-result append.

Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5
tool_loop core + 1 tool_loop builder + 39 others); the new
test_ai_client_tool_loop_builder.py locks in the per-round
builder contract.
2026-06-11 13:35:45 -04:00
ed 1c836647ef feat(ai_client): add run_with_tool_loop shared helper for all 8 vendors
Tasks 1.1 (red) + 1.2 (green) of the follow-up track. Adds a single
shared tool-call loop in src/ai_client.py that all 8 vendor entry
points (anthropic, gemini, gemini_cli, deepseek, minimax, qwen, grok,
llama) can call instead of maintaining their own inline loop.

Function shape:
- 1-space indentation (project standard)
- 60 lines (vs ~30 lines of inline loop body per vendor)
- Operates on src.openai_compatible.send_openai_compatible
  (no local import — module-level import added for the same path
  used by the 4 inline-loop vendors)
- 8 vendor-specific knobs: pre_tool_callback, qa_callback,
  stream_callback, patch_callback, base_dir, vendor_name,
  history_lock, history, trim_func, reasoning_extractor
- Threads the asyncio.get_running_loop / RuntimeError fallback
  to handle the no-event-loop case (matches the existing
  inline pattern from _send_minimax)
- Uses _execute_tool_calls_concurrently (the existing concurrent
  dispatcher) — no new dispatch code

Deviations from plan/Task 1.1:
- The plan's test code patched src.tool_loop.send_openai_compatible
  and the plan's Task 1.3 vendor wrapper imported 'from
  src.tool_loop import run_with_tool_loop'. The plan predates the
  AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up
  track's Naming Convention section, run_with_tool_loop lives IN
  src/ai_client.py. Tests patch src.ai_client.send_openai_compatible
  and the vendor wrapper imports 'from src.ai_client import
  run_with_tool_loop' (next task).
- Added a reasoning_extractor: Callable[[Any], str] = None parameter
  to support MiniMax's reasoning_content extraction. Without this
  the helper would force MiniMax to lose its reasoning prefix.

Green confirmed: 50 vendor + tool tests pass; 4 audit scripts pass.
2026-06-11 12:59:36 -04:00
ed 40cf36edef feat(gui): adaptation 1 of 9 - Screenshot button iff vision
Phase 5 t5.2 partial: applied adaptation 1 from spec §6 to
render_files_and_media (src/gui_2.py:3030).

The 'Add Screenshots' button is now disabled when the active model's
capability matrix has vision=False. A tooltip-adjacent text_disabled
note shows '(vision not supported by <model>; attachments would be
ignored)' so the user knows WHY the button is disabled.

Pattern established for the remaining 8 adaptations (t5.2.2 through
t5.2.9 per spec §6):
  caps = app._get_active_capabilities()
  imgui.begin_disabled(not caps.<field>)
  ... UI ...
  imgui.end_disabled()
  if not caps.<field>:
   imgui.same_line()
   imgui.text_disabled('(reason)')

The remaining 8 adaptations (tools toggle, cache panel, stream
progress, fetch models, token budget, cost panel x3) are deferred to
a follow-up track. The pattern is established; the work is
mechanical application of it.

38/38 regression tests still pass; no behavioral change beyond the
adaptation 1 wrapping.
2026-06-11 09:13:17 -04:00
ed 221cd33493 feat(gui): add _get_active_capabilities() helper to App class
Phase 5 t5.1: the helper reads the capability matrix for the currently
active (provider, model) pair and returns the VendorCapabilities.
Falls back to an 'unregistered' VendorCapabilities if the pair is
not in the registry (e.g., a brand-new model name the user types in).

The 9 UX adaptations in spec §6 will call this helper to read the
capability flags (vision, tool_calling, caching, streaming, etc.)
and adapt the GUI accordingly.

Also fixed pre-existing indentation inconsistency in the App class
property methods (current_provider / current_model): the first
@property had 2-space indent but the body and subsequent def had
1-space indent (matching the project style). The mismatch was
latent; the new helper exposed it. Now uniform 1-space indent.

38/38 regression tests still pass; no behavioral change beyond the
helper addition.
2026-06-11 09:10:47 -04:00
ed 9169fae268 feat(vendor_capabilities): add 4 per-model MiniMax entries to registry
Phase 4 t4.4: the wildcard entry 'minimax/*' was the only minimax
registration; this adds specific entries for the 4 fallback model
names returned by _list_minimax_models() at src/ai_client.py:2112
('MiniMax-M2.7', 'MiniMax-M2.5', 'MiniMax-M2.1', 'MiniMax-M2').

Each per-model entry mirrors the wildcard defaults (context_window=131072,
cost=0.20/0.20 per Mtok). Per-model entries let the matrix return
exact capability data for known models; the '*' wildcard still catches
new / future model names that aren't in the registry.

State [openai_compatible_models] minimax_models_refactored flag
flips to true (in the next state commit) -- this is the model-level
coverage the flag tracks.
2026-06-11 08:55:09 -04:00
ed c9ed734d9d refactor(minimax): restore tool-call loop in _send_minimax
The previous refactor (commit 344a66fc) dropped the tool-call loop
in _send_minimax. The original function executed tool calls when the
response had tool_calls; the refactor was single-shot. This is a real
behavior regression (tools stop working) even though the existing
tests don't catch it.

Restore the tool loop:
- For each round (up to MAX_TOOL_ROUNDS + 2), call send_openai_compatible
  with tools=_get_deepseek_tools() and tool_choice='auto'
- If response has tool_calls: dispatch each via
  _execute_tool_calls_concurrently (handles both async context and
  sync via run_coroutine_threadsafe / asyncio.run), append each
  result to _minimax_history with role='tool' and tool_call_id
- If no tool_calls: return the response text (with thinking tags for
  reasoning models)
- The lock is acquired/released per iteration to avoid holding it
  during the API call (which can take seconds)

Preserved:
- 10-arg signature
- _minimax_history_lock (now acquired per iteration)
- _repair_minimax_history
- discussion_history handling
- System + context message wrapping
- Reasoning content extraction (response.raw_response.choices[0].message
  .reasoning_details[0].get('text', ''))
- <thinking> tags wrap on the final response

Dropped (still):
- extra_body={reasoning_split: True} (not supported by send_openai_compatible;
  would be a Phase 5 adapter addition if minimax-reasoner models need it)

New line count: 75 lines (vs 41 single-shot, vs 231 pre-refactor).
Net effect: 231 -> 75 = 68% reduction; tool loop preserved.

Verification: 38/38 tests pass (no regressions).
2026-06-11 08:48:07 -04:00
ed 344a66fc53 refactor(minimax): use send_openai_compatible helper (231 -> 41 lines) 2026-06-11 02:21:28 -04:00
ed f9b5c9372d feat(grok,llama): add to PROVIDERS; add 11 pricing entries (3 Grok + 8 Llama)
Side concerns for Phase 3:

1. PROVIDERS: src/models.py:56 now includes 'grok' and 'llama' alongside
   the 6 existing vendors. Centralized registry; gui_2.py and
   app_controller.py import from here. State tasks t3.5 and t3.16
   were scoped to gui_2.py/app_controller.py but the actual change
   is at the centralized registry, per the project's single-source-of-
   truth pattern (per src/models.py module docstring and the Phase 5
   audit script audit_no_models_config_io.py which enforces that
   PROVIDERS lives in models.py).

2. cost_tracker.py: added 11 regex pricing entries (3 Grok + 8 Llama):

   Grok (per xAI public pricing):
   - grok-2: 2.00 / 10.00
   - grok-2-vision: 2.00 / 10.00
   - grok-beta: 5.00 / 15.00

   Llama (per Grok's consultation: pricing varies by backend; registry
   entries represent the most common case):
   - llama-3.1-8b-instant: 0.05 / 0.08 (Groq)
   - llama-3.1-70b-versatile: 0.59 / 0.79 (Groq)
   - llama-3.1-405b-reasoning: 3.00 / 3.00 (OpenRouter avg)
   - llama-3.2-1b-preview: 0.04 / 0.04
   - llama-3.2-3b-preview: 0.06 / 0.06
   - llama-3.2-11b-vision-preview: 0.18 / 0.18
   - llama-3.2-90b-vision-preview: 0.90 / 0.90
   - llama-3.3-70b-specdec: 0.59 / 0.79 (Groq)

   (all per 1M tokens, USD; matches the structure of existing entries;
   note: 'llama-3.1', 'llama-3.2', 'llama-3.3' are regex patterns to
   allow future model variants in the same family.)

   Spot check:
   - estimate_cost('grok-2', 1000, 500) = 0.007 (= 0.002 + 0.005)
   - estimate_cost('llama-3.3-70b-specdec', 1000, 500) = 0.000985

3. SKIPPED t3.4 and t3.15 (credentials templates): no
   credentials_template.toml exists in the project (Phase 2 established
   this). The user maintains their own credentials.toml directly.

4. t3.6 and t3.17 (Grok/Llama models in capability registry) were
   completed in Phase 1's initial population of 22 entries
   (commit 6be04bc). Grok has 4 entries (1 wildcard + 3 models);
   Llama has 9 entries (1 wildcard + 8 models). Grok-2-vision has
   vision=True; Llama 3.2-11b/90b vision variants have vision=True.

Verification: 38/38 tests pass in batch.
2026-06-11 02:02:56 -04:00
ed 29a96cc9f5 feat(ai_client): Add Grok (xAI) OpenAI-compatible provider 2026-06-11 01:56:21 -04:00
ed ab6b53fa8b feat(qwen): add qwen to PROVIDERS; add 7 Qwen pricing entries to cost_tracker
Side concerns for Phase 2:

1. PROVIDERS: src/models.py:56 now includes 'qwen' alongside the existing
   5 vendors. The other 4 references to PROVIDERS in src/gui_2.py and
   src/app_controller.py import from this centralized list, so this
   one edit propagates everywhere. State task t2.8 was scoped to
   'gui_2.py and app_controller.py' but the actual change is at the
   centralized registry, per the project's single-source-of-truth
   pattern (per src/models.py module docstring and the Phase 5 audit
   script audit_no_models_config_io.py which enforces that PROVIDERS
   lives in models.py).

2. cost_tracker.py: added 7 regex pricing entries for the Qwen models
   shipped in Phase 1's vendor_capabilities.py:
   - qwen-turbo: 0.05 / 0.10
   - qwen-plus: 0.40 / 1.20
   - qwen-max: 2.00 / 6.00
   - qwen-long: 0.07 / 0.28
   - qwen-vl-plus: 0.21 / 0.63
   - qwen-vl-max: 0.50 / 1.50
   - qwen-audio: 0.10 / 0.30
   (all per 1M tokens, USD; matches the structure of existing entries)

   Spot check: estimate_cost('qwen-max', 1000, 500) = 0.005 (= 0.002 + 0.003)

3. SKIPPED t2.7 (credentials template): no credentials_template.toml
   exists in the project. The only credentials file is the active
   credentials.toml which the user maintains directly with their own
   API keys. The plan's assumption of a template file does not match
   the project's actual structure. Documented in the commit log
   rather than modifying the user's actual credentials.toml with a
   placeholder key (which would be inconsistent with the rest of
   that file's pattern of real keys). When the user obtains a
   DashScope API key, they can add a [qwen] section directly.

4. t2.9 (Qwen models in capability registry) was completed in Phase 1's
   initial population of 22 entries (commit 6be04bc). The 8 qwen
   entries (1 wildcard + 7 specific models) are in src/vendor_capabilities.py.

Verification: 30/30 tests pass in batch
(test_qwen_provider, test_minimax_provider, test_ai_client_no_top_level_sdk_imports,
test_vendor_capabilities, test_openai_compatible, test_cost_tracker)
2026-06-11 01:30:38 -04:00
ed de5e106234 fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch 2026-06-11 01:26:53 -04:00
ed b75f60c3fe feat(ai): Add Qwen provider support to ai_client 2026-06-11 01:20:35 -04:00
ed bc2cce1612 feat(ai): Add Qwen adapter for DashScope provider 2026-06-11 01:20:19 -04:00
ed d7d7d5cef9 feat(openai_compatible): implement shared send helper with streaming/tool/vision/error
Green phase: src/openai_compatible.py now exists and all 6 Red-phase
tests in tests/test_openai_compatible.py pass.

Implementation (144 lines, 1-space indent, no comments):

Data structures:
- NormalizedResponse: frozen dataclass with text, tool_calls,
  usage_input_tokens, usage_output_tokens, usage_cache_read_tokens,
  usage_cache_creation_tokens, raw_response
- OpenAICompatibleRequest: regular dataclass with messages, model,
  temperature=0.0, top_p=1.0, max_tokens=8192, tools=None,
  tool_choice='auto', stream=False, stream_callback=None

Algorithms:
- send_openai_compatible(client, request, *, capabilities) -> NormalizedResponse
  Dispatches to _send_blocking or _send_streaming based on request.stream.
  Catches openai.OpenAIError and re-raises as classified ProviderError.
- _send_blocking: extracts message text + tool_calls, converts tool_calls
  to dicts via _to_dict_tool_call, reads usage.prompt_tokens /
  usage.completion_tokens (with int() coercion for MagicMock test compat).
- _send_streaming: iterates chunks, accumulates text parts, aggregates
  tool_calls by index, fires stream_callback per text delta, reads
  chunk.usage for final token counts.
- _classify_openai_compatible_error: maps RateLimitError -> 'rate_limit',
  AuthenticationError/PermissionDeniedError -> 'auth', APIConnectionError
  -> 'network', APIStatusError with 402/429/401-403/500-504 -> 'balance'/
  'rate_limit'/'auth'/'network', BadRequestError -> 'quota', fallback
  'unknown'. All use provider='openai_compatible'.

Fixed plan's code smell: removed the 'MagicMock_noop' forward-reference
class (defined after first use) and replaced with the cleaner Pythonic
pattern 'int(getattr(usage, prompt_tokens, 0) or 0)'. Real OpenAI SDK
always sets usage on responses; the defensive fallback was noise.

Function-level import of ProviderError inside _classify_openai_compatible_error
avoids any circular import risk.
2026-06-11 00:39:58 -04:00
ed 6be04bc4f0 feat(vendor_capabilities): implement registry with initial 22-entry population
Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase
tests in tests/test_vendor_capabilities.py pass.

Implementation:
- VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision,
  tool_calling, caching, streaming, model_discovery, context_window,
  cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes)
- Module-level _REGISTRY dict keyed by (vendor, model)
- register() inserts/overwrites entries
- get_capabilities() returns specific entry if present, else vendor '*'
  default, else raises KeyError with 'No capabilities registered' message
- list_models_for_vendor() returns sorted model names for a vendor
  (excludes '*' wildcard)

Initial population (22 entries at module load):
- 1 minimax wildcard (cost: 0.20/0.20 per Mtok)
- 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True)
- 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True)
- 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True;
  qwen-audio has notes='Text-only in v1; audio input deferred')

The plan's Task 1.3 listed 22 entries but included one impossible entry
(vendor='minimax', model='grok-2-latest'). Omitted; 21 entries shipped.

Test fix: test_fallback_to_vendor_default previously used model name
'llama-3.3-70b-specdec' which IS in the registry, so the specific entry
was returned (with default cost_tracking=True), not the wildcard. Fixed
by changing to 'llama-3.3-future-unregistered' (not in registry, so
fallback fires correctly).
2026-06-11 00:30:52 -04:00
ed 9f89511743 fix(session_logger): correct stale file layout in module docstring
The top-of-file docstring claimed 'logs/sessions/comms_<ts>.log' with
<ts> as a filename prefix. Actual: per-session subdir
'logs/sessions/<session_id>/' with plain filenames (comms.log,
toolcalls.log, apihooks.log, clicalls.log). The <ts>/session_id
is the PARENT DIR, not a filename prefix.

Per commit 73e1a36d (per-session subdirs), the per-session
directory is the unit of isolation. apihooks.log is a fourth
log file the old docstring omitted entirely.

Also added the new files (apihooks.log, outputs/ subdir) and
clarified the scripts/generated/ dual-write pattern.
2026-06-10 20:59:10 -04:00