conductor(tracks): Register qwen_llama_grok_integration_20260606 in registry (item 0d)

2026-06-06 14:56:55 -04:00
parent 7c1d597ef1
commit 055430a75a
1 changed file with 4 additions and 0 deletions
@@ -157,6 +157,10 @@ User review surfaced five outstanding UI issues, each previously attempted witho
   *Goal: Replace alphabetical 4-at-a-time batching in `scripts/run_tests_batched.py` with fixture-class-isolated tiers: 0 (opt-in: clean_install/docker, gated on env var + --include-opt-in flag), 1 (unit, grouped by subsystem batch_group, pytest-xdist), 2 (mock_app, grouped), 3 (live_gui, all in one pytest invocation to amortize 15s startup), H (headless), P (performance, last). Hybrid classification: auto-infer from filename + AST fixture scan, hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files. Opt-in per-test order control via `[[files.X.test_order]]` sub-tables, gated on a conftest-loaded pytest plugin (no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library+dry-run, shadow run, switch default, cleanup.*
   *Goal: Reduce `sloppy.py` startup time by ~2000-2400ms. **Main Thread Purity Invariant**: main thread (entering `immapp.run()`) never imports a module heavier than `imgui_bundle` + lean `gui_2` skeleton. **No-prefetch rule**: heavy SDKs (`google.genai` 955ms, `anthropic` 430ms, `openai` 445ms, `fastapi` 470ms) are lazy-only — paid once on first use, on the asyncio thread, not in the background. **No-new-threads rule**: all background work goes through `AppController._io_pool` (4-thread `ThreadPoolExecutor`, named `controller-io-N`); zero new `threading.Thread(...)` calls in `src/`. **Enforcement**: static `scripts/audit_main_thread_imports.py` CI gate + runtime `tests/test_main_thread_purity.py` (`sys.addaudithook` test). 9 phases, 57 tasks. Target: `import src.ai_client` < 50ms (from ~1800ms), `import src.gui_2` < 500ms (from ~3000ms), `live_gui.wait_for_server(timeout=15)` no longer times out.*

+0d. [ ] **Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix** `[track-created: 7c1d597e]`
+   *Link: [./tracks/qwen_llama_grok_integration_20260606/](./tracks/qwen_llama_grok_integration_20260606/), Spec: [./tracks/qwen_llama_grok_integration_20260606/spec.md](./tracks/qwen_llama_grok_integration_20260606/spec.md), Plan: [./tracks/qwen_llama_grok_integration_20260606/plan.md](./tracks/qwen_llama_grok_integration_20260606/plan.md) (to be authored by writing-plans skill)*
+   *Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a **Vendor Capability Matrix** (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in `src/vendor_capabilities.py`. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared `send_openai_compatible()` helper in `src/openai_compatible.py` that operates on a normalized request/response data structure; each `_send_<vendor>()` is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor `_send_minimax()` to use the helper (~250 lines → ~50). **Out of scope** (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive.*
+
 0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
   *Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
 0a. [ ] **Track: prior_session_test_harden_20260605** [superseded by live_gui_test_hardening_v2_20260605]
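A minimal sketch of the Vendor Capability Matrix idea from item 0d, assuming a frozen dataclass keyed by `(vendor, model)` with a per-vendor wildcard fallback. The real layout of `src/vendor_capabilities.py` is not shown in this commit, so `Capabilities`, `get_capabilities`, `ui_state`, the `"*"` wildcard, the model name, and every capability value below are hypothetical placeholders, not statements about what any vendor supports. Only the seven capability names and the seven UI elements named in the goal text come from the registry entry.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Capabilities:
    """The seven v1 capabilities named in the track goal (all off by default)."""
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: Optional[int] = None  # tokens; None means unknown
    cost_tracking: bool = False


# Hypothetical entries: placeholder values for illustration only.
# ("vendor", "*") is an assumed per-vendor default used when a model is unlisted.
_MATRIX: Dict[Tuple[str, str], Capabilities] = {
    ("grok", "*"): Capabilities(tool_calling=True, streaming=True),
    ("grok", "example-vision-model"): Capabilities(
        vision=True, tool_calling=True, streaming=True, context_window=8192
    ),
}


def get_capabilities(vendor: str, model: str) -> Capabilities:
    """Exact (vendor, model) match, then vendor wildcard, then all-off default."""
    return (
        _MATRIX.get((vendor, model))
        or _MATRIX.get((vendor, "*"))
        or Capabilities()
    )


def ui_state(vendor: str, model: str) -> Dict[str, bool]:
    """Map capabilities to enabled/disabled flags for the UI elements in the goal."""
    caps = get_capabilities(vendor, model)
    return {
        "screenshot_button": caps.vision,
        "tools_toggle": caps.tool_calling,
        "cache_panel": caps.caching,
        "stream_progress": caps.streaming,
        "fetch_models": caps.model_discovery,
        "token_budget": caps.context_window is not None,
        "cost_panel": caps.cost_tracking,
    }
```

With a lookup like this, the GUI would consult `ui_state(vendor, model)` once per selection change rather than branching on vendor names, and an unknown vendor degrades to every element disabled.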