conductor(tracks): mark rag_phase4_stress_test_flake resolved (commit 16412ad5)

2026-06-06 11:29:03 -04:00
parent 16412ad5f9
commit 9d72d98b50
1 changed file with 2 additions and 3 deletions
@@ -152,9 +152,8 @@ User review surfaced five outstanding UI issues, each previously attempted witho
 0. [ ] **Track: Sloppy.py Startup Speedup**
   *Status: 2026-06-05 — Surfaced during regression_fixes_20260605 root-cause analysis. `sloppy.py --enable-test-hooks` startup latency has crept up; live_gui fixtures time out at `wait_for_server(timeout=15)`. Hypothesized cause: too much init work on the main thread (FastAPI hook server bring-up, log pruner retry loops, MCT startup). Plan: profile startup, move heavy init off the main thread to the controller's background thread pool, defer non-critical subsystems to lazy-init on first use. Spec/plan to follow.*

-0b. [ ] **Track: rag_phase4_stress_test_flake_20260606**
-   *Status: 2026-06-06 — Surfaced during post-v2 verification test run. `test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim` passes in isolation (14.83s) but fails in the full batched run (24.56s) with `AssertionError: Modified context not found in discussion`. Pattern: 50 dummy files indexed, 1 modified, RAG re-indexed successfully, but the modified content doesn't appear in the AI's discussion context. Hypothesized causes: (a) test-ordering flake with the session-scoped live_gui fixture (most likely), (b) timing issue with the RAG retrieval into the AI context, (c) real bug in the RAG->AI context pipeline that only surfaces with 50+ files. Plan: investigate determinism first (run with different batch orders), then check the RAG->context assembly code path. Spec/plan to follow.*
-
+0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
+   *Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
 0a. [ ] **Track: prior_session_test_harden_20260605** [superseded by live_gui_test_hardening_v2_20260605]
   *Status: 2026-06-05 — Surfaced during live_gui_fragility_fixes_20260605 execution. `test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders` is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call the narrow `render_prior_session_view` (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.*
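The dimension check described in the `0b.` status note can be sketched as follows. This is a minimal illustration, not the code from commit 16412ad5. The helper names `first_vector_dim` and `validate_collection_dim` and the callback-style `recreate` parameter are hypothetical. The sketch assumes the stored embeddings are available as a sequence of vectors and that an empty collection needs no check.

```python
import sys
from typing import Callable, Optional, Sequence


def first_vector_dim(embeddings: Optional[Sequence[Sequence[float]]]) -> Optional[int]:
    """Return the dimension of the first stored vector, or None if nothing is stored.

    Uses len() rather than truthiness so it also works if the store hands back
    a NumPy array (whose truth value is ambiguous).
    """
    if embeddings is None or len(embeddings) == 0:
        return None
    return len(embeddings[0])


def validate_collection_dim(
    existing: Optional[Sequence[Sequence[float]]],
    provider_dim: int,
    recreate: Callable[[], None],
    name: str = "collection",
) -> bool:
    """Recreate the collection when its stored vectors don't match the provider.

    Hypothetical helper: `existing` is a sample of stored embeddings (the first
    one is enough), `provider_dim` is the current embedding provider's output
    size (e.g. 3072 for the Gemini embeddings vs 384 for the local
    SentenceTransformers model in the note above), and `recreate` drops and
    rebuilds the collection. Returns True if the collection was recreated.
    """
    stored_dim = first_vector_dim(existing)
    if stored_dim is None or stored_dim == provider_dim:
        # Empty or compatible collection: keep it as-is.
        return False
    # Mismatch: warn on stderr, then rebuild instead of upserting into a
    # collection whose searches will fail.
    print(
        f"[RAG] {name}: stored embedding dim {stored_dim} != provider dim "
        f"{provider_dim}; recreating collection",
        file=sys.stderr,
    )
    recreate()
    return True
```

With the ChromaDB Python client, `existing` could come from `collection.get(limit=1, include=["embeddings"])["embeddings"]`, and `recreate` could wrap `client.delete_collection(name)` followed by `client.create_collection(name)`. The commit does not show how `RAGEngine._init_vector_store` actually wires this up. Passing `recreate` as a callback keeps the dimension check testable without a live vector store, which mirrors the intent of the two regression tests named in the note.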