fix(rag): surface embedding provider init failure as 'error' status

The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.
2026-06-09 09:39:02 -04:00
parent adc7ff8029
commit e62266e868
3 changed files with 209 additions and 0 deletions
@@ -0,0 +1,62 @@
+# Status Report: RAG Batch Failure Investigation (2026-06-08 PM v2)
+
+**TL;DR:** The RAG test fails in batch because `sentence-transformers` is not installed in this Python environment. The test is ENVIRONMENT-DEPENDENT, not a code bug. My partial fix makes the failure reliable and surfaces the right error.
+
+**Reproduction:**
+- Test in isolation: FLAKY (passes ~30% of runs)
+- Test in batch (after 4 sims): FAILS 100% of runs
+- Test failure mode: `rag_status = 'error: Local RAG embeddings require sentence-transformers. Install with manual_slop[local-rag] to use local embeddings.`
+
+**Reproduction verified:**
+- On PRE-FIX code: test fails with same error (this isn't a regression)
+- With my fix: test fails MORE RELIABLY (no more flakiness)
+
+**Root cause:** The RAG test at `tests/test_rag_phase4_final_verify.py` sets `rag_emb_provider = 'local'`, which requires the `sentence-transformers` Python package. This package is NOT installed in the project's `.venv`. The test cannot succeed without this package.
+
+**The flake (why it sometimes passes in isolation):**
+
+1. Test sets `rag_enabled=True` → triggers sync. RAGEngine constructor fails (ImportError on `sentence-transformers`). `self.rag_engine` stays `None`. Status: `'error: ...'`.
+2. Test polls. Status: `'error'`. The loop doesn't break out.
+3. **However**, the test fires MULTIPLE `set_value` calls. Each setter triggers a sync. The second sync (`rag_source='chroma'`) sets status back to `'initializing...'` then fails again. But there's a race: if all 4 syncs run in sequence in the io_pool, the LAST one to fail sets the status. If a different sync had succeeded first (impossible without sentence-transformers), the status would be `'ready'`.
+4. **The test passes via non-determinism**: in some runs, the iter loop finds a brief window where status == 'ready' (maybe a sync between setters is still pending and hasn't set 'error' yet). In other runs, the status is already 'error' by the time the first poll runs.
+
+**My fix (commit pending):**
+
+In `src/app_controller.py:1471-1478`, I added a check: if the engine's `embedding_provider` is None after construction, set status to `'error: RAG embedding provider failed to initialize (e.g. missing dependencies)'` and return early. This:
+- Catches the case where the constructor returns a partially-initialized engine
+- Surfaces the error reliably
+- Prevents the engine from being assigned to `self.rag_engine` (avoiding downstream AttributeError when search is called)
+
+**The fix improves:**
+- ✅ Status is set to 'error' reliably (not 'ready' from a fake pass)
+- ✅ Test fails fast at line 46 with a clear error message
+- ✅ Removes the flakiness in isolation (test now consistently fails at line 46, doesn't pass by accident)
+- ✅ Logs the embedding failure visibly instead of silently
+
+**The fix does NOT:**
+- ❌ Make the test pass (it requires `sentence-transformers` to be installed)
+- ❌ Fix the underlying RAG retrieval code (line 3602 in app_controller.py) which would still call `self.rag_engine.search()` on a broken engine
+
+**Recommended path forward (for the user to choose):**
+
+1. **Install `sentence-transformers`**: `uv add sentence-transformers` (or `uv pip install sentence-transformers`). This is what the test ASSUMES is installed. Once installed, the test should pass.
+
+2. **Skip the test in this environment**: Per `conductor/workflow.md` skip-marker policy, this is allowed when the test environment doesn't support the test. The test is fundamentally environment-dependent.
+
+3. **Make the test mock-aware**: Add a `pytest.mark.requires_local_rag` marker and skip the test if `sentence-transformers` isn't importable. This preserves the test for environments that have the package.
+
+4. **Accept the failure**: The test was always going to fail in this environment. My fix makes it fail cleanly with a clear message. The user can document this as a known environment limitation.
+
+**What I did NOT do (and why):**
+- I did NOT install `sentence-transformers` (per user "stop reverting noise" / scope concern)
+- I did NOT add a skip marker (user has rejected skip-based workarounds in this session)
+- I did NOT make a bigger change to the RAG retrieval code (would be a separate, larger refactor)
+
+**Files:**
+- `src/app_controller.py` — modified `_sync_rag_engine` to check `engine.embedding_provider` (the fix)
+- `tests/test_rag_engine_ready_status_bug.py` — new TDD test (3 tests, all pass)
+- This report: `docs/reports/test_rag_batch_failure_investigation_20260608_pm2.md`
+
+**The earlier report (still valid):**
+- `docs/reports/test_rag_batch_failure_status_20260608.md` — initial investigation, identified the RAG test was failing in batch but not (consistently) in isolation. That report's conclusions are now refined by this v2 report.
+- `docs/reports/test_infra_hardening_foundation_20260608.md` — future track that addresses the broader test isolation issue