docs(styleguide): add chroma_cache.md — chroma DB path and cleanup pattern
Lesson 5 from the 4-day test-hell saga. The chroma cache lives at tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in Path(active_project_path).parent places the cache one level higher than expected. RAG tests must pre-clean the cache to avoid persistent state from prior batched runs. Documents the cleanup pattern (shutil.rmtree with ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim), and 3 anti-patterns (assuming per-run, not cleaning, asserting on first chunk in batched context).
# Chroma Cache Path Styleguide

## The Rule

The ChromaDB persistent vector cache lives at:

```
<project_root>/tests/artifacts/.slop_cache/chroma_<collection_name>/
```

**NOT** at the per-run `tests/artifacts/live_gui_workspace_<timestamp>/` subdir.

Tests that interact with RAG **MUST** pre-clean the cache so they do not inherit state left on disk by earlier tests in the same batched run.

## Why This Rule Exists

The chroma cache path is auto-derived from `RAGEngine._init_vector_store()` (`src/rag_engine.py:108-125`):

```python
db_path = os.path.abspath(os.path.join(
    self.base_dir, ".slop_cache", f"chroma_{vs_config.collection_name}"
))
```

`self.base_dir` is computed as `Path(active_project_path).parent`. **The trailing-slash bug**: the test config produces a project path that names the per-run workspace directory itself and ends in `/` (e.g., from `os.path.join` with a trailing `/`). With `os.path.dirname` semantics, such a path resolves to the workspace. However, `pathlib` normalizes the trailing `/` away, so `Path(p).parent` returns the directory ONE LEVEL HIGHER than expected. The chroma cache therefore lands in `tests/artifacts/.slop_cache/`, under `tests/artifacts/` (the parent of the per-run `live_gui_workspace_<timestamp>/` subdir), instead of inside the per-run subdir.

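You can check the pathlib behavior behind this bug in isolation. The sketch below is a minimal illustration, not code from `RAGEngine`. The workspace name and collection name are made-up examples. The sketch also drops the `os.path.abspath` call so the output is the same on every machine. `PurePosixPath` and `posixpath` keep the separator fixed at `/` regardless of OS.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical per-run workspace path, written with a trailing slash.
project_path = "tests/artifacts/live_gui_workspace_20260608/"

# pathlib normalizes the trailing slash away, so .parent steps OUT of the workspace.
pathlib_base = PurePosixPath(project_path).parent

# os.path.dirname only drops what follows the final "/", which here is nothing,
# so it returns the workspace itself.
os_path_base = posixpath.dirname(project_path)

# Same shape as the db_path formula above (minus abspath); collection name assumed.
collection_name = "test_final_verify"
db_path = pathlib_base / ".slop_cache" / f"chroma_{collection_name}"

print(pathlib_base)  # tests/artifacts
print(os_path_base)  # tests/artifacts/live_gui_workspace_20260608
print(db_path)       # tests/artifacts/.slop_cache/chroma_test_final_verify
```

`pathlib` treats `.../live_gui_workspace_20260608/` and `.../live_gui_workspace_20260608` as the same path. As a result, `.parent` always steps out of the workspace whether or not the slash is present.
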
This was the dominant cause of `tier-3-live_gui` failures in the 2026-06-08 to 2026-06-10 window. A prior batched run with a different embedding provider (e.g., Gemini 3072-dim vs. local 384-dim) leaves a collection with the wrong embedding dimension on disk. The next test's `search()` raises `chromadb.errors.InvalidDimensionError: Collection expecting embedding with dimension of X, got Y`, and the AI request never reaches `'done'` status. The live_gui test's polling loop then times out after 50 × 0.5 s = 25 s.

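The symptom is a timeout rather than an error because the test only ever polls for a status. Below is a minimal sketch of that kind of loop, assuming a hypothetical `get_status()` callable. `wait_for_done` is an illustrative name, not the harness's actual helper.

```python
import time
from typing import Callable


def wait_for_done(get_status: Callable[[], str], attempts: int = 50, interval: float = 0.5) -> bool:
    """Poll get_status() until it reports 'done' or the attempt budget runs out.

    With the defaults, a request that never finishes costs 50 * 0.5 s = 25 s.
    """
    for _ in range(attempts):
        if get_status() == "done":
            return True
        time.sleep(interval)
    return False


# A request whose search() raised InvalidDimensionError never reports 'done',
# so the loop can only exit by exhausting its budget (interval shortened for the demo).
print(wait_for_done(lambda: "pending", attempts=3, interval=0.0))  # False
```

From the loop's point of view, a crashed request and a slow one look identical until the budget runs out.
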
## The Pre-Cleanup Pattern

RAG tests should wipe the chroma cache BEFORE pushing RAG config. The pattern is in `tests/test_rag_phase4_final_verify.py`:

```python
from pathlib import Path
import shutil


def test_phase4_final_verify(live_gui):
    # Wipe any stale chroma cache left by prior batched runs.
    # The relative path assumes pytest runs from the repository root.
    cache = Path("tests/artifacts/.slop_cache/chroma_test_final_verify")
    if cache.exists():
        shutil.rmtree(cache, ignore_errors=True)
    # ... rest of test
```

`ignore_errors=True` is required because:

- On Windows, the chroma client may still hold file handles, so `rmtree` can fail with `WinError 32` (sharing violation).
- If a parallel xdist worker is mid-write, `rmtree` can race with it. `ignore_errors` keeps that race from raising, so the next worker's write can retry.

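When several tests share this cleanup, you can factor it into a small helper. This is a minimal sketch. `wipe_chroma_cache` and `CACHE_ROOT` are hypothetical names, and the relative root assumes pytest runs from the repository root. The helper returns whether the directory is actually gone, because `ignore_errors=True` silences a failed delete (for example, a Windows sharing violation) instead of reporting it.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical constant: the shared cache root, relative to the repository root.
CACHE_ROOT = Path("tests/artifacts/.slop_cache")


def wipe_chroma_cache(collection_name: str, cache_root: Path = CACHE_ROOT) -> bool:
    """Best-effort removal of one collection's chroma cache directory.

    Returns True if the directory no longer exists afterwards. Because of
    ignore_errors=True, a locked file leaves the directory (partly) in place
    instead of raising, so callers can check the result and warn.
    A missing directory is not an error.
    """
    cache = cache_root / f"chroma_{collection_name}"
    shutil.rmtree(cache, ignore_errors=True)
    return not cache.exists()


# Demo against a throwaway directory rather than the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "chroma_demo" / "index").mkdir(parents=True)
    (demo_root / "chroma_demo" / "index" / "data.bin").write_bytes(b"\x00")
    removed = wipe_chroma_cache("demo", cache_root=demo_root)
    missing_ok = wipe_chroma_cache("never_created", cache_root=demo_root)

print(removed, missing_ok)  # True True
```

In a test, call `wipe_chroma_cache("test_final_verify")` before pushing RAG config. A `False` result means the stale cache may still be on disk.
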
As a fallback, the `_validate_collection_dim()` mechanism in `RAGEngine` (`src/rag_engine.py:127-213`) auto-recovers by wiping the dim-mismatched collection (see [docs/guide_rag.md](../docs/guide_rag.md#dimension-mismatch-protection)). Pre-cleaning is still preferred: it is faster and avoids the stderr warning that the recovery path emits.

## Anti-Patterns

❌ **Assuming the cache is per-run:**

```python
import shutil


def test_rag(live_gui, live_gui_workspace):
    # WRONG: live_gui_workspace is a per-run subdir, but the chroma
    # cache is at tests/artifacts/.slop_cache/, NOT under live_gui_workspace
    cache = live_gui_workspace / ".slop_cache" / "chroma_test"
    if cache.exists():  # False: the actual cache is not here
        shutil.rmtree(cache)  # Never runs, so the stale cache survives
```

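A test that only has the workspace fixture can still reach the right directory. The per-run workspace sits directly under `tests/artifacts/`, so the cache directory is its sibling. Below is a minimal sketch under that layout assumption. `chroma_cache_for` is a hypothetical helper and the timestamp is made up. `PurePosixPath` is used only so the printed path is OS-independent; a `Path` from the fixture behaves the same way.

```python
from pathlib import PurePath, PurePosixPath


def chroma_cache_for(workspace: PurePath, collection_name: str) -> PurePath:
    """Map a per-run workspace path to the shared chroma cache directory.

    Assumes workspace == tests/artifacts/live_gui_workspace_<timestamp>, so the
    cache at tests/artifacts/.slop_cache/ lives next to it, not inside it.
    """
    return workspace.parent / ".slop_cache" / f"chroma_{collection_name}"


ws = PurePosixPath("tests/artifacts/live_gui_workspace_20260608")
print(chroma_cache_for(ws, "test"))  # tests/artifacts/.slop_cache/chroma_test
```
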
❌ **Not pre-cleaning at all:**

```python
def test_rag(live_gui):
    # WRONG: no pre-cleanup. If a prior batched run with a different
    # embedding provider left its collection on disk, this test hits dim-mismatch
    client = ApiHookClient()  # project test-harness client (import omitted)
    client.push_event("set_value", {"field": "rag_enabled", "value": True})
    # ... eventually times out polling for 'done' status
```

❌ **Asserting on the FIRST retrieved chunk:**

```python
# WRONG: in batched context, the chroma ordering may rank a .py
# file first instead of the .txt file. Either file's content
# proves RAG worked; the assertion must accept either.
assert "Manual Slop RAG is great" in entry.get("content")
```

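An order-independent version checks the retrieved content for a marker from either indexed file. This is a minimal sketch. `any_expected_content` is a hypothetical helper, and the `.py` marker string is a placeholder for whatever distinctive text the real `.py` fixture contains.

```python
from typing import Iterable, Optional


def any_expected_content(content: Optional[str], markers: Iterable[str]) -> bool:
    """Return True if the retrieved chunk contains ANY of the expected markers.

    Chroma may rank either indexed file first in a batched run, so the check
    accepts content from the .txt file or the .py file. None counts as a miss.
    """
    text = content or ""
    return any(marker in text for marker in markers)


# Marker from the .txt fixture (quoted in the anti-pattern) plus a hypothetical .py marker.
EXPECTED_MARKERS = ("Manual Slop RAG is great", "def rag_fixture_helper")

entry = {"content": "def rag_fixture_helper():\n    return 42\n"}  # .py file ranked first
assert any_expected_content(entry.get("content"), EXPECTED_MARKERS)
```
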
## When in Doubt

If a RAG test is flaky in batched runs but passes in isolation, the chroma cache is the #1 suspect. The test's actual chroma path is `Path("tests/artifacts/.slop_cache") / f"chroma_{collection_name}"`. Wipe it before the test starts.

## Related

- [docs/guide_testing.md §Chroma Cache Path and Cross-Test Pollution](../docs/guide_testing.md): broader context in the testing guide
- [docs/guide_rag.md §Dimension Mismatch Protection](../docs/guide_rag.md): the auto-recovery mechanism
- [conductor/code_styleguides/workspace_paths.md](./workspace_paths.md): sibling styleguide for test workspace paths
- [docs/reports/test_infrastructure_hardening_batch_green_20260610.md](../docs/reports/test_infrastructure_hardening_batch_green_20260610.md): the 6-lesson summary this styleguide is sourced from