manual_slop/conductor/code_styleguides/chroma_cache.md
ed 01ea22fc4a docs(styleguide): add chroma_cache.md — chroma DB path and cleanup pattern
Lesson 5 from the 4-day test-hell saga. The chroma cache lives at
tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run
live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in
Path(active_project_path).parent places the cache one level higher
than expected.

RAG tests must pre-clean the cache to avoid persistent state from
prior batched runs. Documents the cleanup pattern (shutil.rmtree with
ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim),
and 3 anti-patterns (assuming per-run, not cleaning, asserting on
first chunk in batched context).
2026-06-10 20:18:09 -04:00

# Chroma Cache Path Styleguide
## The Rule
The ChromaDB persistent vector cache lives at:
```
<project_root>/tests/artifacts/.slop_cache/chroma_<collection_name>/
```
**NOT** at the per-run `tests/artifacts/live_gui_workspace_<timestamp>/` subdir.
Tests that interact with RAG **MUST** pre-clean the cache to avoid persistent state from prior tests in the batched run.
## Why This Rule Exists
The chroma cache path is derived automatically in `RAGEngine._init_vector_store()` (`src/rag_engine.py:108-125`):
```python
db_path = os.path.abspath(os.path.join(
    self.base_dir, ".slop_cache", f"chroma_{vs_config.collection_name}"
))
```
`self.base_dir` is computed as `Path(active_project_path).parent`. **The trailing-slash bug**: when the test config produces a project path ending in `/` (e.g., from `os.path.join` with a trailing `/`), `pathlib` drops the trailing slash before resolving `.parent`, so `Path(p).parent` returns the directory ONE LEVEL HIGHER than expected. The chroma cache therefore lands at `tests/artifacts/.slop_cache/`, directly under the parent of the per-run `live_gui_workspace_<timestamp>/` subdir, instead of inside the per-run subdir.
This was the dominant cause of `tier-3-live_gui` failures in the 2026-06-08 to 2026-06-10 window. A prior batched run with a different embedding provider (e.g., Gemini 3072-dim vs local 384-dim) leaves a collection on disk whose embedding dimension no longer matches the current provider. The next test's `search()` then raises `chromadb.errors.InvalidDimensionError: Collection expecting embedding with dimension of X, got Y`, the AI request never reaches `'done'` status, and the live_gui test's polling loop times out after 50×0.5s = 25s.
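The trailing-slash behaviour described in this section can be reproduced in isolation. The sketch below uses `posixpath` and `PurePosixPath` so that it behaves the same on any OS. The workspace name and the `chroma_test` collection are hypothetical placeholders, and the code does not touch `RAGEngine` itself.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical per-run workspace path (placeholder timestamp).
workspace = "tests/artifacts/live_gui_workspace_20260610"

# Joining with an empty component appends a trailing slash.
with_slash = posixpath.join(workspace, "")

# String-based dirname only strips what follows the last slash,
# so the workspace directory itself is preserved.
dirname_result = posixpath.dirname(with_slash)

# pathlib normalises the trailing slash away first, so .parent
# climbs to the directory that contains the workspace.
parent_result = str(PurePosixPath(with_slash).parent)

# A cache path built from that parent sits beside the workspace,
# not inside it.
cache_dir = PurePosixPath(parent_result) / ".slop_cache" / "chroma_test"

print(dirname_result)
print(parent_result)
print(cache_dir)
```

The difference between the two results is the whole bug: code that expects `dirname`-style semantics gets a base directory one level up when it goes through `Path(...).parent`.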
## The Pre-Cleanup Pattern
RAG tests should wipe the chroma cache BEFORE pushing RAG config. The pattern is in `tests/test_rag_phase4_final_verify.py`:
```python
from pathlib import Path
import shutil


def test_phase4_final_verify(live_gui):
    # Wipe any stale chroma from prior batched runs
    cache = Path("tests/artifacts/.slop_cache/chroma_test_final_verify")
    if cache.exists():
        shutil.rmtree(cache, ignore_errors=True)
    # ... rest of test
```
`ignore_errors=True` is required because:
- On Windows, the chroma client may still hold file handles; `rmtree` may fail with `WinError 32` (sharing violation).
- If a parallel xdist worker is mid-write, the `rmtree` can race with it; `ignore_errors` keeps the cleanup from raising, so the test proceeds instead of failing during setup.
The `_validate_collection_dim()` mechanism in `RAGEngine` (`src/rag_engine.py:127-213`) also auto-recovers by wiping the dim-mismatched collection (see [docs/guide_rag.md](../docs/guide_rag.md#dimension-mismatch-protection)). But pre-cleaning is faster and avoids the stderr warning.
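When several RAG tests need the same cleanup, the pattern can be factored into a small helper. The following is a minimal sketch, not code from the repository: `wipe_chroma_cache` and its `artifacts_root` parameter are hypothetical names, and the default assumes the tests run with the project root as the working directory.

```python
import shutil
from pathlib import Path


def wipe_chroma_cache(
    collection_name: str,
    artifacts_root: Path = Path("tests/artifacts"),
) -> Path:
    """Remove the persistent chroma directory for one collection.

    Returns the path that was targeted, whether or not it existed.
    """
    cache = artifacts_root / ".slop_cache" / f"chroma_{collection_name}"
    # ignore_errors=True: a missing directory, a locked file, or a
    # racing worker must not make the test fail during setup.
    shutil.rmtree(cache, ignore_errors=True)
    return cache
```

A test would call `wipe_chroma_cache("test_final_verify")` as its first step, before any RAG config is pushed. Because errors are swallowed, the directory may survive on Windows if a handle is still open; in that case the `_validate_collection_dim()` recovery remains the fallback.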
## Anti-Patterns
**Assuming the cache is per-run:**
```python
def test_rag(live_gui, live_gui_workspace):
    # WRONG: live_gui_workspace is a per-run subdir, but the chroma
    # cache is at tests/artifacts/.slop_cache/, NOT under live_gui_workspace
    cache = live_gui_workspace / ".slop_cache" / "chroma_test"
    if cache.exists():
        shutil.rmtree(cache)  # Doesn't find the actual cache
```
**Not pre-cleaning at all:**
```python
def test_rag(live_gui):
    # WRONG: no pre-cleanup. If a prior batched run with a different
    # embedding provider is on disk, this test will hit dim-mismatch
    client = ApiHookClient()
    client.push_event("set_value", {"field": "rag_enabled", "value": True})
    # ... eventually hangs polling for 'done' status
```
**Asserting on the FIRST retrieved chunk:**
```python
assert "Manual Slop RAG is great" in entry.get("content")
# WRONG: in batched context, the chroma ordering may rank a .py
# file first instead of the .txt file. Either file's content
# proves RAG worked; the assertion must accept either.
```
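One way to write an order-independent assertion is to check every retrieved chunk against every acceptable snippet. This is a sketch under assumptions: `any_chunk_contains` is a hypothetical helper, the entries are assumed to be dicts with a `"content"` key as in the snippet above, and the `.py` snippet text is invented for illustration.

```python
def any_chunk_contains(entries, expected_snippets):
    """Return True if any retrieved chunk contains any expected snippet."""
    for entry in entries:
        # Treat a missing or None "content" as empty text.
        content = entry.get("content") or ""
        if any(snippet in content for snippet in expected_snippets):
            return True
    return False


# Hypothetical retrieval result in which the .py chunk ranks first.
entries = [
    {"content": "def helper():\n    return 'indexed python source'"},
    {"content": "Manual Slop RAG is great"},
]

found = any_chunk_contains(
    entries,
    ["Manual Slop RAG is great", "indexed python source"],
)
assert found
```

The helper also avoids the `TypeError` that `"..." in entry.get("content")` raises when an entry has no content.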
## When in Doubt
If a RAG test is flaky in batched runs but passes in isolation, the chroma cache is the #1 suspect. The test's actual chroma path is `Path("tests/artifacts/.slop_cache") / f"chroma_{collection_name}"`. Wipe it before the test starts.
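To check whether stale collections are actually present before blaming the cache, the shared directory can be listed directly. The sketch below is a hypothetical diagnostic, not part of the test suite; `list_chroma_caches` is an invented name and the default root assumes the project root is the working directory.

```python
from pathlib import Path


def list_chroma_caches(artifacts_root: Path = Path("tests/artifacts")) -> list[str]:
    """Return the names of chroma_* directories currently on disk, sorted."""
    cache_root = artifacts_root / ".slop_cache"
    if not cache_root.is_dir():
        return []
    return sorted(p.name for p in cache_root.glob("chroma_*") if p.is_dir())


if __name__ == "__main__":
    for name in list_chroma_caches():
        print(name)
```

Any `chroma_<collection_name>` entry that appears before the test has run is left over from an earlier run.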
## Related
- [docs/guide_testing.md §Chroma Cache Path and Cross-Test Pollution](../docs/guide_testing.md) — broader context in the testing guide
- [docs/guide_rag.md §Dimension Mismatch Protection](../docs/guide_rag.md) — the auto-recovery mechanism
- [conductor/code_styleguides/workspace_paths.md](./workspace_paths.md) — sibling styleguide for test workspace paths
- [docs/reports/test_infrastructure_hardening_batch_green_20260610.md](../docs/reports/test_infrastructure_hardening_batch_green_20260610.md) — the 6-lesson summary this styleguide is sourced from