Private
Public Access
0
0

conductor(audit): document _sync_rag_engine race in controller

This commit is contained in:
2026-06-09 15:29:17 -04:00
parent aebbd66836
commit 5e13fa9ba7
@@ -0,0 +1,68 @@
# _sync_rag_engine Race Audit
## Setters that trigger sync (direct callers)
- `rag_enabled.setter` (src/app_controller.py:1499)
- `rag_source.setter` (src/app_controller.py:1509)
- `rag_emb_provider.setter` (src/app_controller.py:1519)
- `rag_collection_name.setter` (src/app_controller.py:1557)
- `__init__` when `rag_config.enabled` is True (src/app_controller.py:1844)
## Indirect triggers
- `_rebuild_rag_index` is called from `_sync_rag_engine` itself (line 1481) when engine is empty and `self.files` is non-empty
- `ui_file_paths` setter (line 1576) changes `self.files` but does NOT call `_sync_rag_engine` directly; subsequent `_sync_rag_engine` calls see the new files
## Submit pattern (src/app_controller.py:1460-1490)
```
def _sync_rag_engine(self):
self._set_rag_status("initializing...")
def _task():
try:
from src import rag_engine
engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root)
if engine.embedding_provider is None:
self._set_rag_status("error: RAG embedding provider failed to initialize (e.g. missing dependencies)")
return
with self._rag_engine_lock:
self.rag_engine = engine
if self.rag_engine and self.rag_engine.is_empty() and self.files:
self._rebuild_rag_index()
else:
self._set_rag_status("ready")
except Exception as e:
self._set_rag_status(f"error: {e}")
sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")
sys.stderr.flush()
self.submit_io(_task)
```
## Coalescing mechanism
NONE. Every setter call immediately submits a fresh task to the io_pool. There is no debounce, no token check, no dirty flag.
## Lock
`self._rag_engine_lock` exists (line 1482) but only protects the assignment of `self.rag_engine = engine`. The construction of `RAGEngine(...)` runs WITHOUT the lock, so two tasks can be building engines simultaneously.
## Race scenario
1. Test fires `set_rag_collection_name("name_A")` → submit task T1 to io_pool
2. Test fires `set_rag_enabled(True)` 50ms later → submit task T2 to io_pool
3. T1 starts on io_pool thread #1, starts constructing `RAGEngine(self.rag_config, ...)` with collection_name="name_A"
4. T2 starts on io_pool thread #2, starts constructing `RAGEngine(self.rag_config, ...)` with collection_name="name_B"
5. T1 finishes first, acquires `_rag_engine_lock`, sets `self.rag_engine = engine_A` (collection_name="name_A")
6. T2 finishes, acquires lock, sets `self.rag_engine = engine_B` (collection_name="name_B") ← LAST WRITER WINS
7. Test queries `self.rag_engine.vector_store.collection_name` → gets "name_B" (the most recent setter)
8. But the engine was constructed with whatever the controller's rag_config was AT THE TIME of construction. If `_rebuild_rag_index` was called from T1 with files that exist at the time, but T2's engine_A already had different state...
## Why this is non-deterministic
- T1's engine may have indexed files using its config snapshot
- T2's engine may have indexed DIFFERENT files using ITS config snapshot
- Whichever finishes LAST is the one that survives
- The test may have set `rag_collection_name=A` expecting that to be used; but T2 (which set `rag_enabled=True` later) wins the race, and engine_B has `collection_name=B` not A
## Fix outline (for Phase 4)
1. Add to `__init__`: `self._rag_sync_token: int = 0`, `self._rag_sync_dirty: bool = False`, `self._rag_sync_lock: threading.Lock`
2. In `_sync_rag_engine`: increment token, set dirty=True, submit task with current token
3. In the task: check if token is still current. If not, return early (a newer sync will pick up the changes). If yes, build the engine, check dirty again, if clean return, else loop to pick up new changes.
## Files affected
- src/app_controller.py:1460 (_sync_rag_engine method)
- src/app_controller.py:1037 area (AppController.__init__ state)
- New test: tests/test_sync_rag_engine_coalescing.py (Phase 4 Task 4.1.3)