# RAG (Retrieval-Augmented Generation)

[Top](../Readme.md) | [Architecture](guide_architecture.md) | [MMA](guide_mma.md) | [Tools & IPC](guide_tools.md) | [Simulations](guide_simulations.md)

---

## Overview

Manual Slop integrates Retrieval-Augmented Generation (RAG) to extend the AI's working context beyond the explicit file list. When a project is RAG-enabled, the system maintains a vector index of file content; AI calls can retrieve semantically similar fragments at query time and prepend them to the prompt.

The RAG implementation is pluggable: the vector store, the embedding provider, and the chunking strategy are all configurable per project. The standard backend for real use is **ChromaDB** (local persistent; note that `RAGConfig` itself defaults its `vector_store` to the no-op `mock` provider), the default embedding is **Gemini Embedding 001** (cloud), and the default chunking is **character-based with overlap**, with **AST-aware chunking** available for Python files.

This guide covers:

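1. **Architecture**: where RAG fits in the dispatch pipeline
2. **Components**: `RAGEngine`, embedding providers, vector store, chunkers
3. **Data Flow**: indexing, query, retrieval, injection
4. **Configuration**: `RAGConfig` schema and TOML settings
5. **Cross-System Integration**: `ai_client`, MMA workers, and the GUI
6. **Testing**: test infrastructure and patterns
7. **Data-Oriented Error Handling**: `Result`-based returns and the nil sentinel
8. **Edge Cases & Limitations** and **Future Work**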

---

## Architecture

RAG sits between the project's tracked files and the AI provider's input prompt. It is **not** an internal AI call — it is a pre-processing step that augments `md_content` before the provider sees it.

```
                ┌─────────────────────────────────┐
                │ AppController / ConductorEngine │
                │ (caller of ai_client.send)      │
                └────────────┬────────────────────┘
                             │ constructs RAGEngine once per project
                             ▼
        ┌────────────────────────────────────────────┐
        │ RAGEngine                                  │
        │  ├─ EmbeddingProvider (Local or Gemini)    │
        │  ├─ VectorStore (ChromaDB persistent)      │
        │  └─ Chunkers (_chunk_text, _chunk_code)    │
        └────────────┬───────────────────────────────┘
                     │ on every ai_client.send() call:
                     │   rag_engine.search(user_message) -> fragments
                     ▼
        ┌────────────────────────────────────────────┐
        │ ai_client.send(rag_engine=...)             │
        │   injects [RETRIEVED CONTEXT] block        │
        │   into md_content before provider call     │
        └────────────────────────────────────────────┘
```

**Lifecycle**:
- The `AppController` constructs a single `RAGEngine` per project load (lazily, when the project is first opened or when a RAG-related setting changes).
- The `RAGEngine` is passed through to `ai_client.send()` for every AI call from the main discussion flow.
- For Tier 3 workers spawned by the MMA, the caller (not the ConductorEngine itself) is responsible for constructing the engine, typically with the same configuration as the main discussion (see MMA Worker Integration below).
- If a project disables RAG, `rag_engine=None` is passed to `send()` and the integration is a no-op.

**Why caller-owned?** The RAG engine is decoupled from `ai_client` so that the same module can be reused by the GUI's RAG panel for direct queries, by MMA workers for ticket-specific retrieval, and by future automation scripts. `ai_client` only knows how to *use* an engine if one is provided.

---

## Components

### `RAGEngine` (`src/rag_engine.py`)

The central class. Owns the embedding provider and the vector store, exposes high-level methods for indexing and search.

```python
class RAGEngine:
    def __init__(self, config: models.RAGConfig, base_dir: str = "."):
        ...
```

**Construction**: Takes a `RAGConfig` (from `src/models.py`) and a `base_dir`. The config specifies the embedding provider type, the nested vector store settings (`VectorStoreConfig`), the chunk size, and the chunk overlap.

**Internal state**:
- `embedding_provider: BaseEmbeddingProvider` — set by `_init_embedding_provider`
- `client: chromadb.PersistentClient` — the chroma client (or the string `"mock"` in mock mode)
- `collection: chromadb.Collection` — the actual collection (or `"mock"` in mock mode)
- `chunk_size: int` — character count per chunk
- `chunk_overlap: int` — overlap between adjacent chunks

### Embedding Providers

Two providers are implemented; new ones can be added by subclassing `BaseEmbeddingProvider`.

#### `BaseEmbeddingProvider`

```python
class BaseEmbeddingProvider:
    def embed(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts. Returns one vector per input text."""
        ...
```

The contract: `embed()` takes a list of strings and returns one float vector per input string, all of the same length. The vector dimensionality is provider-specific (e.g., 384 for `all-MiniLM-L6-v2`, 3072 by default for `gemini-embedding-001`).
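
The contract can be checked mechanically in tests. The sketch below is hypothetical (`check_embedding_contract` and the `Embedder` protocol are not part of `src/rag_engine.py`); it assumes only the `embed(texts) -> List[List[float]]` shape described above and returns the provider's dimension, which is the number that matters for the dimension-mismatch check.

```python
from typing import List, Protocol


class Embedder(Protocol):
    """Structural stand-in for BaseEmbeddingProvider: only embed() is assumed."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


def check_embedding_contract(provider: Embedder, texts: List[str]) -> int:
    """Return the provider's vector dimension, or raise ValueError on a contract violation."""
    if not texts:
        raise ValueError("need at least one text to probe the provider")
    vectors = provider.embed(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} vectors, got {len(vectors)}")
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"vectors have mixed dimensions: {sorted(dims)}")
    return dims.pop()
```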

#### `LocalEmbeddingProvider`

Uses **sentence-transformers** (`all-MiniLM-L6-v2` by default) for embedding.

- **Pros**: Fully local, no API quota, deterministic.
- **Cons**: Lower-quality embeddings than cloud models for code; CPU/GPU usage during indexing.
- **Default model**: `all-MiniLM-L6-v2` (384 dimensions, ~80MB download on first use).

```python
class LocalEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
        ...
```

#### `GeminiEmbeddingProvider`

Uses the **Gemini Embedding 001** model via the google-genai SDK.

- **Pros**: Higher-quality embeddings, especially for code; no local model download.
- **Cons**: Requires Gemini API key, network round-trip per embedding call, subject to API quotas.

```python
class GeminiEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'gemini-embedding-001'):
        ...
```

#### Lazy Loading

The heavy dependencies (`sentence_transformers`, `google.genai`, `chromadb`) are loaded lazily via `_get_sentence_transformers()`, `_get_google_genai()`, `_get_chromadb()`. This means RAG is opt-in: a project that doesn't enable RAG pays no import-time cost.
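
A minimal sketch of the lazy-import idea, assuming nothing about the real `_get_*` helpers beyond what is described above (the actual `_get_chromadb()`, for instance, returns a `(chromadb, Settings)` pair, which this generic helper does not). `lazy_import` is a hypothetical name; it defers the import to the first call and reports a missing optional dependency as `None` instead of raising.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Optional


@lru_cache(maxsize=None)
def lazy_import(module_name: str) -> Optional[ModuleType]:
    """Import module_name on first call; None means the optional dependency is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        # lru_cache remembers this None, so a missing module is not re-imported on every call
        return None
```

A caller such as the vector-store init can then turn `None` into a configuration error at use time rather than an `ImportError` at module import time.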

### Vector Store

ChromaDB is the default persistent vector store. The store is created at `<project_dir>/.slop_cache/chroma_<collection_name>/` (auto-generated from `VectorStoreConfig.collection_name`, default `"manual_slop"`). The `.slop_cache` location is intentional — it co-locates the chroma index with the existing per-project cache layout.

```python
def _init_vector_store(self):
    vs_config = self.config.vector_store
    if vs_config.provider == 'chroma':
        db_path = os.path.abspath(os.path.join(
            self.base_dir, ".slop_cache", f"chroma_{vs_config.collection_name}"
        ))
        os.makedirs(db_path, exist_ok=True)
        chromadb, Settings = _get_chromadb()
        self.client     = chromadb.PersistentClient(path=db_path)
        self.collection = self.client.get_or_create_collection(name=vs_config.collection_name)
        self._validate_collection_dim()
    elif vs_config.provider == 'mock':
        self.client     = "mock"
        self.collection = "mock"
    else:
        raise ValueError(f"Unknown vector store provider: {vs_config.provider}")
```

**Backends** (`VectorStoreConfig.provider`):
- `chroma` (default for real use) — local persistent, single-process
- `mock` — no-op collection (for tests / RAG-disabled paths)

The `mcp_server` + `mcp_tool` fields in `VectorStoreConfig` are placeholders for the External RAG Bridge via MCP (e.g., a remote vector database server). A `_search_mcp()` search path exists in `src/rag_engine.py`, but the bridge as a whole is still listed under Future Work.

### Chunking Strategies

Two strategies are implemented. The choice is made per-file based on extension and config.

#### Character-Based (`_chunk_text`)

Default for non-Python files and for Python files when AST chunking is disabled.

```python
def _chunk_text(self, content: str) -> List[str]:
    """Character-based chunking with overlap."""
    chunks = []
    start = 0
    while start < len(content):
        end = min(start + self.chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content): break
        start = end - self.chunk_overlap
    return chunks
```

- **Default chunk size**: 1000 characters
- **Default overlap**: 200 characters
- **Edge cases**: Empty files return `[]`; single-chunk files return `[content]`.
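
The method above never advances if `chunk_overlap >= chunk_size` (`start = end - chunk_overlap` lands at or before the previous start), so a standalone version is worth guarding. This sketch uses the same windowing and the defaults from this section; the up-front validation is an addition of this example, and whether the real `RAGEngine` validates these values elsewhere is not documented here.

```python
from typing import List


def chunk_text(content: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Fixed-size character windows; each starts chunk_size - chunk_overlap after the last."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        # Without this guard, overlap >= size would make the loop below never advance.
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(content):
        end = min(start + chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content):
            break
        start = end - chunk_overlap
    return chunks
```

With the defaults, a 2,500-character file yields windows starting at 0, 800, and 1,600 (lengths 1,000, 1,000, and 900).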

#### AST-Aware (`_chunk_code`)

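Used for `.py` files. `RAGConfig` has no `ast_chunking_enabled` field (see the schema note under Configuration), so AST chunking is not switched on or off through the persisted `[rag]` config.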

```python
def _chunk_code(self, content: str, file_path: str) -> List[str]:
    """AST-aware chunking for Python code."""
    # Parses with stdlib ast
    # Splits on top-level def/class boundaries
    # Each chunk is a complete top-level definition with its docstring
    ...
```

- **Strategy**: Each top-level function, class, or constant block becomes one chunk. Docstrings are preserved as the first line of the chunk for context.
- **Pros**: Semantic boundaries produce more meaningful retrieval results. A query for "how does X work" is more likely to return the entire definition of X rather than a fragment.
- **Cons**: Requires valid Python; syntax errors fall back to character-based chunking.

The chunker uses stdlib `ast` (not tree-sitter) to avoid pulling tree-sitter for a feature that only handles Python.
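
The body of `_chunk_code` is elided above, so the following is only a sketch of the strategy it describes, written with the stdlib `ast` module (Python 3.8+ for `end_lineno`). `chunk_python` is a hypothetical name: it emits one chunk per top-level function or class (decorators included), groups runs of other top-level statements into a single "constant block" chunk, and falls back to character windows on `SyntaxError`. Blank lines and comments between top-level statements are dropped, which the real implementation may handle differently.

```python
import ast
from typing import List

_DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _char_chunks(content: str, size: int = 1000, overlap: int = 200) -> List[str]:
    # Same windowing as _chunk_text; used only when the file does not parse.
    chunks, start = [], 0
    while start < len(content):
        end = min(start + size, len(content))
        chunks.append(content[start:end])
        if end == len(content):
            break
        start = end - overlap
    return chunks


def chunk_python(content: str) -> List[str]:
    """One chunk per top-level def/class; consecutive other statements are grouped."""
    try:
        tree = ast.parse(content)
    except SyntaxError:
        return _char_chunks(content)  # invalid Python: fall back to character windows
    lines = content.splitlines(keepends=True)
    chunks: List[str] = []
    pending: List[str] = []  # current run of top-level non-definition statements
    last_end = 0
    for node in tree.body:
        first = node.lineno
        if isinstance(node, _DEF_NODES) and node.decorator_list:
            first = min(d.lineno for d in node.decorator_list)  # keep decorators
        start = max(first, last_end + 1)  # never re-emit a line (e.g. `a = 1; b = 2`)
        if start > node.end_lineno:
            continue
        source = "".join(lines[start - 1:node.end_lineno])
        last_end = node.end_lineno
        if isinstance(node, _DEF_NODES):
            if pending:
                chunks.append("".join(pending))
                pending = []
            chunks.append(source)
        else:
            pending.append(source)
    if pending:
        chunks.append("".join(pending))
    return chunks
```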

---

## Data Flow

### Indexing Flow

When a project is loaded with RAG enabled, the `RAGEngine` is populated by indexing all tracked files.

```
1. Project load: AppController reads [rag] section from manual_slop.toml
2. AppController constructs RAGEngine(config)
3. RAGEngine._init_vector_store() creates/loads ChromaDB collection
   - Calls _validate_collection_dim() to detect/recover from dim mismatch
4. For each tracked file (parallelized):
     a. Read content
     b. Choose chunker based on extension and config
     c. For each chunk: call embedding_provider.embed([chunk])
     d. Add to vector store with metadata {path, chunk_index, ...}
5. Indexing complete; engine is ready for queries
```

**Parallelization**: The indexing pipeline uses `ThreadPoolExecutor` for parallel embedding calls (the embedding step is the bottleneck). The chunking is fast and sequential per file.
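
A sketch of the parallel embedding step, assuming chunks have already been produced sequentially per file and that each chunk gets its own `embed([chunk])` call as in step 4c. `embed_chunks_parallel` and the `(path, chunk_index, text)` tuple layout are hypothetical; the real pipeline's worker count and data shapes are not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Chunk = Tuple[str, int, str]  # (path, chunk_index, text)


def embed_chunks_parallel(
    chunks: List[Chunk],
    embed: Callable[[List[str]], List[List[float]]],
    max_workers: int = 8,
) -> Dict[Tuple[str, int], List[float]]:
    """Embed already-chunked text concurrently, keyed by (path, chunk_index)."""

    def work(chunk: Chunk) -> Tuple[Tuple[str, int], List[float]]:
        path, index, text = chunk
        return (path, index), embed([text])[0]  # one embed call per chunk, as in step 4c

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though the calls run concurrently
        return dict(pool.map(work, chunks))
```

Threads are a reasonable fit when the dominant cost is network I/O to a cloud embedding API, since the GIL is released while waiting on the socket. Batching several chunks into one `embed()` call is another lever for cutting round-trips.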

**Incremental Updates**: When a file's `mtime` changes (detected by `pathlib.Path.stat().st_mtime`), `delete_documents_by_path()` is called first, then the file is re-indexed. This is critical for the auto-sync flow (see Configuration below).
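
The change-detection half of incremental updates can be sketched as below. `find_changed_files` is hypothetical: it keeps a `path -> st_mtime` map and reports any path whose mtime differs from the last one seen, so the caller can run `delete_documents_by_path(path)` followed by `index_file(path)` for each result.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_changed_files(paths: Iterable[str], last_seen: Dict[str, float]) -> List[str]:
    """Return paths whose st_mtime differs from last_seen, updating last_seen in place."""
    changed: List[str] = []
    for path in paths:
        try:
            mtime = Path(path).stat().st_mtime
        except FileNotFoundError:
            continue  # deleted files are skipped; purging their chunks is up to the caller
        if last_seen.get(path) != mtime:
            last_seen[path] = mtime
            changed.append(path)
    return changed
```

On the first call every file counts as changed, which doubles as the initial indexing pass.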

**Path resolution resilience**: `index_file()` falls back to `os.getcwd()` if the `base_dir`-relative path doesn't exist. This handles batched test conditions where the subprocess CWD differs from the project root (e.g., a test chdir'ing into `tests/artifacts/live_gui_workspace_*/` for fixture isolation). Without the fallback, indexing silently skipped files in those conditions.
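
A sketch of the fallback order described above: the `base_dir`-relative path first, then the process CWD. `resolve_index_path` is a hypothetical helper; returning `None` instead of silently skipping lets the caller surface the miss (for example as an `ErrorInfo` in the `Result` from `index_file()`).

```python
import os
from typing import Optional


def resolve_index_path(path: str, base_dir: str) -> Optional[str]:
    """Try base_dir first, then the process CWD; None means the file is genuinely missing."""
    for root in (base_dir, os.getcwd()):
        candidate = os.path.abspath(os.path.join(root, path))
        if os.path.isfile(candidate):
            return candidate
    return None  # report this to the caller instead of silently skipping the file
```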

### Dimension Mismatch Protection

`_init_vector_store()` calls `_validate_collection_dim()` after creating the collection. The validation inspects the first existing vector's dim and compares it to the current embedding provider's output. On mismatch (e.g., the user switched from Gemini 3072-dim to local 384-dim, or vice versa, or a prior run populated the collection with a different model), the chroma directory is wiped via `shutil.rmtree` (with the client closed first to release file handles) and the collection is recreated with the correct dim.

**Why this exists:** Without validation, dim-mismatched upserts silently corrupt the collection. The next `search()` raises `chromadb.errors.InvalidDimensionError: Collection expecting embedding with dimension of X, got Y`, the AI request never reaches `'done'` status, and the live_gui test's status poll times out after 50 × 0.5 s = 25 s. This pattern was the dominant cause of `tier-3-live_gui` failures in the 2026-06-08 to 2026-06-10 window.

Regression tests in `tests/test_rag_engine.py`: `test_rag_collection_dim_mismatch_recreates_collection`, `test_rag_collection_dim_match_preserves_collection`.
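
The decision logic of the dimension check can be sketched without ChromaDB. `wipe_if_dim_mismatch` is hypothetical: `stored_sample` stands for the first stored vector (with ChromaDB it could be fetched with something like `collection.get(limit=1, include=["embeddings"])`), and the provider's dimension is learned by embedding a single probe string. Closing the client beforehand and recreating the collection afterwards are left to the caller, as the comments note.

```python
import os
import shutil
from typing import Callable, List, Optional, Sequence


def wipe_if_dim_mismatch(
    stored_sample: Optional[Sequence[float]],
    embed: Callable[[List[str]], List[List[float]]],
    db_path: str,
) -> bool:
    """Return True if db_path was wiped because stored and provider dims differ."""
    if stored_sample is None:
        return False  # empty collection: nothing to compare against
    provider_dim = len(embed(["dimension probe"])[0])  # probe the current provider once
    if len(stored_sample) == provider_dim:
        return False
    # The caller must close the chroma client first so no file handles stay open.
    shutil.rmtree(db_path, ignore_errors=True)
    os.makedirs(db_path, exist_ok=True)  # caller then recreates the collection here
    return True
```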

### Query Flow

When `ai_client.send(rag_engine=engine)` is called:

```
1. send() receives user_message
2. If rag_engine is not None:
     a. rag_engine.search(user_message, top_k=5) -> Result whose .data is a list of {text, metadata, distance}
     b. If results non-empty: inject [RETRIEVED CONTEXT] block into md_content
     c. The block contains the top_k fragments, formatted as:
        ```
        [RETRIEVED CONTEXT]
        File: path/to/file.py (chunk 0)
        <chunk text>

        File: path/to/another.py (chunk 2)
        <chunk text>
        ...
        ```
3. send() proceeds to the provider call with the augmented md_content
```

The injection point is **before** the system prompt construction. This means the retrieved context is treated as part of the project's tracked content, not as ad-hoc advice.
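
A sketch of building the `[RETRIEVED CONTEXT]` block from `search()` hits, following the format shown in the Query Flow. `inject_rag_context` is a hypothetical stand-in for the `_inject_rag_context` helper named under Cross-System Integration; the metadata keys `path` and `chunk_index` are assumed from the indexing metadata, and prepending (per the Overview) rather than some other placement is an assumption about the real code.

```python
from typing import Any, Dict, List


def inject_rag_context(md_content: str, hits: List[Dict[str, Any]]) -> str:
    """Prepend a [RETRIEVED CONTEXT] block built from search() hits."""
    if not hits:
        return md_content  # no-op when nothing was retrieved
    parts = ["[RETRIEVED CONTEXT]"]
    for hit in hits:
        meta = hit.get("metadata") or {}
        parts.append(f"File: {meta.get('path', '?')} (chunk {meta.get('chunk_index', '?')})")
        parts.append(hit["text"])
        parts.append("")  # blank line between fragments
    return "\n".join(parts) + "\n" + md_content
```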

### Public Methods

> **As of 2026-06-11:** The signatures below document the **post-refactor**
> `Result[T]` returns applied by the `data_oriented_error_handling_20260606`
> track. The pre-refactor methods raised `ImportError` / `ValueError` or
> silently set `self.collection = None` on failure. See the new
> [Data-Oriented Error Handling (Fleury Pattern)](#data-oriented-error-handling-fleury-pattern)
> section below for the full convention.

```python
# Index a single file
rag_engine.index_file(path: str) -> Result[None]
# data=None on both success and failure; check result.errors

# Search the index
rag_engine.search(query: str, top_k: int = 5) -> Result[list[dict[str, Any]]]
# data is the list of {"text", "metadata", "distance"} hits; [] on failure
# An unconfigured engine may report data=NIL_RAG_STATE instead (see Nil-Sentinel Pattern below)

# Index management
rag_engine.add_documents(
    ids: List[str],
    texts: List[str],
    metadatas: Optional[List[dict]] = None,
) -> Result[None]
rag_engine.delete_documents(ids: List[str]) -> Result[None]
rag_engine.delete_documents_by_path(path: str) -> Result[None]
rag_engine.get_all_indexed_paths() -> Result[list[str]]
rag_engine.is_empty() -> Result[bool]
# All return Result; on error, data is the zero value and result.errors is populated
```

The `RAGEngine._init_vector_store_result()` and
`RAGEngine._validate_collection_dim_result()` methods are the new
internal entry points that produce `Result[None]`. They replace the
old `_init_vector_store()` (which raised `ImportError` on missing
chromadb, or `ValueError` on unknown vector-store provider) and the
old `_validate_collection_dim()` (which caught `Exception` and silently
corrupted the collection). Post-refactor, every failure path produces a
typed `ErrorInfo` entry; the application can react instead of crashing
on an unhandled exception.

---

## Configuration

RAG is configured via the project's `manual_slop.toml`:

```toml
[rag]
enabled = true
embedding_provider = "gemini"  # or "local"
chunk_size = 1000
chunk_overlap = 200

[rag.vector_store]
provider = "chroma"              # "chroma" | "mock"
collection_name = "manual_slop"  # the chroma subdir under .slop_cache/
url = ""                         # future: external HTTP vector store
api_key = ""                     # future: external HTTP auth
mcp_server = ""                  # future: MCP-based external RAG bridge
mcp_tool = ""                    # future: tool name on the MCP server
```

### `RAGConfig` + `VectorStoreConfig` Schema (`src/models.py`)

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VectorStoreConfig:
    provider:        str                              # "chroma" | "mock"
    url:             Optional[str] = None             # future: external HTTP
    api_key:         Optional[str] = None             # future: external HTTP auth
    collection_name: str = "manual_slop"
    mcp_server:      Optional[str] = None             # future: MCP bridge
    mcp_tool:        Optional[str] = None             # future: MCP tool name

@dataclass
class RAGConfig:
    enabled:            bool = False
    vector_store:       VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock'))
    embedding_provider: str = 'gemini'                 # "gemini" | "local"
    chunk_size:         int = 1000
    chunk_overlap:      int = 200
```

> **What about the fields the old doc showed?** The 2026-06-10 docs sync checked `src/models.py:1029-1040` and found that the previous `RAGConfig` schema was **stale** (it predated the schema refactor): most of the fields it listed do not exist in the real dataclass.
>
> - `ast_chunking_enabled` does not exist anywhere in `src/`, and there is no `ChunkingConfig` class either.
> - `vector_store_backend` and `vector_store_path` never existed on `RAGConfig`; they were a flattened version of the now-nested `VectorStoreConfig`.
> - `auto_index_on_load` and `auto_sync_interval_seconds` do not exist anywhere in `src/`. They were aspirational; the actual index-on-load and auto-sync behavior is wired in `RAGEngine` and the controller's `mma_state_update` flow, not via persisted config.
> - `top_k` is real, but it is a **runtime parameter** to `RAGEngine.search(query, top_k=5)` and `RAGEngine._search_mcp(query, top_k=5)` (`src/rag_engine.py:339, 322`), not a field on `RAGConfig`. The old doc confused "config field" with "search parameter."
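
To make the TOML-to-dataclass mapping concrete, the block below redeclares the two dataclasses from the schema above and builds a `RAGConfig` from an already-parsed `[rag]` table. `rag_config_from_table` is hypothetical (this guide does not show the project's real loader). Passing the table straight to the constructors means a stray key such as `top_k` raises `TypeError`, which matches the point above that `top_k` is not a config field.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VectorStoreConfig:
    provider: str
    url: Optional[str] = None
    api_key: Optional[str] = None
    collection_name: str = "manual_slop"
    mcp_server: Optional[str] = None
    mcp_tool: Optional[str] = None


@dataclass
class RAGConfig:
    enabled: bool = False
    vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider="mock"))
    embedding_provider: str = "gemini"
    chunk_size: int = 1000
    chunk_overlap: int = 200


def rag_config_from_table(rag: Dict[str, Any]) -> RAGConfig:
    """Build a RAGConfig from the parsed [rag] table (with [rag.vector_store] nested inside)."""
    kwargs = dict(rag)  # copy so the caller's parsed TOML is left untouched
    vs_table = kwargs.pop("vector_store", None)
    if vs_table is not None:
        kwargs["vector_store"] = VectorStoreConfig(**vs_table)
    # Unknown keys (e.g. a stray top_k) raise TypeError instead of being silently ignored.
    return RAGConfig(**kwargs)
```

With Python 3.11+, the table can come from `tomllib.load(f)["rag"]`, where `f` is `manual_slop.toml` opened in binary mode.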

### Behavior When Disabled

If `enabled = false` (the default), `RAGEngine` is never constructed. `ai_client.send()` receives `rag_engine=None` and the integration is a no-op. The lazy-loading of `chromadb`, `sentence_transformers`, and `google.genai` is also skipped, so there is zero overhead for projects that don't use RAG.

### Auto-Sync

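A background auto-sync thread periodically scans tracked files for `mtime` changes and re-indexes the ones that changed. This keeps the vector store consistent with on-disk changes without requiring explicit user action. It is not controlled by a persisted `auto_sync_interval_seconds` field (that field does not exist; see the schema note above); the behavior is wired in `RAGEngine` and the controller's `mma_state_update` flow.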

The sync uses `pathlib.Path.stat().st_mtime` for change detection (same mechanism as the file cache in `file_cache.py`). For very large projects, the sync can be tuned to skip files above a size threshold.
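
A sketch of a background auto-sync loop under the assumptions in this section: mtime polling, a per-file re-index callback, and a shared write lock (see Concurrent Writes under Edge Cases). `AutoSyncer` is a hypothetical class; because there is no persisted interval field, the polling interval is simply a constructor argument here, and `reindex` stands for `delete_documents_by_path(path)` followed by `index_file(path)`.

```python
import threading
from pathlib import Path
from typing import Callable, Dict, List


class AutoSyncer:
    """Background mtime poller; reindex(path) runs under a lock shared with manual re-index."""

    def __init__(self, paths: List[str], reindex: Callable[[str], None],
                 interval_s: float, write_lock: threading.Lock) -> None:
        self.paths = paths
        self.reindex = reindex
        self.interval_s = interval_s
        self.write_lock = write_lock
        self._seen: Dict[str, float] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def scan_once(self) -> List[str]:
        """Re-index every file whose mtime changed since the last scan; return those paths."""
        changed = []
        for p in self.paths:
            try:
                mtime = Path(p).stat().st_mtime
            except FileNotFoundError:
                continue
            if self._seen.get(p) != mtime:
                self._seen[p] = mtime
                changed.append(p)
        for p in changed:
            with self.write_lock:  # serialize with other writers in this process
                self.reindex(p)
        return changed

    def _run(self) -> None:
        while not self._stop.is_set():
            self.scan_once()
            self._stop.wait(self.interval_s)  # sleeps, but returns at once when stop() is called

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

`threading.Event.wait()` is used instead of `time.sleep()` so that `stop()` wakes the thread immediately rather than after a full interval.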

---

## Cross-System Integration

### `ai_client.send()` Integration

See [guide_architecture.md#rag-integration](guide_architecture.md#rag-integration) for the full dispatch flow. Summary:

```python
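def send(md_content, user_message, ..., rag_engine=None) -> str:
    if rag_engine is not None:
        # top_k is a runtime argument to search(), not a RAGConfig field
        result = rag_engine.search(user_message, top_k=5)
        # search() returns a Result; result.data is [] when nothing matched or on failure
        if result.data:
            md_content = _inject_rag_context(md_content, result.data)
    ...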
```

The injection is a no-op if:
- `rag_engine is None`
- `rag_engine.is_empty()` (index has no documents)
- `search()` returns no results above the distance threshold

### MMA Worker Integration

The ConductorEngine does not construct `RAGEngine` itself. Workers receive context via `md_content` which is built by the caller. To use RAG in workers:

1. Construct a `RAGEngine` in the caller (typically `AppController` or test harness).
2. Pass it to `multi_agent_conductor.run_worker_lifecycle(..., rag_engine=...)` (if supported) or to the test invocation.
3. The worker passes it to `ai_client.send(rag_engine=...)`.

Note: As of 2026-06-02, the direct `rag_engine` parameter on `run_worker_lifecycle` is **not yet implemented**. Workers currently rely on the `md_content` already being augmented by the caller, or on Tier 4 / Tier 2 setting up the augmentation before spawning workers.

### GUI Integration

The GUI's RAG panel (under AI Settings → RAG) provides:
- **Status indicator** — `RAGEngine.is_empty()` → "Empty" / "Indexed N chunks"
- **Manual search box** — for testing retrieval quality without sending a full AI call
- **Re-index button** — forces a full rebuild of the index
- **Settings editor** — modifies `RAGConfig` fields and writes back to `manual_slop.toml`

The RAG panel also surfaces the **auto-sync status** (last sync time, files indexed, files pending re-index).

---

## Testing

### Unit Tests

- `tests/test_rag_engine.py` — `RAGEngine` basic lifecycle with mock ChromaDB and mock embedding provider
- `tests/test_rag_integration.py` — End-to-end indexing + search + retrieval

### Simulation Tests

- `tests/test_rag_gui_presence.py` — Verifies the RAG panel renders correctly
- `tests/test_rag_visual_sim.py` — Visual verification of the RAG search results panel

### Stress Tests

- `tests/test_rag_phase4_stress.py` — Indexes 1000+ files, measures retrieval latency
- `tests/test_rag_phase4_final_verify.py` — End-to-end verification of RAG-augmented AI responses

### Test Patterns

The standard pattern for testing RAG-augmented calls:

```python
def test_rag_augmented_send(live_gui):
    # 1. Set up project with RAG enabled
    client.set_rag_config(enabled=True, embedding_provider="local")
    client.reindex_project()
    
    # 2. Send a question that requires retrieval
    response = client.send("How does the Execution Clutch work?")
    
    # 3. Verify the response references the retrieved content
    # (The exact assertion depends on what was indexed)
    assert response
```

For unit tests that don't need real embedding models, the `BaseEmbeddingProvider` is mocked to return deterministic vectors (e.g., based on the hash of the input text).
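
A deterministic provider of the kind described above might look like this. `HashEmbeddingProvider` is hypothetical; it is written standalone (in the real tests it would subclass `BaseEmbeddingProvider`) and stretches SHA-256 digests of the text into a unit vector of the requested dimension. Identical texts always map to identical vectors, but distances carry no semantic meaning, so it suits plumbing tests rather than retrieval-quality tests.

```python
import hashlib
import math
from typing import List


class HashEmbeddingProvider:
    """Deterministic fake provider: same text -> same unit vector, no model or network."""

    def __init__(self, dim: int = 384) -> None:
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, all of length self.dim (the embed() contract).
        return [self._vector(text) for text in texts]

    def _vector(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Each SHA-256 digest gives 32 bytes; hash text plus a counter until dim values exist.
        while len(values) < self.dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(byte / 255.0 - 0.5 for byte in digest)  # map bytes to [-0.5, 0.5]
            counter += 1
        values = values[: self.dim]
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]  # unit length, so cosine distance is well defined
```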

---
## Data-Oriented Error Handling (Fleury Pattern)

The RAG engine follows the "errors are just cases" framework
(Ryan Fleury). The canonical reference is
[`conductor/code_styleguides/error_handling.md`](../conductor/code_styleguides/error_handling.md).

### Result-Based Returns

RAG methods that previously raised `ImportError`, `ValueError`, or
silently mutated `self.collection = None` on failure now return
`Result[T]` with side-channel `ErrorInfo` entries:

| Method | Pre-refactor | Post-refactor |
|---|---|---|
| `_init_vector_store()` | `raise ImportError` (no chromadb) or `raise ValueError` (unknown provider) | `_init_vector_store_result() -> Result[None]` |
| `_validate_collection_dim()` | `except Exception: pass` (silent corruption) | `_validate_collection_dim_result() -> Result[None]` |
| `is_empty()` | `bool` (or `None` if collection failed) | `Result[bool]` (data is `False` on failure) |
| `add_documents()` | `raise` on chromadb error | `Result[None]` (errors as `ErrorInfo`) |
| `search()` | `List[Dict]` (or `[]` on failure) | `Result[list[dict]]` (data is `[]` on failure) |
| `index_file()` | `raise` on missing file or chromadb error | `Result[None]` (errors as `ErrorInfo`) |
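
The real `src.result_types` module is not shown in this guide, so the following is only a minimal shape consistent with the fields the examples here use (`Result.data`, `Result.errors`, `ErrorInfo.kind`, `ErrorInfo.message`, `ErrorKind.NOT_READY`, `ErrorKind.CONFIG`). The `ok` property and the `search_or_empty` boundary function are hypothetical, included to show a failure reported as data plus an error entry instead of an exception.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    NOT_READY = "not_ready"  # e.g. collection not initialized yet
    CONFIG = "config"        # e.g. unknown provider, dimension mismatch


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    data: T  # always usable: the zero value ([] / False / None) when something failed
    errors: List[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def search_or_empty(collection_ready: bool, hits: List[dict]) -> "Result[List[dict]]":
    """Boundary function: a failure becomes data ([]) plus an ErrorInfo, never an exception."""
    if not collection_ready:
        return Result(data=[], errors=[ErrorInfo(ErrorKind.NOT_READY, "collection not initialized")])
    return Result(data=hits)
```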

### Nil-Sentinel Pattern

The `NIL_RAG_STATE` dataclass is the "RAG engine in unconfigured/failed-
to-init state" — it has all default values and is safe to read from:

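```python
from dataclasses import dataclass, field

from src.result_types import ErrorInfo  # assumed location, alongside ErrorKind
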
@dataclass(frozen=True)
class NilRAGState:
    enabled: bool = False
    is_empty_result: bool = True
    errors: list[ErrorInfo] = field(default_factory=list)

NIL_RAG_STATE = NilRAGState()  # module-level singleton
```

When the RAG engine is in this state (e.g., chromadb isn't installed,
or the configured provider is unknown), methods that would have raised
now return a `Result` with `data=NIL_RAG_STATE` and the error in
`.errors`. Callers can branch on `isinstance(result.data, NilRAGState)`
and treat the engine as disabled, but most callers only need to know
whether to render the RAG panel as enabled, and reading
`NIL_RAG_STATE.enabled` (always `False`) answers that.

### Constructor Behavior

`RAGEngine.__init__` still raises when the config is missing (fail early
at init, since that is a programmer error). An invalid config (e.g., a
bad embedding provider or a bad chromadb collection) is deferred to
`_init_vector_store_result()`, which is called explicitly or lazily. If
init fails, the constructor still returns a best-effort instance with
`self.collection = NIL_COLLECTION`; the first call to `search()`,
`add_documents()`, etc. surfaces the deferred error in its
`Result.errors`.

### Example

```python
import logging

from src import rag_engine
from src.result_types import ErrorKind

log = logging.getLogger(__name__)


def retrieve(engine: rag_engine.RAGEngine, query: str) -> list[dict]:
    """Run a RAG search on an engine instance, log any ErrorInfo entries, return the hits."""
    result = engine.search(query, top_k=5)
    for err in result.errors:
        if err.kind == ErrorKind.NOT_READY:
            log.info("RAG not yet warmed: %s", err.message)
        elif err.kind == ErrorKind.CONFIG:
            log.warning("RAG misconfigured: %s", err.message)
        else:
            log.error(err.ui_message())
    # result.data is the zero-initialized [] on failure, so it is always safe to iterate
    return result.data
```

### Dimension Mismatch Protection (Recovers via `ErrorInfo`)

The 2026-06-06 collection-dim-mismatch bug fix
(commit `16412ad5`) lives inside `_validate_collection_dim_result()`
post-refactor. When the on-disk collection's dim doesn't match the
current embedding provider's dim, the method returns
`Result[None]` with a single `ErrorInfo(kind=ErrorKind.CONFIG, ...)`
instead of raising `InvalidDimensionError` deep in chromadb. The
caller (`_init_vector_store_result()`) sees the error in the
`.errors` list and can recreate the collection. This is the canonical
"SDK boundary catches, convert to ErrorInfo" pattern in action.

### See Also (in-doc)

- [`conductor/code_styleguides/error_handling.md`](../conductor/code_styleguides/error_handling.md) — canonical styleguide (5 patterns, data model, decision tree, anti-patterns)
- [`conductor/tracks/data_oriented_error_handling_20260606/spec.md`](../conductor/tracks/data_oriented_error_handling_20260606/spec.md) — the spec that introduced this pattern
- [`docs/guide_ai_client.md`](guide_ai_client.md#data-oriented-error-handling-fleury-pattern) — same pattern in the provider layer
- [`docs/guide_mcp_client.md`](guide_mcp_client.md#data-oriented-error-handling-fleury-pattern) — same pattern in the MCP tool layer

---
## Edge Cases & Limitations

1. **Empty Index**: If the index has no documents, `search()` returns a `Result` whose `data` is `[]`, and no context is injected. The AI call proceeds normally with just the explicit file context.

2. **Network Failures (Gemini Embeddings)**: If the Gemini API is unreachable, `GeminiEmbeddingProvider.embed()` raises an exception. The code that calls `embed()` during indexing (`index_file()`) should handle this gracefully and either retry or fall back to the local provider.

3. **Stale Index**: Auto-sync runs periodically but not on every read. If a file is changed between sync intervals, the index may be stale. The `delete_documents_by_path` + `index_file` cycle is atomic per file, so a partial sync leaves the index in a consistent (if incomplete) state.

4. **Large Files**: A single file larger than `chunk_size` is split into multiple chunks with overlap. There's no upper limit on the number of chunks per file, but very large files (>10MB) may slow down indexing significantly.

5. **Binary Files**: RAG only handles text files. Binary files (images, compiled Python, etc.) are skipped during indexing with a warning logged to `comms_log`.

6. **Cross-Project Queries**: The vector store is per-project (`<project_dir>/.slop_cache/chroma_<collection_name>/`). Cross-project retrieval is **not** supported; each project has its own isolated index.

7. **Concurrent Writes**: ChromaDB's PersistentClient is single-writer. If multiple processes try to write to the same index simultaneously, ChromaDB will raise. Within Manual Slop, a `threading.Lock` serializes writes from the auto-sync thread and the manual re-index button; because it is an in-process lock, it does not coordinate with a second process opening the same index.
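
Items 2 and 5 above leave the mitigation to the caller. The sketch below shows one way to do both, with hypothetical helper names: `embed_with_fallback` retries the primary provider with exponential backoff before switching to a fallback, and `read_text_or_none` skips files that contain NUL bytes or are not valid UTF-8. The binary heuristic is an assumption of this sketch; the real indexer's detection rule is not documented here.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional

Embed = Callable[[List[str]], List[List[float]]]


def embed_with_fallback(primary: Embed, fallback: Embed, texts: List[str],
                        retries: int = 2, backoff_s: float = 0.5) -> List[List[float]]:
    """Try primary up to retries + 1 times with exponential backoff, then use fallback."""
    for attempt in range(retries + 1):
        try:
            return primary(texts)
        except Exception:  # network/SDK errors; narrow this for a specific client library
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # 0.5 s, 1 s, ... with the defaults
    return fallback(texts)


def read_text_or_none(path: str, sniff_bytes: int = 8192) -> Optional[str]:
    """Return the file's text, or None if it looks binary and should be skipped."""
    data = Path(path).read_bytes()
    if b"\x00" in data[:sniff_bytes]:
        return None  # NUL bytes are a strong binary signal (images, .pyc, archives)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8: treated as binary by this heuristic
```

One caveat on the fallback: the local and Gemini providers produce vectors of different dimensions (384 vs 3072 by default), so a fallback has to apply to the whole index rather than to individual chunks, or the collection ends up with mixed dimensions. Likewise, the UTF-8 check treats Latin-1 text files as binary.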

---

## Future Work

- **External RAG Bridge** — Connect to remote vector databases (e.g., a managed Pinecone or Weaviate) via MCP. The `_search_mcp` method (`src/rag_engine.py:322`) IS a real implementation: when `RAGConfig.vector_store.provider == "mcp"`, `RAGEngine.search()` dispatches to `_search_mcp()` which calls `mcp_client.async_dispatch("rag_search", {"query": ..., "top_k": ...})`. The MCP-bridge config (`mcp_server`, `mcp_tool`) lives on `VectorStoreConfig`. The bridge wires up the rest of the RAG pipeline to a remote vector store; no per-vendor `_init_vector_store` branch is needed because the MCP server owns that.
- **Hybrid Search** — Combine dense (vector) retrieval with sparse (BM25) retrieval for better recall on code keywords.
- **Re-ranking** — Apply a cross-encoder reranker to the top-k results before injection to improve precision.
- **Caching** — Cache query results in memory to avoid re-embedding for repeated questions.
- **Provider Routing** — Allow per-query provider selection (e.g., use Gemini for general queries, local for code).

See [guide_tools.md](guide_tools.md) for the MCP tool inventory; see [guide_architecture.md](guide_architecture.md) for the dispatch pipeline.