# RAG Integration Discipline

**Status:** Styleguide; codifies when and how to wire RAG (the opt-in, semantic-search memory dimension) into Manual Slop features.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md` §3; `conductor/code_styleguides/data_oriented_design.md` §9; `docs/guide_rag.md`.

> **What this is.** RAG is the opt-in, semantic-search memory dimension. It's *useful* (semantic search across large codebases; concept-level discovery; cross-file pattern matching grep can't do). It's also *fuzzy* (vector similarity, not exact) and *opaque* (the vector store is not user-editable). The discipline: be conservative about when to wire it in. The wrong shape for the right question is a common mistake.

---

## 0. The 6 rules (the one-glance table)

| # | Rule | Why |
|---|---|---|
| 1 | RAG is **opt-in**. Default-off in new projects | Most features don't need it; the cost of unnecessary RAG is the embedding-provider round trip + the storage cost |
| 2 | RAG **complements**; it never **replaces** | Curation / Discussion / Knowledge are the durable, user-editable dimensions; RAG is the fuzzy, semantic search |
| 3 | RAG results display with **provenance** | The user needs to know which file and which chunk produced the result |
| 4 | RAG **never mutates state** | No auto-injection of RAG results into `disc_entries`; no auto-update of `FileItem`; no auto-write to disk |
| 5 | RAG integration is **feature-gated** | A feature must explicitly request RAG in its scope; RAG is not the default for "give me context" |
| 6 | RAG failure is **graceful** | A failed search returns a `Result` with empty data and the error recorded; it never crashes the request |

---

## 1. RAG is opt-in (Rule 1)

**The default is OFF.** A new project opens with `rag_enabled = false`. The user opts in via the AI Settings panel.

**The rationale.** RAG is not free:
- The embedding-provider round trip adds latency (200-500 ms per call, depending on the provider)
- The storage cost grows with the indexed corpus (per `RAGConfig.chunk_size` and `chunk_overlap`)
- The dim-mismatch fix at `16412ad5` shows that switching providers requires a full re-index (the existing collection is incompatible with the new provider's embedding dimension)

For a project that doesn't *need* semantic search (e.g., a small Python project with 20 files), RAG is overhead, not benefit.
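
**A rough size estimate.** To make the storage cost concrete, the sketch below counts how many chunks one file produces. It assumes a character-based sliding window in which each chunk starts `chunk_size - chunk_overlap` characters after the previous one; that windowing scheme and the name `estimate_chunks` are assumptions for illustration, and the actual chunker may behave differently.

```python
import math


def estimate_chunks(text_length: int, chunk_size: int = 1000, chunk_overlap: int = 200) -> int:
    """Estimate the chunk count for an assumed sliding-window chunker."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    # The first chunk covers chunk_size characters; every further chunk
    # advances by one stride until the end of the text is covered.
    return 1 + math.ceil((text_length - chunk_size) / stride)
```

Under these assumptions and the default settings, a 10,000-character file yields 13 chunks, and each chunk costs one embedding plus its stored text.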

**The opt-in surface.** Per the existing `[ai_settings.toml]` pattern:
- `[X] Enable RAG` checkbox
- Source: `(project / global / none)` radio
- Embedding provider: `(gemini / local)` dropdown
- Chunk size: integer (default 1000)
- Chunk overlap: integer (default 200)

**The opt-out is also supported.** `rm -r ~/.manual_slop/.slop_cache/chroma_<provider>/` deletes the index. Re-enabling requires a full re-index.

**The opt-out via the AI Settings:**
```toml
[ai_settings.rag]
enabled = false   # default for new projects
```

**The opt-in is explicit:**
```toml
[ai_settings.rag]
enabled = true
source = "project"
embedding_provider = "gemini"
chunk_size = 1000
chunk_overlap = 200
```

---

## 2. RAG complements; it never replaces (Rule 2)

**The 4 memory dimensions** (per `conductor/code_styleguides/agent_memory_dimensions.md`):

| Dim | SSDL | Use when |
|---|---|---|
| Curation | `[Q]` | "How to render a file" |
| Discussion | `o==>` | "What was said in this chat" |
| **RAG** | `[Q]` | **"What similar content exists"** |
| Knowledge | `o==>` | "What we learned from past runs" |

**The rule.** RAG is the *fuzzy semantic search* dimension. It is NOT:
- A replacement for curation (use `FileItem.view_mode` + Fuzzy Anchors)
- A replacement for discussion (use `disc_entries`)
- A replacement for knowledge (use `knowledge/digest.md`)

**The cross-cutting principle.** When a feature asks "give me context," the answer is *not* "enable RAG." The answer is "which of the 4 dimensions is the right home?" — and the 4-dim decision tree is the test.

**The "complement" examples:**
- A new discussion opens: render the active preset's `FileItem`s (curation) + the `disc_entries` (discussion) + the knowledge digest (knowledge). *Optionally* append `{rag-context}` if the user has opted in.
- The LLM asks "what's the execution clutch?": try knowledge first (the user has decided it's a durable concept). Try discussion second (search the prior entries for "clutch"). Try RAG third (semantic search across the indexed codebase). Curation fourth (the user has configured specific files).
- The user asks "where does X happen?": RAG is the *natural* shape for this question (semantic search). Use it.
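
**The lookup order as code.** The "execution clutch" example above tries the dimensions in a fixed order and stops at the first hit. The sketch below shows that shape only; `first_answer`, the `order` list, and the stand-in lookups are hypothetical, and a real RAG step would also sit behind the opt-in check.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def first_answer(query: str, lookups: list[tuple[str, Lookup]]) -> Optional[tuple[str, str]]:
    """Try each dimension in order; return (dimension, answer) for the first hit."""
    for name, lookup in lookups:
        answer = lookup(query)
        if answer:
            return name, answer
    return None


# Stand-in lookups for illustration only.
knowledge = {"execution clutch": "Durable note from the knowledge digest"}
order = [
    ("knowledge", knowledge.get),
    ("discussion", lambda q: None),
    ("rag", lambda q: f"semantic match for {q!r}"),
    ("curation", lambda q: None),
]
```

With these stand-ins, a query the knowledge digest already covers never reaches RAG; only an unknown query falls through to the semantic search.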

---

## 3. Provenance required (Rule 3)

**The principle.** When RAG returns results, the user must be able to see *which file* and *which chunk* produced the result. No black boxes.

**The RAG result shape** (per `RAGEngine.search`):

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    file_path: str           # the absolute path
    chunk_offset: int        # byte offset within the file
    chunk_length: int        # length in bytes
    content: str             # the matched text
    similarity: float        # the cosine similarity
```

**The display in the LLM context** (the `{rag-context}` block):

```
{rag-context}
## src/ai_client.py:512-768 (similarity: 0.87)
...content...

## src/aggregate.py:142-289 (similarity: 0.82)
...content...
{/rag-context}
```

**The display in the GUI** (the per-result tooltip):

```
[Anthropic cache-aware send]
File: src/ai_client.py:512-768
Similarity: 0.87
Click to jump to file
```

**The provenance is not optional.** If a result has no provenance, it doesn't go in the context.
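
**A sketch of the rendering.** The function below builds a `{rag-context}` block in the format shown above and drops any result that lacks provenance. It assumes the displayed range is `chunk_offset` to `chunk_offset + chunk_length`, which is consistent with the examples but not confirmed; `has_provenance` and `render_rag_block` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    file_path: str
    chunk_offset: int
    chunk_length: int
    content: str
    similarity: float


def has_provenance(r: SearchResult) -> bool:
    """A result is usable only if it names a file and a non-empty chunk."""
    return bool(r.file_path) and r.chunk_offset >= 0 and r.chunk_length > 0


def render_rag_block(results: list[SearchResult]) -> str:
    """Render results with provenance headers; return '' if nothing qualifies."""
    kept = [r for r in results if has_provenance(r)]
    if not kept:
        return ""
    sections = []
    for r in kept:
        end = r.chunk_offset + r.chunk_length  # assumed meaning of the displayed range
        header = f"## {r.file_path}:{r.chunk_offset}-{end} (similarity: {r.similarity:.2f})"
        sections.append(f"{header}\n{r.content}\n")
    return "{rag-context}\n" + "\n".join(sections) + "{/rag-context}"
```

Returning an empty string when nothing qualifies lets the caller skip the block entirely instead of emitting an empty `{rag-context}` wrapper.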

**The cross-references.** The dim-mismatch fix at `16412ad5` shows the kind of bug that happens when the RAG index loses provenance: switching providers silently corrupts the index because the embeddings have different dimensions. The provenance (file path + chunk offset) is what makes the index re-buildable.
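
**A dimension guard, sketched.** The check below illustrates the condition behind the provider-switch problem: an index built with one embedding dimension cannot be searched with vectors of another. The function name is hypothetical, how the stored dimension is read from the index is left out, and the actual fix at `16412ad5` may work differently.

```python
from typing import Optional


def needs_full_reindex(stored_dim: Optional[int], provider_dim: int) -> bool:
    """True when the index must be (re)built before it can be searched."""
    if stored_dim is None:
        return True  # no index yet (first run)
    # Vectors of different lengths cannot be compared for similarity.
    return stored_dim != provider_dim
```

Because every chunk records its file path and offset, a rebuild only has to re-read the source files and re-embed them with the new provider.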

---

## 4. RAG never mutates state (Rule 4)

**The principle.** RAG is a *query* dimension. It returns data; it does not write data.

**The mutation rules:**
- RAG results **do NOT** go into `disc_entries`
- RAG results **do NOT** update `FileItem` curation state
- RAG results **do NOT** write to disk
- RAG results **do NOT** trigger knowledge harvest
- RAG results **do NOT** modify the system prompt or persona

**The exception (none).** There is no feature that should mutate state from RAG results. If a feature wants to "remember" something from RAG, the user must explicitly say "add that to the discussion" (which appends a `role: "User"` entry to `disc_entries`) or "harvest that into knowledge" (which runs the harvest workflow).

**The boundary in code:**

```python
# In ai_client.py:send() (the integration point)
# Sketch only: `...` and `_send_<provider>` stand for elided details.
def send(...):
    prompt = aggregate.build(...)
    if config.rag_enabled:
        rag_result = rag_engine.search(prompt, k=N)   # returns a Result (see §6)
        if rag_result.ok and rag_result.data:
            prompt = append_rag_block(prompt, rag_result.data)   # READ ONLY
    # NO mutation of: disc_entries, FileItem, knowledge files
    return self._send_<provider>(prompt, ...)
```

**The mutation must happen in a different function, called explicitly by the user or the LLM with HITL approval.**
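
**The two paths side by side.** A minimal sketch of the separation: the read path returns a new prompt and touches nothing else, while the write path is a separate function that does nothing without explicit approval. The signatures, the `approved` flag, and the entry shape (`role` plus `content`) are assumptions for illustration.

```python
def append_rag_block(prompt: str, rag_block: str) -> str:
    """Read path: return a new prompt string; no shared state is touched."""
    return f"{prompt}\n\n{rag_block}" if rag_block else prompt


def add_to_discussion(disc_entries: list[dict], text: str, approved: bool) -> bool:
    """Write path: append a User entry only after explicit approval."""
    if not approved:
        return False
    disc_entries.append({"role": "User", "content": text})
    return True
```

Keeping the write path in its own function makes the rule easy to audit: any call site of `add_to_discussion` is a place where approval has to be shown.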

---

## 5. Feature-gated integration (Rule 5)

**The principle.** A feature must explicitly request RAG in its scope. RAG is not the default for "give me context."

**The gate.** Every feature that uses RAG declares the dependency in its spec, plan, and changelog:

```markdown
## Scope
- Feature X (uses RAG for semantic search)
- Feature Y (no RAG dependency; uses Curation + Discussion only)

## Dependencies
- RAG is required for Feature X; the user must opt-in via AI Settings
- Feature Y is independent of RAG
```

**The runtime gate.** The feature's code checks `config.rag_enabled` and behaves accordingly:

```python
# In the feature's code
def feature_x(query: str) -> Result[list[SearchResult], ErrorInfo]:
    if not config.rag_enabled:
        raise RAGNotEnabledError("Feature X requires RAG; opt in via AI Settings")
    return rag_engine.search(query, k=N)
```

**The error message is explicit.** The user knows why the feature isn't working.

**The CLI surface** (for testing and debugging):
```bash
$ python -m src.feature_x "execution clutch"
# Error: RAG not enabled. Enable via: [ai_settings.toml] rag.enabled = true
```

**The audit trail.** Every feature that uses RAG is logged in `metadata.json` for the feature's track: `uses_rag: true`.

---

## 6. Graceful failure (Rule 6)

**The principle.** RAG failure is data, not an exception. A failed search returns an empty result; the request continues.

**The failure modes** (in priority order):

| Failure | Handling |
|---|---|
| RAG not enabled | Skip; no `{rag-context}` block; the request continues |
| ChromaDB not initialized | Skip; log a warning; the request continues |
| Embedding provider not available | Skip; log a warning; the request continues |
| Index missing (first run) | Skip; log a warning; the request continues |
| Search returns empty | Normal; no `{rag-context}` block; the request continues |
| Search times out | Return partial results; log a warning |
| Search raises an exception | Catch; log the exception; return empty; the request continues |

**A failure is a `Result[T, ErrorInfo]` value, not a raised exception.** Per the `data_oriented_error_handling_20260606` convention.

```python
# In the RAG engine
def search(self, query: str, k: int = 5) -> Result[list[SearchResult], ErrorInfo]:
    try:
        if not self._enabled:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not enabled")])
        if not self._collection:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not initialized")])
        results = self._collection.query(query, k=k)
        return Result(data=results, errors=[])
    except Exception as exc:
        return Result(data=[], errors=[ErrorInfo(INTERNAL, str(exc))])
```

**The caller** (`ai_client.py:send`) checks `.ok` and proceeds without the `{rag-context}` block when the search failed or returned nothing:

```python
rag_result = rag_engine.search(prompt, k=N)
if rag_result.ok and rag_result.data:
    prompt = append_rag_block(prompt, rag_result.data)
# else: proceed without RAG; the request doesn't fail
```

**The user sees the warning** in the comms log:
```
[RAG] search failed: ChromaDB not initialized
[RAG] request continues without RAG
```
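
**The whole path, self-contained.** The sketch below ties the pieces together: a simplified stand-in for `Result` and `ErrorInfo`, and a caller that logs each error and continues without RAG. The single-parameter `Result`, the string error codes, the `with_rag` helper, and the logger name are assumptions; the real types come from the `data_oriented_error_handling_20260606` convention and may differ.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
log = logging.getLogger("rag")


@dataclass
class ErrorInfo:  # simplified stand-in
    code: str
    message: str


@dataclass
class Result(Generic[T]):  # simplified stand-in with one type parameter
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def with_rag(prompt: str, search: Callable[[str], Result]) -> str:
    """Append RAG content when the search succeeds; otherwise continue without it."""
    try:
        result = search(prompt)
    except Exception as exc:  # defensive: search should already return a Result
        result = Result(data=[], errors=[ErrorInfo("INTERNAL", str(exc))])
    for err in result.errors:
        log.warning("[RAG] search failed: %s", err.message)
    if result.ok and result.data:
        return prompt + "\n\n" + "\n".join(result.data)
    return prompt  # the request continues without RAG
```

In every failure case the function returns the original prompt unchanged, so the request proceeds exactly as it would with RAG disabled.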

---

## 7. The wiring points (the where)

| Where in `src/` | What it does | What it does NOT do |
|---|---|---|
| `src/ai_client.py:send` | The integration point; appends `{rag-context}` if enabled | Does not mutate state |
| `src/aggregate.py:run` | Builds the initial context; appends `{rag-context}` in the volatile layer | Does not query RAG directly |
| `src/rag_engine.py:search` | The semantic search; returns `Result[list[SearchResult], ErrorInfo]` | Does not write to the index |
| `src/rag_engine.py:index_file` | The indexer; called by `RAGEngine._init_vector_store` or by the harvest CLI | Does not run at LLM call time |
| `src/ai_settings.toml` (or GUI) | The opt-in surface | Does not trigger RAG automatically |

---

## 8. The forbidden patterns (the "don't do this" list)

| Pattern | Why it's forbidden |
|---|---|
| RAG as a *replacement* for curation | Curation is structural (per-file schema); RAG is semantic (fuzzy). Use curation for "how to render file X" |
| RAG as a *replacement* for discussion | Discussion is precise (the actual messages); RAG is fuzzy. Use discussion for "what was said" |
| RAG as a *replacement* for knowledge | Knowledge is durable (user-edited, provenance-aware); RAG is volatile (indexed, opaque). Use knowledge for "what we decided" |
| Auto-inject RAG results into `disc_entries` | This is a state mutation; it changes the conversation in a way the user didn't ask for |
| Auto-write RAG results to disk | Same; no mutation |
| Use RAG when the user hasn't opted in | RAG is opt-in; default-off in new projects |
| Crash the request when RAG fails | Graceful failure; the request continues |
| Use RAG for "show me the last thing the user said" | Use `disc_entries` (precise) |
| Use RAG for "show me what we decided last time" | Use the knowledge digest (durable) |
| Use RAG for "show me the file the user is editing" | Use `FileItem` (curation) |

---

## 9. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` §3 — the RAG dim in context
- `conductor/code_styleguides/data_oriented_design.md` §1.2 — "Design around a model of the world" (the underlying anti-pattern)
- `conductor/code_styleguides/cache_friendly_context.md` — where the 4 dims get injected in the cache strategy
- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge dim (the alternative for "what we decided")
- `docs/guide_rag.md` — the existing RAG deep-dive
- `data_oriented_error_handling_20260606` — the `Result[T, ErrorInfo]` pattern
- `conductor/tracks/rag_phase4_stress_fix_20260606` — the dim-mismatch fix at `16412ad5`