# RAG Integration Discipline

**Status:** Styleguide; codifies when and how to wire RAG (the opt-in, semantic-search memory dimension) into Manual Slop features.

**Date:** 2026-06-12

**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md` §3; `conductor/code_styleguides/data_oriented_design.md` §9; `docs/guide_rag.md`.

> **What this is.** RAG is the opt-in, semantic-search memory dimension. It's *useful* (semantic search across large codebases; concept-level discovery; cross-file pattern matching grep can't do). It's also *fuzzy* (vector similarity, not exact) and *opaque* (the vector store is not user-editable). The discipline: be conservative about when to wire it in. The wrong shape for the right question is a common mistake.

---

## 0. The 6 rules (the one-glance table)

| # | Rule | Why |
|---|---|---|
| 1 | RAG is **opt-in**. Default-off in new projects | Most features don't need it; the cost of unnecessary RAG is the embedding-provider round trip + the storage cost |
| 2 | RAG **complements**; it never **replaces** | Curation / Discussion / Knowledge are the durable, user-editable dimensions; RAG is the fuzzy, semantic search |
| 3 | RAG results display with **provenance** | The user needs to know which file and which chunk produced the result |
| 4 | RAG **never mutates state** | No auto-injection of RAG results into `disc_entries`; no auto-update of `FileItem`; no auto-write to disk |
| 5 | RAG integration is **feature-gated** | A feature must explicitly request RAG in its scope; RAG is not the default for "give me context" |
| 6 | RAG failure is **graceful** | A failed search returns `Result.empty` or an empty list; never crashes the request |

---

## 1. RAG is opt-in (Rule 1)

**The default is OFF.** A new project opens with `rag_enabled = false`. The user opts in via the AI Settings panel.


**The rationale.** RAG is not free:

- The embedding-provider round trip adds latency (200-500ms per call, per provider)
- The storage cost grows with the indexed corpus (per `RAGConfig.chunk_size` and `chunk_overlap`)
- The dim-mismatch fix at `16412ad5` shows that switching providers requires a full re-index (the existing collection is incompatible with the new provider's embedding dimension)

For a project that doesn't *need* semantic search (e.g., a small Python project with 20 files), RAG is overhead, not benefit.

**The opt-in surface.** Per the existing `[ai_settings.toml]` pattern:

- `[X] Enable RAG` checkbox
- Source: `(project / global / none)` radio
- Embedding provider: `(gemini / local)` dropdown
- Chunk size: integer (default 1000)
- Chunk overlap: integer (default 200)

**The opt-out is also supported.** `rm -r ~/.manual_slop/.slop_cache/chroma_/` deletes the index. Re-enabling requires a full re-index.

**The opt-out via the AI Settings:**

```toml
[ai_settings.rag]
enabled = false  # default for new projects
```

**The opt-in is explicit:**

```toml
[ai_settings.rag]
enabled = true
source = "project"
embedding_provider = "gemini"
chunk_size = 1000
chunk_overlap = 200
```

---

## 2. RAG complements; it never replaces (Rule 2)

**The 4 memory dimensions** (per `conductor/code_styleguides/agent_memory_dimensions.md`):

| Dim | SSDL | Use when |
|---|---|---|
| Curation | `[Q]` | "How to render a file" |
| Discussion | `o==>` | "What was said in this chat" |
| **RAG** | `[Q]` | **"What similar content exists"** |
| Knowledge | `o==>` | "What we learned from past runs" |

**The rule.** RAG is the *fuzzy semantic search* dimension. It is NOT:

- A replacement for curation (use `FileItem.view_mode` + Fuzzy Anchors)
- A replacement for discussion (use `disc_entries`)
- A replacement for knowledge (use `knowledge/digest.md`)

**The cross-cutting principle.** When a feature asks "give me context," the answer is *not* "enable RAG."
The answer is "which of the 4 dimensions is the right home?", and the 4-dim decision tree is the test.

**The "complement" examples:**

- A new discussion opens: render the active preset's `FileItem`s (curation) + the `disc_entries` (discussion) + the knowledge digest (knowledge). *Optionally* append `{rag-context}` if the user has opted in.
- The LLM asks "what's the execution clutch?": try knowledge first (the user has decided it's a durable concept). Try discussion second (search the prior entries for "clutch"). Try RAG third (semantic search across the indexed codebase). Curation fourth (the user has configured specific files).
- The user asks "where does X happen?": RAG is the *natural* shape for this question (semantic search). Use it.

---

## 3. Provenance required (Rule 3)

**The principle.** When RAG returns results, the user must be able to see *which file* and *which chunk* produced the result. No black boxes.

**The RAG result shape** (per `RAGEngine.search`):

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    file_path: str      # the absolute path
    chunk_offset: int   # byte offset within the file
    chunk_length: int   # length in bytes
    content: str        # the matched text
    similarity: float   # the cosine similarity
```

**The display in the LLM context** (the `{rag-context}` block):

```
{rag-context}
## src/ai_client.py:512-768 (similarity: 0.87)
...content...

## src/aggregate.py:142-289 (similarity: 0.82)
...content...
{/rag-context}
```

**The display in the GUI** (the per-result tooltip):

```
[Anthropic cache-aware send]
File: src/ai_client.py:512-768
Similarity: 0.87
Click to jump to file
```

**The provenance is not optional.** If a result has no provenance, it doesn't go in the context.

**The cross-references.** The dim-mismatch fix at `16412ad5` shows the kind of bug that happens when the RAG index loses provenance: switching providers silently corrupts the index because the embeddings have different dimensions.
The provenance (file path + chunk offset) is what makes the index re-buildable.

---

## 4. RAG never mutates state (Rule 4)

**The principle.** RAG is a *query* dimension. It returns data; it does not write data.

**The mutation rules:**

- RAG results **do NOT** go into `disc_entries`
- RAG results **do NOT** update `FileItem` curation state
- RAG results **do NOT** write to disk
- RAG results **do NOT** trigger knowledge harvest
- RAG results **do NOT** modify the system prompt or persona

**The exception (none).** There is no feature that should mutate state from RAG results. If a feature wants to "remember" something from RAG, the user must explicitly say "add that to the discussion" (which appends a `role: "User"` entry to `disc_entries`) or "harvest that into knowledge" (which runs the harvest workflow).

**The boundary in code:**

```python
# In ai_client.py:send() (the integration point)
def send(...):
    prompt = aggregate.build(...)
    if config.rag_enabled:
        results = rag_engine.search(prompt, k=N)
        prompt = append_rag_block(prompt, results)  # READ ONLY
    return self._send_(prompt, ...)
    # NO mutation of: disc_entries, FileItem, knowledge files
```

**The mutation must happen in a different function, called explicitly by the user or the LLM with HITL approval.**

---

## 5. Feature-gated integration (Rule 5)

**The principle.** A feature must explicitly request RAG in its scope. RAG is not the default for "give me context."


**The gate.** Every feature that uses RAG declares the dependency in its spec, plan, and changelog:

```markdown
## Scope
- Feature X (uses RAG for semantic search)
- Feature Y (no RAG dependency; uses Curation + Discussion only)

## Dependencies
- RAG is required for Feature X; the user must opt in via AI Settings
- Feature Y is independent of RAG
```

**The runtime gate.** The feature's code checks `config.rag_enabled` and behaves accordingly:

```python
# In the feature's code
def feature_x(query: str) -> list[SearchResult]:
    if not config.rag_enabled:
        raise RAGNotEnabledError("Feature X requires RAG; opt in via AI Settings")
    return rag_engine.search(query, k=N)
```

**The error message is explicit.** The user knows why the feature isn't working.

**The CLI surface** (for testing and debugging):

```bash
$ python -m src.feature_x "execution clutch"
# Error: RAG not enabled. Enable via: [ai_settings.toml] rag.enabled = true
```

**The audit trail.** Every feature that uses RAG is logged in `metadata.json` for the feature's track: `uses_rag: true`.

---

## 6. Graceful failure (Rule 6)

**The principle.** RAG failure is data, not an exception. A failed search returns an empty result; the request continues.

**The failure modes** (in priority order):

| Failure | Handling |
|---|---|
| RAG not enabled | Skip; no `{rag-context}` block; the request continues |
| ChromaDB not initialized | Skip; log a warning; the request continues |
| Embedding provider not available | Skip; log a warning; the request continues |
| Index missing (first run) | Skip; log a warning; the request continues |
| Search returns empty | Normal; no `{rag-context}` block; the request continues |
| Search times out | Return partial results; log a warning |
| Search raises an exception | Catch; log the exception; return empty; the request continues |

**The failure value is a `Result[T, ErrorInfo]`, not a raised exception.** Per the `data_oriented_error_handling_20260606` convention.

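
The snippets in this section rely on `Result`, `ErrorInfo`, `NOT_READY`, `INTERNAL`, and an `.ok` check, all of which come from the `data_oriented_error_handling_20260606` convention and are not reproduced here. As a reading aid, a minimal sketch of what such types could look like follows; the field layout, the `ErrorKind` enum, and the definition of `ok` are assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, TypeVar

T = TypeVar("T")
E = TypeVar("E")


class ErrorKind(Enum):
    """Hypothetical error categories matching the names used in this section."""
    NOT_READY = "not_ready"   # RAG disabled or index not initialized
    INTERNAL = "internal"     # unexpected exception inside the engine


# Module-level aliases so call sites can write ErrorInfo(NOT_READY, "...").
NOT_READY = ErrorKind.NOT_READY
INTERNAL = ErrorKind.INTERNAL


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T, E]):
    """Carries data plus any errors; failure is a value, never a raise."""
    data: T
    errors: list[E] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Assumed meaning of `ok`: no errors were recorded.
        return not self.errors
```

Under these assumptions a failed search is just `Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not enabled")])`, so a caller can branch on `.ok` without a `try`/`except`.
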

```python
# In the RAG engine
def search(self, query: str, k: int = 5) -> Result[list[SearchResult], ErrorInfo]:
    try:
        if not self._enabled:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not enabled")])
        if not self._collection:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not initialized")])
        results = self._collection.query(query, k=k)
        return Result(data=results, errors=[])
    except Exception as exc:
        return Result(data=[], errors=[ErrorInfo(INTERNAL, str(exc))])
```

**The caller** (`ai_client.py:send`) checks `.ok` and proceeds without RAG on failure or empty results:

```python
rag_result = rag_engine.search(prompt, k=N)
if rag_result.ok and rag_result.data:
    prompt = append_rag_block(prompt, rag_result.data)
# else: proceed without RAG; the request doesn't fail
```

**The user sees the warning** in the comms log:

```
[RAG] search failed: ChromaDB not initialized
[RAG] request continues without RAG
```

---

## 7. The wiring points (the where)

| Where in `src/` | What it does | What it does NOT do |
|---|---|---|
| `src/ai_client.py:send` | The integration point; appends `{rag-context}` if enabled | Does not mutate state |
| `src/aggregate.py:run` | Builds the initial context; appends `{rag-context}` in the volatile layer | Does not query RAG directly |
| `src/rag_engine.py:search` | The semantic search; returns `Result[list[SearchResult], ErrorInfo]` | Does not write to the index |
| `src/rag_engine.py:index_file` | The indexer; called by `RAGEngine._init_vector_store` or by the harvest CLI | Does not run at LLM call time |
| `src/ai_settings.toml` (or GUI) | The opt-in surface | Does not trigger RAG automatically |

---

## 8. The forbidden patterns (the "don't do this" list)

| Pattern | Why it's forbidden |
|---|---|
| RAG as a *replacement* for curation | Curation is structural (per-file schema); RAG is semantic (fuzzy). Use curation for "how to render file X" |
| RAG as a *replacement* for discussion | Discussion is precise (the actual messages); RAG is fuzzy. Use discussion for "what was said" |
| RAG as a *replacement* for knowledge | Knowledge is durable (user-edited, provenance-aware); RAG is volatile (indexed, opaque). Use knowledge for "what we decided" |
| Auto-inject RAG results into `disc_entries` | This is a state mutation; it changes the conversation in a way the user didn't ask for |
| Auto-write RAG results to disk | Same; no mutation |
| Use RAG when the user hasn't opted in | RAG is opt-in; default-off in new projects |
| Crash the request when RAG fails | Graceful failure; the request continues |
| Use RAG for "show me the last thing the user said" | Use `disc_entries` (precise) |
| Use RAG for "show me what we decided last time" | Use the knowledge digest (durable) |
| Use RAG for "show me the file the user is editing" | Use `FileItem` (curation) |

---

## 9. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` §3: the RAG dim in context
- `conductor/code_styleguides/data_oriented_design.md` §1.2: "Design around a model of the world" (the underlying anti-pattern)
- `conductor/code_styleguides/cache_friendly_context.md`: where the 4 dims get injected in the cache strategy
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge dim (the alternative for "what we decided")
- `docs/guide_rag.md`: the existing RAG deep-dive
- `data_oriented_error_handling_20260606`: the `Result[T, ErrorInfo]` pattern
- `conductor/tracks/rag_phase4_stress_fix_20260606`: the dim-mismatch fix at `16412ad5`