Re-audit after reading the actual full file contents:
1. guide_app_controller.md (the __init__ walkthrough):
- '4-thread ThreadPoolExecutor' -> '8-thread' per IO_POOL_MAX_WORKERS = 8
in src/io_pool.py:20 (bumped from 4 in commit 4a338486; the io_pool.py
module docstring is also stale and says '4 worker threads' - flagged
for a separate fix).
- '12 locks' -> '11 locks + 5 non-lock state fields' (re-counted the
threading.Lock() and the _rag_sync_*/_project_switch_* fields).
2. guide_app_controller.md (the closing line):
- '12 locks' -> removed; explained the 434-line __init__ body
composition (locks + state fields + settable_fields + gui_task_handlers).
3. guide_rag.md (Future Work section):
- 'The _search_mcp method is a placeholder for this' -> WRONG.
_search_mcp (src/rag_engine.py:322) IS a real implementation that
calls mcp_client.async_dispatch when vector_store.provider == 'mcp'.
Rewrote the future-work item to describe the actual mechanism.
4. docs/reports/docs_sync_test_era_20260610.md (the closing report):
- Same 4-thread->8 and 12-locks->11 corrections propagated.
The structural facts (WorkspaceProfile/RAGConfig/VectorStoreConfig field
lists, method existence, _init_actions/_load_active_project line
numbers, _LiveGuiHandle existence, etc.) were all correct. The
counting/threading-pool claims I cited from memory were the ones
that needed re-verification.
RAG (Retrieval-Augmented Generation)
Top | Architecture | MMA | Tools & IPC | Simulations
Overview
Manual Slop integrates Retrieval-Augmented Generation (RAG) to extend the AI's working context beyond the explicit file list. When a project is RAG-enabled, the system maintains a vector index of file content; AI calls can retrieve semantically similar fragments at query time and prepend them to the prompt.
The RAG implementation is pluggable: the vector store, the embedding provider, and the chunking strategy are all configurable per project. The default backend is ChromaDB (local persistent), the default embedding is Gemini Embedding 001 (cloud), and the default chunking is character-based with overlap (with AST-aware chunking for Python files when enabled).
This guide covers:
- Architecture: where RAG fits in the dispatch pipeline
- Components: RAGEngine, embedding providers, vector store
- Data Flow: indexing, query, retrieval, injection
- Configuration: RAGConfig schema and TOML settings
- Verification: test infrastructure and known edge cases
Architecture
RAG sits between the project's tracked files and the AI provider's input prompt. It is not an internal AI call — it is a pre-processing step that augments md_content before the provider sees it.
┌─────────────────────────────────┐
│ AppController / ConductorEngine │
│ (caller of ai_client.send) │
└────────────┬────────────────────┘
│ constructs RAGEngine once per project
▼
┌────────────────────────────────────────────┐
│ RAGEngine │
│ ├─ EmbeddingProvider (Local or Gemini) │
│ ├─ VectorStore (ChromaDB persistent) │
│ └─ Chunkers (_chunk_text, _chunk_code) │
└────────────┬───────────────────────────────┘
│ on every ai_client.send() call:
│ rag_engine.search(user_message) -> fragments
▼
┌────────────────────────────────────────────┐
│ ai_client.send(rag_engine=...) │
│ injects [RETRIEVED CONTEXT] block │
│ into md_content before provider call │
└────────────────────────────────────────────┘
Lifecycle:
- The AppController constructs a single RAGEngine per project load (lazily, when the project is first opened or when a RAG-related setting changes).
- The RAGEngine is passed through to ai_client.send() for every AI call from the main discussion flow.
- For Tier 3 workers spawned by the MMA, the caller is responsible for constructing the engine (typically with the same configuration as the main discussion); the ConductorEngine does not construct one itself (see MMA Worker Integration).
- If a project disables RAG, rag_engine=None is passed to send() and the integration is a no-op.
Why caller-owned? The RAG engine is decoupled from ai_client so that the same module can be reused by the GUI's RAG panel for direct queries, by MMA workers for ticket-specific retrieval, and by future automation scripts. ai_client only knows how to use an engine if one is provided.
Components
RAGEngine (src/rag_engine.py)
The central class. Owns the embedding provider and the vector store, exposes high-level methods for indexing and search.
class RAGEngine:
    def __init__(self, config: models.RAGConfig, base_dir: str = "."):
        ...
Construction: Takes a RAGConfig (from src/models.py) and a base_dir. The config specifies the embedding provider type, the vector store path, the chunk size, and the chunk overlap.
Internal state:
- embedding_provider: BaseEmbeddingProvider (set by _init_embedding_provider)
- client: chromadb.PersistentClient (the chroma client, or the string "mock" in mock mode)
- collection: chromadb.Collection (the actual collection, or "mock" in mock mode)
- chunk_size: int (character count per chunk)
- chunk_overlap: int (overlap between adjacent chunks)
Embedding Providers
Two providers are implemented; new ones can be added by subclassing BaseEmbeddingProvider.
BaseEmbeddingProvider
class BaseEmbeddingProvider:
    def embed(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts. Returns one vector per input text."""
        ...
A contract: embed() takes a list of strings and returns one float vector per input, all with the same dimensionality. The dimensionality is provider-specific (e.g., 384 for all-MiniLM-L6-v2, 3072 by default for gemini-embedding-001).
LocalEmbeddingProvider
Uses sentence-transformers (all-MiniLM-L6-v2 by default) for embedding.
- Pros: Fully local, no API quota, deterministic.
- Cons: Lower-quality embeddings than cloud models for code; CPU/GPU usage during indexing.
- Default model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first use).
class LocalEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
        ...
GeminiEmbeddingProvider
Uses the Gemini Embedding 001 model via the google-genai SDK.
- Pros: Higher-quality embeddings, especially for code; no local model download.
- Cons: Requires Gemini API key, network round-trip per embedding call, subject to API quotas.
class GeminiEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'gemini-embedding-001'):
        ...
Lazy Loading
The heavy dependencies (sentence_transformers, google.genai, chromadb) are loaded lazily via _get_sentence_transformers(), _get_google_genai(), _get_chromadb(). This means RAG is opt-in: a project that doesn't enable RAG pays no import-time cost.
Vector Store
ChromaDB is the default persistent vector store. The store is created at <project_dir>/.slop_cache/chroma_<collection_name>/ (auto-generated from VectorStoreConfig.collection_name, default "manual_slop"). The .slop_cache location is intentional — it co-locates the chroma index with the existing per-project cache layout.
def _init_vector_store(self):
    vs_config = self.config.vector_store
    if vs_config.provider == 'chroma':
        db_path = os.path.abspath(os.path.join(
            self.base_dir, ".slop_cache", f"chroma_{vs_config.collection_name}"
        ))
        os.makedirs(db_path, exist_ok=True)
        chromadb, Settings = _get_chromadb()
        self.client = chromadb.PersistentClient(path=db_path)
        self.collection = self.client.get_or_create_collection(name=vs_config.collection_name)
        self._validate_collection_dim()
    elif vs_config.provider == 'mock':
        self.client = "mock"
        self.collection = "mock"
    else:
        raise ValueError(f"Unknown vector store provider: {vs_config.provider}")
Backends (VectorStoreConfig.provider):
- chroma (default for real use): local persistent, single-process
- mock: no-op collection (for tests / RAG-disabled paths)
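The mcp_server and mcp_tool fields in VectorStoreConfig belong to the External RAG Bridge via MCP (e.g., a remote vector database server). Per the 2026-06-10 re-audit they are not placeholders for an unimplemented feature: when provider == "mcp", RAGEngine.search() dispatches to _search_mcp() (src/rag_engine.py:322); see Future Work. The _init_vector_store() excerpt above shows only the chroma and mock branches.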
Chunking Strategies
Two strategies are implemented. The choice is made per-file based on extension and config.
Character-Based (_chunk_text)
Default for non-Python files and for Python files when AST chunking is disabled.
def _chunk_text(self, content: str) -> List[str]:
    """Character-based chunking with overlap."""
    chunks = []
    start = 0
    while start < len(content):
        end = min(start + self.chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content):
            break
        start = end - self.chunk_overlap
    return chunks
- Default chunk size: 1000 characters
- Default overlap: 200 characters
- Edge cases: Empty files return []; single-chunk files return [content].
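To make the overlap arithmetic concrete, here is a standalone sketch of the same loop applied to a 2,500-character input with the default sizes. The free function chunk_text and the guard against chunk_overlap >= chunk_size are additions for illustration; the guard is an assumption, included because the loop as shown would never advance in that case.

```python
from typing import List


def chunk_text(content: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Character-based chunking with overlap (standalone version of the loop above)."""
    if chunk_overlap >= chunk_size:
        # Assumed guard: without it, start would never move forward.
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(content):
        end = min(start + chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content):
            break
        # Step back by the overlap so adjacent chunks share context.
        start = end - chunk_overlap
    return chunks


# Chunks start at offsets 0, 800 and 1600.
lengths = [len(c) for c in chunk_text("x" * 2500)]
print(lengths)  # [1000, 1000, 900]
```

Each chunk after the first begins chunk_size - chunk_overlap = 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.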
AST-Aware (_chunk_code)
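Used for .py files when AST chunking is enabled. Note that RAGConfig has no ast_chunking_enabled field (see the schema note under Configuration); earlier versions of this guide named that flag in error.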
def _chunk_code(self, content: str, file_path: str) -> List[str]:
    """AST-aware chunking for Python code."""
    # Parses with stdlib ast
    # Splits on top-level def/class boundaries
    # Each chunk is a complete top-level definition with its docstring
    ...
- Strategy: Each top-level function, class, or constant block becomes one chunk. Docstrings are preserved as the first line of the chunk for context.
- Pros: Semantic boundaries produce more meaningful retrieval results. A query for "how does X work" is more likely to return the entire definition of X rather than a fragment.
- Cons: Requires valid Python; syntax errors fall back to character-based chunking.
The chunker uses stdlib ast (not tree-sitter) to avoid pulling tree-sitter for a feature that only handles Python.
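A minimal sketch of the top-level splitting idea, assuming Python 3.8+ (for end_lineno). The function names are hypothetical and this is not the body of _chunk_code: it only emits functions and classes, so the constant blocks mentioned above are left out for brevity.

```python
import ast
from typing import Callable, List


def chunk_python_source(content: str) -> List[str]:
    """Return one chunk per top-level def/class; raises SyntaxError on invalid code."""
    tree = ast.parse(content)
    lines = content.splitlines()
    chunks: List[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


def chunk_code_with_fallback(content: str, fallback: Callable[[str], List[str]]) -> List[str]:
    """Use AST chunking, falling back to another chunker on syntax errors."""
    try:
        return chunk_python_source(content)
    except SyntaxError:
        return fallback(content)
```

Passing the character-based chunker as the fallback argument reproduces the fallback behavior described under Cons.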
Data Flow
Indexing Flow
When a project is loaded with RAG enabled, the RAGEngine is populated by indexing all tracked files.
1. Project load: AppController reads the [rag] section from manual_slop.toml
2. AppController constructs RAGEngine(config)
3. RAGEngine._init_vector_store() creates/loads the ChromaDB collection
   - Calls _validate_collection_dim() to detect/recover from dim mismatch
4. For each tracked file (parallelized):
   a. Read content
   b. Choose chunker based on extension and config
   c. For each chunk: call embedding_provider.embed([chunk])
   d. Add to vector store with metadata {path, chunk_index, ...}
5. Indexing complete; engine is ready for queries
Parallelization: The indexing pipeline uses ThreadPoolExecutor for parallel embedding calls (the embedding step is the bottleneck). The chunking is fast and sequential per file.
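The parallel embedding step might look like the sketch below. The function name, the worker count of 4, and the one-chunk-per-call shape are assumptions for illustration; only the use of ThreadPoolExecutor comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]


def embed_chunks_parallel(chunks: List[str], embed: EmbedFn, max_workers: int = 4) -> List[List[float]]:
    """Embed each chunk on a worker thread; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # so vectors stay aligned with their chunk indices.
        return list(pool.map(lambda chunk: embed([chunk])[0], chunks))
```

Threads suit this step because embedding calls spend most of their time waiting on the network or on native code rather than on Python bytecode.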
Incremental Updates: When a file's mtime changes (detected by pathlib.Path.stat().st_mtime), delete_documents_by_path() is called first, then the file is re-indexed. This is critical for the auto-sync flow (see Configuration below).
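A sketch of the delete-then-reindex cycle driven by mtime, under the assumption that the last seen mtimes are kept in a plain dictionary. find_changed_files and sync_changed are hypothetical helper names; delete_documents_by_path and index_file are the engine methods listed under Public Methods.

```python
from pathlib import Path
from typing import Dict, List


def find_changed_files(paths: List[str], last_seen: Dict[str, float]) -> List[str]:
    """Return paths whose mtime differs from the recorded one, updating the record."""
    changed = []
    for path in paths:
        mtime = Path(path).stat().st_mtime
        if last_seen.get(path) != mtime:
            changed.append(path)
            last_seen[path] = mtime
    return changed


def sync_changed(engine, paths: List[str], last_seen: Dict[str, float]) -> List[str]:
    """For each changed file, drop its stale chunks first, then re-index it."""
    changed = find_changed_files(paths, last_seen)
    for path in changed:
        engine.delete_documents_by_path(path)
        engine.index_file(path)
    return changed
```

Deleting before indexing matters because a shorter file produces fewer chunks; without the delete, trailing chunks from the old version would remain searchable.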
Path resolution resilience: index_file() falls back to os.getcwd() if the base_dir-relative path doesn't exist. This handles batched test conditions where the subprocess CWD differs from the project root (e.g., a test chdir'ing into tests/artifacts/live_gui_workspace_*/ for fixture isolation). Without the fallback, indexing silently skipped files in those conditions.
Dimension Mismatch Protection
_init_vector_store() calls _validate_collection_dim() after creating the collection. The validation inspects the first existing vector's dim and compares it to the current embedding provider's output. On mismatch (e.g., the user switched from Gemini 3072-dim to local 384-dim, or vice versa, or a prior run populated the collection with a different model), the chroma directory is wiped via shutil.rmtree (with the client closed first to release file handles) and the collection is recreated with the correct dim.
Why this exists: Without validation, dim-mismatched upserts silently corrupt the collection. The next search() raises chromadb.errors.InvalidDimensionError: Collection expecting embedding with dimension of X, got Y, the AI request never reaches 'done' status, and the live_gui test polls time out after 50×0.5s = 25s. This pattern was the dominant cause of tier-3-live_gui failures in the 2026-06-08 to 2026-06-10 window.
Regression tests in tests/test_rag_engine.py: test_rag_collection_dim_mismatch_recreates_collection, test_rag_collection_dim_match_preserves_collection.
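The comparison at the heart of the dimension check can be sketched without ChromaDB. The function below is a hypothetical stand-in for the decision made inside _validate_collection_dim(): it takes an already-fetched stored vector (or None for an empty collection) and probes the provider with one short text, which is an assumed way of learning the provider's output size.

```python
from typing import Callable, List, Optional, Sequence

EmbedFn = Callable[[List[str]], List[List[float]]]


def needs_rebuild(stored_vector: Optional[Sequence[float]], embed: EmbedFn) -> bool:
    """True when an existing vector's dimension differs from the provider's output."""
    if stored_vector is None:
        # Empty collection: nothing to conflict with.
        return False
    provider_dim = len(embed(["dimension probe"])[0])
    return len(stored_vector) != provider_dim
```

When this returns True, the flow described above closes the client, removes the chroma directory, and recreates the collection before any upsert happens.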
Query Flow
When ai_client.send(rag_engine=engine) is called:
1. send() receives user_message
2. If rag_engine is not None:
a. rag_engine.search(user_message, top_k=5) -> list of {text, metadata, distance}
b. If results non-empty: inject [RETRIEVED CONTEXT] block into md_content
c. The block contains the top_k fragments, formatted as:
```
[RETRIEVED CONTEXT]
File: path/to/file.py (chunk 0)
<chunk text>
File: path/to/another.py (chunk 2)
<chunk text>
...
```
3. send() proceeds to the provider call with the augmented md_content
The injection point is before the system prompt construction. This means the retrieved context is treated as part of the project's tracked content, not as ad-hoc advice.
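A sketch of how the [RETRIEVED CONTEXT] block could be assembled from search results. The function names are hypothetical, the metadata keys path and chunk_index follow the indexing step above, and prepending the block to md_content is an assumption based on the Overview; the real injection helper is not shown in this guide.

```python
from typing import Any, Dict, List


def format_retrieved_context(results: List[Dict[str, Any]]) -> str:
    """Render search results in the block format shown in the query flow."""
    lines = ["[RETRIEVED CONTEXT]"]
    for result in results:
        meta = result.get("metadata", {})
        lines.append(f"File: {meta.get('path', '?')} (chunk {meta.get('chunk_index', '?')})")
        lines.append(result["text"])
    return "\n".join(lines)


def inject_context(md_content: str, results: List[Dict[str, Any]]) -> str:
    """Prepend the block to md_content; leave md_content untouched if nothing was found."""
    if not results:
        return md_content
    return format_retrieved_context(results) + "\n\n" + md_content
```

Returning md_content unchanged for an empty result list keeps the integration a no-op when the index has nothing relevant.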
Public Methods
# Index a single file
rag_engine.index_file(path: str) -> None
# Search the index
rag_engine.search(query: str, top_k: int = 5) -> List[Dict[str, Any]]
# Returns: [{"text": str, "metadata": dict, "distance": float}, ...]
# Index management
rag_engine.add_documents(ids: List[str], texts: List[str], metadatas: Optional[List[dict]] = None) -> None
rag_engine.delete_documents(ids: List[str]) -> None
rag_engine.delete_documents_by_path(path: str) -> None
rag_engine.get_all_indexed_paths() -> List[str]
rag_engine.is_empty() -> bool
Configuration
RAG is configured via the project's manual_slop.toml:
[rag]
enabled = true
embedding_provider = "gemini" # or "local"
chunk_size = 1000
chunk_overlap = 200

[rag.vector_store]
provider = "chroma" # "chroma" | "mock"
collection_name = "manual_slop" # the chroma subdir under .slop_cache/
url = "" # future: external HTTP vector store
api_key = "" # future: external HTTP auth
mcp_server = "" # future: MCP-based external RAG bridge
mcp_tool = "" # future: tool name on the MCP server
RAGConfig + VectorStoreConfig Schema (src/models.py)
@dataclass
class VectorStoreConfig:
    provider: str  # "chroma" | "mock"
    url: Optional[str] = None  # future: external HTTP
    api_key: Optional[str] = None  # future: external HTTP auth
    collection_name: str = "manual_slop"
    mcp_server: Optional[str] = None  # future: MCP bridge
    mcp_tool: Optional[str] = None  # future: MCP tool name

@dataclass
class RAGConfig:
    enabled: bool = False
    vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock'))
    embedding_provider: str = 'gemini'  # "gemini" | "local"
    chunk_size: int = 1000
    chunk_overlap: int = 200
What about the fields the old doc showed? The 2026-06-10 docs sync verified against src/models.py:1029-1040 that the previous RAGConfig schema was fictional: most of the fields it listed never existed in the real dataclass. Specifically:
- ast_chunking_enabled does not exist anywhere in src/ (there is no ChunkingConfig class; I claimed one existed in an earlier draft of this note and was wrong, and I am flagging the correction here).
- vector_store_backend and vector_store_path never existed on RAGConfig (they were a flattened version of the now-nested VectorStoreConfig).
- auto_index_on_load and auto_sync_interval_seconds do not exist anywhere in src/ (they were aspirational; the actual index-on-load and auto-sync behavior is wired in RAGEngine and the controller's mma_state_update flow, not via persisted config).
- top_k IS a real thing, but it is a runtime parameter to RAGEngine.search(query, top_k=5) and RAGEngine._search_mcp(query, top_k=5) (src/rag_engine.py:339, 322), not a field on RAGConfig. The old doc confused "config field" with "search parameter."
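To show how the nested TOML layout maps onto the two dataclasses, here is a sketch that builds a RAGConfig from an already-parsed [rag] table. The dataclasses are copied from the schema above; rag_config_from_dict and the rule that empty strings mean "unset" are assumptions, not the loader in src/models.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VectorStoreConfig:
    provider: str
    url: Optional[str] = None
    api_key: Optional[str] = None
    collection_name: str = "manual_slop"
    mcp_server: Optional[str] = None
    mcp_tool: Optional[str] = None


@dataclass
class RAGConfig:
    enabled: bool = False
    vector_store: VectorStoreConfig = field(
        default_factory=lambda: VectorStoreConfig(provider="mock"))
    embedding_provider: str = "gemini"
    chunk_size: int = 1000
    chunk_overlap: int = 200


def rag_config_from_dict(section: Dict[str, Any]) -> RAGConfig:
    """Build a RAGConfig from the parsed [rag] table (hypothetical helper)."""
    section = dict(section)  # copy so the caller's dict is not mutated
    vs = dict(section.pop("vector_store", {}))
    vs.setdefault("provider", "mock")
    # Assumption: an empty string in TOML means the optional field is unset.
    for key in ("url", "api_key", "mcp_server", "mcp_tool"):
        if vs.get(key) == "":
            vs[key] = None
    return RAGConfig(vector_store=VectorStoreConfig(**vs), **section)
```

A key that is not a dataclass field (for example a leftover top_k) raises TypeError in this sketch, which makes stale config entries visible instead of silently ignored.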
Behavior When Disabled
If enabled = false (the default), RAGEngine is never constructed. ai_client.send() receives rag_engine=None and the integration is a no-op. The lazy-loading of chromadb, sentence_transformers, and google.genai is also skipped, so there is zero overhead for projects that don't use RAG.
Auto-Sync
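A background thread periodically scans tracked files for mtime changes and re-indexes them. This keeps the vector store consistent with on-disk changes without requiring explicit user action. The behavior is wired in RAGEngine and the controller's mma_state_update flow; there is no persisted auto_sync_interval_seconds setting (see the schema note above).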
The sync uses pathlib.Path.stat().st_mtime for change detection (same mechanism as the file cache in file_cache.py). For very large projects, the sync can be tuned to skip files above a size threshold.
Cross-System Integration
ai_client.send() Integration
See guide_architecture.md#rag-integration for the full dispatch flow. Summary:
def send(md_content, user_message, ..., rag_engine=None) -> str:
    if rag_engine is not None:
        # top_k is a search() argument (default 5), not a RAGConfig field
        retrieved = rag_engine.search(user_message, top_k=5)
        if retrieved:
            md_content = _inject_rag_context(md_content, retrieved)
    ...
The injection is a no-op if:
- rag_engine is None
- rag_engine.is_empty() (index has no documents)
- search() returns no results above the distance threshold
MMA Worker Integration
The ConductorEngine does not construct RAGEngine itself. Workers receive context via md_content which is built by the caller. To use RAG in workers:
- Construct a RAGEngine in the caller (typically AppController or test harness).
- Pass it to multi_agent_conductor.run_worker_lifecycle(..., rag_engine=...) (if supported) or to the test invocation.
- The worker passes it to ai_client.send(rag_engine=...).
Note: As of 2026-06-02, the direct rag_engine parameter on run_worker_lifecycle is not yet implemented. Workers currently rely on the md_content already being augmented by the caller, or on Tier 4 / Tier 2 setting up the augmentation before spawning workers.
GUI Integration
The GUI's RAG panel (under AI Settings → RAG) provides:
- Status indicator: RAGEngine.is_empty() → "Empty" / "Indexed N chunks"
- Manual search box: for testing retrieval quality without sending a full AI call
- Re-index button: forces a full rebuild of the index
- Settings editor: modifies RAGConfig fields and writes back to manual_slop.toml
The RAG panel also surfaces the auto-sync status (last sync time, files indexed, files pending re-index).
Testing
Unit Tests
- tests/test_rag_engine.py: RAGEngine basic lifecycle with mock ChromaDB and mock embedding provider
- tests/test_rag_integration.py: End-to-end indexing + search + retrieval
Simulation Tests
- tests/test_rag_gui_presence.py: Verifies the RAG panel renders correctly
- tests/test_rag_visual_sim.py: Visual verification of the RAG search results panel
Stress Tests
- tests/test_rag_phase4_stress.py: Indexes 1000+ files, measures retrieval latency
- tests/test_rag_phase4_final_verify.py: End-to-end verification of RAG-augmented AI responses
Test Patterns
The standard pattern for testing RAG-augmented calls:
def test_rag_augmented_send(live_gui):
    # 1. Set up project with RAG enabled
    client.set_rag_config(enabled=True, embedding_provider="local")
    client.reindex_project()
    # 2. Send a question that requires retrieval
    response = client.send("How does the Execution Clutch work?")
    # 3. Verify the response references the retrieved content
    # (The exact assertion depends on what was indexed)
    assert response
For unit tests that don't need real embedding models, the BaseEmbeddingProvider is mocked to return deterministic vectors (e.g., based on the hash of the input text).
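One way to build such a deterministic mock is to derive each vector from a hash of the text. The class below is a hypothetical sketch: it is standalone rather than a subclass of BaseEmbeddingProvider so it runs on its own, and the 8-dimensional default is an arbitrary choice for tests.

```python
import hashlib
from typing import List


class HashEmbeddingProvider:
    """Test double: the same text always yields the same small vector."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map digest bytes into [0, 1]; wrap around if dim exceeds 32 bytes.
            vectors.append([digest[i % len(digest)] / 255.0 for i in range(self.dim)])
        return vectors
```

These vectors carry no semantic meaning, so the mock suits lifecycle and plumbing tests, not tests that assert on retrieval quality.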
Edge Cases & Limitations
- Empty Index: If the index has no documents, search() returns [] and no context is injected. The AI call proceeds normally with just the explicit file context.
- Network Failures (Gemini Embeddings): If the Gemini API is unreachable, GeminiEmbeddingProvider.embed() raises an exception. The caller (typically index_file and the indexing loop that drives it) should handle this gracefully and either retry or fall back to the local provider.
- Stale Index: Auto-sync runs periodically but not on every read. If a file is changed between sync intervals, the index may be stale. The delete_documents_by_path + index_file cycle is atomic per file, so a partial sync leaves the index in a consistent (if incomplete) state.
- Large Files: A single file larger than chunk_size is split into multiple chunks with overlap. There's no upper limit on the number of chunks per file, but very large files (>10MB) may slow down indexing significantly.
- Binary Files: RAG only handles text files. Binary files (images, compiled Python, etc.) are skipped during indexing with a warning logged to comms_log.
- Cross-Project Queries: The vector store is per-project (<project_dir>/.slop_cache/chroma_<collection_name>/). Cross-project retrieval is not supported; each project has its own isolated index.
- Concurrent Writes: ChromaDB's PersistentClient is single-writer. If multiple processes try to write to the same index simultaneously, ChromaDB will raise. Manual Slop uses a threading.Lock to serialize writes from the auto-sync thread and the manual re-index button.
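The write serialization in the last item can be sketched with a single lock shared by both writers. The class and function names are hypothetical, and an in-memory list stands in for the ChromaDB collection; note that a threading.Lock only serializes threads inside one process, not separate processes.

```python
import threading
from typing import List


class SerializedIndexWriter:
    """Guards every write batch with one lock so batches never interleave."""

    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        self.log: List[str] = []  # stand-in for the vector store

    def reindex(self, source: str, paths: List[str]) -> None:
        with self._write_lock:
            for path in paths:
                self.log.append(f"{source}:{path}")


def run_two_writers() -> List[str]:
    """Simulate the auto-sync thread and the manual re-index button racing."""
    writer = SerializedIndexWriter()
    threads = [
        threading.Thread(target=writer.reindex, args=("auto-sync", ["a.py", "b.py"])),
        threading.Thread(target=writer.reindex, args=("manual", ["a.py", "b.py"])),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return writer.log
```

Whichever thread acquires the lock first finishes its whole batch before the other starts, so the log always holds two contiguous runs.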
Future Work
- External RAG Bridge: Connect to remote vector databases (e.g., a managed Pinecone or Weaviate) via MCP. The _search_mcp method (src/rag_engine.py:322) is a real implementation, not a placeholder: when RAGConfig.vector_store.provider == "mcp", RAGEngine.search() dispatches to _search_mcp(), which calls mcp_client.async_dispatch("rag_search", {"query": ..., "top_k": ...}). The MCP-bridge config (mcp_server, mcp_tool) lives on VectorStoreConfig. The bridge wires up the rest of the RAG pipeline to a remote vector store; no per-vendor _init_vector_store branch is needed because the MCP server owns that.
- Hybrid Search: Combine dense (vector) retrieval with sparse (BM25) retrieval for better recall on code keywords.
- Re-ranking: Apply a cross-encoder reranker to the top-k results before injection to improve precision.
- Caching: Cache query results in memory to avoid re-embedding for repeated questions.
- Provider Routing: Allow per-query provider selection (e.g., use Gemini for general queries, local for code).
See guide_tools.md for the MCP tool inventory; see guide_architecture.md for the dispatch pipeline.