Gitea (and any case-sensitive filesystem) was rendering the [Top]
nav links in /docs as broken because of two bugs:
1. Case-sensitivity: 22 links used '../README.md' (all-uppercase)
but the actual file is 'docs/Readme.md' (capital R, lowercase
rest). 21 guide_*.md nav bars were affected, plus 1 internal
cross-link in Readme.md itself. Works on Windows (case-
insensitive) but broken on Linux/Gitea.
Fix: 22 occurrences across 22 files changed
'../README.md' -> '../Readme.md'
2. Wrong relative-path level: 16 links used '../../conductor/...'
from 'docs/guide_*.md' to reach 'conductor/'. This goes up 2
levels to 'projects/', which doesn't exist. The correct path
from 'docs/guide_*.md' to 'conductor/' is 1 level up
('../conductor/...'). 12 unique patterns across 10 files
affected.
Fix: 16 occurrences across 10 files changed
'../../conductor/' -> '../conductor/'
3. Bonus: 1 planned-guide link in guide_context_curation.md
referenced a never-written 'guide_context_presets.md'. The
ContextPreset schema is now fully covered in the new
'guide_context_aggregation.md' (per the 2026-06-08 docs
refresh). Fix: link target updated.
No content was changed, only link paths. 24 files, 37 link
replacements, 37 deletions.
Verification:
- All .md links in docs/ now resolve to existing files
(validated by path-resolution check from each file's directory)
- The 3 new guides from the previous docs refresh commit
(guide_discussions.md, guide_state_lifecycle.md,
guide_context_aggregation.md) had the case bug inherited from
guide_architecture.md's existing nav pattern; their top-of-file
nav bars are now correct
- The 21 pre-existing guide nav bars that had the same bug
(all 21 of them, except the 3 that used the correct case:
guide_mma.md, guide_simulations.md, guide_tools.md) are now
also fixed
- Inter-guide links (e.g. [Discussions](guide_discussions.md))
were not affected; they were always correct because both the
link text and the actual filename are lowercase
This is a docs-only fix. No code modified.
RAG (Retrieval-Augmented Generation)
Overview
Manual Slop integrates Retrieval-Augmented Generation (RAG) to extend the AI's working context beyond the explicit file list. When a project is RAG-enabled, the system maintains a vector index of file content; AI calls can retrieve semantically similar fragments at query time and prepend them to the prompt.
The RAG implementation is pluggable: the vector store, the embedding provider, and the chunking strategy are all configurable per project. The default backend is ChromaDB (local persistent), the default embedding is Gemini Embedding 001 (cloud), and the default chunking is character-based with overlap (with AST-aware chunking for Python files when enabled).
This guide covers:
- Architecture: Where RAG fits in the dispatch pipeline
- Components: RAGEngine, embedding providers, vector store
- Data Flow: Indexing, query, retrieval, injection
- Configuration: RAGConfig schema and TOML settings
- Verification: Test infrastructure and known edge cases
Architecture
RAG sits between the project's tracked files and the AI provider's input prompt. It is not an internal AI call — it is a pre-processing step that augments md_content before the provider sees it.
┌─────────────────────────────────┐
│ AppController / ConductorEngine │
│ (caller of ai_client.send)      │
└────────────┬────────────────────┘
             │ constructs RAGEngine once per project
             ▼
┌────────────────────────────────────────────┐
│ RAGEngine                                  │
│  ├─ EmbeddingProvider (Local or Gemini)    │
│  ├─ VectorStore (ChromaDB persistent)      │
│  └─ Chunkers (_chunk_text, _chunk_code)    │
└────────────┬───────────────────────────────┘
             │ on every ai_client.send() call:
             │   rag_engine.search(user_message) -> fragments
             ▼
┌────────────────────────────────────────────┐
│ ai_client.send(rag_engine=...)             │
│ injects [RETRIEVED CONTEXT] block          │
│ into md_content before provider call       │
└────────────────────────────────────────────┘
Lifecycle:
- The AppController constructs a single RAGEngine per project load (lazily, when the project is first opened or when a RAG-related setting changes).
- The RAGEngine is passed through to ai_client.send() for every AI call from the main discussion flow.
- For Tier 3 workers spawned by the MMA, the caller is responsible for constructing the engine (typically with the same configuration as the main discussion); the ConductorEngine does not construct one itself (see MMA Worker Integration).
- If a project disables RAG, rag_engine=None is passed to send() and the integration is a no-op.
Why caller-owned? The RAG engine is decoupled from ai_client so that the same module can be reused by the GUI's RAG panel for direct queries, by MMA workers for ticket-specific retrieval, and by future automation scripts. ai_client only knows how to use an engine if one is provided.
Components
RAGEngine (src/rag_engine.py)
The central class. It owns the embedding provider and the vector store, and exposes high-level methods for indexing and search.
class RAGEngine:
    def __init__(self, config: models.RAGConfig, base_dir: str = "."):
        ...
Construction: Takes a RAGConfig (from src/models.py) and a base_dir. The config specifies the embedding provider type, the vector store path, the chunk size, and the chunk overlap.
Internal state:
- embedding_provider: BaseEmbeddingProvider (set by _init_embedding_provider)
- vector_store (a ChromaDB Collection, or a stub for tests)
- chunk_size: int (character count per chunk)
- chunk_overlap: int (overlap between adjacent chunks)
Embedding Providers
Two providers are implemented; new ones can be added by subclassing BaseEmbeddingProvider.
BaseEmbeddingProvider
class BaseEmbeddingProvider:
    def embed(self, texts: List[str]) -> List[List[float]]:
        """Embed a batch of texts. Returns one vector per input text."""
        ...
The contract: embed() takes a list of strings and returns one float vector per input, all of the same length. The vector dimensionality is provider-specific (e.g., 384 for all-MiniLM-L6-v2; gemini-embedding-001 returns 3072 dimensions by default and can be configured for smaller sizes such as 768).
LocalEmbeddingProvider
Uses sentence-transformers (all-MiniLM-L6-v2 by default) for embedding.
- Pros: Fully local, no API quota, deterministic.
- Cons: Lower-quality embeddings than cloud models for code; CPU/GPU usage during indexing.
- Default model: all-MiniLM-L6-v2 (384 dimensions, ~80MB download on first use).
class LocalEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'all-MiniLM-L6-v2'):
        ...
GeminiEmbeddingProvider
Uses the Gemini Embedding 001 model via the google-genai SDK.
- Pros: Higher-quality embeddings, especially for code; no local model download.
- Cons: Requires Gemini API key, network round-trip per embedding call, subject to API quotas.
class GeminiEmbeddingProvider(BaseEmbeddingProvider):
    def __init__(self, model_name: str = 'gemini-embedding-001'):
        ...
Lazy Loading
The heavy dependencies (sentence_transformers, google.genai, chromadb) are loaded lazily via _get_sentence_transformers(), _get_google_genai(), _get_chromadb(). This means RAG is opt-in: a project that doesn't enable RAG pays no import-time cost.
Vector Store
ChromaDB is the default persistent vector store. The store is created at <project_dir>/.rag/chroma/ by default (configurable via RAGConfig.vector_store_path).
def _init_vector_store(self):
    if self.config.vector_store_backend == "chromadb":
        client = chromadb.PersistentClient(path=...)
        self.vector_store = client.get_or_create_collection(name=...)
    else:
        raise NotImplementedError(...)
Backends:
- chromadb (default): local persistent, single-process
- Future: External RAG Bridge via MCP (e.g., a remote vector database server)
The _search_mcp method is a placeholder for the future external bridge integration; current local-only mode uses vector_store.query() directly.
Chunking Strategies
Two strategies are implemented. The choice is made per-file based on extension and config.
Character-Based (_chunk_text)
Default for non-Python files and for Python files when AST chunking is disabled.
def _chunk_text(self, content: str) -> List[str]:
    """Character-based chunking with overlap."""
    chunks = []
    start = 0
    while start < len(content):
        end = min(start + self.chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content):
            break
        start = end - self.chunk_overlap
    return chunks
- Default chunk size: 1000 characters
- Default overlap: 200 characters
- Edge cases: Empty files return []; single-chunk files return [content].
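To make the overlap behaviour concrete, here is a standalone version of the same loop run with deliberately small sizes. The free function `chunk_text` and the `ValueError` guard are assumptions added for illustration; the method shown above has no such guard.

```python
from typing import List


def chunk_text(content: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Character-based chunking with overlap (standalone illustration)."""
    if chunk_overlap >= chunk_size:
        # Assumption: guard added here; otherwise the window would never advance.
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(content):
        end = min(start + chunk_size, len(content))
        chunks.append(content[start:end])
        if end >= len(content):
            break
        start = end - chunk_overlap  # step back so adjacent chunks share text
    return chunks


# Small sizes make the one-character overlap visible.
demo = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the same mechanism that gives 200 shared characters at the default settings.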
AST-Aware (_chunk_code)
Used for .py files when RAGConfig.ast_chunking_enabled = True.
def _chunk_code(self, content: str, file_path: str) -> List[str]:
    """AST-aware chunking for Python code."""
    # Parses with stdlib ast
    # Splits on top-level def/class boundaries
    # Each chunk is a complete top-level definition with its docstring
    ...
- Strategy: Each top-level function, class, or constant block becomes one chunk. Docstrings are preserved as the first line of the chunk for context.
- Pros: Semantic boundaries produce more meaningful retrieval results. A query for "how does X work" is more likely to return the entire definition of X rather than a fragment.
- Cons: Requires valid Python; syntax errors fall back to character-based chunking.
The chunker uses stdlib ast (not tree-sitter) to avoid pulling tree-sitter for a feature that only handles Python.
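The following is a rough sketch of how splitting on top-level boundaries with stdlib `ast` might look. It is not the project's `_chunk_code`: the function name, the `None` return used to signal a fallback, and the grouping of consecutive non-definition statements into one block are all assumptions.

```python
import ast
from typing import List, Optional


def chunk_python_source(content: str) -> Optional[List[str]]:
    """Split source on top-level def/class boundaries; None means 'fall back'."""
    try:
        tree = ast.parse(content)
    except SyntaxError:
        return None  # caller would use character-based chunking instead
    lines = content.splitlines()

    def segment(first: int, last: int) -> str:
        return "\n".join(lines[first - 1 : last])  # AST line numbers are 1-based

    chunks: List[str] = []
    block_start: Optional[int] = None  # start of a run of non-def statements
    block_end = 0
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if block_start is not None:
                chunks.append(segment(block_start, block_end))
                block_start = None
            # Decorators sit above the def/class line, so start from the earliest one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append(segment(first, node.end_lineno))
        else:
            if block_start is None:
                block_start = node.lineno
            block_end = node.end_lineno
    if block_start is not None:
        chunks.append(segment(block_start, block_end))
    return chunks


SOURCE = '''X = 1
Y = 2

def f():
    """Doc."""
    return X

class C:
    pass
'''
demo_chunks = chunk_python_source(SOURCE)
```

In this sketch, comments and blank lines between top-level statements are not attached to any chunk, since they do not appear as AST nodes.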
Data Flow
Indexing Flow
When a project is loaded with RAG enabled, the RAGEngine is populated by indexing all tracked files.
1. Project load: AppController reads [rag] section from manual_slop.toml
2. AppController constructs RAGEngine(config)
3. RAGEngine._init_vector_store() creates/loads ChromaDB collection
4. For each tracked file (parallelized):
   a. Read content
   b. Choose chunker based on extension and config
   c. For each chunk: call embedding_provider.embed([chunk])
   d. Add to vector store with metadata {path, chunk_index, ...}
5. Indexing complete; engine is ready for queries
Parallelization: The indexing pipeline uses ThreadPoolExecutor for parallel embedding calls (the embedding step is the bottleneck). The chunking is fast and sequential per file.
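A small sketch of fanning embedding calls out over a thread pool, under the assumption that each chunk is embedded with its own `embed([chunk])` call as in step 4c. `embed_chunks_parallel` and `fake_embed` are hypothetical names; `fake_embed` stands in for a real provider.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def embed_chunks_parallel(
    chunks: Sequence[str],
    embed: Callable[[List[str]], List[Vector]],
    max_workers: int = 4,
) -> List[Vector]:
    """Embed each chunk in a worker thread, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        batches = pool.map(lambda chunk: embed([chunk]), chunks)
        return [batch[0] for batch in batches]


def fake_embed(texts: List[str]) -> List[Vector]:
    # Stand-in for embedding_provider.embed: one 2-d vector per text.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


vectors = embed_chunks_parallel(["alpha", "beta gamma", ""], fake_embed)
```

Threads suit this step when the embedding call is I/O-bound (a network request) or releases the GIL during computation.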
Incremental Updates: When a file's mtime changes (detected by pathlib.Path.stat().st_mtime), delete_documents_by_path() is called first, then the file is re-indexed. This is critical for the auto-sync flow (see Configuration below).
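The mtime comparison can be sketched as below. `find_changed_files` and the `last_seen` dictionary are illustrative assumptions; in the flow described above, each returned path would then go through `delete_documents_by_path()` followed by `index_file()`.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def find_changed_files(paths: Iterable[Path], last_seen: Dict[str, float]) -> List[Path]:
    """Return files whose mtime differs from the recorded one, updating the record."""
    changed: List[Path] = []
    for path in paths:
        mtime = path.stat().st_mtime
        if last_seen.get(str(path)) != mtime:
            last_seen[str(path)] = mtime
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("v1")
    seen: Dict[str, float] = {}
    first = find_changed_files([target], seen)   # a new file counts as changed
    second = find_changed_files([target], seen)  # nothing changed since
    stat = target.stat()
    os.utime(target, (stat.st_atime, stat.st_mtime + 10))  # simulate an edit
    third = find_changed_files([target], seen)
```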
Query Flow
When ai_client.send(rag_engine=engine) is called:
1. send() receives user_message
2. If rag_engine is not None:
   a. rag_engine.search(user_message, top_k=5) -> list of {text, metadata, distance}
   b. If results non-empty: inject [RETRIEVED CONTEXT] block into md_content
   c. The block contains the top_k fragments, formatted as:
      ```
      [RETRIEVED CONTEXT]
      File: path/to/file.py (chunk 0)
      <chunk text>
      File: path/to/another.py (chunk 2)
      <chunk text>
      ...
      ```
3. send() proceeds to the provider call with the augmented md_content
The injection point is before the system prompt construction. This means the retrieved context is treated as part of the project's tracked content, not as ad-hoc advice.
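As an illustration of the block format in the query flow, the sketch below builds a [RETRIEVED CONTEXT] block from search results. The function name `inject_rag_context` and the choice to prepend the block to `md_content` are assumptions; the `path` and `chunk_index` metadata keys follow the indexing flow above.

```python
from typing import Any, Dict, List


def inject_rag_context(md_content: str, retrieved: List[Dict[str, Any]]) -> str:
    """Prepend a [RETRIEVED CONTEXT] block built from search() results."""
    if not retrieved:
        return md_content  # no-op when nothing was retrieved
    lines = ["[RETRIEVED CONTEXT]"]
    for hit in retrieved:
        meta = hit.get("metadata", {})
        lines.append(f"File: {meta.get('path', '?')} (chunk {meta.get('chunk_index', '?')})")
        lines.append(hit["text"])
    return "\n".join(lines) + "\n\n" + md_content


hits = [
    {
        "text": "def clutch(): ...",
        "metadata": {"path": "src/clutch.py", "chunk_index": 0},
        "distance": 0.12,
    }
]
augmented = inject_rag_context("# Project files", hits)
```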
Public Methods
# Index a single file
rag_engine.index_file(path: str) -> None
# Search the index
rag_engine.search(query: str, top_k: int = 5) -> List[Dict[str, Any]]
# Returns: [{"text": str, "metadata": dict, "distance": float}, ...]
# Index management
rag_engine.add_documents(ids: List[str], texts: List[str], metadatas: Optional[List[dict]] = None) -> None
rag_engine.delete_documents(ids: List[str]) -> None
rag_engine.delete_documents_by_path(path: str) -> None
rag_engine.get_all_indexed_paths() -> List[str]
rag_engine.is_empty() -> bool
Configuration
RAG is configured via the project's manual_slop.toml:
[rag]
enabled = true
embedding_provider = "gemini" # or "local"
chunk_size = 1000
chunk_overlap = 200
ast_chunking_enabled = true
vector_store_backend = "chromadb"
vector_store_path = ".rag/chroma" # relative to project base_dir
auto_index_on_load = true
auto_sync_interval_seconds = 60 # background re-indexing
top_k = 5
RAGConfig Schema (src/models.py)
@dataclass
class RAGConfig:
    enabled: bool = False
    embedding_provider: str = "gemini"  # "local" | "gemini"
    chunk_size: int = 1000
    chunk_overlap: int = 200
    ast_chunking_enabled: bool = True
    vector_store_backend: str = "chromadb"
    vector_store_path: str = ".rag/chroma"
    auto_index_on_load: bool = True
    auto_sync_interval_seconds: int = 60
    top_k: int = 5
Behavior When Disabled
If enabled = false (the default), RAGEngine is never constructed. ai_client.send() receives rag_engine=None and the integration is a no-op. The lazy-loading of chromadb, sentence_transformers, and google.genai is also skipped, so there is zero overhead for projects that don't use RAG.
Auto-Sync
When auto_sync_interval_seconds > 0, a background thread periodically scans tracked files for mtime changes and re-indexes them. This keeps the vector store consistent with on-disk changes without requiring explicit user action.
The sync uses pathlib.Path.stat().st_mtime for change detection (same mechanism as the file cache in file_cache.py). For very large projects, the sync can be tuned to skip files above a size threshold.
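A background sync loop along these lines could look like the sketch below. The `AutoSync` class is hypothetical; it assumes the sync work is passed in as a callable and that index writes share a `threading.Lock` with the manual re-index path.

```python
import threading
from typing import Callable


class AutoSync:
    """Run `sync_once` every `interval` seconds until stopped."""

    def __init__(self, sync_once: Callable[[], object], interval: float, write_lock: threading.Lock):
        self._sync_once = sync_once
        self._interval = interval
        self._write_lock = write_lock
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait acts as an interruptible sleep: it returns True once stop() is called.
        while not self._stop_event.wait(self._interval):
            with self._write_lock:  # same lock a manual re-index would take
                self._sync_once()


ran = threading.Event()
sync = AutoSync(sync_once=ran.set, interval=0.01, write_lock=threading.Lock())
sync.start()
ran_in_time = ran.wait(timeout=2.0)
sync.stop()
```

Using `Event.wait` instead of `time.sleep` lets `stop()` end the thread promptly rather than waiting out a full interval.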
Cross-System Integration
ai_client.send() Integration
See guide_architecture.md#rag-integration for the full dispatch flow. Summary:
def send(md_content, user_message, ..., rag_engine=None) -> str:
    if rag_engine is not None:
        retrieved = rag_engine.search(user_message, top_k=rag_engine.config.top_k)
        if retrieved:
            md_content = _inject_rag_context(md_content, retrieved)
    ...
The injection is a no-op if:
- rag_engine is None
- rag_engine.is_empty() (index has no documents)
- search() returns no results within the distance threshold
MMA Worker Integration
The ConductorEngine does not construct RAGEngine itself. Workers receive context via md_content which is built by the caller. To use RAG in workers:
- Construct a RAGEngine in the caller (typically AppController or test harness).
- Pass it to multi_agent_conductor.run_worker_lifecycle(..., rag_engine=...) (if supported) or to the test invocation.
- The worker passes it to ai_client.send(rag_engine=...).
Note: As of 2026-06-02, the direct rag_engine parameter on run_worker_lifecycle is not yet implemented. Workers currently rely on the md_content already being augmented by the caller, or on Tier 4 / Tier 2 setting up the augmentation before spawning workers.
GUI Integration
The GUI's RAG panel (under AI Settings → RAG) provides:
- Status indicator: RAGEngine.is_empty() → "Empty" / "Indexed N chunks"
- Manual search box: for testing retrieval quality without sending a full AI call
- Re-index button: forces a full rebuild of the index
- Settings editor: modifies RAGConfig fields and writes back to manual_slop.toml
The RAG panel also surfaces the auto-sync status (last sync time, files indexed, files pending re-index).
Testing
Unit Tests
- tests/test_rag_engine.py: RAGEngine basic lifecycle with mock ChromaDB and mock embedding provider
- tests/test_rag_integration.py: End-to-end indexing + search + retrieval
Simulation Tests
- tests/test_rag_gui_presence.py: Verifies the RAG panel renders correctly
- tests/test_rag_visual_sim.py: Visual verification of the RAG search results panel
Stress Tests
- tests/test_rag_phase4_stress.py: Indexes 1000+ files, measures retrieval latency
- tests/test_rag_phase4_final_verify.py: End-to-end verification of RAG-augmented AI responses
Test Patterns
The standard pattern for testing RAG-augmented calls:
def test_rag_augmented_send(live_gui):
    # 1. Set up project with RAG enabled
    client.set_rag_config(enabled=True, embedding_provider="local")
    client.reindex_project()
    # 2. Send a question that requires retrieval
    response = client.send("How does the Execution Clutch work?")
    # 3. Verify the response references the retrieved content
    # (The exact assertion depends on what was indexed)
    assert response
For unit tests that don't need real embedding models, the BaseEmbeddingProvider is mocked to return deterministic vectors (e.g., based on the hash of the input text).
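A hash-based test double of that kind might look like the sketch below. `HashEmbeddingProvider` is a hypothetical name; in the project it would presumably subclass `BaseEmbeddingProvider`, which is omitted here so the snippet stands alone. The vectors carry no semantic meaning; they are only stable across runs.

```python
import hashlib
from typing import List


class HashEmbeddingProvider:
    """Test double: maps each text to a fixed vector derived from its SHA-256 digest."""

    def __init__(self, dimensions: int = 8):
        if not 1 <= dimensions <= 32:
            raise ValueError("a SHA-256 digest provides at most 32 bytes")
        self.dimensions = dimensions

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors: List[List[float]] = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dimensions` bytes into the range [0, 1].
            vectors.append([b / 255.0 for b in digest[: self.dimensions]])
        return vectors


provider = HashEmbeddingProvider()
vec_a, vec_b, vec_c = provider.embed(["foo", "foo", "bar"])
```

Identical inputs always produce identical vectors, so assertions on search results stay reproducible without downloading a model.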
Edge Cases & Limitations
- Empty Index: If the index has no documents, search() returns [] and no context is injected. The AI call proceeds normally with just the explicit file context.
- Network Failures (Gemini Embeddings): If the Gemini API is unreachable, GeminiEmbeddingProvider.embed() raises an exception. The caller (typically _chunk_code → index_file → RAG indexer) should handle this gracefully and either retry or fall back to the local provider.
- Stale Index: Auto-sync runs periodically but not on every read. If a file is changed between sync intervals, the index may be stale. The delete_documents_by_path + index_file cycle is atomic per file, so a partial sync leaves the index in a consistent (if incomplete) state.
- Large Files: A single file larger than chunk_size is split into multiple chunks with overlap. There's no upper limit on the number of chunks per file, but very large files (>10MB) may slow down indexing significantly.
- Binary Files: RAG only handles text files. Binary files (images, compiled Python, etc.) are skipped during indexing with a warning logged to comms_log.
- Cross-Project Queries: The vector store is per-project (<project_dir>/.rag/chroma/). Cross-project retrieval is not supported; each project has its own isolated index.
- Concurrent Writes: ChromaDB's PersistentClient is single-writer. If multiple processes try to write to the same index simultaneously, ChromaDB will raise. Manual Slop uses a threading.Lock to serialize writes from the auto-sync thread and the manual re-index button.
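For the network-failure case, a retry-then-fallback wrapper could be sketched as follows. `embed_with_fallback` and the stand-in providers are hypothetical and not part of the project's API; the retry count is an arbitrary assumption.

```python
from typing import Callable, List

Embed = Callable[[List[str]], List[List[float]]]


def embed_with_fallback(texts: List[str], primary: Embed, fallback: Embed, retries: int = 2) -> List[List[float]]:
    """Try `primary` up to `retries` times, then use `fallback`."""
    for _ in range(retries):
        try:
            return primary(texts)
        except Exception:
            continue  # a real caller would log the error and back off here
    return fallback(texts)


attempts: List[int] = []


def unreachable(texts: List[str]) -> List[List[float]]:
    # Stand-in for a cloud provider whose API cannot be reached.
    attempts.append(1)
    raise ConnectionError("API unreachable")


def local(texts: List[str]) -> List[List[float]]:
    # Stand-in for a local provider.
    return [[1.0] for _ in texts]


result = embed_with_fallback(["a", "b"], unreachable, local)
```

Vectors from different embedding models are not comparable and may differ in dimensionality, so a fallback like this is only safe if the whole index is built with the same provider.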
Future Work
- External RAG Bridge: Connect to remote vector databases (e.g., a managed Pinecone or Weaviate) via MCP. The _search_mcp method is a placeholder for this.
- Hybrid Search: Combine dense (vector) retrieval with sparse (BM25) retrieval for better recall on code keywords.
- Re-ranking: Apply a cross-encoder reranker to the top-k results before injection to improve precision.
- Caching: Cache query results in memory to avoid re-embedding for repeated questions.
- Provider Routing: Allow per-query provider selection (e.g., use Gemini for general queries, local for code).
See guide_tools.md for the MCP tool inventory; see guide_architecture.md for the dispatch pipeline.