diff --git a/docs/reports/ai_decoupling_revert_report.md b/docs/reports/ai_decoupling_revert_report.md
new file mode 100644
index 00000000..078da870
--- /dev/null
+++ b/docs/reports/ai_decoupling_revert_report.md
@@ -0,0 +1,96 @@
+# AI Client Decoupling - Attempted and Reverted
+
+**Date:** 2026-05-13
+**Status:** REVERTED
+
+## Summary
+
+An attempt was made to decouple the AI client library imports from the main GUI application to reduce startup time. Slow startup was initially attributed to heavy SDK imports (`google.genai`, `anthropic`, `chromadb`). The decoupling was only partially implemented and was ultimately judged unnecessary, because the actual bottleneck was RAG initialization, not AI SDK imports.
+
+## What Was Attempted
+
+### 1. Created `ai_client_stub.py`
+A lightweight stub module that provides a minimal interface to AI client functionality without importing heavy SDKs. The stub was intended to route all AI calls to a separate AI server process.
+
+### 2. Module Replacement Pattern in `sloppy.py`
+```python
+# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
+if os.environ.get("AI_SERVER_ENABLED"):
+    import sys
+    from src import ai_client_stub
+    sys.modules["src.ai_client"] = ai_client_stub
+```
+
+### 3. Lazy Loading of RAG
+Moved the `rag_engine` import from module level to lazy imports inside functions and setters.
+
+### 4. Async RAG Initialization
+Moved RAG engine initialization to a background thread to prevent blocking the UI during startup.
+
+## What Actually Fixed the Startup Issue
+
+**The primary startup bottleneck was RAG initialization (5+ seconds), not AI SDK imports.**
+
+Timeline of discovery:
+1. Initial timing showed ~1.4s for `app_controller` import
+2. Further profiling revealed `rag_engine` → `chromadb` import chain at module level
+3. Lazy loading of `rag_engine` reduced startup to ~0.4s
+4. Further profiling showed `init_state()` taking 5+ seconds
+5. Discovered that `models.RAGConfig.from_dict()` was parsing a config with RAG enabled, so RAG was being initialized synchronously during startup
+6. Making RAG initialization async reduced App() construction from 5.2s to 0.027s
+
+## Why Decoupling Was Not Fully Implemented
+
+1. **Incomplete module replacement:** The `sys.modules["src.ai_client"] = ai_client_stub` approach was fragile and not consistently applied. Multiple modules still imported `ai_client` directly.
+
+2. **AI server never properly utilized:** The `ai_client_proxy` and server infrastructure existed but were never properly integrated. The proxy client was designed to spawn a subprocess and communicate via JSON-RPC, but it was never connected to actual AI calls.
+
+3. **Wrong diagnosis:** The real issue was synchronous RAG initialization blocking startup, not AI SDK imports. Even if the decoupling had worked fully, it would not have addressed the primary bottleneck.
+
+4. **Architectural complexity:** The decoupling added significant complexity (stub modules, proxy client, server process, IPC mechanism) without proportional benefit.
+
+## Files Modified During Attempt
+
+### Created
+- `src/ai_client_stub.py` - Lightweight stub module
+
+### Modified
+- `sloppy.py` - Added `AI_SERVER_ENABLED` routing
+- `src/app_controller.py` - Lazy `rag_engine` import, async RAG init
+
+## Files That Should Be Removed/Restored
+
+The following changes represent incomplete decoupling that should be cleaned up:
+
+1. `src/ai_client_stub.py` - Should be evaluated for deletion if AI server is not implemented
+2. `src/ai_client_proxy.py` - Same as above
+3. Environment variable `AI_SERVER_ENABLED` in `sloppy.py` - No longer needed if decoupling is removed
+
+## Current State
+
+After reverting the decoupling attempt, with the lazy `rag_engine` import and async RAG initialization kept in place:
+
+| Metric | Result |
+|--------|------|
+| App class load | 0.4s |
+| App() construction | 0.027s |
+| RAG initialization | Async (background thread) |
+
+The application now starts quickly with RAG loading in the background.
+
+## Recommendations
+
+1. **If AI server is not implemented:** Remove `ai_client_stub.py`, `ai_client_proxy.py`, and clean up `sloppy.py`
+
+2. **If AI server is needed:** Implement it properly as a separate concern, not as a module replacement hack
+
+3. **Keep async RAG init:** The background thread for RAG is a good pattern and should remain
+
+4. **Profile before optimizing:** The lesson learned is to profile before attempting architectural changes
+
+## Lessons Learned
+
+1. Measure first, optimize second - the actual bottleneck was discovered through profiling, not assumption
+2. Architectural changes should solve actual problems, not anticipated ones
+3. Partial decoupling is worse than no decoupling - it adds complexity without benefits
+4. The simplest fix is often correct - lazy imports and async initialization solved the problem without architectural overhaul
\ No newline at end of file
diff --git a/sloppy.py b/sloppy.py
index 984eca2f..eeebd6d0 100644
--- a/sloppy.py
+++ b/sloppy.py
@@ -12,17 +12,10 @@ if thirdparty not in sys.path:
 os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
 os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
-os.environ["AI_SERVER_ENABLED"] = "1"
 
 from defer.sugar import install as _install_defer
 _install_defer()
 
-# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
-if os.environ.get("AI_SERVER_ENABLED"):
-    import sys
-    from src import ai_client_stub
-    sys.modules["src.ai_client"] = ai_client_stub
-
 from src.gui_2 import main
 
 if __name__ == "__main__":