revert(ai_client): remove incomplete decoupling, restore clean startup

The AI client decoupling was never properly implemented and added unnecessary complexity. The actual startup bottleneck was RAG initialization, which now runs asynchronously. Report written to docs/reports/ai_decoupling_revert_report.md.
2026-05-13 16:01:58 -04:00
parent d92086aef1
commit 4025a7130d
2 changed files with 96 additions and 7 deletions
@@ -0,0 +1,96 @@
+# AI Client Decoupling - Attempted and Reverted
+
+**Date:** 2026-05-13
+**Status:** REVERTED
+
+## Summary
+
+An attempt was made to decouple the AI client library imports from the main GUI application to reduce startup time. The working assumption was that heavy SDK imports (`google.genai`, `anthropic`, `chromadb`) caused the slow startup. The decoupling was only partially implemented and was ultimately judged unnecessary, since the actual bottleneck was RAG initialization, not AI SDK imports.
+
+## What Was Attempted
+
+### 1. Created `ai_client_stub.py`
+A lightweight stub module that provides a minimal interface to AI client functionality without importing heavy SDKs. The stub was intended to route all AI calls to a separate AI server process.
+
+### 2. Module Replacement Pattern in `sloppy.py`
+```python
+# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
+if os.environ.get("AI_SERVER_ENABLED"):
+    import sys
+    from src import ai_client_stub
+    sys.modules["src.ai_client"] = ai_client_stub
+```
+
+### 3. Lazy Loading of RAG
+Moved `rag_engine` import from module-level to lazy imports inside functions/setters.
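+
+A minimal sketch of the lazy-import pattern described above. The accessor name `get_rag_engine` is hypothetical, and the stdlib module `fractions` stands in for `rag_engine` so the snippet runs on its own; the actual code in `src/app_controller.py` may differ.
+
+```python
+import importlib
+
+_rag_engine = None  # cached module reference, filled on first use
+
+
+def get_rag_engine(module_name="fractions"):
+    """Import the heavy module on first call instead of at module load.
+
+    `fractions` is only a stand-in for the real `rag_engine` module.
+    """
+    global _rag_engine
+    if _rag_engine is None:
+        _rag_engine = importlib.import_module(module_name)
+    return _rag_engine
+```
+
+Importing the file that defines `get_rag_engine` costs almost nothing; the import cost is paid by the first caller only.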
+
+### 4. Async RAG Initialization
+Moved RAG engine initialization to a background thread to prevent blocking the UI during startup.
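+
+A minimal sketch of background-thread initialization, assuming a plain `threading.Thread`. The class `DemoApp`, the `rag_ready` event, and the `time.sleep` stand-in for the slow RAG setup are all hypothetical and not taken from the application code.
+
+```python
+import threading
+import time
+
+
+class DemoApp:
+    """Hypothetical app object whose constructor does not wait for RAG."""
+
+    def __init__(self):
+        self.rag_engine = None
+        self.rag_ready = threading.Event()
+        # daemon=True so a still-running init never blocks interpreter exit
+        self._rag_thread = threading.Thread(target=self._init_rag, daemon=True)
+        self._rag_thread.start()
+
+    def _init_rag(self):
+        time.sleep(0.2)  # stand-in for the slow RAG engine setup
+        self.rag_engine = object()
+        self.rag_ready.set()
+
+
+start = time.perf_counter()
+app = DemoApp()
+construction_time = time.perf_counter() - start
+
+# Code that needs RAG must wait for (or check) the event first.
+app.rag_ready.wait(timeout=5)
+```
+
+With this shape, any code path that touches `rag_engine` has to handle the "not ready yet" case, for example by checking `rag_ready.is_set()`.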
+
+## What Actually Fixed the Startup Issue
+
+**The primary startup bottleneck was RAG initialization (5+ seconds), not AI SDK imports.**
+
+Timeline of discovery:
+1. Initial timing showed ~1.4s for `app_controller` import
+2. Further profiling revealed `rag_engine` → `chromadb` import chain at module level
+3. Lazy loading of `rag_engine` reduced import time to ~0.4s
+4. Further profiling showed `init_state()` taking 5+ seconds
+5. Discovered that `models.RAGConfig.from_dict()` was parsing a config with RAG enabled, which triggered RAG initialization during `init_state()`
+6. Making RAG initialization async reduced App() construction from 5.2s to 0.027s
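+
+Measurements like those in the timeline can be taken with a couple of small helpers. This is a sketch only: `time_import` and `time_call` are hypothetical names, and the report does not say which tooling was actually used.
+
+```python
+import importlib
+import time
+
+
+def time_import(module_name):
+    """Return the seconds spent importing module_name.
+
+    The result is near zero if the module is already in sys.modules.
+    """
+    start = time.perf_counter()
+    importlib.import_module(module_name)
+    return time.perf_counter() - start
+
+
+def time_call(fn, *args, **kwargs):
+    """Call fn and return (result, elapsed_seconds)."""
+    start = time.perf_counter()
+    result = fn(*args, **kwargs)
+    return result, time.perf_counter() - start
+```
+
+For a per-module breakdown of import cost, CPython also provides the `-X importtime` option, for example `python -X importtime sloppy.py`.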
+
+## Why Decoupling Was Not Fully Implemented
+
+1. **Incomplete module replacement:** The `sys.modules["src.ai_client"] = ai_client_stub` approach was fragile and not consistently applied. Multiple modules still imported `ai_client` directly.
+
+2. **AI server never properly utilized:** The `ai_client_proxy` and server infrastructure existed but were never properly integrated. The proxy client was designed to spawn a subprocess and communicate via JSON-RPC, but it was never connected to actual AI calls.
+
+3. **Wrong diagnosis:** The real issue was RAG blocking the event loop, not AI SDK imports. Even if decoupling worked fully, it wouldn't have addressed the primary bottleneck.
+
+4. **Architectural complexity:** The decoupling added significant complexity (stub modules, proxy client, server process, IPC mechanism) without proportional benefit.
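+
+One way a `sys.modules` swap can turn out fragile is import ordering: a module that binds the name before the swap keeps the original object. The sketch below illustrates this with throwaway modules; the names `demo_ai_client` and `demo_ai_client_stub` are hypothetical, and this is not necessarily the failure that occurred in this codebase.
+
+```python
+import sys
+import types
+
+# Two throwaway modules standing in for the real client and the stub.
+real = types.ModuleType("demo_ai_client")
+real.NAME = "real"
+stub = types.ModuleType("demo_ai_client_stub")
+stub.NAME = "stub"
+
+sys.modules["demo_ai_client"] = real
+import demo_ai_client as early  # bound before the swap
+
+sys.modules["demo_ai_client"] = stub  # the replacement step
+import demo_ai_client as late  # bound after the swap
+
+# `early` still points at the real module; only `late` sees the stub.
+```
+
+The swap only affects imports that execute after it, so it has to run before any other module imports the target.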
+
+## Files Modified During Attempt
+
+### Created
+- `src/ai_client_stub.py` - Lightweight stub module
+
+### Modified
+- `sloppy.py` - Added AI_SERVER_ENABLED routing
+- `src/app_controller.py` - Lazy rag_engine import, async RAG init
+
+## Files That Should Be Removed/Restored
+
+The following changes represent incomplete decoupling that should be cleaned up:
+
+1. `src/ai_client_stub.py` - Should be evaluated for deletion if AI server is not implemented
+2. `src/ai_client_proxy.py` - Same as above
+3. Environment variable `AI_SERVER_ENABLED` in `sloppy.py` - No longer needed if decoupling is removed
+
+## Current State
+
+After reverting the decoupling attempt:
+
+| Metric | Time |
+|--------|------|
+| App class load | 0.4s |
+| App() construction | 0.027s |
+| RAG initialization | Async (background thread) |
+
+The application now starts quickly with RAG loading in the background.
+
+## Recommendations
+
+1. **If the AI server will not be implemented:** Remove `ai_client_stub.py` and `ai_client_proxy.py`, and clean up `sloppy.py`
+
+2. **If the AI server is needed:** Implement it properly as a separate concern, not as a module replacement hack
+
+3. **Keep async RAG init:** The background thread for RAG is a good pattern and should remain
+
+4. **Profile before optimizing:** Measure where startup time actually goes before attempting architectural changes
+
+## Lessons Learned
+
+1. Measure first, optimize second - the actual bottleneck was discovered through profiling, not assumption
+2. Architectural changes should solve actual problems, not anticipated ones
+3. Partial decoupling is worse than no decoupling - it adds complexity without benefits
+4. The simplest fix is often correct - lazy imports and async initialization solved the problem without architectural overhaul
@@ -12,17 +12,10 @@ if thirdparty not in sys.path:
 os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
 os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
-os.environ["AI_SERVER_ENABLED"] = "1"

 from defer.sugar import install as _install_defer
 _install_defer()

-# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
-if os.environ.get("AI_SERVER_ENABLED"):
-    import sys
-    from src import ai_client_stub
-    sys.modules["src.ai_client"] = ai_client_stub
-
 from src.gui_2 import main

 if __name__ == "__main__":