diff --git a/docs/reports/ai_decoupling_revert_report.md b/docs/reports/ai_decoupling_revert_report.md
new file mode 100644
index 00000000..078da870
--- /dev/null
+++ b/docs/reports/ai_decoupling_revert_report.md
@@ -0,0 +1,96 @@
+# AI Client Decoupling - Attempted and Reverted
+
+**Date:** 2026-05-13
+**Status:** REVERTED
+
+## Summary
+
+An attempt was made to decouple the AI client library imports from the main GUI application to reduce startup time. The core issue was slow startup due to heavy SDK imports (`google.genai`, `anthropic`, `chromadb`). The decoupling was only partially implemented and ultimately determined to be unnecessary since the actual bottleneck was RAG initialization, not AI SDK imports.
+
+## What Was Attempted
+
+### 1. Created `ai_client_stub.py`
+A lightweight stub module that provides a minimal interface to AI client functionality without importing heavy SDKs. The stub was intended to route all AI calls to a separate AI server process.
+
+### 2. Module Replacement Pattern in `sloppy.py`
+```python
+# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
+if os.environ.get("AI_SERVER_ENABLED"):
+    import sys
+    from src import ai_client_stub
+    sys.modules["src.ai_client"] = ai_client_stub
+```
+
+### 3. Lazy Loading of RAG
+Moved `rag_engine` import from module-level to lazy imports inside functions/setters.
+
+### 4. Async RAG Initialization
+Moved RAG engine initialization to a background thread to prevent blocking the UI during startup.
+
+## What Actually Fixed the Startup Issue
+
+**The primary startup bottleneck was RAG initialization (5+ seconds), not AI SDK imports.**
+
+Timeline of discovery:
+1. Initial timing showed ~1.4s for `app_controller` import
+2. Further profiling revealed `rag_engine` → `chromadb` import chain at module level
+3. Lazy loading of `rag_engine` reduced startup to ~0.4s
+4. Further profiling showed `init_state()` taking 5+ seconds
+5. Discovered `models.RAGConfig.from_dict()` was parsing with RAG enabled in config
+6. Making RAG initialization async reduced App() construction from 5.2s to 0.027s
+
+## Why Decoupling Was Not Fully Implemented
+
+1. **Incomplete module replacement:** The `sys.modules["src.ai_client"] = ai_client_stub` approach was fragile and not consistently applied. Multiple modules still imported `ai_client` directly.
+
+2. **AI Server never properly utilized:** The `ai_client_proxy` and server infrastructure existed but was never properly integrated. The proxy client was designed to spawn a subprocess and communicate via JSON-RPC, but this was never connected to actual AI calls.
+
+3. **Wrong diagnosis:** The real issue was RAG blocking the event loop, not AI SDK imports. Even if decoupling worked fully, it wouldn't have addressed the primary bottleneck.
+
+4. **Architectural complexity:** The decoupling added significant complexity (stub modules, proxy client, server process, IPC mechanism) without proportional benefit.
+
+## Files Modified During Attempt
+
+### Created
+- `src/ai_client_stub.py` - Lightweight stub module
+
+### Modified
+- `sloppy.py` - Added AI_SERVER_ENABLED routing
+- `src/app_controller.py` - Lazy rag_engine import, async RAG init
+
+## Files That Should Be Removed/Restored
+
+The following changes represent incomplete decoupling that should be cleaned up:
+
+1. `src/ai_client_stub.py` - Should be evaluated for deletion if AI server is not implemented
+2. `src/ai_client_proxy.py` - Same as above
+3. Environment variable `AI_SERVER_ENABLED` in `sloppy.py` - No longer needed if decoupling is removed
+
+## Current State
+
+After reverting the decoupling attempt:
+
+| Metric | Time |
+|--------|------|
+| App class load | 0.4s |
+| App() construction | 0.027s |
+| RAG initialization | Async (background thread) |
+
+The application now starts quickly with RAG loading in the background.
+
+## Recommendations
+
+1. **If AI server is not implemented:** Remove `ai_client_stub.py`, `ai_client_proxy.py`, and clean up `sloppy.py`
+
+2. **If AI server is needed:** Implement it properly as a separate concern, not as a module replacement hack
+
+3. **Keep async RAG init:** The background thread for RAG is a good pattern and should remain
+
+4. **Profile before optimizing:** The lesson learned is to profile before attempting architectural changes
+
+## Lessons Learned
+
+1. Measure first, optimize second - the actual bottleneck was discovered through profiling, not assumption
+2. Architectural changes should solve actual problems, not anticipated ones
+3. Partial decoupling is worse than no decoupling - it adds complexity without benefits
+4. The simplest fix is often correct - lazy imports and async initialization solved the problem without architectural overhaul
\ No newline at end of file
diff --git a/sloppy.py b/sloppy.py
index 984eca2f..eeebd6d0 100644
--- a/sloppy.py
+++ b/sloppy.py
@@ -12,17 +12,10 @@ if thirdparty not in sys.path:
 os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
 os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
 os.environ["TOKENIZERS_PARALLELISM"] = "false"
-os.environ["AI_SERVER_ENABLED"] = "1"
 
 from defer.sugar import install as _install_defer
 _install_defer()
 
-# Route all ai_client imports to ai_client_stub to avoid loading heavy SDKs
-if os.environ.get("AI_SERVER_ENABLED"):
-    import sys
-    from src import ai_client_stub
-    sys.modules["src.ai_client"] = ai_client_stub
-
 from src.gui_2 import main
 
 if __name__ == "__main__":