# Track: Sloppy.py Startup Speedup

**Status:** Active
**Initialized:** 2026-06-06
**Owner:** Tier 2 Tech Lead
**Priority:** High (regression blocker — `live_gui` fixtures time out at `wait_for_server(timeout=15)`)

---

## 1. Problem Statement

`uv run sloppy.py --enable-test-hooks` startup latency has crept up. `live_gui` tests time out at `wait_for_server(timeout=15)`. Root cause is **too much work on the main thread before `immapp.run()` is entered and the GUI becomes interactive**:

- 5 AI provider SDKs (`google.genai`, `anthropic`, `openai`, `requests`, ...) eagerly imported at `src/ai_client.py` module top-level, even though only one is the active provider at runtime
- `imgui_bundle` transitively pulls `numpy` and 9 other heavy modules at the top of `src/gui_2.py` and 9 sibling files
- NERV theme, command palette, markdown table extensions are loaded eagerly even though they are feature-gated
- `AppController.__init__` does all subsystem construction synchronously on the thread that will become the main GUI thread (path manager, presets, personas, context presets, tool presets, history, workspace, RAG, hook server)

The architecture is already correct: AI calls go through the asyncio worker thread, so the *call* is non-blocking. The *imports* are still synchronous on the main thread, and that is what the user sees as "sloppy.py is slow to open."
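To put a rough number on one top-level import, it can be timed in a fresh interpreter so that nothing is already cached in `sys.modules`. The following is a minimal sketch under stated assumptions, not the project's `scripts/benchmark_imports.py`: the helper names `time_cold_start` and `import_cost_ms` are hypothetical, and the default of three runs simply mirrors the "median of 3 runs" convention.

```python
import statistics
import subprocess
import sys
import time


def time_cold_start(code: str, runs: int = 3) -> float:
    """Median wall-clock time (ms) to run `code` in a fresh Python process.

    Hypothetical helper: every run spawns a new interpreter, so the result
    includes interpreter startup as well as the code itself.
    """
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(
            [sys.executable, "-c", code],
            check=True,
            capture_output=True,
        )
        samples.append((time.perf_counter() - t0) * 1000)
    return statistics.median(samples)


def import_cost_ms(module: str, runs: int = 3) -> float:
    """Approximate cost (ms) of importing `module`, net of interpreter startup."""
    baseline = time_cold_start("pass", runs)
    with_import = time_cold_start(f"import {module}", runs)
    # Measurement noise can make a cheap import look negative; clamp at zero.
    return max(0.0, with_import - baseline)
```

Calling `import_cost_ms("json")` gives a noisy but usable estimate. For a per-module breakdown inside a single process, CPython's `-X importtime` flag is an alternative.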
### 1.1 Measurement Baseline (from `scripts/benchmark_imports.py`)

Cold-start subprocess timings, median of 3 runs, 85 unique import paths:

| module | time | files | classification |
|---|---:|---:|---|
| google.genai | ~955ms | 1 | **defer (provider SDK, default)** |
| openai | ~445ms | 1 | defer (provider SDK) |
| anthropic | ~430ms | 1 | defer (provider SDK) |
| src.markdown_table | ~250ms | 1 | defer (feature-gated) |
| src.theme_nerv | ~245ms | 1 | defer (feature-gated) |
| imgui_bundle | ~245ms | 10 | **KEEP (ImGui hot path)** |
| src.command_palette | ~244ms | 1 | defer (feature-gated) |
| src.theme_nerv_fx | ~240ms | 1 | defer (feature-gated) |
| fastapi (+ security.api_key) | ~470ms combined | 1 | defer (only `--enable-test-hooks` or web mode) |
| requests | ~92ms | 3 | defer (deepseek/minimax only) |
| numpy | ~65ms | 2 | keep (bg_shader; optional in gui_2) |
| pydantic | ~70ms | 1 | keep (models.py is loaded by everyone) |
| tree_sitter_* | ~25ms each | 1 | keep (file_cache) |

**Estimated main-thread import cost today (worst case, all paths):** ~2500-3000ms (1.0s SDKs + 1.0s web/fastapi + 0.5s GUI extras + ~0.5s transitives).

**Estimated main-thread import cost after this track:** ~500-600ms (`imgui_bundle` + lean `gui_2` + `pydantic` models). Net savings ~2000-2400ms.

---

## 2. Approach

The architecture is already correct. The fix is **systematic application of the lazy-load + shared-job-pool patterns** the codebase already uses for `RAGEngine` (`get_rag_engine` in `src/app_controller.py:244-249`) and `MultiAgentConductor` (`get_mma_conductor` in `src/app_controller.py:266-271`).

### 2.1 Architectural Invariant: Main Thread Purity

> **The main thread (the one that enters `immapp.run()`) must NEVER import a
> module heavier than `imgui_bundle` and the lean `gui_2` skeleton. Every heavy
> import is loaded by the asyncio worker thread, the AppController's shared
> job pool, or the MMA WorkerPool.
> This invariant is enforced by an audit
> script (CI gate) and a runtime audit-hook test that fails if a heavy import
> is observed on the main thread at startup.**

Concretely, the main thread's import chain is allowed to contain:

- All `import X` statements transitively reachable from `src/gui_2.py` whose accumulated import time is < 50ms
- The modules: `imgui_bundle`, `defer`, `src.imgui_scopes`, `src.theme_2` (default theme only), `src.theme_models`, `src.paths`, `src.models`, `src.events`
- Anything in `sys.stdlib_module_names`

Everything else — provider SDKs, FastAPI, NERV theme, command palette, markdown table extensions, the full `src.ai_client` provider list, `numpy`/`psutil`/`tree_sitter_*` if used by lazy code paths — must be loaded by a background mechanism that does not run on the main thread.

### 2.2 Four layers of protection

#### Layer 1 — Explicit warmup-aware module access (the load-bearing wall, non-negotiable)

Remove heavy imports from the top of source files reachable from the main thread. Functions that need them use a `_require_warmed(name)` helper that assumes the module is already in `sys.modules` (because warmup put it there):

```python
# BEFORE (src/ai_client.py, current)
from google import genai
import anthropic
import openai
# ... 5 provider SDKs loaded unconditionally

# AFTER
import sys
import importlib
from typing import Any

def _require_warmed(name: str) -> Any:
    """Get a module that AppController's warmup should have loaded.

    Raises RuntimeError if the module is not in sys.modules. This is the
    explicit contract: heavy modules MUST be warmed at startup. No lazy
    loading on first use — the import is paid upfront on a bg thread.
    """
    mod = sys.modules.get(name)
    if mod is None:
        raise RuntimeError(
            f"Module {name!r} is not warmed. "
            f"AppController.__init__ must have run first (which submits warmup jobs)."
        )
    return mod

def _send_gemini(md_content, user_message, ...):
    genai = _require_warmed("google.genai")
    # ... use genai ...
```

**Why no `import X` inside the function body?** Because that would be lazy loading on first use. If the first use is triggered by a user UI action (e.g. switching the provider from MiniMax to Gemini, the controller enqueues an action that propagates to the first call), the user sees a 955ms lag between their click and any visible response. That's the bad case the user called out: *"lazy loading introduces latencies when interacting with the UI state vs the bg state."*

By warming proactively, the first user-triggered call is instant. The cost is paid during startup on a bg thread, before the user can interact.

**Main-thread cost: zero.** The main thread's import chain is fully lean (none of the heavy modules are imported top-level). The warmup jobs run on `_io_pool` workers in parallel with the main thread's remaining init.

#### Layer 2 — Shared job pool on AppController (no new threads per task)

The codebase already has these dedicated / shared threads:

- `AppController._loop_thread` — asyncio worker (**DEDICATED** to the AI event loop, do not use for arbitrary work)
- `WorkerPool` (in `src/multi_agent_conductor.py`) — 4-thread pool for MMA workers (**DEDICATED** to MMA, do not pollute with imports or I/O)
- `HookServer` thread — **DEDICATED** to the FastAPI server
- Ad-hoc `threading.Thread` calls — used for one-off tasks; the user wants to **MINIMIZE** these

**User constraint:** no new daemon threads per import warmup, per I/O task, per log-prune. We add ONE shared `ThreadPoolExecutor` to `AppController` named `_io_pool`, and any subsystem that needs background work submits jobs to it.
This includes:

- Initial RAG index warm-up (if applicable)
- Log pruning (currently a one-shot thread — refactor to use the pool)
- Disk-bound subsystem initialization (e.g., TOML re-read on persona switch)
- **Heavy module warmup (the primary use case for this track)**

```python
# In AppController.__init__
from concurrent.futures import ThreadPoolExecutor

self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)
```

**Threads created by this track: 4** (the pool). Not 4+1 per job, not 1 per import, not 1 per subsystem. Just 4 long-lived threads that all background work shares. Future work that needs a bg thread should `controller._io_pool.submit(fn)`.

#### Layer 3 — Proactive warmup + completion notification (the new mechanism)

This is the core of the track. In `AppController.__init__`, immediately after `_io_pool` is created, the controller submits a job to the pool for each heavy module that needs warming. The main thread does NOT wait for these to complete.

```python
# In AppController.__init__, right after self._io_pool is created
self._warmup_status: dict[str, list[str]] = {
    "pending": [],
    "completed": [],
    "failed": [],
}
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()
```

```python
def _submit_warmup_jobs(self) -> None:
    """Submit bg jobs to import heavy modules. Notifies subscribers on completion."""
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
    for module_name in heavy:
        self._io_pool.submit(self._warmup_one, module_name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        # AI provider SDKs
        "google.genai",
        "anthropic",
        "openai",
        "requests",
        # Feature-gated GUI (used by main thread but not on first frame)
        "src.command_palette",
        "src.theme_nerv",
        "src.theme_nerv_fx",
        "src.markdown_table",
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result

def _warmup_one(self, module_name: str) -> None:
    try:
        importlib.import_module(module_name)
        outcome = "completed"
    except Exception:
        # A failed import is recorded, not raised, so the worker survives.
        outcome = "failed"
    # Update the bookkeeping and check for completion in ONE lock acquisition,
    # so exactly one worker observes the empty pending list and notifies.
    with self._warmup_lock:
        self._warmup_status["pending"].remove(module_name)
        self._warmup_status[outcome].append(module_name)
        done = not self._warmup_status["pending"]
        if done:
            self._warmup_done_event.set()
            snapshot = {k: list(v) for k, v in self._warmup_status.items()}
            callbacks = list(self._warmup_callbacks)
    if done:
        # Callbacks run outside the lock and receive a copy, not the live dict.
        for cb in callbacks:
            try:
                cb(snapshot)
            except Exception:
                pass
```

**Completion notification** is critical for the user-visible UX. Three surfaces:

1. **GUI status indicator** — the status bar shows "Warming up... (5/8)" while the bg jobs run, then "All imports ready" with a green dot when complete. The GUI never blocks waiting; the indicator is updated by polling `controller.warmup_status()` once per frame (cheap, lock-guarded).
2. **GUI toast notification** — when warmup completes, show a toast: "All providers ready" with the count of modules loaded. User can dismiss.
3. **Hook API endpoint** — `GET /api/warmup_status` returns the current state; `GET /api/warmup_wait?timeout=N` blocks until done (for tests).
The user said: *"the app controller should post to test clients or the user when its threads are warmed up with imports — that way the user knows 'hey you have the ui first, but now you have all the functionality.'"* This is exactly what the notification surfaces achieve.

**Why this beats lazy-loading:** if a user clicks "switch to Gemini" and the controller lazy-loads `google.genai` on that action, the user sees ~1s of nothing happening between the click and the visible response. With warmup, the click is instant because `google.genai` is already in `sys.modules`. The 1s of cost was paid during startup, when the user was looking at a splash or otherwise not waiting on input.

#### Layer 4 — Worker-process isolation (future, out of scope)

The codebase already runs `gemini_cli` and external MCP servers as subprocesses for this exact reason. A future track could move `google.genai` / `anthropic` into their own worker processes, communicating via the existing `SyncEventQueue`. This track does NOT do this — Layer 1+2+3 is sufficient for the current problem.

### 2.3 Threading constraints (verified empirically)

The user's question: *"if I import in the app controller's thread, will it block the GUI's thread?"* The answer is:

| Scenario | Blocks GUI? |
|---|---|
| Module top-level import of heavy X, then main imports X | **YES** (X's import is in main's chain). This is why we remove heavy imports from main-thread-reachable files. |
| `_io_pool` worker warming X while main thread renders | **NO direct block, but GIL contention causes micro-stutters** (~5-50ms each). Acceptable because the pool is capped at 4 threads and the main thread is mostly idle in `immapp.run()`. |
| `_io_pool` worker warms X; main thread later calls `_require_warmed("X")` (X already in `sys.modules`) | **NO** (the lookup is a `dict.get()` — instant, no import lock contention). |
| User-triggered UI action (e.g. provider switch) propagates to controller which calls `_require_warmed` on a warmed module | **NO** (lookup is instant). This is the win the user explicitly called out: no user-perceptible lag. |
| `wait_for_warmup()` blocks the asyncio thread waiting for warmup | **NO direct block on GUI** (different thread). Asyncio thread waits; main thread renders. Acceptable but rarely needed if user waits for warmup notification first. |
| Spawning a new `threading.Thread` for each import warmup | **Wasteful** (thread creation ~1-5ms each; thread count explodes). Use the `_io_pool` instead. |

This means: **Layer 1 is non-negotiable.** Even with warmup on `_io_pool`, if the heavy import is also in the main thread's import chain, the main thread will block on the import lock the moment it tries to use the module. Layer 1 removes the heavy imports from the main thread's chain; Layer 2 reuses threads efficiently; Layer 3 proactively warms on bg threads so the FIRST user-triggered use is instant.

### 2.4 Enforcement: the "main thread purity" audit

Two enforcement mechanisms, both required:

#### Static: `scripts/audit_main_thread_imports.py` (CI gate)

1. AST-walk the import graph reachable from `sloppy.py` (the main entry). For each `.py` file in the graph, collect top-level `import X` and `from X import Y` statements.
2. Compare against an allowlist of "main-thread-safe" modules (stdlib + `imgui_bundle` + the lean gui_2 skeleton list from §2.1). Any non-allowlist import is a violation.
3. Exit non-zero with a clear message naming the file, line, and heavy module.
4. Run as part of CI (`uv run python scripts/audit_main_thread_imports.py`) and as a pre-commit hook.

#### Runtime: `tests/test_main_thread_purity.py` (TDD, empirical)

1. Spawn `uv run python sloppy.py --headless --enable-test-hooks` as a subprocess, with a `sys.addaudithook` callback that logs every `import` event with the calling thread.
2. Wait for the headless server to be ready (or 5s timeout).
3. Read the audit log.
   Assert: every `import` event with `threading.current_thread() is threading.main_thread()` was for a module in the allowlist.
4. Kill the subprocess.

This is the empirical enforcement: it proves the invariant holds at runtime, not just at static analysis time.

---

## 3. Architectural Changes

### 3.1 Per-file import plan

For each source file reachable from the main thread's import chain, we **remove top-level heavy imports** and have functions access them via `_require_warmed(name)`. The warmup jobs (§3.2) put the modules in `sys.modules` before any function is called.

#### `src/ai_client.py` (the biggest win: ~1800ms)

Top-level today: `from google import genai`, `import anthropic`, `import openai`, `import requests` (used by deepseek/minimax).

After:

- **Drop all four heavy imports from the top.** Add `_require_warmed(name)` helper at the top.
- `_send_gemini()` calls `_require_warmed("google.genai")` to get the module
- `_send_anthropic()` calls `_require_warmed("anthropic")`
- `_send_deepseek()` and `_send_minimax()` call `_require_warmed("openai")` and `_require_warmed("requests")`
- Provider client objects (`_gemini_client`, `_anthropic_client`, etc.) stay as module globals but are now `None` until `_send_*` initializes them (extracted from current top-level logic into a new `_ensure__client()` that uses the warmed module)
- The warmup list in `AppController._compute_warmup_list()` includes `google.genai`, `anthropic`, `openai`, `requests` (always warmed)

**Result:** ~1800ms off the main thread. The bg threads pay this cost during startup. By the time the first AI call happens (which is always async, on the asyncio thread), the modules are in `sys.modules` and the lookup is instant. No user-perceptible lag.

#### `src/api_hooks.py` (FastAPI in headless/web only)

Top-level today: `from fastapi import ...`, `from fastapi.security.api_key import ...` (only needed if `--enable-test-hooks` or `--web-host`).

After:

- **Drop these from top.** Add `_require_warmed(name)` calls inside the methods that need them.
- The warmup list in `AppController._compute_warmup_list()` includes `fastapi`, `fastapi.security.api_key` **conditionally** — only when `enable_test_hooks` or `web_host` is set

**Result:** ~470ms off the main thread for non-test, non-web launches. For `live_gui` tests (`--enable-test-hooks`), the warmup loads fastapi during the same startup window, so the hook server is ready when the process announces readiness.

#### `src/commands.py` (command palette warmup-aware)

Top-level today: `from src.command_palette import ...` at `src/commands.py:1`.

After:

- **Drop the top-level import.** The command functions call `_require_warmed("src.command_palette")` to access the module
- The warmup list includes `src.command_palette`

**Result:** ~244ms off the main thread's import chain. The bg thread warms it during startup; the first `Ctrl+Shift+P` is instant.

#### `src/theme_2.py` (NERV theme warmup-aware)

Top-level today: `from src.theme_nerv import ...`, `from src.theme_nerv_fx import ...` at the top of `src/theme_2.py`.

After:

- **Drop the top-level imports.** `apply_nerv_theme()` (or the function that activates NERV) calls `_require_warmed("src.theme_nerv")` and `_require_warmed("src.theme_nerv_fx")`
- The warmup list includes both NERV modules

**Result:** ~485ms off the main thread's import chain (the default non-NERV path is lean). User pays the cost during startup; theme switch is instant when they pick NERV.

#### `src/markdown_helper.py` (markdown table warmup-aware)

Top-level today: `from src.markdown_table import ...` at `src/markdown_helper.py:1`.

After:

- **Drop the top-level import.** The table-detection branch of `render()` calls `_require_warmed("src.markdown_table")`
- The warmup list includes `src.markdown_table`

**Result:** ~250ms off the main thread's import chain. First markdown table render is instant.
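One way to confirm that a top-level import has really been dropped is to import the file in a fresh interpreter and inspect `sys.modules` afterwards. The following is a hedged sketch of that check, not the project's test code: the helper name `imports_pull_in` is an assumption, and the names passed to it are whatever module pair is under test.

```python
import subprocess
import sys


def imports_pull_in(target: str, heavy: str) -> bool:
    """Return True if importing `target` in a fresh interpreter also loads `heavy`.

    Hypothetical helper: a child process is used so that modules already
    imported by the test runner cannot mask (or fake) the result.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"print({heavy!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # The target may print during import, so only the last line is inspected.
    return result.stdout.strip().splitlines()[-1] == "True"
```

In this track's terms, a Red test could assert `not imports_pull_in("src.ai_client", "google.genai")`, run from the repository root so that `src` is importable. The same helper would cover the command palette, NERV theme, and markdown table cases.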
#### `src/imgui_scopes.py`, `src/gui_2.py`, `src/bg_shader.py` (KEEP `imgui_bundle`)

These MUST keep `import imgui_bundle` at top — the ImGui render loop is the hot path and needs the module on first frame. There is no way to defer this without breaking the render loop.

What CAN be deferred inside `src/gui_2.py`:

- `import numpy` (only needed for `bg_shader`; the GUI itself doesn't need numpy on the first frame) — move to `_require_warmed("numpy")` in the bg shader call site, add `numpy` to the warmup list
- Other feature-gated imports — same pattern

#### `src/gui_2.py` direct heavy imports (audit)

We will use AST to audit which `import X` statements at `src/gui_2.py` top-level are reachable from the first-frame render path (`render_main_window`, `render_main_menu_bar`, etc.) and which are feature-gated. First-frame imports stay top-level. Feature-gated ones move to `_require_warmed(...)` calls at the use site, with the module added to the warmup list.

### 3.2 Job pool + warmup scaffolding

New code in `src/app_controller.py`:

```python
from concurrent.futures import ThreadPoolExecutor
import importlib
import threading
from typing import Callable

# In AppController.__init__, after the asyncio loop starts:
self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)

# Warmup state
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_status: dict[str, list[str]] = {
    "pending": [],
    "completed": [],
    "failed": [],
}
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()
```

`_submit_warmup_jobs()` computes the warmup list and submits one job per module to the pool:

```python
def _submit_warmup_jobs(self) -> None:
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
    for name in heavy:
        self._io_pool.submit(self._warmup_one, name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        "google.genai",
        "anthropic",
        "openai",
        "requests",
        "src.command_palette",
        "src.theme_nerv",
        "src.theme_nerv_fx",
        "src.markdown_table",
        "numpy",  # used by bg_shader; warmed for first invocation
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result
```

Each warmup worker imports the module, updates the status, and on the last one fires the completion callbacks (so the GUI status indicator and toast notification can react):

```python
def _warmup_one(self, name: str) -> None:
    try:
        importlib.import_module(name)
        outcome = "completed"
    except Exception:
        # A failed import is recorded, not raised, so the worker survives.
        outcome = "failed"
    # Update the bookkeeping and check for completion in ONE lock acquisition,
    # so exactly one worker observes the empty pending list and notifies.
    with self._warmup_lock:
        self._warmup_status["pending"].remove(name)
        self._warmup_status[outcome].append(name)
        done = not self._warmup_status["pending"]
        if done:
            # Set the event under the lock so on_warmup_complete() sees a
            # consistent "done" state and never misses the notification.
            self._warmup_done_event.set()
            snapshot = {k: list(v) for k, v in self._warmup_status.items()}
            cbs = list(self._warmup_callbacks)
    if done:
        # Callbacks run outside the lock and receive a copy, not the live dict.
        for cb in cbs:
            try:
                cb(snapshot)
            except Exception:
                pass
```

Public API on `AppController`:

```python
def warmup_status(self) -> dict[str, list[str]]:
    """Snapshot the current warmup state. Cheap (lock-guarded copy)."""
    with self._warmup_lock:
        return {k: list(v) for k, v in self._warmup_status.items()}

def is_warmup_done(self) -> bool:
    return self._warmup_done_event.is_set()

def wait_for_warmup(self, timeout: float | None = None) -> bool:
    """Block until warmup completes. Returns True on done, False on timeout."""
    return self._warmup_done_event.wait(timeout=timeout)

def on_warmup_complete(self, callback: Callable[[dict], None]) -> None:
    """Register a callback for warmup completion. If already done, fires immediately."""
    with self._warmup_lock:
        if not self._warmup_done_event.is_set():
            # Still warming: the last worker will invoke the callback.
            self._warmup_callbacks.append(callback)
            return
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    # Already done: invoke outside the lock so the callback may call back in.
    callback(snap)
```

Hook API endpoints (added in `src/api_hooks.py`):

- `GET /api/warmup_status` → `controller.warmup_status()`
- `GET /api/warmup_wait?timeout=N` → blocks until done, returns final status

GUI integration (in `src/gui_2.py`):

- Status bar: "Warming up... (5/8)" while in flight, "All imports ready" + green dot when done. Polled once per frame from `controller.warmup_status()` (cheap, ~microseconds).
- On transition to done: show a toast notification "All providers ready (8 modules)" for 5 seconds.

In `AppController.shutdown()` (or wherever lifecycle cleanup lives): `self._io_pool.shutdown(wait=False)`. The call itself returns immediately. Note that since Python 3.9 `ThreadPoolExecutor` workers are not daemon threads: the interpreter still joins them at exit, so a job that is already running finishes before the process terminates. Passing `cancel_futures=True` as well drops jobs that are queued but not yet started.

### 3.3 Startup timing instrumentation

Add `src/startup_profiler.py`:

```python
import time
from contextlib import contextmanager
from typing import Iterator


class StartupProfiler:
    """Records wall-clock time spent in each named init phase.

    Cheap (no I/O). Stored on AppController.startup_profile for later
    inspection via the Hook API (`GET /api/startup_profile`) and the
    Diagnostics panel.
    """

    def __init__(self) -> None:
        # (name, start, duration_ms)
        self._phases: list[tuple[str, float, float]] = []

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record the phase even if the wrapped init step raises.
            self._phases.append((name, t0, (time.perf_counter() - t0) * 1000))
```

Used at every major init step in `AppController.__init__` and `App.__init__`.

---

## 4. Phases

### Phase 1: Audit + Benchmark + Foundation (Day 1)

- T1.1: Run `scripts/benchmark_imports.py` and capture baseline
- T1.2: AST-audit every `import X` in `src/*.py` to map which is reachable from the first-frame render path vs feature-gated
- T1.3: Add `StartupProfiler` to `src/app_controller.py` and instrument current init
- T1.4: Add `scripts/audit_main_thread_imports.py` (static gate)
- T1.5: Commit baseline + audit script

### Phase 2: Job Pool + Warmup Foundation (Day 1)

- T2.1 (TDD Red): `tests/test_app_controller_io_pool.py` — assert `AppController` has a 4-worker `_io_pool` named `controller-io-*`
- T2.2 (Green): Add `_io_pool` to `AppController.__init__` with named threads
- T2.3 (TDD Red): `tests/test_warmup_mechanism.py` — assert warmup jobs are submitted in `__init__`, complete within 10s, fire the done event, support callbacks, don't block init
- T2.4 (Green): Implement `_submit_warmup_jobs()`, `_compute_warmup_list()`, `_warmup_one()`, `warmup_status()`, `is_warmup_done()`, `wait_for_warmup()`, `on_warmup_complete()` per spec §3.2
- T2.5: Run T2.1 + T2.3 tests, confirm PASS
- T2.6: Commit

### Phase 3: Remove top-level heavy SDK imports from `src/ai_client.py` (Day 2)

- T3.1 (TDD Red): `tests/test_ai_client_no_top_level_sdk_imports.py` — assert `import src.ai_client` does NOT load `google.genai` / `anthropic` / `openai` / `requests` (warmup hasn't run in the subprocess)
- T3.2 (Green): Remove the four heavy imports from the top of `ai_client.py`. Add `_require_warmed(name)` helper. Each `_send_*` uses `_require_warmed("google.genai")` etc.
- T3.3: Run existing `tests/test_ai_client.py`; fix any breakage (tests relying on top-level import side effects need a fixture that warms or a fallback for test mode)
- T3.4: Confirm T3.1 tests PASS
- T3.5: Commit

### Phase 4: Remove top-level FastAPI imports from `src/api_hooks.py` (Day 2)

- T4.1 (TDD Red): `tests/test_hook_server_no_top_level_fastapi.py` — assert `from src.api_hooks import HookServer` does NOT import fastapi
- T4.2 (Green): Remove the fastapi imports from top. Use `_require_warmed` inside the methods that need them
- T4.3: Run existing `tests/test_api_hooks.py`; fix
- T4.4: Commit

### Phase 5: Remove top-level imports for feature-gated GUI modules (Day 3)

- T5A: Command Palette — `tests/test_command_palette_no_top_level_import.py` + remove from `src/commands.py` + use `_require_warmed("src.command_palette")`
- T5B: NERV Theme — `tests/test_theme_nerv_no_top_level_import.py` + remove from `src/theme_2.py` + use `_require_warmed("src.theme_nerv")` etc.
- T5C: Markdown Table — `tests/test_markdown_helper_no_top_level_import.py` + remove from `src/markdown_helper.py` + use `_require_warmed("src.markdown_table")`
- T5D: GUI feature-gated — audit `src/gui_2.py` via the T1.2 script, apply same pattern. `numpy` migrates to `_require_warmed` in `bg_shader` call site.
- T5E: Commit per module (4 atomic commits)

### Phase 6: Migrate ad-hoc threads to `_io_pool` (Day 4)

- T6.1: Audit: `grep -rn "threading.Thread(" src/` to find all ad-hoc thread spawns (excluding `HookServer` and `WorkerPool` which are domain-specific)
- T6.2: Refactor each ad-hoc thread to use `controller.submit_io(fn)` instead
- T6.3: Per-migration commit
- T6.4: Final `grep -rn "threading.Thread(" src/` shows ZERO new spawns

### Phase 7: Warmup Notification (Hook API + GUI) (Day 4)

- T7A.1 (TDD Red): `tests/test_api_hooks_warmup.py` — assert `GET /api/warmup_status` and `GET /api/warmup_wait` work
- T7A.2 (Green): Add the two endpoints in `src/api_hooks.py` and register `warmup_status` in `_gettable_fields`
- T7B.1: In `src/gui_2.py`, add a status-bar indicator that polls `controller.warmup_status()` each frame: "Warming up... (N/M)" while pending, "All imports ready" with green dot on completion
- T7B.2: Register a callback via `controller.on_warmup_complete(cb)` that shows a toast "All providers ready (M modules)" on success
- T7B.3: Update docs (status bar, toast, hook API)
- T7B.4: Commit

### Phase 8: Enforcement — Runtime Audit Hook (Day 4)

- T8.1 (TDD Red): `tests/test_main_thread_purity.py` — spawn `sloppy.py --headless --enable-test-hooks` with a `sys.addaudithook` shim, verify no heavy import happens on the main thread
- T8.2: Once Phases 3-5 land, this test should start passing. Wire into CI as a gating test (`@pytest.mark.slow`).
- T8.3: Commit

### Phase 9: Verify + Checkpoint (Day 5)

- T9.1: Re-run `scripts/benchmark_imports.py --runs=3`; confirm `import src.ai_client` < 50ms, `import src.gui_2` < 500ms, `import src.app_controller` < 300ms
- T9.2: Re-run `scripts/audit_main_thread_imports.py`; exit 0
- T9.3: Run `tests/test_warmup_mechanism.py`; warmup completes and notifications fire
- T9.4: Run `tests/test_main_thread_purity.py`; pass
- T9.5: Run full `live_gui` test batch; `wait_for_server(timeout=15)` no longer times out.
  Tests can call `controller.wait_for_warmup()` before exercising warmup-dependent functionality.
- T9.6: Manual smoke:
  - `uv run sloppy.py`: time-to-first-frame < 1.5s, observe status indicator "Warming up... (N/M)" → "All imports ready" + toast
  - `uv run sloppy.py --enable-test-hooks`: same, plus `/api/warmup_status` returns `completed` after a brief wait
  - `uv run sloppy.py --headless`: time-to-server-ready
  - **Provider switch test**: switch from MiniMax to Gemini in the GUI after warmup. The action must be INSTANT, not 1s-delayed (proves warmup did its job)
- T9.7: Phase checkpoint commit + git note with full verification report
- T9.8: Update `conductor/tracks.md`; archive track

---

## 5. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lazy import inside a hot path adds latency on every call | Med | Med | Always gate the import with `sys.modules` check OR use module-level sentinel |
| First AI call on the asyncio thread blocks for ~955ms while `google.genai` imports | High | Low | The user already paid this latency budget; happens on the asyncio worker, not main. Document the expected first-call pause. |
| Lazy import surfaces circular import that was hidden by top-level ordering | Med | Med | Phase 1 audit catches this; defer each lazy import to the test phase |
| Test fixtures import the heavy module before main code, breaking assumptions | Low | Low | `reset_ai_client` and `isolate_workspace` fixtures already lazy-reset |
| Hot reload of a now-lazy module doesn't trigger | Low | Med | Update `HotReloader.HOT_MODULES` to register the lazy module's gate function |
| `_io_pool` worker importing a heavy module holds GIL and stutters GUI | Med | Low | The pool is capped at 4 threads; stutter is bounded; user sees responsive UI before any stutter |
| A future commit re-introduces a heavy import on the main thread | Med | High | Static gate (`audit_main_thread_imports.py`, CI) + runtime audit hook (`test_main_thread_purity.py`) catch this |

### Hot Reload consideration

`src/hot_reloader.py` registers modules at import time. Lazy-loaded modules (imported inside functions) are NOT registered. The hot-reload workflow needs:

- Either: register the lazy module with a callback that forces a re-import via `importlib.reload`
- Or: explicitly trigger the lazy import on hot-reload trigger

This is a small follow-up task; the lazy import itself doesn't break hot reload (it just means you have to invoke the gate function once to materialize the module before reload can take effect).

---

## 6. Verification Criteria

The track is complete when:

- [ ] `import src.ai_client` cold start < 50ms (down from ~1800ms)
- [ ] `import src.gui_2` cold start < 500ms (down from ~3000ms)
- [ ] `import src.app_controller` cold start < 300ms (down from ~700ms)
- [ ] `uv run sloppy.py --enable-test-hooks` reaches `immapp.run()` in < 1.5s
- [ ] `live_gui.wait_for_server(timeout=15)` passes for all 273+ tests
- [ ] `scripts/audit_main_thread_imports.py` exits 0 (no heavy imports on main)
- [ ] `tests/test_main_thread_purity.py` passes (runtime audit hook confirms invariant)
- [ ] `scripts/benchmark_imports.py` shows no new red entries in the top-20
- [ ] **`controller.wait_for_warmup(timeout=10.0)` returns True** — warmup completed within 10s of `AppController.__init__`
- [ ] **All modules in the warmup list are in `sys.modules` after warmup** — `controller.warmup_status()['pending']` is empty, `'completed'` contains all expected module names
- [ ] **User-triggered actions on warmed modules are instant** — manual test switching providers (e.g. MiniMax → Gemini) after warmup completes shows NO perceptible lag (was ~1s with lazy-loading)
- [ ] **GUI status indicator transitions** — observe "Warming up... (N/M)" in the status bar, then "All imports ready" with green dot, then a toast notification fires via `controller.on_warmup_complete(...)`
- [ ] **Hook API exposes warmup state** — `GET /api/warmup_status` returns `{pending: [], completed: [...], failed: []}`; `GET /api/warmup_wait?timeout=10` returns the final state
- [ ] **NO `import X` statements inside function bodies for heavy modules** — verified by `grep -rn "^\s*import \(google\|anthropic\|openai\|fastapi\|src\.command_palette\|src\.theme_nerv\|src\.markdown_table\)" src/`
- [ ] No regressions in the existing 272/273 passing tests
- [ ] `grep -rn "threading.Thread(" src/` shows ZERO new spawns after Phase 6 migration (only the existing project scaffolding threads like `HookServer` and `WorkerPool` remain, and they're domain-specific)
- [ ] Startup profile + io_pool status visible in `/api/startup_profile`, `/api/io_pool_status`, and the Diagnostics panel

---

## 7. Out of Scope

- Process-isolation of heavy SDKs (Layer 4 in §2.2) — future track
- `imgui_bundle` lazy loading — fundamentally impossible (ImGui hot path)
- Importing on the main thread for the lean `gui_2` skeleton (~300ms unavoidable)
- `pydantic` lazy loading (used by `src/models.py` which is imported by 16 files; the cost is already amortized and deferring it would cascade)
- Lazy-loading heavy modules in function bodies (Layer 1 in §2.2 — explicitly rejected by the user; warmup is the only mechanism)

---

## 8. Cross-References

- `conductor/tracks.md` line 152 — original backlog entry that this track fulfills
- `docs/guide_architecture.md:43-67` — thread domains (asyncio worker is the right place for heavy work)
- `docs/guide_architecture.md:880-898` — Architectural Invariants (single-writer principle; this track respects it)
- `docs/guide_app_controller.md:241-271` — existing `get_rag_engine` / `get_mma_conductor` lazy patterns (the templates this track replicates)
- `docs/guide_hot_reload.md:295-312` — what is/isn't safe to hot-reload (lazy-loaded modules need a small follow-up)
- `conductor/workflow.md` — TDD Red-Green-Refactor protocol + atomic per-task commits + git notes
- `scripts/benchmark_imports.py` — the measurement tool built in this conversation