Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().
Old design (rejected):
- from google import genai inside _send_gemini (lazy on first call)
- First user action that triggers this pays the cost; UI feels laggy
New design (this commit):
- Top-level heavy imports REMOVED from main-thread-reachable files
- AppController.__init__ submits warmup jobs to _io_pool (4 threads,
named 'controller-io_N')
- Each warmup worker imports its module and updates a thread-safe
warmup_status dict
- Functions access modules via _require_warmed(name), which assumes
the module is in sys.modules (warmed at startup)
- When all jobs complete, _warmup_done_event is set and registered
on_warmup_complete callbacks fire
- GUI shows status indicator + toast when warmup completes
- Hook API exposes /api/warmup_status and /api/warmup_wait
- Tests can call controller.wait_for_warmup() before exercising
warmup-dependent functionality
Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.
No code changes; this is a planning update only.
Track: Sloppy.py Startup Speedup
Status: Active
Initialized: 2026-06-06
Owner: Tier 2 Tech Lead
Priority: High (regression blocker — live_gui fixtures time out at wait_for_server(timeout=15))
1. Problem Statement
uv run sloppy.py --enable-test-hooks startup latency has crept up. live_gui tests
time out at wait_for_server(timeout=15). Root cause is too much work on the main
thread before immapp.run() returns and the GUI becomes interactive:
- 5 AI provider SDKs (google.genai, anthropic, openai, requests, ...) eagerly imported at src/ai_client.py module top-level, even though only one is the active provider at runtime
- imgui_bundle transitively pulls numpy and 9 other heavy modules at the top of src/gui_2.py and 9 sibling files
- NERV theme, command palette, markdown table extensions are loaded eagerly even though they are feature-gated
- AppController.__init__ does all subsystem construction synchronously on the thread that will become the main GUI thread (path manager, presets, personas, context presets, tool presets, history, workspace, RAG, hook server)
The architecture is already correct: AI calls go through the asyncio worker thread, so the call is non-blocking. The imports are still synchronous on the main thread, and that is what the user sees as "sloppy.py is slow to open."
1.1 Measurement Baseline (from scripts/benchmark_imports.py)
Cold-start subprocess timings, median of 3 runs, 85 unique import paths:
| module | time | files | classification |
|---|---|---|---|
| google.genai | ~955ms | 1 | defer (provider SDK, default) |
| openai | ~445ms | 1 | defer (provider SDK) |
| anthropic | ~430ms | 1 | defer (provider SDK) |
| src.markdown_table | ~250ms | 1 | defer (feature-gated) |
| src.theme_nerv | ~245ms | 1 | defer (feature-gated) |
| imgui_bundle | ~245ms | 10 | KEEP (ImGui hot path) |
| src.command_palette | ~244ms | 1 | defer (feature-gated) |
| src.theme_nerv_fx | ~240ms | 1 | defer (feature-gated) |
| fastapi (+ security.api_key) | ~470ms combined | 1 | defer (only --enable-test-hooks or web mode) |
| requests | ~92ms | 3 | defer (deepseek/minimax only) |
| numpy | ~65ms | 2 | keep (bg_shader; optional in gui_2) |
| pydantic | ~70ms | 1 | keep (models.py is loaded by everyone) |
| tree_sitter_* | ~25ms each | 1 | keep (file_cache) |
Estimated main-thread import cost today (worst case, all paths): ~2500-3000ms (1.0s SDKs + 1.0s web/fastapi + 0.5s GUI extras + ~0.5s transitives).
Estimated main-thread import cost after this track:
~500-600ms (imgui_bundle + lean gui_2 + pydantic models). Net savings
~2000-2400ms.
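As an illustration of how a cold-start figure like the ones in the table can be obtained, here is a minimal sketch. It is not the actual scripts/benchmark_imports.py; the function name cold_import_ms is hypothetical. Each sample runs in a fresh interpreter so nothing is cached in sys.modules, and the timer runs inside the child so interpreter startup is excluded.

```python
import statistics
import subprocess
import sys


def cold_import_ms(module: str, runs: int = 3) -> float:
    """Median wall-clock time (ms) to import `module` in a fresh interpreter.

    Hypothetical helper for illustration only.
    """
    # The child times only the import itself, not interpreter startup.
    snippet = (
        "import time, importlib\n"
        "t0 = time.perf_counter()\n"
        f"importlib.import_module({module!r})\n"
        "print((time.perf_counter() - t0) * 1000)\n"
    )
    samples = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True, text=True, check=True,
        )
        # Take the last line in case the module prints during import.
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"json: {cold_import_ms('json'):.1f} ms")
```

Per-module numbers measured this way overlap wherever modules share transitive dependencies, so they should not be summed naively.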
2. Approach
The architecture is already correct. The fix is systematic application of the
lazy-load + shared-job-pool patterns the codebase already uses for RAGEngine
(get_rag_engine in src/app_controller.py:244-249) and MultiAgentConductor
(get_mma_conductor in src/app_controller.py:266-271).
2.1 Architectural Invariant: Main Thread Purity
The main thread (the one that enters immapp.run()) must NEVER import a module heavier than imgui_bundle and the lean gui_2 skeleton. Every heavy import is loaded by the asyncio worker thread, the AppController's shared job pool, or the MMA WorkerPool. This invariant is enforced by an audit script (CI gate) and a runtime audit-hook test that fails if a heavy import is observed on the main thread at startup.
Concretely, the main thread's import chain is allowed to contain:
- All import X statements transitively reachable from src/gui_2.py whose accumulated import time is < 50ms
- The modules: imgui_bundle, defer, src.imgui_scopes, src.theme_2 (default theme only), src.theme_models, src.paths, src.models, src.events
- Anything in sys.stdlib_module_names
Everything else — provider SDKs, FastAPI, NERV theme, command palette, markdown
table extensions, the full src.ai_client provider list, numpy/psutil/
tree_sitter_* if used by lazy code paths — must be loaded by a background
mechanism that does not run on the main thread.
2.2 Four layers of protection
Layer 1 — Explicit warmup-aware module access (the load-bearing wall, non-negotiable)
Remove heavy imports from the top of source files reachable from the main
thread. Functions that need them use a _require_warmed(name) helper that
assumes the module is already in sys.modules (because warmup put it there):
# BEFORE (src/ai_client.py, current)
from google import genai
import anthropic
import openai
# ... 5 provider SDKs loaded unconditionally

# AFTER
import sys
import importlib
from typing import Any

def _require_warmed(name: str) -> Any:
    """Get a module that AppController's warmup should have loaded.

    Raises RuntimeError if the module is not in sys.modules. This is the
    explicit contract: heavy modules MUST be warmed at startup. No lazy
    loading on first use; the import is paid upfront on a bg thread.
    """
    mod = sys.modules.get(name)
    if mod is None:
        raise RuntimeError(
            f"Module {name!r} is not warmed. "
            f"AppController.__init__ must have run first (which submits warmup jobs)."
        )
    return mod

def _send_gemini(md_content, user_message, ...):
    genai = _require_warmed("google.genai")
    # ... use genai ...
Why no import X inside the function body? Because that would be lazy
loading on first use. If the first use is triggered by a user UI action
(e.g. switching the provider from MiniMax to Gemini, the controller enqueues
an action that propagates to the first call), the user sees a 955ms lag
between their click and any visible response. That's the bad case the user
called out: "lazy loading introduces latencies when interacting with the UI
state vs the bg state."
By warming proactively, the first user-triggered call is instant. The cost is paid during startup on a bg thread, before the user can interact.
Main-thread cost: zero. The main thread's import chain is fully lean
(none of the heavy modules are imported top-level). The warmup jobs run on
_io_pool workers in parallel with the main thread's remaining init.
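The contract can be exercised in isolation. The sketch below is an assumption-laden demo, not project code: it uses the small stdlib module colorsys as a stand-in for a heavy SDK, evicts it so the demo starts cold, and warms it on a one-worker pool.

```python
import importlib
import sys
from concurrent.futures import ThreadPoolExecutor
from types import ModuleType


def _require_warmed(name: str) -> ModuleType:
    """Return an already-imported module, or fail loudly if it was never warmed."""
    mod = sys.modules.get(name)
    if mod is None:
        raise RuntimeError(f"Module {name!r} is not warmed.")
    return mod


# Stand-in for a heavy SDK (assumption for the demo): evict it to start cold.
HEAVY = "colorsys"
sys.modules.pop(HEAVY, None)

# Before warmup, the access fails instead of silently importing.
try:
    _require_warmed(HEAVY)
    cold_call_failed = False
except RuntimeError:
    cold_call_failed = True

# Warm on a background worker, then wait for that one job to finish.
with ThreadPoolExecutor(max_workers=1, thread_name_prefix="controller-io") as pool:
    pool.submit(importlib.import_module, HEAVY).result()

# After warmup, access is a plain dictionary lookup.
colorsys = _require_warmed(HEAVY)
```

One caveat that may be worth checking in the real implementation: Python places a module in sys.modules before its body has finished executing, so a lookup made while the warmup import is still in flight can return a partially initialized module. Gating callers on warmup completion avoids that window.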
Layer 2 — Shared job pool on AppController (no new threads per task)
The codebase already has these dedicated / shared threads:
- AppController._loop_thread: asyncio worker (DEDICATED to the AI event loop, do not use for arbitrary work)
- WorkerPool (in src/multi_agent_conductor.py): 4-thread pool for MMA workers (DEDICATED to MMA, do not pollute with imports or I/O)
- HookServer thread: DEDICATED to the FastAPI server
- Ad-hoc threading.Thread calls: used for one-off tasks; the user wants to MINIMIZE these
User constraint: no new daemon threads per import warmup, per I/O task, per
log-prune. We add ONE shared ThreadPoolExecutor to AppController named
_io_pool, and any subsystem that needs background work submits jobs to it.
This includes:
- Initial RAG index warm-up (if applicable)
- Log pruning (currently a one-shot thread — refactor to use the pool)
- Disk-bound subsystem initialization (e.g., TOML re-read on persona switch)
- Heavy module warmup (the primary use case for this track)
# In AppController.__init__
from concurrent.futures import ThreadPoolExecutor

self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)
Threads created by this track: 4 (the pool). Not 4+1 per job, not 1 per
import, not 1 per subsystem. Just 4 long-lived threads that all background work
shares. Future work that needs a bg thread should call controller._io_pool.submit(fn).
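A small sketch of the pool's behaviour, assuming only the standard library: twenty jobs are submitted, yet at most four worker threads ever exist, and CPython names them by appending _N to the prefix.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")


def current_worker_name() -> str:
    """Report which pool thread ran this job."""
    return threading.current_thread().name


# Many jobs, few threads: workers are created on demand (up to 4) and reused.
futures = [io_pool.submit(current_worker_name) for _ in range(20)]
worker_names = {f.result() for f in futures}
io_pool.shutdown(wait=True)

print(sorted(worker_names))
```

The set never holds more than four names, which is the property the "no new threads per task" constraint relies on.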
Layer 3 — Proactive warmup + completion notification (the new mechanism)
This is the core of the track. In AppController.__init__, immediately after
_io_pool is created, the controller submits a job to the pool for each heavy
module that needs warming. The main thread does NOT wait for these to complete.
# In AppController.__init__, right after self._io_pool is created
# (needs: import importlib, threading; from typing import Callable)
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()

def _submit_warmup_jobs(self) -> None:
    """Submit bg jobs to import heavy modules. Notifies subscribers on completion."""
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
    for module_name in heavy:
        self._io_pool.submit(self._warmup_one, module_name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        # AI provider SDKs
        "google.genai", "anthropic", "openai", "requests",
        # Feature-gated GUI (used by main thread but not on first frame)
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result

def _warmup_one(self, module_name: str) -> None:
    try:
        importlib.import_module(module_name)
        outcome = "completed"
    except Exception:
        outcome = "failed"
    # Update the status and detect completion under ONE lock acquisition, so
    # only the worker that empties "pending" fires the callbacks (exactly once).
    with self._warmup_lock:
        self._warmup_status["pending"].remove(module_name)
        self._warmup_status[outcome].append(module_name)
        if self._warmup_status["pending"]:
            return
        self._warmup_done_event.set()
        callbacks = list(self._warmup_callbacks)
        snapshot = {k: list(v) for k, v in self._warmup_status.items()}
    for cb in callbacks:
        try:
            cb(snapshot)  # subscribers get a copy, never the live dict
        except Exception:
            pass  # a failing subscriber must not break warmup
Completion notification is critical for the user-visible UX. Three surfaces:
- GUI status indicator: the status bar shows "Warming up... (5/8)" while the bg jobs run, then "All imports ready" with a green dot when complete. The GUI never blocks waiting; the indicator is updated by polling controller.warmup_status() once per frame (cheap, lock-guarded).
- GUI toast notification: when warmup completes, show a toast: "All providers ready" with the count of modules loaded. User can dismiss.
- Hook API endpoint: GET /api/warmup_status returns the current state; GET /api/warmup_wait?timeout=N blocks until done (for tests).
The user said: "the app controller should post to test clients or the user when its threads are warmed up with imports — that way the user knows 'hey you have the ui first, but now you have all the functionality.'" This is exactly what the notification surfaces achieve.
Why this beats lazy-loading: if a user clicks "switch to Gemini" and the
controller lazy-loads google.genai on that action, the user sees ~1s of
nothing happening between the click and the visible response. With warmup,
the click is instant because google.genai is already in sys.modules. The
1s of cost was paid during startup, when the user was looking at a splash or
otherwise not waiting on input.
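The completion logic can be exercised outside AppController. The following is a self-contained sketch under stated assumptions: WarmupTracker is a hypothetical name, stdlib modules stand in for the heavy SDKs, and a deliberately missing module shows the failure path. The status update and the "am I the last worker?" check share one lock acquisition, so the callbacks fire exactly once.

```python
import importlib
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Status = dict[str, list[str]]


class WarmupTracker:
    """Hypothetical stand-alone version of the warmup bookkeeping."""

    def __init__(self, pool: ThreadPoolExecutor, modules: list[str]) -> None:
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._callbacks: list[Callable[[Status], None]] = []
        self._status: Status = {"pending": list(modules), "completed": [], "failed": []}
        if not modules:
            self._done.set()  # nothing to warm: done immediately
        for name in modules:
            pool.submit(self._warm_one, name)

    def _snapshot(self) -> Status:
        return {k: list(v) for k, v in self._status.items()}

    def _warm_one(self, name: str) -> None:
        try:
            importlib.import_module(name)
            bucket = "completed"
        except Exception:
            bucket = "failed"
        with self._lock:
            self._status["pending"].remove(name)
            self._status[bucket].append(name)
            if self._status["pending"]:
                return  # other modules are still in flight
            # Only the worker that empties "pending" reaches this point.
            self._done.set()
            callbacks, snap = list(self._callbacks), self._snapshot()
        for cb in callbacks:
            cb(snap)

    def on_complete(self, cb: Callable[[Status], None]) -> None:
        with self._lock:
            if not self._done.is_set():
                self._callbacks.append(cb)
                return
            snap = self._snapshot()
        cb(snap)  # already done: fire right away, outside the lock

    def wait(self, timeout: Optional[float] = None) -> bool:
        return self._done.wait(timeout)


pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")
calls: list[Status] = []
tracker = WarmupTracker(pool, ["json", "colorsys", "no_such_module_xyz"])
tracker.on_complete(calls.append)
finished = tracker.wait(timeout=10)
pool.shutdown(wait=True)  # ensures any in-flight callback has run
```

Whether the callback is registered before or after the last import finishes, it runs once with a snapshot in which pending is empty.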
Layer 4 — Worker-process isolation (future, out of scope)
The codebase already runs gemini_cli and external MCP servers as subprocesses
for this exact reason. A future track could move google.genai / anthropic into
their own worker processes, communicating via the existing SyncEventQueue. This
track does NOT do this — Layer 1+2+3 is sufficient for the current problem.
2.3 Threading constraints (verified empirically)
The user's question: "if I import in the app controller's thread, will it block the GUI's thread?" The answer is:
| Scenario | Blocks GUI? |
|---|---|
| Module top-level import of heavy X, then main imports X | YES (X's import is in main's chain). This is why we remove heavy imports from main-thread-reachable files. |
| _io_pool worker warming X while main thread renders | NO direct block, but GIL contention causes micro-stutters (~5-50ms each). Acceptable because the pool is capped at 4 threads and the main thread is mostly idle in immapp.run(). |
| _io_pool worker warms X; main thread later calls _require_warmed("X") (X already in sys.modules) | NO (the lookup is a dict.get(): instant, no import lock contention). |
| User-triggered UI action (e.g. provider switch) propagates to controller which calls _require_warmed on a warmed module | NO (lookup is instant). This is the win the user explicitly called out: no user-perceptible lag. |
| wait_for_warmup() blocks the asyncio thread waiting for warmup | NO direct block on GUI (different thread). Asyncio thread waits; main thread renders. Acceptable but rarely needed if user waits for warmup notification first. |
| Spawning a new threading.Thread for each import warmup | Wasteful (thread creation ~1-5ms each; thread count explodes). Use the _io_pool instead. |
This means: Layer 1 is non-negotiable. Even with warmup on _io_pool, if
the heavy import is also in the main thread's import chain, the main thread
will block on the import lock the moment it tries to use the module. Layer 1
removes the heavy imports from the main thread's chain; Layer 2 reuses
threads efficiently; Layer 3 proactively warms on bg threads so the FIRST
user-triggered use is instant.
2.4 Enforcement: the "main thread purity" audit
Two enforcement mechanisms, both required:
Static: scripts/audit_main_thread_imports.py (CI gate)
- AST-walk the import graph reachable from sloppy.py (the main entry). For each .py file in the graph, collect top-level import X and from X import Y statements.
- Compare against an allowlist of "main-thread-safe" modules (stdlib + imgui_bundle + the lean gui_2 skeleton list from §2.1). Any non-allowlist import is a violation.
- Exit non-zero with a clear message naming the file, line, and heavy module.
- Run as part of CI (uv run python scripts/audit_main_thread_imports.py) and as a pre-commit hook.
Runtime: tests/test_main_thread_purity.py (TDD, empirical)
- Spawn uv run python sloppy.py --headless --enable-test-hooks as a subprocess, with a sys.addaudithook callback that logs every import event with the calling thread.
- Wait for the headless server to be ready (or 5s timeout).
- Read the audit log. Assert: every import event with threading.current_thread() is threading.main_thread() was for a module in the allowlist.
- Kill the subprocess.
This is the empirical enforcement: it proves the invariant holds at runtime, not just at static analysis time.
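The hook itself is small. The sketch below shows the recording side in-process, as an assumption about how the test shim could work; in the real test the hook would be installed inside the spawned subprocess. Stdlib modules stand in for heavy ones, and the names import_log and main_thread_imports are hypothetical. Audit hooks cannot be removed once added, so the hook is kept cheap.

```python
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

import_log: list[tuple[str, str]] = []  # (module name, thread name)


def _record_imports(event: str, args: tuple) -> None:
    """Audit hook: log each import that is not served from sys.modules."""
    if event == "import":
        import_log.append((args[0], threading.current_thread().name))


sys.addaudithook(_record_imports)


def main_thread_imports() -> set[str]:
    """Names of modules whose import was observed on the main thread."""
    main = threading.main_thread().name
    return {mod for mod, thread in list(import_log) if thread == main}


def _import_wave() -> None:
    import wave  # noqa: F401  (runs on the pool worker)


# Evict two small stdlib modules so both imports below are real imports.
sys.modules.pop("wave", None)
sys.modules.pop("colorsys", None)

with ThreadPoolExecutor(max_workers=1, thread_name_prefix="controller-io") as pool:
    pool.submit(_import_wave).result()  # background import

import colorsys  # noqa: E402,F401  (main-thread import)
```

Comparing main_thread_imports() against the allowlist then yields the assertion described in the steps above.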
3. Architectural Changes
3.1 Per-file import plan
For each source file reachable from the main thread's import chain, we
remove top-level heavy imports and have functions access them via
_require_warmed(name). The warmup jobs (§3.2) put the modules in
sys.modules before any function is called.
src/ai_client.py (the biggest win: ~1800ms)
Top-level today: from google import genai, import anthropic, import openai,
import requests (used by deepseek/minimax).
After:
- Drop all four heavy imports from the top. Add _require_warmed(name) helper at the top.
- _send_gemini() calls _require_warmed("google.genai") to get the module
- _send_anthropic() calls _require_warmed("anthropic")
- _send_deepseek() and _send_minimax() call _require_warmed("openai") and _require_warmed("requests")
- Provider client objects (_gemini_client, _anthropic_client, etc.) stay as module globals but are now None until _send_* initializes them (extracted from current top-level logic into a new _ensure_<provider>_client() that uses the warmed module)
- The warmup list in AppController._compute_warmup_list() includes google.genai, anthropic, openai, requests (always warmed)
Result: ~1800ms off the main thread. The bg threads pay this cost during
startup. By the time the first AI call happens (which is always async, on
the asyncio thread), the modules are in sys.modules and the lookup is
instant. No user-perceptible lag.
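A minimal sketch of the _ensure_<provider>_client() shape, under heavy assumptions: fake_provider_sdk and its Client class are stand-ins invented for the demo so it runs without any third-party SDK, and the real provider constructors will differ. The point is the pattern: the module global stays None until first use, and construction goes through the warmed module.

```python
import sys
import threading
import types
from typing import Any, Optional


def _require_warmed(name: str) -> types.ModuleType:
    mod = sys.modules.get(name)
    if mod is None:
        raise RuntimeError(f"Module {name!r} is not warmed.")
    return mod


# Stand-in SDK (hypothetical): simulates what a warmup job would have loaded.
_fake_sdk = types.ModuleType("fake_provider_sdk")


class _FakeClient:
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


_fake_sdk.Client = _FakeClient
sys.modules["fake_provider_sdk"] = _fake_sdk

# Module global: None until the first send initializes it.
_fake_client: Optional[Any] = None
_fake_client_lock = threading.Lock()


def _ensure_fake_client(api_key: str) -> Any:
    """Build the client once, from the warmed module, and reuse it afterwards."""
    global _fake_client
    with _fake_client_lock:
        if _fake_client is None:
            sdk = _require_warmed("fake_provider_sdk")
            _fake_client = sdk.Client(api_key=api_key)
        return _fake_client


first = _ensure_fake_client("key-1")
second = _ensure_fake_client("key-2")  # returns the existing client
```

The lock is a design choice for the sketch: it keeps two concurrent first calls from constructing two clients.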
src/api_hooks.py (FastAPI in headless/web only)
Top-level today: from fastapi import ..., from fastapi.security.api_key import ...
(only needed if --enable-test-hooks or --web-host).
After:
- Drop these from top. Add _require_warmed(name) calls inside the methods that need them.
- The warmup list in AppController._compute_warmup_list() includes fastapi, fastapi.security.api_key conditionally: only when enable_test_hooks or web_host is set
Result: ~470ms off the main thread for non-test, non-web launches.
For live_gui tests (--enable-test-hooks), the warmup loads fastapi
during the same startup window, so the hook server is ready when the
process announces readiness.
src/commands.py (command palette warmup-aware)
Top-level today: from src.command_palette import ... at src/commands.py:1.
After:
- Drop the top-level import. The command functions call _require_warmed("src.command_palette") to access the module
- The warmup list includes src.command_palette
Result: ~244ms off the main thread's import chain. The bg thread
warms it during startup; the first Ctrl+Shift+P is instant.
src/theme_2.py (NERV theme warmup-aware)
Top-level today: from src.theme_nerv import ..., from src.theme_nerv_fx import ...
at the top of src/theme_2.py.
After:
- Drop the top-level imports. apply_nerv_theme() (or the function that activates NERV) calls _require_warmed("src.theme_nerv") and _require_warmed("src.theme_nerv_fx")
- The warmup list includes both NERV modules
Result: ~485ms off the main thread's import chain (the default non-NERV path is lean). User pays the cost during startup; theme switch is instant when they pick NERV.
src/markdown_helper.py (markdown table warmup-aware)
Top-level today: from src.markdown_table import ... at src/markdown_helper.py:1.
After:
- Drop the top-level import. The table-detection branch of render() calls _require_warmed("src.markdown_table")
- The warmup list includes src.markdown_table
Result: ~250ms off the main thread's import chain. First markdown table render is instant.
src/imgui_scopes.py, src/gui_2.py, src/bg_shader.py (KEEP imgui_bundle)
These MUST keep import imgui_bundle at top — the ImGui render loop is the
hot path and needs the module on first frame. There is no way to defer
this without breaking the render loop.
What CAN be deferred inside src/gui_2.py:
- import numpy (only needed for bg_shader; the GUI itself doesn't need numpy on the first frame): move to _require_warmed("numpy") in the bg shader call site, add numpy to the warmup list
- Other feature-gated imports: same pattern
src/gui_2.py direct heavy imports (audit)
We will use AST to audit which import X statements at src/gui_2.py
top-level are reachable from the first-frame render path
(render_main_window, render_main_menu_bar, etc.) and which are
feature-gated. First-frame imports stay top-level. Feature-gated ones
move to _require_warmed(...) calls at the use site, with the module
added to the warmup list.
3.2 Job pool + warmup scaffolding
New code in src/app_controller.py:
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
import importlib
import threading

# In AppController.__init__, after the asyncio loop starts:
self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)

# Warmup state
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()
_submit_warmup_jobs() computes the warmup list and submits one job per
module to the pool:
def _submit_warmup_jobs(self) -> None:
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
    for name in heavy:
        self._io_pool.submit(self._warmup_one, name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        "google.genai", "anthropic", "openai", "requests",
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
        "numpy",  # used by bg_shader; warmed for first invocation
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result
Each warmup worker imports the module, updates the status, and on the last one fires the completion callbacks (so the GUI status indicator and toast notification can react):
def _warmup_one(self, name: str) -> None:
    try:
        importlib.import_module(name)
        outcome = "completed"
    except Exception:
        outcome = "failed"
    # One lock acquisition for both the status update and the completion
    # check: otherwise two workers finishing together can both see an empty
    # "pending" list and fire the callbacks twice.
    with self._warmup_lock:
        self._warmup_status["pending"].remove(name)
        self._warmup_status[outcome].append(name)
        if self._warmup_status["pending"]:
            return
        self._warmup_done_event.set()
        cbs = list(self._warmup_callbacks)
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    for cb in cbs:
        try:
            cb(snap)  # deep-enough copy: subscribers cannot mutate live state
        except Exception:
            pass  # a failing subscriber must not break warmup
Public API on AppController:
def warmup_status(self) -> dict[str, list[str]]:
    """Snapshot the current warmup state. Cheap (lock-guarded copy)."""
    with self._warmup_lock:
        return {k: list(v) for k, v in self._warmup_status.items()}

def is_warmup_done(self) -> bool:
    return self._warmup_done_event.is_set()

def wait_for_warmup(self, timeout: float | None = None) -> bool:
    """Block until warmup completes. Returns True on done, False on timeout."""
    return self._warmup_done_event.wait(timeout=timeout)

def on_warmup_complete(self, callback: Callable[[dict], None]) -> None:
    """Register a callback for warmup completion. If already done, fires immediately."""
    # Decide and register under a single lock acquisition, so a completion
    # that lands between the check and the append cannot be missed.
    with self._warmup_lock:
        if self._warmup_status["pending"]:
            self._warmup_callbacks.append(callback)
            return
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    callback(snap)  # already done: fire outside the lock
Hook API endpoints (added in src/api_hooks.py):
- GET /api/warmup_status → controller.warmup_status()
- GET /api/warmup_wait?timeout=N → blocks until done, returns final status
GUI integration (in src/gui_2.py):
- Status bar: "Warming up... (5/8)" while in flight, "All imports ready" + green dot when done. Polled once per frame from controller.warmup_status() (cheap, ~microseconds).
- On transition to done: show a toast notification "All providers ready (8 modules)" for 5 seconds.
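The per-frame label can be derived from the status snapshot with a pure function, which keeps the render loop free of any locking or import logic. This is a sketch: warmup_label is a hypothetical name, N counts completed modules only, and the wording for the failed case is an assumption since the spec does not define it.

```python
def warmup_label(status: dict[str, list[str]]) -> str:
    """Map a warmup_status() snapshot to the status-bar text."""
    completed = len(status["completed"])
    failed = len(status["failed"])
    total = completed + failed + len(status["pending"])
    if status["pending"]:
        return f"Warming up... ({completed}/{total})"
    if failed:
        # Assumed wording: the spec only describes the all-success case.
        return f"Imports ready ({failed} failed)"
    return "All imports ready"


in_flight = {
    "pending": ["openai", "anthropic", "numpy"],
    "completed": ["google.genai", "requests", "src.command_palette",
                  "src.theme_nerv", "src.markdown_table"],
    "failed": [],
}
print(warmup_label(in_flight))
```

Because the function takes a snapshot rather than the controller, it can be unit-tested without a GUI.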
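In AppController.shutdown() (or wherever lifecycle cleanup lives):
self._io_pool.shutdown(wait=False, cancel_futures=True). The call returns immediately and drops jobs that have not started. Note that since Python 3.9 ThreadPoolExecutor workers are not daemon threads: the interpreter still joins them at exit, so a job that is already running (e.g. a slow import) can delay process exit until it finishes.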
3.3 Startup timing instrumentation
Add src/startup_profiler.py:
import time
from contextlib import contextmanager
from typing import Iterator

class StartupProfiler:
    """Records wall-clock time spent in each named init phase.

    Cheap (no I/O). Stored on AppController.startup_profile for later inspection
    via the Hook API (`GET /api/startup_profile`) and the Diagnostics panel.
    """

    def __init__(self) -> None:
        self._phases: list[tuple[str, float, float]] = []  # (name, start, duration_ms)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record the phase even if the wrapped init step raises.
            self._phases.append((name, t0, (time.perf_counter() - t0) * 1000))
Used at every major init step in AppController.__init__ and App.__init__.
4. Phases
Phase 1: Audit + Benchmark + Foundation (Day 1)
- T1.1: Run scripts/benchmark_imports.py and capture baseline
- T1.2: AST-audit every import X in src/*.py to map which is reachable from the first-frame render path vs feature-gated
- T1.3: Add StartupProfiler to src/app_controller.py and instrument current init
- T1.4: Add scripts/audit_main_thread_imports.py (static gate)
- T1.5: Commit baseline + audit script
Phase 2: Job Pool + Warmup Foundation (Day 1)
- T2.1 (TDD Red): tests/test_app_controller_io_pool.py: assert AppController has a 4-worker _io_pool whose thread names start with controller-io (ThreadPoolExecutor names its workers controller-io_N)
- T2.2 (Green): Add _io_pool to AppController.__init__ with named threads
- T2.3 (TDD Red): tests/test_warmup_mechanism.py: assert warmup jobs are submitted in __init__, complete within 10s, fire the done event, support callbacks, don't block init
- T2.4 (Green): Implement _submit_warmup_jobs(), _compute_warmup_list(), _warmup_one(), warmup_status(), is_warmup_done(), wait_for_warmup(), on_warmup_complete() per spec §3.2
- T2.5: Run T2.1 + T2.3 tests, confirm PASS
- T2.6: Commit
Phase 3: Remove top-level heavy SDK imports from src/ai_client.py (Day 2)
- T3.1 (TDD Red): tests/test_ai_client_no_top_level_sdk_imports.py: assert import src.ai_client does NOT load google.genai / anthropic / openai / requests (warmup hasn't run in the subprocess)
- T3.2 (Green): Remove the four heavy imports from the top of ai_client.py. Add _require_warmed(name) helper. Each _send_* uses _require_warmed("google.genai") etc.
- T3.3: Run existing tests/test_ai_client.py; fix any breakage (tests relying on top-level import side effects need a fixture that warms or a fallback for test mode)
- T3.4: Confirm T3.1 tests PASS
- T3.5: Commit
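The "does NOT load" assertions in this phase (and in Phases 4 and 5) need a fresh interpreter, because the test process itself may already have the heavy modules loaded. A possible helper is sketched below; modules_loaded_by is a hypothetical name, and stdlib modules are used so the example runs anywhere.

```python
import ast
import subprocess
import sys


def modules_loaded_by(statement: str, candidates: list[str]) -> set[str]:
    """Run `statement` in a fresh interpreter; report which candidates got imported."""
    probe = (
        f"{statement}\n"
        "import sys\n"  # sys is always loaded, so the probe adds nothing itself
        f"print([m for m in {candidates!r} if m in sys.modules])\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    )
    # The probe's list is the last line, even if the import printed something.
    return set(ast.literal_eval(out.stdout.strip().splitlines()[-1]))


loaded = modules_loaded_by("import colorsys", ["colorsys", "wave"])
```

A T3.1-style test would call it with "import src.ai_client" and the four SDK names, and assert that the returned set is empty.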
Phase 4: Remove top-level FastAPI imports from src/api_hooks.py (Day 2)
- T4.1 (TDD Red): tests/test_hook_server_no_top_level_fastapi.py: assert from src.api_hooks import HookServer does NOT import fastapi
- T4.2 (Green): Remove the fastapi imports from top. Use _require_warmed inside the methods that need them
- T4.3: Run existing tests/test_api_hooks.py; fix any breakage
- T4.4: Commit
Phase 5: Remove top-level imports for feature-gated GUI modules (Day 3)
- T5A: Command Palette: tests/test_command_palette_no_top_level_import.py + remove from src/commands.py + use _require_warmed("src.command_palette")
- T5B: NERV Theme: tests/test_theme_nerv_no_top_level_import.py + remove from src/theme_2.py + use _require_warmed("src.theme_nerv") etc.
- T5C: Markdown Table: tests/test_markdown_helper_no_top_level_import.py + remove from src/markdown_helper.py + use _require_warmed("src.markdown_table")
- T5D: GUI feature-gated: audit src/gui_2.py via the T1.2 script, apply same pattern. numpy migrates to _require_warmed in bg_shader call site.
- T5E: Commit per module (4 atomic commits)
Phase 6: Migrate ad-hoc threads to _io_pool (Day 4)
- T6.1: Audit: grep -rn "threading.Thread(" src/ to find all ad-hoc thread spawns (excluding HookServer and WorkerPool which are domain-specific)
- T6.2: Refactor each ad-hoc thread to use controller.submit_io(fn) instead
- T6.3: Per-migration commit
- T6.4: Final grep -rn "threading.Thread(" src/ shows ZERO new spawns
Phase 7: Warmup Notification (Hook API + GUI) (Day 4)
- T7A.1 (TDD Red): tests/test_api_hooks_warmup.py: assert GET /api/warmup_status and GET /api/warmup_wait work
- T7A.2 (Green): Add the two endpoints in src/api_hooks.py and register warmup_status in _gettable_fields
- T7B.1: In src/gui_2.py, add a status-bar indicator that polls controller.warmup_status() each frame: "Warming up... (N/M)" while pending, "All imports ready" with green dot on completion
- T7B.2: Register a callback via controller.on_warmup_complete(cb) that shows a toast "All providers ready (M modules)" on success
- T7B.3: Update docs (status bar, toast, hook API)
- T7B.4: Commit
Phase 8: Enforcement — Runtime Audit Hook (Day 4)
- T8.1 (TDD Red): tests/test_main_thread_purity.py: spawn sloppy.py --headless --enable-test-hooks with a sys.addaudithook shim, verify no heavy import happens on the main thread
- T8.2: Once Phase 3-5 land, this test should start passing. Wire into CI as a gating test (@pytest.mark.slow).
- T8.3: Commit
Phase 9: Verify + Checkpoint (Day 5)
- T9.1: Re-run scripts/benchmark_imports.py --runs=3; confirm import src.ai_client < 50ms, import src.gui_2 < 500ms, import src.app_controller < 300ms
- T9.2: Re-run scripts/audit_main_thread_imports.py; exit 0
- T9.3: Run tests/test_warmup_mechanism.py; warmup completes and notifications fire
- T9.4: Run tests/test_main_thread_purity.py; pass
- T9.5: Run full live_gui test batch; wait_for_server(timeout=15) no longer times out. Tests can call controller.wait_for_warmup() before exercising warmup-dependent functionality.
- T9.6: Manual smoke:
  - uv run sloppy.py: time-to-first-frame < 1.5s, observe status indicator "Warming up... (N/M)" → "All imports ready" + toast
  - uv run sloppy.py --enable-test-hooks: same, plus /api/warmup_status returns completed after a brief wait
  - uv run sloppy.py --headless: time-to-server-ready
  - Provider switch test: switch from MiniMax to Gemini in the GUI after warmup. The action must be INSTANT, not 1s-delayed (proves warmup did its job)
  - Both launches (uv run sloppy.py and uv run sloppy.py --enable-test-hooks) feel snappier
- T9.7: Phase checkpoint commit + git note with full verification report
- T9.8: Update conductor/tracks.md; archive track
5. Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lazy import inside a hot path adds latency on every call | Med | Med | Always gate the import with sys.modules check OR use module-level sentinel |
| First AI call on the asyncio thread blocks for ~955ms while google.genai imports | High | Low | The user already paid this latency budget; happens on the asyncio worker, not main. Document the expected first-call pause. |
| Lazy import surfaces circular import that was hidden by top-level ordering | Med | Med | Phase 1 audit catches this; defer each lazy import to the test phase |
| Test fixtures import the heavy module before main code, breaking assumptions | Low | Low | reset_ai_client and isolate_workspace fixtures already lazy-reset |
| Hot reload of a now-lazy module doesn't trigger | Low | Med | Update HotReloader.HOT_MODULES to register the lazy module's gate function |
| _io_pool worker importing a heavy module holds GIL and stutters GUI | Med | Low | The pool is capped at 4 threads; stutter is bounded; user sees responsive UI before any stutter |
| A future commit re-introduces a heavy import on the main thread | Med | High | Static gate (audit_main_thread_imports.py, CI) + runtime audit hook (test_main_thread_purity.py) catch this |
Hot Reload consideration
src/hot_reloader.py registers modules at import time. Lazy-loaded modules
(imported inside functions) are NOT registered. The hot-reload workflow needs:
- Either: register the lazy module with a callback that forces a re-import via importlib.reload
- Or: explicitly trigger the lazy import on hot-reload trigger
This is a small follow-up task; the lazy import itself doesn't break hot reload (it just means you have to invoke the gate function once to materialize the module before reload can take effect).
6. Verification Criteria
The track is complete when:
- import src.ai_client cold start < 50ms (down from ~1800ms)
- import src.gui_2 cold start < 500ms (down from ~3000ms)
- import src.app_controller cold start < 300ms (down from ~700ms)
- uv run sloppy.py --enable-test-hooks reaches immapp.run() in < 1.5s
- live_gui.wait_for_server(timeout=15) passes for all 273+ tests
- scripts/audit_main_thread_imports.py exits 0 (no heavy imports on main)
- tests/test_main_thread_purity.py passes (runtime audit hook confirms invariant)
- scripts/benchmark_imports.py shows no new red entries in the top-20
- controller.wait_for_warmup(timeout=10.0) returns True: warmup completed within 10s of AppController.__init__
- All modules in the warmup list are in sys.modules after warmup: controller.warmup_status()['pending'] is empty, 'completed' contains all expected module names
- User-triggered actions on warmed modules are instant: manual test switching providers (e.g. MiniMax → Gemini) after warmup completes shows NO perceptible lag (was ~1s with lazy-loading)
- GUI status indicator transitions: observe "Warming up... (N/M)" in the status bar, then "All imports ready" with green dot, then a toast notification fires via controller.on_warmup_complete(...)
- Hook API exposes warmup state: GET /api/warmup_status returns {pending: [], completed: [...], failed: []}; GET /api/warmup_wait?timeout=10 returns the final state
- NO import X statements inside function bodies for heavy modules: verified by grep -rnE "^[[:space:]]+(import|from) +(google|anthropic|openai|fastapi|src\.command_palette|src\.theme_nerv|src\.markdown_table)" src/ returning no matches (the pattern requires leading indentation, so it matches only nested imports, and it also covers the from X import Y form)
- No regressions in the existing 272/273 passing tests
- grep -rn "threading.Thread(" src/ shows ZERO new spawns after Phase 6 migration (only the existing project scaffolding threads like HookServer and WorkerPool remain, and they're domain-specific)
- Startup profile + io_pool status visible in /api/startup_profile, /api/io_pool_status, and the Diagnostics panel
7. Out of Scope
- Process-isolation of heavy SDKs (Layer 4 in §2.2): future track
- imgui_bundle lazy loading: fundamentally impossible (ImGui hot path)
- Importing on the main thread for the lean gui_2 skeleton (~300ms unavoidable)
- pydantic lazy loading (used by src/models.py which is imported by 16 files; the cost is already amortized and deferring it would cascade)
- Lazy-loading heavy modules in function bodies (Layer 1 in §2.2: explicitly rejected by the user; warmup is the only mechanism)
8. Cross-References
- conductor/tracks.md line 152: original backlog entry that this track fulfills
- docs/guide_architecture.md:43-67: thread domains (asyncio worker is the right place for heavy work)
- docs/guide_architecture.md:880-898: Architectural Invariants (single-writer principle; this track respects it)
- docs/guide_app_controller.md:241-271: existing get_rag_engine / get_mma_conductor lazy patterns (the templates this track replicates)
- docs/guide_hot_reload.md:295-312: what is/isn't safe to hot-reload (lazy-loaded modules need a small follow-up)
- conductor/workflow.md: TDD Red-Green-Refactor protocol + atomic per-task commits + git notes
- scripts/benchmark_imports.py: the measurement tool built in this conversation