manual_slop/conductor/tracks/startup_speedup_20260606/spec.md
ed f2f5ee1197 conductor(plan): flip track from lazy-loading to proactive warmup
Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().

Old design (rejected):
  - from google import genai inside _send_gemini (lazy on first call)
  - First user action that triggers this pays the cost; UI feels laggy

New design (this commit):
  - Top-level heavy imports REMOVED from main-thread-reachable files
  - AppController.__init__ submits warmup jobs to _io_pool (4 threads,
    named 'controller-io-N')
  - Each warmup worker imports its module and updates a thread-safe
    warmup_status dict
  - Functions access modules via _require_warmed(name), which assumes
    the module is in sys.modules (warmed at startup)
  - When all jobs complete, _warmup_done_event is set and registered
    on_warmup_complete callbacks fire
  - GUI shows status indicator + toast when warmup completes
  - Hook API exposes /api/warmup_status and /api/warmup_wait
  - Tests can call controller.wait_for_warmup() before exercising
    warmup-dependent functionality

Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.

No code changes; this is a planning update only.
2026-06-06 13:45:05 -04:00

35 KiB

Track: Sloppy.py Startup Speedup

Status: Active
Initialized: 2026-06-06
Owner: Tier 2 Tech Lead
Priority: High (regression blocker: live_gui fixtures time out at wait_for_server(timeout=15))


1. Problem Statement

Startup latency of uv run sloppy.py --enable-test-hooks has crept up, and live_gui tests now time out at wait_for_server(timeout=15). The root cause is too much work on the main thread before it enters immapp.run() (which blocks for the lifetime of the app) and the GUI becomes interactive:

  • 5 AI provider SDKs (google.genai, anthropic, openai, requests, ...) eagerly imported at src/ai_client.py module top-level, even though only one is the active provider at runtime
  • imgui_bundle transitively pulls numpy and 9 other heavy modules at the top of src/gui_2.py and 9 sibling files
  • The NERV theme, command palette, and markdown table extensions are loaded eagerly even though they are feature-gated
  • AppController.__init__ does all subsystem construction synchronously on the thread that will become the main GUI thread (path manager, presets, personas, context presets, tool presets, history, workspace, RAG, hook server)

The runtime architecture is already correct: AI calls go through the asyncio worker thread, so the calls themselves do not block the GUI. The imports, however, still run synchronously on the main thread, and that is what the user experiences as "sloppy.py is slow to open."

1.1 Measurement Baseline (from scripts/benchmark_imports.py)

Cold-start subprocess timings, median of 3 runs, 85 unique import paths:

| module | time | files | classification |
|---|---|---|---|
| google.genai | ~955ms | 1 | defer (provider SDK, default) |
| openai | ~445ms | 1 | defer (provider SDK) |
| anthropic | ~430ms | 1 | defer (provider SDK) |
| src.markdown_table | ~250ms | 1 | defer (feature-gated) |
| src.theme_nerv | ~245ms | 1 | defer (feature-gated) |
| imgui_bundle | ~245ms | 10 | KEEP (ImGui hot path) |
| src.command_palette | ~244ms | 1 | defer (feature-gated) |
| src.theme_nerv_fx | ~240ms | 1 | defer (feature-gated) |
| fastapi (+ security.api_key) | ~470ms combined | 1 | defer (only --enable-test-hooks or web mode) |
| requests | ~92ms | 3 | defer (deepseek/minimax only) |
| numpy | ~65ms | 2 | keep (bg_shader; optional in gui_2) |
| pydantic | ~70ms | 1 | keep (models.py is loaded by everyone) |
| tree_sitter_* | ~25ms each | 1 | keep (file_cache) |

Estimated main-thread import cost today (worst case, all paths): ~2500-3000ms (1.0s SDKs + 1.0s web/fastapi + 0.5s GUI extras + ~0.5s transitives).

Estimated main-thread import cost after this track: ~500-600ms (imgui_bundle + lean gui_2 + pydantic models). Net savings ~2000-2400ms.
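The per-module rows above come from scripts/benchmark_imports.py, which (per the description) cold-starts a separate subprocess per import path. Because each row is its own cold start, a dependency shared by several SDKs is counted in every row that pulls it in, so the rows overlap and need not add up to the estimates above. As a cross-check, CPython's built-in -X importtime flag reports self and cumulative time per module within a single cold start. The sketch below parses that output; the helper name cumulative_import_us is hypothetical and not part of the repo.

```python
import subprocess
import sys


def cumulative_import_us(module: str) -> dict[str, int]:
    """Cold-import `module` in a fresh interpreter and return the cumulative
    import time (microseconds) per module, parsed from -X importtime output."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    timings: dict[str, int] = {}
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:       412 |       1530 |   json"
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skips the "self [us] | cumulative | imported package" header
        # Nested imports are indented in the third column; strip() removes that.
        timings[fields[2].strip()] = int(fields[1])
    return timings
```

Sorting the result by value, for example for "src.ai_client" when run from the repo root, surfaces the heaviest transitive imports behind a slow module. Note that -X importtime attributes a shared dependency to whichever module imports it first in that run.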


2. Approach

The fix is a systematic application of the lazy-accessor + shared-job-pool patterns the codebase already uses for RAGEngine (get_rag_engine in src/app_controller.py:244-249) and MultiAgentConductor (get_mma_conductor in src/app_controller.py:266-271), with one change: instead of loading heavy modules on first use, the controller warms them proactively on a background pool at startup (§2.2, Layer 3).

2.1 Architectural Invariant: Main Thread Purity

The main thread (the one that enters immapp.run()) must NEVER import a module heavier than imgui_bundle and the lean gui_2 skeleton. Every heavy import is performed off the main thread, primarily by warmup jobs on the AppController's shared job pool (_io_pool); the MMA WorkerPool stays dedicated to MMA work (§2.2, Layer 2). This invariant is enforced by an audit script (CI gate) and a runtime audit-hook test that fails if a heavy import is observed on the main thread at startup.

Concretely, the main thread's import chain is allowed to contain:

  • All import X statements transitively reachable from src/gui_2.py whose accumulated import time is < 50ms
  • The modules: imgui_bundle, defer, src.imgui_scopes, src.theme_2 (default theme only), src.theme_models, src.paths, src.models, src.events
  • Anything in sys.stdlib_module_names

Everything else (provider SDKs, FastAPI, the NERV theme, the command palette, markdown table extensions, the full src.ai_client provider list, and numpy, psutil, or tree_sitter_* when used only by deferred code paths) must be loaded by a background mechanism that does not run on the main thread.

2.2 Four layers of protection

Layer 1 — Explicit warmup-aware module access (the load-bearing wall, non-negotiable)

Remove heavy imports from the top of source files reachable from the main thread. Functions that need them use a _require_warmed(name) helper that assumes the module is already in sys.modules (because warmup put it there):

```python
# BEFORE (src/ai_client.py, current)
from google import genai
import anthropic
import openai
# ... 5 provider SDKs loaded unconditionally
```

```python
# AFTER
import sys
from types import ModuleType

def _require_warmed(name: str) -> ModuleType:
    """Get a module that AppController's warmup should have loaded.

    Raises RuntimeError if the module is not in sys.modules. This is the
    explicit contract: heavy modules MUST be warmed at startup. No lazy
    loading on first use; the import is paid upfront on a bg thread.
    """
    mod = sys.modules.get(name)
    if mod is None:
        raise RuntimeError(
            f"Module {name!r} is not warmed. "
            f"AppController.__init__ must have run first (which submits warmup jobs)."
        )
    return mod

def _send_gemini(md_content, user_message, *args, **kwargs):  # remaining parameters elided in this spec
    genai = _require_warmed("google.genai")
    # ... use genai ...
```

Why no import X inside the function body? Because that would be lazy loading on first use. If the first use is triggered by a user UI action (e.g. switching the provider from MiniMax to Gemini, the controller enqueues an action that propagates to the first call), the user sees a 955ms lag between their click and any visible response. That's the bad case the user called out: "lazy loading introduces latencies when interacting with the UI state vs the bg state."

By warming proactively, the first user-triggered call is instant. The cost is paid during startup on a bg thread, before the user can interact.

Main-thread cost: zero. The main thread's import chain is fully lean (none of the heavy modules are imported top-level). The warmup jobs run on _io_pool workers in parallel with the main thread's remaining init.

Layer 2 — Shared job pool on AppController (no new threads per task)

The codebase already has these dedicated / shared threads:

  • AppController._loop_thread — asyncio worker (DEDICATED to the AI event loop, do not use for arbitrary work)
  • WorkerPool (in src/multi_agent_conductor.py) — 4-thread pool for MMA workers (DEDICATED to MMA, do not pollute with imports or I/O)
  • HookServer thread — DEDICATED to the FastAPI server
  • Ad-hoc threading.Thread calls — used for one-off tasks; the user wants to MINIMIZE these

User constraint: no new daemon threads per import warmup, per I/O task, per log-prune. We add ONE shared ThreadPoolExecutor to AppController named _io_pool, and any subsystem that needs background work submits jobs to it. This includes:

  • Initial RAG index warm-up (if applicable)
  • Log pruning (currently a one-shot thread — refactor to use the pool)
  • Disk-bound subsystem initialization (e.g., TOML re-read on persona switch)
  • Heavy module warmup (the primary use case for this track)
```python
# In AppController.__init__
from concurrent.futures import ThreadPoolExecutor

self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)
```

Threads created by this track: 4 (the pool). Not one extra thread per job, per import, or per subsystem: just 4 long-lived workers that all background work shares (ThreadPoolExecutor starts them on demand, up to max_workers). Future work that needs a bg thread should call controller._io_pool.submit(fn).
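A quick way to check the "4 shared threads" property is to push more jobs than workers through the pool and record which threads ran them. Note that ThreadPoolExecutor names each worker <thread_name_prefix>_<index> (controller-io_0 through controller-io_3), so tests that pattern-match thread names should expect an underscore, not a hyphen. The helper names below are hypothetical; this is a sketch, not the AppController code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def make_io_pool() -> ThreadPoolExecutor:
    # Mirrors the spec: one shared pool, 4 workers, "controller-io" prefix.
    return ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")


def worker_names(pool: ThreadPoolExecutor, jobs: int = 16) -> set[str]:
    """Run `jobs` short tasks and report which worker threads executed them."""

    def task() -> str:
        time.sleep(0.01)  # simulate a short I/O-bound job
        return threading.current_thread().name

    futures = [pool.submit(task) for _ in range(jobs)]
    return {f.result() for f in futures}


if __name__ == "__main__":
    pool = make_io_pool()
    # Never more than 4 distinct names, all of the form "controller-io_<n>".
    print(sorted(worker_names(pool)))
    pool.shutdown(wait=True)
```

Because idle workers are reused (Python 3.8+), a lightly loaded pool may run everything on fewer than four threads; the cap, not the exact count, is what the test should assert.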

Layer 3 — Proactive warmup + completion notification (the new mechanism)

This is the core of the track. In AppController.__init__, immediately after _io_pool is created, the controller submits a job to the pool for each heavy module that needs warming. The main thread does NOT wait for these to complete.

```python
# In AppController.__init__, right after self._io_pool is created
# (module level needs: import importlib, import threading, from typing import Callable)
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_callbacks: list[Callable[[dict[str, list[str]]], None]] = []
self._submit_warmup_jobs()

def _submit_warmup_jobs(self) -> None:
    """Submit bg jobs to import heavy modules. Notifies subscribers on completion."""
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        if heavy:
            self._warmup_done_event.clear()
        else:
            self._warmup_done_event.set()  # nothing to warm; never leave waiters hanging
    for module_name in heavy:
        self._io_pool.submit(self._warmup_one, module_name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        # AI provider SDKs
        "google.genai", "anthropic", "openai", "requests",
        # Feature-gated GUI (used by main thread but not on first frame)
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result

def _warmup_one(self, module_name: str) -> None:
    try:
        importlib.import_module(module_name)
        outcome = "completed"
    except Exception:
        outcome = "failed"  # recorded in the status; never raised into the pool
    with self._warmup_lock:
        self._warmup_status["pending"].remove(module_name)
        self._warmup_status[outcome].append(module_name)
        done = not self._warmup_status["pending"]
        callbacks = list(self._warmup_callbacks) if done else []
        snapshot = {k: list(v) for k, v in self._warmup_status.items()}
        if done:
            self._warmup_done_event.set()
    # Fire callbacks outside the lock with a private snapshot, so a slow or
    # faulty subscriber cannot block warmup_status() readers.
    for cb in callbacks:
        try:
            cb(snapshot)
        except Exception:
            pass
```

Completion notification is critical for the user-visible UX. Three surfaces:

  1. GUI status indicator — the status bar shows "Warming up... (5/8)" while the bg jobs run, then "All imports ready" with a green dot when complete. The GUI never blocks waiting; the indicator is updated by polling controller.warmup_status() once per frame (cheap, lock-guarded).

  2. GUI toast notification — when warmup completes, show a toast: "All providers ready" with the count of modules loaded. User can dismiss.

  3. Hook API endpoints: GET /api/warmup_status returns the current state; GET /api/warmup_wait?timeout=N blocks until done (for tests).

The user said: "the app controller should post to test clients or the user when its threads are warmed up with imports — that way the user knows 'hey you have the ui first, but now you have all the functionality.'" This is exactly what the notification surfaces achieve.

Why this beats lazy-loading: if a user clicks "switch to Gemini" and the controller lazy-loads google.genai on that action, the user sees ~1s of nothing happening between the click and the visible response. With warmup, the click is instant because google.genai is already in sys.modules. The 1s of cost was paid during startup, when the user was looking at a splash or otherwise not waiting on input.

Layer 4 — Worker-process isolation (future, out of scope)

The codebase already runs gemini_cli and external MCP servers as subprocesses for this exact reason. A future track could move google.genai / anthropic into their own worker processes, communicating via the existing SyncEventQueue. This track does NOT do this — Layer 1+2+3 is sufficient for the current problem.

2.3 Threading constraints (verified empirically)

The user's question: "if I import in the app controller's thread, will it block the GUI's thread?" The answer is:

| Scenario | Blocks GUI? |
|---|---|
| Module top-level import of heavy X, then main imports X | YES (X's import is in main's chain). This is why we remove heavy imports from main-thread-reachable files. |
| _io_pool worker warming X while main thread renders | NO direct block, but GIL contention causes micro-stutters (~5-50ms each). Acceptable because the pool is capped at 4 threads and the main thread is mostly idle in immapp.run(). |
| _io_pool worker has finished warming X; main thread later calls _require_warmed("X") | NO (the lookup is a dict.get(): instant, no import lock contention). |
| User-triggered UI action (e.g. provider switch) propagates to the controller, which calls _require_warmed on a warmed module | NO (lookup is instant). This is the win the user explicitly called out: no user-perceptible lag. |
| wait_for_warmup() blocks the asyncio thread waiting for warmup | NO direct block on GUI (different thread). The asyncio thread waits; the main thread renders. Acceptable, but rarely needed if the user waits for the warmup notification first. |
| Spawning a new threading.Thread for each import warmup | Wasteful (thread creation ~1-5ms each; thread count explodes). Use the _io_pool instead. |

This means: Layer 1 is non-negotiable. Even with warmup on _io_pool, if the heavy import is also in the main thread's import chain, the main thread will block on the import lock the moment it tries to use the module. Layer 1 removes the heavy imports from the main thread's chain; Layer 2 reuses threads efficiently; Layer 3 proactively warms on bg threads so the FIRST user-triggered use is instant.

2.4 Enforcement: the "main thread purity" audit

Two enforcement mechanisms, both required:

Static: scripts/audit_main_thread_imports.py (CI gate)

  1. AST-walk the import graph reachable from sloppy.py (the main entry). For each .py file in the graph, collect top-level import X and from X import Y statements.

  2. Compare against an allowlist of "main-thread-safe" modules (stdlib + imgui_bundle + the lean gui_2 skeleton list from §2.1). Any non-allowlist import is a violation.

  3. Exit non-zero with a clear message naming the file, line, and heavy module.

  4. Run as part of CI (uv run python scripts/audit_main_thread_imports.py) and as a pre-commit hook.

Runtime: tests/test_main_thread_purity.py (TDD, empirical)

  1. Spawn uv run python sloppy.py --headless --enable-test-hooks as a subprocess, with a sys.addaudithook callback installed inside that subprocess (audit hooks only see their own process) that logs every import event together with the calling thread.

  2. Wait for the headless server to be ready (or 5s timeout).

  3. Read the audit log. Assert: every import event with threading.current_thread() is threading.main_thread() was for a module in the allowlist.

  4. Kill the subprocess.

This is the empirical enforcement: it proves the invariant holds at runtime, not just at static analysis time.


3. Architectural Changes

3.1 Per-file import plan

For each source file reachable from the main thread's import chain, we remove top-level heavy imports and have functions access them via _require_warmed(name). The warmup jobs (§3.2) put the modules in sys.modules before any function is called.

src/ai_client.py (the biggest win: ~1800ms)

Top-level today: from google import genai, import anthropic, import openai, import requests (used by deepseek/minimax).

After:

  • Drop all four heavy imports from the top. Add _require_warmed(name) helper at the top.
  • _send_gemini() calls _require_warmed("google.genai") to get the module
  • _send_anthropic() calls _require_warmed("anthropic")
  • _send_deepseek() and _send_minimax() call _require_warmed("openai") and _require_warmed("requests")
  • Provider client objects (_gemini_client, _anthropic_client, etc.) stay as module globals but are now None until _send_* initializes them (extracted from current top-level logic into a new _ensure_<provider>_client() that uses the warmed module)
  • The warmup list in AppController._compute_warmup_list() includes google.genai, anthropic, openai, requests (always warmed)

Result: ~1800ms off the main thread. The bg threads pay this cost during startup. By the time the first AI call happens (which is always async, on the asyncio thread), the modules are in sys.modules and the lookup is instant. No user-perceptible lag.

src/api_hooks.py (FastAPI in headless/web only)

Top-level today: from fastapi import ..., from fastapi.security.api_key import ... (only needed if --enable-test-hooks or --web-host).

After:

  • Drop these from top. Add _require_warmed(name) calls inside the methods that need them.
  • The warmup list in AppController._compute_warmup_list() includes fastapi, fastapi.security.api_key conditionally — only when enable_test_hooks or web_host is set

Result: ~470ms off the main thread for non-test, non-web launches. For live_gui tests (--enable-test-hooks), the warmup loads fastapi during the same startup window, so the hook server is ready when the process announces readiness.

src/commands.py (command palette warmup-aware)

Top-level today: from src.command_palette import ... at src/commands.py:1.

After:

  • Drop the top-level import. The command functions call _require_warmed("src.command_palette") to access the module
  • The warmup list includes src.command_palette

Result: ~244ms off the main thread's import chain. The bg thread warms it during startup; the first Ctrl+Shift+P is instant.

src/theme_2.py (NERV theme warmup-aware)

Top-level today: from src.theme_nerv import ..., from src.theme_nerv_fx import ... at the top of src/theme_2.py.

After:

  • Drop the top-level imports. apply_nerv_theme() (or the function that activates NERV) calls _require_warmed("src.theme_nerv") and _require_warmed("src.theme_nerv_fx")
  • The warmup list includes both NERV modules

Result: ~485ms off the main thread's import chain (the default non-NERV path is lean). User pays the cost during startup; theme switch is instant when they pick NERV.

src/markdown_helper.py (markdown table warmup-aware)

Top-level today: from src.markdown_table import ... at src/markdown_helper.py:1.

After:

  • Drop the top-level import. The table-detection branch of render() calls _require_warmed("src.markdown_table")
  • The warmup list includes src.markdown_table

Result: ~250ms off the main thread's import chain. First markdown table render is instant.

src/imgui_scopes.py, src/gui_2.py, src/bg_shader.py (KEEP imgui_bundle)

These MUST keep import imgui_bundle at top — the ImGui render loop is the hot path and needs the module on first frame. There is no way to defer this without breaking the render loop.

What CAN be deferred inside src/gui_2.py:

  • import numpy (only needed for bg_shader; the GUI itself doesn't need numpy on the first frame) — move to _require_warmed("numpy") in the bg shader call site, add numpy to the warmup list
  • Other feature-gated imports — same pattern

src/gui_2.py direct heavy imports (audit)

We will use AST to audit which import X statements at src/gui_2.py top-level are reachable from the first-frame render path (render_main_window, render_main_menu_bar, etc.) and which are feature-gated. First-frame imports stay top-level. Feature-gated ones move to _require_warmed(...) calls at the use site, with the module added to the warmup list.

3.2 Job pool + warmup scaffolding

New code in src/app_controller.py:

```python
# src/app_controller.py (module level)
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
import importlib
import threading

# In AppController.__init__, after the asyncio loop starts:
self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)

# Warmup state
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_callbacks: list[Callable[[dict[str, list[str]]], None]] = []
self._submit_warmup_jobs()
```

_submit_warmup_jobs() computes the warmup list and submits one job per module to the pool:

```python
def _submit_warmup_jobs(self) -> None:
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        if heavy:
            self._warmup_done_event.clear()
        else:
            # Nothing to warm: mark done now, or wait_for_warmup() would never return.
            self._warmup_done_event.set()
    for name in heavy:
        self._io_pool.submit(self._warmup_one, name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        "google.genai", "anthropic", "openai", "requests",
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
        "numpy",  # used by bg_shader; warmed for first invocation
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result
```

Each warmup worker imports the module, updates the status, and on the last one fires the completion callbacks (so the GUI status indicator and toast notification can react):

```python
def _warmup_one(self, name: str) -> None:
    try:
        importlib.import_module(name)
        outcome = "completed"
    except Exception:
        outcome = "failed"  # recorded in the status; never raised into the pool
    with self._warmup_lock:
        self._warmup_status["pending"].remove(name)
        self._warmup_status[outcome].append(name)
        done = not self._warmup_status["pending"]
        cbs = list(self._warmup_callbacks) if done else []
        # Deep-enough copy taken under the lock; callbacks never see live lists.
        snap = {k: list(v) for k, v in self._warmup_status.items()}
        if done:
            self._warmup_done_event.set()
    for cb in cbs:
        try:
            cb(snap)
        except Exception:
            pass
```

Public API on AppController:

```python
def warmup_status(self) -> dict[str, list[str]]:
    """Snapshot the current warmup state. Cheap (lock-guarded copy)."""
    with self._warmup_lock:
        return {k: list(v) for k, v in self._warmup_status.items()}

def is_warmup_done(self) -> bool:
    return self._warmup_done_event.is_set()

def wait_for_warmup(self, timeout: float | None = None) -> bool:
    """Block until warmup completes. Returns True on done, False on timeout."""
    return self._warmup_done_event.wait(timeout=timeout)

def on_warmup_complete(self, callback: Callable[[dict[str, list[str]]], None]) -> None:
    """Register a callback for warmup completion. If already done, fires immediately."""
    with self._warmup_lock:
        if not self._warmup_done_event.is_set():
            # Check and append under one lock hold: _warmup_one sets the event
            # and copies the callback list under the same lock, so this
            # callback cannot be missed.
            self._warmup_callbacks.append(callback)
            return
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    callback(snap)  # already done: fire on the caller's thread, outside the lock
```

Hook API endpoints (added in src/api_hooks.py):

  • GET /api/warmup_status → controller.warmup_status()
  • GET /api/warmup_wait?timeout=N → blocks until done, returns final status

GUI integration (in src/gui_2.py):

  • Status bar: "Warming up... (5/8)" while in flight, "All imports ready" + green dot when done. Polled once per frame from controller.warmup_status() (cheap, ~microseconds).
  • On transition to done: show a toast notification "All providers ready (8 modules)" for 5 seconds.
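One detail matters for the toast: on_warmup_complete callbacks run on whichever _io_pool worker finishes the last warmup job, while Dear ImGui is not thread-safe and must only be called from the render thread. A simple way to respect that, sketched below, is for the callback to only enqueue a message and for the render loop to drain the queue once per frame. ToastInbox, drain(), and the failure wording are hypothetical.

```python
import queue


class ToastInbox:
    """Hand-off from warmup callbacks (any thread) to the render loop (main thread)."""

    def __init__(self) -> None:
        self._pending: queue.SimpleQueue = queue.SimpleQueue()

    def on_warmup_complete(self, status: dict[str, list[str]]) -> None:
        # May run on an _io_pool worker: build the message, never touch ImGui here.
        n_ok = len(status["completed"])
        n_failed = len(status["failed"])
        if n_failed:
            msg = f"Warmup finished: {n_ok} ready, {n_failed} failed"
        else:
            msg = f"All providers ready ({n_ok} modules)"
        self._pending.put(msg)

    def drain(self) -> list[str]:
        """Call once per frame on the main thread; returns messages to show."""
        out: list[str] = []
        while True:
            try:
                out.append(self._pending.get_nowait())
            except queue.Empty:
                return out
```

Registration would then be controller.on_warmup_complete(inbox.on_warmup_complete), with the frame code showing whatever inbox.drain() returns.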

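In AppController.shutdown() (or wherever lifecycle cleanup lives): self._io_pool.shutdown(wait=False, cancel_futures=True). This returns immediately and cancels jobs that have not started yet (cancel_futures requires Python 3.9+). ThreadPoolExecutor workers are not daemon threads (since Python 3.9), so a job that is already running, such as an in-flight import, still runs to completion before the interpreter exits.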

3.3 Startup timing instrumentation

Add src/startup_profiler.py:

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager


class StartupProfiler:
    """Records wall-clock time spent in each named init phase.

    Cheap (no I/O). Stored on AppController.startup_profile for later inspection
    via the Hook API (`GET /api/startup_profile`) and the Diagnostics panel.
    """

    def __init__(self) -> None:
        self._phases: list[tuple[str, float, float]] = []  # (name, start, duration_ms)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record the phase even if the init step raised.
            self._phases.append((name, t0, (time.perf_counter() - t0) * 1000))
```

Used at every major init step in AppController.__init__ and App.__init__.


4. Phases

Phase 1: Audit + Benchmark + Foundation (Day 1)

  • T1.1: Run scripts/benchmark_imports.py and capture baseline
  • T1.2: AST-audit every import X in src/*.py to map which is reachable from the first-frame render path vs feature-gated
  • T1.3: Add StartupProfiler to src/app_controller.py and instrument current init
  • T1.4: Add scripts/audit_main_thread_imports.py (static gate)
  • T1.5: Commit baseline + audit script

Phase 2: Job Pool + Warmup Foundation (Day 1)

  • T2.1 (TDD Red): tests/test_app_controller_io_pool.py — assert AppController has a 4-worker _io_pool whose threads are named controller-io_* (ThreadPoolExecutor names workers <prefix>_<index>)
  • T2.2 (Green): Add _io_pool to AppController.__init__ with named threads
  • T2.3 (TDD Red): tests/test_warmup_mechanism.py — assert warmup jobs are submitted in __init__, complete within 10s, fire the done event, support callbacks, don't block init
  • T2.4 (Green): Implement _submit_warmup_jobs(), _compute_warmup_list(), _warmup_one(), warmup_status(), is_warmup_done(), wait_for_warmup(), on_warmup_complete() per spec §3.2
  • T2.5: Run T2.1 + T2.3 tests, confirm PASS
  • T2.6: Commit

Phase 3: Remove top-level heavy SDK imports from src/ai_client.py (Day 2)

  • T3.1 (TDD Red): tests/test_ai_client_no_top_level_sdk_imports.py — assert import src.ai_client does NOT load google.genai / anthropic / openai / requests (warmup hasn't run in the subprocess)
  • T3.2 (Green): Remove the four heavy imports from the top of ai_client.py. Add _require_warmed(name) helper. Each _send_* uses _require_warmed("google.genai") etc.
  • T3.3: Run existing tests/test_ai_client.py; fix any breakage (tests relying on top-level import side effects need a fixture that warms or a fallback for test mode)
  • T3.4: Confirm T3.1 tests PASS
  • T3.5: Commit

Phase 4: Remove top-level FastAPI imports from src/api_hooks.py (Day 2)

  • T4.1 (TDD Red): tests/test_hook_server_no_top_level_fastapi.py — assert from src.api_hooks import HookServer does NOT import fastapi
  • T4.2 (Green): Remove the fastapi imports from top. Use _require_warmed inside the methods that need them
  • T4.3: Run existing tests/test_api_hooks.py; fix
  • T4.4: Commit

Phase 5: Remove top-level imports for feature-gated GUI modules (Day 3)

  • T5A: Command Palette — tests/test_command_palette_no_top_level_import.py + remove from src/commands.py + use _require_warmed("src.command_palette")
  • T5B: NERV Theme — tests/test_theme_nerv_no_top_level_import.py + remove from src/theme_2.py + use _require_warmed("src.theme_nerv") etc.
  • T5C: Markdown Table — tests/test_markdown_helper_no_top_level_import.py + remove from src/markdown_helper.py + use _require_warmed("src.markdown_table")
  • T5D: GUI feature-gated — audit src/gui_2.py via the T1.2 script, apply same pattern. numpy migrates to _require_warmed in bg_shader call site.
  • T5E: Commit per module (4 atomic commits)

Phase 6: Migrate ad-hoc threads to _io_pool (Day 4)

  • T6.1: Audit: grep -rn "threading.Thread(" src/ to find all ad-hoc thread spawns (excluding HookServer and WorkerPool which are domain-specific)
  • T6.2: Refactor each ad-hoc thread to use controller._io_pool.submit(fn) instead
  • T6.3: Per-migration commit
  • T6.4: Final grep -rn "threading.Thread(" src/ shows ZERO new spawns

Phase 7: Warmup Notification (Hook API + GUI) (Day 4)

  • T7A.1 (TDD Red): tests/test_api_hooks_warmup.py — assert GET /api/warmup_status and GET /api/warmup_wait work
  • T7A.2 (Green): Add the two endpoints in src/api_hooks.py and register warmup_status in _gettable_fields
  • T7B.1: In src/gui_2.py, add a status-bar indicator that polls controller.warmup_status() each frame: "Warming up... (N/M)" while pending, "All imports ready" with green dot on completion
  • T7B.2: Register a callback via controller.on_warmup_complete(cb) that shows a toast "All providers ready (M modules)" on success
  • T7B.3: Update docs (status bar, toast, hook API)
  • T7B.4: Commit

Phase 8: Enforcement — Runtime Audit Hook (Day 4)

  • T8.1 (TDD Red): tests/test_main_thread_purity.py — spawn sloppy.py --headless --enable-test-hooks with a sys.addaudithook shim, verify no heavy import happens on the main thread
  • T8.2: Once Phase 3-5 land, this test should start passing. Wire into CI as a gating test (@pytest.mark.slow).
  • T8.3: Commit

Phase 9: Verify + Checkpoint (Day 5)

  • T9.1: Re-run scripts/benchmark_imports.py --runs=3; confirm import src.ai_client < 50ms, import src.gui_2 < 500ms, import src.app_controller < 300ms
  • T9.2: Re-run scripts/audit_main_thread_imports.py; exit 0
  • T9.3: Run tests/test_warmup_mechanism.py; warmup completes and notifications fire
  • T9.4: Run tests/test_main_thread_purity.py; pass
  • T9.5: Run full live_gui test batch; wait_for_server(timeout=15) no longer times out. Tests can call controller.wait_for_warmup() before exercising warmup-dependent functionality.
  • T9.6: Manual smoke:
    • uv run sloppy.py: time-to-first-frame < 1.5s, observe status indicator "Warming up... (N/M)" → "All imports ready" + toast
    • uv run sloppy.py --enable-test-hooks: same, plus /api/warmup_status returns completed after a brief wait
    • uv run sloppy.py --headless: time-to-server-ready
    • Provider switch test: switch from MiniMax to Gemini in the GUI after warmup. The action must be INSTANT, not 1s-delayed (proves warmup did its job)
  • T9.7: Phase checkpoint commit + git note with full verification report
  • T9.8: Update conductor/tracks.md; archive track

5. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lazy import inside a hot path adds latency on every call | Med | Med | Always gate the import with sys.modules check OR use module-level sentinel |
| First AI call on the asyncio thread blocks for ~955ms while google.genai imports | High | Low | The user already paid this latency budget; happens on the asyncio worker, not main. Document the expected first-call pause. |
| Lazy import surfaces circular import that was hidden by top-level ordering | Med | Med | Phase 1 audit catches this; defer each lazy import to the test phase |
| Test fixtures import the heavy module before main code, breaking assumptions | Low | Low | reset_ai_client and isolate_workspace fixtures already lazy-reset |
| Hot reload of a now-lazy module doesn't trigger | Low | Med | Update HotReloader.HOT_MODULES to register the lazy module's gate function |
| _io_pool worker importing a heavy module holds GIL and stutters GUI | Med | Low | The pool is capped at 4 threads; stutter is bounded; user sees responsive UI before any stutter |
| A future commit re-introduces a heavy import on the main thread | Med | High | Static gate (audit_main_thread_imports.py, CI) + runtime audit hook (test_main_thread_purity.py) catch this |

Hot Reload consideration

src/hot_reloader.py registers modules at import time. Lazy-loaded modules (imported inside functions) are NOT registered. The hot-reload workflow needs:

  • Either: register the lazy module with a callback that forces a re-import via importlib.reload
  • Or: explicitly trigger the lazy import on hot-reload trigger

This is a small follow-up task; the lazy import itself doesn't break hot reload (it just means you have to invoke the gate function once to materialize the module before reload can take effect).
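Both options reduce to a small helper around importlib.reload. The sketch assumes nothing about HotReloader's real interface, which this spec does not show; reload_if_loaded and reload_or_import are hypothetical names. The first reloads a module only if something has already imported it; the second materializes the module on the reload trigger when it is not loaded yet.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional


def reload_if_loaded(name: str) -> Optional[ModuleType]:
    """Reload `name` if it is already in sys.modules; return None if it never loaded."""
    module = sys.modules.get(name)
    if module is None:
        return None  # nothing to reload: the module has not been imported yet
    return importlib.reload(module)


def reload_or_import(name: str) -> ModuleType:
    """Import on the reload trigger when the module is not loaded yet, else reload it."""
    module = reload_if_loaded(name)
    return module if module is not None else importlib.import_module(name)
```

importlib.reload re-executes the module in the same module object, so callers that did `from x import y` keep the old objects; that is the usual hot-reload caveat and applies to warmed modules too.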


6. Verification Criteria

The track is complete when:

  • import src.ai_client cold start < 50ms (down from ~1800ms)
  • import src.gui_2 cold start < 500ms (down from ~3000ms)
  • import src.app_controller cold start < 300ms (down from ~700ms)
  • uv run sloppy.py --enable-test-hooks reaches immapp.run() in < 1.5s
  • live_gui.wait_for_server(timeout=15) passes for all 273+ tests
  • scripts/audit_main_thread_imports.py exits 0 (no heavy imports on main)
  • tests/test_main_thread_purity.py passes (runtime audit hook confirms invariant)
  • scripts/benchmark_imports.py shows no new red entries in the top-20
  • controller.wait_for_warmup(timeout=10.0) returns True — warmup completed within 10s of AppController.__init__
  • All modules in the warmup list are in sys.modules after warmup: controller.warmup_status()['pending'] is empty and 'completed' contains all expected module names
  • User-triggered actions on warmed modules are instant — manual test switching providers (e.g. MiniMax → Gemini) after warmup completes shows NO perceptible lag (was ~1s with lazy-loading)
  • GUI status indicator transitions — observe "Warming up... (N/M)" in the status bar, then "All imports ready" with green dot, then a toast notification fires via controller.on_warmup_complete(...)
  • Hook API exposes warmup state: GET /api/warmup_status returns {pending: [], completed: [...], failed: []}; GET /api/warmup_wait?timeout=10 returns the final state
  • NO import X or from X import Y statements inside function bodies for heavy modules: verified by grep -rnE "^\s*(import|from) (google|anthropic|openai|fastapi|src\.command_palette|src\.theme_nerv|src\.markdown_table)\b" src/
  • No regressions in the existing 272/273 passing tests
  • grep -rn "threading.Thread(" src/ shows ZERO new spawns after Phase 6 migration (only the existing project scaffolding threads like HookServer and WorkerPool remain, and they're domain-specific)
  • Startup profile + io_pool status visible in /api/startup_profile, /api/io_pool_status, and the Diagnostics panel
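The warmup criteria above (wait_for_warmup, warmup_status, on_warmup_complete, a 4-thread io pool) can be modelled in a few dozen lines. This is a minimal sketch under stated assumptions, not AppController's code. WarmupTracker and require_warmed are hypothetical names, with require_warmed standing in for the track's _require_warmed(name) gate. Failed imports count as attempted, so wait_for_warmup returns True once every module has been tried, and the failed list reports any problems.

```python
from __future__ import annotations

import importlib
import sys
import threading
from concurrent.futures import ThreadPoolExecutor
from types import ModuleType
from typing import Callable, Iterable, Optional


class WarmupTracker:
    """Hypothetical stand-in for the warmup bookkeeping described for AppController."""

    def __init__(self, modules: Iterable[str], max_workers: int = 4) -> None:
        self._lock = threading.Lock()
        self._pending: set[str] = set(modules)
        self._completed: set[str] = set()
        self._failed: set[str] = set()
        self._callbacks: list[Callable[[], None]] = []
        self._finished = not self._pending  # guarded by _lock
        self._done = threading.Event()
        if self._finished:
            self._done.set()
        # ThreadPoolExecutor names its threads "<prefix>_N", e.g. "controller-io_0".
        self._pool = ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="controller-io")
        for name in sorted(self._pending):
            self._pool.submit(self._warm, name)

    def _warm(self, name: str) -> None:
        try:
            importlib.import_module(name)  # runs on a pool thread, never on main
            ok = True
        except Exception:
            ok = False
        with self._lock:
            self._pending.discard(name)
            (self._completed if ok else self._failed).add(name)
            if self._pending or self._finished:
                return
            self._finished = True
            callbacks = list(self._callbacks)
        # Fire callbacks outside the lock, then release waiters, so anyone
        # returning from wait_for_warmup() also sees the callbacks' effects.
        try:
            for cb in callbacks:
                cb()
        finally:
            self._done.set()

    def on_warmup_complete(self, callback: Callable[[], None]) -> None:
        """Register a callback; it runs immediately if warmup already finished."""
        with self._lock:
            if not self._finished:
                self._callbacks.append(callback)
                return
        callback()

    def wait_for_warmup(self, timeout: Optional[float] = None) -> bool:
        """True once every module has been attempted; False on timeout."""
        return self._done.wait(timeout)

    def warmup_status(self) -> dict[str, list[str]]:
        with self._lock:
            return {
                "pending": sorted(self._pending),
                "completed": sorted(self._completed),
                "failed": sorted(self._failed),
            }

    def require_warmed(self, name: str) -> ModuleType:
        """Return a fully imported module without importing on the caller's thread."""
        with self._lock:
            ready = name in self._completed
        if not ready:
            raise RuntimeError(f"{name!r} is not warmed (pending, failed, or not listed)")
        return sys.modules[name]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

One design choice worth noting: require_warmed checks the completed set rather than sys.modules, because the import system inserts a module into sys.modules before its body has finished executing, so a sys.modules hit on the main thread could still be a half-initialized module.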

7. Out of Scope

  • Process-isolation of heavy SDKs (Layer 4 in §2.2) — future track
  • imgui_bundle lazy loading — fundamentally impossible (ImGui hot path)
  • Importing on the main thread for the lean gui_2 skeleton (~300ms unavoidable)
  • pydantic lazy loading (used by src/models.py which is imported by 16 files; the cost is already amortized and deferring it would cascade)
  • Lazy-loading heavy modules in function bodies (Layer 1 in §2.2 — explicitly rejected by the user; warmup is the only mechanism)

8. Cross-References

  • conductor/tracks.md line 152 — original backlog entry that this track fulfills
  • docs/guide_architecture.md:43-67 — thread domains (asyncio worker is the right place for heavy work)
  • docs/guide_architecture.md:880-898 — Architectural Invariants (single-writer principle; this track respects it)
  • docs/guide_app_controller.md:241-271 — existing get_rag_engine / get_mma_conductor lazy patterns (the templates this track replicates)
  • docs/guide_hot_reload.md:295-312 — what is/isn't safe to hot-reload (lazy-loaded modules need a small follow-up)
  • conductor/workflow.md — TDD Red-Green-Refactor protocol + atomic per-task commits + git notes
  • scripts/benchmark_imports.py — the measurement tool built in this conversation