# Track: Sloppy.py Startup Speedup

**Status:** Active
**Initialized:** 2026-06-06
**Owner:** Tier 2 Tech Lead
**Priority:** High (regression blocker — `live_gui` fixtures time out at `wait_for_server(timeout=15)`)

---

## 1. Problem Statement

`uv run sloppy.py --enable-test-hooks` startup latency has crept up, and `live_gui` tests
time out at `wait_for_server(timeout=15)`. The root cause is **too much work on the main
thread before it enters `immapp.run()` and the GUI becomes interactive**:

- 5 AI provider SDKs (`google.genai`, `anthropic`, `openai`, `requests`, ...) eagerly
  imported at `src/ai_client.py` module top-level, even though only one is the active
  provider at runtime
- `imgui_bundle`, which transitively pulls in `numpy` and 9 other heavy modules, is
  imported at the top of `src/gui_2.py` and 9 sibling files
- The NERV theme, command palette, and markdown table extensions are loaded eagerly
  even though they are feature-gated
- `AppController.__init__` does all subsystem construction synchronously on the
  thread that will become the main GUI thread (path manager, presets, personas,
  context presets, tool presets, history, workspace, RAG, hook server)

The architecture is already correct: AI calls go through the asyncio worker thread,
so the *call* is non-blocking. The *imports* are still synchronous on the main
thread, and that is what the user sees as "sloppy.py is slow to open."

### 1.1 Measurement Baseline (from `scripts/benchmark_imports.py`)

Cold-start subprocess timings, median of 3 runs, 85 unique import paths:

| module | cold import time | files importing it | classification |
|---|---:|---:|---|
| google.genai | ~955ms | 1 | **defer (provider SDK, default)** |
| openai | ~445ms | 1 | defer (provider SDK) |
| anthropic | ~430ms | 1 | defer (provider SDK) |
| src.markdown_table | ~250ms | 1 | defer (feature-gated) |
| src.theme_nerv | ~245ms | 1 | defer (feature-gated) |
| imgui_bundle | ~245ms | 10 | **KEEP (ImGui hot path)** |
| src.command_palette | ~244ms | 1 | defer (feature-gated) |
| src.theme_nerv_fx | ~240ms | 1 | defer (feature-gated) |
| fastapi (+ security.api_key) | ~470ms combined | 1 | defer (only `--enable-test-hooks` or web mode) |
| requests | ~92ms | 3 | defer (deepseek/minimax only) |
| numpy | ~65ms | 2 | keep (bg_shader; optional in gui_2) |
| pydantic | ~70ms | 1 | keep (models.py is loaded by everyone) |
| tree_sitter_* | ~25ms each | 1 | keep (file_cache) |

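**Estimated main-thread import cost today (worst case, all paths):**
~2500-3000ms. The rows above add up to ~1.9s for the provider SDKs (including
`requests`), ~0.5s for FastAPI, and ~1.0s for the four feature-gated GUI modules,
but each row is an isolated cold-start measurement that re-counts shared
dependencies (e.g. `pydantic` and `httpx` under the provider SDKs), so the
combined cost is below that ~3.4s sum.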

**Estimated main-thread import cost after this track:**
~500-600ms (`imgui_bundle` + lean `gui_2` + `pydantic` models). Net savings
~2000-2400ms.

---

## 2. Approach

Because the threading architecture is already sound, the fix is **systematic application of the
lazy-load + shared-job-pool patterns** the codebase already uses for `RAGEngine`
(`get_rag_engine` in `src/app_controller.py:244-249`) and `MultiAgentConductor`
(`get_mma_conductor` in `src/app_controller.py:266-271`).

### 2.1 Architectural Invariant: Main Thread Purity

> **The main thread (the one that enters `immapp.run()`) must NEVER import a
> module heavier than `imgui_bundle` and the lean `gui_2` skeleton. Every heavy
> import is loaded by the asyncio worker thread, the AppController's shared
> job pool, or the MMA WorkerPool. This invariant is enforced by an audit
> script (CI gate) and a runtime audit-hook test that fails if a heavy import
> is observed on the main thread at startup.**

Concretely, the main thread's import chain is allowed to contain:
- All `import X` statements transitively reachable from `src/gui_2.py` whose
  accumulated import time is < 50ms
- The modules: `imgui_bundle`, `defer`, `src.imgui_scopes`, `src.theme_2`
  (default theme only), `src.theme_models`, `src.paths`, `src.models`,
  `src.events`
- Anything in `sys.stdlib_module_names`

Everything else — provider SDKs, FastAPI, NERV theme, command palette, markdown
table extensions, the full `src.ai_client` provider list, `numpy`/`psutil`/
`tree_sitter_*` if used by lazy code paths — must be loaded by a background
mechanism that does not run on the main thread.

### 2.2 Four layers of protection

#### Layer 1 — Explicit warmup-aware module access (the load-bearing wall, non-negotiable)

Remove heavy imports from the top of source files reachable from the main
thread. Functions that need them use a `_require_warmed(name)` helper that
assumes the module is already in `sys.modules` (because warmup put it there):

```python
# BEFORE (src/ai_client.py, current)
from google import genai
import anthropic
import openai
# ... 5 provider SDKs loaded unconditionally

# AFTER
import importlib
import sys
from typing import Any

def _require_warmed(name: str) -> Any:
    """Get a module that AppController's warmup should have loaded.

    Raises RuntimeError if the module is not in sys.modules. This is the
    explicit contract: heavy modules MUST be warmed at startup. No lazy
    loading on first use; the import is paid upfront on a bg thread.
    """
    if name not in sys.modules:
        raise RuntimeError(
            f"Module {name!r} is not warmed (not started yet, or failed). "
            f"AppController.__init__ submits the warmup jobs; paths that can "
            f"run this early must call controller.wait_for_warmup() first."
        )
    # A module is placed in sys.modules as soon as its import *starts*. If a
    # warmup worker is still executing it, import_module() waits on that
    # module's import lock and returns it fully initialized; after warmup it
    # is a plain cache hit.
    return importlib.import_module(name)

def _send_gemini(md_content, user_message, *args, **kwargs):  # other params elided
    genai = _require_warmed("google.genai")
    # ... use genai ...
```

**Why no `import X` inside the function body?** Because that would be lazy
loading on first use. If the first use is triggered by a user UI action (e.g.
switching the provider from MiniMax to Gemini, after which the controller
enqueues an action that leads to the first Gemini call), the user sees a ~955ms
lag between their click and any visible response. That's the bad case the user
called out: *"lazy loading introduces latencies when interacting with the UI
state vs the bg state."*

By warming proactively, the first user-triggered call is instant. The cost
is paid during startup on a bg thread, before the user can interact.

**Main-thread cost: zero.** The main thread's import chain is fully lean
(none of the heavy modules are imported top-level). The warmup jobs run on
`_io_pool` workers in parallel with the main thread's remaining init.

#### Layer 2 — Shared job pool on AppController (no new threads per task)

The codebase already has these dedicated / shared threads:
- `AppController._loop_thread` — asyncio worker (**DEDICATED** to the AI event
  loop, do not use for arbitrary work)
- `WorkerPool` (in `src/multi_agent_conductor.py`) — 4-thread pool for MMA
  workers (**DEDICATED** to MMA, do not pollute with imports or I/O)
- `HookServer` thread — **DEDICATED** to the FastAPI server
- Ad-hoc `threading.Thread` calls — used for one-off tasks; the user wants to
  **MINIMIZE** these

**User constraint:** no new daemon threads per import warmup, per I/O task, per
log-prune. We add ONE shared `ThreadPoolExecutor` to `AppController` named
`_io_pool`, and any subsystem that needs background work submits jobs to it.
This includes:
- Initial RAG index warm-up (if applicable)
- Log pruning (currently a one-shot thread — refactor to use the pool)
- Disk-bound subsystem initialization (e.g., TOML re-read on persona switch)
- **Heavy module warmup (the primary use case for this track)**

```python
# In AppController.__init__
from concurrent.futures import ThreadPoolExecutor

self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)
```

**Threads created by this track: at most 4** (the pool; `ThreadPoolExecutor` starts
workers on demand up to `max_workers` and reuses them). Not one per job, not one per
import, not one per subsystem. Future work that needs a bg thread should call
`controller.submit_io(fn)` (the public wrapper over `_io_pool.submit` used in Phase 6).
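
Phase 6 routes ad-hoc threads through `controller.submit_io(fn)`. A minimal sketch of such a wrapper around the shared pool, assuming a small holder class; `IoPool`, `submit_io`, and `shutdown` are hypothetical names here, not existing project code.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class IoPool:
    """Hypothetical holder for the shared "controller-io" pool."""

    def __init__(self, max_workers: int = 4) -> None:
        # Workers are started on demand, up to max_workers, and then reused.
        self._pool = ThreadPoolExecutor(max_workers=max_workers,
                                        thread_name_prefix="controller-io")

    def submit_io(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        """Run fn(*args, **kwargs) on a shared worker instead of a new Thread."""
        return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        # Drop queued jobs that have not started (Python 3.9+); a job that is
        # already running still finishes, because pool workers are not daemons.
        self._pool.shutdown(wait=False, cancel_futures=True)
```

Migrating an ad-hoc spawn such as `threading.Thread(target=prune_logs, daemon=True).start()` then becomes `controller.submit_io(prune_logs)` (`prune_logs` is a placeholder name). Returning the `Future` lets callers attach `add_done_callback` or inspect exceptions, which a bare thread only prints to stderr via `threading.excepthook`.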

#### Layer 3 — Proactive warmup + completion notification (the new mechanism)

This is the core of the track. In `AppController.__init__`, immediately after
`_io_pool` is created, the controller submits a job to the pool for each heavy
module that needs warming. The main thread does NOT wait for these to complete.

```python
# In AppController.__init__, right after self._io_pool is created
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()
```

```python
def _submit_warmup_jobs(self) -> None:
    """Submit bg jobs to import heavy modules. Notifies subscribers on completion."""
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
    for module_name in heavy:
        self._io_pool.submit(self._warmup_one, module_name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        # AI provider SDKs
        "google.genai", "anthropic", "openai", "requests",
        # Feature-gated GUI (used by main thread but not on first frame)
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
        "numpy",  # used by bg_shader; warmed for first invocation
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result

def _warmup_one(self, module_name: str) -> None:
    try:
        importlib.import_module(module_name)
        outcome = "completed"
    except Exception:
        # e.g. an optional SDK that is not installed: recorded, not raised
        outcome = "failed"
    with self._warmup_lock:
        self._warmup_status["pending"].remove(module_name)
        self._warmup_status[outcome].append(module_name)
        done = not self._warmup_status["pending"]
        if done:
            self._warmup_done_event.set()
        callbacks = list(self._warmup_callbacks) if done else []
        # Snapshot under the lock so callbacks never see lists being mutated
        snapshot = {k: list(v) for k, v in self._warmup_status.items()}
    # Fire outside the lock so a callback that calls warmup_status() cannot deadlock
    for cb in callbacks:
        try:
            cb(snapshot)
        except Exception:
            pass  # one faulty subscriber must not stop the others
```

**Completion notification** is critical for the user-visible UX. Three surfaces:

1. **GUI status indicator** — the status bar shows "Warming up... (5/8)" while
   the bg jobs run, then "All imports ready" with a green dot when complete.
   The GUI never blocks waiting; the indicator is updated by polling
   `controller.warmup_status()` once per frame (cheap, lock-guarded).

2. **GUI toast notification** — when warmup completes, show a toast:
   "All providers ready" with the count of modules loaded. User can dismiss.

3. **Hook API endpoint** — `GET /api/warmup_status` returns the current state;
   `GET /api/warmup_wait?timeout=N` blocks until done (for tests).

The user said: *"the app controller should post to test clients or the user
when its threads are warmed up with imports — that way the user knows 'hey
you have the ui first, but now you have all the functionality.'"* This is
exactly what the notification surfaces achieve.

**Why this beats lazy-loading:** if a user clicks "switch to Gemini" and the
controller lazy-loads `google.genai` on that action, the user sees ~1s of
nothing happening between the click and the visible response. With warmup,
the click is instant because `google.genai` is already in `sys.modules`. The
1s of cost was paid during startup, when the user was looking at a splash or
otherwise not waiting on input.

#### Layer 4 — Worker-process isolation (future, out of scope)

The codebase already runs `gemini_cli` and external MCP servers as subprocesses
for this exact reason. A future track could move `google.genai` / `anthropic` into
their own worker processes, communicating via the existing `SyncEventQueue`. This
track does NOT do this; Layers 1-3 are sufficient for the current problem.

### 2.3 Threading constraints (verified empirically)

The user's question: *"if I import in the app controller's thread, will it block
the GUI's thread?"* The answer is:

| Scenario | Blocks GUI? |
|---|---|
| Heavy X is imported at the top level of a module that the main thread imports | **YES** (X's import runs in main's import chain). This is why we remove heavy imports from main-thread-reachable files. |
| `_io_pool` worker warming X while main thread renders | **NO direct block, but GIL contention causes micro-stutters** (~5-50ms each). Acceptable because the pool is capped at 4 threads and the main thread is mostly idle in `immapp.run()`. |
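| `_io_pool` worker has finished warming X; main thread later calls `_require_warmed("X")` | **NO** (X is a `sys.modules` cache hit: microseconds, no import-lock wait). |
| A thread calls `_require_warmed("X")` while a worker is still mid-import of X | **Unsafe on main.** X is already in `sys.modules` (a module is inserted before its body runs), so a bare lookup can return a half-initialized module, and waiting on X's import lock instead (the correct behavior) stalls the frame. Main-thread render paths therefore check warmup status first. |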
| User-triggered UI action (e.g. provider switch) propagates to controller which calls `_require_warmed` on a warmed module | **NO** (lookup is instant). This is the win the user explicitly called out: no user-perceptible lag. |
| `wait_for_warmup()` blocks the asyncio thread waiting for warmup | **NO direct block on GUI** (different thread), but every other coroutine on the AI loop stalls until it returns. Rarely needed if the user waits for the warmup notification first; a coroutine can instead poll `is_warmup_done()` between `await asyncio.sleep(...)` calls. |
| Spawning a new `threading.Thread` for each import warmup | **Wasteful** (thread creation ~1-5ms each; thread count explodes). Use the `_io_pool` instead. |

This means: **Layer 1 is non-negotiable.** Even with warmup on `_io_pool`, if
the heavy import is also in the main thread's import chain, the main thread
blocks as soon as it reaches that import: it either waits on the module's import
lock while the worker finishes, or performs the whole import itself. Layer 1
removes the heavy imports from the main thread's chain; Layer 2 reuses
threads efficiently; Layer 3 proactively warms on bg threads so the FIRST
user-triggered use is instant.
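
The table lists `wait_for_warmup()` on the asyncio thread as acceptable, but `threading.Event.wait()` blocks the whole event loop for the duration. A minimal sketch of a non-blocking alternative for coroutines, assuming a controller that exposes `is_warmup_done()` as in §3.2; `await_warmup` is a hypothetical helper. Polling keeps other coroutines running and needs no extra thread, which fits the one-pool constraint from Layer 2 (an `asyncio.to_thread` wrapper would also work but borrows a thread from the loop's default executor).

```python
import asyncio
import time


async def await_warmup(controller, timeout: float, poll_s: float = 0.05) -> bool:
    """Wait for warmup from a coroutine without blocking the event loop.

    Returns True once controller.is_warmup_done() is True, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not controller.is_warmup_done():
        if time.monotonic() >= deadline:
            return False
        # Yields to the loop, so other AI tasks keep running while we wait.
        await asyncio.sleep(poll_s)
    return True
```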

### 2.4 Enforcement: the "main thread purity" audit

Two enforcement mechanisms, both required:

#### Static: `scripts/audit_main_thread_imports.py` (CI gate)

1. AST-walk the import graph reachable from `sloppy.py` (the main entry).
   For each `.py` file in the graph, collect top-level `import X` and
   `from X import Y` statements.

2. Compare against an allowlist of "main-thread-safe" modules (stdlib +
   `imgui_bundle` + the lean gui_2 skeleton list from §2.1). Any
   non-allowlist import is a violation.

3. Exit non-zero with a clear message naming the file, line, and heavy module.

4. Run as part of CI (`uv run python scripts/audit_main_thread_imports.py`)
   and as a pre-commit hook.
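
Step 1 boils down to collecting the imports each file executes at import time. A minimal sketch of that core using the stdlib `ast` module; `top_level_imports` is a hypothetical helper, and the full script would still need to map `src.*` names back to files to walk the graph and then apply the allowlist. It descends into module-level `if`/`try`/`with` blocks and class bodies (all of which run at import time) but not into functions, and it skips `if TYPE_CHECKING:` blocks, which never run.

```python
import ast


def _is_type_checking_guard(node: ast.stmt) -> bool:
    """True for `if TYPE_CHECKING:` / `if typing.TYPE_CHECKING:` blocks."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    return (isinstance(test, ast.Name) and test.id == "TYPE_CHECKING") or (
        isinstance(test, ast.Attribute) and test.attr == "TYPE_CHECKING")


def top_level_imports(source: str, filename: str = "<src>") -> list[tuple[int, str]]:
    """Return (line, module) for every absolute import run when the file is imported."""
    found: list[tuple[int, str]] = []

    def visit(stmts: list[ast.stmt]) -> None:
        for node in stmts:
            if isinstance(node, ast.Import):
                found.extend((node.lineno, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                if node.level == 0 and node.module:  # relative imports skipped
                    found.append((node.lineno, node.module))
            elif isinstance(node, ast.If):
                if not _is_type_checking_guard(node):
                    visit(node.body)
                visit(node.orelse)  # the else branch of a TYPE_CHECKING guard does run
            elif isinstance(node, ast.Try):
                visit(node.body)
                for handler in node.handlers:
                    visit(handler.body)
                visit(node.orelse)
                visit(node.finalbody)
            elif isinstance(node, (ast.With, ast.ClassDef)):
                visit(node.body)  # class bodies execute at import time

    visit(ast.parse(source, filename=filename).body)
    return found
```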

#### Runtime: `tests/test_main_thread_purity.py` (TDD, empirical)

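1. Spawn `uv run python sloppy.py --headless --enable-test-hooks` as a
   subprocess through a small launcher that installs a `sys.addaudithook`
   callback inside the child before `sloppy.py` runs (audit hooks are
   per-process and cannot be removed once added). The hook logs every
   `import` event together with the importing thread.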

2. Wait for the headless server to be ready (or 5s timeout).

3. Read the audit log. Assert: every `import` event with
   `threading.current_thread() is threading.main_thread()` was for a module in
   the allowlist.

4. Kill the subprocess.

This is the empirical enforcement: it proves the invariant holds at runtime,
not just at static analysis time.
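
A sketch of the recording side, assuming it runs inside the child process (for example from a hypothetical launcher that installs the hook and then starts the app with `runpy.run_path("sloppy.py", run_name="__main__")`). CPython's `import` audit event carries the module name as its first argument; in our reading of CPython it is raised when an `import` has to load a module that is not already in `sys.modules`, so cache hits do not appear in the log. `install_import_recorder` and `main_thread_violations` are hypothetical names, and `is_allowed` is whatever allowlist check the test uses.

```python
import sys
import threading


def install_import_recorder() -> list[tuple[str, bool]]:
    """Install an audit hook logging (module, imported_on_main_thread).

    Must run in the process being audited, before the app starts importing.
    Audit hooks cannot be removed, so this is meant for a dedicated process.
    """
    log: list[tuple[str, bool]] = []
    main = threading.main_thread()

    def hook(event: str, args: tuple) -> None:
        if event == "import":
            # args[0] is the name of the module being imported
            log.append((str(args[0]), threading.current_thread() is main))

    sys.addaudithook(hook)
    return log


def main_thread_violations(log: list[tuple[str, bool]], is_allowed) -> list[str]:
    """Sorted, de-duplicated modules imported on the main thread but not allowed."""
    return sorted({mod for mod, on_main in log if on_main and not is_allowed(mod)})
```

The test then reads the log back (for example written to a file at exit) and asserts that `main_thread_violations(log, is_allowed)` is empty.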

---

## 3. Architectural Changes

### 3.1 Per-file import plan

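For each source file reachable from the main thread's import chain, we
**remove top-level heavy imports** and have functions access them via
`_require_warmed(name)`. The warmup jobs (§3.2) put the modules in
`sys.modules`, normally before any function is called. Main-thread code that
can run before warmup finishes (a first-frame render path, for instance) must
not call `_require_warmed` unconditionally: it checks the `completed` list from
`controller.warmup_status()` and falls back until the module is ready, because
the module may still be missing or mid-import on a worker.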

#### `src/ai_client.py` (the biggest win: ~1800ms)

Top-level today: `from google import genai`, `import anthropic`, `import openai`,
`import requests` (used by deepseek/minimax).

After:
- **Drop all four heavy imports from the top.** Add `_require_warmed(name)`
  helper at the top.
- `_send_gemini()` calls `_require_warmed("google.genai")` to get the module
- `_send_anthropic()` calls `_require_warmed("anthropic")`
- `_send_deepseek()` and `_send_minimax()` call `_require_warmed("openai")` and `_require_warmed("requests")`
- Provider client objects (`_gemini_client`, `_anthropic_client`, etc.) stay
  as module globals but are now `None` until `_send_*` initializes them
  (extracted from current top-level logic into a new
  `_ensure_<provider>_client()` that uses the warmed module)
- The warmup list in `AppController._compute_warmup_list()` includes
  `google.genai`, `anthropic`, `openai`, `requests` (always warmed)

**Result:** ~1800ms off the main thread. The bg threads pay this cost during
startup. In the normal case, by the time the first AI call happens (always on
the asyncio thread, never on main), warmup has finished, the modules are in
`sys.modules`, and the lookup is
instant. No user-perceptible lag.

#### `src/api_hooks.py` (FastAPI in headless/web only)

Top-level today: `from fastapi import ...`, `from fastapi.security.api_key import ...`
(only needed if `--enable-test-hooks` or `--web-host`).

After:
- **Drop these from top.** Add `_require_warmed(name)` calls inside the
  methods that need them.
- The warmup list in `AppController._compute_warmup_list()` includes
  `fastapi`, `fastapi.security.api_key` **conditionally** — only when
  `enable_test_hooks` or `web_host` is set

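**Result:** ~470ms off the main thread for non-test, non-web launches.
For `live_gui` tests (`--enable-test-hooks`), the warmup loads fastapi
during the same startup window. The `HookServer` thread must not build its app
until the `fastapi` warmup job has completed (otherwise `_require_warmed`
raises); since it is not the GUI thread it can afford to block on that, so
`wait_for_server` sees a fully started server rather than a half-initialized one.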

#### `src/commands.py` (command palette warmup-aware)

Top-level today: `from src.command_palette import ...` at `src/commands.py:1`.

After:
- **Drop the top-level import.** The command functions call
  `_require_warmed("src.command_palette")` to access the module
- The warmup list includes `src.command_palette`

**Result:** ~244ms off the main thread's import chain. The bg thread
warms it during startup; the first `Ctrl+Shift+P` is instant.

#### `src/theme_2.py` (NERV theme warmup-aware)

Top-level today: `from src.theme_nerv import ...`, `from src.theme_nerv_fx import ...`
at the top of `src/theme_2.py`.

After:
- **Drop the top-level imports.** `apply_nerv_theme()` (or the function
  that activates NERV) calls `_require_warmed("src.theme_nerv")` and
  `_require_warmed("src.theme_nerv_fx")`
- The warmup list includes both NERV modules

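**Result:** ~485ms off the main thread's import chain (the default
non-NERV path is lean). User pays the cost during startup; theme switch
is instant when they pick NERV. If NERV is the persisted theme at launch, the
first frames run before warmup finishes, so that path must fall back to the
default theme until both NERV modules are warmed (or treat NERV as a
first-frame import).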

#### `src/markdown_helper.py` (markdown table warmup-aware)

Top-level today: `from src.markdown_table import ...` at `src/markdown_helper.py:1`.

After:
- **Drop the top-level import.** The table-detection branch of `render()`
  calls `_require_warmed("src.markdown_table")`
- The warmup list includes `src.markdown_table`

**Result:** ~250ms off the main thread's import chain. First markdown
table render is instant.

#### `src/imgui_scopes.py`, `src/gui_2.py`, `src/bg_shader.py` (KEEP `imgui_bundle`)

These MUST keep `import imgui_bundle` at top — the ImGui render loop is the
hot path and needs the module on first frame. There is no way to defer
this without breaking the render loop.

What CAN be deferred inside `src/gui_2.py`:
- `import numpy` (only needed for `bg_shader`; the GUI itself doesn't
  need numpy on the first frame) — move to `_require_warmed("numpy")` in
  the bg shader call site, add `numpy` to the warmup list
- Other feature-gated imports — same pattern

#### `src/gui_2.py` direct heavy imports (audit)

We will use AST to audit which `import X` statements at `src/gui_2.py`
top-level are reachable from the first-frame render path
(`render_main_window`, `render_main_menu_bar`, etc.) and which are
feature-gated. First-frame imports stay top-level. Feature-gated ones
move to `_require_warmed(...)` calls at the use site, with the module
added to the warmup list.

### 3.2 Job pool + warmup scaffolding

New code in `src/app_controller.py`:

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
import importlib
import threading

# In AppController.__init__, after the asyncio loop starts:
self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)

# Warmup state
self._warmup_lock = threading.Lock()
self._warmup_done_event = threading.Event()
self._warmup_status: dict[str, list[str]] = {
    "pending": [], "completed": [], "failed": [],
}
self._warmup_callbacks: list[Callable] = []
self._submit_warmup_jobs()
```

`_submit_warmup_jobs()` computes the warmup list and submits one job per
module to the pool:

```python
def _submit_warmup_jobs(self) -> None:
    heavy = self._compute_warmup_list()
    with self._warmup_lock:
        self._warmup_status["pending"] = list(heavy)
        self._warmup_status["completed"] = []
        self._warmup_status["failed"] = []
        self._warmup_done_event.clear()
        if not heavy:
            # Nothing to warm: mark done now, or wait_for_warmup() never returns
            self._warmup_done_event.set()
    for name in heavy:
        self._io_pool.submit(self._warmup_one, name)

def _compute_warmup_list(self) -> list[str]:
    result = [
        "google.genai", "anthropic", "openai", "requests",
        "src.command_palette",
        "src.theme_nerv", "src.theme_nerv_fx",
        "src.markdown_table",
        "numpy",  # used by bg_shader; warmed for first invocation
    ]
    if self._enable_test_hooks or self._web_host:
        result.extend(["fastapi", "fastapi.security.api_key"])
    return result
```

Each warmup worker imports the module, updates the status, and on the
last one fires the completion callbacks (so the GUI status indicator and
toast notification can react):

```python
def _warmup_one(self, name: str) -> None:
    try:
        importlib.import_module(name)
        outcome = "completed"
    except Exception:
        # e.g. an optional SDK that is not installed: recorded, not raised
        outcome = "failed"
    with self._warmup_lock:
        self._warmup_status["pending"].remove(name)
        self._warmup_status[outcome].append(name)
        done = not self._warmup_status["pending"]
        if done:
            self._warmup_done_event.set()
        cbs = list(self._warmup_callbacks) if done else []
        # Copy the lists too, under the lock; dict(...) alone would share them
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    # Callbacks run outside the lock, on this pool worker (not the GUI thread)
    for cb in cbs:
        try:
            cb(snap)
        except Exception:
            pass
```

Public API on `AppController`:

```python
def warmup_status(self) -> dict[str, list[str]]:
    """Snapshot the current warmup state. Cheap (lock-guarded copy)."""
    with self._warmup_lock:
        return {k: list(v) for k, v in self._warmup_status.items()}

def is_warmup_done(self) -> bool:
    return self._warmup_done_event.is_set()

def wait_for_warmup(self, timeout: float | None = None) -> bool:
    """Block until warmup completes. Returns True on done, False on timeout."""
    return self._warmup_done_event.wait(timeout=timeout)

def on_warmup_complete(self, callback: Callable[[dict], None]) -> None:
    """Register a callback for warmup completion. If already done, fires immediately."""
    # Check-and-register under one lock hold: otherwise warmup could finish
    # between the check and the append, and the callback would never fire.
    with self._warmup_lock:
        if not self._warmup_done_event.is_set():
            self._warmup_callbacks.append(callback)
            return
        snap = {k: list(v) for k, v in self._warmup_status.items()}
    callback(snap)  # already done: fire now, outside the lock
```

Hook API endpoints (added in `src/api_hooks.py`):

- `GET /api/warmup_status` → `controller.warmup_status()`
- `GET /api/warmup_wait?timeout=N` → blocks until done, returns final status
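
A framework-agnostic sketch of the `warmup_wait` handler, assuming the controller API above; the cap `MAX_WAIT_S` and the extra `done` flag in the response are assumptions, not part of the spec. If the endpoint is served by FastAPI, declaring it with a plain `def` (not `async def`) matters: FastAPI runs sync path functions in a worker threadpool, whereas a blocking wait inside `async def` would stall the server's event loop.

```python
MAX_WAIT_S = 30.0  # hypothetical cap so one request cannot pin a server thread indefinitely


def handle_warmup_wait(controller, timeout: float) -> dict:
    """Block up to `timeout` seconds for warmup, then report the final state."""
    timeout = max(0.0, min(float(timeout), MAX_WAIT_S))
    done = controller.wait_for_warmup(timeout=timeout)
    # Same shape as warmup_status(), plus a "done" flag (an assumption)
    return {"done": done, **controller.warmup_status()}
```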

GUI integration (in `src/gui_2.py`):

- Status bar: "Warming up... (5/8)" while in flight, "All imports ready" + green dot when done. Polled once per frame from `controller.warmup_status()` (cheap, ~microseconds).
- On transition to done: show a toast notification "All providers ready (8 modules)" for 5 seconds.
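
The per-frame label and the one-shot toast can both be derived from the polled snapshot. A minimal sketch, assuming the `warmup_status()` shape from §3.2; `warmup_label` and `WarmupToastTrigger` are hypothetical names, the ImGui drawing calls are left out, and the wording for the partial-failure case is an assumption. Deriving the toast from the per-frame poll also keeps it on the GUI thread: an `on_warmup_complete` callback runs on a pool worker, so it could only record the result for the frame loop to pick up.

```python
def warmup_label(status: dict[str, list[str]]) -> str:
    """Status-bar text for one frame, from a warmup_status() snapshot."""
    finished = len(status["completed"]) + len(status["failed"])
    total = finished + len(status["pending"])
    if status["pending"]:
        return f"Warming up... ({finished}/{total})"
    if status["failed"]:
        return f"Imports ready ({len(status['failed'])} failed)"
    return "All imports ready"


class WarmupToastTrigger:
    """Reports True exactly once: on the first frame that sees warmup done."""

    def __init__(self) -> None:
        self._fired = False

    def update(self, status: dict[str, list[str]]) -> bool:
        if self._fired or status["pending"]:
            return False
        self._fired = True
        return True
```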

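In `AppController.shutdown()` (or wherever lifecycle cleanup lives):
`self._io_pool.shutdown(wait=False, cancel_futures=True)`. This returns immediately
and drops queued jobs that have not started (`cancel_futures` needs Python 3.9+).
It does not kill a job that is already running: since Python 3.9 the pool's workers
are not daemon threads, so an in-progress import finishes and the interpreter joins
the workers at exit.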

### 3.3 Startup timing instrumentation

Add `src/startup_profiler.py`:

```python
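import time
from collections.abc import Iterator
from contextlib import contextmanager


class StartupProfiler:
    """Records wall-clock time spent in each named init phase.

    Cheap (no I/O). Stored on AppController.startup_profile for later inspection
    via the Hook API (`GET /api/startup_profile`) and the Diagnostics panel.
    """

    def __init__(self) -> None:
        # (name, start perf_counter seconds, duration_ms)
        self._phases: list[tuple[str, float, float]] = []

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the phase raises, so a failing init step still shows its cost
            self._phases.append((name, t0, (time.perf_counter() - t0) * 1000))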
```

Used at every major init step in `AppController.__init__` and `App.__init__`.

---

## 4. Phases

### Phase 1: Audit + Benchmark + Foundation (Day 1)
- T1.1: Run `scripts/benchmark_imports.py` and capture baseline
- T1.2: AST-audit every `import X` in `src/*.py` to map which is reachable
  from the first-frame render path vs feature-gated
- T1.3: Add `StartupProfiler` to `src/app_controller.py` and instrument
  current init
- T1.4: Add `scripts/audit_main_thread_imports.py` (static gate)
- T1.5: Commit baseline + audit script

### Phase 2: Job Pool + Warmup Foundation (Day 1)
- T2.1 (TDD Red): `tests/test_app_controller_io_pool.py` — assert
  `AppController` has a 4-worker `_io_pool` whose threads are named `controller-io_*`
  (`ThreadPoolExecutor` names its workers `<prefix>_<n>`)
- T2.2 (Green): Add `_io_pool` to `AppController.__init__` with named threads
- T2.3 (TDD Red): `tests/test_warmup_mechanism.py` — assert warmup jobs are
  submitted in `__init__`, complete within 10s, fire the done event, support
  callbacks, don't block init
- T2.4 (Green): Implement `_submit_warmup_jobs()`, `_compute_warmup_list()`,
  `_warmup_one()`, `warmup_status()`, `is_warmup_done()`, `wait_for_warmup()`,
  `on_warmup_complete()` per spec §3.2
- T2.5: Run T2.1 + T2.3 tests, confirm PASS
- T2.6: Commit

### Phase 3: Remove top-level heavy SDK imports from `src/ai_client.py` (Day 2)
- T3.1 (TDD Red): `tests/test_ai_client_no_top_level_sdk_imports.py` — assert
  `import src.ai_client` does NOT load `google.genai` / `anthropic` / `openai` /
  `requests` (warmup hasn't run in the subprocess)
- T3.2 (Green): Remove the four heavy imports from the top of `ai_client.py`.
  Add `_require_warmed(name)` helper. Each `_send_*` uses
  `_require_warmed("google.genai")` etc.
- T3.3: Run existing `tests/test_ai_client.py`; fix any breakage (tests
  relying on top-level import side effects need a fixture that warms or a
  fallback for test mode)
- T3.4: Confirm T3.1 tests PASS
- T3.5: Commit

### Phase 4: Remove top-level FastAPI imports from `src/api_hooks.py` (Day 2)
- T4.1 (TDD Red): `tests/test_hook_server_no_top_level_fastapi.py` — assert
  `from src.api_hooks import HookServer` does NOT import fastapi
- T4.2 (Green): Remove the fastapi imports from top. Use `_require_warmed`
  inside the methods that need them
- T4.3: Run existing `tests/test_api_hooks.py`; fix
- T4.4: Commit

### Phase 5: Remove top-level imports for feature-gated GUI modules (Day 3)
- T5A: Command Palette — `tests/test_command_palette_no_top_level_import.py`
  + remove from `src/commands.py` + use `_require_warmed("src.command_palette")`
- T5B: NERV Theme — `tests/test_theme_nerv_no_top_level_import.py` + remove
  from `src/theme_2.py` + use `_require_warmed("src.theme_nerv")` etc.
- T5C: Markdown Table — `tests/test_markdown_helper_no_top_level_import.py` +
  remove from `src/markdown_helper.py` + use `_require_warmed("src.markdown_table")`
- T5D: GUI feature-gated — audit `src/gui_2.py` via the T1.2 script, apply
  same pattern. `numpy` migrates to `_require_warmed` in `bg_shader` call site.
- T5E: Commit per module (4 atomic commits)

### Phase 6: Migrate ad-hoc threads to `_io_pool` (Day 4)
- T6.1: Audit: `grep -rn "threading.Thread(" src/` to find all ad-hoc
  thread spawns (excluding `HookServer` and `WorkerPool` which are domain-specific)
- T6.2: Refactor each ad-hoc thread to use `controller.submit_io(fn)` instead
- T6.3: Per-migration commit
- T6.4: Final `grep -rn "threading.Thread(" src/` shows ZERO new spawns

### Phase 7: Warmup Notification (Hook API + GUI) (Day 4)
- T7A.1 (TDD Red): `tests/test_api_hooks_warmup.py` — assert
  `GET /api/warmup_status` and `GET /api/warmup_wait` work
- T7A.2 (Green): Add the two endpoints in `src/api_hooks.py` and register
  `warmup_status` in `_gettable_fields`
- T7B.1: In `src/gui_2.py`, add a status-bar indicator that polls
  `controller.warmup_status()` each frame: "Warming up... (N/M)" while
  pending, "All imports ready" with green dot on completion
- T7B.2: Register a callback via `controller.on_warmup_complete(cb)` that
  shows a toast "All providers ready (M modules)" on success
- T7B.3: Update docs (status bar, toast, hook API)
- T7B.4: Commit

### Phase 8: Enforcement — Runtime Audit Hook (Day 4)
- T8.1 (TDD Red): `tests/test_main_thread_purity.py` — spawn `sloppy.py
  --headless --enable-test-hooks` with a `sys.addaudithook` shim, verify no
  heavy import happens on the main thread
- T8.2: Once Phase 3-5 land, this test should start passing. Wire into CI
  as a gating test (`@pytest.mark.slow`).
- T8.3: Commit

### Phase 9: Verify + Checkpoint (Day 5)
- T9.1: Re-run `scripts/benchmark_imports.py --runs=3`; confirm
  `import src.ai_client` < 50ms, `import src.gui_2` < 500ms,
  `import src.app_controller` < 300ms
- T9.2: Re-run `scripts/audit_main_thread_imports.py`; exit 0
- T9.3: Run `tests/test_warmup_mechanism.py`; warmup completes and notifications fire
- T9.4: Run `tests/test_main_thread_purity.py`; pass
- T9.5: Run full `live_gui` test batch; `wait_for_server(timeout=15)` no
  longer times out. Tests can call `controller.wait_for_warmup()` before
  exercising warmup-dependent functionality.
- T9.6: Manual smoke:
  - `uv run sloppy.py`: time-to-first-frame < 1.5s, observe status indicator
    "Warming up... (N/M)" → "All imports ready" + toast
  - `uv run sloppy.py --enable-test-hooks`: same, plus `/api/warmup_status`
    returns `completed` after a brief wait
  - `uv run sloppy.py --headless`: time-to-server-ready
  - **Provider switch test**: switch from MiniMax to Gemini in the GUI
    after warmup. The action must be INSTANT, not 1s-delayed (proves
    warmup did its job)
- T9.7: Phase checkpoint commit + git note with full verification report
- T9.8: Update `conductor/tracks.md`; archive track

---

## 5. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lazy import inside a hot path adds latency on every call | Med | Med | Always gate the import with `sys.modules` check OR use module-level sentinel |
| First AI call on the asyncio thread arrives while `google.genai` (~955ms import) is still warming | Low | Low | Happens on the asyncio worker, not main, and only in the window right after launch. Callers that can run that early wait for warmup first (`wait_for_warmup()`, or polling `is_warmup_done()` from a coroutine). |
| Moving an import off the top level surfaces a circular import that top-level ordering was hiding | Med | Med | Phase 1 audit catches this; each migrated module gets its own no-top-level-import test (Phases 3-5) |
| Test fixtures import the heavy module before main code, breaking assumptions | Low | Low | `reset_ai_client` and `isolate_workspace` fixtures already lazy-reset |
| Hot reload of a now-lazy module doesn't trigger | Low | Med | Update `HotReloader.HOT_MODULES` to register the lazy module's gate function |
| `_io_pool` worker importing a heavy module holds GIL and stutters GUI | Med | Low | The pool is capped at 4 threads; stutter is bounded; user sees responsive UI before any stutter |
| A future commit re-introduces a heavy import on the main thread | Med | High | Static gate (`audit_main_thread_imports.py`, CI) + runtime audit hook (`test_main_thread_purity.py`) catch this |

### Hot Reload consideration

`src/hot_reloader.py` registers modules at import time. Lazy-loaded modules
(imported inside functions) are NOT registered. The hot-reload workflow needs:
- Either: register the lazy module with a callback that forces a re-import via
  `importlib.reload`
- Or: explicitly trigger the lazy import on hot-reload trigger

This is a small follow-up task; the lazy import itself doesn't break hot reload
(it just means you have to invoke the gate function once to materialize the
module before reload can take effect).

---

## 6. Verification Criteria

The track is complete when:

- [ ] `import src.ai_client` cold start < 50ms (down from ~1800ms)
- [ ] `import src.gui_2` cold start < 500ms (down from ~3000ms)
- [ ] `import src.app_controller` cold start < 300ms (down from ~700ms)
- [ ] `uv run sloppy.py --enable-test-hooks` reaches `immapp.run()` in < 1.5s
- [ ] `live_gui.wait_for_server(timeout=15)` passes for all 273+ tests
- [ ] `scripts/audit_main_thread_imports.py` exits 0 (no heavy imports on main)
- [ ] `tests/test_main_thread_purity.py` passes (runtime audit hook confirms invariant)
- [ ] `scripts/benchmark_imports.py` shows no new red entries in the top-20
- [ ] **`controller.wait_for_warmup(timeout=10.0)` returns True** — warmup completed
      within 10s of `AppController.__init__`
- [ ] **All modules in the warmup list are in `sys.modules` after warmup** —
      `controller.warmup_status()['pending']` is empty, `'completed'` contains
      all expected module names
- [ ] **User-triggered actions on warmed modules are instant** — manual test
      switching providers (e.g. MiniMax → Gemini) after warmup completes shows
      NO perceptible lag (was ~1s with lazy-loading)
- [ ] **GUI status indicator transitions** — observe "Warming up... (N/M)" in
      the status bar, then "All imports ready" with green dot, then a toast
      notification fires via `controller.on_warmup_complete(...)`
- [ ] **Hook API exposes warmup state** — `GET /api/warmup_status` returns
      `{pending: [], completed: [...], failed: []}`; `GET /api/warmup_wait?timeout=10`
      returns the final state
- [ ] **NO `import X` or `from X import ...` statements for heavy modules** (neither
      top-level nor in function bodies), verified by `grep -rnE "^\s*(import|from) (google|anthropic|openai|fastapi|src\.command_palette|src\.theme_nerv|src\.markdown_table)" src/`
- [ ] No regressions in the existing 272/273 passing tests
- [ ] `grep -rn "threading.Thread(" src/` shows ZERO new spawns after Phase 6
      migration (only the existing project scaffolding threads like `HookServer`
      and `WorkerPool` remain, and they're domain-specific)
- [ ] Startup profile + io_pool status visible in `/api/startup_profile`,
      `/api/io_pool_status`, and the Diagnostics panel

---

## 7. Out of Scope

- Process-isolation of heavy SDKs (Layer 4 in §2.2) — future track
- `imgui_bundle` lazy loading — fundamentally impossible (ImGui hot path)
- Importing on the main thread for the lean `gui_2` skeleton (~300ms unavoidable)
- `pydantic` lazy loading (used by `src/models.py` which is imported by 16 files;
  the cost is already amortized and deferring it would cascade)
- Lazy-loading heavy modules with `import X` inside function bodies (the alternative
  to Layer 1's `_require_warmed` discussed in §2.2; explicitly rejected by the user,
  since warmup is the only loading mechanism)

---

## 8. Cross-References

- `conductor/tracks.md` line 152 — original backlog entry that this track fulfills
- `docs/guide_architecture.md:43-67` — thread domains (asyncio worker is the right
  place for heavy work)
- `docs/guide_architecture.md:880-898` — Architectural Invariants (single-writer
  principle; this track respects it)
- `docs/guide_app_controller.md:241-271` — existing `get_rag_engine` /
  `get_mma_conductor` lazy patterns (the templates this track replicates)
- `docs/guide_hot_reload.md:295-312` — what is/isn't safe to hot-reload
  (lazy-loaded modules need a small follow-up)
- `conductor/workflow.md` — TDD Red-Green-Refactor protocol + atomic per-task
  commits + git notes
- `scripts/benchmark_imports.py`: the measurement tool that produced the §1.1 baseline