docs(ai_client+api_hooks): lazy-loading + warmup endpoints (startup_speedup)

guide_ai_client.md:
- Add 'Module-Level Imports' section explaining that the 5 provider SDKs are NOT imported at module level; they're obtained via src.module_loader._require_warmed() after the WarmupManager loads them in the background. (Per startup_speedup_20260606: import src.ai_client went from ~1800ms to ~161ms.)

guide_api_hooks.md:
- Add 4 warmup endpoints to the endpoints table: /api/warmup_status, /api/warmup_wait?timeout=N, /api/warmup_canaries, /api/startup_timeline
- Add 'Warmup API' section with client methods + external script pattern (use get_warmup_wait() instead of time.sleep() race)
2026-06-10 20:00:37 -04:00
parent ca48d33d16
commit 07c1ed4928
2 changed files with 26 additions and 1 deletions
@@ -12,6 +12,12 @@ The module is a **stateful singleton** — all provider state is held in module-

 ---

+## Module-Level Imports
+
+> **Important:** The 5 provider SDKs are **NOT** imported at module level. Heavy imports such as `google.genai`, `anthropic`, `openai`, and `fastapi` cost roughly 430-955 ms each on a cold load, so they are obtained through calls like `src.module_loader._require_warmed("google.genai")` after the `WarmupManager` has loaded them in the background. The module-level globals listed in the State section (`_gemini_client`, `_anthropic_client`, etc.) are typed as `Optional` because they are populated via `_require_warmed()` on first use, not at import time.
+
+This change was part of the 2026-06-06 `startup_speedup_20260606` track. Before: `import src.ai_client` took ~1800ms. After: ~161ms. The remaining cost is the bare module skeleton.
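
The lazy-loading idea described above can be illustrated with a minimal sketch. This is not the project's `WarmupManager` or `_require_warmed()`; the class `WarmupSketch` and its method names are hypothetical, and standard-library modules stand in for the provider SDKs so the example runs anywhere.

```python
import importlib
import threading
from types import ModuleType
from typing import Dict, Iterable, Optional


class WarmupSketch:
    """Hypothetical stand-in: imports modules on a background thread."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = list(names)
        self._loaded: Dict[str, ModuleType] = {}
        self._failed: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._done = threading.Event()

    def start(self) -> None:
        # Daemon thread so a slow import never blocks interpreter exit.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        for name in self._names:
            try:
                module = importlib.import_module(name)
                with self._lock:
                    self._loaded[name] = module
            except Exception as exc:  # record the failure, keep warming the rest
                with self._lock:
                    self._failed[name] = repr(exc)
        self._done.set()

    def require_warmed(self, name: str, timeout: Optional[float] = None) -> ModuleType:
        # Block until the background pass has finished, then hand out the module.
        if not self._done.wait(timeout):
            raise TimeoutError(f"warmup did not finish before requesting {name!r}")
        with self._lock:
            if name in self._loaded:
                return self._loaded[name]
            reason = self._failed.get(name, "module was never registered for warmup")
        raise ImportError(f"{name!r} is not available: {reason}")


manager = WarmupSketch(["json", "no_such_module_for_this_sketch"])
manager.start()
json_module = manager.require_warmed("json", timeout=5.0)
```

The importing module stays cheap because nothing heavy is touched until a caller asks for it, and a failed import surfaces at the point of use rather than at startup.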
+
 ## Architecture

 ```
@@ -76,8 +76,27 @@ The server runs in a daemon thread. It stops when the process exits (or via `ser
 | `GET` | `/api/performance` | Performance metrics (FPS, frame time) |
 | `GET` | `/api/comms` | Communication log |
 | `GET` | `/api/diagnostics` | Diagnostics state |
+| `GET` | `/api/warmup_status` | Warmup progress snapshot: `{pending, completed, failed}` module lists |
+| `GET` | `/api/warmup_wait?timeout=N` | Server-side blocking wait for warmup completion (up to N seconds) |
+| `GET` | `/api/warmup_canaries` | Per-module import timing records (canary_id, module, thread, elapsed_ms, status) |
+| `GET` | `/api/startup_timeline` | Startup phase breakdown: init_start_ts, warmup_done_ts, first_frame_ts, warmup_ms, first_frame_after_init_ms, first_frame_after_warmup_ms |

-(Full endpoint list may grow; check the live server for the canonical list.)
+### Warmup API
+
+The 4 warmup endpoints (added in `startup_speedup_20260606`) let external clients (test harnesses, scripts, the Command Palette's "Restart" actions) answer two questions:
+1. **Is the app ready?** `get_warmup_status()` returns the current `{pending, completed, failed}` module lists. `is_warmup_done()` (via `wait_for_warmup`) blocks until all modules are done; from a client, the blocking call is `get_warmup_wait()`.
+2. **Did the warmup block the first frame?** — `get_startup_timeline()` returns the 3 phase breakdowns (AppController init, GUI bundle setup, first render) plus the critical gap between warmup completion and first frame paint.
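
As a rough illustration of the first question, the sketch below turns a status snapshot into a one-line summary. It assumes only the `{pending, completed, failed}` lists described above; the helper name `summarize_warmup` and the state labels are hypothetical and not part of `ApiHookClient`.

```python
from typing import Dict, List


def summarize_warmup(status: Dict[str, List[str]]) -> str:
    """Summarize a warmup snapshot (hypothetical helper, not a project API)."""
    pending = status.get("pending", [])
    completed = status.get("completed", [])
    failed = status.get("failed", [])
    total = len(pending) + len(completed) + len(failed)
    if failed:
        state = "failed"   # at least one module could not be imported
    elif pending:
        state = "warming"  # background imports still in progress
    else:
        state = "ready"    # nothing pending, nothing failed
    return f"{state}: {len(completed)}/{total} loaded, {len(failed)} failed"


snapshot = {"pending": ["anthropic"], "completed": ["openai"], "failed": []}
summary = summarize_warmup(snapshot)
```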
+
+**Client methods** in `ApiHookClient` (`src/api_hook_client.py:312-348`):
+
+| Method | Endpoint | Purpose |
+|---|---|---|
+| `get_warmup_status()` | `GET /api/warmup_status` | Returns `{pending, completed, failed}` |
+| `get_warmup_wait(timeout=30.0)` | `GET /api/warmup_wait?timeout=N` | Server-side blocking wait |
+| `get_warmup_canaries()` | `GET /api/warmup_canaries` | Per-module import timing |
+| `get_startup_timeline()` | `GET /api/startup_timeline` | Phase breakdown dict |
+
+**External script pattern:** A test or script that needs the app fully ready should call `client.get_warmup_wait(timeout=60)` before any other API call. This replaces the old `time.sleep(N)` race-condition pattern. See `tests/test_api_hooks_warmup.py` for usage examples.
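
For scripts that cannot import `ApiHookClient`, the same pattern can be sketched over plain HTTP. This is an assumption-laden example: the function name `block_until_warmed` is hypothetical, the base URL must be supplied by the caller, and the shape of the JSON response is not specified here, so the function simply returns whatever the server sends.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def block_until_warmed(base_url: str, timeout: float = 60.0) -> Any:
    """Call GET /api/warmup_wait?timeout=N and return the decoded JSON body.

    Hypothetical helper: the response schema is not assumed, only that it is JSON.
    """
    query = urllib.parse.urlencode({"timeout": timeout})
    url = f"{base_url.rstrip('/')}/api/warmup_wait?{query}"
    # Give the HTTP request slightly longer than the server-side wait,
    # so the server's own timeout fires first.
    with urllib.request.urlopen(url, timeout=timeout + 5.0) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling this once at the top of a script replaces a fixed `time.sleep(N)`: the script proceeds as soon as the server reports that warmup has finished, or when the server-side timeout elapses.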

 ### Request/Response Format