conductor(track): create startup_speedup_20260606 track for sloppy.py startup latency
Fulfills the existing backlog entry at conductor/tracks.md:152 (2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).

Main Thread Purity Invariant: the main thread (entering immapp.run()) must never import a module heavier than imgui_bundle and the lean gui_2 skeleton. Enforced by:
- static gate: scripts/audit_main_thread_imports.py (CI)
- runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)

Threading constraint: no new threading.Thread(...) calls in src/. All background work goes through AppController._io_pool (ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').

9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime enforcement, hook API + diagnostics, verify+checkpoint.

Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures no longer time out at wait_for_server(timeout=15).
+3
-2
@@ -149,8 +149,9 @@ User review surfaced five outstanding UI issues, each previously attempted witho
|
||||
|
||||
## Remaining Backlog (Phases 3 & 4)
|
||||
|
||||
0. [ ] **Track: Sloppy.py Startup Speedup**
|
||||
*Status: 2026-06-05 — Surfaced during regression_fixes_20260605 root-cause analysis. `sloppy.py --enable-test-hooks` startup latency has crept up; live_gui fixtures time out at `wait_for_server(timeout=15)`. Hypothesized cause: too much init work on the main thread (FastAPI hook server bring-up, log pruner retry loops, MCT startup). Plan: profile startup, move heavy init off the main thread to the controller's background thread pool, defer non-critical subsystems to lazy-init on first use. Spec/plan to follow.*
|
||||
0. [~] **Track: Sloppy.py Startup Speedup**
|
||||
*Link: [./tracks/startup_speedup_20260606/](./tracks/startup_speedup_20260606/), Spec: [./tracks/startup_speedup_20260606/spec.md](./tracks/startup_speedup_20260606/spec.md), Plan: [./tracks/startup_speedup_20260606/plan.md](./tracks/startup_speedup_20260606/plan.md)*
|
||||
*Goal: Reduce `sloppy.py` startup time by ~2000-2400ms via (1) lazy-loading AI provider SDKs (`google.genai` 955ms, `anthropic` 430ms, `openai` 445ms) into the functions that use them, (2) lazy-loading `fastapi` in `HookServer` (~470ms), (3) lazy-loading feature-gated GUI modules (`command_palette` 244ms, `theme_nerv*` 485ms, `markdown_table` 250ms), (4) one shared `AppController._io_pool` for background work instead of ad-hoc threads, with no background prefetch of the heavy SDKs, (5) `StartupProfiler` + `/api/startup_profile` for measurement. Four-layer approach: lazy import in the called function (load-bearing), shared job pool, no prefetch of heavy SDKs, worker-process isolation (future). Target: `import src.ai_client` < 50ms (from ~1800ms), `import src.gui_2` < 500ms (from ~3000ms), `live_gui.wait_for_server(timeout=15)` no longer times out.*
|
||||
|
||||
0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
|
||||
*Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
|
||||
|
||||
@@ -0,0 +1,70 @@
{
  "track_id": "startup_speedup_20260606",
  "name": "Sloppy.py Startup Speedup",
  "initialized": "2026-06-06",
  "owner": "tier2-tech-lead",
  "priority": "high",
  "status": "active",
  "type": "refactor + performance",
  "scope": {
    "new_files": [
      "src/startup_profiler.py",
      "scripts/audit_main_thread_imports.py",
      "scripts/audit_gui2_imports.py",
      "tests/test_ai_client_lazy_imports.py",
      "tests/test_hook_server_lazy_fastapi.py",
      "tests/test_app_controller_io_pool.py",
      "tests/test_command_palette_lazy.py",
      "tests/test_theme_nerv_lazy.py",
      "tests/test_markdown_helper_lazy.py",
      "tests/test_main_thread_purity.py",
      "tests/test_startup_profiler.py",
      "tests/test_io_pool_endpoint.py"
    ],
    "modified_files": [
      "src/ai_client.py",
      "src/api_hooks.py",
      "src/app_controller.py",
      "src/commands.py",
      "src/command_palette.py",
      "src/theme_2.py",
      "src/theme_nerv.py",
      "src/theme_nerv_fx.py",
      "src/markdown_helper.py",
      "src/markdown_table.py",
      "src/gui_2.py",
      "src/log_pruner.py",
      "src/project_manager.py"
    ]
  },
  "blocked_by": [],
  "blocks": [],
  "estimated_phases": 9,
  "spec": "spec.md",
  "plan": "plan.md",
  "architectural_invariant": "The main thread (the one that enters immapp.run()) must NEVER import a module heavier than imgui_bundle and the lean gui_2 skeleton. Enforced by scripts/audit_main_thread_imports.py (static CI gate) and tests/test_main_thread_purity.py (runtime audit-hook test).",
  "threading_constraint": "NO new threading.Thread(...) calls in src/. All background work must go through AppController._io_pool (ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').",
  "verification_criteria": [
    "import src.ai_client < 50ms cold start (from ~1800ms)",
    "import src.gui_2 < 500ms cold start (from ~3000ms)",
    "import src.app_controller < 300ms cold start (from ~700ms)",
    "uv run sloppy.py --enable-test-hooks reaches immapp.run() in < 1.5s",
    "live_gui.wait_for_server(timeout=15) passes for all tests",
    "scripts/audit_main_thread_imports.py exits 0 (no main-thread heavy imports)",
    "tests/test_main_thread_purity.py passes (runtime audit hook confirms invariant)",
    "No regressions in 273+ existing tests",
    "ZERO new threading.Thread(...) calls in src/ (after Phase 6 migration)",
    "Startup profile + io_pool status visible via /api/startup_profile and /api/io_pool_status"
  ],
  "links": {
    "backlog_entry": "conductor/tracks.md:152",
    "benchmark_script": "scripts/benchmark_imports.py",
    "audit_script": "scripts/audit_main_thread_imports.py",
    "related_docs": [
      "docs/guide_architecture.md",
      "docs/guide_app_controller.md",
      "docs/guide_hot_reload.md",
      "docs/guide_testing.md"
    ]
  }
}
@@ -0,0 +1,232 @@
|
||||
# Plan: Sloppy.py Startup Speedup
|
||||
|
||||
**Track:** `startup_speedup_20260606`
|
||||
**Spec:** [./spec.md](./spec.md)
|
||||
**Status:** In progress
|
||||
**Started:** 2026-06-06
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Audit + Benchmark + Foundation
|
||||
|
||||
- [ ] **T1.1** Capture baseline with `scripts/benchmark_imports.py --runs=3 --color=never > docs/startup_baseline_20260606.txt`
|
||||
- [ ] **T1.2** Write `scripts/audit_gui2_imports.py` (AST walker): for each `import X` in `src/gui_2.py`, classify as `first-frame` (reachable from `main()` / `render_main_window` etc.) vs `feature-gated` (inside an `if/elif` branch that requires user action). Commit audit results to `docs/startup_audit_20260606.md`.
|
||||
- [ ] **T1.3** Add `src/startup_profiler.py` with `StartupProfiler` class (context manager `phase(name)`). Wire into `AppController.__init__` and `App.__init__` at 8 major init points. (No new test; verify via manual run + diagnostics panel.) `[T1.3]`
|
||||
- [ ] **T1.4** Write `scripts/audit_main_thread_imports.py` (static gate, fails CI). AST-walks the import graph reachable from `sloppy.py`, collects all top-level `import X` / `from X import Y`, compares against an allowlist. Exits non-zero with file:line:module on violation. Allowlist: `sys.stdlib_module_names` + the lean gui_2 skeleton list from `spec.md:2.1` (`imgui_bundle`, `defer`, `src.imgui_scopes`, `src.theme_2` (default theme only), `src.theme_models`, `src.paths`, `src.models`, `src.events`).
|
||||
- [ ] **T1.5** Commit baseline + audit script: `git add . && git commit -m "conductor(startup): baseline measurements + main thread import audit script"` + git note
|
||||
|
||||
**Phase 1 checkpoint:** Baseline established. Static gate exists. All three import classes (first-frame, feature-gated, background-safe) documented.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Job Pool Foundation (the "no new threads" rule)
|
||||
|
||||
The user constraint: no new `threading.Thread(...)` per task, per import, per
ad-hoc job. The codebase gets ONE shared `ThreadPoolExecutor` on `AppController`,
named `_io_pool`, used by any subsystem that needs background work.
|
||||
|
||||
- [ ] **T2.1 (Red)** `tests/test_app_controller_io_pool.py`:
|
||||
- `test_app_controller_has_io_pool`: instantiate `AppController`, assert `hasattr(controller, '_io_pool')` and it's a `ThreadPoolExecutor`
|
||||
- `test_io_pool_uses_named_threads`: submit a job, assert the executing thread name starts with `controller-io`
|
||||
- `test_io_pool_size_is_4`: assert `_io_pool._max_workers == 4`
|
||||
- `test_io_pool_shuts_down_on_close`: call `controller.shutdown()`, assert the pool is shut down
|
||||
- Confirm FAIL (no `_io_pool` yet)
|
||||
- [ ] **T2.2 (Green)** In `src/app_controller.py`:
|
||||
- Add `from concurrent.futures import ThreadPoolExecutor` at top
|
||||
- In `__init__`, after the asyncio loop starts and BEFORE the existing HookServer block: `self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")`
|
||||
- In `shutdown()` (already exists in `App.shutdown` for the GUI; ensure the AppController has a matching shutdown that calls `self._io_pool.shutdown(wait=False)`)
|
||||
- Add `controller.submit_io(fn, *args)` helper: `return self._io_pool.submit(fn, *args)` (with a docstring saying "use this instead of `threading.Thread` for new background work")
|
||||
- [ ] **T2.3** Run T2.1 tests; confirm PASS
|
||||
- [ ] **T2.4** Commit: `feat(app_controller): add shared _io_pool ThreadPoolExecutor` + git note
|
||||
|
||||
**Phase 2 checkpoint:** `AppController` owns a 4-thread named pool. `controller.submit_io(fn)` is the sanctioned way to do background work. Existing ad-hoc threads still exist (will be migrated in Phase 6).
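
As a rough illustration of what T2.1 and T2.2 ask for, the sketch below uses a stand-in `AppController` that contains only the pool-related pieces. The real class has many more responsibilities, and the `shutdown()` body shown here is an assumption based on the task description.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class AppController:
    """Stand-in showing only the shared job pool."""

    def __init__(self) -> None:
        self._io_pool = ThreadPoolExecutor(
            max_workers=4,
            thread_name_prefix="controller-io",
        )

    def submit_io(self, fn, *args) -> Future:
        """Use this instead of threading.Thread for new background work."""
        return self._io_pool.submit(fn, *args)

    def shutdown(self) -> None:
        # wait=False: do not block the caller on jobs that are still running.
        self._io_pool.shutdown(wait=False)


controller = AppController()
future = controller.submit_io(lambda: threading.current_thread().name)
worker_name = future.result(timeout=5)  # name of the pool thread that ran the job
controller.shutdown()
print(worker_name)
```

Once `shutdown()` has run, further `submit_io` calls raise `RuntimeError`, which is one way `test_io_pool_shuts_down_on_close` could observe the shutdown without reading private attributes.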
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Lazy-load AI Provider SDKs (TDD)
|
||||
|
||||
- [ ] **T3.1 (Red)** Write `tests/test_ai_client_lazy_imports.py`:
|
||||
- `test_ai_client_does_not_import_genai_at_module_level`: spawn fresh subprocess, `import src.ai_client`, assert `'google.genai' not in sys.modules` (or `google.genai` in modules but `_gemini_client` is `None`)
|
||||
- `test_ai_client_does_not_import_anthropic_at_module_level`
|
||||
- `test_ai_client_does_not_import_openai_at_module_level`
|
||||
- `test_ai_client_does_not_import_requests_at_module_level`
|
||||
- Confirm tests FAIL (proves the imports are currently eager)
|
||||
- [ ] **T3.2 (Green)** In `src/ai_client.py`:
|
||||
- Remove `from google import genai` from top
|
||||
- Remove `import anthropic` from top
|
||||
- Remove `import openai` from top
|
||||
- Remove `import requests` from top
|
||||
- Add lazy imports inside `_send_gemini`, `_send_anthropic`, `_send_deepseek`, `_send_minimax`
|
||||
- Provider client globals stay as `None` until first `_ensure_<provider>_client()` call
|
||||
- [ ] **T3.3** Run existing `tests/test_ai_client.py`; fix any breakage. Most likely issue: tests that rely on top-level import side effects need a fixture that triggers lazy init.
|
||||
- [ ] **T3.4** Re-run T3.1 tests, confirm PASS
|
||||
- [ ] **T3.5** Commit: `git commit -m "refactor(ai_client): lazy-load provider SDKs to defer ~1800ms off main thread"` + git note
|
||||
- [ ] **T3.6** Update `conductor/tracks.md` T3 row with SHA
|
||||
|
||||
**Phase 3 checkpoint:** `import src.ai_client` < 50ms cold. All 273 existing tests still pass.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Lazy-load FastAPI in HookServer (TDD)
|
||||
|
||||
- [ ] **T4.1 (Red)** Write `tests/test_hook_server_lazy_fastapi.py`:
|
||||
- `test_hook_server_does_not_import_fastapi_at_module_level`: subprocess test
|
||||
- `test_hook_server_does_not_import_fastapi_security_at_module_level`
|
||||
- Confirm FAIL
|
||||
- [ ] **T4.2 (Green)** In `src/api_hooks.py`:
|
||||
- Remove `from fastapi import ...` from top
|
||||
- Remove `from fastapi.security.api_key import APIKeyHeader` from top
|
||||
- Add lazy imports inside the methods that need them (FastAPI app construction, route registration)
|
||||
- [ ] **T4.3** Run existing `tests/test_api_hooks.py`; fix breakage
|
||||
- [ ] **T4.4** Confirm T4.1 tests PASS
|
||||
- [ ] **T4.5** Commit: `git commit -m "refactor(api_hooks): lazy-load fastapi to defer ~470ms off main thread"` + git note
|
||||
|
||||
**Phase 4 checkpoint:** `from src.api_hooks import HookServer` does not import fastapi.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Lazy-load Feature-gated GUI Modules (TDD per module)
|
||||
|
||||
### 5A: Command Palette
|
||||
|
||||
- [ ] **T5A.1 (Red)** `tests/test_command_palette_lazy.py`: `from src.commands import COMMANDS` (or whatever the eager import is) does not import `src.command_palette`. Confirm FAIL.
|
||||
- [ ] **T5A.2 (Green)** In `src/commands.py`: move `from src.command_palette import ...` inside the command functions that open the palette (`_open_command_palette`, `_toggle_command_palette`).
|
||||
- [ ] **T5A.3** Run `tests/test_command_palette.py`; fix.
|
||||
- [ ] **T5A.4** Commit: `refactor(commands): lazy-load command_palette to defer 244ms`
|
||||
|
||||
### 5B: NERV Theme
|
||||
|
||||
- [ ] **T5B.1 (Red)** `tests/test_theme_nerv_lazy.py`: `from src.theme_2 import *` (or whatever) does not import `src.theme_nerv` or `src.theme_nerv_fx`. Confirm FAIL.
|
||||
- [ ] **T5B.2 (Green)** In `src/theme_2.py`: move `from src.theme_nerv import ...` and `from src.theme_nerv_fx import ...` inside `apply_nerv_theme()` (or whichever function activates the theme).
|
||||
- [ ] **T5B.3** Run `tests/test_theme_2.py` and `tests/test_theme_nerv.py`; fix.
|
||||
- [ ] **T5B.4** Commit: `refactor(theme): lazy-load nerv theme to defer 485ms off non-nerv path`
|
||||
|
||||
### 5C: Markdown Table
|
||||
|
||||
- [ ] **T5C.1 (Red)** `tests/test_markdown_helper_lazy.py`: `from src.markdown_helper import MarkdownRenderer` does not import `src.markdown_table`. Confirm FAIL.
|
||||
- [ ] **T5C.2 (Green)** In `src/markdown_helper.py`: move `from src.markdown_table import ...` inside the table-detection branch of `render()`.
|
||||
- [ ] **T5C.3** Run `tests/test_markdown_helper.py`; fix.
|
||||
- [ ] **T5C.4** Commit: `refactor(markdown): lazy-load markdown_table to defer 250ms off non-table markdown`
|
||||
|
||||
### 5D: GUI module feature-gated imports
|
||||
|
||||
- [ ] **T5D.1** Run `scripts/audit_gui2_imports.py` (built in T1.2); collect list of feature-gated imports in `src/gui_2.py`
|
||||
- [ ] **T5D.2** For each feature-gated import, apply the same TDD pattern (5A-5C). Group into 1-2 atomic commits per logical feature.
|
||||
- [ ] **T5D.3** Run full GUI test suite; fix.
|
||||
- [ ] **T5D.4** Commit per feature group
|
||||
|
||||
**Phase 5 checkpoint:** Feature-gated imports are lazy. Default-theme / non-palette / non-table path is lean.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Migrate Ad-hoc Threads to `_io_pool`
|
||||
|
||||
The codebase has several ad-hoc `threading.Thread(...)` calls. Per the user
constraint, these should migrate to `controller.submit_io(fn)`. **This phase
audits and migrates them, but does NOT add new prefetch threads** (the heavy
SDKs are lazy-only per spec §2.2 Layer 3).
|
||||
|
||||
- [ ] **T6.1** Audit: `grep -rn "threading.Thread(" src/` to find all ad-hoc thread spawns. Document each in `state.toml` (a new `[ad_hoc_threads]` section).
|
||||
- [ ] **T6.2** For each ad-hoc thread in `src/log_pruner.py`, `src/project_manager.py`, etc., refactor to use `controller.submit_io(fn)` instead. Wrap the callable body in a try/except (the pool's default behavior is to surface exceptions via the Future; preserve existing error logging).
|
||||
- [ ] **T6.3** Run full test suite; fix.
|
||||
- [ ] **T6.4** Per-migration commit (or grouped by subsystem if 3+ threads in one file). Final commit: `refactor: migrate ad-hoc threads to AppController._io_pool` + git note.
|
||||
|
||||
**Phase 6 checkpoint:** `grep -rn "threading.Thread(" src/` shows ZERO new spawns after this phase (existing project scaffolding threads like `HookServer` and `MMA WorkerPool` are exempt — they're domain-specific).
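
A migration in the style of T6.2 could be sketched as follows. The module-level `_io_pool` and `submit_io` stand in for the controller's pool, `prune_logs` is a placeholder job, and the `logged` wrapper is a hypothetical helper. It shows one way to keep error logging while the exception still reaches the `Future`.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("controller-io")
_io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")


def submit_io(fn, *args):
    """Stand-in for controller.submit_io."""
    return _io_pool.submit(fn, *args)


def logged(fn):
    """Log failures on the worker thread, then re-raise into the Future."""
    def wrapper(*args):
        try:
            return fn(*args)
        except Exception:
            log.exception("background job %s failed", fn.__name__)
            raise
    return wrapper


def prune_logs(max_files: int) -> int:
    if max_files < 0:
        raise ValueError("max_files must be >= 0")
    return max_files  # placeholder for the real pruning work


# Before: threading.Thread(target=prune_logs, args=(10,), daemon=True).start()
# After:
ok = submit_io(logged(prune_logs), 10)
bad = submit_io(logged(prune_logs), -1)
print(ok.result(timeout=5), type(bad.exception(timeout=5)).__name__)
_io_pool.shutdown(wait=True)
```

Without the wrapper, an exception in a pool job is stored on the `Future` and stays silent unless something calls `result()` or `exception()`, which is why the task asks to preserve the existing logging.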
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: Enforcement (Runtime Audit Hook)
|
||||
|
||||
The static gate (T1.4) catches known imports at audit time. This phase adds
empirical enforcement: a test that spawns `sloppy.py` and verifies NO heavy
import happens on the main thread at runtime.
|
||||
|
||||
- [ ] **T7.1 (Red)** `tests/test_main_thread_purity.py`:
|
||||
- `test_headless_startup_no_heavy_imports_on_main`: spawn `uv run python sloppy.py --headless --enable-test-hooks` with a `sitecustomize.py` shim that installs `sys.addaudithook` to log every `import` event with the calling thread. The hook writes to a temp file as JSON-L.
|
||||
- Wait for headless server ready (5s timeout via `ApiHookClient`).
|
||||
- Read the audit log. Assert: no event with `thread_name == "MainThread"` for any module in the heavy denylist (`google.genai`, `anthropic`, `openai`, `fastapi`, `requests`, `numpy`, `tkinter`, `psutil`, `pydantic`, `tree_sitter_*`, `src.command_palette`, `src.theme_nerv`, `src.theme_nerv_fx`, `src.markdown_table`, `src.ai_client.send_*`-direct).
|
||||
- Kill subprocess. Confirm FAIL (current state imports these on main).
|
||||
- [ ] **T7.2** Once Phase 3-5 land and the static gate passes, this test should start passing. If it doesn't, debug and add more lazy imports.
|
||||
- [ ] **T7.3** Wire `test_main_thread_purity.py` into CI as a gating test (it'll be slow, ~10s, so mark with `@pytest.mark.slow` and only run in batched CI).
|
||||
- [ ] **T7.4** Commit: `test: empirical main-thread purity check via sys.audit hook` + git note
|
||||
|
||||
**Phase 7 checkpoint:** CI fails if a future commit re-introduces a heavy main-thread import.
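
The core of the `sitecustomize.py` shim from T7.1 can be approximated in-process as below. This is a sketch under assumptions: events are collected in a list instead of a JSON-L temp file, `HEAVY` is only a subset of the denylist, and `main_thread_violations` is an illustrative helper name. It relies on CPython raising the `import` audit event with the module name as the first argument when a module is not yet in `sys.modules`.

```python
import sys
import threading

HEAVY = {"google.genai", "anthropic", "openai", "fastapi"}  # subset of the denylist
events: list[dict] = []  # the real shim would append JSON lines to a temp file


def _import_audit_hook(event: str, args: tuple) -> None:
    # Record which thread triggered each fresh import.
    if event == "import":
        events.append({"module": args[0],
                       "thread_name": threading.current_thread().name})


sys.addaudithook(_import_audit_hook)  # audit hooks cannot be removed once added


def main_thread_violations(log: list[dict]) -> list[str]:
    """Heavy modules whose import was observed on the main thread."""
    return [e["module"] for e in log
            if e["thread_name"] == "MainThread" and e["module"] in HEAVY]


sys.modules.pop("colorsys", None)  # force a fresh import so the event fires
import colorsys  # noqa: E402  harmless stdlib module, used only to trigger an event

print(main_thread_violations(events))
```

Wildcard entries such as `tree_sitter_*` would need pattern matching (for example `fnmatch`) instead of the plain set lookup used here.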
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: Hook API + Diagnostics
|
||||
|
||||
- [ ] **T8.1** Add `/api/startup_profile` endpoint in `src/api_hooks.py` returning `controller.startup_profiler.snapshot()`
|
||||
- [ ] **T8.2** Register `startup_profile` in `_gettable_fields`
|
||||
- [ ] **T8.3** Add a "Startup Profile" section to the Diagnostics panel (`src/gui_2.py` `_render_diagnostics` or similar). Show: phase name, duration, % of total.
|
||||
- [ ] **T8.4** Add `/api/io_pool_status` endpoint returning `{max_workers, active_threads, queued, completed}` so the user can see the job pool is alive.
|
||||
- [ ] **T8.5** Update `docs/guide_api_hooks.md` with both new endpoints.
|
||||
- [ ] **T8.6** Tests: extend `tests/test_api_hooks.py` + new `tests/test_startup_profiler.py` + new `tests/test_io_pool_endpoint.py`.
|
||||
- [ ] **T8.7** Commit: `feat(diagnostics): expose startup profile and io_pool status via Hook API` + git note
|
||||
|
||||
**Phase 8 checkpoint:** User can see per-phase startup cost + job-pool liveness in the GUI.
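
For orientation, a `StartupProfiler` with a `phase(name)` context manager and a `snapshot()` that carries the three columns from T8.3 could be as small as the sketch below. The snapshot keys (`total_ms`, `duration_ms`, `percent`) and the phase names are assumptions, not the shipped schema.

```python
import time
from contextlib import contextmanager


class StartupProfiler:
    def __init__(self) -> None:
        self._phases: list[tuple[str, float]] = []

    @contextmanager
    def phase(self, name: str):
        """Time the enclosed block and record it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._phases.append((name, elapsed_ms))

    def snapshot(self) -> dict:
        """Per-phase durations plus each phase's share of the total."""
        total = sum(ms for _, ms in self._phases)
        return {
            "total_ms": total,
            "phases": [
                {"name": name, "duration_ms": ms,
                 "percent": (ms / total * 100.0) if total else 0.0}
                for name, ms in self._phases
            ],
        }


profiler = StartupProfiler()
with profiler.phase("presets"):
    time.sleep(0.01)
with profiler.phase("hook_server"):
    time.sleep(0.02)

snap = profiler.snapshot()
for row in snap["phases"]:
    print(f"{row['name']:<12} {row['duration_ms']:7.1f}ms {row['percent']:5.1f}%")
```

Returning plain dicts and lists keeps the snapshot directly JSON-serializable for an endpoint like `/api/startup_profile`.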
|
||||
|
||||
---
|
||||
|
||||
## Phase 9: Verify + Phase Checkpoint
|
||||
|
||||
- [ ] **T9.1** Re-run `scripts/benchmark_imports.py --runs=3`. Save to `docs/startup_after_20260606.txt`. Diff against T1.1 baseline; confirm:
|
||||
- `import src.ai_client` < 50ms
|
||||
- `import src.gui_2` < 500ms
|
||||
- `import src.app_controller` < 300ms (includes `_io_pool` creation; should still be < 300ms)
|
||||
- [ ] **T9.2** Re-run `scripts/audit_main_thread_imports.py` (T1.4). Confirm exit 0. No violations.
|
||||
- [ ] **T9.3** Run `live_gui` test batch (per `conductor/workflow.md:147-150`: max 4 test files per batch, long timeout):
|
||||
- `uv run pytest tests/test_live_gui_*.py --timeout=60 -v` in batches
|
||||
- Confirm `wait_for_server(timeout=15)` does not time out
|
||||
- [ ] **T9.4** Manual smoke:
|
||||
- `uv run sloppy.py` (normal mode): time-to-first-frame
|
||||
- `uv run sloppy.py --enable-test-hooks` (test mode): time-to-first-frame
|
||||
- `uv run sloppy.py --headless` (headless): time-to-server-ready
|
||||
- [ ] **T9.5** Phase checkpoint commit: `conductor(checkpoint): Phase 9 complete - sloppy.py startup speedup track` + git note with full verification report
|
||||
- [ ] **T9.6** Update `conductor/tracks.md`: mark track complete, link to archived folder
|
||||
|
||||
**Phase 9 checkpoint:** All verification criteria in `spec.md:6` met.
|
||||
|
||||
---
|
||||
|
||||
## Definition of Done
|
||||
|
||||
- [ ] All Phase 1-9 tasks checked
|
||||
- [ ] All tests pass (273+ existing + new TDD tests including `test_main_thread_purity`)
|
||||
- [ ] `uv run ruff check .` and `uv run mypy --explicit-package-bases .` clean (per `mma-tier2-tech-lead` skill)
|
||||
- [ ] `uv run python scripts/audit_main_thread_imports.py` exits 0
|
||||
- [ ] `docs/startup_baseline_20260606.txt` and `docs/startup_after_20260606.txt` archived
|
||||
- [ ] Phase 9 git note contains: baseline diff, audit script result, runtime audit hook result, full test batch results, manual smoke timings, file inventory
|
||||
- [ ] Track moved to `conductor/tracks/archive/`
|
||||
- [ ] **NO new `threading.Thread(...)` calls in `src/`** (verified by `grep -rn "threading.Thread(" src/`)
|
||||
|
||||
---
|
||||
|
||||
## Notes for Tier 3 Workers
|
||||
|
||||
- **Always use 1-space indentation for Python code.** Confirm via `uv run python -c "import ast; ..."` AST check if you do any class-body reorganization (the "Indentation-Driven Class Method Visibility" pitfall in `conductor/workflow.md`).
|
||||
- **Test fixtures**: `isolate_workspace`, `reset_paths`, `reset_ai_client`, `vlogger`, `kill_process_tree`, `mock_app`, `live_gui` — see `docs/guide_testing.md`.
|
||||
- **Subprocess tests for module-level imports**: spawn `uv run python -c "..."` and inspect `sys.modules` after the import. Pattern:
|
||||
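```python
import subprocess
import sys

# Import in a fresh interpreter and dump the names of all loaded modules.
result = subprocess.run(
    [sys.executable, "-c", "import sys; import src.ai_client; import json; print(json.dumps(sorted(sys.modules.keys())))"],
    capture_output=True, text=True,
    check=True,  # fail loudly if the import itself crashes (empty stdout would pass the assert)
)
assert 'google.genai' not in result.stdout
```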
|
||||
- **For new background work**: use `controller.submit_io(fn, *args)`, NOT `threading.Thread(target=fn).start()`. The user constraint is "no new threads."
|
||||
- **Atomic commits per task.** No batching. If a task touches 3 files, commit all 3 in one commit but the commit message describes the task.
|
||||
- **`_io_pool` worker threads are non-daemon in Python 3.9+ (the interpreter joins them at exit); in 3.8 and earlier they were daemon threads.** Check `pyproject.toml` for `requires-python`. Either way, the pool is shut down on `AppController.shutdown()`.
|
||||
|
||||
---
|
||||
|
||||
## Cross-References
|
||||
|
||||
- Spec: [./spec.md](./spec.md)
|
||||
- Original backlog entry: `conductor/tracks.md:152`
|
||||
- Benchmark tool: `scripts/benchmark_imports.py`
|
||||
- Lazy pattern templates: `src/app_controller.py:241-271` (RAG + MMA)
|
||||
- Threading constraints: `docs/guide_architecture.md:43-67`
|
||||
- Architectural Invariant: `spec.md:2.1`
|
||||
- Job pool spec: `spec.md:2.2 Layer 2`
|
||||
- Hot reload constraints: `docs/guide_hot_reload.md:295-312`
|
||||
@@ -0,0 +1,527 @@
|
||||
# Track: Sloppy.py Startup Speedup
|
||||
|
||||
**Status:** Active
|
||||
**Initialized:** 2026-06-06
|
||||
**Owner:** Tier 2 Tech Lead
|
||||
**Priority:** High (regression blocker — `live_gui` fixtures time out at `wait_for_server(timeout=15)`)
|
||||
|
||||
---
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
`uv run sloppy.py --enable-test-hooks` startup latency has crept up. `live_gui` tests
time out at `wait_for_server(timeout=15)`. Root cause is **too much work on the main
thread before `immapp.run()` is entered and the GUI becomes interactive**:

- 5 AI provider SDKs (`google.genai`, `anthropic`, `openai`, `requests`, ...) eagerly
  imported at `src/ai_client.py` module top-level, even though only one is the active
  provider at runtime
- `imgui_bundle` transitively pulls `numpy` and 9 other heavy modules at the top of
  `src/gui_2.py` and 9 sibling files
- NERV theme, command palette, markdown table extensions are loaded eagerly even
  though they are feature-gated
- `AppController.__init__` does all subsystem construction synchronously on the
  thread that will become the main GUI thread (path manager, presets, personas,
  context presets, tool presets, history, workspace, RAG, hook server)

The architecture is already correct: AI calls go through the asyncio worker thread,
so the *call* is non-blocking. The *imports* are still synchronous on the main
thread, and that is what the user sees as "sloppy.py is slow to open."
|
||||
|
||||
### 1.1 Measurement Baseline (from `scripts/benchmark_imports.py`)
|
||||
|
||||
Cold-start subprocess timings, median of 3 runs, 85 unique import paths:
|
||||
|
||||
| module | time | files | classification |
|---|---:|---:|---|
| google.genai | ~955ms | 1 | **defer (provider SDK, default)** |
| openai | ~445ms | 1 | defer (provider SDK) |
| anthropic | ~430ms | 1 | defer (provider SDK) |
| src.markdown_table | ~250ms | 1 | defer (feature-gated) |
| src.theme_nerv | ~245ms | 1 | defer (feature-gated) |
| imgui_bundle | ~245ms | 10 | **KEEP (ImGui hot path)** |
| src.command_palette | ~244ms | 1 | defer (feature-gated) |
| src.theme_nerv_fx | ~240ms | 1 | defer (feature-gated) |
| fastapi (+ security.api_key) | ~470ms combined | 1 | defer (only `--enable-test-hooks` or web mode) |
| requests | ~92ms | 3 | defer (deepseek/minimax only) |
| numpy | ~65ms | 2 | keep (bg_shader; optional in gui_2) |
| pydantic | ~70ms | 1 | keep (models.py is loaded by everyone) |
| tree_sitter_* | ~25ms each | 1 | keep (file_cache) |
|
||||
|
||||
**Estimated main-thread import cost today (worst case, all paths):**
|
||||
~2500-3000ms (1.0s SDKs + 1.0s web/fastapi + 0.5s GUI extras + ~0.5s transitives).
|
||||
|
||||
**Estimated main-thread import cost after this track:**
|
||||
~500-600ms (`imgui_bundle` + lean `gui_2` + `pydantic` models). Net savings
|
||||
~2000-2400ms.
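
The cold-start methodology (fresh subprocess per run, median of 3) can be reproduced with a few lines. The sketch below is not `scripts/benchmark_imports.py`: `cold_import_ms` is an illustrative helper, and it times stdlib modules only so that it runs without the project installed.

```python
import statistics
import subprocess
import sys


def cold_import_ms(module: str, runs: int = 3) -> float:
    """Median wall-clock time (ms) to import `module` in a fresh interpreter."""
    code = (
        "import time; t = time.perf_counter(); "
        f"import {module}; "
        "print((time.perf_counter() - t) * 1000.0)"
    )
    samples = []
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True,
        )
        # Use the last line in case the imported module prints on import.
        samples.append(float(out.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)


for name in ("json", "decimal"):
    print(f"{name:<10} {cold_import_ms(name):8.2f}ms")
```

Each run pays the full import cost because nothing is cached in `sys.modules` across processes, although the OS file cache still makes the first run slower than later ones; taking the median dampens that.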
|
||||
|
||||
---
|
||||
|
||||
## 2. Approach
|
||||
|
||||
The architecture is already correct. The fix is **systematic application of the
|
||||
lazy-load + shared-job-pool patterns** the codebase already uses for `RAGEngine`
|
||||
(`get_rag_engine` in `src/app_controller.py:244-249`) and `MultiAgentConductor`
|
||||
(`get_mma_conductor` in `src/app_controller.py:266-271`).
|
||||
|
||||
### 2.1 Architectural Invariant: Main Thread Purity
|
||||
|
||||
> **The main thread (the one that enters `immapp.run()`) must NEVER import a
> module heavier than `imgui_bundle` and the lean `gui_2` skeleton. Every heavy
> import is loaded by the asyncio worker thread, the AppController's shared
> job pool, or the MMA WorkerPool. This invariant is enforced by an audit
> script (CI gate) and a runtime audit-hook test that fails if a heavy import
> is observed on the main thread at startup.**
|
||||
|
||||
Concretely, the main thread's import chain is allowed to contain:
|
||||
- All `import X` statements transitively reachable from `src/gui_2.py` whose
|
||||
accumulated import time is < 50ms
|
||||
- The modules: `imgui_bundle`, `defer`, `src.imgui_scopes`, `src.theme_2`
|
||||
(default theme only), `src.theme_models`, `src.paths`, `src.models`,
|
||||
`src.events`
|
||||
- Anything in `sys.stdlib_module_names`
|
||||
|
||||
Everything else — provider SDKs, FastAPI, NERV theme, command palette, markdown
|
||||
table extensions, the full `src.ai_client` provider list, `numpy`/`psutil`/
|
||||
`tree_sitter_*` if used by lazy code paths — must be loaded by a background
|
||||
mechanism that does not run on the main thread.
|
||||
|
||||
### 2.2 Four layers of protection
|
||||
|
||||
#### Layer 1 — Pure lazy loading (the load-bearing wall, non-negotiable)
|
||||
|
||||
Move heavy imports from module top-level into the function body that needs them:
|
||||
|
||||
```python
# BEFORE (src/ai_client.py, current)
from google import genai
import anthropic
import openai
# ... 5 provider SDKs loaded unconditionally

# AFTER
def _send_gemini(md_content, user_message, ...):
    from google import genai  # 955ms, paid once, on the first call's thread
    ...

def _send_anthropic(...):
    import anthropic
    ...
```
|
||||
|
||||
**Main-thread cost: zero.** First call still pays the latency, but it happens on
|
||||
the asyncio worker thread (per `guide_architecture.md:215-234`), so the GUI never
|
||||
sees it.
|
||||
|
||||
#### Layer 2 — Shared job pool on AppController (no new threads per task)
|
||||
|
||||
The codebase already has these dedicated / shared threads:
|
||||
- `AppController._loop_thread` — asyncio worker (**DEDICATED** to the AI event
|
||||
loop, do not use for arbitrary work)
|
||||
- `WorkerPool` (in `src/multi_agent_conductor.py`) — 4-thread pool for MMA
|
||||
workers (**DEDICATED** to MMA, do not pollute with imports or I/O)
|
||||
- `HookServer` thread — **DEDICATED** to the FastAPI server
|
||||
- Ad-hoc `threading.Thread` calls — used for one-off tasks; the user wants to
|
||||
**MINIMIZE** these
|
||||
|
||||
**User constraint:** no new daemon threads per import prefetch, per I/O task, per
|
||||
log-prune. We add ONE shared `ThreadPoolExecutor` to `AppController` named
|
||||
`_io_pool`, and any subsystem that needs background work submits jobs to it.
|
||||
This includes:
|
||||
- Initial RAG index warm-up (if applicable)
|
||||
- Log pruning (currently a one-shot thread — refactor to use the pool)
|
||||
- Disk-bound subsystem initialization (e.g., TOML re-read on persona switch)
|
||||
- Any other ad-hoc I/O
|
||||
|
||||
```python
# In AppController.__init__
from concurrent.futures import ThreadPoolExecutor

self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)
```
|
||||
|
||||
**Threads created by this track: 4** (the pool). Not an extra thread per job, not
one per import, not one per subsystem. Just 4 long-lived threads that all background
work shares. Future work that needs a background thread should call
`controller.submit_io(fn)` instead of spawning its own.
|
||||
|
||||
#### Layer 3 — NO prefetch of the heaviest SDKs (deliberate)
|
||||
|
||||
The original Phase 5 of this plan proposed an `import-prefetch` daemon thread that
warms `google.genai` (~955ms) in the background. **This has been explicitly
rejected** for the heavy SDKs, for the following reasons:
|
||||
|
||||
- A 955ms import on a background thread holds the GIL for ~10-50ms at a time
|
||||
during C extension init. Each hold stalls the main thread's render loop.
|
||||
- The user pays 955ms total either way: prefetch = 955ms of background stutter
|
||||
+ instant first call; lazy-only = 955ms of stutter on the first call only,
|
||||
with the GUI fully interactive in between.
|
||||
- Prefetching wastes the import cost when the user never uses that provider
|
||||
(e.g., default is Gemini but the user actually only uses Anthropic).
|
||||
|
||||
**Rule: heavy SDKs (`google.genai`, `anthropic`, `openai`, `fastapi`) are
|
||||
lazy-only, never prefetched.** Lighter modules (themes, command palette,
|
||||
markdown table) MAY be optionally warmed on the `_io_pool` if profiling shows
|
||||
they're commonly used, but it's not a hard requirement and the default is
|
||||
"don't warm."
|
||||
|
||||
#### Layer 4 — Worker-process isolation (future, out of scope)
|
||||
|
||||
The codebase already runs `gemini_cli` and external MCP servers as subprocesses
|
||||
for this exact reason. A future track could move `google.genai` / `anthropic` into
|
||||
their own worker processes, communicating via the existing `SyncEventQueue`. This
|
||||
track does NOT do this — Layer 1+2+3 is sufficient for the current problem.
|
||||
|
||||
### 2.3 Threading constraints (verified empirically)
|
||||
|
||||
The user's question: *"if I import in the app controller's thread, will it block
|
||||
the GUI's thread?"* The answer is:
|
||||
|
||||
| Scenario | Blocks GUI? |
|---|---|
| Module top-level import of heavy X, then main imports X | **YES** (X's import is in main's chain) |
| Lazy import of X inside a function called from the asyncio thread | **NO** (asyncio thread blocks, not main) |
| Lazy import of X inside a function called from the main thread | **YES** (first call only; the function caller blocks) |
| `_io_pool` worker importing X while main thread renders | **NO direct block, but GIL contention causes micro-stutters** (~5-50ms each). Acceptable because the pool is capped at 4 threads. |
| `_io_pool` worker imports X; main thread later imports X (same module) | **YES** (main blocks on per-module import lock until worker finishes). This is why Layer 1 must come first. |
| Spawning a new `threading.Thread` for each import prefetch | **Wasteful** (thread creation ~1-5ms each; thread count explodes). Use the `_io_pool` instead. |
|
||||
|
||||
This means: **Layer 1 is non-negotiable.** Even with the `_io_pool`, if the
|
||||
heavy import is also in the main thread's import chain, the main thread will
|
||||
block on the import lock the moment it tries to use the module. Layer 1 removes
|
||||
the heavy imports from the main thread's chain; Layer 2 reuses threads
|
||||
efficiently; Layer 3 deliberately avoids prefetching the heaviest.
|
||||
|
||||
### 2.4 Enforcement: the "main thread purity" audit
|
||||
|
||||
Two enforcement mechanisms, both required:
|
||||
|
||||
#### Static: `scripts/audit_main_thread_imports.py` (CI gate)
|
||||
|
||||
1. AST-walk the import graph reachable from `sloppy.py` (the main entry).
|
||||
For each `.py` file in the graph, collect top-level `import X` and
|
||||
`from X import Y` statements.
|
||||
|
||||
2. Compare against an allowlist of "main-thread-safe" modules (stdlib +
|
||||
`imgui_bundle` + the lean gui_2 skeleton list from §2.1). Any
|
||||
non-allowlist import is a violation.
|
||||
|
||||
3. Exit non-zero with a clear message naming the file, line, and heavy module.
|
||||
|
||||
4. Run as part of CI (`uv run python scripts/audit_main_thread_imports.py`)
|
||||
and as a pre-commit hook.
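
The static gate above could be sketched roughly as follows. This is not the actual `scripts/audit_main_thread_imports.py`: the allowlist contents and the function names are assumptions, and the sketch checks a single source string rather than walking the whole import graph.

```python
import ast

# Hypothetical allowlist; the real one is stdlib + imgui_bundle + the lean skeleton.
ALLOWLIST = {"os", "sys", "time", "imgui_bundle"}


def top_level_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, root_module) for every module-level import statement."""
    found = []
    # Only the module body is scanned: imports inside functions are lazy.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            found.extend((node.lineno, alias.name.split(".")[0]) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append((node.lineno, node.module.split(".")[0]))
    return found


def violations(source: str, allowlist: set[str] = ALLOWLIST) -> list[tuple[int, str]]:
    """Module-level imports whose root package is not on the allowlist."""
    return [(line, mod) for line, mod in top_level_imports(source) if mod not in allowlist]


AUDIT_SAMPLE = '''import os
from google import genai
import imgui_bundle

def send():
    import anthropic
'''
print(violations(AUDIT_SAMPLE))  # → [(2, 'google')]
```

The function-local `import anthropic` is not reported, which is the behaviour the invariant wants. A real gate would also need to look inside module-level `if` and `try` blocks and resolve relative imports, both of which this sketch skips.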
|
||||
|
||||
#### Runtime: `tests/test_main_thread_purity.py` (TDD, empirical)
|
||||
|
||||
1. Spawn `uv run python sloppy.py --headless --enable-test-hooks` as a
|
||||
subprocess, with a `sys.addaudithook` callback that logs every
|
||||
`import` event with the calling thread.
|
||||
|
||||
2. Wait for the headless server to be ready (or 5s timeout).
|
||||
|
||||
3. Read the audit log. Assert: every `import` event with
|
||||
`threading.current_thread() is threading.main_thread()` was for a module in
|
||||
the allowlist.
|
||||
|
||||
4. Kill the subprocess.
|
||||
|
||||
This is the empirical enforcement: it proves the invariant holds at runtime,
|
||||
not just at static analysis time.
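
A minimal in-process sketch of the audit-hook idea follows. In the real test the hook would have to be installed inside the spawned `sloppy.py` process; how that is wired up is not specified here. The names `import_log` and `main_thread_violations` are hypothetical, and the stdlib module `colorsys` stands in for a heavy SDK.

```python
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

import_log: list[tuple[str, str]] = []  # (module name, thread name)


def _audit(event: str, args: tuple) -> None:
    # The "import" event fires when the import system has to load a module
    # that is not already cached in sys.modules; args[0] is the module name.
    if event == "import":
        import_log.append((args[0], threading.current_thread().name))


sys.addaudithook(_audit)  # note: audit hooks cannot be removed once installed


def main_thread_violations(allowlist: set[str]) -> list[str]:
    """Modules first imported on the main thread that are not allowlisted."""
    main_name = threading.main_thread().name
    return [mod for mod, thread in import_log
            if thread == main_name and mod.split(".")[0] not in allowlist]


def lazy_import() -> None:
    sys.modules.pop("colorsys", None)  # force an uncached import for the demo
    import colorsys  # noqa: F401  (stand-in for a heavy SDK)


with ThreadPoolExecutor(max_workers=1, thread_name_prefix="controller-io") as pool:
    pool.submit(lazy_import).result(timeout=5)
```

After this runs, `import_log` contains an entry for `colorsys` attributed to the pool worker, and `main_thread_violations` does not list it.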
|
||||
|
||||
---
|
||||
|
||||
## 3. Architectural Changes
|
||||
|
||||
### 3.1 Per-file import plan
|
||||
|
||||
#### `src/ai_client.py` (the biggest win: ~1800ms)
|
||||
|
||||
Top-level today: `from google import genai`, `import anthropic`, `import openai`,
|
||||
`import requests` (used by deepseek/minimax).
|
||||
|
||||
After:
|
||||
- Drop `from google import genai` from top — lazy in `_send_gemini()`
|
||||
- Drop `import anthropic` from top — lazy in `_send_anthropic()`
|
||||
- Drop `import openai` from top — lazy in `_send_deepseek()` and `_send_minimax()`
|
||||
- Drop `import requests` from top — lazy in those two providers' HTTP code
|
||||
- Provider client objects (`_gemini_client`, `_anthropic_client`, etc.) stay as
|
||||
module globals but are now `None` until first use
|
||||
- The `_send_*` functions check their provider client is initialized and call a
|
||||
new `_ensure_<provider>_client()` lazy initializer (extracted from the current
|
||||
top-level logic)
|
||||
|
||||
**Result:** ~1800ms off the main thread. First AI call still pays it, but on
|
||||
the asyncio worker.
|
||||
|
||||
#### `src/app_controller.py` (FastAPI in headless/web only)
|
||||
|
||||
Top-level today: `from fastapi import ...`, `from fastapi.security.api_key import ...`
|
||||
(only needed if `--enable-test-hooks` or `--web-host`).
|
||||
|
||||
After:
|
||||
- Drop these from top — lazy inside `HookServer.__init__` (which is itself lazy
|
||||
in the controller: `if enable_test_hooks: from src.api_hooks import HookServer; ...`)
|
||||
|
||||
**Result:** ~470ms off the main thread for non-test, non-web launches. `live_gui`
tests still launch with `--enable-test-hooks`, but there the FastAPI work can be
deferred until the asyncio loop is ready.
|
||||
|
||||
#### `src/commands.py` and `src/command_palette.py` (command palette lazy)
|
||||
|
||||
Top-level today: `from src.command_palette import ...` at `src/commands.py:1`.
|
||||
|
||||
After:
|
||||
- Lazy in each `_*_command()` function in `src/commands.py` that actually
|
||||
opens the palette
|
||||
- The CommandRegistry decorator can keep module-level function references, but
|
||||
the *body* of the command does the heavy import
|
||||
|
||||
**Result:** ~244ms off if user doesn't open palette during the first session.
|
||||
|
||||
#### `src/theme_2.py` and `src/theme_nerv.py` / `src/theme_nerv_fx.py` (NERV theme lazy)
|
||||
|
||||
Top-level today: NERV modules imported at `src/theme_2.py` module top.
|
||||
|
||||
After:
|
||||
- Lazy in `apply_nerv_theme()` (the function that activates NERV)
|
||||
- The default theme path stays lean (uses only `src/theme_2.py` + `src/theme_models.py`)
|
||||
|
||||
**Result:** ~485ms off if user doesn't pick NERV theme (the default path).
|
||||
|
||||
#### `src/markdown_helper.py` (markdown table lazy)
|
||||
|
||||
Top-level today: `from src.markdown_table import ...` at `src/markdown_helper.py:1`.
|
||||
|
||||
After:
|
||||
- Lazy in `_render_table_block()` (or wherever GFM table detection happens)
|
||||
- The first markdown render that hits a table pays the 250ms; subsequent hits are
|
||||
cached in `sys.modules`
|
||||
|
||||
**Result:** ~250ms off startup; sessions that never render a table (typical) never pay it.
|
||||
|
||||
#### `src/imgui_scopes.py`, `src/gui_2.py`, `src/bg_shader.py` (KEEP `imgui_bundle`)
|
||||
|
||||
These MUST keep `import imgui_bundle` at top — the ImGui render loop is the hot
|
||||
path and needs the module on first frame. There is no way to defer this without
|
||||
breaking the render loop.
|
||||
|
||||
What CAN be deferred inside `src/gui_2.py`:
|
||||
- `import numpy` (only needed for `bg_shader`; the GUI itself doesn't need numpy
|
||||
on the first frame)
|
||||
- Other feature-gated imports
|
||||
|
||||
#### `src/gui_2.py` direct heavy imports (audit)
|
||||
|
||||
We will use AST to audit which `import X` statements at `src/gui_2.py` top-level
|
||||
are reachable from the first-frame render path (`render_main_window`,
|
||||
`render_main_menu_bar`, etc.) and which are feature-gated. Feature-gated ones
|
||||
move inside the function that gates them.
|
||||
|
||||
### 3.2 Job pool scaffolding
|
||||
|
||||
New code in `src/app_controller.py`:
|
||||
|
||||
```python
from concurrent.futures import ThreadPoolExecutor

# In AppController.__init__, after the asyncio loop starts:
self._io_pool = ThreadPoolExecutor(
    max_workers=4,
    thread_name_prefix="controller-io",
)

# New AppController method:
def submit_io(self, fn, *args, **kwargs):
    """Submit a background job to the shared I/O pool. Use this instead of
    threading.Thread for new background work.

    Returns a concurrent.futures.Future. Caller can .result() if they need
    to block, or .add_done_callback for fire-and-forget with error handling.
    """
    return self._io_pool.submit(fn, *args, **kwargs)
```
|
||||
|
||||
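In `AppController.shutdown()` (or wherever lifecycle cleanup lives), call
`self._io_pool.shutdown(wait=False, cancel_futures=True)`. The call returns
immediately and cancels jobs that have not started. Since Python 3.9 the pool's
workers are not daemon threads, so the interpreter still waits at exit for any
job that is already running; background jobs must therefore be short or
cooperatively cancellable.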
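
To make the shutdown behaviour concrete, here is a small sketch assuming Python 3.9+ (where `cancel_futures` exists and executor workers are non-daemon). The one-worker pool and the `long_job` helper are illustrative only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="controller-io")
started = threading.Event()
release = threading.Event()


def long_job() -> str:
    started.set()
    release.wait(timeout=5)   # simulates slow I/O still in flight at shutdown
    return "finished"


running = pool.submit(long_job)
started.wait(timeout=5)       # make sure the worker has picked the job up
queued = [pool.submit(print, "never runs") for _ in range(3)]

# Returns immediately; queued jobs are cancelled, the running one is left alone.
pool.shutdown(wait=False, cancel_futures=True)

release.set()                 # let the in-flight job complete
```

The three queued futures end up cancelled, while `running` still completes. A job already in flight is never interrupted by `shutdown`, so long-running jobs need their own stop signal.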
|
||||
|
||||
### 3.3 Startup timing instrumentation
|
||||
|
||||
Add `src/startup_profiler.py`:
|
||||
|
||||
```python
import time
from contextlib import contextmanager
from typing import Iterator


class StartupProfiler:
    """Records wall-clock time spent in each named init phase.

    Cheap (no I/O). Stored on AppController.startup_profile for later inspection
    via the Hook API (`GET /api/startup_profile`) and the Diagnostics panel.
    """

    def __init__(self) -> None:
        # (name, start, duration_ms); start is a time.perf_counter() reading
        self._phases: list[tuple[str, float, float]] = []

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        t0 = time.perf_counter()
        try:
            yield
        finally:
            # Record the phase even if the wrapped init step raises.
            self._phases.append((name, t0, (time.perf_counter() - t0) * 1000))
```
|
||||
|
||||
Used at every major init step in `AppController.__init__` and `App.__init__`.
|
||||
|
||||
---
|
||||
|
||||
## 4. Phases
|
||||
|
||||
### Phase 1: Audit + Benchmark + Foundation (Day 1)
|
||||
- T1.1: Run `scripts/benchmark_imports.py` and capture baseline
|
||||
- T1.2: AST-audit every `import X` in `src/*.py` to map which is reachable
|
||||
from the first-frame render path vs feature-gated
|
||||
- T1.3: Add `StartupProfiler` to `src/app_controller.py` and instrument
|
||||
current init
|
||||
- T1.4: Add `scripts/audit_main_thread_imports.py` (static gate)
|
||||
- T1.5: Commit baseline + audit script
|
||||
|
||||
### Phase 2: Job Pool Foundation (Day 1) — the "no new threads" rule
|
||||
- T2.1 (TDD Red): Write `tests/test_app_controller_io_pool.py` asserting
  `AppController` has a `_io_pool: ThreadPoolExecutor` with 4 workers, named
  `controller-io_*` (`ThreadPoolExecutor` joins the prefix and worker index with `_`)
|
||||
- T2.2 (Green): Add `self._io_pool = ThreadPoolExecutor(max_workers=4,
|
||||
thread_name_prefix="controller-io")` to `AppController.__init__`. Add
|
||||
`submit_io(fn, *args)` helper. Wire shutdown into `controller.shutdown()`.
|
||||
- T2.3: Verify T2.1 tests pass + full suite still passes
|
||||
|
||||
### Phase 3: Lazy-load AI provider SDKs (Day 2)
|
||||
- T3.1 (TDD Red): Write `tests/test_ai_client_lazy_imports.py` asserting
|
||||
`import src.ai_client` does NOT import any provider SDK
|
||||
- T3.2 (Green): Move `from google import genai` / `import anthropic` /
|
||||
`import openai` / `import requests` into their respective `_send_*` functions
|
||||
- T3.3: Verify existing `tests/test_ai_client.py` still passes
|
||||
- T3.4: Commit, re-run benchmark, expect `import src.ai_client` < 50ms
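
One way the T3.1 check could be written is to import the target in a fresh interpreter, so modules already loaded by the test runner cannot mask a leak. The helper name `leaked_modules` is hypothetical, and the stdlib modules in the usage note are stand-ins for `src.ai_client` and the provider SDKs.

```python
import subprocess
import sys


def leaked_modules(target: str, forbidden: list[str]) -> list[str]:
    """Import `target` in a fresh interpreter; return forbidden modules it pulled in."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({target!r})\n"
        f"print(','.join(m for m in {forbidden!r} if m in sys.modules))\n"
    )
    out = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out.split(",") if out else []
```

For example, `leaked_modules("json", ["sqlite3"])` comes back empty because `json` does not import `sqlite3`. The real test would presumably call something like `leaked_modules("src.ai_client", ["google.genai", "anthropic", "openai"])` from the repository root and assert that the result is empty.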
|
||||
|
||||
### Phase 4: Lazy-load FastAPI in `HookServer` (Day 2)
|
||||
- T4.1 (TDD Red): Write `tests/test_hook_server_lazy_fastapi.py` asserting
|
||||
`from src.api_hooks import HookServer` does NOT import fastapi
|
||||
- T4.2 (Green): Move `from fastapi import ...` inside the methods that need them
|
||||
- T4.3: Verify existing `tests/test_api_hooks.py` still passes
|
||||
- T4.4: Commit
|
||||
|
||||
### Phase 5: Lazy-load feature-gated GUI modules (Day 3)
|
||||
- T5.1: Lazy-load `src.command_palette` in `src/commands.py`
|
||||
- T5.2: Lazy-load `src.theme_nerv` and `src.theme_nerv_fx` in `src/theme_2.py`
|
||||
- T5.3: Lazy-load `src.markdown_table` in `src/markdown_helper.py`
|
||||
- T5.4: Audit and lazy-load feature-gated imports in `src/gui_2.py`
|
||||
- T5.5: Run all GUI tests; fix any circular imports
|
||||
- T5.6: Commit per task
|
||||
|
||||
### Phase 6: Migrate ad-hoc threads to `_io_pool` (Day 4)
|
||||
- T6.1: Audit: `grep -rn "threading.Thread(" src/` to find all ad-hoc
|
||||
thread spawns (excluding `HookServer` and `WorkerPool` which are domain-specific)
|
||||
- T6.2: Refactor each ad-hoc thread to use `controller.submit_io(fn)` instead
|
||||
- T6.3: Per-migration commit
|
||||
- T6.4: Final `grep -rn "threading.Thread(" src/` shows no ad-hoc spawns: only
  the domain-specific entries excluded in T6.1 (`HookServer`, `WorkerPool`)
  remain, and no new entries have appeared
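
A plain `grep` also matches comments and strings. If that becomes noisy, an AST-based variant of the audit is one option; the sketch below is an assumption about how it could look, not part of the plan, and `thread_spawns` is a hypothetical name.

```python
import ast


def thread_spawns(source: str) -> list[int]:
    """Line numbers of calls to threading.Thread(...) or a bare Thread(...)."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        fn = node.func
        is_attr = (isinstance(fn, ast.Attribute) and fn.attr == "Thread"
                   and isinstance(fn.value, ast.Name) and fn.value.id == "threading")
        is_name = isinstance(fn, ast.Name) and fn.id == "Thread"
        if is_attr or is_name:
            lines.append(node.lineno)
    return sorted(lines)


SPAWN_SAMPLE = '''import threading
# threading.Thread( in a comment is ignored
t = threading.Thread(target=print)
pool.submit(print)
'''
print(thread_spawns(SPAWN_SAMPLE))  # → [3]
```

The sketch does not catch aliased imports such as `import threading as th`; the grep remains the simpler baseline.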
|
||||
|
||||
### Phase 7: Enforcement — Runtime Audit Hook (Day 4)
|
||||
- T7.1 (TDD Red): `tests/test_main_thread_purity.py` — spawn `sloppy.py
|
||||
--headless --enable-test-hooks` with a `sys.addaudithook` shim, verify no
|
||||
heavy import happens on the main thread
|
||||
- T7.2: Once Phase 3-5 land, this test should start passing. Wire into CI.
|
||||
- T7.3: Commit
|
||||
|
||||
### Phase 8: Hook API + Diagnostics (Day 5)
|
||||
- T8.1: Add `/api/startup_profile` endpoint
|
||||
- T8.2: Add `/api/io_pool_status` endpoint
|
||||
- T8.3: Add to `_gettable_fields` and the Diagnostics panel
|
||||
- T8.4: Document in `docs/guide_api_hooks.md`
|
||||
- T8.5: Tests + commit
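
The payload of `/api/io_pool_status` is not specified in this plan. As one possibility, a snapshot helper could look like the sketch below. The field names are invented for illustration, and the helper reads CPython-private attributes of `ThreadPoolExecutor` that are not a public API and may change between versions.

```python
from concurrent.futures import ThreadPoolExecutor


def io_pool_status(pool: ThreadPoolExecutor) -> dict:
    """Snapshot for a status endpoint (hypothetical payload shape).

    Relies on private CPython attributes: _max_workers, _threads, _work_queue.
    """
    return {
        "max_workers": pool._max_workers,
        "live_threads": len(pool._threads),       # workers started so far
        "queued_jobs": pool._work_queue.qsize(),  # submitted but not yet picked up
    }
```

Because workers are started on demand, `live_threads` is 0 on a fresh pool and only grows as jobs are submitted.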
|
||||
|
||||
### Phase 9: Verify + Checkpoint (Day 5)
|
||||
- T9.1: Re-run `scripts/benchmark_imports.py`; confirm `import src.ai_client`
  is < 50ms and `import src.gui_2` is < 500ms (the §6 targets)
|
||||
- T9.2: Re-run `scripts/audit_main_thread_imports.py`; exit 0
|
||||
- T9.3: Run `tests/test_main_thread_purity.py`; pass
|
||||
- T9.4: Run full `live_gui` test batch; `wait_for_server(timeout=15)` no
|
||||
longer times out
|
||||
- T9.5: Manual smoke test: `uv run sloppy.py` and
|
||||
`uv run sloppy.py --enable-test-hooks` both feel snappier
|
||||
- T9.6: Phase checkpoint commit with full verification report
|
||||
|
||||
---
|
||||
|
||||
## 5. Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lazy import inside a hot path adds latency on every call | Med | Med | Always gate the import with `sys.modules` check OR use module-level sentinel |
| First AI call on the asyncio thread blocks for ~955ms while `google.genai` imports | High | Low | The user already paid this latency budget; happens on the asyncio worker, not main. Document the expected first-call pause. |
| Lazy import surfaces circular import that was hidden by top-level ordering | Med | Med | Phase 1 audit catches this; defer each lazy import to the test phase |
| Test fixtures import the heavy module before main code, breaking assumptions | Low | Low | `reset_ai_client` and `isolate_workspace` fixtures already lazy-reset |
| Hot reload of a now-lazy module doesn't trigger | Low | Med | Update `HotReloader.HOT_MODULES` to register the lazy module's gate function |
| `_io_pool` worker importing a heavy module holds GIL and stutters GUI | Med | Low | The pool is capped at 4 threads; stutter is bounded; user sees responsive UI before any stutter |
| A future commit re-introduces a heavy import on the main thread | Med | High | Static gate (`audit_main_thread_imports.py`, CI) + runtime audit hook (`test_main_thread_purity.py`) catch this |
|
||||
|
||||
### Hot Reload consideration
|
||||
|
||||
`src/hot_reloader.py` registers modules at import time. Lazy-loaded modules
|
||||
(imported inside functions) are NOT registered. The hot-reload workflow needs:
|
||||
- Either: register the lazy module with a callback that forces a re-import via
|
||||
`importlib.reload`
|
||||
- Or: explicitly trigger the lazy import on hot-reload trigger
|
||||
|
||||
This is a small follow-up task; the lazy import itself doesn't break hot reload
|
||||
(it just means you have to invoke the gate function once to materialize the
|
||||
module before reload can take effect).
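
One possible shape for the reload callback is sketched below. It assumes that a lazy module which has never been imported needs no reload, since its first import will pick up the current source anyway. `reload_if_loaded` is a hypothetical helper, not an existing `HotReloader` API.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional


def reload_if_loaded(name: str) -> Optional[ModuleType]:
    """Reload `name` only if its lazy import has already happened."""
    module = sys.modules.get(name)
    if module is None:
        return None  # never imported: the next lazy import loads fresh code
    return importlib.reload(module)
```

Callers that cached objects from the old module (for example a provider client held in a module global) would still need to reset those separately.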
|
||||
|
||||
---
|
||||
|
||||
## 6. Verification Criteria
|
||||
|
||||
The track is complete when:
|
||||
|
||||
- [ ] `import src.ai_client` cold start < 50ms (down from ~1800ms)
|
||||
- [ ] `import src.gui_2` cold start < 500ms (down from ~3000ms)
|
||||
- [ ] `import src.app_controller` cold start < 300ms (down from ~700ms)
|
||||
- [ ] `uv run sloppy.py --enable-test-hooks` reaches `immapp.run()` in < 1.5s
|
||||
- [ ] `live_gui.wait_for_server(timeout=15)` passes for all 273+ tests
|
||||
- [ ] `scripts/audit_main_thread_imports.py` exits 0 (no heavy imports on main)
|
||||
- [ ] `tests/test_main_thread_purity.py` passes (runtime audit hook confirms invariant)
|
||||
- [ ] `scripts/benchmark_imports.py` shows no new red entries in the top-20
|
||||
- [ ] First AI call latency on the asyncio thread is < 1500ms (it pays the SDK load once;
  every later call skips it). Main thread sees ZERO of this cost.
|
||||
- [ ] No regressions in the existing 272/273 passing tests
|
||||
- [ ] `grep -rn "threading.Thread(" src/` shows ZERO new spawns after Phase 6
|
||||
migration (only the existing project scaffolding threads like `HookServer`
|
||||
and `WorkerPool` remain, and they're domain-specific)
|
||||
- [ ] Startup profile + io_pool status visible in `/api/startup_profile`,
|
||||
`/api/io_pool_status`, and the Diagnostics panel
|
||||
|
||||
---
|
||||
|
||||
## 7. Out of Scope
|
||||
|
||||
- Process-isolation of heavy SDKs (Layer 4 in §2.2) — future track
|
||||
- `imgui_bundle` lazy loading — fundamentally impossible (ImGui hot path)
|
||||
- Importing on the main thread for the lean `gui_2` skeleton (~300ms unavoidable)
|
||||
- `pydantic` lazy loading (used by `src/models.py` which is imported by 16 files;
|
||||
the cost is already amortized and deferring it would cascade)
|
||||
- Prefetch / warm-up of the heavy SDKs in the background (Layer 3 in §2.2 is
|
||||
deliberately the "do nothing" layer; the user pays the import cost once on
|
||||
first use, on the asyncio thread, not in the background)
|
||||
|
||||
---
|
||||
|
||||
## 8. Cross-References
|
||||
|
||||
- `conductor/tracks.md` line 152 — original backlog entry that this track fulfills
|
||||
- `docs/guide_architecture.md:43-67` — thread domains (asyncio worker is the right
|
||||
place for heavy work)
|
||||
- `docs/guide_architecture.md:880-898` — Architectural Invariants (single-writer
|
||||
principle; this track respects it)
|
||||
- `docs/guide_app_controller.md:241-271` — existing `get_rag_engine` /
|
||||
`get_mma_conductor` lazy patterns (the templates this track replicates)
|
||||
- `docs/guide_hot_reload.md:295-312` — what is/isn't safe to hot-reload
|
||||
(lazy-loaded modules need a small follow-up)
|
||||
- `conductor/workflow.md` — TDD Red-Green-Refactor protocol + atomic per-task
|
||||
commits + git notes
|
||||
- `scripts/benchmark_imports.py` — the measurement tool built in this conversation
|
||||
@@ -0,0 +1,110 @@
|
||||
# Track state for startup_speedup_20260606
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
|
||||
[meta]
|
||||
track_id = "startup_speedup_20260606"
|
||||
name = "Sloppy.py Startup Speedup"
|
||||
status = "active"
|
||||
current_phase = 1
|
||||
last_updated = "2026-06-06"
|
||||
|
||||
[phases]
|
||||
phase_1 = { status = "in_progress", checkpoint_sha = "", name = "Audit + Benchmark + Foundation" }
|
||||
phase_2 = { status = "pending", checkpoint_sha = "", name = "Job Pool Foundation (no new threads)" }
|
||||
phase_3 = { status = "pending", checkpoint_sha = "", name = "Lazy-load AI provider SDKs" }
|
||||
phase_4 = { status = "pending", checkpoint_sha = "", name = "Lazy-load FastAPI in HookServer" }
|
||||
phase_5 = { status = "pending", checkpoint_sha = "", name = "Lazy-load feature-gated GUI modules" }
|
||||
phase_6 = { status = "pending", checkpoint_sha = "", name = "Migrate ad-hoc threads to _io_pool" }
|
||||
phase_7 = { status = "pending", checkpoint_sha = "", name = "Enforcement: runtime audit hook" }
|
||||
phase_8 = { status = "pending", checkpoint_sha = "", name = "Hook API + Diagnostics" }
|
||||
phase_9 = { status = "pending", checkpoint_sha = "", name = "Verify + Checkpoint" }
|
||||
|
||||
[tasks]
|
||||
# Phase 1: Audit + Benchmark + Foundation
|
||||
t1_1 = { status = "pending", commit_sha = "", description = "Capture baseline benchmark" }
|
||||
t1_2 = { status = "pending", commit_sha = "", description = "Audit src/gui_2.py imports (first-frame vs feature-gated)" }
|
||||
t1_3 = { status = "pending", commit_sha = "", description = "Add StartupProfiler and instrument init" }
|
||||
t1_4 = { status = "pending", commit_sha = "", description = "Write scripts/audit_main_thread_imports.py (static CI gate)" }
|
||||
t1_5 = { status = "pending", commit_sha = "", description = "Commit baseline + audit script" }
|
||||
# Phase 2: Job Pool Foundation
|
||||
t2_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_app_controller_io_pool.py" }
|
||||
t2_2 = { status = "pending", commit_sha = "", description = "Green: add _io_pool ThreadPoolExecutor + submit_io helper to AppController" }
|
||||
t2_3 = { status = "pending", commit_sha = "", description = "Confirm T2.1 tests pass + full suite still passes" }
|
||||
t2_4 = { status = "pending", commit_sha = "", description = "Commit T2" }
|
||||
# Phase 3: Lazy-load AI Provider SDKs
|
||||
t3_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_ai_client_lazy_imports.py" }
|
||||
t3_2 = { status = "pending", commit_sha = "", description = "Green: move provider SDK imports into _send_* funcs" }
|
||||
t3_3 = { status = "pending", commit_sha = "", description = "Fix existing test_ai_client.py breakage" }
|
||||
t3_4 = { status = "pending", commit_sha = "", description = "Confirm T3.1 tests PASS" }
|
||||
t3_5 = { status = "pending", commit_sha = "", description = "Commit T3" }
|
||||
t3_6 = { status = "pending", commit_sha = "", description = "Update tracks.md T3 row" }
|
||||
# Phase 4: Lazy-load FastAPI
|
||||
t4_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_hook_server_lazy_fastapi.py" }
|
||||
t4_2 = { status = "pending", commit_sha = "", description = "Green: move fastapi imports into HookServer methods" }
|
||||
t4_3 = { status = "pending", commit_sha = "", description = "Fix existing test_api_hooks.py breakage" }
|
||||
t4_4 = { status = "pending", commit_sha = "", description = "Confirm T4.1 tests PASS" }
|
||||
t4_5 = { status = "pending", commit_sha = "", description = "Commit T4" }
|
||||
# Phase 5A: Command Palette
|
||||
t5a_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_command_palette_lazy.py" }
|
||||
t5a_2 = { status = "pending", commit_sha = "", description = "Green: lazy-load in src/commands.py" }
|
||||
t5a_3 = { status = "pending", commit_sha = "", description = "Fix existing test_command_palette.py" }
|
||||
t5a_4 = { status = "pending", commit_sha = "", description = "Commit T5A" }
|
||||
# Phase 5B: NERV Theme
|
||||
t5b_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_theme_nerv_lazy.py" }
|
||||
t5b_2 = { status = "pending", commit_sha = "", description = "Green: lazy-load in src/theme_2.py" }
|
||||
t5b_3 = { status = "pending", commit_sha = "", description = "Fix existing test_theme_2.py + test_theme_nerv.py" }
|
||||
t5b_4 = { status = "pending", commit_sha = "", description = "Commit T5B" }
|
||||
# Phase 5C: Markdown Table
|
||||
t5c_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_markdown_helper_lazy.py" }
|
||||
t5c_2 = { status = "pending", commit_sha = "", description = "Green: lazy-load in src/markdown_helper.py" }
|
||||
t5c_3 = { status = "pending", commit_sha = "", description = "Fix existing test_markdown_helper.py" }
|
||||
t5c_4 = { status = "pending", commit_sha = "", description = "Commit T5C" }
|
||||
# Phase 5D: gui_2 feature-gated imports
|
||||
t5d_1 = { status = "pending", commit_sha = "", description = "Run audit_gui2_imports.py and collect feature-gated list" }
|
||||
t5d_2 = { status = "pending", commit_sha = "", description = "Apply TDD pattern per feature-gated import" }
|
||||
t5d_3 = { status = "pending", commit_sha = "", description = "Run full GUI test suite; fix" }
|
||||
t5d_4 = { status = "pending", commit_sha = "", description = "Commit per feature group" }
|
||||
# Phase 6: Migrate ad-hoc threads
|
||||
t6_1 = { status = "pending", commit_sha = "", description = "Audit threading.Thread( spawns; document each" }
|
||||
t6_2 = { status = "pending", commit_sha = "", description = "Refactor each ad-hoc thread to use controller.submit_io" }
|
||||
t6_3 = { status = "pending", commit_sha = "", description = "Run full test suite; fix" }
|
||||
t6_4 = { status = "pending", commit_sha = "", description = "Commit per migration; final grep shows zero new spawns" }
|
||||
# Phase 7: Enforcement - Runtime Audit Hook
|
||||
t7_1 = { status = "pending", commit_sha = "", description = "Red: tests/test_main_thread_purity.py" }
|
||||
t7_2 = { status = "pending", commit_sha = "", description = "Confirm test passes after Phase 3-5" }
|
||||
t7_3 = { status = "pending", commit_sha = "", description = "Wire into CI as @pytest.mark.slow gating test" }
|
||||
t7_4 = { status = "pending", commit_sha = "", description = "Commit T7" }
|
||||
# Phase 8: Hook API + Diagnostics
|
||||
t8_1 = { status = "pending", commit_sha = "", description = "Add /api/startup_profile endpoint" }
|
||||
t8_2 = { status = "pending", commit_sha = "", description = "Add /api/io_pool_status endpoint" }
|
||||
t8_3 = { status = "pending", commit_sha = "", description = "Add startup profile + io_pool status to Diagnostics panel" }
|
||||
t8_4 = { status = "pending", commit_sha = "", description = "Update docs/guide_api_hooks.md" }
|
||||
t8_5 = { status = "pending", commit_sha = "", description = "Tests for endpoints + profiler round-trip" }
|
||||
t8_6 = { status = "pending", commit_sha = "", description = "Commit T8" }
|
||||
# Phase 9: Verify + Checkpoint
|
||||
t9_1 = { status = "pending", commit_sha = "", description = "Re-run benchmark; diff vs baseline" }
|
||||
t9_2 = { status = "pending", commit_sha = "", description = "Re-run audit_main_thread_imports.py; exit 0" }
|
||||
t9_3 = { status = "pending", commit_sha = "", description = "Run test_main_thread_purity.py; pass" }
|
||||
t9_4 = { status = "pending", commit_sha = "", description = "Run live_gui test batch; confirm wait_for_server passes" }
|
||||
t9_5 = { status = "pending", commit_sha = "", description = "Manual smoke (normal, test-hooks, headless modes)" }
|
||||
t9_6 = { status = "pending", commit_sha = "", description = "Phase checkpoint commit + git note" }
|
||||
t9_7 = { status = "pending", commit_sha = "", description = "Update tracks.md; archive track" }
|
||||
|
||||
[verification]
|
||||
# To be filled at Phase 9
|
||||
baseline_ai_client_ms = 0
|
||||
after_ai_client_ms = 0
|
||||
baseline_gui_2_ms = 0
|
||||
after_gui_2_ms = 0
|
||||
baseline_app_controller_ms = 0
|
||||
after_app_controller_ms = 0
|
||||
live_gui_passed = 0
|
||||
live_gui_failed = 0
|
||||
audit_main_thread_violations = 0
|
||||
io_pool_max_workers = 4
|
||||
io_pool_thread_name_prefix = "controller-io"
|
||||
new_threading_thread_calls = 0
|
||||
|
||||
[ad_hoc_threads]
|
||||
# Filled in Phase 6 T6.1 audit
|
||||
# Format: {file = "src/foo.py", line = 42, current_target = "lambda", proposed_target = "controller.submit_io(...)"}
|
||||