conductor(track): create startup_speedup_20260606 track for sloppy.py startup latency

Fulfills the existing backlog entry at conductor/tracks.md:152
(2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).

Main Thread Purity Invariant: the main thread (entering immapp.run())
must never import a module heavier than imgui_bundle and the lean
gui_2 skeleton. Enforced by:
  - static gate: scripts/audit_main_thread_imports.py (CI)
  - runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)
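
A minimal sketch of how a runtime hook of this kind could work, assuming
nothing about the actual contents of tests/test_main_thread_purity.py.
The class name, the deny-list, and the guarded_thread parameter are
hypothetical. The sketch relies on the interpreter's "import" audit
event, which CPython raises from the import statement and __import__()
when a module is not already cached in sys.modules.

```python
import sys
import threading

# Hypothetical deny-list; the module names are taken from the track goal.
HEAVY_PREFIXES = ("google.genai", "anthropic", "openai", "fastapi")


class MainThreadImportRecorder:
    """Record heavy imports that happen on the guarded (main) thread."""

    def __init__(self, heavy_prefixes=HEAVY_PREFIXES, guarded_thread=None):
        self.heavy_prefixes = tuple(heavy_prefixes)
        # Defaults to the real main thread; a test may pass another thread.
        self.guarded_thread = guarded_thread or threading.main_thread()
        self.violations = []
        # Audit hooks cannot be removed, so expose an off switch instead.
        self.enabled = True
        sys.addaudithook(self._hook)

    def _hook(self, event, args):
        if event != "import" or not self.enabled:
            return
        if threading.current_thread() is not self.guarded_thread:
            return  # imports on worker threads are allowed
        name = args[0]  # first argument of the "import" event: module name
        if any(name == p or name.startswith(p + ".") for p in self.heavy_prefixes):
            self.violations.append(name)
```

A test can then assert that violations is empty once startup has
finished, and set enabled = False afterwards so the hook goes quiet.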

Threading constraint: no new threading.Thread(...) calls in src/.
All background work goes through AppController._io_pool
(ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').
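
For illustration, a minimal sketch of routing background work through a
single shared pool with the parameters stated above. Only the _io_pool
attribute and its constructor arguments come from this commit; the
submit_io and shutdown methods are hypothetical helper names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class AppController:
    def __init__(self) -> None:
        # Pool parameters as stated in the commit message.
        self._io_pool = ThreadPoolExecutor(
            max_workers=4, thread_name_prefix="controller-io"
        )

    def submit_io(self, fn, *args, **kwargs) -> Future:
        """Hypothetical helper: run fn on the shared pool, not a new Thread."""
        return self._io_pool.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        """Hypothetical helper: wait for queued work, then stop the workers."""
        self._io_pool.shutdown(wait=True)


def current_worker_name() -> str:
    # Returns the name of the thread that runs this function.
    return threading.current_thread().name
```

Worker threads created this way carry the controller-io prefix in their
names, which makes them easy to spot in stack dumps and profiler output.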

9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load
FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime
enforcement, hook API + diagnostics, verify+checkpoint.

Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures
no longer time out at wait_for_server(timeout=15).
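
A minimal sketch of the "lazy in called function" pattern these targets
depend on, using the stdlib json module as a stand-in for a heavy
provider SDK. All function names here are hypothetical and do not
reflect the actual src.ai_client code.

```python
import importlib
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_provider_sdk(module_name: str):
    """Import the SDK on first use; later calls return the cached module."""
    return importlib.import_module(module_name)


def send_with_provider(module_name: str, payload: dict) -> str:
    # The import cost is paid here, on the first request, not at startup.
    sdk = load_provider_sdk(module_name)
    return sdk.dumps(payload)  # json.dumps stands in for a real SDK call


def timed_import_ms(module_name: str) -> float:
    """Wall-clock cost of importing module_name, in milliseconds."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return (time.perf_counter() - start) * 1000.0
```

Note that timed_import_ms only reports the full cost when the module is
not already in sys.modules, so a baseline measurement needs a fresh
interpreter.
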
2026-06-06 12:57:20 -04:00
parent 2adf3274af
commit cd4fb04541
5 changed files with 942 additions and 2 deletions
+3 -2
@@ -149,8 +149,9 @@ User review surfaced five outstanding UI issues, each previously attempted witho
## Remaining Backlog (Phases 3 & 4)
0. [ ] **Track: Sloppy.py Startup Speedup**
*Status: 2026-06-05 — Surfaced during regression_fixes_20260605 root-cause analysis. `sloppy.py --enable-test-hooks` startup latency has crept up; live_gui fixtures time out at `wait_for_server(timeout=15)`. Hypothesized cause: too much init work on the main thread (FastAPI hook server bring-up, log pruner retry loops, MCT startup). Plan: profile startup, move heavy init off the main thread to the controller's background thread pool, defer non-critical subsystems to lazy-init on first use. Spec/plan to follow.*
0. [~] **Track: Sloppy.py Startup Speedup**
*Link: [./tracks/startup_speedup_20260606/](./tracks/startup_speedup_20260606/), Spec: [./tracks/startup_speedup_20260606/spec.md](./tracks/startup_speedup_20260606/spec.md), Plan: [./tracks/startup_speedup_20260606/plan.md](./tracks/startup_speedup_20260606/plan.md)*
*Goal: Reduce `sloppy.py` startup time by ~2000-2400ms via (1) lazy-loading AI provider SDKs (`google.genai` 955ms, `anthropic` 430ms, `openai` 445ms) into the function that uses them, (2) lazy-loading `fastapi` in `HookServer` (~470ms), (3) lazy-loading feature-gated GUI modules (`command_palette` 244ms, `theme_nerv*` 485ms, `markdown_table` 250ms), (4) background prefetch of the default provider SDK on a daemon thread, (5) `StartupProfiler` + `/api/startup_profile` for measurement. Three-layer architecture: lazy in called function (load-bearing), bg prefetch (latency hiding), worker-process (future). Target: `import src.ai_client` < 50ms (from ~1800ms), `import src.gui_2` < 500ms (from ~3000ms), `live_gui.wait_for_server(timeout=15)` no longer times out.*
0b. [x] **Track: rag_phase4_stress_test_flake_20260606** — fixed 16412ad5
*Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*