ed 96158edd97 conductor(plan): mark T1.3 StartupProfiler complete (5a856536)

2026-06-06 13:59:02 -04:00

Plan: Sloppy.py Startup Speedup

Track: startup_speedup_20260606
Spec: ./spec.md
Status: In progress
Started: 2026-06-06

Phase 1: Audit + Benchmark + Foundation

T1.1 Capture baseline with scripts/benchmark_imports.py --runs=3 --color=never > docs/startup_baseline_20260606.txt
T1.2 Write scripts/audit_gui2_imports.py (AST walker): for each import X in src/gui_2.py, classify as first-frame (reachable from main() / render_main_window etc.) vs feature-gated (inside an if/elif branch that requires user action). Commit audit results to docs/startup_audit_20260606.md.
T1.3 Add src/startup_profiler.py with StartupProfiler class (context manager phase(name)). Wire into AppController.__init__ and App.__init__ at 8 major init points. (No new test; verify via manual run + diagnostics panel.) [T1.3: 5a856536]
T1.4 Write scripts/audit_main_thread_imports.py (static gate, fails CI). AST-walks the import graph reachable from sloppy.py, collects all top-level import X / from X import Y, compares against an allowlist. Exits non-zero with file:line:module on violation. Allowlist: sys.stdlib_module_names + the lean gui_2 skeleton list from spec.md:2.1 (imgui_bundle, defer, src.imgui_scopes, src.theme_2 (default theme only), src.theme_models, src.paths, src.models, src.events).
T1.5 Commit baseline + audit script: git add . && git commit -m "conductor(startup): baseline measurements + main thread import audit script" + git note
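
The StartupProfiler from T1.3 could look roughly like the sketch below. Only the class name and the phase(name) context manager come from this plan; the phases list and the report() method are illustrative assumptions.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import time
from contextlib import contextmanager


class StartupProfiler:
 """Records the wall-clock duration of named startup phases."""

 def __init__(self):
  self.phases = []  # (name, seconds) tuples in completion order

 @contextmanager
 def phase(self, name):
  start = time.perf_counter()
  try:
   yield
  finally:
   # Record the phase even if the wrapped init step raises.
   self.phases.append((name, time.perf_counter() - start))

 def report(self):
  # Hypothetical helper: one line per phase, in milliseconds.
  return "\n".join(f"{name}: {secs * 1000:.1f} ms" for name, secs in self.phases)


profiler = StartupProfiler()
with profiler.phase("load_config"):
 time.sleep(0.01)  # stand-in for a real init step
print(profiler.report())
```

Wrapping each of the 8 init points in `with profiler.phase(...)` keeps the timing code out of the init logic itself.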

Phase 1 checkpoint: Baseline established. Static gate exists. All three import classes (first-frame, feature-gated, background-safe) documented.

Phase 2: Job Pool + Warmup Foundation (the "no new threads" + "no lazy-loading" rules)

Two user constraints, addressed together:

No new threading.Thread(...) per task, per import, per ad-hoc job.
No lazy-loading in function bodies. Heavy imports are warmed on bg threads at startup, not loaded on first use.

The codebase gets ONE shared ThreadPoolExecutor on AppController named _io_pool, used for warmup AND any future background work.

T2.1 (Red) tests/test_app_controller_io_pool.py:
- test_app_controller_has_io_pool: instantiate AppController, assert hasattr(controller, '_io_pool') and it's a ThreadPoolExecutor
- test_io_pool_uses_named_threads: submit a job, assert the executing thread name starts with controller-io
- test_io_pool_size_is_4: assert _io_pool._max_workers == 4
- test_io_pool_shuts_down_on_close: call controller.shutdown(), assert the pool is shut down
- Confirm FAIL (no _io_pool yet)
T2.2 (Green) In src/app_controller.py:
- Add from concurrent.futures import ThreadPoolExecutor and import importlib at top
- In __init__, after the asyncio loop starts and BEFORE the existing HookServer block: self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")
- Add warmup state: self._warmup_lock, self._warmup_done_event, self._warmup_status (dict with pending/completed/failed lists), self._warmup_callbacks
- Call self._submit_warmup_jobs() at the end of __init__
- In shutdown(): App.shutdown already exists for the GUI; ensure AppController has a matching shutdown() that calls self._io_pool.shutdown(wait=False)
T2.3 (Red) tests/test_warmup_mechanism.py:
- test_warmup_jobs_submitted_on_init: after AppController.__init__, assert len(controller.warmup_status()['pending']) > 0
- test_warmup_jobs_complete_within_timeout: call controller.wait_for_warmup(timeout=10.0), assert it returns True
- test_warmup_status_reflects_completion: after wait_for_warmup, assert controller.is_warmup_done() == True and len(warmup_status()['pending']) == 0
- test_warmup_callback_fires_on_completion: register a callback via controller.on_warmup_complete(cb), assert it was called once warmup done
- test_warmup_does_not_block_init: time __init__ with a 4-worker pool, assert it returns in < 200ms even though warmup takes longer
- Confirm FAIL (no warmup yet)
T2.4 (Green) Implement _submit_warmup_jobs(), _compute_warmup_list(), _warmup_one(), warmup_status(), is_warmup_done(), wait_for_warmup(), on_warmup_complete() per spec §3.2. Warmup list includes: google.genai, anthropic, openai, requests, src.command_palette, src.theme_nerv, src.theme_nerv_fx, src.markdown_table, numpy. Conditionally adds fastapi, fastapi.security.api_key if enable_test_hooks or web_host is set.
T2.5 Run T2.1 and T2.3 tests; confirm PASS
T2.6 Commit: feat(app_controller): add _io_pool + proactive warmup mechanism + git note
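
The pool wiring that T2.1 tests can be sketched as follows. ControllerSketch is a hypothetical stand-in for AppController; the pool size, thread name prefix, and shutdown(wait=False) call are taken from T2.2.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import threading
from concurrent.futures import ThreadPoolExecutor


class ControllerSketch:
 """Hypothetical stand-in for AppController; only the pool wiring is shown."""

 def __init__(self):
  self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")

 def submit_io(self, fn, *args):
  return self._io_pool.submit(fn, *args)

 def shutdown(self):
  # wait=False returns immediately; jobs already queued still run to completion.
  self._io_pool.shutdown(wait=False)


controller = ControllerSketch()
name = controller.submit_io(lambda: threading.current_thread().name).result(timeout=5)
print(name)  # worker names start with the prefix, e.g. "controller-io_0"
controller.shutdown()
```

After shutdown(), further submit calls raise RuntimeError, which is one way test_io_pool_shuts_down_on_close could assert the pool state without reading private attributes.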

Phase 2 checkpoint: AppController owns a 4-thread named pool. Warmup jobs are submitted in __init__ and complete in the background. controller.wait_for_warmup(), controller.warmup_status(), and controller.on_warmup_complete(cb) are the public API. Main thread does NOT block waiting for warmup.
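
One possible shape for the warmup API named in this checkpoint, as a sketch rather than the spec §3.2 implementation. WarmupSketch is a hypothetical class, the stdlib modules stand in for the real warmup list, and callbacks are assumed to take no arguments.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import importlib
import threading
from concurrent.futures import ThreadPoolExecutor


class WarmupSketch:
 """Hypothetical stand-in for the warmup half of AppController."""

 def __init__(self, modules):
  self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")
  self._warmup_lock = threading.Lock()
  self._warmup_done_event = threading.Event()
  self._warmup_status = {"pending": list(modules), "completed": [], "failed": []}
  self._warmup_callbacks = []
  if not modules:
   self._warmup_done_event.set()
  for name in modules:
   self._io_pool.submit(self._warmup_one, name)  # returns at once; __init__ never waits

 def _warmup_one(self, name):
  try:
   importlib.import_module(name)
   bucket = "completed"
  except Exception:
   bucket = "failed"  # a failed import must not leave warmup pending forever
  with self._warmup_lock:
   self._warmup_status["pending"].remove(name)
   self._warmup_status[bucket].append(name)
   finished = not self._warmup_status["pending"]
   callbacks = list(self._warmup_callbacks) if finished else []
   if finished:
    self._warmup_done_event.set()
  for cb in callbacks:  # run outside the lock so callbacks may call warmup_status()
   cb()

 def warmup_status(self):
  with self._warmup_lock:
   return {key: list(value) for key, value in self._warmup_status.items()}

 def is_warmup_done(self):
  return self._warmup_done_event.is_set()

 def wait_for_warmup(self, timeout=None):
  return self._warmup_done_event.wait(timeout)

 def on_warmup_complete(self, cb):
  with self._warmup_lock:
   if not self._warmup_done_event.is_set():
    self._warmup_callbacks.append(cb)
    return
  cb()  # warmup already finished: fire immediately


controller = WarmupSketch(["json", "colorsys", "no_such_module_for_demo"])
controller.wait_for_warmup(timeout=10.0)
print(controller.warmup_status())
controller._io_pool.shutdown()
```

Setting the event and snapshotting the callbacks under the same lock means a callback registered at any moment fires exactly once, either from the worker that finishes last or immediately on registration.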

Phase 3: Remove top-level heavy imports from `src/ai_client.py` (TDD)

The current src/ai_client.py has from google import genai etc. at the top, so importing it on the main thread pays the full SDK import cost. Phase 3 removes these imports and switches to _require_warmed(name).

T3.1 (Red) Write tests/test_ai_client_no_top_level_sdk_imports.py:
- test_ai_client_does_not_import_genai_at_module_level: spawn fresh subprocess, import src.ai_client, assert 'google.genai' not in sys.modules (warmup hasn't run in this subprocess)
- test_ai_client_does_not_import_anthropic_at_module_level
- test_ai_client_does_not_import_openai_at_module_level
- test_ai_client_does_not_import_requests_at_module_level
- Confirm tests FAIL (proves the imports are currently eager)
T3.2 (Green) In src/ai_client.py:
- Add import sys, importlib, threading at top
- Remove from google import genai, import anthropic, import openai, import requests from top
- Add _require_warmed(name) helper: returns sys.modules[name] or raises RuntimeError
- Each _send_* function calls _require_warmed("google.genai") etc. instead of using the module directly
- Provider client globals stay as None until first _send_* initializes them via _ensure_<provider>_client() (extracted from current top-level logic, uses the warmed module)
T3.3 Run existing tests/test_ai_client.py; fix any breakage. Tests that relied on top-level import side effects need a fixture that warms the modules (or a fallback for test mode).
T3.4 Re-run T3.1 tests, confirm PASS
T3.5 Commit: refactor(ai_client): remove top-level SDK imports; use _require_warmed + git note
T3.6 Update conductor/tracks.md T3 row with SHA

Phase 3 checkpoint: a cold import src.ai_client takes < 50ms. When run inside an AppController whose warmup has completed, _send_* functions find the SDKs in sys.modules and run with no import delay.

Phase 4: Remove top-level FastAPI imports from `src/api_hooks.py` (TDD)

Same pattern as Phase 3, for the FastAPI imports.

T4.1 (Red) Write tests/test_hook_server_no_top_level_fastapi.py:
- test_hook_server_does_not_import_fastapi_at_module_level: subprocess test
- test_hook_server_does_not_import_fastapi_security_at_module_level
- Confirm FAIL
T4.2 (Green) In src/api_hooks.py:
- Remove from fastapi import ..., from fastapi.security.api_key import ... from top
- Add _require_warmed(name) calls inside the methods that need them (FastAPI app construction, route registration)
T4.3 Run existing tests/test_api_hooks.py; fix breakage (similar fallback strategy as Phase 3)
T4.4 Confirm T4.1 tests PASS
T4.5 Commit: refactor(api_hooks): remove top-level fastapi imports; use _require_warmed + git note

Phase 4 checkpoint: from src.api_hooks import HookServer does not import fastapi. The HookServer is fully constructed only after AppController's warmup has loaded fastapi (or after _require_warmed("fastapi") triggers the import in test mode).
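
The test-mode fallback mentioned in this checkpoint might be sketched like this. The SLOPPY_TEST_MODE variable, the allow_fallback parameter, and HookServerSketch are all invented for illustration, and wsgiref.simple_server stands in for fastapi so the sketch needs only the standard library.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import importlib
import os
import sys


def _require_warmed(name, allow_fallback=None):
 """Return a warmed module; in test mode, import it on demand instead of raising."""
 module = sys.modules.get(name)
 if module is not None:
  return module
 if allow_fallback is None:
  # SLOPPY_TEST_MODE is a made-up variable name used only for this sketch.
  allow_fallback = os.environ.get("SLOPPY_TEST_MODE") == "1"
 if allow_fallback:
  return importlib.import_module(name)
 raise RuntimeError(f"{name!r} has not been warmed yet")


class HookServerSketch:
 """Hypothetical: constructing the object does not touch the web framework."""

 def __init__(self):
  self.app = None

 def build_app(self, allow_fallback=None):
  # wsgiref.simple_server stands in for fastapi here.
  framework = _require_warmed("wsgiref.simple_server", allow_fallback)
  self.app = framework.demo_app
  return self.app


server = HookServerSketch()
print(server.build_app(allow_fallback=True).__name__)
```

Keeping the fallback behind an explicit switch means production code still fails loudly if a module was left out of the warmup list.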

Phase 5: Remove top-level imports for feature-gated GUI modules (TDD per module)

5A: Command Palette

T5A.1 (Red) tests/test_command_palette_no_top_level_import.py: from src.commands import COMMANDS does not import src.command_palette. Confirm FAIL.
T5A.2 (Green) In src/commands.py: remove from src.command_palette import ... from top. The command functions (_open_command_palette, _toggle_command_palette) call _require_warmed("src.command_palette") to access the module.
T5A.3 Run tests/test_command_palette.py; fix.
T5A.4 Commit: refactor(commands): remove top-level command_palette import; use _require_warmed

5B: NERV Theme

T5B.1 (Red) tests/test_theme_nerv_no_top_level_import.py: from src.theme_2 import * does not import src.theme_nerv or src.theme_nerv_fx. Confirm FAIL.
T5B.2 (Green) In src/theme_2.py: remove from src.theme_nerv import ... and from src.theme_nerv_fx import ... from top. apply_nerv_theme() (or whichever function activates the theme) calls _require_warmed("src.theme_nerv") and _require_warmed("src.theme_nerv_fx").
T5B.3 Run tests/test_theme_2.py and tests/test_theme_nerv.py; fix.
T5B.4 Commit: refactor(theme): remove top-level nerv theme imports; use _require_warmed

5C: Markdown Table

T5C.1 (Red) tests/test_markdown_helper_no_top_level_import.py: from src.markdown_helper import MarkdownRenderer does not import src.markdown_table. Confirm FAIL.
T5C.2 (Green) In src/markdown_helper.py: remove from src.markdown_table import ... from top. The table-detection branch of render() calls _require_warmed("src.markdown_table").
T5C.3 Run tests/test_markdown_helper.py; fix.
T5C.4 Commit: refactor(markdown): remove top-level markdown_table import; use _require_warmed

5D: GUI module feature-gated imports

T5D.1 Run scripts/audit_gui2_imports.py (built in T1.2); collect list of feature-gated imports in src/gui_2.py
T5D.2 For each feature-gated import, apply the same TDD pattern (5A-5C). Group into 1-2 atomic commits per logical feature.
T5D.3 Run full GUI test suite; fix.
T5D.4 Commit per feature group
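
For T5D.1, a rough syntactic approximation of the first-frame vs feature-gated split is to separate module-scope imports from imports nested inside a def, class, if, or try. The sketch below assumes that simplification; the real audit in scripts/audit_gui2_imports.py classifies by reachability and may differ.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import ast


def classify_imports(source):
 """Map each imported module to 'top-level' or 'nested'."""
 tree = ast.parse(source)
 top_level_nodes = set(map(id, tree.body))
 result = {}
 for node in ast.walk(tree):
  if isinstance(node, ast.Import):
   names = [alias.name for alias in node.names]
  elif isinstance(node, ast.ImportFrom):
   names = [node.module or "."]
  else:
   continue
  kind = "top-level" if id(node) in top_level_nodes else "nested"
  for name in names:
   # A module imported at module scope anywhere stays top-level.
   if result.get(name) != "top-level":
    result[name] = kind
 return result


sample = """
import sys
from src.theme_2 import apply_theme
def open_palette():
 import src.command_palette
if sys.platform == "win32":
 import winreg
"""
for module, kind in classify_imports(sample).items():
 print(f"{kind:9} {module}")
```

Under the "no lazy-loading" rule, every heavy module reported as nested is a candidate for conversion to _require_warmed, not something to leave in place.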

Phase 5 checkpoint: All heavy imports removed from main-thread-reachable source files. Default-theme / non-palette / non-table path is lean. Warmup pre-loads all of them in the background.

Phase 6: Migrate Ad-hoc Threads to `_io_pool`

The codebase has several ad-hoc threading.Thread(...) calls. Per the user constraint, these should migrate to controller.submit_io(fn).

T6.1 Audit: grep -rn "threading.Thread(" src/ to find all ad-hoc thread spawns. Document each in state.toml (a new [ad_hoc_threads] section).
T6.2 For each ad-hoc thread in src/log_pruner.py, src/project_manager.py, etc., refactor to use controller.submit_io(fn) instead. Wrap the callable body in a try/except that preserves the existing error logging: the pool stores an exception on the Future, where it goes unseen unless something calls result() or exception().
T6.3 Run full test suite; fix.
T6.4 Per-migration commit (or grouped by subsystem if 3+ threads in one file). Final commit: refactor: migrate ad-hoc threads to AppController._io_pool + git note.

Phase 6 checkpoint: grep -rn "threading.Thread(" src/ shows ZERO new spawns after this phase (existing project scaffolding threads like HookServer and MMA WorkerPool are exempt — they're domain-specific).

Phase 7: Warmup Notification (Hook API + GUI)

The user said: "the app controller should post to test clients or the user when its threads are warmed up with imports — that way the user knows 'hey you have the ui first, but now you have all the functionality.'" This phase implements the notification surfaces.

7A: Hook API endpoints

T7A.1 (Red) tests/test_api_hooks_warmup.py:
- test_warmup_status_endpoint: hit GET /api/warmup_status, assert response has pending/completed/failed keys
- test_warmup_wait_endpoint: hit GET /api/warmup_wait?timeout=10, assert response includes the completion state
- Confirm FAIL (endpoints don't exist yet)
T7A.2 (Green) In src/api_hooks.py:
- Add GET /api/warmup_status returning controller.warmup_status()
- Add GET /api/warmup_wait accepting ?timeout=N (default 30s), calling controller.wait_for_warmup(timeout) then returning the final status
- Register warmup_status in _gettable_fields so the existing Hook API client can fetch it
T7A.3 Run T7A.1 tests; confirm PASS
T7A.4 Commit: feat(api_hooks): add /api/warmup_status and /api/warmup_wait + git note
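
The two handlers from T7A.2 reduce to thin wrappers over the controller API. The sketch below is framework-free so it runs without fastapi; FakeController and the "done" response key are assumptions, since the plan only says the response "includes the completion state".

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import threading


class FakeController:
 """Hypothetical stand-in exposing the Phase 2 public API."""

 def __init__(self):
  self._done = threading.Event()
  self._status = {"pending": ["openai"], "completed": ["requests"], "failed": []}

 def finish(self):
  self._status = {"pending": [], "completed": ["requests", "openai"], "failed": []}
  self._done.set()

 def warmup_status(self):
  return dict(self._status)

 def wait_for_warmup(self, timeout=None):
  return self._done.wait(timeout)


def get_warmup_status(controller):
 """Body for GET /api/warmup_status."""
 return controller.warmup_status()


def get_warmup_wait(controller, timeout=30.0):
 """Body for GET /api/warmup_wait?timeout=N: block up to timeout, then report."""
 done = controller.wait_for_warmup(timeout)
 return {"done": done, **controller.warmup_status()}


controller = FakeController()
threading.Timer(0.05, controller.finish).start()  # simulate warmup finishing shortly
print(get_warmup_wait(controller, timeout=5))
```

Returning the status even on timeout lets a test client see which modules are still pending instead of getting a bare failure.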

7B: GUI status indicator + toast

T7B.1 In src/gui_2.py (in the status bar render function), poll controller.warmup_status() once per frame. While pending is non-empty: show "Warming up... (N/M)" text. When pending is empty AND failed is empty: show "All imports ready" with a green dot. When failed is non-empty: show "Imports: N failed" with a yellow dot.
T7B.2 Register a callback via controller.on_warmup_complete(cb) that:
- On transition to done (with no failures): queue a toast notification "All providers ready (M modules)" via the existing toast system
- On transition to done (with failures): queue a warning toast "Warmup finished with N failures — see Diagnostics"
T7B.3 Update docs/guide_gui_2.md (or wherever status bar is documented) to describe the new indicator
T7B.4 Commit: feat(gui_2): warmup status indicator + completion toast + git note
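
The three status bar states in T7B.1 can be derived from the status dict with a small pure function, which keeps the per-frame work trivial and is easy to unit test. The function name and return shape are illustrative, and N/M is assumed to mean finished modules over total modules.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
def warmup_label(status):
 """Return (text, dot_color) for the status bar; dot_color is None while warming."""
 pending, completed, failed = status["pending"], status["completed"], status["failed"]
 total = len(pending) + len(completed) + len(failed)
 if pending:
  return f"Warming up... ({total - len(pending)}/{total})", None
 if failed:
  return f"Imports: {len(failed)} failed", "yellow"
 return "All imports ready", "green"


print(warmup_label({"pending": ["openai"], "completed": ["requests", "numpy"], "failed": []}))
# prints ('Warming up... (2/3)', None)
```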

Phase 7 checkpoint: Tests can poll /api/warmup_status to know when the system is fully ready. The GUI shows progress during startup and a toast when complete.

Phase 8: Enforcement (Runtime Audit Hook)

The static gate (T1.4) catches known imports at audit time. This phase adds empirical enforcement: a test that spawns sloppy.py and verifies NO heavy import happens on the main thread at runtime.

T8.1 (Red) tests/test_main_thread_purity.py:
- test_headless_startup_no_heavy_imports_on_main: spawn uv run python sloppy.py --headless --enable-test-hooks with a sitecustomize.py shim that installs sys.addaudithook to log every import event with the calling thread. The hook writes to a temp file as JSON-L.
- Wait for headless server ready (5s timeout via ApiHookClient).
- Read the audit log. Assert: no event with thread_name == "MainThread" for any module in the heavy denylist (google.genai, anthropic, openai, fastapi, requests, numpy, tkinter, psutil, pydantic, tree_sitter_*, src.command_palette, src.theme_nerv, src.theme_nerv_fx, src.markdown_table).
- Kill subprocess. Confirm FAIL (current state imports these on main).
T8.2 Once Phase 3-5 land and the static gate passes, this test should start passing. If it doesn't, debug and add more top-level import removals.
T8.3 Wire test_main_thread_purity.py into CI as a gating test (it'll be slow, ~10s, so mark with @pytest.mark.slow and only run in batched CI).
T8.4 Commit: test: empirical main-thread purity check via sys.audit hook + git note

Phase 8 checkpoint: CI fails if a future commit re-introduces a heavy main-thread import.
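
The audit shim from T8.1 could be sketched as below. sys.addaudithook and the "import" audit event are standard Python; install_import_audit, the record fields, and the use of colorsys as a stand-in for a heavy module are assumptions. The event fires when a module is not yet in sys.modules, so each module is logged on its first import only.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import json
import os
import sys
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def install_import_audit(log_path, denylist):
 """Log a JSON line (module, thread_name) for each first-time denylisted import."""
 log_file = open(log_path, "a", buffering=1)  # line-buffered: survives a killed process
 roots = tuple(denylist)

 def hook(event, args):
  if event != "import" or log_file.closed:
   return
  module = args[0]
  if any(module == root or module.startswith(root + ".") for root in roots):
   record = {"module": module, "thread_name": threading.current_thread().name}
   log_file.write(json.dumps(record) + "\n")

 sys.addaudithook(hook)  # audit hooks cannot be removed once installed


log_path = os.path.join(tempfile.mkdtemp(), "imports.jsonl")
install_import_audit(log_path, ["colorsys"])  # colorsys stands in for a heavy module
sys.modules.pop("colorsys", None)  # force a fresh import for the demo
with ThreadPoolExecutor(max_workers=1, thread_name_prefix="controller-io") as pool:
 pool.submit(__import__, "colorsys").result()
with open(log_path) as fh:
 print(fh.read())
```

The purity test would then read the JSON-L file and fail on any record whose thread_name is "MainThread".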

Phase 9: Verify + Phase Checkpoint

T9.1 Re-run scripts/benchmark_imports.py --runs=3. Save to docs/startup_after_20260606.txt. Diff against T1.1 baseline; confirm:
- import src.ai_client < 50ms
- import src.gui_2 < 500ms
- import src.app_controller < 300ms (including _io_pool creation)
T9.2 Re-run scripts/audit_main_thread_imports.py (T1.4). Confirm exit 0. No violations.
T9.3 Run tests/test_warmup_mechanism.py (T2.3); confirm warmup completes and notifications fire
T9.4 Run live_gui test batch (per conductor/workflow.md:147-150: max 4 test files per batch, long timeout):
- uv run pytest tests/test_live_gui_*.py --timeout=60 -v in batches
- Confirm wait_for_server(timeout=15) does not time out
- Optionally: tests can call controller.wait_for_warmup() before exercising functionality that depends on warmed modules
T9.5 Manual smoke:
- uv run sloppy.py (normal mode): time-to-first-frame, observe "Warming up... (N/M)" status, then "All imports ready" toast
- uv run sloppy.py --enable-test-hooks (test mode): same observations, plus /api/warmup_status returns completed
- uv run sloppy.py --headless (headless): time-to-server-ready
- Verify a user action that switches provider (or other warmup-dependent operation) is INSTANT, not 1s-delayed
T9.6 Phase checkpoint commit: conductor(checkpoint): Phase 9 complete - sloppy.py startup speedup track + git note with full verification report
T9.7 Update conductor/tracks.md: mark track complete, link to archived folder

Phase 9 checkpoint: All verification criteria in spec.md:6 met. User can switch providers with zero perceptible lag because warmup already loaded the SDK.
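
For spot-checking the T9.1 thresholds outside the benchmark script, a cold import can be timed in a fresh interpreter. This is a sketch, not how scripts/benchmark_imports.py works; the helper name and best-of-N policy are assumptions, and "cold" here means a new process, not a cold filesystem cache.

```python
# Sketch only; 1-space indentation per the Tier 3 notes.
import subprocess
import sys

_CHILD = (
 "import sys, time; t = time.perf_counter(); "
 "__import__(sys.argv[1]); print(time.perf_counter() - t)"
)


def cold_import_seconds(module, runs=3):
 """Best-of-N import time for module, each measured in a fresh interpreter."""
 samples = []
 for _ in range(runs):
  out = subprocess.run(
   [sys.executable, "-c", _CHILD, module],
   capture_output=True, text=True, check=True,
  ).stdout
  # Take the last line in case the imported module prints during import.
  samples.append(float(out.strip().splitlines()[-1]))
 return min(samples)


elapsed = cold_import_seconds("json")
print(f"import json: {elapsed * 1000:.1f} ms")
```

Run from the repository root, `cold_import_seconds("src.ai_client") < 0.050` would correspond to the first threshold above.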

Definition of Done

All Phase 1-9 tasks checked
All tests pass (273+ existing + new TDD tests including test_main_thread_purity and test_warmup_mechanism)
uv run ruff check . and uv run mypy --explicit-package-bases . clean (per mma-tier2-tech-lead skill)
uv run python scripts/audit_main_thread_imports.py exits 0
docs/startup_baseline_20260606.txt and docs/startup_after_20260606.txt archived
Phase 9 git note contains: baseline diff, audit script result, runtime audit hook result, full test batch results, manual smoke timings, file inventory
Track moved to conductor/tracks/archive/
NO new threading.Thread(...) calls in src/ (verified by grep -rn "threading.Thread(" src/)
NO import X or from X import Y statements in function bodies for heavy modules, verified by grep -rnE "^\s*(import|from) (google|anthropic|openai|fastapi|src\.command_palette|src\.theme_nerv|src\.markdown_table)" src/
Warmup completion notification works — GUI shows toast, Hook API returns completed, controller.is_warmup_done() returns True within 10s of startup
User action latency is zero for warmup-dependent operations — manual smoke test switching providers / opening palette / rendering NERV is instant

Notes for Tier 3 Workers

Always use 1-space indentation for Python code. Confirm via uv run python -c "import ast; ..." AST check if you do any class-body reorganization (the "Indentation-Driven Class Method Visibility" pitfall in conductor/workflow.md).
Test fixtures: isolate_workspace, reset_paths, reset_ai_client, vlogger, kill_process_tree, mock_app, live_gui — see docs/guide_testing.md.

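Subprocess tests for module-level imports: spawn a fresh interpreter (sys.executable -c "...", or uv run python -c "...") and inspect sys.modules after the import. Parse the printed module list rather than substring-matching stdout. Pattern:

import json
import subprocess
import sys

# Fresh interpreter: this process's sys.modules cannot leak into the check.
code = "import sys, json; import src.ai_client; print(json.dumps(sorted(sys.modules)))"
result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, check=True)
# The JSON list is the last stdout line, even if the import itself prints something.
assert "google.genai" not in json.loads(result.stdout.splitlines()[-1])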

For new background work: use controller.submit_io(fn, *args), NOT threading.Thread(target=fn).start(). The user constraint is "no new threads."
Atomic commits per task. No batching. If a task touches 3 files, commit all 3 in one commit but the commit message describes the task.
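ThreadPoolExecutor workers are non-daemon threads in Python 3.9+ and daemon threads in 3.8 and earlier; in both cases the interpreter joins them at exit, so a hung job delays process exit. Check pyproject.toml for requires-python. Either way, the pool is shut down on AppController.shutdown().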

Cross-References

Spec: ./spec.md
Original backlog entry: conductor/tracks.md:152
Benchmark tool: scripts/benchmark_imports.py
Lazy pattern templates: src/app_controller.py:241-271 (RAG + MMA)
Threading constraints: docs/guide_architecture.md:43-67
Architectural Invariant: spec.md:2.1
Job pool spec: spec.md:2.2 Layer 2
Hot reload constraints: docs/guide_hot_reload.md:295-312
