Adds per-AppController startup timing instrumentation to answer
'did the warmup block the first frame?'
AppController.__init__ records _init_start_ts at entry (cold-start anchor).
WarmupManager.on_complete callback stamps _warmup_done_ts.
App.render_main_interface (gui_2.py) calls mark_first_frame_rendered()
on its first call, which stamps _first_frame_ts and logs the timeline.
New public API on AppController:
- init_start_ts (property): float
- warmup_done_ts (property): Optional[float]
- first_frame_ts (property): Optional[float]
- mark_first_frame_rendered(ts=None): idempotent; logs to stderr
- startup_timeline() -> dict with all timestamps + precomputed deltas:
warmup_ms, first_frame_after_init_ms, first_frame_after_warmup_ms
Stderr log on warmup done:
[startup] warmup done in 1186.2ms (first frame rendered Nms BEFORE/AFTER)
Stderr log on first frame:
[startup] first frame at Xms after init (warmup took Yms) (rendered Zms BEFORE/AFTER warmup done)
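The timestamp bookkeeping and delta math above can be sketched as follows. This is a minimal sketch, not the AppController code. Several details are assumptions: the class name StartupTimeline, the injectable clock, the use of time.perf_counter, and the sign convention for first_frame_after_warmup_ms (negative when the first frame beat the warmup). The stderr line is also simplified.

```python
import sys
import time
from typing import Callable, Optional


class StartupTimeline:
    """Hypothetical stand-in for the timing fields on AppController."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._init_start_ts: float = clock()          # cold-start anchor
        self._warmup_done_ts: Optional[float] = None  # set by the warmup callback
        self._first_frame_ts: Optional[float] = None  # set by the first render

    @staticmethod
    def _delta_ms(start: Optional[float], end: Optional[float]) -> Optional[float]:
        if start is None or end is None:
            return None
        return (end - start) * 1000.0

    def on_warmup_complete(self, ts: Optional[float] = None) -> None:
        self._warmup_done_ts = self._clock() if ts is None else ts

    def mark_first_frame_rendered(self, ts: Optional[float] = None) -> None:
        if self._first_frame_ts is not None:
            return  # idempotent: only the first frame is stamped
        self._first_frame_ts = self._clock() if ts is None else ts
        after_init = self._delta_ms(self._init_start_ts, self._first_frame_ts)
        print(f"[startup] first frame at {after_init:.1f}ms after init", file=sys.stderr)

    def startup_timeline(self) -> dict:
        return {
            "init_start_ts": self._init_start_ts,
            "warmup_done_ts": self._warmup_done_ts,
            "first_frame_ts": self._first_frame_ts,
            "warmup_ms": self._delta_ms(self._init_start_ts, self._warmup_done_ts),
            "first_frame_after_init_ms": self._delta_ms(self._init_start_ts, self._first_frame_ts),
            # Negative: the first frame was drawn BEFORE warmup finished.
            "first_frame_after_warmup_ms": self._delta_ms(self._warmup_done_ts, self._first_frame_ts),
        }
```

Keeping the deltas precomputed in one dict lets the hook endpoint return it as-is, so clients never redo the subtraction.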
Hook API:
- GET /api/startup_timeline
- ApiHookClient.get_startup_timeline() -> dict
5 new tests in test_warmup_canaries.py covering all the new methods.
All 18 canary tests + 10 api_hooks tests + 6 gui_indicator tests pass.
The script scripts/apply_startup_timeline.py is included as a reference
for the multi-edit pattern (the proper MCP-equivalent tools will be
added later per the edit_workflow doc).
Per module: prints a one-line summary to stderr when the import
completes or fails:
[warmup 1] google.genai on controller-io_0 (id=18636): 1218.6ms
[warmup 2] anthropic on controller-io_1 (id=5500): 1148.3ms
[warmup 3] openai on controller-io_2 (id=34376): 1144.2ms
...
When the entire warmup completes, prints an aggregate:
[warmup done] 9 modules: 9 completed (sum of per-module elapsed: 3591.7ms)
If ANY canary ran on the main thread (main-thread-purity violation),
the per-module line is tagged with [MAIN-THREAD] AND a final WARNING
is printed:
[warmup WARNING] N module(s) loaded on the MAIN THREAD: google.genai
Default is log_to_stderr=True so production runs get the observability
for free. Tests opt out via WarmupManager(pool, log_to_stderr=False)
in the _build_warmup helper.
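The per-module line and the main-thread check can be sketched as below. This assumes the check is `threading.current_thread() is threading.main_thread()`. The helper names import_with_log and warn_main_thread are hypothetical, and the position of the [MAIN-THREAD] tag within the line is also an assumption.

```python
import importlib
import sys
import threading
import time
from typing import List, TextIO


def import_with_log(index: int, module: str, stream: TextIO = sys.stderr) -> bool:
    """Import `module`, print one [warmup N] line, return True if on the main thread."""
    thread = threading.current_thread()
    on_main = thread is threading.main_thread()
    start = time.perf_counter()
    error = None
    try:
        importlib.import_module(module)
    except Exception as exc:  # a failed import is logged, not raised
        error = exc
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    tag = " [MAIN-THREAD]" if on_main else ""
    outcome = f"{elapsed_ms:.1f}ms" if error is None else f"FAILED ({error!r})"
    print(f"[warmup {index}]{tag} {module} on {thread.name} (id={thread.ident}): {outcome}",
          file=stream)
    return on_main


def warn_main_thread(modules_on_main: List[str], stream: TextIO = sys.stderr) -> None:
    """Print the final WARNING line if any module was imported on the main thread."""
    if modules_on_main:
        print(f"[warmup WARNING] {len(modules_on_main)} module(s) loaded on the MAIN THREAD: "
              f"{', '.join(modules_on_main)}", file=stream)
```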
5 new tests (4 stderr logging + 1 quiet). All 13 canary tests pass.
Use case: 'did my heavy import run on the GUI thread when it shouldn't
have?' is now answered by grepping stderr for [warmup ...] [MAIN-THREAD]
lines. No hook server required.
Adds a canary record for each module submitted to the warmup, tracking:
canary_id, module, thread_name, thread_id, submit_ts, start_ts,
end_ts, elapsed_ms, status, error.
Surface:
- WarmupManager.canaries() returns list[dict] (defensive copy)
- AppController.warmup_canaries() returns list[dict] (delegation)
- GET /api/warmup_canaries Hook API endpoint
- ApiHookClient.get_warmup_canaries() returns list[dict]
Example: the warmup of google.genai records a 1187ms canary on
thread controller-io_0 with thread_id 50420, canary_id 1.
11 new tests (8 unit in test_warmup_canaries + 3 in test_api_hooks_warmup).
All pass; live_gui smoke test confirms endpoint returns real data.
Sub-track 2 of startup_speedup_20260606. Removes the top-level
'import tomli_w' from src/models.py and moves it inside save_config().
tomli_w (~30ms cold load) is now loaded only when the user saves
config, not on every src.models import.
This drops the audit violation count from 63 to 62.
Pydantic BaseModel (the other src/models.py violation) is left for
a future sub-track: deferring a class base requires a metaclass or
proxy pattern that's higher risk for the small (~50ms) saving.
3 new tests in tests/test_models_no_top_level_tomli_w.py:
- tomli_w NOT in sys.modules after import src.models
- save_config() still works (because tomli_w loads on-demand)
- save_config() actually triggers the import on first call
17 existing model tests pass (test_persona_models, test_bias_models,
test_context_presets_models, test_per_ticket_model, test_file_item_model).
Fixes the run_tests_batched.py hang that occurs after batch 4.
The original conftest (commit 52ea2693) stored _warmup_app_controller
at module scope for the entire pytest session. When pytest exits,
GC of the AppController triggers ThreadPoolExecutor.__del__ ->
shutdown(wait=True). If warmup hasn't fully completed by then, the
shutdown blocks indefinitely, causing the batched test runner to
hang at the subprocess.run boundary.
Fix: register an atexit handler that captures the _io_pool reference
directly (bound as a default argument) and shuts it down with wait=False.
Because the reference is bound at registration time, it survives even
after the AppController is GC'd. shutdown() is idempotent, so the
subsequent shutdown(wait=True) in __del__ is a no-op.
This is part of sub-track 4 (warmup notification) cleanup; the
conftest's wait_for_warmup behavior is preserved, only the
exit-hang is fixed.
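A minimal sketch of the atexit pattern, assuming a plain ThreadPoolExecutor. The helper name register_nonblocking_shutdown is hypothetical, and cancel_futures=True is an addition that the commit does not mention.

```python
import atexit
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def register_nonblocking_shutdown(pool: ThreadPoolExecutor) -> Callable[..., None]:
    """Register an atexit hook that shuts `pool` down without waiting."""

    # `p=pool` binds the pool at registration time, so the hook holds its own
    # reference even after the object that owned the pool is collected.
    def _shutdown_io_pool(p: ThreadPoolExecutor = pool) -> None:
        # wait=False returns immediately; cancel_futures=True (Python 3.9+)
        # also drops jobs that have not started. Running jobs are not interrupted.
        p.shutdown(wait=False, cancel_futures=True)

    atexit.register(_shutdown_io_pool)
    return _shutdown_io_pool
```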
Sub-track 4 of startup_speedup_20260606. Adds per-frame GUI feedback
during the AppController's background warmup:
- render_warmup_status_indicator(app): module-level render fn called
from render_main_interface. Shows 'Warming up... (N/M)' in warning
color while pending, 'Imports: K failed' in error color on failure,
or 'All imports ready (M modules)' in success color for 3 seconds
after completion. Hidden otherwise.
- _on_warmup_complete_callback(app, status): thread-safe callback
registered with controller.on_warmup_complete() in App._post_init.
Records timestamp + lock-protected toast list.
- App._post_init: registers the callback.
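The state-to-text decision in render_warmup_status_indicator can be sketched as a pure function that is easy to unit-test apart from imgui. This sketch makes several assumptions. The status dict has the {pending, completed, failed} shape of WarmupManager.status(). N counts finished (completed + failed) modules. A failure message stays visible instead of expiring. The completion timestamp comes from time.monotonic(). The function name warmup_indicator_text is hypothetical.

```python
import time
from typing import Optional, Tuple

SUCCESS_DISPLAY_SECONDS = 3.0  # success message stays up this long after completion


def warmup_indicator_text(status: dict, completed_at: Optional[float],
                          now: Optional[float] = None) -> Optional[Tuple[str, str]]:
    """Return (text, color_role) for the indicator, or None to hide it."""
    now = time.monotonic() if now is None else now
    pending = status.get("pending", [])
    completed = status.get("completed", [])
    failed = status.get("failed", [])
    total = len(pending) + len(completed) + len(failed)
    if pending:
        finished = len(completed) + len(failed)
        return (f"Warming up... ({finished}/{total})", "warning")
    if failed:
        return (f"Imports: {len(failed)} failed", "error")
    if completed_at is not None and now - completed_at < SUCCESS_DISPLAY_SECONDS:
        return (f"All imports ready ({total} modules)", "success")
    return None
```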
6 new tests in tests/test_gui_warmup_indicator.py:
- 2 importable-checks (function exists)
- 3 callback-logic tests (timestamp, failures, thread-safety)
- 1 live_gui smoke test (controller exposes warmup_status)
Sub-track 3 of startup_speedup_20260606. Builds on the Phase 7 minimal
work at b464d1fe which only added warmup_status to /api/gui/diagnostics.
New dedicated endpoints:
- GET /api/warmup_status -> controller.warmup_status() (cheap, lock-guarded)
- GET /api/warmup_wait?timeout=N -> controller.wait_for_warmup(timeout)
then returns the final status. Default 30s.
Both callable from external clients via ApiHookClient.get_warmup_status()
and ApiHookClient.get_warmup_wait(timeout=30.0).
7 new tests in tests/test_api_hooks_warmup.py (5 unit + 2 live_gui).
All 7 pass.
Phase 6 of startup_speedup_20260606 was partial: ~13 ad-hoc
threading.Thread spawns remained in src/app_controller.py and
2 in src/gui_2.py. This commit migrates all of them to
self.submit_io(...) (the shared _io_pool wrapper from Phase 2).
After this commit there are ZERO ad-hoc threading.Thread() spawns in
src/ apart from the 5 domain-specific threads already exempt per spec:
- api_hooks.py:739 HookServer HTTP server (domain-specific)
- api_hooks.py:818 WebSocketServer (domain-specific)
- app_controller.py _loop_thread (asyncio event loop, DEDICATED)
- multi_agent_conductor.py WorkerPool (domain-specific)
- performance_monitor.py CPU monitor (continuous, domain-specific)
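The migration pattern amounts to one wrapper method. A minimal sketch, assuming submit_io simply forwards to ThreadPoolExecutor.submit. The class name ControllerSketch is hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class ControllerSketch:
    """Hypothetical minimal controller showing the submit_io wrapper."""

    def __init__(self) -> None:
        self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")

    def submit_io(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        # Before: threading.Thread(target=fn, args=args, daemon=True).start()
        # After:  the job shares the 4 pooled workers and returns a Future.
        return self._io_pool.submit(fn, *args, **kwargs)
```

The returned Future exposes .done() and .result(), which roughly correspond to polling Thread.is_alive() and calling Thread.join().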
Sites migrated (17 total: 15 in app_controller.py, 2 in gui_2.py):
app_controller.py:
- 1289 _task in _sync_rag_engine
- 1480 _run in _rebuild_rag_index
- 2078-2079 do_fetch in _fetch_models (dropped stored ref)
- 2218-2219 queue_fallback in _run_event_loop
- 2229 _handle_request_event in _process_event_queue
- 2828-2833 _do_project_switch in _switch_project (stored as Future)
- 3455 worker in _handle_md_only
- 3477 worker in _handle_compress_discussion
- 3516 worker in _handle_generate_send
- 3784 _bg_task in _cb_plan_epic
- 3825 _bg_task in _cb_accept_tracks
- 3844 engine.run in _cb_start_track (track_id case)
- 3855 engine.run in _cb_start_track (reload case)
- 3866 _start_track_logic lambda in _cb_start_track (idx case)
- 3939 engine.run in _start_track_logic
gui_2.py:
- 1129 _stats_worker in _update_context_file_stats
- 3507 worker in _check_auto_refresh_context_preview
Stored-ref migration (Phase 6 partial work):
- self.models_thread (declared L960, assigned L2078):
No external readers. Dropped the declaration and the assignment;
replaced the .start() with self.submit_io(do_fetch).
- self._project_switch_thread (declared L868, assigned L2828):
Read by test_project_switch_persona_preset.py:21 for
.is_alive() polling. The test's _wait_for_switch helper now uses
the public is_project_stale() flag instead -- the Future from
submit_io isn't directly exposed, but the in_progress flag
already tracks lifecycle correctly. Dropped the declaration;
replaced the .start() with self.submit_io(self._do_project_switch, path).
Test impact:
- test_project_switch_persona_preset.py::_wait_for_switch:
Updated to poll ctrl.is_project_stale() instead of the
_project_switch_thread attribute. The new API is cleaner
(one public method instead of two coupled attributes) and
works with the io_pool background-thread model.
Effectiveness:
- Per-spawn cost: ~1-5ms saved (thread creation)
- 4 long-lived threads eliminated; all background work now shares
the 4-worker _io_pool
- If 4 long-running jobs occupy all 4 workers at once, further
submissions queue behind them (implicit backpressure); future work
can make that backpressure explicit
TESTS: 19+39 = 58 tests touching migrated code paths all pass.
The 1 remaining failure (test_api_generate_blocked_while_stale:
'AppController' object has no attribute 'ui_global_preset_name')
is pre-existing and unrelated to this work (per the user's note
that they will address separately).
The google-genai library has a known circular-import bug in its
__init__.py chain:
google.genai/__init__.py:21: from .client import Client
-> from ._api_client import BaseApiClient
-> from .types import HttpOptions
When loaded fresh in a pytest process, the chain collides with
itself and leaves google.genai in a 'partially initialized' state.
Per the user spec (startup_speedup_20260606 spec.md:2.2 Layer 3):
"the app controller should post to test clients or the user
when its threads are warmed up with imports — that way the user
knows 'hey you have the ui first, but now you have all the
functionality.'"
This is exactly what the warmup notification system does.
Phase 2 (commit 1354679e) added the WarmupManager + _io_pool,
and the warmup list (state.toml) already includes 'google.genai'.
The AppController.__init__ submits the warmup jobs to the _io_pool
background thread. When the warmup completes, _warmup_done_event
is set and registered on_warmup_complete callbacks fire.
The previous conftest fix imported 'google.genai' DIRECTLY at
conftest module load. That bypassed the whole notification
mechanism. This commit fixes the oversight:
- Reverts the direct `import google.genai`
- Creates an AppController at conftest load time
- Calls `wait_for_warmup(timeout=60.0)` to block until the
background warmup completes
- google.genai ends up in sys.modules via the warmup's
`importlib.import_module` call (same end state, but now via
the documented mechanism)
The conftest's `from src.gui_2 import App` at line 27 is also
a heavy synchronous import chain that runs in-process. By the
time that line executes, the warmup is already in progress on
the _io_pool. The wait_for_warmup() call after that line ensures
the warmup completes before any test collects.
The AppController is session-scoped (one per pytest process).
If another fixture (e.g. live_gui) creates its own AppController
that also runs warmup, the second controller's wait_for_warmup
returns immediately because the modules are already in
sys.modules.
Cost: 60s timeout worst-case (typically completes in ~3s based on
the baseline measurement). One-time per pytest process.
Earlier alternatives I tried and rejected:
- Direct `import google.genai` in conftest: bypasses the
notification mechanism. User feedback: "you are falling back
to your jank."
- Source-level `genai = _require_warmed('google.genai')` + `.types`:
fails the same way (the library bug is in the PARENT's
__init__.py, not the leaf). The parent's __init__.py never
completes in a fresh process; once it's in the "partially
initialized" state in sys.modules, no caller pattern can fix it.
- Revert the conftest change and skip these tests: not viable,
the tests are real and important.
Three test failures identified by the batched test suite, all rooted
in the Phase 3 lazy-import refactor of src/ai_client.py.
FIX 1: UnboundLocalError in _ensure_gemini_client
- _ensure_gemini_client had a latent bug: creds was assigned inside
`if _gemini_client is None:` but used on the next line. When the
client was already cached, the assignment was skipped and the next
line raised UnboundLocalError. Moved the Client() construction
inside the if block to match creds' scope.
- This affected test_ai_cache_tracking.py and (downstream)
test_gui_updates.py::test_telemetry_data_updates_correctly.
FIX 2: Phase 3 removed top-level `import requests` from ai_client.py.
- test_discussion_compression.py::test_discussion_compression_deepseek
did `patch("src.ai_client.requests.post", ...)` which no longer works.
- Updated the test to mock _require_warmed to return a fake requests
module with `.post()`, matching the new lazy-import pattern.
FIX 3: _require_warmed could not import dotted names like `google.genai.types`
- The google-genai library has a self-referential __init__.py that
does `from .client import Client` which transitively does
`from .types import HttpOptions`. Importing `google.genai.types`
FIRST (before the parent package is fully loaded) hit a "partially
initialized module" circular import.
- Enhanced _require_warmed to pre-import parent packages for dotted
names: walks `name.split(".")` and imports each parent (if not in
sys.modules) before the leaf import. On first use this costs at most
one extra import per parent package; subsequent calls are an O(1)
sys.modules hit.
TESTS:
- test_ai_cache_tracking.py: 2/2 PASS
- test_discussion_compression.py: 4/4 PASS
- 29/29 PASS across the sampled test files that were failing
(test_subagent_summarization, test_tool_access_exclusion,
test_tier4_interceptor, test_gui2_mcp, test_gui_updates,
test_headless_service)
ARCHITECTURAL NOTE: The _require_warmed enhancement is a small
but important robustness fix. The google-genai library's
__init__.py chain is a known source of fragility; pre-importing
parent packages is the workaround adopted here.
Phase 8 of startup_speedup_20260606 track.
Part 1: app_controller.py cleanup
- Removed 'import requests' (was used in 2 places - lazy import added inside)
- Removed 'import tomli_w' (dead import; never referenced in app_controller)
- Migrated 2 threading.Thread spawns to use self.submit_io (the do_post
closures in _handle_approve_ask and _handle_reject_ask)
Part 2: Main thread purity enforcement test
- tests/test_main_thread_purity.py: 7 tests verify that the 6 refactored
files (ai_client, app_controller, commands, theme_2, markdown_helper,
gui_2) have ZERO top-level imports from the heavy denylist:
{google.genai, anthropic, openai, requests, google.genai.types,
fastapi, fastapi.security.api_key, src.command_palette,
src.theme_nerv, src.theme_nerv_fx, src.markdown_table, numpy,
tkinter, tomli_w}
This is the static enforcement (the runtime audit-hook test using
sys.addaudithook is a follow-up).
The test is RED before each refactor phase, GREEN after. If a future
commit re-introduces a heavy import in one of these files, the test
fails immediately in CI.
TESTS:
- 7/7 main thread purity tests PASS
- 15/15 log + app controller tests still PASS (no breakage from
removing requests/tomli_w imports)
Phase 5D of startup_speedup_20260606 track.
DEAD IMPORTS REMOVED (zero uses, safe to remove):
- 'import tomli_w' (line 18) - never referenced anywhere in gui_2.py
- 'from src import theme_nerv_fx as theme_fx' (line 59) - never
referenced; the actual NERV FX objects are created in src/theme_2.py
and accessed via render_post_fx()
The theme_nerv_fx removal saves the full ~254ms import of
src.theme_nerv_fx on the main thread.
LAZY PROXY PATTERN for heavy feature-gated modules:
- 'import numpy as np' (line 9) - used in 1 place (plot_lines)
- 'from tkinter import filedialog, Tk' (lines 30, 34) - duplicates
removed, 13 use sites now go through the proxy
Added a _LazyModule class that defers module loading until first
attribute access or call. The proxy is a transparent replacement:
'np.array(...)' and 'Tk()' continue to work unchanged. The import
only fires on first use, then is cached in sys.modules for O(1)
subsequent access.
ARCHITECTURAL NOTE: This is a general-purpose pattern that can be
used for any module that should not be in the main thread's import
chain. The Phase 5A 'lazy registry proxy' was a similar idea but
custom-tailored to one use case; _LazyModule is the general form.
EFFECTIVENESS (estimated from baseline):
- src.theme_nerv_fx removal: ~254ms saved
- numpy deferral: ~65ms saved (when not plotting); 0ms saved if the
user is using numpy (imgui_bundle transitively brings it in anyway)
- tkinter deferral: small but real savings (tkinter is stdlib but
still has import cost)
Note that numpy and tkinter are still brought in transitively by
imgui_bundle and other src.* modules. The test verifies the AST
(top-level imports of gui_2.py) is clean; the runtime sys.modules
check is too strict because of these transitive imports.
TESTS:
- tests/test_gui_2_no_top_level_heavy_imports.py: 5/5 PASS (all RED -> GREEN)
- 13 gui tests sampled (gui_progress, gui_paths, gui_kill_button,
gui_window_controls, gui_custom_window, gui_fast_render,
gui_startup_smoke, gui2_layout, gui2_events): all PASS
NEXT: Phase 6 (ad-hoc threads -> _io_pool), Phase 7 (warmup
notification), Phase 8 (enforcement), Phase 9 (final verify + checkpoint).
Phase 5C of startup_speedup_20260606 track.
src/markdown_helper.py imported src.markdown_table at module level:
from src.markdown_table import parse_tables, render_table
Both parse_tables and render_table are only used inside
MarkdownRenderer.render(). Removed the top-level import; the
MarkdownRenderer.render() method now does:
markdown_table = _require_warmed('src.markdown_table')
parse_tables = markdown_table.parse_tables
render_table = markdown_table.render_table
at the top of its body, before any other logic.
TESTS:
- tests/test_markdown_helper_no_top_level_table.py: 3/3 PASS (all RED -> GREEN)
- tests/test_markdown_table*.py (5 files) + test_markdown_helper_bullets.py +
test_markdown_render_robust.py: 24/24 PASS (no breakage)
EFFECTIVENESS: import src.markdown_helper no longer triggers src.markdown_table
(~250ms). For renderers that never hit a GFM table, the import is never
paid. For renderers that do, the warmup pre-loads it on _io_pool and the
render() lookup is O(1).
NEXT: Phase 5D - bulk refactor of src/gui_2.py feature-gated imports via
scripts/audit_gui2_imports.py.
Phase 5B of startup_speedup_20260606 track.
src/theme_2.py had 3 top-level NERV imports:
from src import theme_nerv
from src.theme_nerv import DATA_GREEN
from src.theme_nerv_fx import CRTFilter, AlertPulsing, StatusFlicker
And 3 module-level FX object instantiations:
_crt_filter = CRTFilter()
_alert_pulsing = AlertPulsing()
_status_flicker = StatusFlicker()
ALL removed. The 3 use sites now look up the modules via _require_warmed:
- apply() NERV branch: theme_nerv = _require_warmed('src.theme_nerv')
- ai_text_color(): theme_nerv = _require_warmed('src.theme_nerv')
(then uses theme_nerv.DATA_GREEN)
- render_post_fx(): theme_nerv_fx = _require_warmed('src.theme_nerv_fx')
(then creates FX objects locally per-call)
The _status_flicker was instantiated but never used (dead code path;
the StatusFlicker class is still importable via theme_nerv_fx but not
auto-constructed in theme_2.py).
TESTS:
- tests/test_theme_2_no_top_level_nerv.py: 4/4 PASS (all RED -> GREEN)
- tests/test_theme.py, test_theme_nerv.py, test_theme_nerv_fx.py,
test_theme_models.py: 21/21 PASS (no breakage)
EFFECTIVENESS: import src.theme_2 no longer triggers src.theme_nerv or
src.theme_nerv_fx (~485ms combined). For users on default theme, these
are NEVER loaded. For NERV users, the warmup pre-loads on _io_pool and
the lookup is O(1).
NEXT: Phase 5C (markdown table) follows same TDD pattern.
Phase 5A T5A.1-T5A.4 of startup_speedup_20260606 track.
src/commands.py was importing src.command_palette at module load to
create the CommandRegistry singleton. The 32 @registry.register
decorators on the command functions needed this registry at import time.
Approach: lazy registry proxy. The @registry.register decorator now
just queues the function in a list; the real CommandRegistry is built
on first access to any other registry attribute (.all, .get, etc.).
By that time, all 32 decorators have run and the pending list is
populated, so the real registration is complete in one pass.
src/commands.py changes:
- Removed 'from src.command_palette import CommandRegistry'
- Added 'from src.module_loader import _require_warmed'
- Added _LazyCommandRegistry class (proxy)
- Added _get_real_registry() function (initializes on first access)
- Replaced 'registry = CommandRegistry()' with 'registry = _LazyCommandRegistry()'
- The 32 @registry.register decorators are unchanged (the proxy's
register method returns the function unchanged after queueing it)
EFFECTIVENESS:
- 'import src.commands' no longer triggers src.command_palette (~244ms)
- The warmup on AppController's _io_pool pre-loads src.command_palette
on a background thread during startup
- First access to registry.all() (e.g. from gui_2.py at palette open
time) is O(1) - the warmup module is already in sys.modules
TESTS:
- tests/test_commands_no_top_level_command_palette.py: 4/4 PASS (3 RED, 1 green; now all green)
- tests/test_command_palette.py: 13/13 PASS (no breakage)
- tests/test_command_palette_sim.py: 7/7 PASS (live_gui tests, the
full palette flow works end-to-end with the lazy proxy)
ARCHITECTURAL NOTE: The lazy proxy is a minimal-change solution that
preserves the public API. The 32 decorated functions don't need any
changes; gui_2.py's 'from src.commands import registry' still works
unchanged. The deferral is invisible to consumers.
NEXT: Phase 5B (NERV theme) and 5C (markdown table) follow the same
TDD pattern. 5D is the bulk refactor of src/gui_2.py feature-gated
imports via the audit_gui2_imports.py script.
Phase 4 T4.1-T4.4 of startup_speedup_20260606 track.
DEVIATION FROM ORIGINAL SPEC: spec.md said fastapi was in src/api_hooks.py
but it was actually in src/app_controller.py (lines 17, 21). api_hooks.py
uses stdlib http.server. Phase 4 target corrected to app_controller.
LIFTED _require_warmed TO SHARED MODULE: created src/module_loader.py to
avoid duplicating the lookup logic and the cross-module import smell
(app_controller -> ai_client). src/ai_client.py re-exports it so the
T3.1 test (which asserts hasattr(src.ai_client, '_require_warmed'))
continues to work.
src/app_controller.py changes:
- Added 'from __future__ import annotations' (enables lazy type annotations;
-> FastAPI return type now a forward reference)
- Removed 'from fastapi import FastAPI, Depends, HTTPException' (line 17)
- Removed 'from fastapi.security.api_key import APIKeyHeader' (line 21)
- Added 'from src.module_loader import _require_warmed' (cross-module via
shared utility, not via ai_client)
- create_api(): added lookups at top of function body
- 7 _api_* helper functions (_api_get_key, _api_generate, _api_stream,
_api_confirm_action, _api_get_session, _api_delete_session,
_api_get_context): added 'HTTPException = _require_warmed(...).HTTPException'
at top of each function body
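The __future__ import matters because, on Python versions that evaluate annotations eagerly (before 3.14), a `-> FastAPI` annotation is evaluated when the def runs and raises NameError once the top-level fastapi import is gone. A minimal sketch follows. The `if TYPE_CHECKING:` guard is a common companion pattern, assumed here rather than taken from the commit, and the stub body is hypothetical.

```python
from __future__ import annotations  # must come before any other statement except the docstring

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed, so fastapi is not imported.
    from fastapi import FastAPI


def create_api() -> FastAPI:
    """Hypothetical stub. Under PEP 563 the annotation is kept as the string 'FastAPI'."""
    raise NotImplementedError("the real body looks up fastapi via _require_warmed")
```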
EFFECTIVENESS:
- import src.app_controller no longer triggers fastapi import (saves ~470ms
in main thread; only loaded when --enable-test-hooks is set)
- When --enable-test-hooks is set, the AppController's warmup pre-loads
fastapi on the _io_pool, so create_api()'s lookup is O(1)
TESTS:
- tests/test_app_controller_no_top_level_fastapi.py: 4/4 PASS (was 3 RED + 1 pass)
- tests/test_ai_client_no_top_level_sdk_imports.py: 9/9 still PASS (re-export works)
- tests/test_app_controller_mcp.py, test_app_controller_offloading.py: pass
- tests/test_headless_service.py: 10/11 PASS (1 pre-existing failure
test_generate_endpoint is a circular-import issue in google.genai,
reproduces identically on stashed pre-Phase-4 state - NOT a regression
from this change)
- tests/test_hooks.py: pass
NEXT: Phase 5 (feature-gated GUI module imports - command palette, NERV
theme, markdown table), then Phase 6 (ad-hoc threads -> _io_pool).
Phase 3 T3.2 + T3.3 of startup_speedup_20260606 track.
The 5 heavy SDKs (anthropic, google.genai, openai, google.genai.types,
requests) are no longer imported at module level. Each function that
needs them now calls _require_warmed(name) to get the module from
sys.modules (populated by AppController's warmup on _io_pool).
This is the load-bearing wall of the Main Thread Purity Invariant:
heavy modules are never in the main thread's import chain.
run_discussion_compression now uses _require_warmed for both
google.genai.types (gemini branch) and requests (deepseek branch).
tests/test_tier4_patch_generation.py was adapted: the 2 tests that
mocked 'src.ai_client.types' (no longer a module-level attr)
now mock 'src.ai_client._require_warmed' (the new lookup mechanism).
T3.1 tests now pass (9/9). T3.3 breakage fixed.
All 25 ai_client + tier4 tests pass.
Phase 3 Task T3.1 of startup_speedup_20260606 track. 9 tests assert:
- import src.ai_client does NOT trigger google.genai / anthropic /
openai / requests / google.genai.types imports (the main thread
must not load these on import; they're warmed on _io_pool)
- _require_warmed(name) helper exists and is callable
- _require_warmed returns the cached module if already in sys.modules
- _require_warmed falls back to importlib for tests/dev where
warmup didn't run
- The static audit script does not see src/ai_client.py as a
contributor of heavy-import violations
All 9 tests are currently FAILING (RED). They will turn GREEN when
T3.2 (the actual refactor of src/ai_client.py to remove top-level
imports and add _require_warmed) lands.
The implementation is held pending MCP client fix (per user instruction).
Phase 2 Tasks T2.1-T2.4 of the startup_speedup_20260606 track.
NEW: src/io_pool.py
make_io_pool() factory: 4-worker ThreadPoolExecutor with
thread_name_prefix='controller-io'. The sanctioned way for any
background work. Replaces ad-hoc threading.Thread() calls per
the 'no new threads' rule.
NEW: src/warmup.py
WarmupManager: manages a list of modules to import on the shared
pool. Public API:
.submit(modules) - start warmup (call once)
.status() - {pending, completed, failed}
.is_done() - bool
.wait(timeout) - block until done
.on_complete(callback) - register completion callback
.reset() - clear state
Thread-safe (lock-guarded). 10 tests cover all paths.
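The pool factory and manager API above can be condensed into the following sketch. It is not the src/io_pool.py or src/warmup.py source. Several choices are assumptions: failed maps module name to error text, reset() keeps registered callbacks, and wait() wakes only after the callbacks have run.

```python
import importlib
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def make_io_pool() -> ThreadPoolExecutor:
    # Worker threads are named controller-io_0 .. controller-io_3.
    return ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")


class WarmupManager:
    """Condensed sketch of the public API listed above."""

    def __init__(self, pool: ThreadPoolExecutor) -> None:
        self._pool = pool
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._callbacks: List[Callable[[dict], None]] = []
        self.reset()

    def reset(self) -> None:
        with self._lock:  # callbacks survive reset; only progress state is cleared
            self._pending: List[str] = []
            self._completed: List[str] = []
            self._failed: Dict[str, str] = {}
            self._finished = False
            self._done.clear()

    def submit(self, modules: List[str]) -> None:
        with self._lock:
            self._pending = list(modules)
        if not modules:
            self._finish()
        for name in modules:
            self._pool.submit(self._import_one, name)

    def _import_one(self, name: str) -> None:
        error: Optional[str] = None
        try:
            importlib.import_module(name)
        except Exception as exc:  # recorded in status(), never raised into the pool
            error = repr(exc)
        with self._lock:
            self._pending.remove(name)
            if error is None:
                self._completed.append(name)
            else:
                self._failed[name] = error
            last = not self._pending
        if last:
            self._finish()

    def _finish(self) -> None:
        with self._lock:
            self._finished = True
            callbacks = list(self._callbacks)
        status = self.status()
        for callback in callbacks:
            callback(status)
        self._done.set()  # waiters wake only after callbacks have run

    def status(self) -> dict:
        with self._lock:
            return {"pending": list(self._pending),
                    "completed": list(self._completed),
                    "failed": dict(self._failed)}

    def is_done(self) -> bool:
        return self._done.is_set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        return self._done.wait(timeout)

    def on_complete(self, callback: Callable[[dict], None]) -> None:
        with self._lock:
            if not self._finished:
                self._callbacks.append(callback)
                return
        callback(self.status())  # already finished: fire immediately
```

Checking and appending under the same lock that _finish uses to flip _finished guarantees each callback fires exactly once, whether it registers before or after completion.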
NEW: tests/test_io_pool.py (4 tests):
- ThreadPoolExecutor returned
- 4 workers
- Threads named 'controller-io_*'
- Jobs run in parallel (barrier test)
NEW: tests/test_warmup.py (10 tests):
- One job per module submitted
- Initial pending list correct
- Failed imports tracked
- Done event set after all complete
- wait() blocks until done
- on_complete callback fires (and immediately if already done)
- Modules actually end up in sys.modules
- reset() clears state
- Jobs run concurrently (not serially)
All 14 tests pass. AppController integration is the next commit.
Phase 1, Tasks T1.2 + T1.4 of the startup_speedup_20260606 track.
NEW: scripts/audit_main_thread_imports.py
Static CI gate that AST-walks the import graph reachable from
sloppy.py and fails (exit 1) if any heavy module is imported at the
top of a main-thread-reachable file. Walks into if/elif/else and
try/except branches (which run at import time) but skips function
bodies (which only run when called). Allowlist: stdlib + the lean
gui_2 skeleton (imgui_bundle, defer, src.imgui_scopes, src.theme_2,
src.theme_models, src.paths, src.models, src.events).
NEW: scripts/audit_gui2_imports.py
Read-only analysis tool that lists every top-level and function-level
import in src/gui_2.py, classified by location. Used in Phase 5D to
identify which imports to remove.
NEW: tests/test_audit_main_thread_imports.py
9 tests covering: --help exits 0, clean stdlib-only passes, heavy
third-party fails, google.genai fails, transitive walks, function-
body imports ignored, if-branch imports flagged, try-block imports
flagged, file:line reported. All 9 pass.
NEW: docs/reports/startup_baseline_20260606.txt
3-run median cold-start benchmark. Worst offenders: src.gui_2
(1770ms), simulation.user_agent (1517ms), google.genai (1001ms),
openai (482ms), anthropic (441ms), imgui_bundle (255ms),
src.theme_nerv* (485ms combined), src.markdown_table (243ms),
src.command_palette (242ms).
NEW: docs/reports/startup_audit_20260606.txt
Audit output on the CURRENT codebase. Reports 67 violations across
the main-thread import graph (incl. numpy in src/gui_2.py:9,
tomli_w in src/gui_2.py:18, fastapi + requests in src/app_controller,
tree_sitter_* in src/file_cache, pydantic in src/models, plus all
the src.* subsystem imports that drag in heavy transitive deps).
Phase 3-5 of the track will resolve these one by one.
After Phase 3-5, this audit must exit 0 (no violations).
Co-located reports in docs/reports/ per project convention; the other
agent finished their work in docs/superpowers/ and is unrelated.
Lightweight, in-memory profiler for AppController init phases. Used by
the startup_speedup_20260606 track to measure where the time goes
during boot (config hydration, hook server start, subsystem init, etc.).
The profiler is exposed via /api/startup_profile (Phase 8 work) and
the Diagnostics panel so the user can see the exact per-phase cost.
Public API:
StartupProfiler() - create
.phase(name) - context manager
.snapshot() - {phases: {name: {start_ts, duration_ms}}, total_ms, count}
.reset() - clear recorded phases
.enable() / .disable() - toggle recording
Implementation:
- dataclass with list of _Phase(name, start_ts, end_ts)
- @contextmanager records wall-clock via time.perf_counter
- records duration even if the body raises (try/finally)
- snapshot is a copy, so consumers can't mutate the live state
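The implementation notes above translate into roughly the following sketch. Two details are assumptions. total_ms is the sum of phase durations rather than the wall-clock span. A repeated phase name keeps only its last entry in the phases dict, although count includes every phase.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class _Phase:
    name: str
    start_ts: float
    end_ts: float


@dataclass
class StartupProfiler:
    enabled: bool = True
    _phases: List[_Phase] = field(default_factory=list, repr=False)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        if not self.enabled:
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            # Runs even if the body raised, so the phase is still recorded.
            self._phases.append(_Phase(name, start, time.perf_counter()))

    def snapshot(self) -> dict:
        # Built fresh on every call, so callers cannot mutate live state.
        phases = {
            p.name: {"start_ts": p.start_ts, "duration_ms": (p.end_ts - p.start_ts) * 1000.0}
            for p in self._phases
        }
        total_ms = sum((p.end_ts - p.start_ts) * 1000.0 for p in self._phases)
        return {"phases": phases, "total_ms": total_ms, "count": len(self._phases)}

    def reset(self) -> None:
        self._phases.clear()

    def enable(self) -> None:
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False
```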
TDD: 5 tests in tests/test_startup_profiler.py cover: basic
recording, total math, snapshot isolation, exception safety, empty
state.
Track.get_executable_tickets (in models.py) called TrackDAG at
runtime, forcing a top-level import of src.dag_engine into models.py
and creating a 2-cycle that broke whichever module loaded second
(Ticket was not yet defined when models.py loaded first; TrackDAG
was not yet defined when dag_engine.py loaded first).
Fix: hoist the method out of the Track dataclass and into a free
function get_executable_tickets(track) in dag_engine.py. models.py
no longer needs TrackDAG at all, so the cycle is one-directional
(models -> dag_engine) and resolves cleanly in any import order.
Tests updated:
- tests/test_mma_models.py: import get_executable_tickets and call
it instead of track.get_executable_tickets() (4 call sites)
- tests/test_conductor_engine_v2.py: comment update
Verified both import orders resolve cleanly:
forward: import src.models; import src.dag_engine -> OK
reverse: import src.dag_engine; import src.models -> OK
34 tests pass (test_mma_models, test_dag_engine, test_execution_engine,
test_arch_boundary_phase3, test_track_state_schema).
When switching projects, the previous implementation ran the entire
save/load/refresh sequence on the main thread. With large project files
or slow disks, this caused the UI to freeze for several seconds.
Fix:
- _switch_project now returns immediately after setting flags; the
actual work runs in a daemon thread (_do_project_switch)
- New is_project_stale() property returns True while a switch is queued
or running; the GUI renders an amber/yellow tint overlay to signal
the controller state lags the user's last click
- AI ops are gated: _api_generate returns HTTP 409, _handle_generate_send
and _handle_md_only early-return with ai_status feedback, all when
is_project_stale() is true
- Queued switches (clicking project A then B in rapid succession) are
coalesced: B replaces A as the target; once A completes, B is
triggered automatically via the finally branch in _do_project_switch
- New state fields: _project_switch_in_progress, _project_switch_pending_path,
_project_switch_thread, _project_switch_lock
- AppController state class attributes use a hasattr guard for _app,
keeping the controller usable standalone in tests/headless mode
UX:
- Render loop keeps drawing during the switch
- User can still scroll, switch tabs, browse files
- Amber tint + popup explains what's happening and that AI ops are paused
- ai_status shows the target project name
Tests:
- _wait_for_switch helper added for the new async switch flow
- All 7 existing switch tests updated to call _wait_for_switch
- 2 new tests:
- test_switch_project_non_blocking: verifies _switch_project returns
in <0.2s and is_project_stale() is True during the switch
- test_api_generate_blocked_while_stale: verifies _api_generate
raises HTTPException(409) while a switch is in progress
All 33 related tests pass.
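A condensed sketch of the coalescing switch, assuming a simplified controller. The field and method names mirror the ones listed above, but the class name, active_project, loaded, and the time.sleep stand-in for the save/load/refresh work are hypothetical. The 409 gating and the amber overlay are left out.

```python
import threading
import time
from typing import List, Optional


class ProjectSwitchSketch:
    """Hypothetical stand-in for the AppController project-switch state."""

    def __init__(self, load_seconds: float = 0.05) -> None:
        self._project_switch_lock = threading.Lock()
        self._project_switch_in_progress = False
        self._project_switch_pending_path: Optional[str] = None
        self._project_switch_thread: Optional[threading.Thread] = None
        self._load_seconds = load_seconds
        self.active_project: Optional[str] = None
        self.loaded: List[str] = []  # switches that actually ran, in order

    def is_project_stale(self) -> bool:
        with self._project_switch_lock:
            return (self._project_switch_in_progress
                    or self._project_switch_pending_path is not None)

    def _switch_project(self, path: str) -> None:
        """Return immediately; the heavy work runs on a daemon thread."""
        with self._project_switch_lock:
            if self._project_switch_in_progress:
                # Coalesce: the newest click replaces any earlier queued target.
                self._project_switch_pending_path = path
                return
            self._project_switch_in_progress = True
        self._start_worker(path)

    def _start_worker(self, path: str) -> None:
        t = threading.Thread(target=self._do_project_switch, args=(path,), daemon=True)
        self._project_switch_thread = t
        t.start()

    def _do_project_switch(self, path: str) -> None:
        next_path: Optional[str] = None
        try:
            time.sleep(self._load_seconds)  # stands in for save/load/refresh
            self.active_project = path
            self.loaded.append(path)
        finally:
            with self._project_switch_lock:
                next_path = self._project_switch_pending_path
                self._project_switch_pending_path = None
                # Stay "in progress" if another switch was queued meanwhile,
                # so is_project_stale() never flickers to False in between.
                self._project_switch_in_progress = next_path is not None
            if next_path is not None:
                self._start_worker(next_path)
```

In this sketch three rapid calls (A, B, C) all return immediately; B is overwritten by C while A is still loading, and A's finally branch then starts C, so only A and C actually run.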
When switching projects, the previous project's context_files remained
visible in the Context Composition panel because the controller's
self.context_files list was not reloaded from the new project's TOML
files.paths entry.
Fix in _refresh_from_project:
- After loading self.files from the project TOML, populate
self.context_files with deep copies of those FileItem objects
- Reset self._app.ui_selected_context_files to match the new project's
auto_aggregate set
- Guard the _app access with hasattr so the controller is usable
standalone (in tests, headless mode, etc.) without an attached App
Test: 1 new test in tests/test_project_switch_persona_preset.py
- test_switch_project_resets_context_files: switches from project_a
(forth + gte_hello files) to project_b (gencpp timing files) and
asserts context_files contains ONLY project_b's files
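A sketch of the context-files part of _refresh_from_project. It assumes FileItem carries a path and an auto_aggregate flag and that ui_selected_context_files is a set of paths; FileItem's shape, ControllerSketch and _refresh_context_files are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class FileItem:  # hypothetical shape of the real FileItem
    path: str
    auto_aggregate: bool = True


class ControllerSketch:
    def __init__(self) -> None:
        self.files: List[FileItem] = []
        self.context_files: List[FileItem] = []

    def _refresh_context_files(self) -> None:
        # Deep copies: editing a context entry must not mutate self.files.
        self.context_files = [copy.deepcopy(f) for f in self.files]
        # Standalone/headless controllers have no attached App, so guard _app.
        if hasattr(self, "_app"):
            self._app.ui_selected_context_files = {
                f.path for f in self.context_files if f.auto_aggregate
            }
```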
Two fixes: one for the regression introduced in b92daef3, and one
hardening against the persona->context_preset stale-reference class of bug:
1. Regression: persona_manager was missing on first project load.
_load_active_project creates preset_manager and tool_preset_manager
but did not create persona_manager, so the new
self.personas = self.persona_manager.load_all() line in
_refresh_from_project raised AttributeError on app startup before
the post-_load_active_project persona_manager creation could run.
Fix: create self.persona_manager in _load_active_project alongside
the other managers, so the manager is available when
_refresh_from_project runs.
2. Stale reference: persona's context_preset field pointed to a
preset (e.g. 'GTE') that no longer exists in the project, causing
load_context_preset to raise KeyError and crash the persona
selector panel (which triggered the cascading 'Missing End()' imgui
assertion).
Fix: wrap the load_context_preset call in render_persona_selector_panel
with try/except KeyError, surface the error in app.ai_status, and
clear app.ui_active_context_preset to keep the GUI state consistent.
Tests: 2 new tests in tests/test_project_switch_persona_preset.py
- test_load_active_project_creates_persona_manager (regression guard)
- test_load_context_preset_missing_raises_keyerror (verifies the
contract that load_context_preset raises for missing names; the
GUI layer is now responsible for catching the error)
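A sketch of the guard in render_persona_selector_panel, factored into a hypothetical helper, apply_persona_context_preset. It assumes load_context_preset raises KeyError for unknown names (the contract the second test pins down) and that "clearing" ui_active_context_preset means setting it to None.

```python
from typing import Any, Callable, Optional


def apply_persona_context_preset(
    app: Any,
    load_context_preset: Callable[[str], Any],
    preset_name: Optional[str],
) -> bool:
    """Load the persona's context preset; degrade gracefully if it is gone."""
    if not preset_name:
        return False
    try:
        load_context_preset(preset_name)  # raises KeyError for unknown names
    except KeyError:
        # Report instead of propagating: an exception escaping mid-panel would
        # skip the panel's End() calls and cascade into ImGui assertions.
        app.ai_status = f"Context preset '{preset_name}' not found in this project"
        app.ui_active_context_preset = None
        return False
    app.ui_active_context_preset = preset_name
    return True
```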
When switching projects, the previous project's project-specific persona and
presets remained selected in the AI Settings panel because:
1. self.personas was not reloaded after switching project root
2. self.ui_active_persona / tool_preset / bias_profile / project_preset_name
were not validated against the newly-loaded personas/presets
Fix:
- Reload self.personas from self.persona_manager in _refresh_from_project
- Validate each active selection and reset to None/empty if it does not
exist in the newly-loaded manager dictionaries
- Push the active tool preset and bias profile to ai_client after the swap
- Initialize self.ui_active_bias_profile in class attribute block (was only
set later in __init__, causing AttributeError on direct attribute access)
Tests: 4 new tests in tests/test_project_switch_persona_preset.py verify
the reset behavior for persona, preset, tool preset, and global preset
preservation.
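The validation step reduces to "keep a selection only if the freshly loaded dictionary still contains it". A sketch, with keep_if_available and validate_active_selections as hypothetical helpers and the attribute names assumed from the list above:

```python
from typing import Any, Mapping, Optional


def keep_if_available(selection: Optional[str], available: Mapping[str, Any]) -> Optional[str]:
    """Return selection if the newly loaded catalog still has it, else None."""
    # Global presets survive naturally, provided the manager merges them into
    # the dictionary it returns for the new project.
    if selection is not None and selection in available:
        return selection
    return None


def validate_active_selections(ui: Any, catalogs: Mapping[str, Mapping[str, Any]]) -> None:
    """catalogs maps a ui attribute name (e.g. 'ui_active_persona') to its catalog."""
    for attr, available in catalogs.items():
        setattr(ui, attr, keep_if_available(getattr(ui, attr, None), available))
```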
ROOT CAUSE: imgui_md (mekhontsev/imgui_md) BLOCK_P does NOT call ImGui::NewLine()
when m_list_stack is non-empty (verified in imgui_md.cpp). So a multi-paragraph
list item like:
    - bullet text (long, wraps to 2 lines)

      continuation paragraph
renders BOTH paragraphs at the same Y because the second BLOCK_P enters/exits
without advancing the cursor. The continuation crashes into the previous
paragraph's last wrapped line.
FIX: Add MarkdownRenderer._normalize_list_continuations preprocessor that
strips blank lines between a list item and its indented continuation. The
continuation then becomes a lazy continuation of the first paragraph (single
BLOCK_P in imgui_md, proper text wrapping, no overlap). Trade-off: users
cannot have separate paragraphs within a single list item. Acceptable.
Also: fixed a pre-existing bug in _normalize_nested_list_endings where a
duplicate conditional caused the function to return an empty string (the
out.append(line) was inside the wrong scope). It had been silently
corrupting all list content since fd5f4d0e.
TESTS: 23/23 markdown unit tests pass. 3 new tests for the new preprocessor
covering: blank-strip case, blank-preservation case, simple-list passthrough.
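A line-based sketch of the preprocessor as a standalone function (the real one is a MarkdownRenderer method). The list-marker regex and the rule "drop a blank run only when it sits inside a list item and the next line is indented and is not itself a list item" are assumptions about how the real code decides.

```python
import re

# A list-item line: optional indent, then "-", "*", "+", "1." or "1)", then whitespace.
_LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def normalize_list_continuations(text: str) -> str:
    """Drop blank lines between a list item and its indented continuation.

    The continuation then becomes a lazy continuation of the item's first
    paragraph, so imgui_md sees a single BLOCK_P.
    """
    lines = text.split("\n")
    out = []
    in_list_item = False  # inside a list item (marker line or its continuation)?
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip():
            if _LIST_ITEM.match(line):
                in_list_item = True
            elif not line[:1].isspace():
                in_list_item = False  # unindented text ends the list item
            out.append(line)
            i += 1
            continue
        # Blank run: look ahead to the next non-blank line.
        j = i
        while j < len(lines) and not lines[j].strip():
            j += 1
        nxt = lines[j] if j < len(lines) else ""
        is_continuation = (
            in_list_item and nxt[:1].isspace() and not _LIST_ITEM.match(nxt)
        )
        if not is_continuation:
            out.extend(lines[i:j])  # keep the blank run unchanged
        i = j
    return "\n".join(out)
```

The three cases mirror the tests named above: a blank before an indented continuation is stripped, a blank outside a list is preserved, and a plain list passes through untouched.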
FIX 1 (src/markdown_table.py): Cells now use imgui_md.render(c) instead of
imgui.text_wrapped(c). imgui_md uses MD4C, which parses backtick-delimited
inline code spans and drops the backticks BEFORE rendering, so backticks no
longer appear as literal characters in cell content. Side benefit: inline emphasis
(*foo*, **bar**) now renders in cells too.
FIX 2 (src/markdown_helper.py): Added MarkdownRenderer._normalize_nested_list_endings.
Upstream imgui_md (mekhontsev/imgui_md) BLOCK_UL exit only calls
ImGui::NewLine() for top-level list endings. For nested list endings, no
NewLine is emitted, so the next text starts at the same Y as the last
list item, causing visual overlap. The preprocessor inserts a blank
line before any line that immediately follows a list item indented MORE
deeply than that line, forcing a paragraph break. Cannot fix the C++ from Python.
Tests:
- test_markdown_table_wrapped.py: updated to assert imgui_md.render is
called for cell content (not imgui.text_wrapped).
- test_markdown_helper_bullets.py: added 4 tests for the new preprocessors
(nested-list blank insertion + bullet delimiter conversion + edge cases).
20/20 markdown unit tests pass. 1-space indentation throughout.
KNOWN LIMITATIONS (cannot fix without forking imgui_md C++):
- Inline code spans render as plain text (no monospace font in cells)
- The ' * ' bullet delimiter has a Y-overlap bug upstream
(workaround: pre-convert to '- ' via _normalize_bullet_delimiters)
- Nested list ending overlap (workaround: insert blank line via
_normalize_nested_list_endings)
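A sketch of the FIX 2 rule as a standalone function, assuming "list item" means a line starting with optional indentation plus -, *, + or an ordered marker. It follows the rule as stated: any line whose indent is smaller than that of the list item directly above it gets a blank line in front, which also covers a sibling top-level item after a nested one.

```python
import re

# Captures the indentation in front of a list marker ("-", "*", "+", "1.", "1)").
_LIST_ITEM = re.compile(r"^(\s*)(?:[-*+]|\d+[.)])\s+")


def normalize_nested_list_endings(text: str) -> str:
    """Insert a blank line when a line is less indented than the list item above it.

    The blank line forces a paragraph break, standing in for the NewLine that
    imgui_md does not emit when a nested list ends.
    """
    out = []
    prev_item_indent = None  # indent of the previous line if it was a list item
    for line in text.split("\n"):
        if line.strip():
            indent = len(line) - len(line.lstrip())
            if prev_item_indent is not None and prev_item_indent > indent:
                out.append("")
        # Keep this append at loop level for every line; nesting it under a
        # conditional is how a preprocessor like this ends up returning "".
        out.append(line)
        m = _LIST_ITEM.match(line)
        prev_item_indent = len(m.group(1)) if m else None
    return "\n".join(out)
```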
Table fix (src/markdown_table.py):
- Add TableColumnFlags_.width_stretch to each table_setup_column call
(was missing — columns had no width to wrap against, so text_wrapped
couldn't grow row height → all rows squished together)
- Remove the explicit for-h-in-headers: table_next_column + text_wrapped(h)
loop. table_headers_row() already renders the header from the
table_setup_column() names; the explicit loop was drawing it AGAIN on
top → double-rendered header rows.
Bullet fix (src/markdown_helper.py):
- Revert _render_md_no_bullet_overlap → simple imgui_md.render(chunk);
imgui.spacing() (the original af0bbe97 approach). The complex
workaround was stripping '- ' and rendering stripped text to imgui_md,
which double-rendered '- 1. ...' content (imgui.bullet from my code +
numbered list marker from imgui_md).
- Add MarkdownRenderer._normalize_bullet_delimiters: regex-converts
'* ' markers to '- ' before passing to imgui_md. This works around
the upstream bug in mekhontsev/imgui_md BLOCK_LI where the '*' case
calls ImGui::Bullet() without ImGui::SameLine(), causing the bullet
to render on its own Y with the text on the next Y. The '-' case
uses Text+SameLine which is correct. Cannot fix from Python (we
can't subclass the C++ class) — pre-conversion is the cheapest fix.
Tests:
- test_markdown_table_wrapped.py: updated to assert new behavior
(text_wrapped count == cell count, not header+cell).
- test_markdown_table_columns.py: updated to assert exactly 6
table_next_column calls (cells only, not 9).
- test_markdown_helper_bullets.py: rewrote for new public-API behavior
(imgui_md.render called with the unstripped chunk).
16/16 markdown unit tests pass.
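A regex sketch of _normalize_bullet_delimiters as a free function. The pattern is an assumption, not the committed one: it only rewrites a '*' at the start of a line (after spaces or tabs) when whitespace follows, so *emphasis* at line start is left alone.

```python
import re

# '*' at line start (after optional spaces/tabs) followed by a space or tab.
_STAR_BULLET = re.compile(r"^([ \t]*)\*(?=[ \t])", re.MULTILINE)


def normalize_bullet_delimiters(text: str) -> str:
    """Rewrite '* ' bullet markers as '- ' before handing text to imgui_md."""
    return _STAR_BULLET.sub(r"\1-", text)
```

This naive version would also rewrite '*' bullets inside fenced code blocks and turn a '* * *' thematic break into a list item; a production version would need to skip those.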
ROOT CAUSE: src/markdown_table.py:render_table was missing
imgui.table_setup_column() calls. In ImGui, columns MUST be
configured via table_setup_column before table_headers_row is called.
Without it, the table has no defined columns, causing cells to
render at overlapping Y positions. This manifested as text overlap
in the Discussion Hub's read_mode entries (e.g., 'swc2 -> gte_sw'
overlapping the line above it).
FIX: Call imgui.table_setup_column(h, TableColumnFlags_.width_stretch)
for each header BEFORE table_headers_row(). Each column now has a
defined width (stretch = fills available space) and cells render
correctly without overlap.
Tests:
- New test_markdown_table_columns.py asserts setup_column is called
once per column and table_next_column is called for each cell.
- 16/16 broad regression pass (test_markdown_table,
test_markdown_table_render, test_markdown_render_robust,
test_gen_send_empty_context, test_gui_fast_render)
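A sketch of the call order the two table commits converge on. The imgui module is passed in so the function can be exercised with a mock, as the tests do; render_cell stands in for imgui.text_wrapped (this fix) or imgui_md.render (the later one). The function name and table_id are hypothetical.

```python
from typing import Any, Callable, Sequence


def render_table(
    imgui: Any,
    headers: Sequence[str],
    rows: Sequence[Sequence[str]],
    render_cell: Callable[[str], None],
    table_id: str = "##md_table",
) -> None:
    """Draw a markdown table: declare columns, then the header row, then body cells."""
    if not headers:
        return
    if not imgui.begin_table(table_id, len(headers)):
        return  # end_table must only be called when begin_table returned true
    # 1. Declare every column before the header row; width_stretch gives each
    #    column a share of the available width to wrap against.
    for h in headers:
        imgui.table_setup_column(h, imgui.TableColumnFlags_.width_stretch)
    # 2. The header row is drawn from the names above; no manual header loop.
    imgui.table_headers_row()
    # 3. Body cells only: one table_next_column per cell.
    for row in rows:
        imgui.table_next_row()
        for cell in row:
            imgui.table_next_column()
            render_cell(cell)
    imgui.end_table()
```

Inside a frame this would be called as render_table(imgui, headers, rows, imgui_md.render), with imgui and imgui_md imported from imgui_bundle.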
ROOT CAUSE: The ListClipper in render_prior_session_view was being
tripped up by the variable heights of discussion entries (huge system
prompts vs small tool results). When the first entry was very tall
(system prompt), the clipper would compute the visible range assuming
uniform item heights, leading to underflow/overflow on subsequent
items. The user saw only the first ~8 entries with massive empty
space below ('early clipping').
FIX: Replace the ListClipper with a direct for loop over
app.prior_disc_entries. With 233 entries, performance is acceptable
and each entry renders correctly. The user can still scroll the
parent imscope.child window if content overflows.
Tests:
- Updated test_prior_session_no_clipping.py to set entries on
app_instance.controller.prior_disc_entries (the App's __getattr__
proxies attribute reads to the controller, so the set must go to
the controller directly).
- 28/28 broad regression pass
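A simplified numeric model of why a uniform-height estimate misbehaves. This is not ImGui's actual ListClipper code: it assumes the clipper measures only the first entry and extrapolates that height to every row. The 1200 px and 20 px heights are made up; 233 matches the entry count above.

```python
from itertools import accumulate
from typing import List, Tuple


def uniform_clipper_range(heights: List[float], scroll_y: float, view_h: float) -> Tuple[int, int]:
    """Visible [start, end) if every item is assumed to be as tall as item 0."""
    item_h = heights[0]
    start = int(scroll_y // item_h)
    end = min(len(heights), int((scroll_y + view_h) // item_h) + 1)
    return start, end


def true_visible_range(heights: List[float], scroll_y: float, view_h: float) -> Tuple[int, int]:
    """Visible [start, end) computed from each item's real top and bottom."""
    tops = [0.0, *accumulate(heights)]  # tops[i] is the y of item i
    visible = [
        i for i in range(len(heights))
        if tops[i] < scroll_y + view_h and tops[i + 1] > scroll_y
    ]
    return (visible[0], visible[-1] + 1) if visible else (0, 0)
```

With one 1200 px system prompt followed by 232 entries of 20 px, the uniform model reserves 233 * 1200 = 279,600 px of virtual height for 5,840 px of real content, and at scroll 1200 with a 600 px viewport it selects one row where thirty are actually on screen. Drawing every entry directly sidesteps the estimate entirely.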
ROOT CAUSE: render_comms_history_panel had imgui.end_child() nested INSIDE
an 'if app._scroll_comms_to_bottom:' block at line 3758. When
_scroll_comms_to_bottom was False (the common case), end_child was
NOT called, leaving the comms_scroll child window open. This caused
the imGui state to corrupt: tab_item.end_tab_item, tab_bar.end_tab_bar,
and the outer window.end all saw that the child was still open
(WithinEndChildID was set), triggering 'Must call EndChild() and not
End()!' assertion.
FIX: Convert the entire comms_scroll block to imscope.child (which uses
Python's with statement for exception-safe end_child). The scroll-to-bottom
logic is now correctly nested INSIDE the with block, and there's no
manual end_child to forget.
Tests:
- Updated test_comms_scroll_no_clipping.py to check imscope.child
instead of begin_child
- 28/28 broad regression pass
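A sketch of the pattern imscope.child provides, written as a contextlib context manager. child_scope and render_comms_scroll are hypothetical names, the real imscope.child signature may differ, and the imgui module is injected (assuming imgui_bundle-style begin_child/end_child/set_scroll_here_y) so the flow can be checked with a mock.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Sequence, Tuple


@contextmanager
def child_scope(imgui: Any, str_id: str, size: Tuple[float, float] = (0.0, 0.0)) -> Iterator[bool]:
    """begin_child/end_child pair whose end_child always runs."""
    visible = imgui.begin_child(str_id, size)
    try:
        yield visible
    finally:
        # end_child is required for every begin_child, regardless of what
        # begin_child returned and whether the body raised.
        imgui.end_child()


def render_comms_scroll(
    imgui: Any, app: Any, entries: Sequence[str], render_entry: Callable[[str], None]
) -> None:
    with child_scope(imgui, "comms_scroll") as visible:
        if visible:
            for entry in entries:
                render_entry(entry)
        # Scroll-to-bottom sits INSIDE the scope, so it cannot swallow end_child.
        if app._scroll_comms_to_bottom:
            imgui.set_scroll_here_y(1.0)
            app._scroll_comms_to_bottom = False
```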
ROOT CAUSE: render_heavy_text (called per comms panel entry) had
manual begin_child/end_child pairs. If anything inside the child
(especially markdown_helper.render) raised, end_child was skipped.
The child window was left open, corrupting the ImGui state. The
corruption cascaded through tab_item.end_tab_item -> tab_bar.end_tab_bar
-> window.end, triggering the 'Must call EndChild() and not End()!' assertion.
FIX: Convert the inner begin_child/end_child pair to imscope.child so
the end_child is automatically called by Python's with statement, even
on exception. Also convert prior_scroll to imscope.child for consistency.
TESTS:
- Existing test_comms_no_extraneous_pop.py: push/pop balance check
- Updated test_prior_session_no_clipping.py to match new imscope.child
signature
- 28/28 broad regression pass
ROOT CAUSE: In a previous fix (df7bda6e 'explicit child size for
comms_scroll and prior_scroll'), the code that pushed a child_bg style
color at the start of render_comms_history_panel was removed when the
section was rewritten to use imgui.get_content_region_avail() for
explicit child sizing. However, the matching pop_style_color at the end
of the function (guarded by 'if app.is_viewing_prior_session') was left
in place.
RESULT: When viewing a prior session, the imscope.style_color in
_gui_func pushes 1 color at the start of the frame, then the orphaned
pop in render_comms_history_panel pops that color off the ImGui style
stack, and then _gui_func's imscope __exit__ tries to pop again,
triggering IM_ASSERT 'PopStyleColor() too many times!'.
This caused a cascade of ImGui state corruption on every frame after
loading a prior session log, manifesting as 'too many times' assertions
on the next frame and 'Must call EndChild() and not End()' once the
style stack underflowed.
FIX: Remove the orphan pop_style_color at gui_2.py:3761. No matching
push exists, so the pop is unconditionally wrong.
TESTS:
- New test_comms_no_extraneous_pop.py asserts push/pop balance in
render_comms_history_panel when is_viewing_prior_session is True
- 43/43 broad regression pass
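A minimal fake of the style-color stack that reproduces the failure mode in isolation. StyleColorStack, style_color, run_frame and the two panel functions are hypothetical stand-ins for ImGui's color stack, imscope.style_color, _gui_func and render_comms_history_panel.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class StyleColorStack:
    """Tiny fake of ImGui's style-color stack that asserts like IM_ASSERT does."""

    def __init__(self) -> None:
        self.depth = 0

    def push_style_color(self, idx: int, col: object) -> None:
        self.depth += 1

    def pop_style_color(self, count: int = 1) -> None:
        if count > self.depth:
            raise AssertionError("PopStyleColor() too many times!")
        self.depth -= count


@contextmanager
def style_color(stack: StyleColorStack, idx: int, col: object) -> Iterator[None]:
    stack.push_style_color(idx, col)
    try:
        yield
    finally:
        stack.pop_style_color()


def run_frame(stack: StyleColorStack, panel: Callable[[StyleColorStack], None]) -> None:
    """Mimic _gui_func: one scoped push around the panel render."""
    with style_color(stack, 0, "child_bg"):
        panel(stack)


def buggy_panel(stack: StyleColorStack) -> None:
    stack.pop_style_color()  # orphan pop: no matching push in this function


def fixed_panel(stack: StyleColorStack) -> None:
    pass  # orphan pop removed
```

A balance test in this style only needs to run one frame and check that depth returns to zero.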