Phase 5C of startup_speedup_20260606 track.
src/markdown_helper.py imported src.markdown_table at module level:
from src.markdown_table import parse_tables, render_table
Both parse_tables and render_table are only used inside
MarkdownRenderer.render(). Removed the top-level import; the
MarkdownRenderer.render() method now does:
markdown_table = _require_warmed('src.markdown_table')
parse_tables = markdown_table.parse_tables
render_table = markdown_table.render_table
at the top of its body, before any other logic.
TESTS:
- tests/test_markdown_helper_no_top_level_table.py: 3/3 PASS (all RED -> GREEN)
- tests/test_markdown_table*.py (5 files) + test_markdown_helper_bullets.py +
test_markdown_render_robust.py: 24/24 PASS (no breakage)
EFFECTIVENESS: import src.markdown_helper no longer triggers src.markdown_table
(~250ms). For renderers that never hit a GFM table, the import is never
paid. For renderers that do, the warmup pre-loads it on _io_pool and the
render() lookup is O(1).
NEXT: Phase 5D - bulk refactor of src/gui_2.py feature-gated imports via
scripts/audit_gui2_imports.py.
Track + metadata + state + tracks.md registration for the Fleury-pattern
error handling refactor.
Key design decisions (per user approval):
- Option A for _send_<vendor>() handling: rename to _send_<vendor>_result()
and change return type to Result[str] (contained to internal callers).
- send() is marked @typing_extensions.deprecated; send_result() is the new
public API.
- ProviderError exception is FULLY REPLACED by ErrorInfo dataclass
(a value, not an exception).
- 5 phases: foundation, mcp_client, ai_client, rag_engine, deprecation+archive.
- Post-tracks baseline check (Phase 1 Task 1.1) verifies the 3 pending
tracks have merged before proceeding.
- 9 Open Questions, 7 Risks, 5 verification criteria, follow-up track
public_api_migration_20260606 planned in spec §12.1.
Blocked by: startup_speedup_20260606, test_batching_refactor_20260606,
qwen_llama_grok_integration_20260606. Blocks: public_api_migration_20260606.
Phase 5B of startup_speedup_20260606 track.
src/theme_2.py had 3 top-level NERV imports:
from src import theme_nerv
from src.theme_nerv import DATA_GREEN
from src.theme_nerv_fx import CRTFilter, AlertPulsing, StatusFlicker
And 3 module-level FX object instantiations:
_crt_filter = CRTFilter()
_alert_pulsing = AlertPulsing()
_status_flicker = StatusFlicker()
ALL removed. The 3 use sites now lookup via _require_warmed:
- apply() NERV branch: theme_nerv = _require_warmed('src.theme_nerv')
- ai_text_color(): theme_nerv = _require_warmed('src.theme_nerv')
(then uses theme_nerv.DATA_GREEN)
- render_post_fx(): theme_nerv_fx = _require_warmed('src.theme_nerv_fx')
(then creates FX objects locally per-call)
The _status_flicker was instantiated but never used (dead code path;
the StatusFlicker class is still importable via theme_nerv_fx but not
auto-constructed in theme_2.py).
TESTS:
- tests/test_theme_2_no_top_level_nerv.py: 4/4 PASS (all RED -> GREEN)
- tests/test_theme.py, test_theme_nerv.py, test_theme_nerv_fx.py,
test_theme_models.py: 21/21 PASS (no breakage)
EFFECTIVENESS: import src.theme_2 no longer triggers src.theme_nerv or
src.theme_nerv_fx (~485ms combined). For users on default theme, these
are NEVER loaded. For NERV users, the warmup pre-loads on _io_pool and
the lookup is O(1).
NEXT: Phase 5C (markdown table) follows same TDD pattern.
This track executes after startup_speedup, test_batching_refactor, and
qwen_llama_grok_integration land. Section 10 documents the expected
post-tracks codebase state and answers 6 critical coordination questions:
- Q1: Existing _send_<vendor>() functions (returning str) are renamed
to _send_<vendor>_result() and changed to return Result[str] (Option A:
clean rename, contained to internal callers).
- Q2: send_openai_compatible in src/openai_compatible.py STAYS as-is
(it raises at the SDK boundary; correct per Fleury). The new
_send_<vendor>_result() functions catch and convert to ErrorInfo.
- Q3: Deprecation warning on send() will produce Python warnings in
tests; filterwarnings in conftest.py silences them during transition.
- Q4: The except ProviderError clauses in src/ai_client.py become
dead code after the refactor and are removed in Phase 3.
- Q5: ProviderError is FULLY REPLACED by ErrorInfo (a value, not an
exception). ProviderError removed entirely; ErrorInfo is the new
error type.
- Q6: ProviderError.ui_message() moves to ErrorInfo.ui_message().
Phase 1 also adds a baseline verification task to confirm the 3 pending
tracks have merged before proceeding.
Also renumbered Out of Scope (11) and See Also (12) sections to
preserve monotonic section numbers.
Phase 5A T5A.1-T5A.4 of startup_speedup_20260606 track.
src/commands.py was importing src.command_palette at module load to
create the CommandRegistry singleton. The 32 @registry.register
decorators on the command functions needed this registry at import time.
Approach: lazy registry proxy. The @registry.register decorator now
just queues the function in a list; the real CommandRegistry is built
on first access to any other registry attribute (.all, .get, etc.).
By that time, all 32 decorators have run and the pending list is
populated, so the real registration is complete in one pass.
src/commands.py changes:
- Removed 'from src.command_palette import CommandRegistry'
- Added 'from src.module_loader import _require_warmed'
- Added _LazyCommandRegistry class (proxy)
- Added _get_real_registry() function (initializes on first access)
- Replaced 'registry = CommandRegistry()' with 'registry = _LazyCommandRegistry()'
- The 32 @registry.register decorators are unchanged (the proxy's
register method returns the function unchanged after queueing it)
EFFECTIVENESS:
- 'import src.commands' no longer triggers src.command_palette (~244ms)
- The warmup on AppController's _io_pool pre-loads src.command_palette
on a background thread during startup
- First access to registry.all() (e.g. from gui_2.py at palette open
time) is O(1) - the warmup module is already in sys.modules
TESTS:
- tests/test_commands_no_top_level_command_palette.py: 4/4 PASS (3 RED, 1 green; now all green)
- tests/test_command_palette.py: 13/13 PASS (no breakage)
- tests/test_command_palette_sim.py: 7/7 PASS (live_gui tests, the
full palette flow works end-to-end with the lazy proxy)
ARCHITECTURAL NOTE: The lazy proxy is a minimal-change solution that
preserves the public API. The 32 decorated functions don't need any
changes; gui_2.py's 'from src.commands import registry' still works
unchanged. The deferral is invisible to consumers.
NEXT: Phase 5B (NERV theme) and 5C (markdown table) follow the same
TDD pattern. 5D is the bulk refactor of src/gui_2.py feature-gated
imports via the audit_gui2_imports.py script.
Phase 4 T4.1-T4.4 of startup_speedup_20260606 track.
DEVIATION FROM ORIGINAL SPEC: spec.md said fastapi was in src/api_hooks.py
but it was actually in src/app_controller.py (lines 17, 21). api_hooks.py
uses stdlib http.server. Phase 4 target corrected to app_controller.
LIFTED _require_warmed TO SHARED MODULE: created src/module_loader.py to
avoid duplicating the lookup logic and the cross-module import smell
(app_controller -> ai_client). src/ai_client.py re-exports it so the
T3.1 test (which asserts hasattr(src.ai_client, '_require_warmed'))
continues to work.
src/app_controller.py changes:
- Added 'from __future__ import annotations' (enables lazy type annotations;
-> FastAPI return type now a forward reference)
- Removed 'from fastapi import FastAPI, Depends, HTTPException' (line 17)
- Removed 'from fastapi.security.api_key import APIKeyHeader' (line 21)
- Added 'from src.module_loader import _require_warmed' (cross-module via
shared utility, not via ai_client)
- create_api(): added lookups at top of function body
- 7 _api_* helper functions (_api_get_key, _api_generate, _api_stream,
_api_confirm_action, _api_get_session, _api_delete_session,
_api_get_context): added 'HTTPException = _require_warmed(...).HTTPException'
at top of each function body
EFFECTIVENESS:
- import src.app_controller no longer triggers fastapi import (saves ~470ms
in main thread; only loaded when --enable-test-hooks is set)
- When --enable-test-hooks is set, the AppController's warmup pre-loads
fastapi on the _io_pool, so create_api()'s lookup is O(1)
TESTS:
- tests/test_app_controller_no_top_level_fastapi.py: 4/4 PASS (was 3 RED + 1 pass)
- tests/test_ai_client_no_top_level_sdk_imports.py: 9/9 still PASS (re-export works)
- tests/test_app_controller_mcp.py, test_app_controller_offloading.py: pass
- tests/test_headless_service.py: 10/11 PASS (1 pre-existing failure
test_generate_endpoint is a circular-import issue in google.genai,
reproduces identically on stashed pre-Phase-4 state - NOT a regression
from this change)
- tests/test_hooks.py: pass
NEXT: Phase 5 (feature-gated GUI module imports - command palette, NERV
theme, markdown table), then Phase 6 (ad-hoc threads -> _io_pool).
Phase 3 T3.2 + T3.3 of startup_speedup_20260606 track.
The 5 heavy SDKs (anthropic, google.genai, openai, google.genai.types,
requests) are no longer imported at module level. Each function that
needs them now calls _require_warmed(name) to get the module from
sys.modules (populated by AppController's warmup on _io_pool).
This is the load-bearing wall of the Main Thread Purity Invariant:
heavy modules are never in the main thread's import chain.
run_discussion_compression now uses _require_warmed for both
google.genai.types (gemini branch) and requests (deepseek branch).
Tests/test_tier4_patch_generation.py adapted: the 2 tests that
mocked 'src.ai_client.types' (no longer a module-level attr)
now mock 'src.ai_client._require_warmed' (the new public mechanism).
T3.1 tests now pass (9/9). T3.3 breakage fixed.
All 25 ai_client + tier4 tests pass.
The 46-entry mcp.manual-slop.tools block added in commit 30281843 was invalid per the v1.16.2 schema (McpLocalConfig has additionalProperties: false) and was being silently dropped. Also adds proper MCP server configuration and subagent permission grants.
Changes:
opencode.json:
- Remove the silently-dropped mcp.manual-slop.tools block (46 entries)
- Add timeout: 30000 (default 5000 is fragile)
- Add environment block with PYTHONPATH, GIT_TERMINAL_PROMPT, GCM_INTERACTIVE, GIT_ASKPASS, HOME so mcp_env.toml values are injected into the MCP server process
- Top-level 'tools' block intentionally omitted: schema only accepts boolean values (enable/disable), not description objects. Tool descriptions come from the MCP server's list_tools response (mcp_client.MCP_TOOL_SPECS).
.opencode/agents/{tier1-orchestrator,tier2-tech-lead,tier3-worker,tier4-qa,explore}.md:
- Add 'manual-slop_*': allow to each agent's permission block so subagents can use the 46 MCP tools (previously defaulted to deny in some permission schemas)
general.md: no change (no permission block, defaults to allow all)
Verified:
- opencode.json is now schema-valid (no more 'Expected boolean' errors)
- Both MCP servers connected: MiniMax (2 tools), manual-slop (46 tools)
- manual-slop MCP server startup: ~651ms (well under 30s timeout)
- All MCP tests pass: test_mcp_config.py + test_mcp_perf_tool.py = 4/4
- Subagent permission blocks confirmed in 'opencode debug config' output
Phase 3 Task T3.1 of startup_speedup_20260606 track. 9 tests assert:
- import src.ai_client does NOT trigger google.genai / anthropic /
openai / requests / google.genai.types imports (the main thread
must not load these on import; they're warmed on _io_pool)
- _require_warmed(name) helper exists and is callable
- _require_warmed returns the cached module if already in sys.modules
- _require_warmed falls back to importlib for tests/dev where
warmup didn't run
- The static audit script does not see src/ai_client.py as a
contributor of heavy-import violations
All 9 tests are currently FAILING (RED). They will turn GREEN when
T3.2 (the actual refactor of src/ai_client.py to remove top-level
imports and add _require_warmed) lands.
The implementation is held pending MCP client fix (per user instruction).
The capability matrix v1 has no 'audio' field (audio_input is deferred to v2).
Qwen-Audio's vision flag was incorrectly marked true. Changed to false and
clarified that v1 uses Qwen-Audio as text-only; audio attachment UI is
hidden via the absent audio capability check.
Phase 2 of startup_speedup_20260606 is done.
Tasks:
T2.1 (Red) tests/test_io_pool.py 1354679e 4 tests
T2.2 (Green) src/io_pool.py 1354679e make_io_pool() factory
T2.3 (Red) tests/test_warmup.py 1354679e 10 tests
T2.4 (Green) src/warmup.py 1354679e WarmupManager
T2.5 (Wire) AppController integration 922c5ad9 io_pool + warmup in __init__ + 5 public delegation methods
T2.6 (Plan) this commit
What now exists:
- make_io_pool() returns a 4-worker ThreadPoolExecutor named 'controller-io-N'
- WarmupManager class with submit/status/is_done/wait/on_complete/reset
- AppController creates self._io_pool + self._warmup early in __init__
- Warmup is submitted immediately (jobs run concurrent with the rest of init)
- Public API: controller.warmup_status(), controller.is_warmup_done(),
controller.wait_for_warmup(timeout), controller.on_warmup_complete(cb)
- controller._compute_warmup_list() returns 9 always + 2 conditional (fastapi)
- shutdown() now also shuts down the io_pool
Currently the warmup is a no-op for modules already imported at the top
of app_controller.py (fastapi, requests). Phase 3 will remove those
top-level imports; the warmup infrastructure will then start doing
real work.
18/18 tests passing (4 io_pool + 10 warmup + 4 test_app_controller_*).
Next: Phase 3 (remove top-level SDK imports from src/ai_client.py).
Expected to fix ~3 audit violations (google.genai, anthropic, openai).
Phase 2 Task T2.5 of the startup_speedup_20260606 track.
In AppController.__init__, right after the lock init (and before the
heavy subsystem construction that follows), create the shared _io_pool
and WarmupManager, then submit the warmup list. The warmup runs
concurrently with the rest of __init__, so by the time __init__
returns, the heavy modules are loaded (or in flight).
Changes:
- Add imports: from src.io_pool import make_io_pool,
from src.warmup import WarmupManager
- In __init__, after the locks block, add:
self._io_pool = make_io_pool()
self._warmup = WarmupManager(self._io_pool)
self._warmup.submit(self._compute_warmup_list())
- Add _compute_warmup_list() method: returns ['google.genai',
'anthropic', 'openai', 'requests', 'src.command_palette',
'src.theme_nerv', 'src.theme_nerv_fx', 'src.markdown_table',
'numpy'] always, plus ['fastapi', 'fastapi.security.api_key']
if self.test_hooks_enabled
- Add public delegation methods: warmup_status(), is_warmup_done(),
wait_for_warmup(timeout), on_warmup(callback)
- In shutdown(), add self._io_pool.shutdown(wait=False)
The warmup currently is a no-op for the heavy modules already imported
at the top of app_controller.py (fastapi, requests, etc. are
already in sys.modules). The infrastructure is in place; Phase 3 will
remove the top-level imports so the warmup actually does work.
Verified: all 18 tests pass (test_io_pool + test_warmup + existing
test_app_controller_mcp + test_app_controller_offloading).
Phase 2 Tasks T2.1-T2.4 of the startup_speedup_20260606 track.
NEW: src/io_pool.py
make_io_pool() factory: 4-worker ThreadPoolExecutor with
thread_name_prefix='controller-io'. The sanctioned way for any
background work. Replaces ad-hoc threading.Thread() calls per
the 'no new threads' rule.
NEW: src/warmup.py
WarmupManager: manages a list of modules to import on the shared
pool. Public API:
.submit(modules) - start warmup (call once)
.status() - {pending, completed, failed}
.is_done() - bool
.wait(timeout) - block until done
.on_complete(callback) - register completion callback
.reset() - clear state
Thread-safe (lock-guarded). 10 tests cover all paths.
NEW: tests/test_io_pool.py (4 tests):
- ThreadPoolExecutor returned
- 4 workers
- Threads named 'controller-io-*'
- Jobs run in parallel (barrier test)
NEW: tests/test_warmup.py (10 tests):
- One job per module submitted
- Initial pending list correct
- Failed imports tracked
- Done event set after all complete
- wait() blocks until done
- on_complete callback fires (and immediately if already done)
- Modules actually end up in sys.modules
- reset() clears state
- Jobs run concurrently (not serially)
All 14 tests pass. AppController integration is the next commit.
16 tasks across 4 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.16): Library + dry-run. 20 unit tests across categorizer,
batcher, plugin. New run_tests_batched.py has --plan/--audit only.
- Phase 2 (2.1-2.3): Shadow run via CI. Compare new vs old plan output.
- Phase 3 (3.1-3.4): Switch default. Full CLI with --tiers, --durations.
Old script becomes .legacy. Update docs/guide_testing.md.
- Phase 4 (4.1-4.6): Populate registry, gitignore durations, delete
legacy, archive track.
1-space indentation per project style guide. No placeholders. All
test code is concrete.
Phase 1, Tasks T1.2 + T1.4 of the startup_speedup_20260606 track.
NEW: scripts/audit_main_thread_imports.py
Static CI gate that AST-walks the import graph reachable from
sloppy.py and fails (exit 1) if any heavy module is imported at the
top of a main-thread-reachable file. Walks into if/elif/else and
try/except branches (which run at import time) but skips function
bodies (which only run when called). Allowlist: stdlib + the lean
gui_2 skeleton (imgui_bundle, defer, src.imgui_scopes, src.theme_2,
src.theme_models, src.paths, src.models, src.events).
NEW: scripts/audit_gui2_imports.py
Read-only analysis tool that lists every top-level and function-level
import in src/gui_2.py, classified by location. Used in Phase 5D to
identify which imports to remove.
NEW: tests/test_audit_main_thread_imports.py
9 tests covering: --help exits 0, clean stdlib-only passes, heavy
third-party fails, google.genai fails, transitive walks, function-
body imports ignored, if-branch imports flagged, try-block imports
flagged, file:line reported. All 9 pass.
NEW: docs/reports/startup_baseline_20260606.txt
3-run median cold-start benchmark. Worst offenders: src.gui_2
(1770ms), simulation.user_agent (1517ms), google.genai (1001ms),
openai (482ms), anthropic (441ms), imgui_bundle (255ms),
src.theme_nerv* (485ms combined), src.markdown_table (243ms),
src.command_palette (242ms).
NEW: docs/reports/startup_audit_20260606.txt
Audit output on the CURRENT codebase. Reports 67 violations across
the main-thread import graph (incl. numpy in src/gui_2.py:9,
tomli_w in src/gui_2.py:18, fastapi + requests in src/app_controller,
tree_sitter_* in src/file_cache, pydantic in src/models, plus all
the src.* subsystem imports that drag in heavy transitive deps).
Phase 3-5 of the track will resolve these one by one.
After Phase 3-5, this audit must exit 0 (no violations).
Co-located reports in docs/reports/ per project convention; the other
agent finished their work in docs/superpowers/ and is unrelated.
Default --audit exits non-zero on hard errors only. --strict adds the
'multiple subsystems = probably cross-cutting' heuristic from Section 9
as a CI gate. Two modes, one flag.
Three-tier batching refactor: replace alphabetical 4-at-a-time batching with
fixture-class-isolated tiers (0 opt-in, 1 unit/xdist, 2 mock_app, 3 live_gui
in one session, H headless, P performance).
Hybrid classification: auto-infer from filename + AST fixture scan; hand-curated
tests/test_categories.toml overrides for cross-cutting and ambiguous files.
Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on
a conftest-loaded pytest plugin (no-op without entries).
Priority order: B (process isolation) > A (subsystem diagnostic) > C (speed).
Lightweight, in-memory profiler for AppController init phases. Used by
the startup_speedup_20260606 track to measure where the time goes
during boot (config hydration, hook server start, subsystem init, etc.).
The profiler is exposed via /api/startup_profile (Phase 8 work) and
the Diagnostics panel so the user can see the exact per-phase cost.
Public API:
StartupProfiler() - create
.phase(name) - context manager
.snapshot() - {phases: {name: {start_ts, duration_ms}}, total_ms, count}
.reset() - clear recorded phases
.enable() / .disable() - toggle recording
Implementation:
- dataclass with list of _Phase(name, start_ts, end_ts)
- @contextmanager records wall-clock via time.perf_counter
- records duration even if the body raises (try/finally)
- snapshot is a copy, so consumers can't mutate the live state
TDD: 5 tests in tests/test_startup_profiler.py cover: basic
recording, total math, snapshot isolation, exception safety, empty
state.
Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().
Old design (rejected):
- from google import genai inside _send_gemini (lazy on first call)
- First user action that triggers this pays the cost; UI feels laggy
New design (this commit):
- Top-level heavy imports REMOVED from main-thread-reachable files
- AppController.__init__ submits warmup jobs to _io_pool (4 threads,
named 'controller-io-N')
- Each warmup worker imports its module and updates a thread-safe
warmup_status dict
- Functions access modules via _require_warmed(name), which assumes
the module is in sys.modules (warmed at startup)
- When all jobs complete, _warmup_done_event is set and registered
on_warmup_complete callbacks fire
- GUI shows status indicator + toast when warmup completes
- Hook API exposes /api/warmup_status and /api/warmup_wait
- Tests can call controller.wait_for_warmup() before exercising
warmup-dependent functionality
Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.
No code changes; this is a planning update only.
Track.get_executable_tickets (in models.py) called TrackDAG at
runtime, forcing a top-level import of src.dag_engine into models.py
and creating a 2-cycle that broke whichever module loaded second
(Ticket was not yet defined when models.py loaded first; TrackDAG
was not yet defined when dag_engine.py loaded first).
Fix: hoist the method out of the Track dataclass and into a free
function get_executable_tickets(track) in dag_engine.py. models.py
no longer needs TrackDAG at all, so the cycle is one-directional
(models -> dag_engine) and resolves cleanly in any import order.
Tests updated:
- tests/test_mma_models.py: import get_executable_tickets and call
it instead of track.get_executable_tickets() (4 call sites)
- tests/test_conductor_engine_v2.py: comment update
Verified both import orders resolve cleanly:
forward: import src.models; import src.dag_engine -> OK
reverse: import src.dag_engine; import src.models -> OK
34 tests pass (test_mma_models, test_dag_engine, test_execution_engine,
test_arch_boundary_phase3, test_track_state_schema).
Fulfills the existing backlog entry at conductor/tracks.md:152
(2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).
Main Thread Purity Invariant: the main thread (entering immapp.run())
must never import a module heavier than imgui_bundle and the lean
gui_2 skeleton. Enforced by:
- static gate: scripts/audit_main_thread_imports.py (CI)
- runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)
Threading constraint: no new threading.Thread(...) calls in src/.
All background work goes through AppController._io_pool
(ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').
9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load
FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime
enforcement, hook API + diagnostics, verify+checkpoint.
Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures
no longer time out at wait_for_server(timeout=15).