Private
Public Access
0
0
Commit Graph

2591 Commits

Author SHA1 Message Date
ed 69d098baaa refactor(theme_2): remove top-level NERV theme imports; use _require_warmed
Phase 5B of startup_speedup_20260606 track.

src/theme_2.py had 3 top-level NERV imports:
  from src import theme_nerv
  from src.theme_nerv import DATA_GREEN
  from src.theme_nerv_fx import CRTFilter, AlertPulsing, StatusFlicker

And 3 module-level FX object instantiations:
  _crt_filter     = CRTFilter()
  _alert_pulsing  = AlertPulsing()
  _status_flicker = StatusFlicker()

ALL removed. The 3 use sites now lookup via _require_warmed:
- apply() NERV branch: theme_nerv = _require_warmed('src.theme_nerv')
- ai_text_color(): theme_nerv = _require_warmed('src.theme_nerv')
  (then uses theme_nerv.DATA_GREEN)
- render_post_fx(): theme_nerv_fx = _require_warmed('src.theme_nerv_fx')
  (then creates FX objects locally per-call)

The _status_flicker was instantiated but never used (dead code path;
the StatusFlicker class is still importable via theme_nerv_fx but not
auto-constructed in theme_2.py).

TESTS:
- tests/test_theme_2_no_top_level_nerv.py: 4/4 PASS (all RED -> GREEN)
- tests/test_theme.py, test_theme_nerv.py, test_theme_nerv_fx.py,
  test_theme_models.py: 21/21 PASS (no breakage)

EFFECTIVENESS: import src.theme_2 no longer triggers src.theme_nerv or
src.theme_nerv_fx (~485ms combined). For users on default theme, these
are NEVER loaded. For NERV users, the warmup pre-loads on _io_pool and
the lookup is O(1).

NEXT: Phase 5C (markdown table) follows same TDD pattern.
2026-06-06 16:55:20 -04:00
ed 494f68f9d9 conductor(spec): Add 'Coordination with Pending Tracks' section (§10)
This track executes after startup_speedup, test_batching_refactor, and
qwen_llama_grok_integration land. Section 10 documents the expected
post-tracks codebase state and answers 6 critical coordination questions:

- Q1: Existing _send_<vendor>() functions (returning str) are renamed
  to _send_<vendor>_result() and changed to return Result[str] (Option A:
  clean rename, contained to internal callers).
- Q2: send_openai_compatible in src/openai_compatible.py STAYS as-is
  (it raises at the SDK boundary; correct per Fleury). The new
  _send_<vendor>_result() functions catch and convert to ErrorInfo.
- Q3: Deprecation warning on send() will produce Python warnings in
  tests; filterwarnings in conftest.py silences them during transition.
- Q4: The except ProviderError clauses in src/ai_client.py become
  dead code after the refactor and are removed in Phase 3.
- Q5: ProviderError is FULLY REPLACED by ErrorInfo (a value, not an
  exception). ProviderError removed entirely; ErrorInfo is the new
  error type.
- Q6: ProviderError.ui_message() moves to ErrorInfo.ui_message().

Phase 1 also adds a baseline verification task to confirm the 3 pending
tracks have merged before proceeding.

Also renumbered Out of Scope (11) and See Also (12) sections to
preserve monotonic section numbers.
2026-06-06 16:54:25 -04:00
ed 78d3a1db1f refactor(commands): use lazy registry proxy to defer src.command_palette import
Phase 5A T5A.1-T5A.4 of startup_speedup_20260606 track.

src/commands.py was importing src.command_palette at module load to
create the CommandRegistry singleton. The 32 @registry.register
decorators on the command functions needed this registry at import time.

Approach: lazy registry proxy. The @registry.register decorator now
just queues the function in a list; the real CommandRegistry is built
on first access to any other registry attribute (.all, .get, etc.).
By that time, all 32 decorators have run and the pending list is
populated, so the real registration is complete in one pass.

src/commands.py changes:
- Removed 'from src.command_palette import CommandRegistry'
- Added 'from src.module_loader import _require_warmed'
- Added _LazyCommandRegistry class (proxy)
- Added _get_real_registry() function (initializes on first access)
- Replaced 'registry = CommandRegistry()' with 'registry = _LazyCommandRegistry()'
- The 32 @registry.register decorators are unchanged (the proxy's
  register method returns the function unchanged after queueing it)

EFFECTIVENESS:
- 'import src.commands' no longer triggers src.command_palette (~244ms)
- The warmup on AppController's _io_pool pre-loads src.command_palette
  on a background thread during startup
- First access to registry.all() (e.g. from gui_2.py at palette open
  time) is O(1) - the warmup module is already in sys.modules

TESTS:
- tests/test_commands_no_top_level_command_palette.py: 4/4 PASS (3 RED, 1 green; now all green)
- tests/test_command_palette.py: 13/13 PASS (no breakage)
- tests/test_command_palette_sim.py: 7/7 PASS (live_gui tests, the
  full palette flow works end-to-end with the lazy proxy)

ARCHITECTURAL NOTE: The lazy proxy is a minimal-change solution that
preserves the public API. The 32 decorated functions don't need any
changes; gui_2.py's 'from src.commands import registry' still works
unchanged. The deferral is invisible to consumers.

NEXT: Phase 5B (NERV theme) and 5C (markdown table) follow the same
TDD pattern. 5D is the bulk refactor of src/gui_2.py feature-gated
imports via the audit_gui2_imports.py script.
2026-06-06 16:48:04 -04:00
ed 16291234ff conductor(plan): Record Phase 4 checkpoint SHA 883682c1 2026-06-06 16:37:27 -04:00
ed 883682c1c2 conductor(checkpoint): Phase 4 complete - fastapi no longer in main-thread import chain 2026-06-06 16:36:31 -04:00
ed a0ff1bde91 conductor(plan): Mark Phase 4 complete - app_controller fastapi import removal + _require_warmed lift 2026-06-06 16:36:20 -04:00
ed 3849d30441 refactor(app_controller): remove top-level fastapi imports; lift _require_warmed to shared module
Phase 4 T4.1-T4.4 of startup_speedup_20260606 track.

DEVIATION FROM ORIGINAL SPEC: spec.md said fastapi was in src/api_hooks.py
but it was actually in src/app_controller.py (lines 17, 21). api_hooks.py
uses stdlib http.server. Phase 4 target corrected to app_controller.

LIFTED _require_warmed TO SHARED MODULE: created src/module_loader.py to
avoid duplicating the lookup logic and the cross-module import smell
(app_controller -> ai_client). src/ai_client.py re-exports it so the
T3.1 test (which asserts hasattr(src.ai_client, '_require_warmed'))
continues to work.

src/app_controller.py changes:
- Added 'from __future__ import annotations' (enables lazy type annotations;
  -> FastAPI return type now a forward reference)
- Removed 'from fastapi import FastAPI, Depends, HTTPException' (line 17)
- Removed 'from fastapi.security.api_key import APIKeyHeader' (line 21)
- Added 'from src.module_loader import _require_warmed' (cross-module via
  shared utility, not via ai_client)
- create_api(): added lookups at top of function body
- 7 _api_* helper functions (_api_get_key, _api_generate, _api_stream,
  _api_confirm_action, _api_get_session, _api_delete_session,
  _api_get_context): added 'HTTPException = _require_warmed(...).HTTPException'
  at top of each function body

EFFECTIVENESS:
- import src.app_controller no longer triggers fastapi import (saves ~470ms
  in main thread; only loaded when --enable-test-hooks is set)
- When --enable-test-hooks is set, the AppController's warmup pre-loads
  fastapi on the _io_pool, so create_api()'s lookup is O(1)

TESTS:
- tests/test_app_controller_no_top_level_fastapi.py: 4/4 PASS (was 3 RED + 1 pass)
- tests/test_ai_client_no_top_level_sdk_imports.py: 9/9 still PASS (re-export works)
- tests/test_app_controller_mcp.py, test_app_controller_offloading.py: pass
- tests/test_headless_service.py: 10/11 PASS (1 pre-existing failure
  test_generate_endpoint is a circular-import issue in google.genai,
  reproduces identically on stashed pre-Phase-4 state - NOT a regression
  from this change)
- tests/test_hooks.py: pass

NEXT: Phase 5 (feature-gated GUI module imports - command palette, NERV
theme, markdown table), then Phase 6 (ad-hoc threads -> _io_pool).
2026-06-06 16:34:46 -04:00
ed 7fb13fbf4b conductor(plan): Record Phase 3 checkpoint SHA + mark T3.6 complete 2026-06-06 16:13:35 -04:00
ed 056358f230 conductor(checkpoint): Phase 3 complete - ai_client heavy SDK imports removed 2026-06-06 16:12:17 -04:00
ed 8905c26bff conductor(plan): Mark Phase 3 complete - ai_client SDK import removal done 2026-06-06 16:11:14 -04:00
ed 51c054ece8 refactor(ai_client): remove top-level SDK imports; use _require_warmed
Phase 3 T3.2 + T3.3 of startup_speedup_20260606 track.

The 5 heavy SDKs (anthropic, google.genai, openai, google.genai.types,
requests) are no longer imported at module level. Each function that
needs them now calls _require_warmed(name) to get the module from
sys.modules (populated by AppController's warmup on _io_pool).

This is the load-bearing wall of the Main Thread Purity Invariant:
heavy modules are never in the main thread's import chain.

run_discussion_compression now uses _require_warmed for both
google.genai.types (gemini branch) and requests (deepseek branch).

Tests/test_tier4_patch_generation.py adapted: the 2 tests that
mocked 'src.ai_client.types' (no longer a module-level attr)
now mock 'src.ai_client._require_warmed' (the new public mechanism).

T3.1 tests now pass (9/9). T3.3 breakage fixed.
All 25 ai_client + tier4 tests pass.
2026-06-06 16:09:16 -04:00
ed ca35b3ef48 fix(opencode): Remove invalid MCP tools block, add timeout/env, grant subagent access
The 46-entry mcp.manual-slop.tools block added in commit 30281843 was invalid per the v1.16.2 schema (McpLocalConfig has additionalProperties: false) and was being silently dropped. Also adds proper MCP server configuration and subagent permission grants.

Changes:

opencode.json:
- Remove the silently-dropped mcp.manual-slop.tools block (46 entries)
- Add timeout: 30000 (default 5000 is fragile)
- Add environment block with PYTHONPATH, GIT_TERMINAL_PROMPT, GCM_INTERACTIVE, GIT_ASKPASS, HOME so mcp_env.toml values are injected into the MCP server process
- Top-level 'tools' block intentionally omitted: schema only accepts boolean values (enable/disable), not description objects. Tool descriptions come from the MCP server's list_tools response (mcp_client.MCP_TOOL_SPECS).

.opencode/agents/{tier1-orchestrator,tier2-tech-lead,tier3-worker,tier4-qa,explore}.md:
- Add 'manual-slop_*': allow to each agent's permission block so subagents can use the 46 MCP tools (previously defaulted to deny in some permission schemas)

general.md: no change (no permission block, defaults to allow all)

Verified:
- opencode.json is now schema-valid (no more 'Expected boolean' errors)
- Both MCP servers connected: MiniMax (2 tools), manual-slop (46 tools)
- manual-slop MCP server startup: ~651ms (well under 30s timeout)
- All MCP tests pass: test_mcp_config.py + test_mcp_perf_tool.py = 4/4
- Subagent permission blocks confirmed in 'opencode debug config' output
2026-06-06 15:44:52 -04:00
ed 9eed60238a conductor(plan): mark T3.1 RED done; T3.2 holding for MCP fix (16780ec6) 2026-06-06 15:16:02 -04:00
ed 16780ec6d4 test(ai_client): TDD red phase - no top-level SDK imports allowed
Phase 3 Task T3.1 of startup_speedup_20260606 track. 9 tests assert:

  - import src.ai_client does NOT trigger google.genai / anthropic /
    openai / requests / google.genai.types imports (the main thread
    must not load these on import; they're warmed on _io_pool)
  - _require_warmed(name) helper exists and is callable
  - _require_warmed returns the cached module if already in sys.modules
  - _require_warmed falls back to importlib for tests/dev where
    warmup didn't run
  - The static audit script does not see src/ai_client.py as a
    contributor of heavy-import violations

All 9 tests are currently FAILING (RED). They will turn GREEN when
T3.2 (the actual refactor of src/ai_client.py to remove top-level
imports and add _require_warmed) lands.

The implementation is held pending MCP client fix (per user instruction).
2026-06-06 15:11:13 -04:00
ed b17cbbdeca conductor(plan): write 6-phase implementation plan for qwen_llama_grok_integration_20260606
~30 tasks across 6 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.8): Capability matrix framework (src/vendor_capabilities.py)
  + shared OpenAI-compatible helper (src/openai_compatible.py). 13 unit tests.
- Phase 2 (2.1-2.8): Qwen via DashScope native SDK. 5 unit tests.
- Phase 3 (3.1-3.7): Grok (xAI) + Llama (Ollama + OpenRouter + custom URL)
  via shared helper. 8 unit tests.
- Phase 4 (4.1-4.3): MiniMax refactor (_send_minimax from ~250 -> ~50 lines).
  Safety net: existing tests/test_minimax_provider.py.
- Phase 5 (5.1-5.5): 9 capability-driven UX adaptations in src/gui_2.py.
  Manual smoke test for all 3 new vendors.
- Phase 6 (6.1-6.4): Update docs/guide_ai_client.md + guide_models.md.
  Archive the track.

Data-oriented design: shared helper is the algorithm on normalized data;
_send_<vendor>() entry points are thin boundary adapters.

1-space indentation per project style guide. No placeholders. All test
code is concrete. Self-review at end confirms spec coverage (every
section of spec.md mapped to a task).
2026-06-06 15:06:30 -04:00
ed 97daaff29b conductor(spec): Fix Qwen-Audio matrix entry consistency (vision=false, audio deferred)
The capability matrix v1 has no 'audio' field (audio_input is deferred to v2).
Qwen-Audio's vision flag was incorrectly marked true. Changed to false and
clarified that v1 uses Qwen-Audio as text-only; audio attachment UI is
hidden via the absent audio capability check.
2026-06-06 14:58:03 -04:00
ed 055430a75a conductor(tracks): Register qwen_llama_grok_integration_20260606 in registry (item 0d) 2026-06-06 14:56:55 -04:00
ed 7c1d597ef1 conductor(track): Initialize qwen_llama_grok_integration_20260606 spec
Three new vendors + capability matrix framework + MiniMax refactor:

**Capability matrix v1 (7 features):** vision, tool_calling, caching, streaming,
model_discovery, context_window, cost_tracking. Audio and server-side code
execution deferred to a follow-up track.

**Qwen via DashScope native SDK:** Qwen-Turbo, Qwen-Plus, Qwen-Max, Qwen-Long
(1M context), Qwen-VL-Plus/Max (vision), Qwen-Audio. Native API chosen over
OpenAI-compatible mode to unlock Qwen-Audio, Qwen-Long custom chunking, and
Qwen-VL-Max enhanced vision.

**Llama (OpenAI-compatible, multi-backend):** Ollama (local, free), OpenRouter
(cloud aggregator covering Together/Groq/Fireworks), custom URL escape hatch.
Models: Llama 3.1 8B/70B/405B, 3.2 1B/3B, 3.2 11B/90B Vision, 3.3 70B.

**Grok via xAI (OpenAI-compatible):** Grok-2, Grok-2-Vision, Grok-Beta.

**Shared OpenAI-compatible helper** in src/openai_compatible.py processes a
normalized request/response data structure; each _send_<vendor>() is a thin
adapter at the boundary (data-oriented design per Fleury/Acton/Lottes).

**MiniMax refactor:** ~250 lines reduced to ~50 by using the shared helper.
Existing test_minimax_provider.py is the safety net.

**UX adaptation:** 9 UI elements (screenshot, tools toggle, cache panel, stream
progress, fetch models, token budget, cost panel) read from the matrix instead
of hard-coding per-vendor branches.

**Out of scope (deferred):** Anthropic/Gemini/DeepSeek migration to the matrix
(separate track), audio input, server-side code execution, PDF input, batch API,
fine-tuning.

6 phases planned: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX
adaptation, docs+archive.
2026-06-06 14:56:00 -04:00
ed 7eb743c6cb conductor(plan): Phase 2 complete - io_pool + warmup foundation in place
Phase 2 of startup_speedup_20260606 is done.

Tasks:
  T2.1 (Red)   tests/test_io_pool.py         1354679e  4 tests
  T2.2 (Green) src/io_pool.py                1354679e  make_io_pool() factory
  T2.3 (Red)   tests/test_warmup.py          1354679e  10 tests
  T2.4 (Green) src/warmup.py                 1354679e  WarmupManager
  T2.5 (Wire)  AppController integration     922c5ad9  io_pool + warmup in __init__ + 5 public delegation methods
  T2.6 (Plan)  this commit

What now exists:
  - make_io_pool() returns a 4-worker ThreadPoolExecutor named 'controller-io-N'
  - WarmupManager class with submit/status/is_done/wait/on_complete/reset
  - AppController creates self._io_pool + self._warmup early in __init__
  - Warmup is submitted immediately (jobs run concurrent with the rest of init)
  - Public API: controller.warmup_status(), controller.is_warmup_done(),
    controller.wait_for_warmup(timeout), controller.on_warmup_complete(cb)
  - controller._compute_warmup_list() returns 9 always + 2 conditional (fastapi)
  - shutdown() now also shuts down the io_pool

Currently the warmup is a no-op for modules already imported at the top
of app_controller.py (fastapi, requests). Phase 3 will remove those
top-level imports; the warmup infrastructure will then start doing
real work.

18/18 tests passing (4 io_pool + 10 warmup + 4 test_app_controller_*).

Next: Phase 3 (remove top-level SDK imports from src/ai_client.py).
Expected to fix ~3 audit violations (google.genai, anthropic, openai).
2026-06-06 14:52:04 -04:00
ed 922c5ad9ab feat(app_controller): wire _io_pool + warmup + 5 public delegation methods
Phase 2 Task T2.5 of the startup_speedup_20260606 track.

In AppController.__init__, right after the lock init (and before the
heavy subsystem construction that follows), create the shared _io_pool
and WarmupManager, then submit the warmup list. The warmup runs
concurrently with the rest of __init__, so by the time __init__
returns, the heavy modules are loaded (or in flight).

Changes:
  - Add imports: from src.io_pool import make_io_pool,
    from src.warmup import WarmupManager
  - In __init__, after the locks block, add:
      self._io_pool = make_io_pool()
      self._warmup = WarmupManager(self._io_pool)
      self._warmup.submit(self._compute_warmup_list())
  - Add _compute_warmup_list() method: returns ['google.genai',
    'anthropic', 'openai', 'requests', 'src.command_palette',
    'src.theme_nerv', 'src.theme_nerv_fx', 'src.markdown_table',
    'numpy'] always, plus ['fastapi', 'fastapi.security.api_key']
    if self.test_hooks_enabled
  - Add public delegation methods: warmup_status(), is_warmup_done(),
    wait_for_warmup(timeout), on_warmup(callback)
  - In shutdown(), add self._io_pool.shutdown(wait=False)

The warmup currently is a no-op for the heavy modules already imported
at the top of app_controller.py (fastapi, requests, etc. are
already in sys.modules). The infrastructure is in place; Phase 3 will
remove the top-level imports so the warmup actually does work.

Verified: all 18 tests pass (test_io_pool + test_warmup + existing
test_app_controller_mcp + test_app_controller_offloading).
2026-06-06 14:48:51 -04:00
ed 1354679e33 feat(io_pool, warmup): add shared 4-thread pool + WarmupManager
Phase 2 Tasks T2.1-T2.4 of the startup_speedup_20260606 track.

NEW: src/io_pool.py
  make_io_pool() factory: 4-worker ThreadPoolExecutor with
  thread_name_prefix='controller-io'. The sanctioned way for any
  background work. Replaces ad-hoc threading.Thread() calls per
  the 'no new threads' rule.

NEW: src/warmup.py
  WarmupManager: manages a list of modules to import on the shared
  pool. Public API:
    .submit(modules)        - start warmup (call once)
    .status()               - {pending, completed, failed}
    .is_done()              - bool
    .wait(timeout)          - block until done
    .on_complete(callback)  - register completion callback
    .reset()                - clear state
  Thread-safe (lock-guarded). 10 tests cover all paths.

NEW: tests/test_io_pool.py (4 tests):
  - ThreadPoolExecutor returned
  - 4 workers
  - Threads named 'controller-io-*'
  - Jobs run in parallel (barrier test)

NEW: tests/test_warmup.py (10 tests):
  - One job per module submitted
  - Initial pending list correct
  - Failed imports tracked
  - Done event set after all complete
  - wait() blocks until done
  - on_complete callback fires (and immediately if already done)
  - Modules actually end up in sys.modules
  - reset() clears state
  - Jobs run concurrently (not serially)

All 14 tests pass. AppController integration is the next commit.
2026-06-06 14:47:02 -04:00
ed 7fdab70529 conductor(plan): write 4-phase implementation plan for test_batching_refactor_20260606
16 tasks across 4 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.16): Library + dry-run. 20 unit tests across categorizer,
  batcher, plugin. New run_tests_batched.py has --plan/--audit only.
- Phase 2 (2.1-2.3): Shadow run via CI. Compare new vs old plan output.
- Phase 3 (3.1-3.4): Switch default. Full CLI with --tiers, --durations.
  Old script becomes .legacy. Update docs/guide_testing.md.
- Phase 4 (4.1-4.6): Populate registry, gitignore durations, delete
  legacy, archive track.

1-space indentation per project style guide. No placeholders. All
test code is concrete.
2026-06-06 14:24:39 -04:00
ed f9a0125847 conductor(plan): Phase 1 complete - baseline + audit infrastructure ready
Phase 1 of startup_speedup_20260606 track is done.

Tasks completed:
  T1.1 baseline benchmark        -> 6f9a3af2 (docs/reports/startup_baseline_20260606.txt)
  T1.2 audit_gui2_imports.py     -> 6f9a3af2 (scripts/ + audit results)
  T1.3 StartupProfiler           -> 5a856536 (src/ + 5 tests)
  T1.4 audit_main_thread_imports -> 6f9a3af2 (scripts/ + 9 tests)
  T1.5 plan update                -> this commit

Baseline numbers (3-run median, from scripts/benchmark_imports.py):
  src.gui_2                1770ms   (main-thread bottleneck)
  simulation.user_agent    1517ms
  google.genai             1001ms
  openai                    482ms
  anthropic                 441ms
  imgui_bundle              255ms   (KEEP - ImGui hot path)
  src.theme_nerv_fx         254ms
  src.theme_nerv            246ms
  src.markdown_table        243ms
  src.command_palette       242ms

Audit violations on current codebase: 67. These are the targets
for Phases 3-5 (remove top-level heavy imports to fix each one).

Next: Phase 2 (Job Pool + Warmup Foundation).
2026-06-06 14:24:20 -04:00
ed 6f9a3af201 feat(audit): add main-thread import graph audit + baseline measurements
Phase 1, Tasks T1.2 + T1.4 of the startup_speedup_20260606 track.

NEW: scripts/audit_main_thread_imports.py
  Static CI gate that AST-walks the import graph reachable from
  sloppy.py and fails (exit 1) if any heavy module is imported at the
  top of a main-thread-reachable file. Walks into if/elif/else and
  try/except branches (which run at import time) but skips function
  bodies (which only run when called). Allowlist: stdlib + the lean
  gui_2 skeleton (imgui_bundle, defer, src.imgui_scopes, src.theme_2,
  src.theme_models, src.paths, src.models, src.events).

NEW: scripts/audit_gui2_imports.py
  Read-only analysis tool that lists every top-level and function-level
  import in src/gui_2.py, classified by location. Used in Phase 5D to
  identify which imports to remove.

NEW: tests/test_audit_main_thread_imports.py
  9 tests covering: --help exits 0, clean stdlib-only passes, heavy
  third-party fails, google.genai fails, transitive walks, function-
  body imports ignored, if-branch imports flagged, try-block imports
  flagged, file:line reported. All 9 pass.

NEW: docs/reports/startup_baseline_20260606.txt
  3-run median cold-start benchmark. Worst offenders: src.gui_2
  (1770ms), simulation.user_agent (1517ms), google.genai (1001ms),
  openai (482ms), anthropic (441ms), imgui_bundle (255ms),
  src.theme_nerv* (485ms combined), src.markdown_table (243ms),
  src.command_palette (242ms).

NEW: docs/reports/startup_audit_20260606.txt
  Audit output on the CURRENT codebase. Reports 67 violations across
  the main-thread import graph (incl. numpy in src/gui_2.py:9,
  tomli_w in src/gui_2.py:18, fastapi + requests in src/app_controller,
  tree_sitter_* in src/file_cache, pydantic in src/models, plus all
  the src.* subsystem imports that drag in heavy transitive deps).
  Phase 3-5 of the track will resolve these one by one.

After Phase 3-5, this audit must exit 0 (no violations).

Co-located reports in docs/reports/ per project convention; the other
agent finished their work in docs/superpowers/ and is unrelated.
2026-06-06 14:22:18 -04:00
ed 0553983ce9 conductor(spec): Clarify --audit --strict semantics in Section 4.3
Default --audit exits non-zero on hard errors only. --strict adds the
'multiple subsystems = probably cross-cutting' heuristic from Section 9
as a CI gate. Two modes, one flag.
2026-06-06 14:16:13 -04:00
ed cbfd78c51d conductor(tracks): Register test_batching_refactor_20260606 in registry 2026-06-06 14:14:11 -04:00
ed b7a9737443 conductor(track): Initialize test_batching_refactor_20260606 spec
Three-tier batching refactor: replace alphabetical 4-at-a-time batching with
fixture-class-isolated tiers (0 opt-in, 1 unit/xdist, 2 mock_app, 3 live_gui
in one session, H headless, P performance).

Hybrid classification: auto-infer from filename + AST fixture scan; hand-curated
tests/test_categories.toml overrides for cross-cutting and ambiguous files.

Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on
a conftest-loaded pytest plugin (no-op without entries).

Priority order: B (process isolation) > A (subsystem diagnostic) > C (speed).
2026-06-06 14:12:14 -04:00
ed 96158edd97 conductor(plan): mark T1.3 StartupProfiler complete (5a856536) 2026-06-06 13:59:02 -04:00
ed 5a85653654 feat(startup_profiler): add StartupProfiler for per-phase init timing
Lightweight, in-memory profiler for AppController init phases. Used by
the startup_speedup_20260606 track to measure where the time goes
during boot (config hydration, hook server start, subsystem init, etc.).

The profiler is exposed via /api/startup_profile (Phase 8 work) and
the Diagnostics panel so the user can see the exact per-phase cost.

Public API:
  StartupProfiler() - create
  .phase(name) - context manager
  .snapshot() - {phases: {name: {start_ts, duration_ms}}, total_ms, count}
  .reset() - clear recorded phases
  .enable() / .disable() - toggle recording

Implementation:
  - dataclass with list of _Phase(name, start_ts, end_ts)
  - @contextmanager records wall-clock via time.perf_counter
  - records duration even if the body raises (try/finally)
  - snapshot is a copy, so consumers can't mutate the live state

TDD: 5 tests in tests/test_startup_profiler.py cover: basic
recording, total math, snapshot isolation, exception safety, empty
state.
2026-06-06 13:57:26 -04:00
ed f2f5ee1197 conductor(plan): flip track from lazy-loading to proactive warmup
Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().

Old design (rejected):
  - from google import genai inside _send_gemini (lazy on first call)
  - First user action that triggers this pays the cost; UI feels laggy

New design (this commit):
  - Top-level heavy imports REMOVED from main-thread-reachable files
  - AppController.__init__ submits warmup jobs to _io_pool (4 threads,
    named 'controller-io-N')
  - Each warmup worker imports its module and updates a thread-safe
    warmup_status dict
  - Functions access modules via _require_warmed(name), which assumes
    the module is in sys.modules (warmed at startup)
  - When all jobs complete, _warmup_done_event is set and registered
    on_warmup_complete callbacks fire
  - GUI shows status indicator + toast when warmup completes
  - Hook API exposes /api/warmup_status and /api/warmup_wait
  - Tests can call controller.wait_for_warmup() before exercising
    warmup-dependent functionality

Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.

No code changes; this is a planning update only.
2026-06-06 13:45:05 -04:00
ed ca254bac41 fix(imports): break models<->dag_engine circular dependency
Track.get_executable_tickets (in models.py) called TrackDAG at
runtime, forcing a top-level import of src.dag_engine into models.py
and creating a 2-cycle that broke whichever module loaded second
(Ticket was not yet defined when models.py loaded first; TrackDAG
was not yet defined when dag_engine.py loaded first).

Fix: hoist the method out of the Track dataclass and into a free
function get_executable_tickets(track) in dag_engine.py. models.py
no longer needs TrackDAG at all, so the cycle is one-directional
(models -> dag_engine) and resolves cleanly in any import order.

Tests updated:
- tests/test_mma_models.py: import get_executable_tickets and call
  it instead of track.get_executable_tickets() (4 call sites)
- tests/test_conductor_engine_v2.py: comment update

Verified both import orders resolve cleanly:
  forward:  import src.models; import src.dag_engine  -> OK
  reverse:  import src.dag_engine; import src.models  -> OK
34 tests pass (test_mma_models, test_dag_engine, test_execution_engine,
test_arch_boundary_phase3, test_track_state_schema).
2026-06-06 13:30:18 -04:00
r00tz 9e4fac496d made local rag needs optional (prevents having to have torch / sentence-transformers if you never use local embedding) 2026-06-06 13:21:43 -04:00
ed 32e633b3ec conductor(plan): mark startup_speedup_20260606 track creation committed (cd4fb045) 2026-06-06 13:01:32 -04:00
ed cd4fb04541 conductor(track): create startup_speedup_20260606 track for sloppy.py startup latency
Fulfills the existing backlog entry at conductor/tracks.md:152
(2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).

Main Thread Purity Invariant: the main thread (entering immapp.run())
must never import a module heavier than imgui_bundle and the lean
gui_2 skeleton. Enforced by:
  - static gate: scripts/audit_main_thread_imports.py (CI)
  - runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)

Threading constraint: no new threading.Thread(...) calls in src/.
All background work goes through AppController._io_pool
(ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').

9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load
FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime
enforcement, hook API + diagnostics, verify+checkpoint.

Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures
no longer time out at wait_for_server(timeout=15).
2026-06-06 12:57:20 -04:00
ed 2adf3274af add benchmark scriptr 2026-06-06 12:47:41 -04:00
ed 311fde9a8b fixes 2026-06-06 12:44:07 -04:00
ed 9ccaf0594c some org on ai_client 2026-06-06 11:35:20 -04:00
ed 9d72d98b50 conductor(tracks): mark rag_phase4_stress_test_flake resolved (commit 16412ad5) 2026-06-06 11:29:03 -04:00
ed 16412ad5f9 fix(rag): detect ChromaDB dim mismatch and recreate collection on provider switch 2026-06-06 11:26:47 -04:00
ed 339b062913 more organization 2026-06-06 11:08:07 -04:00
ed 7d555361f9 more organization 2026-06-06 10:24:22 -04:00
ed 1c627bcc30 fix(docs): correct section order in guide_testing (patterns before See Also) + fix LF/CRLF 2026-06-06 09:34:38 -04:00
ed 0f742b1d5f conductor(workflow): add Indentation-Driven Class Method Visibility pitfall (2026-06-05) 2026-06-06 02:04:05 -04:00
ed e276bac093 docs(gui_2): add __getattr__/__setattr__ delegation pattern + indentation gotcha 2026-06-06 01:59:20 -04:00
ed 4ee22dedb9 docs(testing): add Narrow Test Paths + Indentation-Driven Method Visibility patterns 2026-06-06 01:53:25 -04:00
ed e7b8877f2a docs(readme): update for v2 completion (24 guides, 273 test files, 98.9% pass rate) 2026-06-06 01:42:45 -04:00
ed 5e0b6bbfd3 conductor(tracks): queue RAG test flake as new backlog item; mark prior_session complete 2026-06-06 01:35:21 -04:00
ed 008179360f conductor(index): v2 recently shipped, all 4 live_gui failures resolved 2026-06-06 01:30:03 -04:00
ed 9a3831897b conductor(tracks): mark live_gui_test_hardening_v2 complete (root cause was indent, not state sync) 2026-06-06 01:28:02 -04:00
ed 26e0ced4d9 test(prior_session): refactor to narrow render_prior_session_view (50+ mocks -> 20) 2026-06-06 01:12:29 -04:00