Eliminates 22 call sites that bypassed the AppController state owner
and read/wrote config.toml directly. AppController is now the single
source of truth for self.config; gui_2.py, commands.py, etc. go
through controller.save_config() / controller.load_config().
Production changes:
- src/models.py: rename load_config -> _load_config_from_disk,
save_config -> _save_config_to_disk (private I/O primitives)
- src/app_controller.py: add public load_config()/save_config() methods
that own the state. Update 3 internal call sites and 3 ConductorEngine
call sites to pass max_workers from self.config
- src/multi_agent_conductor.py: ConductorEngine.__init__ now takes
max_workers as a parameter (caller responsibility, not I/O primitive)
- src/external_editor.py: get_default_launcher() takes config as a
parameter; gui_2.py:1311,4776 pass app.config
- src/gui_2.py: 17 sites of models.save_config(X.config) replaced with
X.save_config() (delegates via __getattr__ to controller)
- src/commands.py: save_all() uses app.save_config()
Test changes (route through controller, not I/O primitive):
- tests/conftest.py: mock_app and app_instance fixtures now patch
AppController.load_config/save_config instead of models I/O primitives
- 18 other test files: patches renamed from models._save_config_to_disk
to AppController.save_config (and same for load_config)
- tests/test_app_controller_mcp.py: use SLOP_CONFIG env var instead of
patching removed CONFIG_PATH module constant
- tests/test_parallel_execution.py: pass max_workers=2 explicitly to
ConductorEngine (caller no longer reads config)
- tests/test_gui_paths.py: add save_config=MagicMock() to MockApp;
assert on controller method, not I/O primitive
- tests/test_models_no_top_level_tomli_w.py: still calls private
_save_config_to_disk directly (the only allowed exception; tests
the lazy-load behavior of the primitive itself)
New files:
- scripts/audit_no_models_config_io.py: enforces the rule (--strict,
--json modes; AST-based docstring detection to avoid false positives)
- conductor/code_styleguides/config_state_owner.md: documents the rule
Verification:
- 67 targeted tests pass
- scripts/audit_no_models_config_io.py --strict returns 0
This is the architectural cleanup that surfaced during the
audit_architectural_cheats_20260607 review. Closes the smoke-gun
CONFIG_PATH module constant (already done in 0c7ebf22) AND the
free-function models.load_config/save_config smell.
[conductor(checkpoint): config-iO-refactor-20260607]
The unconditional watchdog (91b19c90) was a 90s time.sleep, which fired for ANY batch that ran >90s from conftest load — even legitimate slow live_gui tests. User confirmed: Batch 2 ended at 92.1s because the unconditional fired mid-test (the smart watchdog's signal hadn't fired yet because pytest_terminal_summary only runs after all tests are done).
Fix: make the unconditional ALSO signal-based. Both watchdogs now wait for the same _pytest_finished_event. The difference is just the timeout:
- Smart: 300s pytest-hung + 5s grace (handles normal cases)
- Unconditional: 900s pytest-hung + 5s grace (catches extremely long test runs)
- If the signal never fires, both fire os._exit(2) (the first to time out wins).
Why 900s for unconditional: pytest_terminal_summary fires AFTER the summary print. For a normal batch, that's ~32s. For an extremely long batch (e.g., 10+ minutes of slow tests), we want to wait the full duration before declaring it hung. 900s = 15 min is a safe upper bound; the run_tests_batched.py subprocess.run(timeout=1000) is the final safety net for catastrophic hangs.
Two-thread design is intentional (redundant safety). If one thread is somehow blocked, the other fires. The grace period is 5s for both, so the first to fire wins the race.
The previous smart watchdog (44b0b5d4, 91b19c90) used pytest_unconfigure as its signal. But pytest_unconfigure fires AFTER all fixtures, terminal summary, and finalizers — at the very end of the session. If anything in conftest's chain (e.g., the io_pool created in AppController.__init__ at conftest line ~65) hangs in __del__, pytest_unconfigure never gets called. Result: every batch's watchdog waited the full 60s/90s and then fired.
The right signal is pytest_terminal_summary, which fires AFTER the test summary is printed (the user can see '241 passed, 1 skipped in 32.30s' in the output) but BEFORE the shutdown hangs begin. At that point the test session is logically done; the watchdog can give a short 5s grace for normal finalization, then os._exit(0) so the runner can move to the next batch.
The previous attempts and why they failed (documented in test_conftest_smart_watchdog.py docstring):
- e1c8730f: 30s os._exit(0) cut off batches mid-test
- 719c5e27: os._exit(2) but daemon thread fired on every batch
- 91b19c90: kept exit 2 but pytest_unconfigure never fires when io_pool hangs
- 44b0b5d4: pytest_unconfigure as signal still hung
- 2026-06-07 final: pytest_terminal_summary fires after summary print, before shutdown hangs
New contract:
- Normal batch: pytest_terminal_summary fires at ~32s (after summary
is printed), 5s grace, os._exit(0). Total: 37s.
- Hung in test execution: pytest_terminal_summary never fires,
smart watchdog waits 300s, fires os._exit(2).
- Hung in conftest load (before any test): unconditional watchdog
fires os._exit(2) at 60s.
7 tests in test_conftest_smart_watchdog.py updated to match:
- test_terminal_summary_hook_sets_finished_event: primary signal source
- test_unconfigure_hook_is_fallback_signal: fallback for crashes
- test_clean_exit_uses_zero_exit_code: os._exit(0) after signal
- test_hang_uses_nonzero_exit_code: os._exit(2) for true hangs
The smart watchdog's 120s pytest-hung + 30s grace = 150s total wait was too long. The user's run hung past that point in interpreter shutdown (ThreadPoolExecutor.__del__ or live_gui teardown). Two changes:
1. SHORTENED the smart watchdog:
- pytest-hung: 120s -> 60s
- shutdown-grace: 30s -> 15s
- Total: 75s (was 150s)
2. ADDED an unconditional 90s sledgehammer watchdog. This one does
NOT wait for pytest_unconfigure. It just sleeps 90s from conftest
load and fires os._exit(2). This handles the case where pytest is
hung BEFORE pytest_unconfigure is reached (e.g., conftest's own
wait_for_warmup hangs, or pytest never reaches its unconfigure).
So the new contract is:
- Normal batch: pytest_unconfigure sets event at ~32s, smart
watchdog's first wait returns immediately, 15s grace elapses,
watchdog exits with 0 (normal exit). Unconditional never fires
(90s would only fire if smart failed).
- Hung batch: pytest_unconfigure never fires, unconditional
watchdog fires at 90s with os._exit(2). Runner catches via
CalledProcessError, reports failure.
- Hung shutdown: pytest_unconfigure fires at ~32s, 15s grace
elapses, smart watchdog fires at 60s with os._exit(2).
The 90s unconditional + 60s smart + 15s grace = the smart watchdog
fires first (at 60s) if pytest is done; the unconditional fires
later (at 90s) if pytest is hung earlier. Net max hang: 90s.
Added test_conftest_smart_watchdog.py test for the new thread.
Re-add hang protection after the user's run showed pytest hanging in interpreter shutdown (ThreadPoolExecutor.__del__ / live_gui teardown) after Batch 1 completed successfully. The previous naive watchdog (e1c8730f, 30s os._exit(0)) cut off batches mid-test; the immediate removal (4103c08e) let real hangs wait 1000s for the runner's subprocess timeout.
This SMART watchdog only fires when pytest is ACTUALLY hanging:
- pytest_unconfigure hook sets _pytest_finished_event when the
test session is done (BEFORE interpreter finalization).
- Watchdog waits for the event with 120s timeout:
* If not set in 120s: pytest is hung in test execution -> os._exit(2).
* If set: pytest finished cleanly; give 30s for normal
interpreter shutdown (ThreadPoolExecutor.__del__, etc.).
* If still alive after grace: io_pool / live_gui teardown
is hung -> os._exit(2).
- Exit code 2 (not 0) so run_tests_batched.py correctly reports
a failed batch (CalledProcessError). The 0 in the previous
version masked hangs and hid test failures.
Contract:
- Normal batch (35s execution, 2s shutdown): pytest_unconfigure
fires at 35s, watchdog's first wait returns immediately, 30s
grace elapses without fire, pytest exits with 0. Runner: passed.
- Hung batch: pytest_unconfigure never fires, watchdog fires
os._exit(2) at 120s. Runner: failed.
- Hung shutdown (io_pool.__del__ blocks): pytest_unconfigure
fires, 30s grace elapses, watchdog fires os._exit(2). Runner: failed.
5 new tests in tests/test_conftest_smart_watchdog.py:
- test_watchdog_thread_registered: daemon thread named conftest-smart-watchdog
- test_watchdog_thread_is_daemon: doesn't block pytest exit
- test_pytest_unconfigure_sets_finished_flag: hook exists in conftest
- test_watchdog_uses_non_zero_exit_code: os._exit(2) is used
- test_watchdog_timeouts_documented: 120s and 30s are present
The conftest watchdog (e1c8730f) was a misguided fix. Empirically observed 2026-06-07:
1. CUTS OFF BATCHES MID-TEST: On Windows, daemon=True threads are NOT auto-killed by the interpreter. The watchdog's time.sleep(30) continues through pytest's normal shutdown, then os._exit(0) fires. For any batch with live_gui tests (which start a sloppy.py subprocess and may take >30s), pytest gets killed mid-test before its FAILURES/summary line is printed. The user's last run showed every batch at exactly 32.0s, confirming the watchdog fires regardless of pytest state.
2. HIDES TEST FAILURES: pytest's os._exit(0) masks its actual exit code, so the run_tests_batched.py runner (using subprocess.run(check=True)) reported 'All 5 batches passed' even when batch 5 had 5 F's in test_ticket_queue and 1 F in test_live_gui_filedialog_regression.
3. TIMING CORRELATION: Every batch in the run completed in 32.0s exactly. The 30s watchdog + ~2s pytest startup = 32.0s for ALL batches, including ones with 240 items collected that pytest never finished running.
Removed:
- The watchdog thread registration (conftest.py lines 77-82)
- The HANG PROTECTION comment block (replaced with explanation of why we removed it)
- tests/test_conftest_watchdog.py (the test no longer applies)
Kept:
- The wait_for_warmup() call (this is the SPEC's mechanism for tests to wait for AppController warmup, NOT a watchdog)
The runner's subprocess.run(timeout=1000) per batch is now the only safety net.
The os._exit(2) change in 719c5e27 introduced a regression: the watchdog's daemon thread continues running through pytest's interpreter shutdown. On EVERY batch (even ones that complete successfully in 17s), the watchdog's time.sleep(30.0) elapses during finalization and the thread calls os._exit(2) just as pytest is wrapping up. Result: every batch was reported as 'Batch N failed' by run_tests_batched.py, even ones with '126 passed in 17.14s'.
Revert watchdog to os._exit(0) — its original purpose (force-exit any stuck pytest at 30s) doesn't need a non-zero code; it's a sledgehammer, not a signal. The runner does its own failure detection.
Update scripts/run_tests_batched.py to:
- Use subprocess.run(timeout=180) per batch
- Catch TimeoutExpired as a batch failure (with elapsed time + reason printed)
- Catch CalledProcessError as a batch failure (preserved from before)
- Print elapsed time for every batch (pass or fail) so hang behavior is visible
- Print a final summary that lists all FAILED FILES (not batches) for easy re-running
- Add --batch-size and --timeout CLI flags
- Add 1-space indentation + type hints per project style
Verified: ast.parse OK; --help works; test_conftest_watchdog 3/3 pass.
The conftest watchdog (e1c8730f) used os._exit(0) after the 30s sleep. run_tests_batched.py calls subprocess.run(check=True) and only prints 'Batch N failed.' when the subprocess exits non-zero. Exit 0 hid the failure: pytest got killed mid-test, the FAILURES section never printed, and the runner silently moved to the next batch. The 'Total batches with failures: 1' summary at the end was therefore undercounting.
Fix: os._exit(0) -> os._exit(2). Code 2 is the standard 'interrupted by signal/timeout' code; pytest also uses it for Ctrl-C. The batched runner now correctly reports a non-zero exit as a failure.
Test updated (docstring) to document the new contract. 3/3 test_conftest_watchdog.py still pass.
The fixture detected stale processes on port 8999 but only issued a soft btn_reset POST (which doesn't reset the provider). When a previous batch left a sloppy.py subprocess running, the new subprocess failed to bind port 8999 and the wait loop connected to the stale process instead, leading to cross-batch state pollution (e.g., test_change_provider_via_hook seeing current_provider='gemini' after setting 'anthropic').
Fix: when port 8999 is found LISTENING, parse netstat -ano for the PID, taskkill /F /PID it, sleep 1s, then proceed with the fresh subprocess.Popen.
Verified: tests/test_conftest_watchdog.py 3/3 still pass (the watchdog from e1c8730f is independent of this fix).
run_tests_batched.py hangs at the end of a batch when the pytest
subprocess fails to exit cleanly. Two hang chains have been observed:
1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) joining a
blocked worker during interpreter finalization
(concurrent.futures._python_exit, pool __del__, etc.).
2. The session-scoped \live_gui\ fixture teardown hanging in
client.reset_session() (HTTP call to hook server) or
kill_process_tree(process.pid) / process.wait(timeout=2)
(waiting for the sloppy.py subprocess to die on Windows).
A previous atexit-based fix (commit 8957c9a5) attempted to preempt
chain #1, but verified empirically that atexit handlers do NOT fire
at all when a pool worker is blocked in user code (see
src/io_pool.py module docstring for the full analysis). The
atexit-based fix is therefore ineffective, and was removed from
the conftest in this commit.
Solution: a daemon-thread watchdog that unconditionally calls
os._exit(0) after 30s. If pytest exits cleanly first, the thread
is killed when the process tears down (daemon=True). If pytest
hangs, the watchdog kicks in and the batched runner can move to
the next batch. Same pattern as
src/app_controller.py:_install_sigint_exit_handler (the production
Ctrl+C fix); the difference is the trigger (time-based vs. SIGINT).
Files:
- tests/conftest.py: replaced the ineffective atexit-based fix
with the daemon-thread watchdog. Header comment documents both
hang chains and explains why atexit was abandoned.
- tests/test_conftest_watchdog.py: 3 static regression tests that
verify the watchdog is registered as a daemon thread with a
timeout in the 25-35s range. Static checks (not subprocess) so
the test itself isn't recursively bound by the watchdog.
Fixes the run_tests_batched.py hang that occurs after batch 4.
The original conftest (commit 52ea2693) stored _warmup_app_controller
at module scope for the entire pytest session. When pytest exits,
GC of the AppController triggers ThreadPoolExecutor.__del__ ->
shutdown(wait=True). If warmup hasn't fully completed by then, the
shutdown blocks indefinitely, causing the batched test runner to
hang at the subprocess.run boundary.
Fix: register an atexit handler that captures the _io_pool reference
directly (default argument) and shuts it down with wait=False. The
pool reference is captured by closure, surviving even after the
AppController is GC'd. shutdown() is idempotent so the subsequent
shutdown(wait=True) in __del__ is a no-op.
This is part of sub-track 4 (warmup notification) cleanup; the
conftest's wait_for_warmup behavior is preserved, only the
exit-hang is fixed.
The google-genai library has a known circular-import bug in its
__init__.py chain:
google.genai/__init__.py:21: from .client import Client
-> from ._api_client import BaseApiClient
-> from .types import HttpOptions
When loaded fresh in a pytest process, the chain collides with
itself and leaves google.genai in a 'partially initialized' state.
Per the user spec (startup_speedup_20260606 spec.md:2.2 Layer 3):
"the app controller should post to test clients or the user
when its threads are warmed up with imports — that way the user
knows 'hey you have the ui first, but now you have all the
functionality.'"
This is exactly what the warmup notification system does.
Phase 2 (commit 1354679e) added the WarmupManager + _io_pool,
and the warmup list (state.toml) already includes 'google.genai'.
The AppController.__init__ submits the warmup jobs to the _io_pool
background thread. When the warmup completes, _warmup_done_event
is set and registered on_warmup_complete callbacks fire.
The previous conftest fix imported 'google.genai' DIRECTLY at
conftest module load. That bypassed the whole notification
mechanism. This commit fixes the oversight:
- Reverts the direct `import google.genai`
- Creates an AppController at conftest load time
- Calls `wait_for_warmup(timeout=60.0)` to block until the
background warmup completes
- google.genai ends up in sys.modules via the warmup's
`importlib.import_module` call (same end state, but now via
the documented mechanism)
The conftest's `from src.gui_2 import App` at line 27 is also
a heavy synchronous import chain that runs in-process. By the
time that line executes, the warmup is already in progress on
the _io_pool. The wait_for_warmup() call after that line ensures
the warmup completes before any test collects.
The AppController is session-scoped (one per pytest process).
If another fixture (e.g. live_gui) creates its own AppController
that also runs warmup, the second controller's wait_for_warmup
returns immediately because the modules are already in
sys.modules.
Cost: 60s timeout worst-case (typically completes in ~3s based on
the baseline measurement). One-time per pytest process.
Earlier alternatives I tried and rejected:
- Direct `import google.genai` in conftest: bypasses the
notification mechanism. User feedback: "you are falling back
to your jank."
- Source-level `genai = _require_warmed('google.genai')` + `.types`:
fails the same way (the library bug is in the PARENT's
__init__.py, not the leaf). The parent's __init__.py never
completes in a fresh process; once it's in the "partially
initialized" state in sys.modules, no caller pattern can fix it.
- Revert the conftest change and skip these tests: not viable,
the tests are real and important.
- Add isolate_workspace autouse fixture in conftest.py.
- Monkeypatch SLOP_CONFIG and preset paths to point to a temporary test directory.
- Update test_history_management.py to use dynamic paths.get_config_path().
- Prevents tests from accidentally reading or modifying the active project.toml or config.toml.
- conftest.py: Include tools.text_editors.vscode in live_gui workspace config
- gui_2.py: Add btn_open_external_editor to _clickable_actions
- test_external_editor_gui.py: Tests for external editor GUI integration
Note: Due to process boundaries (GUI runs in subprocess), full VSCode launch
verification requires manual testing. The test infrastructure verifies config,
command format, and button wiring. Manual verification recommended.
- Add AppController.stop_services() to clean up AI client and event loop
- Add ConfirmDialog, MMAApprovalDialog, MMASpawnApprovalDialog imports to gui_2.py
- Fix test mocks for MMA dashboard and approval indicators
- Add retry logic to conftest.py for Windows file lock cleanup
Applied 236 return type annotations to functions with no return values
across 100+ files (core modules, tests, scripts, simulations).
Added Phase 4 to python_style_refactor track for remaining 597 items
(untyped params, vars, and functions with return values).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>