fix(io_pool): increase worker count from 4 to 8 to prevent test hangs

Root cause: test_full_live_workflow in batch context (with prior sims running AI discussion turns) would queue its _do_project_switch behind the auto-pruner's scan of tests/logs/ (154 MB, 6519 files). The 4-worker pool was saturated, so the switch would never run within 30s.

Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough capacity to run: 2 pruners + the project switch + 5 spare.

Also:
- add /api/io_pool_status endpoint + get_io_pool_status + wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py for the test_api_hook_client_io_pool.py tests, even though the test itself no longer uses them - they remain useful for future tests that want to assert pool state directly).
- add wait_for_warmup at the start of test_full_live_workflow to ensure SDK modules are loaded before AI ops.

Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
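
The saturation described in the root cause can be reproduced in miniature. The sketch below is illustrative only and is not code from this repository: `short_job_starts_promptly`, the count of four blocking jobs, and the 0.3s grace period are all assumptions chosen for the demonstration. It fills a `ThreadPoolExecutor` with jobs that hold their workers, then checks whether a short job submitted afterwards gets a worker in time.

```python
import concurrent.futures
import threading


def short_job_starts_promptly(max_workers: int, blockers: int = 4, grace: float = 0.3) -> bool:
    """Return True if a short job starts within `grace` seconds while
    `blockers` long-running jobs (hypothetical stand-ins for slow scans)
    are still holding workers."""
    release = threading.Event()   # lets the long jobs finish
    started = threading.Event()   # set by the short job when it runs
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(blockers):
            pool.submit(release.wait)      # each blocker occupies one worker
        pool.submit(started.set)           # stand-in for the project switch
        prompt = started.wait(timeout=grace)
        release.set()                      # unblock so the pool can drain on exit
    return prompt


if __name__ == "__main__":
    print("4 workers:", short_job_starts_promptly(4))
    print("8 workers:", short_job_starts_promptly(8))
```

With four workers all held by blockers, the short job stays queued until a worker is released; with eight, a spare worker picks it up immediately. Adding workers only helps while the number of long-running jobs stays below the pool size.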
+40 -2
@@ -2263,13 +2263,51 @@ class AppController:
         at 4 workers (see src/io_pool.py) so the job may queue briefly if
         the pool is saturated.
 
+        The number of in-flight (running or queued) jobs is tracked via
+        self._io_pool_inflight, allowing tests to wait for the pool to drain
+        (see io_pool_idle() / wait_io_pool_idle()). This is needed because
+        the session-scoped live_gui fixture shares the controller across
+        tests; prior tests' io_pool workers must drain before subsequent
+        tests' submitted work can run.
+
         Domain-specific threads (HookServer, WebSocketServer, MMA WorkerPool,
         asyncio loop) are NOT submitted here - they have their own lifecycle
         management.
         [SDM: src/app_controller.py:submit_io]
         """
-        import concurrent.futures
-        return self._io_pool.submit(fn, *args, **kwargs)
+        if not hasattr(self, "_io_pool_inflight_lock"):
+            self._io_pool_inflight_lock = threading.Lock()
+        with self._io_pool_inflight_lock:
+            self._io_pool_inflight = getattr(self, "_io_pool_inflight", 0) + 1
+        future = self._io_pool.submit(fn, *args, **kwargs)
+        future.add_done_callback(lambda _f: self._io_pool_inflight_done())
+        return future
+
+    def _io_pool_inflight_done(self) -> None:
+        """Decrement the in-flight io_pool counter. Called by future callback."""
+        with self._io_pool_inflight_lock:
+            if getattr(self, "_io_pool_inflight", 0) > 0:
+                self._io_pool_inflight -= 1
+
+    def io_pool_idle(self) -> bool:
+        """True if no io_pool jobs are currently in-flight (running or queued).
+
+        Useful for tests that share a live_gui session with prior tests:
+        if the io_pool is still processing jobs from a prior test, submitting
+        a new project switch would queue behind them and the switch would
+        not complete promptly.
+        [C: tests/test_live_workflow.py:test_full_live_workflow]
+        """
+        return getattr(self, "_io_pool_inflight", 0) == 0
+
+    def wait_io_pool_idle(self, timeout: float = 60.0, poll_interval: float = 0.1) -> bool:
+        """Blocks until io_pool_idle() is True or timeout. Returns True on idle."""
+        start = time.time()
+        while time.time() - start < timeout:
+            if self.io_pool_idle():
+                return True
+            time.sleep(poll_interval)
+        return False
 
     def shutdown(self) -> None:
         """
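
The in-flight counting pattern in the diff above can be sketched as a standalone class. This is a minimal illustration under stated assumptions, not the repository's code: `TrackedPool` and its method names are hypothetical. It departs from the diff in three ways, each marked in a comment: the lock is created eagerly in `__init__`, a failed `submit()` undoes its increment, and the wait loop uses `time.monotonic()`.

```python
import concurrent.futures
import threading
import time


class TrackedPool:
    """Hypothetical wrapper: a thread pool that counts running-or-queued jobs."""

    def __init__(self, max_workers: int = 8) -> None:
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        # Assumption: create the lock up front instead of lazily via hasattr().
        self._lock = threading.Lock()
        self._inflight = 0

    def submit(self, fn, *args, **kwargs) -> concurrent.futures.Future:
        # Count the job before handing it to the pool so a fast job cannot
        # finish (and decrement) before it was counted.
        with self._lock:
            self._inflight += 1
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            # Assumption: if submit() itself fails, undo the increment.
            self._job_done()
            raise
        # The callback fires on completion, failure, or cancellation.
        future.add_done_callback(lambda _f: self._job_done())
        return future

    def _job_done(self) -> None:
        with self._lock:
            if self._inflight > 0:
                self._inflight -= 1

    def idle(self) -> bool:
        with self._lock:
            return self._inflight == 0

    def wait_idle(self, timeout: float = 60.0, poll_interval: float = 0.05) -> bool:
        # Assumption: monotonic clock, so wall-clock adjustments cannot
        # shorten or lengthen the wait.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if self.idle():
                return True
            time.sleep(poll_interval)
        return self.idle()

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A test sharing a session-scoped fixture could call `wait_idle()` before submitting its own work and fail fast if it returns False. Done callbacks run after a future's result is set, so a caller that has just seen `future.result()` return may briefly still observe a non-zero count; polling via `wait_idle()` absorbs that window.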