4a33848620
Root cause: in batch context (with prior sims running AI discussion turns), test_full_live_workflow would queue its _do_project_switch behind the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker pool was saturated, so the switch never ran within 30s.

Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough capacity to run 2 pruners + the project switch + 5 spare workers.

Also:
- Add the /api/io_pool_status endpoint plus the get_io_pool_status and wait_io_pool_idle helpers. They stay in api_hooks.py and api_hook_client.py for the test_api_hook_client_io_pool.py tests. test_full_live_workflow itself no longer uses them, but they remain useful for future tests that want to assert pool state directly.
- Add wait_for_warmup at the start of test_full_live_workflow to ensure SDK modules are loaded before AI ops.

Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
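The root cause above is ordinary head-of-line blocking. ThreadPoolExecutor has no task priorities, so once every worker is busy, a newly submitted task waits in the FIFO work queue until a worker frees up. The sketch below reproduces that in miniature, assuming the long-running tasks stand in for the pruner scans and the quick task for _do_project_switch. The function name, the "demo-io" prefix and the timings are illustrative assumptions, not taken from the project.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def time_to_start_quick_task(max_workers: int, blockers: int, hold_s: float = 0.5) -> float:
    """Return how long a quick task waits in the queue before it starts.

    `blockers` long tasks (stand-ins for pruner scans) are submitted first;
    each holds its worker until `release` is set or `hold_s` elapses.
    """
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="demo-io") as pool:
        try:
            # Occupy workers with long-running "scans".
            for _ in range(blockers):
                pool.submit(release.wait, hold_s)
            submitted = time.monotonic()
            # Stand-in for the project switch: it only records when it started.
            started = pool.submit(time.monotonic)
            delay = started.result() - submitted
        finally:
            # Free the blockers so the with-block's shutdown(wait=True) is quick.
            release.set()
    return delay


if __name__ == "__main__":
    print(f"4 workers, 4 blockers: {time_to_start_quick_task(4, 4):.2f}s")
    print(f"8 workers, 4 blockers: {time_to_start_quick_task(8, 4):.2f}s")
```

With 4 workers and 4 blockers, the quick task waits roughly `hold_s`. With 8 workers it starts almost immediately, because the executor lazily spawns another thread while it is below max_workers. Raising the cap only helps while the number of concurrent long tasks stays below it, which is why the commit message budgets the 8 workers explicitly.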
37 lines
1.3 KiB
Python
"""Shared AppController I/O pool factory.

Historical note: an earlier revision of this module registered an
``atexit.register(pool.shutdown, wait=False)`` handler here, mirroring
the conftest fix at commit 8957c9a5. That approach was reverted because
it does not solve the Ctrl+C hang in ``sloppy.py`` when a worker is
mid-task (e.g. a long-running Gemini/Anthropic HTTP request): atexit
handlers do not fire at all in that scenario, so the process still hangs
in ``ThreadPoolExecutor.__del__`` -> ``shutdown(wait=True)`` during
finalization.

The production fix lives in ``AppController.__init__`` as a SIGINT
handler that drains the pool and calls ``os._exit(0)``, sidestepping
the broken finalization chain. See commit log for details.
"""

from concurrent.futures import ThreadPoolExecutor


IO_POOL_MAX_WORKERS: int = 8
IO_POOL_THREAD_NAME_PREFIX: str = "controller-io"


def make_io_pool(max_workers: int = IO_POOL_MAX_WORKERS) -> ThreadPoolExecutor:
    """Create the shared AppController I/O pool.

    Up to ``IO_POOL_MAX_WORKERS`` (8) worker threads, named
    "controller-io_N" (ThreadPoolExecutor appends "_N" to the prefix).
    Used for warmup, log pruning, disk-bound subsystem init, and any other
    background work that should not spin up its own thread.

    Caller is responsible for shutdown (e.g. controller.shutdown()).
    """
    return ThreadPoolExecutor(
        max_workers=max_workers,
        thread_name_prefix=IO_POOL_THREAD_NAME_PREFIX,
    )
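The module docstring describes the production Ctrl+C fix only in prose: a SIGINT handler in AppController.__init__ that drains the pool and calls os._exit(0). A minimal sketch of that pattern, assuming "drains" means cancelling queued work without waiting for in-flight tasks. make_sigint_handler, install_sigint_handler and the exit_fn parameter are hypothetical names, not the project's API. exit_fn exists only so the handler can be exercised without killing the process.

```python
import os
import signal
from concurrent.futures import ThreadPoolExecutor
from types import FrameType
from typing import Callable, Optional


def make_sigint_handler(
    pool: ThreadPoolExecutor,
    exit_fn: Callable[[int], object] = os._exit,
) -> Callable[[int, Optional[FrameType]], None]:
    """Build a SIGINT handler that drains `pool`, then exits immediately.

    Hypothetical helper; the real handler lives in AppController.__init__.
    """

    def _on_sigint(signum: int, frame: Optional[FrameType]) -> None:
        # Cancel everything still queued and return at once instead of
        # joining workers that may be stuck in a long HTTP request.
        # cancel_futures requires Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)
        # os._exit() skips atexit handlers, non-daemon thread joins and
        # normal interpreter finalization, so a busy worker cannot block exit.
        exit_fn(0)

    return _on_sigint


def install_sigint_handler(pool: ThreadPoolExecutor) -> None:
    """Register the handler; signal.signal() must be called on the main thread."""
    signal.signal(signal.SIGINT, make_sigint_handler(pool))
```

The trade-off is deliberate: os._exit guarantees the process ends even with a worker blocked mid-request, but it also skips flushing stdio buffers and any remaining cleanup, which is presumably acceptable for an interactive Ctrl+C.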