From 9796fe27f434619e1f7048265f520ad46c8d12bf Mon Sep 17 00:00:00 2001
From: Ed_
Date: Sun, 7 Jun 2026 13:43:30 -0400
Subject: [PATCH] fix(tests): make unconditional watchdog signal-based too
 (900s, was 90s timer)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The unconditional watchdog (91b19c90) was a 90s time.sleep, which fired
for ANY batch that ran >90s from conftest load, even legitimate slow
live_gui tests. User confirmed: Batch 2 ended at 92.1s because the
unconditional watchdog fired mid-test (the smart watchdog's signal
hadn't fired yet because pytest_terminal_summary only runs after all
tests are done).

Fix: make the unconditional watchdog ALSO signal-based. Both watchdogs
now wait for the same _pytest_finished_event. The difference is just
the timeout:

- Smart: 300s pytest-hung + 5s grace (handles normal cases)
- Unconditional: 900s pytest-hung + 5s grace (catches extremely long
  test runs)
- If the signal never fires, both fire os._exit(2) (the first to time
  out wins).

Why 900s for the unconditional watchdog: pytest_terminal_summary fires
AFTER the summary print. For a normal batch, that's ~32s. For an
extremely long batch (e.g., 10+ minutes of slow tests), we want to wait
the full duration before declaring it hung. 900s = 15 min is a safe
upper bound; the run_tests_batched.py subprocess.run(timeout=1000) is
the final safety net for catastrophic hangs.

The two-thread design is intentional (redundant safety). If one thread
is somehow blocked, the other fires. The grace period is 5s for both,
so the first to fire wins the race.
---
 tests/conftest.py | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/tests/conftest.py b/tests/conftest.py
index 0141d434..3ceb878e 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -124,19 +124,24 @@ def _smart_watchdog_exit() -> None:
 threading.Thread(target=_smart_watchdog_exit, daemon=True, name="conftest-smart-watchdog").start()
 
 def _unconditional_watchdog_exit() -> None:
-    """Hard fail-safe: fires regardless of pytest state after 90s total.
+    """Hard fail-safe: also signal-based, but with a much longer
+    timeout than the smart watchdog.
 
-    The smart watchdog (above) is gated on pytest_unconfigure setting
-    _pytest_finished_event. If something is hung BEFORE pytest
-    unconfigure runs (e.g., the conftest's own _warmup_app_controller
-    hangs in wait_for_warmup during startup, or pytest never reaches
-    its unconfigure phase), the smart watchdog's first wait
-    blocks. This unconditional watchdog is the sledgehammer: 90s
-    from conftest load, fire os._exit(2) regardless.
+    The smart watchdog (above) uses 300s. This sledgehammer waits
+    900s (15 minutes) for the same signal, so a long-running test
+    can take up to 15 minutes before we declare it a hang. The
+    only case this catches that the smart doesn't: pytest finishes
+    but the test session is so long the smart's 300s expires first.
+    In that case we still want the runner to move on.
+
+    If the signal never fires (true hang), os._exit(2) so the runner
+    catches it as CalledProcessError.
     """
+    if not _pytest_finished_event.wait(timeout=900.0):
+        os._exit(2)
     import time
-    time.sleep(90.0)
-    os._exit(2)
+    time.sleep(5.0)
+    os._exit(0)
 
 threading.Thread(target=_unconditional_watchdog_exit, daemon=True, name="conftest-unconditional-watchdog").start()
 
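
Note (not part of the patch): the wait-then-exit pattern in the hunk above can be tried in isolation. The sketch below is a minimal illustration, not the code in conftest.py. The names make_watchdog, run_once and exit_fn are hypothetical; the exit call is injected so the control flow can be exercised without killing the interpreter, and the timeouts are shrunk to fractions of a second.

```python
import threading
import time
from typing import Callable, List


def make_watchdog(
    finished: threading.Event,
    hang_timeout: float,
    grace: float,
    exit_fn: Callable[[int], None],
) -> Callable[[], None]:
    """Build a thread target with the same control flow as the patched watchdog.

    All names here are hypothetical; conftest.py hard-codes the timeouts
    and calls os._exit directly.
    """

    def _run() -> None:
        # Event.wait() returns False if the timeout elapses before set().
        if not finished.wait(timeout=hang_timeout):
            exit_fn(2)  # signal never arrived: report a hang
            return  # only reached with a fake exit_fn; os._exit never returns
        time.sleep(grace)  # signal arrived: allow a short grace period
        exit_fn(0)  # then request a clean exit

    return _run


def run_once(
    signal_fires: bool, hang_timeout: float = 0.05, grace: float = 0.01
) -> List[int]:
    """Run one watchdog synchronously and return the exit codes it requested."""
    codes: List[int] = []
    finished = threading.Event()
    if signal_fires:
        finished.set()
    make_watchdog(finished, hang_timeout, grace, codes.append)()
    return codes
```

Under these assumptions, run_once(True) takes the grace-period path and requests exit code 0, while run_once(False) times out and requests exit code 2. The two watchdogs in the commit message would correspond roughly to make_watchdog(event, 300.0, 5.0, os._exit) and make_watchdog(event, 900.0, 5.0, os._exit), each started on its own daemon thread.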