fix(tests): make unconditional watchdog signal-based too (900s, was 90s timer)
The unconditional watchdog (91b19c90) was a 90s time.sleep, so it fired for ANY batch that ran longer than 90s from conftest load, even legitimate slow live_gui tests. User confirmed: Batch 2 ended at 92.1s because the unconditional watchdog fired mid-test (the smart watchdog's signal hadn't fired yet because pytest_terminal_summary only runs after all tests are done).
Fix: make the unconditional ALSO signal-based. Both watchdogs now wait for the same _pytest_finished_event. The difference is just the timeout:
- Smart: 300s pytest-hung + 5s grace (handles normal cases)
- Unconditional: 900s pytest-hung + 5s grace (catches extremely long test runs)
- If the signal never fires, both fire os._exit(2) (the first to time out wins).
Why 900s for unconditional: pytest_terminal_summary fires AFTER the summary print. For a normal batch, that's ~32s. For an extremely long batch (e.g., 10+ minutes of slow tests), we want to wait the full duration before declaring it hung. 900s = 15 min is a safe upper bound; the run_tests_batched.py subprocess.run(timeout=1000) is the final safety net for catastrophic hangs.
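The outer safety net mentioned above could look roughly like the sketch below. The actual contents of run_tests_batched.py are not shown in this commit, so `run_batch` and its return strings are hypothetical; only the use of subprocess.run with a timeout and the CalledProcessError path come from the message.

```python
import subprocess
import sys


def run_batch(cmd: list[str], timeout: float = 1000.0) -> str:
    """Run one test batch and classify how it ended (hypothetical helper)."""
    try:
        # check=True turns any non-zero exit code into CalledProcessError.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        # Covers a watchdog calling os._exit(2) as well as ordinary
        # non-zero pytest exit codes.
        return f"failed (exit code {exc.returncode})"
    except subprocess.TimeoutExpired:
        # Neither watchdog ended the process in time; subprocess.run
        # kills the child before raising.
        return "timed out"
    return "ok"
```

With the values from this commit the layers are ordered 300s < 900s < 1000s, so the runner timeout should only be reached if both in-process watchdogs fail to act.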
Two-thread design is intentional (redundant safety). If one thread is somehow blocked, the other fires. The grace period is 5s for both, so the first to fire wins the race.
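The shared-signal design described above can be sketched as follows. This is a minimal illustration, not the conftest code itself: `make_watchdog`, `start_watchdogs`, `finished_event`, and the `exit_fn` parameter are hypothetical names introduced so the logic can be exercised without killing the interpreter. The real watchdogs call os._exit directly.

```python
import os
import threading
import time
from typing import Callable

# Stand-in for the conftest's _pytest_finished_event.
finished_event = threading.Event()


def make_watchdog(
    event: threading.Event,
    hang_timeout: float,
    grace: float = 5.0,
    exit_fn: Callable[[int], None] = os._exit,  # hypothetical injection point
) -> Callable[[], None]:
    """Build a watchdog body that waits for `event` before exiting."""

    def _watchdog() -> None:
        # No signal within hang_timeout: treat the run as hung.
        if not event.wait(timeout=hang_timeout):
            exit_fn(2)
            return  # only reached when exit_fn is a test double
        # Signal arrived: allow a short grace period, then exit cleanly.
        time.sleep(grace)
        exit_fn(0)

    return _watchdog


def start_watchdogs(event: threading.Event) -> list[threading.Thread]:
    """Start both daemon threads with the timeouts named in this commit."""
    threads = []
    for name, timeout in (("smart", 300.0), ("unconditional", 900.0)):
        thread = threading.Thread(
            target=make_watchdog(event, timeout),
            daemon=True,
            name=f"conftest-{name}-watchdog",
        )
        thread.start()
        threads.append(thread)
    return threads
```

Passing a list's `append` as `exit_fn` with a very short `hang_timeout` records the exit code instead of terminating the process, which makes both the hang path (2) and the clean path (0) easy to check.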
+15
-10
@@ -124,19 +124,24 @@ def _smart_watchdog_exit() -> None:
 threading.Thread(target=_smart_watchdog_exit, daemon=True, name="conftest-smart-watchdog").start()
 
 def _unconditional_watchdog_exit() -> None:
-    """Hard fail-safe: fires regardless of pytest state after 90s total.
+    """Hard fail-safe: also signal-based, but with a much longer
+    timeout than the smart watchdog.
 
-    The smart watchdog (above) is gated on pytest_unconfigure setting
-    _pytest_finished_event. If something is hung BEFORE pytest
-    unconfigure runs (e.g., the conftest's own _warmup_app_controller
-    hangs in wait_for_warmup during startup, or pytest never reaches
-    its unconfigure phase), the smart watchdog's first wait
-    blocks. This unconditional watchdog is the sledgehammer: 90s
-    from conftest load, fire os._exit(2) regardless.
+    The smart watchdog (above) uses 300s. This sledgehammer waits
+    900s (15 minutes) for the same signal, so a long-running test
+    can take up to 15 minutes before we declare it a hang. The
+    only case this catches that the smart doesn't: pytest finishes
+    but the test session is so long the smart's 300s expires first.
+    In that case we still want the runner to move on.
+
+    If the signal never fires (true hang), os._exit(2) so the runner
+    catches it as CalledProcessError.
     """
+    if not _pytest_finished_event.wait(timeout=900.0):
+        os._exit(2)
     import time
-    time.sleep(90.0)
-    os._exit(2)
+    time.sleep(5.0)
+    os._exit(0)
 
 threading.Thread(target=_unconditional_watchdog_exit, daemon=True, name="conftest-unconditional-watchdog").start()
 