fix(tests): make unconditional watchdog signal-based too (900s, was 90s timer)

The unconditional watchdog (91b19c90) was a 90s time.sleep, which fired for ANY batch that ran >90s from conftest load — even legitimate slow live_gui tests. User confirmed: Batch 2 ended at 92.1s because the unconditional fired mid-test (the smart watchdog's signal hadn't fired yet because pytest_terminal_summary only runs after all tests are done).

Fix: make the unconditional ALSO signal-based. Both watchdogs now wait for the same _pytest_finished_event. The difference is just the timeout:
  - Smart: 300s pytest-hung + 5s grace (handles normal cases)
  - Unconditional: 900s pytest-hung + 5s grace (catches extremely long test runs)
  - If the signal never fires, both fire os._exit(2) (the first to time out wins).
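
The arrangement in the list above can be sketched as follows. This is an illustration under stated assumptions, not the conftest code: `start_watchdog`, `exit_fn`, `demo`, and the shortened timeouts are hypothetical, chosen so the example runs in well under a second. The real watchdogs call `os._exit` directly with 300s and 900s timeouts.

```python
import threading
import time

# Stand-in for _pytest_finished_event from the commit.
finished = threading.Event()


def start_watchdog(name, hang_timeout, grace, exit_fn):
    """Start a daemon thread that waits for `finished`.

    On timeout it calls exit_fn(2); if the signal arrives it sleeps
    for `grace` seconds and then calls exit_fn(0).
    """
    def run():
        if not finished.wait(timeout=hang_timeout):
            exit_fn(2)  # signal never arrived: treat as a hang
            return
        time.sleep(grace)  # signal arrived: allow a short grace period
        exit_fn(0)

    thread = threading.Thread(target=run, daemon=True, name=name)
    thread.start()
    return thread


def demo(signal_first):
    """Run both watchdogs with a recording exit_fn (hypothetical helper)."""
    calls = []
    finished.clear()
    if signal_first:
        finished.set()
    smart = start_watchdog("smart", 0.05, 0.01, calls.append)
    hard = start_watchdog("unconditional", 0.15, 0.01, calls.append)
    smart.join()
    hard.join()
    return calls
```

With a recording `exit_fn`, `demo(False)` shows both timeouts firing in turn. With `os._exit` the first call ends the process, so when the signal never arrives only the shorter timeout would be observed unless that thread is itself blocked.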

Why 900s for unconditional: pytest_terminal_summary fires AFTER the summary print. For a normal batch, that's ~32s. For an extremely long batch (e.g., 10+ minutes of slow tests), we want to wait the full duration before declaring it hung. 900s = 15 min is a safe upper bound; the run_tests_batched.py subprocess.run(timeout=1000) is the final safety net for catastrophic hangs.
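
A hedged sketch of the runner-side safety net mentioned above. `run_batch` is a hypothetical helper, and `check=True` is an assumption; the commit only states that `run_tests_batched.py` calls `subprocess.run(timeout=1000)` and sees a watchdog exit as `CalledProcessError`.

```python
import subprocess
import sys


def run_batch(cmd, timeout=1000):
    """Run one batch and classify the outcome.

    Returns ("ok", 0), ("failed", returncode), or ("timeout", None).
    """
    try:
        # check=True turns a non-zero exit (e.g. os._exit(2) from a
        # watchdog) into CalledProcessError.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        return "failed", exc.returncode
    except subprocess.TimeoutExpired:
        # The child was killed by subprocess.run after `timeout` seconds.
        return "timeout", None
    return "ok", 0
```

For example, `run_batch([sys.executable, "-c", "import os; os._exit(2)"])` reports a failed batch with return code 2, which is the path a watchdog exit would take under these assumptions.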

The two-thread design is intentional, for redundant safety: if one thread is somehow blocked, the other still fires. Both use a 5s grace period, so whichever thread fires first ends the process.
2026-06-07 13:43:30 -04:00
parent b0fefb2aab
commit 9796fe27f4
+15 -10
@@ -124,19 +124,24 @@ def _smart_watchdog_exit() -> None:
 threading.Thread(target=_smart_watchdog_exit, daemon=True, name="conftest-smart-watchdog").start()

 def _unconditional_watchdog_exit() -> None:
-    """Hard fail-safe: fires regardless of pytest state after 90s total.
+    """Hard fail-safe: also signal-based, but with a much longer
+    timeout than the smart watchdog.

-    The smart watchdog (above) is gated on pytest_unconfigure setting
-    _pytest_finished_event. If something is hung BEFORE pytest
-    unconfigure runs (e.g., the conftest's own _warmup_app_controller
-    hangs in wait_for_warmup during startup, or pytest never reaches
-    its unconfigure phase), the smart watchdog's first wait
-    blocks. This unconditional watchdog is the sledgehammer: 90s
-    from conftest load, fire os._exit(2) regardless.
+    The smart watchdog (above) uses 300s. This sledgehammer waits
+    900s (15 minutes) for the same signal, so a long-running test
+    can take up to 15 minutes before we declare it a hang. The
+    only case this catches that the smart doesn't: pytest finishes
+    but the test session is so long the smart's 300s expires first.
+    In that case we still want the runner to move on.
+
+    If the signal never fires (true hang), os._exit(2) so the runner
+    catches it as CalledProcessError.
     """
+    if not _pytest_finished_event.wait(timeout=900.0):
+        os._exit(2)
     import time
-    time.sleep(90.0)
-    os._exit(2)
+    time.sleep(5.0)
+    os._exit(0)

 threading.Thread(target=_unconditional_watchdog_exit, daemon=True, name="conftest-unconditional-watchdog").start()