
fix(tests): add SMART hang watchdog (pytest_unconfigure-triggered, exit 2)

Re-add hang protection after the user's run showed pytest hanging in interpreter shutdown (ThreadPoolExecutor.__del__ / live_gui teardown) after Batch 1 completed successfully. The previous naive watchdog (e1c8730f, 30s os._exit(0)) cut off batches mid-test; the immediate removal (4103c08e) let real hangs wait 1000s for the runner's subprocess timeout.

This SMART watchdog only fires when pytest is ACTUALLY hanging:
  - pytest_unconfigure hook sets _pytest_finished_event when the
    test session is done (BEFORE interpreter finalization).
  - Watchdog waits for the event with 120s timeout:
      * If not set in 120s: pytest is hung in test execution -> os._exit(2).
      * If set: pytest finished cleanly; give 30s for normal
        interpreter shutdown (ThreadPoolExecutor.__del__, etc.).
      * If still alive after grace: io_pool / live_gui teardown
        is hung -> os._exit(2).
  - Exit code 2 (not 0) so run_tests_batched.py correctly reports
    a failed batch (CalledProcessError). The 0 in the previous
    version masked hangs and hid test failures.
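
A minimal sketch of why the exit code matters on the runner side. The internals of run_tests_batched.py are not shown in this commit, so `run_batch`, its 1000s default, and the stand-in child commands are assumptions for illustration only; the one thing taken from the message is that a non-zero exit surfaces as CalledProcessError.

```python
import subprocess
import sys


def run_batch(cmd: list[str], timeout: float = 1000.0) -> bool:
    """Hypothetical runner helper: True if the batch passed, False otherwise."""
    try:
        # check=True turns any non-zero exit code into CalledProcessError.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        # A watchdog-forced os._exit(2) arrives here as returncode 2.
        print(f"batch failed with exit code {exc.returncode}")
        return False
    except subprocess.TimeoutExpired:
        print("batch exceeded the runner timeout")
        return False
    return True


# Stand-ins for a clean batch and a watchdog-killed batch.
clean = run_batch([sys.executable, "-c", "import os; os._exit(0)"])
killed = run_batch([sys.executable, "-c", "import os; os._exit(2)"])
```

With os._exit(0), the second call would have returned True and the hang would have been reported as a pass.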

Contract:
  - Normal batch (35s execution, 2s shutdown): pytest_unconfigure
    fires at 35s, the watchdog's first wait returns as soon as the
    event is set, shutdown completes well inside the 30s grace, and
    pytest exits with its own code (0). Runner: passed.
  - Hung batch: pytest_unconfigure never fires, watchdog fires
    os._exit(2) at 120s. Runner: failed.
  - Hung shutdown (io_pool.__del__ blocks): pytest_unconfigure
    fires, 30s grace elapses, watchdog fires os._exit(2). Runner: failed.
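
The three cases above can be modelled as a small decision function. This is an illustrative sketch, not code from the commit: `watchdog_outcome` and its return strings are hypothetical names, and None stands for "never happens".

```python
from typing import Optional

HANG_TIMEOUT = 120.0   # seconds within which pytest_unconfigure must fire
SHUTDOWN_GRACE = 30.0  # seconds allowed for interpreter shutdown afterwards


def watchdog_outcome(unconfigure_at: Optional[float],
                     shutdown_takes: Optional[float]) -> str:
    """Model of the contract; None means the step never completes."""
    if unconfigure_at is None or unconfigure_at > HANG_TIMEOUT:
        return "os._exit(2) at 120s"
    if shutdown_takes is None or shutdown_takes > SHUTDOWN_GRACE:
        return "os._exit(2) after grace"
    return "pytest exit code"


normal = watchdog_outcome(35.0, 2.0)
hung_batch = watchdog_outcome(None, None)
hung_shutdown = watchdog_outcome(35.0, None)
```

Under this model, a session that legitimately runs longer than 120s is indistinguishable from a hang, so the 120s budget has to exceed the slowest expected batch.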

5 new tests in tests/test_conftest_smart_watchdog.py:
  - test_watchdog_thread_registered: daemon thread named conftest-smart-watchdog
  - test_watchdog_thread_is_daemon: doesn't block pytest exit
  - test_pytest_unconfigure_sets_finished_flag: hook exists in conftest
  - test_watchdog_uses_non_zero_exit_code: os._exit(2) is used
  - test_watchdog_timeouts_documented: 120s and 30s are present
2026-06-07 13:18:11 -04:00
parent 4103c08eac
commit 44b0b5d4ee
2 changed files with 160 additions and 0 deletions
@@ -67,6 +67,45 @@ if not _warmup_app_controller.wait_for_warmup(timeout=60.0):
stacklevel=2,
)
# HANG PROTECTION (smart watchdog). Two observed hang chains from
# e1c8730f and the prior naive watchdog:
# 1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) on a blocked
# worker during interpreter finalization (e.g., the io_pool
# created in AppController.__init__ at conftest line 65).
# 2. The session-scoped `live_gui` fixture teardown hanging in
# client.reset_session() (HTTP call to the hook server) or
# kill_process_tree(process.pid) / process.wait(timeout=2) waiting
# for the sloppy.py subprocess to die on Windows.
# The naive os._exit(0) at 30s approach CUT OFF BATCHES MID-TEST
# (every batch exited at 32.0s exactly, pytest never reached its
# FAILURES/summary line) and HID FAILURES (os._exit(0) masked
# pytest's non-zero exit code).
#
# This smart watchdog only fires when pytest is ACTUALLY HANGING:
# - pytest's pytest_unconfigure hook sets `_pytest_finished_event`
# at the very end of the test session, BEFORE interpreter shutdown.
# - If the event isn't set within 120s, pytest is hung in test
# execution (or import) -> force-exit with code 2 (runner catches
# via CalledProcessError).
# - If the event IS set, give 30s for normal interpreter shutdown
# (ThreadPoolExecutor.__del__, etc.). If still alive, force-exit.
# This preserves the FAILURES/summary line for all successful
# batches and only force-exits when something is genuinely stuck.
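import threading
import time

_pytest_finished_event: threading.Event = threading.Event()

def pytest_unconfigure(config: object) -> None:
    # Tell the watchdog that the test session has finished.
    _pytest_finished_event.set()

def _smart_watchdog_exit() -> None:
    # Phase 1: the session must finish within 120s.
    if not _pytest_finished_event.wait(timeout=120.0):
        os._exit(2)
    # Phase 2: the event is already set, so waiting on it again would
    # return immediately and never fire. Sleep through the 30s grace
    # instead; a clean shutdown ends the process before it elapses.
    time.sleep(30.0)
    os._exit(2)

threading.Thread(target=_smart_watchdog_exit, daemon=True, name="conftest-smart-watchdog").start()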
from src.gui_2 import App
class VerificationLogger: