fix(tests): use pytest_terminal_summary as primary 'session done' signal

The previous smart watchdog (44b0b5d4, 91b19c90) used pytest_unconfigure as
its signal. But pytest_unconfigure fires AFTER all fixtures, terminal summary,
and finalizers, at the very end of the session. If anything in conftest's
chain (e.g., the io_pool created in AppController.__init__ at conftest line
~65) hangs in __del__, pytest_unconfigure never gets called. Result: every
batch's watchdog waited the full 60s/90s and then fired.

The right signal is pytest_terminal_summary, which fires AFTER the test
summary is printed (the user can see '241 passed, 1 skipped in 32.30s' in the
output) but BEFORE the shutdown hangs begin. At that point the test session is
logically done; the watchdog can give a short 5s grace for normal
finalization, then os._exit(0) so the runner can move to the next batch.

The previous attempts and why they failed (documented in
test_conftest_smart_watchdog.py docstring):
- e1c8730f: 30s os._exit(0) cut off batches mid-test
- 719c5e27: os._exit(2) but daemon thread fired on every batch
- 91b19c90: kept exit 2 but pytest_unconfigure never fires when io_pool hangs
- 44b0b5d4: pytest_unconfigure as signal still hung
- 2026-06-07 final: pytest_terminal_summary fires after summary print,
  before shutdown hangs

New contract:
- Normal batch: pytest_terminal_summary fires at ~32s (after summary is
  printed), 5s grace, os._exit(0). Total: 37s.
- Hung in test execution: pytest_terminal_summary never fires, smart watchdog
  waits 300s, fires os._exit(2).
- Hung in conftest load (before any test): unconditional watchdog fires
  os._exit(2) at 60s.

7 tests in test_conftest_smart_watchdog.py updated to match:
- test_terminal_summary_hook_sets_finished_event: primary signal source
- test_unconfigure_hook_is_fallback_signal: fallback for crashes
- test_clean_exit_uses_zero_exit_code: os._exit(0) after signal
- test_hang_uses_nonzero_exit_code: os._exit(2) for true hangs
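
The "New contract" can be illustrated with a small standalone sketch. This is not the conftest code itself. The names `watchdog`, `run_once`, and `exit_fn` are hypothetical, the timeouts are shortened from 300s/5s so it runs quickly, and the exit call is injected so the logic can be exercised without killing the interpreter (in the conftest that role is played by `os._exit`).

```python
import threading
import time
from typing import Callable, List


def watchdog(
    finished: threading.Event,
    hang_timeout: float,
    grace: float,
    exit_fn: Callable[[int], None],
) -> None:
    """Hypothetical stand-in for the conftest watchdog thread body."""
    # No "session done" signal within hang_timeout: treat it as a true hang.
    if not finished.wait(timeout=hang_timeout):
        exit_fn(2)
        return
    # Signal received: allow a short grace period for normal finalization,
    # then force a clean exit.
    time.sleep(grace)
    exit_fn(0)


def run_once(signal_done: bool) -> int:
    """Run the watchdog once and return the exit code it would have used."""
    finished = threading.Event()
    codes: List[int] = []
    thread = threading.Thread(
        target=watchdog,
        args=(finished, 0.2, 0.05, codes.append),  # shortened timeouts
        daemon=True,
    )
    thread.start()
    if signal_done:
        finished.set()  # what the pytest hooks do in the real conftest
    thread.join(timeout=5.0)
    return codes[0]
```

One thing the sketch makes visible: once the event is set, the forced exit code is always 0. A variant could record the `exitstatus` argument that pytest passes to `pytest_terminal_summary` and exit with that value instead, so that a forced exit does not report a failing batch as clean. That is a possible refinement, not something this commit does.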
+37
-20
@@ -67,8 +67,8 @@ if not _warmup_app_controller.wait_for_warmup(timeout=60.0):
         stacklevel=2,
     )
 
-# HANG PROTECTION (smart watchdog). Two observed hang chains from
-# e1c8730f and the prior naive watchdog:
+# HANG PROTECTION (signal-based watchdog). Two observed hang chains
+# from e1c8730f and the prior naive watchdog:
 # 1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) on a blocked
 #    worker during interpreter finalization (e.g., the io_pool
 #    created in AppController.__init__ at conftest line 65).
@@ -76,33 +76,50 @@ if not _warmup_app_controller.wait_for_warmup(timeout=60.0):
 #    client.reset_session() (HTTP call to the hook server) or
 #    kill_process_tree(process.pid) / process.wait(timeout=2) waiting
 #    for the sloppy.py subprocess to die on Windows.
-# The naive os._exit(0) at 30s approach CUT OFF BATCHES MID-TEST
-# (every batch exited at 32.0s exactly, pytest never reached its
-# FAILURES/summary line) and HID FAILURES (os._exit(0) masked
-# pytest's non-zero exit code).
+# The naive 30s os._exit(0) approach CUT OFF BATCHES MID-TEST (every
+# batch exited at 32.0s exactly). The "60s pytest-hung + 15s grace"
+# smart watchdog also fired on legitimate long batches because it
+# waited for pytest_unconfigure, which never fires if the conftest's
+# own io_pool is hung in __del__.
 #
-# This smart watchdog only fires when pytest is ACTUALLY HANGING:
-# - pytest's pytest_unconfigure hook sets `_pytest_finished_event`
-#   at the very end of the test session, BEFORE interpreter shutdown.
-# - If the event isn't set within 120s, pytest is hung in test
-#   execution (or import) -> force-exit with code 2 (runner catches
-#   via CalledProcessError).
-# - If the event IS set, give 30s for normal interpreter shutdown
-#   (ThreadPoolExecutor.__del__, etc.). If still alive, force-exit.
-# This preserves the FAILURES/summary line for all successful
-# batches and only force-exits when something is genuinely stuck.
+# CORRECT approach: signal-based. Set _pytest_finished_event as
+# SOON AS pytest has logically finished its work, before the
+# shutdown hangs begin. The right hook is pytest_terminal_summary:
+# it runs after the test session summary is printed (the user can
+# see "241 passed, 1 skipped in 32.30s" in the output) but BEFORE
+# finalization. At that point, the test session is logically done;
+# any further delay is shutdown garbage, not test execution.
+#
+# Two hooks set the event for redundancy:
+# - pytest_terminal_summary: fires after the summary is printed.
+#   This is the primary signal.
+# - pytest_unconfigure: fires at the very end, after the summary.
+#   Fallback in case the terminal summary hook isn't reached (e.g.,
+#   pytest crashes mid-summary).
+# After the event fires, give 5s for normal finalization, then
+# os._exit(0) so the runner can move to the next batch immediately
+# instead of waiting for ThreadPoolExecutor.__del__ to unblock
+# (which can take 60+ seconds).
+#
+# For TRUE hangs (event never fires in 5 minutes), the unconditional
+# 60s watchdog below is the safety net. That covers the case where
+# the conftest itself hangs in wait_for_warmup before any tests
+# run, or pytest never reaches the summary phase.
 import threading
 _pytest_finished_event: threading.Event = threading.Event()
 
+def pytest_terminal_summary(terminalreporter: object, exitstatus: int, config: object) -> None:
+    _pytest_finished_event.set()
+
 def pytest_unconfigure(config: object) -> None:
     _pytest_finished_event.set()
 
 def _smart_watchdog_exit() -> None:
+    if not _pytest_finished_event.wait(timeout=300.0):
+        os._exit(2)
     import time
-    if not _pytest_finished_event.wait(timeout=60.0):
-        os._exit(2)
-    if not _pytest_finished_event.wait(timeout=15.0):
-        os._exit(2)
+    time.sleep(5.0)
+    os._exit(0)
 
 threading.Thread(target=_smart_watchdog_exit, daemon=True, name="conftest-smart-watchdog").start()
 
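
On the runner side, the exit codes from the contract (0 after a signalled finish, 2 on a hang) might be consumed roughly as sketched below. The runner itself is not part of this diff, so `run_batch` and its return labels are hypothetical. The sketch only assumes the runner launches each batch as a subprocess and relies on `CalledProcessError`, as the removed comment describes.

```python
import subprocess
import sys
from typing import Sequence


def run_batch(cmd: Sequence[str], timeout: float = 600.0) -> str:
    """Hypothetical runner helper: run one batch and classify its exit code."""
    try:
        # check=True raises CalledProcessError on any non-zero exit code.
        subprocess.run(list(cmd), check=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        if exc.returncode == 2:
            # The watchdog uses 2 for hangs, but pytest itself also exits
            # with 2 when a run is interrupted, so the label is ambiguous.
            return "forced-exit-or-interrupted"
        return f"failed({exc.returncode})"
    return "ok"


if __name__ == "__main__":
    # Simulate a batch whose watchdog fired with os._exit(2).
    print(run_batch([sys.executable, "-c", "import os; os._exit(2)"]))
```

Because pytest's own "interrupted" status is also exit code 2, a runner that needs to tell a watchdog kill apart from a Ctrl-C would need a different code or an extra marker. The sketch does not attempt that.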