719c5e274a
The conftest watchdog (e1c8730f) used os._exit(0) after the 30s sleep. run_tests_batched.py calls subprocess.run(check=True) and only prints 'Batch N failed.' when the subprocess exits non-zero. Exit 0 hid the failure: pytest got killed mid-test, the FAILURES section never printed, and the runner silently moved to the next batch. The 'Total batches with failures: 1' summary at the end was therefore undercounting.
Fix: os._exit(0) -> os._exit(2). Code 2 is the exit code pytest itself uses when a run is interrupted (e.g. Ctrl-C), so it fits a watchdog-forced abort. The batched runner now sees a non-zero exit and reports the batch as failed.
Test updated (docstring) to document the new contract. 3/3 test_conftest_watchdog.py still pass.
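To illustrate why the exit code matters, here is a minimal sketch of the runner-side behaviour described above. It assumes the runner wraps subprocess.run(check=True) in a try/except for CalledProcessError; the child script, the shortened 0.2s timeout, and the helper name batch_reported_failed are hypothetical stand-ins, not code from run_tests_batched.py or the conftest.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a pytest process that hangs until a
# daemon-thread watchdog force-exits it with the given code.
CHILD_TEMPLATE = textwrap.dedent("""
    import os, threading, time

    def _watchdog():
        time.sleep(0.2)   # the real watchdog sleeps ~30s; shortened here
        os._exit({code})

    threading.Thread(target=_watchdog, daemon=True).start()
    time.sleep(60)        # simulates the hang the watchdog has to bound
""")


def batch_reported_failed(exit_code: int) -> bool:
    """Return True if a check=True runner would see this batch as failed."""
    script = CHILD_TEMPLATE.format(code=exit_code)
    try:
        subprocess.run([sys.executable, "-c", script], check=True)
    except subprocess.CalledProcessError:
        return True   # non-zero exit: the runner can print "Batch N failed."
    return False      # exit 0: the kill is indistinguishable from success


if __name__ == "__main__":
    print("exit 0 reported as failure:", batch_reported_failed(0))
    print("exit 2 reported as failure:", batch_reported_failed(2))
```

With exit code 0 the forced kill looks like a clean batch; with exit code 2 subprocess.run raises CalledProcessError and the failure becomes visible to the caller.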
101 lines
3.8 KiB
Python
"""Regression: pytest conftest must install a hang-bounding watchdog.

The run_tests_batched.py runner hangs at the end of a batch when the
pytest subprocess fails to exit cleanly. Two hang chains have been
observed:
  1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) on a blocked
     worker during interpreter finalization.
  2. The session-scoped `live_gui` fixture teardown (conftest.py:~451)
     hanging on an HTTP call to the hook server or on process.wait() for
     the sloppy.py subprocess.

The conftest installs a daemon-thread watchdog (os._exit(2) after a
30s timeout) to bound the hang. The non-zero exit code is critical:
run_tests_batched.py uses subprocess.run(check=True) and only
prints "Batch N failed." if pytest exits non-zero. Exit code 0 would
silently report a successful batch even when the watchdog killed
pytest mid-test (the FAILURES section never gets printed). Exit
code 2 matches pytest's own "interrupted" exit code (the one it uses
for Ctrl-C) and preserves the failure signal to the runner.

This test verifies the watchdog is actually registered after the
conftest loads. It does NOT spawn a subprocess (which would itself
be bound by the watchdog and create a recursive timeout); it just
inspects threading.enumerate() at the time the test runs.

If the watchdog is removed or the timeout drifts outside the
tolerance, this test fails, flagging that the run_tests_batched.py
hang may return.
"""

import sys
import threading
from pathlib import Path

import pytest

ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(ROOT))

# The conftest has already been loaded by pytest before this test
# module is collected. We just need to verify the watchdog thread is alive.
WATCHDOG_NAME = "conftest-hang-watchdog"
WATCHDOG_SLEEP_SECONDS = 30.0
WATCHDOG_TOLERANCE_SECONDS = 5.0


def test_watchdog_thread_registered() -> None:
    """Verify the conftest's hang-bounding watchdog thread is alive.

    The watchdog is a daemon thread named "conftest-hang-watchdog" that
    sleeps for ~30s then calls os._exit(2). It must be alive (not yet
    fired) at the time this test runs, because the pytest session has
    not been running for 30s yet.
    """
    threads = threading.enumerate()
    names = [t.name for t in threads]
    assert WATCHDOG_NAME in names, (
        f"conftest watchdog thread {WATCHDOG_NAME!r} not found in "
        f"threading.enumerate(); run_tests_batched.py will hang at end "
        f"of batch. Active threads: {names}"
    )


def test_watchdog_thread_is_daemon() -> None:
    """Watchdog must be daemon so it doesn't block pytest's own exit."""
    for t in threading.enumerate():
        if t.name == WATCHDOG_NAME:
            assert t.daemon, (
                f"watchdog thread is not daemon (daemon={t.daemon}); "
                f"this would prevent pytest from exiting cleanly"
            )
            return
    pytest.fail(f"watchdog thread {WATCHDOG_NAME!r} not found")


def test_watchdog_timeout_within_tolerance() -> None:
    """Watchdog timeout must be near the documented 30s value.

    If the timeout drifts too low (<25s), normal slow batches could
    be killed prematurely. If it drifts too high (>35s), the hang
    bounding is too loose. This test enforces the contract.
    """
    import re

    conftest_path = Path(__file__).resolve().parent / "conftest.py"
    text = conftest_path.read_text(encoding="utf-8")
    # Look for the first time.sleep(<numeric literal>) call and extract
    # the timeout from it.
    match = re.search(r"time\.sleep\(([\d.]+)\)", text)
    assert match is not None, (
        f"could not find time.sleep() call in {conftest_path}; "
        f"watchdog may have been removed or restructured"
    )
    sleep_value = float(match.group(1))
    assert (
        WATCHDOG_SLEEP_SECONDS - WATCHDOG_TOLERANCE_SECONDS
        <= sleep_value
        <= WATCHDOG_SLEEP_SECONDS + WATCHDOG_TOLERANCE_SECONDS
    ), (
        f"watchdog timeout is {sleep_value}s; expected "
        f"~{WATCHDOG_SLEEP_SECONDS}s +/- {WATCHDOG_TOLERANCE_SECONDS}s. "
        f"If the timeout was intentionally changed, update this test."
    )
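For context, the watchdog these tests look for could be sketched roughly as follows. This is an assumption-laden illustration, not the actual conftest.py code: only the thread name, the daemon flag, the ~30s sleep, and the os._exit(2) call come from the text above, while install_watchdog and its parameters are hypothetical.

```python
import os
import threading
import time

WATCHDOG_NAME = "conftest-hang-watchdog"


def install_watchdog(timeout: float = 30.0, exit_code: int = 2) -> threading.Thread:
    """Start a daemon thread that force-exits the process after `timeout` seconds.

    Hypothetical helper: the real conftest may structure this differently.
    """

    def _fire() -> None:
        time.sleep(timeout)
        # os._exit skips atexit handlers and finalizers, so a blocked
        # teardown or executor shutdown cannot delay the exit.
        os._exit(exit_code)

    thread = threading.Thread(target=_fire, name=WATCHDOG_NAME, daemon=True)
    thread.start()
    return thread
```

Note that test_watchdog_timeout_within_tolerance searches the conftest source for time.sleep() with a numeric literal, so a parameterised call like time.sleep(timeout) in this sketch would not satisfy that regex; a conftest checked by that test would need the literal value in the call.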