955b61df78
The os._exit(2) change in 719c5e27 introduced a regression: the watchdog's daemon thread continues running through pytest's interpreter shutdown. On EVERY batch (even ones that complete successfully in 17s), the watchdog's time.sleep(30.0) elapses during finalization and the thread calls os._exit(2) just as pytest is wrapping up. Result: every batch was reported as 'Batch N failed' by run_tests_batched.py, even ones with '126 passed in 17.14s'.
Revert watchdog to os._exit(0) — its original purpose (force-exit any stuck pytest at 30s) doesn't need a non-zero code; it's a sledgehammer, not a signal. The runner does its own failure detection.
Update scripts/run_tests_batched.py to:
- Use subprocess.run(timeout=180) per batch
- Catch TimeoutExpired as a batch failure (with elapsed time + reason printed)
- Catch CalledProcessError as a batch failure (preserved from before)
- Print elapsed time for every batch (pass or fail) so hang behavior is visible
- Print a final summary that lists all FAILED FILES (not batches) for easy re-running
- Add --batch-size and --timeout CLI flags
- Add 1-space indentation + type hints per project style
Verified: ast.parse OK; --help works; test_conftest_watchdog 3/3 pass.
99 lines
3.6 KiB
Python
99 lines
3.6 KiB
Python
"""Regression: pytest conftest must install a hang-bounding watchdog.
|
|
|
|
The run_tests_batched.py runner hangs at the end of a batch when the
|
|
pytest subprocess fails to exit cleanly. Two hang chains have been
|
|
observed:
|
|
1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) on a blocked
|
|
worker during interpreter finalization.
|
|
2. The session-scoped `live_gui` fixture teardown (conftest.py:~451)
|
|
hanging on HTTP call to the hook server or on process.wait() for
|
|
the sloppy.py subprocess.
|
|
|
|
The conftest installs a daemon-thread watchdog (os._exit(0) after a
|
|
30s timeout) to bound the hang. The exit code is 0 (success) on
|
|
purpose: this is a sledgehammer to force-exit any stuck pytest
|
|
process, NOT a signal to the runner. Failure detection is the
|
|
runner's job — run_tests_batched.py uses subprocess.run(timeout=120)
|
|
and treats TimeoutExpired as a batch failure.
|
|
|
|
This test verifies the watchdog is actually registered after the
|
|
conftest loads. It does NOT spawn a subprocess (which would itself
|
|
be bound by the watchdog and create a recursive timeout), it just
|
|
inspects threading.enumerate() at the time the test runs.
|
|
|
|
If the watchdog is removed or the timeout grows, this test fails
|
|
and the run_tests_batched.py hang returns.
|
|
"""
|
|
|
|
import sys
|
|
import threading
|
|
from pathlib import Path
|
|
|
|
import pytest
|
|
|
|
ROOT = Path(__file__).resolve().parent.parent
|
|
sys.path.insert(0, str(ROOT))
|
|
|
|
# The conftest has already been loaded by pytest before this test
|
|
# collection. We just need to verify the watchdog thread is alive.
|
|
WATCHDOG_NAME = "conftest-hang-watchdog"
|
|
WATCHDOG_SLEEP_SECONDS = 30.0
|
|
WATCHDOG_TOLERANCE_SECONDS = 5.0
|
|
|
|
|
|
def test_watchdog_thread_registered() -> None:
|
|
"""Verify the conftest's hang-bounding watchdog thread is alive.
|
|
|
|
The watchdog is a daemon thread named "conftest-hang-watchdog" that
|
|
sleeps for ~30s then calls os._exit(0). It must be alive (not yet
|
|
fired) at the time this test runs, because the pytest session has
|
|
not been running for 30s yet.
|
|
"""
|
|
threads = threading.enumerate()
|
|
names = [t.name for t in threads]
|
|
assert WATCHDOG_NAME in names, (
|
|
f"conftest watchdog thread {WATCHDOG_NAME!r} not found in "
|
|
f"threading.enumerate(); run_tests_batched.py will hang at end "
|
|
f"of batch. Active threads: {names}"
|
|
)
|
|
|
|
|
|
def test_watchdog_thread_is_daemon() -> None:
|
|
"""Watchdog must be daemon so it doesn't block pytest's own exit."""
|
|
for t in threading.enumerate():
|
|
if t.name == WATCHDOG_NAME:
|
|
assert t.daemon, (
|
|
f"watchdog thread is not daemon (daemon={t.daemon}); "
|
|
f"this would prevent pytest from exiting cleanly"
|
|
)
|
|
return
|
|
pytest.fail(f"watchdog thread {WATCHDOG_NAME!r} not found")
|
|
|
|
|
|
def test_watchdog_timeout_within_tolerance() -> None:
|
|
"""Watchdog timeout must be near the documented 30s value.
|
|
|
|
If the timeout drifts too low (<25s), normal slow batches could
|
|
be killed prematurely. If it drifts too high (>120s), the hang
|
|
bounding is too loose. This test enforces the contract.
|
|
"""
|
|
import re
|
|
conftest_path = Path(__file__).resolve().parent / "conftest.py"
|
|
text = conftest_path.read_text(encoding="utf-8")
|
|
# Look for the watchdog sleep call and extract the timeout
|
|
match = re.search(r"time\.sleep\(([\d.]+)\)", text)
|
|
assert match is not None, (
|
|
f"could not find time.sleep() call in {conftest_path}; "
|
|
f"watchdog may have been removed or restructured"
|
|
)
|
|
sleep_value = float(match.group(1))
|
|
assert (
|
|
WATCHDOG_SLEEP_SECONDS - WATCHDOG_TOLERANCE_SECONDS
|
|
<= sleep_value
|
|
<= WATCHDOG_SLEEP_SECONDS + WATCHDOG_TOLERANCE_SECONDS
|
|
), (
|
|
f"watchdog timeout is {sleep_value}s; expected "
|
|
f"~{WATCHDOG_SLEEP_SECONDS}s +/- {WATCHDOG_TOLERANCE_SECONDS}s. "
|
|
f"If the timeout was intentionally changed, update this test."
|
|
)
|