fix(tests): remove conftest watchdog; rely on runner-level subprocess timeout

The conftest watchdog (e1c8730f) was a misguided fix. Empirically observed 2026-06-07:

1. CUTS OFF BATCHES MID-TEST: On Windows, daemon=True threads are NOT auto-killed by the interpreter. The watchdog's time.sleep(30) continues through pytest's normal shutdown, then os._exit(0) fires. For any batch with live_gui tests (which start a sloppy.py subprocess and may take >30s), pytest gets killed mid-test before its FAILURES/summary line is printed. The user's last run showed every batch at exactly 32.0s, confirming the watchdog fires regardless of pytest state.

2. HIDES TEST FAILURES: The watchdog's os._exit(0) replaces pytest's actual exit code with 0, so the run_tests_batched.py runner (using subprocess.run(check=True)) reported 'All 5 batches passed' even when batch 5 had 5 F's in test_ticket_queue and 1 F in test_live_gui_filedialog_regression.

3. TIMING CORRELATION: Every batch in the run completed in 32.0s exactly. The 30s watchdog + ~2s pytest startup = 32.0s for ALL batches, including ones with 240 items collected that pytest never finished running.
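
A minimal sketch of the exit-code masking described in point 2, not code from this repository. The child script, the shortened delays, and the name run_child_with_watchdog are illustrative assumptions; the child stands in for a pytest session that would exit with status 1 if the watchdog thread did not call os._exit(0) first.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a pytest session with a conftest-style watchdog.
CHILD_SOURCE = textwrap.dedent("""
    import os, sys, threading, time

    def _watchdog_exit():
        time.sleep(0.2)   # shortened from 30s for the demo
        os._exit(0)       # terminates the whole process with status 0

    threading.Thread(target=_watchdog_exit, daemon=True).start()
    time.sleep(5)         # "tests" still running when the watchdog fires
    sys.exit(1)           # the failing exit status; never reached
""")


def run_child_with_watchdog() -> int:
    """Run the child and return the exit code the parent observes."""
    completed = subprocess.run([sys.executable, "-c", CHILD_SOURCE])
    return completed.returncode


if __name__ == "__main__":
    print("observed exit code:", run_child_with_watchdog())
```

The parent observes exit code 0, so subprocess.run(check=True) would not raise CalledProcessError even though the child was on its way to a failing exit.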

Removed:
- The watchdog thread registration (conftest.py lines 77-82)
- The HANG PROTECTION comment block (replaced with explanation of why we removed it)
- tests/test_conftest_watchdog.py (the test no longer applies)

Kept:
- The wait_for_warmup() call (this is the SPEC's mechanism for tests to wait for AppController warmup, NOT a watchdog)

The runner's subprocess.run(timeout=1000) per batch is now the only safety net.
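
A rough sketch of what runner-level bounding can look like, assuming only what the message states (subprocess.run with check=True and a per-batch timeout). The function name run_batch and the returned status strings are hypothetical and are not taken from scripts/run_tests_batched.py.

```python
import subprocess
import sys
from typing import Sequence


def run_batch(cmd: Sequence[str], timeout_s: float) -> str:
    """Run one batch command and classify the outcome.

    A hang is bounded by timeout_s (subprocess.run kills the child and
    raises TimeoutExpired); a non-zero exit surfaces as CalledProcessError.
    """
    try:
        subprocess.run(list(cmd), check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    except subprocess.CalledProcessError as exc:
        return f"failed (exit {exc.returncode})"
    return "passed"


if __name__ == "__main__":
    # Illustrative commands; a real runner would invoke pytest per batch.
    print(run_batch([sys.executable, "-c", "pass"], timeout_s=10))
    print(run_batch([sys.executable, "-c", "import sys; sys.exit(1)"], timeout_s=10))
```

Because the child's own exit code reaches the parent unmodified, a failing batch is reported as failed, and only a genuine hang is cut off by the timeout.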
2026-06-07 13:15:08 -04:00
parent 955b61df78
commit 4103c08eac
2 changed files with 19 additions and 131 deletions
+19 -33
@@ -34,33 +34,26 @@ install()
 # the live_gui fixture also creates one), this call is a no-op or
 # fast (warmup already done).
 #
-# HANG PROTECTION: The run_tests_batched.py runner hangs at the end
-# of a batch when the pytest subprocess fails to exit cleanly. Two
-# hang chains have been observed:
-# 1. ThreadPoolExecutor.__del__ -> shutdown(wait=True) joining a
-# blocked worker (concurrent.futures._python_exit, pool __del__,
-# etc.). An earlier atexit fix at commit 8957c9a5 attempted to
-# preempt this; verified empirically that atexit handlers do NOT
-# fire at all when a pool worker is blocked in user code, so the
-# fix is ineffective (see src/io_pool.py module docstring).
-# 2. The session-scoped `live_gui` fixture teardown (conftest.py:~451)
-# hangs in client.reset_session() (HTTP call to the hook server)
-# or kill_process_tree(process.pid) / process.wait(timeout=2)
-# (waiting for the sloppy.py subprocess to die on Windows).
-# Both chains keep the pytest subprocess alive indefinitely, which
-# makes run_tests_batched.py hang at subprocess.run() waiting for the
-# child to exit.
+# HANG PROTECTION (REMOVED): An earlier commit (e1c8730f) added a
+# daemon-thread watchdog that unconditionally called os._exit(0) after
+# 30s. The intent was to bound hangs from ThreadPoolExecutor.__del__
+# and the live_gui fixture teardown. Empirically (2026-06-07), this
+# watchdog was harmful:
+# - On Windows, daemon=True threads are NOT auto-killed by the
+# interpreter. The watchdog's time.sleep(30) continues through
+# pytest's normal shutdown, then os._exit(0) fires.
+# - For batches that take >30s (e.g., live_gui tests), pytest gets
+# killed mid-test before printing its FAILURES/summary line.
+# - The os._exit(0) hides pytest's actual exit code, so the
+# run_tests_batched.py runner reports 'Batch N passed' even when
+# tests had failed (e.g., 5 F's in test_ticket_queue).
 #
-# Solution: a daemon-thread watchdog that unconditionally calls
-# os._exit(0) after a generous timeout. If pytest exits cleanly
-# first, the thread is killed when the process tears down
-# (daemon=True). If pytest hangs, the watchdog kicks in and the
-# batched runner can move to the next batch. 30s timeout: batches
-# 1-3 in the user's run completed in 1-5s of test execution; 30s
-# leaves headroom for slow batches while bounding the worst-case
-# hang at half a minute. See src/app_controller.py:_install_sigint_exit_handler
-# for the same pattern (SIGINT + os._exit(0)) applied to the
-# production Ctrl+C path.
+# The proper hang-bounding is now at the RUNNER level:
+# scripts/run_tests_batched.py uses subprocess.run(timeout=1000) per
+# batch. If pytest hangs, the runner kills it after 1000s and reports
+# failure. Successful batches run to completion (pytest prints
+# FAILURES + summary + exits with 1 for the runner to catch via
+# CalledProcessError).
 import atexit
 from src.app_controller import AppController
 _warmup_app_controller = AppController()
@@ -74,13 +67,6 @@ if not _warmup_app_controller.wait_for_warmup(timeout=60.0):
         stacklevel=2,
     )
-def _watchdog_exit() -> None:
-    import time
-    time.sleep(30.0)
-    os._exit(0)
-import threading
-threading.Thread(target=_watchdog_exit, daemon=True, name="conftest-hang-watchdog").start()
 from src.gui_2 import App
 class VerificationLogger: