test(live_workflow): pre-flight health check fails fast on dirty state
PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error.

This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
- Without PR3 fix: 200s FAIL with confusing 120s timeout
- With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')
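The server side of /api/gui_health is not part of this commit. The following is a minimal, hypothetical sketch of how such an endpoint could produce the `healthy`, `degraded_reason`, and `last_assert` fields that the test reads. It relies on one standard-library fact: a `concurrent.futures.ThreadPoolExecutor` that has been shut down rejects new work with `RuntimeError('cannot schedule new futures after shutdown')`, which is the same error the timed-out project switch surfaces. The `GuiHealth` class, its `record_assert` hook, and the idea of probing the pool with a no-op submit are assumptions for illustration, not the project's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Optional


class GuiHealth:
    """Hypothetical state behind a /api/gui_health-style endpoint (assumption)."""

    def __init__(self, io_pool: ThreadPoolExecutor) -> None:
        self.io_pool = io_pool
        self.last_assert: Optional[str] = None

    def record_assert(self, traceback_text: str) -> None:
        # Hypothetical hook: the GUI main loop would call this when it catches
        # an ImGui scope mismatch, so the traceback survives for later queries.
        self.last_assert = traceback_text

    def snapshot(self) -> Dict[str, Any]:
        # Probe the pool with a no-op task. A shut-down executor raises
        # RuntimeError synchronously inside submit(); BrokenThreadPool is also
        # a RuntimeError subclass, so it is caught here as well.
        degraded_reason: Optional[str] = None
        try:
            self.io_pool.submit(lambda: None)
        except RuntimeError as exc:
            degraded_reason = repr(exc)
        return {
            "healthy": degraded_reason is None,
            "degraded_reason": degraded_reason,
            "last_assert": self.last_assert,
        }


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=1)
    health = GuiHealth(pool)
    print(health.snapshot()["healthy"])  # True
    pool.shutdown()  # simulate the crash tearing down _io_pool
    print(health.snapshot())
```

Probing with `submit` alone, instead of waiting on the task's result, keeps the check bounded: a dead pool fails synchronously, so the endpoint can answer well within the fast-fail window described above.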
@@ -39,6 +39,22 @@ def test_full_live_workflow(live_gui) -> None:
     """
     client = ApiHookClient()
     assert client.wait_for_server(timeout=10)
+    # 00. Pre-flight health check. If the live_gui subprocess was left in
+    # a degraded state by a prior test (e.g. an ImGui IM_ASSERT crashed
+    # the GUI main loop, shutting down the controller's _io_pool), fail
+    # fast with a clear message instead of waiting 120s for a switch
+    # that can never complete. Per user feedback 2026-06-08: the test
+    # should "note that the new file didn't get to deal with a dirty state"
+    # rather than silently time out.
+    health = client.get_gui_health()
+    if not health.get("healthy"):
+        pytest.fail(
+            f"GUI is degraded before test starts. "
+            f"degraded_reason={health.get('degraded_reason')!r}, "
+            f"last_assert={health.get('last_assert')!r}. "
+            f"This is likely caused by a prior test in the same live_gui session "
+            f"crashing the GUI. The new test cannot proceed with a dirty state."
+        )
     client.post_session(session_entries=[])
 
     # 0a. Wait for app warmup to complete. The warmup submits heavy-module
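The test builds its failure message inline before calling `pytest.fail`. If other live_gui test files need the same pre-flight check, one option is to move message construction into a small pure helper so the fast-fail path can be unit-tested without a running GUI. The `degraded_message` function below is a hypothetical sketch of that refactor, not code from this commit. It keeps the diff's condition exactly: because the check is `health.get("healthy")`, a missing key or an empty response also counts as degraded, so a malformed health payload fails fast instead of letting the test proceed.

```python
from typing import Any, Dict, Optional


def degraded_message(health: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: return the fast-fail message, or None if healthy.

    Mirrors the diff's condition: anything other than a truthy "healthy"
    value (including a missing key) is treated as degraded.
    """
    if health.get("healthy"):
        return None
    return (
        "GUI is degraded before test starts. "
        f"degraded_reason={health.get('degraded_reason')!r}, "
        f"last_assert={health.get('last_assert')!r}. "
        "This is likely caused by a prior test in the same live_gui session "
        "crashing the GUI. The new test cannot proceed with a dirty state."
    )


if __name__ == "__main__":
    print(degraded_message({"healthy": True}))  # None
    print(degraded_message({}))  # empty payload is treated as degraded
```

Inside the test, the check would then reduce to computing `msg = degraded_message(client.get_gui_health())` and calling `pytest.fail(msg)` when `msg` is not None.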