test(live_workflow): pre-flight health check fails fast on dirty state

PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g. via
an ImGui IM_ASSERT triggered by cumulative panel state), the controller's
_io_pool is shut down. The next test starts in a degraded state but only
discovers this 120s later, when its project switch times out with a
confusing 'cannot schedule new futures after shutdown' error.

This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails fast
(within 1s) with a clear, actionable message that includes:

- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback (2026-06-08): 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file if
it fails. Just has to note that the new file didn't get to deal with a
dirty state.'

Also includes the planning documents written earlier in this session:

- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:

- test_full_live_workflow in isolation: 13.45s PASS (health=True, no
  degradation)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 fast FAIL,
  4 sims PASS)
- Without the PR3 fix: 200s FAIL with a confusing 120s timeout
- With the PR3 fix: 76s FAIL with a clear 'GUI is degraded' message
- The fast failure is observable, not silent (per user's 'wrap might be
  worth it if that properly lets us handle the assert')
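The GUI side of /api/gui_health is not part of this diff. As a minimal sketch of how such a payload could be kept, assume the GUI holds a small thread-safe state object that the endpoint serializes via `snapshot()`. `GuiHealthState` and `submit_tracked` are hypothetical names, not taken from the codebase; only the `healthy`, `degraded_reason`, and `last_assert` keys come from the test code. The sketch relies on the standard-library behavior that `ThreadPoolExecutor.submit` raises `RuntimeError('cannot schedule new futures after shutdown')` once the pool has been shut down.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class GuiHealthState:
    """Hypothetical thread-safe record backing a /api/gui_health payload."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._degraded_reason: Optional[str] = None
        self._last_assert: Optional[str] = None

    def mark_degraded(self, exc: BaseException) -> None:
        # Keep only the first cause; later errors are usually fallout from it.
        with self._lock:
            if self._degraded_reason is None:
                self._degraded_reason = f"{type(exc).__name__}: {exc}"

    def record_assert(self, traceback_text: str) -> None:
        # Store the most recent ImGui assert / scope-mismatch traceback.
        with self._lock:
            self._last_assert = traceback_text

    def snapshot(self) -> Dict[str, Any]:
        # Same keys the pre-flight check reads: healthy, degraded_reason, last_assert.
        with self._lock:
            return {
                "healthy": self._degraded_reason is None,
                "degraded_reason": self._degraded_reason,
                "last_assert": self._last_assert,
            }


def submit_tracked(
    pool: ThreadPoolExecutor,
    state: GuiHealthState,
    fn: Callable[..., Any],
    *args: Any,
) -> Future:
    """Submit work to the pool, marking the GUI degraded if the pool is shut down."""
    try:
        return pool.submit(fn, *args)
    except RuntimeError as exc:
        # A shut-down ThreadPoolExecutor raises
        # RuntimeError('cannot schedule new futures after shutdown').
        state.mark_degraded(exc)
        raise
```

After `pool.shutdown()`, the first `submit_tracked` call re-raises the `RuntimeError` and `snapshot()` reports `healthy: False` with `degraded_reason` set to `'RuntimeError: cannot schedule new futures after shutdown'`. That is the condition the pre-flight check turns into an immediate `pytest.fail` instead of a 120s timeout. Recording the first cause rather than the latest one keeps the message pointed at the original crash.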
2026-06-08 21:17:54 -04:00
parent 8a597d1832
commit 51ecace464
5 changed files with 1077 additions and 0 deletions
@@ -39,6 +39,22 @@ def test_full_live_workflow(live_gui) -> None:
 """
 client = ApiHookClient()
 assert client.wait_for_server(timeout=10)
+ # 00. Pre-flight health check. If the live_gui subprocess was left in
+ # a degraded state by a prior test (e.g. an ImGui IM_ASSERT crashed
+ # the GUI main loop, shutting down the controller's _io_pool), fail
+ # fast with a clear message instead of waiting 120s for a switch
+ # that can never complete. Per user feedback 2026-06-08: the test
+ # should "note that the new file didn't get to deal with a dirty state"
+ # rather than silently time out.
+ health = client.get_gui_health()
+ if not health.get("healthy"):
+  pytest.fail(
+   f"GUI is degraded before test starts. "
+   f"degraded_reason={health.get('degraded_reason')!r}, "
+   f"last_assert={health.get('last_assert')!r}. "
+   f"This is likely caused by a prior test in the same live_gui session "
+   f"crashing the GUI. The new test cannot proceed with a dirty state."
+  )
 client.post_session(session_entries=[])

 # 0a. Wait for app warmup to complete. The warmup submits heavy-module