PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error.

This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
- Without PR3 fix: 200s FAIL with confusing 120s timeout
- With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')
Batch-Level Test Resilience Plan
Companion to: docs/reports/test_full_live_workflow_propagation_digest_20260608.md
Status: Pre-implementation plan
User requirement: "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state."
1. Current Behavior
The tests/conftest.py:live_gui fixture is session-scoped. It spawns a single sloppy.py subprocess at the start of the test session and keeps it alive for ALL live_gui tests across ALL tiers.
Test file structure (relevant):
- tests/test_extended_sims.py: 4 sim tests (test_context_sim_live, test_ai_settings_sim_live, test_tools_sim_live, test_execution_sim_live). The IM_ASSERT fires during the 4th sim (~71.5s into GUI lifetime).
- tests/test_live_workflow.py: separate file, runs AFTER test_extended_sims.py in alphabetical order. test_full_live_workflow is the failing test.
The IM_ASSERT crashes the GUI's main loop mid-test-file. The hook server (separate thread) survives, but the controller's _io_pool is in a shutdown state. The next test file (test_live_workflow.py) starts in this degraded state. Its first click (btn_project_new_automated) hits submit_io which raises RuntimeError: cannot schedule new futures after shutdown. The test's wait_for_project_switch polls for 120s before timing out.
Failure mode observed by user: "the new file didn't get to deal with a dirty state"
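The error string quoted above is the one Python's standard `concurrent.futures.ThreadPoolExecutor` raises when `submit` is called after `shutdown`. The following is a minimal reproduction, assuming `_io_pool` is such an executor (the source does not state this explicitly). The names `io_pool` and `submit_io` here are stand-ins, not the controller's real code.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the controller's _io_pool (assumed to be a ThreadPoolExecutor).
io_pool = ThreadPoolExecutor(max_workers=1)


def submit_io(fn, *args):
    """Hypothetical equivalent of the controller's submit_io."""
    return io_pool.submit(fn, *args)


# While the pool is healthy, work is scheduled normally.
assert submit_io(lambda: "ok").result(timeout=5) == "ok"

# Simulate the state left behind once the pool has been shut down.
io_pool.shutdown(wait=True)

try:
    submit_io(lambda: "never runs")
except RuntimeError as exc:
    # The rejection is immediate; nothing is queued and nothing ever completes.
    degraded_error = str(exc)
```

The rejection happens synchronously at `submit` time, so a caller that only polls for the side effect (as `wait_for_project_switch` does) has no way to notice it short of its own timeout.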
2. Real User Concern: Within-Session Subprocess Degradation
The user's concern is specifically about WITHIN-SESSION state. They want:
- A test file can crash the subprocess without preventing the next file from running cleanly
- If the next file is doomed (subprocess is degraded), the runner should report this clearly, not silently time out
- The runner should continue to subsequent batches even after a failed one (this already works for tiers that don't use live_gui)
The current implementation has NONE of these properties:
- live_gui is session-scoped, so the subprocess lives across the whole test session
- A crashed subprocess poisons all subsequent live_gui tests
- The degraded state (io_pool shut down) is not surfaced to the test, so the test fails with a confusing timeout, not a clear "subprocess degraded" message
3. Probable Solutions
Solution A: Per-file live_gui Fixture (most isolated)
Approach: Change live_gui from @pytest.fixture(scope="session") to @pytest.fixture(scope="module"). Each test file gets a fresh subprocess.
Code change (1 line):
# tests/conftest.py
@pytest.fixture(scope="module")  # was: "session"
def live_gui(request):
    ...
Pros:
- Maximum isolation. A test file that crashes the subprocess doesn't affect the next file.
- The fixture's finally block (which calls kill_process_tree) is the per-file cleanup.
- Simple to implement (one-line scope change + audit).
Cons:
- ~1-2s overhead per file (subprocess spawn + hook server health check).
- For 49 live_gui files, that's 49-98s of additional overhead.
- Some tests may currently rely on cross-file state (e.g., a project loaded by file A is still loaded when file B starts). These tests would break.
Mitigation: Audit the live_gui tests for cross-file state dependencies. Most should be standalone (each test sets up its own state). If any are not, mark them with @pytest.mark.requires_prior_state and either:
- Skip them when scope is module
- Or document the dependency and add a setup step in the dependent file
Effort: 1-2 hours (scope change + audit + fix cross-file dependencies).
Risk: Medium. May break tests that depend on cross-file state. The audit is the main work.
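To support the audit, one option is to make the scope switchable so that the same suite can be run once per-session and once per-file and the results compared. pytest accepts a callable for `scope`, which it calls with the keyword arguments `fixture_name` and `config`. The sketch below assumes a hypothetical `LIVE_GUI_SCOPE` environment variable that does not exist in the current suite.

```python
import os

# Hypothetical environment variable; not part of the existing test suite.
SCOPE_ENV_VAR = "LIVE_GUI_SCOPE"
ALLOWED_SCOPES = ("module", "session")


def live_gui_scope(fixture_name: str, config: object) -> str:
    """Return the scope for live_gui, defaulting to per-file isolation.

    pytest calls a scope callable with the keyword arguments
    `fixture_name` and `config`; neither is needed for this decision.
    """
    scope = os.environ.get(SCOPE_ENV_VAR, "module")
    if scope not in ALLOWED_SCOPES:
        raise ValueError(
            f"{SCOPE_ENV_VAR} must be one of {ALLOWED_SCOPES}, got {scope!r}"
        )
    return scope


# Wiring in tests/conftest.py (left as a comment so this block runs without pytest):
#
#   @pytest.fixture(scope=live_gui_scope)
#   def live_gui(request):
#       ...
```

Tests that pass with `LIVE_GUI_SCOPE=session` but fail with the default are candidates for a cross-file state dependency.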
Solution B: Lazy Re-spawn (most flexible)
Approach: Keep the live_gui fixture session-scoped, but wrap it in a handle that re-spawns the subprocess if it dies. The handle exposes the same API as the current fixture.
Code change (significant):
# tests/conftest.py
import subprocess
import threading

class _LiveGuiHandle:
    def __init__(self, gui_script: str):
        self._gui_script = gui_script
        self._process: subprocess.Popen | None = None
        self._lock = threading.Lock()
        self._spawn()

    def _spawn(self) -> None:
        # Existing fixture spawn logic, refactored into a method
        ...

    def _kill(self) -> None:
        # Existing finally-block cleanup (kill_process_tree), refactored into a method
        ...

    def is_alive(self) -> bool:
        return self._process is not None and self._process.poll() is None

    def ensure_alive(self) -> None:
        with self._lock:
            if not self.is_alive():
                self._spawn()

    @property
    def process(self) -> subprocess.Popen:
        self.ensure_alive()
        return self._process

@pytest.fixture(scope="session")
def live_gui(request):
    # gui_script is resolved the same way as in the existing fixture (elided here)
    handle = _LiveGuiHandle(gui_script)
    try:
        yield handle, handle._gui_script
    finally:
        # Runs even if the session is torn down by an error
        handle._kill()
Pros:
- Preserves the per-session fixture scope.
- Auto-recovers from subprocess death between tests.
- Tests that rely on cross-file state can still do so (the subprocess is the same instance, modulo a respawn).
- Single place to add health checks.
Cons:
- More complex. The handle's ensure_alive adds a check at every test entry.
- If the subprocess dies mid-test, the test still fails; we only recover BETWEEN tests.
- Respawning the subprocess loses any in-process state. Tests that rely on state from a prior test fail on respawn.
Effort: 4-6 hours (refactor fixture + add respawn logic + tests).
Risk: Low. The respawn is a fallback; the primary path (subprocess stays alive) is unchanged.
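The respawn path can be exercised in isolation with a stand-in child process. This is a runnable sketch of the same pattern, not the fixture itself. `RespawningProcess` and the sleeping child are illustrative substitutes for `_LiveGuiHandle` and `sloppy.py`.

```python
import subprocess
import sys
import threading

# Stand-in child process; the real fixture would launch sloppy.py instead.
_STAND_IN_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


class RespawningProcess:
    """Illustrative handle that restarts its child if it has exited."""

    def __init__(self, cmd: list) -> None:
        self._cmd = cmd
        self._lock = threading.Lock()
        self._process = None
        self.spawn_count = 0
        self._spawn()

    def _spawn(self) -> None:
        self._process = subprocess.Popen(self._cmd)
        self.spawn_count += 1

    def is_alive(self) -> bool:
        # poll() returns None while the child is still running.
        return self._process is not None and self._process.poll() is None

    def ensure_alive(self) -> None:
        with self._lock:
            if not self.is_alive():
                self._spawn()

    def kill(self) -> None:
        if self._process is not None:
            self._process.kill()
            self._process.wait()  # reap the child so poll() reports the exit


handle = RespawningProcess(_STAND_IN_CMD)

handle.kill()           # simulate a crash between tests
handle.ensure_alive()   # next test entry: a fresh child is started
respawned = handle.is_alive() and handle.spawn_count == 2

handle.kill()           # clean up the stand-in child
```

Note that `is_alive` only observes process exit. A child that is still running but internally broken passes this check, so the pattern recovers from a dead subprocess and not from a degraded one.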
Solution C: Per-Batch Process Tracking (most surgical)
Approach: Add a process health check at the start of each batch in scripts/run_tests_batched.py. If the previous batch left the subprocess dead, log a clear warning. The warning gives any failures that follow a known cause; on its own it does not change how the tests fail (see Cons).
Code change (conftest writes pid file, batcher reads it):
# tests/conftest.py (in live_gui fixture, after spawn)
pid_file = tests_dir / ".live_gui_pid"
pid_file.write_text(str(process.pid))

# scripts/run_tests_batched.py
def _run_batch(b: Batch, ...) -> ...:
    if b.label.startswith("tier-3-live_gui"):
        pid_file = tests_dir / ".live_gui_pid"
        if pid_file.exists():
            pid = int(pid_file.read_text().strip())
            if not _is_pid_alive(pid):
                print(_c(f"[BATCH-WARN] Prior tier-3 batch left the live_gui subprocess (pid={pid}) dead. "
                         f"This batch's live_gui tests may not start with a clean state.",
                         _C.BOLD_YELLOW))
Pros:
- Surgical. Doesn't change the fixture or test code.
- Surfaces the dirty state via a clear warning, not a silent hang.
- User can then choose to debug or skip the batch.
Cons:
- Doesn't actually FIX the dirty state — just makes it visible.
- Requires the fixture to write a pid file (small change).
- Tests still fail with the same confusing timeout, but the warning is in the runner output.
Effort: 1-2 hours.
Risk: Low. Read-only check, no behavioral change.
Solution D: Fixture Auto-Detect (middle ground)
Approach: Keep live_gui session-scoped, but at the START of each test (not file), check if the subprocess is alive. If dead, re-spawn.
Code change (conftest auto-use hook):
# tests/conftest.py
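@pytest.fixture(autouse=True)
def _check_live_gui_health(request):
    # Only touch live_gui for tests that already request it. Declaring live_gui
    # as a parameter here would start the GUI for every test in the suite.
    if "live_gui" in request.fixturenames:
        # Assumes live_gui yields the (handle, gui_script) pair from Solution B
        handle, gui_script = request.getfixturevalue("live_gui")
        handle.ensure_alive()
    yield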
Pros:
- Per-test recovery. A test that crashes the subprocess doesn't affect the next test.
- Minimal API change (tests still use live_gui).
Cons:
- Per-test overhead (~0.1s for the health check).
- If a test's clicks during a degraded subprocess fail, the test must be re-designed to be idempotent.
- Respawning loses state.
Effort: 2-3 hours.
Risk: Medium. Tests that assume "subprocess is alive when my test starts" may need adjustment.
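One gap is worth checking before relying on a process-liveness test alone. Section 1 notes that the hook server survives the IM_ASSERT, which suggests the subprocess is still running and a `poll()`-based `is_alive()` would still report True. The sketch below combines process liveness with the `/api/gui_health` endpoint named in the PR3 commit message. The port (8999, mentioned in Section 5), the `healthy` and `reason` field names, and the function names are assumptions for illustration. The real response schema is not given in the source.

```python
import json
import urllib.request
from typing import Callable

# Assumed URL: port 8999 appears in Section 5; adjust to the real hook server.
HEALTH_URL = "http://127.0.0.1:8999/api/gui_health"


def _fetch_json(url: str, timeout: float = 1.0) -> dict:
    """Fetch and decode a JSON document from the hook server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def gui_is_usable(
    process_running: bool,
    fetch: Callable[[str], dict] = _fetch_json,
    url: str = HEALTH_URL,
) -> tuple:
    """Return (usable, reason). Any error is treated as 'not usable'."""
    if not process_running:
        return False, "subprocess has exited"
    try:
        payload = fetch(url)
    except Exception as exc:  # connection refused, timeout, bad JSON, ...
        return False, f"health endpoint unreachable: {exc}"
    # "healthy" and "reason" are hypothetical field names.
    if not payload.get("healthy", False):
        return False, f"GUI is degraded: {payload.get('reason', 'no reason given')}"
    return True, "ok"
```

The `fetch` parameter is injectable so the predicate can be unit-tested without a running GUI, for example `gui_is_usable(True, fetch=lambda url: {"healthy": False, "reason": "io_pool shut down"})`.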
4. Recommended Combination
Primary: Solution A (per-file fixture scope)
- Most isolated. Each test file is a clean unit.
- Simple to implement and audit.
- For the IM_ASSERT scenario: test_extended_sims.py crashes its subprocess at the end. test_live_workflow.py starts with a fresh subprocess. The IM_ASSERT-triggered pollution doesn't reach test_live_workflow.py.
Secondary: Solution C (per-batch warning)
- Safety net. If a test file's subprocess dies mid-file (rather than at end of file), the next batch's runner logs a clear warning.
- Doesn't fix the dirty state but makes it visible.
Optional: Solution B (lazy re-spawn)
- If the audit for Solution A reveals too many cross-file dependencies, Solution B is the fallback.
- More complex but preserves the per-session state model.
Initially not recommended: Solution D alone
- First objection: per-test recovery looked too granular. A test's failure shouldn't trigger a re-spawn that affects subsequent tests' setup.
- On closer inspection, Solution D does cover the IM_ASSERT scenario:
  - The IM_ASSERT fires during test_execution_sim_live (test 4, the last test in test_extended_sims.py).
  - The next test to run is test_full_live_workflow, the first test in test_live_workflow.py.
  - Solution D's autouse fixture would re-spawn the subprocess before test_full_live_workflow (this assumes the check detects the degraded state; Section 1 notes that the hook server survives the crash).
- Solution D is therefore a viable primary approach, which changes the recommendation above.
Revised recommendation:
- Solution D (autouse fixture auto-respawn) as the primary. It's the most surgical.
- Solution A (per-file scope) as the alternative if Solution D's autouse approach has side effects.
- Solution C (per-batch warning) as a safety net for any case the autouse doesn't catch.
5. Open Questions for the User
Before implementation, these need clarification:
- Fixture scope preference: Per-file (Solution A) or per-test auto-respawn (Solution D)?
  - Per-file: more overhead but simpler reasoning
  - Per-test auto-respawn: more surgical but adds an autouse hook
  - My recommendation: Solution D. It's the closest to "the next test file gets a clean subprocess" without changing the fixture's API.
- State reset on respawn: When the subprocess is re-spawned, should the new subprocess inherit any state (e.g., loaded project, recent discussion)?
  - My recommendation: No. Fresh subprocess = fresh state. Tests should set up their own state.
- Failure signaling: If the subprocess can't be respawned (e.g., port 8999 still in use from a zombie), should the test fail immediately or retry?
  - My recommendation: Fail immediately with a clear error. Retries can hide real issues.
- Backward compatibility: Are there tests that explicitly DEPEND on the session-scoped behavior (e.g., they share state across files)?
  - Need to audit. The audit is part of Solution A; for Solution D, the audit is less critical because respawned subprocesses are NEW instances (no shared state with prior subprocesses).
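For the failure-signaling question, the "fail immediately" option could look like the sketch below: probe the port before respawning and raise a descriptive error if something is still listening. The function names are hypothetical, and the default port 8999 is taken from the question above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success, i.e. something is already listening.
        return probe.connect_ex((host, port)) != 0


def require_free_port(port: int = 8999) -> None:
    """Fail immediately, with an actionable message, if the port is taken."""
    if not port_is_free(port):
        raise RuntimeError(
            f"Cannot respawn live_gui: port {port} is still in use "
            f"(possibly a leftover subprocess). Not retrying."
        )
```

Calling `require_free_port()` at the top of the spawn logic would surface a stuck port as a one-line error instead of a later connection failure.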
6. References
- tests/conftest.py:282: current live_gui fixture (session-scoped)
- tests/conftest.py:516-547: live_gui fixture finally block (kill + cleanup)
- scripts/run_tests_batched.py:136-164: _run_batch function
- scripts/run_tests_batched.py:51-86: batch result tracking
- docs/reports/test_full_live_workflow_propagation_digest_20260608.md: full solution matrix
- conductor/todos/TODO_test_full_live_workflow_v2.md: task list including Task 4 (batch isolation)