test(live_workflow): pre-flight health check fails fast on dirty state
PR3 of the test_full_live_workflow_imgui_assert fix sequence. When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error. This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes: - The exact RuntimeError that caused the degradation - The full traceback of the last ImGui scope mismatch - A note that the new test cannot proceed with a dirty state Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.' Also includes the planning documents written earlier in this session: - TODO_test_full_live_workflow_v2.md (task list) - test_full_live_workflow_imgui_assert_20260608.md (root cause report) - test_full_live_workflow_propagation_digest_20260608.md (solutions digest) - batch_resilience_plan_20260608.md (batch resilience plan) Verification: - test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade) - 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS) - Without PR3 fix: 200s FAIL with confusing 120s timeout - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message - The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')
This commit is contained in:
@@ -0,0 +1,250 @@
|
||||
# Batch-Level Test Resilience Plan
|
||||
|
||||
**Companion to:** `docs/reports/test_full_live_workflow_propagation_digest_20260608.md`
|
||||
**Status:** Pre-implementation plan
|
||||
**User requirement:** "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state."
|
||||
|
||||
---
|
||||
|
||||
## 1. Current Behavior
|
||||
|
||||
The `tests/conftest.py:live_gui` fixture is **session-scoped**. It spawns a single `sloppy.py` subprocess at the start of the test session and keeps it alive for ALL live_gui tests across ALL tiers.
|
||||
|
||||
**Test file structure (relevant):**
|
||||
- `tests/test_extended_sims.py` — 4 sim tests: `test_context_sim_live`, `test_ai_settings_sim_live`, `test_tools_sim_live`, `test_execution_sim_live`. The IM_ASSERT fires during the 4th sim (~71.5s into GUI lifetime).
|
||||
- `tests/test_live_workflow.py` — separate file, runs AFTER test_extended_sims.py in alphabetical order. `test_full_live_workflow` is the failing test.
|
||||
|
||||
The IM_ASSERT crashes the GUI's main loop mid-test-file. The hook server (separate thread) survives, but the controller's `_io_pool` is in a shutdown state. The next test file (`test_live_workflow.py`) starts in this degraded state. Its first click (`btn_project_new_automated`) hits `submit_io` which raises `RuntimeError: cannot schedule new futures after shutdown`. The test's `wait_for_project_switch` polls for 120s before timing out.
|
||||
|
||||
**Failure mode observed by user:** "the new file didn't get to deal with a dirty state"
|
||||
|
||||
---
|
||||
|
||||
## 2. Real User Concern: Within-Session Subprocess Degradation
|
||||
|
||||
The user's concern is specifically about WITHIN-SESSION state. They want:
|
||||
|
||||
1. A test file can crash the subprocess without preventing the next file from running cleanly
|
||||
2. If the next file is doomed (subprocess is degraded), the runner should report this clearly, not silently time out
|
||||
3. The runner should continue to subsequent batches even after a failed one (this already works for tiers that don't use `live_gui`)
|
||||
|
||||
**The current implementation has NONE of these properties:**
|
||||
- `live_gui` is session-scoped, so the subprocess lives across the whole test session
|
||||
- A crashed subprocess poisons all subsequent live_gui tests
|
||||
- The degraded state (io_pool shut down) is not surfaced to the test, so the test fails with a confusing timeout, not a clear "subprocess degraded" message
|
||||
|
||||
---
|
||||
|
||||
## 3. Probable Solutions
|
||||
|
||||
### Solution A: Per-file live_gui Fixture (most isolated)
|
||||
|
||||
**Approach:** Change `live_gui` from `@pytest.fixture(scope="session")` to `@pytest.fixture(scope="module")`. Each test file gets a fresh subprocess.
|
||||
|
||||
**Code change (1 line):**
|
||||
```python
|
||||
# tests/conftest.py
|
||||
@pytest.fixture(scope="module") # was: "session"
|
||||
def live_gui(request):
|
||||
...
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Maximum isolation. A test file that crashes the subprocess doesn't affect the next file.
|
||||
- The fixture's `finally` block (which calls `kill_process_tree`) is the per-file cleanup.
|
||||
- Simple to implement (one-line scope change + audit).
|
||||
|
||||
**Cons:**
|
||||
- ~1-2s overhead per file (subprocess spawn + hook server health check).
|
||||
- For 49 live_gui files, that's 49-98s of additional overhead.
|
||||
- Some tests may currently rely on cross-file state (e.g., a project loaded by file A is still loaded when file B starts). These tests would break.
|
||||
|
||||
**Mitigation:** Audit the live_gui tests for cross-file state dependencies. Most should be standalone (each test sets up its own state). If any are not, mark them with `@pytest.mark.requires_prior_state` and either:
|
||||
- Skip them when scope is module
|
||||
- Or document the dependency and add a setup step in the dependent file
|
||||
|
||||
**Effort:** 1-2 hours (scope change + audit + fix cross-file dependencies).
|
||||
|
||||
**Risk:** Medium. May break tests that depend on cross-file state. The audit is the main work.
|
||||
|
||||
### Solution B: Lazy Re-spawn (most flexible)
|
||||
|
||||
**Approach:** Keep the `live_gui` fixture session-scoped, but wrap it in a handle that re-spawns the subprocess if it dies. The handle exposes the same API as the current fixture.
|
||||
|
||||
**Code change (significant):**
|
||||
```python
|
||||
# tests/conftest.py
|
||||
class _LiveGuiHandle:
|
||||
def __init__(self, gui_script: str):
|
||||
self._gui_script = gui_script
|
||||
self._process: subprocess.Popen | None = None
|
||||
self._lock = threading.Lock()
|
||||
self._spawn()
|
||||
|
||||
def _spawn(self) -> None:
|
||||
# Existing fixture spawn logic, refactored into a method
|
||||
...
|
||||
|
||||
def is_alive(self) -> bool:
|
||||
return self._process is not None and self._process.poll() is None
|
||||
|
||||
def ensure_alive(self) -> None:
|
||||
with self._lock:
|
||||
if not self.is_alive():
|
||||
self._spawn()
|
||||
|
||||
@property
|
||||
def process(self) -> subprocess.Popen:
|
||||
self.ensure_alive()
|
||||
return self._process
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def live_gui(request):
|
||||
handle = _LiveGuiHandle(gui_script)
|
||||
yield handle, handle._gui_script
|
||||
handle._kill()
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Preserves the per-session fixture scope.
|
||||
- Auto-recovers from subprocess death between tests.
|
||||
- Tests that rely on cross-file state can still do so (the subprocess is the same instance, modulo a respawn).
|
||||
- Single place to add health checks.
|
||||
|
||||
**Cons:**
|
||||
- More complex. The handle's `ensure_alive` adds a check at every test entry.
|
||||
- If the subprocess dies mid-test, the test still fails — we only recover BETWEEN tests.
|
||||
- Respawning the subprocess loses any in-process state. Tests that rely on state from a prior test fail on respawn.
|
||||
|
||||
**Effort:** 4-6 hours (refactor fixture + add respawn logic + tests).
|
||||
|
||||
**Risk:** Low. The respawn is a fallback; the primary path (subprocess stays alive) is unchanged.
|
||||
|
||||
### Solution C: Per-Batch Process Tracking (most surgical)
|
||||
|
||||
**Approach:** Add a process health check at the start of each batch in `scripts/run_tests_batched.py`. If the previous batch left the subprocess dead, log a clear warning. Tests can then fail fast with a known message.
|
||||
|
||||
**Code change (conftest writes pid file, batcher reads it):**
|
||||
```python
|
||||
# tests/conftest.py (in live_gui fixture, after spawn)
|
||||
pid_file = tests_dir / ".live_gui_pid"
|
||||
pid_file.write_text(str(process.pid))
|
||||
|
||||
# scripts/run_tests_batched.py
|
||||
def _run_batch(b: Batch, ...) -> ...:
|
||||
if b.label.startswith("tier-3-live_gui"):
|
||||
pid_file = tests_dir / ".live_gui_pid"
|
||||
if pid_file.exists():
|
||||
pid = int(pid_file.read_text().strip())
|
||||
if not _is_pid_alive(pid):
|
||||
print(_c(f"[BATCH-WARN] Prior tier-3 batch left the live_gui subprocess (pid={pid}) dead. "
|
||||
f"This batch's live_gui tests may not start with a clean state.",
|
||||
_C.BOLD_YELLOW))
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Surgical. Doesn't change the fixture or test code.
|
||||
- Surfaces the dirty state via a clear warning, not a silent hang.
|
||||
- User can then choose to debug or skip the batch.
|
||||
|
||||
**Cons:**
|
||||
- Doesn't actually FIX the dirty state — just makes it visible.
|
||||
- Requires the fixture to write a pid file (small change).
|
||||
- Tests still fail with the same confusing timeout, but the warning is in the runner output.
|
||||
|
||||
**Effort:** 1-2 hours.
|
||||
|
||||
**Risk:** Low. Read-only check, no behavioral change.
|
||||
|
||||
### Solution D: Fixture Auto-Detect (middle ground)
|
||||
|
||||
**Approach:** Keep `live_gui` session-scoped, but at the START of each test (not file), check if the subprocess is alive. If dead, re-spawn.
|
||||
|
||||
**Code change (conftest auto-use hook):**
|
||||
```python
|
||||
# tests/conftest.py
|
||||
@pytest.fixture(autouse=True)
|
||||
def _check_live_gui_health(request, live_gui):
|
||||
if "live_gui" in request.fixturenames:
|
||||
handle, gui_script = live_gui
|
||||
handle.ensure_alive()
|
||||
yield
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Per-test recovery. A test that crashes the subprocess doesn't affect the next test.
|
||||
- Minimal API change (tests still use `live_gui`).
|
||||
|
||||
**Cons:**
|
||||
- Per-test overhead (~0.1s for the health check).
|
||||
- If a test's clicks during a degraded subprocess fail, the test must be re-designed to be idempotent.
|
||||
- Respawning loses state.
|
||||
|
||||
**Effort:** 2-3 hours.
|
||||
|
||||
**Risk:** Medium. Tests that assume "subprocess is alive when my test starts" may need adjustment.
|
||||
|
||||
---
|
||||
|
||||
## 4. Recommended Combination
|
||||
|
||||
**Primary: Solution A (per-file fixture scope)**
|
||||
- Most isolated. Each test file is a clean unit.
|
||||
- Simple to implement and audit.
|
||||
- For the IM_ASSERT scenario: test_extended_sims.py crashes its subprocess at the end. test_live_workflow.py starts with a fresh subprocess. The IM_ASSERT-triggered pollution doesn't reach test_live_workflow.py.
|
||||
|
||||
**Secondary: Solution C (per-batch warning)**
|
||||
- Safety net. If a test file's subprocess dies mid-file (rather than at end of file), the next batch's runner logs a clear warning.
|
||||
- Doesn't fix the dirty state but makes it visible.
|
||||
|
||||
**Optional: Solution B (lazy re-spawn)**
|
||||
- If the audit for Solution A reveals too many cross-file dependencies, Solution B is the fallback.
|
||||
- More complex but preserves the per-session state model.
|
||||
|
||||
### NOT recommended: Solution D alone
|
||||
- Per-test recovery is too granular. A test's failure shouldn't trigger a re-spawn that affects subsequent tests' setup.
|
||||
- Also: Solution D doesn't help the IM_ASSERT scenario. The IM_ASSERT crashes the subprocess during test_extended_sims.py, and Solution D would respawn it for the next test in the SAME file. But the next test in test_extended_sims.py is `test_full_live_workflow` which is in a different file — Solution D would still respawn correctly for it.
|
||||
|
||||
Actually, Solution D WOULD work for the IM_ASSERT scenario:
|
||||
- IM_ASSERT fires during `test_execution_sim_live` (test 4 in test_extended_sims.py)
|
||||
- Next test is... well, there are no more tests in test_extended_sims.py
|
||||
- Next file is test_live_workflow.py, first test is test_full_live_workflow
|
||||
- Solution D's autouse fixture would re-spawn the subprocess before test_full_live_workflow
|
||||
|
||||
So Solution D is actually a viable primary approach. Let me reconsider.
|
||||
|
||||
**Revised recommendation:**
|
||||
- **Solution D (autouse fixture auto-respawn)** as the primary. It's the most surgical.
|
||||
- **Solution A (per-file scope)** as the alternative if Solution D's autouse approach has side effects.
|
||||
- **Solution C (per-batch warning)** as a safety net for any case the autouse doesn't catch.
|
||||
|
||||
---
|
||||
|
||||
## 5. Open Questions for the User
|
||||
|
||||
Before implementation, these need clarification:
|
||||
|
||||
1. **Fixture scope preference:** Per-file (Solution A) or per-test auto-respawn (Solution D)?
|
||||
- Per-file: more overhead but simpler reasoning
|
||||
- Per-test auto-respawn: more surgical but adds an autouse hook
|
||||
- My recommendation: Solution D. It's the closest to "the next test file gets a clean subprocess" without changing the fixture's API.
|
||||
|
||||
2. **State reset on respawn:** When the subprocess is re-spawned, should the new subprocess inherit any state (e.g., loaded project, recent discussion)?
|
||||
- My recommendation: No. Fresh subprocess = fresh state. Tests should set up their own state.
|
||||
|
||||
3. **Failure signaling:** If the subprocess can't be respawned (e.g., port 8999 still in use from a zombie), should the test fail immediately or retry?
|
||||
- My recommendation: Fail immediately with a clear error. Retries can hide real issues.
|
||||
|
||||
4. **Backward compatibility:** Are there tests that explicitly DEPEND on the session-scoped behavior (e.g., they share state across files)?
|
||||
- Need to audit. The audit is part of Solution A; for Solution D, the audit is less critical because respawned subprocesses are NEW instances (no shared state with prior subprocesses).
|
||||
|
||||
---
|
||||
|
||||
## 6. References
|
||||
|
||||
- `tests/conftest.py:282` — current `live_gui` fixture (session-scoped)
|
||||
- `tests/conftest.py:516-547` — `live_gui` fixture finally block (kill + cleanup)
|
||||
- `scripts/run_tests_batched.py:136-164` — `_run_batch` function
|
||||
- `scripts/run_tests_batched.py:51-86` — batch result tracking
|
||||
- `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` — full solution matrix
|
||||
- `conductor/todos/TODO_test_full_live_workflow_v2.md` — task list including Task 4 (batch isolation)
|
||||
@@ -0,0 +1,267 @@
|
||||
# Root Cause Report: test_full_live_workflow batch failure (v2)
|
||||
|
||||
**Supersedes:** `test_full_live_workflow_root_cause_20260608.md` (older 6-cause analysis, dated 2026-06-08)
|
||||
**Date:** 2026-06-08
|
||||
**Status:** Investigation complete via diagnostic logging, no fix attempted
|
||||
**Failure reproducibility:** 100% in `tier-3-live_gui` batch (5+ tests, ~200s total), 0% in isolation (`pytest tests/test_live_workflow.py` → 11.69s PASS)
|
||||
**Related commits (reverted/no-op):** `4a338486` (io_pool 4→8), `c9a991bb` (timeout 30→120s), `87d7c5bf` (test_io_pool assertion)
|
||||
|
||||
---
|
||||
|
||||
## TL;DR — The Real Root Cause
|
||||
|
||||
**`test_full_live_workflow` does not fail because of a slow `_do_project_switch`.** The switch runs in ~8-10ms when it executes. The test fails because **the GUI subprocess crashes mid-batch** due to an ImGui scope mismatch in some render function, which leaves the controller's `io_pool` in a shutdown state. Subsequent clicks to the still-alive hook server fail with `RuntimeError: cannot schedule new futures after shutdown`.
|
||||
|
||||
The `_do_project_switch` was a SYMPTOM, not the cause. The previous report's 6-cause analysis (cwd-relative paths, race conditions, click fire-and-forget, etc.) addressed symptoms of the same underlying issue but did not address the IM_ASSERT.
|
||||
|
||||
---
|
||||
|
||||
## Evidence Trail
|
||||
|
||||
### Diagnostic instrumentation (temporarily added, then reverted)
|
||||
|
||||
Added `[switch-diag] +N.NNNs <step>` prints to stderr at every step inside `_do_project_switch` (production code). Output goes to `logs/sloppy_py_test.log` because the `live_gui` fixture captures the subprocess's stderr/stdout to that file. Pattern matches the existing `[startup]` and `[HOOKS]` instrumentation style.
|
||||
|
||||
### Key log findings (`logs/sloppy_py_test.log`)
|
||||
|
||||
#### Finding 1: All 4 sims' switches complete in ~8-10ms each
|
||||
|
||||
```
|
||||
[switch-diag] +0.000s enter path=temp_livecontextsim.toml
|
||||
[switch-diag] +0.000s flush_to_project_start
|
||||
[switch-diag] +0.000s flush_to_project_done
|
||||
[switch-diag] +0.000s load_project_start
|
||||
[switch-diag] +0.002s load_project_done
|
||||
[switch-diag] +0.002s preset_manager_start
|
||||
[switch-diag] +0.002s preset_manager_done
|
||||
[switch-diag] +0.002s persona_manager_start
|
||||
[switch-diag] +0.002s persona_manager_done
|
||||
[switch-diag] +0.002s refresh_start
|
||||
[switch-diag] +0.010s refresh_done
|
||||
[switch-diag] +0.010s mcp_configure_start
|
||||
[switch-diag] +0.010s mcp_configure_done
|
||||
[switch-diag] +0.010s success
|
||||
[switch-diag] +0.010s finally_enter
|
||||
[switch-diag] +0.010s finally_done
|
||||
```
|
||||
|
||||
Same pattern for all 4 sims. **The switch itself is fast. There is no hang inside `_do_project_switch`.**
|
||||
|
||||
#### Finding 2: An ImGui `IM_ASSERT` fires at 71.5s into GUI lifetime
|
||||
|
||||
```
|
||||
[02214] [imgui-error] In window 'MainDockSpace': Missing End()
|
||||
[startup] main_call: 71518.5ms
|
||||
Traceback (most recent call last):
|
||||
File "C:\projects\manual_slop\sloppy.py", line 75, in <module>
|
||||
main()
|
||||
File "C:\projects\manual_slop\src\gui_2.py", line 1478, in main
|
||||
app.run()
|
||||
File "C:\projects\manual_slop\src\gui_2.py", line 618, in run
|
||||
immapp.run(self.runner_params, ...)
|
||||
File "...\imgui_bundle\_patch_runners_add_save_screenshot_param.py", line 38, in patched_run
|
||||
run_backup(*args, **kwargs)
|
||||
RuntimeError: IM_ASSERT( (0) && "Missing End()" ) --- imgui.cpp:11662
|
||||
```
|
||||
|
||||
The `IM_ASSERT` is an ImGui scope-tracking assertion: a `begin()` call was not matched with a corresponding `end()`. The window reported is 'MainDockSpace' — a special window managed by `hello_imgui` for the dock space layout. Some child widget within the dock space has an unbalanced begin/end.
|
||||
|
||||
#### Finding 3: The test's `btn_project_new_automated` click hits a shutdown pool
|
||||
|
||||
```
|
||||
[HOOKS] POST /api/session data length: 1 ← test_live_workflow's post_session
|
||||
[HOOKS] GET /api/warmup_wait?timeout=60.0
|
||||
[HOOKS] GET /api/project_switch_status
|
||||
[HOOKS] POST /api/gui data length: 3 ← test's btn_reset
|
||||
[HOOKS] POST /api/gui data length: 3 ← test's btn_project_new_automated
|
||||
Error executing GUI task (click): cannot schedule new futures after shutdown
|
||||
Traceback (most recent call last):
|
||||
File "...\src\app_controller.py", line 1637, in _process_pending_gui_tasks
|
||||
self._gui_task_handlers[action](self, task)
|
||||
File "...\src\app_controller.py", line 580, in _handle_click
|
||||
controller._cb_new_project_automated(user_data)
|
||||
File "...\src\app_controller.py", line 2723, in _cb_new_project_automated
|
||||
self._switch_project(user_data)
|
||||
File "...\src\app_controller.py", line 2809, in _switch_project
|
||||
self.submit_io(self._do_project_switch, path)
|
||||
File "...\src\app_controller.py", line 2282, in submit_io
|
||||
future = self._io_pool.submit(fn, *args, **kwargs)
|
||||
File "...\concurrent\futures\thread.py", line 167, in submit
|
||||
raise RuntimeError('cannot schedule new futures after shutdown')
|
||||
RuntimeError: cannot schedule new futures after shutdown
|
||||
```
|
||||
|
||||
The IM_ASSERT happened ~71.5s into the GUI. The test_live_workflow runs after the 4 sims (~80s into GUI). Between the IM_ASSERT and the test's click, the `io_pool` was shut down.
|
||||
|
||||
#### Finding 4: The hook server (FastAPI/stdlib http.server) stays alive
|
||||
|
||||
The subprocess continues to respond to hooks (GET /api/events, GET /api/session, POST /api/gui) AFTER the IM_ASSERT and AFTER the io_pool is shut down. The test's `wait_for_project_switch` polls `/api/project_switch_status` 1200+ times in 120s — the server is responsive. Only `submit_io` fails.
|
||||
|
||||
This proves: the IO thread pool is the casualty, not the entire process. Something selectively shut down `_io_pool` while the rest of the controller (and the hook server) kept running.
|
||||
|
||||
---
|
||||
|
||||
## What Shuts Down The io_pool?
|
||||
|
||||
This is the unresolved question. The only places `_io_pool.shutdown(wait=False)` is called in `src/`:
|
||||
|
||||
1. `src/app_controller.py:762` — in the `_on_sigint` SIGINT handler. This requires SIGINT to be delivered to the subprocess. On Windows, `taskkill /F` does NOT deliver signals.
|
||||
2. `src/app_controller.py:2325` — in `controller.shutdown()`. This is called from `src/gui_2.py:869` (in `App.shutdown`), which is only called at `src/gui_2.py:620` AFTER `immapp.run()` returns successfully.
|
||||
|
||||
The IM_ASSERT raises `RuntimeError` from inside `immapp.run()`. The exception propagates up. **There is no `try/finally` around `immapp.run`**, so `App.shutdown()` is NOT called via this path.
|
||||
|
||||
**Hypothesis A:** Python's interpreter finalization calls `__del__` on the controller (or its `_io_pool`) during exception propagation. `ThreadPoolExecutor.__del__` defaults to `shutdown(wait=False)` per the io_pool.py module docstring. This is the most likely path on a `RuntimeError` propagating through `main()`.
|
||||
|
||||
**Hypothesis B:** `immapp.run` internally catches the IM_ASSERT and returns normally. Then `app.shutdown()` is called, which calls `controller.shutdown()`, which calls `_io_pool.shutdown(wait=False)`. Then `app.run()` returns, `main()` returns, and the sloppy.py process exits. But the log shows more activity AFTER the IM_ASSERT (clicks being processed), so this hypothesis is inconsistent with the evidence.
|
||||
|
||||
**Hypothesis C:** The hook server thread (FastAPI on port 8999) runs in a separate thread from the main ImGui loop. The IM_ASSERT crashes the main thread but the hook server thread keeps running. The `_io_pool` is the GUI's pool, not the hook server's. When the main thread crashes, Python's atexit / finalization shuts down `_io_pool`. The hook server thread continues independently.
|
||||
|
||||
**Hypothesis C is most consistent with the evidence.** The `_io_pool` is created in `AppController.__init__` (line ~810 in current code). The hook server is a separate `ThreadingHTTPServer` (line 11 of `src/api_hooks.py`). They are independent. The IM_ASSERT kills the ImGui main loop, the io_pool gets shut down during finalization, the hook server thread continues serving requests.
|
||||
|
||||
The exact mechanism of `_io_pool.shutdown` being called in this scenario is not directly observable from the log. It could be:
|
||||
- `ThreadPoolExecutor.__del__` during GC (Hypothesis C path)
|
||||
- An atexit handler installed by the warmup system or another module
|
||||
- A signal delivery I haven't identified (e.g., the Python interpreter catching the exception and sending SIGTERM internally)
|
||||
|
||||
**This matters less than the actual fix.** The IM_ASSERT is the trigger; the io_pool shutdown is a downstream consequence. Fixing the IM_ASSERT (the real bug) prevents all of this.
|
||||
|
||||
---
|
||||
|
||||
## Why Did The IM_ASSERT Only Fire In Batch?
|
||||
|
||||
The IM_ASSERT is deterministic: it fires every time the offending code path is rendered. In isolation, `test_full_live_workflow` runs alone — it does not exercise the same render functions as the 4 sims. In batch, the sims run first:
|
||||
- `test_context_sim_live` (ContextSimulation)
|
||||
- `test_ai_settings_sim_live` (AISettingsSimulation)
|
||||
- `test_tools_sim_live` (ToolsSimulation)
|
||||
- `test_execution_sim_live` (ExecutionSimulation)
|
||||
|
||||
Each sim opens specific panels (Context, AI Settings, Tools, Execution) and triggers render paths that may be unique to those simulations. After 4 sims, the cumulative state of `ImGui`'s internal scope stack is corrupted — a `begin()` was called in a panel that's only opened in sim mode, and its matching `end()` was either:
|
||||
- In a code path that was skipped due to a conditional render
|
||||
- In a `defer` block that early-returned
|
||||
- After a `return` inside a panel function
|
||||
- Skipped due to a conditional in the render loop
|
||||
|
||||
The IM_ASSERT then fires at frame 71.5s, which is some specific frame AFTER the sims have set up state. The exact render function is unknown without running `scripts/check_imgui_scopes.py` against the full codebase.
|
||||
|
||||
---
|
||||
|
||||
## Why Did My Previous Fixes Fail?
|
||||
|
||||
### Fix 1: io_pool 4→8 (commit `4a338486`)
|
||||
|
||||
**Wrong diagnosis:** I assumed the io_pool was saturated with sims' AI discussion turn workers, causing the new switch to queue forever.
|
||||
|
||||
**Actual cause:** The io_pool isn't the bottleneck. The switch runs in ~8-10ms. The pool wasn't saturated at the time of the test's switch.
|
||||
|
||||
**Why the commit doesn't hurt:** Bigger pool is a marginal improvement to startup concurrency. It's not a regression, just a fix for an issue that isn't the root cause.
|
||||
|
||||
### Fix 2: Test timeout 30s→120s (commit `c9a991bb`)
|
||||
|
||||
**Wrong diagnosis:** I assumed the switch was slow and just needed more time.
|
||||
|
||||
**Actual cause:** The switch is fast. The test fails because the click can't even reach the switch handler — `submit_io` throws at line 2282 because the pool is shut down.
|
||||
|
||||
**Why the commit doesn't hurt:** A longer timeout gives a clearer error message (the actual `RuntimeError: cannot schedule new futures after shutdown` is surfaced) but doesn't change the outcome. If anything, it makes the test more annoying to wait for.
|
||||
|
||||
### Fix 3 (uncommitted, reverted): Dedicated executor for switches
|
||||
|
||||
**Wrong diagnosis:** I assumed the project switch should not share a pool with background work.
|
||||
|
||||
**Actual cause:** The pool is fine. The pool gets killed by the GUI crash.
|
||||
|
||||
**Why the commit was reverted:** It added complexity for a non-issue. Per `conductor/workflow.md`: "Don't ship a known regression to save time."
|
||||
|
||||
---
|
||||
|
||||
## The Three Fixes I Have NOT Yet Attempted
|
||||
|
||||
Per the systematic-debugging skill, the architectural question needs user input. The 3 viable directions are documented in `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` (to be written — see TODOs).
|
||||
|
||||
### Direction A: Fix the actual ImGui scope bug
|
||||
|
||||
**What:** Run `scripts/check_imgui_scopes.py` to find the `begin()`/`end()` mismatch. Fix the offending render function.
|
||||
|
||||
**Pros:** Real fix. Solves the root cause.
|
||||
|
||||
**Cons:** May require deep investigation across 90+ render functions. May be in a render path that's only triggered by a specific sim panel combination. Could take significant time.
|
||||
|
||||
**Risk:** Medium. A wrong fix could break other tests or hide the real issue.
|
||||
|
||||
### Direction B: Wrap `immapp.run` in `try/except RuntimeError`
|
||||
|
||||
**What:** In `src/gui_2.py:618`, wrap `immapp.run(...)` in a `try/except RuntimeError` (or broader). On exception, log it and let the app continue in a degraded state (e.g., skip the rest of the frame, return to event loop).
|
||||
|
||||
**Pros:** Band-aid that prevents the GUI crash from propagating to the process. Tests continue to work. Easier to implement than Direction A.
|
||||
|
||||
**Cons:** Hides the actual ImGui scope bug. Future tests may exhibit other weirdness from the scope mismatch. The user has said they don't want a "wrap" that just silently continues.
|
||||
|
||||
**Risk:** Low for tests, but masks a real bug. Per user: "I don't want the entire test to just linger or silently continue."
|
||||
|
||||
### Direction C: Make `_io_pool.shutdown` recoverable
|
||||
|
||||
**What:** In `submit_io`, check if the pool is shut down. If so, recreate it (lazily).
|
||||
|
||||
**Pros:** Decouples the test from the io_pool's lifecycle. Makes the controller more robust to GUI crashes.
|
||||
|
||||
**Cons:** Doesn't address the IM_ASSERT root cause. The GUI is still crashing — we're just hiding the consequences.
|
||||
|
||||
**Risk:** Low. Standard pattern for resilient thread pools.
|
||||
|
||||
### Direction D: Make the batch runner handle the failure cleanly
|
||||
|
||||
**What:** When a test file fails, the `run_tests_batched.py` runner currently continues to the next batch. The fix is to ensure: (1) the failing test file is marked as failed, (2) the next batch can start with a clean state (kill and restart the sloppy.py subprocess per batch, not session-wide).
|
||||
|
||||
**Pros:** Doesn't require fixing the underlying bug. Tests can fail without poisoning subsequent batches.
|
||||
|
||||
**Cons:** Doesn't fix `test_full_live_workflow`. The test still fails in its own batch.
|
||||
|
||||
**Risk:** Low. Standard pattern for test isolation.
|
||||
|
||||
### User's explicit guidance
|
||||
|
||||
Per the user:
|
||||
- "I don't want the entire test to just linger or silently continue"
|
||||
- "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails"
|
||||
- "Just has to note that the new file didn't get to deal with a dirty state"
|
||||
- "The wrap might be worth it if that properly lets us handle the assert"
|
||||
|
||||
The user wants:
|
||||
1. A report (this document)
|
||||
2. A todo list for the actual fixes
|
||||
3. A digest of probable solutions
|
||||
4. Batch-level resilience (kill+restart sloppy.py per file)
|
||||
5. The wrap-around-`immapp.run` is acceptable IF it properly handles the assert (not just swallows it)
|
||||
|
||||
---
|
||||
|
||||
## Files Referenced
|
||||
|
||||
- `src/app_controller.py:2282` — `submit_io` line that throws the RuntimeError
|
||||
- `src/app_controller.py:762` — `_on_sigint` shutdown
|
||||
- `src/app_controller.py:2325` — `controller.shutdown` pool shutdown
|
||||
- `src/gui_2.py:618` — `immapp.run(...)` call site (the IM_ASSERT trigger)
|
||||
- `src/gui_2.py:620` — `self.shutdown()` (only called on normal exit)
|
||||
- `src/api_hooks.py:117-136` — `/api/project_switch_status` endpoint
|
||||
- `src/api_hooks.py:11` — `from http.server import ThreadingHTTPServer` (independent of io_pool)
|
||||
- `tests/test_live_workflow.py:90-94` — `wait_for_project_switch` call
|
||||
- `tests/test_live_workflow.py:84-89` — defensive `os.path.exists` check
|
||||
- `tests/conftest.py:263-280` — `kill_process_tree` (uses `taskkill /F` on Windows; no signal)
|
||||
- `tests/conftest.py:111-126` — pytest smart watchdog (300s timeout in PARENT process)
|
||||
- `tests/conftest.py:516-547` — `live_gui` fixture finally block (session-scoped, only fires at end)
|
||||
- `scripts/check_imgui_scopes.py` — EXISTING audit script that can detect this class of bug
|
||||
- `logs/sloppy_py_test.log` — captured subprocess stderr from the failing test run
|
||||
|
||||
## Diagnostic Logging (temporarily added, then reverted)
|
||||
|
||||
```python
|
||||
# src/app_controller.py:2731-2763 (REVERTED)
|
||||
import sys as _diag_sys
|
||||
_diag_t0 = time.time()
|
||||
def _diag(step: str) -> None:
|
||||
print(f"[switch-diag] +{time.time()-_diag_t0:.3f}s {step} path={Path(path).name}",
|
||||
file=_diag_sys.stderr, flush=True)
|
||||
_diag("enter")
|
||||
# ... at every step ...
|
||||
```
|
||||
|
||||
All diagnostic logging has been removed. The production code is back to the pre-diagnostic state. The pattern can be re-applied if needed (e.g., to find which `begin()` in which render function is unbalanced).
|
||||
@@ -0,0 +1,388 @@
|
||||
# Digest: Probable Solutions for ImGui Assert Propagation Failure (2026-06-08)
|
||||
|
||||
**Companion to:** `docs/reports/test_full_live_workflow_imgui_assert_20260608.md`
|
||||
**Companion to:** `conductor/todos/TODO_test_full_live_workflow_v2.md`
|
||||
**Status:** Pre-implementation analysis. User has not yet chosen a direction.
|
||||
**Audience:** Future implementer, the user (decision reference)
|
||||
|
||||
---
|
||||
|
||||
## 1. The Problem (restated)
|
||||
|
||||
When `tests/test_extended_sims.py` (4 sims) runs before `tests/test_live_workflow.py` in the tier-3 batch, an ImGui `IM_ASSERT((0) && "Missing End()")` fires at ~71.5s into GUI lifetime in window 'MainDockSpace'. The `RuntimeError` propagates from `immapp.run` through `app.run()` and `main()`. The hook server thread (separate `ThreadingHTTPServer`) survives. The `_io_pool` ends up in a shutdown state (mechanism unclear — likely `ThreadPoolExecutor.__del__` during GC). Subsequent test clicks fail with `RuntimeError: cannot schedule new futures after shutdown`.
|
||||
|
||||
The test poll loop then waits 120s before timing out.
|
||||
|
||||
---
|
||||
|
||||
## 2. Constraints (from user)
|
||||
|
||||
Per the user's session feedback (2026-06-08):
|
||||
- "I don't want the entire test to just linger or silently continue" — silent failure is not acceptable
|
||||
- "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails" — batch isolation is required
|
||||
- "Just has to note that the new file didn't get to deal with a dirty state" — the failure should be observable, not hidden
|
||||
- "The wrap might be worth it if that properly lets us handle the assert" — a proper wrap (with logging + observable state) is acceptable; a silent swallow is not
|
||||
|
||||
The user wants:
|
||||
1. A real fix where possible
|
||||
2. A wrap that surfaces the failure (not swallows it)
|
||||
3. Batch resilience (a failed batch should not poison the next)
|
||||
4. Tests should be able to detect a degraded GUI and fail fast with a clear message
|
||||
|
||||
---
|
||||
|
||||
## 3. Solution Matrix
|
||||
|
||||
The 6 tasks in `conductor/todos/TODO_test_full_live_workflow_v2.md` are presented here as a solutions matrix. Each is evaluated on:
|
||||
- **Real-fix value** (does it address the root cause?)
|
||||
- **Test impact** (does it make `test_full_live_workflow` pass in batch?)
|
||||
- **Effort** (hours)
|
||||
- **Risk** (chance of regression)
|
||||
- **User alignment** (does it match the user's stated constraints?)
|
||||
|
||||
| # | Solution | Real fix? | Test impact | Effort | Risk | User-aligned? |
|
||||
|---|----------|-----------|------------|--------|------|---------------|
|
||||
| 1 | Run `check_imgui_scopes.py` to find the scope mismatch | **Yes** (the actual bug) | Yes (if successful) | 1-2h | Med | Yes |
|
||||
| 2 | Fix the identified ImGui scope mismatch | **Yes** (the actual bug) | Yes (if successful) | 1-4h | Med | Yes |
|
||||
| 3 | Wrap `immapp.run` in `try/except RuntimeError` | No (band-aid) | Yes (prevents crash) | 1-2h | Low | Yes (per user: "might be worth it if it properly handles") |
|
||||
| 4 | Kill+restart sloppy.py per test file | No (isolation) | Yes (clean state) | 2-4h | Low | Yes (per user: "I can't restart the app and continue with the next test file if it fails") |
|
||||
| 5 | Make `submit_io` recover from a shut-down pool | No (resilience) | Yes (survives crash) | 0.5h | Low | Yes (defense in depth) |
|
||||
| 6 | Add `/api/gui_health` endpoint | No (observability) | Yes (fast-fail) | 1-2h | Low | Yes (per user: "Just has to note that the new file didn't get to deal with a dirty state") |
|
||||
|
||||
---
|
||||
|
||||
## 4. Solution Details (Probable Approaches)
|
||||
|
||||
### 4.1 Solution 1+2: Audit + Fix the ImGui Scope Mismatch
|
||||
|
||||
**Approach A: Manual scope audit (lowest risk)**
|
||||
|
||||
1. Run `python scripts/check_imgui_scopes.py` against `src/gui_2.py`
|
||||
2. Triage findings — many will be false positives (e.g., `begin()` inside a conditional that has an `end()` in the other branch via context manager)
|
||||
3. Identify the SPECIFIC `render_*` function with the unbalanced scope
|
||||
4. Inspect that function's render path and find the missing `end()` or the extra `begin()`
|
||||
|
||||
**Probable location of the bug:** A render function that's only called in one of the sims' panel render paths. Candidates:
|
||||
- Render functions for the AI Settings panel (test_ai_settings_sim_live)
|
||||
- Render functions for the Tools panel (test_tools_sim_live)
|
||||
- Render functions for the Execution/Modals panel (test_execution_sim_live)
|
||||
- Render functions for the Context & Chat panel (test_context_sim_live)
|
||||
|
||||
**Probable cause of the bug:** A recent render refactor that added a `begin()` to show a tooltip or popup, but the matching `end()` is inside an `if` branch that can early-return. Or a `begin()` that's only conditionally reached, with the `end()` always called but the stack now has an extra entry.
|
||||
|
||||
**Approach B: Defer-not-catch pattern (per `conductor/workflow.md` known pitfall)**
|
||||
|
||||
The known pitfall section describes:
|
||||
> `imgui-bundle` (and similar native extension libraries) expose C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python** — `try/except Exception` does not intercept native access violations, only Python exceptions.
|
||||
|
||||
The IM_ASSERT we observed IS a Python RuntimeError (catchable). But the fix pattern is similar: use `imscope` context managers (per `src/imgui_scopes.py`) to ensure scopes are balanced even on early returns.
|
||||
|
||||
**Implementation pattern:**
|
||||
```python
|
||||
# Before (buggy)
|
||||
def render_my_panel(app):
|
||||
imgui.begin("My Panel")
|
||||
if not app.some_state:
|
||||
return # <-- Missing end()!
|
||||
imgui.text("hello")
|
||||
imgui.end()
|
||||
|
||||
# After (fixed)
|
||||
def render_my_panel(app):
|
||||
with imscope(imgui.begin, "My Panel"): # auto end() on exit
|
||||
if not app.some_state:
|
||||
return
|
||||
imgui.text("hello")
|
||||
# end() called automatically
|
||||
```
|
||||
|
||||
**Effort:** 1-4 hours (depends on what the audit finds).
|
||||
|
||||
**Risk:** Medium. The fix may need to be applied to multiple render functions, each requiring careful testing.
|
||||
|
||||
**Confidence in success:** Medium. The IM_ASSERT is deterministic and ImGui's scope tracking is reliable. Once the offending function is found, the fix is mechanical.
|
||||
|
||||
### 4.2 Solution 3: Wrap `immapp.run` in `try/except RuntimeError`
|
||||
|
||||
**Approach: Catch and recover**
|
||||
|
||||
```python
|
||||
# src/gui_2.py:617-621
|
||||
try:
|
||||
immapp.run(self.runner_params, add_ons_params=immapp.AddOnsParams(with_markdown_options=md_options))
|
||||
except RuntimeError as e:
|
||||
# IM_ASSERT (Missing End()) or similar. Log the error, mark the GUI
|
||||
# as degraded, and let the hook server continue. Per user feedback
|
||||
# (2026-06-08): the wrap is acceptable IF it surfaces the failure
|
||||
# (does not silently swallow).
|
||||
self.controller._gui_degraded_reason = f"immapp.run raised: {e}"
|
||||
self.controller._last_imgui_assert = traceback.format_exc()
|
||||
print(f"[GUI-DEGRADED] {self.controller._gui_degraded_reason}", file=sys.stderr, flush=True)
|
||||
# Do NOT call self.shutdown() — keep the hook server alive for tests.
|
||||
# The io_pool may be in a weird state; lazy-recreate on next submit_io (Task 5).
|
||||
# On normal exit
|
||||
if not self.controller._gui_degraded_reason:
|
||||
self.shutdown()
|
||||
session_logger.close_session()
|
||||
```
|
||||
|
||||
**Key design choices:**
|
||||
1. **The wrap does NOT call `self.shutdown()`** when the assert fires. This keeps the hook server alive so subsequent tests can still query state.
|
||||
2. **The error is logged at ERROR level** with the full assert message and stack trace. This is observable, not silent.
|
||||
3. **The controller sets `_gui_degraded_reason` and `_last_imgui_assert`** so the new `/api/gui_health` endpoint (Task 6) can expose the state to tests.
|
||||
4. **Tests can detect the degraded state** via `client.get_gui_health()` and fail fast with a clear message.
|
||||
|
||||
**Effort:** 1-2 hours (wrap + logging + state).
|
||||
|
||||
**Risk:** Low. The wrap is a band-aid, but a transparent one. The error is still surfaced.
|
||||
|
||||
**Confidence in success:** High. The wrap itself is trivial. The integration with `/api/gui_health` and the test-side fast-fail is straightforward.
|
||||
|
||||
### 4.3 Solution 4: Kill+Restart sloppy.py per Test File
|
||||
|
||||
**Approach: Per-file fixture scope**
|
||||
|
||||
Currently, the `live_gui` fixture in `tests/conftest.py` is session-scoped. All live_gui tests share the same `sloppy.py` subprocess for the entire test session. If the subprocess crashes mid-session, all subsequent tests are poisoned.
|
||||
|
||||
**Change:** Make the fixture per-file-scoped (or function-scoped with smart re-spawn). When a test in file A finishes, the subprocess is killed. When file B starts, a fresh subprocess is spawned.
|
||||
|
||||
**Implementation pattern:**
|
||||
|
||||
```python
|
||||
# tests/conftest.py (modified)
|
||||
@pytest.fixture(scope="module") # was: "session"
|
||||
def live_gui(request):
|
||||
# If the prior file's subprocess died, this is a fresh start.
|
||||
# If the user wants true per-test isolation, change to "function".
|
||||
...
|
||||
try:
|
||||
yield process, gui_script
|
||||
finally:
|
||||
kill_process_tree(process.pid)
|
||||
log_file.close()
|
||||
shutil.rmtree(temp_workspace)
|
||||
```
|
||||
|
||||
**Considerations:**
|
||||
- **Performance:** Each test file's fixture spawn takes ~1-2s. With 49 live_gui files, that's 49-98s added to the tier-3 batch. Probably acceptable.
|
||||
- **State persistence:** Currently, tests can rely on state from a prior test (e.g., a project loaded by sim 1 is still loaded when sim 2 runs). Making the fixture per-file-scoped breaks this. **Most live_gui tests should NOT depend on prior test state** — they should set up their own state. This is the principle the v1 report identified.
|
||||
- **Watchdog interaction:** The conftest's smart watchdog (300s) and unconditional watchdog (900s) are based on `_pytest_finished_event`. The per-file fixture change is independent.
|
||||
|
||||
**Alternative approach: Smart re-spawn**
|
||||
|
||||
Keep the fixture session-scoped, but add a "re-spawn" check at the start of each test. If the subprocess is dead, spawn a new one. The fixture becomes a "lazy" fixture that may spawn multiple subprocesses over the session.
|
||||
|
||||
**Probable implementation:**
|
||||
```python
|
||||
@pytest.fixture(scope="session")
|
||||
def live_gui(request):
|
||||
process, gui_script = _spawn_sloppy()
|
||||
yield _LazyLiveGui(process, gui_script)
|
||||
kill_process_tree(process.pid)
|
||||
|
||||
class _LazyLiveGui:
|
||||
def __init__(self, process, gui_script):
|
||||
self._process = process
|
||||
self._gui_script = gui_script
|
||||
self._lock = threading.Lock()
|
||||
|
||||
def get_process(self):
|
||||
with self._lock:
|
||||
if self._process.poll() is not None:
|
||||
# Respawn
|
||||
self._process = _spawn_sloppy()
|
||||
return self._process
|
||||
```
|
||||
|
||||
This is more complex but preserves the per-session state model for tests that need it.
|
||||
|
||||
**Effort:** 2-4 hours (per-file approach is simpler; lazy re-spawn is more invasive).
|
||||
|
||||
**Risk:** Low. The per-file approach is straightforward. The lazy re-spawn is more complex but doesn't change the API surface for tests.
|
||||
|
||||
**Confidence in success:** High. The pattern is well-established in test infrastructure.
|
||||
|
||||
### 4.4 Solution 5: Make `submit_io` Recover from a Shut-Down Pool
|
||||
|
||||
**Approach: Lazy recreation**
|
||||
|
||||
```python
|
||||
# src/app_controller.py:2275-2284
|
||||
def submit_io(self, fn, *args, **kwargs):
|
||||
if not hasattr(self, "_io_pool") or self._io_pool is None or self._is_io_pool_shutdown():
|
||||
# Recreate the pool (it was shut down, e.g., by a GUI crash).
|
||||
self._io_pool = make_io_pool()
|
||||
self._io_pool_inflight = 0
|
||||
if not hasattr(self, "_io_pool_inflight_lock"):
|
||||
self._io_pool_inflight_lock = threading.Lock()
|
||||
with self._io_pool_inflight_lock:
|
||||
self._io_pool_inflight = getattr(self, "_io_pool_inflight", 0) + 1
|
||||
future = self._io_pool.submit(fn, *args, **kwargs)
|
||||
future.add_done_callback(lambda _f: self._io_pool_inflight_done())
|
||||
return future
|
||||
|
||||
def _is_io_pool_shutdown(self) -> bool:
|
||||
"""True if the io_pool has been shut down (e.g., via __del__ or controller.shutdown)."""
|
||||
pool = getattr(self, "_io_pool", None)
|
||||
if pool is None:
|
||||
return True
|
||||
# ThreadPoolExecutor doesn't expose a clean "is_shutdown" method.
|
||||
# Use a try/except probe.
|
||||
try:
|
||||
pool.submit(lambda: None).result(timeout=0.001)
|
||||
return False
|
||||
except RuntimeError:
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
```
|
||||
|
||||
**Key design choices:**
|
||||
1. **The lazy-recreate happens transparently.** Callers don't need to know about pool lifecycle.
|
||||
2. **The inflight counter is reset** when the pool is recreated (the old workers are dead).
|
||||
3. **The probe (`_is_io_pool_shutdown`) is best-effort.** `ThreadPoolExecutor` doesn't expose a clean shutdown check; a `submit().result()` probe is the standard pattern.
|
||||
|
||||
**Effort:** 30 minutes.
|
||||
|
||||
**Risk:** Low. The pool was already designed to be replaceable (per the existing test infrastructure). The probe is a non-invasive check.
|
||||
|
||||
**Confidence in success:** High. Standard pattern for resilient thread pools.
|
||||
|
||||
### 4.5 Solution 6: Add `/api/gui_health` Endpoint
|
||||
|
||||
**Approach: Read-only health endpoint**
|
||||
|
||||
```python
|
||||
# src/api_hooks.py (new elif branch in do_GET)
|
||||
elif self.path == "/api/gui_health":
|
||||
self.send_response(200)
|
||||
self.send_header("Content-Type", "application/json")
|
||||
self.end_headers()
|
||||
controller = _get_app_attr(app, "controller", None)
|
||||
if controller is None:
|
||||
payload = {"healthy": True, "degraded_reason": None, "last_assert": None, "io_pool_alive": True}
|
||||
else:
|
||||
payload = {
|
||||
"healthy": getattr(controller, "_gui_degraded_reason", None) is None,
|
||||
"degraded_reason": getattr(controller, "_gui_degraded_reason", None),
|
||||
"last_assert": getattr(controller, "_last_imgui_assert", None),
|
||||
"io_pool_alive": not controller._is_io_pool_shutdown() if hasattr(controller, "_is_io_pool_shutdown") else True,
|
||||
}
|
||||
self.wfile.write(json.dumps(payload).encode("utf-8"))
|
||||
```
|
||||
|
||||
**Test integration:**
|
||||
|
||||
```python
|
||||
# tests/test_live_workflow.py (new pre-check at start)
|
||||
def test_full_live_workflow(live_gui) -> None:
|
||||
client = ApiHookClient()
|
||||
assert client.wait_for_server(timeout=10)
|
||||
# New: check GUI health before proceeding
|
||||
health = client.get_gui_health()
|
||||
if not health.get("healthy"):
|
||||
pytest.fail(
|
||||
f"GUI is degraded before test starts: "
|
||||
f"degraded_reason={health.get('degraded_reason')}, "
|
||||
f"last_assert={health.get('last_assert')}"
|
||||
)
|
||||
...
|
||||
```
|
||||
|
||||
**Effort:** 1-2 hours.
|
||||
|
||||
**Risk:** Low. Read-only endpoint + trivial client method + simple pre-check.
|
||||
|
||||
**Confidence in success:** High.
|
||||
|
||||
---
|
||||
|
||||
## 5. Recommended Combination
|
||||
|
||||
The user's constraints suggest a **layered defense**:
|
||||
|
||||
### Layer 1: Real fix (Tasks 1+2)
|
||||
Find and fix the ImGui scope mismatch. This is the right thing to do.
|
||||
|
||||
### Layer 2: Safety net (Task 3)
|
||||
Wrap `immapp.run` so a future scope mismatch doesn't kill the process. Log the error, mark GUI as degraded.
|
||||
|
||||
### Layer 3: Observability (Task 6)
|
||||
Expose the degraded state via `/api/gui_health`. Tests can fast-fail with a clear message.
|
||||
|
||||
### Layer 4: Resilience (Task 5)
|
||||
Make `submit_io` recover from a shut-down pool. Defense in depth.
|
||||
|
||||
### Layer 5: Batch isolation (Task 4)
|
||||
Per-file (or per-test) fixture scope for `live_gui`. A failed batch doesn't poison the next.
|
||||
|
||||
**Recommended PR order:**
|
||||
- **PR 1 (highest priority):** Tasks 1+2 (real fix)
|
||||
- **PR 2 (in parallel with PR 1):** Tasks 3+6 (wrap + observability) — these don't depend on finding the bug
|
||||
- **PR 3:** Task 4 (batch isolation) — independent of the bug, valuable on its own
|
||||
- **PR 4 (defense in depth):** Task 5 (lazy pool recreation) — only needed if PR 2 doesn't fully solve the problem
|
||||
|
||||
### What's NOT recommended
|
||||
|
||||
- **Task 5 alone, without Tasks 1+2+3:** The lazy-recreate hides the bug. The io_pool will keep getting shut down by the GUI crash, then recreated, then shut down again, etc. The test will pass but the GUI is still broken.
|
||||
- **Task 4 alone, without the real fix:** Batch isolation is good hygiene, but the test still fails in its own batch. The bug is not fixed.
|
||||
- **A silent swallow in Task 3:** The user explicitly rejected this. A silent failure is worse than a visible failure.
|
||||
|
||||
---
|
||||
|
||||
## 6. Open Questions for the User
|
||||
|
||||
Before implementation, these need clarification:
|
||||
|
||||
1. **Task 3 wrap behavior:** When the IM_ASSERT fires, should the wrap:
|
||||
a. Just log and return (degraded mode, no further renders)
|
||||
b. Try to continue the render loop (skip the current frame, retry next frame)
|
||||
c. Restart the ImGui context entirely (clear and reinitialize)
|
||||
|
||||
My recommendation: (a) is the safest. (b) risks infinite loops if the scope is genuinely broken. (c) is invasive.
|
||||
|
||||
2. **Task 4 scope:** Per-file or per-test? Per-test is more isolated but adds 49+ fixture spawns. Per-file is a middle ground.
|
||||
|
||||
My recommendation: Per-file for now. Per-test can be added later if needed.
|
||||
|
||||
3. **Task 5 proactive recreation:** Should the pool be recreated eagerly (at controller init) or lazily (on first submit after shutdown)?
|
||||
|
||||
My recommendation: Lazy. The pool is normally alive; recreating eagerly is wasteful.
|
||||
|
||||
4. **New state attributes:** `_gui_degraded_reason` and `_last_imgui_assert` — should they be persisted to disk (e.g., for crash analysis) or just in-memory?
|
||||
|
||||
My recommendation: In-memory. Disk persistence is overkill for transient runtime state.
|
||||
|
||||
5. **Compatibility with existing tests:** Some existing tests may assume the io_pool is always available. Will the lazy-recreate break them?
|
||||
|
||||
My recommendation: No. The lazy-recreate preserves the pool's API surface. The only difference is that the pool may be a fresh instance after a crash. Existing tests don't care about the instance identity.
|
||||
|
||||
---
|
||||
|
||||
## 7. Effort & Risk Summary
|
||||
|
||||
| Solution | Effort | Risk | User-aligned | Recommendation |
|
||||
|----------|--------|------|--------------|----------------|
|
||||
| 1+2 (audit + fix) | 2-6h | Med | Yes | **PRIORITY 1: implement first** |
|
||||
| 3 (wrap) | 1-2h | Low | Yes | **PRIORITY 2: implement in parallel** |
|
||||
| 4 (batch isolation) | 2-4h | Low | Yes | **PRIORITY 3: independent, implement anytime** |
|
||||
| 5 (lazy recreate) | 0.5h | Low | Yes | **PRIORITY 4: only if 3 doesn't fully solve** |
|
||||
| 6 (health endpoint) | 1-2h | Low | Yes | **PRIORITY 2: implement with 3** |
|
||||
|
||||
**Total effort (all):** 6-14 hours
|
||||
**Recommended PR sequence:** 1+2 → 3+6 → 4 → 5
|
||||
|
||||
---
|
||||
|
||||
## 8. References
|
||||
|
||||
- `docs/reports/test_full_live_workflow_imgui_assert_20260608.md` — full debugging report
|
||||
- `conductor/todos/TODO_test_full_live_workflow_v2.md` — task list
|
||||
- `scripts/check_imgui_scopes.py` — existing audit script (use for Task 1)
|
||||
- `src/imgui_scopes.py` — ImGuiScope context manager (use for fix in Task 2)
|
||||
- `src/api_hooks.py:11` — `ThreadingHTTPServer` (independent of io_pool)
|
||||
- `src/gui_2.py:618` — `immapp.run(...)` call site
|
||||
- `src/app_controller.py:2282` — `submit_io` line that throws the RuntimeError
|
||||
- `conductor/workflow.md` "Known Pitfalls (2026-06-05)" — Defer-Not-Catch Pattern
|
||||
- `conductor/workflow.md` "Skip-Marker Policy" — don't add skip markers as workarounds
|
||||
- `AGENTS.md` "Critical Anti-Patterns" — no comments, no batch fragility
|
||||
Reference in New Issue
Block a user