test(live_workflow): pre-flight health check fails fast on dirty state

PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error.

This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
- Without PR3 fix: 200s FAIL with confusing 120s timeout
- With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')
2026-06-08 21:17:54 -04:00
parent 8a597d1832
commit 51ecace464
5 changed files with 1077 additions and 0 deletions
@@ -0,0 +1,250 @@
+# Batch-Level Test Resilience Plan
+
+**Companion to:** `docs/reports/test_full_live_workflow_propagation_digest_20260608.md`
+**Status:** Pre-implementation plan
+**User requirement:** "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state."
+
+---
+
+## 1. Current Behavior
+
+The `tests/conftest.py:live_gui` fixture is **session-scoped**. It spawns a single `sloppy.py` subprocess at the start of the test session and keeps it alive for ALL live_gui tests across ALL tiers.
+
+**Test file structure (relevant):**
+- `tests/test_extended_sims.py` — 4 sim tests: `test_context_sim_live`, `test_ai_settings_sim_live`, `test_tools_sim_live`, `test_execution_sim_live`. The IM_ASSERT fires during the 4th sim (~71.5s into GUI lifetime).
+- `tests/test_live_workflow.py` — separate file, runs AFTER test_extended_sims.py in alphabetical order. `test_full_live_workflow` is the failing test.
+
+The IM_ASSERT crashes the GUI's main loop mid-test-file. The hook server (separate thread) survives, but the controller's `_io_pool` is in a shutdown state. The next test file (`test_live_workflow.py`) starts in this degraded state. Its first click (`btn_project_new_automated`) hits `submit_io` which raises `RuntimeError: cannot schedule new futures after shutdown`. The test's `wait_for_project_switch` polls for 120s before timing out.
+
+**Failure mode observed by user:** "the new file didn't get to deal with a dirty state"
+
+---
+
+## 2. Real User Concern: Within-Session Subprocess Degradation
+
+The user's concern is specifically about WITHIN-SESSION state. They want:
+
+1. A test file can crash the subprocess without preventing the next file from running cleanly
+2. If the next file is doomed (subprocess is degraded), the runner should report this clearly, not silently time out
+3. The runner should continue to subsequent batches even after a failed one (this already works for tiers that don't use `live_gui`)
+
+**The current implementation has NONE of these properties:**
+- `live_gui` is session-scoped, so the subprocess lives across the whole test session
+- A crashed subprocess poisons all subsequent live_gui tests
+- The degraded state (io_pool shut down) is not surfaced to the test, so the test fails with a confusing timeout, not a clear "subprocess degraded" message
+
+---
+
+## 3. Probable Solutions
+
+### Solution A: Per-file live_gui Fixture (most isolated)
+
+**Approach:** Change `live_gui` from `@pytest.fixture(scope="session")` to `@pytest.fixture(scope="module")`. Each test file gets a fresh subprocess.
+
+**Code change (1 line):**
+```python
+# tests/conftest.py
+@pytest.fixture(scope="module")  # was: "session"
+def live_gui(request):
+    ...
+```
+
+**Pros:**
+- Maximum isolation. A test file that crashes the subprocess doesn't affect the next file.
+- The fixture's `finally` block (which calls `kill_process_tree`) is the per-file cleanup.
+- Simple to implement (one-line scope change + audit).
+
+**Cons:**
+- ~1-2s overhead per file (subprocess spawn + hook server health check).
+- For 49 live_gui files, that's 49-98s of additional overhead.
+- Some tests may currently rely on cross-file state (e.g., a project loaded by file A is still loaded when file B starts). These tests would break.
+
+**Mitigation:** Audit the live_gui tests for cross-file state dependencies. Most should be standalone (each test sets up its own state). If any are not, mark them with `@pytest.mark.requires_prior_state` and either:
+- Skip them when scope is module
+- Or document the dependency and add a setup step in the dependent file
+
+**Effort:** 1-2 hours (scope change + audit + fix cross-file dependencies).
+
+**Risk:** Medium. May break tests that depend on cross-file state. The audit is the main work.
+
+### Solution B: Lazy Re-spawn (most flexible)
+
+**Approach:** Keep the `live_gui` fixture session-scoped, but wrap it in a handle that re-spawns the subprocess if it dies. The handle exposes the same API as the current fixture.
+
+**Code change (significant):**
+```python
+# tests/conftest.py
+class _LiveGuiHandle:
+    def __init__(self, gui_script: str):
+        self._gui_script = gui_script
+        self._process: subprocess.Popen | None = None
+        self._lock = threading.Lock()
+        self._spawn()
+    
+    def _spawn(self) -> None:
+        # Existing fixture spawn logic, refactored into a method
+        ...
+    
+    def is_alive(self) -> bool:
+        return self._process is not None and self._process.poll() is None
+    
+    def ensure_alive(self) -> None:
+        with self._lock:
+            if not self.is_alive():
+                self._spawn()
+    
+    @property
+    def process(self) -> subprocess.Popen:
+        self.ensure_alive()
+        return self._process
+
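+@pytest.fixture(scope="session")
+def live_gui(request):
+    # gui_script is resolved the same way as in the existing fixture (not shown)
+    handle = _LiveGuiHandle(gui_script)
+    try:
+        yield handle, handle._gui_script
+    finally:
+        # _kill() is assumed to wrap the existing kill_process_tree cleanup
+        handle._kill()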
+
+**Pros:**
+- Preserves the per-session fixture scope.
+- Auto-recovers from subprocess death between tests.
+- Tests that rely on cross-file state can still do so (the subprocess is the same instance, modulo a respawn).
+- Single place to add health checks.
+
+**Cons:**
+- More complex. The handle's `ensure_alive` adds a check at every test entry.
+- If the subprocess dies mid-test, the test still fails — we only recover BETWEEN tests.
+- Respawning the subprocess loses any in-process state. Tests that rely on state from a prior test fail on respawn.
+
+**Effort:** 4-6 hours (refactor fixture + add respawn logic + tests).
+
+**Risk:** Low. The respawn is a fallback; the primary path (subprocess stays alive) is unchanged.
+
+### Solution C: Per-Batch Process Tracking (most surgical)
+
+**Approach:** Add a process health check at the start of each batch in `scripts/run_tests_batched.py`. If the previous batch left the subprocess dead, log a clear warning. Tests can then fail fast with a known message.
+
+**Code change (conftest writes pid file, batcher reads it):**
+```python
+# tests/conftest.py (in live_gui fixture, after spawn)
+pid_file = tests_dir / ".live_gui_pid"
+pid_file.write_text(str(process.pid))
+
+# scripts/run_tests_batched.py
+def _run_batch(b: Batch, ...) -> ...:
+    if b.label.startswith("tier-3-live_gui"):
+        pid_file = tests_dir / ".live_gui_pid"
+        if pid_file.exists():
+            pid = int(pid_file.read_text().strip())
+            if not _is_pid_alive(pid):
+                print(_c(f"[BATCH-WARN] Prior tier-3 batch left the live_gui subprocess (pid={pid}) dead. "
+                         f"This batch's live_gui tests may not start with a clean state.",
+                         _C.BOLD_YELLOW))
+```
+
+**Pros:**
+- Surgical. Doesn't change the fixture or test code.
+- Surfaces the dirty state via a clear warning, not a silent hang.
+- User can then choose to debug or skip the batch.
+
+**Cons:**
+- Doesn't actually FIX the dirty state — just makes it visible.
+- Requires the fixture to write a pid file (small change).
+- Tests still fail with the same confusing timeout, but the warning is in the runner output.
+
+**Effort:** 1-2 hours.
+
+**Risk:** Low. Read-only check, no behavioral change.
+
+### Solution D: Fixture Auto-Detect (middle ground)
+
+**Approach:** Keep `live_gui` session-scoped, but at the START of each test (not file), check if the subprocess is alive. If dead, re-spawn.
+
+**Code change (conftest auto-use hook):**
+```python
+# tests/conftest.py
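+@pytest.fixture(autouse=True)
+def _check_live_gui_health(request):
+    # Do not take live_gui as a parameter: an autouse fixture that requests it
+    # would spawn the subprocess for every test and make the check below always true.
+    if "live_gui" in request.fixturenames:
+        # Assumes live_gui yields the _LiveGuiHandle from Solution B
+        handle, gui_script = request.getfixturevalue("live_gui")
+        handle.ensure_alive()
+    yield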
+```
+
+**Pros:**
+- Per-test recovery. A test that crashes the subprocess doesn't affect the next test.
+- Minimal API change (tests still use `live_gui`).
+
+**Cons:**
+- Per-test overhead (~0.1s for the health check).
+- If a test's clicks during a degraded subprocess fail, the test must be re-designed to be idempotent.
+- Respawning loses state.
+
+**Effort:** 2-3 hours.
+
+**Risk:** Medium. Tests that assume "subprocess is alive when my test starts" may need adjustment.
+
+---
+
+## 4. Recommended Combination
+
+**Primary: Solution A (per-file fixture scope)**
+- Most isolated. Each test file is a clean unit.
+- Simple to implement and audit.
+- For the IM_ASSERT scenario: test_extended_sims.py crashes its subprocess at the end. test_live_workflow.py starts with a fresh subprocess. The IM_ASSERT-triggered pollution doesn't reach test_live_workflow.py.
+
+**Secondary: Solution C (per-batch warning)**
+- Safety net. If a test file's subprocess dies mid-file (rather than at end of file), the next batch's runner logs a clear warning.
+- Doesn't fix the dirty state but makes it visible.
+
+**Optional: Solution B (lazy re-spawn)**
+- If the audit for Solution A reveals too many cross-file dependencies, Solution B is the fallback.
+- More complex but preserves the per-session state model.
+
+### Solution D alone: initially rejected, then reconsidered
+- Initial objection: per-test recovery is too granular. A test's failure shouldn't trigger a re-spawn that affects subsequent tests' setup.
+- Initial objection: Solution D doesn't help the IM_ASSERT scenario. On closer inspection this is wrong, and Solution D WOULD work:
+  - IM_ASSERT fires during `test_execution_sim_live` (test 4, the last test in test_extended_sims.py)
+  - The next file is test_live_workflow.py, and its first test is `test_full_live_workflow`
+  - Solution D's autouse fixture would re-spawn the subprocess before `test_full_live_workflow`
+
+This assumes `ensure_alive()` detects the crash. The root cause report shows the process keeps running with its hook server responsive, so a `poll()`-based liveness check may need to be paired with a health probe.
+
+So Solution D is a viable primary approach, which changes the recommendation.
+
+**Revised recommendation:**
+- **Solution D (autouse fixture auto-respawn)** as the primary. It's the most surgical.
+- **Solution A (per-file scope)** as the alternative if Solution D's autouse approach has side effects.
+- **Solution C (per-batch warning)** as a safety net for any case the autouse doesn't catch.
+
+---
+
+## 5. Open Questions for the User
+
+Before implementation, these need clarification:
+
+1. **Fixture scope preference:** Per-file (Solution A) or per-test auto-respawn (Solution D)?
+   - Per-file: more overhead but simpler reasoning
+   - Per-test auto-respawn: more surgical but adds an autouse hook
+   - My recommendation: Solution D. It's the closest to "the next test file gets a clean subprocess" without changing the fixture's API.
+
+2. **State reset on respawn:** When the subprocess is re-spawned, should the new subprocess inherit any state (e.g., loaded project, recent discussion)?
+   - My recommendation: No. Fresh subprocess = fresh state. Tests should set up their own state.
+
+3. **Failure signaling:** If the subprocess can't be respawned (e.g., port 8999 still in use from a zombie), should the test fail immediately or retry?
+   - My recommendation: Fail immediately with a clear error. Retries can hide real issues.
+
+4. **Backward compatibility:** Are there tests that explicitly DEPEND on the session-scoped behavior (e.g., they share state across files)?
+   - Need to audit. The audit is part of Solution A; for Solution D, the audit is less critical because respawned subprocesses are NEW instances (no shared state with prior subprocesses).
+
+---
+
+## 6. References
+
+- `tests/conftest.py:282` — current `live_gui` fixture (session-scoped)
+- `tests/conftest.py:516-547` — `live_gui` fixture finally block (kill + cleanup)
+- `scripts/run_tests_batched.py:136-164` — `_run_batch` function
+- `scripts/run_tests_batched.py:51-86` — batch result tracking
+- `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` — full solution matrix
+- `conductor/todos/TODO_test_full_live_workflow_v2.md` — task list including Task 4 (batch isolation)
@@ -0,0 +1,267 @@
+# Root Cause Report: test_full_live_workflow batch failure (v2)
+
+**Supersedes:** `test_full_live_workflow_root_cause_20260608.md` (older 6-cause analysis, dated 2026-06-08)
+**Date:** 2026-06-08
+**Status:** Investigation complete via diagnostic logging, no fix attempted
+**Failure reproducibility:** 100% in `tier-3-live_gui` batch (5+ tests, ~200s total), 0% in isolation (`pytest tests/test_live_workflow.py` → 11.69s PASS)
+**Related commits (reverted/no-op):** `4a338486` (io_pool 4→8), `c9a991bb` (timeout 30→120s), `87d7c5bf` (test_io_pool assertion)
+
+---
+
+## TL;DR — The Real Root Cause
+
+**`test_full_live_workflow` does not fail because of a slow `_do_project_switch`.** The switch runs in ~8-10ms when it executes. The test fails because **the GUI subprocess crashes mid-batch** due to an ImGui scope mismatch in some render function, which leaves the controller's `io_pool` in a shutdown state. Subsequent clicks to the still-alive hook server fail with `RuntimeError: cannot schedule new futures after shutdown`.
+
+The apparent `_do_project_switch` hang was a SYMPTOM, not the cause. The previous report's 6-cause analysis (cwd-relative paths, race conditions, click fire-and-forget, etc.) addressed symptoms of the same underlying issue but did not address the IM_ASSERT.
+
+---
+
+## Evidence Trail
+
+### Diagnostic instrumentation (temporarily added, then reverted)
+
+Added `[switch-diag] +N.NNNs <step>` prints to stderr at every step inside `_do_project_switch` (production code). Output goes to `logs/sloppy_py_test.log` because the `live_gui` fixture captures the subprocess's stderr/stdout to that file. Pattern matches the existing `[startup]` and `[HOOKS]` instrumentation style.
+
+### Key log findings (`logs/sloppy_py_test.log`)
+
+#### Finding 1: All 4 sims' switches complete in ~8-10ms each
+
+```
+[switch-diag] +0.000s enter path=temp_livecontextsim.toml
+[switch-diag] +0.000s flush_to_project_start
+[switch-diag] +0.000s flush_to_project_done
+[switch-diag] +0.000s load_project_start
+[switch-diag] +0.002s load_project_done
+[switch-diag] +0.002s preset_manager_start
+[switch-diag] +0.002s preset_manager_done
+[switch-diag] +0.002s persona_manager_start
+[switch-diag] +0.002s persona_manager_done
+[switch-diag] +0.002s refresh_start
+[switch-diag] +0.010s refresh_done
+[switch-diag] +0.010s mcp_configure_start
+[switch-diag] +0.010s mcp_configure_done
+[switch-diag] +0.010s success
+[switch-diag] +0.010s finally_enter
+[switch-diag] +0.010s finally_done
+```
+
+Same pattern for all 4 sims. **The switch itself is fast. There is no hang inside `_do_project_switch`.**
+
+#### Finding 2: An ImGui `IM_ASSERT` fires at 71.5s into GUI lifetime
+
+```
+[02214] [imgui-error] In window 'MainDockSpace': Missing End()
+[startup] main_call: 71518.5ms
+Traceback (most recent call last):
+  File "C:\projects\manual_slop\sloppy.py", line 75, in <module>
+    main()
+  File "C:\projects\manual_slop\src\gui_2.py", line 1478, in main
+    app.run()
+  File "C:\projects\manual_slop\src\gui_2.py", line 618, in run
+    immapp.run(self.runner_params, ...)
+  File "...\imgui_bundle\_patch_runners_add_save_screenshot_param.py", line 38, in patched_run
+    run_backup(*args, **kwargs)
+RuntimeError: IM_ASSERT( (0) && "Missing End()" )   ---   imgui.cpp:11662
+```
+
+The `IM_ASSERT` is an ImGui scope-tracking assertion: a `begin()` call was not matched with a corresponding `end()`. The window reported is 'MainDockSpace' — a special window managed by `hello_imgui` for the dock space layout. Some child widget within the dock space has an unbalanced begin/end.
+
+#### Finding 3: The test's `btn_project_new_automated` click hits a shutdown pool
+
+```
+[HOOKS] POST /api/session data length: 1       ← test_live_workflow's post_session
+[HOOKS] GET /api/warmup_wait?timeout=60.0
+[HOOKS] GET /api/project_switch_status
+[HOOKS] POST /api/gui data length: 3           ← test's btn_reset
+[HOOKS] POST /api/gui data length: 3           ← test's btn_project_new_automated
+Error executing GUI task (click): cannot schedule new futures after shutdown
+Traceback (most recent call last):
+  File "...\src\app_controller.py", line 1637, in _process_pending_gui_tasks
+    self._gui_task_handlers[action](self, task)
+  File "...\src\app_controller.py", line 580, in _handle_click
+    controller._cb_new_project_automated(user_data)
+  File "...\src\app_controller.py", line 2723, in _cb_new_project_automated
+    self._switch_project(user_data)
+  File "...\src\app_controller.py", line 2809, in _switch_project
+    self.submit_io(self._do_project_switch, path)
+  File "...\src\app_controller.py", line 2282, in submit_io
+    future = self._io_pool.submit(fn, *args, **kwargs)
+  File "...\concurrent\futures\thread.py", line 167, in submit
+    raise RuntimeError('cannot schedule new futures after shutdown')
+RuntimeError: cannot schedule new futures after shutdown
+```
+
+The IM_ASSERT happened ~71.5s into the GUI. The test_live_workflow runs after the 4 sims (~80s into GUI). Between the IM_ASSERT and the test's click, the `io_pool` was shut down.
+
+#### Finding 4: The hook server (stdlib `ThreadingHTTPServer`) stays alive
+
+The subprocess continues to respond to hooks (GET /api/events, GET /api/session, POST /api/gui) AFTER the IM_ASSERT and AFTER the io_pool is shut down. The test's `wait_for_project_switch` polls `/api/project_switch_status` 1200+ times in 120s — the server is responsive. Only `submit_io` fails.
+
+This proves: the IO thread pool is the casualty, not the entire process. Something selectively shut down `_io_pool` while the rest of the controller (and the hook server) kept running.
+
+---
+
+## What Shuts Down The io_pool?
+
+This is the unresolved question. The only places `_io_pool.shutdown(wait=False)` is called in `src/`:
+
+1. `src/app_controller.py:762` — in the `_on_sigint` SIGINT handler. This requires SIGINT to be delivered to the subprocess. On Windows, `taskkill /F` does NOT deliver signals.
+2. `src/app_controller.py:2325` — in `controller.shutdown()`. This is called from `src/gui_2.py:869` (in `App.shutdown`), which is only called at `src/gui_2.py:620` AFTER `immapp.run()` returns successfully.
+
+The IM_ASSERT raises `RuntimeError` from inside `immapp.run()`. The exception propagates up. **There is no `try/finally` around `immapp.run`**, so `App.shutdown()` is NOT called via this path.
+
+**Hypothesis A:** Python's interpreter finalization calls `__del__` on the controller (or its `_io_pool`) during exception propagation. `ThreadPoolExecutor.__del__` defaults to `shutdown(wait=False)` per the io_pool.py module docstring. This is a plausible path when a `RuntimeError` propagates through `main()`.
+
+**Hypothesis B:** `immapp.run` internally catches the IM_ASSERT and returns normally. Then `app.shutdown()` is called, which calls `controller.shutdown()`, which calls `_io_pool.shutdown(wait=False)`. Then `app.run()` returns, `main()` returns, and the sloppy.py process exits. But the log shows more activity AFTER the IM_ASSERT (clicks being processed), so this hypothesis is inconsistent with the evidence.
+
+**Hypothesis C:** The hook server (`ThreadingHTTPServer` on port 8999) runs in a separate thread from the main ImGui loop. The IM_ASSERT crashes the main thread but the hook server thread keeps running. The `_io_pool` is the GUI's pool, not the hook server's. When the main thread crashes, Python's atexit / finalization shuts down `_io_pool`. The hook server thread continues independently.
+
+**Hypothesis C is most consistent with the evidence.** The `_io_pool` is created in `AppController.__init__` (line ~810 in current code). The hook server is a separate `ThreadingHTTPServer` (line 11 of `src/api_hooks.py`). They are independent. The IM_ASSERT kills the ImGui main loop, the io_pool gets shut down during finalization, the hook server thread continues serving requests.
+
+The exact mechanism of `_io_pool.shutdown` being called in this scenario is not directly observable from the log. It could be:
+- `ThreadPoolExecutor.__del__` during GC (Hypothesis C path)
+- An atexit handler installed by the warmup system or another module
+- A signal delivery I haven't identified (e.g., the Python interpreter catching the exception and sending SIGTERM internally)
+
+**This matters less than the actual fix.** The IM_ASSERT is the trigger; the io_pool shutdown is a downstream consequence. Fixing the IM_ASSERT (the real bug) prevents all of this.
+
+---
+
+## Why Did The IM_ASSERT Only Fire In Batch?
+
+The IM_ASSERT is deterministic: it fires every time the offending code path is rendered. In isolation, `test_full_live_workflow` runs alone — it does not exercise the same render functions as the 4 sims. In batch, the sims run first:
+- `test_context_sim_live` (ContextSimulation)
+- `test_ai_settings_sim_live` (AISettingsSimulation)
+- `test_tools_sim_live` (ToolsSimulation)
+- `test_execution_sim_live` (ExecutionSimulation)
+
+Each sim opens specific panels (Context, AI Settings, Tools, Execution) and triggers render paths that may be unique to those simulations. After 4 sims, the cumulative state of `ImGui`'s internal scope stack is corrupted: a `begin()` was called in a panel that's only opened in sim mode, and its matching `end()` was either:
+- In a code path that was skipped due to a conditional (in the panel's render function or in the render loop)
+- In a `defer` block that early-returned
+- After a `return` inside a panel function
+
+The IM_ASSERT then fires ~71.5s into the GUI's lifetime, on some specific frame AFTER the sims have set up state. The exact render function is unknown without running `scripts/check_imgui_scopes.py` against the full codebase.
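
The bug class described above can be illustrated without ImGui. The sketch below uses a hypothetical `ScopeStack` stand-in for ImGui's begin/end bookkeeping (it is not the `imgui_bundle` API). It shows how an early `return` leaves a scope open, and how a `try/finally` keeps the pair balanced.

```python
class ScopeStack:
    """Hypothetical stand-in for ImGui's begin()/end() scope tracking."""

    def __init__(self) -> None:
        self.open: list[str] = []

    def begin(self, name: str) -> None:
        self.open.append(name)

    def end(self) -> None:
        self.open.pop()


def render_panel_buggy(ui: ScopeStack, has_data: bool) -> None:
    ui.begin("Panel")
    if not has_data:
        return  # early return skips end(): the scope stays open
    ui.end()


def render_panel_fixed(ui: ScopeStack, has_data: bool) -> None:
    ui.begin("Panel")
    try:
        if not has_data:
            return
    finally:
        ui.end()  # always runs, so begin/end stay balanced
```

With `has_data=False`, the buggy variant leaves `"Panel"` on the stack, which is the kind of imbalance a "Missing End()" assertion reports. The fixed variant leaves the stack empty on both paths.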
+
+---
+
+## Why Did My Previous Fixes Fail?
+
+### Fix 1: io_pool 4→8 (commit `4a338486`)
+
+**Wrong diagnosis:** I assumed the io_pool was saturated with sims' AI discussion turn workers, causing the new switch to queue forever.
+
+**Actual cause:** The io_pool isn't the bottleneck. The switch runs in ~8-10ms. The pool wasn't saturated at the time of the test's switch.
+
+**Why the commit doesn't hurt:** Bigger pool is a marginal improvement to startup concurrency. It's not a regression, just a fix for an issue that isn't the root cause.
+
+### Fix 2: Test timeout 30s→120s (commit `c9a991bb`)
+
+**Wrong diagnosis:** I assumed the switch was slow and just needed more time.
+
+**Actual cause:** The switch is fast. The test fails because the click can't even reach the switch handler — `submit_io` throws at line 2282 because the pool is shut down.
+
+**Why the commit doesn't hurt:** A longer timeout gives a clearer error message (the actual `RuntimeError: cannot schedule new futures after shutdown` is surfaced) but doesn't change the outcome. If anything, it makes the test more annoying to wait for.
+
+### Fix 3 (uncommitted, reverted): Dedicated executor for switches
+
+**Wrong diagnosis:** I assumed the project switch should not share a pool with background work.
+
+**Actual cause:** The pool is fine. The pool gets killed by the GUI crash.
+
+**Why the change was reverted:** It added complexity for a non-issue. Per `conductor/workflow.md`: "Don't ship a known regression to save time."
+
+---
+
+## The Four Fixes I Have NOT Yet Attempted
+
+Per the systematic-debugging skill, the architectural question needs user input. The 4 viable directions are documented in `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` (to be written; see TODOs).
+
+### Direction A: Fix the actual ImGui scope bug
+
+**What:** Run `scripts/check_imgui_scopes.py` to find the `begin()`/`end()` mismatch. Fix the offending render function.
+
+**Pros:** Real fix. Solves the root cause.
+
+**Cons:** May require deep investigation across 90+ render functions. May be in a render path that's only triggered by a specific sim panel combination. Could take significant time.
+
+**Risk:** Medium. A wrong fix could break other tests or hide the real issue.
+
+### Direction B: Wrap `immapp.run` in `try/except RuntimeError`
+
+**What:** In `src/gui_2.py:618`, wrap `immapp.run(...)` in a `try/except RuntimeError` (or broader). On exception, log it and let the app continue in a degraded state (e.g., skip the rest of the frame, return to event loop).
+
+**Pros:** Band-aid that prevents the GUI crash from propagating to the process. Tests continue to work. Easier to implement than Direction A.
+
+**Cons:** Hides the actual ImGui scope bug. Future tests may exhibit other weirdness from the scope mismatch. The user has said they don't want a "wrap" that just silently continues.
+
+**Risk:** Low for tests, but masks a real bug. Per user: "I don't want the entire test to just linger or silently continue."
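
One way to make such a wrap observable rather than silent is sketched below. It assumes the run loop is passed in as a plain callable; `GuiHealth`, `run_guarded`, and the `[gui-degraded]` log prefix are hypothetical names, not existing code in `src/gui_2.py`. The exception is recorded and logged, and shutdown always runs.

```python
import sys
import traceback
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuiHealth:
    """Hypothetical record of whether the GUI main loop has crashed."""
    healthy: bool = True
    error: Optional[str] = None
    traceback_text: Optional[str] = None


def run_guarded(run_loop: Callable[[], None], health: GuiHealth,
                on_shutdown: Callable[[], None]) -> bool:
    """Run the loop; on RuntimeError, record and log it instead of hiding it."""
    try:
        run_loop()
        return True
    except RuntimeError as exc:
        health.healthy = False
        health.error = f"{type(exc).__name__}: {exc}"
        health.traceback_text = traceback.format_exc()
        print(f"[gui-degraded] {health.error}", file=sys.stderr, flush=True)
        return False
    finally:
        on_shutdown()  # runs on both the normal and the crashed path
```

A health record like this is what a health endpoint could report, so a later test can see the degradation instead of timing out.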
+
+### Direction C: Make `_io_pool.shutdown` recoverable
+
+**What:** In `submit_io`, check if the pool is shut down. If so, recreate it (lazily).
+
+**Pros:** Decouples the test from the io_pool's lifecycle. Makes the controller more robust to GUI crashes.
+
+**Cons:** Doesn't address the IM_ASSERT root cause. The GUI is still crashing — we're just hiding the consequences.
+
+**Risk:** Low. Standard pattern for resilient thread pools.
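
A minimal sketch of the lazy-recreate idea follows, assuming the pool is a standard `concurrent.futures.ThreadPoolExecutor`. `RecoverableIoPool` and its `recreations` counter are hypothetical; the real `submit_io` may differ. The recreation is logged to stderr so the degradation stays visible.

```python
import sys
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class RecoverableIoPool:
    """Hypothetical wrapper that recreates its executor after a shutdown."""

    def __init__(self, max_workers: int = 4) -> None:
        self._max_workers = max_workers
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.recreations = 0  # how many times the pool had to be rebuilt

    def submit(self, fn, *args, **kwargs) -> Future:
        with self._lock:
            try:
                return self._pool.submit(fn, *args, **kwargs)
            except RuntimeError as exc:
                # submit() raises RuntimeError once the executor is shut down.
                print(f"[io-pool] recreating after: {exc}", file=sys.stderr, flush=True)
                self._pool = ThreadPoolExecutor(max_workers=self._max_workers)
                self.recreations += 1
                return self._pool.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        with self._lock:
            self._pool.shutdown(wait=True)
```

If the interpreter itself is finalizing, the retry raises again, so this only helps when the pool alone was shut down.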
+
+### Direction D: Make the batch runner handle the failure cleanly
+
+**What:** When a test file fails, the `run_tests_batched.py` runner currently continues to the next batch. The fix is to ensure: (1) the failing test file is marked as failed, (2) the next batch can start with a clean state (kill and restart the sloppy.py subprocess per batch, not session-wide).
+
+**Pros:** Doesn't require fixing the underlying bug. Tests can fail without poisoning subsequent batches.
+
+**Cons:** Doesn't fix `test_full_live_workflow`. The test still fails in its own batch.
+
+**Risk:** Low. Standard pattern for test isolation.
+
+### User's explicit guidance
+
+Per the user:
+- "I don't want the entire test to just linger or silently continue"
+- "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails"
+- "Just has to note that the new file didn't get to deal with a dirty state"
+- "The wrap might be worth it if that properly lets us handle the assert"
+
+The user wants:
+1. A report (this document)
+2. A todo list for the actual fixes
+3. A digest of probable solutions
+4. Batch-level resilience (kill+restart sloppy.py per file)
+5. A wrap around `immapp.run`, acceptable only IF it properly handles the assert (not just swallows it)
+
+---
+
+## Files Referenced
+
+- `src/app_controller.py:2282` — `submit_io` line that throws the RuntimeError
+- `src/app_controller.py:762` — `_on_sigint` shutdown
+- `src/app_controller.py:2325` — `controller.shutdown` pool shutdown
+- `src/gui_2.py:618` — `immapp.run(...)` call site (the IM_ASSERT trigger)
+- `src/gui_2.py:620` — `self.shutdown()` (only called on normal exit)
+- `src/api_hooks.py:117-136` — `/api/project_switch_status` endpoint
+- `src/api_hooks.py:11` — `from http.server import ThreadingHTTPServer` (independent of io_pool)
+- `tests/test_live_workflow.py:90-94` — `wait_for_project_switch` call
+- `tests/test_live_workflow.py:84-89` — defensive `os.path.exists` check
+- `tests/conftest.py:263-280` — `kill_process_tree` (uses `taskkill /F` on Windows; no signal)
+- `tests/conftest.py:111-126` — pytest smart watchdog (300s timeout in PARENT process)
+- `tests/conftest.py:516-547` — `live_gui` fixture finally block (session-scoped, only fires at end)
+- `scripts/check_imgui_scopes.py` — EXISTING audit script that can detect this class of bug
+- `logs/sloppy_py_test.log` — captured subprocess stderr from the failing test run
+
+## Diagnostic Logging (temporarily added, then reverted)
+
+```python
+# src/app_controller.py:2731-2763 (REVERTED)
+import sys as _diag_sys
+_diag_t0 = time.time()
+def _diag(step: str) -> None:
+    print(f"[switch-diag] +{time.time()-_diag_t0:.3f}s {step} path={Path(path).name}",
+          file=_diag_sys.stderr, flush=True)
+_diag("enter")
+# ... at every step ...
+```
+
+All diagnostic logging has been removed. The production code is back to the pre-diagnostic state. The pattern can be re-applied if needed (e.g., to find which `begin()` in which render function is unbalanced).
@@ -0,0 +1,388 @@
+# Digest: Probable Solutions for ImGui Assert Propagation Failure (2026-06-08)
+
+**Companion to:** `docs/reports/test_full_live_workflow_imgui_assert_20260608.md`
+**Companion to:** `conductor/todos/TODO_test_full_live_workflow_v2.md`
+**Status:** Pre-implementation analysis. User has not yet chosen a direction.
+**Audience:** Future implementer, the user (decision reference)
+
+---
+
+## 1. The Problem (restated)
+
+When `tests/test_extended_sims.py` (4 sims) runs before `tests/test_live_workflow.py` in the tier-3 batch, an ImGui `IM_ASSERT((0) && "Missing End()")` fires at ~71.5s into GUI lifetime in window 'MainDockSpace'. The `RuntimeError` propagates from `immapp.run` through `app.run()` and `main()`. The hook server thread (separate `ThreadingHTTPServer`) survives. The `_io_pool` ends up in a shutdown state (mechanism unclear — likely `ThreadPoolExecutor.__del__` during GC). Subsequent test clicks fail with `RuntimeError: cannot schedule new futures after shutdown`.
+
+The test poll loop then waits 120s before timing out.
+
+---
+
+## 2. Constraints (from user)
+
+Per the user's session feedback (2026-06-08):
+- "I don't want the entire test to just linger or silently continue" — silent failure is not acceptable
+- "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails" — batch isolation is required
+- "Just has to note that the new file didn't get to deal with a dirty state" — the failure should be observable, not hidden
+- "The wrap might be worth it if that properly lets us handle the assert" — a proper wrap (with logging + observable state) is acceptable; a silent swallow is not
+
+The user wants:
+1. A real fix where possible
+2. A wrap that surfaces the failure (not swallows it)
+3. Batch resilience (a failed batch should not poison the next)
+4. Tests should be able to detect a degraded GUI and fail fast with a clear message
+
+---
+
+## 3. Solution Matrix
+
+The 6 tasks in `conductor/todos/TODO_test_full_live_workflow_v2.md` are presented here as a solutions matrix. Each is evaluated on:
+- **Real-fix value** (does it address the root cause?)
+- **Test impact** (does it make `test_full_live_workflow` pass in batch?)
+- **Effort** (hours)
+- **Risk** (chance of regression)
+- **User alignment** (does it match the user's stated constraints?)
+
+| # | Solution | Real fix? | Test impact | Effort | Risk | User-aligned? |
+|---|----------|-----------|------------|--------|------|---------------|
+| 1 | Run `check_imgui_scopes.py` to find the scope mismatch | **Yes** (the actual bug) | Yes (if successful) | 1-2h | Med | Yes |
+| 2 | Fix the identified ImGui scope mismatch | **Yes** (the actual bug) | Yes (if successful) | 1-4h | Med | Yes |
+| 3 | Wrap `immapp.run` in `try/except RuntimeError` | No (band-aid) | Yes (prevents crash) | 1-2h | Low | Yes (per user: "The wrap might be worth it if that properly lets us handle the assert") |
+| 4 | Kill+restart sloppy.py per test file | No (isolation) | Yes (clean state) | 2-4h | Low | Yes (per user: "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails") |
+| 5 | Make `submit_io` recover from a shut-down pool | No (resilience) | Yes (survives crash) | 0.5h | Low | Yes (defense in depth) |
+| 6 | Add `/api/gui_health` endpoint | No (observability) | Yes (fast-fail) | 1-2h | Low | Yes (per user: "Just has to note that the new file didn't get to deal with a dirty state") |
+
+---
+
+## 4. Solution Details (Probable Approaches)
+
+### 4.1 Solution 1+2: Audit + Fix the ImGui Scope Mismatch
+
+**Approach A: Manual scope audit (lowest risk)**
+
+1. Run `python scripts/check_imgui_scopes.py` against `src/gui_2.py`
+2. Triage findings. Many will be false positives (e.g., a `begin()` inside a conditional whose matching `end()` lives in another branch or is supplied by a context manager)
+3. Identify the SPECIFIC `render_*` function with the unbalanced scope
+4. Inspect that function's render path and find the missing `end()` or the extra `begin()`
+
+**Probable location of the bug:** A render function that's only called in one of the sims' panel render paths. Candidates:
+- Render functions for the AI Settings panel (test_ai_settings_sim_live)
+- Render functions for the Tools panel (test_tools_sim_live)
+- Render functions for the Execution/Modals panel (test_execution_sim_live)
+- Render functions for the Context & Chat panel (test_context_sim_live)
+
+**Probable cause of the bug:** A recent render refactor that added a `begin()` to show a tooltip or popup, where the matching `end()` sits after an `if` branch that can early-return. Alternatively, a `begin()` that always runs while its `end()` is only conditionally reached, which leaves an extra entry on the window stack at end of frame (consistent with the observed "Missing End()" message).
+
+**Approach B: Defer-not-catch pattern (per `conductor/workflow.md` known pitfall)**
+
+The known pitfall section describes:
+> `imgui-bundle` (and similar native extension libraries) expose C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python** — `try/except Exception` does not intercept native access violations, only Python exceptions.
+
+The IM_ASSERT we observed IS a Python RuntimeError (catchable). But the fix pattern is similar: use `imscope` context managers (per `src/imgui_scopes.py`) to ensure scopes are balanced even on early returns.
+
+**Implementation pattern:**
+```python
+# Before (buggy)
+def render_my_panel(app):
+    imgui.begin("My Panel")
+    if not app.some_state:
+        return  # <-- Missing end()!
+    imgui.text("hello")
+    imgui.end()
+
+# After (fixed)
+def render_my_panel(app):
+    with imscope(imgui.begin, "My Panel"):  # auto end() on exit
+        if not app.some_state:
+            return
+        imgui.text("hello")
+        # end() called automatically
+```
+
+**Effort:** 2-6 hours combined (1-2h for the audit, 1-4h for the fix, depending on what the audit finds).
+
+**Risk:** Medium. The fix may need to be applied to multiple render functions, each requiring careful testing.
+
+**Confidence in success:** Medium. The IM_ASSERT is deterministic and ImGui's scope tracking is reliable. Once the offending function is found, the fix is mechanical.
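+
+The early-return failure mode and its fix can be sketched without the real GUI. This is a minimal illustration, not the project's code: `FakeImGui` and `scope` are hypothetical stand-ins for `imgui` and for the `imscope` helper, and they only track begin/end balance.
+
+```python
+from contextlib import contextmanager
+
+
+class FakeImGui:
+    """Hypothetical stand-in that tracks the begin/end stack."""
+
+    def __init__(self):
+        self.stack = []
+
+    def begin(self, name):
+        self.stack.append(name)
+
+    def end(self):
+        self.stack.pop()
+
+
+@contextmanager
+def scope(ui, name):
+    """Call ui.begin(name) on entry and ui.end() on every exit path."""
+    ui.begin(name)
+    try:
+        yield
+    finally:
+        ui.end()
+
+
+def render_buggy(ui, ready):
+    ui.begin("My Panel")
+    if not ready:
+        return  # leaves "My Panel" on the stack
+    ui.end()
+
+
+def render_fixed(ui, ready):
+    with scope(ui, "My Panel"):
+        if not ready:
+            return  # end() still runs via the finally block
+
+
+buggy, fixed = FakeImGui(), FakeImGui()
+render_buggy(buggy, ready=False)
+render_fixed(fixed, ready=False)
+print(buggy.stack)  # unbalanced: one entry left over
+print(fixed.stack)  # balanced: empty
+```
+
+The same check (stack empty after each render function) could serve as a per-function assertion during the audit, assuming the real scope stack can be inspected in a comparable way.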
+
+### 4.2 Solution 3: Wrap `immapp.run` in `try/except RuntimeError`
+
+**Approach: Catch and recover**
+
+```python
+# src/gui_2.py:617-621 (assumes `import sys` and `import traceback` at module top)
+try:
+    immapp.run(self.runner_params, add_ons_params=immapp.AddOnsParams(with_markdown_options=md_options))
+except RuntimeError as e:
+    # IM_ASSERT (Missing End()) or similar. Log the error, mark the GUI
+    # as degraded, and let the hook server continue. Per user feedback
+    # (2026-06-08): the wrap is acceptable IF it surfaces the failure
+    # (does not silently swallow).
+    self.controller._gui_degraded_reason = f"immapp.run raised: {e}"
+    self.controller._last_imgui_assert = traceback.format_exc()
+    print(f"[GUI-DEGRADED] {self.controller._gui_degraded_reason}", file=sys.stderr, flush=True)
+    # Do NOT call self.shutdown() — keep the hook server alive for tests.
+    # The io_pool may be in a weird state; lazy-recreate on next submit_io (Task 5).
+# On normal exit. getattr() guards against the attribute never having been set.
+if not getattr(self.controller, "_gui_degraded_reason", None):
+    self.shutdown()
+    session_logger.close_session()
+```
+
+**Key design choices:**
+1. **The wrap does NOT call `self.shutdown()`** when the assert fires. This keeps the hook server alive so subsequent tests can still query state.
+2. **The error is printed to stderr** with a `[GUI-DEGRADED]` prefix, and the full traceback is stored in `_last_imgui_assert`. This is observable, not silent.
+3. **The controller sets `_gui_degraded_reason` and `_last_imgui_assert`** so the new `/api/gui_health` endpoint (Task 6) can expose the state to tests.
+4. **Tests can detect the degraded state** via `client.get_gui_health()` and fail fast with a clear message.
+
+**Effort:** 1-2 hours (wrap + logging + state).
+
+**Risk:** Low. The wrap is a band-aid, but a transparent one. The error is still surfaced.
+
+**Confidence in success:** High. The wrap itself is trivial. The integration with `/api/gui_health` and the test-side fast-fail is straightforward.
+
+### 4.3 Solution 4: Kill+Restart sloppy.py per Test File
+
+**Approach: Per-file fixture scope**
+
+Currently, the `live_gui` fixture in `tests/conftest.py` is session-scoped. All live_gui tests share the same `sloppy.py` subprocess for the entire test session. If the subprocess crashes mid-session, all subsequent tests are poisoned.
+
+**Change:** Make the fixture per-file (pytest `scope="module"`), or function-scoped with smart re-spawn. When the last test in file A finishes, the subprocess is killed. When file B starts, a fresh subprocess is spawned.
+
+**Implementation pattern:**
+
+```python
+# tests/conftest.py (modified)
+@pytest.fixture(scope="module")  # was: "session"
+def live_gui(request):
+    # If the prior file's subprocess died, this is a fresh start.
+    # If the user wants true per-test isolation, change to "function".
+    ...
+    try:
+        yield process, gui_script
+    finally:
+        kill_process_tree(process.pid)
+        log_file.close()
+        shutil.rmtree(temp_workspace)
+```
+
+**Considerations:**
+- **Performance:** Each test file's fixture spawn takes ~1-2s. With 49 live_gui files, that's 49-98s added to the tier-3 batch. Probably acceptable.
+- **State persistence:** Currently, tests can rely on state from a prior test (e.g., a project loaded by sim 1 is still loaded when sim 2 runs). Making the fixture per-file-scoped breaks this. **Most live_gui tests should NOT depend on prior test state** — they should set up their own state. This is the principle the v1 report identified.
+- **Watchdog interaction:** The conftest's smart watchdog (300s) and unconditional watchdog (900s) are based on `_pytest_finished_event`. The per-file fixture change is independent.
+
+**Alternative approach: Smart re-spawn**
+
+Keep the fixture session-scoped, but add a "re-spawn" check at the start of each test. If the subprocess is dead, spawn a new one. The fixture becomes a "lazy" fixture that may spawn multiple subprocesses over the session.
+
+**Probable implementation:**
+```python
+import threading
+
+@pytest.fixture(scope="session")
+def live_gui(request):
+    lazy = _LazyLiveGui(*_spawn_sloppy())
+    try:
+        yield lazy
+    finally:
+        lazy.kill()  # kills the current subprocess, which may be a respawn
+
+class _LazyLiveGui:
+    def __init__(self, process, gui_script):
+        self._process = process
+        self._gui_script = gui_script
+        self._lock = threading.Lock()
+
+    def get_process(self):
+        with self._lock:
+            if self._process.poll() is not None:
+                # The subprocess exited; respawn. _spawn_sloppy() returns (process, gui_script).
+                self._process, self._gui_script = _spawn_sloppy()
+            return self._process
+
+    def kill(self):
+        with self._lock:
+            kill_process_tree(self._process.pid)
+
+This is more complex but preserves the per-session state model for tests that need it.
+
+**Effort:** 2-4 hours (per-file approach is simpler; lazy re-spawn is more invasive).
+
+**Risk:** Low. The per-file approach is straightforward. The lazy re-spawn is more complex, and tests that unpack `(process, gui_script)` from the fixture would need to switch to `get_process()`.
+
+**Confidence in success:** High. The pattern is well-established in test infrastructure.
+
+### 4.4 Solution 5: Make `submit_io` Recover from a Shut-Down Pool
+
+**Approach: Lazy recreation**
+
+```python
+# src/app_controller.py:2275-2284
+def submit_io(self, fn, *args, **kwargs):
+    # Create the lock first so the counter reset below is also protected.
+    if not hasattr(self, "_io_pool_inflight_lock"):
+        self._io_pool_inflight_lock = threading.Lock()
+    if self._is_io_pool_shutdown():
+        # Recreate the pool (it was missing or shut down, e.g., by a GUI crash).
+        self._io_pool = make_io_pool()
+        with self._io_pool_inflight_lock:
+            self._io_pool_inflight = 0
+    with self._io_pool_inflight_lock:
+        self._io_pool_inflight = getattr(self, "_io_pool_inflight", 0) + 1
+    future = self._io_pool.submit(fn, *args, **kwargs)
+    future.add_done_callback(lambda _f: self._io_pool_inflight_done())
+    return future
+
+def _is_io_pool_shutdown(self) -> bool:
+    """True if the io_pool is missing or has been shut down (e.g., via __del__ or controller.shutdown)."""
+    pool = getattr(self, "_io_pool", None)
+    if pool is None:
+        return True
+    # ThreadPoolExecutor has no public "is shut down" query, so probe it:
+    # submit() raises RuntimeError once the pool has been shut down.
+    try:
+        pool.submit(lambda: None)
+        return False
+    except RuntimeError:
+        return True
+```
+
+**Key design choices:**
+1. **The lazy-recreate happens transparently.** Callers don't need to know about pool lifecycle.
+2. **The inflight counter is reset** when the pool is recreated (the old workers are dead).
+3. **The probe (`_is_io_pool_shutdown`) is best-effort.** `ThreadPoolExecutor` has no public shutdown query, so the probe submits a no-op task and treats a `RuntimeError` from `submit()` as "shut down".
+
+**Effort:** 30 minutes.
+
+**Risk:** Low. The pool was already designed to be replaceable (per the existing test infrastructure). The probe is a non-invasive check.
+
+**Confidence in success:** High. Standard pattern for resilient thread pools.
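+
+The failure message and the lazy-recreate behavior can be reproduced with the standard library alone. The sketch below is illustrative only: `LazyPoolOwner` is a hypothetical stand-in for the controller, and it creates a plain `ThreadPoolExecutor` where the real code would call `make_io_pool()`.
+
+```python
+import threading
+from concurrent.futures import ThreadPoolExecutor
+
+
+class LazyPoolOwner:
+    """Hypothetical stand-in for the controller; only pool handling is modeled."""
+
+    def __init__(self):
+        self._lock = threading.Lock()
+        self._pool = ThreadPoolExecutor(max_workers=2)
+        self.recreations = 0
+
+    def _is_shutdown(self):
+        # submit() raises RuntimeError once the pool has been shut down.
+        try:
+            self._pool.submit(lambda: None)
+            return False
+        except RuntimeError:
+            return True
+
+    def submit_io(self, fn, *args, **kwargs):
+        with self._lock:
+            if self._is_shutdown():
+                self._pool = ThreadPoolExecutor(max_workers=2)
+                self.recreations += 1
+            return self._pool.submit(fn, *args, **kwargs)
+
+
+owner = LazyPoolOwner()
+owner._pool.shutdown(wait=True)  # simulate the side effect of the GUI crash
+
+# A direct submit on the dead pool reproduces the error seen in the test log.
+try:
+    owner._pool.submit(lambda: None)
+    error_message = None
+except RuntimeError as exc:
+    error_message = str(exc)
+
+# submit_io() notices the dead pool, recreates it once, and runs the task.
+result = owner.submit_io(lambda x: x * 2, 21).result(timeout=5)
+print(error_message)
+print(result, owner.recreations)
+owner._pool.shutdown(wait=True)
+```
+
+Exposing a counter like `recreations` is one way to keep the recovery observable rather than silent; whether the real controller should expose one is an open design choice.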
+
+### 4.5 Solution 6: Add `/api/gui_health` Endpoint
+
+**Approach: Read-only health endpoint**
+
+```python
+# src/api_hooks.py (new elif branch in do_GET)
+elif self.path == "/api/gui_health":
+    self.send_response(200)
+    self.send_header("Content-Type", "application/json")
+    self.end_headers()
+    controller = _get_app_attr(app, "controller", None)
+    if controller is None:
+        payload = {"healthy": True, "degraded_reason": None, "last_assert": None, "io_pool_alive": True}
+    else:
+        payload = {
+            "healthy": getattr(controller, "_gui_degraded_reason", None) is None,
+            "degraded_reason": getattr(controller, "_gui_degraded_reason", None),
+            "last_assert": getattr(controller, "_last_imgui_assert", None),
+            "io_pool_alive": not controller._is_io_pool_shutdown() if hasattr(controller, "_is_io_pool_shutdown") else True,
+        }
+    self.wfile.write(json.dumps(payload).encode("utf-8"))
+```
+
+**Test integration:**
+
+```python
+# tests/test_live_workflow.py (new pre-check at start)
+def test_full_live_workflow(live_gui) -> None:
+    client = ApiHookClient()
+    assert client.wait_for_server(timeout=10)
+    # New: check GUI health before proceeding
+    health = client.get_gui_health()
+    if not health.get("healthy"):
+        pytest.fail(
+            f"GUI is degraded before test starts: "
+            f"degraded_reason={health.get('degraded_reason')}, "
+            f"last_assert={health.get('last_assert')}"
+        )
+    ...
+```
+
+**Effort:** 1-2 hours.
+
+**Risk:** Low. Read-only endpoint + trivial client method + simple pre-check.
+
+**Confidence in success:** High.
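+
+The client side of the pre-check is not shown above. A minimal sketch of what `get_gui_health` might look like follows, assuming the hook server speaks plain HTTP on localhost. The function signature, `StubHandler`, and the payload values are all hypothetical; the stub server exists only to make the example self-contained.
+
+```python
+import http.client
+import json
+import threading
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+
+# Hypothetical payload standing in for a degraded controller.
+STATE = {
+    "healthy": False,
+    "degraded_reason": "immapp.run raised: Missing End()",
+    "last_assert": None,
+    "io_pool_alive": False,
+}
+
+
+class StubHandler(BaseHTTPRequestHandler):
+    def do_GET(self):
+        if self.path == "/api/gui_health":
+            body = json.dumps(STATE).encode("utf-8")
+            self.send_response(200)
+            self.send_header("Content-Type", "application/json")
+            self.send_header("Content-Length", str(len(body)))
+            self.end_headers()
+            self.wfile.write(body)
+        else:
+            self.send_error(404)
+
+    def log_message(self, *args):
+        pass  # keep the example quiet
+
+
+def get_gui_health(host, port, timeout=1.0):
+    """Fetch and decode the health payload; the short timeout keeps the pre-flight fast."""
+    conn = http.client.HTTPConnection(host, port, timeout=timeout)
+    try:
+        conn.request("GET", "/api/gui_health")
+        return json.loads(conn.getresponse().read().decode("utf-8"))
+    finally:
+        conn.close()
+
+
+server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 picks a free port
+threading.Thread(target=server.serve_forever, daemon=True).start()
+try:
+    health = get_gui_health("127.0.0.1", server.server_address[1])
+finally:
+    server.shutdown()
+    server.server_close()
+
+print(health["healthy"], health["degraded_reason"])
+```
+
+A test would call this once after `wait_for_server` and `pytest.fail` when `healthy` is false, as in the pre-check above.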
+
+---
+
+## 5. Recommended Combination
+
+The user's constraints suggest a **layered defense**:
+
+### Layer 1: Real fix (Tasks 1+2)
+Find and fix the ImGui scope mismatch. This is the right thing to do.
+
+### Layer 2: Safety net (Task 3)
+Wrap `immapp.run` so a future scope mismatch doesn't kill the process. Log the error, mark GUI as degraded.
+
+### Layer 3: Observability (Task 6)
+Expose the degraded state via `/api/gui_health`. Tests can fast-fail with a clear message.
+
+### Layer 4: Resilience (Task 5)
+Make `submit_io` recover from a shut-down pool. Defense in depth.
+
+### Layer 5: Batch isolation (Task 4)
+Per-file (or per-test) fixture scope for `live_gui`. A failed batch doesn't poison the next.
+
+**Recommended PR order:**
+- **PR 1 (highest priority):** Tasks 1+2 (real fix)
+- **PR 2 (in parallel with PR 1):** Tasks 3+6 (wrap + observability) — these don't depend on finding the bug
+- **PR 3:** Task 4 (batch isolation) — independent of the bug, valuable on its own
+- **PR 4 (defense in depth):** Task 5 (lazy pool recreation) — only needed if PR 2 doesn't fully solve the problem
+
+### What's NOT recommended
+
+- **Task 5 alone, without Tasks 1+2+3:** The lazy-recreate hides the bug. The io_pool will keep getting shut down by the GUI crash, then recreated, then shut down again, etc. The test will pass but the GUI is still broken.
+- **Task 4 alone, without the real fix:** Batch isolation is good hygiene, but the test still fails in its own batch. The bug is not fixed.
+- **A silent swallow in Task 3:** The user explicitly rejected this. A silent failure is worse than a visible failure.
+
+---
+
+## 6. Open Questions for the User
+
+Before implementation, these need clarification:
+
+1. **Task 3 wrap behavior:** When the IM_ASSERT fires, should the wrap:
+   a. Just log and return (degraded mode, no further renders)
+   b. Try to continue the render loop (skip the current frame, retry next frame)
+   c. Restart the ImGui context entirely (clear and reinitialize)
+   
+   My recommendation: (a) is the safest. (b) risks infinite loops if the scope is genuinely broken. (c) is invasive.
+
+2. **Task 4 scope:** Per-file or per-test? Per-test is more isolated but needs one subprocess spawn per test rather than one per file (49 files today). Per-file is a middle ground.
+   
+   My recommendation: Per-file for now. Per-test can be added later if needed.
+
+3. **Task 5 proactive recreation:** Should the pool be recreated eagerly (at controller init) or lazily (on first submit after shutdown)?
+   
+   My recommendation: Lazy. The pool is normally alive; recreating eagerly is wasteful.
+
+4. **New state attributes:** `_gui_degraded_reason` and `_last_imgui_assert` — should they be persisted to disk (e.g., for crash analysis) or just in-memory?
+   
+   My recommendation: In-memory. Disk persistence is overkill for transient runtime state.
+
+5. **Compatibility with existing tests:** Some existing tests may assume the io_pool is always available. Will the lazy-recreate break them?
+   
+   My recommendation: No. The lazy-recreate preserves the pool's API surface. The only difference is that the pool may be a fresh instance after a crash. Existing tests don't care about the instance identity.
+
+---
+
+## 7. Effort & Risk Summary
+
+| Solution | Effort | Risk | User-aligned | Recommendation |
+|----------|--------|------|--------------|----------------|
+| 1+2 (audit + fix) | 2-6h | Med | Yes | **PRIORITY 1: implement first** |
+| 3 (wrap) | 1-2h | Low | Yes | **PRIORITY 2: implement in parallel** |
+| 4 (batch isolation) | 2-4h | Low | Yes | **PRIORITY 3: independent, implement anytime** |
+| 5 (lazy recreate) | 0.5h | Low | Yes | **PRIORITY 4: only if 3 doesn't fully solve** |
+| 6 (health endpoint) | 1-2h | Low | Yes | **PRIORITY 2: implement with 3** |
+
+**Total effort (all):** 6.5-14.5 hours
+**Recommended PR sequence:** 1+2 → 3+6 → 4 → 5
+
+---
+
+## 8. References
+
+- `docs/reports/test_full_live_workflow_imgui_assert_20260608.md` — full debugging report
+- `conductor/todos/TODO_test_full_live_workflow_v2.md` — task list
+- `scripts/check_imgui_scopes.py` — existing audit script (use for Task 1)
+- `src/imgui_scopes.py` — ImGuiScope context manager (use for fix in Task 2)
+- `src/api_hooks.py:11` — `ThreadingHTTPServer` (independent of io_pool)
+- `src/gui_2.py:618` — `immapp.run(...)` call site
+- `src/app_controller.py:2282` — `submit_io` line that throws the RuntimeError
+- `conductor/workflow.md` "Known Pitfalls (2026-06-05)" — Defer-Not-Catch Pattern
+- `conductor/workflow.md` "Skip-Marker Policy" — don't add skip markers as workarounds
+- `AGENTS.md` "Critical Anti-Patterns" — no comments, no batch fragility