# Future Track Foundation: Test Infrastructure Hardening (2026-06-08) **Status:** Foundation document (pre-spec). Goal: outline the broader track that this work belongs to. **Related:** - `docs/reports/test_full_live_workflow_imgui_assert_20260608.md` (initial root cause) - `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` (solutions digest) - `docs/reports/test_full_live_workflow_progress_20260608_pm.md` (PR1+PR2+PR3 progress) - `docs/reports/batch_resilience_plan_20260608.md` (batch resilience plan) - `conductor/todos/TODO_test_full_live_workflow_v2.md` (task list) --- ## What Was Fixed (this session) 1. **PR1 (audit):** `scripts/check_imgui_scopes.py` found 3 false positives. Documented. 2. **PR2 (wrap + health endpoint):** `immapp.run` is now wrapped in try/except. `/api/gui_health` exposes the controller's degraded state. Tests fail fast with clear messages on dirty state. 3. **PR3 (pre-flight check):** `test_full_live_workflow` calls `client.get_gui_health()` at start. Fails fast with actionable message if the GUI is degraded. 4. **PR1 follow-up (real fix):** The actual IM_ASSERT trigger was a double `__getattr__` bug: - `AppController.__getattr__` returned `None` for ANY `ui_` attribute (including ones not in `__init__`) - `App.__setattr__` checked `hasattr(self.controller, name)` to route assignments; the controller's buggy `__getattr__` made `hasattr` return True for all `ui_` attrs - The `if not hasattr(app, 'foo'): app.foo = False` pattern in `render_approve_script_modal` failed to initialize - `imgui.checkbox` was called with `None`, raised TypeError - The TypeError propagated without closing the ImGui modal, leaving the scope stack unbalanced - Next frame: IM_ASSERT(Missing End()) 5. **Fix:** `AppController.__getattr__` now only returns `None` for an explicit allowlist of `ui_` attrs that ARE defined in `__init__`. For any other missing attribute, raises `AttributeError`. Also added defense-in-depth in `App.__getattr__` to check `hasattr(controller, name)` before delegating. **Result:** 4 sims + test_live_workflow + 2 markdown tests all pass in 87.80s. No IM_ASSERT. The test passes cleanly. --- ## What Is Still Open (Future Work) ### 1. Test Infrastructure Audit (the broader track) The fixes this session addressed ONE bug that was making `test_full_live_workflow` fail. The user asked: "continue with trying to finally cure the test infra with a strong foundation for the future track." **The broader concern:** The test infrastructure has accumulated complexity and implicit assumptions. The `live_gui` fixture is session-scoped, the controller's state is shared across 49+ tests, and small bugs in `__getattr__` / `__setattr__` cascade into mysterious failures 80 seconds later. **Recommended track scope:** - **Test isolation:** Move from session-scoped to per-file (or per-test-with-respawn) live_gui fixture - **Observability:** Add `/api/gui_health` (done) + structured logging for all state mutations - **Regression safety:** Audit all `__getattr__` / `__setattr__` / `__init__` for hidden contract assumptions - **ImGui scope audit:** Make the static `check_imgui_scopes.py` more powerful (handle try/except, control flow, context managers) - **Defer-not-catch pattern:** Per `conductor/workflow.md` known pitfall, audit all `imgui.*` calls for the "called before ImGui fully initialized" issue ### 2. The `_UI_FLAG_DEFAULTS` allowlist (immediate) In the fix, I hardcoded an allowlist of `ui_` attrs that can return `None`. This is a maintenance burden — new `ui_` attrs added to `__init__` must also be added to this allowlist, or the test fixture will fail. **Better fix:** Use a class-level `_UI_FLAG_DEFAULTS` set OR detect them dynamically (e.g., from annotations in `__init__`). The current hardcoded set is fragile. ### 3. The `_handle_reset_session` and other state-clearing paths The `AppController._handle_reset_session` clears many fields but not all. Tests that share state via the session-scoped fixture can carry over state from one test to the next. A future track should audit and complete the reset logic. ### 4. Per-test or per-file `live_gui` fixture scope Per the `docs/reports/batch_resilience_plan_20260608.md`, the recommended approach is to either: - Make the fixture per-file scoped (heavy but simple) - Add a lazy re-spawn wrapper (lighter but more complex) - Add a per-test autouse health check (lightest, but doesn't recover from subprocess death) The right answer depends on whether tests need cross-file state. The current 49+ live_gui tests should be audited for cross-file dependencies. ### 5. The `live_gui` subprocess lifecycle The subprocess is killed via `taskkill /F /T` (force-kill). This is correct for production but means the subprocess can't clean up. A graceful shutdown signal (e.g., `os.kill(pid, signal.CTRL_C_EVENT)` to trigger the SIGINT handler) would allow clean teardown and better diagnostic output on the next session. ### 6. Documentation: the `__getattr__` / `__setattr__` contract The fix in this session was possible because I read the `__getattr__` code. But the `__getattr__` / `__setattr__` pair is a non-obvious contract. The docstring should explicitly state: - Which attributes are delegated to the controller - What `hasattr()` should return for each - The interaction with `setattr()` A future track should add explicit tests for the delegation contract, perhaps via property descriptors. --- ## Proposed Track Name `test_infra_hardening_20260608` (or similar) ## Proposed Track Phases ### Phase 1: Audit (1-2 days) - Catalog all `__getattr__` / `__setattr__` in the codebase - Document the implicit contracts - Identify other "silent failure" patterns (where a bug manifests 80s later in a different subsystem) ### Phase 2: Refactor the `_UI_FLAG_DEFAULTS` (1 day) - Move the hardcoded set to a class-level attribute - OR detect from `__init__` annotations - Add unit test that catches missing entries ### Phase 3: live_gui fixture scope change (1-2 days) - Audit all live_gui tests for cross-file state dependencies - Change `live_gui` from session-scoped to per-file (or per-test-with-respawn) - Add metrics for the cost (slowdown) ### Phase 4: Improve check_imgui_scopes.py (2-3 days) - Add support for try/except patterns - Add support for control flow analysis - Add a "render function entry/exit" tracking mode that runs the GUI for a frame and reports unbalanced scopes ### Phase 5: Documentation and runbooks (1 day) - Document the deferred-not-catch pattern in a code style guide - Add a runbook for "the live_gui test failed — what to check" - Update the `docs/reports/` to reflect the new infrastructure --- ## Why This Track Is Worth Doing The bug fixed in this session was a 4-layer deep interaction: 1. `__getattr__` returning None (the wrong default) 2. `hasattr()` returning True because of (1) 3. `__setattr__` routing the assignment to the wrong place because of (2) 4. `imgui.checkbox` getting None because of (3) 5. The TypeError propagating without proper cleanup 6. The ImGui scope stack being unbalanced 7. The next frame triggering IM_ASSERT This is a fragility that will recur. The track prevents future bugs of this shape by: - Making the contracts explicit (Phase 1) - Eliminating the silent-failure pattern (Phase 2) - Reducing the state surface shared between tests (Phase 3) - Improving the static audit to catch scope issues early (Phase 4) --- ## Related Commits - `bcdc26d0` (this session): The actual fix — `__getattr__` allowlist - `51ecace4` (this session): PR3 pre-flight health check + planning docs - `1c565da7` (this session): PR2 wrap + health endpoint - `c9a991bb` (this session): timeout bump - `4a338486` (this session): io_pool 4→8 - `87d7c5bf` (this session): io_pool test assertion