# Future Track Foundation: Test Infrastructure Hardening (2026-06-08)

**Status:** Foundation document (pre-spec). Goal: outline the broader track that this work belongs to.

**Related:**
- `docs/reports/test_full_live_workflow_imgui_assert_20260608.md` (initial root cause)
- `docs/reports/test_full_live_workflow_propagation_digest_20260608.md` (solutions digest)
- `docs/reports/test_full_live_workflow_progress_20260608_pm.md` (PR1+PR2+PR3 progress)
- `docs/reports/batch_resilience_plan_20260608.md` (batch resilience plan)
- `conductor/todos/TODO_test_full_live_workflow_v2.md` (task list)

---

## What Was Fixed (this session)

1. **PR1 (audit):** `scripts/check_imgui_scopes.py` found 3 false positives. Documented.
2. **PR2 (wrap + health endpoint):** `immapp.run` is now wrapped in try/except. `/api/gui_health` exposes the controller's degraded state. Tests fail fast with clear messages on dirty state.
3. **PR3 (pre-flight check):** `test_full_live_workflow` calls `client.get_gui_health()` at start. Fails fast with actionable message if the GUI is degraded.
4. **PR1 follow-up (real fix):** The actual IM_ASSERT trigger was a `__getattr__` / `__setattr__` interaction bug:
   - `AppController.__getattr__` returned `None` for ANY `ui_` attribute (including ones not in `__init__`)
   - `App.__setattr__` checked `hasattr(self.controller, name)` to route assignments; the controller's buggy `__getattr__` made `hasattr` return True for all `ui_` attrs
   - As a result, the `if not hasattr(app, 'foo'): app.foo = False` pattern in `render_approve_script_modal` failed to initialize the flag
   - `imgui.checkbox` was then called with `None` and raised `TypeError`
   - The TypeError propagated without closing the ImGui modal, leaving the scope stack unbalanced
   - Next frame: IM_ASSERT(Missing End())
5. **Fix:** `AppController.__getattr__` now returns `None` only for an explicit allowlist of `ui_` attrs that ARE defined in `__init__`. For any other missing attribute, it raises `AttributeError`. As defense-in-depth, `App.__getattr__` now also checks `hasattr(controller, name)` before delegating.
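
The interaction in items 4 and 5 can be reproduced in isolation. The following is a minimal sketch, not the project's code: `BuggyController`, `FixedController`, `ensure_flag`, the flag names, and the allowlist contents are hypothetical stand-ins, and the `App.__setattr__` routing layer is left out for brevity. It shows why a `__getattr__` that returns `None` for every `ui_` name makes `hasattr()` true for everything, and how an explicit allowlist restores `AttributeError`.

```python
class BuggyController:
    """Stand-in for the pre-fix behavior: any ui_* lookup 'succeeds'."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        if name.startswith("ui_"):
            return None  # hasattr() now reports True for every ui_* name
        raise AttributeError(name)


class FixedController:
    """Stand-in for the post-fix behavior: only allowlisted names default to None."""

    # Hypothetical allowlist; the real one mirrors attributes set in __init__.
    _UI_NONE_ALLOWLIST = frozenset({"ui_show_modal"})

    def __getattr__(self, name):
        if name in self._UI_NONE_ALLOWLIST:
            return None
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


def ensure_flag(obj, name, default=False):
    """The lazy-init pattern from the render function, generalized."""
    if not hasattr(obj, name):
        setattr(obj, name, default)
    return getattr(obj, name)


# With the buggy controller the default is never assigned, so None leaks out.
buggy_value = ensure_flag(BuggyController(), "ui_never_declared")
# With the fixed controller hasattr() is False, so the default is assigned.
fixed_value = ensure_flag(FixedController(), "ui_never_declared")
```

In this sketch `buggy_value` is the `None` that would later reach a widget call, while `fixed_value` is the intended `False`.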

**Result:** 4 sims + test_live_workflow + 2 markdown tests all pass in 87.80s, with no IM_ASSERT.

---

## What Is Still Open (Future Work)

### 1. Test Infrastructure Audit (the broader track)

The fixes this session addressed ONE bug that was making `test_full_live_workflow` fail. The user asked: "continue with trying to finally cure the test infra with a strong foundation for the future track."

**The broader concern:** The test infrastructure has accumulated complexity and implicit assumptions. The `live_gui` fixture is session-scoped, the controller's state is shared across 49+ tests, and small bugs in `__getattr__` / `__setattr__` cascade into mysterious failures 80 seconds later.

**Recommended track scope:**
- **Test isolation:** Move from session-scoped to per-file (or per-test-with-respawn) live_gui fixture
- **Observability:** Add `/api/gui_health` (done) + structured logging for all state mutations
- **Regression safety:** Audit all `__getattr__` / `__setattr__` / `__init__` for hidden contract assumptions
- **ImGui scope audit:** Make the static `check_imgui_scopes.py` more powerful (handle try/except, control flow, context managers)
- **Defer-not-catch pattern:** Per `conductor/workflow.md` known pitfall, audit all `imgui.*` calls for the "called before ImGui fully initialized" issue

### 2. The `_UI_FLAG_DEFAULTS` allowlist (immediate)

In the fix, I hardcoded an allowlist of `ui_` attrs that can return `None`. This is a maintenance burden: any new `ui_` attr added to `__init__` must also be added to the allowlist, or the test fixture will fail.

**Better fix:** Use a class-level `_UI_FLAG_DEFAULTS` set OR detect them dynamically (e.g., from annotations in `__init__`). The current hardcoded set is fragile.
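
One possible shape for the class-level option is a single table that both `__init__` and `__getattr__` read, plus a helper that reports drift. This is a sketch under assumptions: `ControllerSketch`, `find_allowlist_drift`, and the flag names are hypothetical, and the real controller may need a different structure.

```python
class ControllerSketch:
    # Single source of truth (hypothetical flag names and defaults).
    _UI_FLAG_DEFAULTS = {"ui_show_modal": False, "ui_auto_scroll": True}

    def __init__(self):
        # Every declared flag is initialized from the table.
        for name, default in self._UI_FLAG_DEFAULTS.items():
            setattr(self, name, default)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. before __init__ has run.
        if name in type(self)._UI_FLAG_DEFAULTS:
            return None
        raise AttributeError(name)


def find_allowlist_drift(controller):
    """Return ui_* instance attributes that are missing from _UI_FLAG_DEFAULTS."""
    declared = type(controller)._UI_FLAG_DEFAULTS
    return sorted(
        name
        for name in vars(controller)
        if name.startswith("ui_") and name not in declared
    )


controller = ControllerSketch()
controller.ui_new_panel = True  # simulates a flag added outside the table
drift = find_allowlist_drift(controller)
```

A unit test could assert that `find_allowlist_drift` returns an empty list for a freshly constructed controller. On the annotation-based alternative: annotations written on `self.x` inside `__init__` are not recorded at runtime, so dynamic detection would need class-level annotations instead.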

### 3. The `_handle_reset_session` and other state-clearing paths

The `AppController._handle_reset_session` clears many fields but not all. Tests that share state via the session-scoped fixture can carry over state from one test to the next. A future track should audit and complete the reset logic.

### 4. Per-test or per-file `live_gui` fixture scope

Per `docs/reports/batch_resilience_plan_20260608.md`, the recommended approach is one of:
- Make the fixture per-file scoped (heavy but simple)
- Add a lazy re-spawn wrapper (lighter but more complex)
- Add a per-test autouse health check (lightest, but doesn't recover from subprocess death)

The right answer depends on whether tests need cross-file state. The current 49+ live_gui tests should be audited for cross-file dependencies.
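
The lazy re-spawn option can be sketched without any test-framework code. `RespawningProcess` and its methods are hypothetical names used only for illustration; a fixture would presumably call `ensure_alive()` before each test and `stop()` at teardown.

```python
import subprocess
import sys


class RespawningProcess:
    """Hold a subprocess and restart it on demand if it has exited."""

    def __init__(self, argv):
        self._argv = list(argv)
        self._proc = None
        self.spawn_count = 0

    def ensure_alive(self):
        """Return a running process, spawning a fresh one if needed."""
        if self._proc is None or self._proc.poll() is not None:
            self._proc = subprocess.Popen(self._argv)
            self.spawn_count += 1
        return self._proc

    def stop(self):
        """Kill the process if it is still running and forget it."""
        if self._proc is not None and self._proc.poll() is None:
            self._proc.kill()
            self._proc.wait()
        self._proc = None
```

This only detects a dead subprocess. A process that is alive but degraded would still need the `/api/gui_health` check before deciding to respawn.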

### 5. The `live_gui` subprocess lifecycle

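The subprocess is killed via `taskkill /F /T` (force-kill of the whole process tree). This is correct for production but means the subprocess can't clean up. A graceful shutdown signal (e.g., `os.kill(pid, signal.CTRL_C_EVENT)` to trigger the SIGINT handler) would allow clean teardown and better diagnostic output on the next session. Note that on Windows `CTRL_C_EVENT` cannot be targeted at a single process group, so `CTRL_BREAK_EVENT` with a child started under `CREATE_NEW_PROCESS_GROUP` may be the more reliable option.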
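
A graceful-then-forced teardown might look like the sketch below. `stop_gracefully` is a hypothetical helper, not existing project code. It assumes the child was started with `creationflags=subprocess.CREATE_NEW_PROCESS_GROUP` on Windows and that the child handles `SIGBREAK` there (or `SIGINT` on POSIX) to clean up.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, timeout=5.0):
    """Ask proc to exit, then force-kill it if it is still running after timeout.

    Returns True if the process exited without needing the force-kill.
    """
    if proc.poll() is not None:
        return True  # already exited
    if sys.platform == "win32":
        # Needs CREATE_NEW_PROCESS_GROUP at spawn time; arrives as SIGBREAK.
        proc.send_signal(signal.CTRL_BREAK_EVENT)
    else:
        proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Last resort. Unlike `taskkill /T`, this does not kill grandchildren.
        proc.kill()
        proc.wait()
        return False
```

If the GUI spawns its own children, the existing `taskkill /F /T` call could remain as the fallback in place of `proc.kill()`.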

### 6. Documentation: the `__getattr__` / `__setattr__` contract

The fix in this session was possible because I read the `__getattr__` code, but the `__getattr__` / `__setattr__` pair is a non-obvious contract. The docstring should explicitly state:
- Which attributes are delegated to the controller
- What `hasattr()` should return for each
- The interaction with `setattr()`

A future track should add explicit tests for the delegation contract, perhaps via property descriptors.

---

## Proposed Track Name

`test_infra_hardening_20260608` (or similar)

## Proposed Track Phases

### Phase 1: Audit (1-2 days)
- Catalog all `__getattr__` / `__setattr__` in the codebase
- Document the implicit contracts
- Identify other "silent failure" patterns (where a bug manifests 80s later in a different subsystem)

### Phase 2: Refactor the `_UI_FLAG_DEFAULTS` (1 day)
- Move the hardcoded set to a class-level attribute
- OR detect from `__init__` annotations
- Add unit test that catches missing entries

### Phase 3: live_gui fixture scope change (1-2 days)
- Audit all live_gui tests for cross-file state dependencies
- Change `live_gui` from session-scoped to per-file (or per-test-with-respawn)
- Add metrics for the cost (slowdown)

### Phase 4: Improve check_imgui_scopes.py (2-3 days)
- Add support for try/except patterns
- Add support for control flow analysis
- Add a "render function entry/exit" tracking mode that runs the GUI for a frame and reports unbalanced scopes
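
The runtime tracking idea can be illustrated with a pure-Python stand-in that makes no ImGui calls. `ScopeTracker`, `render_unguarded`, and `render_guarded` are hypothetical names; the sketch only demonstrates how an exception between a begin and its matching end leaves the stack unbalanced, and how a `try`/`finally` wrapper prevents that.

```python
from contextlib import contextmanager


class ScopeTracker:
    """Record begin/end pairs so an unbalanced frame can be reported."""

    def __init__(self):
        self._stack = []

    def begin(self, label):
        self._stack.append(label)

    def end(self, label):
        if not self._stack or self._stack[-1] != label:
            raise RuntimeError(f"end({label!r}) does not match stack {self._stack}")
        self._stack.pop()

    @contextmanager
    def scope(self, label):
        """Balanced variant: end() runs even if the body raises."""
        self.begin(label)
        try:
            yield
        finally:
            self.end(label)

    def unbalanced(self):
        """Labels still open; a non-empty list at frame end means a missing end."""
        return list(self._stack)


def render_unguarded(tracker, checkbox_value):
    tracker.begin("approve_script_modal")
    if checkbox_value is None:
        raise TypeError("checkbox value must be bool, got None")
    tracker.end("approve_script_modal")  # skipped when the TypeError fires


def render_guarded(tracker, checkbox_value):
    with tracker.scope("approve_script_modal"):
        if checkbox_value is None:
            raise TypeError("checkbox value must be bool, got None")
```

Calling `unbalanced()` after each render function returns would point at the function that leaked a scope, rather than surfacing as an assertion on the next frame.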

### Phase 5: Documentation and runbooks (1 day)
- Document the defer-not-catch pattern in a code style guide
- Add a runbook for "the live_gui test failed: what to check"
- Update the `docs/reports/` to reflect the new infrastructure

---

## Why This Track Is Worth Doing

The bug fixed in this session was a deep interaction: four layers of cause (1-4) followed by three steps of fallout (5-7):
1. `__getattr__` returning None (the wrong default)
2. `hasattr()` returning True because of (1)
3. `__setattr__` routing the assignment to the wrong place because of (2)
4. `imgui.checkbox` getting None because of (3)
5. The TypeError propagating without proper cleanup
6. The ImGui scope stack being unbalanced
7. The next frame triggering IM_ASSERT

This is a fragility that will recur. The track prevents future bugs of this shape by:
- Making the contracts explicit (Phase 1)
- Eliminating the silent-failure pattern (Phase 2)
- Reducing the state surface shared between tests (Phase 3)
- Improving the static audit to catch scope issues early (Phase 4)

---

## Related Commits

- `bcdc26d0` (this session): The actual fix (`__getattr__` allowlist)
- `51ecace4` (this session): PR3 pre-flight health check + planning docs
- `1c565da7` (this session): PR2 wrap + health endpoint
- `c9a991bb` (this session): timeout bump
- `4a338486` (this session): io_pool 4→8
- `87d7c5bf` (this session): io_pool test assertion