Foundation document for the future test_infra_hardening track, which will address session-scoped live_gui fixture isolation, silent __getattr__/__setattr__ contract assumptions, and similar test infrastructure fragility. It also documents the test_rag_phase4_final_verify batch failure that surfaces once the __getattr__ fix unblocks test_full_live_workflow. The RAG test failure is NOT a regression: it reproduces on pre-fix HEAD too. It is a pre-existing test isolation issue (the live_gui fixture is session-scoped, so state from the 4 sims pollutes the controller).
Future Track Foundation: Test Infrastructure Hardening (2026-06-08)
Status: Foundation document (pre-spec). Goal: outline the broader track that this work belongs to.
Related:
- docs/reports/test_full_live_workflow_imgui_assert_20260608.md (initial root cause)
- docs/reports/test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- docs/reports/test_full_live_workflow_progress_20260608_pm.md (PR1+PR2+PR3 progress)
- docs/reports/batch_resilience_plan_20260608.md (batch resilience plan)
- conductor/todos/TODO_test_full_live_workflow_v2.md (task list)
What Was Fixed (this session)
- PR1 (audit): scripts/check_imgui_scopes.py found 3 false positives. Documented.
- PR2 (wrap + health endpoint): immapp.run is now wrapped in try/except. /api/gui_health exposes the controller's degraded state. Tests fail fast with clear messages on dirty state.
- PR3 (pre-flight check): test_full_live_workflow calls client.get_gui_health() at start and fails fast with an actionable message if the GUI is degraded.
- PR1 follow-up (real fix): The actual IM_ASSERT trigger was a double __getattr__ bug:
  - AppController.__getattr__ returned None for ANY ui_ attribute (including ones not defined in __init__).
  - App.__setattr__ checked hasattr(self.controller, name) to route assignments; the controller's buggy __getattr__ made hasattr return True for all ui_ attrs.
  - As a result, the "if not hasattr(app, 'foo'): app.foo = False" pattern in render_approve_script_modal failed to initialize the flag.
  - imgui.checkbox was called with None and raised TypeError.
  - The TypeError propagated without closing the ImGui modal, leaving the scope stack unbalanced.
  - Next frame: IM_ASSERT(Missing End()).
- Fix: AppController.__getattr__ now returns None only for an explicit allowlist of ui_ attrs that ARE defined in __init__; for any other missing attribute it raises AttributeError. Also added defense-in-depth in App.__getattr__ to check hasattr(controller, name) before delegating.
Result: 4 sims + test_live_workflow + 2 markdown tests all pass in 87.80s. No IM_ASSERT. The test passes cleanly.
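The failure chain above can be reproduced without ImGui. The sketch below is a simplified model, not the project's code: BuggyController mimics the old __getattr__, FixedController mimics the allowlist fix, App includes the defense-in-depth check, and ui_known_flag / ui_approve_flag are made-up names. The hypothetical init_flag helper stands in for the lazy-initialization pattern in render_approve_script_modal.

```python
class BuggyController:
    """Old behaviour (simplified): any missing ui_ attribute reads as None."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("ui_"):
            return None
        raise AttributeError(name)


class FixedController:
    """New behaviour (simplified): only an explicit allowlist falls back to None."""

    _UI_ALLOWLIST = frozenset({"ui_known_flag"})  # hypothetical entry

    def __init__(self):
        self.ui_known_flag = False

    def __getattr__(self, name):
        if name in type(self)._UI_ALLOWLIST:
            return None
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")


class App:
    def __init__(self, controller):
        # Bypass our own __setattr__ while wiring up the controller.
        object.__setattr__(self, "controller", controller)

    def __getattr__(self, name):
        # Defense in depth: only delegate names the controller reports having.
        controller = object.__getattribute__(self, "controller")
        if hasattr(controller, name):
            return getattr(controller, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Route the assignment to the controller if it "has" the attribute.
        if hasattr(self.controller, name):
            setattr(self.controller, name, value)
        else:
            object.__setattr__(self, name, value)


def init_flag(app, name):
    # Hypothetical stand-in for: if not hasattr(app, 'foo'): app.foo = False
    if not hasattr(app, name):
        setattr(app, name, False)
    return getattr(app, name)


print(init_flag(App(BuggyController()), "ui_approve_flag"))  # None
print(init_flag(App(FixedController()), "ui_approve_flag"))  # False
```

With BuggyController the helper returns None, the value that reached imgui.checkbox. App's own hasattr check does not help in that case, because the buggy controller reports every ui_ name as present; the controller-side fix is what makes hasattr() honest again.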
What Is Still Open (Future Work)
1. Test Infrastructure Audit (the broader track)
The fixes this session addressed ONE bug that was making test_full_live_workflow fail. The user asked: "continue with trying to finally cure the test infra with a strong foundation for the future track."
The broader concern: The test infrastructure has accumulated complexity and implicit assumptions. The live_gui fixture is session-scoped, the controller's state is shared across 49+ tests, and small bugs in __getattr__ / __setattr__ cascade into mysterious failures 80 seconds later.
Recommended track scope:
- Test isolation: Move from session-scoped to per-file (or per-test-with-respawn) live_gui fixture
- Observability: Add /api/gui_health (done) + structured logging for all state mutations
- Regression safety: Audit all __getattr__/__setattr__/__init__ for hidden contract assumptions
- ImGui scope audit: Make the static check_imgui_scopes.py more powerful (handle try/except, control flow, context managers)
- Defer-not-catch pattern: Per the known pitfall in conductor/workflow.md, audit all imgui.* calls for the "called before ImGui fully initialized" issue
2. The _UI_FLAG_DEFAULTS allowlist (immediate)
In the fix, I hardcoded an allowlist of ui_ attrs that can return None. This is a maintenance burden — new ui_ attrs added to __init__ must also be added to this allowlist, or the test fixture will fail.
Better fix: Use a class-level _UI_FLAG_DEFAULTS set OR detect them dynamically (e.g., from annotations in __init__). The current hardcoded set is fragile.
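A minimal sketch of the class-level variant, assuming hypothetical flag names: one table drives both __init__ and __getattr__, so a flag can no longer be initialized without also being allowlisted. Unlike the current fix, this version returns each flag's declared default rather than None; that is a design choice for illustration, not what the session's commit does.

```python
from typing import Any


class AppController:
    # Hypothetical flag names; the real set would mirror the app's ui_ state.
    _UI_FLAG_DEFAULTS: dict[str, Any] = {
        "ui_show_approve_modal": False,
        "ui_auto_scroll": True,
        "ui_filter_text": "",
    }

    def __init__(self) -> None:
        # Every declared flag is initialized from the table, so the table
        # and __init__ cannot drift apart.
        for name, default in self._UI_FLAG_DEFAULTS.items():
            setattr(self, name, default)

    def __getattr__(self, name: str) -> Any:
        # Reached only when normal lookup fails, e.g. the attribute was
        # deleted or is read before __init__ has run.
        defaults = type(self)._UI_FLAG_DEFAULTS
        if name in defaults:
            return defaults[name]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )
```

Defaults in the table should stay immutable (bool, str, None), because the same default object is assigned to every instance.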
3. The _handle_reset_session and other state-clearing paths
The AppController._handle_reset_session clears many fields but not all. Tests that share state via the session-scoped fixture can carry over state from one test to the next. A future track should audit and complete the reset logic.
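One cheap way to find fields the reset misses is to compare a freshly constructed object with one that was dirtied and then reset. The sketch assumes state lives in plain instance attributes (so vars() sees it); fields_not_reset is a hypothetical helper, and ToyController / reset_session are hypothetical stand-ins for AppController and _handle_reset_session.

```python
from typing import Any, Callable

_MISSING = object()  # sentinel for "field absent on one side"


def fields_not_reset(
    make: Callable[[], Any],
    dirty: Callable[[Any], None],
    reset: Callable[[Any], None],
    ignore: frozenset = frozenset(),
) -> dict:
    """Return {field: (fresh_value, value_after_reset)} for every field
    that reset() does not restore to its freshly constructed value."""
    fresh = vars(make())
    obj = make()
    dirty(obj)
    reset(obj)
    after = vars(obj)
    leftovers = {}
    for name in sorted(fresh.keys() | after.keys()):
        if name in ignore:
            continue
        before_val = fresh.get(name, _MISSING)
        after_val = after.get(name, _MISSING)
        if before_val != after_val:
            leftovers[name] = (before_val, after_val)
    return leftovers


class ToyController:
    """Hypothetical controller with a deliberately incomplete reset."""

    def __init__(self):
        self.history = []
        self.ui_show_approve_modal = False
        self.pending_script = None

    def reset_session(self):
        self.history = []
        self.ui_show_approve_modal = False
        # Bug on purpose: pending_script is not cleared.


def dirty(c):
    c.history.append("hello")
    c.ui_show_approve_modal = True
    c.pending_script = "echo hi"


print(fields_not_reset(ToyController, dirty, ToyController.reset_session))
# {'pending_script': (None, 'echo hi')}
```

Fields that hold resources (locks, thread pools, sockets) compare by identity and will always differ, so they belong in ignore.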
4. Per-test or per-file live_gui fixture scope
Per docs/reports/batch_resilience_plan_20260608.md, the recommended approach is one of:
- Make the fixture per-file scoped (heavy but simple)
- Add a lazy re-spawn wrapper (lighter but more complex)
- Add a per-test autouse health check (lightest, but doesn't recover from subprocess death)
The right answer depends on whether tests need cross-file state. The current 49+ live_gui tests should be audited for cross-file dependencies.
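The lazy re-spawn option could look roughly like this: a handle whose ensure_alive() restarts the subprocess if it has exited, so one crash no longer poisons every later test. LazyRespawnProcess, wait_ready, and the placeholder command are assumptions for illustration; a real version would launch the GUI and wait for /api/gui_health before returning.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


class LazyRespawnProcess:
    """Hypothetical wrapper: (re)start a subprocess on demand."""

    def __init__(self, cmd: Sequence[str],
                 wait_ready: Callable[[subprocess.Popen], None] = lambda p: None):
        self._cmd = list(cmd)
        self._wait_ready = wait_ready
        self._proc: Optional[subprocess.Popen] = None
        self.spawn_count = 0

    def ensure_alive(self) -> subprocess.Popen:
        # poll() returns None while the process is still running.
        if self._proc is None or self._proc.poll() is not None:
            self._proc = subprocess.Popen(self._cmd)
            self.spawn_count += 1
            self._wait_ready(self._proc)  # e.g. poll a health endpoint
        return self._proc

    def close(self, timeout: float = 10.0) -> None:
        if self._proc is not None and self._proc.poll() is None:
            self._proc.terminate()
            try:
                self._proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self._proc.kill()
                self._proc.wait()
        self._proc = None


gui = LazyRespawnProcess([sys.executable, "-c", "import time; time.sleep(30)"])
first = gui.ensure_alive()
first.kill()
first.wait()                 # simulate a crashed GUI
second = gui.ensure_alive()  # transparently restarted
print(gui.spawn_count)       # 2
gui.close()
```

In pytest, a session-scoped fixture would return the wrapper and a function-scoped autouse fixture would call ensure_alive() before each test. A respawn gives a clean process but also discards any state a later test silently relied on, which is why the cross-file audit comes first.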
5. The live_gui subprocess lifecycle
The subprocess is killed via taskkill /F /T (force-kill). This is correct for production but means the subprocess can't clean up. A graceful shutdown signal (e.g., os.kill(pid, signal.CTRL_C_EVENT) to trigger the SIGINT handler) would allow clean teardown and better diagnostic output on the next session.
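A graceful-then-forceful teardown might look like the sketch below; spawn_gui, stop_gui, and the 10-second timeout are assumptions. One caveat worth checking before adopting the CTRL_C_EVENT idea: Python's subprocess documentation says CTRL_C_EVENT and CTRL_BREAK_EVENT can only be sent to a child started with CREATE_NEW_PROCESS_GROUP, and Windows disables Ctrl+C for processes in such a new group, so CTRL_BREAK_EVENT is usually the event that actually arrives. The GUI would then need a handler for signal.SIGBREAK rather than relying on its SIGINT handler.

```python
import signal
import subprocess
import sys
from typing import Sequence

IS_WINDOWS = sys.platform == "win32"


def spawn_gui(cmd: Sequence[str]) -> subprocess.Popen:
    # On Windows, send_signal() can only deliver CTRL_BREAK_EVENT to a child
    # that was started in its own process group.
    flags = subprocess.CREATE_NEW_PROCESS_GROUP if IS_WINDOWS else 0
    return subprocess.Popen(list(cmd), creationflags=flags)


def stop_gui(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask the child to shut down cleanly; force-kill only as a last resort."""
    if proc.poll() is not None:
        return proc.returncode
    # Windows: the child should handle signal.SIGBREAK. POSIX: SIGINT, which a
    # Python child sees as KeyboardInterrupt.
    proc.send_signal(signal.CTRL_BREAK_EVENT if IS_WINDOWS else signal.SIGINT)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Similar in spirit to taskkill /F, but kill() only ends the direct
        # child, not the whole process tree like taskkill /T.
        proc.kill()
        return proc.wait()
```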
6. Documentation: the __getattr__ / __setattr__ contract
The fix in this session was possible because I read the __getattr__ code. But the __getattr__ / __setattr__ pair is a non-obvious contract. The docstring should explicitly state:
- Which attributes are delegated to the controller
- What hasattr() should return for each
- The interaction with setattr()
A future track should add explicit tests for the delegation contract, perhaps via property descriptors.
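Property-style descriptors would turn the delegation contract into something declared and inspectable. A minimal sketch with hypothetical Controller/App classes and a made-up ui_auto_scroll flag: only names declared with the Delegated descriptor are forwarded, so hasattr() is False for everything else and App needs no __getattr__ or __setattr__ at all.

```python
class Delegated:
    """Data descriptor forwarding one attribute from App to App.controller."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, e.g. App.ui_auto_scroll
        return getattr(obj.controller, self.name)

    def __set__(self, obj, value):
        setattr(obj.controller, self.name, value)


class Controller:
    def __init__(self):
        self.ui_auto_scroll = True


class App:
    ui_auto_scroll = Delegated()  # the contract, written down in one place

    def __init__(self, controller):
        self.controller = controller

    @classmethod
    def delegated_names(cls):
        return sorted(n for n, v in vars(cls).items() if isinstance(v, Delegated))
```

A contract test can then loop over App.delegated_names() and check that each name resolves on both App and its controller, while undeclared names stay local to App.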
Proposed Track Name
test_infra_hardening_20260608 (or similar)
Proposed Track Phases
Phase 1: Audit (1-2 days)
- Catalog all __getattr__/__setattr__ in the codebase
- Document the implicit contracts
- Identify other "silent failure" patterns (where a bug manifests 80s later in a different subsystem)
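The cataloging step could start as a small AST scan. This sketch reports every class that overrides an attribute-access hook, plus a module-level __getattr__ (PEP 562), which carries the same kind of hidden contract. The hook set, the find_attribute_hooks name, and the gui_app.py filename are assumptions; a real run would feed in the text of each .py file in the repository.

```python
import ast

ATTRIBUTE_HOOKS = {"__getattr__", "__getattribute__", "__setattr__", "__delattr__"}
FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def find_attribute_hooks(source: str, filename: str = "<string>"):
    """Return (filename, owner, hook_name, line) tuples sorted by line."""
    tree = ast.parse(source, filename=filename)
    hits = []
    # PEP 562: a module-level __getattr__ is also a hidden contract.
    for item in tree.body:
        if isinstance(item, FUNC_TYPES) and item.name == "__getattr__":
            hits.append((filename, "<module>", item.name, item.lineno))
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, FUNC_TYPES) and item.name in ATTRIBUTE_HOOKS:
                    hits.append((filename, node.name, item.name, item.lineno))
    return sorted(hits, key=lambda hit: hit[3])


SAMPLE = '''
class AppController:
    def __getattr__(self, name):
        raise AttributeError(name)

class App:
    def __getattr__(self, name): ...
    def __setattr__(self, name, value): ...
'''

for hit in find_attribute_hooks(SAMPLE, "gui_app.py"):
    print(hit)
```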
Phase 2: Refactor the _UI_FLAG_DEFAULTS (1 day)
- Move the hardcoded set to a class-level attribute
- OR detect from __init__ annotations
- Add unit test that catches missing entries
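For the "detect from __init__" route, note that Python does not retain annotations written on attribute targets inside a method (self.ui_x: bool = False), so runtime introspection will not find them; parsing the source is more dependable. The sketch below collects every self.ui_* name assigned in __init__ (the ui_ prefix convention and the function name are assumptions), which reduces the missing-entries unit test to a set difference.

```python
import ast
import textwrap


def ui_attrs_assigned_in_init(class_source: str, prefix: str = "ui_") -> set:
    """Names assigned as self.<prefix>* anywhere inside a class's __init__."""
    tree = ast.parse(textwrap.dedent(class_source))
    found = set()
    for cls in (n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)):
        for fn in cls.body:
            if not (isinstance(fn, ast.FunctionDef) and fn.name == "__init__"):
                continue
            for node in ast.walk(fn):
                if isinstance(node, ast.Assign):
                    targets = node.targets
                elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
                    targets = [node.target]
                else:
                    continue
                for target in targets:
                    # Unpack tuple targets such as: self.ui_a, self.ui_b = 1, 2
                    elts = target.elts if isinstance(target, (ast.Tuple, ast.List)) else [target]
                    for elt in elts:
                        if (isinstance(elt, ast.Attribute)
                                and isinstance(elt.value, ast.Name)
                                and elt.value.id == "self"
                                and elt.attr.startswith(prefix)):
                            found.add(elt.attr)
    return found


SAMPLE = '''
class AppController:
    def __init__(self):
        self.ui_show_approve_modal = False
        self.ui_auto_scroll: bool = True
        self.ui_a, self.ui_b = 1, 2
        self.history = []
'''
ALLOWLIST = {"ui_show_approve_modal", "ui_auto_scroll"}  # hypothetical
missing = ui_attrs_assigned_in_init(SAMPLE) - ALLOWLIST
print(sorted(missing))  # ['ui_a', 'ui_b']
```

In the real test, class_source would come from inspect.getsource(AppController) and the allowlist from the class attribute; asserting that missing is empty fails as soon as a new ui_ flag skips the allowlist.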
Phase 3: live_gui fixture scope change (1-2 days)
- Audit all live_gui tests for cross-file state dependencies
- Change live_gui from session-scoped to per-file (or per-test-with-respawn)
- Add metrics for the cost (slowdown)
Phase 4: Improve check_imgui_scopes.py (2-3 days)
- Add support for try/except patterns
- Add support for control flow analysis
- Add a "render function entry/exit" tracking mode that runs the GUI for a frame and reports unbalanced scopes
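The frame-level tracking mode could be prototyped without ImGui: record every begin/end the render code issues during one frame and report what is still open when it returns. ScopeTracker, run_frame, and popup_scope are hypothetical; in a real harness they would wrap the actual imgui begin/end calls. popup_scope closes its scope in a finally block, which would likely have kept the TypeError in render_approve_script_modal from leaving the modal open.

```python
from contextlib import contextmanager


class ScopeTracker:
    """Hypothetical stand-in for ImGui's begin/end scope stack."""

    def __init__(self):
        self.stack = []
        self.errors = []

    def begin(self, name: str) -> None:
        self.stack.append(name)

    def end(self, name: str) -> None:
        if not self.stack or self.stack[-1] != name:
            raise RuntimeError(f"end({name!r}) does not match open scopes {self.stack}")
        self.stack.pop()

    def run_frame(self, render) -> list:
        """Run one render callback and return the scopes it left open."""
        depth = len(self.stack)
        try:
            render(self)
        except Exception as exc:
            # Record the error instead of crashing the audit run.
            self.errors.append(exc)
        leaked = self.stack[depth:]
        del self.stack[depth:]  # rebalance so the next frame starts clean
        return leaked


@contextmanager
def popup_scope(ui: ScopeTracker, name: str):
    ui.begin(name)
    try:
        yield
    finally:
        ui.end(name)  # runs even if the body raises


def render_unsafe(ui):
    ui.begin("approve_script")
    raise TypeError("checkbox got None")  # end() is never reached


def render_safe(ui):
    with popup_scope(ui, "approve_script"):
        raise TypeError("checkbox got None")


tracker = ScopeTracker()
print(tracker.run_frame(render_unsafe))  # ['approve_script']
print(tracker.run_frame(render_safe))    # []
```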
Phase 5: Documentation and runbooks (1 day)
- Document the deferred-not-catch pattern in a code style guide
- Add a runbook for "the live_gui test failed — what to check"
- Update docs/reports/ to reflect the new infrastructure
Why This Track Is Worth Doing
The bug fixed in this session was a 4-layer deep interaction:
- (1) __getattr__ returning None (the wrong default)
- (2) hasattr() returning True because of (1)
- (3) __setattr__ routing the assignment to the wrong place because of (2)
- (4) imgui.checkbox getting None because of (3)
- (5) The TypeError propagating without proper cleanup
- (6) The ImGui scope stack being unbalanced
- (7) The next frame triggering IM_ASSERT
This is a fragility that will recur. The track prevents future bugs of this shape by:
- Making the contracts explicit (Phase 1)
- Eliminating the silent-failure pattern (Phase 2)
- Reducing the state surface shared between tests (Phase 3)
- Improving the static audit to catch scope issues early (Phase 4)
Related Commits
- bcdc26d0 (this session): The actual fix (__getattr__ allowlist)
- 51ecace4 (this session): PR3 pre-flight health check + planning docs
- 1c565da7 (this session): PR2 wrap + health endpoint
- c9a991bb (this session): timeout bump
- 4a338486 (this session): io_pool 4→8
- 87d7c5bf (this session): io_pool test assertion