Private

Public Access

Files

T

ed 37b9a68017 docs: add test_infra_hardening foundation + RAG batch failure status

Foundation document for the future test_infra_hardening track that
will address session-scoped live_gui fixture isolation, silent
__getattr__/__setattr__ contract assumptions, and similar test
infrastructure fragility.

Also documents the test_rag_phase4_final_verify batch failure
that surfaces after the __getattr__ fix unblocks
test_full_live_workflow. The RAG test failure is NOT a regression
- it reproduces on pre-fix HEAD too. It's a pre-existing test
isolation issue (the live_gui fixture is session-scoped, so state
from the 4 sims pollutes the controller).

2026-06-09 00:26:05 -04:00

7.7 KiB

Raw Blame History

Future Track Foundation: Test Infrastructure Hardening (2026-06-08)

Status: Foundation document (pre-spec). Goal: outline the broader track that this work belongs to.

Related:

docs/reports/test_full_live_workflow_imgui_assert_20260608.md (initial root cause)
docs/reports/test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
docs/reports/test_full_live_workflow_progress_20260608_pm.md (PR1+PR2+PR3 progress)
docs/reports/batch_resilience_plan_20260608.md (batch resilience plan)
conductor/todos/TODO_test_full_live_workflow_v2.md (task list)

What Was Fixed (this session)

PR1 (audit): scripts/check_imgui_scopes.py found 3 false positives. Documented.
PR2 (wrap + health endpoint): immapp.run is now wrapped in try/except. /api/gui_health exposes the controller's degraded state. Tests fail fast with clear messages on dirty state.
PR3 (pre-flight check): test_full_live_workflow calls client.get_gui_health() at start. Fails fast with actionable message if the GUI is degraded.
PR1 follow-up (real fix): The actual IM_ASSERT trigger was a double __getattr__ bug:
- AppController.__getattr__ returned None for ANY ui_ attribute (including ones not in __init__)
- App.__setattr__ checked hasattr(self.controller, name) to route assignments; the controller's buggy __getattr__ made hasattr return True for all ui_ attrs
- The if not hasattr(app, 'foo'): app.foo = False pattern in render_approve_script_modal failed to initialize
- imgui.checkbox was called with None, raised TypeError
- The TypeError propagated without closing the ImGui modal, leaving the scope stack unbalanced
- Next frame: IM_ASSERT(Missing End())
Fix: AppController.__getattr__ now only returns None for an explicit allowlist of ui_ attrs that ARE defined in __init__. For any other missing attribute, raises AttributeError. Also added defense-in-depth in App.__getattr__ to check hasattr(controller, name) before delegating.

Result: 4 sims + test_live_workflow + 2 markdown tests all pass in 87.80s. No IM_ASSERT. The test passes cleanly.

What Is Still Open (Future Work)

1. Test Infrastructure Audit (the broader track)

The fixes this session addressed ONE bug that was making test_full_live_workflow fail. The user asked: "continue with trying to finally cure the test infra with a strong foundation for the future track."

The broader concern: The test infrastructure has accumulated complexity and implicit assumptions. The live_gui fixture is session-scoped, the controller's state is shared across 49+ tests, and small bugs in __getattr__ / __setattr__ cascade into mysterious failures 80 seconds later.

Recommended track scope:

Test isolation: Move from session-scoped to per-file (or per-test-with-respawn) live_gui fixture
Observability: Add /api/gui_health (done) + structured logging for all state mutations
Regression safety: Audit all __getattr__ / __setattr__ / __init__ for hidden contract assumptions
ImGui scope audit: Make the static check_imgui_scopes.py more powerful (handle try/except, control flow, context managers)
Defer-not-catch pattern: Per conductor/workflow.md known pitfall, audit all imgui.* calls for the "called before ImGui fully initialized" issue

2. The `_UI_FLAG_DEFAULTS` allowlist (immediate)

In the fix, I hardcoded an allowlist of ui_ attrs that can return None. This is a maintenance burden — new ui_ attrs added to __init__ must also be added to this allowlist, or the test fixture will fail.

Better fix: Use a class-level _UI_FLAG_DEFAULTS set OR detect them dynamically (e.g., from annotations in __init__). The current hardcoded set is fragile.

3. The `_handle_reset_session` and other state-clearing paths

The AppController._handle_reset_session clears many fields but not all. Tests that share state via the session-scoped fixture can carry over state from one test to the next. A future track should audit and complete the reset logic.

4. Per-test or per-file `live_gui` fixture scope

Per the docs/reports/batch_resilience_plan_20260608.md, the recommended approach is to either:

Make the fixture per-file scoped (heavy but simple)
Add a lazy re-spawn wrapper (lighter but more complex)
Add a per-test autouse health check (lightest, but doesn't recover from subprocess death)

The right answer depends on whether tests need cross-file state. The current 49+ live_gui tests should be audited for cross-file dependencies.

5. The `live_gui` subprocess lifecycle

The subprocess is killed via taskkill /F /T (force-kill). This is correct for production but means the subprocess can't clean up. A graceful shutdown signal (e.g., os.kill(pid, signal.CTRL_C_EVENT) to trigger the SIGINT handler) would allow clean teardown and better diagnostic output on the next session.

6. Documentation: the `getattr` / `setattr` contract

The fix in this session was possible because I read the __getattr__ code. But the __getattr__ / __setattr__ pair is a non-obvious contract. The docstring should explicitly state:

Which attributes are delegated to the controller
What hasattr() should return for each
The interaction with setattr()

A future track should add explicit tests for the delegation contract, perhaps via property descriptors.

Proposed Track Name

test_infra_hardening_20260608 (or similar)

Proposed Track Phases

Phase 1: Audit (1-2 days)

Catalog all __getattr__ / __setattr__ in the codebase
Document the implicit contracts
Identify other "silent failure" patterns (where a bug manifests 80s later in a different subsystem)

Phase 2: Refactor the `_UI_FLAG_DEFAULTS` (1 day)

Move the hardcoded set to a class-level attribute
OR detect from __init__ annotations
Add unit test that catches missing entries

Phase 3: live_gui fixture scope change (1-2 days)

Audit all live_gui tests for cross-file state dependencies
Change live_gui from session-scoped to per-file (or per-test-with-respawn)
Add metrics for the cost (slowdown)

Phase 4: Improve check_imgui_scopes.py (2-3 days)

Add support for try/except patterns
Add support for control flow analysis
Add a "render function entry/exit" tracking mode that runs the GUI for a frame and reports unbalanced scopes

Phase 5: Documentation and runbooks (1 day)

Document the deferred-not-catch pattern in a code style guide
Add a runbook for "the live_gui test failed — what to check"
Update the docs/reports/ to reflect the new infrastructure

Why This Track Is Worth Doing

The bug fixed in this session was a 4-layer deep interaction:

__getattr__ returning None (the wrong default)
hasattr() returning True because of (1)
__setattr__ routing the assignment to the wrong place because of (2)
imgui.checkbox getting None because of (3)
The TypeError propagating without proper cleanup
The ImGui scope stack being unbalanced
The next frame triggering IM_ASSERT

This is a fragility that will recur. The track prevents future bugs of this shape by:

Making the contracts explicit (Phase 1)
Eliminating the silent-failure pattern (Phase 2)
Reducing the state surface shared between tests (Phase 3)
Improving the static audit to catch scope issues early (Phase 4)

bcdc26d0 (this session): The actual fix — __getattr__ allowlist
51ecace4 (this session): PR3 pre-flight health check + planning docs
1c565da7 (this session): PR2 wrap + health endpoint
c9a991bb (this session): timeout bump
4a338486 (this session): io_pool 4→8
87d7c5bf (this session): io_pool test assertion

7.7 KiB Raw Blame History