ed/manual_slop

Private

Public Access

Fork 0

Files

T

ed cb206b973f docs(spec): defer Change 2 (prior_session test) to separate track; reason + follow-up

2026-06-05 20:12:33 -04:00

16 KiB

Raw Blame History

Live-GUI Fragility Fixes — Design

Date: 2026-06-05 Status: Draft Track follow-up to: regression_fixes_20260605 Scope: Fix 3 failing live_gui tests discovered in the 2026-06-05 batched test run, harden the defer-not-catch pattern doc, restore 100% pass rate on the 272-file test suite.

1. Background

Scope decisions (per user review 2026-06-05)

Change 1 (the b"" → "" fix): in scope, critical path.
Change 2 (test mock fix for prior session test): SCOPE REDUCED during execution. The test was more under-mocked than the spec assumed. Initial error at src/gui_2.py:2333 (imscope.window tuple unpack) was the first of several un-mocked dependencies. After fixing imscope.window, the next failure surfaces at src/gui_2.py:4496 (render_theme_panel: imgui.begin returning bool where 2-tuple expected). The test calls render_main_interface which is a kitchen-sink function requiring 50+ mocks. Decision: defer Change 2 to a separate follow-up track that focuses on refactoring the test to either (a) exercise a narrow prior-session render path instead of render_main_interface, or (b) add the missing 50+ mocks. The imscope.window fix is still applied as a defensive change (and as a model for future test work).
Change 3 (regression unit test): in scope, critical path.
Change 4 (doc hardening of defer-not-catch sections): DEFERRED to end of track — user wants to see how long the critical path takes first. If time permits at the end, do Change 4 as a final commit; otherwise leave for a follow-up patch.

Revised pass-rate target

Before track: 269/272 (98.9%)
After Change 1: 271/272 (99.6%) — both test_auto_switch_sim and test_workspace_profiles_restoration should pass; test_prior_session_no_pop_imbalance is deferred to a follow-up.
After Change 3: 272/272 if Change 2 also fixed, else 271/272 + new regression unit test passes.

Follow-up track: prior_session_test_harden_20260605

A new track to be queued in conductor/tracks.md covering the test_prior_session_no_pop_imbalance test's comprehensive mock setup (or refactor to test a narrow path).

Failures (3)

Test	File	Symptom	Root cause
`test_auto_switch_sim`	`tests/test_auto_switch_sim.py:47`	`assert False == True` after triggering tier-3 auto-switch	Category A: profile save raises TypeError → no profile saved → load is no-op
`test_workspace_profiles_restoration`	`tests/test_workspace_profiles_sim.py:81`	`assert False is True` after `load_workspace_profile`	Category A: same as above
`test_no_extraneous_pop_when_prior_session_renders`	`tests/test_prior_session_no_pop_imbalance.py:135`	`TypeError: cannot unpack non-iterable NoneType object` at `src/gui_2.py:2333`	Category B: test mock setup for `imscope.window` returns non-iterable, but production code expects `(opened, visible)` tuple

Test run results (2026-06-05, batched via `scripts/run_tests_batched.py`)

272 test files, 68 batches, 269/272 passing (98.9%).
3 failing tests, all in live_gui (session-scoped fixture) or integration marker category.
0 failing tests in any other category (unit, headless, mock_app, simulation).

Root cause analysis (Category A — both profile failures)

A regression introduced by commit d7487af4 ("fix(gui_2): defer save_ini_settings on first capture to avoid early-render crash"). That commit added a defer-not-catch guard in _capture_workspace_profile (src/gui_2.py:601-606):

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
  if not getattr(self, "_ini_capture_ready", False):
   self._ini_capture_ready = True
   ini = b""          # <-- BUG: bytes, not str
  else:
   try:
    ini = imgui.save_ini_settings_to_memory()  # returns str
   except Exception:
    ini = b""          # <-- BUG: same
  ...

The bug: ini = b"" is a bytes literal, but the WorkspaceProfile dataclass declares ini_content: str (src/models.py:799), AND tomli_w (the TOML serializer) raises TypeError: Object of type 'bytes' is not TOML serializable.

Verified empirically:

>>> import tomli_w
>>> tomli_w.dump({"ini_content": b""}, io.BytesIO())
TypeError: Object of type 'bytes' is not TOML serializable

Trace path for the failure:

Test: set_value('ui_separate_tier1', True) → field is True in app state.
Test: push_event("custom_callback", {"callback": "save_workspace_profile", ...}).
GUI: _process_pending_gui_tasks → _cb_save_workspace_profile (src/app_controller.py:2870).
App: _capture_workspace_profile(name) → returns WorkspaceProfile(..., ini_content=b"", ...).
workspace_manager.save_profile(profile) → profile.to_dict() → {"ini_content": b"", ...}.
_save_file → tomli_w.dump(data, f) → TypeError raised.
Exception propagates; profile is NOT saved to disk; workspace_profiles is NOT reloaded; self._app.workspace_profiles is NOT updated.
Test: set_value('ui_separate_tier1', False) → field is False.
Test: push_event("custom_callback", {"callback": "load_workspace_profile", ...}).
App: _cb_load_workspace_profile(name) → if name in self.workspace_profiles: → False (save failed) → does nothing.
Test: assert get_value('ui_separate_tier1') is True → fails (still False).

The original pre-defer code (ini = imgui.save_ini_settings_to_memory()) returned a str that round-tripped through TOML successfully; tests passed. The defer fix introduced a type-incompatible sentinel value that broke the serialization contract.

The 1-line fix: change ini = b"" to ini = "" (and add a defensive str-coerce for the non-defer path).

Root cause analysis (Category B — prior session test)

The test mocks imscope.window(...) to return a MagicMock() whose __enter__ returns the bare mock. Production code at src/gui_2.py:2333 does with imscope.window(...) as (opened, visible): which expects a 2-tuple. The test's setup (lines ~70-80) sets __enter__ for many imscope context managers to return non-iterable MagicMock() but for popup_modal (line ~91) correctly returns (True, None). The imscope.window setup is missing the tuple-return — purely a test-authoring bug.

2. Goals

Restore 100% pass rate on the 272-file test suite (no regressions in any other test).
Preserve the defer-not-catch safety property of commit d7487af4 (avoid C-level crash on early-render C calls).
Harden the defer-not-catch documentation to call out the str/bytes type contract (avoid future regressions of the same kind).
Tighten the test-authoring contract for the prior session test: mock imscope context managers with the correct return shape.
OPTIONAL/DEFERRED: Harden the defer-not-catch pattern doc with a "sentinel must match consumer type contract" note. Per user review (2026-06-05), this is deferred to the end of the track. If time permits, do it; otherwise leave for a follow-up patch.

3. Non-Goals

Not refactoring the workspace profile save/load architecture.
Not adding wait-for-ready semantics to the test framework (deferred to a separate live_gui harden track; tracked as backlog item 0 in conductor/tracks.md).
Not fixing the broader test fragility / session-state issues (deferred).
Not addressing sloppy.py startup latency (separate track, also backlog).

4. Design

Change 1: Fix `ini = b""` → `ini = ""` in `_capture_workspace_profile`

Files:

Modify: src/gui_2.py:601-606 (the defer branch)
Modify: src/gui_2.py:606-609 (the non-defer branch's except handler)

Approach: Change ini = b"" to ini = "" in both places. The pre-fix code returned a str; we're restoring that contract. Additionally, defensively coerce the non-defer result: ini = imgui.save_ini_settings_to_memory() returns a str per imgui-bundle docs, but to be safe against future imgui-bundle changes, wrap it: ini = str(imgui.save_ini_settings_to_memory() or "").

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
  if not getattr(self, "_ini_capture_ready", False):
   self._ini_capture_ready = True
   ini = ""
  else:
   try:
    ini = str(imgui.save_ini_settings_to_memory() or "")
   except Exception:
    ini = ""
  panel_states = { ... }
  return models.WorkspaceProfile(...)

Why: WorkspaceProfile.ini_content: str (src/models.py:799); tomli_w rejects bytes. imgui.load_ini_settings_from_memory(ini_data: str, ...) also expects str. Restoring the str contract is the minimal fix.

Alternatives considered:

A2 — Use imgui.save_ini_settings_to_disk(path) then read the file. Rejected: adds a side-effect path that's not idempotent; tests can pollute the test artifacts dir.
A3 — Force a frame render in __init__ so the first call is safe. Rejected: changes init semantics; interacts badly with hot-reload (src/hot_reloader.py); may regress startup latency (the very thing the new sloppy.py startup track is meant to address).

Change 2: Fix the prior session test mock

Files:

Modify: tests/test_prior_session_no_pop_imbalance.py (the imscope.window mock setup)

Approach: Add the tuple-return to imscope.window's __enter__ mock, matching the pattern already used for popup_modal at line 91:

mock_imscope.window.return_value.__enter__ = MagicMock(return_value=(True, True))
mock_imscope.window.return_value.__exit__ = MagicMock(side_effect=_scope_exit)

Why: The test's imscope.window setup is the only one missing the tuple-return; all other imscope context managers that production code expects to unpack as tuples already have it. This is a 2-line test-only fix.

Change 3: Add a regression test for the ini_content type contract

Files:

Create: tests/test_workspace_profile_serialization.py

Approach: Add a unit test that verifies a WorkspaceProfile with ini_content="" (empty str) round-trips through TOML via to_dict → tomli_w.dump → tomllib.load → from_dict without raising. This is the contract that the defer fix violated.

def test_workspace_profile_empty_ini_content_roundtrips():
  from src.models import WorkspaceProfile
  profile = WorkspaceProfile(name="t", ini_content="", show_windows={"A": True}, panel_states={"x": 1})
  d = profile.to_dict()
  import io, tomli_w, tomllib
  buf = io.BytesIO()
  tomli_w.dump({profile.name: d}, buf)  # this is what save_profile does
  buf.seek(0)
  back = tomllib.load(buf)
  loaded = WorkspaceProfile.from_dict("t", back["t"])
  assert loaded.ini_content == ""
  assert loaded.show_windows == {"A": True}
  assert loaded.panel_states == {"x": 1}

Why: This test would have caught the d7487af4 regression. It encodes the type contract for future contributors. It's a pure unit test, no live_gui, runs in <1s.

Change 4: Harden the defer-not-catch doc

Files:

Modify: docs/guide_gui_2.md "Workspace Profile Defer-Not-Catch" section
Modify: docs/guide_testing.md "Early-Render C-Level Crashes" section
Modify: conductor/workflow.md "Defer-Not-Catch Pattern for Native Crashes" section

Approach: Add a note: "When implementing a defer-not-catch guard for a return value, ensure the sentinel value matches the type contract of the downstream consumer. For WorkspaceProfile.ini_content: str, the sentinel must be "" (str), not b"" (bytes) — TOML serialization rejects bytes."

Why: Future contributors applying the defer-not-catch pattern should not silently introduce type-incompatible sentinels.

5. Data Flow

Before (buggy)

set_value(True)            → app.ui_separate_tier1 = True
save_workspace_profile     → _capture_workspace_profile → ini=b""  (bytes)
                           → to_dict() → {"ini_content": b""}
                           → tomli_w.dump → TypeError
                           → profile NOT saved
set_value(False)           → app.ui_separate_tier1 = False
load_workspace_profile     → name not in workspace_profiles → no-op
assert get_value is True   → FAILS (still False)

After (fixed)

set_value(True)            → app.ui_separate_tier1 = True
save_workspace_profile     → _capture_workspace_profile → ini=""  (str)
                           → to_dict() → {"ini_content": ""}
                           → tomli_w.dump → OK
                           → profile saved
set_value(False)           → app.ui_separate_tier1 = False
load_workspace_profile     → name in workspace_profiles → _apply_workspace_profile
                           → setattr(self, "ui_separate_tier1", True)
assert get_value is True   → PASSES

6. Error Handling

The defer branch and the except branch both set ini = "". Empty string is a valid str and is safe for tomli_w, for the dataclass, and for imgui.load_ini_settings_from_memory("") (which is a no-op that lets ImGui use its defaults).
No new exceptions are introduced. The TypeError from the buggy b"" goes away because the type is now str.
The new regression test (test_workspace_profile_serialization.py) is itself a forward-looking guard: if a future change reintroduces a bytes sentinel, the test will fail with a clear message.

7. Testing Strategy

New tests

tests/test_workspace_profile_serialization.py::test_workspace_profile_empty_ini_content_roundtrips — pure unit test, <1s, encodes the str contract.

Existing tests that should now pass

tests/test_auto_switch_sim::test_auto_switch_sim — saves+loads workspace profile.
tests/test_workspace_profiles_sim::test_workspace_profiles_restoration — saves+loads workspace profile.
tests/test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders — mock setup fix.

Regression check

Re-run the full batched test suite (scripts/run_tests_batched.py) after the fixes; expect 272/272 pass.
Re-run targeted batches of theme tests (test_theme*, test_log_pruner*, test_view_presets*, test_gui_progress*, test_gui_phase4*) to verify the prior doc-track fixes still pass.

8. Risk Assessment

Risk	Likelihood	Impact	Mitigation
The `str()` coercion in the non-defer branch changes behavior	Low	Low	`imgui.save_ini_settings_to_memory()` is documented to return `str`; the coercion is defensive only. The `or ""` handles a `None` return (which `imgui-bundle` does not produce but we don't want to crash on).
The new unit test depends on `tomli_w` semantics that change	Very low	Low	`tomli_w` is a stable dep; the test would only break if `bytes` becomes serializable, which would be a major version change.
The mock fix in the prior session test changes other behavior	Low	Low	The fix only adds the missing tuple-return; existing mocks for other imscope context managers are untouched.
Removing the `b""` sentinel causes the early-render C crash to return	Very low	High	The `try/except Exception` around `imgui.save_ini_settings_to_memory()` is preserved; the flag-based defer is preserved. Only the type of the sentinel changes.

9. Out of Scope (Tracked Separately)

live_gui session-state contract (test-authoring rigor, wait-for-ready pattern) — see [docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state] (added in this session). This is a doc-only change; tests will be hardened over time as they break.
sloppy.py startup latency — new backlog item 0 in conductor/tracks.md, planned via superpowers writing-plans skill in a future session.
Other live_gui tests still flagged as fragile in the regression-fixes plan (MMA engine state transitions, RAG status timing) — these were in the deferred category of the regression_fixes_20260605 plan; not addressed by this design.

10. References

Commit d7487af4 — the defer-not-catch fix that introduced the b"" sentinel.
src/gui_2.py:601-606 — current defer code.
src/models.py:797-823 — WorkspaceProfile dataclass with ini_content: str.
src/workspace_manager.py:48-58 — save_profile that calls to_dict then tomli_w.dump.
docs/guide_gui_2.md#workspace-profile-defer-not-catch — the defer-not-catch section to harden.
docs/guide_testing.md#known-gotchas-2026-06-05 — the early-render C-crash section to harden.
conductor/tracks.md — regression_fixes_20260605 and multi_themes_20260604 entries.
conductor/tracks.md — new backlog item 0 (sloppy.py startup speedup).

16 KiB Raw Blame History

Live-GUI Fragility Fixes — Design

1. Background

Scope decisions (per user review 2026-06-05)

Revised pass-rate target

Follow-up track: prior_session_test_harden_20260605

Failures (3)

Test run results (2026-06-05, batched via scripts/run_tests_batched.py)

Root cause analysis (Category A — both profile failures)

Root cause analysis (Category B — prior session test)

2. Goals

3. Non-Goals

4. Design

Change 1: Fix ini = b"" → ini = "" in _capture_workspace_profile

Change 2: Fix the prior session test mock

Change 3: Add a regression test for the ini_content type contract

Change 4: Harden the defer-not-catch doc

5. Data Flow

Before (buggy)

After (fixed)

6. Error Handling

7. Testing Strategy

New tests

Existing tests that should now pass

Regression check

8. Risk Assessment

9. Out of Scope (Tracked Separately)

10. References

16 KiB

Raw Blame History

Test run results (2026-06-05, batched via `scripts/run_tests_batched.py`)

Change 1: Fix `ini = b""` → `ini = ""` in `_capture_workspace_profile`