Private
Public Access
0
0
Files
manual_slop/docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md
T

16 KiB

Live-GUI Fragility Fixes — Design

Date: 2026-06-05 Status: Draft Track follow-up to: regression_fixes_20260605 Scope: Fix 3 failing live_gui tests discovered in the 2026-06-05 batched test run, harden the defer-not-catch pattern doc, restore 100% pass rate on the 272-file test suite.

1. Background

Scope decisions (per user review 2026-06-05)

  • Change 1 (the b"""" fix): in scope, critical path.
  • Change 2 (test mock fix for prior session test): SCOPE REDUCED during execution. The test was more under-mocked than the spec assumed. Initial error at src/gui_2.py:2333 (imscope.window tuple unpack) was the first of several un-mocked dependencies. After fixing imscope.window, the next failure surfaces at src/gui_2.py:4496 (render_theme_panel: imgui.begin returning bool where 2-tuple expected). The test calls render_main_interface which is a kitchen-sink function requiring 50+ mocks. Decision: defer Change 2 to a separate follow-up track that focuses on refactoring the test to either (a) exercise a narrow prior-session render path instead of render_main_interface, or (b) add the missing 50+ mocks. The imscope.window fix is still applied as a defensive change (and as a model for future test work).
  • Change 3 (regression unit test): in scope, critical path.
  • Change 4 (doc hardening of defer-not-catch sections): DEFERRED to end of track — user wants to see how long the critical path takes first. If time permits at the end, do Change 4 as a final commit; otherwise leave for a follow-up patch.

Revised pass-rate target

  • Before track: 269/272 (98.9%)
  • After Change 1: 271/272 (99.6%) — both test_auto_switch_sim and test_workspace_profiles_restoration should pass; test_prior_session_no_pop_imbalance is deferred to a follow-up.
  • After Change 3: 272/272 if Change 2 also fixed, else 271/272 + new regression unit test passes.

Follow-up track: prior_session_test_harden_20260605

A new track to be queued in conductor/tracks.md covering the test_prior_session_no_pop_imbalance test's comprehensive mock setup (or refactor to test a narrow path).

Failures (3)

Test File Symptom Root cause
test_auto_switch_sim tests/test_auto_switch_sim.py:47 assert False == True after triggering tier-3 auto-switch Category A: profile save raises TypeError → no profile saved → load is no-op
test_workspace_profiles_restoration tests/test_workspace_profiles_sim.py:81 assert False is True after load_workspace_profile Category A: same as above
test_no_extraneous_pop_when_prior_session_renders tests/test_prior_session_no_pop_imbalance.py:135 TypeError: cannot unpack non-iterable NoneType object at src/gui_2.py:2333 Category B: test mock setup for imscope.window returns non-iterable, but production code expects (opened, visible) tuple

Test run results (2026-06-05, batched via scripts/run_tests_batched.py)

  • 272 test files, 68 batches, 269/272 passing (98.9%).
  • 3 failing tests, all in live_gui (session-scoped fixture) or integration marker category.
  • 0 failing tests in any other category (unit, headless, mock_app, simulation).

Root cause analysis (Category A — both profile failures)

A regression introduced by commit d7487af4 ("fix(gui_2): defer save_ini_settings on first capture to avoid early-render crash"). That commit added a defer-not-catch guard in _capture_workspace_profile (src/gui_2.py:601-606):

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
  if not getattr(self, "_ini_capture_ready", False):
   self._ini_capture_ready = True
   ini = b""          # <-- BUG: bytes, not str
  else:
   try:
    ini = imgui.save_ini_settings_to_memory()  # returns str
   except Exception:
    ini = b""          # <-- BUG: same
  ...

The bug: ini = b"" is a bytes literal, but the WorkspaceProfile dataclass declares ini_content: str (src/models.py:799), AND tomli_w (the TOML serializer) raises TypeError: Object of type 'bytes' is not TOML serializable.

Verified empirically:

>>> import tomli_w
>>> tomli_w.dump({"ini_content": b""}, io.BytesIO())
TypeError: Object of type 'bytes' is not TOML serializable

Trace path for the failure:

  1. Test: set_value('ui_separate_tier1', True) → field is True in app state.
  2. Test: push_event("custom_callback", {"callback": "save_workspace_profile", ...}).
  3. GUI: _process_pending_gui_tasks_cb_save_workspace_profile (src/app_controller.py:2870).
  4. App: _capture_workspace_profile(name) → returns WorkspaceProfile(..., ini_content=b"", ...).
  5. workspace_manager.save_profile(profile)profile.to_dict(){"ini_content": b"", ...}.
  6. _save_filetomli_w.dump(data, f)TypeError raised.
  7. Exception propagates; profile is NOT saved to disk; workspace_profiles is NOT reloaded; self._app.workspace_profiles is NOT updated.
  8. Test: set_value('ui_separate_tier1', False) → field is False.
  9. Test: push_event("custom_callback", {"callback": "load_workspace_profile", ...}).
  10. App: _cb_load_workspace_profile(name)if name in self.workspace_profiles:False (save failed) → does nothing.
  11. Test: assert get_value('ui_separate_tier1') is Truefails (still False).

The original pre-defer code (ini = imgui.save_ini_settings_to_memory()) returned a str that round-tripped through TOML successfully; tests passed. The defer fix introduced a type-incompatible sentinel value that broke the serialization contract.

The 1-line fix: change ini = b"" to ini = "" (and add a defensive str-coerce for the non-defer path).

Root cause analysis (Category B — prior session test)

The test mocks imscope.window(...) to return a MagicMock() whose __enter__ returns the bare mock. Production code at src/gui_2.py:2333 does with imscope.window(...) as (opened, visible): which expects a 2-tuple. The test's setup (lines ~70-80) sets __enter__ for many imscope context managers to return non-iterable MagicMock() but for popup_modal (line ~91) correctly returns (True, None). The imscope.window setup is missing the tuple-return — purely a test-authoring bug.

2. Goals

  1. Restore 100% pass rate on the 272-file test suite (no regressions in any other test).
  2. Preserve the defer-not-catch safety property of commit d7487af4 (avoid C-level crash on early-render C calls).
  3. Harden the defer-not-catch documentation to call out the str/bytes type contract (avoid future regressions of the same kind).
  4. Tighten the test-authoring contract for the prior session test: mock imscope context managers with the correct return shape.
  5. OPTIONAL/DEFERRED: Harden the defer-not-catch pattern doc with a "sentinel must match consumer type contract" note. Per user review (2026-06-05), this is deferred to the end of the track. If time permits, do it; otherwise leave for a follow-up patch.

3. Non-Goals

  • Not refactoring the workspace profile save/load architecture.
  • Not adding wait-for-ready semantics to the test framework (deferred to a separate live_gui harden track; tracked as backlog item 0 in conductor/tracks.md).
  • Not fixing the broader test fragility / session-state issues (deferred).
  • Not addressing sloppy.py startup latency (separate track, also backlog).

4. Design

Change 1: Fix ini = b""ini = "" in _capture_workspace_profile

Files:

  • Modify: src/gui_2.py:601-606 (the defer branch)
  • Modify: src/gui_2.py:606-609 (the non-defer branch's except handler)

Approach: Change ini = b"" to ini = "" in both places. The pre-fix code returned a str; we're restoring that contract. Additionally, defensively coerce the non-defer result: ini = imgui.save_ini_settings_to_memory() returns a str per imgui-bundle docs, but to be safe against future imgui-bundle changes, wrap it: ini = str(imgui.save_ini_settings_to_memory() or "").

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
  if not getattr(self, "_ini_capture_ready", False):
   self._ini_capture_ready = True
   ini = ""
  else:
   try:
    ini = str(imgui.save_ini_settings_to_memory() or "")
   except Exception:
    ini = ""
  panel_states = { ... }
  return models.WorkspaceProfile(...)

Why: WorkspaceProfile.ini_content: str (src/models.py:799); tomli_w rejects bytes. imgui.load_ini_settings_from_memory(ini_data: str, ...) also expects str. Restoring the str contract is the minimal fix.

Alternatives considered:

  • A2 — Use imgui.save_ini_settings_to_disk(path) then read the file. Rejected: adds a side-effect path that's not idempotent; tests can pollute the test artifacts dir.
  • A3 — Force a frame render in __init__ so the first call is safe. Rejected: changes init semantics; interacts badly with hot-reload (src/hot_reloader.py); may regress startup latency (the very thing the new sloppy.py startup track is meant to address).

Change 2: Fix the prior session test mock

Files:

  • Modify: tests/test_prior_session_no_pop_imbalance.py (the imscope.window mock setup)

Approach: Add the tuple-return to imscope.window's __enter__ mock, matching the pattern already used for popup_modal at line 91:

mock_imscope.window.return_value.__enter__ = MagicMock(return_value=(True, True))
mock_imscope.window.return_value.__exit__ = MagicMock(side_effect=_scope_exit)

Why: The test's imscope.window setup is the only one missing the tuple-return; all other imscope context managers that production code expects to unpack as tuples already have it. This is a 2-line test-only fix.

Change 3: Add a regression test for the ini_content type contract

Files:

  • Create: tests/test_workspace_profile_serialization.py

Approach: Add a unit test that verifies a WorkspaceProfile with ini_content="" (empty str) round-trips through TOML via to_dicttomli_w.dumptomllib.loadfrom_dict without raising. This is the contract that the defer fix violated.

def test_workspace_profile_empty_ini_content_roundtrips():
  from src.models import WorkspaceProfile
  profile = WorkspaceProfile(name="t", ini_content="", show_windows={"A": True}, panel_states={"x": 1})
  d = profile.to_dict()
  import io, tomli_w, tomllib
  buf = io.BytesIO()
  tomli_w.dump({profile.name: d}, buf)  # this is what save_profile does
  buf.seek(0)
  back = tomllib.load(buf)
  loaded = WorkspaceProfile.from_dict("t", back["t"])
  assert loaded.ini_content == ""
  assert loaded.show_windows == {"A": True}
  assert loaded.panel_states == {"x": 1}

Why: This test would have caught the d7487af4 regression. It encodes the type contract for future contributors. It's a pure unit test, no live_gui, runs in <1s.

Change 4: Harden the defer-not-catch doc

Files:

  • Modify: docs/guide_gui_2.md "Workspace Profile Defer-Not-Catch" section
  • Modify: docs/guide_testing.md "Early-Render C-Level Crashes" section
  • Modify: conductor/workflow.md "Defer-Not-Catch Pattern for Native Crashes" section

Approach: Add a note: "When implementing a defer-not-catch guard for a return value, ensure the sentinel value matches the type contract of the downstream consumer. For WorkspaceProfile.ini_content: str, the sentinel must be "" (str), not b"" (bytes) — TOML serialization rejects bytes."

Why: Future contributors applying the defer-not-catch pattern should not silently introduce type-incompatible sentinels.

5. Data Flow

Before (buggy)

set_value(True)            → app.ui_separate_tier1 = True
save_workspace_profile     → _capture_workspace_profile → ini=b""  (bytes)
                           → to_dict() → {"ini_content": b""}
                           → tomli_w.dump → TypeError
                           → profile NOT saved
set_value(False)           → app.ui_separate_tier1 = False
load_workspace_profile     → name not in workspace_profiles → no-op
assert get_value is True   → FAILS (still False)

After (fixed)

set_value(True)            → app.ui_separate_tier1 = True
save_workspace_profile     → _capture_workspace_profile → ini=""  (str)
                           → to_dict() → {"ini_content": ""}
                           → tomli_w.dump → OK
                           → profile saved
set_value(False)           → app.ui_separate_tier1 = False
load_workspace_profile     → name in workspace_profiles → _apply_workspace_profile
                           → setattr(self, "ui_separate_tier1", True)
assert get_value is True   → PASSES

6. Error Handling

  • The defer branch and the except branch both set ini = "". Empty string is a valid str and is safe for tomli_w, for the dataclass, and for imgui.load_ini_settings_from_memory("") (which is a no-op that lets ImGui use its defaults).
  • No new exceptions are introduced. The TypeError from the buggy b"" goes away because the type is now str.
  • The new regression test (test_workspace_profile_serialization.py) is itself a forward-looking guard: if a future change reintroduces a bytes sentinel, the test will fail with a clear message.

7. Testing Strategy

New tests

  • tests/test_workspace_profile_serialization.py::test_workspace_profile_empty_ini_content_roundtrips — pure unit test, <1s, encodes the str contract.

Existing tests that should now pass

  • tests/test_auto_switch_sim::test_auto_switch_sim — saves+loads workspace profile.
  • tests/test_workspace_profiles_sim::test_workspace_profiles_restoration — saves+loads workspace profile.
  • tests/test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders — mock setup fix.

Regression check

  • Re-run the full batched test suite (scripts/run_tests_batched.py) after the fixes; expect 272/272 pass.
  • Re-run targeted batches of theme tests (test_theme*, test_log_pruner*, test_view_presets*, test_gui_progress*, test_gui_phase4*) to verify the prior doc-track fixes still pass.

8. Risk Assessment

Risk Likelihood Impact Mitigation
The str() coercion in the non-defer branch changes behavior Low Low imgui.save_ini_settings_to_memory() is documented to return str; the coercion is defensive only. The or "" handles a None return (which imgui-bundle does not produce but we don't want to crash on).
The new unit test depends on tomli_w semantics that change Very low Low tomli_w is a stable dep; the test would only break if bytes becomes serializable, which would be a major version change.
The mock fix in the prior session test changes other behavior Low Low The fix only adds the missing tuple-return; existing mocks for other imscope context managers are untouched.
Removing the b"" sentinel causes the early-render C crash to return Very low High The try/except Exception around imgui.save_ini_settings_to_memory() is preserved; the flag-based defer is preserved. Only the type of the sentinel changes.

9. Out of Scope (Tracked Separately)

  • live_gui session-state contract (test-authoring rigor, wait-for-ready pattern) — see [docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state] (added in this session). This is a doc-only change; tests will be hardened over time as they break.
  • sloppy.py startup latency — new backlog item 0 in conductor/tracks.md, planned via superpowers writing-plans skill in a future session.
  • Other live_gui tests still flagged as fragile in the regression-fixes plan (MMA engine state transitions, RAG status timing) — these were in the deferred category of the regression_fixes_20260605 plan; not addressed by this design.

10. References

  • Commit d7487af4 — the defer-not-catch fix that introduced the b"" sentinel.
  • src/gui_2.py:601-606 — current defer code.
  • src/models.py:797-823WorkspaceProfile dataclass with ini_content: str.
  • src/workspace_manager.py:48-58save_profile that calls to_dict then tomli_w.dump.
  • docs/guide_gui_2.md#workspace-profile-defer-not-catch — the defer-not-catch section to harden.
  • docs/guide_testing.md#known-gotchas-2026-06-05 — the early-render C-crash section to harden.
  • conductor/tracks.mdregression_fixes_20260605 and multi_themes_20260604 entries.
  • conductor/tracks.md — new backlog item 0 (sloppy.py startup speedup).