252 lines
16 KiB
Markdown
252 lines
16 KiB
Markdown
# Live-GUI Fragility Fixes — Design
|
|
|
|
**Date:** 2026-06-05
|
|
**Status:** Draft
|
|
**Track follow-up to:** regression_fixes_20260605
|
|
**Scope:** Fix 3 failing live_gui tests discovered in the 2026-06-05 batched test run, harden the defer-not-catch pattern doc, restore 100% pass rate on the 272-file test suite.
|
|
|
|
## 1. Background
|
|
|
|
### Scope decisions (per user review 2026-06-05)
|
|
- Change 1 (the `b""` → `""` fix): **in scope, critical path.**
|
|
- Change 2 (test mock fix for prior session test): **SCOPE REDUCED during execution.** The test was more under-mocked than the spec assumed. Initial error at `src/gui_2.py:2333` (imscope.window tuple unpack) was the first of several un-mocked dependencies. After fixing imscope.window, the next failure surfaces at `src/gui_2.py:4496` (render_theme_panel: imgui.begin returning bool where 2-tuple expected). The test calls `render_main_interface` which is a kitchen-sink function requiring 50+ mocks. **Decision: defer Change 2 to a separate follow-up track** that focuses on refactoring the test to either (a) exercise a narrow prior-session render path instead of `render_main_interface`, or (b) add the missing 50+ mocks. The imscope.window fix is still applied as a defensive change (and as a model for future test work).
|
|
- Change 3 (regression unit test): **in scope, critical path.**
|
|
- Change 4 (doc hardening of defer-not-catch sections): **DEFERRED to end of track** — user wants to see how long the critical path takes first. If time permits at the end, do Change 4 as a final commit; otherwise leave for a follow-up patch.
|
|
|
|
### Revised pass-rate target
|
|
- Before track: 269/272 (98.9%)
|
|
- After Change 1: 271/272 (99.6%) — both `test_auto_switch_sim` and `test_workspace_profiles_restoration` should pass; `test_prior_session_no_pop_imbalance` is deferred to a follow-up.
|
|
- After Change 3: 272/272 if Change 2 also fixed, else 271/272 + new regression unit test passes.
|
|
|
|
### Follow-up track: prior_session_test_harden_20260605
|
|
A new track to be queued in `conductor/tracks.md` covering the `test_prior_session_no_pop_imbalance` test's comprehensive mock setup (or refactor to test a narrow path).
|
|
|
|
### Failures (3)
|
|
|
|
| Test | File | Symptom | Root cause |
|
|
|---|---|---|---|
|
|
| `test_auto_switch_sim` | `tests/test_auto_switch_sim.py:47` | `assert False == True` after triggering tier-3 auto-switch | Category A: profile save raises TypeError → no profile saved → load is no-op |
|
|
| `test_workspace_profiles_restoration` | `tests/test_workspace_profiles_sim.py:81` | `assert False is True` after `load_workspace_profile` | Category A: same as above |
|
|
| `test_no_extraneous_pop_when_prior_session_renders` | `tests/test_prior_session_no_pop_imbalance.py:135` | `TypeError: cannot unpack non-iterable NoneType object` at `src/gui_2.py:2333` | Category B: test mock setup for `imscope.window` returns non-iterable, but production code expects `(opened, visible)` tuple |
|
|
|
|
### Test run results (2026-06-05, batched via `scripts/run_tests_batched.py`)
|
|
|
|
- **272 test files, 68 batches, 269/272 passing (98.9%).**
|
|
- 3 failing tests, all in `live_gui` (session-scoped fixture) or `integration` marker category.
|
|
- 0 failing tests in any other category (unit, headless, mock_app, simulation).
|
|
|
|
### Root cause analysis (Category A — both profile failures)
|
|
|
|
A regression introduced by commit `d7487af4` ("fix(gui_2): defer save_ini_settings on first capture to avoid early-render crash"). That commit added a defer-not-catch guard in `_capture_workspace_profile` (`src/gui_2.py:601-606`):
|
|
|
|
```python
|
|
def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
|
|
if not getattr(self, "_ini_capture_ready", False):
|
|
self._ini_capture_ready = True
|
|
ini = b"" # <-- BUG: bytes, not str
|
|
else:
|
|
try:
|
|
ini = imgui.save_ini_settings_to_memory() # returns str
|
|
except Exception:
|
|
ini = b"" # <-- BUG: same
|
|
...
|
|
```
|
|
|
|
The bug: `ini = b""` is a `bytes` literal, but the `WorkspaceProfile` dataclass declares `ini_content: str` (`src/models.py:799`), AND `tomli_w` (the TOML serializer) raises `TypeError: Object of type 'bytes' is not TOML serializable`.
|
|
|
|
Verified empirically:
|
|
```python
|
|
>>> import tomli_w
|
|
>>> tomli_w.dump({"ini_content": b""}, io.BytesIO())
|
|
TypeError: Object of type 'bytes' is not TOML serializable
|
|
```
|
|
|
|
Trace path for the failure:
|
|
1. Test: `set_value('ui_separate_tier1', True)` → field is `True` in app state.
|
|
2. Test: `push_event("custom_callback", {"callback": "save_workspace_profile", ...})`.
|
|
3. GUI: `_process_pending_gui_tasks` → `_cb_save_workspace_profile` (`src/app_controller.py:2870`).
|
|
4. App: `_capture_workspace_profile(name)` → returns `WorkspaceProfile(..., ini_content=b"", ...)`.
|
|
5. `workspace_manager.save_profile(profile)` → `profile.to_dict()` → `{"ini_content": b"", ...}`.
|
|
6. `_save_file` → `tomli_w.dump(data, f)` → **TypeError raised**.
|
|
7. Exception propagates; profile is **NOT saved to disk**; `workspace_profiles` is **NOT reloaded**; `self._app.workspace_profiles` is **NOT updated**.
|
|
8. Test: `set_value('ui_separate_tier1', False)` → field is `False`.
|
|
9. Test: `push_event("custom_callback", {"callback": "load_workspace_profile", ...})`.
|
|
10. App: `_cb_load_workspace_profile(name)` → `if name in self.workspace_profiles:` → `False` (save failed) → **does nothing**.
|
|
11. Test: `assert get_value('ui_separate_tier1') is True` → **fails** (still `False`).
|
|
|
|
The original pre-defer code (`ini = imgui.save_ini_settings_to_memory()`) returned a `str` that round-tripped through TOML successfully; tests passed. The defer fix introduced a type-incompatible sentinel value that broke the serialization contract.
|
|
|
|
The 1-line fix: change `ini = b""` to `ini = ""` (and add a defensive str-coerce for the non-defer path).
|
|
|
|
### Root cause analysis (Category B — prior session test)
|
|
|
|
The test mocks `imscope.window(...)` to return a `MagicMock()` whose `__enter__` returns the bare mock. Production code at `src/gui_2.py:2333` does `with imscope.window(...) as (opened, visible):` which expects a 2-tuple. The test's setup (lines ~70-80) sets `__enter__` for many imscope context managers to return non-iterable `MagicMock()` but for `popup_modal` (line ~91) correctly returns `(True, None)`. The `imscope.window` setup is missing the tuple-return — purely a test-authoring bug.
|
|
|
|
## 2. Goals
|
|
|
|
1. **Restore 100% pass rate on the 272-file test suite** (no regressions in any other test).
|
|
2. **Preserve the defer-not-catch safety property** of commit `d7487af4` (avoid C-level crash on early-render C calls).
|
|
3. **Harden the defer-not-catch documentation** to call out the str/bytes type contract (avoid future regressions of the same kind).
|
|
4. **Tighten the test-authoring contract** for the prior session test: mock imscope context managers with the correct return shape.
|
|
5. **OPTIONAL/DEFERRED:** Harden the defer-not-catch pattern doc with a "sentinel must match consumer type contract" note. Per user review (2026-06-05), this is deferred to the end of the track. If time permits, do it; otherwise leave for a follow-up patch.
|
|
|
|
## 3. Non-Goals
|
|
|
|
- Not refactoring the workspace profile save/load architecture.
|
|
- Not adding wait-for-ready semantics to the test framework (deferred to a separate live_gui harden track; tracked as backlog item 0 in `conductor/tracks.md`).
|
|
- Not fixing the broader test fragility / session-state issues (deferred).
|
|
- Not addressing `sloppy.py` startup latency (separate track, also backlog).
|
|
|
|
## 4. Design
|
|
|
|
### Change 1: Fix `ini = b""` → `ini = ""` in `_capture_workspace_profile`
|
|
|
|
**Files:**
|
|
- Modify: `src/gui_2.py:601-606` (the defer branch)
|
|
- Modify: `src/gui_2.py:606-609` (the non-defer branch's `except` handler)
|
|
|
|
**Approach:** Change `ini = b""` to `ini = ""` in both places. The pre-fix code returned a `str`; we're restoring that contract. Additionally, defensively coerce the non-defer result: `ini = imgui.save_ini_settings_to_memory()` returns a `str` per `imgui-bundle` docs, but to be safe against future imgui-bundle changes, wrap it: `ini = str(imgui.save_ini_settings_to_memory() or "")`.
|
|
|
|
```python
|
|
def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
|
|
if not getattr(self, "_ini_capture_ready", False):
|
|
self._ini_capture_ready = True
|
|
ini = ""
|
|
else:
|
|
try:
|
|
ini = str(imgui.save_ini_settings_to_memory() or "")
|
|
except Exception:
|
|
ini = ""
|
|
panel_states = { ... }
|
|
return models.WorkspaceProfile(...)
|
|
```
|
|
|
|
**Why:** `WorkspaceProfile.ini_content: str` (`src/models.py:799`); `tomli_w` rejects `bytes`. `imgui.load_ini_settings_from_memory(ini_data: str, ...)` also expects `str`. Restoring the `str` contract is the minimal fix.
|
|
|
|
**Alternatives considered:**
|
|
- A2 — Use `imgui.save_ini_settings_to_disk(path)` then read the file. **Rejected**: adds a side-effect path that's not idempotent; tests can pollute the test artifacts dir.
|
|
- A3 — Force a frame render in `__init__` so the first call is safe. **Rejected**: changes init semantics; interacts badly with hot-reload (`src/hot_reloader.py`); may regress startup latency (the very thing the new sloppy.py startup track is meant to address).
|
|
|
|
### Change 2: Fix the prior session test mock
|
|
|
|
**Files:**
|
|
- Modify: `tests/test_prior_session_no_pop_imbalance.py` (the imscope.window mock setup)
|
|
|
|
**Approach:** Add the tuple-return to `imscope.window`'s `__enter__` mock, matching the pattern already used for `popup_modal` at line 91:
|
|
|
|
```python
|
|
mock_imscope.window.return_value.__enter__ = MagicMock(return_value=(True, True))
|
|
mock_imscope.window.return_value.__exit__ = MagicMock(side_effect=_scope_exit)
|
|
```
|
|
|
|
**Why:** The test's `imscope.window` setup is the only one missing the tuple-return; all other imscope context managers that production code expects to unpack as tuples already have it. This is a 2-line test-only fix.
|
|
|
|
### Change 3: Add a regression test for the ini_content type contract
|
|
|
|
**Files:**
|
|
- Create: `tests/test_workspace_profile_serialization.py`
|
|
|
|
**Approach:** Add a unit test that verifies a `WorkspaceProfile` with `ini_content=""` (empty str) round-trips through TOML via `to_dict` → `tomli_w.dump` → `tomllib.load` → `from_dict` without raising. This is the contract that the defer fix violated.
|
|
|
|
```python
|
|
def test_workspace_profile_empty_ini_content_roundtrips():
|
|
from src.models import WorkspaceProfile
|
|
profile = WorkspaceProfile(name="t", ini_content="", show_windows={"A": True}, panel_states={"x": 1})
|
|
d = profile.to_dict()
|
|
import io, tomli_w, tomllib
|
|
buf = io.BytesIO()
|
|
tomli_w.dump({profile.name: d}, buf) # this is what save_profile does
|
|
buf.seek(0)
|
|
back = tomllib.load(buf)
|
|
loaded = WorkspaceProfile.from_dict("t", back["t"])
|
|
assert loaded.ini_content == ""
|
|
assert loaded.show_windows == {"A": True}
|
|
assert loaded.panel_states == {"x": 1}
|
|
```
|
|
|
|
**Why:** This test would have caught the `d7487af4` regression. It encodes the type contract for future contributors. It's a pure unit test, no live_gui, runs in <1s.
|
|
|
|
### Change 4: Harden the defer-not-catch doc
|
|
|
|
**Files:**
|
|
- Modify: `docs/guide_gui_2.md` "Workspace Profile Defer-Not-Catch" section
|
|
- Modify: `docs/guide_testing.md` "Early-Render C-Level Crashes" section
|
|
- Modify: `conductor/workflow.md` "Defer-Not-Catch Pattern for Native Crashes" section
|
|
|
|
**Approach:** Add a note: "When implementing a defer-not-catch guard for a return value, **ensure the sentinel value matches the type contract of the downstream consumer**. For `WorkspaceProfile.ini_content: str`, the sentinel must be `""` (str), not `b""` (bytes) — TOML serialization rejects bytes."
|
|
|
|
**Why:** Future contributors applying the defer-not-catch pattern should not silently introduce type-incompatible sentinels.
|
|
|
|
## 5. Data Flow
|
|
|
|
### Before (buggy)
|
|
```
|
|
set_value(True) → app.ui_separate_tier1 = True
|
|
save_workspace_profile → _capture_workspace_profile → ini=b"" (bytes)
|
|
→ to_dict() → {"ini_content": b""}
|
|
→ tomli_w.dump → TypeError
|
|
→ profile NOT saved
|
|
set_value(False) → app.ui_separate_tier1 = False
|
|
load_workspace_profile → name not in workspace_profiles → no-op
|
|
assert get_value is True → FAILS (still False)
|
|
```
|
|
|
|
### After (fixed)
|
|
```
|
|
set_value(True) → app.ui_separate_tier1 = True
|
|
save_workspace_profile → _capture_workspace_profile → ini="" (str)
|
|
→ to_dict() → {"ini_content": ""}
|
|
→ tomli_w.dump → OK
|
|
→ profile saved
|
|
set_value(False) → app.ui_separate_tier1 = False
|
|
load_workspace_profile → name in workspace_profiles → _apply_workspace_profile
|
|
→ setattr(self, "ui_separate_tier1", True)
|
|
assert get_value is True → PASSES
|
|
```
|
|
|
|
## 6. Error Handling
|
|
|
|
- The defer branch and the `except` branch both set `ini = ""`. Empty string is a valid `str` and is safe for `tomli_w`, for the dataclass, and for `imgui.load_ini_settings_from_memory("")` (which is a no-op that lets ImGui use its defaults).
|
|
- No new exceptions are introduced. The `TypeError` from the buggy `b""` goes away because the type is now `str`.
|
|
- The new regression test (`test_workspace_profile_serialization.py`) is itself a forward-looking guard: if a future change reintroduces a bytes sentinel, the test will fail with a clear message.
|
|
|
|
## 7. Testing Strategy
|
|
|
|
### New tests
|
|
- `tests/test_workspace_profile_serialization.py::test_workspace_profile_empty_ini_content_roundtrips` — pure unit test, <1s, encodes the str contract.
|
|
|
|
### Existing tests that should now pass
|
|
- `tests/test_auto_switch_sim::test_auto_switch_sim` — saves+loads workspace profile.
|
|
- `tests/test_workspace_profiles_sim::test_workspace_profiles_restoration` — saves+loads workspace profile.
|
|
- `tests/test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders` — mock setup fix.
|
|
|
|
### Regression check
|
|
- Re-run the full batched test suite (`scripts/run_tests_batched.py`) after the fixes; expect 272/272 pass.
|
|
- Re-run targeted batches of theme tests (`test_theme*`, `test_log_pruner*`, `test_view_presets*`, `test_gui_progress*`, `test_gui_phase4*`) to verify the prior doc-track fixes still pass.
|
|
|
|
## 8. Risk Assessment
|
|
|
|
| Risk | Likelihood | Impact | Mitigation |
|
|
|---|---|---|---|
|
|
| The `str()` coercion in the non-defer branch changes behavior | Low | Low | `imgui.save_ini_settings_to_memory()` is documented to return `str`; the coercion is defensive only. The `or ""` handles a `None` return (which `imgui-bundle` does not produce but we don't want to crash on). |
|
|
| The new unit test depends on `tomli_w` semantics that change | Very low | Low | `tomli_w` is a stable dep; the test would only break if `bytes` becomes serializable, which would be a major version change. |
|
|
| The mock fix in the prior session test changes other behavior | Low | Low | The fix only adds the missing tuple-return; existing mocks for other imscope context managers are untouched. |
|
|
| Removing the `b""` sentinel causes the early-render C crash to return | Very low | High | The `try/except Exception` around `imgui.save_ini_settings_to_memory()` is preserved; the flag-based defer is preserved. Only the type of the sentinel changes. |
|
|
|
|
## 9. Out of Scope (Tracked Separately)
|
|
|
|
- **live_gui session-state contract** (test-authoring rigor, wait-for-ready pattern) — see [docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state] (added in this session). This is a doc-only change; tests will be hardened over time as they break.
|
|
- **sloppy.py startup latency** — new backlog item 0 in `conductor/tracks.md`, planned via superpowers writing-plans skill in a future session.
|
|
- **Other live_gui tests still flagged as fragile in the regression-fixes plan** (MMA engine state transitions, RAG status timing) — these were in the deferred category of the `regression_fixes_20260605` plan; not addressed by this design.
|
|
|
|
## 10. References
|
|
|
|
- Commit `d7487af4` — the defer-not-catch fix that introduced the `b""` sentinel.
|
|
- `src/gui_2.py:601-606` — current defer code.
|
|
- `src/models.py:797-823` — `WorkspaceProfile` dataclass with `ini_content: str`.
|
|
- `src/workspace_manager.py:48-58` — `save_profile` that calls `to_dict` then `tomli_w.dump`.
|
|
- `docs/guide_gui_2.md#workspace-profile-defer-not-catch` — the defer-not-catch section to harden.
|
|
- `docs/guide_testing.md#known-gotchas-2026-06-05` — the early-render C-crash section to harden.
|
|
- `conductor/tracks.md` — `regression_fixes_20260605` and `multi_themes_20260604` entries.
|
|
- `conductor/tracks.md` — new backlog item 0 (sloppy.py startup speedup).
|