docs(spec+plan): prior_session_test_harden (refactor to narrow render_prior_session_view)

2026-06-05 22:41:46 -04:00
parent 9542c4c750
commit 85cd34683a
2 changed files with 340 additions and 0 deletions
@@ -0,0 +1,118 @@
+# prior_session_test_harden_20260605 — Design
+
+**Date:** 2026-06-05
+**Status:** Draft
+**Track:** prior_session_test_harden_20260605 (sub-project of v2)
+
+## Problem Statement
+
+`tests/test_prior_session_no_pop_imbalance.py::test_no_extraneous_pop_when_prior_session_renders` fails with `TypeError: cannot unpack non-iterable NoneType object` at `src/gui_2.py:2333` (`imscope.window(...) as (opened, visible):`).
+
+Root cause: the test mocks `imscope.window`'s `__enter__` to return a non-iterable `MagicMock()`, but the production code expects a 2-tuple. **AND** the test exercises `gui_2.render_main_interface(app_instance)`, a kitchen-sink function that calls dozens of other render functions, each with their own mock-shape requirements. After fixing the imscope.window tuple-return, the next failure surfaces at `src/gui_2.py:4496` (render_theme_panel: imgui.begin returning bool where 2-tuple expected). The test would need 50+ mocks to fully exercise `render_main_interface`.
+
+## Test's Actual Intent
+
+The test's only assertion is `assert push_count["n"] == pop_count["n"]` — verify that `imscope.style_color` push and pop counts balance when the prior-session render runs. This is a narrow, well-defined contract.
+
+The test does NOT need to exercise the entire `render_main_interface`. It only needs to exercise the prior-session render path.
+
+## Design
+
+### Approach: Call the narrow prior-session render function, not the kitchen sink
+
+`src/gui_2.py` has a dedicated `render_prior_session_view(app)` function (line ~4400) that handles the prior-session rendering. It's a ~30-line function with a finite, mockable set of imgui/imscope calls.
+
+**Hypothetical refactor:**
+
+```python
+def test_no_extraneous_pop_when_prior_session_renders():
+    from src import gui_2
+    from unittest.mock import MagicMock, patch
+
+    app_instance = MagicMock()
+    app_instance.is_viewing_prior_session = True
+    app_instance.perf_profiling_enabled = False
+    app_instance.prior_disc_entries = [
+        {"role": "User", "content": "test", "collapsed": False, "ts": "t1"}
+    ]
+
+    push_count = {"n": 0}
+    pop_count = {"n": 0}
+    def _track_push(*a, **k): push_count["n"] += 1
+    def _track_pop(*a, **k): pop_count["n"] += 1
+
+    with patch("src.gui_2.imgui") as mock_imgui, \
+         patch("src.gui_2.imscope") as mock_imscope, \
+         patch("src.gui_2.theme") as mock_theme, \
+         patch("src.gui_2.markdown_helper") as mock_md:
+
+        # Wire push/pop tracking on imscope.style_color
+        mock_imscope.style_color.return_value.__enter__.side_effect = _track_push
+        mock_imscope.style_color.return_value.__exit__.side_effect = lambda *a: (pop_count.__setitem__("n", pop_count["n"] + 1) or False)
+
+        # Set up tuple-return for ALL imscope context managers (style_color, child, id, etc.)
+        for sc in [mock_imscope.style_color, mock_imscope.child, mock_imscope.id]:
+            sc.return_value.__enter__ = MagicMock()
+            sc.return_value.__exit__ = MagicMock(return_value=False)
+
+        # Mock the small finite set of imgui calls used by render_prior_session_view
+        mock_imgui.Col_ = MagicMock()
+        mock_imgui.button = MagicMock(return_value=False)
+        mock_imgui.same_line = MagicMock()
+        mock_imgui.text_colored = MagicMock()
+        mock_imgui.separator = MagicMock()
+        mock_imgui.get_content_region_avail = MagicMock(return_value=MagicMock(x=800.0, y=600.0))
+        mock_imgui.ImVec2 = lambda *a: MagicMock(x=a[0], y=a[1])
+        mock_imgui.WindowFlags_ = MagicMock()
+        mock_imgui.text = MagicMock()
+
+        mock_theme.get_color = MagicMock(return_value=MagicMock(x=0,y=0,z=0,w=0))
+        mock_theme.ai_text_style.return_value.__enter__ = MagicMock()
+        mock_theme.ai_text_style.return_value.__exit__ = MagicMock(return_value=False)
+
+        mock_md.render = MagicMock()
+
+        # Call the narrow function, NOT the kitchen sink
+        gui_2.render_prior_session_view(app_instance)
+
+    assert push_count["n"] == pop_count["n"], f"Push/pop imbalance: pushes={push_count['n']}, pops={pop_count['n']}"
+```
+
+This is ~30 mocks instead of 50+, scoped to what `render_prior_session_view` actually uses. The imscope mocks all return their own context-manager defaults (no need to return a tuple for `style_color` since `with imscope.style_color(...) as c:` doesn't unpack). The test's actual assertion (push/pop balance) is preserved.
+
+### Why this approach
+
+- **Smallest change to the test**: removes 50+ mocks, replaces with 30+ scoped mocks. Test runs faster.
+- **Preserves test intent**: the assertion is still about push/pop balance in the prior-session render.
+- **Survives future refactors**: as long as `render_prior_session_view` exists, the test is meaningful. If the function is renamed/restructured, the test is localized to that function.
+- **Aligns with the live_gui test philosophy**: tests should exercise narrow paths, not kitchen sinks. (This is consistent with the [docs/guide_testing.md Authoring Robust live_gui Tests] rules I just authored.)
+
+### Alternatives considered
+
+- **A2: Add 50+ mocks to make `render_main_interface` work.** Rejected: the test becomes a maintenance burden (any change to any sub-render function breaks the test). It also tests too much (push/pop balance in the entire GUI, not just prior-session).
+- **A3: Skip the test entirely, mark as known-flake.** Rejected: the test is meaningful and verifies a real contract. Better to make it work.
+
+## File Changes
+
+### Modify: `tests/test_prior_session_no_pop_imbalance.py`
+
+Replace the `render_main_interface(app_instance)` call with `render_prior_session_view(app_instance)`. Remove the mocks for the 50+ imgui methods that are NOT used by `render_prior_session_view` (e.g. `selectable`, `tree_node`, `set_scroll_here_y`, etc.). Keep the mocks for the 30+ methods that ARE used.
+
+### No production code changes
+
+The test is rewritten; `render_prior_session_view` itself does not change.
+
+## Risk Assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| `render_prior_session_view` signature/name changes | Low | Medium | The test is local to this function; future refactors will update both |
+| Mocking too aggressively (mocking something the function actually uses) | Medium | Low | Run the test; if it fails, add the missing mock |
+| Test was testing more than just push/pop balance (e.g. some side effect) | Low | Low | Read the original test docstring; the only assertion is push/pop balance |
+
+## Out of Scope
+
+- **State sync fix** — separate track (`live_gui_state_sync_20260605`).
+- **Wait-for-ready pattern** — separate track (`wait_for_ready_test_pattern_20260605`).
+- **undo_redo_lifecycle** — separate track (`undo_redo_lifecycle_fix_20260605`).
+- **Refactoring `render_main_interface` to be smaller** — deferred; out of scope for this track.