docs(spec+plan): undo_redo_lifecycle_fix (3-phase investigation: state-sync vs snapshot vs flake)

2026-06-05 22:49:16 -04:00
parent b692353e98
commit 3e52f20d16
2 changed files with 244 additions and 0 deletions
@@ -0,0 +1,161 @@
+# undo_redo_lifecycle_fix_20260605 Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Resolve the `test_undo_redo_lifecycle` failure. Phase 1: verify state-sync fix is sufficient. Phase 2: investigate snapshot mechanism if needed. Phase 3: flake-fix with polling if needed.
+
+**Architecture:** Sequential investigation. Cheapest fix first.
+
+**Tech Stack:** Python 3.11+, pytest 9.0.
+
+---
+
+## File Structure
+
+| File | Change | Purpose |
+|---|---|---|
+| (Phase 1) None | | |
+| (Phase 2) `src/history.py`, `src/gui_2.py`, `tests/test_undo_redo_ai_input_snapshot.py` | Possibly modify | Fix snapshot if it doesn't include ai_input |
+| (Phase 3) `tests/test_undo_redo_sim.py` | Possibly modify | Replace time.sleep with polling |
+
+---
+
+## Task 1: Phase 1 — Run the test, see if it passes after the state-sync fix
+
+**Files:** (no changes; verification)
+
+- [ ] **Step 1.1: Run the test in isolation**
+
+```powershell
+cd C:\projects\manual_slop; uv run pytest tests/test_undo_redo_sim.py::test_undo_redo_lifecycle -v --timeout=60
+```
+
+Expected outcomes:
+- **A) PASSES** → Done. The state-sync fix is sufficient. Skip to Task 4 (documentation).
+- **B) FAILS** → Proceed to Task 2 (Phase 2: investigate snapshot).
+
+- [ ] **Step 1.2: Document the outcome**
+
+If passes: commit a doc-only note confirming state-sync fixed it.
+
+```powershell
+cd C:\projects\manual_slop; git -c core.autocrlf=false commit --allow-empty -m "verify: undo_redo_lifecycle passes after state-sync fix"
+$h = git -C C:\projects\manual_slop log -1 --format='%H'
+git -C C:\projects\manual_slop notes add -m "Confirmed: the ui_ai_input property delegation in live_gui_state_sync_20260605 fixes test_undo_redo_lifecycle. The snapshot reads app.ui_ai_input (now delegated to controller.ui_ai_input where the value lives) and captures the right value. Undo restores correctly." $h
+```
+
+If fails: proceed to Task 2.
+
+---
+
+## Task 2: Phase 2 — Check the snapshot mechanism for `ai_input`
+
+**Files:** (read-only; possibly modify later)
+
+- [ ] **Step 2.1: Read `UISnapshot` definition**
+
+Read `src/history.py` to find the `UISnapshot` dataclass. List its fields.
+
+```powershell
+cd C:\projects\manual_slop; uv run python -c "
+import re
+with open('src/history.py', 'r', encoding='utf-8') as f:
+    content = f.read()
+m = re.search(r'class UISnapshot', content)
+if m:
+    print(content[m.start():m.start()+500])
+"
+```
+
+- [ ] **Step 2.2: Check if `ai_input` is a field**
+
+- **A) `ai_input` is a field** → Task 3: check `_apply_snapshot` for restore line.
+- **B) `ai_input` is NOT a field** → Add it. See Step 2.3.
+
+- [ ] **Step 2.3: If `ai_input` is missing from UISnapshot, add it**
+
+Add `ai_input: str = ""` to the UISnapshot dataclass.
+
+In `src/gui_2.py:_take_snapshot` (line 551), add `ai_input=self.ui_ai_input,`.
+
+In `src/gui_2.py:_apply_snapshot` (line 569), add `self.ui_ai_input = snapshot.ai_input`.
+
+Commit:
+
+```powershell
+cd C:\projects\manual_slop; git add src/history.py src/gui_2.py
+git -C C:\projects\manual_slop commit -m "fix(gui_2): add ai_input to UISnapshot for undo/redo round-trip"
+$h = git -C C:\projects\manual_slop log -1 --format='%H'
+git -C C:\projects\manual_slop notes add -m "Add ai_input field to UISnapshot (src/history.py), capture in _take_snapshot, restore in _apply_snapshot. The undo/redo system was silently dropping ai_input changes; this fixes test_undo_redo_lifecycle." $h
+```
+
+- [ ] **Step 2.4: Run the test**
+
+```powershell
+cd C:\projects\manual_slop; uv run pytest tests/test_undo_redo_sim.py::test_undo_redo_lifecycle -v --timeout=60
+```
+
+Expected: 1 passed.
+
+If still fails → proceed to Task 3 (Phase 3: flake-fix with polling).
+
+---
+
+## Task 3: Phase 3 — Test-ordering / flake investigation
+
+**Files:**
+- Modify: `tests/test_undo_redo_sim.py` (replace time.sleep with polling)
+
+- [ ] **Step 3.1: Add the polling helpers (or import from wait_for_ready track)**
+
+```python
+import time
+
+def wait_for_value(client, item, expected, timeout=5.0):
+    deadline = time.time() + timeout
+    while time.time() < deadline:
+        if client.get_value(item) == expected:
+            return
+        time.sleep(0.1)
+    raise TimeoutError(f"Item '{item}' did not become {expected!r} within {timeout}s")
+```
+
+- [ ] **Step 3.2: Replace the time.sleep calls**
+
+- [ ] **Step 3.3: Run the test**
+
+- [ ] **Step 3.4: Commit**
+
+```powershell
+cd C:\projects\manual_slop; git add tests/test_undo_redo_sim.py
+git -C C:\projects\manual_slop commit -m "test(undo_redo): replace time.sleep with wait_for_value polling"
+```
+
+---
+
+## Task 4: Update tracks.md
+
+**Files:**
+- Modify: `conductor/tracks.md`
+
+- [ ] **Step 4.1: Add a note about the outcome**
+
+```powershell
+cd C:\projects\manual_slop; git add conductor/tracks.md
+git -C C:\projects\manual_slop commit -m "conductor: undo_redo_lifecycle sub-track complete"
+```
+
+---
+
+## Self-Review
+
+- **Spec coverage:** 3-phase sequential investigation. State-sync fix may resolve it (Phase 1). If not, snapshot investigation (Phase 2). If not, flake-fix (Phase 3).
+- **Placeholders:** None.
+- **Type consistency:** `ai_input: str` matches the existing type.
+- **Risk:** Low — only investigation + minimal source change.
+
+---
+
+## Execution Handoff
+
+Inline execution. Up to 4 tasks; some may be skipped depending on the outcome of Phase 1.
@@ -0,0 +1,83 @@
+# undo_redo_lifecycle_fix_20260605 — Design
+
+**Date:** 2026-06-05
+**Status:** Draft
+**Track:** undo_redo_lifecycle_fix_20260605 (sub-project of v2)
+
+## Problem Statement
+
+`tests/test_undo_redo_sim.py::test_undo_redo_lifecycle` failed in the 2026-06-05 second batched test run (after the first run had it passing). The test:
+
+1. Sets `temperature=0.5` and `ai_input="Initial Input"`.
+2. Modifies to `temperature=1.5` and `ai_input="Modified Input"`.
+3. Asserts current state — passes.
+4. Clicks `btn_undo`.
+5. Asserts `ai_input == "Initial Input"` and `temperature == 0.5`.
+6. **Fails on the `ai_input` assertion**: gets `''` (empty string).
+
+The undo restores `temperature` correctly but not `ai_input`. The other 2 tests in the same file (`test_undo_redo_discussion_mutation`, `test_undo_redo_context_mutation`) pass — they don't exercise `ai_input`.
+
+### Possible causes
+
+1. **App/Controller state sync bug for `ai_input`.** The snapshot at `src/gui_2.py:551` reads `self.ui_ai_input` (App), but `set_value` writes to `controller.ui_ai_input`. The snapshot captures the App's (stale) value. **This should be fixed by the `live_gui_state_sync_20260605` track** (which adds an `ui_ai_input` property on the App that delegates to the Controller).
+
+2. **Snapshot doesn't include `ai_input` field at all.** Check `src/history.py:UISnapshot` — if `ai_input` isn't a field, the snapshot stores nothing, and the apply can't restore.
+
+3. **Test flake.** The test was passing in the first run, failing in the second. The `live_gui` fixture is session-scoped, and different test orders can produce different state. The test's `time.sleep(2.0)` after `btn_undo` may not be enough if the GUI is under load.
+
+4. **Recent user commit `873edf42` regression.** The user's cleanup commit touched 53 files including `src/gui_2.py:261` lines. If the cleanup accidentally changed the snapshot mechanism, this could break the test.
+
+## Design
+
+### Approach: Two-phase investigation
+
+**Phase 1: Re-run the test after the `live_gui_state_sync_20260605` track lands.**
+
+If the state-sync property fix for `ui_ai_input` unblocks the test, the issue is resolved. No further work needed.
+
+**Phase 2: If the test still fails, deep-dive into the snapshot mechanism.**
+
+Investigate in this order:
+1. Check `src/history.py:UISnapshot` to see if `ai_input` is a field. If not, add it.
+2. Check `src/gui_2.py:_apply_snapshot` to see if it restores `ai_input`. If not, add the restore line.
+3. Check if there's a per-tick snapshot filter that excludes certain fields.
+4. Add a regression test that explicitly verifies the snapshot/undo round-trip for `ai_input`.
+
+**Phase 3: If still failing, test-ordering / flake investigation.**
+
+The test uses `time.sleep(2.0)` after `btn_undo`. Convert to polling (`wait_for_load_completion` from the `wait_for_ready_test_pattern_20260605` track). If the test passes with polling, it was a flake.
+
+### Why this approach
+
+- **Sequential investigation**: cheapest fixes first. State-sync is the most likely cause (it just landed as a property fix). Snapshot mechanism is the second most likely. Flake is the third.
+- **No speculative changes**: don't add `ai_input` to the snapshot if it's already there. Don't change the undo mechanism if the state-sync fix is sufficient.
+
+## File Changes
+
+### Phase 1: None (state-sync fix is in a different track)
+
+### Phase 2 (if needed):
+
+- Modify: `src/history.py` (add `ai_input` field to UISnapshot if missing)
+- Modify: `src/gui_2.py:_apply_snapshot` (add `ai_input` restore line if missing)
+- Add: `tests/test_undo_redo_ai_input_snapshot.py` (regression test for the round-trip)
+
+### Phase 3 (if needed):
+
+- Modify: `tests/test_undo_redo_sim.py` (replace `time.sleep(2.0)` with `wait_for_load_completion`)
+
+## Risk Assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| Phase 1 fixes the issue | High | None | Done |
+| Phase 2 needed: snapshot already has ai_input but apply doesn't restore | Medium | Low | Check first, then add the restore line |
+| Phase 2 needed: snapshot doesn't have ai_input | Low | Low | Add the field + apply line |
+| Phase 3 needed: it's a flake | Low | None | Replace sleeps with polling |
+
+## Out of Scope
+
+- **State sync fix** — separate track (`live_gui_state_sync_20260605`).
+- **prior_session test** — separate track (`prior_session_test_harden_20260605`).
+- **wait_for_ready pattern** — separate track (`wait_for_ready_test_pattern_20260605`).
+- **General undo/redo system improvements** — out of scope.