docs(workflow): add 3 test-hell lessons to Known Pitfalls + Live_gui Test Fragility

Known Pitfalls (new subsection): - HARD BAN: git checkout -- <file>, git restore, git reset (per AGENTS.md Critical Anti-Patterns; destroyed user in-progress edits twice on 2026-06-07; concrete 2026-06-10 incident: mma_tier_usage_reset_fix regression) Live_gui Test Fragility (2 new subsections): - Anti-pattern: push_event + time.sleep(N) + assert is a race. Fix: poll-until-state-visible with bounded retries. 5+ tests affected in 2026-06-10 batch-green wave. - Async setters need poll-for-state. mma_state_update and rag_* setters dispatch to _pending_gui_tasks queue; the setter returns before the GUI render loop processes the task. Assert immediately = race. Fix: poll via get_value with bounded retry.
2026-06-10 20:19:54 -04:00
parent 01ea22fc4a
commit 965e015709
1 changed files with 50 additions and 0 deletions
@@ -401,6 +401,29 @@ To emulate the 4-Tier MMA Architecture within the standard Conductor extension w

 ## Known Pitfalls (2026-06-05)

+### HARD BAN: `git checkout -- <file>`, `git restore`, `git reset` (Added 2026-06-10)
+
+**Per AGENTS.md (Critical Anti-Patterns):** These three commands are FORBIDDEN without explicit user permission in the same message. They destroyed user in-progress `src/*` edits twice in one session (2026-06-07). If you think you need one, ASK FIRST.
+
+The intent of "look at what the file looked like at commit X" is non-destructive inspection. The CORRECT way:
+
+```bash
+# WRONG: overwrites the working tree
+git checkout HEAD~1 -- src/foo.py
+
+# RIGHT: prints to stdout, leaves working tree alone
+git show HEAD~1:src/foo.py
+```
+
+`git checkout -- <file>` and `git restore` are particularly dangerous because:
+- They overwrite uncommitted changes silently
+- They overwrite previously-committed state in the working tree if the user has already committed and then re-edited
+- The user doesn't see the loss until they notice missing changes
+
+If you genuinely need to revert (e.g., the working tree is broken from a previous agent), use `git stash` first to capture the in-progress state, ASK THE USER, then proceed.
+
+This was the actual cause of the 2026-06-10 `mma_tier_usage_reset_fix` regression: an agent used `git checkout --` to "peek at baseline", which overwrote the just-committed FR1+FR2 fixes. Recovery was via re-applying the fixes with `edit_file` (option B chosen by the user). Don't repeat this.
+
 ### Defer-Not-Catch Pattern for Native Crashes

 `imgui-bundle` (and similar native extension libraries) expose C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python** — `try/except Exception` does not intercept native access violations, only Python exceptions.
@@ -426,6 +449,33 @@ In particular, watch for:

 `live_gui` is a session-scoped fixture. All tests in a session share the same `sloppy.py` subprocess. A test that "passes when run after test X but fails in isolation" is a **fragile test, not a fragile fixture**. The fixture is session-scoped by design; the test must explicitly wait-for-ready, reset state via Hook API, and verify preconditions via `get_value`/`wait_for_event` rather than assuming a "clean" ImGui state from a prior test. See [../docs/guide_testing.md](../docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state) for the 5-rule authoring contract with anti-pattern vs pattern code examples. Bisect failures by running the test both in the full suite and in isolation to distinguish "test needs work" from "real app bug".

+#### Anti-Pattern: `push_event` + `time.sleep(N)` + `assert` is a guaranteed race (Added 2026-06-10)
+
+The pattern `push_event(...)` → `time.sleep(N)` → `assert` is a guaranteed race condition in batched runs. The first time you write this, the test passes in isolation because the sleep happens to be long enough. Then it lands in the batched run, the subprocess is busier, the sleep is no longer long enough, and the assert fires before the GUI render loop has processed the event.
+
+**Fix:** Replace `time.sleep(N)` with a poll loop on `get_value` or `wait_for_event`. The poll doubles as a wait-for-ready AND a correctness assertion.
+
+```python
+# WRONG: race condition
+def test_open_modal(live_gui):
+    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
+    time.sleep(1)  # hope the modal opened
+    assert some_cached_value["settings_open"] is True  # may be stale
+
+# RIGHT: poll-until-state-visible
+def test_open_modal(live_gui):
+    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
+    assert client.get_value("show_settings_modal"), "settings modal did not open"
+```
+
+This pattern surfaced 5+ times in the 2026-06-10 batch-green wave (test_reset_session_clears_mma_and_rag, test_visual_mma, test_visual_sim_gui_ux, test_gui_ux_event_routing, test_z_negative_flows). The fix is always the same: replace `time.sleep` with a poll loop bounded by a retry timeout (typically 5-20 iterations × 0.5s).
+
+#### Async Setters Need Poll-For-State (Added 2026-06-10)
+
+`mma_state_update` and `rag_*` setters operate asynchronously via the `_pending_gui_tasks` queue (`src/app_controller.py:_pending_gui_tasks_lock` and the dispatch logic in `src/gui_2.py:_process_pending_gui_tasks`). The setter returns before the GUI render loop processes the task. Asserting immediately after a setter call is a race.
+
+**Fix:** Poll via `get_value` with a bounded retry loop, not a single `time.sleep`. The setters that need this treatment include (but are not limited to): `mma_state_update`, `rag_enabled`, `rag_source`, `rag_emb_provider`, `rag_chunk_size`, `rag_chunk_overlap`, and any other `set_value` that targets a `_pending_gui_tasks`-dispatched field. See [../docs/guide_testing.md §MMA and RAG State in `reset_session()`](../docs/guide_testing.md) for the full state-bucket list.
+

 ### Indentation-Driven Class Method Visibility (CRITICAL)