docs(todo): simple todo list for fixing test_full_live_workflow race

2026-06-08 09:25:18 -04:00
parent 08ee7547be
commit 8248a49f1e
1 changed files with 74 additions and 0 deletions
@@ -0,0 +1,74 @@
+# TODO: Fix test_full_live_workflow race condition
+
+**Report:** `docs/reports/test_full_live_workflow_root_cause_20260608.md`
+**Failure reproducibility:** 100% in tier-3 batch, 0% in isolation
+**Status:** Not started (investigation only; no fix code yet)
+
+## Tasks (simple, ordered by ROI)
+
+### 1. [HIGH] Add deterministic signal endpoint
+- **What:** Add `GET /api/project_switch_status` returning `{"in_progress": bool, "path": str | null, "error": str | null}`.
+- **Where:** `src/api_hooks.py` (new handler) + `src/app_controller.py` (track `_project_switch_in_progress` + `_project_switch_error` state).
+- **Why:** Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
+- **Pattern:** See `src/api_hooks.py:336-363` (`/api/warmup_wait`) for the existing pattern of "block until condition, return final state".
+- **Acceptance:** Test polls `/api/project_switch_status` until `in_progress == False` and `path == expected` and `error is None`. Times out after 30s with clear error.
+
+### 2. [HIGH] Reset project state in `_handle_reset_session`
+- **What:** Add `self.project = {}; self.active_project_path = ""; self.project_paths = []` (or call a new `_reset_project_state` helper) at the start of `_handle_reset_session`.
+- **Where:** `src/app_controller.py:3244-3296`.
+- **Why:** The session-scoped `live_gui` fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state.
+- **Acceptance:** After `client.click("btn_reset")` followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.
+
+### 3. [MED] Replace `os.path.abspath("tests/artifacts/temp_project.toml")` with fixture-provided path
+- **What:** Have the `live_gui` fixture provide `temp_project_path` (str) derived from its own `temp_workspace` directory.
+- **Where:** `tests/conftest.py` (live_gui fixture) + `tests/test_live_workflow.py:50`.
+- **Why:** cwd-relative path is fragile; fixture-relative path is stable.
+- **Acceptance:** Test does `temp_project_path = live_gui_temp_project_path` (or accesses it as a fixture attribute). No more `os.path.abspath("tests/artifacts/...")`.
+
+### 4. [MED] Replace 10×1s blind poll with condition-based wait
+- **What:** Use the new `/api/project_switch_status` endpoint with a single `wait_for_condition` call (or `client.wait_for_project_active(name, timeout=30)` helper).
+- **Where:** `tests/test_live_workflow.py:58-65` + new `ApiHookClient.wait_for_project_active` method.
+- **Why:** Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
+- **Pattern:** See `src/api_hook_client.py:wait_for_server` (existing pattern in the same client).
+- **Acceptance:** Test fails fast (within 5-10s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate".
+
+### 5. [LOW] Add defensive state assertions
+- **What:** Before polling for activation, verify:
+  - The file was created: `assert os.path.exists(temp_project_path)`
+  - The click was enqueued: check `client.get_events()` for the `click` task
+- **Where:** `tests/test_live_workflow.py:55-65`.
+- **Why:** Catches the case where the click was dropped or the handler crashed before writing the file.
+- **Acceptance:** If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.
+
+### 6. [LOW] Add `pytest.mark.live` to pyproject.toml markers
+- **What:** Append `"live: marks tests as live visualization tests (not in CI by default)"` to `[tool.pytest.ini_options].markers`.
+- **Where:** `pyproject.toml`.
+- **Why:** Silences the `PytestUnknownMarkWarning: Unknown pytest.mark.live` warnings emitted by `test_visual_mma.py`, `test_visual_sim_gui_ux.py`. The mark already exists; pyproject just doesn't know about it.
+- **Acceptance:** `uv run pytest tests/ 2>&1 | grep -i UnknownMark` returns 0 lines.
+
+### 7. [LOW] Add `tests/.test_durations.json` recording in CI / dev convenience
+- **What:** Add a dev-mode shortcut to record durations once the fix lands (e.g. `python scripts/run_tests_batched.py --durations`).
+- **Where:** `scripts/run_tests_batched.py` already has `--durations` flag; just need a one-time run + commit.
+- **Why:** The categorizer uses `.test_durations.json` for `speed` auto-inference. Currently all files default to MEDIUM speed.
+- **Acceptance:** `tests/.test_durations.json` exists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)
+
+## Order of work
+
+1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.
+
+5 is a defensive complement. Add with 1-4.
+
+6, 7 are unrelated cleanup. Do in a separate small commit.
+
+## Estimated time
+
+- Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
+- Tasks 6, 7: 5-10 minutes each
+
+## Verification
+
+After fix:
+- `uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color` shows `<<< tier-3-live_gui PASS`
+- `uv run pytest tests/test_live_workflow.py` still PASSes in isolation
+- `uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py` (siblings) PASSes
+- Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")