# TODO: Fix test_full_live_workflow race condition

**Report:** `docs/reports/test_full_live_workflow_root_cause_20260608.md`

**Failure reproducibility:** 100% in tier-3 batch, 0% in isolation

**Status:** Tasks 1+2 SHIPPED (commit `6ecb31ea`); Tasks 3-7 remaining

## Tasks (simple, ordered by ROI)

### 1. [HIGH] Add deterministic signal endpoint ✅ SHIPPED (commit 6ecb31ea)

- **What:** Add `GET /api/project_switch_status` returning `{"in_progress": bool, "path": str | null, "error": str | null}`.
- **Where:** `src/api_hooks.py` (new handler) + `src/app_controller.py` (track `_project_switch_in_progress` + `_project_switch_error` state).
- **Why:** Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
- **Pattern:** See `src/api_hooks.py:336-363` (`/api/warmup_wait`) for the existing pattern of "block until condition, return final state".
- **Acceptance:** Test polls `/api/project_switch_status` until `in_progress == False` and `path == expected` and `error is None`. Times out after 30s with clear error.
- **Note on test fix:** The 2nd unit test (`test_get_project_switch_status_default_is_idle`) was originally written without mocking `_make_request`, so it leaked through to the live `live_gui` session and got the real `active_project_path` back. Fixed in same commit by adding `patch.object(client, "_make_request")` mock. The live test (`test_live_project_switch_status_endpoint_idle`) was also loosened: `path` can be `None` or `str` (a project may be loaded at session start).

### 2. [HIGH] Reset project state in `_handle_reset_session` ✅ SHIPPED (commit 6ecb31ea) + REGRESSION FIXED (commit e0a3eb8c)

- **What:** Add `self.project = {}; self.project_paths = []` at the start of `_handle_reset_session`. Do NOT clear `self.active_project_path`.
- **Where:** `src/app_controller.py:3244-3296`.
- **Why:** The session-scoped `live_gui` fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state.
- **Acceptance:** After `client.click("btn_reset")` followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.
- **Implementation note (commit 6ecb31ea):** Mirrors `__init__` default-project branch: creates a fresh `project_manager.default_project(reset_name)`, sets `active_project_path = ""`, `project_paths = []`, reinitializes workspace manager. 3 unit tests pass.
- **Regression (discovered in commit 6ecb31ea, fixed in commit e0a3eb8c):** Setting `self.active_project_path = ""` caused `test_context_sim_live` to fail. Root cause: `_do_project_switch` calls `_flush_to_project()` which writes to `self.active_project_path` (raises `OSError` on empty path), and the `finally` block's `_switch_project(pending)` re-submitted the failed switch in an infinite loop. Status stuck at "switching to: ..." for 5+ seconds. Fix: keep `self.active_project_path` as-is. Only replace `self.project` (fresh default) and clear `self.project_paths`. The stale state is solved by replacing the project dict. Also removed the `WorkspaceManager(project_root=None)` reinit (not needed for the bug). 3 unit tests + 16 related regression tests pass. `test_full_live_workflow` passes in 10.19s in isolation.

### 3. [MED] Replace `os.path.abspath("tests/artifacts/temp_project.toml")` with fixture-provided path

- **What:** Have the `live_gui` fixture provide `temp_project_path` (str) derived from its own `temp_workspace` directory.
- **Where:** `tests/conftest.py` (live_gui fixture) + `tests/test_live_workflow.py:50`.
- **Why:** cwd-relative path is fragile; fixture-relative path is stable.
- **Acceptance:** Test does `temp_project_path = live_gui_temp_project_path` (or accesses it as a fixture attribute). No more `os.path.abspath("tests/artifacts/...")`.

### 4. [MED] Replace 10×1s blind poll with condition-based wait

- **What:** Use the new `/api/project_switch_status` endpoint with a single `wait_for_condition` call (or `client.wait_for_project_active(name, timeout=30)` helper).
- **Where:** `tests/test_live_workflow.py:58-65` + new `ApiHookClient.wait_for_project_active` method.
- **Why:** Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
- **Pattern:** See `src/api_hook_client.py:wait_for_server` (existing pattern in the same client).
- **Acceptance:** Test fails fast (within 5-10s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate".

### 5. [LOW] Add defensive state assertions

- **What:** Before polling for activation, verify:
  - The file was created: `assert os.path.exists(temp_project_path)`
  - The click was enqueued: check `client.get_events()` for the `click` task
- **Where:** `tests/test_live_workflow.py:55-65`.
- **Why:** Catches the case where the click was dropped or the handler crashed before writing the file.
- **Acceptance:** If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.

### 6. [LOW] Add `pytest.mark.live` to pyproject.toml markers

- **What:** Append `"live: marks tests as live visualization tests (not in CI by default)"` to `[tool.pytest.ini_options].markers`.
- **Where:** `pyproject.toml`.
- **Why:** Silences the `PytestUnknownMarkWarning: Unknown pytest.mark.live` warnings emitted by `test_visual_mma.py`, `test_visual_sim_gui_ux.py`. The mark already exists; pyproject just doesn't know about it.
- **Acceptance:** `uv run pytest tests/ 2>&1 | grep -i UnknownMark` returns 0 lines.

### 7. [LOW] Add `tests/.test_durations.json` recording in CI / dev convenience

- **What:** Add a dev-mode shortcut to record durations once the fix lands (e.g. `python scripts/run_tests_batched.py --durations`).
- **Where:** `scripts/run_tests_batched.py` already has `--durations` flag; just need a one-time run + commit.
- **Why:** The categorizer uses `.test_durations.json` for `speed` auto-inference. Currently all files default to MEDIUM speed.
- **Acceptance:** `tests/.test_durations.json` exists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)

## Order of work

- 1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.
- 5 is a defensive complement. Add with 1-4.
- 6, 7 are unrelated cleanup. Do in a separate small commit.

## Estimated time

- Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
- Tasks 6, 7: 5-10 minutes each

## Verification

After fix:

- `uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color` shows `<<< tier-3-live_gui PASS`
- `uv run pytest tests/test_live_workflow.py` still PASSes in isolation
- `uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py` (siblings) PASSes
- Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")
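
For reference, the condition-based wait from Task 4 could look roughly like the sketch below. It assumes only the payload shape given in Task 1 (`in_progress`, `path`, `error`). The `get_status` callable is a hypothetical seam standing in for whatever `ApiHookClient` call fetches `/api/project_switch_status`, and the function signature is illustrative, not the shipped helper.

```python
import time
from typing import Callable, Optional, TypedDict


class SwitchStatus(TypedDict):
    """Payload shape described in Task 1."""
    in_progress: bool
    path: Optional[str]
    error: Optional[str]


def wait_for_project_active(
    get_status: Callable[[], SwitchStatus],  # hypothetical: wraps the HTTP call
    expected_path: str,
    timeout: float = 30.0,
    interval: float = 0.1,
) -> SwitchStatus:
    """Poll get_status until the switch to expected_path has finished.

    Raises RuntimeError as soon as the status carries an error, and
    TimeoutError (with the last status seen) once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["error"] is not None:
            # Fail fast: surface the reported reason instead of waiting it out.
            raise RuntimeError(f"project switch failed: {status['error']}")
        if not status["in_progress"] and status["path"] == expected_path:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"project {expected_path!r} not active after {timeout}s; "
                f"last status: {status}"
            )
        time.sleep(interval)
```

Passing the status source in as a callable keeps the loop unit-testable without the live `live_gui` session, which avoids the kind of leak described in the Task 1 test-fix note.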
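
The reset behaviour that Task 2 settled on after the regression fix can be illustrated with a toy controller. This is a sketch under stated assumptions: `ControllerSketch` and the module-level `default_project` are hypothetical stand-ins for the real controller and `project_manager.default_project`, and only the three fields named in Task 2 are modelled.

```python
from typing import Any, Dict, List


def default_project(name: str) -> Dict[str, Any]:
    """Hypothetical stand-in for project_manager.default_project."""
    return {"name": name, "files": []}


class ControllerSketch:
    """Toy controller holding only the fields Task 2 discusses."""

    def __init__(self) -> None:
        self.project: Dict[str, Any] = default_project("default")
        self.project_paths: List[str] = []
        self.active_project_path: str = ""

    def _handle_reset_session(self, reset_name: str = "default") -> None:
        # Replace the project dict so state from earlier tests cannot leak.
        self.project = default_project(reset_name)
        self.project_paths = []
        # self.active_project_path is deliberately left untouched: Task 2
        # reports that clearing it made the next project switch fail when
        # flushing to an empty path.
```

After a reset, the project dict and path list are fresh while the active path survives, which is the invariant the Task 2 unit tests are described as covering.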