manual_slop/TODO_test_full_live_workflow.md

# TODO: Fix test_full_live_workflow race condition

**Report:** `docs/reports/test_full_live_workflow_root_cause_20260608.md`
**Failure reproducibility:** 100% in tier-3 batch, 0% in isolation
**Status:** Tasks 1+2 SHIPPED (commit `6ecb31ea`); Tasks 3-7 remaining

## Tasks (simple, ordered by ROI)

### 1. [HIGH] Add deterministic signal endpoint ✅ SHIPPED (commit 6ecb31ea)
- **What:** Add `GET /api/project_switch_status` returning `{"in_progress": bool, "path": str | null, "error": str | null}`.
- **Where:** `src/api_hooks.py` (new handler) + `src/app_controller.py` (track `_project_switch_in_progress` + `_project_switch_error` state).
- **Why:** Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
- **Pattern:** See `src/api_hooks.py:336-363` (`/api/warmup_wait`) for the existing pattern of "block until condition, return final state".
- **Acceptance:** Test polls `/api/project_switch_status` until `in_progress == False` and `path == expected` and `error is None`. Times out after 30s with clear error.
- **Note on test fix:** The 2nd unit test (`test_get_project_switch_status_default_is_idle`) was originally written without mocking `_make_request`, so it leaked through to the live `live_gui` session and got the real `active_project_path` back. Fixed in same commit by adding `patch.object(client, "_make_request")` mock. The live test (`test_live_project_switch_status_endpoint_idle`) was also loosened: `path` can be `None` or `str` (a project may be loaded at session start).

### 2. [HIGH] Reset project state in `_handle_reset_session` ✅ SHIPPED (commit 6ecb31ea) + REGRESSION FIXED (commit e0a3eb8c)
- **What:** Add `self.project = {}; self.project_paths = []` at the start of `_handle_reset_session`. Do NOT clear `self.active_project_path`.
- **Where:** `src/app_controller.py:3244-3296`.
- **Why:** The session-scoped `live_gui` fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state.
- **Acceptance:** After `client.click("btn_reset")` followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.
- **Implementation note (commit 6ecb31ea):** Mirrors `__init__` default-project branch: creates a fresh `project_manager.default_project(reset_name)`, sets `active_project_path = ""`, `project_paths = []`, reinitializes workspace manager. 3 unit tests pass.
- **Regression (discovered in commit 6ecb31ea, fixed in commit e0a3eb8c):** Setting `self.active_project_path = ""` caused `test_context_sim_live` to fail. Root cause: `_do_project_switch` calls `_flush_to_project()` which writes to `self.active_project_path` (raises `OSError` on empty path), and the `finally` block's `_switch_project(pending)` re-submitted the failed switch in an infinite loop. Status stuck at "switching to: ..." for 5+ seconds. Fix: keep `self.active_project_path` as-is. Only replace `self.project` (fresh default) and clear `self.project_paths`. The stale state is solved by replacing the project dict. Also removed the `WorkspaceManager(project_root=None)` reinit (not needed for the bug). 3 unit tests + 16 related regression tests pass. `test_full_live_workflow` passes in 10.19s in isolation.

### 3. [MED] Replace `os.path.abspath("tests/artifacts/temp_project.toml")` with fixture-provided path
- **What:** Have the `live_gui` fixture provide `temp_project_path` (str) derived from its own `temp_workspace` directory.
- **Where:** `tests/conftest.py` (live_gui fixture) + `tests/test_live_workflow.py:50`.
- **Why:** cwd-relative path is fragile; fixture-relative path is stable.
- **Acceptance:** Test does `temp_project_path = live_gui_temp_project_path` (or accesses it as a fixture attribute). No more `os.path.abspath("tests/artifacts/...")`.

### 4. [MED] Replace 10×1s blind poll with condition-based wait
- **What:** Use the new `/api/project_switch_status` endpoint with a single `wait_for_condition` call (or `client.wait_for_project_active(name, timeout=30)` helper).
- **Where:** `tests/test_live_workflow.py:58-65` + new `ApiHookClient.wait_for_project_active` method.
- **Why:** Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
- **Pattern:** See `src/api_hook_client.py:wait_for_server` (existing pattern in the same client).
- **Acceptance:** Test fails fast (within 5-10s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate".

### 5. [LOW] Add defensive state assertions
- **What:** Before polling for activation, verify:
  - The file was created: `assert os.path.exists(temp_project_path)`
  - The click was enqueued: check `client.get_events()` for the `click` task
- **Where:** `tests/test_live_workflow.py:55-65`.
- **Why:** Catches the case where the click was dropped or the handler crashed before writing the file.
- **Acceptance:** If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.

### 6. [LOW] Add `pytest.mark.live` to pyproject.toml markers
- **What:** Append `"live: marks tests as live visualization tests (not in CI by default)"` to `[tool.pytest.ini_options].markers`.
- **Where:** `pyproject.toml`.
- **Why:** Silences the `PytestUnknownMarkWarning: Unknown pytest.mark.live` warnings emitted by `test_visual_mma.py`, `test_visual_sim_gui_ux.py`. The mark already exists; pyproject just doesn't know about it.
- **Acceptance:** `uv run pytest tests/ 2>&1 | grep -i UnknownMark` returns 0 lines.

### 7. [LOW] Add `tests/.test_durations.json` recording in CI / dev convenience
- **What:** Add a dev-mode shortcut to record durations once the fix lands (e.g. `python scripts/run_tests_batched.py --durations`).
- **Where:** `scripts/run_tests_batched.py` already has `--durations` flag; just need a one-time run + commit.
- **Why:** The categorizer uses `.test_durations.json` for `speed` auto-inference. Currently all files default to MEDIUM speed.
- **Acceptance:** `tests/.test_durations.json` exists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)

## Order of work

1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.

5 is a defensive complement. Add with 1-4.

6, 7 are unrelated cleanup. Do in a separate small commit.

## Estimated time

- Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
- Tasks 6, 7: 5-10 minutes each

## Verification

After fix:
- `uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color` shows `<<< tier-3-live_gui PASS`
- `uv run pytest tests/test_live_workflow.py` still PASSes in isolation
- `uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py` (siblings) PASSes
- Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")