TODO: Fix test_full_live_workflow race condition

Report: docs/reports/test_full_live_workflow_root_cause_20260608.md Failure reproducibility: 100% in tier-3 batch, 0% in isolation Status: Not started (investigation only; no fix code yet)

Tasks (simple, ordered by ROI)

1. [HIGH] Add deterministic signal endpoint

What: Add GET /api/project_switch_status returning {"in_progress": bool, "path": str | null, "error": str | null}.
Where: src/api_hooks.py (new handler) + src/app_controller.py (track _project_switch_in_progress + _project_switch_error state).
Why: Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
Pattern: See src/api_hooks.py:336-363 (/api/warmup_wait) for the existing pattern of "block until condition, return final state".
Acceptance: Test polls /api/project_switch_status until in_progress == False and path == expected and error is None. Times out after 30s with clear error.

2. [HIGH] Reset project state in `_handle_reset_session`

What: Add self.project = {}; self.active_project_path = ""; self.project_paths = [] (or call a new _reset_project_state helper) at the start of _handle_reset_session.
Where: src/app_controller.py:3244-3296.
Why: The session-scoped live_gui fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state.
Acceptance: After client.click("btn_reset") followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.

3. [MED] Replace `os.path.abspath("tests/artifacts/temp_project.toml")` with fixture-provided path

What: Have the live_gui fixture provide temp_project_path (str) derived from its own temp_workspace directory.
Where: tests/conftest.py (live_gui fixture) + tests/test_live_workflow.py:50.
Why: cwd-relative path is fragile; fixture-relative path is stable.
Acceptance: Test does temp_project_path = live_gui_temp_project_path (or accesses it as a fixture attribute). No more os.path.abspath("tests/artifacts/...").

4. [MED] Replace 10×1s blind poll with condition-based wait

What: Use the new /api/project_switch_status endpoint with a single wait_for_condition call (or client.wait_for_project_active(name, timeout=30) helper).
Where: tests/test_live_workflow.py:58-65 + new ApiHookClient.wait_for_project_active method.
Why: Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
Pattern: See src/api_hook_client.py:wait_for_server (existing pattern in the same client).
Acceptance: Test fails fast (within 5-10s) with a clear error message from the API instead of timing out at 10s with "Project failed to activate".

5. [LOW] Add defensive state assertions

What: Before polling for activation, verify:
- The file was created: assert os.path.exists(temp_project_path)
- The click was enqueued: check client.get_events() for the click task
Where: tests/test_live_workflow.py:55-65.
Why: Catches the case where the click was dropped or the handler crashed before writing the file.
Acceptance: If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.

6. [LOW] Add `pytest.mark.live` to pyproject.toml markers

What: Append "live: marks tests as live visualization tests (not in CI by default)" to [tool.pytest.ini_options].markers.
Where: pyproject.toml.
Why: Silences the PytestUnknownMarkWarning: Unknown pytest.mark.live warnings emitted by test_visual_mma.py, test_visual_sim_gui_ux.py. The mark already exists; pyproject just doesn't know about it.
Acceptance: uv run pytest tests/ 2>&1 | grep -i UnknownMark returns 0 lines.

7. [LOW] Add `tests/.test_durations.json` recording in CI / dev convenience

What: Add a dev-mode shortcut to record durations once the fix lands (e.g. python scripts/run_tests_batched.py --durations).
Where: scripts/run_tests_batched.py already has --durations flag; just need a one-time run + commit.
Why: The categorizer uses .test_durations.json for speed auto-inference. Currently all files default to MEDIUM speed.
Acceptance: tests/.test_durations.json exists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)

Order of work

1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.

5 is a defensive complement. Add with 1-4.

6, 7 are unrelated cleanup. Do in a separate small commit.

Estimated time

Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
Tasks 6, 7: 5-10 minutes each

Verification

After fix:

uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color shows <<< tier-3-live_gui PASS
uv run pytest tests/test_live_workflow.py still PASSes in isolation
uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py (siblings) PASSes
Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")

5.4 KiB Raw Blame History Unescape Escape