TODO: Fix test_full_live_workflow race condition
Report: docs/reports/test_full_live_workflow_root_cause_20260608.md
Failure reproducibility: 100% in tier-3 batch, 0% in isolation
Status: Not started (investigation only; no fix code yet)
Tasks (simple, ordered by ROI)
1. [HIGH] Add deterministic signal endpoint
- What: Add `GET /api/project_switch_status` returning `{"in_progress": bool, "path": str | null, "error": str | null}`.
- Where: `src/api_hooks.py` (new handler) + `src/app_controller.py` (track `_project_switch_in_progress` + `_project_switch_error` state).
- Why: Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
- Pattern: See `src/api_hooks.py:336-363` (`/api/warmup_wait`) for the existing pattern of "block until condition, return final state".
- Acceptance: Test polls `/api/project_switch_status` until `in_progress == False` and `path == expected` and `error is None`. Times out after 30s with a clear error.
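One possible shape for the controller-side state behind this endpoint is sketched below. The class name `ProjectSwitchStatus` and its `begin`/`finish`/`snapshot` methods are hypothetical and not taken from `src/app_controller.py`; only the three response keys come from the task description. The handler in `src/api_hooks.py` would presumably serialize `snapshot()` as the JSON body.

```python
import threading
from typing import Optional


class ProjectSwitchStatus:
    """Hypothetical thread-safe holder for project-switch state.

    The handler thread reads it via snapshot(); the code performing the
    switch calls begin() before starting and finish() when done.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_progress = False
        self._path: Optional[str] = None
        self._error: Optional[str] = None

    def begin(self, path: str) -> None:
        # Mark a switch as started and clear any error from a prior switch.
        with self._lock:
            self._in_progress = True
            self._path = path
            self._error = None

    def finish(self, error: Optional[str] = None) -> None:
        # Mark the switch as finished; error stays None on success.
        with self._lock:
            self._in_progress = False
            self._error = error

    def snapshot(self) -> dict:
        # Return the three keys named in the task description.
        with self._lock:
            return {
                "in_progress": self._in_progress,
                "path": self._path,
                "error": self._error,
            }
```

Taking the snapshot under a lock keeps the three fields consistent with each other, so a poller should never see `in_progress == False` paired with a path from a half-started switch.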
2. [HIGH] Reset project state in _handle_reset_session
- What: Add `self.project = {}; self.active_project_path = ""; self.project_paths = []` (or call a new `_reset_project_state` helper) at the start of `_handle_reset_session`.
- Where: `src/app_controller.py:3244-3296`.
- Why: The session-scoped `live_gui` fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears the AI session but not project state.
- Acceptance: After `client.click("btn_reset")` followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.
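A minimal sketch of the helper variant follows. `ControllerSketch` is a stand-in class for illustration, not the real controller; the three attributes and the two method names are the ones listed above, and the existing AI-session reset is represented only by a comment.

```python
class ControllerSketch:
    """Stand-in for the real controller; only the project fields are modeled."""

    def __init__(self) -> None:
        self.project = {}
        self.active_project_path = ""
        self.project_paths = []

    def _reset_project_state(self) -> None:
        # Assign fresh objects so stale references held elsewhere are not reused.
        self.project = {}
        self.active_project_path = ""
        self.project_paths = []

    def _handle_reset_session(self) -> None:
        # Clear project state first, as the task proposes.
        self._reset_project_state()
        # ... the existing AI-session reset logic would continue here ...
```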
3. [MED] Replace os.path.abspath("tests/artifacts/temp_project.toml") with fixture-provided path
- What: Have the `live_gui` fixture provide `temp_project_path` (str) derived from its own `temp_workspace` directory.
- Where: `tests/conftest.py` (`live_gui` fixture) + `tests/test_live_workflow.py:50`.
- Why: A cwd-relative path is fragile; a fixture-relative path is stable.
- Acceptance: Test does `temp_project_path = live_gui_temp_project_path` (or accesses it as a fixture attribute). No more `os.path.abspath("tests/artifacts/...")`.
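The path derivation itself could look roughly like this. The function name `temp_project_path_for` is hypothetical, and how the fixture exposes the result (separate fixture vs. attribute) is left open, as in the acceptance note above.

```python
from pathlib import Path


def temp_project_path_for(temp_workspace: Path) -> str:
    """Build the project file path from the fixture's workspace directory.

    Unlike os.path.abspath("tests/artifacts/..."), the result does not
    depend on the process's current working directory, provided
    temp_workspace is itself absolute.
    """
    return str((temp_workspace / "temp_project.toml").resolve())
```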
4. [MED] Replace 10×1s blind poll with condition-based wait
- What: Use the new `/api/project_switch_status` endpoint with a single `wait_for_condition` call (or a `client.wait_for_project_active(name, timeout=30)` helper).
- Where: `tests/test_live_workflow.py:58-65` + new `ApiHookClient.wait_for_project_active` method.
- Why: Blind polling of derived state is fragile; a condition-based wait is deterministic and surfaces the failure reason immediately.
- Pattern: See `src/api_hook_client.py:wait_for_server` (existing pattern in the same client).
- Acceptance: Test fails fast (within 5-10s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate".
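The condition-based wait might be sketched as below. This is a standalone function rather than the proposed `ApiHookClient` method, and the `fetch_status` callable is an assumption standing in for whatever the client uses to GET `/api/project_switch_status`; the `interval` parameter is likewise illustrative.

```python
import time
from typing import Callable, Optional


def wait_for_project_active(
    fetch_status: Callable[[], dict],
    expected_path: str,
    timeout: float = 30.0,
    interval: float = 0.1,
) -> dict:
    """Poll fetch_status() until the switch to expected_path has completed.

    Raises RuntimeError as soon as the status reports an error, and
    TimeoutError if the condition is not met within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last: Optional[dict] = None
    while time.monotonic() < deadline:
        last = fetch_status()
        if last.get("error") is not None:
            # Fail fast with the reason reported by the API.
            raise RuntimeError(
                f"/api/project_switch_status returned error: {last['error']}"
            )
        if not last.get("in_progress") and last.get("path") == expected_path:
            return last
        time.sleep(interval)
    raise TimeoutError(
        f"project {expected_path!r} not active after {timeout}s; last status: {last}"
    )
```

Raising on a reported error, rather than continuing to poll, is what lets the test fail within seconds instead of waiting out the full timeout.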
5. [LOW] Add defensive state assertions
- What: Before polling for activation, verify:
  - The file was created: `assert os.path.exists(temp_project_path)`
  - The click was enqueued: check `client.get_events()` for the `click` task
- Where: `tests/test_live_workflow.py:55-65`.
- Why: Catches the case where the click was dropped or the handler crashed before writing the file.
- Acceptance: If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.
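These two checks could be bundled into a small helper along the following lines. The helper name is hypothetical, and the event shape (a list of dicts with a `"type"` key) is an assumption; the actual return value of `client.get_events()` may differ and the predicate would need adjusting.

```python
import os
from typing import Iterable, Mapping


def assert_click_took_effect(temp_project_path: str, events: Iterable[Mapping]) -> None:
    """Fail immediately if the click left no trace, instead of polling blindly."""
    # The handler should have written the project file.
    assert os.path.exists(temp_project_path), "temp_project.toml not created"
    # ASSUMPTION: each event is a mapping with a "type" field.
    assert any(e.get("type") == "click" for e in events), "click was not dispatched"
```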
6. [LOW] Add pytest.mark.live to pyproject.toml markers
- What: Append `"live: marks tests as live visualization tests (not in CI by default)"` to `[tool.pytest.ini_options].markers`.
- Where: `pyproject.toml`.
- Why: Silences the `PytestUnknownMarkWarning: Unknown pytest.mark.live` warnings emitted by `test_visual_mma.py` and `test_visual_sim_gui_ux.py`. The mark already exists; pyproject just doesn't know about it.
- Acceptance: `uv run pytest tests/ 2>&1 | grep -i UnknownMark` returns 0 lines.
7. [LOW] Add tests/.test_durations.json recording in CI / dev convenience
- What: Add a dev-mode shortcut to record durations once the fix lands (e.g. `python scripts/run_tests_batched.py --durations`).
- Where: `scripts/run_tests_batched.py` already has a `--durations` flag; it just needs a one-time run + commit.
- Why: The categorizer uses `.test_durations.json` for `speed` auto-inference. Currently all files default to MEDIUM speed.
- Acceptance: `tests/.test_durations.json` exists and has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)
Order of work
1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.
5 is a defensive complement. Add with 1-4.
6, 7 are unrelated cleanup. Do in a separate small commit.
Estimated time
- Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
- Tasks 6, 7: 5-10 minutes each
Verification
After fix:
- `uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color` shows `<<< tier-3-live_gui PASS`
- `uv run pytest tests/test_live_workflow.py` still passes in isolation
- `uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py` (siblings) passes
- Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")