7.0 KiB
7.0 KiB
TODO: Fix test_full_live_workflow race condition
Report: docs/reports/test_full_live_workflow_root_cause_20260608.md
Failure reproducibility: 100% in tier-3 batch, 0% in isolation
Status: Tasks 1+2 SHIPPED (commit 6ecb31ea); Tasks 3-7 remaining
Tasks (simple, ordered by ROI)
1. [HIGH] Add deterministic signal endpoint ✅ SHIPPED (commit 6ecb31ea)
- What: Add
GET /api/project_switch_statusreturning{"in_progress": bool, "path": str | null, "error": str | null}. - Where:
src/api_hooks.py(new handler) +src/app_controller.py(track_project_switch_in_progress+_project_switch_errorstate). - Why: Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
- Pattern: See
src/api_hooks.py:336-363(/api/warmup_wait) for the existing pattern of "block until condition, return final state". - Acceptance: Test polls
/api/project_switch_statusuntilin_progress == Falseandpath == expectedanderror is None. Times out after 30s with clear error. - Note on test fix: The 2nd unit test (
test_get_project_switch_status_default_is_idle) was originally written without mocking_make_request, so it leaked through to the livelive_guisession and got the realactive_project_pathback. Fixed in same commit by addingpatch.object(client, "_make_request")mock. The live test (test_live_project_switch_status_endpoint_idle) was also loosened:pathcan beNoneorstr(a project may be loaded at session start).
2. [HIGH] Reset project state in _handle_reset_session ✅ SHIPPED (commit 6ecb31ea) + REGRESSION FIXED (commit e0a3eb8c)
- What: Add
self.project = {}; self.project_paths = []at the start of_handle_reset_session. Do NOT clearself.active_project_path. - Where:
src/app_controller.py:3244-3296. - Why: The session-scoped
live_guifixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state. - Acceptance: After
client.click("btn_reset")followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch. - Implementation note (commit
6ecb31ea): Mirrors__init__default-project branch: creates a freshproject_manager.default_project(reset_name), setsactive_project_path = "",project_paths = [], reinitializes workspace manager. 3 unit tests pass. - Regression (discovered in commit
6ecb31ea, fixed in commite0a3eb8c): Settingself.active_project_path = ""causedtest_context_sim_liveto fail. Root cause:_do_project_switchcalls_flush_to_project()which writes toself.active_project_path(raisesOSErroron empty path), and thefinallyblock's_switch_project(pending)re-submitted the failed switch in an infinite loop. Status stuck at "switching to: ..." for 5+ seconds. Fix: keepself.active_project_pathas-is. Only replaceself.project(fresh default) and clearself.project_paths. The stale state is solved by replacing the project dict. Also removed theWorkspaceManager(project_root=None)reinit (not needed for the bug). 3 unit tests + 16 related regression tests pass.test_full_live_workflowpasses in 10.19s in isolation.
3. [MED] Replace os.path.abspath("tests/artifacts/temp_project.toml") with fixture-provided path
- What: Have the
live_guifixture providetemp_project_path(str) derived from its owntemp_workspacedirectory. - Where:
tests/conftest.py(live_gui fixture) +tests/test_live_workflow.py:50. - Why: cwd-relative path is fragile; fixture-relative path is stable.
- Acceptance: Test does
temp_project_path = live_gui_temp_project_path(or accesses it as a fixture attribute). No moreos.path.abspath("tests/artifacts/...").
4. [MED] Replace 10×1s blind poll with condition-based wait
- What: Use the new
/api/project_switch_statusendpoint with a singlewait_for_conditioncall (orclient.wait_for_project_active(name, timeout=30)helper). - Where:
tests/test_live_workflow.py:58-65+ newApiHookClient.wait_for_project_activemethod. - Why: Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
- Pattern: See
src/api_hook_client.py:wait_for_server(existing pattern in the same client). - Acceptance: Test fails fast (within 5-10s) with a clear
errormessage from the API instead of timing out at 10s with "Project failed to activate".
5. [LOW] Add defensive state assertions
- What: Before polling for activation, verify:
- The file was created:
assert os.path.exists(temp_project_path) - The click was enqueued: check
client.get_events()for theclicktask
- The file was created:
- Where:
tests/test_live_workflow.py:55-65. - Why: Catches the case where the click was dropped or the handler crashed before writing the file.
- Acceptance: If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.
6. [LOW] Add pytest.mark.live to pyproject.toml markers
- What: Append
"live: marks tests as live visualization tests (not in CI by default)"to[tool.pytest.ini_options].markers. - Where:
pyproject.toml. - Why: Silences the
PytestUnknownMarkWarning: Unknown pytest.mark.livewarnings emitted bytest_visual_mma.py,test_visual_sim_gui_ux.py. The mark already exists; pyproject just doesn't know about it. - Acceptance:
uv run pytest tests/ 2>&1 | grep -i UnknownMarkreturns 0 lines.
7. [LOW] Add tests/.test_durations.json recording in CI / dev convenience
- What: Add a dev-mode shortcut to record durations once the fix lands (e.g.
python scripts/run_tests_batched.py --durations). - Where:
scripts/run_tests_batched.pyalready has--durationsflag; just need a one-time run + commit. - Why: The categorizer uses
.test_durations.jsonforspeedauto-inference. Currently all files default to MEDIUM speed. - Acceptance:
tests/.test_durations.jsonexists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)
Order of work
1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.
5 is a defensive complement. Add with 1-4.
6, 7 are unrelated cleanup. Do in a separate small commit.
Estimated time
- Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
- Tasks 6, 7: 5-10 minutes each
Verification
After fix:
uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-colorshows<<< tier-3-live_gui PASSuv run pytest tests/test_live_workflow.pystill PASSes in isolationuv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py(siblings) PASSes- Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")