Private
Public Access
0
0
Files
manual_slop/TODO_test_full_live_workflow.md
T

7.0 KiB
Raw Blame History

TODO: Fix test_full_live_workflow race condition

Report: docs/reports/test_full_live_workflow_root_cause_20260608.md Failure reproducibility: 100% in tier-3 batch, 0% in isolation Status: Tasks 1+2 SHIPPED (commit 6ecb31ea); Tasks 3-7 remaining

Tasks (simple, ordered by ROI)

1. [HIGH] Add deterministic signal endpoint SHIPPED (commit 6ecb31ea)

  • What: Add GET /api/project_switch_status returning {"in_progress": bool, "path": str | null, "error": str | null}.
  • Where: src/api_hooks.py (new handler) + src/app_controller.py (track _project_switch_in_progress + _project_switch_error state).
  • Why: Polling the project dict is fragile (returns stale state from prior tests). Polling a purpose-built signal is deterministic.
  • Pattern: See src/api_hooks.py:336-363 (/api/warmup_wait) for the existing pattern of "block until condition, return final state".
  • Acceptance: Test polls /api/project_switch_status until in_progress == False and path == expected and error is None. Times out after 30s with clear error.
  • Note on test fix: The 2nd unit test (test_get_project_switch_status_default_is_idle) was originally written without mocking _make_request, so it leaked through to the live live_gui session and got the real active_project_path back. Fixed in same commit by adding patch.object(client, "_make_request") mock. The live test (test_live_project_switch_status_endpoint_idle) was also loosened: path can be None or str (a project may be loaded at session start).

2. [HIGH] Reset project state in _handle_reset_session SHIPPED (commit 6ecb31ea) + REGRESSION FIXED (commit e0a3eb8c)

  • What: Add self.project = {}; self.project_paths = [] at the start of _handle_reset_session. Do NOT clear self.active_project_path.
  • Where: src/app_controller.py:3244-3296.
  • Why: The session-scoped live_gui fixture shares the controller across 48 tests. Prior tests leave stale project state. The reset handler currently clears AI session but not project state.
  • Acceptance: After client.click("btn_reset") followed by the new project-creation click, the test sees a clean project state regardless of which tests ran before it in the tier-3 batch.
  • Implementation note (commit 6ecb31ea): Mirrors __init__ default-project branch: creates a fresh project_manager.default_project(reset_name), sets active_project_path = "", project_paths = [], reinitializes workspace manager. 3 unit tests pass.
  • Regression (discovered in commit 6ecb31ea, fixed in commit e0a3eb8c): Setting self.active_project_path = "" caused test_context_sim_live to fail. Root cause: _do_project_switch calls _flush_to_project() which writes to self.active_project_path (raises OSError on empty path), and the finally block's _switch_project(pending) re-submitted the failed switch in an infinite loop. Status stuck at "switching to: ..." for 5+ seconds. Fix: keep self.active_project_path as-is. Only replace self.project (fresh default) and clear self.project_paths. The stale state is solved by replacing the project dict. Also removed the WorkspaceManager(project_root=None) reinit (not needed for the bug). 3 unit tests + 16 related regression tests pass. test_full_live_workflow passes in 10.19s in isolation.

3. [MED] Replace os.path.abspath("tests/artifacts/temp_project.toml") with fixture-provided path

  • What: Have the live_gui fixture provide temp_project_path (str) derived from its own temp_workspace directory.
  • Where: tests/conftest.py (live_gui fixture) + tests/test_live_workflow.py:50.
  • Why: cwd-relative path is fragile; fixture-relative path is stable.
  • Acceptance: Test does temp_project_path = live_gui_temp_project_path (or accesses it as a fixture attribute). No more os.path.abspath("tests/artifacts/...").

4. [MED] Replace 10×1s blind poll with condition-based wait

  • What: Use the new /api/project_switch_status endpoint with a single wait_for_condition call (or client.wait_for_project_active(name, timeout=30) helper).
  • Where: tests/test_live_workflow.py:58-65 + new ApiHookClient.wait_for_project_active method.
  • Why: Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
  • Pattern: See src/api_hook_client.py:wait_for_server (existing pattern in the same client).
  • Acceptance: Test fails fast (within 5-10s) with a clear error message from the API instead of timing out at 10s with "Project failed to activate".

5. [LOW] Add defensive state assertions

  • What: Before polling for activation, verify:
    • The file was created: assert os.path.exists(temp_project_path)
    • The click was enqueued: check client.get_events() for the click task
  • Where: tests/test_live_workflow.py:55-65.
  • Why: Catches the case where the click was dropped or the handler crashed before writing the file.
  • Acceptance: If the file doesn't exist after the click, the test fails immediately with "temp_project.toml not created" instead of timing out.

6. [LOW] Add pytest.mark.live to pyproject.toml markers

  • What: Append "live: marks tests as live visualization tests (not in CI by default)" to [tool.pytest.ini_options].markers.
  • Where: pyproject.toml.
  • Why: Silences the PytestUnknownMarkWarning: Unknown pytest.mark.live warnings emitted by test_visual_mma.py, test_visual_sim_gui_ux.py. The mark already exists; pyproject just doesn't know about it.
  • Acceptance: uv run pytest tests/ 2>&1 | grep -i UnknownMark returns 0 lines.

7. [LOW] Add tests/.test_durations.json recording in CI / dev convenience

  • What: Add a dev-mode shortcut to record durations once the fix lands (e.g. python scripts/run_tests_batched.py --durations).
  • Where: scripts/run_tests_batched.py already has --durations flag; just need a one-time run + commit.
  • Why: The categorizer uses .test_durations.json for speed auto-inference. Currently all files default to MEDIUM speed.
  • Acceptance: tests/.test_durations.json exists, has timing data for all 295+ tests. (Not strictly needed for the live_workflow fix.)

Order of work

1, 2, 3, 4 are tightly coupled (all about making the test deterministic and isolated). Do them in one PR.

5 is a defensive complement. Add with 1-4.

6, 7 are unrelated cleanup. Do in a separate small commit.

Estimated time

  • Tasks 1, 2, 3, 4, 5: 2-3 hours (mostly test + 1 endpoint + 1 reset path)
  • Tasks 6, 7: 5-10 minutes each

Verification

After fix:

  • uv run python scripts/run_tests_batched.py --tiers 3 --no-xdist --no-color shows <<< tier-3-live_gui PASS
  • uv run pytest tests/test_live_workflow.py still PASSes in isolation
  • uv run pytest tests/test_live_workflow.py tests/test_extended_sims.py tests/test_command_palette_sim.py (siblings) PASSes
  • Failure message on real regression is clear and actionable (e.g. "click was not dispatched within 5s" or "/api/project_switch_status returned error: file not found")