The 30s wait_for_project_switch timeout was an excessive constraint.
In batch context, prior sims' AI discussion turn workers saturate the
8-worker io_pool, queueing this switch for tens of seconds. The other
defensive waits in the test (warmup 60s, prior switch 60s) already use
60s+, so 30s was the inconsistent outlier.
User confirmed: 'I think not completing in 30s is an excessive constraint
if thats whats going on.'
Verification:
- test_full_live_workflow isolation: 11.69s PASS
- 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS
Root cause: test_full_live_workflow in batch context (with prior sims
running AI discussion turns) would queue its _do_project_switch behind
the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker
pool was saturated, so the switch would never run within 30s.
Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough
capacity to run: 2 pruners + the project switch + 5 spare.
Also: add /api/io_pool_status endpoint + get_io_pool_status +
wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py
for the test_api_hook_client_io_pool.py tests, even though the test
itself no longer uses them - they remain useful for future tests that
want to assert pool state directly).
Also: add wait_for_warmup at the start of test_full_live_workflow to
ensure SDK modules are loaded before AI ops.
Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch
background thread running, the next test's btn_project_new_automated click
sees _project_switch_in_progress=True (from the prior thread) and queues
the new path via _project_switch_pending_path. The queued switch is never
actually submitted to the io_pool, so is_project_stale() stays True and
AI ops (_handle_generate_send) bail with 'project switch in progress;
AI ops disabled'.
Fix: _handle_reset_session now also clears _project_switch_in_progress,
_project_switch_pending_path, and _project_switch_error (under the
existing _project_switch_lock). This way, even if the prior background
thread is still running, the controller reports an idle state and the
new switch can be submitted normally.
Also:
- src/api_hook_client.py: reverted wait_for_project_switch to require
in_progress=False (was relaxed to return on queued path, which misled
the caller into thinking the switch was done)
- tests/test_handle_reset_session_clears_project.py: new test
test_handle_reset_session_clears_project_switch_state asserts
is_project_stale() returns False after reset
- tests/test_api_hook_client_wait_for_project_switch.py: updated
test_wait_for_project_switch_does_not_return_on_queued (in_progress
+ matching path should keep waiting, not return early)
- tests/test_live_workflow.py: added pre-wait for any in-flight switch
before doing btn_reset (so the test waits up to 60s for the prior
switch to complete if needed)
- conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with
the deeper hang analysis and recommended fix
Known follow-up: test_full_live_workflow still hangs in tier-3 batch
even with this fix, because the new _do_project_switch itself is hung
in the io_pool (likely saturation from prior sims' AI discussion turn
workers). Deeper investigation required.
Following the conductor convention of organizing track-related
artifacts under conductor/. The TODO tracks the test_full_live_workflow
race condition fix and its follow-up items (Tasks 3, 7 still pending;
known batch hang documented).
Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.
Silences the PytestUnknownMarkWarning emitted by test_visual_mma.py and
test_visual_sim_gui_ux.py (3 instances). The @pytest.mark.live mark
already exists in the test files; pyproject.toml just didn't know
about it.
- pyproject.toml: added 'live: marks tests as live visualization tests
(not in CI by default)' to [tool.pytest.ini_options].markers
Replaces the 10x1s blind poll of derived state with a condition-based
wait on /api/project_switch_status. Also adds a defensive file existence
check that fails fast (within 5s) if the click was dropped or the
project creation handler crashed.
The new wait surfaces a clear error message ('Project switch did not
complete in 30s. Last status: ...') instead of the generic 'Project
failed to activate', and exposes _project_switch_error if the controller
reported one.
- tests/test_live_workflow.py: replaced poll loop (lines 57-65) with
wait_for_project_switch + os.path.exists defensive check
Adds a polling helper that blocks until the project switch completes,
errors out, or times out. Replaces the fragile 10x1s blind poll in
test_full_live_workflow with a condition-based wait on the
/api/project_switch_status endpoint.
Features:
- Polls /api/project_switch_status every 200ms (configurable)
- Returns immediately on error (with the error in the result)
- Path matching: exact match OR basename match (handles absolute vs relative)
- Times out with a clear 'timeout' flag instead of a generic assertion
- Optional expected_path: if None, returns on any in_progress=False
- src/api_hook_client.py: new wait_for_project_switch method (37 lines)
- tests/test_api_hook_client_wait_for_project_switch.py: 6 unit tests
with mocked _make_request covering all paths
Task 2 (_handle_reset_session reset) introduced a regression: setting self.active_project_path to empty caused an infinite re-switch loop in _do_project_switch because _flush_to_project writes to active_project_path (raises OSError on empty path), and the finally block re-submitted the failed switch on every iteration. Result: test_context_sim_live saw switching-to status for 5+ seconds and MD-only generation was blocked.
Fix: keep self.active_project_path as-is in _handle_reset_session. Only reset self.project (to a fresh default_project dict) and self.project_paths (to empty list). The stale project state issue is solved by replacing the project dict; the active_project_path stays valid for _flush_to_project.
- src/app_controller.py: refined _handle_reset_session project reset
- tests/test_handle_reset_session_clears_project.py: updated contract test to assert active_project_path is preserved
Stale project state from prior live_gui tests (shared session-scoped
subprocess) was leaking into subsequent tests, causing the
test_full_live_workflow race condition: 'Project not switched' errors
when self.project still claimed to be a different project.
The fix: _handle_reset_session now mirrors the default-project branch
of __init__ (lines 1743-1745), creating a fresh default project dict,
clearing active_project_path and project_paths, and reinitializing
the workspace manager.
- src/app_controller.py: 6 new lines in _handle_reset_session
- tests/test_handle_reset_session_clears_project.py: 3 tests
(active_project_path, project_paths, self.project)
Adds a new endpoint that exposes the project-switch state machine so tests
can poll for completion instead of guessing with timeouts.
- AppController: track _project_switch_error on failure paths
- src/api_hooks.py: GET /api/project_switch_status returns
{in_progress, pending_path, active_path, error}
- src/api_hook_client.py: get_project_switch_status() helper
- tests/test_api_hooks_project_switch.py: 3 unit tests for client + endpoint
shape, 1 live_gui test for the default-idle case