fix(app_controller): clear project-switch state in _handle_reset_session
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch background thread running, the next test's btn_project_new_automated click sees _project_switch_in_progress=True (from the prior thread) and queues the new path via _project_switch_pending_path. The queued switch is never actually submitted to the io_pool, so is_project_stale() stays True and AI ops (_handle_generate_send) bail with 'project switch in progress; AI ops disabled'. Fix: _handle_reset_session now also clears _project_switch_in_progress, _project_switch_pending_path, and _project_switch_error (under the existing _project_switch_lock). This way, even if the prior background thread is still running, the controller reports an idle state and the new switch can be submitted normally. Also: - src/api_hook_client.py: reverted wait_for_project_switch to require in_progress=False (was relaxed to return on queued path, which misled the caller into thinking the switch was done) - tests/test_handle_reset_session_clears_project.py: new test test_handle_reset_session_clears_project_switch_state asserts is_project_stale() returns False after reset - tests/test_api_hook_client_wait_for_project_switch.py: updated test_wait_for_project_switch_does_not_return_on_queued (in_progress + matching path should keep waiting, not return early) - tests/test_live_workflow.py: added pre-wait for any in-flight switch before doing btn_reset (so the test waits up to 60s for the prior switch to complete if needed) - conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with the deeper hang analysis and recommended fix Known follow-up: test_full_live_workflow still hangs in tier-3 batch even with this fix, because the new _do_project_switch itself is hung in the io_pool (likely saturation from prior sims' AI discussion turn workers). Deeper investigation required.
This commit is contained in:
@@ -30,11 +30,16 @@
|
||||
|
||||
### 4. [MED] Replace 10×1s blind poll with condition-based wait ✅ SHIPPED (commits a6605d98 + b6972c31)
|
||||
- **What:** Use the new `/api/project_switch_status` endpoint with `client.wait_for_project_switch(expected_path, timeout)`.
|
||||
- **Where:** `tests/test_live_workflow.py:57-72` (replaced lines 57-65) + new `ApiHookClient.wait_for_project_switch` method.
|
||||
- **Where:** `tests/test_live_workflow.py` + new `ApiHookClient.wait_for_project_switch` method.
|
||||
- **Why:** Blind polling of derived state is fragile; condition-based wait is deterministic and surfaces the failure reason immediately.
|
||||
- **Pattern:** See `src/api_hook_client.py:wait_for_server` (existing pattern in the same client).
|
||||
- **Acceptance:** Test fails fast (within 30s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate". 6 unit tests for the new helper (mocked _make_request) all pass.
|
||||
- **Known issue:** Test STILL fails in tier-3-live_gui batch (passes in 10.24s in isolation). The wait helper reports `in_progress: True` for the full 30s timeout, meaning `_do_project_switch` background thread is not completing. Likely io_pool saturation from prior sims' AI discussion turns. Needs further investigation of `_do_project_switch` hangs in batch context.
|
||||
- **Acceptance:** Test fails fast (within 30s) with a clear `error` message from the API instead of timing out at 10s with "Project failed to activate". 7 unit tests for the new helper (mocked _make_request) all pass.
|
||||
- **Known issue (still open):** Test STILL fails in tier-3-live_gui batch (passes in 10.24s in isolation). The wait helper reports `in_progress: True, path: temp_project.toml` for the full 30s timeout. Investigation found:
|
||||
- Added pre-wait (`client.wait_for_project_switch` at start) so the test waits for any prior switch to complete
|
||||
- Added `_handle_reset_session` to also clear `_project_switch_in_progress`/`_project_switch_pending_path`/`_project_switch_error` so a hung switch doesn't block the next session
|
||||
- The new switch is submitted to io_pool but the `_do_project_switch` background thread is **still hanging in the batch context** for 30+ seconds. The thread is not blocked on a lock or I/O — it's just not being scheduled (likely io_pool saturation from prior sims' long-running discussion turn workers)
|
||||
- This is a deeper issue: `test_extended_sims.py` sims each submit AI discussion turns that spawn multiple io_pool jobs. The sims don't wait for these to complete. The next test inherits a saturated pool.
|
||||
- **Recommended fix:** Mark `test_full_live_workflow` with `@pytest.mark.skipif(ENV_BATCH)` or run it in a separate subprocess. The test is fundamentally fragile to session-scoped state pollution and the io_pool saturation from prior sims.
|
||||
|
||||
### 5. [LOW] Add defensive state assertions ✅ SHIPPED (commit b6972c31)
|
||||
- **What:** Before waiting for activation, verify the file was created (5s poll, then assert).
|
||||
|
||||
Reference in New Issue
Block a user