Private
Public Access
0
0
Commit Graph

690 Commits

Author SHA1 Message Date
ed 4bb19835db fix(test): per-worker workspace subdir + file-lock for xdist live_gui coordination 2026-06-09 22:23:33 -04:00
ed 38cb0f99b4 fix(test): add PID to workspace path for xdist worker isolation 2026-06-09 21:45:02 -04:00
ed 35f4cecb9b fix(test): catch OSError in workspace rmtree retry (broader than PermissionError) 2026-06-09 21:22:00 -04:00
ed aa776224f2 test(workspace): update fixture test to assert tests/artifacts/ not tmp dir 2026-06-09 21:06:06 -04:00
ed ccc2aa0be9 test(workspace): verify per-run workspace path and gitignore status 2026-06-09 20:45:24 -04:00
ed b8c15f8d92 fix(test): per-run workspace under tests/artifacts/ (replaces tmp_path_factory) 2026-06-09 20:42:43 -04:00
ed fe240db410 fix(reset): clear mma_tier_usage and RAG state in _handle_reset_session 2026-06-09 19:44:10 -04:00
ed 34290e5d1a test(watchdog): update PYTEST_FINISHED_TIMEOUT_SECONDS to 600 to match conftest 2026-06-09 18:42:53 -04:00
ed c3af1b8a2e chore(test): double smart_watchdog timeout from 300s to 600s for tier-3 2026-06-09 18:37:34 -04:00
ed 7a946544ff test(mma): mark test_visual_mma_components with clean_baseline 2026-06-09 17:14:23 -04:00
ed e7da7e0d6a test(rag): update test for Phase 4 coalescing state 2026-06-09 17:10:33 -04:00
ed 749120d239 feat(audit): flag hardcoded workspace and project-root paths in tests 2026-06-09 17:01:14 -04:00
ed 1cd3444e4c test(rag): mark RAG tests with clean_baseline for batch isolation 2026-06-09 16:56:55 -04:00
ed 7b87bbf5ec feat(test): clean_baseline marker resets controller state before test 2026-06-09 16:40:18 -04:00
ed b8fcd9d6f5 fix(rag): coalesce _sync_rag_engine calls via token + dirty flag 2026-06-09 16:25:44 -04:00
ed 006bb11488 refactor(test): 5 test files use live_gui_workspace fixture instead of hardcoded path 2026-06-09 16:14:40 -04:00
ed 91313451a2 feat(test): expose live_gui_workspace as a separate fixture 2026-06-09 15:53:06 -04:00
ed c64da95ef5 refactor(test): live_gui workspace via tmp_path_factory 2026-06-09 15:51:35 -04:00
ed c3cb3c6e44 feat(test): autouse _check_live_gui_health recovers from degraded subprocess 2026-06-09 15:47:28 -04:00
ed 67d0211e56 feat(test): autouse _check_live_gui_health recovers from degraded subprocess 2026-06-09 15:42:00 -04:00
ed 16bd3d3a47 refactor(test): wrap live_gui subprocess in _LiveGuiHandle class 2026-06-09 15:37:47 -04:00
ed 40f905d14b test(rag): update dim-mismatch test to assert rmtree behavior
The fix in 644d88ab changed the recovery path from client.delete_collection
to shutil.rmtree (chromadb 1.5.x delete_collection is broken on corrupted
state). The test still asserted the old behavior.
2026-06-09 14:50:55 -04:00
ed a341d7a7c8 test: ensure sentence-transformers is in test env + conftest gate 2026-06-09 10:37:14 -04:00
ed e62266e868 fix(rag): surface embedding provider init failure as 'error' status
The bug: when the local embedding provider fails to initialize
(e.g. sentence-transformers not installed), RAGEngine.__init__
leaves self.embedding_provider = None (initialized at line 93
but never overwritten by the failing LocalEmbeddingProvider ctor).
The constructor returns. _sync_rag_engine's else branch then
sets status to 'ready' - a lie. The RAG panel shows 'ready'.
The user triggers a retrieval. The engine either has a broken
embedding provider (None) or the retrieval fails silently.
The RAG context never appears in the AI's history.

The fix: in _sync_rag_engine's _task, after RAGEngine(...)
returns, check if engine.embedding_provider is None. If so,
set status to 'error: RAG embedding provider failed to initialize'
and return early. This prevents:
  - The engine from being assigned to self.rag_engine
  - The rebuild being triggered
  - The status being set to 'ready' / 'indexing'

Note: this does NOT make the RAG test pass. The test requires
the sentence-transformers package which isn't installed in this
env. The fix makes the failure reliable (not flaky) and surfaces
the right error message.

TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py:
- RAGEngine ctor raises ImportError on missing sentence-transformers
- _sync_rag_engine sets status to 'error' (not 'ready') on init failure
- RAGEngine ctor leaves embedding_provider=None when init fails

All 3 pass. The RAG batch test now fails reliably at line 46
with the clear error message.
2026-06-09 09:39:02 -04:00
ed bcdc26d0bd fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs
PR1 follow-up (the actual IM_ASSERT root cause fix).

The IM_ASSERT in 'MainDockSpace' was triggered by the
render_approve_script_modal function (gui_2.py:4895) calling
imgui.checkbox with a None value for app.ui_approve_modal_preview.

The chain of bugs:

1. AppController.__getattr__ returned None for ANY ui_ attribute
   (line 1237-1238). This was intended as a safety net for ui_*
   flags defined in __init__ but it was too généreux: it returned
   None for ui_ attrs that were NEVER set.

2. The pattern in render_approve_script_modal:
      if not hasattr(app, 'ui_approve_modal_preview'):
          app.ui_approve_modal_preview = False
      _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview)
   relied on hasattr() returning False for unset attrs to trigger
   the initialization. But the App.__setattr__ checks
   hasattr(self.controller, name) to decide where to route
   assignments. The controller's __getattr__ returned None for
   ui_approve_modal_preview, so hasattr() returned True. The
   App.__setattr__ routed the assignment to the controller.
   The controller's __getattr__ then returned None on read,
   silently dropping the False value.

3. The next line called imgui.checkbox with None, which raised
   a TypeError. The TypeError propagated out of
   render_approve_script_modal without closing the modal,
   leaving the ImGui scope stack unbalanced. The unbalanced
   scope triggered IM_ASSERT(Missing End()) on the next frame.

Fix: AppController.__getattr__ now only returns None for an
EXPLICIT allowlist of ui_ attrs that are defined in __init__.
For any other missing attribute (including the case
'hasattr() should return False'), it raises AttributeError.

The App.__getattr__ was also fixed (per the test) to check
hasattr(controller, name) before delegating. This is defense in
depth in case other __getattr__ patterns are added.

Test verification (TDD red → green):
- 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr
  returns False for unset attrs via App.__getattr__)
- 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr
  returns False for unset ui_ attrs on controller)

Live verification:
- 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s
- Previously failed at 200s+ with 'cannot schedule new futures after
  shutdown' / 121s with 'GUI is degraded before test starts'
- Now passes cleanly. The IM_ASSERT no longer fires.

13/13 related unit tests pass (app_controller_* + app_run_* +
app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc.
unit tests.
2026-06-08 23:45:25 -04:00
ed 51ecace464 test(live_workflow): pre-flight health check fails fast on dirty state
PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g.
via an ImGui IM_ASSERT from cumulative panel state), the controller's
_io_pool gets shut down. The next test starts in a degraded state
but only discovers this 120s later when its project switch times
out with a confusing 'cannot schedule new futures after shutdown'
error.

This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails
fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file
if it fails. Just has to note that the new file didn't get to deal
with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
  - Without PR3 fix: 200s FAIL with confusing 120s timeout
  - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be
  worth it if that properly lets us handle the assert')
2026-06-08 21:17:54 -04:00
ed 1c565da7a0 feat(gui): wrap immapp.run in try/except + add /api/gui_health endpoint
PR2 of the test_full_live_workflow_imgui_assert fix sequence.

When an ImGui scope mismatch (IM_ASSERT(Missing End())) fires in
immapp.run (e.g. after cumulative state corruption from prior sims'
panel renders), the RuntimeError propagates out of app.run(). The
controller's _io_pool gets shut down via __del__/finalization. The
hook server (separate ThreadingHTTPServer) survives. Subsequent test
clicks fail with 'cannot schedule new futures after shutdown' and
the test times out after 120s with no clear signal of what went
wrong.

This commit:
1. Wraps immapp.run in try/except RuntimeError in gui_2.py:618.
   On assertion: logs the error to stderr (NOT silent), records
   it on controller._gui_degraded_reason and _last_imgui_assert,
   and returns from run() so the hook server keeps serving.
2. Adds _gui_degraded_reason and _last_imgui_assert to
   AppController.__init__ (initialized to None).
3. Adds /api/gui_health endpoint in api_hooks.py:148. Returns
   {healthy, degraded_reason, last_assert, io_pool_alive}.
4. Adds ApiHookClient.get_gui_health() with the matching unit
   tests (3 mocked tests + 1 live test).

Per user feedback 2026-06-08:
- The wrap does NOT silently swallow the error. It logs at ERROR
  level and surfaces it via the health endpoint.
- Tests can call client.get_gui_health() to detect a degraded GUI
  and fail fast with a clear message.

TDD: tests written first, confirmed to fail, then fix applied.
34/34 unit tests pass. 1/1 live test passes (live_gui health
endpoint reports healthy=True on fresh subprocess).
2026-06-08 20:46:41 -04:00
ed c9a991bbb8 test(live_workflow): bump project switch wait timeout 30s -> 120s
The 30s wait_for_project_switch timeout was an excessive constraint.
In batch context, prior sims' AI discussion turn workers saturate the
8-worker io_pool, queueing this switch for tens of seconds. The other
defensive waits in the test (warmup 60s, prior switch 60s) already use
60s+, so 30s was the inconsistent outlier.

User confirmed: 'I think not completing in 30s is an excessive constraint
if thats whats going on.'

Verification:
- test_full_live_workflow isolation: 11.69s PASS
- 7-test batch (test_full_live_workflow + 4 extended sims + 2 markdown): 85.83s PASS
2026-06-08 18:14:18 -04:00
ed 87d7c5bff2 test(io_pool): update assertion for 8-worker pool size 2026-06-08 17:51:39 -04:00
ed 4a33848620 fix(io_pool): increase worker count from 4 to 8 to prevent test hangs
Root cause: test_full_live_workflow in batch context (with prior sims
running AI discussion turns) would queue its _do_project_switch behind
the auto-pruner's scan of tests/logs/ (154MB, 6519 files). The 4-worker
pool was saturated, so the switch would never run within 30s.

Fix: bump IO_POOL_MAX_WORKERS from 4 to 8. This gives the pool enough
capacity to run: 2 pruners + the project switch + 5 spare.

Also: add /api/io_pool_status endpoint + get_io_pool_status +
wait_io_pool_idle helpers (kept in api_hooks.py and api_hook_client.py
for the test_api_hook_client_io_pool.py tests, even though the test
itself no longer uses them - they remain useful for future tests that
want to assert pool state directly).

Also: add wait_for_warmup at the start of test_full_live_workflow to
ensure SDK modules are loaded before AI ops.

Test verification:
- test_full_live_workflow in isolation: 11.83s PASS
- test_full_live_workflow in batch (with 4 prior sims): 83.46s PASS
- 30/30 related unit tests PASS
2026-06-08 17:49:34 -04:00
ed 9afc93bce2 fix(app_controller): clear project-switch state in _handle_reset_session
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch
background thread running, the next test's btn_project_new_automated click
sees _project_switch_in_progress=True (from the prior thread) and queues
the new path via _project_switch_pending_path. The queued switch is never
actually submitted to the io_pool, so is_project_stale() stays True and
AI ops (_handle_generate_send) bail with 'project switch in progress;
AI ops disabled'.

Fix: _handle_reset_session now also clears _project_switch_in_progress,
_project_switch_pending_path, and _project_switch_error (under the
existing _project_switch_lock). This way, even if the prior background
thread is still running, the controller reports an idle state and the
new switch can be submitted normally.

Also:
- src/api_hook_client.py: reverted wait_for_project_switch to require
  in_progress=False (was relaxed to return on queued path, which misled
  the caller into thinking the switch was done)
- tests/test_handle_reset_session_clears_project.py: new test
  test_handle_reset_session_clears_project_switch_state asserts
  is_project_stale() returns False after reset
- tests/test_api_hook_client_wait_for_project_switch.py: updated
  test_wait_for_project_switch_does_not_return_on_queued (in_progress
  + matching path should keep waiting, not return early)
- tests/test_live_workflow.py: added pre-wait for any in-flight switch
  before doing btn_reset (so the test waits up to 60s for the prior
  switch to complete if needed)
- conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with
  the deeper hang analysis and recommended fix

Known follow-up: test_full_live_workflow still hangs in tier-3 batch
even with this fix, because the new _do_project_switch itself is hung
in the io_pool (likely saturation from prior sims' AI discussion turn
workers). Deeper investigation required.
2026-06-08 15:19:30 -04:00
ed b6972c31de test(live_workflow): use wait_for_project_switch + defensive file check
Replaces the 10x1s blind poll of derived state with a condition-based
wait on /api/project_switch_status. Also adds a defensive file existence
check that fails fast (within 5s) if the click was dropped or the
project creation handler crashed.

The new wait surfaces a clear error message ('Project switch did not
complete in 30s. Last status: ...') instead of the generic 'Project
failed to activate', and exposes _project_switch_error if the controller
reported one.

- tests/test_live_workflow.py: replaced poll loop (lines 57-65) with
  wait_for_project_switch + os.path.exists defensive check
2026-06-08 13:26:54 -04:00
ed a6605d9889 feat(api_hook_client): add wait_for_project_switch for deterministic test waits
Adds a polling helper that blocks until the project switch completes,
errors out, or times out. Replaces the fragile 10x1s blind poll in
test_full_live_workflow with a condition-based wait on the
/api/project_switch_status endpoint.

Features:
- Polls /api/project_switch_status every 200ms (configurable)
- Returns immediately on error (with the error in the result)
- Path matching: exact match OR basename match (handles absolute vs relative)
- Times out with a clear 'timeout' flag instead of a generic assertion
- Optional expected_path: if None, returns on any in_progress=False

- src/api_hook_client.py: new wait_for_project_switch method (37 lines)
- tests/test_api_hook_client_wait_for_project_switch.py: 6 unit tests
  with mocked _make_request covering all paths
2026-06-08 13:04:12 -04:00
ed e0a3eb8c05 fix(app_controller): regression in test_context_sim_live from clearing active_project_path
Task 2 (_handle_reset_session reset) introduced a regression: setting self.active_project_path to empty caused an infinite re-switch loop in _do_project_switch because _flush_to_project writes to active_project_path (raises OSError on empty path), and the finally block re-submitted the failed switch on every iteration. Result: test_context_sim_live saw switching-to status for 5+ seconds and MD-only generation was blocked.

Fix: keep self.active_project_path as-is in _handle_reset_session. Only reset self.project (to a fresh default_project dict) and self.project_paths (to empty list). The stale project state issue is solved by replacing the project dict; the active_project_path stays valid for _flush_to_project.

- src/app_controller.py: refined _handle_reset_session project reset
- tests/test_handle_reset_session_clears_project.py: updated contract test to assert active_project_path is preserved
2026-06-08 12:24:10 -04:00
ed 6ecb31ea0a feat(app_controller): reset project state in _handle_reset_session
Stale project state from prior live_gui tests (shared session-scoped
subprocess) was leaking into subsequent tests, causing the
test_full_live_workflow race condition: 'Project not switched' errors
when self.project still claimed to be a different project.

The fix: _handle_reset_session now mirrors the default-project branch
of __init__ (lines 1743-1745), creating a fresh default project dict,
clearing active_project_path and project_paths, and reinitializing
the workspace manager.

- src/app_controller.py: 6 new lines in _handle_reset_session
- tests/test_handle_reset_session_clears_project.py: 3 tests
  (active_project_path, project_paths, self.project)
2026-06-08 10:13:07 -04:00
ed abb3856525 feat(api_hooks): add /api/project_switch_status endpoint for deterministic test signaling
Adds a new endpoint that exposes the project-switch state machine so tests
can poll for completion instead of guessing with timeouts.

- AppController: track _project_switch_error on failure paths
- src/api_hooks.py: GET /api/project_switch_status returns
  {in_progress, pending_path, active_path, error}
- src/api_hook_client.py: get_project_switch_status() helper
- tests/test_api_hooks_project_switch.py: 3 unit tests for client + endpoint
  shape, 1 live_gui test for the default-idle case
2026-06-08 09:55:36 -04:00
ed 9eac02ddcb feat(tests): populate test_categories.toml with cross-cutting entries 2026-06-08 01:14:12 -04:00
ed 29ac64adc6 test(conftest): register tests.pytest_collection_order as pytest plugin 2026-06-08 00:49:11 -04:00
ed f240504f0e feat(collection_order): implement opt-in per-test sort via conftest hook 2026-06-08 00:47:21 -04:00
ed 6287005ad1 test(collection_order): add red tests for opt-in sort_items_by_order 2026-06-08 00:47:03 -04:00
ed e07036ad5d feat(batcher): implement Batch dataclass and plan() function 2026-06-08 00:46:12 -04:00
ed 246f293c56 test(batcher): add red tests for plan() function 2026-06-08 00:41:20 -04:00
ed f778ef509e feat(categorizer): implement load_registry, merge_registry, categorize_all 2026-06-08 00:33:21 -04:00
ed 828050ae4f test(categorizer): add red tests for registry merge and full classification 2026-06-08 00:27:04 -04:00
ed 9e5fed56a5 feat(categorizer): implement subsystem/speed/batch_group inference 2026-06-08 00:22:22 -04:00
ed 7aaac7d586 test(categorizer): add red tests for subsystem/speed/batch_group inference 2026-06-08 00:21:03 -04:00
ed b2e8cce9f6 feat(categorizer): implement auto_classify using AST scan (no regex) 2026-06-08 00:19:43 -04:00
ed fb54737f45 test(categorizer): add red tests for auto_classify fixture_class rules 2026-06-08 00:16:18 -04:00
ed dd48c095b8 refactor(tests): move test_categorizer library from scripts/ to tests/ 2026-06-08 00:15:19 -04:00
ed 746dde8286 push latest related to default layout 2026-06-07 23:50:24 -04:00