Lesson 5 from the 4-day test-hell saga. The chroma cache lives at
tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run
live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in
Path(active_project_path).parent places the cache one level higher
than expected.
RAG tests must pre-clean the cache to avoid persistent state from
prior batched runs. Documents the cleanup pattern (shutil.rmtree with
ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim),
and 3 anti-patterns (assuming per-run, not cleaning, asserting on
first chunk in batched context).
New entry at the top of the Recently Shipped list, linking to the
archive/ folder. Includes:
- 314/314 green across all 11 tier batches
- FR1-FR5 summary
- 3 lineage tracks also archived
- The 4 unblocked tracks
- Link to the closing batch-green report
- Remove row 1 from Active Tracks table
- Update rows 2-5, 17: test_infrastructure_hardening_20260609 -> '(merged)'
- Mark test_infrastructure_hardening as [COMPLETE 2026-06-10] [archived]
- Update link to use archive/ instead of tracks/
- Add closing note: 314/314 tests green, lineage tracks also archived
The previous doc showed:
- A fictional AppState dataclass (does not exist)
- A fictional __init__ that creates manager objects in __init__
(managers are lazy via __getattr__, created in _load_active_project)
- A fictional register_hooks(app) method (real flow is _init_actions
called from init_state populates _predefined_callbacks)
- A fictional enable_test_hooks parameter (real signature is
defer_warmup: bool = False, log_to_stderr: Optional[bool] = None;
--enable-test-hooks is parsed by sloppy.py for HookServer, not here)
The new doc describes the real init flow (timeline anchors, 12 locks,
GUI health state, io_pool, warmup manager, flags) and points to the
actual line numbers in src/app_controller.py.
guide_ai_client.md:
- Add 'Module-Level Imports' section explaining that the 5 provider SDKs
are NOT imported at module level; they're obtained via
src.module_loader._require_warmed() after the WarmupManager loads them
in the background. (Per startup_speedup_20260606: import src.ai_client
went from ~1800ms to ~161ms.)
guide_api_hooks.md:
- Add 4 warmup endpoints to the endpoints table:
/api/warmup_status, /api/warmup_wait?timeout=N,
/api/warmup_canaries, /api/startup_timeline
- Add 'Warmup API' section with client methods + external script pattern
(use get_warmup_wait() instead of time.sleep() race)
The live_gui fixture in tests/conftest.py:467 now yields a _LiveGuiHandle
object (not a tuple). The handle exposes:
- .process, .gui_script, .workspace (Path to per-run workspace)
- .is_alive(), .ensure_alive(), .respawn_count
- __iter__ and __getitem__ for backward-compatible tuple unpacking
Also document the xdist O_EXCL file-lock coordination pattern and the
PYTEST_XDIST_WORKER env var owner/client role split.
The 2026-06-05 live_gui_fragility_fixes refactor replaced the old 7-field
WorkspaceProfile (docking_layout: bytes, window_visibility, theme,
theme_fx_enabled, captured_at, description) with a 4-field model:
ini_content: str, show_windows, panel_states. tomli_w rejects bytes,
so the ini_content is now a plain ImGui ini string, not base64.
- Update Data Model class example + field table
- Update Serialization section + TOML example
- Update Profile Activation + Capturing Current State steps
- Update Layout Stability note (binary blob -> raw ini string)
- Replace 'Theme FX State is Global' limitation with 'Theme is Not Captured'
Three real fixes for the sim test + the live_gui coordination layer:
1. /api/project_switch_status endpoint in src/app_controller.py.
The wait helper had been calling this endpoint but it did not exist;
the helper always received a 404, fell back to {in_progress: False},
and returned immediately even when a switch was in flight. Added the
endpoint that reads _project_switch_in_progress, active_project_path,
and _project_switch_error from the controller.
2. simulation/sim_base.py: replace time.sleep(2.0)/time.sleep(1.5) in
the setup() with wait_io_pool_idle and wait_for_project_switch so
the test does not click btn_md_only while a project switch is in
flight. Also added the wait calls to sim_context.py for the same
reason.
3. src/app_controller.py _handle_md_only: removed the is_project_stale()
early-return. The stale state is a transient window during which the
previous code dropped the click on the floor with a misleading
'stale ui' status. The MD generation worker is safe to run from any
project state; the action handler now always proceeds.
4. tests/test_extended_sims.py: set current_model to 'gemini-cli' so
_do_generate does not raise KeyError('model') when the test
overrides provider to gemini_cli.
KNOWN ISSUE: test_context_sim_live still fails with status
'switching to: temp_livecontextsim' after a 60s wait. The click
appears to be re-triggering a project switch via the GUI's render
loop. Root cause investigation deferred; the sim is async and the
test path is fragile.
The session-scoped live_gui fixture deleted the shared workspace
before recreating it, which raced with the per-worker lock acquisition
and produced FileNotFoundError on .live_gui_owner.lock in xdist.
The per-run timestamped name (tests/artifacts/live_gui_workspace_<ts>/)
already provides enough isolation between pytest invocations, so the
rmtree is unnecessary. Use mkdir(exist_ok=True) only.