manual_slop

Private

Public Access

Author	SHA1	Message	Date
ed	9ddfa98133	fix(ai_client): move openai_compatible imports to local scope; fix startup_speedup invariant The follow-up track's tool-loop refactor moved 'from src.openai_compatible import send_openai_compatible, OpenAICompatibleRequest, NormalizedResponse' to MODULE level in src/ai_client.py. This violates the startup_speedup_20260606 invariant: heavy SDKs must not be loaded at module level because ai_client.py is on the main thread's import chain. src/openai_compatible.py line 5 does 'from openai import OpenAIError, ...', so any import from it triggers the openai SDK to load. test_ai_client_does_not_import_openai_at_module_level guards this invariant and was failing. Fix: move the imports back to local scope inside the function bodies that need them: - _default_send closure inside run_with_tool_loop (imports send_openai_compatible) - _send_grok (imports OpenAICompatibleRequest) - _send_minimax (imports OpenAICompatibleRequest) - _send_llama (imports OpenAICompatibleRequest) - _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse) Test patches: tests that previously patched 'src.ai_client.send_openai_compatible' now patch 'src.openai_compatible.send_openai_compatible' (the actual import source). _execute_tool_calls_concurrently patches unchanged (it's defined in src/ai_client.py itself). Green confirmed: 62 vendor + tool + import-isolation tests pass. 0 regressions.	2026-06-11 16:15:49 -04:00
ed	4748d13490	feat(ai_client): add send_func + on_pre_dispatch to run_with_tool_loop; refactor _send_gemini_cli Task 1.7 of the follow-up track. Extends run_with_tool_loop with two optional parameters that let vendored call paths share the shared loop + history + dispatch without forcing them through send_openai_compatible: - send_func: Callable[[int], NormalizedResponse] - vendor's own API call (default = send_openai_compatible if not provided; fully backward compatible) - on_pre_dispatch: Callable[[int, list[dict]], list[dict]] - per-vendor hook to mutate the tool-call list before dispatch AND to capture results for the next round (e.g. Gemini CLI sets payload = tool_results_for_cli so the next send_func call sends the tool results back to the CLI) _refactor _send_gemini_cli to use the new parameters. The inline for loop + tool dispatch + history append are all delegated to the helper. The vendor's send_func closure handles: - adapter.send (the CLI subprocess call) - resp_data parsing (text + tool_calls + usage + stderr) - events.emit for request_start + response_received - _append_comms for IN/OUT comms logging - The 'txt + calls -> history_add' special case The vendor's on_pre_dispatch closure handles: - _execute_tool_calls_concurrently (re-invoked here because the helper's call passes raw tool_calls but the vendor needs to mutate payload AND log results) - _reread_file_items + _build_file_diff_text (file diff re-read at last tool result) - MAX_ROUNDS system message - _truncate_tool_output - _MAX_TOOL_OUTPUT_BYTES budget warning - Payload mutation for the next round Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI + 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax + 2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.	2026-06-11 14:48:03 -04:00
ed	19a4d43e32	refactor(minimax): use run_with_tool_loop shared helper (68 -> 44 lines) Task 1.3 of the follow-up track. _send_minimax now uses run_with_tool_loop with a per-round request_builder callback that re-reads _minimax_history under _minimax_history_lock. The plan's Task 1.3 example builds the request once before the loop. That would break MiniMax tool flows because the API would not see the tool results appended to _minimax_history on later rounds. The fix: extend run_with_tool_loop's 2nd arg to accept Union[OpenAICompatibleRequest, Callable[[int], OpenAICompatibleRequest]] (backward compatible; static-request vendors pass a single request). MiniMax now passes a closure that rebuilds messages from history each round. Reasoning extraction: MiniMax exposes its chain-of-thought via response.raw_response.choices[0].message.reasoning_details[0]. get('text'). Lifted to a _extract_minimax_reasoning callback passed as reasoning_extractor=... (the new parameter added in the previous commit). Trim callback: wraps _trim_minimax_history so it can be called from run_with_tool_loop after each tool-result append. Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5 tool_loop core + 1 tool_loop builder + 39 others); the new test_ai_client_tool_loop_builder.py locks in the per-round builder contract.	2026-06-11 13:35:45 -04:00
ed	dc0f25c53b	test(ai_client): add red tests for run_with_tool_loop shared helper 5 Red tests in tests/test_ai_client_tool_loop.py verify the planned run_with_tool_loop contract (no-tool-call fast path, tool-call dispatch, max-rounds safety, history append, error tolerance). Deviation from plan: tests patch src.ai_client.send_openai_compatible (plan's Task 1.1 had src.tool_loop.send_openai_compatible). The plan predates the AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up track's Naming Convention section, run_with_tool_loop lives IN src/ai_client.py. The function body imports send_openai_compatible from src.openai_compatible, so src.ai_client.send_openai_compatible is the correct patch path. state.toml: current_phase 0 -> 1, phase_1 pending -> in_progress, t1_1 pending -> in_progress, blocked_by status phase_6_in_progress -> phase_6_complete (parent's Phase 6 checkpointed at `064cb26`). Confirmed red: 5 ImportError against src.ai_client.run_with_tool_loop at collection time.	2026-06-11 10:43:56 -04:00
ed	90f2be94af	test(grok,llama): red phase for Grok (xAI) + Llama (multi-backend) (8 tests, 6 fail) 8 failing tests in 2 new files for the upcoming Grok and Llama provider implementations. Grok (tests/test_grok_provider.py, 2 tests): 1. test_send_grok_uses_xai_endpoint: _send_grok calls _ensure_grok_client and uses an xAI client (base_url https://api.x.ai/v1) 2. test_grok_2_vision_supports_image: structural check that the capability registry has vision=True for grok-2-vision (already populated in Phase 1, so this test passes in Red phase; it is a regression guard for the registry, not an implementation test) Llama (tests/test_llama_provider.py, 6 tests): 1. test_send_llama_ollama_backend: _send_llama with localhost:11434 (Ollama) base URL 2. test_send_llama_openrouter_backend: _send_llama with OpenRouter URL 3. test_send_llama_custom_url: _send_llama with custom URL (escape hatch for self-hosted) 4. test_llama_model_discovery_unions_ollama_and_openrouter: _list_llama_models returns the 8 models from the capability registry 5. test_llama_3_2_vision_vision_capability: structural check for llama-3.2-11b-vision-preview (passes in Red phase) 6. test_llama_local_backend_cost_tracking_false_for_ollama: the local-LLM signal -- when base_url is localhost, _get_llama_cost_tracking() returns False. This is the first test that exercises the local LLM support that the capability matrix was designed for. Both _reset_grok_state and _reset_llama_state fixtures use hasattr() to be no-ops when the state doesn't exist (Red phase). Test signatures use the real 10-arg _send_minimax signature, NOT the plan's 12-arg with enable_tools / rag_engine. Red phase: 6/8 tests fail (4 AttributeError on missing _send_, 2 ImportError on missing _list_/_get_*). 2/8 pass (registry structural checks). Next: Green phase - implement _send_grok + _ensure_grok_client + _send_llama + _ensure_llama_client + _list_llama_models + _get_llama_cost_tracking in src/ai_client.py.	2026-06-11 01:41:47 -04:00
ed	de5e106234	fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch	2026-06-11 01:26:53 -04:00
ed	060f471cb9	test(qwen): red phase for Qwen via DashScope (5 failing tests) 5 failing tests in tests/test_qwen_provider.py that establish the core behaviors of the new Qwen (DashScope) provider: 1. test_send_qwen_routes_to_dashscope: _send_qwen calls _ensure_qwen_client and _dashscope_call, returns the text from the DashScope response 2. test_qwen_vision_vl_model_accepts_image: when file_items contains an image, the messages passed to _dashscope_call include the image ref 3. test_qwen_tool_format_translation: build_dashscope_tools converts OpenAI-shaped tool dicts to DashScope shape (name/description/parameters flat structure, not wrapped in function:) 4. test_qwen_error_classification: classify_dashscope_error maps dashscope.common.error.InvalidApiKey -> ProviderError(kind='auth', provider='qwen') 5. test_list_qwen_models_returns_hardcoded_registry: _list_qwen_models returns the 7 Qwen models registered in src/vendor_capabilities.py The autouse _reset_qwen_state fixture uses hasattr() so it is a no-op when _qwen_client / _qwen_history do not exist (yet); this keeps the fixture working in the Red phase. All 5 tests fail: - Tests 1, 2: AttributeError: src.ai_client has no _ensure_qwen_client / _send_qwen / _dashscope_call - Tests 3, 4: ModuleNotFoundError: No module named src.qwen_adapter - Test 5: ImportError: cannot import name _list_qwen_models Test signature adapted to match the real _send_minimax signature at src/ai_client.py:2143-2148 (10 params, no enable_tools / rag_engine) rather than the plan's 12-param signature. Next: Green phase - implement src/qwen_adapter.py + src/ai_client.py state + _ensure_qwen_client + _send_qwen + _list_qwen_models.	2026-06-11 00:53:10 -04:00
ed	b53fe39d79	test(openai_compatible): red phase for shared send helper (6 failing tests) 6 failing tests in tests/test_openai_compatible.py that establish the core behaviors of the new send_openai_compatible() shared helper: 1. test_send_non_streaming_returns_normalized_response: blocking call returns text, empty tool_calls, and correct usage token counts 2. test_send_streaming_aggregates_chunks: streaming call aggregates deltas into final text and fires stream_callback per chunk 3. test_tool_call_detection_in_response: tool_calls from the response are converted to dicts with id/type/function/arguments fields 4. test_vision_multimodal_message: messages with multimodal content (text + image_url) are passed through unchanged to the client 5. test_error_classification_429_to_rate_limit: RateLimitError from openai SDK is caught and re-raised as ProviderError(kind='rate_limit') 6. test_normalized_response_is_frozen_dataclass: NormalizedResponse is a frozen dataclass (FrozenInstanceError on attribute assignment) All 6 tests fail with ModuleNotFoundError: No module named 'src.openai_compatible' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). ProviderError confirmed importable from src.ai_client (no stub needed).	2026-06-11 00:35:13 -04:00
ed	6be04bc4f0	feat(vendor_capabilities): implement registry with initial 22-entry population Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase tests in tests/test_vendor_capabilities.py pass. Implementation: - VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes) - Module-level _REGISTRY dict keyed by (vendor, model) - register() inserts/overwrites entries - get_capabilities() returns specific entry if present, else vendor '' default, else raises KeyError with 'No capabilities registered' message - list_models_for_vendor() returns sorted model names for a vendor (excludes '' wildcard) Initial population (22 entries at module load): - 1 minimax wildcard (cost: 0.20/0.20 per Mtok) - 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True) - 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True) - 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True; qwen-audio has notes='Text-only in v1; audio input deferred') The plan's Task 1.3 listed 22 entries but included one impossible entry (vendor='minimax', model='grok-2-latest'). Omitted; 21 entries shipped. Test fix: test_fallback_to_vendor_default previously used model name 'llama-3.3-70b-specdec' which IS in the registry, so the specific entry was returned (with default cost_tracking=True), not the wildcard. Fixed by changing to 'llama-3.3-future-unregistered' (not in registry, so fallback fires correctly).	2026-06-11 00:30:52 -04:00
ed	6fb6f8653c	test(vendor_capabilities): red phase for registry lookup, fallback, unknown vendor 3 failing tests in tests/test_vendor_capabilities.py that establish the core behaviors of the new VendorCapability matrix: 1. test_registry_lookup_known_model: registering and looking up a specific (vendor, model) entry returns the registered entry 2. test_fallback_to_vendor_default: looking up an unregistered model returns the vendor's '*' default entry 3. test_unknown_vendor_raises: looking up a vendor with no entries raises KeyError with a 'No capabilities registered' message All 3 tests fail with ModuleNotFoundError: No module named 'src.vendor_capabilities' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). The autouse _clean_registry fixture snapshots src.vendor_capabilities._REGISTRY before each test and restores it after, providing test isolation for the module-level state.	2026-06-11 00:19:00 -04:00
ed	5a9b8d6891	fix(test+rag): clean chroma cache pre-test + add INVESTIGATE stderr for RAG init	2026-06-10 17:20:57 -04:00
ed	a3abe49ca9	fix(test): poll for mma_state_update 'simulating' to land in test_gui_ux_event_routing	2026-06-10 15:45:44 -04:00
ed	2c924fe6df	test(infra): poll-for-event race fixes + watchdog timeout bump + spec update	2026-06-10 15:14:35 -04:00
ed	563e609505	fix(test): poll for push_event to land in test_visual_mma_components	2026-06-10 15:13:25 -04:00
ed	8f7de45aca	fix(rag): robust test polling for entry race + stress test timing tolerance	2026-06-10 14:43:27 -04:00
ed	15ffc3a34f	fix(rag): make test assertion accept either file's content (robust to chroma ordering)	2026-06-10 13:53:52 -04:00
ed	1772fa8fc2	conductor(checkpoint): Final Phase 2 complete - FR1+FR2 re-applied, sim test passes in batch	2026-06-10 12:13:16 -04:00
ed	4660b8c874	fix(sim): defensive .setdefault('paths', []) in test_context_sim_live	2026-06-10 11:33:15 -04:00
ed	428aa18948	conductor(checkpoint): Checkpoint end of Phase 1 (4 FRs + 4 regression tests)	2026-06-10 09:56:21 -04:00
ed	b96d709efb	test(reset): regression for 3 pre-existing controller bugs	2026-06-10 09:16:46 -04:00
ed	72f8f466fe	fix(sim+api): proper wait loops, project switch endpoint, drop stale check Three real fixes for the sim test + the live_gui coordination layer: 1. /api/project_switch_status endpoint in src/app_controller.py. The wait helper had been calling this endpoint but it did not exist; the helper always received a 404, fell back to {in_progress: False}, and returned immediately even when a switch was in flight. Added the endpoint that reads _project_switch_in_progress, active_project_path, and _project_switch_error from the controller. 2. simulation/sim_base.py: replace time.sleep(2.0)/time.sleep(1.5) in the setup() with wait_io_pool_idle and wait_for_project_switch so the test does not click btn_md_only while a project switch is in flight. Also added the wait calls to sim_context.py for the same reason. 3. src/app_controller.py _handle_md_only: removed the is_project_stale() early-return. The stale state is a transient window during which the previous code dropped the click on the floor with a misleading 'stale ui' status. The MD generation worker is safe to run from any project state; the action handler now always proceeds. 4. tests/test_extended_sims.py: set current_model to 'gemini-cli' so _do_generate does not raise KeyError('model') when the test overrides provider to gemini_cli. KNOWN ISSUE: test_context_sim_live still fails with status 'switching to: temp_livecontextsim' after a 60s wait. The click appears to be re-triggering a project switch via the GUI's render loop. Root cause investigation deferred; the sim is async and the test path is fragile.	2026-06-10 00:31:22 -04:00
ed	33d02bb11f	fix(test): drop rmtree race in live_gui workspace creation The session-scoped live_gui fixture deleted the shared workspace before recreating it, which raced with the per-worker lock acquisition and produced FileNotFoundError on .live_gui_owner.lock in xdist. The per-run timestamped name (tests/artifacts/live_gui_workspace_<ts>/) already provides enough isolation between pytest invocations, so the rmtree is unnecessary. Use mkdir(exist_ok=True) only.	2026-06-09 23:31:09 -04:00
ed	283bb7085b	fix(test): remove live_gui skip gate — lock mechanism handles coordination	2026-06-09 22:45:36 -04:00
ed	5568b59634	fix(test): single shared workspace, remove per-worker subdirs (keep lock mechanism)	2026-06-09 22:38:28 -04:00
ed	4bb19835db	fix(test): per-worker workspace subdir + file-lock for xdist live_gui coordination	2026-06-09 22:23:33 -04:00
ed	38cb0f99b4	fix(test): add PID to workspace path for xdist worker isolation	2026-06-09 21:45:02 -04:00
ed	35f4cecb9b	fix(test): catch OSError in workspace rmtree retry (broader than PermissionError)	2026-06-09 21:22:00 -04:00
ed	aa776224f2	test(workspace): update fixture test to assert tests/artifacts/ not tmp dir	2026-06-09 21:06:06 -04:00
ed	ccc2aa0be9	test(workspace): verify per-run workspace path and gitignore status	2026-06-09 20:45:24 -04:00
ed	b8c15f8d92	fix(test): per-run workspace under tests/artifacts/ (replaces tmp_path_factory)	2026-06-09 20:42:43 -04:00
ed	fe240db410	fix(reset): clear mma_tier_usage and RAG state in _handle_reset_session	2026-06-09 19:44:10 -04:00
ed	34290e5d1a	test(watchdog): update PYTEST_FINISHED_TIMEOUT_SECONDS to 600 to match conftest	2026-06-09 18:42:53 -04:00
ed	c3af1b8a2e	chore(test): double smart_watchdog timeout from 300s to 600s for tier-3	2026-06-09 18:37:34 -04:00
ed	7a946544ff	test(mma): mark test_visual_mma_components with clean_baseline	2026-06-09 17:14:23 -04:00
ed	e7da7e0d6a	test(rag): update test for Phase 4 coalescing state	2026-06-09 17:10:33 -04:00
ed	749120d239	feat(audit): flag hardcoded workspace and project-root paths in tests	2026-06-09 17:01:14 -04:00
ed	1cd3444e4c	test(rag): mark RAG tests with clean_baseline for batch isolation	2026-06-09 16:56:55 -04:00
ed	7b87bbf5ec	feat(test): clean_baseline marker resets controller state before test	2026-06-09 16:40:18 -04:00
ed	b8fcd9d6f5	fix(rag): coalesce _sync_rag_engine calls via token + dirty flag	2026-06-09 16:25:44 -04:00
ed	006bb11488	refactor(test): 5 test files use live_gui_workspace fixture instead of hardcoded path	2026-06-09 16:14:40 -04:00
ed	91313451a2	feat(test): expose live_gui_workspace as a separate fixture	2026-06-09 15:53:06 -04:00
ed	c64da95ef5	refactor(test): live_gui workspace via tmp_path_factory	2026-06-09 15:51:35 -04:00
ed	c3cb3c6e44	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:47:28 -04:00
ed	67d0211e56	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:42:00 -04:00
ed	16bd3d3a47	refactor(test): wrap live_gui subprocess in _LiveGuiHandle class	2026-06-09 15:37:47 -04:00
ed	40f905d14b	test(rag): update dim-mismatch test to assert rmtree behavior The fix in `644d88ab` changed the recovery path from client.delete_collection to shutil.rmtree (chromadb 1.5.x delete_collection is broken on corrupted state). The test still asserted the old behavior.	2026-06-09 14:50:55 -04:00
ed	a341d7a7c8	test: ensure sentence-transformers is in test env + conftest gate	2026-06-09 10:37:14 -04:00
ed	e62266e868	fix(rag): surface embedding provider init failure as 'error' status The bug: when the local embedding provider fails to initialize (e.g. sentence-transformers not installed), RAGEngine.__init__ leaves self.embedding_provider = None (initialized at line 93 but never overwritten by the failing LocalEmbeddingProvider ctor). The constructor returns. _sync_rag_engine's else branch then sets status to 'ready' - a lie. The RAG panel shows 'ready'. The user triggers a retrieval. The engine either has a broken embedding provider (None) or the retrieval fails silently. The RAG context never appears in the AI's history. The fix: in _sync_rag_engine's _task, after RAGEngine(...) returns, check if engine.embedding_provider is None. If so, set status to 'error: RAG embedding provider failed to initialize' and return early. This prevents: - The engine from being assigned to self.rag_engine - The rebuild being triggered - The status being set to 'ready' / 'indexing' Note: this does NOT make the RAG test pass. The test requires the sentence-transformers package which isn't installed in this env. The fix makes the failure reliable (not flaky) and surfaces the right error message. TDD: 3 tests added in tests/test_rag_engine_ready_status_bug.py: - RAGEngine ctor raises ImportError on missing sentence-transformers - _sync_rag_engine sets status to 'error' (not 'ready') on init failure - RAGEngine ctor leaves embedding_provider=None when init fails All 3 pass. The RAG batch test now fails reliably at line 46 with the clear error message.	2026-06-09 09:39:02 -04:00
ed	bcdc26d0bd	fix(gui): correct __getattr__ to not silently return None for missing ui_ attrs PR1 follow-up (the actual IM_ASSERT root cause fix). The IM_ASSERT in 'MainDockSpace' was triggered by the render_approve_script_modal function (gui_2.py:4895) calling imgui.checkbox with a None value for app.ui_approve_modal_preview. The chain of bugs: 1. AppController.__getattr__ returned None for ANY ui_ attribute (line 1237-1238). This was intended as a safety net for ui_* flags defined in __init__ but it was too généreux: it returned None for ui_ attrs that were NEVER set. 2. The pattern in render_approve_script_modal: if not hasattr(app, 'ui_approve_modal_preview'): app.ui_approve_modal_preview = False _, app.ui_approve_modal_preview = imgui.checkbox(..., app.ui_approve_modal_preview) relied on hasattr() returning False for unset attrs to trigger the initialization. But the App.__setattr__ checks hasattr(self.controller, name) to decide where to route assignments. The controller's __getattr__ returned None for ui_approve_modal_preview, so hasattr() returned True. The App.__setattr__ routed the assignment to the controller. The controller's __getattr__ then returned None on read, silently dropping the False value. 3. The next line called imgui.checkbox with None, which raised a TypeError. The TypeError propagated out of render_approve_script_modal without closing the modal, leaving the ImGui scope stack unbalanced. The unbalanced scope triggered IM_ASSERT(Missing End()) on the next frame. Fix: AppController.__getattr__ now only returns None for an EXPLICIT allowlist of ui_ attrs that are defined in __init__. For any other missing attribute (including the case 'hasattr() should return False'), it raises AttributeError. The App.__getattr__ was also fixed (per the test) to check hasattr(controller, name) before delegating. This is defense in depth in case other __getattr__ patterns are added. Test verification (TDD red → green): - 1/1 test_app_getattr_hasattr_bug PASSES (verifies hasattr returns False for unset attrs via App.__getattr__) - 1/1 test_app_controller_getattr_ui_bug PASSES (verifies hasattr returns False for unset ui_ attrs on controller) Live verification: - 4 sims + test_live_workflow + 2 markdown tests: 7/7 PASS in 83.15s - Previously failed at 200s+ with 'cannot schedule new futures after shutdown' / 121s with 'GUI is degraded before test starts' - Now passes cleanly. The IM_ASSERT no longer fires. 13/13 related unit tests pass (app_controller_* + app_run_* + app_getattr_*). No regressions in 51/51 io_pool/warmup/sigint/etc. unit tests.	2026-06-08 23:45:25 -04:00
ed	51ecace464	test(live_workflow): pre-flight health check fails fast on dirty state PR3 of the test_full_live_workflow_imgui_assert fix sequence. When a prior live_gui test in the same session crashes the GUI (e.g. via an ImGui IM_ASSERT from cumulative panel state), the controller's _io_pool gets shut down. The next test starts in a degraded state but only discovers this 120s later when its project switch times out with a confusing 'cannot schedule new futures after shutdown' error. This commit adds a /api/gui_health pre-flight check at the start of test_full_live_workflow. If the GUI is degraded, the test fails fast (within 1s) with a clear, actionable message that includes: - The exact RuntimeError that caused the degradation - The full traceback of the last ImGui scope mismatch - A note that the new test cannot proceed with a dirty state Per user feedback 2026-06-08: 'I don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state.' Also includes the planning documents written earlier in this session: - TODO_test_full_live_workflow_v2.md (task list) - test_full_live_workflow_imgui_assert_20260608.md (root cause report) - test_full_live_workflow_propagation_digest_20260608.md (solutions digest) - batch_resilience_plan_20260608.md (batch resilience plan) Verification: - test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade) - 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS) - Without PR3 fix: 200s FAIL with confusing 120s timeout - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message - The fast-fail is observable, not silent (per user's 'wrap might be worth it if that properly lets us handle the assert')	2026-06-08 21:17:54 -04:00

1 2 3 4 5 ...