manual_slop

Author	SHA1	Message	Date
ed	7fee76f491	feat(capability_matrix): add anthropic, gemini, deepseek registry entries Phase 5 t5_1, t5_2, t5_3: populate the v2 capability matrix for the 3 vendors that had no registry entries. Previously, get_capabilities('anthropic', ...) raised KeyError and the GUI fell back to the 'unregistered' defaults. Now all 8 vendors in PROVIDERS are on the matrix. Entries added: anthropic/* (12 entries) - wildcard + 8 sonnet/opus variants + haiku-4-5 + claude-fable-5 - caching=True, structured_output=True, file_search=True, mcp_support=True, computer_use=True (per Claude 3.5+ docs) - cost: sonnet=\/\, opus=\/\, haiku=\/\ - context_window=200000 (Claude 3+ standard) gemini/* (5 entries) - wildcard + 3.1-pro-preview + 3-flash-preview + 2.5-flash + 2.5-flash-lite - caching=True, vision=True, grounding=True, structured_output=True (per Gemini 2.5+ docs) - video=True, audio=True (for 2.5+ and 3.x; lite has no video/audio) - cost: 3.1-pro=\.50/\.50, 3-flash=\.15/\.60, 2.5-flash=\.15/\.60, 2.5-flash-lite=\.075/\.30 - context_window=1000000 (Gemini 2.5+ standard) deepseek/* (4 entries) - wildcard + deepseek-v3 + deepseek-reasoner + deepseek-r1 - reasoning=True (for r1/reasoner; v3 has structured_output=True only) - structured_output=True (all) - cost: v3=\.27/\.10, r1=\.55/\.19 - context_window=32768 Tests: - 9 new tests in tests/test_vendor_capabilities.py: * anthropic: sonnet/opus/haiku/wildcard entry tests * gemini: pro-preview + vision + wildcard tests * deepseek: reasoner + wildcard tests - 116/116 vendor+tool+provider+import-isolation tests pass (no regressions; +9 new tests this commit) - 3 audit scripts pass	2026-06-11 21:35:32 -04:00
ed	7d60e8f5ab	feat(capability_matrix): populate v2 fields per-model; add runtime local override Updates per-model registry entries to populate the 12 v2 fields where the capability is genuinely supported: minimax-M2.5/M2.7: reasoning=True (uses reasoning_details) grok-2-vision: web_search=True, x_search=True (Live Search) grok-2: web_search=True, x_search=True grok-beta: web_search=True, x_search=True llama-3.1-405b: reasoning=True (explicitly in model name) qwen-long: caching=True (custom long-context chunking) qwen-audio: audio=True (was 'deferred' in v1 notes) Adds the runtime override helper: _apply_runtime_caps_override(app, caps) -> caps with local=True if app.current_provider=='llama' AND _llama_base_url contains 'localhost' or '127.0.0.1' The 'local' flag is the only v2 field that is runtime-state, not a static per-model property (OpenRouter llama is cloud; Ollama llama is local: same model name, different backend). The override uses dataclasses.replace() to return an updated copy of the frozen dataclass (the original instance is not mutated). Implemented in src/gui_2.py (per the HARD RULE on no new src/<thing>.py files). The override is wired into App._get_active_capabilities() so the GUI sees caps.local=True when the active backend is Ollama and caps.local=False otherwise. Also: cost panel in src/gui_2.py (per-tier + session-total columns) now renders 'Free (local)' when caps.local=True (both the per-tier cost column and the session-total line). This is t3_7 (moved from Phase 3 per the user's request; naturally belongs after t4_1 which adds caps.local). Tests: - 3 new tests in tests/test_vendor_capabilities.py: * per-model population (reasoning, audio, caching, vision) * runtime override for llama+localhost * runtime override does NOT touch other vendors - 107/107 vendor+tool+provider+import-isolation tests pass (no regressions; +4 new tests this commit) - 3 audit scripts pass	2026-06-11 21:04:36 -04:00
ed	25baa6fe25	feat(ai_client): add native Ollama adapter; route localhost to it When _llama_base_url is localhost/127.0.0.1, _send_llama now calls _send_llama_native (the native /api/chat adapter) instead of the OpenAI-compat path. The native adapter supports Ollama's vendor-specific fields: think, images, thinking. Functions added (in src/ai_client.py, per the naming convention HARD RULE on no new src/<thing>.py files): ollama_chat(model, messages, *, think='low', images=None, tools=None, base_url=OLLAMA_DEFAULT_BASE_URL) -> dict[str, Any] _send_llama_native(md_content, user_message, base_dir, file_items=None, discussion_history='', stream=False, ...callbacks) -> str OLLAMA_DEFAULT_BASE_URL: str = 'http://localhost:11434' Implementation notes: - requests loaded via _require_warmed('requests') (local scope; preserves startup_speedup_20260606 invariant that heavy SDKs are warmed on _io_pool, not imported at module level) - _send_llama dispatches based on 'localhost' in _llama_base_url (same check already used by _get_llama_cost_tracking at line 2500) - Removed orphan def stub at the old _send_llama body (the dead 'def _build_llama_request' that was overwritten by the real one; a known session issue with stale set_file_slice edits) - Native adapter appends the 'thinking' field to history so subsequent rounds preserve the reasoning chain Tests: - 7 new tests in tests/test_llama_ollama_native.py: * ollama_chat hits /api/chat (not /v1/chat/completions) * ollama_chat includes 'think' param in payload * ollama_chat includes 'images' in payload * _send_llama_native wraps ollama_chat * _send_llama_native preserves 'thinking' field * _send_llama routes localhost to native (no openai client) * _send_llama keeps openai path for non-local (no POST) - Updated test_send_llama_ollama_backend in test_llama_provider.py to mock the native path (was: mocked openai-compat; now: mocked requests.post) - 103/103 vendor+tool+provider+import-isolation tests pass (no regressions; +7 new tests this commit) - 4 audit scripts pass	2026-06-11 20:45:08 -04:00
ed	0a9e277564	feat(capability_matrix): add 12 v2 fields to VendorCapabilities The 7 v1 fields (vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking) plus 2 cost fields and notes are now extended by 12 v2 fields: local, reasoning, structured_output, code_execution, web_search, x_search, file_search, mcp_support, audio, video, grounding, computer_use All default to False. Registry entries continue to work unchanged (backward compatible). t4_1 of Phase 4. Tests: - 12 parameterized 'default is False' tests - 12 parameterized 'round-trip to True' tests - 3 'local flag' tests: per-model, wildcard fallback, vendor isolation - 3 pre-existing registry tests still pass - 96/96 vendor+tool+provider+import-isolation tests pass (no regressions; +27 new tests this commit)	2026-06-11 20:24:30 -04:00
ed	74c3b6b274	refactor(ai_client): move PROVIDERS to src/ai_client.py; re-export via models.__getattr__ Phase 2 tasks 2.1 + 2.2 + 2.3a of the follow-up track. PROVIDERS now lives in src/ai_client.py:56 (the canonical home for AI-client-related constants per the HARD RULE on src/ files). The list includes all 8 vendors: gemini, anthropic, gemini_cli, deepseek, minimax, qwen, grok, llama. Backward compat: src/models.py:PROVIDERS is exposed via a module-level __getattr__ (PEP 562) that lazy-imports from src.ai_client. The lazy approach was needed because src.ai_client imports ToolPreset/BiasProfile/Tool from src.models at line 50, so a top-level 'from src.ai_client import PROVIDERS' in models.py would create a circular import. Adding a branch to the existing __getattr__ in models.py (which also handles pydantic class factories) is the surgical fix. tests/test_provider_curation.py was stale (expected 5 providers from before Qwen/Grok/Llama were added). Updated to 8. New test: tests/test_providers_source_of_truth.py asserts: - src.ai_client.PROVIDERS exists and matches the 8-provider list - src.models.PROVIDERS still works (re-export) - Both modules reference the SAME object (no drift) Green confirmed: 4 provider tests pass.	2026-06-11 16:38:09 -04:00
ed	9ddfa98133	fix(ai_client): move openai_compatible imports to local scope; fix startup_speedup invariant The follow-up track's tool-loop refactor moved 'from src.openai_compatible import send_openai_compatible, OpenAICompatibleRequest, NormalizedResponse' to MODULE level in src/ai_client.py. This violates the startup_speedup_20260606 invariant: heavy SDKs must not be loaded at module level because ai_client.py is on the main thread's import chain. src/openai_compatible.py line 5 does 'from openai import OpenAIError, ...', so any import from it triggers the openai SDK to load. test_ai_client_does_not_import_openai_at_module_level guards this invariant and was failing. Fix: move the imports back to local scope inside the function bodies that need them: - _default_send closure inside run_with_tool_loop (imports send_openai_compatible) - _send_grok (imports OpenAICompatibleRequest) - _send_minimax (imports OpenAICompatibleRequest) - _send_llama (imports OpenAICompatibleRequest) - _send_gemini_cli (imports OpenAICompatibleRequest + NormalizedResponse) Test patches: tests that previously patched 'src.ai_client.send_openai_compatible' now patch 'src.openai_compatible.send_openai_compatible' (the actual import source). _execute_tool_calls_concurrently patches unchanged (it's defined in src/ai_client.py itself). Green confirmed: 62 vendor + tool + import-isolation tests pass. 0 regressions.	2026-06-11 16:15:49 -04:00
ed	4748d13490	feat(ai_client): add send_func + on_pre_dispatch to run_with_tool_loop; refactor _send_gemini_cli Task 1.7 of the follow-up track. Extends run_with_tool_loop with two optional parameters that let vendored call paths share the shared loop + history + dispatch without forcing them through send_openai_compatible: - send_func: Callable[[int], NormalizedResponse] - vendor's own API call (default = send_openai_compatible if not provided; fully backward compatible) - on_pre_dispatch: Callable[[int, list[dict]], list[dict]] - per-vendor hook to mutate the tool-call list before dispatch AND to capture results for the next round (e.g. Gemini CLI sets payload = tool_results_for_cli so the next send_func call sends the tool results back to the CLI) _refactor _send_gemini_cli to use the new parameters. The inline for loop + tool dispatch + history append are all delegated to the helper. The vendor's send_func closure handles: - adapter.send (the CLI subprocess call) - resp_data parsing (text + tool_calls + usage + stderr) - events.emit for request_start + response_received - _append_comms for IN/OUT comms logging - The 'txt + calls -> history_add' special case The vendor's on_pre_dispatch closure handles: - _execute_tool_calls_concurrently (re-invoked here because the helper's call passes raw tool_calls but the vendor needs to mutate payload AND log results) - _reread_file_items + _build_file_diff_text (file diff re-read at last tool result) - MAX_ROUNDS system message - _truncate_tool_output - _MAX_TOOL_OUTPUT_BYTES budget warning - Payload mutation for the next round Green confirmed: 53 vendor + tool tests pass (14 Gemini CLI + 5 tool_loop core + 1 builder + 2 send_func + 6 MiniMax + 2 Grok + 7 Llama + 9 DeepSeek + 8 others). No regressions.	2026-06-11 14:48:03 -04:00
ed	19a4d43e32	refactor(minimax): use run_with_tool_loop shared helper (68 -> 44 lines) Task 1.3 of the follow-up track. _send_minimax now uses run_with_tool_loop with a per-round request_builder callback that re-reads _minimax_history under _minimax_history_lock. The plan's Task 1.3 example builds the request once before the loop. That would break MiniMax tool flows because the API would not see the tool results appended to _minimax_history on later rounds. The fix: extend run_with_tool_loop's 2nd arg to accept Union[OpenAICompatibleRequest, Callable[[int], OpenAICompatibleRequest]] (backward compatible; static-request vendors pass a single request). MiniMax now passes a closure that rebuilds messages from history each round. Reasoning extraction: MiniMax exposes its chain-of-thought via response.raw_response.choices[0].message.reasoning_details[0]. get('text'). Lifted to a _extract_minimax_reasoning callback passed as reasoning_extractor=... (the new parameter added in the previous commit). Trim callback: wraps _trim_minimax_history so it can be called from run_with_tool_loop after each tool-result append. Green confirmed: 51 vendor + tool tests pass (6 MiniMax + 5 tool_loop core + 1 tool_loop builder + 39 others); the new test_ai_client_tool_loop_builder.py locks in the per-round builder contract.	2026-06-11 13:35:45 -04:00
ed	dc0f25c53b	test(ai_client): add red tests for run_with_tool_loop shared helper 5 Red tests in tests/test_ai_client_tool_loop.py verify the planned run_with_tool_loop contract (no-tool-call fast path, tool-call dispatch, max-rounds safety, history append, error tolerance). Deviation from plan: tests patch src.ai_client.send_openai_compatible (plan's Task 1.1 had src.tool_loop.send_openai_compatible). The plan predates the AGENTS.md HARD RULE on src/<thing>.py files; per the follow-up track's Naming Convention section, run_with_tool_loop lives IN src/ai_client.py. The function body imports send_openai_compatible from src.openai_compatible, so src.ai_client.send_openai_compatible is the correct patch path. state.toml: current_phase 0 -> 1, phase_1 pending -> in_progress, t1_1 pending -> in_progress, blocked_by status phase_6_in_progress -> phase_6_complete (parent's Phase 6 checkpointed at `064cb26`). Confirmed red: 5 ImportError against src.ai_client.run_with_tool_loop at collection time.	2026-06-11 10:43:56 -04:00
ed	90f2be94af	test(grok,llama): red phase for Grok (xAI) + Llama (multi-backend) (8 tests, 6 fail) 8 failing tests in 2 new files for the upcoming Grok and Llama provider implementations. Grok (tests/test_grok_provider.py, 2 tests): 1. test_send_grok_uses_xai_endpoint: _send_grok calls _ensure_grok_client and uses an xAI client (base_url https://api.x.ai/v1) 2. test_grok_2_vision_supports_image: structural check that the capability registry has vision=True for grok-2-vision (already populated in Phase 1, so this test passes in Red phase; it is a regression guard for the registry, not an implementation test) Llama (tests/test_llama_provider.py, 6 tests): 1. test_send_llama_ollama_backend: _send_llama with localhost:11434 (Ollama) base URL 2. test_send_llama_openrouter_backend: _send_llama with OpenRouter URL 3. test_send_llama_custom_url: _send_llama with custom URL (escape hatch for self-hosted) 4. test_llama_model_discovery_unions_ollama_and_openrouter: _list_llama_models returns the 8 models from the capability registry 5. test_llama_3_2_vision_vision_capability: structural check for llama-3.2-11b-vision-preview (passes in Red phase) 6. test_llama_local_backend_cost_tracking_false_for_ollama: the local-LLM signal -- when base_url is localhost, _get_llama_cost_tracking() returns False. This is the first test that exercises the local LLM support that the capability matrix was designed for. Both _reset_grok_state and _reset_llama_state fixtures use hasattr() to be no-ops when the state doesn't exist (Red phase). Test signatures use the real 10-arg _send_minimax signature, NOT the plan's 12-arg with enable_tools / rag_engine. Red phase: 6/8 tests fail (4 AttributeError on missing _send_*, 2 ImportError on missing _list_*/_get_*). 2/8 pass (registry structural checks). Next: Green phase - implement _send_grok + _ensure_grok_client + _send_llama + _ensure_llama_client + _list_llama_models + _get_llama_cost_tracking in src/ai_client.py.	2026-06-11 01:41:47 -04:00
ed	de5e106234	fix(qwen): align with dashscope 1.25.21 API; remove InvalidApiKey monkey-patch	2026-06-11 01:26:53 -04:00
ed	060f471cb9	test(qwen): red phase for Qwen via DashScope (5 failing tests) 5 failing tests in tests/test_qwen_provider.py that establish the core behaviors of the new Qwen (DashScope) provider: 1. test_send_qwen_routes_to_dashscope: _send_qwen calls _ensure_qwen_client and _dashscope_call, returns the text from the DashScope response 2. test_qwen_vision_vl_model_accepts_image: when file_items contains an image, the messages passed to _dashscope_call include the image ref 3. test_qwen_tool_format_translation: build_dashscope_tools converts OpenAI-shaped tool dicts to DashScope shape (name/description/parameters flat structure, not wrapped in function:) 4. test_qwen_error_classification: classify_dashscope_error maps dashscope.common.error.InvalidApiKey -> ProviderError(kind='auth', provider='qwen') 5. test_list_qwen_models_returns_hardcoded_registry: _list_qwen_models returns the 7 Qwen models registered in src/vendor_capabilities.py The autouse _reset_qwen_state fixture uses hasattr() so it is a no-op when _qwen_client / _qwen_history do not exist (yet); this keeps the fixture working in the Red phase. All 5 tests fail: - Tests 1, 2: AttributeError: src.ai_client has no _ensure_qwen_client / _send_qwen / _dashscope_call - Tests 3, 4: ModuleNotFoundError: No module named src.qwen_adapter - Test 5: ImportError: cannot import name _list_qwen_models Test signature adapted to match the real _send_minimax signature at src/ai_client.py:2143-2148 (10 params, no enable_tools / rag_engine) rather than the plan's 12-param signature. Next: Green phase - implement src/qwen_adapter.py + src/ai_client.py state + _ensure_qwen_client + _send_qwen + _list_qwen_models.	2026-06-11 00:53:10 -04:00
ed	b53fe39d79	test(openai_compatible): red phase for shared send helper (6 failing tests) 6 failing tests in tests/test_openai_compatible.py that establish the core behaviors of the new send_openai_compatible() shared helper: 1. test_send_non_streaming_returns_normalized_response: blocking call returns text, empty tool_calls, and correct usage token counts 2. test_send_streaming_aggregates_chunks: streaming call aggregates deltas into final text and fires stream_callback per chunk 3. test_tool_call_detection_in_response: tool_calls from the response are converted to dicts with id/type/function/arguments fields 4. test_vision_multimodal_message: messages with multimodal content (text + image_url) are passed through unchanged to the client 5. test_error_classification_429_to_rate_limit: RateLimitError from openai SDK is caught and re-raised as ProviderError(kind='rate_limit') 6. test_normalized_response_is_frozen_dataclass: NormalizedResponse is a frozen dataclass (FrozenInstanceError on attribute assignment) All 6 tests fail with ModuleNotFoundError: No module named 'src.openai_compatible' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). ProviderError confirmed importable from src.ai_client (no stub needed).	2026-06-11 00:35:13 -04:00
ed	6be04bc4f0	feat(vendor_capabilities): implement registry with initial 22-entry population Green phase: src/vendor_capabilities.py now exists and all 3 Red-phase tests in tests/test_vendor_capabilities.py pass. Implementation: - VendorCapabilities frozen dataclass with 12 fields (vendor, model, vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking, cost_input_per_mtok, cost_output_per_mtok, notes) - Module-level _REGISTRY dict keyed by (vendor, model) - register() inserts/overwrites entries - get_capabilities() returns specific entry if present, else vendor '*' default, else raises KeyError with 'No capabilities registered' message - list_models_for_vendor() returns sorted model names for a vendor (excludes '*' wildcard) Initial population (22 entries at module load): - 1 minimax wildcard (cost: 0.20/0.20 per Mtok) - 4 grok (1 wildcard + 3 models; grok-2-vision has vision=True) - 9 llama (1 wildcard + 8 models; 11b/90b vision variants have vision=True) - 8 qwen (1 wildcard + 7 models; qwen-vl-plus/max have vision=True; qwen-audio has notes='Text-only in v1; audio input deferred') The plan's Task 1.3 listed 22 entries but included one impossible entry (vendor='minimax', model='grok-2-latest'). Omitted; 21 entries shipped. Test fix: test_fallback_to_vendor_default previously used model name 'llama-3.3-70b-specdec' which IS in the registry, so the specific entry was returned (with default cost_tracking=True), not the wildcard. Fixed by changing to 'llama-3.3-future-unregistered' (not in registry, so fallback fires correctly).	2026-06-11 00:30:52 -04:00
ed	6fb6f8653c	test(vendor_capabilities): red phase for registry lookup, fallback, unknown vendor 3 failing tests in tests/test_vendor_capabilities.py that establish the core behaviors of the new VendorCapability matrix: 1. test_registry_lookup_known_model: registering and looking up a specific (vendor, model) entry returns the registered entry 2. test_fallback_to_vendor_default: looking up an unregistered model returns the vendor's '*' default entry 3. test_unknown_vendor_raises: looking up a vendor with no entries raises KeyError with a 'No capabilities registered' message All 3 tests fail with ModuleNotFoundError: No module named 'src.vendor_capabilities' (confirmed via pytest). The implementation file will be created in the next commit (Green phase). The autouse _clean_registry fixture snapshots src.vendor_capabilities._REGISTRY before each test and restores it after, providing test isolation for the module-level state.	2026-06-11 00:19:00 -04:00
ed	5a9b8d6891	fix(test+rag): clean chroma cache pre-test + add INVESTIGATE stderr for RAG init	2026-06-10 17:20:57 -04:00
ed	a3abe49ca9	fix(test): poll for mma_state_update 'simulating' to land in test_gui_ux_event_routing	2026-06-10 15:45:44 -04:00
ed	2c924fe6df	test(infra): poll-for-event race fixes + watchdog timeout bump + spec update	2026-06-10 15:14:35 -04:00
ed	563e609505	fix(test): poll for push_event to land in test_visual_mma_components	2026-06-10 15:13:25 -04:00
ed	8f7de45aca	fix(rag): robust test polling for entry race + stress test timing tolerance	2026-06-10 14:43:27 -04:00
ed	15ffc3a34f	fix(rag): make test assertion accept either file's content (robust to chroma ordering)	2026-06-10 13:53:52 -04:00
ed	1772fa8fc2	conductor(checkpoint): Final Phase 2 complete - FR1+FR2 re-applied, sim test passes in batch	2026-06-10 12:13:16 -04:00
ed	4660b8c874	fix(sim): defensive .setdefault('paths', []) in test_context_sim_live	2026-06-10 11:33:15 -04:00
ed	428aa18948	conductor(checkpoint): Checkpoint end of Phase 1 (4 FRs + 4 regression tests)	2026-06-10 09:56:21 -04:00
ed	b96d709efb	test(reset): regression for 3 pre-existing controller bugs	2026-06-10 09:16:46 -04:00
ed	72f8f466fe	fix(sim+api): proper wait loops, project switch endpoint, drop stale check Three real fixes for the sim test + the live_gui coordination layer: 1. /api/project_switch_status endpoint in src/app_controller.py. The wait helper had been calling this endpoint but it did not exist; the helper always received a 404, fell back to {in_progress: False}, and returned immediately even when a switch was in flight. Added the endpoint that reads _project_switch_in_progress, active_project_path, and _project_switch_error from the controller. 2. simulation/sim_base.py: replace time.sleep(2.0)/time.sleep(1.5) in the setup() with wait_io_pool_idle and wait_for_project_switch so the test does not click btn_md_only while a project switch is in flight. Also added the wait calls to sim_context.py for the same reason. 3. src/app_controller.py _handle_md_only: removed the is_project_stale() early-return. The stale state is a transient window during which the previous code dropped the click on the floor with a misleading 'stale ui' status. The MD generation worker is safe to run from any project state; the action handler now always proceeds. 4. tests/test_extended_sims.py: set current_model to 'gemini-cli' so _do_generate does not raise KeyError('model') when the test overrides provider to gemini_cli. KNOWN ISSUE: test_context_sim_live still fails with status 'switching to: temp_livecontextsim' after a 60s wait. The click appears to be re-triggering a project switch via the GUI's render loop. Root cause investigation deferred; the sim is async and the test path is fragile.	2026-06-10 00:31:22 -04:00
ed	33d02bb11f	fix(test): drop rmtree race in live_gui workspace creation The session-scoped live_gui fixture deleted the shared workspace before recreating it, which raced with the per-worker lock acquisition and produced FileNotFoundError on .live_gui_owner.lock in xdist. The per-run timestamped name (tests/artifacts/live_gui_workspace_<ts>/) already provides enough isolation between pytest invocations, so the rmtree is unnecessary. Use mkdir(exist_ok=True) only.	2026-06-09 23:31:09 -04:00
ed	283bb7085b	fix(test): remove live_gui skip gate — lock mechanism handles coordination	2026-06-09 22:45:36 -04:00
ed	5568b59634	fix(test): single shared workspace, remove per-worker subdirs (keep lock mechanism)	2026-06-09 22:38:28 -04:00
ed	4bb19835db	fix(test): per-worker workspace subdir + file-lock for xdist live_gui coordination	2026-06-09 22:23:33 -04:00
ed	38cb0f99b4	fix(test): add PID to workspace path for xdist worker isolation	2026-06-09 21:45:02 -04:00
ed	35f4cecb9b	fix(test): catch OSError in workspace rmtree retry (broader than PermissionError)	2026-06-09 21:22:00 -04:00
ed	aa776224f2	test(workspace): update fixture test to assert tests/artifacts/ not tmp dir	2026-06-09 21:06:06 -04:00
ed	ccc2aa0be9	test(workspace): verify per-run workspace path and gitignore status	2026-06-09 20:45:24 -04:00
ed	b8c15f8d92	fix(test): per-run workspace under tests/artifacts/ (replaces tmp_path_factory)	2026-06-09 20:42:43 -04:00
ed	fe240db410	fix(reset): clear mma_tier_usage and RAG state in _handle_reset_session	2026-06-09 19:44:10 -04:00
ed	34290e5d1a	test(watchdog): update PYTEST_FINISHED_TIMEOUT_SECONDS to 600 to match conftest	2026-06-09 18:42:53 -04:00
ed	c3af1b8a2e	chore(test): double smart_watchdog timeout from 300s to 600s for tier-3	2026-06-09 18:37:34 -04:00
ed	7a946544ff	test(mma): mark test_visual_mma_components with clean_baseline	2026-06-09 17:14:23 -04:00
ed	e7da7e0d6a	test(rag): update test for Phase 4 coalescing state	2026-06-09 17:10:33 -04:00
ed	749120d239	feat(audit): flag hardcoded workspace and project-root paths in tests	2026-06-09 17:01:14 -04:00
ed	1cd3444e4c	test(rag): mark RAG tests with clean_baseline for batch isolation	2026-06-09 16:56:55 -04:00
ed	7b87bbf5ec	feat(test): clean_baseline marker resets controller state before test	2026-06-09 16:40:18 -04:00
ed	b8fcd9d6f5	fix(rag): coalesce _sync_rag_engine calls via token + dirty flag	2026-06-09 16:25:44 -04:00
ed	006bb11488	refactor(test): 5 test files use live_gui_workspace fixture instead of hardcoded path	2026-06-09 16:14:40 -04:00
ed	91313451a2	feat(test): expose live_gui_workspace as a separate fixture	2026-06-09 15:53:06 -04:00
ed	c64da95ef5	refactor(test): live_gui workspace via tmp_path_factory	2026-06-09 15:51:35 -04:00
ed	c3cb3c6e44	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:47:28 -04:00
ed	67d0211e56	feat(test): autouse _check_live_gui_health recovers from degraded subprocess	2026-06-09 15:42:00 -04:00
ed	16bd3d3a47	refactor(test): wrap live_gui subprocess in _LiveGuiHandle class	2026-06-09 15:37:47 -04:00
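
The lookup order described in commits 6be04bc4f0 and 6fb6f8653c (exact entry, then the vendor's '*' default, then KeyError) can be sketched as below. This is a minimal illustration, not the contents of src/vendor_capabilities.py: the dataclass is trimmed to a few of the fields named in the log, and the field defaults and sample entries are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorCapabilities:
    # Trimmed field set; the defaults here are assumed for illustration.
    vendor: str
    model: str
    vision: bool = False
    tool_calling: bool = True
    context_window: int = 0
    notes: str = ""


# Module-level registry keyed by (vendor, model), as the log describes.
_REGISTRY: dict[tuple[str, str], VendorCapabilities] = {}


def register(caps: VendorCapabilities) -> None:
    """Insert or overwrite the entry for (caps.vendor, caps.model)."""
    _REGISTRY[(caps.vendor, caps.model)] = caps


def get_capabilities(vendor: str, model: str) -> VendorCapabilities:
    """Return the exact entry, else the vendor's '*' default, else raise."""
    exact = _REGISTRY.get((vendor, model))
    if exact is not None:
        return exact
    default = _REGISTRY.get((vendor, "*"))
    if default is not None:
        return default
    raise KeyError(f"No capabilities registered for {vendor}/{model}")


def list_models_for_vendor(vendor: str) -> list[str]:
    """Sorted model names for a vendor, excluding the '*' wildcard."""
    return sorted(m for (v, m) in _REGISTRY if v == vendor and m != "*")


# Sample entries (illustrative values only).
register(VendorCapabilities("grok", "*"))
register(VendorCapabilities("grok", "grok-2-vision", vision=True))
```

With these two sample entries, a lookup for an unregistered grok model falls back to the wildcard entry, while a lookup for a vendor with no entries at all raises KeyError.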

719 Commits
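
The runtime 'local' override described in commit 7d60e8f5ab can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in src/gui_2.py: the `Caps` class is a stand-in for VendorCapabilities, and the provider name and base URL are passed as plain arguments (hypothetical signature) instead of being read from the app and module state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Caps:
    # Stand-in for VendorCapabilities; only the fields needed here.
    vendor: str
    model: str
    local: bool = False


def apply_runtime_caps_override(current_provider: str, base_url: str, caps: Caps) -> Caps:
    """Return caps with local=True when llama is served from a local backend.

    Hypothetical signature: the log's helper takes (app, caps) instead.
    """
    is_local_url = "localhost" in base_url or "127.0.0.1" in base_url
    if current_provider == "llama" and is_local_url:
        # replace() builds a new frozen instance; the input is left untouched.
        return replace(caps, local=True)
    return caps


static_caps = Caps("llama", "llama-3.1-405b")
ollama_caps = apply_runtime_caps_override("llama", "http://localhost:11434", static_caps)
cloud_caps = apply_runtime_caps_override("llama", "https://openrouter.ai/api/v1", static_caps)
```

Because the dataclass is frozen, the registry entry itself never changes; only the copy handed to the GUI carries local=True.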