ed 566cf08cb8 conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare

2026-06-09 15:15:26 -04:00


Track Specification: Test Infrastructure Hardening (2026-06-09)

Status: SPEC FOR APPROVAL. The user has asked for a single track to "kill the test regression nightmare" so the 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) can land on a clean test bed.

Inheritance: This track absorbs and supersedes:

docs/reports/test_infra_hardening_foundation_20260608.md (foundation, 5 phases proposed)

docs/reports/batch_resilience_plan_20260608.md (4 solutions; Solution A + C recommended)

docs/reports/rag_test_batch_failure_status_20260609_pm3.md (filesystem hygiene findings #1-5)

docs/reports/rag_work_final_20260609_pm.md (remaining failures: io_pool race, set_value hook)

The implicit "fix test in batch" goal that has been chasing the Tier 2 for 4+ days

Overview

The test suite has accumulated 49+ live_gui tests that share a single session-scoped subprocess. Recent regression hunts have surfaced 3 distinct failure modes that keep re-emerging under different masks:

Subprocess state pollution — the 4 sims in test_extended_sims.py mutate controller state (current_provider, ui_* attrs, MMA workflows, RAG sync); subsequent tests in the same batch read dirty state.
Filesystem hygiene — the live_gui fixture creates tests/artifacts/live_gui_workspace/ as a HARDCODED relative path; 6 test files re-derive the path independently; RAGEngine.index_file joins base_dir + file_path with base_dir possibly being a relative path, so indexing silently no-ops in batch (the root cause of the RAG test batch failure).
io_pool race in _sync_rag_engine — multiple setters in quick succession submit parallel sync tasks, last-finished-wins, indexing is non-deterministic.

Each of these has been "fixed" in isolation (RAG dim-mismatch recursion, CWD fallback, embedding provider error surface, ini_content str/bytes sentinel, indent on _capture_workspace_profile) but the underlying architectural problems remain. The Tier 2 keeps finding new symptoms.

This track kills the nightmare by fixing the three root causes with surgical, contained, testable changes that the 4 upcoming tracks need as a precondition.

Current State Audit (as of 2026-06-09)

Already Implemented (DO NOT re-implement)

✅ live_gui fixture exists at tests/conftest.py:282 (session-scoped)
✅ Fixture kills subprocess on teardown (tests/conftest.py:516-547)
✅ /api/gui_health endpoint surfaces degraded state (commit 1c565da7)
✅ Pre-flight get_gui_health() check in test_full_live_workflow (commit 51ecace4)
✅ try/except around immapp.run (commit 1c565da7)
✅ _UI_FLAG_DEFAULTS allowlist for __getattr__ (commit bcdc26d0)
✅ _ini_capture_ready defer-not-catch flag for imgui.save_ini_settings_to_memory (commit d7487af4)
✅ _capture_workspace_profile indent fix (sub-track 1 of live_gui_test_hardening_v2, commit 26e0ced4)
✅ ini_content str/bytes contract test (tests/test_workspace_profile_serialization.py)
✅ LogPruner busy-loop backoff (commit ac08ee87)
✅ RAG dim-mismatch wipe (commit 64bc04a6)
✅ RAG _validate_collection_dim recursion fix (commit 644d88ab)
✅ RAG index_file CWD fallback (commit eb8357ec, uncommitted as of report; needs to be committed as defensive fix)
✅ sentence-transformers available in dev env via [local-rag] extra (commit a341d7a7)
✅ _sync_rag_engine surfaces embedding_provider init failure (commit e62266e8)
✅ test_required_test_dependencies.py enforces test-time deps (commit b801b11c)
✅ isolate_workspace, reset_paths, reset_ai_client, vlogger autouse fixtures
✅ audit_main_thread_imports.py and audit_weak_types.py static CI gates
✅ check_test_toml_paths.py audit script (CI gate for real-TOML references)
✅ Batch tier-1 + tier-2 + tier-3 + tier-H + tier-P structure (scripts/run_tests_batched.py)

Gaps to Fill (This Track's Scope)

Gap 1: `live_gui` subprocess scope + per-test dirty-state guard

What exists: Session-scoped live_gui fixture. Subprocess state survives across 49+ tests.
What's missing: When a test dies (IM_ASSERT, error result, etc.), the subprocess is left degraded and subsequent tests in other files inherit dirty state. The pre-flight get_gui_health() check is file-local, not test-local; it only reports health and does not recover.
Real symptom: test_rag_phase4_final_verify passes in isolation but fails in batch. test_gui2_set_value_hook_works returns '' instead of the queued value. test_rag_phase4_stress indexes non-deterministically.

Gap 2: Filesystem hygiene for `live_gui_workspace`

What exists: tests/conftest.py:412 hardcodes Path("tests/artifacts/live_gui_workspace"). 6 test files re-derive the same path independently.
What's missing: The path is relative to CWD. When the test runner or prior tests shift CWD, all downstream path joins break. RAGEngine.index_file joins base_dir + file_path; when base_dir is relative and CWD has drifted, the file doesn't exist, indexing silently no-ops.
Real symptom: RAG test in batch finds 0 documents in collection. chroma_test_final_verify count=0. chroma_db collection count=0. chroma_test_stress count=0. Only chroma_manual_slop (the user's project, NOT a test) has 328 docs from a separate session.
Files affected:
- tests/conftest.py:412 (HARDCODED)
- tests/test_rag_phase4_final_verify.py:20
- tests/test_rag_phase4_stress.py:21
- tests/test_saved_presets_sim.py:14, 121
- tests/test_tool_presets_sim.py:13
- tests/test_visual_sim_gui_ux.py:79
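
The CWD-drift failure described in Gap 2 can be reproduced in a few lines. This is a minimal sketch, not the project's code: `index_file` here is a hypothetical stand-in that only mimics the "join base_dir + file_path, skip if missing" behavior attributed to RAGEngine.index_file, and the directory layout is built in a throwaway temp dir.

```python
import os
import tempfile
from pathlib import Path


def index_file(base_dir, file_path) -> bool:
    """Hypothetical stand-in for RAGEngine.index_file: join, then skip missing files."""
    full_path = Path(base_dir) / file_path
    if not full_path.exists():
        return False  # silent no-op: nothing is indexed, nothing is raised
    return True


original_cwd = Path.cwd()
with tempfile.TemporaryDirectory() as root:
    root_path = Path(root).resolve()
    workspace = root_path / "tests" / "artifacts" / "live_gui_workspace"
    workspace.mkdir(parents=True)
    (workspace / "doc.txt").write_text("hello", encoding="utf-8")
    elsewhere = root_path / "elsewhere"
    elsewhere.mkdir()

    relative_base = Path("tests/artifacts/live_gui_workspace")
    try:
        os.chdir(root_path)
        indexed_before_drift = index_file(relative_base, "doc.txt")
        os.chdir(elsewhere)  # simulate a prior test shifting CWD
        indexed_after_drift = index_file(relative_base, "doc.txt")
        indexed_with_absolute = index_file(workspace, "doc.txt")
    finally:
        os.chdir(original_cwd)  # always restore CWD before the temp dir is removed

print(indexed_before_drift, indexed_after_drift, indexed_with_absolute)
# True False True
```

The relative base only works while CWD happens to be the repo root; the absolute base is unaffected by the drift, which is the property Goal B relies on.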

Gap 3: `_sync_rag_engine` io_pool race

What exists: src/app_controller.py _sync_rag_engine submits a sync task to _io_pool for each set_value that mutates rag_config. Multiple setters in quick succession → multiple parallel sync tasks → non-deterministic indexing.
What's missing: A coalescing/debounce pattern that serializes sync attempts within a short window (e.g., 100ms).
Real symptom: A test fires 5 setters (rag_collection_name, files, rag_enabled, rag_source, rag_emb_provider) in succession, and each one submits a sync. Whichever sync finishes last installs its engine, regardless of submission order, so the test may assert on the wrong engine's output.
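
The last-finished-wins behavior can be illustrated with a plain thread pool. This is a sketch under stated assumptions: `FakeController`, `sync`, and the config labels are hypothetical, and the sleep durations stand in for engine build time. It does not reproduce the real _sync_rag_engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class FakeController:
    """Hypothetical controller holding whichever engine config was installed last."""

    def __init__(self):
        self.engine_config = None


def sync(controller, config, build_seconds):
    time.sleep(build_seconds)          # simulate building the engine
    controller.engine_config = config  # unconditional install: last to finish wins


controller = FakeController()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Submitted first, but slow to build.
    stale = pool.submit(sync, controller, "config-v1", 0.2)
    # Submitted second with the newer config, but finishes first.
    fresh = pool.submit(sync, controller, "config-v2", 0.01)
    wait([stale, fresh])

print(controller.engine_config)  # the older config, because its task finished last
```

Submission order and completion order are independent here, so the newer config is overwritten by the older one.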

Gap 4: `set_value` hook test failure (pre-existing, separate code path)

What exists: test_gui2_set_value_hook_works line 41 — set_value returns 'queued' but get_value('ai_input') returns '' after 1.5s.
What's missing: A fix for what looks like a setattr routing issue in gui_2.py, similar to the earlier _UI_FLAG_DEFAULTS fix. The test's input never reaches the controller.
Real symptom: Test fails in batch; same class of bug as the _UI_FLAG_DEFAULTS allowlist bug (commit bcdc26d0).

Gap 5: Tests assert against dirty subprocess state from prior tests

What exists: Test isolation is implicit (assumes clean state from prior fixture). When a prior test's set_value calls pollute the controller, subsequent tests fail in ways unrelated to their code.
What's missing: A _reset_controller_state hook that the live_gui fixture exposes, so each test can opt-in to a clean baseline.

Goals

Goal A: Per-test subprocess resilience. Make the live_gui fixture recover from a degraded subprocess BEFORE each test (not just before each file). When the subprocess dies mid-test, the next test gets a fresh one.
Goal B: Path hygiene for the live_gui workspace. Refactor tests/conftest.py:live_gui to use tmp_path_factory.mktemp("live_gui_workspace") and expose the path as a separate fixture. Update all dependent test files to consume the fixture instead of hardcoding the path.
Goal C: Eliminate _sync_rag_engine race. Add a coalescing/debounce pattern so 5 setters in 100ms produce 1 sync, not 5 parallel syncs.
Goal D: Fix set_value hook routing. Find the __setattr__ bug that causes set_value('ai_input', ...) to not actually mutate the controller's ai_input state, and fix it the same way _UI_FLAG_DEFAULTS was fixed.
Goal E: Test files assert against fresh state. Add a _reset_controller_state fixture that any test can opt into via autouse-on-marker (@pytest.mark.clean_baseline).
Goal F: Verify all 4 upcoming tracks have a clean test bed. Run the full tier-1 + tier-2 + tier-3 batch and document which tests pass in batch vs. isolation. The 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) start with a known green baseline.

Non-Goals (Out of Scope)

❌ Refactoring the live_gui fixture to per-file scope (Solution A in batch_resilience_plan_20260608.md). Solution D (autouse health check + respawn) is the surgical alternative; per-file is too coarse.
❌ Refactoring src/rag_engine.py to a chunk-based data structure (that's the chunkification_optimization_20260608_PLACEHOLDER track).
❌ Migrating live_gui tests to mock-based tests (preserves the integration value).
❌ Adding CI infrastructure (this repo has no CI; manual batch runs are the verification).
❌ Fixing the 7 mock_app tests in test_z_negative_flows.py (separate code path; deferred).
❌ Fixing the 5 MMA pipeline tests that don't reach "tracks" state (separate code path; deferred).
❌ Fixing the auto_switch_sim test (separate code path; deferred).
❌ Doing the code_path_audit_20260607 work (post-4-tracks; the audit is the post-condition).

Functional Requirements

FR1. Per-test subprocess health check + respawn

Where: tests/conftest.py:282 (the live_gui fixture)

What: Add an autouse fixture that runs AFTER live_gui and BEFORE each test that uses it. The fixture:

Calls client.get_gui_health() with a 1s timeout.
If health is "degraded" OR the response is None OR the call raises, calls _respawn_subprocess().
After respawn (or if health was already OK), verifies the subprocess is alive via the existing kill_process_tree machinery.

API:

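@pytest.fixture(autouse=True)
def _check_live_gui_health(request):
    # Resolve live_gui lazily: declaring it as a parameter of an autouse fixture
    # would start the subprocess for every test, including tests that never use it.
    if "live_gui" in request.fixturenames:
        handle, _ = request.getfixturevalue("live_gui")
        handle.ensure_alive()  # does the health check + respawn
    yield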

Tests required:

test_live_gui_respawn_after_kill: kill the subprocess via the handle, run a no-op test that uses live_gui, assert the subprocess is alive at test end.
test_live_gui_health_check_fast_path: when the subprocess is alive, the health check is <100ms.
test_live_gui_no_respawn_on_clean: when the subprocess is alive AND get_gui_health() returns OK, no respawn happens (verify via a respawn_count counter on the handle).
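
One possible shape for the handle behind ensure_alive() is sketched below. Everything here is an assumption made for illustration: the `LiveGuiHandle` class name, the injected `get_health` and `respawn` callables, and the "ok" health string are hypothetical (the spec only names "degraded", None, and a raised exception as respawn triggers). The liveness verification step via kill_process_tree is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LiveGuiHandle:
    """Hypothetical handle: wraps a health probe and a respawn action."""
    get_health: Callable[[], Optional[str]]  # assumed to return "ok", "degraded", or None
    respawn: Callable[[], None]
    respawn_count: int = 0

    def ensure_alive(self) -> None:
        try:
            health = self.get_health()
        except Exception:
            health = None  # a failed probe is treated like a missing response
        if health is None or health == "degraded":
            self.respawn()
            self.respawn_count += 1


def _raising_probe() -> Optional[str]:
    raise ConnectionError("hook server unreachable")


healthy = LiveGuiHandle(get_health=lambda: "ok", respawn=lambda: None)
degraded = LiveGuiHandle(get_health=lambda: "degraded", respawn=lambda: None)
unreachable = LiveGuiHandle(get_health=_raising_probe, respawn=lambda: None)

for handle in (healthy, degraded, unreachable):
    handle.ensure_alive()

print(healthy.respawn_count, degraded.respawn_count, unreachable.respawn_count)
# 0 1 1
```

Keeping the counter on the handle is what makes test_live_gui_no_respawn_on_clean checkable without inspecting process IDs.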

FR2. Expose `live_gui_workspace` as a separate fixture

Where: tests/conftest.py:282 (the live_gui fixture), plus 6 test files

What:

Change live_gui to create the workspace via tmp_path_factory.mktemp("live_gui_workspace") instead of Path("tests/artifacts/live_gui_workspace").
Add a new fixture live_gui_workspace that yields the absolute path to the workspace.
The live_gui fixture uses chdir (or sets the subprocess CWD) to the absolute path; the subprocess inherits the correct CWD.
Update 6 test files to accept live_gui_workspace as a fixture parameter and use the absolute path instead of the hardcoded one.

Tests required:

test_live_gui_workspace_is_absolute: assert the workspace path is absolute.
test_live_gui_workspace_unique_per_session: assert two consecutive sessions get different workspace dirs (per-session mktemp returns unique dirs).
test_live_gui_workspace_passed_to_test: request live_gui_workspace as a fixture argument in a test, assert the test can create files in it.

Files to update:

tests/conftest.py:412 — replace Path("tests/artifacts/live_gui_workspace") with tmp_path_factory.mktemp("live_gui_workspace")
tests/test_rag_phase4_final_verify.py:20 — accept live_gui_workspace fixture
tests/test_rag_phase4_stress.py:21 — accept live_gui_workspace fixture
tests/test_saved_presets_sim.py:14, 121 — accept live_gui_workspace fixture
tests/test_tool_presets_sim.py:13 — accept live_gui_workspace fixture
tests/test_visual_sim_gui_ux.py:79 — accept live_gui_workspace fixture
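
A minimal sketch of the workspace creation in FR2 follows. The helper name `make_live_gui_workspace` and the `_fake_mktemp` stand-in are hypothetical; the stand-in exists only so the sketch runs without pytest. The commented wiring assumes live_gui_workspace is its own session-scoped fixture that live_gui would depend on, which is one option rather than a decided design.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def make_live_gui_workspace(mktemp: Callable[[str], Path]) -> Path:
    """Create the workspace via an mktemp-style callable and return an absolute path."""
    return Path(mktemp("live_gui_workspace")).resolve()


# Possible conftest.py wiring (assumption, not executed here):
#
#   @pytest.fixture(scope="session")
#   def live_gui_workspace(tmp_path_factory):
#       return make_live_gui_workspace(tmp_path_factory.mktemp)


def _fake_mktemp(basename: str) -> Path:
    """Stand-in for tmp_path_factory.mktemp so the sketch runs without pytest."""
    return Path(tempfile.mkdtemp(prefix=basename + "_"))


first = make_live_gui_workspace(_fake_mktemp)
second = make_live_gui_workspace(_fake_mktemp)
(first / "probe.txt").write_text("ok", encoding="utf-8")
probe_written = (first / "probe.txt").exists()

print(first.is_absolute(), first != second, probe_written)

# Clean up the throwaway directories created by the stand-in.
shutil.rmtree(first)
shutil.rmtree(second)
```

Test files would then take live_gui_workspace as an argument and build paths from it instead of from Path("tests/artifacts/live_gui_workspace").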

FR3. Coalesce `_sync_rag_engine` calls

Where: src/app_controller.py:_sync_rag_engine (or the setter that triggers it)

What: Replace the immediate-submit pattern with a debounce/coalesce pattern. Multiple setters within a 100ms window produce ONE sync, run on the next idle moment.

Approach: Add a _rag_sync_token: Optional[int] and a _rag_sync_dirty: bool flag. When a setter mutates rag_config, increment the token and set the dirty flag. A background "sync dispatcher" task (or a deferred submit) reads the token, builds the engine once, installs it, and clears the flag. If a new setter arrives while a sync is running, it increments the token and sets the dirty flag again; the running sync notices the changed token when it finishes and re-runs once.

Tests required:

test_sync_rag_engine_coalesces_five_setters: fire 5 setters in 50ms, assert only 1 RAGEngine() is constructed.
test_sync_rag_engine_rerun_on_token_change: while a sync is running, fire a setter; assert the sync sees the new token and re-runs once.
test_sync_rag_engine_idempotent_no_changes: if no setters fire, no sync runs.
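
The token-based coalescing described in FR3 could look roughly like the sketch below. It is an illustration under assumptions, not the controller's implementation: the `SyncCoalescer` class, its `request` and `wait_idle` methods, and the injected `build` callable are all hypothetical, and a `_running` flag plays the role the spec assigns to the dirty flag.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class SyncCoalescer:
    """Hypothetical helper: collapse a burst of sync requests into one build."""

    def __init__(self, build: Callable[[], None], pool: ThreadPoolExecutor,
                 window_s: float = 0.1):
        self._build = build
        self._pool = pool
        self._window_s = window_s
        self._lock = threading.Lock()
        self._token = 0        # bumped by every request
        self._running = False  # True while a dispatcher task is active
        self._idle = threading.Event()
        self._idle.set()

    def request(self) -> None:
        with self._lock:
            self._token += 1
            if self._running:
                return  # the active dispatcher will notice the new token
            self._running = True
            self._idle.clear()
        self._pool.submit(self._dispatch)

    def _dispatch(self) -> None:
        try:
            while True:
                time.sleep(self._window_s)  # let the burst settle
                with self._lock:
                    seen = self._token
                self._build()
                with self._lock:
                    if self._token == seen:  # nothing new arrived during the build
                        self._running = False
                        self._idle.set()
                        return
                # Token moved while building: loop and rebuild once more.
        except BaseException:
            with self._lock:
                self._running = False
                self._idle.set()
            raise

    def wait_idle(self, timeout: float = 5.0) -> bool:
        return self._idle.wait(timeout)


builds = []
pool = ThreadPoolExecutor(max_workers=2)
coalescer = SyncCoalescer(lambda: builds.append(time.monotonic()), pool, window_s=0.05)
for _ in range(5):  # five setters in quick succession
    coalescer.request()
went_idle = coalescer.wait_idle()
pool.shutdown()
print(len(builds))
```

The running flag is cleared under the same lock that compares the token, so a request that arrives at the very end of a build either triggers a re-run or starts a fresh dispatcher; it is never dropped.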

FR4. Fix `set_value` hook routing for `ai_input`

Where: src/gui_2.py:__setattr__ (or src/app_controller.py:_handle_set_value)

What: Investigate the __setattr__ / __setstate__ chain. The test (tests/test_gui2_set_value_hook_works) calls client.set_value('ai_input', 'hello'), which posts to /api/gui/set_value, which calls controller.<some_method>. The method either doesn't actually mutate ai_input or routes the value to a different attribute (similar to how _UI_FLAG_DEFAULTS was incorrectly returning None).

Likely root cause: Either:

The __setattr__ allowlist only includes certain ui_ attrs, and ai_input is not on it, so the assignment is silently dropped.
The /api/gui/set_value endpoint has a field != 'ai_input' branch that doesn't call the setter.

Tests required:

test_set_value_hook_ai_input: assert that after set_value('ai_input', 'hello') and a 0.5s wait, get_value('ai_input') returns 'hello'.
test_set_value_hook_temperature: same for temperature.
test_set_value_hook_persists: same for model_name.

Diagnostic test (write first): A test that introspects the controller's __dict__ and the API hook's parameter-to-handler mapping to find the missing branch.
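
The first suspected root cause (an allowlist that silently drops unlisted attributes) and a diagnostic for it can be sketched as follows. The `GuiProxy` and `Controller` classes, the allowlist contents, and `find_dropped_fields` are hypothetical; they model the suspected class of bug, not the actual gui_2.py code.

```python
class Controller:
    """Hypothetical controller with a few state fields."""

    def __init__(self):
        self.ui_theme = "dark"
        self.ui_scale = 1.0
        self.ai_input = ""


class GuiProxy:
    """Hypothetical GUI-side object that forwards attribute writes to the controller."""

    _ALLOWED = frozenset({"ui_theme", "ui_scale"})  # hypothetical allowlist

    def __init__(self, controller):
        # Bypass our own __setattr__ so the reference is stored on the proxy itself.
        object.__setattr__(self, "_controller", controller)

    def __setattr__(self, name, value):
        if name in self._ALLOWED:
            setattr(self._controller, name, value)
        # Anything else is silently dropped: this models the suspected bug.


def find_dropped_fields(proxy, controller, candidates):
    """Return the candidate names whose writes through the proxy never reach the controller."""
    dropped = []
    for name in candidates:
        original = getattr(controller, name)
        sentinel = object()
        setattr(proxy, name, sentinel)
        if getattr(controller, name) is sentinel:
            setattr(controller, name, original)  # write arrived; restore the old value
        else:
            dropped.append(name)
    return dropped


controller = Controller()
proxy = GuiProxy(controller)
dropped = find_dropped_fields(proxy, controller, ["ui_theme", "ui_scale", "ai_input"])
print(dropped)
# ['ai_input']
```

Probing with a unique sentinel avoids false negatives when the value a test writes happens to equal the field's current value.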

FR5. Optional clean-baseline marker

Where: tests/conftest.py (new fixture), test files that want it

What: Add a @pytest.mark.clean_baseline marker. An autouse fixture detects the marker and calls a _reset_controller_state method on the controller before the test starts. The reset clears: ai_input, ai_status, ai_response, current_provider, current_model, rag_config, files, mma_streams, mma_epic_input, mma_proposed_tracks, plus any field set by a prior test.

API:

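@pytest.fixture(autouse=True)
def _clean_baseline(request):
    # Only touch live_gui for tests that opt in via the marker, so unmarked tests
    # (and tests that never use live_gui) are unaffected.
    if request.node.get_closest_marker("clean_baseline"):
        handle, _ = request.getfixturevalue("live_gui")
        handle.client.reset_session()  # existing endpoint, plus extended reset
    yield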

Tests required:

test_clean_baseline_resets_ai_input: set ai_input='polluted', mark test with clean_baseline, assert ai_input is '' at test start.
test_clean_baseline_resets_rag_config: same for rag_config.
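
One way a _reset_controller_state method could restore a baseline is to drive the reset from declared defaults. The sketch below assumes a dataclass-style state object. `ControllerState`, its default values, and `reset_controller_state` are hypothetical, and only a subset of the fields listed in FR5 is shown.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class ControllerState:
    """Hypothetical subset of the controller fields named in FR5."""
    ai_input: str = ""
    ai_status: str = ""
    current_provider: str = ""
    rag_config: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


def reset_controller_state(state: ControllerState) -> None:
    """Restore every declared field to its default value."""
    for f in fields(state):
        if f.default is not MISSING:
            setattr(state, f.name, f.default)
        elif f.default_factory is not MISSING:
            # Fresh container per reset, so tests never share a mutable default.
            setattr(state, f.name, f.default_factory())


state = ControllerState()
state.ai_input = "polluted"
state.rag_config["collection"] = "leftover"
state.files.append("stale.txt")

reset_controller_state(state)
print(state.ai_input == "", state.rag_config, state.files)
# True {} []
```

A defaults-driven reset picks up newly added fields automatically, as long as they are declared with a default. On the pytest side, the clean_baseline marker would also need to be registered in the project's pytest configuration to avoid unknown-marker warnings.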

FR6. Verify the 4 upcoming tracks have a clean test bed

Where: scripts/run_tests_batched.py (no changes); verification in this track's final phase

What: Run the full tier-1 + tier-2 + tier-3 batch and document which tests pass. Produce a "test bed health report" as a markdown file in docs/reports/test_bed_health_20260609.md. The report lists:

Tier-1 unit tests: all pass (already verified in rag_work_final_20260609_pm.md)
Tier-2 mock_app tests: all pass
Tier-3 live_gui tests: pass/fail per file, with the failure mode
A "before" / "after" diff so the user can see the impact

Non-Functional Requirements

NFR1: Per-test overhead < 200ms. The autouse _check_live_gui_health fixture must add <200ms to each test that uses live_gui. The 49 live_gui tests × 200ms = 9.8s additional batch time. Acceptable.
NFR2: No regressions in tier-1 / tier-2. All unit tests and mock_app tests must continue to pass. The fixture change is additive, not destructive.
NFR3: Backward compat for tests that don't opt in. Tests that don't use live_gui are unaffected. Tests that use live_gui but don't opt into clean_baseline continue to work (they just don't get a reset).
NFR4: No hardcoded paths to C:/projects/manual_slop or ./tests/artifacts/ in production code. The track's filesystem-hygiene fix is enforced by the existing scripts/check_test_toml_paths.py audit (extended to also catch Path("tests/artifacts/") and Path("C:/projects/") in test files).
NFR5: 1-space indentation. All Python code in this track uses 1-space indentation per conductor/product-guidelines.md.
NFR6: CRLF line endings on Windows. All Python files in this track use CRLF.
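
The audit extension mentioned in NFR4 could be as small as a regex scan. This is a sketch only: the function names and the exact pattern are assumptions, and nothing here reflects how scripts/check_test_toml_paths.py is actually structured.

```python
import re
from pathlib import Path

# Matches Path("tests/artifacts/...") or Path("C:/projects/...") with either quote style.
HARDCODED_PATH = re.compile(r"""Path\(\s*["'](tests/artifacts/|C:/projects/)""")


def find_hardcoded_paths(source: str):
    """Return (line_number, stripped_line) for each line with a hardcoded path."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if HARDCODED_PATH.search(line):
            hits.append((lineno, line.strip()))
    return hits


def audit_tree(root: Path):
    """Scan test_*.py files under root; return {file: hits} for files with findings."""
    report = {}
    for path in sorted(root.rglob("test_*.py")):
        hits = find_hardcoded_paths(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report


sample = (
    'ws = Path("tests/artifacts/live_gui_workspace")\n'
    'ok = live_gui_workspace / "doc.txt"\n'
    "root = Path('C:/projects/manual_slop')\n"
)
findings = find_hardcoded_paths(sample)
print([lineno for lineno, _ in findings])
# [1, 3]
```

A line-based regex will miss paths built across several lines or through string concatenation, so it works as a cheap gate rather than a complete check.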

Architecture Reference

This track touches the following subsystems (see linked deep-dive guides):

Test infrastructure: tests/conftest.py, scripts/run_tests_batched.py. See docs/guide_testing.md §"7 conftest fixtures" and §"Puppeteer pattern".
AppController state delegation: src/app_controller.py (166KB). See docs/guide_app_controller.md §"_predefined_callbacks / _gettable_fields Hook API registries" and docs/guide_state_lifecycle.md §"State Delegation (getattr/setattr)".
RAG engine: src/rag_engine.py. See docs/guide_rag.md §"RAGEngine lifecycle" and §"Sync to controller".
Hook API: src/api_hooks.py + src/api_hook_client.py. See docs/guide_api_hooks.md §"/api/gui/set_value" and §"Remote Confirmation Protocol".
io_pool: src/app_controller.py:_io_pool. See docs/guide_architecture.md §"Thread domains".

Key design constraints inherited

Defer-not-catch pattern: imgui.* calls before ImGui is ready crash at the C level (0xc0000005). The _check_live_gui_health fixture must NOT touch ImGui directly. It uses the existing Hook API (/api/gui_health, /api/status) which runs in the hook server thread, not the render thread.
Session-scoped fixture: live_gui is session-scoped by design. Per-file or per-test scoping would break cross-test state (e.g., test_full_live_workflow expects a fresh live_gui, but test_rag_phase4_stress depends on the same subprocess the prior 4 sims used). The autouse respawn is the surgical solution.
tmp_path_factory scope: tmp_path_factory is a session-scoped fixture (per the pytest docs), so directories created with tmp_path_factory.mktemp() persist for the whole session. The per-test tmp_path is a different, function-scoped fixture. The live_gui_workspace fixture must use tmp_path_factory to be consistent with the session-scoped live_gui.

Key prior decisions to respect

The _UI_FLAG_DEFAULTS allowlist was a HARD-CODED set. The new set_value hook fix should follow the same allowlist pattern (consistency with the existing fix) OR use a class-level attribute that derives from __init__ annotations (the better fix, but the user has not asked for the better fix; this track stays surgical).
The existing run_tests_batched.py tier structure (tier-1 unit, tier-2 mock_app, tier-3 live_gui, tier-H headless, tier-P perf) is NOT to be restructured. The track works WITH the existing tier structure.
The audit_main_thread_imports.py and audit_weak_types.py static CI gates are the project's enforcement mechanism. The new Path("tests/artifacts/") and Path("C:/projects/") patterns are added to check_test_toml_paths.py (extended) as a third gate.

Out of Scope

The following are explicitly NOT part of this track. They are mentioned so the user knows they are deferred, not forgotten:

Per-file live_gui fixture scope (Solution A from batch_resilience_plan_20260608.md): Not needed if the per-test autouse respawn works. May revisit if the per-test respawn has too much overhead.
Refactoring live_gui fixture to a class-based handle with respawn (Solution B): Same — only do if per-test respawn is insufficient.
MMA pipeline tests that don't reach "tracks" state: 3 tests fail in this pattern (test_mma_concurrent_tracks_execution, test_mma_step_mode_approval_flow, test_mma_complete_lifecycle). These are MMA-engine-state-transition bugs, not test-isolation bugs. Out of scope.
Negative-flows tests (test_z_negative_flows.py): 3 tests fail in this pattern. They exercise the mock provider's error path. Pre-existing, separate code path. Out of scope.
test_auto_switch_sim: Workspace auto-switch logic not applying Tier 3 profile. Pre-existing, separate code path. Out of scope.
test_prior_session_no_pop_imbalance: Already addressed in live_gui_test_hardening_v2 (commit 26e0ced4). Verify it still passes.
code_path_audit_20260607: Post-4-tracks audit. This track unblocks the 4 tracks; the audit runs after.
chunkification_optimization_20260608_PLACEHOLDER: The comms.log chunkification. Out of scope; the user has not approved it.
manual_ux_validation_20260608_PLACEHOLDER: The ASCII-sketch workflow. Out of scope; the user has not approved it.
CI infrastructure: No CI in this repo. Manual batch runs are the verification.

Verification Criteria

This track is "done" when ALL of the following are true:

✅ All tier-1 unit tests pass in batch (no regression).
✅ All tier-2 mock_app tests pass in batch (no regression).
✅ The 6 test files that hardcoded Path("tests/artifacts/live_gui_workspace") now use the live_gui_workspace fixture.
✅ test_rag_phase4_final_verify.py::test_phase4_final_verify passes in BATCH (after 4 sims) — the primary symptom the user wanted fixed.
✅ test_rag_phase4_stress.py passes in batch OR has a documented reason for the residual flakiness (acceptable per rag_work_final_20260609_pm.md's "out of scope" decision IF the io_pool race fix in FR3 lands).
✅ test_gui2_set_value_hook_works passes in batch.
✅ The autouse _check_live_gui_health fixture is in place; a new test (test_live_gui_respawn_after_kill) verifies it.
✅ The _sync_rag_engine coalescing fix is in place; a new test (test_sync_rag_engine_coalesces_five_setters) verifies it.
✅ A docs/reports/test_bed_health_20260609.md report is committed, listing pass/fail per test file with the failure mode for any residual failures.
✅ scripts/check_test_toml_paths.py is extended to flag Path("tests/artifacts/") and Path("C:/projects/") in test files; the audit passes.

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Per-test respawn adds too much overhead (>200ms × 49 tests = 10s) | Medium | Low | Verify with the NFR1 measurement; if exceeded, fall back to per-batch respawn |
| Per-test respawn breaks cross-test state dependencies | Medium | High | Add a `--no-respawn` pytest flag for tests that need cross-test state; audit the 49 live_gui tests for state dependencies before Phase 1 |
| `tmp_path_factory.mktemp` changes the workspace path, breaking the on-disk chroma DB persistence assumption | High | Low | Clear `.slop_cache/` dirs at session start; OR add a `live_gui_workspace_persist` opt-in |
| `_sync_rag_engine` coalescing breaks the existing RAG test that DEPENDS on multiple parallel syncs (unlikely) | Low | Medium | Write the FR3 tests to verify both "5 setters → 1 sync" AND "single setter → single sync" still work |
| `set_value` hook fix changes behavior for existing tests that assert on the OLD (broken) behavior | Low | High | Run the full tier-3 batch in Phase 3 and verify no regressions |
| The `tmp_path_factory.mktemp` refactor corrupts `tests/conftest.py` (the previous attempt at this refactor DID corrupt it; commit was reverted per `rag_test_batch_failure_status_20260609_pm3.md`) | High | High | Use `git stash` before each edit; if edit fails, `git stash pop` and try again with `manual-slop_set_file_slice` (which is the recommended surgical tool per `conductor/edit_workflow.md`) |

Phases (summary)

This spec is the entry point. The plan (plan.md) breaks these into TDD-ready tasks.

| Phase | Scope | Effort |
|---|---|---|
| Phase 1 | Audit: enumerate all `live_gui` cross-test state dependencies, document baseline failure modes | 1 day |
| Phase 2 | FR1: Per-test subprocess health check + respawn (autouse fixture) | 1 day |
| Phase 3 | FR2: Expose `live_gui_workspace` as a separate fixture, update 6 test files | 1 day |
| Phase 4 | FR3: Coalesce `_sync_rag_engine` calls (token + dirty flag pattern) | 1 day |
| Phase 5 | FR4: Fix `set_value` hook routing for `ai_input` | 1 day |
| Phase 6 | FR5: Optional `clean_baseline` marker | 0.5 day |
| Phase 7 | FR6: Run full batch, produce test_bed_health report | 0.5 day |
| Phase 8 | Docs: update `docs/guide_testing.md` + `docs/guide_state_lifecycle.md` | 0.5 day |

Total: 6.5 days (fits within 1 sprint).

Approval Required

This spec requires user approval before the plan is written. Per the conductor workflow:

The spec is the agent's design intent — it explains WHY, not just WHAT. A plan for an unapproved spec is wasted effort.

The user has asked for a track to "kill the test regression nightmare." This spec defines what "kill" means: 5 surgical fixes (FR1-FR5) + a verification report (FR6) that produces a clean test bed for the 4 upcoming tracks. If the user wants more aggressive scope (e.g., refactoring live_gui to per-file scope), revise the spec before approving.
