From 97c58f033225ee68bf45e889640b98b29a667639 Mon Sep 17 00:00:00 2001 From: Ed_ Date: Sat, 27 Jun 2026 23:46:36 -0400 Subject: [PATCH] docs(report): ADDENDUM 3 - tests hotpatch state instead of calling proper project-switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per user feedback: the test progression is fundamentally broken. Tests hotpatch individual state fields (files, rag_enabled, etc.) via set_value instead of switching to a project that has the right configuration, like a user would. The session-scoped subprocess's active_project_path leaks across tests because reset_session() deliberately doesn't reset it. Documented the 4 red flags: 1. test_rag_phase4_final_verify hotpatches state, never calls _switch_project 2. reset_session() is an incomplete reset masquerading as @clean_baseline 3. sim_base.teardown() is a no-op (cleanup commented out), never switches back 4. index_file silently no-ops on missing files (production bug) Correct fix: tests should call _switch_project to establish their project context (like a user), not hotpatch. reset_session() should restore the original project. sim_base.teardown() should switch back + clean up. Retracted the 'delete stale files' recommendation — that treats the symptom, not the defect. --- ...GATION_rag_phase4_final_verify_20260627.md | 119 +++++++++++++++++- 1 file changed, 118 insertions(+), 1 deletion(-) diff --git a/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md b/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md index edd23a91..cc45ab81 100644 --- a/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md +++ b/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md @@ -250,4 +250,121 @@ The `test_sandbox_hardening_20260619` track shipped (state.toml `status = "compl - `4d2a6666` fix(rag): convert RAGChunk to dict — **the real fix for the OLD hang**; still present in source - `91a61288` Merge `tier2/module_taxonomy_refactor_20260627` — **ancestor of `4d2a6666`** (NOT the regression trigger; `4d2a6666` was on top of it when it allegedly passed) -- `e58d332e` test(rag): update dim mismatch test + stress test — Tier 2's test-only changes; `git diff 4d2a6666 e58d332e` is test/docs only, NO `src/` — so the "regression" between them is environmental, not a code change \ No newline at end of file +- `e58d332e` test(rag): update dim mismatch test + stress test — Tier 2's test-only changes; `git diff 4d2a6666 e58d332e` is test/docs only, NO `src/` — so the "regression" between them is environmental, not a code change + +--- + +## ADDENDUM 3: The real defect — tests hotpatch state instead of calling proper project-switch (2026-06-27 23:15, per user directive) + +**User feedback:** "feels like some red flags here... your just removing existing config that should still be ok for the existing state space of a well used app? Is there something fundamentally wrong with the test progression?" + +**Answer: yes. The test progression is fundamentally broken.** The live_gui test suite does not properly switch projects between tests the way a user would. Instead, tests hotpatch individual state fields via `set_value` while leaving the project context (`active_project_path`) stale from whatever the prior test left. This is the same anti-pattern as the `ui_files_base_dir` pollution Tier 2 fixed — shared mutable state in the session-scoped subprocess with no isolation boundary. + +### What a user does vs what the tests do + +**A user switching projects** (the correct flow, via `_switch_project` → `_do_project_switch`): +1. User clicks a project in the Projects panel +2. `_switch_project(path)` fires (app_controller.py:~3203) — non-blocking, marks stale +3. `_do_project_switch(path)` runs on the io_pool (app_controller.py:~3160): + - `_flush_to_project()` — saves the OLD project to disk + - `load_project(path)` — loads the NEW project's manual_slop.toml + - `self.project = new_project` — full project dict swap + - `self.active_project_path = path` — updates the active path + - Re-initializes preset/tool_preset/persona managers for the new root + - `_refresh_from_project()` — reloads `self.files`, `self.ui_files_base_dir`, `self.disc_entries`, `self.rag_config`, etc. from the new project + - `mcp_client.configure(...)` — reconfigures MCP for the new root + - Sets `ai_status = "switched to: "` + +This is a **full context swap**: files, base_dir, RAG config, discussion, presets, personas — everything is reloaded from the new project file. The user is now "in" the new project. + +**What `test_rag_phase4_final_verify` does** (the broken flow): +1. `_reset_clean_baseline` fires → `reset_session()` — partial reset (clears `files`, `rag_config`, `tracks`, `mma_state`, `disc_entries`, but **deliberately does NOT reset `active_project_path`** per the comment at app_controller.py:3874-3879) +2. The test calls `set_value('files', ['final_test_1.txt', ...])` — hotpatches `self.files` without touching the project context +3. The test calls `set_value('rag_enabled', True)` etc. — hotpatches `self.rag_config` without reloading from the project file +4. The test calls `set_value('rag_collection_name', ...)` — hotpatches the collection name +5. **The test NEVER calls `_switch_project` to switch to the workspace project.** It assumes the subprocess is already "in" the workspace. But if a prior test (e.g. `test_context_sim_live`) switched to `temp_livecontextsim.toml`, the subprocess is still there. + +The test hotpatches individual fields while the project context (`active_project_path`, `self.project`, `active_project_root`) is stale. The RAG engine uses `active_project_root` (derived from `active_project_path`) as its `base_dir`, NOT `ui_files_base_dir`. So even if `ui_files_base_dir` is correct (Tier 2's fix), the RAG engine still uses the stale project's root. + +### The simulation tests: switch but never switch back + +`simulation/sim_base.py:setup()` (line 64-99) is the ONE test path that DOES call the proper project-switch flow: +- Line 80: `client.click("btn_reset")` — partial reset +- Line 84: `self.project_path = os.path.abspath(f"tests/artifacts/temp_{project_name.lower()}.toml")` — scaffolds a temp project file +- Line 88: `self.sim.setup_new_project(project_name, git_dir, self.project_path)` — switches the subprocess to it +- Line 94: `self.client.wait_for_project_switch(expected_path=self.project_path, timeout=30.0)` — waits for the switch + +But `simulation/sim_base.py:teardown()` (line 101-109) is a **no-op**: +```python +def teardown(self) -> None: + if self.project_path and os.path.exists(self.project_path): + # We keep it for debugging if it failed, but usually we'd clean up + # os.remove(self.project_path) + pass + print("[BaseSim] Teardown complete.") +``` + +The cleanup is **commented out** ("We keep it for debugging if it failed, but usually we'd clean up"). And there is **NO switch-back to the workspace project**. The subprocess is left on `temp_livecontextsim.toml` (or whichever sim project was last switched to). + +Every sim test in `tests/test_extended_sims.py` follows this pattern: +- `test_context_sim_live` (line 23): `sim.setup("LiveContextSim")` → `sim.teardown()` (no-op) +- `test_ai_settings_sim_live` (line 37): `sim.setup("LiveAISettingsSim")` → `sim.teardown()` (no-op) +- `test_tools_sim_live` (line 51): `sim.setup("LiveToolsSim")` → `sim.teardown()` (no-op) +- `test_execution_sim_live` (line 64): `sim.setup("LiveExecutionSim")` → `sim.teardown()` (no-op) + +None of them switch back. The last one to run leaves the subprocess on its temp project. When `test_rag_phase4_final_verify` runs next in the batch, it inherits that stale project context. + +### Why `reset_session()` doesn't fix this (the "infinite re-switch loop" comment) + +`_handle_reset_session` (app_controller.py:~3862) has this comment at line 3874-3879: +``` +# We do NOT clear self.active_project_path +# because _do_project_switch calls _flush_to_project() which writes to +# self.active_project_path; an empty path would raise OSError and +# create an infinite re-switch loop. (See test_context_sim_live +# regression on 2026-06-08.) +``` + +The loop risk: `_do_project_switch` (app_controller.py:~3160) calls `_flush_to_project()` as its FIRST action. `_flush_to_project` does `if self.active_project_path: result = self._flush_to_project_result(cleaned_proj, self.active_project_path)`. If `active_project_path` is empty, it skips the save (safe). But the `_do_project_switch` `finally` block (line ~3210) checks `_project_switch_pending_path` and re-triggers `_switch_project(pending)` if a switch is queued. If `reset_session()` cleared `active_project_path` while a switch was in-flight, the pending switch would fire against an empty path → `_switch_project` checks `if not Path(path).exists(): self.ai_status = "project file not found"; return` → returns. No infinite loop. The comment's fear appears to be stale — the current `_switch_project` guards against non-existent paths. But this needs verification before changing `reset_session()`. + +**The actual `test_context_sim_live` regression on 2026-06-08** was likely a different issue (the sim switches the project, then `reset_session` fires, then the sim's `wait_for_project_switch` sees the reset as a stale state). The comment is a workaround for a sim-timing issue, not a fundamental blocker to resetting `active_project_path`. + +### The 4 red flags (user's intuition was correct) + +1. **Tests hotpatch state instead of calling proper state-change functions.** `test_rag_phase4_final_verify` sets `files`, `rag_enabled`, `rag_source`, etc. via `set_value` (hotpatching individual fields) instead of switching to a project that has the right configuration. A user would create/switch to a project with the right files and RAG settings; the test bypasses that and hotpatches. This is the "hotpatch bullshit" the user called out. + +2. **`reset_session()` is an incomplete reset masquerading as a clean baseline.** The `@clean_baseline` marker implies a clean slate, but `reset_session()` deliberately skips `active_project_path`. The project context leaks across tests. The name is misleading. + +3. **The sim tests switch projects but never switch back.** `sim_base.teardown()` is a no-op (cleanup commented out). The subprocess is left on whatever temp project the last sim used. This is a test-hygiene failure that creates the stale `active_project_path` state. + +4. **`index_file` silently no-ops on missing files.** `src/rag_engine.py` `index_file`: `if not os.path.exists(full_path): ... return` with NO log. A well-used app should tell the user when their `base_dir` is wrong. The silent return made this bug invisible for 3 sessions. + +### The correct fix (NOT deleting stale files) + +**Primary fix — make the test establish its project context properly:** +`test_rag_phase4_final_verify` should switch the subprocess to the workspace project BEFORE setting files/RAG config, the way a user would: +```python +# Switch to the workspace project (like a user would) +client.push_event('custom_callback', {'callback': '_switch_project', 'args': [str(live_gui_workspace / 'manual_slop.toml')]}) +# Wait for the switch to complete +client.wait_for_project_switch(expected_path=str(live_gui_workspace / 'manual_slop.toml'), timeout=30.0) +``` +This makes the test self-contained — it doesn't depend on whatever project a prior test left. After the switch, `active_project_path` = workspace, `active_project_root` = workspace, and the RAG engine uses the workspace as `base_dir`. + +**Secondary fix — make `reset_session()` restore the original project:** +`_handle_reset_session` should re-read the config's `projects.active` and restore `active_project_path` to the project the subprocess started with. This makes `@clean_baseline` actually mean a clean baseline. The "infinite re-switch loop" fear needs to be verified (the current `_switch_project` guards against non-existent paths, so the loop may no longer be possible). If the loop IS still possible, fix the loop in `_do_project_switch`/`_flush_to_project` instead of working around it by leaking state. + +**Tertiary fix — make `sim_base.teardown()` actually clean up:** +`simulation/sim_base.py:teardown()` should: (a) switch the subprocess back to the workspace project (or the config's `projects.active`), and (b) remove the temp project file + its history sibling. The commented-out cleanup should be uncommented and extended with a switch-back. + +**Production fix — make `index_file` log when it no-ops:** +`src/rag_engine.py` `index_file` — the `if not os.path.exists(full_path): return` should log a warning. The silent return is a production bug that makes misconfigured `base_dir` invisible. This would have made the diagnosis trivial. + +### Why "delete the stale files" was a bad recommendation (my mistake) + +Deleting `tests/artifacts/temp_*.toml` would make the test pass temporarily, but: +- It doesn't fix the test progression — the next time the sim tests run, they'll create new temp files and leave the subprocess on them again +- It doesn't fix `reset_session()` — the project context still leaks +- It doesn't fix `index_file` — the silent no-op is still invisible +- A well-used app WILL have multiple project files on disk; the test suite should handle that gracefully, not require a clean disk +- It treats the symptom (stale files exist) instead of the defect (tests don't manage project context properly) \ No newline at end of file