From 97c58f033225ee68bf45e889640b98b29a667639 Mon Sep 17 00:00:00 2001
From: Ed_ <edwardgz@gmail.com>
Date: Sat, 27 Jun 2026 23:46:36 -0400
Subject: [PATCH] docs(report): ADDENDUM 3 - tests hotpatch state instead of
 calling proper project-switch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per user feedback: the test progression is fundamentally broken. Tests
hotpatch individual state fields (files, rag_enabled, etc.) via set_value
instead of switching to a project that has the right configuration, like
a user would. The session-scoped subprocess's active_project_path leaks
across tests because reset_session() deliberately doesn't reset it.

Documented the 4 red flags:
1. test_rag_phase4_final_verify hotpatches state, never calls _switch_project
2. reset_session() is an incomplete reset masquerading as @clean_baseline
3. sim_base.teardown() is a no-op (cleanup commented out), never switches back
4. index_file silently no-ops on missing files (production bug)

Correct fix: tests should call _switch_project to establish their project
context (like a user), not hotpatch. reset_session() should restore the
original project. sim_base.teardown() should switch back + clean up.
Retracted the 'delete stale files' recommendation — that treats the
symptom, not the defect.
---
 ...GATION_rag_phase4_final_verify_20260627.md | 119 +++++++++++++++++-
 1 file changed, 118 insertions(+), 1 deletion(-)

diff --git a/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md b/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md
index edd23a91..cc45ab81 100644
--- a/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md
+++ b/docs/reports/INVESTIGATION_rag_phase4_final_verify_20260627.md
@@ -250,4 +250,121 @@ The `test_sandbox_hardening_20260619` track shipped (state.toml `status = "compl
 
 - `4d2a6666` fix(rag): convert RAGChunk to dict — **the real fix for the OLD hang**; still present in source
 - `91a61288` Merge `tier2/module_taxonomy_refactor_20260627` — **ancestor of `4d2a6666`** (NOT the regression trigger; `4d2a6666` was on top of it when it allegedly passed)
-- `e58d332e` test(rag): update dim mismatch test + stress test — Tier 2's test-only changes; `git diff 4d2a6666 e58d332e` is test/docs only, NO `src/` — so the "regression" between them is environmental, not a code change
\ No newline at end of file
+- `e58d332e` test(rag): update dim mismatch test + stress test — Tier 2's test-only changes; `git diff 4d2a6666 e58d332e` is test/docs only, NO `src/` — so the "regression" between them is environmental, not a code change
+
+---
+
+## ADDENDUM 3: The real defect — tests hotpatch state instead of calling proper project-switch (2026-06-27 23:15, per user directive)
+
+**User feedback:** "feels like some red flags here... your just removing existing config that should still be ok for the existing state space of a well used app? Is there something fundamentally wrong with the test progression?"
+
+**Answer: yes. The test progression is fundamentally broken.** The live_gui test suite does not properly switch projects between tests the way a user would. Instead, tests hotpatch individual state fields via `set_value` while leaving the project context (`active_project_path`) stale from whatever the prior test left. This is the same anti-pattern as the `ui_files_base_dir` pollution Tier 2 fixed — shared mutable state in the session-scoped subprocess with no isolation boundary.
+
+### What a user does vs what the tests do
+
+**A user switching projects** (the correct flow, via `_switch_project` → `_do_project_switch`):
+1. User clicks a project in the Projects panel
+2. `_switch_project(path)` fires (app_controller.py:~3203) — non-blocking, marks stale
+3. `_do_project_switch(path)` runs on the io_pool (app_controller.py:~3160):
+   - `_flush_to_project()` — saves the OLD project to disk
+   - `load_project(path)` — loads the NEW project's manual_slop.toml
+   - `self.project = new_project` — full project dict swap
+   - `self.active_project_path = path` — updates the active path
+   - Re-initializes preset/tool_preset/persona managers for the new root
+   - `_refresh_from_project()` — reloads `self.files`, `self.ui_files_base_dir`, `self.disc_entries`, `self.rag_config`, etc. from the new project
+   - `mcp_client.configure(...)` — reconfigures MCP for the new root
+   - Sets `ai_status = "switched to: <name>"`
+
+This is a **full context swap**: files, base_dir, RAG config, discussion, presets, personas — everything is reloaded from the new project file. The user is now "in" the new project.
+
+**What `test_rag_phase4_final_verify` does** (the broken flow):
+1. `_reset_clean_baseline` fires → `reset_session()` — partial reset (clears `files`, `rag_config`, `tracks`, `mma_state`, `disc_entries`, but **deliberately does NOT reset `active_project_path`** per the comment at app_controller.py:3874-3879)
+2. The test calls `set_value('files', ['final_test_1.txt', ...])` — hotpatches `self.files` without touching the project context
+3. The test calls `set_value('rag_enabled', True)` etc. — hotpatches `self.rag_config` without reloading from the project file
+4. The test calls `set_value('rag_collection_name', ...)` — hotpatches the collection name
+5. **The test NEVER calls `_switch_project` to switch to the workspace project.** It assumes the subprocess is already "in" the workspace. But if a prior test (e.g. `test_context_sim_live`) switched to `temp_livecontextsim.toml`, the subprocess is still there.
+
+The test hotpatches individual fields while the project context (`active_project_path`, `self.project`, `active_project_root`) is stale. The RAG engine uses `active_project_root` (derived from `active_project_path`) as its `base_dir`, NOT `ui_files_base_dir`. So even if `ui_files_base_dir` is correct (Tier 2's fix), the RAG engine still uses the stale project's root.
+
+### The simulation tests: switch but never switch back
+
+`simulation/sim_base.py:setup()` (line 64-99) is the ONE test path that DOES call the proper project-switch flow:
+- Line 80: `client.click("btn_reset")` — partial reset
+- Line 84: `self.project_path = os.path.abspath(f"tests/artifacts/temp_{project_name.lower()}.toml")` — scaffolds a temp project file
+- Line 88: `self.sim.setup_new_project(project_name, git_dir, self.project_path)` — switches the subprocess to it
+- Line 94: `self.client.wait_for_project_switch(expected_path=self.project_path, timeout=30.0)` — waits for the switch
+
+But `simulation/sim_base.py:teardown()` (line 101-109) is a **no-op**:
+```python
+def teardown(self) -> None:
+    if self.project_path and os.path.exists(self.project_path):
+        # We keep it for debugging if it failed, but usually we'd clean up
+        # os.remove(self.project_path)
+        pass
+    print("[BaseSim] Teardown complete.")
+```
+
+The cleanup is **commented out** ("We keep it for debugging if it failed, but usually we'd clean up"). And there is **NO switch-back to the workspace project**. The subprocess is left on `temp_livecontextsim.toml` (or whichever sim project was last switched to).
+
+Every sim test in `tests/test_extended_sims.py` follows this pattern:
+- `test_context_sim_live` (line 23): `sim.setup("LiveContextSim")` → `sim.teardown()` (no-op)
+- `test_ai_settings_sim_live` (line 37): `sim.setup("LiveAISettingsSim")` → `sim.teardown()` (no-op)
+- `test_tools_sim_live` (line 51): `sim.setup("LiveToolsSim")` → `sim.teardown()` (no-op)
+- `test_execution_sim_live` (line 64): `sim.setup("LiveExecutionSim")` → `sim.teardown()` (no-op)
+
+None of them switch back. The last one to run leaves the subprocess on its temp project. When `test_rag_phase4_final_verify` runs next in the batch, it inherits that stale project context.
+
+### Why `reset_session()` doesn't fix this (the "infinite re-switch loop" comment)
+
+`_handle_reset_session` (app_controller.py:~3862) has this comment at line 3874-3879:
+```
+# We do NOT clear self.active_project_path
+# because _do_project_switch calls _flush_to_project() which writes to
+# self.active_project_path; an empty path would raise OSError and
+# create an infinite re-switch loop. (See test_context_sim_live
+# regression on 2026-06-08.)
+```
+
+The loop risk: `_do_project_switch` (app_controller.py:~3160) calls `_flush_to_project()` as its FIRST action. `_flush_to_project` does `if self.active_project_path: result = self._flush_to_project_result(cleaned_proj, self.active_project_path)`. If `active_project_path` is empty, it skips the save (safe). But the `_do_project_switch` `finally` block (line ~3210) checks `_project_switch_pending_path` and re-triggers `_switch_project(pending)` if a switch is queued. If `reset_session()` cleared `active_project_path` while a switch was in-flight, the pending switch would fire against an empty path → `_switch_project` checks `if not Path(path).exists(): self.ai_status = "project file not found"; return` → returns. No infinite loop. The comment's fear appears to be stale — the current `_switch_project` guards against non-existent paths. But this needs verification before changing `reset_session()`.
+
+**The actual `test_context_sim_live` regression on 2026-06-08** was likely a different issue (the sim switches the project, then `reset_session` fires, then the sim's `wait_for_project_switch` sees the reset as a stale state). The comment is a workaround for a sim-timing issue, not a fundamental blocker to resetting `active_project_path`.
+
+### The 4 red flags (user's intuition was correct)
+
+1. **Tests hotpatch state instead of calling proper state-change functions.** `test_rag_phase4_final_verify` sets `files`, `rag_enabled`, `rag_source`, etc. via `set_value` (hotpatching individual fields) instead of switching to a project that has the right configuration. A user would create/switch to a project with the right files and RAG settings; the test bypasses that and hotpatches. This is the "hotpatch bullshit" the user called out.
+
+2. **`reset_session()` is an incomplete reset masquerading as a clean baseline.** The `@clean_baseline` marker implies a clean slate, but `reset_session()` deliberately skips `active_project_path`. The project context leaks across tests. The name is misleading.
+
+3. **The sim tests switch projects but never switch back.** `sim_base.teardown()` is a no-op (cleanup commented out). The subprocess is left on whatever temp project the last sim used. This is a test-hygiene failure that creates the stale `active_project_path` state.
+
+4. **`index_file` silently no-ops on missing files.** `src/rag_engine.py` `index_file`: `if not os.path.exists(full_path): ... return` with NO log. A well-used app should tell the user when their `base_dir` is wrong. The silent return made this bug invisible for 3 sessions.
+
+### The correct fix (NOT deleting stale files)
+
+**Primary fix — make the test establish its project context properly:**
+`test_rag_phase4_final_verify` should switch the subprocess to the workspace project BEFORE setting files/RAG config, the way a user would:
+```python
+# Switch to the workspace project (like a user would)
+client.push_event('custom_callback', {'callback': '_switch_project', 'args': [str(live_gui_workspace / 'manual_slop.toml')]})
+# Wait for the switch to complete
+client.wait_for_project_switch(expected_path=str(live_gui_workspace / 'manual_slop.toml'), timeout=30.0)
+```
+This makes the test self-contained — it doesn't depend on whatever project a prior test left. After the switch, `active_project_path` = workspace, `active_project_root` = workspace, and the RAG engine uses the workspace as `base_dir`.
+
+**Secondary fix — make `reset_session()` restore the original project:**
+`_handle_reset_session` should re-read the config's `projects.active` and restore `active_project_path` to the project the subprocess started with. This makes `@clean_baseline` actually mean a clean baseline. The "infinite re-switch loop" fear needs to be verified (the current `_switch_project` guards against non-existent paths, so the loop may no longer be possible). If the loop IS still possible, fix the loop in `_do_project_switch`/`_flush_to_project` instead of working around it by leaking state.
+
+**Tertiary fix — make `sim_base.teardown()` actually clean up:**
+`simulation/sim_base.py:teardown()` should: (a) switch the subprocess back to the workspace project (or the config's `projects.active`), and (b) remove the temp project file + its history sibling. The commented-out cleanup should be uncommented and extended with a switch-back.
+
+**Production fix — make `index_file` log when it no-ops:**
+`src/rag_engine.py` `index_file` — the `if not os.path.exists(full_path): return` should log a warning. The silent return is a production bug that makes misconfigured `base_dir` invisible. This would have made the diagnosis trivial.
+
+### Why "delete the stale files" was a bad recommendation (my mistake)
+
+Deleting `tests/artifacts/temp_*.toml` would make the test pass temporarily, but:
+- It doesn't fix the test progression — the next time the sim tests run, they'll create new temp files and leave the subprocess on them again
+- It doesn't fix `reset_session()` — the project context still leaks
+- It doesn't fix `index_file` — the silent no-op is still invisible
+- A well-used app WILL have multiple project files on disk; the test suite should handle that gracefully, not require a clean disk
+- It treats the symptom (stale files exist) instead of the defect (tests don't manage project context properly)
\ No newline at end of file