
docs(report): add Tier 1 investigation followup report

Documents the Tier 1 investigation findings (environmental pollution
from live_gui tests leaking temp paths into the session-scoped subprocess
via ui_files_base_dir) and the 3 fixes applied. 28/29 RAG tests now
pass; the remaining failure (test_rag_phase4_final_verify) is a
different issue (rebuild not being triggered) that needs user
investigation. Diag writes are not appearing in the subprocess log
even though the test sees other behaviors from the same code paths.
2026-06-27 22:43:28 -04:00
parent f3d823b756
commit aef6122c4f
@@ -0,0 +1,123 @@
# Session Report: Tier 1 Investigation Followup
**Date:** 2026-06-27
**Branch:** `tier2/post_module_taxonomy_de_cruft_20260627`
**Status:** 28/29 RAG tests pass; 1 test (`test_rag_phase4_final_verify`) still fails
---
## Commits This Session
- `ab16f2f2` fix(rag): stop live_gui tests from polluting session-scoped subprocess
- `f3d823b7` fix(rag): use _get_chromadb() in dim check to avoid NameError
---
## What Tier 1 Found
Tier 1 investigated the `test_rag_phase4_final_verify` failure and found the root cause: **two live_gui tests were leaking temp/relative paths into the shared subprocess's `ui_files_base_dir`**, which survived across `@clean_baseline` tests and caused `RAGEngine.index_file` to silently no-op on a dead `base_dir`.
**Polluters identified:**
1. `tests/test_rag_visual_sim.py:20,26` → `tempfile.mkdtemp()` → `C:\Users\Ed\AppData\Local\Temp\tmpXXXX` (in-memory leak; `shutil.rmtree` cleans disk only)
2. `tests/test_visual_sim_mma_v2.py:74,76` → `'tests/artifacts/temp_workspace'` persisted via `btn_project_save` (disk leak)
`_reset_clean_baseline` did NOT reset `ui_files_base_dir`, so the pollution persisted.
---
## Fixes Applied
### Fix 1: `tests/test_rag_visual_sim.py` (committed in `ab16f2f2`)
- Changed `tempfile.mkdtemp()` to `tempfile.mkdtemp(dir="tests/artifacts", prefix="rag_visual_sim_")` (workspace-relative per `conductor/code_styleguides/workspace_paths.md`)
- Added `finally` block to restore `rag_enabled = False` and `files_base_dir` to the previous value
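A minimal sketch of the Fix 1 pattern, not the actual test code. `FakeClient` is a hypothetical stand-in for the live_gui client (only `set_value` and `click` appear in this report; `get_value` is an assumption), and `run_with_scoped_base_dir` is an illustrative helper name.

```python
import os
import shutil
import tempfile


class FakeClient:
    """Hypothetical stand-in for the live_gui client; only the state dict matters here."""

    def __init__(self):
        self.state = {"files_base_dir": ".", "rag_enabled": False}

    def get_value(self, key):
        return self.state[key]

    def set_value(self, key, value):
        self.state[key] = value


def run_with_scoped_base_dir(client, artifacts_root, body):
    """Point files_base_dir at a temp dir under artifacts_root, then always restore it."""
    os.makedirs(artifacts_root, exist_ok=True)
    previous = client.get_value("files_base_dir")
    tmp_dir = tempfile.mkdtemp(dir=artifacts_root, prefix="rag_visual_sim_")
    try:
        client.set_value("files_base_dir", tmp_dir)
        client.set_value("rag_enabled", True)
        return body(client, tmp_dir)
    finally:
        # Restore the in-memory state first, then clean the disk.
        # Removing the directory alone would leave a dead path in the client.
        client.set_value("rag_enabled", False)
        client.set_value("files_base_dir", previous)
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

The point of the `finally` block is that both leaks are closed together: the directory is removed from disk and the shared process no longer holds a reference to it.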
### Fix 2: `tests/test_visual_sim_mma_v2.py` (committed in `ab16f2f2`)
- Removed the `client.set_value('files_base_dir', 'tests/artifacts/temp_workspace')` and `client.click('btn_project_save')` calls
- The MMA lifecycle does not depend on a specific `files_base_dir` (mock_gemini_cli returns canned responses)
### Fix 3: `src/app_controller.py` `_handle_reset_session` (committed in `ab16f2f2`)
- Defensive fix: reset `ui_files_base_dir` and `ui_shots_base_dir` from the default project's `base_dir` in `reset_session()`. This makes the reset robust to ANY future polluter, not just the two known ones.
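The shape of this defensive reset can be sketched as follows. `Project` and `ControllerSketch` are hypothetical stand-ins, not the real classes in `src/app_controller.py`; only the attribute names `ui_files_base_dir`, `ui_shots_base_dir`, and `base_dir` come from this report.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project record; only base_dir is taken from the report."""
    base_dir: str


class ControllerSketch:
    """Hypothetical stand-in for the controller, reduced to the two UI paths."""

    def __init__(self, default_project):
        self.default_project = default_project
        self.ui_files_base_dir = default_project.base_dir
        self.ui_shots_base_dir = default_project.base_dir

    def reset_session(self):
        # Re-derive both UI paths from the default project so that no value
        # written by an earlier test survives the reset.
        self.ui_files_base_dir = self.default_project.base_dir
        self.ui_shots_base_dir = self.default_project.base_dir
```

Deriving the values from the default project, rather than remembering and restoring whatever was set before, is what makes the reset independent of which test polluted the state.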
### Fix 4: `src/rag_engine.py` `_validate_collection_dim_result` (committed in `f3d823b7`)
- The dim check referenced `chromadb`, which is a local variable in `_init_vector_store_result` and therefore not in scope in `_validate_collection_dim_result`. This caused a `NameError` whenever the dim check fired.
- Fixed by calling `_get_chromadb()` to get the chromadb reference (consistent with `_init_vector_store_result`).
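The scoping problem can be reproduced in isolation. In this sketch the stdlib `json` module stands in for `chromadb` so the example runs without extra dependencies, and `_get_backend`, `init_store`, `broken_dim_check`, and `fixed_dim_check` are illustrative names, not the functions in `src/rag_engine.py`.

```python
import importlib

_module_cache = None


def _get_backend():
    """Lazy importer; `json` stands in for chromadb so the sketch runs anywhere."""
    global _module_cache
    if _module_cache is None:
        _module_cache = importlib.import_module("json")
    return _module_cache


def init_store():
    backend = _get_backend()  # local name, visible only inside this function
    return backend.dumps({"ok": True})


def broken_dim_check():
    # `backend` was local to init_store, so this lookup fails at call time,
    # not at import time. The bug stays hidden until this branch runs.
    return backend.dumps({"dim": 384})  # noqa: F821


def fixed_dim_check():
    backend = _get_backend()  # fetch the module through the same getter
    return backend.dumps({"dim": 384})
```

Because Python resolves names when the line executes, a function like `broken_dim_check` imports cleanly and only raises once the rarely taken path is exercised.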
---
## Test Results
**28/29 RAG tests pass** (after Fixes 1-4):
| Test | Status |
|---|---|
| `test_rag_chunk.py` | PASS |
| `test_rag_engine.py::test_rag_engine_chroma` | PASS |
| `test_rag_engine.py::test_rag_collection_dim_mismatch_recreates_collection` | **PASS** (was failing, fixed by Fix 4) |
| `test_rag_engine_result.py` | PASS |
| `test_rag_engine_ready_status_bug.py` | PASS |
| `test_rag_gui_presence.py` | PASS |
| `test_rag_integration.py` | PASS |
| `test_rag_sync_none_error.py` | PASS |
| `test_rag_phase4_stress.py` | PASS |
| `test_rag_visual_sim.py` | PASS (was polluter, now fixed) |
| `test_rag_phase4_final_verify.py` | **FAIL** (still failing — see below) |
---
## Remaining Failure: `test_rag_phase4_final_verify`
The test still fails with "RAG context not found in history" in ~14s. The mock prompt shows the AI request was sent but **NO RAG context was prepended**. The RAG search returned 0 chunks.
**Diagnostic attempts (all inconclusive):**
- Added stderr `sys.stderr.write` diag to `_rebuild_rag_index` → stderr write DOES NOT appear in `tests/logs/sloppy_py_test.log`
- Added file-based diag to `RAGEngine.index_file` → log file NEVER created
- Added file-based diag to `_handle_generate_send` → log file NEVER created
- Added file-based diag to `_handle_request_event` (HREQ) → log file NEVER created
**Paradox:** The test sees "Poll 0, status: sending..." (set by `_handle_generate_send`) and other behaviors that come from the SAME code paths where the diag writes are not appearing. The test reaches the indexing step (status becomes 'ready') and the AI request step (status becomes 'sending...'). But the diag writes from `_rebuild_rag_index` and `index_file` never appear.
**Hypotheses for the diag paradox:**
1. The subprocess is using a cached `.pyc` file (despite clearing `__pycache__`)
2. The diag writes are being silently caught by an exception handler
3. The subprocess's stderr/file writes go to a different location than expected
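One way to tell these hypotheses apart is a diag line that records which process wrote it, where that process is running, and which source file it loaded. The helper below is a hedged sketch: `diag`, its parameters, and the `rag_diag.log` file name are assumptions for illustration, not existing project code.

```python
import os
import sys
import tempfile
import time


def diag(tag, module_file, log_path=None):
    """Append one line that identifies the process, its CWD, and the code it loaded."""
    if log_path is None:
        # Absolute path, so the subprocess CWD cannot redirect the write.
        log_path = os.path.join(tempfile.gettempdir(), "rag_diag.log")
    line = "{:.3f} pid={} tag={} cwd={} file={} exe={}\n".format(
        time.time(), os.getpid(), tag, os.getcwd(),
        os.path.abspath(module_file), sys.executable,
    )
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # force the line to disk before returning
    return log_path
```

Calling `diag("rebuild_enter", __file__)` as the first statement of a handler, before any `try` block, would rule out a swallowed exception, and the `cwd=` and `file=` fields show whether the subprocess is running from the expected directory and source tree.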
**Hypotheses for the test failure (after Tier 1 fix):**
1. The `_rebuild_rag_index` function is never called (despite the `btn_rebuild_rag_index` click). The click event might not be reaching the handler. If the rebuild is never called, the collection stays empty.
2. The `index_file` function is never called (per the missing diag log). If the rebuild IS called but `index_file` is not invoked, the collection stays empty.
3. `active_project_path` points to a non-existent file, so `_load_active_project` fails and sets `active_project_path = ""`. `active_project_root` then falls back to `ui_files_base_dir`, which is `"."` after the Fix 3 reset, and the RAG engine uses `"."` as `base_dir`. `index_file` then tries `<base_dir>/final_test_1.txt`, i.e. `./final_test_1.txt`. If the subprocess CWD is the workspace, that resolves to `<workspace>/final_test_1.txt`, which should exist, and the CWD fallback in `index_file` should find it. This hypothesis therefore only explains the failure if the subprocess CWD is not the workspace.
**Most likely root cause (best guess):** `_rebuild_rag_index` is never called. The test waits for `rag_status == 'ready'`, but the status is already 'ready' from the RAG sync (which doesn't index), so the wait assertion passes without any indexing having happened. The RAG search then returns 0 chunks and the test fails.
**Why the rebuild is never called:** The `btn_rebuild_rag_index` click event is not reaching the handler. This could be due to:
- The click event being lost (e.g., a previous test's click is still in the queue)
- The handler being mapped to a different function
- The `live_gui` subprocess being in a state where it can't process clicks
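If the best guess is right, a wait on `rag_status == 'ready'` cannot distinguish "rebuild finished" from "rebuild never started". A stricter wait, sketched below under assumptions, only succeeds if the status leaves 'ready' after the click and then returns to it. `wait_for_fresh_ready`, `read_status`, and `trigger` are hypothetical names; the real test helpers may look different.

```python
import time


def wait_for_fresh_ready(read_status, trigger, done="ready", timeout=5.0, interval=0.01):
    """Return True only if status leaves `done` after trigger() and then returns to it."""
    trigger()
    deadline = time.monotonic() + timeout
    left_done = False
    while time.monotonic() < deadline:
        status = read_status()
        if status != done:
            left_done = True   # the handler reacted to the trigger
        elif left_done:
            return True        # back to `done` after real work
        time.sleep(interval)
    return False               # status never moved: the trigger was likely lost
```

A rebuild that completes between two polls would be missed by this approach, so a timeout here is a strong hint rather than proof that the click never reached the handler.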
---
## What I Need From the User
The Tier 1 report's analysis is correct (environmental pollution from `files_base_dir` leaks), and the defensive fix in `reset_session()` is good. But the test is still failing for a DIFFERENT reason (the rebuild is not being called).
**Possible next steps for the user:**
1. Run the test in batched mode (`uv run python scripts/run_tests_batched.py --tier tier3 --filter test_rag_phase4_final_verify`) to see if it passes in batch
2. Add more diagnostic logging to the `_pending_gui_tasks` queue processing to see if the click event is received
3. Check if the `_rebuild_rag_index` click handler is correctly mapped in `_init_actions`
4. Verify the subprocess is using the latest code (no cached .pyc)
5. Consider whether the `active_project_path` resolution is the issue (per the Tier 1 report)
**Recommendation:** Run the batched test suite to see if the fix works in batch. The test might pass in batch even though it fails in isolation (due to test ordering or shared state).
---
## Files Modified This Session
- `tests/test_rag_visual_sim.py` — use workspace-relative temp dir, restore state in finally
- `tests/test_visual_sim_mma_v2.py` — remove `files_base_dir` set + `btn_project_save` click
- `src/app_controller.py` — reset `ui_files_base_dir` in `_handle_reset_session` (defensive fix)
- `src/rag_engine.py` — call `_get_chromadb()` in dim check (fixes `NameError` from `24e93a75`)