conductor(track): plan for rag_test_failures_20260615 (5 phases, ~10 tasks)

2026-06-15 21:53:13 -04:00
parent 006df67637
commit 3aa4cfa133
1 changed files with 173 additions and 0 deletions
@@ -0,0 +1,173 @@
+# Plan: RAG Test Failures Fix
+
+**Track:** `rag_test_failures_20260615`
+**Spec:** `spec.md`
+**Status:** Active (plan approved 2026-06-15)
+
+## TDD Protocol (MANDATORY)
+
+For each phase, the order is:
+1. **Red**: verify the test/failure is present (TDD red phase)
+2. **Green**: implement the fix; run the test; confirm it passes
+3. **Verify green**: run the targeted test batch to confirm no regression
+4. **Commit**: one atomic commit per task with a clear message
+5. **Git note**: attach a 3-5 sentence summary to the commit
+
+Per the project rule (see `AGENTS.md` "Critical Anti-Patterns"), per-task atomic commits. The 1-space indentation rule is in effect.
+
+**Diagnostic strategy:** the error message `"'NoneType' object has no attribute 'get'"` is specific — it indicates a `dict.get()` call on a `None` value. The implementer should add a diagnostic traceback to the except clause at `src/app_controller.py:1479` to capture the actual call site, then remove the traceback after the fix is verified.
+
+---
+
+## Phase 1: Investigation + reproducing test (1-2 hours)
+
+**Focus:** Find the exact location of the `.get(None)` call. The spec §1.4 lists 5 candidate sites; the investigation will narrow to 1.
+
+- [ ] **Task 1.1**: TDD red - verify all 3 RAG tests fail with the same error
+  - **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase1_red.log`
+  - **EXPECTED:** 3 failures, all with the same `rag_status: error: 'NoneType' object has no attribute 'get'`
+  - **COMMIT:** No new commit; this is a verification step.
+
+- [ ] **Task 1.2**: Add diagnostic traceback to the except clause
+  - **WHERE:** `src/app_controller.py:1479-1482` (the except clause in `_do_rag_sync`)
+  - **WHAT:** Replace the existing `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")` with `sys.stderr.write(traceback.format_exc())`. Also `import traceback` at the top of the file (if not already imported).
+  - **HOW:** Use `manual-slop_edit_file` to add the import and update the except clause. 2-line change.
+  - **NOTE:** This is a temporary diagnostic; remove it in Phase 2 after the fix is verified.
+  - **SAFETY:** The `traceback` import is stdlib; no new dependency. The `format_exc()` is thread-safe.
+  - **VERIFY:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | tee /tmp/rag_diag.log` — confirm the full traceback is printed to stderr
+  - **COMMIT:** `chore(rag): add diagnostic traceback to _do_rag_sync except clause (Phase 1.2)`
+
+- [ ] **Task 1.3**: Capture the full traceback and identify the call site
+  - **Command:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | grep -A 30 "Traceback"`
+  - **EXPECTED:** A traceback showing the exact line where `.get()` is called on None
+  - **OUTPUT:** Document the traceback in the commit message for the fix (Phase 2)
+  - **COMMIT:** No new commit; this is a verification step.
+
+- [ ] **Task 1.4**: Write a focused reproducing test (smaller than the 3 RAG tests)
+  - **WHERE:** `tests/test_rag_sync_none_error.py` (new file, ~30 lines)
+  - **WHAT:** A focused test that:
+    1. Creates an `AppController` with mocked dependencies
+    2. Sets `rag_enabled=True` via the setter
+    3. Submits the sync and waits for completion
+    4. Asserts `rag_status != "error: ..."` (or specifically `rag_status == "ready"`)
+  - **HOW:** Use the existing `test_orchestration_logic.py` or `test_rag_engine.py` patterns as a template. Use `MagicMock` for the controller's heavy dependencies.
+  - **SAFETY:** No live_gui; this should be a fast unit test.
+  - **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` fails with the same error
+  - **COMMIT:** `test(rag): add focused reproducing test for NoneType.get sync error (Phase 1.4)`
+
+---
+
+## Phase 2: Fix (1-3 hours)
+
+**Focus:** Fix the root cause found in Phase 1. The fix is dependent on what the investigation reveals.
+
+- [ ] **Task 2.1**: Implement the fix based on the Phase 1 investigation
+  - **WHERE:** TBD based on Phase 1 (one of: `src/rag_engine.py:_validate_collection_dim_result`, `src/rag_engine.py:_init_vector_store_result`, `src/app_controller.py:_do_rag_sync`, or a config field setter)
+  - **WHAT:** Add a defensive guard or correct the call. Specific examples:
+    - If `src/rag_engine.py:149` (`embeddings = res.get("embeddings")`): Add a check that `res` is a dict before calling `.get()`; if not, return `Result(data=None)` early.
+    - If a config field is None: Add a guard in the setter or a fallback in the engine init.
+    - If the IO pool is leaking errors from another worker: Add a more specific exception handler.
+  - **HOW:** Use `manual-slop_edit_file` for surgical changes. 1-5 lines typical.
+  - **SAFETY:** The fix must be defensive (guard against future None) or corrective (the field should not be None). Document the choice in the commit message.
+  - **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` passes (the new test from Phase 1.4)
+  - **COMMIT:** `fix(rag): handle None response in _validate_collection_dim_result (Phase 2.1)` (or appropriate title based on the actual fix)
+
+- [ ] **Task 2.2**: Verify all 3 RAG tests pass
+  - **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase2_green.log`
+  - **EXPECTED:** 3/3 pass
+  - **COMMIT:** No new commit; this is a verification step.
+
+- [ ] **Task 2.3**: Remove the diagnostic traceback from Phase 1.2
+  - **WHERE:** `src/app_controller.py:1479-1482`
+  - **WHAT:** Remove the `import traceback` (if not used elsewhere) and the `traceback.format_exc()` call. Restore the original `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")`.
+  - **HOW:** Use `manual-slop_edit_file` with the exact old/new strings.
+  - **SAFETY:** Verify `traceback` is not used elsewhere in the file before removing the import. Use `uv run rg "traceback" src/app_controller.py` to check.
+  - **VERIFY:** `uv run rg "traceback" src/app_controller.py` returns 0 hits (or only the import line which should also be removed)
+  - **COMMIT:** `chore(rag): remove diagnostic traceback from _do_rag_sync (Phase 2.3)`
+
+- [ ] **Task 2.4**: Add a defensive guard or proper error message (G3)
+  - **WHERE:** TBD based on the fix in Task 2.1
+  - **WHAT:** Ensure the error message identifies WHICH field or call is None. For example, change "error: NoneType has no attribute 'get'" to "error: RAG sync failed: <class>.get() called on None in <function>".
+  - **HOW:** Catch the specific exception type and re-raise with a more informative message. Or add a `try/except` around the specific call site.
+  - **SAFETY:** The new error message should not leak sensitive information (file paths are OK; credentials are not).
+  - **VERIFY:** Run the 3 RAG tests; if the bug recurs, the error message is more useful.
+  - **COMMIT:** `fix(rag): add defensive guard with informative error message (Phase 2.4)`
+
+---
+
+## Phase 3: Full test suite + batched verification (30 min)
+
+**Focus:** Ensure no regression in the broader test suite.
+
+- [ ] **Task 3.1**: Run the full RAG test suite
+  - **Command:** `uv run pytest tests/test_rag_engine.py tests/test_rag_engine_result.py tests/test_rag_engine_ready_status_bug.py tests/test_rag_gui_presence.py tests/test_rag_integration.py tests/test_sync_rag_engine_coalescing.py tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase3_rag_suite.log`
+  - **EXPECTED:** 30+/30+ pass (no new failures)
+  - **COMMIT:** No new commit; this is a verification step.
+
+- [ ] **Task 3.2**: Run the full test suite
+  - **Command:** `uv run pytest tests/ 2>&1 | tee tests/artifacts/rag_track_phase3_full.log`
+  - **EXPECTED:** 1285 pass + 4 skip + 0 fail (was 1282 + 4 + 3 pre-track)
+  - **ACTION:** If NEW failures appear, STOP and report to the user.
+  - **COMMIT:** No new commit; this is a verification step.
+
+- [ ] **Task 3.3**: Run the batched test suite
+  - **Command:** `uv run .\scripts\run_tests_batched.py 2>&1 | tee tests/artifacts/rag_track_phase3_batched.log`
+  - **EXPECTED:** All tiers PASS; no failures
+  - **COMMIT:** `conductor(checkpoint): Phase 3 complete - 1285 tests pass, 0 failures`
+
+---
+
+## Phase 4: Docs update (15 min)
+
+**Focus:** Document the fix in `docs/guide_rag.md` (if it exists).
+
+- [ ] **Task 4.1**: Check if `docs/guide_rag.md` exists
+  - **Command:** `uv run rg "guide_rag" docs/ docs/AGENTS.md`
+  - **EXPECTED:** May or may not exist; if not, skip Phase 4
+  - **COMMIT:** No new commit.
+
+- [ ] **Task 4.2 (CONDITIONAL)**: If `docs/guide_rag.md` exists, add a troubleshooting entry
+  - **WHERE:** `docs/guide_rag.md` (a "Troubleshooting" or "Known issues" section)
+  - **WHAT:** Add 1-2 paragraphs documenting:
+    - The error: "If `rag_status` shows `'NoneType' object has no attribute 'get'`, ..."
+    - The fix: "Check the RAG sync worker at `src/app_controller.py:_do_rag_sync`..."
+  - **HOW:** Use `manual-slop_edit_file` to add the section.
+  - **VERIFY:** `uv run rg "NoneType" docs/guide_rag.md` returns 1 hit
+  - **COMMIT:** `docs(rag): document the NoneType.get fix (Phase 4.2)`
+
+---
+
+## Phase 5: Metadata + tracks.md (15 min)
+
+**Focus:** Mark the track complete in the project registry.
+
+- [ ] **Task 5.1**: Update `metadata.json` to mark the track complete
+  - **WHERE:** `conductor/tracks/rag_test_failures_20260615/metadata.json`
+  - **WHAT:** Change `"status": "active"` to `"status": "completed"`. Add a `completed_at` field. Update `verification_criteria` to reflect what was actually verified.
+  - **HOW:** Direct file edit.
+  - **COMMIT:** `conductor(track): mark rag_test_failures_20260615 as completed`
+
+- [ ] **Task 5.2**: Update `conductor/tracks.md` to reflect the track's status
+  - **WHERE:** `conductor/tracks.md`
+  - **WHAT:** Add a row for the RAG track or update the existing RAG section.
+  - **HOW:** Direct file edit.
+  - **COMMIT:** `conductor: mark rag_test_failures_20260615 as completed in tracks.md`
+
+- [ ] **Task 5.3**: Conductor - User Manual Verification
+  - **ACTION:** Announce the track is complete. Provide the user with a summary: "3 RAG tests fixed; first fully green baseline since 2026-06-12. The user can now proceed with the `send_result` → `send` mass rename or the `data_structure_strengthening_20260606` track."
+
+---
+
+## Summary
+
+- **Total tasks:** ~10 (across 5 phases)
+- **Total atomic commits:** ~5-7 (1 diagnostic + 1 reproducing test + 1 fix + 1 diag removal + 1 defensive guard + 1 docs + 1 metadata)
+- **Total estimated effort:** 0.5-1 day Tier 2 work (4-8 hours)
+- **Dependencies:** None (independent track)
+- **Out of scope (deferred):** `send_result` → `send` mass rename (user's manual refactor); 23 lower-impact weak-type files (data_structure_strengthening); live_gui_mock_injection infrastructure
+
+## Test count math
+
+- **Pre-track baseline:** 1282 pass + 4 skip + 3 fail
+- **After this track:** 1285 pass + 4 skip + 0 fail (3 newly-passing)
+- **First fully green baseline** since `data_oriented_error_handling_20260606` shipped 2026-06-12