diff --git a/conductor/tracks/rag_test_failures_20260615/plan.md b/conductor/tracks/rag_test_failures_20260615/plan.md new file mode 100644 index 00000000..2618a2eb --- /dev/null +++ b/conductor/tracks/rag_test_failures_20260615/plan.md @@ -0,0 +1,173 @@ +# Plan: RAG Test Failures Fix + +**Track:** `rag_test_failures_20260615` +**Spec:** `spec.md` +**Status:** Active (plan approved 2026-06-15) + +## TDD Protocol (MANDATORY) + +For each phase, the order is: +1. **Red**: verify the test/failure is present (TDD red phase) +2. **Green**: implement the fix; run the test; confirm it passes +3. **Verify green**: run the targeted test batch to confirm no regression +4. **Commit**: one atomic commit per task with a clear message +5. **Git note**: attach a 3-5 sentence summary to the commit + +Per the project rule (see `AGENTS.md` "Critical Anti-Patterns"), per-task atomic commits. The 1-space indentation rule is in effect. + +**Diagnostic strategy:** the error message `"'NoneType' object has no attribute 'get'"` is specific — it indicates a `dict.get()` call on a `None` value. The implementer should add a diagnostic traceback to the except clause at `src/app_controller.py:1479` to capture the actual call site, then remove the traceback after the fix is verified. + +--- + +## Phase 1: Investigation + reproducing test (1-2 hours) + +**Focus:** Find the exact location of the `.get(None)` call. The spec §1.4 lists 5 candidate sites; the investigation will narrow to 1. + +- [ ] **Task 1.1**: TDD red - verify all 3 RAG tests fail with the same error + - **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase1_red.log` + - **EXPECTED:** 3 failures, all with the same `rag_status: error: 'NoneType' object has no attribute 'get'` + - **COMMIT:** No new commit; this is a verification step. + +- [ ] **Task 1.2**: Add diagnostic traceback to the except clause + - **WHERE:** `src/app_controller.py:1479-1482` (the except clause in `_do_rag_sync`) + - **WHAT:** Replace the existing `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")` with `sys.stderr.write(traceback.format_exc())`. Also `import traceback` at the top of the file (if not already imported). + - **HOW:** Use `manual-slop_edit_file` to add the import and update the except clause. 2-line change. + - **NOTE:** This is a temporary diagnostic; remove it in Phase 2 after the fix is verified. + - **SAFETY:** The `traceback` import is stdlib; no new dependency. The `format_exc()` is thread-safe. + - **VERIFY:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | tee /tmp/rag_diag.log` — confirm the full traceback is printed to stderr + - **COMMIT:** `chore(rag): add diagnostic traceback to _do_rag_sync except clause (Phase 1.2)` + +- [ ] **Task 1.3**: Capture the full traceback and identify the call site + - **Command:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | grep -A 30 "Traceback"` + - **EXPECTED:** A traceback showing the exact line where `.get()` is called on None + - **OUTPUT:** Document the traceback in the commit message for the fix (Phase 2) + - **COMMIT:** No new commit; this is a verification step. + +- [ ] **Task 1.4**: Write a focused reproducing test (smaller than the 3 RAG tests) + - **WHERE:** `tests/test_rag_sync_none_error.py` (new file, ~30 lines) + - **WHAT:** A focused test that: + 1. Creates an `AppController` with mocked dependencies + 2. Sets `rag_enabled=True` via the setter + 3. Submits the sync and waits for completion + 4. Asserts `rag_status != "error: ..."` (or specifically `rag_status == "ready"`) + - **HOW:** Use the existing `test_orchestration_logic.py` or `test_rag_engine.py` patterns as a template. Use `MagicMock` for the controller's heavy dependencies. + - **SAFETY:** No live_gui; this should be a fast unit test. + - **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` fails with the same error + - **COMMIT:** `test(rag): add focused reproducing test for NoneType.get sync error (Phase 1.4)` + +--- + +## Phase 2: Fix (1-3 hours) + +**Focus:** Fix the root cause found in Phase 1. The fix is dependent on what the investigation reveals. + +- [ ] **Task 2.1**: Implement the fix based on the Phase 1 investigation + - **WHERE:** TBD based on Phase 1 (one of: `src/rag_engine.py:_validate_collection_dim_result`, `src/rag_engine.py:_init_vector_store_result`, `src/app_controller.py:_do_rag_sync`, or a config field setter) + - **WHAT:** Add a defensive guard or correct the call. Specific examples: + - If `src/rag_engine.py:149` (`embeddings = res.get("embeddings")`): Add a check that `res` is a dict before calling `.get()`; if not, return `Result(data=None)` early. + - If a config field is None: Add a guard in the setter or a fallback in the engine init. + - If the IO pool is leaking errors from another worker: Add a more specific exception handler. + - **HOW:** Use `manual-slop_edit_file` for surgical changes. 1-5 lines typical. + - **SAFETY:** The fix must be defensive (guard against future None) or corrective (the field should not be None). Document the choice in the commit message. + - **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` passes (the new test from Phase 1.4) + - **COMMIT:** `fix(rag): handle None response in _validate_collection_dim_result (Phase 2.1)` (or appropriate title based on the actual fix) + +- [ ] **Task 2.2**: Verify all 3 RAG tests pass + - **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase2_green.log` + - **EXPECTED:** 3/3 pass + - **COMMIT:** No new commit; this is a verification step. + +- [ ] **Task 2.3**: Remove the diagnostic traceback from Phase 1.2 + - **WHERE:** `src/app_controller.py:1479-1482` + - **WHAT:** Remove the `import traceback` (if not used elsewhere) and the `traceback.format_exc()` call. Restore the original `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")`. + - **HOW:** Use `manual-slop_edit_file` with the exact old/new strings. + - **SAFETY:** Verify `traceback` is not used elsewhere in the file before removing the import. Use `uv run rg "traceback" src/app_controller.py` to check. + - **VERIFY:** `uv run rg "traceback" src/app_controller.py` returns 0 hits (or only the import line which should also be removed) + - **COMMIT:** `chore(rag): remove diagnostic traceback from _do_rag_sync (Phase 2.3)` + +- [ ] **Task 2.4**: Add a defensive guard or proper error message (G3) + - **WHERE:** TBD based on the fix in Task 2.1 + - **WHAT:** Ensure the error message identifies WHICH field or call is None. For example, change "error: NoneType has no attribute 'get'" to "error: RAG sync failed: .get() called on None in ". + - **HOW:** Catch the specific exception type and re-raise with a more informative message. Or add a `try/except` around the specific call site. + - **SAFETY:** The new error message should not leak sensitive information (file paths are OK; credentials are not). + - **VERIFY:** Run the 3 RAG tests; if the bug recurs, the error message is more useful. + - **COMMIT:** `fix(rag): add defensive guard with informative error message (Phase 2.4)` + +--- + +## Phase 3: Full test suite + batched verification (30 min) + +**Focus:** Ensure no regression in the broader test suite. + +- [ ] **Task 3.1**: Run the full RAG test suite + - **Command:** `uv run pytest tests/test_rag_engine.py tests/test_rag_engine_result.py tests/test_rag_engine_ready_status_bug.py tests/test_rag_gui_presence.py tests/test_rag_integration.py tests/test_sync_rag_engine_coalescing.py tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase3_rag_suite.log` + - **EXPECTED:** 30+/30+ pass (no new failures) + - **COMMIT:** No new commit; this is a verification step. + +- [ ] **Task 3.2**: Run the full test suite + - **Command:** `uv run pytest tests/ 2>&1 | tee tests/artifacts/rag_track_phase3_full.log` + - **EXPECTED:** 1285 pass + 4 skip + 0 fail (was 1282 + 4 + 3 pre-track) + - **ACTION:** If NEW failures appear, STOP and report to the user. + - **COMMIT:** No new commit; this is a verification step. + +- [ ] **Task 3.3**: Run the batched test suite + - **Command:** `uv run .\scripts\run_tests_batched.py 2>&1 | tee tests/artifacts/rag_track_phase3_batched.log` + - **EXPECTED:** All tiers PASS; no failures + - **COMMIT:** `conductor(checkpoint): Phase 3 complete - 1285 tests pass, 0 failures` + +--- + +## Phase 4: Docs update (15 min) + +**Focus:** Document the fix in `docs/guide_rag.md` (if it exists). + +- [ ] **Task 4.1**: Check if `docs/guide_rag.md` exists + - **Command:** `uv run rg "guide_rag" docs/ docs/AGENTS.md` + - **EXPECTED:** May or may not exist; if not, skip Phase 4 + - **COMMIT:** No new commit. + +- [ ] **Task 4.2 (CONDITIONAL)**: If `docs/guide_rag.md` exists, add a troubleshooting entry + - **WHERE:** `docs/guide_rag.md` (a "Troubleshooting" or "Known issues" section) + - **WHAT:** Add 1-2 paragraphs documenting: + - The error: "If `rag_status` shows `'NoneType' object has no attribute 'get'`, ..." + - The fix: "Check the RAG sync worker at `src/app_controller.py:_do_rag_sync`..." + - **HOW:** Use `manual-slop_edit_file` to add the section. + - **VERIFY:** `uv run rg "NoneType" docs/guide_rag.md` returns 1 hit + - **COMMIT:** `docs(rag): document the NoneType.get fix (Phase 4.2)` + +--- + +## Phase 5: Metadata + tracks.md (15 min) + +**Focus:** Mark the track complete in the project registry. + +- [ ] **Task 5.1**: Update `metadata.json` to mark the track complete + - **WHERE:** `conductor/tracks/rag_test_failures_20260615/metadata.json` + - **WHAT:** Change `"status": "active"` to `"status": "completed"`. Add a `completed_at` field. Update `verification_criteria` to reflect what was actually verified. + - **HOW:** Direct file edit. + - **COMMIT:** `conductor(track): mark rag_test_failures_20260615 as completed` + +- [ ] **Task 5.2**: Update `conductor/tracks.md` to reflect the track's status + - **WHERE:** `conductor/tracks.md` + - **WHAT:** Add a row for the RAG track or update the existing RAG section. + - **HOW:** Direct file edit. + - **COMMIT:** `conductor: mark rag_test_failures_20260615 as completed in tracks.md` + +- [ ] **Task 5.3**: Conductor - User Manual Verification + - **ACTION:** Announce the track is complete. Provide the user with a summary: "3 RAG tests fixed; first fully green baseline since 2026-06-12. The user can now proceed with the `send_result` → `send` mass rename or the `data_structure_strengthening_20260606` track." + +--- + +## Summary + +- **Total tasks:** ~10 (across 5 phases) +- **Total atomic commits:** ~5-7 (1 diagnostic + 1 reproducing test + 1 fix + 1 diag removal + 1 defensive guard + 1 docs + 1 metadata) +- **Total estimated effort:** 0.5-1 day Tier 2 work (4-8 hours) +- **Dependencies:** None (independent track) +- **Out of scope (deferred):** `send_result` → `send` mass rename (user's manual refactor); 23 lower-impact weak-type files (data_structure_strengthening); live_gui_mock_injection infrastructure + +## Test count math + +- **Pre-track baseline:** 1282 pass + 4 skip + 3 fail +- **After this track:** 1285 pass + 4 skip + 0 fail (3 newly-passing) +- **First fully green baseline** since `data_oriented_error_handling_20260606` shipped 2026-06-12