Private
Public Access
0
0

conductor(track): plan for rag_test_failures_20260615 (5 phases, ~10 tasks)

This commit is contained in:
2026-06-15 21:53:13 -04:00
parent 006df67637
commit 3aa4cfa133
@@ -0,0 +1,173 @@
# Plan: RAG Test Failures Fix
**Track:** `rag_test_failures_20260615`
**Spec:** `spec.md`
**Status:** Active (plan approved 2026-06-15)
## TDD Protocol (MANDATORY)
For each phase, the order is:
1. **Red**: verify the test/failure is present (TDD red phase)
2. **Green**: implement the fix; run the test; confirm it passes
3. **Verify green**: run the targeted test batch to confirm no regression
4. **Commit**: one atomic commit per task with a clear message
5. **Git note**: attach a 3-5 sentence summary to the commit
Per the project rule (see `AGENTS.md` "Critical Anti-Patterns"), per-task atomic commits. The 1-space indentation rule is in effect.
**Diagnostic strategy:** the error message `"'NoneType' object has no attribute 'get'"` is specific — it indicates a `dict.get()` call on a `None` value. The implementer should add a diagnostic traceback to the except clause at `src/app_controller.py:1479` to capture the actual call site, then remove the traceback after the fix is verified.
---
## Phase 1: Investigation + reproducing test (1-2 hours)
**Focus:** Find the exact location of the `.get(None)` call. The spec §1.4 lists 5 candidate sites; the investigation will narrow to 1.
- [ ] **Task 1.1**: TDD red - verify all 3 RAG tests fail with the same error
- **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase1_red.log`
- **EXPECTED:** 3 failures, all with the same `rag_status: error: 'NoneType' object has no attribute 'get'`
- **COMMIT:** No new commit; this is a verification step.
- [ ] **Task 1.2**: Add diagnostic traceback to the except clause
- **WHERE:** `src/app_controller.py:1479-1482` (the except clause in `_do_rag_sync`)
- **WHAT:** Replace the existing `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")` with `sys.stderr.write(traceback.format_exc())`. Also `import traceback` at the top of the file (if not already imported).
- **HOW:** Use `manual-slop_edit_file` to add the import and update the except clause. 2-line change.
- **NOTE:** This is a temporary diagnostic; remove it in Phase 2 after the fix is verified.
- **SAFETY:** The `traceback` import is stdlib; no new dependency. The `format_exc()` is thread-safe.
- **VERIFY:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | tee /tmp/rag_diag.log` — confirm the full traceback is printed to stderr
- **COMMIT:** `chore(rag): add diagnostic traceback to _do_rag_sync except clause (Phase 1.2)`
- [ ] **Task 1.3**: Capture the full traceback and identify the call site
- **Command:** `uv run pytest tests/test_rag_visual_sim.py -v 2>&1 | grep -A 30 "Traceback"`
- **EXPECTED:** A traceback showing the exact line where `.get()` is called on None
- **OUTPUT:** Document the traceback in the commit message for the fix (Phase 2)
- **COMMIT:** No new commit; this is a verification step.
- [ ] **Task 1.4**: Write a focused reproducing test (smaller than the 3 RAG tests)
- **WHERE:** `tests/test_rag_sync_none_error.py` (new file, ~30 lines)
- **WHAT:** A focused test that:
1. Creates an `AppController` with mocked dependencies
2. Sets `rag_enabled=True` via the setter
3. Submits the sync and waits for completion
4. Asserts `rag_status != "error: ..."` (or specifically `rag_status == "ready"`)
- **HOW:** Use the existing `test_orchestration_logic.py` or `test_rag_engine.py` patterns as a template. Use `MagicMock` for the controller's heavy dependencies.
- **SAFETY:** No live_gui; this should be a fast unit test.
- **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` fails with the same error
- **COMMIT:** `test(rag): add focused reproducing test for NoneType.get sync error (Phase 1.4)`
---
## Phase 2: Fix (1-3 hours)
**Focus:** Fix the root cause found in Phase 1. The fix is dependent on what the investigation reveals.
- [ ] **Task 2.1**: Implement the fix based on the Phase 1 investigation
- **WHERE:** TBD based on Phase 1 (one of: `src/rag_engine.py:_validate_collection_dim_result`, `src/rag_engine.py:_init_vector_store_result`, `src/app_controller.py:_do_rag_sync`, or a config field setter)
- **WHAT:** Add a defensive guard or correct the call. Specific examples:
- If `src/rag_engine.py:149` (`embeddings = res.get("embeddings")`): Add a check that `res` is a dict before calling `.get()`; if not, return `Result(data=None)` early.
- If a config field is None: Add a guard in the setter or a fallback in the engine init.
- If the IO pool is leaking errors from another worker: Add a more specific exception handler.
- **HOW:** Use `manual-slop_edit_file` for surgical changes. 1-5 lines typical.
- **SAFETY:** The fix must be defensive (guard against future None) or corrective (the field should not be None). Document the choice in the commit message.
- **VERIFY:** `uv run pytest tests/test_rag_sync_none_error.py -v` passes (the new test from Phase 1.4)
- **COMMIT:** `fix(rag): handle None response in _validate_collection_dim_result (Phase 2.1)` (or appropriate title based on the actual fix)
- [ ] **Task 2.2**: Verify all 3 RAG tests pass
- **Command:** `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase2_green.log`
- **EXPECTED:** 3/3 pass
- **COMMIT:** No new commit; this is a verification step.
- [ ] **Task 2.3**: Remove the diagnostic traceback from Phase 1.2
- **WHERE:** `src/app_controller.py:1479-1482`
- **WHAT:** Remove the `import traceback` (if not used elsewhere) and the `traceback.format_exc()` call. Restore the original `sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")`.
- **HOW:** Use `manual-slop_edit_file` with the exact old/new strings.
- **SAFETY:** Verify `traceback` is not used elsewhere in the file before removing the import. Use `uv run rg "traceback" src/app_controller.py` to check.
- **VERIFY:** `uv run rg "traceback" src/app_controller.py` returns 0 hits (or only the import line which should also be removed)
- **COMMIT:** `chore(rag): remove diagnostic traceback from _do_rag_sync (Phase 2.3)`
- [ ] **Task 2.4**: Add a defensive guard or proper error message (G3)
- **WHERE:** TBD based on the fix in Task 2.1
- **WHAT:** Ensure the error message identifies WHICH field or call is None. For example, change "error: NoneType has no attribute 'get'" to "error: RAG sync failed: <class>.get() called on None in <function>".
- **HOW:** Catch the specific exception type and re-raise with a more informative message. Or add a `try/except` around the specific call site.
- **SAFETY:** The new error message should not leak sensitive information (file paths are OK; credentials are not).
- **VERIFY:** Run the 3 RAG tests; if the bug recurs, the error message is more useful.
- **COMMIT:** `fix(rag): add defensive guard with informative error message (Phase 2.4)`
---
## Phase 3: Full test suite + batched verification (30 min)
**Focus:** Ensure no regression in the broader test suite.
- [ ] **Task 3.1**: Run the full RAG test suite
- **Command:** `uv run pytest tests/test_rag_engine.py tests/test_rag_engine_result.py tests/test_rag_engine_ready_status_bug.py tests/test_rag_gui_presence.py tests/test_rag_integration.py tests/test_sync_rag_engine_coalescing.py tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase3_rag_suite.log`
- **EXPECTED:** 30+/30+ pass (no new failures)
- **COMMIT:** No new commit; this is a verification step.
- [ ] **Task 3.2**: Run the full test suite
- **Command:** `uv run pytest tests/ 2>&1 | tee tests/artifacts/rag_track_phase3_full.log`
- **EXPECTED:** 1285 pass + 4 skip + 0 fail (was 1282 + 4 + 3 pre-track)
- **ACTION:** If NEW failures appear, STOP and report to the user.
- **COMMIT:** No new commit; this is a verification step.
- [ ] **Task 3.3**: Run the batched test suite
- **Command:** `uv run .\scripts\run_tests_batched.py 2>&1 | tee tests/artifacts/rag_track_phase3_batched.log`
- **EXPECTED:** All tiers PASS; no failures
- **COMMIT:** `conductor(checkpoint): Phase 3 complete - 1285 tests pass, 0 failures`
---
## Phase 4: Docs update (15 min)
**Focus:** Document the fix in `docs/guide_rag.md` (if it exists).
- [ ] **Task 4.1**: Check if `docs/guide_rag.md` exists
- **Command:** `uv run rg "guide_rag" docs/ docs/AGENTS.md`
- **EXPECTED:** May or may not exist; if not, skip Phase 4
- **COMMIT:** No new commit.
- [ ] **Task 4.2 (CONDITIONAL)**: If `docs/guide_rag.md` exists, add a troubleshooting entry
- **WHERE:** `docs/guide_rag.md` (a "Troubleshooting" or "Known issues" section)
- **WHAT:** Add 1-2 paragraphs documenting:
- The error: "If `rag_status` shows `'NoneType' object has no attribute 'get'`, ..."
- The fix: "Check the RAG sync worker at `src/app_controller.py:_do_rag_sync`..."
- **HOW:** Use `manual-slop_edit_file` to add the section.
- **VERIFY:** `uv run rg "NoneType" docs/guide_rag.md` returns 1 hit
- **COMMIT:** `docs(rag): document the NoneType.get fix (Phase 4.2)`
---
## Phase 5: Metadata + tracks.md (15 min)
**Focus:** Mark the track complete in the project registry.
- [ ] **Task 5.1**: Update `metadata.json` to mark the track complete
- **WHERE:** `conductor/tracks/rag_test_failures_20260615/metadata.json`
- **WHAT:** Change `"status": "active"` to `"status": "completed"`. Add a `completed_at` field. Update `verification_criteria` to reflect what was actually verified.
- **HOW:** Direct file edit.
- **COMMIT:** `conductor(track): mark rag_test_failures_20260615 as completed`
- [ ] **Task 5.2**: Update `conductor/tracks.md` to reflect the track's status
- **WHERE:** `conductor/tracks.md`
- **WHAT:** Add a row for the RAG track or update the existing RAG section.
- **HOW:** Direct file edit.
- **COMMIT:** `conductor: mark rag_test_failures_20260615 as completed in tracks.md`
- [ ] **Task 5.3**: Conductor - User Manual Verification
- **ACTION:** Announce the track is complete. Provide the user with a summary: "3 RAG tests fixed; first fully green baseline since 2026-06-12. The user can now proceed with the `send_result``send` mass rename or the `data_structure_strengthening_20260606` track."
---
## Summary
- **Total tasks:** ~10 (across 5 phases)
- **Total atomic commits:** ~5-7 (1 diagnostic + 1 reproducing test + 1 fix + 1 diag removal + 1 defensive guard + 1 docs + 1 metadata)
- **Total estimated effort:** 0.5-1 day Tier 2 work (4-8 hours)
- **Dependencies:** None (independent track)
- **Out of scope (deferred):** `send_result``send` mass rename (user's manual refactor); 23 lower-impact weak-type files (data_structure_strengthening); live_gui_mock_injection infrastructure
## Test count math
- **Pre-track baseline:** 1282 pass + 4 skip + 3 fail
- **After this track:** 1285 pass + 4 skip + 0 fail (3 newly-passing)
- **First fully green baseline** since `data_oriented_error_handling_20260606` shipped 2026-06-12