From 006df67637143dbc7f3ed3db34e0167f5f1975af Mon Sep 17 00:00:00 2001 From: Ed_ Date: Mon, 15 Jun 2026 21:51:11 -0400 Subject: [PATCH] conductor(track): spec for rag_test_failures_20260615 (3 RAG test fixes, single root cause) --- .../tracks/rag_test_failures_20260615/spec.md | 386 ++++++++++++++++++ 1 file changed, 386 insertions(+) create mode 100644 conductor/tracks/rag_test_failures_20260615/spec.md diff --git a/conductor/tracks/rag_test_failures_20260615/spec.md b/conductor/tracks/rag_test_failures_20260615/spec.md new file mode 100644 index 00000000..1f223967 --- /dev/null +++ b/conductor/tracks/rag_test_failures_20260615/spec.md @@ -0,0 +1,386 @@ +# Track Specification: RAG Test Failures Fix + +**Track ID:** `rag_test_failures_20260615` +**Status:** Active (spec approved 2026-06-15) +**Priority:** A (foundational; precedes `data_structure_strengthening_20260606` and the user's planned `send_result` → `send` mass rename) +**Owner:** Tier 2 Tech Lead +**Type:** bugfix + test_fix +**Estimated effort:** 0.5-1 day Tier 2 work (4-8 hours) +**Parent tracks:** `data_oriented_error_handling_20260606` (shipped 2026-06-12), `ai_loop_regressions_20260614` (shipped 2026-06-15), `doeh_test_thinking_cleanup_20260615` (shipped 2026-06-15), `public_api_migration_and_ui_polish_20260615` (shipped 2026-06-15) + +--- + +## 0. TL;DR + +A small, focused bug-fix track that resolves the **3 remaining pre-existing test failures** (not 4 as the parent track documented — `test_rag_integration.py` was inadvertently fixed by the public_api migration's Phase 2 follow-up, commit `26e1b652`). + +**All 3 failures share the same root cause:** the RAG sync worker at `src/app_controller.py:_do_rag_sync` catches an exception during the `RAGEngine` construction or subsequent config lookup, and the error message is `"'NoneType' object has no attribute 'get'"`. This is a specific Python error pattern indicating a `dict.get()` call is being made on a `None` value somewhere in the RAG setup path. + +**Result:** all 1285 tests pass (1282 + 3 RAG fixed). The project reaches a fully-green baseline for the first time since the `data_oriented_error_handling_20260606` track shipped on 2026-06-12. The user can then proceed with the planned `send_result` → `send` mass rename and the `data_structure_strengthening_20260606` track. + +--- + +## 1. Overview + +### 1.1 Current State (as of 2026-06-15) + +After the `public_api_migration_and_ui_polish_20260615` track completed: +- **1282 tests pass** (was 1280 pre-track; 7 newly-passing in the run, 13 fixed total per the completion report) +- **4 tests skipped** (unchanged) +- **3 tests fail** (was 10 pre-track; down from 4 RAG failures because `test_rag_integration.py::test_rag_integration` is now passing) + +The 3 remaining failures are all RAG subsystem tests in tier-3 (live_gui): + +| Test | Tier | File | Failure point | +|---|---|---|---| +| `test_rag_phase4_final_verify::test_phase4_final_verify` | tier-3 (live_gui) | `tests/test_rag_phase4_final_verify.py` | Line 65 (after `rag_enabled=True` + wait for `rag_status == 'ready'`) | +| `test_rag_phase4_stress::test_rag_large_codebase_verification_sim` | tier-3 (live_gui) | `tests/test_rag_phase4_stress.py` | Line 48 (same pattern) | +| `test_rag_visual_sim::test_rag_full_lifecycle_sim` | tier-3 (live_gui) | `tests/test_rag_visual_sim.py` | Line 32 (initial status check after `rag_enabled=True`) | + +All 3 fail with the **same error message** captured in `rag_status`: `"error: 'NoneType' object has no attribute 'get'"`. The error originates in `src/app_controller.py:_do_rag_sync` (line 1479-1482): + +```python +except Exception as e: + self._set_rag_status(f"error: {e}") + sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n") + sys.stderr.flush() +``` + +### 1.2 Gaps to Fill (this Track's Scope) + +| Gap | Count | Spec Section | +|---|---|---| +| Investigate the RAG sync NoneType.get error | 1 investigation | §3.1 | +| Fix the underlying bug in `src/app_controller.py` and/or `src/rag_engine.py` | 1-3 code changes | §3.2 | +| Verify the 3 RAG tests pass | 3 test fixes | §3.3 | + +### 1.3 Already Implemented (DO NOT re-implement) + +Verified by code audit (2026-06-15): + +- **`RAGConfig` default** (`src/models.py:1039-1065`) — has `vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock'))`; the default is NOT `None`. Confirmed by direct instantiation: `RAGConfig().vector_store.provider == 'mock'`. +- **`RAGEngine.__init__` with `vector_store.provider='mock'`** — succeeds; `is_empty()` returns `True`; no further sync work is triggered (mock branch at `src/rag_engine.py:123-126`). +- **`_do_rag_sync` coalescing** — the `token + dirty flag` pattern prevents N parallel syncs; works correctly (per `test_infrastructure_hardening_20260609` track). +- **`_init_vector_store_result` mock branch** — sets `self.client = "mock"` and `self.collection = "mock"`; `is_empty()` and `add_documents()` both check for this and return early. +- **`test_rag_integration.py::test_rag_integration`** — already PASSES (fixed incidentally by `public_api_migration_and_ui_polish_20260615` Phase 2 follow-up commit `26e1b652`). + +### 1.4 Investigation Clues + +The error pattern `"'NoneType' object has no attribute 'get'"` is a specific Python error indicating a `dict.get()` call on a `None` value. The most likely candidates in the RAG sync path: + +1. **`src/app_controller.py:1469` — `engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root)`** — if `self.active_project_root` is `None` or the `RAGConfig` has a `None` sub-field. + - **Status:** `active_project_root` is a property that returns `str(Path(self.active_project_path).parent)` or `self.ui_files_base_dir`. The test sets `files_base_dir` to a valid path. + - **Status:** `RAGConfig()` default has all required fields populated. + +2. **`src/rag_engine.py:89-101` — `RAGEngine.__init__`** — calls `_init_embedding_provider()` and `_init_vector_store_result()`. With `vector_store.provider='mock'`, the latter should return `Result(data=None)` (success). + - **Status:** Verified by direct instantiation: the engine constructs successfully. + +3. **`src/rag_engine.py:111-128` — `_init_vector_store_result`** — the `'chroma'` branch calls `_validate_collection_dim_result()` (line 122) which calls `self.collection.get(limit=1, include=["embeddings"])` (line 146) then `res.get("embeddings")` (line 149). If `self.collection` is set but the chromadb call returns a non-dict (e.g. a `Result` object), `.get()` would fail with NoneType. + - **Status:** This is the most likely candidate. The `is_empty()` and `add_documents()` short-circuit on the mock string, but the `_init_vector_store_result` for the `'mock'` branch returns immediately with `Result(data=None)` (line 126) — so the chromadb validation is skipped. So this isn't the bug for the 'mock' case. + - **Status:** For the 'chroma' case (test_rag_phase4_stress uses 'chroma'), the validation runs. If `self.embedding_provider.embed(["__rag_dim_check__"])` fails (e.g. due to gemini client not being initialized in the test subprocess), the error could be different. But the test_rag_phase4_stress uses `rag_emb_provider='local'` which depends on `sentence_transformers`. + +4. **`src/app_controller.py:230` — `controller.rag_engine and controller.rag_config and controller.rag_config.enabled`** — this is the entry check; if any of these is None, the sync is skipped. + - **Status:** `self.rag_config` is set in `__init__` (line 1830-1831) and reset in `reset_session` (line 3387). Should never be None after init. + +5. **A more subtle cause:** the `submit_io` lambda in `src/app_controller.py:1457` (`self.submit_io(lambda: self._do_rag_sync(token))`) submits a lambda. If the IO pool is shared with the user-agent / MMA comms callbacks, an unrelated exception in a different task could leak into the RAG status. + - **Status:** Low likelihood, but worth checking. + +The implementer MUST use TDD red-first: add a focused test that reproduces the error with minimal setup, then trace the call chain to find the actual `.get(None)` call. The audit above is a starting point, not a definitive diagnosis. + +--- + +## 2. Goals + +### 2.1 Functional Goals + +| ID | Goal | Acceptance Criterion | +|---|---|---| +| **G1** | Investigate the RAG sync NoneType.get error | A focused regression test reproduces the error with `rag_enabled=True` + `rag_source='mock'` setup | +| **G2** | Fix the underlying bug | The 3 RAG tests pass after the fix; no regression in the 12 RAG-related tests that already pass | +| **G3** | Add a defensive guard or proper error message | If a config field is unexpectedly None, the error message identifies WHICH field is None (so future debug is easier) | +| **G4** | Update `docs/guide_rag.md` to document the fix | The relevant guide has a "Known issues" or "Troubleshooting" section if appropriate | + +### 2.2 Non-Functional Goals + +| ID | Goal | Acceptance Criterion | +|---|---|---| +| **NF1** | Zero new regressions | `uv run pytest tests/` shows 3 fewer failures than pre-track baseline; no new failures | +| **NF2** | Per-task atomic commits | 1-3 atomic commits with clear messages | +| **NF3** | 1-space indentation, no comments, type hints preserved | `uv run python -c "import ast; ast.parse(open('src/app_controller.py').read())"` succeeds | +| **NF4** | Per-commit git notes | All commits have git notes summarizing the fix | + +--- + +## 3. Per-File Design + +### 3.1 Investigation: Reproduce the error in isolation + +The first task is a TDD red. The implementer should write a test that reproduces the error with minimal setup. + +**Recommended test file:** `tests/test_rag_sync_none_error.py` (new file) + +**The test pattern:** +```python +def test_rag_sync_does_not_fail_with_none_error(controller_with_rag_enabled): + # controller_with_rag_enabled: a fixture that: + # - Creates an AppController + # - Sets rag_enabled=True, rag_source='mock', files_base_dir=tmp_path + # - Submits the sync + # - Waits for the sync to complete (poll _rag_sync_dirty or rag_status) + status = controller.rag_status + assert "error" not in status, f"RAG sync failed unexpectedly: {status}" + # OR + assert status == "ready", f"Expected 'ready', got: {status}" +``` + +**The diagnostic step:** +1. Run the test; capture the full error message +2. Add a `sys.stderr.write` traceback capture in the except clause at `src/app_controller.py:1479` +3. Find the actual line where the `.get()` is called on None +4. **Document the root cause** in the commit message (so the fix is traceable) + +### 3.2 The fix + +The fix depends on what the investigation finds. Three likely scenarios: + +**Scenario A: A config field is None** (most likely) +- **Example:** If `self.rag_config.embedding_provider` is somehow `None` when the setter for `rag_source` is called, the engine init would fail. +- **Fix:** Add a guard in the setter: `if not self.rag_config: return` and a fallback in the engine init: `if self.config.embedding_provider is None: raise ValueError("embedding_provider must be set before rag_enabled")`. +- **Files affected:** `src/rag_engine.py`, possibly `src/app_controller.py` + +**Scenario B: A dict access is failing on a ChromaDB response** +- **Example:** `_validate_collection_dim_result` line 149: `embeddings = res.get("embeddings") if isinstance(res, dict) else None`. If chromadb returns a different object type, the `.get()` is skipped (None is returned) but the call downstream may fail. +- **Fix:** Add more defensive guards or correct the type check. +- **Files affected:** `src/rag_engine.py` + +**Scenario C: A side effect of a previous test (subprocess state pollution)** +- **Example:** A prior test in the live_gui subprocess left the RAG config in a bad state. +- **Fix:** Reset the RAG config in the test's `setup` or use `live_gui.reset_session()`. +- **Files affected:** The test (no production code change) + +**The implementer MUST** follow the TDD protocol: write the reproducing test, run it, observe the failure, trace the root cause, fix it, run the test again, verify all 3 RAG tests pass. + +### 3.3 Test verification + +After the fix: +- The 3 RAG tests pass in isolation +- The 3 RAG tests pass in batched run (`scripts/run_tests_batched.py`) +- The full test suite has 1285 pass (was 1282) + 4 skip + 0 fail (was 3) +- No regression in `test_rag_engine.py` (9+ tests), `test_rag_engine_result.py`, `test_rag_engine_ready_status_bug.py`, `test_rag_gui_presence.py`, `test_rag_integration.py`, `test_sync_rag_engine_coalescing.py`, `test_rag_phase4_stress.py` (after the fix) + +### 3.4 Documentation + +Update `docs/guide_rag.md` (if it exists; check first) with: +- A short note about the fix (1 paragraph) +- A troubleshooting entry if the error is likely to recur: "If `rag_status` shows `'NoneType' object has no attribute 'get'`, check that `rag_config.embedding_provider` is set before `rag_enabled`." + +If `docs/guide_rag.md` does not exist, no new doc is needed (the per-source-file guide is the wrong place for this; the test file's docstring or the commit message is sufficient). + +--- + +## 4. Architecture Reference + +### 4.1 The RAG sync pipeline + +The RAG sync is initiated when any of the RAG-related setters is called (`rag_enabled`, `rag_source`, `rag_emb_provider`, `rag_chunk_size`, `rag_chunk_overlap`, etc.): + +``` +[Set rag_* property] -> [setter calls _sync_rag_engine()] -> [token + dirty flag update] + | + v + [submit_io(_do_rag_sync(token))] -> [IO pool worker] + | + v + [_do_rag_sync body] + | + v + [RAGEngine(config, base_dir) construction] + | + v + [if engine.is_empty() and self.files -> _rebuild_rag_index()] + | + v + [set _set_rag_status("ready" | "error: ...")] +``` + +### 4.2 The mock branch + +The `RAGConfig().vector_store.provider` defaults to `'mock'`. When the engine init hits this branch: + +```python +elif vs_config.provider == 'mock': + self.client = "mock" + self.collection = "mock" + return Result(data=None) +``` + +The engine is "empty" (`is_empty()` returns `True` for mock). `_rebuild_rag_index` is NOT called. The status should be "ready" immediately. + +### 4.3 The coalescing pattern + +The `token + dirty flag` pattern in `_sync_rag_engine` ensures that N rapid setter calls produce ONE sync, not N parallel syncs. This is the pattern from `test_infrastructure_hardening_20260609` track. The token check at line 1463 short-circuits superseded syncs. + +### 4.4 The status update mechanism + +`self._set_rag_status(status)` appends a task to `_pending_gui_tasks`. The GUI render loop processes the queue and updates the `rag_status` field. The test polls `client.get_value('rag_status')` to wait for the update. + +--- + +## 5. Test Plan + +### 5.1 Per-phase test verification + +| Phase | Test command | Expected | +|---|---|---| +| 1 | `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase1_red.log` | 3/3 fail with the NoneType.get error | +| 2 | (after fix) `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase2_green.log` | 3/3 pass | +| 3 | (full suite) `uv run pytest tests/ 2>&1 \| tee tests/artifacts/rag_track_phase3_full.log` | 1285 pass + 4 skip + 0 fail | +| 4 | (batched) `uv run .\scripts\run_tests_batched.py 2>&1 \| tee tests/artifacts/rag_track_phase4_batched.log` | All tiers PASS; no failures | + +### 5.2 TDD red verification + +For each new test or fix: +1. Verify the test FAILS as expected (red phase) +2. Implement the fix +3. Verify the test PASSES (green phase) +4. Verify no regression in the previously-passing tests +5. Commit + +**Anti-pattern guard:** per `AGENTS.md` "Critical Anti-Patterns", no skipping tests just because they fail. The 3 RAG tests are the actual problem to solve; the implementer must find and fix the root cause. + +### 5.3 The diagnostic strategy + +If the implementer can't find the bug from the error message alone: +1. Add `import traceback; sys.stderr.write(traceback.format_exc())` to the except clause in `src/app_controller.py:1479-1482` +2. Run the test; capture the full traceback +3. Find the actual `.get(None)` call +4. **Document the traceback in the commit message** (so the fix is traceable) +5. Remove the diag traceback after the fix is verified + +--- + +## 6. Migration Strategy + +This is a small bug-fix track. The phases are simple: + +1. **Phase 1: Investigation + reproducing test (1-2 hours)** +2. **Phase 2: Fix (1-3 hours)** +3. **Phase 3: Full test suite + batched verification (30 min)** +4. **Phase 4: Docs update (15 min)** +5. **Phase 5: Metadata + tracks.md (15 min)** + +The order doesn't matter much (it's all one fix); the implementer can iterate between Phase 1 and 2 as needed. + +--- + +## 7. Out of Scope + +### 7.1 Deferred to separate tracks + +| ID | Item | Defer to | Why | +|---|---|---|---| +| OOS1 | The `send_result` → `send` mass rename (user's stated intent) | User's manual refactor after this track | The user wants to do this themselves. The Result API is stable; only the function name changes. | +| OOS2 | 23 lower-impact files with weak types (per `data_structure_strengthening_20260606/spec.md` §1 line 20) | `data_structure_strengthening_20260606` (the next major track) | That's the data_structure track's scope. | +| OOS3 | `live_gui_mock_injection_20260615` infrastructure | Separate infrastructure track | Not blocking. Recommended but not required. | +| OOS4 | The full RAG test cleanup (e.g., removing `time.sleep(0.5)` patterns in favor of poll loops) | Separate RAG test quality track | The tests are functional; this is a test-quality improvement, not a bug fix. | +| OOS5 | The Gemini CLI thinking-format path | Defer to `doeh_test_thinking_cleanup_20260615` follow-up | Not in this track's scope. | +| OOS6 | The `RAGConfig` data structure improvements (e.g., nested validation) | `data_structure_strengthening_20260606` | Not blocking the bug fix. | + +### 7.2 Explicitly NOT in this track + +- The user wants to do a `send_result` → `send` mass rename after this track. **Do not** do it in this track. The bug fix is for RAG only. +- A general RAG test quality cleanup (poll loops, error message improvements, etc.) — out of scope; only fix the specific bug. +- The `_rebuild_rag_index` method's complex error handling — out of scope; only fix the specific bug. + +--- + +## 8. Risks & Mitigations + +| ID | Risk | Likelihood | Impact | Mitigation | +|---|---|---|---|---| +| **R1** | The fix breaks an unrelated test | Low | Medium | Run the full test suite in Phase 3 + the batched test in Phase 4. If a new failure appears, STOP and report. | +| **R2** | The bug is in a hard-to-reach code path (deep in IO pool worker) | Medium | Medium | Add diagnostic traceback in the except clause; capture the actual error site; document in the commit message. | +| **R3** | The fix is in the test (subprocess state pollution) not the production code | Low | Low | If the fix is in the test, document this in the commit message. Consider adding a teardown reset in the test. | +| **R4** | The fix introduces a regression in `test_rag_engine_ready_status_bug.py` | Low | Medium | Run the full RAG test suite after the fix. | +| **R5** | The implementation takes longer than estimated (1 day) | Low | Low | This is a 0.5-1 day track; even 2 days is acceptable. The user's overall plan is to do 2 more tracks (this + a `send_result` → `send` rename) before the data structure track. | + +--- + +## 9. Verification Criteria (definition of "done") + +The track is DONE when **ALL** of the following are true: + +1. **G1: A reproducing test exists** that fails before the fix +2. **G2: All 3 RAG tests pass** (test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim) +3. **G3: A defensive guard or proper error message** is added (so future debug is easier) +4. **G4: docs/guide_rag.md** updated (if it exists) +5. **NF1: No new regressions** in the full test suite (1285 pass + 4 skip + 0 fail) +6. **NF2: Per-task atomic commits** (1-3 commits total) +7. **NF3: 1-space indentation + no comments + type hints preserved** +8. **NF4: Per-commit git notes** attached + +**Test count math:** +- Pre-track baseline: 1282 pass + 4 skip + 3 fail +- After this track: 1285 pass + 4 skip + 0 fail (3 newly-passing) +- This is the FIRST time the project is fully green since `data_oriented_error_handling_20260606` shipped on 2026-06-12. + +--- + +## 10. Execution Order & Dependencies + +**No external blockers.** This track can start immediately after the Tier 1 review approves the spec. + +**Execution order (the plan):** +1. Phase 1: Investigation + reproducing test (1-2 hours) +2. Phase 2: Fix (1-3 hours) +3. Phase 3: Full test suite + batched verification (30 min) +4. Phase 4: Docs update (15 min) +5. Phase 5: Metadata + tracks.md (15 min) + +**Total:** 0.5-1 day Tier 2 work (4-8 hours) + +**Followed by:** the user can do the `send_result` → `send` mass rename themselves, then start `data_structure_strengthening_20260606` track. + +--- + +## 11. References + +### Architecture docs +- `docs/guide_rag.md` (if it exists) — RAG subsystem architecture +- `docs/guide_app_controller.md` — the `AppController._do_rag_sync` method is the entry point +- `docs/guide_testing.md` — `live_gui` fixture + structural testing contract + +### Styleguides +- `conductor/code_styleguides/error_handling.md` — `Result[T]` pattern (used by `RAGEngine._init_vector_store_result`) +- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference + +### Source code (the relevant lines) +- `src/app_controller.py:1451-1488` — `_sync_rag_engine` and `_do_rag_sync` (the entry points) +- `src/app_controller.py:1490-1497` — `rag_enabled` property + setter (triggers the sync) +- `src/app_controller.py:3016-3023` — `_set_rag_status` (sets the error status) +- `src/app_controller.py:3025-3056` — `_rebuild_rag_index` (the second worker) +- `src/rag_engine.py:88-128` — `RAGEngine.__init__` and `_init_vector_store_result` +- `src/rag_engine.py:130-166` — `_validate_collection_dim_result` (the most likely `.get()` call site) +- `src/models.py:1039-1065` — `RAGConfig` and `VectorStoreConfig` + +### Parent tracks +- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §12.1 — the follow-up scope that included RAG fixes +- `conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md` — the parent track that documented 4 RAG failures remaining (1 was inadvertently fixed) +- `docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md` §3 deviation #2.3 — the `test_rag_integration.py` fix (commit 26e1b652) + +### Test files (the 3 to fix) +- `tests/test_rag_phase4_final_verify.py::test_phase4_final_verify` (tier-3 live_gui) +- `tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim` (tier-3 live_gui) +- `tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim` (tier-3 live_gui) + +### Already-passing RAG tests (do NOT regress) +- `tests/test_rag_engine.py` (8+ tests) +- `tests/test_rag_engine_result.py` (3+ tests) +- `tests/test_rag_engine_ready_status_bug.py` (3+ tests) +- `tests/test_rag_gui_presence.py` (2 tests) +- `tests/test_rag_integration.py::test_rag_integration` (1 test; was failing pre-public_api, fixed by commit 26e1b652) +- `tests/test_sync_rag_engine_coalescing.py` (4+ tests) + +### User's stated intent (after this track) +- `send_result` → `send` mass rename (user will do manually) +- Then `data_structure_strengthening_20260606` track