Private
Public Access
0
0

conductor(track): spec for rag_test_failures_20260615 (3 RAG test fixes, single root cause)

This commit is contained in:
2026-06-15 21:51:11 -04:00
parent bc388f11bb
commit 006df67637
@@ -0,0 +1,386 @@
# Track Specification: RAG Test Failures Fix
**Track ID:** `rag_test_failures_20260615`
**Status:** Active (spec approved 2026-06-15)
**Priority:** A (foundational; precedes `data_structure_strengthening_20260606` and the user's planned `send_result``send` mass rename)
**Owner:** Tier 2 Tech Lead
**Type:** bugfix + test_fix
**Estimated effort:** 0.5-1 day Tier 2 work (4-8 hours)
**Parent tracks:** `data_oriented_error_handling_20260606` (shipped 2026-06-12), `ai_loop_regressions_20260614` (shipped 2026-06-15), `doeh_test_thinking_cleanup_20260615` (shipped 2026-06-15), `public_api_migration_and_ui_polish_20260615` (shipped 2026-06-15)
---
## 0. TL;DR
A small, focused bug-fix track that resolves the **3 remaining pre-existing test failures** (not 4 as the parent track documented — `test_rag_integration.py` was inadvertently fixed by the public_api migration's Phase 2 follow-up, commit `26e1b652`).
**All 3 failures share the same root cause:** the RAG sync worker at `src/app_controller.py:_do_rag_sync` catches an exception during the `RAGEngine` construction or subsequent config lookup, and the error message is `"'NoneType' object has no attribute 'get'"`. This is a specific Python error pattern indicating a `dict.get()` call is being made on a `None` value somewhere in the RAG setup path.
**Result:** all 1285 tests pass (1282 + 3 RAG fixed). The project reaches a fully-green baseline for the first time since the `data_oriented_error_handling_20260606` track shipped on 2026-06-12. The user can then proceed with the planned `send_result``send` mass rename and the `data_structure_strengthening_20260606` track.
---
## 1. Overview
### 1.1 Current State (as of 2026-06-15)
After the `public_api_migration_and_ui_polish_20260615` track completed:
- **1282 tests pass** (was 1280 pre-track; 7 newly-passing in the run, 13 fixed total per the completion report)
- **4 tests skipped** (unchanged)
- **3 tests fail** (was 10 pre-track; down from 4 RAG failures because `test_rag_integration.py::test_rag_integration` is now passing)
The 3 remaining failures are all RAG subsystem tests in tier-3 (live_gui):
| Test | Tier | File | Failure point |
|---|---|---|---|
| `test_rag_phase4_final_verify::test_phase4_final_verify` | tier-3 (live_gui) | `tests/test_rag_phase4_final_verify.py` | Line 65 (after `rag_enabled=True` + wait for `rag_status == 'ready'`) |
| `test_rag_phase4_stress::test_rag_large_codebase_verification_sim` | tier-3 (live_gui) | `tests/test_rag_phase4_stress.py` | Line 48 (same pattern) |
| `test_rag_visual_sim::test_rag_full_lifecycle_sim` | tier-3 (live_gui) | `tests/test_rag_visual_sim.py` | Line 32 (initial status check after `rag_enabled=True`) |
All 3 fail with the **same error message** captured in `rag_status`: `"error: 'NoneType' object has no attribute 'get'"`. The error originates in `src/app_controller.py:_do_rag_sync` (line 1479-1482):
```python
except Exception as e:
self._set_rag_status(f"error: {e}")
sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")
sys.stderr.flush()
```
### 1.2 Gaps to Fill (this Track's Scope)
| Gap | Count | Spec Section |
|---|---|---|
| Investigate the RAG sync NoneType.get error | 1 investigation | §3.1 |
| Fix the underlying bug in `src/app_controller.py` and/or `src/rag_engine.py` | 1-3 code changes | §3.2 |
| Verify the 3 RAG tests pass | 3 test fixes | §3.3 |
### 1.3 Already Implemented (DO NOT re-implement)
Verified by code audit (2026-06-15):
- **`RAGConfig` default** (`src/models.py:1039-1065`) — has `vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock'))`; the default is NOT `None`. Confirmed by direct instantiation: `RAGConfig().vector_store.provider == 'mock'`.
- **`RAGEngine.__init__` with `vector_store.provider='mock'`** — succeeds; `is_empty()` returns `True`; no further sync work is triggered (mock branch at `src/rag_engine.py:123-126`).
- **`_do_rag_sync` coalescing** — the `token + dirty flag` pattern prevents N parallel syncs; works correctly (per `test_infrastructure_hardening_20260609` track).
- **`_init_vector_store_result` mock branch** — sets `self.client = "mock"` and `self.collection = "mock"`; `is_empty()` and `add_documents()` both check for this and return early.
- **`test_rag_integration.py::test_rag_integration`** — already PASSES (fixed incidentally by `public_api_migration_and_ui_polish_20260615` Phase 2 follow-up commit `26e1b652`).
### 1.4 Investigation Clues
The error pattern `"'NoneType' object has no attribute 'get'"` is a specific Python error indicating a `dict.get()` call on a `None` value. The most likely candidates in the RAG sync path:
1. **`src/app_controller.py:1469``engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root)`** — if `self.active_project_root` is `None` or the `RAGConfig` has a `None` sub-field.
- **Status:** `active_project_root` is a property that returns `str(Path(self.active_project_path).parent)` or `self.ui_files_base_dir`. The test sets `files_base_dir` to a valid path.
- **Status:** `RAGConfig()` default has all required fields populated.
2. **`src/rag_engine.py:89-101``RAGEngine.__init__`** — calls `_init_embedding_provider()` and `_init_vector_store_result()`. With `vector_store.provider='mock'`, the latter should return `Result(data=None)` (success).
- **Status:** Verified by direct instantiation: the engine constructs successfully.
3. **`src/rag_engine.py:111-128``_init_vector_store_result`** — the `'chroma'` branch calls `_validate_collection_dim_result()` (line 122) which calls `self.collection.get(limit=1, include=["embeddings"])` (line 146) then `res.get("embeddings")` (line 149). If `self.collection` is set but the chromadb call returns a non-dict (e.g. a `Result` object), `.get()` would fail with NoneType.
- **Status:** This is the most likely candidate. The `is_empty()` and `add_documents()` short-circuit on the mock string, but the `_init_vector_store_result` for the `'mock'` branch returns immediately with `Result(data=None)` (line 126) — so the chromadb validation is skipped. So this isn't the bug for the 'mock' case.
- **Status:** For the 'chroma' case (test_rag_phase4_stress uses 'chroma'), the validation runs. If `self.embedding_provider.embed(["__rag_dim_check__"])` fails (e.g. due to gemini client not being initialized in the test subprocess), the error could be different. But the test_rag_phase4_stress uses `rag_emb_provider='local'` which depends on `sentence_transformers`.
4. **`src/app_controller.py:230``controller.rag_engine and controller.rag_config and controller.rag_config.enabled`** — this is the entry check; if any of these is None, the sync is skipped.
- **Status:** `self.rag_config` is set in `__init__` (line 1830-1831) and reset in `reset_session` (line 3387). Should never be None after init.
5. **A more subtle cause:** the `submit_io` lambda in `src/app_controller.py:1457` (`self.submit_io(lambda: self._do_rag_sync(token))`) submits a lambda. If the IO pool is shared with the user-agent / MMA comms callbacks, an unrelated exception in a different task could leak into the RAG status.
- **Status:** Low likelihood, but worth checking.
The implementer MUST use TDD red-first: add a focused test that reproduces the error with minimal setup, then trace the call chain to find the actual `.get(None)` call. The audit above is a starting point, not a definitive diagnosis.
---
## 2. Goals
### 2.1 Functional Goals
| ID | Goal | Acceptance Criterion |
|---|---|---|
| **G1** | Investigate the RAG sync NoneType.get error | A focused regression test reproduces the error with `rag_enabled=True` + `rag_source='mock'` setup |
| **G2** | Fix the underlying bug | The 3 RAG tests pass after the fix; no regression in the 12 RAG-related tests that already pass |
| **G3** | Add a defensive guard or proper error message | If a config field is unexpectedly None, the error message identifies WHICH field is None (so future debug is easier) |
| **G4** | Update `docs/guide_rag.md` to document the fix | The relevant guide has a "Known issues" or "Troubleshooting" section if appropriate |
### 2.2 Non-Functional Goals
| ID | Goal | Acceptance Criterion |
|---|---|---|
| **NF1** | Zero new regressions | `uv run pytest tests/` shows 3 fewer failures than pre-track baseline; no new failures |
| **NF2** | Per-task atomic commits | 1-3 atomic commits with clear messages |
| **NF3** | 1-space indentation, no comments, type hints preserved | `uv run python -c "import ast; ast.parse(open('src/app_controller.py').read())"` succeeds |
| **NF4** | Per-commit git notes | All commits have git notes summarizing the fix |
---
## 3. Per-File Design
### 3.1 Investigation: Reproduce the error in isolation
The first task is a TDD red. The implementer should write a test that reproduces the error with minimal setup.
**Recommended test file:** `tests/test_rag_sync_none_error.py` (new file)
**The test pattern:**
```python
def test_rag_sync_does_not_fail_with_none_error(controller_with_rag_enabled):
# controller_with_rag_enabled: a fixture that:
# - Creates an AppController
# - Sets rag_enabled=True, rag_source='mock', files_base_dir=tmp_path
# - Submits the sync
# - Waits for the sync to complete (poll _rag_sync_dirty or rag_status)
status = controller.rag_status
assert "error" not in status, f"RAG sync failed unexpectedly: {status}"
# OR
assert status == "ready", f"Expected 'ready', got: {status}"
```
**The diagnostic step:**
1. Run the test; capture the full error message
2. Add a `sys.stderr.write` traceback capture in the except clause at `src/app_controller.py:1479`
3. Find the actual line where the `.get()` is called on None
4. **Document the root cause** in the commit message (so the fix is traceable)
### 3.2 The fix
The fix depends on what the investigation finds. Three likely scenarios:
**Scenario A: A config field is None** (most likely)
- **Example:** If `self.rag_config.embedding_provider` is somehow `None` when the setter for `rag_source` is called, the engine init would fail.
- **Fix:** Add a guard in the setter: `if not self.rag_config: return` and a fallback in the engine init: `if self.config.embedding_provider is None: raise ValueError("embedding_provider must be set before rag_enabled")`.
- **Files affected:** `src/rag_engine.py`, possibly `src/app_controller.py`
**Scenario B: A dict access is failing on a ChromaDB response**
- **Example:** `_validate_collection_dim_result` line 149: `embeddings = res.get("embeddings") if isinstance(res, dict) else None`. If chromadb returns a different object type, the `.get()` is skipped (None is returned) but the call downstream may fail.
- **Fix:** Add more defensive guards or correct the type check.
- **Files affected:** `src/rag_engine.py`
**Scenario C: A side effect of a previous test (subprocess state pollution)**
- **Example:** A prior test in the live_gui subprocess left the RAG config in a bad state.
- **Fix:** Reset the RAG config in the test's `setup` or use `live_gui.reset_session()`.
- **Files affected:** The test (no production code change)
**The implementer MUST** follow the TDD protocol: write the reproducing test, run it, observe the failure, trace the root cause, fix it, run the test again, verify all 3 RAG tests pass.
### 3.3 Test verification
After the fix:
- The 3 RAG tests pass in isolation
- The 3 RAG tests pass in batched run (`scripts/run_tests_batched.py`)
- The full test suite has 1285 pass (was 1282) + 4 skip + 0 fail (was 3)
- No regression in `test_rag_engine.py` (9+ tests), `test_rag_engine_result.py`, `test_rag_engine_ready_status_bug.py`, `test_rag_gui_presence.py`, `test_rag_integration.py`, `test_sync_rag_engine_coalescing.py`, `test_rag_phase4_stress.py` (after the fix)
### 3.4 Documentation
Update `docs/guide_rag.md` (if it exists; check first) with:
- A short note about the fix (1 paragraph)
- A troubleshooting entry if the error is likely to recur: "If `rag_status` shows `'NoneType' object has no attribute 'get'`, check that `rag_config.embedding_provider` is set before `rag_enabled`."
If `docs/guide_rag.md` does not exist, no new doc is needed (the per-source-file guide is the wrong place for this; the test file's docstring or the commit message is sufficient).
---
## 4. Architecture Reference
### 4.1 The RAG sync pipeline
The RAG sync is initiated when any of the RAG-related setters is called (`rag_enabled`, `rag_source`, `rag_emb_provider`, `rag_chunk_size`, `rag_chunk_overlap`, etc.):
```
[Set rag_* property] -> [setter calls _sync_rag_engine()] -> [token + dirty flag update]
|
v
[submit_io(_do_rag_sync(token))] -> [IO pool worker]
|
v
[_do_rag_sync body]
|
v
[RAGEngine(config, base_dir) construction]
|
v
[if engine.is_empty() and self.files -> _rebuild_rag_index()]
|
v
[set _set_rag_status("ready" | "error: ...")]
```
### 4.2 The mock branch
The `RAGConfig().vector_store.provider` defaults to `'mock'`. When the engine init hits this branch:
```python
elif vs_config.provider == 'mock':
self.client = "mock"
self.collection = "mock"
return Result(data=None)
```
The engine is "empty" (`is_empty()` returns `True` for mock). `_rebuild_rag_index` is NOT called. The status should be "ready" immediately.
### 4.3 The coalescing pattern
The `token + dirty flag` pattern in `_sync_rag_engine` ensures that N rapid setter calls produce ONE sync, not N parallel syncs. This is the pattern from `test_infrastructure_hardening_20260609` track. The token check at line 1463 short-circuits superseded syncs.
### 4.4 The status update mechanism
`self._set_rag_status(status)` appends a task to `_pending_gui_tasks`. The GUI render loop processes the queue and updates the `rag_status` field. The test polls `client.get_value('rag_status')` to wait for the update.
---
## 5. Test Plan
### 5.1 Per-phase test verification
| Phase | Test command | Expected |
|---|---|---|
| 1 | `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase1_red.log` | 3/3 fail with the NoneType.get error |
| 2 | (after fix) `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase2_green.log` | 3/3 pass |
| 3 | (full suite) `uv run pytest tests/ 2>&1 \| tee tests/artifacts/rag_track_phase3_full.log` | 1285 pass + 4 skip + 0 fail |
| 4 | (batched) `uv run .\scripts\run_tests_batched.py 2>&1 \| tee tests/artifacts/rag_track_phase4_batched.log` | All tiers PASS; no failures |
### 5.2 TDD red verification
For each new test or fix:
1. Verify the test FAILS as expected (red phase)
2. Implement the fix
3. Verify the test PASSES (green phase)
4. Verify no regression in the previously-passing tests
5. Commit
**Anti-pattern guard:** per `AGENTS.md` "Critical Anti-Patterns", no skipping tests just because they fail. The 3 RAG tests are the actual problem to solve; the implementer must find and fix the root cause.
### 5.3 The diagnostic strategy
If the implementer can't find the bug from the error message alone:
1. Add `import traceback; sys.stderr.write(traceback.format_exc())` to the except clause in `src/app_controller.py:1479-1482`
2. Run the test; capture the full traceback
3. Find the actual `.get(None)` call
4. **Document the traceback in the commit message** (so the fix is traceable)
5. Remove the diag traceback after the fix is verified
---
## 6. Migration Strategy
This is a small bug-fix track. The phases are simple:
1. **Phase 1: Investigation + reproducing test (1-2 hours)**
2. **Phase 2: Fix (1-3 hours)**
3. **Phase 3: Full test suite + batched verification (30 min)**
4. **Phase 4: Docs update (15 min)**
5. **Phase 5: Metadata + tracks.md (15 min)**
The order doesn't matter much (it's all one fix); the implementer can iterate between Phase 1 and 2 as needed.
---
## 7. Out of Scope
### 7.1 Deferred to separate tracks
| ID | Item | Defer to | Why |
|---|---|---|---|
| OOS1 | The `send_result``send` mass rename (user's stated intent) | User's manual refactor after this track | The user wants to do this themselves. The Result API is stable; only the function name changes. |
| OOS2 | 23 lower-impact files with weak types (per `data_structure_strengthening_20260606/spec.md` §1 line 20) | `data_structure_strengthening_20260606` (the next major track) | That's the data_structure track's scope. |
| OOS3 | `live_gui_mock_injection_20260615` infrastructure | Separate infrastructure track | Not blocking. Recommended but not required. |
| OOS4 | The full RAG test cleanup (e.g., removing `time.sleep(0.5)` patterns in favor of poll loops) | Separate RAG test quality track | The tests are functional; this is a test-quality improvement, not a bug fix. |
| OOS5 | The Gemini CLI thinking-format path | Defer to `doeh_test_thinking_cleanup_20260615` follow-up | Not in this track's scope. |
| OOS6 | The `RAGConfig` data structure improvements (e.g., nested validation) | `data_structure_strengthening_20260606` | Not blocking the bug fix. |
### 7.2 Explicitly NOT in this track
- The user wants to do a `send_result``send` mass rename after this track. **Do not** do it in this track. The bug fix is for RAG only.
- A general RAG test quality cleanup (poll loops, error message improvements, etc.) — out of scope; only fix the specific bug.
- The `_rebuild_rag_index` method's complex error handling — out of scope; only fix the specific bug.
---
## 8. Risks & Mitigations
| ID | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| **R1** | The fix breaks an unrelated test | Low | Medium | Run the full test suite in Phase 3 + the batched test in Phase 4. If a new failure appears, STOP and report. |
| **R2** | The bug is in a hard-to-reach code path (deep in IO pool worker) | Medium | Medium | Add diagnostic traceback in the except clause; capture the actual error site; document in the commit message. |
| **R3** | The fix is in the test (subprocess state pollution) not the production code | Low | Low | If the fix is in the test, document this in the commit message. Consider adding a teardown reset in the test. |
| **R4** | The fix introduces a regression in `test_rag_engine_ready_status_bug.py` | Low | Medium | Run the full RAG test suite after the fix. |
| **R5** | The implementation takes longer than estimated (1 day) | Low | Low | This is a 0.5-1 day track; even 2 days is acceptable. The user's overall plan is to do 2 more tracks (this + a `send_result``send` rename) before the data structure track. |
---
## 9. Verification Criteria (definition of "done")
The track is DONE when **ALL** of the following are true:
1. **G1: A reproducing test exists** that fails before the fix
2. **G2: All 3 RAG tests pass** (test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim)
3. **G3: A defensive guard or proper error message** is added (so future debug is easier)
4. **G4: docs/guide_rag.md** updated (if it exists)
5. **NF1: No new regressions** in the full test suite (1285 pass + 4 skip + 0 fail)
6. **NF2: Per-task atomic commits** (1-3 commits total)
7. **NF3: 1-space indentation + no comments + type hints preserved**
8. **NF4: Per-commit git notes** attached
**Test count math:**
- Pre-track baseline: 1282 pass + 4 skip + 3 fail
- After this track: 1285 pass + 4 skip + 0 fail (3 newly-passing)
- This is the FIRST time the project is fully green since `data_oriented_error_handling_20260606` shipped on 2026-06-12.
---
## 10. Execution Order & Dependencies
**No external blockers.** This track can start immediately after the Tier 1 review approves the spec.
**Execution order (the plan):**
1. Phase 1: Investigation + reproducing test (1-2 hours)
2. Phase 2: Fix (1-3 hours)
3. Phase 3: Full test suite + batched verification (30 min)
4. Phase 4: Docs update (15 min)
5. Phase 5: Metadata + tracks.md (15 min)
**Total:** 0.5-1 day Tier 2 work (4-8 hours)
**Followed by:** the user can do the `send_result``send` mass rename themselves, then start `data_structure_strengthening_20260606` track.
---
## 11. References
### Architecture docs
- `docs/guide_rag.md` (if it exists) — RAG subsystem architecture
- `docs/guide_app_controller.md` — the `AppController._do_rag_sync` method is the entry point
- `docs/guide_testing.md``live_gui` fixture + structural testing contract
### Styleguides
- `conductor/code_styleguides/error_handling.md``Result[T]` pattern (used by `RAGEngine._init_vector_store_result`)
- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference
### Source code (the relevant lines)
- `src/app_controller.py:1451-1488``_sync_rag_engine` and `_do_rag_sync` (the entry points)
- `src/app_controller.py:1490-1497``rag_enabled` property + setter (triggers the sync)
- `src/app_controller.py:3016-3023``_set_rag_status` (sets the error status)
- `src/app_controller.py:3025-3056``_rebuild_rag_index` (the second worker)
- `src/rag_engine.py:88-128``RAGEngine.__init__` and `_init_vector_store_result`
- `src/rag_engine.py:130-166``_validate_collection_dim_result` (the most likely `.get()` call site)
- `src/models.py:1039-1065``RAGConfig` and `VectorStoreConfig`
### Parent tracks
- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §12.1 — the follow-up scope that included RAG fixes
- `conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md` — the parent track that documented 4 RAG failures remaining (1 was inadvertently fixed)
- `docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md` §3 deviation #2.3 — the `test_rag_integration.py` fix (commit 26e1b652)
### Test files (the 3 to fix)
- `tests/test_rag_phase4_final_verify.py::test_phase4_final_verify` (tier-3 live_gui)
- `tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim` (tier-3 live_gui)
- `tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim` (tier-3 live_gui)
### Already-passing RAG tests (do NOT regress)
- `tests/test_rag_engine.py` (8+ tests)
- `tests/test_rag_engine_result.py` (3+ tests)
- `tests/test_rag_engine_ready_status_bug.py` (3+ tests)
- `tests/test_rag_gui_presence.py` (2 tests)
- `tests/test_rag_integration.py::test_rag_integration` (1 test; was failing pre-public_api, fixed by commit 26e1b652)
- `tests/test_sync_rag_engine_coalescing.py` (4+ tests)
### User's stated intent (after this track)
- `send_result``send` mass rename (user will do manually)
- Then `data_structure_strengthening_20260606` track