Track Specification: RAG Test Failures Fix
Track ID: rag_test_failures_20260615
Status: Active (spec approved 2026-06-15)
Priority: A (foundational; precedes data_structure_strengthening_20260606 and the user's planned send_result → send mass rename)
Owner: Tier 2 Tech Lead
Type: bugfix + test_fix
Scope: 3 test failures (tier-3 live_gui RAG tests) + 1 production bug in 2 lines + 3 new unit tests
Parent tracks: data_oriented_error_handling_20260606 (shipped 2026-06-12), ai_loop_regressions_20260614 (shipped 2026-06-15), doeh_test_thinking_cleanup_20260615 (shipped 2026-06-15), public_api_migration_and_ui_polish_20260615 (shipped 2026-06-15)
0. TL;DR
A small, focused bug-fix track that resolves the 3 remaining pre-existing test failures (not 4 as the parent track documented — test_rag_integration.py was inadvertently fixed by the public_api migration's Phase 2 follow-up, commit 26e1b652).
All 3 failures appear to share the same root cause: the RAG sync worker at src/app_controller.py:_do_rag_sync catches an exception raised during RAGEngine construction or a subsequent config lookup, and reports it as "'NoneType' object has no attribute 'get'". Python raises this AttributeError when .get() is called on a value that is None, typically where a dict was expected, so some object in the RAG setup path is None when the code assumes it is populated.
Result: all 1285 tests pass (1282 + 3 RAG fixed). The project reaches a fully-green baseline for the first time since the data_oriented_error_handling_20260606 track shipped on 2026-06-12. The user can then proceed with the planned send_result → send mass rename and the data_structure_strengthening_20260606 track.
1. Overview
1.1 Current State (as of 2026-06-15)
After the public_api_migration_and_ui_polish_20260615 track completed:
- 1282 tests pass (was 1280 pre-track; 7 newly-passing in the run, 13 fixed total per the completion report)
- 4 tests skipped (unchanged)
- 3 tests fail (was 10 pre-track; down from 4 RAG failures because test_rag_integration.py::test_rag_integration is now passing)
The 3 remaining failures are all RAG subsystem tests in tier-3 (live_gui):
| Test | Tier | File | Failure point |
|---|---|---|---|
| test_rag_phase4_final_verify::test_phase4_final_verify | tier-3 (live_gui) | tests/test_rag_phase4_final_verify.py | Line 65 (after rag_enabled=True + wait for rag_status == 'ready') |
| test_rag_phase4_stress::test_rag_large_codebase_verification_sim | tier-3 (live_gui) | tests/test_rag_phase4_stress.py | Line 48 (same pattern) |
| test_rag_visual_sim::test_rag_full_lifecycle_sim | tier-3 (live_gui) | tests/test_rag_visual_sim.py | Line 32 (initial status check after rag_enabled=True) |
All 3 fail with the same error message captured in rag_status: "error: 'NoneType' object has no attribute 'get'". The message is written by the except clause in src/app_controller.py:_do_rag_sync (lines 1479-1482), which catches the exception raised further down the call chain:
except Exception as e:
 self._set_rag_status(f"error: {e}")
 sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")
 sys.stderr.flush()
1.2 Gaps to Fill (this Track's Scope)
| Gap | Count | Spec Section |
|---|---|---|
| Investigate the RAG sync NoneType.get error | 1 investigation | §3.1 |
| Fix the underlying bug in src/app_controller.py and/or src/rag_engine.py | 1-3 code changes | §3.2 |
| Verify the 3 RAG tests pass | 3 test fixes | §3.3 |
1.3 Already Implemented (DO NOT re-implement)
Verified by code audit (2026-06-15):
- RAGConfig default (src/models.py:1039-1065): has vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock')); the default is NOT None. Confirmed by direct instantiation: RAGConfig().vector_store.provider == 'mock'.
- RAGEngine.__init__ with vector_store.provider='mock': succeeds; is_empty() returns True; no further sync work is triggered (mock branch at src/rag_engine.py:123-126).
- _do_rag_sync coalescing: the token + dirty flag pattern prevents N parallel syncs; works correctly (per the test_infrastructure_hardening_20260609 track).
- _init_vector_store_result mock branch: sets self.client = "mock" and self.collection = "mock"; is_empty() and add_documents() both check for this and return early.
- test_rag_integration.py::test_rag_integration: already PASSES (fixed incidentally by the public_api_migration_and_ui_polish_20260615 Phase 2 follow-up, commit 26e1b652).
1.4 Investigation Clues
The error pattern "'NoneType' object has no attribute 'get'" is the AttributeError Python raises when .get() is called on a value that is None (most often where a dict was expected). The most likely candidates in the RAG sync path:
- src/app_controller.py:1469: engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root). This fails if self.active_project_root is None or the RAGConfig has a None sub-field.
  - Status: active_project_root is a property that returns str(Path(self.active_project_path).parent) or self.ui_files_base_dir. The test sets files_base_dir to a valid path.
  - Status: RAGConfig() default has all required fields populated.
- src/rag_engine.py:89-101: RAGEngine.__init__ calls _init_embedding_provider() and _init_vector_store_result(). With vector_store.provider='mock', the latter should return Result(data=None) (success).
  - Status: Verified by direct instantiation: the engine constructs successfully.
- src/rag_engine.py:111-128: in _init_vector_store_result, the 'chroma' branch calls _validate_collection_dim_result() (line 122), which calls self.collection.get(limit=1, include=["embeddings"]) (line 146) and then res.get("embeddings") (line 149). The NoneType error would appear here if self.collection is None at line 146, or if res is None at an unguarded .get(). A non-None, non-dict return (e.g. a Result object) would instead raise an AttributeError naming that type.
  - Status: This is the most likely candidate. is_empty() and add_documents() short-circuit on the mock string, and the 'mock' branch of _init_vector_store_result returns immediately with Result(data=None) (line 126), so the chromadb validation is skipped. This is therefore not the bug for the 'mock' case.
  - Status: For the 'chroma' case (test_rag_phase4_stress uses 'chroma'), the validation runs. If self.embedding_provider.embed(["__rag_dim_check__"]) fails (e.g. because the gemini client is not initialized in the test subprocess), the error could be different. However, test_rag_phase4_stress uses rag_emb_provider='local', which depends on sentence_transformers.
- src/app_controller.py:230: controller.rag_engine and controller.rag_config and controller.rag_config.enabled. This is the entry check; if any of these is None, the sync is skipped.
  - Status: self.rag_config is set in __init__ (lines 1830-1831) and reset in reset_session (line 3387). It should never be None after init.
- A more subtle cause: the submit_io call in src/app_controller.py:1457 (self.submit_io(lambda: self._do_rag_sync(token))) submits a lambda. If the IO pool is shared with the user-agent / MMA comms callbacks, an unrelated exception in a different task could leak into the RAG status.
  - Status: Low likelihood, but worth checking.
The implementer MUST use TDD red-first: add a focused test that reproduces the error with minimal setup, then trace the call chain to find the actual .get() call on a None value. The audit above is a starting point, not a definitive diagnosis.
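As a quick illustration of the error pattern, here is a standalone sketch (not code from the repository; describe_get_failure and FakeResult are hypothetical names). It shows that the message seen in rag_status appears only when the receiver of .get() is None itself. A non-None object without a .get method produces a message naming its own type, which helps rule candidates in or out.

```python
def describe_get_failure(value: object) -> str:
    """Call value.get("embeddings") and return the error text, or "ok" on success."""
    try:
        value.get("embeddings")  # same call shape as the suspected call site
    except AttributeError as exc:
        return str(exc)
    return "ok"


class FakeResult:
    """Hypothetical stand-in for a non-dict, non-None return value."""


# None as the receiver reproduces the exact message from rag_status.
print(describe_get_failure(None))
# A non-None object without .get() names its own type instead.
print(describe_get_failure(FakeResult()))
# A real dict succeeds.
print(describe_get_failure({"embeddings": []}))
```

If the captured message names NoneType, the search can focus on values that are None at the call site rather than on values of the wrong type.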
2. Goals
2.1 Functional Goals
| ID | Goal | Acceptance Criterion |
|---|---|---|
| G1 | Investigate the RAG sync NoneType.get error | A focused regression test reproduces the error with rag_enabled=True + rag_source='mock' setup |
| G2 | Fix the underlying bug | The 3 RAG tests pass after the fix; no regression in the 12 RAG-related tests that already pass |
| G3 | Add a defensive guard or proper error message | If a config field is unexpectedly None, the error message identifies WHICH field is None (so future debug is easier) |
| G4 | Update docs/guide_rag.md to document the fix | The relevant guide has a "Known issues" or "Troubleshooting" section if appropriate |
2.2 Non-Functional Goals
| ID | Goal | Acceptance Criterion |
|---|---|---|
| NF1 | Zero new regressions | uv run pytest tests/ shows 3 fewer failures than pre-track baseline; no new failures |
| NF2 | Per-task atomic commits | 1-3 atomic commits with clear messages |
| NF3 | 1-space indentation, no comments, type hints preserved | uv run python -c "import ast; ast.parse(open('src/app_controller.py').read())" succeeds |
| NF4 | Per-commit git notes | All commits have git notes summarizing the fix |
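The ast.parse command in NF3 only proves that the file still parses. A stricter check could look like the sketch below, which uses the standard tokenize module to flag comments and indentation steps wider than one space. The function name style_violations is hypothetical, the sketch does not check type hints, and it is an optional aid rather than part of the acceptance criterion.

```python
import io
import tokenize


def style_violations(source: str) -> list[tuple[int, str]]:
    """Return (line, kind) pairs for comments and indent steps that are not 1 space."""
    violations: list[tuple[int, str]] = []
    indent_stack = [0]  # widths of the enclosing indentation levels
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            violations.append((tok.start[0], "comment"))
        elif tok.type == tokenize.INDENT:
            # An INDENT token carries the full leading whitespace of the new level.
            width = len(tok.string)
            if width - indent_stack[-1] != 1:
                violations.append((tok.start[0], "indent"))
            indent_stack.append(width)
        elif tok.type == tokenize.DEDENT:
            indent_stack.pop()
    return violations


good = "def f(x: int) -> int:\n if x:\n  return 1\n return 0\n"
bad = "def f():\n    return 1  # note\n"
print(style_violations(good))
print(style_violations(bad))
```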
3. Per-File Design
3.1 Investigation: Reproduce the error in isolation
The first task is a TDD red. The implementer should write a test that reproduces the error with minimal setup.
Recommended test file: tests/test_rag_sync_none_error.py (new file)
The test pattern:
def test_rag_sync_does_not_fail_with_none_error(controller_with_rag_enabled):
 # controller_with_rag_enabled: a fixture that:
 # - Creates an AppController
 # - Sets rag_enabled=True, rag_source='mock', files_base_dir=tmp_path
 # - Submits the sync
 # - Waits for the sync to complete (poll _rag_sync_dirty or rag_status)
 status = controller_with_rag_enabled.rag_status
 assert "error" not in status, f"RAG sync failed unexpectedly: {status}"
 # OR, more strictly:
 assert status == "ready", f"Expected 'ready', got: {status}"
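The "waits for the sync to complete" step of the fixture could be sketched as a small polling helper. This is a minimal sketch under stated assumptions: wait_for_status is a hypothetical name, and the intermediate status values used in the demo ("idle", "syncing") are invented for illustration. Only "ready" and the "error: ..." prefix come from this spec.

```python
import time
from typing import Callable


def wait_for_status(
    read_status: Callable[[], str],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> str:
    """Poll read_status until it reports 'ready' or an 'error...' value, or time out."""
    deadline = time.monotonic() + timeout
    status = read_status()
    while time.monotonic() < deadline:
        if status == "ready" or status.startswith("error"):
            return status
        time.sleep(interval)
        status = read_status()
    return status  # last observed value; the caller asserts on it


# Demo with a scripted sequence of statuses instead of a real controller.
scripted = iter(["idle", "syncing", "ready"])
print(wait_for_status(lambda: next(scripted), timeout=1.0, interval=0.001))
```

Returning the last observed status on timeout, rather than raising, keeps the assertion message in the test informative: the test can report exactly which status the sync was stuck in.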
The diagnostic step:
- Run the test; capture the full error message
- Add a sys.stderr.write traceback capture in the except clause at src/app_controller.py:1479
- Find the actual line where the .get() is called on None
- Document the root cause in the commit message (so the fix is traceable)
3.2 The fix
The fix depends on what the investigation finds. Three likely scenarios:
Scenario A: A config field is None (most likely)
- Example: If self.rag_config.embedding_provider is somehow None when the setter for rag_source is called, the engine init would fail.
- Fix: Add a guard in the setter (if not self.rag_config: return) and an explicit check in the engine init (if self.config.embedding_provider is None: raise ValueError("embedding_provider must be set before rag_enabled")).
- Files affected: src/rag_engine.py, possibly src/app_controller.py
Scenario B: A dict access is failing on a ChromaDB response
- Example: _validate_collection_dim_result line 149: embeddings = res.get("embeddings") if isinstance(res, dict) else None. If chromadb returns a different object type, the .get() is skipped and embeddings is set to None, but a later use of embeddings may fail.
- Fix: Add more defensive guards or correct the type check.
- Files affected: src/rag_engine.py
Scenario C: A side effect of a previous test (subprocess state pollution)
- Example: A prior test in the live_gui subprocess left the RAG config in a bad state.
- Fix: Reset the RAG config in the test's setup or use live_gui.reset_session().
- Files affected: The test (no production code change)
The implementer MUST follow the TDD protocol: write the reproducing test, run it, observe the failure, trace the root cause, fix it, run the test again, verify all 3 RAG tests pass.
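For Scenario A and goal G3, one way to make the error name the offending field is to walk the config dataclass before the engine is built. The sketch below uses hypothetical stand-in classes (FakeRAGConfig, FakeVectorStoreConfig) and hypothetical helpers (none_field_paths, require_populated); the real RAGConfig lives in src/models.py and may have fields where None is a valid value, in which case an allow-list would be needed.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


def none_field_paths(obj: object, prefix: str = "config") -> list[str]:
    """Return dotted paths of every dataclass field (nested included) that is None."""
    paths: list[str] = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        path = f"{prefix}.{f.name}"
        if value is None:
            paths.append(path)
        elif is_dataclass(value):
            paths.extend(none_field_paths(value, path))
    return paths


# Hypothetical stand-ins for illustration only.
@dataclass
class FakeVectorStoreConfig:
    provider: Optional[str] = "mock"


@dataclass
class FakeRAGConfig:
    enabled: bool = False
    embedding_provider: Optional[str] = None
    vector_store: FakeVectorStoreConfig = field(default_factory=FakeVectorStoreConfig)


def require_populated(config: object) -> None:
    """Raise a ValueError naming the None fields instead of failing later on .get()."""
    missing = none_field_paths(config)
    if missing:
        raise ValueError("RAG config has None fields: " + ", ".join(missing))


print(none_field_paths(FakeRAGConfig()))
```

With a check like this at the top of the sync, rag_status would carry the field path rather than the generic NoneType message.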
3.3 Test verification
After the fix:
- The 3 RAG tests pass in isolation
- The 3 RAG tests pass in batched run (scripts/run_tests_batched.py)
- The full test suite has 1285 pass (was 1282) + 4 skip + 0 fail (was 3)
- No regression in test_rag_engine.py (9+ tests), test_rag_engine_result.py, test_rag_engine_ready_status_bug.py, test_rag_gui_presence.py, test_rag_integration.py, test_sync_rag_engine_coalescing.py, test_rag_phase4_stress.py (after the fix)
3.4 Documentation
Update docs/guide_rag.md (if it exists; check first) with:
- A short note about the fix (1 paragraph)
- A troubleshooting entry if the error is likely to recur: "If rag_status shows 'NoneType' object has no attribute 'get', check that rag_config.embedding_provider is set before rag_enabled."
If docs/guide_rag.md does not exist, no new doc is needed (the per-source-file guide is the wrong place for this; the test file's docstring or the commit message is sufficient).
4. Architecture Reference
4.1 The RAG sync pipeline
The RAG sync is initiated when any of the RAG-related setters is called (rag_enabled, rag_source, rag_emb_provider, rag_chunk_size, rag_chunk_overlap, etc.):
[Set rag_* property] -> [setter calls _sync_rag_engine()] -> [token + dirty flag update]
|
v
[submit_io(_do_rag_sync(token))] -> [IO pool worker]
|
v
[_do_rag_sync body]
|
v
[RAGEngine(config, base_dir) construction]
|
v
[if engine.is_empty() and self.files -> _rebuild_rag_index()]
|
v
[set _set_rag_status("ready" | "error: ...")]
4.2 The mock branch
The RAGConfig().vector_store.provider defaults to 'mock'. When the engine init hits this branch:
elif vs_config.provider == 'mock':
 self.client = "mock"
 self.collection = "mock"
 return Result(data=None)
The engine is "empty" (is_empty() returns True for mock). _rebuild_rag_index is NOT called. The status should be "ready" immediately.
4.3 The coalescing pattern
The token + dirty flag pattern in _sync_rag_engine ensures that N rapid setter calls produce ONE sync, not N parallel syncs. This is the pattern from the test_infrastructure_hardening_20260609 track. The token check at line 1463 short-circuits superseded syncs.
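The token half of this pattern can be sketched as follows. This is a simplified illustration, not the controller's implementation: CoalescingSync and demo_burst are hypothetical names, the dirty flag is omitted, and a single-worker pool with a gate is used only to make the demo deterministic.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class CoalescingSync:
    """Hypothetical token-based coalescer: only the newest request does the work."""

    def __init__(self, work: Callable[[], None], pool: ThreadPoolExecutor) -> None:
        self._work = work
        self._pool = pool
        self._lock = threading.Lock()
        self._token = 0
        self.runs = 0

    def request(self) -> Future:
        """Bump the token and queue a worker that remembers the token it was given."""
        with self._lock:
            self._token += 1
            token = self._token
        return self._pool.submit(self._run, token)

    def _run(self, token: int) -> bool:
        with self._lock:
            if token != self._token:
                return False  # superseded by a newer request; skip the work
            self.runs += 1
        self._work()
        return True


def demo_burst(n: int = 5) -> int:
    """Queue n requests behind a blocked single worker; return how many did work."""
    gate = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        sync = CoalescingSync(work=lambda: None, pool=pool)
        pool.submit(gate.wait)  # hold the only worker so the requests pile up
        futures = [sync.request() for _ in range(n)]
        gate.set()  # release the worker; queued tasks now see the final token
        for f in futures:
            f.result()
    return sync.runs


print(demo_burst(5))
```

In the demo, every queued task except the last finds that its token is stale and returns without doing the work, which is the short-circuit behaviour described above.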
4.4 The status update mechanism
self._set_rag_status(status) appends a task to _pending_gui_tasks. The GUI render loop processes the queue and updates the rag_status field. The test polls client.get_value('rag_status') to wait for the update.
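The handoff described above explains why tests must poll rather than read the status immediately after the worker finishes. A minimal model of it, assuming a thread-safe queue (the real _pending_gui_tasks container may be a different type) and using hypothetical names StatusBoard and process_pending:

```python
import queue
from typing import Callable


class StatusBoard:
    """Hypothetical model of a worker-to-GUI handoff through a task queue."""

    def __init__(self) -> None:
        self.rag_status = "idle"  # assumed initial value, for illustration only
        self._pending: "queue.Queue[Callable[[], None]]" = queue.Queue()

    def set_rag_status(self, status: str) -> None:
        """Worker side: enqueue the update instead of touching GUI state directly."""
        self._pending.put(lambda: setattr(self, "rag_status", status))

    def process_pending(self) -> int:
        """Render-loop side: apply every queued update and return how many ran."""
        applied = 0
        while True:
            try:
                task = self._pending.get_nowait()
            except queue.Empty:
                return applied
            task()
            applied += 1


board = StatusBoard()
board.set_rag_status("ready")
print(board.rag_status)        # still the old value: the queue has not been drained
board.process_pending()
print(board.rag_status)        # updated after the render loop processes the queue
```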
5. Test Plan
5.1 Per-phase test verification
| Phase | Test command | Expected |
|---|---|---|
| 1 | uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase1_red.log | 3/3 fail with the NoneType.get error |
| 2 | (after fix) uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase2_green.log | 3/3 pass |
| 3 | (full suite) uv run pytest tests/ 2>&1 \| tee tests/artifacts/rag_track_phase3_full.log | 1285 pass + 4 skip + 0 fail |
| 4 | (batched) uv run .\scripts\run_tests_batched.py 2>&1 \| tee tests/artifacts/rag_track_phase4_batched.log | All tiers PASS; no failures |
5.2 TDD red verification
For each new test or fix:
- Verify the test FAILS as expected (red phase)
- Implement the fix
- Verify the test PASSES (green phase)
- Verify no regression in the previously-passing tests
- Commit
Anti-pattern guard: per AGENTS.md "Critical Anti-Patterns", no skipping tests just because they fail. The 3 RAG tests are the actual problem to solve; the implementer must find and fix the root cause.
5.3 The diagnostic strategy
If the implementer can't find the bug from the error message alone:
- Add import traceback; sys.stderr.write(traceback.format_exc()) to the except clause in src/app_controller.py:1479-1482
- Run the test; capture the full traceback
- Find the actual .get() call on a None value
- Document the traceback in the commit message (so the fix is traceable)
- Remove the diag traceback after the fix is verified
6. Migration Strategy
This is a small bug-fix track. The phases are simple:
- Phase 1: Investigation + reproducing test
- Phase 2: Fix
- Phase 3: Full test suite + batched verification
- Phase 4: Docs update
- Phase 5: Metadata + tracks.md
Phases 1 and 2 form one loop (it's all one fix), so the implementer can iterate between them as needed; Phases 3-5 follow once the fix is in place.
7. Out of Scope
7.1 Deferred to separate tracks
| ID | Item | Defer to | Why |
|---|---|---|---|
| OOS1 | The send_result → send mass rename (user's stated intent) | User's manual refactor after this track | The user wants to do this themselves. The Result API is stable; only the function name changes. |
| OOS2 | 23 lower-impact files with weak types (per data_structure_strengthening_20260606/spec.md §1 line 20) | data_structure_strengthening_20260606 (the next major track) | That's the data_structure track's scope. |
| OOS3 | live_gui_mock_injection_20260615 infrastructure | Separate infrastructure track | Not blocking. Recommended but not required. |
| OOS4 | The full RAG test cleanup (e.g., removing time.sleep(0.5) patterns in favor of poll loops) | Separate RAG test quality track | The tests are functional; this is a test-quality improvement, not a bug fix. |
| OOS5 | The Gemini CLI thinking-format path | doeh_test_thinking_cleanup_20260615 follow-up | Not in this track's scope. |
| OOS6 | The RAGConfig data structure improvements (e.g., nested validation) | data_structure_strengthening_20260606 | Not blocking the bug fix. |
7.2 Explicitly NOT in this track
- The user wants to do a send_result → send mass rename after this track. Do not do it in this track. The bug fix is for RAG only.
- A general RAG test quality cleanup (poll loops, error message improvements, etc.): out of scope; only fix the specific bug.
- The _rebuild_rag_index method's complex error handling: out of scope; only fix the specific bug.
8. Risks & Mitigations
| ID | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | The fix breaks an unrelated test | Low | Medium | Run the full test suite in Phase 3 + the batched test in Phase 4. If a new failure appears, STOP and report. |
| R2 | The bug is in a hard-to-reach code path (deep in IO pool worker) | Medium | Medium | Add diagnostic traceback in the except clause; capture the actual error site; document in the commit message. |
| R3 | The fix is in the test (subprocess state pollution) not the production code | Low | Low | If the fix is in the test, document this in the commit message. Consider adding a teardown reset in the test. |
| R4 | The fix introduces a regression in test_rag_engine_ready_status_bug.py | Low | Medium | Run the full RAG test suite after the fix. |
| R5 | The implementation is larger than the 2-line fix suggested by the spec | Low | Low | The spec is a guide, not a contract. If the fix is larger (e.g., a larger refactor is needed), the Tier 2 reports and the user decides whether to expand scope. The user's overall plan is 2 more tracks (this + a send_result → send rename) before the data structure track. |
9. Verification Criteria (definition of "done")
The track is DONE when ALL of the following are true:
- G1: A reproducing test exists that fails before the fix
- G2: All 3 RAG tests pass (test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim)
- G3: A defensive guard or proper error message is added (so future debug is easier)
- G4: docs/guide_rag.md updated (if it exists)
- NF1: No new regressions in the full test suite (1285 pass + 4 skip + 0 fail)
- NF2: Per-task atomic commits (1-3 commits total)
- NF3: 1-space indentation + no comments + type hints preserved
- NF4: Per-commit git notes attached
Test count math:
- Pre-track baseline: 1282 pass + 4 skip + 3 fail
- After this track: 1285 pass + 4 skip + 0 fail (3 newly-passing)
- This is the FIRST time the project is fully green since data_oriented_error_handling_20260606 shipped on 2026-06-12.
10. Execution Order & Dependencies
No external blockers. This track can start immediately after the Tier 1 review approves the spec.
Execution order (the plan):
- Phase 1: Investigation + reproducing test
- Phase 2: Fix
- Phase 3: Full test suite + batched verification
- Phase 4: Docs update
- Phase 5: Metadata + tracks.md
Total: 5 phases, ~10 tasks, 4 atomic commits (1 fix + 1 docs + 1 metadata + 1 final-state); all with git notes.
Followed by: the user can do the send_result → send mass rename themselves, then start data_structure_strengthening_20260606 track.
11. References
Architecture docs
- docs/guide_rag.md (if it exists): RAG subsystem architecture
- docs/guide_app_controller.md: the AppController._do_rag_sync method is the entry point
- docs/guide_testing.md: live_gui fixture + structural testing contract
Styleguides
- conductor/code_styleguides/error_handling.md: Result[T] pattern (used by RAGEngine._init_vector_store_result)
- conductor/code_styleguides/data_oriented_design.md: the canonical DOD reference
Source code (the relevant lines)
- src/app_controller.py:1451-1488: _sync_rag_engine and _do_rag_sync (the entry points)
- src/app_controller.py:1490-1497: rag_enabled property + setter (triggers the sync)
- src/app_controller.py:3016-3023: _set_rag_status (sets the error status)
- src/app_controller.py:3025-3056: _rebuild_rag_index (the second worker)
- src/rag_engine.py:88-128: RAGEngine.__init__ and _init_vector_store_result
- src/rag_engine.py:130-166: _validate_collection_dim_result (the most likely .get() call site)
- src/models.py:1039-1065: RAGConfig and VectorStoreConfig
Parent tracks
- conductor/tracks/data_oriented_error_handling_20260606/spec.md §12.1: the follow-up scope that included RAG fixes
- conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md: the parent track that documented 4 RAG failures remaining (1 was inadvertently fixed)
- docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md §3 deviation #2.3: the test_rag_integration.py fix (commit 26e1b652)
Test files (the 3 to fix)
- tests/test_rag_phase4_final_verify.py::test_phase4_final_verify (tier-3 live_gui)
- tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim (tier-3 live_gui)
- tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim (tier-3 live_gui)
Already-passing RAG tests (do NOT regress)
- tests/test_rag_engine.py (8+ tests)
- tests/test_rag_engine_result.py (3+ tests)
- tests/test_rag_engine_ready_status_bug.py (3+ tests)
- tests/test_rag_gui_presence.py (2 tests)
- tests/test_rag_integration.py::test_rag_integration (1 test; was failing pre-public_api, fixed by commit 26e1b652)
- tests/test_sync_rag_engine_coalescing.py (4+ tests)
User's stated intent (after this track)
- send_result → send mass rename (user will do manually)
- Then the data_structure_strengthening_20260606 track