Private
Public Access
0
0
Files
manual_slop/conductor/archive/rag_test_failures_20260615/spec.md
T

23 KiB

Track Specification: RAG Test Failures Fix

Track ID: rag_test_failures_20260615 Status: Active (spec approved 2026-06-15) Priority: A (foundational; precedes data_structure_strengthening_20260606 and the user's planned send_resultsend mass rename) Owner: Tier 2 Tech Lead Type: bugfix + test_fix Scope: 3 test failures (tier-3 live_gui RAG tests) + 1 production bug in 2 lines + 3 new unit tests Parent tracks: data_oriented_error_handling_20260606 (shipped 2026-06-12), ai_loop_regressions_20260614 (shipped 2026-06-15), doeh_test_thinking_cleanup_20260615 (shipped 2026-06-15), public_api_migration_and_ui_polish_20260615 (shipped 2026-06-15)


0. TL;DR

A small, focused bug-fix track that resolves the 3 remaining pre-existing test failures (not 4 as the parent track documented — test_rag_integration.py was inadvertently fixed by the public_api migration's Phase 2 follow-up, commit 26e1b652).

All 3 failures share the same root cause: the RAG sync worker at src/app_controller.py:_do_rag_sync catches an exception during the RAGEngine construction or subsequent config lookup, and the error message is "'NoneType' object has no attribute 'get'". This is a specific Python error pattern indicating a dict.get() call is being made on a None value somewhere in the RAG setup path.

Result: all 1285 tests pass (1282 + 3 RAG fixed). The project reaches a fully-green baseline for the first time since the data_oriented_error_handling_20260606 track shipped on 2026-06-12. The user can then proceed with the planned send_resultsend mass rename and the data_structure_strengthening_20260606 track.


1. Overview

1.1 Current State (as of 2026-06-15)

After the public_api_migration_and_ui_polish_20260615 track completed:

  • 1282 tests pass (was 1280 pre-track; 7 newly-passing in the run, 13 fixed total per the completion report)
  • 4 tests skipped (unchanged)
  • 3 tests fail (was 10 pre-track; down from 4 RAG failures because test_rag_integration.py::test_rag_integration is now passing)

The 3 remaining failures are all RAG subsystem tests in tier-3 (live_gui):

Test Tier File Failure point
test_rag_phase4_final_verify::test_phase4_final_verify tier-3 (live_gui) tests/test_rag_phase4_final_verify.py Line 65 (after rag_enabled=True + wait for rag_status == 'ready')
test_rag_phase4_stress::test_rag_large_codebase_verification_sim tier-3 (live_gui) tests/test_rag_phase4_stress.py Line 48 (same pattern)
test_rag_visual_sim::test_rag_full_lifecycle_sim tier-3 (live_gui) tests/test_rag_visual_sim.py Line 32 (initial status check after rag_enabled=True)

All 3 fail with the same error message captured in rag_status: "error: 'NoneType' object has no attribute 'get'". The error originates in src/app_controller.py:_do_rag_sync (line 1479-1482):

except Exception as e:
    self._set_rag_status(f"error: {e}")
    sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")
    sys.stderr.flush()

1.2 Gaps to Fill (this Track's Scope)

Gap Count Spec Section
Investigate the RAG sync NoneType.get error 1 investigation §3.1
Fix the underlying bug in src/app_controller.py and/or src/rag_engine.py 1-3 code changes §3.2
Verify the 3 RAG tests pass 3 test fixes §3.3

1.3 Already Implemented (DO NOT re-implement)

Verified by code audit (2026-06-15):

  • RAGConfig default (src/models.py:1039-1065) — has vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock')); the default is NOT None. Confirmed by direct instantiation: RAGConfig().vector_store.provider == 'mock'.
  • RAGEngine.__init__ with vector_store.provider='mock' — succeeds; is_empty() returns True; no further sync work is triggered (mock branch at src/rag_engine.py:123-126).
  • _do_rag_sync coalescing — the token + dirty flag pattern prevents N parallel syncs; works correctly (per test_infrastructure_hardening_20260609 track).
  • _init_vector_store_result mock branch — sets self.client = "mock" and self.collection = "mock"; is_empty() and add_documents() both check for this and return early.
  • test_rag_integration.py::test_rag_integration — already PASSES (fixed incidentally by public_api_migration_and_ui_polish_20260615 Phase 2 follow-up commit 26e1b652).

1.4 Investigation Clues

The error pattern "'NoneType' object has no attribute 'get'" is a specific Python error indicating a dict.get() call on a None value. The most likely candidates in the RAG sync path:

  1. src/app_controller.py:1469engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root) — if self.active_project_root is None or the RAGConfig has a None sub-field.

    • Status: active_project_root is a property that returns str(Path(self.active_project_path).parent) or self.ui_files_base_dir. The test sets files_base_dir to a valid path.
    • Status: RAGConfig() default has all required fields populated.
  2. src/rag_engine.py:89-101RAGEngine.__init__ — calls _init_embedding_provider() and _init_vector_store_result(). With vector_store.provider='mock', the latter should return Result(data=None) (success).

    • Status: Verified by direct instantiation: the engine constructs successfully.
  3. src/rag_engine.py:111-128_init_vector_store_result — the 'chroma' branch calls _validate_collection_dim_result() (line 122) which calls self.collection.get(limit=1, include=["embeddings"]) (line 146) then res.get("embeddings") (line 149). If self.collection is set but the chromadb call returns a non-dict (e.g. a Result object), .get() would fail with NoneType.

    • Status: This is the most likely candidate. The is_empty() and add_documents() short-circuit on the mock string, but the _init_vector_store_result for the 'mock' branch returns immediately with Result(data=None) (line 126) — so the chromadb validation is skipped. So this isn't the bug for the 'mock' case.
    • Status: For the 'chroma' case (test_rag_phase4_stress uses 'chroma'), the validation runs. If self.embedding_provider.embed(["__rag_dim_check__"]) fails (e.g. due to gemini client not being initialized in the test subprocess), the error could be different. But the test_rag_phase4_stress uses rag_emb_provider='local' which depends on sentence_transformers.
  4. src/app_controller.py:230controller.rag_engine and controller.rag_config and controller.rag_config.enabled — this is the entry check; if any of these is None, the sync is skipped.

    • Status: self.rag_config is set in __init__ (line 1830-1831) and reset in reset_session (line 3387). Should never be None after init.
  5. A more subtle cause: the submit_io lambda in src/app_controller.py:1457 (self.submit_io(lambda: self._do_rag_sync(token))) submits a lambda. If the IO pool is shared with the user-agent / MMA comms callbacks, an unrelated exception in a different task could leak into the RAG status.

    • Status: Low likelihood, but worth checking.

The implementer MUST use TDD red-first: add a focused test that reproduces the error with minimal setup, then trace the call chain to find the actual .get(None) call. The audit above is a starting point, not a definitive diagnosis.


2. Goals

2.1 Functional Goals

ID Goal Acceptance Criterion
G1 Investigate the RAG sync NoneType.get error A focused regression test reproduces the error with rag_enabled=True + rag_source='mock' setup
G2 Fix the underlying bug The 3 RAG tests pass after the fix; no regression in the 12 RAG-related tests that already pass
G3 Add a defensive guard or proper error message If a config field is unexpectedly None, the error message identifies WHICH field is None (so future debug is easier)
G4 Update docs/guide_rag.md to document the fix The relevant guide has a "Known issues" or "Troubleshooting" section if appropriate

2.2 Non-Functional Goals

ID Goal Acceptance Criterion
NF1 Zero new regressions uv run pytest tests/ shows 3 fewer failures than pre-track baseline; no new failures
NF2 Per-task atomic commits 1-3 atomic commits with clear messages
NF3 1-space indentation, no comments, type hints preserved uv run python -c "import ast; ast.parse(open('src/app_controller.py').read())" succeeds
NF4 Per-commit git notes All commits have git notes summarizing the fix

3. Per-File Design

3.1 Investigation: Reproduce the error in isolation

The first task is a TDD red. The implementer should write a test that reproduces the error with minimal setup.

Recommended test file: tests/test_rag_sync_none_error.py (new file)

The test pattern:

def test_rag_sync_does_not_fail_with_none_error(controller_with_rag_enabled):
    # controller_with_rag_enabled: a fixture that:
    #   - Creates an AppController
    #   - Sets rag_enabled=True, rag_source='mock', files_base_dir=tmp_path
    #   - Submits the sync
    #   - Waits for the sync to complete (poll _rag_sync_dirty or rag_status)
    status = controller.rag_status
    assert "error" not in status, f"RAG sync failed unexpectedly: {status}"
    # OR
    assert status == "ready", f"Expected 'ready', got: {status}"

The diagnostic step:

  1. Run the test; capture the full error message
  2. Add a sys.stderr.write traceback capture in the except clause at src/app_controller.py:1479
  3. Find the actual line where the .get() is called on None
  4. Document the root cause in the commit message (so the fix is traceable)

3.2 The fix

The fix depends on what the investigation finds. Three likely scenarios:

Scenario A: A config field is None (most likely)

  • Example: If self.rag_config.embedding_provider is somehow None when the setter for rag_source is called, the engine init would fail.
  • Fix: Add a guard in the setter: if not self.rag_config: return and a fallback in the engine init: if self.config.embedding_provider is None: raise ValueError("embedding_provider must be set before rag_enabled").
  • Files affected: src/rag_engine.py, possibly src/app_controller.py

Scenario B: A dict access is failing on a ChromaDB response

  • Example: _validate_collection_dim_result line 149: embeddings = res.get("embeddings") if isinstance(res, dict) else None. If chromadb returns a different object type, the .get() is skipped (None is returned) but the call downstream may fail.
  • Fix: Add more defensive guards or correct the type check.
  • Files affected: src/rag_engine.py

Scenario C: A side effect of a previous test (subprocess state pollution)

  • Example: A prior test in the live_gui subprocess left the RAG config in a bad state.
  • Fix: Reset the RAG config in the test's setup or use live_gui.reset_session().
  • Files affected: The test (no production code change)

The implementer MUST follow the TDD protocol: write the reproducing test, run it, observe the failure, trace the root cause, fix it, run the test again, verify all 3 RAG tests pass.

3.3 Test verification

After the fix:

  • The 3 RAG tests pass in isolation
  • The 3 RAG tests pass in batched run (scripts/run_tests_batched.py)
  • The full test suite has 1285 pass (was 1282) + 4 skip + 0 fail (was 3)
  • No regression in test_rag_engine.py (9+ tests), test_rag_engine_result.py, test_rag_engine_ready_status_bug.py, test_rag_gui_presence.py, test_rag_integration.py, test_sync_rag_engine_coalescing.py, test_rag_phase4_stress.py (after the fix)

3.4 Documentation

Update docs/guide_rag.md (if it exists; check first) with:

  • A short note about the fix (1 paragraph)
  • A troubleshooting entry if the error is likely to recur: "If rag_status shows 'NoneType' object has no attribute 'get', check that rag_config.embedding_provider is set before rag_enabled."

If docs/guide_rag.md does not exist, no new doc is needed (the per-source-file guide is the wrong place for this; the test file's docstring or the commit message is sufficient).


4. Architecture Reference

4.1 The RAG sync pipeline

The RAG sync is initiated when any of the RAG-related setters is called (rag_enabled, rag_source, rag_emb_provider, rag_chunk_size, rag_chunk_overlap, etc.):

[Set rag_* property] -> [setter calls _sync_rag_engine()] -> [token + dirty flag update]
                                                                    |
                                                                    v
                                          [submit_io(_do_rag_sync(token))] -> [IO pool worker]
                                                                                     |
                                                                                     v
                                                                            [_do_rag_sync body]
                                                                                     |
                                                                                     v
                                           [RAGEngine(config, base_dir) construction]
                                                                                     |
                                                                                     v
                                  [if engine.is_empty() and self.files -> _rebuild_rag_index()]
                                                                                     |
                                                                                     v
                                                        [set _set_rag_status("ready" | "error: ...")]

4.2 The mock branch

The RAGConfig().vector_store.provider defaults to 'mock'. When the engine init hits this branch:

elif vs_config.provider == 'mock':
    self.client = "mock"
    self.collection = "mock"
    return Result(data=None)

The engine is "empty" (is_empty() returns True for mock). _rebuild_rag_index is NOT called. The status should be "ready" immediately.

4.3 The coalescing pattern

The token + dirty flag pattern in _sync_rag_engine ensures that N rapid setter calls produce ONE sync, not N parallel syncs. This is the pattern from test_infrastructure_hardening_20260609 track. The token check at line 1463 short-circuits superseded syncs.

4.4 The status update mechanism

self._set_rag_status(status) appends a task to _pending_gui_tasks. The GUI render loop processes the queue and updates the rag_status field. The test polls client.get_value('rag_status') to wait for the update.


5. Test Plan

5.1 Per-phase test verification

Phase Test command Expected
1 uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase1_red.log 3/3 fail with the NoneType.get error
2 (after fix) uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 | tee tests/artifacts/rag_track_phase2_green.log 3/3 pass
3 (full suite) uv run pytest tests/ 2>&1 | tee tests/artifacts/rag_track_phase3_full.log 1285 pass + 4 skip + 0 fail
4 (batched) uv run .\scripts\run_tests_batched.py 2>&1 | tee tests/artifacts/rag_track_phase4_batched.log All tiers PASS; no failures

5.2 TDD red verification

For each new test or fix:

  1. Verify the test FAILS as expected (red phase)
  2. Implement the fix
  3. Verify the test PASSES (green phase)
  4. Verify no regression in the previously-passing tests
  5. Commit

Anti-pattern guard: per AGENTS.md "Critical Anti-Patterns", no skipping tests just because they fail. The 3 RAG tests are the actual problem to solve; the implementer must find and fix the root cause.

5.3 The diagnostic strategy

If the implementer can't find the bug from the error message alone:

  1. Add import traceback; sys.stderr.write(traceback.format_exc()) to the except clause in src/app_controller.py:1479-1482
  2. Run the test; capture the full traceback
  3. Find the actual .get(None) call
  4. Document the traceback in the commit message (so the fix is traceable)
  5. Remove the diag traceback after the fix is verified

6. Migration Strategy

This is a small bug-fix track. The phases are simple:

  1. Phase 1: Investigation + reproducing test
  2. Phase 2: Fix
  3. Phase 3: Full test suite + batched verification
  4. Phase 4: Docs update
  5. Phase 5: Metadata + tracks.md

The order doesn't matter much (it's all one fix); the implementer can iterate between Phase 1 and 2 as needed.


7. Out of Scope

7.1 Deferred to separate tracks

ID Item Defer to Why
OOS1 The send_resultsend mass rename (user's stated intent) User's manual refactor after this track The user wants to do this themselves. The Result API is stable; only the function name changes.
OOS2 23 lower-impact files with weak types (per data_structure_strengthening_20260606/spec.md §1 line 20) data_structure_strengthening_20260606 (the next major track) That's the data_structure track's scope.
OOS3 live_gui_mock_injection_20260615 infrastructure Separate infrastructure track Not blocking. Recommended but not required.
OOS4 The full RAG test cleanup (e.g., removing time.sleep(0.5) patterns in favor of poll loops) Separate RAG test quality track The tests are functional; this is a test-quality improvement, not a bug fix.
OOS5 The Gemini CLI thinking-format path Defer to doeh_test_thinking_cleanup_20260615 follow-up Not in this track's scope.
OOS6 The RAGConfig data structure improvements (e.g., nested validation) data_structure_strengthening_20260606 Not blocking the bug fix.

7.2 Explicitly NOT in this track

  • The user wants to do a send_resultsend mass rename after this track. Do not do it in this track. The bug fix is for RAG only.
  • A general RAG test quality cleanup (poll loops, error message improvements, etc.) — out of scope; only fix the specific bug.
  • The _rebuild_rag_index method's complex error handling — out of scope; only fix the specific bug.

8. Risks & Mitigations

ID Risk Likelihood Impact Mitigation
R1 The fix breaks an unrelated test Low Medium Run the full test suite in Phase 3 + the batched test in Phase 4. If a new failure appears, STOP and report.
R2 The bug is in a hard-to-reach code path (deep in IO pool worker) Medium Medium Add diagnostic traceback in the except clause; capture the actual error site; document in the commit message.
R3 The fix is in the test (subprocess state pollution) not the production code Low Low If the fix is in the test, document this in the commit message. Consider adding a teardown reset in the test.
R4 The fix introduces a regression in test_rag_engine_ready_status_bug.py Low Medium Run the full RAG test suite after the fix.
R5 The implementation is larger than the 2-line fix suggested by the spec Low Low The spec is a guide, not a contract. If the fix is larger (e.g., a larger refactor is needed), the Tier 2 reports and the user decides whether to expand scope. The user's overall plan is 2 more tracks (this + a send_resultsend rename) before the data structure track.

9. Verification Criteria (definition of "done")

The track is DONE when ALL of the following are true:

  1. G1: A reproducing test exists that fails before the fix
  2. G2: All 3 RAG tests pass (test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim)
  3. G3: A defensive guard or proper error message is added (so future debug is easier)
  4. G4: docs/guide_rag.md updated (if it exists)
  5. NF1: No new regressions in the full test suite (1285 pass + 4 skip + 0 fail)
  6. NF2: Per-task atomic commits (1-3 commits total)
  7. NF3: 1-space indentation + no comments + type hints preserved
  8. NF4: Per-commit git notes attached

Test count math:

  • Pre-track baseline: 1282 pass + 4 skip + 3 fail
  • After this track: 1285 pass + 4 skip + 0 fail (3 newly-passing)
  • This is the FIRST time the project is fully green since data_oriented_error_handling_20260606 shipped on 2026-06-12.

10. Execution Order & Dependencies

No external blockers. This track can start immediately after the Tier 1 review approves the spec.

Execution order (the plan):

  1. Phase 1: Investigation + reproducing test
  2. Phase 2: Fix
  3. Phase 3: Full test suite + batched verification
  4. Phase 4: Docs update
  5. Phase 5: Metadata + tracks.md

Total: 5 phases, ~10 tasks, 4 atomic commits (1 fix + 1 docs + 1 metadata + 1 final-state); all with git notes.

Followed by: the user can do the send_resultsend mass rename themselves, then start data_structure_strengthening_20260606 track.


11. References

Architecture docs

  • docs/guide_rag.md (if it exists) — RAG subsystem architecture
  • docs/guide_app_controller.md — the AppController._do_rag_sync method is the entry point
  • docs/guide_testing.mdlive_gui fixture + structural testing contract

Styleguides

  • conductor/code_styleguides/error_handling.mdResult[T] pattern (used by RAGEngine._init_vector_store_result)
  • conductor/code_styleguides/data_oriented_design.md — the canonical DOD reference

Source code (the relevant lines)

  • src/app_controller.py:1451-1488_sync_rag_engine and _do_rag_sync (the entry points)
  • src/app_controller.py:1490-1497rag_enabled property + setter (triggers the sync)
  • src/app_controller.py:3016-3023_set_rag_status (sets the error status)
  • src/app_controller.py:3025-3056_rebuild_rag_index (the second worker)
  • src/rag_engine.py:88-128RAGEngine.__init__ and _init_vector_store_result
  • src/rag_engine.py:130-166_validate_collection_dim_result (the most likely .get() call site)
  • src/models.py:1039-1065RAGConfig and VectorStoreConfig

Parent tracks

  • conductor/tracks/data_oriented_error_handling_20260606/spec.md §12.1 — the follow-up scope that included RAG fixes
  • conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md — the parent track that documented 4 RAG failures remaining (1 was inadvertently fixed)
  • docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md §3 deviation #2.3 — the test_rag_integration.py fix (commit 26e1b652)

Test files (the 3 to fix)

  • tests/test_rag_phase4_final_verify.py::test_phase4_final_verify (tier-3 live_gui)
  • tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim (tier-3 live_gui)
  • tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim (tier-3 live_gui)

Already-passing RAG tests (do NOT regress)

  • tests/test_rag_engine.py (8+ tests)
  • tests/test_rag_engine_result.py (3+ tests)
  • tests/test_rag_engine_ready_status_bug.py (3+ tests)
  • tests/test_rag_gui_presence.py (2 tests)
  • tests/test_rag_integration.py::test_rag_integration (1 test; was failing pre-public_api, fixed by commit 26e1b652)
  • tests/test_sync_rag_engine_coalescing.py (4+ tests)

User's stated intent (after this track)

  • send_resultsend mass rename (user will do manually)
  • Then data_structure_strengthening_20260606 track