ed/manual_slop

Private

Public Access

Fork 0

Files

T

ed 58fe3063d8 move more tracks (completed) to archive

2026-06-18 18:59:05 -04:00

23 KiB

Raw Blame History

Track Specification: RAG Test Failures Fix

Track ID: rag_test_failures_20260615 Status: Active (spec approved 2026-06-15) Priority: A (foundational; precedes data_structure_strengthening_20260606 and the user's planned send_result → send mass rename) Owner: Tier 2 Tech Lead Type: bugfix + test_fix Scope: 3 test failures (tier-3 live_gui RAG tests) + 1 production bug in 2 lines + 3 new unit tests Parent tracks: data_oriented_error_handling_20260606 (shipped 2026-06-12), ai_loop_regressions_20260614 (shipped 2026-06-15), doeh_test_thinking_cleanup_20260615 (shipped 2026-06-15), public_api_migration_and_ui_polish_20260615 (shipped 2026-06-15)

0. TL;DR

A small, focused bug-fix track that resolves the 3 remaining pre-existing test failures (not 4 as the parent track documented — test_rag_integration.py was inadvertently fixed by the public_api migration's Phase 2 follow-up, commit 26e1b652).

All 3 failures share the same root cause: the RAG sync worker at src/app_controller.py:_do_rag_sync catches an exception during the RAGEngine construction or subsequent config lookup, and the error message is "'NoneType' object has no attribute 'get'". This is a specific Python error pattern indicating a dict.get() call is being made on a None value somewhere in the RAG setup path.

Result: all 1285 tests pass (1282 + 3 RAG fixed). The project reaches a fully-green baseline for the first time since the data_oriented_error_handling_20260606 track shipped on 2026-06-12. The user can then proceed with the planned send_result → send mass rename and the data_structure_strengthening_20260606 track.

1. Overview

1.1 Current State (as of 2026-06-15)

After the public_api_migration_and_ui_polish_20260615 track completed:

1282 tests pass (was 1280 pre-track; 7 newly-passing in the run, 13 fixed total per the completion report)
4 tests skipped (unchanged)
3 tests fail (was 10 pre-track; down from 4 RAG failures because test_rag_integration.py::test_rag_integration is now passing)

The 3 remaining failures are all RAG subsystem tests in tier-3 (live_gui):

Test	Tier	File	Failure point
`test_rag_phase4_final_verify::test_phase4_final_verify`	tier-3 (live_gui)	`tests/test_rag_phase4_final_verify.py`	Line 65 (after `rag_enabled=True` + wait for `rag_status == 'ready'`)
`test_rag_phase4_stress::test_rag_large_codebase_verification_sim`	tier-3 (live_gui)	`tests/test_rag_phase4_stress.py`	Line 48 (same pattern)
`test_rag_visual_sim::test_rag_full_lifecycle_sim`	tier-3 (live_gui)	`tests/test_rag_visual_sim.py`	Line 32 (initial status check after `rag_enabled=True`)

All 3 fail with the same error message captured in rag_status: "error: 'NoneType' object has no attribute 'get'". The error originates in src/app_controller.py:_do_rag_sync (line 1479-1482):

except Exception as e:
    self._set_rag_status(f"error: {e}")
    sys.stderr.write(f"[DEBUG RAG] Failed to sync engine: {e}\n")
    sys.stderr.flush()

1.2 Gaps to Fill (this Track's Scope)

Gap	Count	Spec Section
Investigate the RAG sync NoneType.get error	1 investigation	§3.1
Fix the underlying bug in `src/app_controller.py` and/or `src/rag_engine.py`	1-3 code changes	§3.2
Verify the 3 RAG tests pass	3 test fixes	§3.3

1.3 Already Implemented (DO NOT re-implement)

Verified by code audit (2026-06-15):

RAGConfig default (src/models.py:1039-1065) — has vector_store: VectorStoreConfig = field(default_factory=lambda: VectorStoreConfig(provider='mock')); the default is NOT None. Confirmed by direct instantiation: RAGConfig().vector_store.provider == 'mock'.
RAGEngine.__init__ with vector_store.provider='mock' — succeeds; is_empty() returns True; no further sync work is triggered (mock branch at src/rag_engine.py:123-126).
_do_rag_sync coalescing — the token + dirty flag pattern prevents N parallel syncs; works correctly (per test_infrastructure_hardening_20260609 track).
_init_vector_store_result mock branch — sets self.client = "mock" and self.collection = "mock"; is_empty() and add_documents() both check for this and return early.
test_rag_integration.py::test_rag_integration — already PASSES (fixed incidentally by public_api_migration_and_ui_polish_20260615 Phase 2 follow-up commit 26e1b652).

1.4 Investigation Clues

The error pattern "'NoneType' object has no attribute 'get'" is a specific Python error indicating a dict.get() call on a None value. The most likely candidates in the RAG sync path:

src/app_controller.py:1469 — engine = rag_engine.RAGEngine(self.rag_config, self.active_project_root) — if self.active_project_root is None or the RAGConfig has a None sub-field.
- Status: active_project_root is a property that returns str(Path(self.active_project_path).parent) or self.ui_files_base_dir. The test sets files_base_dir to a valid path.
- Status: RAGConfig() default has all required fields populated.
src/rag_engine.py:89-101 — RAGEngine.__init__ — calls _init_embedding_provider() and _init_vector_store_result(). With vector_store.provider='mock', the latter should return Result(data=None) (success).
- Status: Verified by direct instantiation: the engine constructs successfully.
src/rag_engine.py:111-128 — _init_vector_store_result — the 'chroma' branch calls _validate_collection_dim_result() (line 122) which calls self.collection.get(limit=1, include=["embeddings"]) (line 146) then res.get("embeddings") (line 149). If self.collection is set but the chromadb call returns a non-dict (e.g. a Result object), .get() would fail with NoneType.
- Status: This is the most likely candidate. The is_empty() and add_documents() short-circuit on the mock string, but the _init_vector_store_result for the 'mock' branch returns immediately with Result(data=None) (line 126) — so the chromadb validation is skipped. So this isn't the bug for the 'mock' case.
- Status: For the 'chroma' case (test_rag_phase4_stress uses 'chroma'), the validation runs. If self.embedding_provider.embed(["__rag_dim_check__"]) fails (e.g. due to gemini client not being initialized in the test subprocess), the error could be different. But the test_rag_phase4_stress uses rag_emb_provider='local' which depends on sentence_transformers.
src/app_controller.py:230 — controller.rag_engine and controller.rag_config and controller.rag_config.enabled — this is the entry check; if any of these is None, the sync is skipped.
- Status: self.rag_config is set in __init__ (line 1830-1831) and reset in reset_session (line 3387). Should never be None after init.
A more subtle cause: the submit_io lambda in src/app_controller.py:1457 (self.submit_io(lambda: self._do_rag_sync(token))) submits a lambda. If the IO pool is shared with the user-agent / MMA comms callbacks, an unrelated exception in a different task could leak into the RAG status.
- Status: Low likelihood, but worth checking.

The implementer MUST use TDD red-first: add a focused test that reproduces the error with minimal setup, then trace the call chain to find the actual .get(None) call. The audit above is a starting point, not a definitive diagnosis.

2. Goals

2.1 Functional Goals

ID	Goal	Acceptance Criterion
G1	Investigate the RAG sync NoneType.get error	A focused regression test reproduces the error with `rag_enabled=True` + `rag_source='mock'` setup
G2	Fix the underlying bug	The 3 RAG tests pass after the fix; no regression in the 12 RAG-related tests that already pass
G3	Add a defensive guard or proper error message	If a config field is unexpectedly None, the error message identifies WHICH field is None (so future debug is easier)
G4	Update `docs/guide_rag.md` to document the fix	The relevant guide has a "Known issues" or "Troubleshooting" section if appropriate

2.2 Non-Functional Goals

ID	Goal	Acceptance Criterion
NF1	Zero new regressions	`uv run pytest tests/` shows 3 fewer failures than pre-track baseline; no new failures
NF2	Per-task atomic commits	1-3 atomic commits with clear messages
NF3	1-space indentation, no comments, type hints preserved	`uv run python -c "import ast; ast.parse(open('src/app_controller.py').read())"` succeeds
NF4	Per-commit git notes	All commits have git notes summarizing the fix

3. Per-File Design

3.1 Investigation: Reproduce the error in isolation

The first task is a TDD red. The implementer should write a test that reproduces the error with minimal setup.

Recommended test file: tests/test_rag_sync_none_error.py (new file)

The test pattern:

def test_rag_sync_does_not_fail_with_none_error(controller_with_rag_enabled):
    # controller_with_rag_enabled: a fixture that:
    #   - Creates an AppController
    #   - Sets rag_enabled=True, rag_source='mock', files_base_dir=tmp_path
    #   - Submits the sync
    #   - Waits for the sync to complete (poll _rag_sync_dirty or rag_status)
    status = controller.rag_status
    assert "error" not in status, f"RAG sync failed unexpectedly: {status}"
    # OR
    assert status == "ready", f"Expected 'ready', got: {status}"

The diagnostic step:

Run the test; capture the full error message
Add a sys.stderr.write traceback capture in the except clause at src/app_controller.py:1479
Find the actual line where the .get() is called on None
Document the root cause in the commit message (so the fix is traceable)

3.2 The fix

The fix depends on what the investigation finds. Three likely scenarios:

Scenario A: A config field is None (most likely)

Example: If self.rag_config.embedding_provider is somehow None when the setter for rag_source is called, the engine init would fail.
Fix: Add a guard in the setter: if not self.rag_config: return and a fallback in the engine init: if self.config.embedding_provider is None: raise ValueError("embedding_provider must be set before rag_enabled").
Files affected: src/rag_engine.py, possibly src/app_controller.py

Scenario B: A dict access is failing on a ChromaDB response

Example: _validate_collection_dim_result line 149: embeddings = res.get("embeddings") if isinstance(res, dict) else None. If chromadb returns a different object type, the .get() is skipped (None is returned) but the call downstream may fail.
Fix: Add more defensive guards or correct the type check.
Files affected: src/rag_engine.py

Scenario C: A side effect of a previous test (subprocess state pollution)

Example: A prior test in the live_gui subprocess left the RAG config in a bad state.
Fix: Reset the RAG config in the test's setup or use live_gui.reset_session().
Files affected: The test (no production code change)

The implementer MUST follow the TDD protocol: write the reproducing test, run it, observe the failure, trace the root cause, fix it, run the test again, verify all 3 RAG tests pass.

3.3 Test verification

After the fix:

The 3 RAG tests pass in isolation
The 3 RAG tests pass in batched run (scripts/run_tests_batched.py)
The full test suite has 1285 pass (was 1282) + 4 skip + 0 fail (was 3)
No regression in test_rag_engine.py (9+ tests), test_rag_engine_result.py, test_rag_engine_ready_status_bug.py, test_rag_gui_presence.py, test_rag_integration.py, test_sync_rag_engine_coalescing.py, test_rag_phase4_stress.py (after the fix)

3.4 Documentation

Update docs/guide_rag.md (if it exists; check first) with:

A short note about the fix (1 paragraph)
A troubleshooting entry if the error is likely to recur: "If rag_status shows 'NoneType' object has no attribute 'get', check that rag_config.embedding_provider is set before rag_enabled."

If docs/guide_rag.md does not exist, no new doc is needed (the per-source-file guide is the wrong place for this; the test file's docstring or the commit message is sufficient).

4. Architecture Reference

4.1 The RAG sync pipeline

The RAG sync is initiated when any of the RAG-related setters is called (rag_enabled, rag_source, rag_emb_provider, rag_chunk_size, rag_chunk_overlap, etc.):

[Set rag_* property] -> [setter calls _sync_rag_engine()] -> [token + dirty flag update]
                                                                    |
                                                                    v
                                          [submit_io(_do_rag_sync(token))] -> [IO pool worker]
                                                                                     |
                                                                                     v
                                                                            [_do_rag_sync body]
                                                                                     |
                                                                                     v
                                           [RAGEngine(config, base_dir) construction]
                                                                                     |
                                                                                     v
                                  [if engine.is_empty() and self.files -> _rebuild_rag_index()]
                                                                                     |
                                                                                     v
                                                        [set _set_rag_status("ready" | "error: ...")]

4.2 The mock branch

The RAGConfig().vector_store.provider defaults to 'mock'. When the engine init hits this branch:

elif vs_config.provider == 'mock':
    self.client = "mock"
    self.collection = "mock"
    return Result(data=None)

The engine is "empty" (is_empty() returns True for mock). _rebuild_rag_index is NOT called. The status should be "ready" immediately.

4.3 The coalescing pattern

The token + dirty flag pattern in _sync_rag_engine ensures that N rapid setter calls produce ONE sync, not N parallel syncs. This is the pattern from test_infrastructure_hardening_20260609 track. The token check at line 1463 short-circuits superseded syncs.

4.4 The status update mechanism

self._set_rag_status(status) appends a task to _pending_gui_tasks. The GUI render loop processes the queue and updates the rag_status field. The test polls client.get_value('rag_status') to wait for the update.

5. Test Plan

5.1 Per-phase test verification

Phase	Test command	Expected
1	`uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase1_red.log`	3/3 fail with the NoneType.get error
2	(after fix) `uv run pytest tests/test_rag_phase4_final_verify.py tests/test_rag_phase4_stress.py tests/test_rag_visual_sim.py -v 2>&1 \| tee tests/artifacts/rag_track_phase2_green.log`	3/3 pass
3	(full suite) `uv run pytest tests/ 2>&1 \| tee tests/artifacts/rag_track_phase3_full.log`	1285 pass + 4 skip + 0 fail
4	(batched) `uv run .\scripts\run_tests_batched.py 2>&1 \| tee tests/artifacts/rag_track_phase4_batched.log`	All tiers PASS; no failures

5.2 TDD red verification

For each new test or fix:

Verify the test FAILS as expected (red phase)
Implement the fix
Verify the test PASSES (green phase)
Verify no regression in the previously-passing tests
Commit

Anti-pattern guard: per AGENTS.md "Critical Anti-Patterns", no skipping tests just because they fail. The 3 RAG tests are the actual problem to solve; the implementer must find and fix the root cause.

5.3 The diagnostic strategy

If the implementer can't find the bug from the error message alone:

Add import traceback; sys.stderr.write(traceback.format_exc()) to the except clause in src/app_controller.py:1479-1482
Run the test; capture the full traceback
Find the actual .get(None) call
Document the traceback in the commit message (so the fix is traceable)
Remove the diag traceback after the fix is verified

6. Migration Strategy

This is a small bug-fix track. The phases are simple:

Phase 1: Investigation + reproducing test
Phase 2: Fix
Phase 3: Full test suite + batched verification
Phase 4: Docs update
Phase 5: Metadata + tracks.md

The order doesn't matter much (it's all one fix); the implementer can iterate between Phase 1 and 2 as needed.

7. Out of Scope

7.1 Deferred to separate tracks

ID	Item	Defer to	Why
OOS1	The `send_result` → `send` mass rename (user's stated intent)	User's manual refactor after this track	The user wants to do this themselves. The Result API is stable; only the function name changes.
OOS2	23 lower-impact files with weak types (per `data_structure_strengthening_20260606/spec.md` §1 line 20)	`data_structure_strengthening_20260606` (the next major track)	That's the data_structure track's scope.
OOS3	`live_gui_mock_injection_20260615` infrastructure	Separate infrastructure track	Not blocking. Recommended but not required.
OOS4	The full RAG test cleanup (e.g., removing `time.sleep(0.5)` patterns in favor of poll loops)	Separate RAG test quality track	The tests are functional; this is a test-quality improvement, not a bug fix.
OOS5	The Gemini CLI thinking-format path	Defer to `doeh_test_thinking_cleanup_20260615` follow-up	Not in this track's scope.
OOS6	The `RAGConfig` data structure improvements (e.g., nested validation)	`data_structure_strengthening_20260606`	Not blocking the bug fix.

7.2 Explicitly NOT in this track

The user wants to do a send_result → send mass rename after this track. Do not do it in this track. The bug fix is for RAG only.
A general RAG test quality cleanup (poll loops, error message improvements, etc.) — out of scope; only fix the specific bug.
The _rebuild_rag_index method's complex error handling — out of scope; only fix the specific bug.

8. Risks & Mitigations

ID	Risk	Likelihood	Impact	Mitigation
R1	The fix breaks an unrelated test	Low	Medium	Run the full test suite in Phase 3 + the batched test in Phase 4. If a new failure appears, STOP and report.
R2	The bug is in a hard-to-reach code path (deep in IO pool worker)	Medium	Medium	Add diagnostic traceback in the except clause; capture the actual error site; document in the commit message.
R3	The fix is in the test (subprocess state pollution) not the production code	Low	Low	If the fix is in the test, document this in the commit message. Consider adding a teardown reset in the test.
R4	The fix introduces a regression in `test_rag_engine_ready_status_bug.py`	Low	Medium	Run the full RAG test suite after the fix.
R5	The implementation is larger than the 2-line fix suggested by the spec	Low	Low	The spec is a guide, not a contract. If the fix is larger (e.g., a larger refactor is needed), the Tier 2 reports and the user decides whether to expand scope. The user's overall plan is 2 more tracks (this + a `send_result` → `send` rename) before the data structure track.

9. Verification Criteria (definition of "done")

The track is DONE when ALL of the following are true:

G1: A reproducing test exists that fails before the fix
G2: All 3 RAG tests pass (test_rag_phase4_final_verify, test_rag_phase4_stress, test_rag_visual_sim)
G3: A defensive guard or proper error message is added (so future debug is easier)
G4: docs/guide_rag.md updated (if it exists)
NF1: No new regressions in the full test suite (1285 pass + 4 skip + 0 fail)
NF2: Per-task atomic commits (1-3 commits total)
NF3: 1-space indentation + no comments + type hints preserved
NF4: Per-commit git notes attached

Test count math:

Pre-track baseline: 1282 pass + 4 skip + 3 fail
After this track: 1285 pass + 4 skip + 0 fail (3 newly-passing)
This is the FIRST time the project is fully green since data_oriented_error_handling_20260606 shipped on 2026-06-12.

10. Execution Order & Dependencies

No external blockers. This track can start immediately after the Tier 1 review approves the spec.

Execution order (the plan):

Phase 1: Investigation + reproducing test
Phase 2: Fix
Phase 3: Full test suite + batched verification
Phase 4: Docs update
Phase 5: Metadata + tracks.md

Total: 5 phases, ~10 tasks, 4 atomic commits (1 fix + 1 docs + 1 metadata + 1 final-state); all with git notes.

Followed by: the user can do the send_result → send mass rename themselves, then start data_structure_strengthening_20260606 track.

11. References

Architecture docs

docs/guide_rag.md (if it exists) — RAG subsystem architecture
docs/guide_app_controller.md — the AppController._do_rag_sync method is the entry point
docs/guide_testing.md — live_gui fixture + structural testing contract

Styleguides

conductor/code_styleguides/error_handling.md — Result[T] pattern (used by RAGEngine._init_vector_store_result)
conductor/code_styleguides/data_oriented_design.md — the canonical DOD reference

Source code (the relevant lines)

src/app_controller.py:1451-1488 — _sync_rag_engine and _do_rag_sync (the entry points)
src/app_controller.py:1490-1497 — rag_enabled property + setter (triggers the sync)
src/app_controller.py:3016-3023 — _set_rag_status (sets the error status)
src/app_controller.py:3025-3056 — _rebuild_rag_index (the second worker)
src/rag_engine.py:88-128 — RAGEngine.__init__ and _init_vector_store_result
src/rag_engine.py:130-166 — _validate_collection_dim_result (the most likely .get() call site)
src/models.py:1039-1065 — RAGConfig and VectorStoreConfig

Parent tracks

conductor/tracks/data_oriented_error_handling_20260606/spec.md §12.1 — the follow-up scope that included RAG fixes
conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md — the parent track that documented 4 RAG failures remaining (1 was inadvertently fixed)
docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md §3 deviation #2.3 — the test_rag_integration.py fix (commit 26e1b652)

Test files (the 3 to fix)

tests/test_rag_phase4_final_verify.py::test_phase4_final_verify (tier-3 live_gui)
tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim (tier-3 live_gui)
tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim (tier-3 live_gui)

Already-passing RAG tests (do NOT regress)

tests/test_rag_engine.py (8+ tests)
tests/test_rag_engine_result.py (3+ tests)
tests/test_rag_engine_ready_status_bug.py (3+ tests)
tests/test_rag_gui_presence.py (2 tests)
tests/test_rag_integration.py::test_rag_integration (1 test; was failing pre-public_api, fixed by commit 26e1b652)
tests/test_sync_rag_engine_coalescing.py (4+ tests)

User's stated intent (after this track)

send_result → send mass rename (user will do manually)
Then data_structure_strengthening_20260606 track

23 KiB Raw Blame History