docs(report): add deviation #2.5 for test_headless_verification fix
The headless batch hang the user reported was caused by an xdist worker
crash on test_headless_verification_full_run, not a test logic failure.
The same root cause as the 4 Phase 2 follow-ups (mock returns raw string
but production does 'if not result.ok:'), but with a different failure
mode (worker crash that hangs the batched test runner).
Documented in section 3 of the report as deviation #2.5 with:
- Where it went wrong (missed in the 4 follow-ups)
- The specific symptom in the user's session
- The fix (out-of-band commit e35b6a34)
- Lesson for the next spec (verification must include xdist mode)
This commit is contained in:
@@ -22,7 +22,7 @@ Two concerns, one track — both shipped:
|
||||
|
||||
**(B) UI Polish Test Cleanup:** 2 broken test assertions fixed (`find()` → `rfind()` to locate the actual code instead of the comment block). The production code was already correct (user commits `d0b06575` and `df7bda6e`); the test bug was just the search logic.
|
||||
|
||||
**Result:** 12 pre-existing test failures fixed (6 from the spec + 6 more discovered in Phase 2 follow-ups). 4 RAG failures remain (deferred to a separate RAG track per spec §7.1 OOS1).
|
||||
**Result:** 13 pre-existing test failures fixed (6 from the spec + 6 more discovered in Phase 2 follow-ups + 1 out-of-band that caused the headless batch hang). 4 RAG failures remain (deferred to a separate RAG track per spec §7.1 OOS1).
|
||||
|
||||
**CRITICAL for the next track you plan:** The user has expressed intent to **mass-rename `send_result` to `send`** in a future refactor (stated during the run: "when we do I'm going to rename all send_result to send via mass refactor"). The track is designed so this rename is mechanical: the `Result[T]` return type is stable; only the public function name changes. The next track should plan for this rename and decide whether to keep `Result[T]` semantics or revert to `Optional[str]`.
|
||||
|
||||
@@ -54,8 +54,8 @@ Two concerns, one stability track. Per spec §0: "This is a **stability track**
|
||||
| **Test mock bug (G16, qwen)** | 1 | 2 tests in `test_qwen_provider.py` asserted against raw `str`; production returns `Result[str]` after the `data_oriented_error_handling` refactor |
|
||||
| **UI Polish test bug (G17, G18)** | 2 | `find()` located the comment block, not the code; `rfind()` fixes |
|
||||
| **Deprecation removal (G19)** | 1 | `send()` function + `filterwarnings` + `test_deprecation_warnings.py` |
|
||||
| **Discovered in implementation (not in spec)** | 7 | Production-affected test mocks (4 files: `test_conductor_tech_lead.py`, `test_orchestration_logic.py`, `test_orchestrator_pm.py`, `test_orchestrator_pm_history.py`, `test_phase6_engine.py`, `test_run_worker_lifecycle_abort.py`, `test_spawn_interception_v2.py`) + 4 follow-up mock-return-value fixes |
|
||||
| **Total** | **27 gaps** | (19 documented in metadata + 7 discovered in Phase 2 + 4 follow-up) |
|
||||
| **Discovered in implementation (not in spec)** | 7 | Production-affected test mocks (4 files: `test_conductor_tech_lead.py`, `test_orchestration_logic.py`, `test_orchestrator_pm.py`, `test_orchestrator_pm_history.py`, `test_phase6_engine.py`, `test_run_worker_lifecycle_abort.py`, `test_spawn_interception_v2.py`) + 4 follow-up mock-return-value fixes + 1 out-of-band headless_verification fix |
|
||||
| **Total** | **28 gaps** | (19 documented in metadata + 7 discovered in Phase 2 + 4 follow-up + 1 out-of-band) |
|
||||
|
||||
### 1.2 Symptoms (as code-discovered during Phase 1.1)
|
||||
|
||||
@@ -208,6 +208,26 @@ A `rg "ai_client\.send|ai_client, ['\"]send['\"]|ai_client\.send\("` would have
|
||||
|
||||
**Lesson for the next spec:** A spec for a Result-based refactor should grep for `return_value="..."` and `return_value='...'` patterns that match the migrated function name. The script `scripts/audit_weak_types.py` does NOT catch this category (it catches `dict[str, Any]` annotations, not mock patterns).
|
||||
|
||||
### #2.5: Out-of-band fix — `test_headless_verification.py` caused the headless batch hang (CRITICAL for batched runs)
|
||||
|
||||
**Where this went wrong:** A 5th test file (`tests/test_headless_verification.py::test_headless_verification_full_run`) had the same raw-string mock pattern. It was missed in the original 4 follow-ups because:
|
||||
1. The 4 follow-ups targeted the 4 test files the user reported during the run
|
||||
2. The headless test fails in a different mode under xdist: the xdist worker crashes with `node down: Not properly terminated` rather than reporting a test failure
|
||||
3. The batched test runner (`scripts/run_tests_batched.py`) reads from a pipe; when the worker dies ungracefully, the master process waits forever for the pipe to close, hanging the entire `tier-1-unit-headless` batch
|
||||
|
||||
**Symptom in the user's session:**
|
||||
```
|
||||
>>> Running tier-1-unit-headless (2 files)
|
||||
[gw0] [ 33%] PASSED tests/test_headless_simulation.py::test_mma_track_lifecycle_simulation
|
||||
Why does this never end...
|
||||
```
|
||||
|
||||
**The fix:** Out-of-band commit `e35b6a34` — wrapped the mock return in `Result(data="...")`. 2/2 tests pass under xdist; full headless batch (14 tests) completes in 18.7s.
|
||||
|
||||
**Lesson for the next spec:** A spec for a Result-based refactor should include a verification step that runs the full test suite under `pytest -n auto` (not just single-file or no-xdist mode). A test that "passes in isolation" can still hang under xdist, and a hang blocks the entire batched test runner.
|
||||
|
||||
**Why this was out-of-band rather than Phase 2:** The user's earlier report only mentioned 3 specific test failures (`test_token_reduction_logging`, `test_rag_integration`, `test_run_worker_lifecycle_uses_strategy`). I fixed those 3 plus 1 more (`test_conductor_engine_v2.py` which the user fixed manually but the mocks still returned raw strings). The headless_verification test was a 5th file not in the user's report, and its failure mode is qualitatively different (xdist worker crash, not test failure).
|
||||
|
||||
### #3: Track work was done on `master` directly, not a feature branch (PROCEDURAL)
|
||||
|
||||
**What happened:** The track was created on `master` and the work was committed directly to `master` over 31 atomic commits + 6 phase checkpoints. There is no feature branch to merge.
|
||||
|
||||
Reference in New Issue
Block a user