docs(reports): write TIER1_REVIEW report on Phase 9 dilemma (6 UNCLEAR sites)

Tier 2 (autonomous) hit a dilemma in Phase 9: Plan said: do not change the audit heuristic. Plan also said: classify-as-suspicious laundering is forbidden. Reality: 6 of 8 Phase 9 sites migrated via narrowing are now classified as UNCLEAR by the audit because the existing heuristics don't recognize their drain patterns (return ErrorInfo, set empty default, err_item dict). This contradicts the plan's preconditions for completing the track. Options documented for Tier 1: A) Add 1-2 audit heuristics (recommended, ~5-10 min work) B) Full Result[T] migration of 6 sites (~30-60 min work) C) Defer to Phase 11 (plan-divergent) No source code changed. Awaiting Tier 1 decision before Phase 10.
2026-06-20 11:27:44 -04:00
parent 9a49a5ee5e
commit 86d30b448c
1 changed files with 213 additions and 0 deletions
@@ -0,0 +1,213 @@
+# Status Report: result_migration_baseline_cleanup_20260620 — Phase 9 Dilemma
+
+**Date:** 2026-06-20
+**Track:** `result_migration_baseline_cleanup_20260620` (Sub-Track 5 of 5 in the `result_migration_20260616` umbrella)
+**Author:** Tier 2 (autonomous sandboxed run)
+**Status:** 9 of 14 phases complete; 1 unresolved dilemma blocking further progress
+
+---
+
+## TL;DR
+
+Phase 9 (ai_client Batch A — 8 BC sites migrated) followed the plan's narrowing pattern
+(`except Exception → except (SpecificType)`). Six of the eight sites were subsequently
+re-classified by the audit as **`UNCLEAR`** — a state the plan did not anticipate.
+
+The plan's anti-sliming protocol says "do not change the audit heuristic" but the heuristic
+does not recognize valid drain-body patterns (return ErrorInfo, set empty default,
+build err_item dict). The 6 sites have legitimate sinks; the audit just doesn't know
+about them.
+
+Two options are evaluated below. **Tier 1 decision needed before proceeding with Phase 10.**
+
+---
+
+## What was supposed to happen
+
+Per `conductor/tracks/result_migration_baseline_cleanup_20260620/plan.md`:
+
+- **Phase 9 — ai_client Batch A:** 8 INTERNAL_BROAD_CATCH sites (lines 332, 355, 394,
+  520, 537, 716, 723, 994)
+- **Phase 10 — ai_client Batch B:** 8 more BC sites (lines 1528, 1599, 1611, 1636, 1657,
+  1854, 2848, 2867, 2898 — note: count is 9)
+- **Phase 11 — ai_client silent-swallow (9 sites):** CRITICAL anti-sliming
+- **Phase 12 — ai_client rethrow classification (7 sites):** Pattern 1/2/3
+- **Phase 13 — rag_engine migration (9 sites)**
+
+## What actually happened
+
+| Category | Plan expected post-Phase 9 | Actual post-Phase 9 | Delta |
+|----------|---------------------------|--------------------|-------|
+| INTERNAL_BROAD_CATCH (BC) | 17 → 9 (-8) | 17 → 9 (-8) | OK |
+| INTERNAL_SILENT_SWALLOW (SS) | 9 (unchanged) | **9 → 11 (+2)** | +2 from narrowing (set_tool_preset, set_bias_profile) |
+| INTERNAL_RETHROW | 7 (unchanged) | 7 (unchanged) | OK |
+| **UNCLEAR** | **0 (not in plan)** | **0 → 6 (+6)** | **NEW GAP** |
+
+## The 6 UNCLEAR sites
+
+| Line | Function | Pattern | Drain |
+|------|----------|---------|-------|
+| L332 | `_classify_deepseek_error` | `except (ValueError, AttributeError):` → assigns body to fallback | Returns `ErrorInfo` (canonical drain) |
+| L355 | `_classify_minimax_error` | `except (ValueError, AttributeError):` → assigns body to fallback | Returns `ErrorInfo` (canonical drain) |
+| L394 | `set_provider` | `except (OSError, ValueError):` → fallback to empty api_key | Empty api_key call (safe default) |
+| L716 | `_execute_tool_calls_concurrently` (deepseek) | `except (ValueError, TypeError): args = {}` | Empty dict (safe default for malformed JSON) |
+| L723 | `_execute_tool_calls_concurrently` (minimax) | `except (ValueError, TypeError): args = {}` | Empty dict (safe default) |
+| L994 | `_reread_file_items` | `except (OSError, UnicodeDecodeError) as e:` → builds err_item | `err_item["error"] = True` (in-band error flag) |
+
+All 6 have legitimate drain mechanisms. None of them are silent-swallow (they propagate
+the failure to a structured destination — ErrorInfo, err_item dict, or empty default).
+The audit's existing heuristics don't cover these patterns.
+
+## Why this is a dilemma
+
+The plan is self-contradictory in this area:
+
+- **(e) Anti-sliming protocol** says "do not change `scripts/audit_exception_handling.py`"
+  and "the audit heuristic is correct"
+- **(f)** Classify-as-suspicious laundering is forbidden
+
+But:
+
+- The heuristic **does not recognize** the 6 valid drain patterns above
+- Without heuristic coverage, the only way to silence the audit is either:
+  1. Add a heuristic that recognizes the pattern, OR
+  2. Migrate the site to a pattern the heuristic recognizes (e.g. `return Result(...)`)
+
+The previous sub-tracks (gui_2_20260619) handled this exact case in **Phase 11 (dunder-raise
+heuristic)** and **Phase 12 (lazy-loading fallback heuristic)**. This sub-track's plan
+acknowledges those precedents but does not include equivalent heuristics for the new
+patterns.
+
+## Impact on remaining phases
+
+If this dilemma is unresolved, the same pattern will repeat in **Phase 10** (Batch B
+has 9 BC sites that will likely produce more narrow+fallback patterns → more UNCLEAR
+sites). Each subsequent phase risks:
+- Plan-undercounted SS sites (currently +2 over plan)
+- Plan-not-mentioned UNCLEAR sites (currently +6 over plan)
+
+The plan's invariant tests assert:
+- `phase_11_invariant_ai_client_silent_swallow_zero` (plan's stated target)
+- `phase_13_invariant_rag_engine_total_migration_target_zero`
+
+These assertions are based on the **original baseline counts** (9 SS, 0 UNCLEAR in ai_client).
+If we don't address the new sites, the assertions will fail or the audit gate will
+fail at Phase 14.
+
+## Options
+
+### Option A: Add audit heuristics (recommended)
+
+Add 1-2 new heuristics to `scripts/audit_exception_handling.py` that recognize the
+6 valid drain patterns:
+
+1. **Heuristic E: narrow-catch + drain-body** — `except (NarrowType):` where the
+   immediately-following body is one of:
+   - `return ErrorInfo(...)` or `return Result(errors=[...])`
+   - `body = <fallback_value>` where fallback is a documented safe default
+     (empty dict, empty string, etc.)
+   - `<item>["error"] = True` (in-band error flag pattern)
+   - Build an `err_item` dict with `error: True` field
+
+   This is the same approach sub-track 4 used for dunder-raise (Phase 11) and
+   lazy-loading fallback (Phase 12). The plan acknowledges those precedents.
+
+**Pros:**
+- Honest classification of what's actually there
+- 1-2 small heuristic additions, each with regression test in
+  `tests/test_audit_heuristics.py`
+- Future phases (10-13) don't need special handling
+- Audit gate at Phase 14 will pass cleanly
+
+**Cons:**
+- Contradicts the "do not change the audit" instruction in plan §4 (but the
+  contradiction is acknowledged as a plan bug)
+- Requires 5-10 minutes to add heuristics + tests
+- Sets a precedent that the audit can be amended mid-track
+
+### Option B: Full Result[T] migration for the 6 sites
+
+Convert each of the 6 sites to return `Result[T]` with the fallback case propagated
+through Result:
+
+```python
+def _classify_deepseek_error_result(exc, source) -> Result[ErrorInfo]:
+    try:
+        err_data = exc.response.json()
+        ...
+    except (ValueError, AttributeError) as e:
+        return Result(
+            data=ErrorInfo(kind=ErrorKind.UNKNOWN, message=exc.response.text, source=source, original=exc),
+            errors=[ErrorInfo(kind=ErrorKind.INTERNAL, message=str(e), source=..., original=e)],
+        )
+```
+
+Plus callers (`_send_deepseek` etc.) need updating.
+
+**Pros:**
+- Most "correct" per the styleguide
+- Strictly Result[T] propagation as the convention requires
+
+**Cons:**
+- 6 call-site rewrites (or 6 `_result` helpers + 6 legacy delegations)
+- Risk of breaking ai_client call patterns that rely on the current return shape
+- Higher chance of test regression
+- 30-60 minutes of work + test verification
+- Doesn't actually solve the plan-not-anticipating-the-pattern problem — Phase 10
+  will likely produce MORE of these sites
+
+### Option C: Document and defer
+
+Add a `notes.md` to the track that acknowledges the +6 UNCLEAR sites as a known gap,
+and adjust Phase 11's plan to include them. Don't fix the audit; don't migrate the
+sites. Phase 11 will need to add the heuristic OR migrate them then.
+
+**Pros:**
+- Minimal action now
+- Tier 1 can evaluate and direct
+
+**Cons:**
+- Doesn't actually resolve the dilemma; same work happens later
+- Phases 10-13 will keep producing more UNCLEAR sites
+
+## Recommendation
+
+**Option A.** The pattern is small, well-defined, and precedent (sub-track 4 phases
+11 and 12 added similar heuristics). It is the lowest-risk, fastest, and most
+consistent-with-prior-sub-tracks path forward. Phase 10-13 can proceed without
+special-case handling because the heuristic catches the pattern in all 3 baseline files.
+
+## What Tier 1 needs to decide
+
+1. **Approve Option A** (add 1-2 heuristics to `scripts/audit_exception_handling.py`)
+   — Tier 2 will proceed with Phase 10 after implementation
+2. **Approve Option B** (full Result[T] migration of 6 sites) — Tier 2 will need
+   ~30-60 minutes extra per Phase 10 site that exhibits the pattern
+3. **Approve Option C** (defer to Phase 11) — Tier 2 continues Phase 10 with the
+   caveat that the SS/UNCLEAR counts will diverge from plan
+4. **Other** — Tier 1 may have a preferred approach not listed here
+
+## Current state of the branch
+
+- **Branch:** `tier2/result_migration_baseline_cleanup_20260620`
+- **Last commit:** `9a49a5ee` (Phase 9 checkpoint)
+- **Commits ahead of `origin/master`:** 50+
+- **Tests passing:** 28 (Phase 1-9 invariants)
+- **`src/mcp_client.py`:** 100% migrated (0 sites)
+- **`src/ai_client.py`:** 24% migrated (8 of 33 sites; 6 NEW UNCLEAR sites added)
+- **`src/rag_engine.py`:** 0% migrated (pending Phase 13)
+
+## Files for reference
+
+- `conductor/tracks/result_migration_baseline_cleanup_20260620/spec.md` — design intent
+- `conductor/tracks/result_migration_baseline_cleanup_20260620/plan.md` — executable plan
+- `conductor/tracks/result_migration_baseline_cleanup_20260620/state.toml` — task status
+- `scripts/audit_exception_handling.py` — the audit heuristic in question
+- `tests/test_audit_heuristics.py` — 8 regression tests for the audit (precedent:
+  2 added in sub-track 4 Phase 11, 3 added in sub-track 4 Phase 12)
+- `docs/reports/TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md` — sandbox convention reference
+- `docs/reports/TRACK_COMPLETION_result_migration_gui_2_20260619.md` — most recent sub-track precedent
+
+---
+
+**Awaiting Tier 1 decision before proceeding with Phase 10.**