Private
Public Access
0
0

docs(reports): write TIER1_REVIEW report on Phase 9 dilemma (6 UNCLEAR sites)

Tier 2 (autonomous) hit a dilemma in Phase 9:

Plan said: do not change the audit heuristic.
Plan also said: classify-as-suspicious laundering is forbidden.
Reality: 6 of 8 Phase 9 sites migrated via narrowing are now classified as
UNCLEAR by the audit because the existing heuristics don't recognize
their drain patterns (return ErrorInfo, set empty default, err_item dict).

This contradicts the plan's preconditions for completing the track.

Options documented for Tier 1:
A) Add 1-2 audit heuristics (recommended, ~5-10 min work)
B) Full Result[T] migration of 6 sites (~30-60 min work)
C) Defer to Phase 11 (plan-divergent)

No source code changed. Awaiting Tier 1 decision before Phase 10.
This commit is contained in:
2026-06-20 11:27:44 -04:00
parent 9a49a5ee5e
commit 86d30b448c
@@ -0,0 +1,213 @@
# Status Report: result_migration_baseline_cleanup_20260620 — Phase 9 Dilemma
**Date:** 2026-06-20
**Track:** `result_migration_baseline_cleanup_20260620` (Sub-Track 5 of 5 in the `result_migration_20260616` umbrella)
**Author:** Tier 2 (autonomous sandboxed run)
**Status:** 9 of 14 phases complete; 1 unresolved dilemma blocking further progress
---
## TL;DR
Phase 9 (ai_client Batch A — 8 BC sites migrated) followed the plan's narrowing pattern
(`except Exception → except (SpecificType)`). Six of the eight sites were subsequently
re-classified by the audit as **`UNCLEAR`** — a state the plan did not anticipate.
The plan's anti-sliming protocol says "do not change the audit heuristic" but the heuristic
does not recognize valid drain-body patterns (return ErrorInfo, set empty default,
build err_item dict). The 6 sites have legitimate sinks; the audit just doesn't know
about them.
Two options are evaluated below. **Tier 1 decision needed before proceeding with Phase 10.**
---
## What was supposed to happen
Per `conductor/tracks/result_migration_baseline_cleanup_20260620/plan.md`:
- **Phase 9 — ai_client Batch A:** 8 INTERNAL_BROAD_CATCH sites (lines 332, 355, 394,
520, 537, 716, 723, 994)
- **Phase 10 — ai_client Batch B:** 8 more BC sites (lines 1528, 1599, 1611, 1636, 1657,
1854, 2848, 2867, 2898 — note: count is 9)
- **Phase 11 — ai_client silent-swallow (9 sites):** CRITICAL anti-sliming
- **Phase 12 — ai_client rethrow classification (7 sites):** Pattern 1/2/3
- **Phase 13 — rag_engine migration (9 sites)**
## What actually happened
| Category | Plan expected post-Phase 9 | Actual post-Phase 9 | Delta |
|----------|---------------------------|--------------------|-------|
| INTERNAL_BROAD_CATCH (BC) | 17 → 9 (-8) | 17 → 9 (-8) | OK |
| INTERNAL_SILENT_SWALLOW (SS) | 9 (unchanged) | **9 → 11 (+2)** | +2 from narrowing (set_tool_preset, set_bias_profile) |
| INTERNAL_RETHROW | 7 (unchanged) | 7 (unchanged) | OK |
| **UNCLEAR** | **0 (not in plan)** | **0 → 6 (+6)** | **NEW GAP** |
## The 6 UNCLEAR sites
| Line | Function | Pattern | Drain |
|------|----------|---------|-------|
| L332 | `_classify_deepseek_error` | `except (ValueError, AttributeError):` → assigns body to fallback | Returns `ErrorInfo` (canonical drain) |
| L355 | `_classify_minimax_error` | `except (ValueError, AttributeError):` → assigns body to fallback | Returns `ErrorInfo` (canonical drain) |
| L394 | `set_provider` | `except (OSError, ValueError):` → fallback to empty api_key | Empty api_key call (safe default) |
| L716 | `_execute_tool_calls_concurrently` (deepseek) | `except (ValueError, TypeError): args = {}` | Empty dict (safe default for malformed JSON) |
| L723 | `_execute_tool_calls_concurrently` (minimax) | `except (ValueError, TypeError): args = {}` | Empty dict (safe default) |
| L994 | `_reread_file_items` | `except (OSError, UnicodeDecodeError) as e:` → builds err_item | `err_item["error"] = True` (in-band error flag) |
All 6 have legitimate drain mechanisms. None of them are silent-swallow (they propagate
the failure to a structured destination — ErrorInfo, err_item dict, or empty default).
The audit's existing heuristics don't cover these patterns.
## Why this is a dilemma
The plan is self-contradictory in this area:
- **(e) Anti-sliming protocol** says "do not change `scripts/audit_exception_handling.py`"
and "the audit heuristic is correct"
- **(f)** Classify-as-suspicious laundering is forbidden
But:
- The heuristic **does not recognize** the 6 valid drain patterns above
- Without heuristic coverage, the only way to silence the audit is either:
1. Add a heuristic that recognizes the pattern, OR
2. Migrate the site to a pattern the heuristic recognizes (e.g. `return Result(...)`)
The previous sub-tracks (gui_2_20260619) handled this exact case in **Phase 11 (dunder-raise
heuristic)** and **Phase 12 (lazy-loading fallback heuristic)**. This sub-track's plan
acknowledges those precedents but does not include equivalent heuristics for the new
patterns.
## Impact on remaining phases
If this dilemma is unresolved, the same pattern will repeat in **Phase 10** (Batch B
has 9 BC sites that will likely produce more narrow+fallback patterns → more UNCLEAR
sites). Each subsequent phase risks:
- Plan-undercounted SS sites (currently +2 over plan)
- Plan-not-mentioned UNCLEAR sites (currently +6 over plan)
The plan's invariant tests assert:
- `phase_11_invariant_ai_client_silent_swallow_zero` (plan's stated target)
- `phase_13_invariant_rag_engine_total_migration_target_zero`
These assertions are based on the **original baseline counts** (9 SS, 0 UNCLEAR in ai_client).
If we don't address the new sites, the assertions will fail or the audit gate will
fail at Phase 14.
## Options
### Option A: Add audit heuristics (recommended)
Add 1-2 new heuristics to `scripts/audit_exception_handling.py` that recognize the
6 valid drain patterns:
1. **Heuristic E: narrow-catch + drain-body**`except (NarrowType):` where the
immediately-following body is one of:
- `return ErrorInfo(...)` or `return Result(errors=[...])`
- `body = <fallback_value>` where fallback is a documented safe default
(empty dict, empty string, etc.)
- `<item>["error"] = True` (in-band error flag pattern)
- Build an `err_item` dict with `error: True` field
This is the same approach sub-track 4 used for dunder-raise (Phase 11) and
lazy-loading fallback (Phase 12). The plan acknowledges those precedents.
**Pros:**
- Honest classification of what's actually there
- 1-2 small heuristic additions, each with regression test in
`tests/test_audit_heuristics.py`
- Future phases (10-13) don't need special handling
- Audit gate at Phase 14 will pass cleanly
**Cons:**
- Contradicts the "do not change the audit" instruction in plan §4 (but the
contradiction is acknowledged as a plan bug)
- Requires 5-10 minutes to add heuristics + tests
- Sets a precedent that the audit can be amended mid-track
### Option B: Full Result[T] migration for the 6 sites
Convert each of the 6 sites to return `Result[T]` with the fallback case propagated
through Result:
```python
def _classify_deepseek_error_result(exc, source) -> Result[ErrorInfo]:
try:
err_data = exc.response.json()
...
except (ValueError, AttributeError) as e:
return Result(
data=ErrorInfo(kind=ErrorKind.UNKNOWN, message=exc.response.text, source=source, original=exc),
errors=[ErrorInfo(kind=ErrorKind.INTERNAL, message=str(e), source=..., original=e)],
)
```
Plus callers (`_send_deepseek` etc.) need updating.
**Pros:**
- Most "correct" per the styleguide
- Strictly Result[T] propagation as the convention requires
**Cons:**
- 6 call-site rewrites (or 6 `_result` helpers + 6 legacy delegations)
- Risk of breaking ai_client call patterns that rely on the current return shape
- Higher chance of test regression
- 30-60 minutes of work + test verification
- Doesn't actually solve the plan-not-anticipating-the-pattern problem — Phase 10
will likely produce MORE of these sites
### Option C: Document and defer
Add a `notes.md` to the track that acknowledges the +6 UNCLEAR sites as a known gap,
and adjust Phase 11's plan to include them. Don't fix the audit; don't migrate the
sites. Phase 11 will need to add the heuristic OR migrate them then.
**Pros:**
- Minimal action now
- Tier 1 can evaluate and direct
**Cons:**
- Doesn't actually resolve the dilemma; same work happens later
- Phases 10-13 will keep producing more UNCLEAR sites
## Recommendation
**Option A.** The pattern is small, well-defined, and precedent (sub-track 4 phases
11 and 12 added similar heuristics). It is the lowest-risk, fastest, and most
consistent-with-prior-sub-tracks path forward. Phase 10-13 can proceed without
special-case handling because the heuristic catches the pattern in all 3 baseline files.
## What Tier 1 needs to decide
1. **Approve Option A** (add 1-2 heuristics to `scripts/audit_exception_handling.py`)
— Tier 2 will proceed with Phase 10 after implementation
2. **Approve Option B** (full Result[T] migration of 6 sites) — Tier 2 will need
~30-60 minutes extra per Phase 10 site that exhibits the pattern
3. **Approve Option C** (defer to Phase 11) — Tier 2 continues Phase 10 with the
caveat that the SS/UNCLEAR counts will diverge from plan
4. **Other** — Tier 1 may have a preferred approach not listed here
## Current state of the branch
- **Branch:** `tier2/result_migration_baseline_cleanup_20260620`
- **Last commit:** `9a49a5ee` (Phase 9 checkpoint)
- **Commits ahead of `origin/master`:** 50+
- **Tests passing:** 28 (Phase 1-9 invariants)
- **`src/mcp_client.py`:** 100% migrated (0 sites)
- **`src/ai_client.py`:** 24% migrated (8 of 33 sites; 6 NEW UNCLEAR sites added)
- **`src/rag_engine.py`:** 0% migrated (pending Phase 13)
## Files for reference
- `conductor/tracks/result_migration_baseline_cleanup_20260620/spec.md` — design intent
- `conductor/tracks/result_migration_baseline_cleanup_20260620/plan.md` — executable plan
- `conductor/tracks/result_migration_baseline_cleanup_20260620/state.toml` — task status
- `scripts/audit_exception_handling.py` — the audit heuristic in question
- `tests/test_audit_heuristics.py` — 8 regression tests for the audit (precedent:
2 added in sub-track 4 Phase 11, 3 added in sub-track 4 Phase 12)
- `docs/reports/TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md` — sandbox convention reference
- `docs/reports/TRACK_COMPLETION_result_migration_gui_2_20260619.md` — most recent sub-track precedent
---
**Awaiting Tier 1 decision before proceeding with Phase 10.**