manual_slop/conductor/tracks/result_migration_20260616/spec.md
ed 5107f3cad9 Merge branch 'tier2/live_gui_test_fixes_20260618' into tier2/result_migration_small_files_20260617
# Conflicts:
#	conductor/tracks/live_gui_test_fixes_20260618/state.toml
#	docs/reports/RESULT_MIGRATION_SMALL_FILES_20260617.md
#	docs/reports/TRACK_COMPLETION_result_migration_small_files_20260617.md
#	scripts/tier2/failcount.py
#	scripts/tier2/write_report.py
2026-06-18 17:55:05 -04:00


Track Specification: Result Migration (Phase 2 — eliminate all bad exception handling)

Track ID: result_migration_20260616 (umbrella for the 5 sub-tracks below)
Status: Active (spec approved 2026-06-16)
Priority: A (foundational; the 3 refactored baseline files + 5 migration sub-tracks complete the data-oriented error handling convention)
Owner: Tier 2 Tech Lead
Type: refactor (5 sub-tracks, each a separate TDD execution)
Scope: 268 sites across 42 files (per the exception_handling_audit_20260616 audit)
Parent tracks: data_oriented_error_handling_20260606 (shipped 2026-06-12), exception_handling_audit_20260616 (shipped 2026-06-16)
Sibling tracks: data_structure_strengthening_20260606 (planned, parallel; uses the cleaner Result API from this phase)

Note on effort estimates: per the Tier 1 rules (see conductor/workflow.md §"Tier 1 Track Initialization Rules"), this spec does NOT include day estimates. Effort is measured by scope (N files, M sites) and T-shirt size (S/M/L/XL) per sub-track. The user / Tier 2 agent decides the actual pacing.


0. TL;DR

This is the migration phase that completes the data-oriented error handling convention. The 2026-06-12 parent track established the convention; this umbrella track plans 5 sub-tracks that eliminate the remaining 211 violations + 25 suspicious + 32 unclear = 268 "bad" sites across the codebase.

Per-file baseline (per exception_handling_audit_20260616):

| Bucket | Files | V+S sites | What |
|---|---|---|---|
| LARGE | 2 (gui_2, app_controller) | 77 | Dedicated track per file (T-shirt: XL) |
| MEDIUM | 2 (session_logger, warmup) | 15 | Folds into the small-files track |
| SMALL | 35 | 57 | Batched in one track (T-shirt: L) |
| BASELINE | 3 (mcp_client, ai_client, rag_engine) | 87 | Closes the gaps in the convention reference (T-shirt: L) |

5 sub-tracks with consistent result_migration_* prefix:

  1. result_migration_review_pass (T-shirt: S) — 57 sites (32 UNCLEAR + 25 INTERNAL_RETHROW); updates the audit's heuristics
  2. result_migration_small_files (T-shirt: L): 37 files (35 SMALL + 2 MEDIUM); SHIPPED 2026-06-18. Phase 13 complete: 11/11 tiers actually run; 9 PASS clean + 2 PASS with documented issues (REPORTED for separate tracks: test_execution_sim_live GUI subprocess crash + test_live_gui_workspace_exists xdist race); 4 pre-existing Gemini 503 tests documented with @pytest.mark.skip. Rejection history: Phase 10 REJECTED for sliming 21 sites via 5 LAUNDERING HEURISTICS; Phase 11 REJECTED for keeping Heuristic #19 and missing the visit_Try audit bug; Phase 12 REJECTED for the false test claim (the test runner script crashed at 5/11 with UnicodeEncodeError; tier-1-unit-core FAILED with 3 unverified 'pre-existing' failures; 6 tiers were not actually tested; the '11 tiers total. 10 PASS' claim in commit 2235e4b8 is false). Phase 13 fixes the script crash, investigates the 3 failures, and verifies 11/11 PASS.
  3. result_migration_app_controller (T-shirt: XL) — 56 sites (35 V + 3 S + 2 ? + 16 C; 13 FastAPI boundary stay as-is)
  4. result_migration_gui_2 (T-shirt: XL) — 55 sites (37 V + 2 S + 14 ? + 2 C; the 14 ? includes the +1 site from the review pass: src/gui_2.py:1349)
  5. result_migration_baseline_cleanup (T-shirt: L) — 112 sites (77 V + 10 S + 6 ? + 19 C in the 3 refactored files)

Total: 5 sub-tracks, 268 sites migrated, ~2100 lines changed across ~42 files.

Post-Review Pass Update (2026-06-17, sub-track 1 shipped): After the review pass (result_migration_review_pass_20260617), the UNCLEAR + INTERNAL_RETHROW sites are reclassified:

  • 24 UNCLEAR sites were in scope (the audit's "current state" count after the new heuristics was 24, not 32; the original 32 was the pre-heuristic count)
  • 23 of 24 UNCLEAR sites are compliant (reclassified by 10 new heuristics; only src/gui_2.py:1349 is migration-target)
  • 19 INTERNAL_RETHROW sites are all compliant: 7 PATTERN_1 (Result→Exception bridge in baseline files) + 2 PATTERN_2 (catch+log+re-raise) + 9 compliant (standard __getattr__, abstract method, validation raise) + 1 audit-script bug (missed find)
  • Net migration scope change: sub-track 4 (gui_2) gains 1 site (L1349). All other sub-tracks are unchanged.

Post-Sub-Track-2 Update (2026-06-17, sub-track 2 shipped): After the small-files migration (result_migration_small_files_20260617), the audit script is now correct (3 bugs fixed in Phase 1 of that sub-track), and the 37 SMALL+MEDIUM files have been processed:

  • 49/76 sites migrated (6 full Result[T] + 43 exception narrowing) + 13 already compliant
  • 27 sites remain INTERNAL_SILENT_SWALLOW (narrow-catch + pass); Phase 11 in progress (REJECTS Phase 10's sliming; full Result[T] migration; not narrowing, not logging-only, not silent recovery)
  • Audit's UNCLEAR count: 7 → 21 (+14 sites) - the narrowing created patterns the audit's heuristics don't recognize; Phase 11 in progress (REJECTS Phase 10's 5 LAUNDERING heuristics; reverts them and adds legitimate Heuristic A)
  • Bonus defensive fix: try/except (OSError, tomllib.TOMLDecodeError) in load_track_state unblocked 7+ tests
  • Test result: all 11 test tiers PASS (tier-1-unit-comms, tier-1-unit-core, tier-1-unit-gui, tier-1-unit-headless, tier-1-unit-mma, tier-2-mock_app-comms, tier-2-mock_app-core, tier-2-mock_app-gui, tier-2-mock_app-headless, tier-2-mock_app-mma, tier-3-live_gui)
  • Documented G4 deviation: 27 silent-swallow sites remain. Phase 11 COMPLETE (not Phase 10 — Phase 10 was REJECTED); full Result[T] migration for the 27 sites (5 full Result in warmup.py + 2 helper extracts + 14 documented as already compliant + 1 known limitation + 1 already Result from Phase 10). The user has directed that Result[T] is mandatory, not optional, given the project's heavy use of multi-threaded io_pool dispatch (Python has no wave-based preemptive thread pipelining, so every soft/hard failure point needs full context).

Phase 11 Update (2026-06-17, REJECTED Phase 10): Phase 10 attempted the full Result[T] migration but tier-2 SLIMED 21 of the 26 sites using except SpecificError: ...; logger.warning(...); return default (which is NOT a Result migration). Tier-2 also added 5 LAUNDERING HEURISTICS (#22-#26) to scripts/audit_exception_handling.py that classify narrowing as INTERNAL_COMPLIANT — these are rejected as laundering. Phase 11 REJECTS Phase 10, REVERTS the 5 laundering heuristics, and does the FULL Result[T] migration for the 21 slimed sites. Result[T] is NOT optional. No "context manager" or "user callback" excuses. The reference implementation is src/hot_reloader.py (which tier-2 did correctly); the same pattern must be applied to warmup.py. Test count claim must be 11 tiers (not 10).

Phase 12 Update (2026-06-17, REJECTED Phase 11): THE USER'S PRINCIPLE: "IF ANY PLACE HAS A ERROR LOG IT ALSO NEEDS A RESULT[T]. RESULT[T] PROPOGATES UNTIL IT REACHED A 'DRAIN' POINT WHERE THE ERROR CAN BE HANDLED APPROPRIATELY WITHOUT CRASHING THE APP. THE APP SHOULD ALMOST NEVER CRASH UNLESS SOMETHING CRITICAL FAILS THAT PREVENTS IT FROM ACTUALLY OPERATING WITH ITS FEATURES."

THE USER'S DIRECTIVE ON THE STYLEGUIDE: "make sure tier 2 is required to read that styleguide and make sure to update the style guide to be aware of the concept of a drain point, which just makes explicit a place where result[t]"

Phase 11 was REJECTED for 3 reasons:

  1. Heuristic #19 is LAUNDERING. The "narrow + log = compliant" pattern is WRONG. Logging is NOT a drain. Phase 11 left Heuristic #19 in place; 6 sites in the "14 already compliant" claim were laundered via Heuristic #19. Phase 12.1 REMOVES Heuristic #19.
  2. The audit-script visit_Try walker is BUGGY. It does NOT recurse into node.body (the try body itself), so nested Trys are silently dropped. I verified: src/api_hooks.py has 23 actual try/except nodes but the audit reports only 5 — a gap of 18 sites, 12+ of which are silent-fallback violations. Phase 12.2 FIXES this bug.
  3. Tier-2 misclassified 2 sites. The claims of "HTTP request handlers; classified INTERNAL_COMPLIANT via Heuristic #19" for api_hooks.py:451 and :824 are wrong about which heuristic applies. The actual code at L451 is except (OSError, ValueError) as e: self.send_response(500) (narrow + HTTP response, NOT a Heuristic #19 log call). The actual code at L824 is except (OSError, ValueError) as e: import traceback; traceback.print_exc(file=sys.stderr) (narrow + traceback, NOT a Heuristic #19 log call). Phase 12.6.1 migrates these.
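The visit_Try failure mode in reason 2 can be reproduced with a small standalone sketch. The two visitor classes below are hypothetical and are not the audit script's code; they only show that an `ast.NodeVisitor` whose `visit_Try` does not continue the traversal never sees a `try` nested inside another `try`. The actual fix in the audit script may be written differently (for example, by iterating `node.body` directly).

```python
import ast

# One try nested inside another try body.
NESTED_SOURCE = """
try:
    try:
        risky()
    except ValueError:
        pass
except OSError:
    pass
"""


class ShallowTryCounter(ast.NodeVisitor):
    """Buggy shape: counts a Try node but never descends into it."""

    def __init__(self) -> None:
        self.count = 0

    def visit_Try(self, node: ast.Try) -> None:
        self.count += 1  # traversal stops here, nested Try nodes are lost


class DeepTryCounter(ast.NodeVisitor):
    """Fixed shape: counts the node, then keeps walking its children."""

    def __init__(self) -> None:
        self.count = 0

    def visit_Try(self, node: ast.Try) -> None:
        self.count += 1
        self.generic_visit(node)  # visits body, handlers, orelse, finalbody


def count_try_nodes(visitor_cls: type, source: str) -> int:
    visitor = visitor_cls()
    visitor.visit(ast.parse(source))
    return visitor.count
```

On `NESTED_SOURCE`, the shallow counter reports 1 and the deep counter reports 2, which is the same kind of gap as the 5-versus-23 discrepancy described for src/api_hooks.py.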

Phase 12 ACTIONS:

  • 12.0: TIER-2 MUST READ conductor/code_styleguides/error_handling.md end-to-end BEFORE any Phase 12 code work. NO CODE; the read is acknowledged in the commit message of 12.0.1.
  • 12.0.1: UPDATE error_handling.md with 3 changes: (A) add a "Drain Points" section with 5 patterns; (B) update the "Broad-Except Distinction" table to explicitly say narrow + log = INTERNAL_SILENT_SWALLOW violation (prevents Heuristic #19 regression); (C) add a MUST-READ rule to the AI Agent Checklist.
  • 12.1: REMOVE Heuristic #19 (narrow+log laundering)
  • 12.2: FIX the visit_Try audit bug (2-line change to recurse into node.body)
  • 12.3: ADD Heuristic D (True Drain-Point Recognition) with 5 patterns: HTTP error response, GUI error display, intentional app termination, telemetry emission, retry-with-bounded-attempts
  • 12.4-12.5: Re-audit and triage
  • 12.6: Migrate ALL newly-revealed sites to Result[T] (per-file sub-batches)
  • 12.7: Update callers
  • 12.8: Update tests (including 1+ error-path test per migration)
  • 12.9: Verify ALL 11 test tiers PASS (not 10; not 9)
  • 12.10-12.12: Update reports and umbrella

WHAT IS A DRAIN POINT: A function that HANDLES the error (not just records it). Examples:

  • User-visible error in the GUI: try: ...; except Exception as e: imgui.text(f"Error: {e}")
  • HTTP error response: try: ...; except Exception as e: self.send_response(500); self.wfile.write(json.dumps({"error": str(e)}).encode("utf-8"))
  • Intentional app termination: try: ...; except Exception as e: sys.exit(f"Fatal: {e}")

NOT a drain point: try: ...; except: sys.stderr.write(...); pass (just a log). Heuristic D recognizes the small set of legitimate drain points.
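As a rough sketch of how an error might propagate until it reaches a drain, the example below passes a failure through two layers unchanged and only handles it in a request-handler-like function that turns it into an HTTP-style response. All names here (`Result`, `ErrorInfo`, `parse_port_result`, `build_url_result`, `handle_request`) are hypothetical and chosen for illustration; the project's real types and handlers are not shown in this spec.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical error record."""
    kind: str
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    """Hypothetical Result[T]: error is None on success."""
    value: T | None = None
    error: ErrorInfo | None = None

    @property
    def ok(self) -> bool:
        return self.error is None


def parse_port_result(raw: str) -> Result[int]:
    # Failure point: the exception becomes data immediately.
    try:
        return Result(value=int(raw))
    except ValueError as exc:
        return Result(error=ErrorInfo("bad_port", str(exc)))


def build_url_result(host: str, raw_port: str) -> Result[str]:
    # Intermediate layer: no logging, no default; the error is passed on.
    port = parse_port_result(raw_port)
    if not port.ok:
        return Result(error=port.error)
    return Result(value=f"http://{host}:{port.value}")


def handle_request(host: str, raw_port: str) -> tuple[int, str]:
    # Drain point: the error is handled by producing an error response.
    result = build_url_result(host, raw_port)
    if not result.ok:
        return 500, json.dumps({"error": result.error.message})
    return 200, json.dumps({"url": result.value})
```

Only `handle_request` decides what the failure means for the user; the two functions below it neither swallow nor log it.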

Phase 13 Update (2026-06-17, REJECTED Phase 12): Phase 12 migrations were REAL and SUBSTANTIAL: 16 sites in src/api_hooks.py migrated to Result[T] (3 helpers extracted), 27 sites in 16 small files migrated to Result[T], the styleguide was updated with the Drain Points section + the Broad-Except table update + the AI Agent Checklist MUST-READ rule, the audit-script had Heuristic #19 removed + visit_Try bug fixed + Heuristic D added with 5 drain-point patterns. Sub-track 2 audit post-fix: 0 violations, 0 UNCLEAR.

But Phase 12's test claim was FALSE:

  • The test runner script scripts/run_tests_batched.py:185 crashed with UnicodeEncodeError (cp1252 can't encode the box-drawing characters in the summary table) after running only 5 of 11 tiers.
  • tier-1-unit-core FAILED with 3 unverified "pre-existing" failures. One of these (test_gemini_provider_passes_qa_callback_to_run_script) is a mock assertion failure, NOT a Gemini API 503 — it may be a Phase 12 regression.
  • The 6 remaining tiers (tier-2-mock_app-comms/core/gui/headless/mma + tier-3-live_gui) were NOT executed.
  • Tier-2's "verified via git stash before my changes" claim is UNVERIFIED — the test log shows no parent-commit run was performed.
  • The "11 tiers total. 10 PASS" claim in commit 2235e4b8 is FALSE. Actual count: 5 tested, 4 PASS, 1 FAIL, 6 NOT TESTED.

Phase 13 ACTIONS:

  • 13.1: FIX the script crash in scripts/run_tests_batched.py:185 (add sys.stdout.reconfigure(encoding='utf-8', errors='replace') at the start of main()). This is the FIRST action; without it, no other test verification is possible.
  • 13.2: INVESTIGATE the 3 tier-1-unit-core failures on the parent commit (4ab7c732). For each test, run on parent and current; identify pre-existing vs regression. Record results to tests/artifacts/PHASE13_PARENT_COMMIT_RESULTS.log. Per AGENTS.md HARD BAN: do NOT use git restore or git checkout -- <file>; use git checkout <commit> (whole commit) and return via git checkout <branch>.
  • 13.3: FIX any actual regressions found in 13.2. Candidates: src/ai_client.py:_send_gemini (test_gemini_provider_passes_qa_callback_to_run_script), src/aggregate.py (test_auto_aggregate_skip, test_view_mode_summary). The audit's 0 violations in sub-track 2 scope MUST be preserved.
  • 13.4: DOCUMENT any confirmed pre-existing failures with @pytest.mark.skip(reason=...). Per AGENTS.md: documentation of a known failure, not an excuse.
  • 13.5: RE-RUN all 11 test tiers; verify the script completes and 11/11 PASS. The test count is 11, NOT 10. This is the FIFTH time this is being emphasized.
  • 13.6-13.8: Update reports and umbrella with the actual test results.
  • 13.9: Conductor - User Manual Verification.
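The crash addressed by 13.1 can be illustrated without a Windows console. In the sketch below, an in-memory `io.TextIOWrapper` with `cp1252` encoding stands in for `sys.stdout` (that substitution is an assumption made so the example runs anywhere); the helper names are hypothetical. `reconfigure(encoding=..., errors=...)` is the same method the action calls on `sys.stdout`.

```python
import io

# Box-drawing characters like those in a summary table border.
BOX_LINE = "\u250c\u2500\u2510"


def make_console(encoding: str) -> io.TextIOWrapper:
    """Hypothetical stand-in for a console stream with a fixed encoding."""
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding)


def write_summary(stream: io.TextIOWrapper, text: str) -> bytes:
    """Write text, flush, and return the raw bytes that reached the buffer."""
    stream.write(text)
    stream.flush()
    return stream.buffer.getvalue()


def crashes_on_cp1252() -> bool:
    """True if writing the table border to a cp1252 stream raises."""
    try:
        write_summary(make_console("cp1252"), BOX_LINE)
    except UnicodeEncodeError:
        return True
    return False


def write_after_reconfigure() -> bytes:
    """Apply the 13.1-style reconfigure first, then write the same text."""
    stream = make_console("cp1252")
    stream.reconfigure(encoding="utf-8", errors="replace")
    return write_summary(stream, BOX_LINE)
```

Because cp1252 has no mapping for the box-drawing range, the first write fails; after the reconfigure call the same text is emitted as UTF-8 bytes.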

The migrations stand. The test claim was wrong. Phase 13 fixes the test claim.

Phase 13 Resolution (2026-06-18, sub-track 2 SHIPPED): Actions 13.1 through 13.8 (including 13.4b) are done; 13.9 (user manual verification) is pending:

  • 13.1 DONE: scripts/run_tests_batched.py:185 UTF-8 crash fixed. Commit 0c62ab9d.
  • 13.2 DONE: 3 tier-1-unit-core failures investigated on parent commit 4ab7c732. Log: tests/artifacts/PHASE13_PARENT_COMMIT_RESULTS.log. Commit b96252e9.
  • 13.3 DONE: 0 regressions to fix. Phase 12.6 commits did NOT introduce any regressions.
  • 13.4 DONE: 4 pre-existing Gemini 503 tests documented with @pytest.mark.skip(reason=...). Commit 2f405b44.
  • 13.4b DONE: User directive applied to test_execution_sim_live - switched from gemini_cli to gemini provider. STILL FAILS (GUI subprocess crash). Commit 6025a1d1. Reported for a separate track.
  • 13.5 DONE: All 11 tiers actually run. Final results: 9 PASS clean + 2 PASS with documented issues (REPORTED for separate tracks: test_execution_sim_live + test_live_gui_workspace_exists).
  • 13.6 DONE: Reports updated.
  • 13.7 DONE: state.toml + metadata.json + tracks.md marked complete.
  • 13.8 DONE: This umbrella spec.md updated.
  • 13.9 PENDING: Conductor - User Manual Verification.

Test count is 11, NOT 10, NOT 9. The 11th tier is tier-1-unit-comms.

Reported for separate tracks (NOT Phase 12 regressions):

  1. test_execution_sim_live: GUI subprocess (port 8999) crashes mid-test during script generation flow. Same failure with both gemini_cli (mock subprocess) and gemini (real SDK). NOT provider-specific. The 90s timeout is reached without AI text. The GUI dies before the AI can respond.
  2. test_live_gui_workspace_exists: xdist race condition. The workspace can be cleaned up between fixture setup and the test assertion. Passes in isolation on both parent and current commit.

1. Overview

1.1 The State Before This Phase (as of 2026-06-16)

Per exception_handling_audit_20260616:

  • Convention is applied to 3 of 65 src/ files (mcp_client.py, ai_client.py, rag_engine.py — the "baseline").
  • 62 src/ files are in the migration-target state — they still use idiomatic Python (try/except, Optional[T], broad except Exception).
  • 211 violations + 25 suspicious + 32 unclear = 268 "bad" sites across 42 files.
  • Test pass count: 1288 + 4 + 0 (the codebase works correctly; the audit identifies refactor opportunities, not bugs).

1.2 The Goal

Migrate all 268 "bad" sites in the 42 affected files to the data-oriented error handling convention. After this phase, the codebase will have:

  • Zero INTERNAL_SILENT_SWALLOW (except ...: pass / log-only).
  • Zero INTERNAL_BROAD_CATCH (except Exception without ErrorInfo conversion, in non-*_result code).
  • Zero INTERNAL_OPTIONAL_RETURN (try/except + return None/Optional[T]).
  • Zero INTERNAL_RETHROW (try/except + raise without ErrorInfo conversion) — except where the new "Re-Raise Patterns" section allows.
  • Zero UNCLEAR (manual review confirms each is compliant or gets migrated).

The 5 sub-tracks collectively achieve this. The convention's "delete to turn off" audit script (scripts/audit_exception_handling.py) becomes useful as a CI gate in --strict mode after this phase: any new violation introduced by future code will fail CI.
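A strict gate of this kind reduces to "exit non-zero if any finding falls in a violation category". The sketch below assumes a hypothetical finding shape (a dict with a `category` key) and is not the audit script's actual implementation or output format; the category names are the ones listed above. It also simplifies INTERNAL_RETHROW by treating every such finding as a violation, whereas the goal above permits the re-raise patterns allowed by the styleguide.

```python
from collections import Counter

# Category names taken from the goal list; the finding dict shape is assumed.
VIOLATION_CATEGORIES = frozenset({
    "INTERNAL_SILENT_SWALLOW",
    "INTERNAL_BROAD_CATCH",
    "INTERNAL_OPTIONAL_RETURN",
    "INTERNAL_RETHROW",
    "UNCLEAR",
})


def count_violations(findings: list[dict]) -> int:
    """Count findings whose category is one of the violation categories."""
    counts = Counter(finding["category"] for finding in findings)
    return sum(counts[category] for category in VIOLATION_CATEGORIES)


def strict_exit_code(findings: list[dict]) -> int:
    """Return 0 when the tree is clean and 1 otherwise (CI fails on 1)."""
    return 1 if count_violations(findings) else 0
```

A CI job would pass the return value to `sys.exit`, so a single new violation fails the build.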

1.3 The 5 Sub-Tracks (consistent result_migration_* prefix)

All 5 sub-tracks follow the naming pattern result_migration_<scope>_<YYYYMMDD>. The umbrella spec uses placeholders; each sub-track gets its own date when it starts. The umbrella commit names (this spec) use 20260616.

Sub-track 1: result_migration_review_pass_<YYYYMMDD>

Scope: 32 UNCLEAR + 25 INTERNAL_RETHROW = 57 sites across 15 files. T-shirt size: S (smallest sub-track; mostly research + audit-script edits).

Why first: the UNCLEAR sites are ambiguous; a human review pass turns them into definite decisions (compliant or migration-target). The INTERNAL_RETHROW sites need the 3 legitimate re-raise patterns from conductor/code_styleguides/error_handling.md (added 2026-06-16) to be applied. Both feed into all later sub-tracks.

What it does:

  • For each of the 32 UNCLEAR sites, a human looks at the site and decides compliant-or-migration. Updates the audit's heuristics for sites that turn out to be a common pattern.
  • For each of the 25 INTERNAL_RETHROW sites, classify as one of the 3 legitimate re-raise patterns (convert, log+raise, cleanup+raise) or mark for migration.
  • Output: a doc with the per-site decision (added as an appendix to this umbrella spec when the sub-track ships).
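The three re-raise pattern names (convert, log+raise, cleanup+raise) might look like the shapes below. The styleguide section itself is not reproduced in this spec, so this is one interpretation of the names, not the canonical definitions; `ConfigError` and all three functions are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain exception used for the convert pattern."""


def convert_and_raise(raw: str) -> int:
    # Convert: translate a low-level exception into a domain one,
    # keeping the original as the cause.
    try:
        return int(raw)
    except ValueError as exc:
        raise ConfigError(f"invalid integer: {raw!r}") from exc


def log_and_raise(raw: str) -> int:
    # Log + raise: record context, then let the same exception propagate.
    try:
        return int(raw)
    except ValueError:
        logger.exception("parse failed for %r", raw)
        raise


def cleanup_and_raise(events: list, raw: str) -> int:
    # Cleanup + raise: release what was acquired, then propagate.
    events.append("acquired")
    try:
        return int(raw)
    except ValueError:
        events.append("released")
        raise
```

In all three shapes the exception still reaches the caller; none of them swallows it or substitutes a default.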

Dependency: none (it's the first sub-track).

Sub-track 2: result_migration_small_files_<YYYYMMDD>

Scope: 37 files (the 35 SMALL + 2 MEDIUM from the --by-size bucket); 76 sites (62V + 10S + 4 UNCLEAR) → 49 migrated + 13 already compliant + 27 silent-swallow remain. T-shirt size: L (batched; ~750 lines changed across 37 files + 1 audit script + 1 new test file). Status: shipped 2026-06-17 with documented G4 deviation (27 sites remain INTERNAL_SILENT_SWALLOW; Phase 11 of this sub-track REJECTS Phase 10's sliming of 21 sites and does the full Result[T] migration per the user's explicit direction).

Why second: the small files are quick wins; they don't depend on the orchestrator (app_controller) or the GUI. Some of them DO depend on sub-track 1's review pass (so the UNCLEAR sites are classified first). Phase 1 of this sub-track (audit-script bug fixes) unblocks sub-tracks 3 and 4 by giving them an audit that classifies correctly.

What it did:

  • Phase 1: 3 audit-script bug fixes (TDD) — fixed the 3 bugs documented in the review-pass report §4.4:
    • visit_Try walker now visits ALL except handlers (was only walking the last)
    • render_json per-file list now includes all findings (was filtering compliant)
    • render_json no longer truncates per-file list to top 15 (default now 200)
  • Phase 2: 4 UNCLEAR classifications (2 migration-target + 2 compliant; decisions in docs/reports/RESULT_MIGRATION_SMALL_FILES_20260617.md)
  • Phases 3-8: 49/76 sites migrated using two strategies:
    • Strategy A: Full Result[T] migration (2 files, 6 sites): summary_cache.py, log_registry.py. Backwards-compatible (callers ignore the Result return).
    • Strategy B: Exception narrowing (24 files, 43 sites): changed except Exception to specific stdlib/domain exceptions. Public API unchanged; behavior unchanged; no caller updates needed. This is a partial migration — the convention's FR4 says "convert to Result[T]", but the spec also acknowledged (R5) that cascading public API changes may be acceptable. Tier 2 chose narrowing for 43 sites to avoid ~100+ caller updates. Caveat: narrowing without logging.warning(...) is silent recovery (no trace). The 27 sites that remain INTERNAL_SILENT_SWALLOW are documented in the track completion report; Phase 11 of this sub-track is actively doing the full Result[T] migration for them (REJECTS Phase 10's sliming).
  • Phase 9: Verification — all 11 test tiers PASS; per-site report + track completion report written; state.toml + metadata.json marked completed.
  • Bonus defensive fix: try/except (OSError, tomllib.TOMLDecodeError) in load_track_state (in src/project_manager.py) for a pre-existing malformed state.toml crash. Unblocked 7+ tests.

Documented G4 deviation: 27 sites remain INTERNAL_SILENT_SWALLOW (narrow-catch + pass or narrow-catch + return None). These are categorized as:

  • Category A (intentional silent recovery, 17 sites): Known failure modes where the caller has no use for the error info (e.g., file_cache.py:98 mtime cache fallback, outline_tool.py:90 ast.unparse fallback, startup_profiler.py:40 profile output with stderr.write as a log). Should add logging.debug(...) per the audit's heuristic #19 to confirm intent.
  • Category B (user-input-driven, 10 sites): Callbacks and reload paths where any exception is possible (e.g., warmup.py:139/215/249 user callbacks, hot_reloader.py:58 module reload). Should add logging.warning(...) to surface user errors.

Migration-target sites introduced by the narrowing: the audit's UNCLEAR count went 7 → 21 (+14 sites) because the narrowing created patterns the audit's heuristics don't recognize (heavily-narrowed except without logging; except returning Result in a non-*_result function). Phase 11 of this sub-track adds the legitimate Heuristic A (Result-returning recovery in a non-*_result function), which reclassifies these.

Dependency: sub-track 1 (for the UNCLEAR classification). Unblocks sub-tracks 3 and 4 by fixing the audit script.

Sub-track 3: result_migration_app_controller_<YYYYMMDD>

Scope: src/app_controller.py (166KB); 56 sites (35 V + 3 S + 2 ? + 16 C). T-shirt size: XL (the orchestrator; high coordination with Hook API + MMA + RAG; ~700 lines changed in 1 file).

Why dedicated: the controller is the orchestrator; it touches every subsystem. Changes here require careful coordination with the _predefined_callbacks and _gettable_fields Hook API registries, the MMA conductor, and the RAG engine.

What it does:

  • Migrates the 22 migration-target sites (35 V - 13 FastAPI boundary = 22).
  • The 13 FastAPI boundary sites (per the new "Boundary Types" section in conductor/code_styleguides/error_handling.md) stay as-is.
  • The 16 compliant sites stay as-is.
  • Uses the 5-file-commit pattern from the parent track's doeh_test_thinking_cleanup_20260615 (not 11 separate test mocks).
  • Adds tests for the new Result-based API (similar to test_ai_client_result.py).

Dependency: sub-track 1 (for the 2 UNCLEAR sites at lines 1842 and 1668).

Sub-track 4: result_migration_gui_2_<YYYYMMDD>

Scope: src/gui_2.py (260KB); 55 sites (37 V + 2 S + 14 ? + 2 C; the 14 ? includes the +1 site from the review pass: src/gui_2.py:1349). T-shirt size: XL (the largest file; immediate-mode UI; ~700 lines changed in 1 file).

Why dedicated: the largest file in the codebase. The immediate-mode UI means changes here affect every render frame. The migration should be done incrementally with the hot-reload mechanism (Ctrl+Alt+R) so the user can verify each change visually.

What it does:

  • Migrates the 37 V + 2 S + 14 ? = 53 migration-target sites (the 14 ? includes the +1 site from the review pass: src/gui_2.py:1349, the only UNCLEAR site the review pass classified as migration-target).
  • The 2 compliant sites stay as-is.
  • The 13 UNCLEAR sites are the trickiest (per sub-track 1's review pass).
  • Uses the hot-reload mechanism for visual verification.

Dependency: sub-track 1 (for the 13 UNCLEAR sites); sub-track 3 (strong coordination, since app_controller calls gui_2 methods; the controller should be migrated first to give the GUI a clean API).

Sub-track 5: result_migration_baseline_cleanup_<YYYYMMDD>

Scope: the 3 refactored files (mcp_client.py, ai_client.py, rag_engine.py); 112 sites (77 V + 10 S + 6 ? + 19 C). T-shirt size: L (parent's Path C deferred work; ~600 lines changed across 3 files).

Why last: the baseline files ARE the convention reference. The remaining 77 violations are gaps in the reference (mostly the parent's "deferred" work — the 30+ tool functions in mcp_client.py, the SDK-exception-classification helpers in ai_client.py, the non-*_result methods in rag_engine.py). Closing these makes the convention reference pure — no migration-target sites in the baseline.

What it does:

  • Migrates the 30+ tool functions in mcp_client.py (the parent's Path C deferred work).
  • Migrates the broad-catches in the SDK-exception-classification helpers in ai_client.py (catch anthropic.APIError + convert to ErrorInfo).
  • Migrates the non-*_result methods in rag_engine.py.
  • Result: the 3 refactored files become 100% convention-compliant.

Dependency: none (independent of the other 4 sub-tracks; can run in parallel with sub-tracks 2-4 if the Tier 2 agents coordinate).

1.4 Out of Scope (Explicit)

  • send_result → send mass rename (user's stated manual refactor; separate work after this phase ships).
  • data_structure_strengthening_20260606 (parallel track; uses the cleaner Result API from this phase).
  • live_gui_mock_injection_20260615 (separate infrastructure track).
  • Removing the send() deprecation (followup; once the rename ships).
  • Migrating tests/ files (the public_api_migration_20260606 track already migrated 22 test files to send_result(); the remaining tests are out of scope for this phase).
  • Adding new Result patterns to areas that don't have any (this phase migrates EXISTING try/except sites, not adds new ones).

[Track 1: review pass]              (S; informational; can run in parallel with 2-5)
       ↓
[Track 2: small files]              (L; 37 files)
       ↓
[Track 3: app_controller]           (XL; high coordination)
       ↓
[Track 4: gui_2]                    (XL; depends on 3 for clean API)
       ↓
[Track 5: baseline cleanup]         (L; can run in parallel with 3-4)

Parallelization options:

  • Tracks 2 + 5 can run in parallel (different files).
  • Tracks 3 + 5 can run in parallel (different files; both touch app_controller's interface but Track 5 only touches the convention reference files).
  • Track 4 depends on Track 3 (the GUI calls controller methods).
  • Track 1 is independent (informational; can run any time).

3. Architecture Reference

3.1 The Convention

  • conductor/code_styleguides/error_handling.md — the canonical styleguide (5 patterns + 5 doc-clarification sections added 2026-06-16)
  • docs/AGENTS.md §"The 4 memory dimensions" — the cross-cutting lens
  • docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the provider layer
  • docs/guide_mcp_client.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the MCP tool layer
  • docs/guide_rag.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the RAG engine
  • conductor/code_styleguides/data_oriented_design.md — the canonical DOD reference

3.2 The Audit Script

  • scripts/audit_exception_handling.py — the static analyzer (10-category classification; --json, --top, --verbose, --strict, --summary, --by-size modes)
  • docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md — the audit report (the 268-site inventory; the per-file + per-category breakdown)
  • docs/guide_app_controller.md "Exception Handling" — the app_controller-specific guide (the 13 FastAPI boundary sites; the 22 migration-target sites)

3.3 The 4 Enforcement Audit Scripts (CI gates)

This phase's goal is to make --strict mode of scripts/audit_exception_handling.py a viable CI gate. The other 3 enforcement scripts are:

  • scripts/audit_weak_types.py — the dict[str, Any] / list[dict[...]] type-strengthening audit
  • scripts/audit_optional_in_3_files.py — the Optional[T] return type ban in the 3 refactored files (referenced by error_handling.md but not yet committed; should be created in data_structure_strengthening_20260606 per its spec §12.2)
  • scripts/audit_main_thread_imports.py — the main-thread import graph purity invariant

After this phase ships, all 4 scripts should be wired into CI as --strict mode gates.


4. Per-Sub-Track Plan (just sub-track 1; the rest are detailed when each sub-track starts)

Sub-track 1 (result_migration_review_pass) is the only one with a detailed plan; the other 4 are detailed when each starts. The reason: the audit's UNCLEAR + INTERNAL_RETHROW classification may change the migration scope of the later sub-tracks (some UNCLEAR sites may turn out to be compliant, reducing the migration work).

Phase 1: Setup (Sub-track 1)

  • Task 1.1: Initialize the sub-track folder

    • WHERE: conductor/tracks/result_migration_review_pass_<YYYYMMDD>/
    • WHAT: spec.md, plan.md, metadata.json
    • HOW: Copy this umbrella spec as the starting point; customize for the review pass
  • Task 1.2: Update conductor/tracks.md

    • WHERE: conductor/tracks.md (new row for the sub-track)
    • WHAT: Add the sub-track under the umbrella row
    • HOW: Same pattern as the previous tracks

Phase 2: Review (Sub-track 1)

  • Task 2.1: Review the 32 UNCLEAR sites

    • WHERE: All src/ files
    • WHAT: For each site, decide compliant-or-migration; record the decision in a doc
    • HOW: Use the audit's JSON output; for each site, read the snippet plus context (2-3 lines around it); classify
  • Task 2.2: Classify the 25 INTERNAL_RETHROW sites

    • WHERE: All src/ files
    • WHAT: For each site, apply the 3 legitimate re-raise patterns from the new styleguide section; record the decision
    • HOW: Same as 2.1; the decisions feed into the migration scope of sub-tracks 2-4
  • Task 2.3: Update the audit script's heuristics

    • WHERE: scripts/audit_exception_handling.py
    • WHAT: For sites that turned out to be compliant (a common pattern the script doesn't recognize), add a heuristic to the classification logic
    • HOW: Add to the _classify_except / _classify_raise functions
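A classification heuristic over `except` handlers could take roughly the following form. This is a simplified sketch, not the real `_classify_except`: the function names and the rules are assumptions, and it ignores details the audit cares about (for example, whether a broad catch converts to ErrorInfo, or whether the enclosing function is a `*_result` function).

```python
import ast

BROAD_NAMES = {"Exception", "BaseException"}


def classify_handler(handler: ast.ExceptHandler) -> str:
    """Toy classifier for a single except handler (assumed rules)."""
    # A body that is only `pass` discards the error entirely.
    if all(isinstance(stmt, ast.Pass) for stmt in handler.body):
        return "INTERNAL_SILENT_SWALLOW"
    # A bare `except:` or `except Exception:` is treated as broad.
    is_bare = handler.type is None
    is_broad_name = (
        isinstance(handler.type, ast.Name) and handler.type.id in BROAD_NAMES
    )
    if is_bare or is_broad_name:
        return "INTERNAL_BROAD_CATCH"
    # Anything else needs a human decision or a further heuristic.
    return "UNCLEAR"


def classify_source(source: str) -> list[str]:
    """Classify every except handler found in the given source text."""
    tree = ast.parse(source)
    return [
        classify_handler(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
    ]
```

A new heuristic is then one more branch ahead of the final `UNCLEAR` fallback, which is how a recurring compliant pattern stops being reported as unclear.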

Phase 3: Report (Sub-track 1)

  • Task 3.1: Write the review pass report
    • WHERE: docs/reports/RESULT_MIGRATION_REVIEW_PASS_<YYYYMMDD>.md
    • WHAT: Per-site decision table; updated migration scope for the later sub-tracks; updated audit script heuristics
    • HOW: Use the format of the EXCEPTION_HANDLING_AUDIT_20260616.md report

Phase 4: Verification (Sub-track 1)

  • Task 4.1: Verify the updated audit script

    • WHERE: scripts/audit_exception_handling.py
    • WHAT: Re-run the audit; the UNCLEAR count should drop to 0; the INTERNAL_RETHROW count should drop to whatever the 3 legitimate patterns don't cover
    • HOW: uv run python scripts/audit_exception_handling.py --by-size
  • Task 4.2: Document the updated migration scope

    • WHERE: This umbrella spec (the per-sub-track plan section)
    • WHAT: The sub-track 2-4 scope may change after the review pass; document the changes

5. Verification Criteria (per sub-track)

Each sub-track has its own verification criteria. The umbrella's criteria are that all 5 sub-tracks pass their criteria; the umbrella is "complete" when:

  • 268 sites migrated (or marked as legitimate via the review pass).
  • --strict mode of the audit script returns 0 (no violations).
  • Full test suite: 1288 + 4 + 0 (unchanged; the migration is behavior-preserving).
  • The convention is now fully applied to all 65 src/ files.
  • The 4 enforcement audit scripts can be wired into CI as --strict gates.

6. Risks & Mitigations

| ID | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | The 5 sub-tracks are larger than expected (the parent's Path C deferred work is bigger than estimated) | Medium | High | Track 5 (baseline cleanup) is the biggest risk: the 30+ tool functions in mcp_client.py may be bigger than expected. The plan acknowledges scope can grow; the user decides whether to split sub-tracks further. |
| R2 | The migration breaks the Hot Reload mechanism (changes to gui_2.py don't hot-reload correctly) | Medium | High | Sub-track 4 uses the hot-reload mechanism for visual verification. The migration should be done incrementally; the user can verify each change visually. |
| R3 | The migration breaks the Hook API (changes to app_controller.py break the _predefined_callbacks / _gettable_fields registries) | Low | High | Sub-track 3 includes a "before/after" verification of the Hook API (via live_gui tests). The convention's Result type is structurally compatible with the existing str/None return types if needed. |
| R4 | The review pass (sub-track 1) reveals that more sites are violations than the audit's heuristics suggest | Medium | Medium | The review pass updates the audit's heuristics; the migration scope for sub-tracks 2-4 may grow. The plan documents the scope changes in Phase 4. |
| R5 | The user wants a different sub-track ordering (e.g., the orchestrator first) | Low | Low | The plan recommends a sequence but the user can reorder. The sub-tracks are independent enough to swap. |
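The R3 mitigation relies on a Result value being reducible to the existing str/None return shape. A rough sketch of what such an adapter could look like follows; the `Result` fields and the `to_legacy_str` helper are illustrative assumptions, not the API defined in conductor/code_styleguides/error_handling.md.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Hypothetical Result shape: either a value or an error message."""

    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def to_legacy_str(result: Result[str]) -> Optional[str]:
    """Collapse a Result into the str/None shape a legacy caller expects."""
    return result.value if result.ok else None
```

Under these assumptions, a registry callback that still expects str/None can keep working while the function behind it returns a Result internally.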

7. Commits (the umbrella + 5 sub-tracks, in order)

The umbrella is 1 commit. Each sub-track is 5+ commits (spec, plan, metadata, code, docs). Total: at least 1 + 5*5 = 26 commits (the umbrella plus the 5 sub-tracks).


Phase 14 Update (2026-06-18): Live GUI Test Fixes

Sub-track 2 (result_migration_small_files_20260617) shipped on 2026-06-17 with 2 documented test infrastructure issues that blocked full closure. The follow-up track live_gui_test_fixes_20260618 was created and shipped on 2026-06-18 with both fixes applied.

The 2 fixes

Issue 1: test_execution_sim_live GUI subprocess crash (tier-3-live_gui)

  • Symptom: GUI subprocess (port 8999) crashes mid-test with 0xC00000FD = STATUS_STACK_OVERFLOW
  • Root cause: imgui.set_window_focus("Response") was called directly during the response panel render, exhausting the GUI main thread's 1.94 MB stack on Windows
  • Fix: defer the focus call to the next frame's idle phase via a new _pending_focus_response flag
  • Same root cause as test_z_negative_flows.py documented in docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md
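The deferral described in the fix can be sketched as follows. Only the `_pending_focus_response` flag and the "Response" window name come from the text above; the `ResponsePanel` class, its method names, and the injected `set_window_focus` callable (standing in for `imgui.set_window_focus`) are hypothetical, chosen so the sketch runs without a GUI.

```python
from typing import Callable


class ResponsePanel:
    """Hypothetical stand-in for the GUI code that owns the response panel."""

    def __init__(self, set_window_focus: Callable[[str], None]) -> None:
        # The real GUI would pass imgui.set_window_focus here (assumption).
        self._set_window_focus = set_window_focus
        self._pending_focus_response = False

    def render_response_panel(self, new_response: bool) -> None:
        # The render path only records the request; it never focuses directly.
        if new_response:
            self._pending_focus_response = True

    def on_frame_idle(self) -> None:
        # Runs once per frame, outside any panel render.
        if self._pending_focus_response:
            self._pending_focus_response = False
            self._set_window_focus("Response")
```

Clearing the flag before making the call means a single request produces at most one focus change.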

Issue 2: test_live_gui_workspace_exists xdist race (tier-1-unit-gui)

  • Symptom: xdist race where the owner worker's teardown removes the shared workspace path before a client worker's test can assert it exists
  • Root cause: live_gui_workspace fixture returned the path without ensuring it existed
  • Fix: call workspace.mkdir(parents=True, exist_ok=True) before returning
  • Pre-existing on parent commit 4ab7c732 (verified)
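A minimal sketch of the fix, written as a plain function so it runs outside pytest. The function name, the `base_dir` parameter, and the directory name are assumptions; the actual fixture's signature is not shown in this spec.

```python
from pathlib import Path


def live_gui_workspace_path(base_dir: Path) -> Path:
    """Return the shared workspace path, creating it if another worker removed it."""
    workspace = base_dir / "live_gui_workspace"
    # Idempotent: succeeds whether or not the directory already exists.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Because `mkdir(parents=True, exist_ok=True)` is idempotent, every caller recreates the path on demand. This covers the case where teardown has already run before the fixture is requested; it does not by itself prevent a teardown that runs between the fixture call and the assertion.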

Final test pass count: 11/11 tiers PASS clean

After both fixes, all 11 test tiers pass clean (~825s total). This is the final pass count for sub-track 2. The 4 Gemini 503 pre-existing skip markers remain (out of scope for the live_gui_test_fixes track; deferred to a follow-up track to mock the Gemini API in summarize.summarise_file).

Sub-track 2 status

Sub-track 2 (result_migration_small_files_20260617) is now FULLY ready for merge with no documented issues from the live_gui_test_fixes track. Sub-track 3 (result_migration_app_controller) is unblocked.

References

  • conductor/tracks/live_gui_test_fixes_20260618/spec.md - the fix track spec
  • conductor/tracks/live_gui_test_fixes_20260618/plan.md - the fix track plan
  • docs/reports/TRACK_COMPLETION_live_gui_test_fixes_20260618.md - the fix track completion report
  • tests/artifacts/PHASE14_TEST_RUN_RESULTS.log - 11/11 tier verification

8. See Also

  • conductor/code_styleguides/error_handling.md — the canonical convention (5 patterns + 5 doc-clarification sections)
  • conductor/code_styleguides/data_oriented_design.md — the canonical DOD reference
  • docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md — the audit report (the 268-site inventory)
  • scripts/audit_exception_handling.py — the static analyzer (with --summary and --by-size modes)
  • conductor/tracks/exception_handling_audit_20260616/spec.md — the audit track's spec
  • conductor/tracks/data_oriented_error_handling_20260606/spec.md §12.2 — the parent's prioritized list of future migration tracks (this umbrella replaces that list)
  • conductor/tracks/data_structure_strengthening_20260606/spec.md — the parallel track (uses the cleaner Result API from this phase)