Private
Public Access
0
0
Files
manual_slop/docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md
T

17 KiB

Exception Handling Audit Report (Data-Oriented Convention Compliance)

Date: 2026-06-16 Track ID: exception_handling_audit_20260616 Status: COMPLETED (5/5 phases) Reviewer: User (handoff for next-track decision)


0. TL;DR

A static analyzer (scripts/audit_exception_handling.py) classified every try/except/finally/raise site in the codebase (65 files, 348 sites) against the data-oriented error handling convention established by data_oriented_error_handling_20260606 (shipped 2026-06-12).

Headline Count
Total sites 348
Compliant sites 80 (23%)
Suspicious sites 25 (7%)
Violation sites 211 (61%)
Unclear (manual review) 32 (9%)

Key finding: the convention is partially applied (3 of 65 src/ files are refactored: mcp_client.py, ai_client.py, rag_engine.py). The remaining ~10 files in src/ are in the migration-target state.

File Group Sites Violations Note
Baseline (3 refactored files) 112 77 Convention reference; even these have remaining except Exception + log patterns that should be Result-converted
Migration target (62 other files) 236 134 The work for future refactor tracks

What the user decides: which migration-target file is the next refactor track? The top 5 candidates by violation count are: gui_2.py (37), app_controller.py (35), session_logger.py (8), warmup.py (6), theme_models.py (6).

Important: the "violation count" is NOT a bug count. These are migration-target sites, not bugs. The codebase works correctly today (1288 + 4 + 0 test pass). The audit identifies which files would benefit from future refactor tracks; the user decides what to migrate.


1. Methodology

1.1 The 10 Classification Categories

The audit classifies each site into one of 10 categories (5 compliant, 3 violation, 1 suspicious, 1 unclear):

Category Convention status When
BOUNDARY_SDK Compliant Wraps a third-party SDK call
BOUNDARY_IO Compliant Wraps stdlib I/O that can raise
BOUNDARY_CONVERSION Compliant Catches and converts to ErrorInfo in a Result
BOUNDARY_FASTAPI Compliant FastAPI HTTPException in _api_* handler
INTERNAL_SILENT_SWALLOW Violation except ...: pass or just logs
INTERNAL_BROAD_CATCH Violation except Exception without ErrorInfo conversion, in non-*_result code
INTERNAL_OPTIONAL_RETURN Violation try/except + return None/Optional[T]
INTERNAL_RETHROW Suspicious try/except + raise (without ErrorInfo conversion)
INTERNAL_PROGRAMMER_RAISE Compliant raise for impossible state / precondition (__init__, assert, ValueError)
INTERNAL_COMPLIANT Compliant try/finally (no except) — canonical cleanup
UNCLEAR Review needed Can't determine automatically

1.2 The Baseline vs Migration-Target Split

The 3 fully-refactored files (per the data_oriented_error_handling_20260606 track) are the baseline — the convention reference. The other ~62 files are the migration target. The audit reports both separately so the user can distinguish "the convention has gaps even in the refactored files" from "the convention has not been applied to the unrefactored files".

1.3 The Script's Classification Logic

The script uses Python's ast module (not regex) to walk each source file's AST and classify each try/except/finally/raise node. The classification considers:

  1. The exception type (third-party SDK exception, stdlib I/O exception, FastAPI exception, programmer-error exception, etc.)
  2. The enclosing function name (_api_* for FastAPI, *_result for Result-returning, __init__ for constructors)
  3. The return type annotation of the enclosing function (Result[T] vs Optional[T] vs plain T)
  4. What the catch site does with the exception (ErrorInfo conversion, re-raise, return None, silent swallow, etc.)
  5. What the try body calls (third-party SDK module vs internal method)

The script outputs a 1-line hint per site suggesting what the fix could look like (e.g., "return Result(data=NIL_T, errors=[...])").

1.4 What the Script Does NOT Do

  • Does NOT execute the code (it's a static analyzer; no behavior change).
  • Does NOT modify any files.
  • Does NOT provide specific refactor patches (the "hint" is a 1-line suggestion; the implementer of the next refactor track writes the actual code).
  • Does NOT verify that refactored code works (no test execution; the audit report is the deliverable).

2. The 3 Refactored Baseline Files (Convention Reference)

These 3 files are the convention reference. Sites in these files are labeled in_refactored_baseline: true in the JSON output.

2.1 src/mcp_client.py (refactored 2026-06-12)

  • Total sites: 53
  • Violations: 44 (40 INTERNAL_BROAD_CATCH + 4 INTERNAL_SILENT_SWALLOW)
  • Compliant sites: 5
  • Unclear: 4

Note: the spec for the parent track chose "Path C" (additive *_result variants alongside the existing (p, err) tuple API). The 30+ tool-function refactor + assertion chain removal is deferred. The 44 violations are mostly the remaining (p, err) + except Exception + log patterns in the 30+ tool functions that haven't been refactored yet.

2.2 src/ai_client.py (refactored 2026-06-12)

  • Total sites: 46
  • Violations: 27 (18 INTERNAL_BROAD_CATCH + 9 INTERNAL_SILENT_SWALLOW)
  • Compliant sites: 8
  • Suspicious sites: 9
  • Unclear: 2

Note: the ProviderError exception class was REMOVED; all 8 _send_<vendor>_result() functions return Result[str]. The 27 violations are mostly the broad-catches in the SDK-exception-classification helpers (which catch anthropic.APIError, google.api_core.exceptions.*, etc., but don't convert to ErrorInfo at the catch site — they log and re-raise).

2.3 src/rag_engine.py (refactored 2026-06-12)

  • Total sites: 13
  • Violations: 6 (5 INTERNAL_BROAD_CATCH + 1 INTERNAL_SILENT_SWALLOW)
  • Compliant sites: 1
  • Suspicious sites: 8

Note: _init_vector_store_result and _validate_collection_dim_result return Result[None] with ErrorInfo conversion. The 6 violations are the remaining broad-catches in non-*_result methods (add_documents, etc.).

2.4 The 77 Baseline Violations Are NOT Bugs

The 77 violations in the 3 refactored files are migration-target sites in files that are otherwise convention-compliant. The refactor was incomplete (per the parent's Path C decision for mcp_client and the incremental migration strategy). The user can decide to do follow-up refactors to close these 77 sites, or to accept them as "good enough for the convention reference" and focus on the larger unrefactored files.


3. Per-File Violation Counts (Top 15 Migration-Target Files)

Rank File Total Violations Suspicious Unclear Compliant Note
1 src/gui_2.py (260KB) 54 37 2 13 2 Largest file; 25 INTERNAL_BROAD_CATCH + 12 INTERNAL_SILENT_SWALLOW
2 src/app_controller.py (166KB) 56 35 3 2 16 13 of 35 are FastAPI boundary (compliant); 22 are migration-target
3 src/session_logger.py 8 8 0 0 0 All silent-swallow + broad-catch
4 src/warmup.py 7 6 1 0 0 Startup-time broad-catches
5 src/theme_models.py 10 6 0 2 2 Mostly re-raise
6 src/api_hooks.py 5 5 0 0 0 FastAPI HookServer; many broad-catches
7 src/project_manager.py 5 5 0 0 0 3 silent-swallow + 2 broad-catch
8-15 (10 other files, 0-3 violations each) 91 32 6 14 39 Mixed; small files

Total migration-target sites: 236 Total migration-target violations: 134 Total migration-target compliant: 70 (mostly INTERNAL_PROGRAMMER_RAISE in __init__ + try/finally cleanup patterns + a few BOUNDARY_SDK from chromadb/requests imports)


4. Per-Category Breakdown

4.1 Violations (211 sites, 61% of total)

Category Count Typical pattern Fix hint
INTERNAL_BROAD_CATCH 147 try: ...; except Exception: log(...) Narrow the exception type OR convert to ErrorInfo in a Result
INTERNAL_SILENT_SWALLOW 61 try: ...; except SomeError: pass Let it propagate OR return Result(data=NIL_T, errors=[...]) OR document with assert
INTERNAL_OPTIONAL_RETURN 3 try: ...; except: return None Replace with Result[T] returning Result(data=NIL_T, errors=[...])

4.2 Compliant (80 sites, 23% of total)

Category Count Typical pattern
INTERNAL_PROGRAMMER_RAISE 25 raise ValueError in __init__; assert for impossible states
BOUNDARY_SDK 19 except anthropic.APIError; except google.api_core.exceptions.*
INTERNAL_COMPLIANT 16 try/finally cleanup pattern
BOUNDARY_FASTAPI 12 raise HTTPException in _api_* handler
BOUNDARY_CONVERSION 8 except Exception as e: return Result(data=..., errors=[ErrorInfo(...)])

4.3 Suspicious (25 sites, 7% of total)

Category Count Typical pattern Fix hint
INTERNAL_RETHROW 25 try: ...; except: log(); raise (no conversion) See "Re-Raise Patterns" in the styleguide; 3 legitimate patterns + 1 suspicious

4.4 Unclear (32 sites, 9% of total)

Category Count Typical pattern
UNCLEAR 32 Can't determine automatically; needs human review

The 32 UNCLEAR sites are mostly in src/gui_2.py (13) and the smaller files (theme_models, project_manager, etc.). They have ambiguous exception-handling patterns where the script's heuristics don't definitively classify. The --verbose flag shows each one inline.


5. The 5 Doc Gaps Closed (this track's secondary deliverable)

The audit revealed 5 gaps in the existing documentation of the convention. This track closed all 5.

5.1 G1: FastAPI HTTPException in _api_* handlers (CLOSED)

Gap: the styleguide said "exceptions are reserved for the SDK boundary" but didn't address the FastAPI framework boundary. The audit found 13 sites in src/app_controller.py that use FastAPI's idiomatic HTTPException pattern.

Fix: added a new "Boundary Types" section to the styleguide with 3 categories of legitimate boundaries (third-party SDK, stdlib I/O, framework). The framework category explicitly covers FastAPI. The new docs/guide_app_controller.md "Exception Handling" section explains the 13 sites in detail.

5.2 G2: The "broad except Exception" rule (CLOSED)

Gap: the styleguide's anti-pattern #6 says "DON'T catch except Exception and silently swallow." But except Exception + ErrorInfo conversion is the canonical SDK boundary pattern (per the parent's spec §3.3). The rule was ambiguous.

Fix: added a new "The Broad-Except Distinction" section to the styleguide. The section provides a decision table showing when except Exception is compliant (conversion to ErrorInfo) vs when it's a violation (swallow / log-only). The new BOUNDARY_CONVERSION and INTERNAL_BROAD_CATCH categories in the audit implement this rule.

5.3 G3: The "constructors can raise" rule (CLOSED)

Gap: the styleguide §"When to Use This Convention" mentions "Constructors (__init__) that fail with programmer errors (use assert or raise for these)" but the wording is brief. The audit found multiple legitimate ValueError raises in __init__ and assert sites.

Fix: added a new "Constructors Can Raise" section to the styleguide with 2 code examples (the ValueError pattern + the assert pattern) and a list of 9 recognized programmer-error exception types. The new INTERNAL_PROGRAMMER_RAISE category in the audit implements this rule.

5.4 G4: The "re-raise" pattern (CLOSED)

Gap: the styleguide's anti-patterns say "DON'T raise a custom exception class for runtime failures" but re-raising is a separate concern that needs its own rule. The audit found 25 try/except + raise sites in src/.

Fix: added a new "Re-Raise Patterns" section to the styleguide with 3 legitimate re-raise patterns (convert, log, cleanup) + 1 suspicious pattern (catch + re-raise the same exception). The new INTERNAL_RETHROW category in the audit implements this rule.

5.5 G5: The audit script reference (CLOSED)

Gap: the new scripts/audit_exception_handling.py wasn't referenced from any of the convention's documentation.

Fix: added a new "Audit Script" section to the styleguide. The section documents the script's usage, the classification categories, the "delete to turn off" pattern (per feature_flags.md), and the output structure. Also added a cross-reference from conductor/product-guidelines.md "Data-Oriented Error Handling" section.


6. The Migration Target (the work for future refactor tracks)

The 211 violations are distributed across 42 files. The user decides which file(s) to migrate next. The top 3 candidates by violation count:

6.1 src/gui_2.py (37 violations, 260KB)

The largest file in the codebase. The 37 violations are mostly the INTERNAL_BROAD_CATCH (25) + INTERNAL_SILENT_SWALLOW (12) patterns. 13 sites are UNCLEAR (manual review needed).

Migration scope estimate: 2-3 days Tier 2 work to migrate the file to the convention. The work would be: convert Optional[T] return types to Result[T]; convert except Exception + log/print to except Exception + return Result(...); add tests for the new Result-based API.

Risk: the file is the GUI rendering layer; changes here affect every render frame. The migration should be done incrementally with the hot-reload mechanism (Ctrl+Alt+R) so the user can verify each change visually.

6.2 src/app_controller.py (35 violations + 16 compliant, 166KB)

The headless orchestrator. The 35 violations are 28 INTERNAL_BROAD_CATCH + 6 INTERNAL_SILENT_SWALLOW + 1 INTERNAL_OPTIONAL_RETURN. The 16 compliant sites are 13 FastAPI boundary + 3 INTERNAL_PROGRAMMER_RAISE.

Migration scope estimate: 2-3 days Tier 2 work. The 13 FastAPI boundary sites stay as-is (they're the framework contract). The 22 migration-target sites are the work.

Risk: the controller is the orchestrator and touches every subsystem. Changes here require careful coordination with the _predefined_callbacks and _gettable_fields registries (per the Hook API). The migration should be done in 5-file commits (the parent track's pattern).

6.3 src/session_logger.py (8 violations)

A small file (16KB). The 8 violations are 4 INTERNAL_BROAD_CATCH + 4 INTERNAL_SILENT_SWALLOW.

Migration scope estimate: 0.5 day Tier 2 work. The file is small and the migration is straightforward.

Risk: low. The file is self-contained.


7. Followup Recommendations (for the user's next-track decision)

The user has 4 options for what to do next:

# Option Scope Estimated effort Rationale
1 Do the planned send_resultsend mass rename (manual refactor) Mechanical find-replace of the function name 1-2 hours User's stated intent. Mechanical, low-risk. Doesn't change test pass count.
2 Migrate app_controller.py to the convention 22 migration-target sites 2-3 days Tier 2 The highest-priority migration per the doeh spec §12.2. The 13 FastAPI boundary sites stay.
3 Migrate gui_2.py to the convention 37 migration-target sites 2-3 days Tier 2 The largest file; would close the biggest single chunk.
4 Migrate session_logger.py + warmup.py + theme_models.py together 20 migration-target sites in 3 small files 0.5-1 day Tier 2 Quick wins; clears 3 files at once.

The recommended order is 1 → 2 → 3 → 4 (do the mechanical rename first, then the orchestrator migration, then the GUI, then the small files). The user decides.


8. Verification Artifacts

  • tests/artifacts/exception_handling_audit_final.log — the human-readable audit output (103 lines, 7.4KB)
  • tests/artifacts/exception_handling_audit_final.json — the JSON output (43.7KB, machine-readable)
  • scripts/audit_exception_handling.py — the static analyzer (792 lines)
  • conductor/code_styleguides/error_handling.md — updated with 5 new sections
  • docs/guide_app_controller.md — updated with the FastAPI boundary section
  • conductor/product-guidelines.md — updated with the audit script cross-reference
  • docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md — this report

9. Test Pass Count (unchanged from rag_test_failures_20260615)

This track is informational (no code change). The test pass count is 1288 + 4 + 0 (unchanged from the previous track's baseline).