Data-Oriented Error Handling

Status: Active convention as of 2026-06-11. Established by the data_oriented_error_handling_20260606 track. Canonical reference for all Python error-handling decisions in this codebase.

This styleguide codifies Ryan Fleury's "errors are just cases" framework as the project convention. The 5 patterns below replace Optional[T] returns and exception-based control flow with Result[T] dataclasses and nil-sentinel dataclasses. SDK-boundary exceptions are caught and converted to ErrorInfo; the rest of the application works with data, not control flow.

Reference: Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them". Independent corroboration: Timothy Lottes (ERROR[__line__]: _code_ exit pattern; each error code has exactly one meaning — never overload UNKNOWN), Valigo ("Exceptions are horrifying"; modern languages without legacy baggage move away from exceptions — Rust, Jai, Zig, Odin).


The 5 Patterns

1. Nil-Sentinel Dataclasses (replaces None)

When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class NilPath:
 exists: bool = False
 read_text: str = ""
 errors: list[ErrorInfo] = field(default_factory=list)

NIL_PATH = NilPath() # module-level singleton

Callers don't need if x is None: checks; they can read x.read_text and get "" on the nil path.

Convention: NIL_* (uppercase) is the module-level singleton. Nil* (PascalCase) is the class. Frozen dataclass prevents runtime mutation.
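A minimal sketch of how a caller consumes a nil sentinel. The names PathInfo, NIL_PATH_INFO, FAKE_FS, and lookup() are hypothetical and exist only for this illustration; an in-memory dict stands in for the filesystem so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathInfo:
    """Hypothetical value type; the all-defaults instance is the nil sentinel."""
    exists: bool = False
    read_text: str = ""


NIL_PATH_INFO = PathInfo()  # module-level singleton, always safe to read from

# Hypothetical in-memory "filesystem" used instead of real I/O.
FAKE_FS = {"notes.txt": "alpha\nbeta\n"}


def lookup(name: str) -> PathInfo:
    """Return the nil sentinel instead of None when the name is unknown."""
    if name not in FAKE_FS:
        return NIL_PATH_INFO
    return PathInfo(exists=True, read_text=FAKE_FS[name])


# The same caller code works for both the found and the missing case.
found_lines = len(lookup("notes.txt").read_text.splitlines())
missing_lines = len(lookup("missing.txt").read_text.splitlines())
```

The caller never branches on None: the missing case simply yields an empty string and a line count of zero.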

2. Zero-Initialization (via @dataclass defaults)

Fresh memory from the OS is zero-initialized. In Python, @dataclass with field defaults achieves the same: the data is in a valid "empty" state without any explicit constructor logic.

@dataclass(frozen=True)
class String8:
 text: str = ""
 size: int = 0

Code that consumes String8 (e.g., a for-loop bounded by size) works correctly with the zero-initialized instance.

Convention: Mutable defaults use field(default_factory=list), NOT = []. A bare list default would be shared across instances, which is why @dataclass rejects it with a ValueError at class-definition time.
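The difference between the two default styles can be checked directly. The sketch below uses a hypothetical Batch class; it shows that default_factory gives each instance its own list and that the standard library refuses a bare list default.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Batch:
    """Hypothetical container used only for this illustration."""
    name: str = ""
    items: list[str] = field(default_factory=list)


first = Batch()
second = Batch()
# frozen=True blocks field reassignment, not in-place mutation of the list.
first.items.append("x")

# A bare `= []` default is rejected while the class is being defined.
try:
    @dataclass
    class Broken:
        items: list[str] = []
    rejected = False
except ValueError:
    rejected = True
```

After the append, second.items is still empty because each instance received a fresh list from the factory.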

3. Fail Early (push validation to shallow stack frames)

Don't defer error checks to deep in the call stack. Push them to the entry point so the user knows ASAP if the operation cannot succeed.

def do_thing(path: Path) -> Result[str]:
 resolved = _resolve_path(path) # validation happens HERE, not deeper
 if not resolved.ok:
  return Result(data="", errors=resolved.errors)
 ...

Convention: assert at entry points for invariants. Early return for user-facing errors. try/finally (Python's analog to goto defer) for cleanup.
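As a rough sketch of this split between assert and early return, assume a hypothetical render_report() entry point and a simplified Outcome type that stands in for Result[str] (errors are plain strings here to keep the example short).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Outcome:
    """Simplified stand-in for Result[str]; errors are plain strings here."""
    data: str = ""
    errors: list[str] = field(default_factory=list)


def render_report(title: str, rows: list[str]) -> Outcome:
    # Programmer-error invariant: checked once, at the entry point.
    assert isinstance(rows, list), "rows must be a list"
    # User-facing validation: early return before any work starts.
    if not title.strip():
        return Outcome(errors=["title must not be empty"])
    if not rows:
        return Outcome(errors=["report has no rows"])
    return Outcome(data=_format(title, rows))


def _format(title: str, rows: list[str]) -> str:
    # Deep helper: inputs were already validated in the shallow frame,
    # so no checks are repeated here.
    return "\n".join([title, *rows])
```

All validation lives in render_report(); the deeper helper can assume its inputs are valid.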

4. AND over OR (Result with side-channel errors; no sum types)

Instead of Union[T, E] or Result<T, E>, return a struct with BOTH data and errors as parallel fields:

@dataclass(frozen=True)
class Result(Generic[T]):
 data: T # the happy-path result (zero-initialized on failure)
 errors: list[ErrorInfo] = field(default_factory=list) # side-channel; empty = success

Callers:

r = do_thing(path)
if r.errors:
 for err in r.errors: log(err.ui_message())
# use r.data regardless (it's the zero-initialized value on failure)

Convention: Result is generic over T (the success data) but NOT over the error type. Errors are always list[ErrorInfo] (a side-channel list, not a tagged sum). This collapses the bifurcated if r.ok: ... else: ... codepaths into a single flat codepath.
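The flat codepath can be sketched as follows. ErrorInfo is reduced to a single message field, and parse_port() and describe() are hypothetical functions invented for the example; the ok property mirrors the one listed in the data model but its implementation here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    message: str  # simplified: the real type also carries kind/source/original


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Assumed definition: success means the side-channel list is empty.
        return not self.errors


def parse_port(raw: str) -> Result[int]:
    """Hypothetical example function: zero value plus an error on failure."""
    if not raw.isdecimal():
        return Result(data=0, errors=[ErrorInfo(f"not a port number: {raw!r}")])
    return Result(data=int(raw))


def describe(raw: str) -> str:
    r = parse_port(raw)
    # Single flat codepath: data is always usable, errors ride alongside.
    notes = "; ".join(e.message for e in r.errors)
    return f"port={r.data} errors=[{notes}]"
```

describe() contains no success/failure branch; the zero value keeps the formatting code valid on the failure path.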

5. Error Info as Side-Channel (not as exception)

Errors flow as DATA in the Result struct, not as exceptions. SDK boundaries (which must catch vendor exceptions) convert them to ErrorInfo:

@dataclass(frozen=True)
class ErrorInfo:
 kind: ErrorKind
 message: str
 source: str = ""
 original: BaseException | None = None
 def ui_message(self) -> str:
  src = f"[{self.source}] " if self.source else ""
  return f"{src}{self.kind.value}: {self.message}"

Convention: ErrorInfo is the canonical error type. The legacy ai_client.ProviderError exception class is removed; SDK helpers (_classify_<vendor>_error()) RETURN ErrorInfo instead of raising.


The Data Model

The canonical types live in src/result_types.py:

| Type | Form | Purpose |
|---|---|---|
| ErrorKind | str, Enum (12+ values) | Canonical error taxonomy: NETWORK, AUTH, QUOTA, RATE_LIMIT, BALANCE, PERMISSION, NOT_FOUND, INVALID_INPUT, NOT_READY, UNKNOWN, CONFIG, INTERNAL, plus optional PROVIDER_HISTORY_DIVERGED_FROM_UI for app-vs-provider-state-divergence cases. Each value has exactly one meaning. |
| ErrorInfo | @dataclass(frozen=True) | A single error: kind: ErrorKind, message: str, source: str = "", original: BaseException \| None = None. Frozen; carries ui_message() for display. |
| Result[T] | @dataclass(frozen=True) Generic[T] | The success-or-failure container: data: T, errors: list[ErrorInfo] = field(default_factory=list), ok: bool property, with_error(), with_errors(), with_data() methods. |
| NilPath | @dataclass(frozen=True) + NIL_PATH | Nil-sentinel for filesystem paths. Has exists=False, read_text="", errors=[]. |
| NilRAGState | @dataclass(frozen=True) + NIL_RAG_STATE | Nil-sentinel for the RAG engine. Has enabled=False, is_empty_result=True, errors=[]. |
| OK | Result[None] constant | Trivial success for fail-or-succeed operations that carry no data. |

Result is generic over T only (not over the error type). Errors are always list[ErrorInfo]. This is the AND-over-OR principle: data and errors are parallel fields, not a tagged sum.


Decision Tree

Need to represent "missing or failed"?
|
+-- Is the value a "data" value (not a control-flow signal)?
| +-- Use a Result dataclass (data + errors list)
| +-- Use a nil-sentinel dataclass (zero-initialized)
|
+-- Is the value a control-flow signal (e.g., "abort" or "skip")?
| +-- Use a boolean (or enum)
| +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful
|
+-- Is the failure "unrecoverable" (programmer error, not runtime condition)?
| +-- Use assert (debug builds)
| +-- Use raise (only for programmer errors like KeyError on a known dict)
|
+-- Does the SDK raise an exception you can't avoid?
  +-- Catch at the boundary; convert to ErrorInfo inside a Result

Anti-Patterns

DON'T do these things:

  1. DON'T use Optional[X] for "this might fail at runtime". Use Result[X] instead.
  2. DON'T use None as a sentinel for "no result". Use a nil-sentinel dataclass.
  3. DON'T raise a custom exception class for runtime failures. Catch SDK exceptions and return ErrorInfo.
  4. DON'T use Union[T, E] (sum type). Use a struct with parallel fields (AND over OR).
  5. DON'T have if x is None: handle; else: use_x patterns in production code. The nil-sentinel makes them unnecessary.
  6. DON'T catch except Exception and silently swallow. Convert to ErrorInfo and return in the Result.

Examples

The 3 refactored subsystems demonstrate each pattern in context:

  • src/mcp_client.py:205-294: read_file, list_directory, search_files return Result[str]; (p, err) tuples become Result[Path]; the 30+ assert p is not None chain (lines 304-794) is removed.
  • src/ai_client.py: _send_<vendor>_result() returns Result[str] (8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok); send_result() is the new public API; send() is @deprecated.
  • src/rag_engine.py:100-180: _init_vector_store_result, _validate_collection_dim_result, is_empty_result, add_documents_result return Result[None] or Result[T]; broad except Exception blocks become ErrorInfo entries.

Hard Rules (enforced in the 3 refactored files)

These are non-negotiable in src/mcp_client.py, src/ai_client.py, and src/rag_engine.py:

  • Optional[T] return types are FORBIDDEN in the 3 refactored files. Use Result[T] (with NIL_T singleton if needed) instead. Rationale: Optional[T] is the sum type Union[T, None] that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove.
  • Function return types must be Result[T] for any function that can fail at runtime. A function that can't fail (e.g., get_name() -> str) doesn't need a Result. The classification is "can this return a different value under different runtime conditions?" If yes, Result. If no, plain return type.
  • Catch SDK exceptions at the boundary only. Inside the 3 refactored files, the only place an exception is caught is at the SDK call site (e.g., _send_<vendor>_result() wrapping the SDK call). Internal try/except is reserved for converting OSError, PermissionError, and similar I/O exceptions to ErrorInfo at the mcp_client tool boundary.

The verification script scripts/audit_optional_in_3_files.py enforces the Optional[X] rule by failing CI if any new Optional[X] appears in the 3 refactored files.
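The contents of scripts/audit_optional_in_3_files.py are not shown in this guide. As an illustration only, a check of this kind could be built on the standard ast module roughly as below; find_optional_returns() and SAMPLE are hypothetical and are not claimed to match the real script.

```python
import ast


def find_optional_returns(source: str) -> list[str]:
    """Return names of functions whose return annotation is Optional-like."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.returns is None:
            continue  # unannotated return type: nothing to check
        text = ast.unparse(node.returns)
        if "Optional[" in text or text.endswith("| None") or text.startswith("None |"):
            offenders.append(node.name)
    return sorted(offenders)


SAMPLE = '''
from typing import Optional

def bad(path: str) -> Optional[str]:
    return None

def also_bad(path: str) -> str | None:
    return None

def fine(path: Optional[str] = None) -> str:
    return path or ""
'''

offenders = find_optional_returns(SAMPLE)
```

Only return annotations are inspected, so fine() passes even though it takes an Optional argument.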

Optional[X] in argument types

The Optional[X] ban above applies to return types only. Argument types that genuinely may be None (e.g., rag_engine: Optional[Any] = None, pre_tool_callback: Optional[Callable] = None) remain allowed; they describe a caller choice, not a runtime failure of this function.

Cross-thread safety

Result and ErrorInfo are @dataclass(frozen=True), so their fields cannot be reassigned after construction. Note that frozen does not make the errors list itself immutable; treat it as read-only and never append to it in place. The with_error() / with_errors() / with_data() methods produce new instances (no mutation), matching the project's "no shared mutable state across threads" invariant. Deprecation warnings use warnings.warn(..., stacklevel=2) which is thread-safe.
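A small sketch of the copy-on-update behaviour, using a hypothetical Snapshot class in place of Result. The with_error() body shown here (built on dataclasses.replace) is one plausible implementation, not necessarily the one in src/result_types.py.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for Result[str]; errors are plain strings here."""
    data: str = ""
    errors: list[str] = field(default_factory=list)

    def with_error(self, err: str) -> "Snapshot":
        # Build a new list and a new instance; `self` is never modified.
        return replace(self, errors=[*self.errors, err])


base = Snapshot(data="payload")
updated = base.with_error("timeout")

try:
    base.data = "changed"  # field reassignment is blocked by frozen=True
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```

A thread holding base keeps seeing an empty error list even after another thread derives updated from it.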


When to Use This Convention

Use it for:

  • New public APIs (any function that can fail at runtime and the caller might care).
  • New internal functions where the caller benefits from knowing the failure (vs. just propagating None).

Don't use it for:

  • Constructors (__init__) that fail with programmer errors (use assert or raise for these). See "Constructors Can Raise" below for the full rule.
  • Trivial getters that can't fail (get_name() -> str doesn't need a Result).
  • Performance-critical hot paths where the overhead of the dataclass allocation is measurable (rare; benchmark first).

Boundary Types: What Counts as a "Boundary"?

The convention says "exceptions are reserved for the SDK boundary," but what counts as a boundary? There are 3 categories:

1. Third-party SDK calls

A try/except that wraps a call to a third-party SDK is the canonical boundary use of the pattern. The catch site converts the SDK's exception to ErrorInfo (or re-raises if the function is the public API and a Result is the right return type).

Recognized third-party SDK modules (partial list): anthropic, google / google.genai / google.api_core, openai, groq, cohere, chromadb, sentence_transformers, huggingface_hub, requests, urllib3, httpx, aiohttp, websockets, psutil, imgui_bundle, dearpygui, PIL, cv2, numpy.

Recognized third-party exception types (partial list): anthropic.APIError / RateLimitError / AuthenticationError, google.api_core.exceptions.GoogleAPIError / ResourceExhausted, openai.OpenAIError / APIError / RateLimitError, requests.RequestException / ConnectionError / Timeout, httpx.HTTPError / RequestError, chromadb.errors.ChromaError, pydantic.ValidationError.

2. Stdlib I/O that can raise

File and network I/O via stdlib (open(), os.path.*, json.loads(), subprocess.run(), socket.*, sqlite3.*, csv.*, zipfile.*, xml.etree.ElementTree) commonly raises. Catching the specific exception (OSError, FileNotFoundError, PermissionError, json.JSONDecodeError, subprocess.CalledProcessError, etc.) at the tool boundary and converting to ErrorInfo is compliant.

This is the "stdlib I/O exception caught in our own code is acceptable" rule. The catch site should be specific (except FileNotFoundError, not except Exception) and should convert to ErrorInfo, not swallow.

3. Framework boundaries (FastAPI)

A try/except or raise in a FastAPI _api_* handler is the framework boundary. raise HTTPException(status_code=..., detail=...) is the FastAPI-idiomatic way to signal an HTTP error; FastAPI converts it to a JSON response at the framework level. This is not an exception leak into internal code; it's the framework contract.

# Compliant: FastAPI boundary in _api_* handler
async def _api_get_key(controller, header_key: str) -> str:
 if not _is_valid_key(header_key):
  raise HTTPException(status_code=403, detail="Could not validate API Key")
 return header_key

# Compliant: broad catch + HTTPException at the FastAPI boundary
async def _api_generate(controller, payload):
 try:
  result = ai_client.send_result(...)
  return result.data
 except Exception as e:
  raise HTTPException(status_code=500, detail=f"AI call failed: {e}")

The catch-all except Exception is acceptable here because the conversion is to the framework's exception (HTTPException), not to a silent swallow. The detail message includes the original error; the HTTP status code is the framework contract.

What is NOT a boundary

  • Internal business logic: try/except around a for loop in a controller method is internal, not boundary.
  • Cross-method calls within src/: calling a method in app_controller.py from a method in app_controller.py is internal, not boundary.
  • stdlib I/O that the user controls directly: opening a file the user passed via --config is internal; converting the failure should be Result-based, not exception-based.

The Broad-Except Distinction

Anti-pattern #6 says "DON'T catch except Exception and silently swallow." But except Exception is not always a violation. The distinction is what the catch site does with the exception:

| What the catch does | Classification | Convention status |
|---|---|---|
| pass (or no body) | INTERNAL_SILENT_SWALLOW | Violation |
| print(...) / log(...) only | INTERNAL_SILENT_SWALLOW | Violation (the data is lost) |
| return None / return Optional[T] | INTERNAL_OPTIONAL_RETURN | Violation (use Result[T]) |
| return Result(data=..., errors=[ErrorInfo(...)]) | BOUNDARY_CONVERSION | Compliant (the canonical pattern) |
| raise (re-raise) | INTERNAL_RETHROW (or BOUNDARY_SDK if at third-party call) | Suspicious (often refactorable) |
| raise HTTPException(...) (in _api_* handler) | BOUNDARY_FASTAPI | Compliant (the framework contract) |

The canonical pattern (in _result functions that wrap third-party SDK calls):

def _validate_collection_dim_result(self) -> Result[None]:
 if self.collection is None or self.collection == "mock":
  return Result(data=None)
 try:
  res = self.collection.get(limit=1, include=["embeddings"])
  # ... validation logic ...
  return Result(data=None)
 except Exception as e:
  return Result(data=None, errors=[
   ErrorInfo(kind=ErrorKind.INTERNAL,
       message=f"Failed to validate collection dim: {e}",
       source="rag._validate_collection_dim",
       original=e)
  ])

This except Exception is compliant because the catch + ErrorInfo conversion IS the data-oriented pattern. The original=e field preserves the original exception for debugging.

The anti-pattern (in internal code that has nothing to do with a third-party SDK):

# VIOLATION: broad catch + silent swallow
try:
 do_something()
except Exception:
 pass

# VIOLATION: broad catch + log-only (data is lost)
try:
 do_something()
except Exception as e:
 print(f"Error: {e}")

Constructors Can Raise

Per the "When to Use This Convention" section, constructors (__init__) that fail with programmer errors use assert or raise. This section elaborates.

Compliant constructor raises:

class MyClass:
 def __init__(self, config: Config):
  if config is None:
   raise ValueError("MyClass requires a non-None Config")
  if not config.api_key:
   raise ValueError("MyClass requires a non-empty api_key")
  self._config = config

Compliant assert (for impossible states):

def _set_rag_status(self, status: str):
 # The status string is one of a known set; if it's not, the caller
 # has a bug.
 assert status in {"idle", "ready", "syncing", "error"}, f"Unknown status: {status}"
 self._rag_status = status

The rule: if the failure is "this object cannot exist without X," raise in __init__ is the canonical pattern. The Result pattern is for runtime failures ("the network is down"); raise is for programmer errors ("you forgot to pass X").

Recognized programmer-error exception types (per scripts/audit_exception_handling.py INTERNAL_PROGRAMMER_RAISE category): AssertionError, ValueError, KeyError, IndexError, TypeError, AttributeError, NameError, RuntimeError, NotImplementedError.


Re-Raise Patterns

A try/except + raise (without ErrorInfo conversion) is suspicious but not always a violation. There are 3 legitimate re-raise patterns:

1. Catch + convert + raise as a different type

# Compliant: convert library error to user-friendly error
try:
 value = json.loads(raw)
except json.JSONDecodeError as e:
 raise ValueError(f"Invalid JSON: {e}") from e

The from e preserves the original exception in the traceback. The new exception type (ValueError) is more meaningful to the caller.

2. Catch + log + re-raise

# Compliant: log before propagating
try:
 do_something()
except Exception as e:
 logger.exception("do_something failed; will propagate")
 raise

The log line provides a record; the re-raise preserves the original control flow. This is appropriate when the failure is severe and the caller should still handle it.

3. Catch + cleanup + re-raise

# Compliant: ensure cleanup before propagating
resource = acquire() # acquire OUTSIDE the try: if it fails, there is nothing to release
try:
 do_something(resource)
finally:
 release(resource) # `finally` is cleaner; `except+raise` is for when
  # you also need to log or convert

Use try/finally for the pure cleanup case (no logging/conversion). Use try/except + re-raise when you need to log or convert AND ensure cleanup.

Suspicious re-raise (often a code smell)

# SUSPICIOUS: catch + re-raise the same exception (no value-add)
try:
 do_something()
except Exception:
 raise

This catches an exception, does nothing with it, and re-raises. The try/except is dead code; remove it or use a Result-based propagation instead.

The audit script flags this as INTERNAL_RETHROW (suspicious). If you see this pattern in code review, ask "is the try/except doing anything useful? If not, remove it."


Audit Script

The convention is enforced via scripts/audit_exception_handling.py. This is a static analyzer (AST-based) that classifies every try/except/finally/raise site in the codebase per the categories in the previous sections.

Usage:

# Human-readable report
uv run python scripts/audit_exception_handling.py

# JSON output for tooling
uv run python scripts/audit_exception_handling.py --json

# Include tests/ and scripts/
uv run python scripts/audit_exception_handling.py --include-tests

# Top N files (default: 15)
uv run python scripts/audit_exception_handling.py --top 20

# Show every site inline
uv run python scripts/audit_exception_handling.py --verbose

# Strict mode (exit 1 on any violation; for CI use)
uv run python scripts/audit_exception_handling.py --strict

"Delete to turn off" (per feature_flags.md): rm scripts/audit_exception_handling.py disables the audit. Re-enable by restoring the file (it's tracked in git).

Classification categories (the canonical taxonomy; matches the script's output):

| Category | Convention status | When |
|---|---|---|
| BOUNDARY_SDK | Compliant | Wraps a third-party SDK call |
| BOUNDARY_IO | Compliant | Wraps stdlib I/O that can raise |
| BOUNDARY_CONVERSION | Compliant | Catches and converts to ErrorInfo in a Result |
| BOUNDARY_FASTAPI | Compliant | FastAPI HTTPException in _api_* handler |
| INTERNAL_SILENT_SWALLOW | Violation | except ...: pass or just logs |
| INTERNAL_BROAD_CATCH | Violation | except Exception without ErrorInfo conversion, in non-*_result code |
| INTERNAL_OPTIONAL_RETURN | Violation | try/except + return None/Optional[T] |
| INTERNAL_RETHROW | Suspicious | try/except + raise (without ErrorInfo conversion) |
| INTERNAL_PROGRAMMER_RAISE | Compliant | raise for impossible state / precondition |
| INTERNAL_COMPLIANT | Compliant | try/finally (no except): canonical cleanup |
| UNCLEAR | Review needed | Can't determine automatically |

Output structure:

=== Exception Handling Audit (Data-Oriented Convention) ===

Files scanned: 65
Files with findings: 42
Total sites: 348
Compliant sites:   80
Suspicious sites:  25
Violation sites:   211
Unclear (review):  32

--- Baseline (refactored files: mcp_client, ai_client, rag_engine) ---
  Sites: 112, violations: 77
--- Migration target (all other src/ files) ---
  Sites: 236, violations: 134

The baseline is the 3 fully-refactored files (the convention reference). The migration target is the ~10 unrefactored files in src/. The violation count is informational; the user decides which migration-target files warrant a refactor track.

Important: the audit is informational, not a CI gate. The script exits 0 by default. Use --strict to enable CI-gate mode (exit 1 on any violation). The user is expected to review the report and decide the next action.
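The audit script itself is not reproduced in this guide. To give a feel for the AST-based approach, the sketch below labels except handlers whose body is only pass; classify_handlers() and SAMPLE are hypothetical, and the real script's classification logic is certainly richer than this.

```python
import ast


def classify_handlers(source: str) -> list[tuple[int, str]]:
    """Label each except handler by line number (illustrative rules only)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        only_pass = all(isinstance(stmt, ast.Pass) for stmt in node.body)
        # Anything that is not an obvious swallow is left for manual review.
        label = "INTERNAL_SILENT_SWALLOW" if only_pass else "UNCLEAR"
        findings.append((node.lineno, label))
    return sorted(findings)


SAMPLE = """\
try:
    risky()
except Exception:
    pass

try:
    risky()
except ValueError as e:
    handle(e)
"""

findings = classify_handlers(SAMPLE)
```

The source is only parsed, never executed, so undefined names such as risky() in SAMPLE are harmless.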


Migration Playbook

When converting existing code:

  1. Identify the Optional[X] return type or the raise statement.
  2. Define a Result dataclass (or use the existing one) with data: X and errors: list[ErrorInfo].
  3. Replace None returns with Result(data=NIL_X, errors=[...]) or Result(data=zero_value, errors=[...]).
  4. Replace raise X with return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)]).
  5. Update the caller to check result.errors instead of is None / try/except.
  6. Add a test that verifies both the success and failure paths return the right Result.
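The steps above can be sketched on a small, made-up example. SETTINGS, get_timeout_legacy(), and get_timeout_result() are hypothetical, and ErrorInfo.kind is simplified to a string.

```python
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    kind: str  # simplified for this sketch; the real field is an ErrorKind
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)


SETTINGS = {"timeout": "30", "retries": "many"}  # hypothetical data source


# Before (step 1): None signals failure, so every caller needs a None check.
def get_timeout_legacy(key: str) -> Optional[int]:
    raw = SETTINGS.get(key)
    return int(raw) if raw is not None and raw.isdecimal() else None


# After (steps 2-4): zero value plus an ErrorInfo entry describing the failure.
def get_timeout_result(key: str) -> Result[int]:
    if key not in SETTINGS:
        return Result(data=0, errors=[ErrorInfo("NOT_FOUND", f"missing setting: {key}")])
    raw = SETTINGS[key]
    if not raw.isdecimal():
        return Result(data=0, errors=[ErrorInfo("INVALID_INPUT", f"{key} is not an integer: {raw!r}")])
    return Result(data=int(raw))


# Step 5: the caller inspects result.errors instead of testing for None.
r = get_timeout_result("retries")
effective = r.data if not r.errors else 10  # 10 is an arbitrary fallback
```

The legacy version collapses "missing" and "malformed" into the same None; the Result version keeps them apart as distinct error kinds.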

Deprecation: ai_client.send() -> ai_client.send_result()

The public ai_client.send() is marked @deprecated (via typing_extensions.deprecated, the backport, usable on Python 3.11+, of @warnings.deprecated, which entered the standard library in Python 3.13). It still works for backward compat but emits a DeprecationWarning at runtime. New code MUST use ai_client.send_result().

  • send_result(...) -> Result[str]: the new public API. Errors travel in Result.errors as list[ErrorInfo]; Result takes no second type parameter.
  • send(...) -> str: deprecated. Returns str for backward compat; errors are logged to the comms log but not returned.
  • Removal timeline: public_api_migration_20260606 follow-up track.

The deprecation warning is cached per call site (Python's __warningregistry__) to avoid log spam. tests/conftest.py adds a filterwarnings entry to silence the warning during the transition; new tests for the new API should assert the warning is NOT emitted by send_result().
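A test along those lines might be sketched as below. The send() and send_result() bodies are trivial stand-ins (the real send_result() returns a Result), the sketch uses warnings.warn directly rather than the @deprecated decorator, and count_deprecations() is a hypothetical helper.

```python
import warnings


def send_result(prompt: str) -> str:
    """Stand-in for the new API; returns a plain str to keep the sketch short."""
    return f"echo: {prompt}"


def send(prompt: str) -> str:
    """Deprecated wrapper kept for backward compatibility."""
    warnings.warn(
        "send() is deprecated; use send_result()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to send()
    )
    return send_result(prompt)


def count_deprecations(fn, arg: str) -> int:
    """Call fn(arg) and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # bypass the per-call-site cache
        fn(arg)
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)


old_count = count_deprecations(send, "hi")
new_count = count_deprecations(send_result, "hi")
```

The "always" filter inside the context manager matters: without it, the per-call-site cache could hide a warning that was already emitted once.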


AI Agent Checklist (Added 2026-06-16)

This section is for AI agents writing code in this codebase. LLMs are trained on idiomatic Python (try/except, Optional[T], raise Exception, etc.) which is the OPPOSITE of this convention. The checklist below catches the most common LLM mistakes. Run this checklist before claiming a task is done.

The 5 MUST-DO rules

When writing NEW code, you MUST:

  1. Use Result[T] for any function that can fail at runtime. A function that returns a different value under different runtime conditions (success vs. failure) returns Result[T], not Optional[T], not T | None, not a custom exception class. Use the Result dataclass from src/result_types.py; populate errors: list[ErrorInfo] on failure.

  2. Catch SDK exceptions at the boundary, convert to ErrorInfo. If your code calls anthropic, google.genai, openai, chromadb, requests, or any other third-party SDK, the catch site converts the exception to ErrorInfo(kind=..., message=...) and returns it in Result.errors. Do NOT re-raise; do NOT swallow; do NOT let the exception propagate into internal code.

  3. Use nil-sentinel dataclasses for "no result". If a function would return None in idiomatic Python, return a frozen NilPath / NilRAGState / etc. singleton from src/result_types.py instead. Callers don't need if x is None: checks; they can read x.read_text and get "" on the nil path.

  4. Use try/finally (no except) for cleanup. Bare try: ...; finally: cleanup() is the canonical goto defer pattern. Use it for resource cleanup, lock release, file handle close. Do NOT use try/except + pass for cleanup; the cleanup should run whether or not an exception occurred.

  5. raise is reserved for programmer errors. assert for "this should never happen" invariants. raise ValueError, raise NotImplementedError, raise KeyError in __init__ for "this object needs X." Do NOT use raise for runtime failures (the network is down, the file doesn't exist, the API rate-limited); those are Result cases.

The 7 MUST-NOT-DO rules

When writing NEW code, you MUST NOT:

  1. DO NOT use Optional[T] as a return type in any of the 3 refactored files (src/mcp_client.py, src/ai_client.py, src/rag_engine.py). Use Result[T] instead. CI fails if you add a new Optional[T] to those files (enforced by scripts/audit_optional_in_3_files.py).

  2. DO NOT use Optional[T] as a return type (anywhere else in src/). The convention is migrating to Result[T]; new code should set the pattern, not perpetuate the old one. Argument types that may be None (caller choice) are still OK.

  3. DO NOT use None as a sentinel for "no result". Use a nil-sentinel dataclass. The data is zero-initialized; the caller doesn't need a None check.

  4. DO NOT raise a custom exception class for runtime failures. SDK exceptions caught and converted to ErrorInfo is the only legitimate exception path. Internal code uses Result.

  5. DO NOT use Union[T, E] (sum type). Use Result[T] with side-channel errors: list[ErrorInfo]. The result is the data AND the errors, not a tagged sum.

  6. DO NOT catch except Exception and silently swallow. Either narrow the exception type, convert to ErrorInfo in a Result, or document the intentional swallow with a comment-free assert for the precondition. The audit script flags this as INTERNAL_SILENT_SWALLOW.

  7. DO NOT catch except Exception in non-*_result code without conversion to ErrorInfo. If you must catch, convert: except SomeError as e: return Result(data=NIL_T, errors=[ErrorInfo(kind=ErrorKind.INTERNAL, message=..., original=e)]). The audit script flags this as INTERNAL_BROAD_CATCH.

The 3 boundary patterns (where try/except IS the right answer)

These are the 3 categories where try/except is legitimate. See the "Boundary Types" section above for the full discussion.

  1. Third-party SDK calls. Wrapping anthropic.Anthropic().messages.create(...) in try/except anthropic.APIError is the canonical pattern. Convert to ErrorInfo; return in Result.

  2. Stdlib I/O that can raise. open(), os.path.*, json.loads(), subprocess.run(), socket.*, sqlite3.* can all raise. (chromadb.PersistentClient() can also raise, but chromadb is a third-party SDK and belongs to pattern 1.) Catch the specific exception (OSError, FileNotFoundError, json.JSONDecodeError, subprocess.CalledProcessError, etc.); convert to ErrorInfo.

  3. FastAPI HTTPException in _api_* handlers. raise HTTPException(status_code=..., detail=...) in a function named _api_* is the FastAPI-idiomatic way to signal HTTP errors. FastAPI converts it to a JSON response at the framework level. This is NOT an exception leak; it's the framework contract.

The pre-commit gate

Before claiming "done," you MUST run:

uv run python scripts/audit_exception_handling.py

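If the script reports any site in a Violation category (INTERNAL_SILENT_SWALLOW, INTERNAL_BROAD_CATCH, INTERNAL_OPTIONAL_RETURN), your code violates the convention. Fix it before committing. INTERNAL_RETHROW (Suspicious) and UNCLEAR sites need a manual review; the BOUNDARY_* categories, INTERNAL_COMPLIANT, and INTERNAL_PROGRAMMER_RAISE are compliant per the classification table in the "Audit Script" section. For CI use: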

uv run python scripts/audit_exception_handling.py --strict

--strict exits 1 on any violation; use this in pre-commit hooks and CI to enforce the convention. The 4 enforcement audit scripts are:

  • scripts/audit_exception_handling.py --strict (this one)
  • scripts/audit_weak_types.py --strict (the type-strengthening audit)
  • scripts/audit_main_thread_imports.py (always strict; the import graph gate)
  • scripts/audit_no_models_config_io.py (the config-I/O ownership gate)

All 4 are part of the convention enforcement. See conductor/product-guidelines.md "Data-Oriented Error Handling" and docs/AGENTS.md §"Convention Enforcement" for the project-level rules.

Why this checklist exists

LLMs are trained on idiomatic Python. Without this checklist, an AI agent writing new code in this codebase will revert to idiomatic patterns (try/except, Optional[T], raise Exception) — the "tech rot with idiomatic Python" the user is preventing. The checklist is the last line of defense. The audit scripts are the automated check; the checklist is the manual one.


See Also

  • conductor/tracks/data_oriented_error_handling_20260606/spec.md: the spec that established this convention.
  • docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the provider layer.
  • docs/guide_mcp_client.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the MCP tool layer.
  • conductor/code_styleguides/data_oriented_design.md (added 2026-06-12): the canonical Data-Oriented Design (DOD) reference; this track is the canonical application of DOD to error handling ("errors are data, not control flow").
  • conductor/code_styleguides/agent_memory_dimensions.md (added 2026-06-12): the 4-dim memory model; the knowledge harvest TDD protocol in workflow.md uses this track's Result pattern.
  • docs/guide_rag.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the RAG engine.
  • Ryan Fleury's original article: the philosophical foundation.