Data-Oriented Error Handling
Status: Active convention as of 2026-06-11. Established by the
data_oriented_error_handling_20260606 track. Canonical reference for all Python error-handling decisions in this codebase.
This styleguide codifies Ryan Fleury's "errors are just cases" framework as the
project convention. The 5 patterns below replace Optional[T] returns and
exception-based control flow with Result[T] dataclasses and nil-sentinel
dataclasses. SDK-boundary exceptions are caught and converted to ErrorInfo;
the rest of the application works with data, not control flow.
Reference: Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have
Them".
Independent corroboration: Timothy Lottes (ERROR[__line__]: _code_ exit
pattern; each error code has exactly one meaning — never overload UNKNOWN),
Valigo ("Exceptions are horrifying"; modern languages without legacy baggage
move away from exceptions — Rust, Jai, Zig, Odin).
The 5 Patterns
1. Nil-Sentinel Dataclasses (replaces None)
When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from.
```python
from dataclasses import dataclass, field

# ErrorInfo is defined in the same module (src/result_types.py).


@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list[ErrorInfo] = field(default_factory=list)


NIL_PATH = NilPath()  # module-level singleton
```
Callers don't need if x is None: checks; they can read x.read_text and
get "" on the nil path.
Convention: NIL_* (uppercase) is the module-level singleton. Nil*
(PascalCase) is the class. Frozen dataclass prevents runtime mutation.
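The following is a minimal sketch of the nil-sentinel pattern applied to a dictionary lookup. UserRecord, NIL_USER_RECORD, find_user, and total_logins are hypothetical names and are not part of src/result_types.py. The lookup never returns None, so the caller has a single code path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical record type; every field has a zero value."""
    name: str = ""
    email: str = ""
    login_count: int = 0


NIL_USER_RECORD = UserRecord()  # module-level singleton, all defaults


def find_user(users: dict[str, UserRecord], user_id: str) -> UserRecord:
    # A missing key yields the nil sentinel, never None.
    return users.get(user_id, NIL_USER_RECORD)


def total_logins(users: dict[str, UserRecord], user_ids: list[str]) -> int:
    # No `is None` checks: unknown ids contribute the nil record's 0.
    return sum(find_user(users, uid).login_count for uid in user_ids)
```

NIL_USER_RECORD is frozen and its fields are immutable (str, int), so one shared instance is safe for every caller.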
2. Zero-Initialization (via @dataclass defaults)
Fresh memory from the OS is zero-initialized. In Python, @dataclass with
field defaults achieves the same: the data is in a valid "empty" state
without any explicit constructor logic.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0
```
Code that consumes String8 (e.g., a for-loop bounded by size) works
correctly with the zero-initialized instance.
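Convention: Mutable defaults use field(default_factory=list), NOT = []. @dataclass rejects a bare list default with ValueError; in a plain class, the same default would be one list shared by every instance.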
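Here is a small sketch of the String8 example above in use. count_spaces is a hypothetical consumer whose loop is bounded by size, so the zero-initialized String8() runs zero iterations. Tags, also hypothetical, shows the default_factory convention. Declaring items: list[str] = [] instead would make @dataclass raise ValueError when the class is defined.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0


def count_spaces(s: String8) -> int:
    # The loop bound comes from `size`; String8() yields range(0), so no
    # special case is needed for the empty value.
    count = 0
    for i in range(s.size):
        if s.text[i] == " ":
            count += 1
    return count


@dataclass
class Tags:
    # default_factory builds a fresh list for every instance.
    items: list[str] = field(default_factory=list)
```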
3. Fail Early (push validation to shallow stack frames)
Don't defer error checks to deep in the call stack. Push them to the entry point so the user knows ASAP if the operation cannot succeed.
```python
def do_thing(path: Path) -> Result[str]:
    resolved = _resolve_path(path)  # validation happens HERE, not deeper
    if not resolved.ok:
        return Result(data="", errors=resolved.errors)
    ...
```
Convention: assert at entry points for invariants. Early return for
user-facing errors. try/finally (Python's analog to goto defer) for
cleanup.
4. AND over OR (Result with side-channel errors; no sum types)
Instead of Union[T, E] or Result<T, E>, return a struct with BOTH data
and errors as parallel fields:
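```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T  # the happy-path result (zero-initialized on failure)
    errors: list[ErrorInfo] = field(default_factory=list)  # side-channel; empty = success
```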
Callers:
```python
r = do_thing(path)
if r.errors:
    for err in r.errors:
        log(err.ui_message())
# use r.data regardless (it's the zero-initialized value on failure)
```
Convention: Result is generic over T (the success data) but NOT over
the error type. Errors are always list[ErrorInfo] (a side-channel list, not
a tagged sum). This collapses the bifurcated if r.ok: ... else: ...
codepaths into a single flat codepath.
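Here is a sketch of the single flat codepath. It assumes a hypothetical parse_size helper whose zero value (0) is neutral for addition. ErrorInfo and Result are simplified stand-ins, with kind as a plain string rather than ErrorKind. The loop never branches on success: it always adds r.data and always extends the error list.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    """Stand-in for src/result_types.ErrorInfo (kind simplified to str)."""
    kind: str
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    """Stand-in for src/result_types.Result: data AND errors, side by side."""
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)


def parse_size(text: str) -> Result[int]:
    # Hypothetical parser: on failure, data is the zero value 0.
    if not text.strip().isdecimal():
        return Result(data=0, errors=[ErrorInfo("INVALID_INPUT", f"not a size: {text!r}")])
    return Result(data=int(text))


def total_size(texts: list[str]) -> Result[int]:
    total = 0
    errors: list[ErrorInfo] = []
    for text in texts:
        r = parse_size(text)
        total += r.data          # 0 on failure, so no if/else is needed
        errors.extend(r.errors)  # empty on success
    return Result(data=total, errors=errors)
```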
5. Error Info as Side-Channel (not as exception)
Errors flow as DATA in the Result struct, not as exceptions. SDK
boundaries (which must catch vendor exceptions) convert them to ErrorInfo:
```python
# ErrorKind is the str Enum defined in the same module (src/result_types.py).
@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None

    def ui_message(self) -> str:
        src = f"[{self.source}] " if self.source else ""
        return f"{src}{self.kind.value}: {self.message}"
```
Convention: ErrorInfo is the canonical error type. The legacy
ai_client.ProviderError exception class is removed; SDK helpers
(_classify_<vendor>_error()) RETURN ErrorInfo instead of raising.
The Data Model
The canonical types live in src/result_types.py:
| Type | Form | Purpose |
|---|---|---|
| ErrorKind | str, Enum (12+ values) | Canonical error taxonomy: NETWORK, AUTH, QUOTA, RATE_LIMIT, BALANCE, PERMISSION, NOT_FOUND, INVALID_INPUT, NOT_READY, UNKNOWN, CONFIG, INTERNAL, plus optional PROVIDER_HISTORY_DIVERGED_FROM_UI for app-vs-provider-state-divergence cases. Each value has exactly one meaning. |
| ErrorInfo | @dataclass(frozen=True) | A single error: kind: ErrorKind, message: str, source: str = "", original: BaseException \| None = None. Frozen; carries ui_message() for display. |
| Result[T] | @dataclass(frozen=True) Generic[T] | The success-or-failure container: data: T, errors: list[ErrorInfo] = field(default_factory=list), ok: bool property, with_error(), with_errors(), with_data() methods. |
| NilPath | @dataclass(frozen=True) + NIL_PATH | Nil-sentinel for filesystem paths. Has exists=False, read_text="", errors=[]. |
| NilRAGState | @dataclass(frozen=True) + NIL_RAG_STATE | Nil-sentinel for the RAG engine. Has enabled=False, is_empty_result=True, errors=[]. |
| OK | Result[None] constant | Trivial success for fail-or-succeed operations that carry no data. |
Result is generic over T only (not over the error type). Errors are
always list[ErrorInfo]. This is the AND-over-OR principle: data and errors
are parallel fields, not a tagged sum.
Decision Tree
```text
Need to represent "missing or failed"?
|
+-- Is the value a "data" value (not a control-flow signal)?
|   +-- Use a Result dataclass (data + errors list)
|   +-- Use a nil-sentinel dataclass (zero-initialized)
|
+-- Is the value a control-flow signal (e.g., "abort" or "skip")?
|   +-- Use a boolean (or enum)
|   +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful
|
+-- Is the failure "unrecoverable" (programmer error, not runtime condition)?
|   +-- Use assert (debug builds)
|   +-- Use raise (only for programmer errors like KeyError on a known dict)
|
+-- Does the SDK raise an exception you can't avoid?
    +-- Catch at the boundary; convert to ErrorInfo inside a Result
```
Anti-Patterns
DON'T do these things:
- DON'T use Optional[X] for "this might fail at runtime". Use Result[X] instead.
- DON'T use None as a sentinel for "no result". Use a nil-sentinel dataclass.
- DON'T raise a custom exception class for runtime failures. Catch SDK exceptions and return ErrorInfo.
- DON'T use Union[T, E] (sum type). Use a struct with parallel fields (AND over OR).
- DON'T have if x is None: handle; else: use_x patterns in production code. The nil-sentinel makes them unnecessary.
- DON'T catch except Exception and silently swallow. Convert to ErrorInfo and return it in the Result.
Examples
The 3 refactored subsystems demonstrate each pattern in context:
- src/mcp_client.py:205-294: read_file, list_directory, and search_files return Result[str]; (p, err) tuples become Result[Path]; the 30+ assert p is not None chain (lines 304-794) is removed.
- src/ai_client.py: _send_<vendor>_result() returns Result[str] (8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok); send_result() is the new public API; send() is @deprecated.
- src/rag_engine.py:100-180: _init_vector_store_result, _validate_collection_dim_result, is_empty_result, and add_documents_result return Result[None] or Result[T]; broad except Exception blocks become ErrorInfo entries.
Hard Rules (enforced in the 3 refactored files)
These are non-negotiable in src/mcp_client.py, src/ai_client.py, and
src/rag_engine.py:
- Optional[T] return types are FORBIDDEN in the 3 refactored files. Use Result[T] (with a NIL_T singleton if needed) instead. Rationale: Optional[T] is the sum type Union[T, None] that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove.
- Function return types must be Result[T] for any function that can fail at runtime. A function that can't fail (e.g., get_name() -> str) doesn't need a Result. The classification is "can this return a different value under different runtime conditions (success vs. failure)?" If yes, Result. If no, plain return type.
- Catch SDK exceptions at the boundary only. Inside the 3 refactored files, exceptions are caught at the SDK call site (e.g., _send_<vendor>_result() wrapping the SDK call). The one other permitted try/except converts OSError, PermissionError, and similar I/O exceptions to ErrorInfo at the mcp_client tool boundary.
The verification script scripts/audit_optional_in_3_files.py enforces the
Optional[X] rule by failing CI if any new Optional[X] appears in the 3
refactored files.
Optional[X] in argument types
The Optional[X] ban above applies to return types only. Argument types
that genuinely may be None (e.g., rag_engine: Optional[Any] = None,
pre_tool_callback: Optional[Callable] = None) remain allowed; they describe
a caller choice, not a runtime failure of this function.
Cross-thread safety
Result and ErrorInfo are @dataclass(frozen=True), so their fields cannot be
reassigned after construction. The with_error() / with_errors() /
with_data() methods produce new instances (no mutation), matching the
project's "no shared mutable state across threads" invariant. Frozen is
shallow, though: Result.errors is still a plain list, so treat it as
read-only once a Result is shared between threads. Deprecation warnings use
warnings.warn(..., stacklevel=2), which is thread-safe.
When to Use This Convention
Use it for:
- New public APIs (any function that can fail at runtime and the caller might care).
- New internal functions where the caller benefits from knowing the failure (vs. just propagating None).

Don't use it for:
- Constructors (__init__) that fail with programmer errors (use assert or raise for these). See "Constructors Can Raise" below for the full rule.
- Trivial getters that can't fail (get_name() -> str doesn't need a Result).
- Performance-critical hot paths where the overhead of the dataclass allocation is measurable (rare; benchmark first).
Boundary Types: What Counts as a "Boundary"?
The convention says "exceptions are reserved for the SDK boundary," but what counts as a boundary? There are 3 categories:
1. Third-party SDK calls
A try/except that wraps a call to a third-party SDK is the canonical
boundary use of the pattern. The catch site converts the SDK's exception
to ErrorInfo (or re-raises if the function is the public API and a Result
is the right return type).
Recognized third-party SDK modules (partial list):
anthropic, google / google.genai / google.api_core, openai,
groq, cohere, chromadb, sentence_transformers, huggingface_hub,
requests, urllib3, httpx, aiohttp, websockets, psutil,
imgui_bundle, dearpygui, PIL, cv2, numpy.
Recognized third-party exception types (partial list):
anthropic.APIError / RateLimitError / AuthenticationError,
google.api_core.exceptions.GoogleAPIError / ResourceExhausted,
openai.OpenAIError / APIError / RateLimitError,
requests.RequestException / ConnectionError / Timeout,
httpx.HTTPError / RequestError,
chromadb.errors.ChromaError,
pydantic.ValidationError.
2. Stdlib I/O that can raise
File and network I/O via stdlib (open(), os.path.*, json.loads(),
subprocess.run(), socket.*, sqlite3.*, csv.*, zipfile.*,
xml.etree.ElementTree) commonly raises. Catching the specific exception
(OSError, FileNotFoundError, PermissionError,
json.JSONDecodeError, subprocess.CalledProcessError, etc.) at the
tool boundary and converting to ErrorInfo is compliant.
This is the "stdlib I/O exception caught in our own code is acceptable"
rule. The catch site should be specific (except FileNotFoundError,
not except Exception) and should convert to ErrorInfo, not swallow.
3. Framework boundaries (FastAPI)
A try/except or raise in a FastAPI _api_* handler is the framework
boundary. raise HTTPException(status_code=..., detail=...) is the
FastAPI-idiomatic way to signal an HTTP error; FastAPI converts it to a
JSON response at the framework level. This is not an exception leak
into internal code; it's the framework contract.
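```python
from fastapi import HTTPException


# Compliant: FastAPI boundary in _api_* handler
async def _api_get_key(controller, header_key: str) -> str:
    if not _is_valid_key(header_key):
        raise HTTPException(status_code=403, detail="Could not validate API Key")
    return header_key


# Compliant: broad catch + HTTPException at the FastAPI boundary
async def _api_generate(controller, payload):
    try:
        result = ai_client.send_result(...)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"AI call failed: {e}") from e
    # send_result() reports failures in result.errors instead of raising;
    # check them outside the try so this HTTPException is not re-caught above.
    if result.errors:
        raise HTTPException(status_code=500, detail=result.errors[0].ui_message())
    return result.data
```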
The catch-all except Exception is acceptable here because the
conversion is to the framework's exception (HTTPException), not to a
silent swallow. The detail message includes the original error; the
HTTP status code is the framework contract.
What is NOT a boundary
- Internal business logic: a try/except around a for loop in a controller method is internal, not a boundary.
- Cross-method calls within src/: calling a method in app_controller.py from another method in app_controller.py is internal, not a boundary.
- stdlib I/O that the user controls directly: opening a file the user passed via --config is internal; report that failure through a Result, not an exception.
The Broad-Except Distinction
Anti-pattern #6 says "DON'T catch except Exception and silently swallow."
But except Exception is not always a violation. The distinction is
what the catch site does with the exception:
| What the catch does | Classification | Convention status |
|---|---|---|
| pass (or no body) | INTERNAL_SILENT_SWALLOW | Violation |
| print(...) / log(...) only | INTERNAL_SILENT_SWALLOW | Violation (the data is lost) |
| return None / return Optional[T] | INTERNAL_OPTIONAL_RETURN | Violation (use Result[T]) |
| return Result(data=..., errors=[ErrorInfo(...)]) | BOUNDARY_CONVERSION | Compliant (the canonical pattern) |
| raise (re-raise) | INTERNAL_RETHROW (or BOUNDARY_SDK if at a third-party call) | Suspicious (often refactorable) |
| raise HTTPException(...) (in _api_* handler) | BOUNDARY_FASTAPI | Compliant (the framework contract) |
The canonical pattern (in _result functions that wrap third-party SDK
calls):
```python
def _validate_collection_dim_result(self) -> Result[None]:
    if self.collection is None or self.collection == "mock":
        return Result(data=None)
    try:
        res = self.collection.get(limit=1, include=["embeddings"])
        # ... validation logic ...
        return Result(data=None)
    except Exception as e:
        return Result(data=None, errors=[
            ErrorInfo(kind=ErrorKind.INTERNAL,
                      message=f"Failed to validate collection dim: {e}",
                      source="rag._validate_collection_dim",
                      original=e)
        ])
```
This except Exception is compliant because the catch + ErrorInfo
conversion IS the data-oriented pattern. The original=e field preserves
the original exception for debugging.
The anti-pattern (in internal code that has nothing to do with a third-party SDK):
```python
# VIOLATION: broad catch + silent swallow
try:
    do_something()
except Exception:
    pass

# VIOLATION: broad catch + log-only (data is lost)
try:
    do_something()
except Exception as e:
    print(f"Error: {e}")
```
Constructors Can Raise
Per the "When to Use This Convention" section, constructors (__init__)
that fail with programmer errors use assert or raise. This section
elaborates.
Compliant constructor raises:
```python
class MyClass:
    def __init__(self, config: Config):
        if config is None:
            raise ValueError("MyClass requires a non-None Config")
        if not config.api_key:
            raise ValueError("MyClass requires a non-empty api_key")
        self._config = config
```
Compliant assert (for impossible states):
```python
def _set_rag_status(self, status: str):
    # The status string is one of a known set; if it's not, the caller
    # has a bug.
    assert status in {"idle", "ready", "syncing", "error"}, f"Unknown status: {status}"
    self._rag_status = status
```
The rule: if the failure is "this object cannot exist without X," raise
in __init__ is the canonical pattern. The Result pattern is for runtime
failures ("the network is down"); raise is for programmer errors ("you
forgot to pass X").
Recognized programmer-error exception types (per
scripts/audit_exception_handling.py INTERNAL_PROGRAMMER_RAISE
category):
AssertionError, ValueError, KeyError, IndexError, TypeError,
AttributeError, NameError, RuntimeError, NotImplementedError.
Re-Raise Patterns
A try/except + raise (without ErrorInfo conversion) is suspicious but
not always a violation. There are 3 legitimate re-raise patterns:
1. Catch + convert + raise as a different type
```python
import json

# Compliant: convert library error to user-friendly error
try:
    value = json.loads(raw)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}") from e
```
The from e preserves the original exception in the traceback. The
new exception type (ValueError) is more meaningful to the caller.
2. Catch + log + re-raise
```python
import logging

logger = logging.getLogger(__name__)

# Compliant: log before propagating
try:
    do_something()
except Exception:
    logger.exception("do_something failed; will propagate")
    raise
```
The log line provides a record; the re-raise preserves the original control flow. This is appropriate when the failure is severe and the caller should still handle it.
3. Catch + cleanup + re-raise
```python
# Compliant: ensure cleanup before propagating
resource = acquire()  # outside the try: if acquire() fails, there is nothing to release
try:
    do_something(resource)
finally:
    release(resource)  # runs on success and on exception; the exception still propagates
```
Use try/finally for the pure cleanup case (no logging/conversion).
Use try/except + re-raise when you need to log or convert AND ensure
cleanup.
Suspicious re-raise (often a code smell)
```python
# SUSPICIOUS: catch + re-raise the same exception (no value-add)
try:
    do_something()
except Exception:
    raise
```
This catches an exception, does nothing with it, and re-raises. The
try/except is dead code; remove it or use a Result-based propagation
instead.
The audit script flags this as INTERNAL_RETHROW (suspicious). If you
see this pattern in code review, ask "is the try/except doing anything
useful? If not, remove it."
Audit Script
The convention is enforced via
scripts/audit_exception_handling.py. This is a static analyzer (AST-based)
that classifies every try/except/finally/raise site in the codebase per
the categories in the previous sections.
Usage:
```shell
# Human-readable report
uv run python scripts/audit_exception_handling.py
# JSON output for tooling
uv run python scripts/audit_exception_handling.py --json
# Include tests/ and scripts/
uv run python scripts/audit_exception_handling.py --include-tests
# Top N files (default: 15)
uv run python scripts/audit_exception_handling.py --top 20
# Show every site inline
uv run python scripts/audit_exception_handling.py --verbose
# Strict mode (exit 1 on any violation; for CI use)
uv run python scripts/audit_exception_handling.py --strict
```
"Delete to turn off" (per feature_flags.md): rm scripts/audit_exception_handling.py disables the audit. Re-enable by
restoring the file (it's tracked in git).
Classification categories (the canonical taxonomy; matches the script's output):
| Category | Convention status | When |
|---|---|---|
| BOUNDARY_SDK | Compliant | Wraps a third-party SDK call |
| BOUNDARY_IO | Compliant | Wraps stdlib I/O that can raise |
| BOUNDARY_CONVERSION | Compliant | Catches and converts to ErrorInfo in a Result |
| BOUNDARY_FASTAPI | Compliant | FastAPI HTTPException in _api_* handler |
| INTERNAL_SILENT_SWALLOW | Violation | except ...: pass, or just logs |
| INTERNAL_BROAD_CATCH | Violation | except Exception without ErrorInfo conversion, in non-*_result code |
| INTERNAL_OPTIONAL_RETURN | Violation | try/except + return None/Optional[T] |
| INTERNAL_RETHROW | Suspicious | try/except + raise (without ErrorInfo conversion) |
| INTERNAL_PROGRAMMER_RAISE | Compliant | raise for impossible state / precondition |
| INTERNAL_COMPLIANT | Compliant | try/finally (no except): canonical cleanup |
| UNCLEAR | Review needed | Can't determine automatically |
Output structure:
```text
=== Exception Handling Audit (Data-Oriented Convention) ===
Files scanned: 65
Files with findings: 42
Total sites: 348
Compliant sites: 80
Suspicious sites: 25
Violation sites: 211
Unclear (review): 32
--- Baseline (refactored files: mcp_client, ai_client, rag_engine) ---
Sites: 112, violations: 77
--- Migration target (all other src/ files) ---
Sites: 236, violations: 134
```
The baseline is the 3 fully-refactored files (the convention reference).
The migration target is the ~10 unrefactored files in src/. The
violation count is informational; the user decides which migration-target
files warrant a refactor track.
Important: the audit is informational, not a CI gate. The script
exits 0 by default. Use --strict to enable CI-gate mode (exit 1 on any
violation). The user is expected to review the report and decide the
next action.
Migration Playbook
When converting existing code:
- Identify the Optional[X] return type or the raise statement.
- Define a Result dataclass (or use the existing one) with data: X and errors: list[ErrorInfo].
- Replace None returns with Result(data=NIL_X, errors=[...]) or Result(data=zero_value, errors=[...]).
- Replace raise X with return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)]).
- Update the caller to check result.errors instead of is None / try/except.
- Add a test that verifies both the success and failure paths return the right Result.
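Here is a before/after sketch of the playbook on a hypothetical get_port() helper that reads a port from an environment-style dict. APP_PORT is an illustrative key. ErrorInfo and Result are simplified stand-ins. On failure, the migrated version returns the zero value 0 together with a CONFIG error instead of None.

```python
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    kind: str  # stand-in for ErrorKind
    message: str


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)


# Before: Optional return; every caller must branch on None, and the reason is lost.
def get_port_legacy(env: dict[str, str]) -> Optional[int]:
    value = env.get("APP_PORT")
    if value is None or not value.isdecimal():
        return None
    return int(value)


# After: Result[int]; the zero value 0 on failure, with the reason attached.
def get_port(env: dict[str, str]) -> Result[int]:
    value = env.get("APP_PORT", "")
    if not value.isdecimal():
        return Result(data=0, errors=[ErrorInfo("CONFIG", f"APP_PORT is not a port number: {value!r}")])
    return Result(data=int(value))
```

The test for the final playbook step would cover both paths: get_port({"APP_PORT": "8080"}) should carry data 8080 and no errors, and get_port({}) should carry data 0 and one CONFIG error.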
Deprecation: ai_client.send() → ai_client.send_result()
The public ai_client.send() is marked @deprecated via
typing_extensions.deprecated, the backport of @warnings.deprecated for Python
versions before 3.13, where it was added to the standard library (PEP 702).
It still works for backward compat but emits a DeprecationWarning at runtime.
New code MUST use ai_client.send_result().
- send_result(...) -> Result[str]: the new public API.
- send(...) -> str: deprecated. Returns str for backward compat; errors are logged to the comms log but not returned.
- Removal timeline: public_api_migration_20260606 follow-up track.
The deprecation warning is cached per call site (Python's __warningregistry__)
to avoid log spam. tests/conftest.py adds a filterwarnings entry to
silence the warning during the transition; new tests for the new API should
assert the warning is NOT emitted by send_result().
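Here is a dependency-free sketch of the deprecation shim. It uses plain warnings.warn(..., stacklevel=2) instead of typing_extensions.deprecated. send and send_result are hypothetical stand-ins with toy bodies, not ai_client's implementations, and the comms-log logging is omitted. The emits_deprecation helper shows one way a test could assert that the new API does not warn.

```python
import warnings
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    """Non-generic stand-in for src/result_types.Result[str]."""
    data: str
    errors: list[str] = field(default_factory=list)


def send_result(prompt: str) -> Result:
    # Hypothetical stand-in for the new API: never warns.
    if not prompt:
        return Result(data="", errors=["empty prompt"])
    return Result(data=f"reply to {prompt}")


def send(prompt: str) -> str:
    # Deprecated shim: warns at the caller's line, delegates, returns bare str.
    warnings.warn("send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2)
    return send_result(prompt).data


def emits_deprecation(fn, *args) -> bool:
    # Record warnings instead of printing them; "always" bypasses the
    # once-per-location cache so repeated checks see every warning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn(*args)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)
```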
AI Agent Checklist (Added 2026-06-16)
This section is for AI agents writing code in this codebase. LLMs are
trained on idiomatic Python (try/except, Optional[T], raise Exception, etc.), which is the OPPOSITE of this convention. The
checklist below catches the most common LLM mistakes. Run this
checklist before claiming a task is done.
The 5 MUST-DO rules
When writing NEW code, you MUST:
- Use Result[T] for any function that can fail at runtime. A function that returns a different value under different runtime conditions (success vs. failure) returns Result[T], not Optional[T], not T | None, not a custom exception class. Use the Result dataclass from src/result_types.py; populate errors: list[ErrorInfo] on failure.
- Catch SDK exceptions at the boundary, convert to ErrorInfo. If your code calls anthropic, google.genai, openai, chromadb, requests, or any other third-party SDK, the catch site converts the exception to ErrorInfo(kind=..., message=...) and returns it in Result.errors. Do NOT re-raise; do NOT swallow; do NOT let the exception propagate into internal code.
- Use nil-sentinel dataclasses for "no result". If a function would return None in idiomatic Python, return a frozen nil-sentinel singleton (NIL_PATH, NIL_RAG_STATE, etc.) from src/result_types.py instead. Callers don't need if x is None: checks; they can read x.read_text and get "" on the nil path.
- Use try/finally (no except) for cleanup. A bare try: ...; finally: cleanup() is the canonical goto defer pattern. Use it for resource cleanup, lock release, and closing file handles. Do NOT use try/except + pass for cleanup; the cleanup should run whether or not an exception occurred.
- raise is reserved for programmer errors. Use assert for "this should never happen" invariants, and raise ValueError, raise NotImplementedError, or raise KeyError in __init__ for "this object needs X." Do NOT use raise for runtime failures (the network is down, the file doesn't exist, the API rate-limited); those are Result cases.
The 7 MUST-NOT-DO rules
When writing NEW code, you MUST NOT:
- DO NOT use Optional[T] as a return type in the 3 refactored files (src/mcp_client.py, src/ai_client.py, src/rag_engine.py). Use Result[T] instead. CI fails if you add a new Optional[T] to those files (enforced by scripts/audit_optional_in_3_files.py).
- DO NOT use Optional[T] as a return type anywhere else in src/. The convention is migrating to Result[T]; new code should set the pattern, not perpetuate the old one. Argument types that may be None (caller choice) are still OK.
- DO NOT use None as a sentinel for "no result". Use a nil-sentinel dataclass. The data is zero-initialized; the caller doesn't need a None check.
- DO NOT raise a custom exception class for runtime failures. Catching SDK exceptions and converting them to ErrorInfo is the only legitimate exception path. Internal code uses Result.
- DO NOT use Union[T, E] (sum type). Use Result[T] with side-channel errors: list[ErrorInfo]. The result is the data AND the errors, not a tagged sum.
- DO NOT catch except Exception and silently swallow. Either narrow the exception type, convert to ErrorInfo in a Result, or, if the swallow is intentional, state the precondition as an assert rather than a comment. The audit script flags this as INTERNAL_SILENT_SWALLOW.
- DO NOT catch except Exception in non-*_result code without conversion to ErrorInfo. If you must catch, convert: except SomeError as e: return Result(data=NIL_T, errors=[ErrorInfo(kind=ErrorKind.INTERNAL, message=..., original=e)]). The audit script flags this as INTERNAL_BROAD_CATCH.
The 3 boundary patterns (where try/except IS the right answer)
These are the 3 categories where try/except is legitimate. See the
"Boundary Types" section above for the full discussion.
- Third-party SDK calls. Wrapping anthropic.Anthropic().messages.create(...) in try/except anthropic.APIError is the canonical pattern. Convert to ErrorInfo; return it in the Result.
- Stdlib I/O that can raise. open(), os.path.*, json.loads(), subprocess.run(), socket.*, and sqlite3.* can all raise. Catch the specific exception (OSError, FileNotFoundError, json.JSONDecodeError, subprocess.CalledProcessError, etc.); convert to ErrorInfo. (chromadb.PersistentClient() can also raise, but it is a third-party SDK call and belongs to the previous category.)
- FastAPI HTTPException in _api_* handlers. raise HTTPException(status_code=..., detail=...) in a function named _api_* is the FastAPI-idiomatic way to signal HTTP errors. FastAPI converts it to a JSON response at the framework level. This is NOT an exception leak; it's the framework contract.
The pre-commit gate
Before claiming "done," you MUST run:
uv run python scripts/audit_exception_handling.py
If the script reports any site in a Violation category (INTERNAL_SILENT_SWALLOW,
INTERNAL_BROAD_CATCH, INTERNAL_OPTIONAL_RETURN) in your code, your code
violates the convention. Fix it before committing, and review any new
INTERNAL_RETHROW (suspicious) or UNCLEAR sites. For CI use:
uv run python scripts/audit_exception_handling.py --strict
--strict exits 1 on any violation; use this in pre-commit hooks and
CI to enforce the convention. The 4 enforcement audit scripts are:
- scripts/audit_exception_handling.py --strict (this one)
- scripts/audit_weak_types.py --strict (the type-strengthening audit)
- scripts/audit_main_thread_imports.py (always strict; the import graph gate)
- scripts/audit_no_models_config_io.py (the config-I/O ownership gate)
All 4 are part of the convention enforcement. See
conductor/product-guidelines.md "Data-Oriented Error Handling" and
docs/AGENTS.md §"Convention Enforcement" for the project-level rules.
Why this checklist exists
LLMs are trained on idiomatic Python. Without this checklist, an
AI agent writing new code in this codebase will revert to idiomatic
patterns (try/except, Optional[T], raise Exception) — the
"tech rot with idiomatic Python" the user is preventing. The
checklist is the last line of defense. The audit scripts are the
automated check; the checklist is the manual one.
- conductor/tracks/data_oriented_error_handling_20260606/spec.md: the spec that established this convention.
- docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the provider layer.
- docs/guide_mcp_client.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the MCP tool layer.
- conductor/code_styleguides/data_oriented_design.md (added 2026-06-12): the canonical Data-Oriented Design (DOD) reference; this track is the canonical application of DOD to error handling ("errors are data, not control flow").
- conductor/code_styleguides/agent_memory_dimensions.md (added 2026-06-12): the 4-dim memory model; the knowledge harvest TDD protocol in workflow.md uses this track's Result pattern.
- docs/guide_rag.md "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the RAG engine.
- Ryan Fleury's original article: the philosophical foundation.