# Data-Oriented Error Handling

> **Status:** Active convention as of 2026-06-11. Established by the
> `data_oriented_error_handling_20260606` track. Canonical reference for all
> Python error-handling decisions in this codebase.

This styleguide codifies Ryan Fleury's "errors are just cases" framework as the project convention. The 5 patterns below replace `Optional[T]` returns and exception-based control flow with `Result[T]` dataclasses and nil-sentinel dataclasses. SDK-boundary exceptions are caught and converted to `ErrorInfo`; the rest of the application works with data, not control flow.

Reference: [Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them"](https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors).

Independent corroboration:

- Timothy Lottes: the `ERROR[__line__]: _code_` exit pattern. Each error code has exactly one meaning; never overload `UNKNOWN`.
- Valigo: "Exceptions are horrifying." Modern languages without legacy baggage (Rust, Jai, Zig, Odin) move away from exceptions.

---

## The 5 Patterns

### 1. Nil-Sentinel Dataclasses (replaces `None`)

When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from.

```python
from dataclasses import dataclass, field

# ErrorInfo is defined earlier in the same module, src/result_types.py (see pattern 5).

@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list[ErrorInfo] = field(default_factory=list)

NIL_PATH = NilPath()  # module-level singleton
```

Callers don't need `if x is None:` checks. They can read `x.read_text` and get `""` on the nil path.

**Convention:** `NIL_*` (uppercase) is the module-level singleton, and `Nil*` (PascalCase) is the class. `frozen=True` prevents a field from being reassigned at runtime. It does not make the `errors` list immutable, so never append to a shared singleton such as `NIL_PATH.errors`.

### 2. Zero-Initialization (via `@dataclass` defaults)

Fresh memory from the OS is zero-initialized. In Python, `@dataclass` field defaults achieve the same effect: the data starts in a valid "empty" state without any explicit constructor logic.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0
```

Code that consumes `String8` (for example, a for-loop bounded by `size`) works correctly with the zero-initialized instance. The loop simply runs zero times.

**Convention:** Use `field(default_factory=list)` for mutable defaults, never `= []`. A bare list default would be shared across instances, and `@dataclass` rejects it with a `ValueError`.

### 3. Fail Early (push validation to shallow stack frames)

Don't defer error checks until deep in the call stack. Move them to the entry point so the user learns as soon as possible that the operation cannot succeed.

```python
from pathlib import Path

# Result lives in src/result_types.py. _resolve_path stands for any validating
# helper that itself returns a Result.

def do_thing(path: Path) -> Result[str]:
    resolved = _resolve_path(path)  # validation happens HERE, not deeper
    if not resolved.ok:
        return Result(data="", errors=resolved.errors)
    ...
```

**Convention:**

- Use `assert` at entry points for invariants.
- Use an early `return` for user-facing errors.
- Use `try/finally` for cleanup. It is Python's analog to `goto defer`.

### 4. AND over OR (Result with side-channel errors; no sum types)

Do not use a sum type such as `Union[T, E]` or a Rust-style `Result<T, E>`. Instead, return a struct that carries BOTH data and errors as parallel fields:

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T  # the happy-path result (zero-initialized on failure)
    errors: list[ErrorInfo] = field(default_factory=list)  # side-channel; empty = success
```

Callers:

```python
r = do_thing(path)
if r.errors:
    for err in r.errors:
        log(err.ui_message())  # `log` stands in for the caller's logging call
# use r.data regardless (it's the zero-initialized value on failure)
```

**Convention:** `Result` is generic over `T` (the success data) but NOT over the error type. Errors are always `list[ErrorInfo]`, a side-channel list rather than a tagged sum. This collapses the bifurcated `if r.ok: ... else: ...` codepaths into a single flat codepath.

### 5. Error Info as Side-Channel (not as exception)

Errors flow as DATA in the `Result` struct, not as exceptions.
SDK boundaries must catch vendor exceptions, and they convert them to `ErrorInfo`:

```python
from dataclasses import dataclass

# ErrorKind is the str Enum described under "The Data Model".

@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None

    def ui_message(self) -> str:
        src = f"[{self.source}] " if self.source else ""
        return f"{src}{self.kind.value}: {self.message}"
```

**Convention:** `ErrorInfo` is the canonical error type. The legacy `ai_client.ProviderError` exception class has been removed. SDK helpers (`_classify__error()`) RETURN `ErrorInfo` instead of raising.

---

## The Data Model

The canonical types live in `src/result_types.py`:

| Type | Form | Purpose |
|---|---|---|
| `ErrorKind` | `str, Enum` (12+ values) | Canonical error taxonomy: `NETWORK`, `AUTH`, `QUOTA`, `RATE_LIMIT`, `BALANCE`, `PERMISSION`, `NOT_FOUND`, `INVALID_INPUT`, `NOT_READY`, `UNKNOWN`, `CONFIG`, `INTERNAL`. There is also an optional `PROVIDER_HISTORY_DIVERGED_FROM_UI` for cases where app state and provider state diverge. Each value has exactly one meaning. |
| `ErrorInfo` | `@dataclass(frozen=True)` | A single error: `kind: ErrorKind`, `message: str`, `source: str = ""`, `original: BaseException \| None = None`. Frozen; provides `ui_message()` for display. |
| `Result[T]` | `@dataclass(frozen=True)`, `Generic[T]` | The success-or-failure container: `data: T`, `errors: list[ErrorInfo] = field(default_factory=list)`, an `ok: bool` property, and the `with_error()`, `with_errors()`, `with_data()` methods. |
| `NilPath` | `@dataclass(frozen=True)` + `NIL_PATH` | Nil-sentinel for filesystem paths. Has `exists=False`, `read_text=""`, `errors=[]`. |
| `NilRAGState` | `@dataclass(frozen=True)` + `NIL_RAG_STATE` | Nil-sentinel for the RAG engine. Has `enabled=False`, `is_empty_result=True`, `errors=[]`. |
| `OK` | `Result[None]` constant | Trivial success for fail-or-succeed operations that carry no data. |

`Result` is **generic over `T` only**, not over the error type. Errors are always `list[ErrorInfo]`.
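The sketch below assembles the table into a self-contained approximation of `src/result_types.py`. The field and member names come from the table. Everything else is an assumption for illustration and is not the file's actual source. That includes the method bodies, the lowercase enum values, and the choice to have every `with_*` method return a new instance. The optional `PROVIDER_HISTORY_DIVERGED_FROM_UI` kind is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ErrorKind(str, Enum):
    # Assumed values: lowercase spellings of the names.
    NETWORK = "network"
    AUTH = "auth"
    QUOTA = "quota"
    RATE_LIMIT = "rate_limit"
    BALANCE = "balance"
    PERMISSION = "permission"
    NOT_FOUND = "not_found"
    INVALID_INPUT = "invalid_input"
    NOT_READY = "not_ready"
    UNKNOWN = "unknown"
    CONFIG = "config"
    INTERNAL = "internal"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None

    def ui_message(self) -> str:
        src = f"[{self.source}] " if self.source else ""
        return f"{src}{self.kind.value}: {self.message}"


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Success is defined purely by the side-channel being empty.
        return not self.errors

    def with_error(self, err: ErrorInfo) -> Result[T]:
        # New instance with a new list; `self` is left untouched.
        return replace(self, errors=[*self.errors, err])

    def with_errors(self, errs: list[ErrorInfo]) -> Result[T]:
        return replace(self, errors=[*self.errors, *errs])

    def with_data(self, data: U) -> Result[U]:
        # Keeps accumulated errors, copied so the two Results don't share a list.
        return Result(data=data, errors=list(self.errors))


@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list[ErrorInfo] = field(default_factory=list)


NIL_PATH = NilPath()
OK: Result[None] = Result(data=None)
```

Every `with_*` method builds a fresh instance with a fresh list. A caller can therefore extend a `Result` it received without affecting the sender's copy.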
Data and errors are parallel fields, not a tagged sum. This is the AND-over-OR principle.

---

## Decision Tree

```
Need to represent "missing or failed"?
  |
  +-- Is the value a "data" value (not a control-flow signal)?
  |     +-- Use a Result dataclass (data + errors list)
  |     +-- Use a nil-sentinel dataclass (zero-initialized)
  |
  +-- Is the value a control-flow signal (e.g., "abort" or "skip")?
  |     +-- Use a boolean (or enum)
  |     +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful
  |
  +-- Is the failure "unrecoverable" (programmer error, not runtime condition)?
  |     +-- Use assert (debug builds)
  |     +-- Use raise (only for programmer errors like KeyError on a known dict)
  |
  +-- Does the SDK raise an exception you can't avoid?
        +-- Catch at the boundary; convert to ErrorInfo inside a Result
```

---

## Anti-Patterns

**DON'T do these things:**

1. **DON'T** use `Optional[X]` for "this might fail at runtime". Use `Result[X]` instead.
2. **DON'T** use `None` as a sentinel for "no result". Use a nil-sentinel dataclass.
3. **DON'T** raise a custom exception class for runtime failures. Catch SDK exceptions and return `ErrorInfo`.
4. **DON'T** use `Union[T, E]` (a sum type). Use a struct with parallel fields (AND over OR).
5. **DON'T** write `if x is None: handle; else: use_x` patterns in production code. The nil-sentinel makes them unnecessary.
6. **DON'T** catch `except Exception` and silently swallow it. Convert it to `ErrorInfo` and return it in the `Result`.

---

## Examples

The 3 refactored subsystems demonstrate each pattern in context:

- **`src/mcp_client.py:205-294`**:
  - `read_file`, `list_directory` and `search_files` return `Result[str]`.
  - `(p, err)` tuples become `Result[Path]`.
  - The 30+ `assert p is not None` chain (lines 304-794) is removed.
- **`src/ai_client.py`**:
  - `_send__result()` returns `Result[str]` for 8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok.
  - `send(...) -> Result[str]` is the public API. Failures arrive in its `errors: list[ErrorInfo]`.
- **`src/rag_engine.py:100-180`**:
  - `_init_vector_store_result`, `_validate_collection_dim_result`, `is_empty_result` and `add_documents_result` return `Result[None]` or `Result[T]`.
  - Broad `except Exception` blocks become `ErrorInfo` entries.

---

## Hard Rules (enforced in the 3 refactored files)

These rules are non-negotiable in `src/mcp_client.py`, `src/ai_client.py`, and `src/rag_engine.py`:

- **`Optional[T]` return types are FORBIDDEN** in the 3 refactored files. Use `Result[T]` instead, with a `NIL_T` singleton if needed. Rationale: `Optional[T]` is the sum type `Union[T, None]` that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove.
- **A function that can fail at runtime must return `Result[T]`.** A function that can't fail (e.g., `get_name() -> str`) doesn't need a `Result`. Ask: "Can this call succeed or fail depending on runtime conditions such as I/O, the network, or external state?" If yes, return a `Result`. If no, use a plain return type.
- **Catch SDK exceptions at the boundary only.** Inside the 3 refactored files, exceptions are caught only at the SDK call site (e.g., `_send__result()` wrapping the SDK call). The one internal exception: `try/except` may convert `OSError`, `PermissionError`, and similar I/O exceptions to `ErrorInfo` at the `mcp_client` tool boundary.

The verification script `scripts/audit_optional_in_3_files.py` enforces the `Optional[X]` rule. It fails CI if any new `Optional[X]` appears in the 3 refactored files.

### `Optional[X]` in argument types

The `Optional[X]` ban above applies to **return types only**. Argument types that genuinely may be `None` remain allowed, e.g. `rag_engine: Optional[Any] = None` or `pre_tool_callback: Optional[Callable] = None`. They describe a caller choice, not a runtime failure of this function.

### Cross-thread safety

`Result` and `ErrorInfo` are `@dataclass(frozen=True)`, so their fields cannot be reassigned after construction. The `errors` field is still a plain `list`, so treat it as read-only once a `Result` is shared.
The `with_error()` / `with_errors()` / `with_data()` methods produce new instances instead of mutating. This matches the project's "no shared mutable state across threads" invariant. Deprecation warnings use `warnings.warn(..., stacklevel=2)`, which is thread-safe.

---

## When to Use This Convention

**Use it for:**

- New public APIs: any function that can fail at runtime where the caller might care.
- New internal functions where the caller benefits from knowing about the failure, rather than just receiving `None`.

**Don't use it for:**

- Constructors (`__init__`) that fail with programmer errors. Use `assert` or `raise` for these. See "Constructors Can Raise" below for the full rule.
- Trivial getters that can't fail (`get_name() -> str` doesn't need a `Result`).
- Performance-critical hot paths where the overhead of the dataclass allocation is measurable. This is rare; benchmark first.

---

## Boundary Types: What Counts as a "Boundary"?

The convention says "exceptions are reserved for the SDK boundary," but what counts as a boundary? There are 3 categories.

### 1. Third-party SDK calls

A `try/except` that wraps a call to a third-party SDK is the canonical boundary use of the pattern. The catch site converts the SDK's exception to `ErrorInfo` and returns it in a `Result`. Re-raising is reserved for the rare public API whose contract is not `Result`-based.

Recognized third-party SDK modules (partial list): `anthropic`, `google` / `google.genai` / `google.api_core`, `openai`, `groq`, `cohere`, `chromadb`, `sentence_transformers`, `huggingface_hub`, `requests`, `urllib3`, `httpx`, `aiohttp`, `websockets`, `psutil`, `imgui_bundle`, `dearpygui`, `PIL`, `cv2`, `numpy`.
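Here is a minimal sketch of the boundary conversion. It assumes a vendor SDK whose exceptions share a common base class.

- `VendorError` and its subclasses are hypothetical stand-ins for a real SDK's exception hierarchy (for example, `anthropic.APIError` and its subclasses).
- `classify_vendor_error` and `send_vendor_result` are hypothetical names. They play the role of this codebase's per-vendor classify and send helpers.
- The trimmed `ErrorKind`, `ErrorInfo` and `Result` are stand-ins for `src/result_types.py`.
- The exception-to-kind mapping is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ErrorKind(str, Enum):  # trimmed to the kinds this sketch uses
    NETWORK = "network"
    AUTH = "auth"
    RATE_LIMIT = "rate_limit"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)


# Hypothetical stand-ins for a vendor SDK's exception hierarchy.
class VendorError(Exception): ...
class VendorRateLimitError(VendorError): ...
class VendorAuthError(VendorError): ...
class VendorConnectionError(VendorError): ...


# Checked in order; the first matching type wins.
_KIND_BY_EXC: list[tuple[type[BaseException], ErrorKind]] = [
    (VendorRateLimitError, ErrorKind.RATE_LIMIT),
    (VendorAuthError, ErrorKind.AUTH),
    (VendorConnectionError, ErrorKind.NETWORK),
]


def classify_vendor_error(exc: BaseException, source: str) -> ErrorInfo:
    """Map an SDK exception to ErrorInfo. RETURNS; never raises."""
    for exc_type, kind in _KIND_BY_EXC:
        if isinstance(exc, exc_type):
            return ErrorInfo(kind=kind, message=str(exc), source=source, original=exc)
    return ErrorInfo(kind=ErrorKind.UNKNOWN, message=str(exc), source=source, original=exc)


def send_vendor_result(call: Callable[[str], str], prompt: str) -> Result[str]:
    """The boundary: the only try/except, and it converts instead of swallowing."""
    try:
        return Result(data=call(prompt))
    except VendorError as e:  # the SDK's base class, not `Exception`
        return Result(data="", errors=[classify_vendor_error(e, source="vendor.send")])
```

The `try/except` catches the SDK's base class rather than `Exception`. A bug in the caller's own code, such as a `TypeError`, therefore still surfaces as a programmer error and is not reported as a vendor failure.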
Recognized third-party exception types (partial list):

- `anthropic.APIError` / `RateLimitError` / `AuthenticationError`
- `google.api_core.exceptions.GoogleAPIError` / `ResourceExhausted`
- `openai.OpenAIError` / `APIError` / `RateLimitError`
- `requests.RequestException` / `ConnectionError` / `Timeout`
- `httpx.HTTPError` / `RequestError`
- `chromadb.errors.ChromaError`
- `pydantic.ValidationError`

### 2. Stdlib I/O that can raise

File and network I/O through the stdlib commonly raises. This covers `open()`, `os.path.*`, `json.loads()`, `subprocess.run()`, `socket.*`, `sqlite3.*`, `csv.*`, `zipfile.*` and `xml.etree.ElementTree`.

It is compliant to catch the specific exception at the tool boundary and convert it to `ErrorInfo`. Typical exceptions are `OSError`, `FileNotFoundError`, `PermissionError`, `json.JSONDecodeError` and `subprocess.CalledProcessError`. This is the "stdlib I/O exception caught in our own code is acceptable" rule.

The catch site should be **specific** (`except FileNotFoundError`, not `except Exception`). It should convert to `ErrorInfo`, not swallow.

### 3. Framework boundaries (FastAPI)

A `try/except` or `raise` in a FastAPI `_api_*` handler is the framework boundary. `raise HTTPException(status_code=..., detail=...)` is the FastAPI-idiomatic way to signal an HTTP error, and FastAPI converts it to a JSON response at the framework level. This is **not** an exception leaking into internal code; it is the framework contract.

```python
from fastapi import HTTPException

# Compliant: FastAPI boundary in _api_* handler
async def _api_get_key(controller, header_key: str) -> str:
    if not _is_valid_key(header_key):
        raise HTTPException(status_code=403, detail="Could not validate API Key")
    return header_key

# Compliant: broad catch + HTTPException at the FastAPI boundary
async def _api_generate(controller, payload):
    try:
        result = ai_client.send(...)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"AI call failed: {e}") from e
    if result.errors:
        # send() reports runtime failures as data, so they must be checked explicitly.
        detail = "; ".join(err.ui_message() for err in result.errors)
        raise HTTPException(status_code=500, detail=f"AI call failed: {detail}")
    return result.data
```

The catch-all `except Exception` is acceptable here **because the conversion is to the framework's exception** (`HTTPException`), not a silent swallow. The detail message includes the original error, and the HTTP status code is the framework contract.

### What is NOT a boundary

- Internal business logic: a `try/except` around a `for` loop in a controller method is internal, not a boundary.
- Cross-method calls within `src/`: calling one method in `app_controller.py` from another method in `app_controller.py` is internal, not a boundary.
- Stdlib I/O that the user controls directly: opening a file the user passed via `--config` is internal. Report that failure through a `Result`, not an exception.

---

## The Broad-Except Distinction

Anti-pattern #6 says "DON'T catch `except Exception` and silently swallow it." But `except Exception` is **not always a violation**. The distinction is **what the catch site does with the exception**:

| What the catch does | Classification | Convention status |
|---|---|---|
| `pass` (or an equivalent empty body such as `...`) | `INTERNAL_SILENT_SWALLOW` | **Violation** |
| `print(...)` / `log(...)` only | `INTERNAL_SILENT_SWALLOW` | **Violation** (the data is lost) |
| `return None` / `return Optional[T]` | `INTERNAL_OPTIONAL_RETURN` | **Violation** (use `Result[T]`) |
| `return Result(data=..., errors=[ErrorInfo(...)])` | `BOUNDARY_CONVERSION` | **Compliant** (the canonical pattern) |
| `raise` (re-raise) | `INTERNAL_RETHROW` (or `BOUNDARY_SDK` if at a third-party call) | **Suspicious** (often refactorable) |
| `raise HTTPException(...)` (in an `_api_*` handler) | `BOUNDARY_FASTAPI` | **Compliant** (the framework contract) |

**The canonical pattern** is used in `_result` functions that wrap third-party SDK calls:

```python
def _validate_collection_dim_result(self) -> Result[None]:
    if self.collection is None or self.collection == "mock":
        return Result(data=None)
    try:
        res = self.collection.get(limit=1, include=["embeddings"])
        # ... validation logic ...
        return Result(data=None)
    except Exception as e:
        return Result(data=None, errors=[
            ErrorInfo(kind=ErrorKind.INTERNAL,
                      message=f"Failed to validate collection dim: {e}",
                      source="rag._validate_collection_dim",
                      original=e)
        ])
```

This `except Exception` is **compliant** because catching and converting to `ErrorInfo` IS the data-oriented pattern. The `original=e` field preserves the original exception for debugging.

**The anti-pattern** appears in internal code that has nothing to do with a third-party SDK:

```python
# VIOLATION: broad catch + silent swallow
try:
    do_something()
except Exception:
    pass

# VIOLATION: broad catch + log-only (data is lost)
try:
    do_something()
except Exception as e:
    print(f"Error: {e}")
```

---

## Constructors Can Raise

Per the "When to Use This Convention" section, constructors (`__init__`) that fail with programmer errors use `assert` or `raise`. This section elaborates.

**Compliant constructor raises:**

```python
class MyClass:
    def __init__(self, config: Config):
        if config is None:
            raise ValueError("MyClass requires a non-None Config")
        if not config.api_key:
            raise ValueError("MyClass requires a non-empty api_key")
        self._config = config
```

**Compliant assert (for impossible states):**

```python
def _set_rag_status(self, status: str):
    # The status string is one of a known set; if it's not, the caller
    # has a bug.
    assert status in {"idle", "ready", "syncing", "error"}, f"Unknown status: {status}"
    self._rag_status = status
```

**The rule:** if the failure is "this object cannot exist without X," raising in `__init__` is the canonical pattern. The `Result` pattern is for runtime failures ("the network is down"). `raise` is for programmer errors ("you forgot to pass X").
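The sketch below puts the two cases side by side. All names in it are hypothetical.

- The `AppConfig` constructor raises for a programmer error: an empty key.
- `load_config_result()` reports runtime failures through a `Result`: a missing, unreadable or malformed file. It catches only the specific stdlib exceptions at the I/O boundary.
- The trimmed `ErrorKind`, `ErrorInfo` and `Result` are stand-ins for `src/result_types.py`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(str, Enum):  # trimmed stand-in
    NOT_FOUND = "not_found"
    PERMISSION = "permission"
    INVALID_INPUT = "invalid_input"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)


class AppConfig:
    def __init__(self, api_key: str):
        # Programmer error: the object cannot exist without a key.
        if not api_key:
            raise ValueError("AppConfig requires a non-empty api_key")
        self.api_key = api_key


def load_config_result(path: Path) -> Result[dict]:
    # Runtime failures: the file may be missing, unreadable, or malformed.
    src = "config.load"
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError as e:
        return Result(data={}, errors=[ErrorInfo(ErrorKind.NOT_FOUND, str(e), src, e)])
    except PermissionError as e:
        return Result(data={}, errors=[ErrorInfo(ErrorKind.PERMISSION, str(e), src, e)])
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        return Result(data={}, errors=[ErrorInfo(ErrorKind.INVALID_INPUT, f"bad JSON: {e}", src, e)])
    if not isinstance(parsed, dict):
        return Result(data={}, errors=[ErrorInfo(ErrorKind.INVALID_INPUT, "config root must be an object", src)])
    return Result(data=parsed)
```

Callers of `load_config_result()` check `r.errors` and can always fall back to `r.data`, which is an empty dict on failure. An empty key passed to `AppConfig()` is a bug, so the resulting `ValueError` points straight at the caller.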
**Recognized programmer-error exception types** (per the `INTERNAL_PROGRAMMER_RAISE` category in `scripts/audit_exception_handling.py`): `AssertionError`, `ValueError`, `KeyError`, `IndexError`, `TypeError`, `AttributeError`, `NameError`, `RuntimeError`, `NotImplementedError`.

---

## Re-Raise Patterns

A `try/except + raise` without `ErrorInfo` conversion is **suspicious** but not always a violation. There are 3 legitimate re-raise patterns.

### 1. Catch + convert + raise as a different type

```python
import json

# Compliant: convert library error to user-friendly error
try:
    value = json.loads(raw)
except json.JSONDecodeError as e:
    raise ValueError(f"Invalid JSON: {e}") from e
```

The `from e` keeps the original exception in the traceback (as `__cause__`). The new exception type (`ValueError`) is more meaningful to the caller.

### 2. Catch + log + re-raise

```python
# Compliant: log before propagating
try:
    do_something()
except Exception:
    logger.exception("do_something failed; will propagate")
    raise
```

The log line provides a record, and the re-raise preserves the original control flow. Use this when the failure is severe and the caller should still handle it.

### 3. Catch + cleanup + re-raise

```python
# Pure cleanup: try/finally, no except.
resource = acquire()  # acquire before `try` so `resource` is always bound in `finally`
try:
    do_something(resource)
finally:
    release(resource)  # runs on success and on failure; the exception still propagates

# Cleanup plus logging: except + bare `raise`, with `finally` for the release.
resource = acquire()
try:
    do_something(resource)
except Exception:
    logger.exception("do_something failed; will propagate")
    raise
finally:
    release(resource)
```

Use `try/finally` for the pure cleanup case, with no logging or conversion. Use `try/except` + re-raise, with `finally` for the release, when you need to log or convert AND ensure cleanup.

### Suspicious re-raise (often a code smell)

```python
# SUSPICIOUS: catch + re-raise the same exception (no value-add)
try:
    do_something()
except Exception:
    raise
```

This catches an exception, does nothing with it, and re-raises it. The `try/except` is dead code: remove it, or switch to `Result`-based propagation. The audit script flags this as `INTERNAL_RETHROW` (suspicious).
If you see this pattern in code review, ask: "Is the `try/except` doing anything useful? If not, remove it."

---

## Audit Script

The convention is enforced via `scripts/audit_exception_handling.py`. This static analyzer is AST-based. It classifies every `try/except/finally/raise` site in the codebase using the categories from the previous sections.

**Usage:**

```bash
# Human-readable report
uv run python scripts/audit_exception_handling.py

# JSON output for tooling
uv run python scripts/audit_exception_handling.py --json

# Include tests/ and scripts/
uv run python scripts/audit_exception_handling.py --include-tests

# Top N files (default: 15)
uv run python scripts/audit_exception_handling.py --top 20

# Show every site inline
uv run python scripts/audit_exception_handling.py --verbose

# Strict mode (exit 1 on any violation; for CI use)
uv run python scripts/audit_exception_handling.py --strict
```

**"Delete to turn off"** (per `feature_flags.md`): `rm scripts/audit_exception_handling.py` disables the audit. To re-enable it, restore the file (it's tracked in git).
**Classification categories** (the canonical taxonomy; matches the script's output):

| Category | Convention status | When |
|---|---|---|
| `BOUNDARY_SDK` | Compliant | Wraps a third-party SDK call |
| `BOUNDARY_IO` | Compliant | Wraps stdlib I/O that can raise |
| `BOUNDARY_CONVERSION` | Compliant | Catches and converts to `ErrorInfo` in a `Result` |
| `BOUNDARY_FASTAPI` | Compliant | FastAPI `HTTPException` in an `_api_*` handler |
| `INTERNAL_SILENT_SWALLOW` | **Violation** | `except ...: pass`, or only logs |
| `INTERNAL_BROAD_CATCH` | **Violation** | `except Exception` without `ErrorInfo` conversion, in non-`*_result` code |
| `INTERNAL_OPTIONAL_RETURN` | **Violation** | `try/except` + `return None` / `Optional[T]` |
| `INTERNAL_RETHROW` | Suspicious | `try/except` + `raise` (without `ErrorInfo` conversion) |
| `INTERNAL_PROGRAMMER_RAISE` | Compliant | `raise` for an impossible state / precondition |
| `INTERNAL_COMPLIANT` | Compliant | `try/finally` (no `except`): canonical cleanup |
| `UNCLEAR` | Review needed | Can't be determined automatically |

**Output structure** (example report):

```
=== Exception Handling Audit (Data-Oriented Convention) ===
Files scanned: 65
Files with findings: 42
Total sites: 348
Compliant sites: 80
Suspicious sites: 25
Violation sites: 211
Unclear (review): 32

--- Baseline (refactored files: mcp_client, ai_client, rag_engine) ---
Sites: 112, violations: 77

--- Migration target (all other src/ files) ---
Sites: 236, violations: 134
```

The report has two groups:

- The **baseline** is the 3 fully refactored files (the convention reference).
- The **migration target** is every other file in `src/`.

The violation count is informational. The user decides which migration-target files warrant a refactor track.

**Important:** the audit is **informational**, not a CI gate. By default the script exits with status 0. Use `--strict` to enable CI-gate mode, which exits with status 1 on any violation. The user is expected to review the report and decide the next action.

---

## Migration Playbook

When converting existing code:

1. Identify the `Optional[X]` return type or the `raise` statement.
2. Use the existing `Result` dataclass from `src/result_types.py`. It already carries `data: X` and `errors: list[ErrorInfo]`.
3. Replace `None` returns with `Result(data=NIL_X, errors=[...])` or `Result(data=zero_value, errors=[...])`.
4. For runtime failures (not programmer errors), replace `raise X` with `return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)])`.
5. Update the caller to check `result.errors` instead of using `is None` / `try/except`.
6. Add a test that verifies both the success and failure paths return the right `Result`.

---

## Historical deprecation (added 2026-06-15, reverted 2026-06-16)

- **2026-06-15:** the `public_api_migration_and_ui_polish_20260615` track marked the public `ai_client.send()` as `@deprecated` in favor of `ai_client.send_result()`.
- **2026-06-16:** the `send_result_to_send_20260616` track reverted that decision. The Tier 2 autonomous sandbox had proved capable of doing the rename safely.

`ai_client.send(...) -> Result[str]` is the canonical public API, with failures in `errors: list[ErrorInfo]`. No deprecation is in effect.

For the historical record of the brief deprecation cycle, see `conductor/tracks/public_api_migration_and_ui_polish_20260615/spec.md` and `conductor/tracks/send_result_to_send_20260616/spec.md`.

---

## AI Agent Checklist (Added 2026-06-16)

This section is for AI agents writing code in this codebase. LLMs are trained on idiomatic Python (`try/except`, `Optional[T]`, `raise Exception`, etc.), which is the OPPOSITE of this convention. The checklist below catches the most common LLM mistakes.

**Run this checklist before claiming a task is done.**

### The 5 MUST-DO rules

When writing NEW code, you MUST:

1. **Use `Result[T]` for any function that can fail at runtime.** A function whose outcome (success vs. failure) depends on runtime conditions returns `Result[T]`. It does not return `Optional[T]` or `T | None`, and it does not raise a custom exception class.
   Use the `Result` dataclass from `src/result_types.py`, and populate `errors: list[ErrorInfo]` on failure.
2. **Catch SDK exceptions at the boundary and convert them to `ErrorInfo`.** This applies if your code calls `anthropic`, `google.genai`, `openai`, `chromadb`, `requests`, or any other third-party SDK. The catch site converts the exception to `ErrorInfo(kind=..., message=...)` and returns it in `Result.errors`. Do NOT re-raise it, swallow it, or let it propagate into internal code.
3. **Use nil-sentinel dataclasses for "no result".** If a function would return `None` in idiomatic Python, return a frozen singleton from `src/result_types.py` instead (`NilPath`, `NilRAGState`, etc.). Callers don't need `if x is None:` checks. They can read `x.read_text` and get `""` on the nil path.
4. **Use `try/finally` (no `except`) for cleanup.** Bare `try: ... finally: cleanup()` is the canonical `goto defer` pattern. Use it for resource cleanup, lock release, and closing file handles. Do NOT use `try/except` + `pass` for cleanup; the cleanup must run whether or not an exception occurred.
5. **`raise` is reserved for programmer errors.**
   - Use `assert` for "this should never happen" invariants.
   - Use `raise ValueError`, `raise NotImplementedError` or `raise KeyError` in `__init__` for "this object needs X."
   - Do NOT use `raise` for runtime failures: the network is down, the file doesn't exist, the API rate-limited you. Those are `Result` cases.

### The 7 MUST-NOT-DO rules

When writing NEW code, you MUST NOT:

1. **DO NOT use `Optional[T]` as a return type** in the 3 refactored files: `src/mcp_client.py`, `src/ai_client.py` and `src/rag_engine.py`. Use `Result[T]` instead. CI fails if you add a new `Optional[T]` to those files; `scripts/audit_optional_in_3_files.py` enforces this.
2. **DO NOT use `Optional[T]` as a return type** anywhere else in `src/`. The convention is migrating to `Result[T]`, and new code should set the pattern, not perpetuate the old one.
   Argument types that may be `None` (a caller choice) are still OK.
3. **DO NOT use `None` as a sentinel for "no result".** Use a nil-sentinel dataclass. Its data is zero-initialized, so the caller doesn't need a `None` check.
4. **DO NOT raise a custom exception class for runtime failures.** The only legitimate exception path is catching an SDK exception and converting it to `ErrorInfo`. Internal code uses `Result`.
5. **DO NOT use `Union[T, E]` (a sum type).** Use `Result[T]` with the side-channel `errors: list[ErrorInfo]`. The result is the data AND the errors, not a tagged sum.
6. **DO NOT catch `except Exception` and silently swallow it.** Choose one of these instead:
   - Narrow the exception type.
   - Convert the exception to `ErrorInfo` in a `Result`.
   - If the exception can only mean a violated precondition, replace the `try/except` with an `assert` on that precondition.

   The audit script flags silent swallows as `INTERNAL_SILENT_SWALLOW`.
7. **DO NOT catch `except Exception` in non-`*_result` code without converting to `ErrorInfo`.** If you must catch, convert: `except SomeError as e: return Result(data=NIL_T, errors=[ErrorInfo(kind=ErrorKind.INTERNAL, message=..., original=e)])`. The audit script flags this as `INTERNAL_BROAD_CATCH`.

### The 3 boundary patterns (where `try/except` IS the right answer)

These are the 3 categories where `try/except` is legitimate. See the "Boundary Types" section above for the full discussion.

1. **Third-party SDK calls.** The canonical pattern is wrapping `anthropic.Anthropic().messages.create(...)` in `try/except anthropic.APIError`. The same applies to other SDK calls that can raise, such as `chromadb.PersistentClient()`. Convert to `ErrorInfo` and return it in a `Result`.
2. **Stdlib I/O that can raise.** `open()`, `os.path.*`, `json.loads()`, `subprocess.run()`, `socket.*` and `sqlite3.*` can all raise. Catch the specific exception (`OSError`, `FileNotFoundError`, `json.JSONDecodeError`, `subprocess.CalledProcessError`, etc.) and convert it to `ErrorInfo`.
3. **FastAPI `HTTPException` in `_api_*` handlers.** In a function named `_api_*`, `raise HTTPException(status_code=..., detail=...)` is the FastAPI-idiomatic way to signal HTTP errors. FastAPI converts it to a JSON response at the framework level. This is NOT an exception leak; it is the framework contract.

### The pre-commit gate

Before claiming "done," you MUST run:

```bash
uv run python scripts/audit_exception_handling.py
```

Read the report against the categories table:

- **Violations:** `INTERNAL_SILENT_SWALLOW`, `INTERNAL_BROAD_CATCH` and `INTERNAL_OPTIONAL_RETURN`. If any appear in your code, it violates the convention; fix them before committing.
- **Needs review:** `INTERNAL_RETHROW` (suspicious) and `UNCLEAR`.
- **Compliant:** every `BOUNDARY_*` category, `INTERNAL_PROGRAMMER_RAISE` and `INTERNAL_COMPLIANT`.

For CI use:

```bash
uv run python scripts/audit_exception_handling.py --strict
```

`--strict` exits with status 1 on any violation. Use it in pre-commit hooks and CI to enforce the convention.

The 4 enforcement audit scripts are:

- `scripts/audit_exception_handling.py --strict` (this one)
- `scripts/audit_weak_types.py --strict` (the type-strengthening audit)
- `scripts/audit_main_thread_imports.py` (always strict; the import-graph gate)
- `scripts/audit_no_models_config_io.py` (the config-I/O ownership gate)

All 4 are part of the convention enforcement. See `conductor/product-guidelines.md` "Data-Oriented Error Handling" and `docs/AGENTS.md` §"Convention Enforcement" for the project-level rules.

### Why this checklist exists

LLMs are trained on idiomatic Python. Without this checklist, an AI agent writing new code here will fall back to idiomatic patterns such as `try/except`, `Optional[T]` and `raise Exception`. That is the "tech rot with idiomatic Python" the user is preventing. The checklist is the last line of defense: the audit scripts are the automated check, and the checklist is the manual one.

---

- `conductor/tracks/data_oriented_error_handling_20260606/spec.md`: the spec that established this convention.
- `docs/guide_ai_client.md` "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the provider layer.
- `docs/guide_mcp_client.md` "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the MCP tool layer.
- `conductor/code_styleguides/data_oriented_design.md` (added 2026-06-12): the canonical Data-Oriented Design (DOD) reference. This track is the canonical application of DOD to error handling ("errors are data, not control flow").
- `conductor/code_styleguides/agent_memory_dimensions.md` (added 2026-06-12): the 4-dim memory model. The knowledge harvest TDD protocol in `workflow.md` uses this track's `Result` pattern.
- `docs/guide_rag.md` "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the RAG engine.
- Ryan Fleury's [original article](https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors): the philosophical foundation.