# Data-Oriented Error Handling > **Status:** Active convention as of 2026-06-11. Established by the > `data_oriented_error_handling_20260606` track. Canonical reference for all > Python error-handling decisions in this codebase. This styleguide codifies Ryan Fleury's "errors are just cases" framework as the project convention. The 5 patterns below replace `Optional[T]` returns and exception-based control flow with `Result[T]` dataclasses and nil-sentinel dataclasses. SDK-boundary exceptions are caught and converted to `ErrorInfo`; the rest of the application works with data, not control flow. Reference: [Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them"](https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors). Independent corroboration: Timothy Lottes (`ERROR[__line__]: _code_` exit pattern; each error code has exactly one meaning — never overload `UNKNOWN`), Valigo ("Exceptions are horrifying"; modern languages without legacy baggage move away from exceptions — Rust, Jai, Zig, Odin). --- ## The 5 Patterns ### 1. Nil-Sentinel Dataclasses (replaces `None`) When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from. ```python from dataclasses import dataclass, field @dataclass(frozen=True) class NilPath: exists: bool = False read_text: str = "" errors: list[ErrorInfo] = field(default_factory=list) NIL_PATH = NilPath() # module-level singleton ``` Callers don't need `if x is None:` checks; they can call `x.read_text` and get `""` on the nil path. **Convention:** `NIL_*` (uppercase) is the module-level singleton. `Nil*` (PascalCase) is the class. Frozen dataclass prevents runtime mutation. ### 2. Zero-Initialization (via `@dataclass` defaults) Fresh memory from the OS is zero-initialized. In Python, `@dataclass` with field defaults achieves the same: the data is in a valid "empty" state without any explicit constructor logic. ```python @dataclass(frozen=True) class String8: text: str = "" size: int = 0 ``` Code that consumes `String8` (e.g., a for-loop bounded by `size`) works correctly with the zero-initialized instance. **Convention:** Mutable defaults use `field(default_factory=list)` (NOT `= []`, which is shared across instances). ### 3. Fail Early (push validation to shallow stack frames) Don't defer error checks to deep in the call stack. Push them to the entry point so the user knows ASAP if the operation cannot succeed. ```python def do_thing(path: Path) -> Result[str]: resolved = _resolve_path(path) # validation happens HERE, not deeper if not resolved.ok: return Result(data="", errors=resolved.errors) ... ``` **Convention:** `assert` at entry points for invariants. Early `return` for user-facing errors. `try/finally` (Python's analog to `goto defer`) for cleanup. ### 4. AND over OR (Result with side-channel errors; no sum types) Instead of `Union[T, E]` or `Result`, return a struct with BOTH data and errors as parallel fields: ```python @dataclass(frozen=True) class Result(Generic[T]): data: T # the happy-path result (zero-initialized on failure) errors: list[ErrorInfo] = field(default_factory=list) # side-channel; empty = success ``` Callers: ```python r = do_thing(path) if r.errors: for err in r.errors: log(err.ui_message()) # use r.data regardless (it's the zero-initialized value on failure) ``` **Convention:** `Result` is generic over `T` (the success data) but NOT over the error type. Errors are always `list[ErrorInfo]` (a side-channel list, not a tagged sum). This collapses the bifurcated `if r.ok: ... else: ...` codepaths into a single flat codepath. ### 5. Error Info as Side-Channel (not as exception) Errors flow as DATA in the `Result` struct, not as exceptions. SDK boundaries (which must catch vendor exceptions) convert them to `ErrorInfo`: ```python @dataclass(frozen=True) class ErrorInfo: kind: ErrorKind message: str source: str = "" original: BaseException | None = None def ui_message(self) -> str: src = f"[{self.source}] " if self.source else "" return f"{src}{self.kind.value}: {self.message}" ``` **Convention:** `ErrorInfo` is the canonical error type. The legacy `ai_client.ProviderError` exception class is removed; SDK helpers (`_classify__error()`) RETURN `ErrorInfo` instead of raising. --- ## The Data Model The canonical types live in `src/result_types.py`: | Type | Form | Purpose | |---|---|---| | `ErrorKind` | `str, Enum` (12+ values) | Canonical error taxonomy: `NETWORK`, `AUTH`, `QUOTA`, `RATE_LIMIT`, `BALANCE`, `PERMISSION`, `NOT_FOUND`, `INVALID_INPUT`, `NOT_READY`, `UNKNOWN`, `CONFIG`, `INTERNAL`, plus optional `PROVIDER_HISTORY_DIVERGED_FROM_UI` for app-vs-provider-state-divergence cases. Each value has exactly one meaning. | | `ErrorInfo` | `@dataclass(frozen=True)` | A single error: `kind: ErrorKind`, `message: str`, `source: str = ""`, `original: BaseException \| None = None`. Frozen; carries `ui_message()` for display. | | `Result[T]` | `@dataclass(frozen=True)` `Generic[T]` | The success-or-failure container: `data: T`, `errors: list[ErrorInfo] = field(default_factory=list)`, `ok: bool` property, `with_error()`, `with_errors()`, `with_data()` methods. | | `NilPath` | `@dataclass(frozen=True)` + `NIL_PATH` | Nil-sentinel for filesystem paths. Has `exists=False`, `read_text=""`, `errors=[]`. | | `NilRAGState` | `@dataclass(frozen=True)` + `NIL_RAG_STATE` | Nil-sentinel for the RAG engine. Has `enabled=False`, `is_empty_result=True`, `errors=[]`. | | `OK` | `Result[None]` constant | Trivial success for fail-or-succeed operations that carry no data. | `Result` is **generic over `T` only** (not over the error type). Errors are always `list[ErrorInfo]`. This is the AND-over-OR principle: data and errors are parallel fields, not a tagged sum. --- ## Decision Tree ``` Need to represent "missing or failed"? | +-- Is the value a "data" value (not a control-flow signal)? | +-- Use a Result dataclass (data + errors list) | +-- Use a nil-sentinel dataclass (zero-initialized) | +-- Is the value a control-flow signal (e.g., "abort" or "skip")? | +-- Use a boolean (or enum) | +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful | +-- Is the failure "unrecoverable" (programmer error, not runtime condition)? | +-- Use assert (debug builds) | +-- Use raise (only for programmer errors like KeyError on a known dict) | +-- Does the SDK raise an exception you can't avoid? +-- Catch at the boundary; convert to ErrorInfo inside a Result ``` --- ## Anti-Patterns **DON'T do these things:** 1. **DON'T** use `Optional[X]` for "this might fail at runtime". Use `Result[X]` instead. 2. **DON'T** use `None` as a sentinel for "no result". Use a nil-sentinel dataclass. 3. **DON'T** raise a custom exception class for runtime failures. Catch SDK exceptions and return `ErrorInfo`. 4. **DON'T** use `Union[T, E]` (sum type). Use a struct with parallel fields (AND over OR). 5. **DON'T** have `if x is None: handle; else: use_x` patterns in production code. The nil-sentinel makes them unnecessary. 6. **DON'T** catch `except Exception` and silently swallow. Convert to `ErrorInfo` and return in the `Result`. --- ## Examples The 3 refactored subsystems demonstrate each pattern in context: - **`src/mcp_client.py:205-294`** — `read_file`, `list_directory`, `search_files` return `Result[str]`; `(p, err)` tuples become `Result[Path]`; the 30+ `assert p is not None` chain (lines 304-794) is removed. - **`src/ai_client.py`** — `_send__result()` returns `Result[str]` (8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok); `send_result()` is the new public API; `send()` is `@deprecated`. - **`src/rag_engine.py:100-180`** — `_init_vector_store_result`, `_validate_collection_dim_result`, `is_empty_result`, `add_documents_result` return `Result[None]` or `Result[T]`; broad `except Exception` blocks become `ErrorInfo` entries. --- ## Hard Rules (enforced in the 3 refactored files) These are non-negotiable in `src/mcp_client.py`, `src/ai_client.py`, and `src/rag_engine.py`: - **`Optional[T]` return types are FORBIDDEN** in the 3 refactored files. Use `Result[T]` (with `NIL_T` singleton if needed) instead. Rationale: `Optional[T]` is the sum type `Union[T, None]` that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove. - **Function return types must be `Result[T]` for any function that can fail at runtime.** A function that can't fail (e.g., `get_name() -> str`) doesn't need a `Result`. The classification is "can this return a different value under different runtime conditions?" If yes, `Result`. If no, plain return type. - **Catch SDK exceptions at the boundary only.** Inside the 3 refactored files, the only place an exception is caught is at the SDK call site (e.g., `_send__result()` wrapping the SDK call). Internal `try/except` is reserved for converting `OSError`, `PermissionError`, and similar I/O exceptions to `ErrorInfo` at the mcp_client tool boundary. The verification script `scripts/audit_optional_in_3_files.py` enforces the `Optional[X]` rule by failing CI if any new `Optional[X]` appears in the 3 refactored files. ### `Optional[X]` in argument types The `Optional[X]` ban above applies to **return types only**. Argument types that genuinely may be `None` (e.g., `rag_engine: Optional[Any] = None`, `pre_tool_callback: Optional[Callable] = None`) remain allowed; they describe a caller choice, not a runtime failure of this function. ### Cross-thread safety `Result` and `ErrorInfo` are `@dataclass(frozen=True)` and therefore thread-safe by immutability. The `with_error()` / `with_errors()` / `with_data()` methods produce new instances (no mutation), matching the project's "no shared mutable state across threads" invariant. Deprecation warnings use `warnings.warn(..., stacklevel=2)` which is thread-safe. --- ## When to Use This Convention **Use it for:** - New public APIs (any function that can fail at runtime and the caller might care). - New internal functions where the caller benefits from knowing the failure (vs. just propagating `None`). **Don't use it for:** - Constructors (`__init__`) that fail with programmer errors (use `assert` or `raise` for these). - Trivial getters that can't fail (`get_name() -> str` doesn't need a `Result`). - Performance-critical hot paths where the overhead of the dataclass allocation is measurable (rare; benchmark first). --- ## Migration Playbook When converting existing code: 1. Identify the `Optional[X]` return type or the `raise` statement. 2. Define a `Result` dataclass (or use the existing one) with `data: X` and `errors: list[ErrorInfo]`. 3. Replace `None` returns with `Result(data=NIL_X, errors=[...])` or `Result(data=zero_value, errors=[...])`. 4. Replace `raise X` with `return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)])`. 5. Update the caller to check `result.errors` instead of `is None` / `try/except`. 6. Add a test that verifies both the success and failure paths return the right `Result`. --- ## Deprecation: `ai_client.send()` → `ai_client.send_result()` The public `ai_client.send()` is marked `@deprecated` (via `typing_extensions.deprecated`, the Python 3.11+ backport of `@warnings.deprecated`). It still works for backward compat but emits a `DeprecationWarning` at runtime. New code MUST use `ai_client.send_result()`. - `send_result(...) -> Result[str, ErrorInfo]` — the new public API. - `send(...) -> str` — **deprecated.** Returns `str` for backward compat; errors are logged to the comms log but not returned. - Removal timeline: `public_api_migration_20260606` follow-up track. The deprecation warning is cached per call site (Python's `__warningregistry__`) to avoid log spam. `tests/conftest.py` adds a `filterwarnings` entry to silence the warning during the transition; new tests for the new API should assert the warning is NOT emitted by `send_result()`. --- ## See Also - `conductor/tracks/data_oriented_error_handling_20260606/spec.md` — the spec that established this convention. - `docs/guide_ai_client.md` "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the provider layer. - `docs/guide_mcp_client.md` "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the MCP tool layer. - `docs/guide_rag.md` "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the RAG engine. - Ryan Fleury's [original article](https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors) — the philosophical foundation.