Data-Oriented Error Handling
Status: Active convention as of 2026-06-11. Established by the
data_oriented_error_handling_20260606 track. Canonical reference for all Python error-handling decisions in this codebase.
This styleguide codifies Ryan Fleury's "errors are just cases" framework as the
project convention. The 5 patterns below replace Optional[T] returns and
exception-based control flow with Result[T] dataclasses and nil-sentinel
dataclasses. SDK-boundary exceptions are caught and converted to ErrorInfo;
the rest of the application works with data, not control flow.
Reference: Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have
Them".
Independent corroboration: Timothy Lottes (ERROR[__line__]: _code_ exit
pattern; each error code has exactly one meaning — never overload UNKNOWN),
Valigo ("Exceptions are horrifying"; modern languages without legacy baggage
move away from exceptions — Rust, Jai, Zig, Odin).
The 5 Patterns
1. Nil-Sentinel Dataclasses (replaces None)
When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list[ErrorInfo] = field(default_factory=list)

NIL_PATH = NilPath()  # module-level singleton
Callers don't need if x is None: checks; they can read x.read_text (a plain
str field here, not a method) and get "" on the nil path.
Convention: NIL_* (uppercase) is the module-level singleton. Nil*
(PascalCase) is the class. Frozen dataclass prevents runtime mutation.
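As a rough illustration of the caller side, the sketch below uses a hypothetical UserRecord type and NIL_USER singleton (neither is part of this codebase, and for brevity the record and its nil value share one class). It assumes a plain dict lookup and only shows how the nil value can stand in for None:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    # Hypothetical record type; every field has a safe "empty" default.
    name: str = ""
    email: str = ""
    found: bool = False


NIL_USER = UserRecord()  # module-level nil singleton


def find_user(users: dict[str, UserRecord], key: str) -> UserRecord:
    # Return the nil singleton instead of None when the key is missing.
    return users.get(key, NIL_USER)


users = {"ada": UserRecord(name="Ada", email="ada@example.com", found=True)}
hit = find_user(users, "ada")
miss = find_user(users, "bob")

# No `is None` check needed: reading fields on the nil value is always safe.
labels = [record.name.upper() for record in (hit, miss)]
```

Both the hit and the miss go through the same list comprehension; the miss simply contributes an empty string.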
2. Zero-Initialization (via @dataclass defaults)
Fresh memory from the OS is zero-initialized. In Python, @dataclass with
field defaults achieves the same: the data is in a valid "empty" state
without any explicit constructor logic.
@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0
Code that consumes String8 (e.g., a for-loop bounded by size) works
correctly with the zero-initialized instance.
Convention: Mutable defaults use field(default_factory=list) (NOT = [],
which @dataclass rejects with a ValueError precisely because a single list
would otherwise be shared across instances).
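A small sketch of the mutable-default rule, using a hypothetical Batch type that does not exist in the codebase. It shows that default_factory gives each instance its own list, and that the standard library's @dataclass refuses a bare list default when the class is defined:

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    # Hypothetical type: each instance gets its own fresh list.
    items: list[str] = field(default_factory=list)
    count: int = 0


first = Batch()
second = Batch()
first.items.append("x")  # does not leak into `second`

# A bare `= []` default never gets as far as being shared:
# @dataclass raises ValueError while the class is being defined.
try:
    @dataclass
    class BadBatch:
        items: list[str] = []
except ValueError as exc:
    rejection = str(exc)
else:
    rejection = ""
```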
3. Fail Early (push validation to shallow stack frames)
Don't defer error checks to deep in the call stack. Push them to the entry point so the user knows ASAP if the operation cannot succeed.
def do_thing(path: Path) -> Result[str]:
    resolved = _resolve_path(path)  # validation happens HERE, not deeper
    if not resolved.ok:
        return Result(data="", errors=resolved.errors)
    ...
Convention: assert at entry points for invariants. Early return for
user-facing errors. try/finally (Python's analog to goto defer) for
cleanup.
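The three conventions (assert for invariants, early return for user-facing errors, try/finally for cleanup) can be sketched together. The read_first_line function and the simplified Result below are hypothetical stand-ins, with plain strings in place of ErrorInfo; a real tool boundary would presumably also convert OSError from the open call:

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for Result[str]; errors are plain strings here.
    data: str = ""
    errors: list[str] = field(default_factory=list)


def read_first_line(path: Path) -> Result:
    # Invariant: a wrong argument type is a programmer error, so assert.
    assert isinstance(path, Path), "read_first_line expects a pathlib.Path"
    # User-facing failure: check at the entry point and return early.
    if not path.is_file():
        return Result(errors=[f"NOT_FOUND: {path}"])
    handle = path.open(encoding="utf-8")
    try:
        return Result(data=handle.readline().rstrip("\n"))
    finally:
        handle.close()  # cleanup runs on every exit path


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("first\nsecond\n", encoding="utf-8")
    found = read_first_line(sample)
    missing = read_first_line(Path(tmp) / "absent.txt")
```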
4. AND over OR (Result with side-channel errors; no sum types)
Instead of Union[T, E] or Result<T, E>, return a struct with BOTH data
and errors as parallel fields:
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T  # the happy-path result (zero-initialized on failure)
    errors: list[ErrorInfo] = field(default_factory=list)  # side-channel; empty = success
Callers:
r = do_thing(path)
if r.errors:
    for err in r.errors:
        log(err.ui_message())
# use r.data regardless (it's the zero-initialized value on failure)
Convention: Result is generic over T (the success data) but NOT over
the error type. Errors are always list[ErrorInfo] (a side-channel list, not
a tagged sum). This collapses the bifurcated if r.ok: ... else: ...
codepaths into a single flat codepath.
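To make the "single flat codepath" concrete, here is a minimal sketch with hypothetical parse_size and total_size helpers, using strings in place of ErrorInfo. The aggregating loop never branches on success or failure, because a failed parse contributes a zero value and a non-empty error list:

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[str] = field(default_factory=list)  # strings stand in for ErrorInfo


def parse_size(text: str) -> Result[int]:
    # Hypothetical helper: zero-valued data plus an error entry on bad input.
    if not text.isdecimal():
        return Result(data=0, errors=[f"INVALID_INPUT: {text!r} is not a size"])
    return Result(data=int(text))


def total_size(texts: list[str]) -> Result[int]:
    total = 0
    errors: list[str] = []
    for text in texts:
        r = parse_size(text)
        total += r.data          # zero on failure, so no branch is needed
        errors.extend(r.errors)  # empty on success
    return Result(data=total, errors=errors)


summary = total_size(["10", "oops", "5"])
```

The caller still receives every error, but the arithmetic runs the same way for good and bad inputs.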
5. Error Info as Side-Channel (not as exception)
Errors flow as DATA in the Result struct, not as exceptions. SDK
boundaries (which must catch vendor exceptions) convert them to ErrorInfo:
@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None

    def ui_message(self) -> str:
        src = f"[{self.source}] " if self.source else ""
        return f"{src}{self.kind.value}: {self.message}"
Convention: ErrorInfo is the canonical error type. The legacy
ai_client.ProviderError exception class is removed; SDK helpers
(_classify_<vendor>_error()) RETURN ErrorInfo instead of raising.
The Data Model
The canonical types live in src/result_types.py:
| Type | Form | Purpose |
|---|---|---|
| ErrorKind | str, Enum (12+ values) | Canonical error taxonomy: NETWORK, AUTH, QUOTA, RATE_LIMIT, BALANCE, PERMISSION, NOT_FOUND, INVALID_INPUT, NOT_READY, UNKNOWN, CONFIG, INTERNAL, plus optional PROVIDER_HISTORY_DIVERGED_FROM_UI for app-vs-provider-state-divergence cases. Each value has exactly one meaning. |
| ErrorInfo | @dataclass(frozen=True) | A single error: kind: ErrorKind, message: str, source: str = "", original: BaseException \| None = None. Frozen; carries ui_message() for display. |
| Result[T] | @dataclass(frozen=True) Generic[T] | The success-or-failure container: data: T, errors: list[ErrorInfo] = field(default_factory=list), ok: bool property, with_error(), with_errors(), with_data() methods. |
| NilPath | @dataclass(frozen=True) + NIL_PATH | Nil-sentinel for filesystem paths. Has exists=False, read_text="", errors=[]. |
| NilRAGState | @dataclass(frozen=True) + NIL_RAG_STATE | Nil-sentinel for the RAG engine. Has enabled=False, is_empty_result=True, errors=[]. |
| OK | Result[None] constant | Trivial success for fail-or-succeed operations that carry no data. |
Result is generic over T only (not over the error type). Errors are
always list[ErrorInfo]. This is the AND-over-OR principle: data and errors
are parallel fields, not a tagged sum.
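The ok property and the with_*() methods are named above, but their signatures are not spelled out, so the following is only a guess at one workable shape. It assumes dataclasses.replace as the copy mechanism and uses strings in place of ErrorInfo:

```python
from dataclasses import dataclass, field, replace
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[str] = field(default_factory=list)  # strings stand in for ErrorInfo

    @property
    def ok(self) -> bool:
        # Success is defined as "no errors recorded".
        return not self.errors

    def with_error(self, error: str) -> "Result[T]":
        # Build a new list so the original instance's errors are untouched.
        return replace(self, errors=[*self.errors, error])

    def with_data(self, data: T) -> "Result[T]":
        return replace(self, data=data)


OK: Result[None] = Result(data=None)  # trivial success carrying no data

base = Result(data="draft")
failed = base.with_error("NOT_READY: index still building")
updated = failed.with_data("final")
```

Each call returns a fresh instance, so base stays a clean success even after failed and updated are derived from it.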
Decision Tree
Need to represent "missing or failed"?
|
+-- Is the value a "data" value (not a control-flow signal)?
|   +-- Use a Result dataclass (data + errors list)
|   +-- Use a nil-sentinel dataclass (zero-initialized)
|
+-- Is the value a control-flow signal (e.g., "abort" or "skip")?
|   +-- Use a boolean (or enum)
|   +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful
|
+-- Is the failure "unrecoverable" (programmer error, not runtime condition)?
|   +-- Use assert (debug builds)
|   +-- Use raise (only for programmer errors like KeyError on a known dict)
|
+-- Does the SDK raise an exception you can't avoid?
    +-- Catch at the boundary; convert to ErrorInfo inside a Result
Anti-Patterns
DON'T do these things:
- DON'T use Optional[X] for "this might fail at runtime". Use Result[X] instead.
- DON'T use None as a sentinel for "no result". Use a nil-sentinel dataclass.
- DON'T raise a custom exception class for runtime failures. Catch SDK exceptions and return ErrorInfo.
- DON'T use Union[T, E] (sum type). Use a struct with parallel fields (AND over OR).
- DON'T have if x is None: handle; else: use_x patterns in production code. The nil-sentinel makes them unnecessary.
- DON'T catch except Exception and silently swallow. Convert to ErrorInfo and return in the Result.
Examples
The 3 refactored subsystems demonstrate each pattern in context:
- In src/mcp_client.py:205-294, read_file, list_directory, and search_files return Result[str]; (p, err) tuples become Result[Path]; the 30+ assert p is not None chain (lines 304-794) is removed.
- In src/ai_client.py, _send_<vendor>_result() returns Result[str] (8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok); send_result() is the new public API; send() is @deprecated.
- In src/rag_engine.py:100-180, _init_vector_store_result, _validate_collection_dim_result, is_empty_result, and add_documents_result return Result[None] or Result[T]; broad except Exception blocks become ErrorInfo entries.
Hard Rules (enforced in the 3 refactored files)
These are non-negotiable in src/mcp_client.py, src/ai_client.py, and
src/rag_engine.py:
- Optional[T] return types are FORBIDDEN in the 3 refactored files. Use Result[T] (with NIL_T singleton if needed) instead. Rationale: Optional[T] is the sum type Union[T, None] that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove.
- Function return types must be Result[T] for any function that can fail at runtime. A function that can't fail (e.g., get_name() -> str) doesn't need a Result. The classification is "can this return a different value under different runtime conditions?" If yes, Result. If no, plain return type.
- Catch SDK exceptions at the boundary only. Inside the 3 refactored files, the only place an exception is caught is at the SDK call site (e.g., _send_<vendor>_result() wrapping the SDK call). Internal try/except is reserved for converting OSError, PermissionError, and similar I/O exceptions to ErrorInfo at the mcp_client tool boundary.
The verification script scripts/audit_optional_in_3_files.py enforces the
Optional[X] rule by failing CI if any new Optional[X] appears in the 3
refactored files.
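The contents of scripts/audit_optional_in_3_files.py are not reproduced here. Purely as an illustration of how such a check could work, the sketch below walks a module's AST with the standard library and reports functions whose return annotation mentions Optional[...] or "| None". The find_optional_returns name and the matching heuristic are assumptions, not the project's script:

```python
import ast


def find_optional_returns(source: str) -> list[tuple[str, int]]:
    # Collect (function name, line number) for Optional-style return types.
    hits: list[tuple[str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.returns is None:
            continue  # unannotated return: nothing to check
        annotation = ast.unparse(node.returns)
        # Only the return annotation is inspected; arguments are ignored.
        if "Optional[" in annotation or "| None" in annotation:
            hits.append((node.name, node.lineno))
    return hits


SAMPLE = '''
from typing import Optional

def legacy(x: Optional[int] = None) -> Optional[str]:
    return None

def modern() -> str:
    return ""

def procedure() -> None:
    pass
'''

violations = find_optional_returns(SAMPLE)
```

A CI wrapper could exit non-zero when the list is non-empty. Note that a plain "-> None" procedure is not flagged, and neither is the Optional argument on legacy.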
Optional[X] in argument types
The Optional[X] ban above applies to return types only. Argument types
that genuinely may be None (e.g., rag_engine: Optional[Any] = None,
pre_tool_callback: Optional[Callable] = None) remain allowed; they describe
a caller choice, not a runtime failure of this function.
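A short sketch of the distinction, assuming a hypothetical run_tool function and a simplified Result with string errors: the Optional argument records whether the caller supplied a hook, while runtime failure still travels through the Result return value.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for Result[str]; errors are plain strings here.
    data: str = ""
    errors: list[str] = field(default_factory=list)


def run_tool(
    name: str,
    pre_tool_callback: Optional[Callable[[str], None]] = None,
) -> Result:
    # Runtime failure is reported through the return value.
    if not name:
        return Result(errors=["INVALID_INPUT: tool name is empty"])
    # None here means "the caller chose not to pass a hook", not "failure".
    if pre_tool_callback is not None:
        pre_tool_callback(name)
    return Result(data=f"ran {name}")


seen: list[str] = []
with_hook = run_tool("search", pre_tool_callback=seen.append)
without_hook = run_tool("search")
rejected = run_tool("")
```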
Cross-thread safety
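Result and ErrorInfo are @dataclass(frozen=True), so their fields cannot be
reassigned after construction and instances are safe to share across threads.
Note that frozen=True does not freeze the errors list object itself, so treat
that list as read-only and never mutate it in place. The with_error() /
with_errors() / with_data() methods produce new instances (no mutation),
matching the project's "no shared mutable state across threads" invariant.
Deprecation warnings use warnings.warn(..., stacklevel=2) which is thread-safe.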
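What frozen=True does and does not guarantee can be checked directly. The sketch below uses a simplified Result with string errors: field reassignment raises dataclasses.FrozenInstanceError, whereas the list held in errors is an ordinary list, so the sketch derives a new instance instead of appending in place:

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Result:
    # Simplified stand-in; errors are plain strings here.
    data: str = ""
    errors: list[str] = field(default_factory=list)


shared = Result(data="payload")

try:
    shared.data = "changed"  # reassignment is blocked by frozen=True
except FrozenInstanceError:
    reassign_blocked = True
else:
    reassign_blocked = False

# frozen=True does not freeze the list object itself, so an in-place append
# would still succeed; deriving a new instance leaves `shared` untouched.
derived = Result(data=shared.data, errors=[*shared.errors, "NETWORK: timeout"])
```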
When to Use This Convention
Use it for:
- New public APIs (any function that can fail at runtime and the caller might care).
- New internal functions where the caller benefits from knowing the failure (vs. just propagating None).
Don't use it for:
- Constructors (__init__) that fail with programmer errors (use assert or raise for these).
- Trivial getters that can't fail (get_name() -> str doesn't need a Result).
- Performance-critical hot paths where the overhead of the dataclass allocation is measurable (rare; benchmark first).
Migration Playbook
When converting existing code:
- Identify the Optional[X] return type or the raise statement.
- Define a Result dataclass (or use the existing one) with data: X and errors: list[ErrorInfo].
- Replace None returns with Result(data=NIL_X, errors=[...]) or Result(data=zero_value, errors=[...]).
- Replace raise X with return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)]).
- Update the caller to check result.errors instead of is None / try/except.
- Add a test that verifies both the success and failure paths return the right Result.
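The playbook steps can be sketched on a toy function. Everything below is hypothetical (parse_age_legacy, parse_age_result, and a simplified Result holding an int and string errors); it only shows the before and after shapes plus the caller-side change:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for Result[int]; errors are plain strings here.
    data: int = 0
    errors: list[str] = field(default_factory=list)


# Before: failure is signalled by None and the reason is lost.
def parse_age_legacy(text: str) -> Optional[int]:
    try:
        return int(text)
    except ValueError:
        return None


# After: zero-valued data plus an error entry that keeps the reason.
def parse_age_result(text: str) -> Result:
    try:
        return Result(data=int(text))
    except ValueError as exc:
        return Result(data=0, errors=[f"INVALID_INPUT: {exc}"])


# Caller side: check result.errors instead of `is None`.
good = parse_age_result("42")
bad = parse_age_result("forty-two")
messages = [*good.errors, *bad.errors]
```

The two final assignments double as the success-path and failure-path checks that the last playbook step asks for.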
Deprecation: ai_client.send() → ai_client.send_result()
The public ai_client.send() is marked @deprecated (via
typing_extensions.deprecated, which makes @warnings.deprecated, added to the
standard library in Python 3.13, available on Python 3.11+). It still works
for backward compat but emits a DeprecationWarning at runtime. New code MUST
use ai_client.send_result().
- send_result(...) -> Result[str]: the new public API. (Result is generic over the data type only, so there is no second type parameter.)
- send(...) -> str: deprecated. Returns str for backward compat; errors are logged to the comms log but not returned.
- Removal timeline: public_api_migration_20260606 follow-up track.
The deprecation warning is cached per call site (Python's __warningregistry__)
to avoid log spam. tests/conftest.py adds a filterwarnings entry to
silence the warning during the transition; new tests for the new API should
assert the warning is NOT emitted by send_result().
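The wrapper-plus-warning arrangement might look roughly like this. To stay runnable with the standard library alone, the sketch calls warnings.warn directly rather than using the typing_extensions.deprecated decorator, and the send / send_result bodies are toy stand-ins rather than the real ai_client functions:

```python
import warnings
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    # Simplified stand-in for Result[str]; errors are plain strings here.
    data: str = ""
    errors: list[str] = field(default_factory=list)


def send_result(prompt: str) -> Result:
    # Toy stand-in for the new API: never warns, reports errors as data.
    if not prompt:
        return Result(errors=["INVALID_INPUT: empty prompt"])
    return Result(data=f"echo: {prompt}")


def send(prompt: str) -> str:
    # Deprecated shim: warn, attributed to the caller via stacklevel=2.
    warnings.warn(
        "send() is deprecated; use send_result()",
        DeprecationWarning,
        stacklevel=2,
    )
    return send_result(prompt).data  # errors are dropped, as in the old API


with warnings.catch_warnings(record=True) as caught_old:
    warnings.simplefilter("always")
    old_reply = send("hi")

with warnings.catch_warnings(record=True) as caught_new:
    warnings.simplefilter("always")
    new_reply = send_result("hi")
```

A test for the new API can then assert that caught_new stays empty while caught_old holds exactly one DeprecationWarning.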
See Also
- conductor/tracks/data_oriented_error_handling_20260606/spec.md: the spec that established this convention.
- docs/guide_ai_client.md, "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the provider layer.
- docs/guide_mcp_client.md, "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the MCP tool layer.
- conductor/code_styleguides/data_oriented_design.md (added 2026-06-12): the canonical Data-Oriented Design (DOD) reference; this track is the canonical application of DOD to error handling ("errors are data, not control flow").
- conductor/code_styleguides/agent_memory_dimensions.md (added 2026-06-12): the 4-dim memory model; the knowledge harvest TDD protocol in workflow.md uses this track's Result pattern.
- docs/guide_rag.md, "Data-Oriented Error Handling (Fleury Pattern)": the in-context guide for the RAG engine.
- Ryan Fleury's original article: the philosophical foundation.