Data-Oriented Error Handling

Status: Active convention as of 2026-06-11. Established by the data_oriented_error_handling_20260606 track. Canonical reference for all Python error-handling decisions in this codebase.

This styleguide codifies Ryan Fleury's "errors are just cases" framework as the project convention. The 5 patterns below replace Optional[T] returns and exception-based control flow with Result[T] dataclasses and nil-sentinel dataclasses. SDK-boundary exceptions are caught and converted to ErrorInfo; the rest of the application works with data, not control flow.

Reference: Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them". Independent corroboration: Timothy Lottes (ERROR[__line__]: _code_ exit pattern; each error code has exactly one meaning — never overload UNKNOWN), Valigo ("Exceptions are horrifying"; modern languages without legacy baggage move away from exceptions — Rust, Jai, Zig, Odin).


The 5 Patterns

1. Nil-Sentinel Dataclasses (replaces None)

When a function would "return None" in conventional Python, return a nil-sentinel dataclass instead. The sentinel has all default values (zero-initialized) and is safe to read from.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list[ErrorInfo] = field(default_factory=list)

NIL_PATH = NilPath()  # module-level singleton

Callers don't need if x is None: checks; they can read x.read_text and get "" on the nil path.

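Convention: NIL_* (uppercase) is the module-level singleton. Nil* (PascalCase) is the class. frozen=True blocks attribute reassignment at runtime; the errors list is still a mutable object, so treat it as read-only on a shared singleton.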
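A minimal sketch of what the nil-sentinel buys the caller. The lookup() helper and the in-memory _FAKE_FILES table are hypothetical, and errors is typed as a plain list to keep the block self-contained (the guide uses list[ErrorInfo]).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NilPath:
    exists: bool = False
    read_text: str = ""
    errors: list = field(default_factory=list)  # list[ErrorInfo] in the guide

NIL_PATH = NilPath()  # module-level singleton

# Hypothetical in-memory "filesystem", used only for this illustration.
_FAKE_FILES = {"notes.txt": "hello"}

def lookup(name: str) -> NilPath:
    """Return a populated record, or the NIL_PATH singleton for unknown names."""
    if name not in _FAKE_FILES:
        return NIL_PATH
    return NilPath(exists=True, read_text=_FAKE_FILES[name])

found = lookup("notes.txt")
missing = lookup("nope.txt")

# No `is None` branch: the nil value reads as an empty string.
total_chars = len(found.read_text) + len(missing.read_text)
```

Both lookups go through the same line of caller code; the missing file contributes zero characters instead of raising AttributeError on None.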

2. Zero-Initialization (via @dataclass defaults)

Fresh memory from the OS is zero-initialized. In Python, @dataclass with field defaults achieves the same: the data is in a valid "empty" state without any explicit constructor logic.

@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0

Code that consumes String8 (e.g., a for-loop bounded by size) works correctly with the zero-initialized instance.

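Convention: Mutable defaults use field(default_factory=list), NOT = []. @dataclass rejects a bare list default with a ValueError at class-definition time, precisely because that one list would be shared across instances.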
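The two claims above can be checked directly. This is an illustrative sketch: count_vowels(), Bad, and Good are hypothetical names that do not appear in the codebase.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class String8:
    text: str = ""
    size: int = 0

def count_vowels(s: String8) -> int:
    """A loop bounded by size runs zero times on the zero-initialized value."""
    n = 0
    for i in range(s.size):
        if s.text[i] in "aeiou":
            n += 1
    return n

empty_count = count_vowels(String8())
hello_count = count_vowels(String8(text="hello", size=5))

# A bare mutable default is refused when the class is created.
try:
    @dataclass
    class Bad:
        items: list = []
    rejected = False
except ValueError:
    rejected = True

@dataclass
class Good:
    items: list = field(default_factory=list)  # fresh list per instance

a, b = Good(), Good()
a.items.append(1)  # does not leak into b
```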

3. Fail Early (push validation to shallow stack frames)

Don't defer error checks to deep in the call stack. Push them to the entry point so the user knows ASAP if the operation cannot succeed.

def do_thing(path: Path) -> Result[str]:
    resolved = _resolve_path(path)  # validation happens HERE, not deeper
    if not resolved.ok:
        return Result(data="", errors=resolved.errors)
    ...

Convention: assert at entry points for invariants. Early return for user-facing errors. try/finally (Python's analog to goto defer) for cleanup.
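A runnable sketch of the three conventions together: assert for an invariant, early return for a user-facing error, try/finally for cleanup. The Result here is a simplified, non-generic stand-in with string errors, and the body of _resolve_path() is an assumption made for illustration.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass(frozen=True)
class Result:  # simplified stand-in; the guide's Result is Generic[T]
    data: str = ""
    errors: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors

def _resolve_path(path: Path) -> Result:
    if not path.is_file():
        return Result(errors=[f"not a file: {path}"])
    return Result(data=str(path.resolve()))

def do_thing(path: Path) -> Result:
    assert isinstance(path, Path), "programmer error: pass a Path"  # invariant
    resolved = _resolve_path(path)  # validated at the entry point
    if not resolved.ok:
        return Result(data="", errors=resolved.errors)  # early return
    handle = open(resolved.data, encoding="utf-8")
    try:
        return Result(data=handle.read())
    finally:
        handle.close()  # cleanup runs on every exit path

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "a.txt"
    good.write_text("hi", encoding="utf-8")
    ok_result = do_thing(good)
    bad_result = do_thing(Path(tmp) / "missing.txt")
```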

4. AND over OR (Result with side-channel errors; no sum types)

Instead of Union[T, E] or Result<T, E>, return a struct with BOTH data and errors as parallel fields:

from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T  # the happy-path result (zero-initialized on failure)
    errors: list[ErrorInfo] = field(default_factory=list)  # side-channel; empty = success

Callers:

r = do_thing(path)
if r.errors:
    for err in r.errors:
        log(err.ui_message())
# use r.data regardless (it's the zero-initialized value on failure)

Convention: Result is generic over T (the success data) but NOT over the error type. Errors are always list[ErrorInfo] (a side-channel list, not a tagged sum). This collapses the bifurcated if r.ok: ... else: ... codepaths into a single flat codepath.
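To make the "single flat codepath" concrete, here is a small sketch in which a caller aggregates several results without branching on success. parse_int() and sum_fields() are hypothetical helpers, and errors are plain strings instead of ErrorInfo to keep the block short.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[str] = field(default_factory=list)  # list[ErrorInfo] in the guide

def parse_int(text: str) -> Result[int]:
    try:
        return Result(data=int(text))
    except ValueError:
        # Zero-initialized data plus a side-channel error.
        return Result(data=0, errors=[f"not an integer: {text!r}"])

def sum_fields(fields: list[str]) -> Result[int]:
    total = 0
    errors: list[str] = []
    for f in fields:
        r = parse_int(f)
        total += r.data          # 0 on failure, so no branch is needed
        errors.extend(r.errors)  # errors accumulate alongside the data
    return Result(data=total, errors=errors)

summed = sum_fields(["1", "x", "2"])
```

The loop body is identical for good and bad input; the failure shows up only as an entry in summed.errors.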

5. Error Info as Side-Channel (not as exception)

Errors flow as DATA in the Result struct, not as exceptions. SDK boundaries (which must catch vendor exceptions) convert them to ErrorInfo:

@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str
    source: str = ""
    original: BaseException | None = None

    def ui_message(self) -> str:
        src = f"[{self.source}] " if self.source else ""
        return f"{src}{self.kind.value}: {self.message}"

Convention: ErrorInfo is the canonical error type. The legacy ai_client.ProviderError exception class is removed; SDK helpers (_classify_<vendor>_error()) RETURN ErrorInfo instead of raising.


The Data Model

The canonical types live in src/result_types.py:

| Type | Form | Purpose |
|---|---|---|
| ErrorKind | str, Enum (12+ values) | Canonical error taxonomy: NETWORK, AUTH, QUOTA, RATE_LIMIT, BALANCE, PERMISSION, NOT_FOUND, INVALID_INPUT, NOT_READY, UNKNOWN, CONFIG, INTERNAL, plus optional PROVIDER_HISTORY_DIVERGED_FROM_UI for app-vs-provider-state-divergence cases. Each value has exactly one meaning. |
| ErrorInfo | @dataclass(frozen=True) | A single error: kind: ErrorKind, message: str, source: str = "", original: BaseException \| None = None. Frozen; carries ui_message() for display. |
| Result[T] | @dataclass(frozen=True) Generic[T] | The success-or-failure container: data: T, errors: list[ErrorInfo] = field(default_factory=list), ok: bool property, with_error(), with_errors(), with_data() methods. |
| NilPath | @dataclass(frozen=True) + NIL_PATH | Nil-sentinel for filesystem paths. Has exists=False, read_text="", errors=[]. |
| NilRAGState | @dataclass(frozen=True) + NIL_RAG_STATE | Nil-sentinel for the RAG engine. Has enabled=False, is_empty_result=True, errors=[]. |
| OK | Result[None] constant | Trivial success for fail-or-succeed operations that carry no data. |

Result is generic over T only (not over the error type). Errors are always list[ErrorInfo]. This is the AND-over-OR principle: data and errors are parallel fields, not a tagged sum.
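The table names the ok property and the with_* methods but not their signatures. The sketch below is one plausible shape, not a copy of src/result_types.py: the parameter names and the use of dataclasses.replace are assumptions, and errors are plain strings for brevity.

```python
from dataclasses import dataclass, field, replace
from typing import Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[str] = field(default_factory=list)  # list[ErrorInfo] in the guide

    @property
    def ok(self) -> bool:
        return not self.errors  # empty side-channel means success

    def with_error(self, err: str) -> "Result[T]":
        # Build a new list so the original instance is left untouched.
        return replace(self, errors=[*self.errors, err])

    def with_errors(self, errs: list[str]) -> "Result[T]":
        return replace(self, errors=[*self.errors, *errs])

    def with_data(self, data: U) -> "Result[U]":
        return Result(data=data, errors=list(self.errors))

OK: Result[None] = Result(data=None)  # trivial success carrying no data

base = Result(data="x")
failed = base.with_error("boom")
```

Each with_* call returns a fresh instance, so base stays a success even after failed is derived from it.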


Decision Tree

Need to represent "missing or failed"?
|
+-- Is the value a "data" value (not a control-flow signal)?
| +-- Use a Result dataclass (data + errors list)
| +-- Use a nil-sentinel dataclass (zero-initialized)
|
+-- Is the value a control-flow signal (e.g., "abort" or "skip")?
| +-- Use a boolean (or enum)
| +-- Use Optional[bool] / Optional[Enum] ONLY if the absence is meaningful
|
+-- Is the failure "unrecoverable" (programmer error, not runtime condition)?
| +-- Use assert (debug builds)
| +-- Use raise (only for programmer errors like KeyError on a known dict)
|
+-- Does the SDK raise an exception you can't avoid?
  +-- Catch at the boundary; convert to ErrorInfo inside a Result

Anti-Patterns

DON'T do these things:

  1. DON'T use Optional[X] for "this might fail at runtime". Use Result[X] instead.
  2. DON'T use None as a sentinel for "no result". Use a nil-sentinel dataclass.
  3. DON'T raise a custom exception class for runtime failures. Catch SDK exceptions and return ErrorInfo.
  4. DON'T use Union[T, E] (sum type). Use a struct with parallel fields (AND over OR).
  5. DON'T have if x is None: handle; else: use_x patterns in production code. The nil-sentinel makes them unnecessary.
  6. DON'T catch except Exception and silently swallow. Convert to ErrorInfo and return in the Result.

Examples

The 3 refactored subsystems demonstrate each pattern in context:

  • src/mcp_client.py:205-294: read_file, list_directory, and search_files return Result[str]; (p, err) tuples become Result[Path]; the 30+ assert p is not None chain (lines 304-794) is removed.
  • src/ai_client.py: _send_<vendor>_result() returns Result[str] (8 vendors: gemini, anthropic, deepseek, minimax, gemini_cli, qwen, llama, grok); send_result() is the new public API; send() is @deprecated.
  • src/rag_engine.py:100-180: _init_vector_store_result, _validate_collection_dim_result, is_empty_result, and add_documents_result return Result[None] or Result[T]; broad except Exception blocks become ErrorInfo entries.

Hard Rules (enforced in the 3 refactored files)

These are non-negotiable in src/mcp_client.py, src/ai_client.py, and src/rag_engine.py:

  • Optional[T] return types are FORBIDDEN in the 3 refactored files. Use Result[T] (with NIL_T singleton if needed) instead. Rationale: Optional[T] is the sum type Union[T, None] that Fleury's framework replaces. Mixing the two patterns reintroduces the bifurcation the convention is designed to remove.
  • Function return types must be Result[T] for any function that can fail at runtime. A function that can't fail (e.g., get_name() -> str) doesn't need a Result. The classification is "can this return a different value under different runtime conditions?" If yes, Result. If no, plain return type.
  • Catch SDK exceptions at the boundary only. Inside the 3 refactored files, the only place an exception is caught is at the SDK call site (e.g., _send_<vendor>_result() wrapping the SDK call). Internal try/except is reserved for converting OSError, PermissionError, and similar I/O exceptions to ErrorInfo at the mcp_client tool boundary.

The verification script scripts/audit_optional_in_3_files.py enforces the Optional[X] rule by failing CI if any new Optional[X] appears in the 3 refactored files.
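For orientation, a check of this kind could be built on the standard ast module. The sketch below is hypothetical and is not the contents of scripts/audit_optional_in_3_files.py; find_optional_returns() and SAMPLE are illustrative names. It inspects return annotations only, so Optional[...] in argument positions passes.

```python
import ast

def find_optional_returns(source: str) -> list[tuple[str, int]]:
    """Return (function_name, line) for each return annotation that mentions Optional."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns is not None:
            if "Optional" in ast.unparse(node.returns):
                hits.append((node.name, node.lineno))
    return sorted(hits, key=lambda h: h[1])

SAMPLE = '''
from typing import Optional

def bad(x: int) -> Optional[str]:
    return None

def fine(cb: Optional[int] = None) -> str:
    return ""
'''

violations = find_optional_returns(SAMPLE)
```

A CI wrapper would read the three files, call the function on each, and exit non-zero when the list is non-empty. Note that this sketch does not flag the equivalent X | None spelling.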

Optional[X] in argument types

The Optional[X] ban above applies to return types only. Argument types that genuinely may be None (e.g., rag_engine: Optional[Any] = None, pre_tool_callback: Optional[Callable] = None) remain allowed; they describe a caller choice, not a runtime failure of this function.

Cross-thread safety

Result and ErrorInfo are @dataclass(frozen=True), so their fields cannot be reassigned after construction and instances can be shared across threads, provided the errors list is treated as read-only (frozen=True does not freeze the list object itself). The with_error() / with_errors() / with_data() methods produce new instances (no mutation), matching the project's "no shared mutable state across threads" invariant. Deprecation warnings use warnings.warn(..., stacklevel=2) which is thread-safe.


When to Use This Convention

Use it for:

  • New public APIs (any function that can fail at runtime and the caller might care).
  • New internal functions where the caller benefits from knowing the failure (vs. just propagating None).

Don't use it for:

  • Constructors (__init__) that fail with programmer errors (use assert or raise for these).
  • Trivial getters that can't fail (get_name() -> str doesn't need a Result).
  • Performance-critical hot paths where the overhead of the dataclass allocation is measurable (rare; benchmark first).

Migration Playbook

When converting existing code:

  1. Identify the Optional[X] return type or the raise statement.
  2. Define a Result dataclass (or use the existing one) with data: X and errors: list[ErrorInfo].
  3. Replace None returns with Result(data=NIL_X, errors=[...]) or Result(data=zero_value, errors=[...]).
  4. Replace raise X with return Result(data=zero_value, errors=[ErrorInfo(kind=..., message=...)]).
  5. Update the caller to check result.errors instead of is None / try/except.
  6. Add a test that verifies both the success and failure paths return the right Result.
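The playbook applied to a toy function, as a before/after sketch. get_setting_legacy(), get_setting_result(), and the _SETTINGS table are hypothetical; errors are plain strings here, where real code would build ErrorInfo(kind=ErrorKind.NOT_FOUND, ...).

```python
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Result(Generic[T]):
    data: T
    errors: list[str] = field(default_factory=list)  # list[ErrorInfo] in the guide

_SETTINGS = {"theme": "dark"}  # hypothetical data source

# Before (step 1): Optional return; the caller must remember the None check.
def get_setting_legacy(key: str) -> Optional[str]:
    return _SETTINGS.get(key)

# After (steps 2-4): zero value plus a side-channel error.
def get_setting_result(key: str) -> Result[str]:
    if key not in _SETTINGS:
        return Result(data="", errors=[f"NOT_FOUND: no setting named {key!r}"])
    return Result(data=_SETTINGS[key])

# Step 5: the caller checks errors instead of `is None`.
def describe(key: str) -> str:
    r = get_setting_result(key)
    suffix = " (with errors)" if r.errors else ""
    return f"{key}={r.data}{suffix}"

# Step 6: one test per path.
def test_success_path() -> None:
    r = get_setting_result("theme")
    assert r.data == "dark" and r.errors == []

def test_failure_path() -> None:
    r = get_setting_result("font")
    assert r.data == "" and len(r.errors) == 1
```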

Deprecation: ai_client.send() -> ai_client.send_result()

The public ai_client.send() is marked @deprecated (via typing_extensions.deprecated, which backports Python 3.13's @warnings.deprecated to Python 3.11+). It still works for backward compatibility but emits a DeprecationWarning at runtime. New code MUST use ai_client.send_result().

  • send_result(...) -> Result[str] is the new public API. (Result takes a single type parameter; the error type is always list[ErrorInfo].)
  • send(...) -> str is deprecated. It returns str for backward compat; errors are logged to the comms log but not returned.
  • Removal timeline: public_api_migration_20260606 follow-up track.

The deprecation warning is cached per call site (Python's __warningregistry__) to avoid log spam. tests/conftest.py adds a filterwarnings entry to silence the warning during the transition; new tests for the new API should assert the warning is NOT emitted by send_result().
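A sketch of the warning assertions described above, using only the standard library. The send() and send_result() bodies are simplified stand-ins (the real send_result() returns Result[str]), a manual warnings.warn() call takes the place of the @deprecated decorator, and warnings_from() is a hypothetical test helper.

```python
import warnings

def send_result(prompt: str) -> str:
    """Stand-in for the new API; returns a plain str here for brevity."""
    return prompt.upper()

def send(prompt: str) -> str:
    # stacklevel=2 attributes the warning to the caller's line, not this one.
    warnings.warn("send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2)
    return send_result(prompt)

def warnings_from(fn, *args) -> list:
    """Call fn and return the categories of every warning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # bypass the once-per-call-site cache
        fn(*args)
    return [w.category for w in caught]

old_api = warnings_from(send, "hi")
new_api = warnings_from(send_result, "hi")
```

The "always" filter matters in tests: under the default filters the per-call-site cache can hide a repeat warning and make an assertion pass or fail depending on test order.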


See Also

  • conductor/tracks/data_oriented_error_handling_20260606/spec.md — the spec that established this convention.
  • docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the provider layer.
  • docs/guide_mcp_client.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the MCP tool layer.
  • conductor/code_styleguides/data_oriented_design.md (added 2026-06-12) — the canonical Data-Oriented Design (DOD) reference; this track is the canonical application of DOD to error handling ("errors are data, not control flow").
  • conductor/code_styleguides/agent_memory_dimensions.md (added 2026-06-12) — the 4-dim memory model; the knowledge harvest TDD protocol in workflow.md uses this track's Result pattern.
  • docs/guide_rag.md "Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the RAG engine.
  • Ryan Fleury's original article — the philosophical foundation.