diff --git a/conductor/tracks/exception_handling_audit_20260616/spec.md b/conductor/tracks/exception_handling_audit_20260616/spec.md new file mode 100644 index 00000000..60a46606 --- /dev/null +++ b/conductor/tracks/exception_handling_audit_20260616/spec.md @@ -0,0 +1,305 @@ +# Track Specification: Exception Handling Audit (Convention Compliance + Doc Clarification) + +**Track ID:** `exception_handling_audit_20260616` +**Status:** Active (spec approved 2026-06-16) +**Priority:** B (informational; precedes the user's planned implementation refactor of the migration-target files) +**Owner:** Tier 2 Tech Lead +**Type:** audit + documentation (no production code changes; no behavior change) +**Estimated effort:** 0.5 day Tier 2 work (2-4 hours) +**Parent tracks:** `data_oriented_error_handling_20260606` (shipped 2026-06-12), `ai_loop_regressions_20260614`, `doeh_test_thinking_cleanup_20260615`, `public_api_migration_and_ui_polish_20260615`, `rag_test_failures_20260615` (all shipped 2026-06-15) +**Sibling tracks:** `data_structure_strengthening_20260606` (planned, parallel), `mcp_architecture_refactor_20260606` (planned, depends on convention being complete) + +--- + +## 0. TL;DR + +A small, focused **AUDIT + DOCUMENTATION** track. The deliverable is: + +1. **`scripts/audit_exception_handling.py`** — a static analyzer (AST-based) that classifies every `try/except/finally/raise` site in the codebase against the data-oriented error handling convention. The script (already drafted in this spec) follows the conventions of the existing `audit_weak_types.py` and `audit_main_thread_imports.py` audit scripts. Per the user's request: **the audit is the deliverable, not a refactor**. + +2. **A human-readable audit report** — produced by running the script, with per-site classification, a 1-line hint for each violation/suspicious site, and a baseline-vs-migration-target breakdown. + +3. **Doc/codestyle clarification updates** — the audit revealed 5 gaps in the existing documentation of the convention. The track updates: + - `conductor/code_styleguides/error_handling.md` — add a "Boundary Types" section (FastAPI, stdlib I/O, third-party SDKs), clarify the "broad except Exception" rule, add a constructor-raise rule, add a re-raise rule, and reference the new audit script. + - `docs/guide_app_controller.md` — add a section explaining which sites in `app_controller.py` are legitimate (the `_api_*` FastAPI boundary) vs migration-target (everything else). + +4. **Out of scope**: **NO production code changes**. No migration of any `app_controller.py` / `gui_2.py` / `session_logger.py` etc. to `Result[T]` happens in this track. The audit report tells the user which files would benefit from future refactor tracks; the user decides what the next track is. + +**Why this track exists:** the user asked for a quick audit to know which exception-handling sites are "proper wrappers over third-party code" vs "code from the codebase that is using it in a bad way that goes against the data oriented error handling convention". The audit's value is in the REPORT + the doc clarification, not in the refactor. + +--- + +## 1. Overview + +### 1.1 The Convention (as established by `data_oriented_error_handling_20260606`) + +Per `conductor/code_styleguides/error_handling.md`: + +- **SDK-boundary exceptions** are caught and converted to `ErrorInfo` (a frozen dataclass carrying `kind: ErrorKind`, `message: str`, `source: str`). +- **Internal code** uses `Result[T]` (frozen generic dataclass with `data: T` and `errors: list[ErrorInfo]`) instead of `Optional[T]` + `try/except`. +- **`except Exception` is a code smell** (broad catch without conversion) — anti-pattern #6. +- **`raise` is reserved for programmer errors** (assert/raise for impossible states). Constructors (`__init__`) can raise for "this object needs X". +- **`try/finally`** (no except) is the canonical cleanup pattern. + +### 1.2 Current State (as of 2026-06-16, post-`rag_test_failures_20260615`) + +The convention has been applied to **3 of 65 source files**: +- `src/mcp_client.py` (refactored: 4 new `*_result` variants, 30+ tool-function refactor deferred per Path C of the parent track) +- `src/ai_client.py` (refactored: `ProviderError` exception REMOVED, `Result[str]` returned by all `_send__result()`, `send_result()` public API, `send()` marked `@deprecated`) +- `src/rag_engine.py` (refactored: `_init_vector_store_result`, `_validate_collection_dim_result` return `Result[None]`, `NilRAGState` sentinel) + +The remaining ~10 files in `src/` (most notably `src/app_controller.py` at 166KB, `src/gui_2.py` at 260KB, `src/models.py` at 132KB) are in the **migration-target state** — they still use `try/except Exception` + `return None` / `return Optional[T]` patterns. + +### 1.3 Gaps the Audit Revealed (5 categories of convention clarification) + +| # | Gap | Impact | +|---|---|---| +| G1 | **FastAPI `HTTPException` in `_api_*` handlers** is not explicitly documented as a legitimate boundary pattern. The audit found 11 such raises in `src/app_controller.py` and 2 `except Exception` sites that convert to `HTTPException`. The current styleguide says "exceptions are reserved for the SDK boundary" but doesn't address the FastAPI framework boundary. | The convention's "broad except Exception" anti-pattern is misclassifying 13 sites in `app_controller.py` as violations, when they are in fact the framework-idiomatic way to signal HTTP errors. | +| G2 | **The "broad except Exception" rule** needs clarification: in a `*_result` function that returns `Result[None]`, `except Exception as e: return Result(...errors=[ErrorInfo(...)])` IS compliant (the canonical SDK boundary pattern). The current styleguide's anti-pattern #6 doesn't distinguish between "broad catch that swallows" and "broad catch that converts to ErrorInfo". | 7+ `*_result` functions in the 3 refactored files have correct broad catches that the audit was initially misclassifying. | +| G3 | **The "constructors can raise" rule** is in the styleguide §"When to Use This Convention" but the wording is brief and the audit found multiple legitimate `ValueError` raises in `__init__` and `assert` sites. | The audit was misclassifying them as `INTERNAL_RETHROW` violations; the doc needs a clearer rule. | +| G4 | **The "re-raise" pattern** is not in the styleguide. The audit found 25 `try/except + raise` sites in `src/`. The convention needs to clarify when re-raise is legitimate (catching a stdlib exception and re-raising a more specific one) vs when it should be a `Result`. | 25 sites are ambiguous in the current doc. | +| G5 | **The "delete the audit script" affordance** is not in the styleguide. The new `scripts/audit_exception_handling.py` follows the "delete to turn off" pattern from `feature_flags.md` (file presence = feature enabled). | Without explicit doc, the next agent might not know this script is part of the convention enforcement. | + +### 1.4 Gaps to Fill (this Track's Scope) + +1. **Write `scripts/audit_exception_handling.py`** with the classification logic from §3. +2. **Verify the script's classification accuracy** against the 3 refactored files (the BASELINE) and the 11 HTTPException sites in `app_controller.py` (the FastAPI boundary case). +3. **Update `conductor/code_styleguides/error_handling.md`** with the 5 doc-clarification sections. +4. **Update `docs/guide_app_controller.md`** with a new section explaining the FastAPI boundary in the file. +5. **Generate a report** (`docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md`) summarizing the audit findings. + +### 1.5 Out of Scope (Explicit) + +- **Migrating `app_controller.py`** to the convention (future track; ~199 `Optional[X]` sites, ~30 `except Exception` blocks per the parent spec §12.2) +- **Migrating `gui_2.py`** to the convention (future track; 260KB file, the largest in the codebase) +- **Migrating `session_logger.py`, `warmup.py`, `theme_models.py`** to the convention (smaller files; future track) +- **Removing the `send()` deprecation** (deferred to user's planned `send_result` → `send` mass rename; post-RAG track per the `rag_test_failures_20260615` track's followup list) +- **Writing a Result-based migration tool** (the audit script is informational; not a refactor tool) +- **Updating the `doeh` and `public_api_migration` completion reports** to reference this audit (deferred; the audit report is a separate artifact) +- **Adding new tests for the audit script** (the audit is a static analyzer; its output is the verification; an `assertions on the output` test would be over-testing) + +--- + +## 2. Goals (Priority Order) + +| Priority | Goal | Rationale | +|---|---|---| +| **A (primary)** | Write `scripts/audit_exception_handling.py` as a static analyzer that classifies every `try/except/finally/raise` site per the convention. | The audit is the user's request. The script is the deliverable. | +| **A (primary)** | Verify the script's classifications are accurate (i.e., the FastAPI raises, the constructor raises, the broad-catches-in-`*_result`-functions, the stdlib-I/O catches, the SDK-boundary catches are all correctly classified). | A misclassifying audit is worse than no audit. | +| **A (primary)** | Update `conductor/code_styleguides/error_handling.md` with the 5 doc-clarification sections. | The audit's value is in the doc, not just the script. The user explicitly asked for codestyle/regular guide updates. | +| **B (secondary)** | Update `docs/guide_app_controller.md` with the FastAPI boundary section. | The app_controller is the largest unrefactored file; the new section explains what's legitimate. | +| **B (secondary)** | Generate a report summarizing the findings (per-file violation count, per-category breakdown, top migration-target files). | The user decides the next track from this report. | +| **C (documentation)** | Reference the new audit script from `conductor/product-guidelines.md` (the canonical reference for project standards). | The script is part of the convention enforcement; the product guidelines should mention it. | + +### 2.1 Non-Goals (this track) + +- **No production code changes.** This is a documentation + audit track. The Tier 2 implementer MUST NOT modify any `src/*.py` file. +- **No test file changes** (the audit has no tests; the script's output IS the verification). +- **No `mcp_architecture_refactor_20260606` work** (separate track, blocked by the convention being complete). +- **No `data_structure_strengthening_20260606` work** (separate track, parallel to this one). + +--- + +## 3. The Audit Methodology + +### 3.1 Classification Categories + +The script classifies every exception-handling site into one of 10 categories: + +| Category | Convention Status | Description | Hint Provided | +|---|---|---|---| +| `BOUNDARY_SDK` | Compliant | Wraps a third-party SDK call (anthropic, google, openai, chromadb, requests, etc.) or is in a `*_result` function with broad catch | "Compliant: third-party exception caught at SDK boundary" | +| `BOUNDARY_IO` | Compliant | Wraps stdlib I/O that can raise (OSError, JSONDecodeError, etc.) | "Compliant: stdlib I/O exception at third-party call site" | +| `BOUNDARY_CONVERSION` | Compliant | Catches and converts to `ErrorInfo` inside a `Result` | "Compliant: catch + ErrorInfo conversion is the canonical SDK boundary pattern" | +| `BOUNDARY_FASTAPI` | Compliant | FastAPI `HTTPException` raise in `_api_*` handler | "Compliant: framework-idiomatic boundary pattern" | +| `INTERNAL_SILENT_SWALLOW` | **Violation** | `except ...: pass` or just logs | "Violation: silent swallow hides failures" | +| `INTERNAL_BROAD_CATCH` | **Violation** | `except Exception` without conversion to ErrorInfo, in non-`*_result` code | "Violation: narrow the type or convert to ErrorInfo" | +| `INTERNAL_OPTIONAL_RETURN` | **Violation** | `try/except + return None/Optional[T]` | "Violation: replace with `Result[T]`" | +| `INTERNAL_RETHROW` | Suspicious | `try/except + raise` (without ErrorInfo conversion) | "Suspicious: consider Result-based propagation" | +| `INTERNAL_PROGRAMMER_RAISE` | Compliant | `raise` for impossible state / precondition (`__init__`, `assert`, `ValueError` for "this needs X") | "Compliant: `raise` for programmer errors" | +| `INTERNAL_COMPLIANT` | Compliant | `try/finally` (no except) — canonical cleanup pattern | "Compliant: `goto defer` pattern" | +| `UNCLEAR` | Review needed | Can't determine automatically | "Manual review: not obviously boundary or violation" | + +### 3.2 The 3 Refactored Baseline Files (the Convention Target) + +``` +src/mcp_client.py — refactored 2026-06-12; 4 _result variants added +src/ai_client.py — refactored 2026-06-12; ProviderError removed, send_result() public +src/rag_engine.py — refactored 2026-06-12; _init_vector_store_result, _validate_collection_dim_result +``` + +The script reports a **baseline vs migration-target** split. The baseline is the convention reference; the migration target is where the user's next refactor tracks will focus. + +### 3.3 Output Format + +The script supports two output modes (matching `audit_weak_types.py`): + +**Human-readable mode** (`--src src`): +``` +=== Exception Handling Audit (Data-Oriented Convention) === + +Files scanned: 65 +Files with findings: 42 +Total sites: 348 + try: 8 + except: 283 + raise: 57 + +Compliant sites: 80 +Suspicious sites: 25 +Violation sites: 211 +Unclear (review): 32 + +--- Baseline (refactored files: mcp_client, ai_client, rag_engine) --- + Sites: 112, violations: 77 +--- Migration target (all other src/ files) --- + Sites: 236, violations: 134 + +By category: + INTERNAL_BROAD_CATCH 147 (VIOLATION) + INTERNAL_SILENT_SWALLOW 61 (VIOLATION) + ... + +--- Top 15 files by violation count (migration target only) --- + +src\gui_2.py (V=37, S=2, ?=13, C=2, total=54) + ... +``` + +**JSON mode** (`--json`): machine-readable for tooling; includes per-site `category`, `kind`, `context`, `snippet`, and `hint`. + +### 3.4 What the Script Does NOT Do + +- Does NOT execute the code (it's a static analyzer; no behavior change). +- Does NOT modify any files. +- Does NOT provide specific refactor patches (the "hint" is a 1-line suggestion; the implementer of the next refactor track writes the actual code). +- Does NOT verify that refactored code works (no test execution; the audit report is the deliverable). + +--- + +## 4. Doc Updates (5 sections + 1 cross-reference) + +### 4.1 `conductor/code_styleguides/error_handling.md` — 5 new sections + +**New section 1: "Boundary Types"** (insert after the current "5. Error Info as Side-Channel") +- Lists the 3 categories of "legitimate boundaries": + 1. **Third-party SDK calls** (anthropic, google, openai, chromadb, requests, httpx, etc.) — per the spec §"Hard Rules" + 2. **Stdlib I/O that can raise** (file/network I/O via `open()`, `requests.get()`, `chromadb.PersistentClient()`, etc.) — converting OSError to ErrorInfo + 3. **Framework boundaries** (FastAPI `HTTPException` in `_api_*` handlers) — the framework-idiomatic way to signal HTTP errors +- Each category lists the specific exception types, the canonical pattern, and a code example. + +**New section 2: "The Broad-Except Distinction"** (insert after "Boundary Types") +- Clarifies anti-pattern #6: "broad except Exception" is a code smell **only when the catch site doesn't convert to ErrorInfo**. +- When a `*_result` function does `except Exception as e: return Result(data=..., errors=[ErrorInfo(kind=INTERNAL, message=..., original=e)])`, it IS compliant (the catch + conversion is the canonical pattern). +- The distinction: where does the data go? If to `Result.errors`, compliant. If discarded (pass / print / log-only), violation. + +**New section 3: "Constructors Can Raise"** (insert after "Broad-Except Distinction") +- Per the existing §"When to Use This Convention": "Constructors (`__init__`) that fail with programmer errors (use `assert` or `raise` for these)." +- The new section elaborates: `raise ValueError`, `raise TypeError`, `raise NotImplementedError` in `__init__` are compliant. `assert` for "this should never happen" invariants is compliant. +- The audit script's `INTERNAL_PROGRAMMER_RAISE` category implements this rule. + +**New section 4: "Re-Raise Patterns"** (insert after "Constructors Can Raise") +- 3 legitimate re-raise patterns: + 1. **Catch + convert + raise as different type** (e.g., `except OSError as e: raise ValueError(f"file not found: {e}")` for "convert library error to user error") + 2. **Catch + log + re-raise** (e.g., `except Exception: log(); raise` for "I want a record before propagating") + 3. **Catch + cleanup + re-raise** (e.g., `try: ... except: cleanup(); raise` for "ensure cleanup before propagating") +- 1 suspicious pattern: **catch + re-raise the same exception** (no value-add; remove the try/except or use a Result). + +**New section 5: "Audit Script"** (insert after "Re-Raise Patterns") +- References `scripts/audit_exception_handling.py`. +- The script follows the "delete to turn off" pattern (per `feature_flags.md`): `rm scripts/audit_exception_handling.py` disables the audit. +- Usage: `uv run python scripts/audit_exception_handling.py` (human-readable) or `--json` (machine-readable). +- The script is a static analyzer; it does NOT modify code. Its output is a report. +- The script's classification categories (per §3.1) are the canonical taxonomy of "what kind of exception handling is this?". + +### 4.2 `docs/guide_app_controller.md` — 1 new section + +**New section: "Exception Handling in `app_controller.py`"** +- The file is 166KB and contains 56 exception-handling sites (per the audit). +- The 11 `HTTPException` raises in `_api_*` handlers (lines 96, 99, 213, 215, 312, 320, 341, 369, 380, 402) are **compliant** (FastAPI boundary pattern, per the new styleguide §"Boundary Types"). +- The 2 `except Exception + raise HTTPException` sites (lines 309, 401) are **compliant** (FastAPI boundary pattern). +- The remaining ~43 sites (mostly `except Exception + log/print`, `except Exception + return None`) are **migration-target** — they would benefit from a future track that migrates the controller to the convention. +- Recommended future track: `app_controller_result_migration_20260616` (not in this track's scope; the user decides). + +### 4.3 `conductor/product-guidelines.md` — 1 new cross-reference + +Add a sentence to the "Data-Oriented Error Handling" section: +> "The convention is enforced via `scripts/audit_exception_handling.py` (static analyzer; file-presence = enabled per `feature_flags.md`)." + +--- + +## 5. Architecture Reference + +The convention's 3 refactored files are documented in: +- `docs/guide_mcp_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" +- `docs/guide_ai_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" +- `docs/guide_rag.md` §"Data-Oriented Error Handling (Fleury Pattern)" + +The convention is documented in: +- `conductor/code_styleguides/error_handling.md` (the canonical styleguide) +- `conductor/code_styleguides/data_oriented_design.md` (the canonical DOD reference) +- `docs/guide_mma.md` (the MMA reference; uses Result for worker context) +- `docs/guide_mcp_client.md`, `docs/guide_ai_client.md`, `docs/guide_rag.md` (per-subsystem in-context guides) + +The audit script follows the conventions of: +- `scripts/audit_weak_types.py` (the closest precedent; informational audit with --json, --top, --verbose modes) +- `scripts/audit_main_thread_imports.py` (the CI-gate precedent; though this audit is informational, not a gate) +- `conductor/code_styleguides/feature_flags.md` ("delete to turn off" pattern) + +--- + +## 6. Risks & Mitigations + +| ID | Risk | Likelihood | Impact | Mitigation | +|---|---|---|---|---| +| R1 | The audit script misclassifies sites, giving the user a wrong picture of the codebase. | Medium | High | The script's classification logic is verified against 3 known-good sites (the `_validate_collection_dim_result` catch, the `send_result` boundary, the FastAPI `HTTPException` raises). The test for accuracy is the user's manual review of the report; the script provides 1-line hints so misclassifications are easy to spot. | +| R2 | The doc updates introduce inconsistency with the existing styleguide. | Low | Medium | Each new section is reviewed against the existing 5 patterns; the wording matches the existing §"Anti-Patterns" and §"When to Use This Convention" sections. | +| R3 | The audit report's "violation count" is misread as "we have 211 bugs to fix". | Medium | Medium | The report is explicit: "These are migration-target sites, not bugs. The convention is partially applied; the user decides what to migrate." The `BOUNDARY_*` and `INTERNAL_COMPLIANT` categories are clearly labeled as compliant. | +| R4 | The `docs/guide_app_controller.md` update is too aggressive (suggests migrating too much). | Low | Low | The new section explicitly says "Recommended future track: `app_controller_result_migration_20260616` (not in this track's scope; the user decides)". | +| R5 | The script's performance is too slow on the full codebase. | Low | Low | The script uses AST (not regex) and is O(n) over the source files. Tested on 65 files in <2s. | + +--- + +## 7. Verification Criteria + +| ID | Criterion | Status | +|---|---|---| +| G1 | `scripts/audit_exception_handling.py` exists and runs without errors | (to be verified in Phase 1) | +| G2 | The script's classification of FastAPI `HTTPException` raises is `BOUNDARY_FASTAPI` (not `INTERNAL_RETHROW`) | (to be verified in Phase 2) | +| G3 | The script's classification of `__init__` raises is `INTERNAL_PROGRAMMER_RAISE` (not `INTERNAL_RETHROW`) | (to be verified in Phase 2) | +| G4 | The script's classification of broad-catches in `*_result` functions is `BOUNDARY_SDK` or `BOUNDARY_CONVERSION` (not `INTERNAL_BROAD_CATCH`) | (to be verified in Phase 2) | +| G5 | The report's baseline-vs-migration-target breakdown is accurate (the 3 refactored files are clearly labeled) | (to be verified in Phase 2) | +| G6 | `conductor/code_styleguides/error_handling.md` has 5 new sections (Boundary Types, Broad-Except Distinction, Constructors Can Raise, Re-Raise Patterns, Audit Script) | (to be verified in Phase 3) | +| G7 | `docs/guide_app_controller.md` has a new "Exception Handling" section explaining the FastAPI boundary | (to be verified in Phase 3) | +| G8 | `conductor/product-guidelines.md` has the new cross-reference to the audit script | (to be verified in Phase 3) | +| G9 | `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` exists with the per-file breakdown and per-category counts | (to be verified in Phase 4) | +| NF1 | No production code changes (no `src/*.py` files modified) | (to be verified at the end) | +| NF2 | All commits are atomic (spec, plan, metadata, docs, script, report — 6 commits minimum) | (to be verified at the end) | +| NF3 | Per-commit git notes summarize the changes | (to be verified at the end) | + +--- + +## 8. Commits (this track, in order) + +1. **`spec.md`** — the design document (this file) +2. **`plan.md`** — the TDD red-first task breakdown +3. **`metadata.json`** — track metadata +4. **`scripts/audit_exception_handling.py`** — the audit script + 1 commit for the audit report run +5. **`docs/guide_*` updates** — the 3 doc clarifications in 1-2 commits +6. **`conductor/code_styleguides/error_handling.md`** — the 5 new sections in 1 commit +7. **`docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md`** — the final report +8. **`conductor/tracks.md` update** — register the track + +--- + +## 9. See Also + +- `conductor/code_styleguides/error_handling.md` — the convention this audit enforces (this track adds 5 new sections) +- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference +- `conductor/code_styleguides/feature_flags.md` — the "delete to turn off" pattern (the audit script follows it) +- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` — the parent track that established the convention +- `conductor/tracks/data_oriented_error_handling_20260606/spec.md` §12.2 — the prioritized list of future migration tracks (the audit's "migration target" report maps to this list) +- `scripts/audit_weak_types.py` — the closest precedent (informational audit with --json/--top/--verbose modes) +- `scripts/audit_main_thread_imports.py` — the CI-gate precedent (not a strict gate, but the strict-mode option is available) +- `docs/guide_app_controller.md` — the file that has the most migration-target sites (per the audit) +- `docs/reports/TRACK_COMPLETION_public_api_migration_and_ui_polish_20260615.md` §11 — the followup recommendations (item 2: "add an audit script for the if not numpy_array anti-pattern"; this track is a similar audit but for exception handling)