docs(report): add session report (audit + migration plan + tech-rot prevention)

2026-06-16 10:48:15 -04:00
parent b90d4bdd4e
commit 88e44d1c0e
1 changed files with 140 additions and 0 deletions
@@ -0,0 +1,140 @@
+# Session Report: Exception Handling Audit + Migration Planning + Tech-Rot Prevention
+
+**Date:** 2026-06-16
+**Total commits:** 17 (1 pre-existing todo + 16 new)
+**Tracks shipped:** 2 (`rag_test_failures_20260615` Tier 1 review; `exception_handling_audit_20260616` full execution)
+**Tracks planned:** 1 umbrella (`result_migration_20260616`, with 5 sub-tracks)
+**Doc updates:** 5 (styleguide + product-guidelines + docs/AGENTS + tracks.md + AGENTS.md)
+**Process rules added:** 1 (HARD BAN on day estimates in track artifacts)
+
+---
+
+## Scope executed
+
+This session executed 4 distinct work-streams:
+
+1. **Tier 1 review of `rag_test_failures_20260615`** — verified the 2-line fix in `src/rag_engine.py`, validated the docs update in `docs/guide_rag.md`, confirmed test pass count (1288 + 4 + 0 = first fully green baseline since 2026-06-12). Found 1 minor metadata inaccuracy (the metadata listed `src/app_controller.py` in modified_files but no production change occurred there; the change was in `src/rag_engine.py` only).
+
+2. **`exception_handling_audit_20260616` track** — built a 792-line AST-based static analyzer (`scripts/audit_exception_handling.py`) that classifies every `try/except/finally/raise` site in 65 `src/` files against a 10-category taxonomy. Identified **268 "bad" sites** (211 violations + 25 suspicious + 32 unclear) across 42 files. The 3 fully-refactored files (mcp_client.py, ai_client.py, rag_engine.py) are the **convention baseline**; the other 62 files are **migration target**. Closed 5 doc gaps the audit revealed.
+
+3. **5-track migration plan** — estimated that 5 sub-tracks are needed to eliminate all 268 "bad" sites, organized under a `result_migration_20260616` umbrella with the consistent `result_migration_*` prefix. Each sub-track sized by **scope + T-shirt size** (not day estimates, per the new Tier 1 rule added this session).
+
+4. **Tech-rot prevention** — added 4 enforcement mechanisms (styleguide checklist + product-guidelines obligations + docs/AGENTS.md enforcement section + audit script `--ci` flag) so future AI agents writing new code don't revert to idiomatic Python patterns.
+
+---
+
+## What was built
+
+### Static analyzer: `scripts/audit_exception_handling.py` (792 lines)
+
+AST-based, not regex. 10-category classification:
+- **5 compliant**: `BOUNDARY_SDK`, `BOUNDARY_IO`, `BOUNDARY_CONVERSION`, `BOUNDARY_FASTAPI`, `INTERNAL_PROGRAMMER_RAISE`, `INTERNAL_COMPLIANT`
+- **3 violation**: `INTERNAL_SILENT_SWALLOW`, `INTERNAL_BROAD_CATCH`, `INTERNAL_OPTIONAL_RETURN`
+- **1 suspicious**: `INTERNAL_RETHROW`
+- **1 unclear**: `UNCLEAR`
+
+6 output modes: default human-readable, `--json`, `--summary` (per-file table), `--by-size` (migration-effort buckets), `--strict`/`--ci` (CI gate), `--include-tests`, `--include-baseline`, `--exclude`.
+
+### The audit report: `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` (370 lines)
+
+9 sections. The headline: **348 total sites / 80 compliant (23%) / 25 suspicious (7%) / 211 violations (61%) / 32 unclear (9%)**. Baseline (3 refactored files) has 112 sites / 77 violations. Migration target (62 other files) has 236 sites / 134 violations.
+
+### 5 doc updates (the tech-rot prevention)
+
+| File | What was added |
+|---|---|
+| `conductor/code_styleguides/error_handling.md` | "AI Agent Checklist" — 5 MUST-DO + 7 MUST-NOT-DO + 3 boundary patterns + pre-commit gate |
+| `conductor/product-guidelines.md` | "AI Agent Obligations" — 4 enforcement mechanisms + 4 audit scripts table + pre-commit workflow |
+| `docs/AGENTS.md` | "Convention Enforcement" section AT THE TOP of the file — first thing AIs see |
+| `conductor/tracks.md` | Registered `result_migration_20260616` umbrella (row 6d) + detail section |
+| `scripts/audit_exception_handling.py` | Added `--ci` alias for `--strict`; updated docstring to explain CI-gate mode |
+
+### The 5-track migration plan (`result_migration_20260616` umbrella)
+
+Consistent `result_migration_*` prefix for all 5 sub-tracks:
+
+| # | Sub-track | T-shirt | Scope |
+|---|---|---|---|
+| 1 | `result_migration_review_pass` | S | 57 sites (32 UNCLEAR + 25 INTERNAL_RETHROW) across 15 files |
+| 2 | `result_migration_small_files` | L | 37 files (35 SMALL + 2 MEDIUM); 72 V+S sites |
+| 3 | `result_migration_app_controller` | XL | 56 sites in 1 file (166KB) |
+| 4 | `result_migration_gui_2` | XL | 54 sites in 1 file (260KB) |
+| 5 | `result_migration_baseline_cleanup` | L | 112 sites in 3 refactored files |
+
+**Total: 5 sub-tracks, 268 sites across 42 files, ~2100 lines changed.**
+
+Sequence: 1 (review) → 2 (small files) → 3 (app_controller) → 4 (gui_2) → 5 (baseline cleanup). Tracks 2 + 5 can run in parallel; tracks 3 + 4 must be sequential (the GUI calls controller methods).
+
+### Process rule: HARD BAN on day estimates
+
+Codified in `AGENTS.md` (Critical Anti-Patterns, HARD BAN entry) and `conductor/workflow.md` (new "Tier 1 Track Initialization Rules" section, 113 lines).
+
+**Why this matters:** day estimates are inaccurate noise. Tier 2 capacity is bounded by attention, not time. The user called this out explicitly: *"Day estimates are inaccurate. Tier-2s can only do so much in a single track and there is no way in hell its going to be 'DAYS'."*
+
+**The rule:** measure effort by **scope** (N files, M sites, N tasks) and **T-shirt size** (S/M/L/XL). The user / Tier 2 agent decides the actual pacing.
+
+**Cleanup applied retroactively:** stripped day estimates from the 2 previously-shipped tracks (`rag_test_failures_20260615` and `exception_handling_audit_20260616`).
+
+---
+
+## Critical findings (the audit's most important discoveries)
+
+1. **`test_rag_visual_sim.py::test_rag_full_lifecycle_sim` was already passing at track execution time**, contrary to the spec's claim. The parent track's incidental fixes had already resolved it.
+
+2. **`src/app_controller.py` has 13 FastAPI boundary sites that are LEGITIMATE** (per the new "Boundary Types" section in the styleguide), not migration-target. The 22 remaining sites ARE migration-target.
+
+3. **The convention is partially applied even in the 3 refactored files**: 77 violations remain in mcp_client.py (44), ai_client.py (27), rag_engine.py (6). These are the parent's "Path C deferred work" + the SDK-exception-classification helpers in ai_client.py + the non-`*_result` methods in rag_engine.py. Sub-track 5 (baseline_cleanup) closes these.
+
+4. **The 268-site inventory is the canonical migration target.** Per-file breakdown (top 5):
+   - `src/gui_2.py`: 54 sites (37 V + 2 S + 13 ?)
+   - `src/app_controller.py`: 56 sites (35 V + 3 S + 2 ? + 16 C; 13 FastAPI boundary)
+   - `src/session_logger.py`: 8 sites (8 V)
+   - `src/warmup.py`: 7 sites (6 V + 1 S)
+   - `src/mcp_client.py`: 53 sites (44 V; BASELINE)
+
+5. **The audit's heuristics had bugs that the Tier 1 review caught**: `raise HTTPException(...)` was misclassified as `INTERNAL_RETHROW` because `ast.unparse(node.exc)` returns the full call expression, not just the class name. Fixed in the audit script.
+
+---
+
+## State
+
+- **Branch:** `master` (16 new commits, all atomic, all with git notes)
+- **Test pass count:** 1288 + 4 + 0 (unchanged from `rag_test_failures_20260615`; this session was informational + planning + docs)
+- **Convention status:** 3 of 65 `src/` files are convention-compliant (the baseline); 62 are migration-target. After all 5 `result_migration_*` sub-tracks ship, the convention will be applied to all 65 files.
+- **Pre-existing modified files** (NOT touched this session): `config.toml`, `manualslop_layout.ini`, `project_history.toml` — same 3 files mentioned in the `rag_test_failures_20260615` completion report as out of scope.
+
+---
+
+## Followup recommendations (for the next session / Tier 2)
+
+1. **Start sub-track 1** (`result_migration_review_pass`): a small (S) informational sub-track that reviews the 32 UNCLEAR + 25 INTERNAL_RETHROW sites, updates the audit's heuristics, and produces a per-site decision table. T-shirt size S, no day estimate. **No production code change.** This is the natural first sub-track to execute.
+
+2. **Then sub-tracks 2-5 in sequence** (small files → app_controller → gui_2 → baseline cleanup). Each is a refactor with tests; all have the convention's 4 enforcement mechanisms to prevent new violations.
+
+3. **After sub-track 5 ships:** wire `audit_exception_handling.py --strict` (or `--ci`) into pre-commit hooks + CI. At that point the project has 0 violations and the script returns 0; `--strict` mode becomes a meaningful CI gate.
+
+4. **Then the user's stated manual refactor:** `send_result` → `send` mass rename. Mechanical find-replace; no behavior change.
+
+5. **Then `data_structure_strengthening_20260606`** (the TypeAlias / NamedTuple track, parallel to result_migration; uses the cleaner Result API from this phase).
+
+---
+
+## See Also
+
+- `conductor/tracks/exception_handling_audit_20260616/` — the audit track's spec/plan/metadata
+- `conductor/tracks/result_migration_20260616/` — the umbrella spec for the 5 sub-tracks
+- `conductor/code_styleguides/error_handling.md` — the canonical styleguide (now with AI Agent Checklist)
+- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the 268-site inventory
+- `AGENTS.md` "Critical Anti-Patterns" — the HARD BAN on day estimates
+- `conductor/workflow.md` §"Tier 1 Track Initialization Rules" — the no-day-estimates rule
+- `docs/AGENTS.md` §"Convention Enforcement" — the AI-facing mirror's first section
+- `conductor/tracks/rag_test_failures_20260615/` — the parent track (the first fully green baseline)
+- `conductor/tracks/data_oriented_error_handling_20260606/` — the convention's origin track
+
+---
+
+## Closing note
+
+The session started with a Tier 1 review (verify someone else's work). It grew into: a new track (audit + 5 doc gaps), an umbrella track for the migration phase (5 sub-tracks), a process rule (no day estimates), and 5 doc updates to prevent tech rot. **17 commits, 4 lifecycle stages, 0 test regressions.** The project is now at a fully green baseline (1288 + 4 + 0) and the convention has 4 enforcement mechanisms to keep it that way.
+
+The next Tier 1 session should start with sub-track 1 (`result_migration_review_pass`); everything else is in place.