conductor(track): spec for result_migration_small_files_20260617 (sub-track 2 of 5)

2026-06-17 18:19:42 -04:00
parent 87f273d044
commit 0aa00e394d
1 changed files with 222 additions and 0 deletions
@@ -0,0 +1,222 @@
+# Track Specification: Result Migration — Sub-Track 2 (Small Files + Audit-Script Bug Fixes)
+
+**Track ID:** `result_migration_small_files_20260617`
+**Parent umbrella:** [`result_migration_20260616`](../../result_migration_20260616/spec.md) (sub-track 2 of 5)
+**Type:** refactor + audit-script maintenance (1 file script fix + 37 source file migrations)
+**Priority:** A (foundational; the convention's middle layer)
+**T-shirt size:** L
+**Status:** ready to start (sub-track 1 shipped; 4 UNCLEAR sites need classification)
+
+---
+
+## 0. Overview
+
+This is sub-track 2 of the 5-sub-track `result_migration_20260616` campaign. It does two things in one track:
+
+1. **Phase 1: Fix 3 pre-existing audit-script bugs** (documented in the review pass report §4.4) so that the audit's classification and reporting are correct for sub-tracks 2-5.
+2. **Phases 2-7: Migrate 37 source files** (the 35 SMALL + 2 MEDIUM from the `--by-size` bucket) to the data-oriented error handling convention.
+
+The audit-script fix MUST happen first because:
+- The `visit_Try` walker bug actively misclassifies `raise` statements in non-last `except` handlers (confirmed: `src/rag_engine.py:31` is missed). Running the audit against the 37 files before the fix produces a wrong scope.
+- The `render_json` filter + truncation bugs hide findings in the per-file report. Fixing them gives Tier 2 accurate per-file guidance.
+
+**Why combine the two:** the audit-script fixes are small (~50-100 lines), well-scoped, and pre-existing in the project's institutional memory. Folding them into sub-track 2 (which already has the SMALL batched-commit pattern) is cheaper than a separate 1-task track.
+
+## 1. Current State Audit (as of 2026-06-17, base commit `b6caca40` post-review-pass merge)
+
+### 1.1 The 37-File Scope (per `scripts/audit_exception_handling.py --by-size`)
+
+| Bucket | Files | V+S+? | Notes |
+|---|---|---|---|
+| SMALL | 35 | 48V + 9S + 4? = 61 sites | Batched migration (5-7 files per commit) |
+| MEDIUM | 2 (session_logger, warmup) | 14V + 1S = 15 sites | Dedicated commits per file |
+| **Total** | **37** | **76 sites** | |
+
+The 4 UNCLEAR sites in SMALL are NOT classified by the review pass (they were "outside review scope" per the review-pass report §4.3). They are:
+
+| File | Site | Why still UNCLEAR |
+|---|---|---|
+| `src/outline_tool.py` | line 49 | Audit's `_classify_except` heuristic doesn't match the pattern |
+| `src/summarize.py` | line 36 | Same |
+| `src/conductor_tech_lead.py` | line 1 | Same |
+| `src/openai_compatible.py` | line 1 | Same |
+
+These 4 are **Phase 2 work** of this track: read each snippet, classify compliant-or-migration, record the decision in the report. Per the review-pass convention, sites that are compliant don't need migration; sites that are migration-target get a per-site decision.
+
+### 1.2 The 35 SMALL Files (per `audit_exception_handling.py --by-size`)
+
+| File | V | S | ? | C | total |
+|---|---|---|---|---|---|
+| src/api_hooks.py | 3 | 2 | 0 | 0 | 5 |
+| src/project_manager.py | 5 | 0 | 0 | 0 | 5 |
+| src/aggregate.py | 4 | 0 | 0 | 1 | 5 |
+| src/multi_agent_conductor.py | 4 | 0 | 0 | 4 | 8 |
+| src/summary_cache.py | 4 | 0 | 0 | 0 | 4 |
+| src/commands.py | 3 | 0 | 0 | 0 | 3 |
+| src/external_editor.py | 3 | 0 | 0 | 0 | 3 |
+| src/models.py | 2 | 1 | 0 | 2 | 5 |
+| src/outline_tool.py | 2 | 1 | 1 | 0 | 4 |
+| src/file_cache.py | 2 | 0 | 0 | 1 | 3 |
+| src/gemini_cli_adapter.py | 0 | 2 | 0 | 2 | 4 |
+| src/log_registry.py | 2 | 0 | 0 | 2 | 4 |
+| src/markdown_helper.py | 2 | 0 | 0 | 0 | 2 |
+| src/orchestrator_pm.py | 2 | 0 | 0 | 1 | 3 |
+| src/presets.py | 2 | 0 | 0 | 3 | 5 |
+| src/shell_runner.py | 1 | 1 | 0 | 2 | 4 |
+| src/command_palette.py | 1 | 0 | 0 | 1 | 2 |
+| src/context_presets.py | 1 | 0 | 0 | 0 | 1 |
+| src/diff_viewer.py | 1 | 0 | 0 | 0 | 1 |
+| src/hot_reloader.py | 1 | 0 | 0 | 1 | 2 |
+| src/startup_profiler.py | 1 | 0 | 0 | 1 | 2 |
+| src/summarize.py | 1 | 0 | 1 | 0 | 2 |
+| src/theme_2.py | 1 | 0 | 0 | 0 | 1 |
+| src/theme_models.py | 0 | 1 | 0 | 9 | 10 |
+| src/vendor_capabilities.py | 0 | 1 | 0 | 0 | 1 |
+| src/api_hook_client.py | 0 | 0 | 0 | 2 | 2 |
+| src/conductor_tech_lead.py | 0 | 0 | 1 | 2 | 3 |
+| src/dag_engine.py | 0 | 0 | 0 | 1 | 1 |
+| src/log_pruner.py | 0 | 0 | 0 | 2 | 2 |
+| src/openai_compatible.py | 0 | 0 | 1 | 0 | 1 |
+| src/paths.py | 0 | 0 | 0 | 3 | 3 |
+| src/performance_monitor.py | 0 | 0 | 0 | 1 | 1 |
+| src/personas.py | 0 | 0 | 0 | 3 | 3 |
+| src/tool_presets.py | 0 | 0 | 0 | 3 | 3 |
+| src/workspace_manager.py | 0 | 0 | 0 | 3 | 3 |
+| **SMALL subtotal** | **48** | **9** | **4** | **50** | **111** |
+
+### 1.3 The 2 MEDIUM Files
+
+| File | V | S | ? | C | total |
+|---|---|---|---|---|---|
+| src/session_logger.py | 8 | 0 | 0 | 0 | 8 |
+| src/warmup.py | 6 | 1 | 0 | 0 | 7 |
+| **MEDIUM subtotal** | **14** | **1** | **0** | **0** | **15** |
+
+### 1.4 The 3 Audit-Script Bugs (per review-pass report §4.4)
+
+The review pass documented 3 pre-existing bugs in `scripts/audit_exception_handling.py`. All 3 are fixed in Phase 1 of this track.
+
+| Bug | Location | Impact | Fix Complexity |
+|---|---|---|---|
+| `visit_Try` only walks children of the LAST except handler | `scripts/audit_exception_handling.py:759-784` (specifically L774: `for child in handler.body if node.handlers else []` uses the loop variable `handler` from L771, which is the last iteration) | **Real classification bug.** Misses `raise` statements in non-last except handlers. Confirmed: `src/rag_engine.py:31` is not in the audit findings. Will reclassify 5-15 sites once fixed. | TDD: ~30 lines, 3-4 tests |
+| `render_json` filters out compliant findings in non-verbose mode | `scripts/audit_exception_handling.py:884, 889, 958` (filter is `if f.category in VIOLATION_CATEGORIES or f.category in ("UNCLEAR", "INTERNAL_RETHROW")` — `INTERNAL_COMPLIANT` is excluded) | **Reporting bug.** Totals are right; per-file list is incomplete. The 25 newly-classified compliant sites (from the review pass) are not in the per-file list. | TDD: ~20 lines, 2 tests |
+| `render_json` truncates per-file list to `top` (default 15) | `scripts/audit_exception_handling.py:1058` (CLI default), `scripts/audit_exception_handling.py:958` (the `[r for r in sorted_reports[:top]]` slice) | **Reporting bug.** UNCLEAR sites in low-violation files (e.g., `outline_tool.py`, `summarize.py`) are not in the per-file list. | TDD: ~10 lines, 1 test |
+
+**Estimated total Phase 1 scope:** ~60 lines of changes (1 file), 6-9 TDD tests, 1 commit (or 3 if per-bug atomic).
+
+### 1.5 The 4 UNCLEAR Sites (Phase 2 classification)
+
+The review pass did NOT classify these 4 sites (they were below the audit's 24-site review threshold). Phase 2 of this track reads each site + 2-3 lines of context and decides compliant-or-migration. The decisions feed into Phase 3+ as additional migration targets OR as "no-op" (already compliant).
+
+Per the review-pass convention:
+- **Compliant** = add to the report as a "no-op" line; no code change
+- **Migration-target** = queue for Phase 3+ batches (add to the per-batch scope)
+
+### 1.6 The Migration Pattern (per the styleguide)
+
+Each `try/except` site that is a migration-target follows this transformation (per `conductor/code_styleguides/error_handling.md`):
+
+**Before** (idiomatic Python):
+```python
+def some_function(arg: str) -> SomeResult:
+    try:
+        return compute(arg)
+    except Exception as e:
+        logger.error("...")
+        return None
+```
+
+**After** (data-oriented):
+```python
+def some_function(arg: str) -> Result[SomeResult]:
+    try:
+        return Result(data=compute(arg))
+    except SpecificError as e:
+        return Result(data=NIL_T, errors=[ErrorInfo(category="...", message=str(e), ...)])
+```
+
+The convention uses `Result[T]` (from `src/result_types.py`) with `NIL_T` sentinel and `ErrorInfo` dataclass. The 3 refactored baseline files (mcp_client, ai_client, rag_engine) are the reference implementations.
+
+## 2. Goals
+
+The track has 3 goals, all bounded by scope (not time):
+
+1. **Fix the 3 audit-script bugs** so the audit is accurate for sub-tracks 2-5.
+2. **Classify the 4 UNCLEAR sites** in the SMALL bucket.
+3. **Migrate 76 sites across 37 files** to the data-oriented error handling convention.
+
+## 3. Functional Requirements
+
+- **FR1:** The 3 audit-script bugs in `scripts/audit_exception_handling.py` are fixed; each fix has a TDD test in `tests/test_audit_exception_handling_bug_fixes.py` (or a new test file).
+- **FR2:** Re-running `uv run python scripts/audit_exception_handling.py --json` after Phase 1 shows the corrected classification (the `rag_engine.py:31` raise is now in the findings; the per-file list is complete; the per-file list is no longer truncated to top 15 by default).
+- **FR3:** A per-site decision table for the 4 UNCLEAR sites is written to `docs/reports/RESULT_MIGRATION_SMALL_FILES_20260617.md` (the track's per-site report).
+- **FR4:** All 35 SMALL + 2 MEDIUM files are migrated to the convention. Each `try/except` migration-target is converted to a `Result[T]` return; the compliant sites stay as-is (with a comment-free doc reference in the report).
+- **FR5:** The audit re-run after Phase 7 shows **0 migration-target sites in the 37-file scope** (all 76 sites are either `INTERNAL_COMPLIANT`, `BOUNDARY_*`, or `INTERNAL_PROGRAMMER_RAISE`).
+- **FR6:** The full test suite (`uv run python scripts/run_tests_batched.py`) continues to PASS; the tier-1, tier-2, and tier-3 test counts are unchanged OR grow by the number of new tests added.
+
+## 4. Non-Functional Requirements
+
+- **NF1:** No production code change outside the 37 files in scope. Phase 1 modifies only `scripts/audit_exception_handling.py`; Phases 2-7 modify the 37 source files.
+- **NF2:** Atomic per-task commits. Each phase is a separate commit batch. Within Phase 7, batch 5-7 files per commit (per the umbrella spec).
+- **NF3:** Per-commit git notes summarizing the work.
+- **NF4:** The 1-space indentation convention is enforced on all Python code (per `conductor/workflow.md`).
+- **NF5:** No diagnostic noise in production code (per AGENTS.md "No Diagnostic Noise in Production" rule).
+- **NF6:** The TDD red-green-refactor cycle is followed for every code change.
+
+## 5. Architecture Reference
+
+- `conductor/code_styleguides/error_handling.md` — the canonical styleguide (5 patterns + 5 doc sections; the migration target)
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference
+- `docs/AGENTS.md` §"Convention Enforcement" — the 4 enforcement audit scripts
+- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the parent audit report (268-site inventory)
+- `docs/reports/RESULT_MIGRATION_REVIEW_PASS_20260617.md` — the review-pass report (43 sites classified; 3 audit-script bugs documented in §4.4)
+- `conductor/tracks/result_migration_20260616/spec.md` — the umbrella spec (the per-sub-track plan section)
+- `conductor/tracks/result_migration_20260616/plan.md` — the umbrella's plan
+- `conductor/tracks/result_migration_review_pass_20260617/plan.md` — the review-pass plan (per-site decisions + heuristics)
+- `docs/guide_ai_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the provider layer
+- `docs/guide_mcp_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the MCP tool layer
+- `docs/guide_rag.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the RAG engine
+- `src/result_types.py` — the `Result[T]` and `NIL_T` definitions
+- `scripts/audit_exception_handling.py` — the audit script being fixed (Phase 1)
+
+## 6. Out of Scope (Explicit)
+
+- **Migrating the 3 BASELINE files** (mcp_client, ai_client, rag_engine) — sub-track 5's work.
+- **Migrating `src/gui_2.py` or `src/app_controller.py`** — sub-tracks 4 and 3's work, respectively.
+- **The `send_result` → `send` mass rename** — separate work after this phase.
+- **The umbrella's per-sub-track plan** (sub-tracks 2-4 ordering is unchanged; sub-track 4's +1 site is documented in the umbrella's "Post-Review Pass Update" callout).
+- **Adding new `Result` patterns to areas that don't have any** (this track migrates EXISTING `try/except` sites only).
+- **Refactoring the audit script's overall architecture** (Phase 1 fixes the 3 specific bugs; the broader architecture refactor is out of scope).
+
+## 7. Verification Criteria
+
+- **G1:** `scripts/audit_exception_handling.py` is fixed; the 3 documented bugs are verified by the new TDD tests in `tests/test_audit_exception_handling_bug_fixes.py`.
+- **G2:** Re-running the audit post-Phase-1: `src/rag_engine.py:31` is in the findings; the per-file list is complete (not filtered to violations-only); the per-file list is not truncated to top 15.
+- **G3:** The 4 UNCLEAR sites in the SMALL bucket are classified; the decisions are recorded in the track's per-site report.
+- **G4:** All 37 files in scope are migrated to the convention. Re-running the audit post-Phase-7: 0 migration-target sites in the 37-file scope.
+- **G5:** Full test suite continues to PASS (`uv run python scripts/run_tests_batched.py`).
+- **G6:** Atomic commits: spec, plan, metadata + state, Phase 1 fix commits (3), Phase 2 UNCLEAR classification, Phase 3-7 migration batches (5-7 files per commit).
+
+## 8. Risks
+
+- **R1:** Fixing the `visit_Try` bug surfaces new migration-target sites in sub-track 2's 37 files (raises in non-last except handlers). The Phase 1 commit should be verified with `--json` to count the new findings; if the count grows, the per-batch scope adjusts.
+- **R2:** The 4 UNCLEAR sites turn out to be non-trivial migrations (more than a 5-line Result conversion). If so, the per-file batch plan is updated; the user's T-shirt-size estimate (L) may grow to XL.
+- **R3:** The audit-script fixes introduce regressions in the existing 10 TDD tests. The TDD workflow catches this; if a regression occurs, the fix is rolled back and re-implemented.
+- **R4:** The migration breaks behavior in a way the test suite doesn't catch. The 11 test tiers exercise most code paths, but the SMALL files are not all live_gui-tested. For files that are not covered, manual smoke-testing or a targeted integration test is added.
+- **R5:** The batched-commit pattern (5-7 files per commit) is too coarse; some files have complex migrations that need their own commit. The batch plan can be adjusted per-file (the umbrella's spec is guidance, not a rigid rule).
+
+## 9. Notes for the Tier 2 Implementer
+
+- **Phase 1 is a TDD refactor of the audit script.** The 3 bugs are documented in the review-pass report §4.4. Each bug has a `WHERE: line range` and `WHAT: the fix`. Write failing tests first.
+- **Phase 2 is a research task.** Read the 4 UNCLEAR sites (use `get_file_slice` to read each line + 2-3 lines of context). Classify compliant-or-migration. Document in the report.
+- **Phases 3-7 are mechanical migrations.** For each `try/except` site:
+  1. Read the snippet + 5-10 lines of context
+  2. Determine the return type (e.g., `str` → `Result[str]`, `None` → `Result[None]` or `Result[SomeType]`)
+  3. Add a `Result` import (or use existing)
+  4. Convert `except Exception as e: return None` to `except SpecificError as e: return Result(data=NIL_T, errors=[ErrorInfo(category="...", message=str(e))])`
+  5. Update the caller to check `result.ok` and `result.errors`
+  6. Add a test for the new Result-based API
+- **The 2 MEDIUM files (session_logger, warmup) get dedicated commits** (per the umbrella spec).
+- **The 35 SMALL files get batched commits** (5-7 files per commit). Group by topic to keep commits focused (e.g., all theme files together, all logging files together, all preset files together).
+- **Per-file changes are small** (1-5 lines per migration site; ~5-20 lines per file for imports + result type introduction).
+- **Throw-away scripts go in `scripts/tier2/artifacts/result_migration_small_files_20260617/`** (per Tier 2 convention).