diff --git a/conductor/tracks/result_migration_small_files_20260617/spec.md b/conductor/tracks/result_migration_small_files_20260617/spec.md new file mode 100644 index 00000000..797c0111 --- /dev/null +++ b/conductor/tracks/result_migration_small_files_20260617/spec.md @@ -0,0 +1,222 @@ +# Track Specification: Result Migration — Sub-Track 2 (Small Files + Audit-Script Bug Fixes) + +**Track ID:** `result_migration_small_files_20260617` +**Parent umbrella:** [`result_migration_20260616`](../../result_migration_20260616/spec.md) (sub-track 2 of 5) +**Type:** refactor + audit-script maintenance (1 file script fix + 37 source file migrations) +**Priority:** A (foundational; the convention's middle layer) +**T-shirt size:** L +**Status:** ready to start (sub-track 1 shipped; 4 UNCLEAR sites need classification) + +--- + +## 0. Overview + +This is sub-track 2 of the 5-sub-track `result_migration_20260616` campaign. It does two things in one track: + +1. **Phase 1: Fix 3 pre-existing audit-script bugs** (documented in the review pass report §4.4) so that the audit's classification and reporting are correct for sub-tracks 2-5. +2. **Phases 2-7: Migrate 37 source files** (the 35 SMALL + 2 MEDIUM from the `--by-size` bucket) to the data-oriented error handling convention. + +The audit-script fix MUST happen first because: +- The `visit_Try` walker bug actively misclassifies `raise` statements in non-last `except` handlers (confirmed: `src/rag_engine.py:31` is missed). Running the audit against the 37 files before the fix produces a wrong scope. +- The `render_json` filter + truncation bugs hide findings in the per-file report. Fixing them gives Tier 2 accurate per-file guidance. + +**Why combine the two:** the audit-script fixes are small (~50-100 lines), well-scoped, and pre-existing in the project's institutional memory. Folding them into sub-track 2 (which already has the SMALL batched-commit pattern) is cheaper than a separate 1-task track. + +## 1. Current State Audit (as of 2026-06-17, base commit `b6caca40` post-review-pass merge) + +### 1.1 The 37-File Scope (per `scripts/audit_exception_handling.py --by-size`) + +| Bucket | Files | V+S+? | Notes | +|---|---|---|---| +| SMALL | 35 | 48V + 9S + 4? = 61 sites | Batched migration (5-7 files per commit) | +| MEDIUM | 2 (session_logger, warmup) | 14V + 1S = 15 sites | Dedicated commits per file | +| **Total** | **37** | **76 sites** | | + +The 4 UNCLEAR sites in SMALL are NOT classified by the review pass (they were "outside review scope" per the review-pass report §4.3). They are: + +| File | Site | Why still UNCLEAR | +|---|---|---| +| `src/outline_tool.py` | line 49 | Audit's `_classify_except` heuristic doesn't match the pattern | +| `src/summarize.py` | line 36 | Same | +| `src/conductor_tech_lead.py` | line 1 | Same | +| `src/openai_compatible.py` | line 1 | Same | + +These 4 are **Phase 2 work** of this track: read each snippet, classify compliant-or-migration, record the decision in the report. Per the review-pass convention, sites that are compliant don't need migration; sites that are migration-target get a per-site decision. + +### 1.2 The 35 SMALL Files (per `audit_exception_handling.py --by-size`) + +| File | V | S | ? | C | total | +|---|---|---|---|---|---| +| src/api_hooks.py | 3 | 2 | 0 | 0 | 5 | +| src/project_manager.py | 5 | 0 | 0 | 0 | 5 | +| src/aggregate.py | 4 | 0 | 0 | 1 | 5 | +| src/multi_agent_conductor.py | 4 | 0 | 0 | 4 | 8 | +| src/summary_cache.py | 4 | 0 | 0 | 0 | 4 | +| src/commands.py | 3 | 0 | 0 | 0 | 3 | +| src/external_editor.py | 3 | 0 | 0 | 0 | 3 | +| src/models.py | 2 | 1 | 0 | 2 | 5 | +| src/outline_tool.py | 2 | 1 | 1 | 0 | 4 | +| src/file_cache.py | 2 | 0 | 0 | 1 | 3 | +| src/gemini_cli_adapter.py | 0 | 2 | 0 | 2 | 4 | +| src/log_registry.py | 2 | 0 | 0 | 2 | 4 | +| src/markdown_helper.py | 2 | 0 | 0 | 0 | 2 | +| src/orchestrator_pm.py | 2 | 0 | 0 | 1 | 3 | +| src/presets.py | 2 | 0 | 0 | 3 | 5 | +| src/shell_runner.py | 1 | 1 | 0 | 2 | 4 | +| src/command_palette.py | 1 | 0 | 0 | 1 | 2 | +| src/context_presets.py | 1 | 0 | 0 | 0 | 1 | +| src/diff_viewer.py | 1 | 0 | 0 | 0 | 1 | +| src/hot_reloader.py | 1 | 0 | 0 | 1 | 2 | +| src/startup_profiler.py | 1 | 0 | 0 | 1 | 2 | +| src/summarize.py | 1 | 0 | 1 | 0 | 2 | +| src/theme_2.py | 1 | 0 | 0 | 0 | 1 | +| src/theme_models.py | 0 | 1 | 0 | 9 | 10 | +| src/vendor_capabilities.py | 0 | 1 | 0 | 0 | 1 | +| src/api_hook_client.py | 0 | 0 | 0 | 2 | 2 | +| src/conductor_tech_lead.py | 0 | 0 | 1 | 2 | 3 | +| src/dag_engine.py | 0 | 0 | 0 | 1 | 1 | +| src/log_pruner.py | 0 | 0 | 0 | 2 | 2 | +| src/openai_compatible.py | 0 | 0 | 1 | 0 | 1 | +| src/paths.py | 0 | 0 | 0 | 3 | 3 | +| src/performance_monitor.py | 0 | 0 | 0 | 1 | 1 | +| src/personas.py | 0 | 0 | 0 | 3 | 3 | +| src/tool_presets.py | 0 | 0 | 0 | 3 | 3 | +| src/workspace_manager.py | 0 | 0 | 0 | 3 | 3 | +| **SMALL subtotal** | **48** | **9** | **4** | **50** | **111** | + +### 1.3 The 2 MEDIUM Files + +| File | V | S | ? | C | total | +|---|---|---|---|---|---| +| src/session_logger.py | 8 | 0 | 0 | 0 | 8 | +| src/warmup.py | 6 | 1 | 0 | 0 | 7 | +| **MEDIUM subtotal** | **14** | **1** | **0** | **0** | **15** | + +### 1.4 The 3 Audit-Script Bugs (per review-pass report §4.4) + +The review pass documented 3 pre-existing bugs in `scripts/audit_exception_handling.py`. All 3 are fixed in Phase 1 of this track. + +| Bug | Location | Impact | Fix Complexity | +|---|---|---|---| +| `visit_Try` only walks children of the LAST except handler | `scripts/audit_exception_handling.py:759-784` (specifically L774: `for child in handler.body if node.handlers else []` uses the loop variable `handler` from L771, which is the last iteration) | **Real classification bug.** Misses `raise` statements in non-last except handlers. Confirmed: `src/rag_engine.py:31` is not in the audit findings. Will reclassify 5-15 sites once fixed. | TDD: ~30 lines, 3-4 tests | +| `render_json` filters out compliant findings in non-verbose mode | `scripts/audit_exception_handling.py:884, 889, 958` (filter is `if f.category in VIOLATION_CATEGORIES or f.category in ("UNCLEAR", "INTERNAL_RETHROW")` — `INTERNAL_COMPLIANT` is excluded) | **Reporting bug.** Totals are right; per-file list is incomplete. The 25 newly-classified compliant sites (from the review pass) are not in the per-file list. | TDD: ~20 lines, 2 tests | +| `render_json` truncates per-file list to `top` (default 15) | `scripts/audit_exception_handling.py:1058` (CLI default), `scripts/audit_exception_handling.py:958` (the `[r for r in sorted_reports[:top]]` slice) | **Reporting bug.** UNCLEAR sites in low-violation files (e.g., `outline_tool.py`, `summarize.py`) are not in the per-file list. | TDD: ~10 lines, 1 test | + +**Estimated total Phase 1 scope:** ~60 lines of changes (1 file), 6-9 TDD tests, 1 commit (or 3 if per-bug atomic). + +### 1.5 The 4 UNCLEAR Sites (Phase 2 classification) + +The review pass did NOT classify these 4 sites (they were below the audit's 24-site review threshold). Phase 2 of this track reads each site + 2-3 lines of context and decides compliant-or-migration. The decisions feed into Phase 3+ as additional migration targets OR as "no-op" (already compliant). + +Per the review-pass convention: +- **Compliant** = add to the report as a "no-op" line; no code change +- **Migration-target** = queue for Phase 3+ batches (add to the per-batch scope) + +### 1.6 The Migration Pattern (per the styleguide) + +Each `try/except` site that is a migration-target follows this transformation (per `conductor/code_styleguides/error_handling.md`): + +**Before** (idiomatic Python): +```python +def some_function(arg: str) -> SomeResult: + try: + return compute(arg) + except Exception as e: + logger.error("...") + return None +``` + +**After** (data-oriented): +```python +def some_function(arg: str) -> Result[SomeResult]: + try: + return Result(data=compute(arg)) + except SpecificError as e: + return Result(data=NIL_T, errors=[ErrorInfo(category="...", message=str(e), ...)]) +``` + +The convention uses `Result[T]` (from `src/result_types.py`) with `NIL_T` sentinel and `ErrorInfo` dataclass. The 3 refactored baseline files (mcp_client, ai_client, rag_engine) are the reference implementations. + +## 2. Goals + +The track has 3 goals, all bounded by scope (not time): + +1. **Fix the 3 audit-script bugs** so the audit is accurate for sub-tracks 2-5. +2. **Classify the 4 UNCLEAR sites** in the SMALL bucket. +3. **Migrate 76 sites across 37 files** to the data-oriented error handling convention. + +## 3. Functional Requirements + +- **FR1:** The 3 audit-script bugs in `scripts/audit_exception_handling.py` are fixed; each fix has a TDD test in `tests/test_audit_exception_handling_bug_fixes.py` (or a new test file). +- **FR2:** Re-running `uv run python scripts/audit_exception_handling.py --json` after Phase 1 shows the corrected classification (the `rag_engine.py:31` raise is now in the findings; the per-file list is complete; the per-file list is no longer truncated to top 15 by default). +- **FR3:** A per-site decision table for the 4 UNCLEAR sites is written to `docs/reports/RESULT_MIGRATION_SMALL_FILES_20260617.md` (the track's per-site report). +- **FR4:** All 35 SMALL + 2 MEDIUM files are migrated to the convention. Each `try/except` migration-target is converted to a `Result[T]` return; the compliant sites stay as-is (with a comment-free doc reference in the report). +- **FR5:** The audit re-run after Phase 7 shows **0 migration-target sites in the 37-file scope** (all 76 sites are either `INTERNAL_COMPLIANT`, `BOUNDARY_*`, or `INTERNAL_PROGRAMMER_RAISE`). +- **FR6:** The full test suite (`uv run python scripts/run_tests_batched.py`) continues to PASS; the tier-1, tier-2, and tier-3 test counts are unchanged OR grow by the number of new tests added. + +## 4. Non-Functional Requirements + +- **NF1:** No production code change outside the 37 files in scope. Phase 1 modifies only `scripts/audit_exception_handling.py`; Phases 2-7 modify the 37 source files. +- **NF2:** Atomic per-task commits. Each phase is a separate commit batch. Within Phase 7, batch 5-7 files per commit (per the umbrella spec). +- **NF3:** Per-commit git notes summarizing the work. +- **NF4:** The 1-space indentation convention is enforced on all Python code (per `conductor/workflow.md`). +- **NF5:** No diagnostic noise in production code (per AGENTS.md "No Diagnostic Noise in Production" rule). +- **NF6:** The TDD red-green-refactor cycle is followed for every code change. + +## 5. Architecture Reference + +- `conductor/code_styleguides/error_handling.md` — the canonical styleguide (5 patterns + 5 doc sections; the migration target) +- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference +- `docs/AGENTS.md` §"Convention Enforcement" — the 4 enforcement audit scripts +- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the parent audit report (268-site inventory) +- `docs/reports/RESULT_MIGRATION_REVIEW_PASS_20260617.md` — the review-pass report (43 sites classified; 3 audit-script bugs documented in §4.4) +- `conductor/tracks/result_migration_20260616/spec.md` — the umbrella spec (the per-sub-track plan section) +- `conductor/tracks/result_migration_20260616/plan.md` — the umbrella's plan +- `conductor/tracks/result_migration_review_pass_20260617/plan.md` — the review-pass plan (per-site decisions + heuristics) +- `docs/guide_ai_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the provider layer +- `docs/guide_mcp_client.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the MCP tool layer +- `docs/guide_rag.md` §"Data-Oriented Error Handling (Fleury Pattern)" — the in-context guide for the RAG engine +- `src/result_types.py` — the `Result[T]` and `NIL_T` definitions +- `scripts/audit_exception_handling.py` — the audit script being fixed (Phase 1) + +## 6. Out of Scope (Explicit) + +- **Migrating the 3 BASELINE files** (mcp_client, ai_client, rag_engine) — sub-track 5's work. +- **Migrating `src/gui_2.py` or `src/app_controller.py`** — sub-tracks 4 and 3's work, respectively. +- **The `send_result` → `send` mass rename** — separate work after this phase. +- **The umbrella's per-sub-track plan** (sub-tracks 2-4 ordering is unchanged; sub-track 4's +1 site is documented in the umbrella's "Post-Review Pass Update" callout). +- **Adding new `Result` patterns to areas that don't have any** (this track migrates EXISTING `try/except` sites only). +- **Refactoring the audit script's overall architecture** (Phase 1 fixes the 3 specific bugs; the broader architecture refactor is out of scope). + +## 7. Verification Criteria + +- **G1:** `scripts/audit_exception_handling.py` is fixed; the 3 documented bugs are verified by the new TDD tests in `tests/test_audit_exception_handling_bug_fixes.py`. +- **G2:** Re-running the audit post-Phase-1: `src/rag_engine.py:31` is in the findings; the per-file list is complete (not filtered to violations-only); the per-file list is not truncated to top 15. +- **G3:** The 4 UNCLEAR sites in the SMALL bucket are classified; the decisions are recorded in the track's per-site report. +- **G4:** All 37 files in scope are migrated to the convention. Re-running the audit post-Phase-7: 0 migration-target sites in the 37-file scope. +- **G5:** Full test suite continues to PASS (`uv run python scripts/run_tests_batched.py`). +- **G6:** Atomic commits: spec, plan, metadata + state, Phase 1 fix commits (3), Phase 2 UNCLEAR classification, Phase 3-7 migration batches (5-7 files per commit). + +## 8. Risks + +- **R1:** Fixing the `visit_Try` bug surfaces new migration-target sites in sub-track 2's 37 files (raises in non-last except handlers). The Phase 1 commit should be verified with `--json` to count the new findings; if the count grows, the per-batch scope adjusts. +- **R2:** The 4 UNCLEAR sites turn out to be non-trivial migrations (more than a 5-line Result conversion). If so, the per-file batch plan is updated; the user's T-shirt-size estimate (L) may grow to XL. +- **R3:** The audit-script fixes introduce regressions in the existing 10 TDD tests. The TDD workflow catches this; if a regression occurs, the fix is rolled back and re-implemented. +- **R4:** The migration breaks behavior in a way the test suite doesn't catch. The 11 test tiers exercise most code paths, but the SMALL files are not all live_gui-tested. For files that are not covered, manual smoke-testing or a targeted integration test is added. +- **R5:** The batched-commit pattern (5-7 files per commit) is too coarse; some files have complex migrations that need their own commit. The batch plan can be adjusted per-file (the umbrella's spec is guidance, not a rigid rule). + +## 9. Notes for the Tier 2 Implementer + +- **Phase 1 is a TDD refactor of the audit script.** The 3 bugs are documented in the review-pass report §4.4. Each bug has a `WHERE: line range` and `WHAT: the fix`. Write failing tests first. +- **Phase 2 is a research task.** Read the 4 UNCLEAR sites (use `get_file_slice` to read each line + 2-3 lines of context). Classify compliant-or-migration. Document in the report. +- **Phases 3-7 are mechanical migrations.** For each `try/except` site: + 1. Read the snippet + 5-10 lines of context + 2. Determine the return type (e.g., `str` → `Result[str]`, `None` → `Result[None]` or `Result[SomeType]`) + 3. Add a `Result` import (or use existing) + 4. Convert `except Exception as e: return None` to `except SpecificError as e: return Result(data=NIL_T, errors=[ErrorInfo(category="...", message=str(e))])` + 5. Update the caller to check `result.ok` and `result.errors` + 6. Add a test for the new Result-based API +- **The 2 MEDIUM files (session_logger, warmup) get dedicated commits** (per the umbrella spec). +- **The 35 SMALL files get batched commits** (5-7 files per commit). Group by topic to keep commits focused (e.g., all theme files together, all logging files together, all preset files together). +- **Per-file changes are small** (1-5 lines per migration site; ~5-20 lines per file for imports + result type introduction). +- **Throw-away scripts go in `scripts/tier2/artifacts/result_migration_small_files_20260617/`** (per Tier 2 convention).