From 7ea414e988db91e14e7b0a3e41d3058bf2d2593d Mon Sep 17 00:00:00 2001
From: Ed_
Date: Mon, 22 Jun 2026 00:03:32 -0400
Subject: [PATCH] conductor(spec): code_path_audit_20260607 v2 - data-pipeline + decomposition-cost lens

Re-scopes the audit from 'expensive operations per action' (v1) to 'data pipelines per aggregate' (v2). The v1 framing was correct 2026-06-07 (the 4 foundational tracks were future) but is now stale; v2 also cross-validates the data_structure_strengthening + data_oriented_error_handling deductions directly.

10 in-scope aggregates (Metadata, FileItem, FileItems, CommsLogEntry, CommsLog, HistoryMessage, History, ToolDefinition, ToolCall, Result[T]) + 3 candidate aggregates (ToolSpec, ChatMessage, ProviderHistory; forward-compat placeholders for any_type_componentization_20260621 which is NOT on master).

4 static analyses: PCG (3 AST passes), MemoryDim classifier, APD (5 access patterns), CFE (7 frequencies). 11 public functions, all return Result[T] per error_handling.md hard rule.

Decomposition-cost heuristic per aggregate answers: 'should this data be componentized further (split) or unified further (wider fat structs)?' 4 directions: componentize, unify, hold, insufficient_data.

10-phase TDD plan, 69 tests total. Consumes JSON from 6 existing audit scripts (cross-validates data_structure_strengthening + data_oriented_error_handling).

Out-of-scope: runtime profiling (deferred to pipeline_runtime_profiling_20260607), MMA worker spawn (cold). v1 spec.md + plan.md preserved unchanged.
---
 .../code_path_audit_20260607/spec_v2.md | 636 ++++++++++++++++++
 1 file changed, 636 insertions(+)
 create mode 100644 conductor/tracks/code_path_audit_20260607/spec_v2.md

diff --git a/conductor/tracks/code_path_audit_20260607/spec_v2.md b/conductor/tracks/code_path_audit_20260607/spec_v2.md
new file mode 100644
index 00000000..dfa48c12
--- /dev/null
+++ b/conductor/tracks/code_path_audit_20260607/spec_v2.md
@@ -0,0 +1,636 @@
+# Track Specification: Code Path & Data Pipeline Audit v2
+
+**Status:** Spec v2 (revised 2026-06-22; v1 was approved 2026-06-07 and revised 2026-06-08 with the post-4-tracks timing + 5-source framing)
+**Initialized:** 2026-06-07 (v1); 2026-06-22 (v2 supersedes v1)
+**Owner:** Tier 1 (spec) -> Tier 2 (plan + execution)
+**Priority:** High (foundational; enables follow-up pruning + per-pipeline refactor tracks)
+**Folder:** `conductor/tracks/code_path_audit_20260607/`
+**Files:** `spec.md` (v1; preserved), `spec_v2.md` (this file), `plan.md` (v1; preserved), `plan_v2.md` (after this spec is approved)
+
+> **v2 revision note (2026-06-22).** The v1 spec.md (approved 2026-06-07; revised 2026-06-08) was never executed (no `state.toml`, no `metadata.json`, no `src/code_path_audit.py` in the working tree). The 14-day gap saw 4 foundational tracks ship (`qwen_llama_grok_integration_20260606`, `data_oriented_error_handling_20260606`, `data_structure_strengthening_20260606`, `mcp_architecture_refactor_20260606`), the entire 5-sub-track `result_migration` campaign ship (2026-06-16 through 2026-06-21; 100% complete), and the `nagent_review` corpus grow from v1 to v3.1. v2 re-scopes the audit from "expensive operations per action" to "data pipelines per aggregate" — the v1 framing was correct at the time (the 4 tracks were future) but is now stale. v2 also cross-validates the `data_structure_strengthening_20260606` + `data_oriented_error_handling_20260606` deductions directly, which v1 could not (those tracks didn't exist on 2026-06-07). See §"Why v2" below.
+
+---
+
+## Why v2 (the rationale for the revision)
+
+The user's framing (2026-06-22):
+
+> "The whole point of the code path audit is to audit all paths nearly in the ./src of the codebase. The main point of it is to identify data-oriented pipelines and what data aggregate they will be operating on. This will realize what the data strengthening just uncovered and cross-audit if its deductions on the data structures are accurate while also being able to utilize additional flexibility the data oriented error handling track has provided. We are entering a time where the codebase is getting heavily adjusted into a properly engineered machine with discernable working parts."
+>
+> "The cost of the pipeline is important, it should factor in what data needs to be componentized further vs which can be unified further into wider code paths handling larger fat structs."
+
+**Three changes from v1 to v2:**
+
+1. **Output structure: per-action -> per-data-aggregate.** v1 emitted 3 per-action profiles (`ai_message_lifecycle`, `discussion_save_load`, `gui_startup`). v2 emits 10+3 per-data-aggregate profiles (`Metadata`, `FileItem`, `FileItems`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `ToolDefinition`, `ToolCall`, `Result[T]` + the 3 candidate aggregates `ChatMessage`, `ToolSpec`, `ProviderHistory`). The per-action reports are preserved for backward compat but downgraded to "cross-references to the per-aggregate profiles."
+
+2. **Cross-validation with the 5 existing audit scripts.** v1 was a standalone tool. v2 consumes JSON from `audit_weak_types`, `audit_exception_handling`, `audit_optional_in_3_files`, `audit_no_models_config_io`, `audit_main_thread_imports`, and the type registry (`generate_type_registry.py --json`). The v2 audit's per-aggregate `cross_audit_findings` + `result_coverage` + `type_alias_coverage` are the cross-checks of the 2 foundational tracks (`data_structure_strengthening` + `data_oriented_error_handling`).
+
+3. **The decomposition-cost heuristic.** v1 had a "cost model" focused on expensive operations (file I/O, network, AST parse). v2 adds a `DecompositionCost` heuristic per aggregate that answers the user's question: "should this data be componentized further (split into smaller dataclasses) or unified further (combined into wider fat structs)?" The recommendation is grounded in 3 dimensions: access pattern (whole_struct / field_by_field / hot_cold_split / bulk_batched / mixed), frequency (hot / per_turn / per_discussion / per_request / cold / init / unknown), and shape (struct_field_count + struct_frozen).
+
+---
+
+## Overview
+
+Build `src/code_path_audit.py` v2 — a data-oriented static-analysis tool that audits the data pipelines in `src/` and produces per-data-aggregate profiles. The output (custom postfix `.dsl` data + markdown + prefix tree text, organized per-aggregate) is the artifact that informs per-aggregate refactor decisions. The actual code changes are follow-up tracks (the 3 high-priority candidates from `decomposition_matrix.md`).
+
+The v2 audit's primary value is **cross-validation**: it consumes the JSON outputs of the 5 existing audit scripts and synthesizes them with the per-aggregate producer/consumer call graph. The result is a per-aggregate report that says "this aggregate has 12 weak-type sites (cross-checks `data_structure_strengthening`), 5 exception-handling sites (cross-checks `data_oriented_error_handling`), and 1 high-priority optimization candidate (decomposition direction: componentize)." The user reads one report per aggregate, not one per action.
+
+The v2 audit is **read-only** on `src/` (the only new file is the tool itself + its tests + the report). The MMA worker spawn action is **out of scope** (per v1; the user's "keeping MMA cold" directive from 2026-06-07 still stands). Runtime profiling is **out of scope** (deferred to `pipeline_runtime_profiling_20260607`); the v2's heuristic cost constants are recalibrated by that follow-up.
+
+---
+
+## Current State Audit (as of `7e61dd7d`)
+
+`src/` has 65 `.py` files (per the result migration campaign's final state). The call graph is dense; per-aggregate traversal is what makes the analysis tractable. The 4 foundational tracks that v1 deferred behind have all shipped; the 2 follow-up tracks (`any_type_componentization_20260621` + `phase2_4_5_call_site_completion_20260621`) are NOT on master (merged in `f914b2bc` then reverted in `751b94d4`); the v2 audit must be tolerant of their absence for an interim run.
+
+### Already Implemented (DO NOT re-implement; KEEP / build on)
+
+1. **`scripts/audit_main_thread_imports.py`** — the import-graph CI gate. The v2 audit consumes its JSON output (per the v2's `cross_audit_findings.import_graph` field). v2 does not modify this script.
+
+2. **`scripts/audit_weak_types.py`** — the weak-types CI gate. v2 consumes its JSON output. v2 does not modify this script.
+
+3. **`scripts/audit_exception_handling.py`** — the exception-handling CI gate (per `error_handling.md`). v2 consumes its JSON output. v2 does not modify this script.
+
+4. **`scripts/audit_optional_in_3_files.py`** — the `Optional[T]` ban CI gate for the 3 refactored files (`mcp_client.py`, `ai_client.py`, `rag_engine.py`). v2 extends this script by 1 line (add `src/code_path_audit.py` to the baseline list); the convention is the same.
+
+5. **`scripts/audit_no_models_config_io.py`** — the config-I/O ownership CI gate (per `conductor/code_styleguides/config_state_owner.md`). v2 consumes its JSON output. v2 does not modify this script.
+
+6. **`scripts/generate_type_registry.py`** — the type-registry generator (per `conductor/code_styleguides/type_aliases.md`). v2 consumes its JSON output. v2 does not modify this script.
+
+7. **`src/type_aliases.py`** — the 10 canonical TypeAliases + 1 NamedTuple (`FileItemsDiff`). v2 imports these; v2 does not redefine them. The 13 data aggregates (10 + 3 candidates) are referenced by their canonical names.
+
+8. **`src/result_types.py`** — `Result[T]`, `ErrorInfo`, `NilPath`, `NilRAGState`, `ErrorKind`. v2 imports these; v2 does not redefine them. v2's public functions return `Result[T]` per the `error_handling.md` hard rule.
+
+9. **`src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)`.** A single-symbol recursive call tracer with text output. v2 builds on this pattern; the v2's PCG P1 (return-type pass) is the multi-symbol superset. The v1 spec's `CallGraph` is subsumed by the v2's `ProducerConsumerGraph` (function-to-aggregate edges, not function-to-function edges).
+
+10. **`src/performance_monitor.py`** — runtime profiling with `monitor.scope("name")` + per-component hit counts + latencies. Used at runtime; the `pipeline_runtime_profiling_20260607` follow-up uses it to calibrate the v2's heuristic cost constants.
+
+11. **`conductor/code_styleguides/data_oriented_design.md`** — the canonical DOD reference. v2's decomposition-cost heuristic is informed by the 8 defaults in §2 (especially "The common case dominates" + "Where there is one, there are many"). v2's per-aggregate access pattern classification follows the DOD's "Algorithms on data" framing.
+
+12. **`conductor/code_styleguides/error_handling.md`** — the `Result[T]` convention. v2's public API returns `Result[T]` per the hard rule (§"Hard Rules" §"The 5 MUST-DO rules" + §"The 7 MUST-NOT-DO rules").
+
+13. **`conductor/code_styleguides/type_aliases.md`** — the 10 TypeAliases + 1 NamedTuple. v2's per-aggregate `type_alias_coverage` metric is the cross-check of this convention.
+
+14. **`conductor/code_styleguides/agent_memory_dimensions.md`** — the 4 mem dims (curation / discussion / RAG / knowledge). v2's `MemoryDim` classifier (§7.2.2) follows the styleguide's "shape rule" (a feature that wants one should use the matching dimension).
+
+15. **`conductor/code_styleguides/feature_flags.md`** — the "delete to turn off" pattern. v2's `scripts/audit_code_path_audit_coverage.py` is a feature flag (the meta-audit); removing the file disables the meta-audit.
+
+16. **`conductor/code_styleguides/cache_friendly_context.md`** — the stable-to-volatile cache ordering. v2's per-aggregate reports are a downstream consumer of the cache state (the `cache_friendly_context` is the "what stays in the LLM's context"; the v2's per-aggregate profile is the "what data flows through the LLM").
+
+17. **`conductor/code_styleguides/knowledge_artifacts.md`** — the knowledge harvest pattern. v2's per-aggregate profiles are NOT a knowledge artifact (they're a curation artifact, per the 4-dim rule).
+
+18. **`conductor/code_styleguides/rag_integration_discipline.md`** — the conservative-RAG rule. v2's `RAG` aggregate (RAGEngine state, indexed chunks) is classified by the `MemoryDim` classifier; the audit does not mutate RAG state.
+
+19. **SDM docstrings** (`[C: ...]` / `[M: ...]` tags in `src/*.py` docstrings) — pre-computed caller/mutation info. v2's PCG is a more rigorous version of what SDM already documents ad-hoc.
+
+20. **`conductor/tracks/nagent_review_20260608/nagent_review_v3_1_20260620.md`** — the v3.1 nagent review. v2 references the v3.1 Candidates 27-30 (Markdown + custom DSL lock-in, per-turn ground-truth hook, dataset-curation track, cache TTL GUI hardening). The v2's custom postfix DSL is a direct application of Candidate 27 (markdown + custom DSL).
+
+21. **`docs/reports/computational_shapes_ssdl_digest_20260608.md`** — the SSDL digest that informed the v1 spec's 5-source lens. v2 preserves the lens (the 6 SSDL primitives are referenced in the v2's per-aggregate access pattern + frequency classification).
+
+22. **`docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md`** — the 100%-complete `result_migration` campaign (268 sites migrated + 9 legacy wrappers obliterated across 6 sub-tracks, 2026-06-16 through 2026-06-21). v2's `result_coverage` metric is the post-campaign check that the convention was applied uniformly across all 65 `src/` files.
+
+23. **`docs/reports/ANY_TYPE_AUDIT_20260621.md`** — the 89-site audit (48 promoted + 41 deferred) that informed `any_type_componentization_20260621`. v2 references the 3 candidate aggregates (§3.1 `ToolSpec`, §3.2 `ChatMessage`, §3.3 `ProviderHistory`) as forward-compat placeholders.
+
+24. **`docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`** — the Tier 2's authoritative cost analysis of the 41 deferred Phase 3 sites (the 112 call sites in `_send_()` that would migrate to `ProviderHistory.append()`). v2's `ProviderHistory` candidate aggregate's placeholder is sourced from this report.
+
+25. **`conductor/tracks/code_path_audit_20260607/spec.md`** — the v1 spec (preserved). v2's structure is informed by v1's 6-phase plan + 5-source framing + 3-action output.
+
+26. **`conductor/tracks/code_path_audit_20260607/plan.md`** — the v1 plan (preserved, never executed). v2's plan is a fresh write.
+
+### Gaps to Fill (This Track's Scope)
+
+- A `ProducerConsumerGraph` builder for all of `src/` (3 AST passes: P1 return types, P2 parameter types, P3 field access). Multi-aggregate, machine-readable output.
+- An `AccessPatternDetector` (5 patterns: whole_struct, field_by_field, hot_cold_split, bulk_batched, mixed). Per-`(function, aggregate)` classification with per-aggregate dominance rule (25% threshold).
+- A `CallFrequencyEstimator` (7 frequencies: hot, per_turn, per_discussion, per_request, cold, init, unknown). Entry-point-based heuristic + manual override file.
+- A `DecompositionCost` heuristic per aggregate (4 directions: componentize, unify, hold, insufficient_data). The 5-step `recommended_direction` logic per §7.5.
+- A `MemoryDim` classifier per aggregate (7 dims: curation, discussion, rag, knowledge, config, control, unknown). Canonical mappings + file-of-origin heuristic + override.
+- A per-aggregate profile data model (`AggregateProfile` + 9 supporting dataclasses + 5 enums: `AggregateKind`, `MemoryDim`, `AccessPattern`, `Frequency`, `RecommendedDirection`). All `frozen=True` per the immutability story. The 9 supporting dataclasses: `FunctionRef`, `AccessPatternEvidence`, `FrequencyEvidence`, `ResultCoverage`, `TypeAliasCoverage`, `CrossAuditFinding`, `CrossAuditFindings`, `DecompositionCost`, `OptimizationCandidate`.
+- A cross-audit integration layer that consumes the 6 input JSON streams and produces per-aggregate `cross_audit_findings` + 2 coverage metrics (`result_coverage`, `type_alias_coverage`).
+- The v2 postfix DSL (14 new tagged words + the v1's 7 preserved). The flat-section format (streamable, tag-scannable).
+- Output: per-aggregate `.dsl` + `.md` + `.tree` files + 4 top-level rollup files (summary.md, cross_audit_summary.md, decomposition_matrix.md, candidates.md).
+- A CLI (`python -m src.code_path_audit --all --date `) and an MCP tool (`code_path_audit_v2(action=None) -> dict`).
+- A meta-audit (`scripts/audit_code_path_audit_coverage.py`) that validates the v2 audit's output schema.
+- The actual audit run on the 13 aggregates, with the report committed to `docs/reports/code_path_audit//`.
+- A new styleguide (`conductor/code_styleguides/code_path_audit.md`) documenting the v2 audit's contract.
+- A 1-line extension to `scripts/audit_optional_in_3_files.py` to include `src/code_path_audit.py` in the baseline.
+
+---
+
+## Goals
+
+1. **Produce a queryable artifact per aggregate.** The custom postfix `.dsl` output is the source of truth; markdown + prefix tree text are for human review. Re-run after any `src/` change to see drift.
+2. **Cross-validate the 2 foundational conventions.** Per-aggregate `result_coverage` (the `data_oriented_error_handling` cross-check) + per-aggregate `type_alias_coverage` (the `data_structure_strengthening` cross-check). The verdict at the top of `summary.md` says "VERIFIED" or "DRIFT DETECTED" with the specific evidence.
+3. **Surface the top-N decomposition candidates per aggregate.** The `decomposition_matrix.md` ranks candidates by `estimated_savings_us × frequency_multiplier`. This is what the user uses to decide which refactor track to do next.
+4. **Data-grounded design.** The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place (`scripts/code_path_audit_overrides.toml`).
+5. **Reusable across aggregates.** The `build_pcg` + `classify_memory_dim` + `detect_access_pattern` + `estimate_call_frequency` + `compute_decomposition_cost` APIs take any aggregate (or "all 13"). Adding a 14th aggregate is 1 line in the `AGGREGATES` constant.
+6. **Surface calibration gaps clearly.** When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, `getattr` magic), the report flags it as "unresolved" so the `pipeline_runtime_profiling_20260607` follow-up targets it.
+7. **Tolerate the candidate aggregates' absence.** The 3 candidate aggregates (`ChatMessage`, `ToolSpec`, `ProviderHistory`) are NOT on master. The v2 audit produces placeholders with `is_candidate: True`; the report is still valid (the placeholders are clearly marked).
+
+---
+
+## Functional Requirements
+
+The 11 public functions in `src/code_path_audit.py`. All return `Result[T]` per the `error_handling.md` hard rule (or return a deterministic `T` when no runtime failure is possible).
+
+| # | Function | Returns | Failure mode |
+|---|---|---|---|
+| 1 | `run_audit(src_dir, audit_inputs_dir, output_dir, date)` | `Result[AuditSummary]` | 6 input JSONs may be missing or malformed; src/ may be unparseable |
+| 2 | `build_pcg(src_dir)` | `Result[ProducerConsumerGraph]` | AST parse errors in src/ |
+| 3 | `classify_memory_dim(aggregate, type_registry)` | `MemoryDim` | n/a (deterministic) |
+| 4 | `detect_access_pattern(function_body, aggregate)` | `AccessPattern` | n/a (deterministic) |
+| 5 | `estimate_call_frequency(function, call_graph)` | `Frequency` | n/a (deterministic) |
+| 6 | `compute_decomposition_cost(profile)` | `DecompositionCost` | n/a (deterministic) |
+| 7 | `read_input_json(path)` | `Result[dict]` | file not found; malformed JSON |
+| 8 | `to_dsl_v2(profile)` | `str` | n/a (deterministic) |
+| 9 | `parse_dsl_v2(text)` | `Result[dict]` | malformed DSL |
+| 10 | `to_markdown(profile)` | `str` | n/a (deterministic) |
+| 11 | `to_tree(profile)` | `str` | n/a (deterministic) |
+
+Plus the CLI (`python -m src.code_path_audit ...`) and the MCP tool (`code_path_audit_v2`).
+
+---
+
+## Non-Functional Requirements
+
+- **No new pip dependencies.** The v2 audit uses stdlib only (`ast`, `pathlib`, `json`, `dataclasses`, `tomllib` for the override file).
+- **1-space indentation** for all Python code (per `conductor/workflow.md`).
+- **CRLF line endings** on Windows.
+- **Type hints required** for all public functions.
+- **No comments in Python source** (documentation lives in `/docs`).
+- **`Result[T]` return types** for all functions that can fail at runtime (per the `error_handling.md` hard rule). The new file is held to the same standard as the 3 refactored files.
+- **`Optional[T]` return types are FORBIDDEN** in `src/code_path_audit.py`. Verified by the extended `scripts/audit_optional_in_3_files.py` (1-line extension).
+- **Per-task commits** (1 task = 1 commit). Per `conductor/workflow.md` TDD protocol.
+- **Per-task git notes** (each commit gets a `git notes add -m "..."` summary).
+- **Coverage target: >80%** for `src/code_path_audit.py`. The 4 audit scripts (`audit_exception_handling.py --strict`, `audit_weak_types.py --strict`, `audit_main_thread_imports.py`, `audit_no_models_config_io.py`) are the verification gates.
+- **The audit's runtime is bounded.** The full audit run against the real `src/` (65 files) completes in <60s on a developer machine. The unit + integration tests complete in <30s. The live_gui E2E tests are opt-in.
+
+---
+
+## Architecture
+
+### 7.1 Public API (the 11 functions)
+
+#### 7.1.1 `run_audit(...)`
+
+The main entry point. Runs the full audit pipeline:
+
+1. Read the 6 input JSON files from `audit_inputs_dir` (using `read_input_json` per function #7). Missing files are tolerated; the corresponding `cross_audit_findings` field is `()` and the markdown notes the absence.
+2. Build the PCG (using `build_pcg` per function #2).
+3. For each of the 13 aggregates, build the `AggregateProfile`:
+   - `classify_memory_dim(aggregate, type_registry)` (function #3)
+   - `detect_access_pattern(consumer, aggregate)` (function #4) for each consumer; aggregate to the per-aggregate pattern
+   - `estimate_call_frequency(function, call_graph)` (function #5) for each producer + consumer; aggregate to the per-aggregate frequency
+   - Cross-validate with the 6 input JSONs (compute `cross_audit_findings`, `result_coverage`, `type_alias_coverage`)
+   - `compute_decomposition_cost(profile)` (function #6)
+   - Synthesize `optimization_candidates` from the cross-audit findings + the decomposition cost
+4. Render the 13 per-aggregate `.dsl` + `.md` + `.tree` files.
+5. Render the 4 top-level rollup files (`summary.md`, `cross_audit_summary.md`, `decomposition_matrix.md`, `candidates.md`).
+6. Return `Result[AuditSummary]` with the per-aggregate profiles + the rollup paths.
+
+#### 7.1.2 The other 10 functions
+
+Per the table in §"Functional Requirements." The deterministic functions (3, 4, 5, 6, 8, 10, 11) take already-parsed data and return data; no I/O. The boundary functions (1, 2, 7, 9) catch stdlib I/O + AST parse errors and convert to `ErrorInfo` per `error_handling.md` Pattern 2.
+
+### 7.2 The 4 static analyses (PCG, MemoryDim, APD, CFE)
+
+#### 7.2.1 `ProducerConsumerGraph` (PCG) — pipeline discovery
+
+**Three AST passes over `src/`:**
+
+| Pass | What it finds | Output |
+|---|---|---|
+| **P1: Return types** | `FunctionDef.returns` annotation -> `Result[T]` -> producer of `T`; or direct `T` (alias or dataclass) -> producer of `T`. | `(function, aggregate, "producer", confidence="high")` edges |
+| **P2: Parameter types** | `FunctionDef.args` annotation -> parameter is a TypeAlias or dataclass -> consumer of that aggregate. `dict[str, Any]` parameter is NOT a consumer edge (typed by P3). | `(function, aggregate, "consumer", confidence="high")` edges |
+| **P3: Field access** | Every `payload['key']` and `payload.attr` in the function body. The audit consults `scripts/generate_type_registry.py --json` to map `key` to a known field of a known aggregate. If `key` is unique to one aggregate (e.g., `'vision'` -> `VendorCapabilities`), the consumer edge is high-confidence. If `key` is ambiguous (e.g., `'path'` appears in both `FileItem` and `ContextPreset`), the edge is low-confidence and the markdown flags it. | `(function, aggregate, "consumer", confidence=...)` edges |
+
+**Edge cases the algorithm handles:**
+
+- **Constructor calls** (`dict(...)`, `SomeDataclass(...)`, `SomeNamedTuple(...)`) inside a function body: the function is a producer at the call site. The audit tracks the call's `type` argument (`dict`, `SomeDataclass`) to identify the aggregate.
+- **Re-exports** (`from src.type_aliases import Metadata`): the audit uses `import` resolution to find the canonical TypeAlias definition, not the re-exported name.
+- **Decorator-wrapped methods** (e.g., `@imscope`): the audit walks through the decorator; if the decorator is a known passthrough (per `scripts/code_path_audit_overrides.toml`), the method body is processed normally. If unknown, the function is marked "unresolved" and the markdown notes it (matches the v1 spec's `unresolved_calls` behavior).
+- **Re-exports across sub-MCPs** (`mcp_client.py` re-exports `mcp_file_io.read_file_result`): the audit uses the **definition** site, not the re-export site, for the producer. The re-export site gets a "passthrough" `FunctionRef` with `role="consumer"`.
+
+**Output:** A bipartite graph keyed by `(function_fqname, aggregate_name)` -> `FunctionRef` + role.
+
+#### 7.2.2 `MemoryDim` classifier
+
+A function `classify_memory_dim(aggregate_name, producer_functions, type_registry) -> MemoryDim` that consults:
+
+1. **Canonical mappings** (hardcoded in `code_path_audit.py`):
+   - `Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History` -> `discussion` (per-turn conversational)
+   - `FileItem`, `FileItems` -> `curation` (per-file structural)
+   - `ToolDefinition`, `ToolCall` -> `control` (these propagate through the LLM-tool pipeline)
+   - `Result`, `ErrorInfo` -> `control` (propagation primitives)
+2. **File-of-origin heuristic:** if the aggregate's primary producer is in `src/aggregate.py`, `src/context_presets.py`, `src/views.py` -> `curation`. If in `src/ai_client.py`, `src/history.py`, `src/app_controller.py` (in the discussion-handling sections) -> `discussion`. If in `src/rag_engine.py` -> `rag`. If in `src/knowledge*.py` (if exists) -> `knowledge`. If in `src/paths.py`, `src/presets.py`, `src/personas.py` -> `config`.
+3. **Override file:** `scripts/code_path_audit_overrides.toml` with `[memory_dim.] = ""` for cases the heuristic gets wrong.
+
+**When the classifier can't determine:** the result is `"unknown"` and the markdown flags it for human review (the override file is the fix).
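
The three lookup steps of §7.2.2 could be sketched roughly as follows. This is a minimal illustration, not the track's implementation: the name `classify_memory_dim_sketch`, the single `primary_producer_file` argument, the `overrides` dict (standing in for the parsed `[memory_dim]` table of the override file), and the decision to check overrides before the canonical mappings are all assumptions made for the example. It also treats all of `src/app_controller.py` as `discussion`, which is coarser than the "discussion-handling sections" rule in the spec.

```python
from enum import Enum
from fnmatch import fnmatchcase


class MemoryDim(Enum):
    CURATION = "curation"
    DISCUSSION = "discussion"
    RAG = "rag"
    KNOWLEDGE = "knowledge"
    CONFIG = "config"
    CONTROL = "control"
    UNKNOWN = "unknown"


# Step 1: canonical mappings, copied from the list in 7.2.2.
CANONICAL_DIMS: dict[str, MemoryDim] = {
    **dict.fromkeys(
        ("Metadata", "CommsLogEntry", "CommsLog", "HistoryMessage", "History"),
        MemoryDim.DISCUSSION,
    ),
    **dict.fromkeys(("FileItem", "FileItems"), MemoryDim.CURATION),
    **dict.fromkeys(
        ("ToolDefinition", "ToolCall", "Result", "ErrorInfo"), MemoryDim.CONTROL
    ),
}

# Step 2: file-of-origin patterns, checked in order (first match wins).
ORIGIN_PATTERNS: tuple[tuple[str, MemoryDim], ...] = (
    ("src/aggregate.py", MemoryDim.CURATION),
    ("src/context_presets.py", MemoryDim.CURATION),
    ("src/views.py", MemoryDim.CURATION),
    ("src/ai_client.py", MemoryDim.DISCUSSION),
    ("src/history.py", MemoryDim.DISCUSSION),
    ("src/app_controller.py", MemoryDim.DISCUSSION),  # simplification, see lead-in
    ("src/rag_engine.py", MemoryDim.RAG),
    ("src/knowledge*.py", MemoryDim.KNOWLEDGE),
    ("src/paths.py", MemoryDim.CONFIG),
    ("src/presets.py", MemoryDim.CONFIG),
    ("src/personas.py", MemoryDim.CONFIG),
)


def classify_memory_dim_sketch(
    aggregate_name: str,
    primary_producer_file: str,
    overrides: dict[str, str],
) -> MemoryDim:
    # Step 3 is applied first here (an assumption): a manual override exists
    # precisely to correct the other two steps, so it takes precedence.
    if aggregate_name in overrides:
        try:
            return MemoryDim(overrides[aggregate_name])
        except ValueError:
            # An unrecognised override value is surfaced as "unknown".
            return MemoryDim.UNKNOWN
    if aggregate_name in CANONICAL_DIMS:
        return CANONICAL_DIMS[aggregate_name]
    for pattern, dim in ORIGIN_PATTERNS:
        if fnmatchcase(primary_producer_file, pattern):
            return dim
    return MemoryDim.UNKNOWN
```

Because every path through the function returns a `MemoryDim`, the sketch stays deterministic and never raises, which matches the "n/a (deterministic)" failure mode listed for function #3.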
+
+#### 7.2.3 `AccessPatternDetector` (APD) — per-`(function, aggregate)` access pattern
+
+For each `(function, aggregate)` pair:
+
+1. Walk the function body. Record every `payload['key']` / `payload.attr` access into a `Counter[str]` keyed by `key`.
+2. Detect these patterns:
+   - `whole_struct`: the function reads `payload` directly (passes to another function; `print(payload)`; `return payload`) OR accesses <=1 distinct key.
+   - `field_by_field`: the function accesses >=3 distinct keys AND no `whole_struct` access in the body.
+   - `hot_cold_split`: the function accesses 1-2 keys in the function's hot path (the top-level statement body) AND 2+ additional keys inside `if/else` branches.
+   - `bulk_batched`: the function is `for x in payload_list: ` where `payload_list: list[aggregate]` and the body accesses fields uniformly across iterations.
+   - `mixed`: none of the above patterns dominate (each pattern has <60% share of the function's accesses).
+3. Aggregate the per-function patterns to the aggregate level: the dominant pattern across all consumers, with the rule that the dominant pattern must have >=25% share of consumers. If no pattern has >=25%, the aggregate-level result is `mixed`.
+
+**The threshold constants** are module-level in `code_path_audit.py`:
+
+```python
+WHOLE_STRUCT_KEY_THRESHOLD: int = 1
+FIELD_BY_FIELD_KEY_THRESHOLD: int = 3
+MIXED_DOMINANCE_THRESHOLD: float = 0.6
+AGGREGATE_LEVEL_DOMINANCE_THRESHOLD: float = 0.25
+```
+
+The override file can change them per-aggregate.
+
+#### 7.2.4 `CallFrequencyEstimator` (CFE) — per-function frequency
+
+Build the v1 call graph. For each function:
+
+1. **Entry point detection** (AST-based):
+   - Functions called from `__init__` of `App` (in `src/gui_2.py`) or `AppController` (in `src/app_controller.py`) or from `main()` (in `gui.py`) -> `init`.
+   - Functions called from the ImGui render loop (`render_*` functions, or functions called within `if imgui.begin_main_tool_bar():` etc.) -> `hot`.
+   - Functions called from the AI send path (`_send__result`, `process_user_request`) -> `per_turn`.
+   - Functions called from `reset_session`, `cleanup`, `_classify_*_error` -> `cold`.
+   - Functions called from `save_project`, `load_project`, `save_snapshot` -> `per_discussion`.
+   - Functions called from `_api_*` FastAPI handlers -> `per_request`.
+2. **Override file:** `scripts/code_path_audit_overrides.toml` with `[frequency.] = ""` for manual corrections.
+3. **Aggregate level:** the dominant frequency across all producers+consumers, with `unknown` if no dominant.
+
+### 7.3 The 6 input streams
+
+The v2 audit consumes JSON from 6 sources. All 6 are in `tests/artifacts/audit_inputs/` (gitignored per `test_sandbox.md`):
+
+| Input | Path | Producer | Shape (essential fields) |
+|---|---|---|---|
+| 1 | `audit_weak_types.json` | `scripts/audit_weak_types.py --json` | `{"findings": [{"file", "line", "type_string", "category"}]}` |
+| 2 | `audit_exception_handling.json` | `scripts/audit_exception_handling.py --json` | `{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}` |
+| 3 | `audit_optional_in_3_files.json` | `scripts/audit_optional_in_3_files.py --json` | `{"findings": [{"file", "line", "return_type", "function"}]}` (3 baseline files only) |
+| 4 | `audit_no_models_config_io.json` | `scripts/audit_no_models_config_io.py --json` | `{"findings": [{"file", "line", "function", "config_path"}]}` |
+| 5 | `audit_main_thread_imports.json` | `scripts/audit_main_thread_imports.py --json` | `{"findings": [{"file", "line", "imported_module", "thread"}]}` |
+| 6 | `type_registry.json` | `scripts/generate_type_registry.py --json` | `{"types": {"": {"file", "fields": [{"name", "type", "optional"}]}}}` |
+
+**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` (empty tuple) and the markdown notes the missing input. The audit does NOT fail on missing inputs.
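
The tolerance rule above can be illustrated with a small reader for the five `findings`-shaped inputs. This is a hedged sketch only: `InputReadSketch` and `read_findings_sketch` are hypothetical names, and the plain dataclass stands in for the project's `Result[T]` / `ErrorInfo` types from `src/result_types.py`, whose exact API is not shown in this spec. The `type_registry.json` input has a different top-level key (`types`) and is not covered here.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class InputReadSketch:
    # Empty tuple mirrors the "field set to ()" rule for missing inputs.
    findings: tuple = ()
    # Non-empty when the markdown report should note a problem.
    note: str = ""


def read_findings_sketch(path: Path) -> InputReadSketch:
    """Read one audit JSON file; never raise on missing or malformed input."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return InputReadSketch(note=f"missing input: {path.name}")
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return InputReadSketch(
            note=f"malformed input: {path.name} ({type(exc).__name__})"
        )
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        return InputReadSketch(note=f"malformed input: {path.name} (no findings list)")
    return InputReadSketch(findings=tuple(findings))
```

A caller would run this once per input file and carry any non-empty `note` into the per-aggregate markdown, so a missing file degrades the report rather than aborting the audit.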
+
+### 7.4 The 13 data aggregates (10 + 3 candidates)
+
+The 10 in-scope aggregates are the canonical TypeAliases from `src/type_aliases.py`:
+
+```
+1. Metadata (the root alias; 79 sites in src/ai_client.py alone)
+2. FileItem (single file in context)
+3. FileItems (list of files in context; the most common weak pattern)
+4. CommsLogEntry (single entry in AI comms log)
+5. CommsLog (the comms log ring buffer)
+6. HistoryMessage (single message in provider history; UI layer)
+7. History (the conversation history)
+8. ToolDefinition (single tool definition)
+9. ToolCall (single tool call from the model)
+10. Result[T] (the success-or-failure wrapper; the audit's coverage metric)
+```
+
+The 3 candidate aggregates are from `any_type_componentization_20260621` §3 (NOT on master; the v2 audit is forward-compatible with their absence):
+
+```
+11. ToolSpec / ToolParameter (would replace ToolDefinition's 45 dict instances; §3.1)
+12. ChatMessage / UsageStats / NormalizedResponse (would replace HistoryMessage + tool-call dicts; §3.2)
+13. ProviderHistory (would replace the 7 per-provider history lists + locks; §3.3 + PHASE3_HYPOTHETICAL_PROMOTION)
+```
+
+When the candidate is absent (the master state), the v2 audit produces a placeholder with `is_candidate: True` and all metrics set to 0. The `candidates.md` rollup explains the placeholder status.
+
+### 7.5 The decomposition cost formula
+
+**Constants (module-level, tunable):**
+
+```python
+MICROSECOND_BUDGET_PER_LLM_TURN: int = 50_000  # per a real Anthropic Sonnet call's worth of work
+BRANCH_DISPATCH_OVERHEAD_US: int = 100  # cost per if/else branch decision on a struct field
+ALLOCATION_OVERHEAD_US: int = 50  # cost per SomeDataclass(...) construction
+DEAD_FIELD_COST_PER_FIELD_US: int = 10  # wasted allocation per unused field
+COMPONENTIZATION_INDIRECTION_US: int = 200  # cost of splitting a hot struct into 2
+UNIFICATION_INDIRECTION_US: int = 300  # cost of merging 2 hot structs into 1
+```
+
+**Per-call cost formula:**
+
+```
+per_call_cost_us =
+  (struct_field_count * ALLOCATION_OVERHEAD_US) +
+  (max(fields_accessed_in_hot_path, 1) * BRANCH_DISPATCH_OVERHEAD_US) +
+  (struct_frozen ? 20 : 0)
+```
+
+**Current total cost** (per unit of frequency):
+
+```
+current_total_us = per_call_cost_us * frequency_multiplier
+where frequency_multiplier is:
+  hot = 60 (60 fps)
+  per_turn = 1
+  per_request = 1
+  per_discussion = 1
+  cold = 0.01
+  init = 0.001
+  unknown = 0 (no estimate; mark insufficient_data)
+```
+
+**Componentize savings formula:**
+
+```
+componentize_savings_us = current_total_us * componentize_factor
+where componentize_factor is:
+  if access_pattern == "field_by_field" and struct_field_count > 10 and not struct_frozen:
+    componentize_factor = 0.30
+  elif access_pattern == "hot_cold_split" and hot_field_count <= 2 and struct_field_count > 5:
+    componentize_factor = 0.40
+  elif access_pattern == "whole_struct" or access_pattern == "bulk_batched":
+    componentize_factor = -0.20
+  elif access_pattern == "mixed":
+    componentize_factor = 0
+  else:
+    componentize_factor = -0.10
+```
+
+**Unify savings formula:**
+
+```
+unify_savings_us = current_total_us * unify_factor
+where unify_factor is:
+  if access_pattern == "bulk_batched" and struct_field_count <= 3 and struct_frozen:
+    unify_factor = 0.25
+  elif access_pattern == "whole_struct" and struct_field_count <= 5 and struct_frozen:
+    unify_factor = 0.15
+  elif access_pattern == "field_by_field":
+    unify_factor = -0.30
+  elif access_pattern == "hot_cold_split":
+    unify_factor = -0.10
+  elif access_pattern == "mixed":
+    unify_factor = 0
+  else:
+    unify_factor = 0.05
+```
+
+**`recommended_direction` logic:**
+
+```
+if access_pattern == "field_by_field" and struct_field_count > 10:
+  -> "componentize" (rationale cites the dead-field count)
+elif access_pattern == "hot_cold_split" and hot_field_count <= 2:
+  -> "componentize" (split into hot + cold structs)
+elif access_pattern == "bulk_batched" and struct_field_count <= 3:
+  -> "unify" (small struct; wider bulk path is fine)
+elif access_pattern == "whole_struct" and struct_field_count <= 5:
+  -> "unify" (small struct; less dispatch overhead)
+elif access_pattern == "mixed" or frequency == "unknown":
+  -> "insufficient_data" (recommend runtime profiling per pipeline)
+elif struct_frozen and access_pattern == "whole_struct":
+  -> "hold" (frozen + whole_struct is the ideal shape)
+else:
+  -> "hold"
+```
+
+**The auto-generated rationale string:**
+
+```
+": access_pattern=, frequency=, struct_field_count=, struct_frozen=.
+Recommended: because . Estimated savings: us per ."
+```
+
+The Tier 2 Tech Lead can override the rationale per-aggregate in `scripts/code_path_audit_overrides.toml`.
+
+---
+
+## Output Format
+
+### 8.1 The 13 per-aggregate files (DSL + markdown + tree)
+
+For each aggregate:
+
+**`*.dsl`** — the postfix DSL (flat sections, streamable, tag-scannable). The canonical artifact.
+
+**`*.md`** — human-readable markdown, 10 sections (Header, Pipeline summary, Access pattern, Frequency, Result coverage, Type alias coverage, Cross-audit findings, Decomposition cost, Optimization candidates, Verdict).
+
+**`*.tree`** — prefix tree text view (box-drawing, recursive walker). Compact, scannable.
+
+### 8.2 The 4 top-level rollups
+
+**`summary.md`** — the 30-second view + the 4-mem-dim rollup + the verdict (the "VERIFIED" or "DRIFT DETECTED" line).
+
+**`cross_audit_summary.md`** — the per-aggregate cross-audit hits table (5 columns, one per input audit script) + the top-5 follow-up candidates + the cross-validation verdict.
+
+**`decomposition_matrix.md`** — the ranked list of optimization candidates across all aggregates, sorted by `estimated_savings_us * frequency_multiplier`. The "what should we do next" view.
+
+**`candidates.md`** — the 3 candidate aggregates (forward-compat placeholders). Explains the placeholder status.
+
+### 8.3 The v1 artifacts (preserved for backward compat)
+
+- `docs/reports/code_path_audit//call_graph.dsl` — the v1 full call graph.
+- `docs/reports/code_path_audit//actions/ai_message_lifecycle.{dsl,md,mmd}` — the v1 per-action reports, downgraded to "cross-references to the per-aggregate profiles."
+
+### 8.4 The audit_inputs/ dir (gitignored)
+
+The 6 input JSON files consumed (for reproducibility; same dir name as `tests/artifacts/audit_inputs/` per `test_sandbox.md`).
+
+---
+
+## Verification (10-phase TDD test plan)
+
+Per `conductor/workflow.md` TDD red-first protocol. Each phase has 1 setup commit + N test commits + 1 refactor commit.
+
+| Phase | What | Test count | Audit gate |
+|---|---|---:|---|
+| 1. Data model | `AggregateProfile` + 9 supporting dataclasses + 5 enums (per §7.1 / §7.2) | 10 | n/a |
+| 2. PCG (P1+P2+P3) | The 3 AST passes; producer/consumer edges | 7 | `audit_main_thread_imports.py` |
+| 3. APD | The 5 access patterns + the 25% dominance rule | 6 | n/a |
+| 4. CFE | The 6 entry-point detectors + the override file | 6 | n/a |
+| 5. Decomposition cost | The 4-direction logic + the auto-generated rationale | 6 | n/a |
+| 6. Cross-audit integration | The 6 input JSON contracts + the 3-tier mapping | 7 | `audit_weak_types.py --strict` |
+| 7. v2 DSL | The 14 new tagged words + the round-trip + backward compat | 5 | n/a |
+| 8. Markdown / tree renderers | The 10 markdown sections + the box-drawing tree | 4 | n/a |
+| 9. Integration tests | The synthetic src/ fixture + the real src/ run | 7 | All 4 audit scripts pass `--strict` |
+| 10. Live_gui E2E (opt-in) | The MCP tool via the `live_gui` fixture | 2 | All 4 audit scripts pass `--strict` |
+
+**Total: 60 unit tests + 7 integration tests + 2 live_gui tests = 69 tests.**
+
+### 9.1 The synthetic src/ fixture
+
+`tests/fixtures/synthetic_src/` — 6 files defining 3 aggregates (`Metadata`, `FileItems`, `History`) + 6 functions (2 producers, 4 consumers). The integration tests assert the exact expected profiles.
+
+### 9.2 The 6 input JSON fixture
+
+`tests/fixtures/audit_inputs/` — 6 JSON files matching the contracts in §7.3. The integration tests assert the cross-audit mapping, the `result_coverage` + `type_alias_coverage` formulas, and the tolerance for missing inputs.
+
+### 9.3 Pre-commit verification
+
+```bash
+uv run pytest tests/test_code_path_audit.py -q
+uv run python scripts/audit_exception_handling.py --strict
+uv run python scripts/audit_weak_types.py --strict
+uv run python scripts/audit_main_thread_imports.py
+uv run python scripts/audit_no_models_config_io.py
+```
+
+### 9.4 End-of-track verification
+
+```bash
+uv run python -m src.code_path_audit --all --date 2026-06-22
+uv run python scripts/audit_exception_handling.py --strict
+uv run python scripts/audit_weak_types.py --strict
+uv run python scripts/audit_main_thread_imports.py
+uv run python scripts/audit_no_models_config_io.py
+uv run python scripts/generate_type_registry.py --check
+uv run pytest tests/test_code_path_audit_live_gui.py -v
+```
+
+### 9.5 Manual verification (per `conductor/workflow.md`)
+
+The Tier 2 Tech Lead + user review the `docs/reports/code_path_audit//summary.md` to confirm:
+- The 4-mem-dim rollup is correct
+- The cross-audit verdict is accurate
+- The decomposition_matrix.md rankings match the user's intuition
+- The 3 candidate aggregates are properly marked as placeholders
+
+---
+
+## Out of Scope (per §7.2)
+
+- **No modifications to existing `src/*.py` files** (read-only on the 65 existing files; the v2 audit doesn't change them).
+- **No modifications to the 5 existing audit scripts** (consume their JSON; don't change them).
+- **No runtime profiling.** Deferred to `pipeline_runtime_profiling_20260607` (preserved from the v1 spec's follow-up list).
+- **No new pip dependencies.** The v2 audit uses stdlib only.
+- **No changes to `data_structure_strengthening_20260606` or `data_oriented_error_handling_20260606` styleguides.**
+- **No changes to the v1 `spec.md` and `plan.md`** (they stay as v1).
+- **No MMA worker spawn action** (preserved from v1; the user's "keeping MMA cold" directive from 2026-06-07 still stands).
+- **No new modules in `src/` other than `code_path_audit.py`** (per the file size + naming convention in AGENTS.md).
+- **The 23 lower-impact files** (those with 1-9 weak-type sites each) are deferred.
+- **The 3 candidate aggregates' "real" analysis** is deferred (the v2 audit produces placeholders; the real profiles arrive after `any_type_componentization_20260621` merges).
+- **The v1-style per-action output** is preserved for backward compat but downgraded to "cross-references to the per-aggregate profiles."
+
+---
+
+## Risks (per §7.3)
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| The decomposition-cost heuristic is inaccurate (componentize_savings overestimate or underestimate) | Medium | Medium (false-positive optimization candidates) | Runtime-profiling follow-up recalibrates. The override file adjusts per-aggregate. |
+| The PCG misses dynamic patterns (`eval`, `getattr`, decorator-driven dispatch) | Medium | Low (affected functions marked "unresolved") | The override file lists known passthroughs. Runtime-profiling follow-up catches unresolved. |
+| The 6 input JSON contracts drift (the existing audit scripts evolve without bumping the v2 audit's contract) | Medium | Low (the v2 audit tolerates missing fields; the schema validator catches drift) | The `audit_code_path_audit_coverage.py` meta-audit runs in CI; fails on schema drift. |
+| The candidate aggregates don't merge (`any_type_componentization_20260621` is delayed) | Low | Low (the placeholders are still there; the report still produces) | The v2 audit is forward-compatible. The `is_candidate: bool` flag handles absence. |
+| The v1 .dsl files don't round-trip (the v2 parser is stricter than v1) | Low | Medium (the v1 action reports are broken) | The v2 parser is a **superset** of v1; the v1 action reports still parse. The `test_v2_dsl_backward_compat_v1` test verifies. |
+| The 60+7+2 = 69 tests are too long-running for the per-PR CI gate | Low | Low (AST walks are sub-second; live_gui tests are opt-in) | Unit + integration tests <30s. Live_gui tests opt-in via env var. |
+| The synthetic src/ fixture diverges from real src/ (the test expectations don't generalize) | Medium | Low (the integration tests catch real bugs separately) | The integration test layer runs against real src/ as well as the synthetic fixture. |
+| The v2 audit is run against `master` without `any_type_componentization_20260621` merged, so the candidate placeholders pollute the report | Low | Low (the placeholders are clearly marked) | The `is_candidate: bool` flag is visible in every output. The `summary.md` has a section explaining placeholder status. |
+| The decomposition-matrix savings estimates are misinterpreted as "ground truth" (they're heuristic) | Medium | Low (the user might over-prioritize) | The `summary.md` and `decomposition_matrix.md` headers caveat: "Savings estimates are heuristic (calibrated by `pipeline_runtime_profiling_20260607`); use as ranking input, not as actual savings." |
+| The 4-mem-dim classification is wrong for some aggregates (the file-of-origin heuristic misroutes) | Medium | Low (the misrouted aggregate shows up in the wrong dim's rollup) | The `MemoryDim` is overridable in `scripts/code_path_audit_overrides.toml`. The markdown flags the override. |
+
+---
+
+## Coordination with Pending Tracks
+
+| Track | Status (2026-06-22) | Relationship to v2 |
+|---|---|---|
+| `any_type_componentization_20260621` | NOT on master (merged `f914b2bc`, reverted `751b94d4`); spec + plan in `conductor/tracks/any_type_componentization_20260621/` | The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are sourced from this track's `ANY_TYPE_AUDIT_20260621.md` §3. The v2 audit's `candidates.md` rollup documents the forward-compat. When this track merges, the v2 audit is re-run; the placeholders become real profiles. |
+| `phase2_4_5_call_site_completion_20260621` | NOT on master (same merge+revert history as `any_type_componentization_20260621`); spec + plan + TRACK_COMPLETION report in `conductor/tracks/phase2_4_5_call_site_completion_20260621/` | The `PHASE3_HYPOTHETICAL_PROMOTION.md` (authored by Tier 2; the authoritative Phase 3 cost hypothesis) is the source of the v2's `ProviderHistory` candidate aggregate's expected cost. The v2 audit's `candidates.md` cites this report. |
+| `data_oriented_error_handling_20260606` | SHIPPED (in master) | The v2 audit's `result_coverage` metric is the cross-check. The `error_handling.md` styleguide is the v2 audit's source of truth for the `Result[T]` return types. |
+| `data_structure_strengthening_20260606` | SHIPPED (in master) | The v2 audit's `type_alias_coverage` metric is the cross-check. The `type_aliases.md` styleguide + the 10 TypeAliases are the v2 audit's source of truth. |
+| `result_migration_cruft_removal_20260620` | SHIPPED (in master) | The `RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` confirms the 100% complete state. The v2 audit's `result_coverage` reports on this final state. |
+| `public_api_migration_and_ui_polish_20260615` | SHIPPED (in master) | `ai_client.send_result()` is the canonical public API. The v2 audit's `Metadata` aggregate's `result_coverage` reports on the post-migration state. |
+| `nagent_review_20260608` (v3.1) | ACTIVE (in master; v3.1 is the latest at `7e61dd7d`) | The v2 audit references Candidates 27-30 (Markdown + custom DSL lock-in, per-turn ground-truth hook, dataset-curation track, cache TTL GUI hardening). The v2's custom postfix DSL is a direct application of Candidate 27. |
+| `exception_handling_audit_20260616` | SHIPPED (in master) | The 211-site audit (`EXCEPTION_HANDLING_AUDIT_20260616.md`) is the precedent for the v2 audit's structure (audit -> migration plan -> sub-tracks). |
+| `tier2_leak_prevention_20260620` | SHIPPED (in master) | The v2 audit's Tier 2 execution follows the `tier2_leak_prevention` conventions (no `git push*`, no `git checkout*`, etc.). |
+
+**This audit has no blockers** and **no conflicts**. It can ship independently of the 5 active planned tracks. It enables future refactors (the 3 high-priority `componentize` candidates).
+
+---
+
+## Follow-up (per §7.4)
+
+| # | Track | When | Purpose |
+|---|---|---|---|
+| 1 | `pipeline_runtime_profiling_20260607` | After v2 ships | Calibrate the v2's heuristic cost constants against real measurements. Uses `src/performance_monitor.py`. The v2 spec's `MICROSECOND_BUDGET_PER_LLM_TURN`, `BRANCH_DISPATCH_OVERHEAD_US`, `ALLOCATION_OVERHEAD_US`, `DEAD_FIELD_COST_PER_FIELD_US`, `COMPONENTIZATION_INDIRECTION_US`, `UNIFICATION_INDIRECTION_US` are recalibrated by this track. |
+| 2 | `data_pipelines_inventory_` | After v2 ships | Per-pipeline (vs per-aggregate) reports for the top 5 pipelines. Complements the v2 with the pipeline view. The v2's `decomposition_matrix.md` is the input. |
+| 3 | `code_path_audit_in_ci_` | After v2 ships | Run v2 in CI on every PR; fail on new untyped sites OR a high-priority decomposition-matrix regression. The "audit as CI gate" pattern. |
+| 4 | `code_path_audit_data_oriented_refactor_` | After v2 ships | Implement the 3 high-priority `componentize` candidates (FileItems, History, Metadata) per the v2 audit's `decomposition_matrix.md`. |
+| 5 | `code_path_audit_v2_5_followup_` | After `any_type_componentization_20260621` merges | Re-run v2; the 3 placeholders become real profiles; the decomposition-matrix gets 3 new rows. |
+
+---
+
+## See Also
+
+### Styleguides
+
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference (v2's decomposition-cost heuristic is informed by §2's 8 defaults)
+- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention (v2's public API returns `Result[T]` per the hard rule)
+- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases + 1 NamedTuple (v2's 10 in-scope aggregates)
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims (v2's `MemoryDim` classifier)
+- `conductor/code_styleguides/feature_flags.md` — "delete to turn off" pattern (v2's `audit_code_path_audit_coverage.py` is a feature flag)
+- `conductor/code_styleguides/cache_friendly_context.md` — stable-to-volatile context ordering (v2's per-aggregate reports are a downstream consumer of the cache state)
+- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge harvest pattern (v2's per-aggregate profiles are NOT a knowledge artifact; they're curation)
+- `conductor/code_styleguides/rag_integration_discipline.md` — the conservative-RAG rule (v2's `rag` aggregate classification)
+- `conductor/code_styleguides/config_state_owner.md` — config I/O ownership (v2's `audit_no_models_config_io.json` is the cross-check)
+
+### v1 spec + plan (preserved)
+
+- `conductor/tracks/code_path_audit_20260607/spec.md` — the v1 spec (approved 2026-06-07; revised 2026-06-08 with post-4-tracks timing + 5-source framing)
+- `conductor/tracks/code_path_audit_20260607/plan.md` — the v1 plan (preserved, never executed)
+
+### Reports + ideation
+
+- `docs/reports/computational_shapes_ssdl_digest_20260608.md` — the SSDL digest that informed the v1 spec's 5-source lens (v2 preserves the lens)
+- `docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` — the 100%-complete result migration campaign
+- `docs/reports/ANY_TYPE_AUDIT_20260621.md` — the 89-site audit (48 promoted + 41 deferred) that informed `any_type_componentization_20260621` (v2's 3 candidate aggregates)
+- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` — the Tier 2's authoritative cost analysis of the 41 deferred Phase 3 sites
+- `docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md` — the 211-site audit (precedent for v2's structure)
+- `docs/reports/PLANNING_DIGEST_20260606.md` — the planning digest for the 5 foundational tracks
+- `docs/ideation/ed_chunk_data_structures_20260523.md` — the chunk-based-data-structure ideation (referenced in v1 spec; v2's `bulk_batched` access pattern aligns)
+
+### v3.1 nagent review (the latest framing)
+
+- `conductor/tracks/nagent_review_20260608/nagent_review_v3_1_20260620.md` — the v3.1 thickened main review
+- `conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_1_20260620.md` — the v3.1 bridge + the 4 new candidates (27-30)
+- `conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md` — the v3 main review (preserved per user directive 2026-06-20)
+
+### Source files (the v2 audit consumes)
+
+- `src/type_aliases.py` — the 10 TypeAliases + 1 NamedTuple
+- `src/result_types.py` — `Result[T]`, `ErrorInfo`, nil-sentinels
+- `src/mcp_client.py:934-992` — `derive_code_path` (the v2's PCG is the multi-symbol superset)
+- `src/performance_monitor.py` — runtime profiling (used by `pipeline_runtime_profiling_20260607` follow-up)
+- `src/vendor_capabilities.py` — the canonical `frozen=True` dataclass + module-level registry pattern (template for the v2 audit's per-aggregate profile structure)
+
+### Audit scripts (the v2 audit consumes)
+
+- `scripts/audit_main_thread_imports.py` — import-graph CI gate
+- `scripts/audit_weak_types.py` — weak-types CI gate
+- `scripts/audit_exception_handling.py` — exception-handling CI gate
+- `scripts/audit_optional_in_3_files.py` — `Optional[T]` ban CI gate (v2 extends this with 1 line)
+- `scripts/audit_no_models_config_io.py` — config-I/O ownership CI gate
+- `scripts/generate_type_registry.py` — type-registry generator
+
+### Workflow + process
+
+- `conductor/workflow.md` — TDD protocol + per-task commits + git notes + phase checkpoints + skip-marker policy
+- `conductor/edit_workflow.md` — the edit-tool contract (the v2 audit uses `manual-slop_*` MCP tools per the project convention)
+- `AGENTS.md` — canonical operating rules (the "no day estimates" rule, the "small files are propaganda" stance, the hard bans on `git restore` / `git checkout --`)
+- `conductor/product-guidelines.md` — product-level conventions (1-space indent, 1 commit per task, type hints, etc.)
+- `conductor/tech-stack.md` — tech stack constraints (Python 3.11+, imgui-bundle, FastAPI, etc.)
+
+### Sibling tracks (the v2's relationship)
+
+- `conductor/tracks/any_type_componentization_20260621/` — the 3 candidate aggregates' source
+- `conductor/tracks/phase2_4_5_call_site_completion_20260621/` — the `PHASE3_HYPOTHETICAL_PROMOTION` source
+- `conductor/tracks/data_oriented_error_handling_20260606/` — the `Result[T]` source
+- `conductor/tracks/data_structure_strengthening_20260606/` — the TypeAlias source
+- `conductor/tracks/result_migration_cruft_removal_20260620/` — the 100% complete result migration
+
+---
+
+**End of spec_v2.md.**