# Code Path & Data Pipeline Audit Styleguide

> **Status:** Active convention as of 2026-06-22. Established by the `code_path_audit_20260607` v2 track.

This styleguide codifies the contract for `src/code_path_audit.py` v2 and the 6 input audit scripts it consumes. Companion to `data_oriented_design.md`, `error_handling.md`, `type_aliases.md`, and `agent_memory_dimensions.md`.

## The 5 Conventions

### 1. Per-aggregate profile structure

Every `AggregateProfile` (the central artifact) has 15 fields (14 required + 1 default): `name`, `aggregate_kind`, `memory_dim`, `producers`, `consumers`, `access_pattern`, `access_pattern_evidence`, `frequency`, `frequency_evidence`, `result_coverage`, `type_alias_coverage`, `cross_audit_findings`, `decomposition_cost`, `optimization_candidates`, `is_candidate` (plus `mermaid` and `markdown` with defaults).

The `is_candidate: bool` flag distinguishes the 3 placeholder aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) from the 10 real aggregates.

The custom postfix `.dsl` output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, tag-scannable).

The 14 new v2 DSL words: `kind`, `mem-dim`, `fn-ref`, `access-pattern`, `ap-evidence`, `frequency`, `freq-evidence`, `result-coverage`, `type-alias-coverage`, `cross-audit-finding`, `cross-audit-findings`, `decomp-cost`, `opt-candidate`, `is-candidate`. Arity table in `src/code_path_audit.py:DSL_WORD_ARITY_V2`.

### 2. The 4 decomposition directions

For each aggregate, the audit computes a `DecompositionCost` (8 fields: `current_cost_estimate`, `componentize_savings`, `unify_savings`, `recommended_direction`, `recommended_rationale`, `batch_size`, `struct_field_count`, `struct_frozen`). The `recommended_direction` is one of:

- **`componentize`** - split into smaller dataclasses; access pattern is `field_by_field` with many dead fields, OR `hot_cold_split` with small hot fields.
- **`unify`** - combine into wider fat structs; access pattern is `bulk_batched` with a small struct, OR `whole_struct` with a small struct.
- **`hold`** - current shape is correct; default for `frozen + whole_struct` (the ideal shape).
- **`insufficient_data`** - access pattern is `mixed` or frequency is `unknown`; needs runtime profiling per pipeline.

The 4-direction logic is in `src/code_path_audit.py:recommended_direction()`. The savings estimates are heuristic (calibrated by `pipeline_runtime_profiling_20260607`); use as ranking input, not as actual savings.

### 3. The override file format

`scripts/code_path_audit_overrides.toml` (TOML) lets the user adjust per-aggregate. Sections:

```toml
[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"
```

The file is optional. Missing file = empty overrides (the canonical mappings + heuristics apply).

### 4. The 4 mem dim classification rules

`MemoryDim` is a 7-value Literal: `curation`, `discussion`, `rag`, `knowledge`, `config`, `control`, `unknown`. The classification precedence (per `src/code_path_audit.py:classify_memory_dim()`): overrides > canonical mappings > file-of-origin heuristic > `unknown`.

- **`curation`**: per-file structural (FileItem, FileItems, ContextPreset).
- **`discussion`**: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
- **`rag`**: opt-in semantic (RAGEngine state, indexed chunks).
- **`knowledge`**: per-project durable (knowledge category files, digest).
- **`config`**: project / global config (manual_slop.toml, presets.toml, personas.toml).
- **`control`**: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
- **`unknown`**: the audit can't classify; flagged for human review.

### 5. The cross-audit integration contract

The v2 audit consumes JSON from 6 input sources (in `tests/artifacts/audit_inputs/`):

| Input | Producer | Shape |
|---|---|---|
| `audit_weak_types.json` | `scripts/audit_weak_types.py --json` | `{"findings": [{"file", "line", "type_string", "category"}]}` |
| `audit_exception_handling.json` | `scripts/audit_exception_handling.py --json` | `{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}` |
| `audit_optional_in_3_files.json` | `scripts/audit_optional_in_3_files.py --json` | `{"findings": [{"file", "line", "return_type", "function"}]}` |
| `audit_no_models_config_io.json` | `scripts/audit_no_models_config_io.py --json` | `{"findings": [{"file", "line", "function", "config_path"}]}` |
| `audit_main_thread_imports.json` | `scripts/audit_main_thread_imports.py --json` | `{"findings": [{"file", "line", "imported_module", "thread"}]}` |
| `type_registry.json` | `scripts/generate_type_registry.py --json` | `{"types": {"": {"file", "fields": [{"name", "type", "optional"}]}}}` |

**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` and the markdown notes the missing input. The audit does NOT fail on missing inputs.

The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets a `(aggregate, confidence, mapping_tier)` triple.

## See Also

- `conductor/tracks/code_path_audit_20260607/spec_v2.md` - the canonical spec
- `conductor/tracks/code_path_audit_20260607/plan_v2.md` - the canonical plan
- `conductor/code_styleguides/data_oriented_design.md` - the canonical DOD reference
- `conductor/code_styleguides/error_handling.md` - the `Result[T]` convention
- `conductor/code_styleguides/type_aliases.md` - the 10 TypeAliases + 1 NamedTuple
- `conductor/code_styleguides/agent_memory_dimensions.md` - the 4 mem dims
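
As an illustration of the tolerance rule in convention 5, the following is a minimal sketch of how a findings-shaped input could be loaded without ever failing the audit. It is not the code in `src/code_path_audit.py`: the helper name `load_findings` and the wording of the notes are hypothetical, and it covers only the five inputs with a top-level `findings` list (`type_registry.json` uses a `types` mapping and would need its own check).

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Optional


def load_findings(path: Path, key: str = "findings") -> tuple[tuple[Any, ...], Optional[str]]:
    """Hypothetical helper: return (findings, note).

    A missing or malformed input yields ((), note) instead of raising,
    so the caller can set cross_audit_findings to () and surface the
    note in the markdown output.
    """
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return (), f"missing input: {path.name}"
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        return (), f"unreadable input: {path.name} ({type(exc).__name__})"

    # Valid JSON can still have the wrong shape; treat that as malformed.
    if not isinstance(payload, dict) or not isinstance(payload.get(key), list):
        return (), f"malformed input: {path.name} (expected a '{key}' list)"

    return tuple(payload[key]), None
```

Returning the note alongside the empty tuple keeps the "continue, but record what was missing" behaviour in one place; a caller would collect the non-`None` notes and emit them in the report.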