# Code Path & Data Pipeline Audit Styleguide

> **Status:** Active convention as of 2026-06-22. Established by the `code_path_audit_20260607` v2 track.

This styleguide codifies the contract for `src/code_path_audit.py` v2 and the 6 input scripts whose JSON it consumes (5 audits plus the type-registry generator). Companion to `data_oriented_design.md`, `error_handling.md`, `type_aliases.md`, and `agent_memory_dimensions.md`.

## The 5 Conventions

### 1. Per-aggregate profile structure

Every `AggregateProfile` (the central artifact) has 15 core fields (14 required, 1 with a default): `name`, `aggregate_kind`, `memory_dim`, `producers`, `consumers`, `access_pattern`, `access_pattern_evidence`, `frequency`, `frequency_evidence`, `result_coverage`, `type_alias_coverage`, `cross_audit_findings`, `decomposition_cost`, `optimization_candidates`, `is_candidate`. Two further fields, `mermaid` and `markdown`, also have defaults. The `is_candidate: bool` flag separates the 3 placeholder aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) from the 10 real aggregates.

The custom postfix `.dsl` output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, and scannable by tag). v2 adds 14 DSL words: `kind`, `mem-dim`, `fn-ref`, `access-pattern`, `ap-evidence`, `frequency`, `freq-evidence`, `result-coverage`, `type-alias-coverage`, `cross-audit-finding`, `cross-audit-findings`, `decomp-cost`, `opt-candidate`, `is-candidate`. The arity table lives in `src/code_path_audit.py:DSL_WORD_ARITY_V2`.
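To make "postfix with an arity table" concrete, here is a minimal sketch of a reader that pushes operands onto a stack and lets each known word pop a fixed number of them. The arity values, the `ARITY` dict, the `read_postfix` name, and the whitespace-separated token syntax are all hypothetical; the real arities live in `DSL_WORD_ARITY_V2` and are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical arities for a few v2 words; the real values are in
# src/code_path_audit.py:DSL_WORD_ARITY_V2 and may differ.
ARITY: Dict[str, int] = {
    "kind": 1,
    "mem-dim": 1,
    "frequency": 1,
    "is-candidate": 1,
}


def read_postfix(text: str, arity: Dict[str, int]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Parse postfix tokens: operands are pushed; each known word pops its arity."""
    stack: List[str] = []
    words: List[Tuple[str, Tuple[str, ...]]] = []
    for token in text.split():
        if token in arity:
            n = arity[token]
            if len(stack) < n:
                raise ValueError(f"word {token!r} needs {n} operands, stack has {len(stack)}")
            # Take the last n operands, keeping the order they were written in.
            args = tuple(stack[len(stack) - n:])
            del stack[len(stack) - n:]
            words.append((token, args))
        else:
            stack.append(token)
    if stack:
        # A self-contained record leaves no operands behind.
        raise ValueError(f"dangling operands: {stack}")
    return words
```

For example, `read_postfix("dataclass kind discussion mem-dim cold frequency false is-candidate", ARITY)` yields one `(word, args)` pair per word. Because every word consumes a known number of preceding operands, a scanner can find a tag and its arguments without parsing the rest of the file, which is one reason a flat postfix record stays streamable.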

### 2. The 4 decomposition directions

For each aggregate, the audit computes a `DecompositionCost` (8 fields: `current_cost_estimate`, `componentize_savings`, `unify_savings`, `recommended_direction`, `recommended_rationale`, `batch_size`, `struct_field_count`, `struct_frozen`). The `recommended_direction` is one of:

- **`componentize`** - split into smaller dataclasses. Chosen when the access pattern is `field_by_field` with many dead fields, or `hot_cold_split` with a small set of hot fields.
- **`unify`** - merge into wider structs. Chosen when a small struct is accessed `bulk_batched` or `whole_struct`.
- **`hold`** - the current shape is already right. This is the default for `frozen` + `whole_struct` (the ideal shape).
- **`insufficient_data`** - the access pattern is `mixed` or the frequency is `unknown`; per-pipeline runtime profiling is needed before a direction can be chosen.

The 4-direction logic lives in `src/code_path_audit.py:recommended_direction()`. The savings estimates are heuristic (calibrated by `pipeline_runtime_profiling_20260607`): treat them as a ranking signal, not as measured savings.
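The rules above can be sketched as a single decision function. This is a hypothetical re-implementation, not the code in `recommended_direction()`: the `ShapeFacts` input type, the numeric thresholds standing in for "many" and "small", the rule order, and the fallback to `hold` when no rule fires are all assumptions. Ordering matters because `whole_struct` appears under both `unify` and `hold`; here the frozen ideal shape is checked first.

```python
from dataclasses import dataclass
from typing import Literal

AccessPattern = Literal["field_by_field", "hot_cold_split", "bulk_batched", "whole_struct", "mixed"]
Direction = Literal["componentize", "unify", "hold", "insufficient_data"]

# Hypothetical thresholds; the real heuristics may use different cut-offs.
SMALL_STRUCT_FIELDS = 4
MANY_DEAD_FIELDS = 3
SMALL_HOT_FIELDS = 2


@dataclass(frozen=True)
class ShapeFacts:
    """Hypothetical inputs; the real function's signature is not documented here."""
    access_pattern: AccessPattern
    frequency: str            # e.g. "cold" or "unknown"
    struct_field_count: int
    struct_frozen: bool
    dead_field_count: int = 0
    hot_field_count: int = 0


def sketch_recommended_direction(f: ShapeFacts) -> Direction:
    # Assumed order: data sufficiency first, then the ideal frozen shape.
    if f.access_pattern == "mixed" or f.frequency == "unknown":
        return "insufficient_data"
    if f.struct_frozen and f.access_pattern == "whole_struct":
        return "hold"
    if f.access_pattern == "field_by_field" and f.dead_field_count >= MANY_DEAD_FIELDS:
        return "componentize"
    if f.access_pattern == "hot_cold_split" and f.hot_field_count <= SMALL_HOT_FIELDS:
        return "componentize"
    if f.access_pattern in ("bulk_batched", "whole_struct") and f.struct_field_count <= SMALL_STRUCT_FIELDS:
        return "unify"
    # Assumption: with no rule firing, keep the current shape.
    return "hold"
```

For example, a non-frozen 3-field struct read `whole_struct` at `cold` frequency maps to `unify`, while the same struct marked frozen maps to `hold`.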

### 3. The override file format

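`scripts/code_path_audit_overrides.toml` lets the user override individual classifications. `[memory_dim]` maps an aggregate name to a `MemoryDim` value; `[frequency]` maps a key (a dotted function path in the example) to a frequency label: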

```toml
[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"
```

The file is optional: a missing file means empty overrides, so the canonical mappings and heuristics apply unchanged.

### 4. The 4 mem dim classification rules

`MemoryDim` is a 7-value Literal: `curation`, `discussion`, `rag`, `knowledge`, `config`, `control`, `unknown`. The classification precedence (per `src/code_path_audit.py:classify_memory_dim()`): overrides > canonical mappings > file-of-origin heuristic > `unknown`.

- **`curation`**: per-file structural (FileItem, FileItems, ContextPreset).
- **`discussion`**: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
- **`rag`**: opt-in semantic (RAGEngine state, indexed chunks).
- **`knowledge`**: per-project durable (knowledge category files, digest).
- **`config`**: project / global config (manual_slop.toml, presets.toml, personas.toml).
- **`control`**: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
- **`unknown`**: the audit can't classify; flagged for human review.
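The precedence chain can be sketched as four checks in order. The `CANONICAL` table below is only the subset of examples named in this section (generic `Result[T]` is left out), and the `FILE_HINTS` stems and the `classify_memory_dim_sketch` name are hypothetical; the real mappings and heuristic live in `classify_memory_dim()`.

```python
from pathlib import PurePath
from typing import Dict, Literal, Optional

MemoryDim = Literal["curation", "discussion", "rag", "knowledge", "config", "control", "unknown"]

# Subset of the examples above; not the full canonical table.
CANONICAL: Dict[str, MemoryDim] = {
    "FileItem": "curation",
    "FileItems": "curation",
    "ContextPreset": "curation",
    "Metadata": "discussion",
    "CommsLog": "discussion",
    "History": "discussion",
    "ChatMessage": "discussion",
    "ErrorInfo": "control",
    "WebSocketMessage": "control",
    "ToolSpec": "control",
    "NormalizedResponse": "control",
}

# Hypothetical file-of-origin heuristic, keyed by exact module stem
# (exact match avoids substring accidents such as "rag" inside "storage").
FILE_HINTS: Dict[str, MemoryDim] = {
    "rag_engine": "rag",
    "knowledge": "knowledge",
    "config": "config",
}


def classify_memory_dim_sketch(
    name: str, origin_file: Optional[str], overrides: Dict[str, MemoryDim]
) -> MemoryDim:
    if name in overrides:            # 1. user overrides win
        return overrides[name]
    if name in CANONICAL:            # 2. canonical mappings
        return CANONICAL[name]
    if origin_file is not None:      # 3. file-of-origin heuristic
        hinted = FILE_HINTS.get(PurePath(origin_file).stem)
        if hinted is not None:
            return hinted
    return "unknown"                 # 4. flag for human review
```

An override for `Metadata` to `curation` therefore beats its canonical `discussion` mapping, matching the precedence stated above.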

### 5. The cross-audit integration contract

The v2 audit consumes JSON from 6 input sources (in `tests/artifacts/audit_inputs/`):

| Input | Producer | Shape |
|---|---|---|
| `audit_weak_types.json` | `scripts/audit_weak_types.py --json` | `{"findings": [{"file", "line", "type_string", "category"}]}` |
| `audit_exception_handling.json` | `scripts/audit_exception_handling.py --json` | `{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}` |
| `audit_optional_in_3_files.json` | `scripts/audit_optional_in_3_files.py --json` | `{"findings": [{"file", "line", "return_type", "function"}]}` |
| `audit_no_models_config_io.json` | `scripts/audit_no_models_config_io.py --json` | `{"findings": [{"file", "line", "function", "config_path"}]}` |
| `audit_main_thread_imports.json` | `scripts/audit_main_thread_imports.py --json` | `{"findings": [{"file", "line", "imported_module", "thread"}]}` |
| `type_registry.json` | `scripts/generate_type_registry.py --json` | `{"types": {"<aggregate>": {"file", "fields": [{"name", "type", "optional"}]}}}` |

**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` and the markdown notes the missing input. The audit does NOT fail on missing inputs.
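A minimal sketch of the tolerance rule for the five `findings`-shaped inputs (`type_registry.json` uses a `types` key instead). The `load_findings` name and the note strings are hypothetical; the point is that every failure path returns `()` plus a note for the markdown rather than raising.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def load_findings(path: Path) -> Tuple[Tuple[Dict[str, Any], ...], str]:
    """Return (findings, note); missing or malformed input yields ((), note)."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return (), f"missing input: {path.name}"
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        return (), f"unreadable input: {path.name} ({exc.__class__.__name__})"
    # Valid JSON can still have the wrong shape; check it instead of trusting it.
    findings = payload.get("findings") if isinstance(payload, dict) else None
    if not isinstance(findings, list) or not all(isinstance(f, dict) for f in findings):
        return (), f"malformed input: {path.name}"
    return tuple(findings), ""
```

Returning an immutable tuple matches the `()` empty value that `cross_audit_findings` takes when an input is missing.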

The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets a `(aggregate, confidence, mapping_tier)` triple.
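The 3-tier mapping can be sketched as a fall-through lookup. Everything named here is hypothetical: the `MappedFinding` tuple, the confidence values, the lookup dicts, and especially the `field` key on a finding (none of the input schemas above carries one; a real tier 2 would have to derive the field some other way). The `field_index` helper does follow the `type_registry.json` shape from the table.

```python
from typing import Any, Dict, NamedTuple, Optional


class MappedFinding(NamedTuple):
    aggregate: str
    confidence: float  # hypothetical 0..1 scale
    mapping_tier: int


TIER_CONFIDENCE: Dict[int, float] = {1: 0.9, 2: 0.6, 3: 0.3}  # hypothetical values


def field_index(type_registry: Dict[str, Any]) -> Dict[str, str]:
    """Map field name -> owning aggregate from the type_registry.json shape (last writer wins)."""
    index: Dict[str, str] = {}
    for aggregate, info in type_registry.get("types", {}).items():
        for field in info.get("fields", []):
            index[field["name"]] = aggregate
    return index


def map_finding(
    finding: Dict[str, Any],
    by_function: Dict[str, str],
    by_field: Dict[str, str],
    by_file: Dict[str, str],
) -> Optional[MappedFinding]:
    fn = finding.get("function")
    if fn in by_function:                     # tier 1: function lookup
        return MappedFinding(by_function[fn], TIER_CONFIDENCE[1], 1)
    field = finding.get("field")              # hypothetical key
    if field in by_field:                     # tier 2: field lookup via type registry
        return MappedFinding(by_field[field], TIER_CONFIDENCE[2], 2)
    file = finding.get("file")
    if file in by_file:                       # tier 3: file-of-origin fallback
        return MappedFinding(by_file[file], TIER_CONFIDENCE[3], 3)
    return None                               # assumption: unmapped findings yield None
```

Checking tiers strictly in order means a finding that resolves by function never falls through to the weaker file-of-origin guess, so the recorded `mapping_tier` always reflects the strongest evidence available.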

## See Also

- `conductor/tracks/code_path_audit_20260607/spec_v2.md` - the canonical spec
- `conductor/tracks/code_path_audit_20260607/plan_v2.md` - the canonical plan
- `conductor/code_styleguides/data_oriented_design.md` - the canonical DOD reference
- `conductor/code_styleguides/error_handling.md` - the `Result[T]` convention
- `conductor/code_styleguides/type_aliases.md` - the 10 TypeAliases + 1 NamedTuple
- `conductor/code_styleguides/agent_memory_dimensions.md` - the 4 mem dims