Replaces the Phase 0 stub. Documents the per-aggregate profile structure, the 4 decomposition directions, the override file format, the 4 mem dim classification rules, and the 6-input cross-audit integration contract.
Code Path & Data Pipeline Audit Styleguide
Status: Active convention as of 2026-06-22. Established by the
code_path_audit_20260607v2 track.
This styleguide codifies the contract for src/code_path_audit.py v2 and the 6 input audit scripts it consumes. Companion to data_oriented_design.md, error_handling.md, type_aliases.md, and agent_memory_dimensions.md.
The 5 Conventions
1. Per-aggregate profile structure
Every AggregateProfile (the central artifact) has 15 fields (14 required + 1 default): name, aggregate_kind, memory_dim, producers, consumers, access_pattern, access_pattern_evidence, frequency, frequency_evidence, result_coverage, type_alias_coverage, cross_audit_findings, decomposition_cost, optimization_candidates, is_candidate (plus mermaid and markdown with defaults). The is_candidate: bool flag distinguishes the 3 placeholder aggregates (ToolSpec, ChatMessage, ProviderHistory) from the 10 real aggregates.
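As a rough illustration of the shape described above, the profile could be modelled as a dataclass. The field names come from this styleguide; every type annotation, the `frozen=True` choice, the default values, and the reading that `is_candidate` is the single defaulted field among the 15 are assumptions, not the definitions in src/code_path_audit.py.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)  # frozen is an assumption, not confirmed by the styleguide
class AggregateProfile:
    # The 14 fields read here as required, in the order the styleguide lists them.
    # All annotations below are illustrative guesses.
    name: str
    aggregate_kind: str
    memory_dim: str
    producers: tuple[str, ...]
    consumers: tuple[str, ...]
    access_pattern: str
    access_pattern_evidence: tuple[str, ...]
    frequency: str
    frequency_evidence: tuple[str, ...]
    result_coverage: float
    type_alias_coverage: float
    cross_audit_findings: tuple
    decomposition_cost: object
    optimization_candidates: tuple[str, ...]
    # Assumed to be the one defaulted field among the 15.
    is_candidate: bool = False
    # Rendered outputs, listed separately in the styleguide as having defaults.
    mermaid: str = ""
    markdown: str = ""


def required_field_names() -> list[str]:
    """Return the names of fields that have no default value."""
    return [
        f.name
        for f in fields(AggregateProfile)
        if f.default is MISSING and f.default_factory is MISSING
    ]
```

Calling `required_field_names()` on this sketch returns the 14 names from `name` through `optimization_candidates`, which is a quick way to check a real definition against the counts stated above.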
The custom postfix .dsl output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, tag-scannable). The 14 new v2 DSL words: kind, mem-dim, fn-ref, access-pattern, ap-evidence, frequency, freq-evidence, result-coverage, type-alias-coverage, cross-audit-finding, cross-audit-findings, decomp-cost, opt-candidate, is-candidate. Arity table in src/code_path_audit.py:DSL_WORD_ARITY_V2.
2. The 4 decomposition directions
For each aggregate, the audit computes a DecompositionCost (8 fields: current_cost_estimate, componentize_savings, unify_savings, recommended_direction, recommended_rationale, batch_size, struct_field_count, struct_frozen). The recommended_direction is one of:
- componentize - split into smaller dataclasses; access pattern is field_by_field with many dead fields, OR hot_cold_split with small hot fields.
- unify - combine into wider fat structs; access pattern is bulk_batched with a small struct, OR whole_struct with a small struct.
- hold - current shape is correct; default for frozen + whole_struct (the ideal shape).
- insufficient_data - access pattern is mixed or frequency is unknown; needs runtime profiling per pipeline.
The 4-direction logic is in src/code_path_audit.py:recommended_direction(). The savings estimates are heuristic (calibrated by pipeline_runtime_profiling_20260607); use them as a ranking input, not as measured savings.
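The four directions can be sketched as a small decision function. This is not the code in src/code_path_audit.py: the function name, the numeric thresholds for "small" and "many", the `dead_field_count` and `hot_field_count` inputs, and the order in which the rules are checked are all assumptions made for illustration.

```python
# Hypothetical thresholds; the styleguide does not define "small" or "many".
SMALL_STRUCT_MAX_FIELDS = 4
SMALL_HOT_MAX_FIELDS = 2
MANY_DEAD_FIELDS = 3


def recommend_direction_sketch(
    access_pattern: str,
    frequency: str,
    struct_field_count: int,
    struct_frozen: bool,
    dead_field_count: int = 0,  # hypothetical input
    hot_field_count: int = 0,   # hypothetical input
) -> str:
    # Not enough signal: defer to runtime profiling.
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    # Frozen struct read as a whole is treated as the ideal shape.
    # Checking this before the unify rule is an assumed ordering.
    if struct_frozen and access_pattern == "whole_struct":
        return "hold"
    if access_pattern == "field_by_field" and dead_field_count >= MANY_DEAD_FIELDS:
        return "componentize"
    if access_pattern == "hot_cold_split" and 0 < hot_field_count <= SMALL_HOT_MAX_FIELDS:
        return "componentize"
    small = struct_field_count <= SMALL_STRUCT_MAX_FIELDS
    if access_pattern in ("bulk_batched", "whole_struct") and small:
        return "unify"
    # Nothing argues for a change, so keep the current shape.
    return "hold"
```

Because the output is only a ranking input, a sketch like this errs toward `hold` when no rule fires rather than guessing a direction.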
3. The override file format
scripts/code_path_audit_overrides.toml (TOML) lets the user adjust per-aggregate classifications. Sections:
```toml
[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"
```
The file is optional. Missing file = empty overrides (the canonical mappings + heuristics apply).
4. The 4 mem dim classification rules
MemoryDim is a 7-value Literal: curation, discussion, rag, knowledge, config, control, unknown. The classification precedence (per src/code_path_audit.py:classify_memory_dim()): overrides > canonical mappings > file-of-origin heuristic > unknown.
- curation: per-file structural (FileItem, FileItems, ContextPreset).
- discussion: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
- rag: opt-in semantic (RAGEngine state, indexed chunks).
- knowledge: per-project durable (knowledge category files, digest).
- config: project / global config (manual_slop.toml, presets.toml, personas.toml).
- control: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
- unknown: the audit can't classify; flagged for human review.
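The precedence chain (overrides > canonical mappings > file-of-origin heuristic > unknown) can be sketched as below. The canonical table is copied from the examples in this section; the function name, its signature, and the path hints used for the heuristic tier are hypothetical and are not taken from src/code_path_audit.py.

```python
# Canonical mappings, taken from the aggregate names listed in this section.
CANONICAL = {
    "FileItem": "curation", "FileItems": "curation", "ContextPreset": "curation",
    "Metadata": "discussion", "CommsLog": "discussion",
    "History": "discussion", "ChatMessage": "discussion",
    "ErrorInfo": "control", "WebSocketMessage": "control",
    "ToolSpec": "control", "NormalizedResponse": "control",
}

# Hypothetical file-of-origin hints: (substring of the path, memory dim).
PATH_HINTS = (
    ("rag", "rag"),
    ("knowledge", "knowledge"),
    ("preset", "config"),
)


def classify_memory_dim_sketch(name: str, file_of_origin: str, overrides: dict) -> str:
    # Tier 1: user overrides always win.
    if name in overrides:
        return overrides[name]
    # Tier 2: canonical mappings.
    if name in CANONICAL:
        return CANONICAL[name]
    # Tier 3: heuristic on the defining file's path.
    for hint, dim in PATH_HINTS:
        if hint in file_of_origin:
            return dim
    # Tier 4: flag for human review.
    return "unknown"
```

With the override from the example file, `Metadata` classifies as `curation`; without it, the canonical mapping yields `discussion`.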
5. The cross-audit integration contract
The v2 audit consumes JSON from 6 input sources (in tests/artifacts/audit_inputs/):
| Input | Producer | Shape |
|---|---|---|
| audit_weak_types.json | scripts/audit_weak_types.py --json | {"findings": [{"file", "line", "type_string", "category"}]} |
| audit_exception_handling.json | scripts/audit_exception_handling.py --json | {"findings": [{"file", "line", "category", "function", "class", "body_summary"}]} |
| audit_optional_in_3_files.json | scripts/audit_optional_in_3_files.py --json | {"findings": [{"file", "line", "return_type", "function"}]} |
| audit_no_models_config_io.json | scripts/audit_no_models_config_io.py --json | {"findings": [{"file", "line", "function", "config_path"}]} |
| audit_main_thread_imports.json | scripts/audit_main_thread_imports.py --json | {"findings": [{"file", "line", "imported_module", "thread"}]} |
| type_registry.json | scripts/generate_type_registry.py --json | {"types": {"<aggregate>": {"file", "fields": [{"name", "type", "optional"}]}}} |
Tolerance: if any input is missing or malformed, the audit continues with the corresponding cross_audit_findings field set to () and the markdown notes the missing input. The audit does NOT fail on missing inputs.
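The tolerance rule could look roughly like this for the five inputs that share the `{"findings": [...]}` shape. The function names and the wording of the note are assumptions; only the behaviour (empty tuple plus a note, never an exception) is taken from the paragraph above.

```python
import json
from pathlib import Path


def parse_findings(text: str) -> tuple:
    """Parse a findings document; raise ValueError if the shape is wrong."""
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        raise ValueError("expected an object with a 'findings' list")
    return tuple(data["findings"])


def load_findings(path: str) -> tuple:
    """Return (findings, note); note is None when the input loaded cleanly."""
    try:
        text = Path(path).read_text(encoding="utf-8")
        return parse_findings(text), None
    except (OSError, ValueError):
        # Missing, unreadable, or malformed input: continue with no findings
        # and hand back a note for the markdown report.
        return (), f"input missing or malformed: {path}"
```

Returning the note alongside the findings keeps the loader free of side effects, so the caller decides where the notice appears in the rendered markdown.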
The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets an (aggregate, confidence, mapping_tier) triple.
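One way to picture the three tiers is the sketch below. It is a guess at the mechanics, not the audit's implementation: the lookup tables `function_owner` and `file_owner`, the numeric confidence values, matching tier 2 on a finding's `type_string` against registry field types, and returning `None` when no tier matches are all assumptions.

```python
def map_finding_sketch(finding: dict, function_owner: dict,
                       type_registry: dict, file_owner: dict):
    """Return (aggregate, confidence, mapping_tier), or None if nothing matches."""
    # Tier 1: the finding names a function with a known owning aggregate.
    fn = finding.get("function")
    if fn and fn in function_owner:
        return (function_owner[fn], 1.0, 1)

    # Tier 2: look for an aggregate in the type registry with a matching field type.
    type_string = finding.get("type_string")
    if type_string:
        for aggregate, entry in type_registry.get("types", {}).items():
            if any(f.get("type") == type_string for f in entry.get("fields", [])):
                return (aggregate, 0.6, 2)

    # Tier 3: fall back to whichever aggregate the file is associated with.
    aggregate = file_owner.get(finding.get("file"))
    if aggregate is not None:
        return (aggregate, 0.3, 3)
    return None


# Illustrative data only; these owners and registry entries are made up.
registry = {"types": {"FileItem": {"file": "src/models.py",
                                   "fields": [{"name": "meta", "type": "dict[str, Any]",
                                               "optional": False}]}}}
functions = {"append_entry": "CommsLog"}
files = {"src/history.py": "History"}
```

Carrying the tier in the triple lets a reader of the report discount tier 3 matches, which rest only on where the finding's file lives.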
See Also
- conductor/tracks/code_path_audit_20260607/spec_v2.md - the canonical spec
- conductor/tracks/code_path_audit_20260607/plan_v2.md - the canonical plan
- conductor/code_styleguides/data_oriented_design.md - the canonical DOD reference
- conductor/code_styleguides/error_handling.md - the Result[T] convention
- conductor/code_styleguides/type_aliases.md - the 10 TypeAliases + 1 NamedTuple
- conductor/code_styleguides/agent_memory_dimensions.md - the 4 mem dims