manual_slop/conductor/code_styleguides/code_path_audit.md
ed f5f313182b docs(styleguide): write the full 5-convention code_path_audit styleguide
Replaces the Phase 0 stub. Documents the per-aggregate profile
structure, the 4 decomposition directions, the override file
format, the 4 mem dim classification rules, and the 6-input
cross-audit integration contract.
2026-06-22 02:10:25 -04:00


Code Path & Data Pipeline Audit Styleguide

Status: Active convention as of 2026-06-22. Established by the code_path_audit_20260607 v2 track.

This styleguide codifies the contract for src/code_path_audit.py v2 and the 6 input audit scripts it consumes. Companion to data_oriented_design.md, error_handling.md, type_aliases.md, and agent_memory_dimensions.md.

The 5 Conventions

1. Per-aggregate profile structure

Every AggregateProfile (the central artifact) has 15 profile fields, 14 required and 1 defaulted (is_candidate): name, aggregate_kind, memory_dim, producers, consumers, access_pattern, access_pattern_evidence, frequency, frequency_evidence, result_coverage, type_alias_coverage, cross_audit_findings, decomposition_cost, optimization_candidates, is_candidate. Two further fields, mermaid and markdown, also have defaults. The is_candidate: bool flag distinguishes the 3 placeholder aggregates (ToolSpec, ChatMessage, ProviderHistory) from the 10 real aggregates.
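The field list above can be sketched as a dataclass. This is a minimal illustration, not the definition in src/code_path_audit.py: every type annotation, the frozen flag, the default values, and the helper required_field_names() are assumptions. Only the field names, their order, and `is_candidate: bool` come from the text.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


@dataclass(frozen=True)  # frozen is an assumption, not stated in the styleguide
class AggregateProfile:
    # 14 required fields, in the order the styleguide lists them.
    # All annotations below are assumed for illustration.
    name: str
    aggregate_kind: str
    memory_dim: str
    producers: tuple[str, ...]
    consumers: tuple[str, ...]
    access_pattern: str
    access_pattern_evidence: tuple[str, ...]
    frequency: str
    frequency_evidence: tuple[str, ...]
    result_coverage: Any
    type_alias_coverage: Any
    cross_audit_findings: tuple[Any, ...]
    decomposition_cost: Any
    optimization_candidates: tuple[Any, ...]
    # Defaulted fields. The default values themselves are assumptions.
    is_candidate: bool = False
    mermaid: str = ""
    markdown: str = ""


def required_field_names() -> list[str]:
    """Hypothetical helper: names of the fields that have no default."""
    return [
        f.name
        for f in fields(AggregateProfile)
        if f.default is MISSING and f.default_factory is MISSING
    ]
```

A quick check such as `len(required_field_names()) == 14` is one way to keep the documented field count and the code from drifting apart.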

The custom postfix .dsl output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, tag-scannable). The 14 new v2 DSL words are: kind, mem-dim, fn-ref, access-pattern, ap-evidence, frequency, freq-evidence, result-coverage, type-alias-coverage, cross-audit-finding, cross-audit-findings, decomp-cost, opt-candidate, is-candidate. The arity table lives in src/code_path_audit.py:DSL_WORD_ARITY_V2.

2. The 4 decomposition directions

For each aggregate, the audit computes a DecompositionCost (8 fields: current_cost_estimate, componentize_savings, unify_savings, recommended_direction, recommended_rationale, batch_size, struct_field_count, struct_frozen). The recommended_direction is one of:

  • componentize - split into smaller dataclasses; access pattern is field_by_field with many dead fields, OR hot_cold_split with small hot fields.
  • unify - combine into wider fat structs; access pattern is bulk_batched with a small struct, OR whole_struct with a small struct.
  • hold - current shape is correct; default for frozen + whole_struct (the ideal shape).
  • insufficient_data - access pattern is mixed or frequency is unknown; needs runtime profiling per pipeline.

The 4-direction logic is in src/code_path_audit.py:recommended_direction(). The savings estimates are heuristic (calibrated by pipeline_runtime_profiling_20260607); use them to rank candidates, not as measured savings.
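The four rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the body of recommended_direction(): the function name, its parameters, the three thresholds, the order in which the rules are checked, and the final fallback are all hypothetical. "Small hot fields" is read here as "few hot fields", which is also an assumption.

```python
from typing import Literal

Direction = Literal["componentize", "unify", "hold", "insufficient_data"]

# Hypothetical thresholds: the styleguide does not define "small" or "many".
SMALL_STRUCT_MAX_FIELDS = 4
MANY_DEAD_FIELDS_MIN = 3
SMALL_HOT_FIELDS_MAX = 2


def sketch_recommended_direction(
    access_pattern: str,
    frequency: str,
    struct_field_count: int,
    struct_frozen: bool,
    dead_field_count: int = 0,
    hot_field_count: int = 0,
) -> Direction:
    # Mixed access or unknown frequency: runtime profiling is needed first.
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    # Frozen + whole_struct is the ideal shape. Checking it before the
    # unify rule is an assumed precedence.
    if access_pattern == "whole_struct" and struct_frozen:
        return "hold"
    if access_pattern == "field_by_field" and dead_field_count >= MANY_DEAD_FIELDS_MIN:
        return "componentize"
    if access_pattern == "hot_cold_split" and 0 < hot_field_count <= SMALL_HOT_FIELDS_MAX:
        return "componentize"
    small_struct = struct_field_count <= SMALL_STRUCT_MAX_FIELDS
    if access_pattern in ("bulk_batched", "whole_struct") and small_struct:
        return "unify"
    # Assumed fallback when no rule fires.
    return "hold"
```

For example, `sketch_recommended_direction("whole_struct", "hot", 3, struct_frozen=False)` returns "unify" under these thresholds, while the same call with `struct_frozen=True` returns "hold".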

3. The override file format

scripts/code_path_audit_overrides.toml (TOML) lets the user override individual classifications. Each section is a table that maps a quoted name to its override value:

[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"

The file is optional. Missing file = empty overrides (the canonical mappings + heuristics apply).

4. The 4 mem dim classification rules

MemoryDim is a 7-value Literal: curation, discussion, rag, knowledge, config, control, unknown. The classification precedence (per src/code_path_audit.py:classify_memory_dim()): overrides > canonical mappings > file-of-origin heuristic > unknown.

  • curation: per-file structural (FileItem, FileItems, ContextPreset).
  • discussion: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
  • rag: opt-in semantic (RAGEngine state, indexed chunks).
  • knowledge: per-project durable (knowledge category files, digest).
  • config: project / global config (manual_slop.toml, presets.toml, personas.toml).
  • control: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
  • unknown: the audit can't classify; flagged for human review.
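The precedence chain (overrides > canonical mappings > file-of-origin heuristic > unknown) can be sketched as follows. This is not the body of classify_memory_dim(): the function name and signature are hypothetical, CANONICAL holds only a subset of the examples listed above, and the FILE_HINTS substrings are invented purely to show where a file-of-origin heuristic would sit.

```python
from collections.abc import Mapping
from typing import Literal

MemoryDim = Literal[
    "curation", "discussion", "rag", "knowledge", "config", "control", "unknown"
]

# Subset of the canonical mappings, taken from the examples in the list above.
CANONICAL: dict[str, MemoryDim] = {
    "FileItem": "curation",
    "FileItems": "curation",
    "ContextPreset": "curation",
    "Metadata": "discussion",
    "CommsLog": "discussion",
    "History": "discussion",
    "ChatMessage": "discussion",
    "ErrorInfo": "control",
    "WebSocketMessage": "control",
    "ToolSpec": "control",
    "NormalizedResponse": "control",
}

# Hypothetical file-of-origin hints (path substring -> dimension).
FILE_HINTS: tuple[tuple[str, MemoryDim], ...] = (
    ("rag", "rag"),
    ("knowledge", "knowledge"),
    ("config", "config"),
)


def sketch_classify_memory_dim(
    name: str, origin_file: str, overrides: Mapping[str, MemoryDim]
) -> MemoryDim:
    if name in overrides:  # 1. user overrides win
        return overrides[name]
    if name in CANONICAL:  # 2. canonical mappings
        return CANONICAL[name]
    for needle, dim in FILE_HINTS:  # 3. file-of-origin heuristic
        if needle in origin_file:
            return dim
    return "unknown"  # 4. flagged for human review
```

With the override from the example file, `sketch_classify_memory_dim("Metadata", "src/models.py", {"Metadata": "curation"})` returns "curation" even though the canonical mapping says "discussion".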

5. The cross-audit integration contract

The v2 audit consumes JSON from 6 input sources (in tests/artifacts/audit_inputs/):

| Input | Producer | Shape |
| --- | --- | --- |
| audit_weak_types.json | scripts/audit_weak_types.py --json | {"findings": [{"file", "line", "type_string", "category"}]} |
| audit_exception_handling.json | scripts/audit_exception_handling.py --json | {"findings": [{"file", "line", "category", "function", "class", "body_summary"}]} |
| audit_optional_in_3_files.json | scripts/audit_optional_in_3_files.py --json | {"findings": [{"file", "line", "return_type", "function"}]} |
| audit_no_models_config_io.json | scripts/audit_no_models_config_io.py --json | {"findings": [{"file", "line", "function", "config_path"}]} |
| audit_main_thread_imports.json | scripts/audit_main_thread_imports.py --json | {"findings": [{"file", "line", "imported_module", "thread"}]} |
| type_registry.json | scripts/generate_type_registry.py --json | {"types": {"<aggregate>": {"file", "fields": [{"name", "type", "optional"}]}}} |

Tolerance: if any input is missing or malformed, the audit continues with the corresponding cross_audit_findings field set to () and the markdown notes the missing input. The audit does NOT fail on missing inputs.

The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets an (aggregate, confidence, mapping_tier) triple.
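A minimal sketch of the 3-tier fallthrough, under several assumptions: the function name map_finding, the two lookup tables function_index and file_fallback, the confidence labels "high" / "medium" / "low", and the final "unknown" result are all hypothetical. Tier 2 is interpreted here as matching a finding's type_string against field types in the type registry, which is one possible reading of "field lookup".

```python
from collections.abc import Mapping
from typing import Any, NamedTuple


class FindingMapping(NamedTuple):
    aggregate: str
    confidence: str  # hypothetical labels: "high", "medium", "low"
    mapping_tier: int


def map_finding(
    finding: Mapping[str, Any],
    function_index: Mapping[str, str],  # hypothetical: function name -> aggregate
    type_registry: Mapping[str, Any],   # shape of type_registry.json
    file_fallback: Mapping[str, str],   # hypothetical: file path -> aggregate
) -> FindingMapping:
    # Tier 1: function lookup.
    function = finding.get("function")
    if function in function_index:
        return FindingMapping(function_index[function], "high", 1)
    # Tier 2: field lookup via the type registry (assumed matching rule).
    type_string = finding.get("type_string")
    if type_string is not None:
        for aggregate, info in type_registry.get("types", {}).items():
            if any(f.get("type") == type_string for f in info.get("fields", [])):
                return FindingMapping(aggregate, "medium", 2)
    # Tier 3: heuristic fallback by file-of-origin.
    aggregate = file_fallback.get(finding.get("file", ""), "unknown")
    return FindingMapping(aggregate, "low", 3)
```

Because each tier returns as soon as it matches, a finding that satisfies both tier 1 and tier 3 is always reported at tier 1.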

See Also

  • conductor/tracks/code_path_audit_20260607/spec_v2.md - the canonical spec
  • conductor/tracks/code_path_audit_20260607/plan_v2.md - the canonical plan
  • conductor/code_styleguides/data_oriented_design.md - the canonical DOD reference
  • conductor/code_styleguides/error_handling.md - the Result[T] convention
  • conductor/code_styleguides/type_aliases.md - the 10 TypeAliases + 1 NamedTuple
  • conductor/code_styleguides/agent_memory_dimensions.md - the 4 mem dims