ed f5f313182b docs(styleguide): write the full 5-convention code_path_audit styleguide

Replaces the Phase 0 stub. Documents the per-aggregate profile
structure, the 4 decomposition directions, the override file
format, the 4 mem dim classification rules, and the 6-input
cross-audit integration contract.

2026-06-22 02:10:25 -04:00

5.6 KiB


Code Path & Data Pipeline Audit Styleguide

Status: Active convention as of 2026-06-22. Established by the code_path_audit_20260607 v2 track.

This styleguide codifies the contract for src/code_path_audit.py v2 and the 6 input audit scripts it consumes. It is a companion to data_oriented_design.md, error_handling.md, type_aliases.md, and agent_memory_dimensions.md.

The 5 Conventions

1. Per-aggregate profile structure

Every AggregateProfile (the central artifact) has 15 core fields (14 required, plus is_candidate, which has a default): name, aggregate_kind, memory_dim, producers, consumers, access_pattern, access_pattern_evidence, frequency, frequency_evidence, result_coverage, type_alias_coverage, cross_audit_findings, decomposition_cost, optimization_candidates, is_candidate. Two further fields, mermaid and markdown, also have defaults. The is_candidate: bool flag distinguishes the 3 placeholder aggregates (ToolSpec, ChatMessage, ProviderHistory) from the 10 real aggregates.
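To make the field layout concrete, here is a minimal sketch of the profile as a dataclass. The field names come from the list above; every type annotation, every default value, and the names AggregateProfileSketch and required_field_names are assumptions made for illustration, not the definitions in src/code_path_audit.py.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


@dataclass
class AggregateProfileSketch:
    # 14 required fields, in the order the styleguide lists them.
    # All annotations are placeholders (assumptions), not the real ones.
    name: str
    aggregate_kind: str
    memory_dim: str
    producers: tuple[str, ...]
    consumers: tuple[str, ...]
    access_pattern: str
    access_pattern_evidence: tuple[str, ...]
    frequency: str
    frequency_evidence: tuple[str, ...]
    result_coverage: Any
    type_alias_coverage: Any
    cross_audit_findings: tuple[Any, ...]
    decomposition_cost: Any
    optimization_candidates: tuple[Any, ...]
    # Fields with defaults. The default values themselves are assumptions.
    is_candidate: bool = False
    mermaid: str = ""
    markdown: str = ""


def required_field_names() -> list[str]:
    """Return the names of the fields that have no default value."""
    return [
        f.name
        for f in fields(AggregateProfileSketch)
        if f.default is MISSING and f.default_factory is MISSING
    ]
```

Calling required_field_names() is a quick way to check that a profile definition still has exactly 14 required fields after an edit.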

The custom postfix .dsl output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, tag-scannable). The 14 new v2 DSL words: kind, mem-dim, fn-ref, access-pattern, ap-evidence, frequency, freq-evidence, result-coverage, type-alias-coverage, cross-audit-finding, cross-audit-findings, decomp-cost, opt-candidate, is-candidate. Arity table in src/code_path_audit.py:DSL_WORD_ARITY_V2.

2. The 4 decomposition directions

For each aggregate, the audit computes a DecompositionCost (8 fields: current_cost_estimate, componentize_savings, unify_savings, recommended_direction, recommended_rationale, batch_size, struct_field_count, struct_frozen). The recommended_direction is one of:

componentize - split into smaller dataclasses; access pattern is field_by_field with many dead fields, OR hot_cold_split with small hot fields.
unify - combine into wider fat structs; access pattern is bulk_batched with a small struct, OR whole_struct with a small struct.
hold - current shape is correct; default for frozen + whole_struct (the ideal shape).
insufficient_data - access pattern is mixed or frequency is unknown; needs runtime profiling per pipeline.

The 4-direction logic is in src/code_path_audit.py:recommended_direction(). The savings estimates are heuristic (calibrated by pipeline_runtime_profiling_20260607); use them as ranking input, not as measured savings.
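The four rules above can be sketched as a small decision function. This is not the body of recommended_direction(): the function name sketch_direction, the parameters dead_field_count and hot_field_count, the three thresholds, the order of the checks, and the final fallback to hold are all assumptions chosen to make the rules executable.

```python
# Hypothetical thresholds; the real audit may use different cut-offs.
SMALL_STRUCT_MAX_FIELDS = 4
MANY_DEAD_FIELDS_MIN = 3
SMALL_HOT_FIELDS_MAX = 2


def sketch_direction(
    access_pattern: str,
    frequency: str,
    struct_field_count: int,
    struct_frozen: bool,
    dead_field_count: int = 0,
    hot_field_count: int = 0,
) -> str:
    """Pick one of the 4 directions from the rules in this section."""
    # Mixed access or unknown frequency: runtime profiling is needed first.
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    # Frozen + whole_struct is described as the ideal shape.
    # Checking it before the unify rule is an assumption.
    if access_pattern == "whole_struct" and struct_frozen:
        return "hold"
    if access_pattern == "field_by_field" and dead_field_count >= MANY_DEAD_FIELDS_MIN:
        return "componentize"
    if access_pattern == "hot_cold_split" and hot_field_count <= SMALL_HOT_FIELDS_MAX:
        return "componentize"
    small = struct_field_count <= SMALL_STRUCT_MAX_FIELDS
    if access_pattern in ("bulk_batched", "whole_struct") and small:
        return "unify"
    # Nothing matched: keeping the current shape is an assumed fallback.
    return "hold"
```

Under these assumed thresholds, a frozen struct read as a whole stays on hold, while the same struct without the frozen flag and with few fields is routed to unify.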

3. The override file format

scripts/code_path_audit_overrides.toml (TOML) lets the user override the audit's memory_dim and frequency classifications. Sections:

[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"

The file is optional. Missing file = empty overrides (the canonical mappings + heuristics apply).

4. The 4 mem dim classification rules

MemoryDim is a 7-value Literal: curation, discussion, rag, knowledge, config, control, unknown. The classification precedence (per src/code_path_audit.py:classify_memory_dim()): overrides > canonical mappings > file-of-origin heuristic > unknown.

curation: per-file structural (FileItem, FileItems, ContextPreset).
discussion: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
rag: opt-in semantic (RAGEngine state, indexed chunks).
knowledge: per-project durable (knowledge category files, digest).
config: project / global config (manual_slop.toml, presets.toml, personas.toml).
control: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
unknown: the audit can't classify; flagged for human review.
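The precedence chain (overrides > canonical mappings > file-of-origin heuristic > unknown) can be sketched as below. The canonical table reuses aggregate names from the list above; the FILE_HINTS substrings, the function name sketch_classify, and its signature are hypothetical and are not the logic of classify_memory_dim().

```python
# Canonical mappings, taken from the aggregate names listed in this section.
CANONICAL = {
    "FileItem": "curation",
    "FileItems": "curation",
    "ContextPreset": "curation",
    "Metadata": "discussion",
    "CommsLog": "discussion",
    "History": "discussion",
    "ChatMessage": "discussion",
    "ErrorInfo": "control",
    "WebSocketMessage": "control",
    "ToolSpec": "control",
    "NormalizedResponse": "control",
}

# Hypothetical file-of-origin hints: (path substring, memory dim).
FILE_HINTS = (("rag", "rag"), ("config", "config"))


def sketch_classify(name: str, source_file: str, overrides: dict[str, str]) -> str:
    """Apply the 4-step precedence and return a memory dim."""
    if name in overrides:  # 1. user overrides win
        return overrides[name]
    if name in CANONICAL:  # 2. canonical mappings
        return CANONICAL[name]
    for needle, dim in FILE_HINTS:  # 3. file-of-origin heuristic
        if needle in source_file:
            return dim
    return "unknown"  # 4. flagged for human review
```

With the override from section 3 in place, Metadata resolves to curation instead of its canonical discussion, which shows why overrides sit at the top of the chain.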

5. The cross-audit integration contract

The v2 audit consumes JSON from 6 input sources (in tests/artifacts/audit_inputs/):

| Input | Producer | Shape |
| --- | --- | --- |
| `audit_weak_types.json` | `scripts/audit_weak_types.py --json` | `{"findings": [{"file", "line", "type_string", "category"}]}` |
| `audit_exception_handling.json` | `scripts/audit_exception_handling.py --json` | `{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}` |
| `audit_optional_in_3_files.json` | `scripts/audit_optional_in_3_files.py --json` | `{"findings": [{"file", "line", "return_type", "function"}]}` |
| `audit_no_models_config_io.json` | `scripts/audit_no_models_config_io.py --json` | `{"findings": [{"file", "line", "function", "config_path"}]}` |
| `audit_main_thread_imports.json` | `scripts/audit_main_thread_imports.py --json` | `{"findings": [{"file", "line", "imported_module", "thread"}]}` |
| `type_registry.json` | `scripts/generate_type_registry.py --json` | `{"types": {"<aggregate>": {"file", "fields": [{"name", "type", "optional"}]}}}` |

Tolerance: if any input is missing or malformed, the audit continues with the corresponding cross_audit_findings field set to () and the markdown notes the missing input. The audit does NOT fail on missing inputs.
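One way to honor this tolerance rule for the five findings-shaped inputs is sketched here. The function name load_findings, the (findings, note) return shape, and the wording of the note are assumptions; type_registry.json has a different top-level shape and would need its own loader.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_findings(path: Path) -> tuple[tuple[Any, ...], Optional[str]]:
    """Return (findings, note); never raise on a missing or malformed input."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        findings = data["findings"]
        if not isinstance(findings, list):
            raise TypeError("'findings' is not a list")
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # OSError covers a missing file; ValueError covers invalid JSON
        # (json.JSONDecodeError is a subclass); KeyError and TypeError
        # cover an unexpected top-level shape.
        note = f"input unavailable: {path.name} ({type(exc).__name__})"
        return (), note
    return tuple(findings), None
```

The note string is what a caller could surface in the markdown output, while the empty tuple matches the () value described above.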

The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets an (aggregate, confidence, mapping_tier) triple.
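A rough sketch of the 3-tier fallthrough follows. Only the tier order and the triple shape come from the text above. The function name, the function_index and file_index lookup tables, the confidence labels, the rule that tier 2 matches a finding's type_string against registry field types in the same file, and the (None, "none", 3) result for unmapped findings are all assumptions.

```python
from typing import Any, Optional

Triple = tuple[Optional[str], str, int]


def sketch_map_finding(
    finding: dict[str, Any],
    function_index: dict[str, str],  # hypothetical: function name -> aggregate
    type_registry: dict[str, Any],   # shape of type_registry.json
    file_index: dict[str, str],      # hypothetical: file path -> aggregate
) -> Triple:
    """Return an (aggregate, confidence, mapping_tier) triple for one finding."""
    # Tier 1: function lookup.
    fn = finding.get("function")
    if fn in function_index:
        return function_index[fn], "high", 1
    # Tier 2: field lookup via the type registry (assumed matching rule).
    type_string = finding.get("type_string")
    if type_string is not None:
        for aggregate, info in type_registry.get("types", {}).items():
            if info.get("file") != finding.get("file"):
                continue
            if any(f.get("type") == type_string for f in info.get("fields", [])):
                return aggregate, "medium", 2
    # Tier 3: heuristic fallback by file-of-origin.
    aggregate = file_index.get(finding.get("file", ""))
    if aggregate is not None:
        return aggregate, "low", 3
    return None, "none", 3
```

Recording the tier alongside the confidence lets a reviewer see at a glance which mappings rest on a direct function match and which came from the file-level fallback.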
