ed 5ac0618a33 refactor(scripts): move 7 code_path_audit files from src/ to scripts/code_path_audit/

The 7 code_path_audit*.py files (2604 lines total) are pure static
analysis tools. They do AST traversal of src/, no intrusive profiling,
no runtime markers. They were intermixed with src/ but only import:
- src.result_types (the Result[T] convention type)
- each other (the 6 siblings)

After the move:
- src/ is now pure application code; line-count audit metrics are clean
- scripts/code_path_audit/ is a new namespace-isolated subdir per
  AGENTS.md 'scripts are namespace-isolated by directory' rule

TIER-3 READ AGENTS.md + conductor/workflow.md + conductor/edit_workflow.md
+ conductor/code_styleguides/code_path_audit.md + the 7 files before
this commit.

Changes:
- 7 files moved: src/code_path_audit*.py -> scripts/code_path_audit/
- 7 files updated: internal imports from src.code_path_audit_X ->
  from code_path_audit_X (siblings in same subdir)
- 7 files updated: add sys.path.insert(0, str(Path(__file__).resolve().parents[2] / 'src'))
  to find src.result_types when run standalone
- 5 test files updated: from src.code_path_audit -> from code_path_audit
  + sys.path setup to find the new subdir
- 6 throwaway scripts in scripts/tier2/artifacts/ updated: import path
  + sys.path setup (parents[3] / 'src' + parents[3] / 'scripts' / 'code_path_audit')
- 2 styleguide/spec references updated: conductor/code_styleguides/code_path_audit.md
  + conductor/tracks/code_path_audit_20260607/spec_v2.md
- 1 meta-audit docstring updated: scripts/audit_code_path_audit_coverage.py
- 1 type registry entry deleted: docs/type_registry/src_code_path_audit.md
  (the type is no longer in src/)
- 1 type registry index updated: docs/type_registry/index.md (22 files, was 23)

Verification:
- 7/7 audit gates pass --strict (weak_types 102<=112, type_registry 22 files,
  main_thread_imports OK, no_models_config_io OK, code_path_audit_coverage 0
  violations, exception_handling 0 violations, optional_in_3_files 0 violations)
- 6/6 test files pass: test_code_path_audit, test_code_path_audit_integration,
  test_code_path_audit_phase78, test_code_path_audit_phase89,
  test_code_path_audit_ssdl_behavioral, test_metadata_nil_sentinel
- src/ line count: 29997 lines (down from 32621 = -2624 lines)
- scripts/code_path_audit/ line count: 2620 lines

2026-06-25 09:29:24 -04:00

5.7 KiB


Code Path & Data Pipeline Audit Styleguide

Status: Active convention as of 2026-06-22. Established by the code_path_audit_20260607 v2 track.

This styleguide codifies the contract for scripts/code_path_audit/code_path_audit.py v2 and the 6 input audit scripts it consumes. Companion to data_oriented_design.md, error_handling.md, type_aliases.md, and agent_memory_dimensions.md.

The 5 Conventions

1. Per-aggregate profile structure

Every AggregateProfile (the central artifact) has 15 fields (14 required + 1 with a default): name, aggregate_kind, memory_dim, producers, consumers, access_pattern, access_pattern_evidence, frequency, frequency_evidence, result_coverage, type_alias_coverage, cross_audit_findings, decomposition_cost, optimization_candidates, is_candidate. Two further fields, mermaid and markdown, also have defaults. The is_candidate: bool flag distinguishes the 3 placeholder aggregates (ToolSpec, ChatMessage, ProviderHistory) from the 10 real aggregates.

The custom postfix .dsl output is the canonical artifact: each section is a self-contained tagged record (flat, streamable, tag-scannable). The 14 new v2 DSL words are: kind, mem-dim, fn-ref, access-pattern, ap-evidence, frequency, freq-evidence, result-coverage, type-alias-coverage, cross-audit-finding, cross-audit-findings, decomp-cost, opt-candidate, is-candidate. The arity table is in scripts/code_path_audit/code_path_audit.py:DSL_WORD_ARITY_V2.

2. The 4 decomposition directions

For each aggregate, the audit computes a DecompositionCost (8 fields: current_cost_estimate, componentize_savings, unify_savings, recommended_direction, recommended_rationale, batch_size, struct_field_count, struct_frozen). The recommended_direction is one of:

componentize - split into smaller dataclasses; access pattern is field_by_field with many dead fields, OR hot_cold_split with small hot fields.
unify - combine into wider fat structs; access pattern is bulk_batched with a small struct, OR whole_struct with a small struct.
hold - current shape is correct; default for frozen + whole_struct (the ideal shape).
insufficient_data - access pattern is mixed or frequency is unknown; needs runtime profiling per pipeline.

The 4-direction logic is in scripts/code_path_audit/code_path_audit.py:recommended_direction(). The savings estimates are heuristic (calibrated by pipeline_runtime_profiling_20260607); use them to rank candidates, not as measured savings.
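The four rules above can be read as a small decision function. The following is a minimal sketch, not the actual recommended_direction() implementation: the function name, the thresholds, the dead_field_count / hot_field_count parameters, and the order in which the rules are checked are all assumptions made for illustration.

```python
from typing import Literal

Direction = Literal["componentize", "unify", "hold", "insufficient_data"]

# Hypothetical thresholds, chosen only for illustration.
SMALL_STRUCT_MAX_FIELDS = 4
MANY_DEAD_FIELDS_MIN = 3
SMALL_HOT_SET_MAX = 2


def sketch_recommended_direction(
    access_pattern: str,
    frequency: str,
    struct_field_count: int,
    struct_frozen: bool,
    dead_field_count: int = 0,  # assumed input, not a DecompositionCost field
    hot_field_count: int = 0,   # assumed input, not a DecompositionCost field
) -> Direction:
    # Not enough signal: mixed access pattern or unknown frequency.
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"
    # Frozen + whole_struct is described as the ideal shape (assumed to win
    # over the "whole_struct with a small struct" unify rule).
    if struct_frozen and access_pattern == "whole_struct":
        return "hold"
    # Componentize: many dead fields, or a small hot set.
    if access_pattern == "field_by_field" and dead_field_count >= MANY_DEAD_FIELDS_MIN:
        return "componentize"
    if access_pattern == "hot_cold_split" and hot_field_count <= SMALL_HOT_SET_MAX:
        return "componentize"
    # Unify: small structs that are read in bulk or as a whole.
    is_small = struct_field_count <= SMALL_STRUCT_MAX_FIELDS
    if access_pattern in ("bulk_batched", "whole_struct") and is_small:
        return "unify"
    # Anything else keeps its current shape.
    return "hold"
```

Checking insufficient_data first keeps the other rules from firing on weak evidence; whether the real function orders its checks this way is not stated in this styleguide.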

3. The override file format

scripts/code_path_audit_overrides.toml (TOML) lets the user adjust classifications per aggregate. Sections:

[memory_dim]
"Metadata" = "curation"

[frequency]
"src.cleanup.do_nothing" = "cold"

The file is optional. Missing file = empty overrides (the canonical mappings + heuristics apply).

4. The 4 mem dim classification rules

MemoryDim is a 7-value Literal: curation, discussion, rag, knowledge, config, control, unknown. The classification precedence (per scripts/code_path_audit/code_path_audit.py:classify_memory_dim()): overrides > canonical mappings > file-of-origin heuristic > unknown.

curation: per-file structural (FileItem, FileItems, ContextPreset).
discussion: per-turn conversational (Metadata, CommsLog, History, ChatMessage).
rag: opt-in semantic (RAGEngine state, indexed chunks).
knowledge: per-project durable (knowledge category files, digest).
config: project / global config (manual_slop.toml, presets.toml, personas.toml).
control: propagation primitives (Result[T], ErrorInfo, WebSocketMessage, ToolSpec, NormalizedResponse).
unknown: the audit can't classify; flagged for human review.
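The precedence chain (overrides > canonical mappings > file-of-origin heuristic > unknown) can be sketched as a series of early returns. This is an illustration, not the real classify_memory_dim(): the function name, the CANONICAL subset, and the FILE_HINTS table are assumptions, and the file paths in the usage note are made up.

```python
from typing import Literal, Mapping

MemoryDim = Literal[
    "curation", "discussion", "rag", "knowledge", "config", "control", "unknown"
]

# A small subset of the canonical mappings listed above.
CANONICAL: dict[str, MemoryDim] = {
    "FileItem": "curation",
    "Metadata": "discussion",
    "ErrorInfo": "control",
}

# Hypothetical file-of-origin hints: path fragment -> dimension.
FILE_HINTS: dict[str, MemoryDim] = {"rag": "rag", "config": "config"}


def sketch_classify_memory_dim(
    name: str, source_file: str, overrides: Mapping[str, MemoryDim]
) -> MemoryDim:
    # 1. User overrides always win.
    if name in overrides:
        return overrides[name]
    # 2. Canonical mappings for known aggregates.
    if name in CANONICAL:
        return CANONICAL[name]
    # 3. Heuristic based on where the aggregate is defined.
    for fragment, dim in FILE_HINTS.items():
        if fragment in source_file:
            return dim
    # 4. Nothing matched: flag for human review.
    return "unknown"
```

For example, Metadata classifies as discussion through the canonical mapping, but passing the override {"Metadata": "curation"} from the TOML example in section 3 makes it classify as curation.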

5. The cross-audit integration contract

The v2 audit consumes JSON from 6 input sources (in tests/artifacts/audit_inputs/):

Input	Producer	Shape
`audit_weak_types.json`	`scripts/audit_weak_types.py --json`	`{"findings": [{"file", "line", "type_string", "category"}]}`
`audit_exception_handling.json`	`scripts/audit_exception_handling.py --json`	`{"findings": [{"file", "line", "category", "function", "class", "body_summary"}]}`
`audit_optional_in_3_files.json`	`scripts/audit_optional_in_3_files.py --json`	`{"findings": [{"file", "line", "return_type", "function"}]}`
`audit_no_models_config_io.json`	`scripts/audit_no_models_config_io.py --json`	`{"findings": [{"file", "line", "function", "config_path"}]}`
`audit_main_thread_imports.json`	`scripts/audit_main_thread_imports.py --json`	`{"findings": [{"file", "line", "imported_module", "thread"}]}`
`type_registry.json`	`scripts/generate_type_registry.py --json`	`{"types": {"<aggregate>": {"file", "fields": [{"name", "type", "optional"}]}}}`

Tolerance: if any input is missing or malformed, the audit continues with the corresponding cross_audit_findings field set to () and the markdown notes the missing input. The audit does NOT fail on missing inputs.

The finding-to-aggregate mapping is 3-tier: tier 1 (function lookup) > tier 2 (field lookup via type registry) > tier 3 (heuristic fallback by file-of-origin). Each finding gets a (aggregate, confidence, mapping_tier) triple.
