conductor(handoff): code_path_audit_20260607 v2 - metadata + state + TIER2_STARTUP

metadata.json: standard track metadata (15 fields per the live_gui_test_fixes_20260618 precedent; includes scope, depends_on, blocks, out_of_scope, tolerated_at_run_time, test_summary, verification_criteria, 10 risks).

state.toml: initial state (status=active, current_phase=0; 14 phases pending; 19 verification flags all false).

TIER2_STARTUP.md: the per-track readme for the Tier 2 agent. Track-specific supplement to conductor/tier2/agents/tier2-autonomous.md. Covers: what to load (plan_v2.md first, spec_v2.md second; do NOT load v1 spec/plan), hard bans (3-layer), conventions, TDD protocol, per-task commit protocol, pre-delegation checkpoint, failcount contract, 8 known gotchas, verification protocol, end-of-track handoff, out-of-scope restatement.

EXPLICITLY NOTES:
- any_type_componentization_20260621 + phase2_4_5_call_site_completion_20260621 are NOT on master (merged f914b2bc, reverted 751b94d4). v2 audit is tolerant of their absence.
- The 3 candidate aggregates (ToolSpec, ChatMessage, ProviderHistory) are forward-compat placeholders with is_candidate: True. The integration tests verify the placeholder format (synthesize_aggregate_profile() in Phase 9 Task 9.2 has the template hard-coded).
- The 1-line extension to scripts/audit_optional_in_3_files.py is the audit gate; skipping Phase 12 Task 12.2 leaves the new file uncovered by the Optional[T] ban.

Total v2 artifacts (committed):
- spec_v2.md (460 lines)
- plan_v2.md (5006 lines)
- metadata.json
- state.toml
- TIER2_STARTUP.md
2026-06-22 00:27:03 -04:00
parent 85baea8cf0
commit d20e1c2e78
3 changed files with 527 additions and 0 deletions
@@ -0,0 +1,263 @@
+# Tier 2 Startup — code_path_audit_20260607 v2
+
+> **For Tier 2 Tech Lead (autonomous mode).** This is the entry point. Read this file first, then `plan_v2.md`, then `spec_v2.md`. The v1 files (`spec.md` + `plan.md`) are **preserved unchanged and never executed** — do not load them as the canonical spec.
+
+## What this track is
+
+Build `src/code_path_audit.py` v2 — a data-oriented static-analysis tool that audits the 13 data aggregates in `src/` (10 in-scope TypeAliases + 3 candidate placeholders for `any_type_componentization_20260621` which is NOT on master) and produces per-aggregate profiles. The output (custom postfix `.dsl` + markdown + prefix tree text) is the artifact that informs per-aggregate refactor decisions.
+
+**Why v2 supersedes v1:** v1 was authored 2026-06-07 before the 4 foundational tracks shipped. v1's "per-action" framing is now stale. v2 reframes the audit to "per-data-aggregate" + a 4-direction decomposition-cost heuristic (componentize / unify / hold / insufficient_data) per aggregate. v2 also cross-validates the 2 foundational conventions (`data_structure_strengthening_20260606` + `data_oriented_error_handling_20260606`) directly.
+
+**The user's framing (2026-06-22):**
+> "The whole point of the code path audit is to audit all paths nearly in the ./src of the codebase. The main point of it is to identify data-oriented pipelines and what data aggregate they will be operating on. This will realize what the data strengthening just uncovered and cross-audit if its deductions on the data structures are accurate while also being able to utilize additional flexibility the data oriented error handling track has provided. We are entering a time where the codebase is getting heavily adjusted into a properly engineered machine with discernable working parts. The cost of the pipeline is important, it should factor in what data needs to be componentized further vs which can be unified further into wider code paths handling larger fat structs."
+
+## What to load
+
+In this order:
+1. **This file** (`TIER2_STARTUP.md`) — startup context.
+2. **`plan_v2.md`** — the executable plan. 14 phases, 85+ tasks, 91 tests. **This is the source of truth for execution.**
+3. **`spec_v2.md`** — the design intent. Read this when the plan is ambiguous.
+4. **DO NOT load `spec.md` or `plan.md`** — those are the v1 files (preserved, never executed). The plan_v2.md supersedes plan.md.
+
+## What's on master (verified `7e61dd7d` + commits `7ea414e9` + `85baea8c`)
+
+- `src/type_aliases.py` — the 10 canonical TypeAliases + 1 NamedTuple (`FileItemsDiff`).
+- `src/result_types.py` — `Result[T]`, `ErrorInfo`, `ErrorKind`, `NilPath`, `NilRAGState`, `OK`.
+- `src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)` (the v1 primitive; v2's PCG is the multi-symbol superset).
+- `src/performance_monitor.py` — runtime profiling (used by the `pipeline_runtime_profiling_20260607` follow-up, NOT by this track).
+- `scripts/audit_main_thread_imports.py` — import-graph CI gate.
+- `scripts/audit_weak_types.py` — weak-types CI gate.
+- `scripts/audit_exception_handling.py` — exception-handling CI gate.
+- `scripts/audit_no_models_config_io.py` — config-I/O ownership CI gate.
+- `scripts/audit_optional_in_3_files.py` — `Optional[T]` ban CI gate (the 3 baseline files; v2 extends this with +1 line in Phase 12).
+- `scripts/generate_type_registry.py` — type-registry generator.
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference.
+- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention.
+- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases.
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims.
+
+**NOT on master (and the v2 audit must tolerate their absence for an interim run):**
+- `any_type_componentization_20260621` — merged `f914b2bc`, reverted `751b94d4` (9 minutes later). The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are forward-compat placeholders with `is_candidate: True`.
+- `phase2_4_5_call_site_completion_20260621` — same merge+revert history. The `PHASE3_HYPOTHETICAL_PROMOTION.md` report is NOT on master (reverted with the merge).
+
+**3 handoff files are also NOT on master** (reverted with the merge): `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`, `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md`, `PROMPT_FOR_TIER_1.md`. The v2 spec/plan do NOT reference these by name; the candidate-aggregate handling is described from first principles.
+
+## Hard Bans (3-layer enforced)
+
+These are restated from `conductor/tier2/agents/tier2-autonomous.md`; they apply on every commit:
+
+- `git push*` (any form) — the user fetches the branch + reviews + merges.
+- `git checkout*` (any form) — use `git switch -c` for new branches, `git switch` to switch.
+- `git restore*` (any form) — never restore files.
+- `git reset*` (any form) — never reset state.
+- File access outside `C:\projects\manual_slop_tier2\` (the Tier 2 clone) — the Windows restricted token blocks it.
+- **`*AppData\\*`** — AppData is OFF-LIMITS for any read, write, or shell command. Use `tests/artifacts/tier2_state/<track>/` for failcount state, `tests/artifacts/tier2_failures/` for failure reports, `scripts/tier2/artifacts/<track>/` for throwaway scripts.
+
+If a task requires one of these, **STOP and report to the user** — do not bypass.
+
+## Conventions (MUST follow)
+
+- **Test runner:** `uv run python scripts/run_tests_batched.py` (NEVER `uv run pytest` directly; the batched runner provides tier-based filtering, parallelization, and the summary table).
+- **Default branch:** `master` (not `main`).
+- **Line endings:** preserve existing. This repo has a mix of CRLF and LF. Do not normalize.
+- **Throw-away scripts:** `scripts/tier2/artifacts/code_path_audit_20260607/` (NOT the base `scripts/tier2/` dir).
+- **End-of-track report:** `docs/reports/TRACK_COMPLETION_code_path_audit_20260607.md` (the file name uses the track_id, not the date; check the precedent set by `TRACK_COMPLETION_live_gui_test_fixes_20260618.md`).
+
+## TDD Protocol (per `conductor/workflow.md`)
+
+1. **Red:** write the failing test (1 commit). Run `uv run python scripts/run_tests_batched.py` and confirm FAIL.
+2. **Green:** implement the minimal code to pass (1 commit). Run and confirm PASS.
+3. **Refactor:** (optional) 1 commit if there's cleanup.
+4. **Commit per task** (1 task = 1 commit). Attach a git note summarizing the task.
+5. **Update `plan_v2.md`**: change `[ ]` to `[x] <7-char-sha>` for the completed task. Commit the plan update.
+
+## Per-Task Commit Protocol
+
+After each task:
+1. `git add <specific files>` (not `git add .` for individual commits).
+2. `git commit -m "<type>(<scope>): <description>"` (e.g., `feat(audit): add the 5 enums`).
+3. Get the commit hash: `git log -1 --format="%H"`.
+4. Attach git note: `git notes add -m "Task N.M: ..." <hash>`.
+5. Update `plan_v2.md`: change `[ ]` to `[x] <7-char-sha>` for the task.
+6. Commit the plan update: `git add plan_v2.md && git commit -m "conductor(plan): Mark task N.M complete"`.
+
+## Pre-Delegation Checkpoint
+
+Before each Tier 3 worker delegation, run `git add .` to stage prior work. This is a safety net: if the worker fails or incorrectly runs `git restore`, your prior iterations are not lost.
+
+## Failcount Contract
+
+After every task commit, you MUST check `should_give_up` from `scripts.tier2.failcount`. The state is persisted at `tests/artifacts/tier2_state/code_path_audit_20260607/state.json` (project-relative; resolved via `Path(__file__).parents[2]` in the failcount module). The thresholds are:
+- 3 consecutive red-phase failures
+- 3 consecutive green-phase failures
+- 30 minutes with no progress (no commit, no green test)
+
+If `should_give_up` returns True, IMMEDIATELY stop. Do not attempt another fix. Call `write_failure_report` from `scripts.tier2.write_report` and print the report path. Then **escalate to the user** (do not just write a report and stop silently).
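
The three thresholds above can be read as a single pure predicate. The following is a minimal sketch under stated assumptions: `FailState`, its field names, and `should_give_up_sketch` are hypothetical and only illustrate the rule; the real `should_give_up` in `scripts.tier2.failcount` may use a different signature and a different `state.json` layout.

```python
from dataclasses import dataclass


# Hypothetical state shape; the real state.json layout may differ.
@dataclass(frozen=True)
class FailState:
    consecutive_red_failures: int
    consecutive_green_failures: int
    seconds_since_progress: float


MAX_CONSECUTIVE_FAILURES = 3  # applies to the red phase and the green phase separately
MAX_IDLE_SECONDS = 30 * 60    # 30 minutes with no commit and no green test


def should_give_up_sketch(state: FailState) -> bool:
    """Return True as soon as any one of the three thresholds is reached."""
    return (
        state.consecutive_red_failures >= MAX_CONSECUTIVE_FAILURES
        or state.consecutive_green_failures >= MAX_CONSECUTIVE_FAILURES
        or state.seconds_since_progress >= MAX_IDLE_SECONDS
    )
```

Keeping the check free of I/O means it can be evaluated after every task commit without touching anything except the persisted counters.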
+
+## Track-Specific Guidance
+
+### The 3 candidate aggregates
+
+The 3 candidate aggregates (`ToolSpec`, `ChatMessage`, `ProviderHistory`) are NOT on master. The v2 audit produces **placeholders** with `is_candidate: True` and all metrics set to 0. The `candidates.md` rollup explains the placeholder status. The integration tests verify the placeholder format.
+
+**The v2 spec's `synthesize_aggregate_profile()` Task 9.2 has the placeholder template hard-coded.** When implementing it, use the exact template from the spec — do not invent a different placeholder structure.
+
+### The 4 audit gates
+
+After every commit, run:
+```bash
+uv run python scripts/audit_exception_handling.py --strict
+uv run python scripts/audit_weak_types.py --strict
+uv run python scripts/audit_main_thread_imports.py
+uv run python scripts/audit_no_models_config_io.py
+```
+
+These are the "laws of physics" for `src/code_path_audit.py`. If a gate fails, **fix before continuing**. The most likely failure mode is a Tier 3 worker adding an `Optional[T]` return type (banned in the 3 refactored files + the new file) or a `try/except: pass` (banned per `error_handling.md` Pattern 5).
+
+### The `Result[T]` return type rule
+
+**Every public function in `src/code_path_audit.py` that can fail at runtime returns `Result[T]`.** No `Optional[T]` returns. No `None` returns. No `raise Exception(...)` (only `raise` for programmer errors, e.g., `raise ValueError` in `__init__` for missing config).
+
+The table below marks 7 of the 11 public functions as returning deterministic `T` (no failure mode). The other 4 (1, 2, 7, 9) return `Result[T]`. **Do not add `Result[T]` to the deterministic ones**; it adds noise. **Do not skip `Result[T]` on the fallible ones**; it violates the convention.
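
The split between fallible and deterministic functions might look roughly like this. It is a sketch, not the project's code: the `Result` class below is a hypothetical stand-in for the one in `src/result_types.py` (whose real fields are not shown in this file), and `list_sources_sketch` and `to_tree_sketch` are illustrative names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


# Hypothetical stand-in for src/result_types.Result; the real type may differ.
@dataclass(frozen=True)
class Result(Generic[T]):
    data: T                       # a "nil" value (empty tuple, empty dict) on failure
    errors: tuple[str, ...] = ()  # empty when the call succeeded

    @property
    def ok(self) -> bool:
        return not self.errors


def list_sources_sketch(src_dir: str) -> Result[tuple[str, ...]]:
    """Fallible (touches the filesystem), so it returns Result instead of raising."""
    root = Path(src_dir)
    if not root.is_dir():
        return Result(data=(), errors=(f"not a directory: {src_dir}",))
    return Result(data=tuple(sorted(p.name for p in root.glob("*.py"))))


def to_tree_sketch(names: tuple[str, ...]) -> str:
    """Deterministic (pure string rendering), so a plain return type is enough."""
    return "\n".join(f"- {name}" for name in names)
```

A caller checks `result.ok` once, at the drain point, and otherwise passes the `Result` along unchanged.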
+
+### The 11 public functions (per the spec)
+
+| # | Function | Returns | Phase |
+|---|---|---|---|
+| 1 | `run_audit(...)` | `Result[AuditSummary]` | 9 |
+| 2 | `build_pcg(src_dir)` | `Result[ProducerConsumerGraph]` | 2 |
+| 3 | `classify_memory_dim(...)` | `MemoryDim` (deterministic) | 3 |
+| 4 | `detect_access_pattern(...)` | `AccessPattern` (deterministic) | 4 |
+| 5 | `estimate_call_frequency(...)` | `Frequency` (deterministic) | 5 |
+| 6 | `compute_decomposition_cost(...)` | `DecompositionCost` (deterministic) | 6 |
+| 7 | `read_input_json(path)` | `Result[dict]` | 7 |
+| 8 | `to_dsl_v2(profile)` | `str` (deterministic) | 8 |
+| 9 | `parse_dsl_v2(text)` | `Result[dict]` | 8 |
+| 10 | `to_markdown(profile)` | `str` (deterministic) | 8 |
+| 11 | `to_tree(profile)` | `str` (deterministic) | 8 |
+
+Plus the CLI (`if __name__ == "__main__":`) and the MCP tool wrapper (`code_path_audit_v2`).
+
+### The 14 v2 DSL tagged words (per the spec)
+
+`kind`, `mem-dim`, `fn-ref`, `access-pattern`, `ap-evidence`, `frequency`, `freq-evidence`, `result-coverage`, `type-alias-coverage`, `cross-audit-finding`, `cross-audit-findings`, `decomp-cost`, `opt-candidate`, `is-candidate`. The arity table is in `src/code_path_audit.py:DSL_WORD_ARITY_V2` (Phase 8 Task 8.1).
+
+The DSL format is **flat sections** (streamable, tag-scannable) — NOT a nested record. Each `\\ === section_name ===` line is followed by the section's tagged records. This is the v1 design's "no need to parse the whole file" property applied to v2.
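
The "no need to parse the whole file" property can be illustrated with a streaming scan for one section. This sketch assumes the section marker is exactly two backslashes followed by ` === name ===`, as quoted above, and treats record lines as opaque text; `iter_section_sketch` is a hypothetical helper, not part of the spec.

```python
from typing import Iterable, Iterator

SECTION_PREFIX = "\\\\ === "  # two literal backslashes, then " === " (assumed marker form)
SECTION_SUFFIX = " ==="


def iter_section_sketch(lines: Iterable[str], wanted: str) -> Iterator[str]:
    """Yield the record lines of one section without parsing any other section."""
    inside = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(SECTION_PREFIX) and line.endswith(SECTION_SUFFIX):
            if inside:
                return  # the wanted section just ended; stop reading the stream
            name = line[len(SECTION_PREFIX):-len(SECTION_SUFFIX)]
            inside = name == wanted
            continue
        if inside and line.strip():
            yield line  # record lines are passed through untouched
```

Because the function is a generator over any iterable of lines, it can be fed an open file handle and will stop reading as soon as the requested section closes.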
+
+### The 5 enums (per the spec)
+
+`AggregateKind` (4 values: typealias, dataclass, candidate_dataclass, builtin), `MemoryDim` (7 values: curation, discussion, rag, knowledge, config, control, unknown), `AccessPattern` (5 values: whole_struct, field_by_field, hot_cold_split, bulk_batched, mixed), `Frequency` (7 values: hot, per_turn, per_discussion, per_request, cold, init, unknown), `RecommendedDirection` (4 values: componentize, unify, hold, insufficient_data).
+
+All enums are `Literal[...]` types (string-valued) for stable postfix DSL output. No `Enum` class — the v1 spec's rationale is "no enum-name lookup table needed in the parser."
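
As a small illustration of the `Literal[...]` choice, two of the five enums could be declared as below, with the values copied from the list above. The validator `is_access_pattern` is a hypothetical helper; it shows that a parser can check a token against the type's own values via `typing.get_args`, with no separate lookup table.

```python
from typing import Literal, get_args

# Values copied from the enum list above.
AccessPattern = Literal[
    "whole_struct", "field_by_field", "hot_cold_split", "bulk_batched", "mixed"
]
RecommendedDirection = Literal[
    "componentize", "unify", "hold", "insufficient_data"
]


def is_access_pattern(token: str) -> bool:
    """Check a DSL token against the Literal's own values."""
    return token in get_args(AccessPattern)
```

Since the values are plain strings, they can be written to and read from the postfix DSL as-is.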
+
+### The 9 supporting dataclasses (per the spec)
+
+`FunctionRef`, `AccessPatternEvidence`, `FrequencyEvidence`, `ResultCoverage`, `TypeAliasCoverage`, `CrossAuditFinding`, `CrossAuditFindings`, `DecompositionCost`, `OptimizationCandidate`. Plus the central `AggregateProfile` (14 required fields + 2 default). All `frozen=True` per the immutability story.
+
+### The 4 decomposition directions (per the spec)
+
+- `componentize` — split into smaller dataclasses; access pattern is `field_by_field` with many dead fields, OR `hot_cold_split` with small hot fields.
+- `unify` — combine into wider fat structs; access pattern is `bulk_batched` with a small struct, OR `whole_struct` with a small struct.
+- `hold` — current shape is correct; default for `frozen + whole_struct` (the ideal shape).
+- `insufficient_data` — access pattern is `mixed` or frequency is `unknown`; needs runtime profiling.
+
+The 4-direction logic is in `src/code_path_audit.py:recommended_direction()` (Phase 6 Task 6.6). The savings estimates are heuristic and are to be calibrated by the `pipeline_runtime_profiling_20260607` follow-up; use them as ranking input, not as actual savings.
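
One way to read the four rules above as code is sketched here. Everything numeric is an assumption: the three threshold constants, the parameter names, and the rule ordering (in particular, checking the frozen `whole_struct` "hold" case before the small-struct "unify" case) are hypothetical and are not taken from the spec.

```python
from typing import Literal

AccessPattern = Literal[
    "whole_struct", "field_by_field", "hot_cold_split", "bulk_batched", "mixed"
]
Frequency = Literal[
    "hot", "per_turn", "per_discussion", "per_request", "cold", "init", "unknown"
]
RecommendedDirection = Literal["componentize", "unify", "hold", "insufficient_data"]

# Hypothetical thresholds; the spec's actual constants are not reproduced here.
SMALL_STRUCT_MAX_FIELDS = 4
MANY_DEAD_FIELDS_MIN = 3
SMALL_HOT_SET_MAX_FIELDS = 2


def recommended_direction_sketch(
    access_pattern: AccessPattern,
    frequency: Frequency,
    field_count: int,
    dead_field_count: int,
    hot_field_count: int,
    is_frozen: bool,
) -> RecommendedDirection:
    """Map static-analysis evidence to one of the four directions."""
    if access_pattern == "mixed" or frequency == "unknown":
        return "insufficient_data"  # needs runtime profiling
    if access_pattern == "field_by_field" and dead_field_count >= MANY_DEAD_FIELDS_MIN:
        return "componentize"
    if access_pattern == "hot_cold_split" and hot_field_count <= SMALL_HOT_SET_MAX_FIELDS:
        return "componentize"
    if access_pattern == "whole_struct" and is_frozen:
        return "hold"  # assumed to take precedence over the unify rule
    if access_pattern in ("bulk_batched", "whole_struct") and field_count <= SMALL_STRUCT_MAX_FIELDS:
        return "unify"
    return "hold"
```

The function is deterministic, which matches the plain (non-`Result`) return type listed for `compute_decomposition_cost(...)` in the public-function table.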
+
+### The 6 input JSON contracts (per the spec)
+
+The v2 audit consumes JSON from 6 sources in `tests/artifacts/audit_inputs/` (gitignored per `test_sandbox.md`):
+
+| Input | Producer | Path |
+|---|---|---|
+| 1 | `scripts/audit_weak_types.py --json` | `audit_weak_types.json` |
+| 2 | `scripts/audit_exception_handling.py --json` | `audit_exception_handling.json` |
+| 3 | `scripts/audit_optional_in_3_files.py --json` | `audit_optional_in_3_files.json` |
+| 4 | `scripts/audit_no_models_config_io.py --json` | `audit_no_models_config_io.json` |
+| 5 | `scripts/audit_main_thread_imports.py --json` | `audit_main_thread_imports.json` |
+| 6 | `scripts/generate_type_registry.py --json` | `type_registry.json` |
+
+**Tolerance:** if any input is missing or malformed, the audit continues with the corresponding `cross_audit_findings` field set to `()` (empty tuple) and the markdown notes the missing input. The audit does NOT fail on missing inputs.
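
The tolerance rule could be sketched as a loader that records a note for each bad input instead of failing. The file names come from the table above; `load_inputs_tolerantly`, its return shape, and the note wording are assumptions made for illustration, and the real audit routes this through `read_input_json()` and `Result[T]` rather than a bare tuple.

```python
import json
from pathlib import Path

# File names from the contracts table above.
INPUT_FILES = (
    "audit_weak_types.json",
    "audit_exception_handling.json",
    "audit_optional_in_3_files.json",
    "audit_no_models_config_io.json",
    "audit_main_thread_imports.json",
    "type_registry.json",
)


def load_inputs_tolerantly(input_dir: Path) -> tuple[dict[str, dict], list[str]]:
    """Load every input; a missing or malformed file becomes a note, never a failure."""
    loaded: dict[str, dict] = {}
    notes: list[str] = []
    for name in INPUT_FILES:
        try:
            parsed = json.loads((input_dir / name).read_text(encoding="utf-8"))
        except FileNotFoundError:
            notes.append(f"missing input: {name}")
            continue
        except (OSError, ValueError) as exc:  # unreadable file or invalid JSON
            notes.append(f"malformed input: {name} ({exc})")
            continue
        if isinstance(parsed, dict):
            loaded[name] = parsed
        else:
            notes.append(f"malformed input: {name} (top-level value is not an object)")
    return loaded, notes
```

A caller would then use `()` as the `cross_audit_findings` value for any name absent from `loaded`, and surface `notes` in the markdown output.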
+
+### The integration test fixture
+
+`tests/fixtures/synthetic_src/` defines 3 TypeAliases (Metadata, FileItems, History) + 6 functions (2 producers, 4 consumers). `tests/fixtures/audit_inputs/` has 6 JSON files matching the contracts. The integration tests assert the exact expected profiles per aggregate (the expected output is in the spec's §7.1 + the plan's Phase 10 tasks).
+
+**The fixture names match the canonical TypeAliases** (Metadata, FileItems, History) so the audit's `CANONICAL_MEMORY_DIM` lookup works correctly. Do not rename the fixture's aggregates.
+
+## Known gotchas (from prior tracks' lessons)
+
+These are the "1% chance this happens but you'll waste 4 hours if you don't know" notes:
+
+1. **`Optional[T]` ban extends to the new file.** The `scripts/audit_optional_in_3_files.py` script will be extended in Phase 12 to check `src/code_path_audit.py`. If any Tier 3 worker adds an `Optional[T]` return, the extended audit fails. **Read `conductor/code_styleguides/error_handling.md` before writing the public API.** The 5 MUST-DO rules and 7 MUST-NOT-DO rules apply.
+
+2. **Logging is NOT a drain.** Per `error_handling.md` Pattern A: `sys.stderr.write` / `logging.error` / `print` in an except body is `INTERNAL_SILENT_SWALLOW`, a violation. The CLI / MCP entry points are the drain points. Use `Result[T]` propagation and let the error reach the drain.
+
+3. **The AST walker does NOT execute the code.** The PCG, APD, CFE are pure static analysis. No `eval`, no `exec`, no imports of `src/*` modules that have side effects. The v2 audit reads files; it does not import them.
+
+4. **`scripts/run_tests_batched.py` is the only test runner.** Direct `uv run pytest` may work for a single file but bypasses the tiering that the live_gui tests depend on. The failcount and per-tier filtering only work with the batched runner.
+
+5. **`master` is the default branch.** This repo never had `main`. `git fetch origin master` (NOT `main`).
+
+6. **The CRLF/LF mix is intentional.** Do not normalize. Per-file preservation.
+
+7. **The 3 candidate aggregates are placeholders.** When you run the audit on `master`, the `candidates.md` rollup will show 3 placeholders with `is_candidate: True`. This is correct. The placeholders become real profiles when `any_type_componentization_20260621` is re-merged.
+
+8. **The 1-line extension to `scripts/audit_optional_in_3_files.py` is the audit gate.** If you skip Phase 12 Task 12.2, the new file is not covered by the `Optional[T]` ban, and a future Tier 3 worker could regress the convention. Do the extension.
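
Gotcha 3 (static analysis only) can be made concrete with a short `ast` example. This is a simplified sketch, not the PCG: `functions_mentioning` is a hypothetical helper that only inspects parameter and return annotations written as bare names, and it misses string annotations and dynamic patterns.

```python
import ast


def functions_mentioning(source: str, alias_name: str) -> tuple[str, ...]:
    """Names of functions whose annotations mention alias_name, found by parsing only."""
    tree = ast.parse(source)  # parses text into a tree; nothing in `source` is executed
    hits: list[str] = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        annotations = [p.annotation for p in params]
        annotations.append(node.returns)
        for annotation in annotations:
            if annotation is None:
                continue
            # Look for a bare name such as `History` anywhere inside the annotation.
            if any(isinstance(n, ast.Name) and n.id == alias_name for n in ast.walk(annotation)):
                hits.append(node.name)
                break
    return tuple(hits)
```

Because the source is read as text and handed to `ast.parse`, module-level side effects in the audited file never run, which is the property the gotcha asks for.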
+
+## Verification Protocol (per `conductor/workflow.md`)
+
+After every task, run the **4 audit gates** in `--strict` mode + the unit tests:
+
+```bash
+uv run python scripts/run_tests_batched.py
+uv run python scripts/audit_exception_handling.py --strict
+uv run python scripts/audit_weak_types.py --strict
+uv run python scripts/audit_main_thread_imports.py
+uv run python scripts/audit_no_models_config_io.py
+```
+
+At **end-of-track** (Phase 13), add:
+```bash
+uv run python -m src.code_path_audit --all --date 2026-06-22
+uv run python scripts/audit_code_path_audit_coverage.py --input-dir docs/reports/code_path_audit/2026-06-22/ --strict
+uv run python scripts/generate_type_registry.py --check
+```
+
+## End-of-Track Handoff
+
+When all 14 phases complete, write `docs/reports/TRACK_COMPLETION_code_path_audit_20260607.md` (the user reads this to decide merge). Update `conductor/tracks.md` with the v2 entry. Update `state.toml` to `status = "completed"` and `current_phase = "complete"`.
+
+The TRACK_COMPLETION report should include:
+- What shipped (file inventory).
+- Verification: 91 tests pass + 4 audit gates + meta-audit + type registry.
+- The cross-validation verdict (does the v2 audit's data match the actual state of `data_structure_strengthening` + `data_oriented_error_handling`?).
+- The 5 follow-up tracks.
+- The 3 candidate aggregates' forward-compat status.
+
+## Out of scope (restated)
+
+- Modifications to existing `src/*.py` files (read-only on the 65 existing files).
+- Modifications to the 5 existing audit scripts (consume their JSON; don't change them).
+- Runtime profiling (deferred to `pipeline_runtime_profiling_20260607`).
+- New pip dependencies (stdlib only).
+- Changes to v1 spec.md or plan.md (preserved unchanged).
+- MMA worker spawn action (cold per user).
+- New src/<thing>.py files (per AGENTS.md file size + naming convention).
+- The 23 lower-impact files (deferred).
+
+## See also
+
+- `conductor/tracks/code_path_audit_20260607/spec_v2.md` — the canonical spec (design intent).
+- `conductor/tracks/code_path_audit_20260607/plan_v2.md` — the canonical plan (executable).
+- `conductor/tracks/code_path_audit_20260607/metadata.json` — the track metadata.
+- `conductor/tracks/code_path_audit_20260607/state.toml` — the track state.
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference.
+- `conductor/code_styleguides/error_handling.md` — the `Result[T]` convention.
+- `conductor/code_styleguides/type_aliases.md` — the 10 TypeAliases.
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 mem dims.
+- `conductor/tier2/agents/tier2-autonomous.md` — the Tier 2 agent prompt (this file is the track-specific supplement).
+- `conductor/tier2/commands/tier-2-auto-execute.md` — the execute command.
+- `docs/reports/RESULT_MIGRATION_CAMPAIGN_STATUS_20260619.md` — the 100%-complete result migration campaign (the v2 audit runs against this final state).
+- `docs/reports/ANY_TYPE_AUDIT_20260621.md` — the 89-site audit that informed the 3 candidate aggregates.
+- `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md` — the cost analysis that informed the `ProviderHistory` candidate (NOT on master; reverted with the merge).
+- `conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_1_20260620.md` — the v3.1 nagent review (Candidate 27: Markdown + custom DSL lock-in is the direct application of the v2's custom postfix DSL).
@@ -0,0 +1,200 @@
+{
+ "id": "code_path_audit_20260607",
+ "title": "Code Path & Data Pipeline Audit v2",
+ "type": "tooling",
+ "status": "active",
+ "priority": "A",
+ "created": "2026-06-07",
+ "last_revised": "2026-06-22",
+ "owner": "tier2-tech-lead",
+ "parent_umbrella": null,
+ "spec": "conductor/tracks/code_path_audit_20260607/spec_v2.md",
+ "plan": "conductor/tracks/code_path_audit_20260607/plan_v2.md",
+ "spec_v1_preserved": "conductor/tracks/code_path_audit_20260607/spec.md (v1, never executed; preserved unchanged)",
+ "plan_v1_preserved": "conductor/tracks/code_path_audit_20260607/plan.md (v1, never executed; preserved unchanged)",
+ "v2_revision_rationale": "v1 was authored 2026-06-07 before the 4 foundational tracks shipped; v1 framing is now stale. v2 re-scopes the audit from 'expensive operations per action' to 'data pipelines per aggregate' + a decomposition-cost heuristic (componentize vs unify) per aggregate. v2 also cross-validates data_structure_strengthening + data_oriented_error_handling directly (the 2 foundational tracks didn't exist on 2026-06-07).",
+ "scope": {
+ "files_created": 18,
+ "files_created_paths": [
+ "src/code_path_audit.py",
+ "tests/test_code_path_audit.py",
+ "tests/test_code_path_audit_live_gui.py",
+ "tests/fixtures/synthetic_src/__init__.py",
+ "tests/fixtures/synthetic_src/type_aliases.py",
+ "tests/fixtures/synthetic_src/ai_client.py",
+ "tests/fixtures/synthetic_src/aggregate.py",
+ "tests/fixtures/synthetic_src/gui_2.py",
+ "tests/fixtures/synthetic_src/cleanup.py",
+ "tests/fixtures/synthetic_src/overrides.toml",
+ "tests/fixtures/audit_inputs/audit_weak_types.json",
+ "tests/fixtures/audit_inputs/audit_exception_handling.json",
+ "tests/fixtures/audit_inputs/audit_optional_in_3_files.json",
+ "tests/fixtures/audit_inputs/audit_no_models_config_io.json",
+ "tests/fixtures/audit_inputs/audit_main_thread_imports.json",
+ "tests/fixtures/audit_inputs/type_registry.json",
+ "scripts/audit_code_path_audit_coverage.py",
+ "conductor/code_styleguides/code_path_audit.md"
+ ],
+ "files_modified": 1,
+ "files_modified_paths": [
+ "scripts/audit_optional_in_3_files.py (+1 line: add src/code_path_audit.py to the baseline list)"
+ ],
+ "files_preserved_v1": [
+ "conductor/tracks/code_path_audit_20260607/spec.md (v1)",
+ "conductor/tracks/code_path_audit_20260607/plan.md (v1)"
+ ],
+ "phases": 14,
+ "tasks": 85,
+ "tests_total": 91,
+ "tests_unit": 84,
+ "tests_integration": 7,
+ "tests_live_gui_opt_in": 2,
+ "aggregates_total": 13,
+ "aggregates_real": 10,
+ "aggregates_candidate": 3,
+ "rollups": 4,
+ "follow_up_tracks": 5
+ },
+ "depends_on": [
+ "data_oriented_error_handling_20260606 (SHIPPED; the v2 audit's result_coverage cross-checks this)",
+ "data_structure_strengthening_20260606 (SHIPPED; the v2 audit's type_alias_coverage cross-checks this)",
+ "mcp_architecture_refactor_20260606 (SHIPPED; provides the 6 input audit scripts' baselines)",
+ "qwen_llama_grok_integration_20260606 (SHIPPED; the v2 audit covers the 8 _send_<vendor> functions)",
+ "result_migration_20260616 (100% complete as of 2026-06-21; the v2 audit runs against the post-migration src/)"
+ ],
+ "blocks": [
+ "pipeline_runtime_profiling_20260607 (preserved from v1; calibrates v2's heuristic cost constants against real measurements)",
+ "data_pipelines_inventory_<date> (per-pipeline vs per-aggregate reports for the top 5 pipelines)",
+ "code_path_audit_in_ci_<date> (run v2 in CI on every PR)",
+ "code_path_audit_data_oriented_refactor_<date> (implement the 3 high-priority componentize candidates)",
+ "code_path_audit_v2_5_followup_<date> (re-run v2 after any_type_componentization_20260621 merges)"
+ ],
+ "out_of_scope": [
+ "No modifications to existing src/*.py files (read-only on the 65 existing files; the v2 audit doesn't change them).",
+ "No modifications to the 5 existing audit scripts (consume their JSON; don't change them).",
+ "No runtime profiling (deferred to pipeline_runtime_profiling_20260607).",
+ "No new pip dependencies (stdlib only: ast, pathlib, json, dataclasses, tomllib, re).",
+ "No changes to data_structure_strengthening or data_oriented_error_handling styleguides.",
+ "No changes to v1 spec.md or plan.md (v1 preserved unchanged).",
+ "No MMA worker spawn action (preserved from v1; user directive 2026-06-07: cold until 1:1 discussion UX is dogfooded).",
+ "No new src/<thing>.py files (per AGENTS.md file size + naming convention: helpers and sub-systems go in the parent module).",
+ "The 23 lower-impact files (1-9 weak-type sites each; deferred to a follow-up track).",
+ "The 3 candidate aggregates' 'real' analysis (deferred to code_path_audit_v2_5_followup_<date>).",
+ "The v1-style per-action output is preserved for backward compat but downgraded to cross-references."
+ ],
+ "tolerated_at_run_time": [
+ "any_type_componentization_20260621 is NOT on master (merged f914b2bc, reverted 751b94d4); the v2 audit produces placeholders for the 3 candidate aggregates with is_candidate: True.",
+ "phase2_4_5_call_site_completion_20260621 is NOT on master (same merge+revert history).",
+ "Missing input JSONs in tests/artifacts/audit_inputs/ are tolerated (the corresponding cross_audit_findings field is empty; the markdown notes the absence).",
+ "Malformed input JSONs are tolerated (the read_input_json() returns Result with errors; the v2 audit continues with empty data)."
+ ],
+ "test_summary": {
+ "tests_total": 91,
+ "tests_unit": 84,
+ "tests_integration": 7,
+ "tests_live_gui_opt_in": 2,
+ "test_tier_count": 11,
+ "test_pass_count_target": "All 91 tests PASS; the 2 live_gui are opt-in (CODE_PATH_AUDIT_LIVE_GUI=1)"
+ },
+ "verification_criteria": [
+ "FR-1: src/code_path_audit.py is created with the 11 public functions + 4 static analyzers (PCG, MemoryDim, APD, CFE) + 4 renderers (to_dsl_v2, to_markdown, to_tree, parse_dsl_v2) + run_audit() main entry + CLI + MCP tool wrapper",
+ "FR-2: All 11 public functions return Result[T] per error_handling.md (or return a deterministic T when no runtime failure is possible)",
+ "FR-3: The 4 audit gates pass in --strict mode (audit_exception_handling, audit_weak_types, audit_main_thread_imports, audit_no_models_config_io)",
+ "FR-4: The meta-audit (scripts/audit_code_path_audit_coverage.py) passes on the real audit output (0 schema violations)",
+ "FR-5: The type registry is in sync with src/type_aliases.py (scripts/generate_type_registry.py --check exits 0)",
+ "FR-6: 91 tests pass (84 unit + 7 integration; 2 live_gui are opt-in)",
+ "FR-7: The audit output (13 per-aggregate .dsl + .md + .tree files + 4 rollups) is committed to docs/reports/code_path_audit/2026-06-22/",
+ "FR-8: The TRACK_COMPLETION report is written to docs/reports/TRACK_COMPLETION_code_path_audit_20260607.md",
+ "FR-9: conductor/tracks.md is updated with the v2 track entry (the checkpoint SHA from the TRACK_COMPLETION report commit)",
+ "FR-10: The 1-line extension to scripts/audit_optional_in_3_files.py is committed; the extended audit passes in --strict mode",
+ "FR-11: conductor/code_styleguides/code_path_audit.md is written (the 5-convention styleguide)",
+ "Atomic per-task commits with git notes per conductor/workflow.md step 9.1-9.3",
+ "No day estimates, no T-shirt sizes in any artifact"
+ ],
+ "risks": [
+ {
+ "id": "R1",
+ "description": "The decomposition-cost heuristic is inaccurate (componentize_savings overestimate or underestimate)",
+ "mitigation": "The runtime-profiling follow-up recalibrates. The override file (scripts/code_path_audit_overrides.toml) lets the user adjust per-aggregate. The summary.md and decomposition_matrix.md headers caveat: 'Savings estimates are heuristic; use as ranking input, not as actual savings.'"
+ },
+ {
+ "id": "R2",
+ "description": "The PCG misses dynamic patterns (eval, getattr, decorator-driven dispatch like @imscope)",
+ "mitigation": "The override file lists the known passthroughs. The runtime-profiling follow-up catches the unresolved. The v1 spec's 'unresolved_calls' pattern is preserved."
+ },
+ {
+ "id": "R3",
+ "description": "The 6 input JSON contracts drift (the existing audit scripts evolve without bumping the v2 audit's contract)",
+ "mitigation": "The scripts/audit_code_path_audit_coverage.py meta-audit runs in CI; fails on schema drift. The v2 audit tolerates missing fields (returns empty cross_audit_findings; markdown notes the absence)."
+ },
+ {
+ "id": "R4",
+ "description": "The candidate aggregates don't merge (any_type_componentization_20260621 is delayed)",
+ "mitigation": "The v2 audit is forward-compatible. The is_candidate: bool flag handles the absence gracefully. The candidates.md rollup explains the placeholder status."
+ },
+ {
+ "id": "R5",
+ "description": "The v1 .dsl files don't round-trip (the v2 parser is more strict than v1)",
+ "mitigation": "The v2 parser is a superset of v1; the v1 action reports still parse. The test_v2_dsl_backward_compat_v1 test verifies."
+ },
+ {
+ "id": "R6",
+ "description": "The synthetic src/ fixture diverges from real src/ (the test expectations don't generalize)",
+ "mitigation": "The integration test layer runs against real src/ as well as the synthetic fixture. The 2 are decoupled."
+ },
+ {
+ "id": "R7",
+ "description": "The 4 audit gates regress during implementation (Tier 3 worker adds a try/except violation, Optional[T] return, etc.)",
+ "mitigation": "Run the 4 audit gates in --strict mode after every commit. If a gate fails, fix before continuing. The audit scripts are the 'laws of physics' for the new file."
+ },
+ {
+ "id": "R8",
+ "description": "The 85+ tasks exceed Tier 2's per-task context window (the model runs out of memory mid-track)",
+ "mitigation": "Per-task commits are atomic; the failcount state file persists progress. The per-task commit discipline means each commit is a safe rollback point. If a task fails 3 times, escalate to the user (don't keep retrying)."
+ },
+ {
+ "id": "R9",
+ "description": "The 91 tests are too long-running for the per-PR CI gate (the user expects <2 min for unit tests)",
+ "mitigation": "The unit + integration tests run in <30s. The live_gui tests are opt-in via the CODE_PATH_AUDIT_LIVE_GUI env var. The 2 opt-in tests are not in the default run."
+ },
+ {
+ "id": "R10",
+ "description": "The Tier 2 agent uses a git command that is hard-banned (git restore, git checkout, git reset, git push)",
+ "mitigation": "The 3-layer hard ban enforcement (OpenCode permission + Windows restricted token + git hooks) catches the violation. The TIER2_STARTUP.md restates the hard bans. If a task requires one, escalate to the user."
+ }
+ ],
+ "out_of_scope": [
+ "Modifications to existing src/*.py files (read-only on the 65 existing files)",
+ "Modifications to the 5 existing audit scripts (consume their JSON; don't change them)",
+ "Runtime profiling (deferred to pipeline_runtime_profiling_20260607)",
+ "New pip dependencies (stdlib only)",
+ "Changes to data_structure_strengthening or data_oriented_error_handling styleguides",
+ "Changes to v1 spec.md or plan.md (v1 preserved)",
+ "MMA worker spawn action (cold per user)",
+ "New src/<thing>.py files (per AGENTS.md file size + naming convention)",
+ "The 23 lower-impact files (deferred)",
+ "The 3 candidate aggregates' real analysis (deferred to v2.5 follow-up)"
+ ],
+ "follow_up_tracks": [
+ {
+ "id": "pipeline_runtime_profiling_20260607",
+ "purpose": "Calibrate v2's heuristic cost constants against real measurements. Uses src/performance_monitor.py."
+ },
+ {
+ "id": "data_pipelines_inventory_<date>",
+ "purpose": "Per-pipeline (vs per-aggregate) reports for the top 5 pipelines."
+ },
+ {
+ "id": "code_path_audit_in_ci_<date>",
+ "purpose": "Run v2 in CI on every PR; fail on new untyped sites or decomposition-matrix regression."
+ },
+ {
+ "id": "code_path_audit_data_oriented_refactor_<date>",
+ "purpose": "Implement the 3 high-priority componentize candidates (FileItems, History, Metadata)."
+ },
+ {
+ "id": "code_path_audit_v2_5_followup_<date>",
+ "purpose": "Re-run v2 after any_type_componentization_20260621 merges; the 3 placeholders become real profiles."
+ }
+ ]
+}
@@ -0,0 +1,64 @@
+# Track state for code_path_audit_20260607
+# v2 supersedes v1; spec_v2.md + plan_v2.md are the canonical artifacts
+# (v1's spec.md + plan.md are preserved unchanged, never executed)
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = "code_path_audit_20260607"
+name = "Code Path & Data Pipeline Audit v2"
+status = "active"  # active | completed
+current_phase = 0  # 0 = pre-Phase 1; 1..N = in Phase N; "complete" if all phases done
+last_updated = "2026-06-22"
+
+[parent]
+# Independent track (not part of an umbrella)
+
+[blocked_by]
+# No blockers. The 5 foundational tracks (data_oriented_error_handling_20260606,
+# data_structure_strengthening_20260606, mcp_architecture_refactor_20260606,
+# qwen_llama_grok_integration_20260606, result_migration_20260616) are SHIPPED.
+# The 2 candidate-related tracks (any_type_componentization_20260621,
+# phase2_4_5_call_site_completion_20260621) are NOT on master; the v2 audit
+# is tolerant of their absence (forward-compat placeholders).
+
+[blocks]
+# 5 follow-up tracks (see metadata.json follow_up_tracks)
+
+[phases]
+# 14 phases per plan_v2.md
+phase_0 = { status = "pending", checkpointsha = "", name = "Setup (state.toml, empty files, fixture dirs)" }
+phase_1 = { status = "pending", checkpointsha = "", name = "Data model (5 enums + 9 supporting dataclasses + AggregateProfile)" }
+phase_2 = { status = "pending", checkpointsha = "", name = "PCG (3 AST passes: P1 return types, P2 parameter types, P3 field access)" }
+phase_3 = { status = "pending", checkpointsha = "", name = "MemoryDim classifier (canonical mappings + file-of-origin + override)" }
+phase_4 = { status = "pending", checkpointsha = "", name = "APD (5 access patterns + 25% dominance rule)" }
+phase_5 = { status = "pending", checkpointsha = "", name = "CFE (7 frequencies + entry-point detection + override file)" }
+phase_6 = { status = "pending", checkpointsha = "", name = "Decomposition cost (4 directions + auto-generated rationale)" }
+phase_7 = { status = "pending", checkpointsha = "", name = "Cross-audit integration (6 input JSONs + 3-tier mapping)" }
+phase_8 = { status = "pending", checkpointsha = "", name = "v2 DSL (14 new tagged words + flat-section format)" }
+phase_9 = { status = "pending", checkpointsha = "", name = "run_audit() main entry + CLI + MCP tool" }
+phase_10 = { status = "pending", checkpointsha = "", name = "Integration tests (synthetic src/ + audit_inputs/ fixtures)" }
+phase_11 = { status = "pending", checkpointsha = "", name = "Live_gui E2E tests (opt-in via CODE_PATH_AUDIT_LIVE_GUI=1)" }
+phase_12 = { status = "pending", checkpointsha = "", name = "Meta-audit + 1-line extension + styleguide" }
+phase_13 = { status = "pending", checkpointsha = "", name = "End-of-track report + tracks.md update" }
+
+[verification]
+data_model_tests_passing = false
+pcg_tests_passing = false
+memory_dim_tests_passing = false
+apd_tests_passing = false
+cfe_tests_passing = false
+decomposition_cost_tests_passing = false
+cross_audit_integration_tests_passing = false
+v2_dsl_tests_passing = false
+renderers_tests_passing = false
+integration_tests_passing = false
+live_gui_tests_passing = false
+meta_audit_passing = false
+all_4_audit_gates_passing = false
+type_registry_check_passing = false
+audit_run_completed = false
+summary_md_approved = false
+optimization_candidates_md_approved = false
+truncation_md_approved = false
+track_completion_report_written = false
+tracks_md_updated = false