ed 1a739ecef5 conductor(spec+plan): phase2_4_5_call_site_completion_20260621 + code_path_audit pre-flight adjustments + Phase 3 analysis

PHASE 2/4/5 FOLLOW-UP TRACK (Tier 1 decided SHRINK to 6a + 6b + 6d):
- Phase 6a: Fix HookServer.broadcast() callers (app_controller.py + events.py + gui_2.py)
  Adds tests/test_websocket_broadcast_regression.py with no-TypeError assertion
- Phase 6b: Complete _send_grok/_send_minimax/_send_llama OpenAICompatibleRequest migration
- Phase 6d: Update those 3 senders' NormalizedResponse to use UsageStats

Total: ~16 atomic commits, ~3 hours Tier 2 work. Unblocks code_path_audit_20260607.

CODE_PATH_AUDIT_20260607 PRE-FLIGHT ADJUSTMENTS (per handoffs):
- Add 2 new actions: provider_history_append + websocket_broadcast
- Add 5 micro-benchmarks: NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__
- Add no-TypeError-errors-on-any-thread assertion (backs test_websocket_broadcast_regression.py)
- Add 89 fat-struct sites from ANY_TYPE_AUDIT_20260621.md as instrumented targets
- BLOCKER: phase2_4_5_call_site_completion_20260621 (broadcast() TypeError)

PHASE 3 HYPOTHETICAL ANALYSIS (separate doc):
docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md - dataclass definitions (already on tier2 branch),
per-provider codepath catalog (112 sites), qualitative cost estimation (~+1-2ms per session,
~+8-15us per _send_anthropic turn). Input for the audit; the audit quantifies the cost.

REGISTRATION:
conductor/tracks.md updated: new row 27 (follow-up), new row 28 (parent any_type_componentization),
row 17 (code_path_audit) updated with pre-flight adjustments note.

Files:
- conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md (NEW; 633 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/plan.md (NEW; 7 phases, 23 tasks)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/metadata.json (NEW; 8.8KB)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/state.toml (NEW; 11.8KB)
- docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (NEW; 380 lines; qualitative cost analysis)
- conductor/tracks/code_path_audit_20260607/spec.md (MODIFIED; +93 lines Pre-Flight Adjustments)
- conductor/tracks.md (MODIFIED; +35 lines: 3 new entries + 1 stale row fix)

2026-06-21 18:32:02 -04:00


Track: Code Path & Data Pipeline Audit

Status: Spec approved 2026-06-07; revised 2026-06-08 with post-4-tracks timing and 5-source framing
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: Medium (foundational; enables follow-up pruning track)

Revision note (2026-06-08). The user specified that this audit should run after the 4 foundational tracks complete (qwen_llama_grok_integration_20260606, data_oriented_error_handling_20260606, data_structure_strengthening_20260606, mcp_architecture_refactor_20260606). The 4 tracks will significantly reshape src/ai_client.py, src/mcp_client.py, src/app_controller.py, and src/type_aliases.py — running the audit on the pre-refactor code would produce a report that's stale on day 1. The post-4-tracks timing ensures the audit grounds optimization decisions for the resulting architecture, not the pre-refactor one. See §"Timing" below.

Overview

Build src/code_path_audit.py — a data-oriented static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. The output (custom postfix .dsl data + markdown + Mermaid + prefix tree text) is the artifact that informs pipeline-pruning decisions; the actual code changes are a follow-up track (pipeline_pruning_20260607).

Per the user's framing: "anything that can even remotely smell as an expensive bulk action or major action that takes more than 10-40 microseconds." The audit focuses on expensive operations (file I/O, network, AST parsing, big loops, anything that smells like a bulk action) inside the 3 actions — not on every state mutation. The cost model is heuristic, calibrated by a runtime-profiling follow-up (pipeline_runtime_profiling_20260607) that catches the cases static analysis can't resolve (C-extension cost, import cost, JIT effects, decorator-driven dispatch).

The MMA worker spawn action is out of scope for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").

Timing (post-4-tracks)

This track is intentionally deferred until after the 4 foundational tracks ship:

qwen_llama_grok_integration_20260606 — adds 3 vendors (_send_qwen, _send_llama, _send_grok) and refactors _send_minimax to use the shared send_openai_compatible() helper. Modifies src/ai_client.py, src/openai_compatible.py (new), src/vendor_capabilities.py (new).
data_oriented_error_handling_20260606 — refactors ai_client._send_<vendor> to return Result[str], modifies mcp_client.py (30+ sites), rag_engine.py (Result returns).
data_structure_strengthening_20260606 — adds src/type_aliases.py with 10 TypeAliases, replaces 345 weak-type sites across 6 files.
mcp_architecture_refactor_20260606 — splits src/mcp_client.py (2,205 lines → 6 sub-MCPs + 1 external), adds src/mcp_client_legacy.py for backward compat.

Running the audit on the pre-refactor src/ would produce a report that's stale on day 1. The post-4-tracks timing ensures:

The audit's data grounds optimization decisions for the resulting architecture (post-Fleury-style "effective codepaths" and "ECS archetype tables" if the 4 tracks are implemented with the data-oriented philosophy).
The pipeline_pruning_20260607 follow-up has the right candidates to optimize — the 4 tracks will move the expensive ops around, and pruning the wrong ones wastes work.
The runtime-profiling follow-up (pipeline_runtime_profiling_20260607) measures the new code paths, not the old ones.

Pre-flight check (verifies the 4-tracks baseline before this track starts): confirm that all 4 tracks are marked [x] completed in conductor/tracks.md. If any of the 4 are still [~] in-progress, this track is blocked — the audit would catch the in-progress state as drift.

Analytical Framing (5-source lens)

The 5 sources loaded into context for the post-4-tracks audit collectively reframe what to look for in the 3 actions. The audit's static cost model and pipeline-pruning recommendations should be informed by:

Source	Lens the audit inherits
Ryan Fleury, "A Taxonomy of Computation Shapes" (Feb 2023)	The 6 shapes: instruction, codepath, wide codepath, codecycle, wide codecycle, codecycle graph. The audit's `trace_action` is a codepath visualization; the `redundancy` (call_count > 1) field detects wide codepaths that could be split into parallel sub-codepaths.
Ryan Fleury, "The Codepath Combinatoric Explosion" (Apr 2023)	The "effective codepath" concept. The audit's `pipelining_candidates` field detects codepaths that could be defused (multiple real codepaths collapsed into 1 effective codepath via nil sentinels, generational handles, or immediate-mode APIs). The `redundancy` field is the first indicator of defusing opportunities.
Casey Muratori, "The Big OOPs: Anatomy of a Thirty-Five-Year Mistake" (BSC 2025)	The 35-year-historical indictment of compile-time domain hierarchies. The audit's per-function `state_mutations` index reveals whether a function is in the system pattern (mutates component-like data, not entity state) or the entity-hierarchy pattern (mutates a single object's identity, where the cost compounds per type). Functions in the latter pattern are the highest-priority refactor targets — they may need to be split into components + systems.
Andrew Reece, "Assuming as Much as Possible" (BSC 2025)	The "assume as much as possible" engineering discipline. The audit's `expensive_ops` index, for any function that calls a general-purpose primitive (e.g., `json.dumps`, `Path.read_text`, `ast.parse`), should ask: "can this caller assume a smaller input domain and use a specialized primitive instead?" A function that calls `json.dumps` 50 times per action with 1KB payloads each may be replaceable by a function that calls a domain-specific serializer once with a 50KB payload.
User's chunk-ideation archive (May 2026)	The "fixed-size slices" + "ECS archetype tables" pattern. The audit's per-function calls that operate on lists/arrays should be flagged if they: (a) don't have a chunk-aware variant, (b) are in a hot path, (c) the data shape is uniform enough to chunk. Functions that match all 3 are the prime candidates for `pipeline_pruning_20260607` — chunkification is a known pattern with bounded risk.

Concrete audit-time heuristics that emerge from this framing:

Effective-codepath count: when a function has 3+ branches that all do roughly the same thing with different inputs, the audit should report "this is N real codepaths behaving as 1 effective codepath — could be defused with a nil sentinel or generational handle." The runtime-profiling follow-up measures the actual savings.
Entity-hierarchy fingerprint: when a function's state_mutations list has > 3 writes to a single self.X with a type discriminator, the audit should report "this function is operating on entity-hierarchy state; consider ECS split into components + systems." A concrete Manual Slop example the audit should catch: any function that does if self.active_ticket.kind == TicketKind.X: and then mutates multiple fields.
Assumed-too-much detector: when a function calls ast.parse (or any tree_sitter.*) on a file that could be assumed to be already-parsed (because the file is in the context composition and the aggregate.py pipeline has already done it), the audit should report "this is re-parsing data that was already parsed upstream; consider memoizing or threading the parsed AST through." This is the "assume as much as possible" pattern at the data-passing level.
Chunkification candidates: when a function loops over a list[dict] with a known uniform shape (heuristic: all dicts have the same key set), the audit should report "consider chunkifying — uniform data, hot path, no chunk awareness." The user has explicit code (docs/ideation/ed_chunk_data_structures_20260523.md) for the chunk pattern, so the audit's optimization candidates can cite it.

These heuristics are guidance for interpreting the audit's report; they don't change the audit's static cost model (which is data-grounded in the existing EXPENSIVE_THRESHOLD + per-class weights). They shape how the Tier 2 Tech Lead and the user interpret the report.
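As an illustration of the entity-hierarchy fingerprint, here is a minimal sketch using only the standard `ast` module. It is not the audit's implementation: the function names, the `WRITE_LIMIT` constant, and the sample `Board` class are hypothetical, and the discriminator is assumed to be any attribute literally named `kind` in the `if` test.

```python
import ast
from collections import Counter

WRITE_LIMIT = 3  # spec wording: "> 3 writes to a single self.X"

# Hypothetical sample source shaped like the Manual Slop example.
SOURCE = '''
class Board:
    def apply(self):
        if self.active_ticket.kind == TicketKind.BUG:
            self.active_ticket.status = "open"
            self.active_ticket.owner = None
            self.active_ticket.priority = 1
            self.active_ticket.label = "bug"
'''

def self_attr_root(node):
    """Return 'self.X' for a target such as self.X.y.z, else None."""
    chain = []
    while isinstance(node, ast.Attribute):
        chain.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name) and node.id == "self" and chain:
        return "self." + chain[-1]  # chain[-1] is the attribute nearest to self
    return None

def entity_hierarchy_hits(source):
    """Find (function, self.X, write_count) where a kind-gated branch writes self.X often."""
    hits = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        for branch in ast.walk(fn):
            if not isinstance(branch, ast.If):
                continue
            # Assumed discriminator: the test mentions an attribute named "kind".
            gated = any(isinstance(n, ast.Attribute) and n.attr == "kind"
                        for n in ast.walk(branch.test))
            if not gated:
                continue
            writes = Counter()
            for stmt in ast.walk(branch):
                if isinstance(stmt, ast.Assign):
                    for target in stmt.targets:
                        root = self_attr_root(target)
                        if root:
                            writes[root] += 1
            hits.extend((fn.name, root, n) for root, n in writes.items() if n > WRITE_LIMIT)
    return hits

hits = entity_hierarchy_hits(SOURCE)
```

A real detector would also need to handle `AugAssign`, `match` statements, and nested branches (which this sketch counts once per enclosing `if`).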

Current State Audit (as of `ca781543`)

src/ has 61 .py files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.

Already Implemented (DO NOT re-implement; KEEP / build on)

src/mcp_client.py:934-992 — derive_code_path(target, max_depth=5). A single-symbol recursive call tracer with text output. Doesn't render multi-action graphs, doesn't track mutations, doesn't measure cost. The new tool is the multi-action + mutation + cost version of this primitive. Build on this: lift the AST traversal logic and trace() recursion pattern into code_path_audit.py.
scripts/audit_main_thread_imports.py — static CI gate for import-time purity. Different concern (startup-time import cost), but its AST-walking pattern is the model for code_path_audit.py's implementation.
src/performance_monitor.py — runtime profiling with monitor.scope("name") and per-component hit counts + latencies. Used at runtime; the follow-up pipeline_runtime_profiling_20260607 track will use it to calibrate the heuristic cost model.
conductor/archive/code_path_analysis_20260507/ — prior manual audit + PIPELINE_ANALYSIS.md + Mermaid diagrams for the major pipelines. Manual effort, no reusable tool. New track is the data-grounded successor.
conductor/archive/ai_interaction_call_graph_20260507/ — sequence diagram for the AI loop. New track supersedes this for the 3 actions in scope.
SDM docstrings ([C: ...] / [M: ...] tags in src/*.py docstrings) — pre-computed caller/mutation info. The new audit tool will be a more rigorous version of what SDM already documents ad-hoc.

Gaps to Fill (this track's scope)

A static call-graph builder for all of src/ (multi-action, depth-configurable, machine-readable output).
A state-mutation index per function (5 mutation kinds: attr_write, container_mutate, file_write, ipc_emit, global_write).
An expensive-ops index (7 cost classes, with a heuristic data-size estimate).
A per-action traversal API (trace_action(action, max_depth=10) -> ActionProfile).
An output suite: custom postfix .dsl data files + markdown summaries + Mermaid per-action call graphs + prefix-tree text view.
A CLI (python -m src.code_path_audit --action <name>) and an MCP tool (code_path_audit(action_name, max_depth)).
The actual audit run on the 3 actions, with the report committed to docs/reports/code_path_audit/2026-06-07/.
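The first gap (a static call-graph builder) could start from something like the sketch below. It uses only the standard `ast` module; `collect_edges`, `dotted_name`, and the `demo` module name are hypothetical, and name resolution is deliberately naive (callees are recorded as written at the call site, not as fully-qualified names).

```python
import ast

def dotted_name(node):
    """Turn a call target like json.dumps into 'json.dumps'; None if not a plain name chain."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a method called on another call's result

def collect_edges(source, module):
    """Return (edges, unresolved): caller -> set of callee names, plus unresolvable call sites."""
    edges = {}
    unresolved = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        caller = f"{module}.{fn.name}"
        callees = edges.setdefault(caller, set())
        for node in ast.walk(fn):
            if isinstance(node, ast.Call):
                name = dotted_name(node.func)
                if name is None:
                    unresolved.append((caller, node.lineno))
                else:
                    callees.add(name)
    return edges, unresolved

# Hypothetical sample module.
SAMPLE = '''
import json

def save(data, path):
    text = json.dumps(data)
    write(path, text)

def write(path, text):
    open(path, "w").write(text)
'''

edges, unresolved = collect_edges(SAMPLE, "demo")
```

In the sample, `open(path, "w").write(text)` yields one resolved callee (`open`) and one unresolved call site, which is the kind of entry the report's "unresolved" list is meant to surface.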

Goals

Produce a queryable artifact. The custom postfix .dsl output is the source of truth; markdown + Mermaid + prefix-tree text are for human review. Re-run after any src/ change to see drift.
Surface the top-N optimization candidates per action. The summary.md ranks candidates by potential data-transform load reduction. This is what the user will use to decide which pruning/optimization work to do next.
Data-grounded design. The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place.
Reusable across actions. The trace_action API takes any Action (entry point + description). Adding a 4th action (e.g., MMA worker spawn, when it's no longer cold) is one Action(...) declaration.
Surface calibration gaps clearly. When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, getattr magic), the report flags it as "unresolved" so the runtime-profiling follow-up targets it.

Non-Goals

Not implementing the actual code optimizations — that's pipeline_pruning_20260607.
Not profiling runtime costs — that's pipeline_runtime_profiling_20260607.
Not analyzing the MMA worker spawn action (cold per user).
Not analyzing simulation/* or tests/* directories.
Not analyzing actions beyond the 3 in scope.
Not resolving C-extension call costs statically.
Not resolving decorator-driven call dispatch statically (e.g., @property, @imscope).
Not providing real microsecond measurements — the cost is heuristic (calibrated later).

Architecture

src/code_path_audit.py — single new module, no new dependencies. Exposes both an MCP tool surface (for agents) and a CLI (python -m src.code_path_audit ...).

Public API

from typing import Literal

class CallGraph:
    """Directed graph: nodes are functions; edges are call sites."""
    nodes: dict[str, "FunctionNode"]            # fully-qualified name -> node
    edges: dict[str, set[str]]                 # caller -> set of callees
    def add_edge(self, caller: str, callee: str) -> None: ...
    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]: ...
    def render_mermaid(self, root: str, max_depth: int = 5) -> str: ...

class FunctionNode:
    fqname: str                                # "src.ai_client.AIClient.send"
    file: str
    line: int
    calls: list[str]                           # all callees (resolved or not)
    state_mutations: list["StateMutation"]
    expensive_ops: list["ExpensiveOp"]

class StateMutation:
    target: str                                # "self.history", "module.events", "file:..."
    kind: Literal["attr_write", "container_mutate", "file_write", "ipc_emit", "global_write"]
    line: int

class ExpensiveOp:
    callee: str
    cost_class: Literal["file_io", "network", "ast_parse", "json_io", "pickle", "deep_copy", "loop_amplified"]
    data_size_estimate: int | None              # bytes or container length, heuristic
    line: int                                  # call site in the caller
    weight: int                                # cost_class_weight * data_size (or 1 if data_size unknown)

class Action:
    name: str                                  # "ai_message_lifecycle"
    entry_points: list[str]                    # ["src.app_controller.AppController.process_user_request", ...]
    description: str

class ActionProfile:
    action: Action
    call_graph: CallGraph                      # subgraph reachable from entry points
    expensive_ops: list[ExpensiveOp]           # all expensive ops in the subgraph
    state_mutations: list[StateMutation]       # all mutations in the subgraph
    redundancy: list[tuple[str, int]]          # (op_fqname, call_count) where count > 1
    pipelining_candidates: list[list[str]]     # groups of independent ops currently sequential
    total_load_estimate: int                   # sum(weight) heuristic
    unresolved_calls: list[str]                # calls the AST walker couldn't resolve
    mermaid: str                               # rendered Mermaid
    markdown: str                              # human-readable per-action report

def trace_action(action: Action, max_depth: int = 10) -> ActionProfile: ...
def build_call_graph(src_dir: str = "src") -> CallGraph: ...   # full call graph
def build_expensive_ops_index(cg: CallGraph) -> dict[str, list[ExpensiveOp]]: ...
def build_state_mutations_index(cg: CallGraph) -> dict[str, list[StateMutation]]: ...
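A minimal sketch of how `transitive_callees` might honor `max_depth`, written as a free function over a plain `dict[str, set[str]]` so it stands alone. The breadth-first strategy and the 5-node graph are illustrative assumptions, not the spec'd implementation.

```python
from collections import deque

def transitive_callees(edges, root, max_depth=10):
    """Collect every callee reachable from root within max_depth call hops."""
    seen = set()
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in edges.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen

# Synthetic 5-node graph: a -> b, c; b -> d; c -> d; d -> e
EDGES = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"e"}}

full = transitive_callees(EDGES, "a")
shallow = transitive_callees(EDGES, "a", max_depth=1)
```

Breadth-first order means each node is first reached at its shortest distance from the root, so the depth cutoff never hides a callee that is reachable within the limit by some other path.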

Cost Model (heuristic, calibrated by the runtime-profiling follow-up)

Pattern	Cost class	Default weight	Data size source
`open()`, `Path.read_`, `Path.write_`, `*.write_text`	`file_io`	100	file size from `Path.stat()` when resolvable, else `None`
`requests.`, `urllib.`, `websockets.*`, `client.send` (with httpx-like signatures)	`network`	500	payload size from param literal/typed hint
`ast.parse`, `ast.walk`, `tree_sitter.*`	`ast_parse`	200	source bytes from the path arg
`json.dump`, `json.load`, `tomli_w.dump`, `tomllib.load`	`json_io`	150	container length if param is a list/dict
`pickle.dump`, `pickle.load`	`pickle`	300	container length
`copy.deepcopy`	`deep_copy`	200	container length
Any call inside the body of a `for` / `while` loop	`loop_amplified`	caller_weight × loop_bound_estimate	loop bound = `range(...)` literal/arg, else 1

Expense threshold: EXPENSIVE_THRESHOLD = 40_000 (module-level constant). Any ExpensiveOp.weight > EXPENSIVE_THRESHOLD is flagged "expensive" in the per-action report. The 40,000 default matches the user's stated 10-40μs range; the runtime-profiling follow-up will calibrate it.

Unresolved calls: when the AST walker cannot resolve a callee (e.g., attribute access on self.X where X is set dynamically; getattr; decorator-wrapped method dispatch), the call goes into unresolved_calls with an "unresolved" cost class and weight 0. The report's caveats section notes these; the runtime-profiling follow-up measures them.
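The weight arithmetic can be sketched as follows. The per-class weights and the threshold come from the table and text above; the function names are hypothetical, and "or 1 if data_size unknown" is read here as "multiply by 1", which is an assumption about an ambiguous phrase.

```python
EXPENSIVE_THRESHOLD = 40_000

# Default weights copied from the cost model table.
COST_CLASS_WEIGHTS = {
    "file_io": 100,
    "network": 500,
    "ast_parse": 200,
    "json_io": 150,
    "pickle": 300,
    "deep_copy": 200,
}

def op_weight(cost_class, data_size_estimate=None):
    """cost_class_weight * data_size, treating an unknown size as a multiplier of 1."""
    size = 1 if data_size_estimate is None else data_size_estimate
    return COST_CLASS_WEIGHTS[cost_class] * size

def loop_amplified_weight(caller_weight, loop_bound_estimate=1):
    """loop_amplified row: caller weight times the estimated loop bound (1 if unknown)."""
    return caller_weight * loop_bound_estimate

def is_expensive(weight):
    return weight > EXPENSIVE_THRESHOLD

read_1kb = op_weight("file_io", 1_000)          # 100 * 1000
dump_50 = op_weight("json_io", 50)              # 150 * 50
dump_50_x10 = loop_amplified_weight(dump_50, 10)
```

Under these assumptions a single 1,000-byte file read is flagged, a 50-item `json.dump` is not, and the same dump inside a 10-iteration loop is.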

Out of the static analysis

C-extension call costs (imgui-bundle, tree-sitter native) — runtime profiling only.
Decorator-driven dispatch (e.g., @property, @imscope) — runtime profiling only.
Import cost at module load time — covered by the existing scripts/audit_main_thread_imports.py.
eval / exec calls — flagged as unresolved, not analyzed.

Per-Action Design

For each of the 3 actions, the audit is invoked with one or more entry points and a depth limit (default 10). The audit produces an ActionProfile that the report renders.

Action	Entry points	Expected high-cost ops the audit should surface
AI message lifecycle	`src.app_controller.AppController.process_user_request`, `src.ai_client.AIClient.send`, `src.aggregate.build_file_items`, `src.summarize._summarise_*`	Per-context-file AST parse in `build_file_items`; AI network call; history append + comms log append + session_logger file write; sub-agent summarization (network + AST, loop-amplified over context files)
Discussion save/load	`src.project_manager.save_project`, `src.project_manager.load_project`, `src.history.HistoryManager.save_snapshot`, `src.models.parse_history_entries`	`tomli_w.dump` / `tomllib.load` on project TOML; `json.dump` on comms log (loop-amplified per entry); history file read/write; AST parse on schema validation
GUI startup	`sloppy.main` → `gui_2.App.__init__`, `src.app_controller.AppController.__init__`, `src.paths._resolve_*`	`tomllib.load` on config.toml; AST parses for tool registration; file stat on log paths; `sloppy.py` first-frame import chain (covered by the existing `scripts/audit_main_thread_imports.py`)

The user can extend with more actions later (e.g., MMA worker spawn when it's no longer cold). Each action is one Action(...) declaration + a trace_action() call.

Output Format

CLI:

uv run python -m src.code_path_audit --action ai_message_lifecycle [--depth N] [--dsl] [--tree] [--markdown] [--mermaid]

MCP tool (for agents):

code_path_audit(action_name: str, max_depth: int = 10) -> dict
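One possible shape for the CLI surface, using `argparse` from the standard library. The flag names mirror the usage line above; `build_parser`, the help strings, and the default depth of 10 (taken from the `trace_action` signature) are assumptions of this sketch.

```python
import argparse

def build_parser():
    """Argument parser mirroring the usage line; output flags are simple on/off switches."""
    parser = argparse.ArgumentParser(prog="python -m src.code_path_audit")
    parser.add_argument("--action", required=True,
                        help="action name, e.g. ai_message_lifecycle")
    parser.add_argument("--depth", type=int, default=10, metavar="N",
                        help="maximum call depth to traverse")
    for flag in ("dsl", "tree", "markdown", "mermaid"):
        parser.add_argument(f"--{flag}", action="store_true",
                            help=f"emit the {flag} artifact")
    return parser

# Parse an explicit argv list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--action", "ai_message_lifecycle", "--depth", "5", "--tree", "--mermaid"]
)
```

How the tool behaves when no output flag is given (emit everything, or nothing) is left open by the usage line and is not decided here.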

Generated artifacts (all under docs/reports/code_path_audit/<YYYY-MM-DD>/):

File	Format	Purpose
`call_graph.dsl`	Custom postfix DSL	Full call graph (all of `src/`); machine-readable, parses in ~30 lines
`expensive_ops.dsl`	Custom postfix DSL	Expensive ops index (per-file, per-function)
`state_mutations.dsl`	Custom postfix DSL	State mutations index (per function)
`actions/<action>.dsl`	Custom postfix DSL	Per-action profile (machine-readable)
`actions/<action>.tree`	Prefix tree (text)	Per-action human-readable tree (for human review)
`actions/<action>.md`	Markdown	Per-action summary + table (for code review)
`actions/<action>.mmd`	Mermaid	Per-action call graph (visual)
`summary.md`	Markdown	Top-level cross-action summary + ranked optimization candidates
`optimization_candidates.md`	Markdown	Ranked list with: candidate, current cost, proposed reduction, effort, priority

The two follow-up tracks consume the .dsl files; the markdown + tree are for human review.

The custom DSL is postfix (RPN) with length-prefixed lists — no brackets, no braces, no commas, no colons. Each "word" is a tagged constructor that consumes a known number of args from the stack (e.g., fn consumes 3, exp-op consumes 5, mut consumes 3, N list consumes N items). Whitespace-tokenized. Strings are bare atoms when they have no whitespace; quoted only when needed. nil for null. \ for line comments. The DSL is deliberately NOT strict Forth — it's a custom postfix format tailored to the audit's record shapes (function, call, mutation, expensive op, pair, list).

Example of a single FunctionNode record:

\ FunctionNode: fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call
"process_response" call
"self.history" attr_write 110 mut
"open" file_io 100 120 exp-op

The prefix tree renderer is a separate human-readable view of the same data — top-down, ├─/└─/│ box-drawing, scannable. Generated by a recursive walker. Inlined in the markdown reports (optionally produced as actions/<action>.tree for tooling).
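A recursive walker of the kind described might look like this. It is a sketch only: `render_tree`, the `children` mapping, and the sample call names are hypothetical, and cycles are bounded by `max_depth` rather than detected.

```python
def render_tree(root, children, max_depth=10):
    """Render a top-down prefix tree with box-drawing branches."""
    lines = [root]

    def walk(node, prefix, depth):
        if depth >= max_depth:
            return
        kids = children.get(node, [])
        for i, kid in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(prefix + ("└─ " if last else "├─ ") + kid)
            # Continue the vertical rule only while siblings remain below.
            walk(kid, prefix + ("   " if last else "│  "), depth + 1)

    walk(root, "", 0)
    return "\n".join(lines)

CHILDREN = {
    "send": ["build_file_items", "process_response"],
    "build_file_items": ["ast.parse"],
}

tree = render_tree("send", CHILDREN)
# send
# ├─ build_file_items
# │  └─ ast.parse
# └─ process_response
```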

Why custom postfix DSL (not JSON, not s-expressions, not strict Forth):

Not JSON (JSON is ill-performant: quoting, escaping, hash table allocation, no streaming).
Not s-expressions (the bracket version drifts back toward s-exprs; the user wanted postfix specifically).
Not strict Forth (the user wants a format ideal for call-graph recording, not a Turing-complete Forth program).
Postfix (per user: "I want a post-fix heiarchy"): stack-based, no delimiters to count.
Length-prefixed lists (standard postfix solution for nesting): N list consumes N items, unambiguous.
Trivial parser (~30 lines: split + walk + evaluate tagged words against a known arity table).
Compact: ~30-40% fewer characters than JSON for the same data.
Streamable: no need to parse the whole file to find a record; you can scan for tags.
Extensible: add new metric types by adding new tagged words (metric(name value sample_size), histogram(buckets), etc.).
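To make the "split + walk + evaluate against an arity table" claim concrete, here is a sketch of a parser that handles the FunctionNode record shown earlier. The arities are read off that example (`call` takes 1, `exp-op` takes 4), which differs from the "exp-op consumes 5" wording above, so treat the table as an assumption. `parse_dsl` is named after the round-trip test in the verification plan, but this body is illustrative only.

```python
import re

# ASSUMPTION: arities inferred from the example record, not from a finalized grammar.
ARITY = {"fn": 3, "call": 1, "mut": 3, "exp-op": 4}
TOKEN = re.compile(r'"[^"]*"|\S+')  # a quoted string, or a run of non-whitespace

def pop_n(stack, n):
    """Remove and return the top n items in their original order."""
    cut = len(stack) - n
    items = stack[cut:]
    del stack[cut:]
    return items

def parse_dsl(text):
    stack = []
    for line in text.splitlines():
        for tok in TOKEN.findall(line):
            if tok == "\\":                  # bare backslash: rest of line is a comment
                break
            if tok.startswith('"'):
                stack.append(tok[1:-1])      # quoted atoms are always strings
            elif tok == "nil":
                stack.append(None)
            elif re.fullmatch(r"-?\d+", tok):
                stack.append(int(tok))
            elif tok == "list":              # N list: consume N, then N items
                stack.append(pop_n(stack, stack.pop()))
            elif tok in ARITY:               # tagged constructor
                stack.append((tok, pop_n(stack, ARITY[tok])))
            else:
                stack.append(tok)            # bare atom
    return stack

RECORD = r'''
\ FunctionNode: fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call
"process_response" call
"self.history" attr_write 110 mut
"open" file_io 100 120 exp-op
'''

records = parse_dsl(RECORD)
```

The sketch leaves each record on the stack as a `(tag, args)` tuple; how `call`, `mut`, and `exp-op` records attach to the preceding `fn` record is a design decision this example does not settle.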

Verification (TDD per `conductor/workflow.md`)

Unit tests in tests/test_code_path_audit.py:

CallGraph.add_edge + transitive_callees correctness on a synthetic 5-node graph.
ExpensiveOpIndex detects each of the 7 cost classes on synthetic source.
StateMutationIndex detects each of the 5 mutation kinds on synthetic source.
trace_action produces an ActionProfile for a synthetic action whose expected cost is computable by hand.
Custom postfix .dsl output round-trips (parse_dsl(to_dsl(profile)) == in-memory structure).
Prefix tree renderer produces well-formed box-drawing output for the 3 per-action reports.
Markdown output is well-formed (header per section, table per category).
Mermaid output parses as valid Mermaid syntax.

Smoke test: run python -m src.code_path_audit --action ai_message_lifecycle --depth 5 against a fixture project; verify the report is produced and contains the expected high-cost ops (per the table above).

Manual verification: the report is the deliverable. A Tier 2 Tech Lead + user review the produced summary.md to confirm the optimization candidates make sense.

Commit Structure (6 atomic commits, in order)

1. feat(audit): add code_path_audit data structures (CallGraph, ExpensiveOpIndex, StateMutationIndex)
   - src/code_path_audit.py (initial data structures)
   - tests/test_code_path_audit.py (unit tests)
2. feat(audit): add trace_action + ActionProfile + cost model
   - src/code_path_audit.py (extends with action tracing)
   - tests/test_code_path_audit.py (integration tests)
3. feat(audit): add custom postfix DSL writer + parser + tree renderer / markdown / Mermaid output
4. feat(audit): add MCP tool + CLI surface
5. docs(audit): run audit on 3 actions; commit report
   - docs/reports/code_path_audit/2026-06-07/* (the deliverable)
6. conductor(tracks): mark Code Path Audit track complete
   - tracks.md update

Each commit message includes a git notes add -m "..." summary per conductor/workflow.md step 9.1-9.3.

Risks

Risk	Likelihood	Impact	Mitigation
Heuristic cost model is imprecise; reported "expensive" ops aren't actually expensive at runtime.	Medium	Medium (false positives dilute the report)	`EXPENSIVE_THRESHOLD` is a module-level constant; the runtime-profiling follow-up calibrates it.
AST walking misses dynamic patterns (eval, getattr, decorator-driven dispatch).	Medium	Medium (under-estimates some calls)	Document the limitations in the report's caveats section; the runtime-profiling follow-up catches these.
Mermaid diagrams exceed renderable size for deep actions.	Medium	Low (visualization only)	Default `max_depth=5` for `--mermaid`; full graph available as `.dsl`.
The 3 actions' entry points are not exactly the functions the user has in mind.	Medium	Low (the report is the artifact; user can re-run with different entry points)	Document the chosen entry points in the report; CLI/MCP tool accepts any fully-qualified function name.
Report is too large to review (thousands of expensive ops).	Low	Medium	Per-action scoping; default `--depth 5`; ranked optimization candidates in `summary.md` make the top-N obvious.
Existing `derive_code_path` is the de-facto call-graph tool and the new one is redundant.	Low	Low (the new one is a strict superset)	`derive_code_path` stays as a thin wrapper around `code_path_audit.trace_action` for backward compat, OR gets a `@deprecated` shim.
The 3 actions are not actually the user's top 3 (user might have meant a different 3).	Low	Low (the tool is generic; re-run with different actions is one CLI call)	CLI accepts any `Action`; user can re-run.

Coordination with Pending Tracks

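As originally scoped, this track had no blockers and no conflicts with the 5 active planned tracks. The 2026-06-08 revision (see "Timing") defers it until the 4 foundational tracks ship, and Pre-Flight Adjustment A5 adds phase2_4_5_call_site_completion_20260621 as a blocker. Once it runs, its analysis can inform future refactors: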

Pending track	Could use this analysis for...
`qwen_llama_grok_integration_20260606`	Identifying redundant OpenAI-compatible request paths in `_send_*` functions
`data_oriented_error_handling_20260606`	Showing the call paths the new `Result[T]` return values will thread through
`data_structure_strengthening_20260606`	Pinpointing hot functions where the new type aliases matter most
`mcp_architecture_refactor_20260606`	Identifying which sub-MCPs have the most expensive operations (file_io vs network vs ast)
`test_batching_refactor_20260606`	Confirming which tests trigger the most expensive paths (to optimize test selection)

This track's analysis is read-only — it doesn't modify src/, doesn't change the public API, doesn't add tests to the existing test suite. The only new files are src/code_path_audit.py (the tool), tests/test_code_path_audit.py (the tests), and the report under docs/reports/code_path_audit/2026-06-07/.

Pre-Flight Adjustments (2026-06-21, per handoffs from `any_type_componentization_20260621`)

The any_type_componentization_20260621 track (shipped 2026-06-21 with 48/89 sites promoted) revealed that the 4 foundational tracks this audit was deferred behind have evolved. Specifically, 5 new hot-path dataclasses (ToolSpec, ChatMessage, UsageStats, ToolCall, WebSocketMessage) and 1 new module (provider_state.ProviderHistory) now exist. This audit must instrument them.

Per docs/handoffs/PROMPT_FOR_TIER_1.md and HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md, the following 4 adjustments are added to this audit's scope:

A1. Add 2 new actions to the per-action profiling

The existing 3 actions (ai_message_lifecycle, discussion_save_load, gui_startup) become 5:

Action	Codepath	Measures
`provider_history_append` (NEW)	`get_history(p).append(msg)` (or legacy `_anthropic_history.append(msg)`)	Per-turn append latency + lock acquire time + memory allocation per call. The hot path Phase 3 will refactor.
`websocket_broadcast` (NEW)	`broadcast(WebSocketMessage(...))` (post-Phase 6a)	Per-broadcast overhead (allocation + JSON serialization + WebSocket send). The GUI thread's per-event cost.
`ai_message_lifecycle` (existing)	`_send_<provider>` end-to-end	Total per-turn latency delta pre/post Phase 3 (`provider_state.ProviderHistory`). The 3 OpenAI-compatible providers (`grok`, `minimax`, `llama`) are newly instrumented (currently unprofiled).
`discussion_save_load` (existing)	`reset_session()` + project switch	Cold-path cost. The `clear_all()` migration's per-call delta.
`gui_startup` (existing)	`_PROVIDER_HISTORIES` dict init at module load	One-time init cost (6 `ProviderHistory()` instances + 6 locks).

A2. Add 5 micro-benchmarks to the audit's `optimization_candidates.md`

The audit's per-call cost estimates should include these 5 micro-benchmarks (added per HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md §7):

Micro-benchmark	Purpose	Expected overhead
`NormalizedResponse.__init__`	Dataclass construction vs the old 6-field dict literal	<1μs; immaterial
`WebSocketMessage.__init__`	Dataclass construction per broadcast	<5μs; the hot path concern
`UsageStats.__init__`	Nested dataclass construction per response	<500ns; negligible (4 int fields)
`ProviderHistory.lock` acquire	threading.Lock acquire overhead	<500ns; the threading hot path
`ToolSpec.__init__`	Dataclass construction per tool (45 tools, cold path)	<2μs; only at registration

The benchmarks are emitted to docs/reports/code_path_audit/<date>/micro_benchmarks.md.
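A micro-benchmark of this kind could be sketched with `timeit` from the standard library. The `UsageStats` class below is a stand-in with four int fields whose names are hypothetical (the real dataclass lives in the project), and `bench_ns` is an illustrative helper. No expected numbers are given here because they depend on the machine and interpreter.

```python
import threading
import timeit
from dataclasses import dataclass

@dataclass
class UsageStats:
    """Stand-in for the project's dataclass; field names are assumptions."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

def bench_ns(fn, number=20_000, repeat=3):
    """Best-of-repeat mean cost per call, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9

_lock = threading.Lock()

def lock_roundtrip():
    # Uncontended acquire + release, as a lower bound for the lock row.
    with _lock:
        pass

results = {
    "UsageStats.__init__": bench_ns(lambda: UsageStats(1, 2, 3, 4)),
    "Lock acquire+release": bench_ns(lock_roundtrip),
}

for name, ns in results.items():
    print(f"{name}: {ns:.0f} ns/call")
```

Taking the minimum over repeats follows the usual `timeit` guidance: the lowest run is the least disturbed by other processes. The figures include lambda and call overhead, so they are upper bounds on the construction cost itself.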

A3. Add the "no-TypeError-errors-on-any-thread" assertion

The audit's per-action profiling runs the 5 actions in a controlled harness. The audit MUST assert that no worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given (or any TypeError on any thread) appears in the harness output during profiling.

This assertion catches the broadcast() regression that any_type_componentization_20260621 introduced. The regression test that backs this assertion lives in tests/test_websocket_broadcast_regression.py (added by the phase2_4_5_call_site_completion_20260621 follow-up track).

If the assertion fires, the audit's output should:

Mark the affected action's profile as INSTRUMENTATION_CONTAMINATED
List the offending thread + traceback in the report's errors.md
Recommend re-running the audit AFTER phase2_4_5_call_site_completion_20260621 merges
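One way a harness could detect a TypeError on any worker thread is to install a temporary `threading.excepthook` (available since Python 3.8). The sketch below is an assumption about how the assertion might be backed, not the project's harness: `ThreadErrorCollector`, the one-argument `broadcast` stand-in, and the worker function are all hypothetical.

```python
import threading

class ThreadErrorCollector:
    """Context manager that records uncaught exceptions raised in threads."""

    def __init__(self):
        self.errors = []
        self._previous = None

    def __enter__(self):
        self._previous = threading.excepthook
        threading.excepthook = self._record
        return self

    def __exit__(self, *exc_info):
        threading.excepthook = self._previous
        return False

    def _record(self, args):
        name = args.thread.name if args.thread else "<unknown>"
        self.errors.append((name, args.exc_type, args.exc_value))

    def type_errors(self):
        return [e for e in self.errors if issubclass(e[1], TypeError)]

def broadcast(message):
    """Hypothetical stand-in for the post-fix one-argument broadcast()."""

def stale_caller():
    broadcast("event", {"payload": 1})  # old two-argument call shape -> TypeError

with ThreadErrorCollector() as collector:
    worker = threading.Thread(target=stale_caller, name="worker[queue_fallback]")
    worker.start()
    worker.join()

contaminated = bool(collector.type_errors())
```

This only sees exceptions that escape a thread's `run()`. The `worker[queue_fallback] error: ...` message quoted above looks like a caught-and-logged error, so a real harness would likely also need to scan captured log output for that text.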

A4. Add the 89 fat-struct sites as instrumented targets

The audit reads docs/reports/ANY_TYPE_AUDIT_20260621.md §3's table and tags each Any usage with (file:line, hot_path, cold_path, init_path). The 89 sites become per-action cost estimates that flow into optimization_candidates.md.

For the 48 promoted sites, the audit compares pre-refactor (legacy globals + dict literals) vs post-refactor (dataclass + registry). For the 41 deferred Phase 3 sites, the audit produces per-call cost estimates that inform the future Phase 3 follow-up track (see docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md for the qualitative estimates).

A5. Sequencing (BLOCKER)

This audit is now blocked by phase2_4_5_call_site_completion_20260621 (the broadcast() fix). Until Phase 6a merges, the GUI thread's worker[queue_fallback] TypeError spam contaminates the audit's per-action profiling.

Recommended sequence:

T0:  Tier 1 approves follow-up track                  (decision: SHRINK to 6a + 6b + 6d)
T1:  Tier 2 implements Phase 6a + 6b + 6d            (~3 hours, ~16 commits)
T2:  Tier 1 reviews + merges follow-up track
T3:  Tier 1 launches code_path_audit_20260607
T4:  Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)

A6. New coordination with `any_type_componentization_20260621`

This audit now has new dependencies beyond the original 4 foundational tracks:

Track	Status	Provides to this audit
`any_type_componentization_20260621`	Shipped 2026-06-21 (48/89 promoted)	The 5 dataclasses + 1 module; the 200-site dataclass-coverage baseline
`phase2_4_5_call_site_completion_20260621`	Spec'd 2026-06-21; not yet merged	The fix for the broadcast() TypeError; the "no-TypeError" assertion

This audit is blocked_by both tracks; it can start only once both have merged.

Follow-up

pipeline_runtime_profiling_20260607 (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing src/performance_monitor.py + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (EXPENSIVE_THRESHOLD + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: scripts/runtime_profiler.py + updated code_path_audit.py cost model.
pipeline_pruning_20260607 (the second follow-up; NOT in this track): implements the high-priority optimization candidates surfaced by this track's report. Will be scoped AFTER this track ships, since the report itself defines what to prune.

Out of Scope

MMA worker spawn action (deferred per user — keeping MMA cold until the 1:1 discussion UX is dogfooded in a few projects).
Implementing the optimization fixes (deferred to pipeline_pruning_20260607).
Runtime profiling (deferred to pipeline_runtime_profiling_20260607 per the user's explicit ask).
Other major actions beyond AI message, save/load, GUI startup.
C-extension call costs (deferred to runtime profiling).
Decorator-driven call dispatch (deferred to runtime profiling).
simulation/* and tests/* directories (analysis is src/-only for this track; can be extended later).
Modifying src/ (read-only analysis).
