manual_slop/conductor/tracks/code_path_audit_20260607/spec.md
ed 1a739ecef5 conductor(spec+plan): phase2_4_5_call_site_completion_20260621 + code_path_audit pre-flight adjustments + Phase 3 analysis
PHASE 2/4/5 FOLLOW-UP TRACK (Tier 1 decided SHRINK to 6a + 6b + 6d):
- Phase 6a: Fix HookServer.broadcast() callers (app_controller.py + events.py + gui_2.py)
  Adds tests/test_websocket_broadcast_regression.py with no-TypeError assertion
- Phase 6b: Complete _send_grok/_send_minimax/_send_llama OpenAICompatibleRequest migration
- Phase 6d: Update those 3 senders' NormalizedResponse to use UsageStats

Total: ~16 atomic commits, ~3 hours Tier 2 work. Unblocks code_path_audit_20260607.

CODE_PATH_AUDIT_20260607 PRE-FLIGHT ADJUSTMENTS (per handoffs):
- Add 2 new actions: provider_history_append + websocket_broadcast
- Add 5 micro-benchmarks: NormalizedResponse.__init__, WebSocketMessage.__init__,
  UsageStats.__init__, ProviderHistory.lock, ToolSpec.__init__
- Add no-TypeError-errors-on-any-thread assertion (backs test_websocket_broadcast_regression.py)
- Add 89 fat-struct sites from ANY_TYPE_AUDIT_20260621.md as instrumented targets
- BLOCKER: phase2_4_5_call_site_completion_20260621 (broadcast() TypeError)

PHASE 3 HYPOTHETICAL ANALYSIS (separate doc):
docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md - dataclass definitions (already on tier2 branch),
per-provider codepath catalog (112 sites), qualitative cost estimation (~+1-2ms per session,
~+8-15us per _send_anthropic turn). Input for the audit; the audit quantifies the cost.

REGISTRATION:
conductor/tracks.md updated: new row 27 (follow-up), new row 28 (parent any_type_componentization),
row 17 (code_path_audit) updated with pre-flight adjustments note.

Files:
- conductor/tracks/phase2_4_5_call_site_completion_20260621/spec.md (NEW; 633 lines)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/plan.md (NEW; 7 phases, 23 tasks)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/metadata.json (NEW; 8.8KB)
- conductor/tracks/phase2_4_5_call_site_completion_20260621/state.toml (NEW; 11.8KB)
- docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md (NEW; 380 lines; qualitative cost analysis)
- conductor/tracks/code_path_audit_20260607/spec.md (MODIFIED; +93 lines Pre-Flight Adjustments)
- conductor/tracks.md (MODIFIED; +35 lines: 3 new entries + 1 stale row fix)
2026-06-21 18:32:02 -04:00


Track: Code Path & Data Pipeline Audit

Status: Spec approved 2026-06-07; revised 2026-06-08 with post-4-tracks timing and 5-source framing
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: Medium (foundational; enables follow-up pruning track)

Revision note (2026-06-08). The user specified that this audit should run after the 4 foundational tracks complete (qwen_llama_grok_integration_20260606, data_oriented_error_handling_20260606, data_structure_strengthening_20260606, mcp_architecture_refactor_20260606). The 4 tracks will significantly reshape src/ai_client.py, src/mcp_client.py, src/app_controller.py, and src/type_aliases.py — running the audit on the pre-refactor code would produce a report that's stale on day 1. The post-4-tracks timing ensures the audit grounds optimization decisions for the resulting architecture, not the pre-refactor one. See §"Timing" below.


Overview

Build src/code_path_audit.py — a data-oriented static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. The output (custom postfix .dsl data + markdown + Mermaid + prefix tree text) is the artifact that informs pipeline-pruning decisions; the actual code changes are a follow-up track (pipeline_pruning_20260607).

Per the user's framing: "anything that can even remotely smell as an expensive bulk action or major action that takes more than 10-40 microseconds." The audit focuses on expensive operations (file I/O, network, AST parsing, big loops, anything that smells like a bulk action) inside the 3 actions — not on every state mutation. The cost model is heuristic, calibrated by a runtime-profiling follow-up (pipeline_runtime_profiling_20260607) that catches the cases static analysis can't resolve (C-extension cost, import cost, JIT effects, decorator-driven dispatch).

The MMA worker spawn action is out of scope for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").

Timing (post-4-tracks)

This track is intentionally deferred until after the 4 foundational tracks ship:

  1. qwen_llama_grok_integration_20260606 — adds 3 vendors (_send_qwen, _send_llama, _send_grok) and refactors _send_minimax to use the shared send_openai_compatible() helper. Modifies src/ai_client.py, src/openai_compatible.py (new), src/vendor_capabilities.py (new).
  2. data_oriented_error_handling_20260606 — refactors ai_client._send_<vendor> to return Result[str], modifies mcp_client.py (30+ sites), rag_engine.py (Result returns).
  3. data_structure_strengthening_20260606 — adds src/type_aliases.py with 10 TypeAliases, replaces 345 weak-type sites across 6 files.
  4. mcp_architecture_refactor_20260606 — splits src/mcp_client.py (2,205 lines → 6 sub-MCPs + 1 external), adds src/mcp_client_legacy.py for backward compat.

Running the audit on the pre-refactor src/ would produce a report that's stale on day 1. The post-4-tracks timing ensures:

  • The audit's data grounds optimization decisions for the resulting architecture (post-Fleury-style "effective codepaths" and "ECS archetype tables" if the 4 tracks are implemented with the data-oriented philosophy).
  • The pipeline_pruning_20260607 follow-up has the right candidates to optimize — the 4 tracks will move the expensive ops around, and pruning the wrong ones wastes work.
  • The runtime-profiling follow-up (pipeline_runtime_profiling_20260607) measures the new code paths, not the old ones.

Pre-flight check (verifies the 4-tracks baseline before this track starts): confirm that all 4 tracks are marked [x] completed in conductor/tracks.md. If any of the 4 are still [~] in-progress, this track is blocked — the audit would catch the in-progress state as drift.
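The pre-flight check above could be automated along these lines. This is a minimal sketch, not the track's actual gate: it assumes each row of conductor/tracks.md carries a checkbox-style marker (`[x]`, `[~]`, or `[ ]`) on the same line as the track ID, and the helper name `incomplete_tracks` is hypothetical.

```python
import re

# The four foundational track IDs named in the Timing section.
FOUNDATIONAL_TRACKS = (
    "qwen_llama_grok_integration_20260606",
    "data_oriented_error_handling_20260606",
    "data_structure_strengthening_20260606",
    "mcp_architecture_refactor_20260606",
)

# Assumption: the status marker sits on the same line as the track ID.
_MARKER = re.compile(r"\[([ x~])\]")


def incomplete_tracks(tracks_md: str, required=FOUNDATIONAL_TRACKS) -> list[str]:
    """Return the required track IDs that are missing or not marked [x]."""
    status: dict[str, str] = {}
    for line in tracks_md.splitlines():
        marker = _MARKER.search(line)
        if marker is None:
            continue
        for track in required:
            if track in line:
                status[track] = marker.group(1)
    return [track for track in required if status.get(track) != "x"]
```

An empty result means the baseline is in place; any returned ID means the audit stays blocked.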

Analytical Framing (5-source lens)

The 5 sources loaded into context for the post-4-tracks audit collectively reframe what to look for in the 3 actions. The audit's static cost model and pipeline-pruning recommendations should be informed by:

| Source | Lens the audit inherits |
|---|---|
| Ryan Fleury, "A Taxonomy of Computation Shapes" (Feb 2023) | The 6 shapes: instruction, codepath, wide codepath, codecycle, wide codecycle, codecycle graph. The audit's trace_action is a codepath visualization; the redundancy (call_count > 1) field detects wide codepaths that could be split into parallel sub-codepaths. |
| Ryan Fleury, "The Codepath Combinatoric Explosion" (Apr 2023) | The "effective codepath" concept. The audit's pipelining_candidates field detects codepaths that could be defused (multiple real codepaths collapsed into 1 effective codepath via nil sentinels, generational handles, or immediate-mode APIs). The redundancy field is the first indicator of defusing opportunities. |
| Casey Muratori, "The Big OOPs: Anatomy of a Thirty-Five-Year Mistake" (BSC 2025) | The 35-year-historical indictment of compile-time domain hierarchies. The audit's per-function state_mutations index reveals whether a function is in the system pattern (mutates component-like data, not entity state) or the entity-hierarchy pattern (mutates a single object's identity, where the cost compounds per type). Functions in the latter pattern are the highest-priority refactor targets; they may need to be split into components + systems. |
| Andrew Reece, "Assuming as Much as Possible" (BSC 2025) | The "assume as much as possible" engineering discipline. The audit's expensive_ops index, for any function that calls a general-purpose primitive (e.g., json.dumps, Path.read_text, ast.parse), should ask: "can this caller assume a smaller input domain and use a specialized primitive instead?" A function that calls json.dumps 50 times per action with 1KB payloads each may be replaceable by a function that calls a domain-specific serializer once with a 50KB payload. |
| User's chunk-ideation archive (May 2026) | The "fixed-size slices" + "ECS archetype tables" pattern. The audit's per-function calls that operate on lists/arrays should be flagged if they: (a) don't have a chunk-aware variant, (b) are in a hot path, (c) the data shape is uniform enough to chunk. Functions that match all 3 are the prime candidates for pipeline_pruning_20260607: chunkification is a known pattern with bounded risk. |

Concrete audit-time heuristics that emerge from this framing:

  • Effective-codepath count: when a function has 3+ branches that all do roughly the same thing with different inputs, the audit should report "this is N real codepaths behaving as 1 effective codepath — could be defused with a nil sentinel or generational handle." The runtime-profiling follow-up measures the actual savings.
  • Entity-hierarchy fingerprint: when a function's state_mutations list has > 3 writes to a single self.X with a type discriminator, the audit should report "this function is operating on entity-hierarchy state; consider ECS split into components + systems." A concrete Manual Slop example the audit should catch: any function that does if self.active_ticket.kind == TicketKind.X: and then mutates multiple fields.
  • Assumed-too-much detector: when a function calls ast.parse (or any tree_sitter.*) on a file that could be assumed to be already-parsed (because the file is in the context composition and the aggregate.py pipeline has already done it), the audit should report "this is re-parsing data that was already parsed upstream; consider memoizing or threading the parsed AST through." This is the "assume as much as possible" pattern at the data-passing level.
  • Chunkification candidates: when a function loops over a list[dict] with a known uniform shape (heuristic: all dicts have the same key set), the audit should report "consider chunkifying — uniform data, hot path, no chunk awareness." The user has explicit code (docs/ideation/ed_chunk_data_structures_20260523.md) for the chunk pattern, so the audit's optimization candidates can cite it.

These heuristics are guidance for the audit's report interpretation. They don't change the audit's static cost model (which is data-grounded in the existing EXPENSIVE_THRESHOLD + per-class weights); they shape how the Tier 2 Tech Lead and the user interpret the report.
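As an illustration of the chunkification heuristic ("all dicts have the same key set"), the data-level predicate might look like the sketch below. The name `has_uniform_shape` is hypothetical, and a static audit would have to approximate this from literals or fixtures rather than from live data.

```python
from typing import Any, Iterable, Mapping


def has_uniform_shape(rows: Iterable[Mapping[str, Any]]) -> bool:
    """True when every mapping has the same key set (vacuously true for 0 or 1 rows)."""
    iterator = iter(rows)
    first = next(iterator, None)
    if first is None:
        return True
    keys = frozenset(first)
    return all(frozenset(row) == keys for row in iterator)
```

Key order is deliberately ignored: only the set of keys decides whether the rows could share one chunk layout.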

Current State Audit (as of ca781543)

src/ has 61 .py files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.

Already Implemented (DO NOT re-implement; KEEP / build on)

  1. src/mcp_client.py:934-992, derive_code_path(target, max_depth=5). A single-symbol recursive call tracer with text output. Doesn't render multi-action graphs, doesn't track mutations, doesn't measure cost. The new tool is the multi-action + mutation + cost version of this primitive. Build on this: lift the AST traversal logic and trace() recursion pattern into code_path_audit.py.
  2. scripts/audit_main_thread_imports.py — static CI gate for import-time purity. Different concern (startup-time import cost), but its AST-walking pattern is the model for code_path_audit.py's implementation.
  3. src/performance_monitor.py — runtime profiling with monitor.scope("name") and per-component hit counts + latencies. Used at runtime; the follow-up pipeline_runtime_profiling_20260607 track will use it to calibrate the heuristic cost model.
  4. conductor/archive/code_path_analysis_20260507/ — prior manual audit + PIPELINE_ANALYSIS.md + Mermaid diagrams for the major pipelines. Manual effort, no reusable tool. New track is the data-grounded successor.
  5. conductor/archive/ai_interaction_call_graph_20260507/ — sequence diagram for the AI loop. New track supersedes this for the 3 actions in scope.
  6. SDM docstrings ([C: ...] / [M: ...] tags in src/*.py docstrings) — pre-computed caller/mutation info. The new audit tool will be a more rigorous version of what SDM already documents ad-hoc.

Gaps to Fill (this track's scope)

  • A static call-graph builder for all of src/ (multi-action, depth-configurable, machine-readable output).
  • A state-mutation index per function (5 mutation kinds: attr_write, container_mutate, file_write, ipc_emit, global_write).
  • An expensive-ops index (7 cost classes, with a heuristic data-size estimate).
  • A per-action traversal API (trace_action(action, max_depth=10) -> ActionProfile).
  • An output suite: custom postfix .dsl data files + markdown summaries + Mermaid per-action call graphs + prefix-tree text view.
  • A CLI (python -m src.code_path_audit --action <name>) and an MCP tool (code_path_audit(action_name, max_depth)).
  • The actual audit run on the 3 actions, with the report committed to docs/reports/code_path_audit/2026-06-07/.
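To make the state-mutation index concrete, here is a minimal sketch that detects three of the five mutation kinds with the standard `ast` module. It is an assumption about how detection could work, not the planned implementation: the function name `find_mutations`, the method-name set, and the use of a `global` declaration as a proxy for a global write are all illustrative.

```python
import ast

# Method names treated as in-place container mutation (illustrative, incomplete).
_MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "clear", "update", "add"}


def find_mutations(source: str) -> list[tuple[str, str, int]]:
    """Return (target, kind, line) for a few statically visible mutation patterns."""
    found: list[tuple[str, str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                # Only attribute targets such as self.history count as attr_write here.
                if isinstance(target, ast.Attribute):
                    found.append((ast.unparse(target), "attr_write", node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr in _MUTATING_METHODS:
                found.append((ast.unparse(node.func.value), "container_mutate", node.lineno))
        elif isinstance(node, ast.Global):
            # Proxy: a `global` declaration usually precedes a write to that name.
            for name in node.names:
                found.append((name, "global_write", node.lineno))
    return sorted(found, key=lambda item: item[2])
```

The file_write and ipc_emit kinds would need callee resolution (for example matching `Path.write_text` or an event-emit call), which this sketch leaves out.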

Goals

  1. Produce a queryable artifact. The custom postfix .dsl output is the source of truth; markdown + Mermaid + prefix-tree text are for human review. Re-run after any src/ change to see drift.
  2. Surface the top-N optimization candidates per action. The summary.md ranks candidates by potential data-transform load reduction. This is what the user will use to decide which pruning/optimization work to do next.
  3. Data-grounded design. The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place.
  4. Reusable across actions. The trace_action API takes any Action (entry point + description). Adding a 4th action (e.g., MMA worker spawn, when it's no longer cold) is one Action(...) declaration.
  5. Surface calibration gaps clearly. When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, getattr magic), the report flags it as "unresolved" so the runtime-profiling follow-up targets it.

Non-Goals

  • Not implementing the actual code optimizations — that's pipeline_pruning_20260607.
  • Not profiling runtime costs — that's pipeline_runtime_profiling_20260607.
  • Not analyzing the MMA worker spawn action (cold per user).
  • Not analyzing simulation/* or tests/* directories.
  • Not analyzing actions beyond the 3 in scope.
  • Not resolving C-extension call costs statically.
  • Not resolving decorator-driven call dispatch statically (e.g., @property, @imscope).
  • Not providing real microsecond measurements — the cost is heuristic (calibrated later).

Architecture

src/code_path_audit.py — single new module, no new dependencies. Exposes both an MCP tool surface (for agents) and a CLI (python -m src.code_path_audit ...).

Public API

from typing import Literal  # needed by the kind / cost_class annotations below

class CallGraph:
    """Directed graph: nodes are functions; edges are call sites."""
    nodes: dict[str, "FunctionNode"]            # fully-qualified name -> node
    edges: dict[str, set[str]]                 # caller -> set of callees
    def add_edge(self, caller: str, callee: str) -> None: ...
    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]: ...
    def render_mermaid(self, root: str, max_depth: int = 5) -> str: ...

class FunctionNode:
    fqname: str                                # "src.ai_client.AIClient.send"
    file: str
    line: int
    calls: list[str]                           # all callees (resolved or not)
    state_mutations: list["StateMutation"]
    expensive_ops: list["ExpensiveOp"]

class StateMutation:
    target: str                                # "self.history", "module.events", "file:..."
    kind: Literal["attr_write", "container_mutate", "file_write", "ipc_emit", "global_write"]
    line: int

class ExpensiveOp:
    callee: str
    cost_class: Literal["file_io", "network", "ast_parse", "json_io", "pickle", "deep_copy", "loop_amplified"]
    data_size_estimate: int | None              # bytes or container length, heuristic
    line: int                                  # call site in the caller
    weight: int                                # cost_class_weight * data_size (or 1 if data_size unknown)

class Action:
    name: str                                  # "ai_message_lifecycle"
    entry_points: list[str]                    # ["src.app_controller.AppController.process_user_request", ...]
    description: str

class ActionProfile:
    action: Action
    call_graph: CallGraph                      # subgraph reachable from entry points
    expensive_ops: list[ExpensiveOp]           # all expensive ops in the subgraph
    state_mutations: list[StateMutation]       # all mutations in the subgraph
    redundancy: list[tuple[str, int]]          # (op_fqname, call_count) where count > 1
    pipelining_candidates: list[list[str]]     # groups of independent ops currently sequential
    total_load_estimate: int                   # sum(weight) heuristic
    unresolved_calls: list[str]                # calls the AST walker couldn't resolve
    mermaid: str                               # rendered Mermaid
    markdown: str                              # human-readable per-action report

def trace_action(action: Action, max_depth: int = 10) -> ActionProfile: ...
def build_call_graph(src_dir: str = "src") -> CallGraph: ...   # full call graph
def build_expensive_ops_index(cg: CallGraph) -> dict[str, list[ExpensiveOp]]: ...
def build_state_mutations_index(cg: CallGraph) -> dict[str, list[StateMutation]]: ...
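A rough sketch of how the call-graph pieces of this API could work, using only the standard library. It is deliberately simplified and makes assumptions the spec does not: edges are kept in a plain dict instead of a CallGraph, callees are recorded as their dotted source text (no name resolution), and `build_edges` is a hypothetical helper operating on one module's source.

```python
import ast
from collections import deque


def build_edges(source: str, module: str) -> dict[str, set[str]]:
    """Map each function or method in `source` to the callee expressions in its body."""
    edges: dict[str, set[str]] = {}

    def visit(body, prefix: str) -> None:
        for node in body:
            if isinstance(node, ast.ClassDef):
                visit(node.body, f"{prefix}.{node.name}")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                callees = edges.setdefault(f"{prefix}.{node.name}", set())
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call):
                        # Unresolved on purpose: dotted source text, not a fully-qualified name.
                        callees.add(ast.unparse(inner.func))

    visit(ast.parse(source).body, module)
    return edges


def transitive_callees(edges: dict[str, set[str]], root: str, max_depth: int = 10) -> set[str]:
    """Breadth-first walk from `root`, following at most `max_depth` call edges."""
    seen: set[str] = set()
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for callee in edges.get(name, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen
```

The `seen` set doubles as cycle protection, so recursive call chains terminate even without the depth limit.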

Cost Model (heuristic, calibrated by the runtime-profiling follow-up)

| Pattern | Cost class | Default weight | Data size source |
|---|---|---|---|
| `open()`, `Path.read_*`, `Path.write_*`, `*.write_text` | `file_io` | 100 | file size from `Path.stat()` when resolvable, else `None` |
| `requests.*`, `urllib.*`, `websockets.*`, `client.send` (with httpx-like signatures) | `network` | 500 | payload size from param literal/typed hint |
| `ast.parse`, `ast.walk`, `tree_sitter.*` | `ast_parse` | 200 | source bytes from the path arg |
| `json.dump`, `json.load`, `tomli_w.dump`, `tomllib.load` | `json_io` | 150 | container length if param is a list/dict |
| `pickle.dump`, `pickle.load` | `pickle` | 300 | container length |
| `copy.deepcopy` | `deep_copy` | 200 | container length |
| Any call inside the body of a `for` / `while` loop | `loop_amplified` | `caller_weight × loop_bound_estimate` | loop bound = `range(...)` literal/arg, else 1 |

Expense threshold: EXPENSIVE_THRESHOLD = 40_000 (module-level constant). Any ExpensiveOp.weight > EXPENSIVE_THRESHOLD is flagged "expensive" in the per-action report. The 40,000 default matches the user's stated 10-40μs range; the runtime-profiling follow-up will calibrate it.

Unresolved calls: when the AST walker cannot resolve a callee (e.g., attribute access on self.X where X is set dynamically; getattr; decorator-wrapped method dispatch), the call goes into unresolved_calls with an "unresolved" cost class and weight 0. The report's caveats section notes these; the runtime-profiling follow-up measures them.
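The weight arithmetic described in this section can be written out as a short sketch. The weights and threshold are the defaults stated above; the helper names (`op_weight`, `loop_amplified_weight`, `is_expensive`, `redundancy`) are hypothetical.

```python
from collections import Counter
from typing import Optional

EXPENSIVE_THRESHOLD = 40_000

# Default per-class weights from the cost-model table.
COST_CLASS_WEIGHTS = {
    "file_io": 100,
    "network": 500,
    "ast_parse": 200,
    "json_io": 150,
    "pickle": 300,
    "deep_copy": 200,
}


def op_weight(cost_class: str, data_size: Optional[int]) -> int:
    """cost_class_weight * data_size, or the bare class weight when the size is unknown."""
    return COST_CLASS_WEIGHTS[cost_class] * (1 if data_size is None else data_size)


def loop_amplified_weight(inner_weight: int, loop_bound: Optional[int]) -> int:
    """Weight of an op inside a loop; an unknown bound counts as 1 iteration."""
    return inner_weight * (1 if loop_bound is None else loop_bound)


def is_expensive(weight: int) -> bool:
    return weight > EXPENSIVE_THRESHOLD


def redundancy(callees: list[str]) -> list[tuple[str, int]]:
    """(callee, call_count) pairs for callees that appear more than once."""
    return [(name, count) for name, count in Counter(callees).items() if count > 1]
```

For example, a `json_io` op on a 1,024-element container weighs 150 × 1,024 = 153,600 and is flagged, while a `file_io` op of unknown size weighs 100 and is not.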

Out of the static analysis

  • C-extension call costs (imgui-bundle, tree-sitter native) — runtime profiling only.
  • Decorator-driven dispatch (e.g., @property, @imscope) — runtime profiling only.
  • Import cost at module load time — covered by the existing scripts/audit_main_thread_imports.py.
  • eval / exec calls — flagged as unresolved, not analyzed.

Per-Action Design

For each of the 3 actions, the audit is invoked with one or more entry points and a depth limit (default 10). The audit produces an ActionProfile that the report renders.

| Action | Entry points | Expected high-cost ops the audit should surface |
|---|---|---|
| AI message lifecycle | `src.app_controller.AppController.process_user_request`, `src.ai_client.AIClient.send`, `src.aggregate.build_file_items`, `src.summarize._summarise_*` | Per-context-file AST parse in `build_file_items`; AI network call; history append + comms log append + session_logger file write; sub-agent summarization (network + AST, loop-amplified over context files) |
| Discussion save/load | `src.project_manager.save_project`, `src.project_manager.load_project`, `src.history.HistoryManager.save_snapshot`, `src.models.parse_history_entries` | `tomli_w.dump` / `tomllib.load` on project TOML; `json.dump` on comms log (loop-amplified per entry); history file read/write; AST parse on schema validation |
| GUI startup | `sloppy.main`, `gui_2.App.__init__`, `src.app_controller.AppController.__init__`, `src.paths._resolve_*` | `tomllib.load` on config.toml; AST parses for tool registration; file stat on log paths; sloppy.py first-frame import chain (covered by the existing scripts/audit_main_thread_imports.py) |

The user can extend with more actions later (e.g., MMA worker spawn when it's no longer cold). Each action is one Action(...) declaration + a trace_action() call.

Output Format

CLI:

uv run python -m src.code_path_audit --action ai_message_lifecycle [--depth N] [--dsl] [--tree] [--markdown] [--mermaid]
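The flag set shown in this command maps directly onto `argparse`. The sketch below is one possible wiring, with `build_parser` as a hypothetical name; the `--depth` default of 10 is an assumption that mirrors `trace_action`'s `max_depth=10`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.code_path_audit")
    parser.add_argument("--action", required=True, help="action name, e.g. ai_message_lifecycle")
    # Assumption: the CLI default mirrors trace_action's max_depth=10.
    parser.add_argument("--depth", type=int, default=10, metavar="N")
    # One boolean switch per output format listed in the command line above.
    for flag in ("dsl", "tree", "markdown", "mermaid"):
        parser.add_argument(f"--{flag}", action="store_true", help=f"emit the {flag} output")
    return parser
```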

MCP tool (for agents):

code_path_audit(action_name: str, max_depth: int = 10) -> dict

Generated artifacts (all under docs/reports/code_path_audit/<YYYY-MM-DD>/):

| File | Format | Purpose |
|---|---|---|
| `call_graph.dsl` | Custom postfix DSL | Full call graph (all of src/); machine-readable, parses in ~30 lines |
| `expensive_ops.dsl` | Custom postfix DSL | Expensive ops index (per-file, per-function) |
| `state_mutations.dsl` | Custom postfix DSL | State mutations index (per function) |
| `actions/<action>.dsl` | Custom postfix DSL | Per-action profile (machine-readable) |
| `actions/<action>.tree` | Prefix tree (text) | Per-action human-readable tree (for human review) |
| `actions/<action>.md` | Markdown | Per-action summary + table (for code review) |
| `actions/<action>.mmd` | Mermaid | Per-action call graph (visual) |
| `summary.md` | Markdown | Top-level cross-action summary + ranked optimization candidates |
| `optimization_candidates.md` | Markdown | Ranked list with: candidate, current cost, proposed reduction, effort, priority |

The two follow-up tracks consume the .dsl files; the markdown + tree are for human review.

The custom DSL is postfix (RPN) with length-prefixed lists — no brackets, no braces, no commas, no colons. Each "word" is a tagged constructor that consumes a known number of args from the stack (e.g., fn consumes 3, exp-op consumes 5, mut consumes 3, N list consumes N items). Whitespace-tokenized. Strings are bare atoms when they have no whitespace; quoted only when needed. nil for null. \ for line comments. The DSL is deliberately NOT strict Forth — it's a custom postfix format tailored to the audit's record shapes (function, call, mutation, expensive op, pair, list).

Example of a single FunctionNode record:

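\ FunctionNode: fqname file line fn
src.ai_client.AIClient.send src/ai_client.py 100 fn
\ call: callee call
build_file_items call
process_response call
\ StateMutation: target kind line mut
self.history attr_write 110 mut
\ ExpensiveOp: callee cost_class data_size_estimate line weight exp-op
open file_io nil 120 100 exp-op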

The prefix tree renderer is a separate human-readable view of the same data: top-down, ├─ / └─ / │ box-drawing, scannable. Generated by a recursive walker. Inlined in the markdown reports (optionally produced as actions/<action>.tree for tooling).
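A recursive walker of the kind described here can be sketched in a few lines. The function name `render_tree` and the `children` mapping (caller to ordered callees) are assumptions for illustration; the real renderer would read from the ActionProfile.

```python
def render_tree(root: str, children: dict[str, list[str]], max_depth: int = 5) -> str:
    """Render a call tree top-down with box-drawing connectors."""
    lines = [root]

    def walk(name: str, prefix: str, depth: int, path: frozenset) -> None:
        if depth >= max_depth:
            return
        kids = children.get(name, [])
        for index, kid in enumerate(kids):
            last = index == len(kids) - 1
            lines.append(f"{prefix}{'└─ ' if last else '├─ '}{kid}")
            if kid not in path:  # list a recursive call once, but do not descend again
                walk(kid, prefix + ("   " if last else "│  "), depth + 1, path | {kid})

    walk(root, "", 0, frozenset({root}))
    return "\n".join(lines)
```

Called with `{"send": ["build_file_items", "process_response"], "build_file_items": ["ast.parse"]}` and root `"send"`, it yields:

```
send
├─ build_file_items
│  └─ ast.parse
└─ process_response
```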

Why custom postfix DSL (not JSON, not s-expressions, not strict Forth):

  • Not JSON (JSON performs poorly for this use: quoting, escaping, hash table allocation, no streaming).
  • Not s-expressions (the bracket version drifts back toward s-exprs; the user wanted postfix specifically).
  • Not strict Forth (the user wants a format ideal for call-graph recording, not a Turing-complete Forth program).
  • Postfix (per user: "I want a post-fix heiarchy"): stack-based, no delimiters to count.
  • Length-prefixed lists (standard postfix solution for nesting): N list consumes N items, unambiguous.
  • Trivial parser (~30 lines: split + walk + evaluate tagged words against a known arity table).
  • Compact: ~30-40% fewer characters than JSON for the same data.
  • Streamable: no need to parse the whole file to find a record; you can scan for tags.
  • Extensible: add new metric types by adding new tagged words (metric(name value sample_size), histogram(buckets), etc.).
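The "split + walk + evaluate against an arity table" parser mentioned in the list above might look roughly like this. It is a sketch under stated assumptions, not the spec's parser: the arities for `fn`, `mut`, and `exp-op` come from the text, `call` = 1 is assumed, and records are returned as plain tuples headed by their tag.

```python
import re

_TOKEN = re.compile(r'"[^"]*"|\S+')
# Arities for fn, mut and exp-op are stated in the text; call = 1 is an assumption.
ARITY = {"fn": 3, "call": 1, "mut": 3, "exp-op": 5}


def _atom(token: str):
    """Convert one token to None, an unquoted string, an int, or a bare string."""
    if token == "nil":
        return None
    if token.startswith('"'):
        return token[1:-1]
    return int(token) if token.lstrip("-").isdigit() else token


def _pop_n(stack: list, n: int) -> list:
    if n < 0 or len(stack) < n:
        raise ValueError(f"stack underflow: need {n}, have {len(stack)}")
    items = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    return items


def parse_dsl(text: str) -> list:
    """Evaluate the postfix token stream and return what is left on the stack."""
    stack: list = []
    for line in text.splitlines():
        for token in _TOKEN.findall(line):
            if token == "\\":  # a lone backslash starts a line comment
                break
            if token in ARITY:
                stack.append((token, *_pop_n(stack, ARITY[token])))
            elif token == "list":
                count = stack.pop() if stack else -1
                stack.append(_pop_n(stack, count))
            else:
                stack.append(_atom(token))
    return stack
```

Because every word has a fixed arity and lists carry their length, the parser never needs to look ahead or match delimiters.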

Verification (TDD per conductor/workflow.md)

Unit tests in tests/test_code_path_audit.py:

  • CallGraph.add_edge + transitive_callees correctness on a synthetic 5-node graph.
  • ExpensiveOpIndex detects each of the 7 cost classes on synthetic source.
  • StateMutationIndex detects each of the 5 mutation kinds on synthetic source.
  • trace_action produces an ActionProfile for a synthetic action whose expected cost is computable by hand.
  • Custom postfix .dsl output round-trips (parse_dsl(to_dsl(profile)) == in-memory structure).
  • Prefix tree renderer produces well-formed box-drawing output for the 3 per-action reports.
  • Markdown output is well-formed (header per section, table per category).
  • Mermaid output parses as valid Mermaid syntax.

Smoke test: run python -m src.code_path_audit --action ai_message_lifecycle --depth 5 against a fixture project; verify the report is produced and contains the expected high-cost ops (per the table above).

Manual verification: the report is the deliverable. A Tier 2 Tech Lead + user review the produced summary.md to confirm the optimization candidates make sense.

Commit Structure (6 atomic commits, in order)

1. feat(audit): add code_path_audit data structures (CallGraph, ExpensiveOpIndex, StateMutationIndex)
   - src/code_path_audit.py (initial data structures)
   - tests/test_code_path_audit.py (unit tests)
2. feat(audit): add trace_action + ActionProfile + cost model
   - src/code_path_audit.py (extends with action tracing)
   - tests/test_code_path_audit.py (integration tests)
3. feat(audit): add custom postfix DSL writer + parser + tree renderer / markdown / Mermaid output
4. feat(audit): add MCP tool + CLI surface
5. docs(audit): run audit on 3 actions; commit report
   - docs/reports/code_path_audit/2026-06-07/* (the deliverable)
6. conductor(tracks): mark Code Path Audit track complete
   - tracks.md update

Each commit gets a git notes add -m "..." summary attached, per conductor/workflow.md steps 9.1-9.3.

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Heuristic cost model is imprecise; reported "expensive" ops aren't actually expensive at runtime. | Medium | Medium (false positives dilute the report) | EXPENSIVE_THRESHOLD is a module-level constant; the runtime-profiling follow-up calibrates it. |
| AST walking misses dynamic patterns (eval, getattr, decorator-driven dispatch). | Medium | Medium (under-estimates some calls) | Document the limitations in the report's caveats section; the runtime-profiling follow-up catches these. |
| Mermaid diagrams exceed renderable size for deep actions. | Medium | Low (visualization only) | Default max_depth=5 for --mermaid; full graph available as .dsl. |
| The 3 actions' entry points are not exactly the functions the user has in mind. | Medium | Low (the report is the artifact; user can re-run with different entry points) | Document the chosen entry points in the report; CLI/MCP tool accepts any fully-qualified function name. |
| Report is too large to review (thousands of expensive ops). | Low | Medium | Per-action scoping; default --depth 5; ranked optimization candidates in summary.md make the top-N obvious. |
| Existing derive_code_path is the de-facto call-graph tool and the new one is redundant. | Low | Low (the new one is a strict superset) | derive_code_path stays as a thin wrapper around code_path_audit.trace_action for backward compat, OR gets a @deprecated shim. |
| The 3 actions are not actually the user's top 3 (user might have meant a different 3). | Low | Low (the tool is generic; re-run with different actions is one CLI call) | CLI accepts any Action; user can re-run. |

Coordination with Pending Tracks

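This track has no file-level conflicts with the pending tracks, but it is not free of blockers: per "Timing (post-4-tracks)" it starts only after the 4 foundational tracks complete, and per "Pre-Flight Adjustments" A5 it is also blocked by phase2_4_5_call_site_completion_20260621. Once it has run, its analysis can inform further work on the 5 tracks that were pending when this spec was approved: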

| Pending track | Could use this analysis for... |
|---|---|
| qwen_llama_grok_integration_20260606 | Identifying redundant OpenAI-compatible request paths in `_send_*` functions |
| data_oriented_error_handling_20260606 | Showing the call paths the new `Result[T]` return values will thread through |
| data_structure_strengthening_20260606 | Pinpointing hot functions where the new type aliases matter most |
| mcp_architecture_refactor_20260606 | Identifying which sub-MCPs have the most expensive operations (file_io vs network vs ast) |
| test_batching_refactor_20260606 | Confirming which tests trigger the most expensive paths (to optimize test selection) |

This track's analysis is read-only — it doesn't modify src/, doesn't change the public API, doesn't add tests to the existing test suite. The only new files are src/code_path_audit.py (the tool), tests/test_code_path_audit.py (the tests), and the report under docs/reports/code_path_audit/2026-06-07/.

Pre-Flight Adjustments (2026-06-21, per handoffs from any_type_componentization_20260621)

The any_type_componentization_20260621 track (shipped 2026-06-21 with 48/89 sites promoted) revealed that the 4 foundational tracks this audit was deferred behind have evolved. Specifically, 5 new hot-path dataclasses (ToolSpec, ChatMessage, UsageStats, ToolCall, WebSocketMessage) and 1 new module (provider_state.ProviderHistory) now exist. This audit must instrument them.

Per docs/handoffs/PROMPT_FOR_TIER_1.md and HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md, the following 4 adjustments (A1 to A4) are added to this audit's scope, followed by a sequencing note (A5) and a coordination note (A6):

A1. Add 2 new actions to the per-action profiling

The existing 3 actions (ai_message_lifecycle, discussion_save_load, gui_startup) become 5:

| Action | Codepath | Measures |
|---|---|---|
| provider_history_append (NEW) | `get_history(p).append(msg)` (or legacy `_anthropic_history.append(msg)`) | Per-turn append latency + lock acquire time + memory allocation per call. The hot path Phase 3 will refactor. |
| websocket_broadcast (NEW) | `broadcast(WebSocketMessage(...))` (post-Phase 6a) | Per-broadcast overhead (allocation + JSON serialization + WebSocket send). The GUI thread's per-event cost. |
| ai_message_lifecycle (existing) | `_send_<provider>` end-to-end | Total per-turn latency delta pre/post Phase 3 (provider_state.ProviderHistory). The 3 OpenAI-compatible providers (grok, minimax, llama) are newly instrumented (currently unprofiled). |
| discussion_save_load (existing) | `reset_session()` + project switch | Cold-path cost. The `clear_all()` migration's per-call delta. |
| gui_startup (existing) | `_PROVIDER_HISTORIES` dict init at module load | One-time init cost (6 `ProviderHistory()` instances + 6 locks). |

A2. Add 5 micro-benchmarks to the audit's optimization_candidates.md

The audit's per-call cost estimates should include these 5 micro-benchmarks (added per HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md §7):

| Micro-benchmark | Purpose | Expected overhead |
|---|---|---|
| `NormalizedResponse.__init__` | Dataclass construction vs the old 6-field dict literal | <1μs; immaterial |
| `WebSocketMessage.__init__` | Dataclass construction per broadcast | <5μs; the hot path concern |
| `UsageStats.__init__` | Nested dataclass construction per response | <500ns; negligible (4 int fields) |
| `ProviderHistory.lock` acquire | `threading.Lock` acquire overhead | <500ns; the threading hot path |
| `ToolSpec.__init__` | Dataclass construction per tool (45 tools, cold path) | <2μs; only at registration |

The benchmarks are emitted to docs/reports/code_path_audit/<date>/micro_benchmarks.md.
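A micro-benchmark of this kind can be taken with the standard `timeit` module. The sketch below is illustrative only: `UsageStatsStandIn` is a hypothetical stand-in whose field names are invented (the table only says UsageStats has 4 int fields), and the real harness would import the project's own dataclasses.

```python
import threading
import timeit
from dataclasses import dataclass


@dataclass
class UsageStatsStandIn:
    """Hypothetical stand-in: four int fields; the field names are assumptions."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def ns_per_call(fn, number: int = 100_000, repeat: int = 5) -> float:
    """Best-of-`repeat` mean cost of one call, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def run_micro_benchmarks() -> dict[str, float]:
    lock = threading.Lock()

    def acquire_release() -> None:
        with lock:
            pass

    return {
        "UsageStats.__init__ (stand-in)": ns_per_call(UsageStatsStandIn),
        "dict literal (4 keys)": ns_per_call(lambda: {"a": 0, "b": 0, "c": 0, "d": 0}),
        "threading.Lock acquire+release": ns_per_call(acquire_release),
    }
```

Taking the minimum over repeats reduces scheduler noise; each figure also includes the overhead of one Python-level call, so sub-100ns differences should be read with care.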

A3. Add the "no-TypeError-errors-on-any-thread" assertion

The audit's per-action profiling runs the 5 actions in a controlled harness. The audit MUST assert that no worker[queue_fallback] error: WebSocketServer.broadcast() takes 2 positional arguments but 3 were given (or any TypeError on any thread) appears in the harness output during profiling.

This assertion catches the broadcast() regression that any_type_componentization_20260621 introduced. The regression test that backs this assertion lives in tests/test_websocket_broadcast_regression.py (added by the phase2_4_5_call_site_completion_20260621 follow-up track).

If the assertion fires, the audit's output should:

  1. Mark the affected action's profile as INSTRUMENTATION_CONTAMINATED
  2. List the offending thread + traceback in the report's errors.md
  3. Recommend re-running the audit AFTER phase2_4_5_call_site_completion_20260621 merges
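One way to back the assertion, sketched with the standard library only. `threading.excepthook` catches exceptions that escape a thread, while the log scan covers errors that a worker catches and prints (as the `worker[queue_fallback] error: ...` line suggests). Both helper names are hypothetical, and the harness that captures the output is assumed to exist elsewhere.

```python
import contextlib
import threading


@contextlib.contextmanager
def collect_thread_errors():
    """Record (thread_name, exception) for exceptions that escape any thread in the block."""
    errors: list = []
    previous = threading.excepthook

    def hook(args) -> None:
        name = args.thread.name if args.thread else "<unknown>"
        errors.append((name, args.exc_value))

    threading.excepthook = hook
    try:
        yield errors
    finally:
        threading.excepthook = previous  # always restore the original hook


def type_error_lines(harness_output: str) -> list[str]:
    """Captured output lines that mention a TypeError or the broadcast() arity error."""
    needles = ("TypeError", "broadcast() takes")
    return [line for line in harness_output.splitlines() if any(n in line for n in needles)]
```

If either the collected errors contain a `TypeError` or `type_error_lines` returns anything, the harness would mark the action's profile as INSTRUMENTATION_CONTAMINATED and write the thread name and traceback to errors.md, as listed above.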

A4. Add the 89 fat-struct sites as instrumented targets

The audit reads docs/reports/ANY_TYPE_AUDIT_20260621.md §3's table and tags each Any usage with (file:line, hot_path, cold_path, init_path). The 89 sites become per-action cost estimates that flow into optimization_candidates.md.

For the 48 promoted sites, the audit compares pre-refactor (legacy globals + dict literals) vs post-refactor (dataclass + registry). For the 41 deferred Phase 3 sites, the audit produces per-call cost estimates that inform the future Phase 3 follow-up track (see docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md for the qualitative estimates).
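
The tagging and the promoted/deferred split can be sketched as a small record type. Everything here is an assumption for illustration: the `AnySite` and `PathKind` names, the reading of (hot_path, cold_path, init_path) as one mutually exclusive tag, the plan labels, and the sample sites, which are invented and do not come from ANY_TYPE_AUDIT_20260621.md.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PathKind(Enum):
    HOT = "hot_path"
    COLD = "cold_path"
    INIT = "init_path"


@dataclass(frozen=True)
class AnySite:
    file: str
    line: int
    path_kind: PathKind
    promoted: bool  # True: promoted to a dataclass; False: deferred to Phase 3


def plan_for(site):
    """Promoted sites get a pre/post comparison; deferred sites an estimate."""
    return "compare_pre_post" if site.promoted else "estimate_per_call"


def summarize(sites):
    """Count sites per (analysis plan, path kind) pair."""
    return Counter((plan_for(s), s.path_kind.value) for s in sites)


# Invented sample rows, standing in for the parsed audit table.
sites = [
    AnySite("src/example_a.py", 10, PathKind.HOT, promoted=True),
    AnySite("src/example_a.py", 42, PathKind.COLD, promoted=True),
    AnySite("src/example_b.py", 7, PathKind.HOT, promoted=False),
]

summary = summarize(sites)
for (plan, kind), count in sorted(summary.items()):
    print(f"{plan:18} {kind:10} {count}")
```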

A5. Sequencing (BLOCKER)

This audit is now blocked by phase2_4_5_call_site_completion_20260621 (the broadcast() fix). Until Phase 6a merges, the GUI thread's worker[queue_fallback] TypeError spam contaminates the audit's per-action profiling.

Recommended sequence:

T0:  Tier 1 approves follow-up track                  (decision: SHRINK to 6a + 6b + 6d)
T1:  Tier 2 implements Phase 6a + 6b + 6d            (~3 hours, ~16 commits)
T2:  Tier 1 reviews + merges follow-up track
T3:  Tier 1 launches code_path_audit_20260607
T4:  Tier 2 implements Phase 3 + cross-phase coupling (separate track, post-audit)

A6. New coordination with any_type_componentization_20260621

This audit now has new dependencies beyond the original 4 foundational tracks:

| Track | Status | Provides to this audit |
|---|---|---|
| any_type_componentization_20260621 | Shipped 2026-06-21 (48/89 promoted) | The 5 dataclasses + 1 module; the 200-site dataclass-coverage baseline |
| phase2_4_5_call_site_completion_20260621 | Spec'd 2026-06-21; not yet merged | The fix for the broadcast() TypeError; the "no-TypeError" assertion |

This audit is blocked_by both tracks and can start only after both have merged.

Follow-up

  • pipeline_runtime_profiling_20260607 (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing src/performance_monitor.py + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (EXPENSIVE_THRESHOLD + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: scripts/runtime_profiler.py + updated code_path_audit.py cost model.
  • pipeline_pruning_20260607 (the second follow-up; NOT in this track): implements the high-priority optimization candidates surfaced by this track's report. Will be scoped AFTER this track ships, since the report itself defines what to prune.

Out of Scope

  • MMA worker spawn action (deferred per user — keeping MMA cold until the 1:1 discussion UX is dogfooded in a few projects).
  • Implementing the optimization fixes (deferred to pipeline_pruning_20260607).
  • Runtime profiling (deferred to pipeline_runtime_profiling_20260607 per the user's explicit ask).
  • Other major actions beyond AI message, save/load, GUI startup.
  • C-extension call costs (deferred to runtime profiling).
  • Decorator-driven call dispatch (deferred to runtime profiling).
  • simulation/* and tests/* directories (analysis is src/-only for this track; can be extended later).
  • Modifying src/ (read-only analysis).

See Also

  • conductor/archive/code_path_analysis_20260507/ — prior manual audit; the new track is its data-grounded successor.
  • conductor/archive/ai_interaction_call_graph_20260507/ — prior sequence diagram for the AI loop.
  • src/mcp_client.py:934-992: derive_code_path(target, max_depth=5) (single-symbol tracer; the new tool supersedes this for multi-action use).
  • src/performance_monitor.py — runtime profiling infrastructure used by the pipeline_runtime_profiling_20260607 follow-up.
  • scripts/audit_main_thread_imports.py — related static CI gate (startup-time import cost).
  • docs/reports/PLANNING_DIGEST_20260606.md — planning context; the 5 active planned tracks are independent of this one.
  • docs/guide_data_oriented.md (if it exists; otherwise conductor/product-guidelines.md "Data-Oriented & Immediate Mode Heuristics") — the project's data-oriented design philosophy this track follows.
  • conductor/tracks/nagent_review_20260608/report.md §15 (Pitfalls #2 and #4, "provider-specific history in process globals" and "AI client is a stateful singleton") — the audit's state_mutations index will surface both of these in the post-4-tracks src/ai_client.py; the optimization candidates should specifically address them.
  • docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt — full transcript of Casey Muratori's "The Big OOPs" talk, loaded 2026-06-08 for context. The historical genealogy (Stroustrup, Kay, Simula, Hoare) grounds the audit's "entity-hierarchy fingerprint" heuristic (above). Specifically, Hoare's 1966 "Record Handling" paper introduced discriminated unions — which Simula kept (as inspect) but C++ removed. The audit's actions/ai_message_lifecycle.tree should be checked for if/else chains that would be a discriminated union if Result[T] were threaded through.
  • docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt — full transcript of Andrew Reece's "Assuming as Much as Possible" talk, loaded 2026-06-08 for context. Reece's "Xar" data structure (8-byte header, power-of-2 chunks, bitwise divmod, no realloc copy) is the exemplar for the chunkification-candidate heuristic. The summary.md of the audit's report should note the Xar pattern as a possible optimization target for any function in the hot path that does append-heavy work on a list of uniform items.
  • docs/ideation/ed_chunk_data_structures_20260523.md — user's chunk-based-data-structure ideation (May 2026). The 5-image archive is the source of the "chunkification candidates" heuristic. Specifically, the user notes: "if my chunk size is 1,000 elements, but I only have 5 elements to store, aren't I wasting a massive amount of memory?" — the audit should distinguish real chunkification candidates (uniform data, hot path, large N) from false chunkification candidates (small N, low frequency, polymorphic data).
  • docs/reports/computational_shapes_ssdl_digest_20260608.md — the SSDL digest synthesizing the 4-source computational-shapes thinking. The audit's actions/<action>.tree and actions/<action>.mmd outputs are computational-shape visualizations; the SSDL vocabulary (6 primitives + 7 modifiers) is the conceptual model the audit's tree renderer should follow.
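
To make the chunkification-candidate heuristic concrete, here is a simplified sketch of power-of-2 chunk indexing with a shift and a mask in place of divmod. It is not Reece's Xar layout: it uses fixed-size chunks, and the `ChunkedList` class and its `chunk_bits` parameter are hypothetical. In CPython the chunks hold object references, so this shows the indexing scheme only, not the memory behavior of a native implementation.

```python
class ChunkedList:
    """Append-only list stored in fixed power-of-two chunks (illustrative)."""

    def __init__(self, chunk_bits=4):
        self._shift = chunk_bits              # chunk size is 2 ** chunk_bits
        self._mask = (1 << chunk_bits) - 1    # low bits select the slot in a chunk
        self._chunks = []
        self._length = 0

    def append(self, item):
        if self._length & self._mask == 0:
            # No chunk yet, or the last one is full: allocate a new chunk.
            # Existing chunks are never copied or moved.
            self._chunks.append([None] * (1 << self._shift))
        chunk = self._chunks[self._length >> self._shift]
        chunk[self._length & self._mask] = item
        self._length += 1

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        # index >> shift is index // chunk_size; index & mask is index % chunk_size.
        return self._chunks[index >> self._shift][index & self._mask]

    def __len__(self):
        return self._length

    def wasted_slots(self):
        """Allocated but unused slots: the small-N cost the ideation note raises."""
        return len(self._chunks) * (1 << self._shift) - self._length


xs = ChunkedList(chunk_bits=2)
for i in range(10):
    xs.append(i * i)
print(len(xs), xs[9], xs.wasted_slots())
```

The `wasted_slots` count shows why small-N, low-frequency lists are false candidates: with a large chunk size and only a handful of items, most of the allocation is unused.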