Per user direction ('make a custom DSL ideal for recording the
call-graph or other metrics', 'I want a post-fix heiarchy', 'JSON
is ill-performant'): replaced JSON serializer with a custom
postfix (RPN) DSL tailored to the audit's record shapes.
THE CUSTOM DSL
- Postfix (operands before operator); no brackets, braces,
commas, or colons.
- Length-prefixed lists: N items followed by 'list' word.
- Tagged records: each 'word' is a constructor with a known
arity (action=3, fn=3, call=1, mut=3, exp-op=5, pair=2, int=1).
- Whitespace-tokenized; bare atoms unquoted; double quotes
only when whitespace/special chars present.
- nil for null; backslash for line comments; true/false for bool.
- Trivial parser (~30 lines): _tokenize_dsl splits on
whitespace and respects quotes + comments; parse_dsl
walks tokens and evaluates tagged words against a known
arity table (DSL_WORD_ARITY).
- Round-trips: to_dsl(profile) -> parse_dsl(to_dsl(profile))
yields the same in-memory structure.
DELIVERABLES (updated spec + plan)
- src/code_path_audit.py: to_dsl, dump_dsl, parse_dsl,
_tokenize_dsl, to_tree (prefix-tree text renderer),
to_markdown, to_mermaid.
- Output: .dsl files (machine) + .tree (human prefix view) +
.md (summary tables) + .mmd (Mermaid diagrams).
- No new pip dependencies; pure stdlib.
WHAT STAYED
- The 7 cost classes (file_io, network, ast_parse, json_io,
pickle, deep_copy, loop_amplified) and 5 mutation kinds
are unchanged. The json_io cost class is for JSON file
I/O the audit detects, not the output format.
- 36 tests total (15 + 8 + 10 + 3 across the 4 implementation
phases).
Track: Code Path & Data Pipeline Audit
Status: Spec approved 2026-06-07
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: Medium (foundational; enables follow-up pruning track)
Overview
Build src/code_path_audit.py — a data-oriented static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. The output (custom postfix .dsl data + markdown + Mermaid + prefix tree text) is the artifact that informs pipeline-pruning decisions; the actual code changes are a follow-up track (pipeline_pruning_20260607).
Per the user's framing: "anything that can even remotely smell as an expensive bulk action or major action that takes more than 10-40 microseconds." The audit focuses on expensive operations (file I/O, network, AST parsing, big loops, anything that smells like a bulk action) inside the 3 actions — not on every state mutation. The cost model is heuristic, calibrated by a runtime-profiling follow-up (pipeline_runtime_profiling_20260607) that catches the cases static analysis can't resolve (C-extension cost, import cost, JIT effects, decorator-driven dispatch).
The MMA worker spawn action is out of scope for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").
Current State Audit (as of ca781543)
src/ has 61 .py files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.
Already Implemented (DO NOT re-implement; KEEP / build on)
- src/mcp_client.py:934-992 — derive_code_path(target, max_depth=5). A single-symbol recursive call tracer with text output. Doesn't render multi-action graphs, doesn't track mutations, doesn't measure cost. The new tool is the multi-action + mutation + cost version of this primitive. Build on this: lift the AST traversal logic and trace() recursion pattern into code_path_audit.py.
- scripts/audit_main_thread_imports.py — static CI gate for import-time purity. Different concern (startup-time import cost), but its AST-walking pattern is the model for code_path_audit.py's implementation.
- src/performance_monitor.py — runtime profiling with monitor.scope("name") and per-component hit counts + latencies. Used at runtime; the follow-up pipeline_runtime_profiling_20260607 track will use it to calibrate the heuristic cost model.
- conductor/archive/code_path_analysis_20260507/ — prior manual audit + PIPELINE_ANALYSIS.md + Mermaid diagrams for the major pipelines. Manual effort, no reusable tool. New track is the data-grounded successor.
- conductor/archive/ai_interaction_call_graph_20260507/ — sequence diagram for the AI loop. New track supersedes this for the 3 actions in scope.
- SDM docstrings ([C: ...] / [M: ...] tags in src/*.py docstrings) — pre-computed caller/mutation info. The new audit tool will be a more rigorous version of what SDM already documents ad hoc.
Gaps to Fill (this track's scope)
- A static call-graph builder for all of src/ (multi-action, depth-configurable, machine-readable output).
- A state-mutation index per function (5 mutation kinds: attr_write, container_mutate, file_write, ipc_emit, global_write).
- An expensive-ops index (7 cost classes, with a heuristic data-size estimate).
- A per-action traversal API (trace_action(action, max_depth=10) -> ActionProfile).
- An output suite: custom postfix .dsl data files + markdown summaries + Mermaid per-action call graphs + prefix-tree text view.
- A CLI (python -m src.code_path_audit --action <name>) and an MCP tool (code_path_audit(action_name, max_depth)).
- The actual audit run on the 3 actions, with the report committed to docs/reports/code_path_audit/2026-06-07/.
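To make the state-mutation index concrete, here is a minimal sketch of how three of the five mutation kinds (attr_write, container_mutate, global_write) might be detected with the stdlib ast module. The function name find_mutations and the CONTAINER_MUTATORS set are hypothetical, not part of the spec; file_write and ipc_emit are left out because they need callee knowledge this sketch does not have, and a `global` declaration is treated as a write, which is only an approximation.

```python
import ast

# Assumption: method names that usually mutate their receiver in place.
CONTAINER_MUTATORS = {"append", "extend", "insert", "pop", "remove", "clear", "update", "add"}


def find_mutations(source: str) -> list[tuple[str, str, int]]:
    """Return (target, kind, line) triples, ordered by line number."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                if isinstance(target, ast.Attribute):  # e.g. self.x = ...
                    found.append((ast.unparse(target), "attr_write", node.lineno))
        elif isinstance(node, ast.Global):  # `global name` declaration
            for name in node.names:
                found.append((name, "global_write", node.lineno))
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Attribute)
              and node.func.attr in CONTAINER_MUTATORS):  # e.g. self.items.append(...)
            found.append((ast.unparse(node.func.value), "container_mutate", node.lineno))
    return sorted(found, key=lambda m: m[2])


SAMPLE = """
def record(self, entry):
    global _last_entry
    _last_entry = entry
    self.dirty = True
    self.history.append(entry)
"""
mutations = find_mutations(SAMPLE)
```

Matching on method names alone will produce false positives (any object with an `add` method counts), which is the kind of imprecision the heuristic model accepts.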
Goals
- Produce a queryable artifact. The custom postfix .dsl output is the source of truth; markdown + Mermaid + prefix-tree text are for human review. Re-run after any src/ change to see drift.
- Surface the top-N optimization candidates per action. The summary.md ranks candidates by potential data-transform load reduction. This is what the user will use to decide which pruning/optimization work to do next.
- Data-grounded design. The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place.
- Reusable across actions. The trace_action API takes any Action (entry point + description). Adding a 4th action (e.g., MMA worker spawn, when it's no longer cold) is one Action(...) declaration.
- Surface calibration gaps clearly. When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, getattr magic), the report flags it as "unresolved" so the runtime-profiling follow-up targets it.
Non-Goals
- Not implementing the actual code optimizations — that's pipeline_pruning_20260607.
- Not profiling runtime costs — that's pipeline_runtime_profiling_20260607.
- Not analyzing the MMA worker spawn action (cold per user).
- Not analyzing simulation/* or tests/* directories.
- Not analyzing actions beyond the 3 in scope.
- Not resolving C-extension call costs statically.
- Not resolving decorator-driven call dispatch statically (e.g., @property, @imscope).
- Not providing real microsecond measurements — the cost is heuristic (calibrated later).
Architecture
src/code_path_audit.py — single new module, no new dependencies. Exposes both an MCP tool surface (for agents) and a CLI (python -m src.code_path_audit ...).
Public API
from typing import Literal


class CallGraph:
    """Directed graph: nodes are functions; edges are call sites."""
    nodes: dict[str, "FunctionNode"]  # fully-qualified name -> node
    edges: dict[str, set[str]]        # caller -> set of callees

    def add_edge(self, caller: str, callee: str) -> None: ...
    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]: ...
    def render_mermaid(self, root: str, max_depth: int = 5) -> str: ...


class FunctionNode:
    fqname: str  # "src.ai_client.AIClient.send"
    file: str
    line: int
    calls: list[str]  # all callees (resolved or not)
    state_mutations: list["StateMutation"]
    expensive_ops: list["ExpensiveOp"]


class StateMutation:
    target: str  # "self.history", "module.events", "file:..."
    kind: Literal["attr_write", "container_mutate", "file_write", "ipc_emit", "global_write"]
    line: int


class ExpensiveOp:
    callee: str
    cost_class: Literal["file_io", "network", "ast_parse", "json_io", "pickle", "deep_copy", "loop_amplified"]
    data_size_estimate: int | None  # bytes or container length, heuristic
    line: int    # call site in the caller
    weight: int  # cost_class_weight * data_size (or 1 if data_size unknown)


class Action:
    name: str                # "ai_message_lifecycle"
    entry_points: list[str]  # ["src.app_controller.AppController.process_user_request", ...]
    description: str


class ActionProfile:
    action: Action
    call_graph: CallGraph                   # subgraph reachable from entry points
    expensive_ops: list[ExpensiveOp]        # all expensive ops in the subgraph
    state_mutations: list[StateMutation]    # all mutations in the subgraph
    redundancy: list[tuple[str, int]]       # (op_fqname, call_count) where count > 1
    pipelining_candidates: list[list[str]]  # groups of independent ops currently sequential
    total_load_estimate: int                # sum(weight) heuristic
    unresolved_calls: list[str]             # calls the AST walker couldn't resolve
    mermaid: str                            # rendered Mermaid
    markdown: str                           # human-readable per-action report


def trace_action(action: Action, max_depth: int = 10) -> ActionProfile: ...
def build_call_graph(src_dir: str = "src") -> CallGraph: ...  # full call graph
def build_expensive_ops_index(cg: CallGraph) -> dict[str, list[ExpensiveOp]]: ...
def build_state_mutations_index(cg: CallGraph) -> dict[str, list[StateMutation]]: ...
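As an illustration of the depth-limited traversal behind CallGraph.transitive_callees, the following is a minimal sketch written as a free function over the edges mapping. It assumes breadth-first expansion and assumes the root is excluded from its own result even when a cycle leads back to it; the spec does not state either choice.

```python
from collections import deque


def transitive_callees(edges: dict[str, set[str]], root: str, max_depth: int = 10) -> set[str]:
    """Return every function reachable from root within max_depth call hops."""
    seen: set[str] = set()
    queue = deque([(root, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # nodes at the depth limit are reported but not expanded
        for callee in edges.get(name, ()):
            if callee not in seen and callee != root:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen


# Synthetic 5-node graph: a -> b -> c -> d, a -> e, plus a cycle d -> a.
edges = {"a": {"b", "e"}, "b": {"c"}, "c": {"d"}, "d": {"a"}}
reachable = transitive_callees(edges, "a", max_depth=2)
```

With max_depth=2 the walk reaches b, e, and c but stops before d. Breadth-first order matters here: it guarantees each node is first seen at its shortest distance from the root, so the depth limit is never applied to a longer path by accident.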
Cost Model (heuristic, calibrated by the runtime-profiling follow-up)
| Pattern | Cost class | Default weight | Data size source |
|---|---|---|---|
| open(), Path.read_*, Path.write_*, *.write_text | file_io | 100 | file size from Path.stat() when resolvable, else None |
| requests.*, urllib.*, websockets.*, client.send (with httpx-like signatures) | network | 500 | payload size from param literal/typed hint |
| ast.parse, ast.walk, tree_sitter.* | ast_parse | 200 | source bytes from the path arg |
| json.dump, json.load, tomli_w.dump, tomllib.load | json_io | 150 | container length if param is a list/dict |
| pickle.dump, pickle.load | pickle | 300 | container length |
| copy.deepcopy | deep_copy | 200 | container length |
| Any call inside the body of a for / while loop | loop_amplified | caller_weight × loop_bound_estimate | loop bound = range(...) literal/arg, else 1 |
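The loop_amplified row can be sketched as follows. This is only an illustration under stated assumptions: the helper names loop_bound_estimate and calls_in_loops are hypothetical, only the single-argument literal form range(N) is recognized (the table also mentions range arguments, which would need data-flow information), and a call inside nested loops is reported once per enclosing loop rather than with a multiplied bound.

```python
import ast


def loop_bound_estimate(loop: ast.stmt) -> int:
    """Return the literal N of `for ... in range(N)`, else 1."""
    if isinstance(loop, ast.For):
        it = loop.iter
        if (isinstance(it, ast.Call) and isinstance(it.func, ast.Name)
                and it.func.id == "range" and len(it.args) == 1
                and isinstance(it.args[0], ast.Constant)
                and isinstance(it.args[0].value, int)):
            return it.args[0].value
    return 1


def calls_in_loops(source: str) -> list[tuple[str, int, int]]:
    """List (callee_text, line, loop_bound) for each call inside a loop body."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While)):
            bound = loop_bound_estimate(node)
            for stmt in node.body:  # the loop header itself is not counted
                for inner in ast.walk(stmt):
                    if isinstance(inner, ast.Call):
                        found.append((ast.unparse(inner.func), inner.lineno, bound))
    return found


SAMPLE = """
for i in range(8):
    data = json.load(fh)
while pending:
    flush()
"""
amplified = calls_in_loops(SAMPLE)
```

Here json.load is reported with a bound of 8, while flush falls back to 1 because a while loop has no statically visible bound.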
Expense threshold: EXPENSIVE_THRESHOLD = 40_000 (module-level constant). Any ExpensiveOp.weight > EXPENSIVE_THRESHOLD is flagged "expensive" in the per-action report. The 40,000 default is a placeholder tied to the upper end of the user's stated 10-40μs range; weights are heuristic units rather than measured time, so the runtime-profiling follow-up will calibrate it.
Unresolved calls: when the AST walker cannot resolve a callee (e.g., attribute access on self.X where X is set dynamically; getattr; decorator-wrapped method dispatch), the call goes into unresolved_calls with an "unresolved" cost class and weight 0. The report's caveats section notes these; the runtime-profiling follow-up measures them.
Out of the static analysis
- C-extension call costs (imgui-bundle, tree-sitter native) — runtime profiling only.
- Decorator-driven dispatch (e.g., @property, @imscope) — runtime profiling only.
- Import cost at module load time — covered by the existing scripts/audit_main_thread_imports.py.
- eval/exec calls — flagged as unresolved, not analyzed.
Per-Action Design
For each of the 3 actions, the audit is invoked with one or more entry points and a depth limit (default 10). The audit produces an ActionProfile that the report renders.
| Action | Entry points | Expected high-cost ops the audit should surface |
|---|---|---|
| AI message lifecycle | src.app_controller.AppController.process_user_request, src.ai_client.AIClient.send, src.aggregate.build_file_items, src.summarize._summarise_* | Per-context-file AST parse in build_file_items; AI network call; history append + comms log append + session_logger file write; sub-agent summarization (network + AST, loop-amplified over context files) |
| Discussion save/load | src.project_manager.save_project, src.project_manager.load_project, src.history.HistoryManager.save_snapshot, src.models.parse_history_entries | tomli_w.dump / tomllib.load on project TOML; json.dump on comms log (loop-amplified per entry); history file read/write; AST parse on schema validation |
| GUI startup | sloppy.main → gui_2.App.__init__, src.app_controller.AppController.__init__, src.paths._resolve_* | tomllib.load on config.toml; AST parses for tool registration; file stat on log paths; sloppy.py first-frame import chain (covered by the existing scripts/audit_main_thread_imports.py) |
The user can extend with more actions later (e.g., MMA worker spawn when it's no longer cold). Each action is one Action(...) declaration + a trace_action() call.
Output Format
CLI:
uv run python -m src.code_path_audit --action ai_message_lifecycle [--depth N] [--dsl] [--tree] [--markdown] [--mermaid]
MCP tool (for agents):
code_path_audit(action_name: str, max_depth: int = 10) -> dict
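The CLI flags shown above could be wired with argparse roughly as follows. This is a sketch, not the module's actual entry point: build_parser and selected_formats are hypothetical names, the default depth of 10 is taken from the per-action design text, and the rule that omitting every format flag selects all formats is an assumption.

```python
import argparse

FORMATS = ("dsl", "tree", "markdown", "mermaid")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.code_path_audit")
    parser.add_argument("--action", required=True, help="action name, e.g. ai_message_lifecycle")
    parser.add_argument("--depth", type=int, default=10, help="maximum call depth to trace")
    for fmt in FORMATS:
        parser.add_argument(f"--{fmt}", action="store_true", help=f"emit {fmt} output")
    return parser


def selected_formats(args: argparse.Namespace) -> list[str]:
    """Formats requested on the command line; all of them if none was given (assumption)."""
    chosen = [fmt for fmt in FORMATS if getattr(args, fmt)]
    return chosen or list(FORMATS)


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--action", "ai_message_lifecycle", "--depth", "5", "--tree"])
```

Passing the argument list explicitly keeps the parser testable without spawning a subprocess.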
Generated artifacts (all under docs/reports/code_path_audit/<YYYY-MM-DD>/):
| File | Format | Purpose |
|---|---|---|
| call_graph.dsl | Custom postfix DSL | Full call graph (all of src/); machine-readable, parses in ~30 lines |
| expensive_ops.dsl | Custom postfix DSL | Expensive ops index (per-file, per-function) |
| state_mutations.dsl | Custom postfix DSL | State mutations index (per function) |
| actions/<action>.dsl | Custom postfix DSL | Per-action profile (machine-readable) |
| actions/<action>.tree | Prefix tree (text) | Per-action human-readable tree (for human review) |
| actions/<action>.md | Markdown | Per-action summary + table (for code review) |
| actions/<action>.mmd | Mermaid | Per-action call graph (visual) |
| summary.md | Markdown | Top-level cross-action summary + ranked optimization candidates |
| optimization_candidates.md | Markdown | Ranked list with: candidate, current cost, proposed reduction, effort, priority |
The two follow-up tracks consume the .dsl files; the markdown + tree are for human review.
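For the per-action .mmd artifact, a depth-limited Mermaid flowchart could be produced along these lines. This is a sketch written as a free function over an edges mapping; the synthetic node ids (n0, n1, ...) are an assumption made so that dotted function names only ever appear inside quoted labels.

```python
def render_mermaid(edges: dict[str, set[str]], root: str, max_depth: int = 5) -> str:
    """Render the subgraph reachable from root as Mermaid flowchart text."""
    ids: dict[str, str] = {}

    def node_id(name: str) -> str:
        # Assign short ids in first-seen order.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return ids[name]

    lines = ["flowchart TD"]
    frontier, seen = [root], {root}
    for _ in range(max_depth):
        next_frontier = []
        for caller in frontier:
            for callee in sorted(edges.get(caller, ())):  # sorted for stable output
                lines.append(f'    {node_id(caller)}["{caller}"] --> {node_id(callee)}["{callee}"]')
                if callee not in seen:
                    seen.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return "\n".join(lines)


edges = {"send": {"build_file_items", "open"}, "build_file_items": {"ast.parse"}}
diagram = render_mermaid(edges, "send")
```

Each caller is expanded once, so an edge is emitted once even when the graph contains cycles.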
The custom DSL is postfix (RPN) with length-prefixed lists — no brackets, no braces, no commas, no colons. Each "word" is a tagged constructor that consumes a known number of args from the stack (e.g., fn consumes 3, exp-op consumes 5, mut consumes 3, N list consumes N items). Whitespace-tokenized. Strings are bare atoms when they have no whitespace; quoted only when needed. nil for null. \ for line comments. The DSL is deliberately NOT strict Forth — it's a custom postfix format tailored to the audit's record shapes (function, call, mutation, expensive op, pair, list).
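The description above maps onto a small stack machine. The sketch below is an illustration, not the module's parse_dsl: the arity table is copied from the commit notes, while decoding bare digit runs as integers, the backslash escapes inside quoted strings, and the tuple shape of parsed records are all assumptions.

```python
import re

# Arity table as listed in the commit notes.
DSL_WORD_ARITY = {"action": 3, "fn": 3, "call": 1, "mut": 3, "exp-op": 5, "pair": 2, "int": 1}
_INT = re.compile(r"-?[0-9]+")


def tokenize(text: str) -> list[tuple[str, bool]]:
    """Return (token, was_quoted) pairs; skips backslash line comments."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "\\":  # comment runs to end of line
            while i < n and text[i] != "\n":
                i += 1
        elif ch == '"':  # quoted string; backslash escapes the next char (assumption)
            i += 1
            buf = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1
                buf.append(text[i])
                i += 1
            i += 1  # skip the closing quote
            tokens.append(("".join(buf), True))
        else:  # bare atom: everything up to the next whitespace
            start = i
            while i < n and not text[i].isspace():
                i += 1
            tokens.append((text[start:i], False))
    return tokens


def _pop_n(stack: list, count: int, word: str) -> list:
    if count > len(stack):
        raise ValueError(f"{word!r} needs {count} operands, stack has {len(stack)}")
    items = stack[len(stack) - count:]
    del stack[len(stack) - count:]
    return items


def parse(text: str) -> list:
    """Evaluate postfix tokens; returns whatever is left on the stack."""
    stack: list = []
    for tok, quoted in tokenize(text):
        if quoted:
            stack.append(tok)
        elif tok in DSL_WORD_ARITY:  # tagged record: (word, *operands)
            stack.append((tok, *_pop_n(stack, DSL_WORD_ARITY[tok], tok)))
        elif tok == "list":  # length-prefixed: item1 ... itemN N list
            count = stack.pop()
            if not isinstance(count, int) or isinstance(count, bool):
                raise ValueError("list needs an integer length")
            stack.append(_pop_n(stack, count, tok))
        elif tok == "nil":
            stack.append(None)
        elif tok in ("true", "false"):
            stack.append(tok == "true")
        elif _INT.fullmatch(tok):
            stack.append(int(tok))
        else:
            stack.append(tok)
    return stack


EXAMPLE = r'''
\ fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call
"self.history" attr_write 110 mut
"build_file_items" call "process_response" call 2 list
'''
records = parse(EXAMPLE)
```

The tokenizer keeps a was_quoted flag so that a quoted string such as "call" is never mistaken for the call word.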
Example of a single FunctionNode record:
\ FunctionNode: fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call
"process_response" call
"self.history" attr_write 110 mut
"open" file_io 100 120 exp-op
The prefix tree renderer is a separate human-readable view of the same data — top-down, ├─/└─/│ box-drawing, scannable. Generated by a recursive walker. Inlined in the markdown reports (optionally produced as actions/<action>.tree for tooling).
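A recursive walker of the kind described above can be sketched in a few lines. The function name render_tree, the "(cycle)" marker, and the choice to stop at the first repeated ancestor are assumptions made for the illustration; the input is a plain caller-to-callees mapping rather than an ActionProfile.

```python
def render_tree(root: str, children: dict[str, list[str]]) -> str:
    """Render a top-down call tree using box-drawing characters."""
    lines = [root]

    def walk(node: str, prefix: str, path: tuple[str, ...]) -> None:
        kids = children.get(node, [])
        for index, kid in enumerate(kids):
            last = index == len(kids) - 1
            cyclic = kid in path  # kid is one of its own ancestors
            marker = " (cycle)" if cyclic else ""
            lines.append(prefix + ("└─ " if last else "├─ ") + kid + marker)
            if not cyclic:
                walk(kid, prefix + ("   " if last else "│  "), path + (kid,))

    walk(root, "", (root,))
    return "\n".join(lines)


calls = {
    "send": ["build_file_items", "process_response"],
    "build_file_items": ["ast.parse", "open"],
    "process_response": ["send"],
}
tree = render_tree("send", calls)
```

For the sample mapping this yields:

```
send
├─ build_file_items
│  ├─ ast.parse
│  └─ open
└─ process_response
   └─ send (cycle)
```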
Why custom postfix DSL (not JSON, not s-expressions, not strict Forth):
- Not JSON (JSON is ill-performant: quoting, escaping, hash table allocation, no streaming).
- Not s-expressions (the bracket version drifts back toward s-exprs; the user wanted postfix specifically).
- Not strict Forth (the user wants a format ideal for call-graph recording, not a Turing-complete Forth program).
- Postfix (per user: "I want a post-fix heiarchy"): stack-based, no delimiters to count.
- Length-prefixed lists (standard postfix solution for nesting): N list consumes N items, unambiguous.
- Trivial parser (~30 lines: split + walk + evaluate tagged words against a known arity table).
- Compact: ~30-40% fewer characters than JSON for the same data.
- Streamable: no need to parse the whole file to find a record; you can scan for tags.
- Extensible: add new metric types by adding new tagged words (metric(name value sample_size), histogram(buckets), etc.).
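On the writing side, the quoting and length-prefix rules could be sketched as below. The helper names atom, record, and dsl_list are hypothetical, and the escaping scheme plus the decision to quote strings that collide with reserved words or look numeric are assumptions; the JSON comparison at the end only measures one record and is not evidence for the general size figure.

```python
import json
import re

_BARE = re.compile(r'[^\s"\\]+')  # no whitespace, quotes, or backslashes
_INT = re.compile(r"-?[0-9]+")
# Assumption: strings equal to a DSL word must be quoted to stay unambiguous.
_RESERVED = {"nil", "true", "false", "list", "action", "fn", "call", "mut", "exp-op", "pair", "int"}


def atom(value) -> str:
    """Encode one operand: nil/bool/int bare, strings quoted only when needed."""
    if value is None:
        return "nil"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    if _BARE.fullmatch(text) and text not in _RESERVED and not _INT.fullmatch(text):
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record(word: str, *fields) -> str:
    """Postfix record: operands first, then the tagged word."""
    return " ".join([*(atom(f) for f in fields), word])


def dsl_list(items: list[str]) -> str:
    """Length-prefixed list: already-encoded items, then N, then 'list'."""
    return " ".join([*items, str(len(items)), "list"])


as_dsl = record("mut", "self.history", "attr_write", 110)
as_json = json.dumps({"target": "self.history", "kind": "attr_write", "line": 110})
calls = dsl_list([record("call", "a"), record("call", "b")])
```

For this single mutation record the postfix form is shorter than the JSON form mainly because field names are implied by the word's arity instead of being repeated per record.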
Verification (TDD per conductor/workflow.md)
Unit tests in tests/test_code_path_audit.py:
- CallGraph.add_edge + transitive_callees correctness on a synthetic 5-node graph.
- ExpensiveOpIndex detects each of the 7 cost classes on synthetic source.
- StateMutationIndex detects each of the 5 mutation kinds on synthetic source.
- trace_action produces an ActionProfile for a synthetic action whose expected cost is computable by hand.
- Custom postfix .dsl output round-trips (parse_dsl(to_dsl(profile)) == in-memory structure).
- Prefix tree renderer produces well-formed box-drawing output for the 3 per-action reports.
- Markdown output is well-formed (header per section, table per category).
- Mermaid output parses as valid Mermaid syntax.
Smoke test: run python -m src.code_path_audit --action ai_message_lifecycle --depth 5 against a fixture project; verify the report is produced and contains the expected high-cost ops (per the table above).
Manual verification: the report is the deliverable. A Tier 2 Tech Lead + user review the produced summary.md to confirm the optimization candidates make sense.
Commit Structure (6 atomic commits, in order)
1. feat(audit): add code_path_audit data structures (CallGraph, ExpensiveOpIndex, StateMutationIndex)
- src/code_path_audit.py (initial data structures)
- tests/test_code_path_audit.py (unit tests)
2. feat(audit): add trace_action + ActionProfile + cost model
- src/code_path_audit.py (extends with action tracing)
- tests/test_code_path_audit.py (integration tests)
3. feat(audit): add custom postfix DSL writer + parser + tree renderer / markdown / Mermaid output
4. feat(audit): add MCP tool + CLI surface
5. docs(audit): run audit on 3 actions; commit report
- docs/reports/code_path_audit/2026-06-07/* (the deliverable)
6. conductor(tracks): mark Code Path Audit track complete
- tracks.md update
Each commit is annotated with a git notes add -m "..." summary per conductor/workflow.md step 9.1-9.3.
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Heuristic cost model is imprecise; reported "expensive" ops aren't actually expensive at runtime. | Medium | Medium (false positives dilute the report) | EXPENSIVE_THRESHOLD is a module-level constant; the runtime-profiling follow-up calibrates it. |
| AST walking misses dynamic patterns (eval, getattr, decorator-driven dispatch). | Medium | Medium (under-estimates some calls) | Document the limitations in the report's caveats section; the runtime-profiling follow-up catches these. |
| Mermaid diagrams exceed renderable size for deep actions. | Medium | Low (visualization only) | Default max_depth=5 for --mermaid; full graph available as .dsl. |
| The 3 actions' entry points are not exactly the functions the user has in mind. | Medium | Low (the report is the artifact; user can re-run with different entry points) | Document the chosen entry points in the report; CLI/MCP tool accepts any fully-qualified function name. |
| Report is too large to review (thousands of expensive ops). | Low | Medium | Per-action scoping; a shallower run via --depth 5 when needed; ranked optimization candidates in summary.md make the top-N obvious. |
| Existing derive_code_path is the de-facto call-graph tool and the new one is redundant. | Low | Low (the new one is a strict superset) | derive_code_path stays as a thin wrapper around code_path_audit.trace_action for backward compat, OR gets a @deprecated shim. |
| The 3 actions are not actually the user's top 3 (user might have meant a different 3). | Low | Low (the tool is generic; re-run with different actions is one CLI call) | CLI accepts any Action; user can re-run. |
Coordination with Pending Tracks
This track has no blockers and no conflicts. It can ship independently of the 5 active planned tracks. It enables future refactors:
| Pending track | Could use this analysis for... |
|---|---|
| qwen_llama_grok_integration_20260606 | Identifying redundant OpenAI-compatible request paths in _send_* functions |
| data_oriented_error_handling_20260606 | Showing the call paths the new Result[T] return values will thread through |
| data_structure_strengthening_20260606 | Pinpointing hot functions where the new type aliases matter most |
| mcp_architecture_refactor_20260606 | Identifying which sub-MCPs have the most expensive operations (file_io vs network vs ast) |
| test_batching_refactor_20260606 | Confirming which tests trigger the most expensive paths (to optimize test selection) |
This track's analysis is read-only — it doesn't modify src/, doesn't change the public API, doesn't add tests to the existing test suite. The only new files are src/code_path_audit.py (the tool), tests/test_code_path_audit.py (the tests), and the report under docs/reports/code_path_audit/2026-06-07/.
Follow-up
- pipeline_runtime_profiling_20260607 (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing src/performance_monitor.py + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (EXPENSIVE_THRESHOLD + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: scripts/runtime_profiler.py + updated code_path_audit.py cost model.
- pipeline_pruning_20260607 (the second follow-up; NOT in this track): implements the high-priority optimization candidates surfaced by this track's report. Will be scoped AFTER this track ships, since the report itself defines what to prune.
Out of Scope
- MMA worker spawn action (deferred per user — keeping MMA cold until the 1:1 discussion UX is dogfooded in a few projects).
- Implementing the optimization fixes (deferred to pipeline_pruning_20260607).
- Runtime profiling (deferred to pipeline_runtime_profiling_20260607 per the user's explicit ask).
- Other major actions beyond AI message, save/load, GUI startup.
- C-extension call costs (deferred to runtime profiling).
- Decorator-driven call dispatch (deferred to runtime profiling).
- simulation/* and tests/* directories (analysis is src/-only for this track; can be extended later).
- Modifying src/ (read-only analysis).
See Also
- conductor/archive/code_path_analysis_20260507/ — prior manual audit; the new track is its data-grounded successor.
- conductor/archive/ai_interaction_call_graph_20260507/ — prior sequence diagram for the AI loop.
- src/mcp_client.py:934-992 — derive_code_path(target, max_depth=5) (single-symbol tracer; the new tool supersedes this for multi-action use).
- src/performance_monitor.py — runtime profiling infrastructure used by the pipeline_runtime_profiling_20260607 follow-up.
- scripts/audit_main_thread_imports.py — related static CI gate (startup-time import cost).
- docs/reports/PLANNING_DIGEST_20260606.md — planning context; the 5 active planned tracks are independent of this one.
- docs/guide_data_oriented.md (if it exists; otherwise conductor/product-guidelines.md "Data-Oriented & Immediate Mode Heuristics") — the project's data-oriented design philosophy this track follows.