ed ad13007352 chore(audit): switch output format from JSON to custom postfix DSL

Per user direction ('make a custom DSL ideal for recording the
call-graph or other metrics', 'I want a post-fix heiarchy', 'JSON
is ill-performant'): replaced JSON serializer with a custom
postfix (RPN) DSL tailored to the audit's record shapes.

THE CUSTOM DSL
- Postfix (operands before operator); no brackets, braces,
  commas, or colons.
- Length-prefixed lists: N items followed by 'list' word.
- Tagged records: each 'word' is a constructor with a known
  arity (action=3, fn=3, call=1, mut=3, exp-op=5, pair=2, int=1).
- Whitespace-tokenized; bare atoms unquoted; double quotes
  only when whitespace/special chars present.
- nil for null; backslash for line comments; true/false for bool.
- Trivial parser (~30 lines): _tokenize_dsl splits on
  whitespace and respects quotes + comments; parse_dsl
  walks tokens and evaluates tagged words against a known
  arity table (DSL_WORD_ARITY).
- Round-trips: to_dsl(profile) -> parse_dsl(to_dsl(profile))
  yields the same in-memory structure.
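
A minimal sketch of the tokenizing step described above, offered as an illustration rather than the actual _tokenize_dsl. The function name tokenize_dsl, the choice to keep the surrounding quotes on string tokens, and the backslash escape inside quoted strings are all assumptions that the text above does not specify.

```python
def tokenize_dsl(text: str) -> list[str]:
    """Split DSL text into tokens on whitespace, honoring quotes and comments.

    Assumption: quoted tokens keep their surrounding double quotes so a later
    stage can tell the string "nil" apart from the bare atom nil.
    """
    tokens: list[str] = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "\\":
            # Backslash outside a string starts a line comment: skip to end of line.
            while i < n and text[i] != "\n":
                i += 1
        elif ch == '"':
            # Quoted string. Assumption: a backslash escapes the next character.
            buf = ['"']
            j = i + 1
            while j < n and text[j] != '"':
                if text[j] == "\\" and j + 1 < n:
                    j += 1
                buf.append(text[j])
                j += 1
            if j >= n:
                raise ValueError("unterminated string literal")
            buf.append('"')
            tokens.append("".join(buf))
            i = j + 1
        else:
            # Bare atom: runs until the next whitespace character.
            j = i
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


print(tokenize_dsl('\\ a comment\n"a b" 100 fn'))
```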

DELIVERABLES (updated spec + plan)
- src/code_path_audit.py: to_dsl, dump_dsl, parse_dsl,
  _tokenize_dsl, to_tree (prefix-tree text renderer),
  to_markdown, to_mermaid.
- Output: .dsl files (machine) + .tree (human prefix view) +
  .md (summary tables) + .mmd (Mermaid diagrams).
- No new pip dependencies; pure stdlib.

WHAT STAYED
- The 7 cost classes (file_io, network, ast_parse, json_io,
  pickle, deep_copy, loop_amplified) and 5 mutation kinds
  are unchanged. The json_io cost class is for JSON file
  I/O the audit detects, not the output format.
- 36 tests total (15 + 8 + 10 + 3 across the 4 implementation
  phases).

2026-06-07 12:17:56 -04:00


Track: Code Path & Data Pipeline Audit

Status: Spec approved 2026-06-07
Initialized: 2026-06-07
Owner: Tier 2 Tech Lead
Priority: Medium (foundational; enables follow-up pruning track)

Overview

Build src/code_path_audit.py — a data-oriented static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. The output (custom postfix .dsl data + markdown + Mermaid + prefix tree text) is the artifact that informs pipeline-pruning decisions; the actual code changes are a follow-up track (pipeline_pruning_20260607).

Per the user's framing: "anything that can even remotely smell as an expensive bulk action or major action that takes more than 10-40 microseconds." The audit focuses on expensive operations (file I/O, network, AST parsing, big loops, anything that smells like a bulk action) inside the 3 actions — not on every state mutation. The cost model is heuristic, calibrated by a runtime-profiling follow-up (pipeline_runtime_profiling_20260607) that catches the cases static analysis can't resolve (C-extension cost, import cost, JIT effects, decorator-driven dispatch).

The MMA worker spawn action is out of scope for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").

Current State Audit (as of `ca781543`)

src/ has 61 .py files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.

Already Implemented (DO NOT re-implement; KEEP / build on)

src/mcp_client.py:934-992 — derive_code_path(target, max_depth=5). A single-symbol recursive call tracer with text output. Doesn't render multi-action graphs, doesn't track mutations, doesn't measure cost. The new tool is the multi-action + mutation + cost version of this primitive. Build on this: lift the AST traversal logic and trace() recursion pattern into code_path_audit.py.
scripts/audit_main_thread_imports.py — static CI gate for import-time purity. Different concern (startup-time import cost), but its AST-walking pattern is the model for code_path_audit.py's implementation.
src/performance_monitor.py — runtime profiling with monitor.scope("name") and per-component hit counts + latencies. Used at runtime; the follow-up pipeline_runtime_profiling_20260607 track will use it to calibrate the heuristic cost model.
conductor/archive/code_path_analysis_20260507/ — prior manual audit + PIPELINE_ANALYSIS.md + Mermaid diagrams for the major pipelines. Manual effort, no reusable tool. New track is the data-grounded successor.
conductor/archive/ai_interaction_call_graph_20260507/ — sequence diagram for the AI loop. New track supersedes this for the 3 actions in scope.
SDM docstrings ([C: ...] / [M: ...] tags in src/*.py docstrings) — pre-computed caller/mutation info. The new audit tool will be a more rigorous version of what SDM already documents ad-hoc.

Gaps to Fill (this track's scope)

A static call-graph builder for all of src/ (multi-action, depth-configurable, machine-readable output).
A state-mutation index per function (5 mutation kinds: attr_write, container_mutate, file_write, ipc_emit, global_write).
An expensive-ops index (7 cost classes, with a heuristic data-size estimate).
A per-action traversal API (trace_action(action, max_depth=10) -> ActionProfile).
An output suite: custom postfix .dsl data files + markdown summaries + Mermaid per-action call graphs + prefix-tree text view.
A CLI (python -m src.code_path_audit --action <name>) and an MCP tool (code_path_audit(action_name, max_depth)).
The actual audit run on the 3 actions, with the report committed to docs/reports/code_path_audit/2026-06-07/.
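
As a rough illustration of the state-mutation index listed above, the sketch below detects three of the five mutation kinds (attr_write, container_mutate, global_write) with the stdlib ast module. The function name find_mutations and the MUTATING_METHODS set are hypothetical; file_write and ipc_emit are left out because they depend on project-specific call patterns that this document does not enumerate.

```python
import ast

# Hypothetical list of method names treated as in-place container mutation.
MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove",
                    "clear", "update", "add", "discard", "setdefault"}


def find_mutations(source: str) -> list[tuple[str, str, int]]:
    """Return (target, kind, line) triples for mutations inside functions.

    Nested functions are visited once per enclosing function, so a real
    implementation would need to de-duplicate; this sketch does not.
    """
    found = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        declared_global = {name for node in ast.walk(func)
                           if isinstance(node, ast.Global) for name in node.names}
        for node in ast.walk(func):
            if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
                targets = node.targets if isinstance(node, ast.Assign) else [node.target]
                for target in targets:
                    if isinstance(target, ast.Attribute):
                        found.append((ast.unparse(target), "attr_write", node.lineno))
                    elif isinstance(target, ast.Subscript):
                        found.append((ast.unparse(target.value), "container_mutate", node.lineno))
                    elif isinstance(target, ast.Name) and target.id in declared_global:
                        found.append((target.id, "global_write", node.lineno))
            elif (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Attribute)
                  and node.func.attr in MUTATING_METHODS):
                found.append((ast.unparse(node.func.value), "container_mutate", node.lineno))
    return sorted(found, key=lambda m: m[2])


SAMPLE = (
    "COUNTER = 0\n"
    "class C:\n"
    "    def step(self, item):\n"
    "        global COUNTER\n"
    "        self.last = item\n"
    "        self.history.append(item)\n"
    "        self.cache[item] = True\n"
    "        COUNTER += 1\n"
)
for mutation in find_mutations(SAMPLE):
    print(mutation)
```

Matching on method names alone will misclassify calls such as a custom object's update(); the spec's heuristic framing suggests that kind of false positive is acceptable at this stage.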

Goals

Produce a queryable artifact. The custom postfix .dsl output is the source of truth; markdown + Mermaid + prefix-tree text are for human review. Re-run after any src/ change to see drift.
Surface the top-N optimization candidates per action. The summary.md ranks candidates by potential data-transform load reduction. This is what the user will use to decide which pruning/optimization work to do next.
Data-grounded design. The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place.
Reusable across actions. The trace_action API takes any Action (entry point + description). Adding a 4th action (e.g., MMA worker spawn, when it's no longer cold) is one Action(...) declaration.
Surface calibration gaps clearly. When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, getattr magic), the report flags it as "unresolved" so the runtime-profiling follow-up targets it.

Non-Goals

Not implementing the actual code optimizations — that's pipeline_pruning_20260607.
Not profiling runtime costs — that's pipeline_runtime_profiling_20260607.
Not analyzing the MMA worker spawn action (cold per user).
Not analyzing simulation/* or tests/* directories.
Not analyzing actions beyond the 3 in scope.
Not resolving C-extension call costs statically.
Not resolving decorator-driven call dispatch statically (e.g., @property, @imscope).
Not providing real microsecond measurements — the cost is heuristic (calibrated later).

Architecture

src/code_path_audit.py — single new module, no new dependencies. Exposes both an MCP tool surface (for agents) and a CLI (python -m src.code_path_audit ...).

Public API

class CallGraph:
    """Directed graph: nodes are functions; edges are call sites."""
    nodes: dict[str, "FunctionNode"]            # fully-qualified name -> node
    edges: dict[str, set[str]]                 # caller -> set of callees
    def add_edge(self, caller: str, callee: str) -> None: ...
    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]: ...
    def render_mermaid(self, root: str, max_depth: int = 5) -> str: ...

class FunctionNode:
    fqname: str                                # "src.ai_client.AIClient.send"
    file: str
    line: int
    calls: list[str]                           # all callees (resolved or not)
    state_mutations: list["StateMutation"]
    expensive_ops: list["ExpensiveOp"]

class StateMutation:
    target: str                                # "self.history", "module.events", "file:..."
    kind: Literal["attr_write", "container_mutate", "file_write", "ipc_emit", "global_write"]
    line: int

class ExpensiveOp:
    callee: str
    cost_class: Literal["file_io", "network", "ast_parse", "json_io", "pickle", "deep_copy", "loop_amplified"]
    data_size_estimate: int | None              # bytes or container length, heuristic
    line: int                                  # call site in the caller
    weight: int                                # cost_class_weight * data_size (or 1 if data_size unknown)

class Action:
    name: str                                  # "ai_message_lifecycle"
    entry_points: list[str]                    # ["src.app_controller.AppController.process_user_request", ...]
    description: str

class ActionProfile:
    action: Action
    call_graph: CallGraph                      # subgraph reachable from entry points
    expensive_ops: list[ExpensiveOp]           # all expensive ops in the subgraph
    state_mutations: list[StateMutation]       # all mutations in the subgraph
    redundancy: list[tuple[str, int]]          # (op_fqname, call_count) where count > 1
    pipelining_candidates: list[list[str]]     # groups of independent ops currently sequential
    total_load_estimate: int                   # sum(weight) heuristic
    unresolved_calls: list[str]                # calls the AST walker couldn't resolve
    mermaid: str                               # rendered Mermaid
    markdown: str                              # human-readable per-action report

def trace_action(action: Action, max_depth: int = 10) -> ActionProfile: ...
def build_call_graph(src_dir: str = "src") -> CallGraph: ...   # full call graph
def build_expensive_ops_index(cg: CallGraph) -> dict[str, list[ExpensiveOp]]: ...
def build_state_mutations_index(cg: CallGraph) -> dict[str, list[StateMutation]]: ...
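
One way the depth-limited traversal behind transitive_callees could work is a breadth-first walk, sketched below. This is an assumption about the implementation, not a description of it: the class here carries only the edges mapping, and whether the root itself is reported when a cycle leads back to it is a design choice the API above leaves open (this sketch reports it).

```python
from collections import deque


class CallGraph:
    """Cut-down stand-in for the CallGraph above: edges only, no FunctionNode."""

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = {}

    def add_edge(self, caller: str, callee: str) -> None:
        self.edges.setdefault(caller, set()).add(callee)
        self.edges.setdefault(callee, set())  # make sure the callee is a known node

    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]:
        """Every function reachable from root in at most max_depth call hops."""
        seen: set[str] = set()
        queue = deque([(root, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth >= max_depth:
                continue  # do not expand past the depth limit
            for callee in self.edges.get(node, ()):
                if callee not in seen:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
        return seen


# Synthetic 5-node graph with a cycle (e calls back into a).
cg = CallGraph()
for caller, callee in [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e"), ("e", "a")]:
    cg.add_edge(caller, callee)
print(sorted(cg.transitive_callees("a", max_depth=2)))
```

Breadth-first order means each node is first reached by its shortest call path, so the depth limit cuts off exactly the nodes that are more than max_depth hops away.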

Cost Model (heuristic, calibrated by the runtime-profiling follow-up)

Pattern	Cost class	Default weight	Data size source
`open()`, `Path.read_*`, `Path.write_*`, `*.write_text`	`file_io`	100	file size from `Path.stat()` when resolvable, else `None`
`requests.*`, `urllib.*`, `websockets.*`, `client.send` (with httpx-like signatures)	`network`	500	payload size from param literal/typed hint
`ast.parse`, `ast.walk`, `tree_sitter.*`	`ast_parse`	200	source bytes from the path arg
`json.dump`, `json.load`, `tomli_w.dump`, `tomllib.load`	`json_io`	150	container length if param is a list/dict
`pickle.dump`, `pickle.load`	`pickle`	300	container length
`copy.deepcopy`	`deep_copy`	200	container length
Any call inside the body of a `for` / `while` loop	`loop_amplified`	caller_weight × loop_bound_estimate	loop bound = `range(...)` literal/arg, else 1

Expense threshold: EXPENSIVE_THRESHOLD = 40_000 (module-level constant). Any ExpensiveOp.weight > EXPENSIVE_THRESHOLD is flagged "expensive" in the per-action report. Weights are unitless heuristics; the 40,000 default stands in for the upper end of the user's stated 10-40 μs range (40 μs is 40,000 ns if one weight unit is read as roughly one nanosecond), and the runtime-profiling follow-up will calibrate it.

Unresolved calls: when the AST walker cannot resolve a callee (e.g., attribute access on self.X where X is set dynamically; getattr; decorator-wrapped method dispatch), the call goes into unresolved_calls with an "unresolved" cost class and weight 0. The report's caveats section notes these; the runtime-profiling follow-up measures them.

Out of the static analysis

C-extension call costs (imgui-bundle, tree-sitter native) — runtime profiling only.
Decorator-driven dispatch (e.g., @property, @imscope) — runtime profiling only.
Import cost at module load time — covered by the existing scripts/audit_main_thread_imports.py.
eval / exec calls — flagged as unresolved, not analyzed.

Per-Action Design

For each of the 3 actions, the audit is invoked with one or more entry points and a depth limit (default 10). The audit produces an ActionProfile that the report renders.

Action	Entry points	Expected high-cost ops the audit should surface
AI message lifecycle	`src.app_controller.AppController.process_user_request`, `src.ai_client.AIClient.send`, `src.aggregate.build_file_items`, `src.summarize._summarise_*`	Per-context-file AST parse in `build_file_items`; AI network call; history append + comms log append + session_logger file write; sub-agent summarization (network + AST, loop-amplified over context files)
Discussion save/load	`src.project_manager.save_project`, `src.project_manager.load_project`, `src.history.HistoryManager.save_snapshot`, `src.models.parse_history_entries`	`tomli_w.dump` / `tomllib.load` on project TOML; `json.dump` on comms log (loop-amplified per entry); history file read/write; AST parse on schema validation
GUI startup	`sloppy.main` → `gui_2.App.__init__`, `src.app_controller.AppController.__init__`, `src.paths._resolve_*`	`tomllib.load` on config.toml; AST parses for tool registration; file stat on log paths; `sloppy.py` first-frame import chain (covered by the existing `scripts/audit_main_thread_imports.py`)

The user can extend with more actions later (e.g., MMA worker spawn when it's no longer cold). Each action is one Action(...) declaration + a trace_action() call.
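
For illustration, declaring actions could look like the sketch below. The Action fields mirror the Public API section; the ACTIONS registry, the action names other than ai_message_lifecycle, and in particular the MMA entry point are hypothetical placeholders, since this document does not name them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    entry_points: list[str]
    description: str = ""


# Entry points for the first three actions are taken from the table above.
ACTIONS = {
    action.name: action
    for action in (
        Action(
            name="ai_message_lifecycle",
            entry_points=[
                "src.app_controller.AppController.process_user_request",
                "src.ai_client.AIClient.send",
            ],
            description="AI message lifecycle",
        ),
        Action(
            name="discussion_save_load",  # hypothetical action name
            entry_points=[
                "src.project_manager.save_project",
                "src.project_manager.load_project",
            ],
            description="Discussion save/load",
        ),
        Action(
            name="gui_startup",  # hypothetical action name
            entry_points=["src.app_controller.AppController.__init__"],
            description="GUI startup",
        ),
        # A 4th action is one more declaration. The entry point below is a
        # made-up placeholder, not a real function in src/.
        Action(
            name="mma_worker_spawn",
            entry_points=["src.hypothetical_module.spawn_worker"],
            description="MMA worker spawn (currently out of scope)",
        ),
    )
}

print(sorted(ACTIONS))
```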

Output Format

CLI:

uv run python -m src.code_path_audit --action ai_message_lifecycle [--depth N] [--dsl] [--tree] [--markdown] [--mermaid]

MCP tool (for agents):

code_path_audit(action_name: str, max_depth: int = 10) -> dict
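
A possible shape for the CLI surface, using stdlib argparse. The flags match the usage line above and the depth default of 10 follows the per-action design; build_parser, selected_formats, and the rule that no format flag means "emit every format" are assumptions made for this sketch.

```python
import argparse

FORMATS = ("dsl", "tree", "markdown", "mermaid")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.code_path_audit")
    parser.add_argument("--action", required=True, help="action name to trace")
    parser.add_argument("--depth", type=int, default=10, help="max call depth")
    for fmt in FORMATS:
        parser.add_argument(f"--{fmt}", action="store_true", help=f"emit {fmt} output")
    return parser


def selected_formats(args: argparse.Namespace) -> list[str]:
    """Formats requested on the command line.

    Assumption: when no format flag is given, every format is produced.
    """
    chosen = [fmt for fmt in FORMATS if getattr(args, fmt)]
    return chosen or list(FORMATS)


args = build_parser().parse_args(["--action", "ai_message_lifecycle", "--depth", "5", "--tree"])
print(args.action, args.depth, selected_formats(args))
```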

Generated artifacts (all under docs/reports/code_path_audit/<YYYY-MM-DD>/):

File	Format	Purpose
`call_graph.dsl`	Custom postfix DSL	Full call graph (all of `src/`); machine-readable, parses in ~30 lines
`expensive_ops.dsl`	Custom postfix DSL	Expensive ops index (per-file, per-function)
`state_mutations.dsl`	Custom postfix DSL	State mutations index (per function)
`actions/<action>.dsl`	Custom postfix DSL	Per-action profile (machine-readable)
`actions/<action>.tree`	Prefix tree (text)	Per-action human-readable tree (for human review)
`actions/<action>.md`	Markdown	Per-action summary + table (for code review)
`actions/<action>.mmd`	Mermaid	Per-action call graph (visual)
`summary.md`	Markdown	Top-level cross-action summary + ranked optimization candidates
`optimization_candidates.md`	Markdown	Ranked list with: candidate, current cost, proposed reduction, effort, priority

The two follow-up tracks consume the .dsl files; the markdown + tree are for human review.

The custom DSL is postfix (RPN) with length-prefixed lists — no brackets, no braces, no commas, no colons. Each "word" is a tagged constructor that consumes a known number of args from the stack (e.g., fn consumes 3, exp-op consumes 5, mut consumes 3, N list consumes N items). Whitespace-tokenized. Strings are bare atoms when they have no whitespace; quoted only when needed. nil for null. \ for line comments. The DSL is deliberately NOT strict Forth — it's a custom postfix format tailored to the audit's record shapes (function, call, mutation, expensive op, pair, list).

Example of a single FunctionNode record:

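\ FunctionNode: fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call
"process_response" call
\ StateMutation: target kind line mut
"self.history" attr_write 110 mut
\ ExpensiveOp: callee cost_class data_size_estimate line weight exp-op (nil = size unknown)
"open" file_io nil 120 100 exp-op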

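The stack evaluation described above can be sketched in a few lines. This is an illustrative parser, not the project's parse_dsl: the arity table is the one given in the commit message, while the regex tokenizer, the tuple representation of records, the items-then-count layout for "N list", and the operand order of the sample exp-op record (taken from the ExpensiveOp field order) are assumptions.

```python
import re

DSL_WORD_ARITY = {"action": 3, "fn": 3, "call": 1, "mut": 3,
                  "exp-op": 5, "pair": 2, "int": 1}

# Alternatives, in order: line comment, quoted string (no escapes), bare atom.
_TOKEN = re.compile(r'\\[^\n]*|"[^"]*"|\S+')
_LITERALS = {"nil": None, "true": True, "false": False}


def parse_dsl(text: str) -> list:
    """Evaluate postfix tokens; returns whatever is left on the stack."""
    stack: list = []
    for match in _TOKEN.finditer(text):
        tok = match.group()
        if tok.startswith("\\"):
            continue  # line comment
        if tok.startswith('"'):
            stack.append(tok[1:-1])
        elif tok == "list":
            # "<items...> N list" pops N, then the N items beneath it.
            count = stack.pop() if stack else None
            if type(count) is not int or not 0 <= count <= len(stack):
                raise ValueError("bad list count")
            start = len(stack) - count
            items = stack[start:]
            del stack[start:]
            stack.append(items)
        elif tok in DSL_WORD_ARITY:
            arity = DSL_WORD_ARITY[tok]
            if len(stack) < arity:
                raise ValueError(f"{tok} needs {arity} operands")
            args = stack[-arity:]
            del stack[-arity:]
            stack.append((tok, *args))  # tagged record
        elif tok in _LITERALS:
            stack.append(_LITERALS[tok])
        else:
            try:
                stack.append(int(tok))
            except ValueError:
                stack.append(tok)  # bare atom stays a string
    return stack


SAMPLE = '''\\ fqname file line fn
"src.ai_client.AIClient.send" "src/ai_client.py" 100 fn
"build_file_items" call "process_response" call 2 list
"open" file_io nil 120 100 exp-op
'''
for record in parse_dsl(SAMPLE):
    print(record)
```

Because every word has a fixed arity, a record with a missing operand either raises or silently consumes the record before it, which is why arity-consistent output from the writer matters.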
The prefix tree renderer is a separate human-readable view of the same data: top-down and scannable, drawn with ├─/└─/│ box-drawing characters by a recursive walker. The tree is inlined in the markdown reports and optionally written to actions/<action>.tree for tooling.
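
A recursive walker of the kind described could look like this sketch. The function name render_tree, the plain caller-to-callees dict it takes, alphabetical child ordering, and the "(cycle)" marker for recursive calls are all assumptions chosen to keep the example small.

```python
def render_tree(edges: dict[str, set[str]], root: str, max_depth: int = 5) -> str:
    """Render the call tree under root with box-drawing prefixes."""
    lines = [root]

    def walk(node: str, prefix: str, depth: int, path: frozenset) -> None:
        if depth >= max_depth:
            return
        children = sorted(edges.get(node, ()))
        for index, child in enumerate(children):
            last = index == len(children) - 1
            branch = "└─ " if last else "├─ "
            marker = " (cycle)" if child in path else ""
            lines.append(prefix + branch + child + marker)
            if child not in path:
                # Continue the vertical bar only while siblings remain below.
                walk(child, prefix + ("   " if last else "│  "), depth + 1, path | {child})

    walk(root, "", 0, frozenset({root}))
    return "\n".join(lines)


edges = {
    "send": {"build_file_items", "process_response"},
    "build_file_items": {"ast.parse"},
}
print(render_tree(edges, "send"))
```

Tracking the current path rather than a global visited set lets a function that is called from two places appear under both callers, which suits a per-call-site review view.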

Why custom postfix DSL (not JSON, not s-expressions, not strict Forth):

Not JSON (JSON is ill-performant: quoting, escaping, hash table allocation, no streaming).
Not s-expressions (the bracket version drifts back toward s-exprs; the user wanted postfix specifically).
Not strict Forth (the user wants a format ideal for call-graph recording, not a Turing-complete Forth program).
Postfix (per user: "I want a post-fix heiarchy"): stack-based, no delimiters to count.
Length-prefixed lists (standard postfix solution for nesting): N list consumes N items, unambiguous.
Trivial parser (~30 lines: split + walk + evaluate tagged words against a known arity table).
Compact: ~30-40% fewer characters than JSON for the same data.
Streamable: no need to parse the whole file to find a record; you can scan for tags.
Extensible: add new metric types by adding new tagged words (metric(name value sample_size), histogram(buckets), etc.).
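
To make the writer side of these properties concrete, here is a hedged sketch of a serializer. It assumes records are (tag, *args) tuples and lists are Python lists, that the count is written after the items ("items N list"), and that strings are quoted when they contain whitespace or special characters, are empty, look numeric, or collide with a reserved word; the last three quoting rules are additions of this sketch that keep round-tripping unambiguous.

```python
RESERVED = {"nil", "true", "false", "list",
            "action", "fn", "call", "mut", "exp-op", "pair", "int"}


def atom(value) -> str:
    """Serialize a scalar as a bare atom, quoting only when needed."""
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must precede the int check: bool is an int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    text = str(value)
    looks_numeric = text.lstrip("-").isdigit()
    has_special = any(ch.isspace() or ch in '"\\' for ch in text)
    if text == "" or text in RESERVED or looks_numeric or has_special:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def to_dsl(value) -> str:
    """Postfix serialization: operands first, then the tag or 'N list'."""
    if isinstance(value, tuple):
        tag, *args = value
        return " ".join([*(to_dsl(arg) for arg in args), tag])
    if isinstance(value, list):
        return " ".join([*(to_dsl(item) for item in value), str(len(value)), "list"])
    return atom(value)


print(to_dsl(("fn", "src.ai_client.AIClient.send", "src/ai_client.py", 100)))
print(to_dsl([("call", "build_file_items"), ("call", "process response")]))
```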

Verification (TDD per `conductor/workflow.md`)

Unit tests in tests/test_code_path_audit.py:

CallGraph.add_edge + transitive_callees correctness on a synthetic 5-node graph.
ExpensiveOpIndex detects each of the 7 cost classes on synthetic source.
StateMutationIndex detects each of the 5 mutation kinds on synthetic source.
trace_action produces an ActionProfile for a synthetic action whose expected cost is computable by hand.
Custom postfix .dsl output round-trips (parse_dsl(to_dsl(profile)) == in-memory structure).
Prefix tree renderer produces well-formed box-drawing output for the 3 per-action reports.
Markdown output is well-formed (header per section, table per category).
Mermaid output parses as valid Mermaid syntax.
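
For the Mermaid check, the Python standard library has no Mermaid parser, so a unit test would likely fall back on structural assertions. The sketch below renders a depth-limited flowchart and validates each line against a regular expression; render_mermaid as a free function, the n0/n1 node ids, and EDGE_LINE are assumptions of this example, and function names containing double quotes are not handled.

```python
import re

EDGE_LINE = re.compile(r'^    n\d+\["[^"]+"\] --> n\d+\["[^"]+"\]$')


def render_mermaid(edges: dict[str, set[str]], root: str, max_depth: int = 5) -> str:
    """Emit a Mermaid flowchart of the calls reachable from root."""
    ids: dict[str, str] = {}

    def node(name: str) -> str:
        # Short generated ids avoid dots and other characters in node ids.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return f'{ids[name]}["{name}"]'

    lines = ["graph TD"]
    seen = {root}
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for caller in frontier:
            for callee in sorted(edges.get(caller, ())):
                lines.append(f"    {node(caller)} --> {node(callee)}")
                if callee not in seen:
                    seen.add(callee)
                    next_frontier.append(callee)
        frontier = next_frontier
    return "\n".join(lines)


def looks_like_mermaid(text: str) -> bool:
    """Structural check only: header line plus well-formed edge lines."""
    header, *body = text.split("\n")
    return header == "graph TD" and all(EDGE_LINE.match(line) for line in body)


diagram = render_mermaid({"f": {"g", "h"}, "g": {"h"}}, "f")
print(diagram)
print(looks_like_mermaid(diagram))
```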

Smoke test: run python -m src.code_path_audit --action ai_message_lifecycle --depth 5 against a fixture project; verify the report is produced and contains the expected high-cost ops (per the table above).

Manual verification: the report is the deliverable. A Tier 2 Tech Lead + user review the produced summary.md to confirm the optimization candidates make sense.

Commit Structure (6 atomic commits, in order)

1. feat(audit): add code_path_audit data structures (CallGraph, ExpensiveOpIndex, StateMutationIndex)
   - src/code_path_audit.py (initial data structures)
   - tests/test_code_path_audit.py (unit tests)
2. feat(audit): add trace_action + ActionProfile + cost model
   - src/code_path_audit.py (extends with action tracing)
   - tests/test_code_path_audit.py (integration tests)
3. feat(audit): add custom postfix DSL writer + parser + tree renderer / markdown / Mermaid output
4. feat(audit): add MCP tool + CLI surface
5. docs(audit): run audit on 3 actions; commit report
   - docs/reports/code_path_audit/2026-06-07/* (the deliverable)
6. conductor(tracks): mark Code Path Audit track complete
   - tracks.md update

Each commit is annotated with a git notes add -m "..." summary per conductor/workflow.md step 9.1-9.3.

Risks

Risk	Likelihood	Impact	Mitigation
Heuristic cost model is imprecise; reported "expensive" ops aren't actually expensive at runtime.	Medium	Medium (false positives dilute the report)	`EXPENSIVE_THRESHOLD` is a module-level constant; the runtime-profiling follow-up calibrates it.
AST walking misses dynamic patterns (eval, getattr, decorator-driven dispatch).	Medium	Medium (under-estimates some calls)	Document the limitations in the report's caveats section; the runtime-profiling follow-up catches these.
Mermaid diagrams exceed renderable size for deep actions.	Medium	Low (visualization only)	Default `max_depth=5` for `--mermaid`; full graph available as `.dsl`.
The 3 actions' entry points are not exactly the functions the user has in mind.	Medium	Low (the report is the artifact; user can re-run with different entry points)	Document the chosen entry points in the report; CLI/MCP tool accepts any fully-qualified function name.
Report is too large to review (thousands of expensive ops).	Low	Medium	Per-action scoping; a shallower `--depth 5` run (the trace default is 10); ranked optimization candidates in `summary.md` make the top-N obvious.
Existing `derive_code_path` is the de-facto call-graph tool and the new one is redundant.	Low	Low (the new one is a strict superset)	`derive_code_path` stays as a thin wrapper around `code_path_audit.trace_action` for backward compat, OR gets a `@deprecated` shim.
The 3 actions are not actually the user's top 3 (user might have meant a different 3).	Low	Low (the tool is generic; re-run with different actions is one CLI call)	CLI accepts any `Action`; user can re-run.

Coordination with Pending Tracks

This track has no blockers and no conflicts. It can ship independently of the 5 active planned tracks. It enables future refactors:

Pending track	Could use this analysis for...
`qwen_llama_grok_integration_20260606`	Identifying redundant OpenAI-compatible request paths in `_send_*` functions
`data_oriented_error_handling_20260606`	Showing the call paths the new `Result[T]` return values will thread through
`data_structure_strengthening_20260606`	Pinpointing hot functions where the new type aliases matter most
`mcp_architecture_refactor_20260606`	Identifying which sub-MCPs have the most expensive operations (file_io vs network vs ast)
`test_batching_refactor_20260606`	Confirming which tests trigger the most expensive paths (to optimize test selection)

This track's analysis is read-only — it doesn't modify src/, doesn't change the public API, doesn't add tests to the existing test suite. The only new files are src/code_path_audit.py (the tool), tests/test_code_path_audit.py (the tests), and the report under docs/reports/code_path_audit/2026-06-07/.

Follow-up

pipeline_runtime_profiling_20260607 (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing src/performance_monitor.py + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (EXPENSIVE_THRESHOLD + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: scripts/runtime_profiler.py + updated code_path_audit.py cost model.
pipeline_pruning_20260607 (the second follow-up; NOT in this track): implements the high-priority optimization candidates surfaced by this track's report. Will be scoped AFTER this track ships, since the report itself defines what to prune.

Out of Scope

MMA worker spawn action (deferred per user — keeping MMA cold until the 1:1 discussion UX is dogfooded in a few projects).
Implementing the optimization fixes (deferred to pipeline_pruning_20260607).
Runtime profiling (deferred to pipeline_runtime_profiling_20260607 per the user's explicit ask).
Other major actions beyond AI message, save/load, GUI startup.
C-extension call costs (deferred to runtime profiling).
Decorator-driven call dispatch (deferred to runtime profiling).
simulation/* and tests/* directories (analysis is src/-only for this track; can be extended later).
Modifying src/ (read-only analysis).
