chore(audit): spec code path audit track

Design for a data-oriented static-analysis tool (src/code_path_audit.py) that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates.

Output: JSON data files + markdown summaries + Mermaid per-action call graphs in docs/reports/code_path_audit/.

61 src/ files, 27,447 total lines. Call graph is non-trivial; per-action traversal is what makes analysis tractable.

Cost model: 7 cost classes (file_io, network, ast_parse, json_io, pickle, deep_copy, loop_amplified) with heuristic weights; EXPENSIVE_THRESHOLD = 40,000 module constant.

5 state mutation kinds (attr_write, container_mutate, file_write, ipc_emit, global_write).

The 3 action entry points are per-action defined (see Per-Action Design table).

MMA worker spawn is OUT of scope per user (cold until 1:1 discussion UX is dogfooded).

Two follow-up tracks recorded but NOT in this track:
- pipeline_runtime_profiling_20260607: calibrate the heuristic cost model with real measurements; catch C-extension cost, decorator dispatch, JIT effects that static analysis can't resolve.
- pipeline_pruning_20260607: implement the high-priority optimization candidates surfaced by this track's report.

6 atomic commits planned: data structures; trace_action + ActionProfile + cost model; output (JSON/MD/Mermaid); MCP + CLI; run audit + commit report; tracks.md update.
2026-06-07 11:30:06 -04:00
parent 1bd1b6d1c6
commit f069a8b27b
1 changed files with 265 additions and 0 deletions
@@ -0,0 +1,265 @@
+# Track: Code Path & Data Pipeline Audit
+
+**Status:** Spec approved 2026-06-07
+**Initialized:** 2026-06-07
+**Owner:** Tier 2 Tech Lead
+**Priority:** Medium (foundational; enables follow-up pruning track)
+
+---
+
+## Overview
+
+Build `src/code_path_audit.py` — a data-oriented static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. The output (JSON + markdown + Mermaid) is the artifact that informs pipeline-pruning decisions; the actual code changes are a follow-up track (`pipeline_pruning_20260607`).
+
+Per the user's framing: "anything that can even remotely smell as an expensive bulk action or major action that takes more than 10-40 microseconds." The audit focuses on **expensive** operations (file I/O, network, AST parsing, big loops, anything that smells like a bulk action) inside the 3 actions, not on every state mutation. The cost model is heuristic; it will be calibrated by a runtime-profiling follow-up (`pipeline_runtime_profiling_20260607`) that catches the cases static analysis can't resolve (C-extension cost, import cost, JIT effects, decorator-driven dispatch).
+
+The MMA worker spawn action is **out of scope** for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").
+
+## Current State Audit (as of `ca781543`)
+
+`src/` has 61 `.py` files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.
+
+### Already Implemented (DO NOT re-implement; KEEP / build on)
+
+1. **`src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)`.** A single-symbol recursive call tracer with text output. Doesn't render multi-action graphs, doesn't track mutations, doesn't measure cost. The new tool is the multi-action + mutation + cost version of this primitive. **Build on this:** lift the AST traversal logic and `trace()` recursion pattern into `code_path_audit.py`.
+2. **`scripts/audit_main_thread_imports.py`** — static CI gate for import-time purity. Different concern (startup-time import cost), but its AST-walking pattern is the model for `code_path_audit.py`'s implementation.
+3. **`src/performance_monitor.py`** — runtime profiling with `monitor.scope("name")` and per-component hit counts + latencies. Used at runtime; the follow-up `pipeline_runtime_profiling_20260607` track will use it to calibrate the heuristic cost model.
+4. **`conductor/archive/code_path_analysis_20260507/`** — prior manual audit + `PIPELINE_ANALYSIS.md` + Mermaid diagrams for the major pipelines. Manual effort, no reusable tool. New track is the data-grounded successor.
+5. **`conductor/archive/ai_interaction_call_graph_20260507/`** — sequence diagram for the AI loop. New track supersedes this for the 3 actions in scope.
+6. **SDM docstrings** (`[C: ...]` / `[M: ...]` tags in `src/*.py` docstrings) — pre-computed caller/mutation info. The new audit tool will be a more rigorous version of what SDM already documents ad-hoc.
+
+### Gaps to Fill (this track's scope)
+
+- A static call-graph builder for all of `src/` (multi-action, depth-configurable, machine-readable output).
+- A state-mutation index per function (5 mutation kinds: `attr_write`, `container_mutate`, `file_write`, `ipc_emit`, `global_write`).
+- An expensive-ops index (7 cost classes, with a heuristic data-size estimate).
+- A per-action traversal API (`trace_action(action, max_depth=10) -> ActionProfile`).
+- An output suite: JSON data files + markdown summaries + Mermaid per-action call graphs.
+- A CLI (`python -m src.code_path_audit --action <name>`) and an MCP tool (`code_path_audit(action_name, max_depth)`).
+- The actual audit run on the 3 actions, with the report committed to `docs/reports/code_path_audit/2026-06-07/`.
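The state-mutation index listed above could be built with a plain `ast` walk. The following is a minimal sketch, not the planned implementation: `find_mutations` and `MUTATING_METHODS` are hypothetical names, and it covers only three of the five kinds (`attr_write`, `container_mutate`, `global_write`). Detecting `file_write` and `ipc_emit` would presumably rely on call-name patterns instead.

```python
import ast

# Hypothetical list of method names treated as in-place container mutation.
MUTATING_METHODS = {"append", "extend", "insert", "pop", "remove", "clear", "update", "add"}


def find_mutations(func_source: str) -> list[tuple[str, str, int]]:
    """Return (kind, target, line) triples, sorted by line number."""
    tree = ast.parse(func_source)
    # Names declared with `global` anywhere in the snippet.
    global_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Global):
            global_names.update(node.names)

    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Assign, ast.AugAssign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for t in targets:
                if isinstance(t, ast.Attribute):
                    found.append(("attr_write", ast.unparse(t), node.lineno))
                elif isinstance(t, ast.Subscript):
                    found.append(("container_mutate", ast.unparse(t.value), node.lineno))
                elif isinstance(t, ast.Name) and t.id in global_names:
                    found.append(("global_write", t.id, node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr in MUTATING_METHODS:
                found.append(("container_mutate", ast.unparse(node.func.value), node.lineno))
    return sorted(found, key=lambda m: m[2])


SAMPLE = '''
def handle(self, msg):
    global COUNTER
    COUNTER += 1
    self.last = msg
    self.history.append(msg)
    cache[msg.id] = msg
'''

mutations = find_mutations(SAMPLE)
```

Matching on method names such as `append` is a heuristic and will produce false positives on non-container objects; a real index would likely need to accept that or add type hints as a filter.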
+
+## Goals
+
+1. **Produce a queryable artifact.** The JSON output is the source of truth; markdown + Mermaid are for human review. Re-run after any `src/` change to see drift.
+2. **Surface the top-N optimization candidates per action.** The `summary.md` ranks candidates by potential data-transform load reduction. This is what the user will use to decide which pruning/optimization work to do next.
+3. **Data-grounded design.** The audit's data structure is the spec; the heuristics and the threshold are module-level constants tunable from one place.
+4. **Reusable across actions.** The `trace_action` API takes any `Action` (entry point + description). Adding a 4th action (e.g., MMA worker spawn, when it's no longer cold) is one `Action(...)` declaration.
+5. **Surface calibration gaps clearly.** When the static heuristic can't resolve a call (C-extension, decorator-driven dispatch, `getattr` magic), the report flags it as "unresolved" so the runtime-profiling follow-up targets it.
+
+## Non-Goals
+
+- Not implementing the actual code optimizations — that's `pipeline_pruning_20260607`.
+- Not profiling runtime costs — that's `pipeline_runtime_profiling_20260607`.
+- Not analyzing the MMA worker spawn action (cold per user).
+- Not analyzing `simulation/*` or `tests/*` directories.
+- Not analyzing actions beyond the 3 in scope.
+- Not resolving C-extension call costs statically.
+- Not resolving decorator-driven call dispatch statically (e.g., `@property`, `@imscope`).
+- Not providing real microsecond measurements — the cost is heuristic (calibrated later).
+
+## Architecture
+
+`src/code_path_audit.py` — single new module, no new dependencies. Exposes both an MCP tool surface (for agents) and a CLI (`python -m src.code_path_audit ...`).
+
+### Public API
+
+```python
+class CallGraph:
+    """Directed graph: nodes are functions; edges are call sites."""
+    nodes: dict[str, "FunctionNode"]            # fully-qualified name -> node
+    edges: dict[str, set[str]]                 # caller -> set of callees
+    def add_edge(self, caller: str, callee: str) -> None: ...
+    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]: ...
+    def render_mermaid(self, root: str, max_depth: int = 5) -> str: ...
+
+class FunctionNode:
+    fqname: str                                # "src.ai_client.AIClient.send"
+    file: str
+    line: int
+    calls: list[str]                           # all callees (resolved or not)
+    state_mutations: list["StateMutation"]
+    expensive_ops: list["ExpensiveOp"]
+
+class StateMutation:
+    target: str                                # "self.history", "module.events", "file:..."
+    kind: Literal["attr_write", "container_mutate", "file_write", "ipc_emit", "global_write"]
+    line: int
+
+class ExpensiveOp:
+    callee: str
+    cost_class: Literal["file_io", "network", "ast_parse", "json_io", "pickle", "deep_copy", "loop_amplified"]
+    data_size_estimate: int | None              # bytes or container length, heuristic
+    line: int                                  # call site in the caller
+    weight: int                                # cost_class_weight * data_size (or 1 if data_size unknown)
+
+class Action:
+    name: str                                  # "ai_message_lifecycle"
+    entry_points: list[str]                    # ["src.app_controller.AppController.process_user_request", ...]
+    description: str
+
+class ActionProfile:
+    action: Action
+    call_graph: CallGraph                      # subgraph reachable from entry points
+    expensive_ops: list[ExpensiveOp]           # all expensive ops in the subgraph
+    state_mutations: list[StateMutation]       # all mutations in the subgraph
+    redundancy: list[tuple[str, int]]          # (op_fqname, call_count) where count > 1
+    pipelining_candidates: list[list[str]]     # groups of independent ops currently sequential
+    total_load_estimate: int                   # sum(weight) heuristic
+    unresolved_calls: list[str]                # calls the AST walker couldn't resolve
+    mermaid: str                               # rendered Mermaid
+    markdown: str                              # human-readable per-action report
+
+def trace_action(action: Action, max_depth: int = 10) -> ActionProfile: ...
+def build_call_graph(src_dir: str = "src") -> CallGraph: ...   # full call graph
+def build_expensive_ops_index(cg: CallGraph) -> dict[str, list[ExpensiveOp]]: ...
+def build_state_mutations_index(cg: CallGraph) -> dict[str, list[StateMutation]]: ...
+```
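To make the `CallGraph` contract above concrete, here is a minimal sketch of `add_edge` and a depth-limited `transitive_callees`. It assumes a breadth-first walk in which `max_depth=1` means direct callees only, and it leaves the root out of its own result even when a cycle leads back to it; the spec fixes neither choice, so both are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CallGraph:
    """Directed graph keyed by fully-qualified function name (sketch only)."""
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_edge(self, caller: str, callee: str) -> None:
        self.edges.setdefault(caller, set()).add(callee)
        self.edges.setdefault(callee, set())  # make sure leaf nodes exist too

    def transitive_callees(self, root: str, max_depth: int = 10) -> set[str]:
        """Breadth-first walk; nodes at depth >= max_depth are not expanded."""
        seen: set[str] = set()
        queue = deque([(root, 0)])
        while queue:
            name, depth = queue.popleft()
            if depth >= max_depth:
                continue
            for callee in self.edges.get(name, ()):
                # Skip already-visited nodes so cycles terminate.
                if callee not in seen and callee != root:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
        return seen


# Synthetic 5-node graph with one back edge (c -> a).
cg = CallGraph()
for caller, callee in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("c", "a")]:
    cg.add_edge(caller, callee)
```

With this graph, `cg.transitive_callees("a")` reaches all four other nodes, while `max_depth=2` stops after `b` and `c`.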
+
+### Cost Model (heuristic, calibrated by the runtime-profiling follow-up)
+
+| Pattern | Cost class | Default weight | Data size source |
+|---------|-----------|----------------|------------------|
+| `open()`, `Path.read_*`, `Path.write_*`, `*.write_text` | `file_io` | 100 | file size from `Path.stat()` when resolvable, else `None` |
+| `requests.*`, `urllib.*`, `websockets.*`, `client.send` (with httpx-like signatures) | `network` | 500 | payload size from param literal/typed hint |
+| `ast.parse`, `ast.walk`, `tree_sitter.*` | `ast_parse` | 200 | source bytes from the path arg |
+| `json.dump`, `json.load`, `tomli_w.dump`, `tomllib.load` | `json_io` | 150 | container length if param is a list/dict |
+| `pickle.dump`, `pickle.load` | `pickle` | 300 | container length |
+| `copy.deepcopy` | `deep_copy` | 200 | container length |
+| Any call inside the body of a `for` / `while` loop | `loop_amplified` | base weight of the looped call × loop_bound_estimate | loop bound = `range(...)` literal/arg, else 1 |
+
+**Expense threshold:** `EXPENSIVE_THRESHOLD = 40_000` (module-level constant). Any `ExpensiveOp` whose `weight` exceeds `EXPENSIVE_THRESHOLD` is flagged "expensive" in the per-action report. The 40,000 default is meant to mirror the upper end of the user's stated 10-40 μs range (40 μs = 40,000 ns); because weights are unitless heuristics rather than measured time, the runtime-profiling follow-up will calibrate it.
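As a worked illustration of the cost model and threshold, the sketch below computes a weight from the table's default class weights. Two readings are assumptions, not decisions taken from the spec: an unknown `data_size_estimate` is treated as a size of 1, and loop amplification is modelled as a plain multiplier on the call's base weight. `op_weight` and `is_expensive` are hypothetical helper names.

```python
from __future__ import annotations

EXPENSIVE_THRESHOLD = 40_000  # module-level constant from the spec

# Default per-class weights copied from the cost model table.
COST_CLASS_WEIGHTS = {
    "file_io": 100,
    "network": 500,
    "ast_parse": 200,
    "json_io": 150,
    "pickle": 300,
    "deep_copy": 200,
}


def op_weight(cost_class: str, data_size_estimate: int | None, loop_bound: int = 1) -> int:
    """weight = class weight * data size * loop bound (assumed formula)."""
    size = 1 if data_size_estimate is None else data_size_estimate  # assumption
    return COST_CLASS_WEIGHTS[cost_class] * size * loop_bound


def is_expensive(weight: int) -> bool:
    """Strictly greater than the threshold, as the spec states."""
    return weight > EXPENSIVE_THRESHOLD


# A 500-byte file read: 100 * 500 = 50,000, above the threshold.
read_weight = op_weight("file_io", 500)
# A 200-entry json.dump: 150 * 200 = 30,000, below the threshold...
dump_weight = op_weight("json_io", 200)
# ...but the same call inside a loop with an estimated bound of 2 crosses it.
looped_dump_weight = op_weight("json_io", 200, loop_bound=2)
```

Under these assumptions an op with unknown size can never exceed the threshold on its own, which is one reason the calibration follow-up matters.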
+
+**Unresolved calls:** when the AST walker cannot resolve a callee (e.g., attribute access on `self.X` where `X` is set dynamically; `getattr`; decorator-wrapped method dispatch), the call goes into `unresolved_calls` and is reported with an `"unresolved"` marker in place of a cost class (it is not one of the 7 cost classes) and weight 0. The report's caveats section notes these; the runtime-profiling follow-up measures them.
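The resolved/unresolved split described above might look like the following sketch. It only resolves plain dotted names and treats `getattr`, `eval`, `exec`, and calls on call results as unresolved; `dotted_name` and `collect_calls` are hypothetical helpers, and a real walker would also need import and class context to turn `self.X` into a fully-qualified name.

```python
from __future__ import annotations

import ast

DYNAMIC_BUILTINS = {"getattr", "eval", "exec"}


def dotted_name(node: ast.expr) -> str | None:
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on the result of another call


def collect_calls(source: str) -> tuple[list[str], list[str]]:
    """Split call sites into (resolved names, unresolved expressions)."""
    resolved, unresolved = [], []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name is None or name in DYNAMIC_BUILTINS:
            unresolved.append(ast.unparse(node.func))
        else:
            resolved.append(name)
    return resolved, unresolved


SAMPLE = '''
def f(obj, path):
    data = json.load(open(path))
    getattr(obj, "run")()
    return copy.deepcopy(data)
'''

resolved, unresolved = collect_calls(SAMPLE)
```

Here the `getattr(obj, "run")()` line contributes two unresolved entries: the `getattr` call itself and the call on its result.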
+
+### Out of the static analysis
+
+- C-extension call costs (imgui-bundle, tree-sitter native) — runtime profiling only.
+- Decorator-driven dispatch (e.g., `@property`, `@imscope`) — runtime profiling only.
+- Import cost at module load time — covered by the existing `scripts/audit_main_thread_imports.py`.
+- `eval` / `exec` calls — flagged as unresolved, not analyzed.
+
+## Per-Action Design
+
+For each of the 3 actions, the audit is invoked with one or more entry points and a depth limit (default 10). The audit produces an `ActionProfile` that the report renders.
+
+| Action | Entry points | Expected high-cost ops the audit should surface |
+|--------|--------------|------------------------------------------------|
+| **AI message lifecycle** | `src.app_controller.AppController.process_user_request`, `src.ai_client.AIClient.send`, `src.aggregate.build_file_items`, `src.summarize._summarise_*` | Per-context-file AST parse in `build_file_items`; AI network call; history append + comms log append + session_logger file write; sub-agent summarization (network + AST, loop-amplified over context files) |
+| **Discussion save/load** | `src.project_manager.save_project`, `src.project_manager.load_project`, `src.history.HistoryManager.save_snapshot`, `src.models.parse_history_entries` | `tomli_w.dump` / `tomllib.load` on project TOML; `json.dump` on comms log (loop-amplified per entry); history file read/write; AST parse on schema validation |
+| **GUI startup** | `sloppy.main` → `gui_2.App.__init__`, `src.app_controller.AppController.__init__`, `src.paths._resolve_*` | `tomllib.load` on config.toml; AST parses for tool registration; file stat on log paths; `sloppy.py` first-frame import chain (covered by the existing `scripts/audit_main_thread_imports.py`) |
+
+The user can extend with more actions later (e.g., MMA worker spawn when it's no longer cold). Each action is one `Action(...)` declaration + a `trace_action()` call.
+
+## Output Format
+
+CLI:
+```bash
+uv run python -m src.code_path_audit --action ai_message_lifecycle [--depth N] [--json] [--markdown] [--mermaid]
+```
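The flags in the command above map naturally onto `argparse`. This is a sketch under assumptions: the default depth of 10 comes from the `trace_action` signature, `build_parser` is a hypothetical name, and the behaviour when no output flag is passed is left open because the spec does not state it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the CLI line in the spec (sketch only)."""
    parser = argparse.ArgumentParser(prog="python -m src.code_path_audit")
    parser.add_argument("--action", required=True,
                        help="action name, e.g. ai_message_lifecycle")
    parser.add_argument("--depth", type=int, default=10,
                        help="maximum traversal depth")
    parser.add_argument("--json", action="store_true", help="emit JSON output")
    parser.add_argument("--markdown", action="store_true", help="emit markdown output")
    parser.add_argument("--mermaid", action="store_true", help="emit Mermaid output")
    return parser


# Parse an explicit argv list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--action", "ai_message_lifecycle", "--depth", "5", "--mermaid"]
)
default_args = build_parser().parse_args(["--action", "gui_startup"])
```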
+
+MCP tool (for agents):
+```python
+code_path_audit(action_name: str, max_depth: int = 10) -> dict
+```
+
+Generated artifacts (all under `docs/reports/code_path_audit/<YYYY-MM-DD>/`):
+
+| File | Format | Purpose |
+|------|--------|---------|
+| `call_graph.json` | JSON | Full call graph (all of `src/`) |
+| `expensive_ops.json` | JSON | Expensive ops index (per-file, per-function) |
+| `state_mutations.json` | JSON | State mutations index (per function) |
+| `actions/<action>.json` | JSON | Per-action profile (machine-readable) |
+| `actions/<action>.md` | Markdown | Per-action human-readable summary |
+| `actions/<action>.mmd` | Mermaid | Per-action call graph (visual) |
+| `summary.md` | Markdown | Top-level cross-action summary + ranked optimization candidates |
+| `optimization_candidates.md` | Markdown | Ranked list with: candidate, current cost, proposed reduction, effort, priority |
+
+The two follow-up tracks consume the JSON files; the markdown is for human review.
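For the `.mmd` artifact, a depth-limited renderer could emit a Mermaid flowchart along the lines below. This is a sketch that assumes a `graph TD` layout with short synthetic node ids (`n0`, `n1`, ...) and quoted labels; the function is a free-standing, hypothetical stand-in for `CallGraph.render_mermaid`.

```python
def render_mermaid(edges: dict[str, set[str]], root: str, max_depth: int = 5) -> str:
    """Render edges reachable from root within max_depth levels as Mermaid."""
    lines = ["graph TD"]
    ids: dict[str, str] = {}

    def node_id(name: str) -> str:
        # Dotted names are not safe as Mermaid ids, so map them to n0, n1, ...
        return ids.setdefault(name, f"n{len(ids)}")

    seen_edges: set[tuple[str, str]] = set()
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for caller in frontier:
            for callee in sorted(edges.get(caller, ())):
                if (caller, callee) in seen_edges:
                    continue  # each edge is drawn once, which also breaks cycles
                seen_edges.add((caller, callee))
                lines.append(
                    f'    {node_id(caller)}["{caller}"] --> {node_id(callee)}["{callee}"]'
                )
                next_frontier.append(callee)
        frontier = next_frontier
    return "\n".join(lines)


EDGES = {"a": {"b", "c"}, "b": {"d"}, "c": set(), "d": {"a"}}
shallow = render_mermaid(EDGES, "a", max_depth=1)
deeper = render_mermaid(EDGES, "a", max_depth=2)
```

Sorting the callees keeps the output deterministic, which makes diffs between audit runs easier to read.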
+
+## Verification (TDD per `conductor/workflow.md`)
+
+Unit tests in `tests/test_code_path_audit.py`:
+
+- `CallGraph.add_edge` + `transitive_callees` correctness on a synthetic 5-node graph.
+- The expensive-ops index (`build_expensive_ops_index`) detects each of the 7 cost classes on synthetic source.
+- The state-mutations index (`build_state_mutations_index`) detects each of the 5 mutation kinds on synthetic source.
+- `trace_action` produces an `ActionProfile` for a synthetic action whose expected cost is computable by hand.
+- JSON output round-trips (serialize → deserialize yields the same structure).
+- Markdown output is well-formed (header per section, table per category).
+- Mermaid output parses as valid Mermaid syntax.
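The JSON round-trip check in the list above could be as small as the sketch below. It assumes `ExpensiveOp` is implemented as a dataclass with the fields from the Public API section; the spec shows only the attribute list, so the dataclass choice is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ExpensiveOp:
    """Field list taken from the Public API section; dataclass form is assumed."""
    callee: str
    cost_class: str
    data_size_estimate: int | None
    line: int
    weight: int


def round_trip(op: ExpensiveOp) -> ExpensiveOp:
    """Serialize to JSON text and rebuild the object from the parsed dict."""
    payload = json.dumps(asdict(op), sort_keys=True)
    return ExpensiveOp(**json.loads(payload))


known_size = ExpensiveOp("json.dump", "json_io", 200, 42, 30_000)
unknown_size = ExpensiveOp("requests.post", "network", None, 7, 500)
```

Including an op with `data_size_estimate=None` exercises the `null` path, which is where round-trip bugs are most likely to hide.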
+
+Smoke test: run `python -m src.code_path_audit --action ai_message_lifecycle --depth 5` against a fixture project; verify the report is produced and contains the expected high-cost ops (per the table above).
+
+Manual verification: the report is the deliverable. The Tier 2 Tech Lead and the user review the produced `summary.md` to confirm the optimization candidates make sense.
+
+## Commit Structure (6 atomic commits, in order)
+
+```
+1. feat(audit): add code_path_audit data structures (CallGraph, ExpensiveOpIndex, StateMutationIndex)
+   - src/code_path_audit.py (initial data structures)
+   - tests/test_code_path_audit.py (unit tests)
+2. feat(audit): add trace_action + ActionProfile + cost model
+   - src/code_path_audit.py (extends with action tracing)
+   - tests/test_code_path_audit.py (integration tests)
+3. feat(audit): add JSON / markdown / Mermaid output
+4. feat(audit): add MCP tool + CLI surface
+5. docs(audit): run audit on 3 actions; commit report
+   - docs/reports/code_path_audit/2026-06-07/* (the deliverable)
+6. conductor(tracks): mark Code Path Audit track complete
+   - tracks.md update
+```
+
+Each commit gets a `git notes add -m "..."` summary attached per `conductor/workflow.md` step 9.1-9.3.
+
+## Risks
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| Heuristic cost model is imprecise; reported "expensive" ops aren't actually expensive at runtime. | Medium | Medium (false positives dilute the report) | `EXPENSIVE_THRESHOLD` is a module-level constant; the runtime-profiling follow-up calibrates it. |
+| AST walking misses dynamic patterns (eval, getattr, decorator-driven dispatch). | Medium | Medium (under-estimates some calls) | Document the limitations in the report's caveats section; the runtime-profiling follow-up catches these. |
+| Mermaid diagrams exceed renderable size for deep actions. | Medium | Low (visualization only) | Default `max_depth=5` for `--mermaid`; full graph available as JSON. |
+| The 3 actions' entry points are not exactly the functions the user has in mind. | Medium | Low (the report is the artifact; user can re-run with different entry points) | Document the chosen entry points in the report; CLI/MCP tool accepts any fully-qualified function name. |
+| Report is too large to review (thousands of expensive ops). | Low | Medium | Per-action scoping; a shallower `--depth 5` run (the default depth is 10); ranked optimization candidates in `summary.md` make the top-N obvious. |
+| Existing `derive_code_path` is the de-facto call-graph tool and the new one is redundant. | Low | Low (the new one is a strict superset) | `derive_code_path` stays as a thin wrapper around `code_path_audit.trace_action` for backward compat, OR gets a `@deprecated` shim. |
+| The 3 actions are not actually the user's top 3 (user might have meant a different 3). | Low | Low (the tool is generic; re-run with different actions is one CLI call) | CLI accepts any `Action`; user can re-run. |
+
+## Coordination with Pending Tracks
+
+This track has **no blockers** and **no conflicts**. It can ship independently of the 5 active planned tracks. **It enables** future refactors:
+
+| Pending track | Could use this analysis for... |
+|----------------|--------------------------------|
+| `qwen_llama_grok_integration_20260606` | Identifying redundant OpenAI-compatible request paths in `_send_*` functions |
+| `data_oriented_error_handling_20260606` | Showing the call paths the new `Result[T]` return values will thread through |
+| `data_structure_strengthening_20260606` | Pinpointing hot functions where the new type aliases matter most |
+| `mcp_architecture_refactor_20260606` | Identifying which sub-MCPs have the most expensive operations (file_io vs network vs ast) |
+| `test_batching_refactor_20260606` | Confirming which tests trigger the most expensive paths (to optimize test selection) |
+
+This track's analysis is **read-only**: it doesn't modify existing `src/` modules, doesn't change the public API, and doesn't alter tests already in the test suite. The only new files are `src/code_path_audit.py` (the tool), `tests/test_code_path_audit.py` (the tests), and the report under `docs/reports/code_path_audit/2026-06-07/`.
+
+## Follow-up
+
+- **`pipeline_runtime_profiling_20260607`** (the user-requested follow-up; NOT in this track): adds a runtime profiling harness using the existing `src/performance_monitor.py` + a per-action test fixture. Measures real costs for the 3 actions. Calibrates the heuristic cost model (`EXPENSIVE_THRESHOLD` + per-class weights). Catches "things that aren't easy to resolve statically" — import cost, JIT effects, GC pauses, C-extension call cost (imgui-bundle, tree-sitter native), decorator-driven dispatch. Output: `scripts/runtime_profiler.py` + updated `code_path_audit.py` cost model.
+- **`pipeline_pruning_20260607`** (the second follow-up; NOT in this track): implements the high-priority optimization candidates surfaced by this track's report. Will be scoped AFTER this track ships, since the report itself defines what to prune.
+
+## Out of Scope
+
+- **MMA worker spawn action** (deferred per user — keeping MMA cold until the 1:1 discussion UX is dogfooded in a few projects).
+- **Implementing the optimization fixes** (deferred to `pipeline_pruning_20260607`).
+- **Runtime profiling** (deferred to `pipeline_runtime_profiling_20260607` per the user's explicit ask).
+- **Other major actions** beyond AI message, save/load, GUI startup.
+- **C-extension call costs** (deferred to runtime profiling).
+- **Decorator-driven call dispatch** (deferred to runtime profiling).
+- **`simulation/*` and `tests/*` directories** (analysis is `src/`-only for this track; can be extended later).
+- **Modifying existing `src/` modules** (read-only analysis; the only new file there is `src/code_path_audit.py`).
+
+## See Also
+
+- `conductor/archive/code_path_analysis_20260507/` — prior manual audit; the new track is its data-grounded successor.
+- `conductor/archive/ai_interaction_call_graph_20260507/` — prior sequence diagram for the AI loop.
+- `src/mcp_client.py:934-992` — `derive_code_path(target, max_depth=5)` (single-symbol tracer; the new tool supersedes this for multi-action use).
+- `src/performance_monitor.py` — runtime profiling infrastructure used by the `pipeline_runtime_profiling_20260607` follow-up.
+- `scripts/audit_main_thread_imports.py` — related static CI gate (startup-time import cost).
+- `docs/reports/PLANNING_DIGEST_20260606.md` — planning context; the 5 active planned tracks are independent of this one.
+- `docs/guide_data_oriented.md` (if it exists; otherwise `conductor/product-guidelines.md` "Data-Oriented & Immediate Mode Heuristics") — the project's data-oriented design philosophy this track follows.