conductor(track-update): code_path_audit_20260607 - post-4-tracks timing + 5-source framing

The user specified that the code_path_audit_20260607 track should run AFTER the 4 foundational tracks complete (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor). This commit formalizes that timing and grounds the audit's analytical framing in the 5 sources loaded into context on 2026-06-08.

Changes to spec.md, plan.md, and tracks.md (no task changes):

1. Post-4-tracks timing (new section in spec.md §"Timing", plus a "Timing" callout in plan.md's opening):
   - The 4 tracks will significantly reshape src/ai_client.py, src/mcp_client.py, src/app_controller.py, and src/type_aliases.py.
   - Running the audit on pre-refactor code would produce a report that's stale on day 1.
   - The post-4-tracks timing ensures the audit grounds optimization decisions for the *resulting* architecture.
   - Pre-flight check: verify all 4 tracks are [x] completed in conductor/tracks.md before starting this track.
2. Analytical framing (new section in spec.md §"Analytical Framing (5-source lens)"):
   - Maps each of the 5 sources (Fleury taxonomy, Fleury combinatoric, Muratori Big OOPs, Reece Assuming, and the user's chunk ideation) to specific audit-time heuristics.
   - 4 concrete heuristics: effective-codepath count, entity-hierarchy fingerprint, assumed-too-much detector, chunkification candidates.
   - The heuristics shape REPORT INTERPRETATION, not the static cost model (which stays data-grounded in EXPENSIVE_THRESHOLD + per-class weights).
3. See Also cross-references in spec.md (5 new entries):
   - nagent_review Pitfalls #2 and #4 (provider history globals + stateful singleton)
   - wo84LFzx5nI Big OOPs transcript (full text, 4310 segments, 200KB; loaded 2026-06-08)
   - i-h95QIGchY Assuming transcript (full text, 3719 segments, 162KB; loaded 2026-06-08)
   - ed_chunk_data_structures_20260523.md (5-image archive of user's chunk ideation, 19KB; saved 2026-06-08)
   - computational_shapes_ssdl_digest_20260608.md (the SSDL digest that synthesizes the 4-source computational-shapes thinking; the audit's tree/mermaid outputs ARE computational-shape visualizations)
4. tracks.md entry updated to include the spec/plan links and a brief status note that the audit is post-4-tracks.
5. plan.md has a "Timing" callout at the top stating the 4 tracks must ship before the plan executes.

No code modified. The audit's tasks (Phases 1-6) are unchanged in structure; the new sections only add analytical context and timing constraints.
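
To make the entity-hierarchy fingerprint heuristic (spec.md §"Analytical Framing") more concrete, here is a minimal sketch of how it might be detected with Python's standard `ast` module. It is an illustration under stated assumptions, not the planned `src/code_path_audit.py` implementation: the function name `entity_hierarchy_fingerprints`, the constants `ENTITY_DISCRIMINATORS` and `MUTATION_THRESHOLD`, the choice of `kind`/`type` as discriminator attribute names, and the sample `Controller`/`TicketKind` code are all hypothetical. The sketch flags an `if self.X.kind == ...:` branch whose body writes more than 3 fields of `self.X`, mirroring the "> 3 writes" threshold in the spec.

```python
import ast

# Hypothetical knobs for this sketch (not taken from spec.md's cost model).
ENTITY_DISCRIMINATORS = {"kind", "type"}  # attribute names treated as type tags
MUTATION_THRESHOLD = 3                    # spec.md: flag "> 3 writes"


def _self_attr_chain(node):
    """Return ('self', 'a', 'b') for `self.a.b`, or None if not rooted at self."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name) and node.id == "self":
        return ("self", *reversed(parts))
    return None


def _written_chains(stmts):
    """Yield the self-rooted attribute chain of every assignment target in stmts.

    Only plain, augmented, and annotated assignments with a single attribute
    target are seen; tuple-unpacking targets are ignored in this sketch.
    """
    for stmt in stmts:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Assign):
                targets = node.targets
            elif isinstance(node, (ast.AugAssign, ast.AnnAssign)):
                targets = [node.target]
            else:
                continue
            for target in targets:
                chain = _self_attr_chain(target)
                if chain is not None:
                    yield chain


def entity_hierarchy_fingerprints(source: str) -> list[tuple[str, str, int]]:
    """Return (function, entity, writes) for each `if self.X.kind == ...:` branch
    whose body writes more than MUTATION_THRESHOLD fields of self.X.

    Note: ast.walk(func) also descends into nested defs, so a nested function's
    branches are reported under both the outer and the inner name.
    """
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not (isinstance(node, ast.If) and isinstance(node.test, ast.Compare)):
                continue
            chain = _self_attr_chain(node.test.left)
            if chain is None or len(chain) < 3 or chain[-1] not in ENTITY_DISCRIMINATORS:
                continue
            entity = chain[:-1]  # e.g. ('self', 'active_ticket')
            writes = sum(
                1
                for written in _written_chains(node.body)
                if len(written) > len(entity) and written[: len(entity)] == entity
            )
            if writes > MUTATION_THRESHOLD:
                findings.append((func.name, ".".join(entity), writes))
    return findings


SAMPLE = '''
class Controller:
    def on_ticket(self):
        if self.active_ticket.kind == TicketKind.BUG:
            self.active_ticket.status = "open"
            self.active_ticket.owner = None
            self.active_ticket.priority += 1
            self.active_ticket.labels = ["bug"]
        elif self.active_ticket.kind == TicketKind.TASK:
            self.active_ticket.status = "todo"
'''

print(entity_hierarchy_fingerprints(SAMPLE))  # → [('on_ticket', 'self.active_ticket', 4)]
```

Only the `BUG` branch is flagged; the `elif` branch makes a single write. Writes through local aliases (e.g. `t = self.active_ticket; t.status = ...`) are not counted, so a real detector would need some data-flow tracking to catch those.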
2026-06-08 22:05:54 -04:00
parent 2eef50c5c2
commit a9333bbb59
3 changed files with 55 additions and 2 deletions
@@ -533,7 +533,8 @@ User review surfaced five outstanding UI issues, each previously attempted witho
 *Link: [./tracks/test_batching_post_refactor_polish_20260607/](./tracks/test_batching_post_refactor_polish_20260607/)*

 #### Track: Code Path Audit
-*Link: [./tracks/code_path_audit_20260607/](./tracks/code_path_audit_20260607/)*
+*Link: [./tracks/code_path_audit_20260607/](./tracks/code_path_audit_20260607/), Spec: [./tracks/code_path_audit_20260607/spec.md](./tracks/code_path_audit_20260607/spec.md), Plan: [./tracks/code_path_audit_20260607/plan.md](./tracks/code_path_audit_20260607/plan.md) (to be authored by writing-plans skill)*
+*Goal: Build `src/code_path_audit.py` — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix `.dsl` data + markdown + Mermaid + prefix tree text under `docs/reports/code_path_audit/<date>/`. The follow-up `pipeline_pruning_20260607` consumes the `.dsl` files; the markdown + tree are for human review. MMA worker spawn is **cold per user**. **Timing (revised 2026-06-08):** the audit must run *after* the 4 foundational tracks ship (`qwen_llama_grok`, `data_oriented_error_handling`, `data_structure_strengthening`, `mcp_architecture_refactor`); pre-4-tracks code is too stale to ground optimization decisions.*

 #### Track: GUI Architecture Refinement
 *Link: [./tracks/gui_architecture_refinement_20260512/](./tracks/gui_architecture_refinement_20260512/) (no spec.md; needs scoping before planning)*
@@ -2,6 +2,14 @@

 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

+> **Timing (added 2026-06-08).** This plan should **not** be executed until *all 4 foundational tracks* are shipped:
+> 1. `qwen_llama_grok_integration_20260606`
+> 2. `data_oriented_error_handling_20260606`
+> 3. `data_structure_strengthening_20260606`
+> 4. `mcp_architecture_refactor_20260606`
+>
+> The 4 tracks will significantly reshape `src/ai_client.py`, `src/mcp_client.py`, `src/app_controller.py`, and `src/type_aliases.py`. Running this audit on the pre-refactor `src/` would produce a report that's stale on day 1. The Tier 2 Tech Lead should verify the 4-tracks baseline (all marked `[x]` in `conductor/tracks.md`) before starting Phase 1.
+
 **Goal:** Build `src/code_path_audit.py` — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix `.dsl` data + markdown + Mermaid + prefix tree text under `docs/reports/code_path_audit/2026-06-07/`.

 **Architecture:** Single new module `src/code_path_audit.py`. No new dependencies. Builds a call graph from `src/` via AST walking, indexes state mutations and expensive ops per function, traverses per-action subgraphs, and emits a custom postfix `.dsl` (machine) + markdown + Mermaid (visual) + prefix tree text (human). The postfix `.dsl` is a custom DSL tailored to the audit's record shapes — tagged records (each "word" is a constructor with a known arity), length-prefixed lists, whitespace-tokenized, with `"..."` quoting only when needed. The prefix tree renderer is a separate view of the same data, generated by a recursive walker. Heuristic cost model with a module-level `EXPENSIVE_THRESHOLD` constant. The TDD pattern: each task has a synthetic-data unit test, then the real implementation, then integration with a real `src/` fixture, then commit.
@@ -1,10 +1,12 @@
 # Track: Code Path & Data Pipeline Audit

-**Status:** Spec approved 2026-06-07
+**Status:** Spec approved 2026-06-07; revised 2026-06-08 with post-4-tracks timing and 5-source framing
 **Initialized:** 2026-06-07
 **Owner:** Tier 2 Tech Lead
 **Priority:** Medium (foundational; enables follow-up pruning track)

+> **Revision note (2026-06-08).** The user specified that this audit should run *after* the 4 foundational tracks complete (`qwen_llama_grok_integration_20260606`, `data_oriented_error_handling_20260606`, `data_structure_strengthening_20260606`, `mcp_architecture_refactor_20260606`). The 4 tracks will significantly reshape `src/ai_client.py`, `src/mcp_client.py`, `src/app_controller.py`, and `src/type_aliases.py` — running the audit on the pre-refactor code would produce a report that's stale on day 1. The post-4-tracks timing ensures the audit grounds optimization decisions for the *resulting* architecture, not the pre-refactor one. See §"Timing" below.
+
 ---

 ## Overview
@@ -15,6 +17,43 @@ Per the user's framing: "anything that can even remotely smell as an expensive b

 The MMA worker spawn action is **out of scope** for this track (per user: "keeping that cold for a while until I like the main ux loop with ai in a discussion fully dogfooded").

+## Timing (post-4-tracks)
+
+This track is intentionally **deferred** until *after* the 4 foundational tracks ship:
+
+1. `qwen_llama_grok_integration_20260606` — adds 3 vendors (`_send_qwen`, `_send_llama`, `_send_grok`) and refactors `_send_minimax` to use the shared `send_openai_compatible()` helper. Modifies `src/ai_client.py`, `src/openai_compatible.py` (new), `src/vendor_capabilities.py` (new).
+2. `data_oriented_error_handling_20260606` — refactors `ai_client._send_<vendor>` to return `Result[str]`, modifies `mcp_client.py` (30+ sites), `rag_engine.py` (Result returns).
+3. `data_structure_strengthening_20260606` — adds `src/type_aliases.py` with 10 TypeAliases, replaces 345 weak-type sites across 6 files.
+4. `mcp_architecture_refactor_20260606` — splits `src/mcp_client.py` (2,205 lines → 6 sub-MCPs + 1 external), adds `src/mcp_client_legacy.py` for backward compat.
+
+Running the audit on the **pre-refactor** `src/` would produce a report that's stale on day 1. The post-4-tracks timing ensures:
+- The audit's data grounds optimization decisions for the *resulting* architecture (e.g., Fleury-style "effective codepaths" and "ECS archetype tables", if the 4 tracks are implemented with the data-oriented philosophy).
+- The `pipeline_pruning_20260607` follow-up has the *right* candidates to optimize — the 4 tracks will move the expensive ops around, and pruning the wrong ones wastes work.
+- The runtime-profiling follow-up (`pipeline_runtime_profiling_20260607`) measures the *new* code paths, not the old ones.
+
+**Pre-flight check (verifies the 4-tracks baseline before this track starts):** confirm that all 4 tracks are marked `[x]` completed in `conductor/tracks.md`. If any of the 4 are still `[~]` in-progress, this track is blocked — the audit would catch the in-progress state as drift.
+
+## Analytical Framing (5-source lens)
+
+The 5 sources loaded into context for the post-4-tracks audit collectively reframe *what* to look for in the 3 actions. The audit's static cost model and pipeline-pruning recommendations should be informed by:
+
+| Source | Lens the audit inherits |
+|---|---|
+| [Ryan Fleury, "A Taxonomy of Computation Shapes"](https://www.dgtlgrove.com/p/a-taxonomy-of-computation-shapes) (Feb 2023) | The 6 shapes: instruction, codepath, wide codepath, codecycle, wide codecycle, codecycle graph. The audit's `trace_action` is a codepath visualization; the `redundancy` (call_count > 1) field detects **wide codepaths** that could be split into parallel sub-codepaths. |
+| [Ryan Fleury, "The Codepath Combinatoric Explosion"](https://www.dgtlgrove.com/p/the-codepath-combinatoric-explosion) (Apr 2023) | The "effective codepath" concept. The audit's `pipelining_candidates` field detects codepaths that *could be defused* (multiple real codepaths collapsed into 1 effective codepath via nil sentinels, generational handles, or immediate-mode APIs). The `redundancy` field is the *first indicator* of defusing opportunities. |
+| [Casey Muratori, "The Big OOPs: Anatomy of a Thirty-Five-Year Mistake" (BSC 2025)](https://youtu.be/wo84LFzx5nI) | The 35-year historical indictment of compile-time domain hierarchies. The audit's per-function `state_mutations` index reveals whether a function is in the *system* pattern (mutates component-like data, not entity state) or the *entity-hierarchy* pattern (mutates a single object's identity, where the cost compounds per type). Functions in the latter pattern are the *highest-priority* refactor targets — they may need to be split into components + systems. |
+| [Andrew Reece, "Assuming as Much as Possible" (BSC 2025)](https://www.youtube.com/watch?v=i-h95QIGchY) | The "assume as much as possible" engineering discipline. The audit's `expensive_ops` index, for any function that calls a general-purpose primitive (e.g., `json.dumps`, `Path.read_text`, `ast.parse`), should ask: **"can this caller assume a smaller input domain and use a specialized primitive instead?"** A function that calls `json.dumps` 50 times per action with 1KB payloads each may be replaceable by a function that calls a domain-specific serializer once with a 50KB payload. |
+| User's chunk-ideation archive (May 2026) | The "fixed-size slices" + "ECS archetype tables" pattern. The audit should flag per-function calls that operate on lists/arrays if they (a) have no chunk-aware variant, (b) are in a hot path, and (c) operate on data uniform enough to chunk. Functions that match all 3 are the **prime candidates** for `pipeline_pruning_20260607` — chunkification is a known pattern with bounded risk. |
+
+**Concrete audit-time heuristics** that emerge from this framing:
+
+- **Effective-codepath count:** when a function has 3+ branches that all do roughly the same thing with different inputs, the audit should report "this is N real codepaths behaving as 1 effective codepath — could be defused with a nil sentinel or generational handle." The runtime-profiling follow-up measures the actual savings.
+- **Entity-hierarchy fingerprint:** when a function's `state_mutations` list has > 3 writes to a single `self.X` with a `type` discriminator, the audit should report "this function is operating on entity-hierarchy state; consider ECS split into components + systems." A *concrete Manual Slop example* the audit should catch: any function that does `if self.active_ticket.kind == TicketKind.X:` and then mutates multiple fields.
+- **Assumed-too-much detector:** when a function calls `ast.parse` (or any `tree_sitter.*`) on a file that *could be assumed* to be already-parsed (because the file is in the context composition and the `aggregate.py` pipeline has already done it), the audit should report "this is re-parsing data that was already parsed upstream; consider memoizing or threading the parsed AST through." This is the "assume as much as possible" pattern at the data-passing level.
+- **Chunkification candidates:** when a function loops over a `list[dict]` with a known uniform shape (heuristic: all dicts have the same key set), the audit should report "consider chunkifying — uniform data, hot path, no chunk awareness." The user has explicit code (`docs/ideation/ed_chunk_data_structures_20260523.md`) for the chunk pattern, so the audit's optimization candidates can cite it.
+
+These heuristics are *guidance for the audit's report interpretation* — they don't change the audit's static cost model (which is data-grounded in the existing `EXPENSIVE_THRESHOLD` + per-class weights). They shape how the Tier 2 Tech Lead and the user interpret the report.
+
 ## Current State Audit (as of `ca781543`)

 `src/` has 61 `.py` files (27,447 total lines; 23,845 code lines). The call graph is non-trivial; per-action traversal is what makes the analysis tractable.
@@ -291,3 +330,8 @@ This track's analysis is **read-only** — it doesn't modify `src/`, doesn't cha
 - `scripts/audit_main_thread_imports.py` — related static CI gate (startup-time import cost).
 - `docs/reports/PLANNING_DIGEST_20260606.md` — planning context; the 5 active planned tracks are independent of this one.
 - `docs/guide_data_oriented.md` (if it exists; otherwise `conductor/product-guidelines.md` "Data-Oriented & Immediate Mode Heuristics") — the project's data-oriented design philosophy this track follows.
+- **`conductor/tracks/nagent_review_20260608/report.md` §15** (Pitfalls #2 and #4, "provider-specific history in process globals" and "AI client is a stateful singleton") — the audit's `state_mutations` index will surface both of these in the post-4-tracks `src/ai_client.py`; the optimization candidates should specifically address them.
+- **`docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt`** — full transcript of Casey Muratori's "The Big OOPs" talk, loaded 2026-06-08 for context. The historical genealogy (Stroustrup, Kay, Simula, Hoare) grounds the audit's "entity-hierarchy fingerprint" heuristic (above). Specifically, Hoare's 1966 "Record Handling" paper introduced discriminated unions — which Simula kept (as `inspect`) but C++ removed. The audit's `actions/ai_message_lifecycle.tree` should be checked for `if/else` chains that *would be* a discriminated union if `Result[T]` were threaded through.
+- **`docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt`** — full transcript of Andrew Reece's "Assuming as Much as Possible" talk, loaded 2026-06-08 for context. Reece's "Xar" data structure (8-byte header, power-of-2 chunks, bitwise divmod, no `realloc` copy) is the *exemplar* for the chunkification-candidate heuristic. The `summary.md` of the audit's report should note the Xar pattern as a possible optimization target for any function in the hot path that does append-heavy work on a list of uniform items.
+- **`docs/ideation/ed_chunk_data_structures_20260523.md`** — user's chunk-based-data-structure ideation (May 2026). The 5-image archive is the source of the "chunkification candidates" heuristic. Specifically, the user notes: *"if my chunk size is 1,000 elements, but I only have 5 elements to store, aren't I wasting a massive amount of memory?"* — the audit should distinguish *real* chunkification candidates (uniform data, hot path, large N) from *false* chunkification candidates (small N, low frequency, polymorphic data).
+- **`docs/reports/computational_shapes_ssdl_digest_20260608.md`** — the SSDL digest synthesizing the 4-source computational-shapes thinking. The audit's `actions/<action>.tree` and `actions/<action>.mmd` outputs *are* computational-shape visualizations; the SSDL vocabulary (6 primitives + 7 modifiers) is the conceptual model the audit's tree renderer should follow.