# Track: Chunkification Optimization (C11 Pipeline Contingency)

**Status:** Placeholder / contingency (do not start without a hard constraint)
**Initialized:** 2026-06-08
**Owner:** Tier 2 Tech Lead
**Priority:** DEFERRED (no current bottleneck)

> **The one-paragraph summary.** This is a *contingency document*, not an active track. It activates only when a hard constraint surfaces that no existing Python package can solve, AND the target is hot enough that the C11 build cost is justified. Per user (verbatim): *"only worth it if I reach a hard constraint that I cannot solve with an existing python package. Then I could make a custom pipelien to deal with the hot data set witha custom cpython extension."* The two cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are **not currently bottlenecks** per `src/aggregate.py:380-454` (current implementation is pure-Python string concat, zero third-party markdown deps in `pyproject.toml:6-27`) and `src/history.py:1-141` (snapshot deep copy is bounded at ~500KB at 100-snapshot capacity, debounced in `gui_2.py:1140-1170`).
>
> **The activation plan** is the substantive content of this doc: what to build *if/when* the hard constraint surfaces. The shape is a request-blob → C11 pipeline → response-blob subprocess, NOT a stateful CPython C extension. This is the v2 framing from `docs/reports/c11_python_interop_assessment_20260608.md` Part 3, §3.5-3.12.

---

## 1. Why this is a contingency, not a track

### 1.1 The two target use cases are not currently bottlenecks

**Markdown parsing into aggregate markdown:**

- `src/aggregate.py:380-454` (`build_markdown_from_items`) builds markdown by **pure-Python string concatenation** (`f"### \`{original}\`\n\n\`\`\`{suffix}\n{skeleton}\n\`\`\""` and `"\n\n---\n\n".join(sections)`)
- `pyproject.toml:6-27` has **zero third-party markdown dependencies** (`mistune`, `markdown-it-py`, `commonmark-py`, `markdown` are all NOT in deps)
- `src/summarize.py:7-219` `_summarise_markdown` only extracts headings; it does not parse the body
- **First fix if this becomes a bottleneck:** add `markdown-it-py` to `pyproject.toml`. A ~1 line change, with an estimated (not yet measured) ~10x speedup over pure-Python regex parsing. NOT C11.

**Context snapshot processing:**

- `src/history.py:1-141` `UISnapshot` is a 13-field dataclass. 100-snapshot default capacity. ~500KB max payload
- `HistoryManager` snapshot capture is debounced in the render loop (`gui_2.py:1140-1170`) rather than run on every frame
- `to_dict()` / `from_dict()` deep copies are the only meaningful work
- **First fix if this becomes a bottleneck:** switch from `to_dict`/`from_dict` to `pickle` (estimated 5-10x faster) or `msgspec` (estimated 10-20x faster). Neither figure has been measured on this codebase. NOT C11.

### 1.2 The threshold is "hard constraint that no existing Python package can solve"

Per user, the C11 path is justified ONLY when profiling demonstrates a real bottleneck AND the existing-Python-package fix has been tried and does not work. **This has not happened yet.**

---

## 2. The activation plan (what to build when the constraint surfaces)

### 2.1 Wire format (the contract)

The Python side builds a request envelope; the C11 side reads it, runs ops, and writes a response. The wire format is the ONLY contract; both sides agree on it.

**v1 (text, debuggable):**

```
# request.txt
op parse_md
op summarise_python
op mask_symbols @sym1 def @sym2 sig
op build_section tier=3
input file src/foo.py
input file src/bar.py
format markdown_v3
end
```

**v2 (binary, fast):**

```
[1 byte: format version]
[1 byte: op_count]
[for each op: op_id | param_count | params]
[for each input: byte_len | path | content]
```

**Recommended:** start with text v1, and switch to binary v2 only if profiling shows that parse cost matters. A reasonable middle path: **text envelope + binary payloads** (you can `cat` the envelope to debug; the heavy bytes move as binary).

### 2.2 The C11 pipeline API

Single entry point. Standalone binary. No Python awareness.

```c
// chunks_module.c (hypothetical)
typedef Struct_(PipelineResponse) {
    U1*  bytes;      // response blob (assumes U1 is the 1-byte unsigned type in duffle.h style)
    U8   len;        // blob length in bytes
    U4   exit_code;  // 0 = success
    Str8 error_msg;  // optional
};

IA_ PipelineResponse pipeline_run(Slice request);
```

The C side:

1. Parses the request envelope
2. Loads input files (or accepts inline blobs)
3. Runs each op in order
4. Collects output into the response blob
5. Returns exit code + response

### 2.3 The Python wrapper

```python
# Python side (hypothetical)
import subprocess


class PipelineError(RuntimeError):
    """Raised when the C pipeline exits with a non-zero code."""


def run_pipeline(request: str) -> str:
    """Shell out to the C pipeline; return its raw stdout (the response blob)."""
    proc = subprocess.run(
        ["./manual_slop_pipeline"],  # the C binary
        input=request,
        capture_output=True,
        text=True,
        timeout=30,  # raises subprocess.TimeoutExpired if the binary hangs
    )
    if proc.returncode != 0:
        raise PipelineError(proc.stderr)
    return proc.stdout
```

**Subprocess model is recommended for v1:**

- Zero FFI surface (no ctypes, no PyTypeObject, no refcount discipline)
- Trivially testable from the shell
- Total process isolation (a C crash doesn't take down Python)
- ~10-20ms startup tax per call (acceptable for batch ops, not for per-frame hot loops)
- Easy to swap implementations (rewrite the binary, keep the wire format)

**Move to in-process FFI only if subprocess startup is the new bottleneck.** The wire format doesn't change.
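
The Python side of the contract can be exercised before any C exists. The following is a minimal sketch, assuming the text v1 grammar is exactly what the example envelope shows (`op`, `input`, `format`, `end`, with `#` comment lines). `PipelineRequest` and `parse_envelope` are hypothetical names that do not exist in the codebase, and the parser is only a Python stand-in for what the C side would do, useful for round-trip tests.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRequest:
    """Hypothetical builder for the text v1 request envelope."""
    ops: list[str] = field(default_factory=list)
    inputs: list[str] = field(default_factory=list)
    output_format: str = "markdown_v3"

    def op(self, name: str, *args: str) -> "PipelineRequest":
        # One "op" line per call; args are passed through verbatim.
        self.ops.append(" ".join((name, *args)))
        return self

    def input_file(self, path: str) -> "PipelineRequest":
        self.inputs.append(path)
        return self

    def render(self) -> str:
        # Line order follows the v1 example: ops, inputs, format, end.
        lines = [f"op {op}" for op in self.ops]
        lines += [f"input file {path}" for path in self.inputs]
        lines.append(f"format {self.output_format}")
        lines.append("end")
        return "\n".join(lines) + "\n"


def parse_envelope(text: str) -> dict:
    """Parse a v1 envelope back into a dict (assumed grammar, test helper only)."""
    parsed = {"ops": [], "inputs": [], "format": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no meaning
        if line == "end":
            return parsed
        keyword, _, rest = line.partition(" ")
        if keyword == "op":
            parsed["ops"].append(rest.split())
        elif keyword == "input":
            kind, _, path = rest.partition(" ")
            parsed["inputs"].append((kind, path))
        elif keyword == "format":
            parsed["format"] = rest
        else:
            raise ValueError(f"unknown envelope line: {line!r}")
    raise ValueError("envelope is missing the terminating 'end' line")
```

The string returned by `render()` is what would be passed to `run_pipeline`. Keeping a Python parser next to the builder means the wire format can be pinned down with tests first, which matches the test-first approach this plan proposes.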

### 2.4 The chunkification (Reece's Xar pattern in duffle.h style)

The chunk-array lives *inside* the C pipeline as a private implementation detail. Python never sees it.

```c
// chunks_module.c (hypothetical, duffle.h style)
typedef Struct_(ChunkArray) {
    Slice  chunks;        // pointer table: { Chunk** ptr; U8 len; }
    U4     chunk_size;    // elements per chunk, power of 2
    U4     element_size;  // bytes per element (8 for the U8 elements used here)
    U8     total_used;    // elements pushed so far
    FArena backing_arena;
};

IA_ U8 chunka_push(ChunkArray* ca, U8 element) {
    U8 chunk_idx = ca->total_used >> log2_of(ca->chunk_size);
    if (chunk_idx >= ca->chunks.len) {
        // Grow by one chunk; existing chunks never move.
        // NOTE: assumes the pointer table was sized up front for the maximum chunk count.
        Chunk* new_chunk = farena_push_type(& ca->backing_arena, Chunk, .alignment=64);
        ca->chunks.ptr[ca->chunks.len] = new_chunk;
        ca->chunks.len += 1;
    }
    U8 offset = ca->total_used & (ca->chunk_size - 1);
    // Byte-address the slot inside the chunk, then store the element.
    U8* dst = (U8*)((U1*)ca->chunks.ptr[chunk_idx] + offset * ca->element_size);
    dst[0] = element;
    ca->total_used += 1;
    return ca->total_used - 1;
}

IA_ U8 chunka_at(ChunkArray* ca, U8 i) {
    U8 chunk_idx = i >> log2_of(ca->chunk_size);
    U8 offset    = i & (ca->chunk_size - 1);
    return *(U8*)((U1*)ca->chunks.ptr[chunk_idx] + offset * ca->element_size);
}
```

This is Reece's Xar pattern (8-byte header, power-of-2 chunks, bitwise divmod) written in the user's duffle.h style. ~200 lines of C for the chunk-array + ops.

### 2.5 Build + deploy

- **Build:** `clang -O3 -std=c23 chunks_module.c -o manual_slop_pipeline` (`.exe` on Windows) for the subprocess model. Add `-shared` and target `libchunks.so` (or `.dll`) only if the in-process FFI path is taken later.
- **Distribution:** ship the binary alongside the Python wheel. Building it as part of `uv sync` would have to go through the project's build backend; uv does not appear to document a `[tool.uv.scripts]` table, so verify the hook mechanism before relying on it.
- **Test:** `tests/test_chunka_c11.py`, TDD-style: write the Python tests first, then write the C, then verify
- **Subprocess invocation:** `subprocess.run([os.path.join(sysconfig.get_path("scripts"), "manual_slop_pipeline")], ...)`

### 2.6 The decision tree (when activated)

```
Is the target code path actually a bottleneck in profiling?
├── No → Don't activate. Re-evaluate next quarter.
│
└── Yes → Is the bottleneck solvable with existing Python packages?
    ├── Yes (e.g., switch to_dict/from_dict to pickle) → Apply that fix.
    │         Cost: hours. Don't reach for C11.
    │
    └── No (existing packages aren't fast enough) → Activate this track:
              1. Define wire format (text v1, binary v2)
              2. Write C11 pipeline binary in duffle.h style
              3. Write Python wrapper (subprocess.run)
              4. Profile: confirm C11 path is faster than Python baseline
              5. If not faster, throw away C11 code and try different Python package
```

---

## 3. Activation criteria (the 4 questions to revisit)

These are the design decisions to make *when* (not before) the user hits a real bottleneck:

1. **Which target?** Is it markdown parsing, snapshot processing, log aggregation, RAG indexing, or something else? Each has different op shapes.
2. **Subprocess or in-process FFI?** Start with subprocess. Move to in-process only if startup cost is the new bottleneck.
3. **Text or binary wire format?** Text v1 (debuggable). Binary v2 (fast). Envelope-text + payload-binary as the middle ground.
4. **One pipeline binary or many?** One binary with an op registry (simpler to build/test/deploy). Many binaries (more modular, harder to coordinate). Recommend one binary.

---

## 4. What this track does NOT produce (today)

- No C code
- No Python wrapper
- No build configuration
- No tests
- No profiling
- No activation

This track produces only this contingency document. It is **not** in the active queue. It does not appear in the `conductor/tracks.md` "Active Tracks" table. It appears in the "Future / Contingency" section as a *reference*, not a *commitment*.

---

## 5. What this track IS

- A clear, pre-defined activation plan, so that when a hard constraint surfaces the implementation work is already scoped
- An honest record that the current bottlenecks are not yet hard constraints
- A reference for the user's "what would C11 interop look like?" question, answered with the request/response pipeline model
- A reminder that "default action is don't": the existing Python tooling should be tried first

---

## 6. See Also

- `docs/reports/c11_python_interop_assessment_20260608.md`: the full v1 + v2 assessment (style reference, interop design space, the v2 contingency)
- `docs/reports/session_synthesis_20260608.md` §8.2: the original proposal
- `docs/ideation/ed_chunk_data_structures_20260523.md`: the user's chunk-ideation (the underlying principle)
- `docs/reports/computational_shapes_ssdl_digest_20260608.md`: the **SSDL digest** (the theoretical foundation for this track; see §5.2 "Xar-style chunked arrays" + Technique 5 "Assume-away (Xar)" in §2.2 for the explicit pre-supports of this pattern; the "Assume as much as possible" lens in §4 is the threshold-shift rationale: if the cost of being wrong is low, assume; if high, use a different structure)
- `docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt` §56:42: Reece's Xar (reference implementation)
- `docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt`: Muratori's "Big OOPs" (the historical indictment; the "domain vs systems" lens in SSDL §3 derives from this)
- `src/aggregate.py:380-454`: the current markdown hot path (NOT a bottleneck today)
- `src/history.py:1-141`: the current snapshot hot path (NOT a bottleneck today)
- `pyproject.toml:6-27`: current zero-markdown-deps state

### 6.1 The SSDL alignment (why the chunkification is the *correct* shape, when activated)

The SSDL digest's §2.2 enumerates 5 defusing techniques. The chunkification pattern is Technique 5 ("Assume-away (Xar)"). The digest's §5.2 explicitly recommends "Replace `realloc`-style growable buffers with Xar-like chunked arrays for chat history, log buffers, and the comms log", which is *exactly* this track's target. The §5.1 "low-cost, high-value" recommendations include the "Add generational handles to the `TrackDAG` and `Ticket` system" pattern.
If the chunkification track activates for `comms.log`, the *adjacent* ticket-storage refactor (per the digest's §5.2 "Refactor MMA ticket storage toward an ECS shape") becomes a natural follow-up.

**The SSDL digest pre-supports this track.** When the activation criteria are met, the theoretical foundation is already in place. The implementation work is *applying* the SSDL's Technique 5 + the user's duffle.h style to a specific target.

---

*End of contingency. Status: DEFERRED. Promote to active track when (if) the first hard constraint surfaces.*