# Track: Chunkification Optimization (C11 Pipeline Contingency)

**Status:** Placeholder / contingency (do not start without a hard constraint)
**Initialized:** 2026-06-08
**Owner:** Tier 2 Tech Lead
**Priority:** DEFERRED (no current bottleneck)

> **The one-paragraph summary.** This is a *contingency document*, not an active track. It activates only when a hard constraint surfaces that no existing Python package can solve, AND the target is hot enough that the C11 build cost is justified. Per user (verbatim): *"only worth it if I reach a hard constraint that I cannot solve with an existing python package. Then I could make a custom pipelien to deal with the hot data set witha custom cpython extension."* The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are **not currently bottlenecks** per `src/aggregate.py:380-454` (current implementation is pure-Python string concat, zero third-party markdown deps in `pyproject.toml:6-27`) and `src/history.py:1-141` (snapshot deep copy is bounded ~500KB at 100-snapshot capacity, debounced in `gui_2.py:1140-1170`).
>
> **The activation plan** is the substantive content of this doc — what to build *if/when* the hard constraint surfaces. The shape is a request-blob → C11 pipeline → response-blob subprocess, NOT a stateful CPython C extension. This is the v2 framing from `docs/reports/c11_python_interop_assessment_20260608.md` Part 3, §3.5-3.12.

---

## 1. Why this is a contingency, not a track

### 1.1 The two target use cases are not currently bottlenecks

**Markdown parsing into aggregate markdown:**
- `src/aggregate.py:380-454` (`build_markdown_from_items`) builds markdown by **pure-Python string concatenation** (`f"### \`{original}\`\n\n\`\`\`{suffix}\n{skeleton}\n\`\`\""` and `"\n\n---\n\n".join(sections)`)
- `pyproject.toml:6-27` has **zero third-party markdown dependencies** (`mistune`, `markdown-it-py`, `commonmark-py`, `markdown` are all NOT in deps)
- `src/summarize.py:7-219` `_summarise_markdown` only extracts headings; doesn't parse body
- **First fix if this becomes a bottleneck:** add `markdown-it-py` to `pyproject.toml`. ~1 line change, ~10x speedup over pure-Python regex parsing. NOT C11.

**Context snapshot processing:**
- `src/history.py:1-141` `UISnapshot` is a 13-field dataclass. 100-snapshot default capacity. ~500KB max payload
- `HistoryManager` snapshot capture is debounced at render frame (`gui_2.py:1140-1170`), not per-frame
- `to_dict()` / `from_dict()` deep-copies are the only meaningful work
- **First fix if this becomes a bottleneck:** switch from `to_dict`/`from_dict` to `pickle` (5-10x faster) or `msgspec` (10-20x faster). NOT C11.

### 1.2 The threshold is "hard constraint that no existing Python package can solve"

Per user, the C11 path is justified ONLY when profiling demonstrates a real bottleneck AND the existing-Python-package fix has been tried and doesn't work. **This has not happened yet.**

---

## 2. The activation plan (what to build when the constraint surfaces)

### 2.1 Wire format (the contract)

The Python side builds a request envelope; the C11 side reads it, runs ops, writes a response. The wire format is the ONLY contract; both sides agree on it.

**v1 (text, debuggable):**
```
# request.txt
op parse_md
op summarise_python
op mask_symbols @sym1 def @sym2 sig
op build_section tier=3
input file src/foo.py
input file src/bar.py
format markdown_v3
end
```

**v2 (binary, fast):**
```
[1 byte: format version]
[1 byte: op_count]
[for each op: op_id | param_count | params]
[for each input: byte_len | path | content]
```

**Recommended:** start with text v1, switch to binary v2 if profiling shows parse cost matters. A reasonable middle path: **text envelope + binary payloads** (you can `cat` the envelope to debug; the heavy bytes move binary).

### 2.2 The C11 pipeline API

Single entry point. Standalone binary. No Python awareness.

```c
// chunks_module.c (hypothetical)
typedef Struct_(PipelineResponse) {
    U8* bytes;
    U8  len;
    U4  exit_code;   // 0 = success
    Str8 error_msg;  // optional
};

IA_ PipelineResponse pipeline_run(Slice request);
```

The C side:
1. Parses the request envelope
2. Loads input files (or accepts inline blobs)
3. Runs each op in order
4. Collects output into response blob
5. Returns exit code + response

### 2.3 The Python wrapper

```python
# Python side (hypothetical)
import subprocess
import json

def run_pipeline(request: str) -> str:
    """Shell out to the C pipeline; return parsed response."""
    proc = subprocess.run(
        ["./manual_slop_pipeline"],  # the C binary
        input=request,
        capture_output=True,
        text=True,
        timeout=30,
    )
    if proc.returncode != 0:
        raise PipelineError(proc.stderr)
    return proc.stdout
```

**Subprocess model is recommended for v1:**
- Zero FFI surface (no ctypes, no PyTypeObject, no refcount discipline)
- Trivially testable from the shell
- Total process isolation (C crash doesn't take down Python)
- ~10-20ms startup tax per call (acceptable for batch ops, not for per-frame hot loops)
- Easy to swap implementations (rewrite the binary, keep wire format)

**Move to in-process FFI only if subprocess startup is the new bottleneck.** The wire format doesn't change.

### 2.4 The chunkification (Reece's Xar pattern in duffle.h style)

The chunk-array lives *inside* the C pipeline as a private implementation detail. Python never sees it.

```c
// chunks_module.c (hypothetical, duffle.h style)
typedef Struct_(ChunkArray) {
    Slice  chunks;        // { Chunk* ptr; U8 len; }
    U4     chunk_size;    // power-of-2
    U4     element_size;
    U8     total_used;
    FArena backing_arena;
};

IA_ U8 chunka_push(ChunkArray* ca, U8 element) {
    U4 chunk_idx = ca->total_used >> log2_of(ca->chunk_size);
    if (chunk_idx >= ca->chunks.len) {
        Chunk* new_chunk = farena_push_type(& ca->backing_arena, Chunk, .alignment=64);
        ca->chunks.ptr[ca->chunks.len] = new_chunk;
        ca->chunks.len += 1;
    }
    U4 offset = ca->total_used & (ca->chunk_size - 1);
    U8* dst = (U8*)&ca->chunks.ptr[chunk_idx][offset * ca->element_size];
    dst[0] = element;
    ca->total_used += 1;
    return ca->total_used - 1;
}

IA_ U8 chunka_at(ChunkArray* ca, U8 i) {
    U4 chunk_idx = i >> log2_of(ca->chunk_size);
    U4 offset    = i & (ca->chunk_size - 1);
    return ((U8*)ca->chunks.ptr[chunk_idx])[offset * ca->element_size];
}
```

This is Reece's Xar pattern (8-byte header, power-of-2 chunks, bitwise divmod) written in the user's duffle.h style. ~200 lines of C for the chunk-array + ops.

### 2.5 Build + deploy

- **Build:** `clang -O3 -std=c23 -shared chunks_module.c -o libchunks.so` (or .dll on Windows)
- **Distribution:** ship the binary alongside the Python wheel. uv + pyproject.toml can reference a `[tool.uv.scripts]` entry that builds the C binary as part of `uv sync`
- **Test:** `tests/test_chunka_c11.py` — TDD-style, write Python tests first, then write the C, verify
- **Subprocess invocation:** `subprocess.run([sysconfig.get_path("scripts") + "/manual_slop_pipeline"], ...)`

### 2.6 The decision tree (when activated)

```
Is the target code path actually a bottleneck in profiling?
├── No  → Don't activate. Re-evaluate next quarter.
│
└── Yes → Is the bottleneck solvable with existing Python packages?
    ├── Yes (e.g., switch to_dict/from_dict to pickle) → Apply that fix.
    │         Cost: hours. Don't reach for C11.
    │
    └── No (existing packages aren't fast enough) → Activate this track:
              1. Define wire format (text v1, binary v2)
              2. Write C11 pipeline binary in duffle.h style
              3. Write Python wrapper (subprocess.run)
              4. Profile: confirm C11 path is faster than Python baseline
              5. If not faster, throw away C11 code and try different Python package
```

---

## 3. Activation criteria (the 4 questions to revisit)

These are the design decisions to make *when* (not before) the user hits a real bottleneck:

1. **Which target?** Is it markdown parsing, snapshot processing, log aggregation, RAG indexing, or something else? Each has different op shapes.
2. **Subprocess or in-process FFI?** Start with subprocess. Move to in-process only if startup cost is the new bottleneck.
3. **Text or binary wire format?** Text v1 (debuggable). Binary v2 (fast). Envelope-text + payload-binary middle ground.
4. **One pipeline binary or many?** One binary with op registry (simpler to build/test/deploy). Many binaries (more modular, harder to coordinate). Recommend one binary.

---

## 4. What this track does NOT produce (today)

- No C code
- No Python wrapper
- No build configuration
- No tests
- No profiling
- No activation

This track produces only this contingency document. It is **not** in the active queue. It does not appear in `conductor/tracks.md` "Active Tracks" table. It appears in the "Future / Contingency" section as a *reference*, not a *commitment*.

---

## 5. What this track IS

- A clear, pre-defined activation plan so when a hard constraint surfaces, the implementation work is already scoped
- An honest record that the current bottlenecks are not yet hard constraints
- A reference for the user's "what would C11 interop look like?" question, answered with the request/response pipeline model
- A reminder that "default action is don't" — the existing Python tooling should be tried first

---

## 6. See Also

- `docs/reports/c11_python_interop_assessment_20260608.md` — the full v1 + v2 assessment (style reference, interop design space, the v2 contingency)
- `docs/reports/session_synthesis_20260608.md` §8.2 — the original proposal
- `docs/ideation/ed_chunk_data_structures_20260523.md` — the user's chunk-ideation (the underlying principle)
- `docs/reports/computational_shapes_ssdl_digest_20260608.md` — the **SSDL digest** (the theoretical foundation for this track; see §5.2 "Xar-style chunked arrays" + Technique 5 "Assume-away (Xar)" in §2.2 for the explicit pre-supports of this pattern; "Assume as much as possible" lens in §4 is the threshold-shift rationale — if the cost of being wrong is low, assume; if high, use a different structure)
- `docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt` §56:42 — Reece's Xar (reference implementation)
- `docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt` — Muratori's "Big OOPs" (the historical indictment; the "domain vs systems" lens in SSDL §3 derives from this)
- `src/aggregate.py:380-454` — the current markdown hot path (NOT a bottleneck today)
- `src/history.py:1-141` — the current snapshot hot path (NOT a bottleneck today)
- `pyproject.toml:6-27` — current zero-markdown-deps state

### 6.1 The SSDL alignment (why the chunkification is the *correct* shape, when activated)

The SSDL digest's §2.2 enumerates 5 defusing techniques. The chunkification pattern is Technique 5 ("Assume-away (Xar)"). The digest's §5.2 explicitly recommends "Replace `realloc`-style growable buffers with Xar-like chunked arrays for chat history, log buffers, and the comms log" — which is *exactly* this track's target.

The §5.1 "low-cost, high-value" recommendations include the "Add generational handles to the `TrackDAG` and `Ticket` system" pattern. If the chunkification track activates for `comms.log`, the *adjacent* ticket-storage refactor (per the digest's §5.2 "Refactor MMA ticket storage toward an ECS shape") becomes a natural follow-up.

**The SSDL digest pre-supports this track.** When the activation criteria are met, the theoretical foundation is already in place. The implementation work is *applying* the SSDL's Technique 5 + the user's duffle.h style to a specific target.

---

*End of contingency. Status: DEFERRED. Promote to active track when (if) the first hard constraint surfaces.*