Private

Public Access

Files

T

conductor-tier2 816e9f2f5c conductor(track): chunkification_optimization_20260608_PLACEHOLDER - 1-page contingency document

The user's third correction this session changed the framing
from "build a stateful C extension" to "wait for a hard constraint,
then build a request/response blob pipeline." This commit creates
a 1-page contingency document (no plan.md, no implementation)
that captures:

- The threshold: "only worth it under a hard constraint that
  no existing Python package can solve"
- The shape when activated: subprocess-launch C11 binary with
  request/response blob wire format (NOT stateful CPython C
  extension)
- The 2 cited candidates (markdown parsing into aggregate markdown,
  context snapshot processing) are NOT currently bottlenecks per
  src/aggregate.py:380-454 (pure-Python string concat, zero
  third-party markdown deps in pyproject.toml:6-27) and
  src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity,
  debounced)
- The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 +
  "Xar-style chunked arrays" recommendation in §5.2 pre-support
  this track

Files (4 total, 227+ lines of contingency document):
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md

Cross-references added:
- docs/reports/computational_shapes_ssdl_digest_20260608.md (the
  SSDL digest is the theoretical foundation; explicitly cited in
  the spec's §6.1 "SSDL alignment" and in metadata.json external)
- docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2
  assessment; explicitly cited in spec's §6 See Also)

No code modified. Track does NOT appear in the active queue
of conductor/tracks.md; appears in the Backlog / Contingency
section as a reference, not a commitment.

Activation criteria (per metadata.json):
1. Profiling shows a real bottleneck in a target code path
2. The bottleneck cannot be solved with existing Python packages
3. The user explicitly approves activation

Without all 3, this track stays deferred. Default action is don't.

2026-06-08 23:40:27 -04:00

12 KiB

Raw Blame History

Track: Chunkification Optimization (C11 Pipeline Contingency)

Status: Placeholder / contingency (do not start without a hard constraint) Initialized: 2026-06-08 Owner: Tier 2 Tech Lead Priority: DEFERRED (no current bottleneck)

The one-paragraph summary. This is a contingency document, not an active track. It activates only when a hard constraint surfaces that no existing Python package can solve, AND the target is hot enough that the C11 build cost is justified. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package. Then I could make a custom pipelien to deal with the hot data set witha custom cpython extension." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are not currently bottlenecks per src/aggregate.py:380-454 (current implementation is pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (snapshot deep copy is bounded ~500KB at 100-snapshot capacity, debounced in gui_2.py:1140-1170).

The activation plan is the substantive content of this doc — what to build if/when the hard constraint surfaces. The shape is a request-blob → C11 pipeline → response-blob subprocess, NOT a stateful CPython C extension. This is the v2 framing from docs/reports/c11_python_interop_assessment_20260608.md Part 3, §3.5-3.12.

1. Why this is a contingency, not a track

1.1 The two target use cases are not currently bottlenecks

Markdown parsing into aggregate markdown:

src/aggregate.py:380-454 (build_markdown_from_items) builds markdown by pure-Python string concatenation (f"### \{original}`\n\n```{suffix}\n{skeleton}\n``""and"\n\n---\n\n".join(sections)`)
pyproject.toml:6-27 has zero third-party markdown dependencies (mistune, markdown-it-py, commonmark-py, markdown are all NOT in deps)
src/summarize.py:7-219 _summarise_markdown only extracts headings; doesn't parse body
First fix if this becomes a bottleneck: add markdown-it-py to pyproject.toml. ~1 line change, ~10x speedup over pure-Python regex parsing. NOT C11.

Context snapshot processing:

src/history.py:1-141 UISnapshot is a 13-field dataclass. 100-snapshot default capacity. ~500KB max payload
HistoryManager snapshot capture is debounced at render frame (gui_2.py:1140-1170), not per-frame
to_dict() / from_dict() deep-copies are the only meaningful work
First fix if this becomes a bottleneck: switch from to_dict/from_dict to pickle (5-10x faster) or msgspec (10-20x faster). NOT C11.

1.2 The threshold is "hard constraint that no existing Python package can solve"

Per user, the C11 path is justified ONLY when profiling demonstrates a real bottleneck AND the existing-Python-package fix has been tried and doesn't work. This has not happened yet.

2. The activation plan (what to build when the constraint surfaces)

2.1 Wire format (the contract)

The Python side builds a request envelope; the C11 side reads it, runs ops, writes a response. The wire format is the ONLY contract; both sides agree on it.

v1 (text, debuggable):

# request.txt
op parse_md
op summarise_python
op mask_symbols @sym1 def @sym2 sig
op build_section tier=3
input file src/foo.py
input file src/bar.py
format markdown_v3
end

v2 (binary, fast):

[1 byte: format version]
[1 byte: op_count]
[for each op: op_id | param_count | params]
[for each input: byte_len | path | content]

Recommended: start with text v1, switch to binary v2 if profiling shows parse cost matters. A reasonable middle path: text envelope + binary payloads (you can cat the envelope to debug; the heavy bytes move binary).

2.2 The C11 pipeline API

Single entry point. Standalone binary. No Python awareness.

// chunks_module.c (hypothetical)
typedef Struct_(PipelineResponse) {
    U8* bytes;
    U8  len;
    U4  exit_code;   // 0 = success
    Str8 error_msg;  // optional
};

IA_ PipelineResponse pipeline_run(Slice request);

The C side:

Parses the request envelope
Loads input files (or accepts inline blobs)
Runs each op in order
Collects output into response blob
Returns exit code + response

2.3 The Python wrapper

# Python side (hypothetical)
import subprocess
import json

def run_pipeline(request: str) -> str:
    """Shell out to the C pipeline; return parsed response."""
    proc = subprocess.run(
        ["./manual_slop_pipeline"],  # the C binary
        input=request,
        capture_output=True,
        text=True,
        timeout=30,
    )
    if proc.returncode != 0:
        raise PipelineError(proc.stderr)
    return proc.stdout

Subprocess model is recommended for v1:

Zero FFI surface (no ctypes, no PyTypeObject, no refcount discipline)
Trivially testable from the shell
Total process isolation (C crash doesn't take down Python)
~10-20ms startup tax per call (acceptable for batch ops, not for per-frame hot loops)
Easy to swap implementations (rewrite the binary, keep wire format)

Move to in-process FFI only if subprocess startup is the new bottleneck. The wire format doesn't change.

2.4 The chunkification (Reece's Xar pattern in duffle.h style)

The chunk-array lives inside the C pipeline as a private implementation detail. Python never sees it.

// chunks_module.c (hypothetical, duffle.h style)
typedef Struct_(ChunkArray) {
    Slice  chunks;        // { Chunk* ptr; U8 len; }
    U4     chunk_size;    // power-of-2
    U4     element_size;
    U8     total_used;
    FArena backing_arena;
};

IA_ U8 chunka_push(ChunkArray* ca, U8 element) {
    U4 chunk_idx = ca->total_used >> log2_of(ca->chunk_size);
    if (chunk_idx >= ca->chunks.len) {
        Chunk* new_chunk = farena_push_type(& ca->backing_arena, Chunk, .alignment=64);
        ca->chunks.ptr[ca->chunks.len] = new_chunk;
        ca->chunks.len += 1;
    }
    U4 offset = ca->total_used & (ca->chunk_size - 1);
    U8* dst = (U8*)&ca->chunks.ptr[chunk_idx][offset * ca->element_size];
    dst[0] = element;
    ca->total_used += 1;
    return ca->total_used - 1;
}

IA_ U8 chunka_at(ChunkArray* ca, U8 i) {
    U4 chunk_idx = i >> log2_of(ca->chunk_size);
    U4 offset    = i & (ca->chunk_size - 1);
    return ((U8*)ca->chunks.ptr[chunk_idx])[offset * ca->element_size];
}

This is Reece's Xar pattern (8-byte header, power-of-2 chunks, bitwise divmod) written in the user's duffle.h style. ~200 lines of C for the chunk-array + ops.

2.5 Build + deploy

Build: clang -O3 -std=c23 -shared chunks_module.c -o libchunks.so (or .dll on Windows)
Distribution: ship the binary alongside the Python wheel. uv + pyproject.toml can reference a [tool.uv.scripts] entry that builds the C binary as part of uv sync
Test: tests/test_chunka_c11.py — TDD-style, write Python tests first, then write the C, verify
Subprocess invocation: subprocess.run([sysconfig.get_path("scripts") + "/manual_slop_pipeline"], ...)

2.6 The decision tree (when activated)

Is the target code path actually a bottleneck in profiling?
├── No  → Don't activate. Re-evaluate next quarter.
│
└── Yes → Is the bottleneck solvable with existing Python packages?
    ├── Yes (e.g., switch to_dict/from_dict to pickle) → Apply that fix.
    │         Cost: hours. Don't reach for C11.
    │
    └── No (existing packages aren't fast enough) → Activate this track:
              1. Define wire format (text v1, binary v2)
              2. Write C11 pipeline binary in duffle.h style
              3. Write Python wrapper (subprocess.run)
              4. Profile: confirm C11 path is faster than Python baseline
              5. If not faster, throw away C11 code and try different Python package

3. Activation criteria (the 4 questions to revisit)

These are the design decisions to make when (not before) the user hits a real bottleneck:

Which target? Is it markdown parsing, snapshot processing, log aggregation, RAG indexing, or something else? Each has different op shapes.
Subprocess or in-process FFI? Start with subprocess. Move to in-process only if startup cost is the new bottleneck.
Text or binary wire format? Text v1 (debuggable). Binary v2 (fast). Envelope-text + payload-binary middle ground.
One pipeline binary or many? One binary with op registry (simpler to build/test/deploy). Many binaries (more modular, harder to coordinate). Recommend one binary.

4. What this track does NOT produce (today)

No C code
No Python wrapper
No build configuration
No tests
No profiling
No activation

This track produces only this contingency document. It is not in the active queue. It does not appear in conductor/tracks.md "Active Tracks" table. It appears in the "Future / Contingency" section as a reference, not a commitment.

5. What this track IS

A clear, pre-defined activation plan so when a hard constraint surfaces, the implementation work is already scoped
An honest record that the current bottlenecks are not yet hard constraints
A reference for the user's "what would C11 interop look like?" question, answered with the request/response pipeline model
A reminder that "default action is don't" — the existing Python tooling should be tried first

6. See Also

docs/reports/c11_python_interop_assessment_20260608.md — the full v1 + v2 assessment (style reference, interop design space, the v2 contingency)
docs/reports/session_synthesis_20260608.md §8.2 — the original proposal
docs/ideation/ed_chunk_data_structures_20260523.md — the user's chunk-ideation (the underlying principle)
docs/reports/computational_shapes_ssdl_digest_20260608.md — the SSDL digest (the theoretical foundation for this track; see §5.2 "Xar-style chunked arrays" + Technique 5 "Assume-away (Xar)" in §2.2 for the explicit pre-supports of this pattern; "Assume as much as possible" lens in §4 is the threshold-shift rationale — if the cost of being wrong is low, assume; if high, use a different structure)
docs/transcripts/i-h95QIGchY_assuming_as_much_as_possible_andrewreece.txt §56:42 — Reece's Xar (reference implementation)
docs/transcripts/wo84LFzx5nI_big_oops_casemuratori.txt — Muratori's "Big OOPs" (the historical indictment; the "domain vs systems" lens in SSDL §3 derives from this)
src/aggregate.py:380-454 — the current markdown hot path (NOT a bottleneck today)
src/history.py:1-141 — the current snapshot hot path (NOT a bottleneck today)
pyproject.toml:6-27 — current zero-markdown-deps state

6.1 The SSDL alignment (why the chunkification is the correct shape, when activated)

The SSDL digest's §2.2 enumerates 5 defusing techniques. The chunkification pattern is Technique 5 ("Assume-away (Xar)"). The digest's §5.2 explicitly recommends "Replace realloc-style growable buffers with Xar-like chunked arrays for chat history, log buffers, and the comms log" — which is exactly this track's target.

The §5.1 "low-cost, high-value" recommendations include the "Add generational handles to the TrackDAG and Ticket system" pattern. If the chunkification track activates for comms.log, the adjacent ticket-storage refactor (per the digest's §5.2 "Refactor MMA ticket storage toward an ECS shape") becomes a natural follow-up.

The SSDL digest pre-supports this track. When the activation criteria are met, the theoretical foundation is already in place. The implementation work is applying the SSDL's Technique 5 + the user's duffle.h style to a specific target.

End of contingency. Status: DEFERRED. Promote to active track when (if) the first hard constraint surfaces.

12 KiB Raw Blame History