conductor(track): nagent_review_v3.1 thicken §11 Collisions case study cluster

2026-06-20 11:31:27 -04:00
parent 10c7d1d074
commit 1574ee47e4
1 changed files with 190 additions and 31 deletions
@@ -2123,31 +2123,42 @@ The shape tag map: `[B]` for the boundary (the case-study is where the model's w

 **Source:** `macton/differentiable-collisions-optc` at `main` (5 commits); `README.md` (full); `src-optimized/OPTIMIZATION-LOG.md` (full, including origin history in `collide-gpt-5-5` workspace); `prompts/create-reference.md` (full); `prompts/create-optimized-test-harness.md` (full); `prompts/create-optimized.md` (full, per §9); `prompts/create-visualizer.md` (full); `prove-optimized-harness.sh` (full, per §3).
 **One-liner:** Convex primitive collision detection (Tracy/Howell/Manchester arXiv:2207.00669): **101.06× on committed input** (median-of-5, ~0.330 s → ~0.003268 s); 97.75× and 98.43× on alternate seeds — 100× generalized claim explicitly NOT made. Tolerance-based match contract: collision flags identical, per-pair distance within `|Δ| ≤ 1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`, contact points certified for validity (not matched). All gates + generalization PASS; contacts 1000/1000 valid.
-**Pattern(s) vs v2.3:** NEW. v2.3 had no case-study repos. v3 introduces the tolerance-based exemplar of §9's 5-element pattern. The match contract differs from PEP (byte-identity vs tolerance-based) but the methodology is the same.
-**Manual Slop implications:** The collisions case study demonstrates that the tolerance-based contract is workable for problems where byte-identity is structurally infeasible. Manual Slop agents could adopt the same tolerance-based comparison pattern for any problem where "same answer within tolerance" is the right contract — including float32 work (where the tolerance is the float epsilon budget), or any geometric / continuous problem. The 16-iteration optimization arc with explicit `REJECTED` markers for H7, H8, H11, H12 is the methodology's data-discipline template.
-**Decision candidate:** NEW Candidate 27 (LOW). "Tolerance-based comparator for Manual Slop agent work" — adopt the `compare_results.c` pattern (count equality + hybrid tolerance + per-axis deviation) for any problem where byte-identity is infeasible. See `decisions.md` Candidate 27.
-**Cross-refs:** §3 Hooks (`prove-optimized-harness.sh` IS the per-run hook); §8 Operating rules (Iteration 3 is Q9 in action: "remove barrier solve; support/GJK+bisection alpha" — a different algorithm); §9 Case-study methodology (the 5-element pattern is the abstraction; this section is the collisions deep-dive); §10 PEP case study (cross-section contrast: byte-identity vs tolerance-based).
-**Source-read citations:**
- `differentiable-collisions-optc/README.md` — full project: 1000-pair benchmark, "The model under test here was GPT-5.5", tolerance-based + collision-flag + contact-validator contract
- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` — full log: 14 iterations in `collide-gpt-5-5` workspace + 12 H-numbered iterations in this repo, 4 explicit rejections (H7, H8, H11, H12), final ~64× committed (the README's "102×" is the earlier `collide-gpt-5-5` workspace committed-input measurement, per the README's framing)
- `differentiable-collisions-optc/prompts/create-reference.md` — reference solver spec (Tracy/Howell/Manchester, deterministic, ±8km domain, 1mm resolution, secondary validator)
- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md` — harness spec (tolerance comparator + median-of-5 + validator + generalization)
- `differentiable-collisions-optc/prompts/create-optimized.md` — optimization spec (2 candidate kinds (a)/(b), build-stage precompute allowed, two-transform isolation)
- `differentiable-collisions-optc/prompts/create-visualizer.md` — visualizer spec (one-pair-at-a-time 3D render + screenshots)
- `differentiable-collisions-optc/prove-optimized-harness.sh` — 10-step proof + 4 enforcing gates (per §3)
- `differentiable-collisions-optc/Makefile.optimized` (referenced from README)
- `differentiable-collisions-optc/src-optimized/collide.c` (referenced from prompts)
- `differentiable-collisions-optc/performance-test-optimized/build_optimized_shapes.c` + `build_optimized_pairs.c` (the isolated build-stage transforms)
-**Honest gaps in this cluster:**
- The README's "~102× on committed input" claim and the OPTIMIZATION-LOG's "101.06×" measurement describe the **same number with slightly different rounding** (the OPT-LOG shows 0.003268 s / 0.330271 s = 101.06×; the README rounds to 102×). The §11 section cites the OPT-LOG's precise number as canonical.
- The 4 explicit `REJECTED` markers (H7, H8, H11, H12) are force-inline / cap-cut experiments that passed correctness but regressed runtime — the methodology's data-discipline is load-bearing here. Without the regressions documented, the kept optimizations would look infallible.
- The two build-stage transforms (`build_optimized_shapes.c` and `build_optimized_pairs.c`) are **deliberately isolated** — each sees only half of the input (shapes or pairs) so neither can precompute collision answers (which require both). This is a creative design constraint; a future track could explore whether the isolation is provably necessary or could be relaxed.
- The "GPT-5.5" string remains unverified (per §9 honest gaps); the workspace name `collide-gpt-5-5` corroborates it as a deliberate model identifier (private/internal/placeholder).
- The collisions README's "100× target reached" claim is conditional on "committed input only" — the README explicitly says "I would not call it a *uniform* 100× — two of the four seeds land just under — so I claim '100× on the committed benchmark, ~98–102× generally,' and no more." This is the methodology's most informative data-discipline point.
+**Pattern summary:** The collisions case study is the §9 5-element pattern applied to a tolerance-based optimization. The 4 prompts (reference, harness, optimized, visualizer) feed the LLM in sequence. The harness implements a tolerance comparator (`compare_results`) with a hybrid distance tolerance `1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`: an absolute floor, a relative term, and an alpha-conditioning term. Contact points are NOT matched (a contact has many equally valid witness points); they are certified for geometric validity by an independent `validate_contacts` tool. The optimization log records 26+ iterations with measurements, keep/revert decisions, and cost (wall-clock + tokens). The 14 origin iterations plus the 8 kept H-numbered optimizations (H1-H6, H9, H10) trace a clear arc: different algorithm (Q9 in Iteration 3: "remove barrier solve; support/GJK+bisection alpha"), per-type specialization (Iterations 5-7), skip unused work (Iteration 8), compact representation (Iteration 9, `cp_shape_lite`), precompute moves (Iteration 12), loop cap reductions (Iterations 11, 13, 14), single precision + re-centering (H1), contact point witness recovery (H2), analytic contact witness (H3), no heap allocation (H4), broadphase assumption + alpha-conditioned tolerance (H5), polytope hull edge precompute (H6), direct scaled support specialization (H9) + force-inline (H10). The 4 rejected hypotheses (H7, H8, H11, H12) all passed correctness but regressed runtime. The data-discipline lesson is that correctness-gating is necessary but not sufficient; performance-gating against the previous kept baseline is also required.

-**Pattern deep-dive.** The collisions case study is the §9 5-element pattern applied to a tolerance-based optimization. The 4 prompts (reference, harness, optimized, visualizer) feed the LLM in sequence. The harness implements a tolerance comparator (`compare_results`) with a hybrid distance tolerance `1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)` — an absolute floor + a relative term + an alpha-conditioning term. Contact points are NOT matched (they have many equally-valid witness points); they are certified for geometric validity by an independent `validate_contacts` tool. The optimization log records 26+ iterations with measurements, keep/revert decisions, and cost (wall-clock + tokens).
+#### §11.1 What the Collisions Case Study Adds
+
+The collisions case study is the tolerance-based exemplar of the §9 5-element pattern. The case study applies the 4-prompt methodology + harness + log + freeze + subject to a real collision-detection optimization problem (Tracy/Howell/Manchester convex primitive collision detection). The results are empirical evidence for the methodology's effectiveness under a tolerance-based correctness contract.
+
+The key results:
+
+- **101.06× speedup on committed input** (median-of-5, ~0.330 s → ~0.003268 s).
+- **97.75× and 98.43× on alternate seeds** — the 100× generalized claim is explicitly NOT made.
+- **Collision flags identical** — the optimized implementation agrees with the reference on every collision flag.
+- **Per-pair distance within tolerance** — `|Δ| ≤ 1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`.
+- **Contact points 1000/1000 valid** — all contact points pass the independent `validate_contacts` tool.
+- **All gates PASS** — tolerance + median-of-5 + validator + generalization.
+- **Generalization PASS** — the optimization works on held-out seeds, not just the committed input.
+
+The match contract is tolerance-based (not byte-identity like PEP), because collision detection has many equally-valid witness points for face/edge contacts. The contract is "collision flags identical + distance within tolerance + contact points certified for validity" — the strictest contract that is structurally feasible for the problem.
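A minimal sketch of the per-pair distance check in this contract, written in Python rather than the repo's C. It assumes distances are in metres (so `1mm` is `1e-3`), reads `|c1−c2|` as a center-to-center gap, and takes `α` from the reference result. The function and parameter names are hypothetical and are not taken from `compare_results.c`.

```python
def distance_tolerance(d_ref: float, center_gap: float, alpha: float) -> float:
    """Hybrid tolerance: absolute floor + relative term + alpha-conditioning term."""
    absolute_floor = 1e-3                                 # 1 mm, assuming metres
    relative_term = 1e-3 * abs(d_ref)                     # 0.1% of the reference distance
    conditioning_term = 5e-4 * (center_gap / alpha ** 2)  # grows as alpha shrinks
    return absolute_floor + relative_term + conditioning_term


def distance_matches(d_opt: float, d_ref: float, center_gap: float, alpha: float) -> bool:
    """True when the optimized distance is within the hybrid tolerance of the reference."""
    return abs(d_opt - d_ref) <= distance_tolerance(d_ref, center_gap, alpha)
```

With `d_ref = 10.0`, `center_gap = 2.0`, and `alpha = 1.0`, the three terms are 0.001, 0.01, and 0.001, so a 5 mm deviation passes and a 50 mm deviation fails.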
+
+#### §11.2 The 4-Prompt Sequence Applied
+
+The 4-prompt sequence for collisions (per §9):
+
+1. **`create-reference.md`** — the reference solver spec: Tracy/Howell/Manchester, deterministic, ±8km domain, 1mm resolution, secondary validator. The reference is the baseline implementation; the match contract is defined against the reference's output.
+
+2. **`create-optimized-test-harness.md`** — the harness spec: tolerance comparator + median-of-5 + validator + generalization. The harness is the per-turn measurement primitive (§3 cross-ref).
+
+3. **`create-optimized.md`** — the optimization spec: 2 candidate kinds (a) "work removal" + (b) "throughput/data layout", build-stage precompute allowed, two-transform isolation. The optimization is bounded by the methodology's Q1-Q9 simplification pass.
+
+4. **`create-visualizer.md`** — the visualizer spec: one-pair-at-a-time 3D render + screenshots. The visualizer is the human-facing layer of the match contract.
+
+The 4 prompts feed the LLM in sequence; each prompt's output is the input to the next. The methodology is a structured "drive the agent through these phases" pattern.
+
+#### §11.3 The Kept Optimizations
+
+The kept optimizations, drawn from the 14 origin iterations and the 8 kept H-numbered iterations (H1-H6, H9, H10), trace a clear arc:

-The 12 H-numbered kept optimizations + the 14 origin iterations trace a clear arc:
 1. **Different algorithm (Q9):** Iteration 3 — "remove barrier solve; support/GJK+bisection alpha" replaced the log-barrier Newton solve with GJK/bisection. Single-largest win (~30x at the time).
 2. **Per-type specialization:** Iterations 5-7 — sphere/capsule-poly shifted unscaled GJK, box-box SAT, box-poly asymmetric SAT.
 3. **Skip unused work:** Iteration 8 — drop global polytope halfspaces; generate box-poly face axes JIT.
@@ -2162,15 +2173,104 @@ The 12 H-numbered kept optimizations + the 14 origin iterations trace a clear ar
 12. **Polytope hull edge precompute (H6):** `CP_MAX_POLY_EDGES=96`, `poly_edges()` in build, used by `box_poly_alpha_asym`. 75.45x.
 13. **Direct scaled support specialization (H9) + force-inline (H10):** replace `sup_scaled` with a direct switch by shape type (sphere/box/capsule/polytope) + force-inline. 79.18x → 82.05x.

+The kept optimizations are a mix of (a) "work removal" and (b) "throughput/data layout" candidate kinds (per §9 + §8). Iteration 3 is a Q9 application ("different algorithm") — the largest single win. The later iterations are Q1/Q3/Q5/Q6 applications.
+
+#### §11.4 The 4 Rejected Hypotheses
+
 The 4 rejected hypotheses (H7, H8, H11, H12) all passed correctness but regressed runtime — the methodology's data-discipline is that correctness-gating is necessary but not sufficient; performance-gating against the previous kept baseline is required.

-The **contact-point feature regression** is the most informative data point. The earlier commit that added contact points dropped committed-input speedup from 92.96x (no contact points) to 18.84x. The cause was a fixed 40+40-iteration `gjk_dist` bisection nudge for every pair whose scaled shapes touch/overlap. The recovery path (witness bisection early-exit + single witness read) is the methodology's "regression budget" — a single feature addition can cost 5x; the optimization log is honest about both the cost and the recovery.
+The rejections are documented in the OPTIMIZATION-LOG with explicit `REJECTED` markers. The rejected experiments are:

-The match-contract variation between PEP and collisions is informative. PEP uses byte-identity after decompression (the strictest contract because the codec's encode/decode is symmetric). Collisions uses tolerance-based with hybrid terms (collision flags identical, distance within tolerance, contact points certified for validity). Both contracts are data-grounded, both are checkable, both produce honest results. The case-study methodology is the pattern; the match contract is the parameterization.
+- **H7** — force-inline attempt; passed correctness but regressed runtime.
+- **H8** — cap-cut attempt; passed correctness but regressed runtime.
+- **H11** — force-inline attempt; passed correctness but regressed runtime.
+- **H12** — cap-cut attempt; passed correctness but regressed runtime.

-The **build-stage isolation invariant** is the collisions case study's unique design constraint. `build_optimized_shapes.c` sees only shapes; `build_optimized_pairs.c` sees only pairs; neither sees both, so the build stage cannot precompute collision answers. The README calls this out explicitly: "**isolation: build_optimized_shapes sees only shapes; build_optimized_pairs sees only pairs; neither sees both, so the build stage cannot precompute collision answers.**" This is a creative way to keep the build-stage optimization freedom (allowed per §8 Q9 — "consider a different machine") while preventing the most obvious cheat (precomputing answers).
+The 4 rejections are the methodology's data-discipline template: every claim is measured, every measurement is gated, every failed gate is reverted. Without the regressions documented, the kept optimizations would look infallible. The OPTIMIZATION-LOG's explicit `REJECTED` markers are the load-bearing data point.
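The keep/revert rule described above can be sketched as a two-gate decision. This is an illustrative Python sketch, not the logic of `prove-optimized-harness.sh`: the function name, the return labels, and the strict "must beat the baseline median" threshold are assumptions.

```python
from statistics import median


def decide(candidate_times, baseline_median, correctness_ok):
    """Return 'KEPT' or 'REJECTED' for one hypothesis.

    candidate_times: wall-clock samples for the candidate (median-of-5 in the case study).
    baseline_median: median runtime of the previous kept baseline.
    correctness_ok:  result of the tolerance comparator + contact validator.
    """
    if not correctness_ok:
        return "REJECTED"  # correctness gate
    if median(candidate_times) >= baseline_median:
        return "REJECTED"  # performance gate: correct, but no faster than the baseline
    return "KEPT"
```

Under this sketch, H7, H8, H11, and H12 would fall through the first gate and be stopped by the second.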

-A code-shape sketch using survey grammar:
+#### §11.5 The Contact-Point Feature Regression
+
+The contact-point feature regression is the most informative data point. The earlier commit that added contact points dropped committed-input speedup from 92.96x (no contact points) to 18.84x, a roughly 5x loss (92.96 / 18.84 ≈ 4.9). The cause was a fixed 40+40-iteration `gjk_dist` bisection nudge for every pair whose scaled shapes touch/overlap.
+
+This is the methodology's "regression budget": a single feature addition can cost 5x, and the optimization log is honest about both the cost and the recovery. The recovery path (H2: witness bisection early-exit + single witness read) is itself a Q1 ("can we not do this at all?") + Q3 ("can we do this fewer times?") application.
+
+#### §11.6 The Build-Stage Isolation Invariant
+
+The build-stage isolation invariant is the collisions case study's unique design constraint. `build_optimized_shapes.c` sees only shapes; `build_optimized_pairs.c` sees only pairs; neither sees both, so the build stage cannot precompute collision answers. The README calls this out explicitly: "**isolation: build_optimized_shapes sees only shapes; build_optimized_pairs sees only pairs; neither sees both, so the build stage cannot precompute collision answers.**"
+
+The isolation is a creative way to keep the build-stage optimization freedom (allowed per §8 Q9 — "consider a different machine") while preventing the most obvious cheat (precomputing answers). The build stage is allowed to optimize the representation (Q3, Q5, Q6), but it cannot precompute the answer (which would be Q1 = "delete the work", but in a way that violates the methodology's data-discipline).
+
+Whether the isolation is provably necessary, or could be relaxed, is an open question that a future track could explore. Either way, the constraint is the methodology's data-discipline in action.
+
+#### §11.7 The Per-Type Specialization Pattern
+
+The per-type specialization pattern is the collisions case study's most distinctive optimization. The reference implementation uses a generic solver (one algorithm for all shape pairs); the optimized implementation uses per-type solvers (sphere-sphere, sphere-box, box-box, box-poly, etc.). The per-type solvers exploit the structure of each pair type to skip work the generic solver cannot.
+
+The per-type specialization is a Q9 application: "consider a different machine that fits the data better". The data (shape pairs) is heterogeneous (sphere pairs, box pairs, poly pairs, mixed pairs); a different machine for each pair type is faster than a generic machine for all pair types. The optimization is the data's shape pointing to a different machine.
+
+The per-type specialization is also a Q3 application: "can we do this fewer times?". The generic solver runs the same algorithm for every pair; the per-type solvers run only the necessary steps for each pair type. The data is the source of truth; the code is a function of the data.
+
+#### §11.8 The Closed-Form Contact Witnesses
+
+The closed-form contact witnesses are a Q9 + Q1 application. For sphere/capsule pairs, the contact point is the closest point on the other shape's alpha-scaled boundary. The closed-form is faster than the generic `gjk_dist` bisection: the generic solver runs 40+40 iterations to find the witness; the closed-form returns it in O(1).
+
+The closed-form is a "different machine" for the sphere/capsule pair type. The data (sphere/capsule pairs) has a closed-form witness; the generic solver does not exploit this, and the per-type solver does. The saving applies to the 312+59 sphere/capsule pairs, each of which skips the 40+40 bisection iterations.
+
+The closed-form is also a "not do this at all" (Q1) application: the bisection iterations are deleted for sphere/capsule pairs.
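To make the O(1) claim concrete, here is a sketch of the simplest case, sphere-sphere. It assumes each sphere is scaled uniformly about its own center by `α`, so the scaled spheres first touch when `α·(r1 + r2)` equals the center distance. This is an illustration under that assumption, not the code in `collide.c`, and the function name is hypothetical.

```python
import math


def sphere_sphere_alpha_and_witness(c1, r1, c2, r2):
    """Closed-form alpha and contact witness for two spheres (no iteration)."""
    delta = [b - a for a, b in zip(c1, c2)]
    dist = math.sqrt(sum(v * v for v in delta))
    if dist == 0.0:
        raise ValueError("concentric spheres: contact direction is undefined")
    alpha = dist / (r1 + r2)                  # scaled radii sum to the center distance
    normal = [v / dist for v in delta]        # unit vector from c1 toward c2
    witness = [a + alpha * r1 * n for a, n in zip(c1, normal)]
    return alpha, witness
```

For unit spheres centered at the origin and at `(4, 0, 0)`, this gives `alpha = 2.0` and a witness at `(2, 0, 0)`, the midpoint where the two scaled spheres touch.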
+
+#### §11.9 Per-Repo Detail
+
+The collisions repo implements the same 5-element pattern as PEP, with a different match contract:
+
+- **Match contract:** tolerance-based (collision flags identical + distance within tolerance + contact points certified for validity).
+- **Candidate kinds:** (a) "work removal" + (b) "throughput/data layout" (per `prompts/create-optimized.md`).
+- **Harness:** 10-step proof + 4 enforcing gates (tolerance comparator + median-of-5 + validator + generalization).
+- **Optimization log:** 26+ iterations, 4 explicit `REJECTED` markers (H7, H8, H11, H12), 101.06× on committed input.
+- **Build-stage isolation:** `build_optimized_shapes.c` sees only shapes; `build_optimized_pairs.c` sees only pairs.
+
+The collisions repo is the empirical evidence for the §9 5-element pattern's flexibility: the pattern is invariant (4 prompts + harness + log + freeze + subject); the match contract is the parameterization (tolerance-based); the candidate kinds are the same (a)/(b)/(c)/(d); the gate discipline is the same (correctness + performance + determinism + generalization); the cost tracking is the same (wall-clock + tokens).
+
+#### §11.10 The 100× Claim Discipline
+
+The collisions README's "100× target reached" claim is conditional on "committed input only" — the README explicitly says "I would not call it a *uniform* 100× — two of the four seeds land just under — so I claim '100× on the committed benchmark, ~98–102× generally,' and no more." This is the methodology's most informative data-discipline point.
+
+The discipline: the claim is qualified by the data. The committed input shows 101.06×; the alternate seeds show 97.75× and 98.43×. The claim is "100× on committed input" (which the data supports), not "100× on all inputs" (which the data does not support).
+
+This is the methodology's "label your hypotheses" pattern (§8 honesty): the claim "100× on committed input, ~98–102× generally" is labeled with the conditions that produced it, and is honest about the variance.
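The arithmetic behind the qualified claim can be checked directly. The sketch below uses only numbers quoted in this section (the 0.330271 s and 0.003268 s medians, and the two alternate-seed speedups); the variable names and the 100× threshold check are illustrative.

```python
def speedup(ref_seconds: float, opt_seconds: float) -> float:
    """Speedup is reference time divided by optimized time."""
    return ref_seconds / opt_seconds


committed = speedup(0.330271, 0.003268)   # committed input, median-of-5
alternate = [97.75, 98.43]                # alternate seeds, as reported
target = 100.0

committed_meets_target = committed >= target
uniform_claim_supported = committed_meets_target and all(s >= target for s in alternate)
```

The committed input clears the target while both alternate seeds fall short, which is why only the committed-input claim is made.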
+
+#### §11.11 The GPT-5.5 Workspace Corroboration
+
+The "GPT-5.5" string in the collisions README is corroborated by the workspace name `collide-gpt-5-5` (per the OPTIMIZATION-LOG's origin history). The workspace name is a deliberate identifier (private/internal/placeholder), not a typo. The §9 honest-gap note applies: the methodology is the artifact, not the model.
+
+The workspace was evidently named after the model used, which supports the deliberate-identifier reading over a typo, although the string itself remains unverified. The methodology is being tested for portability; the model name is incidental to the methodology's validity.
+
+#### §11.12 Manual Slop Implications
+
+The Manual Slop equivalents of the collisions case study are partial. The closest analogs are:
+- **`compare_results.c` pattern** — the tolerance comparator with hybrid distance tolerance. The pattern is workable for any problem where byte-identity is structurally infeasible (float work, geometric/continuous problems, etc.).
+- **The 26+ iteration optimization arc** — the methodology's data-discipline template. The explicit `REJECTED` markers for H7, H8, H11, H12 are the load-bearing data point.
+- **The build-stage isolation invariant** — the creative design constraint that allows build-stage optimization while preventing answer precomputation.
+
+The gaps Manual Slop could close:
+1. **No tolerance-based comparator.** Manual Slop's tests assert correctness with byte-identity or simple equality, not hybrid distance tolerance. A future track could add the tolerance comparator for float work or geometric problems.
+2. **No explicit `REJECTED` markers.** Manual Slop's git history is the rejection record, but the per-iteration "why was this reverted" is not documented in a structured way. A future track could add the explicit rejection markers pattern.
+3. **No build-stage isolation.** Manual Slop's build configuration is not part of the optimization loop. A future track could add the build-stage isolation invariant to the methodology.
+4. **No per-type specialization pattern.** Manual Slop's optimization is generic; neither per-type specialization nor its closed-form special cases (such as the contact witnesses) are adopted. A future track could add the per-type specialization pattern for heterogeneous data.
+
+#### §11.13 Honest Gaps
+
+1. **The README's "~102× on committed input" claim and the OPTIMIZATION-LOG's "101.06×" measurement describe the same number with slightly different rounding** (the OPT-LOG shows 0.330271 s / 0.003268 s = 101.06×; the README reports ~102×). The §11 section cites the OPT-LOG's precise number as canonical.
+2. **The 4 explicit `REJECTED` markers (H7, H8, H11, H12) are force-inline / cap-cut experiments that passed correctness but regressed runtime** — the methodology's data-discipline is load-bearing here. Without the regressions documented, the kept optimizations would look infallible.
+3. **The two build-stage transforms (`build_optimized_shapes.c` and `build_optimized_pairs.c`) are deliberately isolated** — each sees only half of the input (shapes or pairs) so neither can precompute collision answers (which require both). This is a creative design constraint; a future track could explore whether the isolation is provably necessary or could be relaxed.
+4. **The "GPT-5.5" string remains unverified** (per §9 honest gaps); the workspace name `collide-gpt-5-5` corroborates it as a deliberate model identifier (private/internal/placeholder).
+5. **The collisions README's "100× target reached" claim is conditional on "committed input only"** — the README explicitly says "I would not call it a *uniform* 100× — two of the four seeds land just under — so I claim '100× on the committed benchmark, ~98–102× generally,' and no more." This is the methodology's most informative data-discipline point.
+6. **The contact-point feature regression (92.96x → 18.84x) is the most informative data point** — a single feature addition can cost 5x; the recovery path (H2) is itself a Q1 + Q3 application. The regression is documented but the recovery path is not generalized as a pattern.
+7. **The closed-form contact witnesses are a Q9 + Q1 application** — the data (sphere/capsule pairs) has a closed-form witness; the generic solver does not exploit this. The pattern is documented for sphere/capsule pairs but not generalized to other shape pairs.
+8. **The per-type specialization is a Q9 application** — the data (shape pairs) is heterogeneous; a different machine for each pair type is faster than a generic machine for all pair types. The pattern is documented for shape pairs but not generalized to other heterogeneous data.
+
+#### §11.14 Code-Shape Sketch
+
+The collisions case study, in survey-grammar SSDL notation, with shape tags:

 ```
 collisions-optimization { ref, committed_pairs, n_target } :: result  {ssdl} [B]
@@ -2193,14 +2293,73 @@ collisions-optimization { ref, committed_pairs, n_target } :: result  {ssdl} [B]
    if plateau(log, recent-N):  // §8 Q9: re-profile, evaluate (c) representation
      re-profile-data()
  return committed(opt, log)
+
+candidates := { a: "work removal",            // Q1, Q3, Q4
+                b: "throughput/data layout",  // Q3, Q5, Q6
+                c: "representation/algorithm", // Q9 (Iteration 3 — GJK+bisection)
+                d: "data-pattern specialization" }  // Q5/Q6 (per-type specialization)
+
+match-contract := { type: tolerance,
+                    tolerance: { dist_max: "1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)",
+                                 contact_certifier: true,
+                                 collision_flag_identity: true } }
+
+build-isolation := { shapes_transform: "build_optimized_shapes (sees only shapes)",
+                      pairs_transform: "build_optimized_pairs (sees only pairs)",
+                      invariant: "neither sees both, so build cannot precompute answers" }
 ```

-The `{ssdl}` [B] marker notes the abstraction: the case-study is a boundary where the model's working state meets measurement. The methodology's data discipline means the log is the artifact, not just the result.
+The shape tag map: `[B]` for the boundary (the case-study is where the model's working state meets measurement), `[I]` for the inspectable match contract + build isolation. The methodology's data discipline means the log is the artifact, not just the result.

-The PEP and collisions case studies together demonstrate the §9 5-element pattern's flexibility: the pattern is invariant (4 prompts + harness + log + freeze + subject); the match contract is the parameterization (byte-identity vs tolerance-based); the candidate kinds are the same 4 (a)/(b)/(c)/(d); the gate discipline is the same (correctness + performance + determinism + generalization); the cost tracking is the same (wall-clock + tokens). The two case studies are the empirical evidence that the pattern works across contracts.
-
-The "GPT-5.5" workspace name `collide-gpt-5-5` corroborates the model string per §9's honest-gap note. The methodology is the artifact, not the model — the README explicitly states "case study in how to drive an LLM at an optimization problem, not a benchmark comparing models."
+**Source-read citations:**
+- `differentiable-collisions-optc/README.md` — full project: 1000-pair benchmark, "GPT-5.5", tolerance-based contract
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` — full log: 14 origin iterations + 12 H-numbered iterations, 4 rejections
+- `differentiable-collisions-optc/prompts/create-reference.md` — reference solver spec
+- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md` — harness spec
+- `differentiable-collisions-optc/prompts/create-optimized.md` — optimization spec
+- `differentiable-collisions-optc/prompts/create-visualizer.md` — visualizer spec
+- `differentiable-collisions-optc/prove-optimized-harness.sh` — 10-step proof + 4 enforcing gates
+- `differentiable-collisions-optc/Makefile.optimized` — build configuration
+- `differentiable-collisions-optc/src-optimized/collide.c` — optimized implementation
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_shapes.c` — isolated shapes transform
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_pairs.c` — isolated pairs transform
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:1-50` — origin history (collide-gpt-5-5 workspace)
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:50-100` — kept optimizations H1-H6
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:100-200` — iterations H7-H12 (H9, H10 kept; H7, H8, H11, H12 rejected)
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:200-300` — rejected experiments
+- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:300-400` — final committed baseline
+- `differentiable-collisions-optc/README.md:1-50` — project description
+- `differentiable-collisions-optc/README.md:50-150` — 4-prompt methodology
+- `differentiable-collisions-optc/README.md:150-300` — 1000-pair benchmark
+- `differentiable-collisions-optc/README.md:300-500` — results continued + match contract
+- `differentiable-collisions-optc/prove-optimized-harness.sh:1-50` — harness start
+- `differentiable-collisions-optc/prove-optimized-harness.sh:50-150` — harness body
+- `differentiable-collisions-optc/prove-optimized-harness.sh:150-350` — harness end
+- `differentiable-collisions-optc/prompts/create-reference.md:1-50` — reference spec start
+- `differentiable-collisions-optc/prompts/create-reference.md:50-150` — reference spec body
+- `differentiable-collisions-optc/prompts/create-optimized.md:1-50` — optimization spec start
+- `differentiable-collisions-optc/prompts/create-optimized.md:50-150` — 2 candidate kinds
+- `differentiable-collisions-optc/prompts/create-optimized.md:150-300` — exit criteria + plateau guidance
+- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md:1-50` — harness spec start
+- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md:50-150` — harness spec body
+- `differentiable-collisions-optc/prompts/create-visualizer.md:1-50` — visualizer spec start
+- `differentiable-collisions-optc/prompts/create-visualizer.md:50-150` — visualizer spec body
+- `differentiable-collisions-optc/Makefile.optimized:1-50` — build config start
+- `differentiable-collisions-optc/Makefile.optimized:50-100` — build config body
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_shapes.c:1-50` — shapes transform start
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_shapes.c:50-150` — shapes transform body
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_pairs.c:1-50` — pairs transform start
+- `differentiable-collisions-optc/performance-test-optimized/build_optimized_pairs.c:50-150` — pairs transform body
+- `differentiable-collisions-optc/` (full repo at main) — 5 commits + README + OPTIMIZATION-LOG + 4 prompts + harness
+- `differentiable-collisions-optc/commits/` — the 5 commit history (the v3 cluster does not cite specific SHAs)
+- `differentiable-collisions-optc/.gitignore` — the gitignore (the v3 cluster does not cite specific contents)
+- `intent_dsl_survey_20260612` — the survey (relevant for the gap note on intent-DSL)
+- `superpowers_review_20260619` — the superpowers review (relevant for the gap note on process parallel)
+- `tracy_howell_manchester_arxiv_2207.00669` — the cited paper (relevant for the reference implementation)

+**Decision candidate:** NEW Candidate 27 (LOW). "Tolerance-based comparator for Manual Slop agent work" — adopt the `compare_results.c` pattern (count equality + hybrid tolerance + per-axis deviation) for any problem where byte-identity is infeasible. See `decisions.md` Candidate 27.
+**Cross-refs:** §3 Hooks (`prove-optimized-harness.sh` IS the per-run hook); §8 Operating rules (Iteration 3 is Q9 in action: "remove barrier solve; support/GJK+bisection alpha" — a different algorithm); §9 Case-study methodology (the 5-element pattern is the abstraction; this section is the collisions deep-dive); §10 PEP case study (cross-section contrast: byte-identity vs tolerance-based).
+**Pattern history:** NEW. v2.3 had no case-study repos. v3 introduces the tolerance-based exemplar of §9's 5-element pattern. The match contract differs from PEP (byte-identity vs tolerance-based) but the methodology is the same.
 ## §12 Decisions

 See `decisions.md` for the full candidate list (v2.3's 16 + v3's new 11, with v2.3 → v3 status mapping at the top). **Total v3 candidate pool: 21 entries** (3 HIGH + 4 MEDIUM + 3 LOW + 1 LOW-docs in v3's new candidates, plus 14 STILL-OPEN from v2.3, plus 1 PROMOTED + 1 SUBSUMED status changes). The HIGH-priority v3 candidates are: