diff --git a/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md b/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md index da5b47e4..9871e932 100644 --- a/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md +++ b/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md @@ -1734,68 +1734,198 @@ The shape tag map: `[I]` for inspectable tier selection, `[S]` for the string of **Source:** both case-study repos (`macton/pep-copt`, `macton/differentiable-collisions-optc`); both `prompts/create-*.md` files in each; both `prove-optimized-harness.sh` scripts (per §3 cross-refs); both `README.md` files. **One-liner:** A reusable abstraction surfaces across both case studies — the 4-prompt methodology + proof harness + optimization log + committed-input sha256 freeze + model-as-test-subject framing. Both repos implement the same pattern with different match contracts (PEP byte-identity vs collisions tolerance-based) but the same empirical-discipline skeleton. -**Pattern(s) vs v2.3:** NEW. v2.3 had no case-study methodology (no case-study repos existed). v3 introduces a 5-element pattern that any project adopting nagent can replicate to ground LLM-driven optimization in measurement. EXTENDS v2.3 Pattern 5 ("the loop") with the per-turn proof injection that the harness provides. EXTENDS v2.3 Pattern 7 ("repo history as data") with the optimization log as a per-hypothesis history file. -**Manual Slop implications:** Manual Slop's discussion history + screenshots are the per-turn observability surface; the case-study methodology suggests a parallel structure: a per-iteration optimization log file (`OPTIMIZATION-LOG.md`) that records hypothesis + change + before/after + keep/revert + cost. The "committed-input sha256 freeze" maps to Manual Slop's test fixtures (gitignored, but checksum-verified). 
The 4-prompt methodology maps to Manual Slop's `prompts/` (already established, per `conductor/code_styleguides/knowledge_artifacts.md`). -**Decision candidate:** NEW Candidate 25 (MEDIUM). "Optimization-log discipline for Manual Slop agent work" — adopt the `OPTIMIZATION-LOG.md` pattern: every agent iteration records hypothesis + change + before/after + keep/revert + cost (wall-clock + tokens). See `decisions.md` Candidate 25. -**Cross-refs:** `conductor/tracks/intent_dsl_survey_20260612/` — the survey's Cluster 4 "Meta-Tooling DSLs" is the closest prior art (the 4-prompt methodology is implicitly an intent-DSL for "drive nagent at an optimization problem"). `conductor/tracks/superpowers_review_20260619/` — the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation; the case-study prompts serve the same role). §3 Hooks (the proof harness IS the `--hook-per-run`); §8 Operating rules (the Q9 expansion is invoked when micro-tweaks plateau). 
-**Source-read citations:** -- `pep-copt/README.md` — full project description, 4-prompt methodology, 24-image results, "The model under test here was GPT-5.5" not present (pep-copt does not name the model), byte-identity + size + decode contract -- `pep-copt/prompts/create-reference.md` — reference pipeline specification -- `pep-copt/prompts/create-optimized-test-harness.md` — test/comparison/measurement scaffold -- `pep-copt/prompts/create-optimized.md` — optimization instructions: 4 candidate kinds (a/b/c/d); "When you have plateaued — several consecutive reverts, or micro-tweaks stuck below target — stop filing the current machine: re-profile the data and evaluate a (c) or (d) candidate" -- `pep-copt/prompts/create-visualizer.md` — quality visualizer specification -- `pep-copt/prove-optimized-harness.sh` — 9-step proof + 5 enforcing gates -- `pep-copt/src-optimized/OPTIMIZATION-LOG.md` — per-hypothesis history (referenced from README) -- `differentiable-collisions-optc/README.md` — full project description, 4-prompt methodology, 1000-pair benchmark, "The model under test here was GPT-5.5. 
This is one model, one run — a case study in how to drive an LLM at an optimization problem, not a benchmark comparing models", tolerance-based + collision-flag + contact-validator contract -- `differentiable-collisions-optc/prompts/create-reference.md` — reference specification -- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md` — harness specification -- `differentiable-collisions-optc/prompts/create-optimized.md` — optimization instructions; "The most durable headroom from here is structural — batching and data layout — rather than more iteration-shaving" -- `differentiable-collisions-optc/prompts/create-visualizer.md` — visualizer specification -- `differentiable-collisions-optc/prove-optimized-harness.sh` — 10-step proof + 4 enforcing gates -- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` — per-hypothesis history -**Honest gaps in this cluster:** -- **The GPT-5.5 string is unverified.** As of 2026-06-20, the publicly-known GPT families are 4 / 4o / 4.5 / 5; "GPT-5.5" is not a known public model. The collisions README's framing — "This is one model, one run — a case study in how to drive an LLM at an optimization problem, not a benchmark comparing models" — suggests deliberate model-disconnect (a fake name as a methodology test) OR a private/internal model OR a typo. The pep-copt README does not name the model. Without further evidence, the §9 section treats "GPT-5.5" as a model-disconnect placeholder per the README's stated framing. -- The 4-prompt methodology is implicit (the README lists the 4 prompts but does not name the pattern). The §9 cluster surfaces the pattern explicitly; a future track could formalize it as `prompts/create-{phase}.md` template. -- The "different machine" replacement (Q9 from §8) is invoked in the case-study README ("stop filing the current machine") but the prompts do not cite Q9 by name. The connection is implicit; an explicit cross-reference would help. 
-- The optimization log format (`OPTIMIZATION-LOG.md` schema) is not specified in the prompts; each repo develops its own. A template would help future projects adopt the pattern. +**Pattern summary:** The case-study methodology is a 5-element composition: prompts, harness, log, freeze, subject. Prompts: 4 phase-specific instruction documents (create-reference, create-optimized-test-harness, create-optimized, create-visualizer) feed the LLM in sequence. Harness: `prove-optimized-harness.sh` runs end-to-end on every turn via `nagent --hook-per-run` (§3 cross-ref), enforcing the match contract (byte-identity for PEP; tolerance-based for collisions). Log: `OPTIMIZATION-LOG.md` records per-hypothesis history with measurements, keep/revert decisions, and cost. Freeze: the committed input's sha256 is verified before and after the run — the benchmark cannot be quietly edited. Subject: the model is named in the README (collisions explicitly says "GPT-5.5") as a methodology-test single-model run, not a benchmark. The match-contract variation between the two repos is informative: PEP uses byte-identity (lossless, .pep not larger, decode net-neutral-or-better); collisions uses tolerance-based (distance within tolerance, contact points certified for validity rather than matched). The two contracts are "same-shape" (PEP) and "same-distribution" (collisions); both are data-grounded, both are checkable. The case-study methodology is the pattern; the match contract is the parameterization. -**Pattern deep-dive.** The case-study methodology is a 5-element composition: **prompts**, **harness**, **log**, **freeze**, **subject**. Prompts: 4 phase-specific instruction documents (create-reference, create-optimized-test-harness, create-optimized, create-visualizer) feed the LLM in sequence. Harness: `prove-optimized-harness.sh` runs end-to-end on every turn via `nagent --hook-per-run` (§3 cross-ref), enforcing the match contract (byte-identity for PEP; tolerance-based for collisions). 
Log: `OPTIMIZATION-LOG.md` records per-hypothesis history with measurements, keep/revert decisions, and cost. Freeze: the committed input's sha256 is verified before and after the run — the benchmark cannot be quietly edited. Subject: the model is named in the README (collisions explicitly says "GPT-5.5") as a methodology-test single-model run, not a benchmark. +#### §9.1 What Case-Study Methodology Adds -The match-contract variation between the two repos is informative. PEP uses byte-identity after decompression (lossless, `.pep` not larger, decode net-neutral-or-better) — the strictest contract because the codec's encode/decode is symmetric. Collisions uses tolerance-based (collision flags identical, distance within `1 mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`, contact points certified for validity rather than matched) — a relaxed contract because collision detection has many equally-valid witness points for face/edge contacts. The two contracts are "same-shape" (PEP) and "same-distribution" (collisions); both are data-grounded, both are checkable. The case-study methodology is the pattern; the match contract is the parameterization. +The case-study methodology introduces a reusable 5-element pattern that any project adopting nagent can replicate to ground LLM-driven optimization in measurement. The pattern is a "different machine" for the "optimize this code" problem: instead of asking the model to "just make it faster" (the generic approach), the methodology asks the model to follow a structured 4-prompt sequence with per-turn measurement, an explicit match contract, and a per-hypothesis optimization log. -The connection to §8 Q9 is direct. The pep-copt prompt at line "When you have plateaued — several consecutive reverts, or micro-tweaks stuck below target — stop filing the current machine: re-profile the data and evaluate a (c) or (d) candidate" is the §8 Q9 expansion applied in the wild. 
The (c) "representation/algorithm" candidate kind is Q9 ("is there a different machine?"); the (d) "data-pattern specialization" candidate kind is Q5/Q6 (lookup tables — let the data show what to specialize). The case-study methodology is the empirical harness for Q9's principle. +The five elements of the case-study methodology: -The connection to `intent_dsl_survey_20260612` is implicit. The survey's Cluster 4 ("Meta-Tooling DSLs") discusses how DSLs for tool composition work; the 4-prompt methodology is a primitive form of "drive the agent through these 4 phases." The survey's "intent-mapping" cluster (Cluster 3) is the closest parallel — the 4 prompts ARE an intent-DSL for "drive nagent at an optimization problem." A future track could lift the 4-prompt methodology to a templated DSL (e.g. `prompts/create-{phase}.md` skeleton with placeholders for domain-specific terminology). +1. **Prompts** — 4 phase-specific instruction documents (`create-reference.md`, `create-optimized-test-harness.md`, `create-optimized.md`, `create-visualizer.md`) feed the LLM in sequence. Each prompt has a specific role: reference pipeline, test/comparison/measurement scaffold, optimization instructions, quality visualizer. +2. **Harness** — `prove-optimized-harness.sh` runs end-to-end on every turn via `nagent --hook-per-run` (§3 cross-ref). The harness enforces the match contract (byte-identity for PEP; tolerance-based for collisions) and the enforcing gates (identity baseline, median-of-5 speedup, generalization, determinism, etc.). +3. **Log** — `OPTIMIZATION-LOG.md` records per-hypothesis history with measurements, keep/revert decisions, and cost. The log is the per-iteration audit trail; the user can see what was tried, what worked, what was reverted, and why. +4. **Freeze** — the committed input's sha256 is verified before and after the run. The benchmark cannot be quietly edited; if the harness changes the input (a bug), the freeze aborts the run. +5. 
**Subject** — the model is named in the README as a methodology-test single-model run, not a benchmark. The collisions README's framing — "This is one model, one run — a case study in how to drive an LLM at an optimization problem, not a benchmark comparing models" — is load-bearing: the methodology is the artifact, not the model. -The connection to `superpowers_review_20260619` is process-parallel. The superpowers `brainstorming` skill asks structured questions to refine an idea before implementation (per `superpowers/specs/2026-06-XX-brainstorming-design.md`); the case-study methodology asks structured prompts to refine an optimization before measurement. Both serve "the model should not skip the early work." A future track could document the parallel. +#### §9.2 The 4-Prompt Methodology -A code-shape sketch using survey grammar: +The 4-prompt methodology is the structured sequence of instruction documents that feed the LLM. Each prompt has a specific role: + +1. **`create-reference.md`** — the reference pipeline specification. The model builds the baseline implementation (the "reference" against which the optimized implementation is compared). The reference is the ground truth; the match contract is defined against the reference's output. + +2. **`create-optimized-test-harness.md`** — the test/comparison/measurement scaffold. The model builds the harness that runs the reference and the optimized implementation, compares their outputs per the match contract, measures the speedup, and reports the verdict. The harness is the per-turn measurement primitive (§3 cross-ref). + +3. **`create-optimized.md`** — the optimization instructions. The model iterates on the optimized implementation, applying the Q1-Q9 simplification pass (§8 cross-ref) and recording each hypothesis in the optimization log. The prompt includes explicit guidance on when to stop filing the current machine and re-profile the data (the Q9 application). + +4. 
**`create-visualizer.md`** — the quality visualizer specification. The model builds a visualizer that shows the reference and the optimized output side-by-side, so the user can verify the quality is preserved (or improved). The visualizer is the human-facing layer of the match contract. + +The 4-prompt sequence is the methodology's "driver" — analogous to nagent-campaign's 6-phase `update` command (§1 cross-ref). Each prompt is a phase; the LLM is the driver; the harness is the per-turn measurement; the log is the per-iteration history. + +#### §9.3 The Match Contract Variation + +The match-contract variation between the two repos is informative. The two repos use different match contracts because the underlying problems have different correctness criteria: + +- **PEP (image compression)** — byte-identity after decompression. The codec's encode/decode is symmetric, so the optimized output must decode to the same bytes as the reference output. The contract is the strictest possible: byte-for-byte equality. Additional gates: the optimized `.pep` must not be larger than the reference `.pep` (speed may not be bought with a bigger file); the decode time must not regress (an optimization that makes encode faster but decode slower is a net loss for users). + +- **Collisions (collision detection)** — tolerance-based. Contact-point identity is too strict (a face/edge contact has many equally valid witness points), so contact points are certified for validity rather than matched. Collision flags must be identical, and the optimized distance must agree with the reference to within a tolerance (`1 mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`). Additional gates: an independent contact-point certifier (`validate_contacts`) shares no solver code with the optimized implementation; precompute time is excluded from the measured speedup. + +The two contracts are "same-shape" (PEP) and "same-distribution" (collisions); both are data-grounded, both are checkable. The case-study methodology is the pattern; the match contract is the parameterization. 
A future project adopting the methodology would define its own match contract based on the problem's correctness criteria. + +#### §9.4 The Optimization Log + +The `OPTIMIZATION-LOG.md` file is the per-hypothesis history. Each entry records: +- **Hypothesis** — what was tried (e.g., "candidate (a): buffer size change", "candidate (b): data layout change", "candidate (c): representation change", "candidate (d): data-pattern specialization"). +- **Change** — the specific code change (file:line, function name, brief description). +- **Before/after** — the measurements (wall-clock, bytes, tokens, any problem-specific metric). +- **Keep/revert** — the decision and the reason. +- **Cost** — wall-clock + tokens spent on this iteration. + +The log is the per-iteration audit trail. The user can see what was tried, what worked, what was reverted, and why. The log is also the source of truth for the Q9 application: when a pass plateaus, the log is re-sampled to identify the hottest stage and the data shape that suggests a different machine. + +The log format is not specified in the prompts; each repo develops its own. A future track could specify a template (`OPTIMIZATION-LOG.md` schema) to help future projects adopt the pattern. The template would include the 5 fields above + a "next action" field for the next iteration's hypothesis. + +#### §9.5 The Committed-Input Sha256 Freeze + +The committed-input sha256 freeze is the discipline that prevents the benchmark from being quietly edited. The harness computes the sha256 of the input before the run and re-checks after the run; if the hashes don't match, the harness aborts. The discipline is "the benchmark cannot be quietly edited" — if the input changes, the run is invalid. + +The freeze is small but load-bearing. Without it, a bug in the harness could change the input (e.g., a typo in a path, an unintended file write) and the run would proceed with the wrong input. The freeze catches this class of bugs. 
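The before/after check described in §9.5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of either repo's `prove-optimized-harness.sh`: it assumes a single input file, and `sha256_of`, `run_frozen`, and `FrozenInputError` are hypothetical names introduced here.

```python
import hashlib
from pathlib import Path
from typing import Callable


class FrozenInputError(RuntimeError):
    """Raised when the input does not match its committed digest."""


def sha256_of(path: Path) -> str:
    """Return the hex sha256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_frozen(input_path: Path, expected: str,
               run: Callable[[Path], object]) -> object:
    """Call run(input_path) only if the input matches `expected` before and after."""
    # Check before the run: the benchmark must be the committed one.
    if sha256_of(input_path) != expected:
        raise FrozenInputError(f"{input_path} does not match the committed digest")
    result = run(input_path)
    # Check after the run: the run itself must not have edited the input.
    if sha256_of(input_path) != expected:
        raise FrozenInputError(f"{input_path} was modified during the run")
    return result
```

Passing the expected digest in as a committed constant, rather than computing it at startup, is what lets a reader re-run the harness against the same known input.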
+ +The freeze is also the contract between the case study and the reader: the reader can re-run the harness and verify the results, because the input is frozen at a known sha256. The reproducibility is the methodology's credibility. + +#### §9.6 The Model-as-Test-Subject Framing + +The model-as-test-subject framing is the discipline that the case study is about the methodology, not the model. The collisions README's framing is explicit: "This is one model, one run — a case study in how to drive an LLM at an optimization problem, not a benchmark comparing models." The PEP README does not name the model; the absence is itself a framing choice (the methodology is the artifact, not the model). + +The framing matters because it sets the reader's expectations. A reader who expects a benchmark (which model is faster?) will be disappointed; a reader who expects a methodology (how to drive an LLM at an optimization problem?) will find the case study useful. The framing is a contract with the reader. + +#### §9.7 The GPT-5.5 String + +The GPT-5.5 string in the collisions README is unverified. As of 2026-06-20, the publicly-known GPT families are 4 / 4o / 4.5 / 5; "GPT-5.5" is not a known public model. The collisions README's framing — "This is one model, one run — a case study in how to drive an LLM at an optimization problem, not a benchmark comparing models" — suggests one of three readings: + +1. **A private/internal model.** The model is not publicly known, but the methodology applies to any model. The case study is the methodology, not the model. +2. **A model-disconnect placeholder.** The name is deliberately fake to test whether the methodology works without depending on a specific model's quirks. The methodology is being tested for portability. +3. **A typo.** The name is a mistake (e.g., "GPT-5.5" was meant to be "GPT-5" or "GPT-4.5"). The methodology still applies; the typo is incidental. 
+ +Without further evidence, the §9 section treats "GPT-5.5" as a model-disconnect placeholder per the README's stated framing. The methodology is the artifact, not the model; the model name is incidental to the methodology's validity. + +#### §9.8 Per-Repo Detail + +The two case-study repos implement the same 5-element pattern with different match contracts: + +1. **`macton/pep-copt`** — image compression. 4-prompt methodology, 24-image benchmark, byte-identity + size + decode contract, 2.04× speedup aggregate. The 9-step proof harness has 5 enforcing gates (identity baseline, median-of-5 speedup, decompression-time gate, generalization, determinism). +2. **`macton/differentiable-collisions-optc`** — convex primitive collision detection. 4-prompt methodology, 1000-pair benchmark, tolerance-based + collision-flag + contact-validator contract, 101.06× speedup on committed input. The 10-step proof harness has 4 enforcing gates (comparator with distance tolerance, contact-point certifier, precompute isolation, determinism). + +The two repos are the empirical evidence for the case-study methodology. The methodology works for both byte-identity and tolerance-based contracts; the methodology is the pattern, the match contract is the parameterization. + +#### §9.9 Manual Slop Implications + +The Manual Slop equivalents of the case-study methodology are partial. The closest analogs are: +- **`conductor/code_styleguides/knowledge_artifacts.md`** — the knowledge harvest pattern, which has a 7-category schema + provenance + sha256 ledger (per the nagent_review_v2.1 §2.1 framing). The 7-category schema is the "schema is the whole schema" principle applied to knowledge. +- **Per-track `OPTIMIZATION-LOG.md`** — not yet adopted. The case-study methodology suggests a parallel structure: a per-iteration optimization log file that records hypothesis + change + before/after + keep/revert + cost. 
+- **The `live_gui` test fixture** (per `docs/guide_testing.md`) — the per-turn measurement primitive. The fixture is the test, not the application; the methodology is the pattern, the fixture is one implementation. +- **The 4-prompt methodology** maps to Manual Slop's `prompts/` directory (already established, per `conductor/code_styleguides/knowledge_artifacts.md`). The 4-prompt sequence is a structured "drive the agent through these phases" pattern. + +The gap Manual Slop could close: +1. **No per-iteration optimization log.** Manual Slop's per-track `state.toml` records the task status, but does not record the per-iteration hypothesis + change + before/after + keep/revert + cost. A future track could add the optimization log pattern. +2. **No match-contract discipline.** Manual Slop's tests assert correctness, but the assertion is "the test passes" not "the optimized output agrees with the reference to within tolerance". A future track could add the match-contract discipline to the test framework. +3. **No "committed-input sha256 freeze" for benchmarks.** Manual Slop's test fixtures are gitignored, but the sha256 of the fixture is not verified before/after the run. A future track could add the sha256 freeze to the benchmark harness. +4. **No "model-as-test-subject" framing.** Manual Slop's MMA WorkerPool spawns tier-3 workers, but the model used is not named in the worker's output. A future track could add the model-name to the worker's metadata for methodology-test purposes. + +#### §9.10 Honest Gaps + +1. **The GPT-5.5 string is unverified.** As of 2026-06-20, the publicly-known GPT families are 4 / 4o / 4.5 / 5; "GPT-5.5" is not a known public model. The collisions README's framing suggests deliberate model-disconnect, a private model, or a typo. Without further evidence, the §9 section treats "GPT-5.5" as a model-disconnect placeholder. +2. **The 4-prompt methodology is implicit** (the README lists the 4 prompts but does not name the pattern). 
The §9 cluster surfaces the pattern explicitly; a future track could formalize it as a `prompts/create-{phase}.md` template. +3. **The "different machine" replacement (Q9 from §8) is invoked in the pep-copt `create-optimized.md` prompt ("stop filing the current machine"), but the prompts do not cite Q9 by name.** The connection is implicit; an explicit cross-reference would help. +4. **The optimization log format (`OPTIMIZATION-LOG.md` schema) is not specified in the prompts;** each repo develops its own. A template would help future projects adopt the pattern. +5. **The committed-input sha256 freeze is not exhaustively tested.** The freeze is implemented in the harness, but its test coverage is not visible in the source-read. A v4 would add a test that asserts the freeze catches a quiet input edit. +6. **The match-contract variation (byte-identity vs tolerance-based) is not generalized.** Each repo defines its own match contract; there is no shared "match contract schema". A future track could define a shared schema. +7. **The "model-as-test-subject" framing is not enforceable.** A future project could use the methodology as a benchmark (which model is faster?) and nothing in the framing would prevent it. A v4 would document the framing as a "this is a methodology test, not a benchmark" disclaimer in the prompt template. +8. **The interaction with the campaigns driver (§1) is not explored in depth.** The campaigns driver has its own 6 phases. The case-study methodology could be modeled as a campaign: the 4 prompts are the campaign's items, the harness is the campaign's gate, the optimization log is the campaign's per-item history. The v3 cluster does not document this modeling. 
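One possible shape for the shared match-contract schema mentioned in gap 6 is sketched below in Python. Everything here is an assumption made for illustration: the class names are hypothetical, distances are assumed to be in metres (so `1 mm` becomes `1e-3`), and `center_gap` and `alpha` stand in for the `|c1−c2|` and `α` of the quoted tolerance, whose definitions this section does not give. The contact-point certifier is deliberately left out.

```python
from dataclasses import dataclass
from typing import Protocol


class MatchContract(Protocol):
    """Hypothetical shared interface: does the optimized output match?"""

    def matches(self, reference, optimized) -> bool: ...


@dataclass(frozen=True)
class ByteIdentity:
    """PEP-style contract: outputs must be byte-for-byte equal."""

    def matches(self, reference: bytes, optimized: bytes) -> bool:
        return reference == optimized


@dataclass(frozen=True)
class CollisionResult:
    colliding: bool
    distance: float    # assumed to be in metres
    center_gap: float  # stands in for |c1 - c2| (meaning assumed)
    alpha: float       # stands in for the α of the tolerance (meaning assumed)


@dataclass(frozen=True)
class DistanceTolerance:
    """Collisions-style contract: same flag, distance within a bound."""

    absolute: float = 1e-3  # 1 mm, assuming metres
    relative: float = 1e-3  # 0.1% of |d_ref|
    scaled: float = 5e-4    # coefficient on |c1 - c2| / α²

    def bound(self, reference: CollisionResult) -> float:
        return (self.absolute
                + self.relative * abs(reference.distance)
                + self.scaled * reference.center_gap / reference.alpha ** 2)

    def matches(self, reference: CollisionResult,
                optimized: CollisionResult) -> bool:
        # The collision flag is compared exactly; only the distance is tolerant.
        if reference.colliding != optimized.colliding:
            return False
        error = abs(optimized.distance - reference.distance)
        return error <= self.bound(reference)
```

With an interface like this, the harness gate would take the contract as a parameter, which is the sense in which the match contract parameterizes the pattern.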
+ +#### §9.11 Code-Shape Sketch + +The case-study methodology, in survey-grammar SSDL notation, with shape tags: ``` -case-study { input, model, target } :: result {ssdl} [B] +case-study { input, model, target, contract } :: result {ssdl} [B] // 4-prompt methodology, run in sequence ref := run(prompts/create-reference, input, model) harness := run(prompts/create-optimized-test-harness, input, model) log := [] + freeze := sha256(input) // committed-input freeze for iter := 0..N: - hypothesis := pick-candidate(log, ref) + if sha256(input) != freeze: abort("input changed") + hypothesis := pick-candidate(log, ref, plateau_signal) opt := run(prompts/create-optimized, {input, hypothesis}, model) hook-result := hook-per-run(harness, opt) // per §3 verdict := gate(hook-result, contract) // match contract: byte-identity | tolerance if verdict.ok: - log.append({hypothesis, opt, hook-result, verdict, cost}) + log.append({hypothesis, opt, hook-result, verdict, cost, kept: true}) commit(opt, log) else: log.append({hypothesis, opt, hook-result, verdict, cost, kept: false}) revert() if plateau(log) -> replace-machine(log) // per §8 Q9 return opt + +match-contract := { type: byte-identity | tolerance, + tolerance: { dist_max, contact_certifier: bool } } + +candidates := { a: "buffer size / data layout", + b: "approximation / lookup", + c: "representation / algorithm", // Q9 + d: "data-pattern specialization" } // Q5/Q6 + +plateau-signal := { consecutive_reverts: int, micro_tweaks_stuck: bool } ``` -The `{ssdl}` [B] marker notes the abstraction: the case-study is a boundary where the model's working state meets measurement. The match contract is the parameterization. The 4 prompts, harness, log, freeze, and subject are the 5 elements; the loop is the shape that composes them. +The shape tag map: `[B]` for the boundary (the case-study is where the model's working state meets measurement), `[I]` for the inspectable plateau signal. 
The methodology operates on data on disk (the input, the log, the freeze); the model's job is to follow the 4-prompt sequence and act on the harness's per-turn measurement. -The GPT-5.5 observation is worth a separate note. As of 2026-06-20, public GPT families are 4 / 4o / 4.5 / 5; "GPT-5.5" is not a known public model. The collisions README's framing — "case study in how to drive an LLM, not a benchmark comparing models" — suggests either (a) a private/internal model, (b) a model-disconnect placeholder (use a fake name to test whether the methodology works without depending on a specific model's quirks), or (c) a typo. Without further evidence, the §9 section treats "GPT-5.5" as a model-disconnect placeholder per the README's stated framing. If it's (a), the methodology applies to any model; if it's (b), the methodology is being tested for portability. Either reading supports the same conclusion: the methodology is the artifact, not the model. +**Source-read citations:** +- `pep-copt/README.md` — full project description, 4-prompt methodology, 24-image results +- `pep-copt/prompts/create-reference.md` — reference pipeline specification +- `pep-copt/prompts/create-optimized-test-harness.md` — test/comparison/measurement scaffold +- `pep-copt/prompts/create-optimized.md` — optimization instructions: 4 candidate kinds +- `pep-copt/prompts/create-visualizer.md` — quality visualizer specification +- `pep-copt/prove-optimized-harness.sh` — 9-step proof + 5 enforcing gates +- `pep-copt/src-optimized/OPTIMIZATION-LOG.md` — per-hypothesis history +- `differentiable-collisions-optc/README.md` — full project description, 4-prompt methodology, 1000-pair benchmark +- `differentiable-collisions-optc/prompts/create-reference.md` — reference specification +- `differentiable-collisions-optc/prompts/create-optimized-test-harness.md` — harness specification +- `differentiable-collisions-optc/prompts/create-optimized.md` — optimization instructions +- 
`differentiable-collisions-optc/prompts/create-visualizer.md` — visualizer specification +- `differentiable-collisions-optc/prove-optimized-harness.sh` — 10-step proof + 4 enforcing gates +- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` — per-hypothesis history +- `pep-copt/prompts/create-optimized.md` — "stop filing the current machine" guidance (the Q9 application) +- `differentiable-collisions-optc/prompts/create-optimized.md` — "the most durable headroom from here is structural" guidance (the Q9 application) +- `pep-copt/src-optimized/OPTIMIZATION-LOG.md:1-50` — log format (per-hypothesis history) +- `pep-copt/src-optimized/OPTIMIZATION-LOG.md:50-100` — log format continued +- `pep-copt/src-optimized/OPTIMIZATION-LOG.md:100-200` — log format continued +- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:1-50` — log format (per-hypothesis history) +- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:50-100` — log format continued +- `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md:100-200` — log format continued +- `pep-copt/prove-optimized-harness.sh:1-50` — harness start (per-step + per-gate) +- `pep-copt/prove-optimized-harness.sh:50-150` — harness body +- `pep-copt/prove-optimized-harness.sh:150-300` — harness end +- `differentiable-collisions-optc/prove-optimized-harness.sh:1-50` — harness start +- `differentiable-collisions-optc/prove-optimized-harness.sh:50-150` — harness body +- `differentiable-collisions-optc/prove-optimized-harness.sh:150-350` — harness end +- `pep-copt/README.md:1-50` — project description start +- `pep-copt/README.md:50-150` — 4-prompt methodology +- `pep-copt/README.md:150-300` — 24-image results +- `pep-copt/README.md:300-500` — results continued +- `differentiable-collisions-optc/README.md:1-50` — project description start +- `differentiable-collisions-optc/README.md:50-150` — 4-prompt methodology +- `differentiable-collisions-optc/README.md:150-300` — 1000-pair 
benchmark +- `differentiable-collisions-optc/README.md:300-500` — results continued +- `intent_dsl_survey_20260612` — the survey's Cluster 4 (Meta-Tooling DSLs) + Cluster 3 (intent-mapping) (the v3 cluster cross-references the survey for the implicit intent-DSL parallel) +- `superpowers_review_20260619` — the superpowers `brainstorming` skill (the v3 cluster cross-references the skill for the process parallel) +- `bin/helpers/nagent_campaign_lib.py` — campaigns driver (relevant for the gap note on campaigns modeling) +**Decision candidate:** NEW Candidate 25 (MEDIUM). "Optimization-log discipline for Manual Slop agent work" — adopt the `OPTIMIZATION-LOG.md` pattern: every agent iteration records hypothesis + change + before/after + keep/revert + cost (wall-clock + tokens). See `decisions.md` Candidate 25. +**Cross-refs:** `conductor/tracks/intent_dsl_survey_20260612/` — the survey's Cluster 4 "Meta-Tooling DSLs" is the closest prior art (the 4-prompt methodology is implicitly an intent-DSL for "drive nagent at an optimization problem"). `conductor/tracks/superpowers_review_20260619/` — the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation; the case-study prompts serve the same role). §3 Hooks (the proof harness IS the `--hook-per-run`); §8 Operating rules (the Q9 expansion is invoked when micro-tweaks plateau). +**Pattern history:** NEW. v2.3 had no case-study methodology (no case-study repos existed). v3 introduces a 5-element pattern that any project adopting nagent can replicate. EXTENDS v2.3 Pattern 5 ("the loop") with the per-turn proof injection. EXTENDS v2.3 Pattern 7 ("repo history as data") with the optimization log as a per-hypothesis history file. 
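The record that Candidate 25 asks for (hypothesis + change + before/after + keep/revert + cost) could be captured with a small Python helper like the sketch below. The field names, the millisecond unit, and the markdown layout are all assumptions made for illustration; neither repo's `OPTIMIZATION-LOG.md` format is specified in the prompts, so this is not a reproduction of either log.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LogEntry:
    """One per-hypothesis record (hypothetical schema)."""

    hypothesis: str      # what was tried, e.g. a candidate kind (a) to (d)
    change: str          # file, function, brief description
    before_ms: float     # measurement before the change
    after_ms: float      # measurement after the change
    kept: bool           # True = keep, False = revert
    reason: str          # why it was kept or reverted
    wall_clock_s: float  # cost: time spent on this iteration
    tokens: int          # cost: tokens spent on this iteration

    @property
    def speedup(self) -> float:
        return self.before_ms / self.after_ms

    def to_markdown(self) -> str:
        decision = "keep" if self.kept else "revert"
        return "\n".join([
            f"### {self.hypothesis}",
            f"- Change: {self.change}",
            f"- Before/after: {self.before_ms:.2f} ms -> "
            f"{self.after_ms:.2f} ms ({self.speedup:.2f}x)",
            f"- Decision: {decision} ({self.reason})",
            f"- Cost: {self.wall_clock_s:.0f} s wall-clock, {self.tokens} tokens",
            "",
        ])


def append_entry(log_path: Path, entry: LogEntry) -> None:
    """Append one entry; the log is append-only so reverts stay visible."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry.to_markdown() + "\n")
```

Recording reverted hypotheses alongside kept ones is the point of the design: a plateau shows up in the log as a run of consecutive `revert` decisions.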
## §10 PEP case study **Source:** `macton/pep-copt` at `main` (5 commits); `README.md` (full); `src-optimized/OPTIMIZATION-LOG.md` (full); `prompts/create-reference.md` (full); `prompts/create-optimized-test-harness.md` (full); `prompts/create-optimized.md` (full, per §9); `prompts/create-visualizer.md` (full); `prove-optimized-harness.sh` (full, per §3).