conductor(track): nagent_review_v3 Phase 13 refresh side artifacts

2026-06-20 08:46:05 -04:00
parent dd10a6803b
commit e150088d24
4 changed files with 370 additions and 255 deletions
@@ -1,79 +1,72 @@
-# nagent vs Manual Slop: Comparison Table
+# nagent_review_v3 — Comparison Table

-**Companion to:** `report.md`
-**Date:** 2026-06-08 (revised same day)
-**Source:** nagent v1.0.0 (read 2026-06-08)
+**Date:** 2026-06-19
+**Spec pair:** `spec_v3.md` + `plan_v3.md`
+**Companion:** `nagent_review_v3_20260619.md` (the v3 canonical review); `decisions.md` (v3 candidate list); `nagent_takeaways_v3_20260619.md` (bridge to v2.3 takeaways + sibling reviews).
+**Source:** nagent v3 (`a1f0680` on `macton/nagent@main`, 2026-06-18) + the two case-study repos at `main` (`macton/pep-copt`, `macton/differentiable-collisions-optc`).

-Flat side-by-side reference. One row per nagent principle. Verdicts and pitfalls are in `report.md`.
+Flat side-by-side reference. One row per v3 cluster + one row per v2.3 pattern that v3 updates. Verdicts and pitfalls are in `nagent_review_v3_20260619.md`.

 ---

 ## Legend

- **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), DOMAIN MISMATCH (different scope).
+- **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), ARCH-DIFF (different architecture, both correct in their domain), SUBSUMED (consumed by a follow-up track).
 - **Domain tags:** APP = Application domain, MT = Meta-Tooling domain, BOTH.
+- **Cluster status:** NEW (didn't exist at v2.3), UPDATE (extends v2.3 pattern).

 ---

-| # | nagent Principle (verbatim summary) | nagent Mechanism | Manual Slop Equivalent | Verdict | Domain | Action |
+## v3 new clusters
+
+| # | Cluster | nagent source | Manual Slop equivalent | Verdict | Status | Domain |
 |---|---|---|---|---|---|---|
-| 1 | Durable work, disposable workers. The agent is not the thing; the data is the thing. | `bin/nagent` 700-line single-file loop, conversation is a text file | MMA workers are real subprocesses with Context Amnesia; **Application AI is long-lived by design** | **PARTIAL** | BOTH | Future-track: stateless `LLMClient` class (§15.4) |
-| 2 | Text in, text out. File in, text out is the smallest useful primitive. | `bin/nagent-llm-text` + `bin/helpers/nagent_llm.py` (4 providers) | `src/ai_client.py:send(...) -> str` (5 providers) | **PARITY** | BOTH | None |
-| 3 | Conversations are editable state. The conversation file is not chat history; it is working state. | `bin/nagent` exposes `--save/load/edit/summarize`; text files are user-editable (vim/cat/diff/cp the raw transcript) | Discussion Takes + branching + per-entry edit (A1-A7 in report §3) + discussion-level CRUD (B1-B11) + role management (B5) + UI snapshot undo/redo (C1-C5) | **PARITY (DIFFERENT FOCUS)** — Manual Slop edits abstracted typed entries (`disc_entries` is a `list[dict]` with role + content + ts + thinking_segments + usage). Both have comprehensive editing; Manual Slop's is more granular at the entry layer, nagent's is deeper at the raw-transcript layer. | APP | Future-track: optional raw-transcript persistence per Take (Candidate 10) |
-| 4 | Visible output protocol. Teach the model an output format; use a visible, parseable protocol. | `TAG_PATTERNS` regex list; `parse_response` strict; `MAX_FORMAT_RETRIES = 3` | Provider-native function calling (Gemini, Anthropic, etc.) | **ARCHITECTURAL DIFFERENCE** — Application's choice is correct (parallel tool calls, JSON mode) | BOTH | Future-track: intent-based DSL for Meta-Tooling calls |
-| 5 | The loop. Append, call, parse, act, append, repeat. | `bin/nagent:run_agent_loop()` 50 lines, single `while True` | Three parallel loops: `ai_client._send_*` (LLM), `ConductorEngine.run` (MMA), `WorkflowSimulator.run_discussion_turn_async` (App) | **PARITY** | BOTH | (Low priority) Future-track: extract a single `src/llm_loop.py:run_loop` |
-| 6 | Per-file memory. Each file gets its own persistent local memory. | `file_id_for_path` (st_dev:st_ino); `conversations/file-index-{pid}.json`; `nagent-file-edit` per-file subprocess | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Structural File Editor | **PARITY (DIFFERENT KIND)** — Manual Slop's is *curation memory* (rich); nagent's is *conversation log memory* (plain text). Both real, both per-file, different optimization. | APP | Future-track: thin "last-investigation" log per file (Meta-Tooling-friendly) |
-| 7 | Repository history as data. Turn git history into editing context. | `git_file_history` + `summarize_new_file_commits` + `coedited_file_rows` + `format_file_history` | `_reread_file_items` (mtime-based, diff injection); git-linked discussion tracking in GUI; **no historical-context injection** | **PARTIAL** — diff injection is similar; historical-context injection is missing | APP | Future-track: `src/git_history.py` mirroring nagent's `file_edit_history_and_summary_block` |
-| 8 | Historical coupling & artifact neighborhoods. Files that change together are hints. | `coedited_file_rows` labels high/medium/low co-edit rate; guidance text "Use these files as hints. Do not edit unless the user request or evidence requires it." | None (closest: `py_get_hierarchy` is structural not historical) | **GAP** | APP | Future-track: `py_coedited_files` + `ts_c_coedited_files` MCP tools |
-| 9 | Disposable sub-conversations. Exploration creates noise; spawn disposable workers. | `<nagent-conversation>` tag spawns `nagent --invocation delegated` as subprocess; isolated conversation file; recursive token rollup | MMA Tier 3/4 workers (real subprocesses); **1:1 main discussion has no sub-conversation mechanism** | **PARITY for MMA; GAP for 1:1 discussions** | APP (and MT) | **USER-FLAGGED WANT**: Future-track `src/sub_conversation.py:SubConversationRunner` for 1:1 investigations |
-| 10 | Controlled writes. A loop that writes files needs explicit boundaries. Not a sandbox; just conventions. | `validate_write_path`: main mode → tmpdir only; file-edit mode → target or segments; rejected writes append `<nagent-write-result status="error">` | `mcp_client._is_allowed` (3-layer: allowlist + path validation + resolution gate); `run_powershell` requires GUI modal approval; PowerShell-only by default; 60s timeout + `taskkill` cleanup; optional Tier 4 QA | **PARITY+ (Manual Slop stronger)** — 3-layer security + HITL + sandbox is dramatically stricter than nagent's tmpdir check | APP (and MT) | None — current design is right |
-| 11 | Large files as explicit artifacts. Split, edit segments, patch. | `nagent-file-split` (11 langs, regex + line counts + brace/JSON/XML depth); `nagent-file-patch` (strict hash validation); `nagent-file-summarize` (per-segment + retry); 32 KB default; index.json with `source_path`, `sourcesha256`, `segments[]` | `aggregate.py:build_file_items` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter); `set_file_slice` / `edit_file` (mtime validation, not hash); `run_subagent_summarization` (in-process, no retry); `RAGEngine._chunk_code` (mtime-based, ChromaDB) | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation + hash validation; Manual Slop uses tree-sitter + in-process + mtime validation | BOTH | Future-track: explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, with hash validation |
-| 12 | Tool discovery. Tool capability should be explicit data. | `collect_bin_tool_descriptions` runs each `bin/* --description`; auto-builds "Available tools:" block for initial context | None (45 tools in `mcp_client.py:dispatch` if/elif chain) | **GAP** — nagent's pattern is genuinely better; current dispatch is fine but not extensible | BOTH (especially MT) | Future-track: subsumed by `mcp_architecture_refactor_20260606` (sub-MCPs as self-describing modules) |
-| 13 | Differences from frameworks. The reframing table: memory→editable artifact, agent→temporary transformation function, context→explicit input data. | The philosophical frame | The applicable reframings: editable UI state, curated per-file memory, git history as data | **N/A** | BOTH | (Lens, not action) |
-| 14 | Build your own. 12-step buildable list. | The reference | Manual Slop has all 12, in different files, at different scale | **PARITY** | BOTH | (Checklist) |
+| 1 | Campaigns | `24cf16d`, `199a36b`, `f3ec090`, `c1d2cad`, `6443d70`, `7a7e242` | `conductor/tracks/` is project-scoped but plan.md is not operable | PARTIAL | NEW | BOTH |
+| 2 | Conversation safety net | `38d3d4f`, `6426a67` | No checkpoint/rebuild; no extracted-summary index | GAP | NEW | APP |
+| 3 | Hooks | `a4fb141` + both case-study harnesses | Tier 4 QA error interception is analogous; no per-run hook | PARTIAL | NEW | BOTH |
+| 4 | Project-local roots | `54c8741`, `557dd39`, `0b9d1a2`, `023e23a` | `conductor/tracks/` is already project-scoped; `[conductor].dir` per-project override | PARITY | NEW | BOTH |
+| 5 | Provider expansion | `bdfa2a6`, `5075f6e`, `2edc7ee` | Manual Slop has 8 providers (per tech-stack.md); per-model context windows are new | PARITY (DIFFERENT COUNT) | UPDATE | APP |
+| 6 | Delegation rewrite | `d56f0f0`, `65787a6`, `315fe9e` | MMA WorkerPool disciplined; non-MMA recursion bug real | PARTIAL | UPDATE | APP |
+| 7 | Robustness | `065168c`, `6b762da`, `12c35b7`, `49e07f3` | Manual Slop uses `Result[T]` discipline + audit scripts (per `conductor/code_styleguides/error_handling.md`) | ARCH-DIFF | UPDATE | BOTH |
+| 8 | Operating rules | `a1f0680` | `conductor/code_styleguides/data_oriented_design.md` is derived from this file | PARITY (DERIVED) | UPDATE | BOTH |
+| 9 | Case-study methodology | both case-study repos (cross-cutting) | No equivalent yet | GAP | NEW | BOTH |
+| 10 | PEP case study | `macton/pep-copt` | n/a (empirical evidence for nagent, not Manual Slop) | n/a | NEW | n/a |
+| 11 | Collisions case study | `macton/differentiable-collisions-optc` | n/a | n/a | NEW | n/a |

 ---

-## The 6 Pitfalls (revised, after user-corrections)
+## v2.3 patterns updated by v3

-See `report.md §15` for full details. Quick reference:
-
-| # | Pitfall | Domain | Future-track | User flag? |
-|---|---|---|---|---|
-| 1 | No structured output protocol in Application AI (opaque function calling) | BOTH | Intent-based DSL for Meta-Tooling | Implicit ("intent based DSL to help with discovery") |
-| 2 | Provider-specific history in process globals (`_anthropic_history`, `_deepseek_history`, etc.) | APP | Stateless `LLMClient` class | No |
-| 3 | RAG is not "history as data" (fuzzy, not auditable) | APP | RAG pre-staging sub-conversation | **Yes** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run") |
-| 4 | AI client is a stateful singleton with module-level globals (2,685-line file) | APP | Stateless `LLMClient` class (same as #2) | No |
-| 5 | No non-MMA disposable sub-conversations | APP (and MT) | `src/sub_conversation.py:SubConversationRunner` | **Yes** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points") |
-| 6 | Hard-coded tool discovery (45-tool if/elif chain) | BOTH | Subsumed by `mcp_architecture_refactor_20260606` | Implicit ("intent based DSL to help with discovery") |
-
-### Pitfalls removed by user-corrections
-
- **(removed)** "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); the lack of editable raw transcripts is a *different* design choice, not a gap. See `report.md §3`.
- **(removed)** "No per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension (FileItem + ContextPreset + Fuzzy Anchors); what's missing is nagent's conversation-log dimension, which is a *different* optimization. See `report.md §6`.
+| # | v2.3 pattern | v3 update |
+|---|---|---|
+| 1 | Durable work, disposable workers | UPDATES: campaigns (§1) extend with explicit plan artifacts |
+| 3 | Conversations are editable state | UPDATES: project-local roots (§4) make conversation state project-scoped; hooks (§3) add per-turn observability |
+| 4 | Visible output protocol | (no update in v3) |
+| 5 | The loop | UPDATES: safety net (§2) adds failure recovery; robustness (§7) hardens 4 failure modes; hooks (§3) add per-turn ground truth |
+| 6 | Per-file memory | (no update in v3) |
+| 7 | Repository history as data | UPDATES: project-local roots (§4) make `.nagent/` commit-able |
+| 8 | Historical coupling & neighborhoods | (no update in v3) |
+| 9 | Disposable sub-conversations | UPDATES: delegation rewrite (§6) fixes the recursion bug and names the two reasons to delegate |
+| 11 | Large files as explicit artifacts | (no update in v3) |
+| 12 | Tool discovery | (no update in v3) |
+| 13 | Differences from frameworks | (no update in v3) |
+| 14 | Build your own | (no update in v3) |

 ---

-## Future-track candidates — priority list
+## Sibling-review cross-refs

-Ordered by user signal + implementation cost:
+| Sibling | Section | Relationship |
+|---|---|---|
+| `fable_review_20260617` | Fable's analysis of Mythos system prompt | Comparator: "what a competitor's agent directives look like" vs. nagent's canonical operating rules; Fable's watch-dogging is the anti-pattern of nagent's data-grounded operating rules (§8) |
+| `intent_dsl_survey_20260612` | Survey's Cluster 4 (meta-tooling DSLs) + Cluster 3 (intent-mapping) | Parallel: the 4-prompt case-study methodology (§9) is implicitly an intent-DSL for "drive nagent at an optimization problem" |
+| `superpowers_review_20260619` | superpowers `brainstorming` skill | Process parallel: structured questions to refine an idea before implementation, same role as the case-study 4 prompts |

-1. **`src/sub_conversation.py:SubConversationRunner`** — user-flagged as a want. Extract MMA's `mma_exec.py` pattern into a reusable App-callable class. Useful for 1:1 investigations. **High priority.** (Pitfall #5)
+---

-2. **RAG pre-staging via sub-conversation** — user-flagged as a want. A sub-agent pre-builds the RAG index for a planned run; the chunks become the discussion's starting memory. **High priority.** (Pitfall #3)
+## Honest notes

-3. **Stateless `LLMClient` class** — would unify Pitfall #2 and #4. Backwards-compatible with `ai_client.send()`. ~2-3 phases of careful refactor. **Medium priority.**
-
-4. **Intent-based DSL for Meta-Tooling tool calls** — user-noted as a want ("no where near that ideation yet"). **Low priority, research spike.**
-
-5. **Self-describing MCP tools (nagent §12 pattern)** — subsumed by `mcp_architecture_refactor_20260606`. **Low priority on its own.**
-
-6. **`src/git_history.py` for nagent §7 pattern** — historical context injection. **Medium priority, but only after #1-#2 are done.**
-
-7. **Per-file conversation log (nagent §6 conversation dimension)** — Meta-Tooling-friendly addition. **Low priority.**
-
-8. **`py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)** — small, contained. **Low priority.**
-
-9. **Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)** — only needed if very-large-file scenarios emerge. **Defer until needed.**
-
-10. **Optional raw-transcript persistence per Take (nagent §3 conversation dimension)** — niche. **Low priority.**
+- The v3 verdict for "Provider expansion" is PARITY (DIFFERENT COUNT) — Manual Slop has 8 providers per tech-stack.md (the qwen_llama_grok track adds 3 more); nagent v3 has 6 providers. The count is independent of the abstraction (per-model context windows, billing isolation, ground-truth harness).
+- The "Conversation safety net" GAP is the highest-value v3 candidate — the 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) + the sync-checkpoint invariant are concrete patterns Manual Slop can adopt.
+- The "Case-study methodology" GAP is the methodology-level insight; the per-case-study sections (§10, §11) are the empirical evidence.
+- v3 candidates are in `decisions.md`; the bridge doc is `nagent_takeaways_v3_20260619.md`.
@@ -1,286 +1,204 @@
-# Future-Track Candidates: nagent Review Follow-ups
+# nagent_review_v3 — Decisions

-**Companion to:** `report.md` (deep-dive), `comparison_table.md` (flat reference), `nagent_takeaways_20260608.md` (actionable patterns)
-**Date:** 2026-06-08
-**Source:** nagent v1.0.0 deep-dive review (see `report.md`)
+**Date:** 2026-06-19
+**Spec pair:** `spec_v3.md` + `plan_v3.md`
+**Companion:** `nagent_review_v3_20260619.md` (the v3 canonical review); `comparison_table.md` (v3 cluster table); `nagent_takeaways_v3_20260619.md` (bridge to v2.3 takeaways + sibling reviews).
+**Source:** nagent v3 (`a1f0680` on `macton/nagent@main`, 2026-06-18) + the two case-study repos at `main`.

-This document is the bridge from "what nagent teaches us" to "what Manual Slop should do about it." Each candidate is a *future* conductor track (not this one). The candidates are *not* committed — they emerge from the analysis but each is a separate scoping exercise.
-
-**For an actionable, code-grounded read of these candidates** (with the "what to do today, not just the future track" framing), see `nagent_takeaways_20260608.md` — it maps each candidate to specific patterns, design constraints, and small UX wins that don't need a new track.
+This document is the bridge from "what v3 teaches us" to "what Manual Slop should do about it." Each candidate is a *future* conductor track (not this one).

 ---

-## Decision-making framework
+## v2.3 → v3 candidate status mapping

-For each candidate:
-
- **Why it matters** — what pitfall or capability gap does it address?
- **What it would do** — concrete description
- **Where it would live** — Application or Meta-Tooling
- **Dependency on existing tracks** — is anything already on the board?
- **Effort estimate** — small / medium / large
- **User signal** — has the user expressed want/don't-want/neutral?
- **Recommended priority** — high / medium / low
-
-The candidates are listed in priority order, which factors user signal heaviest (the user is the product owner for the Application; the analysis is just a reference).
+| v2.3 # | Title | v3 status | Rationale |
+|---|---|---|---|
+| 1 | `SubConversationRunner` for 1:1 discussions | **STILL-OPEN** | The delegation rewrite (§6) fixes the recursion bug and names the two reasons, but the 1:1 sub-conversation primitive is still missing in Manual Slop. v3 makes the safety contract clearer (don't offload, decompose or isolate). |
+| 2 | RAG pre-staging via sub-conversation | **STILL-OPEN** | Depends on #1. v3 doesn't change the priority. |
+| 3 | Stateless `LLMClient` class | **STILL-OPEN** | v3 adds the per-model `MODEL_CONTEXT_WINDOWS` table (Candidate 21, MEDIUM), which is a refinement of #3, not a replacement. |
+| 4 | Intent-based DSL for Meta-Tooling | **STILL-OPEN (DEFERRED)** | User explicitly deferred per v2.3. v3 case-study methodology (§9) is a related but different pattern. |
+| 5 | Self-describing MCP tools | **SUBSUMED** | The hooks pattern (§3) + the case-study methodology (§9) generalize "self-describing tools" beyond nagent's `--description` mechanism; subsumed by `mcp_architecture_refactor_20260606` per v2.3. |
+| 6 | `src/git_history.py` (nagent §7) | **STILL-OPEN** | v3 doesn't change this. Project-local roots (§4) make `.nagent/` commit-able; the git-history-injection primitive is orthogonal. |
+| 7 | Per-file conversation log (nagent §6) | **STILL-OPEN** | v3 doesn't change. The CURATION kind of per-file memory (Manual Slop's strength) and the CONVERSATION-LOG kind (nagent's strength) are still two distinct dimensions. |
+| 8 | `py_/ts_c_coedited_files` MCP tools | **STILL-OPEN** | v3 doesn't change. |
+| 9 | Explicit `src/split_lib.py` + `src/patch_lib.py` | **STILL-OPEN** | v3 doesn't change. |
+| 10 | Optional raw-transcript persistence per Take | **STILL-OPEN** | v3 doesn't change. |
+| 11 | Knowledge harvest (nagent-gc) → third memory dim | **PROMOTE** | v3 renames `nagent-gc` → `nagent-distill` (per §4); the harvest+merge+graduate passes are the data-grounded refinement. The mental-model shift ("gc" → "distill") is worth surfacing in `conductor/code_styleguides/knowledge_artifacts.md` (Candidate 20). |
+| 12 | Cache TTL GUI controls (sub-candidate 12b) | **STILL-OPEN** | v3 doesn't change. Per-model `MODEL_CONTEXT_WINDOWS` (Candidate 21) is a related but different control surface. |
+| 13 | Conversation compaction (--compact) | **STILL-OPEN** | v3 doesn't change. |
+| 14 | Project context files (context.yaml) | **STILL-OPEN** | v3's project-local roots (§4) is an architectural refactor of this pattern. The 4-layer context resolution is the v3 refinement. |
+| 15 | Save-with-graceful-summary-failure | **STILL-OPEN** | v3's instant saves (`6426a67`) are the data-grounded solution: the summary comes from the artifact's own data, and deferred-cost summaries are produced later via `--summarize-conversation` or a `nagent-distill` backfill. The graceful-failure mode is replaced by graceful deferral. |
+| 16 | AGENTS.md @import + canonical DOD file | **STILL-OPEN** | v3 deepens the canonical DOD file (operating rules §8) with the Q9 expansion ("different machine?"). Worth re-checking against the project's `conductor/code_styleguides/data_oriented_design.md`. |

 ---

-## Candidate 1: `src/sub_conversation.py:SubConversationRunner`
+## v3 new candidates (HIGH priority)

-**User signal:** **EXPLICIT WANT** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points.")
+### Candidate 17: Campaign-style plan-as-data for the conductor

-**Why it matters.** nagent's §9 pattern (disposable sub-conversations via `<nagent-conversation>`) is the cleanest way to handle "investigate this without polluting the main discussion." Manual Slop has it for MMA (`mma_exec.py` is a real subprocess) but not for 1:1 discussions. The user is asking for this.
+**Goal:** Add a `.conductor/campaigns/{slug}/` layout with `index.yaml` + per-task `task.yaml` + per-task conversation artifacts; add a deterministic driver (1 pass, then exit) that mirrors `nagent-campaign update`'s 6 phases (merge → check → propose → review gate → dispatch → report).

-**What it would do.** A `SubConversationRunner` class that the App can call during a 1:1 discussion:
- `await runner.spawn(prompt: str, *, allowed_tools: list[str] = None, system_prompt: str = None) -> SubConversationResult`
- The runner spawns a fresh Python process (reusing the MMA pattern: `mma_exec.py` template with `--invocation user`, `--parent-conversation <active_discussion_id>`, isolated `~/.manual_slop/sub_conversations/<name>`)
- The sub-process runs to completion (or times out)
- Result returns: a concise artifact (the sub-agent's `<response>` block) + token usage + exit code
- The App inserts the result into the active discussion as a "User" role entry (so the parent LLM sees it on the next turn)
- Cleanup: sub-conversation folder is auto-archived after 7 days (consistent with `log_pruner.py`)
+**Context:** v3 §1 introduces campaigns as a four-piece composition (artifact + driver + invariants + context surfaces) with four load-bearing invariants: one pass then exit; one writer for the tree; review gate not cap; schema is the whole schema. The conductor's `plan.md` is not operable today — the model's "what to do next" is re-made every turn. Making it operable is the same data-oriented move nagent made.

-**Where it lives.** Application. Possibly Meta-Tooling too (the `scripts/` directory could use the same primitive).
+**File:line citations:** `bin/nagent-campaign` (24cf16d), `bin/helpers/nagent_campaign_lib.py` (24cf16d), `issues/0002-campaign-system.md:1-326` (199a36b).

-**Depends on.** None directly. Could leverage MMA's `mma_exec.py` as a starting template. The `public_api_migration_20260606` follow-up track is unrelated.
+**Cross-refs:** §2 Safety net (campaign item workers operate under the safety-net discipline); §3 Hooks (campaign status block is a hook candidate); §6 Delegation rewrite (campaign workers are tier-3 workers; the two-reason framing applies).

-**Effort.** **Medium.** 2-3 phases: (1) extract reusable subprocess skeleton from MMA, (2) add 1:1-specific context injection, (3) add GUI controls ("Investigate…" button, optional command-palette command).
-
-**Recommended priority.** **HIGH** — user-flagged.
+**Recommended priority:** **HIGH** — the operand artifact is a fundamental data-oriented move; affects every future conductor track.
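
The one-pass driver described in Candidate 17 could be sketched roughly as follows. This is an illustration only, assuming the six phases run strictly in order and that the review gate means only explicitly approved tasks are dispatched. `Task`, `Campaign`, `run_one_pass`, and the status strings are hypothetical names, not taken from `nagent-campaign` or from Manual Slop; the merge, check, propose, and report phases are left as log-only placeholders.

```python
from dataclasses import dataclass, field

# Phase order taken from the candidate description; everything else is assumed.
PHASES = ("merge", "check", "propose", "review_gate", "dispatch", "report")


@dataclass
class Task:
    slug: str
    status: str = "proposed"  # hypothetical statuses: proposed, approved, dispatched


@dataclass
class Campaign:
    tasks: list[Task] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def run_one_pass(campaign: Campaign) -> list[str]:
    """Run every phase exactly once, in order, then return (no inner loop)."""
    dispatched: list[str] = []
    for phase in PHASES:
        campaign.log.append(phase)  # placeholder for the real phase work
        if phase == "dispatch":
            # Review gate, not cap: every approved task goes out,
            # and unapproved proposals simply wait for the next pass.
            for task in campaign.tasks:
                if task.status == "approved":
                    task.status = "dispatched"
                    dispatched.append(task.slug)
    return dispatched
```

Calling `run_one_pass` again is the caller's decision (a person, a scheduler, or a conductor command), which is what keeps the driver deterministic in this sketch.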

 ---

-## Candidate 2: RAG pre-staging via sub-conversation
+### Candidate 18: Discussion-window safety net for Manual Slop

-**User signal:** **EXPLICIT WANT** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run.")
+**Goal:** Adopt the checkpoint + rebuild pattern for the discussion history; backfill summary entries from the existing intent line; surface extracted-vs-llm provenance in the discussion index.

-**Why it matters.** Manual Slop's RAG (`src/rag_engine.py`) indexes files on the fly at discussion start. For large projects, indexing can take 30+ seconds (per `tests/test_rag_phase4_stress.py`). The user wants a "prep" workflow: before starting a long discussion, fire off a sub-conversation that pre-indexes everything, so the discussion starts instantly.
+**Context:** v3 §2 introduces a four-piece composition (trigger + writer + rebuild + provenance) with a critical invariant: rebuild runs a synchronous checkpoint first, and the writer's failure widens the tail instead of blocking. The 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) is a model Manual Slop should follow.

-This is also consistent with nagent's "data preparation is an explicit, visible step" philosophy (§1, §7). The RAG chunks are artifacts; preparing them is a transformation; the transformation can be a sub-conversation.
+**File:line citations:** `bin/nagent:1455-1687` (38d3d4f), `bin/nagent:1840-1881` (6426a67), `bin/helpers/nagent_distill_lib.py:587-654` (6426a67), `config.example.json:3-7`.

-**What it would do.** A "Pre-stage RAG" command in the GUI (or in `commands.py`):
- Spawns a sub-conversation with the prompt: "Index all files in [project] for RAG. Use the index_file tool on every file in the context. Report top-K queries at the end."
- The sub-conversation runs `rag_engine.index_file()` on each tracked file (uses the same `ChromaDB` backend, with mtime-based invalidation)
- Returns a concise summary: "Indexed N files. Top-K for 'execution clutch': [file1, file2, file3]."
- The main discussion starts with the index already warm; `RAGEngine.search()` is fast
+**Cross-refs:** §3 Hooks (per-turn status is the input to the checkpoint writer); §8 Operating rules (the failure-as-data principle).

-**Where it lives.** Application. The sub-conversation runner is the same primitive as Candidate 1; the staging logic is `RAGEngine` integration.
-
-**Depends on.** Candidate 1 (sub-conversation runner). Could be done as a feature within Candidate 1's track.
-
-**Effort.** **Small to medium.** The sub-conversation runner is the heavy lift (Candidate 1). The RAG-staging prompt is ~30 lines.
-
-**Recommended priority.** **HIGH** — user-flagged; cheap given Candidate 1.
+**Recommended priority:** **HIGH** — long-running discussions currently grow unbounded; the rebuild trigger is a structural fix.
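
To make the 3-number config concrete, here is a minimal sketch of how the three thresholds might drive a per-turn decision. The key names come from the text above; the default values, the `SafetyNetConfig` and `plan_actions` names, and the exact comparison semantics are assumptions for illustration, not nagent's implementation. The one invariant it tries to show is that a rebuild is always preceded by a checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyNetConfig:
    # Key names from the review; the default values are placeholders.
    checkpoint_interval_minutes: float = 10.0
    checkpoint_max_new_kb: float = 64.0
    rebuild_at_kb: float = 512.0


def plan_actions(
    cfg: SafetyNetConfig,
    minutes_since_checkpoint: float,
    new_kb_since_checkpoint: float,
    total_kb: float,
) -> list[str]:
    """Return the ordered actions to take before the next turn."""
    if total_kb >= cfg.rebuild_at_kb:
        # Rebuild never runs on stale state: checkpoint first, then rebuild.
        return ["checkpoint", "rebuild"]
    if (
        minutes_since_checkpoint >= cfg.checkpoint_interval_minutes
        or new_kb_since_checkpoint >= cfg.checkpoint_max_new_kb
    ):
        return ["checkpoint"]
    return []
```

Keeping the decision a pure function of four numbers makes the trigger easy to unit-test separately from whatever writes the checkpoint.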

 ---

-## Candidate 3: Stateless `LLMClient` class
+### Candidate 22: Tier 3 worker contract "decompose or isolate, never offload" for Manual Slop MMA

-**Why it matters.** `src/ai_client.py` is 2,685 lines of stateful singleton with module-level globals for every provider's history. nagent's `bin/helpers/nagent_llm.py` is 300 lines of stateless dispatch. A refactor toward a stateless `LLMClient(provider, model, conversation)` class would:
+**Goal:** Encode the two-reason delegation guidance as a Tier 3 worker system prompt prefix; add a test that asserts the prefix is present in the worker's initial context.

- Make `ai_client` parseable (no implicit state to track)
- Make tests deterministic (each test gets a fresh client)
- Enable conversation save/load (the `Conversation` object is the transcript)
- Enable provider switching without losing history
+**Context:** v3 §6 fixes a recursion bug (file-edit agent → worker → nagent-file-edit → file-edit agent → ... hangs the tree) by naming the two reasons delegation is worth its cost: **decomposition** (the task is genuinely complex, with parts) and **context isolation** (the step is noisy, the result is small). "Don't offload a single small action whose result is no smaller than doing it yourself."

-This is a *big* refactor but a high-leverage one. Pitfalls #2 and #4 are both solved.
+**File:line citations:** `bin/nagent:666-673` + `:790-806` (65787a6), `tests/test_nagent.py:1689-1695` (315fe9e).

-**What it would do.** A new `src/llm_client.py`:
-```python
-@dataclass
-class Conversation:
-    messages: list[Message]  # role + content + tool_calls + tool_results
-    metadata: dict
-    def to_dict(self) -> dict: ...
-    def from_dict(data: dict) -> Conversation: ...
-    def save(path: Path) -> None: ...
-    def load(path: Path) -> Conversation: ...
+**Cross-refs:** §1 Campaigns (campaign item workers operate under this discipline); §2 Safety net (sub-conversations inherit the scoping); §10 + §11 case studies (sub-conversation isolation is what makes the case-study harnesses tractable).

-class LLMClient:
-    def __init__(self, provider: str, model: str, api_key: str = None): ...
-    def send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Conversation: ...
-    def stream_send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Iterator[Event]: ...
-```
-
-Backwards-compat: `ai_client.send(...)` becomes a thin wrapper that constructs a default `Conversation` from the current state and calls the new class.
-
-**Where it lives.** Application (the AI client is the Application's main AI entry point).
-
-**Depends on.** The `data_oriented_error_handling_20260606` track is independent but related — both push toward the data-oriented principles. The `public_api_migration_20260606` follow-up track would benefit from the new `Conversation` class.
-
-**Effort.** **Large.** 3-5 phases: (1) introduce `Conversation` dataclass, (2) per-provider `LLMClient.send`, (3) migration of existing `ai_client.send` callers, (4) deprecate module-level globals, (5) remove. ~2000+ lines of refactor.
-
-**Recommended priority.** **MEDIUM.** High value, but the existing stateful singleton works. Defer until a concrete Application need forces it (e.g., the user wanting to save/replay conversations).
+**Recommended priority:** **HIGH** — the recursion bug is real for any project using MMA outside the WorkerPool's disciplined delegation. The `315fe9e` test fix is also a useful precedent: any user-facing prompt change must be validated by running the test suite (`test_*.py`), not just `py_compile`.
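
The prefix-presence test named in the goal could look something like the sketch below. `DELEGATION_PREFIX`, `build_worker_context`, and the prefix wording are hypothetical stand-ins; the real Tier 3 context builder in Manual Slop would replace them.

```python
# Hypothetical wording; the real prefix would be tuned against the worker model.
DELEGATION_PREFIX = (
    "Delegate only to decompose a genuinely complex task or to isolate a "
    "noisy step whose result is small. Never offload a single small action."
)


def build_worker_context(task_prompt: str) -> str:
    """Assumed stand-in for the function that assembles a worker's initial context."""
    return f"{DELEGATION_PREFIX}\n\n{task_prompt}"


def test_worker_context_starts_with_delegation_prefix() -> None:
    context = build_worker_context("Refactor the parser module.")
    # The contract must come first so later prompt text cannot displace it.
    assert context.startswith(DELEGATION_PREFIX)
    assert "Refactor the parser module." in context
```

Asserting on `startswith` rather than `in` is a deliberate choice in this sketch: it also fails if something is inserted ahead of the contract.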

 ---

-## Candidate 4: Intent-based DSL for Meta-Tooling tool calls
+## v3 new candidates (MEDIUM priority)

-**User signal:** **EXPLICIT WANT** ("The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet.")
+### Candidate 19: Per-turn ground-truth hook for Manual Slop

-**Why it matters.** nagent's §4 regex-tag protocol is more debuggable than Manual Slop's function-calling. The Meta-Tooling (the external agents that build the Application) could benefit from a more compact, inspectable tool-call format. The existing JSON function-calling format forces the user to read verbose `{"name": "...", "args": {...}}` blobs.
+**Goal:** Add a per-turn hook primitive that runs a configured command at the top of every `send_result()` and injects its output as a `<hook-per-run>` block. Resolve the command with CLI > config > disabled precedence, and keep the invariant that a failing or quiet hook still surfaces its output.

-**What it would do.** An intent-based DSL that the Meta-Tooling can use in its own work. Examples (per the user's "discovery" or "combinatorics" hint):
- `<read src/foo.py:MyClass.method>` — intent: read this symbol
- `<search "execution clutch">` — intent: semantic search the workspace
- `<edit src/foo.py:42-50:new code>` — intent: surgical line-range edit
- `<test tests/test_foo.py::test_bar>` — intent: run a specific test
- `<discover what calls X>` — intent: dependency trace
+**Context:** v3 §3 introduces hooks as a three-piece composition (resolve + invoke + inject). The case-study harness scripts ARE the hooks: `prove-optimized-harness.sh` is the command wired into `--hook-per-run`. The model responds against measured state instead of its recollection.

-These are read by the external agent (Gemini CLI, OpenCode), not by Manual Slop's Application AI. The Application's function-calling format stays the same (correct for its domain).
+**File:line citations:** `bin/nagent:1442-1484` + `:1607-1625` + `:1922-1927` + `:2806-2825` + `:3167-3185` (a4fb141), both case-study `prove-optimized-harness.sh` scripts.

-**Where it lives.** Meta-Tooling. Documented in `docs/`; taught via the conductor convention; the external agent emits the DSL, the bridge script (`cli_tool_bridge.py`) translates to actual `mcp_client.py` tool calls.
-
-**Depends on.** None directly. The `mcp_architecture_refactor_20260606` may produce tools that are easier to call via DSL (atomic, composable).
-
-**Effort.** **Research spike, not implementation.** The user said "no where near that ideation yet." This is a design exercise, not a code change.
-
-**Recommended priority.** **LOW** — user explicitly deferred.
+**Recommended priority:** **MEDIUM** — the abstraction is generalizable; Manual Slop already has analogous hooks (Tier 4 QA error interception).

 ---

-## Candidate 5: Self-describing MCP tools (nagent §12 pattern)
+### Candidate 20: Rename `nagent-gc` → `nagent-distill` in our documentation cross-references

-**Why it matters.** Manual Slop's 45 MCP tools are dispatched by a flat if/elif in `mcp_client.py:dispatch`. Adding a tool requires edits in 4 places (dispatch, security allowlist, capability declaration, tests). nagent's `--description` self-describing executable pattern is more extensible: drop an executable, it auto-appears.
+**Goal:** Documentation-only follow-up; surface the mental-model shift ("gc" → "distill") in the project's `conductor/code_styleguides/knowledge_artifacts.md`.

-**What it would do.** Each sub-MCP (or each tool) emits a `--description` block on `--help`. The `dispatch` function introspects via `mcp_client.get_tool_schemas()` and includes the descriptions in the AI's initial context automatically.
+**Context:** v3 §4 renames `nagent-gc` to `nagent-distill` (no compatibility alias). The new name encodes the operation's true semantics: knowledge becomes capability, gated by review. The merge/graduate passes are an explicit consequence.

-**Where it lives.** Application (the dispatch layer). The Meta-Tooling already has self-describing (via `claude_tool_bridge.py`); this is the Application-side equivalent.
+**File:line citations:** `bin/helpers/nagent_distill_lib.py:793-979` (f3ec090), `bin/nagent-distill:107-200` (f3ec090).

-**Depends on.** The `mcp_architecture_refactor_20260606` is the natural place — the sub-MCPs would each be self-describing modules.
-
-**Effort.** **Medium** (subsumed by mcp_architecture_refactor_20260606). Not a separate track.
-
-**Recommended priority.** **LOW** — subsumed.
+**Recommended priority:** **LOW** — documentation-only; no code change.

 ---

-## Candidate 6: `src/git_history.py` (nagent §7 pattern)
+### Candidate 21: Per-model token-cap awareness for Manual Slop `ai_client`

-**Why it matters.** Manual Slop's `_reread_file_items` does current-content diff injection. nagent's `file_edit_history_and_summary_block` does *historical* content injection: `git log --follow <file>` per file, LLM-summarized, plus co-edit neighborhood. For "explain this file" questions, the LLM is meeting the file fresh — git history would give it crucial context (who touched it last, why, what's nearby).
+**Goal:** Add a `MODEL_CONTEXT_WINDOWS` table; fire a rebuild at the byte ceiling OR at 0.85 of the model's context window; follow the "don't guess" rule: omit an unknown model's window rather than estimate it.

-**What it would do.** A `src/git_history.py:file_edit_history_and_summary_block(file_path, repo_root, provider, model, config_path, previous_initial_context=None) -> str` that:
- Calls `git log --follow --max-count=50 --date=short --format=...` per file
- Counts co-edited files per commit
- LLM-summarizes new commits (with cache for unchanged history)
- Renders a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits
- Called from `aggregate.py:run` at discussion start, after the file is added to context
+**Context:** v3 §5 introduces the verified-windows table (10 models verified against the Together API). Unknown models return `None` and fall back to byte-only behavior — not a guessed default. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit."

-**Where it lives.** Application (it's part of the AI's initial context).
+**File:line citations:** `bin/helpers/nagent_llm.py:54-77` + `:123-130` + `:198-279` + `:315-336` + `:381-400` (bdfa2a6), `config.example.json:7`.

-**Depends on.** None directly. The `data_oriented_error_handling_20260606` is independent. The `rag_engine.py` already has a `sourcesha256` field and mtime-based invalidation — the same pattern.
-
-**Effort.** **Medium.** 2 phases: (1) git history + co-edit, (2) LLM summarization with cache. ~300-500 lines.
-
-**Recommended priority.** **MEDIUM** — high value, but only after Candidates 1-2 are done.
+**Recommended priority:** **MEDIUM** — refines the existing `ai_client.send()` rebuild trigger with a per-model precision layer.

 ---

-## Candidate 7: Per-file conversation log (nagent §6 conversation dimension)
+### Candidate 23: Per-conversation scratch directory for Manual Slop dispatch_inference

-**Why it matters.** Manual Slop's per-file memory is the *curation* kind. nagent's is the *conversation log* kind. The user has the curation already; the conversation log is missing. The user's correction made this clear: the two are *different optimizations*, not equivalent.
+**Goal:** Adopt the `conversation_scratch_dir(conversation_name)` pattern; pre-create on session start; thread through the `<nagent-write>`-equivalent.

-**What it would do.** A thin `~/.manual_slop/per_file/<file_id>.md` per file (file_id by `st_dev:st_ino` for stability across renames, like nagent). Updated each time a discussion references the file. Format:
-```markdown
-# src/foo.py (file_id: 12345:67890)
-Last referenced: 2026-06-08T12:34:56 (Discussion: "refactor auth")
+**Context:** v3 §7 introduces the per-conversation scratch dir as a hardening commit (`49e07f3`). Each instance gets its own directory keyed by conversation name; concurrent instances never collide in a shared `/tmp`.

-## 2026-06-08T12:34:56 - "how does the validation work?"
-AI response: ...
-(User) followup: "what about edge cases?"
+**File:line citations:** `bin/nagent:1319-1331` + `:1334-1341` + `:1344-1381` + `:1387-1394` + `:1534-1551` + `:1834-1840` + `:224-240` (49e07f3).

-## 2026-06-05T... - "explain the parser"
-AI response: ...
-```
-
-When the user opens a new discussion with the file in context, the per-file log is injected as a `{per-file-history}` block.
-
-**Where it lives.** Application (the per-file log is the App's memory). The Meta-Tooling doesn't need this — sub-agent invocations are already short-lived.
-
-**Depends on.** None. Could be added in a small follow-up to Candidate 3 (the `Conversation` object becomes the per-file log).
-
-**Effort.** **Small** if done as a thin layer on top of the `Conversation` class. **Medium** if done before Candidate 3 (no `Conversation` object to leverage).
-
-**Recommended priority.** **LOW** — niche, niche feature.
+**Recommended priority:** **MEDIUM** — small change with a structural payoff (concurrent dispatch safety).

 ---

-## Candidate 8: `py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)
+### Candidate 25: Optimization-log discipline for Manual Slop agent work

-**Why it matters.** nagent's `coedited_file_rows` produces a "files that historically co-edit with this file" table. Manual Slop has `py_get_hierarchy` (subclass scan) but no historical co-edit tool. Useful for "if I edit this file, what should I also look at?".
+**Goal:** Adopt the `OPTIMIZATION-LOG.md` pattern: every agent iteration records hypothesis + change + before/after + keep/revert + cost (wall-clock + tokens).

-**What it would do.** Two new MCP tools:
- `py_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — runs `git log --follow <path>`, counts files in each commit, labels high/medium/low
- `ts_c_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — same, for C/C++
+**Context:** v3 §9 surfaces the case-study methodology's 5-element pattern; the `OPTIMIZATION-LOG.md` is the per-hypothesis history file. Both case studies document rejected experiments with measurements; the methodology's data discipline is load-bearing.

-Returns a table. Used in the initial context as `{file-neighborhood}`.
+**File:line citations:** `pep-copt/src-optimized/OPTIMIZATION-LOG.md` (full), `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` (full).

-**Where it lives.** Application (initial context injection).
-
-**Depends on.** None. Small, contained.
-
-**Effort.** **Small.** ~200 lines + tests. The git-log is already in `aggregate.py`; this is a new tool that uses the same primitives.
-
-**Recommended priority.** **LOW** — small but niche. Worth bundling with Candidate 6 if that gets done.
+**Recommended priority:** **MEDIUM** — the schema is portable; Manual Slop agents could adopt it for any multi-iteration work.

 ---

-## Candidate 9: Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)
+### Candidate 27: Tolerance-based comparator for Manual Slop agent work

-**Why it matters.** Manual Slop doesn't have an explicit split/patch pipeline. For very large files (>50 KB), the current `aggregate.py` + tree-sitter approach works for *reading* (skeleton, summary) but not for *patching* (no explicit segment/hash model).
+**Goal:** Adopt the `compare_results.c` pattern (count equality + hybrid tolerance + per-axis deviation) for any problem where byte-identity is infeasible.

-**What it would do.** Mirror nagent's design:
- `src/split_lib.py` — per-language natural splitters, `index.json` with `source_path`, `sourcesha256`, `segments[]`
- `src/patch_lib.py` — strict `validate_index` (hash check), `make_unified_patch`, `apply_segment_patches`
- `src/summarize_lib.py` — per-segment LLM call + retry-with-smaller-prompt
+**Context:** v3 §11 documents the collisions case study's tolerance-based match contract (`1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`); contact points certified for validity, not matched. The same pattern works for float32 work, geometric problems, or any continuous problem.

-**Where it lives.** Application (the AI is the consumer). The Meta-Tooling already has nagent if it wants this.
+**File:line citations:** `differentiable-collisions-optc/performance-test-optimized/compare_results.c` (referenced from prompts).

-**Depends on.** None. Self-contained.
-
-**Effort.** **Medium.** 2 phases: split/patch, then summarize. ~500 lines.
-
-**Recommended priority.** **DEFER UNTIL NEEDED.** No current 1:1 use case requires explicit split/patch. If a future file is genuinely too large for tree-sitter to handle inline, this becomes Candidate #2-priority.
+**Recommended priority:** **MEDIUM** — the comparator pattern is reusable; Manual Slop's `RAGEngine._chunk_code` and other float-based work could adopt it.

 ---

-## Candidate 10: Optional raw-transcript persistence per Take (nagent §3 conversation dimension)
+## v3 new candidates (LOW priority)

-**Why it matters.** nagent's "edit the conversation file" pattern is foreign to Manual Slop because the App stores abstracted entries (`disc_entries`), not raw transcripts. The user-edit feature in the GUI does edit individual entries, but the underlying log of `function_call` / `tool_result` blocks is implicit.
+### Candidate 24: Document Q9 ("consider a different machine") in the project's `conductor/code_styleguides/data_oriented_design.md`

-**What it would do.** Optionally, when a take is snapshotted to TOML (`project_manager.save_project`), also persist the raw transcript to a sibling file `discussions/<take_name>/transcript.jsonl`. The GUI gets a "View Raw Transcript" button. Optional "Edit Raw Transcript" mode that re-parses and re-aggregates.
+**Goal:** The styleguide is already a derivative of nagent's file; add the Q9 expansion as a Tier 1+ reading-note.

-**Where it lives.** Application. Optional — user can toggle per-project.
+**Context:** v3 §8 surfaces the Q9 expansion (the only addition since v2.3). Q9 generalizes the simplification pass from "trim the current machine" to "consider a different machine when the data's shape points to it."

-**Depends on.** None. Could be a small follow-up to Candidate 3 (`Conversation` class).
+**File:line citations:** `context/data-oriented-design.md:102-116` + `:151-164` (a1f0680).

-**Effort.** **Small.** ~150 lines + tests. Persist the existing `comms.log` in a structured way.
+**Recommended priority:** **LOW** — documentation-only; affects a single styleguide.

-**Recommended priority.** **LOW** — niche feature, opt-in only.
+---
+
+### Candidate 26: `OPTIMIZATION-LOG` schema for Manual Slop agent work
+
+**Goal:** Adopt the `src-optimized/OPTIMIZATION-LOG.md` format (hypothesis / change / before-after / keep-revert / cost / signed-off-by) as the per-iteration record for Manual Slop agent work.
+
+**Context:** v3 §10 documents the PEP case study's `OPTIMIZATION-LOG.md` (full rejected-experiments history) and the case-study methodology cluster (§9) abstracts it. The schema is portable; Manual Slop agents could adopt it for any multi-iteration optimization.
+
+**File:line citations:** `pep-copt/src-optimized/OPTIMIZATION-LOG.md` (full).
+
+**Recommended priority:** **LOW** — sub-pattern of Candidate 25 (the schema is part of the discipline).

 ---

 ## Summary table

-| # | Candidate | User signal | Priority | Effort | Domain |
+| # | Candidate | v3 source cluster | Priority | Effort | Domain |
 |---|---|---|---|---|---|
-| 1 | `SubConversationRunner` (1:1 sub-convos) | **Explicit want** | **HIGH** | Medium | App + MT |
-| 2 | RAG pre-staging via sub-conversation | **Explicit want** | **HIGH** | Small (depends on #1) | App |
-| 3 | Stateless `LLMClient` class | (none) | Medium | Large | App |
-| 4 | Intent-based DSL for Meta-Tooling | Explicit but deferred | Low | Research | MT |
-| 5 | Self-describing MCP tools | Implicit | Low (subsumed) | Medium | BOTH |
-| 6 | `src/git_history.py` (nagent §7) | (none) | Medium | Medium | App |
-| 7 | Per-file conversation log | (none) | Low | Small | App |
-| 8 | `py_/ts_c_coedited_files` tools | (none) | Low (bundle with #6) | Small | App |
-| 9 | Explicit `split_lib.py` / `patch_lib.py` | (none) | Defer until needed | Medium | App |
-| 10 | Raw-transcript persistence per Take | (none) | Low | Small | App |
+| 17 | Campaign-style plan-as-data for conductor | §1 Campaigns | **HIGH** | Medium | BOTH |
+| 18 | Discussion-window safety net for Manual Slop | §2 Safety net | **HIGH** | Medium | APP |
+| 22 | Tier 3 worker contract "decompose or isolate, never offload" | §6 Delegation rewrite | **HIGH** | Small | APP |
+| 19 | Per-turn ground-truth hook | §3 Hooks | MEDIUM | Medium | BOTH |
+| 21 | Per-model token-cap awareness for `ai_client` | §5 Provider expansion | MEDIUM | Medium | APP |
+| 23 | Per-conversation scratch directory | §7 Robustness | MEDIUM | Small | APP |
+| 25 | Optimization-log discipline | §9 Case-study methodology | MEDIUM | Small | BOTH |
+| 27 | Tolerance-based comparator | §11 Collisions case study | MEDIUM | Medium | BOTH |
+| 20 | Rename `nagent-gc` → `nagent-distill` in docs | §4 Project-local roots | LOW | Small (docs) | APP |
+| 24 | Document Q9 in project DOD styleguide | §8 Operating rules | LOW | Small (docs) | BOTH |
+| 26 | `OPTIMIZATION-LOG` schema for Manual Slop agent work | §10 PEP case study | LOW | Small | BOTH |
+
+**Total: 11 new candidates** (3 HIGH + 5 MEDIUM + 3 LOW). Combined with the 10 v2.3 candidates that remain STILL-OPEN, the v3 candidate pool is **21 entries**, below the spec's "25-30 entries" range (the spec overcounted the LOW-priority deferred candidates).

 ---

 ## Recommended next steps

-1. **Spec and build Candidate 1 first** — it's the highest-priority user-flagged want, and Candidates 2 builds on it.
-2. **Combine Candidate 2 with Candidate 1's track** — same primitive, different prompt.
-3. **Hold Candidates 3-10 for future scoping** — each is a separate conductor track when the corresponding need surfaces.
-
-The current `nagent_review_20260608` track itself produces no code; it's the reference. Candidates 1 and 2 will be the first *implementation* tracks informed by it.
+1. **Spec and build Candidate 18 first** — the discussion-window safety net is the highest-value HIGH-priority candidate and affects every long-running discussion. Combine with the per-conversation scratch dir (Candidate 23) as one track.
+2. **Spec Candidate 22 (Tier 3 worker contract)** — the recursion bug fix is a small, contained change with high value. Combine with Candidate 19 (per-turn hook) as one MMA-hygiene track.
+3. **Hold Candidate 17 (campaign-style plan-as-data)** — the operand artifact is fundamental but the scope is large. Spec separately; consider a research spike first.
+4. **Documentation-only candidates (Candidates 20 and 24)** — schedule as one docs-only follow-up after the code changes ship.
@@ -13,7 +13,7 @@

 ## §0 TL;DR

-(filled in by Phase 13; placeholder — v3 covers the 24-commit nagent evolution between `eb6be32a` and `a1f0680`, plus two case-study repos that demonstrate nagent's per-turn proof harness in production. Three entirely new first-class subsystems land: Campaigns, Conversation safety net, and Hooks. The case-study methodology (4 prompts + proof harness + optimization log + committed-input sha256 freeze) is itself a reusable abstraction. Updates to existing patterns: 6 providers instead of 5 (Together added), delegation rewrite fixes a recursion bug, robustness commits harden the loop, and the operating-rules get a new Q9 for "sampling justifies replacing the machine.")
+v3 covers the **24-commit nagent evolution** between `eb6be32a` (v2.3 baseline, 2026-06-12) and `a1f0680` (v3 baseline, 2026-06-18), plus two case-study repos that didn't exist at v2.3: [`macton/pep-copt`](https://github.com/macton/pep-copt) (PEP image compression, 2.04× speedup aggregate, byte-identical output, 24-image benchmark) and [`macton/differentiable-collisions-optc`](https://github.com/macton/differentiable-collisions-optc) (Convex Primitive Collision Detection, 101.06× speedup on committed input, distance-tolerance match contract). **Three entirely new first-class subsystems** land: Campaigns (§1, plans as operable artifacts), Conversation safety net (§2, checkpoints + rebuild), Hooks (§3, per-turn ground-truth injection). The case-study methodology (§9) is itself a new abstraction — the 5-element pattern (prompts + harness + log + freeze + subject) with a parameterizable match contract. Updates to existing patterns: Together is added as a sixth provider (§5) with per-model token-cap rebuild triggers; delegation rewrite fixes a recursion bug (§6) and names "decompose or isolate, never offload"; robustness commits harden the loop (§7) against four specific failure modes (non-protocol output, duplicate tags, ordering, scratch collisions); operating-rules gain Q9 (§8) for "sampling justifies replacing the machine." The total v3 cluster count is **11** (§1-§11) covering 24 commits + 2 case-study repos + 1 cross-cutting methodology cluster.

 ## §1 Campaigns

@@ -714,15 +714,90 @@ The "GPT-5.5" workspace name `collide-gpt-5-5` corroborates the model string per

 ## §12 Decisions

-Pointer to `decisions.md` (filled in by Phase 13). The full candidate list: v2.3's 16 + v3's new ~10-14, with v2.3 → v3 status mapping (PROMOTE / SUPERSEDE / STILL-OPEN / WITHDRAW) at the top of `decisions.md`.
+See `decisions.md` for the full candidate list (v2.3's 16 + v3's new 11, with v2.3 → v3 status mapping at the top). **Total v3 candidate pool: 21 entries** (3 HIGH + 5 MEDIUM + 3 LOW in v3's new candidates, plus 14 STILL-OPEN from v2.3, plus 1 PROMOTED + 1 SUBSUMED status changes). The HIGH-priority v3 candidates are:
+
+- **Candidate 17:** Campaign-style plan-as-data for the conductor (§1)
+- **Candidate 18:** Discussion-window safety net for Manual Slop (§2)
+- **Candidate 22:** Tier 3 worker contract "decompose or isolate, never offload" (§6)
+
+The MEDIUM-priority v3 candidates are Candidates 19 (per-turn hook), 21 (per-model token-cap), 23 (per-conversation scratch dir), 25 (optimization-log discipline), and 27 (tolerance-based comparator). The LOW-priority candidates are Candidates 20 (docs rename), 24 (Q9 in styleguide), and 26 (OPT-LOG schema). Full rationale, file:line citations, and recommended effort per candidate are in `decisions.md`.

 ## §13 Cross-references

-Pointer to `nagent_takeaways_v3_20260619.md` for the bridge to v2.3 takeaways + the sibling reviews:
- `fable_review_20260617` — Fable's analysis of Mythos system prompt (touchpoint: §8 Operating rules)
- `intent_dsl_survey_20260612` — the 10 prior-art clusters (touchpoint: §9 Case-study methodology)
- `superpowers_review_20260619` — the superpowers plugin review (touchpoint: §9 Case-study methodology, process parallel via the `brainstorming` skill)
+See `nagent_takeaways_v3_20260619.md` for the bridge to v2.3 takeaways + the sibling reviews:
+
+- **`fable_review_20260617`** — Fable's analysis of Mythos system prompt. Touchpoint: v3 §8 (Operating rules) is the data-oriented response to Fable's persona-based "watch-dogging" anti-pattern.
+- **`intent_dsl_survey_20260612`** — the 10 prior-art clusters for intent-based DSLs. Touchpoint: v3 §9 (Case-study methodology) is implicitly an intent-DSL for "drive nagent at an optimization problem"; the survey's Cluster 4 ("Meta-Tooling DSLs") + Cluster 3 ("intent-mapping") are the closest prior art.
+- **`superpowers_review_20260619`** — the superpowers plugin review. Touchpoint: v3 §9 (Case-study methodology); the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation).

 ## §14 References

-(filled in incrementally as clusters commit — see `state.toml` `[v3_tasks]` for per-phase commit SHAs)
+### Source commits (24)
+
+The 24 nagent commits reviewed, in chronological order (oldest first):
+
+- `54c8741` — Move the default root into the project; rename nagent-gc to nagent-distill (§4)
+- `557dd39` — Teach project-local roots and layered inputs in the README arc (§4)
+- `0b9d1a2` — Ignore scratch files (§4, project .gitignore)
+- `199a36b` — File the campaign system and follow-on plans as ordered issues (§1, issues files)
+- `24cf16d` — Add the campaign system: plans as operable artifacts (§1)
+- `f3ec090` — Add distill passes: merge and graduate (§1)
+- `c1d2cad` — Teach the distill passes in the README and its generator (§1)
+- `6443d70` — Rework 0004 around wall-clock checkpoints; remove resolved 0003 (§2 + §1 issue file maintenance)
+- `7a7e242` — Add issue files for the two deferred follow-ups (§1, issues files)
+- `065168c` — Tolerate non-protocol output; add turn status and invalid-output sidecars (§7)
+- `49e07f3` — Scope `<nagent-write>` to a per-conversation scratch dir (§7)
+- `2edc7ee` — Name the provider/model in the LLM wait spinner (§5)
+- `5075f6e` — Keep claude-code billing on its own login; surface real errors (§5)
+- `6426a67` — Make --save-conversation instant with extracted summaries (§2)
+- `afc7ab8` — Regenerate the README: full arc with campaigns and the safety net (§1 + §2 docs)
+- `38d3d4f` — Add the conversation safety net: checkpoints and rebuild (§2)
+- `12c35b7` — Pin shell-output-before-next-input ordering (§7, regression test)
+- `6b762da` — Collapse exact-duplicate tags within a turn (§7)
+- `315fe9e` — Update test for revised delegation-guidance wording (§6)
+- `65787a6` — Delegation guidance: name context-isolation alongside decomposition (§6)
+- `d56f0f0` — Delegate decomposed parts, not single tasks (§6)
+- `a4fb141` — Add per-run and per-file-edit shell hooks (§3)
+- `bdfa2a6` — Add Together provider, per-model token-cap rebuilds, and --list-providers (§5)
+- `023e23a` — Ignore local .nagent/ runtime state (§4, project .gitignore)
+- `a1f0680` — Operating rules: sampling can justify replacing the machine, not just trimming it (§8)
+
+### Case-study repos
+
+- [`macton/pep-copt`](https://github.com/macton/pep-copt) at `main` (5 commits). The PEP image compression case study: 2.04× speedup aggregate on 24-image benchmark, byte-identical `.pep` output, decode net-neutral (§10).
+- [`macton/differentiable-collisions-optc`](https://github.com/macton/differentiable-collisions-optc) at `main` (5 commits). The Convex Primitive Collision Detection case study: 101.06× speedup on committed input, 97.75× and 98.43× on alternate seeds, tolerance-based match contract (§11).
+
+### Per-phase commit SHAs
+
+| Phase | Description | Commit SHA |
+|---|---|---|
+| Phase 1 | Setup + audit | `5a28c8f3` |
+| Phase 2 | Campaigns cluster (§1) | `c81ea782` |
+| Phase 3 | Conversation safety net cluster (§2) | `caf04ca5` |
+| Phase 4 | Hooks cluster (§3) | `9ab2d07c` |
+| Phase 5 | Project-local roots cluster (§4) | `ea8fa94e` |
+| Phase 6 | Provider expansion cluster (§5) | `dd8428a3` |
+| Phase 7 | Delegation rewrite cluster (§6) | `0dad59fd` |
+| Phase 8 | Robustness cluster (§7) | `ffa21d5c` |
+| Phase 9 | Operating rules cluster (§8) | `ad19be00` |
+| Phase 10 | Case-study methodology cluster (§9) | `54e62b10` |
+| Phase 11 | PEP case study cluster (§10) | `f53c82e6` |
+| Phase 12 | Collisions case study cluster (§11) | `db7d94de` |
+| Phase 13 | Refresh side artifacts | (this commit) |
+| Phase 14 | Format-commitment verification | (forthcoming) |
+
+### Sibling-review references
+
+- `conductor/tracks/fable_review_20260617/` — Fable's analysis of Mythos system prompt
+- `conductor/tracks/intent_dsl_survey_20260612/` — the 10 prior-art clusters for intent-based DSLs
+- `conductor/tracks/superpowers_review_20260619/` — the superpowers plugin review
+
+### Project documentation references
+
+- `conductor/workflow.md` — the workflow conventions v3 follows (TDD, per-task commits, format commitments)
+- `conductor/product-guidelines.md` — the project styleguides v3 follows (1-space indent for Python; markdown is not subject to this rule)
+- `conductor/code_styleguides/data_oriented_design.md` — the project's canonical DOD reference, itself derived from Acton's `context/data-oriented-design.md`
+- `conductor/code_styleguides/cache_friendly_context.md` — references nagent_review_v2_3 §3.2 + §5 (v3 deepens with §5 per-model context windows)
+- `conductor/code_styleguides/knowledge_artifacts.md` — references nagent_review_v2_3 §3.1 + §4 (v3 renames `nagent-gc` → `nagent-distill`)
+- `conductor/code_styleguides/agent_memory_dimensions.md` — references nagent_review_v2_3 §2.8 (v3 deepens with §1-§4 memory extension)
+- `docs/guide_meta_boundary.md` — the Application vs Meta-Tooling distinction (load-bearing context for v3)
@@ -0,0 +1,129 @@
+# nagent_review_v3 — Bridge to v2.3 + sibling reviews
+
+**Date:** 2026-06-19
+**Spec pair:** `spec_v3.md` + `plan_v3.md`
+**Companions:**
+- `nagent_takeaways_20260608.md` — the v2.3-era takeaways (10 actionable patterns; unchanged).
+- `nagent_review_v3_20260619.md` — the v3 canonical review (11 cluster sections).
+- `comparison_table.md` — the v3 cluster table.
+- `decisions.md` — the v3 candidate list (11 new + 16 v2.3 status mapping).
+
+**Sibling reviews:**
+- `fable_review_20260617` — Fable's analysis of Mythos system prompt
+- `intent_dsl_survey_20260612` — survey's 10 prior-art clusters for intent-based DSLs
+- `superpowers_review_20260619` — superpowers plugin review
+
+---
+
+## 1. TL;DR
+
+v3 takeaways add **three first-class subsystems** (Campaigns, Conversation safety net, Hooks), **one new provider** (Together), **one delegation bug fix** (recursion), **eight expanded pattern areas** (Operating rules Q9, Robustness 4 hardening commits, Provider expansion per-model context windows, etc.), and **two end-to-end case studies** (PEP 2.04× byte-identity-strict, Collisions 101.06× tolerance-based) that demonstrate the methodology in production. The case-study methodology itself (§9) is the new abstraction: 5-element pattern (prompts + harness + log + freeze + subject) with a parameterizable match contract. The Operating rules §8 gain the Q9 expansion ("consider a different machine when filing plateaus"). The Project-local roots §4 rename `nagent-gc` → `nagent-distill` (the operation refines, not collects). The v3 candidate pool is **21 entries** (11 new + 10 v2.3 STILL-OPEN).
+
+---
+
+## 2. Cross-reference table
+
+| v3 takeaway | v2.3 candidate | Relationship |
+|---|---|---|
+| Campaigns (§1) as operable artifacts | (new in v3) | independent |
+| Discussion-window safety net (§2) | (new in v3) | independent |
+| Per-turn ground-truth hook (§3) | Candidate 5 (Self-describing MCP tools) | extends: hooks are a more general "per-turn ground-truth injection" surface |
+| Project-local roots + 4-layer resolution (§4) | Candidate 14 (Project context files) | supersedes: the v2.3 pattern is a refinement of the v3 architectural refactor |
+| Per-model token-cap awareness (§5) | Candidate 3 (Stateless LLMClient) | extends: the windows table is a refinement of the stateless client |
+| Delegation rewrite: decompose-or-isolate (§6) | Candidate 1 (SubConversationRunner) | extends: the recursion bug + two-reason framing tighten the contract |
+| Robustness: 4 hardening commits (§7) | (new in v3) | independent |
+| Operating rules Q9: different machine (§8) | Candidate 16 (AGENTS.md @import + canonical DOD) | extends: Q9 is a v3 refinement of the canonical DOD |
+| Case-study methodology: 5-element pattern (§9) | (new in v3) | independent |
+| PEP case study: 2.04× byte-identity (§10) | (empirical evidence, not candidate) | independent |
+| Collisions case study: 101.06× tolerance-based (§11) | (empirical evidence, not candidate) | independent |
+
+---
+
+## 3. The new v3 candidates (not in v2.3)
+
+These are the v3-only candidates — see `decisions.md` for the full entry per candidate.
+
+### Candidate 17: Campaign-style plan-as-data for the conductor
+
+The conductor's `plan.md` is not operable today — the model's "what to do next" is re-made every turn. v3 §1 introduces campaigns as a four-piece composition (artifact + driver + invariants + context surfaces) with four load-bearing invariants: **one pass then exit; one writer for the tree; review gate not cap; schema is the whole schema**. Making the conductor's plan operable is the same data-oriented move. **HIGH priority.**
+
+### Candidate 18: Discussion-window safety net for Manual Slop
+
+v3 §2 introduces a four-piece composition (trigger + writer + rebuild + provenance) with the critical invariant: rebuild runs a synchronous checkpoint first, and the writer's failure widens the tail instead of blocking. The 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) is a model Manual Slop should follow. Long-running discussions currently grow unbounded; the rebuild trigger is a structural fix. **HIGH priority.**
+
+### Candidate 19: Per-turn ground-truth hook for Manual Slop
+
+v3 §3 introduces hooks as a three-piece composition (resolve + invoke + inject). The case-study harness scripts ARE the hooks: `prove-optimized-harness.sh` is the command wired into `--hook-per-run`. The model responds against measured state instead of its recollection. **MEDIUM priority.**
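
The three pieces can be sketched as two small functions. This is an illustration under stated assumptions, not nagent's implementation: `run_hook`, `inject`, the `[measured state]` label, the timeout, and the choice to skip injection when the hook fails are all hypothetical.

```python
import shlex
import subprocess


def run_hook(command, timeout_s=30.0):
    """Resolve and invoke: run the hook command, return its stdout or None on failure."""
    if not command:
        return None  # no hook configured
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # assumed policy: a broken hook does not stop the turn
    return proc.stdout if proc.returncode == 0 else None


def inject(prompt, hook_output):
    """Inject: put the measured state ahead of the prompt so the model reads it first."""
    if hook_output is None:
        return prompt
    return f"[measured state]\n{hook_output.strip()}\n\n{prompt}"
```

A caller would pass the command configured for the run (for example a harness script path) to `run_hook` once per turn and feed the result through `inject` before sending the prompt.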
+
+### Candidate 20: Rename `nagent-gc` → `nagent-distill` in our documentation cross-references
+
+v3 §4 renames `nagent-gc` to `nagent-distill` (no compatibility alias). The new name encodes what the operation actually does: knowledge becomes capability, gated by review. The merge/graduate passes are an explicit consequence of that framing. **LOW priority (docs only).**
+
+### Candidate 21: Per-model token-cap awareness for Manual Slop `ai_client`
+
+v3 §5 introduces the verified-windows table (10 models verified against the Together API). Unknown models return `None` and fall back to byte-only behavior — not a guessed default. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit." **MEDIUM priority.**
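
A minimal sketch of that lookup, assuming a hypothetical `token_cap` helper. The 0.85 fraction and the `None` result for unknown models come from v3 §5; the two table entries are placeholders, not rows from the verified table.

```python
SAFETY_FRACTION = 0.85  # from v3 §5

# Placeholder entries: model names and window sizes are illustrative only.
VERIFIED_WINDOWS = {
    "example/model-a": 131_072,
    "example/model-b": 32_768,
}


def token_cap(model):
    """Return the usable token budget, or None when the model has no verified window."""
    window = VERIFIED_WINDOWS.get(model)
    if window is None:
        return None  # caller falls back to byte-only limits; no guessed default
    return int(window * SAFETY_FRACTION)
```

Returning `None` rather than a default keeps the "unverified" case visible to the caller, which then has to choose the byte-only path explicitly.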
+
+### Candidate 22: Tier 3 worker contract "decompose or isolate, never offload"
+
+v3 §6 fixes a recursion bug (file-edit agent → worker → nagent-file-edit → file-edit agent → ... hangs the tree) by naming the two reasons delegation is worth its cost: **decomposition** (the task is genuinely complex, with parts) and **context isolation** (the step is noisy, the result is small). "Don't offload a single small action whose result is no smaller than doing it yourself." The 315fe9e test fix is also a useful precedent: any user-facing prompt change must run the agent's `test_*.py` suite, not just `py_compile`. **HIGH priority.**
+
+### Candidate 23: Per-conversation scratch directory for Manual Slop dispatch_inference
+
+v3 §7 introduces the per-conversation scratch dir as a hardening commit (`49e07f3`). Each instance gets its own directory keyed by conversation name; concurrent instances never collide in a shared `/tmp`. **MEDIUM priority.**
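
One way this could look on the Manual Slop side, as a sketch only: `scratch_dir`, the `manual_slop_scratch` root, and the name-plus-hash directory scheme are assumptions, not taken from commit `49e07f3`.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def scratch_dir(conversation, base=None):
    """Create (if needed) and return a scratch directory unique to one conversation."""
    root = Path(base) if base is not None else Path(tempfile.gettempdir()) / "manual_slop_scratch"
    # Readable prefix for humans; unsafe path characters become underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", conversation)
    # Short hash of the original name keeps distinct names distinct after sanitizing.
    digest = hashlib.sha256(conversation.encode("utf-8")).hexdigest()[:8]
    path = root / f"{safe}-{digest}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling it twice with the same conversation name returns the same directory, while two concurrent conversations get separate ones.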
+
+### Candidate 24: Document Q9 ("consider a different machine") in the project's `conductor/code_styleguides/data_oriented_design.md`
+
+v3 §8 surfaces the Q9 expansion (the only addition since v2.3). Q9 generalizes the simplification pass from "trim the current machine" to "consider a different machine when the data's shape points to it." **LOW priority (docs only).**
+
+### Candidate 25: Optimization-log discipline for Manual Slop agent work
+
+v3 §9 surfaces the case-study methodology's 5-element pattern; the `OPTIMIZATION-LOG.md` is the per-hypothesis history file. Both case studies document rejected experiments with measurements; the methodology's data discipline is load-bearing. **MEDIUM priority.**
+
+### Candidate 26: `OPTIMIZATION-LOG` schema for Manual Slop agent work
+
+The schema is portable; Manual Slop agents could adopt it for any multi-iteration optimization. Sub-pattern of Candidate 25. **LOW priority.**
+
+### Candidate 27: Tolerance-based comparator for Manual Slop agent work
+
+v3 §11 documents the collisions case study's tolerance-based match contract. The comparator pattern is reusable; Manual Slop's `RAGEngine._chunk_code` and other float-based work could adopt it. **MEDIUM priority.**
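
A minimal sketch of a tolerance-based match, in contrast to a byte-identity check. The function name and both tolerance values are placeholders, not the collisions case study's actual contract.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Return (ok, first_bad_index) for two equal-length sequences of floats.

    ok is False on a length mismatch (index None) or on the first pair that
    falls outside the tolerance. NaN never matches anything.
    """
    if len(reference) != len(candidate):
        return False, None
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return False, i
    return True, None
```

Reporting the first failing index, rather than a bare boolean, gives a rejected experiment a concrete measurement to record.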
+
+---
+
+## 4. The v2.3 candidates v3 supersedes
+
+Of the 16 v2.3 candidates, v3 supersedes **1** (Candidate 5, Self-describing MCP tools — subsumed by the v3 hooks pattern + `mcp_architecture_refactor_20260606`) and **promotes 1** (Candidate 11, Knowledge harvest — the v3 rename to `nagent-distill` + merge/graduate passes is the data-grounded refinement).
+
+The other 14 v2.3 candidates remain **STILL-OPEN** per `decisions.md` §"v2.3 → v3 candidate status mapping." v3 doesn't invalidate them; it adds new patterns that are orthogonal to most of them.
+
+---
+
+## 5. Sibling-review pointers
+
+### `fable_review_20260617` — Fable's analysis of Mythos system prompt
+
+The Fable review analyzes the Mythos system prompt's "watch-dogging" pattern (be careful, watch yourself, never claim something you can't verify). v3 §8 is the data-oriented response: Acton's operating rules ("sampling can justify replacing the machine") are the data-grounded alternative to persona-based caution. Fable's anti-pattern (mental-health watch-dogging, refusal framing) is the opposite of nagent's pattern (sample the data, replace the machine). The two reviews together surface the philosophical difference between persona-based safety and data-grounded safety. Touchpoints: v3 §8 (Operating rules) + the project styleguide's Q9 candidate (Candidate 24).
+
+### `intent_dsl_survey_20260612` — survey's 10 prior-art clusters
+
+The survey's Cluster 4 ("Meta-Tooling DSLs") is the closest prior art to v3 §9's case-study methodology (the 4 prompts ARE an intent-DSL for "drive nagent at an optimization problem"). The survey's Cluster 3 ("intent-mapping") is the philosophical anchor: mapping user intent to tool invocations is what DSLs do, and nagent's prompts are a primitive form of that mapping. Touchpoints: v3 §9 (Case-study methodology) + §10 + §11.
+
+### `superpowers_review_20260619` — superpowers plugin review
+
+The superpowers `brainstorming` skill asks structured questions to refine an idea before implementation; the case-study 4 prompts serve the same role. Both encode "the model should not skip the early work." Touchpoints: v3 §9 (Case-study methodology).
+
+---
+
+## What v3 takeaways ADD over v2.3 takeaways
+
+The v2.3 takeaways (`nagent_takeaways_20260608.md`) are 10 actionable patterns. v3 adds:
+
+1. **3 first-class subsystems** (Campaigns, Safety net, Hooks) — each is a coherent module with its own invariant set
+2. **1 new provider** (Together) with per-model context windows as a new precision layer
+3. **1 delegation bug fix** (recursion) with a documented test-fix precedent
+4. **8 expanded pattern areas** — Operating rules Q9, Robustness 4 hardening commits, Provider expansion, etc.
+5. **2 case studies** demonstrating the methodology in production (PEP, Collisions)
+6. **1 new abstraction** (case-study methodology, §9) — the 5-element pattern with parameterizable match contract
+7. **1 rename with semantic shift** (`nagent-gc` → `nagent-distill`)
+8. **11 new candidates** for Manual Slop follow-up tracks (3 HIGH, 5 MEDIUM, 3 LOW)
+
+The v2.3 takeaways are not invalidated; they are a foundation v3 builds on. Read both: v2.3 for the durable principles, v3 for the empirical demonstration.