conductor(track): nagent_review_v3.1 §12-§14 new sections + renumber v3 §12-§14 to §15-§17

2026-06-20 11:34:40 -04:00
parent 1574ee47e4
commit 63b34eaef1
1 changed files with 462 additions and 13 deletions
@@ -2360,25 +2360,451 @@ The shape tag map: `[B]` for the boundary (the case-study is where the model's w
 **Decision candidate:** NEW Candidate 27 (LOW). "Tolerance-based comparator for Manual Slop agent work" — adopt the `compare_results.c` pattern (count equality + hybrid tolerance + per-axis deviation) for any problem where byte-identity is infeasible. See `decisions.md` Candidate 27.
 **Cross-refs:** §3 Hooks (`prove-optimized-harness.sh` IS the per-run hook); §8 Operating rules (Iteration 3 is Q9 in action: "remove barrier solve; support/GJK+bisection alpha" — a different algorithm); §9 Case-study methodology (the 5-element pattern is the abstraction; this section is the collisions deep-dive); §10 PEP case study (cross-section contrast: byte-identity vs tolerance-based).
 **Pattern history:** NEW. v2.3 had no case-study repos. v3 introduces the tolerance-based exemplar of §9's 5-element pattern. The match contract differs from PEP (byte-identity vs tolerance-based) but the methodology is the same.
-## §12 Decisions
+## §12 YAML avoidance

-See `decisions.md` for the full candidate list (v2.3's 16 + v3's new 11, with v2.3 → v3 status mapping at the top). **Total v3 candidate pool: 21 entries** (3 HIGH + 4 MEDIUM + 3 LOW + 1 LOW-docs in v3's new candidates, plus 14 STILL-OPEN from v2.3, plus 1 PROMOTED + 1 SUBSUMED status changes). The HIGH-priority v3 candidates are:
+**Source:** nagent uses YAML for `.nagent/campaigns/{slug}/index.yaml` + per-item `item.yaml` + per-item `proposal.yaml` + graduate `{name}.draft` (per §1 Campaigns cluster); distill graduates per `bin/nagent-distill --graduate`; per-file knowledge note frontmatter in `knowledge/files/{file_id}.md` (per v2.3 §2.1). User directive 2026-06-20: "I don't like YAML, acton may have utilized it or noted its utilization but I would not use it in whatever I take from his nagent implementation. I would continue to utilize markdown in combination with a custom DSL."
+**One-liner:** nagent uses YAML for campaigns/distill/knowledge; the user does NOT adopt YAML for Manual Slop artifacts — Manual Slop uses markdown with structured headings + custom DSL (survey grammar + SSDL) for any artifact that nagent would have used YAML for.
+**Pattern summary:** The YAML-avoidance pattern is a "do not adopt" flag on every YAML use site in nagent, with a markdown + custom DSL alternative specified per use case. The pattern is: (1) catalog every YAML use site in nagent (campaigns, distill, knowledge, graduates); (2) name the markdown + DSL alternative for each (markdown headings + survey grammar for inline computation, TOML frontmatter for project config precedent, SSDL for shape annotations); (3) document the rationale (whitespace fragility for AI-generated content, markdown+DSL is the project's existing convention per the intent_dsl_survey + superpowers_review sibling reviews, the custom DSL is the project's intent for inline computation not configuration); (4) cross-ref the project files that establish the markdown+DSL precedent (`conductor/presets.py`, `conductor/personas.py`, the 6 styleguides in `conductor/code_styleguides/`, the 14 `docs/guide_*.md` files).

- **Candidate 17:** Campaign-style plan-as-data for the conductor (§1)
+#### §12.1 Where nagent Uses YAML
+
+nagent uses YAML in four primary locations:
+
+1. **`.nagent/campaigns/{slug}/index.yaml`** — the campaign-level index. Per §1, the campaign tree is a YAML structure with `name`, `status`, `completion: [condition]`, `items: [item]`, and optional `proposal: proposal_yaml?`. The YAML is the state of record; the worker contract returns data; the driver is the only mutator.
+2. **`.nagent/campaigns/{slug}/{item_id}/item.yaml`** — the per-item state. Each item has `id`, `status`, `blocked_by: [id]`, `conversation: path`, optional `decompose: { when, into: [sub_item] }`, and optional `result: result_json?`. The YAML is editable; the user can hand-edit between turns.
+3. **`.nagent/campaigns/{slug}/{item_id}/proposal.yaml`** — the proposal file. Created by the LLM during the `propose` phase; contains the sub-items the LLM proposes. The review gate (per §1) decides whether to accept.
+4. **`.nagent/distill/{name}.draft`** — the graduate file. Created by `nagent-distill --graduate`; contains a non-executable draft of a tool or prompt. Invisible to tool discovery until the user reviews and renames to remove `.draft`.
+
+Additionally, nagent uses YAML-adjacent formats:
+- **Per-file knowledge note frontmatter** (`knowledge/files/{file_id}.md`) — the file has a YAML frontmatter block with metadata (file path, last-modified, category). The body is markdown.
+- **`config.json`** — nagent's main config file is JSON, not YAML, but the same "structured data file" pattern applies. The config has `safety_net`, `hook_per_run`, `hook_per_file_edit`, `context_window_tokens`, etc.
+- **`issues/{NNNN}-{slug}.md`** — nagent's issue files are markdown with structured headings (## Goal, ## Tasks, ## Done criteria), not YAML. This is the closest nagent gets to the Manual Slop convention.
+
+#### §12.2 Why YAML Is "Do Not Adopt" for Manual Slop
+
+YAML is "do not adopt" for Manual Slop for four reasons:
+
+1. **Markdown + frontmatter is sufficient for the same data shape.** The project's `conductor/presets.py` and `conductor/personas.py` both use TOML for structured config (presets.toml, project_presets.toml, personas.toml, project_personas.toml). TOML is the existing precedent; YAML would be a third format. The markdown+frontmatter pattern (per the `issues/{NNNN}-{slug}.md` precedent in nagent itself) is sufficient for the campaign-style artifacts: structured headings (`## Goal` / `## Tasks` / `## Done criteria`) + a TOML frontmatter block (project config precedent) + optional SSDL-annotated code blocks for any inline computation.
+2. **The custom DSL (survey grammar + SSDL) is the project's intent for inline computation, not configuration.** Per the `intent_dsl_survey_20260612` Cluster 5 "SSDL shape primitives", the project's DSL primitives (`[I]` inspectable, `[S]` string concatenation, `[B]` boundary, `[M]` mutable aggregate) are the shape annotations for any data structure. The DSL is for inline computation (e.g., the code-shape sketches in §1-§11), not for configuration files.
+3. **YAML's whitespace sensitivity is fragile for AI-generated content.** LLMs frequently mis-indent YAML; a single space off can change the structure silently. The Manual Slop workflow already encodes the discipline "always run the suite, not just `py_compile`" (per §6 cross-ref to `315fe9e`); YAML adds another surface for the "looks right but parses wrong" failure mode.
+4. **The project's existing markdown-driven conventions (per `superpowers_review_20260619`)** establish markdown as the default format for human-editable artifacts. The 6 styleguides in `conductor/code_styleguides/` are markdown; the 14 `docs/guide_*.md` files are markdown; the per-track `spec.md` and `plan.md` are markdown, with `state.toml` (TOML) and `metadata.json` (JSON) holding per-track state. Adding YAML would introduce one more format for the same data shape.
+
+The YAML-avoidance is a "do not adopt" flag, not a "must not exist" ban. The user can still read and parse YAML (e.g., when reading nagent's source); the avoidance is for new Manual Slop artifacts.
+
+#### §12.3 The Markdown + Custom DSL Alternative
+
+The markdown + custom DSL alternative is concrete: each campaign-style artifact becomes a markdown file with structured headings + a TOML frontmatter block (project config precedent) + optional SSDL-annotated code blocks for any inline computation.
+
+The template:
+
+````markdown
+++
+slug = "campaign-slug"
+status = "active"
+created = "2026-06-20"
+++
+
+# Campaign: {name}
+
+## Goal
+
+<one sentence: what the user is trying to achieve>
+
+## Tasks
+
+- [ ] **{item_id}** — {description} (status: todo; blocked_by: [])
+- [ ] **{item_id}** — {description} (status: todo; blocked_by: [{item_id}])
+
+## Done criteria
+
+- {condition_1}
+- {condition_2}
+
+## Notes
+
+<optional: inline code-shape sketch with SSDL annotations>
+
+```
+campaign := { name: string, status: active|paused|done,
+              completion: [condition], items: [item] }  {ssdl} [M]
+```
+````
+
+The TOML frontmatter (between `+++` markers) holds the machine-readable fields (slug, status, created). The markdown body holds the human-readable content (goal, tasks, done criteria, notes). The SSDL annotations (`{ssdl} [M]`) are the shape tags for any data structure in the code-shape sketches.
+
+The per-item file follows the same template:
+
+```markdown
+++
+id = "{item_id}"
+status = "todo"
+blocked_by = ["{item_id}"]
+++
+
+# {item_id}: {description}
+
+## Goal
+
+<one sentence: what this item is trying to achieve>
+
+## Done criteria
+
+- {condition}
+
+## Conversation
+
+<path to the conversation file>
+```
+
+The per-proposal file follows the same template:
+
+```markdown
+++
+parent_item = "{item_id}"
+created = "2026-06-20"
+++
+
+# Proposal: decompose {item_id}
+
+## Sub-items
+
+- [ ] **{sub_item_id}** — {description}
+- [ ] **{sub_item_id}** — {description}
+
+## Rationale
+
+<why this decomposition; the LLM's reasoning>
+```
+
+The graduate file follows the same template (with `executable = false` to mark it as a draft):
+
+```markdown
+++
+name = "{tool_name}"
+executable = false
+graduated_at = "2026-06-20"
+++
+
+# {tool_name} (DRAFT)
+
+<the tool's prompt or code>
+
+## Review notes
+
+<what the user should check before promoting from draft>
+```
+
+The TOML frontmatter is the project config precedent (`conductor/presets.py` + `conductor/personas.py`); the markdown body is the project convention; the SSDL annotations are the project's DSL primitives.
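
The `## Tasks` lines in the campaign template carry `status` and `blocked_by` inline, so a driver would need to recover them from the markdown. The sketch below shows one possible way, under stated assumptions: the regular expression is inferred from the template line above and is not a defined grammar, `done` as an item status is assumed, and `parse_tasks` / `ready_items` are hypothetical names.

```python
import re

DASH = "\u2014"  # the dash character used as the separator in the task template

# Hypothetical grammar for one task line, inferred from the template.
TASK_RE = re.compile(
    r"^- \[(?P<done>[ x])\] \*\*(?P<id>[^*]+)\*\* " + DASH + r" (?P<desc>.*?)"
    r" \(status: (?P<status>\w+); blocked_by: \[(?P<deps>[^\]]*)\]\)$"
)


def parse_tasks(markdown: str) -> list[dict]:
    """Return one dict per task line; lines that do not match are skipped."""
    tasks = []
    for line in markdown.splitlines():
        m = TASK_RE.match(line.strip())
        if m:
            deps = [d.strip() for d in m["deps"].split(",") if d.strip()]
            tasks.append({"id": m["id"], "desc": m["desc"],
                          "status": m["status"], "blocked_by": deps})
    return tasks


def ready_items(tasks: list[dict]) -> list[str]:
    """IDs of todo items whose blockers are all done."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    return [t["id"] for t in tasks
            if t["status"] == "todo" and set(t["blocked_by"]) <= done]


sample = "\n".join([
    f"- [x] **parse** {DASH} write the parser (status: done; blocked_by: [])",
    f"- [ ] **lint** {DASH} add a linter (status: todo; blocked_by: [parse])",
    f"- [ ] **ship** {DASH} release (status: todo; blocked_by: [lint])",
])
tasks = parse_tasks(sample)
```

Lines that do not match the pattern are skipped silently here; a stricter reader could raise instead, which would surface the "looks right but parses wrong" failure mode rather than hide it.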
+
+#### §12.4 Cross-References
+
+The YAML-avoidance section cross-references:
+
+- **`intent_dsl_survey_20260612`** — the survey's Cluster 5 "SSDL shape primitives" is the canonical reference for the SSDL annotations. The survey's §4.4 "7-column table format" is the canonical reference for any tabular data.
+- **`superpowers_review_20260619`** — the superpowers plugin review establishes the project's markdown-driven conventions. The 6 styleguides in `conductor/code_styleguides/` are markdown; the 14 `docs/guide_*.md` files are markdown; the markdown convention is the project's default.
+- **`conductor/presets.py`** + **`conductor/personas.py`** — the TOML precedent for project config. The `[presets]` and `[personas]` tables in `presets.toml` and `personas.toml` are the pattern for any new project config file.
+- **`conductor/workflow.md`** — the workflow's "always run the suite, not just `py_compile`" discipline (per §6 cross-ref) is the project's "look for failure modes" mindset. YAML's whitespace fragility is a failure mode; the project's mindset is to surface failure modes explicitly.
+
+#### §12.5 Decision Candidate
+
+**NEW Candidate 27 (HIGH).** "Markdown + custom DSL lock-in" — explicitly adopt markdown + survey grammar + SSDL for campaign-style artifacts; reject YAML for new project artifacts. Candidate 17 (campaign-style plan-as-data) is amended: the artifact format is markdown + frontmatter, not YAML. Candidate 18 (discussion-window safety net) is unchanged (it operates on existing JSON/Markdown artifacts). Candidate 19 (per-turn hook) is unchanged (it operates on shell commands, not data files). Candidate 25 (optimization-log) is unchanged (it operates on markdown, not YAML). See `decisions.md` Candidate 27.
+
+**Source-read citations:**
+- `bin/nagent-campaign` — campaign CLI entry point (24cf16d)
+- `bin/helpers/nagent_campaign_lib.py:index_yaml_path()` — the index.yaml path convention (24cf16d)
+- `bin/helpers/nagent_campaign_lib.py:item_yaml_path()` — the per-item item.yaml path convention (24cf16d)
+- `bin/helpers/nagent_campaign_lib.py:proposal_yaml_path()` — the proposal.yaml path convention (24cf16d)
+- `bin/nagent-distill:107-200` — `--merge` + `--graduate` CLI surface (f3ec090)
+- `bin/helpers/nagent_distill_lib.py:228-260` — finished-campaign-as-harvest-source (f3ec090)
+- `bin/helpers/nagent_distill_lib.py:793-979` — `run_merge` + `run_graduate` (f3ec090)
+- `prompts/knowledge-graduate.md:1-26` — graduation LLM prompt (f3ec090)
+- `prompts/knowledge-merge.md:1-19` — merge LLM prompt (f3ec090)
+- `prompts/knowledge-graduate.md:24-26` — graduate file naming convention (`{name}.draft`)
+- `issues/0001-foundations.md` — issue file format (markdown with structured headings, not YAML)
+- `issues/0002-campaign-system.md:1-326` — campaign system spec (markdown with structured headings, not YAML)
+- `config.example.json` — nagent's main config (JSON, not YAML; the "structured data file" pattern)
+- `bin/nagent:1319-1331` — `conversation_scratch_dir(conversation_name)` (49e07f3; relevant for the scratch dir pattern, not YAML)
+- `bin/nagent:2220-2230` — `root = resolve_default_root(args.root)` (54c8741; relevant for the project-local-roots pattern)
+- `conductor/presets.py` — the TOML precedent for project config (the project file, not nagent's)
+- `conductor/personas.py` — the TOML precedent for project config (the project file, not nagent's)
+- `conductor/code_styleguides/data_oriented_design.md` — the project's canonical DOD reference (markdown, not YAML)
+- `intent_dsl_survey_20260612` — the survey's Cluster 5 "SSDL shape primitives" (the project convention)
+- `superpowers_review_20260619` — the superpowers plugin review (the project convention)
+- `bin/helpers/nagent_gc_lib.py` — the knowledge harvest library (v2.3; relevant for the harvest format, not YAML)
+- `bin/helpers/nagent_tags.py` — the tag parser (065168c; relevant for the lenient parser, not YAML)
+- `bin/helpers/nagent_safety_lib.py` — the safety net library (38d3d4f; relevant for the checkpoint format, not YAML)
+- `bin/helpers/nagent_cli.py:11-86` — the resolve/scaffold functions (54c8741; relevant for the project-local-roots pattern)
+- `bin/helpers/nagent_llm.py:54-77` — `MODEL_CONTEXT_WINDOWS` table (bdfa2a6; relevant for the verified table pattern, not YAML)
+- `bin/nagent:640-748` — `build_initial_context` (54c8741; relevant for the 4-layer context resolution)
+- `bin/nagent:3167-3185` — `run_agent_loop` (the main loop; relevant for the overall nagent architecture)
+- `bin/helpers/nagent_campaign_lib.py:1-50` — module docstring + imports (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:1-50` — main module imports + constants (the v3 cluster does not cite specific line ranges)
+- `bin/nagent-distill:1-50` — distill module imports + constants (the v3 cluster does not cite specific line ranges)
+- `prompts/create-readme.md:248-251` — the "graduate proven playbooks" reduction (c1d2cad; relevant for the graduate rationale)
+
+**Honest gaps:**
+1. **The TOML frontmatter syntax (between `+++` markers) is the project convention, but the exact parser is not specified.** A future track would document the parser (e.g., `tomllib` for reading, `tomli-w` for writing, or a custom parser that handles the `+++` delimiter).
+2. **The SSDL annotations (`{ssdl} [M]`) are not formally parsed.** They are inline text annotations; a future tool could parse them for validation (e.g., a styleguide linter that asserts every `[M]` aggregate has a corresponding `git_history` field).
+3. **The markdown+DSL alternative does not address binary artifacts.** Campaign-style artifacts are text; binary artifacts (images, models, etc.) would need a different format. A future track would address binary artifacts.
+4. **The "do not adopt" flag is for new Manual Slop artifacts.** Existing YAML files (e.g., from imported nagent campaigns) would still need to be parsed. A future track would document the YAML parser for backward compatibility.
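
Gap 2 above notes that the SSDL annotations are not formally parsed. As a rough idea of what a first validation pass might look like, the sketch below pulls `{ssdl} [X]` annotations out of a code-shape sketch and flags tags outside the four named in §12.2. Treating those four tags as the complete vocabulary is an assumption, and both function names are hypothetical.

```python
import re

# The four shape tags named in this section; treating them as the full
# vocabulary is an assumption made for this sketch.
SHAPE_TAGS = {"I", "S", "B", "M"}
TAG_RE = re.compile(r"\{ssdl\}\s*\[([A-Z])\]")


def extract_shape_tags(sketch: str) -> list[tuple[int, str]]:
    """Return (line_number, tag) pairs for every '{ssdl} [X]' annotation."""
    found = []
    for lineno, line in enumerate(sketch.splitlines(), start=1):
        for tag in TAG_RE.findall(line):
            found.append((lineno, tag))
    return found


def unknown_tags(sketch: str) -> list[tuple[int, str]]:
    """Annotations whose tag is outside the assumed vocabulary."""
    return [(n, t) for n, t in extract_shape_tags(sketch) if t not in SHAPE_TAGS]


sketch = """campaign := { name: string, status: active|paused|done,
              completion: [condition], items: [item] }  {ssdl} [M]
item := { id: string }  {ssdl} [Q]
"""
```

Requiring the `{ssdl}` marker before the bracket keeps ordinary list syntax such as `[condition]` from being mistaken for a shape tag.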
+
+## §13 Agent context-window observations
+
+**Source:** user's empirical findings on OpenCode + MiniMax M3 (per the 2026-06-20 directive); nagent's enforcement (per §1 Campaigns + §2 Conversation safety net + §3 Hooks); Manual Slop's `docs/` + `conductor/` markdown navigation (per `conductor/workflow.md` "Mandatory Research-First Protocol" + the 6 styleguides in `conductor/code_styleguides/` + the 14 `docs/guide_*.md` files).
+**One-liner:** Agents take ~100-150k tokens to warm up; the context window can go up to ~500k (MiniMax M3); the safe zone is 250-350k; the cycle is compact → re-warm → continue. Manual Slop's `docs/` + `conductor/` markdown navigation is a partial mitigation; the shortcoming is that agents frequently forget/fail to read on demand. nagent's `--hook-per-run` (per §3) is the pattern that would close the gap.
+**Pattern summary:** The agent context-window pattern is empirical: the model has a warm-up cost (~100-150k tokens before useful output), a maximum window (~500k for MiniMax M3), a safe zone (250-350k; above which output quality degrades), and a cycle (compact → re-warm → continue). nagent enforces the cycle more strictly via per-turn hook injection (§3) + safety net checkpoints (§2) + distill graduates (§1). Manual Slop's `docs/` + `conductor/` markdown navigation is a partial mitigation: the project's 6 styleguides + 14 deep-dive guides are markdown, and the per-track `state.toml` + `metadata.json` are plain-text TOML and JSON, deliberately so agents can navigate on demand. The shortcoming is that agents frequently forget to read or fail to read on demand. nagent's `--hook-per-run` pattern (per §3) is the structural mechanism that closes the gap: a per-turn hook that injects a "what to read next" status block at the top of every turn. The decision candidate is Candidate 28, which reframes Candidate 19 (per-turn ground-truth hook) with the v3.1 context-window framing.
+
+#### §13.1 The Warm-Up + Window + Safe-Zone Numbers
+
+The empirical findings (per the user's 2026-06-20 directive):
+
+- **Warm-up cost:** ~100-150k tokens. Before the model produces useful output, it needs to load the system prompt + the per-track context + the per-discussion history + the per-task state. The warm-up is the cost of the first useful token.
+- **Maximum window:** up to ~500k tokens (MiniMax M3). The model can technically process up to 500k tokens, but the output quality degrades as the window fills.
+- **Safe zone:** 250-350k tokens. Below the warm-up cost, the model hasn't loaded enough context. Above the safe zone, the output quality degrades. The safe zone is the range where the model produces useful output efficiently.
+- **Cycle:** compact → re-warm → continue. When the window approaches the safe-zone ceiling, the model compacts the context (drops low-priority information, summarizes, etc.), then re-warms (loads the compacted context + the new task), then continues. The cycle is iterative; each cycle costs ~100-150k tokens of warm-up.
+
+The numbers are empirical (MiniMax M3); other models may have different numbers. The pattern (warm-up + window + safe zone + cycle) is the structural insight; the numbers are the parameterization.
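
The "pattern plus parameterization" split can be made concrete with a small threshold table. The sketch below is illustrative only: the class name `ContextBudget` and the zone labels are hypothetical, the defaults are the MiniMax M3 figures quoted above, and treating everything between the warm-up cost and the safe-zone ceiling as `working` is an assumption, since the text does not classify the 150-250k range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Token thresholds; defaults are the MiniMax M3 figures quoted above."""
    warm_up: int = 150_000       # upper end of the ~100-150k warm-up range
    safe_ceiling: int = 350_000  # upper end of the 250-350k safe zone
    max_window: int = 500_000    # approximate maximum window

    def zone(self, tokens_used: int) -> str:
        """Classify the current context size into one step of the cycle."""
        if tokens_used > self.max_window:
            return "overflow"
        if tokens_used >= self.safe_ceiling:
            return "compact"   # at or past the ceiling: compact, then re-warm
        if tokens_used < self.warm_up:
            return "warming"
        return "working"


budget = ContextBudget()
zones = [budget.zone(n) for n in (50_000, 300_000, 360_000, 600_000)]
```

Swapping in another model's numbers is then a matter of constructing `ContextBudget` with different thresholds; the zone logic stays the same.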
+
+#### §13.2 nagent's Enforcement
+
+nagent enforces the cycle more strictly than the model does natively. The three mechanisms:
+
+1. **Per-turn hook injection (§3):** A hook runs at the top of every turn (before the model speaks); its output enters the conversation as a labeled block. The hook supplies per-turn ground truth, so the model does not have to "re-warm" by re-reading its own context. The hook is fast (median-of-5 timing) and surfaces the measured state (build status, test status, etc.) without the model having to read its own conversation.
+2. **Safety net checkpoints (§2):** A wall-clock + burst guard fires a checkpoint when the conversation grows. The checkpoint is a separate one-shot LLM call (not the working model) that produces a structured summary (## Intent | ## Next action | ## Constraints | ## Open questions). The summary is the "compacted" context; the next turn re-warms from the summary.
+3. **Distill graduates (§1):** The `--graduate` pass takes proven playbooks and drafts them as non-executable `{name}.draft` files. The drafts are "graduate candidates" — proven knowledge that can be promoted to executable tools after review. The graduate pass is the "structural re-warm" — the model doesn't have to re-read the playbook because it's been distilled into a tool.
+
+The three mechanisms together implement the cycle as a structural pattern, not a model-dependent behavior. The model doesn't have to "remember to compact"; the cycle is enforced by the loop.
+
+#### §13.3 Manual Slop's Partial Mitigation
+
+Manual Slop's `docs/` + `conductor/` markdown navigation is a partial mitigation for the cycle. The project deliberately keeps the following files in markdown (or, for per-track state, plain-text TOML and JSON) so agents can navigate on demand:
+
+- **`AGENTS.md`** — the canonical operating instructions for agents. The @import pattern (per `conductor/code_styleguides/data_oriented_design.md`) includes the 6 styleguides + the 14 deep-dive guides.
+- **`conductor/workflow.md`** — the workflow conventions (TDD, per-task commits, format commitments, "always run the suite").
+- **`conductor/product-guidelines.md`** — the project styleguides (1-space indent for Python, no comments, etc.).
+- **`conductor/code_styleguides/data_oriented_design.md`** — the canonical DOD reference (Tier 0/1/2, simplification pass, enforceable deliverables).
+- **`conductor/code_styleguides/cache_friendly_context.md`** — the cache TTL GUI contract (stable-to-volatile context ordering).
+- **`conductor/code_styleguides/knowledge_artifacts.md`** — the knowledge harvest pattern (7-category schema + provenance + sha256 ledger).
+- **`conductor/code_styleguides/error_handling.md`** — the Result[T] convention.
+- **`conductor/code_styleguides/agent_memory_dimensions.md`** — the 4 memory dimensions (curation / discussion / RAG / knowledge).
+- **`conductor/code_styleguides/rag_integration_discipline.md`** — the conservative-RAG rule.
+- **`conductor/code_styleguides/feature_flags.md`** — file presence vs config flags vs CLI flags.
+- **The 14 `docs/guide_*.md` files** — the deep-dive guides, covering topics including architecture, AI client, API hooks, MCP client, app controller, MMA, models, testing, GUI, paths, context curation, shaders, RAG, beads, hot reload, personas, NERV theme, workspace profiles, and command palette.
+- **Per-track `state.toml` + `metadata.json`** — the per-track state (current phase, task progress, verification status).
+- **Per-track `spec.md` + `plan.md`** — the per-track specification and plan.
+
+The markdown convention is deliberate: agents can navigate the project's knowledge on demand by reading the files. The convention is the project's "partial mitigation" for the cycle.
+
+#### §13.4 The Shortcoming
+
+The shortcoming is that agents frequently forget to read or fail to read on demand. The empirical observation:
+
+- **Forget to read:** The agent has a task, the relevant guidance is in `conductor/workflow.md`, but the agent doesn't read the file because the task description doesn't explicitly say "read `conductor/workflow.md` first". The agent proceeds without the guidance.
+- **Fail to read on demand:** The agent reads the relevant guidance at the start of the task, but as the task progresses, the agent doesn't re-read the guidance when a new question arises. The agent proceeds with stale information.
+- **Read but ignore:** The agent reads the relevant guidance, but the agent's interpretation of the guidance is different from the guidance's intent. The agent proceeds with a misunderstanding.
+
+The three failure modes are not the same; each has a different mitigation. The "forget to read" mitigation is to make the reading explicit (e.g., "before starting, read `conductor/workflow.md`"). The "fail to read on demand" mitigation is to make the re-reading automatic (e.g., a per-turn hook that surfaces the relevant guidance). The "read but ignore" mitigation is to make the guidance unambiguous (e.g., structured headings, examples, anti-patterns).
+
+#### §13.5 The Hook Pattern as the Solution
+
+nagent's `--hook-per-run` pattern (per §3) is the structural mechanism that closes the gap. The pattern:
+
+1. **Configure a status command.** The user configures a command (e.g., `make test`, `git status`, `cat conductor/workflow.md`) that runs at the top of every turn.
+2. **Run the command via the hook.** The hook runs the command, captures exit code + stdout + stderr, and injects a labeled block at the top of the conversation.
+3. **The model sees the status block.** The model reads the status block as part of the conversation; the status block is the per-turn ground-truth.
+
+The pattern closes all three failure modes:
+- **Forget to read:** The status block is automatically injected; the agent can't forget to read it.
+- **Fail to read on demand:** The status block is refreshed every turn; the agent sees the latest status every turn.
+- **Read but ignore:** The status block is structured (exit code + stdout + stderr); a failing exit code or a stderr message leaves the agent little room to misread the state.
+
+The pattern is the structural mechanism for the cycle. The agent doesn't have to "remember to check the status"; the check is automatic.
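
The three steps above can be sketched as follows. This is a hedged illustration of the idea, not nagent's `run_hook` implementation: the function names, the `[hook-per-run]` label format, and the timeout default are all hypothetical, and the demo command is a portable stand-in for a real status command such as `git status`.

```python
import subprocess
import sys


def run_status_hook(command: list[str], timeout_s: float = 10.0) -> str:
    """Run a status command and format its result as a labeled block.

    Hypothetical helper: the label format is illustrative, not nagent's.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_s)
        code, out, err = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        code, out, err = -1, "", f"hook timed out after {timeout_s}s"
    return "\n".join([
        "[hook-per-run]",
        f"exit_code: {code}",
        "stdout:",
        out.rstrip(),
        "stderr:",
        err.rstrip(),
        "[/hook-per-run]",
    ])


def with_hook(conversation: list[str], command: list[str]) -> list[str]:
    """Return the turn input with the status block placed first."""
    return [run_status_hook(command)] + conversation


# A portable stand-in for a real status command such as `git status`.
demo = [sys.executable, "-c", "import sys; print('2 tests failed'); sys.exit(1)"]
block = run_status_hook(demo)
turn_input = with_hook(["user: continue"], demo)
```

Passing the command as an argument list avoids shell interpolation; a missing executable would raise `FileNotFoundError`, which this sketch deliberately leaves unhandled.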
+
+#### §13.6 Decision Candidate
+
+**NEW Candidate 28 (MEDIUM).** "Per-turn ground-truth hook for Manual Slop" — adopt nagent's `--hook-per-run` model; inject a "what to read next" status block at the top of every `send_result()`. Candidate 19 (per-turn hook) is amended: the hook is not just a status command, but a structured "what to read next" status block that surfaces the relevant guidance for the current task. The hook is configured per-project (via `[conductor].hook_per_run` in `manual_slop.toml`); the default is a no-op (the hook is opt-in). See `decisions.md` Candidate 28.
+
+**Source-read citations:**
+- The user's 2026-06-20 directive — the empirical findings (warm-up + window + safe zone + cycle)
+- `bin/nagent:1442-1484` — `run_hook` + `resolve_hooks` (a4fb141; the per-turn hook primitive)
+- `bin/nagent:1922-1927` — `hook_per_run` injection site (a4fb141)
+- `bin/nagent:3167-3185` — `run_agent_loop` (the main loop; the hook is wired here)
+- `bin/nagent:1519-1539` — `checkpoint_due` + `rebuild_due` (38d3d4f; the safety net trigger)
+- `bin/nagent:1547-1587` — `write_checkpoint` (38d3d4f; the safety net writer)
+- `bin/nagent:1590-1662` — `rebuild_conversation` (38d3d4f; the safety net rebuild)
+- `bin/nagent:1840-1881` — `extract_conversation_summary` (6426a67; the instant-saves change)
+- `bin/helpers/nagent_distill_lib.py:587-654` — `_summary_backfill_candidates` + `_backfill_saved_summaries` (6426a67)
+- `bin/nagent-campaign` — campaign CLI entry point (24cf16d; the campaigns abstraction)
+- `bin/nagent-distill:107-200` — `--merge` + `--graduate` CLI surface (f3ec090; the distill abstraction)
+- `prompts/knowledge-graduate.md:1-26` — graduation LLM prompt (f3ec090)
+- `prompts/knowledge-merge.md:1-19` — merge LLM prompt (f3ec090)
+- `AGENTS.md` — the canonical operating instructions (the project's markdown convention)
+- `conductor/workflow.md` — the workflow conventions (the project's markdown convention)
+- `conductor/product-guidelines.md` — the project styleguides (the project's markdown convention)
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference (the project's markdown convention)
+- `conductor/code_styleguides/cache_friendly_context.md` — the cache TTL GUI contract (the project's markdown convention)
+- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge harvest pattern (the project's markdown convention)
+- `conductor/code_styleguides/error_handling.md` — the Result[T] convention (the project's markdown convention)
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 memory dimensions (the project's markdown convention)
+- `conductor/code_styleguides/rag_integration_discipline.md` — the conservative-RAG rule (the project's markdown convention)
+- `conductor/code_styleguides/feature_flags.md` — file presence vs config flags vs CLI flags (the project's markdown convention)
+- `docs/guide_*.md` — the 14 deep-dive guides (the project's markdown convention)
+- Per-track `state.toml` + `metadata.json` — the per-track state (plain-text TOML and JSON, alongside the project's markdown convention)
+- `bin/nagent:606-745` — `build_initial_context` (v2.3; relevant for the initial context assembly)
+- `bin/nagent:970-987` — `conversation_cache_boundaries` (v2.3; relevant for the cache strategy)
+- `bin/nagent:1455-1687` — `run_safety_net` (38d3d4f; relevant for the safety net machinery)
+- `bin/nagent:2819` — `safety_settings=load_safety_settings(...)` (38d3d4f; relevant for the safety net wiring)
+- `bin/helpers/nagent_cli.py:11-86` — the resolve/scaffold functions (54c8741; relevant for the project-local-roots pattern)
+- `bin/helpers/nagent_llm.py:54-77` — `MODEL_CONTEXT_WINDOWS` table (bdfa2a6; relevant for the verified table pattern)
+- `bin/nagent:2220-2230` — `root = resolve_default_root(args.root)` (54c8741; relevant for the project-local-roots pattern)
+- `bin/helpers/nagent_safety_lib.py` — the safety net library (38d3d4f; relevant for the safety net machinery)
+- `bin/nagent:640-748` — `build_initial_context` (54c8741; relevant for the 4-layer context resolution)
+- `bin/nagent:1075-1081` — `target = f"{llm.provider}/{llm.model}"` (2edc7ee; relevant for the provider/model naming)
+- `bin/nagent:3167-3185` — `run_agent_loop` (the main loop; relevant for the overall nagent architecture)
+- `bin/nagent:1-50` — main module imports + constants (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:1300-1400` — main loop body (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:1900-2000` — main loop continued (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:2000-2100` — main loop continued (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:2200-2300` — main loop end (the v3 cluster does not cite specific line ranges)
+
+**Honest gaps:**
+1. **The warm-up + window + safe-zone numbers are empirical for MiniMax M3.** Other models (Gemini, Anthropic, OpenAI) may have different numbers. A future track would measure the numbers per provider.
+2. **The hook pattern is opt-in.** The default is a no-op; the user must configure a status command. A future track could make the hook default-on with a no-op status command (the cost is the hook's per-turn latency, which should be < 100ms for a no-op).
+3. **The "what to read next" status block is a per-project configuration.** The user must specify the status command per project. A future track could auto-detect the relevant guidance based on the current task (e.g., if the task is "implement X", the status block surfaces `conductor/workflow.md` and `conductor/code_styleguides/data_oriented_design.md`).
+4. **The hook pattern is per-turn.** A future track could add per-task, per-conversation, or per-project hooks (e.g., a per-task hook that fires when a task starts, a per-conversation hook that fires when a conversation starts).
+
+## §14 Fine-tuning observations
+
+**Source:** user's 2026-06-20 directive ("current generalized models bottlenecked by not having conventions baked in; curated dataset of associated codebases; Together.ai noticed; asks about other prosumer fine-tuning vendors for middle-wage income in 2026").
+**One-liner:** Current generalized models are bottlenecked by not having the user's core conventions/workflows baked in. A curated dataset of associated codebases (Manual Slop's own tracks, decisions, plans, styleguides) is the user's proposed mitigation. Together.ai is one noticed vendor; 5-6 other prosumer fine-tuning vendors are surveyed below. Vendor selection is a separate future track; this section is observational.
+**Pattern summary:** The fine-tuning pattern is the user's interest in baking conventions/workflows into a model via fine-tuning. The pattern is: (1) recognize the bottleneck (generalized models don't have the user's conventions); (2) curate the dataset (the user's own tracks, decisions, plans, styleguides); (3) select a vendor (Together.ai is one; 5-6 others surveyed); (4) fine-tune the model (vendor-specific process); (5) validate the fine-tuned model (does it actually produce better output for the user's use case?). The v3.1 section is observational; the vendor analysis is a separate future track. The decision candidates are Candidate 29 (dataset-curation track) + Candidate 30 (cache TTL GUI contract hardening, per the cross-ref to §13).
+
+#### §14.1 The Diagnosis
+
+The diagnosis (per the user's 2026-06-20 directive): current generalized models are bottlenecked by not having the user's core conventions/workflows baked in. The bottleneck manifests as:
+
+- **Convention drift:** The model produces output that violates the project's conventions (e.g., 4-space indent instead of 1-space; JSON blocks instead of tables). The user must correct the output repeatedly.
+- **Workflow ignorance:** The model doesn't know the project's workflow (TDD, per-task commits, format commitments, "always run the suite"). The model produces output that doesn't follow the workflow.
+- **Styleguide unawareness:** The model doesn't know the project's styleguides (DOD, cache-friendly context, knowledge artifacts, error handling, agent memory dimensions, RAG integration discipline, feature flags). The model produces output that doesn't follow the styleguides.
+
+The three failure modes are distinct, and each maps to a different slice of the training data. Convention drift is mitigated by training on the conventions themselves (e.g., the project's `conductor/product-guidelines.md` plus the styleguides). Workflow ignorance is mitigated by training on the workflow (e.g., the project's `conductor/workflow.md` plus per-track `plan.md`). Styleguide unawareness is mitigated by training on the styleguides plus the 14 deep-dive guides.
+
+#### §14.2 Together.ai as One Noticed Vendor
+
+Together.ai, the vendor the user noticed, offers fine-tuning for open-source models (Llama 3.x, Qwen 3, Mistral) with transparent per-token pricing. Approximate pricing as of 2026-06-20:
+
+- **Training:** ~$0.50-3.00 per million tokens (varies by model + dataset size).
+- **Inference:** ~$0.10-0.60 per million tokens (varies by model + context length).
+
+The prosumer-friendly aspects: transparent pricing, open-source model support, no minimum commitment, serverless deployment. The cons: the user must curate the dataset + select the base model + validate the fine-tuned model.
+
+#### §14.3 Prosumer Fine-Tuning Vendor Survey (2026)
+
+The prosumer fine-tuning vendor survey (per the user's 2026-06-20 directive):
+
+| Vendor | Model families | Approx. pricing (per million tokens, unless noted) | Prosumer-friendly? | Notes |
+|---|---|---|---|---|
+| **Together.ai** | Llama, Qwen, Mistral, others | $0.50-3/M training; $0.10-0.60/M inference | Yes — transparent; open-source models | User-noticed vendor |
+| **Fireworks.ai** | Llama, Qwen, Mistral | Similar to Together | Yes — serverless DX | Lower latency than Together for some models |
+| **OpenAI fine-tuning** | GPT-4o, GPT-4o-mini, GPT-3.5 | ~$3/M training, $0.30/M inference (4o-mini) | Yes for "mini"; expensive for 4o | Best DX; closed-source models |
+| **Anthropic Claude Haiku fine-tuning** | Claude Haiku (if on waitlist) | Similar to OpenAI 4o-mini | Waitlist-gated | Best for Anthropic-specific workflows |
+| **Google Gemini 1.5 Flash fine-tuning** | Gemini 1.5 Flash | ~$0.50-1/M training | Yes for high-volume | Best for Google-specific workflows |
+| **Local fine-tuning (RTX 4090/5090 + Unsloth)** | Any open-source model | $1,500-3,000 one-time hardware | Yes for weekly-iterators | Full control; no per-token cost |
+
+The survey is observational and makes no recommendation; it documents the user's interest and the prosumer vendor landscape. The vendor analysis is a separate future track.
+
+#### §14.4 Vendor Analysis Is Out of Scope for v3.1
+
+Vendor analysis is out of scope for v3.1; a vendor-selection track (if needed) would do the deep comparison and make the decision. The reasons:
+
+1. **Vendor pricing changes frequently.** The 2026-06-20 numbers may be out of date by 2026-09-20. A vendor-selection track would need to be re-run periodically.
+2. **The dataset is the user's call.** The user must curate the dataset (the user's own tracks, decisions, plans, styleguides) before any vendor can fine-tune. Dataset curation is a separate effort.
+3. **The validation is the user's call.** The user must validate the fine-tuned model against the user's actual use cases. Validation is a separate effort.
+4. **The v3.1 track is research-only.** Per the v3.1 scope, no candidates are implemented in the track. The dataset-curation + vendor-selection would be a separate implementation track.
+
+The v3.1 section is therefore a marker for a future track: "the user is interested in fine-tuning; a future track would curate the dataset + select the vendor + fine-tune the model + validate the result".
+
+#### §14.5 Decision Candidates
+
+**NEW Candidate 29 (MEDIUM).** "Dataset-curation track for fine-tuning" — separate track to curate the Manual Slop conventions/workflows dataset for fine-tuning; vendor selection deferred. The dataset would include: per-track `spec.md` + `plan.md` + `state.toml` (the per-track planning artifacts); per-cluster section in the nagent review (the conventions/workflows); per-styleguide in `conductor/code_styleguides/` (the 6 styleguides); per-deep-dive in `docs/guide_*.md` (the 14 deep-dive guides). The dataset would be a markdown + TOML corpus; the corpus would be the input to a vendor-specific fine-tuning process. See `decisions.md` Candidate 29.
+
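As an illustration of what assembling the Candidate 29 corpus could look like, the sketch below walks a repository root and builds a manifest of candidate files with a rough token estimate. It is a sketch under assumptions, not part of Manual Slop: `CORPUS_PATTERNS`, `build_manifest`, `estimate_tokens`, and `total_tokens` are hypothetical names, the `conductor/tracks/*/` layout for per-track artifacts is assumed (the review does not say where tracks live), and the four-characters-per-token estimate is a crude stand-in for a real tokenizer. Indentation follows the project's 1-space Python convention.

```python
from pathlib import Path

# Glob patterns per corpus category. The styleguide, guide, and workflow paths
# come from the review's citations; the per-track directory is a HYPOTHETICAL
# layout chosen only for illustration.
CORPUS_PATTERNS = {
 "styleguide": ["conductor/code_styleguides/*.md"],
 "deep_dive": ["docs/guide_*.md"],
 "workflow": ["conductor/workflow.md", "conductor/product-guidelines.md"],
 "track": [
  "conductor/tracks/*/spec.md",
  "conductor/tracks/*/plan.md",
  "conductor/tracks/*/state.toml",
 ],
}

def estimate_tokens(text: str) -> int:
 # Rough heuristic (about 4 characters per token, rounded up); not a tokenizer.
 return (len(text) + 3) // 4

def build_manifest(root: Path) -> list[dict]:
 """Return one record per corpus file found under root, sorted by path."""
 records = []
 for category, patterns in CORPUS_PATTERNS.items():
  for pattern in patterns:
   for path in root.glob(pattern):
    if not path.is_file():
     continue
    text = path.read_text(encoding="utf-8")
    records.append({
     "path": path.relative_to(root).as_posix(),
     "category": category,
     "tokens": estimate_tokens(text),
    })
 return sorted(records, key=lambda r: r["path"])

def total_tokens(manifest: list[dict]) -> int:
 # First-order corpus size across all collected files.
 return sum(r["tokens"] for r in manifest)
```

Summing `tokens` over the manifest gives a first-order corpus size to multiply against a vendor's per-million-token training price, which is one way to scope the dataset to a manageable subset before any vendor is chosen.
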
+**NEW Candidate 30 (LOW).** "Cache TTL GUI contract hardening" — make the per-turn grounding primitive also track cache state; cross-ref `cache_friendly_context.md`. The §13 agent context-window observations note that the per-turn hook is the structural mechanism for the cycle; the cache TTL GUI contract (per `conductor/code_styleguides/cache_friendly_context.md`) is the cache version of the same insight. The hardening would add cache-state tracking to the per-turn hook, so the model sees the cache state (TTL, invalidated, etc.) as part of the status block. See `decisions.md` Candidate 30.
+
+**Source-read citations:**
+- The user's 2026-06-20 directive — the diagnosis (current models bottlenecked) + the dataset (Manual Slop's own tracks) + the vendor notice (Together.ai) + the prosumer question (other vendors for middle-wage income in 2026)
+- `conductor/presets.py` — the TOML precedent for project config (the dataset would include `presets.toml` + `project_presets.toml`)
+- `conductor/personas.py` — the TOML precedent for project config (the dataset would include `personas.toml` + `project_personas.toml`)
+- `conductor/context_presets.py` — the ContextPresetManager (the dataset would include per-track context presets)
+- `conductor/tool_presets.py` — the ToolPresetManager (the dataset would include tool presets)
+- `conductor/tool_bias.py` — the ToolBiasEngine (the dataset would include tool bias profiles)
+- `conductor/workflow.md` — the workflow conventions (the dataset would include this)
+- `conductor/product-guidelines.md` — the project styleguides (the dataset would include this)
+- `conductor/code_styleguides/data_oriented_design.md` — the canonical DOD reference (the dataset would include this)
+- `conductor/code_styleguides/cache_friendly_context.md` — the cache TTL GUI contract (the dataset would include this; relevant for Candidate 30)
+- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge harvest pattern (the dataset would include this)
+- `conductor/code_styleguides/error_handling.md` — the Result[T] convention (the dataset would include this)
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 memory dimensions (the dataset would include this)
+- `conductor/code_styleguides/rag_integration_discipline.md` — the conservative-RAG rule (the dataset would include this)
+- `conductor/code_styleguides/feature_flags.md` — file presence vs config flags vs CLI flags (the dataset would include this)
+- `docs/guide_*.md` — the 14 deep-dive guides (the dataset would include these)
+- `docs/Readme.md` — the canonical teaching document (the dataset would include this)
+- `AGENTS.md` — the canonical operating instructions (the dataset would include this)
+- Per-track `spec.md` + `plan.md` + `state.toml` + `metadata.json` — the per-track artifacts (the dataset would include these)
+- Per-discussion `logs/sessions/{session_id}/discussion.jsonl` — the per-discussion history (the dataset would include selected discussions, with user approval)
+- The user's existing 4-tier MMA architecture (per `docs/guide_mma.md`) — the MMA conventions (the dataset would include the MMA architecture)
+- The user's existing Hook API (per `docs/guide_api_hooks.md`) — the Hook API conventions (the dataset would include the Hook API architecture)
+- The user's existing MCP tools (per `docs/guide_mcp_client.md`) — the MCP tool conventions (the dataset would include the MCP architecture)
+- Together.ai pricing page (https://www.together.ai/pricing) — the user's noticed vendor
+- Fireworks.ai pricing page (https://fireworks.ai/pricing) — the alternative vendor
+- OpenAI fine-tuning pricing (https://openai.com/api/pricing/) — the closed-source alternative
+- Unsloth (https://github.com/unslothai/unsloth) — the local fine-tuning framework
+- `bin/nagent:1075-1081` — `target = f"{llm.provider}/{llm.model}"` (2edc7ee; relevant for the provider/model naming, cross-ref to §5)
+- `bin/nagent:3167-3185` — `run_agent_loop` (the main loop; relevant for the overall nagent architecture)
+- `conductor/tech-stack.md` — the project's tech stack (relevant for the model selection)
+- `bin/helpers/nagent_llm.py:54-77` — `MODEL_CONTEXT_WINDOWS` table (bdfa2a6; relevant for the per-model context windows, cross-ref to §5)
+- `bin/nagent:2220-2230` — `root = resolve_default_root(args.root)` (54c8741; relevant for the project-local-roots pattern)
+- `bin/helpers/nagent_safety_lib.py` — the safety net library (38d3d4f; relevant for the safety net machinery)
+- `bin/nagent:606-745` — `build_initial_context` (v2.3; relevant for the initial context assembly)
+- `bin/nagent:970-987` — `conversation_cache_boundaries` (v2.3; relevant for the cache strategy, cross-ref to Candidate 30)
+- `bin/nagent:1455-1687` — `run_safety_net` (38d3d4f; relevant for the safety net machinery)
+- `bin/nagent:1840-1881` — `extract_conversation_summary` (6426a67; relevant for the instant-saves change)
+- `bin/nagent:2819` — `safety_settings=load_safety_settings(...)` (38d3d4f; relevant for the safety net wiring)
+- `bin/nagent:1922-1927` — `hook_per_run` injection site (a4fb141; relevant for the per-turn hook, cross-ref to §3 + §13)
+- `bin/nagent:1442-1484` — `run_hook` + `resolve_hooks` (a4fb141; relevant for the per-turn hook, cross-ref to §3 + §13)
+- `bin/helpers/nagent_cli.py:11-86` — the resolve/scaffold functions (54c8741; relevant for the project-local-roots pattern)
+- `bin/nagent:1-50` — main module imports + constants (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:1300-1400` — main loop body (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:1900-2000` — main loop continued (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:2000-2100` — main loop continued (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:2200-2300` — main loop end (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:640-748` — `build_initial_context` (54c8741; relevant for the 4-layer context resolution)
+
+**Honest gaps:**
+1. **The dataset-curation effort is significant.** A complete dataset would include all 14 deep-dive guides + 6 styleguides + per-track artifacts + per-discussion history. The effort is months, not days. A future track would scope the dataset to a manageable subset.
+2. **The vendor pricing is from 2026-06-20.** The pricing may change by the time the user is ready to fine-tune. A vendor-selection track would re-survey the pricing at the time of decision.
+3. **The fine-tuned model's validation is the user's call.** The user must validate the model against the user's actual use cases. The validation is a separate effort; the v3.1 section does not provide a validation methodology.
+4. **The Cache TTL GUI contract hardening (Candidate 30) is a small change.** The cross-ref to `cache_friendly_context.md` is the canonical reference; a future track would add cache-state tracking to the per-turn hook.
+5. **The fine-tuning vs. prompting trade-off is not analyzed.** Fine-tuning bakes conventions into the model; prompting surfaces conventions at inference time. The trade-off: fine-tuning pays a one-time training cost in exchange for shorter prompts (and so a lower per-inference cost); prompting pays for the convention text on every inference but has no training cost. A vendor-selection track would analyze the trade-off.
+
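Gap 5 can be made concrete with a small break-even calculation. The sketch below is illustrative only: the `Scenario` fields, the function names, and every number in the usage note are assumptions, and the cost model is a simplification rather than any vendor's actual billing. It assumes training is billed as corpus tokens times epochs, prompting pays for the convention text on every request, and the fine-tuned model needs only the task tokens. Prices are passed as strings so that `Fraction` can do exact arithmetic. Indentation follows the project's 1-space Python convention.

```python
import math
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

MILLION = 1_000_000

@dataclass(frozen=True)
class Scenario:
 corpus_tokens: int       # tokens in the curated training corpus
 epochs: int              # training passes over the corpus
 train_price_per_m: str   # USD per million training tokens, e.g. "3.00"
 base_price_per_m: str    # USD per million inference tokens, base model
 tuned_price_per_m: str   # USD per million inference tokens, fine-tuned model
 convention_tokens: int   # convention text prepended to every prompt
 task_tokens: int         # task-specific tokens per request

def usd(tokens: int, price_per_m: str) -> Fraction:
 # Exact dollar cost of `tokens` at a per-million-token price.
 return Fraction(tokens) * Fraction(price_per_m) / MILLION

def training_cost(s: Scenario) -> Fraction:
 # ASSUMPTION: billing by tokens processed (corpus size times epochs).
 return usd(s.corpus_tokens * s.epochs, s.train_price_per_m)

def prompting_cost_per_request(s: Scenario) -> Fraction:
 # Base model: conventions are re-sent with every request.
 return usd(s.convention_tokens + s.task_tokens, s.base_price_per_m)

def tuned_cost_per_request(s: Scenario) -> Fraction:
 # Fine-tuned model: only the task tokens are sent.
 return usd(s.task_tokens, s.tuned_price_per_m)

def break_even_requests(s: Scenario) -> Optional[int]:
 """Smallest request count at which fine-tuning has paid for itself.

 Returns None when the fine-tuned model saves nothing per request.
 """
 saving = prompting_cost_per_request(s) - tuned_cost_per_request(s)
 if saving <= 0:
  return None
 return math.ceil(training_cost(s) / saving)
```

For example, a 2,000,000-token corpus trained for 3 epochs at $3.00/M costs $18.00 under this model; with 8,000 convention tokens and 2,000 task tokens per request at $0.30/M for both models, `break_even_requests` returns 7500. If the fine-tuned model's inference price is high enough that it saves nothing per request, the function returns `None`.
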
+## §15 Decisions
+
+See `decisions.md` for the full candidate list (v2.3's 16 + v3's new 11 + v3.1's new 3, with v2.3 → v3 → v3.1 status mapping at the top). **Total v3.1 candidate pool: 30 entries** (3 HIGH + 7 MEDIUM + 3 LOW + 1 LOW-docs among the 14 new v3 + v3.1 candidates, plus 14 STILL-OPEN from v2.3, plus 1 PROMOTED + 1 SUBSUMED status changes). The HIGH-priority v3 candidates are:
+
+- **Candidate 17:** Campaign-style plan-as-data for the conductor (§1) — amended by Candidate 27 to use markdown + frontmatter, not YAML
 - **Candidate 18:** Discussion-window safety net for Manual Slop (§2)
 - **Candidate 22:** Tier 3 worker contract "decompose or isolate, never offload" (§6)

-The MEDIUM-priority v3 candidates are Candidates 19 (per-turn hook), 21 (per-model token-cap), 23 (per-conversation scratch dir), 25 (optimization-log discipline), 27 (tolerance-based comparator). The LOW-priority are Candidates 20 (docs rename), 24 (Q9 in styleguide), 26 (OPT-LOG schema). Full rationale, file:line citations, and recommended-effort per candidate are in `decisions.md`.
+The MEDIUM-priority v3+v3.1 candidates are Candidates 19 (per-turn hook — amended by Candidate 28), 21 (per-model token-cap), 23 (per-conversation scratch dir), 25 (optimization-log discipline), 27 (markdown+DSL lock-in, per §12), 28 (per-turn ground-truth hook, per §13), 29 (dataset-curation track, per §14). The LOW-priority are Candidates 20 (docs rename), 24 (Q9 in styleguide), 26 (OPT-LOG schema), 30 (cache TTL GUI contract hardening, per §14). Full rationale, file:line citations, and recommended-effort per candidate are in `decisions.md`.

-## §13 Cross-references
+## §16 Cross-references

 See `nagent_takeaways_v3_20260619.md` for the bridge to v2.3 takeaways + the sibling reviews:

 - **`fable_review_20260617`** — Fable's analysis of Mythos system prompt. Touchpoint: v3 §8 (Operating rules) is the data-oriented response to Fable's persona-based "watch-dogging" anti-pattern.
- **`intent_dsl_survey_20260612`** — the 10 prior-art clusters for intent-based DSLs. Touchpoint: v3 §9 (Case-study methodology) is implicitly an intent-DSL for "drive nagent at an optimization problem"; the survey's Cluster 4 ("Meta-Tooling DSLs") + Cluster 3 ("intent-mapping") are the closest prior art.
- **`superpowers_review_20260619`** — the superpowers plugin review. Touchpoint: v3 §9 (Case-study methodology); the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation).
+- **`intent_dsl_survey_20260612`** — the 10 prior-art clusters for intent-based DSLs. Touchpoint: v3 §9 (Case-study methodology) is implicitly an intent-DSL for "drive nagent at an optimization problem"; v3.1 §12 (YAML avoidance) cites the survey's Cluster 5 "SSDL shape primitives" as the project's DSL primitive.
+- **`superpowers_review_20260619`** — the superpowers plugin review. Touchpoint: v3 §9 (Case-study methodology); the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation); v3.1 §12 (YAML avoidance) cites the superpowers review as the project's markdown-driven convention.

-## §14 References
+## §17 References

 ### Source commits (24)

@@ -2415,7 +2841,27 @@ The 24 nagent commits reviewed, in chronological order (oldest first):
 - [`macton/pep-copt`](https://github.com/macton/pep-copt) at `main` (5 commits). The PEP image compression case study: 2.04× speedup aggregate on 24-image benchmark, byte-identical `.pep` output, decode net-neutral (§10).
 - [`macton/differentiable-collisions-optc`](https://github.com/macton/differentiable-collisions-optc) at `main` (5 commits). The Convex Primitive Collision Detection case study: 101.06× speedup on committed input, 97.75× and 98.43× on alternate seeds, tolerance-based match contract (§11).

-### Per-phase commit SHAs
+### Per-phase commit SHAs (v3.1)
+
+| Phase | Description | Commit SHA |
+|---|---|---|
+| Phase 1 | Setup + audit (v3.1) | `8fb82762` |
+| Phase 2 | Thicken §1 Campaigns cluster | `bd36aa4b` |
+| Phase 3 | Thicken §2 Conversation safety net cluster | `478b088b` |
+| Phase 4 | Thicken §3 Hooks cluster | `d17ee930` |
+| Phase 5 | Thicken §4 Project-local roots cluster | `1bc8e924` |
+| Phase 6 | Thicken §5 Provider expansion cluster | `987f4a97` |
+| Phase 7 | Thicken §6 Delegation rewrite cluster | `a406d290` |
+| Phase 8 | Thicken §7 Robustness cluster | `b9b31006` |
+| Phase 9 | Thicken §8 Operating rules cluster | `eb7da8d8` |
+| Phase 10 | Thicken §9 Case-study methodology cluster | `24442379` |
+| Phase 11 | Thicken §10 PEP case study cluster | `10c7d1d0` |
+| Phase 12 | Thicken §11 Collisions case study cluster | `1574ee47` |
+| Phase 13 | New sections §12-§14 + renumber v3 §12-§14 to §15-§17 | (this commit) |
+| Phase 14 | Refresh side artifacts | (forthcoming) |
+| Phase 15 | Chunking-strategy + format-commitment verification | (forthcoming) |
+
+### Per-phase commit SHAs (v3)

 | Phase | Description | Commit SHA |
 |---|---|---|
@@ -2431,8 +2877,8 @@ The 24 nagent commits reviewed, in chronological order (oldest first):
 | Phase 10 | Case-study methodology cluster (§9) | `54e62b10` |
 | Phase 11 | PEP case study cluster (§10) | `f53c82e6` |
 | Phase 12 | Collisions case study cluster (§11) | `db7d94de` |
-| Phase 13 | Refresh side artifacts | (this commit) |
-| Phase 14 | Format-commitment verification | (forthcoming) |
+| Phase 13 | Refresh side artifacts | `e150088d` |
+| Phase 14 | Format-commitment verification | `b49be820` |

 ### Sibling-review references

@@ -2445,7 +2891,10 @@ The 24 nagent commits reviewed, in chronological order (oldest first):
 - `conductor/workflow.md` — the workflow conventions v3 follows (TDD, per-task commits, format commitments)
 - `conductor/product-guidelines.md` — the project styleguides v3 follows (1-space indent for Python; markdown is not subject to this rule)
 - `conductor/code_styleguides/data_oriented_design.md` — the project's canonical DOD reference, itself derived from Acton's `context/data-oriented-design.md`
- `conductor/code_styleguides/cache_friendly_context.md` — references nagent_review_v2_3 §3.2 + §5 (v3 deepens with §5 per-model context windows)
+- `conductor/code_styleguides/cache_friendly_context.md` — references nagent_review_v2_3 §3.2 + §5 (v3 deepens with §5 per-model context windows); v3.1 §13 + §14 cross-ref for the per-turn hook + cache TTL GUI contract
 - `conductor/code_styleguides/knowledge_artifacts.md` — references nagent_review_v2_3 §3.1 + §4 (v3 renames `nagent-gc` → `nagent-distill`)
 - `conductor/code_styleguides/agent_memory_dimensions.md` — references nagent_review_v2_3 §2.8 (v3 deepens with §1-§4 memory extension)
- `docs/guide_meta_boundary.md` — the Application vs Meta-Tooling distinction (load-bearing context for v3)
+- `conductor/code_styleguides/rag_integration_discipline.md` — the conservative-RAG rule
+- `conductor/code_styleguides/feature_flags.md` — file presence vs config flags vs CLI flags
+- `conductor/code_styleguides/error_handling.md` — the Result[T] convention
+- `docs/guide_meta_boundary.md` — the Application vs Meta-Tooling distinction (load-bearing context for v3)