From e150088d24a1679cffeba005912c4c841f57e9cf Mon Sep 17 00:00:00 2001 From: Ed_ Date: Sat, 20 Jun 2026 08:46:05 -0400 Subject: [PATCH] conductor(track): nagent_review_v3 Phase 13 refresh side artifacts --- .../comparison_table.md | 105 +++--- .../nagent_review_20260608/decisions.md | 302 +++++++----------- .../nagent_review_v3_20260619.md | 89 +++++- .../nagent_takeaways_v3_20260619.md | 129 ++++++++ 4 files changed, 370 insertions(+), 255 deletions(-) create mode 100644 conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_20260619.md diff --git a/conductor/tracks/nagent_review_20260608/comparison_table.md b/conductor/tracks/nagent_review_20260608/comparison_table.md index ddab78e4..63280f67 100644 --- a/conductor/tracks/nagent_review_20260608/comparison_table.md +++ b/conductor/tracks/nagent_review_20260608/comparison_table.md @@ -1,79 +1,72 @@ -# nagent vs Manual Slop: Comparison Table +# nagent_review_v3 — Comparison Table -**Companion to:** `report.md` -**Date:** 2026-06-08 (revised same day) -**Source:** nagent v1.0.0 (read 2026-06-08) +**Date:** 2026-06-19 +**Spec pair:** `spec_v3.md` + `plan_v3.md` +**Companion:** `nagent_review_v3_20260619.md` (the v3 canonical review); `decisions.md` (v3 candidate list); `nagent_takeaways_v3_20260619.md` (bridge to v2.3 takeaways + sibling reviews). +**Source:** nagent v3 (`a1f0680` on `macton/nagent@main`, 2026-06-18) + the two case-study repos at `main` (`macton/pep-copt`, `macton/differentiable-collisions-optc`). -Flat side-by-side reference. One row per nagent principle. Verdicts and pitfalls are in `report.md`. +Flat side-by-side reference. One row per v3 cluster + one row per v2.3 pattern that v3 updates. Verdicts and pitfalls are in `nagent_review_v3_20260619.md`. --- ## Legend -- **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), DOMAIN MISMATCH (different scope). 
+- **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), ARCH-DIFF (different architecture, both correct in their domain), SUBSUMED (consumed by a follow-up track). - **Domain tags:** APP = Application domain, MT = Meta-Tooling domain, BOTH. +- **Cluster status:** NEW (didn't exist at v2.3), UPDATE (extends v2.3 pattern). --- -| # | nagent Principle (verbatim summary) | nagent Mechanism | Manual Slop Equivalent | Verdict | Domain | Action | +## v3 new clusters + +| # | Cluster | nagent source | Manual Slop equivalent | Verdict | Status | Domain | |---|---|---|---|---|---|---| -| 1 | Durable work, disposable workers. The agent is not the thing; the data is the thing. | `bin/nagent` 700-line single-file loop, conversation is a text file | MMA workers are real subprocesses with Context Amnesia; **Application AI is long-lived by design** | **PARTIAL** | BOTH | Future-track: stateless `LLMClient` class (§15.4) | -| 2 | Text in, text out. File in, text out is the smallest useful primitive. | `bin/nagent-llm-text` + `bin/helpers/nagent_llm.py` (4 providers) | `src/ai_client.py:send(...) -> str` (5 providers) | **PARITY** | BOTH | None | -| 3 | Conversations are editable state. The conversation file is not chat history; it is working state. | `bin/nagent` exposes `--save/load/edit/summarize`; text files are user-editable (vim/cat/diff/cp the raw transcript) | Discussion Takes + branching + per-entry edit (A1-A7 in report §3) + discussion-level CRUD (B1-B11) + role management (B5) + UI snapshot undo/redo (C1-C5) | **PARITY (DIFFERENT FOCUS)** — Manual Slop edits abstracted typed entries (`disc_entries` is a `list[dict]` with role + content + ts + thinking_segments + usage). Both have comprehensive editing; Manual Slop's is more granular at the entry layer, nagent's is deeper at the raw-transcript layer. 
| APP | Future-track: optional raw-transcript persistence per Take (Candidate 10) | -| 4 | Visible output protocol. Teach the model an output format; use a visible, parseable protocol. | `TAG_PATTERNS` regex list; `parse_response` strict; `MAX_FORMAT_RETRIES = 3` | Provider-native function calling (Gemini, Anthropic, etc.) | **ARCHITECTURAL DIFFERENCE** — Application's choice is correct (parallel tool calls, JSON mode) | BOTH | Future-track: intent-based DSL for Meta-Tooling calls | -| 5 | The loop. Append, call, parse, act, append, repeat. | `bin/nagent:run_agent_loop()` 50 lines, single `while True` | Three parallel loops: `ai_client._send_*` (LLM), `ConductorEngine.run` (MMA), `WorkflowSimulator.run_discussion_turn_async` (App) | **PARITY** | BOTH | (Low priority) Future-track: extract a single `src/llm_loop.py:run_loop` | -| 6 | Per-file memory. Each file gets its own persistent local memory. | `file_id_for_path` (st_dev:st_ino); `conversations/file-index-{pid}.json`; `nagent-file-edit` per-file subprocess | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Structural File Editor | **PARITY (DIFFERENT KIND)** — Manual Slop's is *curation memory* (rich); nagent's is *conversation log memory* (plain text). Both real, both per-file, different optimization. | APP | Future-track: thin "last-investigation" log per file (Meta-Tooling-friendly) | -| 7 | Repository history as data. Turn git history into editing context. | `git_file_history` + `summarize_new_file_commits` + `coedited_file_rows` + `format_file_history` | `_reread_file_items` (mtime-based, diff injection); git-linked discussion tracking in GUI; **no historical-context injection** | **PARTIAL** — diff injection is similar; historical-context injection is missing | APP | Future-track: `src/git_history.py` mirroring nagent's `file_edit_history_and_summary_block` | -| 8 | Historical coupling & artifact neighborhoods. Files that change together are hints. 
| `coedited_file_rows` labels high/medium/low co-edit rate; guidance text "Use these files as hints. Do not edit unless the user request or evidence requires it." | None (closest: `py_get_hierarchy` is structural not historical) | **GAP** | APP | Future-track: `py_coedited_files` + `ts_c_coedited_files` MCP tools | -| 9 | Disposable sub-conversations. Exploration creates noise; spawn disposable workers. | `` tag spawns `nagent --invocation delegated` as subprocess; isolated conversation file; recursive token rollup | MMA Tier 3/4 workers (real subprocesses); **1:1 main discussion has no sub-conversation mechanism** | **PARITY for MMA; GAP for 1:1 discussions** | APP (and MT) | **USER-FLAGGED WANT**: Future-track `src/sub_conversation.py:SubConversationRunner` for 1:1 investigations | -| 10 | Controlled writes. A loop that writes files needs explicit boundaries. Not a sandbox; just conventions. | `validate_write_path`: main mode → tmpdir only; file-edit mode → target or segments; rejected writes append `` | `mcp_client._is_allowed` (3-layer: allowlist + path validation + resolution gate); `run_powershell` requires GUI modal approval; PowerShell-only by default; 60s timeout + `taskkill` cleanup; optional Tier 4 QA | **PARITY+ (Manual Slop stronger)** — 3-layer security + HITL + sandbox is dramatically stricter than nagent's tmpdir check | APP (and MT) | None — current design is right | -| 11 | Large files as explicit artifacts. Split, edit segments, patch. 
| `nagent-file-split` (11 langs, regex + line counts + brace/JSON/XML depth); `nagent-file-patch` (strict hash validation); `nagent-file-summarize` (per-segment + retry); 32 KB default; index.json with `source_path`, `sourcesha256`, `segments[]` | `aggregate.py:build_file_items` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter); `set_file_slice` / `edit_file` (mtime validation, not hash); `run_subagent_summarization` (in-process, no retry); `RAGEngine._chunk_code` (mtime-based, ChromaDB) | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation + hash validation; Manual Slop uses tree-sitter + in-process + mtime validation | BOTH | Future-track: explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, with hash validation | -| 12 | Tool discovery. Tool capability should be explicit data. | `collect_bin_tool_descriptions` runs each `bin/* --description`; auto-builds "Available tools:" block for initial context | None (45 tools in `mcp_client.py:dispatch` if/elif chain) | **GAP** — nagent's pattern is genuinely better; current dispatch is fine but not extensible | BOTH (especially MT) | Future-track: subsumed by `mcp_architecture_refactor_20260606` (sub-MCPs as self-describing modules) | -| 13 | Differences from frameworks. The reframing table: memory→editable artifact, agent→temporary transformation function, context→explicit input data. | The philosophical frame | The applicable reframings: editable UI state, curated per-file memory, git history as data | **N/A** | BOTH | (Lens, not action) | -| 14 | Build your own. 12-step buildable list. 
| The reference | Manual Slop has all 12, in different files, at different scale | **PARITY** | BOTH | (Checklist) | +| 1 | Campaigns | `24cf16d`, `199a36b`, `f3ec090`, `c1d2cad`, `6443d70`, `7a7e242` | `conductor/tracks/` is project-scoped, but `plan.md` is not operable | PARTIAL | NEW | BOTH | +| 2 | Conversation safety net | `38d3d4f`, `6426a67` | No checkpoint/rebuild; no extracted-summary index | GAP | NEW | APP | +| 3 | Hooks | `a4fb141` + both case-study harnesses | Tier 4 QA error interception is analogous; no per-run hook | PARTIAL | NEW | BOTH | +| 4 | Project-local roots | `54c8741`, `557dd39`, `0b9d1a2`, `023e23a` | `conductor/tracks/` is already project-scoped; `[conductor].dir` per-project override | PARITY | NEW | BOTH | +| 5 | Provider expansion | `bdfa2a6`, `5075f6e`, `2edc7ee` | Manual Slop has 8 providers (per `tech-stack.md`); per-model context windows are new in nagent v3 | PARITY (DIFFERENT COUNT) | UPDATE | APP | +| 6 | Delegation rewrite | `d56f0f0`, `65787a6`, `315fe9e` | MMA WorkerPool delegation is disciplined; outside it, the recursion bug is real | PARTIAL | UPDATE | APP | +| 7 | Robustness | `065168c`, `6b762da`, `12c35b7`, `49e07f3` | Manual Slop uses `Result[T]` discipline + audit scripts (per `conductor/code_styleguides/error_handling.md`) | ARCH-DIFF | UPDATE | BOTH | +| 8 | Operating rules | `a1f0680` | `conductor/code_styleguides/data_oriented_design.md` is derived from nagent's operating-rules file | PARITY (DERIVED) | UPDATE | BOTH | +| 9 | Case-study methodology | both case-study repos (cross-cutting) | No equivalent yet | GAP | NEW | BOTH | +| 10 | PEP case study | `macton/pep-copt` | n/a (empirical evidence for nagent, not Manual Slop) | n/a | NEW | n/a | +| 11 | Collisions case study | `macton/differentiable-collisions-optc` | n/a | n/a | NEW | n/a | --- -## The 6 Pitfalls (revised, after user-corrections) +## v2.3 patterns updated by v3 -See `report.md §15` for full details. Quick reference: - -| # | Pitfall | Domain | Future-track | User flag?
| -|---|---|---|---|---| -| 1 | No structured output protocol in Application AI (opaque function calling) | BOTH | Intent-based DSL for Meta-Tooling | Implicit ("intent based DSL to help with discovery") | -| 2 | Provider-specific history in process globals (`_anthropic_history`, `_deepseek_history`, etc.) | APP | Stateless `LLMClient` class | No | -| 3 | RAG is not "history as data" (fuzzy, not auditable) | APP | RAG pre-staging sub-conversation | **Yes** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run") | -| 4 | AI client is a stateful singleton with module-level globals (2,685-line file) | APP | Stateless `LLMClient` class (same as #2) | No | -| 5 | No non-MMA disposable sub-conversations | APP (and MT) | `src/sub_conversation.py:SubConversationRunner` | **Yes** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points") | -| 6 | Hard-coded tool discovery (45-tool if/elif chain) | BOTH | Subsumed by `mcp_architecture_refactor_20260606` | Implicit ("intent based DSL to help with discovery") | - -### Pitfalls removed by user-corrections - -- **(removed)** "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); the lack of editable raw transcripts is a *different* design choice, not a gap. See `report.md §3`. -- **(removed)** "No per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension (FileItem + ContextPreset + Fuzzy Anchors); what's missing is nagent's conversation-log dimension, which is a *different* optimization. See `report.md §6`. 
+| # | v2.3 pattern | v3 update | +|---|---|---| +| 1 | Durable work, disposable workers | UPDATES: campaigns (§1) extend it with explicit plan artifacts | +| 3 | Conversations are editable state | UPDATES: project-local roots (§4) make conversation state project-scoped; hooks (§3) add per-turn observability | +| 4 | Visible output protocol | (no update in v3) | +| 5 | The loop | UPDATES: safety net (§2) adds failure recovery; robustness (§7) hardens 4 failure modes; hooks (§3) add per-turn ground truth | +| 6 | Per-file memory | (no update in v3) | +| 7 | Repository history as data | UPDATES: project-local roots (§4) make `.nagent/` commit-able | +| 8 | Historical coupling & neighborhoods | (no update in v3) | +| 9 | Disposable sub-conversations | UPDATES: delegation rewrite (§6) fixes the recursion bug and names the two reasons to delegate | +| 11 | Large files as explicit artifacts | (no update in v3) | +| 12 | Tool discovery | (no update in v3) | +| 13 | Differences from frameworks | (no update in v3) | +| 14 | Build your own | (no update in v3) | --- -## Future-track candidates — priority list +## Sibling-review cross-refs -Ordered by user signal + implementation cost: +| Sibling | Section | Relationship | +|---|---|---| +| `fable_review_20260617` | Fable's analysis of the Mythos system prompt | Comparator: "what a competitor's agent directives look like" vs. nagent's canonical operating rules; Fable's watch-dogging is the anti-pattern of nagent's data-grounded operating rules (§8) | +| `intent_dsl_survey_20260612` | Survey's Cluster 4 (meta-tooling DSLs) + Cluster 3 (intent-mapping) | Parallel: the 4-prompt case-study methodology (§9) is implicitly an intent-DSL for "drive nagent at an optimization problem" | +| `superpowers_review_20260619` | superpowers `brainstorming` skill | Process parallel: structured questions to refine an idea before implementation, the same role as the 4 case-study prompts | -1. **`src/sub_conversation.py:SubConversationRunner`** — user-flagged as a want.
Extract MMA's `mma_exec.py` pattern into a reusable App-callable class. Useful for 1:1 investigations. **High priority.** (Pitfall #5) +--- -2. **RAG pre-staging via sub-conversation** — user-flagged as a want. A sub-agent pre-builds the RAG index for a planned run; the chunks become the discussion's starting memory. **High priority.** (Pitfall #3) +## Honest notes -3. **Stateless `LLMClient` class** — would unify Pitfall #2 and #4. Backwards-compatible with `ai_client.send()`. ~2-3 phases of careful refactor. **Medium priority.** - -4. **Intent-based DSL for Meta-Tooling tool calls** — user-noted as a want ("no where near that ideation yet"). **Low priority, research spike.** - -5. **Self-describing MCP tools (nagent §12 pattern)** — subsumed by `mcp_architecture_refactor_20260606`. **Low priority on its own.** - -6. **`src/git_history.py` for nagent §7 pattern** — historical context injection. **Medium priority, but only after #1-#2 are done.** - -7. **Per-file conversation log (nagent §6 conversation dimension)** — Meta-Tooling-friendly addition. **Low priority.** - -8. **`py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)** — small, contained. **Low priority.** - -9. **Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)** — only needed if very-large-file scenarios emerge. **Defer until needed.** - -10. **Optional raw-transcript persistence per Take (nagent §3 conversation dimension)** — niche. **Low priority.** +- The v3 verdict for "Provider expansion" is PARITY (DIFFERENT COUNT) — Manual Slop has 8 providers per `tech-stack.md` (the qwen_llama_grok track adds 3 more); nagent v3 has 6 providers. The provider count is independent of the abstraction (per-model context windows, billing isolation, ground-truth harness), and the abstraction is what the verdict compares.
+- The "Conversation safety net" GAP is the highest-value v3 candidate — the 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) + the sync-checkpoint invariant are concrete patterns Manual Slop can adopt. +- The "Case-study methodology" GAP is the methodology-level insight; the per-case-study sections (§10, §11) are the empirical evidence. +- v3 candidates are in `decisions.md`; the bridge doc is `nagent_takeaways_v3_20260619.md`. \ No newline at end of file diff --git a/conductor/tracks/nagent_review_20260608/decisions.md b/conductor/tracks/nagent_review_20260608/decisions.md index 679b5313..bf1d5770 100644 --- a/conductor/tracks/nagent_review_20260608/decisions.md +++ b/conductor/tracks/nagent_review_20260608/decisions.md @@ -1,286 +1,204 @@ -# Future-Track Candidates: nagent Review Follow-ups +# nagent_review_v3 — Decisions -**Companion to:** `report.md` (deep-dive), `comparison_table.md` (flat reference), `nagent_takeaways_20260608.md` (actionable patterns) -**Date:** 2026-06-08 -**Source:** nagent v1.0.0 deep-dive review (see `report.md`) +**Date:** 2026-06-19 +**Spec pair:** `spec_v3.md` + `plan_v3.md` +**Companion:** `nagent_review_v3_20260619.md` (the v3 canonical review); `comparison_table.md` (v3 cluster table); `nagent_takeaways_v3_20260619.md` (bridge to v2.3 takeaways + sibling reviews). +**Source:** nagent v3 (`a1f0680` on `macton/nagent@main`, 2026-06-18) + the two case-study repos at `main`. -This document is the bridge from "what nagent teaches us" to "what Manual Slop should do about it." Each candidate is a *future* conductor track (not this one). The candidates are *not* committed — they emerge from the analysis but each is a separate scoping exercise. 
- -**For an actionable, code-grounded read of these candidates** (with the "what to do today, not just the future track" framing), see `nagent_takeaways_20260608.md` — it maps each candidate to specific patterns, design constraints, and small UX wins that don't need a new track. +--- -## Decision-making framework +## v2.3 → v3 candidate status mapping -For each candidate: - -- **Why it matters** — what pitfall or capability gap does it address? -- **What it would do** — concrete description -- **Where it would live** — Application or Meta-Tooling -- **Dependency on existing tracks** — is anything already on the board? -- **Effort estimate** — small / medium / large -- **User signal** — has the user expressed want/don't-want/neutral? -- **Recommended priority** — high / medium / low - -The candidates are listed in priority order, which factors user signal heaviest (the user is the product owner for the Application; the analysis is just a reference). +| v2.3 # | Title | v3 status | Rationale | +|---|---|---|---| +| 1 | `SubConversationRunner` for 1:1 discussions | **STILL-OPEN** | The delegation rewrite (§6) fixes the recursion bug and names the two reasons to delegate, but the 1:1 sub-conversation primitive is still missing in Manual Slop. v3 makes the safety contract clearer (decompose or isolate; never offload). | +| 2 | RAG pre-staging via sub-conversation | **STILL-OPEN** | Depends on #1. v3 doesn't change the priority. | +| 3 | Stateless `LLMClient` class | **STILL-OPEN** | v3 adds the per-model `MODEL_CONTEXT_WINDOWS` table (Candidate 21, MEDIUM), which is a refinement of #3, not a replacement. | +| 4 | Intent-based DSL for Meta-Tooling | **STILL-OPEN (DEFERRED)** | The user explicitly deferred this in v2.3. The v3 case-study methodology (§9) is a related but different pattern.
| +| 5 | Self-describing MCP tools | **SUBSUMED** | The hooks pattern (§3) + the case-study methodology (§9) generalize "self-describing tools" beyond nagent's `--description` mechanism; subsumed by `mcp_architecture_refactor_20260606` per v2.3. | +| 6 | `src/git_history.py` (nagent §7) | **STILL-OPEN** | v3 doesn't change. Project-local roots (§4) make `.nagent/` commit-able; the git-history-injection primitive is orthogonal. | +| 7 | Per-file conversation log (nagent §6) | **STILL-OPEN** | v3 doesn't change. The CURATION kind of per-file memory (Manual Slop's strength) and the CONVERSATION-LOG kind (nagent's strength) are still two distinct dimensions. | +| 8 | `py_/ts_c_coedited_files` MCP tools | **STILL-OPEN** | v3 doesn't change. | +| 9 | Explicit `src/split_lib.py` + `src/patch_lib.py` | **STILL-OPEN** | v3 doesn't change. | +| 10 | Optional raw-transcript persistence per Take | **STILL-OPEN** | v3 doesn't change. | +| 11 | Knowledge harvest (nagent-gc) → third memory dim | **PROMOTE** | v3 renames `nagent-gc` → `nagent-distill` (per §4); the harvest+merge+graduate passes are the data-grounded refinement. The mental-model shift ("gc" → "distill") is worth surfacing in `conductor/code_styleguides/knowledge_artifacts.md` (Candidate 20). | +| 12 | Cache TTL GUI controls (sub-candidate 12b) | **STILL-OPEN** | v3 doesn't change. Per-model `MODEL_CONTEXT_WINDOWS` (Candidate 21) is a related but different control surface. | +| 13 | Conversation compaction (--compact) | **STILL-OPEN** | v3 doesn't change. | +| 14 | Project context files (context.yaml) | **STILL-OPEN** | v3's project-local roots (§4) are an architectural refactor of this pattern. The 4-layer context resolution is the v3 refinement. | +| 15 | Save-with-graceful-summary-failure | **STILL-OPEN** | v3's instant saves (`6426a67`) are the data-grounded solution: the summary is the artifact's own data, and costlier summaries are deferred to `--summarize-conversation` or a `nagent-distill` backfill.
The graceful-failure mode is replaced by graceful-deferral. | +| 16 | AGENTS.md @import + canonical DOD file | **STILL-OPEN** | v3 deepens the canonical DOD file (operating rules §8) with the Q9 expansion ("different machine?"). Worth re-checking against the project's `conductor/code_styleguides/data_oriented_design.md`. | --- -## Candidate 1: `src/sub_conversation.py:SubConversationRunner` +## v3 new candidates (HIGH priority) -**User signal:** **EXPLICIT WANT** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points.") +### Candidate 17: Campaign-style plan-as-data for the conductor -**Why it matters.** nagent's §9 pattern (disposable sub-conversations via ``) is the cleanest way to handle "investigate this without polluting the main discussion." Manual Slop has it for MMA (`mma_exec.py` is a real subprocess) but not for 1:1 discussions. The user is asking for this. +**Goal:** Add a `.conductor/campaigns/{slug}/` layout with `index.yaml` + per-task `task.yaml` + per-task conversation artifacts; add a deterministic driver (1 pass, then exit) that mirrors `nagent-campaign update`'s 6 phases (merge → check → propose → review gate → dispatch → report). 
-**What it would do.** A `SubConversationRunner` class that the App can call during a 1:1 discussion: -- `await runner.spawn(prompt: str, *, allowed_tools: list[str] = None, system_prompt: str = None) -> SubConversationResult` -- The runner spawns a fresh Python process (reusing the MMA pattern: `mma_exec.py` template with `--invocation user`, `--parent-conversation `, isolated `~/.manual_slop/sub_conversations/`) -- The sub-process runs to completion (or times out) -- Result returns: a concise artifact (the sub-agent's `` block) + token usage + exit code -- The App inserts the result into the active discussion as a "User" role entry (so the parent LLM sees it on the next turn) -- Cleanup: sub-conversation folder is auto-archived after 7 days (consistent with `log_pruner.py`) +**Context:** v3 §1 introduces campaigns as a four-piece composition (artifact + driver + invariants + context surfaces) with four load-bearing invariants: one pass then exit; one writer for the tree; review gate not cap; schema is the whole schema. The conductor's `plan.md` is not operable today — the model's "what to do next" is re-made every turn. Making it operable is the same data-oriented move nagent made. -**Where it lives.** Application. Possibly Meta-Tooling too (the `scripts/` directory could use the same primitive). +**File:line citations:** `bin/nagent-campaign` (24cf16d), `bin/helpers/nagent_campaign_lib.py` (24cf16d), `issues/0002-campaign-system.md:1-326` (199a36b). -**Depends on.** None directly. Could leverage MMA's `mma_exec.py` as a starting template. The `public_api_migration_20260606` follow-up track is unrelated. +**Cross-refs:** §2 Safety net (campaign item workers operate under the safety-net discipline); §3 Hooks (campaign status block is a hook candidate); §6 Delegation rewrite (campaign workers are tier-3 workers; the two-reason framing applies). 
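The one-pass driver in Candidate 17 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not nagent's `nagent-campaign` code: `Task` and `Campaign` are hypothetical in-memory stand-ins for `task.yaml` and `index.yaml`, `merge` is a stub, `check` validates only a status field, and the review gate is modeled as "every approved proposal is dispatched, with no numeric cap," which is one reading of "review gate not cap".

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in for one per-task task.yaml file.
    slug: str
    status: str = "todo"      # assumed states: todo -> proposed -> dispatched -> done
    approved: bool = False    # set by a human reviewer before or between passes


@dataclass
class Campaign:
    # Hypothetical stand-in for index.yaml; update() is its only writer.
    tasks: list[Task] = field(default_factory=list)


VALID_STATUSES = {"todo", "proposed", "dispatched", "done"}


def merge(c: Campaign) -> None:
    pass  # would fold finished worker results back into the tree


def check(c: Campaign) -> None:
    # Would validate each task against the full schema; here only the status field.
    for t in c.tasks:
        if t.status not in VALID_STATUSES:
            raise ValueError(f"{t.slug}: unknown status {t.status!r}")


def propose(c: Campaign) -> None:
    for t in c.tasks:
        if t.status == "todo":
            t.status = "proposed"


def review_gate(c: Campaign) -> list[Task]:
    # Gate on review, not on a count cap: every approved proposal passes.
    return [t for t in c.tasks if t.status == "proposed" and t.approved]


def dispatch(ready: list[Task]) -> None:
    for t in ready:
        t.status = "dispatched"  # would spawn a worker here


def report(c: Campaign) -> dict[str, str]:
    return {t.slug: t.status for t in c.tasks}


def update(c: Campaign) -> dict[str, str]:
    """One deterministic pass: merge, check, propose, review gate, dispatch, report; then return."""
    merge(c)
    check(c)
    propose(c)
    dispatch(review_gate(c))
    return report(c)
```

Each call does one pass and returns, so whoever calls `update` again (the user or a scheduler) controls pacing; the driver never loops on its own. Keeping `update` as the only code that mutates `Campaign` is how this sketch approximates the "one writer for the tree" invariant.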
-**Effort.** **Medium.** 2-3 phases: (1) extract reusable subprocess skeleton from MMA, (2) add 1:1-specific context injection, (3) add GUI controls ("Investigate…" button, optional command-palette command). - -**Recommended priority.** **HIGH** — user-flagged. +**Recommended priority:** **HIGH** — turning the plan into an operable artifact is a fundamental data-oriented move, and it affects every future conductor track. --- -## Candidate 2: RAG pre-staging via sub-conversation +### Candidate 18: Discussion-window safety net for Manual Slop -**User signal:** **EXPLICIT WANT** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run.") +**Goal:** Adopt the checkpoint + rebuild pattern for the discussion history; backfill summary entries from the existing intent line; surface extracted-vs-LLM provenance in the discussion index. -**Why it matters.** Manual Slop's RAG (`src/rag_engine.py`) indexes files on the fly at discussion start. For large projects, indexing can take 30+ seconds (per `tests/test_rag_phase4_stress.py`). The user wants a "prep" workflow: before starting a long discussion, fire off a sub-conversation that pre-indexes everything, so the discussion starts instantly. +**Context:** v3 §2 introduces a four-piece composition (trigger + writer + rebuild + provenance) with a critical invariant: rebuild runs a synchronous checkpoint first, and the writer's failure widens the tail instead of blocking. The 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) is a model Manual Slop should follow. -This is also consistent with nagent's "data preparation is an explicit, visible step" philosophy (§1, §7). The RAG chunks are artifacts; preparing them is a transformation; the transformation can be a sub-conversation. +**File:line citations:** `bin/nagent:1455-1687` (38d3d4f), `bin/nagent:1840-1881` (6426a67), `bin/helpers/nagent_distill_lib.py:587-654` (6426a67), `config.example.json:3-7`.
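The trigger, writer, and rebuild rules in Candidate 18 can be sketched as follows. The three config key names come from the citation above; their default values, the `History` model (a summary prefix plus an un-checkpointed tail), kilobyte accounting by UTF-8 length, and the `Summarizer` callable are assumptions for illustration, and provenance tracking is not modeled. The sketch encodes the two stated invariants: `rebuild` runs a synchronous checkpoint first, and a failing writer leaves the tail in place so it widens instead of blocking the turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Folds (previous summary, new entries) into a new summary; it may raise.
Summarizer = Callable[[str, list[str]], str]


@dataclass
class SafetyNetConfig:
    # Key names as cited; these default values are assumptions.
    checkpoint_interval_minutes: float = 10.0
    checkpoint_max_new_kb: float = 64.0
    rebuild_at_kb: float = 256.0


@dataclass
class History:
    summary: str = ""                              # checkpointed prefix
    tail: list[str] = field(default_factory=list)  # entries since the last checkpoint
    last_checkpoint: float = 0.0                   # seconds, on a caller-supplied clock

    def tail_kb(self) -> float:
        return sum(len(e.encode()) for e in self.tail) / 1024

    def total_kb(self) -> float:
        return len(self.summary.encode()) / 1024 + self.tail_kb()


def checkpoint(h: History, summarize: Summarizer, now: float) -> bool:
    """Fold the tail into the summary; on failure keep the tail so it simply widens."""
    try:
        h.summary = summarize(h.summary, h.tail)
    except Exception:
        return False  # never block the turn; the next attempt sees a wider tail
    h.tail.clear()
    h.last_checkpoint = now
    return True


def rebuild(h: History, summarize: Summarizer, now: float) -> list[str]:
    checkpoint(h, summarize, now)  # invariant: synchronous checkpoint first
    return ([h.summary] if h.summary else []) + list(h.tail)


def after_turn(h: History, entry: str, cfg: SafetyNetConfig,
               summarize: Summarizer, now: float) -> Optional[list[str]]:
    """Append one entry; return a rebuilt window only when the rebuild threshold is hit."""
    h.tail.append(entry)
    if h.total_kb() >= cfg.rebuild_at_kb:
        return rebuild(h, summarize, now)
    elapsed_min = (now - h.last_checkpoint) / 60
    if elapsed_min >= cfg.checkpoint_interval_minutes or h.tail_kb() >= cfg.checkpoint_max_new_kb:
        checkpoint(h, summarize, now)
    return None
```

The time and size triggers are OR-ed, so a quiet but long session and a short but verbose one both get checkpointed; if the summarizer is down, the rebuilt window simply carries the full tail.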
-**What it would do.** A "Pre-stage RAG" command in the GUI (or in `commands.py`): -- Spawns a sub-conversation with the prompt: "Index all files in [project] for RAG. Use the index_file tool on every file in the context. Report top-K queries at the end." -- The sub-conversation runs `rag_engine.index_file()` on each tracked file (uses the same `ChromaDB` backend, with mtime-based invalidation) -- Returns a concise summary: "Indexed N files. Top-K for 'execution clutch': [file1, file2, file3]." -- The main discussion starts with the index already warm; `RAGEngine.search()` is fast +**Cross-refs:** §3 Hooks (per-turn status is the input to the checkpoint writer); §8 Operating rules (the failure-as-data principle). -**Where it lives.** Application. The sub-conversation runner is the same primitive as Candidate 1; the staging logic is `RAGEngine` integration. - -**Depends on.** Candidate 1 (sub-conversation runner). Could be done as a feature within Candidate 1's track. - -**Effort.** **Small to medium.** The sub-conversation runner is the heavy lift (Candidate 1). The RAG-staging prompt is ~30 lines. - -**Recommended priority.** **HIGH** — user-flagged; cheap given Candidate 1. +**Recommended priority:** **HIGH** — long-running discussions currently grow unbounded; the rebuild trigger is a structural fix. --- -## Candidate 3: Stateless `LLMClient` class +### Candidate 22: Tier 3 worker contract "decompose or isolate, never offload" for Manual Slop MMA -**Why it matters.** `src/ai_client.py` is 2,685 lines of stateful singleton with module-level globals for every provider's history. nagent's `bin/helpers/nagent_llm.py` is 300 lines of stateless dispatch. A refactor toward a stateless `LLMClient(provider, model, conversation)` class would: +**Goal:** Encode the two-reason delegation guidance as a Tier 3 worker system prompt prefix; add a test that asserts the prefix is present in the worker's initial context. 
-- Make `ai_client` parseable (no implicit state to track) -- Make tests deterministic (each test gets a fresh client) -- Enable conversation save/load (the `Conversation` object is the transcript) -- Enable provider switching without losing history +**Context:** v3 §6 fixes a recursion bug (file-edit agent → worker → nagent-file-edit → file-edit agent → ... hangs the tree) by naming the two reasons delegation is worth its cost: **decomposition** (the task is genuinely complex, with parts) and **context isolation** (the step is noisy, the result is small). "Don't offload a single small action whose result is no smaller than doing it yourself." -This is a *big* refactor but a high-leverage one. Pitfalls #2 and #4 are both solved. +**File:line citations:** `bin/nagent:666-673` + `:790-806` (65787a6), `tests/test_nagent.py:1689-1695` (315fe9e). -**What it would do.** A new `src/llm_client.py`: -```python -@dataclass -class Conversation: - messages: list[Message] # role + content + tool_calls + tool_results - metadata: dict - def to_dict(self) -> dict: ... - def from_dict(data: dict) -> Conversation: ... - def save(path: Path) -> None: ... - def load(path: Path) -> Conversation: ... +**Cross-refs:** §1 Campaigns (campaign item workers operate under this discipline); §2 Safety net (sub-conversations inherit the scoping); §10 + §11 case studies (sub-conversation isolation is what makes the case-study harnesses tractable). -class LLMClient: - def __init__(self, provider: str, model: str, api_key: str = None): ... - def send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Conversation: ... - def stream_send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Iterator[Event]: ... -``` - -Backwards-compat: `ai_client.send(...)` becomes a thin wrapper that constructs a default `Conversation` from the current state and calls the new class. - -**Where it lives.** Application (the AI client is the Application's main AI entry point). 
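Candidate 22's goal (a worker prompt prefix plus a test that the prefix is present) might look like the sketch below. `DELEGATION_CONTRACT` and `build_worker_context` are hypothetical names, not existing Manual Slop code, and the prefix wording paraphrases the two reasons quoted in the Context line rather than reproducing nagent's prompt.

```python
# Hypothetical Tier 3 worker contract; wording paraphrases nagent v3's two-reason guidance.
DELEGATION_CONTRACT = (
    "Delegate only for one of two reasons:\n"
    "1. Decomposition: the task is genuinely complex, with separable parts.\n"
    "2. Context isolation: the step is noisy and its result is small.\n"
    "Never offload a single small action whose result is no smaller than "
    "doing it yourself."
)


def build_worker_context(task: str, files: list[str]) -> str:
    """Assemble a worker's initial context with the contract as a fixed prefix."""
    parts = [DELEGATION_CONTRACT, f"Task: {task}"]
    parts += [f"File: {f}" for f in files]
    return "\n\n".join(parts)


def test_worker_context_starts_with_delegation_contract() -> None:
    ctx = build_worker_context("rename symbol", ["src/app.py"])
    assert ctx.startswith(DELEGATION_CONTRACT)
    assert "Never offload" in ctx
```

Asserting on the assembled context, rather than only compiling the module, follows the `315fe9e` precedent of running the suite for prompt changes: an edit that drops the contract fails the test.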
- -**Depends on.** The `data_oriented_error_handling_20260606` track is independent but related — both push toward the data-oriented principles. The `public_api_migration_20260606` follow-up track would benefit from the new `Conversation` class. - -**Effort.** **Large.** 3-5 phases: (1) introduce `Conversation` dataclass, (2) per-provider `LLMClient.send`, (3) migration of existing `ai_client.send` callers, (4) deprecate module-level globals, (5) remove. ~2000+ lines of refactor. - -**Recommended priority.** **MEDIUM.** High value, but the existing stateful singleton works. Defer until a concrete Application need forces it (e.g., the user wanting to save/replay conversations). +**Recommended priority:** **HIGH** — the recursion bug is real for any project using MMA outside the WorkerPool's disciplined delegation. The `315fe9e` test fix is also a useful precedent: for any user-facing prompt change, the agent must run the `test_*.py` suite, not just `py_compile`. --- -## Candidate 4: Intent-based DSL for Meta-Tooling tool calls +## v3 new candidates (MEDIUM priority) -**User signal:** **EXPLICIT WANT** ("The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet.") +### Candidate 19: Per-turn ground-truth hook for Manual Slop -**Why it matters.** nagent's §4 regex-tag protocol is more debuggable than Manual Slop's function-calling. The Meta-Tooling (the external agents that build the Application) could benefit from a more compact, inspectable tool-call format. The existing JSON function-calling format forces the user to read verbose `{"name": "...", "args": {...}}` blobs. +**Goal:** Add a per-turn hook primitive that runs a configured command, resolved with CLI > config > disabled precedence, at the top of every `send_result()` and injects a `` block; honor the invariant that a failing or quiet hook still surfaces its output.
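The resolve, invoke, and inject steps from Candidate 19 can be sketched as below. Assumptions: the hook is an argv list (avoiding shell quoting), the 60-second timeout and the `[per-turn hook]` wrapper text are invented for illustration, and `inject` stands in for whatever `send_result()` would call; the real injected block's format is not specified here. A failing, missing, or silent hook still yields text, matching the reading that a failing or quiet hook surfaces its output.

```python
import subprocess
from typing import Optional

Argv = list[str]


def resolve_hook(cli_argv: Optional[Argv], config_argv: Optional[Argv]) -> Optional[Argv]:
    """CLI > config > disabled: the first non-empty source wins; None means no hook."""
    for argv in (cli_argv, config_argv):
        if argv:
            return argv
    return None


def run_hook(argv: Argv, timeout_s: float = 60.0) -> str:
    """Run the hook and always return text, even if it fails or prints nothing."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"hook failed to run: {exc}"
    out = (proc.stdout + proc.stderr).strip() or "(hook produced no output)"
    return f"hook exit code {proc.returncode}\n{out}"


def inject(message: str, cli_argv: Optional[Argv], config_argv: Optional[Argv]) -> str:
    """Prepend measured state to the outgoing turn when a hook is enabled."""
    argv = resolve_hook(cli_argv, config_argv)
    if argv is None:
        return message
    return f"[per-turn hook]\n{run_hook(argv)}\n[/per-turn hook]\n\n{message}"
```

Because `resolve_hook` returns `None` when neither source is set, the disabled case costs nothing and `inject` passes the message through unchanged.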
-**What it would do.** An intent-based DSL that the Meta-Tooling can use in its own work. Examples (per the user's "discovery" or "combinatorics" hint): -- `` — intent: read this symbol -- `` — intent: semantic search the workspace -- `` — intent: surgical line-range edit -- `` — intent: run a specific test -- `` — intent: dependency trace +**Context:** v3 §3 introduces hooks as a three-piece composition (resolve + invoke + inject). The case-study harness scripts ARE the hooks: `prove-optimized-harness.sh` is the command wired into `--hook-per-run`. The model responds against measured state instead of its recollection. -These are read by the external agent (Gemini CLI, OpenCode), not by Manual Slop's Application AI. The Application's function-calling format stays the same (correct for its domain). +**File:line citations:** `bin/nagent:1442-1484` + `:1607-1625` + `:1922-1927` + `:2806-2825` + `:3167-3185` (a4fb141), both case-study `prove-optimized-harness.sh` scripts. -**Where it lives.** Meta-Tooling. Documented in `docs/`; taught via the conductor convention; the external agent emits the DSL, the bridge script (`cli_tool_bridge.py`) translates to actual `mcp_client.py` tool calls. - -**Depends on.** None directly. The `mcp_architecture_refactor_20260606` may produce tools that are easier to call via DSL (atomic, composable). - -**Effort.** **Research spike, not implementation.** The user said "no where near that ideation yet." This is a design exercise, not a code change. - -**Recommended priority.** **LOW** — user explicitly deferred. +**Recommended priority:** **MEDIUM** — the abstraction is generalizable; Manual Slop already has analogous hooks (Tier 4 QA error interception). --- -## Candidate 5: Self-describing MCP tools (nagent §12 pattern) +### Candidate 20: Rename `nagent-gc` → `nagent-distill` in our documentation cross-references -**Why it matters.** Manual Slop's 45 MCP tools are dispatched by a flat if/elif in `mcp_client.py:dispatch`. 
Adding a tool requires edits in 4 places (dispatch, security allowlist, capability declaration, tests). nagent's `--description` self-describing executable pattern is more extensible: drop an executable, it auto-appears. +**Goal:** Documentation-only follow-up; surface the mental-model shift ("gc" → "distill") in the project's `conductor/code_styleguides/knowledge_artifacts.md`. -**What it would do.** Each sub-MCP (or each tool) emits a `--description` block on `--help`. The `dispatch` function introspects via `mcp_client.get_tool_schemas()` and includes the descriptions in the AI's initial context automatically. +**Context:** v3 §4 renames `nagent-gc` to `nagent-distill` (no compatibility alias). The new name encodes the operation's true semantic: knowledge becomes capability, gated by review. The merge/graduate passes are an explicit consequence. -**Where it lives.** Application (the dispatch layer). The Meta-Tooling already has self-describing (via `claude_tool_bridge.py`); this is the Application-side equivalent. +**File:line citations:** `bin/helpers/nagent_distill_lib.py:793-979` (f3ec090), `bin/nagent-distill:107-200` (f3ec090). -**Depends on.** The `mcp_architecture_refactor_20260606` is the natural place — the sub-MCPs would each be self-describing modules. - -**Effort.** **Medium** (subsumed by mcp_architecture_refactor_20260606). Not a separate track. - -**Recommended priority.** **LOW** — subsumed. +**Recommended priority:** **LOW** — documentation-only; no code change. --- -## Candidate 6: `src/git_history.py` (nagent §7 pattern) +### Candidate 21: Per-model token-cap awareness for Manual Slop `ai_client` -**Why it matters.** Manual Slop's `_reread_file_items` does current-content diff injection. nagent's `file_edit_history_and_summary_block` does *historical* content injection: `git log --follow ` per file, LLM-summarized, plus co-edit neighborhood. 
For "explain this file" questions, the LLM is meeting the file fresh — git history would give it crucial context (who touched it last, why, what's nearby). +**Goal:** Add `MODEL_CONTEXT_WINDOWS` table; rebuild fires on byte ceiling OR 0.85 of window; "don't guess" — omit rather than estimate. -**What it would do.** A `src/git_history.py:file_edit_history_and_summary_block(file_path, repo_root, provider, model, config_path, previous_initial_context=None) -> str` that: -- Calls `git log --follow --max-count=50 --date=short --format=...` per file -- Counts co-edited files per commit -- LLM-summarizes new commits (with cache for unchanged history) -- Renders a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits -- Called from `aggregate.py:run` at discussion start, after the file is added to context +**Context:** v3 §5 introduces the verified-windows table (10 models verified against the Together API). Unknown models return `None` and fall back to byte-only behavior — not a guessed default. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit." -**Where it lives.** Application (it's part of the AI's initial context). +**File:line citations:** `bin/helpers/nagent_llm.py:54-77` + `:123-130` + `:198-279` + `:315-336` + `:381-400` (bdfa2a6), `config.example.json:7`. -**Depends on.** None directly. The `data_oriented_error_handling_20260606` is independent. The `rag_engine.py` already has a `sourcesha256` field and mtime-based invalidation — the same pattern. - -**Effort.** **Medium.** 2 phases: (1) git history + co-edit, (2) LLM summarization with cache. ~300-500 lines. - -**Recommended priority.** **MEDIUM** — high value, but only after Candidates 1-2 are done. +**Recommended priority:** **MEDIUM** — refines the existing `ai_client.send()` rebuild trigger with a per-model precision layer. 
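A minimal sketch of the Candidate 21 trigger, written in Python because Manual Slop's `ai_client` is Python. Only three rules come from the candidate text: rebuild on the byte ceiling OR at 0.85 of the model's window, and return `None` for unknown models instead of guessing. Everything else is an assumption: the table entries are placeholder names and sizes (not verified windows), `should_rebuild` is a hypothetical helper, and the caller is assumed to already have a token count for the history (for example from provider usage reports). This is not nagent's `nagent_llm.py` code.

```python
from typing import Optional

# Hypothetical context-window table (tokens). Only models whose window has been
# verified belong here; these two entries are placeholders, not real data.
MODEL_CONTEXT_WINDOWS: dict[str, int] = {
    "example/model-a": 131_072,
    "example/model-b": 32_768,
}

# Rebuild before the window is full: capability degrades under high
# context utilization, not just at the hard limit.
SAFETY_FRACTION = 0.85


def context_window(model: str) -> Optional[int]:
    """Return the verified window, or None for unknown models ("don't guess")."""
    return MODEL_CONTEXT_WINDOWS.get(model)


def should_rebuild(model: str, history_bytes: int, history_tokens: int,
                   byte_ceiling: int) -> bool:
    """Fire a rebuild on the byte ceiling OR at 0.85 of the model's window."""
    if history_bytes >= byte_ceiling:
        return True
    window = context_window(model)
    if window is None:
        # Unknown model: fall back to byte-only behavior instead of a default.
        return False
    return history_tokens >= SAFETY_FRACTION * window
```

For example, `should_rebuild("example/model-b", 10_000, 28_000, 400_000)` returns `True` because 28,000 tokens is at least 0.85 × 32,768 (about 27,853), while the same call for an unlisted model returns `False` until the byte ceiling is reached.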
--- -## Candidate 7: Per-file conversation log (nagent §6 conversation dimension) +### Candidate 23: Per-conversation scratch directory for Manual Slop dispatch_inference -**Why it matters.** Manual Slop's per-file memory is the *curation* kind. nagent's is the *conversation log* kind. The user has the curation already; the conversation log is missing. The user's correction made this clear: the two are *different optimizations*, not equivalent. +**Goal:** Adopt the `conversation_scratch_dir(conversation_name)` pattern; pre-create on session start; thread through the ``-equivalent. -**What it would do.** A thin `~/.manual_slop/per_file/.md` per file (file_id by `st_dev:st_ino` for stability across renames, like nagent). Updated each time a discussion references the file. Format: -```markdown -# src/foo.py (file_id: 12345:67890) -Last referenced: 2026-06-08T12:34:56 (Discussion: "refactor auth") +**Context:** v3 §7 introduces the per-conversation scratch dir as a hardening commit (`49e07f3`). Each instance gets its own directory keyed by conversation name; concurrent instances never collide in a shared `/tmp`. -## 2026-06-08T12:34:56 - "how does the validation work?" -AI response: ... -(User) followup: "what about edge cases?" +**File:line citations:** `bin/nagent:1319-1331` + `:1334-1341` + `:1344-1381` + `:1387-1394` + `:1534-1551` + `:1834-1840` + `:224-240` (49e07f3). -## 2026-06-05T... - "explain the parser" -AI response: ... -``` - -When the user opens a new discussion with the file in context, the per-file log is injected as a `{per-file-history}` block. - -**Where it lives.** Application (the per-file log is the App's memory). The Meta-Tooling doesn't need this — sub-agent invocations are already short-lived. - -**Depends on.** None. Could be added in a small follow-up to Candidate 3 (the `Conversation` object becomes the per-file log). - -**Effort.** **Small** if done as a thin layer on top of the `Conversation` class. 
**Medium** if done before Candidate 3 (no `Conversation` object to leverage). - -**Recommended priority.** **LOW** — niche, niche feature. +**Recommended priority:** **MEDIUM** — small change with a structural payoff (concurrent dispatch safety). --- -## Candidate 8: `py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8) +### Candidate 25: Optimization-log discipline for Manual Slop agent work -**Why it matters.** nagent's `coedited_file_rows` produces a "files that historically co-edit with this file" table. Manual Slop has `py_get_hierarchy` (subclass scan) but no historical co-edit tool. Useful for "if I edit this file, what should I also look at?". +**Goal:** Adopt the `OPTIMIZATION-LOG.md` pattern: every agent iteration records hypothesis + change + before/after + keep/revert + cost (wall-clock + tokens). -**What it would do.** Two new MCP tools: -- `py_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — runs `git log --follow `, counts files in each commit, labels high/medium/low -- `ts_c_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — same, for C/C++ +**Context:** v3 §9 surfaces the case-study methodology's 5-element pattern; the `OPTIMIZATION-LOG.md` is the per-hypothesis history file. Both case studies document rejected experiments with measurements; the methodology's data discipline is load-bearing. -Returns a table. Used in the initial context as `{file-neighborhood}`. +**File:line citations:** `pep-copt/src-optimized/OPTIMIZATION-LOG.md` (full), `differentiable-collisions-optc/src-optimized/OPTIMIZATION-LOG.md` (full). -**Where it lives.** Application (initial context injection). - -**Depends on.** None. Small, contained. - -**Effort.** **Small.** ~200 lines + tests. The git-log is already in `aggregate.py`; this is a new tool that uses the same primitives. - -**Recommended priority.** **LOW** — small but niche. Worth bundling with Candidate 6 if that gets done. 
+**Recommended priority:** **MEDIUM** — the schema is portable; Manual Slop agents could adopt it for any multi-iteration work. --- -## Candidate 9: Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11) +### Candidate 27: Tolerance-based comparator for Manual Slop agent work -**Why it matters.** Manual Slop doesn't have an explicit split/patch pipeline. For very large files (>50 KB), the current `aggregate.py` + tree-sitter approach works for *reading* (skeleton, summary) but not for *patching* (no explicit segment/hash model). +**Goal:** Adopt the `compare_results.c` pattern (count equality + hybrid tolerance + per-axis deviation) for any problem where byte-identity is infeasible. -**What it would do.** Mirror nagent's design: -- `src/split_lib.py` — per-language natural splitters, `index.json` with `source_path`, `sourcesha256`, `segments[]` -- `src/patch_lib.py` — strict `validate_index` (hash check), `make_unified_patch`, `apply_segment_patches` -- `src/summarize_lib.py` — per-segment LLM call + retry-with-smaller-prompt +**Context:** v3 §11 documents the collisions case study's tolerance-based match contract (`1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`); contact points certified for validity, not matched. The same pattern works for float32 work, geometric problems, or any continuous problem. -**Where it lives.** Application (the AI is the consumer). The Meta-Tooling already has nagent if it wants this. +**File:line citations:** `differentiable-collisions-optc/performance-test-optimized/compare_results.c` (referenced from prompts). -**Depends on.** None. Self-contained. - -**Effort.** **Medium.** 2 phases: split/patch, then summarize. ~500 lines. - -**Recommended priority.** **DEFER UNTIL NEEDED.** No current 1:1 use case requires explicit split/patch. If a future file is genuinely too large for tree-sitter to handle inline, this becomes Candidate #2-priority. 
+**Recommended priority:** **MEDIUM** — the comparator pattern is reusable; Manual Slop's `RAGEngine._chunk_code` and other float-based work could adopt it. --- -## Candidate 10: Optional raw-transcript persistence per Take (nagent §3 conversation dimension) +## v3 new candidates (LOW priority) -**Why it matters.** nagent's "edit the conversation file" pattern is foreign to Manual Slop because the App stores abstracted entries (`disc_entries`), not raw transcripts. The user-edit feature in the GUI does edit individual entries, but the underlying log of `function_call` / `tool_result` blocks is implicit. +### Candidate 24: Document Q9 ("consider a different machine") in the project's `conductor/code_styleguides/data_oriented_design.md` -**What it would do.** Optionally, when a take is snapshotted to TOML (`project_manager.save_project`), also persist the raw transcript to a sibling file `discussions//transcript.jsonl`. The GUI gets a "View Raw Transcript" button. Optional "Edit Raw Transcript" mode that re-parses and re-aggregates. +**Goal:** The styleguide is already a derivative of nagent's file; add the Q9 expansion as a Tier 1+ reading-note. -**Where it lives.** Application. Optional — user can toggle per-project. +**Context:** v3 §8 surfaces the Q9 expansion (the only addition since v2.3). Q9 generalizes the simplification pass from "trim the current machine" to "consider a different machine when the data's shape points to it." -**Depends on.** None. Could be a small follow-up to Candidate 3 (`Conversation` class). +**File:line citations:** `context/data-oriented-design.md:102-116` + `:151-164` (a1f0680). -**Effort.** **Small.** ~150 lines + tests. Persist the existing `comms.log` in a structured way. +**Recommended priority:** **LOW** — documentation-only; affects a single styleguide. -**Recommended priority.** **LOW** — niche feature, opt-in only. 
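Candidate 27's match contract (count equality plus the hybrid tolerance `1mm + 0.1%·|d_ref| + 5e-4·(|c1−c2|/α²)`) can be sketched as a small comparator. This is a hedged Python illustration, not the `compare_results.c` implementation. Assumptions: distances are in metres (so 1mm is `1e-3`), `|c1−c2|` is passed in as a precomputed scalar `c_sep`, and `alpha` is taken as given because its meaning is not defined here. The names `Reference`, `Verdict`, `tolerance`, and `compare` are hypothetical, and the per-axis deviation report is omitted.

```python
from dataclasses import dataclass
from typing import Sequence

# Constants transcribed from the quoted contract
# 1mm + 0.1%*|d_ref| + 5e-4*(|c1-c2|/alpha^2), assuming distances in metres.
ABS_TOL = 1e-3     # 1 mm absolute floor
REL_TOL = 1e-3     # 0.1% of |d_ref|
SEP_COEFF = 5e-4   # weight on |c1 - c2| / alpha^2


def tolerance(d_ref: float, c_sep: float, alpha: float) -> float:
    """Allowed |d - d_ref| for one query; c_sep stands in for |c1 - c2|."""
    return ABS_TOL + REL_TOL * abs(d_ref) + SEP_COEFF * (c_sep / alpha ** 2)


@dataclass
class Reference:
    distance: float  # reference distance d_ref
    c_sep: float     # |c1 - c2|, assumed precomputed by the reference run
    alpha: float     # scale term from the contract, taken as given


@dataclass
class Verdict:
    match: bool
    reason: str
    worst_index: int = -1
    worst_excess: float = float("-inf")  # max(|d - d_ref| - tol); <= 0 passes


def compare(ref: Sequence[Reference], got: Sequence[float]) -> Verdict:
    # Count equality first: a missing or extra result is never "close enough".
    if len(ref) != len(got):
        return Verdict(False, f"count mismatch: {len(ref)} != {len(got)}")
    worst_i, worst = -1, float("-inf")
    for i, (r, d) in enumerate(zip(ref, got)):
        excess = abs(d - r.distance) - tolerance(r.distance, r.c_sep, r.alpha)
        if excess > worst:
            worst_i, worst = i, excess
    if worst > 0:
        return Verdict(False, f"query {worst_i} exceeds tolerance", worst_i, worst)
    return Verdict(True, "all queries within tolerance", worst_i, worst)
```

Because the allowed error grows with `|d_ref|` and with the separation term, a single fixed epsilon would be either too strict for large distances or too loose for small ones; the hybrid form avoids having to pick one.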
+--- + +### Candidate 26: `OPTIMIZATION-LOG` schema for Manual Slop agent work + +**Goal:** Adopt the `src-optimized/OPTIMIZATION-LOG.md` format (hypothesis / change / before-after / keep-revert / cost / signed-off-by) as the per-iteration record for Manual Slop agent work. + +**Context:** v3 §10 documents the PEP case study's `OPTIMIZATION-LOG.md` (full rejected-experiments history) and the case-study methodology cluster (§9) abstracts it. The schema is portable; Manual Slop agents could adopt it for any multi-iteration optimization. + +**File:line citations:** `pep-copt/src-optimized/OPTIMIZATION-LOG.md` (full). + +**Recommended priority:** **LOW** — sub-pattern of Candidate 25 (the schema is part of the discipline). --- ## Summary table -| # | Candidate | User signal | Priority | Effort | Domain | +| # | Candidate | v3 source cluster | Priority | Effort | Domain | |---|---|---|---|---|---| -| 1 | `SubConversationRunner` (1:1 sub-convos) | **Explicit want** | **HIGH** | Medium | App + MT | -| 2 | RAG pre-staging via sub-conversation | **Explicit want** | **HIGH** | Small (depends on #1) | App | -| 3 | Stateless `LLMClient` class | (none) | Medium | Large | App | -| 4 | Intent-based DSL for Meta-Tooling | Explicit but deferred | Low | Research | MT | -| 5 | Self-describing MCP tools | Implicit | Low (subsumed) | Medium | BOTH | -| 6 | `src/git_history.py` (nagent §7) | (none) | Medium | Medium | App | -| 7 | Per-file conversation log | (none) | Low | Small | App | -| 8 | `py_/ts_c_coedited_files` tools | (none) | Low (bundle with #6) | Small | App | -| 9 | Explicit `split_lib.py` / `patch_lib.py` | (none) | Defer until needed | Medium | App | -| 10 | Raw-transcript persistence per Take | (none) | Low | Small | App | +| 17 | Campaign-style plan-as-data for conductor | §1 Campaigns | **HIGH** | Medium | BOTH | +| 18 | Discussion-window safety net for Manual Slop | §2 Safety net | **HIGH** | Medium | APP | +| 22 | Tier 3 worker contract "decompose or isolate, never 
offload" | §6 Delegation rewrite | **HIGH** | Small | APP | +| 19 | Per-turn ground-truth hook | §3 Hooks | MEDIUM | Medium | BOTH | +| 21 | Per-model token-cap awareness for `ai_client` | §5 Provider expansion | MEDIUM | Medium | APP | +| 23 | Per-conversation scratch directory | §7 Robustness | MEDIUM | Small | APP | +| 25 | Optimization-log discipline | §9 Case-study methodology | MEDIUM | Small | BOTH | +| 27 | Tolerance-based comparator | §11 Collisions case study | MEDIUM | Medium | BOTH | +| 20 | Rename `nagent-gc` → `nagent-distill` in docs | §4 Project-local roots | LOW | Small (docs) | APP | +| 24 | Document Q9 in project DOD styleguide | §8 Operating rules | LOW | Small (docs) | BOTH | +| 26 | `OPTIMIZATION-LOG` schema for Manual Slop agent work | §10 PEP case study | LOW | Small | BOTH | + +**Total: 11 new candidates** (3 HIGH + 5 MEDIUM + 3 LOW, two of the LOW entries docs-only). Combined with the 14 v2.3 candidates that remain STILL-OPEN, the v3 candidate pool is **25 entries**, at the low end of the spec's "25-30 entries" range. --- ## Recommended next steps -1. **Spec and build Candidate 1 first** — it's the highest-priority user-flagged want, and Candidates 2 builds on it. -2. **Combine Candidate 2 with Candidate 1's track** — same primitive, different prompt. -3. **Hold Candidates 3-10 for future scoping** — each is a separate conductor track when the corresponding need surfaces. - -The current `nagent_review_20260608` track itself produces no code; it's the reference. Candidates 1 and 2 will be the first *implementation* tracks informed by it. +1. **Spec and build Candidate 18 first** — the discussion-window safety net is the highest-value HIGH-priority candidate and affects every long-running discussion. Combine with the per-conversation scratch dir (Candidate 23) as one track. +2. **Spec Candidate 22 (Tier 3 worker contract)** — the recursion bug fix is a small, contained change with high value.
Combine with Candidate 19 (per-turn hook) as one MMA-hygiene track. +3. **Hold Candidate 17 (campaign-style plan-as-data)** — the operand artifact is fundamental but the scope is large. Spec separately; consider a research spike first. +4. **Document candidates (Candidates 20 and 24)** — schedule as one docs-only follow-up after the code changes ship. \ No newline at end of file diff --git a/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md b/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md index 11e33c90..2e9e1847 100644 --- a/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md +++ b/conductor/tracks/nagent_review_20260608/nagent_review_v3_20260619.md @@ -13,7 +13,7 @@ ## §0 TL;DR -(filled in by Phase 13; placeholder — v3 covers the 24-commit nagent evolution between `eb6be32a` and `a1f0680`, plus two case-study repos that demonstrate nagent's per-turn proof harness in production. Three entirely new first-class subsystems land: Campaigns, Conversation safety net, and Hooks. The case-study methodology (4 prompts + proof harness + optimization log + committed-input sha256 freeze) is itself a reusable abstraction. Updates to existing patterns: 6 providers instead of 5 (Together added), delegation rewrite fixes a recursion bug, robustness commits harden the loop, and the operating-rules get a new Q9 for "sampling justifies replacing the machine.") +v3 covers the **24-commit nagent evolution** between `eb6be32a` (v2.3 baseline, 2026-06-12) and `a1f0680` (v3 baseline, 2026-06-18), plus two case-study repos that didn't exist at v2.3: [`macton/pep-copt`](https://github.com/macton/pep-copt) (PEP image compression, 2.04× aggregate speedup, byte-identical output, 24-image benchmark) and [`macton/differentiable-collisions-optc`](https://github.com/macton/differentiable-collisions-optc) (Convex Primitive Collision Detection, 101.06× speedup on committed input, distance-tolerance match contract).
**Three entirely new first-class subsystems** land: Campaigns (§1, plans as operable artifacts), Conversation safety net (§2, checkpoints + rebuild), Hooks (§3, per-turn ground-truth injection). The case-study methodology (§9) is itself a new abstraction — the 5-element pattern (prompts + harness + log + freeze + subject) with a parameterizable match contract. Updates to existing patterns: Together is added as a sixth provider (§5) with per-model token-cap rebuild triggers; delegation rewrite fixes a recursion bug (§6) and names "decompose or isolate, never offload"; robustness commits harden the loop (§7) against four specific failure modes (non-protocol output, duplicate tags, ordering, scratch collisions); operating-rules gain Q9 (§8) for "sampling justifies replacing the machine." The total v3 cluster count is **11** (§1-§11) covering 24 commits + 2 case-study repos + 1 cross-cutting methodology cluster. ## §1 Campaigns @@ -714,15 +714,90 @@ The "GPT-5.5" workspace name `collide-gpt-5-5` corroborates the model string per ## §12 Decisions -Pointer to `decisions.md` (filled in by Phase 13). The full candidate list: v2.3's 16 + v3's new ~10-14, with v2.3 → v3 status mapping (PROMOTE / SUPERSEDE / STILL-OPEN / WITHDRAW) at the top of `decisions.md`. +See `decisions.md` for the full candidate list (v2.3's 16 + v3's new 11, with v2.3 → v3 status mapping at the top). **Total v3 candidate pool: 25 entries**: the 11 new v3 candidates (3 HIGH + 5 MEDIUM + 3 LOW) plus the 14 v2.3 candidates that remain STILL-OPEN. The other two v2.3 candidates change status (1 PROMOTED, 1 SUBSUMED).
The HIGH-priority v3 candidates are: + +- **Candidate 17:** Campaign-style plan-as-data for the conductor (§1) +- **Candidate 18:** Discussion-window safety net for Manual Slop (§2) +- **Candidate 22:** Tier 3 worker contract "decompose or isolate, never offload" (§6) + +The MEDIUM-priority v3 candidates are Candidates 19 (per-turn hook), 21 (per-model token-cap), 23 (per-conversation scratch dir), 25 (optimization-log discipline), and 27 (tolerance-based comparator). The LOW-priority v3 candidates are Candidates 20 (docs rename), 24 (Q9 in styleguide), and 26 (OPT-LOG schema). Full rationale, file:line citations, and recommended effort for each candidate are in `decisions.md`. ## §13 Cross-references -Pointer to `nagent_takeaways_v3_20260619.md` for the bridge to v2.3 takeaways + the sibling reviews: -- `fable_review_20260617` — Fable's analysis of Mythos system prompt (touchpoint: §8 Operating rules) -- `intent_dsl_survey_20260612` — the 10 prior-art clusters (touchpoint: §9 Case-study methodology) -- `superpowers_review_20260619` — the superpowers plugin review (touchpoint: §9 Case-study methodology, process parallel via the `brainstorming` skill) +See `nagent_takeaways_v3_20260619.md` for the bridge to v2.3 takeaways + the sibling reviews: + +- **`fable_review_20260617`** — Fable's analysis of Mythos system prompt. Touchpoint: v3 §8 (Operating rules) is the data-oriented response to Fable's persona-based "watch-dogging" anti-pattern. +- **`intent_dsl_survey_20260612`** — the 10 prior-art clusters for intent-based DSLs. Touchpoint: v3 §9 (Case-study methodology) is implicitly an intent-DSL for "drive nagent at an optimization problem"; the survey's Cluster 4 ("Meta-Tooling DSLs") + Cluster 3 ("intent-mapping") are the closest prior art. +- **`superpowers_review_20260619`** — the superpowers plugin review. Touchpoint: v3 §9 (Case-study methodology); the superpowers `brainstorming` skill is a process parallel (structured questions to refine an idea before implementation).
## §14 References -(filled in incrementally as clusters commit — see `state.toml` `[v3_tasks]` for per-phase commit SHAs) \ No newline at end of file +### Source commits (24) + +The 24 nagent commits reviewed, in chronological order (oldest first): + +- `54c8741` — Move the default root into the project; rename nagent-gc to nagent-distill (§4) +- `557dd39` — Teach project-local roots and layered inputs in the README arc (§4) +- `0b9d1a2` — Ignore scratch files (§4, project .gitignore) +- `199a36b` — File the campaign system and follow-on plans as ordered issues (§1, issues files) +- `24cf16d` — Add the campaign system: plans as operable artifacts (§1) +- `f3ec090` — Add distill passes: merge and graduate (§1) +- `c1d2cad` — Teach the distill passes in the README and its generator (§1) +- `6443d70` — Rework 0004 around wall-clock checkpoints; remove resolved 0003 (§2 + §1 issue file maintenance) +- `7a7e242` — Add issue files for the two deferred follow-ups (§1, issues files) +- `065168c` — Tolerate non-protocol output; add turn status and invalid-output sidecars (§7) +- `49e07f3` — Scope `` to a per-conversation scratch dir (§7) +- `2edc7ee` — Name the provider/model in the LLM wait spinner (§5) +- `5075f6e` — Keep claude-code billing on its own login; surface real errors (§5) +- `6426a67` — Make --save-conversation instant with extracted summaries (§2) +- `afc7ab8` — Regenerate the README: full arc with campaigns and the safety net (§1 + §2 docs) +- `38d3d4f` — Add the conversation safety net: checkpoints and rebuild (§2) +- `12c35b7` — Pin shell-output-before-next-input ordering (§7, regression test) +- `6b762da` — Collapse exact-duplicate tags within a turn (§7) +- `315fe9e` — Update test for revised delegation-guidance wording (§6) +- `65787a6` — Delegation guidance: name context-isolation alongside decomposition (§6) +- `d56f0f0` — Delegate decomposed parts, not single tasks (§6) +- `a4fb141` — Add per-run and per-file-edit shell hooks (§3) +- `bdfa2a6` — Add 
Together provider, per-model token-cap rebuilds, and --list-providers (§5) +- `023e23a` — Ignore local .nagent/ runtime state (§4, project .gitignore) +- `a1f0680` — Operating rules: sampling can justify replacing the machine, not just trimming it (§8) + +### Case-study repos + +- [`macton/pep-copt`](https://github.com/macton/pep-copt) at `main` (5 commits). The PEP image compression case study: 2.04× speedup aggregate on 24-image benchmark, byte-identical `.pep` output, decode net-neutral (§10). +- [`macton/differentiable-collisions-optc`](https://github.com/macton/differentiable-collisions-optc) at `main` (5 commits). The Convex Primitive Collision Detection case study: 101.06× speedup on committed input, 97.75× and 98.43× on alternate seeds, tolerance-based match contract (§11). + +### Per-phase commit SHAs + +| Phase | Description | Commit SHA | +|---|---|---| +| Phase 1 | Setup + audit | `5a28c8f3` | +| Phase 2 | Campaigns cluster (§1) | `c81ea782` | +| Phase 3 | Conversation safety net cluster (§2) | `caf04ca5` | +| Phase 4 | Hooks cluster (§3) | `9ab2d07c` | +| Phase 5 | Project-local roots cluster (§4) | `ea8fa94e` | +| Phase 6 | Provider expansion cluster (§5) | `dd8428a3` | +| Phase 7 | Delegation rewrite cluster (§6) | `0dad59fd` | +| Phase 8 | Robustness cluster (§7) | `ffa21d5c` | +| Phase 9 | Operating rules cluster (§8) | `ad19be00` | +| Phase 10 | Case-study methodology cluster (§9) | `54e62b10` | +| Phase 11 | PEP case study cluster (§10) | `f53c82e6` | +| Phase 12 | Collisions case study cluster (§11) | `db7d94de` | +| Phase 13 | Refresh side artifacts | (this commit) | +| Phase 14 | Format-commitment verification | (forthcoming) | + +### Sibling-review references + +- `conductor/tracks/fable_review_20260617/` — Fable's analysis of Mythos system prompt +- `conductor/tracks/intent_dsl_survey_20260612/` — the 10 prior-art clusters for intent-based DSLs +- `conductor/tracks/superpowers_review_20260619/` — the superpowers plugin review + +### Project 
documentation references + +- `conductor/workflow.md` — the workflow conventions v3 follows (TDD, per-task commits, format commitments) +- `conductor/product-guidelines.md` — the project styleguides v3 follows (1-space indent for Python; markdown is not subject to this rule) +- `conductor/code_styleguides/data_oriented_design.md` — the project's canonical DOD reference, itself derived from Acton's `context/data-oriented-design.md` +- `conductor/code_styleguides/cache_friendly_context.md` — references nagent_review_v2_3 §3.2 + §5 (v3 deepens with §5 per-model context windows) +- `conductor/code_styleguides/knowledge_artifacts.md` — references nagent_review_v2_3 §3.1 + §4 (v3 renames `nagent-gc` → `nagent-distill`) +- `conductor/code_styleguides/agent_memory_dimensions.md` — references nagent_review_v2_3 §2.8 (v3 deepens with §1-§4 memory extension) +- `docs/guide_meta_boundary.md` — the Application vs Meta-Tooling distinction (load-bearing context for v3) \ No newline at end of file diff --git a/conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_20260619.md b/conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_20260619.md new file mode 100644 index 00000000..368feea4 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/nagent_takeaways_v3_20260619.md @@ -0,0 +1,129 @@ +# nagent_review_v3 — Bridge to v2.3 + sibling reviews + +**Date:** 2026-06-19 +**Spec pair:** `spec_v3.md` + `plan_v3.md` +**Companions:** +- `nagent_takeaways_20260608.md` — the v2.3-era takeaways (10 actionable patterns; unchanged). +- `nagent_review_v3_20260619.md` — the v3 canonical review (11 cluster sections). +- `comparison_table.md` — the v3 cluster table. +- `decisions.md` — the v3 candidate list (11 new + 16 v2.3 status mapping). 
+ +**Sibling reviews:** +- `fable_review_20260617` — Fable's analysis of Mythos system prompt +- `intent_dsl_survey_20260612` — survey's 10 prior-art clusters for intent-based DSLs +- `superpowers_review_20260619` — superpowers plugin review + +--- + +## 1. TL;DR + +v3 takeaways add **three first-class subsystems** (Campaigns, Conversation safety net, Hooks), **one new provider** (Together), **one delegation bug fix** (recursion), **eight expanded pattern areas** (Operating rules Q9, Robustness 4 hardening commits, Provider expansion per-model context windows, etc.), and **two end-to-end case studies** (PEP 2.04× byte-identity-strict, Collisions 101.06× tolerance-based) that demonstrate the methodology in production. The case-study methodology itself (§9) is the new abstraction: 5-element pattern (prompts + harness + log + freeze + subject) with a parameterizable match contract. Operating rules (§8) gain the Q9 expansion ("consider a different machine when filing plateaus"). Project-local roots (§4) renames `nagent-gc` → `nagent-distill` (the operation refines, not collects). The v3 candidate pool is **25 entries** (11 new + 14 v2.3 STILL-OPEN). + +--- + +## 2.
Cross-reference table + +| v3 takeaway | v2.3 candidate | Relationship | +|---|---|---| +| Campaigns (§1) as operable artifacts | (new in v3) | independent | +| Discussion-window safety net (§2) | (new in v3) | independent | +| Per-turn ground-truth hook (§3) | Candidate 5 (Self-describing MCP tools) | extends: hooks are a more general "per-turn ground-truth injection" surface | +| Project-local roots + 4-layer resolution (§4) | Candidate 14 (Project context files) | supersedes: the v3 architectural refactor subsumes the v2.3 pattern as a special case | +| Per-model token-cap awareness (§5) | Candidate 3 (Stateless LLMClient) | extends: the windows table is a refinement of the stateless client | +| Delegation rewrite: decompose-or-isolate (§6) | Candidate 1 (SubConversationRunner) | extends: the recursion bug + two-reason framing tighten the contract | +| Robustness: 4 hardening commits (§7) | (new in v3) | independent | +| Operating rules Q9: different machine (§8) | Candidate 16 (AGENTS.md @import + canonical DOD) | extends: Q9 is a v3 refinement of the canonical DOD | +| Case-study methodology: 5-element pattern (§9) | (new in v3) | independent | +| PEP case study: 2.04× byte-identity (§10) | (empirical evidence, not candidate) | independent | +| Collisions case study: 101.06× tolerance-based (§11) | (empirical evidence, not candidate) | independent | + +--- + +## 3. The new v3 candidates (not in v2.3) + +These are the v3-only candidates — see `decisions.md` for the full entry per candidate. + +### Candidate 17: Campaign-style plan-as-data for the conductor + +The conductor's `plan.md` is not operable today — the model's "what to do next" is re-made every turn. v3 §1 introduces campaigns as a four-piece composition (artifact + driver + invariants + context surfaces) with four load-bearing invariants: **one pass then exit; one writer for the tree; review gate not cap; schema is the whole schema**. Making the conductor's plan operable is the same data-oriented move.
**HIGH priority.** + +### Candidate 18: Discussion-window safety net for Manual Slop + +v3 §2 introduces a four-piece composition (trigger + writer + rebuild + provenance) with the critical invariant: rebuild runs a synchronous checkpoint first, and the writer's failure widens the tail instead of blocking. The 3-number config (`checkpoint_interval_minutes`, `checkpoint_max_new_kb`, `rebuild_at_kb`) is a model Manual Slop should follow. Long-running discussions currently grow unbounded; the rebuild trigger is a structural fix. **HIGH priority.** + +### Candidate 19: Per-turn ground-truth hook for Manual Slop + +v3 §3 introduces hooks as a three-piece composition (resolve + invoke + inject). The case-study harness scripts ARE the hooks: `prove-optimized-harness.sh` is the command wired into `--hook-per-run`. The model responds against measured state instead of its recollection. **MEDIUM priority.** + +### Candidate 20: Rename `nagent-gc` → `nagent-distill` in our documentation cross-references + +v3 §4 renames `nagent-gc` to `nagent-distill` (no compatibility alias). The new name encodes the operation's true semantic: knowledge becomes capability, gated by review. The merge/graduate passes are an explicit consequence. **LOW priority (docs only).** + +### Candidate 21: Per-model token-cap awareness for Manual Slop `ai_client` + +v3 §5 introduces the verified-windows table (10 models verified against the Together API). Unknown models return `None` and fall back to byte-only behavior — not a guessed default. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit." **MEDIUM priority.** + +### Candidate 22: Tier 3 worker contract "decompose or isolate, never offload" + +v3 §6 fixes a recursion bug (file-edit agent → worker → nagent-file-edit → file-edit agent → ... 
+hangs the tree) by naming the two reasons delegation is worth its cost: **decomposition** (the task is genuinely complex, with parts) and **context isolation** (the step is noisy, the result is small). "Don't offload a single small action whose result is no smaller than doing it yourself." The `315fe9e` test fix is also a useful precedent: any change to a user-facing prompt must run the `test_*.py` suite, not just `py_compile`. **HIGH priority.**
+
+### Candidate 23: Per-conversation scratch directory for Manual Slop `dispatch_inference`
+
+v3 §7 introduces the per-conversation scratch directory as a hardening commit (`49e07f3`). Each instance gets its own directory keyed by conversation name, so concurrent instances never collide in a shared `/tmp`. **MEDIUM priority.**
+
+### Candidate 24: Document Q9 ("consider a different machine") in the project's `conductor/code_styleguides/data_oriented_design.md`
+
+v3 §8 surfaces the Q9 expansion (the only addition since v2.3). Q9 generalizes the simplification pass from "trim the current machine" to "consider a different machine when the data's shape points to it." **LOW priority (docs only).**
+
+### Candidate 25: Optimization-log discipline for Manual Slop agent work
+
+v3 §9 surfaces the case-study methodology's 5-element pattern; `OPTIMIZATION-LOG.md` is the per-hypothesis history file. Both case studies document rejected experiments with measurements; the methodology's data discipline is load-bearing. **MEDIUM priority.**
+
+### Candidate 26: `OPTIMIZATION-LOG` schema for Manual Slop agent work
+
+The schema is portable; Manual Slop agents could adopt it for any multi-iteration optimization. Sub-pattern of Candidate 25. **LOW priority.**
+
+### Candidate 27: Tolerance-based comparator for Manual Slop agent work
+
+v3 §11 documents the collisions case study's tolerance-based match contract. The comparator pattern is reusable; Manual Slop's `RAGEngine._chunk_code` and other float-based work could adopt it.
**MEDIUM priority.** + +--- + +## 4. The v2.3 candidates v3 supersedes + +Of the 16 v2.3 candidates, v3 supersedes **1** (Candidate 5, Self-describing MCP tools — subsumed by the v3 hooks pattern + `mcp_architecture_refactor_20260606`) and **promotes 1** (Candidate 11, Knowledge harvest — the v3 rename to `nagent-distill` + merge/graduate passes is the data-grounded refinement). + +The remaining 14 v2.3 candidates remain **STILL-OPEN** per `decisions.md` §"v2.3 → v3 candidate status mapping." The v3 doesn't invalidate them; it adds new patterns that are orthogonal to most of the v2.3 candidates. + +--- + +## 5. Sibling-review pointers + +### `fable_review_20260617` — Fable's analysis of Mythos system prompt + +The Fable review analyzes the Mythos system prompt's "watch-dogging" pattern (be careful, watch yourself, never claim something you can't verify). v3 §8 is the data-oriented response: Acton's operating rules ("sampling can justify replacing the machine") are the data-grounded alternative to persona-based caution. Fable's anti-pattern (mental-health watch-dogging, refusal framing) is the opposite of nagent's pattern (sample the data, replace the machine). The two reviews together surface the philosophical difference between persona-based safety and data-grounded safety. Touchpoints: v3 §8 (Operating rules) + the project styleguide's Q9 candidate (Candidate 24). + +### `intent_dsl_survey_20260612` — survey's 10 prior-art clusters + +The survey's Cluster 4 ("Meta-Tooling DSLs") is the closest prior art to v3 §9's case-study methodology (the 4 prompts ARE an intent-DSL for "drive nagent at an optimization problem"). The survey's Cluster 3 ("intent-mapping") is the philosophical anchor: mapping user intent to tool invocations is what DSLs do, and nagent's prompts are a primitive form of that mapping. Touchpoints: v3 §9 (Case-study methodology) + §10 + §11. 
+
+### `superpowers_review_20260619`: superpowers plugin review
+
+The superpowers `brainstorming` skill asks structured questions to refine an idea before implementation; the case study's 4 prompts serve the same role. Both encode "the model should not skip the early work." Touchpoints: v3 §9 (Case-study methodology).
+
+---
+
+## What v3 takeaways ADD over v2.3 takeaways
+
+The v2.3 takeaways (`nagent_takeaways_20260608.md`) are 10 actionable patterns. v3 adds:
+
+1. **3 first-class subsystems** (Campaigns, Safety net, Hooks), each a coherent module with its own invariant set
+2. **1 new provider** (Together) with per-model context windows as a new precision layer
+3. **1 delegation bug fix** (recursion) with a documented test-fix precedent
+4. **8 expanded pattern areas**, including Operating rules Q9, Robustness (4 hardening commits), and Provider expansion
+5. **2 case studies** demonstrating the methodology in production (PEP, Collisions)
+6. **1 new abstraction** (case-study methodology, §9): the 5-element pattern with a parameterizable match contract
+7. **1 rename with semantic shift** (`nagent-gc` → `nagent-distill`)
+8. **11 new candidates** for Manual Slop follow-up tracks (3 HIGH, 5 MEDIUM, 3 LOW)
+
+The v2.3 takeaways are not invalidated; they are a foundation v3 builds on. Read both: v2.3 for the durable principles, v3 for the empirical demonstration.
\ No newline at end of file