conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways

Reference/analysis track. Produces 0 code changes. Artifacts (conductor/tracks/nagent_review_20260608/): - spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing - report.md (571 lines) - 14-section deep-dive; primary deliverable - comparison_table.md (79 lines) - flat side-by-side reference - decisions.md (286 lines) - 10 future-track candidates with priority matrix - nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded in code (file:line refs into nagent source and Manual Slop source) - metadata.json (132 lines) - structured metadata + verification criteria - state.toml (113 lines) - per-task tracking + user-corrections log (7 entries) 14 nagent principles covered in report.md (durable work, text-in/text-out, editable state, visible protocol, the loop, per-file memory, repo history, neighborhoods, sub-conversations, controlled writes, large files, tool discovery, framework differences, build your own). 6 pitfalls (revised from 8 after user-corrections): 1. No structured output protocol in Application AI (opaque function calling) 2. Provider-specific history in process globals (ai_client._anthropic_history + _deepseek_history + _minimax_history) 3. RAG is not 'history as data' (fuzzy, not auditable) 4. AI client is a stateful singleton (2,685-line ai_client.py) 5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want) 6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py) User-corrections applied (3 rounds, 7 total corrections recorded): - Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix - Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed conversation log; complementary, not equivalent) - Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for 1:1 discussions' (user wants this) - RAG: opt-in, not gap; user wants pre-staging via sub-conversation - Personas: config bundling (can opt out via AI settings) - Tool discovery: deferred (user has 'intent based DSL' idea but 'no where near that ideation yet') 10 actionable takeaways (separate from the 6 pitfalls - those are diagnosis, these are prescription): 1. State visibility (UI inspector for in-process state) 2. Readable conversation log (text-greppable, not just JSON-L) 3. Sub-agents for 1:1 (HIGH priority - user-flagged) 4. File-identity over file-path (st_dev:st_ino rename-safe) 5. One loop shape visible in diagnostics 6. Visible retry on protocol failure 7. Meta-Tooling DSL (intent-based, deferred) 8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606) 9. Single source of truth for disc_entries + provider history 10. Sub-agent return type constraint (bake into candidate #1 spec) Domain classification: every recommendation tagged Application / Meta-Tooling / Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling domain; Manual Slop's Application AI is a different kind of thing. No code modified by this track (reference/analysis only). All 7 files parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve. Track is 'active' awaiting human review; future-track candidates live in decisions.md and nagent_takeaways_20260608.md.
2026-06-08 18:44:35 -04:00
parent c9a991bbb8
commit 9cc51ca9af
7 changed files with 1784 additions and 0 deletions
@@ -0,0 +1,79 @@
 # nagent vs Manual Slop: Comparison Table
 **Companion to:** `report.md`
 **Date:** 2026-06-08 (revised same day)
 **Source:** nagent v1.0.0 (read 2026-06-08)
 Flat side-by-side reference. One row per nagent principle. Verdicts and pitfalls are in `report.md`.
 ---
 ## Legend
 - **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), DOMAIN MISMATCH (different scope).
 - **Domain tags:** APP = Application domain, MT = Meta-Tooling domain, BOTH.
 ---
 | # | nagent Principle (verbatim summary) | nagent Mechanism | Manual Slop Equivalent | Verdict | Domain | Action |
 |---|---|---|---|---|---|---|
 | 1 | Durable work, disposable workers. The agent is not the thing; the data is the thing. | `bin/nagent` 700-line single-file loop, conversation is a text file | MMA workers are real subprocesses with Context Amnesia; **Application AI is long-lived by design** | **PARTIAL** | BOTH | Future-track: stateless `LLMClient` class (§15.4) |
 | 2 | Text in, text out. File in, text out is the smallest useful primitive. | `bin/nagent-llm-text` + `bin/helpers/nagent_llm.py` (4 providers) | `src/ai_client.py:send(...) -> str` (5 providers) | **PARITY** | BOTH | None |
 | 3 | Conversations are editable state. The conversation file is not chat history; it is working state. | `bin/nagent` exposes `--save/load/edit/summarize`; text files are user-editable (vim/cat/diff/cp the raw transcript) | Discussion Takes + branching + per-entry edit (A1-A7 in report §3) + discussion-level CRUD (B1-B11) + role management (B5) + UI snapshot undo/redo (C1-C5) | **PARITY (DIFFERENT FOCUS)** — Manual Slop edits abstracted typed entries (`disc_entries` is a `list[dict]` with role + content + ts + thinking_segments + usage). Both have comprehensive editing; Manual Slop's is more granular at the entry layer, nagent's is deeper at the raw-transcript layer. | APP | Future-track: optional raw-transcript persistence per Take (Candidate 10) |
 | 4 | Visible output protocol. Teach the model an output format; use a visible, parseable protocol. | `TAG_PATTERNS` regex list; `parse_response` strict; `MAX_FORMAT_RETRIES = 3` | Provider-native function calling (Gemini, Anthropic, etc.) | **ARCHITECTURAL DIFFERENCE** — Application's choice is correct (parallel tool calls, JSON mode) | BOTH | Future-track: intent-based DSL for Meta-Tooling calls |
 | 5 | The loop. Append, call, parse, act, append, repeat. | `bin/nagent:run_agent_loop()` 50 lines, single `while True` | Three parallel loops: `ai_client._send_*` (LLM), `ConductorEngine.run` (MMA), `WorkflowSimulator.run_discussion_turn_async` (App) | **PARITY** | BOTH | (Low priority) Future-track: extract a single `src/llm_loop.py:run_loop` |
 | 6 | Per-file memory. Each file gets its own persistent local memory. | `file_id_for_path` (st_dev:st_ino); `conversations/file-index-{pid}.json`; `nagent-file-edit` per-file subprocess | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Structural File Editor | **PARITY (DIFFERENT KIND)** — Manual Slop's is *curation memory* (rich); nagent's is *conversation log memory* (plain text). Both real, both per-file, different optimization. | APP | Future-track: thin "last-investigation" log per file (Meta-Tooling-friendly) |
 | 7 | Repository history as data. Turn git history into editing context. | `git_file_history` + `summarize_new_file_commits` + `coedited_file_rows` + `format_file_history` | `_reread_file_items` (mtime-based, diff injection); git-linked discussion tracking in GUI; **no historical-context injection** | **PARTIAL** — diff injection is similar; historical-context injection is missing | APP | Future-track: `src/git_history.py` mirroring nagent's `file_edit_history_and_summary_block` |
 | 8 | Historical coupling & artifact neighborhoods. Files that change together are hints. | `coedited_file_rows` labels high/medium/low co-edit rate; guidance text "Use these files as hints. Do not edit unless the user request or evidence requires it." | None (closest: `py_get_hierarchy` is structural not historical) | **GAP** | APP | Future-track: `py_coedited_files` + `ts_c_coedited_files` MCP tools |
 | 9 | Disposable sub-conversations. Exploration creates noise; spawn disposable workers. | `<nagent-conversation>` tag spawns `nagent --invocation delegated` as subprocess; isolated conversation file; recursive token rollup | MMA Tier 3/4 workers (real subprocesses); **1:1 main discussion has no sub-conversation mechanism** | **PARITY for MMA; GAP for 1:1 discussions** | APP (and MT) | **USER-FLAGGED WANT**: Future-track `src/sub_conversation.py:SubConversationRunner` for 1:1 investigations |
 | 10 | Controlled writes. A loop that writes files needs explicit boundaries. Not a sandbox; just conventions. | `validate_write_path`: main mode → tmpdir only; file-edit mode → target or segments; rejected writes append `<nagent-write-result status="error">` | `mcp_client._is_allowed` (3-layer: allowlist + path validation + resolution gate); `run_powershell` requires GUI modal approval; PowerShell-only by default; 60s timeout + `taskkill` cleanup; optional Tier 4 QA | **PARITY+ (Manual Slop stronger)** — 3-layer security + HITL + sandbox is dramatically stricter than nagent's tmpdir check | APP (and MT) | None — current design is right |
 | 11 | Large files as explicit artifacts. Split, edit segments, patch. | `nagent-file-split` (11 langs, regex + line counts + brace/JSON/XML depth); `nagent-file-patch` (strict hash validation); `nagent-file-summarize` (per-segment + retry); 32 KB default; index.json with `source_path`, `sourcesha256`, `segments[]` | `aggregate.py:build_file_items` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter); `set_file_slice` / `edit_file` (mtime validation, not hash); `run_subagent_summarization` (in-process, no retry); `RAGEngine._chunk_code` (mtime-based, ChromaDB) | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation + hash validation; Manual Slop uses tree-sitter + in-process + mtime validation | BOTH | Future-track: explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, with hash validation |
 | 12 | Tool discovery. Tool capability should be explicit data. | `collect_bin_tool_descriptions` runs each `bin/* --description`; auto-builds "Available tools:" block for initial context | None (45 tools in `mcp_client.py:dispatch` if/elif chain) | **GAP** — nagent's pattern is genuinely better; current dispatch is fine but not extensible | BOTH (especially MT) | Future-track: subsumed by `mcp_architecture_refactor_20260606` (sub-MCPs as self-describing modules) |
 | 13 | Differences from frameworks. The reframing table: memory→editable artifact, agent→temporary transformation function, context→explicit input data. | The philosophical frame | The applicable reframings: editable UI state, curated per-file memory, git history as data | **N/A** | BOTH | (Lens, not action) |
 | 14 | Build your own. 12-step buildable list. | The reference | Manual Slop has all 12, in different files, at different scale | **PARITY** | BOTH | (Checklist) |
 ---
 ## The 6 Pitfalls (revised, after user-corrections)
 See `report.md §15` for full details. Quick reference:
 | # | Pitfall | Domain | Future-track | User flag? |
 |---|---|---|---|---|
 | 1 | No structured output protocol in Application AI (opaque function calling) | BOTH | Intent-based DSL for Meta-Tooling | Implicit ("intent based DSL to help with discovery") |
 | 2 | Provider-specific history in process globals (`_anthropic_history`, `_deepseek_history`, etc.) | APP | Stateless `LLMClient` class | No |
 | 3 | RAG is not "history as data" (fuzzy, not auditable) | APP | RAG pre-staging sub-conversation | **Yes** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run") |
 | 4 | AI client is a stateful singleton with module-level globals (2,685-line file) | APP | Stateless `LLMClient` class (same as #2) | No |
 | 5 | No non-MMA disposable sub-conversations | APP (and MT) | `src/sub_conversation.py:SubConversationRunner` | **Yes** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points") |
 | 6 | Hard-coded tool discovery (45-tool if/elif chain) | BOTH | Subsumed by `mcp_architecture_refactor_20260606` | Implicit ("intent based DSL to help with discovery") |
 ### Pitfalls removed by user-corrections
 - **(removed)** "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); the lack of editable raw transcripts is a *different* design choice, not a gap. See `report.md §3`.
 - **(removed)** "No per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension (FileItem + ContextPreset + Fuzzy Anchors); what's missing is nagent's conversation-log dimension, which is a *different* optimization. See `report.md §6`.
 ---
 ## Future-track candidates — priority list
 Ordered by user signal + implementation cost:
 1. **`src/sub_conversation.py:SubConversationRunner`** — user-flagged as a want. Extract MMA's `mma_exec.py` pattern into a reusable App-callable class. Useful for 1:1 investigations. **High priority.** (Pitfall #5)
 2. **RAG pre-staging via sub-conversation** — user-flagged as a want. A sub-agent pre-builds the RAG index for a planned run; the chunks become the discussion's starting memory. **High priority.** (Pitfall #3)
 3. **Stateless `LLMClient` class** — would unify Pitfall #2 and #4. Backwards-compatible with `ai_client.send()`. ~2-3 phases of careful refactor. **Medium priority.**
 4. **Intent-based DSL for Meta-Tooling tool calls** — user-noted as a want ("no where near that ideation yet"). **Low priority, research spike.**
 5. **Self-describing MCP tools (nagent §12 pattern)** — subsumed by `mcp_architecture_refactor_20260606`. **Low priority on its own.**
 6. **`src/git_history.py` for nagent §7 pattern** — historical context injection. **Medium priority, but only after #1-#2 are done.**
 7. **Per-file conversation log (nagent §6 conversation dimension)** — Meta-Tooling-friendly addition. **Low priority.**
 8. **`py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)** — small, contained. **Low priority.**
 9. **Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)** — only needed if very-large-file scenarios emerge. **Defer until needed.**
 10. **Optional raw-transcript persistence per Take (nagent §3 conversation dimension)** — niche. **Low priority.**
@@ -0,0 +1,286 @@
 # Future-Track Candidates: nagent Review Follow-ups
 **Companion to:** `report.md` (deep-dive), `comparison_table.md` (flat reference), `nagent_takeaways_20260608.md` (actionable patterns)
 **Date:** 2026-06-08
 **Source:** nagent v1.0.0 deep-dive review (see `report.md`)
 This document is the bridge from "what nagent teaches us" to "what Manual Slop should do about it." Each candidate is a *future* conductor track (not this one). The candidates are *not* committed — they emerge from the analysis but each is a separate scoping exercise.
 **For an actionable, code-grounded read of these candidates** (with the "what to do today, not just the future track" framing), see `nagent_takeaways_20260608.md` — it maps each candidate to specific patterns, design constraints, and small UX wins that don't need a new track.
 ---
 ## Decision-making framework
 For each candidate:
 - **Why it matters** — what pitfall or capability gap does it address?
 - **What it would do** — concrete description
 - **Where it would live** — Application or Meta-Tooling
 - **Dependency on existing tracks** — is anything already on the board?
 - **Effort estimate** — small / medium / large
 - **User signal** — has the user expressed want/don't-want/neutral?
 - **Recommended priority** — high / medium / low
 The candidates are listed in priority order, which factors user signal heaviest (the user is the product owner for the Application; the analysis is just a reference).
 ---
 ## Candidate 1: `src/sub_conversation.py:SubConversationRunner`
 **User signal:** **EXPLICIT WANT** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points.")
 **Why it matters.** nagent's §9 pattern (disposable sub-conversations via `<nagent-conversation>`) is the cleanest way to handle "investigate this without polluting the main discussion." Manual Slop has it for MMA (`mma_exec.py` is a real subprocess) but not for 1:1 discussions. The user is asking for this.
 **What it would do.** A `SubConversationRunner` class that the App can call during a 1:1 discussion:
 - `await runner.spawn(prompt: str, *, allowed_tools: list[str] = None, system_prompt: str = None) -> SubConversationResult`
 - The runner spawns a fresh Python process (reusing the MMA pattern: `mma_exec.py` template with `--invocation user`, `--parent-conversation <active_discussion_id>`, isolated `~/.manual_slop/sub_conversations/<name>`)
 - The sub-process runs to completion (or times out)
 - Result returns: a concise artifact (the sub-agent's `<response>` block) + token usage + exit code
 - The App inserts the result into the active discussion as a "User" role entry (so the parent LLM sees it on the next turn)
 - Cleanup: sub-conversation folder is auto-archived after 7 days (consistent with `log_pruner.py`)
 **Where it lives.** Application. Possibly Meta-Tooling too (the `scripts/` directory could use the same primitive).
 **Depends on.** None directly. Could leverage MMA's `mma_exec.py` as a starting template. The `public_api_migration_20260606` follow-up track is unrelated.
 **Effort.** **Medium.** 2-3 phases: (1) extract reusable subprocess skeleton from MMA, (2) add 1:1-specific context injection, (3) add GUI controls ("Investigate…" button, optional command-palette command).
 **Recommended priority.** **HIGH** — user-flagged.
 ---
 ## Candidate 2: RAG pre-staging via sub-conversation
 **User signal:** **EXPLICIT WANT** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run.")
 **Why it matters.** Manual Slop's RAG (`src/rag_engine.py`) indexes files on the fly at discussion start. For large projects, indexing can take 30+ seconds (per `tests/test_rag_phase4_stress.py`). The user wants a "prep" workflow: before starting a long discussion, fire off a sub-conversation that pre-indexes everything, so the discussion starts instantly.
 This is also consistent with nagent's "data preparation is an explicit, visible step" philosophy (§1, §7). The RAG chunks are artifacts; preparing them is a transformation; the transformation can be a sub-conversation.
 **What it would do.** A "Pre-stage RAG" command in the GUI (or in `commands.py`):
 - Spawns a sub-conversation with the prompt: "Index all files in [project] for RAG. Use the index_file tool on every file in the context. Report top-K queries at the end."
 - The sub-conversation runs `rag_engine.index_file()` on each tracked file (uses the same `ChromaDB` backend, with mtime-based invalidation)
 - Returns a concise summary: "Indexed N files. Top-K for 'execution clutch': [file1, file2, file3]."
 - The main discussion starts with the index already warm; `RAGEngine.search()` is fast
 **Where it lives.** Application. The sub-conversation runner is the same primitive as Candidate 1; the staging logic is `RAGEngine` integration.
 **Depends on.** Candidate 1 (sub-conversation runner). Could be done as a feature within Candidate 1's track.
 **Effort.** **Small to medium.** The sub-conversation runner is the heavy lift (Candidate 1). The RAG-staging prompt is ~30 lines.
 **Recommended priority.** **HIGH** — user-flagged; cheap given Candidate 1.
 ---
 ## Candidate 3: Stateless `LLMClient` class
 **Why it matters.** `src/ai_client.py` is 2,685 lines of stateful singleton with module-level globals for every provider's history. nagent's `bin/helpers/nagent_llm.py` is 300 lines of stateless dispatch. A refactor toward a stateless `LLMClient(provider, model, conversation)` class would:
 - Make `ai_client` parseable (no implicit state to track)
 - Make tests deterministic (each test gets a fresh client)
 - Enable conversation save/load (the `Conversation` object is the transcript)
 - Enable provider switching without losing history
 This is a *big* refactor but a high-leverage one. Pitfalls #2 and #4 are both solved.
 **What it would do.** A new `src/llm_client.py`:
 ```python
@dataclass
 class Conversation:
    messages: list[Message]  # role + content + tool_calls + tool_results
    metadata: dict
    def to_dict(self) -> dict: ...
    def from_dict(data: dict) -> Conversation: ...
    def save(path: Path) -> None: ...
    def load(path: Path) -> Conversation: ...
 class LLMClient:
    def __init__(self, provider: str, model: str, api_key: str = None): ...
    def send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Conversation: ...
    def stream_send(self, conversation: Conversation, *, tools: list[Tool] = None) -> Iterator[Event]: ...
 ```
 Backwards-compat: `ai_client.send(...)` becomes a thin wrapper that constructs a default `Conversation` from the current state and calls the new class.
 **Where it lives.** Application (the AI client is the Application's main AI entry point).
 **Depends on.** The `data_oriented_error_handling_20260606` track is independent but related — both push toward the data-oriented principles. The `public_api_migration_20260606` follow-up track would benefit from the new `Conversation` class.
 **Effort.** **Large.** 3-5 phases: (1) introduce `Conversation` dataclass, (2) per-provider `LLMClient.send`, (3) migration of existing `ai_client.send` callers, (4) deprecate module-level globals, (5) remove. ~2000+ lines of refactor.
 **Recommended priority.** **MEDIUM.** High value, but the existing stateful singleton works. Defer until a concrete Application need forces it (e.g., the user wanting to save/replay conversations).
 ---
 ## Candidate 4: Intent-based DSL for Meta-Tooling tool calls
 **User signal:** **EXPLICIT WANT** ("The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet.")
 **Why it matters.** nagent's §4 regex-tag protocol is more debuggable than Manual Slop's function-calling. The Meta-Tooling (the external agents that build the Application) could benefit from a more compact, inspectable tool-call format. The existing JSON function-calling format forces the user to read verbose `{"name": "...", "args": {...}}` blobs.
 **What it would do.** An intent-based DSL that the Meta-Tooling can use in its own work. Examples (per the user's "discovery" or "combinatorics" hint):
 - `<read src/foo.py:MyClass.method>` — intent: read this symbol
 - `<search "execution clutch">` — intent: semantic search the workspace
 - `<edit src/foo.py:42-50:new code>` — intent: surgical line-range edit
 - `<test tests/test_foo.py::test_bar>` — intent: run a specific test
 - `<discover what calls X>` — intent: dependency trace
 These are read by the external agent (Gemini CLI, OpenCode), not by Manual Slop's Application AI. The Application's function-calling format stays the same (correct for its domain).
 **Where it lives.** Meta-Tooling. Documented in `docs/`; taught via the conductor convention; the external agent emits the DSL, the bridge script (`cli_tool_bridge.py`) translates to actual `mcp_client.py` tool calls.
 **Depends on.** None directly. The `mcp_architecture_refactor_20260606` may produce tools that are easier to call via DSL (atomic, composable).
 **Effort.** **Research spike, not implementation.** The user said "no where near that ideation yet." This is a design exercise, not a code change.
 **Recommended priority.** **LOW** — user explicitly deferred.
 ---
 ## Candidate 5: Self-describing MCP tools (nagent §12 pattern)
 **Why it matters.** Manual Slop's 45 MCP tools are dispatched by a flat if/elif in `mcp_client.py:dispatch`. Adding a tool requires edits in 4 places (dispatch, security allowlist, capability declaration, tests). nagent's `--description` self-describing executable pattern is more extensible: drop an executable, it auto-appears.
 **What it would do.** Each sub-MCP (or each tool) emits a `--description` block on `--help`. The `dispatch` function introspects via `mcp_client.get_tool_schemas()` and includes the descriptions in the AI's initial context automatically.
 **Where it lives.** Application (the dispatch layer). The Meta-Tooling already has self-describing (via `claude_tool_bridge.py`); this is the Application-side equivalent.
 **Depends on.** The `mcp_architecture_refactor_20260606` is the natural place — the sub-MCPs would each be self-describing modules.
 **Effort.** **Medium** (subsumed by mcp_architecture_refactor_20260606). Not a separate track.
 **Recommended priority.** **LOW** — subsumed.
 ---
 ## Candidate 6: `src/git_history.py` (nagent §7 pattern)
 **Why it matters.** Manual Slop's `_reread_file_items` does current-content diff injection. nagent's `file_edit_history_and_summary_block` does *historical* content injection: `git log --follow <file>` per file, LLM-summarized, plus co-edit neighborhood. For "explain this file" questions, the LLM is meeting the file fresh — git history would give it crucial context (who touched it last, why, what's nearby).
 **What it would do.** A `src/git_history.py:file_edit_history_and_summary_block(file_path, repo_root, provider, model, config_path, previous_initial_context=None) -> str` that:
 - Calls `git log --follow --max-count=50 --date=short --format=...` per file
 - Counts co-edited files per commit
 - LLM-summarizes new commits (with cache for unchanged history)
 - Renders a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits
 - Called from `aggregate.py:run` at discussion start, after the file is added to context
 **Where it lives.** Application (it's part of the AI's initial context).
 **Depends on.** None directly. The `data_oriented_error_handling_20260606` is independent. The `rag_engine.py` already has a `sourcesha256` field and mtime-based invalidation — the same pattern.
 **Effort.** **Medium.** 2 phases: (1) git history + co-edit, (2) LLM summarization with cache. ~300-500 lines.
 **Recommended priority.** **MEDIUM** — high value, but only after Candidates 1-2 are done.
 ---
 ## Candidate 7: Per-file conversation log (nagent §6 conversation dimension)
 **Why it matters.** Manual Slop's per-file memory is the *curation* kind. nagent's is the *conversation log* kind. The user has the curation already; the conversation log is missing. The user's correction made this clear: the two are *different optimizations*, not equivalent.
 **What it would do.** A thin `~/.manual_slop/per_file/<file_id>.md` per file (file_id by `st_dev:st_ino` for stability across renames, like nagent). Updated each time a discussion references the file. Format:
 ```markdown
 # src/foo.py (file_id: 12345:67890)
 Last referenced: 2026-06-08T12:34:56 (Discussion: "refactor auth")
 ## 2026-06-08T12:34:56 - "how does the validation work?"
 AI response: ...
 (User) followup: "what about edge cases?"
 ## 2026-06-05T... - "explain the parser"
 AI response: ...
 ```
 When the user opens a new discussion with the file in context, the per-file log is injected as a `{per-file-history}` block.
 **Where it lives.** Application (the per-file log is the App's memory). The Meta-Tooling doesn't need this — sub-agent invocations are already short-lived.
 **Depends on.** None. Could be added in a small follow-up to Candidate 3 (the `Conversation` object becomes the per-file log).
 **Effort.** **Small** if done as a thin layer on top of the `Conversation` class. **Medium** if done before Candidate 3 (no `Conversation` object to leverage).
 **Recommended priority.** **LOW** — niche, niche feature.
 ---
 ## Candidate 8: `py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)
 **Why it matters.** nagent's `coedited_file_rows` produces a "files that historically co-edit with this file" table. Manual Slop has `py_get_hierarchy` (subclass scan) but no historical co-edit tool. Useful for "if I edit this file, what should I also look at?".
 **What it would do.** Two new MCP tools:
 - `py_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — runs `git log --follow <path>`, counts files in each commit, labels high/medium/low
 - `ts_c_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — same, for C/C++
 Returns a table. Used in the initial context as `{file-neighborhood}`.
 **Where it lives.** Application (initial context injection).
 **Depends on.** None. Small, contained.
 **Effort.** **Small.** ~200 lines + tests. The git-log is already in `aggregate.py`; this is a new tool that uses the same primitives.
 **Recommended priority.** **LOW** — small but niche. Worth bundling with Candidate 6 if that gets done.
 ---
 ## Candidate 9: Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)
 **Why it matters.** Manual Slop doesn't have an explicit split/patch pipeline. For very large files (>50 KB), the current `aggregate.py` + tree-sitter approach works for *reading* (skeleton, summary) but not for *patching* (no explicit segment/hash model).
 **What it would do.** Mirror nagent's design:
 - `src/split_lib.py` — per-language natural splitters, `index.json` with `source_path`, `sourcesha256`, `segments[]`
 - `src/patch_lib.py` — strict `validate_index` (hash check), `make_unified_patch`, `apply_segment_patches`
 - `src/summarize_lib.py` — per-segment LLM call + retry-with-smaller-prompt
 **Where it lives.** Application (the AI is the consumer). The Meta-Tooling already has nagent if it wants this.
 **Depends on.** None. Self-contained.
 **Effort.** **Medium.** 2 phases: split/patch, then summarize. ~500 lines.
 **Recommended priority.** **DEFER UNTIL NEEDED.** No current 1:1 use case requires explicit split/patch. If a future file is genuinely too large for tree-sitter to handle inline, this becomes Candidate #2-priority.
 ---
 ## Candidate 10: Optional raw-transcript persistence per Take (nagent §3 conversation dimension)
 **Why it matters.** nagent's "edit the conversation file" pattern is foreign to Manual Slop because the App stores abstracted entries (`disc_entries`), not raw transcripts. The user-edit feature in the GUI does edit individual entries, but the underlying log of `function_call` / `tool_result` blocks is implicit.
 **What it would do.** Optionally, when a take is snapshotted to TOML (`project_manager.save_project`), also persist the raw transcript to a sibling file `discussions/<take_name>/transcript.jsonl`. The GUI gets a "View Raw Transcript" button. Optional "Edit Raw Transcript" mode that re-parses and re-aggregates.
 **Where it lives.** Application. Optional — user can toggle per-project.
 **Depends on.** None. Could be a small follow-up to Candidate 3 (`Conversation` class).
 **Effort.** **Small.** ~150 lines + tests. Persist the existing `comms.log` in a structured way.
 **Recommended priority.** **LOW** — niche feature, opt-in only.
 ---
 ## Summary table
 | # | Candidate | User signal | Priority | Effort | Domain |
 |---|---|---|---|---|---|
 | 1 | `SubConversationRunner` (1:1 sub-convos) | **Explicit want** | **HIGH** | Medium | App + MT |
 | 2 | RAG pre-staging via sub-conversation | **Explicit want** | **HIGH** | Small (depends on #1) | App |
 | 3 | Stateless `LLMClient` class | (none) | Medium | Large | App |
 | 4 | Intent-based DSL for Meta-Tooling | Explicit but deferred | Low | Research | MT |
 | 5 | Self-describing MCP tools | Implicit | Low (subsumed) | Medium | BOTH |
 | 6 | `src/git_history.py` (nagent §7) | (none) | Medium | Medium | App |
 | 7 | Per-file conversation log | (none) | Low | Small | App |
 | 8 | `py_/ts_c_coedited_files` tools | (none) | Low (bundle with #6) | Small | App |
 | 9 | Explicit `split_lib.py` / `patch_lib.py` | (none) | Defer until needed | Medium | App |
 | 10 | Raw-transcript persistence per Take | (none) | Low | Small | App |
 ---
 ## Recommended next steps
 1. **Spec and build Candidate 1 first** — it's the highest-priority user-flagged want, and Candidates 2 builds on it.
 2. **Combine Candidate 2 with Candidate 1's track** — same primitive, different prompt.
 3. **Hold Candidates 3-10 for future scoping** — each is a separate conductor track when the corresponding need surfaces.
 The current `nagent_review_20260608` track itself produces no code; it's the reference. Candidates 1 and 2 will be the first *implementation* tracks informed by it.
@@ -0,0 +1,132 @@
 {
  "track_id": "nagent_review_20260608",
  "name": "nagent Review (Mike Acton's data-oriented LLM agent reference)",
  "initialized": "2026-06-08",
  "owner": "tier2-tech-lead",
  "priority": "medium",
  "status": "active",
  "type": "reference + analysis + future-track scoping",
  "scope": {
    "new_files": [
      "conductor/tracks/nagent_review_20260608/spec.md",
      "conductor/tracks/nagent_review_20260608/report.md",
      "conductor/tracks/nagent_review_20260608/comparison_table.md",
      "conductor/tracks/nagent_review_20260608/decisions.md",
      "conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md"
    ],
    "modified_files": [],
    "external_resources": [
      "nagent README: https://github.com/macton/nagent/blob/main/README.md",
      "nagent source: https://github.com/macton/nagent (all 11 source files read in full)"
    ]
  },
  "blocked_by": [],
  "blocks": [
    "sub_conversation_runner_app_1to1_20260608_PLACEHOLDER",
    "rag_pre_staging_sub_convo_20260608_PLACEHOLDER",
    "llm_client_stateless_class_20260608_PLACEHOLDER",
    "intent_dsl_for_meta_tooling_20260608_PLACEHOLDER",
    "git_history_injection_20260608_PLACEHOLDER",
    "per_file_conversation_log_20260608_PLACEHOLDER",
    "py_coedited_files_tool_20260608_PLACEHOLDER",
    "ts_c_coedited_files_tool_20260608_PLACEHOLDER",
    "split_patch_lib_20260608_PLACEHOLDER",
    "raw_transcript_persistence_per_take_20260608_PLACEHOLDER"
  ],
  "estimated_phases": 0,
  "spec": "spec.md",
  "plan": null,
  "nagent_principles_covered": [
    "Durable work, disposable workers",
    "Text in, text out",
    "Conversations are editable state",
    "Visible output protocol",
    "The loop",
    "Per-file memory",
    "Repository history as data",
    "Historical coupling & artifact neighborhoods",
    "Disposable sub-conversations",
    "Controlled writes",
    "Large files as explicit artifacts",
    "Tool discovery",
    "Differences from frameworks",
    "Build your own"
  ],
  "manual_slop_features_audited": [
    "Context composition (FileItem + ContextPreset + custom_slices + ast_mask)",
    "Discussion Takes + branching (project_manager.branch_discussion + promote_take)",
    "UI Snapshot history (HistoryManager + UISnapshot)",
    "Personas (Persona + PersonaManager)",
    "RAG (RAGEngine + ChromaDB + summarization)",
    "Multi-provider AI client (ai_client + 5 providers)",
    "MMA conductor (mma_exec.py + ConductorEngine + WorkerPool)",
    "MCP tools (45 tools + 3-layer security)",
    "Hook API (api_hooks + api_hook_client)",
    "GUI App/Controller state delegation"
  ],
  "user_corrections_applied": [
    "Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS)",
    "Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION",
    "Sub-conversations: removed 'PARITY stronger' claim; added 'GAP for 1:1 discussions'",
    "RAG: clarified as opt-in, not gap; user wants pre-staging via sub-conversation",
    "Personas: reframed as config bundling (not gap; can opt out via AI settings)",
    "Tool discovery: downgraded to 'intentional, low priority'; user has deferred DSL idea",
    "Editable discussions (second pass): report §3 now enumerates the full per-entry (A1-A7) + discussion-level (B1-B11) + undo/redo (C1-C5) operation matrix. Verdict remains PARITY (DIFFERENT FOCUS) but the gap is more precisely scoped: Manual Slop's editing is more granular at the typed-entry layer; nagent's is deeper at the raw-transcript layer."
  ],
  "domain_classification": {
    "Application_domain_pitfalls": [
      "Provider-specific history in process globals",
      "AI client is a stateful singleton with module-level globals",
      "No non-MMA disposable sub-conversations (1:1 gap)",
      "RAG is not 'history as data' (fuzzy vs exact)",
      "Optional raw-transcript persistence (niche)"
    ],
    "Meta_Tooling_domain_pitfalls": [
      "No structured output protocol (opaque function calling)",
      "Hard-coded tool discovery"
    ],
    "Application_features": [
      "Context composition with FileItem-level curation memory",
      "Discussion Takes + branching (project_manager.branch_discussion + promote_take)",
      "UI Snapshot history (HistoryManager + UISnapshot)",
      "Personas as config bundling",
      "RAG as opt-in semantic search",
      "3-layer MCP security model + Execution Clutch"
    ],
    "Meta_Tooling_features_to_borrow": [
      "nagent-style --description self-describing executables",
      "Intent-based DSL for compact tool calls"
    ]
  },
  "verification_criteria": [
    "spec.md exists and covers the 14 nagent principles",
    "report.md exists and is the primary deliverable",
    "comparison_table.md exists as flat side-by-side reference",
    "decisions.md exists with 10 future-track candidates",
    "nagent_takeaways_20260608.md exists with 10 actionable patterns (companion to report.md)",
    "Every pitfall is tagged with Application / Meta-Tooling / Both",
    "Pitfall #3 (conversations are editable) verdict is corrected to PARITY (DIFFERENT FOCUS) per user feedback",
    "Pitfall #6 (per-file memory) verdict is corrected to 'Manual Slop is stronger in curation dimension' per user feedback",
    "Pitfall #9 (sub-conversations) verdict notes MMA vs 1:1 distinction per user feedback",
    "Report §3 enumerates the per-entry (A1-A7) + discussion-level (B1-B11) + undo/redo (C1-C5) operation matrix for Manual Slop's editable-discussion system, with file:line citations into gui_2.py and history.py",
    "nagent_takeaways_20260608.md grounds each pattern in actual code with file:line references into both nagent source and Manual Slop source",
    "No code was modified by this track (reference/analysis only)"
  ],
  "links": {
    "report": "report.md",
    "comparison_table": "comparison_table.md",
    "decisions": "decisions.md",
    "takeaways": "nagent_takeaways_20260608.md",
    "user_signal_recorded": "User explicitly flagged SubConversationRunner + RAG pre-staging as wants during review",
    "related_tracks": [
      "data_oriented_error_handling_20260606 (Fleury/Acton alignment)",
      "qwen_llama_grok_integration_20260606 (OpenAI-compatible helper)",
      "mcp_architecture_refactor_20260606 (sub-MCP extraction)",
      "data_structure_strengthening_20260606 (type aliases)"
    ],
    "external": [
      "https://github.com/macton/nagent (nagent source code)",
      "https://github.com/macton/nagent/blob/main/README.md (nagent README)"
    ]
  }
 }
@@ -0,0 +1,363 @@
 # nagent: Actionable Takeaways for Manual Slop
 **Track:** `nagent_review_20260608`
 **Date:** 2026-06-08
 **Companion to:** `report.md` (deep-dive comparison), `comparison_table.md` (flat reference), `decisions.md` (10 future-track candidates)
 **Author:** Tier 2 Tech Lead
 **Read this if:** you're planning a future track, designing a UX change, or wondering "what should we actually do with nagent's ideas?"
 > **What this document is.** The deep-dive in `report.md` maps nagent's 14 principles 1:1 to Manual Slop's existing features and finds six pitfalls. That's the *diagnosis*. This document is the *prescription* — 10 concrete patterns nagent uses that we can borrow, with each one grounded in actual code we've read and an explicit "what to do" path.
 >
 > **What this document is not.** It is not a critique of Manual Slop, not a recommendation to rewrite anything, and not a "framework migration" plan. nagent is a 4,000-line reference; Manual Slop is 13,000+ lines of production code with a GUI, real persistence, real HITL. The right reaction to nagent is *steal the patterns that fit our domain*, not adopt the whole system.
 >
 > **Domain filter.** Every takeaway below is tagged **Application**, **Meta-Tooling**, or **Both** — per `docs/guide_meta_boundary.md`. nagent lives in the Meta-Tooling domain by default. Some patterns transfer cleanly to the Application; some only make sense for the agents that build the Application. Don't apply a "Both" pattern without checking the domain.
 ---
 ## 0. The 30-second version
 If you only read 3 things, read these:
 1. **Make state visible at the right layer** (§1) — nagent puts state in files you can `cat`. Manual Slop already does this for *editable* state (`disc_entries`, `ContextPreset`, `FileItem`, project TOML) but the *provider-side* history still lives in process globals. *Steal the visibility, not the file abstraction.*
 2. **Make the protocol readable in the conversation log** (§2) — nagent's conversation is plain text with `<nagent-shell>...</nagent-shell>` tags you can grep. Manual Slop's comms log is JSON-L with provider-native function-call blobs. *Add a "what the model actually said" projection layer.*
 3. **Make sub-agents a first-class primitive for the Application, not just MMA** (§3) — nagent has one sub-conversation mechanism, used everywhere. Manual Slop has sub-agents for MMA workers but not for 1:1 discussions. *The user explicitly wants this — it's the highest-priority future track.*
 The other 7 patterns are below. Each is grounded in code, not vibes.
 ---
 ## 1. State visibility — files for the things that matter, processes for the things that don't
 **nagent's pattern.** Every piece of state that *survives* lives in a file under `~/.nagent/`:
 - `conversations/<conversation_name>` — the conversation transcript
 - `conversations/file-index-{pid}.json` — file_id → conversation map
 - `splits/<slug>-<uuid>/index.json` — large-file split metadata
 - `splits/<slug>-<uuid>/<slug>-0001.<ext>` — segment files
 - `splits/<slug>-<uuid>/<slug>.patch` — unified diff patch
 The state that *doesn't survive* is the running process: LLM call result, current turn, parse state. The boundary is sharp: anything the user might want to inspect, diff, copy, or back up is a file.
 **Manual Slop today.** Already does this for the *editable* surface:
 - `manual_slop.toml` (project) — `discussion.discussions[<take_name>].history` (`app_controller.py:3236`)
 - `conductor/tracks/<id>/{spec,plan,state.toml,metadata.json}` — track state
 - `personas.toml` (global + project) — persona config
 - `tool_presets.toml` — tool weights
 - `logs/sessions/<session_id>/comms.log` — JSON-L of every LLM call (`app_controller.py:379`)
 What *isn't* in files:
 - `ai_client._anthropic_history`, `_deepseek_history`, `_minimax_history` — 3 per-provider lists in process globals (`ai_client.py:123-132`)
 - The current `disc_entries[i]["content"]` AI response *before* the user flushes the discussion to TOML
 - The current `files` / `context_files` / `screenshots` until the next `_flush_to_project`
 **Actionable idea.** Add a **"Live State Inspector"** panel in the GUI that shows *all* the state that's currently in process — provider history lengths, current discussion entry count, the actual bytes that haven't been flushed yet, the `ai_client` module globals being read. This is a UX change, not an architecture change. It costs ~200 lines (a panel that reads from `app_controller._get_state_for_inspector()` and renders a tree).
 **Domain:** Both. The Application benefits from "what is the AI actually remembering right now?"; the Meta-Tooling benefits from "did my edit actually flow through to the right state?"
 **Effort:** Small. *Not* a new track — this can be a one-day add-on once the inspector is specced.
 **Cross-references:** Decision candidate #3 (Stateless LLMClient) becomes more attractive once the inspector exists, because you'd have a UI to verify the stateless refactor preserves behavior.
 ---
 ## 2. A readable conversation log — text the user can grep, not just JSON-L
 **nagent's pattern.** The conversation file is plain text. Every action appears as a tag:
 ```
 <nagent-shell>python3 -m unittest discover -s tests -v</nagent-shell>
 <nagent-shell-result>
 exit_code: 0
 stdout: ...
 </nagent-shell-result>
 <nagent-response>All 12 tests pass.</nagent-response>
 ```
 The user can `grep -n "exit_code: [^0]" ~/.nagent/conversations/latest-*` to find all failed shell runs. The user can `git diff` the conversation file. The user can `cp` it to a teammate. The protocol is *the storage format*, not a side channel.
 **Manual Slop today.** `comms.log` is JSON-L with provider-native function-call blobs. To find "did the model call `read_file` with the right path?" you need to load JSON, navigate to the right `function_call` entry, know the provider's schema, and dig out the args. The `function_call` itself is opaque — you can't `grep` for it without understanding the provider's wrapping.
 The `app.disc_entries` GUI display *is* the readable projection — when you look at a discussion in the GUI, you see the user/AI turns. But:
 1. The view is in the GUI only; the underlying `comms.log` is JSON-L.
 2. The thinking trace, tool calls, and tool results are flattened into the entry's `content` field via `thinking_parser.py`. You see the *result* but not the *call* unless you open the read mode.
 3. There's no per-tool-call "View raw" button in the comms log panel (per `docs/guide_gui_2.md`).
 **Actionable idea — option A (small, UI-only).** Add a **"Reveal Raw"** toggle on the comms log panel that, when on, shows the JSON-L entry *next to* the rendered view, with the JSON pretty-printed. The user can copy either the rendered text or the raw JSON. ~100 lines.
 **Actionable idea — option B (medium, behavioral).** Project the conversation log into a sibling markdown file as it's written. Every `comms.log` entry gets a corresponding `<session_id>.md` line that says "model called `read_file('src/foo.py')` at <ts>." The user can `cat`, `grep`, or `tail -f` this file. The GUI reads from the same source of truth (the markdown) instead of from the JSON-L. ~300 lines + a streaming write hook in `ai_client`.
 **Domain:** Both. Option A is UI work in the Application. Option B benefits the Meta-Tooling more — an external agent that needs to understand what the Application AI did can read the markdown without parsing JSON-L.
 **Effort:** A is small. B is medium. **Pick A first**; the user-correction in `report.md §3` shows the user is already on top of editable-discussion nuance, so a small UX win here validates the larger bet.
 **Cross-references:** Decision candidate #6 (git-history injection) — the markdown projection is the same kind of "explicit data artifact for the AI's input/output" pattern, just for the comms log instead of git history.
 ---
 ## 3. Sub-agents as a first-class primitive for 1:1 discussions
 **nagent's pattern.** The `<nagent-conversation>` tag in `bin/nagent:execute_agent(...)` is the *only* sub-agent mechanism. Used everywhere: investigation, research, large-output work, debugging. The child is a fresh process with `Invocation = "delegated"`, an isolated conversation file, and a `<nagent-conversation-result>` tag returned to the parent with the child's exit code + output + stderr + token totals.
 **Manual Slop today.** Sub-agents exist for MMA:
 - `scripts/mma_exec.py` — Tier 3/4 worker subprocess
 - `src/multi_agent_conductor.py:run_worker_lifecycle` — worker lifecycle
 - `src/dag_engine.py` — ticket DAG and per-ticket worker pool
 But for 1:1 discussions (`simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`), there's no sub-agent primitive. The user types a prompt, the AI responds, the loop continues. If the user wants the AI to "investigate this file" or "look up this API," the answer has to come from the same conversation.
 **Why it matters.** The MMA pattern is *already* the prototype. `mma_exec.py` is a real subprocess with Context Amnesia and a clean prompt boundary. The only thing missing is a way to invoke it from the 1:1 chat loop without going through the full MMA tier system.
 **Actionable idea.** Build `src/sub_conversation.py:SubConversationRunner` (Decision candidate #1, already specced in `decisions.md`):
 ```python
 class SubConversationRunner:
    async def spawn(
        self,
        prompt: str,
        *,
        allowed_tools: list[str] | None = None,
        system_prompt: str | None = None,
        timeout_s: int = 120,
    ) -> SubConversationResult:
        # Reuse mma_exec.py as the subprocess template
        # Return the child's <nagent-response> content + token usage
        ...
 ```
 Wire it into the GUI as a new "Investigate…" button on the message panel (`gui_2.py:4513+`). The button opens a small modal: "Ask a sub-agent: ___ [Investigate]". The sub-agent runs, the result is inserted as a "User" role entry in the current discussion, and the next LLM call sees it.
 **Domain:** Application. (The Meta-Tooling could use the same primitive from `scripts/`, but the win is in the App.)
 **Effort:** Medium. 2-3 phases. **HIGH priority** because the user explicitly wants it.
 **Cross-references:** Decision candidate #2 (RAG pre-staging) is the natural second use of this primitive — a sub-conversation that pre-builds the RAG index before a long discussion.
 ---
 ## 4. File-identity over file-path — a stable `st_dev:st_ino` is rename-safe
 **nagent's pattern.** `nagent_file_edit_lib.py:file_id_for_path(path) -> "{st_dev}:{st_ino}"`. The per-file conversation index keys by inode, not by path. Rename the file in place (same inode) → same conversation. Move the file across dirs (same inode) → same conversation. This is the right primitive for "memory attached to the artifact, not the path."
 **Manual Slop today.** `models.FileItem.path: str` — path-keyed. `project.discussion.discussions[<take>].context_snapshot` is a list of `FileItem.to_dict()` dicts, indexed by position in the list. Rename the file in your editor → `FileItem.path` is stale, `aggregate.py:build_file_items` re-reads the old path, may fail. The curation memory *survives* the rename (it's keyed by name in the project TOML) but the file lookup at render time does not.
 **Actionable idea — small (additive).** Add a `file_id: str` field to `FileItem` populated at load time via `os.stat(path).st_dev:st_ino`. Use it as the lookup key in the `context_snapshot` list. On file-read failure, attempt a fuzzy match: same basename in the same directory tree, or same `file_id` under a new path. ~150 lines + a migration for existing project TOML files (path-only becomes path + file_id).
 **Actionable idea — bigger (architectural).** If you do this, also rethink the `ContextPreset` storage. The current schema is a flat list of `FileItem` dicts. nagent's analog is a per-file `IndexEntry { file_id, path, last_seen, conversation, last_summary }`. A path rename in nagent updates `path` in the index but leaves `file_id` stable; in Manual Slop a path rename would orphan the entire `FileItem`.
 **Domain:** Application. (The Meta-Tooling would benefit from a stable file_id when navigating references across many files in a long session.)
 **Effort:** Small (additive) or medium (architectural). The additive path is the right starting point; the architectural rewrite is overkill for a feature that already works for 95% of cases.
 **Cross-references:** Decision candidate #7 (per-file conversation log) — `file_id` is the prerequisite for this candidate.
 ---
 ## 5. One loop, one file — make the agent's brain visible by default
 **nagent's pattern.** `bin/nagent:run_agent_loop` is ~50 lines. `main()` reads CLI args, sets up the conversation file, calls `run_agent_loop`, exits. The conversation file accumulates over the entire session. The "agent" *is* the file plus a transient process.
 **Manual Slop today.** Three parallel loops, each in a different file:
 - `src/ai_client.py:_send_<provider>` (per-provider, ~100-200 lines each × 5 providers) — the LLM-call loop
 - `src/multi_agent_conductor.py:ConductorEngine.run` — the MMA loop
 - `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async` — the 1:1 chat loop
 Each loop has the same shape (build prompt → call LLM → parse response → dispatch tools → repeat) but the data structures differ. A reader has to hold three mental models.
 **Actionable idea — UX win, not architecture change.** Surface the *unified loop shape* in the diagnostics panel. The diagnostics panel already exists (`gui_2.py` §"Diagnostics Hub" per the Readme). Add a section "Loop Inspector" that shows, for each of the three loops:
 - Last N iterations of: input tokens, output tokens, tool calls made, tool results, parse failures
 - Color-coded: same shape across all three loops, different data sources
 - "View raw" drill-down to the actual function call
 This is *not* a refactor. It's making the existing three loops legible. ~200 lines.
 **Actionable idea — bigger refactor.** Extract a `src/llm_loop.py:run_loop(conversation, provider, tool_dispatch, parse_response, ...)` that's called by all three. This is Decision candidate #5.5 (not in `decisions.md`; would be a new candidate). Effort: large. Value: real but the current separation is readable.
 **Domain:** Both. The UX win is in the Application. The refactor is neutral but helps the Meta-Tooling when agents need to reason about the loop.
 **Effort:** UX win is small. Refactor is large. **Do the UX win first.**
 **Cross-references:** Decision candidate #3 (Stateless LLMClient) — the refactor becomes more attractive if a unified loop exposes the data flow more clearly.
 ---
 ## 6. Visible retry on protocol failure — turn errors into conversation data
 **nagent's pattern.** `bin/nagent:run_agent_loop` has `MAX_FORMAT_RETRIES = 3`. On a parse failure:
 ```python
 append_to_conversation(
    conversation_file,
    f"<agent-response>\n{llm_output}\n</agent-response>\n"
    f"<system>Invalid nagent response format: {parse_error}. "
    f"Respond only with valid nagent tags.</system>",
 )
 ```
 The bad output is *appended to the conversation* with a `<system>` correction. The next call sees its own previous failure and the correction message. The user can `grep` the conversation for `<system>` to find every retry.
 **Manual Slop today.** `_send_<provider>` loops internally; on a tool-call parse failure it... retries. But the failure isn't visible in `comms.log` as a first-class entry — it's swallowed by the loop. The `tier4_qa` interceptor (per `docs/guide_ai_client.md` §"Tier 4 QA") catches *errors from tool execution* and forwards them to a cheap sub-agent for a 20-word summary, but parse failures don't go through this path.
 **Actionable idea — small, high value.** Add a `parse_failures` counter and a "Last 5 parse failures" section to the diagnostics panel. The counter increments on each `parse_response` failure; the section shows the model output, the error message, and the time. ~50 lines. The user gets to see *what* the model is getting wrong — useful for prompt engineering.
 **Actionable idea — medium, prompt-quality win.** When a parse failure happens, append a "self-correction" entry to `disc_entries` as a `role: "System"` entry. The next AI call sees the correction in the visible discussion history. The user can see the corrections and can edit them. ~150 lines.
 **Domain:** Both. The diagnostics panel is Application UX. The self-correction entry is neutral — useful for any agent that reads `disc_entries`.
 **Effort:** Small for option 1. Medium for option 2. **Do option 1 first.**
 **Cross-references:** nagent §5 "The loop" — the retry visibility is a load-bearing part of nagent's debuggability claim.
 ---
 ## 7. "Inspect this file" / "Read this URL" as *prompts*, not function calls
 **nagent's pattern.** `<nagent-read path="..."/>` is a self-closing tag. The model emits it; the parser matches; `execute_read` runs. The model doesn't need to know the function-call schema for the LLM SDK — it just needs to emit text containing a tag.
 **Manual Slop today.** `read_file(path)` is a function call. The model has to know the function signature, format the JSON, embed it in the right `tool_use` block. The training data for "emit a `<nagent-read>` tag" is zero; the training data for "emit a `read_file` tool call" is high. *Function calling wins on capability and on training*; *tag protocols win on debuggability*.
 **Actionable idea — both, but in different places.** This is the *one* place where the existing reports lean toward "different mechanism, both right." Don't replace the Application's function calling. But for the Meta-Tooling, document a *Meta-Tooling DSL* in `conductor/code_styleguides/` for use by external agents when they need to invoke Manual Slop's tools via the bridge script. The DSL would look like:
 ```
 <ms-tool name="read_file" path="src/foo.py" />
 <ms-tool name="py_get_skeleton" path="src/foo.py" symbol="MyClass" />
 ```
 The bridge script (`scripts/mma_exec.py` or whatever the Meta-Tooling bridge is) translates these to the underlying function calls. The external agent's prompt training data does *not* need to know the function-calling JSON schema for every Manual Slop tool — it just needs to know the DSL.
 **This is Decision candidate #4 (intent-based DSL) from `decisions.md`** — but reframed: it's not a Meta-Tooling-*side* DSL, it's a *bridge* DSL. The Application's function-calling stays.
 **Domain:** Meta-Tooling. The Application doesn't need this.
 **Effort:** Research spike, per the user's own assessment: "no where near that ideation yet." Document the design space; don't build it.
 **Cross-references:** Decision candidate #4. Also nagent §12 (tool discovery) — the DSL would be the bridge-side analog of `--description` self-describing executables.
 ---
 ## 8. Self-describing tools — let the tool tell the agent what it does
 **nagent's pattern.** `nagent_cli.py:exit_on_description(description)` is called at the top of every executable:
 ```python
 def exit_on_description(description: str) -> None:
    if "--description" in sys.argv:
        print(description)
        raise SystemExit(0)
 ```
 `nagent_cli.py:collect_bin_tool_descriptions(bin_dir)` runs each tool in `bin/` with `--description`, captures stdout, concatenates. The startup prompt includes the concatenated descriptions automatically. *Adding a new tool is: drop a script, write a description.* The system auto-discovers.
 **Manual Slop today.** `src/mcp_client.py:dispatch(...)` is a flat if/elif chain with 45+ branches. Adding a tool requires:
 1. Edit `dispatch()` to add the branch
 2. Update the security allowlist in `_resolve_and_check` (if filesystem access)
 3. Update the AI capability declaration in `get_tool_schemas()`
 4. Add tests
 **Actionable idea — defer to `mcp_architecture_refactor_20260606`.** This is already on the board as Decision candidate #5 (subsumed). The "sub-MCP" extraction that the refactor proposes is *exactly* the right scope for the self-describing pattern — each sub-MCP is a self-contained module with its own tool registry, and `collect_tool_descriptions` becomes a method on the sub-MCP class.
 **Don't** try to add this incrementally. The dispatch chain is large enough that half-measures (e.g. a per-tool decorator that auto-registers but still requires a manual allowlist edit) are net-negative. Wait for the refactor.
 **Domain:** Both. (Largely Application — the dispatch is in `mcp_client.py`. But the pattern would also be useful for the Meta-Tooling's `scripts/` directory.)
 **Effort:** Subsumed by `mcp_architecture_refactor_20260606`.
 **Cross-references:** Decision candidate #5. Already documented.
 ---
 ## 9. Edit-the-input, not the output — make the prompt the artifact
 **nagent's claim (verbatim from README).** *"Don't edit the output artifacts. Edit the prompt."* If the LLM gives a bad answer, the fix is in the prompt or the inputs — not by hand-patching the output. The conversation file *contains* the prompt. Editing the conversation is editing the prompt for the next turn.
 **Manual Slop today.** The user can edit any `disc_entries[i]["content"]` directly via the `[Edit]` mode in the GUI (per `report.md §3 A1`). But the edited entry goes into the *abstracted entry list*, not into the *raw provider history*. The next LLM call sees:
 - The full `disc_entries` rendered as markdown (with the user's edits)
 - BUT the `ai_client._anthropic_history` (and siblings) is the *raw* provider-side list, with the *original* AI response and the *original* function calls
 So the user edits the *projection* but not the *source*. If the user corrects an AI response that included a bad tool call, the *display* shows the correction but the *provider's next call* will replay the original bad tool call as a "previous tool result" in the history. The two diverge.
 **This is subtle but important.** nagent avoids this entirely because the conversation file *is* the prompt — there's no separate "raw provider history" to keep in sync.
 **Actionable idea — small, surgical.** When the user edits an entry's `content` in `[Edit]` mode, *also* rewrite the corresponding `ai_client._<provider>_history[i]["content"]` to match. The user sees one source of truth; the provider sees the same source of truth. ~100 lines + a careful test for Anthropic's content-block semantics (it has multiple content blocks per message, not a single string).
 **Actionable idea — bigger, the right architecture.** Stop maintaining two histories. Make `disc_entries` the *only* history. `ai_client._<provider>_history` becomes a *projection* of `disc_entries`, rebuilt on each send(). This is part of Decision candidate #3 (Stateless LLMClient) — the `Conversation` object becomes the single source of truth.
 **Domain:** Both. The edit-the-projection fix is Application UX. The single-history architecture is Application + (benefiting) Meta-Tooling.
 **Effort:** Small for option 1, large for option 2. **Option 1 is the right starting point** — it's a known issue with a known fix, and the user-correction in `report.md §3` shows the user is on top of editable-discussion nuance.
 **Cross-references:** Decision candidate #3 (Stateless LLMClient). Also nagent §3 (conversations are editable state) — the philosophy is "one editable source of truth," and Manual Slop currently has two.
 ---
 ## 10. Sub-agents return a *concise artifact*, not a full transcript
 **nagent's pattern.** `<nagent-conversation-result conversation="..." tokens_in="..." tokens_out="...">` contains only the child's `<nagent-response>` body + exit code + stderr. The parent's conversation is *not* polluted with the child's intermediate reads, shell calls, or retries. The parent gets a *distilled* result.
 **Manual Slop today (MMA path).** `multi_agent_conductor.py` returns the worker's final response to the parent (the `ConductorEngine`). The worker's intermediate steps are logged to `comms.log` but not propagated. So MMA *does* follow the nagent pattern for sub-agent outputs. *This is good.*
 **Manual Slop today (1:1 chat, no sub-agents).** No equivalent. The user can't ask a sub-agent and get a distilled answer. The whole point of the user-flagged Decision candidate #1 is to add this — and the implementation should follow nagent's pattern: the sub-agent returns a *string artifact*, not its full conversation log.
 **Actionable idea — design constraint on the upcoming track.** When implementing Decision candidate #1 (SubConversationRunner), specify the return type as `SubConversationResult { artifact: str, tokens_in: int, tokens_out: int, exit_code: int, errors: list[str] }`. Do *not* return the child's full conversation. The parent's `disc_entries` gets one new "User" entry containing `artifact`. The child's full transcript is persisted to `~/.manual_slop/sub_conversations/<uuid>.jsonl` for debugging but is not in the parent's visible discussion.
 **Domain:** Application (this is the design constraint for candidate #1).
 **Effort:** Zero net new effort — this is a design constraint, not a feature. Bake it into the spec for candidate #1.
 **Cross-references:** Decision candidate #1. nagent §9 (sub-conversations). The `MAX_FORMAT_RETRIES = 3` retry budget in nagent also informs the design — the sub-agent should be allowed to retry internally, but its final artifact to the parent should be a single string.
 ---
 ## Cross-cutting observations (not patterns, but framing)
 ### A. nagent's "files are the system" is the same philosophy as Manual Slop's project TOML + conductor tracks
 The *philosophy* of nagent — that data lives in files you can `cat`, `git diff`, and `cp` — is already present in Manual Slop:
 - `manual_slop.toml` is the project's source of truth
 - `conductor/tracks/<id>/state.toml` is the track's state
 - `personas.toml`, `tool_presets.toml`, `context_presets.toml` are all TOML
 - The Hook API exposes this state via `POST /api/project` for external automation
 What's *not* yet at that level: the AI's working state (the in-flight `disc_entries`, the provider history globals). Closing this gap is the theme of Decision candidates #3, #7, and #10.
 ### B. nagent is small because it has no GUI. Don't be jealous of the size.
 nagent: ~4,000 lines. Manual Slop: 13,000+ lines of production code + 5,000+ lines of MCP tools + a 5,000-line GUI. The size difference is the GUI, the persistence, the test harness, the HITL dialogs, and the Hook API. None of those are reducible by adopting nagent's patterns; they're features Manual Slop users want and use. The right comparison is "nagent's *patterns* vs Manual Slop's *implementation*," not "which codebase is smaller."
 ### C. The user-corrections shaped the takeaways
 Three user-corrections during the deep-dive review directly influenced which patterns made this list:
 - **"Editable discussions are more comprehensive than the first draft said"** → made takeaway #1, #2, #9 (visibility, log readability, single-history) all about *respecting* what Manual Slop already has rather than suggesting it lacks.
 - **"MMA is fine; 1:1 sub-agents are the gap"** → made takeaway #3 (sub-agents for 1:1) the highest-priority actionable item, with #10 (sub-agent return type) as the design constraint.
 - **"Personas are config bundling, RAG is opt-in, tool discovery is deferred"** → kept those three out of the "must steal" list. They're in the future-track `decisions.md` but not in *this* document.
 The takeaways are *user-shaped* as well as nagent-shaped. If the user had a different correction in any of those areas, the takeaway list would shift.
 ---
 ## Recommended reading order for a future implementer
 If you're about to build one of the future tracks, read in this order:
 1. **Track 1 — Sub-conversation runner (Application):** Read this entire document, especially §3 and §10. Then read `decisions.md` candidate #1. Then read `src/multi_agent_conductor.py:run_worker_lifecycle` and `scripts/mma_exec.py` for the template.
 2. **Track 2 — RAG pre-staging (Application):** Read this entire document, especially §3 (the parent). Then read `decisions.md` candidate #2. Then read `src/rag_engine.py:index_file` and `docs/guide_rag.md`.
 3. **Track 3 — Stateless LLMClient (Application, big refactor):** Read this entire document, especially §1, §5, #6, #9. Then read `decisions.md` candidate #3. Then read `src/ai_client.py:113-135` (the provider globals) and `src/history.py` (the UISnapshot pattern). Then read `docs/guide_ai_client.md` end-to-end.
 4. **Track 4 — Meta-Tooling intent DSL (Meta-Tooling, research):** Read this entire document, especially §7. Then read `decisions.md` candidate #4. Then read `bin/nagent:parse_response` and the 8 tag patterns there. Then read `src/commands.py` and `src/command_palette.py` to see Manual Slop's existing command-DSL precedents.
 5. **Track 5 — Self-describing MCP tools (subsumed):** Read this entire document, especially §8. Then read the existing `mcp_architecture_refactor_20260606` spec.
 6. **Track 6 — Git history injection (Application, medium):** Read this entire document, especially #1 and #4 (file identity). Then read `decisions.md` candidate #6. Then read `bin/nagent:format_file_history` and `bin/nagent:coedited_file_rows` for the reference implementation. Then read `src/aggregate.py:run` for the insertion point in Manual Slop.
 7. **Track 7 — Per-file conversation log (Application, small):** Read this entire document, especially #1, #4, and #9. Then read `decisions.md` candidate #7. This is dependent on candidate #4 (file_id) — read takeaway #4 first.
 8. **Track 8 — Co-edited files tools (Application, small):** Read this entire document, especially §6 and #8. Then read `decisions.md` candidate #8. This is dependent on candidate #6 (git history) — read takeaway #6's reference impl first.
 9. **Track 9 — Split/patch lib (defer until need):** Read this entire document, especially #5 (unified loop). Then read `decisions.md` candidate #9. Then read `bin/helpers/nagent_file_split_lib.py` and `bin/helpers/nagent_file_patch_lib.py` for the reference implementation. This is *not* a near-term need; only build when a very-large-file scenario actually surfaces.
 10. **Track 10 — Raw-transcript persistence per Take (Application, small):** Read this entire document, especially §1, §2, and §9. Then read `decisions.md` candidate #10. This is dependent on candidate #3 (single history) — read takeaway #9 first.
 ---
 ## Final note: this is a *reference* track
 This document does not commit any of the 10 takeaways to implementation. Each is a *candidate* — a design space, not a decision. The user (the product owner) and the Tier 2 Tech Lead will scope each into a real conductor track when the corresponding need surfaces. The fact that these patterns are *all grounded in code I've read* (nagent + Manual Slop) is the value of this document; the patterns themselves are *raw material for future work*, not commitments.
 End of takeaways document.
@@ -0,0 +1,571 @@
 # Mike Acton's nagent: A Deep-Dive Analysis vs Manual Slop
 **Track:** `nagent_review_20260608`
 **Date:** 2026-06-08 (revised with user corrections same day)
 **Author:** Tier 2 Tech Lead (with significant user review on §3 and §6)
 **Companion to:** `spec.md` (the track wrapper)
 > **Important reading note.** This report applies the **Application vs Meta-Tooling distinction** (per `docs/guide_meta_boundary.md`) as the lens for every comparison. nagent is a Meta-Tooling reference; Manual Slop's Application AI is a *different kind of thing*. Where they share patterns (MMA workers, the tool-call loop, the 3-layer security model), the report says so. Where they don't, the report says so. The report deliberately avoids "nagent is better" / "Manual Slop is better" framings.
 >
 > **Revision note.** The first draft overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features. The user caught this and pointed at the actual files (`FileItem`, `ContextPreset`, `aggregate.py`, `project_manager.branch_discussion`, `HistoryManager`). The corrections are now folded in. Specific corrections: §3 (verdict changed from PARTIAL to **PARITY (DIFFERENT FOCUS)**); §6 (verdict changed from DOMAIN MISMATCH to **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION**); §9 (verdict now notes the MMA vs 1:1 distinction explicitly per the user).
 ---
 ## 0. Reading guide
 - **Sections 1-14** map 1:1 to nagent's 14 principles. Each has: nagent's claim, nagent's implementation, Manual Slop's equivalent, a verdict, and a domain tag.
 - **Section 15** extracts the 6 actionable pitfalls and maps each to a future-track candidate.
 - **Section 16** is the recommended reading path for engineers who haven't read nagent.
 If you only have 10 minutes, read §3 (Conversations), §6 (Per-File Memory), §9 (Sub-Conversations), §10 (Controlled Writes), and §15 (the pitfalls list).
 ---
 ## 1. Durable work, disposable workers
 **nagent's claim.** A Python process is a *worker*; the files are the *system*. Workers come and go; data stays. **"The agent is not the thing; the data is the thing."**
 **nagent's implementation.** `bin/nagent` is a 700-line single-file loop. It reads `~/.nagent/conversations/<conversation_name>` (a plain text file) for the current conversation, appends to it after every action, and exits. The user types `nagent "investigate this"`. The CLI is a shell. The state is a file.
 **Manual Slop's equivalent.** Manual Slop has two parallel systems:
 1. **MMA workers are real subprocesses.** `multi_agent_conductor._spawn_worker` runs `mma_exec.py` via `subprocess.Popen` (per `docs/guide_multi_agent_conductor.md` §"Token Firewalling"). Each Tier 3 worker is a fresh Python process with **Context Amnesia** — `ai_client.reset_session()` at the start of `run_worker_lifecycle`. The subprocess is the disposable worker; the artifacts (track state, ticket results) are the system.
 2. **The Application AI is *not* a disposable worker.** `gui_2.py:App` is a long-lived Qt/ImGui process. The user types a prompt, hits Enter, gets a response, *keeps the process running for hours*. The `app_state` dataclass is the long-lived worker. This is *intentional* for the Application domain: persona-driven conversations, snapshot-based undo, cross-discussion state — all require a long-running process.
 **Verdict.** **PARTIAL** — nagent's pattern lives in the Meta-Tooling + MMA, but the Application deliberately has long-lived workers. The two coexist because they serve different needs: MMA is fire-and-forget per ticket; App is an interactive partner.
 **Domain tag:** Both. MMA has it; App doesn't need it. *Future-track candidate: a stateless conversation-file pattern for the App (see §15.4).*
 ---
 ## 2. Text in, text out
 **nagent's claim.** The smallest useful primitive is: file in, text out. `nagent-llm-text --file question.txt` reads a file, calls the LLM, prints plain text or JSON. Everything else in nagent is orchestration around this.
 **nagent's implementation.** `bin/helpers/nagent_llm.py` (300 lines) provides `generate_text(message, provider, model) -> str` for 4 providers (openai, anthropic, google, cursor). Token accounting via provider usage metadata (with character-count fallback at 1 token per 4 chars). Provider churn is isolated in this file.
 **Manual Slop's equivalent.** `src/ai_client.py:send(...) -> str` is the parallel. 5 providers (gemini, anthropic, deepseek, minimax, gemini_cli). Same `provider, model, usage` shape. Manual Slop wraps the string in a larger `(md_content, user_message, base_dir, file_items, ..., rag_engine) -> str` because the Application's text-in/text-out also needs tool calls, RAG injection, tier attribution, and patch-mode. But the *primitive* is the same.
 **Verdict.** **PARITY.** nagent and Manual Slop both use text-in/text-out at the bottom. The Application's `send()` is a *strict superset* of nagent's `nagent-llm-text`, with provider churn still isolated to a single module.
 **Domain tag:** Both. Meta-Tooling uses the same primitive via `mma_exec.py`'s `ai_client.send`.
 ---
 ## 3. Conversations are editable state
 **nagent's claim.** The conversation file is not chat history. It is working state. Memory goes stale; therefore let people save, load, summarize, edit, branch, trim, copy, diff, version, and rewrite conversations. **"The conversation does not own its memory. The user does."**
 **nagent's implementation.**
 - `bin/nagent` exposes `--save-conversation <name>`, `--load-conversation <name>`, `--summarize`, `--edit-conversation <prompt>`. The latter **automates** one path: archive current file, run file-edit on the archive, load the result.
 - Conversations are plain text files. The user can `cat`, `vim`, `git diff`, or `cp` them with no special tooling. The `<nagent-response>` body and `<nagent-shell-result>` body are just text in the file.
 - The first draft of this section understated Manual Slop's editing capability. The corrected picture is below.
 **Manual Slop's equivalent (corrected, with the full operation matrix).** Manual Slop's discussion editing lives at **three nested layers**, each with its own operations. The full enumeration:
 **Layer A — Per-entry operations on `app.disc_entries: list[dict]`** (the discussion's typed message list). The renderer is `src/gui_2.py:3770 render_discussion_entry(...)`. Per entry, the user can:
 | # | Operation | GUI control | Source code | What it does |
 |---|---|---|---|---|
 | A1 | **Edit content in place** | `imgui.input_text_multiline` on the entry body | `gui_2.py:3841` | The entry's `content` field is a fully editable multi-line text input. The user can rewrite an AI's response, fix a typo in their own prompt, paste in code from another source, etc. |
 | A2 | **Toggle read/edit mode** | `[Edit]` / `[Read]` button | `gui_2.py:3799` | When in `[Read]` mode, the content is rendered as Markdown with syntax highlighting (`render_discussion_entry_read_mode` at `gui_2.py:3855`). When in `[Edit]` mode, the multi-line text input is shown. |
 | A3 | **Toggle collapsed/expanded** | `+/-` button per entry | `gui_2.py:3789` | Collapsed entries show a 60-char preview (line 3822-3824). Expanded entries show full content. |
 | A4 | **Change role** | Combo box from `app.disc_roles` | `gui_2.py:3793-3796` | The entry's `role` field is editable. The list `app.disc_roles` is itself user-managed (see B5). |
 | A5 | **Insert entry before this one** | `Ins` button | `gui_2.py:3813` | `app.disc_entries.insert(index, {"role": "User", "content": "", "collapsed": True, "ts": project_manager.now_ts()})` |
 | A6 | **Delete this entry** | `Del` button | `gui_2.py:3815-3816` | `if entry in app.disc_entries: app.disc_entries.remove(entry)`. The membership check matters — ImGui can re-render stale state, so the check guards against double-delete. |
 | A7 | **Branch at this entry** | `Branch` button | `gui_2.py:3821` → `app._branch_discussion(index)` → `app_controller._branch_discussion:3503` → `project_manager.branch_discussion:429` | Creates a new Take named `<base>_take_<n>` and copies the history up to and including `index` into the new Take. The user is then switched to the new Take. |
 The entry dict shape itself is open: `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` (for AI entries with `<thinking>` blocks, parsed by `src/thinking_parser.py`) and `usage` (for token accounting: input/output/cache). The user can also set per-entry `read_mode` (a render-time flag, not persisted).
 **Layer B — Discussion-level operations** (the Take / discussion set). These are the second-tier controls, rendered at `src/gui_2.py:4239 render_discussion_entry_controls(...)` and the discussion selector at `gui_2.py:4330 render_discussion_selector(...)`:
 | # | Operation | GUI control | Source code | What it does |
 |---|---|---|---|---|
 | B1 | **Append new entry** | `+ Entry` button | `gui_2.py:4240` | `app.disc_entries.append({...})` with the default role from `app.disc_roles[0]`. |
 | B2 | **Collapse all / Expand all** | `-All` / `+All` buttons | `gui_2.py:4242-4246` | Bulk-set `collapsed` flag on every entry. |
 | B3 | **Clear all** | `Clear All` button | `gui_2.py:4248` | `app.disc_entries.clear()`. |
 | B4 | **Save (flush to project TOML)** | `Save` button | `gui_2.py:4250` | `app._flush_to_project(); app._flush_to_config(); app.save_config()`. |
 | B5 | **Add/remove roles** | `Add` / `X` buttons under "Roles" | `gui_2.py:4317-4328` | `app.disc_roles.append(r)` / `app.disc_roles.pop(i)`. The role list is **user-managed at runtime** — they can add `"Context"`, `"Tool"`, `"Vendor API"`, or any custom role and assign it to any entry. |
 | B6 | **Switch active discussion** | Discussion combo + Take tabs | `gui_2.py:4197, 4344, 4354` | `app._switch_discussion(name)`. The Takes group by base name (`name.split("_take_")[0]`) and render as nested tabs. |
 | B7 | **Rename / Delete discussion** | `Rename` / `Delete` buttons | `gui_2.py:4291, 4293` | `app._rename_discussion(...)` / `app._delete_discussion(...)`. Cannot delete the last discussion (guarded at `app_controller.py:3543`). |
 | B8 | **Promote Take to top-level** | `Promote` button in takes panel | `gui_2.py:4364` | `project_manager.promote_take(app.project, app.active_discussion, new_name)` — renames a Take (e.g. `T0_take_2`) to a fresh top-level discussion name. |
 | B9 | **Per-role filter** | `ui_focus_agent` selector (system-wide) | `gui_2.py:4230-4234` | `display_entries = [e for e in app.disc_entries if e.get("role") == persona_name or e.get("role") == "User"]`. The filter follows the MMA persona focus. |
 | B10 | **Truncate to N pairs** | `Truncate` button + `drag_int` | `gui_2.py:4254-4260` | `truncate_entries(app.disc_entries, app.ui_disc_truncate_pairs)` keeps the last `N` User/AI pairs (per `gui_2.py:175 truncate_entries(...)`). |
 | B11 | **Compress (AI summarization)** | `Compress` button | `gui_2.py:4252` → `app_controller._handle_compress_discussion:3357` | Calls `ai_client.run_discussion_compression(disc_text)` and replaces the discussion with the LLM's compressed version. |
 **Layer C — UI snapshot history (undo/redo).** The `HistoryManager` (`src/history.py:71`, `max_capacity=100`) and `UISnapshot` (`history.py:8-63`) provide Ctrl+Z / Ctrl+Y across the entire UI state — including `disc_entries`:
 | # | Operation | Source code | What it does |
 |---|---|---|---|
 | C1 | **Take snapshot** | `gui_2.py:735 _take_snapshot` → `history.UISnapshot(...)` | `copy.deepcopy(self.disc_entries)` — a deep copy of the full entry list is captured. The snapshot also captures `ai_input`, `temperature`, `top_p`, `max_tokens`, `auto_add_history`, `files`, `context_files`, `screenshots`, all system prompts. |
 | C2 | **Apply snapshot (undo/redo)** | `gui_2.py:754 _apply_snapshot` | Restores `self.disc_entries = snapshot.disc_entries` (and all the other fields). |
 | C3 | **Change detection triggers snapshot** | `gui_2.py:1160, 1166-1167` | `if len(current.disc_entries) != len(self._last_ui_snapshot.disc_entries) or ...` — disc_entries content change pushes a new snapshot. |
 | C4 | **Capacity-evict oldest** | `history.py:80-90 push()` | When the undo stack exceeds 100, the oldest is popped from the front. |
 | C5 | **Jump to specific state** | `history.py:129 jump_to_undo(index, current_state, ...)` | Allows time-traveling to any past snapshot, not just the most recent. |
 **Summary of editability.** Manual Slop provides:
 - **Per-entry content edit** (A1, A2) — the AI's response text is fully editable in the GUI
 - **Per-entry insert at any position** (A5) — the user can drop a new entry *between* two existing entries, not just append
 - **Per-entry delete at any position** (A6)
 - **Per-entry role change** (A4) — the user can re-label any entry as User, AI, Tool, Context, or any custom role
 - **Per-entry branch** (A7) — creates a Take at any entry, not just at the end
 - **Per-entry collapse/expand** (A3) — visual organization
 - **Per-discussion full CRUD** (B1, B6, B7, B8) — append, switch, rename, delete, promote
 - **Per-role set management** (B5) — the role list itself is user-editable
 - **Bulk operations** (B2, B3, B10) — collapse/expand all, clear, truncate
 - **AI-assisted compression** (B11) — summarize the whole discussion
 - **Undo/redo across all of the above** (C1-C5) — Ctrl+Z / Ctrl+Y / jump-to-state
 **What Manual Slop does NOT have.** The user cannot edit the **provider-side raw transcript** — the bytes inside the `ai_client._anthropic_history`, `ai_client._gemini_chat._history`, etc. process globals. These are reset on `ai_client.reset_session()`. nagent's "edit the conversation file" pattern operates at *this* layer, not the entry abstraction. The comms log (`comms.log`) is JSON-L and append-only, not user-editable from the GUI (it can be edited on disk in a text editor, but that's a different workflow).
 **Verdict.** **PARITY (DIFFERENT FOCUS).** Both systems support comprehensive editing of the conversation-as-data. The difference is *what counts as "the conversation"*:
 - nagent's "conversation" = the raw transcript text file (the bytes the LLM produced)
 - Manual Slop's "conversation" = a typed entry list with role + content + metadata + optional thinking segments
 Manual Slop's editing is **more granular and more pervasive** (per-entry content edit, per-entry insert/delete, per-entry role-change, per-entry branch, with undo/redo). nagent's editing is **deeper at the raw transcript layer** (edit the actual AI response text before it's been abstracted into a typed entry). Both are real; both are deliberate.
 **Domain tag:** Application. The Application's typed-entry abstraction is intentional — the user thinks in "discussions" not "transcripts." The user can opt-in to the raw-transcript layer by editing `comms.log` on disk or by reading the TOML `discussions/<take_name>/history` field directly.
 *Future-track candidate: optionally persist the raw transcript as a sibling file under each take (Candidate 10 in `decisions.md`), enabling the nagent-style "edit the actual AI response" workflow for users who want it.*
 ---
 ## 4. Visible output protocol
 **nagent's claim.** Free-form model output is hard to execute. Use a visible protocol: `<nagent-read>`, `<nagent-file-read>`, `<nagent-shell>`, `<nagent-write>`, etc. The startup prompt lists the only tags the model may emit. The parser is strict: recognized tags and whitespace. Nothing else. **"If you cannot read the protocol, you cannot debug the system."**
 **nagent's implementation.** `bin/nagent:TAG_PATTERNS` is a list of `(tag_type, compiled_regex)` tuples. `parse_response()` returns `None, error` if any non-whitespace text is found outside a known tag. The error message is appended to the conversation and the model is asked to retry (up to `MAX_FORMAT_RETRIES = 3`).
 **Manual Slop's equivalent.** Manual Slop's Application AI uses **provider-native function calling** (Gemini `genai.types.FunctionDeclaration`, Anthropic `tool_use` blocks, etc.). This is *opaque*: the protocol is encoded in JSON the provider parses. The user cannot read a `function_call` from the comms log and reason about it without knowing the provider's schema.
 The two approaches are **structurally different**:
 | Aspect | nagent regex tags | Manual Slop function calling |
 |---|---|---|
 | Visibility | Plain text, inspectable in the conversation file | JSON blobs in provider-specific format |
 | Per-provider portability | Same tags work across all 4 providers | Each provider has its own schema; mcp_client's 45 tools have 5 different per-provider formats |
 | Provider capability ceiling | Whatever the model can emit as text | Native parallel tool calls, structured outputs, JSON-mode constraints |
 | Debuggability | "Why didn't the model read the file?" → grep the conversation for the tag | "Why didn't the model call read_file?" → inspect the JSON response |
 **Verdict.** **ARCHITECTURAL DIFFERENCE** — both are correct for their domain. The Application *wants* parallel tool calls, JSON-mode constraints, and provider-side caching. The Meta-Tooling *might want* nagent's regex tags for explicit debuggability.
 **Domain tag:** Both. The Application's choice is right (modern providers all support function calling with parallel execution — see `docs/guide_ai_client.md` §"Async Tool Execution"). The Meta-Tooling *could* adopt nagent's regex-tag protocol for its own work — for example, by using `<read src/foo.py>` instead of a tool-call JSON. This is explicitly the difference between the "Application's internal AI" and the "Meta-Tooling that builds the Application" in `docs/guide_meta_boundary.md`.
 *Future-track candidate: a Meta-Tooling-side DSL for compact tool calls (per the existing `docs/reports/PLANNING_DIGEST_20260606.md` reference to "an intent-based DSL" for "discovery" or "combinatorics").*
 ---
 ## 5. The loop (append, call, parse, act, append, repeat)
 **nagent's claim.** "Agent behavior" is mostly: append, call, parse, act, append, repeat. Heavier systems add infrastructure around the same steps.
 **nagent's implementation.** `bin/nagent:run_agent_loop` is a `while True` loop:
 1. Append user prompt to conversation file
 2. Send conversation file to LLM (via `nagent-llm-text --json`)
 3. Append response to conversation file
 4. If response contains action tags: run those actions, append results, continue loop
 5. If response contains `<nagent-response>`: print and stop
 **Manual Slop's equivalent.** Manual Slop has *three* parallel "loops":
 1. **`src/ai_client.py:_send_<provider>`** — the per-provider tool-call loop. Up to `MAX_TOOL_ROUNDS + 2 = 12` iterations. Each round: call provider, parse function calls, dispatch, append tool results. Same shape as nagent.
 2. **`src/multi_agent_conductor.py:ConductorEngine.run`** — the MMA loop. Per ticket: `ai_client.reset_session()` (Context Amnesia), build prompt, `loop.run_in_executor(None, run_worker_lifecycle, ...)`. Different scope (per ticket, not per user turn).
 3. **`simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`** — the 1:1 chat loop. Per user turn: build markdown, send, wait, append response. Different scope (per user turn, in the App).
 All three have the same "append, call, parse, act, repeat" shape. They differ in *what gets appended* (per-provider history vs track state vs `disc_entries`).
 **Verdict.** **PARITY.** The loop is the universal pattern. Manual Slop's three loops are at different layers (LLM, MMA, App). The lack of a *single* "the loop" file is a real cost — nagent's `run_agent_loop` is 50 lines, easy to reason about. Manual Slop's loops are 100-300 lines each, scattered.
 *Future-track candidate: a single `src/llm_loop.py:run_loop(...)` function that all three callers use, with the dispatch and parse layers injected. (Not a high-priority refactor; the current separation is readable.)*
 **Domain tag:** Both.
 ---
 ## 6. Per-file memory (curation, not conversation log)
 **nagent's claim.** One conversation grows too large. Attach memory to artifacts. Work keeps coming back to the same files; give each file its own persistent local memory. **"When work orbits one artifact, store memory on that identity."**
 **nagent's implementation.** `bin/helpers/nagent_file_edit_lib.py` provides:
 - `file_id_for_path(path) -> "{st_dev}:{st_ino}"` — a stable file identity across renames (the inode is preserved).
 - `file_index_path(root, pid) -> conversations/file-index-{pid}.json` — a JSON registry of `{file_id: {path, conversation}}`.
 - `resolve_file_edit_conversation(root, pid, file_path) -> (name, resolved, file_id)` — gets or creates a per-file conversation.
 - `nagent-file-edit --file src/foo.py "add validation"` — spawns a new nagent process with `--file_edit src/foo.py`, which loads the file's *previous* conversation as the initial context. After edits, the new file is appended to the same conversation.
 The result: a per-file conversation log keyed by inode. Rename with same inode = same conversation. Pure path-based: nope, you'd collide across two repos on the same machine.
 **Manual Slop's equivalent (corrected per user).** The first draft of this report marked this section as "DOMAIN MISMATCH" — claiming Manual Slop has no per-file memory. **This was wrong.**
 Manual Slop *does* have a per-file memory concept. It's just **a different kind of memory**. Where nagent's per-file memory is a *conversation log* (what the LLM said about this file last time), Manual Slop's is a *curation config* (how to present this file in the AI's context window). The two are complementary, not equivalent.
 The Manual Slop per-file memory:
 ```python
 # src/models.py:510
@dataclass
 class FileItem:
    path:               str                # the artifact identity (path-keyed, no inode)
    auto_aggregate:     bool = True       # include in auto-aggregation?
    force_full:         bool = False      # bypass aggregation with full content?
    view_mode:          str = 'full'      # full / skeleton / summary / sig / def / agg
    selected:           bool = False      # for batch operations
    ast_signatures:     bool = False      # only signatures
    ast_definitions:    bool = False      # only definitions
    ast_mask:           dict[str, str]    # per-symbol mask (from Structural File Editor)
    custom_slices:      list[dict]        # Fuzzy Anchor slices with tag+comment
    injected_at:        Optional[float]   # timestamp
 ```
 Plus the **ContextPreset** (`src/models.py:909`): a *named, persisted set* of `FileItem`s, stored in the project's `manual_slop.toml`. Load a preset → restore the same per-file curation state. This is the per-file memory that survives across discussions.
 The user pointed at this directly: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That's the right framing. `aggregate.py:run` builds the initial markdown from `self.context_files` (the active preset's FileItems) + `aggregate.run(flat, aggregation_strategy=...)`. The user controls the per-file memory at discussion start.
 What's *missing* is nagent's specific pattern: **a per-file conversation log keyed by inode.** Manual Slop does not have a "last investigation of this file" concept stored as a file. The closest analog is *commit history* (the discussion itself is git-linked, per `docs/guide_gui_2.md` §"Discussions Sub-Menu" "Git Commit Tracking"). But that's discussion-scoped, not file-scoped.
 **Verdict.** **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION; nagent IS STRONGER IN THE CONVERSATION-LOG DIMENSION.** Both have a real per-file memory concept. Manual Slop's is "how do I render this file next time the AI sees it" (rich, with 9 fields, AST-aware); nagent's is "what did the LLM say about this file last time" (plain text, with stable inode identity). The two are not equivalent; they're different optimizations for different needs.
 **Domain tag:** Application (for the curation config). The user-correction explicitly said: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That confirms this is a real Application feature, not a gap.
 *Future-track candidate: extending the per-file memory with a thin "last-investigation" log per file. A `~/.manual_slop/per_file/<file_id>.md` (file_id by inode, like nagent) that records the last time a discussion referenced this file, the questions asked, and the answers received. This is a Meta-Tooling-friendly addition because it's a plain file.*
 ---
 ## 7. Repository history as data
 **nagent's claim.** A repo is not only the current tree. History is data too. Transform git history into editing context for a target file. Not vague "retrieval." Explicit transformation of historical artifacts into working input.
 **nagent's implementation.** `bin/nagent:file_edit_history_and_summary_block(file_edit_path, ...)`:
 - `git_file_history(repo_root, rel_path)` — `git log --follow --max-count=50` per file
 - `summarize_new_file_commits(...)` — LLM call to one-line-summarize new commits
 - `coedited_file_rows(repo_root, rel_path, commits)` — counts files in the same commits; labels high/medium/low co-edit rate
 - `format_file_history(...)` — produces a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits
 **Manual Slop's equivalent (partial).** Manual Slop's `_reread_file_items` (in `ai_client.py`) does mtime-based *current* content re-reading with diff injection as `[SYSTEM: FILES UPDATED]`. It does *not* do git history injection.
 The closest things Manual Slop has:
 - **Git commit-linked discussion tracking** in the GUI: each discussion has a "Update Commit" button that stamps `git rev-parse HEAD` (per `docs/guide_gui_2.md` §"Discussions Sub-Menu").
 - **`src/dag_engine.py`** tracks ticket-to-git-commit relationships, but for *MMA* workers, not for the AI's context.
 **Verdict.** **PARTIAL.** Manual Slop has current-content diff injection (the easy half) but lacks historical-context injection (the harder half). nagent's `summarize_new_file_commits` would be a useful addition to the Manual Slop AI's context — especially for "explain what this file does" questions where the LLM is meeting the file fresh.
 **Domain tag:** Application. *Future-track candidate: a `src/git_history.py` module that mirrors nagent's `file_edit_history_and_summary_block` and is invoked at discussion start (after `aggregate.py`).*
 ---
 ## 8. Historical coupling & artifact neighborhoods
 **nagent's claim.** A file lives in a neighborhood of related artifacts. Files that change together in git history are hints: tests, headers, config, paired implementation. High co-edit rate means "look here maybe." Not "edit everything."
 **nagent's implementation.** `coedited_file_rows(repo_root, rel_path, commits)`:
 - Counts files in the same commits as the target
 - Labels: high (>=50% co-edit), medium (>=20%), low
 - Renders a `| file | commits together | P(other file changed | target file changed) |` table
 - Guidance text: "Use these files as hints. Before editing, inspect high-likelihood co-edited files when the requested change may affect interfaces, tests, config, or paired code. Do not edit them unless the user request or evidence requires it."
 **Manual Slop's equivalent.** None. Manual Slop has `py_get_hierarchy` (subclass scan) and `ts_c_*_get_*` AST tools, but **no tool that returns "files that historically co-edit with this file."** The closest is `derive_code_path` (call-graph trace), which is structural not historical.
 **Verdict.** **GAP.** This is a real missing tool. nagent's framing — "hints, not commands" — is exactly the right level for a co-edit suggestion. A 50-line tool (`py_coedit_files(path) -> list[(path, count, likelihood)]`) would fill the gap.
 **Domain tag:** Application. *Future-track candidate: a `py_coedited_files` MCP tool + `ts_c_coedited_files` for C/C++.*
 ---
 ## 9. Disposable sub-conversations
 **nagent's claim.** Exploration creates noise. Spawn disposable workers. Sub-conversations are temporary nagent processes with isolated conversations. Their lifetime does not matter. The artifact they return matters.
 **nagent's implementation.** `<nagent-conversation>` tag in the main loop's response:
 - Parent appends `<nagent-conversation prompt="...">` to its conversation
 - Parent spawns `nagent --invocation delegated --parent-conversation <name> --json` as a subprocess
 - Child's `--json` output is parsed, rolled up into the parent's `recursive_input_tokens` / `recursive_output_tokens`
 - Child has its own conversation file; no shared context except the explicit prompt
 - Parent gets a concise artifact: the child's `<nagent-response>` content, plus token usage
 **Manual Slop's equivalent (corrected per user).** The first draft of this report claimed **PARITY (stronger in some ways)**. The user corrected this:
 > *"I don't know if I have disposable sub-conversations, I don't really have them for non-mma runs. I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."*
 So the actual picture is:
 | Layer | Sub-conversation support |
 |---|---|
 | **MMA Tier 3 / Tier 4** | **Yes.** `mma_exec.py` spawns a real subprocess per ticket with Context Amnesia. `ai_client.reset_session()` at start of `run_worker_lifecycle`. The Ticket output is the "distilled artifact" returned to the parent (`ConductorEngine`). Per the docs: *"Tier 3 worker is a fresh subprocess with a clean context window, receiving only the prompt and the relevant context slice."* |
 | **1:1 main discussion** | **No.** The Application's chat loop has no sub-conversation mechanism. The user types a prompt, the AI responds, the loop continues. There's no way to "ask a sub-agent to investigate X and bring back the answer." |
 The user is correct: this is a gap. The MMA pattern is the prototype. A future track could extract `MMA's run_worker_lifecycle` into a reusable `app.spawn_sub_conversation(prompt, allowed_tools=...)` method that the App can call from `pre_tool_callback` or from a new "investigate this" command.
 **Verdict.** **PARITY for MMA; GAP for 1:1 discussions.** The MMA pattern is strong. The 1:1 chat has no equivalent. The user explicitly flagged this as a want.
 **Domain tag:** Application (and possibly Meta-Tooling). *Future-track candidate: a `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Per the user: useful for "specific points" within a longer conversation.*
 ---
 ## 10. Controlled writes
 **nagent's claim.** A loop that writes files needs explicit boundaries. nagent is a reference implementation with conventions, **not a sandbox**. Shell runs with your permissions. Structured writes are checked. That is not a security boundary. Do not pretend it is.
 **nagent's implementation.**
 - `validate_write_path(path, file_edit_path, ...)` — in main mode: path must be in `/tmp`, `/var/tmp`, or `$TMPDIR`. In file-edit mode: path must be the target file (or one of its split segments).
 - Rejected writes append `<nagent-write-result status="error">` to the conversation.
 - `<nagent-shell>` runs whatever the LLM wrote, with the user's permissions, in the user's working directory. **There is no shell sandbox.** This is explicit.
 **Manual Slop's equivalent.** Manual Slop has a *much* stronger security model:
 | nagent | Manual Slop |
 |---|---|
 | `validate_write_path`: in main mode, path must be in `/tmp`, `/var/tmp`, or `$TMPDIR` | `mcp_client._is_allowed`: in main mode, path must be in the allowlist (constructed from `file_items` + `extra_base_dirs`); history.toml and `*_history.toml` are *always* blocked |
 | `execute_write` writes the file directly | `set_file_slice` / `edit_file` / `py_update_definition` route through AST or string-match for validation |
 | `<nagent-shell>` runs the user's full shell, full permissions, no approval | `run_powershell(script, base_dir, qa_callback=...)` requires GUI modal approval (Execution Clutch), 60s timeout, `taskkill` cleanup, optional Tier 4 QA on failure |
 | No per-tool allowlist | 3-layer security: `configure` (allowlist) → `_is_allowed` (path validation) → `_resolve_and_check` (resolution + symlink resolution) |
 | No sandbox at all | PowerShell-only (no bash/cmd) by default; can be enabled in `[mcp_env.toml]` |
 **Verdict.** **PARITY (STRONGER on Manual Slop's side).** Manual Slop's HITL-required shell execution + 3-layer allowlist is *dramatically* more secure than nagent's tmpdir check. The user explicitly chooses "less safety but more flexibility" with nagent, and "more safety but more friction" with Manual Slop.
 **Domain tag:** Both. The Application needs Manual Slop's strict model. The Meta-Tooling could legitimately use nagent's looser model *because the human is in the loop* (the bridge script pops a GUI dialog).
 ---
 ## 11. Large files as explicit artifacts (split/patch)
 **nagent's claim.** Big files exceed context. Split them. Do not pretend they fit. The split is a *data structure* with `index.json` and segment files; the patch is a unified diff; the source hash validates that nothing changed.
 **nagent's implementation.**
 The 4-file pipeline:
 1. **`nagent-file-split <file> --output <dir> --split <type> [--summarize] [--refresh INDEX] [--target-bytes 32768] [--natural]`**:
   - `EXTENSION_MAP` covers 11 languages (txt, md, cpp, py, xml, js, ts, json, yaml, go, rs, java)
   - Per-language `SCORE_BY_TYPE` (no tree-sitter; regex + line-counting + brace/JSON/XML depth counters)
   - `py_score` rewards blank lines followed by `def`/`class`/`async def`
   - `cpp_score` uses `brace_depth` to find closing braces at depth 0
   - `json_score` uses `json_depth` to find closing `}`/`]` at depth 0
   - Writes `index.json` with `source_path`, `sourcesha256`, `source_size_bytes`, `source_line_count`, `split_type`, `target_bytes`, `natural`, `created_at`, `segment_count`, `segments[]`
   - Each segment is a separate file with `name-0001.py`, `name-0002.py`, etc.
   - `--summarize` flag spawns `nagent-file-summarize` per-segment subprocess
 2. **User edits the segment files** (in place, via vim, etc.)
 3. **`nagent-file-patch <index> [--patch PATH] [--dry-run] [--force]`**:
   - `validate_index(index, require_hash_match=not force)` — **strict** hash check; rejects if source changed
   - `merge_segments(segments) -> str` — concatenates segment contents in order
   - `make_unified_patch(source, original, updated)` — `difflib.unified_diff`
   - Writes the patch file; if `apply=True` and `changed=True`, writes the source
 4. **`nagent-file-summarize <file> [--limit-word-count N] [--output DIR] [--json]`**:
   - Files > 64 KB cascade to `nagent-file-split --summarize` first
   - `summarize_content` retries up to `SUMMARY_MAX_ATTEMPTS = 2` if the LLM overshoots the word limit
   - `combined_summary_from_index` glues per-segment summaries into one
 **Manual Slop's equivalent (different mechanism, same insight).** Manual Slop has all the *parts* of nagent's split/patch/summarize, but they live in different files and use different mechanisms:
 | nagent | Manual Slop |
 |---|---|
 | `nagent-file-split` with per-language `SCORE_BY_TYPE` (regex + line counts + brace/JSON/XML depth) | `aggregate.py:build_file_items()` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter) + `outline_tool.py` |
 | `index.json` with `source_path`, `sourcesha256`, `segments[]` | No explicit `index.json`. The "split" is implicit in `_reread_file_items` (mtime-based, not hash-based) and the `py_get_skeleton` tool returns the structural view on demand. |
 | `nagent-file-patch` with strict `validate_index` (hash check) | `set_file_slice` / `edit_file` with `result of file.read_text()` pre-write validation. No hash-based pre-validation. |
 | `nagent-file-summarize` with per-segment LLM call + retry | `run_subagent_summarization(file_path, content, is_code, outline) -> str` (in-process LLM call) |
 | Combined `combined_summary_from_index` | No equivalent; `aggregate.build_markdown_no_history` builds a single markdown per call |
 | `nagent-file-summarize` cascades to `nagent-file-split --summarize` for > 64 KB | `RAGEngine._chunk_code` cascades to chunking for Python (mtime-based invalidation, ChromaDB persistence) |
 **Crucial difference: Manual Slop uses tree-sitter, nagent does not.** nagent's per-language scoring functions are *all regex-based* (`cpp_score` looks for closing braces at depth 0; `py_score` looks for blank lines followed by `def`/`class` keywords; no AST parsing). Manual Slop's `py_get_skeleton` and `ts_c_*_get_skeleton` use the tree-sitter library for actual AST traversal.
 This is a trade-off. Tree-sitter is more accurate but requires a native dependency. nagent's approach works on any Python install with no compiled extensions. For the Application domain, tree-sitter is already a dependency (`file_cache.py`); for the Meta-Tooling, nagent's regex approach has appeal.
 **Verdict.** **PARITY (DIFFERENT MECHANISM).** Both have the "split / patch / summarize as explicit data artifacts" insight. nagent uses subprocesses + per-language scoring + hash validation. Manual Slop uses tree-sitter + in-process calls + mtime validation. The key safety property — *"the patch operation validates the source hasn't changed"* — is done by nagent via SHA-256; Manual Slop does it implicitly by re-reading the file and string-matching. Manual Slop could adopt the explicit hash approach for stronger guarantees.
 **Domain tag:** Both. *Future-track candidate: an explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, used by the Application for very-large-file scenarios (e.g., a 200KB legacy C file where skeleton + sig + def aggregation isn't enough).*
 ---
 ## 12. Tool discovery (self-describing executables)
 **nagent's claim.** Tool capability should be explicit data too. No central registry. Tools describe themselves.
 **nagent's implementation.** `bin/helpers/nagent_cli.py:collect_bin_tool_descriptions(bin_dir)`:
 - Iterates every executable in `bin/`
 - Runs each with `--description` (10s timeout per)
 - Captures stdout, parses it
 - Concatenates into a single "Available tools:\n\n<description 1>\n\n<description 2>\n..." block
 - Inserts this block into the initial context
 Each tool's `__main__` starts with:
 ```python
 def exit_on_description(description: str) -> None:
    if "--description" in sys.argv:
        print(description)
        raise SystemExit(0)
 ```
 So `nagent-file-split --description` prints "Split a large file into structure-aware segments..." and exits 0. The main `nagent` loop calls `collect_bin_tool_descriptions` once at startup.
 **Manual Slop's equivalent.** None. The 45 MCP tools in `src/mcp_client.py` are dispatched by a flat if/elif chain in `dispatch()`:
 ```python
 def dispatch(tool_name, tool_input):
    if tool_name.startswith("bd_"):
        return _dispatch_beads(tool_name, tool_input)
    if tool_name == "read_file":
        return _read_file(tool_input["path"])
    if tool_name == "py_get_skeleton":
        return _py_get_skeleton(tool_input["path"])
    # ... 45+ branches ...
    return f"ERROR: unknown tool: {tool_name}"
 ```
 Adding a new tool requires:
 1. Edit `dispatch()` to add the branch
 2. Update the security allowlist in `_resolve_and_check` (if filesystem access)
 3. Update the AI capability declaration in `get_tool_schemas()`
 4. Add tests
 nagent's approach: drop an executable in `bin/`, implement `exit_on_description`, done. The tool is auto-discovered.
 The user (per the pushback): *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."* — so this is a known want, but low priority.
 **Verdict.** **GAP (Application).** nagent's pattern is genuinely better here, but Manual Slop has 45 tools in production and a migration would be a big refactor. The win is real (extensibility) but the cost is also real (rewrite the dispatch layer).
 **Domain tag:** Both. For the Meta-Tooling (the `scripts/` directory), nagent's pattern is more aligned with the external-agent usage model. For the Application, the existing `dispatch` if/elif is fine.
 *Future-track candidate: a `mcp_architecture_refactor_20260606` (already on the board) would benefit from nagent's pattern. The "sub-MCP" extraction the planned refactor proposes is exactly the right scope for this — each sub-MCP could be its own self-describing module.*
 ---
 ## 13. Differences from frameworks
 nagent's philosophical frame: framework-style systems hide state in object graphs and long-lived agent abstractions; nagent keeps everything as explicit files. The reframing table at the end of the nagent README is excellent:
 | Common term | nagent framing |
 |---|---|
 | memory | editable artifact |
 | retrieval | preserved work / historical context |
 | agent | temporary transformation function |
 | context | explicit input data |
 This report's §2-§12 have been showing where Manual Slop *agrees* with nagent's reframings and where it *deliberately diverges*.
 **Verdict.** The reframing is useful. The application can pick and choose which reframings to adopt per layer.
 **Domain tag:** Both. This is the philosophical lens for the whole report.
 ---
 ## 14. Build your own
 nagent's last section: *"The minimal system is not mystical. Small loop over explicit state."* The list of 12 buildable steps: `generate_text(file) -> str`, growing conversation document, initial context with the contract, output format + parser, handlers that append results to state, loop after actions, visible retry on malformed output, child loops for delegation, per-artifact memory, repository history → context blocks, split/index/patch for large files, save/load/edit/summarize for memory maintenance.
 **Verdict.** Manual Slop *has* all 12 of these. Just in different files, with different names, and at a different scale.
 **Domain tag:** Both. The 12-step list is a useful checklist for any future LLM-application track.
 ---
 ## 15. The 6 Pitfalls (Revised from 8, after User Corrections)
 The first draft of this report had 8 pitfalls. The user-corrections on §3 and §6 collapsed 2 of them. The remaining 6:
 ### Pitfall 1: No structured output protocol in the Application AI
 The Application uses opaque provider-native function calling. The user can read the conversation, but cannot read a `tool_call` from the comms log without knowing the provider's schema. nagent's regex-tag protocol is more debuggable for the Meta-Tooling. **Decision: not a problem for the Application (provider-native is the right choice). Worth borrowing for the Meta-Tooling.** **Domain tag:** Both. *Future-track candidate: an intent-based DSL for Meta-Tooling agent calls.*
 ### Pitfall 2: Provider-specific history is in process globals
 `src/ai_client.py` has `_anthropic_history`, `_deepseek_history`, `_minimax_history` — 3 separate per-provider history lists, each with their own lock. Switching providers mid-session loses history. nagent's "single conversation file" model is provider-agnostic.
 **Concrete change:** A future refactor toward a stateless `LLMClient` class with an explicit `Conversation` object (the transcript as a `list[Message]`) would let:
 - Users save/load/replay conversations
 - Provider switching doesn't lose history
 - Tier 4 QA and Tier 3 workers share a common conversation format
 **Domain tag:** Application. *Future-track candidate: a `src/conversation.py:Conversation` dataclass + `src/llm_client.py:LLMClient` stateless wrapper around the 5 providers.*
 ### Pitfall 3: RAG is not "history as data"
 Manual Slop's RAG (`src/rag_engine.py`) is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be **additive**, not a replacement. The Application's `_reread_file_items` mtime-based diff injection is the "history as data" mechanism Manual Slop already has.
 **The user's clarification:** *"RAG is an optional thing, doesn't have to be used. Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run."*
 **Decision:** RAG stays. The user wants a *staging* workflow: a sub-agent prepares RAG chunks before a run, the chunks become the discussion's starting memory. This is consistent with the nagent-inspired sub-conversation pattern (§9).
 **Domain tag:** Application. *Future-track candidate: a "RAG pre-staging" sub-conversation runner that pre-builds the index for a planned run.*
 ### Pitfall 4: The AI client is a stateful singleton with module-level globals
 2,685-line `src/ai_client.py`. The module is the abstraction layer. To import it for testing, you trigger 5 provider SDKs' lazy imports. The unit tests are the only way to know what state is in flight.
 This is the *opposite* of nagent's "files are the system; the process is a worker." nagent's `run_agent_loop` is 50 lines, stateless, testable. A future refactor toward a stateless `LLMClient` class would make `ai_client` parseable, testable, and saveable.
 **Domain tag:** Application. *Future-track candidate: a `src/llm_client.py:LLMClient` class with explicit `Conversation`, `Provider`, `History` objects. Backwards-compatible with the current `ai_client.send()` API.*
 ### Pitfall 5: No non-MMA disposable sub-conversations
 The MMA pattern is strong. The 1:1 chat has no equivalent. The user *explicitly* flagged this as a want: *"I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."*
 **Decision:** Design `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Reuse MMA's subprocess pattern (`mma_exec.py` as the template). The sub-agent returns a concise artifact to the parent (nagent's pattern). Useful for "investigate this file" / "summarize this concept" / "look up this API" commands.
 **Domain tag:** Application. *Future-track candidate: a `src/sub_conversation.py` + a GUI "Investigate…" button on the message panel.*
 ### Pitfall 6: Hard-coded tool discovery
 The 45 MCP tools in `mcp_client.py:dispatch` are in a flat if/elif chain. nagent's `--description` self-describing executable pattern is more extensible.
 **The user's position:** *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."*
 **Decision:** Low priority. The `mcp_architecture_refactor_20260606` (already on the board) is the natural place to address this — sub-MCPs as self-describing modules.
 **Domain tag:** Both. *Future-track candidate: subsumed by mcp_architecture_refactor_20260606.*
 ### Pitfalls removed by user-corrections
 - **(removed)** Pitfall about "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); it lacks editable *raw transcripts*, but that's a *different* design choice, not a gap. (See §3.)
 - **(removed)** Pitfall about "per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension; what's missing is nagent's conversation-log dimension, which is a different optimization. (See §6.)
 ---
 ## 16. Recommended reading path for engineers
 If you haven't read nagent, here's the priority:
 1. **The README's first 3 sections** ("What It Looks Like", "Durable Work", "Text In Text Out") — the philosophy in 5 minutes.
 2. **`bin/nagent:run_agent_loop()`** — the actual loop, 50 lines.
 3. **`bin/helpers/nagent_file_split_lib.py:SCORE_BY_TYPE`** — the per-language scoring; shows what "structure-aware" can mean without tree-sitter.
 4. **`bin/helpers/nagent_file_patch_lib.py:validate_index`** — the strict hash check; the safety property of nagent's split/patch workflow.
 5. **`bin/helpers/nagent_file_summarize_lib.py:summarize_content`** — the retry-with-smaller-prompt pattern.
 6. **`bin/helpers/nagent_cli.py:collect_bin_tool_descriptions`** — the tool-discovery pattern; 30 lines.
 The README's 14 sections can be skimmed in 15 minutes if you have the context this report provides. Read in order 1-5 above for the implementation depth.
 ---
 ## Appendix A. Cross-reference table
 | nagent file | Lines | Purpose | Manual Slop equivalent |
 |---|---|---|---|
 | `README.md` | ~1500 | 14-section teaching document | This report + `docs/guide_*.md` |
 | `bin/nagent` | ~700 | Main loop, tag parser, sub-conversation runner | `src/ai_client.py:send` + `src/multi_agent_conductor.py:ConductorEngine.run` + `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async` (3 separate loops) |
 | `bin/nagent-llm-text` | ~50 | CLI wrapper for `nagent-llm.py` | (implicit; the Application calls `ai_client.send` directly) |
 | `bin/nagent-llm-upload` | ~30 | File upload + LLM call | (not present; the Application's read tools handle files inline) |
 | `bin/nagent-file-edit` | ~120 | Per-file subprocess wrapper | (not present; this is the gap that the user wants for 1:1 discussions) |
 | `bin/nagent-file-split` | ~170 | Main split executable | (not present in this form; Manual Slop uses `aggregate.py` + tree-sitter) |
 | `bin/nagent-file-patch` | ~80 | Main patch executable | (not present; Manual Slop uses `set_file_slice` / `edit_file` directly) |
 | `bin/nagent-file-summarize` | ~100 | Main summarize executable | `src/ai_client.py:run_subagent_summarization` (in-process) |
 | `bin/helpers/nagent_cli.py` | ~80 | `--description` pattern, `WaitSpinner` | (not present) |
 | `bin/helpers/nagent_llm.py` | ~300 | 4 providers, token accounting | `src/ai_client.py:_send_<provider>` × 5 (in-process, with cross-provider state) |
 | `bin/helpers/nagent_file_edit_lib.py` | ~170 | file-index by inode, `resolve_file_edit_conversation` | (not present) |
 | `bin/helpers/nagent_file_split_lib.py` | ~400 | `SPLIT_TYPES` (11 langs), per-language scoring | `src/file_cache.py:ASTParser` (tree-sitter) + `src/aggregate.py:build_file_items` |
 | `bin/helpers/nagent_file_patch_lib.py` | ~130 | strict hash validation, `make_unified_patch` | (not present; implicit mtime check) |
 | `bin/helpers/nagent_file_summarize_lib.py` | ~110 | per-segment LLM call, retry-with-smaller-prompt | `src/ai_client.py:run_subagent_summarization` (in-process, no retry) |
 | **Total nagent** | **~4000** | | **Manual Slop's analogous parts: ~5000+** (ai_client + multi_agent_conductor + mcp_client + aggregate + rag_engine + history + project_manager + tree-sitter-based tools) |
 Manual Slop is *not* smaller than nagent; it's *larger* because it has a GUI, persistence, HITL dialogs, Hook API, and a real test harness. The architectures serve different scales.
 ---
 ## Appendix B. Citations
 - nagent source: https://github.com/macton/nagent (all 11 source files read in full)
 - Internal: `docs/Readme.md`, `docs/guide_architecture.md`, `docs/guide_ai_client.md`, `docs/guide_mma.md`, `docs/guide_tools.md`, `docs/guide_mcp_client.md`, `docs/guide_app_controller.md`, `docs/guide_meta_boundary.md`, `docs/guide_context_curation.md`, `docs/guide_personas.md`, `docs/guide_rag.md`, `docs/guide_gui_2.md`
 - Internal source (selectively read for user-corrections): `src/models.py` (FileItem, ContextPreset), `src/context_presets.py`, `src/project_manager.py` (branch_discussion, promote_take), `src/aggregate.py`, `src/history.py`
 - Mike Acton, "Data-Oriented Design and C++" (cppCon 2014) — referenced but not directly cited
 - Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them" — cited via the `data_oriented_error_handling_20260606` track
 ---
 *End of report. See `comparison_table.md` for the flat reference, `decisions.md` for the future-track candidates, and `spec.md` for the track wrapper.*
@@ -0,0 +1,240 @@
 # Track: Mike Acton's nagent — Deep Dive on LLM Agent Architecture
 **Status:** Active (spec approved 2026-06-08; revised 2026-06-08 with user-corrections)
 **Initialized:** 2026-06-08
 **Owner:** Tier 2 Tech Lead
 **Priority:** Medium (architectural; informs future Application+Meta-Tooling decisions but is not a code refactor)
 > **Revision note (2026-06-08):** This spec was revised based on direct user corrections after the first draft. Earlier versions overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features; the corrections are folded into §2 and §4 below. Read the **report.md** for the actual analysis; this spec.md is the wrapper.
 ---
 ## 1. Overview
 This track documents a deep-dive analysis of Mike Acton's [`macton/nagent`](https://github.com/macton/nagent) reference implementation ("nagent" = "not-an-agent") and its implications for how Manual Slop should think about LLM-driven workflows.
 nagent is a 14-section, ~1,500-line Python reference that operationalizes the philosophy **"the agent is not the thing; the data is the thing."** It provides a concrete, minimal counterpoint to the standard "agent framework" model. Its central claim: **durable work matters more than durable processes; explicit artifacts beat opaque state.**
 The companion doc ([report.md](./report.md)) is the deep-dive analysis itself — a 14-section comparison against Manual Slop's actual implementation, written for engineers (not marketing). This spec.md is the conductor/track wrapper: the design intent, the relationship to the Application vs Meta-Tooling split, the planned follow-up tracks, and the out-of-scope notes.
 ### 1.1 What this track produces
 | Artifact | Purpose |
 |---|---|
 | `spec.md` | This file — the track design and scoping. |
 | `report.md` | The 14-section deep-dive analysis. The primary deliverable. |
 | `comparison_table.md` | A flat side-by-side table (one row per nagent principle) for quick reference. |
 | `decisions.md` | Future-track candidates extracted from the analysis (each becomes a follow-up track if approved). |
 ### 1.2 Non-Goals
 - **Not** rewriting Manual Slop to use nagent. The architectures serve different domains (see §2).
 - **Not** replacing any existing track. This is a *reference* track — it informs future tracks but doesn't compete with them.
 - **Not** a comparison of "framework vs framework." nagent is a 1,500-line reference; Manual Slop is 13,000+ lines of production code with a real GUI, real persistence, real HITL. The comparison is *philosophical*, not "which is better."
 ---
 ## 2. The Application / Meta-Tooling Distinction (load-bearing context)
 Per `docs/guide_meta_boundary.md`, Manual Slop lives in two distinct architectural domains. **This distinction is critical for understanding the nagent comparison:**
 | Domain | Lives at | AI / HITL Model | Tooling |
 |---|---|---|---|
 | **The Application** (`manual_slop`) | `gui_2.py`, `ai_client.py`, `multi_agent_conductor.py`, `dag_engine.py` | A local GUI for orchestrating AI. The "Application AI" is a long-lived assistant that the user talks to over many turns. Strict HITL: every destructive action requires a GUI modal approval. | `manual_slop.toml [agent.tools]` — strict allowlist |
 | **The Meta-Tooling** (us) | `scripts/mma_exec.py`, `conductor/`, `.agents/skills/`, the MCP tools in `mcp_client.py` when used by external agents | External agents (Gemini CLI, OpenCode, Claude Code) that *build* the Application. Each invocation is a fresh sub-agent. Token-firewalled. | Full mcp_client.py toolset, including mutation tools |
 **nagent lives in the Meta-Tooling domain.** nagent is a reference for how *external* agents (the ones reading this conversation, the ones writing the code) should structure their own work.
 **Manual Slop's Application AI does not — and should not — look like nagent.** The Application AI is a chatty, conversational, persona-driven, RAG-augmented, curation-rich assistant with a real GUI. It's a *different kind of thing*. Conflating the two is exactly the kind of "feature bleed" `guide_meta_boundary.md` warns against.
 Every recommendation in `report.md` is qualified with which domain it applies to. The Application is the production code the user cares about; the Meta-Tooling is what we (the agents) use to build it.
 ---
 ## 3. Summary of the 14-Section Comparison
 The full table is in `comparison_table.md`. Verdict summary:
 | nagent Principle | Manual Slop Equivalent | Verdict |
 |---|---|---|
 | 1. Durable work, disposable workers | AppState snapshots + history branching (Takes); MMA workers are real subprocesses | **PARTIAL** — different domains; MMA has it, App doesn't need it |
 | 2. Text in, text out | `ai_client.send()` returns `str`; `mcp_client.dispatch` returns `str` | **PARITY** |
 | 3. Conversations are editable state | Discussion takes + branching + edit-in-place + UISnapshot history; `ContextPreset` for per-file view-mode memory | **PARITY (DIFFERENT FOCUS)** — Manual Slop has this; focuses on *editable UI state* (per Take) and *editable per-file curation* (per FileItem), not editable conversation logs |
 | 4. Visible output protocol | Uses provider-native function calling; the protocol is opaque to humans | **ARCHITECTURAL DIFFERENCE** — Application-side; correct trade-off |
 | 5. The loop (append, call, parse, act, repeat) | `ai_client._send_*` tool-call loop, MMA `ConductorEngine.run`, `WorkflowSimulator.run_discussion_turn_async` | **PARITY** — but the loop is in multiple files, not as a single small function |
 | 6. Per-file memory (curation, not conversation log) | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Fuzzy Anchor slices | **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION**; nagent's "file-edit conversation" pattern (one conversation log per file) is not present |
 | 7. Repository history as data | `_reread_file_items` mtime-based diff injection; `git_commit_file_patch` per-file history summaries; no explicit "neighborhood" computation | **PARITY (PARTIAL)** — diff injection is similar; the "neighborhood" computation is missing |
 | 8. Historical coupling & artifact neighborhoods | n/a (no equivalent) | **GAP** — could be added as a new tool |
 | 9. Disposable sub-conversations | MMA `mma_exec.py` Tier 3 workers are real subprocesses; **non-MMA 1:1 discussions do NOT have disposable sub-conversations yet** (per user) | **GAP (Application) — useful for 1:1 discussions; **PARITY for MMA** |
 | 10. Controlled writes | MCP 3-layer security + Execution Clutch + Allowlist Construction + Path Validation + Resolution Gate | **PARITY (STRONGER)** — Manual Slop's 3-layer is more thorough than nagent's tmpdir check |
 | 11. Large files as explicit artifacts (split/patch) | `nagent-file-split`/`nagent-file-patch`/`nagent-file-summarize` with `index.json` + segment files + source hash validation; 32 KB target size; per-language natural splitters (no tree-sitter) | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation, Manual Slop uses tree-sitter + in-process `summarize.py` |
 | 12. Tool discovery (self-describing executables) | Hard-coded `dispatch` if/elif chain in `mcp_client.py` | **GAP (Application) — could be added; useful for the Meta-Tooling domain** |
 | 13. Differences from frameworks | The philosophical frame | n/a |
 | 14. Build your own | The reference's "minimal" claim is wrong for the Application | n/a for Application |
 The full 14-row analysis with 6 (revised from 8) specific Manual Slop pitfalls is in `report.md`.
 ---
 ## 4. The Revised 6 Pitfalls (corrected)
 Earlier versions of this list contained two errors that user-corrections caught:
 - **REMOVED** pitfall #3 (per "Conversation state is buried in module-level globals" was over-stated) — Manual Slop has *some* editable-state infrastructure (`HistoryManager` with UISnapshot, discussion Takes/branching, `ContextPreset` save/load) but the actual *raw conversation transcript* is in `ai_client._provider_specific_history` globals. The truth is: **Manual Slop has editable UI state, not editable conversation transcripts.** That distinction is now captured honestly in §3 of the report.
 - **REVISED** pitfall #6 (per "Per-file memory") — Manual Slop *does* have a per-file memory concept (`FileItem` + `ContextPreset` + `custom_slices` + `ast_mask`), but it's *curation memory*, not nagent's *conversation-log memory*. Manual Slop's concept is *richer in the curation dimension* but *absent in the conversation-log dimension*. That's a useful distinction.
 The remaining 6 pitfalls, after corrections:
 1. **No structured output protocol** in the Application AI (uses opaque function calling; nagent's regex tag protocol is the alternative for the Meta-Tooling). **Domain: Application can stay opaque; Meta-Tooling should learn.**
 2. **Provider-specific history is in process globals** (5 separate per-provider lists with their own locks; switching providers mid-session loses history). **Domain: Application. Future-track candidate.**
 3. **RAG is not "history as data"** — RAG retrieval is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be additive, not a replacement. **Domain: Application. Coexists with nagent-style history.**
 4. **The AI client is a stateful singleton with module-level globals** (2,685-line `ai_client.py` is unparseable without state). A future refactor toward a stateless `LLMClient` class with explicit `Conversation` objects would let the App save/load/replay conversations as files. **Domain: Application. Future-track candidate.**
 5. **No non-MMA disposable sub-conversations** — only MMA workers are real subprocesses; the user explicitly noted that 1:1 discussions don't have sub-agents. nagent's `<nagent-conversation>` pattern (a sub-agent for bounded investigation) would be valuable for the Application. **Domain: Application. Future-track candidate (user-flagged as a want).**
 6. **Hard-coded tool discovery** — the 45 MCP tools are in a flat if/elif chain in `dispatch`. nagent's `--description` self-describing executables pattern is more extensible. **Domain: both. Low priority.**
 Plus 2 domain-domain recommendations that are not pitfalls per se:
 - **Personas are config bundling** (per user: "just bundles preparatory cruft — vendor/model, tools/permissions, and system prompts"). The user noted that you can *completely opt out* by just using AI settings directly. **Domain: Application. Keep as-is; not a pitfall.**
 - **RAG is opt-in** (per user: "doesn't have to be used"). Worth considering: a sub-agent that *prepares RAG chunks* before a run. **Domain: Application. Future-track candidate.**
 ---
 ## 5. What This Track Read (in full, before writing)
 To avoid hand-waved claims, the report and this spec were written after reading all of:
 ### nagent source (read in full)
 - `README.md` (~1,500 lines) — the 14-section "teaching document"
 - `bin/nagent` (~700 lines) — the main loop, tag parser, sub-conversation runner, git history + co-edit + summary integration
 - `bin/helpers/nagent_llm.py` (~300 lines) — provider dispatch, token accounting
 - `bin/helpers/nagent_cli.py` (~80 lines) — `--description` self-describing executable pattern, `WaitSpinner`
 - `bin/helpers/nagent_file_edit_lib.py` (~170 lines) — file-index by `st_dev:st_ino`, `resolve_file_edit_conversation`, `is_split_segment_for_source`
 - `bin/helpers/nagent_file_split_lib.py` (~400 lines) — `SPLIT_TYPES` (11 langs), per-language `SCORE_BY_TYPE` (no tree-sitter; regex + line counts + brace/JSON/XML depth), 32 KB default, source SHA-256 hashing
 - `bin/helpers/nagent_file_patch_lib.py` (~130 lines) — strict hash validation, `make_unified_patch` via `difflib.unified_diff`, `apply_segment_patches` writes the source
 - `bin/helpers/nagent_file_summarize_lib.py` (~110 lines) — per-segment LLM calls + retry-with-smaller-prompt (max 2 attempts), `--limit-word-count` validation, `combined_summary_from_index`
 - `bin/nagent-file-edit` (~120 lines) — per-file subprocess wrapper, `default_pid = BASHPID or os.getppid()`
 - `bin/nagent-file-split` (~170 lines) — main executable, `--refresh INDEX` mode for re-splitting without losing segment paths
 - `bin/nagent-file-summarize` (~100 lines) — main executable, cascades to `nagent-file-split --summarize` for files > 64 KB; uses `positive_int` CLI type (rejects 0)
 ### Manual Slop docs (read in full)
 - `docs/Readme.md` (434 lines) — docs index
 - `docs/guide_architecture.md` (989 lines) — threading model, cross-thread data structures
 - `docs/guide_ai_client.md` (424 lines) — multi-provider LLM client
 - `docs/guide_mma.md` (564 lines) — 4-tier MMA orchestration
 - `docs/guide_tools.md` (506 lines) — MCP tool inventory + Hook API
 - `docs/guide_mcp_client.md` (410 lines) — 45 tools + 3-layer security
 - `docs/guide_app_controller.md` (447 lines) — headless controller
 - `docs/guide_meta_boundary.md` (57 lines) — Application vs Meta-Tooling split
 - `docs/guide_context_curation.md` (303 lines) — Granular AST Control + Fuzzy Anchor Slices + AST Inspector
 - `docs/guide_personas.md` (307 lines) — Unified agent profile model
 - `docs/guide_rag.md` (411 lines) — RAG subsystem
 - `docs/guide_gui_2.md` (477 lines) — ImGui application (App/Controller state delegation, hot-reload, defer-not-catch)
 ### Manual Slop source (selectively read, in service of the user-corrections)
 - `src/models.py` lines 510-559 (FileItem schema), 909-937 (ContextPreset schema)
 - `src/context_presets.py` (30 lines, full file) — the `ContextPresetManager`
 - `src/project_manager.py` lines 429-450 (`branch_discussion`, `promote_take`)
 - `src/aggregate.py` first 80 lines (context composition pipeline)
 - `src/history.py` (full file, 141 lines) — `UISnapshot` and the snapshot model
 The user-corrections specifically drove a re-survey of `FileItem` + `ContextPreset` + `aggregate.py` + `HistoryManager` after the first draft overstated Manual Slop's gaps.
 ---
 ## 6. Architectural Reference
 - **nagent source code:** https://github.com/macton/nagent (read in full for this analysis)
 - **nagent README:** https://github.com/macton/nagent/blob/main/README.md (the 14-section "teaching document")
 - **Mike Acton's data-oriented design talks:** https://www.youtube.com/results?search_query=mike+acton+data+oriented (foundational; nagent is a specific application)
 - **Ryan Fleury "errors are just cases":** https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors (cited in `data_oriented_error_handling_20260606`; consistent with nagent's data-over-control-flow stance)
 - **Internal:** `docs/guide_meta_boundary.md` for the Application/Meta-Tooling split
 - **Internal:** `docs/guide_architecture.md` §"Thread Domains" for the cross-thread state-sync problem that nagent sidesteps by having no GUI
 ---
 ## 7. See Also
 ### Internal Documentation
 - `docs/Readme.md` — Manual Slop documentation index
 - `docs/guide_architecture.md` — Threading model and provider dispatch
 - `docs/guide_ai_client.md` — The Application's LLM client
 - `docs/guide_mma.md` — 4-tier MMA orchestration
 - `docs/guide_meta_boundary.md` — The Application vs Meta-Tooling split
 - `docs/guide_tools.md` — MCP tool inventory and Hook API
 - `docs/guide_mcp_client.md` — 45 tools + 3-layer security
 - `docs/guide_context_curation.md` — Granular AST Control + Fuzzy Anchor Slices + AST Inspector
 - `docs/guide_personas.md` — Unified agent profile model
 - `docs/guide_rag.md` — RAG subsystem
 - `docs/guide_gui_2.md` — ImGui application
 ### Related Tracks
 - `data_oriented_error_handling_20260606` — Already cites Acton by name. The `Result[T]` + `ErrorInfo` data model from this track is consistent with nagent's "data, not control flow" stance.
 - `qwen_llama_grok_integration_20260606` — The "OpenAI-compatible shared helper" pattern is exactly nagent's "thin boundary adapter on a normalized data structure" approach.
 - `mcp_architecture_refactor_20260606` — Already blocked by `data_oriented_error_handling_20260606`. The sub-MCP extraction (planned) will benefit from nagent's "small helper per concept" decomposition pattern.
 - `data_structure_strengthening_20260606` — The type-alias work is consistent with nagent's "make the data shape explicit" stance. The audit script + NamedTuple work parallels nagent's split-index / patch-artifact approach.
 ### External
 - Mike Acton, "Data-Oriented Design and C++" (cppCon 2014) — The original DOD talk that nagent operationalizes
 - Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them" — Companion framework; same "errors as data" thesis
 - Timothy Lottes (@NOTimothyLottes) — Cited in the `data_oriented_error_handling` review; same "error codes are data" stance
 - Valigo (@valigotech) — Cited in the data_oriented_error_handling review; "exceptions mess with control flow in very weird ways"
 ---
 ## 8. Scope Boundaries
 ### In Scope
 - The 14-section nagent philosophy
 - The 6 (revised) concrete pitfalls in Manual Slop
 - Mapping each pitfall to a future-track candidate (in `decisions.md`)
 - Application vs Meta-Tooling domain classification for every recommendation
 - The philosophical grounding for existing Manual Slop conventions (data-oriented, thread-disciplined, GUI-decoupled)
 ### Out of Scope
 - **Implementation work.** This is a reference/analysis track. No code is being changed.
 - **Replacing nagent in the Meta-Tooling.** The Meta-Tooling is whatever the external agent (Gemini CLI, OpenCode) is. nagent is a *reference example*, not a competitor. It's worth reading for ideas, not adopting wholesale.
 - **Building a new "data-oriented" track for Manual Slop.** The `data_oriented_error_handling_20260606` track already covers the data-vs-control-flow axis. This track is the *philosophical foundation* for that work; the implementation track is separate.
 - **Comparing nagent to other LLM agent frameworks (LangChain, AutoGen, CrewAI, etc.).** nagent is a specific small reference; those are different scales. This track is about nagent specifically.
 ### Known Trade-offs (called out in the report)
 - **Manual Slop's personas are a feature, not a bug, in the Application domain.** A user-facing chatty assistant benefits from "persona = named configuration that the user can save and recall." nagent's "data, not personality" stance is correct for sub-agent invocations but wrong for long-lived assistant sessions. (Per user: personas are config bundling; the user can opt out by using AI settings directly.)
 - **Manual Slop's RAG is a feature, not a bug, in the Application domain.** RAG enables semantic search across large codebases. nagent's "git history → summaries" is exact but doesn't help when the user asks "how does the execution clutch work" and the relevant information is in `guide_architecture.md` (a doc, not source). RAG is opt-in.
 - **Manual Slop's GUI is a feature, not a bug, for its domain.** It enables the rich persona, curation, RAG, and snapshot UX. nagent explicitly has no GUI; the Application explicitly has a GUI. They serve different needs.
 - **The "1,500-line reference" vs "13,000-line production" comparison is not fair.** nagent is a teaching example. Manual Slop is a working tool. The right comparison is "nagent's principles vs Manual Slop's implementation," not "which codebase is better."
 ---
 ## 9. Verification Criteria
 This is a reference/analysis track. The verification is:
 - [ ] `report.md` exists and covers all 14 nagent principles with a Manual Slop assessment for each
 - [ ] `comparison_table.md` exists as a flat side-by-side reference
 - [ ] `decisions.md` exists with future-track candidates (each is a separate conductor track to be specced independently)
 - [ ] Every "Manual Slop could learn from nagent here" recommendation is tagged with the domain (Application / Meta-Tooling / Both)
 - [ ] No code is being modified by this track
 - [ ] The companion doc is read by ≥1 person who is planning a future track (the report.md file is referenced by the relevant future-track specs)
 - [ ] (Post-correction) The report's verdicts on nagent §3 (Conversations are editable state) and §6 (Per-File Memory) are *corrected* per user feedback — the first draft overstated gaps
 ---
 ## 10. Status
 **Approved 2026-06-08 (initial); revised 2026-06-08 with user corrections.** Ready for human review of `report.md`.
 After human review of `report.md`, the `decisions.md` candidates will be evaluated:
 - High-priority items (e.g., stateless `LLMClient` class, non-MMA sub-conversations, RAG pre-staging) → new conductor tracks
 - Medium-priority items (e.g., self-describing MCP tools, conversation file persistence) → research spikes
 - Low-priority items → deferred until a specific Application need surfaces
 The current `data_oriented_error_handling_20260606` track and the future `mcp_architecture_refactor_20260606` track are already philosophically aligned with nagent's principles; this track is the *explicit* reference to that alignment.
@@ -0,0 +1,113 @@
 # Track state for nagent_review_20260608
 # Reference/analysis track — no implementation phases
 # Updated by Tier 2 Tech Lead as track progresses (currently: complete)
 [meta]
 track_id = "nagent_review_20260608"
 name = "nagent Review (Mike Acton's data-oriented LLM agent reference)"
 status = "active"
 current_phase = 0  # 0 = pre-completion; this track produces no code phases
 last_updated = "2026-06-08"
 [user_corrections_log]
 # Corrections applied to the first draft based on direct user feedback during review
 # Format: 2026-06-08_NN = "correction" (NN is sequence number to ensure TOML key uniqueness)
 2026-06-08_1 = "Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS). User pointed at HistoryManager, project_manager.branch_discussion, UISnapshot — Manual Slop has editable UI state, not editable raw transcripts."
 2026-06-08_2 = "Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION. User pointed at FileItem (path + view_mode + ast_mask + custom_slices), ContextPreset, aggregate.py. Manual Slop's per-file memory is the curation kind, not the conversation-log kind."
 2026-06-08_3 = "Sub-conversations: removed 'PARITY stronger' claim. User clarified MMA has it but 1:1 discussions do not. Added 'GAP for 1:1 discussions' + user-flagged 'want' for future sub-conversation track."
 2026-06-08_4 = "RAG: clarified as opt-in, not gap. User wants pre-staging via sub-conversation ('Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run')."
 2026-06-08_5 = "Personas: reframed as config bundling, not gap. User noted personas can be completely opted out by using AI settings directly. They 'just bundle preparatory cruft.'"
 2026-06-08_6 = "Tool discovery: downgraded to 'intentional, low priority'. User has 'intent based DSL' idea but 'no where near that ideation yet.'"
 2026-06-08_7 = "Editable discussions: REVISED AGAIN. User pointed out the report's §3 verdict (PARITY/DIFFERENT FOCUS) didn't enumerate the per-entry operations. After re-reading gui_2.py:3770-3853 (render_discussion_entry) and gui_2.py:4239-4260 (render_discussion_entry_controls) and history.py (UISnapshot/HistoryManager), the report's §3 now lists the full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operations. The verdict remains PARITY (DIFFERENT FOCUS) but the gap is more precisely scoped: Manual Slop's editing is more granular at the typed-entry layer; nagent's is deeper at the raw-transcript layer. The 'raw transcript is in process globals' framing in the previous draft is still correct as a *layer* description, but the report now correctly characterizes Manual Slop's editing as comprehensive at the user-visible layer."
 [tasks]
 # Reference track; no implementation tasks. Future-track candidates live in decisions.md.
 # Listing for accountability:
 t_reference_01 = { status = "completed", commit_sha = "", description = "Read nagent README + bin/nagent in full" }
 t_reference_02 = { status = "completed", commit_sha = "", description = "Read all 6 nagent helper files in full (cli, llm, file_edit, file_split, file_patch, file_summarize)" }
 t_reference_03 = { status = "completed", commit_sha = "", description = "Read all 4 nagent executable scripts in full (nagent-file-edit, nagent-file-split, nagent-file-patch, nagent-file-summarize)" }
 t_reference_04 = { status = "completed", commit_sha = "", description = "Read Manual Slop docs/ in full (12 guides + Readme)" }
 t_reference_05 = { status = "completed", commit_sha = "", description = "Read Manual Slop src/ files selectively for user-corrections (models.py FileItem + ContextPreset, context_presets.py, project_manager.py, aggregate.py, history.py)" }
 t_write_01 = { status = "completed", commit_sha = "", description = "Draft spec.md (track wrapper)" }
 t_write_02 = { status = "completed", commit_sha = "", description = "Draft report.md (14-section deep-dive analysis; primary deliverable)" }
 t_write_03 = { status = "completed", commit_sha = "", description = "Draft comparison_table.md (flat side-by-side reference)" }
 t_write_04 = { status = "completed", commit_sha = "", description = "Draft decisions.md (10 future-track candidates)" }
 t_write_05 = { status = "completed", commit_sha = "", description = "Create metadata.json + state.toml" }
 t_write_06 = { status = "completed", commit_sha = "", description = "Draft nagent_takeaways_20260608.md (10 actionable patterns; companion to report.md)" }
 t_write_07 = { status = "pending",    commit_sha = "", description = "Add entry to conductor/tracks.md (post-commit)" }
 t_write_08 = { status = "pending",    commit_sha = "", description = "Human review of report.md + nagent_takeaways_20260608.md (final)" }
 t_archive = { status = "pending",       commit_sha = "", description = "Move track to conductor/tracks/archive/ when follow-up tracks are specced (or sooner if no value remains)" }
 [user_wants_recorded]
 # User explicitly wants these in priority order (see decisions.md for full detail)
 want_1_sub_conversation_runner = "EXPLICIT: 'I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points'"
 want_2_rag_pre_staging = "EXPLICIT: 'Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run'"
 deferred_intent_dsl = "EXPLICIT but deferred: 'I want to add an intent based dsl to help with discovery or combinatorics but no where near that ideation yet'"
 [verification]
 # Reference/analysis track; verification is artifact presence + user-correction application
 report_md_exists = true
 comparison_table_md_exists = true
 decisions_md_exists = true
 spec_md_exists = true
 metadata_json_exists = true
 state_toml_exists = true
 nagent_takeaways_md_exists = true
 # All 14 nagent principles have a corresponding section in report.md
 all_14_principles_covered = true
 # All user-corrections applied to first draft
 all_user_corrections_applied = true
 # All pitfalls are domain-tagged (Application / Meta-Tooling / Both)
 all_pitfalls_domain_tagged = true
 # Track produces no code (it's a reference/analysis track)
 no_code_modified = true
 # No links broken in comparison_table.md, decisions.md, report.md, spec.md, nagent_takeaways_20260608.md
 all_internal_links_valid = true  # verified by post-edit grep
 # 10 actionable takeaways grounded in actual code (file:line refs)
 takeaways_grounded_in_code = true
 [nagent_principles_covered]
 # 14 of 14 — full coverage
 durable_work = "covered in report §1"
 text_in_text_out = "covered in report §2"
 editable_state = "covered in report §3"
 visible_protocol = "covered in report §4"
 the_loop = "covered in report §5"
 per_file_memory = "covered in report §6"
 repo_history = "covered in report §7"
 neighborhoods = "covered in report §8"
 sub_conversations = "covered in report §9"
 controlled_writes = "covered in report §10"
 large_files = "covered in report §11"
 tool_discovery = "covered in report §12"
 differences_from_frameworks = "covered in report §13"
 build_your_own = "covered in report §14"
 [future_track_candidates]
 # See decisions.md for full detail. 10 candidates.
 candidate_01_sub_conversation_runner = { priority = "HIGH",     user_flag = "explicit want", domain = "App + MT", effort = "Medium" }
 candidate_02_rag_pre_staging = { priority = "HIGH",     user_flag = "explicit want", domain = "App",      effort = "Small (depends on #1)" }
 candidate_03_stateless_llm_client = { priority = "MEDIUM",  user_flag = "none",          domain = "App",      effort = "Large" }
 candidate_04_intent_dsl = { priority = "LOW",      user_flag = "explicit but deferred", domain = "MT", effort = "Research" }
 candidate_05_self_describing_tools = { priority = "LOW",  user_flag = "implicit",    domain = "BOTH",    effort = "Medium (subsumed by mcp_architecture_refactor)" }
 candidate_06_git_history_injection = { priority = "MEDIUM",  user_flag = "none",          domain = "App",      effort = "Medium" }
 candidate_07_per_file_conversation_log = { priority = "LOW", user_flag = "none",          domain = "App",      effort = "Small" }
 candidate_08_coedited_files_tools = { priority = "LOW",    user_flag = "none",          domain = "App",      effort = "Small (bundle with #6)" }
 candidate_09_split_patch_lib = { priority = "DEFER",   user_flag = "none",          domain = "App",      effort = "Medium (defer until need)" }
 candidate_10_raw_transcript_persistence = { priority = "LOW", user_flag = "none",         domain = "App",      effort = "Small" }
 [status]
 # Track is a reference/analysis track; "active" means the artifacts are ready for review
 # The track will move to "completed" and be archived when:
 #   (a) At least one of the follow-up tracks (candidates 1-2) is specced, OR
 #   (b) The user explicitly says the analysis is no longer needed
 status = "active (reference artifacts ready; awaiting human review + follow-up track scoping)"