diff --git a/conductor/tracks/nagent_review_20260608/comparison_table.md b/conductor/tracks/nagent_review_20260608/comparison_table.md new file mode 100644 index 00000000..ddab78e4 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/comparison_table.md @@ -0,0 +1,79 @@ +# nagent vs Manual Slop: Comparison Table + +**Companion to:** `report.md` +**Date:** 2026-06-08 (revised same day) +**Source:** nagent v1.0.0 (read 2026-06-08) + +Flat side-by-side reference. One row per nagent principle. Verdicts and pitfalls are in `report.md`. + +--- + +## Legend + +- **Verdict values:** PARITY (same shape), PARITY+ (Manual Slop is stronger), PARITY- (nagent is stronger), PARTIAL (one half, not the other), GAP (Manual Slop lacks the feature), DOMAIN MISMATCH (different scope). +- **Domain tags:** APP = Application domain, MT = Meta-Tooling domain, BOTH. + +--- + +| # | nagent Principle (verbatim summary) | nagent Mechanism | Manual Slop Equivalent | Verdict | Domain | Action | +|---|---|---|---|---|---|---| +| 1 | Durable work, disposable workers. The agent is not the thing; the data is the thing. | `bin/nagent` 700-line single-file loop, conversation is a text file | MMA workers are real subprocesses with Context Amnesia; **Application AI is long-lived by design** | **PARTIAL** | BOTH | Future-track: stateless `LLMClient` class (§15.4) | +| 2 | Text in, text out. File in, text out is the smallest useful primitive. | `bin/nagent-llm-text` + `bin/helpers/nagent_llm.py` (4 providers) | `src/ai_client.py:send(...) -> str` (5 providers) | **PARITY** | BOTH | None | +| 3 | Conversations are editable state. The conversation file is not chat history; it is working state. 
| `bin/nagent` exposes `--save/load/edit/summarize`; text files are user-editable (vim/cat/diff/cp the raw transcript) | Discussion Takes + branching + per-entry edit (A1-A7 in report §3) + discussion-level CRUD (B1-B11) + role management (B5) + UI snapshot undo/redo (C1-C5) | **PARITY (DIFFERENT FOCUS)** — Manual Slop edits abstracted typed entries (`disc_entries` is a `list[dict]` with role + content + ts + thinking_segments + usage). Both have comprehensive editing; Manual Slop's is more granular at the entry layer, nagent's is deeper at the raw-transcript layer. | APP | Future-track: optional raw-transcript persistence per Take (Candidate 10) | +| 4 | Visible output protocol. Teach the model an output format; use a visible, parseable protocol. | `TAG_PATTERNS` regex list; `parse_response` strict; `MAX_FORMAT_RETRIES = 3` | Provider-native function calling (Gemini, Anthropic, etc.) | **ARCHITECTURAL DIFFERENCE** — Application's choice is correct (parallel tool calls, JSON mode) | BOTH | Future-track: intent-based DSL for Meta-Tooling calls | +| 5 | The loop. Append, call, parse, act, append, repeat. | `bin/nagent:run_agent_loop()` 50 lines, single `while True` | Three parallel loops: `ai_client._send_*` (LLM), `ConductorEngine.run` (MMA), `WorkflowSimulator.run_discussion_turn_async` (App) | **PARITY** | BOTH | (Low priority) Future-track: extract a single `src/llm_loop.py:run_loop` | +| 6 | Per-file memory. Each file gets its own persistent local memory. | `file_id_for_path` (st_dev:st_ino); `conversations/file-index-{pid}.json`; `nagent-file-edit` per-file subprocess | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Structural File Editor | **PARITY (DIFFERENT KIND)** — Manual Slop's is *curation memory* (rich); nagent's is *conversation log memory* (plain text). Both real, both per-file, different optimization. 
| APP | Future-track: thin "last-investigation" log per file (Meta-Tooling-friendly) | +| 7 | Repository history as data. Turn git history into editing context. | `git_file_history` + `summarize_new_file_commits` + `coedited_file_rows` + `format_file_history` | `_reread_file_items` (mtime-based, diff injection); git-linked discussion tracking in GUI; **no historical-context injection** | **PARTIAL** — diff injection is similar; historical-context injection is missing | APP | Future-track: `src/git_history.py` mirroring nagent's `file_edit_history_and_summary_block` | +| 8 | Historical coupling & artifact neighborhoods. Files that change together are hints. | `coedited_file_rows` labels high/medium/low co-edit rate; guidance text "Use these files as hints. Do not edit unless the user request or evidence requires it." | None (closest: `py_get_hierarchy` is structural not historical) | **GAP** | APP | Future-track: `py_coedited_files` + `ts_c_coedited_files` MCP tools | +| 9 | Disposable sub-conversations. Exploration creates noise; spawn disposable workers. | `` tag spawns `nagent --invocation delegated` as subprocess; isolated conversation file; recursive token rollup | MMA Tier 3/4 workers (real subprocesses); **1:1 main discussion has no sub-conversation mechanism** | **PARITY for MMA; GAP for 1:1 discussions** | APP (and MT) | **USER-FLAGGED WANT**: Future-track `src/sub_conversation.py:SubConversationRunner` for 1:1 investigations | +| 10 | Controlled writes. A loop that writes files needs explicit boundaries. Not a sandbox; just conventions. 
| `validate_write_path`: main mode → tmpdir only; file-edit mode → target or segments; rejected writes append `` | `mcp_client._is_allowed` (3-layer: allowlist + path validation + resolution gate); `run_powershell` requires GUI modal approval; PowerShell-only by default; 60s timeout + `taskkill` cleanup; optional Tier 4 QA | **PARITY+ (Manual Slop stronger)** — 3-layer security + HITL + sandbox is dramatically stricter than nagent's tmpdir check | APP (and MT) | None — current design is right | +| 11 | Large files as explicit artifacts. Split, edit segments, patch. | `nagent-file-split` (11 langs, regex + line counts + brace/JSON/XML depth); `nagent-file-patch` (strict hash validation); `nagent-file-summarize` (per-segment + retry); 32 KB default; index.json with `source_path`, `sourcesha256`, `segments[]` | `aggregate.py:build_file_items` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter); `set_file_slice` / `edit_file` (mtime validation, not hash); `run_subagent_summarization` (in-process, no retry); `RAGEngine._chunk_code` (mtime-based, ChromaDB) | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation + hash validation; Manual Slop uses tree-sitter + in-process + mtime validation | BOTH | Future-track: explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, with hash validation | +| 12 | Tool discovery. Tool capability should be explicit data. | `collect_bin_tool_descriptions` runs each `bin/* --description`; auto-builds "Available tools:" block for initial context | None (45 tools in `mcp_client.py:dispatch` if/elif chain) | **GAP** — nagent's pattern is genuinely better; current dispatch is fine but not extensible | BOTH (especially MT) | Future-track: subsumed by `mcp_architecture_refactor_20260606` (sub-MCPs as self-describing modules) | +| 13 | Differences from frameworks. 
The reframing table: memory→editable artifact, agent→temporary transformation function, context→explicit input data. | The philosophical frame | The applicable reframings: editable UI state, curated per-file memory, git history as data | **N/A** | BOTH | (Lens, not action) | +| 14 | Build your own. 12-step buildable list. | The reference | Manual Slop has all 12, in different files, at different scale | **PARITY** | BOTH | (Checklist) | + +--- + +## The 6 Pitfalls (revised, after user-corrections) + +See `report.md §15` for full details. Quick reference: + +| # | Pitfall | Domain | Future-track | User flag? | +|---|---|---|---|---| +| 1 | No structured output protocol in Application AI (opaque function calling) | BOTH | Intent-based DSL for Meta-Tooling | Implicit ("intent based DSL to help with discovery") | +| 2 | Provider-specific history in process globals (`_anthropic_history`, `_deepseek_history`, etc.) | APP | Stateless `LLMClient` class | No | +| 3 | RAG is not "history as data" (fuzzy, not auditable) | APP | RAG pre-staging sub-conversation | **Yes** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run") | +| 4 | AI client is a stateful singleton with module-level globals (2,685-line file) | APP | Stateless `LLMClient` class (same as #2) | No | +| 5 | No non-MMA disposable sub-conversations | APP (and MT) | `src/sub_conversation.py:SubConversationRunner` | **Yes** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points") | +| 6 | Hard-coded tool discovery (45-tool if/elif chain) | BOTH | Subsumed by `mcp_architecture_refactor_20260606` | Implicit ("intent based DSL to help with discovery") | + +### Pitfalls removed by user-corrections + +- **(removed)** "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); the lack of editable raw transcripts is a *different* design choice, not a gap. 
See `report.md §3`. +- **(removed)** "No per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension (FileItem + ContextPreset + Fuzzy Anchors); what's missing is nagent's conversation-log dimension, which is a *different* optimization. See `report.md §6`. + +--- + +## Future-track candidates — priority list + +Ordered by user signal + implementation cost: + +1. **`src/sub_conversation.py:SubConversationRunner`** — user-flagged as a want. Extract MMA's `mma_exec.py` pattern into a reusable App-callable class. Useful for 1:1 investigations. **High priority.** (Pitfall #5) + +2. **RAG pre-staging via sub-conversation** — user-flagged as a want. A sub-agent pre-builds the RAG index for a planned run; the chunks become the discussion's starting memory. **High priority.** (Pitfall #3) + +3. **Stateless `LLMClient` class** — would unify Pitfall #2 and #4. Backwards-compatible with `ai_client.send()`. ~2-3 phases of careful refactor. **Medium priority.** + +4. **Intent-based DSL for Meta-Tooling tool calls** — user-noted as a want ("no where near that ideation yet"). **Low priority, research spike.** + +5. **Self-describing MCP tools (nagent §12 pattern)** — subsumed by `mcp_architecture_refactor_20260606`. **Low priority on its own.** + +6. **`src/git_history.py` for nagent §7 pattern** — historical context injection. **Medium priority, but only after #1-#2 are done.** + +7. **Per-file conversation log (nagent §6 conversation dimension)** — Meta-Tooling-friendly addition. **Low priority.** + +8. **`py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)** — small, contained. **Low priority.** + +9. **Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)** — only needed if very-large-file scenarios emerge. **Defer until needed.** + +10. **Optional raw-transcript persistence per Take (nagent §3 conversation dimension)** — niche. 
**Low priority.** diff --git a/conductor/tracks/nagent_review_20260608/decisions.md b/conductor/tracks/nagent_review_20260608/decisions.md new file mode 100644 index 00000000..679b5313 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/decisions.md @@ -0,0 +1,286 @@ +# Future-Track Candidates: nagent Review Follow-ups + +**Companion to:** `report.md` (deep-dive), `comparison_table.md` (flat reference), `nagent_takeaways_20260608.md` (actionable patterns) +**Date:** 2026-06-08 +**Source:** nagent v1.0.0 deep-dive review (see `report.md`) + +This document is the bridge from "what nagent teaches us" to "what Manual Slop should do about it." Each candidate is a *future* conductor track (not this one). The candidates are *not* committed — they emerge from the analysis but each is a separate scoping exercise. + +**For an actionable, code-grounded read of these candidates** (with the "what to do today, not just the future track" framing), see `nagent_takeaways_20260608.md` — it maps each candidate to specific patterns, design constraints, and small UX wins that don't need a new track. + +--- + +## Decision-making framework + +For each candidate: + +- **Why it matters** — what pitfall or capability gap does it address? +- **What it would do** — concrete description +- **Where it would live** — Application or Meta-Tooling +- **Dependency on existing tracks** — is anything already on the board? +- **Effort estimate** — small / medium / large +- **User signal** — has the user expressed want/don't-want/neutral? +- **Recommended priority** — high / medium / low + +The candidates are listed in priority order, with user signal weighted most heavily (the user is the product owner for the Application; the analysis is just a reference).
+ +--- + +## Candidate 1: `src/sub_conversation.py:SubConversationRunner` + +**User signal:** **EXPLICIT WANT** ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points.") + +**Why it matters.** nagent's §9 pattern (disposable sub-conversations via ``) is the cleanest way to handle "investigate this without polluting the main discussion." Manual Slop has it for MMA (`mma_exec.py` is a real subprocess) but not for 1:1 discussions. The user is asking for this. + +**What it would do.** A `SubConversationRunner` class that the App can call during a 1:1 discussion: +- `await runner.spawn(prompt: str, *, allowed_tools: list[str] = None, system_prompt: str = None) -> SubConversationResult` +- The runner spawns a fresh Python process (reusing the MMA pattern: `mma_exec.py` template with `--invocation user`, `--parent-conversation `, isolated `~/.manual_slop/sub_conversations/`) +- The sub-process runs to completion (or times out) +- Result returns: a concise artifact (the sub-agent's `` block) + token usage + exit code +- The App inserts the result into the active discussion as a "User" role entry (so the parent LLM sees it on the next turn) +- Cleanup: sub-conversation folder is auto-archived after 7 days (consistent with `log_pruner.py`) + +**Where it lives.** Application. Possibly Meta-Tooling too (the `scripts/` directory could use the same primitive). + +**Depends on.** None directly. Could leverage MMA's `mma_exec.py` as a starting template. The `public_api_migration_20260606` follow-up track is unrelated. + +**Effort.** **Medium.** 2-3 phases: (1) extract reusable subprocess skeleton from MMA, (2) add 1:1-specific context injection, (3) add GUI controls ("Investigate…" button, optional command-palette command). + +**Recommended priority.** **HIGH** — user-flagged. 
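
A minimal sketch of the `spawn` call described in Candidate 1, assuming a worker that reads the prompt on stdin and prints its artifact on stdout. The `SubConversationResult` fields, the `worker_argv` parameter, and the echo worker are hypothetical stand-ins; a real runner would presumably launch the `mma_exec.py`-style template and also report token usage.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class SubConversationResult:
    artifact: str    # concise text the parent discussion will see
    exit_code: int   # child exit status (-1 when the child timed out)
    timed_out: bool


class SubConversationRunner:
    """Hypothetical sketch: one disposable worker process per prompt."""

    def __init__(self, worker_argv: list[str], timeout_s: float = 60.0):
        self.worker_argv = worker_argv
        self.timeout_s = timeout_s

    async def spawn(self, prompt: str) -> SubConversationResult:
        proc = await asyncio.create_subprocess_exec(
            *self.worker_argv,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            # Send the prompt, then wait for the worker to finish.
            out, _err = await asyncio.wait_for(
                proc.communicate(prompt.encode("utf-8")), self.timeout_s
            )
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            return SubConversationResult("", -1, True)
        return SubConversationResult(
            out.decode("utf-8").strip(), proc.returncode, False
        )


# Stand-in worker (assumption): echoes a one-line summary of its stdin.
ECHO_WORKER = [
    sys.executable,
    "-c",
    "import sys; print('summary: ' + sys.stdin.read().strip())",
]


def run_demo(prompt: str) -> SubConversationResult:
    return asyncio.run(SubConversationRunner(ECHO_WORKER).spawn(prompt))
```

The parent only ever sees the returned artifact, so the worker's intermediate noise stays out of the main discussion.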
+ +--- + +## Candidate 2: RAG pre-staging via sub-conversation + +**User signal:** **EXPLICIT WANT** ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run.") + +**Why it matters.** Manual Slop's RAG (`src/rag_engine.py`) indexes files on the fly at discussion start. For large projects, indexing can take 30+ seconds (per `tests/test_rag_phase4_stress.py`). The user wants a "prep" workflow: before starting a long discussion, fire off a sub-conversation that pre-indexes everything, so the discussion starts instantly. + +This is also consistent with nagent's "data preparation is an explicit, visible step" philosophy (§1, §7). The RAG chunks are artifacts; preparing them is a transformation; the transformation can be a sub-conversation. + +**What it would do.** A "Pre-stage RAG" command in the GUI (or in `commands.py`): +- Spawns a sub-conversation with the prompt: "Index all files in [project] for RAG. Use the index_file tool on every file in the context. Report top-K queries at the end." +- The sub-conversation runs `rag_engine.index_file()` on each tracked file (uses the same `ChromaDB` backend, with mtime-based invalidation) +- Returns a concise summary: "Indexed N files. Top-K for 'execution clutch': [file1, file2, file3]." +- The main discussion starts with the index already warm; `RAGEngine.search()` is fast + +**Where it lives.** Application. The sub-conversation runner is the same primitive as Candidate 1; the staging logic is `RAGEngine` integration. + +**Depends on.** Candidate 1 (sub-conversation runner). Could be done as a feature within Candidate 1's track. + +**Effort.** **Small to medium.** The sub-conversation runner is the heavy lift (Candidate 1). The RAG-staging prompt is ~30 lines. + +**Recommended priority.** **HIGH** — user-flagged; cheap given Candidate 1. 
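
The mtime-based invalidation mentioned in Candidate 2 could look roughly like the sketch below. It assumes the pre-stage step keeps a mapping of path to last-indexed mtime; `stale_files`, `prestage`, and the `index_one` callback are hypothetical names, and the callback stands in for whatever `RAGEngine` actually exposes.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def stale_files(paths: list[Path], indexed: dict[str, int]) -> list[Path]:
    """Return the paths whose on-disk mtime differs from the recorded one."""
    return [p for p in paths if indexed.get(str(p)) != os.stat(p).st_mtime_ns]


def prestage(
    paths: list[Path],
    indexed: dict[str, int],
    index_one: Callable[[Path], None],
) -> str:
    """Index only the stale files, record their mtimes, and summarize."""
    todo = stale_files(paths, indexed)
    for p in todo:
        index_one(p)  # assumption: stands in for the real indexing call
        indexed[str(p)] = os.stat(p).st_mtime_ns
    return f"Indexed {len(todo)} of {len(paths)} files."


def demo() -> tuple[str, str]:
    """Run the pre-stage twice; the second run should find nothing stale."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [Path(tmp) / "a.py", Path(tmp) / "b.py"]
        for p in paths:
            p.write_text("x = 1\n", encoding="utf-8")
        indexed: dict[str, int] = {}
        seen: list[Path] = []
        first = prestage(paths, indexed, seen.append)
        second = prestage(paths, indexed, seen.append)
        return first, second
```

A warm second run is the point of pre-staging: the main discussion would start with nothing left to index.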
+ +--- + +## Candidate 3: Stateless `LLMClient` class + +**Why it matters.** `src/ai_client.py` is 2,685 lines of stateful singleton with module-level globals for every provider's history. nagent's `bin/helpers/nagent_llm.py` is 300 lines of stateless dispatch. A refactor toward a stateless `LLMClient(provider, model, conversation)` class would: + +- Make `ai_client` parseable (no implicit state to track) +- Make tests deterministic (each test gets a fresh client) +- Enable conversation save/load (the `Conversation` object is the transcript) +- Enable provider switching without losing history + +This is a *big* refactor but a high-leverage one. It would resolve both Pitfall #2 and Pitfall #4. + +**What it would do.** A new `src/llm_client.py`: +```python +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +from typing import Iterator + +# Message, Tool, and Event are placeholder types, still to be defined. +@dataclass +class Conversation: + messages: list[Message] # role + content + tool_calls + tool_results + metadata: dict + def to_dict(self) -> dict: ... + @classmethod + def from_dict(cls, data: dict) -> Conversation: ... + def save(self, path: Path) -> None: ... + @classmethod + def load(cls, path: Path) -> Conversation: ... + +class LLMClient: + def __init__(self, provider: str, model: str, api_key: str | None = None): ... + def send(self, conversation: Conversation, *, tools: list[Tool] | None = None) -> Conversation: ... + def stream_send(self, conversation: Conversation, *, tools: list[Tool] | None = None) -> Iterator[Event]: ... +``` + +Backwards-compat: `ai_client.send(...)` becomes a thin wrapper that constructs a default `Conversation` from the current state and calls the new class. + +**Where it lives.** Application (the AI client is the Application's main AI entry point). + +**Depends on.** The `data_oriented_error_handling_20260606` track is independent but related — both push toward the data-oriented principles. The `public_api_migration_20260606` follow-up track would benefit from the new `Conversation` class.
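
One way the `Conversation` save/load surface from Candidate 3 could be filled in, as a sketch only. It assumes JSON on disk and a two-field `Message`; both are illustrative choices, not the planned schema, and `round_trip` is a hypothetical helper for the demonstration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Message:
    role: str      # assumption: only role + content, no tool calls yet
    content: str


@dataclass
class Conversation:
    messages: list[Message] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict recurses into the nested Message dataclasses.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "Conversation":
        messages = [Message(**m) for m in data["messages"]]
        return cls(messages, dict(data.get("metadata", {})))

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.to_dict(), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "Conversation":
        return cls.from_dict(json.loads(path.read_text(encoding="utf-8")))


def round_trip(conv: Conversation) -> Conversation:
    """Save to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "conversation.json"
        conv.save(path)
        return Conversation.load(path)
```

Because the transcript is plain data, a test can build a fresh `Conversation` per case without touching any module-level state.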
+ +**Effort.** **Large.** 3-5 phases: (1) introduce `Conversation` dataclass, (2) per-provider `LLMClient.send`, (3) migration of existing `ai_client.send` callers, (4) deprecate module-level globals, (5) remove. ~2000+ lines of refactor. + +**Recommended priority.** **MEDIUM.** High value, but the existing stateful singleton works. Defer until a concrete Application need forces it (e.g., the user wanting to save/replay conversations). + +--- + +## Candidate 4: Intent-based DSL for Meta-Tooling tool calls + +**User signal:** **EXPLICIT WANT** ("The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet.") + +**Why it matters.** nagent's §4 regex-tag protocol is more debuggable than Manual Slop's function-calling. The Meta-Tooling (the external agents that build the Application) could benefit from a more compact, inspectable tool-call format. The existing JSON function-calling format forces the user to read verbose `{"name": "...", "args": {...}}` blobs. + +**What it would do.** An intent-based DSL that the Meta-Tooling can use in its own work. Examples (per the user's "discovery" or "combinatorics" hint): +- `` — intent: read this symbol +- `` — intent: semantic search the workspace +- `` — intent: surgical line-range edit +- `` — intent: run a specific test +- `` — intent: dependency trace + +These are read by the external agent (Gemini CLI, OpenCode), not by Manual Slop's Application AI. The Application's function-calling format stays the same (correct for its domain). + +**Where it lives.** Meta-Tooling. Documented in `docs/`; taught via the conductor convention; the external agent emits the DSL, the bridge script (`cli_tool_bridge.py`) translates to actual `mcp_client.py` tool calls. + +**Depends on.** None directly. The `mcp_architecture_refactor_20260606` may produce tools that are easier to call via DSL (atomic, composable). 
+ +**Effort.** **Research spike, not implementation.** The user said "no where near that ideation yet." This is a design exercise, not a code change. + +**Recommended priority.** **LOW** — user explicitly deferred. + +--- + +## Candidate 5: Self-describing MCP tools (nagent §12 pattern) + +**Why it matters.** Manual Slop's 45 MCP tools are dispatched by a flat if/elif in `mcp_client.py:dispatch`. Adding a tool requires edits in 4 places (dispatch, security allowlist, capability declaration, tests). nagent's `--description` self-describing executable pattern is more extensible: drop an executable, it auto-appears. + +**What it would do.** Each sub-MCP (or each tool) emits a `--description` block on `--help`. The `dispatch` function introspects via `mcp_client.get_tool_schemas()` and includes the descriptions in the AI's initial context automatically. + +**Where it lives.** Application (the dispatch layer). The Meta-Tooling already has self-describing tools (via `claude_tool_bridge.py`); this is the Application-side equivalent. + +**Depends on.** The `mcp_architecture_refactor_20260606` track is the natural place — the sub-MCPs would each be self-describing modules. + +**Effort.** **Medium** (subsumed by `mcp_architecture_refactor_20260606`). Not a separate track. + +**Recommended priority.** **LOW** — subsumed. + +--- + +## Candidate 6: `src/git_history.py` (nagent §7 pattern) + +**Why it matters.** Manual Slop's `_reread_file_items` does current-content diff injection. nagent's `file_edit_history_and_summary_block` does *historical* content injection: `git log --follow ` per file, LLM-summarized, plus co-edit neighborhood. For "explain this file" questions, the LLM is meeting the file fresh — git history would give it crucial context (who touched it last, why, what's nearby).
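
The co-edit neighborhood mentioned above can be illustrated with a small pure function. This is a sketch under stated assumptions: commits arrive as lists of changed paths (collecting them from git is left out), and the 0.5 / 0.2 thresholds for high / medium / low are invented for illustration, not taken from nagent's source.

```python
from collections import Counter


def coedited_rows(
    target: str, commits: list[list[str]]
) -> list[tuple[str, int, str]]:
    """Rank files that changed in the same commits as `target`.

    Each row is (path, commits_together, label).
    """
    touching = [files for files in commits if target in files]
    counts = Counter(
        path for files in touching for path in set(files) if path != target
    )
    rows = []
    # Sort by count (descending), then path, so the output is deterministic.
    for path, together in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        rate = together / len(touching)
        # Hypothetical thresholds; tune against real repository history.
        label = "high" if rate >= 0.5 else "medium" if rate >= 0.2 else "low"
        rows.append((path, together, label))
    return rows


SAMPLE_COMMITS = [
    ["a.py", "b.py"],
    ["a.py", "b.py"],
    ["a.py", "b.py"],
    ["a.py", "b.py", "c.py"],
    ["a.py", "c.py"],
    ["a.py", "e.py"],
    ["d.py", "b.py"],  # does not touch a.py, so it is ignored
]
```

Rows like these are hints about where to look next, not instructions to edit the neighboring files.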
+ +**What it would do.** A `src/git_history.py:file_edit_history_and_summary_block(file_path, repo_root, provider, model, config_path, previous_initial_context=None) -> str` that: +- Calls `git log --follow --max-count=50 --date=short --format=...` per file +- Counts co-edited files per commit +- LLM-summarizes new commits (with cache for unchanged history) +- Renders a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits +- Called from `aggregate.py:run` at discussion start, after the file is added to context + +**Where it lives.** Application (it's part of the AI's initial context). + +**Depends on.** None directly. The `data_oriented_error_handling_20260606` is independent. The `rag_engine.py` already has a `sourcesha256` field and mtime-based invalidation — the same pattern. + +**Effort.** **Medium.** 2 phases: (1) git history + co-edit, (2) LLM summarization with cache. ~300-500 lines. + +**Recommended priority.** **MEDIUM** — high value, but only after Candidates 1-2 are done. + +--- + +## Candidate 7: Per-file conversation log (nagent §6 conversation dimension) + +**Why it matters.** Manual Slop's per-file memory is the *curation* kind. nagent's is the *conversation log* kind. The user has the curation already; the conversation log is missing. The user's correction made this clear: the two are *different optimizations*, not equivalent. + +**What it would do.** A thin `~/.manual_slop/per_file/.md` per file (file_id by `st_dev:st_ino` for stability across renames, like nagent). Updated each time a discussion references the file. Format: +```markdown +# src/foo.py (file_id: 12345:67890) +Last referenced: 2026-06-08T12:34:56 (Discussion: "refactor auth") + +## 2026-06-08T12:34:56 - "how does the validation work?" +AI response: ... +(User) followup: "what about edge cases?" + +## 2026-06-05T... - "explain the parser" +AI response: ... 
+``` + +When the user opens a new discussion with the file in context, the per-file log is injected as a `{per-file-history}` block. + +**Where it lives.** Application (the per-file log is the App's memory). The Meta-Tooling doesn't need this — sub-agent invocations are already short-lived. + +**Depends on.** None. Could be added in a small follow-up to Candidate 3 (the `Conversation` object becomes the per-file log). + +**Effort.** **Small** if done as a thin layer on top of the `Conversation` class. **Medium** if done before Candidate 3 (no `Conversation` object to leverage). + +**Recommended priority.** **LOW** — niche feature. + +--- + +## Candidate 8: `py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8) + +**Why it matters.** nagent's `coedited_file_rows` produces a "files that historically co-edit with this file" table. Manual Slop has `py_get_hierarchy` (subclass scan) but no historical co-edit tool. Useful for "if I edit this file, what should I also look at?". + +**What it would do.** Two new MCP tools: +- `py_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — runs `git log --follow `, counts files in each commit, labels high/medium/low +- `ts_c_coedited_files(path: str) -> list[{path, commits_together, likelihood}]` — same, for C/C++ + +Returns a table. Used in the initial context as `{file-neighborhood}`. + +**Where it lives.** Application (initial context injection). + +**Depends on.** None. Small, contained. + +**Effort.** **Small.** ~200 lines + tests. The git-log is already in `aggregate.py`; this is a new tool that uses the same primitives. + +**Recommended priority.** **LOW** — small but niche. Worth bundling with Candidate 6 if that gets done. + +--- + +## Candidate 9: Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11) + +**Why it matters.** Manual Slop doesn't have an explicit split/patch pipeline.
For very large files (>50 KB), the current `aggregate.py` + tree-sitter approach works for *reading* (skeleton, summary) but not for *patching* (no explicit segment/hash model). + +**What it would do.** Mirror nagent's design: +- `src/split_lib.py` — per-language natural splitters, `index.json` with `source_path`, `sourcesha256`, `segments[]` +- `src/patch_lib.py` — strict `validate_index` (hash check), `make_unified_patch`, `apply_segment_patches` +- `src/summarize_lib.py` — per-segment LLM call + retry-with-smaller-prompt + +**Where it lives.** Application (the AI is the consumer). The Meta-Tooling already has nagent if it wants this. + +**Depends on.** None. Self-contained. + +**Effort.** **Medium.** 2 phases: split/patch, then summarize. ~500 lines. + +**Recommended priority.** **DEFER UNTIL NEEDED.** No current 1:1 use case requires explicit split/patch. If a future file is genuinely too large for tree-sitter to handle inline, this becomes Candidate #2-priority. + +--- + +## Candidate 10: Optional raw-transcript persistence per Take (nagent §3 conversation dimension) + +**Why it matters.** nagent's "edit the conversation file" pattern is foreign to Manual Slop because the App stores abstracted entries (`disc_entries`), not raw transcripts. The user-edit feature in the GUI does edit individual entries, but the underlying log of `function_call` / `tool_result` blocks is implicit. + +**What it would do.** Optionally, when a take is snapshotted to TOML (`project_manager.save_project`), also persist the raw transcript to a sibling file `discussions//transcript.jsonl`. The GUI gets a "View Raw Transcript" button. Optional "Edit Raw Transcript" mode that re-parses and re-aggregates. + +**Where it lives.** Application. Optional — user can toggle per-project. + +**Depends on.** None. Could be a small follow-up to Candidate 3 (`Conversation` class). + +**Effort.** **Small.** ~150 lines + tests. Persist the existing `comms.log` in a structured way. 
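
A sketch of what persisting a raw transcript as `transcript.jsonl` might involve, assuming one JSON object per line. The event fields (`kind`, `name`, `content`) and the helper names are hypothetical; the actual shape would follow whatever `comms.log` already records.

```python
import json
import tempfile
from pathlib import Path


def append_event(path: Path, event: dict) -> None:
    """Append one transcript event as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")


def read_events(path: Path) -> list[dict]:
    """Read the transcript back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def demo() -> list[dict]:
    """Write two events to a temporary transcript and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "transcript.jsonl"
        append_event(path, {"kind": "function_call", "name": "read_file"})
        append_event(path, {"kind": "tool_result", "content": "ok"})
        return read_events(path)
```

An append-only line format keeps the file readable with ordinary text tools, which suits a "View Raw Transcript" button.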
+ +**Recommended priority.** **LOW** — niche feature, opt-in only. + +--- + +## Summary table + +| # | Candidate | User signal | Priority | Effort | Domain | +|---|---|---|---|---|---| +| 1 | `SubConversationRunner` (1:1 sub-convos) | **Explicit want** | **HIGH** | Medium | App + MT | +| 2 | RAG pre-staging via sub-conversation | **Explicit want** | **HIGH** | Small (depends on #1) | App | +| 3 | Stateless `LLMClient` class | (none) | Medium | Large | App | +| 4 | Intent-based DSL for Meta-Tooling | Explicit but deferred | Low | Research | MT | +| 5 | Self-describing MCP tools | Implicit | Low (subsumed) | Medium | BOTH | +| 6 | `src/git_history.py` (nagent §7) | (none) | Medium | Medium | App | +| 7 | Per-file conversation log | (none) | Low | Small | App | +| 8 | `py_/ts_c_coedited_files` tools | (none) | Low (bundle with #6) | Small | App | +| 9 | Explicit `split_lib.py` / `patch_lib.py` | (none) | Defer until needed | Medium | App | +| 10 | Raw-transcript persistence per Take | (none) | Low | Small | App | + +--- + +## Recommended next steps + +1. **Spec and build Candidate 1 first** — it's the highest-priority user-flagged want, and Candidate 2 builds on it. +2. **Combine Candidate 2 with Candidate 1's track** — same primitive, different prompt. +3. **Hold Candidates 3-10 for future scoping** — each is a separate conductor track when the corresponding need surfaces. + +The current `nagent_review_20260608` track itself produces no code; it's the reference. Candidates 1 and 2 will be the first *implementation* tracks informed by it.
diff --git a/conductor/tracks/nagent_review_20260608/metadata.json b/conductor/tracks/nagent_review_20260608/metadata.json new file mode 100644 index 00000000..81c9477c --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/metadata.json @@ -0,0 +1,132 @@ +{ + "track_id": "nagent_review_20260608", + "name": "nagent Review (Mike Acton's data-oriented LLM agent reference)", + "initialized": "2026-06-08", + "owner": "tier2-tech-lead", + "priority": "medium", + "status": "active", + "type": "reference + analysis + future-track scoping", + "scope": { + "new_files": [ + "conductor/tracks/nagent_review_20260608/spec.md", + "conductor/tracks/nagent_review_20260608/report.md", + "conductor/tracks/nagent_review_20260608/comparison_table.md", + "conductor/tracks/nagent_review_20260608/decisions.md", + "conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md" + ], + "modified_files": [], + "external_resources": [ + "nagent README: https://github.com/macton/nagent/blob/main/README.md", + "nagent source: https://github.com/macton/nagent (all 11 source files read in full)" + ] + }, + "blocked_by": [], + "blocks": [ + "sub_conversation_runner_app_1to1_20260608_PLACEHOLDER", + "rag_pre_staging_sub_convo_20260608_PLACEHOLDER", + "llm_client_stateless_class_20260608_PLACEHOLDER", + "intent_dsl_for_meta_tooling_20260608_PLACEHOLDER", + "git_history_injection_20260608_PLACEHOLDER", + "per_file_conversation_log_20260608_PLACEHOLDER", + "py_coedited_files_tool_20260608_PLACEHOLDER", + "ts_c_coedited_files_tool_20260608_PLACEHOLDER", + "split_patch_lib_20260608_PLACEHOLDER", + "raw_transcript_persistence_per_take_20260608_PLACEHOLDER" + ], + "estimated_phases": 0, + "spec": "spec.md", + "plan": null, + "nagent_principles_covered": [ + "Durable work, disposable workers", + "Text in, text out", + "Conversations are editable state", + "Visible output protocol", + "The loop", + "Per-file memory", + "Repository history as data", + "Historical coupling & artifact 
neighborhoods", + "Disposable sub-conversations", + "Controlled writes", + "Large files as explicit artifacts", + "Tool discovery", + "Differences from frameworks", + "Build your own" + ], + "manual_slop_features_audited": [ + "Context composition (FileItem + ContextPreset + custom_slices + ast_mask)", + "Discussion Takes + branching (project_manager.branch_discussion + promote_take)", + "UI Snapshot history (HistoryManager + UISnapshot)", + "Personas (Persona + PersonaManager)", + "RAG (RAGEngine + ChromaDB + summarization)", + "Multi-provider AI client (ai_client + 5 providers)", + "MMA conductor (mma_exec.py + ConductorEngine + WorkerPool)", + "MCP tools (45 tools + 3-layer security)", + "Hook API (api_hooks + api_hook_client)", + "GUI App/Controller state delegation" + ], + "user_corrections_applied": [ + "Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS)", + "Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION", + "Sub-conversations: removed 'PARITY stronger' claim; added 'GAP for 1:1 discussions'", + "RAG: clarified as opt-in, not gap; user wants pre-staging via sub-conversation", + "Personas: reframed as config bundling (not gap; can opt out via AI settings)", + "Tool discovery: downgraded to 'intentional, low priority'; user has deferred DSL idea", + "Editable discussions (second pass): report §3 now enumerates the full per-entry (A1-A7) + discussion-level (B1-B11) + undo/redo (C1-C5) operation matrix. Verdict remains PARITY (DIFFERENT FOCUS) but the gap is more precisely scoped: Manual Slop's editing is more granular at the typed-entry layer; nagent's is deeper at the raw-transcript layer." 
+ ], + "domain_classification": { + "Application_domain_pitfalls": [ + "Provider-specific history in process globals", + "AI client is a stateful singleton with module-level globals", + "No non-MMA disposable sub-conversations (1:1 gap)", + "RAG is not 'history as data' (fuzzy vs exact)", + "Optional raw-transcript persistence (niche)" + ], + "Meta_Tooling_domain_pitfalls": [ + "No structured output protocol (opaque function calling)", + "Hard-coded tool discovery" + ], + "Application_features": [ + "Context composition with FileItem-level curation memory", + "Discussion Takes + branching (project_manager.branch_discussion + promote_take)", + "UI Snapshot history (HistoryManager + UISnapshot)", + "Personas as config bundling", + "RAG as opt-in semantic search", + "3-layer MCP security model + Execution Clutch" + ], + "Meta_Tooling_features_to_borrow": [ + "nagent-style --description self-describing executables", + "Intent-based DSL for compact tool calls" + ] + }, + "verification_criteria": [ + "spec.md exists and covers the 14 nagent principles", + "report.md exists and is the primary deliverable", + "comparison_table.md exists as flat side-by-side reference", + "decisions.md exists with 10 future-track candidates", + "nagent_takeaways_20260608.md exists with 10 actionable patterns (companion to report.md)", + "Every pitfall is tagged with Application / Meta-Tooling / Both", + "Pitfall #3 (conversations are editable) verdict is corrected to PARITY (DIFFERENT FOCUS) per user feedback", + "Pitfall #6 (per-file memory) verdict is corrected to 'Manual Slop is stronger in curation dimension' per user feedback", + "Pitfall #9 (sub-conversations) verdict notes MMA vs 1:1 distinction per user feedback", + "Report §3 enumerates the per-entry (A1-A7) + discussion-level (B1-B11) + undo/redo (C1-C5) operation matrix for Manual Slop's editable-discussion system, with file:line citations into gui_2.py and history.py", + "nagent_takeaways_20260608.md grounds each pattern in 
actual code with file:line references into both nagent source and Manual Slop source", + "No code was modified by this track (reference/analysis only)" + ], + "links": { + "report": "report.md", + "comparison_table": "comparison_table.md", + "decisions": "decisions.md", + "takeaways": "nagent_takeaways_20260608.md", + "user_signal_recorded": "User explicitly flagged SubConversationRunner + RAG pre-staging as wants during review", + "related_tracks": [ + "data_oriented_error_handling_20260606 (Fleury/Acton alignment)", + "qwen_llama_grok_integration_20260606 (OpenAI-compatible helper)", + "mcp_architecture_refactor_20260606 (sub-MCP extraction)", + "data_structure_strengthening_20260606 (type aliases)" + ], + "external": [ + "https://github.com/macton/nagent (nagent source code)", + "https://github.com/macton/nagent/blob/main/README.md (nagent README)" + ] + } +} diff --git a/conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md b/conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md new file mode 100644 index 00000000..a3b70685 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md @@ -0,0 +1,363 @@ +# nagent: Actionable Takeaways for Manual Slop + +**Track:** `nagent_review_20260608` +**Date:** 2026-06-08 +**Companion to:** `report.md` (deep-dive comparison), `comparison_table.md` (flat reference), `decisions.md` (10 future-track candidates) +**Author:** Tier 2 Tech Lead +**Read this if:** you're planning a future track, designing a UX change, or wondering "what should we actually do with nagent's ideas?" + +> **What this document is.** The deep-dive in `report.md` maps nagent's 14 principles 1:1 to Manual Slop's existing features and finds six pitfalls. That's the *diagnosis*. This document is the *prescription* — 10 concrete patterns nagent uses that we can borrow, with each one grounded in actual code we've read and an explicit "what to do" path. 
+> +> **What this document is not.** It is not a critique of Manual Slop, not a recommendation to rewrite anything, and not a "framework migration" plan. nagent is a 4,000-line reference; Manual Slop is 13,000+ lines of production code with a GUI, real persistence, real HITL. The right reaction to nagent is *steal the patterns that fit our domain*, not adopt the whole system. +> +> **Domain filter.** Every takeaway below is tagged **Application**, **Meta-Tooling**, or **Both** — per `docs/guide_meta_boundary.md`. nagent lives in the Meta-Tooling domain by default. Some patterns transfer cleanly to the Application; some only make sense for the agents that build the Application. Don't apply a "Both" pattern without checking the domain. + +--- + +## 0. The 30-second version + +If you only read 3 things, read these: + +1. **Make state visible at the right layer** (§1) — nagent puts state in files you can `cat`. Manual Slop already does this for *editable* state (`disc_entries`, `ContextPreset`, `FileItem`, project TOML) but the *provider-side* history still lives in process globals. *Steal the visibility, not the file abstraction.* + +2. **Make the protocol readable in the conversation log** (§2) — nagent's conversation is plain text with `...` tags you can grep. Manual Slop's comms log is JSON-L with provider-native function-call blobs. *Add a "what the model actually said" projection layer.* + +3. **Make sub-agents a first-class primitive for the Application, not just MMA** (§3) — nagent has one sub-conversation mechanism, used everywhere. Manual Slop has sub-agents for MMA workers but not for 1:1 discussions. *The user explicitly wants this — it's the highest-priority future track.* + +The other 7 patterns are below. Each is grounded in code, not vibes. + +--- + +## 1. 
State visibility — files for the things that matter, processes for the things that don't + +**nagent's pattern.** Every piece of state that *survives* lives in a file under `~/.nagent/`: +- `conversations/` — the conversation transcript +- `conversations/file-index-{pid}.json` — file_id → conversation map +- `splits/-/index.json` — large-file split metadata +- `splits/-/-0001.` — segment files +- `splits/-/.patch` — unified diff patch + +The state that *doesn't survive* is the running process: LLM call result, current turn, parse state. The boundary is sharp: anything the user might want to inspect, diff, copy, or back up is a file. + +**Manual Slop today.** Already does this for the *editable* surface: +- `manual_slop.toml` (project) — `discussion.discussions[].history` (`app_controller.py:3236`) +- `conductor/tracks//{spec,plan,state.toml,metadata.json}` — track state +- `personas.toml` (global + project) — persona config +- `tool_presets.toml` — tool weights +- `logs/sessions//comms.log` — JSON-L of every LLM call (`app_controller.py:379`) + +What *isn't* in files: +- `ai_client._anthropic_history`, `_deepseek_history`, `_minimax_history` — 3 per-provider lists in process globals (`ai_client.py:123-132`) +- The current `disc_entries[i]["content"]` AI response *before* the user flushes the discussion to TOML +- The current `files` / `context_files` / `screenshots` until the next `_flush_to_project` + +**Actionable idea.** Add a **"Live State Inspector"** panel in the GUI that shows *all* the state that's currently in process — provider history lengths, current discussion entry count, the actual bytes that haven't been flushed yet, the `ai_client` module globals being read. This is a UX change, not an architecture change. It costs ~200 lines (a panel that reads from `app_controller._get_state_for_inspector()` and renders a tree). + +**Domain:** Both. 
The Application benefits from "what is the AI actually remembering right now?"; the Meta-Tooling benefits from "did my edit actually flow through to the right state?" + +**Effort:** Small. *Not* a new track — this can be a one-day add-on once the inspector is specced. + +**Cross-references:** Decision candidate #3 (Stateless LLMClient) becomes more attractive once the inspector exists, because you'd have a UI to verify the stateless refactor preserves behavior. + +--- + +## 2. A readable conversation log — text the user can grep, not just JSON-L + +**nagent's pattern.** The conversation file is plain text. Every action appears as a tag: +``` +python3 -m unittest discover -s tests -v + +exit_code: 0 +stdout: ... + +All 12 tests pass. +``` + +The user can `grep -n "exit_code: [^0]" ~/.nagent/conversations/latest-*` to find all failed shell runs. The user can `git diff` the conversation file. The user can `cp` it to a teammate. The protocol is *the storage format*, not a side channel. + +**Manual Slop today.** `comms.log` is JSON-L with provider-native function-call blobs. To find "did the model call `read_file` with the right path?" you need to load JSON, navigate to the right `function_call` entry, know the provider's schema, and dig out the args. The `function_call` itself is opaque — you can't `grep` for it without understanding the provider's wrapping. + +The `app.disc_entries` GUI display *is* the readable projection — when you look at a discussion in the GUI, you see the user/AI turns. But: +1. The view is in the GUI only; the underlying `comms.log` is JSON-L. +2. The thinking trace, tool calls, and tool results are flattened into the entry's `content` field via `thinking_parser.py`. You see the *result* but not the *call* unless you open the read mode. +3. There's no per-tool-call "View raw" button in the comms log panel (per `docs/guide_gui_2.md`). 
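To make that lookup cost concrete, here is a minimal sketch of what finding a `read_file` call in a JSON-L log involves. It assumes only that each line is one JSON object and that function-call blobs are dicts carrying a `name` key (as Anthropic `tool_use` blocks and Gemini `function_call` parts do). The real `comms.log` entry schema is not shown in this document, so `find_tool_calls` and the sample entries are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def _walk(node: Any) -> Iterator[dict]:
    """Yield every dict nested anywhere inside a decoded JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def find_tool_calls(lines: Iterable[str], tool_name: str) -> list[tuple[int, dict]]:
    """Return (line_number, blob) for each dict whose "name" equals tool_name.

    ASSUMPTION: call blobs expose the tool under a "name" key; the wrapping
    around that blob is provider-specific and is deliberately ignored here.
    """
    hits = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the scan
        for blob in _walk(entry):
            if blob.get("name") == tool_name:
                hits.append((lineno, blob))
    return hits


# Hypothetical sample: two provider shapes wrapping the same logical call.
SAMPLE_LOG = [
    json.dumps({"provider": "anthropic", "content": [
        {"type": "tool_use", "name": "read_file", "input": {"path": "src/foo.py"}}]}),
    json.dumps({"provider": "gemini", "parts": [
        {"function_call": {"name": "read_file", "args": {"path": "src/bar.py"}}}]}),
    json.dumps({"provider": "anthropic", "content": [{"type": "text", "text": "done"}]}),
]
```

Even in this toy sample the path sits under `input` in one shape and under `args` in the other, which is why a plain `grep` on the raw log is awkward.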
+ +**Actionable idea — option A (small, UI-only).** Add a **"Reveal Raw"** toggle on the comms log panel that, when on, shows the JSON-L entry *next to* the rendered view, with the JSON pretty-printed. The user can copy either the rendered text or the raw JSON. ~100 lines. + +**Actionable idea — option B (medium, behavioral).** Project the conversation log into a sibling markdown file as it's written. Every `comms.log` entry gets a corresponding `.md` line that says "model called `read_file('src/foo.py')` at ." The user can `cat`, `grep`, or `tail -f` this file. The GUI reads from the same source of truth (the markdown) instead of from the JSON-L. ~300 lines + a streaming write hook in `ai_client`. + +**Domain:** Both. Option A is UI work in the Application. Option B benefits the Meta-Tooling more — an external agent that needs to understand what the Application AI did can read the markdown without parsing JSON-L. + +**Effort:** A is small. B is medium. **Pick A first**; the user-correction in `report.md §3` shows the user is already on top of editable-discussion nuance, so a small UX win here validates the larger bet. + +**Cross-references:** Decision candidate #6 (git-history injection) — the markdown projection is the same kind of "explicit data artifact for the AI's input/output" pattern, just for the comms log instead of git history. + +--- + +## 3. Sub-agents as a first-class primitive for 1:1 discussions + +**nagent's pattern.** The `` tag in `bin/nagent:execute_agent(...)` is the *only* sub-agent mechanism. Used everywhere: investigation, research, large-output work, debugging. The child is a fresh process with `Invocation = "delegated"`, an isolated conversation file, and a `` tag returned to the parent with the child's exit code + output + stderr + token totals. 
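The shape of that hand-off can be sketched with the standard library. This illustrates the pattern only and is not nagent's code: `DelegatedResult`, `run_delegated`, and `format_for_parent` are hypothetical names, token totals are omitted because they depend on the provider, and the child is any command rather than a real agent.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedResult:
    exit_code: int
    output: str
    stderr: str


def run_delegated(argv: list[str], timeout_s: float = 120.0) -> DelegatedResult:
    """Run a child in a fresh process and return only its distilled result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # Report the timeout as data instead of raising into the parent loop.
        return DelegatedResult(exit_code=-1, output="", stderr=f"timed out after {exc.timeout}s")
    return DelegatedResult(proc.returncode, proc.stdout.strip(), proc.stderr.strip())


def format_for_parent(result: DelegatedResult) -> str:
    """Render the result as plain text the parent can append to its transcript."""
    return f"exit_code: {result.exit_code}\nstdout: {result.output}\nstderr: {result.stderr}"
```

A call such as `run_delegated([sys.executable, "-c", "print('found it')"])` returns the child's exit code and captured streams; the child's intermediate work never enters the parent's transcript.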
+ +**Manual Slop today.** Sub-agents exist for MMA: +- `scripts/mma_exec.py` — Tier 3/4 worker subprocess +- `src/multi_agent_conductor.py:run_worker_lifecycle` — worker lifecycle +- `src/dag_engine.py` — ticket DAG and per-ticket worker pool + +But for 1:1 discussions (`simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`), there's no sub-agent primitive. The user types a prompt, the AI responds, the loop continues. If the user wants the AI to "investigate this file" or "look up this API," the answer has to come from the same conversation. + +**Why it matters.** The MMA pattern is *already* the prototype. `mma_exec.py` is a real subprocess with Context Amnesia and a clean prompt boundary. The only thing missing is a way to invoke it from the 1:1 chat loop without going through the full MMA tier system. + +**Actionable idea.** Build `src/sub_conversation.py:SubConversationRunner` (Decision candidate #1, already specced in `decisions.md`): +```python +class SubConversationRunner: + async def spawn( + self, + prompt: str, + *, + allowed_tools: list[str] | None = None, + system_prompt: str | None = None, + timeout_s: int = 120, + ) -> SubConversationResult: + # Reuse mma_exec.py as the subprocess template + # Return the child's content + token usage + ... +``` + +Wire it into the GUI as a new "Investigate…" button on the message panel (`gui_2.py:4513+`). The button opens a small modal: "Ask a sub-agent: ___ [Investigate]". The sub-agent runs, the result is inserted as a "User" role entry in the current discussion, and the next LLM call sees it. + +**Domain:** Application. (The Meta-Tooling could use the same primitive from `scripts/`, but the win is in the App.) + +**Effort:** Medium. 2-3 phases. **HIGH priority** because the user explicitly wants it. + +**Cross-references:** Decision candidate #2 (RAG pre-staging) is the natural second use of this primitive — a sub-conversation that pre-builds the RAG index before a long discussion. + +--- + +## 4. 
File-identity over file-path — a stable `st_dev:st_ino` is rename-safe + +**nagent's pattern.** `nagent_file_edit_lib.py:file_id_for_path(path) -> "{st_dev}:{st_ino}"`. The per-file conversation index keys by inode, not by path. Rename the file in place (same inode) → same conversation. Move the file across dirs (same inode) → same conversation. This is the right primitive for "memory attached to the artifact, not the path." + +**Manual Slop today.** `models.FileItem.path: str` — path-keyed. `project.discussion.discussions[].context_snapshot` is a list of `FileItem.to_dict()` dicts, indexed by position in the list. Rename the file in your editor → `FileItem.path` is stale, `aggregate.py:build_file_items` re-reads the old path, may fail. The curation memory *survives* the rename (it's keyed by name in the project TOML) but the file lookup at render time does not. + +**Actionable idea — small (additive).** Add a `file_id: str` field to `FileItem` populated at load time via `os.stat(path).st_dev:st_ino`. Use it as the lookup key in the `context_snapshot` list. On file-read failure, attempt a fuzzy match: same basename in the same directory tree, or same `file_id` under a new path. ~150 lines + a migration for existing project TOML files (path-only becomes path + file_id). + +**Actionable idea — bigger (architectural).** If you do this, also rethink the `ContextPreset` storage. The current schema is a flat list of `FileItem` dicts. nagent's analog is a per-file `IndexEntry { file_id, path, last_seen, conversation, last_summary }`. A path rename in nagent updates `path` in the index but leaves `file_id` stable; in Manual Slop a path rename would orphan the entire `FileItem`. + +**Domain:** Application. (The Meta-Tooling would benefit from a stable file_id when navigating references across many files in a long session.) + +**Effort:** Small (additive) or medium (architectural). 
The additive path is the right starting point; the architectural rewrite is overkill for a feature that already works for 95% of cases. + +**Cross-references:** Decision candidate #7 (per-file conversation log) — `file_id` is the prerequisite for this candidate. + +--- + +## 5. One loop, one file — make the agent's brain visible by default + +**nagent's pattern.** `bin/nagent:run_agent_loop` is ~50 lines. `main()` reads CLI args, sets up the conversation file, calls `run_agent_loop`, exits. The conversation file accumulates over the entire session. The "agent" *is* the file plus a transient process. + +**Manual Slop today.** Three parallel loops, each in a different file: +- `src/ai_client.py:_send_` (per-provider, ~100-200 lines each × 5 providers) — the LLM-call loop +- `src/multi_agent_conductor.py:ConductorEngine.run` — the MMA loop +- `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async` — the 1:1 chat loop + +Each loop has the same shape (build prompt → call LLM → parse response → dispatch tools → repeat) but the data structures differ. A reader has to hold three mental models. + +**Actionable idea — UX win, not architecture change.** Surface the *unified loop shape* in the diagnostics panel. The diagnostics panel already exists (`gui_2.py` §"Diagnostics Hub" per the Readme). Add a section "Loop Inspector" that shows, for each of the three loops: +- Last N iterations of: input tokens, output tokens, tool calls made, tool results, parse failures +- Color-coded: same shape across all three loops, different data sources +- "View raw" drill-down to the actual function call + +This is *not* a refactor. It's making the existing three loops legible. ~200 lines. + +**Actionable idea — bigger refactor.** Extract a `src/llm_loop.py:run_loop(conversation, provider, tool_dispatch, parse_response, ...)` that's called by all three. This is Decision candidate #5.5 (not in `decisions.md`; would be a new candidate). Effort: large. 
Value: real but the current separation is readable. + +**Domain:** Both. The UX win is in the Application. The refactor is neutral but helps the Meta-Tooling when agents need to reason about the loop. + +**Effort:** UX win is small. Refactor is large. **Do the UX win first.** + +**Cross-references:** Decision candidate #3 (Stateless LLMClient) — the refactor becomes more attractive if a unified loop exposes the data flow more clearly. + +--- + +## 6. Visible retry on protocol failure — turn errors into conversation data + +**nagent's pattern.** `bin/nagent:run_agent_loop` has `MAX_FORMAT_RETRIES = 3`. On a parse failure: +```python +append_to_conversation( + conversation_file, + f"\n{llm_output}\n\n" + f"Invalid nagent response format: {parse_error}. " + f"Respond only with valid nagent tags.", +) +``` + +The bad output is *appended to the conversation* with a `` correction. The next call sees its own previous failure and the correction message. The user can `grep` the conversation for `` to find every retry. + +**Manual Slop today.** `_send_` loops internally; on a tool-call parse failure it... retries. But the failure isn't visible in `comms.log` as a first-class entry — it's swallowed by the loop. The `tier4_qa` interceptor (per `docs/guide_ai_client.md` §"Tier 4 QA") catches *errors from tool execution* and forwards them to a cheap sub-agent for a 20-word summary, but parse failures don't go through this path. + +**Actionable idea — small, high value.** Add a `parse_failures` counter and a "Last 5 parse failures" section to the diagnostics panel. The counter increments on each `parse_response` failure; the section shows the model output, the error message, and the time. ~50 lines. The user gets to see *what* the model is getting wrong — useful for prompt engineering. + +**Actionable idea — medium, prompt-quality win.** When a parse failure happens, append a "self-correction" entry to `disc_entries` as a `role: "System"` entry. 
The next AI call sees the correction in the visible discussion history. The user can see the corrections and can edit them. ~150 lines. + +**Domain:** Both. The diagnostics panel is Application UX. The self-correction entry is neutral — useful for any agent that reads `disc_entries`. + +**Effort:** Small for option 1. Medium for option 2. **Do option 1 first.** + +**Cross-references:** nagent §5 "The loop" — the retry visibility is a load-bearing part of nagent's debuggability claim. + +--- + +## 7. "Inspect this file" / "Read this URL" as *prompts*, not function calls + +**nagent's pattern.** `` is a self-closing tag. The model emits it; the parser matches; `execute_read` runs. The model doesn't need to know the function-call schema for the LLM SDK — it just needs to emit text containing a tag. + +**Manual Slop today.** `read_file(path)` is a function call. The model has to know the function signature, format the JSON, embed it in the right `tool_use` block. The training data for "emit a `` tag" is zero; the training data for "emit a `read_file` tool call" is high. *Function calling wins on capability and on training*; *tag protocols win on debuggability*. + +**Actionable idea — both, but in different places.** This is the *one* place where the existing reports lean toward "different mechanism, both right." Don't replace the Application's function calling. But for the Meta-Tooling, document a *Meta-Tooling DSL* in `conductor/code_styleguides/` for use by external agents when they need to invoke Manual Slop's tools via the bridge script. The DSL would look like: +``` + + +``` + +The bridge script (`scripts/mma_exec.py` or whatever the Meta-Tooling bridge is) translates these to the underlying function calls. The external agent's prompt training data does *not* need to know the function-calling JSON schema for every Manual Slop tool — it just needs to know the DSL. 
+ +**This is Decision candidate #4 (intent-based DSL) from `decisions.md`** — but reframed: it's not a Meta-Tooling-*side* DSL, it's a *bridge* DSL. The Application's function-calling stays. + +**Domain:** Meta-Tooling. The Application doesn't need this. + +**Effort:** Research spike, per the user's own assessment: "no where near that ideation yet." Document the design space; don't build it. + +**Cross-references:** Decision candidate #4. Also nagent §12 (tool discovery) — the DSL would be the bridge-side analog of `--description` self-describing executables. + +--- + +## 8. Self-describing tools — let the tool tell the agent what it does + +**nagent's pattern.** `nagent_cli.py:exit_on_description(description)` is called at the top of every executable: +```python +def exit_on_description(description: str) -> None: + if "--description" in sys.argv: + print(description) + raise SystemExit(0) +``` + +`nagent_cli.py:collect_bin_tool_descriptions(bin_dir)` runs each tool in `bin/` with `--description`, captures stdout, concatenates. The startup prompt includes the concatenated descriptions automatically. *Adding a new tool is: drop a script, write a description.* The system auto-discovers. + +**Manual Slop today.** `src/mcp_client.py:dispatch(...)` is a flat if/elif chain with 45+ branches. Adding a tool requires: +1. Edit `dispatch()` to add the branch +2. Update the security allowlist in `_resolve_and_check` (if filesystem access) +3. Update the AI capability declaration in `get_tool_schemas()` +4. Add tests + +**Actionable idea — defer to `mcp_architecture_refactor_20260606`.** This is already on the board as Decision candidate #5 (subsumed). The "sub-MCP" extraction that the refactor proposes is *exactly* the right scope for the self-describing pattern — each sub-MCP is a self-contained module with its own tool registry, and `collect_tool_descriptions` becomes a method on the sub-MCP class. + +**Don't** try to add this incrementally. 
The dispatch chain is large enough that half-measures (e.g. a per-tool decorator that auto-registers but still requires a manual allowlist edit) are net-negative. Wait for the refactor. + +**Domain:** Both. (Largely Application — the dispatch is in `mcp_client.py`. But the pattern would also be useful for the Meta-Tooling's `scripts/` directory.) + +**Effort:** Subsumed by `mcp_architecture_refactor_20260606`. + +**Cross-references:** Decision candidate #5. Already documented. + +--- + +## 9. Edit-the-input, not the output — make the prompt the artifact + +**nagent's claim (verbatim from README).** *"Don't edit the output artifacts. Edit the prompt."* If the LLM gives a bad answer, the fix is in the prompt or the inputs — not by hand-patching the output. The conversation file *contains* the prompt. Editing the conversation is editing the prompt for the next turn. + +**Manual Slop today.** The user can edit any `disc_entries[i]["content"]` directly via the `[Edit]` mode in the GUI (per `report.md §3 A1`). But the edited entry goes into the *abstracted entry list*, not into the *raw provider history*. The next LLM call sees: +- The full `disc_entries` rendered as markdown (with the user's edits) +- BUT the `ai_client._anthropic_history` (and siblings) is the *raw* provider-side list, with the *original* AI response and the *original* function calls + +So the user edits the *projection* but not the *source*. If the user corrects an AI response that included a bad tool call, the *display* shows the correction but the *provider's next call* will replay the original bad tool call as a "previous tool result" in the history. The two diverge. + +**This is subtle but important.** nagent avoids this entirely because the conversation file *is* the prompt — there's no separate "raw provider history" to keep in sync. 
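A toy model of that divergence, under loud assumptions: `Entry`, `cached_history`, and `project_messages` are hypothetical stand-ins, and real provider histories carry richer structure (content blocks, tool-call records) than the role/content pairs used here. The sketch only shows why a separately stored copy goes stale after an edit, while a list derived from the entries at send time does not.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    role: str      # "User" or "AI", mirroring the roles shown in the GUI
    content: str


def project_messages(entries: list[Entry]) -> list[dict]:
    """Derive provider-style messages from the entries; nothing is cached."""
    role_map = {"User": "user", "AI": "assistant"}
    return [{"role": role_map[e.role], "content": e.content} for e in entries]


entries = [Entry("User", "Read src/foo.py"), Entry("AI", "Calling read_file('src/fooo.py')")]

# Two-history design: a copy is taken once and then lives on independently.
cached_history = project_messages(entries)

# The user fixes the AI entry in the editable list.
entries[1].content = "Calling read_file('src/foo.py')"

stale = cached_history[1]["content"]             # still carries the typo
fresh = project_messages(entries)[1]["content"]  # reflects the edit
```

Here `stale` keeps the original text while `fresh` matches what the user sees, which is the gap between the projection and the source described above.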
+ +**Actionable idea — small, surgical.** When the user edits an entry's `content` in `[Edit]` mode, *also* rewrite the corresponding `ai_client.__history[i]["content"]` to match. The user sees one source of truth; the provider sees the same source of truth. ~100 lines + a careful test for Anthropic's content-block semantics (it has multiple content blocks per message, not a single string). + +**Actionable idea — bigger, the right architecture.** Stop maintaining two histories. Make `disc_entries` the *only* history. `ai_client.__history` becomes a *projection* of `disc_entries`, rebuilt on each send(). This is part of Decision candidate #3 (Stateless LLMClient) — the `Conversation` object becomes the single source of truth. + +**Domain:** Both. The edit-the-projection fix is Application UX. The single-history architecture is Application + (benefiting) Meta-Tooling. + +**Effort:** Small for option 1, large for option 2. **Option 1 is the right starting point** — it's a known issue with a known fix, and the user-correction in `report.md §3` shows the user is on top of editable-discussion nuance. + +**Cross-references:** Decision candidate #3 (Stateless LLMClient). Also nagent §3 (conversations are editable state) — the philosophy is "one editable source of truth," and Manual Slop currently has two. + +--- + +## 10. Sub-agents return a *concise artifact*, not a full transcript + +**nagent's pattern.** `` contains only the child's `` body + exit code + stderr. The parent's conversation is *not* polluted with the child's intermediate reads, shell calls, or retries. The parent gets a *distilled* result. + +**Manual Slop today (MMA path).** `multi_agent_conductor.py` returns the worker's final response to the parent (the `ConductorEngine`). The worker's intermediate steps are logged to `comms.log` but not propagated. So MMA *does* follow the nagent pattern for sub-agent outputs. *This is good.* + +**Manual Slop today (1:1 chat, no sub-agents).** No equivalent. 
The user can't ask a sub-agent and get a distilled answer. The whole point of the user-flagged Decision candidate #1 is to add this — and the implementation should follow nagent's pattern: the sub-agent returns a *string artifact*, not its full conversation log. + +**Actionable idea — design constraint on the upcoming track.** When implementing Decision candidate #1 (SubConversationRunner), specify the return type as `SubConversationResult { artifact: str, tokens_in: int, tokens_out: int, exit_code: int, errors: list[str] }`. Do *not* return the child's full conversation. The parent's `disc_entries` gets one new "User" entry containing `artifact`. The child's full transcript is persisted to `~/.manual_slop/sub_conversations/.jsonl` for debugging but is not in the parent's visible discussion. + +**Domain:** Application (this is the design constraint for candidate #1). + +**Effort:** Zero net new effort — this is a design constraint, not a feature. Bake it into the spec for candidate #1. + +**Cross-references:** Decision candidate #1. nagent §9 (sub-conversations). The `MAX_FORMAT_RETRIES = 3` retry budget in nagent also informs the design — the sub-agent should be allowed to retry internally, but its final artifact to the parent should be a single string. + +--- + +## Cross-cutting observations (not patterns, but framing) + +### A. nagent's "files are the system" is the same philosophy as Manual Slop's project TOML + conductor tracks + +The *philosophy* of nagent — that data lives in files you can `cat`, `git diff`, and `cp` — is already present in Manual Slop: +- `manual_slop.toml` is the project's source of truth +- `conductor/tracks//state.toml` is the track's state +- `personas.toml`, `tool_presets.toml`, `context_presets.toml` are all TOML +- The Hook API exposes this state via `POST /api/project` for external automation + +What's *not* yet at that level: the AI's working state (the in-flight `disc_entries`, the provider history globals). 
Closing this gap is the theme of Decision candidates #3, #7, and #10. + +### B. nagent is small because it has no GUI. Don't be jealous of the size. + +nagent: ~4,000 lines. Manual Slop: 13,000+ lines of production code + 5,000+ lines of MCP tools + a 5,000-line GUI. The size difference is the GUI, the persistence, the test harness, the HITL dialogs, and the Hook API. None of those are reducible by adopting nagent's patterns; they're features Manual Slop users want and use. The right comparison is "nagent's *patterns* vs Manual Slop's *implementation*," not "which codebase is smaller." + +### C. The user-corrections shaped the takeaways + +Three user-corrections during the deep-dive review directly influenced which patterns made this list: +- **"Editable discussions are more comprehensive than the first draft said"** → made takeaway #1, #2, #9 (visibility, log readability, single-history) all about *respecting* what Manual Slop already has rather than suggesting it lacks. +- **"MMA is fine; 1:1 sub-agents are the gap"** → made takeaway #3 (sub-agents for 1:1) the highest-priority actionable item, with #10 (sub-agent return type) as the design constraint. +- **"Personas are config bundling, RAG is opt-in, tool discovery is deferred"** → kept those three out of the "must steal" list. They're in the future-track `decisions.md` but not in *this* document. + +The takeaways are *user-shaped* as well as nagent-shaped. If the user had a different correction in any of those areas, the takeaway list would shift. + +--- + +## Recommended reading order for a future implementer + +If you're about to build one of the future tracks, read in this order: + +1. **Track 1 — Sub-conversation runner (Application):** Read this entire document, especially §3 and §10. Then read `decisions.md` candidate #1. Then read `src/multi_agent_conductor.py:run_worker_lifecycle` and `scripts/mma_exec.py` for the template. + +2. 
**Track 2 — RAG pre-staging (Application):** Read this entire document, especially §3 (the parent). Then read `decisions.md` candidate #2. Then read `src/rag_engine.py:index_file` and `docs/guide_rag.md`. + +3. **Track 3 — Stateless LLMClient (Application, big refactor):** Read this entire document, especially §1, §5, §6, and §9. Then read `decisions.md` candidate #3. Then read `src/ai_client.py:113-135` (the provider globals) and `src/history.py` (the UISnapshot pattern). Then read `docs/guide_ai_client.md` end-to-end. + +4. **Track 4 — Meta-Tooling intent DSL (Meta-Tooling, research):** Read this entire document, especially §7. Then read `decisions.md` candidate #4. Then read `bin/nagent:parse_response` and the 8 tag patterns there. Then read `src/commands.py` and `src/command_palette.py` to see Manual Slop's existing command-DSL precedents. + +5. **Track 5 — Self-describing MCP tools (subsumed):** Read this entire document, especially §8. Then read the existing `mcp_architecture_refactor_20260606` spec. + +6. **Track 6 — Git history injection (Application, medium):** Read this entire document, especially §1 and §4 (file identity). Then read `decisions.md` candidate #6. Then read `bin/nagent:format_file_history` and `bin/nagent:coedited_file_rows` for the reference implementation. Then read `src/aggregate.py:run` for the insertion point in Manual Slop. + +7. **Track 7 — Per-file conversation log (Application, small):** Read this entire document, especially §1, §4, and §9. Then read `decisions.md` candidate #7. This depends on takeaway #4 (`file_id`); read it first. + +8. **Track 8 — Co-edited files tools (Application, small):** Read this entire document, especially §6 and §8. Then read `decisions.md` candidate #8. This is dependent on candidate #6 (git history) — read takeaway #6's reference impl first. + +9. **Track 9 — Split/patch lib (defer until need):** Read this entire document, especially §5 (unified loop). Then read `decisions.md` candidate #9. 
Then read `bin/helpers/nagent_file_split_lib.py` and `bin/helpers/nagent_file_patch_lib.py` for the reference implementation. This is *not* a near-term need; only build when a very-large-file scenario actually surfaces. + +10. **Track 10 — Raw-transcript persistence per Take (Application, small):** Read this entire document, especially §1, §2, and §9. Then read `decisions.md` candidate #10. This is dependent on candidate #3 (single history) — read takeaway #9 first. + +--- + +## Final note: this is a *reference* track + +This document does not commit any of the 10 takeaways to implementation. Each is a *candidate* — a design space, not a decision. The user (the product owner) and the Tier 2 Tech Lead will scope each into a real conductor track when the corresponding need surfaces. The fact that these patterns are *all grounded in code I've read* (nagent + Manual Slop) is the value of this document; the patterns themselves are *raw material for future work*, not commitments. + +End of takeaways document. diff --git a/conductor/tracks/nagent_review_20260608/report.md b/conductor/tracks/nagent_review_20260608/report.md new file mode 100644 index 00000000..d2a3bc38 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/report.md @@ -0,0 +1,571 @@ +# Mike Acton's nagent: A Deep-Dive Analysis vs Manual Slop + +**Track:** `nagent_review_20260608` +**Date:** 2026-06-08 (revised with user corrections same day) +**Author:** Tier 2 Tech Lead (with significant user review on §3 and §6) +**Companion to:** `spec.md` (the track wrapper) + +> **Important reading note.** This report applies the **Application vs Meta-Tooling distinction** (per `docs/guide_meta_boundary.md`) as the lens for every comparison. nagent is a Meta-Tooling reference; Manual Slop's Application AI is a *different kind of thing*. Where they share patterns (MMA workers, the tool-call loop, the 3-layer security model), the report says so. Where they don't, the report says so. 
The report deliberately avoids "nagent is better" / "Manual Slop is better" framings. +> +> **Revision note.** The first draft overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features. The user caught this and pointed at the actual files (`FileItem`, `ContextPreset`, `aggregate.py`, `project_manager.branch_discussion`, `HistoryManager`). The corrections are now folded in. Specific corrections: §3 (verdict changed from PARTIAL to **PARITY (DIFFERENT FOCUS)**); §6 (verdict changed from DOMAIN MISMATCH to **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION**); §9 (verdict now notes the MMA vs 1:1 distinction explicitly per the user). + +--- + +## 0. Reading guide + +- **Sections 1-14** map 1:1 to nagent's 14 principles. Each has: nagent's claim, nagent's implementation, Manual Slop's equivalent, a verdict, and a domain tag. +- **Section 15** extracts the 6 actionable pitfalls and maps each to a future-track candidate. +- **Section 16** is the recommended reading path for engineers who haven't read nagent. + +If you only have 10 minutes, read §3 (Conversations), §6 (Per-File Memory), §9 (Sub-Conversations), §10 (Controlled Writes), and §15 (the pitfalls list). + +--- + +## 1. Durable work, disposable workers + +**nagent's claim.** A Python process is a *worker*; the files are the *system*. Workers come and go; data stays. **"The agent is not the thing; the data is the thing."** + +**nagent's implementation.** `bin/nagent` is a 700-line single-file loop. It reads `~/.nagent/conversations/` (a plain text file) for the current conversation, appends to it after every action, and exits. The user types `nagent "investigate this"`. The CLI is a shell. The state is a file. + +**Manual Slop's equivalent.** Manual Slop has two parallel systems: + +1. **MMA workers are real subprocesses.** `multi_agent_conductor._spawn_worker` runs `mma_exec.py` via `subprocess.Popen` (per `docs/guide_multi_agent_conductor.md` §"Token Firewalling"). 
Each Tier 3 worker is a fresh Python process with **Context Amnesia** — `ai_client.reset_session()` at the start of `run_worker_lifecycle`. The subprocess is the disposable worker; the artifacts (track state, ticket results) are the system. + +2. **The Application AI is *not* a disposable worker.** `gui_2.py:App` is a long-lived Qt/ImGui process. The user types a prompt, hits Enter, gets a response, *keeps the process running for hours*. The `app_state` dataclass is the long-lived worker. This is *intentional* for the Application domain: persona-driven conversations, snapshot-based undo, cross-discussion state — all require a long-running process. + +**Verdict.** **PARTIAL** — nagent's pattern lives in the Meta-Tooling + MMA, but the Application deliberately has long-lived workers. The two coexist because they serve different needs: MMA is fire-and-forget per ticket; App is an interactive partner. + +**Domain tag:** Both. MMA has it; App doesn't need it. *Future-track candidate: a stateless conversation-file pattern for the App (see §15.4).* + +--- + +## 2. Text in, text out + +**nagent's claim.** The smallest useful primitive is: file in, text out. `nagent-llm-text --file question.txt` reads a file, calls the LLM, prints plain text or JSON. Everything else in nagent is orchestration around this. + +**nagent's implementation.** `bin/helpers/nagent_llm.py` (300 lines) provides `generate_text(message, provider, model) -> str` for 4 providers (openai, anthropic, google, cursor). Token accounting via provider usage metadata (with character-count fallback at 1 token per 4 chars). Provider churn is isolated in this file. + +**Manual Slop's equivalent.** `src/ai_client.py:send(...) -> str` is the parallel. 5 providers (gemini, anthropic, deepseek, minimax, gemini_cli). Same `provider, model, usage` shape. 
Manual Slop wraps the string in a larger `(md_content, user_message, base_dir, file_items, ..., rag_engine) -> str` because the Application's text-in/text-out also needs tool calls, RAG injection, tier attribution, and patch-mode. But the *primitive* is the same. + +**Verdict.** **PARITY.** nagent and Manual Slop both use text-in/text-out at the bottom. The Application's `send()` is a *strict superset* of nagent's `nagent-llm-text`, with provider churn still isolated to a single module. + +**Domain tag:** Both. Meta-Tooling uses the same primitive via `mma_exec.py`'s `ai_client.send`. + +--- + +## 3. Conversations are editable state + +**nagent's claim.** The conversation file is not chat history. It is working state. Memory goes stale; therefore let people save, load, summarize, edit, branch, trim, copy, diff, version, and rewrite conversations. **"The conversation does not own its memory. The user does."** + +**nagent's implementation.** +- `bin/nagent` exposes `--save-conversation `, `--load-conversation `, `--summarize`, `--edit-conversation `. The latter **automates** one path: archive current file, run file-edit on the archive, load the result. +- Conversations are plain text files. The user can `cat`, `vim`, `git diff`, or `cp` them with no special tooling. The `` body and `` body are just text in the file. +- The first draft of this section understated Manual Slop's editing capability. The corrected picture is below. + +**Manual Slop's equivalent (corrected, with the full operation matrix).** Manual Slop's discussion editing lives at **three nested layers**, each with its own operations. The full enumeration: + +**Layer A — Per-entry operations on `app.disc_entries: list[dict]`** (the discussion's typed message list). The renderer is `src/gui_2.py:3770 render_discussion_entry(...)`. 
Per entry, the user can: + +| # | Operation | GUI control | Source code | What it does | +|---|---|---|---|---| +| A1 | **Edit content in place** | `imgui.input_text_multiline` on the entry body | `gui_2.py:3841` | The entry's `content` field is a fully editable multi-line text input. The user can rewrite an AI's response, fix a typo in their own prompt, paste in code from another source, etc. | +| A2 | **Toggle read/edit mode** | `[Edit]` / `[Read]` button | `gui_2.py:3799` | When in `[Read]` mode, the content is rendered as Markdown with syntax highlighting (`render_discussion_entry_read_mode` at `gui_2.py:3855`). When in `[Edit]` mode, the multi-line text input is shown. | +| A3 | **Toggle collapsed/expanded** | `+/-` button per entry | `gui_2.py:3789` | Collapsed entries show a 60-char preview (line 3822-3824). Expanded entries show full content. | +| A4 | **Change role** | Combo box from `app.disc_roles` | `gui_2.py:3793-3796` | The entry's `role` field is editable. The list `app.disc_roles` is itself user-managed (see B5). | +| A5 | **Insert entry before this one** | `Ins` button | `gui_2.py:3813` | `app.disc_entries.insert(index, {"role": "User", "content": "", "collapsed": True, "ts": project_manager.now_ts()})` | +| A6 | **Delete this entry** | `Del` button | `gui_2.py:3815-3816` | `if entry in app.disc_entries: app.disc_entries.remove(entry)`. The membership check matters — ImGui can re-render stale state, so the check guards against double-delete. | +| A7 | **Branch at this entry** | `Branch` button | `gui_2.py:3821` → `app._branch_discussion(index)` → `app_controller._branch_discussion:3503` → `project_manager.branch_discussion:429` | Creates a new Take named `_take_` and copies the history up to and including `index` into the new Take. The user is then switched to the new Take. 
| + +The entry dict shape itself is open: `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` (for AI entries with `` blocks, parsed by `src/thinking_parser.py`) and `usage` (for token accounting: input/output/cache). The user can also set per-entry `read_mode` (a render-time flag, not persisted). + +**Layer B — Discussion-level operations** (the Take / discussion set). These are the second-tier controls, rendered at `src/gui_2.py:4239 render_discussion_entry_controls(...)` and the discussion selector at `gui_2.py:4330 render_discussion_selector(...)`: + +| # | Operation | GUI control | Source code | What it does | +|---|---|---|---|---| +| B1 | **Append new entry** | `+ Entry` button | `gui_2.py:4240` | `app.disc_entries.append({...})` with the default role from `app.disc_roles[0]`. | +| B2 | **Collapse all / Expand all** | `-All` / `+All` buttons | `gui_2.py:4242-4246` | Bulk-set `collapsed` flag on every entry. | +| B3 | **Clear all** | `Clear All` button | `gui_2.py:4248` | `app.disc_entries.clear()`. | +| B4 | **Save (flush to project TOML)** | `Save` button | `gui_2.py:4250` | `app._flush_to_project(); app._flush_to_config(); app.save_config()`. | +| B5 | **Add/remove roles** | `Add` / `X` buttons under "Roles" | `gui_2.py:4317-4328` | `app.disc_roles.append(r)` / `app.disc_roles.pop(i)`. The role list is **user-managed at runtime** — they can add `"Context"`, `"Tool"`, `"Vendor API"`, or any custom role and assign it to any entry. | +| B6 | **Switch active discussion** | Discussion combo + Take tabs | `gui_2.py:4197, 4344, 4354` | `app._switch_discussion(name)`. The Takes group by base name (`name.split("_take_")[0]`) and render as nested tabs. | +| B7 | **Rename / Delete discussion** | `Rename` / `Delete` buttons | `gui_2.py:4291, 4293` | `app._rename_discussion(...)` / `app._delete_discussion(...)`. Cannot delete the last discussion (guarded at `app_controller.py:3543`). 
| +| B8 | **Promote Take to top-level** | `Promote` button in takes panel | `gui_2.py:4364` | `project_manager.promote_take(app.project, app.active_discussion, new_name)` — renames a Take (e.g. `T0_take_2`) to a fresh top-level discussion name. | +| B9 | **Per-role filter** | `ui_focus_agent` selector (system-wide) | `gui_2.py:4230-4234` | `display_entries = [e for e in app.disc_entries if e.get("role") == persona_name or e.get("role") == "User"]`. The filter follows the MMA persona focus. | +| B10 | **Truncate to N pairs** | `Truncate` button + `drag_int` | `gui_2.py:4254-4260` | `truncate_entries(app.disc_entries, app.ui_disc_truncate_pairs)` keeps the last `N` User/AI pairs (per `gui_2.py:175 truncate_entries(...)`). | +| B11 | **Compress (AI summarization)** | `Compress` button | `gui_2.py:4252` → `app_controller._handle_compress_discussion:3357` | Calls `ai_client.run_discussion_compression(disc_text)` and replaces the discussion with the LLM's compressed version. | + +**Layer C — UI snapshot history (undo/redo).** The `HistoryManager` (`src/history.py:71`, `max_capacity=100`) and `UISnapshot` (`history.py:8-63`) provide Ctrl+Z / Ctrl+Y across the entire UI state — including `disc_entries`: + +| # | Operation | Source code | What it does | +|---|---|---|---| +| C1 | **Take snapshot** | `gui_2.py:735 _take_snapshot` → `history.UISnapshot(...)` | `copy.deepcopy(self.disc_entries)` — a deep copy of the full entry list is captured. The snapshot also captures `ai_input`, `temperature`, `top_p`, `max_tokens`, `auto_add_history`, `files`, `context_files`, `screenshots`, all system prompts. | +| C2 | **Apply snapshot (undo/redo)** | `gui_2.py:754 _apply_snapshot` | Restores `self.disc_entries = snapshot.disc_entries` (and all the other fields). | +| C3 | **Change detection triggers snapshot** | `gui_2.py:1160, 1166-1167` | `if len(current.disc_entries) != len(self._last_ui_snapshot.disc_entries) or ...` — disc_entries content change pushes a new snapshot. 
| +| C4 | **Capacity-evict oldest** | `history.py:80-90 push()` | When the undo stack exceeds 100, the oldest is popped from the front. | +| C5 | **Jump to specific state** | `history.py:129 jump_to_undo(index, current_state, ...)` | Allows time-traveling to any past snapshot, not just the most recent. | + +**Summary of editability.** Manual Slop provides: +- **Per-entry content edit** (A1, A2) — the AI's response text is fully editable in the GUI +- **Per-entry insert at any position** (A5) — the user can drop a new entry *between* two existing entries, not just append +- **Per-entry delete at any position** (A6) +- **Per-entry role change** (A4) — the user can re-label any entry as User, AI, Tool, Context, or any custom role +- **Per-entry branch** (A7) — creates a Take at any entry, not just at the end +- **Per-entry collapse/expand** (A3) — visual organization +- **Per-discussion full CRUD** (B1, B6, B7, B8) — append, switch, rename, delete, promote +- **Per-role set management** (B5) — the role list itself is user-editable +- **Bulk operations** (B2, B3, B10) — collapse/expand all, clear, truncate +- **AI-assisted compression** (B11) — summarize the whole discussion +- **Undo/redo across all of the above** (C1-C5) — Ctrl+Z / Ctrl+Y / jump-to-state + +**What Manual Slop does NOT have.** The user cannot edit the **provider-side raw transcript** — the bytes inside the `ai_client._anthropic_history`, `ai_client._gemini_chat._history`, etc. process globals. These are reset on `ai_client.reset_session()`. nagent's "edit the conversation file" pattern operates at *this* layer, not the entry abstraction. The comms log (`comms.log`) is JSON-L and append-only, not user-editable from the GUI (it can be edited on disk in a text editor, but that's a different workflow). + +**Verdict.** **PARITY (DIFFERENT FOCUS).** Both systems support comprehensive editing of the conversation-as-data. 
The difference is *what counts as "the conversation"*: +- nagent's "conversation" = the raw transcript text file (the bytes the LLM produced) +- Manual Slop's "conversation" = a typed entry list with role + content + metadata + optional thinking segments + +Manual Slop's editing is **more granular and more pervasive** (per-entry content edit, per-entry insert/delete, per-entry role-change, per-entry branch, with undo/redo). nagent's editing is **deeper at the raw transcript layer** (edit the actual AI response text before it's been abstracted into a typed entry). Both are real; both are deliberate. + +**Domain tag:** Application. The Application's typed-entry abstraction is intentional — the user thinks in "discussions" not "transcripts." The user can opt-in to the raw-transcript layer by editing `comms.log` on disk or by reading the TOML `discussions//history` field directly. + +*Future-track candidate: optionally persist the raw transcript as a sibling file under each take (Candidate 10 in `decisions.md`), enabling the nagent-style "edit the actual AI response" workflow for users who want it.* + +--- + +## 4. Visible output protocol + +**nagent's claim.** Free-form model output is hard to execute. Use a visible protocol: ``, ``, ``, ``, etc. The startup prompt lists the only tags the model may emit. The parser is strict: recognized tags and whitespace. Nothing else. **"If you cannot read the protocol, you cannot debug the system."** + +**nagent's implementation.** `bin/nagent:TAG_PATTERNS` is a list of `(tag_type, compiled_regex)` tuples. `parse_response()` returns `None, error` if any non-whitespace text is found outside a known tag. The error message is appended to the conversation and the model is asked to retry (up to `MAX_FORMAT_RETRIES = 3`). + +**Manual Slop's equivalent.** Manual Slop's Application AI uses **provider-native function calling** (Gemini `genai.types.FunctionDeclaration`, Anthropic `tool_use` blocks, etc.). 
This is *opaque*: the protocol is encoded in JSON the provider parses. The user cannot read a `function_call` from the comms log and reason about it without knowing the provider's schema. + +The two approaches are **structurally different**: + +| Aspect | nagent regex tags | Manual Slop function calling | +|---|---|---| +| Visibility | Plain text, inspectable in the conversation file | JSON blobs in provider-specific format | +| Per-provider portability | Same tags work across all 4 providers | Each provider has its own schema; mcp_client's 45 tools have 5 different per-provider formats | +| Provider capability ceiling | Whatever the model can emit as text | Native parallel tool calls, structured outputs, JSON-mode constraints | +| Debuggability | "Why didn't the model read the file?" → grep the conversation for the tag | "Why didn't the model call read_file?" → inspect the JSON response | + +**Verdict.** **ARCHITECTURAL DIFFERENCE** — both are correct for their domain. The Application *wants* parallel tool calls, JSON-mode constraints, and provider-side caching. The Meta-Tooling *might want* nagent's regex tags for explicit debuggability. + +**Domain tag:** Both. The Application's choice is right (modern providers all support function calling with parallel execution — see `docs/guide_ai_client.md` §"Async Tool Execution"). The Meta-Tooling *could* adopt nagent's regex-tag protocol for its own work — for example, by using `` instead of a tool-call JSON. This is explicitly the difference between the "Application's internal AI" and the "Meta-Tooling that builds the Application" in `docs/guide_meta_boundary.md`. + +*Future-track candidate: a Meta-Tooling-side DSL for compact tool calls (per the existing `docs/reports/PLANNING_DIGEST_20260606.md` reference to "an intent-based DSL" for "discovery" or "combinatorics").* + +--- + +## 5. 
The loop (append, call, parse, act, append, repeat) + +**nagent's claim.** "Agent behavior" is mostly: append, call, parse, act, append, repeat. Heavier systems add infrastructure around the same steps. + +**nagent's implementation.** `bin/nagent:run_agent_loop` is a `while True` loop: +1. Append user prompt to conversation file +2. Send conversation file to LLM (via `nagent-llm-text --json`) +3. Append response to conversation file +4. If response contains action tags: run those actions, append results, continue loop +5. If response contains ``: print and stop + +**Manual Slop's equivalent.** Manual Slop has *three* parallel "loops": + +1. **`src/ai_client.py:_send_`** — the per-provider tool-call loop. Up to `MAX_TOOL_ROUNDS + 2 = 12` iterations. Each round: call provider, parse function calls, dispatch, append tool results. Same shape as nagent. + +2. **`src/multi_agent_conductor.py:ConductorEngine.run`** — the MMA loop. Per ticket: `ai_client.reset_session()` (Context Amnesia), build prompt, `loop.run_in_executor(None, run_worker_lifecycle, ...)`. Different scope (per ticket, not per user turn). + +3. **`simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`** — the 1:1 chat loop. Per user turn: build markdown, send, wait, append response. Different scope (per user turn, in the App). + +All three have the same "append, call, parse, act, repeat" shape. They differ in *what gets appended* (per-provider history vs track state vs `disc_entries`). + +**Verdict.** **PARITY.** The loop is the universal pattern. Manual Slop's three loops are at different layers (LLM, MMA, App). The lack of a *single* "the loop" file is a real cost — nagent's `run_agent_loop` is 50 lines, easy to reason about. Manual Slop's loops are 100-300 lines each, scattered. + +*Future-track candidate: a single `src/llm_loop.py:run_loop(...)` function that all three callers use, with the dispatch and parse layers injected. 
(Not a high-priority refactor; the current separation is readable.)* + +**Domain tag:** Both. + +--- + +## 6. Per-file memory (curation, not conversation log) + +**nagent's claim.** One conversation grows too large. Attach memory to artifacts. Work keeps coming back to the same files; give each file its own persistent local memory. **"When work orbits one artifact, store memory on that identity."** + +**nagent's implementation.** `bin/helpers/nagent_file_edit_lib.py` provides: +- `file_id_for_path(path) -> "{st_dev}:{st_ino}"` — a stable file identity across renames (the inode is preserved). +- `file_index_path(root, pid) -> conversations/file-index-{pid}.json` — a JSON registry of `{file_id: {path, conversation}}`. +- `resolve_file_edit_conversation(root, pid, file_path) -> (name, resolved, file_id)` — gets or creates a per-file conversation. +- `nagent-file-edit --file src/foo.py "add validation"` — spawns a new nagent process with `--file_edit src/foo.py`, which loads the file's *previous* conversation as the initial context. After edits, the new file is appended to the same conversation. + +The result: a per-file conversation log keyed by inode. A rename that keeps the inode keeps the same conversation. A purely path-based key would not work: it would collide across two repos on the same machine. + +**Manual Slop's equivalent (corrected per user).** The first draft of this report marked this section as "DOMAIN MISMATCH" — claiming Manual Slop has no per-file memory. **This was wrong.** + +Manual Slop *does* have a per-file memory concept. It's just **a different kind of memory**. Where nagent's per-file memory is a *conversation log* (what the LLM said about this file last time), Manual Slop's is a *curation config* (how to present this file in the AI's context window). The two are complementary, not equivalent. 
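As a rough illustration of the nagent side of this comparison, the identity-keyed lookup described above could be sketched as follows. Only the name `file_id_for_path` and the `{file_id: {path, conversation}}` registry shape come from the report; the body of `file_id_for_path`, the helper `resolve_conversation`, and the `file-edit-N` naming scheme are assumptions made for this sketch, not nagent's actual code.

```python
import json
import os
from pathlib import Path


def file_id_for_path(path: str) -> str:
    """Build an identity string from device and inode: "{st_dev}:{st_ino}"."""
    info = os.stat(path)
    return f"{info.st_dev}:{info.st_ino}"


def resolve_conversation(index_path: Path, file_path: str) -> str:
    """Get or create the conversation name registered for a file's identity.

    Hypothetical helper: a simplified stand-in for the report's
    resolve_file_edit_conversation, with an assumed naming scheme.
    """
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    file_id = file_id_for_path(file_path)
    entry = index.get(file_id)
    if entry is None:
        # First time this identity is seen: register a new conversation.
        entry = {"path": file_path, "conversation": f"file-edit-{len(index) + 1}"}
    else:
        # Same inode under a new name: keep the conversation, update the path.
        entry["path"] = file_path
    index[file_id] = entry
    index_path.write_text(json.dumps(index, indent=2))
    return entry["conversation"]
```

Because the key is device plus inode, renaming `a.py` to `b.py` on the same filesystem resolves to the same conversation, while a newly created file gets a fresh one. One caveat of this kind of key: a tool that saves by writing a new file and renaming it over the old one changes the inode, so the identity would not survive that save.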
+ +The Manual Slop per-file memory: + +```python +# src/models.py:510 +@dataclass +class FileItem: + path: str # the artifact identity (path-keyed, no inode) + auto_aggregate: bool = True # include in auto-aggregation? + force_full: bool = False # bypass aggregation with full content? + view_mode: str = 'full' # full / skeleton / summary / sig / def / agg + selected: bool = False # for batch operations + ast_signatures: bool = False # only signatures + ast_definitions: bool = False # only definitions + ast_mask: dict[str, str] # per-symbol mask (from Structural File Editor) + custom_slices: list[dict] # Fuzzy Anchor slices with tag+comment + injected_at: Optional[float] # timestamp +``` + +Plus the **ContextPreset** (`src/models.py:909`): a *named, persisted set* of `FileItem`s, stored in the project's `manual_slop.toml`. Load a preset → restore the same per-file curation state. This is the per-file memory that survives across discussions. + +The user pointed at this directly: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That's the right framing. `aggregate.py:run` builds the initial markdown from `self.context_files` (the active preset's FileItems) + `aggregate.run(flat, aggregation_strategy=...)`. The user controls the per-file memory at discussion start. + +What's *missing* is nagent's specific pattern: **a per-file conversation log keyed by inode.** Manual Slop does not have a "last investigation of this file" concept stored as a file. The closest analog is *commit history* (the discussion itself is git-linked, per `docs/guide_gui_2.md` §"Discussions Sub-Menu" "Git Commit Tracking"). But that's discussion-scoped, not file-scoped. + +**Verdict.** **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION; nagent IS STRONGER IN THE CONVERSATION-LOG DIMENSION.** Both have a real per-file memory concept. 
Manual Slop's is "how do I render this file next time the AI sees it" (rich, with 9 fields, AST-aware); nagent's is "what did the LLM say about this file last time" (plain text, with stable inode identity). The two are not equivalent; they're different optimizations for different needs. + +**Domain tag:** Application (for the curation config). The user-correction explicitly said: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That confirms this is a real Application feature, not a gap. + +*Future-track candidate: extending the per-file memory with a thin "last-investigation" log per file. A `~/.manual_slop/per_file/.md` (file_id by inode, like nagent) that records the last time a discussion referenced this file, the questions asked, and the answers received. This is a Meta-Tooling-friendly addition because it's a plain file.* + +--- + +## 7. Repository history as data + +**nagent's claim.** A repo is not only the current tree. History is data too. Transform git history into editing context for a target file. Not vague "retrieval." Explicit transformation of historical artifacts into working input. + +**nagent's implementation.** `bin/nagent:file_edit_history_and_summary_block(file_edit_path, ...)`: +- `git_file_history(repo_root, rel_path)` — `git log --follow --max-count=50` per file +- `summarize_new_file_commits(...)` — LLM call to one-line-summarize new commits +- `coedited_file_rows(repo_root, rel_path, commits)` — counts files in the same commits; labels high/medium/low co-edit rate +- `format_file_history(...)` — produces a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits + +**Manual Slop's equivalent (partial).** Manual Slop's `_reread_file_items` (in `ai_client.py`) does mtime-based *current* content re-reading with diff injection as `[SYSTEM: FILES UPDATED]`. It does *not* do git history injection. 
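To make the missing half concrete, here is a minimal sketch of the co-edit counting step from the list above, using the high/medium/low thresholds that §8 of this report gives (at least 50% and at least 20%). The function name `coedit_rows` and the `commits` input shape (commit id mapped to the files it touched) are hypothetical; nagent's `coedited_file_rows(repo_root, rel_path, commits)` takes different inputs and its exact output may differ.

```python
from collections import Counter


def coedit_rows(target: str, commits: dict[str, list[str]]) -> list[tuple[str, int, float, str]]:
    """Rank files by how often they changed in the same commit as `target`.

    Returns (path, commits_together, rate, label) rows, most frequent first.
    """
    # Only commits that touched the target file count toward the rate.
    touching = [files for files in commits.values() if target in files]
    if not touching:
        return []
    counts = Counter(
        path for files in touching for path in set(files) if path != target
    )
    rows = []
    for path, together in counts.most_common():
        rate = together / len(touching)
        # Thresholds as described in section 8: high >= 50%, medium >= 20%.
        label = "high" if rate >= 0.5 else "medium" if rate >= 0.2 else "low"
        rows.append((path, together, rate, label))
    return rows
```

The labels are hints about where to look, not a list of files to edit, which matches the framing the report attributes to nagent.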
+ +The closest things Manual Slop has: +- **Git commit-linked discussion tracking** in the GUI: each discussion has an "Update Commit" button that stamps `git rev-parse HEAD` (per `docs/guide_gui_2.md` §"Discussions Sub-Menu"). +- **`src/dag_engine.py`** tracks ticket-to-git-commit relationships, but for *MMA* workers, not for the AI's context. + +**Verdict.** **PARTIAL.** Manual Slop has current-content diff injection (the easy half) but lacks historical-context injection (the harder half). nagent's `summarize_new_file_commits` would be a useful addition to the Manual Slop AI's context — especially for "explain what this file does" questions where the LLM is meeting the file fresh. + +**Domain tag:** Application. *Future-track candidate: a `src/git_history.py` module that mirrors nagent's `file_edit_history_and_summary_block` and is invoked at discussion start (after `aggregate.py`).* + +--- + +## 8. Historical coupling & artifact neighborhoods + +**nagent's claim.** A file lives in a neighborhood of related artifacts. Files that change together in git history are hints: tests, headers, config, paired implementation. High co-edit rate means "look here maybe." Not "edit everything." + +**nagent's implementation.** `coedited_file_rows(repo_root, rel_path, commits)`: +- Counts files in the same commits as the target +- Labels: high (>=50% co-edit), medium (>=20%), low +- Renders a `| file | commits together | P(other file changed | target file changed) |` table +- Guidance text: "Use these files as hints. Before editing, inspect high-likelihood co-edited files when the requested change may affect interfaces, tests, config, or paired code. Do not edit them unless the user request or evidence requires it." + +**Manual Slop's equivalent.** None. 
Manual Slop has `py_get_hierarchy` (subclass scan) and `ts_c_*_get_*` AST tools, but **no tool that returns "files that historically co-edit with this file."** The closest is `derive_code_path` (call-graph trace), which is structural, not historical. + +**Verdict.** **GAP.** This is a real missing tool. nagent's framing — "hints, not commands" — is exactly the right level for a co-edit suggestion. A 50-line tool (`py_coedited_files(path) -> list[(path, count, likelihood)]`) would fill the gap. + +**Domain tag:** Application. *Future-track candidate: a `py_coedited_files` MCP tool + `ts_c_coedited_files` for C/C++.* + +--- + +## 9. Disposable sub-conversations + +**nagent's claim.** Exploration creates noise. Spawn disposable workers. Sub-conversations are temporary nagent processes with isolated conversations. Their lifetime does not matter. The artifact they return matters. + +**nagent's implementation.** `` tag in the main loop's response: +- Parent appends `` to its conversation +- Parent spawns `nagent --invocation delegated --parent-conversation --json` as a subprocess +- Child's `--json` output is parsed, rolled up into the parent's `recursive_input_tokens` / `recursive_output_tokens` +- Child has its own conversation file; no shared context except the explicit prompt +- Parent gets a concise artifact: the child's `` content, plus token usage + +**Manual Slop's equivalent (corrected per user).** The first draft of this report claimed **PARITY (stronger in some ways)**. The user corrected this: + +> *"I don't know if I have disposable sub-conversations, I don't really have them for non-mma runs. I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."* + +So the actual picture is: + +| Layer | Sub-conversation support | +|---|---| +| **MMA Tier 3 / Tier 4** | **Yes.** `mma_exec.py` spawns a real subprocess per ticket with Context Amnesia. `ai_client.reset_session()` at start of `run_worker_lifecycle`. 
The Ticket output is the "distilled artifact" returned to the parent (`ConductorEngine`). Per the docs: *"Tier 3 worker is a fresh subprocess with a clean context window, receiving only the prompt and the relevant context slice."* | +| **1:1 main discussion** | **No.** The Application's chat loop has no sub-conversation mechanism. The user types a prompt, the AI responds, the loop continues. There's no way to "ask a sub-agent to investigate X and bring back the answer." | + +The user is correct: this is a gap. The MMA pattern is the prototype. A future track could extract `MMA's run_worker_lifecycle` into a reusable `app.spawn_sub_conversation(prompt, allowed_tools=...)` method that the App can call from `pre_tool_callback` or from a new "investigate this" command. + +**Verdict.** **PARITY for MMA; GAP for 1:1 discussions.** The MMA pattern is strong. The 1:1 chat has no equivalent. The user explicitly flagged this as a want. + +**Domain tag:** Application (and possibly Meta-Tooling). *Future-track candidate: a `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Per the user: useful for "specific points" within a longer conversation.* + +--- + +## 10. Controlled writes + +**nagent's claim.** A loop that writes files needs explicit boundaries. nagent is a reference implementation with conventions, **not a sandbox**. Shell runs with your permissions. Structured writes are checked. That is not a security boundary. Do not pretend it is. + +**nagent's implementation.** +- `validate_write_path(path, file_edit_path, ...)` — in main mode: path must be in `/tmp`, `/var/tmp`, or `$TMPDIR`. In file-edit mode: path must be the target file (or one of its split segments). +- Rejected writes append `` to the conversation. +- `` runs whatever the LLM wrote, with the user's permissions, in the user's working directory. **There is no shell sandbox.** This is explicit. 
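The path check described above might look roughly like the following. This is a sketch under assumptions: the function name `is_write_allowed` and the helper `allowed_tmp_roots` are hypothetical, split segments are left out, and nagent's real `validate_write_path` takes more arguments than shown here. As the report stresses, a check like this is a convention, not a sandbox.

```python
import os
from pathlib import Path
from typing import Optional


def allowed_tmp_roots() -> list[Path]:
    """Temp roots named in the report: /tmp, /var/tmp, and $TMPDIR if set."""
    roots = [Path("/tmp"), Path("/var/tmp")]
    tmpdir = os.environ.get("TMPDIR")
    if tmpdir:
        roots.append(Path(tmpdir))
    # Resolve symlinks so comparisons use real locations.
    return [root.resolve() for root in roots]


def is_write_allowed(path: str, file_edit_path: Optional[str] = None) -> bool:
    """Return True if a structured write to `path` passes the convention check."""
    # resolve() also normalizes ".." components before the comparison.
    target = Path(path).resolve()
    if file_edit_path is not None:
        # File-edit mode: only the file being edited may be written.
        # (Split segments are omitted from this sketch.)
        return target == Path(file_edit_path).resolve()
    # Main mode: the write must land under one of the temp roots.
    return any(target.is_relative_to(root) for root in allowed_tmp_roots())
```

Note that this only constrains structured writes. A shell action that runs with the user's permissions can still write anywhere, which is exactly the limitation the report calls out.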
+ +**Manual Slop's equivalent.** Manual Slop has a *much* stronger security model: + +| nagent | Manual Slop | +|---|---| +| `validate_write_path`: in main mode, path must be in `/tmp`, `/var/tmp`, or `$TMPDIR` | `mcp_client._is_allowed`: in main mode, path must be in the allowlist (constructed from `file_items` + `extra_base_dirs`); history.toml and `*_history.toml` are *always* blocked | +| `execute_write` writes the file directly | `set_file_slice` / `edit_file` / `py_update_definition` route through AST or string-match for validation | +| `` runs the user's full shell, full permissions, no approval | `run_powershell(script, base_dir, qa_callback=...)` requires GUI modal approval (Execution Clutch), 60s timeout, `taskkill` cleanup, optional Tier 4 QA on failure | +| No per-tool allowlist | 3-layer security: `configure` (allowlist) → `_is_allowed` (path validation) → `_resolve_and_check` (resolution + symlink resolution) | +| No sandbox at all | PowerShell-only (no bash/cmd) by default; can be enabled in `[mcp_env.toml]` | + +**Verdict.** **PARITY (STRONGER on Manual Slop's side).** Manual Slop's HITL-required shell execution + 3-layer allowlist is *dramatically* more secure than nagent's tmpdir check. The user explicitly chooses "less safety but more flexibility" with nagent, and "more safety but more friction" with Manual Slop. + +**Domain tag:** Both. The Application needs Manual Slop's strict model. The Meta-Tooling could legitimately use nagent's looser model *because the human is in the loop* (the bridge script pops a GUI dialog). + +--- + +## 11. Large files as explicit artifacts (split/patch) + +**nagent's claim.** Big files exceed context. Split them. Do not pretend they fit. The split is a *data structure* with `index.json` and segment files; the patch is a unified diff; the source hash validates that nothing changed. + +**nagent's implementation.** + +The 4-file pipeline: +1. 
**`nagent-file-split --output --split [--summarize] [--refresh INDEX] [--target-bytes 32768] [--natural]`**: + - `EXTENSION_MAP` covers 11 languages (txt, md, cpp, py, xml, js, ts, json, yaml, go, rs, java) + - Per-language `SCORE_BY_TYPE` (no tree-sitter; regex + line-counting + brace/JSON/XML depth counters) + - `py_score` rewards blank lines followed by `def`/`class`/`async def` + - `cpp_score` uses `brace_depth` to find closing braces at depth 0 + - `json_score` uses `json_depth` to find closing `}`/`]` at depth 0 + - Writes `index.json` with `source_path`, `sourcesha256`, `source_size_bytes`, `source_line_count`, `split_type`, `target_bytes`, `natural`, `created_at`, `segment_count`, `segments[]` + - Each segment is a separate file with `name-0001.py`, `name-0002.py`, etc. + - `--summarize` flag spawns `nagent-file-summarize` per-segment subprocess +2. **User edits the segment files** (in place, via vim, etc.) +3. **`nagent-file-patch [--patch PATH] [--dry-run] [--force]`**: + - `validate_index(index, require_hash_match=not force)` — **strict** hash check; rejects if source changed + - `merge_segments(segments) -> str` — concatenates segment contents in order + - `make_unified_patch(source, original, updated)` — `difflib.unified_diff` + - Writes the patch file; if `apply=True` and `changed=True`, writes the source +4. 
**`nagent-file-summarize [--limit-word-count N] [--output DIR] [--json]`**: + - Files > 64 KB cascade to `nagent-file-split --summarize` first + - `summarize_content` retries up to `SUMMARY_MAX_ATTEMPTS = 2` if the LLM overshoots the word limit + - `combined_summary_from_index` glues per-segment summaries into one + +**Manual Slop's equivalent (different mechanism, same insight).** Manual Slop has all the *parts* of nagent's split/patch/summarize, but they live in different files and use different mechanisms: + +| nagent | Manual Slop | +|---|---| +| `nagent-file-split` with per-language `SCORE_BY_TYPE` (regex + line counts + brace/JSON/XML depth) | `aggregate.py:build_file_items()` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter) + `outline_tool.py` | +| `index.json` with `source_path`, `sourcesha256`, `segments[]` | No explicit `index.json`. The "split" is implicit in `_reread_file_items` (mtime-based, not hash-based) and the `py_get_skeleton` tool returns the structural view on demand. | +| `nagent-file-patch` with strict `validate_index` (hash check) | `set_file_slice` / `edit_file` with `result of file.read_text()` pre-write validation. No hash-based pre-validation. | +| `nagent-file-summarize` with per-segment LLM call + retry | `run_subagent_summarization(file_path, content, is_code, outline) -> str` (in-process LLM call) | +| Combined `combined_summary_from_index` | No equivalent; `aggregate.build_markdown_no_history` builds a single markdown per call | +| `nagent-file-summarize` cascades to `nagent-file-split --summarize` for > 64 KB | `RAGEngine._chunk_code` cascades to chunking for Python (mtime-based invalidation, ChromaDB persistence) | + +**Crucial difference: Manual Slop uses tree-sitter, nagent does not.** nagent's per-language scoring functions are *all regex-based* (`cpp_score` looks for closing braces at depth 0; `py_score` looks for blank lines followed by `def`/`class` keywords; no AST parsing). 
Manual Slop's `py_get_skeleton` and `ts_c_*_get_skeleton` use the tree-sitter library for actual AST traversal. + +This is a trade-off. Tree-sitter is more accurate but requires a native dependency. nagent's approach works on any Python install with no compiled extensions. For the Application domain, tree-sitter is already a dependency (`file_cache.py`); for the Meta-Tooling, nagent's regex approach has appeal. + +**Verdict.** **PARITY (DIFFERENT MECHANISM).** Both have the "split / patch / summarize as explicit data artifacts" insight. nagent uses subprocesses + per-language scoring + hash validation. Manual Slop uses tree-sitter + in-process calls + mtime validation. The key safety property — *"the patch operation validates the source hasn't changed"* — is done by nagent via SHA-256; Manual Slop does it implicitly by re-reading the file and string-matching. Manual Slop could adopt the explicit hash approach for stronger guarantees. + +**Domain tag:** Both. *Future-track candidate: an explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, used by the Application for very-large-file scenarios (e.g., a 200KB legacy C file where skeleton + sig + def aggregation isn't enough).* + +--- + +## 12. Tool discovery (self-describing executables) + +**nagent's claim.** Tool capability should be explicit data too. No central registry. Tools describe themselves. + +**nagent's implementation.** `bin/helpers/nagent_cli.py:collect_bin_tool_descriptions(bin_dir)`: +- Iterates every executable in `bin/` +- Runs each with `--description` (10s timeout per) +- Captures stdout, parses it +- Concatenates into a single "Available tools:\n\n\n\n\n..." 
block +- Inserts this block into the initial context + +Each tool's `__main__` starts with: +```python +def exit_on_description(description: str) -> None: + if "--description" in sys.argv: + print(description) + raise SystemExit(0) +``` + +So `nagent-file-split --description` prints "Split a large file into structure-aware segments..." and exits 0. The main `nagent` loop calls `collect_bin_tool_descriptions` once at startup. + +**Manual Slop's equivalent.** None. The 45 MCP tools in `src/mcp_client.py` are dispatched by a flat if/elif chain in `dispatch()`: +```python +def dispatch(tool_name, tool_input): + if tool_name.startswith("bd_"): + return _dispatch_beads(tool_name, tool_input) + if tool_name == "read_file": + return _read_file(tool_input["path"]) + if tool_name == "py_get_skeleton": + return _py_get_skeleton(tool_input["path"]) + # ... 45+ branches ... + return f"ERROR: unknown tool: {tool_name}" +``` + +Adding a new tool requires: +1. Edit `dispatch()` to add the branch +2. Update the security allowlist in `_resolve_and_check` (if filesystem access) +3. Update the AI capability declaration in `get_tool_schemas()` +4. Add tests + +nagent's approach: drop an executable in `bin/`, implement `exit_on_description`, done. The tool is auto-discovered. + +The user (per the pushback): *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."* — so this is a known want, but low priority. + +**Verdict.** **GAP (Application).** nagent's pattern is genuinely better here, but Manual Slop has 45 tools in production and a migration would be a big refactor. The win is real (extensibility) but the cost is also real (rewrite the dispatch layer). + +**Domain tag:** Both. For the Meta-Tooling (the `scripts/` directory), nagent's pattern is more aligned with the external-agent usage model. For the Application, the existing `dispatch` if/elif is fine. 
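
The `--description` discovery pattern from this section can be sketched roughly as follows. This is an illustration under stated assumptions, not nagent's actual `collect_bin_tool_descriptions`: the function name `collect_descriptions`, the `name: text` entry format, and the choice to silently skip tools that fail or time out are all hypothetical. Only the `--description` flag, the 10-second per-tool timeout, and the "Available tools:" header come from the description above.

```python
import os
import subprocess
from pathlib import Path


def collect_descriptions(bin_dir: str, timeout: float = 10.0) -> str:
    """Ask every executable in bin_dir to describe itself; return one text block."""
    entries = []
    for tool in sorted(Path(bin_dir).iterdir()):
        # Only regular files with an execute bit count as tools.
        if not tool.is_file() or not os.access(tool, os.X_OK):
            continue
        try:
            proc = subprocess.run(
                [str(tool), "--description"],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError):
            continue  # assumption: a tool that hangs or cannot start is skipped
        text = proc.stdout.strip()
        if proc.returncode == 0 and text:
            entries.append(f"{tool.name}: {text}")
    return "Available tools:\n\n" + "\n\n".join(entries)
```

The design point is that the registry is the directory listing itself: adding a tool means adding a file, with no central dispatch table to edit.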
+ +*Future-track candidate: a `mcp_architecture_refactor_20260606` (already on the board) would benefit from nagent's pattern. The "sub-MCP" extraction the planned refactor proposes is exactly the right scope for this — each sub-MCP could be its own self-describing module.* + +--- + +## 13. Differences from frameworks + +nagent's philosophical frame: framework-style systems hide state in object graphs and long-lived agent abstractions; nagent keeps everything as explicit files. The reframing table at the end of the nagent README is excellent: + +| Common term | nagent framing | +|---|---| +| memory | editable artifact | +| retrieval | preserved work / historical context | +| agent | temporary transformation function | +| context | explicit input data | + +This report's §2-§12 have been showing where Manual Slop *agrees* with nagent's reframings and where it *deliberately diverges*. + +**Verdict.** The reframing is useful. The application can pick and choose which reframings to adopt per layer. + +**Domain tag:** Both. This is the philosophical lens for the whole report. + +--- + +## 14. Build your own + +nagent's last section: *"The minimal system is not mystical. Small loop over explicit state."* The list of 12 buildable steps: `generate_text(file) -> str`, growing conversation document, initial context with the contract, output format + parser, handlers that append results to state, loop after actions, visible retry on malformed output, child loops for delegation, per-artifact memory, repository history → context blocks, split/index/patch for large files, save/load/edit/summarize for memory maintenance. + +**Verdict.** Manual Slop *has* all 12 of these. Just in different files, with different names, and at a different scale. + +**Domain tag:** Both. The 12-step list is a useful checklist for any future LLM-application track. + +--- + +## 15. The 6 Pitfalls (Revised from 8, after User Corrections) + +The first draft of this report had 8 pitfalls. 
The user corrections on §3 and §6 removed 2 of them. The remaining 6: + +### Pitfall 1: No structured output protocol in the Application AI + +The Application uses opaque provider-native function calling. The user can read the conversation, but cannot read a `tool_call` from the comms log without knowing the provider's schema. nagent's regex-tag protocol is more debuggable for the Meta-Tooling. **Decision: not a problem for the Application (provider-native is the right choice). Worth borrowing for the Meta-Tooling.** **Domain tag:** Both. *Future-track candidate: an intent-based DSL for Meta-Tooling agent calls.* + +### Pitfall 2: Provider-specific history is in process globals + +`src/ai_client.py` has `_anthropic_history`, `_deepseek_history`, `_minimax_history` — 3 separate per-provider history lists, each with its own lock. Switching providers mid-session loses history. nagent's "single conversation file" model is provider-agnostic. + +**Concrete change:** A future refactor toward a stateless `LLMClient` class with an explicit `Conversation` object (the transcript as a `list[Message]`) would mean: +- Users can save, load, and replay conversations +- Provider switching no longer loses history +- Tier 4 QA and Tier 3 workers can share a common conversation format + +**Domain tag:** Application. *Future-track candidate: a `src/conversation.py:Conversation` dataclass + `src/llm_client.py:LLMClient` stateless wrapper around the 5 providers.* + +### Pitfall 3: RAG is not "history as data" + +Manual Slop's RAG (`src/rag_engine.py`) is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be **additive**, not a replacement. The Application's `_reread_file_items` mtime-based diff injection is the "history as data" mechanism Manual Slop already has. + +**The user's clarification:** *"RAG is an optional thing, doesn't have to be used.
Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run."* + +**Decision:** RAG stays. The user wants a *staging* workflow: a sub-agent prepares RAG chunks before a run, the chunks become the discussion's starting memory. This is consistent with the nagent-inspired sub-conversation pattern (§9). + +**Domain tag:** Application. *Future-track candidate: a "RAG pre-staging" sub-conversation runner that pre-builds the index for a planned run.* + +### Pitfall 4: The AI client is a stateful singleton with module-level globals + +2,685-line `src/ai_client.py`. The module is the abstraction layer. To import it for testing, you trigger 5 provider SDKs' lazy imports. The unit tests are the only way to know what state is in flight. + +This is the *opposite* of nagent's "files are the system; the process is a worker." nagent's `run_agent_loop` is 50 lines, stateless, testable. A future refactor toward a stateless `LLMClient` class would make `ai_client` parseable, testable, and saveable. + +**Domain tag:** Application. *Future-track candidate: a `src/llm_client.py:LLMClient` class with explicit `Conversation`, `Provider`, `History` objects. Backwards-compatible with the current `ai_client.send()` API.* + +### Pitfall 5: No non-MMA disposable sub-conversations + +The MMA pattern is strong. The 1:1 chat has no equivalent. The user *explicitly* flagged this as a want: *"I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."* + +**Decision:** Design `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Reuse MMA's subprocess pattern (`mma_exec.py` as the template). The sub-agent returns a concise artifact to the parent (nagent's pattern). Useful for "investigate this file" / "summarize this concept" / "look up this API" commands. + +**Domain tag:** Application. 
*Future-track candidate: a `src/sub_conversation.py` + a GUI "Investigate…" button on the message panel.* + +### Pitfall 6: Hard-coded tool discovery + +The 45 MCP tools in `mcp_client.py:dispatch` are in a flat if/elif chain. nagent's `--description` self-describing executable pattern is more extensible. + +**The user's position:** *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."* + +**Decision:** Low priority. The `mcp_architecture_refactor_20260606` (already on the board) is the natural place to address this — sub-MCPs as self-describing modules. + +**Domain tag:** Both. *Future-track candidate: subsumed by mcp_architecture_refactor_20260606.* + +### Pitfalls removed by user-corrections + +- **(removed)** Pitfall about "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); it lacks editable *raw transcripts*, but that's a *different* design choice, not a gap. (See §3.) +- **(removed)** Pitfall about "per-file memory" — overstated. Manual Slop *does* have per-file memory in the curation dimension; what's missing is nagent's conversation-log dimension, which is a different optimization. (See §6.) + +--- + +## 16. Recommended reading path for engineers + +If you haven't read nagent, here's the priority: + +1. **The README's first 3 sections** ("What It Looks Like", "Durable Work", "Text In Text Out") — the philosophy in 5 minutes. +2. **`bin/nagent:run_agent_loop()`** — the actual loop, 50 lines. +3. **`bin/helpers/nagent_file_split_lib.py:SCORE_BY_TYPE`** — the per-language scoring; shows what "structure-aware" can mean without tree-sitter. +4. **`bin/helpers/nagent_file_patch_lib.py:validate_index`** — the strict hash check; the safety property of nagent's split/patch workflow. +5. 
**`bin/helpers/nagent_file_summarize_lib.py:summarize_content`** — the retry-with-smaller-prompt pattern. +6. **`bin/helpers/nagent_cli.py:collect_bin_tool_descriptions`** — the tool-discovery pattern; 30 lines. + +The README's 14 sections can be skimmed in 15 minutes if you have the context this report provides. Read items 2-6 above, in order, for the implementation depth. + +--- + +## Appendix A. Cross-reference table + +| nagent file | Lines | Purpose | Manual Slop equivalent | +|---|---|---|---| +| `README.md` | ~1500 | 14-section teaching document | This report + `docs/guide_*.md` | +| `bin/nagent` | ~700 | Main loop, tag parser, sub-conversation runner | `src/ai_client.py:send` + `src/multi_agent_conductor.py:ConductorEngine.run` + `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async` (3 separate loops) | +| `bin/nagent-llm-text` | ~50 | CLI wrapper for `nagent_llm.py` | (implicit; the Application calls `ai_client.send` directly) | +| `bin/nagent-llm-upload` | ~30 | File upload + LLM call | (not present; the Application's read tools handle files inline) | +| `bin/nagent-file-edit` | ~120 | Per-file subprocess wrapper | (not present; this is the gap that the user wants for 1:1 discussions) | +| `bin/nagent-file-split` | ~170 | Main split executable | (not present in this form; Manual Slop uses `aggregate.py` + tree-sitter) | +| `bin/nagent-file-patch` | ~80 | Main patch executable | (not present; Manual Slop uses `set_file_slice` / `edit_file` directly) | +| `bin/nagent-file-summarize` | ~100 | Main summarize executable | `src/ai_client.py:run_subagent_summarization` (in-process) | +| `bin/helpers/nagent_cli.py` | ~80 | `--description` pattern, `WaitSpinner` | (not present) | +| `bin/helpers/nagent_llm.py` | ~300 | 4 providers, token accounting | `src/ai_client.py:_send_` × 5 (in-process, with cross-provider state) | +| `bin/helpers/nagent_file_edit_lib.py` | ~170 | file-index by inode, `resolve_file_edit_conversation` | (not present) | +| 
`bin/helpers/nagent_file_split_lib.py` | ~400 | `SPLIT_TYPES` (11 langs), per-language scoring | `src/file_cache.py:ASTParser` (tree-sitter) + `src/aggregate.py:build_file_items` | +| `bin/helpers/nagent_file_patch_lib.py` | ~130 | strict hash validation, `make_unified_patch` | (not present; implicit mtime check) | +| `bin/helpers/nagent_file_summarize_lib.py` | ~110 | per-segment LLM call, retry-with-smaller-prompt | `src/ai_client.py:run_subagent_summarization` (in-process, no retry) | +| **Total nagent** | **~4000** | | **Manual Slop's analogous parts: ~5000+** (ai_client + multi_agent_conductor + mcp_client + aggregate + rag_engine + history + project_manager + tree-sitter-based tools) | + +Manual Slop is *not* smaller than nagent; it's *larger* because it has a GUI, persistence, HITL dialogs, Hook API, and a real test harness. The architectures serve different scales. + +--- + +## Appendix B. Citations + +- nagent source: https://github.com/macton/nagent (all 11 source files read in full) +- Internal: `docs/Readme.md`, `docs/guide_architecture.md`, `docs/guide_ai_client.md`, `docs/guide_mma.md`, `docs/guide_tools.md`, `docs/guide_mcp_client.md`, `docs/guide_app_controller.md`, `docs/guide_meta_boundary.md`, `docs/guide_context_curation.md`, `docs/guide_personas.md`, `docs/guide_rag.md`, `docs/guide_gui_2.md` +- Internal source (selectively read for user-corrections): `src/models.py` (FileItem, ContextPreset), `src/context_presets.py`, `src/project_manager.py` (branch_discussion, promote_take), `src/aggregate.py`, `src/history.py` +- Mike Acton, "Data-Oriented Design and C++" (CppCon 2014) — referenced but not directly cited +- Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them" — cited via the `data_oriented_error_handling_20260606` track + +--- + +*End of report.
See `comparison_table.md` for the flat reference, `decisions.md` for the future-track candidates, and `spec.md` for the track wrapper.* diff --git a/conductor/tracks/nagent_review_20260608/spec.md b/conductor/tracks/nagent_review_20260608/spec.md new file mode 100644 index 00000000..80ff06c2 --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/spec.md @@ -0,0 +1,240 @@ +# Track: Mike Acton's nagent — Deep Dive on LLM Agent Architecture + +**Status:** Active (spec approved 2026-06-08; revised 2026-06-08 with user-corrections) +**Initialized:** 2026-06-08 +**Owner:** Tier 2 Tech Lead +**Priority:** Medium (architectural; informs future Application+Meta-Tooling decisions but is not a code refactor) + +> **Revision note (2026-06-08):** This spec was revised based on direct user corrections after the first draft. Earlier versions overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features; the corrections are folded into §2 and §4 below. Read the **report.md** for the actual analysis; this spec.md is the wrapper. + +--- + +## 1. Overview + +This track documents a deep-dive analysis of Mike Acton's [`macton/nagent`](https://github.com/macton/nagent) reference implementation ("nagent" = "not-an-agent") and its implications for how Manual Slop should think about LLM-driven workflows. + +nagent is a small Python reference implementation, documented by a 14-section, ~1,500-line README, that operationalizes the philosophy **"the agent is not the thing; the data is the thing."** It provides a concrete, minimal counterpoint to the standard "agent framework" model. Its central claim: **durable work matters more than durable processes; explicit artifacts beat opaque state.** + +The companion doc ([report.md](./report.md)) is the deep-dive analysis itself — a 14-section comparison against Manual Slop's actual implementation, written for engineers (not marketing).
This spec.md is the conductor/track wrapper: the design intent, the relationship to the Application vs Meta-Tooling split, the planned follow-up tracks, and the out-of-scope notes. + +### 1.1 What this track produces + +| Artifact | Purpose | +|---|---| +| `spec.md` | This file — the track design and scoping. | +| `report.md` | The 14-section deep-dive analysis. The primary deliverable. | +| `comparison_table.md` | A flat side-by-side table (one row per nagent principle) for quick reference. | +| `decisions.md` | Future-track candidates extracted from the analysis (each becomes a follow-up track if approved). | + +### 1.2 Non-Goals + +- **Not** rewriting Manual Slop to use nagent. The architectures serve different domains (see §2). +- **Not** replacing any existing track. This is a *reference* track — it informs future tracks but doesn't compete with them. +- **Not** a comparison of "framework vs framework." nagent is a small reference (~4,000 lines per the cross-reference table in `report.md`, including a ~1,500-line README); Manual Slop is 13,000+ lines of production code with a real GUI, real persistence, real HITL. The comparison is *philosophical*, not "which is better." + +--- + +## 2. The Application / Meta-Tooling Distinction (load-bearing context) + +Per `docs/guide_meta_boundary.md`, Manual Slop lives in two distinct architectural domains. **This distinction is critical for understanding the nagent comparison:** + +| Domain | Lives at | AI / HITL Model | Tooling | +|---|---|---|---| +| **The Application** (`manual_slop`) | `gui_2.py`, `ai_client.py`, `multi_agent_conductor.py`, `dag_engine.py` | A local GUI for orchestrating AI. The "Application AI" is a long-lived assistant that the user talks to over many turns. Strict HITL: every destructive action requires a GUI modal approval.
| `manual_slop.toml [agent.tools]` — strict allowlist | +| **The Meta-Tooling** (us) | `scripts/mma_exec.py`, `conductor/`, `.agents/skills/`, the MCP tools in `mcp_client.py` when used by external agents | External agents (Gemini CLI, OpenCode, Claude Code) that *build* the Application. Each invocation is a fresh sub-agent. Token-firewalled. | Full mcp_client.py toolset, including mutation tools | + +**nagent lives in the Meta-Tooling domain.** nagent is a reference for how *external* agents (the ones reading this conversation, the ones writing the code) should structure their own work. + +**Manual Slop's Application AI does not — and should not — look like nagent.** The Application AI is a chatty, conversational, persona-driven, RAG-augmented, curation-rich assistant with a real GUI. It's a *different kind of thing*. Conflating the two is exactly the kind of "feature bleed" `guide_meta_boundary.md` warns against. + +Every recommendation in `report.md` is qualified with which domain it applies to. The Application is the production code the user cares about; the Meta-Tooling is what we (the agents) use to build it. + +--- + +## 3. Summary of the 14-Section Comparison + +The full table is in `comparison_table.md`. Verdict summary: + +| nagent Principle | Manual Slop Equivalent | Verdict | +|---|---|---| +| 1. Durable work, disposable workers | AppState snapshots + history branching (Takes); MMA workers are real subprocesses | **PARTIAL** — different domains; MMA has it, App doesn't need it | +| 2. Text in, text out | `ai_client.send()` returns `str`; `mcp_client.dispatch` returns `str` | **PARITY** | +| 3. Conversations are editable state | Discussion takes + branching + edit-in-place + UISnapshot history; `ContextPreset` for per-file view-mode memory | **PARITY (DIFFERENT FOCUS)** — Manual Slop has this; focuses on *editable UI state* (per Take) and *editable per-file curation* (per FileItem), not editable conversation logs | +| 4. 
Visible output protocol | Uses provider-native function calling; the protocol is opaque to humans | **ARCHITECTURAL DIFFERENCE** — Application-side; correct trade-off | +| 5. The loop (append, call, parse, act, repeat) | `ai_client._send_*` tool-call loop, MMA `ConductorEngine.run`, `WorkflowSimulator.run_discussion_turn_async` | **PARITY** — but the loop is spread across multiple files, not in a single small function | +| 6. Per-file memory (curation, not conversation log) | `FileItem` (path + view_mode + ast_mask + custom_slices); `ContextPreset` (saved set of FileItems); Fuzzy Anchor slices | **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION**; nagent's "file-edit conversation" pattern (one conversation log per file) is not present | +| 7. Repository history as data | `_reread_file_items` mtime-based diff injection; `git_commit_file_patch` per-file history summaries; no explicit "neighborhood" computation | **PARITY (PARTIAL)** — diff injection is similar; the "neighborhood" computation is missing | +| 8. Historical coupling & artifact neighborhoods | n/a (no equivalent) | **GAP** — could be added as a new tool | +| 9. Disposable sub-conversations | MMA `mma_exec.py` Tier 3 workers are real subprocesses; **non-MMA 1:1 discussions do NOT have disposable sub-conversations yet** (per user) | **GAP (Application)** — useful for 1:1 discussions; **PARITY for MMA** | +| 10. Controlled writes | MCP 3-layer security + Execution Clutch + Allowlist Construction + Path Validation + Resolution Gate | **PARITY (STRONGER)** — Manual Slop's 3-layer is more thorough than nagent's tmpdir check | +| 11.
Large files as explicit artifacts (split/patch) | `aggregate.py:build_file_items()` + tree-sitter skeleton tools (`py_get_skeleton`, `ts_c_*_get_skeleton`); `set_file_slice` / `edit_file` for writes; in-process summarization (`run_subagent_summarization`); mtime-based rather than hash-based validation | **PARITY (DIFFERENT MECHANISM)** — both have the insight; nagent uses per-language scoring functions + subprocess isolation, Manual Slop uses tree-sitter + in-process `summarize.py` | +| 12. Tool discovery (self-describing executables) | Hard-coded `dispatch` if/elif chain in `mcp_client.py` | **GAP (Application) — could be added; useful for the Meta-Tooling domain** | +| 13. Differences from frameworks | The philosophical frame | n/a | +| 14. Build your own | The reference's "minimal" claim is wrong for the Application | n/a for Application | + +The full 14-row analysis with 6 (revised from 8) specific Manual Slop pitfalls is in `report.md`. + +--- + +## 4. The Revised 6 Pitfalls (corrected) + +Earlier versions of this list contained two errors that user-corrections caught: + +- **REMOVED** pitfall #3 ("Conversation state is buried in module-level globals"), which was overstated. Manual Slop has *some* editable-state infrastructure (`HistoryManager` with UISnapshot, discussion Takes/branching, `ContextPreset` save/load) but the actual *raw conversation transcript* is in `ai_client._provider_specific_history` globals. The truth is: **Manual Slop has editable UI state, not editable conversation transcripts.** That distinction is now captured honestly in §3 of the report. + +- **REVISED** pitfall #6 ("Per-file memory"). Manual Slop *does* have a per-file memory concept (`FileItem` + `ContextPreset` + `custom_slices` + `ast_mask`), but it's *curation memory*, not nagent's *conversation-log memory*. Manual Slop's concept is *richer in the curation dimension* but *absent in the conversation-log dimension*. That's a useful distinction. + +The remaining 6 pitfalls, after corrections: + +1.
**No structured output protocol** in the Application AI (uses opaque function calling; nagent's regex tag protocol is the alternative for the Meta-Tooling). **Domain: Application can stay opaque; Meta-Tooling should learn from it.** +2. **Provider-specific history is in process globals** (3 separate per-provider history lists per `report.md`, each with its own lock; switching providers mid-session loses history). **Domain: Application. Future-track candidate.** +3. **RAG is not "history as data"** — RAG retrieval is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be additive, not a replacement. **Domain: Application. Coexists with nagent-style history.** +4. **The AI client is a stateful singleton with module-level globals** (2,685-line `ai_client.py` is unparseable without state). A future refactor toward a stateless `LLMClient` class with explicit `Conversation` objects would let the App save/load/replay conversations as files. **Domain: Application. Future-track candidate.** +5. **No non-MMA disposable sub-conversations** — only MMA workers are real subprocesses; the user explicitly noted that 1:1 discussions don't have sub-agents. nagent's `` pattern (a sub-agent for bounded investigation) would be valuable for the Application. **Domain: Application. Future-track candidate (user-flagged as a want).** +6. **Hard-coded tool discovery** — the 45 MCP tools are in a flat if/elif chain in `dispatch`. nagent's `--description` self-describing executable pattern is more extensible. **Domain: both. Low priority.** + +Plus 2 domain-specific recommendations that are not pitfalls per se: + +- **Personas are config bundling** (per user: "just bundles preparatory cruft — vendor/model, tools/permissions, and system prompts"). The user noted that you can *completely opt out* by just using AI settings directly. **Domain: Application. Keep as-is; not a pitfall.** +- **RAG is opt-in** (per user: "doesn't have to be used").
Worth considering: a sub-agent that *prepares RAG chunks* before a run. **Domain: Application. Future-track candidate.** + +--- + +## 5. What This Track Read (in full, before writing) + +To avoid hand-waved claims, the report and this spec were written after reading all of: + +### nagent source (read in full) + +- `README.md` (~1,500 lines) — the 14-section "teaching document" +- `bin/nagent` (~700 lines) — the main loop, tag parser, sub-conversation runner, git history + co-edit + summary integration +- `bin/helpers/nagent_llm.py` (~300 lines) — provider dispatch, token accounting +- `bin/helpers/nagent_cli.py` (~80 lines) — `--description` self-describing executable pattern, `WaitSpinner` +- `bin/helpers/nagent_file_edit_lib.py` (~170 lines) — file-index by `st_dev:st_ino`, `resolve_file_edit_conversation`, `is_split_segment_for_source` +- `bin/helpers/nagent_file_split_lib.py` (~400 lines) — `SPLIT_TYPES` (11 langs), per-language `SCORE_BY_TYPE` (no tree-sitter; regex + line counts + brace/JSON/XML depth), 32 KB default, source SHA-256 hashing +- `bin/helpers/nagent_file_patch_lib.py` (~130 lines) — strict hash validation, `make_unified_patch` via `difflib.unified_diff`, `apply_segment_patches` writes the source +- `bin/helpers/nagent_file_summarize_lib.py` (~110 lines) — per-segment LLM calls + retry-with-smaller-prompt (max 2 attempts), `--limit-word-count` validation, `combined_summary_from_index` +- `bin/nagent-file-edit` (~120 lines) — per-file subprocess wrapper, `default_pid = BASHPID or os.getppid()` +- `bin/nagent-file-split` (~170 lines) — main executable, `--refresh INDEX` mode for re-splitting without losing segment paths +- `bin/nagent-file-summarize` (~100 lines) — main executable, cascades to `nagent-file-split --summarize` for files > 64 KB; uses `positive_int` CLI type (rejects 0) + +### Manual Slop docs (read in full) + +- `docs/Readme.md` (434 lines) — docs index +- `docs/guide_architecture.md` (989 lines) — threading model, cross-thread data 
structures +- `docs/guide_ai_client.md` (424 lines) — multi-provider LLM client +- `docs/guide_mma.md` (564 lines) — 4-tier MMA orchestration +- `docs/guide_tools.md` (506 lines) — MCP tool inventory + Hook API +- `docs/guide_mcp_client.md` (410 lines) — 45 tools + 3-layer security +- `docs/guide_app_controller.md` (447 lines) — headless controller +- `docs/guide_meta_boundary.md` (57 lines) — Application vs Meta-Tooling split +- `docs/guide_context_curation.md` (303 lines) — Granular AST Control + Fuzzy Anchor Slices + AST Inspector +- `docs/guide_personas.md` (307 lines) — Unified agent profile model +- `docs/guide_rag.md` (411 lines) — RAG subsystem +- `docs/guide_gui_2.md` (477 lines) — ImGui application (App/Controller state delegation, hot-reload, defer-not-catch) + +### Manual Slop source (selectively read, in service of the user-corrections) + +- `src/models.py` lines 510-559 (FileItem schema), 909-937 (ContextPreset schema) +- `src/context_presets.py` (30 lines, full file) — the `ContextPresetManager` +- `src/project_manager.py` lines 429-450 (`branch_discussion`, `promote_take`) +- `src/aggregate.py` first 80 lines (context composition pipeline) +- `src/history.py` (full file, 141 lines) — `UISnapshot` and the snapshot model + +The user-corrections specifically drove a re-survey of `FileItem` + `ContextPreset` + `aggregate.py` + `HistoryManager` after the first draft overstated Manual Slop's gaps. + +--- + +## 6. 
Architectural Reference + +- **nagent source code:** https://github.com/macton/nagent (read in full for this analysis) +- **nagent README:** https://github.com/macton/nagent/blob/main/README.md (the 14-section "teaching document") +- **Mike Acton's data-oriented design talks:** https://www.youtube.com/results?search_query=mike+acton+data+oriented (foundational; nagent is a specific application) +- **Ryan Fleury "errors are just cases":** https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors (cited in `data_oriented_error_handling_20260606`; consistent with nagent's data-over-control-flow stance) +- **Internal:** `docs/guide_meta_boundary.md` for the Application/Meta-Tooling split +- **Internal:** `docs/guide_architecture.md` §"Thread Domains" for the cross-thread state-sync problem that nagent sidesteps by having no GUI + +--- + +## 7. See Also + +### Internal Documentation + +- `docs/Readme.md` — Manual Slop documentation index +- `docs/guide_architecture.md` — Threading model and provider dispatch +- `docs/guide_ai_client.md` — The Application's LLM client +- `docs/guide_mma.md` — 4-tier MMA orchestration +- `docs/guide_meta_boundary.md` — The Application vs Meta-Tooling split +- `docs/guide_tools.md` — MCP tool inventory and Hook API +- `docs/guide_mcp_client.md` — 45 tools + 3-layer security +- `docs/guide_context_curation.md` — Granular AST Control + Fuzzy Anchor Slices + AST Inspector +- `docs/guide_personas.md` — Unified agent profile model +- `docs/guide_rag.md` — RAG subsystem +- `docs/guide_gui_2.md` — ImGui application + +### Related Tracks + +- `data_oriented_error_handling_20260606` — Already cites Acton by name. The `Result[T]` + `ErrorInfo` data model from this track is consistent with nagent's "data, not control flow" stance. +- `qwen_llama_grok_integration_20260606` — The "OpenAI-compatible shared helper" pattern is exactly nagent's "thin boundary adapter on a normalized data structure" approach. 
+- `mcp_architecture_refactor_20260606` — Already blocked by `data_oriented_error_handling_20260606`. The sub-MCP extraction (planned) will benefit from nagent's "small helper per concept" decomposition pattern. +- `data_structure_strengthening_20260606` — The type-alias work is consistent with nagent's "make the data shape explicit" stance. The audit script + NamedTuple work parallels nagent's split-index / patch-artifact approach. + +### External + +- Mike Acton, "Data-Oriented Design and C++" (CppCon 2014) — The original DOD talk that nagent operationalizes +- Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them" — Companion framework; same "errors as data" thesis +- Timothy Lottes (@NOTimothyLottes) — Cited in the `data_oriented_error_handling` review; same "error codes are data" stance +- Valigo (@valigotech) — Cited in the `data_oriented_error_handling` review; "exceptions mess with control flow in very weird ways" + +--- + +## 8. Scope Boundaries + +### In Scope + +- The 14-section nagent philosophy +- The 6 (revised) concrete pitfalls in Manual Slop +- Mapping each pitfall to a future-track candidate (in `decisions.md`) +- Application vs Meta-Tooling domain classification for every recommendation +- The philosophical grounding for existing Manual Slop conventions (data-oriented, thread-disciplined, GUI-decoupled) + +### Out of Scope + +- **Implementation work.** This is a reference/analysis track. No code is being changed. +- **Replacing nagent in the Meta-Tooling.** The Meta-Tooling is whatever the external agent (Gemini CLI, OpenCode) is. nagent is a *reference example*, not a competitor. It's worth reading for ideas, not adopting wholesale. +- **Building a new "data-oriented" track for Manual Slop.** The `data_oriented_error_handling_20260606` track already covers the data-vs-control-flow axis. This track is the *philosophical foundation* for that work; the implementation track is separate. 
+- **Comparing nagent to other LLM agent frameworks (LangChain, AutoGen, CrewAI, etc.).** nagent is a specific small reference; those are different scales. This track is about nagent specifically. + +### Known Trade-offs (called out in the report) + +- **Manual Slop's personas are a feature, not a bug, in the Application domain.** A user-facing chatty assistant benefits from "persona = named configuration that the user can save and recall." nagent's "data, not personality" stance is correct for sub-agent invocations but wrong for long-lived assistant sessions. (Per user: personas are config bundling; the user can opt out by using AI settings directly.) +- **Manual Slop's RAG is a feature, not a bug, in the Application domain.** RAG enables semantic search across large codebases. nagent's "git history → summaries" is exact but doesn't help when the user asks "how does the execution clutch work" and the relevant information is in `guide_architecture.md` (a doc, not source). RAG is opt-in. +- **Manual Slop's GUI is a feature, not a bug, for its domain.** It enables the rich persona, curation, RAG, and snapshot UX. nagent explicitly has no GUI; the Application explicitly has a GUI. They serve different needs. +- **The "1,500-line reference" vs "13,000-line production" comparison is not fair.** nagent is a teaching example. Manual Slop is a working tool. The right comparison is "nagent's principles vs Manual Slop's implementation," not "which codebase is better." + +--- + +## 9. Verification Criteria + +This is a reference/analysis track. 
The verification is: + +- [ ] `report.md` exists and covers all 14 nagent principles with a Manual Slop assessment for each +- [ ] `comparison_table.md` exists as a flat side-by-side reference +- [ ] `decisions.md` exists with future-track candidates (each is a separate conductor track to be specced independently) +- [ ] Every "Manual Slop could learn from nagent here" recommendation is tagged with the domain (Application / Meta-Tooling / Both) +- [ ] No code is being modified by this track +- [ ] The companion doc is read by ≥1 person who is planning a future track (the report.md file is referenced by the relevant future-track specs) +- [ ] (Post-correction) The report's verdicts on nagent §3 (Conversations are editable state) and §6 (Per-File Memory) are *corrected* per user feedback — the first draft overstated gaps + +--- + +## 10. Status + +**Approved 2026-06-08 (initial); revised 2026-06-08 with user corrections.** Ready for human review of `report.md`. + +After human review of `report.md`, the `decisions.md` candidates will be evaluated: +- High-priority items (e.g., stateless `LLMClient` class, non-MMA sub-conversations, RAG pre-staging) → new conductor tracks +- Medium-priority items (e.g., self-describing MCP tools, conversation file persistence) → research spikes +- Low-priority items → deferred until a specific Application need surfaces + +The current `data_oriented_error_handling_20260606` track and the future `mcp_architecture_refactor_20260606` track are already philosophically aligned with nagent's principles; this track is the *explicit* reference to that alignment. 
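
The artifact-presence items in the §9 checklist can be checked mechanically. The following is a minimal sketch, not part of this track's deliverables: the function name `missing_artifacts` and the constant `REQUIRED_ARTIFACTS` are hypothetical, and only the three file names and the track directory come from this spec.

```python
from pathlib import Path

# Artifact names come from the §9 checklist; the helper itself is hypothetical.
REQUIRED_ARTIFACTS = ("report.md", "comparison_table.md", "decisions.md")


def missing_artifacts(track_dir, required=REQUIRED_ARTIFACTS):
    """Return the required artifact names that are not regular files in track_dir."""
    root = Path(track_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # Track directory as it appears in this diff; run from the repository root.
    missing = missing_artifacts("conductor/tracks/nagent_review_20260608")
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all artifacts present")
```

This only covers the "exists" half of each criterion. Whether `report.md` actually covers all 14 principles still needs a human read.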
diff --git a/conductor/tracks/nagent_review_20260608/state.toml b/conductor/tracks/nagent_review_20260608/state.toml new file mode 100644 index 00000000..c6ee5c4d --- /dev/null +++ b/conductor/tracks/nagent_review_20260608/state.toml @@ -0,0 +1,113 @@ +# Track state for nagent_review_20260608 +# Reference/analysis track — no implementation phases +# Updated by Tier 2 Tech Lead as track progresses (currently: active, awaiting human review) + +[meta] +track_id = "nagent_review_20260608" +name = "nagent Review (Mike Acton's data-oriented LLM agent reference)" +status = "active" +current_phase = 0 # 0 = pre-completion; this track produces no code phases +last_updated = "2026-06-08" + +[user_corrections_log] +# Corrections applied to the first draft based on direct user feedback during review +# Format: 2026-06-08_N = "correction" (N is a sequence number that keeps TOML keys unique) +2026-06-08_1 = "Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS). User pointed at HistoryManager, project_manager.branch_discussion, UISnapshot — Manual Slop has editable UI state, not editable raw transcripts." +2026-06-08_2 = "Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION. User pointed at FileItem (path + view_mode + ast_mask + custom_slices), ContextPreset, aggregate.py. Manual Slop's per-file memory is the curation kind, not the conversation-log kind." +2026-06-08_3 = "Sub-conversations: removed 'PARITY stronger' claim. User clarified MMA has it but 1:1 discussions do not. Added 'GAP for 1:1 discussions' + user-flagged 'want' for future sub-conversation track." +2026-06-08_4 = "RAG: clarified as opt-in, not gap. User wants pre-staging via sub-conversation ('Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run')." +2026-06-08_5 = "Personas: reframed as config bundling, not gap. User noted personas can be completely opted out by using AI settings directly. 
They 'just bundle preparatory cruft.'" +2026-06-08_6 = "Tool discovery: downgraded to 'intentional, low priority'. User has 'intent based DSL' idea but 'no where near that ideation yet.'" +2026-06-08_7 = "Editable discussions: REVISED AGAIN. User pointed out the report's §3 verdict (PARITY/DIFFERENT FOCUS) didn't enumerate the per-entry operations. After re-reading gui_2.py:3770-3853 (render_discussion_entry) and gui_2.py:4239-4260 (render_discussion_entry_controls) and history.py (UISnapshot/HistoryManager), the report's §3 now lists the full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operations. The verdict remains PARITY (DIFFERENT FOCUS) but the gap is more precisely scoped: Manual Slop's editing is more granular at the typed-entry layer; nagent's is deeper at the raw-transcript layer. The 'raw transcript is in process globals' framing in the previous draft is still correct as a *layer* description, but the report now correctly characterizes Manual Slop's editing as comprehensive at the user-visible layer." + +[tasks] +# Reference track; no implementation tasks. Future-track candidates live in decisions.md. 
+# Listing for accountability: + +t_reference_01 = { status = "completed", commit_sha = "", description = "Read nagent README + bin/nagent in full" } +t_reference_02 = { status = "completed", commit_sha = "", description = "Read all 6 nagent helper files in full (cli, llm, file_edit, file_split, file_patch, file_summarize)" } +t_reference_03 = { status = "completed", commit_sha = "", description = "Read all 4 nagent executable scripts in full (nagent-file-edit, nagent-file-split, nagent-file-patch, nagent-file-summarize)" } +t_reference_04 = { status = "completed", commit_sha = "", description = "Read Manual Slop docs/ in full (11 guides + Readme)" } +t_reference_05 = { status = "completed", commit_sha = "", description = "Read Manual Slop src/ files selectively for user-corrections (models.py FileItem + ContextPreset, context_presets.py, project_manager.py, aggregate.py, history.py)" } +t_write_01 = { status = "completed", commit_sha = "", description = "Draft spec.md (track wrapper)" } +t_write_02 = { status = "completed", commit_sha = "", description = "Draft report.md (14-section deep-dive analysis; primary deliverable)" } +t_write_03 = { status = "completed", commit_sha = "", description = "Draft comparison_table.md (flat side-by-side reference)" } +t_write_04 = { status = "completed", commit_sha = "", description = "Draft decisions.md (10 future-track candidates)" } +t_write_05 = { status = "completed", commit_sha = "", description = "Create metadata.json + state.toml" } +t_write_06 = { status = "completed", commit_sha = "", description = "Draft nagent_takeaways_20260608.md (10 actionable patterns; companion to report.md)" } +t_write_07 = { status = "pending", commit_sha = "", description = "Add entry to conductor/tracks.md (post-commit)" } +t_write_08 = { status = "pending", commit_sha = "", description = "Human review of report.md + nagent_takeaways_20260608.md (final)" } +t_archive = { status = "pending", commit_sha = "", description = "Move track to 
conductor/tracks/archive/ when follow-up tracks are specced (or sooner if no value remains)" } + +[user_wants_recorded] +# User explicitly wants these in priority order (see decisions.md for full detail) +want_1_sub_conversation_runner = "EXPLICIT: 'I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points'" +want_2_rag_pre_staging = "EXPLICIT: 'Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run'" +deferred_intent_dsl = "EXPLICIT but deferred: 'I want to add an intent based dsl to help with discovery or combinatorics but no where near that ideation yet'" + +[verification] +# Reference/analysis track; verification is artifact presence + user-correction application + +report_md_exists = true +comparison_table_md_exists = true +decisions_md_exists = true +spec_md_exists = true +metadata_json_exists = true +state_toml_exists = true +nagent_takeaways_md_exists = true + +# All 14 nagent principles have a corresponding section in report.md +all_14_principles_covered = true + +# All user-corrections applied to first draft +all_user_corrections_applied = true + +# All pitfalls are domain-tagged (Application / Meta-Tooling / Both) +all_pitfalls_domain_tagged = true + +# Track produces no code (it's a reference/analysis track) +no_code_modified = true + +# No links broken in comparison_table.md, decisions.md, report.md, spec.md, nagent_takeaways_20260608.md +all_internal_links_valid = true # verified by post-edit grep + +# 10 actionable takeaways grounded in actual code (file:line refs) +takeaways_grounded_in_code = true + +[nagent_principles_covered] +# 14 of 14 — full coverage +durable_work = "covered in report §1" +text_in_text_out = "covered in report §2" +editable_state = "covered in report §3" +visible_protocol = "covered in report §4" +the_loop = "covered in report §5" +per_file_memory = "covered in report §6" +repo_history = "covered in report §7" +neighborhoods = "covered in report 
§8" +sub_conversations = "covered in report §9" +controlled_writes = "covered in report §10" +large_files = "covered in report §11" +tool_discovery = "covered in report §12" +differences_from_frameworks = "covered in report §13" +build_your_own = "covered in report §14" + +[future_track_candidates] +# See decisions.md for full detail. 10 candidates. + +candidate_01_sub_conversation_runner = { priority = "HIGH", user_flag = "explicit want", domain = "App + MT", effort = "Medium" } +candidate_02_rag_pre_staging = { priority = "HIGH", user_flag = "explicit want", domain = "App", effort = "Small (depends on #1)" } +candidate_03_stateless_llm_client = { priority = "MEDIUM", user_flag = "none", domain = "App", effort = "Large" } +candidate_04_intent_dsl = { priority = "LOW", user_flag = "explicit but deferred", domain = "MT", effort = "Research" } +candidate_05_self_describing_tools = { priority = "LOW", user_flag = "implicit", domain = "BOTH", effort = "Medium (subsumed by mcp_architecture_refactor)" } +candidate_06_git_history_injection = { priority = "MEDIUM", user_flag = "none", domain = "App", effort = "Medium" } +candidate_07_per_file_conversation_log = { priority = "LOW", user_flag = "none", domain = "App", effort = "Small" } +candidate_08_coedited_files_tools = { priority = "LOW", user_flag = "none", domain = "App", effort = "Small (bundle with #6)" } +candidate_09_split_patch_lib = { priority = "DEFER", user_flag = "none", domain = "App", effort = "Medium (defer until need)" } +candidate_10_raw_transcript_persistence = { priority = "LOW", user_flag = "none", domain = "App", effort = "Small" } + +[status] +# Track is a reference/analysis track; "active" means the artifacts are ready for review +# The track will move to "completed" and be archived when: +# (a) At least one of the follow-up tracks (candidates 1-2) is specced, OR +# (b) The user explicitly says the analysis is no longer needed +status = "active (reference artifacts ready; awaiting human review + 
follow-up track scoping)"