# Mike Acton's nagent: A Deep-Dive Analysis vs Manual Slop

**Track:** `nagent_review_20260608`
**Date:** 2026-06-08 (revised with user corrections same day)
**Author:** Tier 2 Tech Lead (with significant user review on §3 and §6)
**Companion to:** `spec.md` (the track wrapper)

> **Important reading note.** This report applies the **Application vs Meta-Tooling distinction** (per `docs/guide_meta_boundary.md`) as the lens for every comparison. nagent is a Meta-Tooling reference; Manual Slop's Application AI is a *different kind of thing*. Where they share patterns (MMA workers, the tool-call loop, the 3-layer security model), the report says so. Where they don't, the report says so. The report deliberately avoids "nagent is better" / "Manual Slop is better" framings.
>
> **Revision note.** The first draft overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features. The user caught this and pointed at the actual files (`FileItem`, `ContextPreset`, `aggregate.py`, `project_manager.branch_discussion`, `HistoryManager`). The corrections are now folded in. Specific corrections: §3 (verdict changed from PARTIAL to **PARITY (DIFFERENT FOCUS)**); §6 (verdict changed from DOMAIN MISMATCH to **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION**); §9 (verdict now notes the MMA vs 1:1 distinction explicitly per the user).

---

## 0. Reading guide

- **Sections 1-14** map 1:1 to nagent's 14 principles. Each has: nagent's claim, nagent's implementation, Manual Slop's equivalent, a verdict, and a domain tag.
- **Section 15** extracts the 6 actionable pitfalls and maps each to a future-track candidate.
- **Section 16** is the recommended reading path for engineers who haven't read nagent.

If you only have 10 minutes, read §3 (Conversations), §6 (Per-File Memory), §9 (Sub-Conversations), §10 (Controlled Writes), and §15 (the pitfalls list).

---

## 1. Durable work, disposable workers

**nagent's claim.** A Python process is a *worker*; the files are the *system*. Workers come and go; data stays. **"The agent is not the thing; the data is the thing."**

**nagent's implementation.** `bin/nagent` is a 700-line single-file loop. It reads `~/.nagent/conversations/` (a plain text file) for the current conversation, appends to it after every action, and exits. The user types `nagent "investigate this"`. The CLI is a shell. The state is a file.

**Manual Slop's equivalent.** Manual Slop has two parallel systems:

1. **MMA workers are real subprocesses.** `multi_agent_conductor._spawn_worker` runs `mma_exec.py` via `subprocess.Popen` (per `docs/guide_multi_agent_conductor.md` §"Token Firewalling"). Each Tier 3 worker is a fresh Python process with **Context Amnesia**: `ai_client.reset_session()` at the start of `run_worker_lifecycle`. The subprocess is the disposable worker; the artifacts (track state, ticket results) are the system.
2. **The Application AI is *not* a disposable worker.** `gui_2.py:App` is a long-lived Qt/ImGui process. The user types a prompt, hits Enter, gets a response, *keeps the process running for hours*. The `app_state` dataclass is the long-lived worker. This is *intentional* for the Application domain: persona-driven conversations, snapshot-based undo, and cross-discussion state all require a long-running process.

**Verdict.** **PARTIAL.** nagent's pattern lives in the Meta-Tooling + MMA, but the Application deliberately has long-lived workers. The two coexist because they serve different needs: MMA is fire-and-forget per ticket; App is an interactive partner.

**Domain tag:** Both. MMA has it; App doesn't need it. *Future-track candidate: a stateless conversation-file pattern for the App (see §15.4).*

---

## 2. Text in, text out

**nagent's claim.** The smallest useful primitive is: file in, text out. `nagent-llm-text --file question.txt` reads a file, calls the LLM, prints plain text or JSON. Everything else in nagent is orchestration around this.

**nagent's implementation.** `bin/helpers/nagent_llm.py` (300 lines) provides `generate_text(message, provider, model) -> str` for 4 providers (openai, anthropic, google, cursor). Token accounting via provider usage metadata (with character-count fallback at 1 token per 4 chars). Provider churn is isolated in this file.

**Manual Slop's equivalent.** `src/ai_client.py:send(...) -> str` is the parallel. 5 providers (gemini, anthropic, deepseek, minimax, gemini_cli). Same `provider, model, usage` shape. Manual Slop wraps the string in a larger `(md_content, user_message, base_dir, file_items, ..., rag_engine) -> str` because the Application's text-in/text-out also needs tool calls, RAG injection, tier attribution, and patch-mode. But the *primitive* is the same.

**Verdict.** **PARITY.** nagent and Manual Slop both use text-in/text-out at the bottom. The Application's `send()` is a *strict superset* of nagent's `nagent-llm-text`, with provider churn still isolated to a single module.

**Domain tag:** Both. Meta-Tooling uses the same primitive via `mma_exec.py`'s `ai_client.send`.

---

## 3. Conversations are editable state

**nagent's claim.** The conversation file is not chat history. It is working state. Memory goes stale; therefore let people save, load, summarize, edit, branch, trim, copy, diff, version, and rewrite conversations. **"The conversation does not own its memory. The user does."**

**nagent's implementation.**

- `bin/nagent` exposes `--save-conversation `, `--load-conversation `, `--summarize`, `--edit-conversation `. The latter **automates** one path: archive current file, run file-edit on the archive, load the result.
- Conversations are plain text files. The user can `cat`, `vim`, `git diff`, or `cp` them with no special tooling. The `` body and `` body are just text in the file.
- The first draft of this section understated Manual Slop's editing capability. The corrected picture is below.
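
Because a nagent conversation is just a text file, branching, trimming, and diffing need nothing beyond the standard library. The sketch below is illustrative only: the helper names (`branch_conversation`, `trim_conversation`, `diff_conversations`) and the file contents are hypothetical and are not part of nagent.

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def branch_conversation(src: Path, dst: Path) -> Path:
    """Branch a conversation by copying its file (the `cp` workflow)."""
    shutil.copyfile(src, dst)
    return dst


def trim_conversation(path: Path, keep_last_lines: int) -> None:
    """Trim a conversation in place, keeping only the last N lines."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    kept = lines[-keep_last_lines:] if keep_last_lines > 0 else []
    path.write_text("".join(kept), encoding="utf-8")


def diff_conversations(a: Path, b: Path) -> str:
    """Return a unified diff between two conversation files."""
    return "".join(difflib.unified_diff(
        a.read_text(encoding="utf-8").splitlines(keepends=True),
        b.read_text(encoding="utf-8").splitlines(keepends=True),
        fromfile=a.name,
        tofile=b.name,
    ))


def demo() -> str:
    """Branch a toy conversation, trim the branch, and diff the two files."""
    with tempfile.TemporaryDirectory() as tmp:
        main = Path(tmp) / "main.txt"
        main.write_text(
            "user: investigate this\nmodel: looking\nmodel: done\n",
            encoding="utf-8",
        )
        branch = branch_conversation(main, Path(tmp) / "branch.txt")
        trim_conversation(branch, keep_last_lines=1)
        return diff_conversations(main, branch)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the absence of machinery: every operation is a file operation, which is why ordinary tools such as `git diff` work on conversations unchanged.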

**Manual Slop's equivalent (corrected, with the full operation matrix).** Manual Slop's discussion editing lives at **three nested layers**, each with its own operations. The full enumeration:

**Layer A: Per-entry operations on `app.disc_entries: list[dict]`** (the discussion's typed message list). The renderer is `src/gui_2.py:3770 render_discussion_entry(...)`. Per entry, the user can:

| # | Operation | GUI control | Source code | What it does |
|---|---|---|---|---|
| A1 | **Edit content in place** | `imgui.input_text_multiline` on the entry body | `gui_2.py:3841` | The entry's `content` field is a fully editable multi-line text input. The user can rewrite an AI's response, fix a typo in their own prompt, paste in code from another source, etc. |
| A2 | **Toggle read/edit mode** | `[Edit]` / `[Read]` button | `gui_2.py:3799` | When in `[Read]` mode, the content is rendered as Markdown with syntax highlighting (`render_discussion_entry_read_mode` at `gui_2.py:3855`). When in `[Edit]` mode, the multi-line text input is shown. |
| A3 | **Toggle collapsed/expanded** | `+/-` button per entry | `gui_2.py:3789` | Collapsed entries show a 60-char preview (lines 3822-3824). Expanded entries show full content. |
| A4 | **Change role** | Combo box from `app.disc_roles` | `gui_2.py:3793-3796` | The entry's `role` field is editable. The list `app.disc_roles` is itself user-managed (see B5). |
| A5 | **Insert entry before this one** | `Ins` button | `gui_2.py:3813` | `app.disc_entries.insert(index, {"role": "User", "content": "", "collapsed": True, "ts": project_manager.now_ts()})` |
| A6 | **Delete this entry** | `Del` button | `gui_2.py:3815-3816` | `if entry in app.disc_entries: app.disc_entries.remove(entry)`. The membership check matters: ImGui can re-render stale state, so the check guards against double-delete. |
| A7 | **Branch at this entry** | `Branch` button | `gui_2.py:3821` → `app._branch_discussion(index)` → `app_controller._branch_discussion:3503` → `project_manager.branch_discussion:429` | Creates a new Take named `_take_` and copies the history up to and including `index` into the new Take. The user is then switched to the new Take. |

The entry dict shape itself is open: `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` (for AI entries with `` blocks, parsed by `src/thinking_parser.py`) and `usage` (for token accounting: input/output/cache). The user can also set per-entry `read_mode` (a render-time flag, not persisted).

**Layer B: Discussion-level operations** (the Take / discussion set). These are the second-tier controls, rendered at `src/gui_2.py:4239 render_discussion_entry_controls(...)` and the discussion selector at `gui_2.py:4330 render_discussion_selector(...)`:

| # | Operation | GUI control | Source code | What it does |
|---|---|---|---|---|
| B1 | **Append new entry** | `+ Entry` button | `gui_2.py:4240` | `app.disc_entries.append({...})` with the default role from `app.disc_roles[0]`. |
| B2 | **Collapse all / Expand all** | `-All` / `+All` buttons | `gui_2.py:4242-4246` | Bulk-set `collapsed` flag on every entry. |
| B3 | **Clear all** | `Clear All` button | `gui_2.py:4248` | `app.disc_entries.clear()`. |
| B4 | **Save (flush to project TOML)** | `Save` button | `gui_2.py:4250` | `app._flush_to_project(); app._flush_to_config(); app.save_config()`. |
| B5 | **Add/remove roles** | `Add` / `X` buttons under "Roles" | `gui_2.py:4317-4328` | `app.disc_roles.append(r)` / `app.disc_roles.pop(i)`. The role list is **user-managed at runtime**: they can add `"Context"`, `"Tool"`, `"Vendor API"`, or any custom role and assign it to any entry. |
| B6 | **Switch active discussion** | Discussion combo + Take tabs | `gui_2.py:4197, 4344, 4354` | `app._switch_discussion(name)`. The Takes group by base name (`name.split("_take_")[0]`) and render as nested tabs. |
| B7 | **Rename / Delete discussion** | `Rename` / `Delete` buttons | `gui_2.py:4291, 4293` | `app._rename_discussion(...)` / `app._delete_discussion(...)`. Cannot delete the last discussion (guarded at `app_controller.py:3543`). |
| B8 | **Promote Take to top-level** | `Promote` button in takes panel | `gui_2.py:4364` | `project_manager.promote_take(app.project, app.active_discussion, new_name)` renames a Take (e.g. `T0_take_2`) to a fresh top-level discussion name. |
| B9 | **Per-role filter** | `ui_focus_agent` selector (system-wide) | `gui_2.py:4230-4234` | `display_entries = [e for e in app.disc_entries if e.get("role") == persona_name or e.get("role") == "User"]`. The filter follows the MMA persona focus. |
| B10 | **Truncate to N pairs** | `Truncate` button + `drag_int` | `gui_2.py:4254-4260` | `truncate_entries(app.disc_entries, app.ui_disc_truncate_pairs)` keeps the last `N` User/AI pairs (per `gui_2.py:175 truncate_entries(...)`). |
| B11 | **Compress (AI summarization)** | `Compress` button | `gui_2.py:4252` → `app_controller._handle_compress_discussion:3357` | Calls `ai_client.run_discussion_compression(disc_text)` and replaces the discussion with the LLM's compressed version. |

**Layer C: UI snapshot history (undo/redo).** The `HistoryManager` (`src/history.py:71`, `max_capacity=100`) and `UISnapshot` (`history.py:8-63`) provide Ctrl+Z / Ctrl+Y across the entire UI state, including `disc_entries`:

| # | Operation | Source code | What it does |
|---|---|---|---|
| C1 | **Take snapshot** | `gui_2.py:735 _take_snapshot` → `history.UISnapshot(...)` | `copy.deepcopy(self.disc_entries)`: a deep copy of the full entry list is captured. The snapshot also captures `ai_input`, `temperature`, `top_p`, `max_tokens`, `auto_add_history`, `files`, `context_files`, `screenshots`, all system prompts. |
| C2 | **Apply snapshot (undo/redo)** | `gui_2.py:754 _apply_snapshot` | Restores `self.disc_entries = snapshot.disc_entries` (and all the other fields). |
| C3 | **Change detection triggers snapshot** | `gui_2.py:1160, 1166-1167` | `if len(current.disc_entries) != len(self._last_ui_snapshot.disc_entries) or ...`: a `disc_entries` content change pushes a new snapshot. |
| C4 | **Capacity-evict oldest** | `history.py:80-90 push()` | When the undo stack exceeds 100, the oldest is popped from the front. |
| C5 | **Jump to specific state** | `history.py:129 jump_to_undo(index, current_state, ...)` | Allows time-traveling to any past snapshot, not just the most recent. |

**Summary of editability.** Manual Slop provides:

- **Per-entry content edit** (A1, A2): the AI's response text is fully editable in the GUI
- **Per-entry insert at any position** (A5): the user can drop a new entry *between* two existing entries, not just append
- **Per-entry delete at any position** (A6)
- **Per-entry role change** (A4): the user can re-label any entry as User, AI, Tool, Context, or any custom role
- **Per-entry branch** (A7): creates a Take at any entry, not just at the end
- **Per-entry collapse/expand** (A3): visual organization
- **Per-discussion full CRUD** (B1, B6, B7, B8): append, switch, rename, delete, promote
- **Per-role set management** (B5): the role list itself is user-editable
- **Bulk operations** (B2, B3, B10): collapse/expand all, clear, truncate
- **AI-assisted compression** (B11): summarize the whole discussion
- **Undo/redo across all of the above** (C1-C5): Ctrl+Z / Ctrl+Y / jump-to-state

**What Manual Slop does NOT have.** The user cannot edit the **provider-side raw transcript**: the bytes inside the `ai_client._anthropic_history`, `ai_client._gemini_chat._history`, etc. process globals. These are reset on `ai_client.reset_session()`. nagent's "edit the conversation file" pattern operates at *this* layer, not the entry abstraction. The comms log (`comms.log`) is JSON-L and append-only, not user-editable from the GUI (it can be edited on disk in a text editor, but that's a different workflow).

**Verdict.** **PARITY (DIFFERENT FOCUS).** Both systems support comprehensive editing of the conversation-as-data. The difference is *what counts as "the conversation"*:

- nagent's "conversation" = the raw transcript text file (the bytes the LLM produced)
- Manual Slop's "conversation" = a typed entry list with role + content + metadata + optional thinking segments

Manual Slop's editing is **more granular and more pervasive** (per-entry content edit, per-entry insert/delete, per-entry role-change, per-entry branch, with undo/redo). nagent's editing is **deeper at the raw transcript layer** (edit the actual AI response text before it's been abstracted into a typed entry). Both are real; both are deliberate.

**Domain tag:** Application. The Application's typed-entry abstraction is intentional: the user thinks in "discussions" not "transcripts." The user can opt in to the raw-transcript layer by editing `comms.log` on disk or by reading the TOML `discussions//history` field directly. *Future-track candidate: optionally persist the raw transcript as a sibling file under each take (Candidate 10 in `decisions.md`), enabling the nagent-style "edit the actual AI response" workflow for users who want it.*

---

## 4. Visible output protocol

**nagent's claim.** Free-form model output is hard to execute. Use a visible protocol: ``, ``, ``, ``, etc. The startup prompt lists the only tags the model may emit. The parser is strict: recognized tags and whitespace. Nothing else. **"If you cannot read the protocol, you cannot debug the system."**

**nagent's implementation.** `bin/nagent:TAG_PATTERNS` is a list of `(tag_type, compiled_regex)` tuples. `parse_response()` returns `None, error` if any non-whitespace text is found outside a known tag. The error message is appended to the conversation and the model is asked to retry (up to `MAX_FORMAT_RETRIES = 3`).

**Manual Slop's equivalent.** Manual Slop's Application AI uses **provider-native function calling** (Gemini `genai.types.FunctionDeclaration`, Anthropic `tool_use` blocks, etc.). This is *opaque*: the protocol is encoded in JSON the provider parses. The user cannot read a `function_call` from the comms log and reason about it without knowing the provider's schema.

The two approaches are **structurally different**:

| Aspect | nagent regex tags | Manual Slop function calling |
|---|---|---|
| Visibility | Plain text, inspectable in the conversation file | JSON blobs in provider-specific format |
| Per-provider portability | Same tags work across all 4 providers | Each provider has its own schema; mcp_client's 45 tools have 5 different per-provider formats |
| Provider capability ceiling | Whatever the model can emit as text | Native parallel tool calls, structured outputs, JSON-mode constraints |
| Debuggability | "Why didn't the model read the file?" → grep the conversation for the tag | "Why didn't the model call read_file?" → inspect the JSON response |

**Verdict.** **ARCHITECTURAL DIFFERENCE.** Both are correct for their domain. The Application *wants* parallel tool calls, JSON-mode constraints, and provider-side caching. The Meta-Tooling *might want* nagent's regex tags for explicit debuggability.

**Domain tag:** Both. The Application's choice is right (modern providers all support function calling with parallel execution; see `docs/guide_ai_client.md` §"Async Tool Execution"). The Meta-Tooling *could* adopt nagent's regex-tag protocol for its own work, for example by using `` instead of a tool-call JSON. This is explicitly the difference between the "Application's internal AI" and the "Meta-Tooling that builds the Application" in `docs/guide_meta_boundary.md`.
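
A minimal sketch of the strict tag parser described under nagent's implementation in this section may make the "recognized tags and whitespace, nothing else" rule concrete. The tag names `<read>` and `<final>` are placeholders invented for this example (nagent's real tag set is not reproduced here), and the retry helper assumes `MAX_FORMAT_RETRIES` bounds the total number of attempts, which is one possible reading of the constant.

```python
import re

# Hypothetical tag names for illustration; not nagent's actual protocol tags.
TAG_PATTERNS = [
    ("read", re.compile(r"<read>(.*?)</read>", re.DOTALL)),
    ("final", re.compile(r"<final>(.*?)</final>", re.DOTALL)),
]
MAX_FORMAT_RETRIES = 3


def parse_response(text):
    """Return (actions, None) on success or (None, error) on a format violation."""
    matches = []
    for tag_type, pattern in TAG_PATTERNS:
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), tag_type, m.group(1)))
    matches.sort()

    actions, cursor = [], 0
    for start, end, tag_type, body in matches:
        if start < cursor:
            return None, f"overlapping tag at offset {start}"
        stray = text[cursor:start].strip()
        if stray:
            return None, f"unexpected text outside tags: {stray!r}"
        actions.append((tag_type, body))
        cursor = end

    stray = text[cursor:].strip()
    if stray:
        return None, f"unexpected text outside tags: {stray!r}"
    if not actions:
        return None, "no recognized tags in response"
    return actions, None


def call_with_retries(call_llm, conversation):
    """Ask the model again, with the error appended, until the reply parses."""
    for _ in range(MAX_FORMAT_RETRIES):  # assumption: the limit counts attempts
        reply = call_llm(conversation)
        conversation.append(reply)
        actions, error = parse_response(reply)
        if error is None:
            return actions
        conversation.append(f"format error: {error}")
    raise ValueError("model did not produce a valid response")
```

With these placeholder tags, `parse_response("Sure! <read>a.py</read>")` is rejected because `Sure!` sits outside any tag, which is the debuggability property the section describes: a malformed reply and the reason it was rejected are both visible in the conversation.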

*Future-track candidate: a Meta-Tooling-side DSL for compact tool calls (per the existing `docs/reports/PLANNING_DIGEST_20260606.md` reference to "an intent-based DSL" for "discovery" or "combinatorics").*

---

## 5. The loop (append, call, parse, act, append, repeat)

**nagent's claim.** "Agent behavior" is mostly: append, call, parse, act, append, repeat. Heavier systems add infrastructure around the same steps.

**nagent's implementation.** `bin/nagent:run_agent_loop` is a `while True` loop:

1. Append user prompt to conversation file
2. Send conversation file to LLM (via `nagent-llm-text --json`)
3. Append response to conversation file
4. If response contains action tags: run those actions, append results, continue loop
5. If response contains ``: print and stop

**Manual Slop's equivalent.** Manual Slop has *three* parallel "loops":

1. **`src/ai_client.py:_send_`**: the per-provider tool-call loop. Up to `MAX_TOOL_ROUNDS + 2 = 12` iterations. Each round: call provider, parse function calls, dispatch, append tool results. Same shape as nagent.
2. **`src/multi_agent_conductor.py:ConductorEngine.run`**: the MMA loop. Per ticket: `ai_client.reset_session()` (Context Amnesia), build prompt, `loop.run_in_executor(None, run_worker_lifecycle, ...)`. Different scope (per ticket, not per user turn).
3. **`simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`**: the 1:1 chat loop. Per user turn: build markdown, send, wait, append response. Different scope (per user turn, in the App).

All three have the same "append, call, parse, act, repeat" shape. They differ in *what gets appended* (per-provider history vs track state vs `disc_entries`).

**Verdict.** **PARITY.** The loop is the universal pattern. Manual Slop's three loops are at different layers (LLM, MMA, App). The lack of a *single* "the loop" file is a real cost: nagent's `run_agent_loop` is 50 lines, easy to reason about. Manual Slop's loops are 100-300 lines each, scattered.
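
The shared shape of these loops can be sketched in a few lines. This is an illustration under stated assumptions, not nagent's `run_agent_loop` or any Manual Slop function: the conversation is an in-memory list rather than a file, `call_llm`, `parse`, and `act` are injected callables invented for the example, and the round limit of 12 simply echoes the Manual Slop figure quoted above.

```python
def run_agent_loop(prompt, call_llm, parse, act, max_rounds=12):
    """Append, call, parse, act, append, repeat, until a final answer arrives.

    call_llm(conversation) -> str
    parse(reply) -> (actions, final), where final is None unless the model is done
    act(action) -> str result that is appended for the next round
    """
    conversation = [("user", prompt)]                      # append
    for _ in range(max_rounds):
        reply = call_llm(conversation)                     # call
        conversation.append(("model", reply))              # append
        actions, final = parse(reply)                      # parse
        if final is not None:
            return final, conversation
        for action in actions:                             # act
            conversation.append(("result", act(action)))   # append
    raise RuntimeError("round limit reached without a final answer")


def toy_parse(reply):
    """Toy protocol: 'ACT:<command>' requests an action, 'FINAL:<text>' ends the run."""
    kind, _, body = reply.partition(":")
    return ([body], None) if kind == "ACT" else ([], body)


if __name__ == "__main__":
    scripted = iter(["ACT:ls", "FINAL:done"])
    final, log = run_agent_loop(
        "hi", lambda conv: next(scripted), toy_parse, lambda a: f"ran {a}"
    )
    print(final)
    for role, text in log:
        print(role, text)
```

Injecting the parse and dispatch steps is what lets one loop body serve different layers; only what gets appended, and where, differs between callers.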

*Future-track candidate: a single `src/llm_loop.py:run_loop(...)` function that all three callers use, with the dispatch and parse layers injected. (Not a high-priority refactor; the current separation is readable.)*

**Domain tag:** Both.

---

## 6. Per-file memory (curation, not conversation log)

**nagent's claim.** One conversation grows too large. Attach memory to artifacts. Work keeps coming back to the same files; give each file its own persistent local memory. **"When work orbits one artifact, store memory on that identity."**

**nagent's implementation.** `bin/helpers/nagent_file_edit_lib.py` provides:

- `file_id_for_path(path) -> "{st_dev}:{st_ino}"`: a stable file identity across renames (the inode is preserved).
- `file_index_path(root, pid) -> conversations/file-index-{pid}.json`: a JSON registry of `{file_id: {path, conversation}}`.
- `resolve_file_edit_conversation(root, pid, file_path) -> (name, resolved, file_id)`: gets or creates a per-file conversation.
- `nagent-file-edit --file src/foo.py "add validation"`: spawns a new nagent process with `--file_edit src/foo.py`, which loads the file's *previous* conversation as the initial context. After edits, the new file is appended to the same conversation.

The result: a per-file conversation log keyed by inode. A rename that keeps the inode keeps the conversation. A purely path-based key would not do: it would collide across two repos on the same machine.

**Manual Slop's equivalent (corrected per user).** The first draft of this report marked this section as "DOMAIN MISMATCH", claiming Manual Slop has no per-file memory. **This was wrong.** Manual Slop *does* have a per-file memory concept. It's just **a different kind of memory**. Where nagent's per-file memory is a *conversation log* (what the LLM said about this file last time), Manual Slop's is a *curation config* (how to present this file in the AI's context window). The two are complementary, not equivalent.
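
For reference, the inode-keyed identity that nagent's side of this comparison relies on can be sketched as follows. `file_id_for_path` follows the `"{st_dev}:{st_ino}"` shape given in this section; `resolve_conversation`, the index layout, and the conversation naming scheme are simplified assumptions made for the example, not nagent's actual code.

```python
import json
import os
import tempfile
from pathlib import Path


def file_id_for_path(path):
    """Identity built from device and inode numbers, stable across renames."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"


def resolve_conversation(index_path, file_path):
    """Get or create the conversation name registered for this file's identity."""
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    file_id = file_id_for_path(file_path)
    # Hypothetical naming scheme: derive the conversation name from the file id.
    entry = index.setdefault(
        file_id, {"conversation": "file-" + file_id.replace(":", "-")}
    )
    entry["path"] = str(file_path)  # refresh the path in case the file moved
    index_path.write_text(json.dumps(index, indent=2))
    return entry["conversation"], file_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        index = root / "file-index.json"
        original = root / "foo.py"
        original.write_text("x = 1\n")
        before = resolve_conversation(index, original)
        renamed = root / "bar.py"
        original.rename(renamed)
        after = resolve_conversation(index, renamed)
        print(before == after)  # a rename on the same filesystem keeps the identity
```

The index is keyed by identity and only stores the path as a convenience, which is what lets the conversation follow a file through a rename.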

The Manual Slop per-file memory:

```python
# src/models.py:510
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileItem:
    path: str                      # the artifact identity (path-keyed, no inode)
    auto_aggregate: bool = True    # include in auto-aggregation?
    force_full: bool = False       # bypass aggregation with full content?
    view_mode: str = 'full'        # full / skeleton / summary / sig / def / agg
    selected: bool = False         # for batch operations
    ast_signatures: bool = False   # only signatures
    ast_definitions: bool = False  # only definitions
    # The excerpt shows no defaults for the next three fields; the defaults
    # below are assumptions, added so the dataclass definition is valid.
    ast_mask: dict[str, str] = field(default_factory=dict)   # per-symbol mask (from Structural File Editor)
    custom_slices: list[dict] = field(default_factory=list)  # Fuzzy Anchor slices with tag+comment
    injected_at: Optional[float] = None                      # timestamp
```

Plus the **ContextPreset** (`src/models.py:909`): a *named, persisted set* of `FileItem`s, stored in the project's `manual_slop.toml`. Load a preset → restore the same per-file curation state. This is the per-file memory that survives across discussions.

The user pointed at this directly: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That's the right framing. `aggregate.py:run` builds the initial markdown from `self.context_files` (the active preset's FileItems) + `aggregate.run(flat, aggregation_strategy=...)`. The user controls the per-file memory at discussion start.

What's *missing* is nagent's specific pattern: **a per-file conversation log keyed by inode.** Manual Slop does not have a "last investigation of this file" concept stored as a file. The closest analog is *commit history* (the discussion itself is git-linked, per `docs/guide_gui_2.md` §"Discussions Sub-Menu" "Git Commit Tracking"). But that's discussion-scoped, not file-scoped.

**Verdict.** **MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION; nagent IS STRONGER IN THE CONVERSATION-LOG DIMENSION.** Both have a real per-file memory concept. Manual Slop's is "how do I render this file next time the AI sees it" (rich, with nine fields besides `path`, AST-aware); nagent's is "what did the LLM say about this file last time" (plain text, with stable inode identity). The two are not equivalent; they're different optimizations for different needs.

**Domain tag:** Application (for the curation config). The user-correction explicitly said: *"we have the context composition we can directly control what's in memory at the start of a discussion."* That confirms this is a real Application feature, not a gap. *Future-track candidate: extending the per-file memory with a thin "last-investigation" log per file. A `~/.manual_slop/per_file/.md` (file_id by inode, like nagent) that records the last time a discussion referenced this file, the questions asked, and the answers received. This is a Meta-Tooling-friendly addition because it's a plain file.*

---

## 7. Repository history as data

**nagent's claim.** A repo is not only the current tree. History is data too. Transform git history into editing context for a target file. Not vague "retrieval." Explicit transformation of historical artifacts into working input.

**nagent's implementation.** `bin/nagent:file_edit_history_and_summary_block(file_edit_path, ...)`:

- `git_file_history(repo_root, rel_path)`: `git log --follow --max-count=50` per file
- `summarize_new_file_commits(...)`: LLM call to one-line-summarize new commits
- `coedited_file_rows(repo_root, rel_path, commits)`: counts files in the same commits; labels high/medium/low co-edit rate
- `format_file_history(...)`: produces a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits

**Manual Slop's equivalent (partial).** Manual Slop's `_reread_file_items` (in `ai_client.py`) does mtime-based *current* content re-reading with diff injection as `[SYSTEM: FILES UPDATED]`. It does *not* do git history injection. The closest things Manual Slop has:

- **Git commit-linked discussion tracking** in the GUI: each discussion has an "Update Commit" button that stamps `git rev-parse HEAD` (per `docs/guide_gui_2.md` §"Discussions Sub-Menu").
- **`src/dag_engine.py`** tracks ticket-to-git-commit relationships, but for *MMA* workers, not for the AI's context.

**Verdict.** **PARTIAL.** Manual Slop has current-content diff injection (the easy half) but lacks historical-context injection (the harder half). nagent's `summarize_new_file_commits` would be a useful addition to the Manual Slop AI's context, especially for "explain what this file does" questions where the LLM is meeting the file fresh.

**Domain tag:** Application. *Future-track candidate: a `src/git_history.py` module that mirrors nagent's `file_edit_history_and_summary_block` and is invoked at discussion start (after `aggregate.py`).*

---

## 8. Historical coupling & artifact neighborhoods

**nagent's claim.** A file lives in a neighborhood of related artifacts. Files that change together in git history are hints: tests, headers, config, paired implementation. High co-edit rate means "look here maybe." Not "edit everything."

**nagent's implementation.** `coedited_file_rows(repo_root, rel_path, commits)`:

- Counts files in the same commits as the target
- Labels: high (>=50% co-edit), medium (>=20%), low
- Renders a `| file | commits together | P(other file changed | target file changed) |` table
- Guidance text: "Use these files as hints. Before editing, inspect high-likelihood co-edited files when the requested change may affect interfaces, tests, config, or paired code. Do not edit them unless the user request or evidence requires it."

**Manual Slop's equivalent.** None. Manual Slop has `py_get_hierarchy` (subclass scan) and `ts_c_*_get_*` AST tools, but **no tool that returns "files that historically co-edit with this file."** The closest is `derive_code_path` (call-graph trace), which is structural, not historical.
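
The co-edit computation described above is small enough to sketch. The thresholds (high at 50% or more, medium at 20% or more, low otherwise) come from this section; everything else is a simplification: the function below takes commits as already-parsed sets of paths instead of a repo root, so its signature differs from nagent's `coedited_file_rows`, and in real use the input would be built from something like `git log --name-only`.

```python
from collections import Counter


def coedit_label(rate):
    """Map a co-edit rate to the high/medium/low labels."""
    if rate >= 0.5:
        return "high"
    if rate >= 0.2:
        return "medium"
    return "low"


def coedited_file_rows(target, commits):
    """Rank files that changed in the same commits as `target`.

    commits: list of sets, each holding the paths changed by one commit.
    Returns (path, commits_together, rate, label) rows, where rate estimates
    P(other file changed | target file changed).
    """
    touching = [files for files in commits if target in files]
    if not touching:
        return []
    counts = Counter(f for files in touching for f in files if f != target)
    rows = [
        (path, n, n / len(touching), coedit_label(n / len(touching)))
        for path, n in counts.items()
    ]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    history = (
        [{"a.py", "test_a.py"}] * 6
        + [{"a.py", "conf.toml"}] * 2
        + [{"a.py", "README.md"}, {"a.py"}, {"b.py"}]
    )
    for path, together, rate, label in coedited_file_rows("a.py", history):
        print(f"| {path} | {together} | {rate:.0%} | {label} |")
```

The output is a ranking, not an instruction: consistent with the guidance text, a caller would surface the high rows as files worth inspecting rather than files to edit.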

**Verdict.** **GAP.** This is a real missing tool. nagent's framing ("hints, not commands") is exactly the right level for a co-edit suggestion. A 50-line tool (`py_coedited_files(path) -> list[(path, count, likelihood)]`) would fill the gap.

**Domain tag:** Application. *Future-track candidate: a `py_coedited_files` MCP tool + `ts_c_coedited_files` for C/C++.*

---

## 9. Disposable sub-conversations

**nagent's claim.** Exploration creates noise. Spawn disposable workers. Sub-conversations are temporary nagent processes with isolated conversations. Their lifetime does not matter. The artifact they return matters.

**nagent's implementation.** `` tag in the main loop's response:

- Parent appends `` to its conversation
- Parent spawns `nagent --invocation delegated --parent-conversation --json` as a subprocess
- Child's `--json` output is parsed, rolled up into the parent's `recursive_input_tokens` / `recursive_output_tokens`
- Child has its own conversation file; no shared context except the explicit prompt
- Parent gets a concise artifact: the child's `` content, plus token usage

**Manual Slop's equivalent (corrected per user).** The first draft of this report claimed **PARITY (stronger in some ways)**. The user corrected this:

> *"I don't know if I have disposable sub-conversations, I don't really have them for non-mma runs. I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."*

So the actual picture is:

| Layer | Sub-conversation support |
|---|---|
| **MMA Tier 3 / Tier 4** | **Yes.** `mma_exec.py` spawns a real subprocess per ticket with Context Amnesia. `ai_client.reset_session()` at start of `run_worker_lifecycle`. The Ticket output is the "distilled artifact" returned to the parent (`ConductorEngine`). Per the docs: *"Tier 3 worker is a fresh subprocess with a clean context window, receiving only the prompt and the relevant context slice."* |
| **1:1 main discussion** | **No.** The Application's chat loop has no sub-conversation mechanism. The user types a prompt, the AI responds, the loop continues. There's no way to "ask a sub-agent to investigate X and bring back the answer." |

The user is correct: this is a gap. The MMA pattern is the prototype. A future track could extract MMA's `run_worker_lifecycle` into a reusable `app.spawn_sub_conversation(prompt, allowed_tools=...)` method that the App can call from `pre_tool_callback` or from a new "investigate this" command.

**Verdict.** **PARITY for MMA; GAP for 1:1 discussions.** The MMA pattern is strong. The 1:1 chat has no equivalent. The user explicitly flagged this as a want.

**Domain tag:** Application (and possibly Meta-Tooling). *Future-track candidate: a `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Per the user: useful for "specific points" within a longer conversation.*

---

## 10. Controlled writes

**nagent's claim.** A loop that writes files needs explicit boundaries. nagent is a reference implementation with conventions, **not a sandbox**. Shell runs with your permissions. Structured writes are checked. That is not a security boundary. Do not pretend it is.

**nagent's implementation.**

- `validate_write_path(path, file_edit_path, ...)`: in main mode, path must be in `/tmp`, `/var/tmp`, or `$TMPDIR`. In file-edit mode, path must be the target file (or one of its split segments).
- Rejected writes append `` to the conversation.
- `` runs whatever the LLM wrote, with the user's permissions, in the user's working directory. **There is no shell sandbox.** This is explicit.
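
The two write rules above can be sketched as a single check. This is a simplified illustration assuming a POSIX filesystem, not nagent's `validate_write_path`: the return shape `(ok, reason)` and the helper `allowed_tmp_roots` are invented for the example, and the "split segments" allowance in file-edit mode is left out.

```python
import os
from pathlib import Path


def allowed_tmp_roots():
    """Directories that main-mode writes may target: /tmp, /var/tmp, $TMPDIR."""
    roots = [Path("/tmp"), Path("/var/tmp")]
    if os.environ.get("TMPDIR"):
        roots.append(Path(os.environ["TMPDIR"]))
    return [root.resolve() for root in roots]


def validate_write_path(path, file_edit_path=None):
    """Return (ok, reason) for a proposed write.

    Paths are resolved first, so '..' segments and symlinks cannot be used
    to step outside the allowed locations.
    """
    target = Path(path).resolve()
    if file_edit_path is not None:
        # File-edit mode: only the file being edited may be written.
        if target == Path(file_edit_path).resolve():
            return True, "target file"
        return False, f"{target} is not the file being edited"
    # Main mode: the write must land inside a temporary directory.
    for root in allowed_tmp_roots():
        if root in target.parents:
            return True, f"inside {root}"
    return False, f"{target} is outside the temporary directories"


if __name__ == "__main__":
    print(validate_write_path("/tmp/notes.txt"))
    print(validate_write_path("/tmp/../etc/passwd"))
    print(validate_write_path("/tmp/b.py", file_edit_path="/tmp/a.py"))
```

As the section stresses, a check like this is a convention rather than a security boundary: it constrains structured writes only, and a shell command issued by the same loop is not covered by it.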

**Manual Slop's equivalent.** Manual Slop has a *much* stronger security model:

| nagent | Manual Slop |
|---|---|
| `validate_write_path`: in main mode, path must be in `/tmp`, `/var/tmp`, or `$TMPDIR` | `mcp_client._is_allowed`: in main mode, path must be in the allowlist (constructed from `file_items` + `extra_base_dirs`); `history.toml` and `*_history.toml` are *always* blocked |
| `execute_write` writes the file directly | `set_file_slice` / `edit_file` / `py_update_definition` route through AST or string-match for validation |
| `` runs the user's full shell, full permissions, no approval | `run_powershell(script, base_dir, qa_callback=...)` requires GUI modal approval (Execution Clutch), 60s timeout, `taskkill` cleanup, optional Tier 4 QA on failure |
| No per-tool allowlist | 3-layer security: `configure` (allowlist) → `_is_allowed` (path validation) → `_resolve_and_check` (resolution + symlink resolution) |
| No sandbox at all | PowerShell-only (no bash/cmd) by default; can be enabled in `[mcp_env.toml]` |

**Verdict.** **PARITY (STRONGER on Manual Slop's side).** Manual Slop's HITL-required shell execution + 3-layer allowlist is *dramatically* more secure than nagent's tmpdir check. The user explicitly chooses "less safety but more flexibility" with nagent, and "more safety but more friction" with Manual Slop.

**Domain tag:** Both. The Application needs Manual Slop's strict model. The Meta-Tooling could legitimately use nagent's looser model *because the human is in the loop* (the bridge script pops a GUI dialog).

---

## 11. Large files as explicit artifacts (split/patch)

**nagent's claim.** Big files exceed context. Split them. Do not pretend they fit. The split is a *data structure* with `index.json` and segment files; the patch is a unified diff; the source hash validates that nothing changed.

**nagent's implementation.** The 4-file pipeline:

1. **`nagent-file-split --output --split [--summarize] [--refresh INDEX] [--target-bytes 32768] [--natural]`**:
   - `EXTENSION_MAP` covers 11 languages (txt, md, cpp, py, xml, js, ts, json, yaml, go, rs, java)
   - Per-language `SCORE_BY_TYPE` (no tree-sitter; regex + line-counting + brace/JSON/XML depth counters)
   - `py_score` rewards blank lines followed by `def`/`class`/`async def`
   - `cpp_score` uses `brace_depth` to find closing braces at depth 0
   - `json_score` uses `json_depth` to find closing `}`/`]` at depth 0
   - Writes `index.json` with `source_path`, `sourcesha256`, `source_size_bytes`, `source_line_count`, `split_type`, `target_bytes`, `natural`, `created_at`, `segment_count`, `segments[]`
   - Each segment is a separate file with `name-0001.py`, `name-0002.py`, etc.
   - `--summarize` flag spawns `nagent-file-summarize` per-segment subprocess
2. **User edits the segment files** (in place, via vim, etc.)
3. **`nagent-file-patch [--patch PATH] [--dry-run] [--force]`**:
   - `validate_index(index, require_hash_match=not force)`: **strict** hash check; rejects if source changed
   - `merge_segments(segments) -> str`: concatenates segment contents in order
   - `make_unified_patch(source, original, updated)`: uses `difflib.unified_diff`
   - Writes the patch file; if `apply=True` and `changed=True`, writes the source
4. **`nagent-file-summarize [--limit-word-count N] [--output DIR] [--json]`**:
   - Files > 64 KB cascade to `nagent-file-split --summarize` first
   - `summarize_content` retries up to `SUMMARY_MAX_ATTEMPTS = 2` if the LLM overshoots the word limit
   - `combined_summary_from_index` glues per-segment summaries into one

**Manual Slop's equivalent (different mechanism, same insight).** Manual Slop has all the *parts* of nagent's split/patch/summarize, but they live in different files and use different mechanisms:

| nagent | Manual Slop |
|---|---|
| `nagent-file-split` with per-language `SCORE_BY_TYPE` (regex + line counts + brace/JSON/XML depth) | `aggregate.py:build_file_items()` + `py_get_skeleton` (tree-sitter) + `ts_c_*_get_skeleton` (tree-sitter) + `outline_tool.py` |
| `index.json` with `source_path`, `sourcesha256`, `segments[]` | No explicit `index.json`. The "split" is implicit in `_reread_file_items` (mtime-based, not hash-based) and the `py_get_skeleton` tool returns the structural view on demand. |
| `nagent-file-patch` with strict `validate_index` (hash check) | `set_file_slice` / `edit_file` with `result of file.read_text()` pre-write validation. No hash-based pre-validation. |
| `nagent-file-summarize` with per-segment LLM call + retry | `run_subagent_summarization(file_path, content, is_code, outline) -> str` (in-process LLM call) |
| Combined `combined_summary_from_index` | No equivalent; `aggregate.build_markdown_no_history` builds a single markdown per call |
| `nagent-file-summarize` cascades to `nagent-file-split --summarize` for > 64 KB | `RAGEngine._chunk_code` cascades to chunking for Python (mtime-based invalidation, ChromaDB persistence) |

**Crucial difference: Manual Slop uses tree-sitter, nagent does not.** nagent's per-language scoring functions are *all regex-based* (`cpp_score` looks for closing braces at depth 0; `py_score` looks for blank lines followed by `def`/`class` keywords; no AST parsing).
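
To make "structure-aware without an AST" concrete, here is a minimal sketch of regex-based boundary scoring for Python source. It is not nagent's `py_score`: the names `py_boundary_score` and `split_lines`, the score weights (0 to 3), and the greedy cut policy are assumptions chosen for illustration. Only the idea of rewarding a blank line followed by `def`/`class`/`async def` comes from the text.

```python
import re
from typing import List

# Assumed pattern: a top-level definition starting in column 0.
_TOP_LEVEL_DEF = re.compile(r"^(def|class|async def)\s")


def py_boundary_score(lines: List[str], i: int) -> int:
    """Score a cut placed *before* line i; higher means a cleaner boundary."""
    if i <= 0 or i >= len(lines):
        return 0
    starts_def = bool(_TOP_LEVEL_DEF.match(lines[i]))
    after_blank = lines[i - 1].strip() == ""
    if starts_def and after_blank:
        return 3  # blank line followed by def/class: the case the text describes
    if starts_def:
        return 2
    if after_blank:
        return 1
    return 0


def split_lines(lines: List[str], target_bytes: int) -> List[List[str]]:
    """Greedy split: once a segment has reached target_bytes, cut at the
    next top-scoring boundary. No parsing, only line-level regex checks."""
    segments: List[List[str]] = []
    current: List[str] = []
    size = 0
    for i, line in enumerate(lines):
        if current and size >= target_bytes and py_boundary_score(lines, i) == 3:
            segments.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line.encode("utf-8"))
    if current:
        segments.append(current)
    return segments
```

Concatenating the segments in order reproduces the input exactly, which is the property a merge step relies on. A regex scorer like this can be fooled (for example by `def` inside a multi-line string), which is the accuracy cost weighed against tree-sitter.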
Manual Slop's `py_get_skeleton` and `ts_c_*_get_skeleton` use the tree-sitter library for actual AST traversal. This is a trade-off. Tree-sitter is more accurate but requires a native dependency. nagent's approach works on any Python install with no compiled extensions. For the Application domain, tree-sitter is already a dependency (`file_cache.py`); for the Meta-Tooling, nagent's regex approach has appeal.

**Verdict.** **PARITY (DIFFERENT MECHANISM).** Both have the "split / patch / summarize as explicit data artifacts" insight. nagent uses subprocesses + per-language scoring + hash validation. Manual Slop uses tree-sitter + in-process calls + mtime validation. The key safety property, *"the patch operation validates the source hasn't changed"*, is enforced by nagent via SHA-256; Manual Slop does it implicitly by re-reading the file and string-matching. Manual Slop could adopt the explicit hash approach for stronger guarantees.

**Domain tag:** Both.

*Future-track candidate: an explicit `src/split_lib.py` + `src/patch_lib.py` mirroring nagent's design, used by the Application for very-large-file scenarios (e.g., a 200KB legacy C file where skeleton + sig + def aggregation isn't enough).*

---

## 12. Tool discovery (self-describing executables)

**nagent's claim.** Tool capability should be explicit data too. No central registry. Tools describe themselves.

**nagent's implementation.** `bin/helpers/nagent_cli.py:collect_bin_tool_descriptions(bin_dir)`:

- Iterates every executable in `bin/`
- Runs each with `--description` (10s timeout per tool)
- Captures stdout, parses it
- Concatenates into a single "Available tools:\n\n\n\n\n..." block
- Inserts this block into the initial context

Each tool's `__main__` starts with:

```python
import sys

def exit_on_description(description: str) -> None:
    # Print the tool's self-description and exit before doing any real work.
    if "--description" in sys.argv:
        print(description)
        raise SystemExit(0)
```

So `nagent-file-split --description` prints "Split a large file into structure-aware segments..." and exits 0.
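
The discovery steps listed above (iterate executables, run each with `--description`, apply a 10-second timeout, concatenate) can be sketched roughly as follows. This is not nagent's actual `collect_bin_tool_descriptions`: the function name `collect_tool_descriptions`, the `name: description` line format, and the choice to silently skip tools that fail or time out are assumptions made for illustration.

```python
import os
import subprocess
from pathlib import Path


def collect_tool_descriptions(bin_dir: str, timeout_s: float = 10.0) -> str:
    """Run every executable in bin_dir with --description and join the results."""
    entries = []
    for path in sorted(Path(bin_dir).iterdir()):
        # Only regular files with an execute bit count as tools.
        if not (path.is_file() and os.access(path, os.X_OK)):
            continue
        try:
            result = subprocess.run(
                [str(path), "--description"],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except (subprocess.TimeoutExpired, OSError):
            continue  # assumption: an unresponsive or unrunnable tool is omitted
        description = result.stdout.strip()
        if result.returncode == 0 and description:
            entries.append(f"{path.name}: {description}")
    # Assumed output format; the exact block layout is not shown in this report.
    return "Available tools:\n\n" + "\n".join(entries)
```

The design point is that the registry is derived, not maintained: the only source of truth for a tool's capability is the tool itself, so adding a tool is a filesystem operation rather than a code change.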
The main `nagent` loop calls `collect_bin_tool_descriptions` once at startup.

**Manual Slop's equivalent.** None. The 45 MCP tools in `src/mcp_client.py` are dispatched by a flat if/elif chain in `dispatch()`:

```python
def dispatch(tool_name, tool_input):
    if tool_name.startswith("bd_"):
        return _dispatch_beads(tool_name, tool_input)
    if tool_name == "read_file":
        return _read_file(tool_input["path"])
    if tool_name == "py_get_skeleton":
        return _py_get_skeleton(tool_input["path"])
    # ... 45+ branches ...
    return f"ERROR: unknown tool: {tool_name}"
```

Adding a new tool requires:

1. Edit `dispatch()` to add the branch
2. Update the security allowlist in `_resolve_and_check` (if filesystem access)
3. Update the AI capability declaration in `get_tool_schemas()`
4. Add tests

nagent's approach: drop an executable in `bin/`, implement `exit_on_description`, done. The tool is auto-discovered.

The user (per the pushback): *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."* So this is a known want, but low priority.

**Verdict.** **GAP (Application).** nagent's pattern is genuinely better here, but Manual Slop has 45 tools in production and a migration would be a big refactor. The win is real (extensibility) but the cost is also real (rewrite the dispatch layer).

**Domain tag:** Both. For the Meta-Tooling (the `scripts/` directory), nagent's pattern is more aligned with the external-agent usage model. For the Application, the existing `dispatch` if/elif is fine.

*Future-track candidate: a `mcp_architecture_refactor_20260606` (already on the board) would benefit from nagent's pattern. The "sub-MCP" extraction the planned refactor proposes is exactly the right scope for this: each sub-MCP could be its own self-describing module.*

---

## 13. Differences from frameworks

nagent's philosophical frame: framework-style systems hide state in object graphs and long-lived agent abstractions; nagent keeps everything as explicit files. The reframing table at the end of the nagent README is excellent:

| Common term | nagent framing |
|---|---|
| memory | editable artifact |
| retrieval | preserved work / historical context |
| agent | temporary transformation function |
| context | explicit input data |

This report's §2-§12 have been showing where Manual Slop *agrees* with nagent's reframings and where it *deliberately diverges*.

**Verdict.** The reframing is useful. The Application can pick and choose which reframings to adopt per layer.

**Domain tag:** Both. This is the philosophical lens for the whole report.

---

## 14. Build your own

nagent's last section: *"The minimal system is not mystical. Small loop over explicit state."* The list of 12 buildable steps: `generate_text(file) -> str`, growing conversation document, initial context with the contract, output format + parser, handlers that append results to state, loop after actions, visible retry on malformed output, child loops for delegation, per-artifact memory, repository history → context blocks, split/index/patch for large files, save/load/edit/summarize for memory maintenance.

**Verdict.** Manual Slop *has* all 12 of these. Just in different files, with different names, and at a different scale.

**Domain tag:** Both. The 12-step list is a useful checklist for any future LLM-application track.

---

## 15. The 6 Pitfalls (Revised from 8, after User Corrections)

The first draft of this report had 8 pitfalls. The user-corrections on §3 and §6 collapsed 2 of them. The remaining 6:

### Pitfall 1: No structured output protocol in the Application AI

The Application uses opaque provider-native function calling. The user can read the conversation, but cannot read a `tool_call` from the comms log without knowing the provider's schema.
nagent's regex-tag protocol is more debuggable for the Meta-Tooling.

**Decision: not a problem for the Application (provider-native is the right choice). Worth borrowing for the Meta-Tooling.**

**Domain tag:** Both.

*Future-track candidate: an intent-based DSL for Meta-Tooling agent calls.*

### Pitfall 2: Provider-specific history is in process globals

`src/ai_client.py` has `_anthropic_history`, `_deepseek_history`, `_minimax_history`: 3 separate per-provider history lists, each with its own lock. Switching providers mid-session loses history. nagent's "single conversation file" model is provider-agnostic.

**Concrete change:** A future refactor toward a stateless `LLMClient` class with an explicit `Conversation` object (the transcript as a `list[Message]`) would mean:

- Users can save, load, and replay conversations
- Switching providers no longer loses history
- Tier 4 QA and Tier 3 workers share a common conversation format

**Domain tag:** Application.

*Future-track candidate: a `src/conversation.py:Conversation` dataclass + `src/llm_client.py:LLMClient` stateless wrapper around the 5 providers.*

### Pitfall 3: RAG is not "history as data"

Manual Slop's RAG (`src/rag_engine.py`) is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be **additive**, not a replacement. The Application's `_reread_file_items` mtime-based diff injection is the "history as data" mechanism Manual Slop already has.

**The user's clarification:** *"RAG is an optional thing, doesn't have to be used. Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run."*

**Decision:** RAG stays. The user wants a *staging* workflow: a sub-agent prepares RAG chunks before a run, and the chunks become the discussion's starting memory. This is consistent with the nagent-inspired sub-conversation pattern (§9).

**Domain tag:** Application.

*Future-track candidate: a "RAG pre-staging" sub-conversation runner that pre-builds the index for a planned run.*

### Pitfall 4: The AI client is a stateful singleton with module-level globals

`src/ai_client.py` is 2,685 lines long, and the module itself is the abstraction layer. Importing it for testing triggers the lazy imports of 5 provider SDKs. The unit tests are the only way to know what state is in flight.

This is the *opposite* of nagent's "files are the system; the process is a worker." nagent's `run_agent_loop` is 50 lines, stateless, testable. A future refactor toward a stateless `LLMClient` class would make `ai_client` parseable, testable, and saveable.

**Domain tag:** Application.

*Future-track candidate: a `src/llm_client.py:LLMClient` class with explicit `Conversation`, `Provider`, `History` objects. Backwards-compatible with the current `ai_client.send()` API.*

### Pitfall 5: No non-MMA disposable sub-conversations

The MMA pattern is strong. The 1:1 chat has no equivalent. The user *explicitly* flagged this as a want: *"I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."*

**Decision:** Design `src/sub_conversation.py:SubConversationRunner` that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Reuse MMA's subprocess pattern (`mma_exec.py` as the template). The sub-agent returns a concise artifact to the parent (nagent's pattern). Useful for "investigate this file" / "summarize this concept" / "look up this API" commands.

**Domain tag:** Application.

*Future-track candidate: a `src/sub_conversation.py` + a GUI "Investigate…" button on the message panel.*

### Pitfall 6: Hard-coded tool discovery

The 45 MCP tools in `mcp_client.py:dispatch` are in a flat if/elif chain. nagent's `--description` self-describing executable pattern is more extensible.
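
For comparison, the same self-describing idea can be applied in-process, without subprocesses. The sketch below is purely hypothetical and is not Manual Slop code: `ToolSpec`, the `tool` decorator, `describe_tools`, and the demo `echo` tool are invented names, and none of the existing security layers (allowlist, path validation) are modelled. It only shows how a registry populated by the tools themselves could replace a hand-maintained if/elif chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    """A tool that carries its own name and description (hypothetical type)."""
    name: str
    description: str
    handler: Callable[[dict], str]


_REGISTRY: Dict[str, ToolSpec] = {}


def tool(name: str, description: str):
    """Decorator: registering a tool is the only step needed to expose it."""
    def decorator(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        if name in _REGISTRY:
            raise ValueError(f"duplicate tool name: {name}")
        _REGISTRY[name] = ToolSpec(name, description, fn)
        return fn
    return decorator


@tool("echo", "Return the given text unchanged (demo tool).")
def _echo(tool_input: dict) -> str:
    return str(tool_input["text"])


def dispatch(tool_name: str, tool_input: dict) -> str:
    """Table lookup instead of one branch per tool."""
    spec = _REGISTRY.get(tool_name)
    if spec is None:
        return f"ERROR: unknown tool: {tool_name}"
    return spec.handler(tool_input)


def describe_tools() -> str:
    """Build the capability listing from the registry, sorted by name."""
    specs = sorted(_REGISTRY.values(), key=lambda s: s.name)
    return "\n".join(f"{s.name}: {s.description}" for s in specs)
```

Under this arrangement the description and the handler cannot drift apart, because both are declared in one place. Whether that is worth a migration of 45 production tools is a separate question.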

**The user's position:** *"The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."*

**Decision:** Low priority. The `mcp_architecture_refactor_20260606` (already on the board) is the natural place to address this: sub-MCPs as self-describing modules.

**Domain tag:** Both.

*Future-track candidate: subsumed by mcp_architecture_refactor_20260606.*

### Pitfalls removed by user-corrections

- **(removed)** Pitfall about "Conversation state is buried in module-level globals": overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); it lacks editable *raw transcripts*, but that's a *different* design choice, not a gap. (See §3.)
- **(removed)** Pitfall about "per-file memory": overstated. Manual Slop *does* have per-file memory in the curation dimension; what's missing is nagent's conversation-log dimension, which is a different optimization. (See §6.)

---

## 16. Recommended reading path for engineers

If you haven't read nagent, here's the priority:

1. **The README's first 3 sections** ("What It Looks Like", "Durable Work", "Text In Text Out"): the philosophy in 5 minutes.
2. **`bin/nagent:run_agent_loop()`**: the actual loop, 50 lines.
3. **`bin/helpers/nagent_file_split_lib.py:SCORE_BY_TYPE`**: the per-language scoring; shows what "structure-aware" can mean without tree-sitter.
4. **`bin/helpers/nagent_file_patch_lib.py:validate_index`**: the strict hash check; the safety property of nagent's split/patch workflow.
5. **`bin/helpers/nagent_file_summarize_lib.py:summarize_content`**: the retry-with-smaller-prompt pattern.
6. **`bin/helpers/nagent_cli.py:collect_bin_tool_descriptions`**: the tool-discovery pattern; 30 lines.

The README's 14 sections can be skimmed in 15 minutes if you have the context this report provides. Read in order 1-5 above for the implementation depth.

---

## Appendix A. Cross-reference table

| nagent file | Lines | Purpose | Manual Slop equivalent |
|---|---|---|---|
| `README.md` | ~1500 | 14-section teaching document | This report + `docs/guide_*.md` |
| `bin/nagent` | ~700 | Main loop, tag parser, sub-conversation runner | `src/ai_client.py:send` + `src/multi_agent_conductor.py:ConductorEngine.run` + `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async` (3 separate loops) |
| `bin/nagent-llm-text` | ~50 | CLI wrapper for `nagent-llm.py` | (implicit; the Application calls `ai_client.send` directly) |
| `bin/nagent-llm-upload` | ~30 | File upload + LLM call | (not present; the Application's read tools handle files inline) |
| `bin/nagent-file-edit` | ~120 | Per-file subprocess wrapper | (not present; this is the gap that the user wants for 1:1 discussions) |
| `bin/nagent-file-split` | ~170 | Main split executable | (not present in this form; Manual Slop uses `aggregate.py` + tree-sitter) |
| `bin/nagent-file-patch` | ~80 | Main patch executable | (not present; Manual Slop uses `set_file_slice` / `edit_file` directly) |
| `bin/nagent-file-summarize` | ~100 | Main summarize executable | `src/ai_client.py:run_subagent_summarization` (in-process) |
| `bin/helpers/nagent_cli.py` | ~80 | `--description` pattern, `WaitSpinner` | (not present) |
| `bin/helpers/nagent_llm.py` | ~300 | 4 providers, token accounting | `src/ai_client.py:_send_` × 5 (in-process, with cross-provider state) |
| `bin/helpers/nagent_file_edit_lib.py` | ~170 | file-index by inode, `resolve_file_edit_conversation` | (not present) |
| `bin/helpers/nagent_file_split_lib.py` | ~400 | `SPLIT_TYPES` (11 langs), per-language scoring | `src/file_cache.py:ASTParser` (tree-sitter) + `src/aggregate.py:build_file_items` |
| `bin/helpers/nagent_file_patch_lib.py` | ~130 | strict hash validation, `make_unified_patch` | (not present; implicit mtime check) |
| `bin/helpers/nagent_file_summarize_lib.py` | ~110 | per-segment LLM call, retry-with-smaller-prompt | `src/ai_client.py:run_subagent_summarization` (in-process, no retry) |
| **Total nagent** | **~4000** | | **Manual Slop's analogous parts: ~5000+** (ai_client + multi_agent_conductor + mcp_client + aggregate + rag_engine + history + project_manager + tree-sitter-based tools) |

Manual Slop is *not* smaller than nagent; it's *larger* because it has a GUI, persistence, HITL dialogs, Hook API, and a real test harness. The architectures serve different scales.

---

## Appendix B. Citations

- nagent source: https://github.com/macton/nagent (all 11 source files read in full)
- Internal: `docs/Readme.md`, `docs/guide_architecture.md`, `docs/guide_ai_client.md`, `docs/guide_mma.md`, `docs/guide_tools.md`, `docs/guide_mcp_client.md`, `docs/guide_app_controller.md`, `docs/guide_meta_boundary.md`, `docs/guide_context_curation.md`, `docs/guide_personas.md`, `docs/guide_rag.md`, `docs/guide_gui_2.md`
- Internal source (selectively read for user-corrections): `src/models.py` (FileItem, ContextPreset), `src/context_presets.py`, `src/project_manager.py` (branch_discussion, promote_take), `src/aggregate.py`, `src/history.py`
- Mike Acton, "Data-Oriented Design and C++" (CppCon 2014): referenced but not directly cited
- Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them": cited via the `data_oriented_error_handling_20260606` track

---

*End of report. See `comparison_table.md` for the flat reference, `decisions.md` for the future-track candidates, and `spec.md` for the track wrapper.*