Reference/analysis track. Produces 0 code changes.

Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)

14 nagent principles covered in report.md (durable work, text-in/text-out, editable state, visible protocol, the loop, per-file memory, repo history, neighborhoods, sub-conversations, controlled writes, large files, tool discovery, framework differences, build your own).

6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history + _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)

User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for 1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where near that ideation yet')

10 actionable takeaways (separate from the 6 pitfalls - those are diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)

Domain classification: every recommendation tagged Application / Meta-Tooling / Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling domain; Manual Slop's Application AI is a different kind of thing.

No code modified by this track (reference/analysis only). All 7 files parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve. Track is 'active' awaiting human review; future-track candidates live in decisions.md and nagent_takeaways_20260608.md.
Mike Acton's nagent: A Deep-Dive Analysis vs Manual Slop
Track: nagent_review_20260608
Date: 2026-06-08 (revised with user corrections same day)
Author: Tier 2 Tech Lead (with significant user review on §3 and §6)
Companion to: spec.md (the track wrapper)
Important reading note. This report applies the Application vs Meta-Tooling distinction (per `docs/guide_meta_boundary.md`) as the lens for every comparison. nagent is a Meta-Tooling reference; Manual Slop's Application AI is a different kind of thing. Where they share patterns (MMA workers, the tool-call loop, the 3-layer security model), the report says so. Where they don't, the report says so. The report deliberately avoids "nagent is better" / "Manual Slop is better" framings.
Revision note. The first draft overstated gaps in Manual Slop's "editable discussion" and "per-file memory" features. The user caught this and pointed at the actual files (`FileItem`, `ContextPreset`, `aggregate.py`, `project_manager.branch_discussion`, `HistoryManager`). The corrections are now folded in. Specific corrections: §3 (verdict changed from PARTIAL to PARITY (DIFFERENT FOCUS)); §6 (verdict changed from DOMAIN MISMATCH to MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION); §9 (verdict now notes the MMA vs 1:1 distinction explicitly per the user).
0. Reading guide
- Sections 1-14 map 1:1 to nagent's 14 principles. Each has: nagent's claim, nagent's implementation, Manual Slop's equivalent, a verdict, and a domain tag.
- Section 15 extracts the 6 actionable pitfalls and maps each to a future-track candidate.
- Section 16 is the recommended reading path for engineers who haven't read nagent.
If you only have 10 minutes, read §3 (Conversations), §6 (Per-File Memory), §9 (Sub-Conversations), §10 (Controlled Writes), and §15 (the pitfalls list).
1. Durable work, disposable workers
nagent's claim. A Python process is a worker; the files are the system. Workers come and go; data stays. "The agent is not the thing; the data is the thing."
nagent's implementation. bin/nagent is a 700-line single-file loop. It reads ~/.nagent/conversations/<conversation_name> (a plain text file) for the current conversation, appends to it after every action, and exits. The user types nagent "investigate this". The CLI is a shell. The state is a file.
Manual Slop's equivalent. Manual Slop has two parallel systems:
- MMA workers are real subprocesses. `multi_agent_conductor._spawn_worker` runs `mma_exec.py` via `subprocess.Popen` (per `docs/guide_multi_agent_conductor.md` §"Token Firewalling"). Each Tier 3 worker is a fresh Python process with Context Amnesia: `ai_client.reset_session()` runs at the start of `run_worker_lifecycle`. The subprocess is the disposable worker; the artifacts (track state, ticket results) are the system.
- The Application AI is not a disposable worker. `gui_2.py:App` is a long-lived Qt/ImGui process. The user types a prompt, hits Enter, gets a response, keeps the process running for hours. The `app_state` dataclass is the long-lived worker. This is intentional for the Application domain: persona-driven conversations, snapshot-based undo, and cross-discussion state all require a long-running process.
Verdict. PARTIAL — nagent's pattern lives in the Meta-Tooling + MMA, but the Application deliberately has long-lived workers. The two coexist because they serve different needs: MMA is fire-and-forget per ticket; App is an interactive partner.
Domain tag: Both. MMA has it; App doesn't need it. Future-track candidate: a stateless conversation-file pattern for the App (see §15.4).
2. Text in, text out
nagent's claim. The smallest useful primitive is: file in, text out. nagent-llm-text --file question.txt reads a file, calls the LLM, prints plain text or JSON. Everything else in nagent is orchestration around this.
nagent's implementation. bin/helpers/nagent_llm.py (300 lines) provides generate_text(message, provider, model) -> str for 4 providers (openai, anthropic, google, cursor). Token accounting via provider usage metadata (with character-count fallback at 1 token per 4 chars). Provider churn is isolated in this file.
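The token-accounting fallback mentioned above can be illustrated with a minimal sketch. Only the "1 token per 4 characters" ratio comes from the text; the function name `estimate_tokens`, the `usage` dict shape, and the round-up behaviour are assumptions made for illustration, not nagent's actual code.

```python
from typing import Optional


def estimate_tokens(text: str, usage: Optional[dict] = None, key: str = "input_tokens") -> int:
    """Prefer provider-reported usage; otherwise estimate ~1 token per 4 characters.

    `usage` and `key` are hypothetical stand-ins for provider usage metadata.
    """
    if usage is not None and isinstance(usage.get(key), int):
        return usage[key]
    # Fallback: ceiling of len/4 (rounding up is an assumption of this sketch).
    return (len(text) + 3) // 4
```

Keeping the fallback next to the provider call means callers see a single integer whether or not the provider reported usage.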
Manual Slop's equivalent. src/ai_client.py:send(...) -> str is the parallel. 5 providers (gemini, anthropic, deepseek, minimax, gemini_cli). Same provider, model, usage shape. Manual Slop wraps the string in a larger (md_content, user_message, base_dir, file_items, ..., rag_engine) -> str because the Application's text-in/text-out also needs tool calls, RAG injection, tier attribution, and patch-mode. But the primitive is the same.
Verdict. PARITY. nagent and Manual Slop both use text-in/text-out at the bottom. The Application's send() is a strict superset of nagent's nagent-llm-text, with provider churn still isolated to a single module.
Domain tag: Both. Meta-Tooling uses the same primitive via mma_exec.py's ai_client.send.
3. Conversations are editable state
nagent's claim. The conversation file is not chat history. It is working state. Memory goes stale; therefore let people save, load, summarize, edit, branch, trim, copy, diff, version, and rewrite conversations. "The conversation does not own its memory. The user does."
nagent's implementation.
- `bin/nagent` exposes `--save-conversation <name>`, `--load-conversation <name>`, `--summarize`, `--edit-conversation <prompt>`. The last of these automates one path: archive current file, run file-edit on the archive, load the result.
- Conversations are plain text files. The user can `cat`, `vim`, `git diff`, or `cp` them with no special tooling. The `<nagent-response>` body and `<nagent-shell-result>` body are just text in the file.
- The first draft of this section understated Manual Slop's editing capability. The corrected picture is below.
Manual Slop's equivalent (corrected, with the full operation matrix). Manual Slop's discussion editing lives at three nested layers, each with its own operations. The full enumeration:
Layer A — Per-entry operations on app.disc_entries: list[dict] (the discussion's typed message list). The renderer is src/gui_2.py:3770 render_discussion_entry(...). Per entry, the user can:
| # | Operation | GUI control | Source code | What it does |
|---|---|---|---|---|
| A1 | Edit content in place | `imgui.input_text_multiline` on the entry body | `gui_2.py:3841` | The entry's `content` field is a fully editable multi-line text input. The user can rewrite an AI's response, fix a typo in their own prompt, paste in code from another source, etc. |
| A2 | Toggle read/edit mode | [Edit] / [Read] button | `gui_2.py:3799` | When in [Read] mode, the content is rendered as Markdown with syntax highlighting (`render_discussion_entry_read_mode` at `gui_2.py:3855`). When in [Edit] mode, the multi-line text input is shown. |
| A3 | Toggle collapsed/expanded | +/- button per entry | `gui_2.py:3789` | Collapsed entries show a 60-char preview (lines 3822-3824). Expanded entries show full content. |
| A4 | Change role | Combo box from `app.disc_roles` | `gui_2.py:3793-3796` | The entry's `role` field is editable. The list `app.disc_roles` is itself user-managed (see B5). |
| A5 | Insert entry before this one | Ins button | `gui_2.py:3813` | `app.disc_entries.insert(index, {"role": "User", "content": "", "collapsed": True, "ts": project_manager.now_ts()})` |
| A6 | Delete this entry | Del button | `gui_2.py:3815-3816` | `if entry in app.disc_entries: app.disc_entries.remove(entry)`. The membership check matters: ImGui can re-render stale state, so the check guards against double-delete. |
| A7 | Branch at this entry | Branch button | `gui_2.py:3821` → `app._branch_discussion(index)` → `app_controller._branch_discussion:3503` → `project_manager.branch_discussion:429` | Creates a new Take named `<base>_take_<n>` and copies the history up to and including `index` into the new Take. The user is then switched to the new Take. |
The entry dict shape itself is open: {"role": str, "content": str, "collapsed": bool, "ts": str, ...} plus optional thinking_segments (for AI entries with <thinking> blocks, parsed by src/thinking_parser.py) and usage (for token accounting: input/output/cache). The user can also set per-entry read_mode (a render-time flag, not persisted).
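The branch operation (A7) can be sketched in a few lines. This is an illustrative stand-in, not the code at `project_manager.branch_discussion`: the function name `branch_at_entry`, the flat `discussions` dict, and the "first unused n" numbering are assumptions. Only the `<base>_take_<n>` naming, the base-name split on `_take_`, and the "copy up to and including index" behaviour come from the report.

```python
import copy


def branch_at_entry(discussions: dict[str, list[dict]], active: str, index: int) -> str:
    """Copy entries[0..index] of `active` into a new '<base>_take_<n>' discussion.

    `discussions` maps discussion name -> entry list (hypothetical storage shape).
    Returns the name of the new Take.
    """
    base = active.split("_take_")[0]
    n = 1
    # Pick the first unused take number (numbering scheme assumed for this sketch).
    while f"{base}_take_{n}" in discussions:
        n += 1
    name = f"{base}_take_{n}"
    # Deep copy so edits in the new Take never touch the source discussion.
    discussions[name] = copy.deepcopy(discussions[active][: index + 1])
    return name
```

The deep copy is the important design choice: a Take that shared entry dicts with its parent would turn an in-place edit (A1) into an edit of both discussions.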
Layer B — Discussion-level operations (the Take / discussion set). These are the second-tier controls, rendered at src/gui_2.py:4239 render_discussion_entry_controls(...) and the discussion selector at gui_2.py:4330 render_discussion_selector(...):
| # | Operation | GUI control | Source code | What it does |
|---|---|---|---|---|
| B1 | Append new entry | + Entry button | `gui_2.py:4240` | `app.disc_entries.append({...})` with the default role from `app.disc_roles[0]`. |
| B2 | Collapse all / Expand all | -All / +All buttons | `gui_2.py:4242-4246` | Bulk-set `collapsed` flag on every entry. |
| B3 | Clear all | Clear All button | `gui_2.py:4248` | `app.disc_entries.clear()`. |
| B4 | Save (flush to project TOML) | Save button | `gui_2.py:4250` | `app._flush_to_project(); app._flush_to_config(); app.save_config()`. |
| B5 | Add/remove roles | Add / X buttons under "Roles" | `gui_2.py:4317-4328` | `app.disc_roles.append(r)` / `app.disc_roles.pop(i)`. The role list is user-managed at runtime: users can add "Context", "Tool", "Vendor API", or any custom role and assign it to any entry. |
| B6 | Switch active discussion | Discussion combo + Take tabs | `gui_2.py:4197, 4344, 4354` | `app._switch_discussion(name)`. The Takes group by base name (`name.split("_take_")[0]`) and render as nested tabs. |
| B7 | Rename / Delete discussion | Rename / Delete buttons | `gui_2.py:4291, 4293` | `app._rename_discussion(...)` / `app._delete_discussion(...)`. Cannot delete the last discussion (guarded at `app_controller.py:3543`). |
| B8 | Promote Take to top-level | Promote button in takes panel | `gui_2.py:4364` | `project_manager.promote_take(app.project, app.active_discussion, new_name)` renames a Take (e.g. `T0_take_2`) to a fresh top-level discussion name. |
| B9 | Per-role filter | `ui_focus_agent` selector (system-wide) | `gui_2.py:4230-4234` | `display_entries = [e for e in app.disc_entries if e.get("role") == persona_name or e.get("role") == "User"]`. The filter follows the MMA persona focus. |
| B10 | Truncate to N pairs | Truncate button + `drag_int` | `gui_2.py:4254-4260` | `truncate_entries(app.disc_entries, app.ui_disc_truncate_pairs)` keeps the last N User/AI pairs (per `gui_2.py:175 truncate_entries(...)`). |
| B11 | Compress (AI summarization) | Compress button | `gui_2.py:4252` → `app_controller._handle_compress_discussion:3357` | Calls `ai_client.run_discussion_compression(disc_text)` and replaces the discussion with the LLM's compressed version. |
Layer C — UI snapshot history (undo/redo). The HistoryManager (src/history.py:71, max_capacity=100) and UISnapshot (history.py:8-63) provide Ctrl+Z / Ctrl+Y across the entire UI state — including disc_entries:
| # | Operation | Source code | What it does |
|---|---|---|---|
| C1 | Take snapshot | `gui_2.py:735 _take_snapshot` → `history.UISnapshot(...)` | `copy.deepcopy(self.disc_entries)`: a deep copy of the full entry list is captured. The snapshot also captures `ai_input`, `temperature`, `top_p`, `max_tokens`, `auto_add_history`, `files`, `context_files`, `screenshots`, and all system prompts. |
| C2 | Apply snapshot (undo/redo) | `gui_2.py:754 _apply_snapshot` | Restores `self.disc_entries = snapshot.disc_entries` (and all the other fields). |
| C3 | Change detection triggers snapshot | `gui_2.py:1160, 1166-1167` | `if len(current.disc_entries) != len(self._last_ui_snapshot.disc_entries) or ...`: a `disc_entries` content change pushes a new snapshot. |
| C4 | Capacity-evict oldest | `history.py:80-90 push()` | When the undo stack exceeds 100, the oldest is popped from the front. |
| C5 | Jump to specific state | `history.py:129 jump_to_undo(index, current_state, ...)` | Allows time-traveling to any past snapshot, not just the most recent. |
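The snapshot behaviour in C1, C4, and C5 (deep-copied state, a capacity-bounded stack, and jumping to an arbitrary index) can be sketched as below. This is a simplified illustration under stated assumptions, not `HistoryManager` itself: the class name `SnapshotStack` is hypothetical, there is no redo stack, and a jump simply discards the target snapshot and everything newer.

```python
import copy
from collections import deque


class SnapshotStack:
    """Bounded undo stack; the oldest snapshot is evicted once capacity is exceeded."""

    def __init__(self, max_capacity: int = 100) -> None:
        # deque(maxlen=...) silently drops the leftmost (oldest) item when full.
        self._undo: deque = deque(maxlen=max_capacity)

    def push(self, state: dict) -> None:
        # Deep copy so later in-place edits to the live state cannot alter history.
        self._undo.append(copy.deepcopy(state))

    def jump_to(self, index: int) -> dict:
        """Return a copy of the snapshot at `index`; drop it and everything newer."""
        snapshot = copy.deepcopy(self._undo[index])
        while len(self._undo) > index:
            self._undo.pop()
        return snapshot

    def __len__(self) -> int:
        return len(self._undo)
```

The deep copy on `push` is what makes per-entry edits undoable: without it, every stored snapshot would alias the same live entry dicts.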
Summary of editability. Manual Slop provides:
- Per-entry content edit (A1, A2) — the AI's response text is fully editable in the GUI
- Per-entry insert at any position (A5) — the user can drop a new entry between two existing entries, not just append
- Per-entry delete at any position (A6)
- Per-entry role change (A4) — the user can re-label any entry as User, AI, Tool, Context, or any custom role
- Per-entry branch (A7) — creates a Take at any entry, not just at the end
- Per-entry collapse/expand (A3) — visual organization
- Per-discussion full CRUD (B1, B6, B7, B8) — append, switch, rename, delete, promote
- Per-role set management (B5) — the role list itself is user-editable
- Bulk operations (B2, B3, B10) — collapse/expand all, clear, truncate
- AI-assisted compression (B11) — summarize the whole discussion
- Undo/redo across all of the above (C1-C5) — Ctrl+Z / Ctrl+Y / jump-to-state
What Manual Slop does NOT have. The user cannot edit the provider-side raw transcript — the bytes inside the ai_client._anthropic_history, ai_client._gemini_chat._history, etc. process globals. These are reset on ai_client.reset_session(). nagent's "edit the conversation file" pattern operates at this layer, not the entry abstraction. The comms log (comms.log) is JSON-L and append-only, not user-editable from the GUI (it can be edited on disk in a text editor, but that's a different workflow).
Verdict. PARITY (DIFFERENT FOCUS). Both systems support comprehensive editing of the conversation-as-data. The difference is what counts as "the conversation":
- nagent's "conversation" = the raw transcript text file (the bytes the LLM produced)
- Manual Slop's "conversation" = a typed entry list with role + content + metadata + optional thinking segments
Manual Slop's editing is more granular and more pervasive (per-entry content edit, per-entry insert/delete, per-entry role-change, per-entry branch, with undo/redo). nagent's editing is deeper at the raw transcript layer (edit the actual AI response text before it's been abstracted into a typed entry). Both are real; both are deliberate.
Domain tag: Application. The Application's typed-entry abstraction is intentional — the user thinks in "discussions" not "transcripts." The user can opt-in to the raw-transcript layer by editing comms.log on disk or by reading the TOML discussions/<take_name>/history field directly.
Future-track candidate: optionally persist the raw transcript as a sibling file under each take (Candidate 10 in decisions.md), enabling the nagent-style "edit the actual AI response" workflow for users who want it.
4. Visible output protocol
nagent's claim. Free-form model output is hard to execute. Use a visible protocol: <nagent-read>, <nagent-file-read>, <nagent-shell>, <nagent-write>, etc. The startup prompt lists the only tags the model may emit. The parser is strict: recognized tags and whitespace. Nothing else. "If you cannot read the protocol, you cannot debug the system."
nagent's implementation. bin/nagent:TAG_PATTERNS is a list of (tag_type, compiled_regex) tuples. parse_response() returns None, error if any non-whitespace text is found outside a known tag. The error message is appended to the conversation and the model is asked to retry (up to MAX_FORMAT_RETRIES = 3).
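A strict parser of this kind can be sketched as follows. The two tags and their paired open/close form are an assumed subset chosen for illustration (nagent's real `TAG_PATTERNS` may differ), and the return shape simply mirrors the "`None`, error" behaviour described above. The retry loop bounded by `MAX_FORMAT_RETRIES` is deliberately left out.

```python
import re
from typing import Optional

# Hypothetical subset of tags; the real list in bin/nagent may differ.
TAG_PATTERNS = [
    ("response", re.compile(r"<nagent-response>(.*?)</nagent-response>", re.DOTALL)),
    ("shell", re.compile(r"<nagent-shell>(.*?)</nagent-shell>", re.DOTALL)),
]


def parse_response(text: str) -> tuple[Optional[list[tuple[str, str]]], Optional[str]]:
    """Return (actions, None) on success, or (None, error) if stray text is present."""
    found = []  # (start offset, tag_type, body)
    remainder = text
    for tag_type, pattern in TAG_PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), tag_type, match.group(1)))
        # Blank out recognized tags so that only stray text is left over.
        remainder = pattern.sub("", remainder)
    if remainder.strip():
        return None, f"unexpected text outside tags: {remainder.strip()[:40]!r}"
    found.sort()  # restore document order across tag types
    return [(tag_type, body) for _, tag_type, body in found], None
```

Because the error is a plain string, it can be appended to the conversation verbatim, which is what makes the retry visible to both the model and the person reading the file.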
Manual Slop's equivalent. Manual Slop's Application AI uses provider-native function calling (Gemini genai.types.FunctionDeclaration, Anthropic tool_use blocks, etc.). This is opaque: the protocol is encoded in JSON the provider parses. The user cannot read a function_call from the comms log and reason about it without knowing the provider's schema.
The two approaches are structurally different:
| Aspect | nagent regex tags | Manual Slop function calling |
|---|---|---|
| Visibility | Plain text, inspectable in the conversation file | JSON blobs in provider-specific format |
| Per-provider portability | Same tags work across all 4 providers | Each provider has its own schema; mcp_client's 45 tools have 5 different per-provider formats |
| Provider capability ceiling | Whatever the model can emit as text | Native parallel tool calls, structured outputs, JSON-mode constraints |
| Debuggability | "Why didn't the model read the file?" → grep the conversation for the tag | "Why didn't the model call read_file?" → inspect the JSON response |
Verdict. ARCHITECTURAL DIFFERENCE — both are correct for their domain. The Application wants parallel tool calls, JSON-mode constraints, and provider-side caching. The Meta-Tooling might want nagent's regex tags for explicit debuggability.
Domain tag: Both. The Application's choice is right (modern providers all support function calling with parallel execution — see docs/guide_ai_client.md §"Async Tool Execution"). The Meta-Tooling could adopt nagent's regex-tag protocol for its own work — for example, by using <read src/foo.py> instead of a tool-call JSON. This is explicitly the difference between the "Application's internal AI" and the "Meta-Tooling that builds the Application" in docs/guide_meta_boundary.md.
Future-track candidate: a Meta-Tooling-side DSL for compact tool calls (per the existing docs/reports/PLANNING_DIGEST_20260606.md reference to "an intent-based DSL" for "discovery" or "combinatorics").
5. The loop (append, call, parse, act, append, repeat)
nagent's claim. "Agent behavior" is mostly: append, call, parse, act, append, repeat. Heavier systems add infrastructure around the same steps.
nagent's implementation. bin/nagent:run_agent_loop is a while True loop:
- Append user prompt to conversation file
- Send conversation file to LLM (via `nagent-llm-text --json`)
- Append response to conversation file
- If response contains action tags: run those actions, append results, continue loop
- If response contains `<nagent-response>`: print and stop
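The steps above can be sketched with the LLM call and the action runner injected as plain callables. This is a minimal illustration, not nagent's `run_agent_loop`: the `USER:` / `MODEL:` / `RESULT:` line prefixes, the `FINAL:` marker standing in for `<nagent-response>`, and the `max_rounds` bound (nagent is described as using `while True`) are all assumptions of the sketch.

```python
from pathlib import Path
from typing import Callable


def run_agent_loop(
    conversation: Path,
    prompt: str,
    call_llm: Callable[[str], str],
    run_action: Callable[[str], str],
    max_rounds: int = 8,
) -> str:
    """Append, call, parse, act, append, repeat; the file is the only state."""
    with conversation.open("a", encoding="utf-8") as f:
        f.write(f"USER: {prompt}\n")
    for _ in range(max_rounds):
        # The whole conversation file is the model's input on every round.
        reply = call_llm(conversation.read_text(encoding="utf-8"))
        with conversation.open("a", encoding="utf-8") as f:
            f.write(f"MODEL: {reply}\n")
        if reply.startswith("FINAL:"):  # hypothetical stand-in for <nagent-response>
            return reply[len("FINAL:"):].strip()
        result = run_action(reply)
        with conversation.open("a", encoding="utf-8") as f:
            f.write(f"RESULT: {result}\n")
    raise RuntimeError("no final response within max_rounds")
```

Since the loop holds no state of its own, killing the process between rounds loses nothing that is not already in the file.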
Manual Slop's equivalent. Manual Slop has three parallel "loops":
- `src/ai_client.py:_send_<provider>`: the per-provider tool-call loop. Up to `MAX_TOOL_ROUNDS + 2 = 12` iterations. Each round: call provider, parse function calls, dispatch, append tool results. Same shape as nagent.
- `src/multi_agent_conductor.py:ConductorEngine.run`: the MMA loop. Per ticket: `ai_client.reset_session()` (Context Amnesia), build prompt, `loop.run_in_executor(None, run_worker_lifecycle, ...)`. Different scope (per ticket, not per user turn).
- `simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async`: the 1:1 chat loop. Per user turn: build markdown, send, wait, append response. Different scope (per user turn, in the App).
All three have the same "append, call, parse, act, repeat" shape. They differ in what gets appended (per-provider history vs track state vs disc_entries).
Verdict. PARITY. The loop is the universal pattern. Manual Slop's three loops are at different layers (LLM, MMA, App). The lack of a single "the loop" file is a real cost — nagent's run_agent_loop is 50 lines, easy to reason about. Manual Slop's loops are 100-300 lines each, scattered.
Future-track candidate: a single src/llm_loop.py:run_loop(...) function that all three callers use, with the dispatch and parse layers injected. (Not a high-priority refactor; the current separation is readable.)
Domain tag: Both.
6. Per-file memory (curation, not conversation log)
nagent's claim. One conversation grows too large. Attach memory to artifacts. Work keeps coming back to the same files; give each file its own persistent local memory. "When work orbits one artifact, store memory on that identity."
nagent's implementation. bin/helpers/nagent_file_edit_lib.py provides:
- `file_id_for_path(path) -> "{st_dev}:{st_ino}"`: a stable file identity across renames (the inode is preserved).
- `file_index_path(root, pid) -> conversations/file-index-{pid}.json`: a JSON registry of `{file_id: {path, conversation}}`.
- `resolve_file_edit_conversation(root, pid, file_path) -> (name, resolved, file_id)`: gets or creates a per-file conversation.
- `nagent-file-edit --file src/foo.py "add validation"`: spawns a new nagent process with `--file_edit src/foo.py`, which loads the file's previous conversation as the initial context. After edits, the new file is appended to the same conversation.
The result: a per-file conversation log keyed by inode. A rename that keeps the same inode keeps the same conversation. A purely path-based key would not work: it would collide across two repos on the same machine.
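The inode-keyed identity is easy to demonstrate. The sketch below reuses the name `file_id_for_path` from the text but is written from the description, not copied from nagent; `resolve_conversation`, the in-memory `index` dict, and the `file-<dev>-<ino>` conversation naming are hypothetical.

```python
import os


def file_id_for_path(path: str) -> str:
    """Identity from device + inode, so a rename on the same filesystem keeps the ID."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"


def resolve_conversation(index: dict, path: str) -> str:
    """Get or create the conversation name for a file, keyed by file ID.

    `index` stands in for the JSON registry {file_id: {path, conversation}}.
    """
    fid = file_id_for_path(path)
    entry = index.setdefault(
        fid, {"path": path, "conversation": f"file-{fid.replace(':', '-')}"}
    )
    entry["path"] = path  # remember the latest known path after a rename
    return entry["conversation"]
```

A rename within one filesystem keeps the inode, so the lookup lands on the same entry; copying the file or moving it across devices produces a new identity, which is a limitation worth keeping in mind.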
Manual Slop's equivalent (corrected per user). The first draft of this report marked this section as "DOMAIN MISMATCH" — claiming Manual Slop has no per-file memory. This was wrong.
Manual Slop does have a per-file memory concept. It's just a different kind of memory. Where nagent's per-file memory is a conversation log (what the LLM said about this file last time), Manual Slop's is a curation config (how to present this file in the AI's context window). The two are complementary, not equivalent.
The Manual Slop per-file memory:
```python
# src/models.py:510
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileItem:
    path: str                        # the artifact identity (path-keyed, no inode)
    auto_aggregate: bool = True      # include in auto-aggregation?
    force_full: bool = False         # bypass aggregation with full content?
    view_mode: str = 'full'          # full / skeleton / summary / sig / def / agg
    selected: bool = False           # for batch operations
    ast_signatures: bool = False     # only signatures
    ast_definitions: bool = False    # only definitions
    # Defaults on the next three fields are assumed here so the excerpt runs.
    ast_mask: dict[str, str] = field(default_factory=dict)   # per-symbol mask (from Structural File Editor)
    custom_slices: list[dict] = field(default_factory=list)  # Fuzzy Anchor slices with tag+comment
    injected_at: Optional[float] = None                      # timestamp
```
Plus the ContextPreset (src/models.py:909): a named, persisted set of FileItems, stored in the project's manual_slop.toml. Load a preset → restore the same per-file curation state. This is the per-file memory that survives across discussions.
The user pointed at this directly: "we have the context composition we can directly control what's in memory at the start of a discussion." That's the right framing. aggregate.py:run builds the initial markdown from self.context_files (the active preset's FileItems) + aggregate.run(flat, aggregation_strategy=...). The user controls the per-file memory at discussion start.
What's missing is nagent's specific pattern: a per-file conversation log keyed by inode. Manual Slop does not have a "last investigation of this file" concept stored as a file. The closest analog is commit history (the discussion itself is git-linked, per docs/guide_gui_2.md §"Discussions Sub-Menu" "Git Commit Tracking"). But that's discussion-scoped, not file-scoped.
Verdict. MANUAL SLOP IS STRONGER IN THE CURATION DIMENSION; nagent IS STRONGER IN THE CONVERSATION-LOG DIMENSION. Both have a real per-file memory concept. Manual Slop's is "how do I render this file next time the AI sees it" (rich, with 9 curation fields alongside the path, AST-aware); nagent's is "what did the LLM say about this file last time" (plain text, with stable inode identity). The two are not equivalent; they're different optimizations for different needs.
Domain tag: Application (for the curation config). The user-correction explicitly said: "we have the context composition we can directly control what's in memory at the start of a discussion." That confirms this is a real Application feature, not a gap.
Future-track candidate: extending the per-file memory with a thin "last-investigation" log per file. A ~/.manual_slop/per_file/<file_id>.md (file_id by inode, like nagent) that records the last time a discussion referenced this file, the questions asked, and the answers received. This is a Meta-Tooling-friendly addition because it's a plain file.
7. Repository history as data
nagent's claim. A repo is not only the current tree. History is data too. Transform git history into editing context for a target file. Not vague "retrieval." Explicit transformation of historical artifacts into working input.
nagent's implementation. bin/nagent:file_edit_history_and_summary_block(file_edit_path, ...):
- `git_file_history(repo_root, rel_path)`: `git log --follow --max-count=50` per file
- `summarize_new_file_commits(...)`: LLM call to one-line-summarize new commits
- `coedited_file_rows(repo_root, rel_path, commits)`: counts files in the same commits; labels high/medium/low co-edit rate
- `format_file_history(...)`: produces a `{file-history}` block with editors, step-by-step, co-edited files, summarized commits
Manual Slop's equivalent (partial). Manual Slop's _reread_file_items (in ai_client.py) does mtime-based current content re-reading with diff injection as [SYSTEM: FILES UPDATED]. It does not do git history injection.
The closest things Manual Slop has:
- Git commit-linked discussion tracking in the GUI: each discussion has an "Update Commit" button that stamps `git rev-parse HEAD` (per `docs/guide_gui_2.md` §"Discussions Sub-Menu").
- `src/dag_engine.py` tracks ticket-to-git-commit relationships, but for MMA workers, not for the AI's context.
Verdict. PARTIAL. Manual Slop has current-content diff injection (the easy half) but lacks historical-context injection (the harder half). nagent's summarize_new_file_commits would be a useful addition to the Manual Slop AI's context — especially for "explain what this file does" questions where the LLM is meeting the file fresh.
Domain tag: Application. Future-track candidate: a src/git_history.py module that mirrors nagent's file_edit_history_and_summary_block and is invoked at discussion start (after aggregate.py).
8. Historical coupling & artifact neighborhoods
nagent's claim. A file lives in a neighborhood of related artifacts. Files that change together in git history are hints: tests, headers, config, paired implementation. High co-edit rate means "look here maybe." Not "edit everything."
nagent's implementation. coedited_file_rows(repo_root, rel_path, commits):
- Counts files in the same commits as the target
- Labels: high (>=50% co-edit), medium (>=20%), low
- Renders a `| file | commits together | P(other file changed | target file changed) |` table
- Guidance text: "Use these files as hints. Before editing, inspect high-likelihood co-edited files when the requested change may affect interfaces, tests, config, or paired code. Do not edit them unless the user request or evidence requires it."
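The co-edit computation described above reduces to a conditional frequency. The sketch below works on a pre-extracted mapping of commit to changed files rather than calling git, so the function name `coedit_rows` and its signature are hypothetical; only the 50% / 20% thresholds and the table columns come from the text.

```python
from collections import Counter


def coedit_rows(commits: dict[str, set[str]], target: str) -> list[tuple[str, int, float, str]]:
    """Rows of (file, commits together, P(file changed | target changed), label).

    `commits` maps a commit ID to the set of paths it touched.
    """
    touching = [files for files in commits.values() if target in files]
    if not touching:
        return []
    together = Counter(f for files in touching for f in files if f != target)
    rows = []
    for path, count in together.most_common():
        rate = count / len(touching)  # share of the target's commits that also touched `path`
        label = "high" if rate >= 0.5 else "medium" if rate >= 0.2 else "low"
        rows.append((path, count, rate, label))
    return rows
```

Dividing by the number of commits that touched the target, rather than by all commits, is what makes the rate a conditional probability: a file that rarely changes can still score "high" if it changes every time the target does.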
Manual Slop's equivalent. None. Manual Slop has py_get_hierarchy (subclass scan) and ts_c_*_get_* AST tools, but no tool that returns "files that historically co-edit with this file." The closest is derive_code_path (call-graph trace), which is structural not historical.
Verdict. GAP. This is a real missing tool. nagent's framing ("hints, not commands") is exactly the right level for a co-edit suggestion. A 50-line tool (`py_coedited_files(path) -> list[(path, count, likelihood)]`) would fill the gap.
Domain tag: Application. Future-track candidate: a py_coedited_files MCP tool + ts_c_coedited_files for C/C++.
9. Disposable sub-conversations
nagent's claim. Exploration creates noise. Spawn disposable workers. Sub-conversations are temporary nagent processes with isolated conversations. Their lifetime does not matter. The artifact they return matters.
nagent's implementation. <nagent-conversation> tag in the main loop's response:
- Parent appends `<nagent-conversation prompt="...">` to its conversation
- Parent spawns `nagent --invocation delegated --parent-conversation <name> --json` as a subprocess
- Child's `--json` output is parsed, rolled up into the parent's `recursive_input_tokens` / `recursive_output_tokens`
- Child has its own conversation file; no shared context except the explicit prompt
- Parent gets a concise artifact: the child's `<nagent-response>` content, plus token usage
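The roll-up step can be sketched as a pure function over the child's JSON output. The JSON keys used here (`response`, `input_tokens`, `output_tokens`) are assumptions, since the report does not give the schema; only the parent-side names `recursive_input_tokens` and `recursive_output_tokens` appear in the text. Spawning the subprocess itself is omitted.

```python
import json


def roll_up_child(parent: dict, child_stdout: str) -> str:
    """Add a child's token usage to the parent's recursive totals; return its artifact.

    The child's JSON keys are hypothetical; adjust them to the real --json schema.
    """
    child = json.loads(child_stdout)
    for kind in ("input", "output"):
        # Count the child's own tokens plus anything its own children used.
        spent = child.get(f"{kind}_tokens", 0) + child.get(f"recursive_{kind}_tokens", 0)
        key = f"recursive_{kind}_tokens"
        parent[key] = parent.get(key, 0) + spent
    return child["response"]
```

The parent keeps only the returned string and two counters; the child's conversation file can be deleted or kept without affecting the parent's context.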
Manual Slop's equivalent (corrected per user). The first draft of this report claimed PARITY (stronger in some ways). The user corrected this:
"I don't know if I have disposable sub-conversations, I don't really have them for non-mma runs. I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."
So the actual picture is:
| Layer | Sub-conversation support |
|---|---|
| MMA Tier 3 / Tier 4 | Yes. mma_exec.py spawns a real subprocess per ticket with Context Amnesia. ai_client.reset_session() at start of run_worker_lifecycle. The Ticket output is the "distilled artifact" returned to the parent (ConductorEngine). Per the docs: "Tier 3 worker is a fresh subprocess with a clean context window, receiving only the prompt and the relevant context slice." |
| 1:1 main discussion | No. The Application's chat loop has no sub-conversation mechanism. The user types a prompt, the AI responds, the loop continues. There's no way to "ask a sub-agent to investigate X and bring back the answer." |
The user is correct: this is a gap. The MMA pattern is the prototype. A future track could extract MMA's run_worker_lifecycle into a reusable app.spawn_sub_conversation(prompt, allowed_tools=...) method that the App can call from pre_tool_callback or from a new "investigate this" command.
Verdict. PARITY for MMA; GAP for 1:1 discussions. The MMA pattern is strong. The 1:1 chat has no equivalent. The user explicitly flagged this as a want.
Domain tag: Application (and possibly Meta-Tooling). Future-track candidate: a src/sub_conversation.py:SubConversationRunner that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Per the user: useful for "specific points" within a longer conversation.
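A rough sketch of what such a runner could look like, under stated assumptions: SubConversationRunner does not exist yet, so every name and field here is hypothetical, and the generate callable stands in for whatever LLM call the Application would actually route through. The sketch runs in-process for brevity; the MMA template uses a real subprocess.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SubConversationRunner:
    """Hypothetical runner: every call starts from an empty transcript."""
    generate: Callable[[str], str]           # stand-in for an LLM call: text in, text out
    max_artifact_chars: int = 2000           # keep the returned artifact concise
    transcripts: list[list[str]] = field(default_factory=list)

    def run(self, prompt: str) -> str:
        transcript = [prompt]                # fresh context: only the explicit prompt
        reply = self.generate(prompt)
        transcript.append(reply)
        self.transcripts.append(transcript)  # kept for inspection, never fed to the parent
        return reply[: self.max_artifact_chars]

runner = SubConversationRunner(generate=lambda p: f"summary of: {p}")
artifact = runner.run("investigate X")
```

The parent discussion would receive only artifact; the per-run transcripts stay on the runner so they remain inspectable without polluting the 1:1 context.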
10. Controlled writes
nagent's claim. A loop that writes files needs explicit boundaries. nagent is a reference implementation with conventions, not a sandbox. Shell runs with your permissions. Structured writes are checked. That is not a security boundary. Do not pretend it is.
nagent's implementation.
- validate_write_path(path, file_edit_path, ...): in main mode, path must be in /tmp, /var/tmp, or $TMPDIR. In file-edit mode, path must be the target file (or one of its split segments).
- Rejected writes append <nagent-write-result status="error"> to the conversation.
- <nagent-shell> runs whatever the LLM wrote, with the user's permissions, in the user's working directory. There is no shell sandbox. This is explicit.
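The write-path rule above can be approximated in a few lines. This is a sketch of the rule as described, not nagent's validate_write_path: the function names are hypothetical, split-segment handling is omitted, and resolving symlinks before comparing is an assumption.

```python
import os
from typing import Optional

def allowed_roots() -> list[str]:
    roots = ["/tmp", "/var/tmp"]
    if os.environ.get("TMPDIR"):
        roots.append(os.environ["TMPDIR"])
    # Resolve symlinks so comparisons use canonical paths.
    return [os.path.realpath(r) for r in roots]

def is_write_allowed(path: str, file_edit_path: Optional[str] = None) -> bool:
    real = os.path.realpath(path)
    if file_edit_path is not None:
        # File-edit mode: only the target file itself may be written.
        return real == os.path.realpath(file_edit_path)
    # Main mode: the path must sit inside one of the temp roots.
    return any(os.path.commonpath([real, root]) == root for root in allowed_roots())
```

As the section says, a check like this is a convention for structured writes, not a security boundary: the shell tag bypasses it entirely.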
Manual Slop's equivalent. Manual Slop has a much stronger security model:
| nagent | Manual Slop |
|---|---|
| validate_write_path: in main mode, path must be in /tmp, /var/tmp, or $TMPDIR | mcp_client._is_allowed: in main mode, path must be in the allowlist (constructed from file_items + extra_base_dirs); history.toml and *_history.toml are always blocked |
| execute_write writes the file directly | set_file_slice / edit_file / py_update_definition route through AST or string-match for validation |
| <nagent-shell> runs the user's full shell, full permissions, no approval | run_powershell(script, base_dir, qa_callback=...) requires GUI modal approval (Execution Clutch), 60s timeout, taskkill cleanup, optional Tier 4 QA on failure |
| No per-tool allowlist | 3-layer security: configure (allowlist) → _is_allowed (path validation) → _resolve_and_check (resolution + symlink resolution) |
| No sandbox at all | PowerShell-only (no bash/cmd) by default; can be enabled in [mcp_env.toml] |
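To make the allowlist row concrete, here is a minimal sketch of an allowlist check with an always-blocked filename rule. It is not the code in mcp_client.py: the function signature and the BLOCKED_PATTERNS constant are assumptions, and the real three layers (configure, _is_allowed, _resolve_and_check) are collapsed into one function.

```python
import fnmatch
import os

# Filenames refused regardless of where they live (per the table above).
BLOCKED_PATTERNS = ("history.toml", "*_history.toml")

def is_allowed(path: str, base_dirs: list[str]) -> bool:
    real = os.path.realpath(path)            # follow symlinks before checking
    name = os.path.basename(real)
    if any(fnmatch.fnmatch(name, pat) for pat in BLOCKED_PATTERNS):
        return False
    for base in base_dirs:
        base_real = os.path.realpath(base)
        # Allowed only if the resolved path sits inside an allowlisted directory.
        if os.path.commonpath([real, base_real]) == base_real:
            return True
    return False
```

Resolving before matching matters: a symlink inside an allowed directory that points outside it is rejected because the check runs on the resolved target.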
Verdict. PARITY (STRONGER on Manual Slop's side). Manual Slop's HITL-required shell execution + 3-layer allowlist is dramatically more secure than nagent's tmpdir check. The user explicitly chooses "less safety but more flexibility" with nagent, and "more safety but more friction" with Manual Slop.
Domain tag: Both. The Application needs Manual Slop's strict model. The Meta-Tooling could legitimately use nagent's looser model because the human is in the loop (the bridge script pops a GUI dialog).
11. Large files as explicit artifacts (split/patch)
nagent's claim. Big files exceed context. Split them. Do not pretend they fit. The split is a data structure with index.json and segment files; the patch is a unified diff; the source hash validates that nothing changed.
nagent's implementation.
The 4-file pipeline:
- nagent-file-split <file> --output <dir> --split <type> [--summarize] [--refresh INDEX] [--target-bytes 32768] [--natural]:
  - EXTENSION_MAP covers 11 languages (txt, md, cpp, py, xml, js, ts, json, yaml, go, rs, java)
  - Per-language SCORE_BY_TYPE (no tree-sitter; regex + line-counting + brace/JSON/XML depth counters)
  - py_score rewards blank lines followed by def/class/async def
  - cpp_score uses brace_depth to find closing braces at depth 0
  - json_score uses json_depth to find closing }/] at depth 0
  - Writes index.json with source_path, sourcesha256, source_size_bytes, source_line_count, split_type, target_bytes, natural, created_at, segment_count, segments[]
  - Each segment is a separate file with name-0001.py, name-0002.py, etc.
  - --summarize flag spawns nagent-file-summarize per-segment subprocess
- User edits the segment files (in place, via vim, etc.)
- nagent-file-patch <index> [--patch PATH] [--dry-run] [--force]:
  - validate_index(index, require_hash_match=not force): strict hash check; rejects if source changed
  - merge_segments(segments) -> str: concatenates segment contents in order
  - make_unified_patch(source, original, updated): difflib.unified_diff
  - Writes the patch file; if apply=True and changed=True, writes the source
- nagent-file-summarize <file> [--limit-word-count N] [--output DIR] [--json]:
  - Files > 64 KB cascade to nagent-file-split --summarize first
  - summarize_content retries up to SUMMARY_MAX_ATTEMPTS = 2 if the LLM overshoots the word limit
  - combined_summary_from_index glues per-segment summaries into one
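The safety property of the split/patch pipeline (refuse to patch when the source has drifted) fits in a short sketch. The function names, the trimmed-down index, and its source_hash key are hypothetical and do not reproduce nagent's index.json; hashlib.sha256 and difflib.unified_diff are the standard-library calls.

```python
import difflib
import hashlib

def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_index(source: str, segments: list[str]) -> dict:
    # Hypothetical, trimmed-down index: the real index.json carries more fields.
    return {"source_hash": sha256_text(source), "segment_count": len(segments)}

def make_patch(index: dict, current_source: str, segments: list[str],
               force: bool = False) -> str:
    # Refuse to patch if the source changed since the split (unless forced).
    if not force and sha256_text(current_source) != index["source_hash"]:
        raise ValueError("source changed since split; re-split or pass force=True")
    updated = "".join(segments)  # segments are concatenated in order
    diff = difflib.unified_diff(
        current_source.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="a/source", tofile="b/source",
    )
    return "".join(diff)

source = "def a():\n    return 1\n\ndef b():\n    return 2\n"
segments = ["def a():\n    return 1\n\n", "def b():\n    return 2\n"]
index = build_index(source, segments)
segments[1] = "def b():\n    return 3\n"   # the user edits one segment
patch = make_patch(index, source, segments)
```

The hash is taken at split time and checked at patch time, so an edit made to the source in between is detected even if the file's size and mtime happen to match.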
Manual Slop's equivalent (different mechanism, same insight). Manual Slop has all the parts of nagent's split/patch/summarize, but they live in different files and use different mechanisms:
| nagent | Manual Slop |
|---|---|
| nagent-file-split with per-language SCORE_BY_TYPE (regex + line counts + brace/JSON/XML depth) | aggregate.py:build_file_items() + py_get_skeleton (tree-sitter) + ts_c_*_get_skeleton (tree-sitter) + outline_tool.py |
| index.json with source_path, sourcesha256, segments[] | No explicit index.json. The "split" is implicit in _reread_file_items (mtime-based, not hash-based) and the py_get_skeleton tool returns the structural view on demand. |
| nagent-file-patch with strict validate_index (hash check) | set_file_slice / edit_file with result of file.read_text() pre-write validation. No hash-based pre-validation. |
| nagent-file-summarize with per-segment LLM call + retry | run_subagent_summarization(file_path, content, is_code, outline) -> str (in-process LLM call) |
| Combined combined_summary_from_index | No equivalent; aggregate.build_markdown_no_history builds a single markdown per call |
| nagent-file-summarize cascades to nagent-file-split --summarize for > 64 KB | RAGEngine._chunk_code cascades to chunking for Python (mtime-based invalidation, ChromaDB persistence) |
Crucial difference: Manual Slop uses tree-sitter, nagent does not. nagent's per-language scoring functions are all regex-based (cpp_score looks for closing braces at depth 0; py_score looks for blank lines followed by def/class keywords; no AST parsing). Manual Slop's py_get_skeleton and ts_c_*_get_skeleton use the tree-sitter library for actual AST traversal.
This is a trade-off. Tree-sitter is more accurate but requires a native dependency. nagent's approach works on any Python install with no compiled extensions. For the Application domain, tree-sitter is already a dependency (file_cache.py); for the Meta-Tooling, nagent's regex approach has appeal.
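To show what regex-only "structure-aware" splitting can look like, here is a small sketch in the spirit of py_score. The weights, the threshold, and the function names are invented for illustration; nagent's actual scoring is not reproduced.

```python
import re

# Matches top-level definitions only: the pattern is anchored at column 0.
DEF_RE = re.compile(r"^(async\s+def|def|class)\s")

def boundary_scores(lines: list[str]) -> list[int]:
    """Score each line index as a candidate start of a new segment."""
    scores = []
    for i, line in enumerate(lines):
        score = 0
        if DEF_RE.match(line):
            score += 2
            if i > 0 and lines[i - 1].strip() == "":
                score += 1                   # preceded by a blank line: cleaner cut
        scores.append(score)
    return scores

def split_points(lines: list[str], min_score: int = 3) -> list[int]:
    return [i for i, s in enumerate(boundary_scores(lines)) if s >= min_score and i > 0]

code = (
    "import os\n\ndef a():\n    pass\n\n"
    "class B:\n    def m(self):\n        pass\n"
).splitlines()
points = split_points(code)
```

The indented method m is never a split candidate because the pattern only matches at column 0, which is the kind of cheap structural signal that needs no parser.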
Verdict. PARITY (DIFFERENT MECHANISM). Both have the "split / patch / summarize as explicit data artifacts" insight. nagent uses subprocesses + per-language scoring + hash validation. Manual Slop uses tree-sitter + in-process calls + mtime validation. The key safety property — "the patch operation validates the source hasn't changed" — is done by nagent via SHA-256; Manual Slop does it implicitly by re-reading the file and string-matching. Manual Slop could adopt the explicit hash approach for stronger guarantees.
Domain tag: Both. Future-track candidate: an explicit src/split_lib.py + src/patch_lib.py mirroring nagent's design, used by the Application for very-large-file scenarios (e.g., a 200KB legacy C file where skeleton + sig + def aggregation isn't enough).
12. Tool discovery (self-describing executables)
nagent's claim. Tool capability should be explicit data too. No central registry. Tools describe themselves.
nagent's implementation. bin/helpers/nagent_cli.py:collect_bin_tool_descriptions(bin_dir):
- Iterates every executable in bin/
- Runs each with --description (10s timeout per tool)
- Captures stdout, parses it
- Concatenates into a single "Available tools:\n\n<description 1>\n\n<description 2>\n..." block
- Inserts this block into the initial context
Each tool's __main__ starts with:
    import sys

    def exit_on_description(description: str) -> None:
        # Print the tool's self-description and stop before doing any real work.
        if "--description" in sys.argv:
            print(description)
            raise SystemExit(0)
So nagent-file-split --description prints "Split a large file into structure-aware segments..." and exits 0. The main nagent loop calls collect_bin_tool_descriptions once at startup.
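The collection side can be sketched as below. This is an approximation of the steps listed above, not the body of collect_bin_tool_descriptions: the function name collect_descriptions is hypothetical, tools are launched through the current Python interpreter for portability, and skipping tools that time out or fail is an assumption.

```python
import subprocess
import sys
from pathlib import Path

def collect_descriptions(tool_paths: list[Path], timeout: float = 10.0) -> str:
    """Run each tool with --description and join the outputs into one block."""
    descriptions = []
    for tool in tool_paths:
        try:
            # Assumption: each tool is a Python script; the description above
            # has the executables in bin/ being run directly.
            result = subprocess.run(
                [sys.executable, str(tool), "--description"],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # assumption: a hung tool is skipped, not fatal
        if result.returncode == 0 and result.stdout.strip():
            descriptions.append(result.stdout.strip())
    return "Available tools:\n\n" + "\n\n".join(descriptions)
```

Because the description is produced by the tool itself, adding a tool changes no central file: the next startup simply finds one more entry.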
Manual Slop's equivalent. None. The 45 MCP tools in src/mcp_client.py are dispatched by a flat if/elif chain in dispatch():
    def dispatch(tool_name, tool_input):
        if tool_name.startswith("bd_"):
            return _dispatch_beads(tool_name, tool_input)
        if tool_name == "read_file":
            return _read_file(tool_input["path"])
        if tool_name == "py_get_skeleton":
            return _py_get_skeleton(tool_input["path"])
        # ... 45+ branches ...
        return f"ERROR: unknown tool: {tool_name}"
Adding a new tool requires:
- Edit dispatch() to add the branch
- Update the security allowlist in _resolve_and_check (if filesystem access)
- Update the AI capability declaration in get_tool_schemas()
- Add tests
nagent's approach: drop an executable in bin/, implement exit_on_description, done. The tool is auto-discovered.
The user (per the pushback): "The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet." — so this is a known want, but low priority.
Verdict. GAP (Application). nagent's pattern is genuinely better here, but Manual Slop has 45 tools in production and a migration would be a big refactor. The win is real (extensibility) but the cost is also real (rewrite the dispatch layer).
Domain tag: Both. For the Meta-Tooling (the scripts/ directory), nagent's pattern is more aligned with the external-agent usage model. For the Application, the existing dispatch if/elif is fine.
Future-track candidate: the mcp_architecture_refactor_20260606 track (already on the board) would benefit from nagent's pattern. The "sub-MCP" extraction that the planned refactor proposes is the right scope for this: each sub-MCP could be its own self-describing module.
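One in-process way to get self-describing tools without subprocesses is a registration decorator. This is a sketch of the general technique, not a design from the planned refactor: the TOOLS registry, the tool decorator, and describe_tools are all hypothetical names.

```python
from typing import Callable

# Registry filled at import time by the decorator below.
TOOLS: dict[str, dict] = {}

def tool(name: str, description: str) -> Callable:
    """Register a function as a tool together with its own description."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register

@tool("echo", "Return the 'text' field unchanged.")
def _echo(tool_input: dict) -> str:
    return tool_input["text"]

def dispatch(tool_name: str, tool_input: dict) -> str:
    entry = TOOLS.get(tool_name)
    if entry is None:
        return f"ERROR: unknown tool: {tool_name}"
    return entry["fn"](tool_input)

def describe_tools() -> str:
    return "\n".join(f"{n}: {e['description']}" for n, e in sorted(TOOLS.items()))
```

With this shape, adding a tool touches one place (the decorated function), and the capability listing is derived from the same registry that dispatch uses.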
13. Differences from frameworks
nagent's philosophical frame: framework-style systems hide state in object graphs and long-lived agent abstractions; nagent keeps everything as explicit files. The reframing table at the end of the nagent README is excellent:
| Common term | nagent framing |
|---|---|
| memory | editable artifact |
| retrieval | preserved work / historical context |
| agent | temporary transformation function |
| context | explicit input data |
This report's §2-§12 have been showing where Manual Slop agrees with nagent's reframings and where it deliberately diverges.
Verdict. The reframing is useful. The application can pick and choose which reframings to adopt per layer.
Domain tag: Both. This is the philosophical lens for the whole report.
14. Build your own
nagent's last section: "The minimal system is not mystical. Small loop over explicit state." The list of 12 buildable steps: generate_text(file) -> str, growing conversation document, initial context with the contract, output format + parser, handlers that append results to state, loop after actions, visible retry on malformed output, child loops for delegation, per-artifact memory, repository history → context blocks, split/index/patch for large files, save/load/edit/summarize for memory maintenance.
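A compressed sketch of the first seven steps (text generator, growing conversation file, parser, handlers, loop, visible retry) is below. The action and done tags, the bracketed role markers, and the run_loop signature are hypothetical stand-ins for this illustration; nagent's real tags and loop differ.

```python
import re
from pathlib import Path
from typing import Callable

# Hypothetical tag protocol for this sketch; nagent's real tags differ.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.S)
DONE_RE = re.compile(r"<done>(.*?)</done>", re.S)

def run_loop(conversation: Path, generate_text: Callable[[Path], str],
             handle: Callable[[str], str], max_turns: int = 8) -> str:
    for _ in range(max_turns):
        reply = generate_text(conversation)               # text in, text out
        with conversation.open("a") as f:
            f.write(f"\n[assistant]\n{reply}\n")          # state grows on disk
            done = DONE_RE.search(reply)
            if done:
                return done.group(1).strip()
            actions = ACTION_RE.findall(reply)
            if not actions:
                # Visible retry: the failure is recorded in the transcript.
                f.write("\n[system]\nMalformed output; use <action> or <done>.\n")
                continue
            for action in actions:
                f.write(f"\n[result]\n{handle(action.strip())}\n")
    raise RuntimeError("max_turns reached without <done>")
```

Everything the loop knows is in the conversation file, so a run can be stopped, edited by hand, and resumed by calling run_loop again on the same path.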
Verdict. Manual Slop has all 12 of these, just in different files, with different names, and at a different scale.
Domain tag: Both. The 12-step list is a useful checklist for any future LLM-application track.
15. The 6 Pitfalls (Revised from 8, after User Corrections)
The first draft of this report had 8 pitfalls. The user-corrections on §3 and §6 collapsed 2 of them. The remaining 6:
Pitfall 1: No structured output protocol in the Application AI
The Application uses opaque provider-native function calling. The user can read the conversation, but cannot read a tool_call from the comms log without knowing the provider's schema. nagent's regex-tag protocol is more debuggable for the Meta-Tooling. Decision: not a problem for the Application (provider-native is the right choice). Worth borrowing for the Meta-Tooling. Domain tag: Both. Future-track candidate: an intent-based DSL for Meta-Tooling agent calls.
Pitfall 2: Provider-specific history is in process globals
src/ai_client.py has _anthropic_history, _deepseek_history, _minimax_history: 3 separate per-provider history lists, each with its own lock. Switching providers mid-session loses history. nagent's "single conversation file" model is provider-agnostic.
Concrete change: A future refactor toward a stateless LLMClient class with an explicit Conversation object (the transcript as a list[Message]) would mean:
- Users can save/load/replay conversations
- Provider switching no longer loses history
- Tier 4 QA and Tier 3 workers share a common conversation format
Domain tag: Application. Future-track candidate: a src/conversation.py:Conversation dataclass + src/llm_client.py:LLMClient stateless wrapper around the 5 providers.
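A minimal sketch of that split, assuming nothing about the eventual design: Message, Conversation, and LLMClient here are hypothetical shapes, and the providers are plain functions standing in for the real SDK calls.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str

@dataclass
class Conversation:
    messages: list[Message] = field(default_factory=list)

# A provider is any function from the full transcript to a reply string.
Provider = Callable[[list[Message]], str]

class LLMClient:
    """Stateless: all history lives in the Conversation passed in."""
    def __init__(self, providers: dict[str, Provider]):
        self.providers = providers

    def send(self, conv: Conversation, provider: str, text: str) -> str:
        conv.messages.append(Message("user", text))
        reply = self.providers[provider](conv.messages)
        conv.messages.append(Message("assistant", reply))
        return reply

client = LLMClient({
    "a": lambda msgs: f"a saw {len(msgs)} message(s)",
    "b": lambda msgs: f"b saw {len(msgs)} message(s)",
})
conv = Conversation()
client.send(conv, "a", "hello")
second = client.send(conv, "b", "still there?")   # provider switch keeps history
```

Because the client holds no history of its own, switching from provider a to provider b mid-conversation hands b the same transcript, and the Conversation can be serialized independently of any provider.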
Pitfall 3: RAG is not "history as data"
Manual Slop's RAG (src/rag_engine.py) is fuzzy and not auditable. nagent's git-history-driven context is exact and inspectable. RAG is useful but should be additive, not a replacement. The Application's _reread_file_items mtime-based diff injection is the "history as data" mechanism Manual Slop already has.
The user's clarification: "RAG is an optional thing, doesn't have to be used. Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run."
Decision: RAG stays. The user wants a staging workflow: a sub-agent prepares RAG chunks before a run, the chunks become the discussion's starting memory. This is consistent with the nagent-inspired sub-conversation pattern (§9).
Domain tag: Application. Future-track candidate: a "RAG pre-staging" sub-conversation runner that pre-builds the index for a planned run.
Pitfall 4: The AI client is a stateful singleton with module-level globals
2,685-line src/ai_client.py. The module is the abstraction layer. To import it for testing, you trigger 5 provider SDKs' lazy imports. The unit tests are the only way to know what state is in flight.
This is the opposite of nagent's "files are the system; the process is a worker." nagent's run_agent_loop is 50 lines, stateless, testable. A future refactor toward a stateless LLMClient class would make ai_client parseable, testable, and saveable.
Domain tag: Application. Future-track candidate: a src/llm_client.py:LLMClient class with explicit Conversation, Provider, History objects. Backwards-compatible with the current ai_client.send() API.
Pitfall 5: No non-MMA disposable sub-conversations
The MMA pattern is strong. The 1:1 chat has no equivalent. The user explicitly flagged this as a want: "I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points."
Decision: Design src/sub_conversation.py:SubConversationRunner that the App can call to spawn disposable sub-agents on-demand during 1:1 discussions. Reuse MMA's subprocess pattern (mma_exec.py as the template). The sub-agent returns a concise artifact to the parent (nagent's pattern). Useful for "investigate this file" / "summarize this concept" / "look up this API" commands.
Domain tag: Application. Future-track candidate: a src/sub_conversation.py + a GUI "Investigate…" button on the message panel.
Pitfall 6: Hard-coded tool discovery
The 45 MCP tools in mcp_client.py:dispatch are in a flat if/elif chain. nagent's --description self-describing executable pattern is more extensible.
The user's position: "The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet."
Decision: Low priority. The mcp_architecture_refactor_20260606 (already on the board) is the natural place to address this — sub-MCPs as self-describing modules.
Domain tag: Both. Future-track candidate: subsumed by mcp_architecture_refactor_20260606.
Pitfalls removed by user-corrections
- (removed) Pitfall about "Conversation state is buried in module-level globals" — overstated. Manual Slop has editable UI state (Takes, UISnapshot, ContextPreset); it lacks editable raw transcripts, but that's a different design choice, not a gap. (See §3.)
- (removed) Pitfall about "per-file memory" — overstated. Manual Slop does have per-file memory in the curation dimension; what's missing is nagent's conversation-log dimension, which is a different optimization. (See §6.)
16. Recommended reading path for engineers
If you haven't read nagent, here's the priority:
1. The README's first 3 sections ("What It Looks Like", "Durable Work", "Text In Text Out") - the philosophy in 5 minutes.
2. bin/nagent:run_agent_loop() - the actual loop, 50 lines.
3. bin/helpers/nagent_file_split_lib.py:SCORE_BY_TYPE - the per-language scoring; shows what "structure-aware" can mean without tree-sitter.
4. bin/helpers/nagent_file_patch_lib.py:validate_index - the strict hash check; the safety property of nagent's split/patch workflow.
5. bin/helpers/nagent_file_summarize_lib.py:summarize_content - the retry-with-smaller-prompt pattern.
6. bin/helpers/nagent_cli.py:collect_bin_tool_descriptions - the tool-discovery pattern; 30 lines.
The README's 14 sections can be skimmed in 15 minutes if you have the context this report provides. Read the items above in order for the implementation depth.
Appendix A. Cross-reference table
| nagent file | Lines | Purpose | Manual Slop equivalent |
|---|---|---|---|
| README.md | ~1500 | 14-section teaching document | This report + docs/guide_*.md |
| bin/nagent | ~700 | Main loop, tag parser, sub-conversation runner | src/ai_client.py:send + src/multi_agent_conductor.py:ConductorEngine.run + simulation/workflow_sim.py:WorkflowSimulator.run_discussion_turn_async (3 separate loops) |
| bin/nagent-llm-text | ~50 | CLI wrapper for nagent-llm.py | (implicit; the Application calls ai_client.send directly) |
| bin/nagent-llm-upload | ~30 | File upload + LLM call | (not present; the Application's read tools handle files inline) |
| bin/nagent-file-edit | ~120 | Per-file subprocess wrapper | (not present; this is the gap that the user wants for 1:1 discussions) |
| bin/nagent-file-split | ~170 | Main split executable | (not present in this form; Manual Slop uses aggregate.py + tree-sitter) |
| bin/nagent-file-patch | ~80 | Main patch executable | (not present; Manual Slop uses set_file_slice / edit_file directly) |
| bin/nagent-file-summarize | ~100 | Main summarize executable | src/ai_client.py:run_subagent_summarization (in-process) |
| bin/helpers/nagent_cli.py | ~80 | --description pattern, WaitSpinner | (not present) |
| bin/helpers/nagent_llm.py | ~300 | 4 providers, token accounting | src/ai_client.py:_send_<provider> × 5 (in-process, with cross-provider state) |
| bin/helpers/nagent_file_edit_lib.py | ~170 | file-index by inode, resolve_file_edit_conversation | (not present) |
| bin/helpers/nagent_file_split_lib.py | ~400 | SPLIT_TYPES (11 langs), per-language scoring | src/file_cache.py:ASTParser (tree-sitter) + src/aggregate.py:build_file_items |
| bin/helpers/nagent_file_patch_lib.py | ~130 | strict hash validation, make_unified_patch | (not present; implicit mtime check) |
| bin/helpers/nagent_file_summarize_lib.py | ~110 | per-segment LLM call, retry-with-smaller-prompt | src/ai_client.py:run_subagent_summarization (in-process, no retry) |
| Total nagent | ~4000 | | Manual Slop's analogous parts: ~5000+ (ai_client + multi_agent_conductor + mcp_client + aggregate + rag_engine + history + project_manager + tree-sitter-based tools) |
Manual Slop is not smaller than nagent; it's larger because it has a GUI, persistence, HITL dialogs, Hook API, and a real test harness. The architectures serve different scales.
Appendix B. Citations
- nagent source: https://github.com/macton/nagent (all 11 source files read in full)
- Internal: docs/Readme.md, docs/guide_architecture.md, docs/guide_ai_client.md, docs/guide_mma.md, docs/guide_tools.md, docs/guide_mcp_client.md, docs/guide_app_controller.md, docs/guide_meta_boundary.md, docs/guide_context_curation.md, docs/guide_personas.md, docs/guide_rag.md, docs/guide_gui_2.md
- Internal source (selectively read for user-corrections): src/models.py (FileItem, ContextPreset), src/context_presets.py, src/project_manager.py (branch_discussion, promote_take), src/aggregate.py, src/history.py
- Mike Acton, "Data-Oriented Design and C++" (CppCon 2014) - referenced but not directly cited
- Ryan Fleury, "The Easiest Way To Handle Errors Is To Not Have Them" - cited via the data_oriented_error_handling_20260606 track
End of report. See comparison_table.md for the flat reference, decisions.md for the future-track candidates, and spec.md for the track wrapper.