conductor-tier2 9cc51ca9af conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways

Reference/analysis track. Produces 0 code changes.

Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded
  in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)

14 nagent principles covered in report.md (durable work, text-in/text-out,
editable state, visible protocol, the loop, per-file memory, repo history,
neighborhoods, sub-conversations, controlled writes, large files, tool
discovery, framework differences, build your own).

6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history
   + _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)

User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7
  per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN
  CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed
  conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for
  1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where
  near that ideation yet')

10 actionable takeaways (separate from the 6 pitfalls - those are
diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)

Domain classification: every recommendation tagged Application / Meta-Tooling
/ Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling
domain; Manual Slop's Application AI is a different kind of thing.

No code modified by this track (reference/analysis only). All 7 files
parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve.
Track is 'active' awaiting human review; future-track candidates live in
decisions.md and nagent_takeaways_20260608.md.

2026-06-08 18:44:35 -04:00

18 KiB


Future-Track Candidates: nagent Review Follow-ups

Companion to: report.md (deep-dive), comparison_table.md (flat reference), nagent_takeaways_20260608.md (actionable patterns)
Date: 2026-06-08
Source: nagent v1.0.0 deep-dive review (see report.md)

This document is the bridge from "what nagent teaches us" to "what Manual Slop should do about it." Each candidate is a future conductor track (not this one). The candidates are not committed — they emerge from the analysis but each is a separate scoping exercise.

For an actionable, code-grounded read of these candidates (with the "what to do today, not just the future track" framing), see nagent_takeaways_20260608.md — it maps each candidate to specific patterns, design constraints, and small UX wins that don't need a new track.

Decision-making framework

For each candidate:

Why it matters — what pitfall or capability gap does it address?
What it would do — concrete description
Where it would live — Application or Meta-Tooling
Dependency on existing tracks — is anything already on the board?
Effort estimate — small / medium / large
User signal — has the user expressed want/don't-want/neutral?
Recommended priority — high / medium / low

The candidates are listed in priority order, with user signal weighted most heavily (the user is the product owner for the Application; the analysis is only a reference).

Candidate 1: `src/sub_conversation.py:SubConversationRunner`

User signal: EXPLICIT WANT ("I probably want to add that for just 1:1 discussions where I use a sub-agent manually for specific points.")

Why it matters. nagent's §9 pattern (disposable sub-conversations via <nagent-conversation>) is the cleanest way to handle "investigate this without polluting the main discussion." Manual Slop has it for MMA (mma_exec.py is a real subprocess) but not for 1:1 discussions. The user is asking for this.

What it would do. A SubConversationRunner class that the App can call during a 1:1 discussion:

await runner.spawn(prompt: str, *, allowed_tools: list[str] | None = None, system_prompt: str | None = None) -> SubConversationResult
The runner spawns a fresh Python process (reusing the MMA pattern: mma_exec.py template with --invocation user, --parent-conversation <active_discussion_id>, isolated ~/.manual_slop/sub_conversations/<name>)
The sub-process runs to completion (or times out)
Result returns: a concise artifact (the sub-agent's <response> block) + token usage + exit code
The App inserts the result into the active discussion as a "User" role entry (so the parent LLM sees it on the next turn)
Cleanup: sub-conversation folder is auto-archived after 7 days (consistent with log_pruner.py)
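
The spawn / run-to-completion / extract-artifact steps above can be sketched roughly as follows. This is a minimal sketch, not the MMA implementation: the `SubConversationResult` fields, the `command` constructor argument, and `ECHO_CHILD` (a stand-in child process used instead of the real mma_exec.py invocation) are all assumptions for illustration, and token-usage reporting is left out.

```python
import asyncio
import re
import sys
from dataclasses import dataclass

# Hypothetical result type; the field names are illustrative only.
@dataclass
class SubConversationResult:
    response: str    # text inside the child's <response> block, "" if absent
    exit_code: int   # child exit code, or -1 when the child timed out
    timed_out: bool

RESPONSE_RE = re.compile(r"<response>(.*?)</response>", re.DOTALL)

# Stand-in child: echoes its stdin, upper-cased, inside a <response> block.
ECHO_CHILD = [
    sys.executable, "-c",
    "import sys; print('<response>' + sys.stdin.read().upper() + '</response>')",
]

class SubConversationRunner:
    def __init__(self, command: list[str], timeout_s: float = 60.0):
        self.command = command      # argv of the fresh process to spawn
        self.timeout_s = timeout_s

    async def spawn(self, prompt: str) -> SubConversationResult:
        proc = await asyncio.create_subprocess_exec(
            *self.command,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.DEVNULL,
        )
        try:
            # Feed the prompt on stdin and wait for the child to finish.
            out, _ = await asyncio.wait_for(
                proc.communicate(prompt.encode()), self.timeout_s
            )
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
            return SubConversationResult("", -1, True)
        match = RESPONSE_RE.search(out.decode(errors="replace"))
        text = match.group(1).strip() if match else ""
        return SubConversationResult(text, proc.returncode, False)
```

With the stand-in child, `asyncio.run(SubConversationRunner(ECHO_CHILD).spawn("hi"))` yields a result whose `response` is the upper-cased prompt. Returning only the `<response>` block keeps the parent discussion free of the child's intermediate output.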

Where it lives. Application. Possibly Meta-Tooling too (the scripts/ directory could use the same primitive).

Depends on. None directly. Could leverage MMA's mma_exec.py as a starting template. The public_api_migration_20260606 follow-up track is unrelated.

Effort. Medium. 2-3 phases: (1) extract reusable subprocess skeleton from MMA, (2) add 1:1-specific context injection, (3) add GUI controls ("Investigate…" button, optional command-palette command).

Recommended priority. HIGH — user-flagged.

Candidate 2: RAG pre-staging via sub-conversation

User signal: EXPLICIT WANT ("Would be cool to have a sub agent maybe prepare a rag chunks before I use them in a run.")

Why it matters. Manual Slop's RAG (src/rag_engine.py) indexes files on the fly at discussion start. For large projects, indexing can take 30+ seconds (per tests/test_rag_phase4_stress.py). The user wants a "prep" workflow: before starting a long discussion, fire off a sub-conversation that pre-indexes everything, so the discussion starts instantly.

This is also consistent with nagent's "data preparation is an explicit, visible step" philosophy (§1, §7). The RAG chunks are artifacts; preparing them is a transformation; the transformation can be a sub-conversation.

What it would do. A "Pre-stage RAG" command in the GUI (or in commands.py):

Spawns a sub-conversation with the prompt: "Index all files in [project] for RAG. Use the index_file tool on every file in the context. Report top-K queries at the end."
The sub-conversation runs rag_engine.index_file() on each tracked file (uses the same ChromaDB backend, with mtime-based invalidation)
Returns a concise summary: "Indexed N files. Top-K for 'execution clutch': [file1, file2, file3]."
The main discussion starts with the index already warm; RAGEngine.search() is fast
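
The mtime-based invalidation mentioned above might look like the sketch below. It does not use the real RAGEngine API: `index_file` is passed in as a plain callable, and `seen_mtimes` is a hypothetical in-memory stand-in for whatever the ChromaDB-backed index records per file.

```python
from pathlib import Path
from typing import Callable

def stage_index(
    paths: list[Path],
    index_file: Callable[[Path], None],
    seen_mtimes: dict[str, float],
) -> str:
    """Index only files whose mtime changed since the last staging pass."""
    indexed = 0
    for path in paths:
        mtime = path.stat().st_mtime
        key = str(path)
        if seen_mtimes.get(key) == mtime:
            continue  # unchanged since last pass, index entry is still warm
        index_file(path)
        seen_mtimes[key] = mtime
        indexed += 1
    skipped = len(paths) - indexed
    return f"Indexed {indexed} files, skipped {skipped} unchanged."
```

A second pass over unchanged files does no indexing work, which is what lets the main discussion start against a warm index.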

Where it lives. Application. The sub-conversation runner is the same primitive as Candidate 1; the staging logic is RAGEngine integration.

Depends on. Candidate 1 (sub-conversation runner). Could be done as a feature within Candidate 1's track.

Effort. Small to medium. The sub-conversation runner is the heavy lift (Candidate 1). The RAG-staging prompt is ~30 lines.

Recommended priority. HIGH — user-flagged; cheap given Candidate 1.

Candidate 3: Stateless `LLMClient` class

Why it matters. src/ai_client.py is 2,685 lines of stateful singleton with module-level globals for every provider's history. nagent's bin/helpers/nagent_llm.py is 300 lines of stateless dispatch. A refactor toward a stateless LLMClient(provider, model, conversation) class would:

Make ai_client parseable (no implicit state to track)
Make tests deterministic (each test gets a fresh client)
Enable conversation save/load (the Conversation object is the transcript)
Enable provider switching without losing history

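This is a big refactor but a high-leverage one: it would resolve both Pitfall #2 (provider-specific history in process globals) and Pitfall #4 (the AI client as a stateful singleton).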

What it would do. A new src/llm_client.py:

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Message, Tool, and Event are placeholder types to be defined alongside these classes.

@dataclass
class Conversation:
    messages: list[Message]  # role + content + tool_calls + tool_results
    metadata: dict
    def to_dict(self) -> dict: ...
    @classmethod
    def from_dict(cls, data: dict) -> Conversation: ...
    def save(self, path: Path) -> None: ...
    @classmethod
    def load(cls, path: Path) -> Conversation: ...

class LLMClient:
    def __init__(self, provider: str, model: str, api_key: str | None = None): ...
    def send(self, conversation: Conversation, *, tools: list[Tool] | None = None) -> Conversation: ...
    def stream_send(self, conversation: Conversation, *, tools: list[Tool] | None = None) -> Iterator[Event]: ...

Backwards-compat: ai_client.send(...) becomes a thin wrapper that constructs a default Conversation from the current state and calls the new class.
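
To make the "Conversation object is the transcript" idea concrete, here is one possible shape for the save/load round-trip. It is a sketch under assumptions: `Message` is reduced to role and content, JSON is chosen as the on-disk format, and `append_reply` is a hypothetical helper that stands in for a provider call to show the stateless style (new object out, nothing mutated).

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class Message:
    role: str
    content: str

@dataclass
class Conversation:
    messages: list[Message] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)  # recursively converts the nested Message objects

    @classmethod
    def from_dict(cls, data: dict) -> "Conversation":
        messages = [Message(**m) for m in data["messages"]]
        return cls(messages=messages, metadata=dict(data.get("metadata", {})))

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.to_dict(), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "Conversation":
        return cls.from_dict(json.loads(path.read_text(encoding="utf-8")))

def append_reply(conversation: Conversation, reply: str) -> Conversation:
    """Stateless-style step: return a new Conversation, leave the input untouched."""
    return Conversation(
        conversation.messages + [Message("assistant", reply)],
        dict(conversation.metadata),
    )
```

Because nothing lives in module-level globals, each test can build its own `Conversation`, and switching provider only changes which client receives the same object.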

Where it lives. Application (the AI client is the Application's main AI entry point).

Depends on. The data_oriented_error_handling_20260606 track is independent but related — both push toward the data-oriented principles. The public_api_migration_20260606 follow-up track would benefit from the new Conversation class.

Effort. Large. 3-5 phases: (1) introduce Conversation dataclass, (2) per-provider LLMClient.send, (3) migration of existing ai_client.send callers, (4) deprecate module-level globals, (5) remove. ~2000+ lines of refactor.

Recommended priority. MEDIUM. High value, but the existing stateful singleton works. Defer until a concrete Application need forces it (e.g., the user wanting to save/replay conversations).

Candidate 4: Intent-based DSL for Meta-Tooling tool calls

User signal: EXPLICIT WANT ("The tool use is kinda upfront, I want to add an intent based dsl to help with 'discovery' or combinatorics but no where near that ideation yet.")

Why it matters. nagent's §4 regex-tag protocol is more debuggable than Manual Slop's function-calling. The Meta-Tooling (the external agents that build the Application) could benefit from a more compact, inspectable tool-call format. The existing JSON function-calling format forces the user to read verbose {"name": "...", "args": {...}} blobs.

What it would do. An intent-based DSL that the Meta-Tooling can use in its own work. Examples (per the user's "discovery" or "combinatorics" hint):

<read src/foo.py:MyClass.method> — intent: read this symbol
<search "execution clutch"> — intent: semantic search the workspace
<edit src/foo.py:42-50:new code> — intent: surgical line-range edit
<test tests/test_foo.py::test_bar> — intent: run a specific test
<discover what calls X> — intent: dependency trace

These are read by the external agent (Gemini CLI, OpenCode), not by Manual Slop's Application AI. The Application's function-calling format stays the same (correct for its domain).
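
Since this candidate is a research spike, the following is only a sketch of how a bridge script could recognize the example tags above with a regex. The verb list is taken from the examples; `Intent` and `parse_intents` are hypothetical names, and the grammar is an assumption (for instance, it cannot handle an edit payload that itself contains `<` or `>`).

```python
import re
from dataclasses import dataclass

# Verbs taken from the examples in this section; the set is not final.
VERBS = ("read", "search", "edit", "test", "discover")
INTENT_RE = re.compile(r"<(%s)\s+([^<>]+)>" % "|".join(VERBS))

@dataclass(frozen=True)
class Intent:
    verb: str
    argument: str  # raw argument text, surrounding double quotes removed

def parse_intents(text: str) -> list[Intent]:
    """Extract every <verb argument> tag from an agent's output, in order."""
    return [
        Intent(m.group(1), m.group(2).strip().strip('"'))
        for m in INTENT_RE.finditer(text)
    ]
```

Unknown tags such as `<response>` are ignored rather than rejected, so the DSL can coexist with other tag-based output. Translating each `Intent` into an actual tool call is left to the bridge.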

Where it lives. Meta-Tooling. Documented in docs/; taught via the conductor convention; the external agent emits the DSL, the bridge script (cli_tool_bridge.py) translates to actual mcp_client.py tool calls.

Depends on. None directly. The mcp_architecture_refactor_20260606 may produce tools that are easier to call via DSL (atomic, composable).

Effort. Research spike, not implementation. The user said "no where near that ideation yet." This is a design exercise, not a code change.

Recommended priority. LOW — user explicitly deferred.

Candidate 5: Self-describing MCP tools (nagent §12 pattern)

Why it matters. Manual Slop's 45 MCP tools are dispatched by a flat if/elif in mcp_client.py:dispatch. Adding a tool requires edits in 4 places (dispatch, security allowlist, capability declaration, tests). nagent's --description self-describing executable pattern is more extensible: drop an executable, it auto-appears.

What it would do. Each sub-MCP (or each tool) emits a --description block on --help. The dispatch function introspects via mcp_client.get_tool_schemas() and includes the descriptions in the AI's initial context automatically.
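
As an in-process analogue of that idea (not nagent's executable-based `--description` mechanism, and not the current mcp_client.py code), a decorator-based registry shows how a tool can carry its own description so that dispatch and the tool listing stay in sync. All names here are hypothetical.

```python
from typing import Callable

# name -> (description, callable); filled in by the @tool decorator.
TOOLS: dict[str, tuple[str, Callable[..., object]]] = {}

def tool(description: str):
    """Register a function as a tool together with its description."""
    def register(fn):
        TOOLS[fn.__name__] = (description, fn)
        return fn
    return register

@tool("Return the number of lines in a text.")
def count_lines(text: str) -> int:
    return len(text.splitlines())

def describe_tools() -> list[dict]:
    """Descriptions suitable for inclusion in the AI's initial context."""
    return [{"name": n, "description": d} for n, (d, _) in sorted(TOOLS.items())]

def dispatch(name: str, **kwargs):
    """Look the tool up by name instead of walking an if/elif chain."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name][1](**kwargs)
```

Adding a tool then means adding one decorated function; the security allowlist and tests mentioned above would still need their own handling.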

Where it lives. Application (the dispatch layer). The Meta-Tooling already has self-describing (via claude_tool_bridge.py); this is the Application-side equivalent.

Depends on. The mcp_architecture_refactor_20260606 is the natural place — the sub-MCPs would each be self-describing modules.

Effort. Medium (subsumed by mcp_architecture_refactor_20260606). Not a separate track.

Recommended priority. LOW — subsumed.

Candidate 6: `src/git_history.py` (nagent §7 pattern)

Why it matters. Manual Slop's _reread_file_items does current-content diff injection. nagent's file_edit_history_and_summary_block does historical content injection: git log --follow <file> per file, LLM-summarized, plus co-edit neighborhood. For "explain this file" questions, the LLM is meeting the file fresh — git history would give it crucial context (who touched it last, why, what's nearby).

What it would do. A src/git_history.py:file_edit_history_and_summary_block(file_path, repo_root, provider, model, config_path, previous_initial_context=None) -> str that:

Calls git log --follow --max-count=50 --date=short --format=... per file
Counts co-edited files per commit
LLM-summarizes new commits (with cache for unchanged history)
Renders a {file-history} block with editors, step-by-step, co-edited files, summarized commits
Called from aggregate.py:run at discussion start, after the file is added to context
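
The "cache for unchanged history" step can be sketched independently of git and of any provider. In this sketch the commits are assumed to be already parsed into `(sha, subject)` pairs, `summarize` is a placeholder for the LLM call, and the cache is a plain dict keyed by commit hash; none of these names come from nagent or Manual Slop.

```python
from typing import Callable

def summarize_new_commits(
    commits: list[tuple[str, str]],
    cache: dict[str, str],
    summarize: Callable[[str], str],
) -> list[str]:
    """Summarize only commits not seen before; reuse cached summaries otherwise."""
    lines = []
    for sha, subject in commits:
        if sha not in cache:
            cache[sha] = summarize(subject)  # the only place the LLM is invoked
        lines.append(f"{sha[:7]} {cache[sha]}")
    return lines
```

Because commit hashes are immutable, a summary never needs invalidating; a file's history only ever adds new keys.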

Where it lives. Application (it's part of the AI's initial context).

Depends on. None directly. The data_oriented_error_handling_20260606 is independent. The rag_engine.py already has a sourcesha256 field and mtime-based invalidation — the same pattern.

Effort. Medium. 2 phases: (1) git history + co-edit, (2) LLM summarization with cache. ~300-500 lines.

Recommended priority. MEDIUM — high value, but only after Candidates 1-2 are done.

Candidate 7: Per-file conversation log (nagent §6 conversation dimension)

Why it matters. Manual Slop's per-file memory is the curation kind. nagent's is the conversation log kind. The user has the curation already; the conversation log is missing. The user's correction made this clear: the two are different optimizations, not equivalent.

What it would do. A thin ~/.manual_slop/per_file/<file_id>.md per file (file_id by st_dev:st_ino for stability across renames, like nagent). Updated each time a discussion references the file. Format:

# src/foo.py (file_id: 12345:67890)
Last referenced: 2026-06-08T12:34:56 (Discussion: "refactor auth")

## 2026-06-08T12:34:56 - "how does the validation work?"
AI response: ...
(User) followup: "what about edge cases?"

## 2026-06-05T... - "explain the parser"
AI response: ...

When the user opens a new discussion with the file in context, the per-file log is injected as a {per-file-history} block.
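
The rename-stable `file_id` can be derived from `os.stat`, as in this small sketch. It assumes a filesystem that reports stable device and inode numbers; `per_file_log_path` is a hypothetical helper, and replacing `:` with `_` in the on-disk name is an assumption made because `:` is not a valid filename character on every platform.

```python
import os
from pathlib import Path

def file_id(path) -> str:
    """Identity that survives renames on the same filesystem: st_dev:st_ino."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}"

def per_file_log_path(path, root: Path) -> Path:
    """Location of the per-file log under `root`, keyed by file identity."""
    return root / (file_id(path).replace(":", "_") + ".md")
```

Copying a file, or an editor that saves by writing a new file and replacing the old one, produces a new inode, so the log would not follow in those cases.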

Where it lives. Application (the per-file log is the App's memory). The Meta-Tooling doesn't need this — sub-agent invocations are already short-lived.

Depends on. None. Could be added in a small follow-up to Candidate 3 (the Conversation object becomes the per-file log).

Effort. Small if done as a thin layer on top of the Conversation class. Medium if done before Candidate 3 (no Conversation object to leverage).

Recommended priority. LOW: niche feature.

Candidate 8: `py_coedited_files` / `ts_c_coedited_files` MCP tools (nagent §8)

Why it matters. nagent's coedited_file_rows produces a "files that historically co-edit with this file" table. Manual Slop has py_get_hierarchy (subclass scan) but no historical co-edit tool. Useful for "if I edit this file, what should I also look at?".

What it would do. Two new MCP tools:

py_coedited_files(path: str) -> list[{path, commits_together, likelihood}] — runs git log --follow <path>, counts files in each commit, labels high/medium/low
ts_c_coedited_files(path: str) -> list[{path, commits_together, likelihood}] — same, for C/C++

Returns a table. Used in the initial context as {file-neighborhood}.
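
The counting and labelling step could be sketched as below. The input is assumed to be a list of per-commit file lists (however they are obtained from git), and the 50% / 20% thresholds for high / medium / low are illustrative assumptions, not values taken from nagent.

```python
from collections import Counter

def coedited_files(target: str, commits: list[list[str]]) -> list[dict]:
    """Rank files by how often they change in the same commit as `target`."""
    touching = [files for files in commits if target in files]
    counts = Counter(
        f for files in touching for f in set(files) if f != target
    )
    rows = []
    for path, together in counts.most_common():
        share = together / len(touching)  # fraction of target's commits
        if share >= 0.5:
            likelihood = "high"
        elif share >= 0.2:
            likelihood = "medium"
        else:
            likelihood = "low"
        rows.append(
            {"path": path, "commits_together": together, "likelihood": likelihood}
        )
    return rows
```

Since the logic is language-agnostic, the Python and C/C++ variants could share this function and differ only in which paths they accept.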

Where it lives. Application (initial context injection).

Depends on. None. Small, contained.

Effort. Small. ~200 lines + tests. The git-log is already in aggregate.py; this is a new tool that uses the same primitives.

Recommended priority. LOW — small but niche. Worth bundling with Candidate 6 if that gets done.

Candidate 9: Explicit `src/split_lib.py` + `src/patch_lib.py` (nagent §11)

Why it matters. Manual Slop doesn't have an explicit split/patch pipeline. For very large files (>50 KB), the current aggregate.py + tree-sitter approach works for reading (skeleton, summary) but not for patching (no explicit segment/hash model).

What it would do. Mirror nagent's design:

src/split_lib.py — per-language natural splitters, index.json with source_path, sourcesha256, segments[]
src/patch_lib.py — strict validate_index (hash check), make_unified_patch, apply_segment_patches
src/summarize_lib.py — per-segment LLM call + retry-with-smaller-prompt
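
A compressed sketch of the segment/hash model follows. It is not nagent's code: a fixed line-count splitter replaces the per-language natural splitters, the index is an in-memory dict rather than index.json, and the segment fields (`start_line`, `sha256`) are assumptions. Only the `source_path`, `sourcesha256`, and `segments` keys follow the list above.

```python
import difflib
import hashlib

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_index(source_path: str, text: str, lines_per_segment: int = 200) -> dict:
    """Split text into fixed-size line segments and record a hash for each."""
    lines = text.splitlines(keepends=True)
    segments = []
    for start in range(0, len(lines), lines_per_segment):
        chunk = "".join(lines[start:start + lines_per_segment])
        segments.append({"start_line": start + 1, "sha256": sha256(chunk)})
    return {
        "source_path": source_path,
        "sourcesha256": sha256(text),
        "segments": segments,
    }

def validate_index(index: dict, text: str) -> None:
    """Strict check: refuse to patch if the source changed after indexing."""
    if index["sourcesha256"] != sha256(text):
        raise ValueError("source changed since index was built")

def make_unified_patch(path: str, old: str, new: str) -> str:
    """Render the change as a unified diff using the standard library."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

The hash check is what makes patching safe: a patch computed against a stale segment is rejected before it is applied.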

Where it lives. Application (the AI is the consumer). The Meta-Tooling already has nagent if it wants this.

Depends on. None. Self-contained.

Effort. Medium. 2 phases: split/patch, then summarize. ~500 lines.

Recommended priority. DEFER UNTIL NEEDED. No current 1:1 use case requires explicit split/patch. If a future file is genuinely too large for tree-sitter to handle inline, this rises to HIGH, on par with Candidate 2.

Candidate 10: Optional raw-transcript persistence per Take (nagent §3 conversation dimension)

Why it matters. nagent's "edit the conversation file" pattern is foreign to Manual Slop because the App stores abstracted entries (disc_entries), not raw transcripts. The user-edit feature in the GUI does edit individual entries, but the underlying log of function_call / tool_result blocks is implicit.

What it would do. Optionally, when a take is snapshotted to TOML (project_manager.save_project), also persist the raw transcript to a sibling file discussions/<take_name>/transcript.jsonl. The GUI gets a "View Raw Transcript" button. Optional "Edit Raw Transcript" mode that re-parses and re-aggregates.
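
A transcript.jsonl of this kind needs very little code. The sketch below assumes one JSON object per line and leaves the event schema open; `append_event` and `read_transcript` are hypothetical names, not existing project_manager functions.

```python
import json
from pathlib import Path

def append_event(path: Path, event: dict) -> None:
    """Append one raw transcript event as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")

def read_transcript(path: Path) -> list[dict]:
    """Load every event back, in order, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Append-only writes keep the file greppable and make a "View Raw Transcript" view a simple read; an edit mode would rewrite the file and then re-aggregate.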

Where it lives. Application. Optional — user can toggle per-project.

Depends on. None. Could be a small follow-up to Candidate 3 (Conversation class).

Effort. Small. ~150 lines + tests. Persist the existing comms.log in a structured way.

Recommended priority. LOW — niche feature, opt-in only.

Summary table

#	Candidate	User signal	Priority	Effort	Domain
1	`SubConversationRunner` (1:1 sub-convos)	Explicit want	HIGH	Medium	App + MT
2	RAG pre-staging via sub-conversation	Explicit want	HIGH	Small (depends on #1)	App
3	Stateless `LLMClient` class	(none)	Medium	Large	App
4	Intent-based DSL for Meta-Tooling	Explicit but deferred	Low	Research	MT
5	Self-describing MCP tools	Implicit	Low (subsumed)	Medium	BOTH
6	`src/git_history.py` (nagent §7)	(none)	Medium	Medium	App
7	Per-file conversation log	(none)	Low	Small	App
8	`py_/ts_c_coedited_files` tools	(none)	Low (bundle with #6)	Small	App
9	Explicit `split_lib.py` / `patch_lib.py`	(none)	Defer until needed	Medium	App
10	Raw-transcript persistence per Take	(none)	Low	Small	App

Recommended next steps

Spec and build Candidate 1 first: it is the highest-priority user-flagged want, and Candidate 2 builds on it.
Combine Candidate 2 with Candidate 1's track — same primitive, different prompt.
Hold Candidates 3-10 for future scoping — each is a separate conductor track when the corresponding need surfaces.

The current nagent_review_20260608 track itself produces no code; it's the reference. Candidates 1 and 2 will be the first implementation tracks informed by it.
