docs: agent workflow docs + regular docs (v2.3 surfacing)
Per user request 'use your remaining context to update agent workflow
docs and then regular docs based on what was discussed in this report',
this commit creates/updates 15 files derived from the v2.3 nagent
review (the 12 new nagent additions + the 4 memory dimensions
reframing + the cache strategy + the RAG discipline + the knowledge
harvest pattern).
Agent workflow docs (4 files):
- AGENTS.md (UPDATE): add @import line to canonical DOD + 'Code
Styleguides' section pointing to the 6 new styleguides + new
'Human-Facing Documentation' section pointing to ./docs/AGENTS.md
- conductor/workflow.md (UPDATE): new section 'Additions (2026-06-12)
- the 12 patterns from the latest nagent corpus' with TDD
protocols for knowledge harvest, cache ordering, compaction, RAG
discipline
- conductor/product-guidelines.md (UPDATE): new sections 'Memory
Dimensions (added 2026-06-12)' + 'See Also - Updated' with the
6-styleguide catalog
- docs/AGENTS.md (NEW): the agent-facing mirror of docs/Readme.md
(per the nagent CLAUDE.md pattern). 10 sections + the per-tier
reading path + the 4 memory dimensions + the caching strategy +
the knowledge harvest + the RAG discipline + the feature flags
Regular docs (11 files):
- 6 new styleguides (the convention catalog):
* data_oriented_design.md: the canonical DOD reference (Tier
0/1/2; 3 defaults to reject; 8 core defaults; 7-question
simplification pass; 10-question self-check; 4 memory
dimensions in Manual Slop context)
* agent_memory_dimensions.md: the 4 memory dims (curation /
discussion / RAG / knowledge) + when to use each + the
boundaries
* rag_integration_discipline.md: the conservative-RAG rule
(opt-in, complement, provenance, no mutation, feature-gated,
graceful failure)
* cache_friendly_context.md: stable-to-volatile context
ordering + the cache TTL GUI contract + the byte-comparison
test
* knowledge_artifacts.md: the knowledge harvest pattern
(category files, provenance, sha256 ledger, digest
regeneration, 'delete to turn off')
* feature_flags.md: file presence vs config flags vs CLI flags
- 3 new project docs (the cross-cutting guides):
* guide_agent_memory_dimensions.md: the cross-cutting guide on
the 4 dims + the decision tree
* guide_caching_strategy.md: caching across providers +
stable-to-volatile ordering + cache TTL GUI + the byte-
comparison test + the 5th provider (claude-code)
* guide_knowledge_curation.md: the knowledge memory guide (4th
dim) + the 5 category files + per-file notes + the digest +
the ledger + the harvest workflow
- 2 existing doc updates:
* guide_mma.md: new sections 'Delegation as context management'
+ 'The 4 memory dimensions (the MMA scope)'
* guide_ai_client.md: new section 'Cache strategy and the 12-
layer model' + the 5th provider (claude-code)
All files use the same style as the v2.3 review (the user's preferred
format): 7-column tables, no JSON, SSDL shape tags, Forth/array
notation, file:line citations, ASCII sketches where useful. The
human Readme files (Readme.md, docs/Readme.md) are NOT modified
(per repeated user instruction).
The 5th provider (claude-code) is documented in guide_ai_client.md
+ the data_oriented_design.md references the nagent pattern as the
source of the canonical rules.
The cross-references are bidirectional: the 6 styleguides reference
the 3 project docs; the 3 project docs reference the 6 styleguides;
the 2 doc updates reference both; AGENTS.md + ./docs/AGENTS.md
provide the entry points.
@@ -23,9 +23,27 @@ Detailed agent guidance lives in the following locations — read these directly
- **Tier 3 (Worker):** `.agents/skills/mma-tier3-worker/SKILL.md`
- **Tier 4 (QA):** `.agents/skills/mma-tier4-qa/SKILL.md`

## Canonical Operating Rules

@conductor/code_styleguides/data_oriented_design.md

This is the canonical DOD reference. The same file is injected into the Application's RAG / context assembly via `[agent].context_files` in `manual_slop.toml`: one source of truth for both harnesses. Edit it there; do not duplicate rules into this file.

## Code Styleguides (the convention catalog)

Per-domain rules live in `conductor/code_styleguides/`. Read the relevant one before starting work in a new area:

- `conductor/code_styleguides/data_oriented_design.md`: the canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 7-question simplification pass)
- `conductor/code_styleguides/agent_memory_dimensions.md`: the 4 memory dimensions (curation / discussion / RAG / knowledge) and when to use each
- `conductor/code_styleguides/rag_integration_discipline.md`: the conservative-RAG rule (opt-in; complements, never replaces; provenance required; no mutation)
- `conductor/code_styleguides/cache_friendly_context.md`: stable-to-volatile context ordering; the cache TTL GUI contract; the byte-comparison test
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge harvest pattern (category files, provenance, sha256 ledger, digest regeneration)
- `conductor/code_styleguides/feature_flags.md`: codifies "delete to turn off" (file presence) + "config.toml flag" (config), and when to use each

## Human-Facing Documentation

For understanding, using, and maintaining the tool, see `docs/Readme.md` (the canonical teaching document) and `./docs/AGENTS.md` (the agent-facing mirror of `docs/Readme.md`).

The 14 deep-dive guides under `docs/` (`guide_architecture.md`, `guide_ai_client.md`, etc.) are referenced from `docs/Readme.md`; an agent reading for a feature scope should read `./docs/AGENTS.md` first, then the relevant `guide_*.md`.

## Critical Anti-Patterns

@@ -0,0 +1,306 @@
# The 4 Memory Dimensions

**Status:** Styleguide; codifies the 4 memory dimensions of the Manual Slop conversation data.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/data_oriented_design.md` §9; `docs/guide_agent_memory_dimensions.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8.

> **What this is.** The conversation data has 4 distinct memory dimensions. Each lives at a different layer and serves a different purpose. Putting data into a layer whose shape doesn't match it is a common mistake. This styleguide names the 4 dimensions, names the boundaries between them, and gives the rule for which one to use when.

---

## 0. The 4 dimensions (the one-glance table)

| # | Dim | Where it lives | What it stores | How it's edited | How it's queried | SSDL |
|---|---|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | *How to render a file* in the AI's context window | Structural File Editor; project TOML | Implicit in `aggregate.py:run` at discussion start | `[Q]` |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | *What was said* in the conversation | GUI `[Edit]` mode; `[Branch]`; undo/redo | `build_markdown` renders as prior context | `o==>` |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | *Semantic fingerprints* of indexed files | (opaque vector store) | `RAGEngine.search()` at LLM call time | `[Q]` |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest + ledger | *Durable learnings* from past sessions | Plain markdown edit | Bounded digest as stable prefix | `o==>` |

---

## 1. Curation memory (per-file, per-discussion, structural)

**The shape.** Per-file curation config: `path`, `auto_aggregate`, `force_full`, `view_mode` (`full / skeleton / summary / sig / def / agg`), `ast_signatures`, `ast_definitions`, `ast_mask`, `custom_slices` (Fuzzy Anchors). A `ContextPreset` is a named, persisted set of `FileItem`s. Both persist in the project TOML.

**The query model.** "When discussion X opens, render file Y per its curation memory." Implicit in `aggregate.py:run` at discussion start. The user doesn't query the curation memory directly; they *configure* it.

**The right tool.** The Structural File Editor (per `docs/guide_context_curation.md`). AST-aware slices, Fuzzy Anchor slices, view-mode picker. The file's `FileItem` is the UI surface.

**The wrong tool.** Storing curation state in `disc_entries` (it's not conversational). Storing curation state in the RAG index (it's structural, not semantic). Storing curation state in the knowledge digest (it's per-discussion, not durable).

**The codepath** (SSDL):

```
[Q:discussion starts]
    │
    ▼
[Q:which ContextPreset is active?]
    │
    ├── preset N ──► [I:load ContextPreset N's FileItems]
    │
    ▼
[loop: each FileItem]
    │
    ├──► [Q:FileItem.view_mode?]
    │       │
    │       ├── full     ──► [I:read full file]
    │       ├── skeleton ──► [I:py_get_skeleton / ts_c_get_skeleton]
    │       ├── summary  ──► [I:run_subagent_summarization]
    │       ├── sig      ──► [I:py_get_skeleton (signatures only)]
    │       ├── def      ──► [I:py_get_skeleton (definitions only)]
    │       └── agg      ──► [I:py_get_skeleton (children only)]
    │
    ├──► [Q:FileItem.ast_mask?]
    │       │
    │       └── yes ──► [I:apply ast_mask to the rendered view]
    │
    ├──► [Q:FileItem.custom_slices?]
    │       │
    │       └── yes ──► [I:apply custom_slices to the rendered view]
    │
    └──► [I:append to aggregate markdown]
```

**The shape rule.** Curation is per-file, per-discussion, structural. Edited at the Structural File Editor. Persisted in TOML. The file's `FileItem` is the single source of truth for "how do I render this file in the AI's context."

---

## 2. Discussion memory (per-discussion, conversational, multi-turn)

**The shape.** `app.disc_entries: list[dict]` where each entry is `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` and `usage` (token accounting). The discussion is rendered as a `list[Message]` for the LLM by `build_markdown` (per `src/aggregate.py`).

**The query model.** "What did the user say? What did the AI say? In what order?" The discussion is the *prior context* for the next LLM call. The user can edit, insert, delete, role-change, and branch at any entry (A1-A7 per-entry operations per the nagent review v1 §3).

**The right tool.** The Discussion Hub panel. Per-entry `[Edit]`, `[Read]`, `[+/-]`, `Ins`, `Del`, `[Branch]`, role combo. The undo/redo stack (UISnapshot) and the Take/branching/compact system.

**The wrong tool.** Storing discussion state in the RAG index (it's temporal, not semantic). Storing discussion state in the knowledge digest (it's per-discussion, not durable). Storing discussion state in a FileItem (it's not per-file).

**The codepath** (SSDL):

```
[Q:user types prompt + hits Enter]
    │
    ▼
[I:append new entry to disc_entries]  (role: "User")
    │
    ▼
[Q:which ContextPreset is active?]
    │
    ├── preset N ──► [I:render FileItems per curation memory]
    │
    ▼
[I:aggregate.build_markdown(preset, discussion) -> str]
    │
    ▼
[I:ai_client.send(aggregate_text, history)]
    │
    ▼
[I:append new entry to disc_entries]  (role: "AI", content: response)
    │
    ▼
[Q:user pressed Edit on an entry?]
    │
    ├── yes ──► [I:update disc_entries[i].content]
    │
    ▼
[Q:user pressed Branch on an entry?]
    │
    ├── yes ──► [I:project_manager.branch_discussion(index) -> new Take]
    │
    ▼
[Q:user pressed Undo?]
    │
    ├── yes ──► [I:history.UISnapshot.pop() -> restore previous state]
    │
    ▼
[Q:user pressed Compact?]
    │
    ├── yes ──► [I:ai_client.run_discussion_compaction(discussion)]  (Candidate 11)
    │
[T:render Discussion Hub panel from disc_entries]
```

**The shape rule.** Discussion is per-discussion, conversational, multi-turn. Edited per-entry. Persisted in TOML via `_flush_to_project`. The `disc_entries` list is the single source of truth for "what was said in this discussion."
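
A sketch of the per-entry operations as plain list manipulation. The `Discussion` class, its deep-copy undo stack, and the inclusive `branch` semantics are assumptions for illustration; in Manual Slop the same roles are played by `app.disc_entries`, `history.UISnapshot`, and `project_manager.branch_discussion`, whose signatures may differ.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Discussion:
    # entries mirrors app.disc_entries: a flat list of dicts, one per turn
    entries: list[dict] = field(default_factory=list)
    undo_stack: list[list[dict]] = field(default_factory=list, repr=False)

    def _snapshot(self) -> None:
        # Deep copy so later in-place edits cannot reach the saved state
        self.undo_stack.append(copy.deepcopy(self.entries))

    def append(self, role: str, content: str, ts: str = "") -> None:
        self._snapshot()
        self.entries.append({"role": role, "content": content, "collapsed": False, "ts": ts})

    def edit(self, index: int, content: str) -> None:
        self._snapshot()
        self.entries[index]["content"] = content

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        self.entries = self.undo_stack.pop()
        return True

    def branch(self, index: int) -> "Discussion":
        # New Take holding entries[0..index] inclusive; the source is untouched
        return Discussion(entries=copy.deepcopy(self.entries[: index + 1]))


d = Discussion()
d.append("User", "Summarize aggregate.py")
d.append("AI", "It builds the context markdown.")
take = d.branch(0)  # retry from the user's first message
```

Every operation is a transformation of the list, which is what keeps this dimension easy to persist and to undo.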

---

## 3. RAG memory (opt-in, semantic, fuzzy)

**The shape.** ChromaDB vector store; per-file `FileItem`-like records with embeddings. `RAGEngine.search(query, k=N)` returns the top-N most-similar chunks. Persisted in `tests/artifacts/.slop_cache/chroma_<embedding_provider>/`.

**The query model.** "Given a query, return similar content from the indexed corpus." Semantic similarity, fuzzy. No provenance beyond the file path. No user-editable content.

**The right tool.** `RAGEngine.search()` at LLM call time (the `rag_*` results injected into the LLM prompt). The `[X] Enable RAG` toggle in AI Settings. The `RAGConfig` (embedding provider, chunk size, chunk overlap, source selection).

**The wrong tool.** Using RAG as a *replacement* for the other 3 dimensions. Using RAG results for state mutation (the integration discipline prohibits this). Using RAG for "show me the last thing the user said" (use Discussion memory). Using RAG for "show me what we decided last time" (use Knowledge memory).

**The codepath** (SSDL):

```
[Q:ai_client.send() is called]
    │
    ▼
[Q:is RAG enabled?]
    │
    ├── no ──► [T:skip]
    │
    ▼
[Q:which RAG source? (project / global / none)]
    │
    ├── project ──► [I:RAGEngine.index_file(path) for each tracked file in project]
    ├── global  ──► [I:RAGEngine.index_file(path) for each file in ~/.manual_slop/knowledge/]
    └── none    ──► [T:skip]
    │
    ▼
[Q:RAG engine initialized?]
    │
    ├── no ──► [I:RAGEngine._init_embedding_provider()]  (lazy init, may download)
    │
    ▼
[I:RAGEngine.search(query, k=N) -> list[SearchResult]]
    │
    ▼
[I:append "{rag-context}" block to aggregate markdown]
    │
    ▼
[I:ai_client.send() continues with augmented prompt]
```

**The shape rule.** RAG is opt-in. Default-off. Complements the other dimensions; never replaces. Provenance is required (file path, chunk offset). No mutation. See `conductor/code_styleguides/rag_integration_discipline.md` for the full rule.
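
One way to honor the shape rule at the injection point, sketched under assumptions: `search` stands in for `RAGEngine.search()`, and `SearchResult` and the `{rag-context}` line format are hypothetical. The function is default-off, builds a new string instead of mutating the prompt or any state, attaches provenance to every chunk, and degrades to a visible note when the engine fails.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    # Hypothetical result shape; the real RAGEngine type may differ.
    path: str
    offset: int
    text: str


def rag_context_block(
    query: str,
    search: Callable[[str, int], Sequence[SearchResult]],
    *,
    enabled: bool = False,  # opt-in: default off
    k: int = 4,
) -> str:
    """Return a {rag-context} block to append to the aggregate, or "".

    Builds a new string and mutates nothing. A failing search degrades to a
    visible note, so the other three dimensions still render.
    """
    if not enabled:
        return ""
    try:
        results = list(search(query, k))
    except Exception as exc:  # graceful failure
        return f"{{rag-context}}\n(RAG unavailable: {type(exc).__name__})\n"
    if not results:
        return ""
    lines = ["{rag-context}"]
    for r in results:
        # Provenance travels with every chunk: file path plus chunk offset
        lines.append(f"- [{r.path}@{r.offset}] {r.text}")
    return "\n".join(lines) + "\n"
```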

---

## 4. Knowledge memory (per-project, durable, provenance-aware)

**The shape.** A markdown tree at `~/.manual_slop/knowledge/`:

| File | Format | What it stores |
|---|---|---|
| `knowledge/facts.md` | `- {statement} {provenance}` | Durable statements about systems, repos, tools |
| `knowledge/decisions.md` | `- {statement} {reason}` | Decisions that were made |
| `knowledge/questions.md` | `- {question}` | Unanswered questions |
| `knowledge/playbooks.md` | `- **{name}**: {steps}` | Reusable command sequences |
| `knowledge/tasks.md` | `- {task}` (## Open / ## Done) | Open and done tasks |
| `knowledge/files/{file_id}.md` | `- {note} {provenance}` | Per-file notes (keyed by inode) |
| `knowledge/digest.md` | bounded 4KB | The projected digest (injected as `{knowledge}` block) |
| `knowledge/ledger.json` | `{entries: {sha256: {status, at, items}}}` | The harvest audit log |

**The query model.** "Given past sessions, what durable knowledge should I inject into the current discussion?" The answer is the `{knowledge}` block in the initial context, regenerated from the category files (newest first), bounded to 4KB.
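
A sketch of the projection from category files to digest, assuming new bullets are appended at the bottom of each file (so reversing gives newest first), that only `- ` lines count as entries, and that the 4KB budget is measured in UTF-8 bytes. The truncation note is reserved up front, so the result never exceeds the budget.

```python
from pathlib import Path

CATEGORIES = ("facts", "decisions", "questions", "playbooks", "tasks")
DIGEST_BUDGET = 4096  # bytes


def build_digest(root: Path, budget: int = DIGEST_BUDGET) -> str:
    """Project the category files under `root` into a bounded digest."""
    note = f"\n(truncated: digest exceeded {budget} bytes)\n"
    reserve = len(note.encode("utf-8"))  # always leave room for the note
    out: list[str] = []
    used = 0
    for name in CATEGORIES:
        path = root / f"{name}.md"
        if not path.exists():
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        bullets = [ln for ln in lines if ln.startswith("- ")]
        if not bullets:
            continue
        # Appended at the bottom, so reversed order is newest first
        for line in [f"## {name}", *reversed(bullets)]:
            cost = len(line.encode("utf-8")) + 1  # +1 for the newline
            if used + cost + reserve > budget:
                return "\n".join(out) + note
            out.append(line)
            used += cost
    return "\n".join(out) + "\n" if out else ""
```

Because the digest is recomputed from the category files, deleting `digest.md` loses nothing; it is rebuilt on the next run.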

**The right tool.** The harvest CLI (`python -m src.knowledge_harvest`) for the harvest; the plain text editor (vim, nano, the GUI) for the category files. The "Knowledge" panel in the GUI for browse/edit/prune.

**The wrong tool.** Treating the knowledge digest as state (it's a projection; the category files are the state). Letting the digest grow unbounded (4KB cap; truncate with a visible note). Treating the per-file notes as a replacement for FileItem curation (different dimensions; both are useful).

**The codepath** (SSDL):

```
[Q:discussion starts]
    │
    ▼
[Q:knowledge digest exists? (knowledge/digest.md)]
    │
    ├── no ──► [T:skip]
    │
    ▼
[Q:digest within 4KB budget?]
    │
    ├── yes ──► [I:read digest]
    │
    ├── no  ──► [I:read digest (truncated with note)]
    │
    ▼
[Q:aggregate.py:run is at the stable prefix position]
    │
    ▼
[I:append "{knowledge}" block to initial context]
    │
    ▼
[Q:per-file knowledge for files in scope?]
    │
    ├── yes ──► [I:append "{file-knowledge}" per FileItem]
    │
[T:continue rendering aggregate]
```

**The shape rule.** Knowledge is per-project, durable, provenance-aware. Edited by the user (plain markdown). The category files are the source of truth; the digest is a projection. See `conductor/code_styleguides/knowledge_artifacts.md` for the full harvest workflow.
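
The sha256 ledger makes harvesting idempotent. A sketch, assuming the ledger layout from the table above with `items` stored as a count and `extract` standing in for the LLM harvest step (both assumptions): an already-harvested transcript is skipped, and a failed harvest is recorded but left retryable.

```python
import hashlib
import json
from collections.abc import Callable
from datetime import datetime, timezone
from pathlib import Path


def harvest_once(ledger_path: Path, transcript: str, extract: Callable[[str], list[str]]) -> list[str]:
    """Harvest a transcript at most once, keyed by the sha256 of its text.

    Returns the extracted items, or [] when the transcript was already
    harvested or the harvest failed.
    """
    key = hashlib.sha256(transcript.encode("utf-8")).hexdigest()
    ledger = {"entries": {}}
    if ledger_path.exists():
        ledger = json.loads(ledger_path.read_text(encoding="utf-8"))
    previous = ledger["entries"].get(key)
    if previous and previous["status"] == "done":
        return []  # already harvested: skip the LLM call entirely
    try:
        items, status = extract(transcript), "done"
    except Exception:
        # Graceful failure: record the attempt, but leave it retryable
        items, status = [], "failed"
    ledger["entries"][key] = {
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        "items": len(items),
    }
    ledger_path.write_text(json.dumps(ledger, indent=2), encoding="utf-8")
    return items
```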

---

## 5. The boundaries (when NOT to mix)

| Don't store... | In... | Because... |
|---|---|---|
| Discussion state | `FileItem` (curation) | Discussion is per-discussion, not per-file |
| File curation | `disc_entries` (discussion) | Curation is per-file structural, not conversational |
| Semantic search results | `disc_entries` (discussion) | RAG is fuzzy; the discussion is precise |
| A long conversation | the knowledge digest (knowledge) | The digest is bounded (4KB); the conversation is unbounded |
| A "this is the current state" fact | the RAG index (RAG) | RAG is semantic; state is precise |
| Per-file notes | the discussion context | The notes should follow the file, not the discussion |
| Per-discussion summary | the knowledge digest | The digest is *cross*-discussion, not per-discussion |
| LLM-derived curation | the FileItem schema | LLM outputs are untrusted; the FileItem is user-edited |
| Untrusted LLM output | the knowledge category files | The harvest prompt has retry + graceful failure; but the category files are *user-editable*, so corrections are first-class |

**The discipline.** When designing a new feature, ask: which of the 4 dimensions is the *natural* home? Don't reach for RAG because "it's there"; reach for the dimension whose shape matches the data.

---

## 6. The cross-cutting principle (the "data is the thing")

All 4 dimensions share one principle: **the data is the thing, not the agent.** Each dimension has:

- A flat shape (no object graphs; structs of structs of scalars)
- Durable storage (TOML, ChromaDB, markdown; not Python objects)
- A user-editable surface (the Structural File Editor, the Discussion Hub, the RAG toggle, the category files)
- A query model that returns "data, not control flow" (per `data_oriented_error_handling_20260606`)

The wrong shape for the right question is a common mistake. The right question is "which of the 4 dimensions is this?", not "is there a tool that does X?"

---

## 7. The decision tree (the 1-question test)

When a feature needs *some* memory, ask this single question:

```
Q: What is the *data* (not the operation) the feature needs?
    │
    ├── "How to render a file"           ──► Curation (FileItem)
    ├── "What was said in this chat"     ──► Discussion (disc_entries)
    ├── "What similar content exists"    ──► RAG (RAGEngine.search)
    └── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

Pick the matching dimension. If the feature needs 2+ dimensions, use 2+ dimensions, but be explicit about which is the *primary* (the one that holds the *answer*) and which is *secondary* (the one that provides *context*).
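
To make the primary/secondary choice explicit in a design review, the decision tree can be encoded as data. This is a hypothetical helper, not part of Manual Slop; the enum values are the four data questions from the tree.

```python
from enum import Enum


class MemoryDim(Enum):
    CURATION = "how to render a file"
    DISCUSSION = "what was said in this chat"
    RAG = "what similar content exists"
    KNOWLEDGE = "what we learned from past runs"


def plan_memory(primary: MemoryDim, *secondary: MemoryDim) -> dict:
    """Name the dimension that holds the answer and those that only add context."""
    if primary in secondary:
        raise ValueError(f"{primary.name} cannot be both primary and secondary")
    # dict.fromkeys drops duplicates while keeping the caller's order
    return {"primary": primary, "secondary": tuple(dict.fromkeys(secondary))}


# "What did we decide last time?" is answered by Knowledge; RAG may add context.
plan = plan_memory(MemoryDim.KNOWLEDGE, MemoryDim.RAG)
print(plan["primary"].name, [d.name for d in plan["secondary"]])  # KNOWLEDGE ['RAG']
```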

---

## 8. The implementation cross-references (the file:line map)

For Manual Slop's current state:

| Dim | Where in `src/` | Line range | What to look at |
|---|---|---|---|
| Curation | `src/models.py` | 510-559 | `FileItem` schema |
| Curation | `src/models.py` | 909-937 | `ContextPreset` schema |
| Curation | `src/context_presets.py` | (small) | `ContextPresetManager` |
| Curation | `src/aggregate.py` | (518 lines) | `build_file_items`, `build_markdown` |
| Discussion | `src/gui_2.py` | 3770-3853 | `render_discussion_entry` (A1-A7) |
| Discussion | `src/gui_2.py` | 4239-4260 | `render_discussion_entry_controls` (B1-B11) |
| Discussion | `src/history.py` | 8-71 | `UISnapshot`, `HistoryManager` (C1-C5) |
| Discussion | `src/project_manager.py` | 429+ | `branch_discussion`, `promote_take` |
| RAG | `src/rag_engine.py` | 1-384 | The RAG engine + ChromaDB |
| Knowledge | (NEW) `src/knowledge_store.py` | (proposed) | The knowledge store |
| Knowledge | (NEW) `src/knowledge_harvest_cli.py` | (proposed) | The harvest CLI |

---

## 9. The cross-references

- `conductor/code_styleguides/data_oriented_design.md` §9: the 4-dim table in the canonical DOD
- `conductor/code_styleguides/rag_integration_discipline.md`: the conservative-RAG rule
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge harvest pattern
- `conductor/code_styleguides/cache_friendly_context.md`: the cache strategy (where the 4 dims get injected)
- `docs/guide_agent_memory_dimensions.md`: the user-facing cross-cutting guide
- `docs/guide_context_curation.md`: the existing curation deep-dive
- `docs/guide_rag.md`: the existing RAG deep-dive
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8: the nagent-origin pattern that informed the knowledge dim
@@ -0,0 +1,354 @@
# Cache-Friendly Context (stable-to-volatile ordering + cache TTL)

**Status:** Styleguide; codifies the cache strategy for `aggregate.py:run` and the GUI exposure of cache TTL.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/data_oriented_design.md` §3.2; `conductor/code_styleguides/agent_memory_dimensions.md`; `docs/guide_caching_strategy.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5.

> **What this is.** The LLM providers that Manual Slop uses (Anthropic, Gemini, OpenAI) all support some form of prompt caching. The cost benefit comes from the *stable prefix* being byte-identical across turns and across discussions. This styleguide defines the stable prefix, the volatile suffix, the byte-comparison contract, and the cache TTL GUI exposure.

---

## 0. The one-glance principle

```
[STABLE PREFIX (cached across turns)]    [VOLATILE SUFFIX (per-turn)]
[Role instructions]                      [Discussion metadata]
[Function-calling schema]                [Active preset (FileItems)]
[Discovered tool descriptions]           [Per-file details]
[System prompt preset]                   [Tool-call results from prior turns]
[Persona profile]                        [The user message]
[Project context]
[Knowledge digest]
[file-knowledge for files in scope]
```

The cache boundary sits between layer 7 (knowledge digest) and layer 8 (discussion metadata) of the 12-layer model in §1, i.e. between the last stable layer and the first volatile one. The Anthropic-specific path wraps the prefix in `cache_control: {"type": "ephemeral"}` blocks at the boundary; the Gemini path uses `cachedContent` resources; the OpenAI path uses implicit prefix caching.

---

## 1. The 12-layer model (the stable-to-volatile ordering)

| # | Layer | Stable across turns? | Source | SSDL |
|---|---|---|---|---|
| 1 | Role instructions (model + provider) | yes | `_get_combined_system_prompt` | `[I]` |
| 2 | Function-calling schema | yes | per provider | `[I]` |
| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` | `[I]` |
| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` | `[I]` |
| 5 | Persona profile | yes | `app_state.active_persona` | `[I]` |
| 6 | Project context (per `manual_slop.toml`) | yes | NEW (Candidate 14) | `[I]` |
| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within a gc cycle) | NEW (Candidate 8) | `[I]` |
| 8 | Discussion metadata (name, role count) | no (per turn) | `disc_entries[:1]` or `disc_meta` | `───` (data) |
| 9 | Active preset (FileItem set) | no (per turn) | `self.context_files` | `───` (data) |
| 10 | Per-file details (history, slices, notes) | no (per file) | per `FileItem` | `───` (data) |
| 11 | Tool-call results from prior turns | no (per turn) | per `_reread_file_items` | `───` (data) |
| 12 | The user message | no (per turn) | the input | `───` (data) |

**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns of the same discussion (and across discussions of the same mode). Layers 8-12 change per turn.
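
A sketch of the ordering as data, assuming each layer has already been rendered to a string and using hypothetical short names for the 12 rows above. Because the order lives in two tuples, adding a layer forces an explicit choice of side, and the boundary is measured at build time rather than hard-coded; it is the value a function like `stable_prefix_length` would return.

```python
from typing import NamedTuple

# Hypothetical short names for the 12 layers in the table above, in order.
STABLE_LAYERS = ("role", "schema", "tools", "system_prompt", "persona", "project", "knowledge")
VOLATILE_LAYERS = ("disc_meta", "preset", "file_details", "tool_results", "user_message")


class Context(NamedTuple):
    text: str
    boundary: int  # char offset where layer 8 (the first volatile layer) starts


def assemble(layers: dict[str, str]) -> Context:
    """Concatenate layers in stable-to-volatile order; absent layers render empty."""
    unknown = set(layers) - set(STABLE_LAYERS) - set(VOLATILE_LAYERS)
    if unknown:
        # A layer nobody placed on either side is a design error, not a default
        raise KeyError(f"unplaced layers: {sorted(unknown)}")
    stable = "".join(layers.get(name, "") for name in STABLE_LAYERS)
    volatile = "".join(layers.get(name, "") for name in VOLATILE_LAYERS)
    return Context(stable + volatile, len(stable))


ctx = assemble({"role": "You are a reviewer.\n", "user_message": "Review aggregate.py"})
cacheable = ctx.text[: ctx.boundary]  # the part a provider cache marker would cover
```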

---

## 2. The byte-comparison test (the design contract)

The design rule "stable prefix is byte-identical" must be testable. The test:

```python
# In tests/test_aggregate_caching.py (NEW)
# Assumes src/aggregate.py is importable as `src.aggregate`, and that
# mock_app_controller / mock_persona are test-harness helpers (not shown).
from src import aggregate


def test_aggregate_stable_to_volatile_ordering():
    """The first N characters of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    ctrl = mock_app_controller()
    ctrl.ai_settings.system_prompt = "Test system prompt"
    ctrl.active_persona = mock_persona()

    # Turn 1
    turn1 = aggregate.build_initial_context(ctrl, user_message="first prompt")

    # Turn 2 (same stable inputs, different user message)
    turn2 = aggregate.build_initial_context(ctrl, user_message="second prompt")

    # The first N characters should be identical (N = where the volatile layers start)
    N = aggregate.stable_prefix_length(ctrl)
    # Guard against a vacuous pass: with N == 0 the comparison below is always true
    assert N > 0, "empty stable prefix: nothing is cached"
    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
```

**The test is the contract.** If a new layer is added in the middle of the stack, this test fails; the agent must either move the layer to the stable position or update the test (with written justification).

**The implementation.** `aggregate.stable_prefix_length(ctrl)` returns the character offset where layer 8 starts. Layer contents vary in length, so these offsets cannot be fixed constants; the simplest implementation is a class in `aggregate.py` that names the end offset of each stable layer, with the values filled in at runtime as the stack is built and the class updated whenever the layer stack changes:

```python
class AggregateStack:
    ROLE_INSTRUCTIONS_END = 0  # placeholder; computed at runtime
    SCHEMA_END = 0
    TOOLS_END = 0
    SYSTEM_PROMPT_END = 0
    PERSONA_END = 0
    PROJECT_CONTEXT_END = 0
    KNOWLEDGE_DIGEST_END = 0
    INSTANCE_START = 0  # the cache boundary
```

**The test failure modes:**

| Failure | Why it fails | Fix |
|---|---|---|
| A new stable layer was added in the wrong position | The first N characters differ because the new layer is below the boundary | Move the new layer above the boundary (between layers 7 and 8) |
| A stable layer was moved to the volatile position | The first N characters differ because the stable layer is now in the volatile part | Move the layer back to the stable position |
| A volatile input leaked into a stable layer (e.g., a timestamp in the system prompt) | The first N characters differ because the volatile input is in the prefix | Strip the volatile input from the stable layer; pass it as a separate volatile argument |
| The system prompt has a `now()` call | The first N characters differ across calls | Pass `now()` as a separate argument; don't include in the system prompt |
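
A sketch of the fix for the last two rows, with hypothetical names: the clock is an argument rather than a call inside the stable layer, and its rendering lands after the boundary, so two turns a few minutes apart still share a byte-identical prefix.

```python
from datetime import datetime, timezone

SYSTEM_TEMPLATE = "You are the Manual Slop assistant. Follow the project rules."


def system_prompt() -> str:
    # Stable layer: a pure function of configuration; no clock reads here.
    return SYSTEM_TEMPLATE + "\n"


def volatile_header(now: datetime) -> str:
    # Volatile layer: the timestamp arrives as an argument.
    return f"Current time: {now.isoformat(timespec='seconds')}\n"


def build_context(user_message: str, now: datetime) -> tuple[str, int]:
    """Return (full context, boundary offset); the time sits after the boundary."""
    stable = system_prompt()
    return stable + volatile_header(now) + user_message, len(stable)


t1, n1 = build_context("hi", datetime(2026, 6, 12, 9, 0, tzinfo=timezone.utc))
t2, n2 = build_context("hi", datetime(2026, 6, 12, 9, 5, tzinfo=timezone.utc))
```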

---

## 3. The provider-specific cache_control (the implementation)

### 3.1 Anthropic (5-minute ephemeral, 4 breakpoints max)

```python
# In src/ai_client.py:_send_anthropic
import anthropic

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def _send_anthropic(message: str, *, model: str, cache_prefix_chars: list[int] | None = None):
    if cache_prefix_chars:
        # Split the message into content blocks at the boundary offsets;
        # cache_prefix_blocks marks each prefix block with cache_control
        content_blocks = cache_prefix_blocks(message, cache_prefix_chars)
    else:
        content_blocks = message

    response = anthropic_client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": content_blocks}],
    )
    # response.content is a list of content blocks; join the text ones
    text = "".join(block.text for block in response.content if block.type == "text")
    return _result_with_usage(text, response.usage, message)
```

**The cache_prefix_blocks helper** (mirrors nagent's `bin/helpers/nagent_llm.py:cache_prefix_blocks`):

```python
def cache_prefix_blocks(message: str, cache_boundaries: list[int]) -> str | list[dict]:
    """Split the message into content blocks at the given char offsets.
    Mark each prefix block with cache_control. Returns the plain string
    when no valid boundary exists. At most 3 prefix blocks (provider limit
    is 4 breakpoints per request)."""
    if not cache_boundaries:
        return message
    # Keep unique, in-range offsets in ascending order; cap at 3 prefix blocks
    points = sorted({b for b in cache_boundaries if 0 < b < len(message)})[:3]
    if not points:
        return message
    blocks = []
    start = 0
    for point in points:
        blocks.append({
            "type": "text",
            "text": message[start:point],
            "cache_control": {"type": "ephemeral"},
        })
        start = point
    blocks.append({"type": "text", "text": message[start:]})
    return blocks
```

**The Anthropic usage accounting** (per `nagent_llm.py:_result_with_usage`):

```python
def _result_with_usage(text, usage, input_text=None):
    input_tokens = _usage_value(usage, "input_tokens", "prompt_tokens", "prompt_token_count")
    # Anthropic reports cached prompt tokens separately; fold them back
    # so input_tokens stays "tokens sent" across providers.
    input_tokens += _usage_value(usage, "cache_read_input_tokens")
    input_tokens += _usage_value(usage, "cache_creation_input_tokens")
    output_tokens = _usage_value(usage, "output_tokens", "completion_tokens", ...)
    # ... etc
```

**The 4-breakpoint limit.** Anthropic allows at most 4 `cache_control` markers per request. nagent caps at 3 prefix blocks (one breakpoint per prefix). Manual Slop does the same: 3 prefix blocks, 1 volatile suffix.

### 3.2 Gemini (1-hour explicit cache, configurable TTL)

```python
# In src/ai_client.py:_send_gemini
from google import genai
from google.genai import types

genai_client = genai.Client()  # reads the API key from the environment


def _send_gemini(stable_prefix_messages, volatile_messages, *, model: str, cache_ttl_seconds=3600):
    if cache_ttl_seconds > 0:
        # Create a cachedContent resource for the stable prefix (layers 1-7).
        # Created per call for brevity; in practice reuse it until the TTL expires.
        cached_content = genai_client.caches.create(
            model=model,
            config=types.CreateCachedContentConfig(
                contents=stable_prefix_messages,
                ttl=f"{cache_ttl_seconds}s",
            ),
        )
        # Reference the cached content; send only the volatile layers 8-12
        response = genai_client.models.generate_content(
            model=model,
            contents=volatile_messages,
            config=types.GenerateContentConfig(cached_content=cached_content.name),
        )
    else:
        response = genai_client.models.generate_content(
            model=model, contents=stable_prefix_messages + volatile_messages
        )
    return _result_with_usage(response.text, response.usage_metadata, stable_prefix_messages + volatile_messages)
```
|
||||
|
||||
**The default TTL is 1 hour.** Configurable per the GUI (per §5 below).
|
||||
|
||||
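Because an explicit cache has a storage cost for as long as it lives, the send path should reuse one cache per unchanged prefix instead of creating a new one every turn. A minimal sketch, assuming the prefix can be fingerprinted by hashing its bytes and that the caller injects the create function (for example, a wrapper around `genai_client.caches.create`); `GeminiCacheRegistry` and `fake_create` are hypothetical names:

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class GeminiCacheRegistry:
    """Reuse one explicit cache per unchanged stable prefix (hypothetical helper)."""

    def __init__(self, create_cache: Callable[[str, int], str]) -> None:
        # create_cache(prefix, ttl_seconds) -> cache resource name
        self._create_cache = create_cache
        self._entries: dict[str, tuple[str, datetime]] = {}

    def cache_name_for(self, prefix: str, ttl_seconds: int,
                       now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # same prefix bytes and not yet expired: reuse
        name = self._create_cache(prefix, ttl_seconds)
        self._entries[key] = (name, now + timedelta(seconds=ttl_seconds))
        return name


created: list[str] = []


def fake_create(prefix: str, ttl_seconds: int) -> str:
    """Stand-in for a wrapper around genai_client.caches.create."""
    created.append(prefix)
    return f"cachedContents/{len(created)}"


registry = GeminiCacheRegistry(fake_create)
t0 = datetime(2026, 6, 12, 15, 0, tzinfo=timezone.utc)
first = registry.cache_name_for("layers 1-7", 3600, now=t0)
again = registry.cache_name_for("layers 1-7", 3600, now=t0 + timedelta(minutes=30))
later = registry.cache_name_for("layers 1-7", 3600, now=t0 + timedelta(hours=2))
```

A fuller version would also delete superseded caches rather than letting them run to expiry.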
### 3.3 OpenAI (5-10 min implicit, provider-managed)

OpenAI's caching is *implicit*: the provider automatically caches the prompt prefix and reuses it across requests that share that prefix. There is no application-side control.

```python
# In src/ai_client.py:_send_openai
def _send_openai(messages, *, model="gpt-5.5"):
    # No application-side cache_control; the provider handles caching.
    response = openai_client.responses.create(model=model, input=messages)
    return _result_with_usage(response.output_text, response.usage, messages)
```

**The TTL is provider-managed** (5-10 min). The GUI just shows "Cached by OpenAI; TTL: provider-managed."

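Even without cache control, the response reports how much of the prompt was served from cache, which is what the GUI hit rate needs. A sketch, assuming the Responses API usage shape (`input_tokens` plus `input_tokens_details.cached_tokens`), mirrored here as a plain dict so it runs without the SDK; `openai_cache_stats` is a hypothetical helper. Unlike the Anthropic case, nothing is folded back in, because OpenAI's `input_tokens` already includes the cached tokens:

```python
def openai_cache_stats(usage: dict) -> tuple[int, int]:
    """Return (input_tokens, cached_tokens) from a Responses-style usage dict."""
    input_tokens = usage.get("input_tokens", 0)
    details = usage.get("input_tokens_details") or {}
    return input_tokens, details.get("cached_tokens", 0)


stats = openai_cache_stats({
    "input_tokens": 560,
    "input_tokens_details": {"cached_tokens": 200},
    "output_tokens": 90,
})
print(stats)  # (560, 200)
```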
### 3.4 The provider table (the summary)

| Provider | Cache type | Default TTL | Configurable? | GUI exposure? |
|---|---|---|---|---|
| Anthropic | ephemeral | 5 min | yes (via prompt cache breakpoints) | yes (per-discussion state) |
| Google (Gemini) | explicit | 1 h | yes (via `ttl` field) | yes (TTL override) |
| OpenAI | implicit (auto) | 5-10 min (provider-managed) | no | no (just shows "cached") |

---

## 4. The codepath (the end-to-end flow)

```
[Q:ai_client.send() is called]
│
▼
[I:aggregate.build_initial_context(ctrl, user_message) -> str]
│
├──► [I:layer 1-7: build stable prefix (the cache-friendly part)]
│
├──► [I:layer 8-12: build volatile suffix (the per-turn part)]
│
├──► [I:concatenate stable + volatile = full context]
│
├──► [I:stable_prefix_length(ctrl) -> N] (the cache boundary)
│
▼
[Q:cache boundary N > 0?]
│
├── no ──► [I:pass full context to provider; no caching]
│
▼
[Q:provider is Anthropic?]
│
├── yes ──► [I:cache_prefix_blocks(full_context, [N]) -> content_blocks]
│           [I:anthropic.messages.create(content=content_blocks)]
│
[Q:provider is Gemini?]
│
├── yes ──► [I:create cachedContent resource for stable prefix]
│           [I:genai.models.generate_content(cached_content=..., contents=volatile)]
│
[Q:provider is OpenAI?]
│
├── yes ──► [I:openai.responses.create(input=full_context)] (provider handles caching)
│
[I:return LlmResult(text, input_tokens, output_tokens)]
│
▼
[Q:return to caller; aggregate.test_aggregate_stable_to_volatile_ordering is run]
│
[T:end]
```

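The branches in the diagram can be written as a pure planning step that decides what each provider receives, kept separate from the network call so it can be tested without credentials. A sketch, assuming `plan_send` as a hypothetical name that returns a description of the request instead of calling any SDK; the usable-boundary check mirrors the `0 < b < len(message)` filter in `cache_prefix_blocks`, and an unknown provider fails loudly rather than falling through:

```python
def plan_send(provider: str, full_context: str, boundary: int) -> dict:
    """Describe what one send would transmit, following the codepath diagram."""
    if boundary <= 0 or boundary >= len(full_context):
        # No usable cache boundary: send everything uncached.
        return {"provider": provider, "cached_prefix": None, "payload": full_context}
    prefix, suffix = full_context[:boundary], full_context[boundary:]
    if provider == "anthropic":
        blocks = [
            {"type": "text", "text": prefix, "cache_control": {"type": "ephemeral"}},
            {"type": "text", "text": suffix},
        ]
        return {"provider": provider, "cached_prefix": prefix, "payload": blocks}
    if provider == "gemini":
        # The prefix goes into a cachedContent resource; only the suffix is sent inline.
        return {"provider": provider, "cached_prefix": prefix, "payload": suffix}
    if provider == "openai":
        # Implicit caching: send the whole context; the provider matches the prefix.
        return {"provider": provider, "cached_prefix": None, "payload": full_context}
    raise ValueError(f"unknown provider: {provider!r}")
```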
---

## 5. The GUI exposure (per-provider cache state)

The "Caching" Operations Hub sub-panel (per the v2.3 §5.3 sketch):

```
+------------------------------------------------------+
| Caching                                              |
+------------------------------------------------------+
| Provider summaries                                   |
|   [Anthropic]  in:340  cache:80   hit:23%  ttl:4:32  |
|   [Gemini]     in:120  cache:0    hit:0%   ttl:0:00  |
|   [OpenAI]     in:560  cache:200  hit:35%  ttl:n/a   |
+------------------------------------------------------+
| Active discussions                                   |
|   Discussion "refactor auth"                         |
|     cached: yes (Anthropic)                          |
|     expires: 2026-06-12T15:32 (in 4:32)              |
|     [Invalidate cache] [Disable caching for this]    |
|   Discussion "fix the parser"                        |
|     cached: no                                       |
|     [Enable caching for this]                        |
+------------------------------------------------------+
| Global settings                                      |
|   [X] Enable Anthropic ephemeral caching             |
|   [X] Enable Gemini explicit caching                 |
|   [ ] Allow >1h Gemini caches (charges may apply)    |
|   Anthropic default TTL: [5 min v]                   |
|   Gemini default TTL:    [60 min v]                  |
+------------------------------------------------------+
```

**The data sources:**

| Widget | Data source | Frequency |
|---|---|---|
| `in:N cache:N hit:N%` | `ai_client.get_token_stats()` (already exported) | per turn (or per session) |
| `ttl:4:32` | `ai_client._send_<provider>` usage metadata + the cache expiry timestamp | per turn |
| `cached: yes/no` | per-discussion flag (NEW; tracks which discussions have active caches) | per discussion |
| `[Invalidate cache]` | calls `ai_client._invalidate_cache(discussion_id)` (NEW) | on click |

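The provider summary line is a pure function of the token counters and the cache expiry. A sketch, assuming `hit` is the cached share of input tokens rounded down and `ttl` is the remaining time as minutes:seconds (`n/a` when the provider manages the TTL); `format_provider_summary` and `format_ttl` are hypothetical names, timestamps are taken as UTC, and the demo values reproduce the Anthropic row of the mock:

```python
from datetime import datetime, timezone
from typing import Optional


def format_ttl(expires_at: Optional[datetime], now: datetime) -> str:
    """Remaining cache lifetime as m:ss; 'n/a' when the provider manages the TTL."""
    if expires_at is None:
        return "n/a"
    remaining = max(0, int((expires_at - now).total_seconds()))  # expired clamps to 0:00
    return f"{remaining // 60}:{remaining % 60:02d}"


def format_provider_summary(name: str, input_tokens: int, cached_tokens: int,
                            expires_at: Optional[datetime], now: datetime) -> str:
    # Integer floor of the cached share; a provider with no input shows 0%.
    hit = (100 * cached_tokens) // input_tokens if input_tokens > 0 else 0
    return (f"[{name}] in:{input_tokens} cache:{cached_tokens} "
            f"hit:{hit}% ttl:{format_ttl(expires_at, now)}")


now = datetime(2026, 6, 12, 15, 27, 28, tzinfo=timezone.utc)
expires = datetime(2026, 6, 12, 15, 32, tzinfo=timezone.utc)
print(format_provider_summary("Anthropic", 340, 80, expires, now))
# [Anthropic] in:340 cache:80 hit:23% ttl:4:32
```

Keeping the formatter free of GUI and client state means the widget can be tested against the mock's numbers directly.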
**The new AI client state:**

```python
# In src/ai_client.py (NEW)
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DiscussionCacheState:
    discussion_id: str
    provider: str
    cached_at: datetime
    expires_at: Optional[datetime]  # None for OpenAI implicit
    hit_count: int = 0
    tokens_cached: int = 0
    last_invalidated_at: Optional[datetime] = None
    caching_enabled: bool = True  # user can disable per-discussion


# In AppController (NEW); excerpt, only the new attribute is shown
class AppController:
    def __init__(self) -> None:
        self.discussion_caches: dict[str, DiscussionCacheState] = {}  # keyed by discussion_id
```

**The Hook API additions:**

```
GET  /api/cache                  # list all discussion cache states
GET  /api/cache/<discussion_id>  # get one
POST /api/cache/<discussion_id>/invalidate
POST /api/cache/<discussion_id>/disable
POST /api/cache/<discussion_id>/enable
```

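The five endpoints fit a small dispatch table keyed by method, whether a discussion id is present, and the trailing action, so a new action is one more row instead of one more branch. A sketch that assumes nothing about how the Hook API server actually routes requests; `route_cache_request` and the handler names are hypothetical:

```python
from typing import Optional

# (method, has discussion_id, trailing action) -> handler name
CACHE_ROUTES = {
    ("GET", False, None): "list_cache_states",
    ("GET", True, None): "get_cache_state",
    ("POST", True, "invalidate"): "invalidate_cache",
    ("POST", True, "disable"): "disable_caching",
    ("POST", True, "enable"): "enable_caching",
}


def route_cache_request(method: str, path: str) -> tuple[str, Optional[str]]:
    """Resolve a /api/cache request to (handler name, discussion_id)."""
    parts = path.strip("/").split("/")
    if parts[:2] != ["api", "cache"] or len(parts) > 4 or not all(parts):
        raise LookupError(f"not a cache route: {path}")
    discussion_id = parts[2] if len(parts) >= 3 else None
    action = parts[3] if len(parts) == 4 else None
    key = (method, discussion_id is not None, action)
    if key not in CACHE_ROUTES:
        raise LookupError(f"no route for {method} {path}")
    return CACHE_ROUTES[key], discussion_id
```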
---

## 6. The interaction with the 4 memory dimensions (where the cache hits)

| Dim | Where injected | Stable? | Cache impact |
|---|---|---|---|
| Curation | layer 9 (active preset) | no (per turn) | NOT cached; the user might switch presets |
| Discussion | layer 8 (metadata) + layer 11 (prior turns) | no (per turn) | NOT cached (except: layer 8 metadata is the boundary) |
| RAG | the `{rag-context}` block, appended to layers 8-12 | no (per query) | NOT cached; RAG is volatile per query |
| Knowledge | layer 7 (digest) + per-file (file-knowledge) | yes (within a gc cycle) | CACHED; the digest is part of the stable prefix |

**The cache only hits on the stable prefix (layers 1-7).** The volatile suffix (layers 8-12) is *not* cached, because the conversation is expected to change every turn.

**The interaction with knowledge harvest:** when `nagent-gc` (or the Manual Slop equivalent) regenerates the digest, the cache is invalidated for the next turn. The user can also force invalidation manually with the `[Invalidate cache]` button.

**The interaction with file edit:** when the user edits a file in the Structural File Editor, the file-knowledge for that file is updated, and that per-file knowledge change invalidates the cache for the next turn that references the file.

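Both invalidation triggers above come down to one question: did the bytes of the stable prefix change since the last turn? A sketch of that byte-comparison, assuming the prefix is available as a string and that a SHA-256 of its UTF-8 bytes is an acceptable fingerprint; `prefix_fingerprint` and `prefix_changed` are hypothetical helpers. Because provider prefix caches match exact prefixes, any change the fingerprint detects (reordered files, a regenerated digest, stray whitespace) would also break the provider-side cache:

```python
import hashlib
from typing import Optional


def prefix_fingerprint(stable_prefix: str) -> str:
    """SHA-256 of the exact bytes the provider will see for layers 1-7."""
    return hashlib.sha256(stable_prefix.encode("utf-8")).hexdigest()


def prefix_changed(previous: Optional[str], stable_prefix: str) -> bool:
    """True when the cache must be treated as invalid for the next turn."""
    return previous is None or previous != prefix_fingerprint(stable_prefix)


digest_v1 = "## Knowledge digest\n- prefer batch transforms\n"
digest_v2 = digest_v1 + "- cache the stable prefix\n"
fp = prefix_fingerprint("system rules\n" + digest_v1)
assert not prefix_changed(fp, "system rules\n" + digest_v1)  # same bytes: cache still valid
assert prefix_changed(fp, "system rules\n" + digest_v2)      # digest regenerated: invalidate
```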
---

## 7. The cross-references

- `conductor/code_styleguides/data_oriented_design.md` §3.2, §3.3, §3.4: the data-oriented foundation
- `conductor/code_styleguides/agent_memory_dimensions.md`: the 4 dims (where the cache hits)
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge digest (the layer 7 cached content)
- `docs/guide_caching_strategy.md`: the user-facing deep-dive
- `src/aggregate.py:run`: the consumer of this styleguide
- `src/ai_client.py:_send_<provider>`: the producer
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5: the nagent pattern that informed this styleguide
@@ -0,0 +1,253 @@
# Data-Oriented Design (the canonical rules)

**Status:** This is the canonical DOD reference for Manual Slop. It is imported by `AGENTS.md` and injected into the Application's RAG / context assembly via `manual_slop.toml [agent].context_files`, so both harnesses share one source of truth.
**Source:** Adapted from Mike Acton's `context/data-oriented-design.md` (13,084 bytes, the nagent canonical reference).
**Date:** 2026-06-12

> **What this is.** Operating rules, not philosophy: every rule here tells you what to *do*. Approach every problem (code, plan, pipeline, document) by understanding the real data first, then designing the simplest machine that transforms the input you actually have into the output you actually need, at a cost you can state. Decide from facts and measurement, not habit, analogy, or dogma.
>
> **Manual Slop context.** The project is an ImGui GUI orchestrator for LLM-driven coding sessions. The dominant data is *the conversation*: a typed message list with role, content, metadata, and optional thinking segments. The data has to survive across workers (MMA Tier 3 subprocesses), across tools (the 45 MCP tools), across LLM providers (8 send paths), and across the user's editing session (per-entry edit, branch, undo). The data is the thing; the workers and processes are disposable.

---

## 0. Scope, tiers, and precedence

Scale the ceremony to the task. Decide the tier first; when unsure, pick the higher tier and say which you picked.

| Tier | When | What to do |
|---|---|---|
| **Tier 0** | Trivial: typo fixes, mechanical edits, one-line bugfixes, answering questions | Apply the defaults silently (naming, explicit error behavior, no speculative generality). No written plan or checklist |
| **Tier 1** | Non-trivial change: new function or feature, behavior change, anything that touches a data layout, contract, or interface | Required: answer the framing and data questions (§4, §3) in a short written plan *before* implementing, run the simplification pass (§5), and run the final self-check (§11) |
| **Tier 2** | Subsystem-scale: new or substantially reworked subsystem, pipeline, or tool | Everything in tier 1 plus the enforceable deliverables (per §10) |

**Precedence when rules conflict:**

1. An explicit instruction from the user for the current task
2. **This document** (`conductor/code_styleguides/data_oriented_design.md`)
3. Existing codebase or workflow convention

When this document conflicts with existing convention and complying would mean a large refactor, **do not silently rewrite and do not silently conform**: state the conflict, estimate the cost of each option, and propose the smallest compliant change.

---

## 1. The 3 defaults to reject

These are the three default beliefs that produce bad solutions. Each comes with a replacement behavior; do the replacement, every time:

### 1.1 "The tools are the platform."

**Reality is the platform:** the actual hardware, organization, deadline, physics.

*Do instead:* before designing, name the real platform and the 2-3 fixed properties of it that constrain this solution, and design within them.

**For Manual Slop:** the platform is the user's machine (Windows; 1-8 cores; 16-128 GB RAM), the LLM provider API (rate limits, context window, cost), and the MCP tool surface (45 tools, 3-layer security). It is not the ImGui API, and it is not the Python version. The ImGui API is the *view*; the platform is the *view + the data + the user*.

### 1.2 "Design around a model of the world."

**World models** (objects, metaphors, idealized categories) hide the actual data and the actual cost.

*Do instead:* design around the data. Do not introduce an abstraction until you can describe, concretely, the data it organizes, the transform it serves, and what the abstraction costs.

**For Manual Slop:** the data is the `disc_entries` list, the `FileItem` schema, the `ContextPreset` schema, the `RAGEngine` index, the `comms.log` JSON-L. Not the *Discussion* or the *Persona* or the *Project* as objects. The objects are convenient summaries; the data is the ground truth.

### 1.3 "The solution matters more than the data."

**The only purpose of any solution is to transform data from one form to another.**

*Do instead:* start every task from the actual inputs and required outputs, never from the machinery you'd like to build.

**For Manual Slop:** before proposing a new class, module, or pipeline, write down (in a comment, in the plan, in the test) what the input is and what the output is. If you can't, that's the first task.

---

## 2. The 8 core defaults (any problem)

1. **The problem is the data.** Before proposing any solution, describe the input and output concretely. If you can't, getting that description *is* the first task.
2. **State the cost.** Every design recommendation you make must state its cost (time, memory, complexity, maintenance) and on what platform that cost is paid. A recommendation without a cost is a guess.
3. **Solve only the problem you have.** Different data is a different problem. Do not add parameters, options, abstraction layers, or extension points for hypothetical future needs. If you're tempted, write the one-line note of what you *didn't* build and why, and move on.
4. **Where there is one, there are many.** Anything that happens once almost always happens many times, across space or across the time axis. Default every design to the batch; treat the single case as a batch of size one.
5. **The common case dominates.** Identify the most common case explicitly and design the straight-line path for it. Handle rare and error cases, but outside that path: a "maybe" checked everywhere is an "always."
6. **Exploit every constraint you have.** List the known constraints (ranges, volumes, rates, invariants) and use them to remove work. Do not discard a constraint to make the solution "more general"; that generality is a cost paid forever.
7. **Simplicity is removing work.** Prefer fewer states, fewer steps, fewer special cases, fewer moving parts. Every added state or branch must be carried, tested, and explained; count them as cost.
8. **"Can't be done" is a cost claim.** When something seems impossible, what is almost always true is that it costs more than it's worth. Say that, with the estimate, so the tradeoff can actually be decided.

---

## 3. Get the real data (required before designing)

You cannot observe data you were not given, so observe what you *can* and label everything else:

- **Inspect before assuming.** Read representative input files, sample actual values, read the actual call sites, run the code on real input when a way to do so exists. Do not design from the type signatures or the docs alone.
- **Label every assumption.** For each fact you need but cannot observe, write an explicit line in your plan (`ASSUMPTION: <fact> - affects <decision>`), and prefer designs that are cheap to revisit if the assumption is wrong. Ask the user only when the answer materially changes the design.
- **Never fabricate.** Do not invent plausible-looking values, distributions, or measurements and treat them as real.

**Answer these about the data (in the tier 1+ plan):**

1. What does the input actually look like: shape, volume, source?
2. What are the most common real values, and how are they distributed?
3. What are the acceptable ranges, and what happens when out-of-range data arrives?
4. What is the frequency of change: what is stable, and what is volatile?
5. What does the solution read and where does it come from? What does it write and where is it used? What does it touch that it doesn't need?

**For Manual Slop specifically:** the data is `disc_entries` (the conversation), `FileItem` (per-file curation), `ContextPreset` (per-preset curation), `RAGEngine` (semantic search), `comms.log` (audit), `Persona` (agent profile), `manual_slop.toml` (project config), `app_state` (live state). Read the actual files before designing.

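For `disc_entries`, questions 1, 2, and 4 can be started with a few lines run against a real session instead of a guess. A sketch, assuming each entry is a dict with `role` and `content` keys (the real `disc_entries` schema may carry more fields); `summarize_entries` is a hypothetical helper, and its numbers belong in the plan next to any remaining `ASSUMPTION` lines:

```python
from collections import Counter
from statistics import median


def summarize_entries(entries: list[dict]) -> dict:
    """Observed shape of a conversation: volume, role mix, content sizes."""
    lengths = [len(entry.get("content", "")) for entry in entries]
    return {
        "count": len(entries),
        "roles": Counter(entry.get("role", "?") for entry in entries),
        "median_chars": median(lengths) if lengths else 0,
        "max_chars": max(lengths, default=0),
    }


sample = [
    {"role": "user", "content": "fix the parser"},
    {"role": "assistant", "content": "Here is a patch..." * 20},
    {"role": "user", "content": "run the tests"},
]
summary = summarize_entries(sample)
print(summary)
```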
---

## 4. Method (tier 1+)

Show this work as a short plan, a line or two per step:

1. **Frame it.** What is the problem, why is it worth solving, where is the limit beyond which it isn't, and what is plan B?
2. **Get the data** (per §3).
3. **State the cost** of the dominant transform on the real platform.
4. **Design the transform** as a sequence or DAG of explicit transformations: what comes in, what goes out, what each step is responsible for, with explicit contracts (shape, meaning, ownership, lifetime, valid ranges) at each boundary.
5. **Run the simplification pass** (per §5); say which questions applied and what work they removed.
6. **Define done.** State the success criteria and what evidence would prove the approach wrong, before building.
7. **Verify.** Check the result against the real data and the stated criteria, and report what was and wasn't verified.

---

## 5. The simplification pass (run recursively on every sub-problem)

The 8 questions, applied in order to every sub-problem:

| # | Question | Reduces |
|---|---|---|
| 1 | Can we **not do this at all**? | Work that shouldn't exist |
| 2 | Can we do this **only once** (precompute, cache, amortize)? | Repeated work |
| 3 | Can we do this **fewer times**? | Frequency of work |
| 4 | Can we **approximate** the result so that no one notices the difference? | Precision cost |
| 5 | Can we use a **small lookup table**? | Branching cost |
| 6 | Can we use a **large lookup table**? | Branching cost (alternative) |
| 7 | Can we use a **small buffer/FIFO** to decouple producer from consumer? | Coupling cost |
| 8 | Can we **constrain the problem further** so a simpler machine suffices? | Generality cost |

If any question applies, do the cheaper thing. If a question doesn't apply, say why and move on. The questions are not a checklist to score against; they're a habit.

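Questions 5 and 6 in practice: when a choice depends only on a small, closed key, a table replaces a chain of branches and puts every case in one place. A sketch that restates the provider defaults from the caching styleguide's provider table as data; `DEFAULT_CACHE_TTL_SECONDS` and `default_ttl` are hypothetical names:

```python
from typing import Optional

# Small lookup table: one row per provider instead of an if/elif chain.
# None marks a provider-managed TTL (OpenAI's implicit cache).
DEFAULT_CACHE_TTL_SECONDS: dict[str, Optional[int]] = {
    "anthropic": 5 * 60,
    "gemini": 60 * 60,
    "openai": None,
}


def default_ttl(provider: str) -> Optional[int]:
    """Look up the default TTL; an unknown key fails loudly instead of falling back."""
    try:
        return DEFAULT_CACHE_TTL_SECONDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
```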
---

## 6. Design rules

- **Minimize states and branches by design**, not by adding checks. Where the data genuinely varies, partition it by case and handle each partition straight-line, rather than re-deciding the case per element.
- **Out-of-range and error behavior is always explicit:** clamp, reject, drop, or fail loudly, chosen deliberately and written down. Never leave undefined behavior as an implicit policy, in any tier.
- **Complexity requires evidence.** Add complexity only against a real, observed need, never a hypothetical one.

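An explicit out-of-range policy, decided once at the input boundary. A sketch, assuming a user-entered Gemini TTL where out-of-range integers are clamped into an allowed window and non-numeric input is rejected; the bounds and the name `parse_ttl_minutes` are hypothetical, not existing Manual Slop settings. Which policy applies where (clamp here, reject there) is the design decision; the rule only requires that it is chosen and visible:

```python
MIN_TTL_MINUTES = 1   # hypothetical lower bound
MAX_TTL_MINUTES = 60  # hypothetical upper bound


def parse_ttl_minutes(raw: str) -> int:
    """Policy: reject non-integers loudly; clamp integers into [MIN, MAX]."""
    try:
        minutes = int(raw.strip())
    except ValueError:
        raise ValueError(f"TTL must be a whole number of minutes, got {raw!r}") from None
    return min(max(minutes, MIN_TTL_MINUTES), MAX_TTL_MINUTES)
```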
---

## 7. Performance claims

- **Never assert an unmeasured performance result.** Not "this should be faster," not invented numbers.
- If a way to measure exists (benchmark, profiler, test harness, counters), measure, and include before/after numbers with the change.
- If no way to measure exists here, label the change **unverified**, state the expected effect as a hypothesis, and specify the exact measurement that would verify it.
- If there is no measurable performance requirement, build the simplest correct design and skip speculative optimization entirely.

**For Manual Slop:** the existing audit scripts (`scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`, `scripts/check_test_toml_paths.py`) are the measurement infrastructure. Use them. Don't claim "faster" without a number from one of these.

---

## 8. Software specifics (systems, engine, embedded, game)

The rules above apply to any problem. These are their conclusions for software, where the hardware is unforgiving and the data volumes are real.

### 8.1 Batch-first transforms (plural by default)

- Write transforms to operate on **batches/arrays** by default, named in the **plural** (`update_things`, not `update_thing`).
- A singular call is a degenerate batch: the same batch path with `count = 1`. Do not maintain separate singular logic without a proven, measured need.
- Exception: true singletons (configuration state, a single shared resource). Taking the exception requires a written note: why the data is genuinely singular and batch semantics don't apply.

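The batch-first shape in Python, assuming a crude characters-per-token estimate stands in for a real tokenizer (the ratio and both function names are hypothetical). The structure is the point: one plural transform, with the singular call as a batch of size one rather than a second implementation:

```python
CHARS_PER_TOKEN = 4  # hypothetical estimate, not a real tokenizer


def estimate_token_counts(texts: list[str]) -> list[int]:
    """Plural transform: the whole batch in one pass (ceiling division per text)."""
    return [-(-len(text) // CHARS_PER_TOKEN) for text in texts]


def estimate_token_count(text: str) -> int:
    """Singular call = batch of size one; no separate logic to keep in sync."""
    return estimate_token_counts([text])[0]
```

If the estimate is later replaced by a real tokenizer, only the plural function changes.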
### 8.2 Memory, layout, and access

- **Indices over pointers/references/handles by default** (index into a contiguous array or table). Any pointer-heavy hot path must include a short written justification for why indices are insufficient.
- Organize data by **access pattern, not conceptual ownership**. Split hot and cold fields when the cold fields aren't needed in the dominant loop.
- For each hot path, write down the expected **access pattern** (linear / strided / random), expected **branch behavior** (predictable / unpredictable), and the hardware assumptions.
- When branch entropy is high, prefer **partitioned passes** (bucket by state/tag, process each bucket straight-line) over per-element branching.
- Keep the common-case path branch-minimal; rare and error handling lives outside the hot loop.

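A partitioned pass over conversation entries, sketched with hypothetical entry dicts (`role`, `content`) and per-role formatting. Entries are bucketed once by index, each bucket gets its own straight-line loop, and results are written back by index so conversation order is preserved. In Python the gain is structural; any speed claim would need a measurement per §7:

```python
from collections import defaultdict


def render_entries(entries: list[dict]) -> list[str]:
    """Render entries with a partitioned pass; output order matches input order."""
    out = [""] * len(entries)
    buckets: dict[str, list[int]] = defaultdict(list)
    for i, entry in enumerate(entries):  # one pass: bucket indices by role tag
        buckets[entry["role"]].append(i)
    for i in buckets.get("user", []):  # straight-line loop, no per-element role test
        out[i] = "USER: " + entries[i]["content"]
    for i in buckets.get("assistant", []):
        out[i] = "AI: " + entries[i]["content"]
    for role, indices in buckets.items():  # rare roles handled outside the common loops
        if role in ("user", "assistant"):
            continue
        for i in indices:
            out[i] = f"[{role}] " + entries[i]["content"]
    return out
```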
### 8.3 Data protocols between systems

Systems communicate through **explicit data protocols**, modeled after network protocols and file formats: explicit layout, versioning, documented meaning. The default is a **flat struct**: fixed layout, no hidden pointers, no OO-style interfaces. Use tagged unions or header-plus-payload when the flat struct genuinely can't express it. Do not model system boundaries as objects, virtual calls, or opaque handles.

**For Manual Slop:** the boundary between the AI client and the LLM provider is a *flat struct* (the `Message` dataclass: `role, content, tool_calls, tool_results`); the boundary between the MCP client and the tool implementer is a *flat struct* (the `tool_input` dict); the boundary between the LLM client and the GUI is the *comms.log* JSON-L. Not objects with virtual methods. Not opaque handles. Flat structs.

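A flat-struct boundary in Python, assuming the `Message` fields named above (the real dataclass may differ) plus a hypothetical `v` version field: plain fields only, serialized to one JSON-L line and parsed back with an explicit version check instead of an object graph:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Message:
    role: str
    content: str
    tool_calls: list = field(default_factory=list)
    tool_results: list = field(default_factory=list)


PROTOCOL_VERSION = 1  # hypothetical: bump when the layout changes


def to_jsonl_line(message: Message) -> str:
    """One record per line: the version header plus the flat fields."""
    return json.dumps({"v": PROTOCOL_VERSION, **asdict(message)}, sort_keys=True)


def from_jsonl_line(line: str) -> Message:
    record = json.loads(line)
    if record.pop("v", None) != PROTOCOL_VERSION:
        raise ValueError("unsupported comms.log record version")
    return Message(**record)
```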
### 8.4 Hardware is the platform

Design with the actual hardware's properties (cache hierarchy, memory bandwidth, alignment, latency vs throughput) and to its strengths.

- **Latency and throughput are only the same thing in a sequential system.** For every performance requirement, identify which one it actually is before designing for it.
- The compiler and language are tools, not magic: memory layout, access order, and the choice of what work to do at all are your job, not theirs, and they are roughly 90% of the problem. Know what the compiler can reasonably do with what you wrote, and don't delegate what it can't.

---

## 9. The 4 memory dimensions (the Manual Slop context)

The conversation data has 4 distinct memory dimensions. Each lives at a different layer and serves a different purpose. Storing data in the shape of the wrong dimension is a common mistake.

| # | Dim | Where it lives | What it stores | How it's edited | How it's queried | SSDL |
|---|---|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | *How to render a file* in the AI's context window | Structural File Editor; project TOML | Implicit in `aggregate.py:run` at discussion start | `[Q]` |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | *What was said* in the conversation | GUI `[Edit]` mode; `[Branch]`; undo/redo | `build_markdown` renders as prior context | `o==>` |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | *Semantic fingerprints* of indexed files | (opaque vector store) | `RAGEngine.search()` at LLM call time | `[Q]` |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest + ledger | *Durable learnings* from past sessions | Plain markdown edit | Bounded digest as stable prefix | `o==>` |

**The shape rule:** curation is per-file structural; discussion is per-turn conversational; RAG is opt-in semantic; knowledge is per-project durable. A feature that needs one should use the matching dimension; mixing them is a maintenance liability.

See `conductor/code_styleguides/agent_memory_dimensions.md` for the full styleguide.

---

## 10. Enforceable deliverables (tier 2)

For each new or substantially reworked subsystem:

- One explicit **batch transform contract**: input layout, output layout, owner, lifetime, valid value ranges.
- A **plural/batch path** for every transform; singular calls are thin wrappers over the batch implementation (`count = 1`) unless documented as a true singleton.
- A written **justification for any pointer/reference/handle-heavy hot path** explaining why index-based access is insufficient.
- Explicit **out-of-range behavior** (clamp/reject/drop/error) at every input boundary.
- Unresolved design questions filed as **local issue files under `issues/`**, not GitHub issues or inline TODOs.

**For Manual Slop specifically:** the equivalent of `issues/` is `docs/reports/` (where session retrospectives, audit reports, and design-issue docs live) or per-track `spec.md` §9 "Open Questions".

---

## 11. Final self-check (run before delivering tier 1+ work)

Verify, and fix or flag anything that fails:

- [ ] The plan answered the framing, data, and cost questions, or every gap is labeled `ASSUMPTION` with what it affects.
- [ ] The most common case is identified and the design serves it straight-line; rare/error cases are out of the common path.
- [ ] The simplification pass ran; the work it removed (or why nothing could be removed) is stated.
- [ ] No speculative generality: no parameter, option, or abstraction exists for a need that isn't real yet.
- [ ] Out-of-range and error behavior is explicit at every boundary.
- [ ] Transforms are plural/batch, or the singleton exception is documented.
- [ ] Pointer-heavy hot paths carry their written justification; everything else uses indices.
- [ ] No unmeasured performance claim anywhere in code, comments, or summary; measurements included where possible, hypotheses labeled where not.
- [ ] Done-criteria from the plan were checked, and the summary reports what was verified and what wasn't.
- [ ] (Tier 2) Deliverables above are present; open questions are filed under `docs/reports/` or per-track `spec.md` §9.

---

## 12. Cross-references

- `AGENTS.md`: imports this file; the project-root agent-facing rules
- `./docs/AGENTS.md`: the agent-facing mirror of `docs/Readme.md` (recommended first read for any agent scoping a feature)
- `conductor/code_styleguides/agent_memory_dimensions.md`: the 4 memory dimensions
- `conductor/code_styleguides/rag_integration_discipline.md`: the conservative-RAG rule
- `conductor/code_styleguides/cache_friendly_context.md`: stable-to-volatile ordering + the cache TTL contract
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge harvest pattern
- `conductor/code_styleguides/feature_flags.md`: "delete to turn off" + config flags
- `conductor/product-guidelines.md`: the project's other product conventions
- `conductor/tech-stack.md`: the tech stack constraints
- `conductor/edit_workflow.md`: the edit-tool contract

---

## 13. External sources (the prior art this was adapted from)

- **Mike Acton, "Data-Oriented Design and C++"** (CppCon 2014): the foundational DOD talk
- **Casey Muratori, "The Big OOPs: Anatomy of a Thirty-Five-Year Mistake"** (BSC 2025): the historical indictment of OOP
- **Ryan Fleury, "A Taxonomy of Computation Shapes"** (Feb 2023): the 6 computational shapes
- **Ryan Fleury, "The Codepath Combinatoric Explosion"** (Apr 2023): the nil-sentinel / immediate-mode defusing techniques
- **Ryan Fleury, "Errors are just cases"** (the `Result[T, ErrorInfo]` pattern): the data-oriented error handling
- **Andrew Reece, "Assuming as Much as Possible"** (BSC 2025): the Xar pattern; the engineering discipline for stripping layers
- **John O'Donnell, "IMGUI / The Pitch / MVC"**: the immediate-mode + IEventTarget paradigm
- **Mike Acton, `context/data-oriented-design.md`** (nagent canonical; 13,084 bytes): the immediate source for the structure of this document
@@ -0,0 +1,196 @@
# Feature Flags (file presence vs config)

**Status:** Styleguide; codifies when to use file-presence flags ("delete to turn off") vs config flags (`[ai_settings.toml]` / `[manual_slop.toml]`).
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/knowledge_artifacts.md` §5; `conductor/code_styleguides/data_oriented_design.md`.

> **What this is.** Manual Slop has two patterns for "turning a feature on or off": (a) file presence (the file is the switch; `rm` to turn off); (b) config flag (the `[ai_settings.toml]` toggle or the GUI checkbox). Both are valid, and each is right in different contexts. This styleguide codifies when to use which.

---

## 0. The two patterns (the one-glance table)

| Pattern | How it works | How to turn off | How to turn on |
|---|---|---|---|
| **File presence** | The feature checks for the file's existence; the file is the switch | `rm <file>` | Touch the file (or run the generator that creates it) |
| **Config flag** | The feature checks a setting in `[ai_settings.toml]` / `[manual_slop.toml]`; the GUI checkbox is the surface | Set `enabled = false` in the config; or uncheck the GUI box | Set `enabled = true`; or check the GUI box |
| **CLI flag** (a sub-pattern of config) | The CLI accepts a flag like `--no-cache`; the default behavior is "on" | Pass `--no-cache` on the CLI | Omit the flag (use the default) |
| **Feature flag in metadata** (a sub-pattern) | A `metadata.json` field for the feature's track declares `uses_rag: true` | Edit the metadata | Edit the metadata |

---
|
||||
|
||||
## 1. When to use file presence (the "delete to turn off" pattern)
|
||||
|
||||
**Use file presence when:**
|
||||
- The feature generates a *side artifact* that the user might want to *turn off* by deleting the artifact
|
||||
- The "off" state is *recoverable* — the artifact can be regenerated by running a command
|
||||
- The user *expects* to be able to manage the feature via the filesystem (the user is on the command line; they know `rm`)
|
||||
- The feature is *opt-in by default-off* (deleting the artifact means the feature is off; the absence of the file is the "off" state)
|
||||
|
||||
**Examples in Manual Slop:**
|
||||
|
||||
| Feature | The "on" state | The "off" state | The regeneration command |
|
||||
|---|---|---|---|
|
||||
| Knowledge digest injection | `~/.manual_slop/knowledge/digest.md` exists | File is deleted | `python -m src.knowledge_harvest --apply` |
|
||||
| Per-file knowledge for file X | `~/.manual_slop/knowledge/files/{file_id}.md` exists | File is deleted | (the next harvest regenerates) |
|
||||
| Saved conversations index | `~/.manual_slop/conversations/index-saved-conversations-*.json` exists | File is deleted | (n/a; user manually saves) |
|
||||
| RAG index for project | `~/.manual_slop/.slop_cache/chroma_<provider>/` exists | Directory is deleted | `python -m src.rag_engine --rebuild-index` |
|
||||
| Audit log | `~/.manual_slop/logs/sessions/<session>/comms.log` exists | File is deleted | (n/a; the log is auto-generated per turn) |
|
||||
|
||||
**The principle (per the data-oriented foundation):** *the data is the thing*. If the feature produces a file, the file is the switch. Deleting the file is the natural way to turn off the feature.
|
||||
|
||||
**The discovery surface:** the user can `ls ~/.manual_slop/knowledge/` and see `digest.md` (or not) and understand the state.
|
||||
|
||||
**The ux surface:** the GUI shows the file state and provides a `[Delete to turn off]` button that does the same `rm` underneath.
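
A minimal sketch of the file-presence switch, assuming Python 3.9+ and `pathlib`. `feature_is_on` and `delete_to_turn_off` are hypothetical helper names, not Manual Slop's API; the second is the kind of function a `[Delete to turn off]` button could call. It handles directories as well as files because the RAG index in the examples table is a directory.

```python
import shutil
from pathlib import Path


def feature_is_on(artifact: Path) -> bool:
    """The artifact's existence is the switch: present = on, absent = off."""
    return artifact.exists()


def delete_to_turn_off(artifact: Path) -> bool:
    """Remove the artifact (file or directory). Returns True if something was removed."""
    if artifact.is_dir():
        shutil.rmtree(artifact)  # e.g. a chroma_<provider>/ index directory
        return True
    if artifact.exists():
        artifact.unlink()  # e.g. digest.md
        return True
    return False  # already off; nothing to do
```

Turning a feature back on is deliberately not part of this sketch: that is the job of the regeneration command listed for each artifact.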

---

## 2. When to use config flags (the `[ai_settings.toml]` pattern)

**Use config flags when:**

- The feature is *always on* by default; the flag is a way to *opt out* in special circumstances
- The "off" state is *not recoverable* by a single command (it's a persistent preference)
- The user *expects* to manage the feature via the GUI (they're not on the command line)
- The feature's behavior is *complex* (multiple settings, not just on/off)
- The setting is *user-specific* (different users might have different preferences)

**Examples in Manual Slop:**

| Feature | The config | The default | The GUI surface |
|---|---|---|---|
| RAG enabled | `[ai_settings.toml] rag.enabled` | `false` (new projects) | `[X] Enable RAG` checkbox |
| RAG source | `[ai_settings.toml] rag.source` | `project` | `(project / global / none)` radio |
| RAG embedding provider | `[ai_settings.toml] rag.embedding_provider` | `gemini` | dropdown |
| RAG chunk size | `[ai_settings.toml] rag.chunk_size` | `1000` | integer input |
| Auto-aggregate | `[ai_settings.toml] aggregate.auto_aggregate` | `true` | `[X] Auto-aggregate files` |
| Force full | `[ai_settings.toml] aggregate.force_full` | `false` | `[ ] Force full content` |
| Cache TTL (Anthropic) | `[ai_settings.toml] cache.anthropic_ttl_seconds` | `300` (5 min) | integer input |
| Cache TTL (Gemini) | `[ai_settings.toml] cache.gemini_ttl_seconds` | `3600` (1 h) | integer input |
| Knowledge harvest enabled | `[ai_settings.toml] knowledge.harvest_enabled` | `true` | `[X] Enable knowledge harvest` |
| Project context file | `[manual_slop.toml] agent.context_files` | (none) | file picker |

**The principle (per the data-oriented foundation):** *configuration is data*. The GUI checkbox is a *projection* of the config file; the config file is the source of truth.

**The discovery surface:** the user can read `[ai_settings.toml]` and see the state. The TOML is human-readable.

**The UX surface:** the GUI has a settings panel that reads from the TOML, displays it, and writes back on change.

---

## 3. When to use a CLI flag (the sub-pattern)

**Use CLI flags when:**

- The feature is *invoked from the command line* (not from the GUI)
- The flag is a *one-shot* setting (the user doesn't want to edit a config file for a one-time run)
- The default is "on" and the flag is the "off" override

**Examples in Manual Slop:**

| CLI | Flag | Default | Effect |
|---|---|---|---|
| `python -m src.knowledge_harvest` | `--apply` | off (dry-run) | Mutate: harvest + reclaim |
| `python -m src.knowledge_harvest` | `--no-harvest` | off (harvest) | Reclaim only; skip LLM |
| `python -m src.knowledge_harvest` | `--max-harvest-bytes N` | unlimited | Cap the conversation bytes sent to the LLM |
| `python -m src.knowledge_harvest` | `--root PATH` | `~/.manual_slop` | Use a custom knowledge root |
| `pytest` | `--no-header` | off | Don't print the header |
| `pytest` | `-x` | off | Stop on first failure |

**The principle (per the data-oriented foundation):** *the CLI flag is data*. The user types a flag; the value is passed to the function; the function behaves accordingly.
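
A sketch of how the `knowledge_harvest` flags in the table could be declared with the standard-library `argparse`. This is an illustration, not the module's actual parser: the flag names and defaults come from the table, and `build_parser` is a hypothetical name. Each flag ends up as a plain value on the returned namespace, which is "the CLI flag is data" in code.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m src.knowledge_harvest")
    # Default off: without --apply the run is a dry-run (no mutation).
    parser.add_argument("--apply", action="store_true",
                        help="harvest and reclaim (default: dry-run)")
    # Default off: harvesting happens unless --no-harvest is passed.
    parser.add_argument("--no-harvest", action="store_true",
                        help="reclaim only; skip the LLM")
    # None stands for "unlimited".
    parser.add_argument("--max-harvest-bytes", type=int, default=None, metavar="N",
                        help="cap the conversation bytes sent to the LLM")
    parser.add_argument("--root", type=Path, default=Path.home() / ".manual_slop",
                        metavar="PATH", help="use a custom knowledge root")
    return parser
```

`argparse` derives the attribute names from the option strings (`--no-harvest` becomes `args.no_harvest`), so the function that receives the namespace never parses strings itself.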

---

## 4. When to use a feature flag in `metadata.json` (the track flag)

**Use metadata feature flags when:**

- A track's *implementation* depends on a feature (e.g., uses RAG); this is *static* metadata about the track
- The flag is *documented* in the track's `metadata.json` for reviewers
- The flag is *not* a runtime setting (it doesn't change behavior at runtime; it documents intent)

**Examples in Manual Slop** (in `conductor/tracks/<track_id>/metadata.json`):

```json
{
  "uses_rag": true,
  "uses_mma": false,
  "tier": "tier-2",
  "uses_knowledge_harvest": true
}
```

**The principle:** the metadata documents the track's dependencies. A reviewer can read the metadata to understand "this track uses RAG; if you don't have RAG enabled, the track might not work."

---

## 5. The decision tree (the 1-question test)

When adding a new feature, start with this question (the "no" branch needs one follow-up):

```
Q: Is the feature's "off" state recoverable by a single command?
│
├── yes (e.g., regenerate the artifact) ──► File presence
│
└── no (the "off" is a persistent preference)
    │
    └── Q: Is the feature invoked from the CLI?
        │
        ├── yes ──► CLI flag (sub-pattern of config)
        │
        └── no ──► Config flag + GUI checkbox
```

**The decision is about the *kind* of flag, not the *implementation*.** The choice between file presence and config is about user expectations, not technical constraints.
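
The same tree written as a function, as a sketch for a reviewer checklist or lint script; `FlagKind` and `choose_flag_kind` are illustrative names, not part of Manual Slop. The metadata flag is left out on purpose: it documents a track and is never a runtime switch.

```python
from enum import Enum


class FlagKind(Enum):
    FILE_PRESENCE = "file presence"
    CLI_FLAG = "CLI flag (sub-pattern of config)"
    CONFIG_FLAG = "config flag + GUI checkbox"


def choose_flag_kind(off_recoverable_by_command: bool, invoked_from_cli: bool) -> FlagKind:
    # First question: can one command bring the "off" state back to "on"?
    if off_recoverable_by_command:
        return FlagKind.FILE_PRESENCE
    # Otherwise "off" is a persistent preference; where is the feature invoked?
    if invoked_from_cli:
        return FlagKind.CLI_FLAG
    return FlagKind.CONFIG_FLAG
```

The second argument only matters on the "no" branch, which mirrors the tree: recoverability is checked first.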

---

## 6. The interaction between file presence and config (the layered case)

**A feature can have both.** Example:

- The knowledge digest is gated by **file presence** (`digest.md` exists) for the *injection* of the `{knowledge}` block.
- The knowledge harvest is gated by **config** (`[ai_settings.knowledge] harvest_enabled = true`) for the *automatic regeneration* of the digest after a discussion ends.

**The two flags are layered:**
- File presence controls *whether the digest is injected* (a per-turn decision)
- The config flag controls *whether the digest is regenerated* (a per-discussion decision)

**The user can turn off the entire feature** by running `rm digest.md` AND setting `harvest_enabled = false`. Only then is the feature fully off.

**The user can turn on a single layer:**
- `touch digest.md` turns on injection (the file starts empty; the next harvest populates it)
- Setting `harvest_enabled = true` turns on auto-regeneration

**The GUI surface** is separate per layer:
- The `Knowledge` panel shows the digest file state and provides `[Delete to turn off]` and `[Regenerate]` buttons
- The `AI Settings > Knowledge` panel has the `harvest_enabled` checkbox

**The UX:** the user has *two* knobs (file presence for "what's injected now"; config for "what gets regenerated"). Each is explicit about what it controls.
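
A sketch of how the two layers combine, assuming the digest path and the `harvest_enabled` value have already been resolved; `KnowledgeState` and `knowledge_state` are hypothetical names. Each knob maps to exactly one field, so a GUI can show both without one hiding the other.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KnowledgeState:
    injected: bool     # file-presence layer: is digest.md injected this turn?
    regenerated: bool  # config layer: is the digest rebuilt after a discussion?

    @property
    def fully_off(self) -> bool:
        # The feature is only fully off when both knobs are off.
        return not (self.injected or self.regenerated)


def knowledge_state(digest_path: Path, harvest_enabled: bool) -> KnowledgeState:
    return KnowledgeState(injected=digest_path.is_file(), regenerated=harvest_enabled)
```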

---

## 7. The forbidden patterns (the "don't do this" list)

| Pattern | Why it's forbidden |
|---|---|
| File presence for a feature with no regeneration path | The user can't turn the feature back on without manual intervention |
| Config flag for a side artifact | The user can't `rm` the artifact to clean up disk |
| File presence *and* config flag for the *same* behavior | Confusing; the user doesn't know which to use |
| CLI flag that has no default ("off" by default) | The user has to remember the flag every time |
| GUI checkbox that doesn't write to the config file | The change is lost on restart |
| `metadata.json` flag that changes runtime behavior | The metadata is for documentation, not for behavior |
| Hidden file (in `~/.cache/` or `/tmp/`) as a flag | The user can't find it |
| Symlink-based flag | Platform-specific; debugging nightmare |
| Env var as the only flag | The user can't discover it via the GUI or the docs |

---

## 8. The cross-references

- `conductor/code_styleguides/knowledge_artifacts.md` §5: the knowledge digest "delete to turn off" example
- `conductor/code_styleguides/data_oriented_design.md` §1.2: "Design around a model of the world" (the anti-pattern)
- `conductor/code_styleguides/cache_friendly_context.md`: the cache TTL GUI surface (a config flag + GUI checkbox)
- `conductor/code_styleguides/rag_integration_discipline.md`: the RAG opt-in (a config flag + GUI checkbox)
- `src/paths.py`: the path resolution; the file-presence flags live under `~/.manual_slop/`
- `docs/Readme.md` (human-facing): the high-level overview
- `./docs/AGENTS.md` (agent-facing): the per-tier reading path
@@ -0,0 +1,410 @@
# Knowledge Artifacts (the harvest pattern)

**Status:** Styleguide; codifies the knowledge harvest pattern: category files, provenance, sha256 ledger, digest regeneration, "delete to turn off."
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md` §4; `conductor/code_styleguides/feature_flags.md`; `docs/guide_knowledge_curation.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4.

> **What this is.** The 4th memory dimension (per `agent_memory_dimensions.md` §4) is the durable, provenance-aware, user-editable knowledge store. It's a *layer*, not a *snapshot*: category files are the source of truth; the digest is a projection; the ledger is the audit log. This styleguide names the files, the formats, the harvest workflow, and the "delete to turn off" pattern.

---

## 0. The one-glance directory layout

```
~/.manual_slop/knowledge/
├── facts.md                     # - {statement} {provenance}
├── decisions.md                 # - {statement, reason} {provenance}
├── questions.md                 # - {question} {provenance}
├── playbooks.md                 # - **{name}**: {steps} {provenance}
├── tasks.md                     # ## Open / ## Done
├── files/
│   └── {file_id}.md             # per-file notes (keyed by device:inode)
├── digest.md                    # bounded 4KB; the projection; "delete to turn off"
├── ledger.json                  # sha256-of-content audit log
└── prompts/
    └── harvest-conversation.md  # user-editable harvest prompt
```

---

## 1. The category files (the source of truth)

### 1.1 `facts.md` (durable statements)

```markdown
# Facts

- The MCP dispatch uses a flat if/elif chain. 4 places, 45 tools. [from: 2026-05-12-investigate-dispatch, 2026-05-12]
- ai_client.py has 5 separate per-provider history lists, each with their own lock. Switching providers mid-session loses history. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- RAG is opt-in. Default-off in new projects. [from: 2026-06-12-rag-discipline, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. Plain markdown. Append-only. User-editable.
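
A minimal sketch of writing one item in this shape. The provenance format `[from: {source}, {date}]` is inferred from the examples above; `format_item` and `append_item` are hypothetical helpers, not nagent's or Manual Slop's code. Opening the file in append mode keeps it append-only, so existing lines (including the user's edits) are never rewritten.

```python
from pathlib import Path


def format_item(statement: str, source: str, date: str) -> str:
    """Render '- {statement} [from: {source}, {date}]' as a single line."""
    one_line = " ".join(statement.split())  # collapse newlines and extra spaces
    return f"- {one_line} [from: {source}, {date}]"


def append_item(category_file: Path, line: str, title: str) -> None:
    """Append one line; create the file with its '# {title}' heading if missing."""
    if not category_file.exists():
        category_file.write_text(f"# {title}\n\n", encoding="utf-8")
    with category_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Collapsing whitespace matters because the shape is one item per line: a multi-line statement would break the list and the newest-first reversal used by the digest.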

### 1.2 `decisions.md` (decisions with reasons)

```markdown
# Decisions

- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]
- Cache TTL defaults to 5 min (Anthropic) + 60 min (Gemini); configurable per-discussion. [from: 2026-06-12-cache-strategy, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. The "why" lives in the LLM's harvest output; the user's edits override.

### 1.3 `questions.md` (unanswered questions)

```markdown
# Questions

- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]
- How should the knowledge digest TTL be exposed in the GUI? [from: 2026-06-12-cache-ttl, 2026-06-12]
```

**The shape:** `- {question} {provenance}`. Open questions are *valuable*: they're the TODO list the next session can act on.

### 1.4 `playbooks.md` (reusable sequences)

```markdown
# Playbooks

- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
- **Stable-to-Volatile Cache Ordering**: identify Instance: boundary -> pass to --cache-prefix-chars. [from: 2026-06-12-candidate-12, 2026-06-12]
- **Candidate Verification (TBD)**: read src/ai_client.py:run_discussion_compression -> check failure mode. [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- **{name}**: {steps} {provenance}`. Playbooks are the "I did this once; here it is" record. Future workers use them directly.

### 1.5 `tasks.md` (open and done)

```markdown
# Tasks

## Open
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]
- Verify Candidate 15 by reading src/ai_client.py:run_discussion_compression. [from: 2026-06-12-candidate-15, 2026-06-12]

## Done
- Read nagent source in full (18 files). [from: 2026-05-15, 2026-05-15]
- Wrote v2.3 review (272KB / 3965 lines). [from: 2026-06-12-v2.3, 2026-06-12]
```

**The shape:** `- {task} {provenance}`. The two section headings are maintained manually; the harvest places open items under `## Open` and done items under `## Done`.

### 1.6 `files/{file_id}.md` (per-file notes)

```markdown
# /repo/src/ai_client.py

- Uses `cache_control: {"type": "ephemeral"}` blocks for Anthropic caching. [from: 2026-06-12-investigate-cache, 2026-06-12]
- The 5 per-provider history lists are gated by their own locks. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- `run_discussion_compression` failure mode: TBD (Candidate 15). [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- {note} {provenance}`. Keyed by `file_id` (the `st_dev:st_ino` of the file), so the notes survive renames within the same filesystem.

**The file_id pattern** (per nagent's `bin/helpers/nagent_file_edit_lib.py:file_id_for_path`):

```python
from pathlib import Path


def file_id_for_path(path: Path) -> str:
    """Stable file identity across renames. Returns 'device:inode'."""
    stat = path.stat()
    return f"{stat.st_dev}:{stat.st_ino}"
```

**The "files" category in the harvest output** has a special branch: if the path resolves to an existing file, the note goes to `knowledge/files/{file_id}.md`; if not, the note falls back to `facts.md` as `{path}: {note} {provenance}`. The note survives; it just loses the per-file binding.
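
The branch described above, sketched as one function. `route_file_note` is a hypothetical name; it re-derives the file id inline with the same `st_dev:st_ino` rule as `file_id_for_path` so it runs on its own, and the `# {path}` heading for a new per-file note is taken from the example in 1.6. One caveat this sketch does not address: `:` is a reserved character in Windows file names, so a cross-platform version would need a different separator.

```python
from pathlib import Path


def route_file_note(knowledge_root: Path, path: str, note: str, provenance: str) -> Path:
    """Write a harvested 'files' note; return the category file it landed in."""
    target = Path(path)
    if target.is_file():
        stat = target.stat()
        file_id = f"{stat.st_dev}:{stat.st_ino}"  # same rule as file_id_for_path
        dest = knowledge_root / "files" / f"{file_id}.md"
        line = f"- {note} {provenance}"
        header = f"# {target}\n\n"
    else:
        # The path no longer resolves: keep the note, lose the per-file binding.
        dest = knowledge_root / "facts.md"
        line = f"- {path}: {note} {provenance}"
        header = "# Facts\n\n"
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        dest.write_text(header, encoding="utf-8")
    with dest.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return dest
```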

---

## 2. The digest (`digest.md`)

The digest is a *projection* of the category files, bounded to **4KB**. It's injected as the `{knowledge}` block in the initial context.

**The format** (per nagent's `regenerate_digest`):

```markdown
# Knowledge digest
(regenerated by nagent-gc; edit the category files, not this file)

## Open tasks
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]

## Open questions
- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]

## Decisions
- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]

## Facts
- nagent has 5 providers; Manual Slop has 8. [from: 2026-06-12-v2.3, 2026-06-12]

## Playbooks
- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
```

**The ordering is fixed:** Open tasks, Open questions, Decisions, Facts, Playbooks (per nagent's `DIGEST_SECTIONS = (('Open tasks', 'tasks_open'), ('Open questions', 'questions'), ('Decisions', 'decisions'), ('Facts', 'facts'), ('Playbooks', 'playbooks'))`).

**Within each section, newest first** (because the category files are append-only; reversing gives newest-first).

**Truncation:** if the sections don't fit in 4KB, the rest is truncated with a visible `(truncated; see the category files for the rest)` note.

**"Delete to turn off":** if all sections are empty, the digest is *deleted*:

```python
# In regenerate_digest
if not sections:
    if target.is_file():
        target.unlink()  # delete to turn off
    return None
```
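
A minimal sketch of the whole regeneration step, under stated assumptions: each category is already parsed into a list of item lines in file (append) order, "4KB" means 4096 UTF-8 bytes, and truncation happens at line boundaries. This `regenerate_digest` is a standalone illustration with a hypothetical signature, not nagent's function.

```python
from pathlib import Path
from typing import Optional

DIGEST_MAX_BYTES = 4 * 1024  # assumes "4KB" = 4096 bytes
DIGEST_SECTIONS = (("Open tasks", "tasks_open"), ("Open questions", "questions"),
                   ("Decisions", "decisions"), ("Facts", "facts"),
                   ("Playbooks", "playbooks"))
TRUNCATION_NOTE = "(truncated; see the category files for the rest)"


def regenerate_digest(items: dict, target: Path) -> Optional[str]:
    """Rebuild digest.md from parsed category items; delete it if all are empty."""
    # Fixed section order; reversing each append-only list puts newest first.
    sections = [(title, list(reversed(items[key])))
                for title, key in DIGEST_SECTIONS if items.get(key)]
    if not sections:
        if target.is_file():
            target.unlink()  # delete to turn off
        return None
    lines = ["# Knowledge digest",
             "(regenerated; edit the category files, not this file)"]
    for title, rows in sections:
        lines += ["", f"## {title}", *rows]
    # Reserve room for the note so the written file never exceeds the budget.
    budget = DIGEST_MAX_BYTES - len(TRUNCATION_NOTE.encode("utf-8")) - 1
    kept, used = [], 0
    for line in lines:
        cost = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if used + cost > budget:
            kept.append(TRUNCATION_NOTE)
            break
        kept.append(line)
        used += cost
    text = "\n".join(kept) + "\n"
    target.write_text(text, encoding="utf-8")
    return text
```

Because the fixed order puts open tasks and questions first, truncation drops facts and playbooks before it drops actionable items.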

**The injection point** (in `aggregate.py:run`):

```python
# In aggregate.py:run (the consumer of the digest)
knowledge_digest_path = paths.knowledge_dir() / "digest.md"
if knowledge_digest_path.is_file():
    knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
    stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")
```

---

## 3. The ledger (`ledger.json`)

The ledger is the **sha256-of-content audit log**. It gates deletion on a proven harvest.

**The format:**

```json
{
  "entries": {
    "<sha256-of-conversation-content>": {
      "path": "/home/user/.nagent/conversations/<name>-<uuid>",
      "status": "harvested",
      "at": "2026-06-12T14:23:45.123456+00:00",
      "items": {
        "facts": 3,
        "decisions": 2,
        "tasks_done": 1,
        "tasks_open": 0,
        "questions": 1,
        "playbooks": 0,
        "files": 1
      },
      "deleted": true
    },
    "<sha256-of-another-conversation>": {
      "path": "...",
      "status": "harvest-failed",
      "at": "2026-06-12T14:24:00.000000+00:00",
      "deleted": false,
      "error": "provider 'openai' not available"
    }
  }
}
```

**The status values:**

| Status | Meaning | Action |
|---|---|---|
| `harvested` | LLM distillation succeeded; items appended to category files | reclaim (unlink) |
| `harvest-failed` | LLM distillation failed after retries | keep the conversation; record the error |
| `deleted-unharvested` | User passed `--no-harvest`; the conversation is reclaimed without LLM | reclaim (unlink) |
| `too-large` | File > 1MB; kept without harvesting | keep |

**The sha256-of-content dedup:** two conversations with the same content share a ledger entry. The second is reclaimed without paying the LLM cost again.
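
A minimal sketch of the dedup check, assuming the ledger has been loaded into the dict shape shown above. The key is the SHA-256 of the conversation's bytes, so a renamed or copied conversation hashes to the same entry. `content_sha256` and `already_harvested` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def content_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of the file's bytes, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def already_harvested(ledger: dict, sha: str) -> bool:
    """True if identical content was harvested before: reclaim without an LLM call."""
    entry = ledger.get("entries", {}).get(sha)
    return entry is not None and entry.get("status") == "harvested"
```

Only a `harvested` entry short-circuits the LLM call here; whether a `harvest-failed` entry should be retried on the next run is left to the caller in this sketch.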

---

## 4. The harvest workflow

### 4.1 The 7-category schema (the LLM output)

The LLM's harvest output is strict JSON (no prose, no markdown fence):

```json
{
  "facts": [
    {"statement": "The system has 4 memory dimensions", "detail": ""}
  ],
  "decisions": [
    {"statement": "Knowledge harvest is a complement to curation + discussion", "detail": "not a RAG replacement"}
  ],
  "tasks_done": [
    {"statement": "v2.3 review identified 10 future-track candidates", "detail": ""}
  ],
  "tasks_open": [
    {"statement": "Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md", "detail": "Candidate 14"}
  ],
  "questions": [
    {"statement": "Where does intent resolution live — per-verb, per-block, or global?", "detail": ""}
  ],
  "playbooks": [
    {"name": "Knowledge Harvest", "steps": "scan -> classify -> LLM-distill -> append -> digest -> reclaim"}
  ],
  "files": [
    {"path": "/repo/src/ai_client.py", "note": "Cache TTL GUI: per-discussion state; cache hit rate per provider"}
  ]
}
```

**The prompt** (in `prompts/harvest-conversation.md`; user-editable, root-first resolution):

```markdown
# Harvest durable knowledge from a manual_slop conversation

You are given one conversation (or a summary of one). Extract only knowledge that
stays useful after this conversation is deleted. Return only JSON in exactly this
form (no prose, no markdown fence):

[the 7-category schema above]

Category rules:
- facts: durable statements about systems, repositories, tools, environments, or
  constraints that were learned, not assumed.
- decisions: choices that were made, with the why in `detail`.
- tasks_done: concrete work completed in this conversation.
- tasks_open: work that was started, planned, or requested but not finished.
- questions: questions raised and never answered.
- playbooks: command sequences or processes that worked and are reusable; `steps`
  is the runnable sequence.
- files: a note tied to one specific file path (use the absolute path seen in
  the conversation).

General rules:
- Empty arrays are valid and expected: most conversations contain nothing durable.
  Do not invent items to fill categories.
- One item per distinct piece of knowledge; keep `statement` to one sentence.
- `detail` is optional context; omit it or use "" when the statement stands alone.
- Do not include conversation mechanics, tool output noise, retries, or one-off
  trivia (timestamps, token counts, transient errors).
```

### 4.2 The retry budget

`HARVEST_MAX_ATTEMPTS = 2`. The retry is at the parse level (not the API level):

```python
import json

HARVEST_MAX_ATTEMPTS = 2


def harvest_conversation(path, provider, model, config_path, *, generate, summarize=None):
    # read_or_summarize, harvest_prompt_path, build_harvest_prompt and
    # parse_harvest_json are module helpers (not shown here).
    content = read_or_summarize(path, provider, model)
    template = harvest_prompt_path().read_text(encoding="utf-8").strip()
    last_error = None
    for attempt in range(HARVEST_MAX_ATTEMPTS):
        # Every attempt after the first gets the retry suffix appended.
        prompt = build_harvest_prompt(template, path.name, content, retry=attempt > 0)
        response = generate(prompt, provider, model)
        try:
            return parse_harvest_json(response)
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc
    raise RuntimeError(f"harvest output invalid after {HARVEST_MAX_ATTEMPTS} attempts: {last_error}")
```

**The retry suffix:** on retry, `\nYour previous reply was not valid JSON. Return only the JSON object.\n` is appended to the prompt. The prompt is otherwise rebuilt from the template, so the LLM receives the same input plus a one-line correction; its previous (malformed) reply is not resent.

**The strict parser** (tolerates a code fence; otherwise strict):

```python
import json
import re

# Assumed definitions for illustration; the real module constants may differ.
JSON_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)
ITEM_CATEGORIES = ("facts", "decisions", "tasks_done", "tasks_open",
                   "questions", "playbooks", "files")


def parse_harvest_json(text: str) -> dict:
    stripped = text.strip()
    fence = JSON_FENCE.match(stripped)  # tolerates ```json ... ```
    if fence:
        stripped = fence.group(1).strip()
    payload = json.loads(stripped)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(payload, dict):
        raise ValueError("harvest output is not a JSON object")
    harvested = {}
    for category in ITEM_CATEGORIES:
        rows = payload.get(category, [])
        # A non-list value for a category is treated as "no items".
        harvested[category] = rows if isinstance(rows, list) else []
    return harvested
```
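
To see the retry budget and the retry suffix working together, here is a self-contained sketch with a stub `generate` that fails once and then succeeds. `build_prompt`, `RETRY_SUFFIX`, and `harvest_with_retry` are hypothetical stand-ins for the module helpers, and the parser is reduced to `json.loads` plus the object check.

```python
import json

HARVEST_MAX_ATTEMPTS = 2
RETRY_SUFFIX = "\nYour previous reply was not valid JSON. Return only the JSON object.\n"


def build_prompt(template: str, content: str, retry: bool) -> str:
    prompt = f"{template}\n\n{content}"
    return prompt + RETRY_SUFFIX if retry else prompt


def harvest_with_retry(template: str, content: str, generate) -> dict:
    last_error = None
    for attempt in range(HARVEST_MAX_ATTEMPTS):
        response = generate(build_prompt(template, content, retry=attempt > 0))
        try:
            payload = json.loads(response)
            if not isinstance(payload, dict):
                raise ValueError("harvest output is not a JSON object")
            return payload
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
    raise RuntimeError(f"harvest output invalid after {HARVEST_MAX_ATTEMPTS} attempts: {last_error}")
```

A stub that first answers `"Sure! Here is the JSON:"` and then `'{"facts": []}'` makes exactly two calls, and only the second prompt ends with the suffix.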

### 4.3 The size limits (the budgets)

| Constant | Value | Why |
|---|---|---|
| `SUMMARIZE_THRESHOLD_BYTES` | 64 KB | Files > 64KB get summarized first |
| `MAX_HARVEST_SOURCE_BYTES` | 1 MB | Files > 1MB are kept (not harvested) |
| `DIGEST_MAX_BYTES` | 4 KB | The bounded digest size |
| `HARVEST_MAX_ATTEMPTS` | 2 | Retry budget on parse failure |

**The "too-large" branch** (the budget guard):

```python
# Inside the per-artifact loop
if artifact.size_bytes > MAX_HARVEST_SOURCE_BYTES:
    entries[sha] = {"status": "too-large", "deleted": False}
    emit(f"kept (too large): {label}")
    continue
```
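
The two source-size budgets reduce to one classification per conversation. A sketch, assuming KB and MB mean 1024 and 1024² bytes (the table does not say) and that the comparisons are strict `>` as written; `classify_by_size` and its return labels are illustrative names.

```python
SUMMARIZE_THRESHOLD_BYTES = 64 * 1024
MAX_HARVEST_SOURCE_BYTES = 1024 * 1024


def classify_by_size(size_bytes: int) -> str:
    """Return 'too-large' (keep, no LLM), 'summarize-first', or 'direct'."""
    if size_bytes > MAX_HARVEST_SOURCE_BYTES:
        return "too-large"        # ledger status "too-large"; conversation kept
    if size_bytes > SUMMARIZE_THRESHOLD_BYTES:
        return "summarize-first"  # the summary, not the raw file, goes to the LLM
    return "direct"
```

The larger limit is checked first, so a 2MB file is never sent for summarization.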

### 4.4 The dry-run-by-default safety

The harvest CLI defaults to **dry-run**. Without `--apply`, the CLI classifies, estimates cost, and prints a report. **No mutation.**

```bash
$ python -m src.knowledge_harvest
artifacts: live:42, user-kept:3, prune:0, harvest:17, keep:1
harvest candidates: 2.3MB (~600K input tokens), prune candidates: 0B
dry run; pass --apply to harvest and reclaim

$ python -m src.knowledge_harvest --apply
reclaimed: 2.3MB
harvested items: facts:42, decisions:18, tasks_done:7, tasks_open:3, questions:5, playbooks:2, files:11
digest: /home/user/.manual_slop/knowledge/digest.md
ledger: /home/user/.manual_slop/knowledge/ledger.json
```
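
The `~600K input tokens` figure in the dry-run output is consistent with a rough 4-bytes-per-token heuristic: 2.3 MB is about 2,411,724 bytes, and 2,411,724 / 4 ≈ 603K. A sketch of such a report line, assuming that heuristic (real tokenizers vary by model and content) and MB = 1024² bytes; `dry_run_summary` is a hypothetical name, and it only reads sizes, never files, which is the dry-run contract.

```python
BYTES_PER_TOKEN = 4  # rough heuristic; an assumption, not a tokenizer


def dry_run_summary(candidate_sizes: list[int]) -> str:
    """Report the harvest cost estimate without touching any file."""
    total = sum(candidate_sizes)
    tokens = total // BYTES_PER_TOKEN
    return (f"harvest candidates: {total / 1024 ** 2:.1f}MB "
            f"(~{tokens // 1000}K input tokens)")
```

For a single 2,411,724-byte candidate this returns `harvest candidates: 2.3MB (~602K input tokens)`.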

---

## 5. The "delete to turn off" pattern (per `feature_flags.md`)

**The principle.** Feature flags should be data, not config. If a feature is gated by the presence of a file, the user can turn it off by deleting the file. No GUI toggle, no env var, no `config.toml` edit. Just `rm`.

**The knowledge harvest pattern:** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block is injected. Re-enable by running `python -m src.knowledge_harvest --apply` (which regenerates the digest).

**The implementation:**

```python
# In aggregate.py:run (the consumer)
knowledge_digest_path = paths.knowledge_dir() / "digest.md"
if knowledge_digest_path.is_file():
    knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
    stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")
# else: skip; the file is the switch
```

**The general pattern** recurs in 3 places:
1. `regenerate_digest` deletes the digest when sections are empty
2. The `aggregate.py:run` injection check is the load-bearing one
3. The `Knowledge` panel shows the file state (so the user knows what to do)

**The alternative** (config toggle) is also supported: `[ai_settings.knowledge].digest_enabled = false`. See `feature_flags.md` for the rule on when to use file presence vs config.

---

## 6. The graceful failure modes

| Failure | Handling |
|---|---|
| LLM returns invalid JSON | Retry (up to 2 attempts); on 2nd failure, mark `harvest-failed` in the ledger; keep the conversation |
| File > 1MB | Mark `too-large` in the ledger; keep the conversation |
| File > 64KB | Summarize via `run_subagent_summarization` (or equivalent); use the summary as the LLM input |
| Provider not available | Mark `harvest-failed`; keep the conversation |
| Network timeout | Same; mark `harvest-failed`; keep the conversation |
| Disk full writing to category files | Raise; mark `harvest-failed`; keep the conversation (don't reclaim) |

**The pattern:** critical operations complete; non-essential post-steps are best-effort. The marker is visible. The user can re-run.
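
A sketch of the "keep on failure" rule for one conversation, assuming the ledger dict shape from the ledger section and a `harvest` callable that raises on any failure (invalid JSON after retries, provider unavailable, timeout, disk full). `process_conversation` is a hypothetical name. The point is the ordering: a ledger entry is written either way, and the file is unlinked only after a successful harvest.

```python
from datetime import datetime, timezone
from pathlib import Path


def process_conversation(path: Path, sha: str, ledger: dict, harvest) -> dict:
    """Harvest one conversation; reclaim it only if the harvest succeeded."""
    entry = {"path": str(path), "at": datetime.now(timezone.utc).isoformat()}
    try:
        items = harvest(path)  # returns {category: [rows]}; raises on failure
    except Exception as exc:  # record the failure visibly and keep the file
        entry.update(status="harvest-failed", deleted=False, error=str(exc))
    else:
        path.unlink()  # reclaim only after a proven harvest
        entry.update(status="harvested", deleted=True,
                     items={k: len(v) for k, v in items.items()})
    ledger.setdefault("entries", {})[sha] = entry
    return entry
```

Because the failed conversation stays on disk, re-running the harvest later retries it with no extra bookkeeping.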

---

## 7. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` §4: the knowledge dim in context
- `conductor/code_styleguides/feature_flags.md`: the "delete to turn off" pattern
- `conductor/code_styleguides/cache_friendly_context.md`: where the digest is injected (layer 7, stable)
- `conductor/code_styleguides/data_oriented_design.md` §1.2: "Design around a model of the world" (the anti-pattern)
- `data_oriented_error_handling_20260606`: the `Result[T, ErrorInfo]` pattern for the harvest LLM call
- `docs/guide_knowledge_curation.md`: the user-facing deep-dive
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4: the nagent pattern that informed this styleguide
@@ -0,0 +1,284 @@
# RAG Integration Discipline

**Status:** Styleguide; codifies when and how to wire RAG (the opt-in, semantic-search memory dimension) into Manual Slop features.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md` §3; `conductor/code_styleguides/data_oriented_design.md` §9; `docs/guide_rag.md`.

> **What this is.** RAG is the opt-in, semantic-search memory dimension. It's *useful* (semantic search across large codebases; concept-level discovery; cross-file pattern matching grep can't do). It's also *fuzzy* (vector similarity, not exact) and *opaque* (the vector store is not user-editable). The discipline: be conservative about when to wire it in. The wrong shape for the right question is a common mistake.

---

## 0. The 6 rules (the one-glance table)

| # | Rule | Why |
|---|---|---|
| 1 | RAG is **opt-in**. Default-off in new projects | Most features don't need it; the cost of unnecessary RAG is the embedding-provider round trip + the storage cost |
| 2 | RAG **complements**; it never **replaces** | Curation / Discussion / Knowledge are the durable, user-editable dimensions; RAG is the fuzzy, semantic search |
| 3 | RAG results display with **provenance** | The user needs to know which file and which chunk produced the result |
| 4 | RAG **never mutates state** | No auto-injection of RAG results into `disc_entries`; no auto-update of `FileItem`; no auto-write to disk |
| 5 | RAG integration is **feature-gated** | A feature must explicitly request RAG in its scope; RAG is not the default for "give me context" |
| 6 | RAG failure is **graceful** | A failed search returns `Result.empty` or an empty list; never crashes the request |

---

## 1. RAG is opt-in (Rule 1)

**The default is OFF.** A new project opens with `rag_enabled = false`. The user opts in via the AI Settings panel.

**The rationale.** RAG is not free:
- The embedding-provider round trip adds latency (200-500ms per call, per provider)
- The storage cost grows with the indexed corpus (per `RAGConfig.chunk_size` and `chunk_overlap`)
- The dim-mismatch fix at `16412ad5` shows that switching providers requires a full re-index (the existing collection is incompatible with the new provider's embedding dimension)

For a project that doesn't *need* semantic search (e.g., a small Python project with 20 files), RAG is overhead, not benefit.

**The opt-in surface.** Per the existing `[ai_settings.toml]` pattern:
- `[X] Enable RAG` checkbox
- Source: `(project / global / none)` radio
- Embedding provider: `(gemini / local)` dropdown
- Chunk size: integer (default 1000)
- Chunk overlap: integer (default 200)

**The opt-out is also supported.** `rm -r ~/.manual_slop/.slop_cache/chroma_<provider>/` deletes the index (it is a directory, so plain `rm` is not enough). Re-enabling requires a full re-index.

**The opt-out via the AI Settings:**
```toml
[ai_settings.rag]
enabled = false  # default for new projects
```

**The opt-in is explicit:**
```toml
[ai_settings.rag]
enabled = true
source = "project"
embedding_provider = "gemini"
chunk_size = 1000
chunk_overlap = 200
```
|
||||
|
---

## 2. RAG complements; it never replaces (Rule 2)

**The 4 memory dimensions** (per `conductor/code_styleguides/agent_memory_dimensions.md`):

| Dim | SSDL | Use when |
|---|---|---|
| Curation | `[Q]` | "How to render a file" |
| Discussion | `o==>` | "What was said in this chat" |
| **RAG** | `[Q]` | **"What similar content exists"** |
| Knowledge | `o==>` | "What we learned from past runs" |

**The rule.** RAG is the *fuzzy semantic search* dimension. It is NOT:

- A replacement for curation (use `FileItem.view_mode` + Fuzzy Anchors)
- A replacement for discussion (use `disc_entries`)
- A replacement for knowledge (use `knowledge/digest.md`)

**The cross-cutting principle.** When a feature asks "give me context," the answer is *not* "enable RAG." The answer is "which of the 4 dimensions is the right home?", and the 4-dimension decision tree is the test.

**The "complement" examples:**

- A new discussion opens: render the active preset's `FileItem`s (curation) + the `disc_entries` (discussion) + the knowledge digest (knowledge). *Optionally* append `{rag-context}` if the user has opted in.
- The LLM asks "what's the execution clutch?": try knowledge first (in case the user has recorded it as a durable concept), then discussion (search the prior entries for "clutch"), then RAG (semantic search across the indexed codebase), and finally curation (the files the user has explicitly configured).
- The user asks "where does X happen?": RAG is the *natural* shape for this question (semantic search). Use it.
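
The lookup order in the second example can be sketched as a first-hit-wins chain. `resolve_context` is a hypothetical helper, not a project function, and stopping at the first non-empty dimension is an assumption; a real feature may prefer to merge results from several dimensions. RAG is skipped entirely unless the user has opted in (Rule 1).

```python
from typing import Callable, Optional

# A lookup takes the question and returns matching snippets (possibly empty).
Lookup = Callable[[str], list[str]]


def resolve_context(
    question: str,
    lookups: dict[str, Lookup],
    rag_enabled: bool,
    order: tuple[str, ...] = ("knowledge", "discussion", "rag", "curation"),
) -> Optional[tuple[str, list[str]]]:
    """Return (dimension, snippets) from the first dimension with a hit, else None."""
    for dim in order:
        if dim == "rag" and not rag_enabled:
            continue  # Rule 1: RAG is consulted only after an explicit opt-in
        lookup = lookups.get(dim)
        if lookup is None:
            continue  # dimension not wired up for this feature
        hits = lookup(question)
        if hits:
            return dim, hits
    return None
```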
|
---

## 3. Provenance required (Rule 3)

**The principle.** When RAG returns results, the user must be able to see *which file* and *which chunk* produced each result. No black boxes.

**The RAG result shape** (per `RAGEngine.search`):

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    file_path: str      # the absolute path
    chunk_offset: int   # byte offset within the file
    chunk_length: int   # length in bytes
    content: str        # the matched text
    similarity: float   # the cosine similarity
```

**The display in the LLM context** (the `{rag-context}` block):

```
{rag-context}
## src/ai_client.py:512-768 (similarity: 0.87)
...content...

## src/aggregate.py:142-289 (similarity: 0.82)
...content...
{/rag-context}
```

**The display in the GUI** (the per-result tooltip):

```
[Anthropic cache-aware send]
File: src/ai_client.py:512-768
Similarity: 0.87
Click to jump to file
```

**The provenance is not optional.** If a result has no provenance, it doesn't go in the context.

**The cross-references.** The dim-mismatch fix at `16412ad5` shows why provenance matters: switching embedding providers silently corrupted the index because the new embeddings had a different dimension. The provenance (file path + chunk offset) is what makes the index rebuildable from the source files.
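
A sketch of how `append_rag_block` might render the `{rag-context}` block while enforcing "no provenance, no context". The function name comes from the pseudocode in Rule 4, but this body is hypothetical. It assumes the `path:start-end` label is the byte range from `chunk_offset` to `chunk_offset + chunk_length`, and it treats an empty path or an empty range as missing provenance.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:  # same shape as the dataclass above
    file_path: str
    chunk_offset: int
    chunk_length: int
    content: str
    similarity: float


def has_provenance(r: SearchResult) -> bool:
    """A result is citable only if it names a file and a non-empty byte range."""
    return bool(r.file_path) and r.chunk_offset >= 0 and r.chunk_length > 0


def append_rag_block(prompt: str, results: list[SearchResult]) -> str:
    """Return a NEW prompt string with a {rag-context} block; never mutates inputs."""
    kept = [r for r in results if has_provenance(r)]
    if not kept:
        return prompt  # nothing citable, so no block at all
    sections = []
    for r in kept:
        end = r.chunk_offset + r.chunk_length
        header = f"## {r.file_path}:{r.chunk_offset}-{end} (similarity: {r.similarity:.2f})"
        sections.append(header + "\n" + r.content)
    return prompt + "\n{rag-context}\n" + "\n\n".join(sections) + "\n{/rag-context}\n"
```

The function returns a new string and touches nothing else, which is the property Rule 4 relies on.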
|
---

## 4. RAG never mutates state (Rule 4)

**The principle.** RAG is a *query* dimension. It returns data; it does not write data.

**The mutation rules:**

- RAG results **do NOT** go into `disc_entries`
- RAG results **do NOT** update `FileItem` curation state
- RAG results **do NOT** write to disk
- RAG results **do NOT** trigger knowledge harvest
- RAG results **do NOT** modify the system prompt or persona

**The exception (none).** No feature should mutate state from RAG results. If a feature wants to "remember" something from RAG, the user must explicitly say "add that to the discussion" (which appends a `role: "User"` entry to `disc_entries`) or "harvest that into knowledge" (which runs the harvest workflow).

**The boundary in code:**

```python
# In ai_client.py: send() is the integration point (argument lists elided)
def send(self, *args, **kwargs):
    prompt = aggregate.build(*args, **kwargs)
    if config.rag_enabled:
        rag_result = rag_engine.search(prompt, k=N)
        if rag_result.ok and rag_result.data:
            prompt = append_rag_block(prompt, rag_result.data)  # READ ONLY: returns a new string
    # NO mutation of: disc_entries, FileItem, knowledge files
    return self._send_to_provider(prompt)  # i.e. the per-provider _send_<provider>(prompt, ...)
```

**The mutation must happen in a different function, called explicitly by the user or by the LLM with HITL (human-in-the-loop) approval.**
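
A sketch of that separate, explicit path: in this example it is the only function that writes a RAG result into `disc_entries`, and without approval it does nothing. The function name and the `{"role": ..., "content": ...}` entry shape are assumptions for illustration; the real `disc_entries` schema may differ.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:  # same shape as in Rule 3
    file_path: str
    chunk_offset: int
    chunk_length: int
    content: str
    similarity: float


def add_rag_result_to_discussion(
    disc_entries: list[dict], result: SearchResult, approved: bool
) -> bool:
    """Explicit, approval-gated promotion of one RAG result into the discussion.

    Returns True if an entry was appended; without approval nothing changes.
    """
    if not approved:
        return False
    end = result.chunk_offset + result.chunk_length
    disc_entries.append({
        "role": "User",  # the user is the one adding it, per the rule above
        "content": f"[from RAG: {result.file_path}:{result.chunk_offset}-{end}]\n{result.content}",
    })
    return True
```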
|
---

## 5. Feature-gated integration (Rule 5)

**The principle.** A feature must explicitly request RAG in its scope. RAG is not the default for "give me context."

**The gate.** Every feature that uses RAG declares the dependency in its spec, plan, and changelog:

```markdown
## Scope

- Feature X (uses RAG for semantic search)
- Feature Y (no RAG dependency; uses Curation + Discussion only)

## Dependencies

- RAG is required for Feature X; the user must opt in via AI Settings
- Feature Y is independent of RAG
```

**The runtime gate.** The feature's code checks `config.rag_enabled` and behaves accordingly:

```python
# In the feature's code
class RAGNotEnabledError(RuntimeError):
    """Raised when a feature that requires RAG runs while RAG is disabled."""

def feature_x(query: str) -> list[SearchResult]:
    if not config.rag_enabled:
        raise RAGNotEnabledError("Feature X requires RAG; opt in via AI Settings")
    # search() returns Result[list[SearchResult], ErrorInfo] (Rule 6); data is [] on failure
    return rag_engine.search(query, k=N).data
```

**The error message is explicit.** The user knows why the feature isn't working.

**The CLI surface** (for testing and debugging):

```bash
$ python -m src.feature_x "execution clutch"
# Error: RAG not enabled. Enable via ai_settings.toml: [ai_settings.rag] enabled = true
```

**The audit trail.** Every track whose feature uses RAG records it in the track's `metadata.json` as `"uses_rag": true`.
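
The `uses_rag` audit trail can be checked mechanically. A minimal sketch, assuming tracks live at `conductor/tracks/<track_id>/metadata.json` (the layout implied by `conductor/tracks/rag_phase4_stress_fix_20260606`) and that the flag is a top-level JSON boolean; `tracks_using_rag` is a hypothetical helper, not an existing audit script.

```python
import json
from pathlib import Path


def tracks_using_rag(tracks_root: Path) -> list[str]:
    """List track ids whose metadata.json declares "uses_rag": true (sorted by path)."""
    ids = []
    for meta_path in sorted(tracks_root.glob("*/metadata.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # this sketch simply skips unreadable metadata
        if isinstance(meta, dict) and meta.get("uses_rag") is True:
            ids.append(meta_path.parent.name)
    return ids
```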
|
---

## 6. Graceful failure (Rule 6)

**The principle.** RAG failure is data, not an exception. A failed search returns an empty result; the request continues.

**The failure modes** (in priority order):

| Failure | Handling |
|---|---|
| RAG not enabled | Skip; no `{rag-context}` block; the request continues |
| ChromaDB not initialized | Skip; log a warning; the request continues |
| Embedding provider not available | Skip; log a warning; the request continues |
| Index missing (first run) | Skip; log a warning; the request continues |
| Search returns empty | Normal; no `{rag-context}` block; the request continues |
| Search times out | Return partial results; log a warning |
| Search raises an exception | Catch; log the exception; return empty; the request continues |

**The return type is `Result[T, ErrorInfo]`, not a raised exception.** This follows the `data_oriented_error_handling_20260606` convention.

```python
# In the RAG engine (a RAGEngine method)
def search(self, query: str, k: int = 5) -> Result[list[SearchResult], ErrorInfo]:
    try:
        if not self._enabled:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not enabled")])
        if self._collection is None:
            return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not initialized")])
        raw = self._collection.query(query_texts=[query], n_results=k)  # ChromaDB query API
        # _to_search_results (hypothetical helper) maps ids/documents/metadatas/distances to SearchResult
        return Result(data=self._to_search_results(raw), errors=[])
    except Exception as exc:  # any failure becomes data, never a crash
        return Result(data=[], errors=[ErrorInfo(INTERNAL, str(exc))])
```

**The caller** (`ai_client.py:send`) checks `.ok` (no errors) and otherwise proceeds without RAG:

```python
rag_result = rag_engine.search(prompt, k=N)
if rag_result.ok and rag_result.data:
    prompt = append_rag_block(prompt, rag_result.data)
# else: proceed without RAG; the request doesn't fail
```

**The user sees the warning** in the comms log:

```
[RAG] search failed: ChromaDB not initialized
[RAG] request continues without RAG
```
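
A self-contained sketch of the `Result`/`ErrorInfo` shape and the caller contract, so the failure table can be exercised without ChromaDB. Everything here is hypothetical: the real convention lives in `data_oriented_error_handling_20260606`, `Result` is simplified to one type parameter, the error codes are plain strings, and `StubRAGEngine` stands in for `RAGEngine` with a callable backend.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
NOT_READY, INTERNAL = "NOT_READY", "INTERNAL"  # hypothetical error codes


@dataclass(frozen=True)
class ErrorInfo:
    code: str
    message: str


@dataclass
class Result(Generic[T]):
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


class StubRAGEngine:
    """Stand-in for RAGEngine; backend(query, k) plays the role of the vector store."""

    def __init__(self, enabled: bool, backend: Optional[Callable[[str, int], list[str]]] = None):
        self._enabled = enabled
        self._backend = backend

    def search(self, query: str, k: int = 5) -> Result[list[str]]:
        try:
            if not self._enabled:
                return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not enabled")])
            if self._backend is None:
                return Result(data=[], errors=[ErrorInfo(NOT_READY, "RAG not initialized")])
            return Result(data=self._backend(query, k))
        except Exception as exc:  # failure becomes data
            return Result(data=[], errors=[ErrorInfo(INTERNAL, str(exc))])


def build_prompt(prompt: str, engine: StubRAGEngine, rag_enabled: bool, comms_log: list[str]) -> str:
    if not rag_enabled:
        return prompt  # RAG not enabled: skip silently, no {rag-context} block
    rag_result = engine.search(prompt)
    if rag_result.ok and rag_result.data:
        return prompt + "\n{rag-context}\n" + "\n".join(rag_result.data) + "\n{/rag-context}"
    for err in rag_result.errors:
        comms_log.append(f"[RAG] search failed: {err.message}")
    if rag_result.errors:
        comms_log.append("[RAG] request continues without RAG")
    return prompt  # the request never fails because of RAG
```

The caller checks `rag_enabled` first, so the disabled case is silent (matching the table), while every other failure leaves a comms-log line and an unchanged prompt.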
|
---

## 7. The wiring points (the where)

| Where in `src/` | What it does | What it does NOT do |
|---|---|---|
| `src/ai_client.py:send` | The integration point; appends `{rag-context}` if enabled | Does not mutate state |
| `src/aggregate.py:run` | Builds the initial context; appends `{rag-context}` in the volatile layer | Does not query RAG directly |
| `src/rag_engine.py:search` | The semantic search; returns `Result[list[SearchResult], ErrorInfo]` | Does not write to the index |
| `src/rag_engine.py:index_file` | The indexer; called by `RAGEngine._init_vector_store` or by the harvest CLI | Does not run at LLM call time |
| `src/ai_settings.toml` (or GUI) | The opt-in surface | Does not trigger RAG automatically |

---

## 8. The forbidden patterns (the "don't do this" list)

| Pattern | Why it's forbidden |
|---|---|
| RAG as a *replacement* for curation | Curation is structural (per-file schema); RAG is semantic (fuzzy). Use curation for "how to render file X" |
| RAG as a *replacement* for discussion | Discussion is precise (the actual messages); RAG is fuzzy. Use discussion for "what was said" |
| RAG as a *replacement* for knowledge | Knowledge is durable (user-edited, provenance-aware); RAG is volatile (indexed, opaque). Use knowledge for "what we decided" |
| Auto-inject RAG results into `disc_entries` | This is a state mutation; it changes the conversation in a way the user didn't ask for |
| Auto-write RAG results to disk | Also a state mutation (Rule 4) |
| Use RAG when the user hasn't opted in | RAG is opt-in; default-off in new projects |
| Crash the request when RAG fails | Graceful failure; the request continues |
| Use RAG for "show me the last thing the user said" | Use `disc_entries` (precise) |
| Use RAG for "show me what we decided last time" | Use the knowledge digest (durable) |
| Use RAG for "show me the file the user is editing" | Use `FileItem` (curation) |

---

## 9. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` §3: the RAG dim in context
- `conductor/code_styleguides/data_oriented_design.md` §1.2: "Design around a model of the world" (the underlying anti-pattern)
- `conductor/code_styleguides/cache_friendly_context.md`: where the 4 dims get injected in the cache strategy
- `conductor/code_styleguides/knowledge_artifacts.md`: the knowledge dim (the alternative for "what we decided")
- `docs/guide_rag.md`: the existing RAG deep-dive
- `data_oriented_error_handling_20260606`: the `Result[T, ErrorInfo]` pattern
- `conductor/tracks/rag_phase4_stress_fix_20260606`: the dim-mismatch fix at `16412ad5`

@@ -111,3 +111,43 @@ The product guidelines are best understood alongside the per-source-file guides
- **[docs/guide_models.md](../docs/guide_models.md):** §"Design Principles" + §"SDM Tags" — centralized registry, pydantic validation, `[C: ...]` / `[M: ...]` tags in docstrings.
- **[docs/guide_testing.md](../docs/guide_testing.md):** §"Structural Testing Contract" — Ban on Arbitrary Core Mocking, `live_gui` Standard, Artifact Isolation.
- **[code_styleguides/config_state_owner.md](code_styleguides/config_state_owner.md):** Config I/O state ownership — `AppController` is the single source of truth; direct calls to `models.save_config`/`models.load_config` in `src/` are forbidden (enforced by `scripts/audit_no_models_config_io.py`).

## Memory Dimensions (added 2026-06-12)

The conversation data has 4 distinct memory dimensions. Most features touch 1-2; some touch 3. The dimensions are not interchangeable.

| Dim | Where | What it stores | User-editable | Status |
|---|---|---|---|---|
| Curation | `FileItem` + `ContextPreset` | *How to render a file* | Structural File Editor | Existing, strong |
| Discussion | `disc_entries` + branching + UISnapshot | *What was said* | GUI `[Edit]` mode; undo/redo | Existing, strong |
| RAG | `src/rag_engine.py` (ChromaDB) | *Semantic fingerprints* | No (opaque) | Opt-in |
| Knowledge | `~/.manual_slop/knowledge/*.md` + per-file + digest | *Durable learnings* | Plain markdown | Proposed (Candidate 8) |

**The product decision.** When scoping a new feature, identify which dimension(s) it touches and use the matching one; don't reach for the wrong shape. The full cross-cutting guide is `docs/guide_agent_memory_dimensions.md`; the canonical styleguide is `conductor/code_styleguides/agent_memory_dimensions.md`.

**The 6 design rules (the product implications).**

1. **Curation is structural.** Per-file schema; AST-aware; user-edited. Not conversational.
2. **Discussion is conversational.** Per-discussion, multi-turn. Not per-file. Not semantic.
3. **RAG is opt-in, fuzzy, semantic.** Default-off in new projects. Complements; never replaces. Provenance required. No mutation.
4. **Knowledge is durable, user-editable, provenance-aware.** The category files are the source of truth; the digest is a projection. "Delete to turn off": `rm digest.md`.
5. **Cache hits occur only on the stable prefix** (layers 1-7 of the 12-layer model). The volatile suffix (layers 8-12) is never cached.
6. **Feature flags are data, not config.** File presence ("delete to turn off") for side artifacts; config flags for persistent preferences; CLI flags for one-shot overrides.
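
Rules 4 and 6 can be sketched in a few lines. `knowledge_block` treats the presence of `digest.md` as the flag (delete the file and the block disappears), and `resolve_flag` encodes the assumed precedence of a one-shot CLI flag over a persistent config flag over a built-in default. Both names and the `{/knowledge}` closing marker are hypothetical.

```python
from pathlib import Path
from typing import Optional


def knowledge_block(knowledge_dir: Path) -> Optional[str]:
    """File presence is the flag: no digest.md means no {knowledge} block."""
    digest = knowledge_dir / "digest.md"
    if not digest.is_file():
        return None  # "delete to turn off"
    return "{knowledge}\n" + digest.read_text(encoding="utf-8") + "\n{/knowledge}"


def resolve_flag(cli_value: Optional[bool], config_value: Optional[bool], default: bool) -> bool:
    """One-shot CLI flag > persistent config flag > built-in default."""
    if cli_value is not None:
        return cli_value
    if config_value is not None:
        return config_value
    return default
```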
|
## See Also (Updated 2026-06-12)

The canonical styleguide catalog (per the nagent_review v2.3 + intent_dsl_survey cross-references):

- **[conductor/code_styleguides/data_oriented_design.md](code_styleguides/data_oriented_design.md)**: The canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 7-question simplification pass; 10-question self-check)
- **[conductor/code_styleguides/agent_memory_dimensions.md](code_styleguides/agent_memory_dimensions.md)**: The 4 memory dimensions and when to use each
- **[conductor/code_styleguides/rag_integration_discipline.md](code_styleguides/rag_integration_discipline.md)**: The conservative-RAG rule
- **[conductor/code_styleguides/cache_friendly_context.md](code_styleguides/cache_friendly_context.md)**: Stable-to-volatile context ordering + the cache TTL GUI contract
- **[conductor/code_styleguides/knowledge_artifacts.md](code_styleguides/knowledge_artifacts.md)**: The knowledge harvest pattern
- **[conductor/code_styleguides/feature_flags.md](code_styleguides/feature_flags.md)**: File presence vs config flags vs CLI flags

And the user-facing deep-dives (the cross-cutting guides):

- **[docs/guide_agent_memory_dimensions.md](../docs/guide_agent_memory_dimensions.md)**: The cross-cutting guide to the 4 memory dimensions
- **[docs/guide_knowledge_curation.md](../docs/guide_knowledge_curation.md)**: The knowledge memory guide (4th dim)
- **[docs/guide_caching_strategy.md](../docs/guide_caching_strategy.md)**: Caching across providers
- **[./docs/AGENTS.md](../docs/AGENTS.md)**: The agent-facing mirror of `docs/Readme.md`

@@ -693,3 +693,87 @@ Whenever a track introduces a new convention that can be statically checked, add

**The audit-script + styleguide pair:** every audit script's documented "what it checks" should map to a section in a `conductor/code_styleguides/` file. The styleguide says "this is the rule"; the audit says "your code violates this rule." The pair is complete when both exist.

## Additions (2026-06-12): the 12 patterns from the latest nagent corpus

This section extends the existing workflow with TDD protocols for four of the patterns surfaced by the `nagent_review_20260608` review (v2.3, 2026-06-12):

1. **Knowledge harvest** (the 4th memory dim): test-driven per the 7-category schema + the byte-equality test on the digest
2. **Stable-to-volatile cache ordering**: test the byte-equality of the first N chars across turns
3. **Conversation compaction**: test-driven per the 10-question self-review
4. **RAG integration**: test the "no mutation" invariant + the graceful failure

### The knowledge harvest TDD protocol

**The shape.** The harvest's LLM output is strict JSON. The test is the parser's contract:

```
- [ ] tests/test_knowledge_store.py: 5+ tests for the 7-category schema
  - [ ] parse_harvest_json: 7 categories; rows must be lists
  - [ ] parse_harvest_json: rejects prose
  - [ ] parse_harvest_json: tolerates ```json ... ``` code-fence
  - [ ] parse_harvest_json: rejects non-dict payloads
  - [ ] regenerate_digest: 4KB cap; truncation with note
- [ ] tests/test_knowledge_harvest.py: 8+ tests for the pipeline
  - [ ] classify (live/user-kept/prune/harvest/keep)
  - [ ] merge_harvest per category
  - [ ] per-file knowledge: existing-file branch
  - [ ] per-file knowledge: missing-file branch
  - [ ] ledger dedup (sha256-of-content)
  - [ ] retry budget (2 attempts)
  - [ ] "too-large" budget guard (1MB)
  - [ ] "delete to turn off" regeneration
```
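
A minimal sketch of the parser contract in the checklist. `parse_harvest_json` is named in the checklist, but this body is hypothetical: the seven category names are not listed here, so they are passed in as a parameter, and treating an absent category as an empty list is an assumption. The fence marker is built from three backtick characters at runtime only so the example nests cleanly inside this markdown file.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker


def _strip_code_fence(text: str) -> str:
    """Tolerate a fenced payload (optionally tagged 'json'); return the inner text."""
    s = text.strip()
    if len(s) >= 6 and s.startswith(FENCE) and s.endswith(FENCE):
        inner = s[3:-3]
        head, sep, body = inner.partition("\n")
        # Drop the opening line if it is empty or just the language tag.
        if sep and head.strip() in ("", "json"):
            inner = body
        s = inner.strip()
    return s


def parse_harvest_json(text: str, categories: tuple[str, ...]) -> dict[str, list]:
    """Parse harvest output into {category: rows}; raise ValueError on contract violations."""
    try:
        payload = json.loads(_strip_code_fence(text))
    except json.JSONDecodeError as exc:
        raise ValueError("harvest output is not JSON (prose is rejected)") from exc
    if not isinstance(payload, dict):
        raise ValueError("harvest payload must be a JSON object")
    parsed = {}
    for name in categories:
        rows = payload.get(name, [])
        if not isinstance(rows, list):
            raise ValueError(f"category {name!r}: rows must be a list")
        parsed[name] = rows
    return parsed
```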
|
### The cache ordering TDD protocol

**The shape.** The byte-equality of the first N chars is the design contract. The test:

```
- [ ] tests/test_aggregate_caching.py: the byte-comparison test
  - [ ] first N chars are identical across turns of the same discussion
  - [ ] N = aggregate.stable_prefix_length(ctrl)
  - [ ] failure modes: new layer in wrong position, volatile input leak
- [ ] tests/test_cache_state.py: 3+ tests for the cache state machine
  - [ ] per-provider TTL defaults
  - [ ] DiscussionCacheState lifecycle
  - [ ] invalidate + regeneration
- [ ] tests/test_gui_caching.py: 3+ live_gui tests for the "Caching" panel
  - [ ] panel renders provider summaries
  - [ ] invalidate button
  - [ ] per-discussion disable/enable
```
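
A sketch of the cache state machine the second test file targets. `DiscussionCacheState` is the name from the checklist; its fields and methods here are assumptions. The defaults follow the provider TTL table in `docs/AGENTS.md` §5; OpenAI's implicit cache is provider-managed, so this sketch never reports it as warm from the client side. Time is passed in explicitly, which keeps lifecycle tests deterministic.

```python
from dataclasses import dataclass
from typing import Optional

# Default TTLs in seconds (Anthropic ephemeral: 5 min, Gemini explicit: 1 h).
# OpenAI's implicit cache is provider-managed: no client-side TTL (None).
DEFAULT_TTL_S: dict[str, Optional[int]] = {
    "anthropic": 5 * 60,
    "gemini": 60 * 60,
    "openai": None,
}


@dataclass
class DiscussionCacheState:
    provider: str
    ttl_s: Optional[int] = None         # per-discussion override; None = provider default
    created_at: Optional[float] = None  # None = nothing cached yet
    enabled: bool = True                # per-discussion disable/enable

    def effective_ttl(self) -> Optional[int]:
        return self.ttl_s if self.ttl_s is not None else DEFAULT_TTL_S[self.provider]

    def mark_created(self, now: float) -> None:
        self.created_at = now

    def is_warm(self, now: float) -> bool:
        """True if a client-tracked cache entry should still be live at `now`."""
        ttl = self.effective_ttl()
        if not self.enabled or self.created_at is None or ttl is None:
            return False
        return now - self.created_at < ttl

    def invalidate(self) -> None:
        self.created_at = None  # the next send regenerates the cache
```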
|
### The compaction TDD protocol

**The shape.** The compaction's LLM output is the 12-section structure. The 10-question self-review is the contract. The tests:

```
- [ ] tests/test_run_discussion_compaction.py: 10+ tests
  - [ ] compact preserves decisions
  - [ ] compact preserves constraints
  - [ ] compact preserves failures
  - [ ] compact preserves artifact refs
  - [ ] compact removes duplicates
  - [ ] compact replaces chronology with state
  - [ ] compact is substantially smaller
  - [ ] compact preserves capability
  - [ ] compact returns 12-section structure
  - [ ] compact continues until self-review passes
```
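
The last checklist item ("continues until self-review passes") implies a loop. A minimal sketch, assuming the self-review returns the list of failing questions and that those are fed back into the next compaction attempt; the `max_rounds` cap is an added assumption so a stubborn transcript cannot loop forever. Neither the 12 sections nor the 10 questions are modeled here.

```python
from typing import Callable

Compactor = Callable[[str, list[str]], str]  # (transcript, failed questions) -> summary
Reviewer = Callable[[str, str], list[str]]   # (transcript, summary) -> failed questions


def compact_until_reviewed(
    transcript: str, compact: Compactor, review: Reviewer, max_rounds: int = 3
) -> tuple[str, list[str]]:
    """Re-run compaction with review feedback until no self-review question fails.

    Returns (summary, remaining_failures); remaining_failures is empty on success.
    """
    failures: list[str] = []
    summary = transcript
    for _ in range(max_rounds):
        summary = compact(transcript, failures)
        failures = review(transcript, summary)
        if not failures:
            break
    return summary, failures
```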
|
### The RAG discipline TDD protocol

**The shape.** RAG is opt-in, never mutates state, fails gracefully. The tests:

```
- [ ] tests/test_rag_discipline.py: 4+ tests
  - [ ] RAG disabled: no {rag-context} block
  - [ ] RAG results have provenance (file path + chunk)
  - [ ] RAG results do not mutate disc_entries
  - [ ] RAG failure returns empty (graceful)
```
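
Two of the four checklist tests, sketched against a toy integration point so they run without the app. `AppStateStub` and `build_send_prompt` are hypothetical stand-ins; a real test would drive the actual `send` path. The no-mutation check snapshots the whole state with `copy.deepcopy` and compares it afterwards, which catches writes to any field, not just `disc_entries`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AppStateStub:
    # Stand-in for the real app state: only the fields the invariant cares about.
    rag_enabled: bool
    disc_entries: list = field(default_factory=list)
    file_items: list = field(default_factory=list)


def build_send_prompt(state: AppStateStub, search, base_prompt: str) -> str:
    """Toy integration point: returns a prompt and never writes to `state`."""
    if not state.rag_enabled:
        return base_prompt
    hits = search(base_prompt)
    if not hits:
        return base_prompt
    return base_prompt + "\n{rag-context}\n" + "\n".join(hits) + "\n{/rag-context}"


def test_rag_disabled_adds_no_block():
    state = AppStateStub(rag_enabled=False)
    assert "{rag-context}" not in build_send_prompt(state, lambda q: ["hit"], "P")


def test_rag_results_do_not_mutate_disc_entries():
    state = AppStateStub(rag_enabled=True, disc_entries=[{"role": "User", "content": "hi"}])
    before = copy.deepcopy(state)
    prompt = build_send_prompt(state, lambda q: ["src/a.py:0-10 hit"], "P")
    assert "{rag-context}" in prompt
    assert state == before  # dataclass equality compares every field
```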
|
See `conductor/code_styleguides/knowledge_artifacts.md`, `cache_friendly_context.md`, and `rag_integration_discipline.md` for the canonical styleguides.

---

@@ -0,0 +1,268 @@

# ./docs/AGENTS.md (the agent-facing mirror)

**Status:** Agent-facing mirror of `docs/Readme.md` (the human-facing docs index, which is preserved as-is). For agents (any tier), this is the recommended first read for understanding the project's docs structure.

**Date:** 2026-06-12

**Cross-refs:** `docs/Readme.md` (human-facing); `AGENTS.md` (project root); the 6 styleguides in `conductor/code_styleguides/`.

> **What this is.** `docs/Readme.md` is the human-facing docs index. *This* file is the agent-facing equivalent: it organizes the deep-dive guides under `docs/` by MMA tier and cross-references the canonical styleguides. The two files index the same docs for different audiences, with different reading paths.
>
> **The reading path.** If you're an agent scoping a feature, read this file first; then read the 1-2 `guide_*.md` files for the layers your feature touches; then read the 1-2 styleguides for the patterns the feature uses. Expected reading time for a typical feature: 10-15 minutes.
|
---

## 0. The 4 memory dimensions (the cross-cutting lens)

The conversation data has 4 distinct memory dimensions. Most features touch 1-2; some touch 3. Use this lens to identify which dimension(s) your feature needs.

| # | Dim | Where it lives | Use when | Styleguide (or guide) |
|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | "How to render a file" | `docs/guide_context_curation.md` (no dedicated styleguide) |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | "What was said in this chat" | `docs/guide_architecture.md` §"Threading model" (no dedicated styleguide) |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | "What similar content exists" (opt-in) | `conductor/code_styleguides/rag_integration_discipline.md` |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest | "What we learned from past sessions" | `conductor/code_styleguides/knowledge_artifacts.md` |

See `docs/guide_agent_memory_dimensions.md` for the full cross-cutting guide.

---

## 1. The deep-dive guides (organized by MMA tier)

| Tier | Guide | What it covers | When to read |
|---|---|---|---|
| **T1** | `docs/guide_architecture.md` | Threading model; cross-thread state sync | When scoping any cross-cutting feature |
| **T1** | `docs/guide_meta_boundary.md` | The Application vs Meta-Tooling split | When scoping a Meta-Tooling-side feature |
| **T2** | `docs/guide_app_controller.md` | The headless controller; `AppState` dataclass | When implementing controller-side logic |
| **T2** | `docs/guide_ai_client.md` | The multi-provider LLM client | When implementing LLM-side logic |
| **T2** | `docs/guide_mma.md` | The 4-tier MMA orchestration | When implementing MMA-side logic |
| **T2** | `docs/guide_tools.md` | The MCP tool inventory + Hook API | When implementing MCP tools or Hook endpoints |
| **T2** | `docs/guide_mcp_client.md` | The 45 tools + 3-layer security | When implementing new MCP tools or sub-MCPs |
| **T3** | `docs/guide_context_curation.md` | Granular AST Control + Fuzzy Anchors + Structural File Editor | When implementing curation-side features |
| **T3** | `docs/guide_personas.md` | The unified agent profile model | When implementing persona-side features |
| **T3** | `docs/guide_rag.md` | The RAG subsystem | When implementing RAG-side features (rare; opt-in) |
| **T3** | `docs/guide_gui_2.md` | The ImGui application | When implementing GUI-side features |
| **All** | `docs/guide_testing.md` | The test suite architecture (251 test files; 7 conftest fixtures) | When writing any test |
| **All** | `docs/guide_command_palette.md` | The 33 commands + "Everything" mode | When implementing command-palette features |
| **NEW** | `docs/guide_knowledge_curation.md` | The knowledge memory guide (4th dim) | When implementing knowledge-side features |
| **NEW** | `docs/guide_caching_strategy.md` | Caching across providers; stable-to-volatile ordering; cache TTL GUI | When implementing cache-side features |
| **NEW** | `docs/guide_agent_memory_dimensions.md` | The cross-cutting guide to the 4 memory dimensions | When scoping any feature that touches memory |

---

## 2. The 6 canonical styleguides (the convention catalog)

| Styleguide | What it codifies | When to read |
|---|---|---|
| `conductor/code_styleguides/data_oriented_design.md` | The canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 7-question simplification pass; 10-question self-check) | Before any non-trivial work |
| `conductor/code_styleguides/agent_memory_dimensions.md` | The 4 memory dimensions and when to use each | When the feature touches memory |
| `conductor/code_styleguides/rag_integration_discipline.md` | The conservative-RAG rule (opt-in; complements; provenance; no mutation; feature-gated; graceful failure) | When the feature uses RAG |
| `conductor/code_styleguides/cache_friendly_context.md` | Stable-to-volatile context ordering; the cache TTL GUI contract; the byte-comparison test | When the feature builds context or caches |
| `conductor/code_styleguides/knowledge_artifacts.md` | The knowledge harvest pattern (category files, provenance, sha256 ledger, digest regeneration) | When the feature uses the knowledge dim |
| `conductor/code_styleguides/feature_flags.md` | File presence ("delete to turn off") vs config flags vs CLI flags; when to use each | When adding a new feature toggle |

---

## 3. The per-tier reading path

### Tier 1 (Orchestrator): what to read

For scoping a feature, understanding the architecture, and planning:

| Read | Why |
|---|---|
| `docs/guide_architecture.md` | The threading model; the cross-thread data flow |
| `docs/guide_meta_boundary.md` | The Application vs Meta-Tooling split (load-bearing) |
| `docs/guide_agent_memory_dimensions.md` | The 4 memory dimensions (which dim does my feature touch?) |
| `conductor/code_styleguides/data_oriented_design.md` | The 3 defaults to reject; the simplification pass; the final self-check |
| `AGENTS.md` (project root) | The project-root agent-facing rules |
| This file (`docs/AGENTS.md`) | The docs structure |

**Tier 1 does NOT typically read:** `guide_*.md` for the specific subsystems (T2 reads those).

### Tier 2 (Tech Lead): what to read

For track design, ticket generation, and architecture:

| Read | Why |
|---|---|
| All of Tier 1's reads | (foundational) |
| `docs/guide_app_controller.md` | The headless controller; the `_predefined_callbacks` and `_gettable_fields` registries |
| `docs/guide_ai_client.md` | The LLM client; the providers; the cache strategy |
| `docs/guide_mma.md` | The 4-tier MMA; the DAG engine; the worker pool |
| `docs/guide_tools.md` | The MCP tool inventory; the Hook API; the 3-layer security |
| `conductor/code_styleguides/agent_memory_dimensions.md` | (for memory-touching tracks) |
| `conductor/code_styleguides/cache_friendly_context.md` | (for context-building tracks) |

**Tier 2 does NOT typically read:** `guide_context_curation.md`, `guide_personas.md`, `guide_rag.md`, `guide_gui_2.md` (T3 reads those).

### Tier 3 (Worker): what to read

For surgical implementation:

| Read | Why |
|---|---|
| All of Tier 2's reads (selectively) | (the system context) |
| The 1-2 `guide_*.md` files for the specific layers the ticket touches | (the implementation surface) |
| The 1-2 `code_styleguides/...md` files for the patterns the ticket uses | (the convention) |
| The ticket itself (`conductor/tracks/<id>/plan.md`) | (the specific task) |

**Tier 3 reads in depth, not in breadth.** A typical T3 worker reads 2-4 docs total.

### Tier 4 (QA): what to read

For error analysis and bug reproduction:

| Read | Why |
|---|---|
| All of Tier 2's reads (selectively) | (the system context) |
| The 1-2 `guide_*.md` files for the failing layer | (the reproduction surface) |
| The test file (if any) | (the verification surface) |
| The audit scripts (`scripts/audit_*.py`) | (the static analysis surface) |

**Tier 4 reads narrowly.** The bug is in 1-2 files; the read is in 1-2 docs.

---

## 4. The 4 memory dimensions (the cross-cutting lens, in detail)

Most features touch 1-2 dimensions. Use this decision tree:

```
Q: What is the *data* the feature needs?
│
├── "How to render a file" ──► Curation (FileItem)
├── "What was said in this chat" ──► Discussion (disc_entries)
├── "What similar content exists" ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

**Pick the matching dimension.** If the feature needs 2+, use 2+, but be explicit about which is *primary* and which is *secondary*.

**Using the wrong shape for the right question is a common mistake.** The matching shapes:

- "Where does X happen?" → RAG (semantic search)
- "How do I configure how file Y is rendered?" → Curation (FileItem)
- "What was the user asking about 3 turns ago?" → Discussion (disc_entries)
- "What did we decide last time about Z?" → Knowledge (digest)

See `docs/guide_agent_memory_dimensions.md` for the full cross-cutting guide.

---

## 5. The caching strategy (the cross-cutting concern)

If the feature builds the initial context (in `aggregate.py:run`) or calls the LLM (in `ai_client.py:send`), the cache strategy matters.

**The 12-layer model:**

| # | Layer | Stable across turns? | Where the cache hits |
|---|---|---|---|
| 1-7 | Role instructions, function-calling schema, tool descriptions, system prompt, persona, project context, knowledge digest | **YES** (cacheable) | Anthropic `cache_control`, Gemini `cachedContent`, OpenAI implicit |
| 8-12 | Discussion metadata, active preset, per-file details, prior tool results, user message | **NO** (per turn) | NOT cached |

**The byte-comparison test** (the design contract for the stable prefix):

```python
def test_aggregate_stable_to_volatile_ordering():
    """The first N characters of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    ...
```
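
A toy version of the byte-comparison test, to make the contract concrete. It assumes each layer is rendered as a tagged block and that the stable prefix length is the byte length of layers 1-7 alone; the layer names and tag format are hypothetical, and the real `aggregate.stable_prefix_length(ctrl)` takes the controller instead of a dict.

```python
# Hypothetical layer names, in the order of the 12-layer table above.
STABLE_LAYERS = (
    "role", "tool_schema", "tool_descriptions", "system_prompt",
    "persona", "project_context", "knowledge_digest",        # layers 1-7
)
VOLATILE_LAYERS = (
    "discussion_meta", "active_preset", "file_details",
    "tool_results", "user_message",                          # layers 8-12
)


def render_layers(names: tuple[str, ...], inputs: dict[str, str]) -> str:
    return "".join(f"<{n}>\n{inputs.get(n, '')}\n</{n}>\n" for n in names)


def build_context(inputs: dict[str, str]) -> str:
    """Stable layers first, volatile layers last."""
    return render_layers(STABLE_LAYERS, inputs) + render_layers(VOLATILE_LAYERS, inputs)


def stable_prefix_length(inputs: dict[str, str]) -> int:
    """Byte length of the stable prefix (layers 1-7)."""
    return len(render_layers(STABLE_LAYERS, inputs).encode("utf-8"))


def test_stable_prefix_is_byte_identical_across_turns():
    turn1 = {"system_prompt": "You are a Tier 3 worker.", "user_message": "first question"}
    turn2 = dict(turn1, user_message="second question", tool_results="pytest: 12 passed")
    n = stable_prefix_length(turn1)
    a, b = build_context(turn1).encode("utf-8"), build_context(turn2).encode("utf-8")
    assert a[:n] == b[:n]  # the cacheable prefix did not move
    assert a != b          # only the volatile suffix changed
```

Changing a stable-layer input (for example the persona) changes the prefix, which is why the contract only holds "when no stable-layer inputs change"; a volatile input leaking into a stable layer would break it on ordinary turns.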
|
**The provider-specific TTLs:**

| Provider | Default TTL | Configurable? |
|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-provider control surface) |
| Gemini explicit | 1 h | yes (per-discussion override) |
| OpenAI implicit | 5-10 min (provider-managed) | no |

**The GUI exposure** is a "Caching" Operations Hub sub-panel (per the v2.3 §5.3 sketch). See `docs/guide_caching_strategy.md` for the full guide and `conductor/code_styleguides/cache_friendly_context.md` for the styleguide.

---
## 6. The knowledge harvest (the durable layer)
|
||||
|
||||
The 4th memory dimension (knowledge) is *opt-in but encouraged* — it's the durable, user-editable, provenance-aware store of facts / decisions / questions / playbooks / per-file notes.
|
||||
|
||||
**The directory layout** (per the user's `~/.manual_slop/knowledge/`):
|
||||
|
||||
```
|
||||
knowledge/
|
||||
├── facts.md # - {statement} {provenance}
|
||||
├── decisions.md # - {statement, reason} {provenance}
|
||||
├── questions.md # - {question} {provenance}
|
||||
├── playbooks.md # - **{name}**: {steps} {provenance}
|
||||
├── tasks.md # ## Open / ## Done
|
||||
├── files/{file_id}.md # per-file notes (keyed by inode)
|
||||
├── digest.md # bounded 4KB; the projection; "delete to turn off"
|
||||
├── ledger.json # sha256-of-content audit log
|
||||
└── prompts/harvest-conversation.md # user-editable
|
||||
```
|
||||
|
||||
**The harvest CLI:** `python -m src.knowledge_harvest [--apply] [--no-harvest] [--max-harvest-bytes N]`. Default: dry-run.
|
||||
|
||||
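The sha256 ledger is what makes it safe to re-run the harvest. The sketch below illustrates the idea under several assumptions: the ledger has the shape `{"entries": {sha256: {"status", "at", "items"}}}`, a successful harvest is recorded with the status string `"applied"`, and `items` is a count. The names `content_sha256`, `load_ledger`, `should_harvest`, and `record_harvest` are hypothetical and are not the `src.knowledge_harvest` API.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def content_sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_ledger(path: Path) -> dict:
    # A missing ledger simply means nothing has been harvested yet.
    if not path.exists():
        return {"entries": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def should_harvest(ledger: dict, conversation_text: str) -> bool:
    # Skip conversations whose exact content was already applied.
    entry = ledger["entries"].get(content_sha256(conversation_text))
    return entry is None or entry.get("status") != "applied"


def record_harvest(ledger: dict, conversation_text: str, status: str, items: int) -> dict:
    # Key by content hash so an edited conversation is treated as new input.
    ledger["entries"][content_sha256(conversation_text)] = {
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        "items": items,
    }
    return ledger
```

Because the key is a hash of the content rather than a file name or discussion id, renaming a discussion does not trigger a re-harvest. Editing its text does.
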
**The LLM output is strict JSON** (no prose, no markdown fence) with 7 categories. The retry budget is 2 attempts.

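Here is a minimal sketch of strict parsing with a bounded retry. It assumes that "2 attempts" means two calls in total, that the categories arrive as top-level keys with list values, and that `call_llm` is a hypothetical callable returning the raw model text. The category names in the usage example are placeholders.

```python
import json
from typing import Callable, Optional

MAX_ATTEMPTS = 2  # assumed reading of the documented retry budget: two calls total


def parse_harvest(raw: str, categories: set[str]) -> Optional[dict]:
    """Return the parsed object, or None if it is not strict JSON with known keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # prose or a markdown code fence around the JSON fails here by design
    if not isinstance(data, dict) or not set(data) <= categories:
        return None
    if not all(isinstance(value, list) for value in data.values()):
        return None
    return data


def harvest_with_retry(call_llm: Callable[[str], str], prompt: str,
                       categories: set[str]) -> Optional[dict]:
    for _ in range(MAX_ATTEMPTS):
        parsed = parse_harvest(call_llm(prompt), categories)
        if parsed is not None:
            return parsed
    return None  # graceful failure: the caller records it and moves on
```

The parser is deliberately unforgiving. It rejects any output that is not bare JSON, and the retry gives the model a second chance rather than making the parser guess where the JSON starts.
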
**The "delete to turn off" pattern:** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block injected. Re-enable it by running the harvest again with `--apply` (the default dry run writes nothing).

See `docs/guide_knowledge_curation.md` for the full guide and `conductor/code_styleguides/knowledge_artifacts.md` for the styleguide.

---

## 7. The RAG discipline (the opt-in fuzzy dimension)

RAG is the *fuzzy semantic search* dimension. It's *opt-in* (default-off in new projects). The 6 rules:

1. **Opt-in.** Default-off in new projects
2. **Complements; never replaces.** RAG is one of 4 dimensions, not a substitute
3. **Provenance required.** Every result shows file + chunk
4. **No mutation.** RAG results never write to `disc_entries`, `FileItem`, or disk
5. **Feature-gated.** A feature must explicitly request RAG in its scope
6. **Graceful failure.** Failed search returns empty; the request continues

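The sketch below shows how rules 1, 3, 4, and 6 can be satisfied at a single call site. `SearchResult`, `ErrorInfo`, and `rag_block` are hypothetical stand-ins: the real `RAGEngine.search` returns `Result[list[SearchResult], ErrorInfo]`, which is modelled here as a plain union, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class SearchResult:
    path: str         # provenance: source file
    chunk_index: int  # provenance: which chunk of that file
    text: str


@dataclass(frozen=True)
class ErrorInfo:
    message: str


SearchFn = Callable[[str, int], Union[list[SearchResult], ErrorInfo]]


def rag_block(search: SearchFn, query: str, *, k: int = 5, enabled: bool = False) -> str:
    """Render search hits as a read-only prompt block; returns "" instead of failing."""
    if not enabled:  # rule 1: opt-in, default-off
        return ""
    try:
        outcome = search(query, k)
    except Exception:  # rule 6: an unexpected crash is treated like a failed search
        return ""
    if isinstance(outcome, ErrorInfo):  # rule 6: failure is data; the request continues
        return ""
    # Rule 3: every hit carries file + chunk. Rule 4: nothing here writes anywhere.
    return "\n\n".join(f"[{hit.path} #chunk {hit.chunk_index}]\n{hit.text}" for hit in outcome)
```
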
See `docs/guide_rag.md` for the full RAG guide and `conductor/code_styleguides/rag_integration_discipline.md` for the styleguide.

---

## 8. The feature flag patterns (when to use what)

When adding a new feature with an "on/off" toggle, choose the right pattern:

| Pattern | When to use | Example |
|---|---|---|
| **File presence** ("delete to turn off") | The feature produces a side artifact; the user might want to clean up by `rm`-ing it | `~/.manual_slop/knowledge/digest.md` |
| **Config flag** | The feature is always available; the flag stores a persistent preference | `[ai_settings.toml] rag.enabled` |
| **CLI flag** | The feature is invoked from the CLI; the flag is a one-shot override | `python -m src.knowledge_harvest --apply` |
| **Track metadata flag** | The track's implementation uses a feature; this is *static documentation* | `metadata.json`: `{"uses_rag": true}` |

See `conductor/code_styleguides/feature_flags.md` for the full guide.

---

## 9. The cross-cutting principles (the data-oriented foundation)

All 14 docs and 6 styleguides share the same foundation (per `data_oriented_design.md`):

- **The data is the thing.** The conversation, the file items, and the knowledge category files are the source of truth
- **Behavior is transformation over data.** Not object graphs; not hidden state; not opaque handles
- **Avoid hidden mutable state.** Errors are data, not exceptions. State is on disk, not in memory
- **Separate durable artifacts from temporary execution.** Workers are disposable; artifacts are durable
- **Optimize the shape, availability, and maintenance of the data.** Keep it user-editable and provenance-aware

When in doubt, read `conductor/code_styleguides/data_oriented_design.md` first.

---

## 10. The reading path (the 1-page summary)

For an agent scoping a feature:

1. **Read this file** (10 min)
2. **Read the 1-2 `guide_*.md`** for the layers your feature touches (5-10 min each)
3. **Read the 1-2 `code_styleguides/...md`** for the patterns your feature uses (5-10 min each)
4. **Read the ticket** (`conductor/tracks/<id>/plan.md`) for the specific task (variable)

Total: roughly 20-50 min for a typical feature, plus the ticket. The investment pays back across the feature's lifetime.

If a guide is missing or stale, that's a bug; file a docs issue (or update the guide inline, per the project's "edit the source of truth, not this file" pattern).

End of agent-facing mirror.
@@ -0,0 +1,283 @@
# The 4 Memory Dimensions (cross-cutting guide)

**Status:** User-facing cross-cutting guide on the 4 memory dimensions. For agents, see `./docs/AGENTS.md` §0.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md`; `docs/guide_context_curation.md`; `docs/guide_rag.md`; `docs/guide_knowledge_curation.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8.

> **What this is.** The conversation data has 4 distinct memory dimensions. Most features touch 1-2; some touch 3. This guide is the cross-cutting reference: when to use which dimension, the boundaries between them, and the decision tree for "which dim does this feature need?"

---

## 0. The 30-second version

Manual Slop has 4 memory dimensions for the conversation data:

| # | Dim | Where it lives | What it stores | Status |
|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | *How to render a file* in the AI's context window | Existing, strong |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | *What was said* in the conversation | Existing, strong |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | *Semantic fingerprints* of indexed files | Opt-in |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest | *Durable learnings* from past sessions | Proposed (Candidate 8) |

**The decision tree:**

```
Q: What is the *data* the feature needs?
│
├── "How to render a file"           ──► Curation (FileItem)
├── "What was said in this chat"     ──► Discussion (disc_entries)
├── "What similar content exists"    ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

**Pick the matching dimension.** If the feature needs two or more dimensions, use them, but state explicitly which is *primary* and which is *secondary*.

---

## 1. Curation memory (per-file, per-discussion, structural)

**The shape.** Per-file curation config in `FileItem`:
- `path` (the file identity)
- `auto_aggregate` (include in auto-aggregation?)
- `force_full` (bypass aggregation with full content?)
- `view_mode` (`full / skeleton / summary / sig / def / agg`)
- `ast_signatures` (signatures only?)
- `ast_definitions` (definitions only?)
- `ast_mask` (per-symbol mask)
- `custom_slices` (Fuzzy Anchors)

A `ContextPreset` is a named, persisted set of `FileItem`s. Both persist in the project TOML.

**The query model.** "When discussion X opens, render file Y per its curation memory." This is implicit in `aggregate.py:run` at discussion start. The user doesn't query the curation memory directly; they *configure* it.

**The right tool.** The Structural File Editor (per `docs/guide_context_curation.md`): AST-aware slices, Fuzzy Anchor slices, and a view-mode picker. The file's `FileItem` is the UI surface.

**The wrong tool.** Storing curation state in `disc_entries` (it's not conversational). Storing curation state in the RAG index (it's structural, not semantic). Storing curation state in the knowledge digest (it's per-discussion, not durable).

**The codepath** (SSDL):

```
[Q:discussion starts]
│
▼
[Q:which ContextPreset is active?]
│
├── preset N ──► [I:load ContextPreset N's FileItems]
│
▼
[loop: each FileItem]
│
├──► [Q:FileItem.view_mode?]
│    ├── full     ──► [I:read full file]
│    ├── skeleton ──► [I:py_get_skeleton / ts_c_get_skeleton]
│    ├── summary  ──► [I:run_subagent_summarization]
│    ├── sig      ──► [I:py_get_skeleton (signatures only)]
│    ├── def      ──► [I:py_get_skeleton (definitions only)]
│    └── agg      ──► [I:py_get_skeleton (children only)]
│
├──► [Q:FileItem.ast_mask?] ──► [I:apply ast_mask to the rendered view]
├──► [Q:FileItem.custom_slices?] ──► [I:apply custom_slices]
└──► [I:append to aggregate markdown]
```

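The `view_mode` branch in the codepath is essentially a lookup table from mode name to renderer. The sketch below is a minimal version of that dispatch. It makes several assumptions: `FileItemSketch` is a hypothetical stand-in for a few `FileItem` fields, custom slices are simplified to `(start, end)` line ranges instead of Fuzzy Anchors, and the renderers are supplied by the caller rather than being the real `py_get_skeleton` family.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileItemSketch:
    # Hypothetical stand-in for a few FileItem fields; not the schema in src/models.py.
    path: str
    view_mode: str = "full"  # full / skeleton / summary / sig / def / agg
    # Simplification: slices as (start, end) line ranges instead of Fuzzy Anchors.
    custom_slices: list[tuple[int, int]] = field(default_factory=list)


Renderer = Callable[[str], str]


def render_file(item: FileItemSketch, source: str, renderers: dict[str, Renderer]) -> str:
    """Render one file per its curation config via a view_mode -> renderer table."""
    render = renderers.get(item.view_mode)
    if render is None:
        # Errors as data: a visible note in the output instead of an exception.
        return f"### {item.path}\n[unsupported view_mode {item.view_mode!r}]\n"
    text = render(source)
    if item.custom_slices:
        lines = text.splitlines()
        text = "\n".join("\n".join(lines[start:end]) for start, end in item.custom_slices)
    return f"### {item.path}\n{text}\n"
```

With a table, adding a view mode means adding one entry rather than another branch, and the rendered output depends only on the `FileItem` and the file text.
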
**The shape rule.** Curation is per-file, per-discussion, structural. It is edited in the Structural File Editor and persisted in TOML. The file's `FileItem` is the single source of truth for "how do I render this file in the AI's context."

**See:** `docs/guide_context_curation.md`; `src/models.py:510-559` (FileItem schema); `src/context_presets.py` (ContextPresetManager).

---

## 2. Discussion memory (per-discussion, conversational, multi-turn)

**The shape.** `app.disc_entries: list[dict]` where each entry is `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` and `usage` (token accounting). The discussion is rendered as a `list[Message]` for the LLM by `build_markdown` (per `src/aggregate.py`).

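The sketch below shows one way to project `disc_entries` into chat messages without touching the entries themselves. It is an illustration, not how `build_markdown` works. `ROLE_MAP` and `entries_to_messages` are hypothetical. The `"User"` and `"AI"` labels come from the codepath below, while the `"System"` label and the choice to skip unknown roles are assumptions.

```python
from typing import Any

# Hypothetical mapping from GUI role labels to provider chat roles.
ROLE_MAP = {"User": "user", "AI": "assistant", "System": "system"}


def entries_to_messages(disc_entries: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Project disc_entries into provider-style messages without mutating them."""
    messages = []
    for entry in disc_entries:
        role = ROLE_MAP.get(entry["role"])
        if role is None:
            continue  # assumption: roles without a provider mapping are not sent
        # "collapsed" and "ts" are UI/bookkeeping fields; only role and content travel.
        messages.append({"role": role, "content": entry["content"]})
    return messages
```
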
**The query model.** "What did the user say? What did the AI say? In what order?" The discussion is the *prior context* for the next LLM call. The user can edit, insert, delete, role-change, and branch at any entry (A1-A7 per-entry operations per the nagent review v1 §3).

**The right tool.** The Discussion Hub panel: per-entry `[Edit]`, `[Read]`, `[+/-]`, `Ins`, `Del`, `[Branch]`, and the role combo; the undo/redo stack (UISnapshot); and the Take/branching/compact system.

**The wrong tool.** Storing discussion state in the RAG index (it's temporal, not semantic). Storing discussion state in the knowledge digest (it's per-discussion, not durable). Storing discussion state in a FileItem (it's not per-file).

**The codepath** (SSDL):

```
[Q:user types prompt + hits Enter]
│
▼
[I:append new entry to disc_entries] (role: "User")
│
▼
[Q:which ContextPreset is active?] ──► [I:render FileItems per curation memory]
│
▼
[I:aggregate.build_markdown(preset, discussion) -> str]
│
▼
[I:ai_client.send(aggregate_text, history)]
│
▼
[I:append new entry to disc_entries] (role: "AI", content: response)
│
▼
[Q:user pressed Edit on an entry?] ──► [I:update disc_entries[i].content]
[Q:user pressed Branch on an entry?] ──► [I:project_manager.branch_discussion(index) -> new Take]
[Q:user pressed Undo?] ──► [I:history.UISnapshot.pop() -> restore previous state]
[Q:user pressed Compact?] ──► [I:ai_client.run_discussion_compaction(discussion)]
```

**The shape rule.** Discussion is per-discussion, conversational, multi-turn. It is edited per entry and persisted in TOML via `_flush_to_project`. The `disc_entries` list is the single source of truth for "what was said in this discussion."

**See:** `docs/guide_architecture.md` §"Threading model"; `src/gui_2.py:3770-3853` (render_discussion_entry); `src/history.py:8-71` (UISnapshot).

---

## 3. RAG memory (opt-in, semantic, fuzzy)

**The shape.** A ChromaDB vector store of per-file, `FileItem`-like records with embeddings. `RAGEngine.search(query, k=N)` returns the top-N most-similar chunks. Persisted in `~/.manual_slop/.slop_cache/chroma_<embedding_provider>/`.

**The query model.** "Given a query, return similar content from the indexed corpus." Semantic similarity, fuzzy. No provenance beyond the file path and chunk offset. No user-editable content.

**The right tool.** `RAGEngine.search()` at LLM call time (the `rag_*` results injected into the LLM prompt). The `[X] Enable RAG` toggle in AI Settings. The `RAGConfig` (embedding provider, chunk size, chunk overlap, source selection).

**The wrong tool.** Using RAG as a *replacement* for the other 3 dimensions. Using RAG results for state mutation (the integration discipline prohibits this). Using RAG for "show me the last thing the user said" (use Discussion memory). Using RAG for "show me what we decided last time" (use Knowledge memory).

**The codepath** (SSDL):

```
[Q:ai_client.send() is called]
│
▼
[Q:is RAG enabled?]
│
├── no ──► [T:skip]
│
▼
[Q:which RAG source?]
│
├── project ──► [I:RAGEngine.index_file for each file in project]
├── global ──► [I:RAGEngine.index_file for each file in ~/.manual_slop/knowledge/]
└── none ──► [T:skip]
│
▼
[Q:RAG engine initialized?]
│
├── no ──► [I:RAGEngine._init_embedding_provider()] (lazy init, may download)
│
▼
[I:RAGEngine.search(query, k=N) -> Result[list[SearchResult], ErrorInfo]]
│
▼
[I:append "{rag-context}" block to aggregate markdown]
```

**The shape rule.** RAG is opt-in and default-off. It complements the other dimensions and never replaces them. Provenance is required (file path, chunk offset). No mutation.

**See:** `docs/guide_rag.md`; `conductor/code_styleguides/rag_integration_discipline.md`; `src/rag_engine.py:1-384`.

---

## 4. Knowledge memory (per-project, durable, provenance-aware)

**The shape.** A markdown tree at `~/.manual_slop/knowledge/`:

| File | Format | What it stores |
|---|---|---|
| `facts.md` | `- {statement} {provenance}` | Durable statements about systems, repos, tools |
| `decisions.md` | `- {statement, reason} {provenance}` | Decisions that were made |
| `questions.md` | `- {question} {provenance}` | Unanswered questions |
| `playbooks.md` | `- **{name}**: {steps} {provenance}` | Reusable command sequences |
| `tasks.md` | `- {task}` (## Open / ## Done) | Open and done tasks |
| `files/{file_id}.md` | `- {note} {provenance}` | Per-file notes (keyed by inode) |
| `digest.md` | bounded 4KB | The projected digest (injected as `{knowledge}` block) |
| `ledger.json` | `{entries: {sha256: {status, at, items}}}` | The harvest audit log |

**The query model.** "Given past sessions, what durable knowledge should I inject into the current discussion?" The answer is the `{knowledge}` block in the initial context, regenerated from the category files (newest first) and bounded to 4KB.

**The right tool.** The harvest CLI (`python -m src.knowledge_harvest`) for harvesting; any plain-text editor for the category files; the "Knowledge" panel in the GUI to browse, edit, and prune.

**The wrong tool.** Treating the knowledge digest as state (it's a projection; the category files are the state). Letting the digest grow unbounded (4KB cap; truncate with a visible note). Treating the per-file notes as a replacement for FileItem curation (different dimensions; both are useful).

**The codepath** (SSDL):

```
[Q:discussion starts]
│
▼
[Q:knowledge digest exists?]
│
├── no ──► [T:skip]
│
▼
[Q:digest within 4KB budget?]
│
├── yes ──► [I:read digest]
├── no ──► [I:read digest (truncated with note)]
│
▼
[I:append "{knowledge}" block to stable prefix] (layer 7)
│
▼
[Q:per-file knowledge for files in scope?]
│
├── yes ──► [I:append "{file-knowledge}" per FileItem]
```

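The first three steps of this codepath fit in one small function. It is a sketch under several assumptions: "4KB" is taken to mean 4096 bytes, the digest is UTF-8, and the wording of the truncation note is invented. `load_knowledge_block` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

DIGEST_BUDGET_BYTES = 4 * 1024  # assumes "4KB" means 4096 bytes


def load_knowledge_block(knowledge_dir: Path, budget: int = DIGEST_BUDGET_BYTES) -> Optional[str]:
    """Return the {knowledge} block text, or None when the digest is absent ("delete to turn off")."""
    digest = knowledge_dir / "digest.md"
    if not digest.is_file():
        return None
    raw = digest.read_bytes()
    if len(raw) <= budget:
        return raw.decode("utf-8")
    # Cut at the byte budget, drop any split multi-byte character, end on a whole line,
    # and append a visible note (the note itself sits outside the budget).
    kept = raw[:budget].decode("utf-8", errors="ignore").rsplit("\n", 1)[0]
    return f"{kept}\n[knowledge digest truncated: {len(raw)} bytes > {budget}-byte budget]"
```

Returning `None` rather than an empty string lets the caller omit the `{knowledge}` block entirely, so deleting the file turns the feature off without leaving an empty wrapper in the prompt.
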
**The shape rule.** Knowledge is per-project, durable, provenance-aware. It is edited by the user as plain markdown. The category files are the source of truth; the digest is a projection. "Delete to turn off": `rm digest.md` → no injection.

**See:** `docs/guide_knowledge_curation.md`; `conductor/code_styleguides/knowledge_artifacts.md`.

---

## 5. The boundaries (when NOT to mix)

| Don't store... | In... | Because... |
|---|---|---|
| Discussion state | `FileItem` (curation) | Discussion is per-discussion, not per-file |
| File curation | `disc_entries` (discussion) | Curation is per-file structural, not conversational |
| Semantic search results | `disc_entries` (discussion) | RAG is fuzzy; the discussion is precise |
| A long conversation | the knowledge digest | The digest is bounded (4KB); the conversation is unbounded |
| A "this is the current state" fact | the RAG index | RAG is semantic; state is precise |
| Per-file notes | the discussion context | The notes should follow the file, not the discussion |
| Per-discussion summary | the knowledge digest | The digest is *cross*-discussion, not per-discussion |
| LLM-derived curation | the FileItem schema | LLM outputs are untrusted; the FileItem is user-edited |
| Untrusted LLM output | the knowledge category files | The harvest has retry + graceful failure; but the category files are *user-editable*, so corrections are first-class |

**The discipline.** When designing a new feature, ask: which of the 4 dimensions is the *natural* home? Don't reach for RAG because "it's there"; reach for the dimension whose shape matches the data.

---

## 6. The decision tree (the 1-question test)

When a feature needs *some* memory, ask this single question:

```
Q: What is the *data* (not the operation) the feature needs?
│
├── "How to render a file"           ──► Curation (FileItem)
├── "What was said in this chat"     ──► Discussion (disc_entries)
├── "What similar content exists"    ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

Pick the matching dimension. If the feature needs two or more, use them, but state explicitly which is *primary* (the one that holds the *answer*) and which is *secondary* (the one that provides *context*).

---

## 7. The cross-cutting principle (the "data is the thing")

All 4 dimensions share one principle: **the data is the thing, not the agent.** Each dimension has:
- A flat shape (no object graphs; structs of structs of scalars)
- Durable storage (TOML, ChromaDB, markdown — not Python objects)
- A user-editable surface (the Structural File Editor, the Discussion Hub, the RAG toggle, the category files)
- A query model that returns "data, not control flow" (per `data_oriented_error_handling_20260606`)

The wrong shape for the right question is a common mistake. The right question is "which of the 4 dimensions is this?" — not "is there a tool that does X?"

---

## 8. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` — the canonical styleguide
- `docs/guide_context_curation.md` — the existing curation deep-dive (dimension 1)
- `docs/guide_rag.md` — the existing RAG deep-dive (dimension 3)
- `docs/guide_knowledge_curation.md` — the new knowledge guide (dimension 4)
- `docs/guide_caching_strategy.md` — where the 4 dims get injected in the cache strategy
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8 — the nagent-origin pattern that informed this guide
@@ -703,3 +703,143 @@ Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is decl
- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
- **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient

## Addition (2026-06-12) — Cache strategy and the 12-layer model

The nagent review (v2.3, §3.2 + §5) formalizes the cache strategy that this client implements. The strategy is **stable-to-volatile context ordering**: layers 1-7 of the initial context are byte-identical across turns and across discussions of the same mode (and therefore cacheable), while layers 8-12 are per-turn (and therefore not cached).

### The 12-layer model (the recap)

| # | Layer | Stable? | Where |
|---|---|---|---|
| 1 | Role instructions | yes | `_get_combined_system_prompt` |
| 2 | Function-calling schema | yes | per provider |
| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` |
| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` |
| 5 | Persona profile | yes | `app_state.active_persona` |
| 6 | Project context (per `manual_slop.toml`) | yes | NEW (Candidate 14) |
| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within a gc cycle) | NEW (Candidate 8) |
| 8 | Discussion metadata | no | `disc_entries[:1]` or `disc_meta` |
| 9 | Active preset (FileItem set) | no | `self.context_files` |
| 10 | Per-file details | no | per `FileItem` |
| 11 | Prior tool results | no | per `_reread_file_items` |
| 12 | User message | no | the input |

### The byte-comparison test (the design contract)

The test in `tests/test_aggregate_caching.py` ensures the first N characters of the context are byte-identical across turns:

```python
# `aggregate` is src/aggregate.py; `mock_app_controller` is a test helper (location not shown).
def test_aggregate_stable_to_volatile_ordering():
    ctrl = mock_app_controller()
    turn1 = aggregate.build_initial_context(ctrl, user_message="first")
    turn2 = aggregate.build_initial_context(ctrl, user_message="second")
    N = aggregate.stable_prefix_length(ctrl)
    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
```

**The test is the contract.** If a new layer is added in the wrong position, the test fails; the agent must move the layer to the stable position or update the test with written justification.

### The provider-specific cache strategies

#### Anthropic (5-min ephemeral, 4 breakpoints max)

```python
def _send_anthropic(messages, *, cache_prefix_chars=None):
    # `model` and `anthropic_client` are assumed module-level state (not shown).
    if cache_prefix_chars is not None:
        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
    else:
        content_blocks = messages

    response = anthropic_client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": content_blocks}],
    )
    return _result_with_usage(response.content, response.usage, messages)
```

**The `cache_prefix_blocks` helper** splits the message at the given char offsets and marks each prefix with `cache_control: {"type": "ephemeral"}`. Max 3 prefix blocks (provider limit is 4 breakpoints per request).

**The Anthropic usage accounting** (in `_result_with_usage`): Anthropic reports cached tokens separately from `input_tokens`, so `cache_read_input_tokens` + `cache_creation_input_tokens` are added back into `input_tokens`. This keeps the accounting as "tokens sent" across providers, and caching stays *invisible* in the user-facing number.

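The normalization itself is a single addition. The sketch below assumes the usage object has already been converted to a plain dict using the Anthropic field names. `NormalizedUsage` and `normalize_anthropic_usage` are hypothetical and are not `_result_with_usage`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedUsage:
    input_tokens: int   # total prompt tokens sent, cached or not
    output_tokens: int
    cached_tokens: int  # informational only; already included in input_tokens


def normalize_anthropic_usage(usage: dict) -> NormalizedUsage:
    # Cache reads/writes are reported outside input_tokens, so fold them back in.
    # `or 0` covers fields that are absent or explicitly None.
    cache_read = usage.get("cache_read_input_tokens") or 0
    cache_write = usage.get("cache_creation_input_tokens") or 0
    return NormalizedUsage(
        input_tokens=(usage.get("input_tokens") or 0) + cache_read + cache_write,
        output_tokens=usage.get("output_tokens") or 0,
        cached_tokens=cache_read,
    )
```
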
#### Gemini (1-h explicit, configurable TTL)

```python
from google.genai import types


def _send_gemini(stable_prefix_messages, volatile_messages, *, cache_ttl_seconds=3600):
    # `model` and `genai_client` (a google.genai.Client) are assumed module-level state.
    messages = stable_prefix_messages + volatile_messages
    if cache_ttl_seconds > 0:
        # Simplified: creates a cache per call; a real client reuses it until it expires.
        cached_content = genai_client.caches.create(
            model=model,
            config=types.CreateCachedContentConfig(
                contents=stable_prefix_messages, ttl=f"{cache_ttl_seconds}s",
            ),
        )
        response = genai_client.models.generate_content(
            model=model, contents=volatile_messages,
            config=types.GenerateContentConfig(cached_content=cached_content.name),
        )
    else:
        response = genai_client.models.generate_content(model=model, contents=messages)
    return _result_with_usage(response.text, response.usage_metadata, messages)
```

**The default TTL is 1 hour**; configurable per-discussion via the GUI.

#### OpenAI (5-10 min implicit, provider-managed)

No application-side control; the provider handles caching. The GUI just shows "Cached by OpenAI; TTL: provider-managed."

### The GUI exposure (the "Caching" Operations Hub sub-panel)

| Provider | Default TTL | Configurable? |
|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-discussion state) |
| Gemini explicit | 1 h | yes (TTL override) |
| OpenAI implicit | 5-10 min (provider-managed) | no |
| claude-code (Claude Agent SDK) | varies (provider-managed) | no |

**The new AI client state:**

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DiscussionCacheState:
    discussion_id: str
    provider: str
    cached_at: datetime
    expires_at: Optional[datetime]  # None when the provider manages the TTL (OpenAI implicit, claude-code)
    hit_count: int = 0
    tokens_cached: int = 0
    last_invalidated_at: Optional[datetime] = None
    caching_enabled: bool = True
```

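The sub-panel needs a status label and a countdown derived from this state. The following is a minimal sketch of that derivation. It assumes the default TTLs from the table above, provider keys named `"anthropic"`, `"gemini"`, `"openai"`, and `"claude-code"`, and made-up labels. `new_expiry` and `cache_status` are hypothetical helpers. Anthropic refreshes its ephemeral TTL each time the cached prefix is read, so `new_expiry` would be called again after every hit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default TTLs from the table above; None means "provider-managed, unknown to the client".
DEFAULT_TTL = {
    "anthropic": timedelta(minutes=5),
    "gemini": timedelta(hours=1),
    "openai": None,
    "claude-code": None,
}


@dataclass(frozen=True)
class CacheStatus:
    label: str
    seconds_left: Optional[int]


def new_expiry(provider: str, now: datetime) -> Optional[datetime]:
    ttl = DEFAULT_TTL.get(provider)
    return None if ttl is None else now + ttl


def cache_status(expires_at: Optional[datetime], caching_enabled: bool, now: datetime) -> CacheStatus:
    if not caching_enabled:
        return CacheStatus("disabled", None)
    if expires_at is None:
        return CacheStatus("cached (provider-managed TTL)", None)
    left = int((expires_at - now).total_seconds())
    return CacheStatus("expired", 0) if left <= 0 else CacheStatus("warm", left)
```

Passing `now` in explicitly, instead of calling the clock inside the helpers, keeps both functions pure and easy to test.
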
**The Hook API additions:**

```
GET  /api/cache                  # list all discussion cache states
GET  /api/cache/<discussion_id>  # get one
POST /api/cache/<discussion_id>/invalidate
POST /api/cache/<discussion_id>/disable
POST /api/cache/<discussion_id>/enable
```

### The 5th provider (claude-code)

`claude-code` uses the Claude Agent SDK with local Claude Code authentication (no API key). The caching behavior is provider-managed.

```python
import os

from claude_agent_sdk import ClaudeAgentOptions


def _send_claude_code(message, model, *, allowed_tools=None, max_turns=1):
    options = ClaudeAgentOptions(
        model=None if not model or model == "default" else model,
        max_turns=max_turns,
        tools=list(allowed_tools) if allowed_tools else [],
        allowed_tools=list(allowed_tools) if allowed_tools else [],
        cwd=os.getcwd(),
    )
    # ... claude_agent_sdk.query(prompt=message, options=options)
    # (elided: iterate the query stream and collect `text` and `usage`)
    return _result_with_usage(text, usage, message)
```

### The cross-references

- `docs/guide_caching_strategy.md` — the user-facing deep-dive
- `conductor/code_styleguides/cache_friendly_context.md` — the canonical styleguide
- `docs/guide_agent_memory_dimensions.md` — the 4 dims (where the cache hits)
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern

@@ -0,0 +1,342 @@
# Caching Strategy Guide

**Status:** User-facing deep-dive on the cache strategy: stable-to-volatile context ordering, the 4 cache-TTL profiles (Anthropic, Gemini, OpenAI, claude-code), and the GUI exposure.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/cache_friendly_context.md`; `docs/guide_ai_client.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5.

> **What this is.** The LLM providers Manual Slop uses (Anthropic, Gemini, OpenAI) all support prompt caching. The cost benefit comes from the *stable prefix* being byte-identical across turns. This guide is the user-facing deep-dive on the 12-layer model, the byte-comparison test, the provider-specific TTLs, and the GUI exposure.

---

## 0. The 30-second version

```
[STABLE PREFIX (cached across turns)]      [VOLATILE SUFFIX (per-turn)]
[Role instructions]                        [Discussion metadata]
[Function-calling schema]                  [Active preset (FileItems)]
[Discovered tool descriptions]             [Per-file details]
[System prompt preset]                     [Tool-call results from prior turns]
[Persona profile]                          [The user message]
[Project context]
[Knowledge digest]
[file-knowledge for files in scope]
```

**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns; layers 8-12 change per turn. The Anthropic-specific path wraps the prefix in `cache_control: {"type": "ephemeral"}` blocks; the Gemini path uses `cachedContent` resources; the OpenAI path uses implicit prefix caching.

**The provider-specific defaults:**

| Provider | Default TTL | Configurable? | GUI exposure? |
|---|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-discussion) | yes |
| Gemini explicit | 1 h | yes (per-discussion override) | yes (TTL override) |
| OpenAI implicit | 5-10 min (provider-managed) | no | shows "cached" only |
| claude-code (Claude Agent SDK) | varies (provider-managed) | no | shows "cached" only |

---
|
||||
|
||||
## 1. The 12-layer model (the stable-to-volatile ordering)
|
||||
|
||||
| # | Layer | Stable across turns? | Source | SSDL |
|
||||
|---|---|---|---|---|
|
||||
| 1 | Role instructions (model + provider) | yes | `_get_combined_system_prompt` | `[I]` |
|
||||
| 2 | Function-calling schema | yes | per provider | `[I]` |
|
||||
| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` | `[I]` |
|
||||
| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` | `[I]` |
|
||||
| 5 | Persona profile | yes | `app_state.active_persona` | `[I]` |
|
||||
| 6 | Project context (per `manual_slop.toml`) | yes | NEW | `[I]` |
|
||||
| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within a gc cycle) | NEW | `[I]` |
|
||||
| 8 | Discussion metadata (name, role count) | no (per turn) | `disc_entries[:1]` or `disc_meta` | `───` |
|
||||
| 9 | Active preset (FileItem set) | no (per turn) | `self.context_files` | `───` |
|
||||
| 10 | Per-file details (history, slices, notes) | no (per file) | per `FileItem` | `───` |
|
||||
| 11 | Tool-call results from prior turns | no (per turn) | per `_reread_file_items` | `───` |
|
||||
| 12 | The user message | no (per turn) | the input | `───` |
|
||||
|
||||
**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns of the same discussion (and across discussions of the same mode). Layers 8-12 change per turn.
|
||||
|
||||
---
|
||||
|
||||
## 2. The byte-comparison test (the design contract)
|
||||
|
||||
The design rule "stable prefix is byte-identical" must be testable. The test:
|
||||
|
||||
```python
# In tests/test_aggregate_caching.py (NEW)
# mock_app_controller() and mock_persona() are test fixtures (not shown).
from src import aggregate  # module path assumed


def test_aggregate_stable_to_volatile_ordering():
    """The first N characters of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    ctrl = mock_app_controller()
    ctrl.ai_settings.system_prompt = "Test system prompt"
    ctrl.active_persona = mock_persona()

    # Turn 1
    turn1 = aggregate.build_initial_context(ctrl, user_message="first prompt")

    # Turn 2 (same stable inputs, different user message)
    turn2 = aggregate.build_initial_context(ctrl, user_message="second prompt")

    # The first N characters should be identical (N = where the volatile layers start)
    N = aggregate.stable_prefix_length(ctrl)
    assert N > 0, "Stable prefix is empty; the comparison below would pass vacuously"
    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
```

**The test is the contract.** If a per-turn (volatile) layer is inserted among the stable layers, this test fails; the agent must either move the layer to the correct side of the cache boundary or update the test (with written justification).

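The byte-comparison test catches a misordered stack after the fact. A complementary structural check, sketched here under the assumption that each layer can be tagged stable or volatile (`check_layer_order` is a hypothetical helper), rejects a stable layer placed after a volatile one, because such a layer can never be part of the cached prefix.

```python
def check_layer_order(stability: list[bool]) -> None:
    """Raise if any stable layer appears after a volatile one.

    stability[i] is True when layer i+1 is stable (cacheable).
    """
    seen_volatile = False
    for index, stable in enumerate(stability, start=1):
        if not stable:
            seen_volatile = True
        elif seen_volatile:
            raise ValueError(
                f"layer {index} is stable but follows a volatile layer; "
                "it can never be part of the cached prefix"
            )


# The 12-layer stack: layers 1-7 stable, layers 8-12 volatile.
check_layer_order([True] * 7 + [False] * 5)
```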
---

## 3. The provider-specific cache strategies

### 3.1 Anthropic (5-minute ephemeral, 4 breakpoints max)

```python
# In src/ai_client.py:_send_anthropic
# anthropic_client and model are assumed to be ai_client module state (not shown).
def _send_anthropic(messages, *, cache_prefix_chars=None):
    # `messages` is the assembled context string; `cache_prefix_chars` is a
    # list of char offsets at which to place cache breakpoints (or None).
    if cache_prefix_chars is not None:
        # Split the message into content blocks; mark each prefix block with cache_control
        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
    else:
        content_blocks = messages

    response = anthropic_client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": content_blocks}],
    )
    return _result_with_usage(response.content, response.usage, messages)
```

**The `cache_prefix_blocks` helper:**

```python
def cache_prefix_blocks(message: str, cache_boundaries: list[int]) -> str | list[dict]:
    """Split the message into content blocks at the given char offsets.
    Mark each prefix block with cache_control. Returns the plain string
    when no valid boundary exists. At most 3 prefix blocks (provider limit
    is 4 breakpoints per request)."""
    if not cache_boundaries:
        return message
    # Unique offsets strictly inside the message, in ascending order, capped at 3
    points = sorted({b for b in cache_boundaries if 0 < b < len(message)})[:3]
    if not points:
        return message
    blocks = []
    start = 0
    for point in points:
        blocks.append({
            "type": "text",
            "text": message[start:point],
            "cache_control": {"type": "ephemeral"},
        })
        start = point
    # The volatile suffix carries no cache_control marker
    blocks.append({"type": "text", "text": message[start:]})
    return blocks
```

**The Anthropic usage accounting:**

```python
def _result_with_usage(text, usage, input_text=None):
    input_tokens = _usage_value(usage, "input_tokens", "prompt_tokens", "prompt_token_count")
    # Anthropic reports cached prompt tokens separately; fold them back
    # so input_tokens stays "tokens sent" across providers.
    input_tokens += _usage_value(usage, "cache_read_input_tokens")
    input_tokens += _usage_value(usage, "cache_creation_input_tokens")
    # ...
```

**The 4-breakpoint limit.** Anthropic allows at most 4 `cache_control` breakpoints per request. Manual Slop uses at most 3 prefix blocks (one breakpoint each) plus 1 unmarked volatile suffix, which keeps it within the limit.

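The 3-block cap in `cache_prefix_blocks` covers only the message body; the limit applies to the whole request, including any markers set on the system prompt or tool definitions. A hypothetical pre-flight check, assuming the request is assembled as a plain dict in the shape `messages.create` accepts (`count_cache_breakpoints` and `check_breakpoint_budget` are illustrative names):

```python
MAX_CACHE_BREAKPOINTS = 4  # Anthropic's per-request limit


def count_cache_breakpoints(request: dict) -> int:
    """Count cache_control markers on tools, system blocks, and message content blocks."""
    count = sum("cache_control" in tool for tool in request.get("tools", []))
    system = request.get("system")
    if isinstance(system, list):
        count += sum("cache_control" in block for block in system)
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):  # a plain string carries no markers
            count += sum("cache_control" in block for block in content)
    return count


def check_breakpoint_budget(request: dict) -> None:
    n = count_cache_breakpoints(request)
    if n > MAX_CACHE_BREAKPOINTS:
        raise ValueError(f"{n} cache_control breakpoints; Anthropic allows {MAX_CACHE_BREAKPOINTS}")
```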
### 3.2 Gemini (1-hour explicit cache, configurable TTL)

```python
# In src/ai_client.py:_send_gemini
# genai_client, model, stable_prefix_messages and volatile_messages come from the
# surrounding ai_client state (the stable/volatile split of §1).
from google import genai


def _send_gemini(messages, *, cache_ttl_seconds=3600):
    if cache_ttl_seconds > 0:
        # Note: this creates a new cache on every call. Reusing one cache per
        # discussion (tracked in DiscussionCacheState, §4) avoids repeated cache writes.
        cached_content = genai_client.caches.create(
            model=model,
            config=genai.types.CreateCachedContentConfig(
                contents=stable_prefix_messages,
                ttl=f"{cache_ttl_seconds}s",
            ),
        )
        response = genai_client.models.generate_content(
            model=model,
            contents=volatile_messages,
            config=genai.types.GenerateContentConfig(cached_content=cached_content.name),
        )
    else:
        response = genai_client.models.generate_content(model=model, contents=messages)
    return _result_with_usage(response.text, response.usage_metadata, messages)
```

**The default TTL is 1 hour.** It is configurable in the GUI (see §4).

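The GUI in §4 offers a "Gemini default TTL" dropdown and an opt-in "Allow >1h Gemini caches" checkbox. A minimal sketch of turning those two settings into the `cache_ttl_seconds` argument above; the function name and the clamp-to-one-hour rule are assumptions about how the checkbox is meant to behave.

```python
ONE_HOUR_SECONDS = 3600


def resolve_gemini_ttl_seconds(requested_minutes: int, *, allow_over_1h: bool = False) -> int:
    """Turn the GUI's TTL choice into cache_ttl_seconds for _send_gemini.

    Values above one hour are clamped unless the user ticked
    "Allow >1h Gemini caches". A result of 0 disables explicit caching
    (the `else` branch of _send_gemini).
    """
    seconds = max(0, int(requested_minutes) * 60)
    if not allow_over_1h:
        seconds = min(seconds, ONE_HOUR_SECONDS)
    return seconds
```

Usage would look like `_send_gemini(messages, cache_ttl_seconds=resolve_gemini_ttl_seconds(60))`. Clamping rather than rejecting keeps a stale >1h setting from breaking sends after the user unticks the checkbox.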
### 3.3 OpenAI (5-10 min implicit, provider-managed)

OpenAI's caching is *implicit*: the provider automatically caches prompt prefixes (for prompts of 1024 tokens or more) and reuses them across requests that share the same prefix. There is no application-side control.

```python
# In src/ai_client.py:_send_openai
def _send_openai(messages, *, model="gpt-5.5"):
    # No application-side cache_control; the provider handles caching
    response = openai_client.responses.create(model=model, input=messages)
    return _result_with_usage(response.output_text, response.usage, messages)
```

**The TTL is provider-managed** (5-10 min). The GUI just shows "Cached by OpenAI; TTL: provider-managed."

### 3.4 claude-code (5th provider, subscription auth)

`claude-code` uses the Claude Agent SDK with local Claude Code authentication (no API key). Its caching behavior is provider-managed.

```python
# In src/ai_client.py:_send_claude_code (the 5th provider)
import os

from claude_agent_sdk import ClaudeAgentOptions


def _send_claude_code(message, model, *, allowed_tools=None, max_turns=1):
    options = ClaudeAgentOptions(
        model=None if not model or model == "default" else model,
        max_turns=max_turns,
        tools=list(allowed_tools) if allowed_tools else [],
        allowed_tools=list(allowed_tools) if allowed_tools else [],
        cwd=os.getcwd(),
    )
    # ... claude_agent_sdk.query(prompt=message, options=options)
    #     collects `text` and `usage` from the streamed messages (elided)
    return _result_with_usage(text, usage, message)
```

---

## 4. The GUI exposure

The "Caching" Operations Hub sub-panel:

```
+------------------------------------------------------+
| Caching                                              |
+------------------------------------------------------+
| Provider summaries                                   |
|   [Anthropic]  in:340  cache:80   hit:23%  ttl:4:32  |
|   [Gemini]     in:120  cache:0    hit:0%   ttl:0:00  |
|   [OpenAI]     in:560  cache:200  hit:35%  ttl:n/a   |
+------------------------------------------------------+
| Active discussions                                   |
|   Discussion "refactor auth"                         |
|     cached: yes (Anthropic)                          |
|     expires: 2026-06-12T15:32 (in 4:32)              |
|     [Invalidate cache] [Disable caching for this]    |
|   Discussion "fix the parser"                        |
|     cached: no                                       |
|     [Enable caching for this]                        |
+------------------------------------------------------+
| Global settings                                      |
|   [X] Enable Anthropic ephemeral caching             |
|   [X] Enable Gemini explicit caching                 |
|   [ ] Allow >1h Gemini caches (charges may apply)    |
|   Anthropic default TTL: [5 min v]                   |
|   Gemini default TTL:    [60 min v]                  |
+------------------------------------------------------+
```

**The data sources:**

| Widget | Data source | Frequency |
|---|---|---|
| `in:N cache:N hit:N%` | `ai_client.get_token_stats()` | per turn (or per session) |
| `ttl:4:32` | `ai_client._send_<provider>` usage metadata + the cache expiry timestamp | per turn |
| `cached: yes/no` | per-discussion flag (NEW) | per discussion |
| `[Invalidate cache]` | calls `ai_client._invalidate_cache(discussion_id)` (NEW) | on click |

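The `ttl:4:32` widget is a countdown from the cache expiry timestamp. A minimal sketch of the formatting, assuming timezone-aware `datetime` values and treating a missing expiry as provider-managed (the OpenAI `ttl:n/a` case); `format_ttl` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def format_ttl(expires_at: Optional[datetime], now: datetime) -> str:
    """Render the remaining cache lifetime as m:ss, e.g. "4:32".

    None means the provider manages the TTL and is shown as "n/a";
    an already-expired cache is shown as "0:00".
    """
    if expires_at is None:
        return "n/a"
    remaining = max(0, int((expires_at - now).total_seconds()))
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes}:{seconds:02d}"


now = datetime(2026, 6, 12, 15, 27, 28, tzinfo=timezone.utc)
print(format_ttl(datetime(2026, 6, 12, 15, 32, tzinfo=timezone.utc), now))  # 4:32
print(format_ttl(now + timedelta(hours=1), now))                            # 60:00
```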
**The new AI client state:**

```python
# In src/ai_client.py (NEW)
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DiscussionCacheState:
    discussion_id: str
    provider: str
    cached_at: datetime
    expires_at: Optional[datetime]  # None when unknown (e.g. provider-managed TTL)
    hit_count: int = 0
    tokens_cached: int = 0
    last_invalidated_at: Optional[datetime] = None
    caching_enabled: bool = True


# In AppController (NEW)
self.discussion_caches: dict[str, DiscussionCacheState] = {}
```

**The Hook API additions:**

```
GET  /api/cache                               # list all discussion cache states
GET  /api/cache/<discussion_id>               # get one
POST /api/cache/<discussion_id>/invalidate
POST /api/cache/<discussion_id>/disable
POST /api/cache/<discussion_id>/enable
```

---

## 5. The injection (where the cache hits)

| Layer | Where injected | Stable? | Cache impact |
|---|---|---|---|
| 1. Role instructions | `_get_combined_system_prompt` | yes | **CACHED** |
| 2. Function-calling schema | per provider | yes | **CACHED** |
| 3. Discovered tool descriptions | `mcp_client.get_tool_schemas()` | yes | **CACHED** |
| 4. System prompt preset | `app_state.ai_settings.system_prompt` | yes | **CACHED** |
| 5. Persona profile | `app_state.active_persona` | yes | **CACHED** |
| 6. Project context | `manual_slop.toml [agent.context_files]` | yes | **CACHED** |
| 7. Knowledge digest | `~/.manual_slop/knowledge/digest.md` | yes (within a gc cycle) | **CACHED** |
| 8. Discussion metadata | `disc_entries[:1]` | no | NOT cached |
| 9. Active preset | `self.context_files` | no | NOT cached |
| 10. Per-file details | per `FileItem` | no | NOT cached |
| 11. Prior tool results | per `_reread_file_items` | no | NOT cached |
| 12. User message | the input | no | NOT cached |

**The cache only hits on the stable prefix (layers 1-7).** The volatile suffix (layers 8-12) is *not* cached: it changes every turn, so there is nothing to reuse.

---

## 6. The cache invalidation triggers

| Trigger | Effect |
|---|---|
| `python -m src.knowledge_harvest --apply` | The digest is regenerated; the cache is invalidated for the next turn |
| `FileItem.notes` edited | The per-file knowledge changes; the cache is invalidated for the next turn that references the file |
| `persona` changed | The persona profile is in the stable prefix; the cache is invalidated |
| `[Invalidate cache]` button | The per-discussion cache state is marked `last_invalidated_at`; the next turn re-creates it |
| `expiration` reached | The provider's cache expires automatically; the next turn re-creates it |

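Several of the triggers above (a regenerated digest, a persona change) share one property: they change the bytes of the stable prefix. A content fingerprint detects all of them without enumerating the triggers. This is a sketch of that alternative, assuming the stable prefix is available as a string; `PrefixWatcher` is hypothetical and does not replace the explicit `[Invalidate cache]` and TTL paths.

```python
import hashlib


def stable_fingerprint(stable_prefix: str) -> str:
    """sha256 of the stable prefix (layers 1-7); any byte change alters it."""
    return hashlib.sha256(stable_prefix.encode("utf-8")).hexdigest()


class PrefixWatcher:
    """Remembers the last stable-prefix fingerprint per discussion."""

    def __init__(self) -> None:
        self._last: dict[str, str] = {}

    def needs_new_cache(self, discussion_id: str, stable_prefix: str) -> bool:
        """True on the first turn and whenever the stable prefix changed."""
        fingerprint = stable_fingerprint(stable_prefix)
        changed = self._last.get(discussion_id) != fingerprint
        self._last[discussion_id] = fingerprint
        return changed
```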
---

## 7. The measurement (the empirical basis)

**The "before" measurement** (do this first, before any refactor):

```bash
# Log the cache hit rate over a sample of representative discussions
$ python -m scripts.measure_cache_hit_rate --discussions 50 --provider anthropic
cache hit rate: 23% (avg)
cache write rate: 45% (avg)
in:N avg: 1,200
cache:N avg: 280
```

**The "after" measurement** (after the stable-to-volatile refactor):

```bash
$ python -m scripts.measure_cache_hit_rate --discussions 50 --provider anthropic
cache hit rate: 67% (avg)    # <-- should be measurably higher
cache write rate: 18% (avg)  # <-- should be lower
in:N avg: 1,200              # <-- unchanged (same discussions, same context size)
cache:N avg: ~800            # <-- rises with the hit rate (hit% = cache:N / in:N)
```

**The win comes from re-aligning the boundaries**, not from changing the providers. The test is whether the cache hit rate is measurably higher after the refactor.

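The GUI figures in §4 (e.g. `in:340 cache:80 hit:23%`) are consistent with the hit rate being cached prompt tokens divided by all prompt tokens. A minimal sketch of computing it across turns, assuming `input_tokens` already includes cached tokens (as `_result_with_usage` arranges); `TurnUsage` and `hit_rate` are hypothetical names, not the `measure_cache_hit_rate` script.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    input_tokens: int   # all prompt tokens sent (cached ones folded back in)
    cached_tokens: int  # prompt tokens served from the provider cache


def hit_rate(turns: list[TurnUsage]) -> float:
    """Token-weighted cache hit rate: cached prompt tokens / all prompt tokens."""
    total_in = sum(t.input_tokens for t in turns)
    total_cached = sum(t.cached_tokens for t in turns)
    return total_cached / total_in if total_in else 0.0


before = [TurnUsage(input_tokens=1200, cached_tokens=280)]
print(f"{hit_rate(before):.0%}")  # 23%
```

Pooling tokens across turns weights long turns more heavily than averaging per-turn percentages would; the two numbers diverge when turn sizes vary a lot, so a before/after comparison should use the same method both times.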
---

## 8. The cross-references

- `conductor/code_styleguides/cache_friendly_context.md` — the canonical styleguide
- `docs/guide_ai_client.md` — the underlying LLM client (the producer)
- `docs/guide_agent_memory_dimensions.md` §5 — where the 4 dims get injected
- `docs/guide_knowledge_curation.md` §3 — the digest (layer 7)
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern
@@ -0,0 +1,411 @@
# Knowledge Curation Guide

**Status:** User-facing deep-dive on the 4th memory dimension (the knowledge memory). For agents, see `./docs/AGENTS.md` §6.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/knowledge_artifacts.md`; `docs/guide_agent_memory_dimensions.md` §4; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4.

> **What this is.** The 4th memory dimension is the *durable, user-editable, provenance-aware* knowledge store. It's a *layer*, not a *snapshot*. Category files are the source of truth; the digest is a projection; the ledger is the audit log. This guide is the user-facing deep-dive on how to use it, how to harvest it, and how to query it.

---

## 0. The 30-second version

Manual Slop's knowledge memory lives at `~/.manual_slop/knowledge/`. It holds 5 category files (`facts.md`, `decisions.md`, `questions.md`, `playbooks.md`, `tasks.md`), per-file notes (`files/{file_id}.md`), a digest bounded to 4KB, and a sha256 ledger. The LLM harvests past discussions into these files; the user can edit any of them in plain text. The digest is injected into every new discussion's initial context as a `{knowledge}` block.

```
$ ls ~/.manual_slop/knowledge/
facts.md       # - {statement} {provenance}
decisions.md   # - {statement, reason} {provenance}
questions.md   # - {question} {provenance}
playbooks.md   # - **{name}**: {steps} {provenance}
tasks.md       # ## Open / ## Done
files/         # per-file notes (keyed by inode)
digest.md      # bounded 4KB; the projection
ledger.json    # sha256-of-content audit log
prompts/       # user-editable harvest prompt
```

---

## 1. The 5 category files (the source of truth)

### 1.1 `facts.md` (durable statements)

```markdown
# Facts

- The MCP dispatch uses a flat if/elif chain. 4 places, 45 tools. [from: 2026-05-12-investigate-dispatch, 2026-05-12]
- ai_client.py has 5 separate per-provider history lists, each with their own lock. Switching providers mid-session loses history. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- RAG is opt-in. Default-off in new projects. [from: 2026-06-12-rag-discipline, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. Plain markdown. Append-only. User-editable.

**The provenance string:** `[from: {conversation_name}, {date}]`. The `date` is the ISO-8601 date prefix of the harvest timestamp.

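A minimal sketch of building the provenance suffix, assuming the harvest timestamp is a `datetime` like the ledger's `at` value; `provenance_tag` is a hypothetical name.

```python
from datetime import datetime, timezone


def provenance_tag(conversation_name: str, harvested_at: datetime) -> str:
    """Build the `[from: {conversation_name}, {date}]` suffix for a bullet.

    `date` is the YYYY-MM-DD prefix of the ISO-8601 harvest timestamp.
    """
    return f"[from: {conversation_name}, {harvested_at.isoformat()[:10]}]"


stamp = datetime(2026, 6, 12, 14, 23, 45, 123456, tzinfo=timezone.utc)
bullet = f"- RAG is opt-in. Default-off in new projects. {provenance_tag('2026-06-12-rag-discipline', stamp)}"
# bullet == "- RAG is opt-in. Default-off in new projects. [from: 2026-06-12-rag-discipline, 2026-06-12]"
```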
**The user can edit any fact.** The LLM's output is a *suggestion*; the user is the editor. If a fact is wrong, the user deletes it. If a fact needs more detail, the user adds it. The harvest *appends*; it never *overwrites*.

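The append-only contract is what keeps user edits safe across harvests. A minimal sketch of a helper in the spirit of `_append_bullets` (called by `merge_harvest` in §2), assuming the `(path, header, bullets)` argument order seen there; the name `append_bullets` and its body are hypothetical, not the project's implementation.

```python
from pathlib import Path


def append_bullets(path: Path, header: str, bullets: list[str]) -> int:
    """Append `- {bullet}` lines to a category file; never rewrite existing lines.

    Creates the file (and its directory) with `header` on first use. If the
    user's last edit left no trailing newline, one is added first so the new
    bullet does not merge into their line. Returns the number of bullets written.
    """
    items = [b.strip() for b in bullets if b.strip()]
    if not items:
        return 0
    path.parent.mkdir(parents=True, exist_ok=True)
    new_file = not path.exists()
    needs_newline = False
    if not new_file:
        data = path.read_bytes()
        needs_newline = bool(data) and not data.endswith(b"\n")
    with path.open("a", encoding="utf-8") as handle:
        if new_file:
            handle.write(f"{header}\n\n")
        elif needs_newline:
            handle.write("\n")
        for item in items:
            handle.write(f"- {item}\n")
    return len(items)
```

Opening in append mode means the existing bytes are never read back and rewritten, so a concurrent manual edit cannot be clobbered by a stale in-memory copy.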
### 1.2 `decisions.md` (decisions with reasons)

```markdown
# Decisions

- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]
- Cache TTL defaults to 5 min (Anthropic) + 60 min (Gemini); configurable per-discussion. [from: 2026-06-12-cache-strategy, 2026-06-12]
- Per-file knowledge notes are keyed by st_dev:st_ino, not by path. [from: 2026-06-12-candidate-11, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. The "why" lives in the `detail` field of the LLM's harvest output. The user's edits override.

### 1.3 `questions.md` (unanswered questions)

```markdown
# Questions

- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]
- How should the knowledge digest TTL be exposed in the GUI? [from: 2026-06-12-cache-ttl, 2026-06-12]
```

**The shape:** `- {question} {provenance}`. Open questions are *valuable*: they're the TODO list the next session can act on.

### 1.4 `playbooks.md` (reusable sequences)

```markdown
# Playbooks

- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
- **Stable-to-Volatile Cache Ordering**: identify Instance: boundary -> pass to --cache-prefix-chars. [from: 2026-06-12-candidate-12, 2026-06-12]
- **Candidate Verification (TBD)**: read src/ai_client.py:run_discussion_compression -> check failure mode. [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- **{name}**: {steps} {provenance}`. Playbooks are the "I did this once; here it is" record. Future workers use them directly.

### 1.5 `tasks.md` (open and done)

```markdown
# Tasks

## Open
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]
- Verify Candidate 15 by reading src/ai_client.py:run_discussion_compression. [from: 2026-06-12-candidate-15, 2026-06-12]

## Done
- Read nagent source in full (18 files). [from: 2026-05-15, 2026-05-15]
- Wrote v2.3 review (272KB / 3965 lines). [from: 2026-06-12-v2.3, 2026-06-12]
```

**The shape:** `- {task} {provenance}`. The harvest appends `tasks_open` items under `## Open` and `tasks_done` items under `## Done`; moving an existing item from Open to Done is a manual edit.

---

## 2. The per-file notes (`files/{file_id}.md`)

**The shape:**

```markdown
# /repo/src/ai_client.py

- Uses `cache_control: {"type": "ephemeral"}` blocks for Anthropic caching. [from: 2026-06-12-investigate-cache, 2026-06-12]
- The 5 per-provider history lists are gated by their own locks. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- `run_discussion_compression` failure mode: TBD (Candidate 15). [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- {note} {provenance}`. Keyed by `file_id` (the `st_dev:st_ino` of the file). Survives renames within the same filesystem.

**The `file_id_for_path` pattern** (per nagent's `bin/helpers/nagent_file_edit_lib.py:file_id_for_path`):

```python
from pathlib import Path


def file_id_for_path(path: Path) -> str:
    """Stable file identity across renames. Returns 'device:inode'."""
    stat = path.stat()
    return f"{stat.st_dev}:{stat.st_ino}"
```

**Why inode and not path?** The path can change (rename, move, link); the inode is stable. A note about `src/foo.py` is preserved if `src/foo.py` is renamed to `src/bar.py` (same inode). If the file is moved across filesystems, the inode changes; the user must re-add the note. The same applies when an editor saves by writing a new file and renaming it over the old one: the path stays the same, but the inode (and so the `file_id`) changes.

**The "files" category in the harvest output has a special branch:**

```python
# In merge_harvest (the harvest pipeline)
file_notes = 0
for row in harvested.get("files", []):
    if not isinstance(row, dict):
        continue
    path_text = str(row.get("path") or "").strip()
    note = str(row.get("note") or "").strip()
    if not note:
        continue
    target = Path(path_text) if path_text else None
    if target is not None and target.is_file():
        try:
            file_id = file_id_for_path(target)
        except OSError:
            file_id = None
        if file_id is not None:
            _append_bullets(
                file_knowledge_path(root, file_id), f"# {target.resolve()}",
                [f"{note} {provenance}"],
            )
            file_notes += 1
            continue
    # Target no longer resolvable: the note survives as a fact.
    prefix = f"{path_text}: " if path_text else ""
    _append_bullets(knowledge / "facts.md", "# Facts", [f"{prefix}{note} {provenance}"])
    file_notes += 1
counts["files"] = file_notes
```

**The behavior:**
- If the path resolves to an existing file → the note goes to `knowledge/files/{file_id}.md`
- If the path doesn't resolve (the file is gone) → the note falls back to `facts.md` as `{path}: {note} {provenance}`. The note survives but loses the per-file binding.

---

## 3. The digest (`digest.md`)

The digest is a *projection* of the category files, bounded to **4KB**. It's injected as the `{knowledge}` block in the initial context.

**The format:**

```markdown
# Knowledge digest
(regenerated by knowledge_harvest; edit the category files, not this file)

## Open tasks
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]

## Open questions
- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]

## Decisions
- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]

## Facts
- nagent has 5 providers; Manual Slop has 8. [from: 2026-06-12-v2.3, 2026-06-12]

## Playbooks
- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
```

**The ordering is fixed:** Open tasks, Open questions, Decisions, Facts, Playbooks. **Within each section, newest first** (the category files are append-only, so reading their bullets in reverse gives newest-first).

**Truncation:** if the sections don't fit in 4KB, the remainder is dropped and a visible `(truncated; see the category files for the rest)` note takes its place.

**"Delete to turn off":** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block injected. Re-enable by running the harvest (which regenerates the digest).

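The fixed ordering, newest-first reversal, truncation note, and "delete the digest when every section is empty" rule (§6) fit in one function. This sketch assumes the category files have already been parsed into bullet lines per digest section; `build_digest` and its input shape are hypothetical, and the 4KB bound is measured in UTF-8 bytes.

```python
from typing import Optional

DIGEST_MAX_BYTES = 4 * 1024
TRUNCATION_NOTE = "(truncated; see the category files for the rest)"
SECTION_ORDER = ("Open tasks", "Open questions", "Decisions", "Facts", "Playbooks")
HEADER = ("# Knowledge digest\n"
          "(regenerated by knowledge_harvest; edit the category files, not this file)\n")


def build_digest(sections: dict[str, list[str]]) -> Optional[str]:
    """Project category bullets into a digest of at most DIGEST_MAX_BYTES.

    `sections` maps a digest section title to its bullet lines ("- ...") in
    file order, oldest first. Returns None when every section is empty, in
    which case the caller deletes digest.md instead of writing it.
    """
    if not any(sections.get(title) for title in SECTION_ORDER):
        return None
    note = f"\n{TRUNCATION_NOTE}\n"
    budget = DIGEST_MAX_BYTES - len(note.encode("utf-8"))  # reserve room for the note
    out = HEADER
    for title in SECTION_ORDER:
        for i, bullet in enumerate(reversed(sections.get(title) or [])):  # newest first
            # Emit the heading together with its first bullet so a heading
            # never appears alone just before the truncation note.
            piece = (f"\n## {title}\n" if i == 0 else "") + f"{bullet}\n"
            if len((out + piece).encode("utf-8")) > budget:
                return out + note
            out += piece
    return out
```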
---

## 4. The ledger (`ledger.json`)

The ledger is the **sha256-of-content audit log**. It gates deletion on a proven harvest: a conversation is unlinked only after its entry records `harvested` (or `deleted-unharvested`, when the user explicitly opted out with `--no-harvest`).

**The format:**

```json
{
  "entries": {
    "<sha256-of-conversation-content>": {
      "path": "/home/user/.manual_slop/conversations/<name>-<uuid>",
      "status": "harvested",
      "at": "2026-06-12T14:23:45.123456+00:00",
      "items": {
        "facts": 3,
        "decisions": 2,
        "tasks_done": 1,
        "tasks_open": 0,
        "questions": 1,
        "playbooks": 0,
        "files": 1
      },
      "deleted": true
    }
  }
}
```

**The status values:**

| Status | Meaning | Action |
|---|---|---|
| `harvested` | LLM distillation succeeded; items appended to category files | reclaim (unlink) |
| `harvest-failed` | LLM distillation failed after retries | keep the conversation; record the error |
| `deleted-unharvested` | User passed `--no-harvest`; the conversation is reclaimed without an LLM call | reclaim (unlink) |
| `too-large` | File > 1MB; kept without harvesting | keep |

**The sha256-of-content dedup:** two conversations with the same content share a ledger entry. The second is reclaimed without paying the LLM cost again.

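A minimal sketch of the dedup check and ledger bookkeeping, assuming the JSON shape shown above; all function names are hypothetical. Saving through a sibling temp file and `Path.replace` means a crash mid-write leaves the previous ledger intact rather than a half-written one.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def content_key(path: Path) -> str:
    """sha256 of the conversation bytes; identical content gives an identical key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_ledger(ledger_path: Path) -> dict:
    if ledger_path.is_file():
        return json.loads(ledger_path.read_text(encoding="utf-8"))
    return {"entries": {}}


def save_ledger(ledger_path: Path, ledger: dict) -> None:
    # Write a sibling temp file, then swap it in over the old ledger.
    tmp = ledger_path.with_name(ledger_path.name + ".tmp")
    tmp.write_text(json.dumps(ledger, indent=2), encoding="utf-8")
    tmp.replace(ledger_path)


def already_harvested(ledger: dict, key: str) -> bool:
    entry = ledger["entries"].get(key)
    return entry is not None and entry.get("status") == "harvested"


def record(ledger: dict, key: str, path: Path, status: str, items: dict[str, int]) -> None:
    ledger["entries"][key] = {
        "path": str(path),
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        "items": items,
        "deleted": False,  # flipped to True only after the unlink succeeds
    }
```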
---

## 5. The harvest workflow

### 5.1 The 7-category schema (the LLM output)

The LLM's harvest output is strict JSON (no prose, no markdown fence):

```json
{
  "facts": [{"statement": "...", "detail": "..."}],
  "decisions": [{"statement": "...", "detail": "..."}],
  "tasks_done": [{"statement": "...", "detail": "..."}],
  "tasks_open": [{"statement": "...", "detail": "..."}],
  "questions": [{"statement": "...", "detail": "..."}],
  "playbooks": [{"name": "...", "steps": "..."}],
  "files": [{"path": "...", "note": "..."}]
}
```

**The prompt** (in `~/.manual_slop/knowledge/prompts/harvest-conversation.md`; user-editable, root-first resolution):

```markdown
# Harvest durable knowledge from a manual_slop conversation

You are given one conversation (or a summary of one). Extract only knowledge that
stays useful after this conversation is deleted. Return only JSON in exactly this
form (no prose, no markdown fence):

[the 7-category schema above]

Category rules:
- facts: durable statements about systems, repositories, tools, environments, or
  constraints that were learned, not assumed.
- decisions: choices that were made, with the why in `detail`.
- tasks_done: concrete work completed in this conversation.
- tasks_open: work that was started, planned, or requested but not finished.
- questions: questions raised and never answered.
- playbooks: command sequences or processes that worked and are reusable; `steps`
  is the runnable sequence.
- files: a note tied to one specific file path (use the absolute path seen in
  the conversation).

General rules:
- Empty arrays are valid and expected: most conversations contain nothing durable.
  Do not invent items to fill categories.
- One item per distinct piece of knowledge; keep `statement` to one sentence.
- `detail` is optional context; omit it or use "" when the statement stands alone.
- Do not include conversation mechanics, tool output noise, retries, or one-off
  trivia (timestamps, token counts, transient errors).
```

### 5.2 The retry budget (the contract)

`HARVEST_MAX_ATTEMPTS = 2`. The retry happens at the parse level (not the API level):

```python
import json

HARVEST_MAX_ATTEMPTS = 2


def harvest_conversation(path, provider, model, *, generate, summarize=None):
    content = read_or_summarize(path, provider, model)
    template = harvest_prompt_path().read_text(encoding="utf-8").strip()
    last_error = None
    for attempt in range(HARVEST_MAX_ATTEMPTS):
        # On a retry, build_harvest_prompt appends the retry suffix (described below).
        prompt = build_harvest_prompt(template, path.name, content, retry=attempt > 0)
        response = generate(prompt, provider, model)
        try:
            return parse_harvest_json(response)
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc
    raise RuntimeError(f"harvest output invalid after {HARVEST_MAX_ATTEMPTS} attempts: {last_error}")
```

**The retry-suffix:** on retry, append `\nYour previous reply was not valid JSON. Return only the JSON object.\n` to the prompt.

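`build_harvest_prompt` and `parse_harvest_json` are called above but not shown. A minimal sketch of both, assuming the prompt is the template followed by the conversation name and content (that layout is a guess) and that any key outside the 7 categories counts as a parse failure, which routes it into the retry loop. Because `json.JSONDecodeError` subclasses `ValueError`, the loop's `except` clause catches both kinds of failure.

```python
import json

RETRY_SUFFIX = "\nYour previous reply was not valid JSON. Return only the JSON object.\n"
CATEGORIES = ("facts", "decisions", "tasks_done", "tasks_open",
              "questions", "playbooks", "files")


def build_harvest_prompt(template: str, name: str, content: str, *, retry: bool = False) -> str:
    prompt = f"{template}\n\nConversation: {name}\n\n{content}\n"
    return prompt + RETRY_SUFFIX if retry else prompt


def parse_harvest_json(response: str) -> dict[str, list]:
    """Strict parse: one JSON object whose values are lists; missing keys become []."""
    data = json.loads(response)  # prose or a markdown fence raises JSONDecodeError
    if not isinstance(data, dict):
        raise ValueError("harvest output must be a JSON object")
    unknown = set(data) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    result: dict[str, list] = {}
    for key in CATEGORIES:
        value = data.get(key, [])
        if not isinstance(value, list):
            raise ValueError(f"{key} must be a list")
        result[key] = value
    return result
```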
### 5.3 The size limits (the budgets)

| Constant | Value | Why |
|---|---|---|
| `SUMMARIZE_THRESHOLD_BYTES` | 64 KB | Files > 64KB get summarized first |
| `MAX_HARVEST_SOURCE_BYTES` | 1 MB | Files > 1MB are kept (not harvested) |
| `DIGEST_MAX_BYTES` | 4 KB | The bounded digest size |
| `HARVEST_MAX_ATTEMPTS` | 2 | Retry budget on parse failure |

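The two byte thresholds split conversations into three paths. A minimal sketch of that routing, assuming binary units (1 KB = 1024 bytes) and the strict `>` comparisons the table states; `harvest_plan` and its return labels are hypothetical (only `too-large` is a real ledger status).

```python
SUMMARIZE_THRESHOLD_BYTES = 64 * 1024
MAX_HARVEST_SOURCE_BYTES = 1024 * 1024


def harvest_plan(size_bytes: int) -> str:
    """Decide how to treat one conversation file based on its size."""
    if size_bytes > MAX_HARVEST_SOURCE_BYTES:
        return "too-large"  # kept and recorded in the ledger; never sent to the LLM
    if size_bytes > SUMMARIZE_THRESHOLD_BYTES:
        return "summarize"  # summarize first, then harvest the summary
    return "harvest"        # send the full text
```

Checking the larger limit first matters: a 2MB file is also above 64KB, and testing the summarize threshold first would wrongly send it to the summarizer.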
### 5.4 The dry-run-by-default safety

The harvest CLI defaults to **dry-run**. Without `--apply`, the CLI classifies, estimates cost, and prints a report. **No mutation.**

```bash
$ python -m src.knowledge_harvest
artifacts: live:42, user-kept:3, prune:0, harvest:17, keep:1
harvest candidates: 2.3MB (~600K input tokens), prune candidates: 0B
dry run; pass --apply to harvest and reclaim

$ python -m src.knowledge_harvest --apply
reclaimed: 2.3MB
harvested items: facts:42, decisions:18, tasks_done:7, tasks_open:3, questions:5, playbooks:2, files:11
digest: /home/user/.manual_slop/knowledge/digest.md
ledger: /home/user/.manual_slop/knowledge/ledger.json
```

---

## 6. The "delete to turn off" pattern

**The principle.** Feature flags should be data, not config. If a feature is gated by the presence of a file, the user can turn it off by deleting the file. No GUI toggle, no env var, no `config.toml` edit. Just `rm`.

**The knowledge digest pattern:** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block is injected. Re-enable by running `python -m src.knowledge_harvest --apply` (which regenerates the digest).

**The implementation:**

```python
# In aggregate.py:run (the consumer of the digest)
knowledge_digest_path = paths.knowledge_dir() / "digest.md"
if knowledge_digest_path.is_file():
    knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
    stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")
# else: skip; the file is the switch
```

**The pattern recurs in 3 places:**
1. `regenerate_digest` deletes the digest when sections are empty
2. The `aggregate.py:run` injection check is the load-bearing one
3. The GUI `Knowledge` panel shows the file state and provides a `[Delete to turn off]` button

---

## 7. The graceful failure modes

| Failure | Handling |
|---|---|
| LLM returns invalid JSON | Retry (up to 2 attempts); on 2nd failure, mark `harvest-failed` in the ledger; keep the conversation |
| File > 1MB | Mark `too-large` in the ledger; keep the conversation |
| File > 64KB | Summarize via `run_subagent_summarization`; use the summary as the LLM input |
| Provider not available | Mark `harvest-failed`; keep the conversation |
| Network timeout | Same; mark `harvest-failed`; keep the conversation |
| Disk full writing to category files | Raise; mark `harvest-failed`; keep the conversation (don't reclaim) |

**The pattern:** critical operations (appending to the category files, recording the ledger entry) must complete before a conversation is reclaimed; non-essential post-steps are best-effort. Every failure leaves a visible status marker in the ledger, and the user can re-run the harvest.

---

## 8. The injection (where the digest is used)

The digest is injected into the *stable* position of the initial context (layer 7 of the 12-layer model; per `cache_friendly_context.md`):

```python
# In aggregate.py:run (the consumer)
def build_initial_context(ctrl, user_message):
    stable_prefix = []

    # Layers 1-6: role, schema, tools, system prompt, persona, project context
    stable_prefix.append(...)

    # Layer 7: knowledge digest (the 4KB bounded projection)
    knowledge_digest_path = paths.knowledge_dir() / "digest.md"
    if knowledge_digest_path.is_file():
        knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
        stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")

    # Layers 8-12: discussion metadata, active preset, per-file details, prior turns, user message
    volatile_suffix = [...]

    return "".join(stable_prefix + volatile_suffix)
```

**The position matters.** The digest sits in the *stable* position (before the `Instance:` volatile block), so the provider can include it in the cached prefix; the volatile suffix is not cached. See `cache_friendly_context.md` §1.

---

## 9. The cross-references

- `conductor/code_styleguides/knowledge_artifacts.md` — the canonical styleguide
- `docs/guide_agent_memory_dimensions.md` §4 — the knowledge dim in context
- `docs/guide_caching_strategy.md` §5 — where the digest is injected
- `conductor/code_styleguides/feature_flags.md` — the "delete to turn off" pattern
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4 — the nagent pattern that informed this guide
@@ -593,3 +593,73 @@ See [guide_workspace_profiles.md](guide_workspace_profiles.md) (placeholder; wri
- **[guide_discussions.md](guide_discussions.md)** — The Discussion system; MMA worker prompts are built from the active discussion
- **[conductor/tracks/nagent_review_20260608/report.md §9](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the MMA sub-conversation pattern vs nagent's `<nagent-conversation>` tag; **the highest-priority future-track is to extract MMA's `run_worker_lifecycle` into a reusable `SubConversationRunner` for 1:1 discussions** (per user-flagged want)
- **[conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §3 and §10](../conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md)** — Actionable patterns for the SubConversationRunner; the design constraint that sub-agents return a *concise artifact* (not a full transcript) is baked into the recommendation
## Addition (2026-06-12) — Delegation as context management, not parallelism

The nagent review (v2.3, §3.12) reframed delegation through a different lens: **the reason to spawn a sub-conversation is to keep the parent's context clean; that the child sometimes runs concurrently is incidental.** As nagent's `bin/nagent:730` puts it: *"Hand off when noisy: if this conversation is mostly stale tool output, distill goal/state/decisions into a sub-conversation prompt, delegate the rest, and tell your caller about the handoff. Never rewrite your own conversation file while running."*

The reframing, side by side:

| Long-lived agent abstractions | Disposable workers |
|---|---|
| Identity is central | Output artifact is central |
| Shared context gets noisy | Child context is isolated |
| Parent absorbs all exploration | Parent gets a concise result |
| Delegation implies personality | Delegation is context management |
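The "distill goal/state/decisions" step from the quote can be sketched minimally. This assumes a handoff is nothing more than those three fields rendered into the child's first prompt. `Handoff` and `to_prompt` are hypothetical names and are not part of nagent or Manual Slop.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Hypothetical distilled handoff: only what the child needs, none of the parent's tool output."""
    goal: str
    state: list[str] = field(default_factory=list)      # facts established so far
    decisions: list[str] = field(default_factory=list)  # choices already made; not to be revisited

    def to_prompt(self) -> str:
        lines = [f"Goal: {self.goal}", "", "Current state:"]
        lines += [f"- {item}" for item in self.state] or ["- (none)"]
        lines += ["", "Decisions already made:"]
        lines += [f"- {item}" for item in self.decisions] or ["- (none)"]
        lines += ["", "Return a concise result, not a transcript."]
        return "\n".join(lines)


handoff = Handoff(
    goal="Summarize how manual_slop.toml is loaded",
    state=["The loader entry point is already identified"],
    decisions=["Read-only investigation: no edits"],
)
prompt = handoff.to_prompt()
```

The parent's stale tool output never crosses into the child's prompt. Only the three distilled fields do, which is the "context management" side of the table.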
### How this applies to MMA

MMA already does this implicitly:

- `src/multi_agent_conductor.py:_spawn_worker` runs each MMA worker as a fresh subprocess with `ai_client.reset_session()` (Context Amnesia)
- The worker returns a `Result[TaskOutput, ErrorInfo]` to the parent (the `ConductorEngine`)
- The parent's `disc_entries` does not accumulate the worker's intermediate reads or shell calls
### The product implication for 1:1 discussions

The 1:1 discussion path has no sub-agent primitive today. The user types a prompt, the AI responds, and the loop continues. If the user wants the AI to "investigate this file" or "look up this API," the answer has to come from the same conversation. As a result, every intermediate read and tool call lands in that conversation's context.

**The product decision (a user-flagged want).** Add a `SubConversationRunner` for 1:1 discussions, reusing MMA's `mma_exec.py` as the subprocess template. The sub-agent returns a concise artifact (its response), its token usage, and its exit code. The App inserts the artifact into the active discussion as a "User" role entry, so the next LLM call sees it.
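The data flow this decision implies can be sketched in-process. The sketch assumes `disc_entries` is a list of role/content dicts. It replaces the real subprocess with a hypothetical `run_child` stub and uses a hypothetical `ChildOutcome` type. Skipping the insert on a non-zero exit code is also an assumption, not something the spec states. The point is the shape: the child's intermediate reads stay in its own log, and the parent gains exactly one "User" entry.

```python
from dataclasses import dataclass


@dataclass
class ChildOutcome:
    """Hypothetical stand-in for the sub-agent's return: artifact + token usage + exit code."""
    artifact: str
    tokens_in: int
    tokens_out: int
    exit_code: int


def run_child(prompt: str) -> ChildOutcome:
    """Stub for the subprocess: the child keeps its own, disposable log."""
    child_log = [{"role": "User", "content": prompt}]
    for path in ("a.py", "b.py", "c.py"):  # noisy exploration stays inside the child
        child_log.append({"role": "Tool", "content": f"read {path}: ..."})
    summary = f"Investigated {len(child_log) - 1} files; a.py owns the parser."
    return ChildOutcome(summary, tokens_in=1200, tokens_out=80, exit_code=0)


def delegate(disc_entries: list[dict], prompt: str) -> ChildOutcome:
    """Run a child and insert only its artifact into the parent discussion."""
    outcome = run_child(prompt)
    if outcome.exit_code == 0:  # assumption: failed runs are reported, not inserted
        disc_entries.append({"role": "User", "content": outcome.artifact})
    return outcome
```

Whatever the child reads, the parent's discussion grows by one entry per delegation.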
### The SubConversationRunner shape (per the v2.3 §10.2 spec)

```python
from __future__ import annotations  # postponed annotations: ErrorInfo need not be imported here

from dataclasses import dataclass


@dataclass
class SubConversationResult:
    artifact: str  # the sub-agent's response
    tokens_in: int
    tokens_out: int
    exit_code: int
    errors: list[ErrorInfo]  # from the data_oriented_error_handling convention


class SubConversationRunner:
    async def spawn(
        self,
        prompt: str,
        *,
        allowed_tools: list[str] | None = None,
        # further keyword-only options are elided in the spec
    ) -> SubConversationResult:
        # Reuses mma_exec.py as the subprocess template
        # Returns the child's <nagent-response> content + token usage
        ...
```

**The design contract.** The sub-agent returns a `SubConversationResult`, not the full conversation: the parent gets a concise artifact, not a transcript. The sub-conversation folder is auto-archived after 7 days, consistent with `log_pruner.py`.
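The 7-day archive rule can be sketched under hypothetical assumptions. Each sub-conversation is assumed to get its own folder under a root directory, and age is judged by the folder's modification time. `archive_stale_subconversations` is an illustrative name. It does not describe how `log_pruner.py` actually works.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

ARCHIVE_AFTER_SECONDS = 7 * 24 * 60 * 60  # 7 days, per the design contract


def archive_stale_subconversations(root: Path, archive: Path, now: float | None = None) -> list[str]:
    """Move sub-conversation folders older than 7 days from `root` into `archive`.

    Assumes one folder per sub-conversation under `root`. Age is judged by each
    folder's own modification time, which is a simplification.
    """
    now = time.time() if now is None else now
    archive.mkdir(parents=True, exist_ok=True)
    moved: list[str] = []
    # Materialize the listing first (and skip the archive itself if it lives
    # inside root) so moving folders cannot disturb the iteration.
    candidates = sorted(p for p in root.iterdir() if p.is_dir() and p != archive)
    for folder in candidates:
        if now - folder.stat().st_mtime >= ARCHIVE_AFTER_SECONDS:
            shutil.move(str(folder), str(archive / folder.name))
            moved.append(folder.name)
    return moved
```

Taking `now` as a parameter keeps the function testable without waiting a week.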
## Addition (2026-06-12) — The 4 memory dimensions (the MMA scope)

The MMA tracks operate on `disc_entries` (the Discussion dim) and `manual_slop.toml` (the project config). They do NOT typically touch the Curation dim (per-track ticket specs) or the Knowledge dim (per-track session reports). They MAY touch the RAG dim if the ticket scope includes RAG integration, as declared in `metadata.json`.

**The MMA scope, in the 4-dim framework:**

| Dim | In MMA scope? | Why |
|---|---|---|
| Curation | per-ticket only | A ticket might add a `FileItem` if the feature touches curation; not a default |
| Discussion | YES (the work) | The MMA worker's prompt is built from the active discussion |
| RAG | per-ticket only | A ticket might use RAG if the feature includes RAG; declared in `metadata.json` |
| Knowledge | per-track only | The track's session synthesis (in `docs/reports/`) is the durable knowledge |

**The implication for MMA workers.** MMA workers are given Context Amnesia (`ai_client.reset_session()` at the start of `run_worker_lifecycle`). The worker sees:

- The ticket's prompt (the scoped work)
- The `[agent.context_files]` section of `manual_slop.toml` (the project context)
- The `FileItem` set per the ticket's scope
- *Optionally* a `knowledge/digest.md` excerpt (if the ticket scope includes knowledge injection)

The worker does NOT see:

- The full `disc_entries` history (per the Context Amnesia pattern)
- The full `~/.manual_slop/knowledge/` (only the digest excerpt)
- The RAG index (unless the ticket scope explicitly opts in)
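The visibility rules above can be sketched minimally. The sketch assumes a hypothetical `"scope"` list in the ticket's `metadata.json` and uses a character cap as a stand-in for however the digest excerpt is actually chosen. `WorkerContext` and `build_worker_context` are illustrative names, not existing Manual Slop APIs.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class WorkerContext:
    """Illustrative shape of what a freshly reset MMA worker may see."""
    ticket_prompt: str
    context_files: list[str]
    file_items: list[str]
    knowledge_excerpt: str | None = None  # only if the ticket opts in to knowledge
    rag_enabled: bool = False              # only if the ticket opts in to RAG


def build_worker_context(
    ticket_prompt: str,
    metadata_json: str,
    context_files: list[str],
    file_items: list[str],
    knowledge_digest: str,
    max_excerpt_chars: int = 2000,  # hypothetical cap standing in for excerpt selection
) -> WorkerContext:
    """Assemble a worker's context from the ticket's declared scope.

    There is deliberately no disc_entries parameter: the discussion history
    is never passed in, so the worker cannot see it.
    """
    scope = set(json.loads(metadata_json).get("scope", []))  # hypothetical key
    excerpt = knowledge_digest[:max_excerpt_chars] if "knowledge" in scope else None
    return WorkerContext(
        ticket_prompt=ticket_prompt,
        context_files=list(context_files),
        file_items=list(file_items),
        knowledge_excerpt=excerpt,
        rag_enabled="rag" in scope,
    )
```

Leaving `disc_entries` out of the signature, rather than filtering it inside the function, makes Context Amnesia structural instead of a rule the code has to remember.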