docs: agent workflow docs + regular docs (v2.3 surfacing)
Per user request 'use your remaining context to update agent workflow
docs and then regular docs based on what was discussed in this report',
this commit creates/updates 15 files derived from the v2.3 nagent
review (the 12 new nagent additions + the 4 memory dimensions
reframing + the cache strategy + the RAG discipline + the knowledge
harvest pattern).
Agent workflow docs (4 files):
- AGENTS.md (UPDATE): add @import line to canonical DOD + 'Code
Styleguides' section pointing to the 6 new styleguides + new
'Human-Facing Documentation' section pointing to ./docs/AGENTS.md
- conductor/workflow.md (UPDATE): new section 'Additions (2026-06-12)
- the 12 patterns from the latest nagent corpus' with TDD
protocols for knowledge harvest, cache ordering, compaction, RAG
discipline
- conductor/product-guidelines.md (UPDATE): new sections 'Memory
Dimensions (added 2026-06-12)' + 'See Also - Updated' with the
6-styleguide catalog
- docs/AGENTS.md (NEW): the agent-facing mirror of docs/Readme.md
(per the nagent CLAUDE.md pattern). 10 sections + the per-tier
reading path + the 4 memory dimensions + the caching strategy +
the knowledge harvest + the RAG discipline + the feature flags
Regular docs (11 files):
- 6 new styleguides (the convention catalog):
* data_oriented_design.md: the canonical DOD reference (Tier
0/1/2; 3 defaults to reject; 8 core defaults; 7-question
simplification pass; 10-question self-check; 4 memory
dimensions in Manual Slop context)
* agent_memory_dimensions.md: the 4 memory dims (curation /
discussion / RAG / knowledge) + when to use each + the
boundaries
* rag_integration_discipline.md: the conservative-RAG rule
(opt-in, complement, provenance, no mutation, feature-gated,
graceful failure)
* cache_friendly_context.md: stable-to-volatile context
ordering + the cache TTL GUI contract + the byte-comparison
test
* knowledge_artifacts.md: the knowledge harvest pattern
(category files, provenance, sha256 ledger, digest
regeneration, 'delete to turn off')
* feature_flags.md: file presence vs config flags vs CLI flags
- 3 new project docs (the cross-cutting guides):
* guide_agent_memory_dimensions.md: the cross-cutting guide on
the 4 dims + the decision tree
* guide_caching_strategy.md: caching across providers +
stable-to-volatile ordering + cache TTL GUI + the byte-
comparison test + the 5th provider (claude-code)
* guide_knowledge_curation.md: the knowledge memory guide (4th
dim) + the 5 category files + per-file notes + the digest +
the ledger + the harvest workflow
- 2 existing doc updates:
* guide_mma.md: new sections 'Delegation as context management'
+ 'The 4 memory dimensions (the MMA scope)'
* guide_ai_client.md: new section 'Cache strategy and the 12-
layer model' + the 5th provider (claude-code)
All files use the same style as the v2.3 review (the user's preferred
format): 7-column tables, no JSON, SSDL shape tags, forth/array
notation, file:line citations, ASCII sketches where useful. The
human Readme files (Readme.md, docs/Readme.md) are NOT modified
(per repeated user instruction).
The 5th provider (claude-code) is documented in guide_ai_client.md
+ the data_oriented_design.md references the nagent pattern as the
source of the canonical rules.
The cross-references are bidirectional: the 6 styleguides reference
the 3 project docs; the 3 project docs reference the 6 styleguides;
the 2 doc updates reference both; AGENTS.md + ./docs/AGENTS.md
provide the entry points.
@@ -0,0 +1,268 @@
# ./docs/AGENTS.md (the agent-facing mirror)

**Status:** Agent-facing mirror of `docs/Readme.md` (the human-facing docs index, which is preserved as-is). For agents (any tier), this is the recommended first read for understanding the project's docs structure.

**Date:** 2026-06-12

**Cross-refs:** `docs/Readme.md` (human-facing); `AGENTS.md` (project root); the 6 styleguides in `conductor/code_styleguides/`.

> **What this is.** `docs/Readme.md` is the human-facing docs index. *This* file is the agent-facing equivalent: it organizes the 14 deep-dive guides under `docs/` by MMA tier, and it cross-references the canonical styleguides. The 2 files cover the same docs but with different audiences and different reading paths.
>
> **The reading path.** If you're an agent scoping a feature, read this file first; then read the 1-2 `guide_*.md` files for the layers your feature touches; then read the 1-2 styleguides for the patterns the feature uses. The expected reading time for a typical feature: 10-15 minutes.

---

## 0. The 4 memory dimensions (the cross-cutting lens)

The conversation data has 4 distinct memory dimensions. Most features touch 1-2; some touch 3. Use this lens to identify which dimension(s) your feature needs.

| # | Dim | Where it lives | Use when | Styleguide |
|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | "How to render a file" | (see `docs/guide_context_curation.md`) |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | "What was said in this chat" | (see `docs/guide_architecture.md` §"Threading model") |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | "What similar content exists" (opt-in) | `conductor/code_styleguides/rag_integration_discipline.md` |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest | "What we learned from past sessions" | `conductor/code_styleguides/knowledge_artifacts.md` |

See `docs/guide_agent_memory_dimensions.md` for the full cross-cutting guide.

---

## 1. The 14 deep-dive guides (organized by MMA tier)

| Tier | Guide | What it covers | When to read |
|---|---|---|---|
| **T1** | `docs/guide_architecture.md` | Threading model; cross-thread state sync | When scoping any cross-cutting feature |
| **T1** | `docs/guide_meta_boundary.md` | The Application vs Meta-Tooling split | When scoping a Meta-Tooling-side feature |
| **T2** | `docs/guide_app_controller.md` | The headless controller; `AppState` dataclass | When implementing controller-side logic |
| **T2** | `docs/guide_ai_client.md` | The multi-provider LLM client | When implementing LLM-side logic |
| **T2** | `docs/guide_mma.md` | The 4-tier MMA orchestration | When implementing MMA-side logic |
| **T2** | `docs/guide_tools.md` | The MCP tool inventory + Hook API | When implementing MCP tools or Hook endpoints |
| **T2** | `docs/guide_mcp_client.md` | The 45 tools + 3-layer security | When implementing new MCP tools or sub-MCPs |
| **T3** | `docs/guide_context_curation.md` | Granular AST Control + Fuzzy Anchors + Structural File Editor | When implementing curation-side features |
| **T3** | `docs/guide_personas.md` | The unified agent profile model | When implementing persona-side features |
| **T3** | `docs/guide_rag.md` | The RAG subsystem | When implementing RAG-side features (rare; opt-in) |
| **T3** | `docs/guide_gui_2.md` | The ImGui application | When implementing GUI-side features |
| **All** | `docs/guide_testing.md` | The test suite architecture (251 test files; 7 conftest fixtures) | When writing any test |
| **All** | `docs/guide_command_palette.md` | The 33 commands + "Everything" mode | When implementing command-palette features |
| **NEW** | `docs/guide_knowledge_curation.md` | The knowledge memory guide (4th dim) | When implementing knowledge-side features |
| **NEW** | `docs/guide_caching_strategy.md` | Caching across providers; stable-to-volatile ordering; cache TTL GUI | When implementing cache-side features |
| **NEW** | `docs/guide_agent_memory_dimensions.md` | Cross-cutting: the 4 memory dimensions | When scoping any feature that touches memory |

---

## 2. The 6 canonical styleguides (the convention catalog)

| Styleguide | What it codifies | When to read |
|---|---|---|
| `conductor/code_styleguides/data_oriented_design.md` | The canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 7-question simplification pass; 10-question self-check) | Before any non-trivial work |
| `conductor/code_styleguides/agent_memory_dimensions.md` | The 4 memory dimensions and when to use each | When the feature touches memory |
| `conductor/code_styleguides/rag_integration_discipline.md` | The conservative-RAG rule (opt-in; complements; provenance; no mutation; feature-gated; graceful failure) | When the feature uses RAG |
| `conductor/code_styleguides/cache_friendly_context.md` | Stable-to-volatile context ordering; the cache TTL GUI contract; the byte-comparison test | When the feature builds context or caches |
| `conductor/code_styleguides/knowledge_artifacts.md` | The knowledge harvest pattern (category files, provenance, sha256 ledger, digest regeneration) | When the feature uses the knowledge dim |
| `conductor/code_styleguides/feature_flags.md` | File presence ("delete to turn off") vs config flags vs CLI flags; when to use each | When adding a new feature toggle |

---

## 3. The per-tier reading path

### Tier 1 (Orchestrator): what to read

For scoping a feature, understanding the architecture, and planning:

| Read | Why |
|---|---|
| `docs/guide_architecture.md` | The threading model; the cross-thread data flow |
| `docs/guide_meta_boundary.md` | The Application vs Meta-Tooling split (load-bearing) |
| `docs/guide_agent_memory_dimensions.md` | The 4 memory dimensions (which dim does my feature touch?) |
| `conductor/code_styleguides/data_oriented_design.md` | The 3 defaults to reject; the simplification pass; the final self-check |
| `AGENTS.md` (project root) | The project-root agent-facing rules |
| This file (`docs/AGENTS.md`) | The docs structure |

**Tier 1 does NOT typically read:** `guide_*.md` for the specific subsystems (T2 reads those).

### Tier 2 (Tech Lead): what to read

For track design, ticket generation, and architecture:

| Read | Why |
|---|---|
| All of Tier 1's reads | (foundational) |
| `docs/guide_app_controller.md` | The headless controller; the `_predefined_callbacks` and `_gettable_fields` registries |
| `docs/guide_ai_client.md` | The LLM client; the providers; the cache strategy |
| `docs/guide_mma.md` | The 4-tier MMA; the DAG engine; the worker pool |
| `docs/guide_tools.md` | The MCP tool inventory; the Hook API; the 3-layer security |
| `conductor/code_styleguides/agent_memory_dimensions.md` | (for memory-touching tracks) |
| `conductor/code_styleguides/cache_friendly_context.md` | (for context-building tracks) |

**Tier 2 does NOT typically read:** `guide_context_curation.md`, `guide_personas.md`, `guide_rag.md`, `guide_gui_2.md` (T3 reads those).

### Tier 3 (Worker): what to read

For surgical implementation:

| Read | Why |
|---|---|
| All of Tier 2's reads (selectively) | (the system context) |
| The 1-2 `guide_*.md` files for the specific layers the ticket touches | (the implementation surface) |
| The 1-2 `code_styleguides/...md` files for the patterns the ticket uses | (the convention) |
| The ticket itself (`conductor/tracks/<id>/plan.md`) | (the specific task) |

**Tier 3 reads in depth, not in breadth.** A typical T3 worker reads 2-4 docs total.

### Tier 4 (QA): what to read

For error analysis and bug reproduction:

| Read | Why |
|---|---|
| All of Tier 2's reads (selectively) | (the system context) |
| The 1-2 `guide_*.md` files for the failing layer | (the reproduction surface) |
| The test file (if any) | (the verification surface) |
| The audit scripts (`scripts/audit_*.py`) | (the static analysis surface) |

**Tier 4 reads narrowly.** The bug is in 1-2 files; the read is in 1-2 docs.

---

## 4. The 4 memory dimensions (the cross-cutting lens, in detail)

Most features touch 1-2 dimensions. Use this decision tree:

```
Q: What is the *data* the feature needs?
│
├── "How to render a file" ──► Curation (FileItem)
├── "What was said in this chat" ──► Discussion (disc_entries)
├── "What similar content exists" ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

**Pick the matching dimension.** If the feature needs 2+, use 2+, but be explicit about which is *primary* and which is *secondary*.

**Match the question to the right shape** (using the wrong shape for the right question is a common mistake):

- "Where does X happen?" → RAG (semantic search)
- "How do I configure how file Y is rendered?" → Curation (FileItem)
- "What was the user asking about 3 turns ago?" → Discussion (disc_entries)
- "What did we decide last time about Z?" → Knowledge (digest)

See `docs/guide_agent_memory_dimensions.md` for the full cross-cutting guide.
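
One way to make the primary/secondary choice explicit is to record it as data in the feature's scope instead of in prose. This is a minimal Python sketch. `MemoryDim`, `MemoryScope`, and `uses_rag` are hypothetical names, not existing project code.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryDim(Enum):
    # Each dimension is keyed by the question it answers (per the decision tree).
    CURATION = "How to render a file"
    DISCUSSION = "What was said in this chat"
    RAG = "What similar content exists"
    KNOWLEDGE = "What we learned from past runs"


@dataclass(frozen=True)
class MemoryScope:
    """Hypothetical scope record: one primary dimension plus optional secondaries."""
    primary: MemoryDim
    secondary: tuple[MemoryDim, ...] = ()

    def __post_init__(self) -> None:
        # A dimension is either primary or secondary, never both.
        if self.primary in self.secondary:
            raise ValueError(f"{self.primary.name} is listed as both primary and secondary")

    @property
    def uses_rag(self) -> bool:
        # RAG is opt-in, so a scope that touches it has to say so explicitly.
        return MemoryDim.RAG in (self.primary, *self.secondary)
```

For example, `MemoryScope(MemoryDim.DISCUSSION, (MemoryDim.KNOWLEDGE,))` describes a feature that reads chat history first and consults the digest second.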

---

## 5. The caching strategy (the cross-cutting concern)

If the feature builds the initial context (in `aggregate.py:run`) or calls the LLM (in `ai_client.py:send`), the cache strategy matters.

**The 12-layer model:**

| # | Layer | Stable across turns? | Where the cache hits |
|---|---|---|---|
| 1-7 | Role instructions, function-calling schema, tool descriptions, system prompt, persona, project context, knowledge digest | **YES** (cacheable) | Anthropic `cache_control`, Gemini `cachedContent`, OpenAI implicit |
| 8-12 | Discussion metadata, active preset, per-file details, prior tool results, user message | **NO** (per turn) | NOT cached |
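
The following is a minimal sketch of how the table maps onto a request, assuming each layer arrives as a numbered text blob. `Layer`, `split_layers`, and `STABLE_MAX` are hypothetical. The block shape in `anthropic_system_blocks` follows the Anthropic Messages API, where a `cache_control: {"type": "ephemeral"}` marker on a `system` content block sets a cache breakpoint at the end of that block.

```python
from dataclasses import dataclass

STABLE_MAX = 7  # layers 1-7 form the cacheable prefix (per the table above)


@dataclass(frozen=True)
class Layer:
    index: int  # 1..12, stable first, volatile last
    name: str
    text: str


def split_layers(layers: list[Layer]) -> tuple[str, str]:
    """Sort by layer index and split into (stable_prefix, volatile_suffix)."""
    ordered = sorted(layers, key=lambda layer: layer.index)
    stable = "\n\n".join(layer.text for layer in ordered if layer.index <= STABLE_MAX)
    volatile = "\n\n".join(layer.text for layer in ordered if layer.index > STABLE_MAX)
    return stable, volatile


def anthropic_system_blocks(stable_prefix: str) -> list[dict]:
    """`system` content blocks with one cache breakpoint after the stable prefix."""
    return [
        {
            "type": "text",
            "text": stable_prefix,
            "cache_control": {"type": "ephemeral"},
        }
    ]
```

The volatile suffix then goes into `messages`, after the breakpoint, so per-turn changes never invalidate the cached prefix. Providers with implicit prefix caching also benefit from the same ordering.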

**The byte-comparison test** (the design contract for the stable prefix):

```python
def test_aggregate_stable_to_volatile_ordering():
    """The first N characters of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    ...
```
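
Here is a runnable version of the stub, under one assumption: `build_context` below is a hypothetical stand-in for `aggregate.py:run` that takes the stable inputs and the per-turn inputs separately. The test compares raw UTF-8 bytes, because provider caches match on an exact prefix, not on rendered text.

```python
def build_context(stable_inputs: dict[str, str], turn_inputs: dict[str, str]) -> str:
    """Hypothetical stand-in for aggregate.py:run: stable layers first, volatile last."""
    stable = "".join(f"## {name}\n{text}\n" for name, text in stable_inputs.items())
    volatile = "".join(f"## {name}\n{text}\n" for name, text in turn_inputs.items())
    return stable + volatile


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that two byte strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def test_aggregate_stable_to_volatile_ordering():
    """The first N bytes of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    stable = {"system_prompt": "You are a careful assistant.", "knowledge": "- fact A"}
    turn1 = build_context(stable, {"user_message": "first question"}).encode("utf-8")
    turn2 = build_context(stable, {"user_message": "a different question"}).encode("utf-8")
    n = len(build_context(stable, {}).encode("utf-8"))  # N = size of the stable prefix
    assert turn1[:n] == turn2[:n]
    assert shared_prefix_len(turn1, turn2) >= n
```

Suppose a volatile layer (say, the user message) were emitted before the knowledge digest. Then `shared_prefix_len` would drop below N and the test would fail, which is the regression it guards against.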

**The provider-specific TTLs:**

| Provider | Default TTL | Configurable? |
|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-provider control surface) |
| Gemini explicit | 1 h | yes (per-discussion override) |
| OpenAI implicit | 5-10 min (provider-managed) | no |

**The GUI exposure** is a "Caching" Operations Hub sub-panel (per the v2.3 §5.3 sketch). See `docs/guide_caching_strategy.md` for the full guide and `conductor/code_styleguides/cache_friendly_context.md` for the styleguide.

---

## 6. The knowledge harvest (the durable layer)

The 4th memory dimension (knowledge) is *opt-in but encouraged*: it's the durable, user-editable, provenance-aware store of facts / decisions / questions / playbooks / per-file notes.

**The directory layout** (per the user's `~/.manual_slop/knowledge/`):

```
knowledge/
├── facts.md # - {statement} {provenance}
├── decisions.md # - {statement, reason} {provenance}
├── questions.md # - {question} {provenance}
├── playbooks.md # - **{name}**: {steps} {provenance}
├── tasks.md # ## Open / ## Done
├── files/{file_id}.md # per-file notes (keyed by inode)
├── digest.md # bounded 4KB; the projection; "delete to turn off"
├── ledger.json # sha256-of-content audit log
└── prompts/harvest-conversation.md # user-editable
```

**The harvest CLI:** `python -m src.knowledge_harvest [--apply] [--no-harvest] [--max-harvest-bytes N]`. Default: dry-run.
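
The documented flag surface can be sketched with `argparse`. Only the three flag names come from this doc. The help strings, the `None` default for `--max-harvest-bytes`, and the reading of `--no-harvest` as "skip the LLM step" are assumptions, not the actual `src.knowledge_harvest` implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m src.knowledge_harvest",
        description="Harvest durable knowledge from past sessions (dry-run by default).",
    )
    # Without --apply nothing is written: the dry-run default.
    parser.add_argument("--apply", action="store_true", help="write results to the knowledge/ tree")
    # Assumed meaning: run only the non-LLM steps.
    parser.add_argument("--no-harvest", action="store_true", help="skip the LLM harvest step (assumed)")
    parser.add_argument(
        "--max-harvest-bytes", type=int, default=None, metavar="N",
        help="cap on conversation bytes sent to the harvest (assumed semantics)",
    )
    return parser
```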

**The LLM output is strict JSON** (no prose, no markdown fence) with 7 categories. The retry budget is 2 attempts.

**The "delete to turn off" pattern:** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block injected. Re-enable by running the harvest.

See `docs/guide_knowledge_curation.md` for the full guide and `conductor/code_styleguides/knowledge_artifacts.md` for the styleguide.

---

## 7. The RAG discipline (the opt-in fuzzy dimension)

RAG is the *fuzzy semantic search* dimension. It's *opt-in* (default-off in new projects). The 6 rules:

1. **Opt-in.** Default-off in new projects
2. **Complements; never replaces.** RAG is one of 4 dimensions, not a substitute
3. **Provenance required.** Every result shows file + chunk
4. **No mutation.** RAG results never write to `disc_entries`, `FileItem`, or disk
5. **Feature-gated.** A feature must explicitly request RAG in its scope
6. **Graceful failure.** Failed search returns empty; the request continues
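
The 6 rules can be sketched as one gate around a search call. `Hit`, `rag_block`, and the `search` callable are hypothetical. The real `RAGEngine.search` returns a `Result[list[SearchResult], ErrorInfo]`, so here `None` models that result's error arm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Hit:
    path: str   # provenance: source file
    chunk: int  # provenance: chunk index within the file
    text: str


def rag_block(
    enabled: bool,
    search: Callable[[str, int], Optional[list[Hit]]],
    query: str,
    k: int = 5,
) -> str:
    """Build a read-only {rag-context} block; "" whenever RAG is off or fails."""
    if not enabled:  # rules 1 and 5: opt-in, explicitly requested
        return ""
    try:
        hits = search(query, k)  # None models the Result's error arm
    except Exception:  # e.g. a lazy embedding-provider init failing
        hits = None
    if not hits:  # rule 6: graceful failure; the request continues without RAG
        return ""
    # Rule 3: every line carries file + chunk. Rule 4: this only builds text;
    # nothing is written to disc_entries, FileItem, or disk.
    lines = [f"- [{h.path}#chunk{h.chunk}] {h.text}" for h in hits]
    return "{rag-context}\n" + "\n".join(lines)
```

Rule 2 lives at the call site: the caller appends this block next to the curated files and the discussion, never in place of them.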

See `docs/guide_rag.md` for the full RAG guide and `conductor/code_styleguides/rag_integration_discipline.md` for the styleguide.

---

## 8. The feature flag patterns (when to use what)

When adding a new feature with an "on/off" toggle, choose the right pattern:

| Pattern | When to use | Example |
|---|---|---|
| **File presence** ("delete to turn off") | The feature produces a side artifact; the user might want to clean up by `rm`-ing it | `~/.manual_slop/knowledge/digest.md` |
| **Config flag** | The feature is always available; the flag stores a persistent preference | `[ai_settings.toml] rag.enabled` |
| **CLI flag** | The feature is invoked from the CLI; the flag is a one-shot override | `python -m src.knowledge_harvest --apply` |
| **Track metadata flag** | The track's implementation uses a feature; this is *static documentation* | `metadata.json`: `{"uses_rag": true}` |

See `conductor/code_styleguides/feature_flags.md` for the full guide.

---

## 9. The cross-cutting principles (the data-oriented foundation)

All 14 docs and 6 styleguides share the same foundation (per `data_oriented_design.md`):

- **The data is the thing.** The conversation, the file items, the knowledge digest: these are the source of truth
- **Behavior is transformation over data.** Not object graphs; not hidden state; not opaque handles
- **Avoid hidden mutable state.** Errors are data, not exceptions. State is on disk, not in memory
- **Separate durable artifacts from temporary execution.** Workers are disposable; artifacts are durable
- **Optimize the shape, availability, and maintenance of the data.** User-editable and provenance-aware

When in doubt, read `conductor/code_styleguides/data_oriented_design.md` first.

---

## 10. The reading path (the 1-page summary)

For an agent scoping a feature:

1. **Read this file** (10 min)
2. **Read the 1-2 `guide_*.md`** for the layers your feature touches (5-10 min each)
3. **Read the 1-2 `code_styleguides/...md`** for the patterns your feature uses (5-10 min each)
4. **Read the ticket** (`conductor/tracks/<id>/plan.md`) for the specific task (variable)

Total: 20-45 min for a typical feature. The investment pays back across the feature's lifetime.

If a guide is missing or stale, that's a bug; file a docs issue (or update the guide inline, per the project's "edit the source of truth, not this file" pattern).

End of agent-facing mirror.
@@ -0,0 +1,283 @@
# The 4 Memory Dimensions (cross-cutting guide)

**Status:** User-facing cross-cutting guide on the 4 memory dimensions. For agents, see `./docs/AGENTS.md` §0.

**Date:** 2026-06-12

**Cross-refs:** `conductor/code_styleguides/agent_memory_dimensions.md`; `docs/guide_context_curation.md`; `docs/guide_rag.md`; `docs/guide_knowledge_curation.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8.

> **What this is.** The conversation data has 4 distinct memory dimensions. Most features touch 1-2; some touch 3. This guide is the cross-cutting reference: when to use which dimension, the boundaries between them, and the decision tree for "which dim does this feature need?"

---

## 0. The 30-second version

Manual Slop has 4 memory dimensions for the conversation data:

| # | Dim | Where it lives | What it stores | Status |
|---|---|---|---|---|
| 1 | **Curation** | `FileItem` + `ContextPreset` + Fuzzy Anchors | *How to render a file* in the AI's context window | Existing, strong |
| 2 | **Discussion** | `app.disc_entries` + branching + UISnapshot | *What was said* in the conversation | Existing, strong |
| 3 | **RAG** | `src/rag_engine.py` (ChromaDB) | *Semantic fingerprints* of indexed files | Opt-in |
| 4 | **Knowledge** | `~/.manual_slop/knowledge/*.md` + per-file + digest | *Durable learnings* from past sessions | Proposed (Candidate 8) |

**The decision tree:**

```
Q: What is the *data* the feature needs?
│
├── "How to render a file" ──► Curation (FileItem)
├── "What was said in this chat" ──► Discussion (disc_entries)
├── "What similar content exists" ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

**Pick the matching dimension.** If the feature needs 2+, use 2+, but be explicit about which is *primary* and which is *secondary*.

---

## 1. Curation memory (per-file, per-discussion, structural)

**The shape.** Per-file curation config in `FileItem`:

- `path` (the file identity)
- `auto_aggregate` (include in auto-aggregation?)
- `force_full` (bypass aggregation with full content?)
- `view_mode` (`full / skeleton / summary / sig / def / agg`)
- `ast_signatures` (signatures only?)
- `ast_definitions` (definitions only?)
- `ast_mask` (per-symbol mask)
- `custom_slices` (Fuzzy Anchors)

A `ContextPreset` is a named, persisted set of `FileItem`s. Both persist in the project TOML.

**The query model.** "When discussion X opens, render file Y per its curation memory." Implicit in `aggregate.py:run` at discussion start. The user doesn't query the curation memory directly; they *configure* it.

**The right tool.** The Structural File Editor (per `docs/guide_context_curation.md`). AST-aware slices, Fuzzy Anchor slices, view-mode picker. The file's `FileItem` is the UI surface.

**The wrong tool.** Storing curation state in `disc_entries` (it's not conversational). Storing curation state in the RAG index (it's structural, not semantic). Storing curation state in the knowledge digest (it's per-discussion, not durable).

**The codepath** (SSDL):

```
[Q:discussion starts]
│
▼
[Q:which ContextPreset is active?]
│
├── preset N ──► [I:load ContextPreset N's FileItems]
│
▼
[loop: each FileItem]
│
├──► [Q:FileItem.view_mode?]
│ ├── full ──► [I:read full file]
│ ├── skeleton ──► [I:py_get_skeleton / ts_c_get_skeleton]
│ ├── summary ──► [I:run_subagent_summarization]
│ ├── sig ──► [I:py_get_skeleton (signatures only)]
│ ├── def ──► [I:py_get_skeleton (definitions only)]
│ └── agg ──► [I:py_get_skeleton (children only)]
│
├──► [Q:FileItem.ast_mask?] ──► [I:apply ast_mask to the rendered view]
├──► [Q:FileItem.custom_slices?] ──► [I:apply custom_slices]
└──► [I:append to aggregate markdown]
```
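
The `view_mode` branch of that codepath can be sketched as a dispatch table. Several pieces are assumptions. `FileItem` here carries only two of the fields listed above; the real schema is in `src/models.py`. The renderers are injected callables standing in for `py_get_skeleton` and similar functions. The `ast_mask` and `custom_slices` steps are omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileItem:
    # Trimmed, hypothetical subset of the real FileItem schema.
    path: str
    view_mode: str = "full"  # full / skeleton / summary / sig / def / agg


Renderer = Callable[[str], str]


def render_items(
    items: list[FileItem],
    renderers: dict[str, Renderer],
    read: Callable[[str], str],
) -> str:
    """Render each FileItem per its view_mode and join into aggregate markdown."""
    parts = []
    for item in items:
        render = renderers.get(item.view_mode)
        if render is None:
            # Errors are data: surface the bad mode in the output instead of raising.
            body = f"<!-- unknown view_mode {item.view_mode!r} -->"
        else:
            body = render(read(item.path))
        parts.append(f"### {item.path}\n{body}")
    return "\n\n".join(parts)
```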

**The shape rule.** Curation is per-file, per-discussion, structural. Edited at the Structural File Editor. Persisted in TOML. The file's `FileItem` is the single source of truth for "how do I render this file in the AI's context."

**See:** `docs/guide_context_curation.md`; `src/models.py:510-559` (FileItem schema); `src/context_presets.py` (ContextPresetManager).

---

## 2. Discussion memory (per-discussion, conversational, multi-turn)

**The shape.** `app.disc_entries: list[dict]` where each entry is `{"role": str, "content": str, "collapsed": bool, "ts": str, ...}` plus optional `thinking_segments` and `usage` (token accounting). The discussion is rendered as a `list[Message]` for the LLM by `build_markdown` (per `src/aggregate.py`).

**The query model.** "What did the user say? What did the AI say? In what order?" The discussion is the *prior context* for the next LLM call. The user can edit, insert, delete, role-change, and branch at any entry (A1-A7 per-entry operations per the nagent review v1 §3).

**The right tool.** The Discussion Hub panel. Per-entry `[Edit]`, `[Read]`, `[+/-]`, `Ins`, `Del`, `[Branch]`, role combo. The undo/redo stack (UISnapshot) and the Take/branching/compact system.

**The wrong tool.** Storing discussion state in the RAG index (it's temporal, not semantic). Storing discussion state in the knowledge digest (it's per-discussion, not durable). Storing discussion state in a FileItem (it's not per-file).

**The codepath** (SSDL):

```
[Q:user types prompt + hits Enter]
│
▼
[I:append new entry to disc_entries] (role: "User")
│
▼
[Q:which ContextPreset is active?] ──► [I:render FileItems per curation memory]
│
▼
[I:aggregate.build_markdown(preset, discussion) -> str]
│
▼
[I:ai_client.send(aggregate_text, history)]
│
▼
[I:append new entry to disc_entries] (role: "AI", content: response)
│
▼
[Q:user pressed Edit on an entry?] ──► [I:update disc_entries[i].content]
[Q:user pressed Branch on an entry?] ──► [I:project_manager.branch_discussion(index) -> new Take]
[Q:user pressed Undo?] ──► [I:history.UISnapshot.pop() -> restore previous state]
[Q:user pressed Compact?] ──► [I:ai_client.run_discussion_compaction(discussion)]
```
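
The Append / Edit / Undo / Branch operations from that codepath are sketched below over a plain list of dicts. `Discussion` is hypothetical: the real state is `app.disc_entries`, undo lives in `history.UISnapshot`, and branching goes through `project_manager.branch_discussion`. Whole-list deep copies stand in for whatever snapshot granularity the real history uses. "Branch at entry i" is assumed to keep entries 0..i.

```python
import copy
from datetime import datetime, timezone


class Discussion:
    """Hypothetical stand-in for disc_entries plus a snapshot-based undo stack."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._undo: list[list[dict]] = []

    def _snapshot(self) -> None:
        # Deep copy so later edits cannot leak into saved history.
        self._undo.append(copy.deepcopy(self.entries))

    def append(self, role: str, content: str) -> None:
        self._snapshot()
        self.entries.append({
            "role": role,
            "content": content,
            "collapsed": False,
            "ts": datetime.now(timezone.utc).isoformat(),
        })

    def edit(self, index: int, content: str) -> None:
        self._snapshot()
        self.entries[index]["content"] = content

    def undo(self) -> bool:
        """Restore the previous state; False when there is nothing to undo."""
        if not self._undo:
            return False
        self.entries = self._undo.pop()
        return True

    def branch(self, index: int) -> "Discussion":
        """Start a new Take holding a copy of entries[0..index]."""
        take = Discussion()
        take.entries = copy.deepcopy(self.entries[: index + 1])
        return take
```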

**The shape rule.** Discussion is per-discussion, conversational, multi-turn. Edited per-entry. Persisted in TOML via `_flush_to_project`. The `disc_entries` list is the single source of truth for "what was said in this discussion."

**See:** `docs/guide_architecture.md` §"Threading model"; `src/gui_2.py:3770-3853` (render_discussion_entry); `src/history.py:8-71` (UISnapshot).

---

## 3. RAG memory (opt-in, semantic, fuzzy)

**The shape.** ChromaDB vector store; per-file `FileItem`-like records with embeddings. `RAGEngine.search(query, k=N)` returns the top-N most-similar chunks. Persisted in `~/.manual_slop/.slop_cache/chroma_<embedding_provider>/`.

**The query model.** "Given a query, return similar content from the indexed corpus." Semantic similarity, fuzzy. No provenance beyond the file path. No user-editable content.

**The right tool.** `RAGEngine.search()` at LLM call time (the `rag_*` results injected into the LLM prompt). The `[X] Enable RAG` toggle in AI Settings. The `RAGConfig` (embedding provider, chunk size, chunk overlap, source selection).

**The wrong tool.** Using RAG as a *replacement* for the other 3 dimensions. Using RAG results for state mutation (the integration discipline prohibits this). Using RAG for "show me the last thing the user said" (use Discussion memory). Using RAG for "show me what we decided last time" (use Knowledge memory).

**The codepath** (SSDL):

```
[Q:ai_client.send() is called]
│
▼
[Q:is RAG enabled?]
│
├── no ──► [T:skip]
│
▼
[Q:which RAG source?]
│
├── project ──► [I:RAGEngine.index_file for each file in project]
├── global ──► [I:RAGEngine.index_file for each file in ~/.manual_slop/knowledge/]
└── none ──► [T:skip]
│
▼
[Q:RAG engine initialized?]
│
├── no ──► [I:RAGEngine._init_embedding_provider()] (lazy init, may download)
│
▼
[I:RAGEngine.search(query, k=N) -> Result[list[SearchResult], ErrorInfo]]
│
▼
[I:append "{rag-context}" block to aggregate markdown]
```

**The shape rule.** RAG is opt-in. Default-off. Complements the other dimensions; never replaces. Provenance is required (file path, chunk offset). No mutation.

**See:** `docs/guide_rag.md`; `conductor/code_styleguides/rag_integration_discipline.md`; `src/rag_engine.py:1-384`.

---

## 4. Knowledge memory (per-project, durable, provenance-aware)

**The shape.** A markdown tree at `~/.manual_slop/knowledge/`:

| File | Format | What it stores |
|---|---|---|
| `facts.md` | `- {statement} {provenance}` | Durable statements about systems, repos, tools |
| `decisions.md` | `- {statement, reason} {provenance}` | Decisions that were made |
| `questions.md` | `- {question} {provenance}` | Unanswered questions |
| `playbooks.md` | `- **{name}**: {steps} {provenance}` | Reusable command sequences |
| `tasks.md` | `- {task}` (## Open / ## Done) | Open and done tasks |
| `files/{file_id}.md` | `- {note} {provenance}` | Per-file notes (keyed by inode) |
| `digest.md` | bounded 4KB | The projected digest (injected as `{knowledge}` block) |
| `ledger.json` | `{entries: {sha256: {status, at, items}}}` | The harvest audit log |
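
This sketch shows the ledger's dedup role, matching the `{entries: {sha256: {status, at, items}}}` shape in the table: hash the harvested content, and skip it if the hash is already recorded. The function names are hypothetical, and `items` is assumed to be a count of harvested items.

```python
import hashlib
from datetime import datetime, timezone


def ledger_key(content: str) -> str:
    """sha256 of the harvested content, hex-encoded."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def already_harvested(ledger: dict, content: str) -> bool:
    return ledger_key(content) in ledger.get("entries", {})


def record_harvest(ledger: dict, content: str, status: str, items: int) -> None:
    """Add or overwrite the audit entry for this content."""
    ledger.setdefault("entries", {})[ledger_key(content)] = {
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        "items": items,
    }
```

The ledger dict is plain JSON, so `ledger.json` can be round-tripped with `json.load` / `json.dump`.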

**The query model.** "Given past sessions, what durable knowledge should I inject into the current discussion?" The answer is the `{knowledge}` block in the initial context, regenerated from the category files (newest first), bounded to 4KB.
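
The following is a minimal sketch of the regeneration step: merge entries from the category files, newest first, and stop at the 4KB budget with a visible note. It makes three assumptions. Each entry has already been parsed into an ISO-8601 timestamp plus its text (without the leading `- `). "4KB" means 4096 UTF-8 bytes. `regenerate_digest` and `TRUNCATION_NOTE` are hypothetical names.

```python
BUDGET = 4096  # "bounded 4KB", read as 4096 UTF-8 bytes
TRUNCATION_NOTE = "(digest truncated at 4KB; older entries remain in the category files)\n"


def regenerate_digest(
    categories: dict[str, list[tuple[str, str]]], budget: int = BUDGET
) -> str:
    """categories maps a file name (e.g. "facts.md") to (iso_timestamp, text) pairs."""
    entries = [(ts, name, text) for name, rows in categories.items() for ts, text in rows]
    entries.sort(key=lambda e: e[0], reverse=True)  # newest first
    rows = [f"- {text} ({name})\n" for _, name, text in entries]
    full = "".join(rows)
    if len(full.encode("utf-8")) <= budget:
        return full
    # Over budget: keep the newest rows that fit, leaving room for the note.
    room = budget - len(TRUNCATION_NOTE.encode("utf-8"))
    kept, used = [], 0
    for row in rows:
        size = len(row.encode("utf-8"))
        if used + size > room:
            break
        kept.append(row)
        used += size
    return "".join(kept) + TRUNCATION_NOTE
```

Because the digest is rebuilt from the category files every time, it stays a projection: editing or deleting it never loses data.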

**The right tool.** The harvest CLI (`python -m src.knowledge_harvest`) for the harvest; the plain text editor for the category files. The "Knowledge" panel in the GUI for browse/edit/prune.

**The wrong tool.** Treating the knowledge digest as state (it's a projection; the category files are the state). Letting the digest grow unbounded (4KB cap; truncate with a visible note). Treating the per-file notes as a replacement for FileItem curation (different dimensions; both are useful).

**The codepath** (SSDL):

```
[Q:discussion starts]
│
▼
[Q:knowledge digest exists?]
│
├── no ──► [T:skip]
│
▼
[Q:digest within 4KB budget?]
│
├── yes ──► [I:read digest]
├── no ──► [I:read digest (truncated with note)]
│
▼
[I:append "{knowledge}" block to stable prefix] (layer 7)
│
▼
[Q:per-file knowledge for files in scope?]
│
├── yes ──► [I:append "{file-knowledge}" per FileItem]
```
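
The codepath above reduces to one function. If there is no `digest.md`, no block is produced (the "delete to turn off" switch). An oversized digest is cut to the budget with a visible note. `knowledge_block` is a hypothetical name, and the `{knowledge}` header line is a stand-in for however the block is actually delimited. The per-file `{file-knowledge}` step is omitted.

```python
from pathlib import Path

BUDGET = 4096  # "bounded 4KB", read as 4096 UTF-8 bytes
TRUNC_NOTE = "\n(knowledge digest truncated to 4KB)"


def knowledge_block(digest_path: Path, budget: int = BUDGET) -> str:
    """Layer 7 of the stable prefix, or "" when digest.md has been deleted."""
    if not digest_path.is_file():
        return ""  # delete to turn off
    data = digest_path.read_bytes()
    if len(data) <= budget:
        text = data.decode("utf-8")
    else:
        # Cut on the byte budget; errors="ignore" drops a split multi-byte char at the cut.
        text = data[:budget].decode("utf-8", errors="ignore") + TRUNC_NOTE
    return "{knowledge}\n" + text
```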
**The shape rule.** Knowledge is per-project, durable, provenance-aware. Edited by the user (plain markdown). The category files are the source of truth; the digest is a projection. "Delete to turn off": `rm digest.md` → no injection.
|
||||
|
||||
**See:** `docs/guide_knowledge_curation.md`; `conductor/code_styleguides/knowledge_artifacts.md`.
|
||||
|
||||
---

## 5. The boundaries (when NOT to mix)

| Don't store... | In... | Because... |
|---|---|---|
| Discussion state | `FileItem` (curation) | Discussion is per-discussion, not per-file |
| File curation | `disc_entries` (discussion) | Curation is per-file structural, not conversational |
| Semantic search results | `disc_entries` (discussion) | RAG is fuzzy; the discussion is precise |
| A long conversation | the knowledge digest | The digest is bounded (4KB); the conversation is unbounded |
| A "this is the current state" fact | the RAG index | RAG is semantic; state is precise |
| Per-file notes | the discussion context | The notes should follow the file, not the discussion |
| Per-discussion summary | the knowledge digest | The digest is *cross*-discussion, not per-discussion |
| LLM-derived curation | the FileItem schema | LLM outputs are untrusted; the FileItem is user-edited |
| Untrusted LLM output | the knowledge category files | The harvest has retry and graceful failure, but the category files are *user-editable*, so corrections are first-class |

**The discipline.** When designing a new feature, ask: which of the 4 dimensions is the *natural* home? Don't reach for RAG because "it's there"; reach for the dimension whose shape matches the data.

---

## 6. The decision tree (the 1-question test)

When a feature needs *some* memory, ask this single question:

```
Q: What is the *data* (not the operation) the feature needs?
│
├── "How to render a file" ──► Curation (FileItem)
├── "What was said in this chat" ──► Discussion (disc_entries)
├── "What similar content exists" ──► RAG (RAGEngine.search) [opt-in]
└── "What we learned from past runs" ──► Knowledge (knowledge/digest.md)
```

Pick the matching dimension. If the feature needs 2+, use 2+ — but be explicit about which is the *primary* (the one that holds the *answer*) and which is *secondary* (the one that provides *context*).

---

## 7. The cross-cutting principle (the "data is the thing")

All 4 dimensions share one principle: **the data is the thing, not the agent.** Each dimension has:

- A flat shape (no object graphs; structs of structs of scalars)
- Durable storage (TOML, ChromaDB, markdown — not Python objects)
- A user-editable surface (the Structural File Editor, the Discussion Hub, the RAG toggle, the category files)
- A query model that returns "data, not control flow" (per `data_oriented_error_handling_20260606`)

Using the wrong shape for the right question is a common mistake. The right question is "which of the 4 dimensions is this?", not "is there a tool that does X?"

---

## 8. The cross-references

- `conductor/code_styleguides/agent_memory_dimensions.md` — the canonical styleguide
- `docs/guide_context_curation.md` — the existing curation deep-dive (dimension 1)
- `docs/guide_rag.md` — the existing RAG deep-dive (dimension 3)
- `docs/guide_knowledge_curation.md` — the new knowledge guide (dimension 4)
- `docs/guide_caching_strategy.md` — where the 4 dims get injected in the cache strategy
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §2.8 — the nagent-origin pattern that informed this guide

@@ -703,3 +703,143 @@ Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is decl
- `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
- `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
- **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient

## Addition (2026-06-12) — Cache strategy and the 12-layer model

The nagent review (v2.3, §3.2 + §5) formalizes the cache strategy that this client implements. The strategy: **stable-to-volatile context ordering**, where layers 1-7 of the initial context are byte-identical across turns and across discussions of the same mode (and therefore cacheable), and layers 8-12 are per-turn (and therefore not cached).

### The 12-layer model (the recap)

| # | Layer | Stable? | Where |
|---|---|---|---|
| 1 | Role instructions | yes | `_get_combined_system_prompt` |
| 2 | Function-calling schema | yes | per provider |
| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` |
| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` |
| 5 | Persona profile | yes | `app_state.active_persona` |
| 6 | Project context (per `manual_slop.toml`) | yes | NEW (Candidate 14) |
| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within gc cycle) | NEW (Candidate 8) |
| 8 | Discussion metadata | no | `disc_entries[:1]` or `disc_meta` |
| 9 | Active preset (FileItem set) | no | `self.context_files` |
| 10 | Per-file details | no | per `FileItem` |
| 11 | Prior tool results | no | per `_reread_file_items` |
| 12 | User message | no | the input |

### The byte-comparison test (the design contract)

The test in `tests/test_aggregate_caching.py` ensures the first N characters of the context are byte-identical across turns:

```python
def test_aggregate_stable_to_volatile_ordering():
    ctrl = mock_app_controller()
    turn1 = aggregate.build_initial_context(ctrl, user_message="first")
    turn2 = aggregate.build_initial_context(ctrl, user_message="second")
    N = aggregate.stable_prefix_length(ctrl)
    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
```

**The test is the contract.** If a new layer is added in the wrong position, the test fails; the agent must move the layer to the stable position or update the test with written justification.

### The provider-specific cache strategies

#### Anthropic (5-min ephemeral, 4 breakpoints max)

```python
def _send_anthropic(messages, *, cache_prefix_chars=None):
    if cache_prefix_chars is not None:
        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
    else:
        content_blocks = messages

    response = anthropic_client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": content_blocks}],
    )
    return _result_with_usage(response.content, response.usage, messages)
```

**The `cache_prefix_blocks` helper** splits the message at the given char offsets and marks each prefix with `cache_control: {"type": "ephemeral"}`. Max 3 prefix blocks (provider limit is 4 breakpoints per request).

**The Anthropic usage accounting** (in `_result_with_usage`): `cache_read_input_tokens` + `cache_creation_input_tokens` are added to `input_tokens` so the accounting stays "tokens sent" across providers. Caching is *invisible* in the user-facing number.

#### Gemini (1-h explicit, configurable TTL)

```python
def _send_gemini(messages, *, cache_ttl_seconds=3600):
    # stable_prefix_messages / volatile_messages are `messages` split at the
    # stable-prefix boundary (the split is elided here).
    if cache_ttl_seconds > 0:
        # Simplified: creates a fresh cache per call; reusing an unexpired one is not shown.
        cached_content = genai_client.caches.create(
            model=model,
            config=genai.types.CreateCachedContentConfig(
                contents=stable_prefix_messages, ttl=f"{cache_ttl_seconds}s",
            ),
        )
        response = genai_client.models.generate_content(
            model=model, contents=volatile_messages,
            config=genai.types.GenerateContentConfig(cached_content=cached_content.name),
        )
    else:
        response = genai_client.models.generate_content(model=model, contents=messages)
    return _result_with_usage(response.text, response.usage_metadata, messages)
```

**The default TTL is 1 hour**; it is configurable per discussion in the GUI.

#### OpenAI (5-10 min implicit, provider-managed)

No application-side control; the provider handles caching. The GUI just shows "Cached by OpenAI; TTL: provider-managed."

### The GUI exposure (the "Caching" Operations Hub sub-panel)

| Provider | Default TTL | Configurable? |
|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-discussion state) |
| Gemini explicit | 1 h | yes (TTL override) |
| OpenAI implicit | 5-10 min (provider-managed) | no |
| claude-code (Claude Agent SDK) | varies (provider-managed) | no |

**The new AI client state:**

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DiscussionCacheState:
    discussion_id: str
    provider: str
    cached_at: datetime
    expires_at: Optional[datetime]  # None for OpenAI implicit
    hit_count: int = 0
    tokens_cached: int = 0
    last_invalidated_at: Optional[datetime] = None
    caching_enabled: bool = True
```

**The Hook API additions:**

```
GET  /api/cache                   # list all discussion cache states
GET  /api/cache/<discussion_id>   # get one
POST /api/cache/<discussion_id>/invalidate
POST /api/cache/<discussion_id>/disable
POST /api/cache/<discussion_id>/enable
```

### The 5th provider (claude-code)

`claude-code` uses the Claude Agent SDK with local Claude Code authentication (no API key). The caching behavior is provider-managed.

```python
def _send_claude_code(message, model, *, allowed_tools=None, max_turns=1):
    options = ClaudeAgentOptions(
        model=None if not model or model == "default" else model,
        max_turns=max_turns,
        tools=list(allowed_tools) if allowed_tools else [],
        allowed_tools=list(allowed_tools) if allowed_tools else [],
        cwd=os.getcwd(),
    )
    # ... claude_agent_sdk.query(prompt=message, options=options)
    return _result_with_usage(text, usage, message)
```

### The cross-references

- `docs/guide_caching_strategy.md` — the user-facing deep-dive
- `conductor/code_styleguides/cache_friendly_context.md` — the canonical styleguide
- `docs/guide_agent_memory_dimensions.md` — the 4 dims (where the cache hits)
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern

@@ -0,0 +1,342 @@
# Caching Strategy Guide

**Status:** User-facing deep-dive on the cache strategy: stable-to-volatile context ordering, the 4 cache-TTL profiles (Anthropic, Gemini, OpenAI, claude-code), and the GUI exposure.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/cache_friendly_context.md`; `docs/guide_ai_client.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5.

> **What this is.** The LLM providers Manual Slop uses (Anthropic, Gemini, OpenAI) all support prompt caching. The cost benefit comes from the *stable prefix* being byte-identical across turns. This guide is the user-facing deep-dive on the 12-layer model, the byte-comparison test, the provider-specific TTLs, and the GUI exposure.

---

## 0. The 30-second version

```
[STABLE PREFIX (cached across turns)]     [VOLATILE SUFFIX (per-turn)]
[Role instructions]                       [Discussion metadata]
[Function-calling schema]                 [Active preset (FileItems)]
[Discovered tool descriptions]            [Per-file details]
[System prompt preset]                    [Tool-call results from prior turns]
[Persona profile]                         [The user message]
[Project context]
[Knowledge digest]
[file-knowledge for files in scope]
```

**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns; layers 8-12 change per turn. The Anthropic-specific path wraps the prefix in `cache_control: {"type": "ephemeral"}` blocks; the Gemini path uses `cachedContent` resources; the OpenAI path uses implicit prefix caching.

**The provider-specific defaults:**

| Provider | Default TTL | Configurable? | GUI exposure? |
|---|---|---|---|
| Anthropic ephemeral | 5 min | yes (per-discussion) | yes |
| Gemini explicit | 1 h | yes (per-discussion override) | yes (TTL override) |
| OpenAI implicit | 5-10 min (provider-managed) | no | shows "cached" only |
| claude-code (Claude Agent SDK) | varies (provider-managed) | no | shows "cached" only |

---

## 1. The 12-layer model (the stable-to-volatile ordering)

| # | Layer | Stable across turns? | Source | SSDL |
|---|---|---|---|---|
| 1 | Role instructions (model + provider) | yes | `_get_combined_system_prompt` | `[I]` |
| 2 | Function-calling schema | yes | per provider | `[I]` |
| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` | `[I]` |
| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` | `[I]` |
| 5 | Persona profile | yes | `app_state.active_persona` | `[I]` |
| 6 | Project context (per `manual_slop.toml`) | yes | NEW | `[I]` |
| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within a gc cycle) | NEW | `[I]` |
| 8 | Discussion metadata (name, role count) | no (per turn) | `disc_entries[:1]` or `disc_meta` | `───` |
| 9 | Active preset (FileItem set) | no (per turn) | `self.context_files` | `───` |
| 10 | Per-file details (history, slices, notes) | no (per file) | per `FileItem` | `───` |
| 11 | Tool-call results from prior turns | no (per turn) | per `_reread_file_items` | `───` |
| 12 | The user message | no (per turn) | the input | `───` |

**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns of the same discussion (and across discussions of the same mode). Layers 8-12 change per turn.
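
One way to make the boundary hard to get wrong is to have the function that concatenates the layers also report where the stable prefix ends. That offset is the number the byte-comparison test in §2 reads from `aggregate.stable_prefix_length`. Here is a minimal sketch, assuming each layer is pre-rendered to a string and tagged as stable or volatile. `Layer` and `assemble` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    text: str
    stable: bool


def assemble(layers: list[Layer]) -> tuple[str, int]:
    """Concatenate layers; return (context, stable_prefix_length).

    Raises if a stable layer follows a volatile one, because such a layer
    could never sit inside the cacheable prefix.
    """
    parts: list[str] = []
    prefix_len = 0
    seen_volatile = False
    for layer in layers:
        if layer.stable and seen_volatile:
            raise ValueError(f"stable layer {layer.name!r} after a volatile layer")
        chunk = layer.text + "\n\n"
        parts.append(chunk)
        if layer.stable:
            prefix_len += len(chunk)
        else:
            seen_volatile = True
    return "".join(parts), prefix_len
```

Because the offset comes from the same loop that builds the string, it cannot drift from the actual layout.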

---

## 2. The byte-comparison test (the design contract)

The design rule "stable prefix is byte-identical" must be testable. The test:

```python
# In tests/test_aggregate_caching.py (NEW)
def test_aggregate_stable_to_volatile_ordering():
    """The first N characters of the context should be identical across turns
    of the same conversation, when no stable-layer inputs change."""
    ctrl = mock_app_controller()
    ctrl.ai_settings.system_prompt = "Test system prompt"
    ctrl.active_persona = mock_persona()

    # Turn 1
    turn1 = aggregate.build_initial_context(ctrl, user_message="first prompt")

    # Turn 2 (same stable inputs, different user message)
    turn2 = aggregate.build_initial_context(ctrl, user_message="second prompt")

    # The first N characters should be identical (N = where the volatile layers start)
    N = aggregate.stable_prefix_length(ctrl)
    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
```

**The test is the contract.** If a new layer is added in the middle of the stack, this test fails; the agent must either move the layer to the stable position or update the test (with written justification).

---

## 3. The provider-specific cache strategies

### 3.1 Anthropic (5-minute ephemeral, 4 breakpoints max)

```python
# In src/ai_client.py:_send_anthropic
def _send_anthropic(messages, *, cache_prefix_chars=None):
    if cache_prefix_chars is not None:
        # Wrap the message in content blocks; mark each prefix with cache_control
        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
    else:
        content_blocks = messages

    response = anthropic_client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": content_blocks}],
    )
    return _result_with_usage(response.content, response.usage, messages)
```

**The cache_prefix_blocks helper:**

```python
def cache_prefix_blocks(message: str, cache_boundaries: list[int]) -> str | list[dict]:
    """Split the message into content blocks at the given char offsets.
    Mark each prefix block with cache_control. Returns the plain string
    when no valid boundary exists. At most 3 prefix blocks (provider limit
    is 4 breakpoints per request)."""
    if not cache_boundaries:
        return message
    # Keep interior offsets only, deduplicated and ascending; the first 3 win.
    points = sorted({b for b in cache_boundaries if 0 < b < len(message)})[:3]
    if not points:
        return message
    blocks = []
    start = 0
    for point in points:
        blocks.append({
            "type": "text",
            "text": message[start:point],
            "cache_control": {"type": "ephemeral"},
        })
        start = point
    # The volatile tail carries no cache_control marker.
    blocks.append({"type": "text", "text": message[start:]})
    return blocks
```

**The Anthropic usage accounting:**

```python
def _result_with_usage(text, usage, input_text=None):
    input_tokens = _usage_value(usage, "input_tokens", "prompt_tokens", "prompt_token_count")
    # Anthropic reports cached prompt tokens separately; fold them back
    # so input_tokens stays "tokens sent" across providers.
    input_tokens += _usage_value(usage, "cache_read_input_tokens")
    input_tokens += _usage_value(usage, "cache_creation_input_tokens")
    # ...
```

**The 4-breakpoint limit.** Anthropic allows at most 4 `cache_control` markers per request. Manual Slop marks at most 3 prefix blocks (one breakpoint each) and leaves the volatile suffix block unmarked, so one breakpoint stays in reserve.

### 3.2 Gemini (1-hour explicit cache, configurable TTL)

```python
# In src/ai_client.py:_send_gemini
def _send_gemini(messages, *, cache_ttl_seconds=3600):
    # stable_prefix_messages / volatile_messages are `messages` split at the
    # stable-prefix boundary (the split is elided here).
    if cache_ttl_seconds > 0:
        # Simplified: creates a fresh cache per call; reusing an unexpired one is not shown.
        cached_content = genai_client.caches.create(
            model=model,
            config=genai.types.CreateCachedContentConfig(
                contents=stable_prefix_messages,
                ttl=f"{cache_ttl_seconds}s",
            ),
        )
        response = genai_client.models.generate_content(
            model=model,
            contents=volatile_messages,
            config=genai.types.GenerateContentConfig(cached_content=cached_content.name),
        )
    else:
        response = genai_client.models.generate_content(model=model, contents=messages)
    return _result_with_usage(response.text, response.usage_metadata, messages)
```

**The default TTL is 1 hour.** It is configurable in the GUI (see §4).
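
Gemini expects the TTL as a duration string such as `"3600s"`. The sketch below converts the GUI setting into that string. It assumes the TTL is chosen in minutes and, per the "Allow >1h Gemini caches" checkbox in §4, clamped to one hour unless the user has opted in. `gemini_ttl` is a hypothetical helper; a return value of `None` means "send uncached":

```python
from typing import Optional

ONE_HOUR_S = 3600


def gemini_ttl(minutes: int, allow_over_one_hour: bool = False) -> Optional[str]:
    """Map a GUI TTL in minutes to a Gemini TTL string like '3600s'.

    Returns None when caching is off (a TTL of zero or less).
    """
    seconds = int(minutes) * 60
    if seconds <= 0:
        return None
    if not allow_over_one_hour:
        seconds = min(seconds, ONE_HOUR_S)  # the ">1h" opt-in is unchecked
    return f"{seconds}s"
```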

### 3.3 OpenAI (5-10 min implicit, provider-managed)

OpenAI's caching is *implicit*: the provider automatically caches the prefix and reuses it across requests with the same prefix. No application-side control.

```python
# In src/ai_client.py:_send_openai
def _send_openai(messages, *, model="gpt-5.5"):
    # No application-side cache_control; the provider handles it
    response = openai_client.responses.create(model=model, input=messages)
    return _result_with_usage(response.output_text, response.usage, messages)
```

**The TTL is provider-managed** (5-10 min). The GUI just shows "Cached by OpenAI; TTL: provider-managed."
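
Even without control over the cache, the client can still report on it. The sketch below assumes a Responses API usage payload converted to a dict. In that payload, `input_tokens_details.cached_tokens` is a subset of `input_tokens`. Anthropic, by contrast, reports cache reads separately, so OpenAI needs no folding. `openai_cache_stats` is a hypothetical name:

```python
def openai_cache_stats(usage: dict) -> tuple[int, int, float]:
    """Return (input_tokens, cached_tokens, hit_rate) from a Responses API usage dict.

    cached_tokens is already included in input_tokens, so no folding is needed.
    """
    total = int(usage.get("input_tokens") or 0)
    details = usage.get("input_tokens_details") or {}
    cached = int(details.get("cached_tokens") or 0)
    return total, cached, (cached / total if total else 0.0)
```

Take the OpenAI row in §4 (`in:560 cache:200`): the ratio is about 0.357. The panel appears to display this truncated, as `hit:35%`.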

### 3.4 claude-code (5th provider, subscription auth)

`claude-code` uses the Claude Agent SDK with local Claude Code authentication (no API key). The caching behavior is provider-managed.

```python
# In src/ai_client.py:_send_claude_code (the 5th provider)
def _send_claude_code(message, model, *, allowed_tools=None, max_turns=1):
    options = ClaudeAgentOptions(
        model=None if not model or model == "default" else model,
        max_turns=max_turns,
        tools=list(allowed_tools) if allowed_tools else [],
        allowed_tools=list(allowed_tools) if allowed_tools else [],
        cwd=os.getcwd(),
    )
    # ... claude_agent_sdk.query(prompt=message, options=options)
    return _result_with_usage(text, usage, message)
```

---

## 4. The GUI exposure

The "Caching" Operations Hub sub-panel:

```
+------------------------------------------------------+
| Caching                                              |
+------------------------------------------------------+
| Provider summaries                                   |
| [Anthropic]  in:340  cache:80   hit:23%  ttl:4:32    |
| [Gemini]     in:120  cache:0    hit:0%   ttl:0:00    |
| [OpenAI]     in:560  cache:200  hit:35%  ttl:n/a     |
+------------------------------------------------------+
| Active discussions                                   |
| Discussion "refactor auth"                           |
|   cached: yes (Anthropic)                            |
|   expires: 2026-06-12T15:32 (in 4:32)                |
|   [Invalidate cache] [Disable caching for this]      |
| Discussion "fix the parser"                          |
|   cached: no                                         |
|   [Enable caching for this]                          |
+------------------------------------------------------+
| Global settings                                      |
| [X] Enable Anthropic ephemeral caching               |
| [X] Enable Gemini explicit caching                   |
| [ ] Allow >1h Gemini caches (charges may apply)      |
| Anthropic default TTL: [5 min v]                     |
| Gemini default TTL:    [60 min v]                    |
+------------------------------------------------------+
```

**The data sources:**

| Widget | Data source | Frequency |
|---|---|---|
| `in:N cache:N hit:N%` | `ai_client.get_token_stats()` | per turn (or per session) |
| `ttl:4:32` | `ai_client._send_<provider>` usage metadata + the cache expiry timestamp | per turn |
| `cached: yes/no` | per-discussion flag (NEW) | per discussion |
| `[Invalidate cache]` | calls `ai_client._invalidate_cache(discussion_id)` (NEW) | on click |
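
The `ttl:4:32` countdown can be derived from the stored expiry. The sketch assumes `expires_at` is a timezone-aware `datetime`, or `None` for provider-managed caches, which matches `DiscussionCacheState.expires_at`. `format_ttl` is a hypothetical name:

```python
from datetime import datetime
from typing import Optional


def format_ttl(expires_at: Optional[datetime], now: datetime) -> str:
    """Render the remaining cache lifetime as m:ss; 'n/a' when provider-managed."""
    if expires_at is None:
        return "n/a"
    # Clamp at zero so an expired cache shows 0:00 rather than a negative time.
    remaining = max(0, int((expires_at - now).total_seconds()))
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes}:{seconds:02d}"
```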

**The new AI client state:**

```python
# In src/ai_client.py (NEW)
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DiscussionCacheState:
    discussion_id: str
    provider: str
    cached_at: datetime
    expires_at: Optional[datetime]  # None for OpenAI implicit (provider-managed TTL)
    hit_count: int = 0
    tokens_cached: int = 0
    last_invalidated_at: Optional[datetime] = None
    caching_enabled: bool = True


# In AppController (NEW), set in __init__:
self.discussion_caches: dict[str, DiscussionCacheState] = {}
```

**The Hook API additions:**

```
GET  /api/cache                   # list all discussion cache states
GET  /api/cache/<discussion_id>   # get one
POST /api/cache/<discussion_id>/invalidate
POST /api/cache/<discussion_id>/disable
POST /api/cache/<discussion_id>/enable
```
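
The endpoint set is small and fixed, so routing can itself be a table rather than a chain of conditionals. Below is a standard-library sketch. The route table and the `route` function are hypothetical, and the actual Hook API server may dispatch differently:

```python
import re
from typing import Optional

# (HTTP method, path regex, action); a discussion id is one path segment.
CACHE_ROUTES = [
    ("GET", re.compile(r"^/api/cache$"), "list"),
    ("GET", re.compile(r"^/api/cache/([^/]+)$"), "get"),
    ("POST", re.compile(r"^/api/cache/([^/]+)/invalidate$"), "invalidate"),
    ("POST", re.compile(r"^/api/cache/([^/]+)/disable$"), "disable"),
    ("POST", re.compile(r"^/api/cache/([^/]+)/enable$"), "enable"),
]


def route(method: str, path: str) -> Optional[tuple[str, Optional[str]]]:
    """Return (action, discussion_id) for a cache endpoint, or None if unmatched."""
    for verb, pattern, action in CACHE_ROUTES:
        if verb != method:
            continue
        match = pattern.match(path)
        if match:
            return action, (match.group(1) if match.groups() else None)
    return None
```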

---

## 5. The injection (where the cache hits)

| Layer | Where injected | Stable? | Cache impact |
|---|---|---|---|
| 1. Role instructions | `_get_combined_system_prompt` | yes | **CACHED** |
| 2. Function-calling schema | per provider | yes | **CACHED** |
| 3. Discovered tool descriptions | `mcp_client.get_tool_schemas()` | yes | **CACHED** |
| 4. System prompt preset | `app_state.ai_settings.system_prompt` | yes | **CACHED** |
| 5. Persona profile | `app_state.active_persona` | yes | **CACHED** |
| 6. Project context | `manual_slop.toml [agent.context_files]` | yes | **CACHED** |
| 7. Knowledge digest | `~/.manual_slop/knowledge/digest.md` | yes (within a gc cycle) | **CACHED** |
| 8. Discussion metadata | `disc_entries[:1]` | no | NOT cached |
| 9. Active preset | `self.context_files` | no | NOT cached |
| 10. Per-file details | per `FileItem` | no | NOT cached |
| 11. Prior tool results | per `_reread_file_items` | no | NOT cached |
| 12. User message | the input | no | NOT cached |

**The cache only hits on the stable prefix (layers 1-7).** The volatile suffix (layers 8-12) is *not* cached; the user expects the conversation to change per turn.

---

## 6. The cache invalidation triggers

| Trigger | Effect |
|---|---|
| `python -m src.knowledge_harvest --apply` | The digest is regenerated; the cache is invalidated for the next turn |
| `FileItem.notes` edited | The per-file knowledge changes; the cache is invalidated for the next turn that references the file |
| `persona` changed | The persona profile is in the stable prefix; the cache is invalidated |
| `[Invalidate cache]` button | The per-discussion cache state is marked `last_invalidated_at`; the next turn re-creates it |
| `expiration` reached | The provider's cache expires automatically; the next turn re-creates it |
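
The first three triggers all reduce to "a byte in the stable prefix changed". One generic way to detect this is to fingerprint the stable prefix when the cache is created and compare the fingerprint on the next turn. The following is a sketch under stated assumptions. Storing the fingerprint would require a new field on `DiscussionCacheState`, which is hypothetical and not part of the dataclass in §4:

```python
import hashlib
from typing import Optional


def prefix_fingerprint(context: str, stable_prefix_length: int) -> str:
    """sha256 of exactly the characters the provider is expected to reuse."""
    return hashlib.sha256(context[:stable_prefix_length].encode("utf-8")).hexdigest()


def needs_recache(recorded: Optional[str], context: str, stable_prefix_length: int) -> bool:
    """True when no cache exists yet or the stable prefix no longer matches."""
    return recorded is None or recorded != prefix_fingerprint(context, stable_prefix_length)
```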

---

## 7. The measurement (the empirical basis)

**The "before" measurement** (do this first, before any refactor):

```bash
# Log the cache hit rate over a sample of representative discussions
$ python -m scripts.measure_cache_hit_rate --discussions 50 --provider anthropic
cache hit rate: 23% (avg)
cache write rate: 45% (avg)
in:N avg: 1,200
cache:N avg: 280
```

**The "after" measurement** (after the stable-to-volatile refactor):

```bash
$ python -m scripts.measure_cache_hit_rate --discussions 50 --provider anthropic
cache hit rate: 67% (avg)    # <-- should be measurably higher
cache write rate: 18% (avg)  # <-- should be lower
in:N avg: 1,200              # <-- unchanged (the user still types the same)
cache:N avg: ~800            # <-- rises with the hit rate (67% of 1,200)
```

**The win comes from re-aligning the boundaries**, not from changing the providers. The test is whether the cache hit rate is measurably higher after the refactor.
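
How `scripts.measure_cache_hit_rate` computes its averages is not specified. What follows is a sketch of the arithmetic only. It assumes each sampled turn records `(input_tokens, cached_tokens)`, with cached tokens already folded into the input count as in §3.1. Under that assumption, a token-weighted hit rate is:

```python
def cache_hit_rate(turns: list[tuple[int, int]]) -> float:
    """Token-weighted hit rate: sum of cached tokens over sum of input tokens."""
    total_in = sum(inp for inp, _ in turns)
    total_cached = sum(cached for _, cached in turns)
    return total_cached / total_in if total_in else 0.0
```

For the "before" numbers, 280 / 1,200 ≈ 0.233, which is the reported 23%. Averaging per-turn ratios instead would weight short and long turns equally and can produce a different figure.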

---

## 8. The cross-references

- `conductor/code_styleguides/cache_friendly_context.md` — the canonical styleguide
- `docs/guide_ai_client.md` — the underlying LLM client (the producer)
- `docs/guide_agent_memory_dimensions.md` §5 — where the 4 dims get injected
- `docs/guide_knowledge_curation.md` §3 — the digest (layer 7)
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern

@@ -0,0 +1,411 @@
# Knowledge Curation Guide

**Status:** User-facing deep-dive on the 4th memory dimension (the knowledge memory). For agents, see `./docs/AGENTS.md` §6.
**Date:** 2026-06-12
**Cross-refs:** `conductor/code_styleguides/knowledge_artifacts.md`; `docs/guide_agent_memory_dimensions.md` §4; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4.

> **What this is.** The 4th memory dimension is the *durable, user-editable, provenance-aware* knowledge store. It's a *layer*, not a *snapshot*. Category files are the source of truth; the digest is a projection; the ledger is the audit log. This guide is the user-facing deep-dive on how to use it, how to harvest it, and how to query it.

---

## 0. The 30-second version

Manual Slop's knowledge memory lives at `~/.manual_slop/knowledge/`. It holds 5 category files (`facts.md`, `decisions.md`, `questions.md`, `playbooks.md`, `tasks.md`), per-file notes (`files/{file_id}.md`), a digest bounded to 4KB, and a sha256 ledger. The LLM harvests past discussions into these files; the user can edit any of them in plain text. The digest is injected into every new discussion's initial context as a `{knowledge}` block.

```
$ ls ~/.manual_slop/knowledge/
facts.md       # - {statement} {provenance}
decisions.md   # - {statement, reason} {provenance}
questions.md   # - {question} {provenance}
playbooks.md   # - **{name}**: {steps} {provenance}
tasks.md       # ## Open / ## Done
files/         # per-file notes (keyed by inode)
digest.md      # bounded 4KB; the projection
ledger.json    # sha256-of-content audit log
prompts/       # user-editable harvest prompt
```

---

## 1. The 5 category files (the source of truth)

### 1.1 `facts.md` (durable statements)

```markdown
# Facts

- The MCP dispatch uses a flat if/elif chain. 4 places, 45 tools. [from: 2026-05-12-investigate-dispatch, 2026-05-12]
- ai_client.py has 5 separate per-provider history lists, each with their own lock. Switching providers mid-session loses history. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- RAG is opt-in. Default-off in new projects. [from: 2026-06-12-rag-discipline, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. Plain markdown. Append-only. User-editable.

**The provenance string:** `[from: {conversation_name}, {date}]`. The `date` is the ISO-8601 date prefix of the harvest timestamp.

**The user can edit any fact.** The LLM's output is a *suggestion*; the user is the editor. If a fact is wrong, the user deletes it. If a fact needs more detail, the user adds it. The harvest will *append*; it will not *overwrite*.
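
The append-only write with its provenance suffix can be sketched as follows. The sketch assumes entries are single lines and that the harvest timestamp is an ISO-8601 string whose first ten characters are the date. `provenance` and `append_entries` are hypothetical helper names:

```python
from pathlib import Path


def provenance(conversation_name: str, harvested_at_iso: str) -> str:
    """Build '[from: {conversation_name}, {date}]' from the ISO-8601 date prefix."""
    return f"[from: {conversation_name}, {harvested_at_iso[:10]}]"


def append_entries(path: Path, title: str, statements: list[str], prov: str) -> None:
    """Append '- {statement} {provenance}' lines; existing text is never rewritten."""
    is_new = not path.exists()
    # If the user's last edit lacks a trailing newline, start on a fresh line.
    needs_break = (not is_new) and path.stat().st_size > 0 and not path.read_bytes().endswith(b"\n")
    with path.open("a", encoding="utf-8") as fh:
        if is_new:
            fh.write(f"# {title}\n\n")
        elif needs_break:
            fh.write("\n")
        for statement in statements:
            fh.write(f"- {statement} {prov}\n")
```

Opening in append mode means a concurrent hand edit earlier in the file is never clobbered by the harvest.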

### 1.2 `decisions.md` (decisions with reasons)

```markdown
# Decisions

- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]
- Cache TTL defaults to 5 min (Anthropic) + 60 min (Gemini); configurable per-discussion. [from: 2026-06-12-cache-strategy, 2026-06-12]
- Per-file knowledge notes are keyed by st_dev:st_ino, not by path. [from: 2026-06-12-candidate-11, 2026-06-12]
```

**The shape:** `- {statement} {provenance}`. The "why" lives in the `detail` field of the LLM's harvest output; the user's edits override it.

### 1.3 `questions.md` (unanswered questions)

```markdown
# Questions

- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]
- How should the knowledge digest TTL be exposed in the GUI? [from: 2026-06-12-cache-ttl, 2026-06-12]
```

**The shape:** `- {question} {provenance}`. Open questions are *valuable* — they're the TODO list the next session can act on.

### 1.4 `playbooks.md` (reusable sequences)

```markdown
# Playbooks

- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
- **Stable-to-Volatile Cache Ordering**: identify Instance: boundary -> pass to --cache-prefix-chars. [from: 2026-06-12-candidate-12, 2026-06-12]
- **Candidate Verification (TBD)**: read src/ai_client.py:run_discussion_compression -> check failure mode. [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- **{name}**: {steps} {provenance}`. Playbooks are the "I did this once; here it is" record. Future workers use them directly.

### 1.5 `tasks.md` (open and done)

```markdown
# Tasks

## Open
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]
- Verify Candidate 15 by reading src/ai_client.py:run_discussion_compression. [from: 2026-06-12-candidate-15, 2026-06-12]

## Done
- Read nagent source in full (18 files). [from: 2026-05-15, 2026-05-15]
- Wrote v2.3 review (272KB / 3965 lines). [from: 2026-06-12-v2.3, 2026-06-12]
```

**The shape:** `- {task} {provenance}`. The user maintains the two sections by hand (for example, moving an item from `## Open` to `## Done`); the harvest only adds lines, placing open items under `## Open` and done items under `## Done`.
|
||||
|
||||
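The source does not show how the harvest places an item under a given heading. This is a minimal sketch under stated assumptions: `tasks.md` is plain markdown with `## Open` and `## Done` headings, and each new item goes at the end of its section. `append_to_section` and `route_task` are hypothetical names.

```python
def append_to_section(document: str, heading: str, bullet: str) -> str:
    """Append `bullet` as the last item of the `## {heading}` section.

    Creates the section at the end of the document if it is missing.
    """
    lines = document.rstrip("\n").split("\n")
    header = f"## {heading}"
    if header not in lines:
        lines += ["", header, bullet]
        return "\n".join(lines) + "\n"
    start = lines.index(header) + 1
    end = start
    # The section runs until the next "## " heading or the end of the file.
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    # Insert after the section's last non-blank line, keeping any blank separator.
    insert_at = end
    while insert_at > start and lines[insert_at - 1].strip() == "":
        insert_at -= 1
    lines.insert(insert_at, bullet)
    return "\n".join(lines) + "\n"


def route_task(document: str, statement: str, provenance: str, *, done: bool) -> str:
    """tasks_done items go under ## Done; tasks_open items go under ## Open."""
    return append_to_section(document, "Done" if done else "Open", f"- {statement} {provenance}")
```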
---

## 2. The per-file notes (`files/{file_id}.md`)

**The format:**

```markdown
# /repo/src/ai_client.py

- Uses `cache_control: {"type": "ephemeral"}` blocks for Anthropic caching. [from: 2026-06-12-investigate-cache, 2026-06-12]
- The 5 per-provider history lists are gated by their own locks. [from: 2026-05-13-state-mutation-matrix, 2026-05-13]
- `run_discussion_compression` failure mode: TBD (Candidate 15). [from: 2026-06-12-candidate-15, 2026-06-12]
```

**The shape:** `- {note} {provenance}`. Notes are keyed by `file_id`, which is the file's `st_dev:st_ino`, so they survive renames within the same filesystem.

**The `file_id_for_path` pattern** (per nagent's `bin/helpers/nagent_file_edit_lib.py:file_id_for_path`):

```python
from pathlib import Path


def file_id_for_path(path: Path) -> str:
    """Stable file identity across renames. Returns 'device:inode'."""
    stat = path.stat()  # follows symlinks; raises OSError if the file is gone
    return f"{stat.st_dev}:{stat.st_ino}"
```

**Why inode and not path?** A path can change (rename, move, link), but the inode stays the same. A note about `src/foo.py` is preserved if `src/foo.py` is renamed to `src/bar.py`, because the inode is unchanged. If the file is moved to another filesystem, it gets a new device and inode, and the user must re-add the note.

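The rename claim can be checked directly. This sketch repeats `file_id_for_path` so it runs on its own. It renames a file inside one temporary directory and compares the IDs before and after. It also shows a case worth knowing about. Replacing a file with a new one gives it a new inode, so the ID changes even though the path does not. Some editors save this way, writing a temp file and then renaming it.

```python
import os
import tempfile
from pathlib import Path


def file_id_for_path(path: Path) -> str:
    """Stable file identity across renames. Returns 'device:inode'."""
    stat = path.stat()
    return f"{stat.st_dev}:{stat.st_ino}"


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "foo.py"
    original.write_text("x = 1\n", encoding="utf-8")
    before = file_id_for_path(original)

    # A rename within one directory (same filesystem) keeps the inode.
    renamed = original.rename(Path(tmp) / "bar.py")
    after_rename = file_id_for_path(renamed)

    # Replacing the file with a freshly written one swaps in a different inode.
    replacement = Path(tmp) / "bar.py.tmp"
    replacement.write_text("x = 2\n", encoding="utf-8")
    os.replace(replacement, renamed)
    after_replace = file_id_for_path(renamed)
```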
**The "files" category in the harvest output has a special branch:**

```python
# In merge_harvest (the harvest pipeline). `harvested`, `root`, `knowledge`,
# `provenance`, `counts`, `_append_bullets` and `file_knowledge_path` come from
# the enclosing function and module (not shown).
file_notes = 0
for row in harvested.get("files", []):
    if not isinstance(row, dict):
        continue
    path_text = str(row.get("path") or "").strip()
    note = str(row.get("note") or "").strip()
    if not note:
        continue
    target = Path(path_text) if path_text else None
    if target is not None and target.is_file():
        try:
            file_id = file_id_for_path(target)
        except OSError:
            file_id = None
        if file_id is not None:
            # The file still exists: bind the note to its device:inode identity.
            _append_bullets(
                file_knowledge_path(root, file_id), f"# {target.resolve()}",
                [f"{note} {provenance}"],
            )
            file_notes += 1
            continue
    # Target no longer resolvable: the note survives as a fact.
    prefix = f"{path_text}: " if path_text else ""
    _append_bullets(knowledge / "facts.md", "# Facts", [f"{prefix}{note} {provenance}"])
    file_notes += 1
counts["files"] = file_notes
```

**The behavior:**
- If the path resolves to an existing file, the note goes to `knowledge/files/{file_id}.md`.
- If the path is empty, no longer resolves (the file is gone), or cannot be stat'ed, the note falls back to `facts.md` as `{path}: {note} {provenance}`. The note survives; it just loses the per-file binding.

---

## 3. The digest (`digest.md`)

The digest is a *projection* of the category files, bounded to **4KB**. It is injected as the `{knowledge}` block in the initial context.

**The format:**

```markdown
# Knowledge digest
(regenerated by knowledge_harvest; edit the category files, not this file)

## Open tasks
- Create canonical DOD file at conductor/code_styleguides/data_oriented_design.md. [from: 2026-06-12-candidate-16, 2026-06-12]

## Open questions
- Where does intent resolution live — per-verb, per-block, or global? [from: 2026-06-12-follow-up-b, 2026-06-12]

## Decisions
- Knowledge harvest is a complement to curation + discussion, not a RAG replacement. [from: 2026-06-12-candidate-11, 2026-06-12]

## Facts
- nagent has 5 providers; Manual Slop has 8. [from: 2026-06-12-v2.3, 2026-06-12]

## Playbooks
- **Knowledge Harvest**: scan -> classify -> LLM-distill -> append -> digest -> reclaim. [from: 2026-06-12-candidate-11, 2026-06-12]
```

**The ordering is fixed:** Open tasks, Open questions, Decisions, Facts, Playbooks. **Within each section, newest first:** the category files are append-only, so the digest reverses their order.

**Truncation:** if the sections don't fit in 4KB, the remainder is dropped and replaced with a visible `(truncated; see the category files for the rest)` note.

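The ordering and truncation rules above can be sketched as a pure function. This is a minimal sketch under three assumptions. The category files are already parsed into lists of bullet lines in file order. "4KB" means 4096 UTF-8 bytes. Space for the truncation note is reserved up front. `build_digest` is a hypothetical name, not necessarily what `knowledge_harvest` calls it.

```python
from __future__ import annotations

DIGEST_MAX_BYTES = 4 * 1024
SECTION_ORDER = ["Open tasks", "Open questions", "Decisions", "Facts", "Playbooks"]
TRUNCATION_NOTE = "(truncated; see the category files for the rest)"
HEADER = "# Knowledge digest\n(regenerated by knowledge_harvest; edit the category files, not this file)\n"


def build_digest(sections: dict[str, list[str]], max_bytes: int = DIGEST_MAX_BYTES) -> str:
    """Project category bullets (oldest first, as appended) into a bounded digest."""
    note = f"\n{TRUNCATION_NOTE}\n"
    budget = max_bytes - len(note.encode("utf-8"))  # always leave room for the note
    out = HEADER
    size = len(out.encode("utf-8"))
    for title in SECTION_ORDER:  # fixed section order
        bullets = sections.get(title, [])
        if not bullets:
            continue
        block = f"\n## {title}\n"
        for piece in [block] + [b + "\n" for b in reversed(bullets)]:  # newest first
            cost = len(piece.encode("utf-8"))
            if size + cost > budget:
                return out + note
            out += piece
            size += cost
    return out
```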
**"Delete to turn off":** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block is injected. Re-enable by running the harvest, which regenerates the digest.

---

## 4. The ledger (`ledger.json`)

The ledger is the **sha256-of-content audit log**. Every processed conversation gets an entry keyed by the sha256 of its content. The ledger gates deletion on a proven harvest. A conversation is reclaimed only when its entry says `harvested`, or `deleted-unharvested` when the user explicitly skipped the harvest.

**The format:**

```json
{
  "entries": {
    "<sha256-of-conversation-content>": {
      "path": "/home/user/.manual_slop/conversations/<name>-<uuid>",
      "status": "harvested",
      "at": "2026-06-12T14:23:45.123456+00:00",
      "items": {
        "facts": 3,
        "decisions": 2,
        "tasks_done": 1,
        "tasks_open": 0,
        "questions": 1,
        "playbooks": 0,
        "files": 1
      },
      "deleted": true
    }
  }
}
```

**The status values:**

| Status | Meaning | Action |
|---|---|---|
| `harvested` | LLM distillation succeeded; items appended to category files | reclaim (unlink) |
| `harvest-failed` | LLM distillation failed after retries | keep the conversation; record the error |
| `deleted-unharvested` | User passed `--no-harvest`; the conversation is reclaimed without LLM | reclaim (unlink) |
| `too-large` | File > 1MB; kept without harvesting | keep |

**The sha256-of-content dedup:** two conversations with identical content share one ledger entry, so the second is reclaimed without paying the LLM cost again.

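Content-addressed dedup follows from keying the ledger by the sha256 of each file's bytes. This is a minimal sketch. It assumes the `ledger.json` layout shown above, and it assumes only a `harvested` entry lets a duplicate skip the LLM call. `content_key`, `load_ledger`, `already_harvested`, and `record` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def content_key(path: Path) -> str:
    """sha256 of the conversation's bytes: identical content gives an identical key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_ledger(ledger_path: Path) -> dict:
    if ledger_path.is_file():
        return json.loads(ledger_path.read_text(encoding="utf-8"))
    return {"entries": {}}


def already_harvested(ledger: dict, key: str) -> bool:
    """True if this exact content was harvested before, so no LLM call is needed."""
    entry = ledger["entries"].get(key)
    return entry is not None and entry.get("status") == "harvested"


def record(ledger: dict, key: str, path: Path, status: str, items: dict[str, int]) -> None:
    ledger["entries"][key] = {
        "path": str(path),
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        "items": items,
        "deleted": False,  # flipped to True once the file is actually unlinked
    }
```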
---

## 5. The harvest workflow

### 5.1 The 7-category schema (the LLM output)

The LLM's harvest output must be strict JSON (no prose, no markdown fence):

```json
{
  "facts": [{"statement": "...", "detail": "..."}],
  "decisions": [{"statement": "...", "detail": "..."}],
  "tasks_done": [{"statement": "...", "detail": "..."}],
  "tasks_open": [{"statement": "...", "detail": "..."}],
  "questions": [{"statement": "...", "detail": "..."}],
  "playbooks": [{"name": "...", "steps": "..."}],
  "files": [{"path": "...", "note": "..."}]
}
```

**The prompt** (in `~/.manual_slop/knowledge/prompts/harvest-conversation.md`; user-editable, root-first resolution):

```markdown
# Harvest durable knowledge from a manual_slop conversation

You are given one conversation (or a summary of one). Extract only knowledge that
stays useful after this conversation is deleted. Return only JSON in exactly this
form (no prose, no markdown fence):

[the 7-category schema above]

Category rules:
- facts: durable statements about systems, repositories, tools, environments, or
  constraints that were learned, not assumed.
- decisions: choices that were made, with the why in `detail`.
- tasks_done: concrete work completed in this conversation.
- tasks_open: work that was started, planned, or requested but not finished.
- questions: questions raised and never answered.
- playbooks: command sequences or processes that worked and are reusable; `steps`
  is the runnable sequence.
- files: a note tied to one specific file path (use the absolute path seen in
  the conversation).

General rules:
- Empty arrays are valid and expected: most conversations contain nothing durable.
  Do not invent items to fill categories.
- One item per distinct piece of knowledge; keep `statement` to one sentence.
- `detail` is optional context; omit it or use "" when the statement stands alone.
- Do not include conversation mechanics, tool output noise, retries, or one-off
  trivia (timestamps, token counts, transient errors).
```

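The retry loop in §5.2 calls `parse_harvest_json`, but the source does not show its body. This is a minimal sketch of what it could look like, under three assumptions. Missing categories count as empty arrays, since the prompt says empty arrays are expected. Unknown keys are ignored. Rows that are not objects are dropped. The function raises `json.JSONDecodeError` or `ValueError`, the two exceptions the retry loop catches.

```python
from __future__ import annotations

import json

# Expected fields per category, from the schema above.
CATEGORY_FIELDS = {
    "facts": ("statement", "detail"),
    "decisions": ("statement", "detail"),
    "tasks_done": ("statement", "detail"),
    "tasks_open": ("statement", "detail"),
    "questions": ("statement", "detail"),
    "playbooks": ("name", "steps"),
    "files": ("path", "note"),
}


def parse_harvest_json(response: str) -> dict[str, list[dict[str, str]]]:
    """Parse and normalize the harvest reply into all 7 categories."""
    data = json.loads(response)  # raises json.JSONDecodeError on prose or fences
    if not isinstance(data, dict):
        raise ValueError("harvest output must be a JSON object")
    result: dict[str, list[dict[str, str]]] = {}
    for category, fields in CATEGORY_FIELDS.items():
        rows = data.get(category, [])  # a missing category is treated as empty
        if not isinstance(rows, list):
            raise ValueError(f"{category!r} must be an array")
        # Keep object rows only; missing or null fields become "" so callers can .strip().
        result[category] = [
            {field: str(row.get(field) or "") for field in fields}
            for row in rows
            if isinstance(row, dict)
        ]
    return result
```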
### 5.2 The retry budget (the contract)

`HARVEST_MAX_ATTEMPTS = 2`. The retry happens at the parse level, not the API level: only a reply that fails to parse triggers another attempt.

```python
import json

HARVEST_MAX_ATTEMPTS = 2


# read_or_summarize, harvest_prompt_path, build_harvest_prompt and
# parse_harvest_json are module helpers not shown here.
def harvest_conversation(path, provider, model, *, generate, summarize=None):
    content = read_or_summarize(path, provider, model)
    template = harvest_prompt_path().read_text(encoding="utf-8").strip()
    last_error = None
    for attempt in range(HARVEST_MAX_ATTEMPTS):
        # Every attempt after the first carries the retry suffix.
        prompt = build_harvest_prompt(template, path.name, content, retry=attempt > 0)
        # Errors raised by generate() are not caught here; they propagate to the caller.
        response = generate(prompt, provider, model)
        try:
            return parse_harvest_json(response)
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc  # parse failure: try again
    raise RuntimeError(f"harvest output invalid after {HARVEST_MAX_ATTEMPTS} attempts: {last_error}")
```

**The retry suffix:** on a retry, `\nYour previous reply was not valid JSON. Return only the JSON object.\n` is appended to the prompt.

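Here is a self-contained sketch of the parse-level retry and its suffix. `build_harvest_prompt` assumes a simple template / name / content layout, because the source does not show the real one. `fake_generate` is a stand-in provider. It returns valid JSON only after it has seen the retry suffix, so the run takes exactly two attempts. `harvest_text` is a hypothetical name.

```python
from __future__ import annotations

import json

HARVEST_MAX_ATTEMPTS = 2
RETRY_SUFFIX = "\nYour previous reply was not valid JSON. Return only the JSON object.\n"


def build_harvest_prompt(template: str, name: str, content: str, *, retry: bool = False) -> str:
    # Assumed layout: instructions, then the conversation name, then its content.
    prompt = f"{template}\n\nConversation: {name}\n\n{content}\n"
    return prompt + RETRY_SUFFIX if retry else prompt


def fake_generate(prompt: str) -> str:
    # Stand-in provider: chatty on the first try, strict JSON once told off.
    return "{}" if prompt.endswith(RETRY_SUFFIX) else "Sure! Here you go: {}"


def harvest_text(template: str, name: str, content: str, generate) -> tuple[dict, int]:
    """Return (parsed reply, attempts used); raise after HARVEST_MAX_ATTEMPTS parse failures."""
    last_error = None
    for attempt in range(HARVEST_MAX_ATTEMPTS):
        reply = generate(build_harvest_prompt(template, name, content, retry=attempt > 0))
        try:
            return json.loads(reply), attempt + 1
        except json.JSONDecodeError as exc:
            last_error = exc  # parse-level failure: retry with the suffix
    raise RuntimeError(f"harvest output invalid after {HARVEST_MAX_ATTEMPTS} attempts: {last_error}")
```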
### 5.3 The size limits (the budgets)

| Constant | Value | Effect |
|---|---|---|
| `SUMMARIZE_THRESHOLD_BYTES` | 64 KB | Files > 64KB are summarized first |
| `MAX_HARVEST_SOURCE_BYTES` | 1 MB | Files > 1MB are kept, not harvested |
| `DIGEST_MAX_BYTES` | 4 KB | The bounded digest size |
| `HARVEST_MAX_ATTEMPTS` | 2 | Total attempts on parse failure (one initial try + one retry) |

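The two byte thresholds send each conversation down one of three paths. This is a minimal sketch. It assumes KB and MB mean 1024 and 1024 × 1024 bytes, and that a file exactly at a threshold is not over it. `plan_for_size` is a hypothetical name.

```python
SUMMARIZE_THRESHOLD_BYTES = 64 * 1024
MAX_HARVEST_SOURCE_BYTES = 1024 * 1024


def plan_for_size(size_bytes: int) -> str:
    """Decide how one conversation is fed to the harvest, by byte size."""
    if size_bytes > MAX_HARVEST_SOURCE_BYTES:
        return "too-large"  # kept and recorded in the ledger, never harvested
    if size_bytes > SUMMARIZE_THRESHOLD_BYTES:
        return "summarize"  # summarize first, then harvest the summary
    return "harvest"  # send the raw content
```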
### 5.4 The dry-run-by-default safety

The harvest CLI defaults to **dry-run**. Without `--apply`, the CLI classifies, estimates cost, and prints a report. **No mutation.**

```bash
$ python -m src.knowledge_harvest
artifacts: live:42, user-kept:3, prune:0, harvest:17, keep:1
harvest candidates: 2.3MB (~600K input tokens), prune candidates: 0B
dry run; pass --apply to harvest and reclaim

$ python -m src.knowledge_harvest --apply
reclaimed: 2.3MB
harvested items: facts:42, decisions:18, tasks_done:7, tasks_open:3, questions:5, playbooks:2, files:11
digest: /home/user/.manual_slop/knowledge/digest.md
ledger: /home/user/.manual_slop/knowledge/ledger.json
```

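One way to make dry-run the default is to make mutation an opt-in flag. This is a minimal argparse sketch that assumes the CLI has a single `--apply` switch. The `run` function and its return shape are illustrative assumptions. So is the 4-bytes-per-token estimate: it roughly matches the ~600K tokens for 2.3MB in the sample output above, but the real heuristic is not shown.

```python
from __future__ import annotations

import argparse

BYTES_PER_TOKEN = 4  # assumed rough heuristic, not taken from the source


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="knowledge_harvest")
    # store_true defaults to False, so a bare invocation is a dry run.
    parser.add_argument("--apply", action="store_true", help="harvest and reclaim")
    return parser.parse_args(argv)


def run(argv: list[str] | None, harvest_bytes: int) -> tuple[list[str], bool]:
    """Return (report lines, whether anything was mutated)."""
    args = parse_args(argv)
    tokens_k = harvest_bytes // BYTES_PER_TOKEN // 1000
    report = [f"harvest candidates: {harvest_bytes / 1024 / 1024:.1f}MB (~{tokens_k}K input tokens)"]
    if not args.apply:
        report.append("dry run; pass --apply to harvest and reclaim")
        return report, False
    # Only past this point would the real CLI harvest, write files, and reclaim.
    return report, True
```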
---

## 6. The "delete to turn off" pattern

**The principle.** Feature flags should be data, not config. If a feature is gated by the presence of a file, the user can turn it off by deleting the file: no GUI toggle, no env var, no `config.toml` edit. Just `rm`.

**The knowledge digest pattern:** `rm ~/.manual_slop/knowledge/digest.md` → no `{knowledge}` block is injected. Re-enable by running `python -m src.knowledge_harvest --apply` (which regenerates the digest).

**The implementation:**

```python
# In aggregate.py:run (the consumer of the digest); `paths` and
# `stable_prefix` come from the enclosing module and function.
knowledge_digest_path = paths.knowledge_dir() / "digest.md"
if knowledge_digest_path.is_file():
    knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
    stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")
# else: skip; the file is the switch
```

**The pattern recurs in 3 places:**
1. `regenerate_digest` deletes the digest when every section is empty.
2. The `aggregate.py:run` injection check is the load-bearing one.
3. The GUI `Knowledge` panel shows the file state and provides a `[Delete to turn off]` button.

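`regenerate_digest` turns the feature off when there is nothing to show, and the injection check treats the file as the switch. Both can be sketched together. This is a minimal sketch that assumes sections arrive as bullet lists in file order. It omits the 4KB truncation for brevity. `regenerate_digest` and `knowledge_block` here are illustrative, not the real signatures.

```python
from __future__ import annotations

from pathlib import Path


def regenerate_digest(digest_path: Path, sections: dict[str, list[str]]) -> bool:
    """Write the digest, or delete it when every section is empty.

    Returns True if a digest file exists afterwards.
    """
    if not any(sections.values()):
        digest_path.unlink(missing_ok=True)  # nothing to show: the feature turns itself off
        return False
    body = "# Knowledge digest\n"
    for title, bullets in sections.items():
        if bullets:
            body += f"\n## {title}\n" + "".join(f"{b}\n" for b in reversed(bullets))
    digest_path.write_text(body, encoding="utf-8")
    return True


def knowledge_block(digest_path: Path) -> str:
    """The file is the switch: no file, no {knowledge} block."""
    if not digest_path.is_file():
        return ""
    return f"{{knowledge}}\n{digest_path.read_text(encoding='utf-8')}\n{{/knowledge}}\n"
```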
---

## 7. The graceful failure modes

| Failure | Handling |
|---|---|
| LLM returns invalid JSON | Retry (up to 2 attempts); on 2nd failure, mark `harvest-failed` in the ledger; keep the conversation |
| File > 1MB | Mark `too-large` in the ledger; keep the conversation |
| File > 64KB | Summarize via `run_subagent_summarization`; use the summary as the LLM input |
| Provider not available | Mark `harvest-failed`; keep the conversation |
| Network timeout | Same: mark `harvest-failed`; keep the conversation |
| Disk full writing to category files | Raise; mark `harvest-failed`; keep the conversation (don't reclaim) |

**The pattern:** the critical operation (distilling and writing the harvested items) must complete before anything is reclaimed; non-essential post-steps are best-effort. Every failure leaves a visible status in the ledger, and the user can re-run the harvest.

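The table reduces to one rule: reclaim only after the critical step succeeds. This is a minimal sketch. It assumes a `harvest` callable that returns per-category counts on success. On any failure it raises: invalid JSON after retries, an unavailable provider, a timeout, or a full disk. `process_one` and the `error` ledger field are hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path

MAX_HARVEST_SOURCE_BYTES = 1024 * 1024


def process_one(path: Path, entries: dict, key: str,
                harvest: Callable[[Path], dict[str, int]]) -> str:
    """Harvest one conversation; return the ledger status it ends with."""
    entry: dict = {"path": str(path)}
    entries[key] = entry
    if path.stat().st_size > MAX_HARVEST_SOURCE_BYTES:
        entry["status"] = "too-large"  # kept, never harvested
        return entry["status"]
    try:
        entry["items"] = harvest(path)  # critical step: distill and append
    except Exception as exc:  # any failure keeps the conversation
        entry.update(status="harvest-failed", error=str(exc))
        return entry["status"]
    entry.update(status="harvested", deleted=False)
    try:
        path.unlink()  # reclaim only after the harvest succeeded
        entry["deleted"] = True
    except OSError:
        pass  # best-effort post-step; the entry still shows deleted: False
    return entry["status"]
```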
---

## 8. The injection (where the digest is used)

The digest is injected into the *stable* position of the initial context (layer 7 of the 12-layer model; see `cache_friendly_context.md`):

```python
# In aggregate.py:run (the consumer); `paths` is the module's path helper.
def build_initial_context(ctrl, user_message):
    stable_prefix: list[str] = []

    # Layers 1-6: role, schema, tools, system prompt, persona, project context
    # (elided here; each layer appends its text to stable_prefix)

    # Layer 7: knowledge digest (the 4KB bounded projection)
    knowledge_digest_path = paths.knowledge_dir() / "digest.md"
    if knowledge_digest_path.is_file():
        knowledge_digest = knowledge_digest_path.read_text(encoding="utf-8")
        stable_prefix.append(f"{{knowledge}}\n{knowledge_digest}\n{{/knowledge}}\n")

    # Layers 8-12: discussion metadata, active preset, per-file details, prior turns, user message
    volatile_suffix: list[str] = []  # elided here; built from ctrl and user_message

    return "".join(stable_prefix + volatile_suffix)
```

**The position matters.** The digest sits in the *stable* position, before the `Instance:` volatile block. The provider cache can therefore include it in the cached prefix, while the volatile suffix is not cached. See `cache_friendly_context.md` §1.

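`cache_friendly_context.md` describes a byte-comparison test. This is a minimal sketch of the idea. It assumes a pure builder that returns the stable prefix separately from the full context. Two turns with different volatile layers must then produce byte-identical prefixes. `build_context` is hypothetical and much simpler than `build_initial_context`.

```python
from __future__ import annotations


def build_context(stable_layers: list[str], digest: str | None,
                  volatile_layers: list[str]) -> tuple[str, str]:
    """Return (stable_prefix, full_context)."""
    stable = list(stable_layers)  # layers 1-6, copied so the input is not mutated
    if digest is not None:  # layer 7: only when digest.md exists
        stable.append(f"{{knowledge}}\n{digest}\n{{/knowledge}}\n")
    prefix = "".join(stable)
    return prefix, prefix + "".join(volatile_layers)  # layers 8-12 come last


stable = ["role\n", "tools\n", "system prompt\n"]
prefix_1, context_1 = build_context(stable, "- a fact", ["Instance: turn 1\n", "hello\n"])
prefix_2, context_2 = build_context(stable, "- a fact", ["Instance: turn 2\n", "and now?\n"])

# The cacheable prefix must be byte-identical across turns; only the suffix varies.
assert prefix_1.encode("utf-8") == prefix_2.encode("utf-8")
assert context_1.startswith(prefix_1) and context_1 != context_2
```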
---

## 9. The cross-references

- `conductor/code_styleguides/knowledge_artifacts.md` — the canonical styleguide
- `docs/guide_agent_memory_dimensions.md` §4 — the knowledge dim in context
- `docs/guide_caching_strategy.md` §5 — where the digest is injected
- `conductor/code_styleguides/feature_flags.md` — the "delete to turn off" pattern
- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.1, §4 — the nagent pattern that informed this guide

---

- **[guide_discussions.md](guide_discussions.md)** — The Discussion system; MMA worker prompts are built from the active discussion
- **[conductor/tracks/nagent_review_20260608/report.md §9](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the MMA sub-conversation pattern vs nagent's `<nagent-conversation>` tag; **the highest-priority future-track is to extract MMA's `run_worker_lifecycle` into a reusable `SubConversationRunner` for 1:1 discussions** (per user-flagged want)
- **[conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md §3 and §10](../conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md)** — Actionable patterns for the SubConversationRunner; the design constraint that sub-agents return a *concise artifact* (not a full transcript) is baked into the recommendation

## Addition (2026-06-12) — Delegation as context management, not parallelism

The nagent review (v2.3, §3.12) reframed delegation with a new lens: **the reason to spawn a sub-conversation is to keep the parent's context clean. The fact that the child runs concurrently (sometimes) is incidental.** Per nagent's `bin/nagent:730`: *"Hand off when noisy: if this conversation is mostly stale tool output, distill goal/state/decisions into a sub-conversation prompt, delegate the rest, and tell your caller about the handoff. Never rewrite your own conversation file while running."*

The reframing table:

| Long-lived agent abstractions | Disposable workers |
|---|---|
| Identity is central | Output artifact is central |
| Shared context gets noisy | Child context is isolated |
| Parent absorbs all exploration | Parent gets a concise result |
| Delegation implies personality | Delegation is context management |

### How this applies to MMA

MMA already does this implicitly:
- `src/multi_agent_conductor.py:_spawn_worker` runs each MMA worker as a fresh subprocess with `ai_client.reset_session()` (Context Amnesia)
- The worker returns a `Result[TaskOutput, ErrorInfo]` to the parent (the `ConductorEngine`)
- The parent's `disc_entries` doesn't accumulate the worker's intermediate reads/shell calls

### The product implication for 1:1 discussions

The 1:1 discussion path has no sub-agent primitive today. The user types a prompt, the AI responds, the loop continues. If the user wants the AI to "investigate this file" or "look up this API," the answer has to come from the same conversation.

**The product decision (user-flagged want).** Add a `SubConversationRunner` for 1:1 discussions. Reuse MMA's `mma_exec.py` as the subprocess template. The sub-agent returns a concise artifact (the sub-agent's response) + token usage + exit code. The App inserts the result into the active discussion as a "User" role entry. The next LLM call sees it.

### The SubConversationRunner shape (per the v2.3 §10.2 spec)

```python
from __future__ import annotations  # lets ErrorInfo stay a forward reference

from dataclasses import dataclass


@dataclass
class SubConversationResult:
    artifact: str  # the sub-agent's response
    tokens_in: int
    tokens_out: int
    exit_code: int
    errors: list[ErrorInfo]  # from the data_oriented_error_handling convention


class SubConversationRunner:
    async def spawn(self, prompt: str, *, allowed_tools: list[str] | None = None) -> SubConversationResult:
        # Reuses mma_exec.py as the subprocess template
        # Returns the child's <nagent-response> content + token usage
        # (further keyword options from the spec are elided here)
        ...
```

**The design contract.** The sub-agent's return type is `SubConversationResult`, not the full conversation. The parent gets a concise artifact, not a transcript. The sub-conversation folder is auto-archived after 7 days (consistent with `log_pruner.py`).

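The App inserts the sub-agent's result into the active discussion as a "User" role entry. This is a minimal sketch of that step, under three assumptions. `disc_entries` holds dicts with `role` and `content` keys. A failed run is reported as a short marker line rather than an empty artifact. `errors` is simplified to strings in place of `ErrorInfo`. The failure format and `append_result` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubConversationResult:
    artifact: str
    tokens_in: int
    tokens_out: int
    exit_code: int
    errors: list[str] = field(default_factory=list)  # simplified stand-in for list[ErrorInfo]


def append_result(disc_entries: list[dict], result: SubConversationResult) -> None:
    """Add one concise entry to the parent discussion, never the child's transcript."""
    if result.exit_code != 0 or result.errors:
        content = f"[sub-conversation failed: exit {result.exit_code}] " + "; ".join(result.errors)
    else:
        content = result.artifact
    disc_entries.append({"role": "User", "content": content})
```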
## Addition (2026-06-12) — The 4 memory dimensions (the MMA scope)

The MMA tracks operate on `disc_entries` (the Discussion dim) and `manual_slop.toml` (the project config). They do NOT typically touch the Curation dim (only when a ticket's spec calls for it) or the Knowledge dim (only through the track's session report). They MAY touch the RAG dim if the ticket scope includes RAG integration (declared in `metadata.json`).

**The MMA scope, in the 4-dim framework:**

| Dim | MMA scope? | Why |
|---|---|---|
| Curation | per-ticket only | A ticket might add a `FileItem` if the feature touches curation; not a default |
| Discussion | YES (the work) | The MMA worker's prompt is built from the active discussion |
| RAG | per-ticket only | A ticket might use RAG if the feature includes RAG; declared in `metadata.json` |
| Knowledge | per-track only | The track's session synthesis (in `docs/reports/`) is the durable knowledge |

**The implication for MMA workers.** MMA workers are given Context Amnesia (`ai_client.reset_session()` at the start of `run_worker_lifecycle`). The worker sees:
- The ticket's prompt (the scoped work)
- The `manual_slop.toml [agent.context_files]` (the project context)
- The `FileItem` set per the ticket's scope
- *Optionally* a `knowledge/digest.md` excerpt (if the ticket scope includes knowledge injection)

The worker does NOT see:
- The full `disc_entries` history (per the Context Amnesia pattern)
- The full `~/.manual_slop/knowledge/` (only the digest excerpt)
- The RAG index (unless the ticket scope explicitly opts in)

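The see / does-not-see split can be enforced by what the context builder accepts as input. This is a minimal sketch. It assumes a simplified `Ticket` with an explicit knowledge opt-in and an arbitrary 1KB digest-excerpt budget, and it leaves RAG out. `Ticket` and `build_worker_context` are hypothetical. The point is that `disc_entries` and the full knowledge directory are simply not parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ticket:
    prompt: str  # the scoped work
    file_items: list[str] = field(default_factory=list)  # FileItem paths in scope
    inject_knowledge: bool = False  # explicit per-ticket opt-in


def build_worker_context(ticket: Ticket, context_files: dict[str, str],
                         digest: str | None, excerpt_bytes: int = 1024) -> list[str]:
    """Assemble an MMA worker's context; disc_entries is deliberately not an input."""
    parts = [f"## Project context: {name}\n{text}" for name, text in context_files.items()]
    parts += [f"## File: {path}" for path in ticket.file_items]
    if ticket.inject_knowledge and digest:
        # Excerpt only; errors="ignore" drops a multi-byte character split by the cut.
        excerpt = digest.encode("utf-8")[:excerpt_bytes].decode("utf-8", errors="ignore")
        parts.append(f"{{knowledge}}\n{excerpt}\n{{/knowledge}}")
    parts.append(ticket.prompt)
    return parts
```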