docs: agent workflow docs + regular docs (v2.3 surfacing)

Per user request 'use your remaining context to update agent workflow docs and then regular docs based on what was discussed in this report', this commit creates/updates 15 files derived from the v2.3 nagent review (the 12 new nagent additions + the 4 memory dimensions reframing + the cache strategy + the RAG discipline + the knowledge harvest pattern).

Agent workflow docs (4 files):
- AGENTS.md (UPDATE): add @import line to canonical DOD + 'Code Styleguides' section pointing to the 6 new styleguides + new 'Human-Facing Documentation' section pointing to ./docs/AGENTS.md
- conductor/workflow.md (UPDATE): new section 'Additions (2026-06-12) - the 12 patterns from the latest nagent corpus' with TDD protocols for knowledge harvest, cache ordering, compaction, RAG discipline
- conductor/product-guidelines.md (UPDATE): new sections 'Memory Dimensions (added 2026-06-12)' + 'See Also - Updated' with the 6-styleguide catalog
- docs/AGENTS.md (NEW): the agent-facing mirror of docs/Readme.md (per the nagent CLAUDE.md pattern). 10 sections + the per-tier reading path + the 4 memory dimensions + the caching strategy + the knowledge harvest + the RAG discipline + the feature flags

Regular docs (11 files):
- 6 new styleguides (the convention catalog):
  * data_oriented_design.md: the canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 8 core defaults; 7-question simplification pass; 10-question self-check; 4 memory dimensions in Manual Slop context)
  * agent_memory_dimensions.md: the 4 memory dims (curation / discussion / RAG / knowledge) + when to use each + the boundaries
  * rag_integration_discipline.md: the conservative-RAG rule (opt-in, complement, provenance, no mutation, feature-gated, graceful failure)
  * cache_friendly_context.md: stable-to-volatile context ordering + the cache TTL GUI contract + the byte-comparison test
  * knowledge_artifacts.md: the knowledge harvest pattern (category files, provenance, sha256 ledger, digest regeneration, 'delete to turn off')
  * feature_flags.md: file presence vs config flags vs CLI flags
- 3 new project docs (the cross-cutting guides):
  * guide_agent_memory_dimensions.md: the cross-cutting guide on the 4 dims + the decision tree
  * guide_caching_strategy.md: caching across providers + stable-to-volatile ordering + cache TTL GUI + the byte-comparison test + the 5th provider (claude-code)
  * guide_knowledge_curation.md: the knowledge memory guide (4th dim) + the 5 category files + per-file notes + the digest + the ledger + the harvest workflow
- 2 existing doc updates:
  * guide_mma.md: new sections 'Delegation as context management' + 'The 4 memory dimensions (the MMA scope)'
  * guide_ai_client.md: new section 'Cache strategy and the 12-layer model' + the 5th provider (claude-code)

All files use the same style as the v2.3 review (the user's preferred format): 7-column tables, no JSON, SSDL shape tags, forth/array notation, file:line citations, ASCII sketches where useful.

The human Readme files (Readme.md, docs/Readme.md) are NOT modified (per repeated user instruction). The 5th provider (claude-code) is documented in guide_ai_client.md + the data_oriented_design.md references the nagent pattern as the source of the canonical rules. The cross-references are bidirectional: the 6 styleguides reference the 3 project docs; the 3 project docs reference the 6 styleguides; the 2 doc updates reference both; AGENTS.md + ./docs/AGENTS.md provide the entry points.
2026-06-12 13:50:40 -04:00
parent d604a63e1f
commit 35c6cca134
15 changed files with 3460 additions and 1 deletions
@@ -0,0 +1,354 @@
+# Cache-Friendly Context (stable-to-volatile ordering + cache TTL)
+
+**Status:** Styleguide; codifies the cache strategy for `aggregate.py:run` and the GUI exposure of cache TTL.
+**Date:** 2026-06-12
+**Cross-refs:** `conductor/code_styleguides/data_oriented_design.md` §3.2; `conductor/code_styleguides/agent_memory_dimensions.md`; `docs/guide_caching_strategy.md`; `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5.
+
+> **What this is.** The LLM providers that Manual Slop uses (Anthropic, Gemini, OpenAI) all support some form of prompt caching. The cost benefit comes from the *stable prefix* being byte-identical across turns and across discussions. This styleguide defines the stable prefix, the volatile suffix, the byte-comparison contract, and the cache TTL GUI exposure.
+
+---
+
+## 0. The one-glance principle
+
+```
+[STABLE PREFIX (cached across turns)]  [VOLATILE SUFFIX (per-turn)]
+[Role instructions]                     [Discussion metadata]
+[Function-calling schema]               [Active preset (FileItems)]
+[Discovered tool descriptions]          [Per-file details]
+[System prompt preset]                  [Tool-call results from prior turns]
+[Persona profile]                       [The user message]
+[Project context]
+[Knowledge digest]
+[file-knowledge for files in scope]
+```
+
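+The cache boundary falls between layers 7 and 8 of the 12-layer model in §1 (the last stable layer and the first volatile layer); the per-file file-knowledge line in the sketch above travels with the knowledge digest in layer 7. The Anthropic-specific path wraps the prefix in `cache_control: {"type": "ephemeral"}` blocks at the boundary; the Gemini path uses `cachedContent` resources; the OpenAI path uses implicit prefix caching.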
+
+---
+
+## 1. The 12-layer model (the stable-to-volatile ordering)
+
+| # | Layer | Stable across turns? | Source | SSDL |
+|---|---|---|---|---|
+| 1 | Role instructions (model + provider) | yes | `_get_combined_system_prompt` | `[I]` |
+| 2 | Function-calling schema | yes | per provider | `[I]` |
+| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` | `[I]` |
+| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` | `[I]` |
+| 5 | Persona profile | yes | `app_state.active_persona` | `[I]` |
+| 6 | Project context (per `manual_slop.toml`) | yes | NEW (Candidate 14) | `[I]` |
+| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within a gc cycle) | NEW (Candidate 8) | `[I]` |
+| 8 | Discussion metadata (name, role count) | no (per turn) | `disc_entries[:1]` or `disc_meta` | `───` (data) |
+| 9 | Active preset (FileItem set) | no (per turn) | `self.context_files` | `───` (data) |
+| 10 | Per-file details (history, slices, notes) | no (per file) | per `FileItem` | `───` (data) |
+| 11 | Tool-call results from prior turns | no (per turn) | per `_reread_file_items` | `───` (data) |
+| 12 | The user message | no (per turn) | the input | `───` (data) |
+
+**The cache boundary is at layer 7/8.** Layers 1-7 are byte-identical across turns of the same discussion (and across discussions of the same mode). Layers 8-12 change per turn.
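
**A sketch of the ordering rule (hypothetical names).** Assuming each layer has already been rendered to a string, the boundary offset falls out of the concatenation order. `build_context`, `stable_layers` and `volatile_layers` are illustrative only and are not part of `aggregate.py`.

```python
def build_context(stable_layers: list[str], volatile_layers: list[str]) -> tuple[str, int]:
    """Concatenate layers stable-first and return (context, boundary offset)."""
    stable = "\n".join(stable_layers) + "\n"
    volatile = "\n".join(volatile_layers)
    # The boundary is the character offset where the first volatile layer starts.
    return stable + volatile, len(stable)


# Two turns that share the same stable layers but differ in the user message.
stable = ["role instructions", "tool schema", "system prompt", "knowledge digest"]
turn1, n1 = build_context(stable, ["discussion meta", "first prompt"])
turn2, n2 = build_context(stable, ["discussion meta", "second prompt"])
```

With identical stable layers, `n1 == n2` and `turn1[:n1] == turn2[:n2]`, which is the property the byte-comparison test in §2 checks.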
+
+---
+
+## 2. The byte-comparison test (the design contract)
+
+The design rule "stable prefix is byte-identical" must be testable. The test:
+
+```python
+# In tests/test_aggregate_caching.py (NEW)
+def test_aggregate_stable_to_volatile_ordering():
+    """The first N characters of the context should be identical across turns
+    of the same conversation, when no stable-layer inputs change."""
+    ctrl = mock_app_controller()
+    ctrl.ai_settings.system_prompt = "Test system prompt"
+    ctrl.active_persona = mock_persona()
+
+    # Turn 1
+    turn1 = aggregate.build_initial_context(ctrl, user_message="first prompt")
+
+    # Turn 2 (same stable inputs, different user message)
+    turn2 = aggregate.build_initial_context(ctrl, user_message="second prompt")
+
+    # The first N characters should be identical (N = where the volatile layers start)
+    N = aggregate.stable_prefix_length(ctrl)
+    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
+```
+
+**The test is the contract.** If a new layer is added in the middle of the stack, this test fails; the agent must either move the layer to the stable position or update the test (with written justification).
+
+**The implementation.** `aggregate.stable_prefix_length(ctrl)` returns the character offset where layer 8 starts. The simplest implementation keeps one end offset per stable layer as class-level attributes in `aggregate.py`, filled in at runtime when the context is built and extended whenever the layer stack changes:
+
+```python
+class AggregateStack:
+    ROLE_INSTRUCTIONS_END = 0          # placeholder; computed at runtime
+    SCHEMA_END = 0
+    TOOLS_END = 0
+    SYSTEM_PROMPT_END = 0
+    PERSONA_END = 0
+    PROJECT_CONTEXT_END = 0
+    KNOWLEDGE_DIGEST_END = 0
+    INSTANCE_START = 0                 # the cache boundary
+```
+
+**The test failure modes:**
+
+| Failure | Why it fails | Fix |
+|---|---|---|
+| A new stable layer was added in the wrong position | The first N characters differ because the new layer is below the boundary | Move the new layer above the boundary (between layers 7 and 8) |
+| A stable layer was moved to the volatile position | The first N characters differ because the stable layer is now in the volatile part | Move the layer back to the stable position |
+| A volatile input leaked into a stable layer (e.g., a timestamp in the system prompt) | The first N characters differ because the volatile input is in the prefix | Strip the volatile input from the stable layer; pass it as a separate volatile argument |
+| The system prompt has a `now()` call | The first N characters differ across calls | Pass `now()` as a separate argument; don't include in the system prompt |
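
**Diagnosing a failure (a sketch).** When the prefix assertion fails, the useful fact is where the two turns first diverge and whether that offset lies before the boundary. `first_divergence` and `leak_is_in_prefix` are hypothetical helpers, not existing functions in the codebase.

```python
def first_divergence(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a strict prefix of the other (or they are equal).
    return -1 if len(a) == len(b) else min(len(a), len(b))


def leak_is_in_prefix(turn1: str, turn2: str, boundary: int) -> bool:
    """True when the two turns diverge before the cache boundary."""
    idx = first_divergence(turn1, turn2)
    return idx != -1 and idx < boundary


# A timestamp baked into the system prompt changes the prefix between turns.
leaky1 = "system prompt @ 13:50\n" + "hello"
leaky2 = "system prompt @ 13:51\n" + "hello"
leaky_boundary = len("system prompt @ 13:50\n")

# A clean prefix diverges only at or after the boundary.
clean1 = "system prompt\n" + "first"
clean2 = "system prompt\n" + "second"
clean_boundary = len("system prompt\n")
```

Printing the slice around the divergence index usually points straight at the leaked volatile input.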
+
+---
+
+## 3. The provider-specific cache_control (the implementation)
+
+### 3.1 Anthropic (5-minute ephemeral, 4 breakpoints max)
+
+```python
+# In src/ai_client.py:_send_anthropic
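+def _send_anthropic(messages, *, cache_prefix_chars=None):
+    # `messages` is the full context string; `cache_prefix_chars` is a list of
+    # char offsets (the cache boundaries). `anthropic_client` and `model` are
+    # assumed to be module-level names in ai_client.py.
+    if cache_prefix_chars is not None:
+        # Split the context into content blocks; mark each prefix block with cache_control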
+        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
+    else:
+        content_blocks = messages
+
+    response = anthropic_client.messages.create(
+        model=model,
+        max_tokens=8192,
+        messages=[{"role": "user", "content": content_blocks}],
+    )
+    return _result_with_usage(response.content, response.usage, messages)
+```
+
+**The cache_prefix_blocks helper** (mirrors nagent's `bin/helpers/nagent_llm.py:cache_prefix_blocks`):
+
+```python
+def cache_prefix_blocks(message: str, cache_boundaries: list[int]) -> "str | list[dict]":
+    """Split the message into content blocks at the given char offsets.
+    Mark each prefix block with cache_control. Returns the plain string
+    when no valid boundary exists. At most 3 prefix blocks (provider limit
+    is 4 breakpoints per request)."""
+    if not cache_boundaries:
+        return message
+    points = sorted({b for b in cache_boundaries if 0 < b < len(message)})[:3]
+    if not points:
+        return message
+    blocks = []
+    start = 0
+    for point in points:
+        blocks.append({
+            "type": "text",
+            "text": message[start:point],
+            "cache_control": {"type": "ephemeral"},
+        })
+        start = point
+    blocks.append({"type": "text", "text": message[start:]})
+    return blocks
+```
+
+**The Anthropic usage accounting** (per `nagent_llm.py:_result_with_usage`):
+
+```python
+def _result_with_usage(text, usage, input_text=None):
+    input_tokens = _usage_value(usage, "input_tokens", "prompt_tokens", "prompt_token_count")
+    # Anthropic reports cached prompt tokens separately; fold them back
+    # so input_tokens stays "tokens sent" across providers.
+    input_tokens += _usage_value(usage, "cache_read_input_tokens")
+    input_tokens += _usage_value(usage, "cache_creation_input_tokens")
+    output_tokens = _usage_value(usage, "output_tokens", "completion_tokens", ...)
+    # ... etc
+```
+
+**The 4-breakpoint limit.** Anthropic allows at most 4 `cache_control` markers per request. nagent caps at 3 prefix blocks (one breakpoint per prefix). Manual Slop does the same: 3 prefix blocks, 1 volatile suffix.
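
**Guarding the limit (a sketch).** A cheap pre-flight check can count markers before a request is sent. `MAX_BREAKPOINTS`, `count_breakpoints` and `check_breakpoints` are hypothetical names; only the limit of 4 comes from the text above.

```python
MAX_BREAKPOINTS = 4  # provider limit stated above


def count_breakpoints(blocks) -> int:
    """Count content blocks that carry a cache_control marker."""
    if isinstance(blocks, str):
        return 0  # plain-string content has no markers
    return sum(1 for block in blocks if "cache_control" in block)


def check_breakpoints(blocks) -> None:
    """Raise before sending if the request would exceed the limit."""
    used = count_breakpoints(blocks)
    if used > MAX_BREAKPOINTS:
        raise ValueError(f"{used} cache_control markers; limit is {MAX_BREAKPOINTS}")


def _marked(text: str) -> dict:
    return {"type": "text", "text": text, "cache_control": {"type": "ephemeral"}}


# Three prefix blocks plus one unmarked volatile suffix, as described above.
within_limit = [_marked("a"), _marked("b"), _marked("c"), {"type": "text", "text": "d"}]
too_many = [_marked(str(i)) for i in range(5)]
```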
+
+### 3.2 Gemini (1-hour explicit cache, configurable TTL)
+
+```python
+# In src/ai_client.py:_send_gemini
+def _send_gemini(messages, *, cache_ttl_seconds=3600):
+    # `genai_client`, `model`, `stable_prefix_messages` and `volatile_messages`
+    # are assumed to be prepared at module scope or by the caller.
+    if cache_ttl_seconds > 0:
+        # Create a cachedContent resource for the stable prefix
+        cached_content = genai_client.caches.create(
+            model=model,
+            config=genai.types.CreateCachedContentConfig(
+                contents=stable_prefix_messages,    # layers 1-7
+                ttl=f"{cache_ttl_seconds}s",
+            ),
+        )
+        # Reference the cached content in the request
+        response = genai_client.models.generate_content(
+            model=model,
+            contents=volatile_messages,         # layers 8-12
+            config=genai.types.GenerateContentConfig(cached_content=cached_content.name),
+        )
+    else:
+        response = genai_client.models.generate_content(model=model, contents=messages)
+    return _result_with_usage(response.text, response.usage_metadata, messages)
+```
+
+**The default TTL is 1 hour.** Configurable per the GUI (per §5 below).
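
**Turning the GUI value into a TTL string (a sketch).** The helper below assumes the "Allow >1h Gemini caches" checkbox in §5 acts as a clamp: without it, a longer request is reduced to one hour. That behaviour and the name `gemini_ttl` are assumptions; the source does not say how the checkbox is enforced.

```python
from typing import Optional

ONE_HOUR = 3600  # the default TTL stated above, in seconds


def gemini_ttl(seconds: int, *, allow_over_one_hour: bool = False) -> Optional[str]:
    """Return a TTL string such as "3600s", or None when caching is off."""
    if seconds <= 0:
        return None  # mirrors the cache_ttl_seconds > 0 guard in _send_gemini
    if seconds > ONE_HOUR and not allow_over_one_hour:
        seconds = ONE_HOUR  # assumed clamp; not confirmed by the source
    return f"{seconds}s"
```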
+
+### 3.3 OpenAI (5-10 min implicit, provider-managed)
+
+OpenAI's caching is *implicit*: the provider automatically caches the prefix and reuses it across requests with the same prefix. No application-side control.
+
+```python
+# In src/ai_client.py:_send_openai
+def _send_openai(messages, *, model="gpt-5.5"):
+    # No application-side cache_control; the provider handles caching
+    response = openai_client.responses.create(model=model, input=messages)
+    return _result_with_usage(response.output_text, response.usage, messages)
+```
+
+**The TTL is provider-managed** (5-10 min). The GUI just shows "Cached by OpenAI; TTL: provider-managed."
+
+### 3.4 The provider table (the summary)
+
+| Provider | Cache type | Default TTL | Configurable? | GUI exposure? |
+|---|---|---|---|---|
+| Anthropic | ephemeral | 5 min | yes (via prompt cache breakpoints) | yes (per-discussion state) |
+| Google (Gemini) | explicit | 1 h | yes (via `ttl` field) | yes (TTL override) |
+| OpenAI | implicit (auto) | 5-10 min (provider-managed) | no | no (just shows "cached") |
+
+---
+
+## 4. The codepath (the end-to-end flow)
+
+```
+[Q:ai_client.send() is called]
+   │
+   ▼
+[I:aggregate.build_initial_context(ctrl, user_message) -> str]
+   │
+   ├──► [I:layer 1-7: build stable prefix (the cache-friendly part)]
+   │
+   ├──► [I:layer 8-12: build volatile suffix (the per-turn part)]
+   │
+   ├──► [I:concatenate stable + volatile = full context]
+   │
+   ├──► [I:stable_prefix_length(ctrl) -> N]    (the cache boundary)
+   │
+   ▼
+[Q:cache boundary N > 0?]
+   │
+   ├── no ──► [I:pass full context to provider; no caching]
+   │
+   ▼
+[Q:provider is Anthropic?]
+   │
+   ├── yes ──► [I:cache_prefix_blocks(full_context, [N]) -> content_blocks]
+   │            [I:anthropic.messages.create(content=content_blocks)]
+   │
+[Q:provider is Gemini?]
+   │
+   ├── yes ──► [I:create cachedContent resource for stable prefix]
+   │            [I:genai.models.generate_content(cached_content=..., contents=volatile)]
+   │
+[Q:provider is OpenAI?]
+   │
+   ├── yes ──► [I:openai.responses.create(input=full_context)]    (provider handles caching)
+   │
+[I:return LlmResult(text, input_tokens, output_tokens)]
+   │
+   ▼
+[Q:return to caller; tests/test_aggregate_caching.py::test_aggregate_stable_to_volatile_ordering guards the ordering]
+   │
+[T:end]
+```
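
**The dispatch step as a pure function (a sketch).** The branch on provider can be expressed without any network call, which makes it testable on its own. `plan_request` and the dictionary shape it returns are hypothetical; only the three provider behaviours come from the flow above.

```python
def plan_request(provider: str, full_context: str, boundary: int) -> dict:
    """Describe how a request would be split for each provider; no network calls."""
    if boundary <= 0 or boundary >= len(full_context):
        # No usable boundary: send everything uncached.
        return {"provider": provider, "mode": "uncached", "payload": full_context}
    stable, volatile = full_context[:boundary], full_context[boundary:]
    if provider == "anthropic":
        blocks = [
            {"type": "text", "text": stable, "cache_control": {"type": "ephemeral"}},
            {"type": "text", "text": volatile},
        ]
        return {"provider": provider, "mode": "breakpoints", "payload": blocks}
    if provider == "gemini":
        # The stable part would become a cachedContent resource.
        return {"provider": provider, "mode": "cached_content",
                "cached": stable, "payload": volatile}
    if provider == "openai":
        # Whole context is sent; the provider caches the prefix implicitly.
        return {"provider": provider, "mode": "implicit", "payload": full_context}
    raise ValueError(f"unknown provider: {provider}")
```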
+
+---
+
+## 5. The GUI exposure (per-provider cache state)
+
+The "Caching" Operations Hub sub-panel (per the v2.3 §5.3 sketch):
+
+```
++------------------------------------------------------+
+| Caching                                              |
++------------------------------------------------------+
+| Provider summaries                                   |
+| [Anthropic]   in:340 cache:80  hit:23%  ttl:4:32     |
+| [Gemini]      in:120 cache:0   hit:0%   ttl:0:00     |
+| [OpenAI]      in:560 cache:200 hit:35%  ttl:n/a      |
++------------------------------------------------------+
+| Active discussions                                   |
+| Discussion "refactor auth"                           |
+|   cached: yes (Anthropic)                            |
+|   expires: 2026-06-12T15:32 (in 4:32)                |
+|   [Invalidate cache] [Disable caching for this]      |
+| Discussion "fix the parser"                          |
+|   cached: no                                         |
+|   [Enable caching for this]                          |
++------------------------------------------------------+
+| Global settings                                      |
+|   [X] Enable Anthropic ephemeral caching             |
+|   [X] Enable Gemini explicit caching                 |
+|   [ ] Allow >1h Gemini caches (charges may apply)    |
+|   Anthropic default TTL: [5 min v]                   |
+|   Gemini default TTL:    [60 min v]                  |
++------------------------------------------------------+
+```
+
+**The data sources:**
+
+| Widget | Data source | Frequency |
+|---|---|---|
+| `in:N cache:N hit:N%` | `ai_client.get_token_stats()` (already exported) | per turn (or per session) |
+| `ttl:4:32` | `ai_client._send_<provider>` usage metadata + the cache expiry timestamp | per turn |
+| `cached: yes/no` | per-discussion flag (NEW; tracks which discussions have active caches) | per discussion |
+| `[Invalidate cache]` | calls `ai_client._invalidate_cache(discussion_id)` (NEW) | on click |
+
+**The new AI client state:**
+
+```python
+# In src/ai_client.py (NEW)
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Optional
+
+@dataclass
+class DiscussionCacheState:
+    discussion_id: str
+    provider: str
+    cached_at: datetime
+    expires_at: Optional[datetime]  # None for OpenAI implicit
+    hit_count: int = 0
+    tokens_cached: int = 0
+    last_invalidated_at: Optional[datetime] = None
+    caching_enabled: bool = True   # user can disable per-discussion
+
+# In AppController (NEW)
+self.discussion_caches: dict[str, DiscussionCacheState] = {}  # keyed by discussion_id
+```
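
**Deriving the `ttl:4:32` label (a sketch).** The remaining-time widget can be computed from `expires_at` alone. `ttl_label` is a hypothetical helper; clamping an expired cache to `0:00` and showing `n/a` when there is no expiry are assumptions based on the panel sketch above.

```python
from datetime import datetime, timedelta
from typing import Optional


def ttl_label(expires_at: Optional[datetime], now: datetime) -> str:
    """Format remaining cache lifetime as M:SS; "n/a" when provider-managed."""
    if expires_at is None:
        return "n/a"
    remaining = int((expires_at - now).total_seconds())
    if remaining <= 0:
        return "0:00"
    minutes, seconds = divmod(remaining, 60)
    return f"{minutes}:{seconds:02d}"


# Illustrative timestamps only.
now = datetime(2026, 6, 12, 15, 27, 28)
expires = now + timedelta(minutes=4, seconds=32)
label = ttl_label(expires, now)
```

Passing `now` in explicitly keeps the function deterministic, which is the same discipline §2 asks of the stable prefix.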
+
+**The Hook API additions:**
+
+```
+GET  /api/cache                        # list all discussion cache states
+GET  /api/cache/<discussion_id>        # get one
+POST /api/cache/<discussion_id>/invalidate
+POST /api/cache/<discussion_id>/disable
+POST /api/cache/<discussion_id>/enable
+```
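
**Routing the endpoints (a sketch).** The five routes can be dispatched by a small pure function over an in-memory mapping. `route_cache_request`, the status codes, and the use of plain dicts with `cached` and `caching_enabled` keys in place of `DiscussionCacheState` are all assumptions made for illustration; the real Hook API wiring is not shown in the source.

```python
def route_cache_request(method: str, path: str, caches: dict) -> tuple[int, object]:
    """Dispatch a /api/cache request against a dict of per-discussion states."""
    parts = [p for p in path.split("/") if p]
    if parts[:2] != ["api", "cache"]:
        return 404, "not found"
    rest = parts[2:]
    if method == "GET" and not rest:
        return 200, sorted(caches)  # list all discussion ids
    if not rest or rest[0] not in caches:
        return 404, "unknown discussion"
    state = caches[rest[0]]
    if method == "GET" and len(rest) == 1:
        return 200, state
    if method == "POST" and len(rest) == 2:
        action = rest[1]
        if action == "invalidate":
            state["cached"] = False
            return 200, state
        if action in ("enable", "disable"):
            state["caching_enabled"] = action == "enable"
            return 200, state
    return 405, "unsupported"
```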
+
+---
+
+## 6. The interaction with the 4 memory dimensions (where the cache hits)
+
+| Dim | Where injected | Stable? | Cache impact |
+|---|---|---|---|
+| Curation | layer 9 (active preset) | no (per turn) | NOT cached; the user might switch presets |
+| Discussion | layer 8 (metadata) + layer 11 (prior turns) | no (per turn) | NOT cached (layer 8 is the first layer after the cache boundary) |
+| RAG | the `{rag-context}` block, appended to layer 8-12 | no (per query) | NOT cached; RAG is volatile per query |
+| Knowledge | layer 7 (digest) + per-file (file-knowledge) | yes (within a gc cycle) | CACHED; the digest is the stable prefix |
+
+**The cache only hits on the stable prefix (layers 1-7).** The volatile suffix (layers 8-12) is *not* cached; the user expects the conversation to change per turn.
+
+**The interaction with knowledge harvest:** when `nagent-gc` (or the Manual Slop equivalent) regenerates the digest, the cache is invalidated for the next turn. The user can also force invalidation manually with the `[Invalidate cache]` button.
+
+**The interaction with file edit:** when the user edits a file in the Structural File Editor, the file-knowledge for that file is updated. The cache is invalidated for the next turn that references the file. The per-file knowledge change is a cache invalidator.
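
**Detecting an invalidator (a sketch).** One way to notice that the digest or a file-knowledge entry has changed is to fingerprint the knowledge inputs and compare against the value stored when the cache was created. `stable_fingerprint` and `needs_invalidation` are hypothetical; the source does not specify how invalidation is triggered.

```python
import hashlib


def stable_fingerprint(digest_text: str, file_knowledge: dict[str, str]) -> str:
    """Hash the knowledge inputs that feed the stable prefix."""
    h = hashlib.sha256()
    h.update(digest_text.encode("utf-8"))
    # Sort paths so the fingerprint does not depend on dict insertion order.
    for path in sorted(file_knowledge):
        h.update(b"\0" + path.encode("utf-8") + b"\0" + file_knowledge[path].encode("utf-8"))
    return h.hexdigest()


def needs_invalidation(previous: str, digest_text: str, file_knowledge: dict[str, str]) -> bool:
    """True when the current inputs no longer match the stored fingerprint."""
    return previous != stable_fingerprint(digest_text, file_knowledge)


# Illustrative inputs only.
before = stable_fingerprint("digest v1", {"a.py": "notes A", "b.py": "notes B"})
```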
+
+---
+
+## 7. The cross-references
+
+- `conductor/code_styleguides/data_oriented_design.md` §3.2, §3.3, §3.4 — the data-oriented foundation
+- `conductor/code_styleguides/agent_memory_dimensions.md` — the 4 dims (where the cache hits)
+- `conductor/code_styleguides/knowledge_artifacts.md` — the knowledge digest (the layer 7 cached content)
+- `docs/guide_caching_strategy.md` — the user-facing deep-dive
+- `src/aggregate.py:run` — the consumer of this styleguide
+- `src/ai_client.py:_send_<provider>` — the producer
+- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern that informed this styleguide