docs: agent workflow docs + regular docs (v2.3 surfacing)

Per user request 'use your remaining context to update agent workflow docs and then regular docs based on what was discussed in this report', this commit creates/updates 15 files derived from the v2.3 nagent review (the 12 new nagent additions + the 4 memory dimensions reframing + the cache strategy + the RAG discipline + the knowledge harvest pattern).

Agent workflow docs (4 files):
- AGENTS.md (UPDATE): add @import line to canonical DOD + 'Code Styleguides' section pointing to the 6 new styleguides + new 'Human-Facing Documentation' section pointing to ./docs/AGENTS.md
- conductor/workflow.md (UPDATE): new section 'Additions (2026-06-12) - the 12 patterns from the latest nagent corpus' with TDD protocols for knowledge harvest, cache ordering, compaction, RAG discipline
- conductor/product-guidelines.md (UPDATE): new sections 'Memory Dimensions (added 2026-06-12)' + 'See Also - Updated' with the 6-styleguide catalog
- docs/AGENTS.md (NEW): the agent-facing mirror of docs/Readme.md (per the nagent CLAUDE.md pattern). 10 sections + the per-tier reading path + the 4 memory dimensions + the caching strategy + the knowledge harvest + the RAG discipline + the feature flags

Regular docs (11 files):
- 6 new styleguides (the convention catalog):
  * data_oriented_design.md: the canonical DOD reference (Tier 0/1/2; 3 defaults to reject; 8 core defaults; 7-question simplification pass; 10-question self-check; 4 memory dimensions in Manual Slop context)
  * agent_memory_dimensions.md: the 4 memory dims (curation / discussion / RAG / knowledge) + when to use each + the boundaries
  * rag_integration_discipline.md: the conservative-RAG rule (opt-in, complement, provenance, no mutation, feature-gated, graceful failure)
  * cache_friendly_context.md: stable-to-volatile context ordering + the cache TTL GUI contract + the byte-comparison test
  * knowledge_artifacts.md: the knowledge harvest pattern (category files, provenance, sha256 ledger, digest regeneration, 'delete to turn off')
  * feature_flags.md: file presence vs config flags vs CLI flags
- 3 new project docs (the cross-cutting guides):
  * guide_agent_memory_dimensions.md: the cross-cutting guide on the 4 dims + the decision tree
  * guide_caching_strategy.md: caching across providers + stable-to-volatile ordering + cache TTL GUI + the byte-comparison test + the 5th provider (claude-code)
  * guide_knowledge_curation.md: the knowledge memory guide (4th dim) + the 5 category files + per-file notes + the digest + the ledger + the harvest workflow
- 2 existing doc updates:
  * guide_mma.md: new sections 'Delegation as context management' + 'The 4 memory dimensions (the MMA scope)'
  * guide_ai_client.md: new section 'Cache strategy and the 12-layer model' + the 5th provider (claude-code)

All files use the same style as the v2.3 review (the user's preferred format): 7-column tables, no JSON, SSDL shape tags, forth/array notation, file:line citations, ASCII sketches where useful.

The human Readme files (Readme.md, docs/Readme.md) are NOT modified (per repeated user instruction). The 5th provider (claude-code) is documented in guide_ai_client.md + the data_oriented_design.md references the nagent pattern as the source of the canonical rules. The cross-references are bidirectional: the 6 styleguides reference the 3 project docs; the 3 project docs reference the 6 styleguides; the 2 doc updates reference both; AGENTS.md + ./docs/AGENTS.md provide the entry points.
2026-06-12 13:50:40 -04:00
parent d604a63e1f
commit 35c6cca134
15 changed files with 3460 additions and 1 deletions
@@ -703,3 +703,143 @@ Audit: `scripts/audit_providers_source_of_truth.py` fails if `PROVIDERS` is decl
 - `tests/test_vendor_capabilities.py` (3 tests): registry lookup, vendor-default fallback, unknown-vendor raises
 - `tests/test_openai_compatible.py` (6 tests): non-streaming, streaming aggregation, tool call detection, vision, error classification, frozen dataclass
 - **[conductor/tracks/nagent_review_20260608/report.md §15 Pitfalls #2 and #4](../conductor/tracks/nagent_review_20260608/report.md)** — Deep-dive on the per-provider history globals and the stateful singleton pattern; future-track candidate for stateless LLMClient
+## Addition (2026-06-12) — Cache strategy and the 12-layer model
+
+The nagent review (v2.3, §3.2 + §5) formalizes the cache strategy that this client implements. The strategy: **stable-to-volatile context ordering**, where layers 1-7 of the initial context are byte-identical across turns and across discussions of the same mode (and therefore cacheable), and layers 8-12 are per-turn (and therefore not cached).
+
+### The 12-layer model (the recap)
+
+| # | Layer | Stable? | Where |
+|---|---|---|---|
+| 1 | Role instructions | yes | `_get_combined_system_prompt` |
+| 2 | Function-calling schema | yes | per provider |
+| 3 | Discovered tool descriptions | yes | `mcp_client.get_tool_schemas()` |
+| 4 | System prompt preset | yes | `app_state.ai_settings.system_prompt` |
+| 5 | Persona profile | yes | `app_state.active_persona` |
+| 6 | Project context (per `manual_slop.toml`) | yes | NEW (Candidate 14) |
+| 7 | Knowledge digest (per `knowledge/digest.md`) | yes (within gc cycle) | NEW (Candidate 8) |
+| 8 | Discussion metadata | no | `disc_entries[:1]` or `disc_meta` |
+| 9 | Active preset (FileItem set) | no | `self.context_files` |
+| 10 | Per-file details | no | per `FileItem` |
+| 11 | Prior tool results | no | per `_reread_file_items` |
+| 12 | User message | no | the input |
+
+### The byte-comparison test (the design contract)
+
+The test in `tests/test_aggregate_caching.py` ensures the first N characters of the context are byte-identical across turns:
+
+```python
+def test_aggregate_stable_to_volatile_ordering():
+    ctrl = mock_app_controller()
+    turn1 = aggregate.build_initial_context(ctrl, user_message="first")
+    turn2 = aggregate.build_initial_context(ctrl, user_message="second")
+    N = aggregate.stable_prefix_length(ctrl)
+    assert turn1[:N] == turn2[:N], f"Stable prefix mismatch: {turn1[:N]!r} != {turn2[:N]!r}"
+```
+
+**The test is the contract.** If a new layer is added in the wrong position, the test fails; the agent must move the layer to the stable position or update the test with written justification.
+
+### The provider-specific cache strategies
+
+#### Anthropic (5-min ephemeral, 4 breakpoints max)
+
+```python
+def _send_anthropic(messages, *, cache_prefix_chars=None):
+    if cache_prefix_chars is not None:
+        content_blocks = cache_prefix_blocks(messages, cache_prefix_chars)
+    else:
+        content_blocks = messages
+
+    response = anthropic_client.messages.create(
+        model=model,
+        max_tokens=8192,
+        messages=[{"role": "user", "content": content_blocks}],
+    )
+    return _result_with_usage(response.content, response.usage, messages)
+```
+
+**The `cache_prefix_blocks` helper** splits the message at the given char offsets and marks each prefix with `cache_control: {"type": "ephemeral"}`. Max 3 prefix blocks (provider limit is 4 breakpoints per request).
+
+**The Anthropic usage accounting** (in `_result_with_usage`): `cache_read_input_tokens` + `cache_creation_input_tokens` are added to `input_tokens` so the accounting stays "tokens sent" across providers. Caching is *invisible* in the user-facing number.
+
+#### Gemini (1-h explicit, configurable TTL)
+
+```python
+def _send_gemini(messages, *, cache_ttl_seconds=3600):
+    # stable_prefix_messages / volatile_messages: layers 1-7 and 8-12 of `messages`
+    if cache_ttl_seconds > 0:
+        # caches.create takes the contents and TTL inside a config object
+        cached_content = genai_client.caches.create(
+            model=model,
+            config=genai.types.CreateCachedContentConfig(
+                contents=stable_prefix_messages, ttl=f"{cache_ttl_seconds}s",
+            ),
+        )
+        response = genai_client.models.generate_content(
+            model=model, contents=volatile_messages,
+            config=genai.types.GenerateContentConfig(cached_content=cached_content.name),
+        )
+    else:
+        response = genai_client.models.generate_content(model=model, contents=messages)
+    return _result_with_usage(response.text, response.usage_metadata, messages)
+
+**The default TTL is 1 hour**; configurable per-discussion via the GUI.
+
+#### OpenAI (5-10 min implicit, provider-managed)
+
+No application-side control; the provider handles caching. The GUI just shows "Cached by OpenAI; TTL: provider-managed."
+
+### The GUI exposure (the "Caching" Operations Hub sub-panel)
+
+| Provider | Default TTL | Configurable? |
+|---|---|---|
+| Anthropic ephemeral | 5 min | yes (per-discussion state) |
+| Gemini explicit | 1 h | yes (TTL override) |
+| OpenAI implicit | 5-10 min (provider-managed) | no |
+| claude-code (Claude Agent SDK) | varies (provider-managed) | no |
+
+**The new AI client state:**
+
+```python
+from dataclasses import dataclass
+from datetime import datetime
+from typing import Optional
+
+@dataclass
+class DiscussionCacheState:
+    discussion_id: str
+    provider: str
+    cached_at: datetime
+    expires_at: Optional[datetime]  # None for OpenAI implicit
+    hit_count: int = 0
+    tokens_cached: int = 0
+    last_invalidated_at: Optional[datetime] = None
+    caching_enabled: bool = True
+```
+
+**The Hook API additions:**
+
+```
+GET  /api/cache                        # list all discussion cache states
+GET  /api/cache/<discussion_id>        # get one
+POST /api/cache/<discussion_id>/invalidate
+POST /api/cache/<discussion_id>/disable
+POST /api/cache/<discussion_id>/enable
+```
+
+### The 5th provider (claude-code)
+
+`claude-code` uses the Claude Agent SDK with local Claude Code authentication (no API key). The caching behavior is provider-managed.
+
+```python
+def _send_claude_code(message, model, *, allowed_tools=None, max_turns=1):
+    options = ClaudeAgentOptions(
+        model=None if not model or model == "default" else model,
+        max_turns=max_turns,
+        tools=list(allowed_tools) if allowed_tools else [],
+        allowed_tools=list(allowed_tools) if allowed_tools else [],
+        cwd=os.getcwd(),
+    )
+    # ... claude_agent_sdk.query(prompt=message, options=options)
+    return _result_with_usage(text, usage, message)
+```
+
+### The cross-references
+
+- `docs/guide_caching_strategy.md` — the user-facing deep-dive
+- `conductor/code_styleguides/cache_friendly_context.md` — the canonical styleguide
+- `docs/guide_agent_memory_dimensions.md` — the 4 dims (where the cache hits)
+- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` §3.2, §5 — the nagent pattern
+
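A minimal sketch of the two Anthropic helpers that the hunk above describes only in prose: `cache_prefix_blocks` and the usage fold inside `_result_with_usage`. The signatures are assumptions. This version takes a single string rather than the `messages` argument shown in the diff, and `total_input_tokens` and `MAX_CACHE_PREFIX_BLOCKS` are hypothetical names, so the repository's real helpers may differ.

```python
from typing import Any

# Cap stated in the diff: at most 3 marked prefixes (the provider allows 4 breakpoints).
MAX_CACHE_PREFIX_BLOCKS = 3


def cache_prefix_blocks(text: str, cache_prefix_chars: list[int]) -> list[dict[str, Any]]:
    """Split `text` at the given char offsets and mark each prefix segment as cacheable."""
    # Keep only offsets strictly inside the text, de-duplicated, sorted, and capped.
    offsets = sorted({o for o in cache_prefix_chars if 0 < o < len(text)})
    offsets = offsets[:MAX_CACHE_PREFIX_BLOCKS]

    blocks: list[dict[str, Any]] = []
    start = 0
    for end in offsets:
        blocks.append({
            "type": "text",
            "text": text[start:end],
            "cache_control": {"type": "ephemeral"},
        })
        start = end
    # Whatever follows the last offset is the volatile tail: no cache marker.
    blocks.append({"type": "text", "text": text[start:]})
    return blocks


def total_input_tokens(usage: dict[str, int]) -> int:
    """Fold cache reads and cache writes into a single 'tokens sent' figure."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )
```

Calling `cache_prefix_blocks(context, [N])` with `N` taken from `aggregate.stable_prefix_length(ctrl)` would place the single breakpoint at the boundary between layers 7 and 8. Concatenating the `text` fields of the returned blocks always reproduces the input, so the split cannot change what the model sees.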