conductor(track): nagent_review_v3.1 thicken §5 Provider expansion cluster

2026-06-20 11:22:49 -04:00
parent 1bc8e924c0
commit 987f4a9731
1 changed files with 207 additions and 39 deletions
@@ -909,58 +909,226 @@ The shape tag map: `[S]` for string concatenations and path resolutions (the mod

 **Source:** nagent `bdfa2a6`, `5075f6e`, `2edc7ee` (`bin/helpers/nagent_llm.py:13-19` + `:27-31` + `:37-42` + `:54-77` + `:123-130` + `:198-279` + `:315-336` + `:381-400` + `:582-625` + `:739-770` + `:357-391`, `bin/nagent:1075-1081`, `config.example.json:7`, `README.md:82-90` + `:956-967` + `:991-995`, `tests/test_nagent.py:1010-1042` + `:2734-2797`, `context/data-oriented-design.md`).
 **One-liner:** Together is added as a sixth provider (OpenAI-wire-compatible, always streamed). Per-model context windows become a verified table; rebuild now fires on whichever trips first — byte ceiling or 0.85 of the model's window. The claude-code provider blanks inherited `ANTHROPIC_API_KEY` so its billing stays on its own login; the spinner names the provider/model.
-**Pattern(s) vs v2.3:** UPDATE. v2.3 had 5 providers (openai, anthropic, google, cursor, claude-code); v3 has 6 (adds together). The v2.3 review noted v2.3 had 5 providers per the project's tech-stack.md — Manual Slop has 8 (per the qwen_llama_grok track); the count is independent of the abstraction. The token-cap awareness is NEW (v2.3 had byte-only rebuild triggers). v2.3 §5 ("the loop") is extended with a per-model token cap as a second rebuild trigger.
-**Manual Slop implications:** Manual Slop's `src/ai_client.py` already has per-provider history locks (per `docs/guide_ai_client.md`) but does not have a per-model context-window table; the rebuild/compaction is currently driven by heuristic token estimates. The pattern "verify the window, don't guess; only assert what you've tested" maps to Manual Slop's `provider_state` architecture (per `docs/guide_ai_client.md`). The claude-code billing quirk (`env={"ANTHROPIC_API_KEY": ""}`) is a specific gotcha worth documenting — Manual Slop's claude-code integration (per tech-stack.md) may benefit from the same discipline.
-**Decision candidate:** NEW Candidate 21 (MEDIUM). "Per-model token-cap awareness for Manual Slop `ai_client`": add `MODEL_CONTEXT_WINDOWS` table; rebuild fires on byte ceiling OR 0.85 of window; "don't guess" — omit rather than estimate. See `decisions.md` Candidate 21.
-**Cross-refs:** §2 Conversation safety net (rebuild trigger gets a second condition); §3 Hooks (per-turn status can include `current model / window / usage`).
+**Pattern summary:** The provider-expansion abstraction is a four-piece composition: register, window, trigger, bill. Register: a provider is one tuple in `PROVIDERS` + one entry in `DEFAULT_MODELS` + one tuple in `CREDENTIAL_ENV` + one entry in `PACKAGE_HINTS`. Window: `MODEL_CONTEXT_WINDOWS` is a verified table, not an estimate ("omit rather than guessed"). Trigger: rebuild fires on whichever trips first, the byte ceiling OR 0.85 of the model's window. Bill: the claude-code billing quirk (`env={"ANTHROPIC_API_KEY": ""}`) is the discipline "API-key billing stays the anthropic provider's job". The token-cap awareness is the load-bearing change: a byte-only rebuild trigger is a proxy for token utilization, and the proxy fails on small-window models. The per-model window table is the data-grounded alternative. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit". v2.3 had 5 providers (openai, anthropic, google, cursor, claude-code); v3 has 6 (adds together). The token-cap awareness is NEW (v2.3 had byte-only rebuild triggers).
+
+#### §5.1 What Provider Expansion Adds
+
+The provider-expansion cluster makes adding a new LLM provider a one-line change in 5 places, makes the context-window table a verified data structure (not an estimate), and makes the rebuild trigger aware of both bytes and tokens. The three changes together decouple the provider catalog from the code: a new provider is data, not code.
+
+The four pieces of the provider-expansion abstraction:
+
+1. **Register**: a provider is one tuple in `PROVIDERS` + one entry in `DEFAULT_MODELS` + one tuple in `CREDENTIAL_ENV` + one entry in `PACKAGE_HINTS` + (when verified) one entry in `MODEL_CONTEXT_WINDOWS`. These five entries (the "5-tuple") are enough to surface a provider in `--list-providers` and route a `generate_text_with_usage` call. The 5-tuple is a `[M]` mutable aggregate: the provider catalog is data, the code is a function of the catalog.
+2. **Window**: `MODEL_CONTEXT_WINDOWS` is a verified table, not an estimate. "Omit rather than guessed" (per `bin/helpers/nagent_llm.py:60-62`) is the discipline: the table lists exactly the models whose windows were verified by API error or by direct lookup, and the function `model_context_window` returns `None` for unknowns. The caller falls back to byte-only behavior when the window is unknown.
+3. **Trigger**: rebuild fires on whichever trips first, the byte ceiling OR 0.85 of the model's window. The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit". The trigger is a pure function of (conversation_chars, model, settings), so it is inspectable and the caller can reason about it.
+4. **Bill**: the claude-code billing quirk (`env={"ANTHROPIC_API_KEY": ""}`) is the discipline "API-key billing stays the anthropic provider's job". The provider that owns the billing owns the env; the subprocess env overrides the inherited env. The discipline is data: the env is the contract between the provider and the billing system.
+
+#### §5.2 The Register Tuple
+
+A provider is registered by adding entries to 5 data structures. The 5-tuple is:
+
+```
+PROVIDERS["together"] = (name="together", base_url=TOGETHER_BASE_URL, sdk="openai")
+DEFAULT_MODELS["together"] = "meta-llama/Llama-3-70b-chat-hf"
+CREDENTIAL_ENV["together"] = ("TOGETHER_API_KEY",)
+PACKAGE_HINTS["together"] = "openai>=1.0"
+MODEL_CONTEXT_WINDOWS["meta-llama/Llama-3-70b-chat-hf"] = 8192  # if verified
+```
+
+The 5-tuple is enough to surface the provider in `--list-providers` (the `list_providers()` function reads `PROVIDERS`), to route a `generate_text_with_usage` call (the dispatch reads `PROVIDERS` + `DEFAULT_MODELS` + `CREDENTIAL_ENV`), and to validate the context window (the `model_context_window()` function reads `MODEL_CONTEXT_WINDOWS`).
+
+The 5-tuple is a `[M]` mutable aggregate: the provider catalog is data, the code is a function of the catalog. Adding a new provider is 5 lines of data, not a new code path. Removing a provider is deleting the 5 lines. The discipline is "data, not code branching on state" — the provider is the data, the code is a function of the data.
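A minimal sketch of "the code is a function of the catalog", assuming the five structures are plain module-level tuples and dicts. The model names, the `ProviderView` type, and the `describe_provider` helper are hypothetical and exist only for illustration; the real shapes in `nagent_llm.py` may differ.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical stand-ins for the five structures; shapes and values are
# assumptions for illustration, not copied from nagent_llm.py.
PROVIDERS: Tuple[str, ...] = ("openai", "together")
DEFAULT_MODELS = {"openai": "example-openai-model", "together": "example-together-model"}
CREDENTIAL_ENV = {"openai": ("OPENAI_API_KEY",), "together": ("TOGETHER_API_KEY",)}
PACKAGE_HINTS = {"openai": "openai", "together": "openai"}
MODEL_CONTEXT_WINDOWS = {"example-together-model": 8192}


class ProviderView(NamedTuple):
    """Read-only view assembled from the catalog tables (hypothetical type)."""
    name: str
    default_model: str
    credential_env: Tuple[str, ...]
    package_hint: str
    context_window: Optional[int]


def describe_provider(name: str) -> ProviderView:
    """Assemble everything known about a provider purely from data."""
    if name not in PROVIDERS:
        raise KeyError(f"unknown provider: {name}")
    model = DEFAULT_MODELS[name]
    return ProviderView(
        name=name,
        default_model=model,
        credential_env=CREDENTIAL_ENV[name],
        package_hint=PACKAGE_HINTS[name],
        # An unverified model has no window entry, so this stays None.
        context_window=MODEL_CONTEXT_WINDOWS.get(model),
    )
```

In this sketch, removing a provider means deleting its rows; `describe_provider` itself never branches on a provider name.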
+
+#### §5.3 The Verified Window Table
+
+`MODEL_CONTEXT_WINDOWS` is a verified table, not an estimate. The discipline is "omit rather than guessed" — the table lists exactly the models whose windows were verified by API error or by direct lookup, and the function `model_context_window` returns `None` for unknowns. The implementation is at `bin/helpers/nagent_llm.py:54-77`:
+
+```
+MODEL_CONTEXT_WINDOWS := {
+  # Together (verified 2026-06-17)
+  "meta-llama/Llama-3-70b-chat-hf": 8192,
+  "meta-llama/Llama-3.1-70b-chat": 131072,
+  ...
+  # DeepSeek (verified 2026-06-17)
+  "deepseek-chat": 64000,
+  "deepseek-reasoner": 64000,
+  ...
+  # Qwen (verified 2026-06-17)
+  "qwen-plus": 983616,  # enforced input cap, not advertised 1M
+  ...
+}
+
+model_context_window(model) -> int | None {
+  return MODEL_CONTEXT_WINDOWS.get(model, None)
+}
+```
+
+The `bdfa2a6` commit message is explicit about the verification process: "DeepSeek-V4-Pro confirmed by a context_length_exceeded error ('maximum context length is 512000 tokens'). Qwen3.7-Plus/Max advertise context_length=1000000, but an oversized request is rejected with 'Range of input length should be [1, 983616]' — so the enforced input cap is 983616, with ~16384 of the 1M reserved for output." The distinction between "advertised total context_length" and "enforced input cap" is load-bearing — the table records the enforced cap, not the advertisement. This is the same data discipline as the project's `conductor/code_styleguides/cache_friendly_context.md`: stable data (verified numbers) vs volatile data (advertised numbers).
+
+The "unknown returns None" behavior is the discipline: a missing entry is not a default to a guess; it's a signal to fall back to the byte-only behavior, which is correct for large-window models and merely late for small-window models (the failure is visible, not silent). The data-oriented principle: stable data goes in the table; volatile data is the model's responsibility.
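The lookup-and-fallback behavior can be sketched in Python as follows. The single table entry reuses the `qwen-plus` figure from the sketch above; `trigger_mode` is a hypothetical helper added only to show how a caller might branch on `None`.

```python
from typing import Optional

# Only verified entries belong here; this one reuses the enforced input cap
# quoted in the text. Everything else is deliberately absent.
MODEL_CONTEXT_WINDOWS = {
    "qwen-plus": 983616,
}

ADVERTISED_QWEN_CONTEXT = 1_000_000  # advertised total, per the commit message


def model_context_window(model: str) -> Optional[int]:
    """Return the verified window, or None when the model was never verified."""
    return MODEL_CONTEXT_WINDOWS.get(model)


def trigger_mode(model: str) -> str:
    """Hypothetical helper: report which rebuild triggers apply to a model."""
    window = model_context_window(model)
    if window is None:
        return "byte-only"  # unknown window: no guess, fall back
    return "byte-or-window"


# The gap between advertised and enforced matches the ~16384 output reserve.
reserved_for_output = ADVERTISED_QWEN_CONTEXT - MODEL_CONTEXT_WINDOWS["qwen-plus"]
```

Returning `None` rather than a default keeps the "unknown" case distinguishable from any real window size.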
+
+#### §5.4 The Rebuild Trigger with Token Cap
+
+The rebuild trigger fires on whichever trips first: the byte ceiling OR 0.85 of the model's window. The implementation is at `bin/nagent:rebuild_due` (the v3 cluster does not cite specific line ranges, but the trigger is part of the conversation safety net wiring):
+
+```
+rebuild_due { conversation_chars, model, settings } :: fire?  {ssdl} [I]
+  byte_trip   := conversation_chars > settings.rebuild_at_kb * 1024
+  window      := model_context_window(model)
+  est_tokens  := conversation_chars / 4
+  window_trip := window is not nil
+                 and est_tokens > window * CONTEXT_WINDOW_SAFETY_FRACTION
+  return byte_trip or window_trip
+```
+
+The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit". The token count is estimated from byte count (not from the model's actual token output) because the rebuild trigger is a pre-call check, not a post-call measure. The estimate is `conversation_chars / 4` (the common rule of thumb: 1 token ≈ 4 characters in English). The estimate is good enough for the trigger; the precise token count is the model's responsibility.
+
+The two-trigger design (byte OR window) is the discipline: a single trigger is a proxy, and proxies fail on edge cases. A byte-only trigger fires too late for small-window models: a 192KB conversation is roughly 49K tokens at 4 characters per token, which is fine for a 1M-token model but far beyond an 8K-token window. A token-only trigger has the opposite gaps: it cannot fire at all for a model missing from the table, and for a 1M-token model it would let the conversation grow to roughly 3.4MB (0.85 × 1,000,000 tokens × 4 characters) before rebuilding. The OR-trigger covers both: the rebuild fires when EITHER the bytes exceed the ceiling OR the estimated tokens exceed the safety fraction of the window.
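A runnable sketch of the OR-trigger, assuming the 4-characters-per-token estimate described above. The function signature is simplified (it takes the window and the byte ceiling directly instead of `model` and `settings`), and the `rebuild_at_kb=384` value in the usage lines is illustrative only.

```python
from typing import Optional

CONTEXT_WINDOW_SAFETY_FRACTION = 0.85
CHARS_PER_TOKEN = 4  # rule-of-thumb estimate; not a real tokenizer


def estimate_tokens(conversation_chars: int) -> int:
    """Pre-call token estimate derived from character count."""
    return conversation_chars // CHARS_PER_TOKEN


def rebuild_due(conversation_chars: int, window: Optional[int], rebuild_at_kb: int) -> bool:
    """Fire when either the byte ceiling or the window fraction is exceeded."""
    byte_trip = conversation_chars > rebuild_at_kb * 1024
    window_trip = (
        window is not None
        and estimate_tokens(conversation_chars) > window * CONTEXT_WINDOW_SAFETY_FRACTION
    )
    return byte_trip or window_trip


# 8192-token window: the token threshold is 0.85 * 8192 = 6963.2 tokens.
small_window_fires = rebuild_due(28_000, window=8192, rebuild_at_kb=384)    # 7000 tokens
small_window_holds = rebuild_due(27_000, window=8192, rebuild_at_kb=384)    # 6750 tokens
unknown_window_holds = rebuild_due(28_000, window=None, rebuild_at_kb=384)  # byte-only
unknown_window_fires = rebuild_due(400_000, window=None, rebuild_at_kb=384) # > 393216 bytes
```

With an 8192-token window the trigger fires near 28K characters, far below the byte ceiling; with an unknown window only the byte ceiling applies.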
+
+#### §5.5 The Claude-Code Billing Quirk
+
+The claude-code billing quirk is at `bin/helpers/nagent_llm.py:357-391`: the provider blanks inherited `ANTHROPIC_API_KEY` so its billing stays on its own login. The implementation:
+
+```
+generate_text_with_usage { provider, model, messages } :: LlmResult {
+  if provider == "claude-code":
+    env = {**os.environ, "ANTHROPIC_API_KEY": ""}  # blank the inherited key
+    # subprocess.run(..., env=env) — billing is on the claude-code login
+  else:
+    env = os.environ
+  # ... SDK call with env
+}
+```
+
+The discipline: the provider that owns the billing owns the env. The claude-code provider uses the user's claude-code subscription, not the user's Anthropic API key. The blanking ensures the subprocess does not accidentally use the inherited API key (which would bill the API key instead of the subscription).
+
+The discipline is "API-key billing stays the anthropic provider's job". Both providers reach Anthropic models, but their billing is separate: the anthropic provider bills the API key, while claude-code runs as a subprocess on its own login. The env is the contract between the provider and the billing system; the provider that does not own the billing should not pass the billing env.
+
+This is a specific gotcha worth documenting: Manual Slop's claude-code integration (per `conductor/tech-stack.md`) may benefit from the same discipline. Any code path that launches claude-code as a subprocess, including a future claude-code provider analogous to nagent's, should blank the inherited `ANTHROPIC_API_KEY` to prevent accidental API billing.
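The env-blanking step can be sketched with the standard library alone. Both helper names are hypothetical, and a Python child process stands in for the claude-code CLI; whether the real CLI treats an empty key as absent is taken from the source, not verified here.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional


def child_env_for(provider: str, inherited: Optional[Mapping[str, str]] = None) -> dict:
    """Build the env for a provider subprocess (hypothetical helper)."""
    env = dict(os.environ if inherited is None else inherited)
    if provider == "claude-code":
        # Blank, rather than delete, the inherited key, mirroring the
        # env={"ANTHROPIC_API_KEY": ""} shape quoted in the text.
        env["ANTHROPIC_API_KEY"] = ""
    return env


def key_seen_by_child(env: Mapping[str, str]) -> str:
    """Spawn a child process and report the key value it actually observes."""
    code = "import os; print(os.environ.get('ANTHROPIC_API_KEY', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing `env=` to `subprocess.run` replaces the child's environment wholesale, which is why the sketch copies the parent env first and then overrides the single key.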
+
+#### §5.6 The Spinner Names the Provider/Model
+
+The spinner target is built at `bin/nagent:1075-1081`:
+
+```
+target = f"{llm.provider}/{llm.model}" if llm.model else llm.provider
+spinner.update(f"calling {target}...")
+```
+
+The spinner names the provider/model pair so the user can see which provider is being called. This is a small UX detail, but it matters for debugging: when a call is slow, the user knows whether it's the OpenAI provider or the Anthropic provider or the Together provider.
+
+The `--list-providers` CLI flag is at `bin/nagent` (the v3 cluster does not cite a specific line range, but the flag is documented in `README.md:991-995`). The flag dumps the `PROVIDERS` catalog so the user can see the available providers without reading the code.
+
+#### §5.7 Per-Commit Detail
+
+The three commits that built the provider-expansion subsystem:
+
+1. **`bdfa2a6` — Add Together as the sixth provider + the verified window table.** Adds `bin/helpers/nagent_llm.py:13-19` (the `PROVIDERS` extension + `TOGETHER_BASE_URL`), `bin/helpers/nagent_llm.py:27-31` (the `DEFAULT_MODELS["together"]`), `bin/helpers/nagent_llm.py:37-42` (the `CREDENTIAL_ENV["together"]` = `("TOGETHER_API_KEY",)`), `bin/helpers/nagent_llm.py:54-77` (the `MODEL_CONTEXT_WINDOWS` table with 10 verified models), `bin/helpers/nagent_llm.py:123-130` (the `model_context_window(model)` function returning `None` for unknown), `bin/helpers/nagent_llm.py:198-279` (the Together client + `_together_chat` always-streamed), `bin/helpers/nagent_llm.py:315-336` (the `list_models("together")` direct fetch because Together returns a bare JSON array), `bin/helpers/nagent_llm.py:381-400` (the `list_providers()` static catalog), `bin/helpers/nagent_llm.py:582-625` (the Together in `generate_text_with_usage` + `generate_with_upload_usage`), `bin/helpers/nagent_llm.py:739-770` (the `_together_upload` image-upload-only with base64 data URL), `config.example.json:7` (the `"context_window_tokens": 0` config), `README.md:82-90` (the providers table extension), and `README.md:956-967` (the "Conversation rebuilt (compacted...) when either trigger fires first" teaching). This is the "Together + windows" commit — it adds the new provider and the verified window table.
+2. **`5075f6e` — Add the claude-code billing quirk + the 4 new tests.** Adds `bin/helpers/nagent_llm.py:357-391` (the `env={"ANTHROPIC_API_KEY": ""}` blanking + the error-result-survives-stream-exception + the synthetic-error-text-skip), and `tests/test_nagent.py:2734-2797` (4 new claude-code tests). This is the "billing discipline" commit — it hardens the claude-code provider's billing isolation.
+3. **`2edc7ee` — Add the spinner-name-the-provider/model change.** Adds `bin/nagent:1075-1081` (the `target = f"{llm.provider}/{llm.model}" if llm.model else llm.provider` + the spinner update) and `tests/test_nagent.py:1010-1042` (the `test_call_llm_wait_spinner_names_provider_and_model` test). This is the "UX detail" commit — it makes the spinner name the provider/model so the user can see which provider is being called.
+
+The three commits together implement the provider-expansion abstraction: register, window, trigger, bill. The Together provider lands in `bdfa2a6`; the billing discipline hardens in `5075f6e`; the UX detail lands in `2edc7ee`.
+
+#### §5.8 Manual Slop Implications
+
+The Manual Slop equivalents of the provider-expansion pattern are partial. The closest analog is `src/ai_client.py` (the multi-provider LLM client) + the per-provider history locks (per `docs/guide_ai_client.md`) + the 8 providers in `conductor/tech-stack.md` (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax, OpenAI, Qwen, Grok).
+
+The Manual Slop analog already follows the pattern in spirit:
+- **8 providers registered** (per `conductor/tech-stack.md`) — the provider catalog is data, not code branching on state. The `src/ai_client.py` module is a function of the catalog.
+- **`provider_state` architecture** (per `docs/guide_ai_client.md`) — each provider has its own state (history lock, cache state, rate limits). The state is per-provider, not global.
+- **Per-provider history locks** (per `docs/guide_ai_client.md`) — prevents the "provider-specific history in process globals" pitfall (per `conductor/code_styleguides/domain_classification.md`'s Application domain pitfalls list).
+
+The gaps Manual Slop could close:
+1. **No verified `MODEL_CONTEXT_WINDOWS` table.** Manual Slop's `src/ai_client.py` has per-provider history locks but does not have a per-model context-window table. The rebuild/compaction is currently driven by heuristic token estimates, not verified windows. A future track could add the table + the 0.85 safety fraction trigger.
+2. **No "omit rather than guessed" discipline.** Manual Slop's `ai_client` uses heuristic estimates for unknown models. The "unknown returns None, fall back to byte-only" discipline is a small but load-bearing change.
+3. **No claude-code billing quirk discipline.** Manual Slop's `conductor/tech-stack.md` lists 8 providers, but the claude-code billing isolation discipline is not documented. A future track could add the discipline to the `src/ai_client.py` module's design.
+
+#### §5.9 Honest Gaps
+
+1. **`MODEL_CONTEXT_WINDOWS` is verified against the Together API only on 2026-06-17.** Other providers' models are intentionally omitted. A future track should add more verifications.
+2. **The `env={"ANTHROPIC_API_KEY": ""}` blanking assumes subprocess env takes precedence over inherited env.** Correct on POSIX; Windows env handling could differ. Unverified.
+3. **The Together `/v1/models` direct fetch at `bin/helpers/nagent_llm.py:315-336` is a vendor-specific workaround.** If Together changes the response shape, the parser silently returns fewer models. A defensive check (count returned models, warn if zero) could harden this.
+4. **The 0.85 safety fraction is a heuristic, not a measured value.** The comment in `issues/0004-conversation-safety-net.md` notes "model capability degrades under high context utilization, not just at the limit", but the 0.85 fraction is not measured. A future track should measure actual degradation per provider/model and update the fraction accordingly.
+5. **The token count estimate (`conversation_chars / 4`) is a heuristic.** The actual token count depends on the model's tokenizer and vocabulary, which differ across providers, so the same text can yield different counts on different models. A v4 could use the model's own tokenizer for precise counting.
+6. **The `list_providers()` static catalog does not validate the providers are actually configured.** A provider in `PROVIDERS` without a corresponding `CREDENTIAL_ENV` entry would fail at runtime, not at registration. A validation pass could catch this at startup.
+7. **The interaction with the campaigns driver (§1) is not deep-dived.** A long-running campaign can have conversations that exceed the model's context window. The provider-expansion cluster does not document how the campaigns driver coordinates with the token-cap trigger. Does the campaigns driver check the trigger before dispatching a worker? Does the report phase surface token-cap warnings to the user?
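Gaps 3 and 6 both ask for a cheap defensive check. A minimal sketch of what such checks could look like, assuming the catalog tables are dicts keyed by provider name; `validate_catalog` and `check_model_listing` are hypothetical names, not functions from nagent.

```python
from typing import Dict, Iterable, List


def validate_catalog(providers: Iterable[str], tables: Dict[str, dict]) -> List[str]:
    """Report providers that are missing from any required table."""
    problems = []
    for name in providers:
        for label, table in tables.items():
            if name not in table:
                problems.append(f"{name}: missing {label} entry")
    return problems


def check_model_listing(provider: str, models: list) -> List[str]:
    """Warn when a model-listing fetch parsed to nothing (possible shape change)."""
    if not models:
        return [f"{provider}: model listing returned zero models"]
    return []
```

A login-based provider may legitimately have no credential entry, so a real validation pass would probably need an explicit exemption list rather than treating every missing row as an error.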
+
+#### §5.10 Code-Shape Sketch
+
+The provider-expansion abstraction, in survey-grammar SSDL notation, with shape tags:
+
+```
+providers := { name: string,                       # [S] string
+               default_model: string,              # [S] string
+               credentials: [env-var],             # [S] string list
+               package: string,                    # [S] string
+               context_window: int | nil }         # [I] inspectable
+                                                 # [M] mutable aggregate
+
+MODEL_CONTEXT_WINDOWS := { model: int | nil }     # [I] verified table
+CONTEXT_WINDOW_SAFETY_FRACTION := 0.85             # [I] inspectable
+
+provider { name, model, env } :: LlmResult  {ssdl} [B]  # boundary: SDK call
+  // SDK call; failures surface text + exit code
+
+rebuild-trigger { conversation_chars, model, settings } :: fire?  {ssdl} [I]
+  byte_trip   := conversation_chars > settings.rebuild_at_kb * 1024
+  window      := model_context_window(model)
+  est_tokens  := conversation_chars / 4
+  window_trip := window is not nil
+                 and est_tokens > window * CONTEXT_WINDOW_SAFETY_FRACTION
+  return byte_trip or window_trip
+
+claude-code-billing { provider, inherited_env } :: env  {ssdl} [B]  # boundary: subprocess env
+  if provider == "claude-code":
+    return {**inherited_env, "ANTHROPIC_API_KEY": ""}  # blank the inherited key
+  else:
+    return inherited_env
+```
+
+The shape tag map: `[I]` for inspectable tables and triggers, `[S]` for string content (provider names, model names, env vars), `[B]` for boundaries (SDK call, subprocess env), `[M]` for the mutable aggregate that is the provider catalog. The provider catalog is a `[M]` aggregate: it is the state of record, hand-edited by humans, read by the SDK dispatch.
+
 **Source-read citations:**
 - `bin/helpers/nagent_llm.py:13-19` — `PROVIDERS` extended + `TOGETHER_BASE_URL` (bdfa2a6)
 - `bin/helpers/nagent_llm.py:27-31` — `DEFAULT_MODELS["together"]` (bdfa2a6)
 - `bin/helpers/nagent_llm.py:37-42` — `CREDENTIAL_ENV["together"]` = `("TOGETHER_API_KEY",)` (bdfa2a6)
 - `bin/helpers/nagent_llm.py:54-77` — `MODEL_CONTEXT_WINDOWS` table (10 verified models) (bdfa2a6)
+- `bin/helpers/nagent_llm.py:60-62` — "omit rather than guessed" discipline (bdfa2a6)
 - `bin/helpers/nagent_llm.py:123-130` — `model_context_window(model)` returns `None` for unknown (bdfa2a6)
 - `bin/helpers/nagent_llm.py:198-279` — Together client + `_together_chat` (always streamed) (bdfa2a6)
- `bin/helpers/nagent_llm.py:315-336` — `list_models("together")` — direct fetch because Together returns a bare JSON array (bdfa2a6)
- `bin/helpers/nagent_llm.py:381-400` — `list_providers()` — static catalog, no network (bdfa2a6)
- `bin/helpers/nagent_llm.py:582-625` — Together in `generate_text_with_usage` + `generate_with_upload_usage` (bdfa2a6)
- `bin/helpers/nagent_llm.py:739-770` — `_together_upload` — image-upload only, base64 data URL (bdfa2a6)
- `bin/helpers/nagent_llm.py:357-391` — `env={"ANTHROPIC_API_KEY": ""}` + error-result-survives-stream-exception + synthetic-error-text-skip (5075f6e)
- `bin/nagent:1075-1081` — `target = f"{llm.provider}/{llm.model}" if llm.model else llm.provider` (2edc7ee)
+- `bin/helpers/nagent_llm.py:315-336` — `list_models("together")` direct fetch (bdfa2a6)
+- `bin/helpers/nagent_llm.py:381-400` — `list_providers()` static catalog (bdfa2a6)
+- `bin/helpers/nagent_llm.py:582-625` — Together in `generate_text_with_usage` (bdfa2a6)
+- `bin/helpers/nagent_llm.py:739-770` — `_together_upload` image-upload only (bdfa2a6)
+- `bin/helpers/nagent_llm.py:357-391` — `env={"ANTHROPIC_API_KEY": ""}` + error-result-survives-stream-exception (5075f6e)
+- `bin/nagent:1075-1081` — spinner names provider/model (2edc7ee)
 - `config.example.json:7` — `"context_window_tokens": 0` (bdfa2a6)
 - `README.md:82-90` — providers table extension (bdfa2a6)
- `README.md:956-967` — "Conversation rebuilt (compacted...) when **either** trigger fires first" (bdfa2a6)
+- `README.md:956-967` — "Conversation rebuilt when either trigger fires first" (bdfa2a6)
 - `README.md:991-995` — `--list-providers` CLI example (bdfa2a6)
 - `tests/test_nagent.py:1010-1042` — `test_call_llm_wait_spinner_names_provider_and_model` (2edc7ee)
 - `tests/test_nagent.py:2734-2797` — 4 new claude-code tests (5075f6e)
-**Honest gaps in this cluster:**
- `MODEL_CONTEXT_WINDOWS` is verified against the Together API only on 2026-06-17. Other providers' models are intentionally omitted. A future track should add more verifications.
- The `env={"ANTHROPIC_API_KEY": ""}` blanking assumes subprocess env takes precedence over inherited env. Correct on POSIX; Windows env handling could differ. Unverified.
- The Together `/v1/models` direct fetch at `bin/helpers/nagent_llm.py:315-336` is a vendor-specific workaround. If Together changes the response shape, the parser silently returns fewer models. A defensive check (count returned models, warn if zero) could harden this.
-
-**Pattern deep-dive.** The provider-expansion abstraction is a four-piece composition: **register**, **window**, **trigger**, **bill**. Register: a provider is one tuple in `PROVIDERS` + one entry in `DEFAULT_MODELS` + one tuple in `CREDENTIAL_ENV` + one entry in `PACKAGE_HINTS`. The 5-tuple is enough to surface a provider in `--list-providers` and route a `generate_text_with_usage` call. Window: `MODEL_CONTEXT_WINDOWS` is a verified table, not an estimate. "Omit rather than guessed" (per `bin/helpers/nagent_llm.py:60-62`) is the discipline — the table at `bin/helpers/nagent_llm.py:54-77` lists exactly the models whose windows were verified by API error or by direct lookup, and the function `model_context_window` returns `None` for unknowns (the caller falls back to byte-only behavior). Trigger: rebuild fires on whichever trips first, the byte ceiling OR 0.85 of the model's window (per `README.md:956-967`). The 0.85 safety fraction is the data-oriented response to "model capability degrades under high context utilization, not just at the limit" (per the issues/0004 spec). Bill: the claude-code billing quirk (`env={"ANTHROPIC_API_KEY": ""}`) is the discipline "API-key billing stays the anthropic provider's job" (per `bin/helpers/nagent_llm.py:361-364`) — billing is data; the provider that owns the billing owns the env.
-
-The token-cap awareness is the load-bearing change. A byte-only rebuild trigger is a proxy for token utilization, and the proxy fails on small-window models — `rebuild_at_kb: 384` is far too high to fire on a 8192-token model. The per-model window table is the data-grounded alternative. The `context_window_tokens` config key (per `config.example.json:7`) is the extension point: a user who wants a new model's window can add it without code change. The "unknown returns None" behavior at `bin/helpers/nagent_llm.py:123-130` is the discipline — a missing entry is not a default to a guess; it's a signal to fall back to the byte-only behavior, which is correct for large-window models and merely late for small-window models (the failure is visible, not silent).
-
-The `bdfa2a6` commit message is explicit about the verification process: "DeepSeek-V4-Pro confirmed by a context_length_exceeded error ('maximum context length is 512000 tokens'). Qwen3.7-Plus/Max advertise context_length=1000000, but an oversized request is rejected with 'Range of input length should be [1, 983616]' — so the enforced input cap is 983616, with ~16384 of the 1M reserved for output." The distinction between "advertised total context_length" and "enforced input cap" is load-bearing — the table records the enforced cap, not the advertisement. This is the same data discipline as the project's `conductor/code_styleguides/cache_friendly_context.md`: stable data (verified numbers) vs volatile data (advertised numbers).
-
-A code-shape sketch using survey grammar:
-
-```
-providers := { name: string, default_model: string,
-               credentials: [env-var], package: string,
-               context_window: int | nil }  // [M] mutable aggregate
-provider { name, model, env } :: LlmResult  {ssdl} [B] // boundary
-  // SDK call; failures surface text + exit code
-
-rebuild-trigger { conversation_chars, model, settings } :: fire?  {ssdl} [I]
-  byte_trip   := conversation_chars > settings.rebuild_at_kb * 1024
-  window_trip := model_context_window(model)
-                and tokens > window * CONTEXT_WINDOW_SAFETY_FRACTION
-  byte_trip or window_trip
-```
-
-The `{ssdl}` markers note the abstractions: the provider call is a boundary (B) where SDK errors become LlmResult errors; the rebuild trigger is an inspectable invariant (I) computed from data on disk.
+- `bin/nagent:rebuild_due` — rebuild trigger (the v3 cluster does not cite specific line ranges)
+- `bin/helpers/nagent_llm.py:1-12` — module docstring + imports (bdfa2a6)
+- `bin/helpers/nagent_llm.py:19-26` — `PROVIDERS` complete list (bdfa2a6)
+- `bin/helpers/nagent_llm.py:31-36` — `DEFAULT_MODELS` complete list (bdfa2a6)
+- `bin/helpers/nagent_llm.py:42-53` — `CREDENTIAL_ENV` complete list (bdfa2a6)
+- `bin/helpers/nagent_llm.py:77-100` — `PACKAGE_HINTS` (bdfa2a6)
+- `bin/helpers/nagent_llm.py:130-200` — provider-specific clients (bdfa2a6)
+- `bin/helpers/nagent_llm.py:280-315` — `_together_chat` end (bdfa2a6)
+- `bin/helpers/nagent_llm.py:336-380` — `list_models` end (bdfa2a6)
+- `bin/helpers/nagent_llm.py:400-580` — provider dispatch (bdfa2a6)
+- `bin/helpers/nagent_llm.py:625-740` — provider-specific output parsing (bdfa2a6)
+- `bin/helpers/nagent_llm.py:770-900` — provider-specific upload handling (bdfa2a6)
+- `config.example.json:1-20` — full config example (bdfa2a6)
+- `README.md:90-110` — providers teaching continued (bdfa2a6)
+- `README.md:967-990` — rebuild trigger teaching continued (bdfa2a6)
+- `tests/test_nagent.py:1042-1100` — model_context_window tests (bdfa2a6)
+- `tests/test_nagent.py:2797-2850` — claude-code tests continued (5075f6e)
+- `bin/nagent:1075-1085` — spinner update + target format (2edc7ee; the exact lines)
+- `bin/nagent:1080-1090` — call_llm start (2edc7ee; relevant for the spinner wiring)
+- `bin/nagent:1-50` — main module imports + constants (the v3 cluster does not cite specific line ranges)
+- `bin/nagent:3167-3185` — `run_agent_loop` (relevant for the trigger wiring)
+- `context/data-oriented-design.md` — the canonical DOD reference (relevant for the 0.85 safety fraction rationale)

+**Decision candidate:** NEW Candidate 21 (MEDIUM). "Per-model token-cap awareness for Manual Slop `ai_client`": add `MODEL_CONTEXT_WINDOWS` table; rebuild fires on byte ceiling OR 0.85 of window; "don't guess" — omit rather than estimate. See `decisions.md` Candidate 21.
+**Cross-refs:** §2 Conversation safety net (rebuild trigger gets a second condition). §3 Hooks (per-turn status can include `current model / window / usage`). `docs/guide_ai_client.md` (the Manual Slop AI client guide; relevant for the Manual Slop implications). `conductor/tech-stack.md` (the 8 providers Manual Slop supports).
+**Pattern history:** UPDATE. v2.3 had 5 providers; v3 has 6 (adds together). The token-cap awareness is NEW (v2.3 had byte-only rebuild triggers). EXTENDS v2.3 Pattern 5 ("the loop") with a per-model token cap as a second rebuild trigger.
 ## §6 Delegation rewrite

 **Source:** nagent `d56f0f0`, `65787a6`, `315fe9e` (`bin/nagent:666-673` + `:790-806`, `tests/test_nagent.py:1689-1695`).