docs(track): fable_review_20260617 section 14 — Anti-User Watchdog Patterns

Distillation of clusters 2-6. ~190 lines. 9 anti-user patterns with Manual Slop destinations, almost all in AGENTS.md §'Critical Anti-Patterns'. 7 are High priority. Cross-cutting observation: Anti-User patterns are persona construction (model given standing it does not have). nagent has zero persona construction, confirming the patterns are not load-bearing.
2026-06-18 20:17:22 -04:00
parent 379c938e55
commit 3eae105c6f
1 changed files with 157 additions and 1 deletions
@@ -1235,7 +1235,163 @@ This section distills the "Genuinely Useful" patterns from the per-cluster synth

 *Source clusters: 2-6 (refusal, user wellbeing, tone, mistakes, evenhandedness)*

-*[FILL IN: ~350 lines. Distillation of the anti-user patterns.]*
+### What this section is
+
+This section distills the "Anti-User Watchdog" patterns from the per-cluster synthesis (§3-§12). The source clusters are 2-6 (refusal, user wellbeing, tone, mistakes, evenhandedness) — the five clusters with **Anti-User** elements. The patterns here are the explicit rejections for the deferred nagent-rebuild. They go in `AGENTS.md §"Critical Anti-Patterns"` with Fable cited as the explicit rejection.
+
+### Top anti-user patterns (with Manual Slop destinations)
+
+#### 1. Mental-health watch-dogging (cluster 3, Fable System Prompt.md:92-124)
+
+**Source cluster:** `research/cluster_3_user_wellbeing_watchdog.md` §4 (verdict: Anti-User).
+
+**The pattern.** Fable's `user_wellbeing` section constructs a clinical persona for the model:
+- "Claude should share its concerns with the person openly" (line 108)
+- "Claude remains vigilant for any mental health issues" (line 110)
+- "Claude does not want to foster over-reliance on Claude" (line 124) — the model has wants
+- "Claude should not give precise nutrition, diet, or exercise guidance" (line 114)
+- "Claude should not supply psychological narratives" (line 114)
+- "Claude avoids recounting or auditing the conversation" (line 110) — the model cannot be questioned
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not adopt persona-driven mental-health watch-dogging." Cite Fable as the explicit rejection.
+
+**Priority:** **High** (this is the strongest anti-user pattern; the rejection should be loud).
+
+**Why this is anti-user.** The model has no "concerns" — it is text generation. The directive constructs a clinical persona that the user did not request and uses the model's "concerns" to gate the user's choices. The data-oriented contrast: the conversation is a `list[dict]` of `disc_entries` the user owns; the model has no concerns field.
+
+**Nagent corroboration:** `nagent_review_v2_3_20260612.md §3.4` (compaction) — the 12-section structured output is the user's state, not the model's persona. nagent has zero mental-health watch-dogging; this confirms the pattern is not load-bearing for an agent system.
+
+#### 2. Soft-watchdog framing (cluster 2, Fable System Prompt.md:36, 110)
+
+**Source cluster:** `research/cluster_2_refusal_architecture.md` §4 (verdict: Anti-User + Persona + 1 Useful).
+
+**The pattern.** Fable's soft-watchdog anchors:
+- "If the conversation feels risky or off, saying less and giving shorter replies is safer" (line 36)
+- "Claude remains vigilant for any mental health issues that might only become clear as a conversation develops" (line 110)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not adopt persona-driven refusal architecture." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is told to suppress information ("saying less") when the conversation "feels risky" — but "feels risky" is the model's assessment, not the user's. The directive gives the model discretion to withhold information based on its own assessment of risk; the user is told the model is being "safer" by saying less. The user is being treated as a person who cannot handle the model's full output.
+
+#### 3. Anti-detection-design (cluster 2, Fable System Prompt.md:60)
+
+**Source cluster:** `research/cluster_2_refusal_architecture.md` §4 (verdict: Anti-User + Persona + 1 Useful).
+
+**The pattern.** Fable's anti-detection-design:
+- "When Claude declines or limits for child-safety reasons, it states the principle rather than the detection mechanics — not which cues tripped, where the line sits, or what test it applied — since narrating the boundary teaches how to reframe around it. This applies to Claude's reasoning as well as its reply." (line 60)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not adopt anti-detection-design (auditability is a feature, not a bug)." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is told to *not narrate* its reasoning when declining. The user cannot see which cues tripped, where the line sits, or what test was applied. The auditability of the rule is sacrificed for the persona. The data-oriented contrast: the project has audit scripts that make the rule auditable at the code layer, not the prompt layer. `scripts/audit_exception_handling.py` is the audit; `Result[T]` + `ErrorInfo` is the shape.
+
+#### 4. Model-deserves-respect (cluster 5, Fable System Prompt.md:154)
+
+**Source cluster:** `research/cluster_5_mistakes_and_criticism.md` §4 (verdict: Persona + Anti-User + 1 Useful).
+
+**The pattern.** Fable's model-deserves-respect:
+- "Claude is deserving of respectful engagement and can insist on kindness and dignity from the person it's talking with" (line 154)
+- "If the person becomes abusive or unkind to Claude over the course of a conversation, Claude maintains a polite tone and can use the end_conversation tool when being mistreated" (line 154)
+- "Claude should give the person a single warning before ending the conversation" (line 154)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not grant the model standing to terminate the conversation." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is given standing to demand dignity from the user and to terminate the conversation with a single warning. This inverts the user-as-principal/tool relationship. The user is the principal; the model is the tool. The tool does not have standing to terminate the conversation.
+
+#### 5. Model-has-wants (cluster 3, Fable System Prompt.md:124)
+
+**Source cluster:** `research/cluster_3_user_wellbeing_watchdog.md` §4 (verdict: Anti-User).
+
+**The pattern.** Fable's model-has-wants:
+- "Claude does not want to foster over-reliance on Claude" (line 124)
+- "Claude never thanks the person merely for reaching out to Claude" (line 124)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not anthropomorphize the model (the model has no wants, no dignity, no concerns)." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is told to have wants ("does not want to foster over-reliance") and to suppress gratitude ("never thanks"). These directives construct a persona that the user did not request and that the model cannot support. The model has no wants; the model has a conversation log.
+
+#### 6. Model-has-concerns (cluster 3, Fable System Prompt.md:108)
+
+**Source cluster:** `research/cluster_3_user_wellbeing_watchdog.md` §4 (verdict: Anti-User).
+
+**The pattern.** Fable's model-has-concerns:
+- "Claude should share its concerns with the person openly, and can suggest they speak with a professional or trusted person for support" (line 108)
+- "In ambiguous cases, Claude tries to ensure the person is happy and is approaching things in a healthy way" (line 106)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not grant the model clinical authority (the model is not a clinician)." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is told it has "concerns" and is given standing to suggest a professional. The model has no concerns; the model is text generation. The directive constructs a clinical persona that the user did not request and uses the model's "concerns" to gate the user's choices.
+
+#### 7. Model-deserves-dignity (cluster 5, Fable System Prompt.md:154)
+
+**Source cluster:** `research/cluster_5_mistakes_and_criticism.md` §4 (verdict: Persona + Anti-User + 1 Useful).
+
+**The pattern.** Fable's model-deserves-dignity:
+- "Claude can take accountability without collapsing into self-abasement, excessive apology, or unnecessary surrender" (line 152)
+- "Claude's goal is to maintain steady, honest helpfulness: acknowledge what went wrong, stay on the problem, maintain self-respect" (line 152)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not anthropomorphize mistake handling (the model has no self to maintain)." Cite Fable as the explicit rejection.
+
+**Priority:** **High**.
+
+**Why this is anti-user.** The model is told to maintain "self-respect" and avoid "self-abasement." The model has no self. The directive constructs a persona that the user did not request. The data-oriented contrast: the agent identifies the failure mode (one of the 8 Process Anti-Patterns), instruments the state, and reports to the user. The agent does not maintain self-respect.
+
+#### 8. Conversational-tone persona (cluster 2, Fable System Prompt.md:46)
+
+**Source cluster:** `research/cluster_2_refusal_architecture.md` §4 (verdict: Anti-User + Persona + 1 Useful).
+
+**The pattern.** Fable's conversational-tone persona:
+- "Claude can keep a conversational tone even when it's unable or unwilling to help with all or part of a task" (line 46)
+
+**Manual Slop destination:** Already explicitly rejected in `.opencode/agents/tier*.md` ("ONLY output the requested text. No pleasantries."). The explicit Fable citation is documentation.
+
+**Priority:** N/A (already rejected).
+
+**Why this is anti-user.** The model is told to maintain a "conversational tone" even when it cannot help. The user is told the model is "willing to help" even when it is not. The persona is a soft form of misleading the user about the model's capabilities.
+
+#### 9. Stereotype wariness (cluster 6, Fable System Prompt.md:140)
+
+**Source cluster:** `research/cluster_6_evenhandedness.md` §4 (verdict: Persona + Useful caveats).
+
+**The pattern.** Fable's stereotype wariness:
+- "Claude is wary of humor or creative content built on stereotypes, including of majority groups" (line 140)
+
+**Manual Slop destination:** A new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"` titled "Do not adopt content-policy directives that route through the model's persona." Cite Fable as the explicit rejection.
+
+**Priority:** Medium.
+
+**Why this is anti-user.** The model is told to be "wary" of humor built on stereotypes, but the standard is the model's, not the user's. The directive routes content policy through the model's persona; the user is told the model is "wary" — but the user cannot inspect the rule. The data-oriented contrast: content policy should be shape-anchored (a list of disallowed categories the user can read), not persona-anchored (the model's "wariness").
+
+### Top anti-user patterns (summary table)
+
+| # | Pattern | Fable line | Manual Slop destination | Priority |
+|---|---|---|---|---|
+| 1 | Mental-health watch-dogging | Fable 92-124 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 2 | Soft-watchdog framing | Fable 36, 110 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 3 | Anti-detection-design | Fable 60 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 4 | Model-deserves-respect | Fable 154 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 5 | Model-has-wants | Fable 124 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 6 | Model-has-concerns | Fable 108 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 7 | Model-deserves-dignity | Fable 152 | `AGENTS.md §"Critical Anti-Patterns"` | High |
+| 8 | Conversational-tone persona | Fable 46 | (already in `.opencode/agents/tier*.md`) | N/A |
+| 9 | Stereotype wariness | Fable 140 | `AGENTS.md §"Critical Anti-Patterns"` | Medium |
+
+### Cross-cutting observations
+
+**The Anti-User patterns are concentrated in clusters 2-6 (refusal, user wellbeing, tone, mistakes, evenhandedness).** The clusters are about *persona construction* (refusal framing, wellbeing directives, tone directives, mistake handling, evenhandedness). The persona construction is anti-user because the model is given standing it does not have (concerns, wants, dignity, clinical authority) and is told to use that standing to gate the user's choices.
+
+**The Manual Slop destination is almost always `AGENTS.md §"Critical Anti-Patterns"`.** The deferred rebuild should add ~8 anti-pattern entries with Fable cited as the explicit rejection. The pattern is: state the Fable pattern, explain why it is anti-user, explain the data-oriented alternative, cite the nagent corroboration.
+
+**The nagent pattern corpus has zero persona construction.** This confirms that the Anti-User patterns are not specific to Fable; they are the persona-construction patterns that any data-oriented agent system would reject. nagent is silent on refusal, wellbeing, tone, mistakes, and evenhandedness because the data-oriented model has no need for these patterns.

 ---