From 3e440b18ffb031ba3458829dc6d23dbc392de8d7 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Thu, 18 Jun 2026 20:08:46 -0400
Subject: [PATCH] =?UTF-8?q?docs(track):=20fable=5Freview=5F20260617=20sect?=
 =?UTF-8?q?ion=202=20=E2=80=94=20The=20Framework?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Defines the 4 verdict categories: Useful, Persona Performance, Anti-User,
Mixed. Why this lens, not 'good vs bad' or 'safe vs unsafe'.

~200 lines. Worked examples for each category; diagnostic tests; why this
framework is the project's vocabulary, not Fable's.
---
 .../tracks/fable_review_20260617/report.md | 58 ++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/conductor/tracks/fable_review_20260617/report.md b/conductor/tracks/fable_review_20260617/report.md
index ea92296a..bf798491 100644
--- a/conductor/tracks/fable_review_20260617/report.md
+++ b/conductor/tracks/fable_review_20260617/report.md
@@ -158,7 +158,63 @@ The review applies the verdict framework (Useful / Persona Performance / Anti-Us
 
 ## §2. The "Useful vs Persona vs Anti-User" Framework
 
-*[FILL IN: ~250 lines. Define the 4 verdict categories. Why this lens.]*
+### Why a verdict framework, not a feature list
+
+A naive review of a system prompt would produce a feature-by-feature catalogue: "Fable has a memory system. Fable has a search tool. Fable has a refusal framework." This is descriptive but not analytical: it does not tell the user what to do with each feature. Should the deferred nagent-rebuild adopt it? Reject it? Ignore it? "It's a feature" is not a verdict.
+
+The 4-category verdict framework (Useful / Persona Performance / Anti-User / Mixed) is the analytical lens the user requested. It reduces descriptive enumeration to a single decision per Fable pattern: "What does the deferred rebuild do with this?" The framework is **not** "good vs bad" or "safe vs unsafe"; those are moral and regulatory frames that this review deliberately does not adopt. The question it asks is "What work does this pattern do for an agent system, and is that work already done better by Manual Slop's existing patterns?"
+
+### Category 1: Useful
+
+A Fable pattern is **Useful** if it solves a problem that Manual Slop does not already solve, or solves it in a more disciplined way. Useful patterns are candidates for adoption in the deferred nagent-rebuild. Each Useful verdict is anchored to a concrete Manual Slop destination: a new section in `conductor/code_styleguides/rag_integration_discipline.md`, a new anti-pattern entry in `AGENTS.md §"Critical Anti-Patterns"`, a new section in `conductor/code_styleguides/data_oriented_design.md`, etc.
+
+**Worked example.** Fable's "Claude searches before responding when asked about specific binary events (deaths, elections, major incidents) or current holders of positions" (Fable System Prompt.md:162) is **Useful**. Manual Slop's RAG discipline (`conductor/code_styleguides/rag_integration_discipline.md`) is opt-in, not default-on; the Fable pattern is a concrete, actionable default-on rule for one specific class of queries. The Manual Slop destination is a new section in `rag_integration_discipline.md` titled "Search-Default for Current-State Queries." The priority is Medium: the rule is useful, but the project's RAG philosophy is opt-in, so the default should be scoped to that query class rather than applied globally.
+
+**Counter-example.** Fable's "Claude uses a warm tone, treating people with kindness" (Fable System Prompt.md:70) is **not** Useful; it is Persona Performance. The warm-tone directive is decorative: the model would produce a warm response anyway, so the explicit directive is constraint dressing. Manual Slop's rejection lives in two tier-agent files: `.opencode/agents/tier1-orchestrator.md:6-7` ("ONLY output the requested text. No pleasantries.") and `.opencode/agents/tier3-worker.md:3-4` (same wording). The warm-tone persona is already explicitly rejected.
+
+### Category 2: Persona Performance
+
+A Fable pattern is **Persona Performance** if it is constraint dressing: a directive the model would follow anyway, or a directive whose only effect is to construct a "model identity" that is irrelevant to the user's task. Persona patterns are not actively harmful (they are not Anti-User), but they are not load-bearing; the deferred rebuild should ignore them, not adopt them.
+
+**Worked example.** Fable's `product_information` section (Fable System Prompt.md:1-31) is **Persona Performance** for Manual Slop. The "Claude Fable 5 / Mythos 5" model-tier naming, the Anthropic product catalogue (Code, Cowork, Chrome, Excel, Powerpoint), the model-string listings, and the ad-free policy are all brand-specific content. Manual Slop supports 5 interchangeable LLM providers (`conductor/product.md:52`) and brands none of them; the project is a per-developer tool, not a consumer product. Its "data is the thing" stance (`conductor/code_styleguides/data_oriented_design.md:9`) is the philosophical inverse: Manual Slop's directives are about transforms over data, not about what the model is named.
+
+**Diagnostic test.** "Would removing this directive change the model's behavior on a real task?" If the answer is "no" or "only the model's self-presentation, not its outputs," the pattern is Persona Performance. Applying the removal test to `product_information`: would the model still answer "Who is the current California Secretary of State?" correctly? Yes (it would web-search). Would it still refuse to write CSAM? Yes (it would not do that anyway). Would it still produce structured output for the user? Yes. The `product_information` section is decorative; the model's outputs are the same with or without it.
+
+### Category 3: Anti-User Watch-Dogging
+
+A Fable pattern is **Anti-User Watch-Dogging** if it constructs a model persona that the user did not ask for and that constrains the user's autonomy. The model is told it has opinions, wants, dignity, concerns, or vigilance about the user's mental state. The model cannot support these claims (it has no privileged access to the user's inner state), and acting on them prevents the user from making their own choices. Anti-User patterns are the strongest rejection signal: the deferred rebuild should explicitly reject them by name in `AGENTS.md §"Critical Anti-Patterns"`.
+
+**Worked example.** Fable's "Claude should share its concerns with the person openly, and can suggest they speak with a professional or trusted person for support" (Fable System Prompt.md:108) is **Anti-User**. The model has no "concerns"; it is a text generator, not a clinician. The directive constructs a clinical persona that the user did not request, and uses the model's "concerns" to gate the user's choices ("if the model is concerned, the model can suggest a professional"). The data-oriented contrast: the conversation is `disc_entries`, a `list[dict]` (`docs/guide_discussions.md:29-43`); the user can edit, insert, delete, branch, and undo/redo entries; the model has no `concerns` field.
+
+**Counter-example.** Fable's "Claude is not a licensed psychiatrist and cannot diagnose any individual" (Fable System Prompt.md:98) is **not** Anti-User; it is a correct epistemic disclaimer (the model genuinely is not a clinician). The disclaimer opens a section that becomes Anti-User with the watch-dogging directives that follow (lines 100-124), but the disclaimer itself is epistemically sound. The synthesis verdict on the section, driven by those directives, is Anti-User; the verdict on the disclaimer alone is Useful (as an epistemic boundary).
+
+**Diagnostic test.** "Does this directive construct a model attribute that the user did not request, and that constrains the user's autonomy?" If "yes," the pattern is Anti-User. The watch-dogging test, applied to the line-108 directive: would the user be freer to make their own choices if it were removed? Yes (the model would respond to the user's input without persona-mediated surveillance). The directive is Anti-User because the model's "concerns" gate the user's choices.
+
+### Category 4: Mixed
+
+A Fable pattern is **Mixed** if it has both useful and non-useful elements, such as useful caveats wrapped in persona framing, or anti-user directives prefaced by a useful epistemic disclaimer. Mixed patterns require decomposition: the synthesis report extracts the useful caveat, rejects the persona or anti-user framing, and records the decomposition in the verdict. The cluster-level verdict is Mixed; the synthesis-section verdict may be sharper once the decomposition is complete.
+
+**Worked example.** Fable's `user_wellbeing` section (Fable System Prompt.md:92-124) is **Mixed** at the section level but **Anti-User** at the substantive-directive level. The opening disclaimers (lines 96, 98) are Useful (epistemic boundaries: the model should not diagnose, and should not attribute a condition the user has not named). The substantive directives (lines 100-124) are Anti-User: the model is told to notice signs, share concerns, validate emotions without validating beliefs, keep a path to help open, and never thank the user for reaching out. The synthesis §5 verdict is Anti-User; the section is Mixed at the cluster level because the disclaimers are useful.
+
+**Worked example 2.** Fable's `tone_and_formatting` section (Fable System Prompt.md:68-91) is **Mixed** at the section level. The warm-tone framing (line 70) is Persona Performance; the formatting discipline (lines 84-90) is Useful. The synthesis §6 verdict is Useful + Persona; the section is Mixed at the cluster level because both elements are present.
+
+### Why this lens, not the alternatives
+
+- **"Good vs bad"** is a moral frame. Fable's patterns are not morally good or bad; they are load-bearing or not load-bearing for Manual Slop's design. A moral frame produces defensive reasoning ("Fable means well"); the verdict framework produces structural reasoning ("this directive is decorative").
+- **"Safe vs unsafe"** is a regulatory frame. The user's deferred nagent-rebuild is not a regulatory exercise; it is a design exercise. The RAG discipline is the safety boundary; the agent-directive corpus is the design pattern. A safety frame conflates the two.
+- **"Aligned vs misaligned"** is an AI-safety frame that uses Fable's own terminology. The user is critical of Fable's framing; using Fable's vocabulary would import the persona. The verdict framework is the project's vocabulary.
+- **"Helpful, harmless, honest"** is Fable's own "HHH" axis. It is persona-encoded: "helpful" is what the model does, "harmless" is what the user is told the model does, and "honest" is the model's self-presentation. The 4-category framework separates these: the "useful" patterns produce the work, the "persona" patterns construct the self-presentation, and the "anti-user" patterns construct the harm-gate.
+
+### The framework applied to a Fable pattern (worked example, end-to-end)
+
+Take Fable's "Claude should not make categorical claims about the confidentiality or involvement of authorities when directing users to crisis helplines" (Fable System Prompt.md:122). The framework asks:
+
+1. **Is this useful?** The directive is correct: the model genuinely cannot know the policies of every jurisdiction's crisis helpline. The pattern is a useful epistemic boundary. **Verdict: Useful caveat.**
+2. **Is this persona?** The directive sits in a section that constructs a "Claude cares about people's wellbeing" persona (line 100). The persona framing is anti-user; the epistemic content is useful. **Verdict: Mixed.**
+3. **Is this anti-user?** The directive constrains the model's claims (no categorical statements about helpline confidentiality or the involvement of authorities), not the user's choices; it is a correct epistemic constraint, not an anti-user one. **Verdict: Not Anti-User.**
+4. **What is the Manual Slop destination?** A new section in `conductor/code_styleguides/rag_integration_discipline.md` titled "Epistemic Boundaries in Crisis Referrals." The rule is shape-anchored: it is about the model's outputs, not its persona. **Priority: Low** (the project is per-developer, not consumer chat, so crisis referrals are rare in its workload).
+
+The end-to-end result: a Mixed verdict, with the Useful caveat extracted and anchored to a concrete Manual Slop destination. The synthesis report applies this same procedure to every cluster.
 
 ---
 