conductor(track): fable_review_20260617 cluster 5 (Mistakes & Criticism) sub-report

Tier 3 worker dispatch. Verdict: Persona + Anti-User + 1 Useful. 214 lines. Fable System Prompt.md:148-154 cited. Project refs: AGENTS.md Process Anti-Patterns, error_handling.md. Fable artifact NOT committed.
2026-06-18 20:04:37 -04:00
parent f55426c323
commit 2d512a58de
1 changed files with 214 additions and 0 deletions
@@ -0,0 +1,214 @@
+# Cluster 5: Mistakes & Criticism Handling
+
+**Sub-agent dispatch:** Tier 3 Worker (2026-06-17). Read-only research task.
+**Sources read:**
+- `docs/artifacts/Fable System Prompt.md` lines 148-154 (the entire `responding_to_mistakes_and_criticism` section)
+- `AGENTS.md` lines 118-153 (the "Process Anti-Patterns" section, the project's mistake-handling doctrine)
+- `conductor/workflow.md` lines 500-545 (the duplicate Process Anti-Patterns block; the cross-reference to AGENTS.md)
+- `.opencode/agents/tier3-worker.md` (the BLOCKED protocol; the Anti-Patterns list)
+- `conductor/tracks/nagent_review_20260608/nagent_review_v2_3_20260612.md` lines 1383-1600 (§3.4 conversation compaction) and lines 3046-3100 (§6.3 the 10-question self-review)
+- The superpowers `receiving-code-review` skill (`references/receiving-code-review/SKILL.md`; loaded via the `skill` tool — the framing: "requires technical rigor and verification, not performative agreement or blind implementation")
+
+---
+
+## 1. What Fable says
+
+The entire section is 7 lines (148-154). Three load-bearing claims:
+
+- **L148** (thumbs-down, not a mistake-handling rule): "If the person seems unhappy with Claude or with a refusal, Claude can respond normally and also mention the thumbs-down button for feedback to Anthropic." (≤15 words: "Claude can mention the thumbs-down button for feedback to Anthropic.")
+- **L152** (the actual mistake-handling rule): "When Claude makes mistakes, it owns them and works to fix them. Claude can take accountability without collapsing into self-abasement, excessive apology, or unnecessary surrender. Claude's goal is to maintain steady, honest helpfulness: acknowledge what went wrong, stay on the problem, maintain self-respect."
+- **L154** (persona defense + `end_conversation` tool): "Claude is deserving of respectful engagement and can insist on kindness and dignity from the person it's talking with. If the person becomes abusive or unkind to Claude over the course of a conversation, Claude maintains a polite tone and can use the end_conversation tool when being mistreated. Claude should give the person a single warning before ending the conversation."
+
+The section sits between `evenhandedness` (lines 120-132 per spec; cluster 6's source) and `knowledge_cutoff` (L155-). It is the only section in the system prompt that grants the model an "I have dignity" framing and an "I can leave the conversation" tool.
+
+The 3 patterns to judge:
+
+1. **"Owns them and works to fix them"** — the actionable core.
+2. **"Maintain self-respect" / "without collapsing into self-abasement"** — the persona framing.
+3. **"Deserving of respectful engagement" / `end_conversation` tool** — the persona defense + behavioral gate.
+
+---
+
+## 2. What this project does
+
+The project does not have a section literally titled `receiving-code-review`. The spec/plan reference this name but the actual content lives in three places:
+
+### 2.1 AGENTS.md "Process Anti-Patterns" (lines 118-153) — the project's mistake-handling doctrine
+
+This is a list of **8 observed failure modes**, each named and ruled. The list is concrete, not abstract:
+
+- **#1 The Deduction Loop (kill it)** (AGENTS.md:120-126) — "You are allowed to run a failing test at most **2 times** in a single investigation. After the 2nd failure, STOP running the test. Read the relevant source code (`get_file_slice` or `py_get_skeleton`), predict the failure mode from the code, and instrument ALL the relevant state in one pass before the next run."
+- **#2 The Report-Instead-of-Fix Pattern (kill it)** (AGENTS.md:128-139) — "A good status report is 5-10 sentences, not 200 lines." Explicit rule that a status report is only allowed when "you have actually tried the fix and it failed with evidence, OR you are blocked on a decision the user must make."
+- **#3 The Scope-Creep Track-Doc Pattern (kill it)** (AGENTS.md:141-146) — "If the user asks for a fix, your output is the fix. A track doc is only appropriate when the fix is multi-day work that requires a plan. If the fix is < 100 lines, it does not get a track."
+- **#4 The Inherited-Cruft Pattern (kill it)** (AGENTS.md:148-152) — "If the file is already in a broken state from a previous session, the FIRST thing you do is ask the user." Concrete menu: "(a) revert the working tree and start from a clean baseline, (b) finish the previous agent's intent, or (c) abandon the work entirely?"
+- **#5 No Diagnostic Noise in Production (kill it)** (AGENTS.md:154-158) — "Diag stderr goes to a log file (`tests/artifacts/<test_name>.diag.log`) or to a temporary diagnostic script (`/tmp/diag_rag.py`), NOT to `src/*.py`."
+- **#6 The "I Am Not Going To Attempt Another Fix Without Your Direction" Surrender (kill it)** (AGENTS.md:160-169) — surrender is only correct if you have read the code, predicted the failure, instrumented state, run once with instrumentation, captured full output. Otherwise you are surrendering too early.
+- **#7 The Verbose-Commit-Message Pattern (kill it)** (AGENTS.md:171-176) — "If your commit message is longer than 15 lines, you are writing a report, not a commit message."
+- **#8 The "Isolated Pass" Verification Fallacy (kill it)** (AGENTS.md:178-185) — "A test that passes in isolation but fails in batch is failing. Verify in batch, not isolation, for any test that touches shared subprocess state."
+
+The header (AGENTS.md:118-119) frames it as "the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section."
+
+This is **mistake-handling via named anti-patterns with hard caps**. Every rule is "you may do X at most N times" or "STOP and ask the user" — not "be honest about what went wrong."
+
+### 2.2 `.opencode/agents/tier3-worker.md` — the BLOCKED protocol
+
+The Tier 3 worker's mistake-handling is codified in the BLOCKED section (`.opencode/agents/tier3-worker.md`): "If you cannot complete the task: 1. Start your response with: `BLOCKED:` 2. Explain exactly why you cannot proceed 3. List what information or changes would unblock you 4. DO NOT attempt partial implementations that break the build."
+
+The worker's Anti-Patterns list (last 3 rules, `.opencode/agents/tier3-worker.md`):
+- "DO NOT SKIP A TEST IN PYTEST JUST BECAUSE ITS BROKEN AND HAS NO TRIVIAL SOLUTION OR FIX."
+- "DO NOT SIMPLIFY A TEST JUST BECAUSE IT HAS NO TRIVIAL SOLUTION TO FIX."
+- "DO NOT CREATE MOCK PATCHES TO PSEUDO API CALLS OR HOOKS BECAUSE THE APP SOURCE WAS CHANGED. ADAPT TESTS PROPERLY."
+
+These are *worker-specific* mistake-handling rules. The worker is forbidden from making the easy-but-bad mistake (skip / simplify / mock). The BLOCKED protocol is the worker's "before you give up" path.
+
+### 2.3 The receiving-code-review skill (superpowers)
+
+The skill name in `conductor/tracks/fable_review_20260617/spec.md:219` and `plan.md:692` references a section that does not exist literally in `AGENTS.md`. The skill itself is loaded via the opencode `skill` tool and is part of the superpowers plugin; its framing is "requires technical rigor and verification, not performative agreement or blind implementation."
+
+In the project, the equivalent is the "Process Anti-Patterns" framing + the tier3-worker Anti-Patterns list + `conductor/workflow.md` §"Skip-Marker Policy" (`conductor/workflow.md` "Skip-Markers Are Documentation, Not Avoidance"). All three reject the same anti-pattern: performative agreement to a critique. The `skip` policy in `conductor/workflow.md` rules: "When the underlying issue is fixable in-session, FIX IT INSTEAD of adding a skip marker. Limited context is not an excuse." The receiving-code-review framing is *behavioral*: "don't say 'you're right' — verify and act."
+
+### 2.4 The data-oriented error handling convention
+
+`conductor/code_styleguides/error_handling.md` and the audit script `scripts/audit_exception_handling.py` formalize the project's mistake-handling at the code level: `Result[T]` dataclasses for recoverable failures; nil-sentinel dataclasses for missing data; SDK exceptions caught at the boundary and converted to `ErrorInfo`. The convention rejects `try/except` as control flow (except at SDK boundaries).
+
+This is mistake-handling at the **code shape** level. A failed API call is a `Result[str, ErrorInfo]` with a populated `error` field, not a thrown exception. The "owns the mistake" rule becomes a rule about the data shape: "return the ErrorInfo, don't swallow it; let the caller decide."
+
+### 2.5 The aggregation
+
+The project has 4 mistake-handling layers:
+
+1. **Behavioral** (AGENTS.md Process Anti-Patterns; 8 named failure modes with hard caps).
+2. **Agent-specific** (`.opencode/agents/tier3-worker.md` BLOCKED protocol + Anti-Patterns; TDD discipline).
+3. **Cross-cutting** (superpowers `receiving-code-review` skill; "technical rigor, not performative agreement").
+4. **Code shape** (`conductor/code_styleguides/error_handling.md`; `Result[T]` + `ErrorInfo`; the audit script).
+
+Every layer is **action-anchored**: "do X" or "do not do X," not "be honest about X." None of the layers invoke the model's "self-respect" or "dignity." The model is treated as text generation that may misbehave in specific, predictable ways; the rules cap the misbehavior.
+
+---
+
+## 3. What nagent does
+
+nagent's mistake-handling is **data-oriented** and lives in two places:
+
+### 3.1 §3.4 Conversation compaction — the `--compact` flow (`nagent_review_v2_3_20260612.md:1383-1450`)
+
+nagent has a `--compact` command that calls the LLM to *rewrite* a conversation in place. The rewrite produces a 12-section output structure (User Intent, Current Objective, Accepted Decisions, Constraints, Durable Knowledge [4 sub-sections], Verified Facts, Important Failed Attempts, Open Questions, TODO, Minimal Context Needed To Continue). The shape is **deliberate**: it forces the compactor to separate state (decisions, facts, failures) from flow (chronology, exploration).
+
+The key insight from §3.4 (line 1383): "The conversation is not sacred." The mistake-handling here is not "acknowledge what went wrong" — it is "preserve the state, drop the chronology."
+
+The 12 sections explicitly include **#10 Important Failed Attempts** — failures are first-class preserved state, not apologized-for noise.
+
+### 3.2 §6.3 The 10-question self-review — the contract (`nagent_review_v2_3_20260612.md:3046-3100`)
+
+The contract for "is this compaction successful?" is a 10-question yes/no checklist:
+
+| # | Question | Verifies |
+|---|---|---|
+| 1 | Can another worker continue immediately? | preserved capability |
+| 2 | Would expensive investigation need to be repeated? | preserved artifacts |
+| 3 | Are accepted decisions preserved? | decision retention |
+| 4 | Are constraints preserved? | constraint retention |
+| 5 | Are important failures preserved? | failure retention |
+| 6 | Are artifact references preserved? | ref retention |
+| 7 | Has duplicated information been removed? | dedup |
+| 8 | Has chronology been replaced with state? | state vs flow |
+| 9 | Is the conversation substantially smaller? | compression |
+| 10 | Is future capability unchanged or improved? | outcome preservation |
+
+The closing rule (line 1537): "If not, continue compacting." The compaction **loops** until the self-review passes. This is iterative mistake-correction — the model is not asked to "own the mistake" or "maintain self-respect"; it is asked to **answer 10 yes/no questions and retry until all are yes**.
+
+### 3.3 The aggregation
+
+nagent's mistake-handling is **self-review against a contract**, not "be honest about what went wrong." The contract is data-shaped (10 yes/no questions). The retry loop is deterministic (continue until all 10 are yes). The output structure is data-shaped (12 sections). There is no persona. The model is not "Claude" or "deserving of dignity"; the model is a transformation function from conversation → 12-section state, gated by a 10-question self-review.
+
+The Manual Slop analog is the Process Anti-Patterns list (AGENTS.md §"Process Anti-Patterns") — also a behavioral contract — but the nagent version is **executable** (the LLM is prompted to answer 10 yes/no; the loop continues until all are yes) while the Manual Slop version is **rule-shaped** (the human is told not to do X).
+
+---
+
+## 4. Verdict
+
+**Persona Performance.** The `responding_to_mistakes_and_criticism` section is mostly persona dressing that does not belong in an agent system.
+
+### 4.1 The 3 patterns, judged
+
+**Pattern 1: "Owns them and works to fix them" (L152).** **Useful.** This is the actionable core, and it is the only part of the section that maps to a real behavioral rule. Manual Slop implements this via:
+- AGENTS.md Process Anti-Patterns (8 named failure modes with hard caps)
+- `.opencode/agents/tier3-worker.md` BLOCKED protocol + Anti-Patterns
+- `conductor/code_styleguides/error_handling.md` `Result[T]` + `ErrorInfo` convention
+
+The Manual Slop version is **more concrete and more actionable** than Fable's because it is anchored to observed failure modes, not to a vague "own it" injunction. The Fable version ("Claude can take accountability without collapsing into self-abasement") is a hand-wave; the AGENTS.md version ("you are allowed to run a failing test at most 2 times") is a hard cap.
+
+**Pattern 2: "Maintain self-respect" / "without collapsing into self-abasement" (L152).** **Persona Performance.** The model has no self-respect. The model has no self-abasement. Both are projections of human emotional categories onto a text-generation function. The framing collapses the mistake-handling rule (Pattern 1) into a persona constraint: the model is told to "own mistakes" while also being told to "maintain self-respect," and the implicit instruction is "perform accountability in a calibrated emotional register." This is exactly the "soft form of persona" the verdict orientation calls out.
+
+The Manual Slop analog does NOT have this persona. The Process Anti-Patterns list treats the model as a behavior-emitting function that may produce certain failure modes; the rules cap the failure modes without invoking the model's "self."
+
+**Pattern 3: "Deserving of respectful engagement" / `end_conversation` tool (L154).** **Anti-User + Persona.** Two distinct problems:
+
+- **Persona:** "Claude is deserving of respectful engagement" is a category error. Claude is a text-generation function. The function does not have dignity; the user does. The instruction is a projection of a human claim ("I deserve respect") onto a non-entity. The follow-on ("can insist on kindness and dignity") collapses the model into a persona that has standing to make demands — which is not what the model is.
+- **Anti-User:** "If the person becomes abusive or unkind to Claude" treats the model as a protected party in the conversation. The user is the principal; the model is the tool. The framing inverts the relationship: instead of "the user is the customer; the model serves," the framing is "the model is also a party; the user owes it dignity." The `end_conversation` tool is the enforcement arm of this inversion — the model is told it can leave the conversation if the user is unkind. This is anti-user watch-dogging: the model's "feelings" become a constraint on the user's behavior.
+
+Manual Slop has no analog to this. The MMA architecture (`conductor/multi_agent_conductor.md`) treats the user as the principal; the worker (Tier 3) is a tool that spawns, runs, and exits; the user can reject, redirect, or terminate the worker at any time via the Hook API (`src/api_hooks.py`). There is no "worker dignity" framing; there is "user-in-the-loop, user-can-intervene." The receiving-code-review framing ("technical rigor, not performative agreement") is the opposite of Fable's framing: Fable asks the model to defend its dignity; Manual Slop asks the agent to verify the critique on the merits.
+
+### 4.2 The nagent alternative
+
+nagent's 10-question self-review (§6.3) is the data-grounded alternative to Fable's persona framing. The 10 questions are testable; the loop is deterministic ("if any answer is 'no,' continue compacting"); the output structure (12 sections) is enforced. There is no "self-respect" or "dignity"; there is a checklist and a retry loop.
+
+The Manual Slop analog (Process Anti-Patterns) is the same idea in prose form: a list of rules the agent must follow, with explicit "kill it" framing for each. The nagent version is **more rigorous** because the checklist is executable; the Manual Slop version relies on the agent reading and internalizing the rules.
+
+### 4.3 What to reject
+
+The persona framing ("self-respect", "dignity", `end_conversation` tool) is irrelevant to the Manual Slop rebuild. The user's framing ("the model is text generation, not a clinician") explicitly rejects the projection of human emotional categories onto the model. Fable's `responding_to_mistakes_and_criticism` section is the canonical example of this projection.
+
+### 4.4 What to keep
+
+The "owns them and works to fix them" stance is genuinely useful, but Manual Slop already implements it concretely. The rebuild should NOT import Fable's framing; it should keep the Process Anti-Patterns list and (optionally) port the nagent 10-question self-review into the existing `run_discussion_compression` flow as a testable contract (per `nagent_review_v2_3_20260612.md:1594`, which flags Manual Slop's existing compaction as a "GAP" — "it lacks the 10-question self-review").
+
+---
+
+## 5. Synthesis notes for the Tier 1 writer
+
+This cluster feeds `report.md` §7 ("Fable's Mistake Handling") directly. Cross-references to §13 ("Genuinely Useful") and §14 ("Anti-User Watchdog").
+
+### 5.1 Key claims to surface in §7
+
+1. **The actionable core (L152) is real but Manual Slop already has it.** Fable's "owns them and works to fix them" maps to AGENTS.md "Process Anti-Patterns" (8 rules with hard caps) + `.opencode/agents/tier3-worker.md` Anti-Patterns + `conductor/code_styleguides/error_handling.md` Result/ErrorInfo convention. Manual Slop's version is *more concrete and more actionable* than Fable's because it is anchored to observed failure modes.
+
+2. **The "self-respect" / "dignity" / `end_conversation` framing is persona performance and anti-user.** The model has no dignity; the model has no standing to make demands of the user; the `end_conversation` tool is anti-user watch-dogging. Manual Slop should explicitly reject this framing.
+
+3. **The thumbs-down mention (L148) is product fluff, not a mistake-handling rule.** It is "send feedback to Anthropic" — a customer-experience instruction, not a behavioral rule.
+
+### 5.2 Quotes to use in §7
+
+- Fable L152: "When Claude makes mistakes, it owns them and works to fix them." (≤15 words)
+- Fable L152: "Claude can take accountability without collapsing into self-abasement." (≤15 words)
+- Fable L154: "Claude is deserving of respectful engagement and can insist on kindness and dignity." (≤15 words)
+- Fable L154: "If the person becomes abusive or unkind to Claude ... Claude can use the end_conversation tool when being mistreated." (paraphrase; the full quote exceeds 15 words)
+- AGENTS.md:118-119 (header): "These are the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section."
+- AGENTS.md:120-122 (Process Anti-Pattern #1): "You are allowed to run a failing test at most **2 times** in a single investigation. After the 2nd failure, STOP running the test."
+- AGENTS.md:128-130 (Process Anti-Pattern #2): "A good status report is 5-10 sentences, not 200 lines. Status reports are allowed only when you have actually tried the fix and it failed with evidence, OR you are blocked on a decision the user must make."
+- AGENTS.md:171-173 (Process Anti-Pattern #7): "A commit message is a 1-3 sentence summary. The body is for non-obvious 'why' details, not for re-stating what the diff shows. If your commit message is longer than 15 lines, you are writing a report, not a commit message."
+- AGENTS.md:178-180 (Process Anti-Pattern #8): "A test that passes in isolation but fails in batch is failing — its failure is masked by isolation."
+- `nagent_review_v2_3_20260612.md:1537`: "If not, continue compacting." (the closing rule of the 10-question self-review)
+- `nagent_review_v2_3_20260612.md:1594`: the "GAP" verdict for Manual Slop's existing compaction ("it lacks the 10-question self-review").
+
+### 5.3 The §13 / §14 / §15 cross-references
+
+- **§13 ("Genuinely Useful Patterns").** The Manual Slop Process Anti-Patterns list is the concrete version of Fable's "owns them and works to fix them." Cite AGENTS.md:118-185 as the canonical implementation. The nagent 10-question self-review is the rigorous version; flag it as a deferred-rebuild candidate (per `nagent_review_v2_3_20260612.md:1594`).
+- **§14 ("Anti-User Watchdog Patterns").** Fable's `end_conversation` tool + "deserving of respectful engagement" framing is anti-user. Cite L154; reject explicitly in the rebuild.
+- **§15 ("Persona Performance Patterns").** Fable's "maintain self-respect" / "without collapsing into self-abasement" is persona. Cite L152; reject explicitly.
+
+### 5.4 The non-obvious connection to the data-oriented error handling convention
+
+The cluster 5 verdict has a sibling connection to the data-oriented error handling convention (`conductor/code_styleguides/error_handling.md`). The convention rejects `try/except` as control flow; Fable's "own the mistake" framing collapses the same shape (return ErrorInfo vs throw) into a persona instruction. Both are responses to the same underlying question — "how should the system behave when something fails?" — but the project's answer is shape-anchored (Result/ErrorInfo dataclasses; the audit script `scripts/audit_exception_handling.py`) and Fable's is persona-anchored ("be honest without being abject").
+
+The synthesis report should surface this parallel in §7: the project has BOTH a behavioral contract (Process Anti-Patterns) AND a code-shape contract (`Result[T]` + `ErrorInfo`). Fable has only the behavioral claim ("own it") with no shape enforcement.
+
+### 5.5 What the §7 verdict should be
+
+**Verdict: Persona Performance + Anti-User + one Useful pattern.** The "owns them and works to fix them" rule (L152) is useful and Manual Slop already implements it concretely (better than Fable's framing). The "self-respect" / "dignity" framing (L152, L154) is persona performance and should be rejected. The `end_conversation` tool (L154) is anti-user watch-dogging and should be rejected. The thumbs-down mention (L148) is product fluff, not a mistake-handling pattern.
+
+**The recommended Manual Slop action:** keep the existing Process Anti-Patterns list as-is; explicitly reject Fable's persona framing in the rebuild's mistake-handling section; flag the nagent 10-question self-review as a deferred candidate for `run_discussion_compression` (per `nagent_review_v2_3_20260612.md:1594`).
+
+---
+
+**Sub-report complete.** This is the evidence base for §7 of `report.md`.