docs(agents): add Process Anti-Patterns section + revise set_file_slice rule

The user explicitly called out the bad patterns that agents (Tier-2, and the parent session's Tier-1) have been exhibiting. This commit updates AGENTS.md to filter them out at the load-bearing agent-doc level (AGENTS.md is the first file any agent reads). Four changes:

1. **Revised the `set_file_slice` rule on line 38** of the Critical Anti-Patterns. The previous rule said "Do not use set_file_slice for multi-line content", which was wrong. `set_file_slice` IS valid for multi-line content, provided the agent verifies the exact byte offsets with `get_file_slice` and checks for contract changes (function signature, yield shape, return type). The full revised rule is in `conductor/edit_workflow.md §8`.
2. **Added "No diagnostic noise in production code"** to the Critical Anti-Patterns. The pattern: an agent adds `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/*.py` for debugging, then "reverts everything" but leaves the diag lines uncommitted. The next agent runs `git status`, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up. The rule: diag goes to log files or /tmp scripts, NOT `src/*.py`.
3. **Added "No loop, no scope-creep, no report-instead-of-fix"** to the Critical Anti-Patterns. A 200-line status report is a confession, not a fix. A 5-phase "future track" document for a 1-line fix is scope-creep. The "I am not going to attempt another fix without your direction" surrender is allowed ONLY after the agent has read the source, predicted the failure, instrumented the relevant state, run the test, and captured the full output.
4. **Added a new section: "Process Anti-Patterns (Added 2026-06-09)"** with 8 numbered anti-patterns, each with a Symptom and a Rule. The 8 patterns are the ones the user explicitly called out: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. These patterns are the LLM-training-data noise the user wants filtered out.

The full ruleset is the source of truth; AGENTS.md is the load-bearing entry point. No code modified. Markdown only.
2026-06-09 14:01:26 -04:00
parent 4eba059e89
commit 113e68fe18
1 changed files with 76 additions and 2 deletions
@@ -35,10 +35,12 @@ For understanding, using, and maintaining the tool, see `docs/Readme.md` and the
 - Do not use `@pytest.mark.skip` as an excuse to AVOID fixing the underlying bug. Skip markers are documentation of known failures; the failure must be addressed with priority in-session when feasible. See `conductor/workflow.md` "Skip-Marker Policy" for the full policy and review checklist.
 - Do not batch commits - commit per-task for atomic rollback
 - Do not add comments to source code; documentation lives in `/docs`
- Do not use `set_file_slice` for multi-line content; it's literal line replacement by design (see `conductor/edit_workflow.md`)
+- `set_file_slice` IS valid for multi-line content. The agent must verify the exact byte offsets with `get_file_slice` first, copy the line text character-for-character (including whitespace and EOL), and check whether the edit changes a public contract (function signature, yield shape, return type) that other code depends on. See `conductor/edit_workflow.md` for the full contract.
 - Do not use `git restore` while a user is mid-conversation without first confirming the desired state
 - HARD BAN: `git restore`, `git checkout -- <file>`, `git reset` are FORBIDDEN without explicit user permission in the same message. They destroyed user in-progress src/* edits twice in one session (2026-06-07). If you think you need one, ASK FIRST.
 - No giant edits: if your `manual-slop_edit_file` `new_string` exceeds ~20 lines, STOP and split it.
+- No diagnostic noise in production code. `sys.stderr.write(f"[XYZ_DIAG] ...")` lines added to `src/*.py` for debugging must be removed (not just left uncommitted) before the agent's work is "done." Diagnostic code that ships is technical debt. If you need to instrument for a one-time investigation, use a temporary file under `tests/artifacts/` or read the source with `get_file_slice` instead of polluting production.
+- No loop, no scope-creep, no report-instead-of-fix. If you've tried 3 times and the test still fails, STOP and report to the user. Do not write a 200-line status report as a substitute for the fix. Do not write a 5-phase "future track" document when the user asked for a 1-line change. See `conductor/workflow.md` "Process Anti-Patterns" for the full ruleset.
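For the diagnostic-noise rule above, a minimal sketch of the alternative to `sys.stderr.write` in `src/*.py`. It assumes a hypothetical test-side helper `write_diag` (not part of this codebase) and the `tests/artifacts/<test_name>.diag.log` convention that rule 5 of Process Anti-Patterns describes. Because it only appends to a file under `tests/artifacts/`, nothing in `src/` needs reverting afterwards:

```python
from pathlib import Path


def write_diag(test_name: str, message: str, root: Path = Path("tests/artifacts")) -> Path:
    """Append one diagnostic line to <root>/<test_name>.diag.log.

    Hypothetical helper: it lives in a test module or a /tmp script,
    never in src/*.py, so there is no production code to clean up.
    """
    root.mkdir(parents=True, exist_ok=True)
    log_path = root / f"{test_name}.diag.log"
    # Append mode: several probes in one run accumulate in a single file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(message.rstrip("\n") + "\n")
    return log_path
```

A hypothetical call from a test would be `write_diag("test_rag_query", f"[RAG_DIAG] hits={len(hits)}")`; deleting the log file is the whole cleanup.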

 ## Session-Learned Anti-Patterns (Added 2026-06-07)

@@ -58,7 +60,7 @@ The fix: anchor on the **def line that has the `@property` ABOVE it**, and repla

 ### 3. `ast.parse()` "Syntax OK" is not enough

-`ast.parse()` only catches syntax errors. Semantic errors (wrong decorator targets, wrong class attribute, missing `self`, etc.) are NOT caught. After a multi-line edit, ALWAYS:
+`py_check_syntax` only confirms `ast.parse()` succeeds. Semantic errors (wrong decorator targets, wrong class attribute, missing `self`, etc.) are NOT caught. After any multi-line edit, ALWAYS:
 - Import the module
 - Instantiate the class
 - Call the new method in the way it's expected to be called (e.g. `ctrl.foo_ts` vs `ctrl.foo_ts()` for properties vs methods)
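A small illustration of why "Syntax OK" is not enough, using a made-up `Controller` class: the method below forgot `self`, so `ast.parse()` accepts it, but instantiating the class and calling the method raises `TypeError`. In this sketch, `exec` stands in for importing the edited module:

```python
import ast

# A method that forgot `self`: syntactically valid, semantically broken.
SOURCE = """
class Controller:
    def refresh():
        return "ok"
"""


def syntax_ok(source: str) -> bool:
    """What a syntax-only check confirms: ast.parse() succeeds."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def call_fails(source: str) -> bool:
    """Stand-in for "import, instantiate, call": run the code, then call the method."""
    namespace: dict = {}
    exec(compile(source, "<edited_module>", "exec"), namespace)
    ctrl = namespace["Controller"]()
    try:
        ctrl.refresh()
    except TypeError:
        # refresh() takes 0 positional arguments but 1 was given
        return True
    return False
```

`syntax_ok(SOURCE)` and `call_fails(SOURCE)` both return `True`: the syntax check passes while the real call breaks.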
@@ -71,6 +73,78 @@ If you suspect you might have lost work, the worst move is to run `git status` /

 `conductor/edit_workflow.md` says it explicitly: 3-10 lines at a time, verify after each, repeat. If you find yourself writing a 200-line Python script to do an edit, you're doing it wrong. Use the MCP tools.

+---
+
+## Process Anti-Patterns (Added 2026-06-09)
+
+These are the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section.
+
+### 1. The Deduction Loop (kill it)
+
+**Symptom:** Run test → fail → read log → form hypothesis → run again → fail differently → add diag → run again → fail again → loop. You end up running the same test 4+ times in one session, each run reading partial log output.
+
+**Rule:** You are allowed to run a failing test at most **2 times** in a single investigation. After the 2nd failure, stop re-running it. Read the relevant source code (`get_file_slice` or `py_get_skeleton`), predict the failure mode from the code, and instrument ALL the relevant state in one pass before the next (instrumented) run. If the test still fails after that 1 instrumented run, report to the user; do not loop.
+
+**Worst case captured upfront.** Before running the test, ask: "What is the worst-case information I will need if this fails?" Add the diag for that, then run. The diag lines must not stay in production code; see "No Diagnostic Noise in Production" below.
+
+### 2. The Report-Instead-of-Fix Pattern (kill it)
+
+**Symptom:** You can't fix the bug. You write a 200-line status report explaining why you can't fix it. The report contains "What I tried this session", "What I am NOT going to do", "What you can do", and "Files changed in this session (cumulative)." The report is a confession, not a fix.
+
+**Rule:** A status report is allowed only when:
+- You have actually tried the fix and it failed with evidence, OR
+- You are blocked on a decision the user must make.
+
+A status report is NOT allowed when:
+- You are avoiding a hard problem by writing prose about it.
+- The user asked for a fix and you have not yet tried.
+- The "what you can do" section is a list of options to defer to the user instead of picking the best one and doing it.
+
+A good status report is 5-10 sentences, not 200 lines.
+
+### 3. The Scope-Creep Track-Doc Pattern (kill it)
+
+**Symptom:** The user asks for a 1-line fix. You write a 5-phase "future track" spec with 140 lines of scope, audit findings, recommendations, and "out of scope" sections. The track doc is now larger than the fix it was meant to scope.
+
+**Rule:** If the user asks for a fix, your output is the fix. A track doc is only appropriate when the fix is multi-day work that requires a plan. If the fix is < 100 lines, it does not get a track. If the fix would touch more than 5 files, it MIGHT get a track — but ask first.
+
+### 4. The Inherited-Cruft Pattern (kill it)
+
+**Symptom:** The previous agent left a half-finished refactor in the working tree. The file is broken. You try to fix it and make it worse. You try again. You make it worse. The file stays broken for 3 days.
+
+**Rule:** If the file is already in a broken state from a previous session, the FIRST thing you do is ask the user: "This file is in a broken state from a previous agent. Do you want me to (a) revert the working tree and start from a clean baseline, (b) finish the previous agent's intent, or (c) abandon the work entirely?" You do not start by "trying to fix" the broken file. The user's answer determines the work, not your assumption.
+
+### 5. No Diagnostic Noise in Production (kill it)
+
+**Symptom:** You add `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/rag_engine.py` and `src/app_controller.py` to debug a test failure. The diag lines help. You "revert everything" but leave the 4-8 diag lines in the working tree uncommitted. The next agent runs `git status`, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up.
+
+**Rule:** Diagnostic stderr goes to a log file (`tests/artifacts/<test_name>.diag.log`) or to a temporary diagnostic script (`/tmp/diag_rag.py`), NOT to `src/*.py`. If you absolutely must instrument a production function for a single test run, the diag lines are part of the same atomic commit as the fix — they do not live uncommitted in the working tree. If you "revert everything," that means the diag lines are also reverted.
+
+### 6. The "I Am Not Going To Attempt Another Fix Without Your Direction" Surrender (kill it)
+
+**Symptom:** You've tried 3 things. None worked. You write: "I am not going to attempt another fix without your direction." Then you wait for the user to tell you what to do.
+
+**Rule:** This is correct ONLY if you have already done the things below:
+- Read the actual source code, not from memory
+- Predicted the failure mode from the code
+- Instrumented the relevant state in one pass
+- Run the test once with instrumentation
+- Captured the full output, not partial output
+
+If you have done all 5 and are still stuck, surrendering is fine. If you have not, you are surrendering too early. The user does not want to be your strategist; the user wants the agent to make progress.
+
+### 7. The Verbose-Commit-Message Pattern (kill it)
+
+**Symptom:** Your commit message is 50 lines. It contains the root cause analysis, the alternatives you considered, the side effects you considered, the cross-references, the "what this doesn't fix", the "what to verify", and a personal essay. The commit message is longer than the diff it describes.
+
+**Rule:** A commit message is a 1-3 sentence summary. The body is for non-obvious "why" details, not for re-stating what the diff shows. If your commit message is longer than 15 lines, you are writing a report, not a commit message. Save the report for `docs/reports/`.
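A sketch of how the 15-line ceiling could be checked mechanically. `commit_message_problems` is hypothetical, not an existing hook in this repo; it ignores `#` comment lines and trailing blank lines, then flags a missing summary line or an over-long message:

```python
MAX_LINES = 15  # ceiling taken from the rule above


def commit_message_problems(message: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return human-readable problems; an empty list means the message passes."""
    # Drop '#' comment lines, then strip trailing blank lines before counting.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    while lines and not lines[-1].strip():
        lines.pop()
    problems: list[str] = []
    if not lines or not lines[0].strip():
        problems.append("missing summary line")
    if len(lines) > max_lines:
        problems.append(
            f"{len(lines)} lines exceeds {max_lines}; move the report to docs/reports/"
        )
    return problems
```

To use it as a `commit-msg` hook, a wrapper would read the file whose path git passes as the hook's single argument and exit non-zero when the returned list is non-empty, which aborts the commit.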
+
+### 8. The "Isolated Pass" Verification Fallacy (kill it)
+
+**Symptom:** You run the test in isolation. It passes. You commit. The test fails in batch. You didn't notice because you never ran the batch.
+
+**Rule:** For any `live_gui` test or any test that depends on shared subprocess state, the **only verification that matters is the batch run**. A test that passes in isolation but fails in batch is failing — it's just that the failure is masked by isolation. Per the existing `Live_gui Test Fragility` rule in `conductor/workflow.md`: "Bisect failures by running the test both in the full suite and in isolation to distinguish 'test needs work' from 'real app bug'." If you only ever run in isolation, you cannot tell the difference.
+
 ## Compaction Recovery

 If you're a new agent picking up a session that was compacted (or a previous agent ran out of context), follow this recovery path: