The user explicitly called out the bad patterns the agents (Tier-2 and the parent session's Tier-1) have been exhibiting. This commit updates AGENTS.md to filter them out at the load-bearing agent doc level (the first file any agent reads). Four changes:

1. **Revised the `set_file_slice` rule on line 38** of the Critical Anti-Patterns. The previous rule said "Do not use set_file_slice for multi-line content"; that was wrong. `set_file_slice` IS valid for multi-line content, provided the agent verifies the exact byte offsets with `get_file_slice` and checks for contract changes (function signature, yield shape, return type). The full revised rule is in `conductor/edit_workflow.md §8`.
2. **Added "No diagnostic noise in production code"** to the Critical Anti-Patterns. The pattern: the agent adds `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/*.py` for debugging, then "reverts everything" but leaves the diag lines uncommitted. The next agent runs `git status`, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up. The rule: diag output goes to log files or /tmp scripts, NOT `src/*.py`.
3. **Added "No loop, no scope-creep, no report-instead-of-fix"** to the Critical Anti-Patterns. The 200-line status report is a confession, not a fix. The 5-phase "future track" document for a 1-line fix is scope-creep. The "I am not going to attempt another fix without your direction" surrender is allowed ONLY if the agent has already read, predicted, instrumented, run, and captured.
4. **Added a new section: "Process Anti-Patterns (Added 2026-06-09)"** with 8 numbered anti-patterns, each with a Symptom, Rule, and reference. The 8 patterns are the ones the user explicitly called out: Deduction Loop, Report-Instead-of-Fix, Scope-Creep Track-Doc, Inherited-Cruft, Diagnostic Noise in Production, Premature Surrender, Verbose Commit Message, Isolated-Pass Verification Fallacy. These rules filter out patterns the user regards as noise picked up from LLM training data.
The full ruleset is the source of truth; AGENTS.md is the load-bearing entry point. No code modified. Markdown only.
AGENTS.md
What This Is
Manual Slop is a local GUI orchestrator for LLM-driven coding sessions. It bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe async pipeline; every AI-generated payload passes through a human-auditable gate before execution.
The Conductor Convention
All AI agents consuming this project must read ./conductor/workflow.md and treat ./conductor/tracks.md as the task registry. Track implementation follows the TDD protocol documented in conductor/workflow.md with per-file atomic commits and git notes.
Guidance for AI Agents
Detailed agent guidance lives in the following locations — read these directly, do not duplicate content here:
- MUST READ (correct edit workflow): `conductor/edit_workflow.md`
- Operational workflow: `conductor/workflow.md`
- Code style and process: `conductor/product-guidelines.md`
- Tech stack and constraints: `conductor/tech-stack.md`
- Product context: `conductor/product.md`
- MMA orchestrator role: `mma-orchestrator/SKILL.md`
- Tier 1 (Orchestrator): `.agents/skills/mma-tier1-orchestrator/SKILL.md`
- Tier 2 (Tech Lead): `.agents/skills/mma-tier2-tech-lead/SKILL.md`
- Tier 3 (Worker): `.agents/skills/mma-tier3-worker/SKILL.md`
- Tier 4 (QA): `.agents/skills/mma-tier4-qa/SKILL.md`
Human-Facing Documentation
For understanding, using, and maintaining the tool, see docs/Readme.md and the 14 deep-dive guides it indexes.
Critical Anti-Patterns
- Do not read full files >50 lines without first using `py_get_skeleton` or `get_file_summary`
- Do not modify the tech stack without updating `conductor/tech-stack.md` first
- Do not skip TDD - write failing tests before implementation
- Do not use `@pytest.mark.skip` as an excuse to AVOID fixing the underlying bug. Skip markers are documentation of known failures; the failure must be addressed with priority in-session when feasible. See `conductor/workflow.md` "Skip-Marker Policy" for the full policy and review checklist.
- Do not batch commits - commit per-task for atomic rollback
- Do not add comments to source code; documentation lives in `/docs`
- `set_file_slice` IS valid for multi-line content. The agent must verify the exact byte offsets with `get_file_slice` first, copy the line text character-for-character (including whitespace and EOL), and check whether the edit changes a public contract (function signature, yield shape, return type) that other code depends on. See `conductor/edit_workflow.md` for the full contract.
- Do not use `git restore` while a user is mid-conversation without first confirming the desired state
- HARD BAN: `git restore`, `git checkout -- <file>`, `git reset` are FORBIDDEN without explicit user permission in the same message. They destroyed user in-progress `src/*` edits twice in one session (2026-06-07). If you think you need one, ASK FIRST.
- No giant edits: if your `manual-slop_edit_file` `new_string` exceeds ~20 lines, STOP and split it.
- No diagnostic noise in production code. `sys.stderr.write(f"[XYZ_DIAG] ...")` lines added to `src/*.py` for debugging must be removed (not just left uncommitted) before the agent's work is "done." Diagnostic code that ships is technical debt. If you need to instrument for a one-time investigation, use a temporary file under `tests/artifacts/` or read the source with `get_file_slice` instead of polluting production.
- No loop, no scope-creep, no report-instead-of-fix. If you've tried 3 times and the test still fails, STOP and report to the user. Do not write a 200-line status report as a substitute for the fix. Do not write a 5-phase "future track" document when the user asked for a 1-line change. See `conductor/workflow.md` "Process Anti-Patterns" for the full ruleset.
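The verify-before-write part of the `set_file_slice` rule can be sketched in plain Python. This is not the MCP tool: `verified_splice` is a hypothetical helper, and it assumes offsets index the file's raw bytes, so CRLF vs LF and trailing whitespace all count. It refuses to write unless the bytes currently at the offsets match, character-for-character, what the agent believes is there. It does not cover the rule's other requirement, checking whether the edit changes a public contract.

```python
import tempfile
from pathlib import Path


def verified_splice(path: Path, start: int, end: int, expected: bytes, replacement: bytes) -> None:
    """Hypothetical helper: replace data[start:end] only if it equals `expected` exactly."""
    data = path.read_bytes()  # raw bytes, so CRLF vs LF and trailing whitespace are compared too
    actual = data[start:end]
    if actual != expected:
        # Stale offsets or whitespace/EOL drift: refuse to write rather than corrupt the file.
        raise ValueError(f"slice mismatch at {start}:{end}: expected {expected!r}, found {actual!r}")
    path.write_bytes(data[:start] + replacement + data[end:])


if __name__ == "__main__":
    demo = Path(tempfile.mkdtemp()) / "demo.py"
    demo.write_bytes(b"def f():\r\n    return 1\r\n")  # a CRLF file
    # Bytes 10..24 are the second line including its CRLF.
    verified_splice(demo, 10, 24, b"    return 1\r\n", b"    return 2\r\n")
    print(demo.read_bytes())
```

Passing an LF-terminated `expected` against this CRLF file raises instead of silently writing a mixed-EOL line.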
Session-Learned Anti-Patterns (Added 2026-06-07)
These burned the most time in a recent startup_speedup session. The rules below are short because the rules above (and conductor/edit_workflow.md) are the source of truth.
1. ALWAYS use the proper edit tool, not a custom script
- For Python source edits, use `manual-slop_edit_file` with `old_string`/`new_string`. Do NOT write a standalone Python script that does file-level replacements.
- Custom scripts fail silently on: wrong indent in `new_content`, wrong EOL (CRLF vs LF) in `old_string` searches, wrong exact-string match (whitespace drift).
- When a script fails, debug the actual error message. Do not dismiss it and try a different approach.
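The "fails silently" claim is easy to reproduce with plain `str.replace`: a search string typed with LF matches nothing in a CRLF file, and `replace` returns the text unchanged without raising. The sketch below shows that failure and an illustrative guard (`replace_exactly_once` is a made-up name, not project code) that turns it into a loud error. It only explains the failure mode; the rule above still says to make the real edit with `manual-slop_edit_file`.

```python
def replace_exactly_once(text: str, old: str, new: str) -> str:
    """Illustrative guard: raise unless `old` occurs exactly once in `text`."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    return text.replace(old, new, 1)


if __name__ == "__main__":
    source = "def f():\r\n    return 1\r\n"  # file saved with CRLF line endings
    old = "def f():\n    return 1\n"         # search string typed with LF
    new = "def f():\n    return 2\n"

    # The naive replacement "succeeds" and silently changes nothing.
    print(source.replace(old, new) == source)  # True

    # The guarded version turns the mismatch into an error.
    try:
        replace_exactly_once(source, old, new)
    except ValueError as exc:
        print(exc)
```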
2. The decorator-orphan pitfall
When inserting new methods before an existing `@property` `def`, a script that anchors on the `def` line alone will leave the `@property` decorator on the line above your new methods. The decorator then decorates YOUR new method instead, and the original method is no longer a property, which breaks any subsequent `@foo.setter` on it. The file passes `ast.parse()` but blows up at import time.
The fix: anchor on the `def` line together with the `@property` ABOVE it, and replace the pair `@property\n    def foo(...)` with `def your_new(...)\n    ...\n    @property\n    def foo(...)`, keeping the decorator attached to its original method. Or anchor on a different non-decorated landmark (e.g. `self._init_actions()`).
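A minimal reproduction of the pitfall, assuming the edit is done by string replacement on the source text. The class `Ctrl` and the method names are made up for illustration. Both variants pass `ast.parse()`, but executing the broken one fails while the class body runs, which is exactly what happens on import.

```python
import ast

ORIGINAL = '''
class Ctrl:
    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, value):
        self._foo = value
'''

NEW_METHOD = '''    def your_new(self):
        return 42

'''

# Wrong: anchor on the bare `def foo` line, so the new method lands under the decorator.
broken = ORIGINAL.replace("    def foo(self):\n", NEW_METHOD + "    def foo(self):\n", 1)

# Right: anchor on the decorator + def pair and keep them together.
fixed = ORIGINAL.replace(
    "    @property\n    def foo(self):\n",
    NEW_METHOD + "    @property\n    def foo(self):\n",
    1,
)


def imports_cleanly(src: str) -> bool:
    """Execute the source like an import would; return False if class creation raises."""
    ast.parse(src)  # both variants pass this check
    try:
        exec(compile(src, "<edited>", "exec"), {})
    except AttributeError:
        return False  # 'function' object has no attribute 'setter'
    return True


print(imports_cleanly(broken))  # False
print(imports_cleanly(fixed))   # True
```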
3. ast.parse() "Syntax OK" is not enough
py_check_syntax only confirms ast.parse() succeeds. Semantic errors (wrong decorator targets, wrong class attribute, missing self, etc.) are NOT caught. After any multi-line edit, ALWAYS:
- Import the module
- Instantiate the class
- Call the new method in the way it's expected to be called (e.g. `ctrl.foo_ts` vs `ctrl.foo_ts()` for properties vs methods)
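The three post-edit checks can be scripted. This is a sketch, not an existing project tool: `smoke_check` and its parameters are hypothetical, and it assumes the class can be constructed with no arguments (pass real constructor arguments if it cannot). It imports the module, instantiates the class, confirms whether the attribute is a property or a method without triggering it, and then accesses it the matching way.

```python
import importlib
import inspect


def smoke_check(module_name: str, class_name: str, attr: str, *, expect_property: bool):
    """Hypothetical post-edit check: import, instantiate, then access `attr` the expected way."""
    module = importlib.import_module(module_name)  # surfaces import-time errors
    cls = getattr(module, class_name)
    obj = cls()  # ASSUMPTION: no-argument constructor

    # inspect.getattr_static returns the raw class attribute without running the descriptor.
    is_property = isinstance(inspect.getattr_static(obj, attr), property)
    if is_property != expect_property:
        actual = "property" if is_property else "method"
        wanted = "property" if expect_property else "method"
        raise TypeError(f"{class_name}.{attr} is a {actual}, expected a {wanted}")

    # ctrl.foo_ts for a property, ctrl.foo_ts() for a method.
    return getattr(obj, attr) if is_property else getattr(obj, attr)()
```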
4. The "I'll just check git status" trap (now a HARD BAN, see Critical list above)
If you suspect you might have lost work, the worst move is to run git status / git restore while a frantic user is watching. Pause, read the actual file, and admit what state you're in. The user knows their state better than you do. This trap has now caused irrecoverable data loss twice in one session — the ban is enforced above.
5. Small, verified edits beat big scripts
conductor/edit_workflow.md says it explicitly: 3-10 lines at a time, verify after each, repeat. If you find yourself writing a 200-line Python script to do an edit, you're doing it wrong. Use the MCP tools.
Process Anti-Patterns (Added 2026-06-09)
These are the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section.
1. The Deduction Loop (kill it)
Symptom: Run test → fail → read log → form hypothesis → run again → fail differently → add diag → run again → fail again → loop. You end up running the same test 4+ times in one session, each run reading partial log output.
Rule: You are allowed to run a failing test at most 2 times in a single investigation. After the 2nd failure, STOP running the test. Read the relevant source code (get_file_slice or py_get_skeleton), predict the failure mode from the code, and instrument ALL the relevant state in one pass before the next run. If the test still fails after 1 instrumented run, report to the user — do not loop.
Worst case captured upfront. Before running the test, ask: "what is the worst-case information I will need if this fails?" Add the diag for that, then run. The diag lines themselves are wasteful in production — see "No Diagnostic Noise in Production" below.
2. The Report-Instead-of-Fix Pattern (kill it)
Symptom: You can't fix the bug. You write a 200-line status report explaining why you can't fix it. The report contains "What I tried this session", "What I am NOT going to do", "What you can do", and "Files changed in this session (cumulative)." The report is a confession, not a fix.
Rule: A status report is allowed only when:
- You have actually tried the fix and it failed with evidence, OR
- You are blocked on a decision the user must make.
A status report is NOT allowed when:
- You are avoiding a hard problem by writing prose about it.
- The user asked for a fix and you have not yet tried.
- The "what you can do" section is a list of options to defer to the user instead of picking the best one and doing it.
A good status report is 5-10 sentences, not 200 lines.
3. The Scope-Creep Track-Doc Pattern (kill it)
Symptom: The user asks for a 1-line fix. You write a 5-phase "future track" spec with 140 lines of scope, audit findings, recommendations, and "out of scope" sections. The track doc is now larger than the fix it was meant to scope.
Rule: If the user asks for a fix, your output is the fix. A track doc is only appropriate when the fix is multi-day work that requires a plan. If the fix is < 100 lines, it does not get a track. If the fix would touch more than 5 files, it MIGHT get a track — but ask first.
4. The Inherited-Cruft Pattern (kill it)
Symptom: The previous agent left a half-finished refactor in the working tree. The file is broken. You try to fix it and make it worse. You try again. You make it worse. The file stays broken for 3 days.
Rule: If the file is already in a broken state from a previous session, the FIRST thing you do is ask the user: "this file is in a broken state from a previous agent. do you want me to (a) revert the working tree and start from a clean baseline, (b) finish the previous agent's intent, or (c) abandon the work entirely?" You do not start by "trying to fix" the broken file. The user's answer determines the work, not your assumption.
5. No Diagnostic Noise in Production (kill it)
Symptom: You add `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/rag_engine.py` and `src/app_controller.py` to debug a test failure. The diag lines help. You "revert everything" but leave the 4-8 diag lines in the working tree uncommitted. The next agent runs `git status`, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up.
Rule: Diagnostic stderr goes to a log file (tests/artifacts/<test_name>.diag.log) or to a temporary diagnostic script (/tmp/diag_rag.py), NOT to src/*.py. If you absolutely must instrument a production function for a single test run, the diag lines are part of the same atomic commit as the fix — they do not live uncommitted in the working tree. If you "revert everything," that means the diag lines are also reverted.
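A minimal sketch of sending diagnostics to `tests/artifacts/<test_name>.diag.log` instead of `sys.stderr`, using only the standard `logging` module. The `diag_logger` helper and the `diag.<test_name>` logger naming are assumptions, not existing project code; the intent is that it lives in test code or a throwaway script, so nothing under `src/` changes and there is nothing to "revert."

```python
import logging
from pathlib import Path


def diag_logger(test_name: str, artifacts_dir: Path = Path("tests/artifacts")) -> logging.Logger:
    """Hypothetical helper: a logger that writes only to <artifacts_dir>/<test_name>.diag.log."""
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"diag.{test_name}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep diag lines out of stderr and the root logger
    if not logger.handlers:  # reuse the existing handler on repeat calls (first path wins)
        handler = logging.FileHandler(artifacts_dir / f"{test_name}.diag.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Usage from a test: `diag_logger("test_rag_query").debug("[RAG_DIAG] index_ready=%s", ready)`. Deleting the log file is the whole cleanup.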
6. The "I Am Not Going To Attempt Another Fix Without Your Direction" Surrender (kill it)
Symptom: You've tried 3 things. None worked. You write: "I am not going to attempt another fix without your direction." Then you wait for the user to tell you what to do.
Rule: This is correct ONLY if you have already done the things below:
- Read the actual source code, not from memory
- Predicted the failure mode from the code
- Instrumented the relevant state in one pass
- Run the test once with instrumentation
- Captured the full output, not partial output
If you have done all 5 and are still stuck, surrendering is fine. If you have not, you are surrendering too early. The user does not want to be your strategist; the user wants the agent to make progress.
7. The Verbose-Commit-Message Pattern (kill it)
Symptom: Your commit message is 50 lines. It contains the root cause analysis, the alternatives you considered, the side effects you considered, the cross-references, the "what this doesn't fix", the "what to verify", and a personal essay. The commit message is longer than the diff it describes.
Rule: A commit message is a 1-3 sentence summary. The body is for non-obvious "why" details, not for re-stating what the diff shows. If your commit message is longer than 15 lines, you are writing a report, not a commit message. Save the report for docs/reports/.
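The 15-line ceiling is easy to check mechanically. A sketch of a check that could be wired into a git `commit-msg` hook, which receives the path of the proposed message file as its only argument and aborts the commit on a non-zero exit. The function name is hypothetical and the hook wiring is an assumption about how you would install it; lines starting with `#` are skipped in case git comment lines are still present.

```python
import sys
from pathlib import Path

MAX_LINES = 15  # from the rule: longer than this is a report, not a commit message


def check_commit_message(message: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = [ln for ln in message.rstrip("\n").splitlines() if not ln.startswith("#")]
    problems = []
    if not lines or not lines[0].strip():
        problems.append("missing summary line")
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines (max {max_lines}); move the report to docs/reports/")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed commit-msg hook usage: git passes the message file path as argv[1].
    issues = check_commit_message(Path(sys.argv[1]).read_text(encoding="utf-8"))
    for issue in issues:
        print(f"commit-msg: {issue}", file=sys.stderr)
    sys.exit(1 if issues else 0)
```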
8. The "Isolated Pass" Verification Fallacy (kill it)
Symptom: You run the test in isolation. It passes. You commit. The test fails in batch. You didn't notice because you never ran the batch.
Rule: For any live_gui test or any test that depends on shared subprocess state, the only verification that matters is the batch run. A test that passes in isolation but fails in batch is failing — it's just that the failure is masked by isolation. Per the existing Live_gui Test Fragility rule in conductor/workflow.md: "Bisect failures by running the test both in the full suite and in isolation to distinguish 'test needs work' from 'real app bug'." If you only ever run in isolation, you cannot tell the difference.
Compaction Recovery
If you're a new agent picking up a session that was compacted (or a previous agent ran out of context), follow this recovery path:
- Read the most recent `docs/reports/PLANNING_DIGEST_<date>.md` if one exists. It indexes the planning artifacts and explains the design decisions behind the active tracks.
- For each in-flight track, read `conductor/tracks/<track_id>/state.toml` to see `current_phase`; read `conductor/tracks/<track_id>/plan.md` for the task breakdown.
- Check `git log --oneline -20` to see what has been committed; the most recent commits in `conductor/tracks/<track_id>/` are the latest work.
- Run the audit scripts (`scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`) to see the current state of the codebase.
- Resume from the next unchecked task in `state.toml`. The per-task commit discipline means each commit is a safe rollback point.
The track's metadata.json has a verification_criteria field — this is the definition of "done" for the track. If all the criteria are checked, the track is complete.
For deeper recovery, see conductor/workflow.md "Compaction Recovery" (the same pattern, but workflow-level).