conductor-tier2 113e68fe18 docs(agents): add Process Anti-Patterns section + revise set_file_slice rule

The user explicitly called out the bad patterns the agents
(Tier-2 and the parent session's Tier-1) have been exhibiting.
This commit updates AGENTS.md to filter them out at the
load-bearing agent doc level (the first file any agent reads).

Four changes:

1. **Revised the `set_file_slice` rule on line 38** of the
   Critical Anti-Patterns. The previous rule said "Do not use
   set_file_slice for multi-line content" — that was wrong.
   `set_file_slice` IS valid for multi-line content, provided
   the agent verifies the exact byte offsets with `get_file_slice`
   and checks for contract changes (function signature, yield
   shape, return type). The full revised rule is in
   `conductor/edit_workflow.md §8`.

2. **Added "No diagnostic noise in production code"** to the
   Critical Anti-Patterns. The pattern: agent adds
   `sys.stderr.write(f"[RAG_DIAG] ...")` to `src/*.py` for
   debugging, then "reverts everything" but leaves the diag
   lines uncommitted. The next agent runs `git status`, sees the
   diag lines, and either commits them by accident or spends
   10 min cleaning them up. The rule: diag goes to log files or
   /tmp scripts, NOT `src/*.py`.

3. **Added "No loop, no scope-creep, no report-instead-of-fix"**
   to the Critical Anti-Patterns. The 200-line status report
   is a confession, not a fix. The 5-phase "future track"
   document for a 1-line fix is scope-creep. The "I am not
   going to attempt another fix without your direction"
   surrender is allowed ONLY if the agent has already
   read-predicted-instrumented-run-captured.

4. **Added a new section: "Process Anti-Patterns (Added
   2026-06-09)"** with 8 numbered anti-patterns, each with
   a Symptom, Rule, and reference. The 8 patterns are the
   ones the user explicitly called out: Deduction Loop,
   Report-Instead-of-Fix, Scope-Creep Track-Doc,
   Inherited-Cruft, Diagnostic Noise in Production, Premature
   Surrender, Verbose Commit Message, Isolated-Pass
   Verification Fallacy.

These rules filter out the noise agents inherit from LLM
training data. The full ruleset is the source of truth; AGENTS.md
is the load-bearing entry point.

No code modified. Markdown only.

2026-06-09 14:01:26 -04:00

AGENTS.md

What This Is

Manual Slop is a local GUI orchestrator for LLM-driven coding sessions. It bridges high-latency AI reasoning with a low-latency ImGui render loop via a thread-safe async pipeline; every AI-generated payload passes through a human-auditable gate before execution.

The Conductor Convention

All AI agents consuming this project must read ./conductor/workflow.md and treat ./conductor/tracks.md as the task registry. Track implementation follows the TDD protocol documented in conductor/workflow.md with per-file atomic commits and git notes.

Guidance for AI Agents

Detailed agent guidance lives in the following locations — read these directly, do not duplicate content here:

Correct edit workflow (MUST READ): conductor/edit_workflow.md
Operational workflow: conductor/workflow.md
Code style and process: conductor/product-guidelines.md
Tech stack and constraints: conductor/tech-stack.md
Product context: conductor/product.md
MMA orchestrator role: mma-orchestrator/SKILL.md
Tier 1 (Orchestrator): .agents/skills/mma-tier1-orchestrator/SKILL.md
Tier 2 (Tech Lead): .agents/skills/mma-tier2-tech-lead/SKILL.md
Tier 3 (Worker): .agents/skills/mma-tier3-worker/SKILL.md
Tier 4 (QA): .agents/skills/mma-tier4-qa/SKILL.md

Human-Facing Documentation

For understanding, using, and maintaining the tool, see docs/Readme.md and the 14 deep-dive guides it indexes.

Critical Anti-Patterns

Do not read full files >50 lines without first using py_get_skeleton or get_file_summary
Do not modify the tech stack without updating conductor/tech-stack.md first
Do not skip TDD - write failing tests before implementation
Do not use @pytest.mark.skip as an excuse to AVOID fixing the underlying bug. Skip markers are documentation of known failures; the failure must be addressed with priority in-session when feasible. See conductor/workflow.md "Skip-Marker Policy" for the full policy and review checklist.
Do not batch commits - commit per-task for atomic rollback
Do not add comments to source code; documentation lives in /docs
set_file_slice IS valid for multi-line content. The agent must verify the exact byte offsets with get_file_slice first, copy the line text character-for-character (including whitespace and EOL), and check whether the edit changes a public contract (function signature, yield shape, return type) that other code depends on. See conductor/edit_workflow.md for the full contract.
Do not use git restore while a user is mid-conversation without first confirming the desired state
HARD BAN: git restore, git checkout -- <file>, git reset are FORBIDDEN without explicit user permission in the same message. They destroyed user in-progress src/* edits twice in one session (2026-06-07). If you think you need one, ASK FIRST.
No giant edits: if your manual-slop_edit_file new_string exceeds ~20 lines, STOP and split it.
No diagnostic noise in production code. sys.stderr.write(f"[XYZ_DIAG] ...") lines added to src/*.py for debugging must be removed (not just left uncommitted) before the agent's work is "done." Diagnostic code that ships is technical debt. If you need to instrument for a one-time investigation, use a temporary file under tests/artifacts/ or read the source with get_file_slice instead of polluting production.
No loop, no scope-creep, no report-instead-of-fix. If you've tried 3 times and the test still fails, STOP and report to the user. Do not write a 200-line status report as a substitute for the fix. Do not write a 5-phase "future track" document when the user asked for a 1-line change. See conductor/workflow.md "Process Anti-Patterns" for the full ruleset.

Session-Learned Anti-Patterns (Added 2026-06-07)

These burned the most time in a recent startup_speedup session. The rules below are short because the rules above (and conductor/edit_workflow.md) are the source of truth.

1. ALWAYS use the proper edit tool, not a custom script

For Python source edits, use manual-slop_edit_file with old_string/new_string. Do NOT write a standalone Python script that does file-level replacements.
Custom scripts fail silently on: wrong indent in new_content, wrong EOL (CRLF vs LF) in old_string searches, wrong exact-string match (whitespace drift).
When a script fails, debug the actual error message. Do not dismiss it and try a different approach.
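The silent-failure mode above is easy to reproduce. This is a minimal sketch of why a hand-rolled replacement does nothing when the EOL differs; it is an illustration of the pitfall, not a substitute for manual-slop_edit_file. The sample strings and the `replace_exactly_once` helper are invented for this example.

```python
# Hypothetical file content saved with CRLF line endings.
source = "def foo(self):\r\n    return 1\r\n"

# A search string written with LF endings never matches the CRLF content.
old_lf = "def foo(self):\n    return 1\n"
new_lf = "def foo(self):\n    return 2\n"

# str.replace returns the text unchanged when nothing matches; no error is raised.
silent = source.replace(old_lf, new_lf)
print(silent == source)  # True: the "edit" did nothing


def replace_exactly_once(text: str, old: str, new: str) -> str:
    """Replace old with new, raising instead of silently doing nothing."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return text.replace(old, new)
```

A loud `ValueError` on zero or multiple matches is the behaviour a custom script usually lacks, which is why the failure goes unnoticed.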

2. The decorator-orphan pitfall

When inserting new methods before an existing @property def, your script will leave the @property decorator on the line above your new methods. The decorator then accidentally decorates YOUR new method, and the original method is no longer a property, which breaks any subsequent @foo.setter on it. The file passes ast.parse() but blows up at import time.

The fix: anchor on the def line that has the @property ABOVE it, and replace the pair @property\n def foo(...) with def your_new(...)\n ...\n @property\n def foo(...), keeping the decorator attached to its original method. Or anchor on a different non-decorated landmark (e.g. self._init_actions()).
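The pitfall is easier to recognise once you have seen it run. A small sketch with invented class and method names (`Broken`, `Fixed`, `helper`, `foo`) showing the orphaned decorator, the correct placement, and the import-time failure when a setter follows:

```python
class Broken:
    @property          # orphaned: was meant for foo, now wraps helper
    def helper(self):
        return "helper"

    def foo(self):     # lost its decorator, now a plain method
        return 42


class Fixed:
    def helper(self):
        return "helper"

    @property          # decorator stays attached to its original method
    def foo(self):
        return 42


# With a setter present, the class body itself fails to execute,
# which is why the module blows up at import time.
setter_error = None
try:
    class BrokenWithSetter:
        @property
        def helper(self):
            return "helper"

        def foo(self):
            return 42

        @foo.setter    # foo is a plain function here, so .setter does not exist
        def foo(self, value):
            pass
except AttributeError as exc:
    setter_error = exc
```

All three class bodies parse cleanly, so a syntax check cannot tell `Broken` from `Fixed`.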

3. `ast.parse()` "Syntax OK" is not enough

py_check_syntax only confirms ast.parse() succeeds. Semantic errors (wrong decorator targets, wrong class attribute, missing self, etc.) are NOT caught. After any multi-line edit, ALWAYS:

Import the module
Instantiate the class
Call the new method in the way it's expected to be called (e.g. ctrl.foo_ts vs ctrl.foo_ts() for properties vs methods)
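The three steps above can be sketched as one helper. This is a sketch under assumptions: `smoke_check` is a hypothetical name, it assumes the class has a no-argument constructor, and the standard-library `fractions.Fraction` stands in for a project class.

```python
import importlib


def smoke_check(module_name: str, class_name: str, attr_name: str):
    """Import the module, instantiate the class, and fetch the attribute."""
    module = importlib.import_module(module_name)  # surfaces import-time errors
    cls = getattr(module, class_name)
    instance = cls()                               # assumes a no-arg constructor
    return getattr(instance, attr_name)


# A property yields its value on attribute access.
value = smoke_check("fractions", "Fraction", "numerator")

# A method yields a callable that still has to be called.
method = smoke_check("fractions", "Fraction", "limit_denominator")
```

Checking `callable(...)` on the result is a quick way to confirm whether the attribute behaves as a property or as a method, matching the ctrl.foo_ts vs ctrl.foo_ts() distinction.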

4. The "I'll just check git status" trap (now a HARD BAN, see Critical list above)

If you suspect you might have lost work, the worst move is to run git status / git restore while a frantic user is watching. Pause, read the actual file, and admit what state you're in. The user knows their state better than you do. This trap has now caused irrecoverable data loss twice in one session — the ban is enforced above.

5. Small, verified edits beat big scripts

conductor/edit_workflow.md says it explicitly: 3-10 lines at a time, verify after each, repeat. If you find yourself writing a 200-line Python script to do an edit, you're doing it wrong. Use the MCP tools.

Process Anti-Patterns (Added 2026-06-09)

These are the bad patterns the agents have been exhibiting that the user explicitly called out as dog-shit. The rules below are short. If you find yourself doing any of these, STOP and reread this section.

1. The Deduction Loop (kill it)

Symptom: Run test → fail → read log → form hypothesis → run again → fail differently → add diag → run again → fail again → loop. You end up running the same test 4+ times in one session, each run reading partial log output.

Rule: You are allowed to run a failing test at most 2 times in a single investigation. After the 2nd failure, STOP running the test. Read the relevant source code (get_file_slice or py_get_skeleton), predict the failure mode from the code, and instrument ALL the relevant state in one pass before the next run. If the test still fails after 1 instrumented run, report to the user — do not loop.

Worst case captured upfront. Before running the test, ask: "what is the worst-case information I will need if this fails?" Add the diag for that, then run. The diag lines themselves are wasteful in production — see "No Diagnostic Noise in Production" below.

2. The Report-Instead-of-Fix Pattern (kill it)

Symptom: You can't fix the bug. You write a 200-line status report explaining why you can't fix it. The report contains "What I tried this session", "What I am NOT going to do", "What you can do", and "Files changed in this session (cumulative)." The report is a confession, not a fix.

Rule: A status report is allowed only when:

You have actually tried the fix and it failed with evidence, OR
You are blocked on a decision the user must make.

A status report is NOT allowed when:

You are avoiding a hard problem by writing prose about it.
The user asked for a fix and you have not yet tried.
The "what you can do" section is a list of options to defer to the user instead of picking the best one and doing it.

A good status report is 5-10 sentences, not 200 lines.

3. The Scope-Creep Track-Doc Pattern (kill it)

Symptom: The user asks for a 1-line fix. You write a 5-phase "future track" spec with 140 lines of scope, audit findings, recommendations, and "out of scope" sections. The track doc is now larger than the fix it was meant to scope.

Rule: If the user asks for a fix, your output is the fix. A track doc is only appropriate when the fix is multi-day work that requires a plan. If the fix is < 100 lines, it does not get a track. If the fix would touch more than 5 files, it MIGHT get a track — but ask first.

4. The Inherited-Cruft Pattern (kill it)

Symptom: The previous agent left a half-finished refactor in the working tree. The file is broken. You try to fix it and make it worse. You try again. You make it worse. The file stays broken for 3 days.

Rule: If the file is already in a broken state from a previous session, the FIRST thing you do is ask the user: "this file is in a broken state from a previous agent. do you want me to (a) revert the working tree and start from a clean baseline, (b) finish the previous agent's intent, or (c) abandon the work entirely?" You do not start by "trying to fix" the broken file. The user's answer determines the work, not your assumption.

5. No Diagnostic Noise in Production (kill it)

Symptom: You add sys.stderr.write(f"[RAG_DIAG] ...") to src/rag_engine.py and src/app_controller.py to debug a test failure. The diag lines help. You "revert everything" but leave the 4-8 diag lines in the working tree uncommitted. The next agent runs git status, sees the diag lines, and either commits them by accident or spends 10 minutes cleaning them up.

Rule: Diagnostic stderr goes to a log file (tests/artifacts/<test_name>.diag.log) or to a temporary diagnostic script (/tmp/diag_rag.py), NOT to src/*.py. If you absolutely must instrument a production function for a single test run, the diag lines are part of the same atomic commit as the fix — they do not live uncommitted in the working tree. If you "revert everything," that means the diag lines are also reverted.
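One way to follow this rule is a tiny helper that lives with the test, not in src/. A minimal sketch: `write_diag` is a hypothetical name, and the default directory simply mirrors the tests/artifacts/<test_name>.diag.log path named in the rule.

```python
from pathlib import Path


def write_diag(test_name: str, message: str,
               artifacts_dir: Path = Path("tests/artifacts")) -> Path:
    """Append one diagnostic line to <artifacts_dir>/<test_name>.diag.log."""
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    log_path = artifacts_dir / f"{test_name}.diag.log"
    # Append mode keeps every line from a single instrumented run together.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(message.rstrip("\n") + "\n")
    return log_path
```

Because the output lands in a log file, there is nothing left in src/*.py to revert afterwards.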

6. The "I Am Not Going To Attempt Another Fix Without Your Direction" Surrender (kill it)

Symptom: You've tried 3 things. None worked. You write: "I am not going to attempt another fix without your direction." Then you wait for the user to tell you what to do.

Rule: This is correct ONLY if you have already done the things below:

Read the actual source code, not from memory
Predicted the failure mode from the code
Instrumented the relevant state in one pass
Run the test once with instrumentation
Captured the full output, not partial output

If you have done all 5 and are still stuck, surrendering is fine. If you have not, you are surrendering too early. The user does not want to be your strategist; the user wants the agent to make progress.
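"Captured the full output" can be as simple as sending everything from one run to a file. A sketch, assuming you drive the test from Python; `run_and_capture` is a hypothetical helper, and the pytest command and log path in the comment are placeholders.

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture(command: list[str], log_path: Path) -> int:
    """Run command once and write its complete stdout and stderr to log_path."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so nothing is lost
        text=True,
    )
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(result.stdout, encoding="utf-8")
    return result.returncode


# Example call (placeholder test path and log name):
# run_and_capture([sys.executable, "-m", "pytest", "tests/test_example.py"],
#                 Path("tests/artifacts/test_example.full.log"))
```

Reading the saved log afterwards avoids rerunning the test just to see the part of the output that was truncated.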

7. The Verbose-Commit-Message Pattern (kill it)

Symptom: Your commit message is 50 lines. It contains the root cause analysis, the alternatives you considered, the side effects you considered, the cross-references, the "what this doesn't fix", the "what to verify", and a personal essay. The commit message is longer than the diff it describes.

Rule: A commit message is a 1-3 sentence summary. The body is for non-obvious "why" details, not for re-stating what the diff shows. If your commit message is longer than 15 lines, you are writing a report, not a commit message. Save the report for docs/reports/.
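The 15-line limit is simple enough to check mechanically. A sketch only; `is_report_sized` and `MAX_COMMIT_LINES` are hypothetical names, and nothing in the project is known to enforce this automatically.

```python
MAX_COMMIT_LINES = 15  # threshold taken from the rule above


def is_report_sized(message: str) -> bool:
    """Return True when a commit message is longer than the 15-line limit."""
    return len(message.strip().splitlines()) > MAX_COMMIT_LINES
```

A message that trips this check belongs in docs/reports/, with the commit message reduced to its summary.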

8. The "Isolated Pass" Verification Fallacy (kill it)

Symptom: You run the test in isolation. It passes. You commit. The test fails in batch. You didn't notice because you never ran the batch.

Rule: For any live_gui test or any test that depends on shared subprocess state, the only verification that matters is the batch run. A test that passes in isolation but fails in batch is failing — it's just that the failure is masked by isolation. Per the existing Live_gui Test Fragility rule in conductor/workflow.md: "Bisect failures by running the test both in the full suite and in isolation to distinguish 'test needs work' from 'real app bug'." If you only ever run in isolation, you cannot tell the difference.

Compaction Recovery

If you're a new agent picking up a session that was compacted (or a previous agent ran out of context), follow this recovery path:

Read the most recent docs/reports/PLANNING_DIGEST_<date>.md if one exists. It indexes the planning artifacts and explains the design decisions behind the active tracks.
For each in-flight track, read conductor/tracks/<track_id>/state.toml to see current_phase; read conductor/tracks/<track_id>/plan.md for the task breakdown.
Check git log --oneline -20 to see what has been committed; the most recent commits in conductor/tracks/<track_id>/ are the latest work.
Run the audit scripts (scripts/audit_main_thread_imports.py, scripts/audit_weak_types.py) to see the current state of the codebase.
Resume from the next unchecked task in state.toml. The per-task commit discipline means each commit is a safe rollback point.

The track's metadata.json has a verification_criteria field — this is the definition of "done" for the track. If all the criteria are checked, the track is complete.

For deeper recovery, see conductor/workflow.md "Compaction Recovery" (the same pattern, but workflow-level).
