manual_slop/conductor/workflow.md

Project Workflow

Guiding Principles

  1. The Plan is the Source of Truth: All work must be tracked in plan.md
  2. The Tech Stack is Deliberate: Changes to the tech stack must be documented in tech-stack.md before implementation
  3. Test-Driven Development: Write unit tests before implementing functionality
  4. High Code Coverage: Aim for >80% code coverage for all modules
  5. User Experience First: Every decision should prioritize user experience
  6. Non-Interactive & CI-Aware: Prefer non-interactive commands. Use CI=true for watch-mode tools (tests, linters) to ensure single execution.
  7. MMA Tiered Delegation is Mandatory: The Conductor acts as a Tier 1/2 Orchestrator. You MUST delegate all non-trivial coding to Tier 3 Workers and all error analysis to Tier 4 QA Agents. Do NOT perform large file writes directly.
  8. Mandatory Research-First Protocol: Before reading the full content of any file over 50 lines, you MUST use get_file_summary, py_get_skeleton, py_get_code_outline, or py_get_docstring to map the architecture and identify specific target ranges. Use get_git_diff to understand recent changes. Use py_find_usages to locate where symbols are used.
  9. Architecture Documentation Fallback: When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in docs/ (last updated: 08e003a):
    • docs/guide_architecture.md: Thread domains, cross-thread patterns (AsyncEventQueue, guarded lists, Condition dialogs), frame-sync mechanism (_process_pending_gui_tasks action catalog), AI client multi-provider architecture, HITL Execution Clutch blocking flow.
    • docs/guide_tools.md: MCP Bridge 3-layer security model, full 26-tool inventory with parameters, Hook API endpoint reference (GET/POST), ApiHookClient method reference, /api/ask synchronous HITL protocol.
    • docs/guide_mma.md: Ticket/Track/WorkerContext data structures, DAG engine algorithms, ConductorEngine execution loop, Tier 2 ticket generation, Tier 3 worker lifecycle with context amnesia.
    • docs/guide_simulations.md: live_gui fixture, Puppeteer pattern, mock provider protocol, visual verification patterns.

Task Workflow

All tasks follow a strict lifecycle:

Standard Task Workflow

  1. Initialize MMA Environment: Before executing the first task of any track, you MUST activate the mma-orchestrator skill (activate_skill mma-orchestrator).

  2. Select Task: Choose the next available task from plan.md in sequential order

  3. Mark In Progress: Before beginning work, edit plan.md and change the task from [ ] to [~]
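The status flip above can be sketched in a few lines. This is a hypothetical illustration only — it assumes plan.md uses markdown checklist lines like `- [ ] Task name`; verify the actual plan format first.

```python
# Hypothetical sketch: flip a plan.md task from "[ ]" (open) to "[~]" (in progress).
# Assumes tasks are markdown checklist lines such as "- [ ] Create user model".
def mark_in_progress(plan_text: str, task_name: str) -> str:
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        if "[ ]" in line and task_name in line:
            lines[i] = line.replace("[ ]", "[~]", 1)  # only this task's marker
            break
    return "\n".join(lines)

plan = "- [ ] Create user model\n- [ ] Add auth routes"
updated = mark_in_progress(plan, "Create user model")
print(updated.splitlines()[0])  # → - [~] Create user model
```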

  4. High-Signal Research Phase:

    • Identify Dependencies: Use list_directory, get_tree, and py_get_imports to map file relations.
    • Map Architecture: Use py_get_code_outline or py_get_skeleton on identified files to understand their structure.
    • Analyze Changes: Use get_git_diff if the task involves modifying recently updated code.
    • Minimize Token Burn: Only use read_file with start_line/end_line for specific implementation details once target areas are identified.
  5. Write Failing Tests (Red Phase):

    • Pre-Delegation Checkpoint: Before spawning a worker for dangerous or non-trivial changes, ensure your current progress is staged (git add .) or committed. This prevents losing iterations if a sub-agent incorrectly uses git restore.
    • Code Style: ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
    • Delegate Test Creation: Do NOT write test code directly. Spawn a Tier 3 Worker (python scripts/mma_exec.py --role tier3-worker "[PROMPT]") with a surgical prompt specifying WHERE (file:line range), WHAT (test to create), HOW (which assertions/fixtures to use), and SAFETY (thread constraints if applicable). Example: "Write tests in tests/test_cost_tracker.py for cost_tracker.py:estimate_cost(). Test all model patterns in MODEL_PRICING dict. Assert unknown model returns 0. Use 1-space indentation." (If repeating due to failures, pass --failure-count X to switch to a more capable model).
    • Take the code generated by the Worker and apply it.
    • CRITICAL: Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.
  6. Implement to Pass Tests (Green Phase):

    • Pre-Delegation Checkpoint: Ensure current progress is staged or committed before delegating.
    • Code Style: ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
    • Delegate Implementation: Do NOT write the implementation code directly. Spawn a Tier 3 Worker (python scripts/mma_exec.py --role tier3-worker "[PROMPT]") with a surgical prompt specifying WHERE (file:line range to modify), WHAT (the specific change), HOW (which API calls, data structures, or patterns to use), and SAFETY (thread-safety constraints). Example: "In gui_2.py _render_mma_dashboard (lines 2685-2699), extend the token usage table from 3 to 5 columns. Add 'Model' and 'Est. Cost' using imgui.table_setup_column(). Call cost_tracker.estimate_cost(model, input_tokens, output_tokens). Use 1-space indentation." (If repeating due to failures, pass --failure-count X to switch to a more capable model).
    • Take the code generated by the Worker and apply it.
    • Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
  7. Refactor (Optional but Recommended):

    • With the safety of passing tests, refactor the implementation code and the test code to improve clarity, remove duplication, and enhance performance without changing the external behavior.
    • Rerun tests to ensure they still pass after refactoring.
  8. Verify Coverage: Run coverage reports using the project's chosen tools. For example, in a Python project, this might look like:

    pytest --cov=app --cov-report=html
    

    Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.
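A minimal sketch of enforcing the gate programmatically, assuming coverage is exported as Cobertura XML (what `pytest --cov --cov-report=xml` emits; the root `coverage` element carries a `line-rate` attribute). The XML sample here is fabricated for illustration.

```python
# Hypothetical sketch: read the Cobertura XML coverage report and enforce
# the >80% threshold before marking the task complete.
import xml.etree.ElementTree as ET

def coverage_ok(xml_text: str, threshold: float = 0.80) -> bool:
    root = ET.fromstring(xml_text)
    return float(root.get("line-rate", "0")) > threshold

sample = '<coverage line-rate="0.87" branch-rate="0.71"></coverage>'
print(coverage_ok(sample))  # → True
```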

  9. Document Deviations: If implementation differs from tech stack:

    • STOP implementation
    • Update tech-stack.md with new design
    • Add dated note explaining the change
    • Resume implementation
  10. Commit Code Changes:

    • CRITICAL - ATOMIC PER-TASK COMMITS: You MUST commit your changes immediately after completing and verifying a single task. Do NOT move on to the next task in the plan without committing the current one. This ensures precise tracking and safe rollback points.
    • Stage all code changes related to the task.
    • Propose a clear, concise commit message, e.g., feat(ui): Create basic HTML structure for calculator.
    • Perform the commit.
  11. Attach Task Summary with Git Notes:

    • Step 11.1: Get Commit Hash: Obtain the hash of the just-completed commit (git log -1 --format="%H").
    • Step 11.2: Draft Note Content: Create a detailed summary for the completed task. This should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
    • Step 11.3: Attach Note: Use the git notes command to attach the summary to the commit.
    # The note content from the previous step is passed via the -m flag.
    git notes add -m "<note content>" <commit_hash>
    
  12. Get and Record Task Commit SHA:

    • Step 12.1: Update Plan: Read plan.md, find the line for the completed task, update its status from [~] to [x], and append the first 7 characters of the just-completed commit's hash.
    • Step 12.2: Write Plan: Write the updated content back to plan.md.
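The plan update can be sketched as a single-pass text rewrite. This is an illustrative assumption about the plan format (a `[~]` checklist line containing the task name), not the project's canonical implementation.

```python
# Hypothetical sketch: mark a task "[x]" and append the first 7 characters
# of the commit hash, per the convention described above.
import re

def record_task_sha(plan_text: str, task_name: str, commit_hash: str) -> str:
    short = commit_hash[:7]
    pattern = re.compile(r"\[~\](.*" + re.escape(task_name) + r".*)")
    # Replace only the first matching task line.
    return pattern.sub(lambda m: "[x]" + m.group(1) + f" ({short})", plan_text, count=1)

plan = "- [~] Create user model"
print(record_task_sha(plan, "Create user model", "8e2f0c1a9b"))
# → - [x] Create user model (8e2f0c1)
```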
  13. Commit Plan Update:

    • Action: Stage the modified plan.md file.
    • Action: Commit this change with a descriptive message (e.g., conductor(plan): Mark task 'Create user model' as complete).

Phase Completion Verification and Checkpointing Protocol

Trigger: This protocol is executed immediately after a task is completed that also concludes a phase in plan.md.

  1. Announce Protocol Start: Inform the user that the phase is complete and the verification and checkpointing protocol has begun.

  2. Ensure Test Coverage for Phase Changes:

    • Step 2.1: Determine Phase Scope: To identify the files changed in this phase, you must first find the starting point. Read plan.md to find the Git commit SHA of the previous phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
    • Step 2.2: List Changed Files: Execute git diff --name-only <previous_checkpoint_sha> HEAD to get a precise list of all files modified during this phase.
    • Step 2.3: Verify and Create Tests: For each file in the list:
      • CRITICAL: First, check its extension. Exclude non-code files (e.g., .json, .md, .yaml).
      • For each remaining code file, verify a corresponding test file exists.
      • If a test file is missing, you must create one. Before writing the test, analyze other test files in the repository to determine the correct naming convention and testing style. The new tests must validate the functionality described in this phase's tasks (plan.md).
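Steps 2.2 and 2.3 reduce to a filter over the diff output. The extension set and the `tests/test_<module>.py` naming convention below are assumptions to check against the repository's actual layout, as step 2.3 requires.

```python
# Hypothetical sketch: narrow "git diff --name-only" output to code files,
# then derive the expected companion test file for each one.
import os

NON_CODE_EXTS = {".json", ".md", ".yaml", ".yml", ".toml", ".txt", ".log"}

def code_files(changed: list[str]) -> list[str]:
    return [f for f in changed if os.path.splitext(f)[1] not in NON_CODE_EXTS]

def expected_test_file(path: str) -> str:
    # Assumed convention; confirm against existing test files first.
    return f"tests/test_{os.path.basename(path)}"

changed = ["cost_tracker.py", "docs/guide_mma.md", "config.yaml", "gui_2.py"]
print(code_files(changed))                 # → ['cost_tracker.py', 'gui_2.py']
print(expected_test_file("cost_tracker.py"))  # → tests/test_cost_tracker.py
```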
  3. Execute Automated Tests with Proactive Debugging:

    • Before execution, you must announce the exact shell command you will use to run the tests.
    • Example Announcement: "I will now run the automated test suite to verify the phase. Command: CI=true npm test"
    • Execute the announced command.
      • If tests fail with significant output (e.g., a large traceback), DO NOT attempt to read the raw stderr directly into your context. Instead, pipe the output to a log file and spawn a Tier 4 QA Agent (python scripts/mma_exec.py --role tier4-qa "[PROMPT]") to summarize the failure.
      • You must inform the user and begin debugging using the QA Agent's summary. You may attempt to propose a fix a maximum of two times. If the tests still fail after your second proposed fix, you must stop, report the persistent failure, and ask the user for guidance.
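The log-capture step can be sketched as follows; only the capture-and-tail mechanics are shown, and the temp-file location is an illustrative stand-in for the project's logs/ directory. The real flow would pass the log path to a Tier 4 QA Agent for summarization.

```python
# Hypothetical sketch: write noisy test output to a log file and keep only
# a short tail in context instead of reading raw stderr directly.
import tempfile
from pathlib import Path

def capture_failure(output: str, tail: int = 5) -> str:
    log_path = Path(tempfile.gettempdir()) / "phase_test.log"
    log_path.write_text(output)
    lines = output.splitlines()
    return f"{log_path} ({len(lines)} lines); tail:\n" + "\n".join(lines[-tail:])

noisy = "\n".join(f"trace frame {i}" for i in range(200)) + "\nAssertionError: boom"
summary = capture_failure(noisy)
print(summary.splitlines()[-1])  # → AssertionError: boom
```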
  4. Execute Automated API Hook Verification:

    • CRITICAL: The Conductor agent will now automatically execute verification tasks using the application's API hooks.
    • The agent will announce the start of the automated verification to the user.
    • It will then communicate with the application's IPC server to trigger the necessary verification functions.
    • Result Handling:
      • All results (successes and failures) from the API hook invocations will be logged.
      • If all automated verifications pass, the agent will inform the user and proceed to the next step (Create Checkpoint Commit).
      • If any automated verification fails, the agent will halt the workflow, present the detailed failure logs to the user, and await further instructions for debugging or remediation.
  5. Present Automated Verification Results and User Confirmation:

    • After executing automated verification, the Conductor agent will present the results to the user.
    • If verification passed, the agent will state: "Automated verification completed successfully."
    • If verification failed, the agent will state: "Automated verification failed. Please review the logs above for details." It may then propose a fix a maximum of two times; if verification still fails after the second proposed fix, it must stop, report the persistent failure, and ask the user for guidance.
    • PAUSE and await the user's response. Do not proceed without an explicit yes or confirmation from the user to proceed if tests pass, or guidance if tests fail.
  6. Create Checkpoint Commit:

    • Stage all changes. If there are no changes to stage, create an empty commit (e.g., git commit --allow-empty).
    • Perform the commit with a clear and concise message (e.g., conductor(checkpoint): Checkpoint end of Phase X).
  7. Attach Auditable Verification Report using Git Notes:

    • Step 7.1: Draft Note Content: Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
    • Step 7.2: Attach Note: Use the git notes command and the full commit hash from the previous step to attach the full report to the checkpoint commit.
  8. Get and Record Phase Checkpoint SHA:

    • Step 8.1: Get Commit Hash: Obtain the hash of the just-created checkpoint commit (git log -1 --format="%H").
    • Step 8.2: Update Plan: Read plan.md, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format [checkpoint: <sha>].
    • Step 8.3: Write Plan: Write the updated content back to plan.md.
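The heading update in steps 8.2 and 8.3 can be sketched like this, assuming phases appear as markdown headings in plan.md (an assumption; match the file's real structure).

```python
# Hypothetical sketch: append "[checkpoint: <sha>]" to the completed phase's
# heading, using the first 7 characters of the checkpoint commit hash.
def record_checkpoint(plan_text: str, phase_heading: str, commit_hash: str) -> str:
    short = commit_hash[:7]
    lines = plan_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("#") and phase_heading in line:
            lines[i] = f"{line} [checkpoint: {short}]"
            break
    return "\n".join(lines)

plan = "## Phase 2: Cost Tracking\n- [x] Add estimate_cost (8e2f0c1)"
print(record_checkpoint(plan, "Phase 2", "41acfe97d2").splitlines()[0])
# → ## Phase 2: Cost Tracking [checkpoint: 41acfe9]
```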
  9. Commit Plan Update:

    • Action: Stage the modified plan.md file.
    • Action: Commit this change with a descriptive message following the format conductor(plan): Mark phase '<PHASE NAME>' as complete.
  10. Announce Completion: Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.

Verification via API Hooks

For features involving the GUI or complex internal state, unit tests are often insufficient. You MUST use the application's built-in API hooks for empirical verification:

  1. Launch the App with Hooks: Run the application in a separate shell with the --enable-test-hooks flag:

    uv run python gui.py --enable-test-hooks
    

    This starts the hook server on port 8999.

  2. Use the pytest live_gui Fixture: For automated tests, use the session-scoped live_gui fixture defined in tests/conftest.py. This fixture handles the lifecycle (startup/shutdown) of the application with hooks enabled.

    def test_my_feature(live_gui):
        # The GUI is now running on port 8999
        ...
    

    Note: pytest must be run with uv.

  3. Verify via ApiHookClient: Use the ApiHookClient in api_hook_client.py to interact with the running application. It includes robust retry logic and health checks.

  4. Verify via REST Commands: Use PowerShell or curl to send commands to the application and verify the response. For example, to check health:

    Invoke-RestMethod -Uri "http://127.0.0.1:8999/status" -Method Get
    
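The health-check pattern can also be driven from Python. The sketch below is illustrative only: a stub HTTP server stands in for the real application on a free port, and the retry loop mirrors the spirit of ApiHookClient's health check rather than its actual API.

```python
# Hypothetical sketch: poll a /status endpoint with retries. The stub server
# here is a stand-in for the real hook server on port 8999.
import json, threading, time, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHooks(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubHooks)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

def check_health(port: int, retries: int = 3, delay: float = 0.2) -> dict:
    url = f"http://127.0.0.1:{port}/status"
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return json.load(resp)
        except OSError:
            time.sleep(delay)  # app may still be starting up
    raise RuntimeError("hook server unreachable")

result = check_health(server.server_address[1])
print(result)  # → {'status': 'ok'}
server.shutdown()
```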

Quality Gates

Before marking any task complete, verify:

  • All tests pass
  • Code coverage meets requirements (>80%)
  • Code follows project's code style guidelines (as defined in code_styleguides/)
  • All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
  • Type safety is enforced (e.g., type hints, TypeScript types, Go types)
  • No linting or static analysis errors (using the project's configured tools)
  • Works correctly on mobile (if applicable)
  • Documentation updated if needed
  • No security vulnerabilities introduced

Development Commands

AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.

Setup

# Example: Commands to set up the development environment (e.g., install dependencies, configure database)
# e.g., for a Node.js project: npm install
# e.g., for a Go project: go mod tidy

Daily Development

# Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
# e.g., for a Node.js project: npm run dev, npm test, npm run lint
# e.g., for a Go project: go run main.go, go test ./..., go fmt ./...

Before Committing

# Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
# e.g., for a Node.js project: npm run check
# e.g., for a Go project: make check (if a Makefile exists)

Testing Requirements

Unit Testing

  • Every module must have corresponding tests.
  • Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
  • Mock external dependencies.
  • Test both success and failure cases.
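The bullets above can be illustrated with a small sketch: the external dependency is injected so it can be mocked, and both the success and failure paths are asserted. The function and client names are hypothetical.

```python
# Illustrative sketch: inject the external dependency, mock it, and cover
# both the success and the failure case.
from unittest.mock import Mock

def fetch_display_name(client, user_id: int) -> str:
    try:
        user = client.get_user(user_id)  # external call, mocked in tests
    except ConnectionError:
        return "<offline>"               # failure case has defined behavior
    return user["name"].title()

ok_client = Mock()
ok_client.get_user.return_value = {"name": "ada lovelace"}
assert fetch_display_name(ok_client, 1) == "Ada Lovelace"   # success path

down_client = Mock()
down_client.get_user.side_effect = ConnectionError
assert fetch_display_name(down_client, 1) == "<offline>"    # failure path
print("both paths covered")
```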

Integration Testing

  • Test complete user flows
  • Verify database transactions
  • Test authentication and authorization
  • Check form submissions

Mobile Testing

  • Test on actual iPhone when possible
  • Use Safari developer tools
  • Test touch interactions
  • Verify responsive layouts
  • Check performance on 3G/4G

Code Review Process

Self-Review Checklist

Before requesting review:

  1. Functionality

    • Feature works as specified
    • Edge cases handled
    • Error messages are user-friendly
  2. Code Quality

    • Follows style guide
    • DRY principle applied
    • Clear variable/function names
    • Appropriate comments
  3. Testing

    • Unit tests comprehensive
    • Integration tests pass
    • Coverage adequate (>80%)
  4. Security

    • No hardcoded secrets
    • Input validation present
    • SQL injection prevented
    • XSS protection in place
  5. Performance

    • Database queries optimized
    • Images optimized
    • Caching implemented where needed
  6. Mobile Experience

    • Touch targets adequate (44x44px)
    • Text readable without zooming
    • Performance acceptable on mobile
    • Interactions feel native

Commit Guidelines

Message Format

<type>(<scope>): <description>

[optional body]

[optional footer]

Types

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation only
  • style: Formatting, missing semicolons, etc.
  • refactor: Code change that neither fixes a bug nor adds a feature
  • test: Adding missing tests
  • chore: Maintenance tasks

Examples

git commit -m "feat(auth): Add remember me functionality"
git commit -m "fix(posts): Correct excerpt generation for short posts"
git commit -m "test(comments): Add tests for emoji reaction limits"
git commit -m "style(mobile): Improve button touch targets"
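The format can be checked mechanically before committing. This validator is a minimal sketch: it treats the scope as optional (per common conventional-commit practice) and includes the conductor type used elsewhere in this workflow.

```python
# Hypothetical sketch: validate "<type>(<scope>): <description>" on the
# first line of a commit message.
import re

TYPES = {"feat", "fix", "docs", "style", "refactor", "test", "chore", "conductor"}
PATTERN = re.compile(r"^(\w+)(\(([\w\-]+)\))?: .+")

def valid_commit_message(msg: str) -> bool:
    m = PATTERN.match(msg.splitlines()[0])
    return bool(m) and m.group(1) in TYPES

print(valid_commit_message("feat(auth): Add remember me functionality"))  # → True
print(valid_commit_message("added stuff"))                                # → False
```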

Definition of Done

A task is complete when:

  1. All code implemented to specification
  2. Unit tests written and passing
  3. Code coverage meets project requirements
  4. Documentation complete (if applicable)
  5. Code passes all configured linting and static analysis checks
  6. Works beautifully on mobile (if applicable)
  7. Implementation notes added to plan.md
  8. Changes committed with proper message
  9. Git note with task summary attached to the commit

Emergency Procedures

Critical Bug in Production

  1. Create hotfix branch from main
  2. Write failing test for bug
  3. Implement minimal fix
  4. Test thoroughly including mobile
  5. Deploy immediately
  6. Document in plan.md

Data Loss

  1. Stop all write operations
  2. Restore from latest backup
  3. Verify data integrity
  4. Document incident
  5. Update backup procedures

Security Breach

  1. Rotate all secrets immediately
  2. Review access logs
  3. Patch vulnerability
  4. Notify affected users (if any)
  5. Document and update security procedures

Deployment Workflow

Pre-Deployment Checklist

  • All tests passing
  • Coverage >80%
  • No linting errors
  • Mobile testing complete
  • Environment variables configured
  • Database migrations ready
  • Backup created

Deployment Steps

  1. Merge feature branch to main
  2. Tag release with version
  3. Push to deployment service
  4. Run database migrations
  5. Verify deployment
  6. Test critical paths
  7. Monitor for errors

Post-Deployment

  1. Monitor analytics
  2. Check error logs
  3. Gather user feedback
  4. Plan next iteration

Continuous Improvement

  • Review workflow weekly
  • Update based on pain points
  • Document lessons learned
  • Optimize for user happiness
  • Keep things simple and maintainable

Conductor Token Firewalling & Model Switching Strategy

To emulate the 4-Tier MMA Architecture within the standard Conductor extension without requiring a custom fork, adhere to these strict workflow policies:

1. Active Model Switching (Simulating the 4 Tiers)

  • Mandatory Skill Activation: As the very first step of any MMA-driven process, including track initialization and implementation phases, the agent MUST activate the mma-orchestrator skill (activate_skill mma-orchestrator). This is crucial for enforcing the 4-Tier token firewall.
  • The MMA Bridge (mma_exec.py): All tiered delegation is routed through python scripts/mma_exec.py. This script acts as the primary bridge, managing model selection, context injection, and logging.
  • Model Tiers:
    • Tier 1 (Strategic/Orchestration): gemini-3.1-pro-preview. Focused on product alignment, setup (/conductor:setup), and track initialization (/conductor:newTrack).
    • Tier 2 (Architectural/Tech Lead): gemini-3-flash-preview. Focused on architectural design and track execution (/conductor:implement). Note: Tier 2 maintains persistent memory throughout a track's implementation.
    • Tier 3 (Execution/Worker): gemini-2.5-flash-lite. Used for surgical code implementation and test generation. Operates statelessly (Context Amnesia) but has access to file I/O tools.
    • Tier 4 (Utility/QA): gemini-2.5-flash-lite. Used for log summarization and error analysis. Operates statelessly (Context Amnesia) but has access to diagnostic tools.
  • Tiered Delegation Protocol:
    • Tier 3 Worker: python scripts/mma_exec.py --role tier3-worker "[PROMPT]"
    • Tier 4 QA Agent: python scripts/mma_exec.py --role tier4-qa "[PROMPT]"
  • Observability: All hierarchical interactions are recorded in logs/mma_delegation.log and detailed sub-agent logs are saved to logs/agents/.

2. Context Management and Token Firewalling

  • Context Amnesia (Tiers 3 & 4): mma_exec.py enforces "Context Amnesia" by executing sub-agents in a stateless manner. Each call starts with a clean slate, receiving only the strictly necessary documents and prompts.
  • Persistent Memory (Tier 2): The Tier 2 Tech Lead does NOT use Context Amnesia during track implementation to ensure continuity of technical strategy.
  • AST Skeleton Views: For Tier 3 implementation, mma_exec.py automatically generates "AST Skeleton Views" of project dependencies. This provides the worker model with the interface-level structure (function signatures, docstrings) of imported modules without the full source code, maximizing the signal-to-noise ratio in the context window.
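The idea behind an AST Skeleton View can be sketched with the standard ast module. This is not mma_exec.py's actual implementation, just a minimal illustration: only signatures and docstrings survive, and the header lookup assumes single-line signatures.

```python
# Hypothetical sketch of an "AST Skeleton View": emit only the def/class
# headers and docstrings of a module, dropping all function bodies.
import ast

def skeleton(source: str) -> str:
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            header = source.splitlines()[node.lineno - 1].strip()
            doc = ast.get_docstring(node)
            out.append(header + (f'  # "{doc}"' if doc else ""))
    return "\n".join(out)

src = '''
def estimate_cost(model, input_tokens, output_tokens):
    """Return estimated USD cost for a call."""
    rate = 0.0001
    return rate * (input_tokens + output_tokens)
'''
print(skeleton(src))
# → def estimate_cost(model, input_tokens, output_tokens):  # "Return estimated USD cost for a call."
```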

3. Phase Checkpoints (The Final Defense)

  • The Phase Completion Verification and Checkpointing Protocol is the project's primary defense against token bloat.
  • When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a "Context Wipe" signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history.
  • MMA Phase Memory Wipe: After completing a major Phase, use the Tier 1/2 Orchestrator's perspective to consolidate state into Git Notes and then disregard previous trial-and-error histories.