c9c5535889
Per 2026-06-07 user feedback during test_suite cleanup: "if the intent is to annotate a known failure, fine. But that known failure must be addressed with priority." New section between "Per-Task Decision Protocol" and "Documentation Refresh Protocol" makes the policy explicit: - Skip markers are DOCUMENTATION, not avoidance - They're useful for opt-in integration tests, unimplemented features, or feature-flag-gated code - They're NOT useful for pre-existing failures, "I don't understand this" issues, or racy tests the agent doesn't want to debug - When adding a marker, MUST document the underlying issue AND what the fix would be - When the fix is in-session reachable, FIX IT INSTEAD of skipping — limited context is not an excuse Includes a 4-question review checklist before adding a skip. References the existing AGENTS.md "Use skip markers as excuse to AVOID" rule so the two policies don't drift.
623 lines
44 KiB
Markdown
# Project Workflow

## Session Start Checklist (MANDATORY)
## Code Style (MANDATORY - Python)

- **1-space indentation** for ALL Python code (NO EXCEPTIONS)
- **CRLF line endings** on Windows
- **NO COMMENTS** unless explicitly requested
- Type hints required for all public functions
- **ImGui Defer Patterns:** Use `imscope` context managers or `_render_window_if_open` dispatch helpers to prevent resource leaks and keep the main loop flat. See `conductor/code_styleguides/python.md` for details.
### CRITICAL: Native Edit Tool Destroys Indentation

The native `Edit` tool DESTROYS 1-space indentation and converts it to 4-space indentation.

**NEVER use the native `Edit` tool on Python files.**

Instead, use the Manual Slop MCP tools:

- `manual-slop_py_update_definition` - Replace a function/class
- `manual-slop_set_file_slice` - Replace a line range
- `manual-slop_py_set_signature` - Replace a signature only

Or run Python in a subprocess and open the file with `newline=''` so line endings are preserved (replace `OLD_TEXT`/`NEW_TEXT` with the exact text to swap):

```powershell
python -c "
with open('file.py', 'r', encoding='utf-8', newline='') as f:
 content = f.read()
content = content.replace('OLD_TEXT', 'NEW_TEXT')
with open('file.py', 'w', encoding='utf-8', newline='') as f:
 f.write(content)
"
```
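The same idea can be written as a reusable function, sketched below. `replace_exact` is a hypothetical helper name, not an existing project tool. The rule that the old text must match exactly once is an added assumption. It makes an ambiguous or stale edit fail loudly instead of silently rewriting the wrong span.

```python
def replace_exact(path: str, old: str, new: str) -> None:
 # Hypothetical helper. newline='' turns off newline translation on both read
 # and write, so CRLF endings (and 1-space indentation) pass through untouched.
 with open(path, 'r', encoding='utf-8', newline='') as f:
  content = f.read()
 count = content.count(old)
 # Fail loudly on a missing or ambiguous match instead of guessing.
 if count != 1:
  raise ValueError(f'expected exactly 1 match in {path}, found {count}')
 with open(path, 'w', encoding='utf-8', newline='') as f:
  f.write(content.replace(old, new, 1))
```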
## Guiding Principles

1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`.
2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation.
3. **Test-Driven Development:** Write unit tests before implementing functionality.
4. **High Code Coverage:** Aim for >80% code coverage for all modules.
5. **User Experience First:** Every decision should prioritize user experience.
6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
7. **MMA Tiered Delegation is Mandatory:** The Conductor acts as a Tier 1/2 Orchestrator. You MUST delegate all non-trivial coding to Tier 3 Workers and all error analysis to Tier 4 QA Agents. Do NOT perform large file writes directly.
8. **Mandatory Research-First Protocol:** Before reading the full content of any file over 50 lines, you MUST use `get_file_summary`, `py_get_skeleton`, `py_get_code_outline`, or `py_get_docstring` to map the architecture and identify specific target ranges. Use `get_git_diff` to understand recent changes. Use `py_find_usages` to locate where symbols are used.
9. **Architecture Documentation Fallback:** When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/`. They were last refreshed on 2026-06-02 by the comprehensive documentation refresh track, which **added 8 new guides**:
   - **[docs/guide_architecture.md](../docs/guide_architecture.md):** Thread domains, cross-thread patterns, AI client multi-provider (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax), HITL Execution Clutch.
   - **[docs/guide_tools.md](../docs/guide_tools.md):** MCP Bridge 3-layer security, full 45-tool inventory, Hook API, ApiHookClient, `/api/ask` HITL protocol.
   - **[docs/guide_mma.md](../docs/guide_mma.md):** Ticket/Track/WorkerContext data structures, DAG engine, ConductorEngine, Tier 2/3/4 lifecycles, persona application.
   - **[docs/guide_simulations.md](../docs/guide_simulations.md):** `live_gui` fixture, Puppeteer pattern, mock provider, test areas by subsystem.
   - **[docs/guide_testing.md](../docs/guide_testing.md):** **NEW.** 251 test files, 5 categories, 7 conftest fixtures (`isolate_workspace`, `reset_paths`, `reset_ai_client`, `vlogger`, `kill_process_tree`, `mock_app`, `live_gui` session-scoped), Puppeteer pattern, mock provider, structural testing contract.
   - **[docs/guide_gui_2.md](../docs/guide_gui_2.md):** **NEW.** `src/gui_2.py` (260KB main GUI): App class lifecycle, ~90 module-level render functions, Multi-Viewport docks, panel registry, command palette integration, ImGuiScope context managers, hot reload support.
   - **[docs/guide_ai_client.md](../docs/guide_ai_client.md):** **NEW.** `src/ai_client.py` (116KB): multi-provider LLM singleton (5 providers), async dispatch via `asyncio.gather`, `threading.local` for source tier tagging, Anthropic ephemeral caching + Gemini explicit caching, Tier 4 QA error interception.
   - **[docs/guide_api_hooks.md](../docs/guide_api_hooks.md):** **NEW.** `src/api_hooks.py` + `src/api_hook_client.py` (38KB + 31KB): HookServer on `127.0.0.1:8999`, ApiHookClient wrapper, 8+ endpoints, Remote Confirmation Protocol via `/api/ask`.
   - **[docs/guide_mcp_client.md](../docs/guide_mcp_client.md):** **NEW.** `src/mcp_client.py` (81KB, 45 tools): 3-layer security (Allowlist → Validate → Resolve), all native tools (File I/O, Python AST, C/C++ AST, Analysis, Network, Runtime, Beads), ExternalMCPManager (Stdio + SSE), JSON-RPC 2.0 engine.
   - **[docs/guide_app_controller.md](../docs/guide_app_controller.md):** **NEW.** `src/app_controller.py` (166KB): headless orchestrator, AppState dataclass, all subsystem managers, `_predefined_callbacks`/`_gettable_fields` Hook API registries, SyncEventQueue, headless mode.
   - **[docs/guide_multi_agent_conductor.md](../docs/guide_multi_agent_conductor.md):** **NEW.** `src/multi_agent_conductor.py` + `src/dag_engine.py` (28KB + 10KB): TrackDAG (iterative DFS cycle detection, Kahn's topological sort), ExecutionEngine (Auto-Queue / Step Mode), MultiAgentConductor + WorkerPool (concurrency 4), `mma_exec.py` sub-agent invocation.
   - **[docs/guide_models.md](../docs/guide_models.md):** **NEW.** `src/models.py` (132KB): centralized data model registry, `AGENT_TOOL_NAMES` canonical 45-tool list, `PROVIDERS` constant, `parse_plan_md` utility, validation patterns, SDM tags.
   - See [docs/Readme.md](../docs/Readme.md) for the full **14-guide index** covering context curation, shaders, RAG, Beads, hot reload, personas, NERV theme, workspace profiles, and command palette.
## Task Workflow

All tasks follow a strict lifecycle:

### Standard Task Workflow

0. **Initialize MMA Environment:** Before executing the first task of any track, you MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`).

1. **Select Task:** Choose the next available task from `plan.md` in sequential order.

2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`.

3. **High-Signal Research Phase:**
   - **Identify Dependencies:** Use `list_directory`, `get_tree`, and `py_get_imports` to map file relations.
   - **Map Architecture:** Use `py_get_code_outline` or `py_get_skeleton` on the identified files to understand their structure.
   - **Audit State:** Use `py_get_code_outline` or `py_get_definition` on the target class's `__init__` method to check for existing, unused, or duplicate state variables before adding new ones.
   - **Analyze Changes:** Use `get_git_diff` if the task involves modifying recently updated code.
   - **Minimize Token Burn:** Only use `read_file` with `start_line`/`end_line` for specific implementation details once the target areas are identified.

4. **Write Failing Tests (Red Phase):**
   - **Pre-Delegation Checkpoint:** Before spawning a worker for dangerous or non-trivial changes, ensure your current progress is staged (`git add .`) or committed. This prevents losing iterations if a sub-agent incorrectly uses `git restore`.
   - **Zero-Assertion Ban:** You MUST NOT write tests that contain only `pass` or lack meaningful assertions. A test is valid only if its assertions explicitly test the behavioral change and verify the failure condition.
   - **Code Style:** ALWAYS explicitly say "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range), WHAT (the test to create), HOW (which assertions/fixtures to use), and SAFETY (thread constraints, if applicable). Example: `"Write tests in tests/test_cost_tracker.py for cost_tracker.py:estimate_cost(). Test all model patterns in MODEL_PRICING dict. Assert unknown model returns 0. Use 1-space indentation."` If you are repeating a delegation because of failures, pass `--failure-count X` to switch to a more capable model.
   - Take the code generated by the Worker and apply it.
   - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.

5. **Implement to Pass Tests (Green Phase):**
   - **Pre-Delegation Checkpoint:** Ensure current progress is staged or committed before delegating.
   - **Code Style:** ALWAYS explicitly say "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range to modify), WHAT (the specific change), HOW (which API calls, data structures, or patterns to use), and SAFETY (thread-safety constraints). Example: `"In gui_2.py _render_mma_dashboard (lines 2685-2699), extend the token usage table from 3 to 5 columns. Add 'Model' and 'Est. Cost' using imgui.table_setup_column(). Call cost_tracker.estimate_cost(model, input_tokens, output_tokens). Use 1-space indentation."` If you are repeating a delegation because of failures, pass `--failure-count X` to switch to a more capable model.
   - Take the code generated by the Worker and apply it.
   - Rerun the tests (in small targeted batches, not the full suite) and confirm that they all pass. This is the "Green" phase.

6. **Refactor (Optional but Recommended):**
   - With the safety of passing tests, refactor the implementation and test code to improve clarity, remove duplication, and enhance performance without changing external behavior.
   - Rerun the tests to ensure they still pass after refactoring.

7. **Verify Coverage:** Run coverage reports using the project's chosen tools. For example, in a Python project this might look like:

   ```powershell
   uv run pytest --cov=src --cov-report=html
   ```

   Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.

8. **Document Deviations:** If the implementation differs from the tech stack:
   - **STOP** implementation.
   - Update `tech-stack.md` with the new design.
   - Add a dated note explaining the change.
   - Resume implementation.

9. **Commit Code Changes:**
   - **CRITICAL - ATOMIC PER-TASK COMMITS:** You MUST commit your changes immediately after completing and verifying a single task. Do NOT move on to the next task in the plan without committing the current one. This ensures precise tracking and safe rollback points.
   - Stage all code changes related to the task.
   - Propose a clear, concise commit message, e.g., `feat(ui): Create basic HTML structure for calculator`.
   - Perform the commit.

10. **Attach Task Summary with Git Notes:**
    - **Step 10.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
    - **Step 10.2: Draft Note Content:** Create a detailed summary for the completed task. It should include the task name, a summary of changes, a list of all created/modified files, and the core "why" behind the change.
    - **Step 10.3: Attach Note:** Use the `git notes` command to attach the summary to the commit:

      ```powershell
      # The note content from the previous step is passed via the -m flag.
      git notes add -m "<note content>" <commit_hash>
      ```

11. **Get and Record Task Commit SHA:**
    - **Step 11.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed commit's* hash.
    - **Step 11.2: Write Plan:** Write the updated content back to `plan.md`.

12. **Commit Plan Update:**
    - **Action:** Stage the modified `plan.md` file.
    - **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).
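
The `plan.md` bookkeeping in steps 2 and 11 is mechanical enough to script. The sketch below assumes task lines look like `- [ ] Task text`. It also assumes the short SHA is appended after a single space. Both are assumptions, so match whatever format your `plan.md` already uses. `set_task_status` is a hypothetical helper, and it keeps each line's original ending so CRLF files stay CRLF.

```python
import re

# Assumed task line shape: "- [ ] Task text", "- [~] Task text", "- [x] Task text".
TASK_RE = re.compile(r'^(\s*[-*] )\[( |~|x)\] (.*)$')


def set_task_status(plan_text: str, task: str, status: str, sha: str = '') -> str:
 out = []
 done = False
 for line in plan_text.splitlines(keepends=True):
  body = line.rstrip('\r\n')
  eol = line[len(body):]  # keep the original '\r\n' or '\n'
  m = TASK_RE.match(body)
  if not done and m and task in m.group(3):
   text = m.group(3)
   if status == 'x' and sha:
    text = f'{text} {sha[:7]}'  # step 11: append the 7-char short SHA
   line = f'{m.group(1)}[{status}] {text}{eol}'
   done = True
  out.append(line)
 if not done:
  raise ValueError(f'task not found: {task!r}')
 return ''.join(out)
```

Read and write `plan.md` with `newline=''` (as in the Code Style section) so the line endings this helper preserves actually reach the disk.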
### Phase Completion Verification and Checkpointing Protocol

**Trigger:** This protocol is executed immediately after completing a task that also concludes a phase in `plan.md`.

1. **Announce Protocol Start:** Inform the user that the phase is complete and that the verification and checkpointing protocol has begun.

2. **Ensure Test Coverage for Phase Changes:**
   - **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
   - **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
   - **Step 2.3: Verify and Create Tests:** For each file in the list:
     - **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
     - For each remaining code file, verify that a corresponding test file exists.
     - If a test file is missing, you **must** create one. Before writing the test, **first analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).

3. **Execute Automated Tests in Batches:**
   - Because the full suite is large (>360 tests) and contains complex UI simulations, running the entire suite frequently can lead to random timeouts or threading access violations.
   - Before execution, you **must** announce the exact shell command.
   - **CRITICAL:** When verifying changes, **do not run the full suite (`pytest tests/`)**. Instead, run tests in small, targeted batches (maximum 4 test files at a time). Only use long timeouts (`--timeout=60` or `--timeout=120`) if the tests in the batch are known to be slow (e.g., simulation tests).
   - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py`"
   - Execute the announced command.
   - If tests fail with significant output (e.g., a large traceback), **DO NOT** read the raw `stderr` directly into your context. Instead, redirect the output to a log file and **spawn a Tier 4 QA Agent (`uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`)** to summarize the failure.
   - You **must** inform the user and begin debugging using the QA Agent's summary. You may propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.

4. **Execute Automated API Hook Verification:**
   - **CRITICAL:** The Conductor agent will now automatically execute verification tasks using the application's API hooks.
   - The agent will announce the start of the automated verification to the user.
   - It will then communicate with the application's IPC server to trigger the necessary verification functions.
   - **Result Handling:**
     - All results (successes and failures) from the API hook invocations will be logged.
     - If all automated verifications pass, the agent will inform the user and proceed to the next step (Create Checkpoint Commit).
     - If any automated verification fails, the agent will halt the workflow, present the detailed failure logs to the user, and await further instructions for debugging or remediation.

5. **Present Automated Verification Results and User Confirmation:**
   - After executing automated verification, the Conductor agent will present the results to the user.
   - If verification passed, the agent will state: "Automated verification completed successfully."
   - If verification failed, the agent will state: "Automated verification failed. Please review the logs above for details." The agent may then propose a fix a **maximum of two times**. If verification still fails after the second proposed fix, it **must stop**, report the persistent failure, and ask the user for guidance.
   - **PAUSE** and await the user's response. Do not proceed without explicit confirmation from the user if verification passed, or without guidance if it failed.

6. **Create Checkpoint Commit:**
   - Stage all changes. If no changes occurred in this step, create an empty commit (`git commit --allow-empty`).
   - Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).

7. **Attach Auditable Verification Report using Git Notes:**
   - **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
   - **Step 7.2: Attach Note:** Use the `git notes` command and the full commit hash from the previous step to attach the full report to the checkpoint commit.

8. **Get and Record Phase Checkpoint SHA:**
   - **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
   - **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
   - **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.

9. **Commit Plan Update:**
   - **Action:** Stage the modified `plan.md` file.
   - **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.

10. **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.
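
Steps 2.1-2.3 and the batching rule in step 3 can be sketched as small helpers. The function names are hypothetical. The `[checkpoint: <sha>]` marker comes from Step 8.2, the excluded extensions from Step 2.3, and the batch size of 4 from step 3. `changed_files` simply shells out to the `git diff --name-only` command from Step 2.2.

```python
import re
import subprocess
from typing import List, Optional

CHECKPOINT_RE = re.compile(r'\[checkpoint: ([0-9a-f]{7,40})\]')
NON_CODE_SUFFIXES = ('.json', '.md', '.yaml')


def previous_checkpoint_sha(plan_text: str) -> Optional[str]:
 # Step 2.1: the most recent checkpoint marker starts the current phase's scope.
 # None means no checkpoint yet, i.e. scope is everything since the first commit.
 found = CHECKPOINT_RE.findall(plan_text)
 return found[-1] if found else None


def changed_files(since_sha: str) -> List[str]:
 # Step 2.2: git diff --name-only <previous_checkpoint_sha> HEAD
 out = subprocess.run(['git', 'diff', '--name-only', since_sha, 'HEAD'], capture_output=True, text=True, check=True).stdout
 return [line for line in out.splitlines() if line]


def code_files(paths: List[str]) -> List[str]:
 # Step 2.3: drop non-code files before looking for matching tests.
 return [p for p in paths if not p.lower().endswith(NON_CODE_SUFFIXES)]


def pytest_batches(test_files: List[str], size: int = 4) -> List[List[str]]:
 # Step 3: never run tests/ wholesale; at most `size` files per pytest call.
 return [['uv', 'run', 'pytest', *test_files[i:i + size]] for i in range(0, len(test_files), size)]
```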
### Verification via API Hooks

For features involving the GUI or complex internal state, unit tests are often insufficient. You MUST use the application's built-in API hooks for empirical verification:

1. **Launch the App with Hooks:** Run the application in a separate shell with the `--enable-test-hooks` flag:

   ```powershell
   uv run python gui.py --enable-test-hooks
   ```

   This starts the hook server on port `8999`.

2. **Use the pytest `live_gui` Fixture:** For automated tests, use the session-scoped `live_gui` fixture defined in `tests/conftest.py`. This fixture handles the lifecycle (startup/shutdown) of the application with hooks enabled.

   ```python
   def test_my_feature(live_gui):
    # The GUI is now running with the hook server on port 8999
    ...
   ```

   Note: always run pytest through `uv` (`uv run pytest ...`).

3. **Verify via ApiHookClient:** Use the `ApiHookClient` in `src/api_hook_client.py` to interact with the running application. It includes retry logic and health checks.

4. **Verify via REST Commands:** Use PowerShell or `curl` to send commands to the application and verify the response. For example, to check health:

   ```powershell
   Invoke-RestMethod -Uri "http://127.0.0.1:8999/status" -Method Get
   ```
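
For a quick readiness check outside pytest, a polling loop against the `/status` endpoint above might look like the sketch below. It uses only the standard library and is not `ApiHookClient`, so prefer the real client in tests. The function name and the default timeout/interval values are assumptions.

```python
import time
import urllib.request


def wait_for_hook_server(base_url: str = 'http://127.0.0.1:8999', timeout: float = 30.0, interval: float = 0.5) -> bool:
 # Poll GET <base_url>/status until it answers HTTP 200 or the deadline passes.
 deadline = time.monotonic() + timeout
 while True:
  try:
   with urllib.request.urlopen(f'{base_url}/status', timeout=interval) as resp:
    if resp.status == 200:
     return True
  except OSError:
   # Connection refused, timeouts and HTTP errors (URLError subclasses OSError).
   pass
  if time.monotonic() >= deadline:
   return False
  time.sleep(interval)
```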
### Quality Gates

Before marking any task complete, verify:

- [ ] All tests pass
- [ ] Code coverage meets requirements (>80%)
- [ ] Code follows the project's code style guidelines (as defined in `code_styleguides/`)
- [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
- [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
- [ ] No linting or static analysis errors (using the project's configured tools)
- [ ] Works correctly on mobile (if applicable)
- [ ] Documentation updated if needed
- [ ] No security vulnerabilities introduced
## Development Commands

**AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**

### Setup

```powershell
# Example: Commands to set up the development environment (e.g., install dependencies, configure database)
# e.g., for a Node.js project: npm install
# e.g., for a Go project: go mod tidy
```

### Daily Development

```powershell
# Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
# e.g., for a Node.js project: npm run dev, npm test, npm run lint
# e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
```

### Before Committing

```powershell
# Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
# e.g., for a Node.js project: npm run check
# e.g., for a Go project: make check (if a Makefile exists)
```
## Testing Requirements

### Structural Testing Contract

1. **Ban on Arbitrary Core Mocking:** Tier 3 workers are strictly forbidden from using `unittest.mock.patch` to bypass or stub core infrastructure (e.g., event queues, `ai_client` internals, threading primitives) unless explicitly authorized by the Tier 2 Tech Lead for a specific boundary test.
2. **`live_gui` Standard:** All integration and end-to-end testing must use the `live_gui` fixture to interact with a real instance of the application via the Hook API. Bypassing the hook server to directly mutate GUI state in tests is prohibited.
3. **Artifact Isolation:** All test-generated artifacts (logs, temporary workspaces, mock outputs) MUST be written to the `tests/artifacts/` or `tests/logs/` directories. These directories are git-ignored to prevent repository pollution.

### Unit Testing

- Every module must have corresponding tests.
- Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
- Mock external dependencies.
- Test both success and failure cases.
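
The Zero-Assertion Ban and the success/failure rule can be illustrated with the sketch below. It uses a self-contained stand-in for the `estimate_cost()` example from the delegation prompt. `MODEL_PRICING` here holds a fake `example-model` with made-up prices, not real pricing, and the real `cost_tracker` module may behave differently.

```python
# Self-contained stand-in: 'example-model' and its prices are made up.
MODEL_PRICING = {'example-model': (1.0, 2.0)}  # USD per 1M tokens: (input, output)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
 if model not in MODEL_PRICING:
  return 0.0
 in_price, out_price = MODEL_PRICING[model]
 return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def test_banned_zero_assertion():
 # BANNED: no assertion, so this passes even if estimate_cost is wrong.
 estimate_cost('example-model', 100, 100)


def test_known_model_cost():
 # Success case: pin the exact value instead of "it didn't raise".
 assert estimate_cost('example-model', 1_000_000, 1_000_000) == 3.0


def test_unknown_model_returns_zero():
 # Failure case from the delegation example: unknown models cost 0.
 assert estimate_cost('no-such-model', 10, 10) == 0
```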
### Integration Testing

- Test complete user flows
- Verify database transactions
- Test authentication and authorization
- Check form submissions

### Mobile Testing

- Test on actual iPhone when possible
- Use Safari developer tools
- Test touch interactions
- Verify responsive layouts
- Check performance on 3G/4G
## Code Review Process

### Self-Review Checklist

Before requesting review:

1. **Functionality**
   - Feature works as specified
   - Edge cases handled
   - Error messages are user-friendly

2. **Code Quality**
   - Follows style guide
   - DRY principle applied
   - Clear variable/function names
   - No comments unless explicitly requested (see Code Style)

3. **Testing**
   - Unit tests comprehensive
   - Integration tests pass
   - Coverage adequate (>80%)

4. **Security**
   - No hardcoded secrets
   - Input validation present
   - SQL injection prevented
   - XSS protection in place

5. **Performance**
   - Database queries optimized
   - Images optimized
   - Caching implemented where needed

6. **Mobile Experience**
   - Touch targets adequate (44x44px)
   - Text readable without zooming
   - Performance acceptable on mobile
   - Interactions feel native
## Commit Guidelines

### Message Format

```
<type>(<scope>): <description>

[optional body]

[optional footer]
```

### Types

- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation only
- `style`: Formatting, missing semicolons, etc.
- `refactor`: Code change that neither fixes a bug nor adds a feature
- `test`: Adding missing tests
- `chore`: Maintenance tasks
- `conductor`: Plan and checkpoint bookkeeping commits (e.g., `conductor(plan): ...`, `conductor(checkpoint): ...`)

### Examples

```powershell
git commit -m "feat(auth): Add remember me functionality"
git commit -m "fix(posts): Correct excerpt generation for short posts"
git commit -m "test(comments): Add tests for emoji reaction limits"
git commit -m "style(mobile): Improve button touch targets"
```
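
A pre-commit check for the header line could be sketched as follows. The regex and the `check_commit_header` name are assumptions. The type list is the standard one plus `conductor`, which this workflow uses for `conductor(plan)` and `conductor(checkpoint)` commits.

```python
import re
from typing import List

COMMIT_TYPES = ('feat', 'fix', 'docs', 'style', 'refactor', 'test', 'chore', 'conductor')
# <type>(<scope>): <description>, with a scope free of spaces and parentheses.
HEADER_RE = re.compile(r'^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<desc>\S.*)$')


def check_commit_header(message: str) -> List[str]:
 # Only the first line is checked; the optional body and footer are free-form.
 lines = message.splitlines()
 header = lines[0] if lines else ''
 m = HEADER_RE.match(header)
 if not m:
  return ['header must look like <type>(<scope>): <description>']
 if m.group('type') not in COMMIT_TYPES:
  return [f"unknown type {m.group('type')!r}"]
 return []
```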
## Definition of Done

A task is complete when:

1. All code implemented to specification
2. Unit tests written and passing
3. Code coverage meets project requirements
4. Documentation complete (if applicable)
5. Code passes all configured linting and static analysis checks
6. Works beautifully on mobile (if applicable)
7. Implementation notes added to `plan.md`
8. Changes committed with proper message
9. Git note with task summary attached to the commit
## Conductor Token Firewalling & Model Switching Strategy

To emulate the 4-Tier MMA Architecture within the standard Conductor extension without requiring a custom fork, adhere to these strict workflow policies:

### 1. Active Model Switching (Simulating the 4 Tiers)

- **Mandatory Skill Activation:** As the very first step of any MMA-driven process, including track initialization and implementation phases, the agent MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`) and the tier skill for its own role. This is crucial for enforcing the 4-Tier token firewall.
- **The MMA Bridge (`mma_exec.py`):** All tiered delegation is routed through `uv run python scripts/mma_exec.py`. This script acts as the primary bridge, managing model selection, context injection, and logging.
- **Model Tiers:**
  - **Tier 1 (Strategic/Orchestration):** `gemini-3.1-pro-preview`. Focused on product alignment, setup (`/conductor:setup`), and track initialization (`/conductor:newTrack`).
  - **Tier 2 (Architectural/Tech Lead):** `gemini-3-flash-preview`. Focused on architectural design and track execution (`/conductor:implement`). **Note:** Tier 2 maintains persistent memory throughout a track's implementation.
  - **Tier 3 (Execution/Worker):** `gemini-2.5-flash-lite`. Used for surgical code implementation and test generation. Operates statelessly (Context Amnesia) but has access to file I/O tools.
  - **Tier 4 (Utility/QA):** `gemini-2.5-flash-lite`. Used for log summarization and error analysis. Operates statelessly (Context Amnesia) but has access to diagnostic tools.
- **Tiered Delegation Protocol:**
  - **Tier 3 Worker:** `uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`
  - **Tier 4 QA Agent:** `uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`
- **Observability:** All hierarchical interactions are recorded in `logs/mma_delegation.log`, and detailed sub-agent logs are saved to `logs/agents/`.
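
Prompts often contain quotes and newlines, so it is safer to build the delegation command as an argument list than as a shell string. The sketch below is hypothetical. Both the argument order and placing `--failure-count` before the prompt are assumptions about `mma_exec.py`'s CLI, inferred only from the flags named in this document.

```python
import subprocess
from typing import List

ROLES = ('tier3-worker', 'tier4-qa')


def delegation_argv(role: str, prompt: str, failure_count: int = 0) -> List[str]:
 if role not in ROLES:
  raise ValueError(f'unknown role: {role}')
 argv = ['uv', 'run', 'python', 'scripts/mma_exec.py', '--role', role]
 if failure_count > 0:
  # Retry after failures: mma_exec.py switches to a more capable model.
  argv += ['--failure-count', str(failure_count)]
 # The prompt is one list element, so no shell quoting is involved.
 argv.append(prompt)
 return argv


def delegate(role: str, prompt: str, failure_count: int = 0) -> str:
 # check=True raises CalledProcessError if the sub-agent exits non-zero.
 done = subprocess.run(delegation_argv(role, prompt, failure_count), capture_output=True, text=True, check=True)
 return done.stdout
```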
### 2. Context Management and Token Firewalling

- **Context Amnesia (Tiers 3 & 4):** `mma_exec.py` enforces "Context Amnesia" by executing sub-agents statelessly. Each call starts with a clean slate, receiving only the strictly necessary documents and prompts.
- **Persistent Memory (Tier 2):** The Tier 2 Tech Lead does NOT use Context Amnesia during track implementation, to ensure continuity of technical strategy.
- **AST Skeleton Views:** For Tier 3 implementation, `mma_exec.py` automatically generates "AST Skeleton Views" of project dependencies. This provides the worker model with the interface-level structure (function signatures, docstrings) of imported modules without the full source code, maximizing the signal-to-noise ratio in the context window.
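
The general technique behind a skeleton view can be sketched with the standard `ast` module (Python 3.9+ for `ast.unparse`). This is not `mma_exec.py`'s implementation, whose source is not shown here. It only illustrates keeping signatures and docstrings while dropping function bodies.

```python
import ast


class _StripBodies(ast.NodeTransformer):
 # Replace every function body with its docstring (if any) plus `...`.
 def _strip(self, node):
  doc = ast.get_docstring(node)
  node.body = ([ast.Expr(ast.Constant(doc))] if doc else []) + [ast.Expr(ast.Constant(...))]
  return node

 visit_FunctionDef = _strip
 visit_AsyncFunctionDef = _strip


def skeleton(source: str) -> str:
 # Imports, class headers, signatures and docstrings survive; bodies do not.
 tree = _StripBodies().visit(ast.parse(source))
 return ast.unparse(ast.fix_missing_locations(tree))
```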
### 3. Phase Checkpoints (The Final Defense)

- The **Phase Completion Verification and Checkpointing Protocol** is the project's primary defense against token bloat.
- When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a **"Context Wipe"** signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history.
- **MMA Phase Memory Wipe:** After completing a major Phase, use the Tier 1/2 Orchestrator's perspective to consolidate state into Git Notes and then disregard previous trial-and-error histories.

---

## Known Pitfalls (2026-06-05)
### Defer-Not-Catch Pattern for Native Crashes

`imgui-bundle` (like similar native extension libraries) exposes C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python:** `try/except Exception` intercepts only Python exceptions, not native access violations.

The fix is **defer-not-catch**: track a one-shot "ready" flag in instance state, return early on the first call, and invoke the C function only on subsequent calls. See [../docs/guide_gui_2.md](../docs/guide_gui_2.md#workspace-profile-defer-not-catch) and [../docs/guide_testing.md](../docs/guide_testing.md#known-gotchas-2026-06-05) for the canonical examples and for how to recognize these crashes.

When designing any method that calls into `imgui.*` (or similar native libraries), ask: "Can this be called before ImGui is fully initialized?" If yes, add a defer-not-catch guard.

**Sentinel type contract.** When implementing a defer-not-catch guard, the early-return sentinel value must match the type contract of the downstream consumer. For `WorkspaceProfile.ini_content: str` (in this codebase), the sentinel must be `""` (str), not `b""` (bytes). `tomli_w` rejects bytes (`TypeError: Object of type 'bytes' is not TOML serializable`), and `imgui.load_ini_settings_from_memory(ini_data: str, ...)` also expects `str`. A previous version of this fix used `b""` and silently broke the save flow through a `TypeError` raised by `tomli_w.dump`. The unit tests passed, but the live_gui save+load round-trip failed. The fix was a 1-character change (`b""` → `""`). The regression test in `tests/test_workspace_profile_serialization.py` encodes this contract.
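
A minimal sketch of the guard follows. It assumes a hypothetical `IniCapture` wrapper around whatever native call produces the INI text. The native call is injected so the example runs without `imgui-bundle`. Note that the early return uses the `str` sentinel required by the contract above.

```python
from typing import Callable


class IniCapture:
 # Hypothetical wrapper illustrating defer-not-catch; not the project's class.
 def __init__(self, save_ini: Callable[[], str]) -> None:
  self._save_ini = save_ini  # stands in for a native call that may crash if called too early
  self._ready = False  # one-shot flag: flips after the first (skipped) call

 def capture(self) -> str:
  if not self._ready:
   # First call: ImGui may not be initialized yet, so never touch native code.
   # The sentinel is "" (str), matching WorkspaceProfile.ini_content: str.
   self._ready = True
   return ""
  return self._save_ini()
```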
### Test Failure Bisect Anchors (Theme Track)

When debugging test failures introduced by a theming/visual change, use the following bisect anchors:

- **Pre-existing failures:** bisect to commit `7df65dff` (the last commit before the multi_themes_20260604 track began). Failures that reproduce at this anchor are pre-existing and not caused by the theme changes.
- **Theme-caused failures:** bisect to commit `7ea52cbb` (the theme refactor commit). Failures that appear only after this commit, and not at `7df65dff`, were introduced by the theme track.

In particular, watch for:

- Tests asserting theme color usage: the theme track changed `C_LBL` etc. from `ImVec4` values to callable functions. Tests that assert against `C_LBL` (the function) must be updated to `C_LBL()` (the call), and they must patch `src.theme_2.imgui` so that the mock's `theme.get_color()` returns the mock's `ImVec4`.
- Tests covering production code that builds dicts of theme color callables (e.g., `DIR_COLORS = {"request": C_OUT}`): the dict must store the function, and the use site must call it (`d_col()`, not `d_col`). Bug example: `src/gui_2.py:3705-3707` (commit `1469ecac`).
|
|
|
|
### Live_gui Test Fragility (Authoring-Side)
|
|
|
|
`live_gui` is a session-scoped fixture. All tests in a session share the same `sloppy.py` subprocess. A test that "passes when run after test X but fails in isolation" is a **fragile test, not a fragile fixture**. The fixture is session-scoped by design; the test must explicitly wait-for-ready, reset state via Hook API, and verify preconditions via `get_value`/`wait_for_event` rather than assuming a "clean" ImGui state from a prior test. See [../docs/guide_testing.md](../docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state) for the 5-rule authoring contract with anti-pattern vs pattern code examples. Bisect failures by running the test both in the full suite and in isolation to distinguish "test needs work" from "real app bug".
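The sketch below covers the wait-and-verify part of that contract. It assumes only that the Hook API client exposes something like `get_value(key)`. `wait_for`, `expect_value`, and any key names are hypothetical, so check the client's real method signatures before reusing this.

```python
import time
from typing import Any, Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.05) -> None:
 # Poll instead of sleeping a fixed amount; fail loudly if the condition never holds.
 deadline = time.monotonic() + timeout
 while not predicate():
  if time.monotonic() >= deadline:
   raise TimeoutError(f"condition not met within {timeout}s")
  time.sleep(interval)


def expect_value(client: Any, key: str, expected: Any, timeout: float = 10.0) -> None:
 # Verify a precondition through the app rather than trusting an earlier test to have set it.
 wait_for(lambda: client.get_value(key) == expected, timeout=timeout)
```

In a test, this replaces a bare `time.sleep(...)` followed by an assertion. `expect_value(client, "<key>", <expected>)` either returns as soon as the app reports that state or raises `TimeoutError`.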
|
|
|
|
|
|
### Indentation-Driven Class Method Visibility (CRITICAL)
|
|
|
|
**The bug:** A class method defined with the right intent (2-space indent) may be parsed as **nested inside the previous function** if indentation is off by even one space. The file "passes" syntactically (imports OK) but the method is **not** on the class. `hasattr(App, 'method_name')` returns `False`. Any production code that calls `app.method_name` falls through to `__getattr__`, which delegates to the Controller (which also doesn't have the method), and a cryptic `AttributeError` is raised at runtime.
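The failure mode is easy to reproduce in a few lines. In this minimal sketch, `App` and `AppController` are stripped-down stand-ins for the real classes, and `__getattr__` models the delegation described above. The nested `def` sits one indentation level deeper than its sibling methods.

```python
class AppController:
 # Stand-in for the controller that App delegates unknown attributes to.
 pass


class App:
 def __init__(self) -> None:
  self.controller = AppController()

 def __getattr__(self, name: str):
  # Only reached when normal lookup fails, so a missing method is reported
  # as missing on the controller, not on App.
  return getattr(self.controller, name)

 def _apply_snapshot(self) -> None:
  pass

  def _capture_workspace_profile(self) -> str:
   # One level too deep: this is a local function of _apply_snapshot,
   # so App never gets a method by this name.
   return ""
```

`hasattr(App, "_capture_workspace_profile")` is `False`. Accessing the attribute on an instance raises `AttributeError: 'AppController' object has no attribute '_capture_workspace_profile'`, which names the controller rather than the class that actually lost the method.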
|
|
|
|
**This bit the project on 2026-06-05** during a cleanup commit. `_capture_workspace_profile` was indented with 3 spaces instead of 2 (drift from re-organizing method placement). The Python parser treated the method as a nested function inside `_apply_snapshot` (the preceding method), so the App class ended up with 59 methods and no `_capture_workspace_profile`. Three live_gui tests (`test_auto_switch_sim`, `test_workspace_profiles_restoration`, `test_undo_redo_lifecycle`) failed with a cryptic `AttributeError: 'AppController' object has no attribute '_capture_workspace_profile'` deep in the test subprocess.
|
|
|
|
**How to detect during TDD:**
|
|
- After modifying a class body, walk the AST and verify all expected methods are class-level:
|
|
```bash
|
|
uv run python -c "import ast; tree = ast.parse(open('src/gui_2.py', encoding='utf-8').read()); [print(item.name) for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == 'App' for item in n.body if isinstance(item, ast.FunctionDef)]"
|
|
```
|
|
- The skeleton via `manual-slop_py_get_skeleton` should show the method as a class member. If it's missing, it's nested.
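The sketch below is a slightly more thorough version of the same check. Besides listing class-level methods, it reports functions nested inside methods whose first parameter is `self`, which is what a mis-indented method looks like. `class_methods` and `suspect_nested_methods` are hypothetical helper names.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _find_class(tree: ast.AST, class_name: str) -> ast.ClassDef:
 for node in ast.walk(tree):
  if isinstance(node, ast.ClassDef) and node.name == class_name:
   return node
 raise LookupError(f"class {class_name!r} not found")


def class_methods(source: str, class_name: str) -> set[str]:
 # Names defined directly in the class body, i.e. what hasattr(cls, name) will see.
 cls = _find_class(ast.parse(source), class_name)
 return {item.name for item in cls.body if isinstance(item, FUNC_TYPES)}


def suspect_nested_methods(source: str, class_name: str) -> dict[str, list[str]]:
 # Functions nested inside methods that take `self` first; ordinary closures
 # usually don't, so these are likely methods indented one level too deep.
 cls = _find_class(ast.parse(source), class_name)
 found: dict[str, list[str]] = {}
 for item in cls.body:
  if not isinstance(item, FUNC_TYPES):
   continue
  inner = [
   n.name
   for n in ast.walk(item)
   if n is not item and isinstance(n, FUNC_TYPES) and n.args.args and n.args.args[0].arg == "self"
  ]
  if inner:
   found[item.name] = inner
 return found
```

For example, `{"_capture_workspace_profile"} - class_methods(open("src/gui_2.py", encoding="utf-8").read(), "App")` is empty when the method sits at class level.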
|
|
|
|
**How to fix:** Re-indent the affected method to exactly 2-space class level. Use the file_slice tool or PyCharm-style auto-format to verify. Run the failing test to confirm.
|
|
|
|
**Prevention:** When reorganizing a class body, run the AST check above immediately after the edit. This catches the issue in <1 second vs. finding it via failing live_gui tests minutes later.
|
|
|
|
---
|
|
|
|
## Planning Session Workflow
|
|
|
|
Some sessions are *planning-only* — the agent produces `spec.md` + `metadata.json` + `state.toml` + `plan.md` for a new track. NO code is written. The flow:
|
|
|
|
1. **Explore** the project context. Use the `brainstorming` skill for the structured process (explore → clarify → propose → spec → review → plan).
|
|
2. **Ask clarifying questions** (one at a time; multiple choice preferred) to nail down the design. The "what are you trying to achieve + what are the constraints" questions come first; the "what is the scope" question comes after.
|
|
3. **Propose 2-3 approaches** with tradeoffs. Lead with the recommended one and explain why.
|
|
4. **Write the spec** following the established template (Overview / Goals / Non-Goals / Architecture / Per-File Design / Migration / Risks / Out of Scope / See Also). The spec is the agent's *design intent* — it explains WHY, not just WHAT.
|
|
5. **User reviews the spec**. Revise until approved. **The spec MUST be approved before the plan is written.** A plan for an unapproved spec is wasted effort.
|
|
6. **Write the plan** following the `writing-plans` skill (2-5 minute steps; full code; TDD). The plan is the agent's *executable plan* — it shows exactly what code to write, one step at a time.
|
|
7. **User reviews the plan**. Revise until approved.
|
|
8. **Commit spec + plan** in separate commits (per-track: spec commit + plan commit; both with git notes summarizing the work). User invokes implementation in a different session.
|
|
|
|
**The plan is the only artifact the implementing agent reads.** Specs are reference; plans are executable. Both are committed.
|
|
|
|
**The agent (planning role) does not execute.** If a "while you're at it, can you also..." request arrives mid-session, redirect to a follow-up track; do NOT bundle unrelated work.
|
|
|
|
**For the agent's own reference:** the `brainstorming` skill is the source of truth for steps 1-6, and the `writing-plans` skill governs how the plan in step 6 is written.
|
|
|
|
---
|
|
|
|
## Track Dependencies and Execution Order
|
|
|
|
Tracks can depend on other tracks. The `blocked_by` field in each track's `metadata.json` lists the track IDs that must ship first. The field name in state.toml is `[blocked_by]` (a table of track_id = "merged" | "planned" | etc.).
|
|
|
|
Before starting implementation of a track:
|
|
|
|
1. **Verify all tracks in `blocked_by` are SHIPPED.** Check `conductor/tracks.md` for status (`[x]` = done), or read each blocked_by track's `state.toml` to confirm `current_phase` equals the last phase and the track's notes indicate completion.
|
|
2. **If any blocker is NOT shipped:** report to the Tier 2 Tech Lead. Do not proceed.
|
|
3. **If the post-state baseline assumptions in the spec (usually a §10 "Coordination with Pending Tracks" section) are not met:** STOP. The implementer must verify the baseline BEFORE starting Phase 1 of the track. The verification commands are in the spec.
|
|
|
|
The recommended execution order is the topological sort of the `blocked_by` graph. This is usually recorded in the most recent `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Execution Order" or "Dependency Picture").
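The topological sort can be sketched with the standard library's `graphlib` (Python 3.9+). The sketch assumes the `blocked_by` tables have already been read into a `track -> [blockers]` mapping. `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter


def execution_order(blocked_by: dict[str, list[str]]) -> list[str]:
 # graphlib takes node -> predecessors, which is exactly track -> blockers,
 # so every track comes out after everything it is blocked by.
 # Two tracks that block each other raise graphlib.CycleError.
 return list(TopologicalSorter(blocked_by).static_order())
```

A `CycleError` here points at a planning mistake in the `blocked_by` data rather than something to work around during implementation.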
|
|
|
|
---
|
|
|
|
## State.toml Template
|
|
|
|
Every track's `conductor/tracks/<track_id>/state.toml` should follow this structure (used as the agent's "where am I in this track" source of truth):
|
|
|
|
```toml
|
|
# Track state for <track_id>
|
|
# Updated by Tier 2 Tech Lead as tasks complete
|
|
|
|
[meta]
|
|
track_id = "<track_id>"
|
|
name = "<Human-Readable Name>"
|
|
status = "active" # active | completed
|
|
current_phase = 0 # 0 = pre-Phase 1; 1..N = in Phase N; "complete" if all phases done
|
|
last_updated = "<YYYY-MM-DD>"
|
|
|
|
[blocked_by]
|
|
# Optional. List of track_id = "merged" | "planned" | etc.
|
|
# When the implementation agent starts Phase 1, verify all listed tracks are merged.
|
|
other_track_id = "merged"
|
|
|
|
[blocks]
|
|
# Optional. Tracks that depend on this one (populated from the spec's §12.1 "Follow-up Track" section).
|
|
followup_track_id = "planned in <this_track_id>"
|
|
|
|
[phases]
|
|
# One entry per phase. Update checkpointsha when the phase checkpoint commit is made.
|
|
phase_1 = { status = "pending", checkpointsha = "", name = "<Phase Name>" }
|
|
phase_2 = { status = "pending", checkpointsha = "", name = "<Phase Name>" }
|
|
# ...
|
|
|
|
[tasks]
|
|
# Tasks within phases. Structure: t<phase>_<n> = { status, commit_sha, description }
|
|
# status: "pending" | "in_progress" | "completed" | "cancelled"
|
|
# The implementing agent marks "in_progress" when starting and "completed" with commit_sha when done.
|
|
t1_1 = { status = "pending", commit_sha = "", description = "<task description>" }
|
|
# ...
|
|
|
|
[verification]
|
|
# Filled as phases complete. The metadata.json's verification_criteria is the source of truth.
|
|
phase_<n>_<thing>_complete = false
|
|
|
|
[<track_specific_section>]
|
|
# Optional. Track-specific progress tracking (e.g., audit_count_progression, refactor_stats).
|
|
# Add whatever is useful for THIS track.
|
|
|
|
[public_api_migration_followup]
|
|
# Optional. If the spec plans a follow-up, list it here so future planners can find it.
|
|
```
|
|
|
|
The `current_phase` field is the single source of truth for "where is this track." When the implementing agent advances, they update it.
|
|
|
|
---
|
|
|
|
## Per-Task Decision Protocol
|
|
|
|
When the implementing agent encounters a decision not covered by the plan:
|
|
|
|
1. **If the decision is purely cosmetic** (e.g., variable naming, comment placement, exact spacing): pick the option that matches the surrounding code style. Document the choice in the commit message.
|
|
2. **If the decision affects the architecture** (e.g., the spec's data model doesn't fit the code; the plan's approach doesn't compile; an external library doesn't behave as expected): **STOP. Do not commit. Report to the Tier 2 Tech Lead.** The lead will either:
|
|
- Update the spec to match the new constraint
|
|
- Add a clarifying task to the plan
|
|
- Defer the work to a follow-up track
|
|
3. **If the decision is a regression** (e.g., the plan's code works but introduces a known bug, or fails a test the plan didn't anticipate): **STOP and report.** Don't ship a known regression to save time. The lead will decide whether to fix forward or roll back.
|
|
|
|
**The principle: small decisions, decide yourself. Large decisions, escalate.** The boundary is "does this decision require a new spec or plan update?"
|
|
|
|
**Documentation:** if a decision was made that the spec or plan should reflect (even if it was a small decision), add a brief note in the commit message. The next agent (after compaction) reads commit messages to recover context.
|
|
|
|
---
|
|
|
|
## Skip-Marker Policy: Documentation, Not Avoidance
|
|
|
|
`@pytest.mark.skip(reason=...)` is **documentation of a known failure**, not a way to avoid fixing the underlying bug. Skip markers are useful for:
|
|
|
|
- **Opt-in integration tests** that require external resources (a real API key, a live provider, a specific env var). Use `@pytest.mark.skipif(...)` with an env-var gate so the test runs when the resource is available and skips by default.
|
|
- **Tests for features that don't exist yet** (planned but not implemented).
|
|
- **Tests for features behind a feature flag** that's currently off.
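A minimal sketch of the env-var gate from the first bullet follows. `RUN_LIVE_PROVIDER_TESTS` and the `requires_live_provider` marker name are hypothetical. `pytest.mark.skipif(condition, reason=...)` is the standard pytest API. Defining the marker once and reusing it keeps the reason string identical across every opt-in test.

```python
import os

import pytest

# Hypothetical opt-in switch; use whatever variable the project standardizes on.
LIVE_ENV_VAR = "RUN_LIVE_PROVIDER_TESTS"


def live_provider_enabled() -> bool:
 return os.environ.get(LIVE_ENV_VAR) == "1"


# The condition is evaluated when the module is imported (i.e. at collection time).
requires_live_provider = pytest.mark.skipif(
 not live_provider_enabled(),
 reason=f"opt-in integration test: set {LIVE_ENV_VAR}=1 and provide a real API key to run it",
)


@requires_live_provider
def test_live_provider_roundtrip() -> None:
 # Placeholder body: a real test would call the live provider here.
 ...
```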
|
|
|
|
Skip markers are NOT useful for:
|
|
|
|
- **Pre-existing failing tests** (a test that "used to pass" or "was supposed to pass but the underlying code regressed"). The underlying code/test should be fixed in-session.
|
|
- **Tests that the agent doesn't understand** ("I don't know how to fix this, so I'll skip it"). Escalate to a Tier 4 QA agent for analysis, or ask the user.
|
|
- **Tests with racy assertions that the agent doesn't want to debug** (e.g., the failure disappears when a `time.sleep(0.5)` is added). Fix the race, don't skip.
|
|
|
|
**When you add a skip marker, you MUST also:**
|
|
1. Document the underlying issue in the `reason=` string (one or two sentences).
|
|
2. State what the fix would be (file:line or a one-line description).
|
|
3. Commit the skip with a follow-up note in the commit body that records the underlying issue, so the next agent (or future self after compaction) can find it via `git log --oneline --grep "skip"`.
|
|
|
|
**When the underlying issue is fixable in-session, FIX IT INSTEAD of adding a skip marker.** Limited context is not an excuse: the agent may not know whether the fix is "important" or "easy" until it tries. A skip marker that never gets revisited is silent test-suite rot.
|
|
|
|
**Review checklist before adding a skip marker:**
|
|
- [ ] Is this a known-bad infrastructure issue (env-var gated)? Use `@pytest.mark.skipif` instead.
|
|
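- [ ] Is this a feature not yet implemented? If so, make sure the feature itself is tracked (a TODO or a conductor track) and named in the `reason=` string; the skip documents the gap, it does not replace tracking it.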
|
|
- [ ] Can the test be fixed in < 30 minutes of investigation? If yes, fix it.
|
|
- [ ] If the fix is too large, is the underlying issue tracked elsewhere (a conductor track, a TODO in the code)?
|
|
|
|
Reference: AGENTS.md "Critical Anti-Patterns" section "Use skip markers as excuse to AVOID" (added 2026-06-07).
|
|
|
|
---
|
|
|
|
## Documentation Refresh Protocol
|
|
|
|
Architectural refactor tracks often change the *shape* of modules the existing docs describe. After a track ships, the affected guides may be partly out of date.
|
|
|
|
**After each track ships, the implementing agent must:**
|
|
|
|
1. **Identify affected guides.** Run `grep -l "<renamed_or_moved_thing>" docs/guide_*.md` to find guides that reference renamed/moved symbols. Also check `docs/Readme.md` for the table of guides.
|
|
2. **For each affected guide, update it to reflect the new module structure.** If the spec's §3 or §4 lists the new file structure, mirror that in the guide.
|
|
3. **If the track introduced a NEW module**, add a new guide (or a new section to an existing guide). Per the project's `docs/Readme.md` structure, deep-dive guides are per-source-file (e.g., `guide_ai_client.md`, `guide_mcp_client.md`).
|
|
4. **If the track introduced a NEW convention** (e.g., the `Result[T]` pattern, the `TypeAlias` convention, the sub-MCP architecture), add a styleguide in `conductor/code_styleguides/<convention_name>.md`. Update `conductor/product-guidelines.md` to reference it.
|
|
5. **Commit the doc updates** as part of the track's final phase (or as a follow-up track if the scope is too large).
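Step 1 can also be scripted when a track renames several symbols at once. The following is a minimal sketch using only `pathlib`. `affected_guides` is a hypothetical helper. Plain substring matching is an assumption, and it will also match longer names that contain the symbol.

```python
from pathlib import Path


def affected_guides(symbols: list[str], docs_dir: Path = Path("docs")) -> dict[str, list[str]]:
 # Same idea as `grep -l`, but for several renamed/moved symbols in one pass.
 hits: dict[str, list[str]] = {}
 for guide in sorted(docs_dir.glob("guide_*.md")):
  text = guide.read_text(encoding="utf-8")
  found = [s for s in symbols if s in text]
  if found:
   hits[guide.name] = found
 return hits
```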
|
|
|
|
**The "post-tracks documentation" pattern is repeatable.** A track that only updates code (not docs) is incomplete. The latest `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Future Tracks") often lists the documentation refresh as the next track.
|
|
|
|
**Test for staleness:** before marking a track complete, compare `git log --oneline -10 -- docs/ conductor/code_styleguides/` with `git log --oneline -10 -- src/` to confirm the docs were touched in the same window as the code. If only code was committed, the track is incomplete.
|
|
|
|
---
|
|
|
|
## Audit Script Policy
|
|
|
|
Whenever a track introduces a new convention that can be statically checked, add an audit script in `scripts/`. The audit + CI gate pair is the convention-enforcement mechanism for this project. Conventions without audits will drift; audits without CI integration will be ignored.
|
|
|
|
**Script conventions:**
|
|
- Filename: `audit_<thing>.py` or `check_<thing>.py` (matching the existing 3 scripts)
|
|
- Must have a `--help` that explains what it checks and how to fix violations
|
|
- Should support a `--json` mode for CI integration (machine-readable output)
|
|
- Should have a default informational mode (exits 0; prints human-readable report) AND a strict mode (exits 1 on regression; used as CI gate)
|
|
- Should be runnable from the repo root
|
|
|
|
**Existing audit scripts as precedent:**
|
|
- `scripts/audit_main_thread_imports.py` — enforces the main-thread-purity invariant from the `startup_speedup_20260606` track
|
|
- `scripts/audit_weak_types.py` — enforces the type-alias convention from the `data_structure_strengthening_20260606` track
|
|
- `scripts/check_test_toml_paths.py` — enforces no real-TOML references in tests (predates the audit-script-policy, but follows the pattern)
|
|
|
|
**CI integration:** when a new audit script is added, it should be added to whatever CI workflow exists (or a follow-up track should add the CI workflow if one doesn't exist). The strict mode of the audit is the gate.
|
|
|
|
**The audit-script + styleguide pair:** every audit script's documented "what it checks" should map to a section in a `conductor/code_styleguides/` file. The styleguide says "this is the rule"; the audit says "your code violates this rule." The pair is complete when both exist.
|
|
|