# Project Workflow

## Session Start Checklist (MANDATORY)

## Code Style (MANDATORY - Python)

- **1-space indentation** for ALL Python code (NO EXCEPTIONS)
- **CRLF line endings** on Windows
- **NO COMMENTS** unless explicitly requested
- Type hints required for all public functions
- **ImGui Defer Patterns:** Use `imscope` context managers or `_render_window_if_open` dispatch helpers to prevent resource leaks and keep the main loop flat. See `conductor/code_styleguides/python.md` for details.

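The scope idea can be sketched with `contextlib.contextmanager`. The closing call sits in a `finally` block, so an early `return` or an exception inside the `with` body cannot leave a begin/end pair unbalanced. This is a minimal sketch, not the project's `imscope` API. `window_scope` is a hypothetical name, and `begin`/`end` are injected stand-ins for the real ImGui calls so the example runs without a GUI. It follows the 1-space indentation rule above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def window_scope(begin: Callable[[str], bool], end: Callable[[], None], name: str) -> Iterator[bool]:
 # Hypothetical helper: begin/end stand in for the native window begin/end calls.
 visible = begin(name)
 try:
  # The caller decides whether to draw, based on what begin() returned.
  yield visible
 finally:
  # Runs on normal exit, early return, or exception, so the pair stays balanced.
  end()


calls = []
with window_scope(lambda n: calls.append(('begin', n)) or False, lambda: calls.append(('end',)), 'Log') as visible:
 if visible:
  calls.append('draw')
```

Dear ImGui expects `End()` after every window `Begin()`, whatever `Begin()` returned, which is why `end()` is unconditional here. Other pairs, such as menus, only need their end call when the begin call returned true, so a real helper has to follow each pair's own rule.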
### CRITICAL: Native Edit Tool Destroys Indentation

The native `Edit` tool DESTROYS 1-space indentation and converts it to 4-space.

**NEVER use the native `Edit` tool on Python files.**

Instead, use Manual Slop MCP tools:

- `manual-slop_py_update_definition` - Replace function/class
- `manual-slop_set_file_slice` - Replace line range
- `manual-slop_py_set_signature` - Replace signature only

Or run Python directly with `newline=''` so line endings are preserved (replace the `OLD_TEXT`/`NEW_TEXT` placeholders with the exact snippets):

```powershell
python -c "
old = 'OLD_TEXT'
new = 'NEW_TEXT'
with open('file.py', 'r', encoding='utf-8', newline='') as f:
 content = f.read()
content = content.replace(old, new)
with open('file.py', 'w', encoding='utf-8', newline='') as f:
 f.write(content)
"
```

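The same idea as a reusable function, in the project's 1-space style. `replace_in_file` is a hypothetical helper, not one of the Manual Slop tools. It adds a guard so that a snippet that does not match (a common failure when `old` was written with `\n` but the file uses `\r\n`) raises an error instead of silently rewriting an unchanged file.

```python
def replace_in_file(path: str, old: str, new: str, count: int = 1) -> None:
 # newline='' disables newline translation on both read and write,
 # so existing '\r\n' sequences pass through unchanged.
 with open(path, 'r', encoding='utf-8', newline='') as f:
  content = f.read()
 # Fail loudly rather than writing back an unmodified file.
 if old not in content:
  raise ValueError(f'snippet not found in {path}')
 content = content.replace(old, new, count)
 with open(path, 'w', encoding='utf-8', newline='') as f:
  f.write(content)
```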
## Guiding Principles

1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`.
2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation.
3. **Test-Driven Development:** Write unit tests before implementing functionality.
4. **High Code Coverage:** Aim for >80% code coverage for all modules.
5. **User Experience First:** Every decision should prioritize user experience.
6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
7. **MMA Tiered Delegation is Mandatory:** The Conductor acts as a Tier 1/2 Orchestrator. You MUST delegate all non-trivial coding to Tier 3 Workers and all error analysis to Tier 4 QA Agents. Do NOT perform large file writes directly.
8. **Mandatory Research-First Protocol:** Before reading the full content of any file over 50 lines, you MUST use `get_file_summary`, `py_get_skeleton`, `py_get_code_outline`, or `py_get_docstring` to map the architecture and identify specific target ranges. Use `get_git_diff` to understand recent changes. Use `py_find_usages` to locate where symbols are used.
9. **Architecture Documentation Fallback:** When uncertain about threading, event flow, data structures, or module interactions, consult the deep-dive docs in `docs/` (last refreshed: 2026-06-02 via the comprehensive documentation refresh track, **8 new guides added**):
   - **[docs/guide_architecture.md](../docs/guide_architecture.md):** Thread domains, cross-thread patterns, AI client multi-provider (Gemini, Anthropic, DeepSeek, Gemini CLI, MiniMax), HITL Execution Clutch.
   - **[docs/guide_tools.md](../docs/guide_tools.md):** MCP Bridge 3-layer security, full 45-tool inventory, Hook API, ApiHookClient, `/api/ask` HITL protocol.
   - **[docs/guide_mma.md](../docs/guide_mma.md):** Ticket/Track/WorkerContext data structures, DAG engine, ConductorEngine, Tier 2/3/4 lifecycles, persona application.
   - **[docs/guide_simulations.md](../docs/guide_simulations.md):** `live_gui` fixture, Puppeteer pattern, mock provider, test areas by subsystem.
   - **[docs/guide_testing.md](../docs/guide_testing.md):** (**NEW**) 251 test files, 5 categories, 7 conftest fixtures (`isolate_workspace`, `reset_paths`, `reset_ai_client`, `vlogger`, `kill_process_tree`, `mock_app`, `live_gui` session-scoped), Puppeteer pattern, mock provider, structural testing contract.
   - **[docs/guide_gui_2.md](../docs/guide_gui_2.md):** (**NEW**) `src/gui_2.py` (260KB main GUI): App class lifecycle, ~90 module-level render functions, Multi-Viewport docks, panel registry, command palette integration, ImGuiScope context managers, hot reload support.
   - **[docs/guide_ai_client.md](../docs/guide_ai_client.md):** (**NEW**) `src/ai_client.py` (116KB): multi-provider LLM singleton (5 providers), async dispatch via `asyncio.gather`, `threading.local` for source tier tagging, Anthropic ephemeral caching + Gemini explicit caching, Tier 4 QA error interception.
   - **[docs/guide_api_hooks.md](../docs/guide_api_hooks.md):** (**NEW**) `src/api_hooks.py` + `src/api_hook_client.py` (38KB + 31KB): HookServer on `127.0.0.1:8999`, ApiHookClient wrapper, 8+ endpoints, Remote Confirmation Protocol via `/api/ask`.
   - **[docs/guide_mcp_client.md](../docs/guide_mcp_client.md):** (**NEW**) `src/mcp_client.py` (81KB, 45 tools): 3-layer security (Allowlist → Validate → Resolve), all native tools (File I/O, Python AST, C/C++ AST, Analysis, Network, Runtime, Beads), ExternalMCPManager (Stdio + SSE), JSON-RPC 2.0 engine.
   - **[docs/guide_app_controller.md](../docs/guide_app_controller.md):** (**NEW**) `src/app_controller.py` (166KB): headless orchestrator, AppState dataclass, all subsystem managers, `_predefined_callbacks`/`_gettable_fields` Hook API registries, SyncEventQueue, headless mode.
   - **[docs/guide_multi_agent_conductor.md](../docs/guide_multi_agent_conductor.md):** (**NEW**) `src/multi_agent_conductor.py` + `src/dag_engine.py` (28KB + 10KB): TrackDAG (iterative DFS cycle detection, Kahn's topological sort), ExecutionEngine (Auto-Queue / Step Mode), MultiAgentConductor + WorkerPool (concurrency 4), `mma_exec.py` sub-agent invocation.
   - **[docs/guide_models.md](../docs/guide_models.md):** (**NEW**) `src/models.py` (132KB): centralized data model registry, `AGENT_TOOL_NAMES` canonical 45-tool list, `PROVIDERS` constant, `parse_plan_md` utility, validation patterns, SDM tags.
   - See [docs/Readme.md](../docs/Readme.md) for the full **14-guide index** covering context curation, shaders, RAG, Beads, hot reload, personas, NERV theme, workspace profiles, and command palette.

## Task Workflow

All tasks follow a strict lifecycle:

### Standard Task Workflow

0. **Initialize MMA Environment:** Before executing the first task of any track, you MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`).

1. **Select Task:** Choose the next available task from `plan.md` in sequential order.

2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`.

3. **High-Signal Research Phase:**
   - **Identify Dependencies:** Use `list_directory`, `get_tree`, and `py_get_imports` to map file relations.
   - **Map Architecture:** Use `py_get_code_outline` or `py_get_skeleton` on identified files to understand their structure.
   - **Audit State:** Use `py_get_code_outline` or `py_get_definition` on the target class's `__init__` method to check for existing, unused, or duplicate state variables before adding new ones.
   - **Analyze Changes:** Use `get_git_diff` if the task involves modifying recently updated code.
   - **Minimize Token Burn:** Only use `read_file` with `start_line`/`end_line` for specific implementation details once target areas are identified.

4. **Write Failing Tests (Red Phase):**
   - **Pre-Delegation Checkpoint:** Before spawning a worker for dangerous or non-trivial changes, ensure your current progress is staged (`git add .`) or committed. This prevents losing iterations if a sub-agent incorrectly uses `git restore`.
   - **Zero-Assertion Ban:** You MUST NOT write tests that contain only `pass` or lack meaningful assertions. A test is valid only if its assertions explicitly test the behavioral change and verify the failure condition.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Test Creation:** Do NOT write test code directly. Spawn a Tier 3 Worker (`uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range), WHAT (test to create), HOW (which assertions/fixtures to use), and SAFETY (thread constraints, if applicable). Example: `"Write tests in tests/test_cost_tracker.py for cost_tracker.py:estimate_cost(). Test all model patterns in MODEL_PRICING dict. Assert unknown model returns 0. Use 1-space indentation."` If you are retrying after failures, pass `--failure-count X` to switch to a more capable model.
   - Take the code generated by the Worker and apply it.
   - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.

5. **Implement to Pass Tests (Green Phase):**
   - **Pre-Delegation Checkpoint:** Ensure current progress is staged or committed before delegating.
   - **Code Style:** ALWAYS explicitly mention "Use exactly 1-space indentation for Python code" when prompting a sub-agent.
   - **Delegate Implementation:** Do NOT write the implementation code directly. Spawn a Tier 3 Worker (`uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`) with a **surgical prompt** specifying WHERE (file:line range to modify), WHAT (the specific change), HOW (which API calls, data structures, or patterns to use), and SAFETY (thread-safety constraints). Example: `"In gui_2.py _render_mma_dashboard (lines 2685-2699), extend the token usage table from 3 to 5 columns. Add 'Model' and 'Est. Cost' using imgui.table_setup_column(). Call cost_tracker.estimate_cost(model, input_tokens, output_tokens). Use 1-space indentation."` If you are retrying after failures, pass `--failure-count X` to switch to a more capable model.
   - Take the code generated by the Worker and apply it.
   - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.

6. **Refactor (Optional but Recommended):**
   - With the safety of passing tests, refactor the implementation and test code to improve clarity, remove duplication, and enhance performance without changing external behavior.
   - Rerun tests to ensure they still pass after refactoring.

7. **Verify Coverage:** Run coverage reports using the project's chosen tools. For this Python project (with `pytest-cov`), this might look like:

   ```powershell
   uv run pytest --cov=src --cov-report=html
   ```

   Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.

8. **Document Deviations:** If the implementation differs from the tech stack:
   - **STOP** implementation.
   - Update `tech-stack.md` with the new design.
   - Add a dated note explaining the change.
   - Resume implementation.

9. **Commit Code Changes:**
   - **CRITICAL - ATOMIC PER-TASK COMMITS:** You MUST commit your changes immediately after completing and verifying a single task. Do NOT move on to the next task in the plan without committing the current one. This ensures precise tracking and safe rollback points.
   - Stage all code changes related to the task.
   - Propose a clear, concise commit message, e.g., `feat(ui): Create basic HTML structure for calculator`.
   - Perform the commit.

10. **Attach Task Summary with Git Notes:**
    - **Step 10.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
    - **Step 10.2: Draft Note Content:** Create a detailed summary for the completed task. It should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
    - **Step 10.3: Attach Note:** Use the `git notes` command to attach the summary to the commit.

      ```powershell
      # The note content from the previous step is passed via the -m flag.
      git notes add -m "<note content>" <commit_hash>
      ```

11. **Get and Record Task Commit SHA:**
    - **Step 11.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed commit's* hash.
    - **Step 11.2: Write Plan:** Write the updated content back to `plan.md`.

12. **Commit Plan Update:**
    - **Action:** Stage the modified `plan.md` file.
    - **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).

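Steps 2 and 11 change a task's status marker in `plan.md`. A minimal sketch of that edit, in the project's 1-space style, assuming tasks are Markdown checklist lines such as `- [ ] Create user model` and that the 7-character SHA is appended after the task text with a single space. Both layouts are assumptions; match whatever the track's `plan.md` actually uses. `set_task_status` is hypothetical, and it keeps each line's original ending so CRLF files stay CRLF.

```python
import re

# Assumed task syntax: "- [ ] Task", "- [~] Task", "- [x] Task".
TASK_RE = re.compile(r'^(\s*[-*]\s*)\[( |~|x)\](\s+)(.*)$')


def set_task_status(plan: str, task: str, status: str, sha: str = '') -> str:
 # status is ' ' (todo), '~' (in progress) or 'x' (done); sha is shortened to 7 characters.
 if status not in (' ', '~', 'x'):
  raise ValueError(f'unknown status: {status!r}')
 out = []
 found = False
 for line in plan.splitlines(keepends=True):
  body = line.rstrip('\r\n')
  ending = line[len(body):]
  m = TASK_RE.match(body)
  # Only the first matching task line is updated.
  if m and not found and m.group(4).startswith(task):
   found = True
   body = f'{m.group(1)}[{status}]{m.group(3)}{m.group(4)}'
   if sha:
    body += f' {sha[:7]}'
  out.append(body + ending)
 if not found:
  raise KeyError(f'task not found: {task}')
 return ''.join(out)
```

Step 2 corresponds to `set_task_status(plan, name, '~')`, and step 11.1 to `set_task_status(plan, name, 'x', sha)`. Writing the result back (step 11.2) should use `newline=''`, for the same reason given in the Code Style section.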
### Phase Completion Verification and Checkpointing Protocol

**Trigger:** This protocol is executed immediately after completing a task that also concludes a phase in `plan.md`.

1. **Announce Protocol Start:** Inform the user that the phase is complete and the verification and checkpointing protocol has begun.

2. **Ensure Test Coverage for Phase Changes:**
   - **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, you must first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
   - **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
   - **Step 2.3: Verify and Create Tests:** For each file in the list:
     - **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
     - For each remaining code file, verify that a corresponding test file exists.
     - If a test file is missing, you **must** create one. Before writing the test, **first analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).

3. **Execute Automated Tests in Batches:**
   - Because the full suite is large (>360 tests) and contains complex UI simulations, running the entire suite frequently can lead to random timeouts or threading access violations.
   - Before execution, you **must** announce the exact shell command.
   - **CRITICAL:** When verifying changes, **do not run the full suite (`pytest tests/`)**. Instead, run tests in small, targeted batches (maximum 4 test files at a time). Only use long timeouts (`--timeout=60` or `--timeout=120`) if the specific tests in the batch are known to be slow (e.g., simulation tests).
   - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py`"
   - Execute the announced command.
   - If tests fail with significant output (e.g., a large traceback), **DO NOT** read the raw `stderr` directly into your context. Instead, pipe the output to a log file (under `tests/logs/`) and **spawn a Tier 4 QA Agent (`uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`)** to summarize the failure.
   - You **must** inform the user and begin debugging using the QA Agent's summary. You may propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.

4. **Execute Automated API Hook Verification:**
   - **CRITICAL:** The Conductor agent will now automatically execute verification tasks using the application's API hooks.
   - The agent will announce the start of the automated verification to the user.
   - It will then send requests to the application's Hook API server (`127.0.0.1:8999`) to trigger the necessary verification functions.
   - **Result Handling:**
     - All results (successes and failures) from the API hook invocations will be logged.
     - If all automated verifications pass, the agent will inform the user and proceed to step 5 (and, once the user confirms, to Create Checkpoint Commit).
     - If any automated verification fails, the agent will halt the workflow, present the detailed failure logs to the user, and await further instructions for debugging or remediation.

5. **Present Automated Verification Results and User Confirmation:**
   - After executing automated verification, the Conductor agent will present the results to the user.
   - If verification passed, the agent will state: "Automated verification completed successfully."
   - If verification failed, the agent will state: "Automated verification failed. Please review the logs above for details." The agent may then propose a fix a **maximum of two times**. If verification still fails after the second proposed fix, the agent **must stop**, report the persistent failure, and ask the user for guidance.
   - **PAUSE** and await the user's response. Do not proceed without explicit confirmation from the user if tests pass, or guidance if tests fail.

6. **Create Checkpoint Commit:**
   - Stage all changes. If there are no changes to commit, create an empty commit (`git commit --allow-empty`).
   - Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).

7. **Attach Auditable Verification Report using Git Notes:**
   - **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
   - **Step 7.2: Attach Note:** Use the `git notes` command with the full hash of the checkpoint commit created in the previous step (`git log -1 --format="%H"`) to attach the full report to that commit.

8. **Get and Record Phase Checkpoint SHA:**
   - **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
   - **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
   - **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.

9. **Commit Plan Update:**
   - **Action:** Stage the modified `plan.md` file.
   - **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.

10. **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.

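Steps 2 and 3 of this protocol can be sketched as small, testable functions (1-space style). The sketch assumes the `[checkpoint: <sha>]` marker from step 8.2 is the only checkpoint syntax in `plan.md` and that the most recent marker belongs to the previous phase. The suffix list extends the document's examples and should be adapted. All function names are hypothetical.

```python
import re
import subprocess
from pathlib import PurePosixPath
from typing import List, Optional

# Assumed marker format from step 8.2: "[checkpoint: <sha>]" on a phase heading.
CHECKPOINT_RE = re.compile(r'\[checkpoint:\s*([0-9a-fA-F]{7,40})\]')
# Step 2.3 names .json/.md/.yaml; the other suffixes are illustrative additions.
NON_CODE_SUFFIXES = {'.json', '.md', '.yaml', '.yml', '.toml', '.txt'}


def last_checkpoint(plan: str) -> Optional[str]:
 # Phases are recorded in order, so the last marker is the previous checkpoint.
 shas = CHECKPOINT_RE.findall(plan)
 return shas[-1] if shas else None


def changed_files(base: str) -> List[str]:
 # Step 2.2: the same git command the protocol runs by hand.
 out = subprocess.run(['git', 'diff', '--name-only', base, 'HEAD'], capture_output=True, text=True, check=True)
 return out.stdout.splitlines()


def code_files(paths: List[str]) -> List[str]:
 # Step 2.3: drop non-code files by extension.
 return [p for p in paths if PurePosixPath(p).suffix.lower() not in NON_CODE_SUFFIXES]


def pytest_batches(test_files: List[str], size: int = 4) -> List[List[str]]:
 # Step 3: one `uv run pytest` command per batch of at most `size` files.
 return [['uv', 'run', 'pytest', *test_files[i:i + size]] for i in range(0, len(test_files), size)]
```

A typical run reads `plan.md`, passes `last_checkpoint` to `changed_files`, filters with `code_files`, maps the survivors to their test files, and announces each command from `pytest_batches` before executing it. When `last_checkpoint` returns `None`, the root commit (`git rev-list --max-parents=0 HEAD`) can serve as the base.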
### Verification via API Hooks

For features involving the GUI or complex internal state, unit tests are often insufficient. You MUST use the application's built-in API hooks for empirical verification:

1. **Launch the App with Hooks:** Run the application in a separate shell with the `--enable-test-hooks` flag:

   ```powershell
   uv run python gui.py --enable-test-hooks
   ```

   This starts the hook server on port `8999`.

2. **Use the pytest `live_gui` Fixture:** For automated tests, use the session-scoped `live_gui` fixture defined in `tests/conftest.py`. This fixture handles the lifecycle (startup/shutdown) of the application with hooks enabled.

   ```python
   def test_my_feature(live_gui):
    # The GUI is now running with its hook server on port 8999
    ...
   ```

   Note: always run pytest through `uv` (`uv run pytest ...`).

3. **Verify via ApiHookClient:** Use the `ApiHookClient` in `src/api_hook_client.py` to interact with the running application. It includes robust retry logic and health checks.

4. **Verify via REST Commands:** Use PowerShell or `curl` to send commands to the application and verify the response. For example, to check health:

   ```powershell
   Invoke-RestMethod -Uri "http://127.0.0.1:8999/status" -Method Get
   ```

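`ApiHookClient` already wraps retries and health checks. For a quick script outside the test suite, a stdlib-only version of the `/status` check above might look like the following (1-space style). It assumes only that `/status` answers with HTTP 200 once the hook server is ready. `wait_for_hook_server` is a hypothetical name and does not mirror `ApiHookClient`'s API.

```python
import time
import urllib.request


def wait_for_hook_server(url: str = 'http://127.0.0.1:8999/status', timeout: float = 30.0, interval: float = 0.5) -> bool:
 # Poll until the endpoint answers with HTTP 200 or the deadline passes.
 deadline = time.monotonic() + timeout
 while time.monotonic() < deadline:
  try:
   # interval doubles as the per-request socket timeout.
   with urllib.request.urlopen(url, timeout=interval) as resp:
    if resp.status == 200:
     return True
  except OSError:
   # URLError, HTTPError and socket timeouts all derive from OSError:
   # the server is not ready yet, so retry.
   pass
  time.sleep(interval)
 return False
```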
### Quality Gates

Before marking any task complete, verify:

- [ ] All tests pass
- [ ] Code coverage meets requirements (>80%)
- [ ] Code follows the project's code style guidelines (as defined in `code_styleguides/`)
- [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
- [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
- [ ] No linting or static analysis errors (using the project's configured tools)
- [ ] Works correctly on mobile (if applicable)
- [ ] Documentation updated if needed
- [ ] No security vulnerabilities introduced

## Development Commands

**AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**

### Setup

```powershell
# Example: Commands to set up the development environment (e.g., install dependencies, configure database)
# e.g., for a Node.js project: npm install
# e.g., for a Go project: go mod tidy
```

### Daily Development

```powershell
# Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
# e.g., for a Node.js project: npm run dev, npm test, npm run lint
# e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
```

### Before Committing

```powershell
# Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
# e.g., for a Node.js project: npm run check
# e.g., for a Go project: make check (if a Makefile exists)
```

## Testing Requirements

### Structural Testing Contract

1. **Ban on Arbitrary Core Mocking:** Tier 3 workers are strictly forbidden from using `unittest.mock.patch` to bypass or stub core infrastructure (e.g., event queues, `ai_client` internals, threading primitives) unless explicitly authorized by the Tier 2 Tech Lead for a specific boundary test.
2. **`live_gui` Standard:** All integration and end-to-end testing must use the `live_gui` fixture to interact with a real instance of the application via the Hook API. Bypassing the hook server to directly mutate GUI state in tests is prohibited.
3. **Artifact Isolation:** All test-generated artifacts (logs, temporary workspaces, mock outputs) MUST be written to the `tests/artifacts/` or `tests/logs/` directories. These directories are git-ignored to prevent repository pollution.

### Unit Testing

- Every module must have corresponding tests.
- Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
- Mock external dependencies.
- Test both success and failure cases.

### Integration Testing

- Test complete user flows
- Verify database transactions
- Test authentication and authorization
- Check form submissions

### Mobile Testing

- Test on actual iPhone when possible
- Use Safari developer tools
- Test touch interactions
- Verify responsive layouts
- Check performance on 3G/4G

## Code Review Process

### Self-Review Checklist

Before requesting review:

1. **Functionality**
   - Feature works as specified
   - Edge cases handled
   - Error messages are user-friendly

2. **Code Quality**
   - Follows style guide
   - DRY principle applied
   - Clear variable/function names
   - Appropriate comments

3. **Testing**
   - Unit tests comprehensive
   - Integration tests pass
   - Coverage adequate (>80%)

4. **Security**
   - No hardcoded secrets
   - Input validation present
   - SQL injection prevented
   - XSS protection in place

5. **Performance**
   - Database queries optimized
   - Images optimized
   - Caching implemented where needed

6. **Mobile Experience**
   - Touch targets adequate (44x44px)
   - Text readable without zooming
   - Performance acceptable on mobile
   - Interactions feel native

## Commit Guidelines

### Message Format

```
<type>(<scope>): <description>

[optional body]

[optional footer]
```

### Types

- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation only
- `style`: Formatting, missing semicolons, etc.
- `refactor`: Code change that neither fixes a bug nor adds a feature
- `test`: Adding missing tests
- `chore`: Maintenance tasks
- `conductor`: Workflow bookkeeping (plan updates and phase checkpoints)

### Examples

```powershell
git commit -m "feat(auth): Add remember me functionality"
git commit -m "fix(posts): Correct excerpt generation for short posts"
git commit -m "test(comments): Add tests for emoji reaction limits"
git commit -m "style(mobile): Improve button touch targets"
```

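A pre-commit style check for the header line can be sketched with a regular expression. This is an illustrative sketch, not an existing project hook. `check_commit_header` is hypothetical, the scope is treated as required because the format above shows it, and the allowed types are the list above plus `conductor`, which this workflow's own plan and checkpoint commits use.

```python
import re
from typing import Tuple

COMMIT_TYPES = ('feat', 'fix', 'docs', 'style', 'refactor', 'test', 'chore', 'conductor')
# <type>(<scope>): <description>, with a non-blank description.
HEADER_RE = re.compile(r'^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<desc>\S.*)$')


def check_commit_header(message: str) -> Tuple[str, str, str]:
 # Only the first line is the header; the optional body and footer follow a blank line.
 header = message.splitlines()[0] if message else ''
 m = HEADER_RE.match(header)
 if not m:
  raise ValueError(f'not <type>(<scope>): <description>: {header!r}')
 if m.group('type') not in COMMIT_TYPES:
  raise ValueError(f'unknown type: {m.group("type")}')
 return m.group('type'), m.group('scope'), m.group('desc')
```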
## Definition of Done

A task is complete when:

1. All code implemented to specification
2. Unit tests written and passing
3. Code coverage meets project requirements
4. Documentation complete (if applicable)
5. Code passes all configured linting and static analysis checks
6. Works beautifully on mobile (if applicable)
7. Implementation notes added to `plan.md`
8. Changes committed with proper message
9. Git note with task summary attached to the commit

## Conductor Token Firewalling & Model Switching Strategy

To emulate the 4-Tier MMA Architecture within the standard Conductor extension without requiring a custom fork, adhere to these strict workflow policies:

### 1. Active Model Switching (Simulating the 4 Tiers)

- **Mandatory Skill Activation:** As the very first step of any MMA-driven process, including track initialization and implementation phases, the agent MUST activate the `mma-orchestrator` skill (`activate_skill mma-orchestrator`) and the tier skill for its own role. This is crucial for enforcing the 4-Tier token firewall.
- **The MMA Bridge (`mma_exec.py`):** All tiered delegation is routed through `uv run python scripts/mma_exec.py`. This script acts as the primary bridge, managing model selection, context injection, and logging.
- **Model Tiers:**
  - **Tier 1 (Strategic/Orchestration):** `gemini-3.1-pro-preview`. Focused on product alignment, setup (`/conductor:setup`), and track initialization (`/conductor:newTrack`).
  - **Tier 2 (Architectural/Tech Lead):** `gemini-3-flash-preview`. Focused on architectural design and track execution (`/conductor:implement`). **Note:** Tier 2 maintains persistent memory throughout a track's implementation.
  - **Tier 3 (Execution/Worker):** `gemini-2.5-flash-lite`. Used for surgical code implementation and test generation. Operates statelessly (Context Amnesia) but has access to file I/O tools.
  - **Tier 4 (Utility/QA):** `gemini-2.5-flash-lite`. Used for log summarization and error analysis. Operates statelessly (Context Amnesia) but has access to diagnostic tools.
- **Tiered Delegation Protocol:**
  - **Tier 3 Worker:** `uv run python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`
  - **Tier 4 QA Agent:** `uv run python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`
- **Observability:** All hierarchical interactions are recorded in `logs/mma_delegation.log`, and detailed sub-agent logs are saved to `logs/agents/`.

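The WHERE/WHAT/HOW/SAFETY prompt structure from the Standard Task Workflow and the delegation commands above can be assembled programmatically. This is a sketch in the project's 1-space style. `SurgicalPrompt` and `delegation_argv` are hypothetical helpers, and placing `--failure-count` before the positional prompt assumes `mma_exec.py` uses a conventional argument parser. Returning an argument list for `subprocess.run` avoids shell quoting problems with long prompts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurgicalPrompt:
 where: str
 what: str
 how: str
 safety: str = ''

 def render(self) -> str:
  # WHERE / WHAT / HOW / optional SAFETY, then the mandatory indentation reminder.
  parts = [f'WHERE: {self.where}', f'WHAT: {self.what}', f'HOW: {self.how}']
  if self.safety:
   parts.append(f'SAFETY: {self.safety}')
  parts.append('Use exactly 1-space indentation for Python code.')
  return '\n'.join(parts)


def delegation_argv(role: str, prompt: SurgicalPrompt, failure_count: int = 0) -> List[str]:
 # Only the two delegated roles listed in the Tiered Delegation Protocol are accepted.
 if role not in ('tier3-worker', 'tier4-qa'):
  raise ValueError(f'unknown role: {role}')
 argv = ['uv', 'run', 'python', 'scripts/mma_exec.py', '--role', role]
 if failure_count:
  argv += ['--failure-count', str(failure_count)]
 argv.append(prompt.render())
 return argv
```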
### 2. Context Management and Token Firewalling

- **Context Amnesia (Tiers 3 & 4):** `mma_exec.py` enforces "Context Amnesia" by executing sub-agents statelessly. Each call starts with a clean slate, receiving only the strictly necessary documents and prompts.
- **Persistent Memory (Tier 2):** The Tier 2 Tech Lead does NOT use Context Amnesia during track implementation, which preserves continuity of technical strategy.
- **AST Skeleton Views:** For Tier 3 implementation, `mma_exec.py` automatically generates "AST Skeleton Views" of project dependencies. These give the worker model the interface-level structure (function signatures, docstrings) of imported modules without the full source code, maximizing the signal-to-noise ratio in the context window.

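The `mma_exec.py` generator itself is not shown here. The following stdlib `ast` sketch only illustrates the idea of a skeleton view: it keeps module-level code, class structure, signatures and docstrings, and replaces each function body with `...`. `skeleton` is a hypothetical name, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def skeleton(source: str) -> str:
 # Keep signatures and docstrings; replace every function body with `...`.
 tree = ast.parse(source)
 for node in ast.walk(tree):
  if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
   doc = ast.get_docstring(node, clean=False)
   body = [ast.Expr(value=ast.Constant(value=doc))] if doc is not None else []
   body.append(ast.Expr(value=ast.Constant(value=...)))
   node.body = body
 return ast.unparse(ast.fix_missing_locations(tree))
```

Because the output is regenerated source, it is still valid Python, so it can be parsed again or compared against an earlier skeleton.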
### 3. Phase Checkpoints (The Final Defense)

- The **Phase Completion Verification and Checkpointing Protocol** is the project's primary defense against token bloat.
- When a Phase is marked complete and a checkpoint commit is created, the AI Agent must actively interpret this as a **"Context Wipe"** signal. It should summarize the outcome in its git notes and move forward treating the checkpoint as absolute truth, deliberately dropping earlier conversational history.
- **MMA Phase Memory Wipe:** After completing a major Phase, use the Tier 1/2 Orchestrator's perspective to consolidate state into Git Notes and then disregard previous trial-and-error histories.

---

## Known Pitfalls (2026-06-05)

### Defer-Not-Catch Pattern for Native Crashes

`imgui-bundle` (and similar native extension libraries) expose C-level functions that can crash the Python process with a Windows access violation (`0xc0000005`) or a SIGSEGV on Linux. **These crashes are not catchable from Python:** `try/except Exception` intercepts only Python exceptions, not native access violations.

The fix is **defer-not-catch**: track a one-shot "ready" flag in instance state; return early on the first call, and only invoke the C function on subsequent calls. See [../docs/guide_gui_2.md](../docs/guide_gui_2.md#workspace-profile-defer-not-catch) and [../docs/guide_testing.md](../docs/guide_testing.md#known-gotchas-2026-06-05) for the canonical examples and how to recognize these crashes.

When designing any method that calls into `imgui.*` (or similar native libs), ask: "Can this be called before ImGui is fully initialized?" If yes, add a defer-not-catch guard.

**Sentinel type contract.** When implementing a defer-not-catch guard, the early-return sentinel value must match the type contract of the downstream consumer. For `WorkspaceProfile.ini_content: str` (in this codebase), the sentinel must be `""` (str), not `b""` (bytes): `tomli_w` rejects bytes (`TypeError: Object of type 'bytes' is not TOML serializable`), and `imgui.load_ini_settings_from_memory(ini_data: str, ...)` also expects `str`. A previous version of this fix used `b""`, which silently broke the save flow through a `TypeError` raised by `tomli_w.dump`; the unit tests passed, but the `live_gui` save+load round-trip failed. The fix was a one-character change (`b""` → `""`). The regression test in `tests/test_workspace_profile_serialization.py` encodes this contract.

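A minimal sketch of a defer-not-catch guard that also honors the sentinel contract, in the project's 1-space style. `ProfileCapture`, `capture_ini`, and the `_imgui_ready` flag are hypothetical names, and the injected `save_ini` callable stands in for the native ImGui call. The guides linked above hold the project's canonical version.

```python
from typing import Callable


class ProfileCapture:
 def __init__(self, save_ini: Callable[[], str]) -> None:
  # save_ini stands in for a native ImGui call that may crash if invoked too early.
  self._save_ini = save_ini
  self._imgui_ready = False

 def capture_ini(self) -> str:
  # First call: arm the one-shot flag and return without touching native code,
  # because a crash inside it could not be caught by try/except anyway.
  if not self._imgui_ready:
   self._imgui_ready = True
   # Sentinel matches the consumer's type (str for ini_content), never b"".
   return ""
  return self._save_ini()
```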
### Test Failure Bisect Anchors (Theme Track)

When debugging test failures introduced by a theming/visual change, use the following bisect anchors:

- **Pre-existing failures:** anchor at commit `7df65dff` (the last commit before the multi_themes_20260604 track began). Failures that reproduce at this commit are pre-existing and not caused by the theme changes.
- **Theme-caused failures:** anchor at commit `7ea52cbb` (the theme refactor commit). Failures that appear only after this commit, and not at `7df65dff`, were introduced by the theme track.

In particular, watch for:

- Tests asserting theme color usage: the theme track changed `C_LBL` etc. from `ImVec4` values to callable functions. Tests that assert with `C_LBL` (the function) need to be updated to `C_LBL()` (the call), and they need to patch `src.theme_2.imgui` so that the mock's `theme.get_color()` returns the mock's `ImVec4`.
- Tests that exercise production code building dicts of theme color callables (e.g. `DIR_COLORS = {"request": C_OUT}`): the dict must store the function, and the use site must call it (`d_col()`, not `d_col`). Bug example: `src/gui_2.py:3705-3707` (commit `1469ecac`).

### `live_gui` Test Fragility (Authoring-Side)

`live_gui` is a session-scoped fixture: all tests in a session share the same `sloppy.py` subprocess. A test that passes when run after test X but fails in isolation is a **fragile test, not a fragile fixture**. The fixture is session-scoped by design. The test must explicitly wait for readiness, reset state via the Hook API, and verify preconditions via `get_value`/`wait_for_event` rather than assuming a "clean" ImGui state left by a prior test. See [../docs/guide_testing.md](../docs/guide_testing.md#authoring-robust-live_gui-tests-dont-assume-clean-state) for the 5-rule authoring contract, with anti-pattern vs. pattern code examples. To tell "the test needs work" apart from "a real app bug", run the failing test both in the full suite and in isolation.

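The wait-then-verify rule can be captured in a small polling helper (1-space style). `wait_until` is a hypothetical name and not part of `ApiHookClient`. In a `live_gui` test, `probe` would typically wrap a Hook API read such as `get_value`, and `done` would express the precondition.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def wait_until(probe: Callable[[], T], done: Callable[[T], bool], timeout: float = 10.0, interval: float = 0.1) -> T:
 # Re-read state until `done` accepts it, instead of assuming a clean GUI.
 deadline = time.monotonic() + timeout
 value = probe()
 while not done(value):
  if time.monotonic() >= deadline:
   raise TimeoutError(f'condition not met within {timeout}s; last value: {value!r}')
  time.sleep(interval)
  value = probe()
 return value
```

Including the last observed value in the timeout message helps with the bisect step above: it shows whether the app reached a wrong state or never responded at all.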