docs: Update workflow rules, create new async tool track, and log journal

2026-03-03 01:49:04 -05:00
parent 2d3820bc76
commit 2b15bfb1c1
6 changed files with 72 additions and 4 deletions
@@ -102,9 +102,11 @@ All tasks follow a strict lifecycle:
        -   For each remaining code file, verify a corresponding test file exists.
        -   If a test file is missing, you **must** create one. Before writing the test, **first, analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).

-3.  **Execute Automated Tests with Proactive Debugging:**
-    -   Before execution, you **must** announce the exact shell command you will use to run the tests.
-    -   **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `CI=true npm test`"
+3.  **Execute Automated Tests in Batches:**
+    -   Because the full suite is large (>360 tests) and contains complex UI simulations, frequent full-suite runs can cause intermittent timeouts or threading access violations.
+    -   Before execution, you **must** announce the exact shell command.
+    -   **CRITICAL:** When verifying changes, **do not run the full suite (`pytest tests/`)**. Instead, run tests in small, targeted batches (maximum 4 test files at a time). Only use long timeouts (`--timeout=60` or `--timeout=120`) if the specific tests in the batch are known to be slow (e.g., simulation tests).
+    -   **Example Announcement:** "I will now run a targeted batch of automated tests to verify the phase. **Command:** `uv run pytest tests/test_specific_feature.py`"
    -   Execute the announced command.
        - If tests fail with significant output (e.g., a large traceback), **DO NOT** attempt to read the raw `stderr` directly into your context. Instead, pipe the output to a log file and **spawn a Tier 4 QA Agent (`python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`)** to summarize the failure.
        - You **must** inform the user and begin debugging using the QA Agent's summary. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.
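
The batching rule in the updated step 3 can be sketched as a small helper. This is a minimal illustration, not part of the commit: the function names `batch_test_files`, `build_command`, and `run_batch` are hypothetical, and the `--timeout` flag is assumed to come from the `pytest-timeout` plugin that the rule's `--timeout=60` example implies.

```python
import subprocess
from typing import Iterator, Optional, Sequence

# Limit taken from the workflow rule: at most 4 test files per batch.
MAX_FILES_PER_BATCH = 4


def batch_test_files(
    files: Sequence[str], size: int = MAX_FILES_PER_BATCH
) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` test files."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(files), size):
        yield list(files[start:start + size])


def build_command(batch: Sequence[str], timeout: Optional[int] = None) -> list[str]:
    """Build the exact command to announce before running a batch.

    `timeout` is only passed for batches known to be slow; the flag is
    assumed to be provided by the pytest-timeout plugin.
    """
    cmd = ["uv", "run", "pytest", *batch]
    if timeout is not None:
        cmd.append(f"--timeout={timeout}")
    return cmd


def run_batch(batch: Sequence[str], log_path: str, timeout: Optional[int] = None) -> int:
    """Run one batch, sending stdout and stderr to a log file.

    Writing to a file keeps a large traceback out of the agent's context
    so that it can be summarized separately. Returns pytest's exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            build_command(batch, timeout),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode
```

For example, nine test files would be split into batches of 4, 4, and 1, and `" ".join(build_command(batch))` gives the command string to announce for each batch before calling `run_batch`.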