Compare commits (5 commits)

| SHA1 |
|---|
| aea42e82ab |
| 6152b63578 |
| 26502df891 |
| be689ad1e9 |
| edae93498d |
@@ -15,4 +15,5 @@ You are the Tier 2 Tech Lead. Your role is to ensure architectural integrity, al
 
 ## Limitations
 - Do not write boilerplate or exhaustive feature code yourself.
-- Delegate implementation tasks to Tier 3 Workers.
+- Delegate implementation tasks to Tier 3 Workers using `uv python scripts/mma_exec.py --role tier3-worker "[PROMPT]"`.
+- For error analysis of large logs, use `uv python scripts/mma_exec.py --role tier4-qa "[PROMPT]"`.
@@ -45,6 +45,9 @@ This file tracks all major tracks for the project. Each track has its own detail
 
 ---
 
+- [~] **Track: Test Suite Curation and Organization**
+  *Link: [./tracks/test_curation_20260225/](./tracks/test_curation_20260225/)*
+
 ---
 
@@ -0,0 +1,5 @@
# Track test_curation_20260225 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)
@@ -0,0 +1,70 @@
# Test Suite Inventory - manual_slop

## Categories

### Manual Slop Core/GUI
- `tests/test_ai_context_history.py`
- `tests/test_api_events.py`
- `tests/test_gui_diagnostics.py`
- `tests/test_gui_events.py`
- `tests/test_gui_performance_requirements.py`
- `tests/test_gui_stress_performance.py`
- `tests/test_gui_updates.py`
- `tests/test_gui2_events.py`
- `tests/test_gui2_layout.py`
- `tests/test_gui2_mcp.py`
- `tests/test_gui2_parity.py`
- `tests/test_gui2_performance.py`
- `tests/test_headless_api.py`
- `tests/test_headless_dependencies.py`
- `tests/test_headless_startup.py`
- `tests/test_history_blacklist.py`
- `tests/test_history_bleed.py` (FAILING)
- `tests/test_history_migration.py`
- `tests/test_history_persistence.py`
- `tests/test_history_truncation.py`
- `tests/test_performance_monitor.py`
- `tests/test_token_usage.py`
- `tests/test_layout_reorganization.py`

### Conductor/MMA (To be Blacklisted from core runs)
- `tests/test_mma_exec.py`
- `tests/test_mma_skeleton.py`
- `tests/test_conductor_api_hook_integration.py`
- `tests/conductor/test_infrastructure.py`
- `tests/test_gemini_cli_adapter.py`
- `tests/test_gemini_cli_integration.py` (FAILING)
- `tests/test_ai_client_cli.py`
- `tests/test_cli_tool_bridge.py` (FAILING)
- `tests/test_gemini_metrics.py`

### MCP/Integrations
- `tests/test_api_hook_client.py`
- `tests/test_api_hook_extensions.py`
- `tests/test_hooks.py`
- `tests/test_sync_hooks.py`
- `tests/test_mcp_perf_tool.py`

### Simulation/Workflows
- `tests/test_sim_ai_settings.py`
- `tests/test_sim_base.py`
- `tests/test_sim_context.py`
- `tests/test_sim_execution.py`
- `tests/test_sim_tools.py`
- `tests/test_workflow_sim.py`
- `tests/test_extended_sims.py`
- `tests/test_user_agent.py`
- `tests/test_live_workflow.py`
- `tests/test_agent_capabilities.py`
- `tests/test_agent_tools_wiring.py`

## Redundancy Observations
- GUI tests are split between `gui` and `gui2`. Since `gui_2.py` is the current focus, legacy `gui` tests should be reviewed for relevance.
- History tests are highly fragmented (5+ files).
- Headless tests are fragmented (3 files).
- Simulation tests are fragmented (10+ files).

## Failure Summary
- `tests/test_cli_tool_bridge.py`: `test_deny_decision` and `test_unreachable_hook_server` failing (wrong decision returned).
- `tests/test_gemini_cli_integration.py`: Integration with `gui_2.py` failing to find mock response in history.
- `tests/test_history_bleed.py`: `test_get_history_bleed_stats_basic` failing (assert 0 == 900000).
@@ -0,0 +1,8 @@
{
  "track_id": "test_curation_20260225",
  "type": "chore",
  "status": "new",
  "created_at": "2026-02-25T20:42:00Z",
  "updated_at": "2026-02-25T20:42:00Z",
  "description": "Review all existing tests. Some, like the MMA tests, are Conductor-only (Gemini CLI, unrelated to the manual_slop program) and must be blacklisted from runs that test manual_slop itself. Some tests are currently failing. No curation of the current tests has been done: they were written incrementally, on demand per track, and have accumulated without any second-pass consolidation or organization. We can probably establish a proper ordering and add or remove tests based on redundancy, or the lack of coverage for an openly unchecked feature or process. This is important to get right now, before doing heavier tracks."
}
@@ -0,0 +1,30 @@
# Implementation Plan: Test Suite Curation and Organization

This plan outlines the process for categorizing, organizing, and curating the existing test suite using a central manifest and an exhaustive review.

## Phase 1: Research and Inventory [checkpoint: be689ad]
- [x] Task: Initialize MMA Environment `activate_skill mma-orchestrator` be689ad
- [x] Task: Inventory all existing tests in `tests/` and map them to categories be689ad
- [x] Task: Identify failing and redundant tests through a full execution sweep be689ad
- [x] Task: Conductor - User Manual Verification 'Phase 1: Research and Inventory' (Protocol in workflow.md) be689ad

## Phase 2: Manifest and Tooling
- [x] Task: T3-P2-1-STUB: Design tests.toml manifest schema (Completed by PM)
- [x] Task: T3-P2-1-IMPL: Populate tests.toml with full inventory
- [x] Task: T3-P2-2-STUB: Stub run_tests.py category-aware interface
- [x] Task: T3-P2-2-IMPL: Implement run_tests.py filtering logic (Verified)
- [x] Task: Verify that Conductor/MMA tests can be explicitly excluded from default runs (Verified)
- [x] Task: Conductor - User Manual Verification 'Phase 2: Manifest and Tooling' (Protocol in workflow.md)

## Phase 3: Curation and Consolidation
- [ ] Task: Fix all identified non-redundant failing tests
- [ ] Task: Consolidate redundant tests into single, comprehensive test files
- [ ] Task: Remove obsolete or deprecated test files
- [ ] Task: Standardize test naming conventions across the suite
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Curation and Consolidation' (Protocol in workflow.md)

## Phase 4: Final Verification
- [ ] Task: Execute full test suite by category using the new manifest
- [ ] Task: Verify 100% pass rate for all non-blacklisted tests
- [ ] Task: Generate a final test coverage report
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Verification' (Protocol in workflow.md)
@@ -0,0 +1,33 @@
# Specification: Test Suite Curation and Organization

## Overview
The current test suite for **Manual Slop** and the **Conductor** framework has grown incrementally and lacks a formal organization. This track aims to curate, categorize, and organize existing tests, specifically blacklisting Conductor-specific (MMA) tests from manual_slop's test runs. We will use a central manifest for test management and perform an exhaustive review of all tests to eliminate redundancy.

## Functional Requirements
- **Test Categorization:** Tests will be categorized into:
  - Manual Slop Core/GUI
  - Conductor/MMA
  - MCP/Integrations
  - Simulation/Workflows
- **Central Manifest:** Implement a `tests.toml` (or similar) manifest file to define test categories and blacklist specific tests from the default `manual_slop` test run.
- **Blacklisting:** Ensure that Conductor-only tests (e.g., MMA related) do not execute when running tests for the `manual_slop` application itself.
- **Exhaustive Curation:** Review all existing tests in `tests/` to:
  - Fix failing tests.
  - Identify and merge redundant tests.
  - Remove obsolete tests.
  - Ensure consistent naming conventions.

## Non-Functional Requirements
- **Clarity:** The `tests.toml` manifest should be easy to understand and maintain.
- **Reliability:** The curation must result in a stable, passing test suite for each category.

## Acceptance Criteria
- A central manifest (`tests.toml`) is created and used to manage test execution.
- Running `manual_slop` tests successfully ignores all blacklisted Conductor/MMA tests.
- All failing tests are either fixed or removed (if redundant).
- Each test file is assigned to at least one category in the manifest.
- Redundant test logic is consolidated.

## Out of Scope
- Writing new feature tests (unless required to consolidate redundancy).
- Major refactoring of the test framework itself (beyond the manifest).
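The acceptance criterion that every test file is assigned to at least one category can be checked mechanically. A minimal sketch, assuming the `[categories.*].files` manifest shape used in this change; the `unassigned_tests` helper and the sample file list are illustrative, not part of the diff:

```python
from typing import Any, Dict, List

def unassigned_tests(manifest: Dict[str, Any], test_files: List[str]) -> List[str]:
    """Return the test files that appear in no category of the manifest."""
    assigned = set()
    for category in manifest.get("categories", {}).values():
        assigned.update(category.get("files", []))
    return sorted(f for f in test_files if f not in assigned)

# Illustrative data only; in practice test_files would come from
# globbing tests/**/test_*.py and the manifest from tests.toml.
manifest = {
    "categories": {
        "core": {"files": ["tests/test_api_events.py"]},
        "conductor": {"files": ["tests/test_mma_exec.py"]},
    }
}
discovered = [
    "tests/test_api_events.py",
    "tests/test_mma_exec.py",
    "tests/test_new_feature.py",
]
print(unassigned_tests(manifest, discovered))  # ['tests/test_new_feature.py']
```

Running such a check in CI would keep the manifest from silently drifting out of sync with the `tests/` directory.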
@@ -11,21 +11,21 @@ To accomplish this, you MUST delegate token-heavy or stateless tasks to "Tier 3
 
 **CRITICAL Prerequisite:**
 To avoid hanging the CLI and ensure proper environment authentication, you MUST NOT call the `gemini` command directly. Instead, you MUST use the wrapper script:
-`.\scripts\run_subagent.ps1 -Role <Role> -Prompt "..."`
+`uv python scripts/mma_exec.py --role <Role> "..."`
 
 ## 1. The Tier 3 Worker (Heads-Down Coding)
 When you need to perform a significant code modification (e.g., refactoring a 50-line+ script, writing a massive class, or implementing a predefined spec):
 1. **DO NOT** attempt to write or use `replace`/`write_file` yourself. Your history will bloat.
 2. **DO** construct a single, highly specific prompt.
 3. **DO** spawn a sub-agent using `run_shell_command` pointing to the target file.
-   *Command:* `.\scripts\run_subagent.ps1 -Role Worker -Prompt "Read [FILE_PATH] and modify it to implement [SPECIFIC_INSTRUCTION]. Only write the code, no pleasantries."`
+   *Command:* `uv python scripts/mma_exec.py --role tier3-worker "Read [FILE_PATH] and modify it to implement [SPECIFIC_INSTRUCTION]. Only write the code, no pleasantries."`
 4. The Tier 3 Worker is stateless and has no tool access. You must take the clean code it returns and apply it to the file system using your own `replace` or `write_file` tools.
 
 ## 2. The Tier 4 QA Agent (Error Translation)
 If you run a local test (e.g., `npm test`, `pytest`, `go run`) via `run_shell_command` and it fails with a massive traceback (e.g., 100+ lines of `stderr`):
 1. **DO NOT** analyze the raw `stderr` in your own context window.
 2. **DO** immediately spawn a stateless Tier 4 agent to compress the error.
-3. *Command:* `.\scripts\run_subagent.ps1 -Role QA -Prompt "Summarize this stack trace into a 20-word fix: [PASTE_SNIPPET_OF_STDERR_HERE]"`
+3. *Command:* `uv python scripts/mma_exec.py --role tier4-qa "Summarize this stack trace into a 20-word fix: [PASTE_SNIPPET_OF_STDERR_HERE]"`
 4. Use the 20-word fix returned by the Tier 4 agent to inform your next architectural decision or pass it to the Tier 3 worker.
 
 ## 3. Context Amnesia (Phase Checkpoints)
@@ -40,7 +40,7 @@ When you complete a major Phase or Track within the `conductor` workflow:
 
 **Agent (You):**
 ```json
 {
-  "command": ".\\scripts\\run_subagent.ps1 -Role QA -Prompt \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
+  "command": "python scripts/mma_exec.py --role tier4-qa \"Summarize this stack trace into a 20-word fix: [snip first 30 lines...]\"",
   "description": "Spawning Tier 4 QA to compress error trace statelessly."
 }
 ```
@@ -50,7 +50,7 @@ When you complete a major Phase or Track within the `conductor` workflow:
 
 **Agent (You):**
 ```json
 {
-  "command": ".\\scripts\\run_subagent.ps1 -Role Worker -Prompt \"Read file_cache.py and implement the ASTParser class using tree-sitter. Ensure you preserve docstrings but strip function bodies. Output the updated code.\"",
+  "command": "python scripts/mma_exec.py --role tier3-worker \"Read file_cache.py and implement the ASTParser class using tree-sitter. Ensure you preserve docstrings but strip function bodies. Output the updated code.\"",
   "description": "Delegating implementation to a Tier 3 Worker."
 }
 ```
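The delegation examples above all reduce to building one argv per stateless sub-agent call. A minimal sketch of how that command construction might be factored out; the `scripts/mma_exec.py` path and the role names come from this change, while the `build_subagent_command` helper itself is hypothetical:

```python
import shlex
from typing import List

# Role names defined in this change; the helper function is a hypothetical sketch.
VALID_ROLES = {"tier3-worker", "tier4-qa"}

def build_subagent_command(role: str, prompt: str) -> List[str]:
    """Build the argv for a stateless sub-agent call to mma_exec.py."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown sub-agent role: {role}")
    # Passing the prompt as a single argv element avoids shell-quoting bugs.
    return ["python", "scripts/mma_exec.py", "--role", role, prompt]

cmd = build_subagent_command("tier4-qa", "Summarize this stack trace into a 20-word fix: ...")
print(shlex.join(cmd))
```

Keeping the command as a list (and only joining it for display) sidesteps the quoting problems that motivated the wrapper-script rule in the first place.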
+120 -2
@@ -1,5 +1,123 @@
-import pytest
+import argparse
 import sys
+import tomllib
+
+import pytest
+from typing import Dict, List, Any
+
+
+def load_manifest(path: str) -> Dict[str, Any]:
+    """
+    Loads a manifest file (expected to be in TOML format) from the given path.
+
+    Args:
+        path: The path to the TOML manifest file.
+
+    Returns:
+        A dictionary representing the loaded manifest.
+
+    Raises:
+        FileNotFoundError: If the manifest file does not exist.
+        tomllib.TOMLDecodeError: If the manifest file is not valid TOML.
+    """
+    try:
+        with open(path, 'rb') as f:
+            return tomllib.load(f)
+    except FileNotFoundError:
+        print(f"Error: Manifest file not found at {path}", file=sys.stderr)
+        raise
+    except tomllib.TOMLDecodeError:
+        print(f"Error: Could not decode TOML from {path}", file=sys.stderr)
+        raise
+
+
+def get_test_files(manifest: Dict[str, Any], category: str) -> List[str]:
+    """
+    Determines the list of test files based on the manifest and a specified category.
+
+    Args:
+        manifest: The loaded manifest dictionary.
+        category: The category of tests to retrieve.
+
+    Returns:
+        A list of file paths corresponding to the tests in the given category.
+        Returns an empty list if the category is not found or has no tests.
+    """
+    print(f"DEBUG: Looking for category '{category}' in manifest.", file=sys.stderr)
+    files = manifest.get("categories", {}).get(category, {}).get("files", [])
+    print(f"DEBUG: Found test files for category '{category}': {files}", file=sys.stderr)
+    return files
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Run tests with optional manifest and category filtering, passing additional pytest arguments.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""\
+Example usage:
+  python run_tests.py --manifest tests.toml --category unit -- --verbose --cov=my_module
+  python run_tests.py --manifest tests.toml --category integration
+  python run_tests.py --manifest tests.toml --category core
+  python run_tests.py --manifest tests.toml  # Runs tests from default_categories
+  python run_tests.py -- --capture=no        # Runs all tests with pytest args
+"""
+    )
+    parser.add_argument(
+        "--manifest",
+        type=str,
+        help="Path to the TOML manifest file containing test configurations."
+    )
+    parser.add_argument(
+        "--category",
+        type=str,
+        help="Category of tests to run (e.g., 'unit', 'integration')."
+    )
+
+    args, remaining_pytest_args = parser.parse_known_args(sys.argv[1:])
+
+    selected_test_files = []
+    manifest_data = None
+
+    if args.manifest:
+        try:
+            manifest_data = load_manifest(args.manifest)
+        except (FileNotFoundError, tomllib.TOMLDecodeError):
+            # Error message already printed by load_manifest
+            sys.exit(1)
+
+        if args.category:
+            # Case 1: --manifest and --category provided
+            files = get_test_files(manifest_data, args.category)
+            selected_test_files.extend(files)
+        else:
+            # Case 2: --manifest provided, but no --category
+            # Load default categories from manifest['execution']['default_categories']
+            default_categories = manifest_data.get("execution", {}).get("default_categories", [])
+            if not default_categories:
+                print(f"Error: --manifest provided without --category, and no 'default_categories' found in manifest '{args.manifest}'.", file=sys.stderr)
+                parser.print_help(sys.stderr)
+                sys.exit(1)
+
+            print(f"DEBUG: Using default categories from manifest '{args.manifest}': {default_categories}", file=sys.stderr)
+            for cat in default_categories:
+                files = get_test_files(manifest_data, cat)
+                selected_test_files.extend(files)
+    elif args.category:
+        # Case 3: --category provided without --manifest
+        print("Error: --category requires --manifest to be specified.", file=sys.stderr)
+        parser.print_help(sys.stderr)
+        sys.exit(1)
+
+    # Combine selected test files with any remaining pytest arguments.
+    # If --manifest was not provided, or no tests were selected from the
+    # manifest/category, selected_test_files will be empty.
+    pytest_command_args = selected_test_files + remaining_pytest_args
+
+    # Filter out empty strings that might appear in remaining_pytest_args
+    final_pytest_args = [arg for arg in pytest_command_args if arg]
+
+    # If no specific tests were selected and no other pytest args were given,
+    # pytest.main([]) runs default discovery. This handles cases where the user
+    # only passes pytest args, e.g. `python run_tests.py -- --cov=app`, or when
+    # manifest/category selection results in an empty list.
+    print(f"Running pytest with arguments: {final_pytest_args}", file=sys.stderr)
+    sys.exit(pytest.main(final_pytest_args))
+
 
 if __name__ == "__main__":
-    sys.exit(pytest.main(sys.argv[1:]))
+    main()
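The category lookup in `get_test_files` above is a chain of `dict.get` calls with empty-container defaults, so a missing category or a manifest without a `categories` table degrades to an empty list rather than raising `KeyError`. A small self-contained illustration of that behavior (the `lookup_files` name and inline manifest are for demonstration only):

```python
from typing import Any, Dict, List

def lookup_files(manifest: Dict[str, Any], category: str) -> List[str]:
    # Same chained-get lookup as get_test_files in run_tests.py above.
    return manifest.get("categories", {}).get(category, {}).get("files", [])

manifest = {"categories": {"core": {"files": ["tests/test_api_events.py"]}}}
print(lookup_files(manifest, "core"))     # ['tests/test_api_events.py']
print(lookup_files(manifest, "missing"))  # []
print(lookup_files({}, "core"))           # []
```

The trade-off is that a typo in a category name is indistinguishable from an intentionally empty category, which is why the script also prints the DEBUG lines showing which files were resolved.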
+73
@@ -0,0 +1,73 @@
# Test Manifest for manual_slop

[categories.core]
description = "Manual Slop Core and GUI tests"
files = [
    "tests/test_ai_context_history.py",
    "tests/test_api_events.py",
    "tests/test_gui_diagnostics.py",
    "tests/test_gui_events.py",
    "tests/test_gui_performance_requirements.py",
    "tests/test_gui_stress_performance.py",
    "tests/test_gui_updates.py",
    "tests/test_gui2_events.py",
    "tests/test_gui2_layout.py",
    "tests/test_gui2_mcp.py",
    "tests/test_gui2_parity.py",
    "tests/test_gui2_performance.py",
    "tests/test_headless_api.py",
    "tests/test_headless_dependencies.py",
    "tests/test_headless_startup.py",
    "tests/test_history_blacklist.py",
    "tests/test_history_bleed.py",
    "tests/test_history_migration.py",
    "tests/test_history_persistence.py",
    "tests/test_history_truncation.py",
    "tests/test_performance_monitor.py",
    "tests/test_token_usage.py",
    "tests/test_layout_reorganization.py"
]

[categories.conductor]
description = "Conductor and MMA internal tests (Blacklisted from default core runs)"
files = [
    "tests/test_mma_exec.py",
    "tests/test_mma_skeleton.py",
    "tests/test_conductor_api_hook_integration.py",
    "tests/conductor/test_infrastructure.py",
    "tests/test_gemini_cli_adapter.py",
    "tests/test_gemini_cli_integration.py",
    "tests/test_ai_client_cli.py",
    "tests/test_cli_tool_bridge.py",
    "tests/test_gemini_metrics.py"
]

[categories.integrations]
description = "MCP and other external integration tests"
files = [
    "tests/test_api_hook_client.py",
    "tests/test_api_hook_extensions.py",
    "tests/test_hooks.py",
    "tests/test_sync_hooks.py",
    "tests/test_mcp_perf_tool.py"
]

[categories.simulations]
description = "AI and Workflow simulation tests"
files = [
    "tests/test_sim_ai_settings.py",
    "tests/test_sim_base.py",
    "tests/test_sim_context.py",
    "tests/test_sim_execution.py",
    "tests/test_sim_tools.py",
    "tests/test_workflow_sim.py",
    "tests/test_extended_sims.py",
    "tests/test_user_agent.py",
    "tests/test_live_workflow.py",
    "tests/test_agent_capabilities.py",
    "tests/test_agent_tools_wiring.py"
]

[execution]
default_categories = ["core", "integrations", "simulations"]
blacklist_categories = ["conductor"]