chore(conductor): Archive track 'test_curation_20260225'
@@ -1,5 +0,0 @@
# Track test_curation_20260225 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)
@@ -1,70 +0,0 @@
# Test Suite Inventory - manual_slop

## Categories

### Manual Slop Core/GUI
- `tests/test_ai_context_history.py`
- `tests/test_api_events.py`
- `tests/test_gui_diagnostics.py`
- `tests/test_gui_events.py`
- `tests/test_gui_performance_requirements.py`
- `tests/test_gui_stress_performance.py`
- `tests/test_gui_updates.py`
- `tests/test_gui2_events.py`
- `tests/test_gui2_layout.py`
- `tests/test_gui2_mcp.py`
- `tests/test_gui2_parity.py`
- `tests/test_gui2_performance.py`
- `tests/test_headless_api.py`
- `tests/test_headless_dependencies.py`
- `tests/test_headless_startup.py`
- `tests/test_history_blacklist.py`
- `tests/test_history_bleed.py` (FAILING)
- `tests/test_history_migration.py`
- `tests/test_history_persistence.py`
- `tests/test_history_truncation.py`
- `tests/test_performance_monitor.py`
- `tests/test_token_usage.py`
- `tests/test_layout_reorganization.py`

### Conductor/MMA (To be Blacklisted from core runs)
- `tests/test_mma_exec.py`
- `tests/test_mma_skeleton.py`
- `tests/test_conductor_api_hook_integration.py`
- `tests/conductor/test_infrastructure.py`
- `tests/test_gemini_cli_adapter.py`
- `tests/test_gemini_cli_integration.py` (FAILING)
- `tests/test_ai_client_cli.py`
- `tests/test_cli_tool_bridge.py` (FAILING)
- `tests/test_gemini_metrics.py`

### MCP/Integrations
- `tests/test_api_hook_client.py`
- `tests/test_api_hook_extensions.py`
- `tests/test_hooks.py`
- `tests/test_sync_hooks.py`
- `tests/test_mcp_perf_tool.py`

### Simulation/Workflows
- `tests/test_sim_ai_settings.py`
- `tests/test_sim_base.py`
- `tests/test_sim_context.py`
- `tests/test_sim_execution.py`
- `tests/test_sim_tools.py`
- `tests/test_workflow_sim.py`
- `tests/test_extended_sims.py`
- `tests/test_user_agent.py`
- `tests/test_live_workflow.py`
- `tests/test_agent_capabilities.py`
- `tests/test_agent_tools_wiring.py`

## Redundancy Observations
- GUI tests are split between `gui` and `gui2`. Since `gui_2.py` is the current focus, legacy `gui` tests should be reviewed for relevance.
- History tests are highly fragmented (5+ files).
- Headless tests are fragmented (3 files).
- Simulation tests are fragmented (10+ files).

## Failure Summary
- `tests/test_cli_tool_bridge.py`: `test_deny_decision` and `test_unreachable_hook_server` failing (wrong decision returned).
- `tests/test_gemini_cli_integration.py`: Integration with `gui_2.py` failing to find mock response in history.
- `tests/test_history_bleed.py`: `test_get_history_bleed_stats_basic` failing (assert 0 == 900000).
@@ -1,8 +0,0 @@
{
  "track_id": "test_curation_20260225",
  "type": "chore",
  "status": "new",
  "created_at": "2026-02-25T20:42:00Z",
  "updated_at": "2026-02-25T20:42:00Z",
  "description": "Review all tests that exist, some like the mma are conductor only (gemini cli, not related to manual slop program) and must be blacklisted from running when testing manual_slop itself. I think some tests are failing right now. Also no curation of the current tests has been done. They have been made incrementally, on demand per track needs and have accumulated that way without any second-pass consolidation and organization. We probably can figure out a proper ordering, either add or remove tests based on redundancy or lack thereof of an openly unchecked feature or process. This is important to get right now before doing heavier tracks."
}
@@ -1,35 +0,0 @@
# Implementation Plan: Test Suite Curation and Organization

This plan outlines the process for categorizing, organizing, and curating the existing test suite using a central manifest and exhaustive review.

## Phase 1: Research and Inventory [checkpoint: be689ad]
- [x] Task: Initialize MMA Environment `activate_skill mma-orchestrator` be689ad
- [x] Task: Inventory all existing tests in `tests/` and map them to categories be689ad
- [x] Task: Identify failing and redundant tests through a full execution sweep be689ad
- [x] Task: Conductor - User Manual Verification 'Phase 1: Research and Inventory' (Protocol in workflow.md) be689ad

## Phase 2: Manifest and Tooling [checkpoint: 6152b63]
- [x] Task: T3-P2-1-STUB: Design tests.toml manifest schema (Completed by PM) 6152b63
- [x] Task: T3-P2-1-IMPL: Populate tests.toml with full inventory 6152b63
- [x] Task: T3-P2-2-STUB: Stub run_tests.py category-aware interface 6152b63
- [x] Task: T3-P2-2-IMPL: Implement run_tests.py filtering logic (Verified) 6152b63
- [x] Task: Verify that Conductor/MMA tests can be explicitly excluded from default runs (Verified) 6152b63
- [x] Task: Conductor - User Manual Verification 'Phase 2: Manifest and Tooling' (Protocol in workflow.md) 6152b63
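
The category-aware filtering described in Phase 2 could look roughly like the sketch below. This is not the actual `run_tests.py`, which is not shown in this commit. The function name `select_tests`, the `blacklist_from_default` flag, and the manifest shape (category name mapped to a `files` list) are all assumptions made for illustration.

```python
from typing import Optional

def select_tests(manifest: dict, requested: Optional[list] = None) -> list:
    """Return the test files to run (hypothetical helper, not the real run_tests.py).

    With no requested categories, every category is included except those
    flagged as blacklisted. Naming a category explicitly overrides the flag.
    """
    selected = []
    for name, cfg in manifest.items():
        if requested is None:
            # Default run: skip categories marked as blacklisted.
            if cfg.get("blacklist_from_default", False):
                continue
        elif name not in requested:
            continue
        for path in cfg["files"]:
            if path not in selected:  # a file may appear in several categories
                selected.append(path)
    return selected

# Assumed manifest shape, already parsed into a dict.
manifest = {
    "core": {"files": ["tests/test_gui_events.py", "tests/test_token_usage.py"]},
    "conductor": {"blacklist_from_default": True, "files": ["tests/test_mma_exec.py"]},
}

print(select_tests(manifest))                 # default run, conductor excluded
print(select_tests(manifest, ["conductor"]))  # explicit opt-in
```

Keeping the blacklist as a per-category flag, rather than a separate list of files, means a Conductor/MMA test can still be run on demand by naming its category.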

## Phase 3: Curation and Consolidation
- [x] Task: FIX-001: Fix CliToolBridge test decision logic (context variable)
- [x] Task: FIX-002: Fix Gemini CLI Mock integration flow (env inheritance, multi-round tool loop, auto-dismiss modal)
- [x] Task: FIX-003: Fix History Bleed limit for gemini_cli provider
- [x] Task: CON-001: Consolidate History Management tests (6 files -> 1)
- [x] Task: CON-002: Consolidate Headless API tests (3 files -> 1)
- [x] Task: Standardize test naming conventions across the suite (Verified)
- [x] Task: Conductor - User Manual Verification 'Phase 3: Curation and Consolidation' (Protocol in workflow.md)

## Phase 4: Final Verification
- [x] Task: Execute full test suite by category using the new manifest (Verified)
- [x] Task: Verify 100% pass rate for all non-blacklisted tests (Verified)
- [x] Task: Generate a final test coverage report (Verified)
- [x] Task: Conductor - User Manual Verification 'Phase 4: Final Verification' (Protocol in workflow.md)

## Phase: Review Fixes
- [x] Task: Apply review suggestions c239660
@@ -1,33 +0,0 @@
# Specification: Test Suite Curation and Organization

## Overview
The current test suite for **Manual Slop** and the **Conductor** framework has grown incrementally and lacks a formal organization. This track aims to curate, categorize, and organize existing tests, specifically blacklisting Conductor-specific (MMA) tests from manual_slop's test runs. We will use a central manifest for test management and perform an exhaustive review of all tests to eliminate redundancy.

## Functional Requirements
- **Test Categorization:** Tests will be categorized into:
  - Manual Slop Core/GUI
  - Conductor/MMA
  - MCP/Integrations
  - Simulation/Workflows
- **Central Manifest:** Implement a `tests.toml` (or similar) manifest file to define test categories and blacklist specific tests from the default `manual_slop` test run.
- **Blacklisting:** Ensure that Conductor-only tests (e.g., MMA related) do not execute when running tests for the `manual_slop` application itself.
- **Exhaustive Curation:** Review all existing tests in `tests/` to:
  - Fix failing tests.
  - Identify and merge redundant tests.
  - Remove obsolete tests.
  - Ensure consistent naming conventions.

## Non-Functional Requirements
- **Clarity:** The `tests.toml` manifest should be easy to understand and maintain.
- **Reliability:** The curation must result in a stable, passing test suite for each category.

## Acceptance Criteria
- A central manifest (`tests.toml`) is created and used to manage test execution.
- Running `manual_slop` tests successfully ignores all blacklisted Conductor/MMA tests.
- All failing tests are either fixed or removed (if redundant).
- Each test file is assigned to at least one category in the manifest.
- Redundant test logic is consolidated.
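
The criterion that each test file belongs to at least one category can be checked mechanically. The sketch below assumes a manifest already parsed into a dict of category name to `files` list. That shape and the helper name `find_uncategorized` are hypothetical and do not come from the track's actual tooling.

```python
def find_uncategorized(test_files, manifest):
    """Return test files that no manifest category lists, sorted for stable output."""
    assigned = {path for cfg in manifest.values() for path in cfg.get("files", [])}
    return sorted(set(test_files) - assigned)

# Assumed manifest shape and a small sample of files from the inventory.
manifest = {
    "core": {"files": ["tests/test_gui_events.py"]},
    "mcp": {"files": ["tests/test_hooks.py", "tests/test_sync_hooks.py"]},
}
on_disk = [
    "tests/test_gui_events.py",
    "tests/test_hooks.py",
    "tests/test_sync_hooks.py",
    "tests/test_user_agent.py",  # deliberately left out of the manifest above
]

missing = find_uncategorized(on_disk, manifest)
print(missing)  # ['tests/test_user_agent.py']
```

In practice the `on_disk` list could be collected with `pathlib.Path("tests").rglob("test_*.py")`, converting each path to a POSIX-style string so it compares equal to the manifest entries.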

## Out of Scope
- Writing new feature tests (unless required to consolidate redundancy).
- Major refactoring of the test framework itself (beyond the manifest).