chore(conductor): Add new track 'Expanded Test Coverage and Stress Testing'

2026-03-09 21:45:45 -04:00
parent fe0f349c12
commit 5cd49290fe
5 changed files with 71 additions and 3 deletions
@@ -50,8 +50,8 @@ This file tracks all major tracks for the project. Each track has its own detail

 5. [x] **Track: NERV UI Theme Integration** (Archived 2026-03-09)

-6. [ ] **Track: Custom Shader and Window Frame Support**
-*Link: [./tracks/custom_shaders_20260309/](./tracks/custom_shaders_20260309/)*
+6. [ ] **Track: Custom Shader and Window Frame Support** 
+   *Link: [./tracks/custom_shaders_20260309/](./tracks/custom_shaders_20260309/)*

 ---

@@ -121,7 +121,6 @@ This file tracks all major tracks for the project. Each track has its own detail

 ### Completed / Archived

-
 - [x] **Track: True Parallel Worker Execution (The DAG Realization)**
 - [x] **Track: Deep AST-Driven Context Pruning (RAG for Code)**
 - [x] **Track: Visual DAG & Interactive Ticket Editing**
@@ -181,3 +180,12 @@ This file tracks all major tracks for the project. Each track has its own detail
 - [x] **Track: Robust Live Simulation Verification**

 ---
+
+- [ ] **Track: Custom Shader and Window Frame Support** 
+   *Link: [./tracks/custom_shaders_20260309/](./tracks/custom_shaders_20260309/)*
+
+---
+
+- [ ] **Track: Expanded Test Coverage and Stress Testing**
+   *Link: [./tracks/test_coverage_expansion_20260309/](./tracks/test_coverage_expansion_20260309/)*
+
@@ -0,0 +1,5 @@
+# Track test_coverage_expansion_20260309 Context
+
+- [Specification](./spec.md)
+- [Implementation Plan](./plan.md)
+- [Metadata](./metadata.json)
@@ -0,0 +1,8 @@
+{
+  "track_id": "test_coverage_expansion_20260309",
+  "type": "chore",
+  "status": "new",
+  "created_at": "2026-03-09T00:00:00Z",
+  "updated_at": "2026-03-09T00:00:00Z",
+  "description": "Add more unit tests for features lacking coverage or sim tests for scenarios not already covered to stress test the application."
+}
@@ -0,0 +1,19 @@
+# Implementation Plan: Expanded Test Coverage and Stress Testing
+
+## Phase 1: Tool Accessibility and State Unit Tests
+- [ ] Task: Review current tool registration and disabling logic in `src/mcp_client.py` and `src/api_hooks.py`.
+- [ ] Task: Write Tests: Create unit tests in `tests/test_agent_tools_wiring.py` (or similar) to verify turning a tool off removes it from the agent's available tool list.
+- [ ] Task: Implement: If tests fail due to missing logic, update the tool filtering implementation to ensure disabled tools are strictly excluded from the context sent to the provider.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Tool Accessibility and State Unit Tests' (Protocol in workflow.md)
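A Phase 1 unit test could take roughly the following shape. This is a minimal sketch, not the project's code: `filter_enabled_tools` and the dictionary-based tool definitions are hypothetical stand-ins, since the actual filtering logic in `src/mcp_client.py` and `src/api_hooks.py` is not shown in this commit.

```python
from typing import Iterable


def filter_enabled_tools(tools: Iterable[dict], disabled: set[str]) -> list[dict]:
    """Hypothetical filter: keep only tools whose name is not disabled."""
    return [tool for tool in tools if tool["name"] not in disabled]


def test_disabled_tool_is_excluded() -> None:
    # Assumed tool shape: a dict with at least a "name" key.
    tools = [{"name": "read_file"}, {"name": "run_shell"}]

    visible = filter_enabled_tools(tools, disabled={"run_shell"})

    # The disabled tool must not appear in what would be sent to the provider.
    assert [tool["name"] for tool in visible] == ["read_file"]


def test_no_disabled_tools_keeps_everything() -> None:
    tools = [{"name": "read_file"}, {"name": "run_shell"}]
    assert filter_enabled_tools(tools, disabled=set()) == tools
```

Asserting on the exact list sent to the provider, rather than on a flag on the tool object, matches the plan's requirement that disabled tools are strictly excluded from the context.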
+
+## Phase 2: MMA Agent 'Step Mode' Simulation Tests
+- [ ] Task: Investigate existing simulation test patterns in `tests/simulation/` and the Hook API coverage for Step Mode.
+- [ ] Task: Write Tests: Create a new simulation test (`tests/test_mma_step_mode_sim.py`) that initializes an MMA track and specifically forces 'Step Mode' via API hooks.
+- [ ] Task: Implement/Refine: Ensure the simulation script correctly waits for and manually approves task transitions, validating that the execution engine pauses appropriately between steps.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: MMA Agent Step Mode Simulation Tests' (Protocol in workflow.md)
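The pause-and-approve behaviour that Phase 2 validates can be modelled in a few lines. The sketch below is an illustration under stated assumptions: `step_mode_runner` is a hypothetical generator-based model of Step Mode, not the real execution engine or Hook API, and a real simulation test would drive the running application through its hooks instead.

```python
from typing import Generator, Iterable


def step_mode_runner(
    tasks: Iterable[str], completed: list[str]
) -> Generator[str, bool, None]:
    """Hypothetical model: pause before each task until a decision is sent."""
    for task in tasks:
        approved = yield task  # execution is suspended here until send()
        if not approved:
            return  # a rejection stops the run
        completed.append(task)


def test_runner_pauses_until_approved() -> None:
    completed: list[str] = []
    runner = step_mode_runner(["plan", "build"], completed)

    pending = next(runner)
    assert pending == "plan"
    assert completed == []  # nothing has run before the first approval

    pending = runner.send(True)  # approve "plan"; the runner pauses on "build"
    assert pending == "build"
    assert completed == ["plan"]

    try:
        runner.send(True)  # approve "build"; the run finishes
    except StopIteration:
        pass
    assert completed == ["plan", "build"]
```

The key assertion is the one made between approvals: it checks that no further task has completed while the runner is waiting, which is the property the simulation test needs to establish for the real engine.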
+
+## Phase 3: Multi-Epic and Advanced DAG Stress Tests
+- [ ] Task: Analyze the DAG execution engine (`src/dag_engine.py` and `src/multi_agent_conductor.py`) for handling multiple concurrent tracks/epics.
+- [ ] Task: Write Tests: Create an integration/simulation test that loads two or more complex tracks with interconnected dependencies simultaneously.
+- [ ] Task: Implement/Refine: Stress test the system by allowing the agent pool to execute these concurrent DAGs. Verify that blocked statuses propagate correctly and that the orchestrator does not deadlock or crash.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Multi-Epic and Advanced DAG Stress Tests' (Protocol in workflow.md)
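The blocked-status check in Phase 3 can be made concrete with a small reference model to compare against. This is a sketch only: `propagate_blocked` and the task names are hypothetical, and the real logic in `src/dag_engine.py` may represent dependencies and statuses differently.

```python
from collections import deque


def propagate_blocked(
    dependencies: dict[str, list[str]], failed: set[str]
) -> set[str]:
    """Return every task that transitively depends on a failed task.

    `dependencies` maps each task to the tasks it depends on (assumed shape).
    """
    # Invert the edges: prerequisite -> tasks that depend on it.
    dependents: dict[str, list[str]] = {}
    for task, prerequisites in dependencies.items():
        for prerequisite in prerequisites:
            dependents.setdefault(prerequisite, []).append(task)

    blocked: set[str] = set()
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            # Visiting each task at most once guarantees termination.
            if child not in blocked and child not in failed:
                blocked.add(child)
                queue.append(child)
    return blocked


def test_block_propagates_across_tracks() -> None:
    # Two tracks, "a" and "b", where b2 also depends on a2.
    dependencies = {
        "a1": [],
        "a2": ["a1"],
        "a3": ["a2"],
        "b1": [],
        "b2": ["b1", "a2"],
    }
    blocked = propagate_blocked(dependencies, failed={"a1"})
    assert blocked == {"a2", "a3", "b2"}
    assert "b1" not in blocked  # independent work stays runnable
```

A stress test could build larger interconnected graphs, fail a few tasks, and compare the orchestrator's reported blocked set against a model like this one.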
@@ -0,0 +1,28 @@
+# Specification: Expanded Test Coverage and Stress Testing
+
+## Overview
+Add more unit, simulation, and integration tests to increase coverage and stress test the application. The primary focus is on critical and complex paths rather than on maximizing the total coverage percentage.
+
+## Functional Requirements
+- **Targeted Areas:**
+  - **MMA Agent 'Step Mode':** Ensure the step-by-step execution mode of the multi-agent architecture is thoroughly tested, including manual confirmation steps.
+  - **Tool Toggling and Access:** Verify that tools can be explicitly disabled and that disabled tools are inaccessible to the agents.
+  - **Multi-Epic/Advanced DAG Usage:** Stress test the Directed Acyclic Graph (DAG) execution engine by running scenarios with more than one concurrent epic/track and advanced task dependencies.
+- **Testing Types:**
+  - **Unit Tests:** For core logic regarding tool accessibility and state management.
+  - **Integration Tests:** To ensure agents, the DAG engine, and the execution pool interact correctly under stress.
+  - **Simulation Tests:** To run end-to-end automated UI workflows covering Step Mode operations and multi-epic management.
+
+## Non-Functional Requirements
+- **Targeted Coverage:** Prioritize regression prevention and covering previously untested edge cases in the specified areas over reaching a strict 80% global coverage metric.
+- **Stability:** All new tests must be stable, repeatable, and avoid introducing flakiness to the test suite.
+
+## Acceptance Criteria
+- [ ] Unit tests exist to verify that disabling a tool explicitly prevents agent access.
+- [ ] Simulation tests are in place to run an MMA agent workflow specifically in 'Step Mode', capturing necessary UI interactions.
+- [ ] Integration/simulation tests exist that load and execute multiple epics/tracks within the DAG engine simultaneously to stress the orchestrator.
+- [ ] The CI or local test suite passes reliably with the new tests included.
+
+## Out of Scope
+- Reaching >80% total code coverage across all modules indiscriminately.
+- Refactoring the core DAG or MMA execution logic (unless absolutely necessary to fix a bug discovered during testing).