diff --git a/conductor/tracks.md b/conductor/tracks.md
index 9546af9..a75a61b 100644
--- a/conductor/tracks.md
+++ b/conductor/tracks.md
@@ -163,6 +163,12 @@ This file tracks all major tracks for the project. Each track has its own detail
    *Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
    *Goal: Verify and optimize caching strategies across all providers. Implement 4-breakpoint hierarchy for Anthropic, prefix stabilization for OpenAI/DeepSeek, and hybrid explicit/implicit caching for Gemini. Add GUI hit rate metrics.*
 
+### Testing & Quality
+
+1. [~] **Track: Fix Concurrent MMA Live GUI Tests**
+   *Link: [./tracks/fix_concurrent_mma_tests_20260507/](./tracks/fix_concurrent_mma_tests_20260507/)*
+   *Goal: Fix timeout issues in concurrent MMA track execution tests (test_mma_concurrent_tracks_sim.py, test_mma_concurrent_tracks_stress_sim.py, test_visual_sim_mma_v2.py). Workers run correctly but tests timeout due to infrastructure issues.*
+
 ---
 
 ## Phase 3: Future Horizons
diff --git a/conductor/tracks/fix_concurrent_mma_tests_20260507/metadata.json b/conductor/tracks/fix_concurrent_mma_tests_20260507/metadata.json
new file mode 100644
index 0000000..4e065ea
--- /dev/null
+++ b/conductor/tracks/fix_concurrent_mma_tests_20260507/metadata.json
@@ -0,0 +1,10 @@
+{
+  "id": "fix_concurrent_mma_tests_20260507",
+  "title": "Fix Concurrent MMA Live GUI Tests",
+  "created": "2026-05-07",
+  "status": "in_progress",
+  "priority": "high",
+  "estimate": "medium",
+  "tags": ["testing", "mma", "concurrency", "integration-tests"],
+  "depends_on": []
+}
\ No newline at end of file
diff --git a/conductor/tracks/fix_concurrent_mma_tests_20260507/plan.md b/conductor/tracks/fix_concurrent_mma_tests_20260507/plan.md
new file mode 100644
index 0000000..0f91840
--- /dev/null
+++ b/conductor/tracks/fix_concurrent_mma_tests_20260507/plan.md
@@ -0,0 +1,19 @@
+# Implementation Plan: Fix Concurrent MMA Live GUI Tests
+
+## Phase 1: Investigate Test Infrastructure
+- [ ] Task: Compare test_mma_concurrent_tracks_sim.py with working tests (e.g., test_visual_orchestration.py)
+- [ ] Task: Check subprocess/port cleanup between live_gui fixture tests
+- [ ] Task: Verify get_mma_workers() API returns expected format
+- [ ] Task: Run isolated concurrent test with verbose debugging
+
+## Phase 2: Identify Root Cause
+- [ ] Task: Determine if issue is fixture cleanup, API response, or timing
+- [ ] Task: Document exact failure point
+
+## Phase 3: Implement Fix
+- [ ] Task: Fix identified root cause
+- [ ] Task: Verify fix doesn't break other tests
+
+## Phase 4: Final Verification
+- [ ] Task: Run all concurrent MMA tests
+- [ ] Task: Run full test suite to check for regressions
\ No newline at end of file
diff --git a/conductor/tracks/fix_concurrent_mma_tests_20260507/spec.md b/conductor/tracks/fix_concurrent_mma_tests_20260507/spec.md
new file mode 100644
index 0000000..d866c6f
--- /dev/null
+++ b/conductor/tracks/fix_concurrent_mma_tests_20260507/spec.md
@@ -0,0 +1,37 @@
+# Track: Fix Concurrent MMA Live GUI Tests
+
+## Problem Statement
+The live GUI integration tests for concurrent MMA track execution timeout:
+- `test_mma_concurrent_tracks_sim.py::test_mma_concurrent_tracks_execution`
+- `test_mma_concurrent_tracks_stress_sim.py::test_mma_concurrent_tracks_stress`
+- `test_visual_sim_mma_v2.py::test_mma_complete_lifecycle`
+
+While simpler single-track live GUI tests pass, these concurrent execution tests timeout (300s).
+
+## Observations
+1. Workers ARE running - both ticket-A-1 and ticket-B-1 appear in `active_streams` (confirmed via user screenshot)
+2. The issue is in the test infrastructure's ability to handle concurrent MMA workloads
+3. Other live GUI tests work (visual_orchestration, mma_step_mode_sim, system_prompt_sim)
+
+## Investigation Tasks
+1. [ ] Compare `test_mma_concurrent_tracks_sim.py` with `test_visual_orchestration.py` to identify differences in test structure
+2. [ ] Check if `live_gui` fixture cleanup between concurrent tests causes issues
+3. [ ] Investigate if subprocess management from previous tests pollutes subsequent test runs
+4. [ ] Analyze timing issues - the concurrent test polls at 1s intervals but workers may complete faster
+5. [ ] Verify `get_mma_workers()` API endpoint returns data in expected format
+
+## Hypotheses
+1. **Subprocess leakage**: Previous test's GUI process isn't fully killed before next test starts
+2. **API response format**: `mma_streams` dict may serialize differently than expected by test
+3. **Timing race**: Workers complete before completion check loop starts, making completion undetectable
+4. **Thread pool exhaustion**: Multiple concurrent tracks exhaust some shared resource
+
+## Success Criteria
+- `test_mma_concurrent_tracks_execution` completes in <60s
+- `test_mma_concurrent_tracks_stress` completes in <60s
+- No other tests regress
+
+## Approach
+1. First, isolate whether the issue is in the test setup (fixture) or test logic
+2. Add debugging output to identify exact failure point
+3. Fix the root cause - likely requires improving test infrastructure robustness
\ No newline at end of file
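
Review note on the "Timing race" hypothesis in spec.md: if the test only watches for a worker to appear and then disappear, a worker that finishes between two polls is never observed, and the loop runs until the 300s timeout. One possible way around this is to poll for a terminal state instead of a transition. The sketch below is illustrative only and is not taken from the project: `wait_until_terminal`, the `fetch_status` callable, and the status names `"done"` and `"failed"` are all hypothetical, and the real `get_mma_workers()` response may look quite different.

```python
import time
from typing import Callable, Dict, Iterable

# Hypothetical status names; the real API may use different values.
TERMINAL = frozenset({"done", "failed"})


def wait_until_terminal(
    fetch_status: Callable[[], Dict[str, str]],
    expected: Iterable[str],
    timeout_s: float = 60.0,
    poll_s: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every expected ticket reports a terminal status.

    The check is level-triggered: a ticket that finished before the
    first poll still counts, because only its current status is read.
    Returns False if the deadline passes first.
    """
    expected = list(expected)
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        # A ticket missing from the response is treated as not finished.
        if all(status.get(ticket) in TERMINAL for ticket in expected):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In a test, `fetch_status` could wrap whatever call the test already makes to read worker state, with `expected` set to the ticket IDs under test (for example `ticket-A-1` and `ticket-B-1`). The injectable `clock` and `sleep` parameters are there so the helper itself can be unit-tested without real waiting.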