feat(track): Add fix_concurrent_mma_tests_20260507 track

2026-05-06 22:15:40 -04:00
parent c36e691b8d
commit 885bb1395b
4 changed files with 72 additions and 0 deletions
@@ -163,6 +163,12 @@ This file tracks all major tracks for the project. Each track has its own detail
*Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
*Goal: Verify and optimize caching strategies across all providers. Implement 4-breakpoint hierarchy for Anthropic, prefix stabilization for OpenAI/DeepSeek, and hybrid explicit/implicit caching for Gemini. Add GUI hit rate metrics.*
### Testing & Quality
1. [~] **Track: Fix Concurrent MMA Live GUI Tests**
*Link: [./tracks/fix_concurrent_mma_tests_20260507/](./tracks/fix_concurrent_mma_tests_20260507/)*
*Goal: Fix timeout issues in concurrent MMA track execution tests (test_mma_concurrent_tracks_sim.py, test_mma_concurrent_tracks_stress_sim.py, test_visual_sim_mma_v2.py). Workers run correctly, but the tests time out due to infrastructure issues.*
---
## Phase 3: Future Horizons
@@ -0,0 +1,10 @@
{
"id": "fix_concurrent_mma_tests_20260507",
"title": "Fix Concurrent MMA Live GUI Tests",
"created": "2026-05-07",
"status": "in_progress",
"priority": "high",
"estimate": "medium",
"tags": ["testing", "mma", "concurrency", "integration-tests"],
"depends_on": []
}
@@ -0,0 +1,19 @@
# Implementation Plan: Fix Concurrent MMA Live GUI Tests
## Phase 1: Investigate Test Infrastructure
- [ ] Task: Compare test_mma_concurrent_tracks_sim.py with working tests (e.g., test_visual_orchestration.py)
- [ ] Task: Check subprocess/port cleanup between live_gui fixture tests
- [ ] Task: Verify get_mma_workers() API returns expected format
- [ ] Task: Run isolated concurrent test with verbose debugging
## Phase 2: Identify Root Cause
- [ ] Task: Determine if issue is fixture cleanup, API response, or timing
- [ ] Task: Document exact failure point
## Phase 3: Implement Fix
- [ ] Task: Fix identified root cause
- [ ] Task: Verify fix doesn't break other tests
## Phase 4: Final Verification
- [ ] Task: Run all concurrent MMA tests
- [ ] Task: Run full test suite to check for regressions
@@ -0,0 +1,37 @@
# Track: Fix Concurrent MMA Live GUI Tests
## Problem Statement
The live GUI integration tests for concurrent MMA track execution time out:
- `test_mma_concurrent_tracks_sim.py::test_mma_concurrent_tracks_execution`
- `test_mma_concurrent_tracks_stress_sim.py::test_mma_concurrent_tracks_stress`
- `test_visual_sim_mma_v2.py::test_mma_complete_lifecycle`
While simpler single-track live GUI tests pass, these concurrent execution tests time out at the 300s limit.
## Observations
1. Workers ARE running - both ticket-A-1 and ticket-B-1 appear in `active_streams` (confirmed via user screenshot)
2. The failure therefore lies in the test infrastructure's handling of concurrent MMA workloads, not in the workers themselves
3. Other live GUI tests work (visual_orchestration, mma_step_mode_sim, system_prompt_sim)
## Investigation Tasks
1. [ ] Compare `test_mma_concurrent_tracks_sim.py` with `test_visual_orchestration.py` to identify differences in test structure
2. [ ] Check if `live_gui` fixture cleanup between concurrent tests causes issues
3. [ ] Investigate if subprocess management from previous tests pollutes subsequent test runs
4. [ ] Analyze timing issues: the concurrent test polls at 1s intervals, but workers may complete faster
5. [ ] Verify `get_mma_workers()` API endpoint returns data in expected format
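Task 5 could start with a shape assertion the concurrent tests can share. The contract below is an assumption, not the confirmed API: `get_mma_workers()` is guessed to return a dict mapping worker id to an info dict with at least a `status` field.

```python
def validate_workers_payload(payload):
    """Assert the response shape the concurrent tests rely on.

    Hypothetical contract: get_mma_workers() is assumed to return
    {worker_id: {"status": ..., ...}}; adjust once the real payload
    is confirmed.
    """
    assert isinstance(payload, dict), f"expected dict, got {type(payload).__name__}"
    for worker_id, info in payload.items():
        assert isinstance(worker_id, str), f"non-string worker id: {worker_id!r}"
        assert isinstance(info, dict), f"{worker_id}: entry is not a dict"
        assert "status" in info, f"{worker_id}: missing 'status' field"
    return True
```

Calling this on every poll would distinguish hypothesis 2 (serialization mismatch) from the timing hypotheses in a single failing run.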
## Hypotheses
1. **Subprocess leakage**: Previous test's GUI process isn't fully killed before next test starts
2. **API response format**: `mma_streams` dict may serialize differently than expected by test
3. **Timing race**: Workers complete before the completion-check loop starts polling, so the test never observes a completed state
4. **Thread pool exhaustion**: Multiple concurrent tracks exhaust some shared resource
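If hypothesis 3 holds, the fix is likely a completion check that remembers terminal states across polls instead of requiring them to be visible at one instant. A sketch under assumed names: `fetch_workers` stands in for whatever wraps `get_mma_workers()`, and the status strings are guesses at the real values.

```python
import time


def wait_for_workers_done(fetch_workers, expected_ids, timeout=60.0, interval=1.0):
    """Poll until every expected worker has been observed finishing.

    Hypothetical sketch: fetch_workers() is assumed to return
    {worker_id: status_string}. Terminal states are accumulated across
    polls, and a worker that was active but later vanishes from the
    response is counted as finished, so fast completions between polls
    are not lost.
    """
    seen_active = set()
    seen_done = set()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        workers = fetch_workers()
        for worker_id, status in workers.items():
            if status in ("running", "active"):
                seen_active.add(worker_id)
            elif status in ("completed", "failed"):
                seen_done.add(worker_id)
        # Previously active workers missing from this poll finished in between.
        seen_done |= seen_active - set(workers)
        if seen_done >= set(expected_ids):
            return True
        time.sleep(interval)
    return False
```

This makes the completion check monotonic: once a worker has been seen in (or inferred to have reached) a terminal state, a later empty poll cannot un-complete it.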
## Success Criteria
- `test_mma_concurrent_tracks_execution` completes in <60s
- `test_mma_concurrent_tracks_stress` completes in <60s
- No other tests regress
## Approach
1. First, isolate whether the issue is in the test setup (fixture) or test logic
2. Add debugging output to identify exact failure point
3. Fix the root cause; this will likely mean making the test infrastructure more robust