feat(track): Add fix_concurrent_mma_tests_20260507 track

2026-05-06 22:15:40 -04:00
parent c36e691b8d
commit 885bb1395b
4 changed files with 72 additions and 0 deletions
@@ -163,6 +163,12 @@ This file tracks all major tracks for the project. Each track has its own detail
*Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
*Goal: Verify and optimize caching strategies across all providers. Implement 4-breakpoint hierarchy for Anthropic, prefix stabilization for OpenAI/DeepSeek, and hybrid explicit/implicit caching for Gemini. Add GUI hit rate metrics.*
### Testing & Quality
1. [~] **Track: Fix Concurrent MMA Live GUI Tests**
*Link: [./tracks/fix_concurrent_mma_tests_20260507/](./tracks/fix_concurrent_mma_tests_20260507/)*
*Goal: Fix timeout issues in concurrent MMA track execution tests (test_mma_concurrent_tracks_sim.py, test_mma_concurrent_tracks_stress_sim.py, test_visual_sim_mma_v2.py). Workers run correctly, but the tests time out due to infrastructure issues.*
---
## Phase 3: Future Horizons
@@ -0,0 +1,10 @@
{
"id": "fix_concurrent_mma_tests_20260507",
"title": "Fix Concurrent MMA Live GUI Tests",
"created": "2026-05-07",
"status": "in_progress",
"priority": "high",
"estimate": "medium",
"tags": ["testing", "mma", "concurrency", "integration-tests"],
"depends_on": []
}
@@ -0,0 +1,19 @@
# Implementation Plan: Fix Concurrent MMA Live GUI Tests
## Phase 1: Investigate Test Infrastructure
- [ ] Task: Compare test_mma_concurrent_tracks_sim.py with working tests (e.g., test_visual_orchestration.py)
- [ ] Task: Check subprocess/port cleanup between live_gui fixture tests
- [ ] Task: Verify get_mma_workers() API returns expected format
- [ ] Task: Run isolated concurrent test with verbose debugging
## Phase 2: Identify Root Cause
- [ ] Task: Determine if issue is fixture cleanup, API response, or timing
- [ ] Task: Document exact failure point
## Phase 3: Implement Fix
- [ ] Task: Fix identified root cause
- [ ] Task: Verify fix doesn't break other tests
## Phase 4: Final Verification
- [ ] Task: Run all concurrent MMA tests
- [ ] Task: Run full test suite to check for regressions
@@ -0,0 +1,37 @@
# Track: Fix Concurrent MMA Live GUI Tests
## Problem Statement
The live GUI integration tests for concurrent MMA track execution time out:
- `test_mma_concurrent_tracks_sim.py::test_mma_concurrent_tracks_execution`
- `test_mma_concurrent_tracks_stress_sim.py::test_mma_concurrent_tracks_stress`
- `test_visual_sim_mma_v2.py::test_mma_complete_lifecycle`
While simpler single-track live GUI tests pass, these concurrent execution tests time out at the 300s limit.
## Observations
1. Workers ARE running - both ticket-A-1 and ticket-B-1 appear in `active_streams` (confirmed via user screenshot)
2. The failure therefore lies in the test infrastructure's handling of concurrent MMA workloads, not in the workers themselves
3. Other live GUI tests work (visual_orchestration, mma_step_mode_sim, system_prompt_sim)
## Investigation Tasks
1. [ ] Compare `test_mma_concurrent_tracks_sim.py` with `test_visual_orchestration.py` to identify differences in test structure
2. [ ] Check if `live_gui` fixture cleanup between concurrent tests causes issues
3. [ ] Investigate if subprocess management from previous tests pollutes subsequent test runs
4. [ ] Analyze timing issues: the concurrent test polls at 1s intervals, but workers may complete faster
5. [ ] Verify `get_mma_workers()` API endpoint returns data in expected format
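Task 5 could start with a shape assertion the concurrent tests can share. The contract below is an assumption, not the confirmed API: `get_mma_workers()` is guessed to return a dict mapping worker id to an info dict with at least a `status` field.

```python
def validate_workers_payload(payload):
    """Assert the response shape the concurrent tests rely on.

    Hypothetical contract: get_mma_workers() is assumed to return
    {worker_id: {"status": ..., ...}}; adjust once the real payload
    is confirmed.
    """
    assert isinstance(payload, dict), f"expected dict, got {type(payload).__name__}"
    for worker_id, info in payload.items():
        assert isinstance(worker_id, str), f"non-string worker id: {worker_id!r}"
        assert isinstance(info, dict), f"{worker_id}: entry is not a dict"
        assert "status" in info, f"{worker_id}: missing 'status' field"
    return True
```

Calling this on every poll would distinguish hypothesis 2 (serialization mismatch) from the timing hypotheses in a single failing run.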
## Hypotheses
1. **Subprocess leakage**: Previous test's GUI process isn't fully killed before next test starts
2. **API response format**: `mma_streams` dict may serialize differently than expected by test
3. **Timing race**: Workers complete before the completion-check loop starts polling, so the test never observes a completed state
4. **Thread pool exhaustion**: Multiple concurrent tracks exhaust some shared resource
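If hypothesis 3 holds, the fix is likely a completion check that remembers terminal states across polls instead of requiring them to be visible at one instant. A sketch under assumed names: `fetch_workers` stands in for whatever wraps `get_mma_workers()`, and the status strings are guesses at the real values.

```python
import time


def wait_for_workers_done(fetch_workers, expected_ids, timeout=60.0, interval=1.0):
    """Poll until every expected worker has been observed finishing.

    Hypothetical sketch: fetch_workers() is assumed to return
    {worker_id: status_string}. Terminal states are accumulated across
    polls, and a worker that was active but later vanishes from the
    response is counted as finished, so fast completions between polls
    are not lost.
    """
    seen_active = set()
    seen_done = set()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        workers = fetch_workers()
        for worker_id, status in workers.items():
            if status in ("running", "active"):
                seen_active.add(worker_id)
            elif status in ("completed", "failed"):
                seen_done.add(worker_id)
        # Previously active workers missing from this poll finished in between.
        seen_done |= seen_active - set(workers)
        if seen_done >= set(expected_ids):
            return True
        time.sleep(interval)
    return False
```

This makes the completion check monotonic: once a worker has been seen in (or inferred to have reached) a terminal state, a later empty poll cannot un-complete it.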
## Success Criteria
- `test_mma_concurrent_tracks_execution` completes in <60s
- `test_mma_concurrent_tracks_stress` completes in <60s
- No other tests regress
## Approach
1. First, isolate whether the issue is in the test setup (fixture) or test logic
2. Add debugging output to identify exact failure point
3. Fix the root cause; this will likely mean making the test infrastructure more robust