feat(track): Add fix_concurrent_mma_tests_20260507 track
@@ -163,6 +163,12 @@ This file tracks all major tracks for the project. Each track has its own detail
*Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
*Goal: Verify and optimize caching strategies across all providers. Implement 4-breakpoint hierarchy for Anthropic, prefix stabilization for OpenAI/DeepSeek, and hybrid explicit/implicit caching for Gemini. Add GUI hit rate metrics.*

### Testing & Quality

1. [~] **Track: Fix Concurrent MMA Live GUI Tests**
*Link: [./tracks/fix_concurrent_mma_tests_20260507/](./tracks/fix_concurrent_mma_tests_20260507/)*
*Goal: Fix timeouts in the concurrent MMA track execution tests (test_mma_concurrent_tracks_sim.py, test_mma_concurrent_tracks_stress_sim.py, test_visual_sim_mma_v2.py). Workers run correctly, but the tests time out due to test infrastructure issues.*

---

## Phase 3: Future Horizons
@@ -0,0 +1,10 @@
{
  "id": "fix_concurrent_mma_tests_20260507",
  "title": "Fix Concurrent MMA Live GUI Tests",
  "created": "2026-05-07",
  "status": "in_progress",
  "priority": "high",
  "estimate": "medium",
  "tags": ["testing", "mma", "concurrency", "integration-tests"],
  "depends_on": []
}
@@ -0,0 +1,19 @@
# Implementation Plan: Fix Concurrent MMA Live GUI Tests

## Phase 1: Investigate Test Infrastructure
- [ ] Task: Compare test_mma_concurrent_tracks_sim.py with working tests (e.g., test_visual_orchestration.py)
- [ ] Task: Check subprocess/port cleanup between live_gui fixture tests
- [ ] Task: Verify get_mma_workers() API returns expected format
- [ ] Task: Run isolated concurrent test with verbose debugging

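For the subprocess/port cleanup task, a minimal sketch of the two checks such a fixture teardown might perform: confirm the GUI process has actually exited, and confirm nothing still listens on its port. The helper names `port_is_free` and `stop_process` are hypothetical, and how the `live_gui` fixture launches its process or which port it uses is not stated in this track, so treat both as assumptions.

```python
import socket
import subprocess


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def stop_process(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """Terminate proc, escalating to kill if it ignores the request."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

Asserting `port_is_free(...)` at the start of the next test would show whether a leftover process from the previous test is still bound. Note that this sketch only stops the direct child; any grandchildren spawned by the GUI would need separate handling.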
## Phase 2: Identify Root Cause
- [ ] Task: Determine if issue is fixture cleanup, API response, or timing
- [ ] Task: Document exact failure point

## Phase 3: Implement Fix
- [ ] Task: Fix identified root cause
- [ ] Task: Verify fix doesn't break other tests

## Phase 4: Final Verification
- [ ] Task: Run all concurrent MMA tests
- [ ] Task: Run full test suite to check for regressions
@@ -0,0 +1,37 @@
# Track: Fix Concurrent MMA Live GUI Tests

## Problem Statement
The following live GUI integration tests for concurrent MMA track execution time out:
- `test_mma_concurrent_tracks_sim.py::test_mma_concurrent_tracks_execution`
- `test_mma_concurrent_tracks_stress_sim.py::test_mma_concurrent_tracks_stress`
- `test_visual_sim_mma_v2.py::test_mma_complete_lifecycle`

While simpler single-track live GUI tests pass, these concurrent execution tests time out (300s).

## Observations
1. Workers ARE running - both ticket-A-1 and ticket-B-1 appear in `active_streams` (confirmed via user screenshot)
2. The issue is in the test infrastructure's ability to handle concurrent MMA workloads
3. Other live GUI tests work (visual_orchestration, mma_step_mode_sim, system_prompt_sim)

## Investigation Tasks
1. [ ] Compare `test_mma_concurrent_tracks_sim.py` with `test_visual_orchestration.py` to identify differences in test structure
2. [ ] Check if `live_gui` fixture cleanup between concurrent tests causes issues
3. [ ] Investigate if subprocess management from previous tests pollutes subsequent test runs
4. [ ] Analyze timing issues - the concurrent test polls at 1s intervals but workers may complete faster
5. [ ] Verify `get_mma_workers()` API endpoint returns data in expected format

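For the response-format task, one way to make a shape mismatch fail loudly instead of silently timing out is to normalize the payload before the test inspects it. The sketch below is illustrative only: the real return shape of `get_mma_workers()` is not documented in this track, so both payload shapes handled here are assumptions, and `extract_stream_ids` is a hypothetical helper.

```python
from typing import Any


def extract_stream_ids(payload: Any) -> set[str]:
    """Normalize a worker payload to a set of stream ids.

    Assumed shapes (hypothetical, not confirmed by the project):
      {"active_streams": {"ticket-A-1": {...}}}   mapping keyed by id
      {"active_streams": ["ticket-A-1", ...]}     list of ids
    """
    if not isinstance(payload, dict):
        raise TypeError(f"expected dict payload, got {type(payload).__name__}")
    streams = payload.get("active_streams")
    if isinstance(streams, dict):
        return {str(key) for key in streams}
    if isinstance(streams, (list, tuple)):
        return {str(item) for item in streams}
    # Anything else is a format the test does not understand: fail fast.
    raise ValueError(f"unexpected 'active_streams' value: {streams!r}")
```

With a check like this, a serialization difference surfaces as an immediate `ValueError` or `TypeError` with the offending value in the message, rather than as a 300s timeout.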
## Hypotheses
1. **Subprocess leakage**: Previous test's GUI process isn't fully killed before next test starts
2. **API response format**: `mma_streams` dict may serialize differently than expected by test
3. **Timing race**: Workers complete before completion check loop starts, making completion undetectable
4. **Thread pool exhaustion**: Multiple concurrent tracks exhaust some shared resource

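The timing-race hypothesis can be probed with a polling loop that records which workers were ever observed, so a worker that finished before the first poll is reported differently from one that is still running at the deadline. This is a sketch under assumptions: `wait_for_completion` and its `get_active` callback are hypothetical names, and the real test presumably reads worker state through the project's own API client.

```python
import time
from typing import Callable, Iterable


def wait_for_completion(
    get_active: Callable[[], Iterable[str]],
    expected: set[str],
    timeout_s: float = 60.0,
    interval_s: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Poll until every expected worker has been seen and then disappeared.

    Returns a status per worker: "completed", "running", or "never_seen".
    """
    seen: set[str] = set()
    deadline = clock() + timeout_s
    while True:
        active = set(get_active())
        seen |= active & expected
        # A worker counts as done once it was observed and is no longer active.
        done = {worker for worker in seen if worker not in active}
        if done == expected or clock() >= deadline:
            break
        sleep(interval_s)
    status: dict[str, str] = {}
    for worker in expected:
        if worker in done:
            status[worker] = "completed"
        elif worker in seen:
            status[worker] = "running"
        else:
            status[worker] = "never_seen"
    return status
```

A `"never_seen"` result at the deadline would be consistent with hypothesis 3 (the worker finished before polling began) or with the worker never starting, while `"running"` would point away from the race and toward a stuck worker or a stale API response. The shorter default interval is a design choice to narrow the window in which a fast worker can start and finish unobserved.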
## Success Criteria
- `test_mma_concurrent_tracks_execution` completes in <60s
- `test_mma_concurrent_tracks_stress` completes in <60s
- No other tests regress

## Approach
1. First, isolate whether the issue is in the test setup (fixture) or test logic
2. Add debugging output to identify exact failure point
3. Fix the root cause - likely requires improving test infrastructure robustness