conductor(track): Document real bug - self.engine gets overwritten
@@ -4,16 +4,16 @@
|
|||||||
- [x] Task: Compare test_mma_concurrent_tracks_sim.py with working tests - Same fixture, same pattern
|
- [x] Task: Compare test_mma_concurrent_tracks_sim.py with working tests - Same fixture, same pattern
|
||||||
- [x] Task: Check subprocess/port cleanup between live_gui fixture tests - Not cleanup issue, single test also times out
|
- [x] Task: Check subprocess/port cleanup between live_gui fixture tests - Not cleanup issue, single test also times out
|
||||||
- [x] Task: Verify get_mma_workers() API returns expected format - Returns {"workers": Dict[str, str]}
|
- [x] Task: Verify get_mma_workers() API returns expected format - Returns {"workers": Dict[str, str]}
|
||||||
- [x] Task: Run isolated concurrent test with verbose debugging - Workers ARE running (confirmed via user screenshot)
|
- [x] Task: Run isolated concurrent test with verbose debugging - Confirmed: only ONE worker appears when starting two tracks
|
||||||
|
|
||||||
## Phase 2: Identify Root Cause
|
## Phase 2: Identify Root Cause - CONFIRMED BUG
|
||||||
- [x] Task: Workers appear in active_streams but test times out - Workers run but completion detection fails
|
- [x] Task: Root cause found: `self.engine` is a single attribute that gets overwritten
|
||||||
- [x] Task: Issue is NOT fixture cleanup - Even isolated runs time out
|
- [x] Task: When a second track starts, it overwrites `self.engine`, orphaning the first track's engine
|
||||||
- [x] Task: Issue is likely in btn_mma_accept_tracks background thread - Accept tracks completes (proposed_tracks works), but something after that hangs
|
|
||||||
|
|
||||||
## Phase 3: Implement Fix
|
## Phase 3: Implement Fix - NEEDS REAL FIX, NOT TEST SIMPLIFICATION
|
||||||
- [ ] Task: Simplify test to avoid complex polling - Just verify workers appear, don't wait for completion
|
- [ ] Task: Change `self.engine: Optional[ConductorEngine]` to `self.engines: Dict[str, ConductorEngine]`
|
||||||
- [ ] Task: Or fix the underlying issue causing the hang
|
- [ ] Task: Update all `self.engine` references to track-specific lookup
|
||||||
|
- [ ] Task: Verify two concurrent tracks produce two concurrent workers
|
||||||
|
|
||||||
## Phase 4: Final Verification
|
## Phase 4: Final Verification
|
||||||
- [ ] Task: Run all concurrent MMA tests
|
- [ ] Task: Run all concurrent MMA tests
|
||||||
|
|||||||
@@ -1,37 +1,30 @@
|
|||||||
# Track: Fix Concurrent MMA Live GUI Tests
|
# Track: Fix Concurrent MMA Live GUI Tests
|
||||||
|
|
||||||
## Problem Statement
|
## Problem Statement
|
||||||
The live GUI integration tests for concurrent MMA track execution time out:
|
When starting two MMA tracks concurrently via `btn_mma_start_track`, only ONE worker appears instead of two. The test `test_mma_concurrent_tracks_stress_sim.py` correctly identifies this bug:
|
||||||
- `test_mma_concurrent_tracks_sim.py::test_mma_concurrent_tracks_execution`
|
- Two tracks are started
|
||||||
- `test_mma_concurrent_tracks_stress_sim.py::test_mma_concurrent_tracks_stress`
|
- But only ONE Tier 3 worker (`mock-ticket-1`) appears
|
||||||
- `test_visual_sim_mma_v2.py::test_mma_complete_lifecycle`
|
- The second worker never starts
|
||||||
|
|
||||||
While simpler single-track live GUI tests pass, these concurrent execution tests time out (300s).
|
## Root Cause Analysis
|
||||||
|
In `app_controller.py` `_cb_start_track()`:
|
||||||
|
```python
|
||||||
|
self.engine = multi_agent_conductor.ConductorEngine(...)
|
||||||
|
threading.Thread(target=self.engine.run, daemon=True).start()
|
||||||
|
```
|
||||||
|
|
||||||
## Observations
|
`self.engine` is a SINGLE attribute. Starting a second track **overwrites** it, orphaning the first track's engine: that engine's thread keeps running, but the controller no longer holds a reference to it.
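
The overwrite can be illustrated with a minimal, self-contained sketch. `FakeEngine`, `Controller`, and `start_track` are hypothetical stand-ins, not the real `ConductorEngine` or `app_controller.py` code. The sketch only shows the reference being lost; it does not reproduce the worker listing in the GUI.

```python
import threading
from typing import Optional


class FakeEngine:
    """Hypothetical stand-in for ConductorEngine; it only records its track id."""

    def __init__(self, track_id: str) -> None:
        self.track_id = track_id
        self.finished = threading.Event()

    def run(self) -> None:
        # The real engine would schedule workers here.
        self.finished.set()


class Controller:
    """Hypothetical controller that keeps a single engine attribute."""

    def __init__(self) -> None:
        self.engine: Optional[FakeEngine] = None

    def start_track(self, track_id: str) -> threading.Thread:
        # Each call replaces whatever engine was stored before.
        self.engine = FakeEngine(track_id)
        thread = threading.Thread(target=self.engine.run, daemon=True)
        thread.start()
        return thread


controller = Controller()
threads = [controller.start_track("track-A"), controller.start_track("track-B")]
for thread in threads:
    thread.join()

# Both engine threads ran, but only the last engine is still reachable.
reachable_track = controller.engine.track_id
```

After both calls, `controller.engine` refers only to the `track-B` engine, so any operation routed through `self.engine` can no longer reach `track-A`.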
|
||||||
1. Workers ARE running - both ticket-A-1 and ticket-B-1 appear in `active_streams` (confirmed via user screenshot)
|
|
||||||
2. The issue is in the test infrastructure's ability to handle concurrent MMA workloads
|
|
||||||
3. Other live GUI tests work (visual_orchestration, mma_step_mode_sim, system_prompt_sim)
|
|
||||||
|
|
||||||
## Investigation Tasks
|
## Required Fix
|
||||||
1. [ ] Compare `test_mma_concurrent_tracks_sim.py` with `test_visual_orchestration.py` to identify differences in test structure
|
1. Replace `self.engine: Optional[ConductorEngine]` with `self.engines: Dict[str, ConductorEngine]` - one engine per track
|
||||||
2. [ ] Check if `live_gui` fixture cleanup between concurrent tests causes issues
|
2. Update all references to `self.engine` to use track-specific engine lookup
|
||||||
3. [ ] Investigate if subprocess management from previous tests pollutes subsequent test runs
|
3. Update `kill_worker`, `pause_mma`, `resume_mma`, `approve_ticket`, `mutate_dag` to target specific track engines
|
||||||
4. [ ] Analyze timing issues - the concurrent test polls at 1s intervals but workers may complete faster
|
4. Ensure cleanup of engines when tracks complete
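
One possible shape for the steps above is sketched below, under stated assumptions: `FakeEngine`, `Controller`, `start_track`, `get_engine`, and the lock are hypothetical names for illustration, and the real `ConductorEngine` API may differ. The sketch keeps one engine per track id and removes the entry when the engine's `run` returns.

```python
import threading
from typing import Dict, Optional


class FakeEngine:
    """Hypothetical stand-in for ConductorEngine; blocks until released."""

    def __init__(self, track_id: str) -> None:
        self.track_id = track_id
        self.release = threading.Event()

    def run(self) -> None:
        # Simulates a track that stays active until told to finish.
        self.release.wait(timeout=5)


class Controller:
    """Hypothetical controller keeping one engine per track."""

    def __init__(self) -> None:
        self.engines: Dict[str, FakeEngine] = {}
        self._engines_lock = threading.Lock()

    def start_track(self, track_id: str) -> threading.Thread:
        engine = FakeEngine(track_id)
        with self._engines_lock:
            self.engines[track_id] = engine
        thread = threading.Thread(
            target=self._run_and_cleanup, args=(track_id, engine), daemon=True
        )
        thread.start()
        return thread

    def _run_and_cleanup(self, track_id: str, engine: FakeEngine) -> None:
        try:
            engine.run()
        finally:
            with self._engines_lock:
                # Remove the entry only if it still points at this engine.
                if self.engines.get(track_id) is engine:
                    del self.engines[track_id]

    def get_engine(self, track_id: str) -> Optional[FakeEngine]:
        # Track-specific lookup that callers such as kill_worker could use.
        with self._engines_lock:
            return self.engines.get(track_id)


controller = Controller()
thread_a = controller.start_track("track-A")
thread_b = controller.start_track("track-B")
registered = sorted(controller.engines)  # both engines are tracked while running

for track_id in ("track-A", "track-B"):
    controller.get_engine(track_id).release.set()
thread_a.join()
thread_b.join()
remaining = len(controller.engines)  # entries are removed once tracks complete
```

The lock is a design choice for this sketch: tracks start and finish on different threads, so the dictionary is mutated concurrently. Whether the real controller needs its own lock depends on how it already synchronizes state.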
|
||||||
5. [ ] Verify `get_mma_workers()` API endpoint returns data in expected format
|
|
||||||
|
|
||||||
## Hypotheses
|
## Impact
|
||||||
1. **Subprocess leakage**: Previous test's GUI process isn't fully killed before next test starts
|
This affects ALL concurrent MMA track execution - not just tests. Multiple tracks started simultaneously will interfere with each other.
|
||||||
2. **API response format**: `mma_streams` dict may serialize differently than expected by test
|
|
||||||
3. **Timing race**: Workers complete before completion check loop starts, making completion undetectable
|
|
||||||
4. **Thread pool exhaustion**: Multiple concurrent tracks exhaust some shared resource
|
|
||||||
|
|
||||||
## Success Criteria
|
## Success Criteria
|
||||||
- `test_mma_concurrent_tracks_execution` completes in <60s
|
- Two concurrent tracks produce two concurrent workers
|
||||||
- `test_mma_concurrent_tracks_stress` completes in <60s
|
- No regression in single-track execution
|
||||||
- No other tests regress
|
- All concurrent tests pass
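
The first criterion could be checked with a small polling helper, sketched here. It assumes only the `{"workers": Dict[str, str]}` response shape noted in the plan; `wait_for_worker_count`, the stub functions, the worker status strings, and the timeout values are hypothetical and not part of the existing test suite.

```python
import time
from typing import Callable, Dict


def wait_for_worker_count(
    get_mma_workers: Callable[[], Dict[str, Dict[str, str]]],
    expected: int,
    timeout_s: float = 60.0,
    poll_s: float = 0.1,
) -> bool:
    """Poll until at least `expected` workers are reported, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        workers = get_mma_workers().get("workers", {})
        if len(workers) >= expected:
            return True
        time.sleep(poll_s)
    return False


# Stub that reports one worker on the first call and two afterwards.
_calls = {"n": 0}


def stub_two_workers() -> Dict[str, Dict[str, str]]:
    _calls["n"] += 1
    workers = {"mock-ticket-1": "running"}
    if _calls["n"] > 1:
        workers["mock-ticket-2"] = "running"
    return {"workers": workers}


# Stub that mimics the bug: the second worker never appears.
def stub_one_worker() -> Dict[str, Dict[str, str]]:
    return {"workers": {"mock-ticket-1": "running"}}


both_seen = wait_for_worker_count(stub_two_workers, 2, timeout_s=2.0, poll_s=0.01)
bug_seen = wait_for_worker_count(stub_one_worker, 2, timeout_s=0.05, poll_s=0.01)
```

Here `both_seen` is `True` and `bug_seen` is `False`, so the helper distinguishes the fixed behavior from the current one without waiting for track completion.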
|
||||||
|
|
||||||
## Approach
|
|
||||||
1. First, isolate whether the issue is in the test setup (fixture) or test logic
|
|
||||||
2. Add debugging output to identify exact failure point
|
|
||||||
3. Fix the root cause - likely requires improving test infrastructure robustness
|
|
||||||