conductor(track): Update spec and plan with complete bug analysis
Document all 4 bugs found: 1. self.engine overwritten (FIXED) 2. active_tickets not initialized (FIXED) 3. _start_track_logic uses None.active_track (FIXED) 4. _cb_start_track overwrites active_track (PARTIALLY FIXED) Current status: Track B works, Track A reload path triggered but worker never visible.
This commit is contained in:
@@ -6,15 +6,37 @@
|
|||||||
- [x] Task: Verify get_mma_workers() API returns expected format - Returns {"workers": Dict[str, str]}
|
- [x] Task: Verify get_mma_workers() API returns expected format - Returns {"workers": Dict[str, str]}
|
||||||
- [x] Task: Run isolated concurrent test with verbose debugging - Confirmed: only ONE worker appears when starting two tracks
|
- [x] Task: Run isolated concurrent test with verbose debugging - Confirmed: only ONE worker appears when starting two tracks
|
||||||
|
|
||||||
## Phase 2: Identify Root Cause - CONFIRMED BUG
|
## Phase 2: Identify Root Cause - MULTIPLE BUGS FOUND
|
||||||
- [x] Task: Root cause found: `self.engine` is single attribute that gets overwritten
|
- [x] Bug 1: `self.engine` single reference overwritten (FIXED in ac0b564)
|
||||||
- [x] Task: When second track starts, it overwrites `self.engine`, orphaning first track's engine
|
- [x] Bug 2: `active_tickets` not initialized (FIXED in 6f2a2c2)
|
||||||
|
- [x] Bug 3: `_start_track_logic` uses `self.active_track.id` when None (FIXED in f3585cb)
|
||||||
|
- [x] Bug 4: `_cb_start_track` overwrites `active_track` (PARTIALLY FIXED in b0a837d)
|
||||||
|
|
||||||
## Phase 3: Implement Fix - NEEDS REAL FIX, NOT TEST SIMPLIFICATION
|
## Phase 3: Implement Fix
|
||||||
- [ ] Task: Change `self.engine: Optional[ConductorEngine]` to `self.engines: Dict[str, ConductorEngine]`
|
- [x] Task 3.1: Change `self.engine: Optional[ConductorEngine]` to `self.engines: Dict[str, ConductorEngine]`
|
||||||
- [ ] Task: Update all `self.engine` references to track-specific lookup
|
- [x] Task 3.2: Update all `self.engine` references to track-specific lookup
|
||||||
- [ ] Task: Verify two concurrent tracks produce two concurrent workers
|
- [x] Task 3.3: Fix `_start_track_logic` to use `track.id` instead of `self.active_track.id`
|
||||||
|
- [x] Task 3.4: Add `self.active_tickets = []` in `init_state()`
|
||||||
|
- [x] Task 3.5: Add explicit reload path in `_cb_start_track` when load fails
|
||||||
|
- [ ] Task 3.6: Verify `_cb_load_track` properly loads tracks created by `_cb_accept_tracks`
|
||||||
|
|
||||||
## Phase 4: Final Verification
|
## Phase 4: Debug Remaining Issue
|
||||||
- [ ] Task: Run all concurrent MMA tests
|
- [ ] Task 4.1: Investigate why reload path in `_cb_start_track` doesn't bring back Track A worker
|
||||||
- [ ] Task: Run full test suite to check for regressions
|
- [ ] Task 4.2: Add debug logging to `_cb_load_track` to trace failure
|
||||||
|
- [ ] Task 4.3: Verify track state is saved with correct ID when created by `_cb_accept_tracks`
|
||||||
|
- [ ] Task 4.4: Run test and verify both workers appear
|
||||||
|
|
||||||
|
## Phase 5: Final Verification
|
||||||
|
- [ ] Task 5.1: Run all concurrent MMA tests
|
||||||
|
- [ ] Task 5.2: Run full test suite to check for regressions
|
||||||
|
- [ ] Task 5.3: Remove debug logging and commit final fixes
|
||||||
|
|
||||||
|
## Commits So Far
|
||||||
|
- `ac0b564` - fix(mma): Change self.engine to self.engines dict for concurrent track support
|
||||||
|
- `6f2a2c2` - fix(gui): Initialize active_tickets in AppController.init_state
|
||||||
|
- `f3585cb` - fix(mma): Use track.id instead of self.active_track.id in _start_track_logic
|
||||||
|
- `b0a837d` - fix(mma): Add explicit reload logic when _cb_load_track fails in _cb_start_track
|
||||||
|
- `cab733a` - debug: Add logging to _cb_start_track
|
||||||
|
|
||||||
|
## Current Blocker
|
||||||
|
After commits 1-4, Track B's worker appears but Track A's never does. The reload path (b0a837d) is triggered for Track A but the worker doesn't become visible. Need to trace why `_cb_load_track` succeeds for Track B but fails for Track A, or why the engine thread doesn't produce visible output.
|
||||||
@@ -1,30 +1,73 @@
|
|||||||
# Track: Fix Concurrent MMA Live GUI Tests
|
# Track: Fix Concurrent MMA Live GUI Tests
|
||||||
|
|
||||||
## Problem Statement
|
## Problem Statement
|
||||||
When starting two MMA tracks concurrently via `btn_mma_start_track`, only ONE worker appears instead of two. The test `test_mma_concurrent_tracks_stress_sim.py` correctly identifies this bug:
|
When starting two MMA tracks concurrently via `btn_mma_start_track`, only ONE worker appears instead of two. The test `test_mma_concurrent_tracks_stress_sim.py` correctly identifies this bug.
|
||||||
- Two tracks are started
|
|
||||||
- But only ONE Tier 3 worker (`mock-ticket-1`) appears
|
|
||||||
- The second worker never starts
|
|
||||||
|
|
||||||
## Root Cause Analysis
|
## Bugs Found (Multiple Layers)
|
||||||
In `app_controller.py` `_cb_start_track()`:
|
|
||||||
```python
|
### Bug 1: `self.engine` Single Reference (FIXED - commit ac0b564)
|
||||||
self.engine = multi_agent_conductor.ConductorEngine(...)
|
**Root Cause:** `self.engine` was a single ConductorEngine attribute. When second track started, it overwrote `self.engine`, orphaning first track's engine.
|
||||||
threading.Thread(target=engine.run, daemon=True).start()
|
|
||||||
|
**Fix:** Changed to `self.engines: Dict[str, ConductorEngine]` keyed by track.id.
|
||||||
|
|
||||||
|
### Bug 2: `active_tickets` Not Initialized (FIXED - commit 6f2a2c2)
|
||||||
|
**Root Cause:** `AppController.init_state()` never initialized `self.active_tickets`. When GUI tried to render `_render_mma_dashboard()`, `__getattr__` delegation returned `controller.active_tickets` but it didn't exist, causing `AttributeError: 'App' object has no attribute 'active_tickets'`.
|
||||||
|
|
||||||
|
**Fix:** Added `self.active_tickets = []` in `init_state()`.
|
||||||
|
|
||||||
|
### Bug 3: `_start_track_logic` Uses `self.active_track.id` When None (FIXED - commit f3585cb)
|
||||||
|
**Root Cause:** `_cb_accept_tracks` calls `_start_track_logic` in a loop for each proposed track. `_start_track_logic` used `self.engines[self.active_track.id]` but `self.active_track` was None (only set by `_cb_load_track`, not by accept logic).
|
||||||
|
|
||||||
|
**Fix:** Changed to `self.engines[track.id]` using the locally created track object.
|
||||||
|
|
||||||
|
### Bug 4: `_cb_start_track` Overwrites `active_track` (PARTIALLY FIXED - commit b0a837d)
|
||||||
|
**Root Cause:** When `btn_mma_start_track(track_a_id)` is called:
|
||||||
|
1. `_cb_load_track(track_a_id)` is called - may fail if track not saved to disk
|
||||||
|
2. If fails, `self.active_track` remains None or pointing to wrong track
|
||||||
|
3. When `btn_mma_start_track(track_b_id)` is called next, it overwrites `self.active_track` to track B
|
||||||
|
4. Engine for track A never created properly
|
||||||
|
|
||||||
|
**Fix:** Added explicit reload path when loaded track doesn't match requested track.
|
||||||
|
|
||||||
|
## Current Status
|
||||||
|
After fixes 1-4, the log shows:
|
||||||
|
- Track A: `_cb_start_track` called, load fails, reload path triggered but track A worker never visible
|
||||||
|
- Track B: Works correctly, worker appears
|
||||||
|
- Test assertion: `seen_a=False, seen_b=True`
|
||||||
|
|
||||||
|
## Remaining Issue
|
||||||
|
The reload path in `_cb_start_track` (commit b0a837d) correctly identifies that load failed, but the reload itself may not be working. Need to verify:
|
||||||
|
1. `_cb_load_track(track_a_id)` is called with correct track ID
|
||||||
|
2. Track state is properly saved and can be loaded
|
||||||
|
3. `active_track` is correctly set after reload
|
||||||
|
|
||||||
|
## Architecture Notes
|
||||||
|
|
||||||
|
### API Flow for Starting Tracks
|
||||||
|
```
|
||||||
|
test clicks btn_mma_start_track(track_a_id)
|
||||||
|
→ api_hook_client.click('btn_mma_start_track', user_data=track_a_id)
|
||||||
|
→ POST /api/gui {"action": "click", "item": "btn_mma_start_track", "user_data": track_a_id}
|
||||||
|
→ _process_gui_task finds btn_mma_start_track in _clickable_actions
|
||||||
|
→ calls _cb_start_track(user_data=track_a_id)
|
||||||
|
→ calls _cb_load_track(track_id)
|
||||||
|
→ loads track state from disk
|
||||||
|
→ sets self.active_track
|
||||||
|
→ creates ConductorEngine
|
||||||
|
→ stores in self.engines[track_id]
|
||||||
|
→ starts engine thread
|
||||||
```
|
```
|
||||||
|
|
||||||
`self.engine` is a SINGLE attribute. When a second track is started, it **overwrites** `self.engine`, orphaning the first track's engine. The first track's engine thread continues running but loses its reference in `self.engine`.
|
### Key Files
|
||||||
|
- `src/app_controller.py` - Contains all the bugs and fixes
|
||||||
## Required Fix
|
- `src/multi_agent_conductor.py` - ConductorEngine, run loop
|
||||||
1. Replace `self.engine: Optional[ConductorEngine]` with `self.engines: Dict[str, ConductorEngine]` - one engine per track
|
- `src/api_hooks.py` - API endpoint routing
|
||||||
2. Update all references to `self.engine` to use track-specific engine lookup
|
- `src/gui_2.py` - UI rendering, `__getattr__` delegation
|
||||||
3. Update `kill_worker`, `pause_mma`, `resume_mma`, `approve_ticket`, `mutate_dag` to target specific track engines
|
- `tests/test_mma_concurrent_tracks_sim.py` - The failing test
|
||||||
4. Ensure cleanup of engines when tracks complete
|
|
||||||
|
|
||||||
## Impact
|
|
||||||
This affects ALL concurrent MMA track execution - not just tests. Multiple tracks started simultaneously will interfere with each other.
|
|
||||||
|
|
||||||
## Success Criteria
|
## Success Criteria
|
||||||
- Two concurrent tracks produce two concurrent workers
|
- [x] GUI starts without crash (active_tickets initialized)
|
||||||
- No regression in single-track execution
|
- [x] `self.engines` dict properly stores multiple concurrent engines
|
||||||
- All concurrent tests pass
|
- [ ] Two concurrent tracks produce two concurrent workers visible in `get_mma_workers()`
|
||||||
|
- [ ] No regression in single-track execution
|
||||||
|
- [ ] All concurrent tests pass
|
||||||
Reference in New Issue
Block a user