# Concurrent MMA Tracks Bug: Comprehensive Technical Report

**Track:** `fix_concurrent_mma_tests_20260507`

**Issue:** When two MMA tracks are started concurrently, only Track B's worker appears in `mma_streams`. Track A's worker never manifests despite its engine being created and stored in `self.engines[track.id]`.

**Status:** Active investigation with multiple root causes identified.

---

## Executive Summary
The concurrent MMA track execution bug is a multi-faceted race condition involving:

1. **Global `comms_log_callback` race** (FIXED) - thread-local storage now used
2. **`confirm_spawn` dialog container race** (PENDING) - per-ticket isolation needed
3. **Global `_gemini_cli_adapter` singleton** (PENDING) - per-call adapter or locking needed

---
## 1. Architecture Overview

### 1.1 MMA 4-Tier Architecture

```
Tier 1: Orchestrator (Strategic planning, epic → tracks)
        ↓
Tier 2: Tech Lead (Track execution, ticket generation) - src/conductor_tech_lead.py
        ↓
Tier 3: Worker (Stateless code implementation) - src/multi_agent_conductor.py:run_worker_lifecycle()
        ↓
Tier 4: QA (Error analysis, stateless)
```
### 1.2 Concurrent Execution Flow

```python
# Test flow:
client.click('btn_mma_start_track', user_data=track_a_id)  # Start Track A
time.sleep(0.5)                                            # 500 ms delay
client.click('btn_mma_start_track', user_data=track_b_id)  # Start Track B
```

---
## 2. State Management Architecture

### 2.1 Per-Track Engine Isolation

Each track gets its own `ConductorEngine`, stored in `app_controller.engines`:

```python
# src/app_controller.py:2920-2921
engine = multi_agent_conductor.ConductorEngine(
    self.active_track, self.event_queue, auto_queue=not self.mma_step_mode
)
self.engines[self.active_track.id] = engine  # Per-track engine storage
```
**State ownership:**

| Component | Location | Scope |
|-----------|----------|-------|
| `app_controller.engines` | `Dict[track_id → Engine]` | Shared, per-track |
| `app_controller.active_track` | Single `Track` object | Current loaded track |
| `app_controller.mma_streams` | `Dict[stream_id → text]` | Shared, all tracks |
| `engine._active_workers` | `Dict[ticket_id → Thread]` | Per-engine |
| `engine._abort_events` | `Dict[ticket_id → Event]` | Per-engine |
| `engine.dag` | `TrackDAG` with ticket statuses | Per-engine |
| `engine.pool` | `WorkerPool` with concurrency | Per-engine |
### 2.2 Thread-Local Callback Storage (FIXED)

**Original (BUGGY):**

```python
# src/ai_client.py:103 (GLOBAL - thread-unsafe)
comms_log_callback: Optional[Callable[[dict[str, Any]], None]] = None
```

**Fixed (THREAD-SAFE):**

```python
# src/ai_client.py:121-127
def get_comms_log_callback() -> Optional[Callable[[dict[str, Any]], None]]:
    """Returns the comms log callback from thread-local storage."""
    return getattr(_local_storage, "comms_log_callback", None)


def set_comms_log_callback(cb: Optional[Callable[[dict[str, Any]], None]]) -> None:
    """Sets the comms log callback in thread-local storage."""
    _local_storage.comms_log_callback = cb
```
### 2.3 Worker Lifecycle State Flow

```python
# src/multi_agent_conductor.py:run_worker_lifecycle()

def run_worker_lifecycle(ticket, context, context_files, event_queue, engine, md_content):
    # 1. Set the thread-local callback for THIS worker
    old_comms_cb = ai_client.get_comms_log_callback()
    ai_client.set_comms_log_callback(worker_comms_callback)  # Thread-local

    # 2. Set the current tier (ALSO thread-local)
    ai_client.set_current_tier(f"Tier 3 (Worker): {ticket.id}")

    try:
        # 3. Call ai_client.send() - THIS is where stream_callback fires
        response = ai_client.send(
            md_content=md_content,
            user_message=user_message,
            stream_callback=stream_callback  # Created INSIDE run_worker_lifecycle
        )
    finally:
        # 4. Restore the previous callback and tier
        ai_client.set_comms_log_callback(old_comms_cb)
        ai_client.set_current_tier(None)
```

---
## 3. Root Causes Identified

### 3.1 Root Cause #1: Global `comms_log_callback` (FIXED)

**Problem:** The `comms_log_callback` was a module-level global variable. When two workers ran concurrently:

```python
# Timeline (BEFORE FIX):
T1: Worker A sets ai_client.comms_log_callback = callback_A
T2: Worker A releases _send_lock while waiting for its subprocess
T3: Worker B sets ai_client.comms_log_callback = callback_B  # OVERWRITES!
T4: Worker A's subprocess returns → uses callback_B (WRONG!)
```
**Fix Applied:**
- Changed `comms_log_callback` to use thread-local storage via `_local_storage.comms_log_callback`
- Each worker thread now has isolated callback storage
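
The effect of the fix can be checked with a self-contained sketch. The `_local_storage` name and the getter/setter mirror the report; the worker function and the barrier are illustrative, used only to force both threads to "overwrite" the callback before either reads it back (the old race window):

```python
import threading

# Stand-in for the module-level thread-local storage from src/ai_client.py.
_local_storage = threading.local()

def set_comms_log_callback(cb):
    _local_storage.comms_log_callback = cb

def get_comms_log_callback():
    return getattr(_local_storage, "comms_log_callback", None)

results = {}
barrier = threading.Barrier(2)

def worker(name):
    # Each thread installs its own callback...
    set_comms_log_callback(lambda entry, n=name: n)
    # ...then waits so both threads have set a callback before either reads.
    barrier.wait()
    cb = get_comms_log_callback()
    results[name] = cb(None)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With thread-local storage, each worker sees its own callback.
assert results == {"A": "A", "B": "B"}
```

With the old global variable, the second `set` would have clobbered the first and both reads would have returned `"B"`.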

### 3.2 Root Cause #2: `confirm_spawn` Dialog Container Race (PENDING)

**Problem:** The `confirm_spawn` function uses a single-element list as a dialog container:

```python
# src/multi_agent_conductor.py:358-372
def confirm_spawn(role, prompt, context_md, event_queue, ticket_id):
    dialog_container = [None]  # New list per call, but the GUI has no per-ticket routing
    task = {
        "action": "mma_spawn_approval",
        "ticket_id": ticket_id,
        "dialog_container": dialog_container
    }
    _queue_put(event_queue, "mma_spawn_approval", task)

    # Worker blocks here until the GUI fills the container (or 60 s elapse)
    while dialog_container[0] is None and time.time() - start < 60:
        time.sleep(0.1)
    res = dialog_container[0].wait()  # AttributeError if still None after a timeout
```
**Race Condition:**
1. Track A's worker calls `confirm_spawn` → `dialog_container_A = [None]`
2. Track B's worker calls `confirm_spawn` → `dialog_container_B = [None]`
3. Both tasks queue to `_pending_gui_tasks`
4. GUI processes Track B's dialog first → sets `dialog_container_B[0]`
5. Track A times out waiting for `dialog_container_A[0]` to be set
6. **Track A's worker returns `"BLOCKED: Spawn rejected by user."` and NEVER calls `ai_client.send()`**

**Impact:** If `confirm_spawn` times out, the worker never reaches `ai_client.send()`, so no `stream_callback` is created, and the worker never appears in `mma_streams`.
### 3.3 Root Cause #3: Global Singleton `_gemini_cli_adapter` (PENDING)

**Problem:** The `GeminiCliAdapter` is a global singleton:

```python
# src/ai_client.py:97, 1192-1194
_gemini_cli_adapter: Optional[GeminiCliAdapter] = None

def _send_gemini_cli(...):
    global _gemini_cli_adapter
    if _gemini_cli_adapter is None:
        _gemini_cli_adapter = GeminiCliAdapter(binary_path="gemini")
```

When two workers use the `gemini_cli` provider concurrently:
- Both call `adapter.send()` on the **same adapter instance**
- The adapter has `session_id` and `last_usage` instance state
- Sessions may interfere or overwrite each other

---
## 4. The `stream_callback` Path

### 4.1 Stream Callback Creation

The `stream_callback` is created **inside `run_worker_lifecycle`** and captures `ticket.id`:

```python
# src/multi_agent_conductor.py:528-530
def stream_callback(chunk: str) -> None:
    if event_queue:
        _queue_put(event_queue, 'mma_stream', {
            'stream_id': f'Tier 3 (Worker): {ticket.id}',  # Captures ticket.id!
            'text': chunk
        })
```
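
Because each callback is a fresh closure, two concurrent workers cannot cross-contaminate stream IDs. A sketch, where a `queue.Queue` stands in for the event queue and a hypothetical `make_stream_callback` factory plays the role of `run_worker_lifecycle` creating the callback:

```python
import queue

event_queue = queue.Queue()

def make_stream_callback(ticket_id):
    # Mirrors the closure created inside run_worker_lifecycle:
    # every call builds a fresh function that captures its own ticket_id.
    def stream_callback(chunk):
        event_queue.put(('mma_stream', {
            'stream_id': f'Tier 3 (Worker): {ticket_id}',
            'text': chunk,
        }))
    return stream_callback

cb_a = make_stream_callback('ticket-a-1')
cb_b = make_stream_callback('ticket-b-1')
cb_a('chunk from A')
cb_b('chunk from B')

# Each event carries the stream_id its closure captured at creation time.
events = [event_queue.get_nowait() for _ in range(2)]
stream_ids = [payload['stream_id'] for _name, payload in events]
assert stream_ids == ['Tier 3 (Worker): ticket-a-1', 'Tier 3 (Worker): ticket-b-1']
```

This is why the missing Track A worker points at code *before* the callback (e.g. `confirm_spawn`), not at the callback itself.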

### 4.2 Stream Callback Flow

```
Worker Thread A / Worker Thread B (concurrent):

A1. create stream_callback(ticket_id="ticket-a-1")
B1. create stream_callback(ticket_id="ticket-b-1")
A2. ai_client.send(..., stream_callback=stream_callback_A)
B2. ai_client.send(..., stream_callback=stream_callback_B)
A3. _send_gemini_cli() calls adapter.send(..., stream_callback=stream_callback_A)
B3. _send_gemini_cli() calls adapter.send(..., stream_callback=stream_callback_B)
A4. adapter.send() fires chunks → stream_callback_A(chunk)
B4. adapter.send() fires chunks → stream_callback_B(chunk)
A5. stream_callback_A does _queue_put(event_queue, 'mma_stream', {'stream_id': 'Tier 3 (Worker): ticket-a-1', ...})
B5. stream_callback_B does _queue_put(event_queue, 'mma_stream', {'stream_id': 'Tier 3 (Worker): ticket-b-1', ...})
```

### 4.3 mma_streams Population

```python
# src/app_controller.py:755-768
def _handle_mma_stream(self, payload):
    text = payload.get("text", "")
    stream_id = payload.get("stream_id")  # e.g., "Tier 3 (Worker): ticket-a-1"
    is_streaming = payload.get("status") == "streaming..."
    if stream_id:
        if is_streaming:
            if stream_id not in self.mma_streams:
                self.mma_streams[stream_id] = ""
            self.mma_streams[stream_id] += text  # ACCUMULATES chunks
        else:
            self.mma_streams[stream_id] = text  # Final response
```
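
The accumulate-then-replace behavior can be exercised with a stripped-down stand-in for `_handle_mma_stream`, where a plain dict replaces `self.mma_streams`:

```python
mma_streams = {}

def handle_mma_stream(payload):
    # Stripped-down version of AppController._handle_mma_stream.
    text = payload.get("text", "")
    stream_id = payload.get("stream_id")
    is_streaming = payload.get("status") == "streaming..."
    if stream_id:
        if is_streaming:
            mma_streams.setdefault(stream_id, "")
            mma_streams[stream_id] += text   # accumulate streaming chunks
        else:
            mma_streams[stream_id] = text    # final response replaces the buffer

sid = "Tier 3 (Worker): ticket-a-1"
handle_mma_stream({"stream_id": sid, "text": "Hel", "status": "streaming..."})
handle_mma_stream({"stream_id": sid, "text": "lo", "status": "streaming..."})
assert mma_streams[sid] == "Hello"          # chunks accumulated

handle_mma_stream({"stream_id": sid, "text": "Done."})
assert mma_streams[sid] == "Done."          # final payload replaced the buffer
```

Note that a stream key appears only once the first event for it arrives; a worker that never streams never creates a key, which is exactly the observed symptom for Track A.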

### 4.4 get_mma_workers API

```python
# src/api_hooks.py:250-255
elif self.path == "/api/mma/workers":
    mma_streams = _get_app_attr(app, "mma_streams", {})
    self.wfile.write(json.dumps({"workers": _serialize_for_api(mma_streams)}).encode("utf-8"))
```

The test polls `get_mma_workers()` and checks whether `"ticket-A-1"` or `"ticket-B-1"` appears in the stream IDs.

---
## 5. Why Only Track B Appears

### 5.1 Hypothesis: `confirm_spawn` Timeout

If Track A's `confirm_spawn` times out:
1. The worker returns `"BLOCKED: Spawn rejected by user."`
2. `ai_client.send()` is NEVER called
3. `stream_callback` is NEVER created
4. No `mma_stream` events are ever queued
5. `"ticket-a-1"` never appears in `mma_streams`

### 5.2 Why B "Stays Longer"

Track B's dialog is processed first (due to queue ordering), so:
1. Track B's `confirm_spawn` succeeds
2. Track B calls `ai_client.send()`
3. Track B's `stream_callback` fires
4. Track B appears in `mma_streams`
5. Track A is still waiting for its dialog (or has timed out)

---
## 6. Detailed Fix Strategy

### 6.1 Fix #1: Thread-Local Callbacks (IMPLEMENTED ✓)

**Status:** DONE

**Files Modified:**
- `src/ai_client.py:103-127` - Added `get_comms_log_callback()` and `set_comms_log_callback()` using `_local_storage`
- `src/ai_client.py:215-217` - Changed `_append_comms()` to use the thread-local callback
- `src/ai_client.py:1238-1249` - Changed direct callback invocation to use the thread-local getter
- `src/multi_agent_conductor.py:532,551,565` - Changed to use `get_comms_log_callback()` and `set_comms_log_callback()`
- `src/app_controller.py:1522` - Changed to use `set_comms_log_callback()`

### 6.2 Fix #2: Per-Ticket Dialog Containers (PENDING)

**Required Change:** Make `confirm_spawn` dialog containers ticket-specific:

```python
# Option A: use a threading.Event per ticket (sketch)
_ticket_events: Dict[str, threading.Event] = {}

def confirm_spawn(ticket_id, ...):
    event = threading.Event()
    event.result = None  # the GUI attaches the user's decision here
    _ticket_events[ticket_id] = event
    task = {"action": "mma_spawn_approval", "ticket_id": ticket_id, ...}
    _queue_put(event_queue, "mma_spawn_approval", task)
    if not event.wait(timeout=60):
        _ticket_events.pop(ticket_id, None)
        return "BLOCKED: Spawn rejected by user."
    return _ticket_events.pop(ticket_id).result

# GUI handler sets:
_ticket_events[ticket_id].result = (approved, prompt, context)
_ticket_events[ticket_id].set()
```
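
A runnable version of this pattern, with a thread standing in for the GUI approval handler. The `DialogResult` wrapper, the timings, and the resolver thread are illustrative, not the project's actual API; the point is that two concurrent tickets each get their own event and their own result:

```python
import threading

class DialogResult:
    """Pairs an Event with the value the GUI will attach before setting it."""
    def __init__(self):
        self.event = threading.Event()
        self.value = None

_ticket_dialogs = {}
_dialogs_lock = threading.Lock()

def confirm_spawn(ticket_id, timeout=5.0):
    dialog = DialogResult()
    with _dialogs_lock:
        _ticket_dialogs[ticket_id] = dialog
    # (real code would queue an "mma_spawn_approval" task here)
    if not dialog.event.wait(timeout):
        return "BLOCKED: Spawn rejected by user."
    with _dialogs_lock:
        _ticket_dialogs.pop(ticket_id, None)
    return dialog.value

def gui_resolver():
    # Stand-in for the GUI: approve both tickets, in any order.
    for tid in ("ticket-b-1", "ticket-a-1"):
        with _dialogs_lock:
            dialog = _ticket_dialogs.get(tid)
        if dialog:
            dialog.value = ("approved", tid)
            dialog.event.set()

results = {}
workers = [threading.Thread(target=lambda t=tid: results.update({t: confirm_spawn(t)}))
           for tid in ("ticket-a-1", "ticket-b-1")]
for w in workers:
    w.start()
threading.Timer(0.2, gui_resolver).start()  # GUI "responds" after both dialogs queue
for w in workers:
    w.join()

# Both workers receive their own result; neither times out.
assert results["ticket-a-1"] == ("approved", "ticket-a-1")
assert results["ticket-b-1"] == ("approved", "ticket-b-1")
```

The resolver deliberately approves B first to mimic the queue ordering described in §3.2; with per-ticket events, A still gets its own answer.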

### 6.3 Fix #3: Per-Call Gemini CLI Adapter (PENDING)

**Option A:** Create a new adapter per send call
```python
def _send_gemini_cli(..., adapter=None):
    if adapter is None:
        adapter = GeminiCliAdapter(binary_path="gemini")
```

**Option B:** Add locking around `adapter.send()`
```python
_gemini_cli_lock = threading.Lock()

def _send_gemini_cli(...):
    global _gemini_cli_adapter
    with _gemini_cli_lock:
        if _gemini_cli_adapter is None:
            _gemini_cli_adapter = GeminiCliAdapter(...)
        return _gemini_cli_adapter.send(...)
```
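
The lock in Option B also matters for the lazy initialization itself: without it, two threads can both observe `None` and construct two adapters. A minimal demonstration of the unlocked hazard, with a counting stand-in for the adapter class (the barrier and sleep are only there to make the race deterministic):

```python
import threading
import time

class FakeAdapter:
    """Stand-in for GeminiCliAdapter; counts constructions."""
    instances = 0
    def __init__(self):
        FakeAdapter.instances += 1

_adapter = None
_barrier = threading.Barrier(2)

def get_adapter_unlocked():
    global _adapter
    _barrier.wait()              # both threads reach the None-check together
    if _adapter is None:         # both observe None...
        time.sleep(0.1)          # ...widen the check-then-create window...
        _adapter = FakeAdapter() # ...and both construct an adapter

threads = [threading.Thread(target=get_adapter_unlocked) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Without a lock, lazy initialization ran twice.
assert FakeAdapter.instances == 2
```

With the `with _gemini_cli_lock:` block from Option B, only one thread would pass the `None` check and `instances` would be 1; the trade-off is that the lock also serializes `send()` calls, which Option A avoids.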

---

## 7. Test Verification Strategy

### 7.1 Current Test

```python
# tests/test_mma_concurrent_tracks_sim.py

def test_mma_concurrent_tracks_execution(live_gui):
    # 1. Setup mock provider
    client.set_value('current_provider', 'gemini_cli')
    client.set_value('gcli_path', f'"{sys.executable}" "{mock_path}"')

    # 2. Plan epic → generates Track A and Track B
    client.set_value('mma_epic_input', 'PATH: Epic Initialization')
    client.click('btn_mma_plan_epic')
    # Poll until len(proposed_tracks) >= 2

    # 3. Accept tracks
    client.click('btn_mma_accept_tracks')
    # Poll until len(tracks) >= 2

    # 4. Start Track A, then Track B
    client.click('btn_mma_start_track', user_data=track_a_id)
    time.sleep(0.5)
    client.click('btn_mma_start_track', user_data=track_b_id)

    # 5. Poll get_mma_workers() looking for both ticket-a-1 and ticket-b-1
    seen_a = seen_b = False
    for _ in range(40):
        workers = client.get_mma_workers().get('workers', {})
        stream_ids = workers.keys()
        if any("ticket-A-1" in sid for sid in stream_ids):
            seen_a = True
        if any("ticket-B-1" in sid for sid in stream_ids):
            seen_b = True
        if seen_a and seen_b:
            break
        time.sleep(1)

    assert seen_a, "Worker from Track A never appeared"
    assert seen_b, "Worker from Track B never appeared"
```
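
The polling loop could be factored into a small helper to keep the test readable. A generic sketch, not an existing utility in the repo:

```python
import time

def poll_until(predicate, timeout=40.0, interval=1.0):
    """Call predicate() until it returns truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# Example against a plain dict standing in for get_mma_workers()['workers']:
workers = {
    "Tier 3 (Worker): ticket-A-1": "...",
    "Tier 3 (Worker): ticket-B-1": "...",
}

def both_tracks_streaming():
    sids = list(workers)
    return (any("ticket-A-1" in s for s in sids)
            and any("ticket-B-1" in s for s in sids))

ok = poll_until(both_tracks_streaming, timeout=2.0, interval=0.1)
assert ok
```

In the real test, `both_tracks_streaming` would re-fetch `client.get_mma_workers()` on every call instead of reading a static dict.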

### 7.2 Debug Points

Add debug logging at:
1. `_cb_start_track` entry/exit
2. `engine.run()` loop iteration
3. `confirm_spawn` call and timeout
4. `stream_callback` invocation
5. `_handle_mma_stream` entry

---
## 8. Related Files Reference

| File | Key Functions/Classes | Relevance |
|------|----------------------|-----------|
| `src/app_controller.py` | `_cb_start_track`, `_cb_load_track`, `engines`, `mma_streams`, `_handle_mma_stream` | Central coordinator |
| `src/multi_agent_conductor.py` | `run_worker_lifecycle`, `confirm_spawn`, `stream_callback`, `WorkerPool` | Worker execution |
| `src/ai_client.py` | `send`, `_send_gemini_cli`, `comms_log_callback`, `_append_comms` | AI client, callback dispatch |
| `src/project_manager.py` | `flat_config`, `load_track_state`, `save_track_state` | Project/track persistence |
| `src/aggregate.py` | `run()` | Context aggregation |
| `src/api_hooks.py` | `HookServer`, `/api/mma/workers` | Test interface |
| `src/api_hook_client.py` | `get_mma_workers()` | Test client |
| `src/gemini_cli_adapter.py` | `GeminiCliAdapter.send()` | CLI subprocess adapter |
| `tests/mock_concurrent_mma.py` | `main()` | Mock provider for tests |

---
## 9. Commit History for This Track

| Commit | Description |
|--------|-------------|
| `ac0b564` | fix(mma): Change self.engine to self.engines dict for concurrent track support |
| `6f2a2c2` | fix(gui): Initialize active_tickets in AppController.init_state |
| `f3585cb` | fix(mma): Use track.id instead of self.active_track.id in _start_track_logic |
| `b0a837d` | fix(mma): Add explicit reload logic when _cb_load_track fails in _cb_start_track |
| `cab733a` | debug: Add logging to _cb_start_track |
| `b44ee29` | conductor(plan): Mark task complete |
| `6f2f539` | conductor(track): Update spec and plan with complete bug analysis |

---
## 10. Open Questions

1. **Is `confirm_spawn` actually timing out for Track A?** Need to add logging to confirm.
2. **Is the Gemini CLI adapter causing session interference?** Need to verify with per-call adapters.
3. **Is `auto_queue=True` causing immediate execution before the dialog can be shown?** The `auto_queue` parameter bypasses step mode but still calls `confirm_spawn` for the HITL clutch.

---
## 11. Recommended Immediate Fixes

### Priority 1: Fix `confirm_spawn` Dialog Container
Make dialog containers ticket-specific using a dictionary keyed by `ticket_id` instead of a shared list.

### Priority 2: Add Per-Call Adapter Lock
Add a lock around `_gemini_cli_adapter.send()` to prevent session interference.

### Priority 3: Add Comprehensive Debug Logging
Log every major lifecycle event in both worker threads and the GUI to trace exactly where Track A diverges from Track B.

---
## 12. Phase Checklist

### Phase 1: Investigate Test Infrastructure
- [x] Task: Compare test_mma_concurrent_tracks_sim.py with working tests
- [x] Task: Check subprocess/port cleanup between live_gui fixture tests
- [x] Task: Verify get_mma_workers() API returns expected format
- [x] Task: Run isolated concurrent test with verbose debugging

### Phase 2: Identify Root Cause - MULTIPLE BUGS FOUND
- [x] Bug 1: `self.engine` single reference overwritten (FIXED in ac0b564)
- [x] Bug 2: `active_tickets` not initialized (FIXED in 6f2a2c2)
- [x] Bug 3: `_start_track_logic` uses `self.active_track.id` when None (FIXED in f3585cb)
- [x] Bug 4: `_cb_start_track` overwrites `active_track` (PARTIALLY FIXED in b0a837d)

### Phase 3: Implement Fix
- [x] Task 3.1: Change `self.engine: Optional[ConductorEngine]` to `self.engines: Dict[str, ConductorEngine]`
- [x] Task 3.2: Update all `self.engine` references to track-specific lookup
- [x] Task 3.3: Fix `_start_track_logic` to use `track.id` instead of `self.active_track.id`
- [x] Task 3.4: Add `self.active_tickets = []` in `init_state()`
- [x] Task 3.5: Add explicit reload path in `_cb_start_track` when load fails
- [ ] Task 3.6: Verify `_cb_load_track` properly loads tracks created by `_cb_accept_tracks`

### Phase 4: Fix Thread-Local Callback (IMPLEMENTED)
- [x] Task 4.1: Make comms_log_callback thread-local via _local_storage
- [x] Task 4.2: Update _append_comms to use thread-local getter
- [x] Task 4.3: Update direct callback invocations to use thread-local getter
- [x] Task 4.4: Update run_worker_lifecycle to use set_comms_log_callback

### Phase 5: Fix confirm_spawn Dialog (PENDING)
- [ ] Task 5.1: Make dialog_container per-ticket using ticket_id key
- [ ] Task 5.2: Update GUI handler to set correct dialog_container

### Phase 6: Fix Gemini CLI Adapter (PENDING)
- [ ] Task 6.1: Add lock around _gemini_cli_adapter.send() OR
- [ ] Task 6.2: Create per-call adapter

### Phase 7: Final Verification
- [ ] Task 7.1: Run concurrent MMA tests
- [ ] Task 7.2: Run full test suite to check for regressions
- [ ] Task 7.3: Remove debug logging and commit final fixes

---
*This report was generated during the fix_concurrent_mma_tests_20260507 track investigation.*

*Last updated: 2026-05-07*