docs update (wip)
@@ -138,6 +138,31 @@ class ExecutionEngine:

---

## WorkerPool (`multi_agent_conductor.py`)

Bounded concurrent worker pool with semaphore gating.
```python
class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)
```

**Key Methods:**
- `spawn(ticket_id, target, args)` — Spawns a worker thread if the pool has capacity. Returns `None` if full.
- `join_all(timeout)` — Waits for all active workers to complete.
- `get_active_count()` — Returns the current number of active workers.
- `is_full()` — Returns `True` if at capacity.

**Thread Safety:** All state mutations are protected by `_lock`. The semaphore ensures at most `max_workers` threads execute concurrently.

**Configuration:** `max_workers` is loaded from `config.toml` → `[mma].max_workers` (default: 4).
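
The gating described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes the semaphore is acquired non-blockingly in `spawn` and released when the worker finishes, and that workers deregister themselves on exit.

```python
import threading

class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)

    def spawn(self, ticket_id, target, args=()):
        # Non-blocking acquire: if the pool is full, report failure
        # to the caller instead of waiting for capacity.
        if not self._semaphore.acquire(blocking=False):
            return None

        def _run():
            try:
                target(*args)
            finally:
                # Always free the slot, even if the worker raised.
                with self._lock:
                    self._active.pop(ticket_id, None)
                self._semaphore.release()

        thread = threading.Thread(target=_run, name=ticket_id, daemon=True)
        with self._lock:
            self._active[ticket_id] = thread
        thread.start()
        return thread

    def get_active_count(self):
        with self._lock:
            return len(self._active)

    def is_full(self):
        return self.get_active_count() >= self.max_workers

    def join_all(self, timeout=None):
        with self._lock:
            threads = list(self._active.values())
        for t in threads:
            t.join(timeout)
```

Releasing the semaphore in the worker's `finally` block keeps the capacity count correct even when a worker raises.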

---

## ConductorEngine (`multi_agent_conductor.py`)

The Tier 2 orchestrator. Owns the execution loop that drives tickets through the DAG.

@@ -148,13 +173,16 @@ class ConductorEngine:

```python
        self.track = track
        self.event_queue = event_queue
        self.tier_usage = {
            "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
            "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
            "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
            "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
        }
        self.dag = TrackDAG(self.track.tickets)
        self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue)
        self.pool = WorkerPool(max_workers=max_workers)
        self._abort_events: dict[str, threading.Event] = {}
        self._pause_event: threading.Event = threading.Event()
```
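
A hypothetical helper shows how this per-tier ledger can be updated after each AI call; the `record_usage` name and signature are illustrative, not from the source.

```python
tier_usage = {
    "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
    "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
    "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
    "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}

def record_usage(tier: str, input_tokens: int, output_tokens: int) -> None:
    # Accumulate token counts for the tier that made the call.
    entry = tier_usage[tier]
    entry["input"] += input_tokens
    entry["output"] += output_tokens
```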

### State Broadcast (`_push_state`)

@@ -350,6 +378,80 @@ Each tier operates within its own token budget:

---

## Abort Event Propagation

Workers can be killed mid-execution via abort events:

```python
# In ConductorEngine.__init__:
self._abort_events: dict[str, threading.Event] = {}

# When spawning a worker:
self._abort_events[ticket.id] = threading.Event()

# To kill a worker:
def kill_worker(self, ticket_id: str) -> None:
    if ticket_id in self._abort_events:
        self._abort_events[ticket_id].set()  # Signal abort
        thread = self._active_workers.get(ticket_id)
        if thread:
            thread.join(timeout=1.0)  # Wait for graceful shutdown
```

**Abort Check Points in `run_worker_lifecycle`:**
1. **Before major work** — checked immediately after `ai_client.reset_session()`
2. **During clutch_callback** — checked before each tool execution
3. **After blocking send()** — checked after AI call returns
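
The three checkpoints can be sketched as a lifecycle skeleton. This is an assumption-laden sketch: `ai_client`, `reset_session`, and `clutch_callback` follow the text above, but the `send(prompt, on_tool=...)` signature, `execute_tool`, and the `WorkerAborted` exception are illustrative inventions used to show where each check sits.

```python
import threading

class WorkerAborted(Exception):
    """Raised inside the tool callback to unwind out of a blocking send()."""

def run_worker_lifecycle(ticket, ai_client, abort_event: threading.Event):
    ai_client.reset_session()

    # 1. Before major work: checked immediately after reset_session().
    if abort_event.is_set():
        ticket.status = "killed"
        return

    def clutch_callback(tool_call):
        # 2. During clutch_callback: checked before each tool execution.
        if abort_event.is_set():
            raise WorkerAborted()
        return ai_client.execute_tool(tool_call)

    try:
        ai_client.send(ticket.prompt, on_tool=clutch_callback)
    except WorkerAborted:
        ticket.status = "killed"
        return

    # 3. After blocking send(): checked once the AI call returns.
    if abort_event.is_set():
        ticket.status = "killed"
        return

    ticket.status = "done"
```

Raising an exception from inside the tool callback is one way to unwind a blocking `send()` promptly rather than waiting for it to finish.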

---

## Pause/Resume Control

The engine supports pausing the entire orchestration pipeline:

```python
def pause(self) -> None:
    self._pause_event.set()

def resume(self) -> None:
    self._pause_event.clear()
```

In the main `run()` loop:

```python
while True:
    if self._pause_event.is_set():
        self._push_state(status="paused", active_tier="Paused")
        time.sleep(0.5)
        continue
    # ... normal execution
```

This allows the user to pause execution without killing workers.

---

## Model Escalation

Workers automatically escalate to more capable models on retry:

```python
models_list = [
    "gemini-2.5-flash-lite",   # First attempt
    "gemini-2.5-flash",        # Second attempt
    "gemini-3.1-pro-preview",  # Third+ attempt
]
model_idx = min(ticket.retry_count, len(models_list) - 1)
model_name = models_list[model_idx]
```

The `ticket.model_override` field can bypass this logic with a specific model.
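
Combining the override with the escalation ladder might look like the following sketch; the `select_model` helper is hypothetical, only `models_list`, `retry_count`, and `model_override` come from the text above.

```python
models_list = [
    "gemini-2.5-flash-lite",   # First attempt
    "gemini-2.5-flash",        # Second attempt
    "gemini-3.1-pro-preview",  # Third+ attempt
]

def select_model(ticket) -> str:
    # An explicit per-ticket override always wins.
    if getattr(ticket, "model_override", None):
        return ticket.model_override
    # Otherwise escalate by retry count, capped at the strongest model.
    model_idx = min(ticket.retry_count, len(models_list) - 1)
    return models_list[model_idx]
```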

---

## Track State Persistence

Track state can be persisted to disk via `project_manager.py`:
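
The actual `project_manager.py` API is not shown in this excerpt; a hypothetical JSON round-trip sketch (the `TicketState`, `save_track`, and `load_track` names are illustrative) conveys the general shape:

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass
class TicketState:
    id: str
    status: str
    retry_count: int = 0

def save_track(path: Path, tickets: list[TicketState]) -> None:
    # Serialize ticket state to JSON so an interrupted run can be resumed.
    path.write_text(json.dumps([asdict(t) for t in tickets], indent=2))

def load_track(path: Path) -> list[TicketState]:
    return [TicketState(**d) for d in json.loads(path.read_text())]
```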