docs update (wip)

2026-03-08 01:46:34 -05:00
parent d9a06fd2fe
commit d34c35941f
14 changed files with 1213 additions and 105 deletions


---
## WorkerPool (`multi_agent_conductor.py`)
Bounded concurrent worker pool with semaphore gating.
```python
class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)
```
**Key Methods:**
- `spawn(ticket_id, target, args)` — Spawns a worker thread if the pool has capacity. Returns `None` if full.
- `join_all(timeout)` — Waits for all active workers to complete.
- `get_active_count()` — Returns current number of active workers.
- `is_full()` — Returns `True` if at capacity.
**Thread Safety:** All state mutations are protected by `_lock`. The semaphore ensures at most `max_workers` threads execute concurrently.
**Configuration:** `max_workers` is loaded from `config.toml` under `[mma].max_workers` (default: 4).
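The semaphore-gated spawn behavior can be sketched as follows. The method bodies below are a hypothetical reconstruction from the attributes and method signatures listed above, not the actual implementation:

```python
import threading

class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)

    def spawn(self, ticket_id, target, args):
        # Non-blocking acquire: refuse immediately when the pool is full.
        if not self._semaphore.acquire(blocking=False):
            return None

        def _run():
            try:
                target(*args)
            finally:
                # Always free the slot, even if the worker raised.
                with self._lock:
                    self._active.pop(ticket_id, None)
                self._semaphore.release()

        thread = threading.Thread(target=_run, name=ticket_id, daemon=True)
        with self._lock:
            self._active[ticket_id] = thread
        thread.start()
        return thread

    def get_active_count(self) -> int:
        with self._lock:
            return len(self._active)

    def is_full(self) -> bool:
        return self.get_active_count() >= self.max_workers

    def join_all(self, timeout=None) -> None:
        with self._lock:
            threads = list(self._active.values())
        for t in threads:
            t.join(timeout)
```

The non-blocking `acquire` is what lets `spawn` return `None` instead of blocking the orchestrator when all slots are taken.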
---
## ConductorEngine (`multi_agent_conductor.py`)
The Tier 2 orchestrator. Owns the execution loop that drives tickets through the DAG.
```python
# In ConductorEngine.__init__:
self.track = track
self.event_queue = event_queue
self.tier_usage = {
    "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
    "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
    "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
    "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}
self.dag = TrackDAG(self.track.tickets)
self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue)
self.pool = WorkerPool(max_workers=max_workers)
self._abort_events: dict[str, threading.Event] = {}
self._pause_event: threading.Event = threading.Event()
```
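As a toy illustration of how the `tier_usage` counters might be accumulated per AI call (`record_usage` is a hypothetical helper, not part of the documented API):

```python
def record_usage(tier_usage: dict, tier: str, input_tokens: int, output_tokens: int) -> None:
    # Accumulate token counts for one call against the tier's running totals.
    entry = tier_usage[tier]
    entry["input"] += input_tokens
    entry["output"] += output_tokens

tier_usage = {
    "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
}
record_usage(tier_usage, "Tier 2", 1200, 350)
record_usage(tier_usage, "Tier 2", 800, 150)
```

Keeping the model name alongside the counters lets downstream reporting price each tier's usage without a separate lookup table.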
### State Broadcast (`_push_state`)
Each tier operates within its own token budget.
---
## Abort Event Propagation
Workers can be killed mid-execution via abort events:
```python
# In ConductorEngine.__init__:
self._abort_events: dict[str, threading.Event] = {}

# When spawning a worker:
self._abort_events[ticket.id] = threading.Event()

# To kill a worker:
def kill_worker(self, ticket_id: str) -> None:
    if ticket_id in self._abort_events:
        self._abort_events[ticket_id].set()  # Signal abort
    thread = self._active_workers.get(ticket_id)
    if thread:
        thread.join(timeout=1.0)  # Wait for graceful shutdown
```
**Abort Check Points in `run_worker_lifecycle`:**
1. **Before major work** — checked immediately after `ai_client.reset_session()`
2. **During clutch_callback** — checked before each tool execution
3. **After blocking send()** — checked after AI call returns
When abort is detected, the ticket status is set to `"killed"` and the worker exits immediately.
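The check points above can be sketched as follows. This is a hypothetical shape for `run_worker_lifecycle`; everything except the abort-event pattern (the fake client, the exception type, the iteration over tool calls) is an assumption for illustration:

```python
import threading

class WorkerAborted(Exception):
    """Raised internally to unwind the worker when its abort event is set."""

def run_worker_lifecycle(ticket, abort_event: threading.Event, ai_client):
    def check_abort():
        if abort_event.is_set():
            raise WorkerAborted()

    try:
        ai_client.reset_session()
        check_abort()                                     # 1. before major work
        for tool_call in ai_client.send(ticket.prompt):   # 3. after blocking send()
            check_abort()                                 # 2. before each tool execution
            tool_call.execute()
        ticket.status = "done"
    except WorkerAborted:
        ticket.status = "killed"  # worker exits immediately
```

Raising an exception at each check point keeps the abort path centralized: one `except` clause marks the ticket `"killed"` no matter which check point fired.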
---
## Pause/Resume Control
The engine supports pausing the entire orchestration pipeline:
```python
def pause(self) -> None:
    self._pause_event.set()

def resume(self) -> None:
    self._pause_event.clear()
```
In the main `run()` loop:
```python
while True:
    if self._pause_event.is_set():
        self._push_state(status="paused", active_tier="Paused")
        time.sleep(0.5)
        continue
    # ... normal execution
```
This allows the user to pause execution without killing workers.
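The same pattern in a minimal, self-contained form (a toy loop standing in for `run()`, not the real engine; counters replace the state broadcast):

```python
import threading
import time

class PausableLoop:
    def __init__(self):
        self._pause_event = threading.Event()
        self.iterations = 0
        self.paused_polls = 0

    def pause(self) -> None:
        self._pause_event.set()

    def resume(self) -> None:
        self._pause_event.clear()

    def run(self, stop: threading.Event) -> None:
        while not stop.is_set():
            if self._pause_event.is_set():
                self.paused_polls += 1   # would broadcast status="paused"
                time.sleep(0.01)
                continue
            self.iterations += 1         # ... normal execution
            time.sleep(0.01)
```

Because the loop only polls the event between iterations, pausing never interrupts a worker mid-step; it just stops new work from being dispatched.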
---
## Model Escalation
Workers automatically escalate to more capable models on retry:
```python
models_list = [
"gemini-2.5-flash-lite", # First attempt
"gemini-2.5-flash", # Second attempt
"gemini-3.1-pro-preview" # Third+ attempt
]
model_idx = min(ticket.retry_count, len(models_list) - 1)
model_name = models_list[model_idx]
```
If set, the `ticket.model_override` field bypasses this escalation and pins the worker to a specific model.
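Putting the two rules together (a sketch; `select_model` is a hypothetical helper, not a documented API):

```python
def select_model(retry_count: int, model_override=None) -> str:
    # An explicit override wins over the escalation ladder.
    if model_override:
        return model_override
    models_list = [
        "gemini-2.5-flash-lite",   # first attempt
        "gemini-2.5-flash",        # second attempt
        "gemini-3.1-pro-preview",  # third+ attempt
    ]
    # Clamp so retries beyond the ladder stay on the strongest model.
    return models_list[min(retry_count, len(models_list) - 1)]
```

The `min(...)` clamp is what makes the third entry a "Third+" attempt: retry counts past the end of the list keep selecting the last model rather than raising `IndexError`.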
---
## Track State Persistence
Track state can be persisted to disk via `project_manager.py`: