docs update (wip)
@@ -138,6 +138,31 @@ class ExecutionEngine:

---

## WorkerPool (`multi_agent_conductor.py`)

Bounded concurrent worker pool with semaphore gating.
```python
class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)
```

**Key Methods:**
- `spawn(ticket_id, target, args)` — Spawns a worker thread if the pool has capacity. Returns `None` if full.
- `join_all(timeout)` — Waits for all active workers to complete.
- `get_active_count()` — Returns the current number of active workers.
- `is_full()` — Returns `True` if at capacity.

**Thread Safety:** All state mutations are protected by `_lock`. The semaphore ensures at most `max_workers` threads execute concurrently.

**Configuration:** `max_workers` is loaded from `config.toml` → `[mma].max_workers` (default: 4).
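
The gating described above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes the semaphore is acquired non-blockingly in `spawn` and released when the worker finishes, and that workers deregister themselves on exit.

```python
import threading

class WorkerPool:
    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers
        self._active: dict[str, threading.Thread] = {}
        self._lock = threading.Lock()
        self._semaphore = threading.Semaphore(max_workers)

    def spawn(self, ticket_id, target, args=()):
        # Non-blocking acquire: if the pool is full, report failure
        # to the caller instead of waiting for capacity.
        if not self._semaphore.acquire(blocking=False):
            return None

        def _run():
            try:
                target(*args)
            finally:
                # Always free the slot, even if the worker raised.
                with self._lock:
                    self._active.pop(ticket_id, None)
                self._semaphore.release()

        thread = threading.Thread(target=_run, name=ticket_id, daemon=True)
        with self._lock:
            self._active[ticket_id] = thread
        thread.start()
        return thread

    def get_active_count(self):
        with self._lock:
            return len(self._active)

    def is_full(self):
        return self.get_active_count() >= self.max_workers

    def join_all(self, timeout=None):
        with self._lock:
            threads = list(self._active.values())
        for t in threads:
            t.join(timeout)
```

Releasing the semaphore in the worker's `finally` block keeps the capacity count correct even when a worker raises.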

---

## ConductorEngine (`multi_agent_conductor.py`)

The Tier 2 orchestrator. Owns the execution loop that drives tickets through the DAG.

@@ -148,13 +173,16 @@ class ConductorEngine:

```python
        self.track = track
        self.event_queue = event_queue
        self.tier_usage = {
            "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
            "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
            "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
            "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
        }
        self.dag = TrackDAG(self.track.tickets)
        self.engine = ExecutionEngine(self.dag, auto_queue=auto_queue)
        self.pool = WorkerPool(max_workers=max_workers)
        self._abort_events: dict[str, threading.Event] = {}
        self._pause_event: threading.Event = threading.Event()
```
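
A hypothetical helper shows how this per-tier ledger can be updated after each AI call; the `record_usage` name and signature are illustrative, not from the source.

```python
tier_usage = {
    "Tier 1": {"input": 0, "output": 0, "model": "gemini-3.1-pro-preview"},
    "Tier 2": {"input": 0, "output": 0, "model": "gemini-3-flash-preview"},
    "Tier 3": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
    "Tier 4": {"input": 0, "output": 0, "model": "gemini-2.5-flash-lite"},
}

def record_usage(tier: str, input_tokens: int, output_tokens: int) -> None:
    # Accumulate token counts for the tier that made the call.
    entry = tier_usage[tier]
    entry["input"] += input_tokens
    entry["output"] += output_tokens
```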

### State Broadcast (`_push_state`)

@@ -350,6 +378,80 @@ Each tier operates within its own token budget:

---

## Abort Event Propagation

Workers can be killed mid-execution via abort events:

```python
# In ConductorEngine.__init__:
self._abort_events: dict[str, threading.Event] = {}

# When spawning a worker:
self._abort_events[ticket.id] = threading.Event()

# To kill a worker:
def kill_worker(self, ticket_id: str) -> None:
    if ticket_id in self._abort_events:
        self._abort_events[ticket_id].set()  # Signal abort
        thread = self._active_workers.get(ticket_id)
        if thread:
            thread.join(timeout=1.0)  # Wait for graceful shutdown
```

**Abort Check Points in `run_worker_lifecycle`:**
1. **Before major work** — checked immediately after `ai_client.reset_session()`
2. **During clutch_callback** — checked before each tool execution
3. **After blocking send()** — checked after AI call returns
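
The three checkpoints can be sketched as a lifecycle skeleton. This is an assumption-laden sketch: `ai_client`, `reset_session`, and `clutch_callback` follow the text above, but the `send(prompt, on_tool=...)` signature, `execute_tool`, and the `WorkerAborted` exception are illustrative inventions used to show where each check sits.

```python
import threading

class WorkerAborted(Exception):
    """Raised inside the tool callback to unwind out of a blocking send()."""

def run_worker_lifecycle(ticket, ai_client, abort_event: threading.Event):
    ai_client.reset_session()

    # 1. Before major work: checked immediately after reset_session().
    if abort_event.is_set():
        ticket.status = "killed"
        return

    def clutch_callback(tool_call):
        # 2. During clutch_callback: checked before each tool execution.
        if abort_event.is_set():
            raise WorkerAborted()
        return ai_client.execute_tool(tool_call)

    try:
        ai_client.send(ticket.prompt, on_tool=clutch_callback)
    except WorkerAborted:
        ticket.status = "killed"
        return

    # 3. After blocking send(): checked once the AI call returns.
    if abort_event.is_set():
        ticket.status = "killed"
        return

    ticket.status = "done"
```

Raising an exception from inside the tool callback is one way to unwind a blocking `send()` promptly rather than waiting for it to finish.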

---

## Pause/Resume Control

The engine supports pausing the entire orchestration pipeline:

```python
def pause(self) -> None:
    self._pause_event.set()

def resume(self) -> None:
    self._pause_event.clear()
```

In the main `run()` loop:

```python
while True:
    if self._pause_event.is_set():
        self._push_state(status="paused", active_tier="Paused")
        time.sleep(0.5)
        continue
    # ... normal execution
```

This allows the user to pause execution without killing workers.

---

## Model Escalation

Workers automatically escalate to more capable models on retry:

```python
models_list = [
    "gemini-2.5-flash-lite",   # First attempt
    "gemini-2.5-flash",        # Second attempt
    "gemini-3.1-pro-preview",  # Third+ attempt
]
model_idx = min(ticket.retry_count, len(models_list) - 1)
model_name = models_list[model_idx]
```

The `ticket.model_override` field can bypass this logic with a specific model.
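
Combining the override with the escalation ladder might look like the following sketch; the `select_model` helper is hypothetical, only `models_list`, `retry_count`, and `model_override` come from the text above.

```python
models_list = [
    "gemini-2.5-flash-lite",   # First attempt
    "gemini-2.5-flash",        # Second attempt
    "gemini-3.1-pro-preview",  # Third+ attempt
]

def select_model(ticket) -> str:
    # An explicit per-ticket override always wins.
    if getattr(ticket, "model_override", None):
        return ticket.model_override
    # Otherwise escalate by retry count, capped at the strongest model.
    model_idx = min(ticket.retry_count, len(models_list) - 1)
    return models_list[model_idx]
```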

---

## Track State Persistence

Track state can be persisted to disk via `project_manager.py`:
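
The actual `project_manager.py` API is not shown in this excerpt; a hypothetical JSON round-trip sketch (the `TicketState`, `save_track`, and `load_track` names are illustrative) conveys the general shape:

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass
class TicketState:
    id: str
    status: str
    retry_count: int = 0

def save_track(path: Path, tickets: list[TicketState]) -> None:
    # Serialize ticket state to JSON so an interrupted run can be resumed.
    path.write_text(json.dumps([asdict(t) for t in tickets], indent=2))

def load_track(path: Path) -> list[TicketState]:
    return [TicketState(**d) for d in json.loads(path.read_text())]
```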