chore(conductor): Enhance all 6 backlog tracks to Surgical Spec Protocol

2026-03-02 22:38:02 -05:00
parent 2f4dca719f
commit 2e73212abd
12 changed files with 286 additions and 89 deletions
--- a/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md
+++ b/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md
@@ -1,10 +1,31 @@
-# Implementation Plan: Concurrent Tier Isolation
+# Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)

-## Phase 1: Thread-Local Storage
+## Phase 1: Thread-Local Context Refactoring
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Replace `current_tier` with `threading.local()`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Refactor `ai_client` to `threading.local()`
+    - [ ] WHERE: `ai_client.py`
+    - [ ] WHAT: Replace `current_tier = None` with `_local_context = threading.local()`. Implement safe getters/setters for the tier.
+    - [ ] HOW: Use standard `threading.local` attributes.
+    - [ ] SAFETY: Provide defaults (e.g., `getattr(_local_context, 'tier', None)`) so uninitialized threads don't crash.
+- [ ] Task: Update Lifecycle Callers
+    - [ ] WHERE: `multi_agent_conductor.py`, `conductor_tech_lead.py`
+    - [ ] WHAT: Update how they set the current tier around `send()` calls.
+    - [ ] HOW: Use the new setter/getter functions from `ai_client`.
+    - [ ] SAFETY: Ensure `finally` blocks clean up the thread-local state.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Refactoring' (Protocol in workflow.md)

-## Phase 2: Refactor & Test
-- [ ] Task: Update loggers and test with mock concurrent threads.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Testing Concurrency
+- [ ] Task: Write Concurrent Execution Test
+    - [ ] WHERE: `tests/test_ai_client_concurrency.py` (New)
+    - [ ] WHAT: Spawn two threads. Thread A sets Tier 3 and calls a mock `send`. Thread B sets Tier 4 and calls a mock `send`.
+    - [ ] HOW: Assert that the resulting `comms_log` correctly maps the entries to Tier 3 and Tier 4 respectively without race condition overwrites.
+    - [ ] SAFETY: Use `threading.Barrier` so both threads set their tier before either calls `send`, forcing the interleaving that a shared global would get wrong.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Testing Concurrency' (Protocol in workflow.md)
+
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
--- a/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md
+++ b/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md
@@ -1,8 +1,18 @@
-# Track Specification: Concurrent Tier Source Isolation
+# Track Specification: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)

 ## Overview
-Prepares the architecture for parallel Tier 3/4 agents by replacing the global `ai_client.current_tier` with thread-safe `threading.local()` or explicit call signatures.
+Currently, `ai_client.current_tier` is a module-level `str | None`. This works safely only because the MMA engine serializes `ai_client.send()` calls. To prepare the architecture for parallel agents (e.g., executing multiple Tier 3 worker tickets concurrently), this global state must be replaced. This track will refactor the tagging system to use thread-safe context.
+
+## Architectural Constraints
+- **Thread Safety**: The solution MUST guarantee that if two threads call `ai_client.send()` simultaneously, their `source_tier` logs do not cross-contaminate.
+- **API Surface**: Prefer passing `source_tier` explicitly in the `send()` method signature over implicit global/local state to ensure functional purity, OR use strictly isolated `threading.local()`.

 ## Functional Requirements
-- Refactor `current_tier` to be thread-safe.
-- Update all logging calls to use the thread-safe context.
+- Refactor `ai_client.py` to remove the global `current_tier` variable.
+- Update `run_worker_lifecycle` and `generate_tickets` to pass the tier context directly to the AI client or into a `threading.local` context block.
+- Update `_append_comms` and `_append_tool_log` to utilize the thread-safe context.
+
+## Acceptance Criteria
+- [ ] `ai_client.current_tier` global variable is removed.
+- [ ] `source_tier` tagging in `_comms_log` and `_tool_log` continues to function accurately.
+- [ ] Tests simulate concurrent `send()` calls from different threads and assert correct log tagging without race conditions.
--- a/conductor/tracks/gui_decoupling_controller_20260302/plan.md
+++ b/conductor/tracks/gui_decoupling_controller_20260302/plan.md
@@ -1,18 +1,49 @@
-# Implementation Plan: GUI Decoupling
+# Implementation Plan: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)

-## Phase 1: Controller Skeleton
+## Phase 1: Controller Skeleton & State Migration
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Create `app_controller.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Create `app_controller.py` Skeleton
+    - [ ] WHERE: `app_controller.py` (New file)
+    - [ ] WHAT: Create the `AppController` class. Initialize basic state structures (logs, metrics, flags).
+    - [ ] HOW: Standard class definition.
+    - [ ] SAFETY: Do not break existing GUI yet.
+- [ ] Task: Migrate Data State from GUI
+    - [ ] WHERE: `gui_2.py:__init__` and `app_controller.py`
+    - [ ] WHAT: Move variables like `_comms_log`, `_tool_log`, `mma_streams`, `active_tickets` to the controller.
+    - [ ] HOW: Update GUI to reference `self.controller.mma_streams` instead of `self.mma_streams`.
+    - [ ] SAFETY: Search and replace carefully; use `py_check_syntax`.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: State Migration' (Protocol in workflow.md)

-## Phase 2: State Migration
-- [ ] Task: Move App state from `gui_2.py` to controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Logic & Background Thread Migration
+- [ ] Task: Extract Background Threads & Event Queue
+    - [ ] WHERE: `gui_2.py` (e.g., `_init_ai_and_hooks`, `_process_event_queue`)
+    - [ ] WHAT: Move the `AsyncEventQueue`, asyncio worker thread, and HookServer initialization to the controller.
+    - [ ] HOW: The GUI should just call `self.controller.start_services()` and read the `_pending_gui_tasks` queue.
+    - [ ] SAFETY: Thread lifecycle management is critical. Ensure shutdown hooks are migrated.
+- [ ] Task: Extract I/O and AI Methods
+    - [ ] WHERE: `gui_2.py` (`_cb_plan_epic`, `_flush_to_project`, `_cb_create_track`)
+    - [ ] WHAT: Move business logic methods to the controller.
+    - [ ] HOW: GUI callbacks simply become `lambda: self.controller.plan_epic(input)`.
+    - [ ] SAFETY: Verify Hook API endpoints still work.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Logic Migration' (Protocol in workflow.md)
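A sketch of how the controller might own the background services, assuming the existing asyncio worker thread moves over largely as-is. Only `start_services()` and `_pending_gui_tasks` come from this plan; `submit`, `post_to_gui`, `drain_gui_tasks`, and `shutdown` are illustrative names. The view would call `drain_gui_tasks()` once per frame, so every UI mutation still runs on the render thread, and `shutdown()` is the migrated shutdown hook.

```python
import asyncio
import concurrent.futures
import queue
import threading
from collections.abc import Callable, Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


class AppController:
    """Hypothetical headless controller that owns the asyncio worker thread."""

    def __init__(self) -> None:
        self._loop: asyncio.AbstractEventLoop | None = None
        self._thread: threading.Thread | None = None
        # Thread-safe hand-off: background work enqueues callables, the GUI runs them.
        self._pending_gui_tasks: queue.Queue[Callable[[], None]] = queue.Queue()

    def start_services(self) -> None:
        loop = asyncio.new_event_loop()
        self._loop = loop
        self._thread = threading.Thread(
            target=self._run_loop, args=(loop,), name="controller-asyncio", daemon=True
        )
        self._thread.start()

    @staticmethod
    def _run_loop(loop: asyncio.AbstractEventLoop) -> None:
        asyncio.set_event_loop(loop)
        loop.run_forever()  # returns once shutdown() schedules loop.stop()
        loop.close()

    def submit(self, coro: Coroutine[Any, Any, T]) -> concurrent.futures.Future[T]:
        """Schedule a coroutine (AI call, file I/O) on the worker loop from any thread."""
        if self._loop is None:
            raise RuntimeError("start_services() has not been called")
        return asyncio.run_coroutine_threadsafe(coro, self._loop)

    def post_to_gui(self, task: Callable[[], None]) -> None:
        self._pending_gui_tasks.put(task)

    def drain_gui_tasks(self) -> int:
        """Run all queued UI updates; meant to be called once per frame on the GUI thread."""
        ran = 0
        while True:
            try:
                task = self._pending_gui_tasks.get_nowait()
            except queue.Empty:
                return ran
            task()
            ran += 1

    def shutdown(self, timeout: float = 5.0) -> None:
        """Stop the loop and join the worker so no thread outlives the app."""
        if self._loop is not None and self._thread is not None:
            self._loop.call_soon_threadsafe(self._loop.stop)
            self._thread.join(timeout)
        self._loop = None
        self._thread = None
```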

-## Phase 3: Logic Migration
-- [ ] Task: Move non-rendering methods to controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: Test Suite Refactoring
+- [ ] Task: Update `conftest.py` Fixtures
+    - [ ] WHERE: `tests/conftest.py`
+    - [ ] WHAT: Update `app_instance` fixture to mock/initialize the `AppController` instead of just `App`.
+    - [ ] HOW: Adjust `patch` targets to hit `app_controller.py` where appropriate.
+    - [ ] SAFETY: Run subset of tests continuously to fix import breaks.
+- [ ] Task: Resolve Broken GUI Tests
+    - [ ] WHERE: `tests/test_gui_*.py`
+    - [ ] WHAT: Update test assertions that look for state on `app_instance` to look at `app_instance.controller`.
+    - [ ] HOW: Surgical string replacements.
+    - [ ] SAFETY: Ensure no false-positives.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Test Suite Refactoring' (Protocol in workflow.md)

-## Phase 4: Validation
-- [ ] Task: Update all tests to mock/use the controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 4'
+## Phase 4: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: Watch out for lingering thread closure issues.
+- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Validation' (Protocol in workflow.md)
--- a/conductor/tracks/gui_decoupling_controller_20260302/spec.md
+++ b/conductor/tracks/gui_decoupling_controller_20260302/spec.md
@@ -1,9 +1,21 @@
-# Track Specification: GUI Decoupling & Controller Architecture
+# Track Specification: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)

 ## Overview
-`gui_2.py` is a monolithic God Object. This track extracts its business logic and state machine into `app_controller.py`, leaving the GUI as a pure immediate-mode view adhering to Data-Oriented Design.
+`gui_2.py` currently operates as a monolithic God Object (3,500+ lines). It violates the Data-Oriented Design heuristic by owning complex business logic, orchestrator hooks, and Markdown file building. This track extracts the core state machine and lifecycle into a headless `app_controller.py`, turning the GUI into a pure immediate-mode view.
+
+## Architectural Constraints: The "Immediate Mode View" Contract
+- **No Business Logic in View**: `gui_2.py` MUST NOT perform file I/O, AI API calls, or subprocess management directly.
+- **State Ownership**: `app_controller.py` (or equivalent) owns the "Source of Truth" state.
+- **Event-Driven Mutations**: The GUI must mutate state exclusively by dispatching events or calling controller methods, never by directly manipulating backend objects in the render loop.

 ## Functional Requirements
-- Create `app_controller.py`.
-- Migrate state variables and lifecycle methods from `gui_2.py` to the controller.
-- Ensure `gui_2.py` only reads state and dispatches events.
+- **Controller Extraction**: Create `app_controller.py` to handle all non-rendering logic.
+- **State Migration**: Move state variables (`_tool_log`, `_comms_log`, `active_tickets`, etc.) out of `App.__init__` into the controller.
+- **Logic Migration**: Move background threads, file reading/writing (`_flush_to_project`), and AI orchestrator invocations to the controller.
+- **View Refactoring**: Refactor `gui_2.py` to accept the controller as a dependency and merely render its current state.
+
+## Acceptance Criteria
+- [ ] `app_controller.py` exists and owns the application state.
+- [ ] `gui_2.py` has been reduced in size and complexity (no file I/O or AI calls).
+- [ ] All existing features (chat, tools, tracks) function identically.
+- [ ] The full test suite runs and passes against the new decoupled architecture.
--- a/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
+++ b/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
@@ -1,14 +1,36 @@
-# Implementation Plan: Hook API UI State
+# Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)

-## Phase 1: API Endpoint
+## Phase 1: API Endpoint Implementation
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Implement `/api/gui/state` GET endpoint.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Implement `/api/gui/state` GET Endpoint
+    - [ ] WHERE: `gui_2.py` (or `app_controller.py` if decoupled), inside `create_api()`.
+    - [ ] WHAT: Add a FastAPI route that serializes allowed UI state variables into JSON.
+    - [ ] HOW: Define a set of safe keys (e.g., `_gettable_fields`) and extract them from the App instance.
+    - [ ] SAFETY: Use thread-safe reads or deepcopies if accessing complex dictionaries.
+- [ ] Task: Update `ApiHookClient`
+    - [ ] WHERE: `api_hook_client.py`
+    - [ ] WHAT: Add a `get_gui_state(self)` method that hits the new endpoint.
+    - [ ] HOW: Standard `requests.get`.
+    - [ ] SAFETY: Include error handling/timeouts.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: API Endpoint' (Protocol in workflow.md)
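One way the endpoint body could look, assuming the App (or controller) exposes a lock that the render loop also holds while mutating state. `GETTABLE_FIELDS`, `snapshot_gui_state`, and `app_lock` are hypothetical; the field names are the ones Phase 2 wires in. The function only reads and deep-copies, which keeps the endpoint free of side effects.

```python
import copy
import threading
from collections.abc import Iterable
from typing import Any

# Hypothetical allow-list; attributes not listed here are never exposed.
GETTABLE_FIELDS: frozenset[str] = frozenset(
    {"ui_focus_agent", "active_discussion", "_track_discussion_active"}
)


def snapshot_gui_state(
    app: object, lock: threading.Lock, fields: Iterable[str] = GETTABLE_FIELDS
) -> dict[str, Any]:
    """Copy allow-listed attributes under the lock; never mutates `app`."""
    with lock:
        # deepcopy while holding the lock so nested dicts/lists cannot change mid-copy;
        # JSON serialization of the result can then happen after the lock is released.
        return {name: copy.deepcopy(getattr(app, name, None)) for name in sorted(fields)}


# Inside create_api(), assuming a FastAPI instance named `api`, the route could be:
#     @api.get("/api/gui/state")
#     def get_gui_state() -> dict[str, Any]:
#         return snapshot_gui_state(app, app_lock)
```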

-## Phase 2: State Wiring
-- [ ] Task: Add UI state fields to `_settable_fields`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: State Wiring & Integration Tests
+- [ ] Task: Wire Critical UI States
+    - [ ] WHERE: `gui_2.py`
+    - [ ] WHAT: Ensure fields like `ui_focus_agent`, `active_discussion`, `_track_discussion_active` are included in the exposed state.
+    - [ ] HOW: Update the mapping definition.
+    - [ ] SAFETY: None.
+- [ ] Task: Write `live_gui` Integration Tests
+    - [ ] WHERE: `tests/test_live_gui_integration.py`
+    - [ ] WHAT: Add a test that changes the provider/model or focus agent via actions, then asserts `client.get_gui_state()` reflects the change.
+    - [ ] HOW: Use `pytest` and `live_gui` fixture.
+    - [ ] SAFETY: Ensure robust wait conditions for GUI updates.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: State Wiring & Tests' (Protocol in workflow.md)

-## Phase 3: Integration Tests
-- [ ] Task: Write `live_gui` tests validating state retrieval.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: Ensure the hook server gracefully stops.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
--- a/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md
+++ b/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md
@@ -1,9 +1,18 @@
-# Track Specification: Hook API UI State Verification
+# Track Specification: Hook API UI State Verification (hook_api_ui_state_verification_20260302)

 ## Overview
-Adds an `/api/gui/state` endpoint to expose internal UI widget states (like `ui_focus_agent`) for reliable programmatic testing without user confirmation.
+Currently, manual verification of UI widget state is difficult, and automated testing relies heavily on brittle logic. This track will expose internal UI widget states (like `ui_focus_agent`) via a new `/api/gui/state` GET endpoint. It wires critical UI state variables into `_settable_fields` (or a dedicated read-only `_gettable_fields` mapping) so the `live_gui` fixture can programmatically read and assert exact widget states without requiring user confirmation dialogs.
+
+## Architectural Constraints
+- **Idempotent Reads**: The `/api/gui/state` endpoint MUST be read-only and free of side effects.
+- **Thread Safety**: Reading UI state from the HookServer thread MUST use the established locking mechanisms (e.g., querying via thread-safe proxies or safe reads of primitive types).

 ## Functional Requirements
-- Add `/api/gui/state` endpoint to the HookServer.
-- Wire UI state variables into `_settable_fields`.
-- Write `live_gui` integration tests to assert widget states.
+- **New Endpoint**: Implement a `/api/gui/state` GET endpoint in the headless API.
+- **State Wiring**: Expand `_settable_fields` (or create a new `_gettable_fields` mapping) to safely expose internal UI states (combo boxes, checkbox states, active tabs).
+- **Integration Testing**: Write `live_gui` based integration tests that mutate the application state and assert the correct UI state via the new endpoint.
+
+## Acceptance Criteria
+- [ ] `/api/gui/state` endpoint successfully returns JSON representing the UI state.
+- [ ] Key UI variables (like `ui_focus_agent`) are queryable via the Hook Client.
+- [ ] New `live_gui` integration tests exist that validate UI state retrieval.
--- a/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
+++ b/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
@@ -1,10 +1,26 @@
-# Implementation Plan: Robust JSON Parsing
+# Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)

-## Phase 1: Retry Logic
+## Phase 1: Implementation of Retry Logic
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Implement retry loop in `conductor_tech_lead.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Implement Retry Loop in `generate_tickets`
+    - [ ] WHERE: `conductor_tech_lead.py:generate_tickets`
+    - [ ] WHAT: Wrap the `send` and `json.loads` calls in a `for _ in range(max_retries)` loop.
+    - [ ] HOW: If a `JSONDecodeError` is caught, append a corrective error message to the conversation and retry. If parsing succeeds, return the tickets immediately.
+    - [ ] SAFETY: Avoid breaching token limits when appending large error payloads; truncate the echoed raw output if necessary.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Implementation' (Protocol in workflow.md)
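A minimal sketch of the retry loop, assuming `send` is a callable that continues the same Tier 2 session and returns the raw model text. `generate_tickets_with_retry`, `TicketParseError`, and `MAX_ECHO_CHARS` are hypothetical names; the corrective prompt wording follows the spec, and `max_retries` here counts total attempts.

```python
import json
from collections.abc import Callable
from typing import Any

MAX_ECHO_CHARS = 2000  # hypothetical cap on raw output echoed back to the model


class TicketParseError(RuntimeError):
    """Hypothetical structured error raised once the retry cap is hit."""

    def __init__(self, attempts: int, last_error: json.JSONDecodeError, raw: str) -> None:
        super().__init__(f"no valid JSON after {attempts} attempts: {last_error}")
        self.attempts = attempts
        self.last_error = last_error
        self.raw = raw


def generate_tickets_with_retry(
    send: Callable[[str], str], prompt: str, max_retries: int = 3
) -> Any:
    """Send `prompt`, re-prompting with the parse error until the output is valid JSON."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    message = prompt
    for attempt in range(1, max_retries + 1):
        raw = send(message)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise TicketParseError(attempt, exc, raw) from exc
            # Echo a truncated copy so repeated failures cannot blow up the context.
            message = (
                f"Your previous output failed to parse as JSON: {exc}. "
                f"Here was your output: {raw[:MAX_ECHO_CHARS]}. "
                "Please fix the formatting and output ONLY valid JSON."
            )
    raise AssertionError("unreachable")
```

In the Phase 2 unit tests, `send` can be a `unittest.mock.Mock` whose `side_effect` returns invalid JSON first and valid JSON second; asserting `call_count == 2` then covers the retry path.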

-## Phase 2: Validation
-- [ ] Task: Write unit tests simulating JSON hallucination.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Unit Testing
+- [ ] Task: Write Simulation Tests for JSON Parsing
+    - [ ] WHERE: `tests/test_conductor_tech_lead.py`
+    - [ ] WHAT: Add tests `test_generate_tickets_retry_success` and `test_generate_tickets_retry_failure`.
+    - [ ] HOW: Mock `ai_client.send` side_effect to return invalid JSON first, then valid JSON. Assert call counts.
+    - [ ] SAFETY: Standard pytest mocking.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Unit Testing' (Protocol in workflow.md)
+
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest tests/test_conductor_tech_lead.py`, then the full `uv run pytest`.
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
--- a/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md
+++ b/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md
@@ -1,9 +1,20 @@
-# Track Specification: Robust JSON Parsing for Tech Lead
+# Track Specification: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)

 ## Overview
-`conductor_tech_lead.py` silently fails if Tier 2 outputs invalid JSON. This track adds an auto-retry loop that feeds tracebacks back to the LLM for self-correction.
+In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the Tier 2 model hallucinates or outputs invalid JSON, the function returns an empty array `[]`, so GUI track creation fails without any visible error. This track adds an auto-retry loop that catches `JSONDecodeError` and feeds the parse error back to the LLM for self-correction.
+
+## Architectural Constraints
+- **Max Retries**: The retry loop MUST have a hard cap (e.g., 3 retries) to prevent infinite loops and runaway API costs.
+- **Error Injection**: The error message fed back to the LLM must include the specific `JSONDecodeError` trace and the raw string it attempted to parse.

 ## Functional Requirements
-- Add retry loop in `generate_tickets`.
-- Catch `JSONDecodeError` and reprompt the model.
-- Abort after N failures.
+- Modify `generate_tickets` in `conductor_tech_lead.py` to wrap the `ai_client.send` call in a retry loop.
+- If `json.loads()` fails, construct a corrective prompt (e.g., "Your previous output failed to parse as JSON: {error}. Here was your output: {raw_text}. Please fix the formatting and output ONLY valid JSON.")
+- Send the corrective prompt via a new `ai_client.send` turn within the same session.
+- Abort and raise a structured error if the max retry count is reached.
+
+## Acceptance Criteria
+- [ ] `generate_tickets` includes a bounded retry loop with a hard max-retry cap.
+- [ ] Invalid JSON responses automatically trigger a corrective reprompt to the model.
+- [ ] Unit tests exist that use `unittest.mock` on the AI client to simulate 1 failure followed by 1 success, asserting the final valid parse.
+- [ ] Unit tests exist simulating repeated failures hitting the retry cap.
--- a/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md
+++ b/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md
@@ -1,18 +1,40 @@
-# Implementation Plan: Strict Static Analysis & Type Safety
+# Implementation Plan: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)

-## Phase 1: Configuration & Tooling
+## Phase 1: Configuration & Tooling Setup
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Configure strict `mypy.ini` and update `pyproject.toml`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Configure Strict Mypy Settings
+    - [ ] WHERE: `pyproject.toml` or `mypy.ini`
+    - [ ] WHAT: Enable `strict = true`, `disallow_untyped_defs = true`, `disallow_incomplete_defs = true` (`strict` already implies the latter two; listing them keeps the intent explicit).
+    - [ ] HOW: Modify the toml/ini config file directly.
+    - [ ] SAFETY: May cause a massive spike in reported errors initially.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Configuration' (Protocol in workflow.md)

-## Phase 2: Core Library Typing
-- [ ] Task: Resolve typing in `api_hook_client.py` and models.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Core Library Typing Resolution
+- [ ] Task: Resolve `api_hook_client.py` and `models.py` Type Errors
+    - [ ] WHERE: `api_hook_client.py`, `models.py`, `events.py`
+    - [ ] WHAT: Add explicit type hints to all function arguments, return values, and complex dictionaries. Resolve `Any` bleeding.
+    - [ ] HOW: Surgical type annotations (`dict[str, Any]`, `list[str]`, etc.).
+    - [ ] SAFETY: Do not change runtime logic, only type signatures.
+- [ ] Task: Resolve Conductor Subsystem Type Errors
+    - [ ] WHERE: `conductor_tech_lead.py`, `dag_engine.py`, `orchestrator_pm.py`
+    - [ ] WHAT: Enforce strict typing on track state, tickets, and DAG models.
+    - [ ] HOW: Standard Python `typing` imports.
+    - [ ] SAFETY: Preserve JSON serialization compatibility.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Library' (Protocol in workflow.md)
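An illustration of the "no `Any` bleeding" goal that keeps JSON compatibility: a `TypedDict` is an ordinary `dict` at runtime, so `json.dumps` keeps working, while mypy checks every key access. The `Ticket` fields and the `load_tickets` / `ready_tickets` helpers are hypothetical, not the real `models.py` schema, and validation here is limited to key presence.

```python
import json
from typing import Literal, TypedDict, cast

TicketStatus = Literal["todo", "in_progress", "done"]


class Ticket(TypedDict):
    """Hypothetical ticket shape; the real fields live in models.py."""

    id: str
    description: str
    status: TicketStatus
    depends_on: list[str]


REQUIRED_KEYS: frozenset[str] = frozenset({"id", "description", "status", "depends_on"})


def load_tickets(raw: str) -> list[Ticket]:
    """Parse a JSON ticket array, narrowing `Any` to `Ticket` at the boundary."""
    data: object = json.loads(raw)
    if not isinstance(data, list):
        raise TypeError(f"expected a JSON array, got {type(data).__name__}")
    tickets: list[Ticket] = []
    for item in data:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"malformed ticket: {item!r}")
        tickets.append(cast(Ticket, item))
    return tickets


def ready_tickets(tickets: list[Ticket]) -> list[str]:
    """IDs of 'todo' tickets whose dependencies are all 'done'."""
    done = {t["id"] for t in tickets if t["status"] == "done"}
    return [t["id"] for t in tickets if t["status"] == "todo" and set(t["depends_on"]) <= done]
```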

-## Phase 3: GUI Typing
-- [ ] Task: Resolve typing in `gui_2.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: GUI God-Object Typing Resolution
+- [ ] Task: Resolve `gui_2.py` Type Errors
+    - [ ] WHERE: `gui_2.py`
+    - [ ] WHAT: Type the `App` class state variables, method signatures, and ImGui integration boundaries.
+    - [ ] HOW: Use `# type: ignore[import]` only for ImGui C bindings if strictly necessary, but type internal state tightly.
+    - [ ] SAFETY: Ensure `live_gui` tests pass after typing.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Typing' (Protocol in workflow.md)

-## Phase 4: CI Integration
-- [ ] Task: Implement pre-commit hooks for ruff and mypy.
-- [ ] Task: Conductor - User Manual Verification 'Phase 4'
+## Phase 4: CI Integration & Final Validation
+- [ ] Task: Establish Pre-Commit Guardrails
+    - [ ] WHERE: `.git/hooks/pre-commit` or a `scripts/validate_types.ps1`
+    - [ ] WHAT: Create a script that runs ruff and mypy, blocking commits if they fail.
+    - [ ] HOW: Standard shell scripting.
+    - [ ] SAFETY: Ensure it works cross-platform (Windows/Linux).
+- [ ] Task: Full Suite Validation & Warning Cleanup
+- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation' (Protocol in workflow.md)
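A sketch of a cross-platform alternative to a `.ps1` script, assuming `uv` is on `PATH`: a small Python script (the path `scripts/validate_types.py` is hypothetical) that runs both tools and exits non-zero if either fails. Passing argument lists without `shell=True` sidesteps quoting differences between Windows and Linux shells, and a `.git/hooks/pre-commit` file would then only need to invoke this script and propagate its exit code.

```python
import subprocess
import sys
from collections.abc import Sequence

# Commands mirror the spec's acceptance criteria.
DEFAULT_CHECKS: tuple[tuple[str, ...], ...] = (
    ("uv", "run", "ruff", "check", "."),
    ("uv", "run", "mypy", "--strict", "."),
)


def run_checks(checks: Sequence[Sequence[str]] = DEFAULT_CHECKS) -> int:
    """Run every check (not stopping at the first failure); return 0 only if all pass."""
    failures = 0
    for cmd in checks:
        print(f"$ {' '.join(cmd)}", flush=True)
        try:
            returncode = subprocess.run(list(cmd), check=False).returncode
        except FileNotFoundError:
            print(f"  command not found: {cmd[0]}", file=sys.stderr)
            returncode = 127
        if returncode != 0:
            failures += 1
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(run_checks())
```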
--- a/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md
+++ b/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md
@@ -1,10 +1,21 @@
-# Track Specification: Strict Static Analysis & Type Safety
+# Track Specification: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)

 ## Overview
-The codebase suffers from massive type-safety debt (512+ mypy errors). This track resolves all violations, enforces strict typing across `gui_2.py` and `api_hook_client.py`, and integrates pre-commit checks.
+The codebase currently suffers from massive type-safety debt (512+ `mypy` errors across 64 files) and lingering `ruff` violations. This track will harden the foundation by resolving all violations, enforcing strict typing (especially in `gui_2.py` and `api_hook_client.py`), and integrating pre-commit checks. This is a prerequisite for safe AI-driven refactoring.
+
+## Architectural Constraints: The "Strict Typing Contract"
+- **No Implicit Any**: Variables and function returns must have explicit types.
+- **No Ignored Errors**: Do not use `# type: ignore` unless absolutely unavoidable (e.g., for poorly typed third-party C bindings). If used, it must include a specific error code.
+- **Strict Optionals**: All optional types must be explicitly defined (e.g., `str | None`).

 ## Functional Requirements
-- Resolve all mypy errors.
-- Resolve all remaining ruff violations.
-- Enforce strict typing.
-- Add CI/pre-commit hook for linting.
+- **Mypy Resolution**: Fix all 512+ existing `mypy` errors.
+- **Ruff Resolution**: Fix all remaining `ruff` linting violations.
+- **Configuration**: Update `pyproject.toml` or `mypy.ini` to enforce strict type checking globally.
+- **CI/Automation**: Implement a pre-commit hook or script (`scripts/check_hints.py` equivalent) to block untyped code.
+
+## Acceptance Criteria
+- [ ] `uv run mypy --strict .` returns 0 errors.
+- [ ] `uv run ruff check .` returns 0 violations.
+- [ ] No new `# type: ignore` comments are added without justification.
+- [ ] Pre-commit hook or validation script is documented and active.
--- a/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
+++ b/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
@@ -1,14 +1,36 @@
-# Implementation Plan: Test Suite Performance
+# Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)

 ## Phase 1: Audit & Polling Primitives
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Create deterministic polling primitives in `conftest.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Create Deterministic Polling Primitives
+    - [ ] WHERE: `tests/conftest.py`
+    - [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
+    - [ ] HOW: A `while` loop that re-evaluates `predicate_fn()` every `interval` seconds until it returns true or `timeout` elapses.
+    - [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)

-## Phase 2: Refactoring Sleeps
-- [ ] Task: Replace `time.sleep` across integration tests.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Refactoring Integration Tests
+- [ ] Task: Refactor `test_spawn_interception.py`
+    - [ ] WHERE: `tests/test_spawn_interception.py`
+    - [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
+    - [ ] HOW: Use the new `conftest.py` utility.
+    - [ ] SAFETY: Prevent event loop deadlocks.
+- [ ] Task: Refactor Simulation Waits
+    - [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
+    - [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
+    - [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
+    - [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)

-## Phase 3: Test Marking
-- [ ] Task: Apply `@pytest.mark.slow` to long-running tests.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: Test Marking & Final Validation
+- [ ] Task: Apply Slow Test Marks
+    - [ ] WHERE: Across all `tests/`
+    - [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
+    - [ ] HOW: Import pytest and apply the decorator.
+    - [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
+- [ ] Task: Full Suite Performance Validation
+    - [ ] WHERE: Project root
+    - [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
+    - [ ] HOW: Time the terminal command.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
--- a/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md
+++ b/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md
@@ -1,9 +1,19 @@
-# Track Specification: Test Suite Performance & Flakiness
+# Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)

 ## Overview
-The test suite is slow and flaky due to `time.sleep()`. This track replaces sleeps with deterministic polling (`threading.Event()`), aiming for a <10s core TDD loop.
+The test suite currently takes over 5 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or time out. This track replaces arbitrary `time.sleep()` calls with deterministic waits (state polling or `threading.Event()`), aiming to drive the core TDD test execution time under 10 seconds.
+
+## Architectural Constraints
+- **Zero Arbitrary Sleeps**: Fixed sleeps such as `time.sleep(1.0)` are banned in test files unless the test exercises actual rate-limiting or debounce behavior.
+- **Deterministic Waits**: Tests must use state polling (with short poll intervals) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
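Where the code under test can signal completion itself, an event avoids polling entirely. A sketch, with `FakeSpawner` and `spawn_async` as hypothetical stand-ins for the component under test: the worker sets the event only after its state update, so the test resumes the moment the work is done and fails after a bounded timeout instead of hanging.

```python
import threading


class FakeSpawner:
    """Hypothetical component that does its work on a background thread."""

    def __init__(self) -> None:
        self.spawned: list[str] = []
        self.spawned_event = threading.Event()

    def spawn_async(self, name: str) -> threading.Thread:
        def work() -> None:
            self.spawned.append(name)
            self.spawned_event.set()  # signal only after the state is updated

        thread = threading.Thread(target=work, daemon=True)
        thread.start()
        return thread


def test_spawn_without_sleep() -> None:
    spawner = FakeSpawner()
    spawner.spawn_async("worker-1")
    # Instead of time.sleep(1.0): Event.wait() returns True as soon as the event
    # is set, or False (failing the assert) once the 5 s timeout expires.
    assert spawner.spawned_event.wait(timeout=5.0), "spawn never completed"
    assert spawner.spawned == ["worker-1"]
```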

 ## Functional Requirements
-- Audit and remove `time.sleep()` in tests.
-- Implement deterministic event polling.
-- Mark slow integration tests with `@pytest.mark.slow`.
+- Audit all `tests/` and `simulation/` files for `time.sleep()`.
+- Implement polling helper functions in `conftest.py` (e.g., `wait_until(predicate_fn, timeout)`).
+- Refactor all integration tests to use the deterministic polling helpers.
+- Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
+
+## Acceptance Criteria
+- [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
+- [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
+- [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.