chore(conductor): Enhance all 6 backlog tracks to Surgical Spec Protocol

2026-03-02 22:38:02 -05:00
parent 2f4dca719f
commit 2e73212abd
12 changed files with 286 additions and 89 deletions

@@ -1,10 +1,31 @@
# Implementation Plan: Concurrent Tier Isolation # Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
## Phase 1: Thread-Local Storage ## Phase 1: Thread-Local Context Refactoring
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Replace `current_tier` with `threading.local()`. - [ ] Task: Refactor `ai_client` to `threading.local()`
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `ai_client.py`
- [ ] WHAT: Replace `current_tier = None` with `_local_context = threading.local()`. Implement safe getters/setters for the tier.
- [ ] HOW: Use standard `threading.local` attributes.
- [ ] SAFETY: Provide defaults (e.g., `getattr(_local_context, 'tier', None)`) so uninitialized threads don't crash.
- [ ] Task: Update Lifecycle Callers
- [ ] WHERE: `multi_agent_conductor.py`, `conductor_tech_lead.py`
- [ ] WHAT: Update how they set the current tier around `send()` calls.
- [ ] HOW: Use the new setter/getter functions from `ai_client`.
- [ ] SAFETY: Ensure `finally` blocks clean up the thread-local state.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Refactoring' (Protocol in workflow.md)
## Phase 2: Refactor & Test ## Phase 2: Testing Concurrency
- [ ] Task: Update loggers and test with mock concurrent threads. - [ ] Task: Write Concurrent Execution Test
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `tests/test_ai_client_concurrency.py` (New)
- [ ] WHAT: Spawn two threads. Thread A sets Tier 3 and calls a mock `send`. Thread B sets Tier 4 and calls mock `send`.
- [ ] HOW: Assert that the resulting `comms_log` correctly maps the entries to Tier 3 and Tier 4 respectively without race condition overwrites.
- [ ] SAFETY: Use `threading.Barrier` to force race conditions in the test to ensure the isolation holds.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Testing Concurrency' (Protocol in workflow.md)
## Phase 3: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
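
The Phase 1 and Phase 2 tasks above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `ai_client.py`: the names `get_current_tier`, `set_current_tier`, `tier_context`, and `run_two_tiers` are hypothetical, and `mock_send` stands in for the real `send()`.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the module-level state in `ai_client.py`.
_local_context = threading.local()


def get_current_tier():
    """Return the calling thread's tier, or None if it never set one."""
    return getattr(_local_context, "tier", None)


def set_current_tier(tier):
    """Set the tier for the calling thread only."""
    _local_context.tier = tier


@contextmanager
def tier_context(tier):
    """Set the tier for a block and restore the previous value afterwards."""
    previous = get_current_tier()
    set_current_tier(tier)
    try:
        yield
    finally:
        set_current_tier(previous)


def run_two_tiers():
    """Two threads tag log entries at the same time; returns the shared log."""
    comms_log = []
    log_lock = threading.Lock()
    barrier = threading.Barrier(2)

    def mock_send(message):
        # Reads the tier from thread-local state, as a tagged `send()` might.
        with log_lock:
            comms_log.append({"source_tier": get_current_tier(), "message": message})

    def worker(tier, message):
        with tier_context(tier):
            # Both threads have set their tier before either one sends.
            barrier.wait(timeout=5.0)
            mock_send(message)

    threads = [
        threading.Thread(target=worker, args=("Tier 3", "from A")),
        threading.Thread(target=worker, args=("Tier 4", "from B")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return comms_log
```

The context manager is one way to satisfy the `finally` cleanup requirement; callers that prefer explicit setter calls would need their own `try`/`finally`.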

@@ -1,8 +1,18 @@
# Track Specification: Concurrent Tier Source Isolation # Track Specification: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
## Overview ## Overview
Prepares the architecture for parallel Tier 3/4 agents by replacing the global `ai_client.current_tier` with thread-safe `threading.local()` or explicit call signatures. Currently, `ai_client.current_tier` is a module-level `str | None`. This works safely only because the MMA engine serializes `ai_client.send()` calls. To prepare the architecture for parallel agents (e.g., executing multiple Tier 3 worker tickets concurrently), this global state must be replaced. This track will refactor the tagging system to use thread-safe context.
## Architectural Constraints
- **Thread Safety**: The solution MUST guarantee that if two threads call `ai_client.send()` simultaneously, their `source_tier` logs do not cross-contaminate.
- **API Surface**: Prefer passing `source_tier` explicitly in the `send()` method signature over implicit global/local state to ensure functional purity, OR use strictly isolated `threading.local()`.
## Functional Requirements ## Functional Requirements
- Refactor `current_tier` to be thread-safe. - Refactor `ai_client.py` to remove the global `current_tier` variable.
- Update all logging calls to use the thread-safe context. - Update `run_worker_lifecycle` and `generate_tickets` to pass the tier context directly to the AI client or into a `threading.local` context block.
- Update `_append_comms` and `_append_tool_log` to utilize the thread-safe context.
## Acceptance Criteria
- [ ] `ai_client.current_tier` global variable is removed.
- [ ] `source_tier` tagging in `_comms_log` and `_tool_log` continues to function accurately.
- [ ] Tests simulate concurrent `send()` calls from different threads and assert correct log tagging without race conditions.

@@ -1,18 +1,49 @@
# Implementation Plan: GUI Decoupling # Implementation Plan: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
## Phase 1: Controller Skeleton ## Phase 1: Controller Skeleton & State Migration
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Create `app_controller.py`. - [ ] Task: Create `app_controller.py` Skeleton
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `app_controller.py` (New file)
- [ ] WHAT: Create the `AppController` class. Initialize basic state structures (logs, metrics, flags).
- [ ] HOW: Standard class definition.
- [ ] SAFETY: Do not break existing GUI yet.
- [ ] Task: Migrate Data State from GUI
- [ ] WHERE: `gui_2.py:__init__` and `app_controller.py`
- [ ] WHAT: Move variables like `_comms_log`, `_tool_log`, `mma_streams`, `active_tickets` to the controller.
- [ ] HOW: Update GUI to reference `self.controller.mma_streams` instead of `self.mma_streams`.
- [ ] SAFETY: Search and replace carefully; use `py_check_syntax`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: State Migration' (Protocol in workflow.md)
## Phase 2: State Migration ## Phase 2: Logic & Background Thread Migration
- [ ] Task: Move App state from `gui_2.py` to controller. - [ ] Task: Extract Background Threads & Event Queue
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `gui_2.py` (e.g., `_init_ai_and_hooks`, `_process_event_queue`)
- [ ] WHAT: Move the `AsyncEventQueue`, asyncio worker thread, and HookServer initialization to the controller.
- [ ] HOW: The GUI should just call `self.controller.start_services()` and read the `_pending_gui_tasks` queue.
- [ ] SAFETY: Thread lifecycle management is critical. Ensure shutdown hooks are migrated.
- [ ] Task: Extract I/O and AI Methods
- [ ] WHERE: `gui_2.py` (`_cb_plan_epic`, `_flush_to_project`, `_cb_create_track`)
- [ ] WHAT: Move business logic methods to the controller.
- [ ] HOW: GUI callbacks simply become `lambda: self.controller.plan_epic(input)`.
- [ ] SAFETY: Verify Hook API endpoints still work.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Logic Migration' (Protocol in workflow.md)
## Phase 3: Logic Migration ## Phase 3: Test Suite Refactoring
- [ ] Task: Move non-rendering methods to controller. - [ ] Task: Update `conftest.py` Fixtures
- [ ] Task: Conductor - User Manual Verification 'Phase 3' - [ ] WHERE: `tests/conftest.py`
- [ ] WHAT: Update `app_instance` fixture to mock/initialize the `AppController` instead of just `App`.
- [ ] HOW: Adjust `patch` targets to hit `app_controller.py` where appropriate.
- [ ] SAFETY: Run subset of tests continuously to fix import breaks.
- [ ] Task: Resolve Broken GUI Tests
- [ ] WHERE: `tests/test_gui_*.py`
- [ ] WHAT: Update test assertions that look for state on `app_instance` to look at `app_instance.controller`.
- [ ] HOW: Surgical string replacements.
- [ ] SAFETY: Ensure no false-positives.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Test Suite Refactoring' (Protocol in workflow.md)
## Phase 4: Validation ## Phase 4: Final Validation
- [ ] Task: Update all tests to mock/use the controller. - [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] Task: Conductor - User Manual Verification 'Phase 4' - [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: Watch out for lingering thread closure issues.
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Validation' (Protocol in workflow.md)
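
As a rough sketch of the controller shape this plan describes, assuming a plain worker thread in place of the real `AsyncEventQueue` and asyncio worker: the `shutdown` method, the `_events` queue, and the task dictionary layout are hypothetical, while `start_services`, `plan_epic`, and `_pending_gui_tasks` follow the names used in the plan.

```python
import queue
import threading


class AppController:
    """Headless owner of application state; the GUI only reads and dispatches."""

    def __init__(self):
        # State that the plan moves out of the GUI's `__init__`.
        self._comms_log = []
        self._tool_log = []
        self.mma_streams = {}
        self.active_tickets = []
        # Hypothetical inbound event queue, plus the outbound queue the GUI drains.
        self._events = queue.Queue()
        self._pending_gui_tasks = queue.Queue()
        self._worker = None

    def start_services(self):
        """Start the background worker once."""
        if self._worker is None:
            self._worker = threading.Thread(target=self._process_events, daemon=True)
            self._worker.start()

    def shutdown(self):
        """Stop the worker with a sentinel so no thread lingers after tests."""
        if self._worker is not None:
            self._events.put(None)
            self._worker.join(timeout=5.0)
            self._worker = None

    def plan_epic(self, text):
        """Called from a GUI callback; the work happens off the render loop."""
        self._events.put(("plan_epic", text))

    def _process_events(self):
        while True:
            event = self._events.get()
            if event is None:
                break
            name, payload = event
            # Placeholder for the real business logic (file I/O, AI calls).
            self._pending_gui_tasks.put({"action": name, "payload": payload})
```

In this sketch a GUI callback would reduce to `lambda: controller.plan_epic(text)`, and the render loop would drain `_pending_gui_tasks` each frame.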

@@ -1,9 +1,21 @@
# Track Specification: GUI Decoupling & Controller Architecture # Track Specification: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
## Overview ## Overview
`gui_2.py` is a monolithic God Object. This track extracts its business logic and state machine into `app_controller.py`, leaving the GUI as a pure immediate-mode view adhering to Data-Oriented Design. `gui_2.py` currently operates as a Monolithic God Object (3,500+ lines). It violates the Data-Oriented Design heuristic by owning complex business logic, orchestrator hooks, and markdown file building. This track extracts the core state machine and lifecycle into a headless `app_controller.py`, turning the GUI into a pure immediate-mode view.
## Architectural Constraints: The "Immediate Mode View" Contract
- **No Business Logic in View**: `gui_2.py` MUST NOT perform file I/O, AI API calls, or subprocess management directly.
- **State Ownership**: `app_controller.py` (or equivalent) owns the "Source of Truth" state.
- **Event-Driven Mutations**: The GUI must mutate state exclusively by dispatching events or calling controller methods, never by directly manipulating backend objects in the render loop.
## Functional Requirements ## Functional Requirements
- Create `app_controller.py`. - **Controller Extraction**: Create `app_controller.py` to handle all non-rendering logic.
- Migrate state variables and lifecycle methods from `gui_2.py` to the controller. - **State Migration**: Move state variables (`_tool_log`, `_comms_log`, `active_tickets`, etc.) out of `App.__init__` into the controller.
- Ensure `gui_2.py` only reads state and dispatches events. - **Logic Migration**: Move background threads, file reading/writing (`_flush_to_project`), and AI orchestrator invocations to the controller.
- **View Refactoring**: Refactor `gui_2.py` to accept the controller as a dependency and merely render its current state.
## Acceptance Criteria
- [ ] `app_controller.py` exists and owns the application state.
- [ ] `gui_2.py` has been reduced in size and complexity (no file I/O or AI calls).
- [ ] All existing features (chat, tools, tracks) function identically.
- [ ] The full test suite runs and passes against the new decoupled architecture.

@@ -1,14 +1,36 @@
# Implementation Plan: Hook API UI State # Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
## Phase 1: API Endpoint ## Phase 1: API Endpoint Implementation
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Implement `/api/gui/state` GET endpoint. - [ ] Task: Implement `/api/gui/state` GET Endpoint
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `gui_2.py` (or `app_controller.py` if decoupled), inside `create_api()`.
- [ ] WHAT: Add a FastAPI route that serializes allowed UI state variables into JSON.
- [ ] HOW: Define a set of safe keys (e.g., `_gettable_fields`) and extract them from the App instance.
- [ ] SAFETY: Use thread-safe reads or deepcopies if accessing complex dictionaries.
- [ ] Task: Update `ApiHookClient`
- [ ] WHERE: `api_hook_client.py`
- [ ] WHAT: Add a `get_gui_state(self)` method that hits the new endpoint.
- [ ] HOW: Standard `requests.get`.
- [ ] SAFETY: Include error handling/timeouts.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: API Endpoint' (Protocol in workflow.md)
## Phase 2: State Wiring ## Phase 2: State Wiring & Integration Tests
- [ ] Task: Add UI state fields to `_settable_fields`. - [ ] Task: Wire Critical UI States
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `gui_2.py`
- [ ] WHAT: Ensure fields like `ui_focus_agent`, `active_discussion`, `_track_discussion_active` are included in the exposed state.
- [ ] HOW: Update the mapping definition.
- [ ] SAFETY: None.
- [ ] Task: Write `live_gui` Integration Tests
- [ ] WHERE: `tests/test_live_gui_integration.py`
- [ ] WHAT: Add a test that changes the provider/model or focus agent via actions, then asserts `client.get_gui_state()` reflects the change.
- [ ] HOW: Use `pytest` and `live_gui` fixture.
- [ ] SAFETY: Ensure robust wait conditions for GUI updates.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: State Wiring & Tests' (Protocol in workflow.md)
## Phase 3: Integration Tests ## Phase 3: Final Validation
- [ ] Task: Write `live_gui` tests validating state retrieval. - [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] Task: Conductor - User Manual Verification 'Phase 3' - [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: Ensure the hook server gracefully stops.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)

@@ -1,9 +1,18 @@
# Track Specification: Hook API UI State Verification # Track Specification: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
## Overview ## Overview
Adds an `/api/gui/state` endpoint to expose internal UI widget states (like `ui_focus_agent`) for reliable programmatic testing without user confirmation. Currently, manual verification of UI widget state is difficult, and automated testing relies heavily on brittle logic. This track will expose internal UI widget states (like `ui_focus_agent`) via a new `/api/gui/state` GET endpoint. It wires critical UI state variables into `_settable_fields` so the `live_gui` fixture can programmatically read and assert exact widget states without requiring user confirmation dialogs.
## Architectural Constraints
- **Idempotent Reads**: The `/api/gui/state` endpoint MUST be read-only and free of side-effects.
- **Thread Safety**: Reading UI state from the HookServer thread MUST use the established locking mechanisms (e.g., querying via thread-safe proxies or safe reads of primitive types).
## Functional Requirements ## Functional Requirements
- Add `/api/gui/state` endpoint to the HookServer. - **New Endpoint**: Implement a `/api/gui/state` GET endpoint in the headless API.
- Wire UI state variables into `_settable_fields`. - **State Wiring**: Expand `_settable_fields` (or create a new `_gettable_fields` mapping) to safely expose internal UI states (combo boxes, checkbox states, active tabs).
- Write `live_gui` integration tests to assert widget states. - **Integration Testing**: Write `live_gui` based integration tests that mutate the application state and assert the correct UI state via the new endpoint.
## Acceptance Criteria
- [ ] `/api/gui/state` endpoint successfully returns JSON representing the UI state.
- [ ] Key UI variables (like `ui_focus_agent`) are queryable via the Hook Client.
- [ ] New `live_gui` integration tests exist that validate UI state retrieval.

@@ -1,10 +1,26 @@
# Implementation Plan: Robust JSON Parsing # Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
## Phase 1: Retry Logic ## Phase 1: Implementation of Retry Logic
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Implement retry loop in `conductor_tech_lead.py`. - [ ] Task: Implement Retry Loop in `generate_tickets`
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `conductor_tech_lead.py:generate_tickets`
- [ ] WHAT: Wrap the `send` and `json.loads` calls in a `for _ in range(max_retries)` loop.
- [ ] HOW: If `JSONDecodeError` is caught, append an error message to the context and loop. If it succeeds, `break` and return.
- [ ] SAFETY: Ensure token limits aren't massively breached by appending huge error states. Truncate raw output if necessary.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Implementation' (Protocol in workflow.md)
## Phase 2: Validation ## Phase 2: Unit Testing
- [ ] Task: Write unit tests simulating JSON hallucination. - [ ] Task: Write Simulation Tests for JSON Parsing
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `tests/test_conductor_tech_lead.py`
- [ ] WHAT: Add tests `test_generate_tickets_retry_success` and `test_generate_tickets_retry_failure`.
- [ ] HOW: Mock `ai_client.send` side_effect to return invalid JSON first, then valid JSON. Assert call counts.
- [ ] SAFETY: Standard pytest mocking.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Unit Testing' (Protocol in workflow.md)
## Phase 3: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest tests/test_conductor_tech_lead.py`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
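
The retry loop and its mock-based test could be sketched as below. This is an assumption-laden illustration, not the actual `generate_tickets`: the function name `generate_tickets_with_retry`, the `TicketParseError` class, the `max_echo_chars` truncation limit, and passing `send` as a parameter are all hypothetical. In this sketch `max_retries` counts total attempts.

```python
import json
from unittest.mock import Mock


class TicketParseError(Exception):
    """Raised when no attempt produced valid JSON (hypothetical error type)."""


def generate_tickets_with_retry(send, prompt, max_retries=3, max_echo_chars=2000):
    """Call `send` up to `max_retries` times until its output parses as JSON."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_retries):
        raw = send(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Truncate the echoed output so the corrective prompt stays bounded.
            echoed = raw[:max_echo_chars]
            current_prompt = (
                f"Your previous output failed to parse as JSON: {exc}. "
                f"Here was your output: {echoed}. "
                "Please fix the formatting and output ONLY valid JSON."
            )
    raise TicketParseError(
        f"No valid JSON after {max_retries} attempts: {last_error}"
    )


# One invalid response followed by a valid one, as in the planned unit test.
send = Mock(side_effect=["not json", '[{"id": "T-1"}]'])
tickets = generate_tickets_with_retry(send, "Plan the track")
```

Raising a dedicated exception instead of returning `[]` lets the caller distinguish "no tickets" from "the model never produced parseable output".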

@@ -1,9 +1,20 @@
# Track Specification: Robust JSON Parsing for Tech Lead # Track Specification: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
## Overview ## Overview
`conductor_tech_lead.py` silently fails if Tier 2 outputs invalid JSON. This track adds an auto-retry loop that feeds tracebacks back to the LLM for self-correction. In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the Tier 2 model hallucinates or outputs invalid JSON, the function silently returns an empty array `[]`, so the GUI track creation process fails without any visible error. This track adds an auto-retry loop that catches `JSONDecodeError` and feeds the traceback back to the LLM for self-correction.
## Architectural Constraints
- **Max Retries**: The retry loop MUST have a hard cap (e.g., 3 retries) to prevent infinite loops and runaway API costs.
- **Error Injection**: The error message fed back to the LLM must include the specific `JSONDecodeError` trace and the raw string it attempted to parse.
## Functional Requirements ## Functional Requirements
- Add retry loop in `generate_tickets`. - Modify `generate_tickets` in `conductor_tech_lead.py` to wrap the `ai_client.send` call in a retry loop.
- Catch `JSONDecodeError` and reprompt the model. - If `json.loads()` fails, construct a corrective prompt (e.g., "Your previous output failed to parse as JSON: {error}. Here was your output: {raw_text}. Please fix the formatting and output ONLY valid JSON.")
- Abort after N failures. - Send the corrective prompt via a new `ai_client.send` turn within the same session.
- Abort and raise a structured error if the max retry count is reached.
## Acceptance Criteria
- [ ] `generate_tickets` includes a bounded retry loop with a hard max retry cap.
- [ ] Invalid JSON responses automatically trigger a corrective reprompt to the model.
- [ ] Unit tests exist that use `unittest.mock` on the AI client to simulate 1 failure followed by 1 success, asserting the final valid parse.
- [ ] Unit tests exist simulating repeated failures hitting the retry cap.

@@ -1,18 +1,40 @@
# Implementation Plan: Strict Static Analysis & Type Safety # Implementation Plan: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
## Phase 1: Configuration & Tooling ## Phase 1: Configuration & Tooling Setup
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Configure strict `mypy.ini` and update `pyproject.toml`. - [ ] Task: Configure Strict Mypy Settings
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `pyproject.toml` or `mypy.ini`
- [ ] WHAT: Enable `strict = true`, `disallow_untyped_defs = true`, `disallow_incomplete_defs = true`.
- [ ] HOW: Modify the toml/ini config file directly.
- [ ] SAFETY: May cause a massive spike in reported errors initially.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Configuration' (Protocol in workflow.md)
## Phase 2: Core Library Typing ## Phase 2: Core Library Typing Resolution
- [ ] Task: Resolve typing in `api_hook_client.py` and models. - [ ] Task: Resolve `api_hook_client.py` and `models.py` Type Errors
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `api_hook_client.py`, `models.py`, `events.py`
- [ ] WHAT: Add explicit type hints to all function arguments, return values, and complex dictionaries. Resolve `Any` bleeding.
- [ ] HOW: Surgical type annotations (`dict[str, Any]`, `list[str]`, etc.).
- [ ] SAFETY: Do not change runtime logic, only type signatures.
- [ ] Task: Resolve Conductor Subsystem Type Errors
- [ ] WHERE: `conductor_tech_lead.py`, `dag_engine.py`, `orchestrator_pm.py`
- [ ] WHAT: Enforce strict typing on track state, tickets, and DAG models.
- [ ] HOW: Standard python typing imports.
- [ ] SAFETY: Preserve JSON serialization compatibility.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Library' (Protocol in workflow.md)
## Phase 3: GUI Typing ## Phase 3: GUI God-Object Typing Resolution
- [ ] Task: Resolve typing in `gui_2.py`. - [ ] Task: Resolve `gui_2.py` Type Errors
- [ ] Task: Conductor - User Manual Verification 'Phase 3' - [ ] WHERE: `gui_2.py`
- [ ] WHAT: Type the `App` class state variables, method signatures, and ImGui integration boundaries.
- [ ] HOW: Use `# type: ignore[import]` only for the ImGui C bindings, and only if strictly necessary; type internal state tightly.
- [ ] SAFETY: Ensure `live_gui` tests pass after typing.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Typing' (Protocol in workflow.md)
## Phase 4: CI Integration ## Phase 4: CI Integration & Final Validation
- [ ] Task: Implement pre-commit hooks for ruff and mypy. - [ ] Task: Establish Pre-Commit Guardrails
- [ ] Task: Conductor - User Manual Verification 'Phase 4' - [ ] WHERE: `.git/hooks/pre-commit` or a `scripts/validate_types.ps1`
- [ ] WHAT: Create a script that runs ruff and mypy, blocking commits if they fail.
- [ ] HOW: Standard shell scripting.
- [ ] SAFETY: Ensure it works cross-platform (Windows/Linux).
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation' (Protocol in workflow.md)

@@ -1,10 +1,21 @@
# Track Specification: Strict Static Analysis & Type Safety # Track Specification: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
## Overview ## Overview
The codebase suffers from massive type-safety debt (512+ mypy errors). This track resolves all violations, enforces strict typing across `gui_2.py` and `api_hook_client.py`, and integrates pre-commit checks. The codebase currently suffers from massive type-safety debt (512+ `mypy` errors across 64 files) and lingering `ruff` violations. This track will harden the foundation by resolving all violations, enforcing strict typing (especially in `gui_2.py` and `api_hook_client.py`), and integrating pre-commit checks. This is a prerequisite for safe AI-driven refactoring.
## Architectural Constraints: The "Strict Typing Contract"
- **No Implicit Any**: Variables and function returns must have explicit types.
- **No Ignored Errors**: Do not use `# type: ignore` unless absolutely unavoidable (e.g., for poorly typed third-party C bindings). If used, it must include a specific error code.
- **Strict Optionals**: All optional types must be explicitly defined (e.g., `str | None`).
## Functional Requirements ## Functional Requirements
- Resolve all mypy errors. - **Mypy Resolution**: Fix all 512+ existing `mypy` errors.
- Resolve all remaining ruff violations. - **Ruff Resolution**: Fix all remaining `ruff` linting violations.
- Enforce strict typing. - **Configuration**: Update `pyproject.toml` or `mypy.ini` to enforce strict type checking globally.
- Add CI/pre-commit hook for linting. - **CI/Automation**: Implement a pre-commit hook or script (`scripts/check_hints.py` equivalent) to block untyped code.
## Acceptance Criteria
- [ ] `uv run mypy --strict .` returns 0 errors.
- [ ] `uv run ruff check .` returns 0 violations.
- [ ] No new `# type: ignore` comments are added without justification.
- [ ] Pre-commit hook or validation script is documented and active.
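
One cross-platform way to run the two acceptance commands is a small Python script rather than a shell or PowerShell hook. The sketch below is only an assumption about how such a script could be structured: `CHECKS`, `run_checks`, and `main` are hypothetical names, and the `runner` parameter exists so the logic can be exercised without invoking the real tools.

```python
import subprocess
import sys

# The two commands named in the acceptance criteria above.
CHECKS = [
    ["uv", "run", "ruff", "check", "."],
    ["uv", "run", "mypy", "--strict", "."],
]


def run_checks(checks=CHECKS, runner=subprocess.call):
    """Run every check and return the commands that exited non-zero."""
    return [cmd for cmd in checks if runner(cmd) != 0]


def main():
    """Return 1 if any check failed, so a pre-commit hook blocks the commit."""
    failed = run_checks()
    for cmd in failed:
        print("FAILED:", " ".join(cmd), file=sys.stderr)
    return 1 if failed else 0
```

A hook would call `sys.exit(main())`; running every check before reporting, rather than stopping at the first failure, shows the developer all problems in one pass.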

@@ -1,14 +1,36 @@
# Implementation Plan: Test Suite Performance # Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
## Phase 1: Audit & Polling Primitives ## Phase 1: Audit & Polling Primitives
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator` - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Create deterministic polling primitives in `conftest.py`. - [ ] Task: Create Deterministic Polling Primitives
- [ ] Task: Conductor - User Manual Verification 'Phase 1' - [ ] WHERE: `tests/conftest.py`
- [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
- [ ] HOW: Standard while loop that evaluates `predicate_fn()`.
- [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)
## Phase 2: Refactoring Sleeps ## Phase 2: Refactoring Integration Tests
- [ ] Task: Replace `time.sleep` across integration tests. - [ ] Task: Refactor `test_spawn_interception.py`
- [ ] Task: Conductor - User Manual Verification 'Phase 2' - [ ] WHERE: `tests/test_spawn_interception.py`
- [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
- [ ] HOW: Use the new `conftest.py` utility.
- [ ] SAFETY: Prevent event loop deadlocks.
- [ ] Task: Refactor Simulation Waits
- [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
- [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
- [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
- [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)
## Phase 3: Test Marking ## Phase 3: Test Marking & Final Validation
- [ ] Task: Apply `@pytest.mark.slow` to long-running tests. - [ ] Task: Apply Slow Test Marks
- [ ] Task: Conductor - User Manual Verification 'Phase 3' - [ ] WHERE: Across all `tests/`
- [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
- [ ] HOW: Import pytest and apply the decorator.
- [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
- [ ] Task: Full Suite Performance Validation
- [ ] WHERE: Project root
- [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
- [ ] HOW: Time the terminal command.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
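
A minimal sketch of the `wait_until` primitive from Phase 1, using the signature given in the plan. The timeout message and the use of `time.monotonic()` are choices made for this illustration, and the `threading.Timer` at the end only simulates a background state change.

```python
import threading
import time


def wait_until(predicate_fn, timeout=5.0, interval=0.05):
    """Poll `predicate_fn` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate_fn():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        # Short sleep between polls; the test proceeds as soon as state is ready.
        time.sleep(interval)


# Simulated background work that flips a flag shortly after starting.
ready = threading.Event()
timer = threading.Timer(0.05, ready.set)
timer.start()
wait_until(ready.is_set, timeout=2.0)
timer.join()
```

The predicate is checked once before the deadline test, so a condition that is already true returns immediately even with a zero timeout.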

@@ -1,9 +1,19 @@
# Track Specification: Test Suite Performance & Flakiness # Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
## Overview ## Overview
The test suite is slow and flaky due to `time.sleep()`. This track replaces sleeps with deterministic polling (`threading.Event()`), aiming for a <10s core TDD loop. The test suite currently takes over 5 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or time out. This track replaces arbitrary `time.sleep()` calls with deterministic waits (state polling or `threading.Event`), aiming to drive the core TDD test execution time down to under 10 seconds.
## Architectural Constraints
- **Zero Arbitrary Sleeps**: `time.sleep(1.0)` is banned in test files unless testing actual rate-limiting or debounce functionality.
- **Deterministic Waits**: Tests must use state-polling (with aggressive micro-sleeps) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
## Functional Requirements ## Functional Requirements
- Audit and remove `time.sleep()` in tests. - Audit all `tests/` and `simulation/` files for `time.sleep()`.
- Implement deterministic event polling. - Implement polling helper functions in `conftest.py` (e.g., `wait_until(condition_func, timeout)`).
- Mark slow integration tests with `@pytest.mark.slow`. - Refactor all integration tests to use the deterministic polling helpers.
- Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
## Acceptance Criteria
- [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
- [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
- [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.