chore(conductor): Enhance all 6 backlog tracks to Surgical Spec Protocol

2026-03-02 22:38:02 -05:00
parent 2f4dca719f
commit 2e73212abd
12 changed files with 286 additions and 89 deletions

@@ -1,10 +1,31 @@
# Implementation Plan: Concurrent Tier Isolation
# Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
## Phase 1: Thread-Local Storage
## Phase 1: Thread-Local Context Refactoring
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Replace `current_tier` with `threading.local()`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Refactor `ai_client` to `threading.local()`
- [ ] WHERE: `ai_client.py`
- [ ] WHAT: Replace `current_tier = None` with `_local_context = threading.local()`. Implement safe getters/setters for the tier.
- [ ] HOW: Use standard `threading.local` attributes.
- [ ] SAFETY: Provide defaults (e.g., `getattr(_local_context, 'tier', None)`) so uninitialized threads don't crash.
- [ ] Task: Update Lifecycle Callers
- [ ] WHERE: `multi_agent_conductor.py`, `conductor_tech_lead.py`
- [ ] WHAT: Update how they set the current tier around `send()` calls.
- [ ] HOW: Use the new setter/getter functions from `ai_client`.
- [ ] SAFETY: Ensure `finally` blocks clean up the thread-local state.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Refactoring' (Protocol in workflow.md)
## Phase 2: Refactor & Test
- [ ] Task: Update loggers and test with mock concurrent threads.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: Testing Concurrency
- [ ] Task: Write Concurrent Execution Test
- [ ] WHERE: `tests/test_ai_client_concurrency.py` (New)
- [ ] WHAT: Spawn two threads. Thread A sets Tier 3 and calls a mock `send`. Thread B sets Tier 4 and calls mock `send`.
- [ ] HOW: Assert that the resulting `comms_log` correctly maps the entries to Tier 3 and Tier 4 respectively without race condition overwrites.
- [ ] SAFETY: Use `threading.Barrier` to force race conditions in the test to ensure the isolation holds.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Testing Concurrency' (Protocol in workflow.md)
## Phase 3: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
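The Phase 1 refactor could look roughly like the sketch below. It is not the actual `ai_client.py`: the helper names `get_current_tier`, `set_current_tier`, and `tier_context` are hypothetical, and only `_local_context` and the `getattr` default are taken from the plan.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional

# Per-thread storage that replaces the module-level `current_tier = None`.
_local_context = threading.local()


def get_current_tier() -> Optional[str]:
    """Return the calling thread's tier, or None if it never set one."""
    return getattr(_local_context, "tier", None)


def set_current_tier(tier: Optional[str]) -> None:
    """Set the tier for the calling thread only."""
    _local_context.tier = tier


@contextmanager
def tier_context(tier: str) -> Iterator[None]:
    """Hypothetical helper: set the tier for a block, then restore the old value."""
    previous = get_current_tier()
    set_current_tier(tier)
    try:
        yield
    finally:
        # Runs even if the wrapped send() raises, so no stale tier leaks out.
        set_current_tier(previous)
```

A caller such as `run_worker_lifecycle` could then wrap its `send()` call in `with tier_context("Tier 3"):`, which covers the `finally` cleanup requirement without repeating it at each call site.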

@@ -1,8 +1,18 @@
# Track Specification: Concurrent Tier Source Isolation
# Track Specification: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
## Overview
Prepares the architecture for parallel Tier 3/4 agents by replacing the global `ai_client.current_tier` with thread-safe `threading.local()` or explicit call signatures.
Currently, `ai_client.current_tier` is a module-level `str | None`. This works safely only because the MMA engine serializes `ai_client.send()` calls. To prepare the architecture for parallel agents (e.g., executing multiple Tier 3 worker tickets concurrently), this global state must be replaced. This track will refactor the tagging system to use thread-safe context.
## Architectural Constraints
- **Thread Safety**: The solution MUST guarantee that if two threads call `ai_client.send()` simultaneously, their `source_tier` logs do not cross-contaminate.
- **API Surface**: Prefer passing `source_tier` explicitly in the `send()` method signature over implicit global/local state to ensure functional purity, OR use strictly isolated `threading.local()`.
## Functional Requirements
- Refactor `current_tier` to be thread-safe.
- Update all logging calls to use the thread-safe context.
- Refactor `ai_client.py` to remove the global `current_tier` variable.
- Update `run_worker_lifecycle` and `generate_tickets` to pass the tier context directly to the AI client or into a `threading.local` context block.
- Update `_append_comms` and `_append_tool_log` to utilize the thread-safe context.
## Acceptance Criteria
- [ ] `ai_client.current_tier` global variable is removed.
- [ ] `source_tier` tagging in `_comms_log` and `_tool_log` continues to function accurately.
- [ ] Tests simulate concurrent `send()` calls from different threads and assert correct log tagging without race conditions.

@@ -1,18 +1,49 @@
# Implementation Plan: GUI Decoupling
# Implementation Plan: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
## Phase 1: Controller Skeleton
## Phase 1: Controller Skeleton & State Migration
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Create `app_controller.py`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Create `app_controller.py` Skeleton
- [ ] WHERE: `app_controller.py` (New file)
- [ ] WHAT: Create the `AppController` class. Initialize basic state structures (logs, metrics, flags).
- [ ] HOW: Standard class definition.
- [ ] SAFETY: Do not break existing GUI yet.
- [ ] Task: Migrate Data State from GUI
- [ ] WHERE: `gui_2.py:__init__` and `app_controller.py`
- [ ] WHAT: Move variables like `_comms_log`, `_tool_log`, `mma_streams`, `active_tickets` to the controller.
- [ ] HOW: Update GUI to reference `self.controller.mma_streams` instead of `self.mma_streams`.
- [ ] SAFETY: Search and replace carefully; use `py_check_syntax`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: State Migration' (Protocol in workflow.md)
## Phase 2: State Migration
- [ ] Task: Move App state from `gui_2.py` to controller.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: Logic & Background Thread Migration
- [ ] Task: Extract Background Threads & Event Queue
- [ ] WHERE: `gui_2.py` (e.g., `_init_ai_and_hooks`, `_process_event_queue`)
- [ ] WHAT: Move the `AsyncEventQueue`, asyncio worker thread, and HookServer initialization to the controller.
- [ ] HOW: The GUI should just call `self.controller.start_services()` and read the `_pending_gui_tasks` queue.
- [ ] SAFETY: Thread lifecycle management is critical. Ensure shutdown hooks are migrated.
- [ ] Task: Extract I/O and AI Methods
- [ ] WHERE: `gui_2.py` (`_cb_plan_epic`, `_flush_to_project`, `_cb_create_track`)
- [ ] WHAT: Move business logic methods to the controller.
- [ ] HOW: GUI callbacks simply become `lambda: self.controller.plan_epic(input)`.
- [ ] SAFETY: Verify Hook API endpoints still work.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Logic Migration' (Protocol in workflow.md)
## Phase 3: Logic Migration
- [ ] Task: Move non-rendering methods to controller.
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
## Phase 3: Test Suite Refactoring
- [ ] Task: Update `conftest.py` Fixtures
- [ ] WHERE: `tests/conftest.py`
- [ ] WHAT: Update `app_instance` fixture to mock/initialize the `AppController` instead of just `App`.
- [ ] HOW: Adjust `patch` targets to hit `app_controller.py` where appropriate.
- [ ] SAFETY: Run subset of tests continuously to fix import breaks.
- [ ] Task: Resolve Broken GUI Tests
- [ ] WHERE: `tests/test_gui_*.py`
- [ ] WHAT: Update test assertions that look for state on `app_instance` to look at `app_instance.controller`.
- [ ] HOW: Surgical string replacements.
- [ ] SAFETY: Ensure no false-positives.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Test Suite Refactoring' (Protocol in workflow.md)
## Phase 4: Validation
- [ ] Task: Update all tests to mock/use the controller.
- [ ] Task: Conductor - User Manual Verification 'Phase 4'
## Phase 4: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: Watch out for lingering thread closure issues.
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Validation' (Protocol in workflow.md)

@@ -1,9 +1,21 @@
# Track Specification: GUI Decoupling & Controller Architecture
# Track Specification: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
## Overview
`gui_2.py` is a monolithic God Object. This track extracts its business logic and state machine into `app_controller.py`, leaving the GUI as a pure immediate-mode view adhering to Data-Oriented Design.
`gui_2.py` currently operates as a Monolithic God Object (3,500+ lines). It violates the Data-Oriented Design heuristic by owning complex business logic, orchestrator hooks, and markdown file building. This track extracts the core state machine and lifecycle into a headless `app_controller.py`, turning the GUI into a pure immediate-mode view.
## Architectural Constraints: The "Immediate Mode View" Contract
- **No Business Logic in View**: `gui_2.py` MUST NOT perform file I/O, AI API calls, or subprocess management directly.
- **State Ownership**: `app_controller.py` (or equivalent) owns the "Source of Truth" state.
- **Event-Driven Mutations**: The GUI must mutate state exclusively by dispatching events or calling controller methods, never by directly manipulating backend objects in the render loop.
## Functional Requirements
- Create `app_controller.py`.
- Migrate state variables and lifecycle methods from `gui_2.py` to the controller.
- Ensure `gui_2.py` only reads state and dispatches events.
- **Controller Extraction**: Create `app_controller.py` to handle all non-rendering logic.
- **State Migration**: Move state variables (`_tool_log`, `_comms_log`, `active_tickets`, etc.) out of `App.__init__` into the controller.
- **Logic Migration**: Move background threads, file reading/writing (`_flush_to_project`), and AI orchestrator invocations to the controller.
- **View Refactoring**: Refactor `gui_2.py` to accept the controller as a dependency and merely render its current state.
## Acceptance Criteria
- [ ] `app_controller.py` exists and owns the application state.
- [ ] `gui_2.py` has been reduced in size and complexity (no file I/O or AI calls).
- [ ] All existing features (chat, tools, tracks) function identically.
- [ ] The full test suite runs and passes against the new decoupled architecture.
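One possible shape for the controller/view split is sketched below. The state names (`_comms_log`, `_tool_log`, `mma_streams`, `active_tickets`, `_pending_gui_tasks`) come from the plan; the method names `append_comms`, `comms_snapshot`, and `render_lines`, the element types, and the lock are assumptions made for illustration.

```python
import queue
import threading
from typing import Any, Callable


class AppController:
    """Headless owner of application state; performs no rendering."""

    def __init__(self) -> None:
        self._comms_log: list[dict[str, Any]] = []
        self._tool_log: list[dict[str, Any]] = []
        self.mma_streams: dict[str, Any] = {}
        self.active_tickets: list[dict[str, Any]] = []
        # Work handed to the GUI thread; the view drains this each frame.
        self._pending_gui_tasks: "queue.Queue[Callable[[], None]]" = queue.Queue()
        self._lock = threading.Lock()

    def append_comms(self, entry: dict[str, Any]) -> None:
        """Mutation goes through the controller, never through the view."""
        with self._lock:
            self._comms_log.append(entry)

    def comms_snapshot(self) -> list[dict[str, Any]]:
        """Return a shallow copy so the render loop never iterates live state."""
        with self._lock:
            return list(self._comms_log)


class App:
    """View: receives the controller as a dependency and only reads from it."""

    def __init__(self, controller: AppController) -> None:
        self.controller = controller

    def render_lines(self) -> list[str]:
        """Hypothetical stand-in for an immediate-mode render pass."""
        return [
            f"[{e['source_tier']}] {e['message']}"
            for e in self.controller.comms_snapshot()
        ]
```

Because the view holds no state of its own, tests can construct an `AppController` directly and assert against it without booting a GUI.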

@@ -1,14 +1,36 @@
# Implementation Plan: Hook API UI State
# Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
## Phase 1: API Endpoint
## Phase 1: API Endpoint Implementation
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Implement `/api/gui/state` GET endpoint.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Implement `/api/gui/state` GET Endpoint
- [ ] WHERE: `gui_2.py` (or `app_controller.py` if decoupled), inside `create_api()`.
- [ ] WHAT: Add a FastAPI route that serializes allowed UI state variables into JSON.
- [ ] HOW: Define a set of safe keys (e.g., `_gettable_fields`) and extract them from the App instance.
- [ ] SAFETY: Use thread-safe reads or deepcopies if accessing complex dictionaries.
- [ ] Task: Update `ApiHookClient`
- [ ] WHERE: `api_hook_client.py`
- [ ] WHAT: Add a `get_gui_state(self)` method that hits the new endpoint.
- [ ] HOW: Standard `requests.get`.
- [ ] SAFETY: Include error handling/timeouts.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: API Endpoint' (Protocol in workflow.md)
## Phase 2: State Wiring
- [ ] Task: Add UI state fields to `_settable_fields`.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: State Wiring & Integration Tests
- [ ] Task: Wire Critical UI States
- [ ] WHERE: `gui_2.py`
- [ ] WHAT: Ensure fields like `ui_focus_agent`, `active_discussion`, `_track_discussion_active` are included in the exposed state.
- [ ] HOW: Update the mapping definition.
- [ ] SAFETY: None.
- [ ] Task: Write `live_gui` Integration Tests
- [ ] WHERE: `tests/test_live_gui_integration.py`
- [ ] WHAT: Add a test that changes the provider/model or focus agent via actions, then asserts `client.get_gui_state()` reflects the change.
- [ ] HOW: Use `pytest` and `live_gui` fixture.
- [ ] SAFETY: Ensure robust wait conditions for GUI updates.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: State Wiring & Tests' (Protocol in workflow.md)
## Phase 3: Integration Tests
- [ ] Task: Write `live_gui` tests validating state retrieval.
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
## Phase 3: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: Ensure the hook server gracefully stops.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
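The serialization step behind `/api/gui/state` might be sketched as below, leaving the FastAPI route itself out. The function name `snapshot_gui_state`, the explicit `lock` parameter, and the choice to report missing fields as `None` are assumptions; the field names are the ones listed in Phase 2.

```python
import copy
import json
import threading
from typing import Any

# Allow-list of readable fields; anything not named here is never exposed.
_gettable_fields = frozenset(
    {"ui_focus_agent", "active_discussion", "_track_discussion_active"}
)


def snapshot_gui_state(app: Any, lock: threading.Lock) -> dict[str, Any]:
    """Return a detached copy of the allowed UI fields.

    Deep-copying under the lock means the HookServer thread never serializes
    a dictionary that the GUI thread is mutating at the same time.
    """
    with lock:
        return {
            name: copy.deepcopy(getattr(app, name, None))
            for name in sorted(_gettable_fields)
        }
```

The route handler would then return this dictionary as the JSON response body, and `ApiHookClient.get_gui_state` would decode it on the other side.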

@@ -1,9 +1,18 @@
# Track Specification: Hook API UI State Verification
# Track Specification: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
## Overview
Adds an `/api/gui/state` endpoint to expose internal UI widget states (like `ui_focus_agent`) for reliable programmatic testing without user confirmation.
Currently, manual verification of UI widget state is difficult, and automated testing relies heavily on brittle logic. This track will expose internal UI widget states (like `ui_focus_agent`) via a new `/api/gui/state` GET endpoint. It wires critical UI state variables into `_settable_fields` (or a new `_gettable_fields` mapping) so the `live_gui` fixture can programmatically read and assert exact widget states without requiring user confirmation dialogs.
## Architectural Constraints
- **Idempotent Reads**: The `/api/gui/state` endpoint MUST be read-only and free of side-effects.
- **Thread Safety**: Reading UI state from the HookServer thread MUST use the established locking mechanisms (e.g., querying via thread-safe proxies or safe reads of primitive types).
## Functional Requirements
- Add `/api/gui/state` endpoint to the HookServer.
- Wire UI state variables into `_settable_fields`.
- Write `live_gui` integration tests to assert widget states.
- **New Endpoint**: Implement a `/api/gui/state` GET endpoint in the headless API.
- **State Wiring**: Expand `_settable_fields` (or create a new `_gettable_fields` mapping) to safely expose internal UI states (combo boxes, checkbox states, active tabs).
- **Integration Testing**: Write `live_gui` based integration tests that mutate the application state and assert the correct UI state via the new endpoint.
## Acceptance Criteria
- [ ] `/api/gui/state` endpoint successfully returns JSON representing the UI state.
- [ ] Key UI variables (like `ui_focus_agent`) are queryable via the Hook Client.
- [ ] New `live_gui` integration tests exist that validate UI state retrieval.

@@ -1,10 +1,26 @@
# Implementation Plan: Robust JSON Parsing
# Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
## Phase 1: Retry Logic
## Phase 1: Implementation of Retry Logic
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Implement retry loop in `conductor_tech_lead.py`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Implement Retry Loop in `generate_tickets`
- [ ] WHERE: `conductor_tech_lead.py:generate_tickets`
- [ ] WHAT: Wrap the `send` and `json.loads` calls in a `for _ in range(max_retries)` loop.
- [ ] HOW: If `JSONDecodeError` is caught, append an error message to the context and loop. If it succeeds, `break` and return.
- [ ] SAFETY: Ensure token limits aren't massively breached by appending huge error states. Truncate raw output if necessary.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Implementation' (Protocol in workflow.md)
## Phase 2: Validation
- [ ] Task: Write unit tests simulating JSON hallucination.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: Unit Testing
- [ ] Task: Write Simulation Tests for JSON Parsing
- [ ] WHERE: `tests/test_conductor_tech_lead.py`
- [ ] WHAT: Add tests `test_generate_tickets_retry_success` and `test_generate_tickets_retry_failure`.
- [ ] HOW: Mock `ai_client.send` side_effect to return invalid JSON first, then valid JSON. Assert call counts.
- [ ] SAFETY: Standard pytest mocking.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Unit Testing' (Protocol in workflow.md)
## Phase 3: Final Validation
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] WHERE: Project root
- [ ] WHAT: `uv run pytest tests/test_conductor_tech_lead.py`
- [ ] HOW: Ensure 100% pass rate.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)

@@ -1,9 +1,20 @@
# Track Specification: Robust JSON Parsing for Tech Lead
# Track Specification: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
## Overview
`conductor_tech_lead.py` silently fails if Tier 2 outputs invalid JSON. This track adds an auto-retry loop that feeds tracebacks back to the LLM for self-correction.
In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the Tier 2 model hallucinates or outputs invalid JSON, it silently returns an empty array `[]`, causing the GUI track creation process to fail silently. This track adds an auto-retry loop that catches `JSONDecodeError` and feeds the traceback back to the LLM for self-correction.
## Architectural Constraints
- **Max Retries**: The retry loop MUST have a hard cap (e.g., 3 retries) to prevent infinite loops and runaway API costs.
- **Error Injection**: The error message fed back to the LLM must include the specific `JSONDecodeError` trace and the raw string it attempted to parse.
## Functional Requirements
- Add retry loop in `generate_tickets`.
- Catch `JSONDecodeError` and reprompt the model.
- Abort after N failures.
- Modify `generate_tickets` in `conductor_tech_lead.py` to wrap the `ai_client.send` call in a retry loop.
- If `json.loads()` fails, construct a corrective prompt (e.g., "Your previous output failed to parse as JSON: {error}. Here was your output: {raw_text}. Please fix the formatting and output ONLY valid JSON.")
- Send the corrective prompt via a new `ai_client.send` turn within the same session.
- Abort and raise a structured error if the max retry count is reached.
## Acceptance Criteria
- [ ] `generate_tickets` includes a bounded retry loop (`for` or `while`) with a hard max retry cap.
- [ ] Invalid JSON responses automatically trigger a corrective reprompt to the model.
- [ ] Unit tests exist that use `unittest.mock` on the AI client to simulate 1 failure followed by 1 success, asserting the final valid parse.
- [ ] Unit tests exist simulating repeated failures hitting the retry cap.
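The retry behaviour described in this spec could be sketched as follows. This is not the real `generate_tickets`: the function name, the injected `send` callable, the `TicketParseError` class, and the `max_echo_chars` truncation limit are all hypothetical. Here `max_retries` counts total attempts, which is one possible reading of the cap.

```python
import json
from typing import Any, Callable, Optional


class TicketParseError(RuntimeError):
    """Raised when no attempt produced valid JSON (hypothetical name)."""


def generate_tickets_with_retry(
    send: Callable[[str], str],
    prompt: str,
    max_retries: int = 3,
    max_echo_chars: int = 2000,
) -> Any:
    """Call `send` until its output parses as JSON or the cap is reached."""
    current_prompt = prompt
    last_error: Optional[json.JSONDecodeError] = None
    for _ in range(max_retries):
        raw = send(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Truncate the echoed output so the corrective prompt stays small.
            truncated = raw[:max_echo_chars]
            current_prompt = (
                f"Your previous output failed to parse as JSON: {exc}. "
                f"Here was your output: {truncated}. "
                "Please fix the formatting and output ONLY valid JSON."
            )
    raise TicketParseError(
        f"No valid JSON after {max_retries} attempts"
    ) from last_error
```

Raising instead of returning `[]` is what turns the silent failure into one the GUI can surface; passing `send` in as a parameter also keeps the unit tests free of patching.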

@@ -1,18 +1,40 @@
# Implementation Plan: Strict Static Analysis & Type Safety
# Implementation Plan: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
## Phase 1: Configuration & Tooling
## Phase 1: Configuration & Tooling Setup
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Configure strict `mypy.ini` and update `pyproject.toml`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Configure Strict Mypy Settings
- [ ] WHERE: `pyproject.toml` or `mypy.ini`
- [ ] WHAT: Enable `strict = true`, `disallow_untyped_defs = true`, `disallow_incomplete_defs = true`.
- [ ] HOW: Modify the toml/ini config file directly.
- [ ] SAFETY: May cause a massive spike in reported errors initially.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Configuration' (Protocol in workflow.md)
## Phase 2: Core Library Typing
- [ ] Task: Resolve typing in `api_hook_client.py` and models.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: Core Library Typing Resolution
- [ ] Task: Resolve `api_hook_client.py` and `models.py` Type Errors
- [ ] WHERE: `api_hook_client.py`, `models.py`, `events.py`
- [ ] WHAT: Add explicit type hints to all function arguments, return values, and complex dictionaries. Resolve `Any` bleeding.
- [ ] HOW: Surgical type annotations (`dict[str, Any]`, `list[str]`, etc.).
- [ ] SAFETY: Do not change runtime logic, only type signatures.
- [ ] Task: Resolve Conductor Subsystem Type Errors
- [ ] WHERE: `conductor_tech_lead.py`, `dag_engine.py`, `orchestrator_pm.py`
- [ ] WHAT: Enforce strict typing on track state, tickets, and DAG models.
- [ ] HOW: Standard python typing imports.
- [ ] SAFETY: Preserve JSON serialization compatibility.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Library' (Protocol in workflow.md)
## Phase 3: GUI Typing
- [ ] Task: Resolve typing in `gui_2.py`.
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
## Phase 3: GUI God-Object Typing Resolution
- [ ] Task: Resolve `gui_2.py` Type Errors
- [ ] WHERE: `gui_2.py`
- [ ] WHAT: Type the `App` class state variables, method signatures, and ImGui integration boundaries.
- [ ] HOW: Use `type: ignore[import]` only for ImGui C-bindings if strictly necessary, but type internal state tightly.
- [ ] SAFETY: Ensure `live_gui` tests pass after typing.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Typing' (Protocol in workflow.md)
## Phase 4: CI Integration
- [ ] Task: Implement pre-commit hooks for ruff and mypy.
- [ ] Task: Conductor - User Manual Verification 'Phase 4'
## Phase 4: CI Integration & Final Validation
- [ ] Task: Establish Pre-Commit Guardrails
- [ ] WHERE: `.git/hooks/pre-commit` or a `scripts/validate_types.ps1`
- [ ] WHAT: Create a script that runs ruff and mypy, blocking commits if they fail.
- [ ] HOW: Standard shell scripting.
- [ ] SAFETY: Ensure it works cross-platform (Windows/Linux).
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation' (Protocol in workflow.md)

@@ -1,10 +1,21 @@
# Track Specification: Strict Static Analysis & Type Safety
# Track Specification: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
## Overview
The codebase suffers from massive type-safety debt (512+ mypy errors). This track resolves all violations, enforces strict typing across `gui_2.py` and `api_hook_client.py`, and integrates pre-commit checks.
The codebase currently suffers from massive type-safety debt (512+ `mypy` errors across 64 files) and lingering `ruff` violations. This track will harden the foundation by resolving all violations, enforcing strict typing (especially in `gui_2.py` and `api_hook_client.py`), and integrating pre-commit checks. This is a prerequisite for safe AI-driven refactoring.
## Architectural Constraints: The "Strict Typing Contract"
- **No Implicit Any**: Variables and function returns must have explicit types.
- **No Ignored Errors**: Do not use `# type: ignore` unless absolutely unavoidable (e.g., for poorly typed third-party C bindings). If used, it must include a specific error code.
- **Strict Optionals**: All optional types must be explicitly defined (e.g., `str | None`).
## Functional Requirements
- Resolve all mypy errors.
- Resolve all remaining ruff violations.
- Enforce strict typing.
- Add CI/pre-commit hook for linting.
- **Mypy Resolution**: Fix all 512+ existing `mypy` errors.
- **Ruff Resolution**: Fix all remaining `ruff` linting violations.
- **Configuration**: Update `pyproject.toml` or `mypy.ini` to enforce strict type checking globally.
- **CI/Automation**: Implement a pre-commit hook or script (`scripts/check_hints.py` equivalent) to block untyped code.
## Acceptance Criteria
- [ ] `uv run mypy --strict .` returns 0 errors.
- [ ] `uv run ruff check .` returns 0 violations.
- [ ] No new `# type: ignore` comments are added without justification.
- [ ] Pre-commit hook or validation script is documented and active.
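A cross-platform validation script could be written in Python rather than shell or PowerShell. The sketch below is one way to do it, assuming the two commands from the acceptance criteria; the names `CHECKS`, `run_checks`, and the injectable `runner` are hypothetical.

```python
import subprocess
import sys
from typing import Callable, Sequence, Tuple

# Commands taken from the acceptance criteria; adjust paths as needed.
CHECKS: Tuple[Tuple[str, Sequence[str]], ...] = (
    ("ruff", ("uv", "run", "ruff", "check", ".")),
    ("mypy", ("uv", "run", "mypy", "--strict", ".")),
)


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Tuple[str, Sequence[str]]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run every check (no short-circuit) and return 1 if any failed."""
    failed = [name for name, cmd in checks if runner(cmd) != 0]
    for name in failed:
        print(f"{name} failed", file=sys.stderr)
    return 1 if failed else 0
```

A pre-commit hook would end with `sys.exit(run_checks())`, so a non-zero exit code blocks the commit. Running both tools before reporting means a developer sees ruff and mypy failures in a single pass.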

@@ -1,14 +1,36 @@
# Implementation Plan: Test Suite Performance
# Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
## Phase 1: Audit & Polling Primitives
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Create deterministic polling primitives in `conftest.py`.
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
- [ ] Task: Create Deterministic Polling Primitives
- [ ] WHERE: `tests/conftest.py`
- [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
- [ ] HOW: Standard while loop that evaluates `predicate_fn()`.
- [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)
## Phase 2: Refactoring Sleeps
- [ ] Task: Replace `time.sleep` across integration tests.
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
## Phase 2: Refactoring Integration Tests
- [ ] Task: Refactor `test_spawn_interception.py`
- [ ] WHERE: `tests/test_spawn_interception.py`
- [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
- [ ] HOW: Use the new `conftest.py` utility.
- [ ] SAFETY: Prevent event loop deadlocks.
- [ ] Task: Refactor Simulation Waits
- [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
- [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
- [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
- [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)
## Phase 3: Test Marking
- [ ] Task: Apply `@pytest.mark.slow` to long-running tests.
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
## Phase 3: Test Marking & Final Validation
- [ ] Task: Apply Slow Test Marks
- [ ] WHERE: Across all `tests/`
- [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
- [ ] HOW: Import pytest and apply the decorator.
- [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
- [ ] Task: Full Suite Performance Validation
- [ ] WHERE: Project root
- [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
- [ ] HOW: Time the terminal command.
- [ ] SAFETY: None.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
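The polling primitive from Phase 1 might look like this. The signature is the one given in the plan; the body is a minimal sketch, and the choice of `time.monotonic()` for the deadline is an assumption.

```python
import time
from typing import Callable


def wait_until(
    predicate_fn: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Block until predicate_fn() is truthy, or raise TimeoutError."""
    # monotonic() is unaffected by wall-clock adjustments during the wait.
    deadline = time.monotonic() + timeout
    while True:
        if predicate_fn():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        # Short sleep between checks; this is the only sleep the test needs.
        time.sleep(interval)
```

A test that previously called `time.sleep(1.0)` could instead call `wait_until(lambda: some_condition())`, where `some_condition` stands for whatever state the test is waiting on, and proceed as soon as that state is reached.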

@@ -1,9 +1,19 @@
# Track Specification: Test Suite Performance & Flakiness
# Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
## Overview
The test suite is slow and flaky due to `time.sleep()`. This track replaces sleeps with deterministic polling (`threading.Event()`), aiming for a <10s core TDD loop.
The test suite currently takes over 5 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or time out. This track replaces arbitrary `time.sleep()` calls with deterministic waits (state polling or `threading.Event()`), aiming to drive the core TDD test execution time down to under 10 seconds.
## Architectural Constraints
- **Zero Arbitrary Sleeps**: `time.sleep(1.0)` is banned in test files unless testing actual rate-limiting or debounce functionality.
- **Deterministic Waits**: Tests must use state-polling (with aggressive micro-sleeps) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
## Functional Requirements
- Audit and remove `time.sleep()` in tests.
- Implement deterministic event polling.
- Mark slow integration tests with `@pytest.mark.slow`.
- Audit all `tests/` and `simulation/` files for `time.sleep()`.
- Implement polling helper functions in `conftest.py` (e.g., `wait_until(condition_func, timeout)`).
- Refactor all integration tests to use the deterministic polling helpers.
- Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
## Acceptance Criteria
- [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
- [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
- [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.