chore(conductor): Enhance all 6 backlog tracks to Surgical Spec Protocol
@@ -1,10 +1,31 @@
|
|||||||
# Implementation Plan: Concurrent Tier Isolation
|
# Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
|
||||||
|
|
||||||
## Phase 1: Thread-Local Storage
|
## Phase 1: Thread-Local Context Refactoring
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Replace `current_tier` with `threading.local()`.
|
- [ ] Task: Refactor `ai_client` to `threading.local()`
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `ai_client.py`
|
||||||
|
- [ ] WHAT: Replace `current_tier = None` with `_local_context = threading.local()`. Implement safe getters/setters for the tier.
|
||||||
|
- [ ] HOW: Use standard `threading.local` attributes.
|
||||||
|
- [ ] SAFETY: Provide defaults (e.g., `getattr(_local_context, 'tier', None)`) so uninitialized threads don't crash.
|
||||||
|
- [ ] Task: Update Lifecycle Callers
|
||||||
|
- [ ] WHERE: `multi_agent_conductor.py`, `conductor_tech_lead.py`
|
||||||
|
- [ ] WHAT: Update how they set the current tier around `send()` calls.
|
||||||
|
- [ ] HOW: Use the new setter/getter functions from `ai_client`.
|
||||||
|
- [ ] SAFETY: Ensure `finally` blocks clean up the thread-local state.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Refactoring' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 2: Refactor & Test
|
## Phase 2: Testing Concurrency
|
||||||
- [ ] Task: Update loggers and test with mock concurrent threads.
|
- [ ] Task: Write Concurrent Execution Test
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `tests/test_ai_client_concurrency.py` (New)
|
||||||
|
- [ ] WHAT: Spawn two threads. Thread A sets Tier 3 and calls a mock `send`. Thread B sets Tier 4 and calls mock `send`.
|
||||||
|
- [ ] HOW: Assert that the resulting `comms_log` correctly maps the entries to Tier 3 and Tier 4 respectively without race condition overwrites.
|
||||||
|
- [ ] SAFETY: Use `threading.Barrier` to force race conditions in the test to ensure the isolation holds.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Testing Concurrency' (Protocol in workflow.md)
|
||||||
|
|
||||||
|
## Phase 3: Final Validation
|
||||||
|
- [ ] Task: Full Suite Validation & Warning Cleanup
|
||||||
|
- [ ] WHERE: Project root
|
||||||
|
- [ ] WHAT: `uv run pytest`
|
||||||
|
- [ ] HOW: Ensure 100% pass rate.
|
||||||
|
- [ ] SAFETY: None.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
|
||||||
@@ -1,8 +1,18 @@
|
|||||||
# Track Specification: Concurrent Tier Source Isolation
|
# Track Specification: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
Prepares the architecture for parallel Tier 3/4 agents by replacing the global `ai_client.current_tier` with thread-safe `threading.local()` or explicit call signatures.
|
Currently, `ai_client.current_tier` is a module-level `str | None`. This works safely only because the MMA engine serializes `ai_client.send()` calls. To prepare the architecture for parallel agents (e.g., executing multiple Tier 3 worker tickets concurrently), this global state must be replaced. This track will refactor the tagging system to use thread-safe context.
|
||||||
|
|
||||||
|
## Architectural Constraints
|
||||||
|
- **Thread Safety**: The solution MUST guarantee that if two threads call `ai_client.send()` simultaneously, their `source_tier` logs do not cross-contaminate.
|
||||||
|
- **API Surface**: Prefer passing `source_tier` explicitly in the `send()` method signature over implicit global/local state to ensure functional purity, OR use strictly isolated `threading.local()`.
|
||||||
|
|
||||||
## Functional Requirements
|
## Functional Requirements
|
||||||
- Refactor `current_tier` to be thread-safe.
|
- Refactor `ai_client.py` to remove the global `current_tier` variable.
|
||||||
- Update all logging calls to use the thread-safe context.
|
- Update `run_worker_lifecycle` and `generate_tickets` to pass the tier context directly to the AI client or into a `threading.local` context block.
|
||||||
|
- Update `_append_comms` and `_append_tool_log` to utilize the thread-safe context.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `ai_client.current_tier` global variable is removed.
|
||||||
|
- [ ] `source_tier` tagging in `_comms_log` and `_tool_log` continues to function accurately.
|
||||||
|
- [ ] Tests simulate concurrent `send()` calls from different threads and assert correct log tagging without race conditions.
|
||||||
@@ -1,18 +1,49 @@
|
|||||||
# Implementation Plan: GUI Decoupling
|
# Implementation Plan: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
|
||||||
|
|
||||||
## Phase 1: Controller Skeleton
|
## Phase 1: Controller Skeleton & State Migration
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Create `app_controller.py`.
|
- [ ] Task: Create `app_controller.py` Skeleton
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `app_controller.py` (New file)
|
||||||
|
- [ ] WHAT: Create the `AppController` class. Initialize basic state structures (logs, metrics, flags).
|
||||||
|
- [ ] HOW: Standard class definition.
|
||||||
|
- [ ] SAFETY: Do not break existing GUI yet.
|
||||||
|
- [ ] Task: Migrate Data State from GUI
|
||||||
|
- [ ] WHERE: `gui_2.py:__init__` and `app_controller.py`
|
||||||
|
- [ ] WHAT: Move variables like `_comms_log`, `_tool_log`, `mma_streams`, `active_tickets` to the controller.
|
||||||
|
- [ ] HOW: Update GUI to reference `self.controller.mma_streams` instead of `self.mma_streams`.
|
||||||
|
- [ ] SAFETY: Search and replace carefully; use `py_check_syntax`.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: State Migration' (Protocol in workflow.md)
|
||||||
|
|
||||||
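The Phase 1 skeleton and state migration might start out like this sketch. The attribute names come from the plan; the `App` stub and its `render_stream_labels` method are hypothetical and only show the view reading state through the controller.

```python
from typing import Any


class AppController:
    """Headless owner of application state (attribute names taken from the plan)."""

    def __init__(self) -> None:
        self._comms_log: list[dict[str, Any]] = []
        self._tool_log: list[dict[str, Any]] = []
        self.mma_streams: dict[str, str] = {}
        self.active_tickets: list[dict[str, Any]] = []


class App:
    """View stub: holds a controller and only reads state through it."""

    def __init__(self, controller: AppController) -> None:
        self.controller = controller

    def render_stream_labels(self) -> list[str]:
        # Previously `self.mma_streams`; now routed through the controller.
        return [
            f"{name}: {text}" for name, text in self.controller.mma_streams.items()
        ]
```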
## Phase 2: State Migration
|
## Phase 2: Logic & Background Thread Migration
|
||||||
- [ ] Task: Move App state from `gui_2.py` to controller.
|
- [ ] Task: Extract Background Threads & Event Queue
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `gui_2.py` (e.g., `_init_ai_and_hooks`, `_process_event_queue`)
|
||||||
|
- [ ] WHAT: Move the `AsyncEventQueue`, asyncio worker thread, and HookServer initialization to the controller.
|
||||||
|
- [ ] HOW: The GUI should just call `self.controller.start_services()` and read the `_pending_gui_tasks` queue.
|
||||||
|
- [ ] SAFETY: Thread lifecycle management is critical. Ensure shutdown hooks are migrated.
|
||||||
|
- [ ] Task: Extract I/O and AI Methods
|
||||||
|
- [ ] WHERE: `gui_2.py` (`_cb_plan_epic`, `_flush_to_project`, `_cb_create_track`)
|
||||||
|
- [ ] WHAT: Move business logic methods to the controller.
|
||||||
|
- [ ] HOW: GUI callbacks simply become `lambda: self.controller.plan_epic(input)`.
|
||||||
|
- [ ] SAFETY: Verify Hook API endpoints still work.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Logic Migration' (Protocol in workflow.md)
|
||||||
|
|
||||||
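One way to picture the thread lifecycle concern in Phase 2 is the sketch below. It swaps the project's `AsyncEventQueue` and asyncio worker for a plain `queue.Queue` and a `threading.Thread` to stay self-contained, and `submit` and `shutdown` are assumed method names; only `start_services` and `_pending_gui_tasks` appear in the plan.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to exit


class AppController:
    """Owns the worker thread; the GUI only drains `_pending_gui_tasks`."""

    def __init__(self) -> None:
        self._events: queue.Queue[object] = queue.Queue()
        self._pending_gui_tasks: queue.Queue[str] = queue.Queue()
        self._worker: threading.Thread | None = None

    def start_services(self) -> None:
        if self._worker is not None:
            return  # already running
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event: str) -> None:
        self._events.put(event)

    def _run(self) -> None:
        while True:
            event = self._events.get()
            if event is _STOP:
                break
            # Placeholder for real business logic.
            self._pending_gui_tasks.put(f"handled:{event}")

    def shutdown(self, timeout: float = 2.0) -> None:
        if self._worker is None:
            return
        self._events.put(_STOP)
        self._worker.join(timeout)
        self._worker = None
```

Keeping start and shutdown on the same object makes it harder to migrate one without the other, which is the risk the SAFETY note calls out.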
## Phase 3: Logic Migration
|
## Phase 3: Test Suite Refactoring
|
||||||
- [ ] Task: Move non-rendering methods to controller.
|
- [ ] Task: Update `conftest.py` Fixtures
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
|
- [ ] WHERE: `tests/conftest.py`
|
||||||
|
- [ ] WHAT: Update `app_instance` fixture to mock/initialize the `AppController` instead of just `App`.
|
||||||
|
- [ ] HOW: Adjust `patch` targets to hit `app_controller.py` where appropriate.
|
||||||
|
- [ ] SAFETY: Run subset of tests continuously to fix import breaks.
|
||||||
|
- [ ] Task: Resolve Broken GUI Tests
|
||||||
|
- [ ] WHERE: `tests/test_gui_*.py`
|
||||||
|
- [ ] WHAT: Update test assertions that look for state on `app_instance` to look at `app_instance.controller`.
|
||||||
|
- [ ] HOW: Surgical string replacements.
|
||||||
|
- [ ] SAFETY: Ensure no false-positives.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Test Suite Refactoring' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 4: Validation
|
## Phase 4: Final Validation
|
||||||
- [ ] Task: Update all tests to mock/use the controller.
|
- [ ] Task: Full Suite Validation & Warning Cleanup
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 4'
|
- [ ] WHERE: Project root
|
||||||
|
- [ ] WHAT: `uv run pytest`
|
||||||
|
- [ ] HOW: Ensure 100% pass rate.
|
||||||
|
- [ ] SAFETY: Watch out for lingering thread closure issues.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Validation' (Protocol in workflow.md)
|
||||||
@@ -1,9 +1,21 @@
|
|||||||
# Track Specification: GUI Decoupling & Controller Architecture
|
# Track Specification: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
`gui_2.py` is a monolithic God Object. This track extracts its business logic and state machine into `app_controller.py`, leaving the GUI as a pure immediate-mode view adhering to Data-Oriented Design.
|
`gui_2.py` currently operates as a Monolithic God Object (3,500+ lines). It violates the Data-Oriented Design heuristic by owning complex business logic, orchestrator hooks, and markdown file building. This track extracts the core state machine and lifecycle into a headless `app_controller.py`, turning the GUI into a pure immediate-mode view.
|
||||||
|
|
||||||
|
## Architectural Constraints: The "Immediate Mode View" Contract
|
||||||
|
- **No Business Logic in View**: `gui_2.py` MUST NOT perform file I/O, AI API calls, or subprocess management directly.
|
||||||
|
- **State Ownership**: `app_controller.py` (or equivalent) owns the "Source of Truth" state.
|
||||||
|
- **Event-Driven Mutations**: The GUI must mutate state exclusively by dispatching events or calling controller methods, never by directly manipulating backend objects in the render loop.
|
||||||
|
|
||||||
## Functional Requirements
|
## Functional Requirements
|
||||||
- Create `app_controller.py`.
|
- **Controller Extraction**: Create `app_controller.py` to handle all non-rendering logic.
|
||||||
- Migrate state variables and lifecycle methods from `gui_2.py` to the controller.
|
- **State Migration**: Move state variables (`_tool_log`, `_comms_log`, `active_tickets`, etc.) out of `App.__init__` into the controller.
|
||||||
- Ensure `gui_2.py` only reads state and dispatches events.
|
- **Logic Migration**: Move background threads, file reading/writing (`_flush_to_project`), and AI orchestrator invocations to the controller.
|
||||||
|
- **View Refactoring**: Refactor `gui_2.py` to accept the controller as a dependency and merely render its current state.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `app_controller.py` exists and owns the application state.
|
||||||
|
- [ ] `gui_2.py` has been reduced in size and complexity (no file I/O or AI calls).
|
||||||
|
- [ ] All existing features (chat, tools, tracks) function identically.
|
||||||
|
- [ ] The full test suite runs and passes against the new decoupled architecture.
|
||||||
@@ -1,14 +1,36 @@
|
|||||||
# Implementation Plan: Hook API UI State
|
# Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
|
||||||
|
|
||||||
## Phase 1: API Endpoint
|
## Phase 1: API Endpoint Implementation
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Implement `/api/gui/state` GET endpoint.
|
- [ ] Task: Implement `/api/gui/state` GET Endpoint
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `gui_2.py` (or `app_controller.py` if decoupled), inside `create_api()`.
|
||||||
|
- [ ] WHAT: Add a FastAPI route that serializes allowed UI state variables into JSON.
|
||||||
|
- [ ] HOW: Define a set of safe keys (e.g., `_gettable_fields`) and extract them from the App instance.
|
||||||
|
- [ ] SAFETY: Use thread-safe reads or deepcopies if accessing complex dictionaries.
|
||||||
|
- [ ] Task: Update `ApiHookClient`
|
||||||
|
- [ ] WHERE: `api_hook_client.py`
|
||||||
|
- [ ] WHAT: Add a `get_gui_state(self)` method that hits the new endpoint.
|
||||||
|
- [ ] HOW: Standard `requests.get`.
|
||||||
|
- [ ] SAFETY: Include error handling/timeouts.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: API Endpoint' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 2: State Wiring
|
## Phase 2: State Wiring & Integration Tests
|
||||||
- [ ] Task: Add UI state fields to `_settable_fields`.
|
- [ ] Task: Wire Critical UI States
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `gui_2.py`
|
||||||
|
- [ ] WHAT: Ensure fields like `ui_focus_agent`, `active_discussion`, `_track_discussion_active` are included in the exposed state.
|
||||||
|
- [ ] HOW: Update the mapping definition.
|
||||||
|
- [ ] SAFETY: None.
|
||||||
|
- [ ] Task: Write `live_gui` Integration Tests
|
||||||
|
- [ ] WHERE: `tests/test_live_gui_integration.py`
|
||||||
|
- [ ] WHAT: Add a test that changes the provider/model or focus agent via actions, then asserts `client.get_gui_state()` reflects the change.
|
||||||
|
- [ ] HOW: Use `pytest` and `live_gui` fixture.
|
||||||
|
- [ ] SAFETY: Ensure robust wait conditions for GUI updates.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: State Wiring & Tests' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 3: Integration Tests
|
## Phase 3: Final Validation
|
||||||
- [ ] Task: Write `live_gui` tests validating state retrieval.
|
- [ ] Task: Full Suite Validation & Warning Cleanup
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
|
- [ ] WHERE: Project root
|
||||||
|
- [ ] WHAT: `uv run pytest`
|
||||||
|
- [ ] HOW: Ensure 100% pass rate.
|
||||||
|
- [ ] SAFETY: Ensure the hook server gracefully stops.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
|
||||||
@@ -1,9 +1,18 @@
|
|||||||
# Track Specification: Hook API UI State Verification
|
# Track Specification: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
Adds an `/api/gui/state` endpoint to expose internal UI widget states (like `ui_focus_agent`) for reliable programmatic testing without user confirmation.
|
Currently, manual verification of UI widget state is difficult, and automated testing relies heavily on brittle logic. This track will expose internal UI widget states (like `ui_focus_agent`) via a new `/api/gui/state` GET endpoint. It wires critical UI state variables into `_settable_fields` so the `live_gui` fixture can programmatically read and assert exact widget states without requiring user confirmation dialogs.
|
||||||
|
|
||||||
|
## Architectural Constraints
|
||||||
|
- **Idempotent Reads**: The `/api/gui/state` endpoint MUST be read-only and free of side-effects.
|
||||||
|
- **Thread Safety**: Reading UI state from the HookServer thread MUST use the established locking mechanisms (e.g., querying via thread-safe proxies or safe reads of primitive types).
|
||||||
|
|
||||||
## Functional Requirements
|
## Functional Requirements
|
||||||
- Add `/api/gui/state` endpoint to the HookServer.
|
- **New Endpoint**: Implement a `/api/gui/state` GET endpoint in the headless API.
|
||||||
- Wire UI state variables into `_settable_fields`.
|
- **State Wiring**: Expand `_settable_fields` (or create a new `_gettable_fields` mapping) to safely expose internal UI states (combo boxes, checkbox states, active tabs).
|
||||||
- Write `live_gui` integration tests to assert widget states.
|
- **Integration Testing**: Write `live_gui` based integration tests that mutate the application state and assert the correct UI state via the new endpoint.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `/api/gui/state` endpoint successfully returns JSON representing the UI state.
|
||||||
|
- [ ] Key UI variables (like `ui_focus_agent`) are queryable via the Hook Client.
|
||||||
|
- [ ] New `live_gui` integration tests exist that validate UI state retrieval.
|
||||||
@@ -1,10 +1,26 @@
|
|||||||
# Implementation Plan: Robust JSON Parsing
|
# Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
|
||||||
|
|
||||||
## Phase 1: Retry Logic
|
## Phase 1: Implementation of Retry Logic
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Implement retry loop in `conductor_tech_lead.py`.
|
- [ ] Task: Implement Retry Loop in `generate_tickets`
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `conductor_tech_lead.py:generate_tickets`
|
||||||
|
- [ ] WHAT: Wrap the `send` and `json.loads` calls in a `for _ in range(max_retries)` loop.
|
||||||
|
- [ ] HOW: If `JSONDecodeError` is caught, append an error message to the context and loop. If it succeeds, `break` and return.
|
||||||
|
- [ ] SAFETY: Ensure token limits aren't massively breached by appending huge error states. Truncate raw output if necessary.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Implementation' (Protocol in workflow.md)
|
||||||
|
|
||||||
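A rough sketch of the retry loop described in Phase 1 is shown below. It is not the actual `conductor_tech_lead.py` code: the `send` callable parameter, `TicketParseError`, and `MAX_RAW_CHARS` are assumptions, and `max_retries` here counts total attempts.

```python
import json
from typing import Any, Callable

MAX_RAW_CHARS = 2000  # assumed truncation limit for echoed model output


class TicketParseError(RuntimeError):
    """Raised when every attempt returned unparseable JSON."""


def generate_tickets(
    send: Callable[[str], str], prompt: str, max_retries: int = 3
) -> list[dict[str, Any]]:
    """Ask for a JSON ticket array, reprompting with the parse error on failure."""
    message = prompt
    last_error: json.JSONDecodeError | None = None
    for _ in range(max_retries):
        raw = send(message)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Truncate the echoed output so the corrective turn stays small.
            message = (
                f"Your previous output failed to parse as JSON: {exc}. "
                f"Here was your output: {raw[:MAX_RAW_CHARS]}. "
                "Please fix the formatting and output ONLY valid JSON."
            )
    raise TicketParseError(f"gave up after {max_retries} attempts") from last_error
```

Raising at the cap, rather than returning `[]`, gives the caller something to surface instead of an empty result that looks like success.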
## Phase 2: Validation
|
## Phase 2: Unit Testing
|
||||||
- [ ] Task: Write unit tests simulating JSON hallucination.
|
- [ ] Task: Write Simulation Tests for JSON Parsing
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `tests/test_conductor_tech_lead.py`
|
||||||
|
- [ ] WHAT: Add tests `test_generate_tickets_retry_success` and `test_generate_tickets_retry_failure`.
|
||||||
|
- [ ] HOW: Mock `ai_client.send` side_effect to return invalid JSON first, then valid JSON. Assert call counts.
|
||||||
|
- [ ] SAFETY: Standard pytest mocking.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Unit Testing' (Protocol in workflow.md)
|
||||||
|
|
||||||
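The two tests named in Phase 2 could be shaped like the sketch below. To stay self-contained it defines a compact stand-in for `generate_tickets` that takes the sender as an argument, which is an assumption; the real tests would more likely patch `ai_client.send` and import the function from `conductor_tech_lead`.

```python
import json
from typing import Any, Callable
from unittest.mock import Mock


def generate_tickets(
    send: Callable[[str], str], prompt: str, max_retries: int = 3
) -> list[dict[str, Any]]:
    """Compact stand-in for the function under test (assumed behaviour)."""
    for _ in range(max_retries):
        raw = send(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            prompt = f"Invalid JSON ({exc}). Output ONLY valid JSON."
    raise ValueError("retry cap reached")


def test_generate_tickets_retry_success() -> None:
    # First reply is invalid, second is valid: expect exactly two calls.
    send = Mock(side_effect=["oops", '[{"id": "T1"}]'])
    assert generate_tickets(send, "plan") == [{"id": "T1"}]
    assert send.call_count == 2


def test_generate_tickets_retry_failure() -> None:
    # Every reply is invalid: expect an error once the cap is hit.
    send = Mock(side_effect=["oops"] * 3)
    try:
        generate_tickets(send, "plan", max_retries=3)
    except ValueError:
        pass
    else:
        raise AssertionError("expected the retry cap to raise")
    assert send.call_count == 3
```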
|
## Phase 3: Final Validation
|
||||||
|
- [ ] Task: Full Suite Validation & Warning Cleanup
|
||||||
|
- [ ] WHERE: Project root
|
||||||
|
- [ ] WHAT: `uv run pytest tests/test_conductor_tech_lead.py`
|
||||||
|
- [ ] HOW: Ensure 100% pass rate.
|
||||||
|
- [ ] SAFETY: None.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
|
||||||
@@ -1,9 +1,20 @@
|
|||||||
# Track Specification: Robust JSON Parsing for Tech Lead
|
# Track Specification: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
`conductor_tech_lead.py` silently fails if Tier 2 outputs invalid JSON. This track adds an auto-retry loop that feeds tracebacks back to the LLM for self-correction.
|
In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the Tier 2 model hallucinates or outputs invalid JSON, the function returns an empty array `[]` without reporting an error, so the GUI track creation process fails silently. This track adds an auto-retry loop that catches `JSONDecodeError` and feeds the traceback back to the LLM for self-correction.
|
||||||
|
|
||||||
|
## Architectural Constraints
|
||||||
|
- **Max Retries**: The retry loop MUST have a hard cap (e.g., 3 retries) to prevent infinite loops and runaway API costs.
|
||||||
|
- **Error Injection**: The error message fed back to the LLM must include the specific `JSONDecodeError` trace and the raw string it attempted to parse.
|
||||||
|
|
||||||
## Functional Requirements
|
## Functional Requirements
|
||||||
- Add retry loop in `generate_tickets`.
|
- Modify `generate_tickets` in `conductor_tech_lead.py` to wrap the `ai_client.send` call in a retry loop.
|
||||||
- Catch `JSONDecodeError` and reprompt the model.
|
- If `json.loads()` fails, construct a corrective prompt (e.g., "Your previous output failed to parse as JSON: {error}. Here was your output: {raw_text}. Please fix the formatting and output ONLY valid JSON.")
|
||||||
- Abort after N failures.
|
- Send the corrective prompt via a new `ai_client.send` turn within the same session.
|
||||||
|
- Abort and raise a structured error if the max retry count is reached.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `generate_tickets` includes a retry loop with a hard max retry cap.
|
||||||
|
- [ ] Invalid JSON responses automatically trigger a corrective reprompt to the model.
|
||||||
|
- [ ] Unit tests exist that use `unittest.mock` on the AI client to simulate 1 failure followed by 1 success, asserting the final valid parse.
|
||||||
|
- [ ] Unit tests exist simulating repeated failures hitting the retry cap.
|
||||||
@@ -1,18 +1,40 @@
|
|||||||
# Implementation Plan: Strict Static Analysis & Type Safety
|
# Implementation Plan: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
|
||||||
|
|
||||||
## Phase 1: Configuration & Tooling
|
## Phase 1: Configuration & Tooling Setup
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Configure strict `mypy.ini` and update `pyproject.toml`.
|
- [ ] Task: Configure Strict Mypy Settings
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `pyproject.toml` or `mypy.ini`
|
||||||
|
- [ ] WHAT: Enable `strict = true`, `disallow_untyped_defs = true`, `disallow_incomplete_defs = true` (the latter two are already implied by `strict`; listing them only makes the intent explicit).
|
||||||
|
- [ ] HOW: Modify the toml/ini config file directly.
|
||||||
|
- [ ] SAFETY: May cause a massive spike in reported errors initially.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Configuration' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 2: Core Library Typing
|
## Phase 2: Core Library Typing Resolution
|
||||||
- [ ] Task: Resolve typing in `api_hook_client.py` and models.
|
- [ ] Task: Resolve `api_hook_client.py` and `models.py` Type Errors
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `api_hook_client.py`, `models.py`, `events.py`
|
||||||
|
- [ ] WHAT: Add explicit type hints to all function arguments, return values, and complex dictionaries. Resolve `Any` bleeding.
|
||||||
|
- [ ] HOW: Surgical type annotations (`dict[str, Any]`, `list[str]`, etc.).
|
||||||
|
- [ ] SAFETY: Do not change runtime logic, only type signatures.
|
||||||
|
- [ ] Task: Resolve Conductor Subsystem Type Errors
|
||||||
|
- [ ] WHERE: `conductor_tech_lead.py`, `dag_engine.py`, `orchestrator_pm.py`
|
||||||
|
- [ ] WHAT: Enforce strict typing on track state, tickets, and DAG models.
|
||||||
|
- [ ] HOW: Standard python typing imports.
|
||||||
|
- [ ] SAFETY: Preserve JSON serialization compatibility.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Library' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 3: GUI Typing
|
## Phase 3: GUI God-Object Typing Resolution
|
||||||
- [ ] Task: Resolve typing in `gui_2.py`.
|
- [ ] Task: Resolve `gui_2.py` Type Errors
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
|
- [ ] WHERE: `gui_2.py`
|
||||||
|
- [ ] WHAT: Type the `App` class state variables, method signatures, and ImGui integration boundaries.
|
||||||
|
- [ ] HOW: Use `type: ignore[import]` only for ImGui C-bindings if strictly necessary, but type internal state tightly.
|
||||||
|
- [ ] SAFETY: Ensure `live_gui` tests pass after typing.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Typing' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 4: CI Integration
|
## Phase 4: CI Integration & Final Validation
|
||||||
- [ ] Task: Implement pre-commit hooks for ruff and mypy.
|
- [ ] Task: Establish Pre-Commit Guardrails
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 4'
|
- [ ] WHERE: `.git/hooks/pre-commit` or a `scripts/validate_types.ps1`
|
||||||
|
- [ ] WHAT: Create a script that runs ruff and mypy, blocking commits if they fail.
|
||||||
|
- [ ] HOW: Standard shell scripting.
|
||||||
|
- [ ] SAFETY: Ensure it works cross-platform (Windows/Linux).
|
||||||
|
- [ ] Task: Full Suite Validation & Warning Cleanup
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation' (Protocol in workflow.md)
|
||||||
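Since the guardrail has to work on both Windows and Linux, one option is to write the check runner in Python rather than shell or PowerShell. The sketch below is an assumption about how that could look; `run_checks` and `DEFAULT_CHECKS` are hypothetical names, and the two commands are the ones listed in the track's acceptance criteria.

```python
import subprocess
import sys
from collections.abc import Sequence

# Assumed commands, taken from the acceptance criteria in the track spec.
DEFAULT_CHECKS: tuple[tuple[str, ...], ...] = (
    ("uv", "run", "ruff", "check", "."),
    ("uv", "run", "mypy", "--strict", "."),
)


def run_checks(checks: Sequence[Sequence[str]] = DEFAULT_CHECKS) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in checks:
        try:
            result = subprocess.run(list(command), check=False)
        except FileNotFoundError:
            print(f"command not found: {command[0]}", file=sys.stderr)
            return 127
        if result.returncode != 0:
            print(f"check failed: {' '.join(command)}", file=sys.stderr)
            return result.returncode
    return 0
```

A hook script would end with `sys.exit(run_checks())`, so a non-zero return code blocks the commit.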
@@ -1,10 +1,21 @@
|
|||||||
# Track Specification: Strict Static Analysis & Type Safety
|
# Track Specification: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
The codebase suffers from massive type-safety debt (512+ mypy errors). This track resolves all violations, enforces strict typing across `gui_2.py` and `api_hook_client.py`, and integrates pre-commit checks.
|
The codebase currently suffers from massive type-safety debt (512+ `mypy` errors across 64 files) and lingering `ruff` violations. This track will harden the foundation by resolving all violations, enforcing strict typing (especially in `gui_2.py` and `api_hook_client.py`), and integrating pre-commit checks. This is a prerequisite for safe AI-driven refactoring.
|
||||||
|
|
||||||
|
## Architectural Constraints: The "Strict Typing Contract"
|
||||||
|
- **No Implicit Any**: Variables and function returns must have explicit types.
|
||||||
|
- **No Ignored Errors**: Do not use `# type: ignore` unless absolutely unavoidable (e.g., for poorly typed third-party C bindings). If used, it must include a specific error code.
|
||||||
|
- **Strict Optionals**: All optional types must be explicitly defined (e.g., `str | None`).
|
||||||
|
|
||||||
## Functional Requirements
|
## Functional Requirements
|
||||||
- Resolve all mypy errors.
|
- **Mypy Resolution**: Fix all 512+ existing `mypy` errors.
|
||||||
- Resolve all remaining ruff violations.
|
- **Ruff Resolution**: Fix all remaining `ruff` linting violations.
|
||||||
- Enforce strict typing.
|
- **Configuration**: Update `pyproject.toml` or `mypy.ini` to enforce strict type checking globally.
|
||||||
- Add CI/pre-commit hook for linting.
|
- **CI/Automation**: Implement a pre-commit hook or script (`scripts/check_hints.py` equivalent) to block untyped code.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `uv run mypy --strict .` returns 0 errors.
|
||||||
|
- [ ] `uv run ruff check .` returns 0 violations.
|
||||||
|
- [ ] No new `# type: ignore` comments are added without justification.
|
||||||
|
- [ ] Pre-commit hook or validation script is documented and active.
|
||||||
@@ -1,14 +1,36 @@
|
|||||||
# Implementation Plan: Test Suite Performance
|
# Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
|
||||||
|
|
||||||
## Phase 1: Audit & Polling Primitives
|
## Phase 1: Audit & Polling Primitives
|
||||||
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
|
||||||
- [ ] Task: Create deterministic polling primitives in `conftest.py`.
|
- [ ] Task: Create Deterministic Polling Primitives
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 1'
|
- [ ] WHERE: `tests/conftest.py`
|
||||||
|
- [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
|
||||||
|
- [ ] HOW: Standard while loop that evaluates `predicate_fn()`.
|
||||||
|
- [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)
|
||||||
|
|
||||||
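A minimal sketch of the `wait_until` utility from Phase 1 follows. The signature matches the one given in the plan; the use of `time.monotonic()` and the wording of the error message are choices made here, not requirements from the source.

```python
import time
from typing import Callable


def wait_until(
    predicate_fn: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05
) -> None:
    """Poll `predicate_fn` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate_fn():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"condition not met within {timeout:.2f}s: {predicate_fn!r}"
            )
        time.sleep(interval)
```

A test would then replace something like `time.sleep(1.0)` with `wait_until(lambda: len(events) > 0)`, returning as soon as the condition holds instead of always paying the full delay.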
## Phase 2: Refactoring Sleeps
|
## Phase 2: Refactoring Integration Tests
|
||||||
- [ ] Task: Replace `time.sleep` across integration tests.
|
- [ ] Task: Refactor `test_spawn_interception.py`
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 2'
|
- [ ] WHERE: `tests/test_spawn_interception.py`
|
||||||
|
- [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
|
||||||
|
- [ ] HOW: Use the new `conftest.py` utility.
|
||||||
|
- [ ] SAFETY: Prevent event loop deadlocks.
|
||||||
|
- [ ] Task: Refactor Simulation Waits
|
||||||
|
- [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
|
||||||
|
- [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
|
||||||
|
- [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
|
||||||
|
- [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)
|
||||||
|
|
||||||
## Phase 3: Test Marking
|
## Phase 3: Test Marking & Final Validation
|
||||||
- [ ] Task: Apply `@pytest.mark.slow` to long-running tests.
|
- [ ] Task: Apply Slow Test Marks
|
||||||
- [ ] Task: Conductor - User Manual Verification 'Phase 3'
|
- [ ] WHERE: Across all `tests/`
|
||||||
|
- [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
|
||||||
|
- [ ] HOW: Import pytest and apply the decorator.
|
||||||
|
- [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
|
||||||
|
- [ ] Task: Full Suite Performance Validation
|
||||||
|
- [ ] WHERE: Project root
|
||||||
|
- [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
|
||||||
|
- [ ] HOW: Time the terminal command.
|
||||||
|
- [ ] SAFETY: None.
|
||||||
|
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
|
||||||
@@ -1,9 +1,19 @@
|
|||||||
# Track Specification: Test Suite Performance & Flakiness
|
# Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
The test suite is slow and flaky due to `time.sleep()`. This track replaces sleeps with deterministic polling (`threading.Event()`), aiming for a <10s core TDD loop.
|
The test suite currently takes over 5 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or time out. This track replaces arbitrary `time.sleep()` calls with deterministic waits (state polling or `threading.Event()` signaling), aiming to drive the core TDD test execution time down to under 10 seconds.
|
||||||
|
|
||||||
|
## Architectural Constraints
|
||||||
|
- **Zero Arbitrary Sleeps**: `time.sleep(1.0)` is banned in test files unless testing actual rate-limiting or debounce functionality.
|
||||||
|
- **Deterministic Waits**: Tests must use state-polling (with aggressive micro-sleeps) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
|
||||||
|
|
||||||
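Where the code under test can signal completion itself, an event-based wait avoids polling altogether. The sketch below assumes a worker that accepts a `threading.Event`; `run_worker` and `wait_for_worker` are hypothetical names.

```python
import threading


def run_worker(done: threading.Event, results: list[str]) -> None:
    """Background job that signals completion instead of being slept on."""
    results.append("spawned")
    done.set()


def wait_for_worker(timeout: float = 5.0) -> list[str]:
    """Start the worker and block only until it reports that it finished."""
    done = threading.Event()
    results: list[str] = []
    thread = threading.Thread(target=run_worker, args=(done, results))
    thread.start()
    # `wait` returns False on timeout, so a hang fails fast with a clear error.
    if not done.wait(timeout):
        raise TimeoutError(f"worker did not finish within {timeout}s")
    thread.join()
    return results
```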
## Functional Requirements
|
## Functional Requirements
|
||||||
- Audit and remove `time.sleep()` in tests.
|
- Audit all `tests/` and `simulation/` files for `time.sleep()`.
|
||||||
- Implement deterministic event polling.
|
- Implement polling helper functions in `conftest.py` (e.g., `wait_until(condition_func, timeout)`).
|
||||||
- Mark slow integration tests with `@pytest.mark.slow`.
|
- Refactor all integration tests to use the deterministic polling helpers.
|
||||||
|
- Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
- [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
|
||||||
|
- [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
|
||||||
|
- [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.
|
||||||