From 2e73212abd645399e539b7ea565f1c7d9377c339 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Mon, 2 Mar 2026 22:38:02 -0500
Subject: [PATCH] chore(conductor): Enhance all 6 backlog tracks to Surgical Spec Protocol

---
 .../plan.md | 35 +++++++++---
 .../spec.md | 18 ++++--
 .../plan.md | 57 ++++++++++++++-----
 .../spec.md | 22 +++++--
 .../plan.md | 42 ++++++++++----
 .../spec.md | 19 +++++--
 .../plan.md | 30 +++++++---
 .../spec.md | 21 +++++--
 .../plan.md | 48 +++++++++++-----
 .../spec.md | 23 ++++++--
 .../plan.md | 40 ++++++++++---
 .../spec.md | 20 +++++--
 12 files changed, 286 insertions(+), 89 deletions(-)

diff --git a/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md b/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md
index 0e876bb..5fc7858 100644
--- a/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md
+++ b/conductor/tracks/concurrent_tier_source_tier_20260302/plan.md
@@ -1,10 +1,31 @@
-# Implementation Plan: Concurrent Tier Isolation
+# Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
 
-## Phase 1: Thread-Local Storage
+## Phase 1: Thread-Local Context Refactoring
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Replace `current_tier` with `threading.local()`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Refactor `ai_client` to `threading.local()`
+    - [ ] WHERE: `ai_client.py`
+    - [ ] WHAT: Replace `current_tier = None` with `_local_context = threading.local()`. Implement safe getters/setters for the tier.
+    - [ ] HOW: Use standard `threading.local` attributes.
+    - [ ] SAFETY: Provide defaults (e.g., `getattr(_local_context, 'tier', None)`) so uninitialized threads don't crash.
+- [ ] Task: Update Lifecycle Callers
+    - [ ] WHERE: `multi_agent_conductor.py`, `conductor_tech_lead.py`
+    - [ ] WHAT: Update how they set the current tier around `send()` calls.
+    - [ ] HOW: Use the new setter/getter functions from `ai_client`.
+    - [ ] SAFETY: Ensure `finally` blocks clean up the thread-local state.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Refactoring' (Protocol in workflow.md)
 
-## Phase 2: Refactor & Test
-- [ ] Task: Update loggers and test with mock concurrent threads.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
\ No newline at end of file
+## Phase 2: Testing Concurrency
+- [ ] Task: Write Concurrent Execution Test
+    - [ ] WHERE: `tests/test_ai_client_concurrency.py` (New)
+    - [ ] WHAT: Spawn two threads. Thread A sets Tier 3 and calls a mock `send`. Thread B sets Tier 4 and calls mock `send`.
+    - [ ] HOW: Assert that the resulting `comms_log` correctly maps the entries to Tier 3 and Tier 4 respectively without race condition overwrites.
+    - [ ] SAFETY: Use `threading.Barrier` to force race conditions in the test to ensure the isolation holds.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Testing Concurrency' (Protocol in workflow.md)
+
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md b/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md
index f4f3fb9..aa936f2 100644
--- a/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md
+++ b/conductor/tracks/concurrent_tier_source_tier_20260302/spec.md
@@ -1,8 +1,18 @@
-# Track Specification: Concurrent Tier Source Isolation
+# Track Specification: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
 
 ## Overview
-Prepares the architecture for parallel Tier 3/4 agents by replacing the global `ai_client.current_tier` with thread-safe `threading.local()` or explicit call signatures.
+Currently, `ai_client.current_tier` is a module-level `str | None`. This works safely only because the MMA engine serializes `ai_client.send()` calls. To prepare the architecture for parallel agents (e.g., executing multiple Tier 3 worker tickets concurrently), this global state must be replaced. This track will refactor the tagging system to use thread-safe context.
+
+## Architectural Constraints
+- **Thread Safety**: The solution MUST guarantee that if two threads call `ai_client.send()` simultaneously, their `source_tier` logs do not cross-contaminate.
+- **API Surface**: Prefer passing `source_tier` explicitly in the `send()` method signature over implicit global/local state to ensure functional purity, OR use strictly isolated `threading.local()`.
 
 ## Functional Requirements
-- Refactor `current_tier` to be thread-safe.
-- Update all logging calls to use the thread-safe context.
\ No newline at end of file
+- Refactor `ai_client.py` to remove the global `current_tier` variable.
+- Update `run_worker_lifecycle` and `generate_tickets` to pass the tier context directly to the AI client or into a `threading.local` context block.
+- Update `_append_comms` and `_append_tool_log` to utilize the thread-safe context.
+
+## Acceptance Criteria
+- [ ] `ai_client.current_tier` global variable is removed.
+- [ ] `source_tier` tagging in `_comms_log` and `_tool_log` continues to function accurately.
+- [ ] Tests simulate concurrent `send()` calls from different threads and assert correct log tagging without race conditions.
\ No newline at end of file
diff --git a/conductor/tracks/gui_decoupling_controller_20260302/plan.md b/conductor/tracks/gui_decoupling_controller_20260302/plan.md
index db9f816..aa9b829 100644
--- a/conductor/tracks/gui_decoupling_controller_20260302/plan.md
+++ b/conductor/tracks/gui_decoupling_controller_20260302/plan.md
@@ -1,18 +1,49 @@
-# Implementation Plan: GUI Decoupling
+# Implementation Plan: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
 
-## Phase 1: Controller Skeleton
+## Phase 1: Controller Skeleton & State Migration
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Create `app_controller.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Create `app_controller.py` Skeleton
+    - [ ] WHERE: `app_controller.py` (New file)
+    - [ ] WHAT: Create the `AppController` class. Initialize basic state structures (logs, metrics, flags).
+    - [ ] HOW: Standard class definition.
+    - [ ] SAFETY: Do not break existing GUI yet.
+- [ ] Task: Migrate Data State from GUI
+    - [ ] WHERE: `gui_2.py:__init__` and `app_controller.py`
+    - [ ] WHAT: Move variables like `_comms_log`, `_tool_log`, `mma_streams`, `active_tickets` to the controller.
+    - [ ] HOW: Update GUI to reference `self.controller.mma_streams` instead of `self.mma_streams`.
+    - [ ] SAFETY: Search and replace carefully; use `py_check_syntax`.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: State Migration' (Protocol in workflow.md)
 
-## Phase 2: State Migration
-- [ ] Task: Move App state from `gui_2.py` to controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Logic & Background Thread Migration
+- [ ] Task: Extract Background Threads & Event Queue
+    - [ ] WHERE: `gui_2.py` (e.g., `_init_ai_and_hooks`, `_process_event_queue`)
+    - [ ] WHAT: Move the `AsyncEventQueue`, asyncio worker thread, and HookServer initialization to the controller.
+    - [ ] HOW: The GUI should just call `self.controller.start_services()` and read the `_pending_gui_tasks` queue.
+    - [ ] SAFETY: Thread lifecycle management is critical. Ensure shutdown hooks are migrated.
+- [ ] Task: Extract I/O and AI Methods
+    - [ ] WHERE: `gui_2.py` (`_cb_plan_epic`, `_flush_to_project`, `_cb_create_track`)
+    - [ ] WHAT: Move business logic methods to the controller.
+    - [ ] HOW: GUI callbacks simply become `lambda: self.controller.plan_epic(input)`.
+    - [ ] SAFETY: Verify Hook API endpoints still work.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Logic Migration' (Protocol in workflow.md)
 
-## Phase 3: Logic Migration
-- [ ] Task: Move non-rendering methods to controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: Test Suite Refactoring
+- [ ] Task: Update `conftest.py` Fixtures
+    - [ ] WHERE: `tests/conftest.py`
+    - [ ] WHAT: Update `app_instance` fixture to mock/initialize the `AppController` instead of just `App`.
+    - [ ] HOW: Adjust `patch` targets to hit `app_controller.py` where appropriate.
+    - [ ] SAFETY: Run subset of tests continuously to fix import breaks.
+- [ ] Task: Resolve Broken GUI Tests
+    - [ ] WHERE: `tests/test_gui_*.py`
+    - [ ] WHAT: Update test assertions that look for state on `app_instance` to look at `app_instance.controller`.
+    - [ ] HOW: Surgical string replacements.
+    - [ ] SAFETY: Ensure no false-positives.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Test Suite Refactoring' (Protocol in workflow.md)
 
-## Phase 4: Validation
-- [ ] Task: Update all tests to mock/use the controller.
-- [ ] Task: Conductor - User Manual Verification 'Phase 4'
\ No newline at end of file
+## Phase 4: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: Watch out for lingering thread closure issues.
+- [ ] Task: Conductor - User Manual Verification 'Phase 4: Final Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/gui_decoupling_controller_20260302/spec.md b/conductor/tracks/gui_decoupling_controller_20260302/spec.md
index 121c85a..8e826be 100644
--- a/conductor/tracks/gui_decoupling_controller_20260302/spec.md
+++ b/conductor/tracks/gui_decoupling_controller_20260302/spec.md
@@ -1,9 +1,21 @@
-# Track Specification: GUI Decoupling & Controller Architecture
+# Track Specification: GUI Decoupling & Controller Architecture (gui_decoupling_controller_20260302)
 
 ## Overview
-`gui_2.py` is a monolithic God Object. This track extracts its business logic and state machine into `app_controller.py`, leaving the GUI as a pure immediate-mode view adhering to Data-Oriented Design.
+`gui_2.py` currently operates as a Monolithic God Object (3,500+ lines). It violates the Data-Oriented Design heuristic by owning complex business logic, orchestrator hooks, and markdown file building. This track extracts the core state machine and lifecycle into a headless `app_controller.py`, turning the GUI into a pure immediate-mode view.
+
+## Architectural Constraints: The "Immediate Mode View" Contract
+- **No Business Logic in View**: `gui_2.py` MUST NOT perform file I/O, AI API calls, or subprocess management directly.
+- **State Ownership**: `app_controller.py` (or equivalent) owns the "Source of Truth" state.
+- **Event-Driven Mutations**: The GUI must mutate state exclusively by dispatching events or calling controller methods, never by directly manipulating backend objects in the render loop.
 
 ## Functional Requirements
-- Create `app_controller.py`.
-- Migrate state variables and lifecycle methods from `gui_2.py` to the controller.
-- Ensure `gui_2.py` only reads state and dispatches events.
\ No newline at end of file
+- **Controller Extraction**: Create `app_controller.py` to handle all non-rendering logic.
+- **State Migration**: Move state variables (`_tool_log`, `_comms_log`, `active_tickets`, etc.) out of `App.__init__` into the controller.
+- **Logic Migration**: Move background threads, file reading/writing (`_flush_to_project`), and AI orchestrator invocations to the controller.
+- **View Refactoring**: Refactor `gui_2.py` to accept the controller as a dependency and merely render its current state.
+
+## Acceptance Criteria
+- [ ] `app_controller.py` exists and owns the application state.
+- [ ] `gui_2.py` has been reduced in size and complexity (no file I/O or AI calls).
+- [ ] All existing features (chat, tools, tracks) function identically.
+- [ ] The full test suite runs and passes against the new decoupled architecture.
\ No newline at end of file
diff --git a/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md b/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
index bc94e99..ba95bd9 100644
--- a/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
+++ b/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
@@ -1,14 +1,36 @@
-# Implementation Plan: Hook API UI State
+# Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
 
-## Phase 1: API Endpoint
+## Phase 1: API Endpoint Implementation
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Implement `/api/gui/state` GET endpoint.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Implement `/api/gui/state` GET Endpoint
+    - [ ] WHERE: `gui_2.py` (or `app_controller.py` if decoupled), inside `create_api()`.
+    - [ ] WHAT: Add a FastAPI route that serializes allowed UI state variables into JSON.
+    - [ ] HOW: Define a set of safe keys (e.g., `_gettable_fields`) and extract them from the App instance.
+    - [ ] SAFETY: Use thread-safe reads or deepcopies if accessing complex dictionaries.
+- [ ] Task: Update `ApiHookClient`
+    - [ ] WHERE: `api_hook_client.py`
+    - [ ] WHAT: Add a `get_gui_state(self)` method that hits the new endpoint.
+    - [ ] HOW: Standard `requests.get`.
+    - [ ] SAFETY: Include error handling/timeouts.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: API Endpoint' (Protocol in workflow.md)
 
-## Phase 2: State Wiring
-- [ ] Task: Add UI state fields to `_settable_fields`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: State Wiring & Integration Tests
+- [ ] Task: Wire Critical UI States
+    - [ ] WHERE: `gui_2.py`
+    - [ ] WHAT: Ensure fields like `ui_focus_agent`, `active_discussion`, `_track_discussion_active` are included in the exposed state.
+    - [ ] HOW: Update the mapping definition.
+    - [ ] SAFETY: None.
+- [ ] Task: Write `live_gui` Integration Tests
+    - [ ] WHERE: `tests/test_live_gui_integration.py`
+    - [ ] WHAT: Add a test that changes the provider/model or focus agent via actions, then asserts `client.get_gui_state()` reflects the change.
+    - [ ] HOW: Use `pytest` and `live_gui` fixture.
+    - [ ] SAFETY: Ensure robust wait conditions for GUI updates.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: State Wiring & Tests' (Protocol in workflow.md)
 
-## Phase 3: Integration Tests
-- [ ] Task: Write `live_gui` tests validating state retrieval.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
\ No newline at end of file
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: Ensure the hook server gracefully stops.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md b/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md
index 11d8716..3663dfb 100644
--- a/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md
+++ b/conductor/tracks/hook_api_ui_state_verification_20260302/spec.md
@@ -1,9 +1,18 @@
-# Track Specification: Hook API UI State Verification
+# Track Specification: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
 
 ## Overview
-Adds an `/api/gui/state` endpoint to expose internal UI widget states (like `ui_focus_agent`) for reliable programmatic testing without user confirmation.
+Currently, manual verification of UI widget state is difficult, and automated testing relies heavily on brittle logic. This track will expose internal UI widget states (like `ui_focus_agent`) via a new `/api/gui/state` GET endpoint. It wires critical UI state variables into `_settable_fields` so the `live_gui` fixture can programmatically read and assert exact widget states without requiring user confirmation dialogs.
+
+## Architectural Constraints
+- **Idempotent Reads**: The `/api/gui/state` endpoint MUST be read-only and free of side-effects.
+- **Thread Safety**: Reading UI state from the HookServer thread MUST use the established locking mechanisms (e.g., querying via thread-safe proxies or safe reads of primitive types).
 
 ## Functional Requirements
-- Add `/api/gui/state` endpoint to the HookServer.
-- Wire UI state variables into `_settable_fields`.
-- Write `live_gui` integration tests to assert widget states.
\ No newline at end of file
+- **New Endpoint**: Implement a `/api/gui/state` GET endpoint in the headless API.
+- **State Wiring**: Expand `_settable_fields` (or create a new `_gettable_fields` mapping) to safely expose internal UI states (combo boxes, checkbox states, active tabs).
+- **Integration Testing**: Write `live_gui` based integration tests that mutate the application state and assert the correct UI state via the new endpoint.
+
+## Acceptance Criteria
+- [ ] `/api/gui/state` endpoint successfully returns JSON representing the UI state.
+- [ ] Key UI variables (like `ui_focus_agent`) are queryable via the Hook Client.
+- [ ] New `live_gui` integration tests exist that validate UI state retrieval.
\ No newline at end of file
diff --git a/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md b/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
index 924defe..1cccfa1 100644
--- a/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
+++ b/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
@@ -1,10 +1,26 @@
-# Implementation Plan: Robust JSON Parsing
+# Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
 
-## Phase 1: Retry Logic
+## Phase 1: Implementation of Retry Logic
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Implement retry loop in `conductor_tech_lead.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Implement Retry Loop in `generate_tickets`
+    - [ ] WHERE: `conductor_tech_lead.py:generate_tickets`
+    - [ ] WHAT: Wrap the `send` and `json.loads` calls in a `for _ in range(max_retries)` loop.
+    - [ ] HOW: If `JSONDecodeError` is caught, append an error message to the context and loop. If it succeeds, `break` and return.
+    - [ ] SAFETY: Ensure token limits aren't massively breached by appending huge error states. Truncate raw output if necessary.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Implementation' (Protocol in workflow.md)
 
-## Phase 2: Validation
-- [ ] Task: Write unit tests simulating JSON hallucination.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
\ No newline at end of file
+## Phase 2: Unit Testing
+- [ ] Task: Write Simulation Tests for JSON Parsing
+    - [ ] WHERE: `tests/test_conductor_tech_lead.py`
+    - [ ] WHAT: Add tests `test_generate_tickets_retry_success` and `test_generate_tickets_retry_failure`.
+    - [ ] HOW: Mock `ai_client.send` side_effect to return invalid JSON first, then valid JSON. Assert call counts.
+    - [ ] SAFETY: Standard pytest mocking.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Unit Testing' (Protocol in workflow.md)
+
+## Phase 3: Final Validation
+- [ ] Task: Full Suite Validation & Warning Cleanup
+    - [ ] WHERE: Project root
+    - [ ] WHAT: `uv run pytest tests/test_conductor_tech_lead.py`
+    - [ ] HOW: Ensure 100% pass rate.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md b/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md
index 22a9949..8347d58 100644
--- a/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md
+++ b/conductor/tracks/robust_json_parsing_tech_lead_20260302/spec.md
@@ -1,9 +1,20 @@
-# Track Specification: Robust JSON Parsing for Tech Lead
+# Track Specification: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
 
 ## Overview
-`conductor_tech_lead.py` silently fails if Tier 2 outputs invalid JSON. This track adds an auto-retry loop that feeds tracebacks back to the LLM for self-correction.
+In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the Tier 2 model hallucinates or outputs invalid JSON, it silently returns an empty array `[]`, causing the GUI track creation process to fail silently. This track adds an auto-retry loop that catches `JSONDecodeError` and feeds the traceback back to the LLM for self-correction.
+
+## Architectural Constraints
+- **Max Retries**: The retry loop MUST have a hard cap (e.g., 3 retries) to prevent infinite loops and runaway API costs.
+- **Error Injection**: The error message fed back to the LLM must include the specific `JSONDecodeError` trace and the raw string it attempted to parse.
 
 ## Functional Requirements
-- Add retry loop in `generate_tickets`.
-- Catch `JSONDecodeError` and reprompt the model.
-- Abort after N failures.
\ No newline at end of file
+- Modify `generate_tickets` in `conductor_tech_lead.py` to wrap the `ai_client.send` call in a retry loop.
+- If `json.loads()` fails, construct a corrective prompt (e.g., "Your previous output failed to parse as JSON: {error}. Here was your output: {raw_text}. Please fix the formatting and output ONLY valid JSON.")
+- Send the corrective prompt via a new `ai_client.send` turn within the same session.
+- Abort and raise a structured error if the max retry count is reached.
+
+## Acceptance Criteria
+- [ ] `generate_tickets` includes a `while` loop with a max retry cap.
+- [ ] Invalid JSON responses automatically trigger a corrective reprompt to the model.
+- [ ] Unit tests exist that use `unittest.mock` on the AI client to simulate 1 failure followed by 1 success, asserting the final valid parse.
+- [ ] Unit tests exist simulating repeated failures hitting the retry cap.
\ No newline at end of file
diff --git a/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md b/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md
index cf4a4e5..ddb3d8c 100644
--- a/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md
+++ b/conductor/tracks/strict_static_analysis_and_typing_20260302/plan.md
@@ -1,18 +1,40 @@
-# Implementation Plan: Strict Static Analysis & Type Safety
+# Implementation Plan: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
 
-## Phase 1: Configuration & Tooling
+## Phase 1: Configuration & Tooling Setup
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Configure strict `mypy.ini` and update `pyproject.toml`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Configure Strict Mypy Settings
+    - [ ] WHERE: `pyproject.toml` or `mypy.ini`
+    - [ ] WHAT: Enable `strict = true`, `disallow_untyped_defs = true`, `disallow_incomplete_defs = true`.
+    - [ ] HOW: Modify the toml/ini config file directly.
+    - [ ] SAFETY: May cause a massive spike in reported errors initially.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Configuration' (Protocol in workflow.md)
 
-## Phase 2: Core Library Typing
-- [ ] Task: Resolve typing in `api_hook_client.py` and models.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Core Library Typing Resolution
+- [ ] Task: Resolve `api_hook_client.py` and `models.py` Type Errors
+    - [ ] WHERE: `api_hook_client.py`, `models.py`, `events.py`
+    - [ ] WHAT: Add explicit type hints to all function arguments, return values, and complex dictionaries. Resolve `Any` bleeding.
+    - [ ] HOW: Surgical type annotations (`dict[str, Any]`, `list[str]`, etc.).
+    - [ ] SAFETY: Do not change runtime logic, only type signatures.
+- [ ] Task: Resolve Conductor Subsystem Type Errors
+    - [ ] WHERE: `conductor_tech_lead.py`, `dag_engine.py`, `orchestrator_pm.py`
+    - [ ] WHAT: Enforce strict typing on track state, tickets, and DAG models.
+    - [ ] HOW: Standard python typing imports.
+    - [ ] SAFETY: Preserve JSON serialization compatibility.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Core Library' (Protocol in workflow.md)
 
-## Phase 3: GUI Typing
-- [ ] Task: Resolve typing in `gui_2.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
+## Phase 3: GUI God-Object Typing Resolution
+- [ ] Task: Resolve `gui_2.py` Type Errors
+    - [ ] WHERE: `gui_2.py`
+    - [ ] WHAT: Type the `App` class state variables, method signatures, and ImGui integration boundaries.
+    - [ ] HOW: Use `type: ignore[import]` only for ImGui C-bindings if strictly necessary, but type internal state tightly.
+    - [ ] SAFETY: Ensure `live_gui` tests pass after typing.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: GUI Typing' (Protocol in workflow.md)
 
-## Phase 4: CI Integration
-- [ ] Task: Implement pre-commit hooks for ruff and mypy.
-- [ ] Task: Conductor - User Manual Verification 'Phase 4'
\ No newline at end of file
+## Phase 4: CI Integration & Final Validation
+- [ ] Task: Establish Pre-Commit Guardrails
+    - [ ] WHERE: `.git/hooks/pre-commit` or a `scripts/validate_types.ps1`
+    - [ ] WHAT: Create a script that runs ruff and mypy, blocking commits if they fail.
+    - [ ] HOW: Standard shell scripting.
+    - [ ] SAFETY: Ensure it works cross-platform (Windows/Linux).
+- [ ] Task: Full Suite Validation & Warning Cleanup
+- [ ] Task: Conductor - User Manual Verification 'Phase 4: Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md b/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md
index 7236ee9..5d9f7a4 100644
--- a/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md
+++ b/conductor/tracks/strict_static_analysis_and_typing_20260302/spec.md
@@ -1,10 +1,21 @@
-# Track Specification: Strict Static Analysis & Type Safety
+# Track Specification: Strict Static Analysis & Type Safety (strict_static_analysis_and_typing_20260302)
 
 ## Overview
-The codebase suffers from massive type-safety debt (512+ mypy errors). This track resolves all violations, enforces strict typing across `gui_2.py` and `api_hook_client.py`, and integrates pre-commit checks.
+The codebase currently suffers from massive type-safety debt (512+ `mypy` errors across 64 files) and lingering `ruff` violations. This track will harden the foundation by resolving all violations, enforcing strict typing (especially in `gui_2.py` and `api_hook_client.py`), and integrating pre-commit checks. This is a prerequisite for safe AI-driven refactoring.
+
+## Architectural Constraints: The "Strict Typing Contract"
+- **No Implicit Any**: Variables and function returns must have explicit types.
+- **No Ignored Errors**: Do not use `# type: ignore` unless absolutely unavoidable (e.g., for poorly typed third-party C bindings). If used, it must include a specific error code.
+- **Strict Optionals**: All optional types must be explicitly defined (e.g., `str | None`).
 
 ## Functional Requirements
-- Resolve all mypy errors.
-- Resolve all remaining ruff violations.
-- Enforce strict typing.
-- Add CI/pre-commit hook for linting.
\ No newline at end of file
+- **Mypy Resolution**: Fix all 512+ existing `mypy` errors.
+- **Ruff Resolution**: Fix all remaining `ruff` linting violations.
+- **Configuration**: Update `pyproject.toml` or `mypy.ini` to enforce strict type checking globally.
+- **CI/Automation**: Implement a pre-commit hook or script (`scripts/check_hints.py` equivalent) to block untyped code.
+
+## Acceptance Criteria
+- [ ] `uv run mypy --strict .` returns 0 errors.
+- [ ] `uv run ruff check .` returns 0 violations.
+- [ ] No new `# type: ignore` comments are added without justification.
+- [ ] Pre-commit hook or validation script is documented and active.
\ No newline at end of file
diff --git a/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md b/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
index 10e641f..b78fb26 100644
--- a/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
+++ b/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
@@ -1,14 +1,36 @@
-# Implementation Plan: Test Suite Performance
+# Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
 
 ## Phase 1: Audit & Polling Primitives
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
-- [ ] Task: Create deterministic polling primitives in `conftest.py`.
-- [ ] Task: Conductor - User Manual Verification 'Phase 1'
+- [ ] Task: Create Deterministic Polling Primitives
+    - [ ] WHERE: `tests/conftest.py`
+    - [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
+    - [ ] HOW: Standard while loop that evaluates `predicate_fn()`.
+    - [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)
 
-## Phase 2: Refactoring Sleeps
-- [ ] Task: Replace `time.sleep` across integration tests.
-- [ ] Task: Conductor - User Manual Verification 'Phase 2'
+## Phase 2: Refactoring Integration Tests
+- [ ] Task: Refactor `test_spawn_interception.py`
+    - [ ] WHERE: `tests/test_spawn_interception.py`
+    - [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
+    - [ ] HOW: Use the new `conftest.py` utility.
+    - [ ] SAFETY: Prevent event loop deadlocks.
+- [ ] Task: Refactor Simulation Waits
+    - [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
+    - [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
+    - [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
+    - [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
+- [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)
 
-## Phase 3: Test Marking
-- [ ] Task: Apply `@pytest.mark.slow` to long-running tests.
-- [ ] Task: Conductor - User Manual Verification 'Phase 3'
\ No newline at end of file
+## Phase 3: Test Marking & Final Validation
+- [ ] Task: Apply Slow Test Marks
+    - [ ] WHERE: Across all `tests/`
+    - [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
+    - [ ] HOW: Import pytest and apply the decorator.
+    - [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
+- [ ] Task: Full Suite Performance Validation
+    - [ ] WHERE: Project root
+    - [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
+    - [ ] HOW: Time the terminal command.
+    - [ ] SAFETY: None.
+- [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
\ No newline at end of file
diff --git a/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md b/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md
index a82421e..6f4d1f8 100644
--- a/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md
+++ b/conductor/tracks/test_suite_performance_and_flakiness_20260302/spec.md
@@ -1,9 +1,19 @@
-# Track Specification: Test Suite Performance & Flakiness
+# Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
 
 ## Overview
-The test suite is slow and flaky due to `time.sleep()`. This track replaces sleeps with deterministic polling (`threading.Event()`), aiming for a <10s core TDD loop.
+The test suite currently takes over 5.0 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or timing out. This track replaces arbitrary `time.sleep()` calls with deterministic polling (`threading.Event()`), aiming to drive the core TDD test execution time down to under 10 seconds.
+
+## Architectural Constraints
+- **Zero Arbitrary Sleeps**: `time.sleep(1.0)` is banned in test files unless testing actual rate-limiting or debounce functionality.
+- **Deterministic Waits**: Tests must use state-polling (with aggressive micro-sleeps) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
 
 ## Functional Requirements
-- Audit and remove `time.sleep()` in tests.
-- Implement deterministic event polling.
-- Mark slow integration tests with `@pytest.mark.slow`.
\ No newline at end of file
+- Audit all `tests/` and `simulation/` files for `time.sleep()`.
+- Implement polling helper functions in `conftest.py` (e.g., `wait_until(condition_func, timeout)`).
+- Refactor all integration tests to use the deterministic polling helpers.
+- Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
+
+## Acceptance Criteria
+- [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
+- [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
+- [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.
\ No newline at end of file
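
Review note on the test-suite track: the plan asks for a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` helper in `tests/conftest.py` that raises a clear `TimeoutError`. A minimal sketch of one way this could look follows. The signature and the exception type are taken from the plan. The body, the use of `time.monotonic()`, and the error message wording are assumptions, not the project's actual implementation.

```python
import time
from typing import Callable


def wait_until(
    predicate_fn: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll predicate_fn until it returns a truthy value.

    Raises TimeoutError if the predicate is still falsy once `timeout`
    seconds have elapsed. The short sleep between checks is the
    "micro-sleep" the spec allows, not an arbitrary fixed wait.
    """
    # monotonic() is unaffected by wall-clock adjustments (assumed choice).
    deadline = time.monotonic() + timeout
    while True:
        if predicate_fn():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"wait_until: condition not met within {timeout:.2f}s"
            )
        time.sleep(interval)
```

A test would then replace a fixed `time.sleep(1.0)` with something like `wait_until(lambda: len(events) > 0)`, where `events` stands in for whatever state the test observes. The predicate is checked once before the first sleep, so a condition that already holds returns immediately.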