diff --git a/TASKS.md b/TASKS.md
index 7d40b2f..770663a 100644
--- a/TASKS.md
+++ b/TASKS.md
@@ -16,71 +16,61 @@
 ## Planned: The Strict Execution Queue
 *All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.*
 
-### 1. `test_stabilization_20260302` (Active/Next)
-- **Status:** Initialized / Looked Over
+> [!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05)
+> The `gui_decoupling` track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses).
+> **Current Testing Policy:**
+> - Full-suite integration tests (`live_gui` / extended sims) are currently considered **"flaky by design"**.
+> - Do NOT write new `live_gui` simulations until Track #5 and #6 are complete.
+> - If unit tests pass but `test_extended_sims.py` hangs or fails locally, you may manually verify the GUI behavior and proceed.
+
+### 1. `test_stabilization_20260302` (Archived)
+- **Status:** Completed
 - **Priority:** High
 - **Goal:** Stabilize `asyncio` errors, ban mock-rot, completely remove `gui_legacy.py`, and consolidate testing paradigms.
 
-### 2. `strict_static_analysis_and_typing_20260302`
-- **Status:** Initialized / Looked Over
+### 2. `strict_static_analysis_and_typing_20260302` (Archived)
+- **Status:** Completed
 - **Priority:** High
 - **Goal:** Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.
 
-### 3. `codebase_migration_20260302`
-- **Status:** Initialized / Looked Over
+### 3. `codebase_migration_20260302` (Archived)
+- **Status:** Completed
 - **Priority:** High
 - **Goal:** Restructure directories to a `src/` layout. Doing this after static analysis ensures no hidden import bugs are introduced. Creates `sloppy.py` entry point.
 
-### 4. `gui_decoupling_controller_20260302`
-- **Status:** Initialized / Looked Over
+### 4. `gui_decoupling_controller_20260302` (Archived)
+- **Status:** Completed
 - **Priority:** High
 - **Goal:** Extract the state machine and core lifecycle into a headless `app_controller.py`, leaving `gui_2.py` as a pure, immediate-mode view.
 
-### 5. `hook_api_ui_state_verification_20260302`
+### 5. `hook_api_ui_state_verification_20260302` (Active/Next)
 - **Status:** Initialized / Looked Over
-- **Priority:** Medium
-- **Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation.
+- **Priority:** High
+- **Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation.
+- **Fixes Test Debt:** Replaces brittle `time.sleep()` and string-matching assertions in simulations with deterministic API queries.
 
-### 6. `robust_json_parsing_tech_lead_20260302`
+### 6. `test_suite_performance_and_flakiness_20260302`
+- **Status:** Initialized / Looked Over
+- **Priority:** High
+- **Goal:** Resolve deep asyncio/threading deadlocks. Replace `asyncio.Queue` in `AppController` with a standard `queue.Queue`. Ensure phantom subprocesses are killed.
+- **Fixes Test Debt:** Eliminates `RuntimeError: Event loop is closed` and zombie port 8999 hijacking. Restores full-suite reliability.
+
+### 7. `robust_json_parsing_tech_lead_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** Medium
 - **Goal:** Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
 
-### 7. `concurrent_tier_source_tier_20260302`
+### 8. `concurrent_tier_source_tier_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** Low
 - **Goal:** Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
 
-### 8. `test_suite_performance_and_flakiness_20260302`
-- **Status:** Initialized / Looked Over
-- **Priority:** Low
-- **Goal:** Replace `time.sleep()` with deterministic polling or `threading.Event()` triggers. Mark exceptionally heavy tests with `@pytest.mark.slow`.
-
 ### 9. `manual_ux_validation_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** Medium
 - **Goal:** Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
 
----
-
-## Phase 3: Future Horizons (Post-Hardening Backlog)
-*To be evaluated in a future Tier 1 session once the Strict Execution Queue is cleared and the architectural foundation is stabilized.*
-
-### 1. True Parallel Worker Execution (The DAG Realization)
-**Goal:** Implement true concurrency for the DAG engine. Once `threading.local()` is in place, the `ExecutionEngine` should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
-
-### 2. Deep AST-Driven Context Pruning (RAG for Code)
-**Goal:** Before dispatching a Tier 3 worker, use `tree_sitter` to automatically parse the target file's AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker's prompt. Guarantees the AI only "sees" what it needs to edit, drastically reducing token burn.
-
-### 3. Visual DAG & Interactive Ticket Editing
-**Goal:** Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle's node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking "Execute Pipeline."
-
-### 4. Advanced Tier 4 QA Auto-Patching
-**Goal:** Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a `.patch` file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks "Apply Patch" to instantly resume the pipeline.
-
-### 5. Transitioning to a Native Orchestrator
-**Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write `plan.md`, manage the `metadata.json`, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (`mma_exec.py`).
-### 10. est_architecture_integrity_audit_20260304 (Planned)
-- **Status:** Initialized
+### 10. `test_architecture_integrity_audit_20260304`
+- **Status:** Audit Completed
 - **Priority:** High
-- **Goal:** Comprehensive audit of testing infrastructure and simulation framework to identify false positive risks, coverage gaps, and simulation fidelity issues. Documented by GLM-4.7 via full skeletal analysis of src/, tests/, and simulation/ directories.
+- **Goal:** Comprehensive audit of testing infrastructure and simulation framework. Produced `report_gemini.md` detailing exact mechanical failures and remediation paths.
diff --git a/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md b/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
index ba95bd9..823dd6d 100644
--- a/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
+++ b/conductor/tracks/hook_api_ui_state_verification_20260302/plan.md
@@ -1,5 +1,7 @@
 # Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
 
+> **TEST DEBT FIX:** This track replaces fragile `time.sleep()` and string-matching assertions in simulations (like `test_visual_sim_mma_v2.py`) with deterministic UI state queries. This is critical for stabilizing the test suite after the GUI decoupling.
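Reviewer sketch (not part of the patch): the note above names the pattern but not its shape. One possible form of a deterministic state query is a bounded poll that returns as soon as a predicate on the GUI state holds and fails loudly on timeout. The helper name `wait_for_state`, the `fetch_state` callable, and the `"status"` key are hypothetical. In a real simulation `fetch_state` would presumably wrap a GET against `/api/gui/state`, whose response shape is not specified in this diff.

```python
import time
from typing import Any, Callable, Dict


def wait_for_state(
    fetch_state: Callable[[], Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> Dict[str, Any]:
    """Poll fetch_state() until predicate(state) is true or the timeout elapses.

    Hypothetical helper: fetch_state stands in for a GET on /api/gui/state.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if predicate(state):
            return state  # condition met: return immediately, no fixed wait
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"GUI state did not satisfy predicate within {timeout}s; "
                f"last state: {state!r}"
            )
        time.sleep(interval)  # short, bounded pause between queries


if __name__ == "__main__":
    # Fake state source standing in for the real endpoint (assumption).
    calls = {"n": 0}

    def fake_fetch() -> Dict[str, Any]:
        calls["n"] += 1
        return {"status": "done" if calls["n"] >= 3 else "running"}

    print(wait_for_state(fake_fetch, lambda s: s["status"] == "done"))
```

Unlike a bare `time.sleep(N)`, the wait ends as soon as the state matches, and a mismatch surfaces as a `TimeoutError` carrying the last observed state rather than as a downstream string-matching failure.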
+
 ## Phase 1: API Endpoint Implementation
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Implement `/api/gui/state` GET Endpoint
diff --git a/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md b/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
index 1cccfa1..0c86d53 100644
--- a/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
+++ b/conductor/tracks/robust_json_parsing_tech_lead_20260302/plan.md
@@ -1,5 +1,7 @@
 # Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
 
+> **TEST DEBT FIX:** Due to ongoing test architecture instability (documented in `test_architecture_integrity_audit_20260304`), do NOT write new `live_gui` integration tests for this track. Rely strictly on in-process `unittest.mock` for the `ai_client` to verify the retry logic.
+
 ## Phase 1: Implementation of Retry Logic
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Implement Retry Loop in `generate_tickets`
diff --git a/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md b/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
index b78fb26..48842d5 100644
--- a/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
+++ b/conductor/tracks/test_suite_performance_and_flakiness_20260302/plan.md
@@ -1,6 +1,21 @@
 # Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
 
-## Phase 1: Audit & Polling Primitives
+> **TEST DEBT FIX:** This track is responsible for eliminating the `RuntimeError: Event loop is closed` deadlocks and zombie subprocesses discovered during the `test_architecture_integrity_audit_20260304` audit.
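Reviewer sketch (not part of the patch): the track's stated goal in `TASKS.md` is to replace `asyncio.Queue` in `AppController` with a standard `queue.Queue`. A minimal illustration of that pattern follows, assuming a hypothetical `EventWorker` class. It is not the project's `AppController`, and the `(name, payload)` event shape is invented for the example. Since no event loop is involved, teardown cannot hit `Event loop is closed`, and a sentinel value gives the loop a clean exit.

```python
import logging
import queue
import threading
from typing import Any, Callable, Optional

_STOP = object()  # sentinel that tells the worker loop to exit


class EventWorker:
    """Hypothetical stand-in for an asyncio-free internal event queue."""

    def __init__(self, handler: Callable[[str, Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._handler = handler
        # daemon=True so a forgotten worker never blocks interpreter exit
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, name: str, payload: Optional[Any] = None) -> None:
        """Thread-safe enqueue; callable from GUI or test threads."""
        self._queue.put((name, payload))

    def _run(self) -> None:
        # Plain synchronous loop: blocks on get(), no coroutines involved.
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            name, payload = item
            try:
                self._handler(name, payload)
            except Exception:
                # Log and continue so one bad event does not kill the loop.
                logging.exception("event handler failed for %r", name)

    def stop(self, timeout: float = 2.0) -> bool:
        """Request shutdown; return True if the worker thread exited."""
        self._queue.put(_STOP)
        self._thread.join(timeout)
        return not self._thread.is_alive()


if __name__ == "__main__":
    seen = []
    worker = EventWorker(lambda name, payload: seen.append((name, payload)))
    worker.put("refresh", {"panel": "tickets"})
    worker.stop()
    print(seen)
```

A test fixture can call `stop()` during teardown and assert on its return value, which turns a hung worker into an explicit failure rather than a silent leak.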
+
+## Phase 1: Asyncio Decoupling & Queue Refactor
+- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
+- [ ] Task: Rip out `asyncio` from `AppController`
+    - [ ] WHERE: `src/app_controller.py` and `src/events.py`
+    - [ ] WHAT: Replace `events.AsyncEventQueue` with a standard `queue.Queue`. Convert `_process_event_queue` to a synchronous `while True` loop running in a daemon thread.
+    - [ ] HOW: Remove all `async`/`await` and `asyncio.run_coroutine_threadsafe` calls related to the internal event queue.
+    - [ ] SAFETY: Ensures test teardowns no longer violently crash background event loops.
+- [ ] Task: Ensure Phantom Processes are Killed
+    - [ ] WHERE: `tests/conftest.py`
+    - [ ] WHAT: Verify the `kill_process_tree` implementation added in the audit is fully robust against hanging `sloppy.py` instances.
+    - [ ] HOW: Test with intentional process hangs.
+- [ ] Task: Conductor - User Manual Verification 'Phase 1: Asyncio Decoupling'
+
+## Phase 2: Audit & Polling Primitives
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Create Deterministic Polling Primitives
     - [ ] WHERE: `tests/conftest.py`