chore(conductor): Archive completed and deprecated tracks

- Moved codebase_migration_20260302 to archive
- Moved gui_decoupling_controller_20260302 to archive
- Moved test_architecture_integrity_audit_20260304 to archive
- Removed deprecated test_suite_performance_and_flakiness_20260302
docs: Reorder track queue and initialize final stabilization tracks
2026-03-05 09:51:11 -05:00 · 2026-03-05 09:43:42 -05:00 · 2026-03-05 09:35:03 -05:00 · 2026-03-05 09:32:24 -05:00
37 changed files with 221 additions and 117 deletions
@@ -6,60 +6,68 @@
 *(none — all planned tracks queued below)*
 ## Completed This Session
 - `test_architecture_integrity_audit_20260304` — Comprehensive test architecture audit completed. Wrote exhaustive report_gemini.md detailing fixing the "Triple Bingo" streaming history explosion, Destructive IPC Read drops, and Asyncio deadlocks. Checkpoint: e3c6b9e.
 - `mma_agent_focus_ux_20260302` — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
 - `feature_bleed_cleanup_20260302` — Removed dead comms panel dup, dead menubar block, duplicate __init__ vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
 - `context_token_viz_20260301` — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.
 - `tech_debt_and_test_cleanup_20260302` — [BOTCHED/ARCHIVED] Centralized fixtures but exposed deep asyncio flaws.
 ---
 ## Planned: The Strict Execution Queue
 *All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.*
-### 1. `test_stabilization_20260302` (Active/Next)
+> [!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05)
- **Status:** Initialized / Looked Over
+> The `gui_decoupling` track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses). 
- **Priority:** High
+> **Current Testing Policy:** 
- **Goal:** Stabilize `asyncio` errors, ban mock-rot, completely remove `gui_legacy.py`, and consolidate testing paradigms.
+> - Full-suite integration tests (`live_gui` / extended sims) are currently considered **"flaky by design"**. 
 > - Do NOT write new `live_gui` simulations until Tracks #1, #2, and #3 are complete.
 > - If unit tests pass but `test_extended_sims.py` hangs or fails locally, you may manually verify the GUI behavior and proceed.
-### 2. `strict_static_analysis_and_typing_20260302`
+### 1. `hook_api_ui_state_verification_20260302` (Active/Next)
- **Status:** Initialized / Looked Over
+- **Status:** Initialized
 - **Priority:** High
 - **Goal:** Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.
 ### 3. `codebase_migration_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** High
 - **Goal:** Restructure directories to a `src/` layout. Doing this after static analysis ensures no hidden import bugs are introduced. Creates `sloppy.py` entry point.
 ### 4. `gui_decoupling_controller_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** High
 - **Goal:** Extract the state machine and core lifecycle into a headless `app_controller.py`, leaving `gui_2.py` as a pure, immediate-mode view.
 ### 5. `hook_api_ui_state_verification_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** Medium
 - **Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation. 
 - **Fixes Test Debt:** Replaces brittle `time.sleep()` and string-matching assertions in simulations with deterministic API queries.
-### 6. `robust_json_parsing_tech_lead_20260302`
+### 2. `asyncio_decoupling_refactor_20260306`
- **Status:** Initialized / Looked Over
+- **Status:** Initialized
 - **Priority:** High
 - **Goal:** Resolve deep asyncio/threading deadlocks. Replace `asyncio.Queue` in `AppController` with a standard `queue.Queue`. Ensure phantom subprocesses are killed.
 - **Fixes Test Debt:** Eliminates `RuntimeError: Event loop is closed` and zombie port 8999 hijacking. Restores full-suite reliability.
 ### 3. `mock_provider_hardening_20260305`
 - **Status:** Initialized
 - **Priority:** Medium
 - **Goal:** Introduce negative testing paths (malformed JSON, timeouts) into the mock AI provider.
 - **Fixes Test Debt:** Allows the test suite to verify error handling flows that were previously masked by a mock provider that only ever returned success.
 ### 4. `robust_json_parsing_tech_lead_20260302`
 - **Status:** Initialized
 - **Priority:** Medium
 - **Goal:** Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
 - **Test Debt Note:** Rely strictly on in-process `unittest.mock` to verify the retry logic until stabilization tracks are done.
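A minimal sketch of the auto-retry loop described in the goal above, under stated assumptions: `call_model` is a hypothetical callable standing in for the Tier 2 model, and the wording of the correction prompt is illustrative only. It is not the project's `generate_tickets` implementation.

```python
import json
import traceback
from typing import Any, Callable


def generate_with_retry(
    call_model: Callable[[str], str],  # hypothetical stand-in for the Tier 2 model
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Parse the model reply as JSON; on failure, feed the traceback back and retry."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            tb = "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
            # Append the traceback so the model can correct its own output.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON:\n{tb}\n"
                "Reply with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Because `call_model` is injected, the retry path can be exercised with an in-process fake, which matches the test debt note above.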
-### 7. `concurrent_tier_source_tier_20260302`
+### 5. `concurrent_tier_source_tier_20260302`
- **Status:** Initialized / Looked Over
+- **Status:** Initialized
 - **Priority:** Low
 - **Goal:** Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
 - **Test Debt Note:** Use in-process mocks to verify concurrency.
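As a rough illustration of the `threading.local()` option named in the goal above, the sketch below keeps a per-thread source tier so parallel workers cannot overwrite each other's tag. The helper names (`set_source_tier`, `get_source_tier`, `run_workers`) are hypothetical and not taken from `ai_client`.

```python
import threading
from typing import Dict, List

# Each thread sees its own attributes on this object.
_context = threading.local()


def set_source_tier(tier: str) -> None:
    _context.source_tier = tier


def get_source_tier() -> str:
    # Threads that never set a tier fall back to a default.
    return getattr(_context, "source_tier", "unknown")


def run_workers(tiers: List[str]) -> Dict[str, str]:
    """Start one thread per tier and record what each thread reads back."""
    results: Dict[str, str] = {}
    barrier = threading.Barrier(len(tiers))

    def worker(tier: str) -> None:
        set_source_tier(tier)
        barrier.wait()  # every thread has written before any thread reads
        results[tier] = get_source_tier()

    threads = [threading.Thread(target=worker, args=(t,)) for t in tiers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The barrier forces all writes to happen before any read, so a shared global would show cross-talk here while the thread-local value does not.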
-### 8. `test_suite_performance_and_flakiness_20260302`
+### 6. `manual_ux_validation_20260302`
- **Status:** Initialized / Looked Over
+- **Status:** Initialized
 - **Priority:** Low
 - **Goal:** Replace `time.sleep()` with deterministic polling or `threading.Event()` triggers. Mark exceptionally heavy tests with `@pytest.mark.slow`.
 ### 9. `manual_ux_validation_20260302`
 - **Status:** Initialized / Looked Over
 - **Priority:** Medium
 - **Goal:** Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
 - **Test Debt Note:** Naturally bypasses automated testing debt as it is purely human-in-the-loop.
 ### 7. `async_tool_execution_20260303`
 - **Status:** Initialized
 - **Priority:** Medium
 - **Goal:** Refactor MCP tool execution to utilize `asyncio.gather` or thread pools to run multiple tools concurrently within a single AI loop.
 - **Test Debt Note:** Use in-process mocks to verify concurrency.
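One possible shape for the concurrent tool execution described in the goal above, assuming blocking tool functions and Python 3.9+ for `asyncio.to_thread`. `slow_tool` is a hypothetical placeholder, not a real MCP tool.

```python
import asyncio
import time
from typing import List, Sequence, Tuple


def slow_tool(name: str, delay: float) -> str:
    """Hypothetical blocking tool call; the sleep stands in for real I/O."""
    time.sleep(delay)
    return f"{name}: done"


async def run_tools_concurrently(calls: Sequence[Tuple[str, float]]) -> List[str]:
    # Each blocking call runs in a worker thread; gather awaits them all
    # and returns results in the same order as the input.
    tasks = [asyncio.to_thread(slow_tool, name, delay) for name, delay in calls]
    return await asyncio.gather(*tasks)
```

`asyncio.gather` preserves input order, so results can be matched back to the originating tool calls even when a later call finishes first.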
 ### 8. `simulation_fidelity_enhancement_20260305`
 - **Status:** Initialized
 - **Priority:** Low
 - **Goal:** Add human-like jitter, hesitation, and reading latency to the UserSimAgent.
 ---
@@ -80,7 +88,3 @@
 ### 5. Transitioning to a Native Orchestrator
 **Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write `plan.md`, manage the `metadata.json`, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (`mma_exec.py`).
 ### 10. `test_architecture_integrity_audit_20260304` (Planned)
 - **Status:** Initialized
 - **Priority:** High
 - **Goal:** Comprehensive audit of testing infrastructure and simulation framework to identify false positive risks, coverage gaps, and simulation fidelity issues. Documented by GLM-4.7 via full skeletal analysis of src/, tests/, and simulation/ directories.
@@ -8,34 +8,43 @@ This file tracks all major tracks for the project. Each track has its own detail
 *The following tracks MUST be executed in this exact order to safely resolve tech debt before feature development.*
-1. [x] **Track: Codebase Migration to `src` & Cleanup**
+1. [ ] **Track: Hook API UI State Verification**
 *Link: [./tracks/codebase_migration_20260302/](./tracks/codebase_migration_20260302/)*
 2. [x] **Track: GUI Decoupling & Controller Architecture**
 *Link: [./tracks/gui_decoupling_controller_20260302/](./tracks/gui_decoupling_controller_20260302/)*
 3. [ ] **Track: Hook API UI State Verification**
 *Link: [./tracks/hook_api_ui_state_verification_20260302/](./tracks/hook_api_ui_state_verification_20260302/)*
 2. [ ] **Track: Asyncio Decoupling & Queue Refactor**
 *Link: [./tracks/asyncio_decoupling_refactor_20260306/](./tracks/asyncio_decoupling_refactor_20260306/)*
 3. [ ] **Track: Mock Provider Hardening**
 *Link: [./tracks/mock_provider_hardening_20260305/](./tracks/mock_provider_hardening_20260305/)*
 4. [ ] **Track: Robust JSON Parsing for Tech Lead**
 *Link: [./tracks/robust_json_parsing_tech_lead_20260302/](./tracks/robust_json_parsing_tech_lead_20260302/)*
 5. [ ] **Track: Concurrent Tier Source Isolation**
 *Link: [./tracks/concurrent_tier_source_tier_20260302/](./tracks/concurrent_tier_source_tier_20260302/)*
-6. [ ] **Track: Test Suite Performance & Flakiness**
+6. [ ] **Track: Manual UX Validation & Polish**
 *Link: [./tracks/test_suite_performance_and_flakiness_20260302/](./tracks/test_suite_performance_and_flakiness_20260302/)*
 7. [ ] **Track: Manual UX Validation & Polish**
 *Link: [./tracks/manual_ux_validation_20260302/](./tracks/manual_ux_validation_20260302/)*
-8. [ ] **Track: Asynchronous Tool Execution Engine**
+7. [ ] **Track: Asynchronous Tool Execution Engine**
 *Link: [./tracks/async_tool_execution_20260303/](./tracks/async_tool_execution_20260303/)*
 8. [ ] **Track: Simulation Fidelity Enhancement**
 *Link: [./tracks/simulation_fidelity_enhancement_20260305/](./tracks/simulation_fidelity_enhancement_20260305/)*
 ---
 ## Completed / Archived
 - [x] **Track: Test Architecture Integrity Audit**
 *Link: [./archive/test_architecture_integrity_audit_20260304/](./archive/test_architecture_integrity_audit_20260304/)*
 - [x] **Track: Codebase Migration to `src` & Cleanup**
 *Link: [./archive/codebase_migration_20260302/](./archive/codebase_migration_20260302/)*
 - [x] **Track: GUI Decoupling & Controller Architecture**
 *Link: [./archive/gui_decoupling_controller_20260302/](./archive/gui_decoupling_controller_20260302/)*
 - [x] **Track: Strict Static Analysis & Type Safety**
 *Link: [./archive/strict_static_analysis_and_typing_20260302/](./archive/strict_static_analysis_and_typing_20260302/)*
@@ -1,5 +1,7 @@
 # Implementation Plan: Asynchronous Tool Execution Engine (async_tool_execution_20260303)
 > **TEST DEBT FIX:** Due to ongoing test architecture instability (documented in `test_architecture_integrity_audit_20260304`), do NOT write new `live_gui` integration tests for this track. Use purely in-process mocks to verify concurrency logic.
 ## Phase 1: Engine Refactoring
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Refactor `mcp_client.py` for async execution
@@ -0,0 +1,8 @@
 {
  "id": "asyncio_decoupling_refactor_20260306",
  "title": "Asyncio Decoupling & Queue Refactor",
  "description": "Rip out asyncio from AppController to eliminate test deadlocks.",
  "status": "planned",
  "created_at": "2026-03-05T00:00:00Z",
  "updated_at": "2026-03-05T00:00:00Z"
 }
@@ -0,0 +1,33 @@
 # Implementation Plan: Asyncio Decoupling Refactor (asyncio_decoupling_refactor_20260306)
 > **TEST DEBT FIX:** This track is responsible for permanently eliminating the `RuntimeError: Event loop is closed` test suite crashes by ripping out the conflict-prone asyncio loops from the AppController.
 ## Phase 1: Event System Migration
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Refactor `events.py`
    - [ ] WHERE: `src/events.py`
    - [ ] WHAT: Replace `AsyncEventQueue` with `SyncEventQueue` using `import queue`.
    - [ ] HOW: Change `async def get()` to a blocking `def get()`. Remove `asyncio` imports.
    - [ ] SAFETY: Ensure thread-safety.
 - [ ] Task: Conductor - User Manual Verification 'Phase 1: Event System'
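A minimal sketch of what the synchronous replacement in Phase 1 could look like, assuming events are `(name, payload)` pairs. The method signatures are assumptions; the real `src/events.py` interface is not shown in this diff.

```python
import queue
from typing import Any, Optional, Tuple


class SyncEventQueue:
    """Thread-safe event queue backed by queue.Queue (hypothetical shape)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Tuple[str, Any]]" = queue.Queue()

    def put(self, event_name: str, payload: Any = None) -> None:
        # queue.Queue does its own locking, so any thread may call put().
        self._queue.put((event_name, payload))

    def get(self, timeout: Optional[float] = None) -> Tuple[str, Any]:
        # Blocks until an event arrives; raises queue.Empty if timeout expires.
        return self._queue.get(timeout=timeout)
```

Thread-safety here comes from `queue.Queue` itself, so no extra locks are needed around `put()` and `get()`.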
 ## Phase 2: AppController Decoupling
 - [ ] Task: Refactor `AppController` Event Loop
    - [ ] WHERE: `src/app_controller.py`
    - [ ] WHAT: Remove `self._loop` and `asyncio.new_event_loop()`.
    - [ ] HOW: Change `_run_event_loop` to just call `_process_event_queue` directly (which will now block on queue gets).
    - [ ] SAFETY: Ensure `shutdown()` properly signals the queue to unblock and join the thread.
 - [ ] Task: Thread Task Dispatching
    - [ ] WHERE: `src/app_controller.py`
    - [ ] WHAT: Replace `asyncio.run_coroutine_threadsafe(self.event_queue.put(...))` with direct synchronous `.put()`. Replace `self._loop.run_in_executor` with `threading.Thread(target=self._handle_request_event)`.
    - [ ] HOW: Mechanical replacement of async primitives.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 2: Decoupling'
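The shutdown requirement in Phase 2 can be met with a sentinel object that unblocks the worker's `get()`. The sketch below is one common way to do it; `EventWorker` and `_SHUTDOWN` are hypothetical names, not the `AppController` code.

```python
import queue
import threading
from typing import Any, List

_SHUTDOWN = object()  # sentinel: tells the worker loop to exit


class EventWorker:
    """Processes events on a daemon thread that blocks on queue.get()."""

    def __init__(self) -> None:
        self.event_queue: "queue.Queue[Any]" = queue.Queue()
        self.handled: List[Any] = []
        self._thread = threading.Thread(
            target=self._process_event_queue, daemon=True
        )
        self._thread.start()

    def _process_event_queue(self) -> None:
        while True:
            event = self.event_queue.get()  # blocks until something arrives
            if event is _SHUTDOWN:
                break
            self.handled.append(event)

    def shutdown(self, timeout: float = 2.0) -> bool:
        # The sentinel wakes the blocked get(); join waits for a clean exit.
        self.event_queue.put(_SHUTDOWN)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

Since the queue is FIFO, events enqueued before `shutdown()` are still handled before the thread exits.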
 ## Phase 3: Final Validation
 - [ ] Task: Full Suite Validation
    - [ ] WHERE: Project root
    - [ ] WHAT: `uv run pytest`
    - [ ] HOW: Ensure 100% pass rate with no hanging threads or event loop errors.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation'
@@ -0,0 +1,14 @@
 # Specification: Asyncio Decoupling & Refactor
 ## Background
 The `AppController` currently utilizes an internal `asyncio.Queue` and a dedicated `_loop_thread` to manage background tasks and GUI updates. As identified in the `test_architecture_integrity_audit_20260304`, this architecture leads to severe event loop exhaustion and `RuntimeError: Event loop is closed` deadlocks during full test suite runs due to conflicts with `pytest-asyncio`'s loop management.
 ## Objective
 Remove all `asyncio` dependencies from `AppController` and `events.py`. Replace the asynchronous event queue with a standard, thread-safe `queue.Queue` from Python's standard library. 
 ## Requirements
 1. **Remove Asyncio:** Strip `import asyncio` from `app_controller.py` and `events.py`.
 2. **Synchronous Queues:** Convert `events.AsyncEventQueue` to a standard synchronous wrapper around `queue.Queue`.
 3. **Daemon Thread Processing:** Convert `AppController._process_event_queue` from an `async def` to a standard synchronous `def` that blocks on `self.event_queue.get()`.
 4. **Thread Offloading:** Use `threading.Thread` or `concurrent.futures.ThreadPoolExecutor` to handle AI request dispatching (instead of `self._loop.run_in_executor`).
 5. **No Regressions:** The application must remain responsive (60 FPS) and all unit/integration tests must pass cleanly.
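For requirement 4, a thread pool is one way to replace `self._loop.run_in_executor`. This sketch assumes a hypothetical `handle_request_event` function and a `RequestDispatcher` wrapper that do not appear in the source.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def handle_request_event(prompt: str) -> str:
    """Hypothetical AI request handler; the real one would call the provider."""
    return prompt.upper()


class RequestDispatcher:
    """Offloads request handling so the caller's thread is never blocked."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def dispatch(self, prompt: str) -> Future:
        # submit() returns immediately; the work runs on a pool thread.
        return self._pool.submit(handle_request_event, prompt)

    def shutdown(self) -> None:
        # Waits for in-flight requests so no worker threads are left behind.
        self._pool.shutdown(wait=True)
```

Compared with spawning a bare `threading.Thread` per request, a pool bounds the number of concurrent workers and gives a single place to wait for them at shutdown.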
@@ -1,5 +1,7 @@
 # Implementation Plan: Concurrent Tier Source Isolation (concurrent_tier_source_tier_20260302)
 > **TEST DEBT FIX:** Due to ongoing test architecture instability (documented in `test_architecture_integrity_audit_20260304`), do NOT write new `live_gui` integration tests for this track. Rely strictly on in-process `unittest.mock` for `ai_client` concurrency verification.
 ## Phase 1: Thread-Local Context Refactoring
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Refactor `ai_client` to `threading.local()`
@@ -1,5 +1,7 @@
 # Implementation Plan: Hook API UI State Verification (hook_api_ui_state_verification_20260302)
 > **TEST DEBT FIX:** This track replaces fragile `time.sleep()` and string-matching assertions in simulations (like `test_visual_sim_mma_v2.py`) with deterministic UI state queries. This is critical for stabilizing the test suite after the GUI decoupling.
 ## Phase 1: API Endpoint Implementation
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Implement `/api/gui/state` GET Endpoint
@@ -1,5 +1,7 @@
 # Implementation Plan: Manual UX Validation & Polish (manual_ux_validation_20260302)
 > **TEST DEBT NOTE:** This track is explicitly manual/visual and naturally bypasses the current `live_gui` automated testing debt (documented in `test_architecture_integrity_audit_20260304`). 
 ## Phase 1: Observation Harness Setup
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Create Slow-Mode Simulation
@@ -0,0 +1,8 @@
 {
  "id": "mock_provider_hardening_20260305",
  "title": "Mock Provider Hardening",
  "description": "Introduce negative testing paths (malformed JSON, timeouts) into the mock AI provider.",
  "status": "planned",
  "created_at": "2026-03-05T00:00:00Z",
  "updated_at": "2026-03-05T00:00:00Z"
 }
@@ -0,0 +1,26 @@
 # Implementation Plan: Mock Provider Hardening (mock_provider_hardening_20260305)
 ## Phase 1: Mock Script Extension
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Add `MOCK_MODE` to `mock_gemini_cli.py`
    - [ ] WHERE: `tests/mock_gemini_cli.py`
    - [ ] WHAT: Implement conditional branches based on `MOCK_MODE` environment variable.
    - [ ] HOW: Support `success`, `malformed_json`, `error_result`, and `timeout`.
    - [ ] SAFETY: Ensure the mock still defaults to `success` so existing tests do not break.
 - [ ] Task: Conductor - User Manual Verification 'Phase 1: Mock Extension'
 ## Phase 2: Negative Path Testing
 - [ ] Task: Write `test_negative_flows.py`
    - [ ] WHERE: `tests/test_negative_flows.py`
    - [ ] WHAT: Write tests that launch `live_gui`, inject `MOCK_MODE` via `ApiHookClient` custom callback or `env` dictionary, and assert the UI gracefully handles the failure.
    - [ ] HOW: Use `wait_for_event('response')` and check that the payload status is `"error"`.
    - [ ] SAFETY: Ensure `timeout` tests don't actually hang the test suite for 120s (configure the timeout shorter if possible in test setup).
 - [ ] Task: Conductor - User Manual Verification 'Phase 2: Negative Tests'
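To keep `timeout` tests from hanging the suite, the parent-side timeout can be made a parameter and set low in tests. The sketch below uses the standard `subprocess.run(..., timeout=...)` behaviour; `run_provider` and its return shape are assumptions, not the project's `ai_client` API.

```python
import subprocess
import sys
from typing import Dict, List


def run_provider(cmd: List[str], timeout_s: float) -> Dict[str, str]:
    """Run a provider subprocess and turn a timeout into an error payload."""
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return {"status": "error", "message": f"timed out after {timeout_s}s"}
    return {"status": "ok", "stdout": completed.stdout}


# Stand-in for a mock launched in timeout mode: a child that sleeps too long.
SLEEPY_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]
```

With a sub-second `timeout_s`, the timeout path is covered quickly even though the mock itself would sleep far longer.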
 ## Phase 3: Final Validation
 - [ ] Task: Full Suite Validation
    - [ ] WHERE: Project root
    - [ ] WHAT: `uv run pytest`
    - [ ] HOW: Ensure 100% pass rate.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation'
@@ -0,0 +1,14 @@
 # Specification: Mock Provider Hardening
 ## Background
 The current `mock_gemini_cli.py` provider only tests the "happy path". It always returns successfully parsed JSON-L responses, which masks potential error-handling bugs in `ai_client.py` and `AppController`. To properly verify the system's robustness, the mock must be capable of failing realistically.
 ## Objective
 Extend `mock_gemini_cli.py` to support negative testing paths, controlled via an environment variable `MOCK_MODE`.
 ## Requirements
 1. **MOCK_MODE parsing:** The mock script must read `os.environ.get("MOCK_MODE", "success")`.
 2. **malformed_json:** If mode is `malformed_json`, the mock should print a truncated or syntactically invalid JSON string to `stdout` and exit.
 3. **error_result:** If mode is `error_result`, the mock should print a valid JSON string but with `"status": "error"` and an error message payload.
 4. **timeout:** If mode is `timeout`, the mock should `time.sleep(120)` to force the parent process to handle a subprocess timeout.
 5. **Integration Tests:** New tests must be written to explicitly trigger these modes using `ApiHookClient` and verify that the GUI displays an error state rather than crashing.
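The mode handling in requirements 1 to 4 might be structured as follows. The payload fields other than `"status"` are assumptions, and `select_mode` / `render_mock_output` are hypothetical helper names rather than the contents of `mock_gemini_cli.py`.

```python
import json
import time
from typing import Mapping


def select_mode(env: Mapping[str, str]) -> str:
    # Defaults to "success" so existing tests keep their behaviour.
    return env.get("MOCK_MODE", "success")


def render_mock_output(mode: str) -> str:
    """Return the text the mock would print to stdout for a given mode."""
    if mode == "malformed_json":
        # Deliberately truncated so that json.loads() fails in the caller.
        return '{"status": "success", "text": "trunc'
    if mode == "error_result":
        return json.dumps({"status": "error", "message": "simulated failure"})
    return json.dumps({"status": "success", "text": "ok"})


def run_mock(env: Mapping[str, str], sleep_seconds: float = 120.0) -> str:
    mode = select_mode(env)
    if mode == "timeout":
        # Sleep long enough that the parent process must enforce its timeout.
        time.sleep(sleep_seconds)
        return ""
    return render_mock_output(mode)
```

In the real script, `env` would be `os.environ` and the returned string would be printed to `stdout`; passing the mapping in explicitly keeps the branches testable in-process.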
@@ -1,5 +1,7 @@
 # Implementation Plan: Robust JSON Parsing for Tech Lead (robust_json_parsing_tech_lead_20260302)
 > **TEST DEBT FIX:** Due to ongoing test architecture instability (documented in `test_architecture_integrity_audit_20260304`), do NOT write new `live_gui` integration tests for this track. Rely strictly on in-process `unittest.mock` for the `ai_client` to verify the retry logic.
 ## Phase 1: Implementation of Retry Logic
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Implement Retry Loop in `generate_tickets`
@@ -0,0 +1,8 @@
 {
  "id": "simulation_fidelity_enhancement_20260305",
  "title": "Simulation Fidelity Enhancement",
  "description": "Add human-like jitter, hesitation, and reading latency to the UserSimAgent.",
  "status": "planned",
  "created_at": "2026-03-05T00:00:00Z",
  "updated_at": "2026-03-05T00:00:00Z"
 }
@@ -0,0 +1,26 @@
 # Implementation Plan: Simulation Fidelity Enhancement (simulation_fidelity_enhancement_20260305)
 ## Phase 1: User Agent Modeling
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Update `UserSimAgent`
    - [ ] WHERE: `simulation/user_agent.py`
    - [ ] WHAT: Add reading delay calculation (based on word count), typing jitter for input fields, and action hesitation probabilities.
    - [ ] HOW: Use Python's `random` module to introduce variance.
    - [ ] SAFETY: Ensure these delays are configurable so that fast test runs can disable them.
 - [ ] Task: Conductor - User Manual Verification 'Phase 1: Agent Modeling'
 ## Phase 2: Application to Simulations
 - [ ] Task: Update Simulator
    - [ ] WHERE: `simulation/workflow_sim.py`
    - [ ] WHAT: Inject the `UserSimAgent` into the standard workflow steps (e.g., waiting before approving a ticket).
    - [ ] HOW: Call the agent's delay methods before executing `ApiHookClient` commands.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 2: Simulator Integration'
 ## Phase 3: Final Validation
 - [ ] Task: Watch Simulation
    - [ ] WHERE: Terminal
    - [ ] WHAT: Run `python simulation/sim_execution.py` locally and observe the pacing.
    - [ ] HOW: Verify that pauses scale with response length and that occasional longer hesitations occur before critical actions.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation'
@@ -0,0 +1,12 @@
 # Specification: Simulation Fidelity Enhancement
 ## Background
 The `simulation/user_agent.py` currently relies on fixed random delays to simulate human typing. As identified in the architecture audit, this provides a low-fidelity simulation of actual user interactions, which may hide UI rendering glitches that only appear when ImGui is forced to render intermediate, hesitating states.
 ## Objective
 Enhance the `UserSimAgent` to behave more like a human, introducing realistic jitter, hesitation, and reading delays.
 ## Requirements
 1. **Variable Reading Latency:** Calculate artificial delays based on the length of the AI's response to simulate the user reading the text before clicking next.
 2. **Typing Jitter:** Instead of just injecting text instantly, simulate keystrokes with slight random delays if testing input fields (optional, but good for stress testing the render loop).
 3. **Hesitation Vectors:** Introduce a random chance for a longer "hesitation" delay (e.g., 2-5 seconds) before critical actions like "Approve Script".
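The three requirements above could be modelled roughly as below. The class name `HumanPacing`, the reading speed, and the jitter ranges are illustrative assumptions; only the 2-5 second hesitation range comes from the specification.

```python
import random
from typing import List, Optional


class HumanPacing:
    """Computes human-like delays; returns zeros when disabled for fast runs."""

    def __init__(
        self,
        words_per_minute: float = 240.0,  # assumed reading speed
        hesitation_chance: float = 0.15,  # assumed probability
        enabled: bool = True,
        seed: Optional[int] = None,
    ) -> None:
        self.words_per_minute = words_per_minute
        self.hesitation_chance = hesitation_chance
        self.enabled = enabled
        self._rng = random.Random(seed)

    def reading_delay(self, text: str) -> float:
        """Seconds to 'read' text, scaled by word count with +/-20% variance."""
        if not self.enabled:
            return 0.0
        base = len(text.split()) / self.words_per_minute * 60.0
        return base * self._rng.uniform(0.8, 1.2)

    def keystroke_delays(self, text: str) -> List[float]:
        """One small random delay per character typed."""
        if not self.enabled:
            return [0.0] * len(text)
        return [self._rng.uniform(0.03, 0.15) for _ in text]

    def hesitation_delay(self) -> float:
        """Occasionally returns a 2-5 second pause before a critical action."""
        if self.enabled and self._rng.random() < self.hesitation_chance:
            return self._rng.uniform(2.0, 5.0)
        return 0.0
```

The methods only compute delays and leave the sleeping to the caller, so a simulation can apply them with `time.sleep()` while unit tests check the values directly.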
@@ -1,5 +0,0 @@
 # Track test_suite_performance_and_flakiness_20260302 Context
 - [Specification](./spec.md)
 - [Implementation Plan](./plan.md)
 - [Metadata](./metadata.json)
@@ -1,8 +0,0 @@
 {
  "track_id": "test_suite_performance_and_flakiness_20260302",
  "type": "chore",
  "status": "new",
  "created_at": "2026-03-02T22:30:00Z",
  "updated_at": "2026-03-02T22:30:00Z",
  "description": "Replace arbitrary time.sleep() calls with deterministic polling/Events and optimize test speed."
 }
@@ -1,36 +0,0 @@
 # Implementation Plan: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
 ## Phase 1: Audit & Polling Primitives
 - [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
 - [ ] Task: Create Deterministic Polling Primitives
    - [ ] WHERE: `tests/conftest.py`
    - [ ] WHAT: Implement a `wait_until(predicate_fn, timeout=5.0, interval=0.05)` utility.
    - [ ] HOW: Standard while loop that evaluates `predicate_fn()`.
    - [ ] SAFETY: Ensure it raises a clear `TimeoutError` if it fails.
 - [ ] Task: Conductor - User Manual Verification 'Phase 1: Polling Primitives' (Protocol in workflow.md)
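A minimal sketch of the `wait_until` primitive described in Phase 1, using the signature given in the plan. The error message text and the use of `time.monotonic()` are choices made here, not taken from the source.

```python
import time
from typing import Callable


def wait_until(
    predicate_fn: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll predicate_fn until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate_fn():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        time.sleep(interval)
```

A test would call something like `wait_until(lambda: state["ready"], timeout=2.0)` in place of a fixed `time.sleep()`, so it proceeds as soon as the condition holds and fails with a clear error otherwise.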
 ## Phase 2: Refactoring Integration Tests
 - [ ] Task: Refactor `test_spawn_interception.py`
    - [ ] WHERE: `tests/test_spawn_interception.py`
    - [ ] WHAT: Replace hardcoded sleeps with `wait_until` checking the `event_queue` or internal state.
    - [ ] HOW: Use the new `conftest.py` utility.
    - [ ] SAFETY: Prevent event loop deadlocks.
 - [ ] Task: Refactor Simulation Waits
    - [ ] WHERE: `simulation/*.py` and `tests/test_live_gui_integration.py`
    - [ ] WHAT: Replace `time.sleep()` blocks with `ApiHookClient.wait_for_event` or `client.wait_until_value_equals`.
    - [ ] HOW: Expand `ApiHookClient` polling capabilities if necessary.
    - [ ] SAFETY: Ensure the GUI hook server remains responsive during rapid polling.
 - [ ] Task: Conductor - User Manual Verification 'Phase 2: Refactoring Sleeps' (Protocol in workflow.md)
 ## Phase 3: Test Marking & Final Validation
 - [ ] Task: Apply Slow Test Marks
    - [ ] WHERE: Across all `tests/`
    - [ ] WHAT: Add `@pytest.mark.slow` to any test requiring a live GUI boot or API mocking that takes >2 seconds.
    - [ ] HOW: Import pytest and apply the decorator.
    - [ ] SAFETY: Update `pyproject.toml` to register the `slow` marker.
 - [ ] Task: Full Suite Performance Validation
    - [ ] WHERE: Project root
    - [ ] WHAT: Run `uv run pytest -m "not slow"` and verify execution time < 10 seconds. Run `uv run pytest` to ensure total suite passes.
    - [ ] HOW: Time the terminal command.
    - [ ] SAFETY: None.
 - [ ] Task: Conductor - User Manual Verification 'Phase 3: Final Validation' (Protocol in workflow.md)
@@ -1,19 +0,0 @@
 # Track Specification: Test Suite Performance & Flakiness (test_suite_performance_and_flakiness_20260302)
 ## Overview
 The test suite currently takes over 5.0 minutes to execute and frequently hangs on integration tests (e.g., `test_spawn_interception.py`). Several simulation tests are flaky or timing out. This track replaces arbitrary `time.sleep()` calls with deterministic polling (`threading.Event()`), aiming to drive the core TDD test execution time down to under 10 seconds.
 ## Architectural Constraints
 - **Zero Arbitrary Sleeps**: `time.sleep(1.0)` is banned in test files unless testing actual rate-limiting or debounce functionality.
 - **Deterministic Waits**: Tests must use state-polling (with aggressive micro-sleeps) or `asyncio.Event` / `threading.Event` to proceed exactly when the system is ready.
 ## Functional Requirements
 - Audit all `tests/` and `simulation/` files for `time.sleep()`.
 - Implement polling helper functions in `conftest.py` (e.g., `wait_until(condition_func, timeout)`).
 - Refactor all integration tests to use the deterministic polling helpers.
 - Apply `@pytest.mark.slow` to any test that legitimately takes >2 seconds, allowing developers to skip them during rapid TDD loops.
 ## Acceptance Criteria
 - [ ] `time.sleep` occurrences in the test suite are eliminated or strictly justified.
 - [ ] The core unit test suite (excluding `@pytest.mark.slow`) executes in under 10 seconds.
 - [ ] Integration tests pass consistently without flakiness across 10 consecutive runs.
Author	SHA1	Message	Date
ed	d0e7743ef6	chore(conductor): Archive completed and deprecated tracks - Moved codebase_migration_20260302 to archive - Moved gui_decoupling_controller_20260302 to archive - Moved test_architecture_integrity_audit_20260304 to archive - Removed deprecated test_suite_performance_and_flakiness_20260302	2026-03-05 09:51:11 -05:00
ed	c295db1630	docs: Reorder track queue and initialize final stabilization tracks - Initialize asyncio_decoupling_refactor_20260306 track - Initialize mock_provider_hardening_20260305 track - Initialize simulation_fidelity_enhancement_20260305 track - Update TASKS.md and tracks.md to reflect new strict execution queue - Archive completed tracks and remove deprecated test performance track	2026-03-05 09:43:42 -05:00
ed	e21cd64833	docs: Update remaining track plans with test architecture debt warnings - Add test debt notes to concurrent_tier, manual_ux, and async_tool tracks to guide testing strategies away from live_gui where appropriate.	2026-03-05 09:35:03 -05:00
ed	d863c51da3	docs: Update track plans with test architecture debt warnings - Mark live_gui tests as flaky by design in TASKS.md until stabilization tracks complete - Add test debt notes to upcoming tracks to guide testing strategies	2026-03-05 09:32:24 -05:00