docs: Reorder track queue and initialize final stabilization tracks

- Initialize asyncio_decoupling_refactor_20260306 track
- Initialize mock_provider_hardening_20260305 track
- Initialize simulation_fidelity_enhancement_20260305 track
- Update TASKS.md and tracks.md to reflect new strict execution queue
- Archive completed tracks and remove deprecated test performance track
2026-03-05 09:43:42 -05:00
parent e21cd64833
commit c295db1630
11 changed files with 222 additions and 50 deletions
@@ -6,10 +6,9 @@
 *(none — all planned tracks queued below)*

 ## Completed This Session
+- `test_architecture_integrity_audit_20260304` — Comprehensive test architecture audit completed. Wrote exhaustive report_gemini.md detailing fixing the "Triple Bingo" streaming history explosion, Destructive IPC Read drops, and Asyncio deadlocks. Checkpoint: e3c6b9e.
 - `mma_agent_focus_ux_20260302` — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
 - `feature_bleed_cleanup_20260302` — Removed dead comms panel dup, dead menubar block, duplicate __init__ vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
- `context_token_viz_20260301` — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.
- `tech_debt_and_test_cleanup_20260302` — [BOTCHED/ARCHIVED] Centralized fixtures but exposed deep asyncio flaws.

 ---

@@ -20,57 +19,72 @@
 > The `gui_decoupling` track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses). 
 > **Current Testing Policy:** 
 > - Full-suite integration tests (`live_gui` / extended sims) are currently considered **"flaky by design"**. 
-> - Do NOT write new `live_gui` simulations until Track #5 and #6 are complete. 
+> - Do NOT write new `live_gui` simulations until Track #1, #2, and #3 are complete. 
 > - If unit tests pass but `test_extended_sims.py` hangs or fails locally, you may manually verify the GUI behavior and proceed.

-### 1. `test_stabilization_20260302` (Archived)
- **Status:** Completed
- **Priority:** High
- **Goal:** Stabilize `asyncio` errors, ban mock-rot, completely remove `gui_legacy.py`, and consolidate testing paradigms.
-
-### 2. `strict_static_analysis_and_typing_20260302` (Archived)
- **Status:** Completed
- **Priority:** High
- **Goal:** Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.
-
-### 3. `codebase_migration_20260302` (Archived)
- **Status:** Completed
- **Priority:** High
- **Goal:** Restructure directories to a `src/` layout. Doing this after static analysis ensures no hidden import bugs are introduced. Creates `sloppy.py` entry point.
-
-### 4. `gui_decoupling_controller_20260302` (Archived)
- **Status:** Completed
- **Priority:** High
- **Goal:** Extract the state machine and core lifecycle into a headless `app_controller.py`, leaving `gui_2.py` as a pure, immediate-mode view.
-
-### 5. `hook_api_ui_state_verification_20260302` (Active/Next)
- **Status:** Initialized / Looked Over
+### 1. `hook_api_ui_state_verification_20260302` (Active/Next)
+- **Status:** Initialized
 - **Priority:** High
 - **Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation. 
 - **Fixes Test Debt:** Replaces brittle `time.sleep()` and string-matching assertions in simulations with deterministic API queries.
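
A minimal sketch of what "deterministic API queries" could look like in a simulation: poll a state source until a condition holds, with a hard deadline, instead of sleeping for a fixed duration. The helper name `wait_for_state`, the `fetch_state` callable, and the `ai_status` field are assumptions for illustration; in practice `fetch_state` would wrap a GET to `/api/gui/state`.

```python
import time
from typing import Any, Callable, Dict


def wait_for_state(
    fetch_state: Callable[[], Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> Dict[str, Any]:
    """Poll fetch_state() until predicate(state) is true or the deadline passes.

    fetch_state is hypothetical: it stands in for a call to the GUI state endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if predicate(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state condition not met within {timeout}s")
        time.sleep(interval)  # short, bounded pause between queries
```

The test then asserts on the returned state dictionary rather than on log strings, and a hang turns into a `TimeoutError` with a clear message.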

-### 6. `test_suite_performance_and_flakiness_20260302`
- **Status:** Initialized / Looked Over
+### 2. `asyncio_decoupling_refactor_20260306`
+- **Status:** Initialized
 - **Priority:** High
 - **Goal:** Resolve deep asyncio/threading deadlocks. Replace `asyncio.Queue` in `AppController` with a standard `queue.Queue`. Ensure phantom subprocesses are killed.
 - **Fixes Test Debt:** Eliminates `RuntimeError: Event loop is closed` and zombie port 8999 hijacking. Restores full-suite reliability.
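
The `queue.Queue` swap described in this track can be sketched as follows. This is an illustration under assumptions, not the `AppController` code: `run_worker`, `_STOP`, and the event names are hypothetical. The point is that a thread-safe `queue.Queue` has no tie to an event loop, so a producer on the GUI thread and a consumer on a worker thread cannot hit `Event loop is closed` through the queue itself.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to exit (hypothetical name)


def run_worker(events: queue.Queue, handled: list) -> None:
    """Drain events until the sentinel arrives, then return."""
    while True:
        item = events.get()  # blocks the thread; no event loop required
        if item is _STOP:
            break
        handled.append(item)


def demo() -> list:
    events: queue.Queue = queue.Queue()
    handled: list = []
    worker = threading.Thread(target=run_worker, args=(events, handled), daemon=True)
    worker.start()
    for name in ("click", "send", "refresh"):
        events.put(name)  # safe to call from any thread
    events.put(_STOP)  # explicit shutdown so no worker is left dangling
    worker.join(timeout=5)
    return handled
```

The sentinel gives the consumer a deterministic shutdown path, which is the in-process analogue of making sure phantom subprocesses are killed.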

-### 7. `robust_json_parsing_tech_lead_20260302`
- **Status:** Initialized / Looked Over
+### 3. `mock_provider_hardening_20260305`
+- **Status:** Initialized
+- **Priority:** Medium
+- **Goal:** Introduce negative testing paths (malformed JSON, timeouts) into the mock AI provider.
+- **Fixes Test Debt:** Allows the test suite to verify error handling flows that were previously masked by a mock provider that only ever returned success.
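
One possible shape for these negative paths, assuming a mock with a selectable failure mode. The class name `MockProvider`, the `mode` values, and the `send` method are hypothetical and do not reflect the project's actual mock.

```python
import json


class MockProvider:
    """Hypothetical mock AI provider with selectable failure modes."""

    def __init__(self, mode: str = "success") -> None:
        self.mode = mode

    def send(self, prompt: str) -> str:
        if self.mode == "malformed_json":
            return '{"tickets": ['  # truncated payload; json.loads will reject it
        if self.mode == "timeout":
            raise TimeoutError("mock provider timed out")
        return json.dumps({"tickets": []})
```

A test can construct the provider in each mode and assert that the caller surfaces or recovers from the failure instead of assuming success.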
+
+### 4. `robust_json_parsing_tech_lead_20260302`
+- **Status:** Initialized
 - **Priority:** Medium
 - **Goal:** Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
+- **Test Debt Note:** Rely strictly on in-process `unittest.mock` to verify the retry logic until stabilization tracks are done.
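
A rough sketch of the retry loop this track describes, under the assumption that the model is reachable through a plain callable. `parse_with_retry` and `ask_model` are hypothetical names; the real Tier 2 call is not shown in this diff.

```python
import json
import traceback
from typing import Any, Callable


def parse_with_retry(
    ask_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Any:
    """Ask for JSON; on a parse failure, feed the traceback back and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    for attempt in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the real error
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON.\n"
                f"{traceback.format_exc()}\nReply again with valid JSON only."
            )
```

Because `ask_model` is injected, the retry logic can be verified in-process with a stub or `unittest.mock.Mock(side_effect=[...])`, with no live GUI or network involved.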

-### 8. `concurrent_tier_source_tier_20260302`
- **Status:** Initialized / Looked Over
+### 5. `concurrent_tier_source_tier_20260302`
+- **Status:** Initialized
 - **Priority:** Low
 - **Goal:** Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
+- **Test Debt Note:** Use in-process mocks to verify concurrency.
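
The `threading.local()` option could look roughly like this. All names here (`set_source_tier`, `get_source_tier`, `process_ticket`) are assumptions for illustration, not the project's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()  # each thread sees its own attributes


def set_source_tier(tier: str) -> None:
    _context.source_tier = tier


def get_source_tier() -> str:
    # Threads that never set a tier fall back to a default.
    return getattr(_context, "source_tier", "unknown")


def process_ticket(ticket_id: int) -> tuple:
    # Pool threads are reused, so set the tier at the start of every ticket
    # rather than relying on whatever the previous ticket left behind.
    set_source_tier(f"tier3-worker-{ticket_id}")
    return ticket_id, get_source_tier()


def run_tickets(ticket_ids) -> list:
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process_ticket, ticket_ids))
```

Explicit context passing (handing the tier to each logging call as an argument) avoids the thread-reuse caveat entirely, at the cost of threading one more parameter through the call chain.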

-### 9. `manual_ux_validation_20260302`
- **Status:** Initialized / Looked Over
+### 6. `manual_ux_validation_20260302`
+- **Status:** Initialized
 - **Priority:** Medium
 - **Goal:** Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
+- **Test Debt Note:** Naturally bypasses automated testing debt as it is purely human-in-the-loop.

-### 10. `test_architecture_integrity_audit_20260304`
- **Status:** Audit Completed
- **Priority:** High
- **Goal:** Comprehensive audit of testing infrastructure and simulation framework. Produced `report_gemini.md` detailing exact mechanical failures and remediation paths.
+### 7. `async_tool_execution_20260303`
+- **Status:** Initialized
+- **Priority:** Medium
+- **Goal:** Refactor MCP tool execution to utilize `asyncio.gather` or thread pools to run multiple tools concurrently within a single AI loop.
+- **Test Debt Note:** Use in-process mocks to verify concurrency.
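
A minimal sketch of the `asyncio.gather` variant, with simulated tools standing in for real MCP calls. `run_tool` and the tool names are hypothetical, and the delays only imitate I/O latency.

```python
import asyncio
from typing import Iterable, List, Tuple


async def run_tool(name: str, delay: float) -> str:
    """Stand-in for one tool call; the sleep imitates I/O wait."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_tools_concurrently(calls: Iterable[Tuple[str, float]]) -> List[str]:
    # gather runs the awaitables concurrently and returns results
    # in the order the calls were passed in, not in completion order.
    return await asyncio.gather(*(run_tool(name, delay) for name, delay in calls))


def demo() -> List[str]:
    return asyncio.run(run_tools_concurrently([("read_file", 0.02), ("search", 0.01)]))
```

Tools that block on synchronous code would not benefit from this as written; those are the case for the thread-pool option the goal mentions.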
+
+### 8. `simulation_fidelity_enhancement_20260305`
+- **Status:** Initialized
+- **Priority:** Low
+- **Goal:** Add human-like jitter, hesitation, and reading latency to the UserSimAgent.
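
As an illustration of reading latency with jitter, one simple model scales delay by word count and applies a bounded random factor. The function name, the 240 words-per-minute default, and the 25% jitter are assumptions, not values from the `UserSimAgent`.

```python
import random


def reading_delay(
    text: str,
    rng: random.Random,
    words_per_minute: float = 240.0,
    jitter: float = 0.25,
) -> float:
    """Return seconds a simulated user spends reading text, with +/- jitter."""
    words = max(1, len(text.split()))
    base = words / words_per_minute * 60.0
    return base * rng.uniform(1.0 - jitter, 1.0 + jitter)
```

Passing in a seeded `random.Random` keeps simulations reproducible while still varying delays between steps.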
+
+---
+
+## Phase 3: Future Horizons (Post-Hardening Backlog)
+*To be evaluated in a future Tier 1 session once the Strict Execution Queue is cleared and the architectural foundation is stabilized.*
+
+### 1. True Parallel Worker Execution (The DAG Realization)
+**Goal:** Implement true concurrency for the DAG engine. Once `threading.local()` is in place, the `ExecutionEngine` should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
+
+### 2. Deep AST-Driven Context Pruning (RAG for Code)
+**Goal:** Before dispatching a Tier 3 worker, use `tree_sitter` to automatically parse the target file's AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker's prompt. Guarantees the AI only "sees" what it needs to edit, drastically reducing token burn.
+
+### 3. Visual DAG & Interactive Ticket Editing
+**Goal:** Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle's node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking "Execute Pipeline."
+
+### 4. Advanced Tier 4 QA Auto-Patching
+**Goal:** Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a `.patch` file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks "Apply Patch" to instantly resume the pipeline.
+
+### 5. Transitioning to a Native Orchestrator
+**Goal:** Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write `plan.md`, manage the `metadata.json`, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (`mma_exec.py`).