- Initialize asyncio_decoupling_refactor_20260306 track
- Initialize mock_provider_hardening_20260305 track
- Initialize simulation_fidelity_enhancement_20260305 track
- Update TASKS.md and tracks.md to reflect new strict execution queue
- Archive completed tracks and remove deprecated test performance track
TASKS.md
Active Tracks
(none — all planned tracks queued below)
Completed This Session
- test_architecture_integrity_audit_20260304: Comprehensive test architecture audit completed. Wrote an exhaustive report_gemini.md detailing fixes for the "Triple Bingo" streaming history explosion, Destructive IPC Read drops, and asyncio deadlocks. Checkpoint: e3c6b9e.
- mma_agent_focus_ux_20260302: Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
- feature_bleed_cleanup_20260302: Removed dead duplicate comms panel, dead menubar block, and duplicate init vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
Planned: The Strict Execution Queue
All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.
[!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05) The `gui_decoupling` track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses). Current Testing Policy:
- Full-suite integration tests (`live_gui` / extended sims) are currently considered "flaky by design".
- Do NOT write new `live_gui` simulations until Tracks #1, #2, and #3 are complete.
- If unit tests pass but `test_extended_sims.py` hangs or fails locally, you may manually verify the GUI behavior and proceed.
1. hook_api_ui_state_verification_20260302 (Active/Next)
- Status: Initialized
- Priority: High
- Goal: Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation.
- Fixes Test Debt: Replaces brittle `time.sleep()` and string-matching assertions in simulations with deterministic API queries.
2. asyncio_decoupling_refactor_20260306
- Status: Initialized
- Priority: High
- Goal: Resolve deep asyncio/threading deadlocks. Replace `asyncio.Queue` in `AppController` with a standard `queue.Queue`. Ensure phantom subprocesses are killed.
- Fixes Test Debt: Eliminates `RuntimeError: Event loop is closed` and zombie port 8999 hijacking. Restores full-suite reliability.
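For illustration, a small sketch of the thread-to-thread handoff that `queue.Queue` supports without any event loop. The `worker` and `drain` names and the `"done"` sentinel are hypothetical and do not reflect the actual `AppController` code.

```python
import queue
import threading

events = queue.Queue()  # thread-safe; not bound to any asyncio event loop


def worker(n: int) -> None:
    """Producer running on a background thread."""
    for i in range(n):
        events.put(f"event-{i}")
    events.put("done")  # hypothetical sentinel marking the end of the stream


def drain(q: queue.Queue, timeout: float = 1.0) -> list:
    """Consume items until the sentinel arrives."""
    received = []
    while True:
        item = q.get(timeout=timeout)  # raises queue.Empty if the producer stalls
        if item == "done":
            return received
        received.append(item)


t = threading.Thread(target=worker, args=(3,), daemon=True)
t.start()
received = drain(events)
t.join(timeout=1.0)
```

Because `get(timeout=...)` raises `queue.Empty` instead of blocking forever, a stalled producer shows up as a test failure rather than a hang.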
3. mock_provider_hardening_20260305
- Status: Initialized
- Priority: Medium
- Goal: Introduce negative testing paths (malformed JSON, timeouts) into the mock AI provider.
- Fixes Test Debt: Allows the test suite to verify error handling flows that were previously masked by a mock provider that only ever returned success.
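A rough sketch of a mock provider with selectable failure modes, assuming the provider returns a JSON string. `MockProvider`, its `mode` values, and `parse_reply` are hypothetical names used only for illustration.

```python
import json


class MockProvider:
    """Hypothetical mock AI provider whose behaviour is selected per test."""

    def __init__(self, mode: str = "success") -> None:
        self.mode = mode

    def send(self, prompt: str) -> str:
        if self.mode == "malformed_json":
            return '{"tickets": [ {"id": 1, '  # truncated on purpose
        if self.mode == "timeout":
            raise TimeoutError("mock provider timed out")
        return json.dumps({"tickets": [{"id": 1, "title": prompt}]})


def parse_reply(provider: MockProvider, prompt: str) -> dict:
    """Return the parsed payload, or an error tag the caller can act on."""
    try:
        return json.loads(provider.send(prompt))
    except json.JSONDecodeError:
        return {"error": "malformed_json"}
    except TimeoutError:
        return {"error": "timeout"}


ok = parse_reply(MockProvider(), "Add endpoint")
bad_json = parse_reply(MockProvider("malformed_json"), "Add endpoint")
timed_out = parse_reply(MockProvider("timeout"), "Add endpoint")
```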
4. robust_json_parsing_tech_lead_20260302
- Status: Initialized
- Priority: Medium
- Goal: Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
- Test Debt Note: Rely strictly on in-process `unittest.mock` to verify the retry logic until stabilization tracks are done.
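One possible shape for the retry loop, verified in-process with `unittest.mock` as the note requires. `generate_json`, `ask_model`, and the wording of the correction prompt are assumptions for illustration, not the track's actual design.

```python
import json
import traceback
from unittest import mock


def generate_json(ask_model, prompt: str, max_attempts: int = 3) -> dict:
    """Call the model; on invalid JSON, retry with the traceback appended."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON.\n"
                f"Reply:\n{raw}\n\nTraceback:\n{traceback.format_exc()}"
            )


# First reply is truncated JSON, second reply is valid.
fake_model = mock.Mock(side_effect=['{"tickets": [', '{"tickets": []}'])
result = generate_json(fake_model, "Plan the track")
second_prompt = fake_model.call_args_list[1].args[0]
```

Bounding the loop with `max_attempts` keeps a model that never self-corrects from retrying indefinitely.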
5. concurrent_tier_source_tier_20260302
- Status: Initialized
- Priority: Low
- Goal: Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
- Test Debt Note: Use in-process mocks to verify concurrency.
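A minimal sketch of the `threading.local()` option, assuming the tier tag is stored per thread and read back at log time. The `_context` object, `log_entry`, `run_worker`, and the tag format are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()  # each thread sees its own attributes
log = []
log_lock = threading.Lock()


def log_entry(message: str) -> None:
    """Tag the entry with the calling thread's tier, not a shared global."""
    tier = getattr(_context, "source_tier", "unknown")
    with log_lock:
        log.append((tier, message))


def run_worker(ticket_id: int) -> None:
    # Set the tag at the start of every task: pool threads are reused,
    # so a value left over from a previous ticket must not leak through.
    _context.source_tier = f"tier3-worker-{ticket_id}"
    log_entry(f"processing ticket {ticket_id}")


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run_worker, range(4)))
```

Explicit context passing (handing the tier to `log_entry` as an argument) avoids the reuse pitfall entirely, at the cost of threading the value through every call.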
6. manual_ux_validation_20260302
- Status: Initialized
- Priority: Medium
- Goal: Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
- Test Debt Note: Naturally bypasses automated testing debt as it is purely human-in-the-loop.
7. async_tool_execution_20260303
- Status: Initialized
- Priority: Medium
- Goal: Refactor MCP tool execution to utilize `asyncio.gather` or thread pools to run multiple tools concurrently within a single AI loop.
- Test Debt Note: Use in-process mocks to verify concurrency.
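A small sketch of the `asyncio.gather` option. The tool names and the `run_tool` coroutine are hypothetical stand-ins; real MCP tool calls are assumed to be awaitable or wrapped to become so.

```python
import asyncio


async def run_tool(name: str, delay: float) -> str:
    """Hypothetical tool call; the sleep stands in for I/O latency."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def run_tools_concurrently(calls):
    # gather() runs the coroutines concurrently and returns their results
    # in the order the calls were passed in, regardless of finish order.
    return await asyncio.gather(*(run_tool(name, delay) for name, delay in calls))


results = asyncio.run(
    run_tools_concurrently([("read_file", 0.05), ("search", 0.01), ("list_dir", 0.03)])
)
```

Tools that block (subprocesses, synchronous file I/O) would need `asyncio.to_thread` (Python 3.9+) or a thread pool, since a blocking call inside a coroutine stalls every other tool in the gather.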
8. simulation_fidelity_enhancement_20260305
- Status: Initialized
- Priority: Low
- Goal: Add human-like jitter, hesitation, and reading latency to the UserSimAgent.
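One way the delay model might be sketched: a reading time proportional to word count, multiplied by random jitter, plus an occasional hesitation pause. `human_delay` and all of its default parameters are illustrative assumptions, not values taken from UserSimAgent.

```python
import random


def human_delay(
    text: str,
    rng: random.Random,
    words_per_minute: float = 240.0,
    jitter: float = 0.25,
    hesitation_chance: float = 0.1,
    hesitation_seconds: float = 1.5,
) -> float:
    """Seconds a simulated user waits before responding to `text`."""
    words = len(text.split())
    reading = words / words_per_minute * 60.0
    delay = reading * rng.uniform(1.0 - jitter, 1.0 + jitter)
    if rng.random() < hesitation_chance:
        delay += hesitation_seconds  # occasional extra pause
    return delay


rng = random.Random(42)  # seeded so simulation runs are reproducible
delays = [human_delay("word " * 40, rng) for _ in range(100)]
```

Passing in a seeded `random.Random` keeps the jitter realistic while leaving each run repeatable, which matters given the flakiness concerns above.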
Phase 3: Future Horizons (Post-Hardening Backlog)
To be evaluated in a future Tier 1 session once the Strict Execution Queue is cleared and the architectural foundation is stabilized.
1. True Parallel Worker Execution (The DAG Realization)
Goal: Implement true concurrency for the DAG engine. Once threading.local() is in place, the ExecutionEngine should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.
2. Deep AST-Driven Context Pruning (RAG for Code)
Goal: Before dispatching a Tier 3 worker, use tree_sitter to automatically parse the target file's AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker's prompt. Guarantees the AI only "sees" what it needs to edit, drastically reducing token burn.
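To make the idea concrete, here is a sketch that uses Python's standard-library `ast` module as a stand-in for tree_sitter (a deliberate substitution: it only handles Python sources and drops comments). `skeletonize` and the sample source are hypothetical.

```python
import ast


def skeletonize(source: str, keep: set) -> str:
    """Replace the body of every function not named in `keep` with `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name not in keep:
            node.body = [ast.Expr(value=ast.Constant(value=Ellipsis))]
    return ast.unparse(tree)  # requires Python 3.9+


source = '''
def load(path):
    with open(path) as fh:
        return fh.read()

def parse(text):
    return text.split(",")
'''

# Keep only the function the worker is supposed to edit.
skeleton = skeletonize(source, keep={"parse"})
```

Signatures of the stripped functions survive, so the worker can still see what it may call without paying tokens for the bodies.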
3. Visual DAG & Interactive Ticket Editing
Goal: Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle's node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking "Execute Pipeline."
4. Advanced Tier 4 QA Auto-Patching
Goal: Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a .patch file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks "Apply Patch" to instantly resume the pipeline.
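As a rough illustration of the patch-generation step, the standard-library `difflib` can produce unified-diff text from an original and a proposed source. `make_patch`, the `a/` and `b/` prefixes, and the `calc.py` example are assumptions; how Tier 4 would actually produce its fix is not specified here.

```python
import difflib


def make_patch(original: str, fixed: str, path: str) -> str:
    """Return unified-diff text describing the change from original to fixed."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


patch_text = make_patch(
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
    "calc.py",
)
```

The same two inputs could feed a side-by-side viewer directly, with the diff text written to the .patch file only when the user approves.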
5. Transitioning to a Native Orchestrator
Goal: Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write plan.md, manage the metadata.json, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (mma_exec.py).