Ed_ c295db1630 docs: Reorder track queue and initialize final stabilization tracks

- Initialize asyncio_decoupling_refactor_20260306 track

- Initialize mock_provider_hardening_20260305 track

- Initialize simulation_fidelity_enhancement_20260305 track

- Update TASKS.md and tracks.md to reflect new strict execution queue

- Archive completed tracks and remove deprecated test performance track

2026-03-05 09:43:42 -05:00

6.1 KiB


TASKS.md

Active Tracks

(none — all planned tracks queued below)

Completed This Session

test_architecture_integrity_audit_20260304 — Comprehensive test architecture audit completed. Wrote an exhaustive report_gemini.md detailing fixes for the "Triple Bingo" streaming history explosion, Destructive IPC Read drops, and asyncio deadlocks. Checkpoint: e3c6b9e.
mma_agent_focus_ux_20260302 — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
feature_bleed_cleanup_20260302 — Removed dead comms panel dup, dead menubar block, duplicate init vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.

Planned: The Strict Execution Queue

All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.

[!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05)

The gui_decoupling track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses).

Current Testing Policy:

- Full-suite integration tests (live_gui / extended sims) are currently considered "flaky by design".
- Do NOT write new live_gui simulations until Tracks #1, #2, and #3 are complete.
- If unit tests pass but test_extended_sims.py hangs or fails locally, you may manually verify the GUI behavior and proceed.

1. `hook_api_ui_state_verification_20260302` (Active/Next)

Status: Initialized
Priority: High
Goal: Add a /api/gui/state GET endpoint. Wire UI state into _settable_fields to enable programmatic live_gui testing without user confirmation.
Fixes Test Debt: Replaces brittle time.sleep() and string-matching assertions in simulations with deterministic API queries.
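
As a rough illustration of what replacing fixed sleeps could look like, the sketch below polls a state source until a condition holds or a deadline passes. The helper name `wait_for_state` and the `fetch_state` callable are hypothetical; in a real simulation `fetch_state` would presumably issue the GET against /api/gui/state, whose response shape is not specified here.

```python
import time
from typing import Any, Callable, Dict


def wait_for_state(
    fetch_state: Callable[[], Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> Dict[str, Any]:
    """Poll fetch_state() until predicate(state) is true or the timeout elapses.

    fetch_state is a stand-in (assumption) for a call to GET /api/gui/state.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if predicate(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"GUI state never matched; last state: {state!r}")
        # Short bounded wait between queries, instead of one long blind sleep.
        time.sleep(interval)
```

A simulation step would then assert on the returned dictionary rather than on log strings, and a hang surfaces as a `TimeoutError` with the last observed state attached.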

2. `asyncio_decoupling_refactor_20260306`

Status: Initialized
Priority: High
Goal: Resolve deep asyncio/threading deadlocks. Replace asyncio.Queue in AppController with a standard queue.Queue. Ensure phantom subprocesses are killed.
Fixes Test Debt: Eliminates RuntimeError: Event loop is closed and zombie port 8999 hijacking. Restores full-suite reliability.
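
A minimal sketch of the queue swap, assuming the controller only needs to hand events from one thread to another. `queue.Queue` is thread-safe and has no event loop attached, so it cannot fail with "Event loop is closed". The names `run_worker`, `_STOP`, and the event strings are illustrative and not taken from AppController.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to exit (hypothetical convention)


def run_worker(events: queue.Queue, handled: list) -> None:
    """Drain events on a plain thread; no asyncio event loop is involved."""
    while True:
        item = events.get()  # blocks until an item is available
        if item is _STOP:
            break
        handled.append(item)


events: queue.Queue = queue.Queue()
handled: list = []
worker = threading.Thread(target=run_worker, args=(events, handled), daemon=True)
worker.start()

# Producer side: any thread may call put() safely.
for name in ("user_message", "tool_result"):
    events.put(name)

events.put(_STOP)
worker.join(timeout=2.0)  # bounded join so a stuck worker cannot hang shutdown
```

The sentinel plus a bounded `join` gives the shutdown path a definite end, which is one way to avoid leaving a worker behind after a test finishes.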

3. `mock_provider_hardening_20260305`

Status: Initialized
Priority: Medium
Goal: Introduce negative testing paths (malformed JSON, timeouts) into the mock AI provider.
Fixes Test Debt: Allows the test suite to verify error handling flows that were previously masked by a mock provider that only ever returned success.
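
One possible shape for such a mock, shown only as a sketch: the class name `MockProvider`, its `mode` values, and the `send` method are all assumptions, since the real mock provider's interface is not described here.

```python
import json


class MockProvider:
    """Hypothetical mock AI provider with selectable failure modes."""

    def __init__(self, mode: str = "success") -> None:
        self.mode = mode

    def send(self, prompt: str) -> str:
        if self.mode == "malformed_json":
            # Truncated payload: json.loads() on this raises JSONDecodeError.
            return '{"tickets": ['
        if self.mode == "timeout":
            # Simulate a provider that never answers in time.
            raise TimeoutError("mock provider timed out")
        return json.dumps({"reply": prompt})
```

A test can then construct `MockProvider("malformed_json")` or `MockProvider("timeout")` and assert that the caller surfaces or recovers from the failure.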

4. `robust_json_parsing_tech_lead_20260302`

Status: Initialized
Priority: Medium
Goal: Implement an auto-retry loop that catches JSONDecodeError and feeds the traceback to the Tier 2 model for self-correction.
Test Debt Note: Rely strictly on in-process unittest.mock to verify the retry logic until stabilization tracks are done.
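
The retry loop could be sketched roughly as follows. `parse_with_retry` and the `ask_model` callable are hypothetical names, the wording of the correction prompt is an assumption, and the real Tier 2 call is replaced by whatever callable the caller supplies.

```python
import json
import traceback
from typing import Any, Callable


def parse_with_retry(
    ask_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Ask the model for JSON; on JSONDecodeError, retry with the traceback attached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # out of retries: surface the last parse error
            # Feed the traceback back so the model can correct its own output.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON.\n"
                f"Traceback:\n{traceback.format_exc()}\n"
                "Reply with valid JSON only."
            )
```

Because `ask_model` is injected, the retry logic can be verified in-process with a stub or `unittest.mock.Mock` that returns a bad payload first and a good one second.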

5. `concurrent_tier_source_tier_20260302`

Status: Initialized
Priority: Low
Goal: Replace global state with threading.local() or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
Test Debt Note: Use in-process mocks to verify concurrency.
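
A small sketch of the `threading.local()` option: each worker thread sets and reads its own `source_tier`, so parallel workers cannot overwrite each other's tag. The helper names and the tag format are illustrative assumptions, not the project's actual logging API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_ctx = threading.local()  # each thread sees its own attributes on this object


def set_source_tier(tier: str) -> None:
    _ctx.source_tier = tier


def get_source_tier() -> str:
    # Threads that never set a tier fall back to a default.
    return getattr(_ctx, "source_tier", "unknown")


def process_ticket(ticket_id: int) -> str:
    # Set the tag at the start of every ticket, since pool threads are reused.
    set_source_tier(f"tier3-worker-{ticket_id}")
    return f"[{get_source_tier()}] processed ticket {ticket_id}"


with ThreadPoolExecutor(max_workers=4) as pool:
    log_lines = list(pool.map(process_ticket, range(4)))
```

Explicit context passing (handing the tier to each logging call as an argument) achieves the same isolation without hidden state, at the cost of threading the value through every call site.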

6. `manual_ux_validation_20260302`

Status: Initialized
Priority: Medium
Goal: Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
Test Debt Note: Naturally bypasses automated testing debt as it is purely human-in-the-loop.

7. `async_tool_execution_20260303`

Status: Initialized
Priority: Medium
Goal: Refactor MCP tool execution to utilize asyncio.gather or thread pools to run multiple tools concurrently within a single AI loop.
Test Debt Note: Use in-process mocks to verify concurrency.
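
The `asyncio.gather` variant might look like the sketch below. `run_tool` is a placeholder that only sleeps, and the tool names are illustrative; the real MCP tool dispatch is not shown here.

```python
import asyncio
from typing import Any, List, Tuple


async def run_tool(name: str, delay: float) -> str:
    """Placeholder tool: stands in for a real MCP tool call (assumption)."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def run_tools_concurrently(calls: List[Tuple[str, float]]) -> List[Any]:
    # gather() runs the coroutines concurrently and returns results in call order.
    # return_exceptions=True keeps one failing tool from cancelling the rest.
    return await asyncio.gather(
        *(run_tool(name, delay) for name, delay in calls),
        return_exceptions=True,
    )


results = asyncio.run(
    run_tools_concurrently([("read_file", 0.02), ("list_dir", 0.01)])
)
```

Results come back in the order the calls were submitted, not the order they finished, which keeps tool outputs aligned with the requests that produced them.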

8. `simulation_fidelity_enhancement_20260305`

Status: Initialized
Priority: Low
Goal: Add human-like jitter, hesitation, and reading latency to the UserSimAgent.
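
As a sketch of the reading-latency idea, the delay below scales with word count and is perturbed by bounded random jitter. The function name, the 250 words-per-minute default, and the 20% jitter are assumptions chosen for illustration; UserSimAgent's actual interface is not shown here.

```python
import random


def reading_delay(
    text: str,
    rng: random.Random,
    words_per_minute: float = 250.0,
    jitter: float = 0.2,
) -> float:
    """Return seconds a simulated user might spend reading text."""
    words = len(text.split())
    base = words / words_per_minute * 60.0
    # Scale by a random factor in [1 - jitter, 1 + jitter].
    return base * rng.uniform(1.0 - jitter, 1.0 + jitter)
```

Passing in a seeded `random.Random` keeps simulations reproducible while still varying the timing between steps.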

Phase 3: Future Horizons (Post-Hardening Backlog)

To be evaluated in a future Tier 1 session once the Strict Execution Queue is cleared and the architectural foundation is stabilized.

1. True Parallel Worker Execution (The DAG Realization)

Goal: Implement true concurrency for the DAG engine. Once threading.local() is in place, the ExecutionEngine should spawn independent Tier 3 workers in parallel (e.g., 4 workers handling 4 isolated tests simultaneously). Requires strict file-locking or a Git-based diff-merging strategy to prevent AST collision.

2. Deep AST-Driven Context Pruning (RAG for Code)

Goal: Before dispatching a Tier 3 worker, use tree_sitter to automatically parse the target file's AST, strip out unrelated function bodies, and inject a surgically condensed skeleton into the worker's prompt. Guarantees the AI only "sees" what it needs to edit, drastically reducing token burn.
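
To make the pruning idea concrete, here is a sketch that swaps in Python's standard-library `ast` module in place of tree_sitter, so it only handles Python sources and needs Python 3.9+ for `ast.unparse`. The function name `skeletonize` and the `keep` parameter are hypothetical.

```python
import ast
from typing import Set


def skeletonize(source: str, keep: Set[str]) -> str:
    """Replace the body of every function not named in keep with '...'."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name not in keep:
            # Keep the signature, drop the implementation (docstring included).
            node.body = [ast.Expr(value=ast.Constant(value=Ellipsis))]
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Calling `skeletonize(source, {"target_function"})` returns the file with only the named function's body intact. Note that comments and original formatting are lost by `ast.unparse`, which a tree_sitter-based version could avoid by slicing the original text.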

3. Visual DAG & Interactive Ticket Editing

Goal: Replace the linear ticket list in the GUI with an interactive Node Graph using ImGui Bundle's node editor. Allow the user to visually drag dependency lines, split nodes, or delete tasks before clicking "Execute Pipeline."

4. Advanced Tier 4 QA Auto-Patching

Goal: Elevate Tier 4 from a log summarizer to an auto-patcher. When a verification test fails, Tier 4 generates a .patch file. The GUI intercepts this and presents a side-by-side Diff Viewer. The user clicks "Apply Patch" to instantly resume the pipeline.

5. Transitioning to a Native Orchestrator

Goal: Absorb the Conductor extension entirely into the core application. Manual Slop should natively read/write plan.md, manage metadata.json, and orchestrate the MMA tiers in pure Python, removing the dependency on external CLI shell executions (mma_exec.py).
