manual_slop/TASKS.md
Ed_ d863c51da3 docs: Update track plans with test architecture debt warnings
- Mark live_gui tests as flaky by design in TASKS.md until stabilization tracks complete

- Add test debt notes to upcoming tracks to guide testing strategies
2026-03-05 09:32:24 -05:00


TASKS.md

Active Tracks

(none — all planned tracks queued below)

Completed This Session

  • mma_agent_focus_ux_20260302 — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
  • feature_bleed_cleanup_20260302 — Removed dead comms panel dup, dead menubar block, duplicate init vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
  • context_token_viz_20260301 — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.
  • tech_debt_and_test_cleanup_20260302 — [BOTCHED/ARCHIVED] Centralized fixtures but exposed deep asyncio flaws.

Planned: The Strict Execution Queue

All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.

[!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05)

The gui_decoupling track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses).

Current Testing Policy:

  • Full-suite integration tests (live_gui / extended sims) are currently considered "flaky by design".
  • Do NOT write new live_gui simulations until Tracks #5 and #6 are complete.
  • If unit tests pass but test_extended_sims.py hangs or fails locally, you may manually verify the GUI behavior and proceed.

1. test_stabilization_20260302 (Archived)

  • Status: Completed
  • Priority: High
  • Goal: Stabilize asyncio errors, ban mock-rot, completely remove gui_legacy.py, and consolidate testing paradigms.

2. strict_static_analysis_and_typing_20260302 (Archived)

  • Status: Completed
  • Priority: High
  • Goal: Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.

3. codebase_migration_20260302 (Archived)

  • Status: Completed
  • Priority: High
  • Goal: Restructure directories to a src/ layout. Doing this after static analysis reduces the risk of introducing hidden import bugs. Also creates the sloppy.py entry point.

4. gui_decoupling_controller_20260302 (Archived)

  • Status: Completed
  • Priority: High
  • Goal: Extract the state machine and core lifecycle into a headless app_controller.py, leaving gui_2.py as a pure, immediate-mode view.

5. hook_api_ui_state_verification_20260302 (Active/Next)

  • Status: Initialized / Looked Over
  • Priority: High
  • Goal: Add a /api/gui/state GET endpoint. Wire UI state into _settable_fields to enable programmatic live_gui testing without user confirmation.
  • Fixes Test Debt: Replaces brittle time.sleep() and string-matching assertions in simulations with deterministic API queries.
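
A minimal sketch of what replacing time.sleep() with deterministic API queries could look like in a simulation. The host and port, the shape of the JSON payload, and the helper names fetch_gui_state and wait_for_state are assumptions for illustration, not the track's actual design.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumption: address of the hook server; adjust to the real one.
STATE_URL = "http://127.0.0.1:8999/api/gui/state"


def fetch_gui_state(url: str = STATE_URL, timeout: float = 2.0) -> Dict[str, Any]:
    """GET the GUI state endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_state(
    predicate: Callable[[Dict[str, Any]], bool],
    fetch: Callable[[], Dict[str, Any]] = fetch_gui_state,
    timeout: float = 10.0,
    interval: float = 0.05,
) -> Dict[str, Any]:
    """Poll until predicate(state) is true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch()
        if predicate(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"GUI state never matched; last state: {state!r}")
        time.sleep(interval)
```

A simulation would then assert on a field of the returned state (for example a hypothetical "ai_status" key) instead of sleeping for a fixed interval and string-matching output. Passing fetch as a parameter keeps the helper testable without a running GUI.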

6. test_suite_performance_and_flakiness_20260302

  • Status: Initialized / Looked Over
  • Priority: High
  • Goal: Resolve deep asyncio/threading deadlocks. Replace asyncio.Queue in AppController with a standard queue.Queue. Ensure phantom subprocesses are killed.
  • Fixes Test Debt: Eliminates RuntimeError: Event loop is closed and zombie port 8999 hijacking. Restores full-suite reliability.
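
The queue swap described above might be sketched as follows. EventPump is a hypothetical stand-in, not the real AppController; it only illustrates that a queue.Queue consumer needs no event loop, so nothing is left bound to a closed loop at teardown.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel that tells the worker thread to exit


class EventPump:
    """Hypothetical stand-in for the controller's event queue."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def submit(self, event: Any) -> None:
        # Safe to call from any thread; no asyncio loop is involved.
        self._queue.put(event)

    def stop(self, timeout: float = 5.0) -> None:
        # Events queued before the sentinel are still handled, in order.
        self._queue.put(_STOP)
        self._thread.join(timeout)

    def _run(self) -> None:
        while True:
            event = self._queue.get()
            if event is _STOP:
                break
            self._handler(event)
```

The sentinel gives tests a deterministic shutdown point: after stop() returns, the worker thread has exited or the join timeout has elapsed.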

7. robust_json_parsing_tech_lead_20260302

  • Status: Initialized / Looked Over
  • Priority: Medium
  • Goal: Implement an auto-retry loop that catches JSONDecodeError and feeds the traceback to the Tier 2 model for self-correction.
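
One possible shape for the retry loop, as a sketch only. ask_model is a hypothetical callable standing in for the Tier 2 model call, and the wording of the correction prompt is an assumption.

```python
import json
import traceback
from typing import Any, Callable, Optional


def parse_with_retry(
    ask_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Ask for JSON; on a decode failure, re-prompt with the traceback."""
    current_prompt = prompt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Feed the failure back so the model can correct itself.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON.\n"
                f"Traceback:\n{traceback.format_exc()}\n"
                f"Previous reply:\n{raw}\n"
                "Reply again with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Capping the attempts keeps a persistently malformed reply from looping forever; the final error chains the last JSONDecodeError for debugging.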

8. concurrent_tier_source_tier_20260302

  • Status: Initialized / Looked Over
  • Priority: Low
  • Goal: Replace global state with threading.local() or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
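
A minimal sketch of the threading.local() option. The names source_tier, current_tier, and log_entry are hypothetical; the real logging entry points may differ.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

_ctx = threading.local()  # each thread sees its own attributes


def current_tier() -> Optional[str]:
    """Return the tier tag set by the calling thread, if any."""
    return getattr(_ctx, "source_tier", None)


@contextmanager
def source_tier(tier: str) -> Iterator[None]:
    """Tag everything logged by this thread inside the block."""
    previous = current_tier()
    _ctx.source_tier = tier
    try:
        yield
    finally:
        _ctx.source_tier = previous


def log_entry(message: str, sink: List[Tuple[Optional[str], str]]) -> None:
    # Reads the calling thread's tag, never another worker's.
    sink.append((current_tier(), message))
```

Each worker wraps its ticket processing in `with source_tier("Tier 3"):`, so parallel workers cannot overwrite one another's tag. Explicit context passing, the other option named in the goal, avoids the hidden state entirely at the cost of threading a parameter through every call.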

9. manual_ux_validation_20260302

  • Status: Initialized / Looked Over
  • Priority: Medium
  • Goal: Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.

10. test_architecture_integrity_audit_20260304

  • Status: Audit Completed
  • Priority: High
  • Goal: Comprehensive audit of testing infrastructure and simulation framework. Produced report_gemini.md detailing exact mechanical failures and remediation paths.