
Ed_ d863c51da3 docs: Update track plans with test architecture debt warnings

- Mark live_gui tests as flaky by design in TASKS.md until stabilization tracks complete

- Add test debt notes to upcoming tracks to guide testing strategies

2026-03-05 09:32:24 -05:00

4.5 KiB


TASKS.md

Active Tracks

(none — all planned tracks queued below)

Completed This Session

mma_agent_focus_ux_20260302 — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
feature_bleed_cleanup_20260302 — Removed dead comms panel dup, dead menubar block, duplicate init vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
context_token_viz_20260301 — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.
tech_debt_and_test_cleanup_20260302 — [BOTCHED/ARCHIVED] Centralized fixtures but exposed deep asyncio flaws.

Planned: The Strict Execution Queue

All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.

[!WARNING] TEST ARCHITECTURE DEBT NOTICE (2026-03-05)

The gui_decoupling track exposed deep flaws in the test architecture (asyncio event loop exhaustion, IPC polling race conditions, phantom Windows subprocesses).

Current Testing Policy:

Full-suite integration tests (live_gui / extended sims) are currently considered "flaky by design".

Do NOT write new live_gui simulations until Tracks #5 and #6 are complete.

If unit tests pass but test_extended_sims.py hangs or fails locally, you may manually verify the GUI behavior and proceed.

1. `test_stabilization_20260302` (Archived)

Status: Completed
Priority: High
Goal: Stabilize asyncio errors, ban mock-rot, completely remove gui_legacy.py, and consolidate testing paradigms.

2. `strict_static_analysis_and_typing_20260302` (Archived)

Status: Completed
Priority: High
Goal: Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.

3. `codebase_migration_20260302` (Archived)

Status: Completed
Priority: High
Goal: Restructure directories to a src/ layout. Doing this after static analysis reduces the risk of introducing hidden import bugs. Also creates the sloppy.py entry point.

4. `gui_decoupling_controller_20260302` (Archived)

Status: Completed
Priority: High
Goal: Extract the state machine and core lifecycle into a headless app_controller.py, leaving gui_2.py as a pure, immediate-mode view.
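
The split described in this goal can be illustrated with a minimal sketch. It is not the contents of app_controller.py or gui_2.py: the fields `status` and `log`, the methods `start` and `stop`, and the `render` function are all hypothetical, chosen only to show a controller that owns state and a view that is rebuilt from that state on every frame.

```python
from dataclasses import dataclass, field


@dataclass
class AppController:
    """Hypothetical headless controller: owns state, knows nothing about rendering."""

    status: str = "idle"
    log: list[str] = field(default_factory=list)

    def start(self) -> None:
        # State transitions live here, so they can be tested without a window.
        if self.status == "idle":
            self.status = "running"
            self.log.append("started")

    def stop(self) -> None:
        if self.status == "running":
            self.status = "idle"
            self.log.append("stopped")


def render(controller: AppController) -> list[str]:
    """Hypothetical immediate-mode view: recomputed from controller state each frame."""
    lines = [f"Status: {controller.status}"]
    lines.extend(f"  {entry}" for entry in controller.log)
    return lines


controller = AppController()
controller.start()
frame = render(controller)
```

Because `render` holds no state of its own, a unit test can drive the controller directly and never needs to open a GUI.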

5. `hook_api_ui_state_verification_20260302` (Active/Next)

Status: Initialized / Looked Over
Priority: High
Goal: Add a /api/gui/state GET endpoint. Wire UI state into _settable_fields to enable programmatic live_gui testing without user confirmation.
Fixes Test Debt: Replaces brittle time.sleep() and string-matching assertions in simulations with deterministic API queries.
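
One way a simulation might replace a fixed `time.sleep()` with a state query is sketched below. This is an assumption-laden illustration, not the track's implementation: the base URL, the JSON response shape, the `active_tab` field, and the helper names `fetch_gui_state` and `wait_for_state` are all hypothetical. The helper still pauses briefly between polls, but it returns as soon as the condition holds and fails with a clear error at a deadline instead of hanging.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def fetch_gui_state(base_url: str = "http://127.0.0.1:8999") -> dict[str, Any]:
    """GET /api/gui/state and decode the JSON body (host, port, and shape are assumptions)."""
    with urllib.request.urlopen(f"{base_url}/api/gui/state", timeout=2.0) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_state(
    predicate: Callable[[dict[str, Any]], bool],
    fetch: Callable[[], dict[str, Any]] = fetch_gui_state,
    timeout: float = 5.0,
    interval: float = 0.05,
) -> dict[str, Any]:
    """Poll until predicate(state) is true; raise TimeoutError at the deadline."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch()
        if predicate(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"GUI state never matched; last state: {state!r}")
        time.sleep(interval)


# Stand-in fetcher so the sketch runs without a live GUI.
_fake_states = iter([{"active_tab": "comms"}, {"active_tab": "tools"}])
result = wait_for_state(
    lambda s: s.get("active_tab") == "tools",
    fetch=lambda: next(_fake_states),
)
```

Injecting `fetch` keeps the helper testable on its own; a live_gui simulation would use the default fetcher against the running application.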

6. `test_suite_performance_and_flakiness_20260302`

Status: Initialized / Looked Over
Priority: High
Goal: Resolve deep asyncio/threading deadlocks. Replace asyncio.Queue in AppController with a standard queue.Queue. Ensure phantom subprocesses are killed.
Fixes Test Debt: Eliminates RuntimeError: Event loop is closed and zombie port 8999 hijacking. Restores full-suite reliability.
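
The queue swap named in this goal can be sketched as follows. The class `EventPump` and its methods are hypothetical stand-ins, not code from AppController. The point is that `queue.Queue` is thread-safe and is not bound to an event loop, so producers on any thread can enqueue work, and a sentinel value gives the worker a clean shutdown path.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class EventPump:
    """Hypothetical stand-in for a controller's event queue and worker thread."""

    def __init__(self) -> None:
        self.events: queue.Queue = queue.Queue()
        self.handled: list[str] = []
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def submit(self, event: str) -> None:
        # put() is thread-safe and does not require a running event loop.
        self.events.put(event)

    def _run(self) -> None:
        while True:
            item = self.events.get()  # blocks until an item is available
            if item is _STOP:
                break
            self.handled.append(item)

    def shutdown(self, timeout: float = 2.0) -> None:
        # Queue the sentinel behind pending events, then wait for the worker.
        self.events.put(_STOP)
        self._worker.join(timeout)


pump = EventPump()
pump.start()
pump.submit("refresh")
pump.submit("send")
pump.shutdown()
```

Joining the worker in `shutdown` means a test fixture can confirm the thread has exited before the next test starts.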

7. `robust_json_parsing_tech_lead_20260302`

Status: Initialized / Looked Over
Priority: Medium
Goal: Implement an auto-retry loop that catches JSONDecodeError and feeds the traceback to the Tier 2 model for self-correction.
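
A minimal sketch of such a retry loop is shown below, assuming the model is reachable through a plain callable that takes a prompt and returns text. The names `parse_with_retry`, `ask_model`, and `fake_model`, the retry limit, and the wording of the correction prompt are all hypothetical; the real Tier 2 client interface is not described here.

```python
import json
import traceback
from typing import Any, Callable


def parse_with_retry(
    ask_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call ask_model until its reply parses as JSON, feeding each failure back."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        reply = ask_model(current_prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError:
            if attempt == max_attempts:
                raise  # out of retries: surface the last parse error
            # Include the bad reply and the traceback so the model can self-correct.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON:\n{reply}\n\n"
                f"Parser error:\n{traceback.format_exc()}\nReply with valid JSON only."
            )


# Stand-in model: returns broken JSON first, then a corrected reply.
_replies = iter(['{"tickets": [1, 2,}', '{"tickets": [1, 2]}'])
prompts_seen: list[str] = []


def fake_model(p: str) -> str:
    prompts_seen.append(p)
    return next(_replies)


parsed = parse_with_retry(fake_model, "List the tickets as JSON.")
```

Bounding the loop with `max_attempts` keeps a model that never produces valid JSON from retrying forever.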

8. `concurrent_tier_source_tier_20260302`

Status: Initialized / Looked Over
Priority: Low
Goal: Replace global state with threading.local() or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
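
The `threading.local()` option can be sketched as below. The helper names `set_source_tier`, `log_entry`, and `worker`, and the tier label format, are hypothetical; only `source_tier` and the idea of parallel Tier 3 workers come from the track description. Each thread sees its own `source_tier` attribute, so one worker cannot overwrite the tag another worker is about to log.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()  # attributes set here are visible only to the setting thread


def set_source_tier(tier: str) -> None:
    _context.source_tier = tier


def log_entry(message: str) -> dict[str, str]:
    # Fall back to "unknown" on threads that never set a tier.
    tier = getattr(_context, "source_tier", "unknown")
    return {"source_tier": tier, "message": message}


def worker(ticket_id: int) -> dict[str, str]:
    # Hypothetical worker: tag this thread, then log under that tag.
    set_source_tier(f"tier3-worker-{ticket_id}")
    return log_entry(f"processed ticket {ticket_id}")


with ThreadPoolExecutor(max_workers=4) as pool:
    entries = list(pool.map(worker, range(4)))
```

Pool threads are reused, so a value set for one ticket persists on that thread until it is overwritten. Setting the tier at the start of every task, as `worker` does here, avoids stale tags; explicit context passing avoids the issue entirely.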

9. `manual_ux_validation_20260302`

Status: Initialized / Looked Over
Priority: Medium
Goal: Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.

10. `test_architecture_integrity_audit_20260304`

Status: Audit Completed
Priority: High
Goal: Comprehensive audit of testing infrastructure and simulation framework. Produced report_gemini.md detailing exact mechanical failures and remediation paths.
