# TASKS.md

## Active Tracks

*(none; all planned tracks queued below)*

## Completed This Session

- `feature_bleed_cleanup_20260302`: Removed dead comms panel dup, dead menubar block, duplicate `__init__` vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
- `context_token_viz_20260301`: Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.

## Planned: Next Track

### `mma_agent_focus_ux_20260302` (initialized; run after bleed cleanup)

**Priority:** High
**Depends on:** `feature_bleed_cleanup_20260302` Phase 1 (dead comms panel removed)
**Track dir:** `conductor/tracks/mma_agent_focus_ux_20260302/`

**Audit-confirmed gaps:**
- `ai_client._append_comms` emits entries with no `source_tier` key
- `ai_client` has no `current_tier` module variable, so tiers have no way to self-identify
- `_tool_log` is `list[tuple[str,str,float]]`: no tier field, tuple must migrate to dict
- `run_worker_lifecycle` replaces `comms_log_callback` but never stamps `source_tier`
- `generate_tickets` (Tier 2) does NOT replace callback at all
- No Focus Agent selector widget in Operations Hub

**Scope:** Phase 1 (tier tagging) → Phase 2 (tool log dict migration) → Phase 3 (Focus Agent UI + filter). Per-tier token stats deferred to sub-track.

### `tech_debt_and_test_cleanup_20260302` (initialized)

**Priority:** High
**Depends on:** `feature_bleed_cleanup_20260302`
**Track dir:** `conductor/tracks/tech_debt_and_test_cleanup_20260302/`

**Audit-confirmed gaps:**
- 13 test files duplicate `app_instance` fixture instead of using `conftest.py`.
- Duplicate test files (`test_ast_parser_curated.py`).
- Multiple simulation tests silently pass with no assertions.
- `gui_2.py` initializes 9 state variables in `__init__` that are never read.
- `gui_2.py` has over 15 uncalled HTTP/background methods.

**Scope:** Phase 1 (Fixture deduplication) → Phase 2 (False-positive test fixing) → Phase 3 (Dead code excision in `gui_2.py`).

### `conductor_workflow_improvements_20260302` (initialized)

**Priority:** High
**Depends on:** None
**Track dir:** `conductor/tracks/conductor_workflow_improvements_20260302/`

**Audit-confirmed gaps:**
- Tier 2 skill lacks enforcement of AST pre-implementation scans to prevent duplicate state variables.
- Tier 2 skill lacks explicit rejection of non-TDD execution.
- Tier 3 skill does not strictly forbid implementing code without failing tests.
- `workflow.md` lacks explicit warnings against zero-assertion tests and redundant `__init__` state.

**Scope:** Phase 1 (Update MMA Skill prompts) → Phase 2 (Update `workflow.md`).

### `architecture_boundary_hardening_20260302` (initialized)

**Priority:** High
**Depends on:** None
**Track dir:** `conductor/tracks/architecture_boundary_hardening_20260302/`

**Audit-confirmed gaps:**
- `ai_client.py` loops execute `set_file_slice` and `py_update_definition` instantly without checking `pre_tool_callback`, bypassing GUI approval.
- New `mcp_client.py` tools are not exposed in the GUI or `manual_slop.toml` config for user control.
- `mma_exec.py` bypasses skeletonization for `mcp_client`, causing token bloat.
- `dag_engine.py` does not cascade `blocked` states, causing orchestrator infinite loops.

**Scope:** Phase 1 (Meta-tooling token fix) → Phase 2 (Complete MCP Tool Integration & Seal GUI HITL bypass) → Phase 3 (Fix DAG Engine cascading blocks).

### `testing_consolidation_20260302` (initialized)

**Priority:** Medium
**Depends on:** `tech_debt_and_test_cleanup_20260302`
**Track dir:** `conductor/tracks/testing_consolidation_20260302/`

**Audit-confirmed gaps:**
- `visual_mma_verification.py` manually runs `subprocess.Popen` instead of using the robust `live_gui` fixture.
- Duplicate architectural logic between tests and `simulation/` directories causing fragmentation.

**Scope:** Phase 1 (Migrate manual launchers to fixtures) → Phase 2 (Consolidate simulation scripts).

---

## Track Dependency Order (Execution Guide)

To ensure smooth execution, execute the tracks in the following order:

1. `feature_bleed_cleanup_20260302` (Base cleanup of GUI structure)
2. `mma_agent_focus_ux_20260302` (Depends on feature bleed cleanup Phase 1)
3. `architecture_boundary_hardening_20260302` (Fixes critical HITL & Token leaks; independent but foundational)
4. `tech_debt_and_test_cleanup_20260302` (Re-establishes testing foundation; run after feature tracks)
5. `testing_consolidation_20260302` (Refactors testing methodology; depends on tech debt cleanup)
6. `conductor_workflow_improvements_20260302` (Meta-level updates to skills/workflow docs; can be run anytime)

---

## Future Backlog (Post-Cleanup)

*To be evaluated in a future Tier 1 session after the immediate tech debt queue is cleared.*

### `gui_decoupling_controller`

**Context:** `gui_2.py` is over 3,500 lines and operates as a Monolithic God Object. It violates the "Data-Oriented & Immediate Mode" heuristics by owning complex business logic, orchestrator hooks (`_bg_create_track`), and markdown file building instead of acting as a pure view.

**Goal:** Create a headless `orchestrator_pm.py` or `app_controller.py` that handles the core lifecycle, allowing `gui_2.py` to be a lagless, immediate-mode projection of the state.

### `robust_json_parsing_tech_lead`

**Context:** In `conductor_tech_lead.py`, the `generate_tickets` function relies on a generic `try...except` block to parse the LLM's JSON ticket array. If the model hallucinates or outputs invalid JSON, it silently returns an empty array `[]`, causing the GUI to fail the track creation process without giving the model a chance to self-correct.

**Goal:** Implement a programmatic retry loop that catches `JSONDecodeError` and feeds the error back to the Tier 2 model for self-correction before failing the UI operation.

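
A minimal sketch of the retry loop described in the `robust_json_parsing_tech_lead` goal, for discussion only. The names `parse_tickets_with_retry`, `ask_model`, and `max_attempts` are hypothetical and do not come from `conductor_tech_lead.py`; the sketch assumes the Tier 2 call can be wrapped as a plain function that takes a prompt string and returns the raw model text.

```python
import json
from typing import Callable


def parse_tickets_with_retry(
    ask_model: Callable[[str], str],  # hypothetical wrapper around the Tier 2 call
    prompt: str,
    max_attempts: int = 3,            # assumed limit, not taken from the project
) -> list:
    """Request a JSON ticket array, feeding parse errors back to the model."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            tickets = json.loads(raw)
            if not isinstance(tickets, list):
                raise ValueError(
                    f"expected a JSON array, got {type(tickets).__name__}"
                )
            return tickets
        except (json.JSONDecodeError, ValueError) as exc:
            # Keep the error and show it to the model on the next attempt
            # instead of silently returning [].
            last_error = exc
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not a valid JSON ticket "
                f"array ({exc}). Reply with only the corrected JSON array."
            )
    # Surface the failure to the caller so the GUI can report it explicitly.
    raise RuntimeError(
        f"no valid ticket array after {max_attempts} attempts"
    ) from last_error
```

Raising after the final attempt, rather than returning `[]`, keeps "the model produced no tickets" distinguishable from "the model never produced valid JSON", which is the failure mode the context paragraph describes.
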
### `strict_static_analysis_and_typing`

**Context:** Running `uv run ruff check .` and `uv run mypy --explicit-package-bases .` revealed massive technical debt in type safety (512+ Mypy errors across 64 files, 200+ remaining Ruff violations). The `gui_2.py` and `api_hook_client.py` files specifically have severe "Any" bleeding and incorrect unions.

**Goal:** Resolve all static analysis errors. Enforce strict `mypy` compliance, remove implicit `Optional` types, and fix ambiguous variables (`l`). Integrate `ruff` and `mypy` into a CI pre-commit hook so Tier 3 workers are forced to write type-safe code going forward.

### `test_suite_performance_and_flakiness`

**Context:** Running `uv run pytest` takes over 5.0 minutes to execute and frequently hangs on integration tests (e.g. `test_spawn_interception.py`). Several simulation tests (`test_sim_ai_settings.py`, `test_extended_sims.py`) are also currently failing or timing out.

**Goal:** Audit the test suite for `time.sleep()` abuse. Replace hardcoded sleeps with `threading.Event()` hooks or robust polling. Isolate slow integration tests with `@pytest.mark.slow` and ensure the core unit test suite runs in under 10 seconds to maintain high-velocity TDD.
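
The sleep replacement named in the goal above could look roughly like the sketch below. `wait_until` and `start_worker` are hypothetical helpers written for illustration, not existing project code, and the timeout values are assumptions. The idea is that a test blocks on an explicit signal (or a short bounded poll) and resumes as soon as the condition holds, instead of always paying for a fixed `time.sleep()`.

```python
import threading
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 2.0,     # assumed default, tune per test
    interval: float = 0.01,
) -> bool:
    """Poll `predicate` until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline still counts.
    return predicate()


def start_worker(done: threading.Event, results: list) -> threading.Thread:
    """Hypothetical background task that signals completion through an Event."""

    def run() -> None:
        results.append("ready")
        done.set()  # wake any waiter immediately

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def test_worker_signals_completion() -> None:
    done = threading.Event()
    results: list = []
    start_worker(done, results)
    # Event.wait returns True once set, or False if the timeout expires.
    assert done.wait(timeout=2.0)
    assert results == ["ready"]
```

`Event.wait` suits code that can be given a hook to call `set()`; `wait_until` is the fallback for state that can only be observed from outside, such as a GUI flag.
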