TASKS.md

Active Tracks

  • feature_bleed_cleanup_20260302 — Dead code & conflicting design state cleanup (Phase 1-3)

Completed This Session

  • context_token_viz_20260301 — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.

Planned: Next Track

mma_agent_focus_ux_20260302 (initialized — run after bleed cleanup)

Priority: High
Depends on: feature_bleed_cleanup_20260302 Phase 1 (dead comms panel removed)
Track dir: conductor/tracks/mma_agent_focus_ux_20260302/

Audit-confirmed gaps:

  • ai_client._append_comms emits entries with no source_tier key
  • ai_client has no current_tier module variable — no way for tiers to self-identify
  • _tool_log is list[tuple[str,str,float]] — no tier field, tuple must migrate to dict
  • run_worker_lifecycle replaces comms_log_callback but never stamps source_tier
  • generate_tickets (Tier 2) does NOT replace callback at all
  • No Focus Agent selector widget in Operations Hub

Scope: Phase 1 (tier tagging) → Phase 2 (tool log dict migration) → Phase 3 (Focus Agent UI + filter). Per-tier token stats deferred to sub-track.

tech_debt_and_test_cleanup_20260302 (initialized)

Priority: High
Depends on: feature_bleed_cleanup_20260302
Track dir: conductor/tracks/tech_debt_and_test_cleanup_20260302/

Audit-confirmed gaps:

  • 13 test files duplicate app_instance fixture instead of using conftest.py.
  • Duplicate test files (test_ast_parser_curated.py).
  • Multiple simulation tests silently pass with no assertions.
  • gui_2.py initializes 9 state variables in __init__ that are never read.
  • gui_2.py has over 15 uncalled HTTP/background methods.

Scope: Phase 1 (Fixture deduplication) → Phase 2 (False-positive test fixing) → Phase 3 (Dead code excision in gui_2.py).

conductor_workflow_improvements_20260302 (initialized)

Priority: High
Depends on: None
Track dir: conductor/tracks/conductor_workflow_improvements_20260302/

Audit-confirmed gaps:

  • Tier 2 skill lacks enforcement of AST pre-implementation scans to prevent duplicate state variables.
  • Tier 2 skill lacks explicit rejection of non-TDD execution.
  • Tier 3 skill does not strictly forbid implementing code without failing tests.
  • workflow.md lacks explicit warnings against zero-assertion tests and redundant __init__ state.

Scope: Phase 1 (Update MMA Skill prompts) → Phase 2 (Update workflow.md).

architecture_boundary_hardening_20260302 (initialized)

Priority: High
Depends on: None
Track dir: conductor/tracks/architecture_boundary_hardening_20260302/

Audit-confirmed gaps:

  • ai_client.py loops execute set_file_slice and py_update_definition instantly without checking pre_tool_callback, bypassing GUI approval.
  • New mcp_client.py tools are not exposed in the GUI or manual_slop.toml config for user control.
  • mma_exec.py bypasses skeletonization for mcp_client, causing token bloat.
  • dag_engine.py does not cascade blocked states, causing orchestrator infinite loops.

Scope: Phase 1 (Meta-tooling token fix) → Phase 2 (Complete MCP Tool Integration & Seal GUI HITL bypass) → Phase 3 (Fix DAG Engine cascading blocks).
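For Phase 3, cascading a blocked state amounts to a breadth-first walk over the reversed dependency edges. The sketch below is not the dag_engine.py API: the status strings (`todo`, `blocked`, `completed`) and the `cascade_blocked` name are assumptions made for illustration:

```python
from collections import deque


def cascade_blocked(status: dict[str, str], depends_on: dict[str, list[str]]) -> dict[str, str]:
    """Return a copy of status in which every transitive dependent of a
    blocked ticket is also marked blocked."""
    # Invert the edges: ticket -> tickets that depend on it.
    dependents: dict[str, list[str]] = {ticket: [] for ticket in status}
    for ticket, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(ticket)

    result = dict(status)
    queue = deque(t for t, s in status.items() if s == "blocked")
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            # Completed work stays completed; already-blocked nodes were visited.
            if result.get(child) not in ("blocked", "completed"):
                result[child] = "blocked"
                queue.append(child)
    return result
```

With the cascade applied, a scheduler that picks only `todo` tickets would see none left behind a blocked ancestor and can terminate instead of waiting on a dependency that will never complete.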

testing_consolidation_20260302 (initialized)

Priority: Medium
Depends on: tech_debt_and_test_cleanup_20260302
Track dir: conductor/tracks/testing_consolidation_20260302/

Audit-confirmed gaps:

  • visual_mma_verification.py manually runs subprocess.Popen instead of using the robust live_gui fixture.
  • Duplicate architectural logic between tests and simulation/ directories causing fragmentation.

Scope: Phase 1 (Migrate manual launchers to fixtures) → Phase 2 (Consolidate simulation scripts).


Track Dependency Order (Execution Guide)

Execute the tracks in the following order:

  1. feature_bleed_cleanup_20260302 (Base cleanup of GUI structure)
  2. mma_agent_focus_ux_20260302 (Depends on feature bleed cleanup Phase 1)
  3. architecture_boundary_hardening_20260302 (Fixes the critical HITL bypass and token bloat; independent but foundational)
  4. tech_debt_and_test_cleanup_20260302 (Re-establishes testing foundation; run after feature tracks)
  5. testing_consolidation_20260302 (Refactors testing methodology; depends on tech debt cleanup)
  6. conductor_workflow_improvements_20260302 (Meta-level updates to skills/workflow docs; can be run anytime)

Future Backlog (Post-Cleanup)

To be evaluated in a future Tier 1 session after the immediate tech debt queue is cleared.

gui_decoupling_controller

Context: gui_2.py is over 3,500 lines and operates as a monolithic God Object. It violates the "Data-Oriented & Immediate Mode" heuristics by owning complex business logic, orchestrator hooks (_bg_create_track), and markdown file building instead of acting as a pure view.

Goal: Create a headless orchestrator_pm.py or app_controller.py that handles the core lifecycle, allowing gui_2.py to be a lagless, immediate-mode projection of the state.

robust_json_parsing_tech_lead

Context: In conductor_tech_lead.py, the generate_tickets function relies on a generic try...except block to parse the LLM's JSON ticket array. If the model hallucinates or outputs invalid JSON, it silently returns an empty array [], causing the GUI to fail the track creation process without giving the model a chance to self-correct.

Goal: Implement a programmatic retry loop that catches JSONDecodeError and feeds the error back to the Tier 2 model for self-correction before failing the UI operation.
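A rough sketch of such a retry loop, assuming the model call can be wrapped as a plain `ask_model(prompt) -> str` callable. That callable, the `generate_tickets_with_retry` name, the retry prompt wording, and the `max_attempts` default are all hypothetical:

```python
import json
from typing import Callable, Optional


def generate_tickets_with_retry(
    ask_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> list[dict]:
    """Ask for a JSON ticket array, feeding parse errors back to the model."""
    current_prompt = prompt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            tickets = json.loads(raw)
            if not isinstance(tickets, list):
                raise ValueError(f"expected a JSON array, got {type(tickets).__name__}")
            return tickets
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            # Give the model the exact error so it can self-correct.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid: {exc}\n"
                "Reply with only a JSON array of ticket objects."
            )
    # Fail loudly instead of returning [] so the caller can surface the cause.
    raise RuntimeError(f"no valid ticket JSON after {max_attempts} attempts") from last_error
```

Raising after the final attempt, rather than returning an empty list, lets the GUI distinguish "the model produced no tickets" from "the model never produced parseable output".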

strict_static_analysis_and_typing

Context: Running uv run ruff check . and uv run mypy --explicit-package-bases . revealed massive technical debt in type safety (512+ Mypy errors across 64 files, 200+ remaining Ruff violations). The gui_2.py and api_hook_client.py files specifically have severe "Any" bleeding and incorrect unions.

Goal: Resolve all static analysis errors. Enforce strict mypy compliance, remove implicit Optional types, and fix ambiguous variables (l). Integrate ruff and mypy into a CI pre-commit hook so Tier 3 workers are forced to write type-safe code going forward.

test_suite_performance_and_flakiness

Context: Running uv run pytest takes over 5 minutes and frequently hangs on integration tests (e.g. test_spawn_interception.py). Several simulation tests (test_sim_ai_settings.py, test_extended_sims.py) are also currently failing or timing out.

Goal: Audit the test suite for time.sleep() abuse. Replace hardcoded sleeps with threading.Event() hooks or robust polling. Isolate slow integration tests with @pytest.mark.slow and ensure the core unit test suite runs in under 10 seconds to maintain high-velocity TDD.
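The two replacements for a hardcoded sleep could look roughly like this. `Worker` is a stand-in for whatever background component a test waits on, and `wait_until` is a hypothetical polling helper; neither exists in the codebase as far as this file shows:

```python
import threading
import time
from typing import Callable


class Worker:
    """Stand-in background task that signals completion via an Event."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None

    def start(self) -> None:
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        self.result = 42  # placeholder for real background work
        self.done.set()   # wake any waiter immediately


def wait_until(predicate: Callable[[], bool], timeout: float = 2.0, interval: float = 0.01) -> bool:
    """Poll predicate until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline
```

In a test, `assert worker.done.wait(timeout=2.0)` returns as soon as the work finishes and fails with a clear timeout otherwise, whereas `time.sleep(2)` always costs the full two seconds and still races on slow machines.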