ed/manual_slop

Ed_ 1d4dfedab7 chore(conductor): Add Manual UX Validation & Polish track to the strict execution queue

2026-03-02 22:42:27 -05:00

TASKS.md

Active Tracks

(none — all planned tracks queued below)

Completed This Session

mma_agent_focus_ux_20260302 — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
feature_bleed_cleanup_20260302 — Removed dead comms panel dup, dead menubar block, duplicate init vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
context_token_viz_20260301 — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.

Planned: Next Track

`mma_agent_focus_ux_20260302` — COMPLETED (`b30e563`)

~~(initialized; run after bleed cleanup)~~
Priority: High
Depends on: feature_bleed_cleanup_20260302 Phase 1 (dead comms panel removed)
Track dir: conductor/tracks/mma_agent_focus_ux_20260302/

Audit-confirmed gaps:

ai_client._append_comms emits entries with no source_tier key
ai_client has no current_tier module variable — no way for tiers to self-identify
_tool_log is list[tuple[str,str,float]] — no tier field, tuple must migrate to dict
run_worker_lifecycle replaces comms_log_callback but never stamps source_tier
generate_tickets (Tier 2) does NOT replace callback at all
No Focus Agent selector widget in Operations Hub
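
The tier-tagging and tool-log gaps above can be sketched as follows. This is a minimal illustration, not the project's code: the meanings assigned to the three tuple fields, the dict keys, and the helper names `migrate_tool_entry` and `filter_by_tier` are all assumptions made for the example.

```python
from typing import Optional, TypedDict


class ToolLogEntry(TypedDict):
    script: str                 # assumed meaning of tuple field 0
    result: str                 # assumed meaning of tuple field 1
    ts: float                   # assumed meaning of tuple field 2
    source_tier: Optional[str]  # new field: which tier produced the entry


def migrate_tool_entry(entry: tuple[str, str, float], tier: Optional[str]) -> ToolLogEntry:
    """Convert a legacy (str, str, float) tuple into a dict stamped with a tier."""
    script, result, ts = entry
    return {"script": script, "result": result, "ts": ts, "source_tier": tier}


def filter_by_tier(log: list[ToolLogEntry], focus: Optional[str]) -> list[ToolLogEntry]:
    """Return every entry when no agent is focused, else only that tier's entries."""
    if focus is None:
        return list(log)
    return [e for e in log if e["source_tier"] == focus]
```

Passing the tier explicitly keeps the sketch free of global state; a module-level `current_tier` variable, as named in the gap list, would supply the same value implicitly.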

Scope: Phase 1 (tier tagging) → Phase 2 (tool log dict migration) → Phase 3 (Focus Agent UI + filter). Per-tier token stats deferred to sub-track.

`tech_debt_and_test_cleanup_20260302` (initialized)

Priority: High
Depends on: feature_bleed_cleanup_20260302
Track dir: conductor/tracks/tech_debt_and_test_cleanup_20260302/

Audit-confirmed gaps:

13 test files duplicate app_instance fixture instead of using conftest.py.
Duplicate test files (test_ast_parser_curated.py).
Multiple simulation tests silently pass with no assertions.
gui_2.py initializes 9 state variables in __init__ that are never read.
gui_2.py has over 15 uncalled HTTP/background methods.

Scope: Phase 1 (Fixture deduplication) → Phase 2 (False-positive test fixing) → Phase 3 (Dead code excision in gui_2.py).

`conductor_workflow_improvements_20260302` (initialized)

Priority: High
Depends on: None
Track dir: conductor/tracks/conductor_workflow_improvements_20260302/

Audit-confirmed gaps:

Tier 2 skill lacks enforcement of AST pre-implementation scans to prevent duplicate state variables.
Tier 2 skill lacks explicit rejection of non-TDD execution.
Tier 3 skill does not strictly forbid implementing code without failing tests.
workflow.md lacks explicit warnings against zero-assertion tests and redundant __init__ state.
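
An AST pre-implementation scan for redundant `__init__` state might look like the sketch below. It is illustrative only: the function name `duplicate_init_attrs` is hypothetical, and it counts `self.<attr>` assignments across every `__init__` in the given source rather than per class.

```python
import ast
from collections import Counter


def duplicate_init_attrs(source: str) -> list[str]:
    """Return self.<attr> names assigned more than once inside __init__ methods."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == "__init__":
            for sub in ast.walk(node):
                if (
                    isinstance(sub, ast.Attribute)
                    and isinstance(sub.ctx, ast.Store)   # assignment target, not a read
                    and isinstance(sub.value, ast.Name)
                    and sub.value.id == "self"
                ):
                    counts[sub.attr] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```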

Scope: Phase 1 (Update MMA Skill prompts) → Phase 2 (Update workflow.md).

`architecture_boundary_hardening_20260302` (initialized)

Priority: High
Depends on: None
Track dir: conductor/tracks/architecture_boundary_hardening_20260302/

Audit-confirmed gaps:

ai_client.py loops execute set_file_slice and py_update_definition instantly without checking pre_tool_callback, bypassing GUI approval.
New mcp_client.py tools are not exposed in the GUI or manual_slop.toml config for user control.
mma_exec.py bypasses skeletonization for mcp_client, causing token bloat.
dag_engine.py does not cascade blocked states, causing orchestrator infinite loops.
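
The missing cascade in the last gap can be illustrated with a breadth-first walk over dependents. This is a sketch under stated assumptions: the status strings `"blocked"` and `"todo"`, the shape of the two dicts, and the name `cascade_blocked` are hypothetical and do not describe the actual `dag_engine.py` API.

```python
from collections import deque


def cascade_blocked(deps: dict[str, list[str]], status: dict[str, str]) -> dict[str, str]:
    """Mark every ticket that transitively depends on a blocked ticket as blocked.

    deps maps ticket id -> ids it depends on; status maps ticket id -> state.
    The input dicts are left unmodified.
    """
    # Invert the edges so we can walk from a blocked ticket to its dependents.
    dependents: dict[str, list[str]] = {}
    for ticket, needs in deps.items():
        for need in needs:
            dependents.setdefault(need, []).append(ticket)

    result = dict(status)
    queue = deque(t for t, s in result.items() if s == "blocked")
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if result.get(child) != "blocked":
                result[child] = "blocked"
                queue.append(child)  # each ticket is queued at most once
    return result
```

With blocked states propagated, a scheduler that waits for "ready" tickets can detect that none will ever become ready and stop instead of looping.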

Scope: Phase 1 (Meta-tooling token fix) → Phase 2 (Complete MCP Tool Integration & Seal GUI HITL bypass) → Phase 3 (Fix DAG Engine cascading blocks).

`testing_consolidation_20260302` (initialized)

Priority: Medium
Depends on: tech_debt_and_test_cleanup_20260302
Track dir: conductor/tracks/testing_consolidation_20260302/

Audit-confirmed gaps:

visual_mma_verification.py manually runs subprocess.Popen instead of using the robust live_gui fixture.
Architectural logic is duplicated between the tests and simulation/ directories, causing fragmentation.

Scope: Phase 1 (Migrate manual launchers to fixtures) → Phase 2 (Consolidate simulation scripts).

Track Dependency Order (Execution Guide)

Execute the tracks in the following order:

1. feature_bleed_cleanup_20260302 (Base cleanup of GUI structure)
2. mma_agent_focus_ux_20260302 (Depends on feature bleed cleanup Phase 1)
3. architecture_boundary_hardening_20260302 (Fixes critical HITL & Token leaks; independent but foundational)
4. tech_debt_and_test_cleanup_20260302 (Re-establishes testing foundation; run after feature tracks)
5. testing_consolidation_20260302 (Refactors testing methodology; depends on tech debt cleanup)
6. conductor_workflow_improvements_20260302 (Meta-level updates to skills/workflow docs; can be run anytime)
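
The ordering can be checked mechanically against the "Depends on" fields declared in this file. The sketch below uses the standard-library `graphlib` module; the `TRACK_DEPS` mapping and the `execution_order` helper are illustrative names, and the sorter may return a different valid order than the one listed, since independent tracks can be placed anywhere.

```python
from graphlib import TopologicalSorter

# Each track maps to the set of tracks it depends on, as declared in this file.
TRACK_DEPS = {
    "feature_bleed_cleanup_20260302": set(),
    "mma_agent_focus_ux_20260302": {"feature_bleed_cleanup_20260302"},
    "architecture_boundary_hardening_20260302": set(),
    "tech_debt_and_test_cleanup_20260302": {"feature_bleed_cleanup_20260302"},
    "testing_consolidation_20260302": {"tech_debt_and_test_cleanup_20260302"},
    "conductor_workflow_improvements_20260302": set(),
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid order; raises graphlib.CycleError if dependencies loop."""
    return list(TopologicalSorter(deps).static_order())
```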

Planned: Upcoming Tracks

The following tracks have been initialized and ordered for execution.

1. `test_stabilization_20260302` (Active/Next)

Priority: High
Goal: Stabilize asyncio errors, ban mock-rot, and consolidate testing paradigms.

2. `strict_static_analysis_and_typing_20260302`

Priority: High
Goal: Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.

3. `codebase_migration_20260302`

Priority: High
Goal: Restructure directories to a src/ layout. Running this after static analysis reduces the risk of introducing hidden import bugs.

4. `gui_decoupling_controller_20260302`

Priority: High
Goal: Extract the state machine and core lifecycle into a headless app_controller.py, leaving gui_2.py as a pure, immediate-mode view.

5. `hook_api_ui_state_verification_20260302`

Priority: Medium
Goal: Add a /api/gui/state GET endpoint. Wire UI state into _settable_fields to enable programmatic live_gui testing without user confirmation.

6. `robust_json_parsing_tech_lead_20260302`

Priority: Medium
Goal: Implement an auto-retry loop that catches JSONDecodeError and feeds the traceback to the Tier 2 model for self-correction.
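
A retry loop of this kind could be sketched as below. The `ask_model` callable stands in for whatever function sends a prompt to the Tier 2 model and returns its text reply; it, `parse_with_retry`, and the wording of the correction prompt are assumptions for illustration.

```python
import json
import traceback
from typing import Any, Callable


def parse_with_retry(ask_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Any:
    """Call ask_model until its reply parses as JSON, feeding each failure back."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    for attempt in range(max_attempts):
        reply = ask_model(current_prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last parse error
            # Append the traceback so the model can see what was wrong.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON:\n"
                f"{traceback.format_exc()}\nReply with corrected JSON only."
            )
```

Bounding the attempts matters: without a cap, a model that keeps emitting malformed JSON would loop indefinitely.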

7. `concurrent_tier_source_tier_20260302`

Priority: Low
Goal: Replace global state with threading.local() or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
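
The `threading.local()` option can be illustrated with a small sketch. The helper names (`set_current_tier`, `get_current_tier`, `make_entry`) and the entry shape are hypothetical; the point is only that each worker thread sees its own tier value.

```python
import threading
from typing import Optional

# Per-thread storage: attributes set here are visible only to the setting thread.
_tier_ctx = threading.local()


def set_current_tier(tier: str) -> None:
    """Record the tier for the calling thread only."""
    _tier_ctx.tier = tier


def get_current_tier() -> Optional[str]:
    """Return the calling thread's tier, or None if it never set one."""
    return getattr(_tier_ctx, "tier", None)


def make_entry(message: str) -> dict:
    """Build a log entry stamped with the calling thread's tier."""
    return {"message": message, "source_tier": get_current_tier()}
```

Explicit context passing avoids hidden state altogether and is easier to test, at the cost of threading a parameter through every call that logs.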

8. `test_suite_performance_and_flakiness_20260302`

Priority: Low
Goal: Replace time.sleep() with deterministic polling or threading.Event() triggers. Mark exceptionally heavy tests with @pytest.mark.slow.
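
Both replacements for fixed sleeps can be sketched with the standard library. The helper names `wait_until` and `run_and_wait` are hypothetical. Note that the polling helper still sleeps, but only in short intervals bounded by a deadline, so a test finishes as soon as the condition holds.

```python
import threading
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.01) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline


def run_and_wait(work: Callable[[], None], timeout: float = 5.0) -> bool:
    """Run work in a background thread and block until it signals completion."""
    done = threading.Event()

    def target() -> None:
        try:
            work()
        finally:
            done.set()  # signal even if work raised

    threading.Thread(target=target, daemon=True).start()
    return done.wait(timeout)  # True if signalled before the timeout
```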

9. `manual_ux_validation_20260302`

Priority: Medium
Goal: Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
