From d93290a3d94688037d1461bb726db80257376aa6 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Mon, 2 Mar 2026 22:45:00 -0500
Subject: [PATCH] docs: Update Journal and Tasks with session 5 strategic shift

---
 JOURNAL.md |  44 +++++++++++------
 TASKS.md   | 132 ++++++++++++-----------------------------------
 2 files changed, 58 insertions(+), 118 deletions(-)

diff --git a/JOURNAL.md b/JOURNAL.md
index e24207a..bf4cec9 100644
--- a/JOURNAL.md
+++ b/JOURNAL.md
@@ -45,6 +45,21 @@
 - **Dependency Order**: Added an explicit 'Track Dependency Order' execution guide to `TASKS.md` to ensure safe progression through the accumulated tech debt.
 - **Documentation**: Added guide_meta_boundary.md to explicitly clarify the difference between the Application's strict-HITL environment and the autonomous Meta-Tooling environment, helping future Tiers avoid feature bleed.
 - **Heuristics & Backlog**: Added Data-Oriented Design and Immediate Mode architectural heuristics (inspired by Muratori/Acton) to product-guidelines.md. Logged future decoupling and robust parsing tracks to a 'Future Backlog' in TASKS.md.
+
+---
+
+## 2026-03-02 (Session 3)
+
+### Track: feature_bleed_cleanup_20260302 — Completed |TASK:feature_bleed_cleanup_20260302|
+- **What**: Removed all confirmed dead code and layout regressions from gui_2.py (3 phases)
+- **Why**: Tier 3 workers had left behind dead duplicate methods, dead menu block, duplicate state vars, and a broken Token Budget layout that embedded the panel inside Provider & Model with double labels
+- **How**:
+  - Phase 1: Deleted dead `_render_comms_history_panel` duplicate (stale `type` key, nonexistent `_cb_load_prior_log`, `scroll_area` ID collision). Deleted 4 duplicate `__init__` assignments (ui_new_track_name etc.)
+  - Phase 2: Deleted dead `begin_main_menu_bar()` block (24 lines, always-False in HelloImGui). Added working `Quit` to `_show_menus` via `runner_params.app_shall_exit = True`
+  - Phase 3: Removed 4 redundant Token Budget labels/call from `_render_provider_panel`. Added `collapsing_header("Token Budget")` to AI Settings with proper `_render_token_budget_panel()` call
+- **Issues**: Full test suite hangs (pre-existing — `test_suite_performance_and_flakiness` backlog). Ran targeted GUI/MMA subset (32 passed) as regression proxy. Meta-Level Sanity Check: 52 ruff errors in gui_2.py before and after — zero new violations introduced
+- **Result**: All 3 phases verified by user. Checkpoints: be7174c (Phase 1), 15fd786 (Phase 2), 0d081a2 (Phase 3)
+
 ---
 
 ## 2026-03-02 (Session 4)
@@ -65,21 +80,18 @@
 
 ---
 
-## 2026-03-02 (Session 3)
-
-### Track: feature_bleed_cleanup_20260302 — Completed |TASK:feature_bleed_cleanup_20260302|
-- **What**: Removed all confirmed dead code and layout regressions from gui_2.py (3 phases)
-- **Why**: Tier 3 workers had left behind dead duplicate methods, dead menu block, duplicate state vars, and a broken Token Budget layout that embedded the panel inside Provider & Model with double labels
-- **How**:
-  - Phase 1: Deleted dead `_render_comms_history_panel` duplicate (stale `type` key, nonexistent `_cb_load_prior_log`, `scroll_area` ID collision). Deleted 4 duplicate `__init__` assignments (ui_new_track_name etc.)
-  - Phase 2: Deleted dead `begin_main_menu_bar()` block (24 lines, always-False in HelloImGui). Added working `Quit` to `_show_menus` via `runner_params.app_shall_exit = True`
-  - Phase 3: Removed 4 redundant Token Budget labels/call from `_render_provider_panel`. Added `collapsing_header("Token Budget")` to AI Settings with proper `_render_token_budget_panel()` call
-- **Issues**: Full test suite hangs (pre-existing — `test_suite_performance_and_flakiness` backlog). Ran targeted GUI/MMA subset (32 passed) as regression proxy. Meta-Level Sanity Check: 52 ruff errors in gui_2.py before and after — zero new violations introduced
-- **Result**: All 3 phases verified by user. Checkpoints: be7174c (Phase 1), 15fd786 (Phase 2), 0d081a2 (Phase 3)
-
 ---
-
-
-
+## 2026-03-02 (Session 5)
+### Track: tech_debt_and_test_cleanup_20260302 — Botched / Archived
+- **What**: Attempted to centralize test fixtures and enforce test discipline.
+- **Issues**: Track was launched with a flawed specification that misidentified critical headless API endpoints as "dead code." While centralized `app_instance` fixtures were successfully deployed, it exposed several zero-assertion tests and exacerbated deep architectural issues with the `asyncio` loop lifecycle, causing widespread `RuntimeError: Event loop is closed` warnings and test hangs.
+- **Result**: Track was aborted and archived. A post-mortem `DEBRIEF.md` was generated.
+### Strategic Shift: The Strict Execution Queue
+- **What**: Systematically audited the Future Backlog and converted all pending technical debt into a strict, 9-track, linearly ordered execution queue in `conductor/tracks.md`.
+- **Why**: "Mock-Rot" and stateless Tier 3 entropy. Tier 3 workers were blindly using `unittest.mock.patch` to pass tests without testing integration realities, creating a false sense of security.
+- **How**:
+  - Defined the "Surgical Spec Protocol" to force Tier 1/2 agents to map exact `WHERE/WHAT/HOW/SAFETY` targets for workers.
+  - Initialized 8 new tracks: `test_stabilization_20260302`, `strict_static_analysis_and_typing_20260302`, `codebase_migration_20260302`, `gui_decoupling_controller_20260302`, `hook_api_ui_state_verification_20260302`, `robust_json_parsing_tech_lead_20260302`, `concurrent_tier_source_tier_20260302`, and `test_suite_performance_and_flakiness_20260302`.
+  - Added a highly interactive `manual_ux_validation_20260302` track specifically for tuning GUI animations and structural layout using a slow-mode simulation harness.
+- **Result**: The project now has a crystal-clear, heavily guarded roadmap to escape technical debt and transition to a robust, Data-Oriented, type-safe architecture.
\ No newline at end of file
diff --git a/TASKS.md b/TASKS.md
index 67329bd..e15794b 100644
--- a/TASKS.md
+++ b/TASKS.md
@@ -9,126 +9,54 @@
 - `mma_agent_focus_ux_20260302` — Per-tier source_tier tagging on comms+tool entries; Focus Agent combo UI; filter logic in comms+tool panels; [tier] label per comms entry. 18 tests. Checkpoint: b30e563.
 - `feature_bleed_cleanup_20260302` — Removed dead comms panel dup, dead menubar block, duplicate __init__ vars; added working Quit; fixed Token Budget layout. All phases verified. Checkpoint: 0d081a2.
 - `context_token_viz_20260301` — Token budget panel (color bar, breakdown table, trim warning, cache status, auto-refresh). All phases verified. Commit: d577457.
-
-## Planned: Next Track
-
-### `mma_agent_focus_ux_20260302` — COMPLETED (b30e563)
-~~(initialized — run after bleed cleanup)~~
-**Priority:** High
-**Depends on:** `feature_bleed_cleanup_20260302` Phase 1 (dead comms panel removed)
-**Track dir:** `conductor/tracks/mma_agent_focus_ux_20260302/`
-
-**Audit-confirmed gaps:**
-- `ai_client._append_comms` emits entries with no `source_tier` key
-- `ai_client` has no `current_tier` module variable — no way for tiers to self-identify
-- `_tool_log` is `list[tuple[str,str,float]]` — no tier field, tuple must migrate to dict
-- `run_worker_lifecycle` replaces `comms_log_callback` but never stamps `source_tier`
-- `generate_tickets` (Tier 2) does NOT replace callback at all
-- No Focus Agent selector widget in Operations Hub
-
-**Scope:** Phase 1 (tier tagging) → Phase 2 (tool log dict migration) → Phase 3 (Focus Agent UI + filter). Per-tier token stats deferred to sub-track.
-
-### `tech_debt_and_test_cleanup_20260302` (initialized)
-**Priority:** High
-**Depends on:** `feature_bleed_cleanup_20260302`
-**Track dir:** `conductor/tracks/tech_debt_and_test_cleanup_20260302/`
-
-**Audit-confirmed gaps:**
-- 13 test files duplicate `app_instance` fixture instead of using `conftest.py`.
-- Duplicate test files (`test_ast_parser_curated.py`).
-- Multiple simulation tests silently pass with no assertions.
-- `gui_2.py` initializes 9 state variables in `__init__` that are never read.
-- `gui_2.py` has over 15 uncalled HTTP/background methods.
-
-**Scope:** Phase 1 (Fixture deduplication) → Phase 2 (False-positive test fixing) → Phase 3 (Dead code excision in `gui_2.py`).
-
-### `conductor_workflow_improvements_20260302` (initialized)
-**Priority:** High
-**Depends on:** None
-**Track dir:** `conductor/tracks/conductor_workflow_improvements_20260302/`
-
-**Audit-confirmed gaps:**
-- Tier 2 skill lacks enforcement of AST pre-implementation scans to prevent duplicate state variables.
-- Tier 2 skill lacks explicit rejection of non-TDD execution.
-- Tier 3 skill does not strictly forbid implementing code without failing tests.
-- `workflow.md` lacks explicit warnings against zero-assertion tests and redundant `__init__` state.
-
-**Scope:** Phase 1 (Update MMA Skill prompts) → Phase 2 (Update `workflow.md`).
-
-### `architecture_boundary_hardening_20260302` (initialized)
-**Priority:** High
-**Depends on:** None
-**Track dir:** `conductor/tracks/architecture_boundary_hardening_20260302/`
-
-**Audit-confirmed gaps:**
-- `ai_client.py` loops execute `set_file_slice` and `py_update_definition` instantly without checking `pre_tool_callback`, bypassing GUI approval.
-- New `mcp_client.py` tools are not exposed in the GUI or `manual_slop.toml` config for user control.
-- `mma_exec.py` bypasses skeletonization for `mcp_client`, causing token bloat.
-- `dag_engine.py` does not cascade `blocked` states, causing orchestrator infinite loops.
-
-**Scope:** Phase 1 (Meta-tooling token fix) → Phase 2 (Complete MCP Tool Integration & Seal GUI HITL bypass) → Phase 3 (Fix DAG Engine cascading blocks).
-
-### `testing_consolidation_20260302` (initialized)
-**Priority:** Medium
-**Depends on:** `tech_debt_and_test_cleanup_20260302`
-**Track dir:** `conductor/tracks/testing_consolidation_20260302/`
-
-**Audit-confirmed gaps:**
-- `visual_mma_verification.py` manually runs `subprocess.Popen` instead of using the robust `live_gui` fixture.
-- Duplicate architectural logic between tests and `simulation/` directories causing fragmentation.
-
-**Scope:** Phase 1 (Migrate manual launchers to fixtures) → Phase 2 (Consolidate simulation scripts).
+- `tech_debt_and_test_cleanup_20260302` — [BOTCHED/ARCHIVED] Centralized fixtures but exposed deep asyncio flaws.
 
 ---
 
-## Track Dependency Order (Execution Guide)
-To ensure smooth execution, execute the tracks in the following order:
-1. `feature_bleed_cleanup_20260302` (Base cleanup of GUI structure)
-2. `mma_agent_focus_ux_20260302` (Depends on feature bleed cleanup Phase 1)
-3. `architecture_boundary_hardening_20260302` (Fixes critical HITL & Token leaks; independent but foundational)
-4. `tech_debt_and_test_cleanup_20260302` (Re-establishes testing foundation; run after feature tracks)
-5. `testing_consolidation_20260302` (Refactors testing methodology; depends on tech debt cleanup)
-6. `conductor_workflow_improvements_20260302` (Meta-level updates to skills/workflow docs; can be run anytime)
-
----
-
-## Planned: Upcoming Tracks
-*The following tracks have been initialized and ordered for execution.*
+## Planned: The Strict Execution Queue
+*All previously loose backlog items have been rigorously spec'd and initialized as Conductor Tracks. They MUST be executed in this exact order.*
 
 ### 1. `test_stabilization_20260302` (Active/Next)
-**Priority:** High
-**Goal:** Stabilize `asyncio` errors, ban mock-rot, and consolidate testing paradigms.
+- **Status:** Initialized / Looked Over
+- **Priority:** High
+- **Goal:** Stabilize `asyncio` errors, ban mock-rot, completely remove `gui_legacy.py`, and consolidate testing paradigms.
 
 ### 2. `strict_static_analysis_and_typing_20260302`
-**Priority:** High
-**Goal:** Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.
+- **Status:** Initialized / Looked Over
+- **Priority:** High
+- **Goal:** Resolve 512+ mypy errors and remaining ruff violations to secure the foundation before refactoring. Add pre-commit hooks.
 
 ### 3. `codebase_migration_20260302`
-**Priority:** High
-**Goal:** Restructure directories to a `src/` layout. Doing this after static analysis ensures no hidden import bugs are introduced.
+- **Status:** Initialized / Looked Over
+- **Priority:** High
+- **Goal:** Restructure directories to a `src/` layout. Doing this after static analysis ensures no hidden import bugs are introduced. Creates `sloppy.py` entry point.
 
 ### 4. `gui_decoupling_controller_20260302`
-**Priority:** High
-**Goal:** Extract the state machine and core lifecycle into a headless `app_controller.py`, leaving `gui_2.py` as a pure, immediate-mode view.
+- **Status:** Initialized / Looked Over
+- **Priority:** High
+- **Goal:** Extract the state machine and core lifecycle into a headless `app_controller.py`, leaving `gui_2.py` as a pure, immediate-mode view.
 
 ### 5. `hook_api_ui_state_verification_20260302`
-**Priority:** Medium
-**Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation.
+- **Status:** Initialized / Looked Over
+- **Priority:** Medium
+- **Goal:** Add a `/api/gui/state` GET endpoint. Wire UI state into `_settable_fields` to enable programmatic `live_gui` testing without user confirmation.
 
 ### 6. `robust_json_parsing_tech_lead_20260302`
-**Priority:** Medium
-**Goal:** Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
+- **Status:** Initialized / Looked Over
+- **Priority:** Medium
+- **Goal:** Implement an auto-retry loop that catches `JSONDecodeError` and feeds the traceback to the Tier 2 model for self-correction.
 
 ### 7. `concurrent_tier_source_tier_20260302`
-**Priority:** Low
-**Goal:** Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
+- **Status:** Initialized / Looked Over
+- **Priority:** Low
+- **Goal:** Replace global state with `threading.local()` or explicit context passing to guarantee thread-safe logging when multiple Tier 3 workers process tickets in parallel.
 
 ### 8. `test_suite_performance_and_flakiness_20260302`
-**Priority:** Low
-**Goal:** Replace `time.sleep()` with deterministic polling or `threading.Event()` triggers. Mark exceptionally heavy tests with `@pytest.mark.slow`.
+- **Status:** Initialized / Looked Over
+- **Priority:** Low
+- **Goal:** Replace `time.sleep()` with deterministic polling or `threading.Event()` triggers. Mark exceptionally heavy tests with `@pytest.mark.slow`.
 
 ### 9. `manual_ux_validation_20260302`
-**Priority:** Medium
-**Goal:** Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
-
-
+- **Status:** Initialized / Looked Over
+- **Priority:** Medium
+- **Goal:** Highly interactive human-in-the-loop track to review and adjust GUI UX, animations, popups, and layout structures based on slow-interval simulation feedback.
\ No newline at end of file
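
Reviewer note on track 6 (`robust_json_parsing_tech_lead_20260302`): the Goal describes an auto-retry loop that catches `JSONDecodeError` and feeds the traceback back to the Tier 2 model. The sketch below is one possible shape for that loop, not code from this repository. The names `parse_json_with_retry`, `ask_model`, and `max_attempts` are hypothetical, and `ask_model` stands in for whatever callable actually sends a prompt to the model and returns its raw text.

```python
import json
import traceback
from typing import Any, Callable


def parse_json_with_retry(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, raw reply text out
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Ask for JSON; on JSONDecodeError, re-ask with the traceback appended."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Feed the parser traceback back so the model can self-correct.
            current_prompt = (
                f"{prompt}\n\n"
                "Your previous reply was not valid JSON.\n"
                f"{traceback.format_exc()}\n"
                "Reply with valid JSON only."
            )
    # Give up after a bounded number of attempts instead of looping forever.
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Usage, with a stub in place of the real model call:

```python
replies = iter(['{"tickets": [', '{"tickets": []}'])
tickets = parse_json_with_retry(lambda p: next(replies), "Generate tickets as JSON.")
```

The bounded `max_attempts` is a design assumption: it keeps a persistently malformed reply from stalling the orchestrator, and the final `ValueError` chains the last decode error for debugging.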