chore(conductor): Ensure plan complies with Surgical Spec Protocol

2026-03-02 22:22:52 -05:00
parent 9a2dff9d66
commit 6141a958d3

@@ -3,38 +3,66 @@
## Phase 1: Infrastructure & Paradigm Consolidation
- [ ] Task: Initialize MMA Environment `activate_skill mma-orchestrator`
- [ ] Task: Setup Artifact Isolation Directories
- [ ] Create `./tests/artifacts/` and `./tests/logs/` with appropriate `.gitignore`.
- [ ] WHERE: Project root
- [ ] WHAT: Create `./tests/artifacts/` and `./tests/logs/` directories. Add `.gitignore` to both containing `*` and `!.gitignore`.
- [ ] HOW: Use PowerShell `New-Item` and `Out-File`.
- [ ] SAFETY: Do not commit artifacts.
- [ ] Task: Migrate Manual Launchers to `live_gui` Fixture
- [ ] Refactor `tests/visual_mma_verification.py` to use the `live_gui` fixture.
- [ ] Audit `simulation/` and `tests/` for other manual subprocess launchers and refactor.
- [ ] WHERE: `tests/visual_mma_verification.py` (lines 15-40), `simulation/` scripts.
- [ ] WHAT: Replace `subprocess.Popen(["python", "gui_2.py"])` with the `live_gui` fixture injected into `pytest` test functions. Remove manual while-loop sleeps.
- [ ] HOW: Use standard pytest `def test_... (live_gui):` and rely on `ApiHookClient` with proper timeouts.
- [ ] SAFETY: Ensure the GUI subprocess is not orphaned if a test fails.
- [ ] Task: Conductor - User Manual Verification 'Phase 1: Infrastructure & Consolidation' (Protocol in workflow.md)
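The artifact isolation step in Phase 1 could be sketched as follows. The plan calls for PowerShell `New-Item` and `Out-File`; this is only a Python equivalent for illustration. The helper name `ensure_isolated_dirs` and its parameters are hypothetical and not part of the project.

```python
import tempfile
from pathlib import Path

# Ignore everything in the directory except the .gitignore itself,
# matching the `*` and `!.gitignore` contents named in the plan.
GITIGNORE_BODY = "*\n!.gitignore\n"


def ensure_isolated_dirs(root: Path, names=("tests/artifacts", "tests/logs")) -> list[Path]:
    """Hypothetical helper: create each directory and drop a .gitignore in it."""
    created = []
    for name in names:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)  # safe to re-run
        (directory / ".gitignore").write_text(GITIGNORE_BODY, encoding="utf-8")
        created.append(directory)
    return created


if __name__ == "__main__":
    # Demonstrate against a throwaway root instead of the real project.
    with tempfile.TemporaryDirectory() as tmp:
        for path in ensure_isolated_dirs(Path(tmp)):
            print(path, (path / ".gitignore").read_text(encoding="utf-8").split())
```

Because the `.gitignore` lives inside each directory, the directories stay tracked while everything generated in them stays out of commits.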
## Phase 2: Asyncio Stabilization & Logging
- [ ] Task: Audit and Fix `conftest.py` Loop Lifecycle
- [ ] Ensure all fixtures correctly handle loop cleanup and task cancellation.
- [ ] WHERE: `tests/conftest.py:20-50` (around `app_instance` fixture).
- [ ] WHAT: Ensure the `app._loop.stop()` cleanup safely cancels pending background tasks.
- [ ] HOW: Use `asyncio.all_tasks(loop)` and `task.cancel()` before stopping the loop in the fixture teardown.
- [ ] SAFETY: Thread-safety; only cancel tasks belonging to the app's loop.
- [ ] Task: Resolve `Event loop is closed` in Core Test Suite
- [ ] Update identified files to pass active loops and use `ThreadPoolExecutor`.
- [ ] WHERE: `tests/test_spawn_interception.py`, `tests/test_gui_streaming.py`.
- [ ] WHAT: Update blocking calls to use `ThreadPoolExecutor` or `asyncio.run_coroutine_threadsafe(..., loop)`.
- [ ] HOW: Pass the active loop from `app_instance` to the functions triggering the events.
- [ ] SAFETY: Prevent event queue deadlocks.
- [ ] Task: Implement Centralized Sectioned Logging Utility
- [ ] Route `VerificationLogger` output to `./tests/logs/`.
- [ ] WHERE: `tests/conftest.py:50-80` (`VerificationLogger`).
- [ ] WHAT: Route `VerificationLogger` output to `./tests/logs/` instead of `logs/test/`.
- [ ] HOW: Update `self.logs_dir = Path(f"tests/logs/{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}")`.
- [ ] SAFETY: No state impact.
- [ ] Task: Conductor - User Manual Verification 'Phase 2: Asyncio & Logging' (Protocol in workflow.md)
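The loop cleanup described for `conftest.py` could look roughly like the sketch below. It assumes the app's loop runs via `run_forever()` on a background thread, which the plan does not confirm. The names `_cancel_all` and `shutdown_loop` are hypothetical. Cancellation is performed on the loop's own thread, so only tasks belonging to that loop are touched.

```python
import asyncio
import threading


async def _cancel_all(loop: asyncio.AbstractEventLoop) -> int:
    """Runs on the loop itself: cancel every other task bound to this loop."""
    current = asyncio.current_task()
    pending = [t for t in asyncio.all_tasks(loop) if t is not current]
    for task in pending:
        task.cancel()
    # Wait until each task has actually processed its cancellation.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(pending)


def shutdown_loop(loop: asyncio.AbstractEventLoop,
                  thread: threading.Thread,
                  timeout: float = 5.0) -> int:
    """Hypothetical teardown helper: cancel tasks, stop the loop, then close it."""
    future = asyncio.run_coroutine_threadsafe(_cancel_all(loop), loop)
    cancelled = future.result(timeout)
    loop.call_soon_threadsafe(loop.stop)  # stop from the loop's own thread
    thread.join(timeout)
    loop.close()  # only valid once the loop is no longer running
    return cancelled
```

In a fixture, a call like `shutdown_loop(app._loop, loop_thread)` would sit after the `yield`, where `loop_thread` is whatever thread the fixture started the loop on.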
## Phase 3: Assertion Implementation & Legacy Cleanup
- [ ] Task: Replace `pytest.fail` with Functional Assertions
- [ ] Focus on `api_events`, `execution_engine`, `token_usage`, and `agent_capabilities`.
- [ ] Task: Replace `pytest.fail` with Functional Assertions (`api_events`, `execution_engine`)
- [ ] WHERE: `tests/test_api_events.py:40`, `tests/test_execution_engine.py:45`.
- [ ] WHAT: Implement actual `assert` statements testing the mock calls and status updates.
- [ ] HOW: Use `MagicMock.assert_called_with` and check `ticket.status == "completed"`.
- [ ] SAFETY: Isolate mocks.
- [ ] Task: Replace `pytest.fail` with Functional Assertions (`token_usage`, `agent_capabilities`)
- [ ] WHERE: `tests/test_token_usage.py`, `tests/test_agent_capabilities.py`.
- [ ] WHAT: Implement tests verifying the `usage_metadata` extraction and `list_models` output count.
- [ ] HOW: Check for 6 models (including `gemini-2.0-flash`) in `list_models` test.
- [ ] SAFETY: Isolate mocks.
- [ ] Task: Resolve Simulation Entry Count Regressions
- [ ] Fix entry count assertions in `test_context_sim_live` and align mocks.
- [ ] WHERE: `tests/test_extended_sims.py:20`.
- [ ] WHAT: Fix `AssertionError: Expected at least 2 entries, found 0`.
- [ ] HOW: Update simulation flow to properly wait for the `User` and `AI` entries to populate the GUI history before asserting.
- [ ] SAFETY: Use dynamic wait (`ApiHookClient.wait_for_event`) instead of static sleeps.
- [ ] Task: Remove Legacy `gui_legacy` Test Imports & File
- [ ] Refactor `tests/test_gui_events.py`, `test_gui_updates.py`, and `test_gui_diagnostics.py` to use `gui_2.py`.
- [ ] Delete `gui_legacy.py` from the project root.
- [ ] WHERE: `tests/test_gui_events.py`, `tests/test_gui_updates.py`, `tests/test_gui_diagnostics.py`, and project root.
- [ ] WHAT: Change `from gui_legacy import App` to `from gui_2 import App`. Fix any breaking UI locators. Then delete `gui_legacy.py`.
- [ ] HOW: String replacement and standard `os.remove`.
- [ ] SAFETY: Verify no remaining imports exist across the suite using `grep_search`.
- [ ] Task: Conductor - User Manual Verification 'Phase 3: Assertions & Legacy Cleanup' (Protocol in workflow.md)
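As a rough illustration of replacing a `pytest.fail` placeholder with functional assertions, the sketch below uses `MagicMock.assert_called_with` and a status check as the plan suggests. `Ticket`, `execute_ticket`, and the worker's `run` method are hypothetical stand-ins; the real execution engine's API is not shown in this plan.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class Ticket:
    """Hypothetical stand-in for the project's ticket object."""
    id: str
    status: str = "pending"


def execute_ticket(ticket: Ticket, worker) -> None:
    """Hypothetical engine step: run the ticket, then mark it completed."""
    worker.run(ticket.id)
    ticket.status = "completed"


def test_execute_ticket_marks_completed():
    worker = MagicMock()  # a fresh mock per test keeps state isolated
    ticket = Ticket(id="T-1")

    execute_ticket(ticket, worker)

    # Assert on the interaction and on the resulting state,
    # instead of failing unconditionally with pytest.fail.
    worker.run.assert_called_with("T-1")
    assert ticket.status == "completed"
```

Creating the mock inside the test body, rather than at module level, is one simple way to meet the "Isolate mocks" safety note.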
## Phase 4: Documentation & Final Verification
- [ ] Task: Model Switch Request
- [ ] Ask the user to run the `/model` command to switch to a high reasoning model for the documentation phase. Wait for their confirmation before proceeding.
- [ ] Task: Update Core Documentation & Workflow Contract
- [ ] Update `Readme.md` regarding the test framework and artifact/log locations.
- [ ] Update `docs/guide_simulations.md` to detail the `live_gui` fixture requirement and removal of manual simulation scripts.
- [ ] Update `conductor/workflow.md` to establish a strict "Structural Testing Contract": explicitly banning arbitrary `unittest.mock.patch` on core infra and mandating the use of centralized fixtures for all future Tier 2/Tier 3 agents.
- [ ] WHERE: `Readme.md`, `docs/guide_simulations.md`, `conductor/workflow.md`.
- [ ] WHAT: Document artifact locations, `live_gui` standard, and the strict "Structural Testing Contract".
- [ ] HOW: Markdown editing. Add sections explicitly banning arbitrary `unittest.mock.patch` on core infra for Tier 3 workers.
- [ ] SAFETY: Keep formatting clean.
- [ ] Task: Full Suite Validation & Warning Cleanup
- [ ] Task: Final Artifact Isolation Verification
- [ ] Task: Conductor - User Manual Verification 'Phase 4: Documentation & Final Verification' (Protocol in workflow.md)