Compare commits
6 Commits
821983065c...6837a28b61
| Author | SHA1 | Date |
|---|---|---|
| | 6837a28b61 | |
| | bf10231ad5 | |
| | f088bab7e0 | |
| | 1eeed31040 | |
| | e88336e97d | |
| | 95bf42aa37 | |
@@ -39,5 +39,11 @@
|
||||
- **Agent focus findings** (ai_client.py + conductors): No `current_tier` var; Tier 3 swaps callback but never stamps tier; Tier 2 doesn't swap at all; `_tool_log` is untagged tuple list
|
||||
- **Result**: 2 tracks committed (4f11d1e, c1a86e2). Bleed cleanup is active; agent focus depends on it.
|
||||
|
||||
- **More Tracks**: Initialized 'tech_debt_and_test_cleanup_20260302' and 'conductor_workflow_improvements_20260302' to harden TDD discipline, resolve test tech debt (false-positives, dupes), and mandate AST-based codebase auditing.
|
||||
- **Final Track**: Initialized 'architecture_boundary_hardening_20260302' to fix the GUI HITL bypass allowing direct AST mutations, patch token bloat in `mma_exec.py`, and implement cascading blockers in `dag_engine.py`.
|
||||
- **Testing Consolidation**: Initialized 'testing_consolidation_20260302' track to standardize simulation testing workflows around the pytest `live_gui` fixture and eliminate redundant `subprocess.Popen` wrappers.
|
||||
- **Dependency Order**: Added an explicit 'Track Dependency Order' execution guide to `TASKS.md` to ensure safe progression through the accumulated tech debt.
|
||||
---
63
TASKS.md
@@ -24,3 +24,66 @@
|
||||
- No Focus Agent selector widget in Operations Hub
|
||||
|
||||
**Scope:** Phase 1 (tier tagging) → Phase 2 (tool log dict migration) → Phase 3 (Focus Agent UI + filter). Per-tier token stats deferred to sub-track.
|
||||
|
||||
### `tech_debt_and_test_cleanup_20260302` (initialized)
|
||||
**Priority:** High
|
||||
**Depends on:** `feature_bleed_cleanup_20260302`
|
||||
**Track dir:** `conductor/tracks/tech_debt_and_test_cleanup_20260302/`
|
||||
|
||||
**Audit-confirmed gaps:**
|
||||
- 13 test files duplicate `app_instance` fixture instead of using `conftest.py`.
|
||||
- `test_ast_parser_curated.py` duplicates tests that already exist in `test_ast_parser.py`.
|
||||
- Multiple simulation tests silently pass with no assertions.
|
||||
- `gui_2.py` initializes 9 state variables in `__init__` that are never read.
|
||||
- `gui_2.py` has over 15 uncalled HTTP/background methods.
|
||||
|
||||
**Scope:** Phase 1 (Fixture deduplication) → Phase 2 (False-positive test fixing) → Phase 3 (Dead code excision in `gui_2.py`).
|
||||
|
||||
### `conductor_workflow_improvements_20260302` (initialized)
|
||||
**Priority:** High
|
||||
**Depends on:** None
|
||||
**Track dir:** `conductor/tracks/conductor_workflow_improvements_20260302/`
|
||||
|
||||
**Audit-confirmed gaps:**
|
||||
- Tier 2 skill lacks enforcement of AST pre-implementation scans to prevent duplicate state variables.
|
||||
- Tier 2 skill lacks explicit rejection of non-TDD execution.
|
||||
- Tier 3 skill does not strictly forbid implementing code without failing tests.
|
||||
- `workflow.md` lacks explicit warnings against zero-assertion tests and redundant `__init__` state.
|
||||
|
||||
**Scope:** Phase 1 (Update MMA Skill prompts) → Phase 2 (Update `workflow.md`).
|
||||
|
||||
### `architecture_boundary_hardening_20260302` (initialized)
|
||||
**Priority:** High
|
||||
**Depends on:** None
|
||||
**Track dir:** `conductor/tracks/architecture_boundary_hardening_20260302/`
|
||||
|
||||
**Audit-confirmed gaps:**
|
||||
- `ai_client.py` loops execute `set_file_slice` and `py_update_definition` instantly without checking `pre_tool_callback`, bypassing GUI approval.
|
||||
- `mma_exec.py` bypasses skeletonization for `mcp_client`, causing token bloat.
|
||||
- `dag_engine.py` does not cascade `blocked` states, causing orchestrator infinite loops.
|
||||
|
||||
**Scope:** Phase 1 (Meta-tooling token fix) → Phase 2 (Seal GUI HITL bypass) → Phase 3 (Fix DAG Engine cascading blocks).
|
||||
|
||||
### `testing_consolidation_20260302` (initialized)
|
||||
**Priority:** Medium
|
||||
**Depends on:** `tech_debt_and_test_cleanup_20260302`
|
||||
**Track dir:** `conductor/tracks/testing_consolidation_20260302/`
|
||||
|
||||
**Audit-confirmed gaps:**
|
||||
- `visual_mma_verification.py` manually runs `subprocess.Popen` instead of using the robust `live_gui` fixture.
|
||||
- Architectural logic is duplicated between the `tests/` and `simulation/` directories, which fragments the test setup.
|
||||
|
||||
**Scope:** Phase 1 (Migrate manual launchers to fixtures) → Phase 2 (Consolidate simulation scripts).
|
||||
|
||||
---
|
||||
|
||||
## Track Dependency Order (Execution Guide)
|
||||
Execute the tracks in the following order:
|
||||
1. `feature_bleed_cleanup_20260302` (Base cleanup of GUI structure)
|
||||
2. `mma_agent_focus_ux_20260302` (Depends on feature bleed cleanup Phase 1)
|
||||
3. `architecture_boundary_hardening_20260302` (Fixes critical HITL & Token leaks; independent but foundational)
|
||||
4. `tech_debt_and_test_cleanup_20260302` (Re-establishes testing foundation; run after feature tracks)
|
||||
5. `testing_consolidation_20260302` (Refactors testing methodology; depends on tech debt cleanup)
|
||||
6. `conductor_workflow_improvements_20260302` (Meta-level updates to skills/workflow docs; can be run anytime)
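
The ordering above can be checked mechanically against the declared `Depends on:` entries. The following is a minimal sketch: the helper name `order_violations` is hypothetical, and the empty dependency list for `feature_bleed_cleanup_20260302` is an assumption, since its entry is not shown in this diff.

```python
# Declared dependencies (track -> prerequisites), copied from the
# "Depends on:" entries in TASKS.md. The empty list for the feature
# bleed track is an assumption; its entry is not part of this diff.
DEPENDS_ON = {
    "feature_bleed_cleanup_20260302": [],
    "mma_agent_focus_ux_20260302": ["feature_bleed_cleanup_20260302"],
    "architecture_boundary_hardening_20260302": [],
    "tech_debt_and_test_cleanup_20260302": ["feature_bleed_cleanup_20260302"],
    "testing_consolidation_20260302": ["tech_debt_and_test_cleanup_20260302"],
    "conductor_workflow_improvements_20260302": [],
}

# Recommended execution order from the guide.
EXECUTION_ORDER = [
    "feature_bleed_cleanup_20260302",
    "mma_agent_focus_ux_20260302",
    "architecture_boundary_hardening_20260302",
    "tech_debt_and_test_cleanup_20260302",
    "testing_consolidation_20260302",
    "conductor_workflow_improvements_20260302",
]


def order_violations(order, depends_on):
    """Return (track, prerequisite) pairs where the prerequisite is missing or runs later."""
    position = {track: index for index, track in enumerate(order)}
    violations = []
    for track, prerequisites in depends_on.items():
        for prerequisite in prerequisites:
            if (
                track not in position
                or prerequisite not in position
                or position[prerequisite] > position[track]
            ):
                violations.append((track, prerequisite))
    return violations


if __name__ == "__main__":
    print(order_violations(EXECUTION_ORDER, DEPENDS_ON))
```

An empty result means every track appears after all of its prerequisites.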
|
||||
@@ -0,0 +1,5 @@
|
||||
# Track architecture_boundary_hardening_20260302 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"track_id": "architecture_boundary_hardening_20260302",
|
||||
"type": "fix",
|
||||
"status": "new",
|
||||
"created_at": "2026-03-02T00:00:00Z",
|
||||
"updated_at": "2026-03-02T00:00:00Z",
|
||||
"description": "Fix boundary leak where the native MCP file mutation tools bypass the manual_slop GUI approval dialog, and patch token leaks in the meta-tooling scripts."
|
||||
}
|
||||
@@ -0,0 +1,23 @@
|
||||
# Implementation Plan: Architecture Boundary Hardening
|
||||
|
||||
Architecture reference: [docs/guide_architecture.md](../../../docs/guide_architecture.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Patch Context Amnesia Leak (Meta-Tooling)
|
||||
Focus: Stop `mma_exec.py` from injecting massive full-text dependencies.
|
||||
|
||||
- [ ] Task 1.1: In `scripts/mma_exec.py`, completely remove the `UNFETTERED_MODULES` constant and its associated `if dep in UNFETTERED_MODULES:` check. Ensure all imported local dependencies strictly use `generate_skeleton()`.
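
To illustrate what skeletonizing a dependency means, here is a minimal sketch built on the standard `ast` module. The function `skeletonize` is a hypothetical stand-in; it is not the project's `generate_skeleton()`, whose actual behavior is not shown in this diff.

```python
import ast


def skeletonize(source: str) -> str:
    """Return `source` with each function body reduced to its docstring (if any) plus `...`.

    Hypothetical illustration only; requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node, clean=False) is not None:
                # Keep the original docstring statement as the first line.
                new_body.append(node.body[0])
            # Replace the implementation with a literal ellipsis.
            new_body.append(ast.Expr(value=ast.Constant(value=Ellipsis)))
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(skeletonize("def f(x):\n    '''Add one.'''\n    return x + 1\n"))
```

Signatures and docstrings survive while implementation lines are dropped, which is why a skeleton costs far fewer tokens than the full module text.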
|
||||
|
||||
## Phase 2: Seal the HITL Bypass (Application Core)
|
||||
Focus: Ensure native MCP mutating tools cannot execute without user approval in the `manual_slop` application.
|
||||
|
||||
- [ ] Task 2.1: In `mcp_client.py`, define a new constant set `MUTATING_TOOLS = {"set_file_slice", "py_update_definition", "py_set_signature", "py_set_var_declaration"}`. (Note: `write_file` does not currently appear in the tool list; include it in the set if it turns out to be present or is added later.)
|
||||
- [ ] Task 2.2: In `ai_client.py`'s provider loops (`_send_gemini`, `_send_gemini_cli`, `_send_anthropic`, `_send_deepseek`), update the tool execution logic. If `name in mcp_client.MUTATING_TOOLS`, it MUST trigger the `pre_tool_callback` (or a variation of it) to ask for user approval before calling `mcp_client.dispatch`.
|
||||
- [ ] Task 2.3: In `gui_2.py`, ensure the UI rendering for the pending tool approval handles the AST mutations gracefully (e.g. showing the `new_content` payload instead of a PowerShell script).
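
The gate described in Tasks 2.1 and 2.2 could look roughly like the sketch below. The helper `execute_tool`, the `(name, args) -> bool` callback signature, and the deny-by-default behavior when no callback is set are all assumptions made for illustration; the real `pre_tool_callback` and `mcp_client.dispatch` signatures are not shown in this diff.

```python
from typing import Any, Callable, Dict, Optional

# Tool names taken from Task 2.1.
MUTATING_TOOLS = {
    "set_file_slice",
    "py_update_definition",
    "py_set_signature",
    "py_set_var_declaration",
}


def execute_tool(
    name: str,
    args: Dict[str, Any],
    dispatch: Callable[[str, Dict[str, Any]], str],
    pre_tool_callback: Optional[Callable[[str, Dict[str, Any]], bool]],
) -> str:
    """Dispatch a tool call, asking for approval first when the tool mutates files.

    Assumption: the callback returns True when the user approves.
    """
    if name in MUTATING_TOOLS:
        # Deny by default: a mutating tool never runs without an explicit approval.
        if pre_tool_callback is None or not pre_tool_callback(name, args):
            return f"REJECTED: {name} was not approved by the user"
    return dispatch(name, args)
```

Routing every provider loop through one helper like this would keep the approval check from drifting between `_send_gemini`, `_send_gemini_cli`, `_send_anthropic`, and `_send_deepseek`.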
|
||||
|
||||
## Phase 3: DAG Engine Cascading Blocks (Application Core)
|
||||
Focus: Prevent the orchestrator from stalling indefinitely when Tier 3 workers fail repeatedly.
|
||||
|
||||
- [ ] Task 3.1: In `dag_engine.py`, add a `cascade_blocks()` method to `TrackDAG`. This method should iterate through all `todo` tickets and if any of their dependencies are `blocked`, mark the ticket itself as `blocked`.
|
||||
- [ ] Task 3.2: In `multi_agent_conductor.py`, update `ConductorEngine.run()`. Before calling `self.engine.tick()`, call `self.track_dag.cascade_blocks()` (or equivalent) so that blocked states propagate cleanly, allowing the `all_done` or block detection logic to exit the while loop correctly.
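
A minimal sketch of the cascading logic from Task 3.1 follows. The `Ticket` fields (`id`, `status`, `depends_on`) and the `TrackDAG` constructor are assumptions made for illustration; the real classes in `dag_engine.py` may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ticket:
    """Hypothetical ticket shape; the real fields may differ."""
    id: str
    status: str = "todo"
    depends_on: List[str] = field(default_factory=list)


class TrackDAG:
    """Hypothetical stand-in for the real TrackDAG, holding tickets by id."""

    def __init__(self, tickets: List[Ticket]) -> None:
        self.tickets: Dict[str, Ticket] = {t.id: t for t in tickets}

    def cascade_blocks(self) -> int:
        """Mark `todo` tickets as `blocked` when any dependency is blocked.

        Repeats until nothing changes so that transitive dependents are
        caught regardless of iteration order. Returns the number of
        tickets newly blocked.
        """
        newly_blocked = 0
        changed = True
        while changed:
            changed = False
            for ticket in self.tickets.values():
                if ticket.status != "todo":
                    continue
                if any(
                    self.tickets[dep].status == "blocked"
                    for dep in ticket.depends_on
                    if dep in self.tickets
                ):
                    ticket.status = "blocked"
                    newly_blocked += 1
                    changed = True
        return newly_blocked
```

The fixed-point loop matters: a single pass could miss a ticket whose dependency only becomes `blocked` later in the same pass.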
|
||||
@@ -0,0 +1,30 @@
|
||||
# Track Specification: Architecture Boundary Hardening
|
||||
|
||||
## Overview
|
||||
The `manual_slop` project serves dual roles: it is an end-user GUI application built around Human-In-The-Loop (HITL) AI orchestration, and it is the sandbox for the AI meta-tooling (`mma_exec.py`, `tool_call.py`) being used to develop it.
|
||||
Because `mcp_client.py` is shared between both environments to provide robust code investigation tools, a critical HITL bypass has emerged. Additionally, the meta-tooling scripts are bleeding tokens.
|
||||
|
||||
## Current State Audit
|
||||
|
||||
1. **HITL Bypass in `manual_slop` Application**:
|
||||
- Location: `ai_client.py` inside `_send_gemini`, `_send_gemini_cli`, `_send_anthropic`, and `_send_deepseek`.
|
||||
- Issue: The `pre_tool_callback` is checked only when `name == TOOL_NAME` (which is `run_powershell`).
|
||||
- If an AI agent running inside the GUI calls `set_file_slice` or `py_update_definition`, the code falls through to `elif name in mcp_client.TOOL_NAMES:` and dispatches it immediately, silently mutating the user's codebase without approval.
|
||||
- *Requirement*: The application strictly requires step-by-step deterministic user approval for *any* filesystem modification, whether by script or direct AST manipulation.
|
||||
|
||||
2. **Token Firewall Leak in Meta-Tooling (`mma_exec.py`)**:
|
||||
- Location: `scripts/mma_exec.py:101`.
|
||||
- Issue: `UNFETTERED_MODULES` hardcodes `['mcp_client', 'project_manager', 'events', 'aggregate']`. If a worker targets a file that imports `mcp_client`, the script injects the full `mcp_client.py` (~450 lines) into the context instead of its skeleton, blowing out the token budget and destroying Context Amnesia.
|
||||
|
||||
3. **DAG Engine Blocking Stalls (`dag_engine.py`)**:
|
||||
- Location: `dag_engine.py` -> `get_ready_tasks()`
|
||||
- Issue: `get_ready_tasks` requires all dependencies to be explicitly `completed`. If a task is marked `blocked` (e.g. after max retries in the ConductorEngine), its dependents stay `todo` forever. The `ConductorEngine.run()` loop has no logic to handle this cleanly, causing an infinite stall.
|
||||
|
||||
## Desired State
|
||||
- Any mutating tool from `mcp_client.py` (`set_file_slice`, `py_update_definition`, `py_set_signature`, `py_set_var_declaration`, `write_file`) must trigger a user approval dialogue, just like `run_powershell`.
|
||||
- The `UNFETTERED_MODULES` list must be completely removed from `mma_exec.py` so all dependencies are reliably skeletonized.
|
||||
- The `dag_engine.py` must cascade `blocked` status to downstream tasks so the track halts cleanly instead of deadlocking.
|
||||
|
||||
## Technical Constraints
|
||||
- The UI modal must be updated or a new `pre_mutation_callback` must be introduced to handle showing the proposed AST edit vs the proposed script.
|
||||
- Keep the boundary clear: changes in `ai_client.py` affect the user's `manual_slop` application experience. Changes in `mma_exec.py` affect *our* meta-tooling environment.
|
||||
@@ -0,0 +1,5 @@
|
||||
# Track conductor_workflow_improvements_20260302 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"track_id": "conductor_workflow_improvements_20260302",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-03-02T00:00:00Z",
|
||||
"updated_at": "2026-03-02T00:00:00Z",
|
||||
"description": "Improve MMA Skill prompts and Conductor workflow docs to enforce TDD, prevent feature bleed, and force mandatory pre-implementation architecture audits."
|
||||
}
|
||||
@@ -0,0 +1,17 @@
|
||||
# Implementation Plan: Conductor Workflow Improvements
|
||||
|
||||
Architecture reference: [docs/guide_mma.md](../../../docs/guide_mma.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Skill Document Hardening
|
||||
Focus: Update the agent skill prompts to enforce strict discipline.
|
||||
|
||||
- [ ] Task 1.1: Update `.gemini/skills/mma-tier2-tech-lead/SKILL.md`. Add a new section `## Anti-Entropy Protocol` requiring the Tech Lead to: (1) Use `py_get_code_outline` on the target class's `__init__` to check for redundant state before adding new variables; (2) Ensure failing tests are written and executed *before* delegating implementation to Tier 3.
|
||||
- [ ] Task 1.2: Update `.gemini/skills/mma-tier3-worker/SKILL.md`. Add an explicit directive in the `## Responsibilities` section: "You MUST write a failing test and verify it fails (the Red phase) BEFORE writing any implementation code. Do NOT write tests that contain only `pass` or lack assertions."
|
||||
|
||||
## Phase 2: Workflow Documentation Updates
|
||||
Focus: Add safeguards to the global Conductor workflow.
|
||||
|
||||
- [ ] Task 2.1: Update `conductor/workflow.md`. In the `High-Signal Research Phase` section, add a requirement to audit class initializers (`__init__`) for existing, unused, or duplicate state variables before adding new ones.
|
||||
- [ ] Task 2.2: Update `conductor/workflow.md`. In the `Test-Driven Development` section, explicitly ban zero-assertion tests and state that a test is only valid if it contains assertions that test the behavioral change.
|
||||
@@ -0,0 +1,19 @@
|
||||
# Track Specification: Conductor Workflow Improvements
|
||||
|
||||
## Overview
|
||||
Recent Tier 2 track implementations have resulted in feature bleed, redundant code, unread state variables, and degradation of TDD discipline (e.g., zero-assertion tests).
|
||||
This track updates the Conductor documentation (`workflow.md`) and the Gemini skills for Tiers 2 and 3 to hard-enforce TDD, prevent hallucinated "mock" implementations, and enforce strict codebase auditing before writing code.
|
||||
|
||||
## Current State Audit
|
||||
1. **Tier 2 Tech Lead Skill (`.gemini/skills/mma-tier2-tech-lead/SKILL.md`)**: Lacks explicit instructions forbidding the merging of code without verified failing test runs. Also lacks mandatory instructions to use `py_get_code_outline` or AST scans specifically to prevent duplicate state variables.
|
||||
2. **Tier 3 Worker Skill (`.gemini/skills/mma-tier3-worker/SKILL.md`)**: Mentions TDD, but does not explicitly instruct the agent to refuse to write implementation code if failing tests haven't been written and executed first.
|
||||
3. **Workflow Document (`conductor/workflow.md`)**: Mentions TDD and a Research-First Protocol, but lacks a strict "Zero-Assertion Prevention" rule and doesn't emphasize AST analysis of `__init__` functions when modifying state.
|
||||
|
||||
## Desired State
|
||||
- The `mma-tier2-tech-lead` skill forces the Tech Lead to execute tests and verify failure *before* delegating the implementation. It also mandates an explicit check of `__init__` for existing variables before adding new ones.
|
||||
- The `mma-tier3-worker` skill includes an explicit safeguard: "Do NOT write implementation code if you have not first written and executed a failing test for it."
|
||||
- The `conductor/workflow.md` explicitly calls out the danger of zero-assertion tests and requires AST checks for redundant state.
|
||||
|
||||
## Technical Constraints
|
||||
- The `.gemini/skills/` documents are the ultimate source of truth for agent behavior and must be updated directly.
|
||||
- The updates should be clear, commanding, and reference the specific errors encountered (e.g., "feature bleed", "zero-assertion tests").
|
||||
@@ -0,0 +1,5 @@
|
||||
# Track tech_debt_and_test_cleanup_20260302 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"track_id": "tech_debt_and_test_cleanup_20260302",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-03-02T00:00:00Z",
|
||||
"updated_at": "2026-03-02T00:00:00Z",
|
||||
"description": "Tech debt cleanup: Centralize duplicate app_instance fixtures, fix zero-assertion tests, and remove dead unused variables/methods from gui_2.py."
|
||||
}
|
||||
26
conductor/tracks/tech_debt_and_test_cleanup_20260302/plan.md
Normal file
@@ -0,0 +1,26 @@
|
||||
# Implementation Plan: Tech Debt & Test Discipline Cleanup
|
||||
|
||||
Architecture reference: [docs/guide_architecture.md](../../../docs/guide_architecture.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Test Suite Deduplication and Centralization
|
||||
Focus: Move `app_instance` and `mock_app` to `tests/conftest.py` and remove them from individual test files.
|
||||
|
||||
- [ ] Task 1.1: Add `app_instance` and `mock_app` fixtures to `tests/conftest.py`. Ensure they properly yield the App instance and tear down.
|
||||
- [ ] Task 1.2: Remove local `app_instance` and `mock_app` fixtures from all 13 identified test files. (Tier 3 Worker string replacement / rewrite).
|
||||
- [ ] Task 1.3: Delete `tests/test_ast_parser_curated.py` if its contents are fully duplicated in `test_ast_parser.py`, or merge any missing tests.
|
||||
- [ ] Task 1.4: Run the test suite (`pytest`) to ensure no fixture resolution errors.
|
||||
|
||||
## Phase 2: False-Positive Test Exposure
|
||||
Focus: Make zero-assertion tests fail loudly so they can be properly tracked.
|
||||
|
||||
- [ ] Task 2.1: Add `pytest.fail("TODO: Implement assertions")` to `test_workflow_sim.py`, `test_sim_ai_settings.py`, `test_sim_tools.py`, `test_api_events.py` and any other tests identified as having zero assertions or just a `pass`.
|
||||
- [ ] Task 2.2: Add `@pytest.mark.skip(reason="TODO: Implement assertions")` to the visual simulation tests that only have a `pass` block.
|
||||
|
||||
## Phase 3: Dead Code Excision in `gui_2.py`
|
||||
Focus: Remove unused state variables and dead HTTP/background methods.
|
||||
|
||||
- [ ] Task 3.1: In `gui_2.py` `__init__`, remove the initialization of `_role`, `_ticket_id`, `_uid`, `_base_dir`, `last_md_path`, `_scroll_tool_calls_to_bottom`, `_token_budget_limit`, `_token_budget_pct`, `_token_budget_current`.
|
||||
- [ ] Task 3.2: Delete the following unused method definitions from `gui_2.py`: `do_fetch`, `do_post`, `fetch_stats`, `health`, `get_session`, `list_sessions`, `delete_session`, `status`, `get_context`, `_bg_task`, `_push_t1_usage`, `_load_fonts`, `run_prune`, `_parse_history_entries`, `confirm_action`, `pending_actions`, `token_stats`.
|
||||
- [ ] Task 3.3: Run `gui_2.py --headless` to verify the application still initializes properly without these variables/methods.
|
||||
24
conductor/tracks/tech_debt_and_test_cleanup_20260302/spec.md
Normal file
@@ -0,0 +1,24 @@
|
||||
# Track Specification: Tech Debt & Test Discipline Cleanup
|
||||
|
||||
## Overview
|
||||
Due to rapid iterative development and feature bleed across multiple Tier 2-led tracks, significant tech debt has accumulated in both the testing suite and `gui_2.py`.
|
||||
This track will clean up test fixtures, enforce test assertion integrity, and remove dead codebase remnants.
|
||||
|
||||
## Current State Audit
|
||||
1. **Duplicate Fixtures**: The `app_instance` fixture is duplicated across 13 test files (e.g. `test_gui_events.py`, `test_process_pending_gui_tasks.py`). `mock_app` is similarly duplicated. They should live in `tests/conftest.py`.
|
||||
2. **Duplicate Tests**: `test_ast_parser_get_curated_view` exists in both `test_ast_parser.py` and `test_ast_parser_curated.py`.
|
||||
3. **Zero-Assertion Tests**: Many simulation tests and API event tests (e.g., `test_setup_new_project`, `test_sim_ai_settings.py`, `visual_sim_gui_ux.py`) contain only `pass` or execute commands without assertions, so they always pass and give a false impression of coverage.
|
||||
4. **Dead State/Methods in gui_2.py**:
|
||||
- `__init__` in `gui_2.py` assigns state variables that are never read: `_role`, `_ticket_id`, `_uid`, `_base_dir`, `last_md_path`, `_scroll_tool_calls_to_bottom`, `_token_budget_limit`, `_token_budget_pct`, `_token_budget_current`.
|
||||
- `gui_2.py` has uncalled boilerplate methods (FastAPI leftovers or old logic): `do_fetch`, `do_post`, `fetch_stats`, `health`, `get_session`, `list_sessions`, `delete_session`, `status`, `get_context`, `_bg_task`, `_push_t1_usage`, `_load_fonts`, `run_prune`, `_parse_history_entries`, `confirm_action`, `pending_actions`, `token_stats`.
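
An audit like the one above can be approximated with the standard `ast` module. This is a minimal sketch with a hypothetical function name, and it only sees direct `self.<name>` reads inside the class body; reads through `getattr`, serialization, or other modules are invisible to it and need a separate check.

```python
import ast


def _is_self_attr(node: ast.AST, ctx_type: type) -> bool:
    """True for `self.<attr>` used in the given context (Store or Load)."""
    return (
        isinstance(node, ast.Attribute)
        and isinstance(node.ctx, ctx_type)
        and isinstance(node.value, ast.Name)
        and node.value.id == "self"
    )


def unread_init_attributes(source: str, class_name: str) -> list[str]:
    """Return attributes assigned in `__init__` but never read within the class."""
    tree = ast.parse(source)
    cls = next(
        (n for n in ast.walk(tree)
         if isinstance(n, ast.ClassDef) and n.name == class_name),
        None,
    )
    if cls is None:
        return []
    assigned = set()
    for item in cls.body:
        if isinstance(item, ast.FunctionDef) and item.name == "__init__":
            for sub in ast.walk(item):
                if _is_self_attr(sub, ast.Store):
                    assigned.add(sub.attr)
    # Collect every `self.<attr>` read anywhere in the class, including __init__.
    read = {sub.attr for sub in ast.walk(cls) if _is_self_attr(sub, ast.Load)}
    return sorted(assigned - read)
```

Treat the output as a list of removal candidates to confirm, not as proof that an attribute is dead.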
|
||||
|
||||
## Desired State
|
||||
- `app_instance` and `mock_app` fixtures centralized in `conftest.py`.
|
||||
- Duplicate test files/functions removed.
|
||||
- Tests without assertions marked with `pytest.fail("TODO: Implement assertions")` so they correctly show as incomplete.
|
||||
- Unused variables and methods completely removed from `gui_2.py`.
|
||||
|
||||
## Technical Constraints
|
||||
- The `app_instance` fixture requires the `live_gui` logic or an isolated `App` instance setup. Once it lives in `conftest.py`, it must not leak state between tests.
|
||||
- Ensure removal of unused variables in `gui_2.py` does not break any reflection/serialization if they are coincidentally used by config savers (though AST confirmed they are not read locally).
|
||||
- Edits to `gui_2.py` must adhere to its 1-space indentation.
|
||||
5
conductor/tracks/testing_consolidation_20260302/index.md
Normal file
@@ -0,0 +1,5 @@
|
||||
# Track testing_consolidation_20260302 Context
|
||||
|
||||
- [Specification](./spec.md)
|
||||
- [Implementation Plan](./plan.md)
|
||||
- [Metadata](./metadata.json)
|
||||
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"track_id": "testing_consolidation_20260302",
|
||||
"type": "chore",
|
||||
"status": "new",
|
||||
"created_at": "2026-03-02T00:00:00Z",
|
||||
"updated_at": "2026-03-02T00:00:00Z",
|
||||
"description": "Consolidate divergent simulation tests to uniformly use the pytest live_gui fixture and remove redundant subprocess launcher scripts."
|
||||
}
|
||||
16
conductor/tracks/testing_consolidation_20260302/plan.md
Normal file
@@ -0,0 +1,16 @@
|
||||
# Implementation Plan: Testing & Simulation Consolidation
|
||||
|
||||
Architecture reference: [docs/guide_simulations.md](../../../docs/guide_simulations.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Migrate Manual Launchers to Pytest Fixtures
|
||||
Focus: Remove `subprocess.Popen` from visual verification scripts and convert them to proper pytest tests.
|
||||
|
||||
- [ ] Task 1.1: Refactor `tests/visual_mma_verification.py` to be a standard pytest function: `def test_visual_mma_verification(live_gui):`. Remove all `subprocess.Popen` and directory changing logic.
|
||||
- [ ] Task 1.2: Audit `tests/` for any other file containing `subprocess.Popen` pointing to `gui_2.py` and refactor them similarly.
|
||||
|
||||
## Phase 2: Consolidate Simulation Scripts
|
||||
Focus: Ensure the `simulation/` directory integrates cleanly with the pytest framework or serves a distinct non-testing purpose.
|
||||
|
||||
- [ ] Task 2.1: Audit the `simulation/` directory. If scripts there are just tests in disguise, move them into `tests/` and wrap them in the `live_gui` fixture. If they are intended as standalone interactive demos, clearly document their purpose and ensure they don't duplicate `conftest.py` logic unnecessarily.
|
||||
16
conductor/tracks/testing_consolidation_20260302/spec.md
Normal file
@@ -0,0 +1,16 @@
|
||||
# Track Specification: Testing & Simulation Consolidation
|
||||
|
||||
## Overview
|
||||
Currently, the codebase has redundant testing paradigms. Some tests (`tests/visual_sim_gui_ux.py`) properly use the `live_gui` fixture managed by `pytest` in `conftest.py`. However, other visual verification scripts (like `tests/visual_mma_verification.py` and potentially files in `simulation/`) reinvent the wheel by manually opening subprocesses with `subprocess.Popen` to launch the GUI. This fragmentation causes tech debt and test flakiness.
|
||||
|
||||
## Current State Audit
|
||||
1. **Redundant Subprocess Launching**: `tests/visual_mma_verification.py` manually spawns `gui_2.py` via `subprocess.Popen` instead of using the `conftest.py` `live_gui` fixture.
|
||||
2. **Simulation Redundancy**: The `simulation/` directory contains `sim_base.py`, `workflow_sim.py`, etc., that also use `ApiHookClient` but may be reinventing pytest workflows outside of the standard test runner.
|
||||
|
||||
## Desired State
|
||||
- All "visual" or "integration" testing scripts that interact with the live GUI via `ApiHookClient` MUST use the `live_gui` pytest fixture and be executed via `pytest`.
|
||||
- Any standalone script in `tests/` that launches `gui_2.py` via `subprocess.Popen` must be rewritten as a standard pytest function taking the `live_gui` argument.
|
||||
|
||||
## Technical Constraints
|
||||
- No tests should manually spawn `gui_2.py`. They must rely on `conftest.py`.
|
||||
- Keep testing framework unified strictly under `pytest`.
|
||||