Manual Slop: Verification & Simulation Framework

Detailed specification of the live GUI testing infrastructure, simulation lifecycle, and the mock provider strategy.


1. Live GUI Verification Infrastructure

To verify complex UI state and asynchronous interactions, Manual Slop employs a Live Verification strategy using the application's built-in API hooks.

--enable-test-hooks

When launched with this flag, the application starts a HookServer on port 8999, exposing its internal state over HTTP. This server is the foundation for all automated visual verification.
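The hook-server pattern can be sketched as follows: a background HTTP server exposes a /status endpoint, and a client polls it. This is a minimal illustration, not Manual Slop's actual server; the response shape (`{"ready": true}`) and the ephemeral port are assumptions, and the real server listens on port 8999.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HookHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the HookServer: answers /status with JSON."""
    def do_GET(self):
        if self.path == "/status":
            body = json.dumps({"ready": True}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet

# Port 0 asks the OS for a free port, so the sketch never collides with 8999.
server = HTTPServer(("127.0.0.1", 0), HookHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/status"
with urllib.request.urlopen(url) as resp:
    status = json.loads(resp.read())
server.shutdown()
```

The real HookServer exposes richer state than a single boolean, but the request/response loop is the same shape.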

The live_gui pytest Fixture

Defined in tests/conftest.py, this session-scoped fixture manages the lifecycle of the application under test:

  1. Startup: Spawns gui_2.py in a separate process with --enable-test-hooks.
  2. Telemetry: Polls /status until the hook server is ready.
  3. Isolation: Resets the AI session and clears comms logs between tests to prevent state pollution.
  4. Teardown: Robustly kills the process tree on completion or failure.

2. Simulation Lifecycle: The "Puppeteer" Pattern

Simulations (like tests/visual_sim_mma_v2.py) act as a "Puppeteer," driving the GUI through the ApiHookClient.

Phase 1: Environment Setup

  • Provider Mocking: The simulation sets the current_provider to gemini_cli and redirects the gcli_path to a mock script (e.g., tests/mock_gemini_cli.py).
  • Workspace Isolation: The files_base_dir is pointed to a temporary artifacts directory to prevent accidental modification of the host project.
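Phase 1 amounts to swapping a few settings before the run. A minimal sketch, assuming the setting names from the bullets above map onto a flat configuration; the real simulation presumably applies these through the hook client rather than a local dict:

```python
import tempfile
from pathlib import Path

# Throwaway artifacts directory: the GUI writes here, never into the project.
artifacts = Path(tempfile.mkdtemp(prefix="slop_sim_"))

sim_config = {
    "current_provider": "gemini_cli",              # route calls to the CLI adapter
    "gcli_path": "tests/mock_gemini_cli.py",       # ...which is actually the mock
    "files_base_dir": str(artifacts),              # workspace isolation
}

assert Path(sim_config["files_base_dir"]).is_dir()
```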

Phase 2: User Interaction Loop

The simulation replicates a human workflow by invoking client methods:

  1. client.set_value('mma_epic_input', '...'): Injects the epic description.
  2. client.click('btn_mma_plan_epic'): Triggers the orchestration engine.
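The client interface implied by those two calls can be sketched with an injectable transport, so the interaction sequence is exercisable offline. `HookClient`, `FakeTransport`, and the endpoint paths are hypothetical; the real ApiHookClient presumably issues HTTP requests to the hook server.

```python
class FakeTransport:
    """Records outgoing requests instead of hitting a live hook server."""
    def __init__(self):
        self.calls = []

    def post(self, endpoint, payload):
        self.calls.append((endpoint, payload))
        return {"ok": True}

class HookClient:
    def __init__(self, transport):
        self.transport = transport

    def set_value(self, widget_id, value):
        return self.transport.post("/set_value", {"id": widget_id, "value": value})

    def click(self, widget_id):
        return self.transport.post("/click", {"id": widget_id})

transport = FakeTransport()
client = HookClient(transport)
client.set_value("mma_epic_input", "Build a CLI todo app")  # inject the epic
client.click("btn_mma_plan_epic")                           # kick off planning
assert [endpoint for endpoint, _ in transport.calls] == ["/set_value", "/click"]
```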

Phase 3: Polling & Assertion

Because AI orchestration is asynchronous, simulations use a Polling with Multi-Modal Approval loop:

  • State Polling: The script polls client.get_mma_status() in a loop.
  • Auto-Approval: If the status indicates a pending tool or spawn request, the simulation automatically clicks the approval buttons (btn_approve_spawn, btn_approve_tool).
  • Verification: Once the expected state (e.g., "Mock Goal 1" appears in the track list) is detected, the simulation proceeds to the next phase or asserts success.
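The loop above can be sketched against a scripted fake client so it terminates deterministically. The button identifiers and `get_mma_status()` follow the text; the status-dict keys (`pending_spawn`, `pending_tool`, `tracks`), timeout, and polling interval are assumptions.

```python
import time

class ScriptedClient:
    """Replays a fixed sequence of MMA statuses and records approvals."""
    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.clicked = []

    def get_mma_status(self):
        return next(self._statuses)

    def click(self, button_id):
        self.clicked.append(button_id)

def drive_until(client, goal, timeout=5.0, interval=0.0):
    """Poll, auto-approve pending requests, and stop when the goal appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.get_mma_status()
        if status.get("pending_spawn"):
            client.click("btn_approve_spawn")
        if status.get("pending_tool"):
            client.click("btn_approve_tool")
        if goal in status.get("tracks", []):
            return True
        time.sleep(interval)
    return False

client = ScriptedClient([
    {"pending_spawn": True, "tracks": []},
    {"pending_tool": True, "tracks": []},
    {"tracks": ["Mock Goal 1"]},
])
assert drive_until(client, "Mock Goal 1")
assert client.clicked == ["btn_approve_spawn", "btn_approve_tool"]
```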

3. Mock Provider Strategy

To test the 4-Tier MMA hierarchy without incurring API costs or latency, Manual Slop uses a Script-Based Mocking strategy via the gemini_cli adapter.

tests/mock_gemini_cli.py

This script simulates the behavior of the gemini CLI by:

  1. Input Parsing: Reading the system prompt and user message from the environment/stdin.
  2. Deterministic Response: Returning pre-defined JSON payloads (e.g., track definitions, worker implementation scripts) based on keywords in the prompt.
  3. Tool Simulation: Mimicking function-call responses to trigger the "Execution Clutch" within the GUI.
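The keyword-routing idea behind such a mock can be sketched in a few lines. The keywords and JSON payload shapes here are invented for illustration and do not reflect the actual contents of tests/mock_gemini_cli.py:

```python
import json
import sys

# Canned responses keyed by keywords expected in the incoming prompt.
CANNED = {
    "plan": {"tracks": [{"name": "Mock Goal 1", "tickets": ["T1", "T2"]}]},
    "implement": {"script": "print('worker output')"},
}

def respond(prompt: str) -> str:
    """Return a deterministic JSON payload based on prompt keywords."""
    lowered = prompt.lower()
    for keyword, payload in CANNED.items():
        if keyword in lowered:
            return json.dumps(payload)
    return json.dumps({"error": "no canned response"})

if __name__ == "__main__":
    # The real mock reads the system prompt and user message from the
    # environment/stdin; here we just echo a response for piped input.
    print(respond(sys.stdin.read() if not sys.stdin.isatty() else ""))
```

Because the mapping is deterministic, simulations can assert exact track names (such as "Mock Goal 1") rather than fuzzy-matching live model output.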

4. Visual Verification Examples

Tests in this framework don't just check return values; they verify the rendered state of the application:

  • DAG Integrity: Verifying that active_tickets in the MMA status matches the expected task graph.
  • Stream Telemetry: Checking mma_streams to ensure that output from multiple tiers is correctly captured and displayed in the terminal.
  • Modal State: Asserting that the correct dialog (e.g., ConfirmDialog) is active during a pending tool call.
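Those three checks can be sketched as assertions over a status snapshot. The snapshot layout below (ticket-to-dependency mapping, per-tier stream strings, a single `active_dialog` field) is an assumed shape, not the real MMA schema:

```python
# Hypothetical snapshot as a hook endpoint might return it.
status = {
    "active_tickets": {"T1": ["T2"], "T2": []},   # ticket -> downstream tickets
    "mma_streams": {"tier_1": "planning...", "tier_2": "coding..."},
    "active_dialog": "ConfirmDialog",
}

expected_graph = {"T1": ["T2"], "T2": []}

# DAG integrity: live ticket graph matches the expected task graph.
assert status["active_tickets"] == expected_graph
# Stream telemetry: every tier produced captured, non-empty output.
assert all(status["mma_streams"].values())
# Modal state: the confirmation dialog is up during a pending tool call.
assert status["active_dialog"] == "ConfirmDialog"
```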

By combining live state hooks, scripted puppeteering, and deterministic mocks, Manual Slop verifies end-to-end GUI behavior with a rigor usually reserved for high-stakes embedded systems or complex graphics engines.