manual_slop/conductor/archive/live_ux_test_20260223/spec.md

Specification: Human-Like UX Interaction Test

Overview

This track implements a robust, "human-like" interaction test suite for Manual Slop. The suite will simulate a real user's workflow—from project creation to complex AI discussions and history management—using the application's API hooks. It aims to verify the "Integrated Workspace" functionality, tool execution, and history persistence without requiring manual human input, while remaining slow enough for visual audit.

Scope

  • Standalone Interactive Test: A Python script (live_walkthrough.py) that drives the GUI through a full session, ending with an optional manual sign-off.
  • Automated Regression Test: A pytest integration (tests/test_live_workflow.py) that executes the same logic in a headless or automated fashion for CI.
  • Target Model: Google Gemini 2.5 Flash.
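Both entry points above can share one driver routine, so the standalone walkthrough and the pytest suite exercise identical logic. The sketch below is a minimal illustration under assumed names (`StubDriver`, `run_walkthrough`, the method signatures); the spec does not fix the driver API.

```python
import random
import time


class StubDriver:
    """Minimal stand-in for the real GUI driver; all method names here
    are assumptions, since the spec does not define the driver API."""

    def __init__(self):
        self.log = []

    def create_project(self, name):
        self.log.append(("create", name))

    def next_user_message(self, last_reply=None):
        # Dynamic messaging: the opener differs from follow-up turns.
        return "Refine the code further." if last_reply else "Scaffold a tiny console app."

    def send(self, msg):
        self.log.append(("send", msg))
        return "ok"  # stand-in for the AI reply


def run_walkthrough(driver, turns=5, min_delay=0.5):
    """Drive one session: scaffold the project, then hold a multi-turn
    discussion with tactile pacing between actions."""
    driver.create_project("ux-demo")
    reply = None
    for _ in range(turns):
        msg = driver.next_user_message(reply)
        reply = driver.send(msg)
        # Tactile delay: at least min_delay (0.5 s per spec) per action.
        time.sleep(min_delay + random.uniform(0.0, min_delay))
    return sum(1 for kind, _ in driver.log if kind == "send")
```

In the standalone script the same routine would run against the real GUI driver with `min_delay=0.5`; in CI it runs against a stub with `min_delay=0.0` so the regression test stays fast.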

Functional Requirements

  1. User Simulation:
    • Dynamic Messaging: The test agent will generate responses based on the AI's output to simulate a multi-turn conversation.
    • Tactile Delays: Short, random delays (minimum 0.5s) between actions to simulate reading and "typing" time.
    • Visual Feedback: Automatic scrolling of the discussion history and comms logs to keep the "live" action in view.
  2. Workflow Scenarios:
    • Project Scaffolding: Create a new project and initialize a tiny console-based Python program.
    • Discussion Loop: Engage in a ~5-turn conversation with the AI to refine the code.
    • Context Management: Verify that tool calls (filesystem, shell) are reflected correctly in the Comms and Tool Log tabs.
    • History Depth: Verify truncation limits and switching between named discussions.
  3. Session Management:
    • Tab Interaction: Programmatically switch between "Comms Log" and "Tool Log" tabs during operations.
    • Historical Audit: Use the "Load Session Log" feature to load a prior log file and verify "Tinted Mode" visibility.
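The user-simulation requirements above (dynamic messaging, tactile delays) could be sketched as follows; the heuristic, the canned follow-ups, and the function names are illustrative assumptions, not part of the spec.

```python
import random
import time

# Concise canned follow-ups keep "user" prompts short (see token-usage NFR).
FOLLOW_UPS = [
    "Can you add basic error handling?",
    "Rename the entry point to run(), please.",
    "Print a short usage message on startup.",
]


def tactile_pause(min_s=0.5, max_s=1.5):
    """Random delay simulating reading and typing time (minimum 0.5 s per spec)."""
    time.sleep(random.uniform(min_s, max_s))


def next_user_message(ai_reply: str) -> str:
    """Crude dynamic-messaging heuristic: react to errors in the AI's
    output, otherwise pick a concise follow-up to drive the next turn."""
    lowered = ai_reply.lower()
    if "error" in lowered or "traceback" in lowered:
        return "That raised an error; please fix it and try again."
    return random.choice(FOLLOW_UPS)
```

A real implementation would likely condition on richer signals (tool-call results, file diffs), but even this two-branch heuristic is enough to produce a non-scripted ~5-turn conversation.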

Non-Functional Requirements

  • Efficiency: Minimize token usage by using Gemini Flash and keeping the "User" prompts concise.
  • Observability: The standalone test must be clearly visible to a human observer, with state changes occurring at a "human-readable" pace.
Acceptance Criteria

  • live_walkthrough.py successfully completes a 5-turn discussion and signs off.
  • tests/test_live_workflow.py passes in the CI environment.
  • Prior session logs are loaded and visualized without crashing.
  • Thinking and Live indicators trigger correctly during simulated API calls.
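The first two criteria lend themselves to a mechanical check over the recorded session transcript. This helper is a hedged sketch: the `(role, text)` tuple shape and the `"SIGN_OFF"` sentinel are assumptions made for illustration, not a format the spec defines.

```python
def check_acceptance(transcript, turns_required=5):
    """Return True when the transcript contains at least `turns_required`
    user turns and ends with a sign-off marker."""
    user_turns = sum(1 for role, _ in transcript if role == "user")
    signed_off = bool(transcript) and transcript[-1][1] == "SIGN_OFF"
    return user_turns >= turns_required and signed_off
```

In the pytest integration this check would run unconditionally; in the standalone script the sign-off entry would only be appended after the human observer confirms the session looked correct.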

Out of Scope

  • Support for Anthropic API in this specific test track.
  • Stress testing high-concurrency tool calls.