Testing Guide
Overview
Manual Slop has 251 test files in tests/ covering every subsystem. The test infrastructure is designed around four principles:
- No real I/O during tests: every test gets a sandboxed workspace via the isolate_workspace autouse fixture.
- No real AI calls: tests use mock providers, reset session state, and never hit the network.
- GUI tests launch a real app: the live_gui session fixture starts sloppy.py --enable-test-hooks so integration tests can drive the actual app via the Hook API.
- Tests are categorized by marker (unit, integration, strict, clean_install, docker) so CI can opt in to expensive tests.
This guide is the canonical reference for how the test suite is structured and how to add new tests.
Test File Layout
tests/
├── conftest.py # Session-wide fixtures (live_gui, isolate_workspace, etc.); the canonical source for all fixtures
├── test_*.py # 251 test files, named `test_<topic>_<aspect>.py`
├── *_sim.py # Integration tests using the live_gui fixture
├── test_clean_install.py # Opt-in: clones the repo to tmp and verifies hooks
├── test_docker_build.py # Opt-in: builds and runs the Docker image
├── test_arch_boundary_phase1.py # Architectural boundary tests
├── test_enforce_no_real_toml.py # Meta-test for the enforcer fixture
├── artifacts/ # Git-ignored; test output
├── logs/ # Git-ignored; live_gui log files
└── mock_concurrent_mma.py # Mock providers for MMA tests
Naming conventions:
- test_*.py: pytest collection
- *_sim.py: integration test (uses live_gui)
- *_e2e.py: end-to-end test (real processes, opt-in via env var)
- test_<area>_<aspect>.py: single aspect of an area, e.g., test_ai_client_cli.py
The conftest.py Fixtures
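The tests/conftest.py file defines 7 fixtures plus one plain helper function (kill_process_tree). They are grouped below by kind: autouse first, then opt-in function-scoped, then session-scoped. This is a reading order, not pytest's setup order; pytest instantiates session-scoped fixtures before function-scoped ones.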
Autouse Fixtures (Run Before Every Test)
isolate_workspace (line 70)
Purpose: Give every test a fresh, isolated workspace so it cannot pollute the user's real manual_slop.toml, presets.toml, etc.
Mechanism:
- Creates a temp directory via tmp_path_factory.mktemp("isolated_workspace")
- Writes a fresh config.toml to the temp dir
- Sets the SLOP_CONFIG, SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, and SLOP_GLOBAL_WORKSPACE_PROFILES env vars to point at the temp dir
- The app reads these env vars on startup; the test sees an isolated world
Verification: python scripts/check_test_toml_paths.py exits 0 (no test references real TOMLs).
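The mechanism above can be sketched with the standard library alone. This is not the code in tests/conftest.py: the helper names (make_workspace, build_isolated_env, apply_env, restore_env) and the per-variable file names are assumptions made for illustration. Only the SLOP_* variable names and config.toml come from this guide.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Env vars named in this guide. The file names are hypothetical placeholders,
# not the project's real layout.
SLOP_ENV_FILES = {
    "SLOP_CONFIG": "config.toml",
    "SLOP_GLOBAL_PRESETS": "presets.toml",
    "SLOP_GLOBAL_TOOL_PRESETS": "tool_presets.toml",
    "SLOP_GLOBAL_PERSONAS": "personas.toml",
    "SLOP_GLOBAL_WORKSPACE_PROFILES": "workspace_profiles.toml",
}


def make_workspace() -> Path:
    """Create a throwaway directory (stand-in for tmp_path_factory.mktemp)."""
    return Path(tempfile.mkdtemp(prefix="isolated_workspace_"))


def build_isolated_env(workspace: Path) -> Dict[str, str]:
    """Write an empty config.toml and map every SLOP_* variable into workspace."""
    (workspace / "config.toml").write_text("", encoding="utf-8")
    return {var: str(workspace / name) for var, name in SLOP_ENV_FILES.items()}


def apply_env(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Set the variables; return the previous values so they can be restored."""
    previous = {var: os.environ.get(var) for var in env}
    os.environ.update(env)
    return previous


def restore_env(previous: Dict[str, Optional[str]]) -> None:
    """Put the environment back exactly as it was before apply_env."""
    for var, value in previous.items():
        if value is None:
            os.environ.pop(var, None)
        else:
            os.environ[var] = value
```

Inside a real fixture, pytest's monkeypatch.setenv would normally take care of the restore step, so apply_env and restore_env are only needed in this standalone form.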
reset_paths (line 95)
Purpose: Reset the src.paths global state before and after each test.
Mechanism: Calls paths.reset_resolved() so path resolution re-evaluates on the next access.
reset_ai_client (line 107)
Purpose: Prevent ai_client state from leaking between tests.
Mechanism:
- Calls ai_client.reset_session()
- Clears callback hooks (confirm_and_run_callback, comms_log_callback, tool_log_callback)
- Clears all event listeners
- Resets provider to ("gemini", "gemini-2.5-flash-lite")
- Resets MCP client state via mcp_client.configure([], [])
Function-Scoped Fixtures (Opt-in)
vlogger (line 131)
Purpose: Provide a VerificationLogger instance for structured diagnostic logging.
Usage:
def test_my_thing(vlogger):
    vlogger.log_state("Field", "before_value", "after_value")
    # ... test logic ...
    vlogger.finalize("Test Title", "PASS", "result message")
Output: tests/logs/<timestamp>/<script_name>.txt
kill_process_tree (function, line 138)
Purpose: Robustly kill a process and all its children. Used by live_gui for cleanup, but available to any test.
Mechanism:
- Windows: taskkill /F /T /PID <pid> (the /T flag is critical: it kills the whole tree)
- Unix: os.killpg(os.getpgid(pid), SIGKILL) (kills the process group)
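A minimal sketch of the two branches described above, using only the standard library. It is an illustration under assumptions, not the helper in tests/conftest.py: in particular, it assumes the child was started in its own session or process group (for example with start_new_session=True), because os.killpg would otherwise also signal the process that spawned it.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Best-effort kill of a process and everything it spawned."""
    if sys.platform == "win32":
        # /T walks the child tree, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(pid)],
            capture_output=True,
            check=False,
        )
        return
    try:
        # Signals every process in the target's process group.
        os.killpg(os.getpgid(pid), signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process already exited
```

Example use: start the child with subprocess.Popen(..., start_new_session=True), call kill_process_tree(proc.pid), then proc.wait() to reap it.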
mock_app (line 157)
Purpose: Create an App instance with all external side effects mocked. For unit tests that need the App but not the GUI loop.
Mocks applied:
- src.models.load_config → returns a default config
- src.gui_2.project_manager
- src.gui_2.session_logger
- src.gui_2.immapp.run (prevents the actual render loop from starting)
- src.app_controller.AppController._load_active_project
- src.app_controller.AppController._fetch_models
- App._load_fonts
- App._post_init
- src.app_controller.AppController._prune_old_logs
- src.app_controller.AppController.start_services
- src.app_controller.AppController._init_ai_and_hooks
- src.performance_monitor.PerformanceMonitor
Cleanup: Shuts down the controller after the test.
app_instance (line 190)
Purpose: Same as mock_app, applying the same mocks. It exists as a separate fixture for historical reasons (it is used in test_gui_phase4.py and test_token_viz.py). The two are equivalent for most purposes.
Session-Scoped Fixtures (One Per Test Run)
live_gui (line 227)
Purpose: Start sloppy.py --enable-test-hooks for the entire test session. Integration tests use this to drive the real GUI via the Hook API.
Lifecycle:
- Setup (once per session):
  - Create the tests/artifacts/live_gui_workspace/ temp directory
  - Write manual_slop.toml and config.toml to the workspace
  - Set up SLOP_* env vars to point at the workspace
  - Symlink assets/ for fonts
  - Launch sloppy.py --enable-test-hooks via subprocess.Popen
  - Poll GET /status for up to 15 seconds (waiting for the HookServer to start)
  - On failure: pytest.fail() (kills the process tree, aborts the session)
- Yield: tests run
- Teardown (once per session):
  - Call ApiHookClient.reset_session() to clear GUI state
  - Kill the process tree (Windows: taskkill /F /T, Unix: SIGKILL)
  - Wait 0.5s for file handles to close
  - Close the log file
  - Remove the temp workspace (with 5 retries for Windows file locks)
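The "poll GET /status for up to 15 seconds" step in the setup phase could look roughly like the sketch below. The function name wait_for_hook_server, the polling interval, and the assumption that a healthy server answers /status with HTTP 200 are illustrative; the URL, port, and 15-second budget are taken from this guide.

```python
import http.client
import time
import urllib.request

# The Hook API listens on the loopback interface, so bypass any HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_hook_server(
    url: str = "http://127.0.0.1:8999/status",
    timeout: float = 15.0,
    interval: float = 0.25,
) -> bool:
    """Poll url until it answers with HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=2.0) as response:
                if response.status == 200:  # assumption: 200 means ready
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not listening yet, or it answered with an error
        time.sleep(interval)
    return False
```

A fixture built on this would call pytest.fail() and kill the process tree when the function returns False.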
Yield value: (process: subprocess.Popen, gui_script: str) — but most tests just take the fixture and use the ApiHookClient directly.
Usage pattern:
import time
from src.api_hook_client import ApiHookClient

def test_my_thing(live_gui):
    client = ApiHookClient()  # connects to localhost:8999
    client.click("btn_id")
    time.sleep(0.5)
    assert client.get_value("show_thing") is True
Test Categories
1. Unit Tests (no fixtures, fast)
Pure functions tested in isolation. No app, no GUI, no subprocess. Run in <100ms each.
Examples:
- tests/test_command_palette.py: fuzzy matcher, command registry
- tests/test_fuzzy_anchor.py: anchor slice algorithm
- tests/test_paths.py: path resolution
- tests/test_token_usage.py: token tracking
- tests/test_cost_tracker.py: cost estimation
Pattern:
def test_my_unit():
    result = my_function(input)
    assert result == expected
2. Integration Tests (use live_gui, slow)
Drive the actual app via the Hook API. Run in 1-10 seconds each (real subprocess).
Examples:
- tests/test_saved_presets_sim.py: preset switching via the GUI
- tests/test_command_palette_sim.py: palette toggle, navigation
- tests/test_mma_concurrent_tracks_sim.py: multi-track MMA
- tests/test_workspace_profiles_sim.py: workspace profile save/load
- tests/test_gui_dag_beads.py: Beads DAG visualization
Pattern:
def test_my_integration(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_my_method",
        "args": [arg1, arg2],
    })
    time.sleep(0.5)
    assert client.get_value("result") == expected
3. Mock App Tests (use mock_app or app_instance, fast)
Need an App instance but not the full render loop. Run in <500ms each.
Examples:
- tests/test_text_viewer.py: text viewer state updates
- tests/test_patch_modal.py: patch modal workflow
- tests/test_gui2_events.py: event subscriptions
Pattern:
def test_my_thing(mock_app):
    mock_app.some_attr = "test_value"
    mock_app._do_something()
    assert mock_app.some_attr == "expected"
4. Headless Tests (no GUI, real services)
Test the FastAPI/headless service directly via the Hook API. No subprocess.
Examples:
- tests/test_headless_service.py: service lifecycle
- tests/test_headless_verification.py: full run with QA interceptor
5. Opt-in Tests (gated by env var)
Slow or network-dependent tests that don't run by default. Set the env var to enable.
| Test File | Marker | Env Var | Purpose |
|---|---|---|---|
| tests/test_clean_install.py | @pytest.mark.clean_install | RUN_CLEAN_INSTALL_TEST=1 | Clones the repo to tmp and verifies the hook API |
| tests/test_docker_build.py | @pytest.mark.docker | RUN_DOCKER_TEST=1 | Builds and runs the Docker image |
Running opt-in tests:
RUN_CLEAN_INSTALL_TEST=1 uv run pytest tests/test_clean_install.py -v
RUN_DOCKER_TEST=1 uv run pytest tests/test_docker_build.py -v
Markers
Defined in pyproject.toml:
[tool.pytest.ini_options]
markers = [
"integration: marks tests as integration tests (requires live GUI)",
]
Adding a new marker: add it to the list. Pytest will warn if a marker is used but not registered.
Filtering by marker:
uv run pytest -m integration # Only integration tests
uv run pytest -m "not integration" # Skip integration tests
uv run pytest -m clean_install # Opt-in clean install tests
The Hook API (For Integration Tests)
The live GUI exposes a Hook API on http://127.0.0.1:8999 when launched with --enable-test-hooks. The ApiHookClient (src/api_hook_client.py) is the Python wrapper.
Key Methods
client = ApiHookClient() # connects to localhost:8999 by default
# Click a button
client.click("btn_reset")
# Set a widget value
client.set_value("ui_ai_input", "Hello world")
# Push a generic GUI task
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [arg1, arg2],
})
# Get a value (gettable field)
value = client.get_value("show_command_palette")
# Wait for an event
event = client.wait_for_event("ai_response", timeout=10)
# Reset the session
client.reset_session()
predefined_callbacks Pattern
To make a test invoke an App method via the hook, register it in gui_2.py:
self.controller._predefined_callbacks['_my_method'] = self._my_method
self.controller._gettable_fields['show_thing'] = 'show_thing'
The test can then invoke _my_method via:
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [],
})
This pattern is how the Command Palette's _toggle_command_palette is exposed for tests (since the keyboard shortcut can't be simulated via the hook).
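To make the two registries concrete, here is a toy model of how a name-to-method table and a name-to-attribute table could be consulted. HookDispatcher, FakeApp, and the dispatch logic are hypothetical; only the dictionary names _predefined_callbacks and _gettable_fields and the payload shape come from this guide. The real controller may behave differently.

```python
from typing import Any, Callable, Dict


class HookDispatcher:
    """Toy stand-in for the controller's two registries (not the real class)."""

    def __init__(self, app: Any) -> None:
        self._app = app
        self._predefined_callbacks: Dict[str, Callable[..., Any]] = {}
        self._gettable_fields: Dict[str, str] = {}

    def handle_custom_callback(self, payload: Dict[str, Any]) -> bool:
        """Run the registered callback; return False if the name is unknown."""
        callback = self._predefined_callbacks.get(payload.get("callback", ""))
        if callback is None:
            return False  # an unregistered callback does nothing
        callback(*payload.get("args", []))
        return True

    def get_value(self, field: str) -> Any:
        """Read a registered field from the app; unknown fields give None."""
        attr = self._gettable_fields.get(field)
        return getattr(self._app, attr) if attr is not None else None


class FakeApp:
    """Hypothetical app with one toggle, standing in for the GUI."""

    def __init__(self) -> None:
        self.show_thing = False

    def _my_method(self) -> None:
        self.show_thing = not self.show_thing
```

In this model, forgetting either registration reproduces two failure modes a test author is likely to hit: a custom_callback that silently does nothing, and a get_value that returns None.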
Common Patterns
Testing a Pure Function
def test_my_function():
    from src.mymodule import my_function
    result = my_function("input", 42)
    assert result == "expected"
Testing with a Mock App
from unittest.mock import MagicMock

def test_with_mock():
    app = MagicMock()
    app.some_attr = "test"
    from src.mymodule import do_thing
    do_thing(app)
    app.some_method.assert_called_once()
Testing via live_gui
import time
import pytest
from src.api_hook_client import ApiHookClient

def test_via_gui(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_some_method",
        "args": ["value"],
    })
    time.sleep(0.5)
    assert client.get_value("result") == "expected"
Testing an Exception Path
import pytest

def test_raises():
    from src.mymodule import do_thing
    with pytest.raises(ValueError, match="expected message"):
        do_thing(bad_input)
Parametrized Tests
import pytest

@pytest.mark.parametrize("input,expected", [
    ("a", 1),
    ("b", 2),
    ("c", 3),
])
def test_my_parametrized(input, expected):
    assert my_function(input) == expected
Test Configuration
pyproject.toml
[tool.pytest.ini_options]
asyncio_mode = "strict"
markers = [
"integration: marks tests as integration tests (requires live GUI)",
]
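asyncio_default_fixture_loop_scope = None  # NOTE: a bare None is not valid TOML; in the actual file this key must be omitted or set to a quoted scope such as "function"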
asyncio_default_test_loop_scope = "function"
asyncio_mode = "strict" means async tests need explicit @pytest.mark.asyncio. This is intentional — most Manual Slop tests are synchronous.
Coverage
Run with coverage:
uv run pytest tests/ --cov=src --cov-report=html
Open htmlcov/index.html in a browser. Target: >80% coverage for new code (per the project's quality gates).
Running Tests
All Tests
uv run pytest tests/ -v
Warning: This runs all 251 test files, including the slow live_gui integration tests. Total runtime: 5-10 minutes.
Specific Test File
uv run pytest tests/test_command_palette.py -v
Specific Test
uv run pytest tests/test_command_palette.py::test_fuzzy_match_prefix_ranks_first -v
By Marker
uv run pytest -m integration -v # Only integration tests
uv run pytest -m "not integration" # Skip integration tests
With Stop on First Failure
uv run pytest tests/ -x -v
With Timeout
uv run pytest tests/ --timeout=60 -v
Adding a New Test
For a Pure Function
- Add tests to an existing tests/test_<module>.py file (if it exists) or create a new one
- Use the def test_<thing>(): naming convention
- No fixtures needed unless you're reading state
- Verify it runs: uv run pytest tests/test_<file>.py::test_<name> -v
For an Integration Test
- Create or extend a *_sim.py file
- Add def test_<thing>(live_gui): with the live_gui fixture
- Use ApiHookClient to drive the GUI
- If you need to invoke an App method that's not yet exposed, register it as a _predefined_callbacks entry in gui_2.py
- Verify: uv run pytest tests/test_<file>_sim.py::test_<name> -v
For an Opt-in Test (Clean Install / Docker)
- Mark with @pytest.mark.<marker_name>
- Gate the entire file with a skip if the env var isn't set
- Add the marker to pyproject.toml's markers list
- Document the env var in the test file's docstring
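The "gate the entire file" step can be sketched as a small predicate plus a module-level pytestmark. The helper name opt_in_enabled and the rule that only the exact value "1" enables the test are assumptions for illustration; the real test files may check the variable differently. The pytest part is shown as a comment so the snippet runs without pytest installed.

```python
import os
from typing import Mapping, Optional


def opt_in_enabled(env_var: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when env_var is set to exactly "1" (assumed rule)."""
    source = os.environ if environ is None else environ
    return source.get(env_var) == "1"


# Hypothetical use at the top of an opt-in test module (requires pytest):
#
#   import pytest
#
#   pytestmark = [
#       pytest.mark.docker,
#       pytest.mark.skipif(
#           not opt_in_enabled("RUN_DOCKER_TEST"),
#           reason="set RUN_DOCKER_TEST=1 to run",
#       ),
#   ]
```

Putting both marks in pytestmark applies them to every test in the file, so the marker filter (-m docker) and the env var gate stay in one place.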
Debugging Failed Tests
Verbose Output
uv run pytest tests/test_X.py -v -s
-s disables stdout/stderr capture so you can see print() output.
Stop at First Failure
uv run pytest tests/test_X.py -x
Enter PDB on Failure
uv run pytest tests/test_X.py --pdb
Show Local Variables on Failure
uv run pytest tests/test_X.py -l
Re-run Last Failed
uv run pytest --lf
Common Failure Modes
| Symptom | Likely Cause | Fix |
|---|---|---|
| ImportError for a module | Missing dependency or 1-space indent issue | Check pyproject.toml; run uv sync |
| live_gui times out | Previous test left a process running | taskkill /F /IM python.exe to clean up |
| get_value returns None | Field not registered as gettable | Add to self.controller._gettable_fields in gui_2.py |
| custom_callback does nothing | Callback not registered | Add to self.controller._predefined_callbacks |
| IM_ASSERT: Must call EndChild() | Modal end_child/end pairing broken (usually from a buggy action) | Wrap actions in try/except; check for imgui.end_child() before imgui.end() |
| pytest.fail from live_gui startup | Hook server didn't start in 15s | Check logs/gui_2_py_test.log for crash |
The Audit Script
scripts/check_test_toml_paths.py greps tests/ for direct ./<name>.toml references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.
Run it:
python scripts/check_test_toml_paths.py
Expected output:
OK: No tests reference real TOML files.
If violations are found, migrate the offender to use tmp_path + monkeypatch.
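For readers who want a feel for what such an audit does, the sketch below scans a directory for quoted ./<name>.toml references. It is not the contents of scripts/check_test_toml_paths.py: the regular expression, the function names, and the violation output format are assumptions, and only the success message and exit-code convention are taken from this guide.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches quoted relative references such as "./manual_slop.toml".
# Assumption: the real script may use a different pattern.
REAL_TOML_REF = re.compile(r"""["']\./[\w.-]+\.toml["']""")


def find_real_toml_refs(tests_dir: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line) for every direct ./<name>.toml reference."""
    violations = []
    for path in sorted(tests_dir.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if REAL_TOML_REF.search(line):
                violations.append((path, lineno, line.strip()))
    return violations


def main(tests_dir: Path = Path("tests")) -> int:
    """Print violations and return a process exit code (0 means clean)."""
    violations = find_real_toml_refs(tests_dir)
    for path, lineno, line in violations:
        print(f"{path}:{lineno}: {line}")
    if not violations:
        print("OK: No tests reference real TOML files.")
    return 1 if violations else 0
```

A path built from tmp_path, such as tmp_path / "config.toml", does not match the pattern, which is why migrating an offender to tmp_path clears the violation.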
Test Data Flow
A typical test goes through this lifecycle:
Test starts
├─> isolate_workspace (autouse)
│ ├─> Creates tmp dir
│ └─> Sets SLOP_* env vars
│
├─> reset_paths (autouse)
│ └─> paths.reset_resolved()
│
├─> reset_ai_client (autouse)
│ └─> Resets ai_client global state
│
├─> (test body runs)
│ ├─> If using live_gui: subprocess already running (session-scoped)
│ ├─> Test makes API calls via ApiHookClient
│ └─> Test asserts on returned values
│
└─> Teardown
├─> reset_paths runs again
└─> (autouse) state cleanup
The live_gui session fixture runs once at the start of the test session and tears down once at the end. All tests in the session share the same sloppy.py process.
Known Gotchas (2026-06-05)
Authoring Robust live_gui Tests (Don't Assume Clean State)
live_gui is a session-scoped fixture. All tests in a session share the same sloppy.py subprocess. The subprocess is not restarted between tests; its internal state (Fonts, DisplaySize, internal caches, current theme, current workspace profile, current discussion, current MMA track) accumulates from the previous test.
This is a test-authoring contract, not a fixture bug. A test that "passes when run after test X" but "fails when run in isolation" is a fragile test. Robust live_gui tests must:
- Not assume clean state. Before invoking an operation, explicitly verify the precondition via the Hook API (e.g. client.get_value("show_my_window"), client.get_mma_status(), client.get_session()). Do not assume a previous test set the state.
- Use the wait-for-ready pattern, not fixed sleeps. time.sleep(1) is not enough for ImGui to stabilize in the first few render frames (use 3+ seconds, but better: use wait_for_event with a generous timeout, or poll client.get_status() until ImGui reports ready). Fixed sleeps are a code smell; if you reach for one, the right answer is almost always "poll a gettable field instead".
- Reset state explicitly if the test depends on it. For tests that mutate state (e.g. "click button X"), reset the relevant state via the Hook API in a try/finally so the next test starts from a known baseline. Alternatively, use a function-scoped helper that issues a reset_session callback before the test body.
- Test both in the full suite AND in isolation before merging. If a test passes in the full suite but fails in isolation, the test is fragile: fix the test, don't add a "warmup" comment. Bisecting by pytest path::test -k "filter" or pytest --collect-only --quiet helps.
- Use get_value/wait_for_event to assert ready, not just to assert success. Example:
      def test_open_settings_modal(live_gui):
          client = ApiHookClient()
          client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
          # Wait for the modal to actually appear, not just for the click to dispatch
          assert client.get_value("show_settings_modal"), "settings modal did not open"
  The get_value poll doubles as a wait-for-ready AND a correctness assertion.
Anti-pattern (fragile):
def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    time.sleep(1)  # hope the modal opened
    assert some_cached_value["settings_open"] is True  # may be stale from a prior test
Pattern (robust):
def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.reset_session()  # function-scoped helper; Hook API reset callback
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    assert client.get_value("show_settings_modal"), "settings modal did not open"
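The advice to "poll a gettable field instead" of sleeping can be captured in a small helper. The sketch below is a generic utility, not something this guide says exists in the repository: the name wait_until and its default timeout and interval are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], T],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> Optional[T]:
    """Call probe() until it returns a truthy value or timeout seconds elapse.

    Returns the truthy value, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = probe()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Hypothetical use inside a live_gui test:
#
#   assert wait_until(lambda: client.get_value("show_settings_modal")), \
#       "settings modal did not open"
```

Because the probe always runs at least once, a field that is already set returns immediately, while a slow first render gets the full timeout instead of a fixed guess.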
Early-Render C-Level Crashes (Defer-Not-Catch Pattern)
imgui.save_ini_settings_to_memory() (and similar raw imgui calls that read internal state) will crash the Python process at the C level (0xc0000005 access violation) if called before ImGui's internal state is fully initialized. This is not catchable from Python — try/except Exception cannot intercept native access violations.
Symptoms:
- The sloppy.py subprocess disappears without a Python traceback.
- The pytest output shows pytest.fail("Hook server did not start in 15s") (the subprocess died during startup).
- Windows Event Viewer shows Faulting module: _imgui_bundle.cp311-win_amd64.pyd with exception code 0xc0000005.
Fix pattern: defer-not-catch. Track a one-shot "ready" flag in the instance state; return early on the first call, only invoking the C function on subsequent calls:
def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
    if not getattr(self, "_ini_capture_ready", False):
        self._ini_capture_ready = True
        return models.WorkspaceProfile(name=name, docking_layout=b"", ...)
    ini = imgui.save_ini_settings_to_memory()
    return models.WorkspaceProfile(name=name, docking_layout=ini.encode("utf-8") if isinstance(ini, str) else ini, ...)
The first call (during initial startup) returns a safe empty profile and flips the flag; subsequent calls (when the user actually clicks "Save Profile") invoke the C function. The user's workflow is unaffected because the first call is non-blocking and the user cannot have clicked "Save Profile" before the GUI was fully rendered.
See src/gui_2.py:601-606 for the canonical implementation. This pattern unblocks 4-5 live_gui tests that were crashing the GUI subprocess during the first render frames after _capture_workspace_profile was invoked by the test (typically via a save_workspace_profile Hook API callback).
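The one-shot flag can be exercised without ImGui by swapping the native call for a stand-in. The class ProfileCapture and the injected native_capture callable below are hypothetical test scaffolding, not code from src/gui_2.py; they only model the control flow of the defer-not-catch pattern.

```python
from typing import Callable, Union


class ProfileCapture:
    """Toy model of defer-not-catch: skip the native call on first use."""

    def __init__(self, native_capture: Callable[[], Union[str, bytes]]) -> None:
        # Stand-in for imgui.save_ini_settings_to_memory (injected for testing).
        self._native_capture = native_capture
        self._ini_capture_ready = False

    def capture(self) -> bytes:
        if not self._ini_capture_ready:
            # First call: flip the flag and return a safe empty layout
            # without touching the native function at all.
            self._ini_capture_ready = True
            return b""
        ini = self._native_capture()
        return ini.encode("utf-8") if isinstance(ini, str) else ini
```

The point of the model is that the first call never reaches the native function, so there is nothing to crash; wrapping the call in try/except would not give the same guarantee.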
See Also
- guide_simulations.md: Older guide focused on the Puppeteer pattern; still relevant for the test scenarios it documents
- guide_meta_boundary.md: Application vs Meta-Tooling domain separation; the test suite is in the Application domain
- guide_architecture.md: Threading model that the live_gui test fixture respects
- src/api_hook_client.py: The Python wrapper for the Hook API used in integration tests
- tests/conftest.py: The canonical source of all fixtures documented in this guide
See guide_architecture.md for the overall architecture and conductor/workflow.md for the TDD protocol that the test suite implements.