Testing Guide

Top | Architecture | Simulations | Workflow


Overview

Manual Slop has 251 test files in tests/ covering every subsystem. The test infrastructure is designed around four principles:

  1. No real I/O during tests — every test gets a sandboxed workspace via the isolate_workspace autouse fixture.
  2. No real AI calls — tests use mock providers, reset session state, and never hit the network.
  3. GUI tests launch a real app — the live_gui session fixture starts sloppy.py --enable-test-hooks so integration tests can drive the actual app via the Hook API.
  4. Tests are categorized by marker — unit, integration, strict, clean_install, docker — so CI can opt in to expensive tests.

This guide is the canonical reference for how the test suite is structured and how to add new tests.


Test File Layout

tests/
├── conftest.py                    # Session-wide fixtures (live_gui, isolate_workspace, etc.); the canonical source
├── test_*.py                      # 251 test files, named `test_<topic>_<aspect>.py`
├── *_sim.py                       # Integration tests using the live_gui fixture
├── test_clean_install.py          # Opt-in: clones the repo to tmp and verifies hooks
├── test_docker_build.py           # Opt-in: builds and runs the Docker image
├── test_arch_boundary_phase1.py   # Architectural boundary tests
├── test_enforce_no_real_toml.py   # Meta-test for the enforcer fixture
├── artifacts/                     # Git-ignored; test output
├── logs/                          # Git-ignored; live_gui log files
└── mock_concurrent_mma.py         # Mock providers for MMA tests

Naming conventions:

  • test_*.py — pytest collection
  • *_sim.py — integration test (uses live_gui)
  • *_e2e.py — end-to-end test (real processes, opt-in via env var)
  • test_<area>_<aspect>.py — single aspect of an area, e.g., test_ai_client_cli.py

The conftest.py Fixtures

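The tests/conftest.py file defines 7 fixtures plus one helper function (kill_process_tree). They are grouped below by how tests receive them: autouse first, then opt-in function-scoped, then session-scoped. This is a reading order, not the setup order: pytest instantiates session-scoped fixtures before function-scoped ones.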

Autouse Fixtures (Run Before Every Test)

isolate_workspace (line 70)

Purpose: Give every test a fresh, isolated workspace so it cannot pollute the user's real manual_slop.toml, presets.toml, etc.

Mechanism:

  1. Creates a temp directory via tmp_path_factory.mktemp("isolated_workspace")
  2. Writes a fresh config.toml to the temp dir
  3. Sets SLOP_CONFIG, SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES env vars to point at the temp dir
  4. The app reads these env vars on startup; the test sees an isolated world

Verification: python scripts/check_test_toml_paths.py exits 0 (no test references real TOMLs).
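The mechanism above can be sketched without pytest as a small context manager. This is a minimal sketch under stated assumptions, not the code in tests/conftest.py: the env var names come from this guide, but the helper name isolated_env, the file names other than config.toml, and the placeholder config contents are all hypothetical.

```python
import os
from contextlib import contextmanager
from pathlib import Path

# Env var names come from the guide; the file names they map to are
# assumptions made for this sketch.
SLOP_ENV_FILES = {
    "SLOP_CONFIG": "config.toml",
    "SLOP_GLOBAL_PRESETS": "presets.toml",
    "SLOP_GLOBAL_TOOL_PRESETS": "tool_presets.toml",
    "SLOP_GLOBAL_PERSONAS": "personas.toml",
    "SLOP_GLOBAL_WORKSPACE_PROFILES": "workspace_profiles.toml",
}


@contextmanager
def isolated_env(workspace: Path):
    """Point every SLOP_* variable at `workspace`, then restore the old values."""
    # Placeholder contents; the real fixture presumably writes a fuller config.
    (workspace / "config.toml").write_text("# placeholder test config\n", encoding="utf-8")
    saved = {name: os.environ.get(name) for name in SLOP_ENV_FILES}
    try:
        for name, filename in SLOP_ENV_FILES.items():
            os.environ[name] = str(workspace / filename)
        yield workspace
    finally:
        # Restore the caller's environment exactly as it was.
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

In a conftest, this body would typically live inside an autouse fixture, with tmp_path_factory.mktemp supplying the directory. Using monkeypatch.setenv instead of writing to os.environ directly would make the restore step automatic.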

reset_paths (line 95)

Purpose: Reset the src.paths global state before and after each test.

Mechanism: Calls paths.reset_resolved() so path resolution re-evaluates on the next access.

reset_ai_client (line 107)

Purpose: Prevent ai_client state from leaking between tests.

Mechanism:

  1. Calls ai_client.reset_session()
  2. Clears callback hooks (confirm_and_run_callback, comms_log_callback, tool_log_callback)
  3. Clears all event listeners
  4. Resets provider to ("gemini", "gemini-2.5-flash-lite")
  5. Resets MCP client state via mcp_client.configure([], [])

Function-Scoped Fixtures (Opt-in)

vlogger (line 131)

Purpose: Provide a VerificationLogger instance for structured diagnostic logging.

Usage:

def test_my_thing(vlogger):
    vlogger.log_state("Field", "before_value", "after_value")
    # ... test logic ...
    vlogger.finalize("Test Title", "PASS", "result message")

Output: tests/logs/<timestamp>/<script_name>.txt

kill_process_tree (function, line 138)

Purpose: Robustly kill a process and all its children. Used by live_gui for cleanup, but available to any test.

Mechanism:

  • Windows: taskkill /F /T /PID <pid> (the /T flag is critical — kills the whole tree)
  • Unix: os.killpg(os.getpgid(pid), signal.SIGKILL) (kills the process group)
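
A minimal sketch of the two branches above, using only the standard library. It is not the implementation in tests/conftest.py. One assumption is worth stating: the Unix branch only behaves like a tree kill if the target was started in its own process group (for example with start_new_session=True); otherwise killpg would also signal the test runner.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Kill `pid` and its children; do nothing if the process is already gone."""
    if sys.platform == "win32":
        # /T walks the child tree, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(pid)],
            capture_output=True,
            check=False,
        )
    else:
        try:
            # Signals every process in the target's process group.
            os.killpg(os.getpgid(pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # already exited
```

A caller on Unix would launch the child with subprocess.Popen(..., start_new_session=True) so that the group being killed contains only the child and its descendants.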

mock_app (line 157)

Purpose: Create an App instance with all external side effects mocked. For unit tests that need the App but not the GUI loop.

Mocks applied:

  • src.models.load_config → returns a default config
  • src.gui_2.project_manager
  • src.gui_2.session_logger
  • src.gui_2.immapp.run (prevents the actual render loop from starting)
  • src.app_controller.AppController._load_active_project
  • src.app_controller.AppController._fetch_models
  • App._load_fonts
  • App._post_init
  • src.app_controller.AppController._prune_old_logs
  • src.app_controller.AppController.start_services
  • src.app_controller.AppController._init_ai_and_hooks
  • src.performance_monitor.PerformanceMonitor

Cleanup: Shuts down the controller after the test.

app_instance (line 190)

Purpose: Nearly identical to mock_app. It applies the same mocks and exists mainly for historical reasons (it is used by test_gui_phase4.py and test_token_viz.py). The two are interchangeable for most purposes.

Session-Scoped Fixtures (One Per Test Run)

live_gui (line 227)

Purpose: Start sloppy.py --enable-test-hooks for the entire test session. Integration tests use this to drive the real GUI via the Hook API.

Lifecycle:

  1. Setup (once per session):
    • Create tests/artifacts/live_gui_workspace/ temp directory
    • Write manual_slop.toml and config.toml to the workspace
    • Set up SLOP_* env vars to point at the workspace
    • Symlink assets/ for fonts
    • Launch sloppy.py --enable-test-hooks via subprocess.Popen
    • Poll GET /status for up to 15 seconds (waiting for the HookServer to start)
    • On failure: pytest.fail() (kills the process tree, aborts the session)
  2. Yield: tests run
  3. Teardown (once per session):
    • Call ApiHookClient.reset_session() to clear GUI state
    • Kill the process tree (Windows: taskkill /F /T, Unix: SIGKILL)
    • Wait 0.5s for file handles to close
    • Close the log file
    • Remove the temp workspace (with 5 retries for Windows file locks)
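
Two steps of this lifecycle, the startup poll and the retried workspace removal, can be sketched with the standard library. This is an illustration under assumptions rather than the fixture's code: the function names, the retry delay, and the choice to bypass proxies for a localhost URL are all hypothetical.

```python
import http.client
import shutil
import time
import urllib.error
import urllib.request
from pathlib import Path

# Localhost polling should not go through any configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_status(url: str, timeout: float = 15.0, interval: float = 0.2) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=1.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # server not listening yet; retry
        time.sleep(interval)
    return False


def remove_with_retries(path: Path, attempts: int = 5, delay: float = 0.5) -> bool:
    """Delete `path`, retrying because Windows may still hold file locks."""
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # nothing left to remove
        except OSError:
            time.sleep(delay)
    return False
```

A fixture built this way would call pytest.fail() when wait_for_status returns False, after killing the process tree so no orphaned GUI survives the aborted session.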

Yield value: (process: subprocess.Popen, gui_script: str) — but most tests just take the fixture and use the ApiHookClient directly.

Usage pattern:

import time
from src.api_hook_client import ApiHookClient

def test_my_thing(live_gui):
    client = ApiHookClient()  # connects to localhost:8999
    client.click("btn_id")
    time.sleep(0.5)
    assert client.get_value("show_thing") is True

Test Categories

1. Unit Tests (no fixtures, fast)

Pure functions tested in isolation. No app, no GUI, no subprocess. Run in <100ms each.

Examples:

  • tests/test_command_palette.py — fuzzy matcher, command registry
  • tests/test_fuzzy_anchor.py — anchor slice algorithm
  • tests/test_paths.py — path resolution
  • tests/test_token_usage.py — token tracking
  • tests/test_cost_tracker.py — cost estimation

Pattern:

def test_my_unit():
    result = my_function(input)
    assert result == expected

2. Integration Tests (use live_gui, slow)

Drive the actual app via the Hook API. Run in 1-10 seconds each (real subprocess).

Examples:

  • tests/test_saved_presets_sim.py — preset switching via the GUI
  • tests/test_command_palette_sim.py — palette toggle, navigation
  • tests/test_mma_concurrent_tracks_sim.py — multi-track MMA
  • tests/test_workspace_profiles_sim.py — workspace profile save/load
  • tests/test_gui_dag_beads.py — Beads DAG visualization

Pattern:

def test_my_integration(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_my_method",
        "args": [arg1, arg2],
    })
    time.sleep(0.5)
    assert client.get_value("result") == expected

3. Mock App Tests (use mock_app or app_instance, fast)

Need an App instance but not the full render loop. Run in <500ms each.

Examples:

  • tests/test_text_viewer.py — text viewer state updates
  • tests/test_patch_modal.py — patch modal workflow
  • tests/test_gui2_events.py — event subscriptions

Pattern:

def test_my_thing(mock_app):
    mock_app.some_attr = "test_value"
    mock_app._do_something()
    assert mock_app.some_attr == "expected"

4. Headless Tests (no GUI, real services)

Test the FastAPI/headless service directly via the Hook API. No subprocess.

Examples:

  • tests/test_headless_service.py — service lifecycle
  • tests/test_headless_verification.py — full run with QA interceptor

5. Opt-in Tests (gated by env var)

Slow or network-dependent tests that don't run by default. Set the env var to enable.

| Test File | Marker | Env Var | Purpose |
|---|---|---|---|
| tests/test_clean_install.py | @pytest.mark.clean_install | RUN_CLEAN_INSTALL_TEST=1 | Clones the repo to tmp and verifies the hook API |
| tests/test_docker_build.py | @pytest.mark.docker | RUN_DOCKER_TEST=1 | Builds and runs the Docker image |

Running opt-in tests:

RUN_CLEAN_INSTALL_TEST=1 uv run pytest tests/test_clean_install.py -v
RUN_DOCKER_TEST=1 uv run pytest tests/test_docker_build.py -v

Markers

Defined in pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "integration: marks tests as integration tests (requires live GUI)",
]

Adding a new marker: add it to the list. Pytest will warn if a marker is used but not registered.

Filtering by marker:

uv run pytest -m integration         # Only integration tests
uv run pytest -m "not integration"   # Skip integration tests
uv run pytest -m clean_install       # Select clean install tests (still skipped unless RUN_CLEAN_INSTALL_TEST=1)

The Hook API (For Integration Tests)

The live GUI exposes a Hook API on http://127.0.0.1:8999 when launched with --enable-test-hooks. The ApiHookClient (src/api_hook_client.py) is the Python wrapper.

Key Methods

client = ApiHookClient()  # connects to localhost:8999 by default

# Click a button
client.click("btn_reset")

# Set a widget value
client.set_value("ui_ai_input", "Hello world")

# Push a generic GUI task
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [arg1, arg2],
})

# Get a value (gettable field)
value = client.get_value("show_command_palette")

# Wait for an event
event = client.wait_for_event("ai_response", timeout=10)

# Reset the session
client.reset_session()

_predefined_callbacks Pattern

To make a test invoke an App method via the hook, register it in gui_2.py:

self.controller._predefined_callbacks['_my_method'] = self._my_method
self.controller._gettable_fields['show_thing'] = 'show_thing'

The test can then invoke _my_method via:

client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [],
})

This pattern is how the Command Palette's _toggle_command_palette is exposed for tests (since the keyboard shortcut can't be simulated via the hook).
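
To make the registration idea concrete, here is a self-contained sketch of how a name-based registry of this kind can work. The two registry attribute names are from this guide; everything else (the SketchApp and SketchController classes, the dispatch and read functions, and the choice to ignore unknown names) is an assumption for illustration and may differ from what gui_2.py and the HookServer actually do.

```python
from typing import Any, Callable


class SketchController:
    """Stand-in for the controller; holds only the two registries named above."""

    def __init__(self) -> None:
        self._predefined_callbacks: dict[str, Callable[..., Any]] = {}
        self._gettable_fields: dict[str, str] = {}


class SketchApp:
    """Stand-in for App with one toggleable flag."""

    def __init__(self) -> None:
        self.controller = SketchController()
        self.show_thing = False
        # Registration, mirroring the two lines shown in this section.
        self.controller._predefined_callbacks["_toggle_thing"] = self._toggle_thing
        self.controller._gettable_fields["show_thing"] = "show_thing"

    def _toggle_thing(self) -> None:
        self.show_thing = not self.show_thing


def dispatch_custom_callback(app: SketchApp, payload: dict[str, Any]) -> None:
    """Assumed handling of a custom_callback event: look up by name, then call."""
    callback = app.controller._predefined_callbacks.get(payload["callback"])
    if callback is None:
        return  # unregistered names are ignored in this sketch
    callback(*payload.get("args", []))


def read_gettable(app: SketchApp, field: str) -> Any:
    """Assumed handling of get_value: only registered fields are readable."""
    attr = app.controller._gettable_fields.get(field)
    return getattr(app, attr) if attr is not None else None
```

Under this model, a missing registration fails silently: the callback is never invoked and the field reads as None, which is why both registries are worth checking first when a hook-driven test sees no effect.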


Common Patterns

Testing a Pure Function

def test_my_function():
    from src.mymodule import my_function
    result = my_function("input", 42)
    assert result == "expected"

Testing with a Mock App

from unittest.mock import MagicMock

def test_with_mock():
    app = MagicMock()
    app.some_attr = "test"
    from src.mymodule import do_thing
    do_thing(app)
    app.some_method.assert_called_once()

Testing via live_gui

import time
import pytest
from src.api_hook_client import ApiHookClient

def test_via_gui(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_some_method",
        "args": ["value"],
    })
    time.sleep(0.5)
    assert client.get_value("result") == "expected"

Testing an Exception Path

import pytest

def test_raises():
    from src.mymodule import do_thing
    with pytest.raises(ValueError, match="expected message"):
        do_thing(bad_input)

Parametrized Tests

import pytest
from src.mymodule import my_function

# "value" avoids shadowing the built-in input()
@pytest.mark.parametrize("value,expected", [
    ("a", 1),
    ("b", 2),
    ("c", 3),
])
def test_my_parametrized(value, expected):
    assert my_function(value) == expected

Test Configuration

pyproject.toml

[tool.pytest.ini_options]
asyncio_mode = "strict"
markers = [
    "integration: marks tests as integration tests (requires live GUI)",
]
# asyncio_default_fixture_loop_scope: unset (TOML has no None literal; pytest-asyncio shows an unset value as None)
asyncio_default_test_loop_scope = "function"

asyncio_mode = "strict" means async tests need explicit @pytest.mark.asyncio. This is intentional — most Manual Slop tests are synchronous.

Coverage

Run with coverage:

uv run pytest tests/ --cov=src --cov-report=html

Open htmlcov/index.html in a browser. Target: >80% coverage for new code (per the project's quality gates).


Running Tests

All Tests

uv run pytest tests/ -v

Warning: This runs all 251 test files, including slow live_gui integration tests. Total runtime: 5-10 minutes.

Specific Test File

uv run pytest tests/test_command_palette.py -v

Specific Test

uv run pytest tests/test_command_palette.py::test_fuzzy_match_prefix_ranks_first -v

Batched Run (Categorized)

uv run python scripts/run_tests_batched.py

This runs the categorized batcher, which splits the suite into 6 tiers isolated by fixture class: opt-in (skipped by default), unit (with xdist), mock_app, live_gui (in one session), headless, and performance. Each tier prints a summary line. Use --plan to see the batch plan without running, --audit to list unclassified files, and --tiers 1,2 to limit which tiers run.

See conductor/tracks/test_batching_refactor_20260606/spec.md for the full design.

By Marker

uv run pytest -m integration -v      # Only integration tests
uv run pytest -m "not integration"   # Skip integration tests

With Stop on First Failure

uv run pytest tests/ -x -v

With Timeout

uv run pytest tests/ --timeout=60 -v

Adding a New Test

For a Pure Function

  1. Add tests to an existing tests/test_<module>.py file (if it exists) or create a new one
  2. Use def test_<thing>(): naming convention
  3. No fixtures needed unless you're reading state
  4. Verify it runs: uv run pytest tests/test_<file>.py::test_<name> -v

For an Integration Test

  1. Create or extend a *_sim.py file
  2. Add def test_<thing>(live_gui): with the live_gui fixture
  3. Use ApiHookClient to drive the GUI
  4. If you need to invoke an App method that's not yet exposed, register it as a _predefined_callbacks entry in gui_2.py
  5. Verify: uv run pytest tests/test_<file>_sim.py::test_<name> -v

For an Opt-in Test (Clean Install / Docker)

  1. Mark with @pytest.mark.<marker_name>
  2. Gate the entire file with a skip if the env var isn't set
  3. Add the marker to pyproject.toml's markers list
  4. Document the env var in the test file's docstring
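
The gating step can be sketched as a small helper whose result feeds pytest's skip marker. This is a hedged illustration: the helper name opt_in_skip_reason is hypothetical, and treating exactly "1" as the enabling value is an assumption based on the commands shown in this guide, not a documented rule.

```python
import os
from typing import Mapping, Optional


def opt_in_skip_reason(
    env_var: str, environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return a skip reason unless `env_var` is set to "1"; None means run."""
    if environ.get(env_var) == "1":
        return None
    return f"opt-in test: set {env_var}=1 to enable"


# At the top of an opt-in test file, the result would feed pytest's markers:
#   _reason = opt_in_skip_reason("RUN_DOCKER_TEST")
#   pytestmark = [
#       pytest.mark.docker,
#       pytest.mark.skipif(_reason is not None, reason=_reason or ""),
#   ]
```

Assigning a module-level pytestmark applies both the marker and the skip condition to every test in the file, which covers steps 1 and 2 in one place.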

Debugging Failed Tests

Verbose Output

uv run pytest tests/test_X.py -v -s

-s disables stdout/stderr capture so you can see print() output.

Stop at First Failure

uv run pytest tests/test_X.py -x

Enter PDB on Failure

uv run pytest tests/test_X.py --pdb

Show Local Variables on Failure

uv run pytest tests/test_X.py -l

Re-run Last Failed

uv run pytest --lf

Common Failure Modes

| Symptom | Likely Cause | Fix |
|---|---|---|
| ImportError for a module | Missing dependency or 1-space indent issue | Check pyproject.toml; run uv sync |
| live_gui times out | Previous test left a process running | taskkill /F /IM python.exe to clean up |
| get_value returns None | Field not registered as gettable | Add to self.controller._gettable_fields in gui_2.py |
| custom_callback does nothing | Callback not registered | Add to self.controller._predefined_callbacks |
| IM_ASSERT: Must call EndChild() | Modal end_child/end pairing broken (usually from a buggy action) | Wrap actions in try/except; check for imgui.end_child() before imgui.end() |
| pytest.fail from live_gui startup | Hook server didn't start in 15s | Check logs/gui_2_py_test.log for crash |

The Audit Script

scripts/check_test_toml_paths.py greps tests/ for direct ./<name>.toml references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.

Run it:

python scripts/check_test_toml_paths.py

Expected output:

OK: No tests reference real TOML files.

If violations are found, migrate the offender to use tmp_path + monkeypatch.
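
For readers who want to see the shape of such a check, here is a minimal sketch of a scanner in the same spirit. It is not the contents of scripts/check_test_toml_paths.py: the regular expression, the function names, and the nonzero exit code are assumptions, and only the success message is taken from this guide.

```python
import re
from pathlib import Path

# Assumed pattern: a relative reference such as ./manual_slop.toml that is
# not part of a longer path (so ../x.toml and tmp/./x.toml are not flagged).
TOML_REF = re.compile(r"(?<![\w/.])\./[\w.-]+\.toml\b")


def find_real_toml_refs(tests_dir: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line text) for each suspicious reference."""
    hits = []
    for path in sorted(tests_dir.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOML_REF.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def main(tests_dir: Path) -> int:
    """Print violations and return a process exit code (0 means clean)."""
    hits = find_real_toml_refs(tests_dir)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}: {line}")
    if hits:
        return 1
    print("OK: No tests reference real TOML files.")
    return 0
```

A text scan like this is deliberately blunt: it cannot tell a real file access from a string in a comment, so a flagged line is a prompt to review rather than proof of a leak.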


Test Data Flow

A typical test goes through this lifecycle:

Test starts
  ├─> isolate_workspace (autouse)
  │     ├─> Creates tmp dir
  │     └─> Sets SLOP_* env vars
  │
  ├─> reset_paths (autouse)
  │     └─> paths.reset_resolved()
  │
  ├─> reset_ai_client (autouse)
  │     └─> Resets ai_client global state
  │
  ├─> (test body runs)
  │     ├─> If using live_gui: subprocess already running (session-scoped)
  │     ├─> Test makes API calls via ApiHookClient
  │     └─> Test asserts on returned values
  │
  └─> Teardown
        ├─> reset_paths runs again
        └─> (autouse) state cleanup

The live_gui session fixture runs once at the start of the test session and tears down once at the end. All tests in the session share the same sloppy.py process.


Known Gotchas (2026-06-05)

Authoring Robust live_gui Tests (Don't Assume Clean State)

live_gui is a session-scoped fixture. All tests in a session share the same sloppy.py subprocess. The subprocess is not restarted between tests; its internal state (Fonts, DisplaySize, internal caches, current theme, current workspace profile, current discussion, current MMA track) accumulates from the previous test.

This is a test-authoring contract, not a fixture bug. A test that "passes when run after test X" but "fails when run in isolation" is a fragile test. Robust live_gui tests must:

  1. Not assume clean state. Before invoking an operation, explicitly verify the precondition via the Hook API (e.g. client.get_value("show_my_window"), client.get_mma_status(), client.get_session()). Do not assume a previous test set the state.
  2. Use the wait-for-ready pattern, not fixed sleeps. time.sleep(1) is not enough for ImGui to stabilize during the first few render frames. If you must sleep, use 3+ seconds; better, use wait_for_event with a generous timeout, or poll client.get_status() until ImGui reports ready. Fixed sleeps are a code smell: if you reach for one, the right answer is almost always "poll a gettable field instead".
  3. Reset state explicitly if the test depends on it. For tests that mutate state (e.g. "click button X"), reset the relevant state via Hook API in a try/finally so the next test starts from a known baseline. Alternatively, use a function-scoped helper that issues a reset_session callback before the test body.
  4. Test both in the full suite AND in isolation before merging. If a test passes in the full suite but fails in isolation, the test is fragile — fix the test, don't add a "warmup" comment. Bisecting by pytest path::test -k "filter" or pytest --collect-only --quiet helps.
  5. Use get_value/wait_for_event to assert ready, not just to assert success. Example:
    def test_open_settings_modal(live_gui):
        client = ApiHookClient()
        client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
        # Wait for the modal to actually appear, not just for the click to dispatch
        assert client.get_value("show_settings_modal"), "settings modal did not open"
    
    The get_value poll doubles as a wait-for-ready AND a correctness assertion.

Anti-pattern (fragile):

def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    time.sleep(1)  # hope the modal opened
    assert some_cached_value["settings_open"] is True  # may be stale from a prior test

Pattern (robust):

def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.reset_session()  # Hook API reset; start from a known baseline
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    assert client.get_value("show_settings_modal"), "settings modal did not open"
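
Where a single read might race the render loop, the "poll a gettable field" advice can be wrapped in a small helper. This is a generic sketch, not part of ApiHookClient: the name wait_until and its default timings are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    read: Callable[[], T], timeout: float = 5.0, interval: float = 0.05
) -> Optional[T]:
    """Call `read` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Intended use inside a live_gui test (client is an ApiHookClient):
#   assert wait_until(lambda: client.get_value("show_settings_modal")), \
#       "settings modal did not open"
```

The helper always reads at least once, so a state that is already ready returns immediately, and a slow one costs only as long as it actually takes rather than a fixed sleep.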

Early-Render C-Level Crashes (Defer-Not-Catch Pattern)

imgui.save_ini_settings_to_memory() (and similar raw imgui calls that read internal state) will crash the Python process at the C level (0xc0000005 access violation) if called before ImGui's internal state is fully initialized. This is not catchable from Python: try/except Exception cannot intercept native access violations.

Symptoms:

  • The sloppy.py subprocess disappears without a Python traceback.
  • The pytest output shows pytest.fail("Hook server did not start in 15s") (the subprocess died during startup).
  • Windows Event Viewer shows Faulting module: _imgui_bundle.cp311-win_amd64.pyd with exception code 0xc0000005.

Fix pattern: defer-not-catch. Track a one-shot "ready" flag in the instance state; return early on the first call, only invoking the C function on subsequent calls:

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
    # First call: ImGui may not be initialized yet, so skip the native call.
    if not getattr(self, "_ini_capture_ready", False):
        self._ini_capture_ready = True
        return models.WorkspaceProfile(name=name, docking_layout="", ...)
    ini = imgui.save_ini_settings_to_memory()
    # Normalize to str so the value stays TOML-serializable.
    return models.WorkspaceProfile(name=name, docking_layout=ini.decode("utf-8") if isinstance(ini, bytes) else ini, ...)

The first call (during initial startup) returns a safe empty profile and flips the flag; subsequent calls (when the user actually clicks "Save Profile") invoke the C function. The user's workflow is unaffected because the first call is non-blocking and the user cannot have clicked "Save Profile" before the GUI was fully rendered.

See src/gui_2.py:601-606 for the canonical implementation. This pattern unblocks 4-5 live_gui tests that were crashing the GUI subprocess during the first render frames after _capture_workspace_profile was invoked by the test (typically via a save_workspace_profile Hook API callback).

Sentinel type contract. When implementing a defer-not-catch guard, the early-return sentinel value must match the type contract of the downstream consumer. For WorkspaceProfile.ini_content: str (in this codebase), the sentinel must be "" (str), not b"" (bytes): tomli_w rejects bytes (TypeError: Object of type 'bytes' is not TOML serializable), and imgui.load_ini_settings_from_memory(ini_data: str, ...) also expects str. A previous version of this fix used b"" and silently broke the save flow via a TypeError raised by tomli_w.dump; tests passed unit-test-wise but failed in the live_gui save+load round-trip. The fix was a 1-character change (b"" → ""). The regression test in tests/test_workspace_profile_serialization.py encodes this contract.
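
The guard and its sentinel contract can be exercised without ImGui by substituting a plain callable for the native function. This is a sketch under assumptions: ProfileSketch and CaptureSketch are hypothetical stand-ins, and only the flag name _ini_capture_ready and the field name ini_content are taken from this guide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProfileSketch:
    """Stand-in for models.WorkspaceProfile with only the fields used here."""

    name: str
    ini_content: str


class CaptureSketch:
    """Defers the risky native call until after the first invocation."""

    def __init__(self, native_save: Callable[[], str]) -> None:
        # `native_save` stands in for imgui.save_ini_settings_to_memory.
        self._native_save = native_save
        self._ini_capture_ready = False

    def capture(self, name: str) -> ProfileSketch:
        if not self._ini_capture_ready:
            # First call: flip the flag and return a str sentinel, never bytes.
            self._ini_capture_ready = True
            return ProfileSketch(name=name, ini_content="")
        return ProfileSketch(name=name, ini_content=self._native_save())
```

Injecting the native call as a parameter is what makes the guard unit-testable: a test can assert both that the first call never reaches the native function and that the sentinel has the same type as a real result.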


Pattern: Narrow Test Paths vs. Kitchen-Sink Functions

Anti-pattern: calling a kitchen-sink function. A test that does gui_2.render_main_interface(app_instance) requires mocking 50+ imgui/imscope methods because render_main_interface dispatches to dozens of nested render functions. Adding a single mock for imscope.window (to return a tuple) just reveals the next un-mocked dependency (e.g. imgui.begin returning bool where a 2-tuple is expected). The test never reaches its assertion.

Better pattern: test the narrow function. Most render flows have a dedicated sub-function (e.g. render_prior_session_view, render_preset_manager_window, render_theme_panel). Refactor the test to call the narrow function directly with mocks scoped to what that function actually uses. Example outcome:

  • render_main_interface test: 50+ mocks, ~6s runtime, flakiness on every un-mocked imgui call.
  • render_prior_session_view test: 20 mocks, ~0.08s runtime, stable.

When to refactor vs. add mocks:

  • If the test intent is "verify push/pop balance in the prior-session render path", call the narrow function.
  • If the test intent is "verify the whole GUI render path is correct", accept the 50+ mock cost (and ensure all mocks are correct).

See the prior_session_test_harden_20260605 plan in docs/superpowers/plans/ for the concrete refactor example.


Pattern: Indentation-Driven Method Visibility

The bug: A class method defined with the right intent (2-space indent) may be parsed as nested inside a previous function if indentation is off by even one space. The file "passes" syntactically (imports OK) but the method is not on the class — hasattr(App, 'method_name') returns False. Any production code that calls app.method_name falls through to __getattr__, which delegates to the controller (which also doesn't have the method), and a cryptic AttributeError is raised at runtime.

How to detect:

  • Use AST to list all App methods: uv run python -c "import ast; tree = ast.parse(open('src/gui_2.py').read()); [print(item.name) for n in ast.walk(tree) if isinstance(n, ast.ClassDef) and n.name == 'App' for item in n.body if isinstance(item, ast.FunctionDef)]".
  • The skeleton via manual-slop_py_get_skeleton should show the method as a class member.
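
The AST one-liner above is easier to reuse as a function. The sketch below also reproduces the failure on a tiny inline example; the helper name class_method_names and the sample source are hypothetical, written only to show a method being swallowed by the previous function body.

```python
import ast


def class_method_names(source: str, class_name: str) -> list[str]:
    """List functions defined directly in the body of `class_name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return []


# One extra space puts the second def at the same depth as the first
# method's body, so Python parses it as a nested function.
DRIFTED = "\n".join([
    "class App:",
    "  def _apply_snapshot(self):",
    "   x = 1",
    "   def _capture_workspace_profile(self, name):",
    "    return name",
])

# Same code with the second def back at class level.
FIXED = "\n".join([
    "class App:",
    "  def _apply_snapshot(self):",
    "   x = 1",
    "  def _capture_workspace_profile(self, name):",
    "   return name",
])

print(class_method_names(DRIFTED, "App"))
print(class_method_names(FIXED, "App"))
```

Both samples are syntactically valid, which is the point: only a structural check like this (or a hasattr assertion in a test) reveals that the drifted version lost a method.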

How to fix: Re-indent the affected method to 2-space class level. Run the failing test to confirm. See the live_gui_test_hardening_v2_20260605 track in conductor/tracks.md for the concrete example (where _capture_workspace_profile was being parsed as nested inside _apply_snapshot due to a 1-space indentation drift after a cleanup commit).


See Also

  • guide_simulations.md — Older guide focused on the Puppeteer pattern; still relevant for the test scenarios it documents
  • guide_meta_boundary.md — Application vs Meta-Tooling domain separation; the test suite is in the Application domain
  • guide_architecture.md — Threading model that the live_gui test fixture respects
  • src/api_hook_client.py — The Python wrapper for the Hook API used in integration tests
  • tests/conftest.py — The canonical source of all fixtures documented in this guide