ed/manual_slop

ed 0fec0f4f56 docs(testing): reframe live_gui gotcha as test-authoring contract, not fixture bug

2026-06-05 18:39:33 -04:00


Testing Guide


Overview

Manual Slop has 251 test files in tests/ covering every subsystem. The test infrastructure is designed around four principles:

No real I/O during tests — every test gets a sandboxed workspace via the isolate_workspace autouse fixture.
No real AI calls — tests use mock providers, reset session state, and never hit the network.
GUI tests launch a real app — the live_gui session fixture starts sloppy.py --enable-test-hooks so integration tests can drive the actual app via the Hook API.
Tests are categorized by marker — unit, integration, strict, clean_install, docker — so CI can opt in to expensive tests.

This guide is the canonical reference for how the test suite is structured and how to add new tests.

Test File Layout

tests/
├── conftest.py                    # Session-wide fixtures (live_gui, isolate_workspace, etc.); the canonical source
├── test_*.py                      # 251 test files, named `test_<topic>_<aspect>.py`
├── *_sim.py                      # Integration tests using the live_gui fixture
├── test_clean_install.py          # Opt-in: clones the repo to tmp and verifies hooks
├── test_docker_build.py          # Opt-in: builds and runs the Docker image
├── test_arch_boundary_phase1.py   # Architectural boundary tests
├── test_enforce_no_real_toml.py   # Meta-test for the enforcer fixture
├── artifacts/                    # Git-ignored; test output
├── logs/                          # Git-ignored; live_gui log files
└── mock_concurrent_mma.py         # Mock providers for MMA tests

Naming conventions:

test_*.py — pytest collection
*_sim.py — integration test (uses live_gui)
*_e2e.py — end-to-end test (real processes, opt-in via env var)
test_<area>_<aspect>.py — single aspect of an area, e.g., test_ai_client_cli.py

The `conftest.py` Fixtures

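The tests/conftest.py file defines 7 fixtures plus one helper function (kill_process_tree). They are grouped below by how a test receives them: autouse first, then opt-in function-scoped, then session-scoped. This is a reading order, not pytest's setup order: pytest instantiates higher-scoped fixtures (such as the session-scoped live_gui) before function-scoped ones.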

Autouse Fixtures (Run Before Every Test)

`isolate_workspace` (line 70)

Purpose: Give every test a fresh, isolated workspace so it cannot pollute the user's real manual_slop.toml, presets.toml, etc.

Mechanism:

Creates a temp directory via tmp_path_factory.mktemp("isolated_workspace")
Writes a fresh config.toml to the temp dir
Sets SLOP_CONFIG, SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES env vars to point at the temp dir
The app reads these env vars on startup; the test sees an isolated world

Verification: python scripts/check_test_toml_paths.py exits 0 (no test references real TOMLs).
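
The mechanism above can be sketched with the standard library alone. This is a minimal illustration, not the project's fixture: `build_isolated_env` is a hypothetical helper, and every file name other than config.toml is an assumption (the guide only says the variables point at the temp dir). The SLOP_* variable names are the ones listed above.

```python
import tempfile
from pathlib import Path

# Env var names come from this guide; the file names they map to are
# assumptions made for illustration (only config.toml is named in the guide).
SLOP_ENV_FILES = {
    "SLOP_CONFIG": "config.toml",
    "SLOP_GLOBAL_PRESETS": "presets.toml",
    "SLOP_GLOBAL_TOOL_PRESETS": "tool_presets.toml",
    "SLOP_GLOBAL_PERSONAS": "personas.toml",
    "SLOP_GLOBAL_WORKSPACE_PROFILES": "workspace_profiles.toml",
}


def build_isolated_env(workspace: Path) -> dict[str, str]:
    """Write a fresh config.toml and return SLOP_* values pointing into workspace."""
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "config.toml").write_text("# fresh test config\n", encoding="utf-8")
    return {var: str(workspace / name) for var, name in SLOP_ENV_FILES.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = build_isolated_env(Path(tmp) / "isolated_workspace")
        for var, value in env.items():
            print(f"{var}={value}")
```

Inside an autouse fixture, each pair would typically be applied with `monkeypatch.setenv(var, value)` so that pytest restores the original environment when the test finishes.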

`reset_paths` (line 95)

Purpose: Reset the src.paths global state before and after each test.

Mechanism: Calls paths.reset_resolved() so path resolution re-evaluates on the next access.

`reset_ai_client` (line 107)

Purpose: Prevent ai_client state from leaking between tests.

Mechanism:

Calls ai_client.reset_session()
Clears callback hooks (confirm_and_run_callback, comms_log_callback, tool_log_callback)
Clears all event listeners
Resets provider to ("gemini", "gemini-2.5-flash-lite")
Resets MCP client state via mcp_client.configure([], [])

Function-Scoped Fixtures (Opt-in)

`vlogger` (line 131)

Purpose: Provide a VerificationLogger instance for structured diagnostic logging.

Usage:

def test_my_thing(vlogger):
    vlogger.log_state("Field", "before_value", "after_value")
    # ... test logic ...
    vlogger.finalize("Test Title", "PASS", "result message")

Output: tests/logs/<timestamp>/<script_name>.txt

`kill_process_tree` (function, line 138)

Purpose: Robustly kill a process and all its children. Used by live_gui for cleanup, but available to any test.

Mechanism:

Windows: taskkill /F /T /PID <pid> (the /T flag is critical — kills the whole tree)
Unix: os.killpg(os.getpgid(pid), SIGKILL) (kills the process group)
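
The two branches above might look roughly like this. It is a sketch under assumptions, not the code in conftest.py: error handling is guessed, and on Unix it presumes the target was launched in its own process group (for example with `start_new_session=True`), because `os.killpg` would otherwise also signal the test runner's group.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Force-kill a process and its children (sketch; see assumptions above)."""
    if sys.platform == "win32":
        # /T terminates the whole child tree, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(pid)],
            capture_output=True,
            check=False,
        )
    else:
        try:
            # Signal every process in the target's process group.
            os.killpg(os.getpgid(pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited


if __name__ == "__main__":
    # Launch a throwaway child in its own session so the group kill is safe.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    kill_process_tree(child.pid)
    child.wait(timeout=10)
    print("child exit code:", child.returncode)
```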

`mock_app` (line 157)

Purpose: Create an App instance with all external side effects mocked. For unit tests that need the App but not the GUI loop.

Mocks applied:

src.models.load_config → returns a default config
src.gui_2.project_manager
src.gui_2.session_logger
src.gui_2.immapp.run (prevents the actual render loop from starting)
src.app_controller.AppController._load_active_project
src.app_controller.AppController._fetch_models
App._load_fonts
App._post_init
src.app_controller.AppController._prune_old_logs
src.app_controller.AppController.start_services
src.app_controller.AppController._init_ai_and_hooks
src.performance_monitor.PerformanceMonitor

Cleanup: Shuts down the controller after the test.
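
One way to apply a long list of patches and undo them together is `contextlib.ExitStack`. The sketch below shows the technique on a toy class; `Controller` and `make_mocked_controller` are hypothetical stand-ins, and whether the real mock_app fixture uses ExitStack or nested `with patch(...)` blocks is not stated in this guide.

```python
from contextlib import ExitStack
from unittest.mock import MagicMock, patch


class Controller:
    """Hypothetical stand-in for a controller with side-effecting methods."""

    def start_services(self) -> str:
        return "real services started"

    def _fetch_models(self) -> list[str]:
        return ["real-model"]


def make_mocked_controller(stack: ExitStack) -> Controller:
    """Patch the side-effecting methods, then build the object under test."""
    stack.enter_context(
        patch.object(Controller, "start_services", MagicMock(return_value=None))
    )
    stack.enter_context(
        patch.object(Controller, "_fetch_models", MagicMock(return_value=[]))
    )
    return Controller()


with ExitStack() as stack:
    controller = make_mocked_controller(stack)
    assert controller.start_services() is None  # patched: nothing real starts
    assert controller._fetch_models() == []

# Leaving the ExitStack block undoes every patch.
assert Controller().start_services() == "real services started"
```

In a fixture, the `yield` would sit inside the `with ExitStack()` block, so the patches stay active for the duration of the test and are removed during teardown.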

`app_instance` (line 190)

Purpose: Applies the same mocks as mock_app. It exists as a separate fixture for historical reasons (it is used in test_gui_phase4.py and test_token_viz.py). The two are interchangeable for most purposes.

Session-Scoped Fixtures (One Per Test Run)

`live_gui` (line 227)

Purpose: Start sloppy.py --enable-test-hooks for the entire test session. Integration tests use this to drive the real GUI via the Hook API.

Lifecycle:

Setup (once per session):
- Create tests/artifacts/live_gui_workspace/ temp directory
- Write manual_slop.toml and config.toml to the workspace
- Set up SLOP_* env vars to point at the workspace
- Symlink assets/ for fonts
- Launch sloppy.py --enable-test-hooks via subprocess.Popen
- Poll GET /status for up to 15 seconds (waiting for the HookServer to start)
- On failure: pytest.fail() (kills the process tree, aborts the session)
Yield: tests run
Teardown (once per session):
- Call ApiHookClient.reset_session() to clear GUI state
- Kill the process tree (Windows: taskkill /F /T, Unix: SIGKILL)
- Wait 0.5s for file handles to close
- Close the log file
- Remove the temp workspace (with 5 retries for Windows file locks)

Yield value: (process: subprocess.Popen, gui_script: str) — but most tests just take the fixture and use the ApiHookClient directly.
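
Two steps of the lifecycle above lend themselves to a short sketch: polling GET /status until the HookServer answers, and removing the workspace with retries. Both helpers are illustrative assumptions (`wait_for_status` and `remove_tree_with_retries` are hypothetical names, and the retry delay is a guess); only the 15-second budget, the /status endpoint, and the 5 retries come from this guide.

```python
import http.client
import shutil
import time
from pathlib import Path


def wait_for_status(host: str, port: int, timeout: float = 15.0,
                    interval: float = 0.25) -> bool:
    """Poll GET /status until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        conn = http.client.HTTPConnection(host, port, timeout=1.0)
        try:
            conn.request("GET", "/status")
            if conn.getresponse().status == 200:
                return True
        except (OSError, http.client.HTTPException):
            pass  # server not listening yet; try again
        finally:
            conn.close()
        time.sleep(interval)
    return False


def remove_tree_with_retries(path: Path, attempts: int = 5,
                             delay: float = 0.5) -> bool:
    """Delete a directory tree, retrying when a file lock blocks removal."""
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # already gone
        except OSError:
            time.sleep(delay)  # e.g. a Windows handle has not been released yet
    return False
```

A fixture built on these would call `pytest.fail()` when `wait_for_status` returns False, after killing the process tree.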

Usage pattern:

import time
from src.api_hook_client import ApiHookClient

def test_my_thing(live_gui):
    client = ApiHookClient()  # connects to localhost:8999
    client.click("btn_id")
    time.sleep(0.5)
    assert client.get_value("show_thing") is True

Test Categories

1. Unit Tests (no fixtures, fast)

Pure functions tested in isolation. No app, no GUI, no subprocess. Run in <100ms each.

Examples:

tests/test_command_palette.py — fuzzy matcher, command registry
tests/test_fuzzy_anchor.py — anchor slice algorithm
tests/test_paths.py — path resolution
tests/test_token_usage.py — token tracking
tests/test_cost_tracker.py — cost estimation

Pattern:

def test_my_unit():
    result = my_function(input)
    assert result == expected

2. Integration Tests (use `live_gui`, slow)

Drive the actual app via the Hook API. Run in 1-10 seconds each (real subprocess).

Examples:

tests/test_saved_presets_sim.py — preset switching via the GUI
tests/test_command_palette_sim.py — palette toggle, navigation
tests/test_mma_concurrent_tracks_sim.py — multi-track MMA
tests/test_workspace_profiles_sim.py — workspace profile save/load
tests/test_gui_dag_beads.py — Beads DAG visualization

Pattern:

def test_my_integration(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_my_method",
        "args": [arg1, arg2],
    })
    time.sleep(0.5)
    assert client.get_value("result") == expected

3. Mock App Tests (use `mock_app` or `app_instance`, fast)

Need an App instance but not the full render loop. Run in <500ms each.

Examples:

tests/test_text_viewer.py — text viewer state updates
tests/test_patch_modal.py — patch modal workflow
tests/test_gui2_events.py — event subscriptions

Pattern:

def test_my_thing(mock_app):
    mock_app.some_attr = "test_value"
    mock_app._do_something()
    assert mock_app.some_attr == "expected"

4. Headless Tests (no GUI, real services)

Test the FastAPI/headless service directly via the Hook API. No subprocess.

Examples:

tests/test_headless_service.py — service lifecycle
tests/test_headless_verification.py — full run with QA interceptor

5. Opt-in Tests (gated by env var)

Slow or network-dependent tests that don't run by default. Set the env var to enable.

| Test File | Marker | Env Var | Purpose |
|---|---|---|---|
| `tests/test_clean_install.py` | `@pytest.mark.clean_install` | `RUN_CLEAN_INSTALL_TEST=1` | Clones the repo to tmp and verifies the hook API |
| `tests/test_docker_build.py` | `@pytest.mark.docker` | `RUN_DOCKER_TEST=1` | Builds and runs the Docker image |

Running opt-in tests:

RUN_CLEAN_INSTALL_TEST=1 uv run pytest tests/test_clean_install.py -v
RUN_DOCKER_TEST=1 uv run pytest tests/test_docker_build.py -v

Markers

Defined in pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "integration: marks tests as integration tests (requires live GUI)",
]

Adding a new marker: add it to the list. Pytest will warn if a marker is used but not registered.

Filtering by marker:

uv run pytest -m integration         # Only integration tests
uv run pytest -m "not integration"   # Skip integration tests
uv run pytest -m clean_install      # Opt-in clean install tests

The Hook API (For Integration Tests)

The live GUI exposes a Hook API on http://127.0.0.1:8999 when launched with --enable-test-hooks. The ApiHookClient (src/api_hook_client.py) is the Python wrapper.

Key Methods

client = ApiHookClient()  # connects to localhost:8999 by default

# Click a button
client.click("btn_reset")

# Set a widget value
client.set_value("ui_ai_input", "Hello world")

# Push a generic GUI task
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [arg1, arg2],
})

# Get a value (gettable field)
value = client.get_value("show_command_palette")

# Wait for an event
event = client.wait_for_event("ai_response", timeout=10)

# Reset the session
client.reset_session()

`predefined_callbacks` Pattern

To make a test invoke an App method via the hook, register it in gui_2.py:

self.controller._predefined_callbacks['_my_method'] = self._my_method
self.controller._gettable_fields['show_thing'] = 'show_thing'

The test can then invoke _my_method via:

client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [],
})

This pattern is how the Command Palette's _toggle_command_palette is exposed for tests (since the keyboard shortcut can't be simulated via the hook).
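
To make the registration pattern concrete, here is a toy model of the two registries and how a custom_callback event and a get_value request might be resolved against them. It is a sketch under assumptions: `ToyController`, `ToyApp`, and `handle_event` are hypothetical, and the real controller's dispatch code is not shown in this guide. Only the two dictionary names and the event payload shape are taken from the text above.

```python
from typing import Any, Callable


class ToyController:
    """Hypothetical miniature of the registries described above."""

    def __init__(self, app: Any) -> None:
        self._app = app
        self._predefined_callbacks: dict[str, Callable[..., Any]] = {}
        self._gettable_fields: dict[str, str] = {}

    def handle_event(self, kind: str, payload: dict[str, Any]) -> bool:
        """Run a registered callback; return False if nothing handled the event."""
        if kind != "custom_callback":
            return False
        callback = self._predefined_callbacks.get(payload.get("callback", ""))
        if callback is None:
            return False  # unregistered name: the event does nothing
        callback(*payload.get("args", []))
        return True

    def get_value(self, field: str) -> Any:
        """Read a registered field from the app; unregistered fields give None."""
        attr = self._gettable_fields.get(field)
        return None if attr is None else getattr(self._app, attr)


class ToyApp:
    def __init__(self) -> None:
        self.show_thing = False
        self.controller = ToyController(self)
        # Same two registration lines as in the guide.
        self.controller._predefined_callbacks["_my_method"] = self._my_method
        self.controller._gettable_fields["show_thing"] = "show_thing"

    def _my_method(self) -> None:
        self.show_thing = not self.show_thing


app = ToyApp()
app.controller.handle_event("custom_callback", {"callback": "_my_method", "args": []})
assert app.controller.get_value("show_thing") is True
```

In this model, forgetting either registration line produces a silent no-op callback or a `None` value rather than an error.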

Common Patterns

Testing a Pure Function

def test_my_function():
    from src.mymodule import my_function
    result = my_function("input", 42)
    assert result == "expected"

Testing with a Mock App

from unittest.mock import MagicMock

def test_with_mock():
    app = MagicMock()
    app.some_attr = "test"
    from src.mymodule import do_thing
    do_thing(app)
    app.some_method.assert_called_once()

Testing via live_gui

import time
import pytest
from src.api_hook_client import ApiHookClient

def test_via_gui(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_some_method",
        "args": ["value"],
    })
    time.sleep(0.5)
    assert client.get_value("result") == "expected"

Testing an Exception Path

import pytest

def test_raises():
    from src.mymodule import do_thing
    with pytest.raises(ValueError, match="expected message"):
        do_thing(bad_input)

Parametrized Tests

import pytest

@pytest.mark.parametrize("input,expected", [
    ("a", 1),
    ("b", 2),
    ("c", 3),
])
def test_my_parametrized(input, expected):
    assert my_function(input) == expected

Test Configuration

`pyproject.toml`

[tool.pytest.ini_options]
asyncio_mode = "strict"
markers = [
    "integration: marks tests as integration tests (requires live GUI)",
]
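# asyncio_default_fixture_loop_scope: pytest-asyncio reports this as None when it is unset; TOML has no None literal, so the key cannot be written as "= None" in pyproject.toml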
asyncio_default_test_loop_scope = "function"

asyncio_mode = "strict" means async tests need explicit @pytest.mark.asyncio. This is intentional — most Manual Slop tests are synchronous.

Coverage

Run with coverage:

uv run pytest tests/ --cov=src --cov-report=html

Open htmlcov/index.html in a browser. Target: >80% coverage for new code (per the project's quality gates).

Running Tests

All Tests

uv run pytest tests/ -v

Warning: This runs all 251 test files, including the slow live_gui integration tests. Total runtime: 5-10 minutes.

Specific Test File

uv run pytest tests/test_command_palette.py -v

Specific Test

uv run pytest tests/test_command_palette.py::test_fuzzy_match_prefix_ranks_first -v

By Marker

uv run pytest -m integration -v      # Only integration tests
uv run pytest -m "not integration"   # Skip integration tests

With Stop on First Failure

uv run pytest tests/ -x -v

With Timeout

uv run pytest tests/ --timeout=60 -v   # --timeout is provided by the pytest-timeout plugin

Adding a New Test

For a Pure Function

Add tests to an existing tests/test_<module>.py file (if it exists) or create a new one
Use def test_<thing>(): naming convention
No fixtures needed unless you're reading state
Verify it runs: uv run pytest tests/test_<file>.py::test_<name> -v

For an Integration Test

Create or extend a *_sim.py file
Add def test_<thing>(live_gui): with the live_gui fixture
Use ApiHookClient to drive the GUI
If you need to invoke an App method that's not yet exposed, register it as a _predefined_callbacks entry in gui_2.py
Verify: uv run pytest tests/test_<file>_sim.py::test_<name> -v

For an Opt-in Test (Clean Install / Docker)

Mark with @pytest.mark.<marker_name>
Gate the entire file with a skip if the env var isn't set
Add the marker to pyproject.toml's markers list
Document the env var in the test file's docstring
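
The env-var gate in the steps above can be reduced to one small predicate. This is a minimal sketch: `opt_in_skip_reason` is a hypothetical helper, not something the project is known to define, and treating only the value "1" as enabled is an assumption based on the commands shown earlier.

```python
import os
from typing import Mapping, Optional


def opt_in_skip_reason(env_var: str,
                       environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason unless env_var is set to "1"."""
    if environ.get(env_var) == "1":
        return None
    return f"opt-in test: set {env_var}=1 to run"


if __name__ == "__main__":
    print(opt_in_skip_reason("RUN_DOCKER_TEST", {}))
    print(opt_in_skip_reason("RUN_DOCKER_TEST", {"RUN_DOCKER_TEST": "1"}))
```

At module level in the test file, the result can feed a file-wide skip, for example `pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")` where `reason = opt_in_skip_reason("RUN_DOCKER_TEST")`.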

Debugging Failed Tests

Verbose Output

uv run pytest tests/test_X.py -v -s

-s disables stdout/stderr capture so you can see print() output.

Stop at First Failure

uv run pytest tests/test_X.py -x

Enter PDB on Failure

uv run pytest tests/test_X.py --pdb

Show Local Variables on Failure

uv run pytest tests/test_X.py -l

Re-run Last Failed

uv run pytest --lf

Common Failure Modes

| Symptom | Likely Cause | Fix |
|---|---|---|
| `ImportError` for a module | Missing dependency or 1-space indent issue | Check pyproject.toml; run `uv sync` |
| `live_gui` times out | Previous test left a process running | `taskkill /F /IM python.exe` to clean up |
| `get_value` returns `None` | Field not registered as gettable | Add to `self.controller._gettable_fields` in `gui_2.py` |
| `custom_callback` does nothing | Callback not registered | Add to `self.controller._predefined_callbacks` |
| `IM_ASSERT: Must call EndChild()` | Modal end_child/end pairing broken (usually from a buggy action) | Wrap actions in try/except; check for `imgui.end_child()` before `imgui.end()` |
| `pytest.fail` from `live_gui` startup | Hook server didn't start in 15s | Check `logs/gui_2_py_test.log` for crash |

The `Audit Script`

scripts/check_test_toml_paths.py greps tests/ for direct ./<name>.toml references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.

Run it:

python scripts/check_test_toml_paths.py

Expected output:

OK: No tests reference real TOML files.

If violations are found, migrate the offender to use tmp_path + monkeypatch.
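
For orientation, a scanner of this kind could be as small as the sketch below. It is not the contents of scripts/check_test_toml_paths.py: the regular expression, the function names, and the non-zero exit value are assumptions. Only the "./<name>.toml" target and the OK message are taken from this guide.

```python
import re
from pathlib import Path

# Hypothetical pattern: a quoted "./<name>.toml" literal in test source.
REAL_TOML = re.compile(r"""["']\./[\w.-]+\.toml["']""")


def find_real_toml_refs(tests_dir: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every direct ./<name>.toml reference."""
    hits = []
    for path in sorted(tests_dir.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if REAL_TOML.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def main(tests_dir: Path) -> int:
    """Print violations and return a process exit code (0 means clean)."""
    hits = find_real_toml_refs(tests_dir)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}: {line}")
    if not hits:
        print("OK: No tests reference real TOML files.")
    return 1 if hits else 0


if __name__ == "__main__":
    raise SystemExit(main(Path("tests")))
```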

Test Data Flow

A typical test goes through this lifecycle:

Test starts
  ├─> isolate_workspace (autouse)
  │     ├─> Creates tmp dir
  │     └─> Sets SLOP_* env vars
  │
  ├─> reset_paths (autouse)
  │     └─> paths.reset_resolved()
  │
  ├─> reset_ai_client (autouse)
  │     └─> Resets ai_client global state
  │
  ├─> (test body runs)
  │     ├─> If using live_gui: subprocess already running (session-scoped)
  │     ├─> Test makes API calls via ApiHookClient
  │     └─> Test asserts on returned values
  │
  └─> Teardown
        ├─> reset_paths runs again
        └─> (autouse) state cleanup

The live_gui session fixture runs once at the start of the test session and tears down once at the end. All tests in the session share the same sloppy.py process.

Known Gotchas (2026-06-05)

Authoring Robust `live_gui` Tests (Don't Assume Clean State)

live_gui is a session-scoped fixture. All tests in a session share the same sloppy.py subprocess. The subprocess is not restarted between tests; its internal state (Fonts, DisplaySize, internal caches, current theme, current workspace profile, current discussion, current MMA track) accumulates from the previous test.

This is a test-authoring contract, not a fixture bug. A test that "passes when run after test X" but "fails when run in isolation" is a fragile test. Robust live_gui tests must:

Not assume clean state. Before invoking an operation, explicitly verify the precondition via the Hook API (e.g. client.get_value("show_my_window"), client.get_mma_status(), client.get_session()). Do not assume a previous test set the state.
Use the wait-for-ready pattern, not fixed sleeps. time.sleep(1) is often too short for ImGui to stabilize during the first few render frames; a fixed sleep would need 3+ seconds to be safe. Prefer wait_for_event with a generous timeout, or poll client.get_status() until ImGui reports ready. Fixed sleeps are a code smell: if you reach for one, the right answer is almost always to poll a gettable field instead.
Reset state explicitly if the test depends on it. For tests that mutate state (e.g. "click button X"), reset the relevant state via Hook API in a try/finally so the next test starts from a known baseline. Alternatively, use a function-scoped helper that issues a reset_session callback before the test body.
Test both in the full suite AND in isolation before merging. If a test passes in the full suite but fails in isolation, the test is fragile — fix the test, don't add a "warmup" comment. Bisecting by pytest path::test -k "filter" or pytest --collect-only --quiet helps.

Use get_value/wait_for_event to assert ready, not just to assert success. Example:

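def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    # Poll until the modal actually appears, not just until the click is dispatched
    deadline = time.monotonic() + 5.0
    while not client.get_value("show_settings_modal") and time.monotonic() < deadline:
        time.sleep(0.05)
    assert client.get_value("show_settings_modal"), "settings modal did not open"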

The get_value poll doubles as a wait-for-ready AND a correctness assertion.

Anti-pattern (fragile):

def test_open_settings_modal(live_gui):
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    time.sleep(1)  # hope the modal opened
    assert some_cached_value["settings_open"] is True  # may be stale from a prior test

Pattern (robust):

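def test_open_settings_modal(live_gui):
    client = ApiHookClient()
    client.reset_session()  # start from a known baseline via the Hook API
    client.push_event("custom_callback", {"callback": "_toggle_settings", "args": []})
    deadline = time.monotonic() + 5.0
    while not client.get_value("show_settings_modal") and time.monotonic() < deadline:
        time.sleep(0.05)
    assert client.get_value("show_settings_modal"), "settings modal did not open"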
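
The polling idea behind the robust pattern can be factored into a small helper so individual tests stay short. This is a minimal sketch: `wait_until` is a hypothetical helper (the guide does not say the project ships one), and `FakeClient` merely stands in for ApiHookClient so the example runs without the GUI.

```python
import time
from typing import Any, Callable


def wait_until(probe: Callable[[], Any], timeout: float = 5.0,
               interval: float = 0.05) -> Any:
    """Poll probe() until it returns a truthy value; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        value = probe()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.1f}s")
        time.sleep(interval)


class FakeClient:
    """Stand-in for ApiHookClient: the field reads True from the third call on."""

    def __init__(self) -> None:
        self.reads = 0

    def get_value(self, field: str) -> bool:
        self.reads += 1
        return self.reads >= 3


client = FakeClient()
assert wait_until(lambda: client.get_value("show_settings_modal"), timeout=1.0) is True
```

With a real client the call is the same, `wait_until(lambda: client.get_value("show_settings_modal"))`, and a TimeoutError then points at the field that never became ready.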

Early-Render C-Level Crashes (Defer-Not-Catch Pattern)

imgui.save_ini_settings_to_memory() (and similar raw imgui calls that read internal state) will crash the Python process at the C level (0xc0000005 access violation) if called before ImGui's internal state is fully initialized. This is not catchable from Python — try/except Exception cannot intercept native access violations.

Symptoms:

The sloppy.py subprocess disappears without a Python traceback.
The pytest output shows pytest.fail("Hook server did not start in 15s") (the subprocess died during startup).
Windows Event Viewer shows Faulting module: _imgui_bundle.cp311-win_amd64.pyd with exception code 0xc0000005.

Fix pattern: defer-not-catch. Track a one-shot "ready" flag in the instance state; return early on the first call, only invoking the C function on subsequent calls:

def _capture_workspace_profile(self, name: str) -> models.WorkspaceProfile:
    if not getattr(self, "_ini_capture_ready", False):
        self._ini_capture_ready = True
        return models.WorkspaceProfile(name=name, docking_layout=b"", ...)
    ini = imgui.save_ini_settings_to_memory()
    return models.WorkspaceProfile(name=name, docking_layout=ini.encode("utf-8") if isinstance(ini, str) else ini, ...)

The first call (during initial startup) returns a safe empty profile and flips the flag; subsequent calls (when the user actually clicks "Save Profile") invoke the C function. The user's workflow is unaffected because the first call is non-blocking and the user cannot have clicked "Save Profile" before the GUI was fully rendered.

See src/gui_2.py:601-606 for the canonical implementation. This pattern unblocks 4-5 live_gui tests that were crashing the GUI subprocess during the first render frames after _capture_workspace_profile was invoked by the test (typically via a save_workspace_profile Hook API callback).
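
The behaviour of the one-shot flag can be checked without ImGui by swapping the native call for a harmless stand-in. Everything in this sketch is hypothetical (`Profile`, `ProfileCapturer`, `fake_native_capture`); it only mirrors the control flow of the snippet above and is not the code in src/gui_2.py.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    docking_layout: bytes


def fake_native_capture() -> str:
    """Stand-in for the native call that is unsafe before the first render."""
    return "[Window][Main]\nPos=0,0\n"


class ProfileCapturer:
    def __init__(self) -> None:
        self._ini_capture_ready = False
        self.native_calls = 0  # counts how often the "native" path ran

    def capture(self, name: str) -> Profile:
        if not self._ini_capture_ready:
            # First call: flip the flag and return an empty layout
            # without touching native state.
            self._ini_capture_ready = True
            return Profile(name=name, docking_layout=b"")
        self.native_calls += 1
        ini = fake_native_capture()
        return Profile(name=name, docking_layout=ini.encode("utf-8"))


capturer = ProfileCapturer()
first = capturer.capture("default")
second = capturer.capture("default")
assert first.docking_layout == b"" and capturer.native_calls == 1
assert second.docking_layout.startswith(b"[Window]")
```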
