
docs(testing): add comprehensive guide_testing.md

Comprehensive guide covering the 251-test-file suite:
- Test file layout and naming conventions
- 7 conftest.py fixtures (isolate_workspace, reset_paths, reset_ai_client, vlogger, mock_app, app_instance, live_gui) plus the kill_process_tree helper, with their mechanisms
- 5 test categories (unit, integration, mock app, headless, opt-in)
- Markers (integration, clean_install, docker) and how to filter by them
- Hook API for integration tests (ApiHookClient methods, predefined_callbacks pattern)
- Common test patterns (pure function, mock, live_gui, exception, parametrized)
- Test configuration in pyproject.toml
- Running tests (all, by file, by marker, with timeout, etc.)
- Adding new tests (pure, integration, opt-in)
- Debugging failed tests (common failure modes and fixes)
- The check_test_toml_paths.py audit script
- Test data flow diagram
2026-06-02 23:05:02 -04:00
parent a280706ce4
commit 7825588200
3 changed files with 610 additions and 20 deletions
+1 -1
@@ -38,7 +38,7 @@ separate_external_tools = false
"AI Settings" = true
"MMA Dashboard" = false
"Task DAG" = false
"Usage Analytics" = true
"Usage Analytics" = false
"Tier 1" = false
"Tier 2" = false
"Tier 3" = false
+590
@@ -0,0 +1,590 @@
# Testing Guide
[Top](../README.md) | [Architecture](guide_architecture.md) | [Simulations](guide_simulations.md) | [Workflow](../../conductor/workflow.md)
---
## Overview
Manual Slop has **251 test files** in `tests/` covering every subsystem. The test infrastructure is designed around four principles:
1. **No real user files touched**: every test gets a sandboxed workspace via the `isolate_workspace` autouse fixture, so config and preset TOMLs are read from and written to a temp directory.
2. **No real AI calls**: tests use mock providers, reset session state, and never hit the network.
3. **GUI tests launch a real app**: the `live_gui` session fixture starts `sloppy.py --enable-test-hooks` so integration tests can drive the actual app via the Hook API.
4. **Expensive tests are opt-in**: tests fall into categories (unit, integration, mock app, headless, opt-in), and markers (`integration`, `clean_install`, `docker`) let CI select or skip the expensive ones.
This guide is the canonical reference for how the test suite is structured and how to add new tests.
---
## Test File Layout
```
tests/
├── conftest.py # Session-wide fixtures (live_gui, isolate_workspace, etc.); the canonical source
├── test_*.py # 251 test files, named `test_<topic>_<aspect>.py`
├── *_sim.py # Integration tests using the live_gui fixture
├── test_clean_install.py # Opt-in: clones the repo to tmp and verifies hooks
├── test_docker_build.py # Opt-in: builds and runs the Docker image
├── test_arch_boundary_phase1.py # Architectural boundary tests
├── test_enforce_no_real_toml.py # Meta-test for the enforcer fixture
├── artifacts/ # Git-ignored; test output
├── logs/ # Git-ignored; live_gui log files
└── mock_concurrent_mma.py # Mock providers for MMA tests
```
**Naming conventions:**
- `test_*.py` — pytest collection
- `*_sim.py` — integration test (uses `live_gui`)
- `*_e2e.py` — end-to-end test (real processes, opt-in via env var)
- `test_<area>_<aspect>.py` — single aspect of an area, e.g., `test_ai_client_cli.py`
---
## The `conftest.py` Fixtures
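The `tests/conftest.py` file defines 7 fixtures plus one helper function (`kill_process_tree`). They are grouped below by how they are applied: autouse fixtures first, then opt-in function-scoped fixtures, then the session-scoped `live_gui`. This grouping is not execution order: pytest sets up higher-scoped fixtures such as `live_gui` before function-scoped ones.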
### Autouse Fixtures (Run Before Every Test)
#### `isolate_workspace` (line 70)
**Purpose**: Give every test a fresh, isolated workspace so it cannot pollute the user's real `manual_slop.toml`, `presets.toml`, etc.
**Mechanism**:
1. Creates a temp directory via `tmp_path_factory.mktemp("isolated_workspace")`
2. Writes a fresh `config.toml` to the temp dir
3. Sets `SLOP_CONFIG`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` env vars to point at the temp dir
4. The app reads these env vars on startup, so anything the test loads or saves resolves into the temp dir instead of the user's real files
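A minimal standard-library sketch of steps 2 and 3. The helper name `isolate_into` and every file name except `config.toml` and `presets.toml` are hypothetical; the real fixture in `tests/conftest.py` may name and populate things differently. Passing `monkeypatch.setenv` as `setenv` lets pytest restore every variable after the test.

```python
from pathlib import Path
from typing import Callable

# Env var -> file name inside the sandbox (names other than config.toml and
# presets.toml are hypothetical).
SANDBOX_FILES = {
    "SLOP_CONFIG": "config.toml",
    "SLOP_GLOBAL_PRESETS": "presets.toml",
    "SLOP_GLOBAL_TOOL_PRESETS": "tool_presets.toml",
    "SLOP_GLOBAL_PERSONAS": "personas.toml",
    "SLOP_GLOBAL_WORKSPACE_PROFILES": "workspace_profiles.toml",
}


def isolate_into(workspace: Path, setenv: Callable[[str, str], None]) -> dict[str, str]:
    """Write a fresh config.toml into workspace and point every SLOP_* var at it."""
    workspace.mkdir(parents=True, exist_ok=True)
    # Placeholder content; the real default config is not shown in this guide.
    (workspace / "config.toml").write_text("# isolated test config\n", encoding="utf-8")
    env = {var: str(workspace / name) for var, name in SANDBOX_FILES.items()}
    for var, value in env.items():
        setenv(var, value)
    return env


# Hypothetical wiring in tests/conftest.py:
#
# @pytest.fixture(autouse=True)
# def isolate_workspace(tmp_path_factory, monkeypatch):
#     isolate_into(tmp_path_factory.mktemp("isolated_workspace"), monkeypatch.setenv)
```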
**Verification**: `python scripts/check_test_toml_paths.py` exits 0 (no test references real TOMLs).
#### `reset_paths` (line 95)
**Purpose**: Reset the `src.paths` global state before and after each test.
**Mechanism**: Calls `paths.reset_resolved()` so path resolution re-evaluates on the next access.
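Why the reset matters: if path lookups are cached on first access, a value computed under one test's `SLOP_CONFIG` would leak into the next test. A sketch of that failure mode and the fix, assuming a module-level cache; `get_config_path` and the `config.toml` fallback are hypothetical, not the real `src.paths` API.

```python
import os
from pathlib import Path

# Hypothetical module-level cache, filled on first access.
_resolved: dict[str, Path] = {}


def get_config_path() -> Path:
    """Resolve the config path once, then serve it from the cache."""
    if "config" not in _resolved:
        _resolved["config"] = Path(os.environ.get("SLOP_CONFIG", "config.toml"))
    return _resolved["config"]


def reset_resolved() -> None:
    """Forget cached paths so the next access re-reads the environment."""
    _resolved.clear()
```

Without the reset, changing `SLOP_CONFIG` between tests has no effect because the cached path is returned.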
#### `reset_ai_client` (line 107)
**Purpose**: Prevent `ai_client` state from leaking between tests.
**Mechanism**:
1. Calls `ai_client.reset_session()`
2. Clears callback hooks (`confirm_and_run_callback`, `comms_log_callback`, `tool_log_callback`)
3. Clears all event listeners
4. Resets provider to `("gemini", "gemini-2.5-flash-lite")`
5. Resets MCP client state via `mcp_client.configure([], [])`
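The reset-before-and-after shape can be sketched with a context manager, which is essentially what a yield-based pytest fixture is. `ClientState` is a hypothetical stand-in for `ai_client`'s module globals, and resetting on both sides of the test is an assumption; the steps above do not say when they run.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional

DEFAULT_PROVIDER = ("gemini", "gemini-2.5-flash-lite")


@dataclass
class ClientState:
    """Hypothetical stand-in for ai_client's module-level globals."""
    provider: tuple = DEFAULT_PROVIDER
    confirm_and_run_callback: Optional[Callable] = None
    comms_log_callback: Optional[Callable] = None
    tool_log_callback: Optional[Callable] = None
    listeners: list = field(default_factory=list)

    def reset(self) -> None:
        """Restore every field to its default (mirrors steps 2-4 above)."""
        self.provider = DEFAULT_PROVIDER
        self.confirm_and_run_callback = None
        self.comms_log_callback = None
        self.tool_log_callback = None
        self.listeners.clear()


@contextmanager
def clean_client(state: ClientState) -> Iterator[ClientState]:
    state.reset()  # setup: scrub whatever a previous test left behind
    try:
        yield state
    finally:
        state.reset()  # teardown: runs even if the test body raised
```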
### Function-Scoped Fixtures (Opt-in)
#### `vlogger` (line 131)
**Purpose**: Provide a `VerificationLogger` instance for structured diagnostic logging.
**Usage**:
```python
def test_my_thing(vlogger):
    vlogger.log_state("Field", "before_value", "after_value")
    # ... test logic ...
    vlogger.finalize("Test Title", "PASS", "result message")
```
Output: `tests/logs/<timestamp>/<script_name>.txt`
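To make the two calls concrete, here is a hypothetical minimal logger with the same method names. The class name, timestamp format, and line layout inside the `.txt` are assumptions; only the method names and the `tests/logs/<timestamp>/<script_name>.txt` location come from this guide.

```python
import time
from pathlib import Path


class MiniVerificationLogger:
    """Hypothetical minimal logger exposing log_state() and finalize()."""

    def __init__(self, script_name: str, root: Path = Path("tests/logs")) -> None:
        stamp = time.strftime("%Y%m%d_%H%M%S")
        self.path = root / stamp / f"{script_name}.txt"
        self.lines: list[str] = []

    def log_state(self, field: str, before: object, after: object) -> None:
        # One line per observed transition, e.g. "Field: 'a' -> 'b'"
        self.lines.append(f"{field}: {before!r} -> {after!r}")

    def finalize(self, title: str, status: str, message: str) -> Path:
        # Write header, transitions, and verdict, then return the file path
        self.path.parent.mkdir(parents=True, exist_ok=True)
        body = [f"== {title} [{status}] ==", *self.lines, message]
        self.path.write_text("\n".join(body) + "\n", encoding="utf-8")
        return self.path
```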
#### `kill_process_tree` (function, line 138)
**Purpose**: Robustly kill a process and all its children. Used by `live_gui` for cleanup, but available to any test.
**Mechanism**:
- Windows: `taskkill /F /T /PID <pid>` (the `/T` flag is critical — kills the whole tree)
- Unix: `os.killpg(os.getpgid(pid), SIGKILL)` (kills the process group)
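A cross-platform sketch of the two branches. One caveat for the Unix branch: `os.killpg` signals the whole process group, so the child must have been started in its own group (for example `subprocess.Popen(..., start_new_session=True)`); otherwise the group is the test runner's own and the runner kills itself. This guide does not say how `live_gui` launches the child, so treat that flag as an assumption of the sketch.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Kill pid and all of its descendants; a process that already exited is ignored."""
    if sys.platform == "win32":
        # /T walks the child tree, /F forces termination; a failure is not fatal.
        subprocess.run(["taskkill", "/F", "/T", "/PID", str(pid)],
                       capture_output=True, check=False)
        return
    try:
        # Signals every process in pid's group, so pid must lead its own group.
        os.killpg(os.getpgid(pid), signal.SIGKILL)
    except ProcessLookupError:
        pass  # already gone
```

Usage: start the child with `subprocess.Popen([...], start_new_session=True)`, then call `kill_process_tree(proc.pid)` followed by `proc.wait()` to reap it.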
#### `mock_app` (line 157)
**Purpose**: Create an `App` instance with all external side effects mocked. For unit tests that need the App but not the GUI loop.
**Mocks applied**:
- `src.models.load_config` → returns a default config
- `src.gui_2.project_manager`
- `src.gui_2.session_logger`
- `src.gui_2.immapp.run` (prevents the actual render loop from starting)
- `src.app_controller.AppController._load_active_project`
- `src.app_controller.AppController._fetch_models`
- `App._load_fonts`
- `App._post_init`
- `src.app_controller.AppController._prune_old_logs`
- `src.app_controller.AppController.start_services`
- `src.app_controller.AppController._init_ai_and_hooks`
- `src.performance_monitor.PerformanceMonitor`
**Cleanup**: Shuts down the controller after the test.
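Patching a dozen targets is easiest with `contextlib.ExitStack`, which undoes every patch when the stack closes. A sketch against a hypothetical `Controller` class using `unittest.mock.patch.object`; against the real modules the same loop would take dotted targets like those listed above, e.g. `patch("src.app_controller.AppController.start_services")`.

```python
from contextlib import ExitStack
from unittest.mock import patch


class Controller:
    """Hypothetical stand-in for a class whose methods have side effects."""

    def start_services(self) -> None:
        raise RuntimeError("would start real services")

    def _fetch_models(self) -> None:
        raise RuntimeError("would hit the network")


# Names to neutralize, analogous to the AppController entries listed above.
SIDE_EFFECT_METHODS = ["start_services", "_fetch_models"]


def build_mocked_controller(stack: ExitStack) -> Controller:
    """Patch every side-effecting method on the class; closing the stack restores them."""
    for name in SIDE_EFFECT_METHODS:
        stack.enter_context(patch.object(Controller, name))
    return Controller()
```

In a fixture, `with ExitStack() as stack: app = build_mocked_controller(stack); yield app` keeps the patches active for the test body and removes them afterwards.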
#### `app_instance` (line 190)
**Purpose**: Same as `mock_app`, with a nearly identical set of mocks; historically used by `test_gui_phase4.py` and `test_token_viz.py`. The two are interchangeable for most purposes.
### Session-Scoped Fixtures (One Per Test Run)
#### `live_gui` (line 227)
**Purpose**: Start `sloppy.py --enable-test-hooks` for the entire test session. Integration tests use this to drive the real GUI via the Hook API.
**Lifecycle**:
1. **Setup (once per session)**:
- Create `tests/artifacts/live_gui_workspace/` temp directory
- Write `manual_slop.toml` and `config.toml` to the workspace
- Set up `SLOP_*` env vars to point at the workspace
- Symlink `assets/` for fonts
- Launch `sloppy.py --enable-test-hooks` via `subprocess.Popen`
- Poll `GET /status` for up to 15 seconds (waiting for the HookServer to start)
- On failure: kill the process tree and call `pytest.fail()`, so every test that requests `live_gui` errors out
2. **Yield**: tests run
3. **Teardown (once per session)**:
- Call `ApiHookClient.reset_session()` to clear GUI state
- Kill the process tree (Windows: `taskkill /F /T`, Unix: `SIGKILL`)
- Wait 0.5s for file handles to close
- Close the log file
- Remove the temp workspace (with 5 retries for Windows file locks)
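Two steps of that lifecycle are timing-sensitive: waiting for `GET /status` and removing the workspace while Windows may still hold file locks. A standard-library sketch of both, with the 15-second window, port 8999, and five retries taken from this guide; the function names, poll interval, and 0.5 s retry delay are assumptions.

```python
import http.client
import shutil
import time
import urllib.request
from pathlib import Path

# An opener with no proxies, so a localhost poll never goes through http_proxy.
_NO_PROXY = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_hook_server(url: str = "http://127.0.0.1:8999/status",
                         timeout: float = 15.0, interval: float = 0.25) -> bool:
    """Poll url until it answers HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _NO_PROXY.open(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, or a non-200 reply (HTTPError is an OSError)
        time.sleep(interval)
    return False


def remove_with_retries(path: Path, attempts: int = 5, delay: float = 0.5) -> None:
    """shutil.rmtree, retried because Windows can hold file handles briefly."""
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except FileNotFoundError:
            return  # already removed
        except PermissionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```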
**Yield value**: `(process: subprocess.Popen, gui_script: str)` — but most tests just take the fixture and use the `ApiHookClient` directly.
**Usage pattern**:
```python
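import time

from src.api_hook_client import ApiHookClient

def test_my_thing(live_gui):
    client = ApiHookClient()  # connects to localhost:8999
    client.click("btn_id")
    time.sleep(0.5)  # give the GUI a moment to process the click
    assert client.get_value("show_thing") is True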
```
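A fixed `time.sleep(0.5)` is both slow and timing-sensitive. Polling until the condition holds is a common alternative; `wait_until` below is a hypothetical helper, not part of `ApiHookClient`.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Re-check predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the usage pattern above, the sleep plus assert would become `assert wait_until(lambda: client.get_value("show_thing") is True)`, which returns as soon as the GUI catches up.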
---
## Test Categories
### 1. Unit Tests (no fixtures, fast)
Pure functions tested in isolation. No app, no GUI, no subprocess. Run in <100ms each.
**Examples**:
- `tests/test_command_palette.py` — fuzzy matcher, command registry
- `tests/test_fuzzy_anchor.py` — anchor slice algorithm
- `tests/test_paths.py` — path resolution
- `tests/test_token_usage.py` — token tracking
- `tests/test_cost_tracker.py` — cost estimation
**Pattern**:
```python
from src.mymodule import my_function  # placeholder: the function under test

def test_my_unit():
    result = my_function("input")
    assert result == "expected"
```
### 2. Integration Tests (use `live_gui`, slow)
Drive the actual app via the Hook API. Run in 1-10 seconds each (real subprocess).
**Examples**:
- `tests/test_saved_presets_sim.py` — preset switching via the GUI
- `tests/test_command_palette_sim.py` — palette toggle, navigation
- `tests/test_mma_concurrent_tracks_sim.py` — multi-track MMA
- `tests/test_workspace_profiles_sim.py` — workspace profile save/load
- `tests/test_gui_dag_beads.py` — Beads DAG visualization
**Pattern**:
```python
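import time

from src.api_hook_client import ApiHookClient

def test_my_integration(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_my_method",  # must be registered in _predefined_callbacks
        "args": ["arg1", "arg2"],
    })
    time.sleep(0.5)
    assert client.get_value("result") == "expected"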
```
### 3. Mock App Tests (use `mock_app` or `app_instance`, fast)
Need an App instance but not the full render loop. Run in <500ms each.
**Examples**:
- `tests/test_text_viewer.py` — text viewer state updates
- `tests/test_patch_modal.py` — patch modal workflow
- `tests/test_gui2_events.py` — event subscriptions
**Pattern**:
```python
def test_my_thing(mock_app):
    mock_app.some_attr = "test_value"
    mock_app._do_something()
    assert mock_app.some_attr == "expected"
```
### 4. Headless Tests (no GUI, real services)
Test the FastAPI/headless service directly via the Hook API. No subprocess.
**Examples**:
- `tests/test_headless_service.py` — service lifecycle
- `tests/test_headless_verification.py` — full run with QA interceptor
### 5. Opt-in Tests (gated by env var)
Slow or network-dependent tests that don't run by default. Set the env var to enable.
| Test File | Marker | Env Var | Purpose |
|---|---|---|---|
| `tests/test_clean_install.py` | `@pytest.mark.clean_install` | `RUN_CLEAN_INSTALL_TEST=1` | Clones the repo to tmp and verifies the hook API |
| `tests/test_docker_build.py` | `@pytest.mark.docker` | `RUN_DOCKER_TEST=1` | Builds and runs the Docker image |
**Running opt-in tests**:
```bash
RUN_CLEAN_INSTALL_TEST=1 uv run pytest tests/test_clean_install.py -v
RUN_DOCKER_TEST=1 uv run pytest tests/test_docker_build.py -v
```
---
## Markers
Defined in `pyproject.toml`:
```toml
[tool.pytest.ini_options]
markers = [
"integration: marks tests as integration tests (requires live GUI)",
]
```
**Adding a new marker**: add it to the list. Pytest emits `PytestUnknownMarkWarning` when a marker is used but not registered, and `--strict-markers` turns that into an error.
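The `markers` list shown above registers only `integration`, yet `clean_install` and `docker` are also in use. If they are not registered elsewhere, one alternative to editing `pyproject.toml` is pytest's `pytest_configure` hook in `tests/conftest.py`. The description strings below are illustrative.

```python
# Hypothetical addition to tests/conftest.py
OPT_IN_MARKERS = {
    "clean_install": "clones the repo to tmp and verifies the hook API (RUN_CLEAN_INSTALL_TEST=1)",
    "docker": "builds and runs the Docker image (RUN_DOCKER_TEST=1)",
}


def pytest_configure(config):
    """pytest hook: register each opt-in marker so pytest does not warn about it."""
    for name, description in OPT_IN_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```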
**Filtering by marker**:
```bash
uv run pytest -m integration # Only integration tests
uv run pytest -m "not integration" # Skip integration tests
RUN_CLEAN_INSTALL_TEST=1 uv run pytest -m clean_install  # Opt-in clean install tests (skipped without the env var)
```
---
## The Hook API (For Integration Tests)
The live GUI exposes a Hook API on `http://127.0.0.1:8999` when launched with `--enable-test-hooks`. The `ApiHookClient` (`src/api_hook_client.py`) is the Python wrapper.
### Key Methods
```python
from src.api_hook_client import ApiHookClient

client = ApiHookClient()  # connects to localhost:8999 by default
# Click a button
client.click("btn_reset")
# Set a widget value
client.set_value("ui_ai_input", "Hello world")
# Push a generic GUI task
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": ["arg1", "arg2"],
})
# Get a value (gettable field)
value = client.get_value("show_command_palette")
# Wait for an event
event = client.wait_for_event("ai_response", timeout=10)
# Reset the session
client.reset_session()
```
### `predefined_callbacks` Pattern
To make a test invoke an App method via the hook, register it in `gui_2.py`:
```python
self.controller._predefined_callbacks['_my_method'] = self._my_method
self.controller._gettable_fields['show_thing'] = 'show_thing'
```
The test can then invoke `_my_method` via:
```python
client.push_event("custom_callback", {
    "callback": "_my_method",
    "args": [],
})
```
This pattern is how the Command Palette's `_toggle_command_palette` is exposed for tests (since the keyboard shortcut can't be simulated via the hook).
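A hypothetical model of the name-based dispatch these two registries enable. It is not the real `AppController` code: the real path presumably queues the call onto the GUI thread, whereas this sketch calls it directly. Unregistered names return `None`, matching the "does nothing" and "returns `None`" symptoms listed under Common Failure Modes.

```python
from typing import Any, Callable, Optional


class HookDispatcher:
    """Hypothetical model of name-based dispatch; not the real AppController code."""

    def __init__(self, owner: object) -> None:
        self.owner = owner  # object whose attributes gettable fields read from
        self._predefined_callbacks: dict[str, Callable[..., Any]] = {}
        self._gettable_fields: dict[str, str] = {}

    def handle_custom_callback(self, payload: dict) -> Optional[Any]:
        """Run the callable registered under payload["callback"] with payload["args"]."""
        func = self._predefined_callbacks.get(payload["callback"])
        if func is None:
            return None  # unregistered names are ignored, so nothing visibly happens
        return func(*payload.get("args", []))

    def get_value(self, key: str) -> Any:
        """Read a whitelisted attribute; anything unregistered reads as None."""
        attr = self._gettable_fields.get(key)
        return getattr(self.owner, attr, None) if attr is not None else None
```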
---
## Common Patterns
### Testing a Pure Function
```python
def test_my_function():
    from src.mymodule import my_function
    result = my_function("input", 42)
    assert result == "expected"
```
### Testing with a Mock App
```python
from unittest.mock import MagicMock

def test_with_mock():
    app = MagicMock()
    app.some_attr = "test"
    from src.mymodule import do_thing
    do_thing(app)
    app.some_method.assert_called_once()
```
### Testing via live_gui
```python
import time
import pytest
from src.api_hook_client import ApiHookClient
def test_via_gui(live_gui):
    client = ApiHookClient()
    client.push_event("custom_callback", {
        "callback": "_some_method",
        "args": ["value"],
    })
    time.sleep(0.5)
    assert client.get_value("result") == "expected"
```
### Testing an Exception Path
```python
import pytest

def test_raises():
    from src.mymodule import do_thing
    with pytest.raises(ValueError, match="expected message"):
        do_thing("bad input")
```
### Parametrized Tests
```python
import pytest

from src.mymodule import my_function

@pytest.mark.parametrize("value,expected", [
    ("a", 1),
    ("b", 2),
    ("c", 3),
])
def test_my_parametrized(value, expected):
    assert my_function(value) == expected
```
---
## Test Configuration
### `pyproject.toml`
```toml
[tool.pytest.ini_options]
asyncio_mode = "strict"
markers = [
"integration: marks tests as integration tests (requires live GUI)",
]
# asyncio_default_fixture_loop_scope: left unset (TOML has no None value)
asyncio_default_test_loop_scope = "function"
```
`asyncio_mode = "strict"` means async tests need explicit `@pytest.mark.asyncio`. This is intentional — most Manual Slop tests are synchronous.
### Coverage
Run with coverage (requires the `pytest-cov` plugin):
```bash
uv run pytest tests/ --cov=src --cov-report=html
```
Open `htmlcov/index.html` in a browser. Target: >80% coverage for new code (per the project's quality gates).
---
## Running Tests
### All Tests
```bash
uv run pytest tests/ -v
```
**Warning**: This runs all 251 test files, including the slow `live_gui` integration tests. Total runtime: 5-10 minutes.
### Specific Test File
```bash
uv run pytest tests/test_command_palette.py -v
```
### Specific Test
```bash
uv run pytest tests/test_command_palette.py::test_fuzzy_match_prefix_ranks_first -v
```
### By Marker
```bash
uv run pytest -m integration -v # Only integration tests
uv run pytest -m "not integration" # Skip integration tests
```
### With Stop on First Failure
```bash
uv run pytest tests/ -x -v
```
### With Timeout (requires the `pytest-timeout` plugin)
```bash
uv run pytest tests/ --timeout=60 -v
```
---
## Adding a New Test
### For a Pure Function
1. Add tests to an existing `tests/test_<module>.py` file (if it exists) or create a new one
2. Use `def test_<thing>():` naming convention
3. No fixtures needed; the autouse fixtures (`isolate_workspace`, `reset_paths`, `reset_ai_client`) still run automatically
4. Verify it runs: `uv run pytest tests/test_<file>.py::test_<name> -v`
### For an Integration Test
1. Create or extend a `*_sim.py` file
2. Add `def test_<thing>(live_gui):` with the live_gui fixture
3. Use `ApiHookClient` to drive the GUI
4. If you need to invoke an App method that's not yet exposed, register it as a `_predefined_callbacks` entry in `gui_2.py`
5. Verify: `uv run pytest tests/test_<file>_sim.py::test_<name> -v`
### For an Opt-in Test (Clean Install / Docker)
1. Mark with `@pytest.mark.<marker_name>`
2. Gate the entire file with a skip if the env var isn't set
3. Add the marker to `pyproject.toml`'s `markers` list
4. Document the env var in the test file's docstring
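Steps 1, 2, and 4 can be combined in a module-level `pytestmark`, which applies every listed mark to every test in the file. A sketch for the clean-install case; the test function body is a hypothetical placeholder.

```python
"""Opt-in tests: set RUN_CLEAN_INSTALL_TEST=1 to run this file."""
import os

import pytest

ENV_VAR = "RUN_CLEAN_INSTALL_TEST"

# Every test in this file gets the marker and is skipped unless ENV_VAR is "1".
pytestmark = [
    pytest.mark.clean_install,
    pytest.mark.skipif(
        os.environ.get(ENV_VAR) != "1",
        reason=f"set {ENV_VAR}=1 to run clean-install tests",
    ),
]


def test_clean_install_placeholder():
    # Hypothetical body: clone the repo to tmp_path, launch it, query the hook API.
    ...
```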
---
## Debugging Failed Tests
### Verbose Output
```bash
uv run pytest tests/test_X.py -v -s
```
`-s` disables stdout/stderr capture so you can see print() output.
### Stop at First Failure
```bash
uv run pytest tests/test_X.py -x
```
### Enter PDB on Failure
```bash
uv run pytest tests/test_X.py --pdb
```
### Show Local Variables on Failure
```bash
uv run pytest tests/test_X.py -l
```
### Re-run Last Failed
```bash
uv run pytest --lf
```
### Common Failure Modes
| Symptom | Likely Cause | Fix |
|---|---|---|
| `ImportError` for a module | Missing dependency or 1-space indent issue | Check pyproject.toml; run `uv sync` |
| `live_gui` times out | A previous run left a `sloppy.py` process running | Kill it: Windows `taskkill /F /IM python.exe` (kills every Python process), Unix `pkill -f sloppy.py` |
| `get_value` returns `None` | Field not registered as gettable | Add to `self.controller._gettable_fields` in `gui_2.py` |
| `custom_callback` does nothing | Callback not registered | Add to `self.controller._predefined_callbacks` |
| `IM_ASSERT: Must call EndChild()` | Modal end_child/end pairing broken (usually from a buggy action) | Wrap actions in try/except; check for `imgui.end_child()` before `imgui.end()` |
| `pytest.fail` from `live_gui` startup | Hook server didn't start in 15s | Check `logs/gui_2_py_test.log` for crash |
---
## The Audit Script
`scripts/check_test_toml_paths.py` greps `tests/` for direct `./<name>.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.
**Run it**:
```bash
python scripts/check_test_toml_paths.py
```
**Expected output**:
```
OK: No tests reference real TOML files.
```
If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`.
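The real script's matching rules are not shown in this guide. A hypothetical reimplementation of the described behavior (scan `tests/` for `./<name>.toml` references and exit 0 only when there are none) might look like this; the regex and the violation message format are assumptions, while the success message matches the expected output above.

```python
import re
import sys
from pathlib import Path

# Hypothetical pattern for a direct relative reference such as "./manual_slop.toml".
TOML_REF = re.compile(r"\./[\w.-]+\.toml")


def find_violations(tests_dir: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, matched text) for every direct TOML reference."""
    hits = []
    for path in sorted(tests_dir.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            for match in TOML_REF.finditer(line):
                hits.append((path, lineno, match.group(0)))
    return hits


def main(tests_dir: str = "tests") -> int:
    hits = find_violations(Path(tests_dir))
    for path, lineno, ref in hits:
        print(f"{path}:{lineno}: references real TOML {ref}")
    if hits:
        return 1
    print("OK: No tests reference real TOML files.")
    return 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```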
---
## Test Data Flow
A typical test goes through this lifecycle:
```
Test starts
├─> isolate_workspace (autouse)
│ ├─> Creates tmp dir
│ └─> Sets SLOP_* env vars
├─> reset_paths (autouse)
│ └─> paths.reset_resolved()
├─> reset_ai_client (autouse)
│ └─> Resets ai_client global state
├─> (test body runs)
│ ├─> If using live_gui: subprocess already running (session-scoped)
│ ├─> Test makes API calls via ApiHookClient
│ └─> Test asserts on returned values
└─> Teardown
├─> reset_paths runs again
└─> (autouse) state cleanup
```
The `live_gui` session fixture runs once at the start of the test session and tears down once at the end. All tests in the session share the same `sloppy.py` process.
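The teardown half of the diagram follows the usual rule for yield fixtures: pytest finalizes them in the reverse order it set them up. A toy model of that rule (not pytest's implementation), using the three autouse fixtures in the order the diagram lists them:

```python
from typing import Callable, Iterator, List

events: List[str] = []


def make_fixture(name: str) -> Callable[[], Iterator[None]]:
    """Build a yield-style fixture that records its setup and teardown."""
    def fixture() -> Iterator[None]:
        events.append(f"setup {name}")
        yield
        events.append(f"teardown {name}")
    return fixture


def run_test(fixtures: List[Callable[[], Iterator[None]]], body: Callable[[], None]) -> None:
    """Toy runner: set fixtures up in order, run body, tear down in reverse."""
    active = []
    try:
        for fixture in fixtures:
            gen = fixture()
            next(gen)  # run setup up to the yield
            active.append(gen)
        body()
    finally:
        for gen in reversed(active):
            next(gen, None)  # resume past the yield; the generator then finishes
```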
---
## See Also
- **[guide_simulations.md](guide_simulations.md)** — Older guide focused on the Puppeteer pattern; still relevant for the test scenarios it documents
- **[guide_meta_boundary.md](guide_meta_boundary.md)** — Application vs Meta-Tooling domain separation; the test suite is in the Application domain
- **[guide_architecture.md](guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)** — Threading model that the `live_gui` test fixture respects
- **`src/api_hook_client.py`** — The Python wrapper for the Hook API used in integration tests
- **`tests/conftest.py`** — The canonical source of all fixtures documented in this guide
See [guide_architecture.md](guide_architecture.md) for the overall architecture and [conductor/workflow.md](../../conductor/workflow.md) for the TDD protocol that the test suite implements.
+19 -19
@@ -44,20 +44,20 @@ Collapsed=0
DockId=0x00000010,0
[Window][Message]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,0
[Window][Response]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,4
[Window][Tool Calls]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,3
@@ -77,7 +77,7 @@ DockId=0xAFC85805,2
[Window][Theme]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,0
@@ -105,26 +105,26 @@ Collapsed=0
DockId=0x0000000D,0
[Window][Discussion Hub]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,1
[Window][Operations Hub]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,3
[Window][Files & Media]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,2
[Window][AI Settings]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,1
@@ -140,8 +140,8 @@ Collapsed=0
DockId=0x00000006,2
[Window][Log Management]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,2
@@ -495,8 +495,8 @@ Size=1780,1669
Collapsed=0
[Window][Context Preview]
-Pos=1360,28
-Size=1561,1677
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,4
@@ -690,7 +690,7 @@ Column 1 Width=30
[Table][0x9D36FCE8,2]
RefScale=20
-Column 0 Width=659
+Column 0 Width=742
Column 1 Weight=1.0000
[Docking][Data]
@@ -700,10 +700,10 @@ DockNode ID=0x00000008 Pos=3125,170 Size=593,1157 Split=Y
DockSpace ID=0xAFC85805 Window=0x079D3A04 Pos=0,28 Size=1680,1172 Split=X
DockNode ID=0x00000003 Parent=0xAFC85805 SizeRef=2357,1183 Split=X
DockNode ID=0x0000000B Parent=0x00000003 SizeRef=404,1186 Split=X Selected=0xF4139CA2
-DockNode ID=0x00000005 Parent=0x0000000B SizeRef=1208,1681 Split=Y Selected=0x3F1379AF
-DockNode ID=0x00000010 Parent=0x00000005 SizeRef=983,1140 CentralNode=1 Selected=0x0D5A5273
+DockNode ID=0x00000005 Parent=0x0000000B SizeRef=1144,1681 Split=Y Selected=0x3F1379AF
+DockNode ID=0x00000010 Parent=0x00000005 SizeRef=983,1140 CentralNode=1 Selected=0x418C7449
DockNode ID=0x00000011 Parent=0x00000005 SizeRef=983,184 Selected=0x432BAE4E
-DockNode ID=0x00000006 Parent=0x0000000B SizeRef=1514,1681 Selected=0x2C0206CE
+DockNode ID=0x00000006 Parent=0x0000000B SizeRef=1400,1681 Selected=0x2C0206CE
DockNode ID=0x0000000D Parent=0x00000003 SizeRef=435,1186 Selected=0x363E93D6
DockNode ID=0x00000004 Parent=0xAFC85805 SizeRef=488,1183 Selected=0x3AEC3498