diff --git a/config.toml b/config.toml
index dab060a1..f07d5cd8 100644
--- a/config.toml
+++ b/config.toml
@@ -38,7 +38,7 @@ separate_external_tools = false
"AI Settings" = true
"MMA Dashboard" = false
"Task DAG" = false
-"Usage Analytics" = true
+"Usage Analytics" = false
"Tier 1" = false
"Tier 2" = false
"Tier 3" = false
diff --git a/docs/guide_testing.md b/docs/guide_testing.md
new file mode 100644
index 00000000..ef1a3183
--- /dev/null
+++ b/docs/guide_testing.md
@@ -0,0 +1,590 @@
+# Testing Guide
+
+[Top](../README.md) | [Architecture](guide_architecture.md) | [Simulations](guide_simulations.md) | [Workflow](../../conductor/workflow.md)
+
+---
+
+## Overview
+
+Manual Slop has **251 test files** in `tests/` covering every subsystem. The test infrastructure is designed around four principles:
+
+1. **No real I/O during tests** — every test gets a sandboxed workspace via the `isolate_workspace` autouse fixture.
+2. **No real AI calls** — tests use mock providers, reset session state, and never hit the network.
+3. **GUI tests launch a real app** — the `live_gui` session fixture starts `sloppy.py --enable-test-hooks` so integration tests can drive the actual app via the Hook API.
+4. **Tests are categorized by marker** — unit, integration, strict, clean_install, docker — so CI can opt in to expensive tests.
+
+This guide is the canonical reference for how the test suite is structured and how to add new tests.
+
+---
+
+## Test File Layout
+
+```
+tests/
+├── conftest.py # Session-wide fixtures (live_gui, isolate_workspace, etc.)
+├── test_*.py # 251 test files, named `test__.py`
+├── *_sim.py # Integration tests using the live_gui fixture
+├── test_clean_install.py # Opt-in: clones the repo to tmp and verifies hooks
+├── test_docker_build.py # Opt-in: builds and runs the Docker image
+├── test_arch_boundary_phase1.py # Architectural boundary tests
+├── test_enforce_no_real_toml.py # Meta-test for the enforcer fixture
+├── artifacts/ # Git-ignored; test output
+├── logs/ # Git-ignored; live_gui log files
+└── mock_concurrent_mma.py # Mock providers for MMA tests
+```
+
+**Naming conventions:**
+- `test_*.py` — pytest collection
+- `*_sim.py` — integration test (uses `live_gui`)
+- `*_e2e.py` — end-to-end test (real processes, opt-in via env var)
+- `test__.py` — single aspect of an area, e.g., `test_ai_client_cli.py`
+
+---
+
+## The `conftest.py` Fixtures
+
+The `tests/conftest.py` file defines 7 fixtures plus one helper function (`kill_process_tree`). They are grouped below by kind: autouse, opt-in function-scoped, then session-scoped. This grouping is not pytest's setup order: pytest instantiates higher-scoped fixtures first and, within a scope, autouse fixtures before requested ones, so `live_gui` starts before the function-scoped fixtures of the first test that requests it.
+
+### Autouse Fixtures (Run Before Every Test)
+
+#### `isolate_workspace` (line 70)
+
+**Purpose**: Give every test a fresh, isolated workspace so it cannot pollute the user's real `manual_slop.toml`, `presets.toml`, etc.
+
+**Mechanism**:
+1. Creates a temp directory via `tmp_path_factory.mktemp("isolated_workspace")`
+2. Writes a fresh `config.toml` to the temp dir
+3. Sets `SLOP_CONFIG`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` env vars to point at the temp dir
+4. The app reads these env vars on startup; the test sees an isolated world
+
+**Verification**: `python scripts/check_test_toml_paths.py` exits 0 (no test references real TOMLs).
+
+#### `reset_paths` (line 95)
+
+**Purpose**: Reset the `src.paths` global state before and after each test.
+
+**Mechanism**: Calls `paths.reset_resolved()` so path resolution re-evaluates on the next access.
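The isolation mechanism above can be approximated outside pytest with the standard library. This is a minimal sketch, not the `conftest.py` code. The real fixture uses `tmp_path_factory`, while this version uses `tempfile` so it runs standalone. Apart from `config.toml`, the file names mapped to each `SLOP_*` variable are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Env var -> file name inside the sandbox. Only config.toml comes from this
# guide; the other file names are hypothetical placeholders.
SLOP_ENV_FILES = {
    "SLOP_CONFIG": "config.toml",
    "SLOP_GLOBAL_PRESETS": "presets.toml",
    "SLOP_GLOBAL_TOOL_PRESETS": "tool_presets.toml",
    "SLOP_GLOBAL_PERSONAS": "personas.toml",
    "SLOP_GLOBAL_WORKSPACE_PROFILES": "workspace_profiles.toml",
}


@contextmanager
def isolated_workspace() -> Iterator[Path]:
    """Point every SLOP_* variable into a fresh temp dir, then restore the old values."""
    saved = {var: os.environ.get(var) for var in SLOP_ENV_FILES}
    with tempfile.TemporaryDirectory(prefix="isolated_workspace") as tmp:
        root = Path(tmp)
        # Step 2: write a fresh config.toml (contents are a placeholder).
        (root / "config.toml").write_text("# test config\n", encoding="utf-8")
        # Step 3: redirect the app's config lookups into the sandbox.
        for var, name in SLOP_ENV_FILES.items():
            os.environ[var] = str(root / name)
        try:
            yield root
        finally:
            # Undo the redirection so nothing leaks into the next test.
            for var, old in saved.items():
                if old is None:
                    os.environ.pop(var, None)
                else:
                    os.environ[var] = old
```

Inside pytest, the same effect is usually achieved with `monkeypatch.setenv`, which pytest reverts automatically at teardown.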
+ +#### `reset_ai_client` (line 107) + +**Purpose**: Prevent `ai_client` state from leaking between tests. + +**Mechanism**: +1. Calls `ai_client.reset_session()` +2. Clears callback hooks (`confirm_and_run_callback`, `comms_log_callback`, `tool_log_callback`) +3. Clears all event listeners +4. Resets provider to `("gemini", "gemini-2.5-flash-lite")` +5. Resets MCP client state via `mcp_client.configure([], [])` + +### Function-Scoped Fixtures (Opt-in) + +#### `vlogger` (line 131) + +**Purpose**: Provide a `VerificationLogger` instance for structured diagnostic logging. + +**Usage**: +```python +def test_my_thing(vlogger): + vlogger.log_state("Field", "before_value", "after_value") + # ... test logic ... + vlogger.finalize("Test Title", "PASS", "result message") +``` + +Output: `tests/logs//.txt` + +#### `kill_process_tree` (function, line 138) + +**Purpose**: Robustly kill a process and all its children. Used by `live_gui` for cleanup, but available to any test. + +**Mechanism**: +- Windows: `taskkill /F /T /PID ` (the `/T` flag is critical — kills the whole tree) +- Unix: `os.killpg(os.getpgid(pid), SIGKILL)` (kills the process group) + +#### `mock_app` (line 157) + +**Purpose**: Create an `App` instance with all external side effects mocked. For unit tests that need the App but not the GUI loop. + +**Mocks applied**: +- `src.models.load_config` → returns a default config +- `src.gui_2.project_manager` +- `src.gui_2.session_logger` +- `src.gui_2.immapp.run` (prevents the actual render loop from starting) +- `src.app_controller.AppController._load_active_project` +- `src.app_controller.AppController._fetch_models` +- `App._load_fonts` +- `App._post_init` +- `src.app_controller.AppController._prune_old_logs` +- `src.app_controller.AppController.start_services` +- `src.app_controller.AppController._init_ai_and_hooks` +- `src.performance_monitor.PerformanceMonitor` + +**Cleanup**: Shuts down the controller after the test. 
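The `kill_process_tree` mechanism described above can be sketched as follows. This is an illustrative version, not the `conftest.py` source. The Unix branch has one caveat: `os.killpg` signals the target's whole process group, so the child must be started in its own group (here via `start_new_session=True`). Otherwise the group may include the test runner itself. The `spawn_in_own_group` helper is hypothetical.

```python
import os
import signal
import subprocess
import sys


def kill_process_tree(pid: int) -> None:
    """Forcefully kill `pid` and its descendants (illustrative sketch)."""
    if sys.platform == "win32":
        # /T kills the whole tree, /F forces termination.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(pid)],
            capture_output=True,
            check=False,
        )
        return
    try:
        # Signal every process in the target's process group.
        os.killpg(os.getpgid(pid), signal.SIGKILL)
    except ProcessLookupError:
        pass  # the process is already gone


def spawn_in_own_group(args: list[str]) -> subprocess.Popen:
    """Start a child that kill_process_tree can safely kill as a group."""
    # setsid() is POSIX-only, so only request a new session there.
    return subprocess.Popen(args, start_new_session=(sys.platform != "win32"))
```

Call `proc.wait()` after killing so the child is reaped and its exit status is available.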
+
+#### `app_instance` (line 190)
+
+**Purpose**: A variant of `mock_app` with a largely overlapping set of mocks, historically used by `test_gui_phase4.py` and `test_token_viz.py`. The two fixtures are interchangeable for most purposes.
+
+### Session-Scoped Fixtures (One Per Test Run)
+
+#### `live_gui` (line 227)
+
+**Purpose**: Start `sloppy.py --enable-test-hooks` for the entire test session. Integration tests use this to drive the real GUI via the Hook API.
+
+**Lifecycle**:
+1. **Setup (once per session)**:
+   - Create `tests/artifacts/live_gui_workspace/` temp directory
+   - Write `manual_slop.toml` and `config.toml` to the workspace
+   - Set up `SLOP_*` env vars to point at the workspace
+   - Symlink `assets/` for fonts
+   - Launch `sloppy.py --enable-test-hooks` via `subprocess.Popen`
+   - Poll `GET /status` for up to 15 seconds (waiting for the HookServer to start)
+   - On failure: kill the process tree and call `pytest.fail()`, which makes every test that requests `live_gui` error out (tests that don't request it still run)
+2. **Yield**: tests run
+3. **Teardown (once per session)**:
+   - Call `ApiHookClient.reset_session()` to clear GUI state
+   - Kill the process tree (Windows: `taskkill /F /T`, Unix: `SIGKILL`)
+   - Wait 0.5s for file handles to close
+   - Close the log file
+   - Remove the temp workspace (with 5 retries for Windows file locks)
+
+**Yield value**: `(process: subprocess.Popen, gui_script: str)` — but most tests just take the fixture and use the `ApiHookClient` directly.
+
+**Usage pattern**:
+```python
+import time
+
+from src.api_hook_client import ApiHookClient
+
+def test_my_thing(live_gui):
+ client = ApiHookClient() # connects to localhost:8999
+ client.click("btn_id")
+ time.sleep(0.5)
+ assert client.get_value("show_thing") is True
+```
+
+---
+
+## Test Categories
+
+### 1. Unit Tests (no fixtures, fast)
+
+Pure functions tested in isolation. No app, no GUI, no subprocess. Run in <100ms each.
+
+**Examples**:
+- `tests/test_command_palette.py` — fuzzy matcher, command registry
+- `tests/test_fuzzy_anchor.py` — anchor slice algorithm
+- `tests/test_paths.py` — path resolution
+- `tests/test_token_usage.py` — token tracking
+- `tests/test_cost_tracker.py` — cost estimation
+
+**Pattern**:
+```python
+def test_my_unit():
+ result = my_function(input)
+ assert result == expected
+```
+
+### 2. Integration Tests (use `live_gui`, slow)
+
+Drive the actual app via the Hook API. Run in 1-10 seconds each (real subprocess).
+
+**Examples**:
+- `tests/test_saved_presets_sim.py` — preset switching via the GUI
+- `tests/test_command_palette_sim.py` — palette toggle, navigation
+- `tests/test_mma_concurrent_tracks_sim.py` — multi-track MMA
+- `tests/test_workspace_profiles_sim.py` — workspace profile save/load
+- `tests/test_gui_dag_beads.py` — Beads DAG visualization
+
+**Pattern**:
+```python
+import time
+
+from src.api_hook_client import ApiHookClient
+
+def test_my_integration(live_gui):
+ client = ApiHookClient()
+ client.push_event("custom_callback", {
+ "callback": "_my_method",
+ "args": [arg1, arg2], # arg1, arg2 and expected are placeholders
+ })
+ time.sleep(0.5)
+ assert client.get_value("result") == expected
+```
+
+### 3. Mock App Tests (use `mock_app` or `app_instance`, fast)
+
+Need an App instance but not the full render loop. Run in <500ms each.
+
+**Examples**:
+- `tests/test_text_viewer.py` — text viewer state updates
+- `tests/test_patch_modal.py` — patch modal workflow
+- `tests/test_gui2_events.py` — event subscriptions
+
+**Pattern**:
+```python
+def test_my_thing(mock_app):
+ mock_app.some_attr = "test_value"
+ mock_app._do_something()
+ assert mock_app.some_attr == "expected"
+```
+
+### 4. Headless Tests (no GUI, real services)
+
+Test the FastAPI/headless service directly via the Hook API. No subprocess.
+
+**Examples**:
+- `tests/test_headless_service.py` — service lifecycle
+- `tests/test_headless_verification.py` — full run with QA interceptor
+
+### 5. Opt-in Tests (gated by env var)
+
+Slow or network-dependent tests that don't run by default. Set the env var to enable.
+
+| Test File | Marker | Env Var | Purpose |
+|---|---|---|---|
+| `tests/test_clean_install.py` | `@pytest.mark.clean_install` | `RUN_CLEAN_INSTALL_TEST=1` | Clones the repo to tmp and verifies the hook API |
+| `tests/test_docker_build.py` | `@pytest.mark.docker` | `RUN_DOCKER_TEST=1` | Builds and runs the Docker image |
+
+**Running opt-in tests**:
+```bash
+RUN_CLEAN_INSTALL_TEST=1 uv run pytest tests/test_clean_install.py -v
+RUN_DOCKER_TEST=1 uv run pytest tests/test_docker_build.py -v
+```
+
+---
+
+## Markers
+
+Defined in `pyproject.toml`:
+
+```toml
+[tool.pytest.ini_options]
+markers = [
+ "integration: marks tests as integration tests (requires live GUI)",
+]
+```
+
+As shown, only `integration` is registered. If `clean_install` and `docker` are not also in the real list, pytest emits `PytestUnknownMarkWarning` when the opt-in tests apply them.
+
+**Adding a new marker**: add it to the list. Pytest will warn if a marker is used but not registered.
+
+**Filtering by marker**:
+```bash
+uv run pytest -m integration # Only integration tests
+uv run pytest -m "not integration" # Skip integration tests
+uv run pytest -m clean_install # Opt-in clean install tests
+```
+
+---
+
+## The Hook API (For Integration Tests)
+
+The live GUI exposes a Hook API on `http://127.0.0.1:8999` when launched with `--enable-test-hooks`. The `ApiHookClient` (`src/api_hook_client.py`) is the Python wrapper.
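The 15-second startup poll that `live_gui` runs against `GET /status` can be approximated with the standard library. This sketch is not `ApiHookClient`'s implementation. It assumes only that a ready HookServer answers `/status` with HTTP 200. The function name and its defaults are hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_hook_server(
    base_url: str = "http://127.0.0.1:8999",
    timeout: float = 15.0,
    interval: float = 0.25,
) -> bool:
    """Poll `{base_url}/status` until it returns 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused, reset, timed out, or non-2xx (URLError is an OSError):
            # the server is not ready yet, so keep polling.
            pass
        time.sleep(interval)
    return False
```

A fixture would call this right after `subprocess.Popen(...)`. If it returns `False`, the fixture would kill the process tree before calling `pytest.fail()`.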
+ +### Key Methods + +```python +client = ApiHookClient() # connects to localhost:8999 by default + +# Click a button +client.click("btn_reset") + +# Set a widget value +client.set_value("ui_ai_input", "Hello world") + +# Push a generic GUI task +client.push_event("custom_callback", { + "callback": "_my_method", + "args": [arg1, arg2], +}) + +# Get a value (gettable field) +value = client.get_value("show_command_palette") + +# Wait for an event +event = client.wait_for_event("ai_response", timeout=10) + +# Reset the session +client.reset_session() +``` + +### `predefined_callbacks` Pattern + +To make a test invoke an App method via the hook, register it in `gui_2.py`: + +```python +self.controller._predefined_callbacks['_my_method'] = self._my_method +self.controller._gettable_fields['show_thing'] = 'show_thing' +``` + +The test can then invoke `_my_method` via: +```python +client.push_event("custom_callback", { + "callback": "_my_method", + "args": [], +}) +``` + +This pattern is how the Command Palette's `_toggle_command_palette` is exposed for tests (since the keyboard shortcut can't be simulated via the hook). 
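The registration pattern above amounts to two lookup tables that the hook dispatcher consults. The toy classes below are hypothetical stand-ins, not `AppController` or `App`. They mirror the behavior this guide describes, including two failure modes it lists: an unregistered callback silently does nothing, and an unregistered field reads as `None`.

```python
from typing import Any, Callable


class ToyController:
    """Hypothetical stand-in for the hook-facing side of AppController."""

    def __init__(self) -> None:
        self._predefined_callbacks: dict[str, Callable[..., Any]] = {}
        self._gettable_fields: dict[str, str] = {}

    def handle_event(self, event_type: str, payload: dict[str, Any]) -> None:
        # Only "custom_callback" events are modeled here.
        if event_type != "custom_callback":
            return
        callback = self._predefined_callbacks.get(payload.get("callback", ""))
        if callback is None:
            return  # unregistered callback: silently does nothing
        callback(*payload.get("args", []))

    def get_value(self, field: str, owner: object) -> Any:
        attr = self._gettable_fields.get(field)
        # Unregistered fields read as None.
        return None if attr is None else getattr(owner, attr, None)


class ToyApp:
    def __init__(self, controller: ToyController) -> None:
        self.controller = controller
        self.show_command_palette = False
        # Same registration shape as the gui_2.py snippet above.
        self.controller._predefined_callbacks["_toggle_command_palette"] = self._toggle_command_palette
        self.controller._gettable_fields["show_command_palette"] = "show_command_palette"

    def _toggle_command_palette(self) -> None:
        self.show_command_palette = not self.show_command_palette
```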
+
+---
+
+## Common Patterns
+
+### Testing a Pure Function
+
+```python
+def test_my_function():
+ from src.mymodule import my_function
+ result = my_function("input", 42)
+ assert result == "expected"
+```
+
+### Testing with a Mock App
+
+```python
+from unittest.mock import MagicMock
+
+def test_with_mock():
+ app = MagicMock()
+ app.some_attr = "test"
+ from src.mymodule import do_thing
+ do_thing(app)
+ app.some_method.assert_called_once()
+```
+
+### Testing via live_gui
+
+```python
+import time
+import pytest
+from src.api_hook_client import ApiHookClient
+
+def test_via_gui(live_gui):
+ client = ApiHookClient()
+ client.push_event("custom_callback", {
+ "callback": "_some_method",
+ "args": ["value"],
+ })
+ time.sleep(0.5)
+ assert client.get_value("result") == "expected"
+```
+
+### Testing an Exception Path
+
+```python
+import pytest
+
+def test_raises():
+ from src.mymodule import do_thing
+ with pytest.raises(ValueError, match="expected message"):
+  do_thing(bad_input)
+```
+
+### Parametrized Tests
+
+```python
+import pytest
+
+@pytest.mark.parametrize("input,expected", [
+ ("a", 1),
+ ("b", 2),
+ ("c", 3),
+])
+def test_my_parametrized(input, expected):
+ assert my_function(input) == expected
+```
+
+---
+
+## Test Configuration
+
+### `pyproject.toml`
+
+```toml
+[tool.pytest.ini_options]
+asyncio_mode = "strict"
+markers = [
+ "integration: marks tests as integration tests (requires live GUI)",
+]
+# asyncio_default_fixture_loop_scope is left unset (TOML has no None value)
+asyncio_default_test_loop_scope = "function"
+```
+
+`asyncio_mode = "strict"` means async tests need explicit `@pytest.mark.asyncio`. This is intentional — most Manual Slop tests are synchronous.
+
+### Coverage
+
+Run with coverage:
+```bash
+uv run pytest tests/ --cov=src --cov-report=html
+```
+
+Open `htmlcov/index.html` in a browser. Target: >80% coverage for new code (per the project's quality gates).
+
+---
+
+## Running Tests
+
+### All Tests
+
+```bash
+uv run pytest tests/ -v
+```
+
+**Warning**: This collects all 251 test files, including the slow `live_gui` integration tests (opt-in tests are skipped unless their env var is set). Total runtime: 5-10 minutes.
+
+### Specific Test File
+
+```bash
+uv run pytest tests/test_command_palette.py -v
+```
+
+### Specific Test
+
+```bash
+uv run pytest tests/test_command_palette.py::test_fuzzy_match_prefix_ranks_first -v
+```
+
+### By Marker
+
+```bash
+uv run pytest -m integration -v # Only integration tests
+uv run pytest -m "not integration" # Skip integration tests
+```
+
+### With Stop on First Failure
+
+```bash
+uv run pytest tests/ -x -v
+```
+
+### With Timeout
+
+Requires the `pytest-timeout` plugin:
+
+```bash
+uv run pytest tests/ --timeout=60 -v
+```
+
+---
+
+## Adding a New Test
+
+### For a Pure Function
+
+1. Add tests to an existing `tests/test_.py` file (if it exists) or create a new one
+2. Use `def test_():` naming convention
+3. No fixtures needed unless you're reading state
+4. Verify it runs: `uv run pytest tests/test_.py::test_ -v`
+
+### For an Integration Test
+
+1. Create or extend a `*_sim.py` file
+2. Add `def test_(live_gui):` with the live_gui fixture
+3. Use `ApiHookClient` to drive the GUI
+4. If you need to invoke an App method that's not yet exposed, register it as a `_predefined_callbacks` entry in `gui_2.py`
+5. Verify: `uv run pytest tests/test__sim.py::test_ -v`
+
+### For an Opt-in Test (Clean Install / Docker)
+
+1. Mark with `@pytest.mark.`
+2. Gate the entire file with a skip if the env var isn't set
+3. Add the marker to `pyproject.toml`'s `markers` list
+4. Document the env var in the test file's docstring
+
+---
+
+## Debugging Failed Tests
+
+### Verbose Output
+
+```bash
+uv run pytest tests/test_X.py -v -s
+```
+
+`-s` disables stdout/stderr capture so you can see `print()` output.
+
+### Stop at First Failure
+
+```bash
+uv run pytest tests/test_X.py -x
+```
+
+### Enter PDB on Failure
+
+```bash
+uv run pytest tests/test_X.py --pdb
+```
+
+### Show Local Variables on Failure
+
+```bash
+uv run pytest tests/test_X.py -l
+```
+
+### Re-run Last Failed
+
+```bash
+uv run pytest --lf
+```
+
+### Common Failure Modes
+
+| Symptom | Likely Cause | Fix |
+|---|---|---|
+| `ImportError` for a module | Missing dependency or 1-space indent issue | Check pyproject.toml; run `uv sync` |
+| `live_gui` times out | A process from a previous test run is still running | On Windows, `taskkill /F /IM python.exe` (note: this kills every `python.exe`, not just the stale GUI) |
+| `get_value` returns `None` | Field not registered as gettable | Add to `self.controller._gettable_fields` in `gui_2.py` |
+| `custom_callback` does nothing | Callback not registered | Add to `self.controller._predefined_callbacks` |
+| `IM_ASSERT: Must call EndChild()` | Modal end_child/end pairing broken (usually from a buggy action) | Wrap actions in try/except; check for `imgui.end_child()` before `imgui.end()` |
+| `pytest.fail` from `live_gui` startup | Hook server didn't start in 15s | Check `logs/gui_2_py_test.log` for a crash |
+
+---
+
+## The Audit Script
+
+`scripts/check_test_toml_paths.py` greps `tests/` for direct `./.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.
+
+**Run it**:
+```bash
+python scripts/check_test_toml_paths.py
+```
+
+**Expected output**:
+```
+OK: No tests reference real TOML files.
+```
+
+If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`.
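A checker in the spirit of `scripts/check_test_toml_paths.py` can be sketched with `pathlib` and a regular expression. This is not the real script, and the pattern is an assumption: it flags only string literals that start with `./` and end in `.toml`. As a result, sandboxed paths such as `tmp_path / "config.toml"` pass.

```python
import re
import sys
from pathlib import Path

# Assumed rule: a string literal like "./something.toml" is a direct reference.
PATTERN = re.compile(r"""["']\./[\w.-]+\.toml["']""")


def find_violations(tests_dir: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line text) for every direct TOML reference."""
    hits = []
    for path in sorted(tests_dir.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), 1):
            if PATTERN.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def main(tests_dir: str = "tests") -> int:
    hits = find_violations(Path(tests_dir))
    for path, lineno, text in hits:
        print(f"{path}:{lineno}: {text}")
    if not hits:
        print("OK: No tests reference real TOML files.")
    # Exit code 0 only when every test is sandboxed.
    return 1 if hits else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```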
+ +--- + +## Test Data Flow + +A typical test goes through this lifecycle: + +``` +Test starts + ├─> isolate_workspace (autouse) + │ ├─> Creates tmp dir + │ └─> Sets SLOP_* env vars + │ + ├─> reset_paths (autouse) + │ └─> paths.reset_resolved() + │ + ├─> reset_ai_client (autouse) + │ └─> Resets ai_client global state + │ + ├─> (test body runs) + │ ├─> If using live_gui: subprocess already running (session-scoped) + │ ├─> Test makes API calls via ApiHookClient + │ └─> Test asserts on returned values + │ + └─> Teardown + ├─> reset_paths runs again + └─> (autouse) state cleanup +``` + +The `live_gui` session fixture runs once at the start of the test session and tears down once at the end. All tests in the session share the same `sloppy.py` process. + +--- + +## See Also + +- **[guide_simulations.md](guide_simulations.md)** — Older guide focused on the Puppeteer pattern; still relevant for the test scenarios it documents +- **[guide_meta_boundary.md](guide_meta_boundary.md)** — Application vs Meta-Tooling domain separation; the test suite is in the Application domain +- **[guide_architecture.md](guide_architecture.md#the-task-pipeline-producer-consumer-synchronization)** — Threading model that the `live_gui` test fixture respects +- **`src/api_hook_client.py`** — The Python wrapper for the Hook API used in integration tests +- **`tests/conftest.py`** — The canonical source of all fixtures documented in this guide + +See [guide_architecture.md](guide_architecture.md) for the overall architecture and [conductor/workflow.md](../../conductor/workflow.md) for the TDD protocol that the test suite implements. 
diff --git a/manualslop_layout.ini b/manualslop_layout.ini
index 88b7bcf1..4213444b 100644
--- a/manualslop_layout.ini
+++ b/manualslop_layout.ini
@@ -44,20 +44,20 @@ Collapsed=0
DockId=0x00000010,0

[Window][Message]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,0

[Window][Response]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,4

[Window][Tool Calls]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,3

@@ -77,7 +77,7 @@ DockId=0xAFC85805,2

[Window][Theme]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,0

@@ -105,26 +105,26 @@ Collapsed=0
DockId=0x0000000D,0

[Window][Discussion Hub]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,1

[Window][Operations Hub]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,3

[Window][Files & Media]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,2

[Window][AI Settings]
Pos=0,28
-Size=164,1172
+Size=278,1172
Collapsed=0
DockId=0x00000010,1

@@ -140,8 +140,8 @@ Collapsed=0
DockId=0x00000006,2

[Window][Log Management]
-Pos=166,28
-Size=1514,1172
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,2

@@ -495,8 +495,8 @@ Size=1780,1669
Collapsed=0

[Window][Context Preview]
-Pos=1360,28
-Size=1561,1677
+Pos=280,28
+Size=1400,1172
Collapsed=0
DockId=0x00000006,4

@@ -690,7 +690,7 @@ Column 1 Width=30

[Table][0x9D36FCE8,2]
RefScale=20
-Column 0 Width=659
+Column 0 Width=742
Column 1 Weight=1.0000

[Docking][Data]
@@ -700,10 +700,10 @@ DockNode ID=0x00000008 Pos=3125,170 Size=593,1157 Split=Y
DockSpace ID=0xAFC85805 Window=0x079D3A04 Pos=0,28 Size=1680,1172 Split=X
DockNode ID=0x00000003 Parent=0xAFC85805 SizeRef=2357,1183 Split=X
DockNode ID=0x0000000B Parent=0x00000003 SizeRef=404,1186 Split=X Selected=0xF4139CA2
- DockNode ID=0x00000005 Parent=0x0000000B SizeRef=1208,1681 Split=Y Selected=0x3F1379AF
- DockNode ID=0x00000010 Parent=0x00000005 SizeRef=983,1140 CentralNode=1 Selected=0x0D5A5273
+ DockNode ID=0x00000005 Parent=0x0000000B SizeRef=1144,1681 Split=Y Selected=0x3F1379AF
+ DockNode ID=0x00000010 Parent=0x00000005 SizeRef=983,1140 CentralNode=1 Selected=0x418C7449
DockNode ID=0x00000011 Parent=0x00000005 SizeRef=983,184 Selected=0x432BAE4E
- DockNode ID=0x00000006 Parent=0x0000000B SizeRef=1514,1681 Selected=0x2C0206CE
+ DockNode ID=0x00000006 Parent=0x0000000B SizeRef=1400,1681 Selected=0x2C0206CE
DockNode ID=0x0000000D Parent=0x00000003 SizeRef=435,1186 Selected=0x363E93D6
DockNode ID=0x00000004 Parent=0xAFC85805 SizeRef=488,1183 Selected=0x3AEC3498