Private
Public Access
0
0

docs(testing): critical live_gui_workspace path fix + 8 new sections

CRITICAL fix:
- live_gui_workspace path: tmp_path_factory (banned) ->
  tests/artifacts/live_gui_workspace_<timestamp> (per-run timestamp)
  (per conductor/code_styleguides/workspace_paths.md)

8 new sections under 'Per-test Subprocess Resilience':
1. _reset_clean_baseline autouse fixture (mma_tier_usage +
   rag_config=default RAGConfig(), not None)
2. Watchdog and Hang Bounding (signal-based, 900s smart + 900s
   unconditional, replaces removed 30s daemon-thread)
3. Chroma Cache Path (tests/artifacts/.slop_cache/, parent-trailing-slash
   bug, pre-cleanup pattern in test_rag_phase4_final_verify)
4. xdist Worker Coordination (O_EXCL file lock, PYTEST_XDIST_WORKER,
   owner/client roles, stale lock demotion)
5. Required Test Dependencies Gate (sentence-transformers,
   uv sync --extra local-rag fix)
6. MMA and RAG State in reset_session() (5 buckets: mma_tier_usage
   pre-populated, rag_config fresh RAGConfig() not None)
7. _LiveGuiHandle __getitem__ (handle[0] / handle[1])

Expand 'Audit Script' -> 'Audit Scripts' (4 scripts total):
- check_test_toml_paths.py (existing)
- audit_main_thread_imports.py (startup_speedup)
- audit_weak_types.py (data_structure_strengthening)
- audit_no_models_config_io.py (config_state_owner styleguide)
This commit is contained in:
2026-06-10 20:05:16 -04:00
parent 2e12b266e4
commit 5fa8a10ebf
+103 -15
View File
@@ -133,7 +133,7 @@ Output: `tests/logs/<timestamp>/<script_name>.txt`
**Lifecycle**:
1. **Setup (once per session)**:
- Create `tests/artifacts/live_gui_workspace/` temp directory
- Compute the per-run workspace path: `tests/artifacts/live_gui_workspace_<timestamp>/` (where `<timestamp>` is `datetime.now().strftime("%Y%m%d_%H%M%S")` at conftest import time). Per the `conductor/code_styleguides/workspace_paths.md` hard rule, test workspaces live in the project tree (not `%TEMP%`).
- Write `manual_slop.toml` and `config.toml` to the workspace
- Set up `SLOP_*` env vars to point at the workspace
- Symlink `assets/` for fonts
@@ -148,7 +148,7 @@ Output: `tests/logs/<timestamp>/<script_name>.txt`
- Close the log file
- Remove the temp workspace (with 5 retries for Windows file locks)
**Yield value**: `(process: subprocess.Popen, gui_script: str)` — but most tests just take the fixture and use the `ApiHookClient` directly.
**Yield value**: A `_LiveGuiHandle` object (see below). Most tests just take the fixture and use the `ApiHookClient` directly.
**Usage pattern**:
```python
@@ -180,9 +180,11 @@ The `live_gui` fixture yields a `_LiveGuiHandle` instead of a `(process, gui_scr
**Backward compat:** The handle is iterable as `(process, gui_script)`, so existing `proc, _ = live_gui` patterns still work.
### `live_gui_workspace` fixture (tests/conftest.py:657)
### `live_gui_workspace` fixture (tests/conftest.py:727)
Yields the absolute path to the live_gui subprocess's workspace (a `tmp_path_factory.mktemp("live_gui_workspace")` directory in pytest's tmp dir). Tests that need to create files in the workspace should request this fixture instead of hardcoding `Path("tests/artifacts/live_gui_workspace")`.
Yields `live_gui.handle.workspace`, a `Path` to `tests/artifacts/live_gui_workspace_<timestamp>/` (computed at conftest import time). Tests that need to create files in the workspace should request this fixture instead of hardcoding the path.
> **Important:** This is NOT `tmp_path_factory.mktemp()` despite the older documentation. The `workspace_paths.md` styleguide bans `tmp_path_factory` for test infrastructure workspaces (they'd live in `%TEMP%` and be unfindable). The path is in the project tree, under `tests/artifacts/`, and is shared across all xdist workers in a session (with a file-lock for spawn coordination).
```python
def test_rag_setup(live_gui, live_gui_workspace):
@@ -208,6 +210,87 @@ def test_rag_final_verify(live_gui):
Use this for tests that are sensitive to controller state pollution from prior tests in the same session. The `test_rag_phase4_final_verify` test is marked this way because the 4 sims in `test_extended_sims.py` mutate controller state (provider, model, etc.) that would otherwise pollute the RAG test.
### `_reset_clean_baseline` autouse fixture
The `clean_baseline` marker triggers an autouse `_reset_clean_baseline` fixture that calls `client.reset_session()` before the test starts. The `_handle_reset_session` controller method must clear **both** MMA state and RAG state to prevent cross-test pollution:
- **MMA**: `mma_tier_usage`, `mma_status`, `active_tier`
- **RAG**: `rag_engine` (if disabled), `rag_config` (reset to a fresh `RAGConfig()`, not `None``rag_config=None` makes all `rag_*` setters no-op because they check `if self.rag_config:`)
See `tests/test_reset_session_clears_mma_and_rag.py` for the regression test that locked this contract in.
### Watchdog and Hang Bounding
The conftest's hang-bounding is **signal-based**, not naive `time.sleep` + `os._exit`:
| Watchdog | Timeout | Trigger | Action |
|---|---|---|---|
| **Smart watchdog** (`_smart_watchdog_exit`) | 900s (15 min) | Waits for `_pytest_finished_event` | On signal: 5s grace, then `os._exit(0)`. On timeout: `os._exit(2)` (hard fail). |
| **Unconditional watchdog** (`_unconditional_watchdog_exit`) | 900s (15 min, longer safety net) | Same event | Same action. Catches the case where pytest is so slow the smart expires first. |
`_pytest_finished_event` is set in two hooks for redundancy: `pytest_terminal_summary` (primary — after the summary is printed) and `pytest_unconfigure` (fallback). The prior daemon-thread approach (30s `os._exit(0)`) was removed because it killed batches mid-test on Windows (daemon threads are not auto-killed by the interpreter) and hid pytest's exit code from `run_tests_batched.py`. See `conductor/workflow.md` for the broader rationale.
### Chroma Cache Path and Cross-Test Pollution
The chroma cache lives at `tests/artifacts/.slop_cache/chroma_<project>/`, **NOT** the per-run `live_gui_workspace_<timestamp>` subdir. This is because `active_project_root = Path(active_project_path).parent` and some test setups produce a trailing slash on the project path, placing the cache one level higher than expected.
**Implication for tests:** Tests that interact with RAG should pre-clean the chroma cache to avoid persistent state from prior tests in the batch. `tests/test_rag_phase4_final_verify.py` does this:
```python
def test_phase4_final_verify(live_gui):
# Wipe any stale chroma from prior batched runs
cache = Path("tests/artifacts/.slop_cache/chroma_test_final_verify")
if cache.exists():
shutil.rmtree(cache, ignore_errors=True)
# ... rest of test
```
Without this cleanup, a prior batched run with a different embedding provider (e.g., Gemini 3072-dim vs local 384-dim) can leave a corrupt collection that fails the next test's `search()` with a dim-mismatch error. The `_validate_collection_dim()` mechanism in `RAGEngine` also auto-recovers (see [guide_rag.md](guide_rag.md#dimension-mismatch-protection)) but pre-cleaning is faster and avoids the stderr warning.
### xdist Worker Coordination and Stale Lock Demotion
The `live_gui` fixture uses an `O_EXCL`-atomic file lock at `tests/artifacts/live_gui_workspace_*/.live_gui_owner.lock`:
1. Each xdist worker reads its ID from `PYTEST_XDIST_WORKER` env var (e.g., `gw0`, `gw1`, ...).
2. The first worker to acquire the file lock becomes the **owner** and spawns `sloppy.py --enable-test-hooks`.
3. Other workers become **clients** and wait for the hook server to respond on `127.0.0.1:8999`, then yield a `_LiveGuiHandle(process=None, ...)` (null process — they share the owner's subprocess via the hook API).
4. If the owner worker's hook server is **not** up when a client acquires the lock (e.g., owner crashed or hasn't started), the client demotes itself: removes the stale lock and re-tries as the owner. This prevents one crashed owner from blocking all 16 workers.
### Required Test Dependencies Gate
`_check_required_test_dependencies()` runs in `pytest_configure` and raises `pytest.UsageError` at session start if a required test dep is missing:
| Module | Package | Fix command |
|---|---|---|
| `sentence_transformers` | `sentence-transformers` | `uv sync --extra local-rag` |
This is a regression gate for the 2026-06-09 incident where a fresh `uv sync` (without `--extra local-rag`) produced a confusing `rag_status = error: ... Install with manual_slop[local-rag]` failure inside the live_gui subprocess. The gate fails fast with the exact fix command instead. To add a new required test dep, append `(module_name, package_name)` to `_REQUIRED_TEST_IMPORTS` in `tests/conftest.py`.
### MMA and RAG State in `reset_session()`
`controller._handle_reset_session()` must clear 5 specific state buckets to prevent cross-test pollution:
| Bucket | What gets reset | Why |
|---|---|---|
| `mma_tier_usage` | Pre-populated to full default shape (input, output, provider, model, tool_preset) — NOT empty dict | Downstream `_flush_to_project` does `d["model"]`; empty dict causes `KeyError`. |
| `mma_status` | `None` | Status machine reset. |
| `active_tier` | `None` | Current tier cleared. |
| `rag_engine` | `None` if RAG disabled; unchanged if enabled (teardown is expensive) | Avoids 30s+ chroma re-init per test. |
| `rag_config` | Fresh `RAGConfig()` default (not `None`) | All `rag_*` setters check `if self.rag_config:` and become no-ops if None. |
The regression tests are in `tests/test_reset_session_clears_mma_and_rag.py` (poll-for-state pattern, not `time.sleep`).
### `_LiveGuiHandle` Indexing (`__getitem__`)
In addition to the `__iter__` tuple-unpacking backward compat, `_LiveGuiHandle` supports `__getitem__` indexing:
| Expression | Returns |
|---|---|
| `handle[0]` | `process` |
| `handle[1]` | `gui_script` |
| `handle[n]` for `n >= 2` | `IndexError` |
Both patterns are valid and equivalent. New code should prefer the named attributes (`.process`, `.gui_script`, `.workspace`) for clarity; the indexing is for backward compat with the old tuple-yielding fixture.
---
## Test Categories
@@ -590,21 +673,26 @@ uv run pytest --lf
---
## The `Audit Script`
## The `Audit Scripts`
`scripts/check_test_toml_paths.py` greps `tests/` for direct `./<name>.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule.
The project has 4 audit scripts that enforce static conventions. They run as pre-commit/CI gates and exit non-zero on regression.
**Run it**:
```bash
python scripts/check_test_toml_paths.py
```
| Script | Enforces | Run command |
|---|---|---|
| `scripts/check_test_toml_paths.py` | No real-TOML references in `tests/` (must be sandboxed) | `python scripts/check_test_toml_paths.py` |
| `scripts/audit_main_thread_imports.py` | Main-thread-purity invariant: the main thread (entering `immapp.run()`) never imports a module heavier than `imgui_bundle` + lean `gui_2` skeleton. Heavy SDKs (`google.genai`, `anthropic`, `openai`, `fastapi`) are lazy-only. | `python scripts/audit_main_thread_imports.py` |
| `scripts/audit_weak_types.py` | Type-alias convention: 430 weak `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types across 6 high-traffic files were replaced with named `TypeAlias`es in `src/type_aliases.py`. The audit enforces the convention going forward. | `python scripts/audit_weak_types.py` |
| `scripts/audit_no_models_config_io.py` | `AppController` is the single source of truth for config I/O; direct calls to `models.save_config` / `models.load_config` in `src/` are forbidden. | `python scripts/audit_no_models_config_io.py` |
**Expected output**:
```
OK: No tests reference real TOML files.
```
**Per-script details:**
If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`.
**`check_test_toml_paths.py`** greps `tests/` for direct `./<name>.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule. If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`.
**`audit_main_thread_imports.py`** (added in `startup_speedup_20260606`) enforces the main-thread-purity invariant. It scans `src/` for `import` statements that pull in heavy SDKs at module level. Each violation is a site that will re-introduce a multi-second startup cost. The startup_speedup track reduced violations from 67 to 62 (5 fixed; the remaining 62 are large refactors tracked in future work).
**`audit_weak_types.py`** (added in `data_structure_strengthening_20260606`) enforces the type-alias convention. The baseline file `scripts/audit_weak_types.baseline.json` records the post-refactor weak-type count (~60 after 86% reduction from 430 in 6 high-traffic files). New violations exit 1; the baseline lets you audit the delta.
**`audit_no_models_config_io.py`** enforces the "AppController is the sole config owner" rule from `conductor/code_styleguides/config_state_owner.md`.
---