Private
Public Access
0
0

docs(testing): critical live_gui_workspace path fix + 8 new sections

CRITICAL fix:
- live_gui_workspace path: tmp_path_factory (banned) ->
  tests/artifacts/live_gui_workspace_<timestamp> (per-run timestamp)
  (per conductor/code_styleguides/workspace_paths.md)

8 new sections under 'Per-test Subprocess Resilience':
1. _reset_clean_baseline autouse fixture (mma_tier_usage +
   rag_config=default RAGConfig(), not None)
2. Watchdog and Hang Bounding (signal-based, 900s smart + 900s
   unconditional, replaces removed 30s daemon-thread)
3. Chroma Cache Path (tests/artifacts/.slop_cache/, parent-trailing-slash
   bug, pre-cleanup pattern in test_rag_phase4_final_verify)
4. xdist Worker Coordination (O_EXCL file lock, PYTEST_XDIST_WORKER,
   owner/client roles, stale lock demotion)
5. Required Test Dependencies Gate (sentence-transformers,
   uv sync --extra local-rag fix)
6. MMA and RAG State in reset_session() (5 buckets: mma_tier_usage
   pre-populated, rag_config fresh RAGConfig() not None)
7. _LiveGuiHandle __getitem__ (handle[0] / handle[1])

Expand 'Audit Script' -> 'Audit Scripts' (4 scripts total):
- check_test_toml_paths.py (existing)
- audit_main_thread_imports.py (startup_speedup)
- audit_weak_types.py (data_structure_strengthening)
- audit_no_models_config_io.py (config_state_owner styleguide)
This commit is contained in:
2026-06-10 20:05:16 -04:00
parent 2e12b266e4
commit 5fa8a10ebf
+103 -15
View File
@@ -133,7 +133,7 @@ Output: `tests/logs/<timestamp>/<script_name>.txt`
**Lifecycle**: **Lifecycle**:
1. **Setup (once per session)**: 1. **Setup (once per session)**:
- Create `tests/artifacts/live_gui_workspace/` temp directory - Compute the per-run workspace path: `tests/artifacts/live_gui_workspace_<timestamp>/` (where `<timestamp>` is `datetime.now().strftime("%Y%m%d_%H%M%S")` at conftest import time). Per the `conductor/code_styleguides/workspace_paths.md` hard rule, test workspaces live in the project tree (not `%TEMP%`).
- Write `manual_slop.toml` and `config.toml` to the workspace - Write `manual_slop.toml` and `config.toml` to the workspace
- Set up `SLOP_*` env vars to point at the workspace - Set up `SLOP_*` env vars to point at the workspace
- Symlink `assets/` for fonts - Symlink `assets/` for fonts
@@ -148,7 +148,7 @@ Output: `tests/logs/<timestamp>/<script_name>.txt`
- Close the log file - Close the log file
- Remove the temp workspace (with 5 retries for Windows file locks) - Remove the temp workspace (with 5 retries for Windows file locks)
**Yield value**: `(process: subprocess.Popen, gui_script: str)` — but most tests just take the fixture and use the `ApiHookClient` directly. **Yield value**: A `_LiveGuiHandle` object (see below). Most tests just take the fixture and use the `ApiHookClient` directly.
**Usage pattern**: **Usage pattern**:
```python ```python
@@ -180,9 +180,11 @@ The `live_gui` fixture yields a `_LiveGuiHandle` instead of a `(process, gui_scr
**Backward compat:** The handle is iterable as `(process, gui_script)`, so existing `proc, _ = live_gui` patterns still work. **Backward compat:** The handle is iterable as `(process, gui_script)`, so existing `proc, _ = live_gui` patterns still work.
### `live_gui_workspace` fixture (tests/conftest.py:657) ### `live_gui_workspace` fixture (tests/conftest.py:727)
Yields the absolute path to the live_gui subprocess's workspace (a `tmp_path_factory.mktemp("live_gui_workspace")` directory in pytest's tmp dir). Tests that need to create files in the workspace should request this fixture instead of hardcoding `Path("tests/artifacts/live_gui_workspace")`. Yields `live_gui.handle.workspace`, a `Path` to `tests/artifacts/live_gui_workspace_<timestamp>/` (computed at conftest import time). Tests that need to create files in the workspace should request this fixture instead of hardcoding the path.
> **Important:** This is NOT `tmp_path_factory.mktemp()` despite the older documentation. The `workspace_paths.md` styleguide bans `tmp_path_factory` for test infrastructure workspaces (they'd live in `%TEMP%` and be unfindable). The path is in the project tree, under `tests/artifacts/`, and is shared across all xdist workers in a session (with a file-lock for spawn coordination).
```python ```python
def test_rag_setup(live_gui, live_gui_workspace): def test_rag_setup(live_gui, live_gui_workspace):
@@ -208,6 +210,87 @@ def test_rag_final_verify(live_gui):
Use this for tests that are sensitive to controller state pollution from prior tests in the same session. The `test_rag_phase4_final_verify` test is marked this way because the 4 sims in `test_extended_sims.py` mutate controller state (provider, model, etc.) that would otherwise pollute the RAG test. Use this for tests that are sensitive to controller state pollution from prior tests in the same session. The `test_rag_phase4_final_verify` test is marked this way because the 4 sims in `test_extended_sims.py` mutate controller state (provider, model, etc.) that would otherwise pollute the RAG test.
### `_reset_clean_baseline` autouse fixture
The `clean_baseline` marker triggers an autouse `_reset_clean_baseline` fixture that calls `client.reset_session()` before the test starts. The `_handle_reset_session` controller method must clear **both** MMA state and RAG state to prevent cross-test pollution:
- **MMA**: `mma_tier_usage`, `mma_status`, `active_tier`
- **RAG**: `rag_engine` (if disabled), `rag_config` (reset to a fresh `RAGConfig()`, not `None``rag_config=None` makes all `rag_*` setters no-op because they check `if self.rag_config:`)
See `tests/test_reset_session_clears_mma_and_rag.py` for the regression test that locked this contract in.
### Watchdog and Hang Bounding
The conftest's hang-bounding is **signal-based**, not naive `time.sleep` + `os._exit`:
| Watchdog | Timeout | Trigger | Action |
|---|---|---|---|
| **Smart watchdog** (`_smart_watchdog_exit`) | 900s (15 min) | Waits for `_pytest_finished_event` | On signal: 5s grace, then `os._exit(0)`. On timeout: `os._exit(2)` (hard fail). |
| **Unconditional watchdog** (`_unconditional_watchdog_exit`) | 900s (15 min, longer safety net) | Same event | Same action. Catches the case where pytest is so slow the smart expires first. |
`_pytest_finished_event` is set in two hooks for redundancy: `pytest_terminal_summary` (primary — after the summary is printed) and `pytest_unconfigure` (fallback). The prior daemon-thread approach (30s `os._exit(0)`) was removed because it killed batches mid-test on Windows (daemon threads are not auto-killed by the interpreter) and hid pytest's exit code from `run_tests_batched.py`. See `conductor/workflow.md` for the broader rationale.
### Chroma Cache Path and Cross-Test Pollution
The chroma cache lives at `tests/artifacts/.slop_cache/chroma_<project>/`, **NOT** the per-run `live_gui_workspace_<timestamp>` subdir. This is because `active_project_root = Path(active_project_path).parent` and some test setups produce a trailing slash on the project path, placing the cache one level higher than expected.
**Implication for tests:** Tests that interact with RAG should pre-clean the chroma cache to avoid persistent state from prior tests in the batch. `tests/test_rag_phase4_final_verify.py` does this:
```python
def test_phase4_final_verify(live_gui):
# Wipe any stale chroma from prior batched runs
cache = Path("tests/artifacts/.slop_cache/chroma_test_final_verify")
if cache.exists():
shutil.rmtree(cache, ignore_errors=True)
# ... rest of test
```
Without this cleanup, a prior batched run with a different embedding provider (e.g., Gemini 3072-dim vs local 384-dim) can leave a corrupt collection that fails the next test's `search()` with a dim-mismatch error. The `_validate_collection_dim()` mechanism in `RAGEngine` also auto-recovers (see [guide_rag.md](guide_rag.md#dimension-mismatch-protection)) but pre-cleaning is faster and avoids the stderr warning.
### xdist Worker Coordination and Stale Lock Demotion
The `live_gui` fixture uses an `O_EXCL`-atomic file lock at `tests/artifacts/live_gui_workspace_*/.live_gui_owner.lock`:
1. Each xdist worker reads its ID from `PYTEST_XDIST_WORKER` env var (e.g., `gw0`, `gw1`, ...).
2. The first worker to acquire the file lock becomes the **owner** and spawns `sloppy.py --enable-test-hooks`.
3. Other workers become **clients** and wait for the hook server to respond on `127.0.0.1:8999`, then yield a `_LiveGuiHandle(process=None, ...)` (null process — they share the owner's subprocess via the hook API).
4. If the owner worker's hook server is **not** up when a client acquires the lock (e.g., owner crashed or hasn't started), the client demotes itself: removes the stale lock and re-tries as the owner. This prevents one crashed owner from blocking all 16 workers.
### Required Test Dependencies Gate
`_check_required_test_dependencies()` runs in `pytest_configure` and raises `pytest.UsageError` at session start if a required test dep is missing:
| Module | Package | Fix command |
|---|---|---|
| `sentence_transformers` | `sentence-transformers` | `uv sync --extra local-rag` |
This is a regression gate for the 2026-06-09 incident where a fresh `uv sync` (without `--extra local-rag`) produced a confusing `rag_status = error: ... Install with manual_slop[local-rag]` failure inside the live_gui subprocess. The gate fails fast with the exact fix command instead. To add a new required test dep, append `(module_name, package_name)` to `_REQUIRED_TEST_IMPORTS` in `tests/conftest.py`.
### MMA and RAG State in `reset_session()`
`controller._handle_reset_session()` must clear 5 specific state buckets to prevent cross-test pollution:
| Bucket | What gets reset | Why |
|---|---|---|
| `mma_tier_usage` | Pre-populated to full default shape (input, output, provider, model, tool_preset) — NOT empty dict | Downstream `_flush_to_project` does `d["model"]`; empty dict causes `KeyError`. |
| `mma_status` | `None` | Status machine reset. |
| `active_tier` | `None` | Current tier cleared. |
| `rag_engine` | `None` if RAG disabled; unchanged if enabled (teardown is expensive) | Avoids 30s+ chroma re-init per test. |
| `rag_config` | Fresh `RAGConfig()` default (not `None`) | All `rag_*` setters check `if self.rag_config:` and become no-ops if None. |
The regression tests are in `tests/test_reset_session_clears_mma_and_rag.py` (poll-for-state pattern, not `time.sleep`).
### `_LiveGuiHandle` Indexing (`__getitem__`)
In addition to the `__iter__` tuple-unpacking backward compat, `_LiveGuiHandle` supports `__getitem__` indexing:
| Expression | Returns |
|---|---|
| `handle[0]` | `process` |
| `handle[1]` | `gui_script` |
| `handle[n]` for `n >= 2` | `IndexError` |
Both patterns are valid and equivalent. New code should prefer the named attributes (`.process`, `.gui_script`, `.workspace`) for clarity; the indexing is for backward compat with the old tuple-yielding fixture.
--- ---
## Test Categories ## Test Categories
@@ -590,21 +673,26 @@ uv run pytest --lf
--- ---
## The `Audit Script` ## The `Audit Scripts`
`scripts/check_test_toml_paths.py` greps `tests/` for direct `./<name>.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule. The project has 4 audit scripts that enforce static conventions. They run as pre-commit/CI gates and exit non-zero on regression.
**Run it**: | Script | Enforces | Run command |
```bash |---|---|---|
python scripts/check_test_toml_paths.py | `scripts/check_test_toml_paths.py` | No real-TOML references in `tests/` (must be sandboxed) | `python scripts/check_test_toml_paths.py` |
``` | `scripts/audit_main_thread_imports.py` | Main-thread-purity invariant: the main thread (entering `immapp.run()`) never imports a module heavier than `imgui_bundle` + lean `gui_2` skeleton. Heavy SDKs (`google.genai`, `anthropic`, `openai`, `fastapi`) are lazy-only. | `python scripts/audit_main_thread_imports.py` |
| `scripts/audit_weak_types.py` | Type-alias convention: 430 weak `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types across 6 high-traffic files were replaced with named `TypeAlias`es in `src/type_aliases.py`. The audit enforces the convention going forward. | `python scripts/audit_weak_types.py` |
| `scripts/audit_no_models_config_io.py` | `AppController` is the single source of truth for config I/O; direct calls to `models.save_config` / `models.load_config` in `src/` are forbidden. | `python scripts/audit_no_models_config_io.py` |
**Expected output**: **Per-script details:**
```
OK: No tests reference real TOML files.
```
If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`. **`check_test_toml_paths.py`** greps `tests/` for direct `./<name>.toml` references and exits 0 only if all tests are sandboxed. It's the enforcement mechanism for the "no real TOML in tests" rule. If violations are found, migrate the offender to use `tmp_path` + `monkeypatch`.
**`audit_main_thread_imports.py`** (added in `startup_speedup_20260606`) enforces the main-thread-purity invariant. It scans `src/` for `import` statements that pull in heavy SDKs at module level. Each violation is a site that will re-introduce a multi-second startup cost. The startup_speedup track reduced violations from 67 to 62 (5 fixed; the remaining 62 are large refactors tracked in future work).
**`audit_weak_types.py`** (added in `data_structure_strengthening_20260606`) enforces the type-alias convention. The baseline file `scripts/audit_weak_types.baseline.json` records the post-refactor weak-type count (~60 after 86% reduction from 430 in 6 high-traffic files). New violations exit 1; the baseline lets you audit the delta.
**`audit_no_models_config_io.py`** enforces the "AppController is the sole config owner" rule from `conductor/code_styleguides/config_state_owner.md`.
--- ---