manual_slop/docs/superpowers/plans/2026-06-02-test-consolidation.md

# Test Consolidation & TOML Sandboxing Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Audit tests for real-TOML usage, migrate offenders to sandboxed patterns, consolidate similar tests where it improves clarity, and enforce the rule going forward.

**Architecture:** `scripts/check_test_toml_paths.py` is the audit tool. `tests/conftest.py` gets a new autouse fixture `enforce_no_real_toml`. Migration follows the existing `isolate_workspace` pattern (already in conftest.py) using `tmp_path` + `monkeypatch`. Consolidation is a judgment call per area.

**Tech Stack:** Python 3.11+, pytest, regex (stdlib)

**Spec:** `docs/superpowers/specs/2026-06-02-test-consolidation-design.md`

---

## Execution Constraints

- **No subagents.** Execute as a single agent.
- **Pre-edit checkpoint:** `git add .` before any file edit.
- **Per-file atomic commits:** One file = one commit.
- **Commit message format:** `<type>(<scope>): <imperative description>`.
- **Git note format:** 3-8 line rationale per commit.
- **Style baseline:** 1-space indent, no comments, type hints.

---

## File Structure

| File | Action | Responsibility |
|---|---|---|
| `scripts/check_test_toml_paths.py` | Create | Greps for direct `./<name>.toml` references in tests; CI gate |
| `tests/conftest.py` | Modify | Add `enforce_no_real_toml` autouse fixture |
| `tests/test_enforce_no_real_toml.py` | Create | Tests for the enforcer itself |
| Various `tests/test_*.py` | Modify | Migrate offenders to sandboxed pattern |

---

## Task 1: Build the audit script

**Files:**
- Create: `scripts/check_test_toml_paths.py`

- [ ] **Step 1.1: Pre-edit checkpoint**

```powershell
git -C C:\projects\manual_slop add .
```

- [ ] **Step 1.2: Create the audit script**

```python
#!/usr/bin/env python3
# scripts/check_test_toml_paths.py
"""Detect tests that read/write real TOML files. Used as a CI gate.

Run from repo root: python scripts/check_test_toml_paths.py
Exits 0 if all tests use sandboxed paths, 1 otherwise.
"""
from __future__ import annotations
import re
import sys
from pathlib import Path

TOML_BASENAMES = {
    "manual_slop", "config", "credentials",
    "presets", "personas", "tool_presets",
    "workspace_profiles",
}

PATTERNS = [
    re.compile(rf'Path\(["\'](?:{"|".join(TOML_BASENAMES)})\.toml["\']'),
    re.compile(rf'open\(["\'](?:{"|".join(TOML_BASENAMES)})\.toml["\']'),
    re.compile(rf'["\']\.{{1,2}}/(?:{"|".join(TOML_BASENAMES)})\.toml["\']'),
    re.compile(rf'Path\(["\']\.\./(?:{"|".join(TOML_BASENAMES)})\.toml["\']'),
]

EXCLUDE_DIRS = {"artifacts", "logs", "__pycache__", "snapshots"}


def find_violations(tests_dir: Path) -> list[tuple[Path, int, str]]:
    violations = []
    for test_file in tests_dir.rglob("test_*.py"):
        if any(excluded in test_file.parts for excluded in EXCLUDE_DIRS):
            continue
        try:
            content = test_file.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue
        for lineno, line in enumerate(content.splitlines(), start=1):
            for pattern in PATTERNS:
                if pattern.search(line):
                    violations.append((test_file, lineno, line.strip()))
                    break
    return violations


def main() -> int:
    repo_root = Path(__file__).resolve().parent.parent
    tests_dir = repo_root / "tests"
    if not tests_dir.exists():
        print(f"Tests dir not found: {tests_dir}", file=sys.stderr)
        return 1
    violations = find_violations(tests_dir)
    if not violations:
        print("OK: No tests reference real TOML files.")
        return 0
    print(f"FAIL: {len(violations)} test(s) reference real TOML files:")
    for path, lineno, line in violations:
        rel = path.relative_to(repo_root)
        print(f"  {rel}:{lineno}: {line}")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```

- [ ] **Step 1.3: Run the script (expect failures)**

```powershell
python scripts/check_test_toml_paths.py
```

Expected: Exit code 1 and a list of violations, since the migration hasn't happened yet.
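To preview which lines the script will flag, the sketch below applies two patterns of the same shape to a few sample lines. It is an illustration only: `BASENAMES` is a hypothetical subset of the script's list, `is_flagged` is a made-up helper, and the sample lines are invented.

```python
import re

# Hypothetical subset of the basenames used by the audit script.
BASENAMES = ["config", "presets"]
ALTERNATION = "|".join(BASENAMES)

# Same shape as two of the script's patterns: a bare Path("name.toml") call
# and a quoted ./name.toml or ../name.toml literal.
PATTERNS = [
    re.compile(r"Path\([\"'](?:" + ALTERNATION + r")\.toml[\"']"),
    re.compile(r"[\"']\.{1,2}/(?:" + ALTERNATION + r")\.toml[\"']"),
]


def is_flagged(line: str) -> bool:
    """Return True if any pattern matches the given source line."""
    return any(p.search(line) for p in PATTERNS)


samples = [
    'presets_path = Path("presets.toml")',      # bare real path
    'cfg = load("../config.toml")',              # relative real path
    'presets_path = tmp_path / "presets.toml"',  # sandboxed path
]
for sample in samples:
    print(is_flagged(sample), sample)
```

The first two samples match and the third does not, because a `tmp_path / "presets.toml"` expression has neither a `Path(` call nor a `./` prefix in front of the quoted name.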

- [ ] **Step 1.4: Commit**

```powershell
git -C C:\projects\manual_slop add scripts/check_test_toml_paths.py
git -C C:\projects\manual_slop commit -m "feat(tests): add check_test_toml_paths.py audit script"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Greps tests/ for direct ./<name>.toml references. Excludes artifacts, logs, __pycache__, snapshots. Exits 0 on clean, 1 on violations. Used as CI gate." $_ }
```

---

## Task 2: Add the enforcement fixture

**Files:**
- Modify: `tests/conftest.py`

- [ ] **Step 2.1: Pre-edit checkpoint**

```powershell
git -C C:\projects\manual_slop add .
```

- [ ] **Step 2.2: Read current `tests/conftest.py` to find insertion point**

Use `manual-slop_py_get_code_outline` to see the existing fixtures (especially `isolate_workspace` at line 71).

- [ ] **Step 2.3: Add the `enforce_no_real_toml` fixture**

Add near the existing `isolate_workspace` fixture (around line 70-90):

```python
# In tests/conftest.py, after isolate_workspace
@pytest.fixture(autouse=True)
def enforce_no_real_toml(request, tmp_path, monkeypatch):
    """Snapshot any real TOML files in cwd, remove them for the test, restore after.

    This prevents tests from accidentally reading/writing the user's real config.
    Tests must use tmp_path or monkeypatch to get their TOML data.
    """
    from pathlib import Path as _P
    real_toml_basenames = [
        "manual_slop.toml", "config.toml", "credentials.toml",
        "presets.toml", "personas.toml", "tool_presets.toml",
        "workspace_profiles.toml",
    ]
    snapshots: dict[_P, bytes] = {}
    cwd = _P.cwd()
    for name in real_toml_basenames:
        p = cwd / name
        if p.exists():
            snapshots[p] = p.read_bytes()
            p.unlink()
    try:
        yield
    finally:
        for p, content in snapshots.items():
            p.write_bytes(content)
```

- [ ] **Step 2.4: Run the existing test suite to verify the fixture doesn't break things**

```powershell
uv run pytest tests/ -x --timeout=60 -q
```

Expected: Existing tests should still pass. If a test was relying on a real TOML being present, it will fail and needs migration (Task 4). Because of `-x`, the run stops at the first such failure.
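The snapshot, remove, and restore behavior can be tried outside pytest first. The following is a minimal sketch of the same idea as a context manager run against a throwaway directory; `hide_files` is a hypothetical helper written for this illustration and is not part of `conftest.py`.

```python
import contextlib
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def hide_files(directory: Path, names: list[str]) -> Iterator[None]:
    """Temporarily remove the named files from directory, restoring them on exit."""
    snapshots: dict[Path, bytes] = {}
    for name in names:
        p = directory / name
        if p.exists():
            snapshots[p] = p.read_bytes()
            p.unlink()
    try:
        yield
    finally:
        # Runs even if the body raises, mirroring the fixture's finally block.
        for p, content in snapshots.items():
            p.write_bytes(content)


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    cfg = root / "config.toml"
    original = b"[ai]\nprovider = 'gemini'\n"
    cfg.write_bytes(original)
    with hide_files(root, ["config.toml", "presets.toml"]):
        hidden_during = not cfg.exists()
    restored_after = cfg.read_bytes() == original
    print(hidden_during, restored_after)
```

Names that do not exist in the directory (here `presets.toml`) are skipped, so nothing is created on restore that was not there before.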

- [ ] **Step 2.5: Commit**

```powershell
git -C C:\projects\manual_slop add tests/conftest.py
git -C C:\projects\manual_slop commit -m "test(infra): add enforce_no_real_toml autouse fixture"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Autouse fixture in tests/conftest.py. Snapshots any real TOML in cwd before test, removes it, restores after. Tests must use tmp_path or monkeypatch to access TOML data." $_ }
```

---

## Task 3: Tests for the enforcer

**Files:**
- Create: `tests/test_enforce_no_real_toml.py`

- [ ] **Step 3.1: Pre-edit checkpoint**

```powershell
git -C C:\projects\manual_slop add .
```

- [ ] **Step 3.2: Create the meta-test**

```python
# tests/test_enforce_no_real_toml.py
import os
from pathlib import Path
import subprocess
import sys


def test_check_script_exits_zero_on_clean_suite():
    """The audit script returns 0 when no violations exist."""
    result = subprocess.run(
        [sys.executable, "scripts/check_test_toml_paths.py"],
        capture_output=True, text=True,
        cwd=os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
    )
    # If we're in a clean state, this should be 0.
    # If there are still violations, this documents the current state.
    assert result.returncode in (0, 1), f"Unexpected exit code: {result.returncode}"


def test_enforce_fixture_runs_without_error():
    """The fixture completes without raising."""
    # If we get here, the fixture ran successfully.
    assert True


def test_real_toml_restored_after_test(tmp_path, monkeypatch):
    """Verify the fixture leaves TOML files outside its basename list untouched."""
    cwd = Path.cwd()
    real_path = cwd / "_enforce_test_temp.toml"
    if real_path.exists():
        real_path.unlink()
    real_path.write_bytes(b"[test]\nkey='value'")
    try:
        # _enforce_test_temp.toml is not in real_toml_basenames, so the
        # fixture never snapshots or removes it; it must still exist here.
        assert real_path.exists()
    finally:
        if real_path.exists():
            real_path.unlink()
```

- [ ] **Step 3.3: Run the meta-tests**

```powershell
uv run pytest tests/test_enforce_no_real_toml.py -v
```

Expected: 3 tests pass.
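A stricter check is possible if desired. The sketch below assumes the autouse fixture is active and that `REAL_TOML_BASENAMES` mirrors the fixture's `real_toml_basenames` list; `visible_real_tomls` is a hypothetical helper. If `isolate_workspace` already moves the working directory into a temp dir, this test passes trivially and adds little.

```python
from pathlib import Path

# Assumed to mirror real_toml_basenames in the enforce_no_real_toml fixture.
REAL_TOML_BASENAMES = [
    "manual_slop.toml", "config.toml", "credentials.toml",
    "presets.toml", "personas.toml", "tool_presets.toml",
    "workspace_profiles.toml",
]


def visible_real_tomls(directory: Path) -> list[str]:
    """Return the listed basenames that currently exist in directory."""
    return [name for name in REAL_TOML_BASENAMES if (directory / name).exists()]


def test_no_listed_toml_visible_during_test():
    # With the fixture active, every listed file should have been removed
    # from the working directory for the duration of the test.
    assert visible_real_tomls(Path.cwd()) == []
```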

- [ ] **Step 3.4: Commit**

```powershell
git -C C:\projects\manual_slop add tests/test_enforce_no_real_toml.py
git -C C:\projects\manual_slop commit -m "test(infra): add tests for enforce_no_real_toml fixture"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Meta-tests for the enforcer: script exit codes, fixture completes, real-file state not affected by unrelated tmp files." $_ }
```

---

## Task 4: Migrate test offenders

**Files:**
- Various `tests/test_*.py` (find via `check_test_toml_paths.py`)

- [ ] **Step 4.1: Generate the offender list**

```powershell
python scripts/check_test_toml_paths.py 2>&1 | Tee-Object -FilePath scripts/_offenders.txt
```

- [ ] **Step 4.2: For each offender, migrate**

Pick one violation at a time. For each:

1. Read the test file
2. Identify the line with the violation
3. Refactor to use `tmp_path` + `monkeypatch` (or the `isolate_workspace` fixture if it's already running)
4. Run that test in isolation to verify it still passes
5. Commit the change
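Since commits are per-file, it can help to group the offender report by file before starting. The sketch below parses rows in the `  <path>:<lineno>: <line>` shape that the audit script prints; `group_offenders` and the sample paths are hypothetical. It takes already-decoded lines rather than a file path, so the encoding of `scripts/_offenders.txt` is left to the caller.

```python
import re
from collections import defaultdict
from typing import Iterable

# Matches the indented "  <path>:<lineno>: <source line>" rows of the report.
ROW = re.compile(r"^\s+(?P<path>.+?):(?P<lineno>\d+): (?P<line>.*)$")


def group_offenders(report_lines: Iterable[str]) -> dict[str, list[int]]:
    """Map each offending file to the line numbers reported for it."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for raw in report_lines:
        match = ROW.match(raw)
        if match:  # the unindented FAIL/OK summary line does not match
            grouped[match.group("path")].append(int(match.group("lineno")))
    return dict(grouped)


# Invented sample report for illustration.
report = [
    "FAIL: 3 test(s) reference real TOML files:",
    '  tests/test_presets.py:12: presets_path = Path("presets.toml")',
    '  tests/test_presets.py:30: data = open("presets.toml").read()',
    '  tests/test_config.py:7: cfg = Path("config.toml")',
]
for path, linenos in group_offenders(report).items():
    print(path, linenos)
```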

**Example migration pattern:**

Before:
```python
def test_load_presets():
    presets_path = Path("presets.toml")
    data = tomllib.loads(presets_path.read_text())
    assert "default" in data
```

After:
```python
def test_load_presets(tmp_path, monkeypatch):
    from src import paths
    presets_path = tmp_path / "presets.toml"
    presets_path.write_text("[default]\n")
    monkeypatch.setattr(paths, "get_global_presets_path", lambda: presets_path)
    data = tomllib.loads(presets_path.read_text())
    assert "default" in data
```

- [ ] **Step 4.3: Re-run the audit script after each batch**

```powershell
python scripts/check_test_toml_paths.py
```

Goal: 0 violations.

- [ ] **Step 4.4: Commit per-file**

```powershell
git -C C:\projects\manual_slop add tests/test_<migrated_file>.py
git -C C:\projects\manual_slop commit -m "test(<scope>): migrate <file> to sandboxed TOML pattern"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Migrated <N> tests in <file> from real TOML to tmp_path + monkeypatch. Verified with full test suite." $_ }
```

(Repeat for each file.)

---

## Task 5: Consolidate similar tests (judgment call)

**Files:**
- Various `tests/test_*.py`

- [ ] **Step 5.1: Identify consolidation candidates**

Run an analysis of test files:

```powershell
Get-ChildItem C:\projects\manual_slop\tests\test_*.py | Group-Object { $_.BaseName -replace '^test_', '' -replace '_.*$', '' } | Where-Object { $_.Count -gt 2 } | Select-Object Name, Count
```

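Look for groups with more than 2 files. The command groups by the first token after `test_`, so it surfaces prefix families such as `test_ai_*.py` or `test_provider_*.py`; families that share only a suffix (e.g., `test_*_provider.py`) need a separate check.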
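The same grouping can be sketched in Python, which makes the rule explicit: strip `test_`, keep everything up to the next underscore, and report tokens shared by more than 2 files. `first_token` and `candidate_groups` are hypothetical helpers, and the sample stems are illustrative.

```python
from collections import Counter
from typing import Iterable


def first_token(stem: str) -> str:
    """Return the text between 'test_' and the next underscore."""
    return stem.removeprefix("test_").split("_", 1)[0]


def candidate_groups(stems: Iterable[str], min_count: int = 3) -> dict[str, int]:
    """Count files per leading token and keep groups with at least min_count."""
    counts = Counter(first_token(stem) for stem in stems)
    return {name: n for name, n in counts.items() if n >= min_count}


# Illustrative stems (file names without the .py extension).
stems = [
    "test_provider_gemini_init",
    "test_provider_anthropic_init",
    "test_provider_deepseek_init",
    "test_ai_settings_load",
    "test_paths",
]
print(candidate_groups(stems))
```

To run it against the repo, the stems could come from `Path("tests").glob("test_*.py")` via each path's `.stem`.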

- [ ] **Step 5.2: For each candidate, evaluate**

For each candidate group, ask:
- Are these tests testing the same thing?
- Would merging them make the failure messages clearer?
- Would merging them make it harder to find a specific test?
- Is the test count reduction a real win, or just a number?

If the answer is "merging improves clarity," proceed. Otherwise, leave alone.

- [ ] **Step 5.3: Consolidation pattern (if proceeding)**

Before (3 files):
```
tests/test_provider_gemini_init.py
tests/test_provider_anthropic_init.py
tests/test_provider_deepseek_init.py
```

After (1 file, parametrized):
```python
# tests/test_provider_init.py
import pytest

@pytest.mark.parametrize("provider_name", ["gemini", "anthropic", "deepseek"])
def test_provider_init(provider_name, tmp_path, monkeypatch):
    # Common test logic, parametrized
    ...
```

- [ ] **Step 5.4: Commit per consolidation**

```powershell
git -C C:\projects\manual_slop add tests/test_consolidated.py tests/test_removed_file1.py tests/test_removed_file2.py
git -C C:\projects\manual_slop commit -m "test(consolidate): merge <topic> tests into parametrized single file"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Consolidated N test files into 1 parametrized test. <Rationale for clarity gain>." $_ }
```

(Only if a consolidation was actually performed.)

---

## Task 6: Phase Completion Verification

- [ ] **Step 6.1: Final audit**

```powershell
python scripts/check_test_toml_paths.py
```

Expected: "OK: No tests reference real TOML files." Exit 0.

- [ ] **Step 6.2: Full test suite**

```powershell
uv run pytest tests/ -q --timeout=60
```

Expected: All tests pass.

- [ ] **Step 6.3: Create the checkpoint commit**

```powershell
git -C C:\projects\manual_slop commit --allow-empty -m "conductor(checkpoint): Test consolidation & TOML sandboxing complete"
git -C C:\projects\manual_slop log -1 --format='%H' | ForEach-Object { git -C C:\projects\manual_slop notes add -m "Track complete. Audit script passes (0 violations). enforce_no_real_toml fixture in place. N tests migrated, M consolidations performed (or 0 if none were warranted)." $_ }
```

---

## Self-Review

- **Spec coverage:** All design phases have a task. ✓
- **Placeholder scan:** Migration example is concrete. ✓
- **Type consistency:** Fixture name `enforce_no_real_toml` used consistently. ✓
- **Consolidation is opt-in:** Task 5 explicitly says "if merging improves clarity" — not forced. ✓