Merge remote-tracking branch 'tier2-clone/master' into tier2/result_migration_app_controller_20260618

2026-06-19 01:11:35 -04:00
parent 8bbec5ce12 ccff6cd5e1
commit aa3c993f4a
5 changed files with 1403 additions and 0 deletions
@@ -34,6 +34,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
 | 7b | B | [Continued SQLite-Granularity Inline Docs for gui_2.py](#track-continued-sqlite-granularity-inline-docs-for-gui_2py) | spec ✓, plan ✓, complete | (none — independent) |
 | 7c | B | [SQLite-Granularity Inline Docs for ai_client.py](#track-sqlite-granularity-inline-docs-for-ai_clientpy) | spec ✓, plan ✓, ready to start | (none — independent) |
 | 7d | A | [Live GUI Test Infrastructure Fixes](#track-live-gui-test-infrastructure-fixes-new-2026-06-18) | spec ✓, plan ✓, metadata ✓, state ✓, **active**; addresses 2 issues reported for diff tracks by `result_migration_small_files_20260617` Phase 13: (1) `test_execution_sim_live` GUI subprocess (port 8999) crashes mid-test during script generation flow — same failure with both `gemini_cli` and `gemini`; NOT provider-specific; 90s timeout reached without AI text; (2) `test_live_gui_workspace_exists` xdist race — workspace cleanup timing under parallel xdist; passes in isolation. 4 phases: (1) Investigation + Issue 2 parent-commit verification; (2) Fix Issue 2 (TDD); (3) Fix Issue 1 (TDD + remove diagnostic logging); (4) Final verification (11/11 tiers PASS clean). | `result_migration_small_files_20260617` (shipped 2026-06-18 with the 2 issues reported for diff tracks) | (**NEW 2026-06-18**; test-infrastructure track; 2-3 files affected (test + src); TDD for each issue; 11-tier verification required; NO new `@pytest.mark.skip` markers per user directive; out of scope: the 4 Gemini 503 skip markers from sub-track 2 Phase 13 — deferred to a separate follow-up track that mocks the Gemini API in `summarize.summarise_file`) |
+| 16 | A | [Test Sandbox Hardening](#track-test-sandbox-hardening-new-2026-06-19) | spec ✓, plan ✓, metadata ✓, state ✓, **ready to start**; 5-part fix for test data loss outside `./tests/`. Phase 1: investigation + baseline pass count + audit of `get_config_path()` callers. Phase 2: `scripts/audit_test_sandbox_violations.py` (FR4 static audit + `--strict` CI gate). Phase 3: `_enforce_test_sandbox` autouse fixture in conftest.py using `sys.addaudithook` (FR1 Python guard; hard fail on any write outside `./tests/`). Phase 4: root-cause fix — remove `SLOP_CONFIG` env-var fallback from `src/paths.py`; add `--config <path>` CLI flag to sloppy.py + conftest.py; `set_config_override(path)` module-level API (FR2). Phase 5: `isolate_workspace` migration off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`; pyproject.toml `--basetemp` addopts; `SLOP_CREDENTIALS`/`SLOP_MCP_ENV` env vars added to non-live_gui tests; tech-stack.md dated note (FR3). Phase 6: `scripts/run_tests_sandboxed.ps1` (FR5 Windows restricted-token wrapper, OPT-IN). Phase 7: `conductor/code_styleguides/test_sandbox.md` + updates to workspace_paths.md and guide_testing.md (FR7 docs). Phase 8: full 11-tier verification. Phase 9: end-of-track report. 13 regression tests in `tests/test_test_sandbox.py`. ~11 atomic commits. | (none — independent; **NEW 2026-06-19**; test-infrastructure + root-cause fix; primary motivation: user has lost important sample data multiple times over the past month because tests wrote to top-level TOML files; **NO ENV VARS for config path per user directive** — `--config` CLI flag is the only override mechanism; test workspace file naming: `config_overrides.toml`; hard fail on any sandbox violation; tests should never need AppData temp (`tempfile.mkdtemp/mkstemp` without `dir=` is flagged); baseline 1288 + 4 + 0; **out of scope**: converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) to CLI flags — user considers this a separate "mess" to address in follow-up tracks; deferred: macOS/Linux OS-level wrapper, per-fixture sandbox strictness tuning, read-side isolation) |
 | 8 | — | [Bootstrap gencpp Python Bindings](#track-bootstrap-gencpp-python-bindings) | spec TBD | (none — independent) |
 | 9 | — | [Tree-Sitter Lua MCP Tools](#track-tree-sitter-lua-mcp-tools) | spec TBD | (none — independent) |
 | 10 | — | [GDScript Language Support Tools](#track-gdscript-language-support-tools) | spec TBD | (none — independent) |
@@ -806,6 +807,40 @@ Lightweight chronology; full spec/plan/state per track is in the linked folder.

 *Out of scope (deferred to follow-up track): the 4 `@pytest.mark.skip` markers for Gemini 503 pre-existing failures (`test_auto_aggregate_skip`, `test_view_mode_summary`, `test_view_mode_default_summary`, `test_view_mode_custom_empty_default_to_summary`). To remove them, mock the Gemini API in `summarize.summarise_file` for tests.*

+#### Track: Test Sandbox Hardening (hard sandbox for tests; root-cause fix for test data loss) `[track-created: 2026-06-19]`
+*Link: [./tracks/test_sandbox_hardening_20260619/](./tracks/test_sandbox_hardening_20260619/), Spec: [./tracks/test_sandbox_hardening_20260619/spec.md](./tracks/test_sandbox_hardening_20260619/spec.md), Plan: [./tracks/test_sandbox_hardening_20260619/plan.md](./tracks/test_sandbox_hardening_20260619/plan.md), Metadata: [./tracks/test_sandbox_hardening_20260619/metadata.json](./tracks/test_sandbox_hardening_20260619/metadata.json)*
+
+*Status: 2026-06-19 - SPEC + PLAN committed. Ready for Tier 2 implementation. 9 phases, 30 tasks, ~11 atomic commits.*
+
+*Goal: Make any `pytest` or `run_tests_batched.py` invocation provably incapable of writing files outside `./tests/`. Default-on Python guard + opt-in OS-level wrapper. Root-cause fix: eliminate the silent `SLOP_CONFIG` env-var fallback that lets tests accidentally touch the user's real `manual_slop.toml` and related top-level files.*
+
+*The 5 enforcement layers:*
+1. **FR2 root-cause fix** — `src/paths.py:get_config_path()` no longer falls back to `<project_root>/config.toml` via `SLOP_CONFIG`. New API: `paths.set_config_override(path)`. CLI flag `--config <path>` at the entry point (sloppy.py for production, conftest.py for tests).
+2. **FR1 Python guard** — `sys.addaudithook` autouse fixture blocks writes outside `./tests/` with `RuntimeError("TEST_SANDBOX_VIOLATION: ...")`. Hard fail; reads unaffected.
+3. **FR3 isolation migration** — `isolate_workspace` moved off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`. pyproject.toml adds `addopts = "--basetemp=tests/artifacts/_pytest_tmp"`. All test infra paths now under `./tests/`.
+4. **FR4 static audit** — `scripts/audit_test_sandbox_violations.py` flags hardcoded paths to top-level TOMLs + `tempfile.mkdtemp/mkstemp` without `dir=`. CI gate (`--strict` exits 1).
+5. **FR5 OS-level wrapper** — `scripts/run_tests_sandboxed.ps1` (Windows restricted-token + Job Object; OPT-IN).
+
+*User directives (locked 2026-06-19):*
+- NO ENV VARS for config path. `--config` CLI flag is the only override mechanism.
+- Test workspace file naming: `config_overrides.toml` (per user direction).
+- Hard fail on any sandbox violation (no warnings, no soft fails).
+- Tests should never need AppData temp.
+- Out of scope (deferred to follow-up tracks): converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) — user considers this the "mess" to address separately.
+
+*Baseline (per `result_migration_small_files_20260617` shipped 2026-06-18): 1288 passed + 4 xdist-skipped. VC8 requires no regression vs. this baseline.*
+
+*Root causes of data loss (per Phase 1 audit):*
+1. `src/paths.py:get_config_path()` at line 42 silently falls back to `<project_root>/config.toml` when `SLOP_CONFIG` is unset (the default for tests). This is the silent default that bites.
+2. `tests/conftest.py:isolate_workspace` at line 265 uses `tmp_path_factory.mktemp` which lives in `%TEMP%\pytest-of-<user>\` on Windows — outside `./tests/`.
+3. The Layer 1 Python guard is the runtime safety net; FR2 + FR3 are the proper fixes.
+
+*Deferred follow-up tracks (per metadata.json `deferred_to_followup_tracks`):*
+- Convert the other 7 `SLOP_*` env vars to CLI flags (same pattern: `paths.set_<thing>_override()` + entry-point flag).
+- macOS/Linux OS-level sandbox wrapper (`run_tests_sandboxed.sh` using `bwrap`/`unshare`).
+- Per-fixture sandbox strictness tuning (`@pytest.fixture(sandbox_strict=True)`).
+- Read-side isolation (block reads of real config from tests).
+
 ## Phase 9: Chore Tracks

 *Initialized: 2026-06-07*
@@ -0,0 +1,169 @@
+{
+  "track_id": "test_sandbox_hardening_20260619",
+  "name": "Test Sandbox Hardening",
+  "created": "2026-06-19",
+  "status": "spec_written",
+  "blocked_by": [],
+  "blocks": [],
+  "priority": "A",
+  "rationale": "User has lost important sample data multiple times over the past month because tests have written to top-level TOML files (manual_slop.toml, manual_slop_history.toml, personas.toml, presets.toml, tool_presets.toml, credentials.toml) at the project root. This track adds a 4-layer enforcement stack to make such writes impossible at the Python layer (default) and at the OS layer (opt-in).",
+  "scope": {
+    "new_files": [
+      "scripts/audit_test_sandbox_violations.py",
+      "scripts/run_tests_sandboxed.ps1",
+      "tests/test_test_sandbox.py",
+      "conductor/code_styleguides/test_sandbox.md"
+    ],
+    "modified_files": [
+      "src/paths.py",
+      "src/models.py",
+      "sloppy.py",
+      "tests/conftest.py",
+      "pyproject.toml",
+      "conductor/tech-stack.md",
+      "conductor/code_styleguides/workspace_paths.md",
+      "docs/guide_testing.md",
+      ".gitignore"
+    ],
+    "deleted_files": []
+  },
+  "estimated_effort": {
+    "method": "scope (per workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
+    "phase_1": "1 task: baseline pass-rate capture + verification that isolate_workspace + check_test_toml_paths work as documented",
+    "phase_2": "1 audit script + 4 regression tests + 1 commit",
+    "phase_3": "1 conftest fixture (Layer 1 audit hook) + 4 guard-specific regression tests + 1 commit",
+    "phase_4": "isolate_workspace migration + pyproject.toml addopts + tech-stack.md note + 1 commit",
+    "phase_5": "1 PowerShell wrapper (Layer 3) + 1 smoke test + 1 commit",
+    "phase_6": "2 doc files updated or 1 new styleguide + 1 commit",
+    "phase_7": "11-tier verification run + 1 commit (report)",
+    "phase_8": "1 end-of-track report + 1 commit",
+    "summary": "8 phases, ~8 commits, ~10-12 source files touched across scripts/, tests/, pyproject.toml, docs/, conductor/"
+  },
+  "verification_criteria": [
+    "tests/test_test_sandbox.py exists and all 13 tests pass",
+    "scripts/audit_test_sandbox_violations.py runs in both default and --strict modes",
+    "pyproject.toml contains addopts = '--basetemp=tests/artifacts/_pytest_tmp' under [tool.pytest.ini_options]",
+    "tests/conftest.py isolate_workspace no longer calls tmp_path_factory.mktemp (per workspace_paths.md); all env-var redirects point to paths inside ./tests/artifacts/",
+    "src/paths.py:get_config_path() does NOT call os.environ.get('SLOP_CONFIG', ...); uses set_config_override() instead",
+    "src/paths.py:set_config_override(path) exists and is callable from sloppy.py and conftest.py",
+    "sloppy.py accepts --config argparse argument and calls paths.set_config_override() before importing src/",
+    "tests/conftest.py parses sys.argv for --config at module body (BEFORE any src/ import); auto-defaults to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml",
+    "tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml is auto-generated on every pytest run",
+    "conductor/code_styleguides/test_sandbox.md exists and documents the --config CLI flag + config_overrides.toml convention",
+    "scripts/run_tests_sandboxed.ps1 exists, parses cleanly, and on Windows can be invoked (-WhatIf mode for dry-run)",
+    "conductor/tech-stack.md has a dated note explaining the --basetemp choice",
+    "conductor/code_styleguides/workspace_paths.md or new test_sandbox.md documents the 3-layer model",
+    "Full test suite (11 tiers) runs to completion with no regression vs. pre-track baseline (1288 passed + 4 xdist-skipped per result_migration_small_files_20260617)",
+    "No new @pytest.mark.skip markers added (per conductor/workflow.md Skip-Marker Policy + user directive)",
+    "End-of-track report at docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md"
+  ],
+  "risk_register": [
+    {
+      "id": "R1",
+      "title": "Layer 1 audit hook breaks a test that legitimately writes outside ./tests/",
+      "likelihood": "medium",
+      "scope_impact": "the implementation may be larger than the spec suggests if many tests need to be migrated to tmp_path",
+      "mitigation": "allowlist includes pytest --basetemp; RuntimeError includes test name so offending test is obvious; add new paths to allowlist only via explicit allowlist update"
+    },
+    {
+      "id": "R2",
+      "title": "Layer 1 audit hook slows down the test suite",
+      "likelihood": "low",
+      "scope_impact": "minimal",
+      "mitigation": "sys.addaudithook is a thin C-level callback; overhead measured in <2% per Python docs"
+    },
+    {
+      "id": "R3",
+      "title": "Layer 4 audit flags a currently-passing test as a false positive",
+      "likelihood": "medium",
+      "scope_impact": "the implementation may be larger than the spec suggests if many tests need cleanup",
+      "mitigation": "audit is INFORMATIONAL by default; --strict is opt-in for CI; fix offending test rather than suppress audit"
+    },
+    {
+      "id": "R4",
+      "title": "Layer 3 PowerShell wrapper breaks on a Windows version without the required privileges",
+      "likelihood": "low",
+      "scope_impact": "minimal",
+      "mitigation": "wrapper is opt-in; default invocation stays uv run pytest; wrapper docs explain privilege requirements"
+    },
+    {
+      "id": "R5",
+      "title": "Existing tests that don't go through isolate_workspace still read real config files",
+      "likelihood": "high",
+      "scope_impact": "known gap, out of scope",
+      "mitigation": "Reads are out of scope per the Out of Scope section; Layer 1 still blocks writes which is the user's primary concern"
+    },
+    {
+      "id": "R7",
+      "title": "Removing SLOP_CONFIG env var fallback breaks code paths that relied on it",
+      "likelihood": "medium",
+      "scope_impact": "the implementation may be larger than the spec suggests if many call sites need updating",
+      "mitigation": "conftest.py auto-defaults to config_overrides.toml inside the test workspace; sloppy.py auto-defaults to root_dir/config.toml; the change should be transparent for any code that goes through get_config_path()"
+    },
+    {
+      "id": "R8",
+      "title": "conftest.py sys.argv parse at module body races with pytest's own argparse",
+      "likelihood": "low",
+      "scope_impact": "minimal",
+      "mitigation": "pytest_addoption registers --config so pytest doesn't warn about unknown flag; sys.argv parse at module body is a known-safe pattern (per conductor/tracks/test_infrastructure_hardening_20260609 conftest patterns)"
+    },
+    {
+      "id": "R6",
+      "title": "pytest_configure setting _tmp_path_factory._basetemp uses a private API that changes between versions",
+      "likelihood": "medium",
+      "scope_impact": "minimal",
+      "mitigation": "the --basetemp addopts is the primary mechanism; the _basetemp assignment is defensive only; if it breaks, addopts still works"
+    }
+  ],
+  "architecture_reference": {
+    "primary_styleguide": "conductor/code_styleguides/workspace_paths.md",
+    "secondary_styleguides": [
+      "conductor/code_styleguides/feature_flags.md",
+      "conductor/code_styleguides/data_oriented_design.md"
+    ],
+    "related_tracks": [
+      "conductor/archive/workspace_path_finalize_20260609/",
+      "conductor/tracks/tier2_autonomous_sandbox_20260616/",
+      "Test Consolidation & TOML Sandboxing (per conductor/tracks.md:395)"
+    ],
+    "pattern_references": [
+      "scripts/audit_no_temp_writes.py (audit script pattern)",
+      "scripts/tier2/run_tier2_sandboxed.ps1 (PowerShell wrapper pattern)",
+      "scripts/check_test_toml_paths.py (existing static audit)"
+    ]
+  },
+  "deferred_to_followup_tracks": [
+    {
+      "title": "Eliminate the remaining SLOP_* env vars (presets, credentials, etc.)",
+      "description": "This track only eliminates SLOP_CONFIG. The other 7 SLOP_* env vars (SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES, SLOP_CREDENTIALS, SLOP_MCP_ENV, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR) remain env-var-driven. Per user directive, this is the 'mess' to address in follow-up tracks. Same pattern: paths.set_<thing>_override() module-level + CLI flag at entry point.",
+      "track_status": "not yet specced"
+    },
+    {
+      "title": "Read-side isolation (block reads of real config from tests)",
+      "description": "Layer 1 only blocks writes; reads of real credentials.toml / config.toml still happen for tests that don't go through isolate_workspace. Future track could block reads via a stricter allowlist.",
+      "track_status": "not yet specced"
+    },
+    {
+      "title": "macOS/Linux OS-level sandbox wrapper",
+      "description": "Layer 3 is Windows-only (restricted token + Job Object). A run_tests_sandboxed.sh using bwrap/unshare would extend to macOS/Linux.",
+      "track_status": "not yet specced"
+    },
+    {
+      "title": "Per-fixture sandbox strictness tuning",
+      "description": "The blanket autouse fixture is the v1. A future track could add @pytest.fixture(sandbox_strict=True) for tests that need full OS isolation vs. the default Python guard.",
+      "track_status": "not yet specced"
+    }
+  ],
+  "regressions_and_pre_existing_failures": [],
+  "pre_existing_failures_remaining": [],
+  "user_directives": [
+    "Hard sandbox for tests, similar to Tier 2 - completely banned from accessing files outside ./tests/",
+    "No new @pytest.mark.skip markers",
+    "User has lost important sample data multiple times - this is the primary motivation",
+    "NO ENV VARS for config path. Use --config CLI flag at the entry point (sloppy.py for production, conftest.py for tests)",
+    "Test workspace file naming: config_overrides.toml (per user direction)",
+    "Out of scope: converting the other SLOP_* env vars (presets, credentials, etc.) to CLI flags. User considers them a separate mess to address in follow-up tracks.",
+    "Hard fail on any sandbox violation (no warnings, no soft fails)",
+    "Tests should never need AppData temp (tempfile.mkdtemp/mkstemp without dir= is a flag)"
+  ]
+}
@@ -0,0 +1,741 @@
+# Track Implementation Plan: Test Sandbox Hardening (2026-06-19)
+
+> **For Tier 3 workers:** This plan is executed task-by-task per `conductor/workflow.md`. Each task has WHERE / WHAT / HOW / SAFETY / COMMIT / GIT NOTE fields. Use the spec at `conductor/tracks/test_sandbox_hardening_20260619/spec.md` as the authoritative reference for FR/NFR/VC details.
+
+**Goal:** Make any `pytest` or `run_tests_batched.py` invocation provably incapable of writing files outside `./tests/` at the Python layer (default-on) and at the OS layer (opt-in via `scripts/run_tests_sandboxed.ps1`), by replacing the silent `SLOP_CONFIG` env-var fallback with an explicit `--config` CLI flag and adding a runtime file-I/O guard.
+
+**Architecture:** 5-part fix — (1) `src/paths.py` removes the env-var fallback; (2) sloppy.py + conftest.py parse `--config` and call `paths.set_config_override()`; (3) `sys.addaudithook` blocks writes outside `./tests/`; (4) pytest's `--basetemp` + conftest's `isolate_workspace` migrated under `./tests/`; (5) opt-in Windows restricted-token wrapper. Tests are TDD (red → green → commit).
+
+**Tech Stack:** Python 3.11+, `sys.addaudithook`, `pytest` 9.0+, PowerShell 7+, existing `tomli_w`, `tomllib`.
+
+**Reference files:**
+- Spec: `conductor/tracks/test_sandbox_hardening_20260619/spec.md`
+- Existing pattern: `scripts/audit_no_temp_writes.py` (audit script), `scripts/tier2/run_tier2_sandboxed.ps1` (PowerShell wrapper), `scripts/check_test_toml_paths.py` (existing static audit).
+
+---
+
+## Phase 1: Investigation + Baseline
+
+**Focus:** Capture current pass count + audit src/ for `get_config_path()` callers so FR2 changes are transparent.
+
+- [ ] **Task 1.1:** Capture baseline pass count.
+  - **WHERE:** None (read-only audit).
+  - **WHAT:** Run the full test suite and record results.
+  - **HOW:** `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11 > tests/artifacts/_baseline_pre_sandbox.txt 2>&1`
+  - **SAFETY:** Capture pass count + skip count + duration to `tests/artifacts/_baseline_pre_sandbox_summary.txt`. Do NOT modify any source file.
+  - **COMMIT:** None (audit-only).
+  - **GIT NOTE:** None.
+
+- [ ] **Task 1.2:** Audit `src/` for `get_config_path()` callers.
+  - **WHERE:** `src/` (grep audit).
+  - **WHAT:** Find every call site of `paths.get_config_path()` and `models._load_config_from_disk()` / `models._save_config_to_disk()`. The FR2 change (removing env-var fallback) must be transparent to all of them.
+  - **HOW:**
+    ```bash
+    grep -rn "get_config_path\|_load_config_from_disk\|_save_config_to_disk" src/ > tests/artifacts/_get_config_path_callers.txt
+    cat tests/artifacts/_get_config_path_callers.txt | wc -l  # record count
+    ```
+  - **SAFETY:** Expected ~10-20 call sites. All must be transparent because FR2's default (`<project_root>/config.toml`) matches the current silent fallback behavior.
+  - **COMMIT:** None.
+  - **GIT NOTE:** None.
+
+- [ ] **Task 1.3:** Phase 1 verification.
+  - **WHERE:** None.
+  - **WHAT:** Confirm baseline + audit files exist + no source changes since session start.
+  - **HOW:** `ls tests/artifacts/_baseline_pre_sandbox* tests/artifacts/_get_config_path_callers.txt; git status --short | wc -l`
+  - **SAFETY:** Phase 1 is READ-ONLY; `git status` must show 0 modified source files.
+  - **COMMIT:** None.
+  - **GIT NOTE:** None.
+
+---
+
+## Phase 2: FR4 Static Audit (LOW RISK — ship first)
+
+**Focus:** Write the static audit script that flags test files with hardcoded paths or `tempfile.mkdtemp()` without `dir=`. CI gate (default informational, `--strict` exits 1).
+
+- [ ] **Task 2.1:** Write `scripts/audit_test_sandbox_violations.py`.
+  - **WHERE:** Create `scripts/audit_test_sandbox_violations.py`.
+  - **WHAT:** Mirror `scripts/check_test_toml_paths.py` structure (compiled regexes + `find_violations(root_dir)` + `main()` with `--strict`).
+  - **HOW:** Patterns:
+    ```python
+    TOML_BASENAMES = r"manual_slop|config|credentials|presets|personas|tool_presets|workspace_profiles|project|manualslop_layout|manualslop_history|manualslop_history"
+    PATTERNS = [
+        re.compile(rf'Path\(["\'](?:{TOML_BASENAMES})\.toml["\']'),
+        re.compile(rf'Path\(["\'](?:{TOML_BASENAMES})\.ini["\']'),
+        re.compile(rf'open\(["\'](?:{TOML_BASENAMES})\.toml["\'], ["\']w["\']'),
+        re.compile(r'Path\(["\']C:[/\\]+projects'),
+        re.compile(r'Path\(["\']tests/artifacts/'),
+        re.compile(r"tempfile\.mk(dt|st)emp\("),  # bare calls without dir=
+    ]
+    EXCLUDE_DIRS = {"artifacts", "logs", "__pycache__"}
+    ```
+    Plus a `find_violations(tests_dir)` that scans `tests/test_*.py` and returns `list[tuple[Path, int, str]]`. Plus `main()` with `--strict` (exit 1 on any violation; default exit 0 with report).
+  - **SAFETY:** Audit is INFORMATIONAL by default (exits 0). `--strict` exits 1 only on violations. Per `conductor/code_styleguides/audit-script-conventions.md` (if exists) or `audit_no_temp_writes.py` precedent.
+  - **COMMIT:** `chore(audit): add scripts/audit_test_sandbox_violations.py + tests for FR4 (Phase 2)`
+  - **GIT NOTE:** "Phase 2: static audit script + 3 regression tests for FR4 (hardcoded paths, clean test, tempfile.mkdtemp without dir=). Audit default informational, --strict exits 1."
+
+- [ ] **Task 2.2:** Write tests 5, 6, 10 in `tests/test_test_sandbox.py`.
+  - **WHERE:** Create `tests/test_test_sandbox.py`.
+  - **WHAT:** Three tests for the audit script. Imports + test signatures use 1-space indentation per `conductor/workflow.md`.
+  - **HOW:**
+    ```python
+    import subprocess, sys
+    from pathlib import Path
+
+    def test_audit_flags_known_bad_pattern() -> None:
+        bad = Path("tests/artifacts/_audit_test_bad.py")
+        bad.parent.mkdir(parents=True, exist_ok=True)
+        bad.write_text('Path("manual_slop.toml").write_text("x")\n', encoding="utf-8")
+        result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
+                                  capture_output=True, text=True)
+        assert result.returncode == 1, f"Expected exit 1, got {result.returncode}"
+        bad.unlink()
+
+    def test_audit_passes_clean_test() -> None:
+        good = Path("tests/artifacts/_audit_test_good.py")
+        good.parent.mkdir(parents=True, exist_ok=True)
+        good.write_text("def test_x(tmp_path): tmp_path.joinpath('foo').write_text('x')\n", encoding="utf-8")
+        result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
+                                  capture_output=True, text=True)
+        assert result.returncode == 0, f"Expected exit 0, got {result.returncode}: {result.stdout}"
+        good.unlink()
+
+    def test_audit_flags_tempfile_mkdtemp_without_tests_dir() -> None:
+        bad = Path("tests/artifacts/_audit_test_tempfile.py")
+        bad.parent.mkdir(parents=True, exist_ok=True)
+        bad.write_text("import tempfile\ndef test_x(): tempfile.mkdtemp()\n", encoding="utf-8")
+        result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
+                                  capture_output=True, text=True)
+        assert result.returncode == 1, f"Expected exit 1, got {result.returncode}"
+        bad.unlink()
+    ```
+  - **SAFETY:** Tests must clean up their temp files even on failure (use `try/finally` or pytest fixture cleanup).
+  - **COMMIT:** Same as 2.1 (combined commit).
+  - **GIT NOTE:** Same as 2.1.
+
+- [ ] **Task 2.3:** Run Phase 2 tests to verify.
+  - **WHERE:** None.
+  - **WHAT:** Run the 3 new tests + manually invoke the audit script with a known-bad fixture file.
+  - **HOW:** `uv run python -m pytest tests/test_test_sandbox.py -v -k "audit_"`
+  - **SAFETY:** All 3 must pass. If any fail, debug and fix before committing.
+  - **COMMIT:** Same as 2.1.
+  - **GIT NOTE:** Same as 2.1.
+
+---
+
+## Phase 3: FR1 Python Guard (HIGH RISK — must be precise)
+
+**Focus:** Implement `sys.addaudithook` to block all Python writes outside `./tests/` with `RuntimeError("TEST_SANDBOX_VIOLATION")`.
+
+- [ ] **Task 3.1:** Write `_enforce_test_sandbox` autouse fixture in `tests/conftest.py`.
+  - **WHERE:** Modify `tests/conftest.py` — add new fixture near `isolate_workspace` at line ~258.
+  - **WHAT:** Install `sys.addaudithook` for `open` (write modes), `os.mkdir`, `os.makedirs`, `shutil.rmtree`, `tempfile.mkdtemp`, `tempfile.mkstemp`. Allowlist = anything under `<project_root>/tests/`. Block everything else.
+  - **HOW:** (Insert before the existing `isolate_workspace` fixture):
+    ```python
+    _SANDBOX_ALLOWLIST_PREFIXES: tuple[str, ...] = ()  # initialized in pytest_configure
+
+    def _sandbox_audit_hook(event: str, args: tuple[object, ...]) -> None:
+        """sys.addaudithook target. Blocks writes outside ./tests/."""
+        if event == "open":
+            path_obj, mode, *_ = args
+            if not isinstance(path_obj, (str, bytes, os.PathLike)):
+                return
+            if isinstance(mode, str) and not any(m in mode for m in ("w", "a", "x", "+")):
+                return
+            try:
+                resolved = Path(os.fspath(path_obj)).resolve()
+            except (OSError, ValueError):
+                return
+            if not _is_under_tests(resolved):
+                raise RuntimeError(
+                    f"TEST_SANDBOX_VIOLATION: attempted to write to {resolved} "
+                    f"(outside <project_root>/tests/). Use tmp_path or fixture-provided paths."
+                )
+
+    def _is_under_tests(path: Path) -> bool:
+        for prefix in _SANDBOX_ALLOWLIST_PREFIXES:
+            try:
+                path.relative_to(prefix)
+                return True
+            except ValueError:
+                pass
+        return False
+
+    @pytest.fixture(autouse=True)
+    def _enforce_test_sandbox() -> Generator[None, None, None]:
+        """Default-on runtime guard. Installed in pytest_configure."""
+        yield  # No-op; hook is installed at session start.
+
+    def pytest_configure(config: object) -> None:
+        global _SANDBOX_ALLOWLIST_PREFIXES
+        project_root = Path(__file__).resolve().parent.parent
+        _SANDBOX_ALLOWLIST_PREFIXES = (
+            str(project_root / "tests"),
+            str(Path("tests/artifacts/_pytest_tmp").resolve()),
+            str(Path("tests/artifacts/_isolation_workspace").resolve()),
+        )
+        sys.addaudithook(_sandbox_audit_hook)
+        _check_required_test_dependencies()  # existing call
+
+    def pytest_unconfigure(config: object) -> None:
+        # Note: sys.addaudithook is permanent for the process; no removal API.
+        # The hook stays active until process exit (pytest is the only Python here).
+        pass
+    ```
+    **IMPORTANT:** The existing `pytest_configure` at conftest.py:140 must be MERGED with the new one (don't create two definitions).
+  - **SAFETY:** The hook ONLY blocks write modes. Reads pass through. `.pytest_cache`, `__pycache__`, `.coverage` live under `./tests/` or project_root — verify with a quick test run before committing.
+  - **COMMIT:** `feat(tests): add _enforce_test_sandbox autouse fixture for FR1 (Phase 3)`
+  - **GIT NOTE:** "Phase 3: Python sys.addaudithook runtime guard. Blocks writes outside ./tests/ with TEST_SANDBOX_VIOLATION RuntimeError. Reads unaffected. Layer 1 of 4 enforcement stack."
+
+- [ ] **Task 3.2:** Write tests 1-4 in `tests/test_test_sandbox.py`.
+  - **WHERE:** Add to existing `tests/test_test_sandbox.py` (created in Phase 2).
+  - **WHAT:** Four tests verifying guard behavior.
+  - **HOW:**
+    ```python
+    def test_sandbox_blocks_writes_outside_tests_dir() -> None:
+        bad_path = Path(__file__).resolve().parent.parent / "manual_slop.toml"
+        with pytest.raises(RuntimeError, match="TEST_SANDBOX_VIOLATION"):
+            bad_path.write_text("corrupt", encoding="utf-8")
+
+    def test_sandbox_allows_writes_inside_tests_dir(tmp_path) -> None:
+        (tmp_path / "foo.txt").write_text("ok", encoding="utf-8")
+        assert (tmp_path / "foo.txt").read_text(encoding="utf-8") == "ok"
+
+    def test_sandbox_allows_writes_inside_tests_artifacts() -> None:
+        p = Path("tests/artifacts/_sandbox_test_allows/foo.txt")
+        p.parent.mkdir(parents=True, exist_ok=True)
+        p.write_text("ok", encoding="utf-8")
+        assert p.read_text(encoding="utf-8") == "ok"
+        p.unlink()
+
+    def test_sandbox_does_not_block_reads() -> None:
+        pyproject = Path(__file__).resolve().parent.parent / "pyproject.toml"
+        content = pyproject.read_text(encoding="utf-8")
+        assert "[tool.pytest.ini_options]" in content
+    ```
+  - **SAFETY:** Test 1 is expected to RAISE; pytest.raises catches it. Tests 2-3 must SUCCEED. Test 4 must SUCCEED (read-only).
+  - **COMMIT:** Same as 3.1 (combined).
+  - **GIT NOTE:** Same as 3.1.
+
+- [ ] **Task 3.3:** Run full Tier-1 unit suite to verify no regression.
+  - **WHERE:** None.
+  - **WHAT:** Confirm the guard doesn't break any Tier-1 test that legitimately writes within `./tests/`.
+  - **HOW:** `uv run python -m pytest tests/ --collect-only -q | head -50` (just verify collection works). Then `uv run python scripts/run_tests_batched.py --tiers 1 --timeout 120`
+  - **SAFETY:** Tier-1 may have tests that write to `tmp_path` (which now resolves under `./tests/artifacts/_pytest_tmp`). If any test fails, the guard's allowlist needs expansion. Document and add to allowlist only after careful review (the test should already be using `tmp_path`).
+  - **COMMIT:** Same as 3.1.
+  - **GIT NOTE:** Same as 3.1.
+
+---
+
+## Phase 4: FR2 Root-Cause Fix (--config CLI flag — MOST IMPORTANT)
+
+**Focus:** Replace the silent `SLOP_CONFIG` env-var fallback in `src/paths.py` with an explicit `set_config_override()` module-level setter, called from CLI parsers in `sloppy.py` and `tests/conftest.py`. This is THE fix for the user's data-loss pain.
+
+- [ ] **Task 4.1:** Refactor `src/paths.py` to remove the env-var fallback.
+  - **WHERE:** Modify `src/paths.py:42-46` (the `get_config_path()` function).
+  - **WHAT:** Remove `os.environ.get("SLOP_CONFIG", ...)` lookup. Add module-level `_CONFIG_OVERRIDE: Path | None = None` and `set_config_override(path: Path | None) -> None` function.
+  - **HOW:**
+    ```python
+    _CONFIG_OVERRIDE: Path | None = None
+
+    def set_config_override(path: Path | None) -> None:
+        """Set the active config.toml path. None = use default.
+        CLI flag is the ONLY way to override. No env var fallback.
+        [C: sloppy.py:main, tests/conftest.py:_setup_test_paths]"""
+        global _CONFIG_OVERRIDE
+        _CONFIG_OVERRIDE = path
+        _RESOLVED.clear()
+
+    def get_config_path() -> Path:
+        """Returns the active config.toml. If override is set, returns it.
+        Otherwise returns the default <project_root>/config.toml.
+        [C: src/app_controller.py:AppController.load_config,
+           src/app_controller.py:AppController.init_state,
+           src/models.py:_load_config_from_disk]"""
+        if _CONFIG_OVERRIDE is not None:
+            return _CONFIG_OVERRIDE
+        root_dir = Path(__file__).resolve().parent.parent
+        return root_dir / "config.toml"
+    ```
+  - **SAFETY:** The default behavior (no override) returns the same path as the previous env-var fallback when `SLOP_CONFIG` was unset. This is the SAME path the desktop GUI currently uses. So sloppy.py without `--config` works unchanged.
+  - **COMMIT:** `fix(paths): remove SLOP_CONFIG env-var fallback from get_config_path() (Phase 4, FR2 root-cause)`
+  - **GIT NOTE:** "Phase 4 task 4.1: root-cause fix for data loss. src/paths.py no longer silently falls back to <project_root>/config.toml via SLOP_CONFIG env var. New API: paths.set_config_override(path). Default behavior unchanged when no override is set."
+
+- [ ] **Task 4.2:** Remove diagnostic stderr line from `src/models.py:193`.
+  - **WHERE:** Modify `src/models.py:193` (in `_save_config_to_disk`).
+  - **WHAT:** Delete the `sys.stderr.write(f"[DEBUG] Saving config. Theme: {config.get('theme')}\n"); sys.stderr.flush()` line. Per `AGENTS.md` "No Diagnostic Noise in Production" rule.
+  - **HOW:** Delete the two lines.
+  - **SAFETY:** This is a pure removal of diagnostic noise. No behavior change for normal operation. If any test depends on this stderr output, it should be removed too (check `tests/` for `capsys` fixtures matching this output).
+  - **COMMIT:** Same as 4.1 (combined commit "src cleanup for FR2").
+  - **GIT NOTE:** Same as 4.1.
+
+- [ ] **Task 4.3:** Add `--config` argparse to `sloppy.py`.
+  - **WHERE:** Modify `sloppy.py` — the argparse setup (find the existing `ArgumentParser` block).
+  - **WHAT:** Add `--config <path>` flag. Call `paths.set_config_override(args.config)` BEFORE any `src/` import.
+  - **HOW:**
+    ```python
+    parser.add_argument("--config", type=str, default=None,
+                         help="Path to config.toml (default: <project_root>/config.toml)")
+    # ... parse args ...
+    if args.config:
+        from src import paths
+        paths.set_config_override(Path(args.config).resolve())
+    # THEN import the rest:
+    from src.gui_2 import App  # existing import below
+    ```
+  - **SAFETY:** The `set_config_override` call must happen BEFORE `from src.gui_2 import App` because that import chain eventually imports paths and may trigger `get_config_path()`.
+  - **COMMIT:** `feat(sloppy): add --config CLI flag for config.toml override (Phase 4, FR2)`
+  - **GIT NOTE:** "Phase 4 task 4.3: sloppy.py accepts --config <path>. Sets paths.set_config_override() before any src/ import. Default behavior unchanged."
+
+- [ ] **Task 4.4:** Update `tests/conftest.py` to parse `--config` at module body.
+  - **WHERE:** Modify `tests/conftest.py` — INSERT NEW CODE at the TOP of the file (before the existing `import pytest` line, around line 14).
+  - **WHAT:** Parse `sys.argv` for `--config` at module body BEFORE any `src/` import. Auto-default to `tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml`. Also register with pytest via `pytest_addoption`.
+  - **HOW:**
+    ```python
+    # === STAGE 1: Parse --config from sys.argv BEFORE any src/ import ===
+    import sys as _sys
+    from pathlib import Path as _Path
+
+    _RUN_ID = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
+    _ISOLATION_WORKSPACE = _Path(f"tests/artifacts/_isolation_workspace_{_RUN_ID}")
+    _ISOLATION_WORKSPACE.mkdir(parents=True, exist_ok=True)
+
+    def _parse_config_arg(argv: list[str]) -> _Path | None:
+        for i, arg in enumerate(argv[1:]):
+            if arg == "--config" and i + 1 < len(argv) - 1:
+                return _Path(argv[i + 2]).resolve()
+            if arg.startswith("--config="):
+                return _Path(arg.split("=", 1)[1]).resolve()
+        return None
+
+    _config_override_arg = _parse_config_arg(_sys.argv)
+    if _config_override_arg is None:
+        _config_override_arg = _ISOLATION_WORKSPACE / "config_overrides.toml"
+
+    # Set override BEFORE any src/ import
+    from src import paths as _paths  # noqa: E402
+    _paths.set_config_override(_config_override_arg)
+
+    # Register --config with pytest so it doesn't warn about unknown flag
+    def pytest_addoption(parser):
+        parser.addoption("--config", action="store", default=None,
+                          help="Manual Slop: override config.toml path for tests")
+    ```
+    **IMPORTANT:** This block must be inserted BEFORE `from src.app_controller import AppController` (line 64) and BEFORE any other `src/` imports. Also DELETE the `from src.gui_2 import App` line at line ~250 (move it after the new fixture insertion point to keep imports tidy).
+  - **SAFETY:** The sys.argv parse happens at conftest module import time, BEFORE pytest's argparse. The auto-generated `_config_override_arg` lives inside `./tests/artifacts/`, which the Layer 1 guard will allowlist. Tests that explicitly pass `--config /some/path` get that override. Tests without `--config` get the auto-sandbox.
+  - **COMMIT:** `feat(tests): parse --config CLI flag in conftest.py module body (Phase 4, FR2)`
+  - **GIT NOTE:** "Phase 4 task 4.4: conftest.py parses sys.argv for --config BEFORE any src/ import. Auto-defaults to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml. registers via pytest_addoption so pytest doesn't warn."
+
+- [ ] **Task 4.5:** Write tests 11, 12, 13 in `tests/test_test_sandbox.py`.
+  - **WHERE:** Add to existing `tests/test_test_sandbox.py`.
+  - **WHAT:** Three tests for the `--config` CLI flag behavior.
+  - **HOW:**
+    ```python
+    def test_config_override_via_cli_flag(tmp_path) -> None:
+        config_path = tmp_path / "my_config.toml"
+        config_path.write_text("[ai]\nprovider='gemini'\n", encoding="utf-8")
+        from src import paths
+        original = paths._CONFIG_OVERRIDE
+        try:
+            paths.set_config_override(config_path)
+            assert paths.get_config_path() == config_path
+        finally:
+            paths.set_config_override(original)
+
+    def test_paths_get_config_path_no_env_fallback(monkeypatch) -> None:
+        monkeypatch.delenv("SLOP_CONFIG", raising=False)
+        from src import paths
+        original = paths._CONFIG_OVERRIDE
+        try:
+            paths.set_config_override(None)
+            root = Path(__file__).resolve().parent.parent
+            assert paths.get_config_path() == root / "config.toml"
+        finally:
+            paths.set_config_override(original)
+
+    def test_sloppy_py_parses_config_flag() -> None:
+        import ast
+        sloppy = Path(__file__).resolve().parent.parent / "sloppy.py"
+        tree = ast.parse(sloppy.read_text(encoding="utf-8"))
+        found_config = False
+        for node in ast.walk(tree):
+            if isinstance(node, ast.arg) and node.arg == "config":
+                found_config = True
+        assert found_config, "sloppy.py must have a --config argparse argument"
+    ```
+  - **SAFETY:** Tests manipulate `paths._CONFIG_OVERRIDE` directly (private API but necessary for testing). Always restore in `finally` block.
+  - **COMMIT:** `test(sandbox): add regression tests for --config CLI flag (Phase 4)`
+  - **GIT NOTE:** "Phase 4 task 4.5: 3 regression tests for FR2 (--config CLI flag, no env var fallback, sloppy.py argparse)."
+
+- [ ] **Task 4.6:** Phase 4 verification — run a broad smoke test.
+  - **WHERE:** None.
+  - **WHAT:** Confirm sloppy.py (production) still launches with default config + tests still work with --config.
+  - **HOW:**
+    ```bash
+    # Production: sloppy.py without --config uses default
+    python sloppy.py --help  # should NOT raise; --config appears in help
+
+    # Tests: conftest auto-defaults to ./tests/artifacts/.../config_overrides.toml
+    uv run python -m pytest tests/test_test_sandbox.py::test_config_override_via_cli_flag -v
+    uv run python -m pytest tests/test_paths.py -v  # existing tests still work
+    ```
+  - **SAFETY:** If sloppy.py crashes at import, the `--config` ordering is wrong. If existing tests fail, the new default breaks something — debug before committing.
+  - **COMMIT:** None (this is verification, not a code change).
+  - **GIT NOTE:** None.
+
+---
+
+## Phase 5: FR3 isolate_workspace + basetemp migration
+
+**Focus:** Move the `isolate_workspace` workspace off `%TEMP%` to `./tests/artifacts/_isolation_workspace_<run_id>/`. Add `addopts = "--basetemp=..."` to pyproject.toml. Update tech-stack.md note.
+
+- [ ] **Task 5.1:** Refactor `isolate_workspace` in `tests/conftest.py`.
+  - **WHERE:** Modify `tests/conftest.py:259-281` (the existing `isolate_workspace` autouse).
+  - **WHAT:** Replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`. Add `SLOP_CREDENTIALS` + `SLOP_MCP_ENV` env vars. Auto-generate placeholder TOML files.
+  - **HOW:**
+    ```python
+    @pytest.fixture(autouse=True)
+    def isolate_workspace(monkeypatch) -> Generator[None, None, None]:
+        """Autouse fixture to isolate tests from the active user workspace.
+        Workspace lives under tests/artifacts/ per workspace_paths.md."""
+        test_workspace = _ISOLATION_WORKSPACE  # defined in conftest module body
+        test_workspace.mkdir(parents=True, exist_ok=True)
+
+        # Generate placeholder TOML files
+        config_content = {
+            "ai": {"provider": "gemini", "model": "gemini-2.5-flash-lite"},
+            "projects": {"paths": [], "active": ""},
+            "gui": {"show_windows": {}},
+        }
+        with open(test_workspace / "config_overrides.toml", "wb") as f:
+            tomli_w.dump(config_content, f)
+        for name in ("credentials.toml", "mcp_env.toml", "presets.toml",
+                     "tool_presets.toml", "personas.toml", "workspace_profiles.toml"):
+            (test_workspace / name).touch()
+
+        monkeypatch.setenv("SLOP_CREDENTIALS", str(test_workspace / "credentials.toml"))
+        monkeypatch.setenv("SLOP_MCP_ENV", str(test_workspace / "mcp_env.toml"))
+        monkeypatch.setenv("SLOP_GLOBAL_PRESETS", str(test_workspace / "presets.toml"))
+        monkeypatch.setenv("SLOP_GLOBAL_TOOL_PRESETS", str(test_workspace / "tool_presets.toml"))
+        monkeypatch.setenv("SLOP_GLOBAL_PERSONAS", str(test_workspace / "personas.toml"))
+        monkeypatch.setenv("SLOP_GLOBAL_WORKSPACE_PROFILES", str(test_workspace / "workspace_profiles.toml"))
+        yield
+    ```
+    **Note:** The `tmp_path_factory` parameter is REMOVED from this fixture. Tests that legitimately need it should request it directly (`def test_x(tmp_path): ...`).
+  - **SAFETY:** All env vars point INSIDE the isolation workspace, which is inside `./tests/artifacts/`. The Layer 1 guard allows this. No test should break UNLESS it was relying on the previous `%TEMP%` path.
+  - **COMMIT:** `refactor(tests): migrate isolate_workspace off tmp_path_factory to tests/artifacts/ (Phase 5, FR3)`
+  - **GIT NOTE:** "Phase 5 task 5.1: isolate_workspace fixture now creates tests/artifacts/_isolation_workspace_<RUN_ID>/. Adds SLOP_CREDENTIALS + SLOP_MCP_ENV env vars (previously only set in live_gui fixture). Per workspace_paths.md styleguide."
+
+- [ ] **Task 5.2:** Add `addopts` to `pyproject.toml`.
+  - **WHERE:** Modify `pyproject.toml` — add to `[tool.pytest.ini_options]` section.
+  - **WHAT:** Add `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` so pytest's `tmp_path` factory uses a path under `./tests/`.
+  - **HOW:** Insert:
+    ```toml
+    [tool.pytest.ini_options]
+    addopts = "--basetemp=tests/artifacts/_pytest_tmp"
+    markers = [
+        ...
+    ]
+    ```
+  - **SAFETY:** The basetemp directory is auto-created by pytest. `.gitignore` already has `tests/artifacts/` so it's gitignored.
+  - **COMMIT:** `chore(pyproject): add --basetemp=tests/artifacts/_pytest_tmp addopts (Phase 5, FR3)`
+  - **GIT NOTE:** "Phase 5 task 5.2: pyproject.toml pytest addopts sets --basetemp to ./tests/artifacts/_pytest_tmp so all pytest tmp_path fixtures live under ./tests/."
+
+- [ ] **Task 5.3:** Defensive `_tmp_path_factory._basetemp` check in `conftest.py:pytest_configure`.
+  - **WHERE:** Add to existing `pytest_configure` in `tests/conftest.py` (the one merged in Task 3.1).
+  - **WHAT:** If `config._tmp_path_factory._basetemp` resolves outside `./tests/`, override to `./tests/artifacts/_pytest_tmp`.
+  - **HOW:**
+    ```python
+    project_root = Path(__file__).resolve().parent.parent
+    basetemp = getattr(config, "_tmp_path_factory", None)
+    if basetemp is not None:
+        current = Path(str(basetemp._basetemp)).resolve()
+        if not str(current).startswith(str(project_root / "tests")):
+            basetemp._basetemp = str(project_root / "tests" / "artifacts" / "_pytest_tmp")
+    ```
+  - **SAFETY:** Uses private API `_tmp_path_factory._basetemp` — if pytest version changes, this breaks. The `addopts` in Task 5.2 is the primary mechanism; this is defensive.
+  - **COMMIT:** Same as 5.2 (combined).
+  - **GIT NOTE:** Same as 5.2.
+
+- [ ] **Task 5.4:** Add dated note to `conductor/tech-stack.md`.
+  - **WHERE:** Modify `conductor/tech-stack.md` — append a dated note to the pytest section.
+  - **WHAT:** Explain the `--basetemp` choice and reference `workspace_paths.md`.
+  - **HOW:**
+    ```markdown
+    ## pyproject.toml pytest addopts (added 2026-06-19, per test_sandbox_hardening_20260619)
+
+    `[tool.pytest.ini_options].addopts = "--basetemp=tests/artifacts/_pytest_tmp"`.
+
+    **Rationale:** Per `conductor/code_styleguides/workspace_paths.md`, ALL test
+    infrastructure paths must live under `./tests/`. pytest's `tmp_path` and
+    `tmp_path_factory` fixtures default to `%TEMP%\pytest-of-<user>\` on
+    Windows. This `addopts` redirects them under `./tests/` so the Layer 1
+    runtime guard's allowlist (also `./tests/`) can be a single rule.
+    ```
+  - **SAFETY:** Pure documentation change.
+  - **COMMIT:** `docs(tech-stack): note --basetemp addopts rationale (Phase 5, FR3)`
+  - **GIT NOTE:** Same as 5.2.
+
+- [ ] **Task 5.5:** Write tests 7, 8, 9 in `tests/test_test_sandbox.py`.
+  - **WHERE:** Add to existing `tests/test_test_sandbox.py`.
+  - **WHAT:** Three tests verifying pyproject.toml, isolate_workspace, and AppController invariant.
+  - **HOW:**
+    ```python
+    def test_pyproject_toml_basetemp_is_under_tests() -> None:
+        pyproject = Path(__file__).resolve().parent.parent / "pyproject.toml"
+        text = pyproject.read_text(encoding="utf-8")
+        assert "--basetemp=tests/artifacts/_pytest_tmp" in text
+
+    def test_isolate_workspace_does_not_use_tmp_path_factory_for_infra() -> None:
+        import ast
+        conftest = Path(__file__).resolve().parent / "conftest.py"
+        tree = ast.parse(conftest.read_text(encoding="utf-8"))
+        for node in ast.walk(tree):
+            if isinstance(node, ast.FunctionDef) and node.name == "isolate_workspace":
+                src = ast.unparse(node)
+                assert "tmp_path_factory.mktemp" not in src, (
+                    "isolate_workspace must not use tmp_path_factory.mktemp; "
+                    "use Path('tests/artifacts/_isolation_workspace') / _RUN_ID"
+                )
+                return
+        raise AssertionError("isolate_workspace fixture not found in conftest.py")
+
+    def test_appcontroller_init_does_not_load_config() -> None:
+        import ast
+        app_controller = Path(__file__).resolve().parent.parent / "src" / "app_controller.py"
+        tree = ast.parse(app_controller.read_text(encoding="utf-8"))
+        for node in ast.walk(tree):
+            if isinstance(node, ast.FunctionDef) and node.name == "__init__":
+                src = ast.unparse(node)
+                assert "init_state()" not in src, (
+                    "AppController.__init__ must not call init_state() "
+                    "(this would trigger config reads before fixtures apply)"
+                )
+                assert "load_config()" not in src, (
+                    "AppController.__init__ must not call load_config() "
+                    "(this would trigger config reads before fixtures apply)"
+                )
+                return
+        raise AssertionError("AppController.__init__ not found")
+    ```
+  - **SAFETY:** These tests are static AST checks; they parse source files. They fail loud if invariants break. The `init_state()` invariant test is critical per FR2 audit.
+  - **COMMIT:** `test(sandbox): add regression tests for FR3 invariants (Phase 5)`
+  - **GIT NOTE:** "Phase 5 task 5.5: 3 regression tests for FR3 (pyproject basetemp, isolate_workspace no tmp_path_factory, AppController.__init__ invariant)."
+
+- [ ] **Task 5.6:** Phase 5 verification — run Tier-2 + Tier-3 to confirm no regression.
+  - **WHERE:** None.
+  - **WHAT:** Verify the basetemp migration + isolate_workspace migration don't break existing tests.
+  - **HOW:** `uv run python scripts/run_tests_batched.py --tiers 2,3 --timeout 180`
+  - **SAFETY:** If tests fail, check whether they were using `tmp_path` (which now resolves under `./tests/`) or hardcoded paths to `%TEMP%` (which the Layer 1 guard now blocks). Audit the failing test, don't disable the guard.
+  - **COMMIT:** None.
+  - **GIT NOTE:** None.
+
+---
+
+## Phase 6: FR5 PowerShell Wrapper (OPT-IN)
+
+**Focus:** Write `scripts/run_tests_sandboxed.ps1` (Windows-only, opt-in) that wraps pytest in a Windows restricted token + Job Object.
+
+- [ ] **Task 6.1:** Write `scripts/run_tests_sandboxed.ps1`.
+  - **WHERE:** Create `scripts/run_tests_sandboxed.ps1`.
+  - **WHAT:** Mirror `scripts/tier2/run_tier2_sandboxed.ps1` structure (100 lines). Replace OpenCode launch with pytest launch.
+  - **HOW:** Tier 3 worker MUST read `scripts/tier2/run_tier2_sandboxed.ps1` end-to-end first (per writing-plans skill "Read Reference Implementation COMPLETELY"), then copy its Add-Type / Job Object / token-acquisition blocks verbatim. Only the LAST step (the actual process launch) differs. Full template:
+    ```powershell
+    # scripts/run_tests_sandboxed.ps1
+    <#
+    .SYNOPSIS
+      Run pytest in a Windows restricted-token sandbox.
+    .DESCRIPTION
+      Acquires a Windows restricted token (drops dangerous privileges),
+      wraps pytest in a Job Object, and runs the test suite. The test
+      workspace is forced under ./tests/ via the --config and --basetemp
+      flags (handled by the conftest.py autouse fixtures). The Tier 2
+      clone at <ProjectRoot> is the only directory pytest can read/write
+      for tests; everything outside ./tests/ is blocked by the Layer 1
+      Python guard PLUS the restricted-token enforcement.
+    .NOTES
+      Requires Windows + PowerShell 7+ + admin privileges for full
+      restricted-token acquisition. The -WhatIf mode is a no-op dry-run
+      (exits 0 without acquiring a token).
+    .LINK
+      scripts/tier2/run_tier2_sandboxed.ps1 (template)
+      conductor/tracks/test_sandbox_hardening_20260619/spec.md (FR5)
+    #>
+    [CmdletBinding()]
+    param(
+      [switch]$WhatIf,
+      [string]$TestPath = "tests/",
+      [string]$ConfigPath = ""  # empty = conftest.py auto-defaults to config_overrides.toml
+    )
+
+    $ErrorActionPreference = "Stop"
+    $ProjectRoot = (Resolve-Path "$PSScriptRoot/..").Path
+
+    if ($WhatIf) {
+      Write-Host "[SANDBOX-WHATIF] Would run pytest in restricted token at $ProjectRoot"
+      Write-Host "[SANDBOX-WHATIF] TestPath: $TestPath"
+      Write-Host "[SANDBOX-WHATIF] ConfigPath: $($ConfigPath) (empty = conftest.py auto-defaults)"
+      exit 0
+    }
+
+    # === BEGIN: copy Add-Type / token / Job Object blocks from ===
+    # === scripts/tier2/run_tier2_sandboxed.ps1 lines 30-95 verbatim ===
+    # (See reference script for the full restricted-token + Job Object setup.)
+
+    # === END: tier2 clone blocks ===
+
+    # Invoke pytest under restricted token with sandbox flags.
+    # The --basetemp flag ensures pytest's tmp dirs live under ./tests/.
+    # The --config flag points to a config_overrides.toml inside ./tests/
+    # (or empty = conftest.py auto-defaults).
+    $argList = @(
+      "run", "python", "-m", "pytest", $TestPath,
+      "--basetemp=tests/artifacts/_pytest_tmp"
+    )
+    if ($ConfigPath -ne "") { $argList += "--config=$ConfigPath" }
+
+    Push-Location $ProjectRoot
+    try {
+      & uv @argList
+    } finally {
+      Pop-Location
+    }
+    ```
+    The Add-Type / token / Job Object blocks MUST be copied verbatim from `scripts/tier2/run_tier2_sandboxed.ps1` lines 30-95 (or wherever the equivalent code lives in the latest version of that script — Tier 3 worker should re-read the source). Only the LAST block (the actual invocation) is new.
+  - **SAFETY:** `-WhatIf` mode is a no-op (exits 0). Full PowerShell restricted-token wrapper requires admin privileges on Windows; document this in the script header. The script is OPT-IN — users continue to use `uv run pytest` or `uv run python scripts/run_tests_batched.py` for normal test runs.
+  - **COMMIT:** `feat(scripts): add scripts/run_tests_sandboxed.ps1 (Phase 6, FR5 opt-in)`
+  - **GIT NOTE:** "Phase 6 task 6.1: PowerShell wrapper for Windows restricted-token + Job Object pytest sandbox. Mirrors run_tier2_sandboxed.ps1 structure (Add-Type + token + Job Object blocks copied verbatim). Only the invocation differs (pytest instead of OpenCode). -WhatIf mode for dry-run. OPT-IN."
+
+- [ ] **Task 6.2:** Write a smoke test for `-WhatIf` mode.
+  - **WHERE:** Add to `tests/test_test_sandbox.py` (as test 14).
+  - **WHAT:** Verify `pwsh -File scripts/run_tests_sandboxed.ps1 -WhatIf` exits 0.
+  - **HOW:**
+    ```python
+    @pytest.mark.skipif(os.name != "nt", reason="Windows-only sandbox wrapper")
+    def test_run_tests_sandboxed_whatif() -> None:
+        result = subprocess.run(
+            ["pwsh", "-File", "scripts/run_tests_sandboxed.ps1", "-WhatIf"],
+            capture_output=True, text=True,
+        )
+        assert result.returncode == 0, f"Expected exit 0, got {result.returncode}: {result.stderr}"
+    ```
+  - **SAFETY:** Skipped on non-Windows per `conductor/workflow.md` Skip-Marker Policy (legitimate opt-in integration test, requires Windows + pwsh).
+  - **COMMIT:** Same as 6.1.
+  - **GIT NOTE:** Same as 6.1.
+
+---
+
+## Phase 7: FR7 Documentation
+
+**Focus:** Document the 4-layer enforcement model + `--config` CLI flag convention + `config_overrides.toml` naming.
+
+- [ ] **Task 7.1:** Create `conductor/code_styleguides/test_sandbox.md`.
+  - **WHERE:** Create `conductor/code_styleguides/test_sandbox.md`.
+  - **WHAT:** Styleguide document covering: the `--config` CLI flag, `config_overrides.toml` convention, 4-layer enforcement model, `--basetemp` rule, Layer 1 audit hook contract, opt-in `run_tests_sandboxed.ps1`, audit script.
+  - **HOW:** Use elements-of-style:writing-clearly-and-concisely (the existing styleguides in `conductor/code_styleguides/` are good templates). Sections: TL;DR; The 4-Layer Model; `--config` CLI Flag (replaces SLOP_CONFIG); `--basetemp` Rule; Layer 1 Audit Hook Contract; Static Audit; OS-Level Wrapper; Test Workspace Convention (`config_overrides.toml`); See Also.
+  - **SAFETY:** Documentation only. Reference actual file:line locations from the spec.
+  - **COMMIT:** `docs(styleguide): add test_sandbox.md (Phase 7, FR7)`
+  - **GIT NOTE:** "Phase 7 task 7.1: new styleguide test_sandbox.md documents the 4-layer enforcement model, --config CLI flag, config_overrides.toml convention, --basetemp rule."
+
+- [ ] **Task 7.2:** Update `conductor/code_styleguides/workspace_paths.md`.
+  - **WHERE:** Append a section to the existing file.
+  - **WHAT:** Mention the `SLOP_CONFIG → --config` migration + `pytest --basetemp` addopts.
+  - **HOW:** Add a "2026-06-19 Update" section at the bottom.
+  - **SAFETY:** Documentation only.
+  - **COMMIT:** Same as 7.1.
+  - **GIT NOTE:** Same as 7.1.
+
+- [ ] **Task 7.3:** Add `Sandbox Hardening` section to `docs/guide_testing.md`.
+  - **WHERE:** Modify `docs/guide_testing.md` — add a new section.
+  - **WHAT:** Cross-reference to `test_sandbox.md` + summary of the 4 layers.
+  - **HOW:** Append the section.
+  - **SAFETY:** Documentation only.
+  - **COMMIT:** Same as 7.1.
+  - **GIT NOTE:** Same as 7.1.
+
+---
+
+## Phase 8: Full Suite Verification
+
+**Focus:** Run the full 11-tier suite and confirm no regression vs. the `1288 passed + 4 xdist-skipped` baseline.
+
+- [ ] **Task 8.1:** Run full test suite.
+  - **WHERE:** None.
+  - **WHAT:** Run all 11 tiers and capture results.
+  - **HOW:** `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11 > tests/artifacts/_full_suite_post_sandbox.txt 2>&1`
+  - **SAFETY:** If regression vs. baseline (1288 + 4), STOP and report to user. Do not commit a broken suite. Per `conductor/workflow.md` Phase Completion Verification protocol.
+  - **COMMIT:** None (verification).
+  - **GIT NOTE:** None.
+
+- [ ] **Task 8.2:** Commit verification report.
+  - **WHERE:** None (commit the baseline diff comparison).
+  - **WHAT:** Stage `tests/artifacts/_full_suite_post_sandbox.txt` as a verification artifact.
+  - **HOW:** `git add tests/artifacts/_full_suite_post_sandbox.txt; git commit -m "conductor(checkpoint): Phase 8 - full suite green, no regression vs. baseline 1288+4"`
+  - **SAFETY:** If regression occurred in 8.1, fix forward or roll back per `conductor/workflow.md` Per-Task Decision Protocol.
+  - **COMMIT:** As above.
+  - **GIT NOTE:** "Phase 8 checkpoint: full 11-tier suite passed. No regression vs. pre-track baseline (1288 + 4). Test sandbox hardening is operational."
+
+---
+
+## Phase 9: End-of-Track Report
+
+**Focus:** Write the completion report following the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`. Update state.toml to `completed`.
+
+- [ ] **Task 9.1:** Write `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`.
+  - **WHERE:** Create `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`.
+  - **WHAT:** Track completion report with: scope (files added/modified), pass-rate baseline + post, deferred items, lessons learned, follow-up tracks (other SLOP_* env vars), user review gate.
+  - **HOW:** Mirror the structure of `docs/reports/TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`.
+  - **SAFETY:** Pure documentation.
+  - **COMMIT:** `docs(reports): TRACK_COMPLETION_test_sandbox_hardening_20260619 (Phase 9)`
+  - **GIT NOTE:** "Phase 9: track completion report. 9 phases shipped. 4-layer test sandbox enforcement operational. Deferred: convert other SLOP_* env vars to CLI flags (separate mess, separate tracks)."
+
+- [ ] **Task 9.2:** Update `state.toml` and commit.
+  - **WHERE:** Modify `conductor/tracks/test_sandbox_hardening_20260619/state.toml`.
+  - **WHAT:** Set `status = "completed"`, `current_phase = "complete"`.
+  - **HOW:**
+    ```toml
+    [meta]
+    status = "completed"
+    current_phase = "complete"
+    last_updated = "2026-06-19"
+    ```
+  - **SAFETY:** Pure metadata.
+  - **COMMIT:** `conductor(state): mark test_sandbox_hardening_20260619 complete`
+  - **GIT NOTE:** "Phase 9 final: state.toml marked complete. Track ships."
+
+---
+
+## Summary
+
+| Phase | Tasks | Key output | Risk |
+|-------|-------|------------|------|
+| 1: Investigation | 3 | Baseline pass count + audit of get_config_path() callers | None (read-only) |
+| 2: FR4 Static audit | 3 | `scripts/audit_test_sandbox_violations.py` + 3 tests | Low |
+| 3: FR1 Python guard | 3 | `_enforce_test_sandbox` fixture + 4 tests | High (can break tests) |
+| 4: FR2 Root-cause fix | 6 | `set_config_override()` + `--config` CLI flag + 3 tests | High (root-cause) |
+| 5: FR3 Isolation migration | 6 | `isolate_workspace` + `--basetemp` + tech-stack.md + 3 tests | Medium |
+| 6: FR5 PowerShell | 2 | `scripts/run_tests_sandboxed.ps1` + smoke test | Low (opt-in) |
+| 7: FR7 Documentation | 3 | `test_sandbox.md` + updates | None |
+| 8: Verification | 2 | 11-tier pass count + checkpoint commit | Verification only |
+| 9: Report | 2 | `TRACK_COMPLETION_*` + state.toml `completed` | None |
+
+**Total: 30 tasks across 9 phases, ~11 atomic commits.**
+
+**TDD per phase:** Red (write failing test) → Green (minimal impl) → Verify → Commit.
+
+**Per-task discipline:** WHERE / WHAT / HOW / SAFETY / COMMIT / GIT NOTE per `conductor/workflow.md` Tier 1 rules.
+
+**Hard bans:** No `git restore`, `git checkout`, `git reset`. No day estimates in commit messages or git notes. No diagnostic noise in `src/*.py`. No new `@pytest.mark.skip` markers except the one for `test_run_tests_sandboxed_whatif` (Windows-only, legitimate per `conductor/workflow.md` Skip-Marker Policy).
+
+**Rollback:** Each phase is a separate commit. If any phase breaks, `git revert` the phase's commit(s) without affecting the others.
+
+---
+
+## Handoff to Tier 2
+
+This plan is executed by a Tier 2 Tech Lead via the standard `conductor/workflow.md` Task Workflow:
+1. Activate `mma-orchestrator` skill.
+2. For each task: read context, write code, run tests, commit per `git commit` line, attach git note.
+3. After each phase: phase completion verification + checkpoint.
+4. After Phase 9: track complete; user reviews merge per `conductor/workflow.md` "Review and merge workflow".
+
+Tier 3 workers (via `scripts/mma_exec.py --role tier3-worker`) handle individual tasks with surgical prompts. The Tier 2 Tech Lead reviews each commit before moving to the next task.
@@ -0,0 +1,373 @@
+# Track Specification: Test Sandbox Hardening (2026-06-19)
+
+## Overview
+
+This track adds a hard file-I/O sandbox for the test suite so that a misbehaving
+test (missing fixture, broken monkeypatch, direct `open()` to a hardcoded path)
+cannot corrupt user-owned files in the project root. The user has lost
+"important sample data" multiple times over the past month because tests have
+written to `manual_slop.toml`, `manual_slop_history.toml`, `personas.toml`,
+`presets.toml`, `tool_presets.toml`, or `credentials.toml` at the top of the
+repo.
+
+The fix has 5 parts:
+
+1. **Eliminate the silent `SLOP_CONFIG` env-var fallback** in `src/paths.py`. Replace
+   it with a module-level override set explicitly by the CLI flag `--config`
+   at the entry point (sloppy.py for production, conftest.py for tests). This
+   is the root-cause fix — without it, every other defense is a band-aid.
+2. **Add a Python runtime file-I/O guard** (`sys.addaudithook` on `open` writes).
+   Default-on for every pytest invocation.
+3. **Migrate the test workspace off `tmp_path_factory.mktemp`** (which lives in
+   `%TEMP%`) onto `tests/artifacts/_isolation_workspace_<run_id>/` so the
+   Layer 2 allowlist can be a single rule. Add pytest `--basetemp` to pyproject.toml
+   so pytest's own tmp dirs also live under `./tests/`.
+4. **Add an OS-level restricted-token wrapper** (Windows-only, opt-in via
+   `scripts/run_tests_sandboxed.ps1`) for users who want defense in depth on
+   top of the Python guard.
+5. **Extend the static audit** to flag any test source code that could try
+   to write to a top-level TOML file, plus `tempfile.mkdtemp()` /
+   `tempfile.mkstemp()` calls without `dir=` pointing under `./tests/`.
+
+**Out of scope (per user directive):** the OTHER `SLOP_*` env vars
+(`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`,
+`SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`,
+`SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) remain env-var-driven for now. The user
+considers them a separate "mess" to be addressed in follow-up tracks. The
+test workspace still uses these env vars to redirect to per-run paths under
+`./tests/artifacts/`.
+
+After this track, the rule is: **any `pytest` or `run_tests_batched.py`
+invocation cannot write a single byte outside `./tests/`, and the static audit
+flags any test source code that could try.**
+
+## Current State Audit (as of 2026-06-19)
+
+### Already Implemented (DO NOT re-implement)
+
+1. **`isolate_workspace` autouse fixture** (`tests/conftest.py:259-281`)
+   - Sets `SLOP_CONFIG`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`,
+     `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` to a per-test
+     path via `tmp_path_factory.mktemp("isolated_workspace")`.
+   - Provides partial protection for tests that go through `src.paths.get_*_path()`.
+2. **`live_gui` fixture workspace** (`tests/conftest.py:484-525`)
+   - Creates `tests/artifacts/live_gui_workspace_<TIMESTAMP>/` per pytest
+     invocation with fresh `manual_slop.toml` + `config.toml` (per
+     `workspace_path_finalize_20260609`).
+   - Sets `SLOP_CREDENTIALS` + `SLOP_MCP_ENV` to the **real** project-root
+     `credentials.toml` / `mcp_env.toml` (read-only intent).
+3. **`scripts/check_test_toml_paths.py`** (per `conductor/tracks.md:395`)
+   - Static audit detects tests with hardcoded references to TOML basenames
+     (`manual_slop.toml`, `config.toml`, `credentials.toml`, etc.) or to
+     `Path("C:/projects/...")` and `Path("tests/artifacts/")` literals.
+   - CI gate that exits 1 on violation.
+4. **`conductor/code_styleguides/workspace_paths.md`** (148 lines)
+   - Hard rule: test workspaces must live under `tests/artifacts/`. Banned:
+     `tmp_path_factory.mktemp` for test infrastructure workspaces, env vars
+     for test paths, CLI args for test paths.
+5. **`scripts/audit_no_temp_writes.py`** (108 lines)
+   - Audits `scripts/` for `%TEMP%` usage. Pattern reference for the new
+     audit script in Layer 4.
+
+### Gaps to Fill (This Track's Scope)
+
+| # | Gap | Risk | Where |
+|---|-----|------|-------|
+| G1 | `isolate_workspace` uses `tmp_path_factory.mktemp` which lives in `%TEMP%`, violating the existing styleguide | Path allowlist for Layer 1 has to include `%TEMP%` = widened blast radius | `tests/conftest.py:265` |
+| G2 | `isolate_workspace` doesn't set `SLOP_CREDENTIALS` or `SLOP_MCP_ENV` | Non-live_gui tests that go through `src.paths.get_credentials_path()` read the real `credentials.toml` | `tests/conftest.py:259-281` |
+| G3 | No runtime file-I/O guard. Tests can `Path("manual_slop.toml").write_text(...)` with no consequence | **Direct cause of user's data loss** | New: `tests/conftest.py:_enforce_test_sandbox` |
+| G4 | Pytest's `tmp_path` / `tmp_path_factory` default to `%TEMP%\pytest-of-<user>\` | Same widening issue as G1 | New: `pyproject.toml` addopts + `tests/conftest.py:pytest_configure` |
+| G5 | `check_test_toml_paths.py` doesn't catch non-TOML writes (e.g., `Path("manualslop_layout.ini").write_text`, `Path("manualslop_history.toml").write_text`) | Hidden test paths slip through the static audit | New: `scripts/audit_test_sandbox_violations.py` (extends existing) |
+| G6 | No OS-level hard sandbox option for paranoid users (Tier 2 has `run_tier2_sandboxed.ps1`; tests have no equivalent) | Same risk for users running `pytest` interactively without the Python guard | New: `scripts/run_tests_sandboxed.ps1` (opt-in) |
+| G7 | `AppController()` is initialized at `tests/conftest.py:65` at module import (line 65-66: `_warmup_app_controller = AppController(); _warmup_app_controller.wait_for_warmup(timeout=60.0)`), BEFORE the autouse `isolate_workspace` fixture applies | **MOSTLY OK but no invariant.** Per the call chain audit: `AppController.__init__` (src/app_controller.py:787) only sets up state + starts warmup background thread; it does NOT call `init_state()` or `load_config()`. `init_state()` (which reads config) is called from `App.__init__()` in `src/gui_2.py`, AFTER fixtures apply. But there's no test asserting this invariant — a future refactor could accidentally move init_state() into __init__ and silently break the safety | New FR8 regression test: assert `AppController.__init__` does not call `init_state()` or `load_config()` |
+
+## Goals
+
+1. **Make it impossible for any test invocation to write outside `./tests/`** at the Python layer (Layer 1) and at the OS layer when the opt-in wrapper is used (Layer 3).
+2. **Make every test see a fully-sandboxed project root** (Layer 2): config, presets, personas, tool_presets, credentials, mcp_env, AND pytest's own tmp dirs all live under `./tests/`.
+3. **Catch sandbox violations statically** (Layer 4): a developer adding a bad path to a test source file gets a CI failure before the test ever runs.
+4. **No regression in test pass rate.** All 11 tiers must continue to pass clean after this track ships.
+5. **No new `@pytest.mark.skip` markers.** Per the user directive (per `conductor/workflow.md` Skip-Marker Policy), in-session fixes only.
+
+## Functional Requirements
+
+### FR1. Python runtime file-I/O sandbox (Layer 1 — DEFAULT ON)
+
+**WHERE:** New `_enforce_test_sandbox` autouse fixture in `tests/conftest.py` (registered alongside `isolate_workspace` at line ~258).
+
+**WHAT:** Install a `sys.addaudithook()` callable that intercepts the `open` audit event when the mode is `'w'`, `'a'`, `'x'`, or `'+'`, or when the call is to `os.makedirs` / `shutil.rmtree` / `tempfile.mkdtemp` / `tempfile.mkstemp`.
+
+**HOW:**
+- Allowlist (writes ALLOWED):
+  - Any path under `<project_root>/tests/` (resolved absolute; case-normalized on Windows).
+  - Any path under `<project_root>/tests/artifacts/` (already covered by above; explicit for clarity).
+  - Any path under the per-run `_RUN_WORKSPACE` (which lives inside `./tests/artifacts/`).
+  - The pyproject.toml `pytest --basetemp` target (also inside `./tests/`).
+- Denylist (writes REJECTED):
+  - Anything outside `./tests/`.
+- On violation: raise `RuntimeError("TEST_SANDBOX_VIOLATION: <test_name> attempted to write to <absolute_path> which is outside <project_root>/tests/. Use tmp_path or fixture-provided paths.")` and let pytest report the failure with the test name.
+- The hook is installed in `pytest_configure` (so it's in place before any test module imports), uninstalled in `pytest_unconfigure`.
+
+**SAFETY:**
+- Reads are NOT blocked. Tests legitimately need to read the source tree (`src/`, `pyproject.toml`, `mcp_env.toml`, `credentials.toml`, etc.) for fixtures and mocks.
+- The hook must be thread-safe (pytest may run tests in xdist workers).
+- The hook must not break pytest's own internals (`.pytest_cache`, `_pytest_tmp_path_factory` cleanup). The basetemp migration (FR3) handles this.
+- Allowlist resolution must NOT block legitimate pytest cache writes (`<project_root>/.pytest_cache/`, `<project_root>/tests/.pytest_cache/`, `<project_root>/tests/artifacts/__pycache__/`). Add `.pytest_cache`, `__pycache__`, `.coverage`, `.slop_cache`, `.ruff_cache` to the allowlist.
+
+### FR2. CLI flag `--config` replaces `SLOP_CONFIG` env var (ROOT-CAUSE FIX)
+
+**WHERE:** `src/paths.py:42-46` (the silent fallback). `sloppy.py` (CLI parser). `tests/conftest.py` (CLI parser at module body BEFORE any src/ import).
+
+**WHAT:** Remove the `os.environ.get("SLOP_CONFIG", ...)` fallback from `src/paths.py:get_config_path()`. Replace with a module-level `_CONFIG_OVERRIDE: Path | None` that is set ONLY by explicit CLI flag parsing at the entry point.
+
+**HOW:**
+
+```python
+# src/paths.py — REPLACE the existing get_config_path with:
+_CONFIG_OVERRIDE: Path | None = None
+
+def set_config_override(path: Path | None) -> None:
+    """CLI flag is the ONLY way to override. Pass None to use default.
+    [C: sloppy.py:main, tests/conftest.py:_setup_test_paths]"""
+    global _CONFIG_OVERRIDE
+    _CONFIG_OVERRIDE = path
+    _RESOLVED.clear()
+
+def get_config_path() -> Path:
+    """Returns the active config.toml. If override is set, returns it.
+    Otherwise returns the default <project_root>/config.toml.
+    [C: src/app_controller.py:AppController.load_config,
+       src/app_controller.py:AppController.init_state,
+       src/models.py:_load_config_from_disk]"""
+    if _CONFIG_OVERRIDE is not None:
+        return _CONFIG_OVERRIDE
+    root_dir = Path(__file__).resolve().parent.parent
+    return root_dir / "config.toml"
+```
+
+```python
+# sloppy.py — ADD argparse flag (BEFORE any src/ import):
+parser.add_argument("--config", help="Path to config.toml (default: <project_root>/config.toml)")
+args = parser.parse_args()
+if args.config:
+    from src import paths
+    paths.set_config_override(Path(args.config).resolve())
+```
+
+```python
+# tests/conftest.py — PARSE sys.argv at module body BEFORE any src/ import:
+import sys as _sys
+from pathlib import Path as _Path
+
+def _parse_config_arg() -> _Path | None:
+    """Manual sys.argv parse for --config. Returns resolved Path or None."""
+    args = _sys.argv[1:]
+    for i, arg in enumerate(args):
+        if arg == "--config" and i + 1 < len(args):
+            return _Path(args[i + 1]).resolve()
+        if arg.startswith("--config="):
+            return _Path(arg.split("=", 1)[1]).resolve()
+    return None
+
+_config_arg = _parse_config_arg()
+if _config_arg is None:
+    # Default for tests: sandboxed config_overrides.toml
+    config_arg = _Path(f"tests/artifacts/_isolation_workspace_{_RUN_ID}/config_overrides.toml")
+else:
+    config_arg = _config_arg
+
+# Set override BEFORE any src/ import
+from src import paths  # noqa: E402
+paths.set_config_override(config_arg)
+
+# ALSO register with pytest so it doesn't warn about unknown flag:
+def pytest_addoption(parser):
+    parser.addoption("--config", action="store", default=None,
+                      help="Manual Slop: override config.toml path for tests")
+```
+
+**Test workspace contents** (auto-generated by `_setup_test_paths` helper in conftest):
+```
+tests/artifacts/_isolation_workspace_<RUN_ID>/
+├── config_overrides.toml     # the AppController config (per user naming)
+├── credentials.toml          # placeholder for SLOP_CREDENTIALS (env var stays)
+├── mcp_env.toml              # placeholder for SLOP_MCP_ENV (env var stays)
+├── presets.toml              # placeholder for SLOP_GLOBAL_PRESETS
+├── tool_presets.toml         # placeholder for SLOP_GLOBAL_TOOL_PRESETS
+└── personas.toml             # placeholder for SLOP_GLOBAL_PERSONAS
+```
+
+**SAFETY:**
+- The new `get_config_path()` raises `KeyError`-like behavior if no override AND no default exists. This is intentional — better to fail fast than silently use a wrong path.
+- The desktop GUI (`sloppy.py` without `--config`) uses the default `<project_root>/config.toml`, which is unchanged behavior.
+- Tests ALWAYS use a path inside `./tests/` (either from `--config` or the auto-generated default), so the Layer 1 audit hook's allowlist catches any stray writes.
+- Conftest's sys.argv parse happens BEFORE pytest's own argparse (which is too late for src/ imports).
+- The `config_overrides.toml` naming is a convention; tests CAN pass `--config /some/other/path.toml` and it will work.
+
+### FR3. Pytest tmp paths + `isolate_workspace` migration (Layer 2 — DEFAULT ON)
+
+**WHERE:**
+1. `pyproject.toml` — add `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` to `[tool.pytest.ini_options]`.
+2. `tests/conftest.py:isolate_workspace` (lines 259-281) — replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`.
+3. `tests/conftest.py:pytest_configure` — defensive normalization: if `config._tmp_path_factory._basetemp` resolves outside `./tests/`, override to `tests/artifacts/_pytest_tmp`.
+4. `conductor/tech-stack.md` — dated note explaining the `--basetemp` choice.
+
+**WHAT:**
+- All pytest `tmp_path` / `tmp_path_factory` fixtures create temp dirs under `<project_root>/tests/artifacts/_pytest_tmp/`.
+- The `isolate_workspace` autouse fixture's workspace lives under `<project_root>/tests/artifacts/_isolation_workspace_<RUN_ID>/`.
+- Both the `--basetemp` path AND the `isolate_workspace` path are inside `./tests/`, so the Layer 1 audit hook's allowlist can be a single rule: "anything under `./tests/` is allowed."
+
+**HOW:**
+- `pyproject.toml`: standard `addopts` entry.
+- `isolate_workspace`: replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`. Add `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` env vars pointing inside the isolation workspace. (Note: these are the OTHER `SLOP_*` env vars that the user is punting; they stay env-var-driven for now.)
+- `conftest.py:pytest_configure`: defensive check for `_tmp_path_factory._basetemp`.
+- `tech-stack.md`: dated note.
+
+**SAFETY:** Update `.gitignore` to ensure `tests/artifacts/_pytest_tmp/` and `tests/artifacts/_isolation_workspace/` are covered (already covered by `tests/artifacts/` pattern).
+
+### FR4. Extended static audit (Layer 4 — DEFAULT ON as CI gate)
+
+**WHERE:** New file `scripts/audit_test_sandbox_violations.py` (extends `scripts/check_test_toml_paths.py`).
+
+**WHAT:** Detect test source files that contain hardcoded write operations targeting paths outside `./tests/`. Patterns:
+- `Path("manual_slop.toml")`, `Path("config.toml")`, `Path("credentials.toml")`, `Path("presets.toml")`, `Path("personas.toml")`, `Path("tool_presets.toml")`, `Path("workspace_profiles.toml")`, `Path("manualslop_layout.ini")`, `Path("manual_slop_history.toml")`
+- `Path("project.toml")`, `Path("manual_slop_history.toml")` (top-level TOMLs)
+- `open("manual_slop.toml", "w")` and similar
+- `Path("C:/projects/...")` and `Path("C:\\projects\\...")` (project root references)
+- `Path("tests/artifacts/...")` literal (violates workspace_paths.md; should use a fixture instead)
+- `Path(__file__).parent.parent / "config.toml"` (and similar `..` traversal)
+- `tempfile.mkdtemp()`, `tempfile.mkstemp()` (without a `dir=` kwarg pointing to `./tests/`)
+
+**HOW:** Mirror the existing `check_test_toml_paths.py` structure: list of compiled regexes + `find_violations(root_dir)` + `main()` with `--strict` CI mode.
+
+**SAFETY:** The audit is INFORMATIONAL by default (exits 0). `--strict` mode (CI) exits 1 on any violation. This matches the precedent set by `audit_no_temp_writes.py` and `check_test_toml_paths.py`.
+
+### FR5. OS-level sandbox wrapper (Layer 3 — OPT IN)
+
+**WHERE:** New file `scripts/run_tests_sandboxed.ps1` (analogous to `scripts/tier2/run_tier2_sandboxed.ps1`).
+
+**WHAT:** A PowerShell launcher that:
+1. Acquires a Windows restricted token (drops `SeDebugPrivilege`, `SeBackupPrivilege`, `SeRestorePrivilege`, `SeTakeOwnershipPrivilege`, etc.) — same pattern as `run_tier2_sandboxed.ps1`.
+2. Sets the current directory to the project root.
+3. Wraps the pytest process tree in a Job Object so it cannot escape.
+4. Invokes `uv run python -m pytest` (or `uv run python scripts/run_tests_batched.py`) under the restricted token with `--basetemp=tests/artifacts/_pytest_tmp` (Layer 2 + FR3 ensure tmp dirs are inside the sandbox).
+5. Forwards `--config tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml` so the test config is inside the sandbox.
+6. Reports the exit code.
+
+**HOW:** Copy the structure of `scripts/tier2/run_tier2_sandboxed.ps1` (100 lines). Replace the OpenCode launch with a pytest launch. Keep the Job Object + restricted token machinery.
+
+**SAFETY:** This is OPT-IN. Users who don't need OS-level enforcement continue to use `uv run pytest` or `uv run python scripts/run_tests_batched.py` directly and still get Layer 1 + Layer 2 + Layer 4 protection. The wrapper is for paranoid scenarios.
+
+**Note on Windows ACLs:** The Tier 2 wrapper uses restricted-token; it does NOT use file ACLs. For tests, this is sufficient because (a) tests run as the same user, (b) the Python guard (Layer 1) is the primary defense, (c) restricted-token catches native code paths that bypass Python.
+
+### FR6. Regression tests for the guard
+
+**WHERE:** New file `tests/test_test_sandbox.py`.
+
+**WHAT:** Tests that verify:
+1. `test_sandbox_blocks_writes_outside_tests_dir`: write to a hardcoded `Path("manual_slop.toml")` from a test → `RuntimeError` with the `TEST_SANDBOX_VIOLATION` prefix.
+2. `test_sandbox_allows_writes_inside_tests_dir`: write to `tmp_path / "foo.txt"` → succeeds.
+3. `test_sandbox_allows_writes_inside_tests_artifacts`: write to `Path("tests/artifacts/_pytest_tmp_xxx/foo.txt")` → succeeds.
+4. `test_sandbox_does_not_block_reads`: read from `Path("pyproject.toml")` → succeeds.
+5. `test_audit_test_sandbox_violations_flags_known_bad_pattern`: create a temp test file with `Path("manual_slop.toml").write_text(...)`, run the audit script as a subprocess, assert exit 1.
+6. `test_audit_test_sandbox_violations_passes_clean_test`: create a temp test file using only `tmp_path`, assert exit 0.
+7. `test_pyproject_toml_basetemp_is_under_tests`: parse `pyproject.toml`, assert `addopts` contains `--basetemp=tests/artifacts/_pytest_tmp`.
+8. `test_isolate_workspace_does_not_use_tmp_path_factory_for_infra`: parse `tests/conftest.py`, assert no `tmp_path_factory.mktemp` calls in `isolate_workspace`.
+9. `test_appcontroller_init_does_not_load_config`: parse `src/app_controller.py`, assert `AppController.__init__` does NOT call `init_state()` or `load_config()`. **Per the G7 audit: this guards the invariant that config reads happen AFTER fixtures apply.**
+10. `test_audit_flags_tempfile_mkdtemp_without_tests_dir`: create a test file with `tempfile.mkdtemp()`, run audit, assert exit 1. **Per user directive: tests should never need AppData temp.**
+11. `test_config_override_via_cli_flag`: invoke `python -c "..."` with `--config <path>` and verify `paths.get_config_path()` returns the override.
+12. `test_paths_get_config_path_no_env_fallback`: monkeypatch-delete `SLOP_CONFIG` env var, import `src.paths`, assert `get_config_path()` returns default (no env var lookup).
+13. `test_sloppy_py_parses_config_flag`: parse `sloppy.py` AST, assert `--config` argparse argument exists and triggers `paths.set_config_override()`.
+
+**HOW:** Standard pytest. Use `subprocess.run` for the audit script invocations to test the CLI surface. Use `ast.parse` for static checks on `conftest.py`, `app_controller.py`, and `sloppy.py`.
+
+**SAFETY:** The `test_sandbox_blocks_writes_outside_tests_dir` test will raise `RuntimeError` — pytest must catch it as a pass. Use `pytest.raises(RuntimeError, match="TEST_SANDBOX_VIOLATION")`.
+
+### FR7. Documentation update
+
+**WHERE:** New file `conductor/code_styleguides/test_sandbox.md`. Update `conductor/code_styleguides/workspace_paths.md`. Update `docs/guide_testing.md`.
+
+**WHAT:** Document:
+- The `--config` CLI flag convention (replaces `SLOP_CONFIG` env var).
+- The `config_overrides.toml` naming convention for test workspace configs.
+- The 4-layer enforcement model (Python guard, conftest isolation, OS-level wrapper, static audit).
+- The `--basetemp` rule (why pytest tmp paths must live under `./tests/`).
+- The Layer 1 audit hook contract: writes outside `./tests/` raise `RuntimeError`.
+- The opt-in `scripts/run_tests_sandboxed.ps1` wrapper.
+- The audit script and CI gate.
+
+## Non-Functional Requirements
+
+- **NFR1. Performance:** The audit hook adds < 5% overhead to pytest run time (measured on the existing 11-tier suite). Conftest fixtures are unchanged in scope; only env-var setup is added.
+- **NFR2. Maintainability:** No new `src/` files (per `AGENTS.md` File Size and Naming Convention rule). The Python guard lives in `tests/conftest.py` (test infrastructure). The audit script lives in `scripts/` (project infrastructure). The PowerShell wrapper lives in `scripts/`.
+- **NFR3. Cross-platform:** The Python guard (Layer 1) and the static audit (Layer 4) work on Windows, macOS, and Linux. The PowerShell wrapper (Layer 3) is Windows-only; on non-Windows it's a documented no-op (`Write-Host "OS-level sandbox requires Windows"` and exit 0).
+- **NFR4. Discoverability:** The audit script's `--help` explains what it checks, how to fix violations, and how to run in `--strict` mode. The `RuntimeError` raised by Layer 1 includes a "How to fix" line pointing at `conductor/code_styleguides/test_sandbox.md`.
+
+## Architecture Reference
+
+- **`conductor/code_styleguides/workspace_paths.md`** — existing rule: test workspaces live under `tests/artifacts/`. This track extends it to ALL test infrastructure (including pytest's `tmp_path`).
+- **`conductor/code_styleguides/feature_flags.md`** — Layer 1 + Layer 2 + Layer 4 are file-presence-on = enabled (matches the project's "delete to turn off" convention for cross-cutting concerns). Layer 3 is opt-in via the explicit PowerShell wrapper.
+- **`scripts/audit_no_temp_writes.py`** — pattern reference for the new `scripts/audit_test_sandbox_violations.py`.
+- **`scripts/tier2/run_tier2_sandboxed.ps1`** — pattern reference for the new `scripts/run_tests_sandboxed.ps1`.
+- **`docs/guide_testing.md`** (existing) — test infrastructure deep-dive. Add a new "Sandbox Hardening" section that summarizes Layers 1-4 and links to the styleguide.
+- **`conductor/tracks/workspace_path_finalize_20260609/`** — prior track that established the `tests/artifacts/` workspace pattern. This track extends it.
+
+## Out of Scope
+
+1. **Reading protection.** Tests still need to read the source tree (`src/`, `pyproject.toml`, etc.) for fixtures. Reads are intentionally NOT blocked. If a future track wants read isolation, it's a separate scope.
+2. **Network sandboxing.** Tests that hit the live Gemini/Anthropic/etc. APIs continue to do so. The user's data loss is filesystem, not network.
+3. **Migrating existing tests to the new patterns.** The audit (Layer 4) catches new violations; existing tests that already pass continue to pass. If the audit catches a currently-passing test, that's a bug to fix in the test (separate, in-session fixes).
+4. **Converting the OTHER `SLOP_*` env vars to CLI flags** (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`). Per user directive, this is the "mess" to address in follow-up tracks. This track only eliminates `SLOP_CONFIG`. The test workspace still uses the other env vars to redirect to per-run paths under `./tests/artifacts/`.
+5. **A cross-platform equivalent of `run_tests_sandboxed.ps1`.** macOS/Linux users get Layer 1 + Layer 2 + Layer 4. Adding a `run_tests_sandboxed.sh` would require `bwrap` or `unshare` setup; defer to a future track if needed.
+6. **Conftest fixture-level enforcement (e.g., `@pytest.fixture(sandbox_strict=True)` for tests that need full OS isolation).** The blanket autouse fixture is the v1. Per-fixture tuning is a v2 feature.
+
+## Verification Criteria
+
+For the track to be marked complete, ALL of the following must be true:
+
+- [ ] **VC1.** `tests/test_test_sandbox.py` exists and all 13 tests pass.
+- [ ] **VC2.** `scripts/audit_test_sandbox_violations.py` runs in both modes:
+  - Default (informational): exit 0, lists violations (or says "clean").
+  - `--strict`: exit 1 on violation, exit 0 on clean.
+- [ ] **VC3.** `pyproject.toml` contains `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` under `[tool.pytest.ini_options]`.
+- [ ] **VC4.** `tests/conftest.py:isolate_workspace` no longer calls `tmp_path_factory.mktemp` (per `workspace_paths.md`). All env-var redirects point to paths inside `./tests/artifacts/`.
+- [ ] **VC5.** `scripts/run_tests_sandboxed.ps1` exists, parses cleanly, and on Windows can be invoked (`-WhatIf` mode for dry-run).
+- [ ] **VC6.** `conductor/tech-stack.md` has a dated note explaining the `--basetemp` choice.
+- [ ] **VC7.** `conductor/code_styleguides/workspace_paths.md` (or new `test_sandbox.md`) documents the 3-layer model.
+- [ ] **VC8.** Full test suite: `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11` runs to completion; no regression in pass rate vs. the pre-track baseline (1288 passed + 4 xdist-skipped per `result_migration_small_files_20260617`).
+- [ ] **VC9.** No new `@pytest.mark.skip` markers added (per `conductor/workflow.md` Skip-Marker Policy + user directive).
+- [ ] **VC10.** End-of-track report at `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md` (per Tier 2 conventions).
+
+## Risk Assessment
+
+| Risk | Likelihood | Mitigation |
+|---|---|---|
+| Layer 1 audit hook breaks a test that legitimately writes outside `./tests/` (e.g., a test that writes to a tempfile.mkdtemp default location) | medium | FR1 allowlist includes pytest's `--basetemp`; if a new path is needed, add it. The hook's `RuntimeError` includes the test name so the offending test is obvious. |
+| Layer 1 audit hook slows down the test suite | low | `sys.addaudithook` is a thin C-level callback; overhead measured in <2% per Python docs. |
+| Layer 4 audit flags a currently-passing test as a false positive | medium | The audit is INFORMATIONAL by default; `--strict` is opt-in for CI. If a real test is flagged, fix the test (don't suppress the audit). |
+| Layer 3 PowerShell wrapper breaks on a Windows version without the required privileges | low | Wrapper is opt-in; default invocation stays `uv run pytest`. Wrapper docs explain the privilege requirements. |
+| Existing tests that don't go through `isolate_workspace` still read real config files | high (known gap) | Reads are out of scope per the Out of Scope section. Layer 1 still blocks writes, which is the user's primary concern. |
+| `pytest_configure` setting `_tmp_path_factory._basetemp` uses a private API that changes between versions | medium | The `--basetemp` addopts is the primary mechanism. The `_basetemp` assignment is defensive only; if it breaks, addopts still works. |
+
+## Execution Plan (high-level — see `plan.md` for worker-ready tasks)
+
+- [ ] **Phase 1: Investigation + baseline** — capture current pass count, confirm `isolate_workspace` + audit script work as documented.
+- [ ] **Phase 2: Layer 4 (static audit) + tests** — write `audit_test_sandbox_violations.py`, write `test_test_sandbox.py` audit-tests (parts 5-8), commit.
+- [ ] **Phase 3: Layer 1 (Python guard) + tests** — implement `_enforce_test_sandbox` fixture, write guard-specific regression tests (parts 1-4), commit.
+- [ ] **Phase 4: Layer 2 (`isolate_workspace` migration + FR3 basetemp)** — refactor `isolate_workspace`, add `addopts` to `pyproject.toml`, update `tech-stack.md`, commit.
+- [ ] **Phase 5: Layer 3 (PowerShell wrapper)** — write `scripts/run_tests_sandboxed.ps1`, write a smoke test, commit.
+- [ ] **Phase 6: Documentation** — update `workspace_paths.md` (or write `test_sandbox.md`), update `docs/guide_testing.md`, commit.
+- [ ] **Phase 7: Full suite verification** — run all 11 tiers, verify no regression, commit.
+- [ ] **Phase 8: End-of-track report** — write `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`, commit.
+
+## See Also
+
+- `conductor/tracks.md:395` — prior "Test Consolidation & TOML Sandboxing" track (added `check_test_toml_paths.py`).
+- `conductor/archive/workspace_path_finalize_20260609/` — prior track that established the `tests/artifacts/` workspace pattern.
+- `conductor/tracks/tier2_autonomous_sandbox_20260616/` — meta-tooling track that this sandbox mirrors.
+- `conductor/code_styleguides/workspace_paths.md` — existing test-workspace rule.
+- `conductor/code_styleguides/feature_flags.md` — file-presence = enabled convention.
+- `conductor/workflow.md` Skip-Marker Policy + Skip-Marker review checklist.
+- `docs/guide_testing.md` — existing test infrastructure deep-dive (add a "Sandbox Hardening" section).
+- `scripts/audit_no_temp_writes.py` — pattern reference for the new audit script.
+- `scripts/tier2/run_tier2_sandboxed.ps1` — pattern reference for the new PowerShell wrapper.
@@ -0,0 +1,85 @@
+# Track state for test_sandbox_hardening_20260619
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = "test_sandbox_hardening_20260619"
+name = "Test Sandbox Hardening"
+status = "active"
+current_phase = 0
+last_updated = "2026-06-19"
+
+[blocked_by]
+# Independent track. No blockers.
+
+[blocks]
+# No followup tracks blocked on this one (deferred items listed in metadata.json).
+
+[phases]
+phase_1 = { status = "pending", checkpointsha = "", name = "Investigation + baseline" }
+phase_2 = { status = "pending", checkpointsha = "", name = "FR4 static audit + tests" }
+phase_3 = { status = "pending", checkpointsha = "", name = "FR1 Python guard + tests" }
+phase_4 = { status = "pending", checkpointsha = "", name = "FR2 root-cause fix (--config replaces SLOP_CONFIG)" }
+phase_5 = { status = "pending", checkpointsha = "", name = "FR3 isolate_workspace + basetemp migration" }
+phase_6 = { status = "pending", checkpointsha = "", name = "FR5 PowerShell wrapper" }
+phase_7 = { status = "pending", checkpointsha = "", name = "FR7 documentation" }
+phase_8 = { status = "pending", checkpointsha = "", name = "Full suite verification" }
+phase_9 = { status = "pending", checkpointsha = "", name = "End-of-track report" }
+
+[tasks]
+t1_1 = { status = "pending", commit_sha = "", description = "Capture baseline pass count via `uv run python scripts/run_tests_batched.py --tiers 1..11`. Record pass count + skip count + duration." }
+t1_2 = { status = "pending", commit_sha = "", description = "Confirm scripts/check_test_toml_paths.py runs cleanly (exit 0 in default mode)." }
+t1_3 = { status = "pending", commit_sha = "", description = "Audit src/ for all callers of get_config_path() to confirm FR2 will be transparent. List each caller." }
+
+t2_1 = { status = "pending", commit_sha = "", description = "Write scripts/audit_test_sandbox_violations.py mirroring check_test_toml_paths.py structure. Patterns: manual_slop_history.toml, project.toml, manualslop_layout.ini, tempfile.{mkdtemp,mkstemp} without dir=, Path(__file__).parent.parent / 'config.toml'." }
+t2_2 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 5,6,10: audit flagging bad pattern, audit passes clean, audit flags tempfile.mkdtemp without tests dir." }
+t2_3 = { status = "pending", commit_sha = "", description = "Verify audit script with sample bad test files. Commit Phase 2." }
+
+t3_1 = { status = "pending", commit_sha = "", description = "Implement _enforce_test_sandbox autouse fixture in tests/conftest.py: pytest_configure installs sys.addaudithook for open writes + os.makedirs + shutil.rmtree + tempfile.{mkdtemp,mkstemp}; pytest_unconfigure removes it." }
+t3_2 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 1-4: guard blocks writes outside ./tests, allows writes inside ./tests, allows writes inside ./tests/artifacts, doesn't block reads." }
+t3_3 = { status = "pending", commit_sha = "", description = "Manually verify guard fires on a sample bad write. Commit Phase 3." }
+
+t4_1 = { status = "pending", commit_sha = "", description = "Refactor src/paths.py: remove os.environ.get('SLOP_CONFIG', ...) fallback from get_config_path(). Add module-level _CONFIG_OVERRIDE + set_config_override() function." }
+t4_2 = { status = "pending", commit_sha = "", description = "Remove diagnostic stderr.write line at src/models.py:193 ('[DEBUG] Saving config...')." }
+t4_3 = { status = "pending", commit_sha = "", description = "Add --config argparse argument to sloppy.py. Call paths.set_config_override(args.config) BEFORE any src/ import." }
+t4_4 = { status = "pending", commit_sha = "", description = "Update tests/conftest.py to parse sys.argv for --config at module body BEFORE any src/ import. Add pytest_addoption registration. Auto-default to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml." }
+t4_5 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 11,12,13: --config CLI flag works, no env var fallback, sloppy.py parses --config." }
+t4_6 = { status = "pending", commit_sha = "", description = "Commit Phase 4 (FR2 root-cause fix). This is the most important commit in the track." }
+
+t5_1 = { status = "pending", commit_sha = "", description = "Refactor tests/conftest.py isolate_workspace: replace tmp_path_factory.mktemp with Path('tests/artifacts/_isolation_workspace') / _RUN_ID. Add SLOP_CREDENTIALS + SLOP_MCP_ENV env vars. Auto-generate placeholder TOML files (credentials.toml, mcp_env.toml, presets.toml, tool_presets.toml, personas.toml) in the isolation workspace." }
+t5_2 = { status = "pending", commit_sha = "", description = "Add `addopts = \"--basetemp=tests/artifacts/_pytest_tmp\"` to pyproject.toml [tool.pytest.ini_options]." }
+t5_3 = { status = "pending", commit_sha = "", description = "Add defensive pytest_configure check in conftest.py: if config._tmp_path_factory._basetemp resolves outside ./tests/, override." }
+t5_4 = { status = "pending", commit_sha = "", description = "Add dated note to conductor/tech-stack.md explaining --basetemp choice." }
+t5_5 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 7,8,9: pyproject.toml has --basetemp, isolate_workspace no tmp_path_factory.mktemp, AppController.__init__ doesn't call init_state()." }
+t5_6 = { status = "pending", commit_sha = "", description = "Commit Phase 5 (FR3 isolation migration)." }
+
+t6_1 = { status = "pending", commit_sha = "", description = "Write scripts/run_tests_sandboxed.ps1 based on run_tier2_sandboxed.ps1 structure: restricted token + Job Object + pytest invocation with --config + --basetemp." }
+t6_2 = { status = "pending", commit_sha = "", description = "Write smoke test: `pwsh -File scripts/run_tests_sandboxed.ps1 -WhatIf` exits 0." }
+t6_3 = { status = "pending", commit_sha = "", description = "Commit Phase 6 (FR5 PowerShell wrapper)." }
+
+t7_1 = { status = "pending", commit_sha = "", description = "Create conductor/code_styleguides/test_sandbox.md documenting --config CLI flag, config_overrides.toml convention, 4-layer enforcement model." }
+t7_2 = { status = "pending", commit_sha = "", description = "Update conductor/code_styleguides/workspace_paths.md to mention the new SLOP_CONFIG → --config migration." }
+t7_3 = { status = "pending", commit_sha = "", description = "Add 'Sandbox Hardening' section to docs/guide_testing.md." }
+t7_4 = { status = "pending", commit_sha = "", description = "Commit Phase 7 (FR7 documentation)." }
+
+t8_1 = { status = "pending", commit_sha = "", description = "Run `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11`. Capture pass count + duration. Verify no regression vs. baseline (1288 passed + 4 xdist-skipped)." }
+t8_2 = { status = "pending", commit_sha = "", description = "If regression: report to user with diff and propose fix. If no regression: commit verification report." }
+
+t9_1 = { status = "pending", commit_sha = "", description = "Write docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md following precedent set by TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md." }
+t9_2 = { status = "pending", commit_sha = "", description = "Update state.toml: status = 'completed', current_phase = 'complete'. Commit." }
+
+[verification]
+phase_1_baseline_captured = false
+phase_4_root_cause_fix = false
+phase_5_layer_2_works = false
+phase_8_full_suite_no_regression = false
+phase_9_report_written = false
+
+[user_directives_logged]
+hard_sandbox_required = "User wants hard sandbox similar to Tier 2; pytest/run_tests_batched.ps1 entirely banned from accessing files outside ./tests/"
+no_new_skip_markers = "Per conductor/workflow.md Skip-Marker Policy + user directive"
+sample_data_loss = "User has lost important sample data multiple times over the past month - primary motivation for this track"
+design_chosen = "C (Both): Python guard (default) + Windows restricted-token wrapper (opt-in). User confirmed on 2026-06-19."
+no_env_vars = "Per user 2026-06-19: NO ENV VARS for config path. --config CLI flag is the only override mechanism. Other SLOP_* env vars stay for now (user will fix in follow-up tracks)."
+config_overrides_naming = "Per user 2026-06-19: test workspace file is named 'config_overrides.toml' (the convention for test sandbox configs)."
+hard_fail = "Per user 2026-06-19: HARD FAIL on any sandbox violation. No warnings, no soft fails."
+no_appdata_temp = "Per user 2026-06-19: tests should never need AppData temp. tempfile.mkdtemp/mkstemp without dir= is a flag."