Merge remote-tracking branch 'tier2-clone/master' into tier2/result_migration_app_controller_20260618
This commit is contained in:
@@ -34,6 +34,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
|
||||
| 7b | B | [Continued SQLite-Granularity Inline Docs for gui_2.py](#track-continued-sqlite-granularity-inline-docs-for-gui_2py) | spec ✓, plan ✓, complete | (none — independent) |
|
||||
| 7c | B | [SQLite-Granularity Inline Docs for ai_client.py](#track-sqlite-granularity-inline-docs-for-ai_clientpy) | spec ✓, plan ✓, ready to start | (none — independent) |
|
||||
| 7d | A | [Live GUI Test Infrastructure Fixes](#track-live-gui-test-infrastructure-fixes-new-2026-06-18) | spec ✓, plan ✓, metadata ✓, state ✓, **active**; addresses 2 issues reported for diff tracks by `result_migration_small_files_20260617` Phase 13: (1) `test_execution_sim_live` GUI subprocess (port 8999) crashes mid-test during script generation flow — same failure with both `gemini_cli` and `gemini`; NOT provider-specific; 90s timeout reached without AI text; (2) `test_live_gui_workspace_exists` xdist race — workspace cleanup timing under parallel xdist; passes in isolation. 4 phases: (1) Investigation + Issue 2 parent-commit verification; (2) Fix Issue 2 (TDD); (3) Fix Issue 1 (TDD + remove diagnostic logging); (4) Final verification (11/11 tiers PASS clean). | `result_migration_small_files_20260617` (shipped 2026-06-18 with the 2 issues reported for diff tracks) | (**NEW 2026-06-18**; test-infrastructure track; 2-3 files affected (test + src); TDD for each issue; 11-tier verification required; NO new `@pytest.mark.skip` markers per user directive; out of scope: the 4 Gemini 503 skip markers from sub-track 2 Phase 13 — deferred to a separate follow-up track that mocks the Gemini API in `summarize.summarise_file`) |
|
||||
| 16 | A | [Test Sandbox Hardening](#track-test-sandbox-hardening-new-2026-06-19) | spec ✓, plan ✓, metadata ✓, state ✓, **ready to start**; 5-part fix for test data loss outside `./tests/`. Phase 1: investigation + baseline pass count + audit of `get_config_path()` callers. Phase 2: `scripts/audit_test_sandbox_violations.py` (FR4 static audit + `--strict` CI gate). Phase 3: `_enforce_test_sandbox` autouse fixture in conftest.py using `sys.addaudithook` (FR1 Python guard; hard fail on any write outside `./tests/`). Phase 4: root-cause fix — remove `SLOP_CONFIG` env-var fallback from `src/paths.py`; add `--config <path>` CLI flag to sloppy.py + conftest.py; `set_config_override(path)` module-level API (FR2). Phase 5: `isolate_workspace` migration off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`; pyproject.toml `--basetemp` addopts; `SLOP_CREDENTIALS`/`SLOP_MCP_ENV` env vars added to non-live_gui tests; tech-stack.md dated note (FR3). Phase 6: `scripts/run_tests_sandboxed.ps1` (FR5 Windows restricted-token wrapper, OPT-IN). Phase 7: `conductor/code_styleguides/test_sandbox.md` + updates to workspace_paths.md and guide_testing.md (FR7 docs). Phase 8: full 11-tier verification. Phase 9: end-of-track report. 13 regression tests in `tests/test_test_sandbox.py`. ~11 atomic commits. | (none — independent; **NEW 2026-06-19**; test-infrastructure + root-cause fix; primary motivation: user has lost important sample data multiple times over the past month because tests wrote to top-level TOML files; **NO ENV VARS for config path per user directive** — `--config` CLI flag is the only override mechanism; test workspace file naming: `config_overrides.toml`; hard fail on any sandbox violation; tests should never need AppData temp (`tempfile.mkdtemp/mkstemp` without `dir=` is flagged); baseline 1288 + 4 + 0; **out of scope**: converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) to CLI flags — user considers this a separate "mess" to address in follow-up tracks; deferred: macOS/Linux OS-level wrapper, per-fixture sandbox strictness tuning, read-side isolation) |
|
||||
| 8 | — | [Bootstrap gencpp Python Bindings](#track-bootstrap-gencpp-python-bindings) | spec TBD | (none — independent) |
|
||||
| 9 | — | [Tree-Sitter Lua MCP Tools](#track-tree-sitter-lua-mcp-tools) | spec TBD | (none — independent) |
|
||||
| 10 | — | [GDScript Language Support Tools](#track-gdscript-language-support-tools) | spec TBD | (none — independent) |
|
||||
@@ -806,6 +807,40 @@ Lightweight chronology; full spec/plan/state per track is in the linked folder.
|
||||
|
||||
*Out of scope (deferred to follow-up track): the 4 `@pytest.mark.skip` markers for Gemini 503 pre-existing failures (`test_auto_aggregate_skip`, `test_view_mode_summary`, `test_view_mode_default_summary`, `test_view_mode_custom_empty_default_to_summary`). To remove them, mock the Gemini API in `summarize.summarise_file` for tests.*
|
||||
|
||||
#### Track: Test Sandbox Hardening (hard sandbox for tests; root-cause fix for test data loss) `[track-created: 2026-06-19]`
|
||||
*Link: [./tracks/test_sandbox_hardening_20260619/](./tracks/test_sandbox_hardening_20260619/), Spec: [./tracks/test_sandbox_hardening_20260619/spec.md](./tracks/test_sandbox_hardening_20260619/spec.md), Plan: [./tracks/test_sandbox_hardening_20260619/plan.md](./tracks/test_sandbox_hardening_20260619/plan.md), Metadata: [./tracks/test_sandbox_hardening_20260619/metadata.json](./tracks/test_sandbox_hardening_20260619/metadata.json)*
|
||||
|
||||
*Status: 2026-06-19 - SPEC + PLAN committed. Ready for Tier 2 implementation. 9 phases, 30 tasks, ~11 atomic commits.*
|
||||
|
||||
*Goal: Make any `pytest` or `run_tests_batched.py` invocation provably incapable of writing files outside `./tests/`. Default-on Python guard + opt-in OS-level wrapper. Root-cause fix: eliminate the silent `SLOP_CONFIG` env-var fallback that lets tests accidentally touch the user's real `manual_slop.toml` and related top-level files.*
|
||||
|
||||
*The 5 enforcement layers:*
|
||||
1. **FR2 root-cause fix** — `src/paths.py:get_config_path()` no longer falls back to `<project_root>/config.toml` via `SLOP_CONFIG`. New API: `paths.set_config_override(path)`. CLI flag `--config <path>` at the entry point (sloppy.py for production, conftest.py for tests).
|
||||
2. **FR1 Python guard** — `sys.addaudithook` autouse fixture blocks writes outside `./tests/` with `RuntimeError("TEST_SANDBOX_VIOLATION: ...")`. Hard fail; reads unaffected.
|
||||
3. **FR3 isolation migration** — `isolate_workspace` moved off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`. pyproject.toml adds `addopts = "--basetemp=tests/artifacts/_pytest_tmp"`. All test infra paths now under `./tests/`.
|
||||
4. **FR4 static audit** — `scripts/audit_test_sandbox_violations.py` flags hardcoded paths to top-level TOMLs + `tempfile.mkdtemp/mkstemp` without `dir=`. CI gate (`--strict` exits 1).
|
||||
5. **FR5 OS-level wrapper** — `scripts/run_tests_sandboxed.ps1` (Windows restricted-token + Job Object; OPT-IN).
|
||||
|
||||
*User directives (locked 2026-06-19):*
|
||||
- NO ENV VARS for config path. `--config` CLI flag is the only override mechanism.
|
||||
- Test workspace file naming: `config_overrides.toml` (per user direction).
|
||||
- Hard fail on any sandbox violation (no warnings, no soft fails).
|
||||
- Tests should never need AppData temp.
|
||||
- Out of scope (deferred to follow-up tracks): converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) — user considers this the "mess" to address separately.
|
||||
|
||||
*Baseline (per `result_migration_small_files_20260617` shipped 2026-06-18): 1288 passed + 4 xdist-skipped. VC8 requires no regression vs. this baseline.*
|
||||
|
||||
*Root causes of data loss (per Phase 1 audit):*
|
||||
1. `src/paths.py:get_config_path()` at line 42 silently falls back to `<project_root>/config.toml` when `SLOP_CONFIG` is unset (the default for tests). This is the silent default that bites.
|
||||
2. `tests/conftest.py:isolate_workspace` at line 265 uses `tmp_path_factory.mktemp` which lives in `%TEMP%\pytest-of-<user>\` on Windows — outside `./tests/`.
|
||||
3. The Layer 1 Python guard is the runtime safety net; FR2 + FR3 are the proper fixes.
|
||||
|
||||
*Deferred follow-up tracks (per metadata.json `deferred_to_followup_tracks`):*
|
||||
- Convert the other 7 `SLOP_*` env vars to CLI flags (same pattern: `paths.set_<thing>_override()` + entry-point flag).
|
||||
- macOS/Linux OS-level sandbox wrapper (`run_tests_sandboxed.sh` using `bwrap`/`unshare`).
|
||||
- Per-fixture sandbox strictness tuning (`@pytest.fixture(sandbox_strict=True)`).
|
||||
- Read-side isolation (block reads of real config from tests).
|
||||
|
||||
## Phase 9: Chore Tracks
|
||||
|
||||
*Initialized: 2026-06-07*
|
||||
|
||||
@@ -0,0 +1,169 @@
|
||||
{
|
||||
"track_id": "test_sandbox_hardening_20260619",
|
||||
"name": "Test Sandbox Hardening",
|
||||
"created": "2026-06-19",
|
||||
"status": "spec_written",
|
||||
"blocked_by": [],
|
||||
"blocks": [],
|
||||
"priority": "A",
|
||||
"rationale": "User has lost important sample data multiple times over the past month because tests have written to top-level TOML files (manual_slop.toml, manual_slop_history.toml, personas.toml, presets.toml, tool_presets.toml, credentials.toml) at the project root. This track adds a 4-layer enforcement stack to make such writes impossible at the Python layer (default) and at the OS layer (opt-in).",
|
||||
"scope": {
|
||||
"new_files": [
|
||||
"scripts/audit_test_sandbox_violations.py",
|
||||
"scripts/run_tests_sandboxed.ps1",
|
||||
"tests/test_test_sandbox.py",
|
||||
"conductor/code_styleguides/test_sandbox.md"
|
||||
],
|
||||
"modified_files": [
|
||||
"src/paths.py",
|
||||
"src/models.py",
|
||||
"sloppy.py",
|
||||
"tests/conftest.py",
|
||||
"pyproject.toml",
|
||||
"conductor/tech-stack.md",
|
||||
"conductor/code_styleguides/workspace_paths.md",
|
||||
"docs/guide_testing.md",
|
||||
".gitignore"
|
||||
],
|
||||
"deleted_files": []
|
||||
},
|
||||
"estimated_effort": {
|
||||
"method": "scope (per workflow.md Tier 1 Track Initialization Rules). NO day estimates.",
|
||||
"phase_1": "1 task: baseline pass-rate capture + verification that isolate_workspace + check_test_toml_paths work as documented",
|
||||
"phase_2": "1 audit script + 4 regression tests + 1 commit",
|
||||
"phase_3": "1 conftest fixture (Layer 1 audit hook) + 4 guard-specific regression tests + 1 commit",
|
||||
"phase_4": "isolate_workspace migration + pyproject.toml addopts + tech-stack.md note + 1 commit",
|
||||
"phase_5": "1 PowerShell wrapper (Layer 3) + 1 smoke test + 1 commit",
|
||||
"phase_6": "2 doc files updated or 1 new styleguide + 1 commit",
|
||||
"phase_7": "11-tier verification run + 1 commit (report)",
|
||||
"phase_8": "1 end-of-track report + 1 commit",
|
||||
"summary": "8 phases, ~8 commits, ~10-12 source files touched across scripts/, tests/, pyproject.toml, docs/, conductor/"
|
||||
},
|
||||
"verification_criteria": [
|
||||
"tests/test_test_sandbox.py exists and all 13 tests pass",
|
||||
"scripts/audit_test_sandbox_violations.py runs in both default and --strict modes",
|
||||
"pyproject.toml contains addopts = '--basetemp=tests/artifacts/_pytest_tmp' under [tool.pytest.ini_options]",
|
||||
"tests/conftest.py isolate_workspace no longer calls tmp_path_factory.mktemp (per workspace_paths.md); all env-var redirects point to paths inside ./tests/artifacts/",
|
||||
"src/paths.py:get_config_path() does NOT call os.environ.get('SLOP_CONFIG', ...); uses set_config_override() instead",
|
||||
"src/paths.py:set_config_override(path) exists and is callable from sloppy.py and conftest.py",
|
||||
"sloppy.py accepts --config argparse argument and calls paths.set_config_override() before importing src/",
|
||||
"tests/conftest.py parses sys.argv for --config at module body (BEFORE any src/ import); auto-defaults to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml",
|
||||
"tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml is auto-generated on every pytest run",
|
||||
"conductor/code_styleguides/test_sandbox.md exists and documents the --config CLI flag + config_overrides.toml convention",
|
||||
"scripts/run_tests_sandboxed.ps1 exists, parses cleanly, and on Windows can be invoked (-WhatIf mode for dry-run)",
|
||||
"conductor/tech-stack.md has a dated note explaining the --basetemp choice",
|
||||
"conductor/code_styleguides/workspace_paths.md or new test_sandbox.md documents the 3-layer model",
|
||||
"Full test suite (11 tiers) runs to completion with no regression vs. pre-track baseline (1288 passed + 4 xdist-skipped per result_migration_small_files_20260617)",
|
||||
"No new @pytest.mark.skip markers added (per conductor/workflow.md Skip-Marker Policy + user directive)",
|
||||
"End-of-track report at docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md"
|
||||
],
|
||||
"risk_register": [
|
||||
{
|
||||
"id": "R1",
|
||||
"title": "Layer 1 audit hook breaks a test that legitimately writes outside ./tests/",
|
||||
"likelihood": "medium",
|
||||
"scope_impact": "the implementation may be larger than the spec suggests if many tests need to be migrated to tmp_path",
|
||||
"mitigation": "allowlist includes pytest --basetemp; RuntimeError includes test name so offending test is obvious; add new paths to allowlist only via explicit allowlist update"
|
||||
},
|
||||
{
|
||||
"id": "R2",
|
||||
"title": "Layer 1 audit hook slows down the test suite",
|
||||
"likelihood": "low",
|
||||
"scope_impact": "minimal",
|
||||
"mitigation": "sys.addaudithook is a thin C-level callback; overhead measured in <2% per Python docs"
|
||||
},
|
||||
{
|
||||
"id": "R3",
|
||||
"title": "Layer 4 audit flags a currently-passing test as a false positive",
|
||||
"likelihood": "medium",
|
||||
"scope_impact": "the implementation may be larger than the spec suggests if many tests need cleanup",
|
||||
"mitigation": "audit is INFORMATIONAL by default; --strict is opt-in for CI; fix offending test rather than suppress audit"
|
||||
},
|
||||
{
|
||||
"id": "R4",
|
||||
"title": "Layer 3 PowerShell wrapper breaks on a Windows version without the required privileges",
|
||||
"likelihood": "low",
|
||||
"scope_impact": "minimal",
|
||||
"mitigation": "wrapper is opt-in; default invocation stays uv run pytest; wrapper docs explain privilege requirements"
|
||||
},
|
||||
{
|
||||
"id": "R5",
|
||||
"title": "Existing tests that don't go through isolate_workspace still read real config files",
|
||||
"likelihood": "high",
|
||||
"scope_impact": "known gap, out of scope",
|
||||
"mitigation": "Reads are out of scope per the Out of Scope section; Layer 1 still blocks writes which is the user's primary concern"
|
||||
},
|
||||
{
|
||||
"id": "R7",
|
||||
"title": "Removing SLOP_CONFIG env var fallback breaks code paths that relied on it",
|
||||
"likelihood": "medium",
|
||||
"scope_impact": "the implementation may be larger than the spec suggests if many call sites need updating",
|
||||
"mitigation": "conftest.py auto-defaults to config_overrides.toml inside the test workspace; sloppy.py auto-defaults to root_dir/config.toml; the change should be transparent for any code that goes through get_config_path()"
|
||||
},
|
||||
{
|
||||
"id": "R8",
|
||||
"title": "conftest.py sys.argv parse at module body races with pytest's own argparse",
|
||||
"likelihood": "low",
|
||||
"scope_impact": "minimal",
|
||||
"mitigation": "pytest_addoption registers --config so pytest doesn't warn about unknown flag; sys.argv parse at module body is a known-safe pattern (per conductor/tracks/test_infrastructure_hardening_20260609 conftest patterns)"
|
||||
},
|
||||
{
|
||||
"id": "R6",
|
||||
"title": "pytest_configure setting _tmp_path_factory._basetemp uses a private API that changes between versions",
|
||||
"likelihood": "medium",
|
||||
"scope_impact": "minimal",
|
||||
"mitigation": "the --basetemp addopts is the primary mechanism; the _basetemp assignment is defensive only; if it breaks, addopts still works"
|
||||
}
|
||||
],
|
||||
"architecture_reference": {
|
||||
"primary_styleguide": "conductor/code_styleguides/workspace_paths.md",
|
||||
"secondary_styleguides": [
|
||||
"conductor/code_styleguides/feature_flags.md",
|
||||
"conductor/code_styleguides/data_oriented_design.md"
|
||||
],
|
||||
"related_tracks": [
|
||||
"conductor/archive/workspace_path_finalize_20260609/",
|
||||
"conductor/tracks/tier2_autonomous_sandbox_20260616/",
|
||||
"Test Consolidation & TOML Sandboxing (per conductor/tracks.md:395)"
|
||||
],
|
||||
"pattern_references": [
|
||||
"scripts/audit_no_temp_writes.py (audit script pattern)",
|
||||
"scripts/tier2/run_tier2_sandboxed.ps1 (PowerShell wrapper pattern)",
|
||||
"scripts/check_test_toml_paths.py (existing static audit)"
|
||||
]
|
||||
},
|
||||
"deferred_to_followup_tracks": [
|
||||
{
|
||||
"title": "Eliminate the remaining SLOP_* env vars (presets, credentials, etc.)",
|
||||
"description": "This track only eliminates SLOP_CONFIG. The other 7 SLOP_* env vars (SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES, SLOP_CREDENTIALS, SLOP_MCP_ENV, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR) remain env-var-driven. Per user directive, this is the 'mess' to address in follow-up tracks. Same pattern: paths.set_<thing>_override() module-level + CLI flag at entry point.",
|
||||
"track_status": "not yet specced"
|
||||
},
|
||||
{
|
||||
"title": "Read-side isolation (block reads of real config from tests)",
|
||||
"description": "Layer 1 only blocks writes; reads of real credentials.toml / config.toml still happen for tests that don't go through isolate_workspace. Future track could block reads via a stricter allowlist.",
|
||||
"track_status": "not yet specced"
|
||||
},
|
||||
{
|
||||
"title": "macOS/Linux OS-level sandbox wrapper",
|
||||
"description": "Layer 3 is Windows-only (restricted token + Job Object). A run_tests_sandboxed.sh using bwrap/unshare would extend to macOS/Linux.",
|
||||
"track_status": "not yet specced"
|
||||
},
|
||||
{
|
||||
"title": "Per-fixture sandbox strictness tuning",
|
||||
"description": "The blanket autouse fixture is the v1. A future track could add @pytest.fixture(sandbox_strict=True) for tests that need full OS isolation vs. the default Python guard.",
|
||||
"track_status": "not yet specced"
|
||||
}
|
||||
],
|
||||
"regressions_and_pre_existing_failures": [],
|
||||
"pre_existing_failures_remaining": [],
|
||||
"user_directives": [
|
||||
"Hard sandbox for tests, similar to Tier 2 - completely banned from accessing files outside ./tests/",
|
||||
"No new @pytest.mark.skip markers",
|
||||
"User has lost important sample data multiple times - this is the primary motivation",
|
||||
"NO ENV VARS for config path. Use --config CLI flag at the entry point (sloppy.py for production, conftest.py for tests)",
|
||||
"Test workspace file naming: config_overrides.toml (per user direction)",
|
||||
"Out of scope: converting the other SLOP_* env vars (presets, credentials, etc.) to CLI flags. User considers them a separate mess to address in follow-up tracks.",
|
||||
"Hard fail on any sandbox violation (no warnings, no soft fails)",
|
||||
"Tests should never need AppData temp (tempfile.mkdtemp/mkstemp without dir= is a flag)"
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,741 @@
|
||||
# Track Implementation Plan: Test Sandbox Hardening (2026-06-19)
|
||||
|
||||
> **For Tier 3 workers:** This plan is executed task-by-task per `conductor/workflow.md`. Each task has WHERE / WHAT / HOW / SAFETY / COMMIT / GIT NOTE fields. Use the spec at `conductor/tracks/test_sandbox_hardening_20260619/spec.md` as the authoritative reference for FR/NFR/VC details.
|
||||
|
||||
**Goal:** Make any `pytest` or `run_tests_batched.py` invocation provably incapable of writing files outside `./tests/` at the Python layer (default-on) and at the OS layer (opt-in via `scripts/run_tests_sandboxed.ps1`), by replacing the silent `SLOP_CONFIG` env-var fallback with an explicit `--config` CLI flag and adding a runtime file-I/O guard.
|
||||
|
||||
**Architecture:** 5-part fix — (1) `src/paths.py` removes the env-var fallback; (2) sloppy.py + conftest.py parse `--config` and call `paths.set_config_override()`; (3) `sys.addaudithook` blocks writes outside `./tests/`; (4) pytest's `--basetemp` + conftest's `isolate_workspace` migrated under `./tests/`; (5) opt-in Windows restricted-token wrapper. Tests are TDD (red → green → commit).
|
||||
|
||||
**Tech Stack:** Python 3.11+, `sys.addaudithook`, `pytest` 9.0+, PowerShell 7+, existing `tomli_w`, `tomllib`.
|
||||
|
||||
**Reference files:**
|
||||
- Spec: `conductor/tracks/test_sandbox_hardening_20260619/spec.md`
|
||||
- Existing pattern: `scripts/audit_no_temp_writes.py` (audit script), `scripts/tier2/run_tier2_sandboxed.ps1` (PowerShell wrapper), `scripts/check_test_toml_paths.py` (existing static audit).
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Investigation + Baseline
|
||||
|
||||
**Focus:** Capture current pass count + audit src/ for `get_config_path()` callers so FR2 changes are transparent.
|
||||
|
||||
- [ ] **Task 1.1:** Capture baseline pass count.
|
||||
- **WHERE:** None (read-only audit).
|
||||
- **WHAT:** Run the full test suite and record results.
|
||||
- **HOW:** `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11 > tests/artifacts/_baseline_pre_sandbox.txt 2>&1`
|
||||
- **SAFETY:** Capture pass count + skip count + duration to `tests/artifacts/_baseline_pre_sandbox_summary.txt`. Do NOT modify any source file.
|
||||
- **COMMIT:** None (audit-only).
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
- [ ] **Task 1.2:** Audit `src/` for `get_config_path()` callers.
|
||||
- **WHERE:** `src/` (grep audit).
|
||||
- **WHAT:** Find every call site of `paths.get_config_path()` and `models._load_config_from_disk()` / `models._save_config_to_disk()`. The FR2 change (removing env-var fallback) must be transparent to all of them.
|
||||
- **HOW:**
|
||||
```bash
|
||||
grep -rn "get_config_path\|_load_config_from_disk\|_save_config_to_disk" src/ > tests/artifacts/_get_config_path_callers.txt
|
||||
cat tests/artifacts/_get_config_path_callers.txt | wc -l # record count
|
||||
```
|
||||
- **SAFETY:** Expected ~10-20 call sites. All must be transparent because FR2's default (`<project_root>/config.toml`) matches the current silent fallback behavior.
|
||||
- **COMMIT:** None.
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
- [ ] **Task 1.3:** Phase 1 verification.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Confirm baseline + audit files exist + no source changes since session start.
|
||||
- **HOW:** `ls tests/artifacts/_baseline_pre_sandbox* tests/artifacts/_get_config_path_callers.txt; git status --short | wc -l`
|
||||
- **SAFETY:** Phase 1 is READ-ONLY; `git status` must show 0 modified source files.
|
||||
- **COMMIT:** None.
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: FR4 Static Audit (LOW RISK — ship first)
|
||||
|
||||
**Focus:** Write the static audit script that flags test files with hardcoded paths or `tempfile.mkdtemp()` without `dir=`. CI gate (default informational, `--strict` exits 1).
|
||||
|
||||
- [ ] **Task 2.1:** Write `scripts/audit_test_sandbox_violations.py`.
|
||||
- **WHERE:** Create `scripts/audit_test_sandbox_violations.py`.
|
||||
- **WHAT:** Mirror `scripts/check_test_toml_paths.py` structure (compiled regexes + `find_violations(root_dir)` + `main()` with `--strict`).
|
||||
- **HOW:** Patterns:
|
||||
```python
|
||||
TOML_BASENAMES = r"manual_slop|config|credentials|presets|personas|tool_presets|workspace_profiles|project|manualslop_layout|manualslop_history|manualslop_history"
|
||||
PATTERNS = [
|
||||
re.compile(rf'Path\(["\'](?:{TOML_BASENAMES})\.toml["\']'),
|
||||
re.compile(rf'Path\(["\'](?:{TOML_BASENAMES})\.ini["\']'),
|
||||
re.compile(rf'open\(["\'](?:{TOML_BASENAMES})\.toml["\'], ["\']w["\']'),
|
||||
re.compile(r'Path\(["\']C:[/\\]+projects'),
|
||||
re.compile(r'Path\(["\']tests/artifacts/'),
|
||||
re.compile(r"tempfile\.mk(dt|st)emp\("), # bare calls without dir=
|
||||
]
|
||||
EXCLUDE_DIRS = {"artifacts", "logs", "__pycache__"}
|
||||
```
|
||||
Plus a `find_violations(tests_dir)` that scans `tests/test_*.py` and returns `list[tuple[Path, int, str]]`. Plus `main()` with `--strict` (exit 1 on any violation; default exit 0 with report).
|
||||
- **SAFETY:** Audit is INFORMATIONAL by default (exits 0). `--strict` exits 1 only on violations. Per `conductor/code_styleguides/audit-script-conventions.md` (if exists) or `audit_no_temp_writes.py` precedent.
|
||||
- **COMMIT:** `chore(audit): add scripts/audit_test_sandbox_violations.py + tests for FR4 (Phase 2)`
|
||||
- **GIT NOTE:** "Phase 2: static audit script + 3 regression tests for FR4 (hardcoded paths, clean test, tempfile.mkdtemp without dir=). Audit default informational, --strict exits 1."
|
||||
|
||||
- [ ] **Task 2.2:** Write tests 5, 6, 10 in `tests/test_test_sandbox.py`.
|
||||
- **WHERE:** Create `tests/test_test_sandbox.py`.
|
||||
- **WHAT:** Three tests for the audit script. Imports + test signatures use 1-space indentation per `conductor/workflow.md`.
|
||||
- **HOW:**
|
||||
```python
|
||||
import subprocess, sys
|
||||
from pathlib import Path
|
||||
|
||||
def test_audit_flags_known_bad_pattern() -> None:
|
||||
bad = Path("tests/artifacts/_audit_test_bad.py")
|
||||
bad.parent.mkdir(parents=True, exist_ok=True)
|
||||
bad.write_text('Path("manual_slop.toml").write_text("x")\n', encoding="utf-8")
|
||||
result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
|
||||
capture_output=True, text=True)
|
||||
assert result.returncode == 1, f"Expected exit 1, got {result.returncode}"
|
||||
bad.unlink()
|
||||
|
||||
def test_audit_passes_clean_test() -> None:
|
||||
good = Path("tests/artifacts/_audit_test_good.py")
|
||||
good.parent.mkdir(parents=True, exist_ok=True)
|
||||
good.write_text("def test_x(tmp_path): tmp_path.joinpath('foo').write_text('x')\n", encoding="utf-8")
|
||||
result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
|
||||
capture_output=True, text=True)
|
||||
assert result.returncode == 0, f"Expected exit 0, got {result.returncode}: {result.stdout}"
|
||||
good.unlink()
|
||||
|
||||
def test_audit_flags_tempfile_mkdtemp_without_tests_dir() -> None:
|
||||
bad = Path("tests/artifacts/_audit_test_tempfile.py")
|
||||
bad.parent.mkdir(parents=True, exist_ok=True)
|
||||
bad.write_text("import tempfile\ndef test_x(): tempfile.mkdtemp()\n", encoding="utf-8")
|
||||
result = subprocess.run([sys.executable, "scripts/audit_test_sandbox_violations.py", "--strict"],
|
||||
capture_output=True, text=True)
|
||||
assert result.returncode == 1, f"Expected exit 1, got {result.returncode}"
|
||||
bad.unlink()
|
||||
```
|
||||
- **SAFETY:** Tests must clean up their temp files even on failure (use `try/finally` or pytest fixture cleanup).
|
||||
- **COMMIT:** Same as 2.1 (combined commit).
|
||||
- **GIT NOTE:** Same as 2.1.
|
||||
|
||||
- [ ] **Task 2.3:** Run Phase 2 tests to verify.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Run the 3 new tests + manually invoke the audit script with a known-bad fixture file.
|
||||
- **HOW:** `uv run python -m pytest tests/test_test_sandbox.py -v -k "audit_"`
|
||||
- **SAFETY:** All 3 must pass. If any fail, debug and fix before committing.
|
||||
- **COMMIT:** Same as 2.1.
|
||||
- **GIT NOTE:** Same as 2.1.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: FR1 Python Guard (HIGH RISK — must be precise)
|
||||
|
||||
**Focus:** Implement `sys.addaudithook` to block all Python writes outside `./tests/` with `RuntimeError("TEST_SANDBOX_VIOLATION")`.
|
||||
|
||||
- [ ] **Task 3.1:** Write `_enforce_test_sandbox` autouse fixture in `tests/conftest.py`.
|
||||
- **WHERE:** Modify `tests/conftest.py` — add new fixture near `isolate_workspace` at line ~258.
|
||||
- **WHAT:** Install `sys.addaudithook` for `open` (write modes), `os.mkdir`, `os.makedirs`, `shutil.rmtree`, `tempfile.mkdtemp`, `tempfile.mkstemp`. Allowlist = anything under `<project_root>/tests/`. Block everything else.
|
||||
- **HOW:** (Insert before the existing `isolate_workspace` fixture):
|
||||
```python
|
||||
_SANDBOX_ALLOWLIST_PREFIXES: tuple[str, ...] = () # initialized in pytest_configure
|
||||
|
||||
def _sandbox_audit_hook(event: str, args: tuple[object, ...]) -> None:
|
||||
"""sys.addaudithook target. Blocks writes outside ./tests/."""
|
||||
if event == "open":
|
||||
path_obj, mode, *_ = args
|
||||
if not isinstance(path_obj, (str, bytes, os.PathLike)):
|
||||
return
|
||||
if isinstance(mode, str) and not any(m in mode for m in ("w", "a", "x", "+")):
|
||||
return
|
||||
try:
|
||||
resolved = Path(os.fspath(path_obj)).resolve()
|
||||
except (OSError, ValueError):
|
||||
return
|
||||
if not _is_under_tests(resolved):
|
||||
raise RuntimeError(
|
||||
f"TEST_SANDBOX_VIOLATION: attempted to write to {resolved} "
|
||||
f"(outside <project_root>/tests/). Use tmp_path or fixture-provided paths."
|
||||
)
|
||||
|
||||
def _is_under_tests(path: Path) -> bool:
|
||||
for prefix in _SANDBOX_ALLOWLIST_PREFIXES:
|
||||
try:
|
||||
path.relative_to(prefix)
|
||||
return True
|
||||
except ValueError:
|
||||
pass
|
||||
return False
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _enforce_test_sandbox() -> Generator[None, None, None]:
|
||||
"""Default-on runtime guard. Installed in pytest_configure."""
|
||||
yield # No-op; hook is installed at session start.
|
||||
|
||||
def pytest_configure(config: object) -> None:
|
||||
global _SANDBOX_ALLOWLIST_PREFIXES
|
||||
project_root = Path(__file__).resolve().parent.parent
|
||||
_SANDBOX_ALLOWLIST_PREFIXES = (
|
||||
str(project_root / "tests"),
|
||||
str(Path("tests/artifacts/_pytest_tmp").resolve()),
|
||||
str(Path("tests/artifacts/_isolation_workspace").resolve()),
|
||||
)
|
||||
sys.addaudithook(_sandbox_audit_hook)
|
||||
_check_required_test_dependencies() # existing call
|
||||
|
||||
def pytest_unconfigure(config: object) -> None:
|
||||
# Note: sys.addaudithook is permanent for the process; no removal API.
|
||||
# The hook stays active until process exit (pytest is the only Python here).
|
||||
pass
|
||||
```
|
||||
**IMPORTANT:** The existing `pytest_configure` at conftest.py:140 must be MERGED with the new one (don't create two definitions).
|
||||
- **SAFETY:** The hook ONLY blocks write modes. Reads pass through. `.pytest_cache`, `__pycache__`, `.coverage` live under `./tests/` or project_root — verify with a quick test run before committing.
|
||||
- **COMMIT:** `feat(tests): add _enforce_test_sandbox autouse fixture for FR1 (Phase 3)`
|
||||
- **GIT NOTE:** "Phase 3: Python sys.addaudithook runtime guard. Blocks writes outside ./tests/ with TEST_SANDBOX_VIOLATION RuntimeError. Reads unaffected. Layer 1 of 4 enforcement stack."
|
||||
|
||||
- [ ] **Task 3.2:** Write tests 1-4 in `tests/test_test_sandbox.py`.
|
||||
- **WHERE:** Add to existing `tests/test_test_sandbox.py` (created in Phase 2).
|
||||
- **WHAT:** Four tests verifying guard behavior.
|
||||
- **HOW:**
|
||||
```python
|
||||
def test_sandbox_blocks_writes_outside_tests_dir() -> None:
|
||||
bad_path = Path(__file__).resolve().parent.parent / "manual_slop.toml"
|
||||
with pytest.raises(RuntimeError, match="TEST_SANDBOX_VIOLATION"):
|
||||
bad_path.write_text("corrupt", encoding="utf-8")
|
||||
|
||||
def test_sandbox_allows_writes_inside_tests_dir(tmp_path) -> None:
|
||||
(tmp_path / "foo.txt").write_text("ok", encoding="utf-8")
|
||||
assert (tmp_path / "foo.txt").read_text(encoding="utf-8") == "ok"
|
||||
|
||||
def test_sandbox_allows_writes_inside_tests_artifacts() -> None:
|
||||
p = Path("tests/artifacts/_sandbox_test_allows/foo.txt")
|
||||
p.parent.mkdir(parents=True, exist_ok=True)
|
||||
p.write_text("ok", encoding="utf-8")
|
||||
assert p.read_text(encoding="utf-8") == "ok"
|
||||
p.unlink()
|
||||
|
||||
def test_sandbox_does_not_block_reads() -> None:
|
||||
pyproject = Path(__file__).resolve().parent.parent / "pyproject.toml"
|
||||
content = pyproject.read_text(encoding="utf-8")
|
||||
assert "[tool.pytest.ini_options]" in content
|
||||
```
|
||||
- **SAFETY:** Test 1 is expected to RAISE; pytest.raises catches it. Tests 2-3 must SUCCEED. Test 4 must SUCCEED (read-only).
|
||||
- **COMMIT:** Same as 3.1 (combined).
|
||||
- **GIT NOTE:** Same as 3.1.
|
||||
|
||||
- [ ] **Task 3.3:** Run full Tier-1 unit suite to verify no regression.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Confirm the guard doesn't break any Tier-1 test that legitimately writes within `./tests/`.
|
||||
- **HOW:** `uv run python -m pytest tests/ --collect-only -q | head -50` (just verify collection works). Then `uv run python scripts/run_tests_batched.py --tiers 1 --timeout 120`
|
||||
- **SAFETY:** Tier-1 may have tests that write to `tmp_path` (which now resolves under `./tests/artifacts/_pytest_tmp`). If any test fails, the guard's allowlist needs expansion. Document and add to allowlist only after careful review (the test should already be using `tmp_path`).
|
||||
- **COMMIT:** Same as 3.1.
|
||||
- **GIT NOTE:** Same as 3.1.
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: FR2 Root-Cause Fix (--config CLI flag — MOST IMPORTANT)
|
||||
|
||||
**Focus:** Replace the silent `SLOP_CONFIG` env-var fallback in `src/paths.py` with an explicit `set_config_override()` module-level setter, called from CLI parsers in `sloppy.py` and `tests/conftest.py`. This is THE fix for the user's data-loss pain.
|
||||
|
||||
- [ ] **Task 4.1:** Refactor `src/paths.py` to remove the env-var fallback.
|
||||
- **WHERE:** Modify `src/paths.py:42-46` (the `get_config_path()` function).
|
||||
- **WHAT:** Remove `os.environ.get("SLOP_CONFIG", ...)` lookup. Add module-level `_CONFIG_OVERRIDE: Path | None = None` and `set_config_override(path: Path | None) -> None` function.
|
||||
- **HOW:**
|
||||
```python
|
||||
_CONFIG_OVERRIDE: Path | None = None
|
||||
|
||||
def set_config_override(path: Path | None) -> None:
|
||||
"""Set the active config.toml path. None = use default.
|
||||
CLI flag is the ONLY way to override. No env var fallback.
|
||||
[C: sloppy.py:main, tests/conftest.py:_setup_test_paths]"""
|
||||
global _CONFIG_OVERRIDE
|
||||
_CONFIG_OVERRIDE = path
|
||||
_RESOLVED.clear()
|
||||
|
||||
def get_config_path() -> Path:
|
||||
"""Returns the active config.toml. If override is set, returns it.
|
||||
Otherwise returns the default <project_root>/config.toml.
|
||||
[C: src/app_controller.py:AppController.load_config,
|
||||
src/app_controller.py:AppController.init_state,
|
||||
src/models.py:_load_config_from_disk]"""
|
||||
if _CONFIG_OVERRIDE is not None:
|
||||
return _CONFIG_OVERRIDE
|
||||
root_dir = Path(__file__).resolve().parent.parent
|
||||
return root_dir / "config.toml"
|
||||
```
|
||||
- **SAFETY:** The default behavior (no override) returns the same path as the previous env-var fallback when `SLOP_CONFIG` was unset. This is the SAME path the desktop GUI currently uses. So sloppy.py without `--config` works unchanged.
|
||||
- **COMMIT:** `fix(paths): remove SLOP_CONFIG env-var fallback from get_config_path() (Phase 4, FR2 root-cause)`
|
||||
- **GIT NOTE:** "Phase 4 task 4.1: root-cause fix for data loss. src/paths.py no longer silently falls back to <project_root>/config.toml via SLOP_CONFIG env var. New API: paths.set_config_override(path). Default behavior unchanged when no override is set."
|
||||
|
||||
- [ ] **Task 4.2:** Remove diagnostic stderr line from `src/models.py:193`.
|
||||
- **WHERE:** Modify `src/models.py:193` (in `_save_config_to_disk`).
|
||||
- **WHAT:** Delete the `sys.stderr.write(f"[DEBUG] Saving config. Theme: {config.get('theme')}\n"); sys.stderr.flush()` line. Per `AGENTS.md` "No Diagnostic Noise in Production" rule.
|
||||
- **HOW:** Delete the two lines.
|
||||
- **SAFETY:** This is a pure removal of diagnostic noise. No behavior change for normal operation. If any test depends on this stderr output, it should be removed too (check `tests/` for `capsys` fixtures matching this output).
|
||||
- **COMMIT:** Same as 4.1 (combined commit "src cleanup for FR2").
|
||||
- **GIT NOTE:** Same as 4.1.
|
||||
|
||||
- [ ] **Task 4.3:** Add `--config` argparse to `sloppy.py`.
|
||||
- **WHERE:** Modify `sloppy.py` — the argparse setup (find the existing `ArgumentParser` block).
|
||||
- **WHAT:** Add `--config <path>` flag. Call `paths.set_config_override(args.config)` BEFORE any `src/` import.
|
||||
- **HOW:**
|
||||
```python
|
||||
parser.add_argument("--config", type=str, default=None,
|
||||
help="Path to config.toml (default: <project_root>/config.toml)")
|
||||
# ... parse args ...
|
||||
if args.config:
|
||||
from src import paths
|
||||
paths.set_config_override(Path(args.config).resolve())
|
||||
# THEN import the rest:
|
||||
from src.gui_2 import App # existing import below
|
||||
```
|
||||
- **SAFETY:** The `set_config_override` call must happen BEFORE `from src.gui_2 import App` because that import chain eventually imports paths and may trigger `get_config_path()`.
|
||||
- **COMMIT:** `feat(sloppy): add --config CLI flag for config.toml override (Phase 4, FR2)`
|
||||
- **GIT NOTE:** "Phase 4 task 4.3: sloppy.py accepts --config <path>. Sets paths.set_config_override() before any src/ import. Default behavior unchanged."
|
||||
|
||||
- [ ] **Task 4.4:** Update `tests/conftest.py` to parse `--config` at module body.
|
||||
- **WHERE:** Modify `tests/conftest.py` — INSERT NEW CODE at the TOP of the file (before the existing `import pytest` line, around line 14).
|
||||
- **WHAT:** Parse `sys.argv` for `--config` at module body BEFORE any `src/` import. Auto-default to `tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml`. Also register with pytest via `pytest_addoption`.
|
||||
- **HOW:**
|
||||
```python
|
||||
# === STAGE 1: Parse --config from sys.argv BEFORE any src/ import ===
|
||||
import sys as _sys
|
||||
from pathlib import Path as _Path
|
||||
|
||||
_RUN_ID = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
_ISOLATION_WORKSPACE = _Path(f"tests/artifacts/_isolation_workspace_{_RUN_ID}")
|
||||
_ISOLATION_WORKSPACE.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
def _parse_config_arg(argv: list[str]) -> _Path | None:
|
||||
for i, arg in enumerate(argv[1:]):
|
||||
if arg == "--config" and i + 1 < len(argv) - 1:
|
||||
return _Path(argv[i + 2]).resolve()
|
||||
if arg.startswith("--config="):
|
||||
return _Path(arg.split("=", 1)[1]).resolve()
|
||||
return None
|
||||
|
||||
_config_override_arg = _parse_config_arg(_sys.argv)
|
||||
if _config_override_arg is None:
|
||||
_config_override_arg = _ISOLATION_WORKSPACE / "config_overrides.toml"
|
||||
|
||||
# Set override BEFORE any src/ import
|
||||
from src import paths as _paths # noqa: E402
|
||||
_paths.set_config_override(_config_override_arg)
|
||||
|
||||
# Register --config with pytest so it doesn't warn about unknown flag
|
||||
def pytest_addoption(parser):
|
||||
parser.addoption("--config", action="store", default=None,
|
||||
help="Manual Slop: override config.toml path for tests")
|
||||
```
|
||||
**IMPORTANT:** This block must be inserted BEFORE `from src.app_controller import AppController` (line 64) and BEFORE any other `src/` imports. Also DELETE the `from src.gui_2 import App` line at line ~250 (move it after the new fixture insertion point to keep imports tidy).
|
||||
- **SAFETY:** The sys.argv parse happens at conftest module import time, BEFORE pytest's argparse. The auto-generated `_config_override_arg` lives inside `./tests/artifacts/`, which the Layer 1 guard will allowlist. Tests that explicitly pass `--config /some/path` get that override. Tests without `--config` get the auto-sandbox.
|
||||
- **COMMIT:** `feat(tests): parse --config CLI flag in conftest.py module body (Phase 4, FR2)`
|
||||
- **GIT NOTE:** "Phase 4 task 4.4: conftest.py parses sys.argv for --config BEFORE any src/ import. Auto-defaults to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml. registers via pytest_addoption so pytest doesn't warn."
|
||||
|
||||
- [ ] **Task 4.5:** Write tests 11, 12, 13 in `tests/test_test_sandbox.py`.
|
||||
- **WHERE:** Add to existing `tests/test_test_sandbox.py`.
|
||||
- **WHAT:** Three tests for the `--config` CLI flag behavior.
|
||||
- **HOW:**
|
||||
```python
|
||||
def test_config_override_via_cli_flag(tmp_path) -> None:
|
||||
config_path = tmp_path / "my_config.toml"
|
||||
config_path.write_text("[ai]\nprovider='gemini'\n", encoding="utf-8")
|
||||
from src import paths
|
||||
original = paths._CONFIG_OVERRIDE
|
||||
try:
|
||||
paths.set_config_override(config_path)
|
||||
assert paths.get_config_path() == config_path
|
||||
finally:
|
||||
paths.set_config_override(original)
|
||||
|
||||
def test_paths_get_config_path_no_env_fallback(monkeypatch) -> None:
|
||||
monkeypatch.delenv("SLOP_CONFIG", raising=False)
|
||||
from src import paths
|
||||
original = paths._CONFIG_OVERRIDE
|
||||
try:
|
||||
paths.set_config_override(None)
|
||||
root = Path(__file__).resolve().parent.parent
|
||||
assert paths.get_config_path() == root / "config.toml"
|
||||
finally:
|
||||
paths.set_config_override(original)
|
||||
|
||||
def test_sloppy_py_parses_config_flag() -> None:
|
||||
import ast
|
||||
sloppy = Path(__file__).resolve().parent.parent / "sloppy.py"
|
||||
tree = ast.parse(sloppy.read_text(encoding="utf-8"))
|
||||
found_config = False
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.arg) and node.arg == "config":
|
||||
found_config = True
|
||||
assert found_config, "sloppy.py must have a --config argparse argument"
|
||||
```
|
||||
- **SAFETY:** Tests manipulate `paths._CONFIG_OVERRIDE` directly (private API but necessary for testing). Always restore in `finally` block.
|
||||
- **COMMIT:** `test(sandbox): add regression tests for --config CLI flag (Phase 4)`
|
||||
- **GIT NOTE:** "Phase 4 task 4.5: 3 regression tests for FR2 (--config CLI flag, no env var fallback, sloppy.py argparse)."
|
||||
|
||||
- [ ] **Task 4.6:** Phase 4 verification — run a broad smoke test.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Confirm sloppy.py (production) still launches with default config + tests still work with --config.
|
||||
- **HOW:**
|
||||
```bash
|
||||
# Production: sloppy.py without --config uses default
|
||||
python sloppy.py --help # should NOT raise; --config appears in help
|
||||
|
||||
# Tests: conftest auto-defaults to ./tests/artifacts/.../config_overrides.toml
|
||||
uv run python -m pytest tests/test_test_sandbox.py::test_config_override_via_cli_flag -v
|
||||
uv run python -m pytest tests/test_paths.py -v # existing tests still work
|
||||
```
|
||||
- **SAFETY:** If sloppy.py crashes at import, the `--config` ordering is wrong. If existing tests fail, the new default breaks something — debug before committing.
|
||||
- **COMMIT:** None (this is verification, not a code change).
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: FR3 isolate_workspace + basetemp migration
|
||||
|
||||
**Focus:** Move the `isolate_workspace` workspace off `%TEMP%` to `./tests/artifacts/_isolation_workspace_<run_id>/`. Add `addopts = "--basetemp=..."` to pyproject.toml. Update tech-stack.md note.
|
||||
|
||||
- [ ] **Task 5.1:** Refactor `isolate_workspace` in `tests/conftest.py`.
|
||||
- **WHERE:** Modify `tests/conftest.py:259-281` (the existing `isolate_workspace` autouse).
|
||||
- **WHAT:** Replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`. Add `SLOP_CREDENTIALS` + `SLOP_MCP_ENV` env vars. Auto-generate placeholder TOML files.
|
||||
- **HOW:**
|
||||
```python
|
||||
@pytest.fixture(autouse=True)
|
||||
def isolate_workspace(monkeypatch) -> Generator[None, None, None]:
|
||||
"""Autouse fixture to isolate tests from the active user workspace.
|
||||
Workspace lives under tests/artifacts/ per workspace_paths.md."""
|
||||
test_workspace = _ISOLATION_WORKSPACE # defined in conftest module body
|
||||
test_workspace.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Generate placeholder TOML files
|
||||
config_content = {
|
||||
"ai": {"provider": "gemini", "model": "gemini-2.5-flash-lite"},
|
||||
"projects": {"paths": [], "active": ""},
|
||||
"gui": {"show_windows": {}},
|
||||
}
|
||||
with open(test_workspace / "config_overrides.toml", "wb") as f:
|
||||
tomli_w.dump(config_content, f)
|
||||
for name in ("credentials.toml", "mcp_env.toml", "presets.toml",
|
||||
"tool_presets.toml", "personas.toml", "workspace_profiles.toml"):
|
||||
(test_workspace / name).touch()
|
||||
|
||||
monkeypatch.setenv("SLOP_CREDENTIALS", str(test_workspace / "credentials.toml"))
|
||||
monkeypatch.setenv("SLOP_MCP_ENV", str(test_workspace / "mcp_env.toml"))
|
||||
monkeypatch.setenv("SLOP_GLOBAL_PRESETS", str(test_workspace / "presets.toml"))
|
||||
monkeypatch.setenv("SLOP_GLOBAL_TOOL_PRESETS", str(test_workspace / "tool_presets.toml"))
|
||||
monkeypatch.setenv("SLOP_GLOBAL_PERSONAS", str(test_workspace / "personas.toml"))
|
||||
monkeypatch.setenv("SLOP_GLOBAL_WORKSPACE_PROFILES", str(test_workspace / "workspace_profiles.toml"))
|
||||
yield
|
||||
```
|
||||
**Note:** The `tmp_path_factory` parameter is REMOVED from this fixture. Tests that legitimately need it should request it directly (`def test_x(tmp_path): ...`).
|
||||
- **SAFETY:** All env vars point INSIDE the isolation workspace, which is inside `./tests/artifacts/`. The Layer 1 guard allows this. No test should break UNLESS it was relying on the previous `%TEMP%` path.
|
||||
- **COMMIT:** `refactor(tests): migrate isolate_workspace off tmp_path_factory to tests/artifacts/ (Phase 5, FR3)`
|
||||
- **GIT NOTE:** "Phase 5 task 5.1: isolate_workspace fixture now creates tests/artifacts/_isolation_workspace_<RUN_ID>/. Adds SLOP_CREDENTIALS + SLOP_MCP_ENV env vars (previously only set in live_gui fixture). Per workspace_paths.md styleguide."
|
||||
|
||||
- [ ] **Task 5.2:** Add `addopts` to `pyproject.toml`.
|
||||
- **WHERE:** Modify `pyproject.toml` — add to `[tool.pytest.ini_options]` section.
|
||||
- **WHAT:** Add `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` so pytest's `tmp_path` factory uses a path under `./tests/`.
|
||||
- **HOW:** Insert:
|
||||
```toml
|
||||
[tool.pytest.ini_options]
|
||||
addopts = "--basetemp=tests/artifacts/_pytest_tmp"
|
||||
markers = [
|
||||
...
|
||||
]
|
||||
```
|
||||
- **SAFETY:** The basetemp directory is auto-created by pytest. `.gitignore` already has `tests/artifacts/` so it's gitignored.
|
||||
- **COMMIT:** `chore(pyproject): add --basetemp=tests/artifacts/_pytest_tmp addopts (Phase 5, FR3)`
|
||||
- **GIT NOTE:** "Phase 5 task 5.2: pyproject.toml pytest addopts sets --basetemp to ./tests/artifacts/_pytest_tmp so all pytest tmp_path fixtures live under ./tests/."
|
||||
|
||||
- [ ] **Task 5.3:** Defensive `_tmp_path_factory._basetemp` check in `conftest.py:pytest_configure`.
|
||||
- **WHERE:** Add to existing `pytest_configure` in `tests/conftest.py` (the one merged in Task 3.1).
|
||||
- **WHAT:** If `config._tmp_path_factory._basetemp` resolves outside `./tests/`, override to `./tests/artifacts/_pytest_tmp`.
|
||||
- **HOW:**
|
||||
```python
|
||||
project_root = Path(__file__).resolve().parent.parent
|
||||
basetemp = getattr(config, "_tmp_path_factory", None)
|
||||
if basetemp is not None:
|
||||
current = Path(str(basetemp._basetemp)).resolve()
|
||||
if not str(current).startswith(str(project_root / "tests")):
|
||||
basetemp._basetemp = str(project_root / "tests" / "artifacts" / "_pytest_tmp")
|
||||
```
|
||||
- **SAFETY:** Uses private API `_tmp_path_factory._basetemp` — if pytest version changes, this breaks. The `addopts` in Task 5.2 is the primary mechanism; this is defensive.
|
||||
- **COMMIT:** Same as 5.2 (combined).
|
||||
- **GIT NOTE:** Same as 5.2.
|
||||
|
||||
- [ ] **Task 5.4:** Add dated note to `conductor/tech-stack.md`.
|
||||
- **WHERE:** Modify `conductor/tech-stack.md` — append a dated note to the pytest section.
|
||||
- **WHAT:** Explain the `--basetemp` choice and reference `workspace_paths.md`.
|
||||
- **HOW:**
|
||||
```markdown
|
||||
## pyproject.toml pytest addopts (added 2026-06-19, per test_sandbox_hardening_20260619)
|
||||
|
||||
`[tool.pytest.ini_options].addopts = "--basetemp=tests/artifacts/_pytest_tmp"`.
|
||||
|
||||
**Rationale:** Per `conductor/code_styleguides/workspace_paths.md`, ALL test
|
||||
infrastructure paths must live under `./tests/`. pytest's `tmp_path` and
|
||||
`tmp_path_factory` fixtures default to `%TEMP%\pytest-of-<user>\` on
|
||||
Windows. This `addopts` redirects them under `./tests/` so the Layer 1
|
||||
runtime guard's allowlist (also `./tests/`) can be a single rule.
|
||||
```
|
||||
- **SAFETY:** Pure documentation change.
|
||||
- **COMMIT:** `docs(tech-stack): note --basetemp addopts rationale (Phase 5, FR3)`
|
||||
- **GIT NOTE:** Same as 5.2.
|
||||
|
||||
- [ ] **Task 5.5:** Write tests 7, 8, 9 in `tests/test_test_sandbox.py`.
|
||||
- **WHERE:** Add to existing `tests/test_test_sandbox.py`.
|
||||
- **WHAT:** Three tests verifying pyproject.toml, isolate_workspace, and AppController invariant.
|
||||
- **HOW:**
|
||||
```python
|
||||
def test_pyproject_toml_basetemp_is_under_tests() -> None:
|
||||
pyproject = Path(__file__).resolve().parent.parent / "pyproject.toml"
|
||||
text = pyproject.read_text(encoding="utf-8")
|
||||
assert "--basetemp=tests/artifacts/_pytest_tmp" in text
|
||||
|
||||
def test_isolate_workspace_does_not_use_tmp_path_factory_for_infra() -> None:
|
||||
import ast
|
||||
conftest = Path(__file__).resolve().parent / "conftest.py"
|
||||
tree = ast.parse(conftest.read_text(encoding="utf-8"))
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.FunctionDef) and node.name == "isolate_workspace":
|
||||
src = ast.unparse(node)
|
||||
assert "tmp_path_factory.mktemp" not in src, (
|
||||
"isolate_workspace must not use tmp_path_factory.mktemp; "
|
||||
"use Path('tests/artifacts/_isolation_workspace') / _RUN_ID"
|
||||
)
|
||||
return
|
||||
raise AssertionError("isolate_workspace fixture not found in conftest.py")
|
||||
|
||||
def test_appcontroller_init_does_not_load_config() -> None:
|
||||
import ast
|
||||
app_controller = Path(__file__).resolve().parent.parent / "src" / "app_controller.py"
|
||||
tree = ast.parse(app_controller.read_text(encoding="utf-8"))
|
||||
for node in ast.walk(tree):
|
||||
if isinstance(node, ast.FunctionDef) and node.name == "__init__":
|
||||
src = ast.unparse(node)
|
||||
assert "init_state()" not in src, (
|
||||
"AppController.__init__ must not call init_state() "
|
||||
"(this would trigger config reads before fixtures apply)"
|
||||
)
|
||||
assert "load_config()" not in src, (
|
||||
"AppController.__init__ must not call load_config() "
|
||||
"(this would trigger config reads before fixtures apply)"
|
||||
)
|
||||
return
|
||||
raise AssertionError("AppController.__init__ not found")
|
||||
```
|
||||
- **SAFETY:** These tests are static AST checks; they parse source files. They fail loud if invariants break. The `init_state()` invariant test is critical per FR2 audit.
|
||||
- **COMMIT:** `test(sandbox): add regression tests for FR3 invariants (Phase 5)`
|
||||
- **GIT NOTE:** "Phase 5 task 5.5: 3 regression tests for FR3 (pyproject basetemp, isolate_workspace no tmp_path_factory, AppController.__init__ invariant)."
|
||||
|
||||
- [ ] **Task 5.6:** Phase 5 verification — run Tier-2 + Tier-3 to confirm no regression.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Verify the basetemp migration + isolate_workspace migration don't break existing tests.
|
||||
- **HOW:** `uv run python scripts/run_tests_batched.py --tiers 2,3 --timeout 180`
|
||||
- **SAFETY:** If tests fail, check whether they were using `tmp_path` (which now resolves under `./tests/`) or hardcoded paths to `%TEMP%` (which the Layer 1 guard now blocks). Audit the failing test, don't disable the guard.
|
||||
- **COMMIT:** None.
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: FR5 PowerShell Wrapper (OPT-IN)
|
||||
|
||||
**Focus:** Write `scripts/run_tests_sandboxed.ps1` (Windows-only, opt-in) that wraps pytest in a Windows restricted token + Job Object.
|
||||
|
||||
- [ ] **Task 6.1:** Write `scripts/run_tests_sandboxed.ps1`.
|
||||
- **WHERE:** Create `scripts/run_tests_sandboxed.ps1`.
|
||||
- **WHAT:** Mirror `scripts/tier2/run_tier2_sandboxed.ps1` structure (100 lines). Replace OpenCode launch with pytest launch.
|
||||
- **HOW:** Tier 3 worker MUST read `scripts/tier2/run_tier2_sandboxed.ps1` end-to-end first (per writing-plans skill "Read Reference Implementation COMPLETELY"), then copy its Add-Type / Job Object / token-acquisition blocks verbatim. Only the LAST step (the actual process launch) differs. Full template:
|
||||
```powershell
|
||||
# scripts/run_tests_sandboxed.ps1
|
||||
<#
|
||||
.SYNOPSIS
|
||||
Run pytest in a Windows restricted-token sandbox.
|
||||
.DESCRIPTION
|
||||
Acquires a Windows restricted token (drops dangerous privileges),
|
||||
wraps pytest in a Job Object, and runs the test suite. The test
|
||||
workspace is forced under ./tests/ via the --config and --basetemp
|
||||
flags (handled by the conftest.py autouse fixtures). The Tier 2
|
||||
clone at <ProjectRoot> is the only directory pytest can read/write
|
||||
for tests; everything outside ./tests/ is blocked by the Layer 1
|
||||
Python guard PLUS the restricted-token enforcement.
|
||||
.NOTES
|
||||
Requires Windows + PowerShell 7+ + admin privileges for full
|
||||
restricted-token acquisition. The -WhatIf mode is a no-op dry-run
|
||||
(exits 0 without acquiring a token).
|
||||
.LINK
|
||||
scripts/tier2/run_tier2_sandboxed.ps1 (template)
|
||||
conductor/tracks/test_sandbox_hardening_20260619/spec.md (FR5)
|
||||
#>
|
||||
[CmdletBinding()]
|
||||
param(
|
||||
[switch]$WhatIf,
|
||||
[string]$TestPath = "tests/",
|
||||
[string]$ConfigPath = "" # empty = conftest.py auto-defaults to config_overrides.toml
|
||||
)
|
||||
|
||||
$ErrorActionPreference = "Stop"
|
||||
$ProjectRoot = (Resolve-Path "$PSScriptRoot/..").Path
|
||||
|
||||
if ($WhatIf) {
|
||||
Write-Host "[SANDBOX-WHATIF] Would run pytest in restricted token at $ProjectRoot"
|
||||
Write-Host "[SANDBOX-WHATIF] TestPath: $TestPath"
|
||||
Write-Host "[SANDBOX-WHATIF] ConfigPath: $($ConfigPath) (empty = conftest.py auto-defaults)"
|
||||
exit 0
|
||||
}
|
||||
|
||||
# === BEGIN: copy Add-Type / token / Job Object blocks from ===
|
||||
# === scripts/tier2/run_tier2_sandboxed.ps1 lines 30-95 verbatim ===
|
||||
# (See reference script for the full restricted-token + Job Object setup.)
|
||||
|
||||
# === END: tier2 clone blocks ===
|
||||
|
||||
# Invoke pytest under restricted token with sandbox flags.
|
||||
# The --basetemp flag ensures pytest's tmp dirs live under ./tests/.
|
||||
# The --config flag points to a config_overrides.toml inside ./tests/
|
||||
# (or empty = conftest.py auto-defaults).
|
||||
$argList = @(
|
||||
"run", "python", "-m", "pytest", $TestPath,
|
||||
"--basetemp=tests/artifacts/_pytest_tmp"
|
||||
)
|
||||
if ($ConfigPath -ne "") { $argList += "--config=$ConfigPath" }
|
||||
|
||||
Push-Location $ProjectRoot
|
||||
try {
|
||||
& uv @argList
|
||||
} finally {
|
||||
Pop-Location
|
||||
}
|
||||
```
|
||||
The Add-Type / token / Job Object blocks MUST be copied verbatim from `scripts/tier2/run_tier2_sandboxed.ps1` lines 30-95 (or wherever the equivalent code lives in the latest version of that script — Tier 3 worker should re-read the source). Only the LAST block (the actual invocation) is new.
|
||||
- **SAFETY:** `-WhatIf` mode is a no-op (exits 0). Full PowerShell restricted-token wrapper requires admin privileges on Windows; document this in the script header. The script is OPT-IN — users continue to use `uv run pytest` or `uv run python scripts/run_tests_batched.py` for normal test runs.
|
||||
- **COMMIT:** `feat(scripts): add scripts/run_tests_sandboxed.ps1 (Phase 6, FR5 opt-in)`
|
||||
- **GIT NOTE:** "Phase 6 task 6.1: PowerShell wrapper for Windows restricted-token + Job Object pytest sandbox. Mirrors run_tier2_sandboxed.ps1 structure (Add-Type + token + Job Object blocks copied verbatim). Only the invocation differs (pytest instead of OpenCode). -WhatIf mode for dry-run. OPT-IN."
|
||||
|
||||
- [ ] **Task 6.2:** Write a smoke test for `-WhatIf` mode.
|
||||
- **WHERE:** Add to `tests/test_test_sandbox.py` (as test 14).
|
||||
- **WHAT:** Verify `pwsh -File scripts/run_tests_sandboxed.ps1 -WhatIf` exits 0.
|
||||
- **HOW:**
|
||||
```python
|
||||
@pytest.mark.skipif(os.name != "nt", reason="Windows-only sandbox wrapper")
|
||||
def test_run_tests_sandboxed_whatif() -> None:
|
||||
result = subprocess.run(
|
||||
["pwsh", "-File", "scripts/run_tests_sandboxed.ps1", "-WhatIf"],
|
||||
capture_output=True, text=True,
|
||||
)
|
||||
assert result.returncode == 0, f"Expected exit 0, got {result.returncode}: {result.stderr}"
|
||||
```
|
||||
- **SAFETY:** Skipped on non-Windows per `conductor/workflow.md` Skip-Marker Policy (legitimate opt-in integration test, requires Windows + pwsh).
|
||||
- **COMMIT:** Same as 6.1.
|
||||
- **GIT NOTE:** Same as 6.1.
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: FR7 Documentation
|
||||
|
||||
**Focus:** Document the 4-layer enforcement model + `--config` CLI flag convention + `config_overrides.toml` naming.
|
||||
|
||||
- [ ] **Task 7.1:** Create `conductor/code_styleguides/test_sandbox.md`.
|
||||
- **WHERE:** Create `conductor/code_styleguides/test_sandbox.md`.
|
||||
- **WHAT:** Styleguide document covering: the `--config` CLI flag, `config_overrides.toml` convention, 4-layer enforcement model, `--basetemp` rule, Layer 1 audit hook contract, opt-in `run_tests_sandboxed.ps1`, audit script.
|
||||
- **HOW:** Use elements-of-style:writing-clearly-and-concisely (the existing styleguides in `conductor/code_styleguides/` are good templates). Sections: TL;DR; The 4-Layer Model; `--config` CLI Flag (replaces SLOP_CONFIG); `--basetemp` Rule; Layer 1 Audit Hook Contract; Static Audit; OS-Level Wrapper; Test Workspace Convention (`config_overrides.toml`); See Also.
|
||||
- **SAFETY:** Documentation only. Reference actual file:line locations from the spec.
|
||||
- **COMMIT:** `docs(styleguide): add test_sandbox.md (Phase 7, FR7)`
|
||||
- **GIT NOTE:** "Phase 7 task 7.1: new styleguide test_sandbox.md documents the 4-layer enforcement model, --config CLI flag, config_overrides.toml convention, --basetemp rule."
|
||||
|
||||
- [ ] **Task 7.2:** Update `conductor/code_styleguides/workspace_paths.md`.
|
||||
- **WHERE:** Append a section to the existing file.
|
||||
- **WHAT:** Mention the `SLOP_CONFIG → --config` migration + `pytest --basetemp` addopts.
|
||||
- **HOW:** Add a "2026-06-19 Update" section at the bottom.
|
||||
- **SAFETY:** Documentation only.
|
||||
- **COMMIT:** Same as 7.1.
|
||||
- **GIT NOTE:** Same as 7.1.
|
||||
|
||||
- [ ] **Task 7.3:** Add `Sandbox Hardening` section to `docs/guide_testing.md`.
|
||||
- **WHERE:** Modify `docs/guide_testing.md` — add a new section.
|
||||
- **WHAT:** Cross-reference to `test_sandbox.md` + summary of the 4 layers.
|
||||
- **HOW:** Append the section.
|
||||
- **SAFETY:** Documentation only.
|
||||
- **COMMIT:** Same as 7.1.
|
||||
- **GIT NOTE:** Same as 7.1.
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: Full Suite Verification
|
||||
|
||||
**Focus:** Run the full 11-tier suite and confirm no regression vs. the `1288 passed + 4 xdist-skipped` baseline.
|
||||
|
||||
- [ ] **Task 8.1:** Run full test suite.
|
||||
- **WHERE:** None.
|
||||
- **WHAT:** Run all 11 tiers and capture results.
|
||||
- **HOW:** `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11 > tests/artifacts/_full_suite_post_sandbox.txt 2>&1`
|
||||
- **SAFETY:** If regression vs. baseline (1288 + 4), STOP and report to user. Do not commit a broken suite. Per `conductor/workflow.md` Phase Completion Verification protocol.
|
||||
- **COMMIT:** None (verification).
|
||||
- **GIT NOTE:** None.
|
||||
|
||||
- [ ] **Task 8.2:** Commit verification report.
|
||||
- **WHERE:** None (commit the baseline diff comparison).
|
||||
- **WHAT:** Stage `tests/artifacts/_full_suite_post_sandbox.txt` as a verification artifact.
|
||||
- **HOW:** `git add tests/artifacts/_full_suite_post_sandbox.txt; git commit -m "conductor(checkpoint): Phase 8 - full suite green, no regression vs. baseline 1288+4"`
|
||||
- **SAFETY:** If regression occurred in 8.1, fix forward or roll back per `conductor/workflow.md` Per-Task Decision Protocol.
|
||||
- **COMMIT:** As above.
|
||||
- **GIT NOTE:** "Phase 8 checkpoint: full 11-tier suite passed. No regression vs. pre-track baseline (1288 + 4). Test sandbox hardening is operational."
|
||||
|
||||
---
|
||||
|
||||
## Phase 9: End-of-Track Report
|
||||
|
||||
**Focus:** Write the completion report following the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`. Update state.toml to `completed`.
|
||||
|
||||
- [ ] **Task 9.1:** Write `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`.
|
||||
- **WHERE:** Create `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`.
|
||||
- **WHAT:** Track completion report with: scope (files added/modified), pass-rate baseline + post, deferred items, lessons learned, follow-up tracks (other SLOP_* env vars), user review gate.
|
||||
- **HOW:** Mirror the structure of `docs/reports/TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`.
|
||||
- **SAFETY:** Pure documentation.
|
||||
- **COMMIT:** `docs(reports): TRACK_COMPLETION_test_sandbox_hardening_20260619 (Phase 9)`
|
||||
- **GIT NOTE:** "Phase 9: track completion report. 9 phases shipped. 4-layer test sandbox enforcement operational. Deferred: convert other SLOP_* env vars to CLI flags (separate mess, separate tracks)."
|
||||
|
||||
- [ ] **Task 9.2:** Update `state.toml` and commit.
|
||||
- **WHERE:** Modify `conductor/tracks/test_sandbox_hardening_20260619/state.toml`.
|
||||
- **WHAT:** Set `status = "completed"`, `current_phase = "complete"`.
|
||||
- **HOW:**
|
||||
```toml
|
||||
[meta]
|
||||
status = "completed"
|
||||
current_phase = "complete"
|
||||
last_updated = "2026-06-19"
|
||||
```
|
||||
- **SAFETY:** Pure metadata.
|
||||
- **COMMIT:** `conductor(state): mark test_sandbox_hardening_20260619 complete`
|
||||
- **GIT NOTE:** "Phase 9 final: state.toml marked complete. Track ships."
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Phase | Tasks | Key output | Risk |
|
||||
|-------|-------|------------|------|
|
||||
| 1: Investigation | 3 | Baseline pass count + audit of get_config_path() callers | None (read-only) |
|
||||
| 2: FR4 Static audit | 3 | `scripts/audit_test_sandbox_violations.py` + 3 tests | Low |
|
||||
| 3: FR1 Python guard | 3 | `_enforce_test_sandbox` fixture + 4 tests | High (can break tests) |
|
||||
| 4: FR2 Root-cause fix | 6 | `set_config_override()` + `--config` CLI flag + 3 tests | High (root-cause) |
|
||||
| 5: FR3 Isolation migration | 6 | `isolate_workspace` + `--basetemp` + tech-stack.md + 3 tests | Medium |
|
||||
| 6: FR5 PowerShell | 2 | `scripts/run_tests_sandboxed.ps1` + smoke test | Low (opt-in) |
|
||||
| 7: FR7 Documentation | 3 | `test_sandbox.md` + updates | None |
|
||||
| 8: Verification | 2 | 11-tier pass count + checkpoint commit | Verification only |
|
||||
| 9: Report | 2 | `TRACK_COMPLETION_*` + state.toml `completed` | None |
|
||||
|
||||
**Total: 30 tasks across 9 phases, ~11 atomic commits.**
|
||||
|
||||
**TDD per phase:** Red (write failing test) → Green (minimal impl) → Verify → Commit.
|
||||
|
||||
**Per-task discipline:** WHERE / WHAT / HOW / SAFETY / COMMIT / GIT NOTE per `conductor/workflow.md` Tier 1 rules.
|
||||
|
||||
**Hard bans:** No `git restore`, `git checkout`, `git reset`. No day estimates in commit messages or git notes. No diagnostic noise in `src/*.py`. No new `@pytest.mark.skip` markers except the one for `test_run_tests_sandboxed_whatif` (Windows-only, legitimate per `conductor/workflow.md` Skip-Marker Policy).
|
||||
|
||||
**Rollback:** Each phase is a separate commit. If any phase breaks, `git revert` the phase's commit(s) without affecting the others.
|
||||
|
||||
---
|
||||
|
||||
## Handoff to Tier 2
|
||||
|
||||
This plan is executed by a Tier 2 Tech Lead via the standard `conductor/workflow.md` Task Workflow:
|
||||
1. Activate `mma-orchestrator` skill.
|
||||
2. For each task: read context, write code, run tests, commit per `git commit` line, attach git note.
|
||||
3. After each phase: phase completion verification + checkpoint.
|
||||
4. After Phase 9: track complete; user reviews merge per `conductor/workflow.md` "Review and merge workflow".
|
||||
|
||||
Tier 3 workers (via `scripts/mma_exec.py --role tier3-worker`) handle individual tasks with surgical prompts. The Tier 2 Tech Lead reviews each commit before moving to the next task.
|
||||
@@ -0,0 +1,373 @@
|
||||
# Track Specification: Test Sandbox Hardening (2026-06-19)
|
||||
|
||||
## Overview
|
||||
|
||||
This track adds a hard file-I/O sandbox for the test suite so that a misbehaving
|
||||
test (missing fixture, broken monkeypatch, direct `open()` to a hardcoded path)
|
||||
cannot corrupt user-owned files in the project root. The user has lost
|
||||
"important sample data" multiple times over the past month because tests have
|
||||
written to `manual_slop.toml`, `manual_slop_history.toml`, `personas.toml`,
|
||||
`presets.toml`, `tool_presets.toml`, or `credentials.toml` at the top of the
|
||||
repo.
|
||||
|
||||
The fix has 5 parts:
|
||||
|
||||
1. **Eliminate the silent `SLOP_CONFIG` env-var fallback** in `src/paths.py`. Replace
|
||||
it with a module-level override set explicitly by the CLI flag `--config`
|
||||
at the entry point (sloppy.py for production, conftest.py for tests). This
|
||||
is the root-cause fix — without it, every other defense is a band-aid.
|
||||
2. **Add a Python runtime file-I/O guard** (`sys.addaudithook` on `open` writes).
|
||||
Default-on for every pytest invocation.
|
||||
3. **Migrate the test workspace off `tmp_path_factory.mktemp`** (which lives in
|
||||
`%TEMP%`) onto `tests/artifacts/_isolation_workspace_<run_id>/` so the
|
||||
Layer 2 allowlist can be a single rule. Add pytest `--basetemp` to pyproject.toml
|
||||
so pytest's own tmp dirs also live under `./tests/`.
|
||||
4. **Add an OS-level restricted-token wrapper** (Windows-only, opt-in via
|
||||
`scripts/run_tests_sandboxed.ps1`) for users who want defense in depth on
|
||||
top of the Python guard.
|
||||
5. **Extend the static audit** to flag any test source code that could try
|
||||
to write to a top-level TOML file, plus `tempfile.mkdtemp()` /
|
||||
`tempfile.mkstemp()` calls without `dir=` pointing under `./tests/`.
|
||||
|
||||
**Out of scope (per user directive):** the OTHER `SLOP_*` env vars
|
||||
(`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`,
|
||||
`SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`,
|
||||
`SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) remain env-var-driven for now. The user
|
||||
considers them a separate "mess" to be addressed in follow-up tracks. The
|
||||
test workspace still uses these env vars to redirect to per-run paths under
|
||||
`./tests/artifacts/`.
|
||||
|
||||
After this track, the rule is: **any `pytest` or `run_tests_batched.py`
|
||||
invocation cannot write a single byte outside `./tests/`, and the static audit
|
||||
flags any test source code that could try.**
|
||||
|
||||
## Current State Audit (as of 2026-06-19)
|
||||
|
||||
### Already Implemented (DO NOT re-implement)
|
||||
|
||||
1. **`isolate_workspace` autouse fixture** (`tests/conftest.py:259-281`)
|
||||
- Sets `SLOP_CONFIG`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`,
|
||||
`SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` to a per-test
|
||||
path via `tmp_path_factory.mktemp("isolated_workspace")`.
|
||||
- Provides partial protection for tests that go through `src.paths.get_*_path()`.
|
||||
2. **`live_gui` fixture workspace** (`tests/conftest.py:484-525`)
|
||||
- Creates `tests/artifacts/live_gui_workspace_<TIMESTAMP>/` per pytest
|
||||
invocation with fresh `manual_slop.toml` + `config.toml` (per
|
||||
`workspace_path_finalize_20260609`).
|
||||
- Sets `SLOP_CREDENTIALS` + `SLOP_MCP_ENV` to the **real** project-root
|
||||
`credentials.toml` / `mcp_env.toml` (read-only intent).
|
||||
3. **`scripts/check_test_toml_paths.py`** (per `conductor/tracks.md:395`)
|
||||
- Static audit detects tests with hardcoded references to TOML basenames
|
||||
(`manual_slop.toml`, `config.toml`, `credentials.toml`, etc.) or to
|
||||
`Path("C:/projects/...")` and `Path("tests/artifacts/")` literals.
|
||||
- CI gate that exits 1 on violation.
|
||||
4. **`conductor/code_styleguides/workspace_paths.md`** (148 lines)
|
||||
- Hard rule: test workspaces must live under `tests/artifacts/`. Banned:
|
||||
`tmp_path_factory.mktemp` for test infrastructure workspaces, env vars
|
||||
for test paths, CLI args for test paths.
|
||||
5. **`scripts/audit_no_temp_writes.py`** (108 lines)
|
||||
- Audits `scripts/` for `%TEMP%` usage. Pattern reference for the new
|
||||
audit script in Layer 4.
|
||||
|
||||
### Gaps to Fill (This Track's Scope)
|
||||
|
||||
| # | Gap | Risk | Where |
|
||||
|---|-----|------|-------|
|
||||
| G1 | `isolate_workspace` uses `tmp_path_factory.mktemp` which lives in `%TEMP%`, violating the existing styleguide | Path allowlist for Layer 1 has to include `%TEMP%` = widened blast radius | `tests/conftest.py:265` |
|
||||
| G2 | `isolate_workspace` doesn't set `SLOP_CREDENTIALS` or `SLOP_MCP_ENV` | Non-live_gui tests that go through `src.paths.get_credentials_path()` read the real `credentials.toml` | `tests/conftest.py:259-281` |
|
||||
| G3 | No runtime file-I/O guard. Tests can `Path("manual_slop.toml").write_text(...)` with no consequence | **Direct cause of user's data loss** | New: `tests/conftest.py:_enforce_test_sandbox` |
|
||||
| G4 | Pytest's `tmp_path` / `tmp_path_factory` default to `%TEMP%\pytest-of-<user>\` | Same widening issue as G1 | New: `pyproject.toml` addopts + `tests/conftest.py:pytest_configure` |
|
||||
| G5 | `check_test_toml_paths.py` doesn't catch non-TOML writes (e.g., `Path("manualslop_layout.ini").write_text`, `Path("manualslop_history.toml").write_text`) | Hidden test paths slip through the static audit | New: `scripts/audit_test_sandbox_violations.py` (extends existing) |
|
||||
| G6 | No OS-level hard sandbox option for paranoid users (Tier 2 has `run_tier2_sandboxed.ps1`; tests have no equivalent) | Same risk for users running `pytest` interactively without the Python guard | New: `scripts/run_tests_sandboxed.ps1` (opt-in) |
|
||||
| G7 | `AppController()` is initialized at `tests/conftest.py:65` at module import (line 65-66: `_warmup_app_controller = AppController(); _warmup_app_controller.wait_for_warmup(timeout=60.0)`), BEFORE the autouse `isolate_workspace` fixture applies | **MOSTLY OK but no invariant.** Per the call chain audit: `AppController.__init__` (src/app_controller.py:787) only sets up state + starts warmup background thread; it does NOT call `init_state()` or `load_config()`. `init_state()` (which reads config) is called from `App.__init__()` in `src/gui_2.py`, AFTER fixtures apply. But there's no test asserting this invariant — a future refactor could accidentally move init_state() into __init__ and silently break the safety | New FR8 regression test: assert `AppController.__init__` does not call `init_state()` or `load_config()` |
|
||||
|
||||
## Goals
|
||||
|
||||
1. **Make it impossible for any test invocation to write outside `./tests/`** at the Python layer (Layer 1) and at the OS layer when the opt-in wrapper is used (Layer 3).
|
||||
2. **Make every test see a fully-sandboxed project root** (Layer 2): config, presets, personas, tool_presets, credentials, mcp_env, AND pytest's own tmp dirs all live under `./tests/`.
|
||||
3. **Catch sandbox violations statically** (Layer 4): a developer adding a bad path to a test source file gets a CI failure before the test ever runs.
|
||||
4. **No regression in test pass rate.** All 11 tiers must continue to pass clean after this track ships.
|
||||
5. **No new `@pytest.mark.skip` markers.** Per the user directive (per `conductor/workflow.md` Skip-Marker Policy), in-session fixes only.
|
||||
|
||||
## Functional Requirements
|
||||
|
||||
### FR1. Python runtime file-I/O sandbox (Layer 1 — DEFAULT ON)
|
||||
|
||||
**WHERE:** New `_enforce_test_sandbox` autouse fixture in `tests/conftest.py` (registered alongside `isolate_workspace` at line ~258).
|
||||
|
||||
**WHAT:** Install a `sys.addaudithook()` callable that intercepts the `open` audit event when the mode is `'w'`, `'a'`, `'x'`, or `'+'`, or when the call is to `os.makedirs` / `shutil.rmtree` / `tempfile.mkdtemp` / `tempfile.mkstemp`.
|
||||
|
||||
**HOW:**
|
||||
- Allowlist (writes ALLOWED):
|
||||
- Any path under `<project_root>/tests/` (resolved absolute; case-normalized on Windows).
|
||||
- Any path under `<project_root>/tests/artifacts/` (already covered by above; explicit for clarity).
|
||||
- Any path under the per-run `_RUN_WORKSPACE` (which lives inside `./tests/artifacts/`).
|
||||
- The pyproject.toml `pytest --basetemp` target (also inside `./tests/`).
|
||||
- Denylist (writes REJECTED):
|
||||
- Anything outside `./tests/`.
|
||||
- On violation: raise `RuntimeError("TEST_SANDBOX_VIOLATION: <test_name> attempted to write to <absolute_path> which is outside <project_root>/tests/. Use tmp_path or fixture-provided paths.")` and let pytest report the failure with the test name.
|
||||
- The hook is installed in `pytest_configure` (so it's in place before any test module imports), uninstalled in `pytest_unconfigure`.
|
||||
|
||||
**SAFETY:**
|
||||
- Reads are NOT blocked. Tests legitimately need to read the source tree (`src/`, `pyproject.toml`, `mcp_env.toml`, `credentials.toml`, etc.) for fixtures and mocks.
|
||||
- The hook must be thread-safe (pytest may run tests in xdist workers).
|
||||
- The hook must not break pytest's own internals (`.pytest_cache`, `_pytest_tmp_path_factory` cleanup). The basetemp migration (FR3) handles this.
|
||||
- Allowlist resolution must NOT block legitimate pytest cache writes (`<project_root>/.pytest_cache/`, `<project_root>/tests/.pytest_cache/`, `<project_root>/tests/artifacts/__pycache__/`). Add `.pytest_cache`, `__pycache__`, `.coverage`, `.slop_cache`, `.ruff_cache` to the allowlist.
|
||||
|
||||
### FR2. CLI flag `--config` replaces `SLOP_CONFIG` env var (ROOT-CAUSE FIX)
|
||||
|
||||
**WHERE:** `src/paths.py:42-46` (the silent fallback). `sloppy.py` (CLI parser). `tests/conftest.py` (CLI parser at module body BEFORE any src/ import).
|
||||
|
||||
**WHAT:** Remove the `os.environ.get("SLOP_CONFIG", ...)` fallback from `src/paths.py:get_config_path()`. Replace with a module-level `_CONFIG_OVERRIDE: Path | None` that is set ONLY by explicit CLI flag parsing at the entry point.
|
||||
|
||||
**HOW:**
|
||||
|
||||
```python
|
||||
# src/paths.py — REPLACE the existing get_config_path with:
|
||||
_CONFIG_OVERRIDE: Path | None = None
|
||||
|
||||
def set_config_override(path: Path | None) -> None:
|
||||
"""CLI flag is the ONLY way to override. Pass None to use default.
|
||||
[C: sloppy.py:main, tests/conftest.py:_setup_test_paths]"""
|
||||
global _CONFIG_OVERRIDE
|
||||
_CONFIG_OVERRIDE = path
|
||||
_RESOLVED.clear()
|
||||
|
||||
def get_config_path() -> Path:
|
||||
"""Returns the active config.toml. If override is set, returns it.
|
||||
Otherwise returns the default <project_root>/config.toml.
|
||||
[C: src/app_controller.py:AppController.load_config,
|
||||
src/app_controller.py:AppController.init_state,
|
||||
src/models.py:_load_config_from_disk]"""
|
||||
if _CONFIG_OVERRIDE is not None:
|
||||
return _CONFIG_OVERRIDE
|
||||
root_dir = Path(__file__).resolve().parent.parent
|
||||
return root_dir / "config.toml"
|
||||
```
|
||||
|
||||
```python
|
||||
# sloppy.py — ADD argparse flag (BEFORE any src/ import):
|
||||
parser.add_argument("--config", help="Path to config.toml (default: <project_root>/config.toml)")
|
||||
args = parser.parse_args()
|
||||
if args.config:
|
||||
from src import paths
|
||||
paths.set_config_override(Path(args.config).resolve())
|
||||
```
|
||||
|
||||
```python
|
||||
# tests/conftest.py — PARSE sys.argv at module body BEFORE any src/ import:
|
||||
import sys as _sys
|
||||
from pathlib import Path as _Path
|
||||
|
||||
def _parse_config_arg() -> _Path | None:
|
||||
"""Manual sys.argv parse for --config. Returns resolved Path or None."""
|
||||
args = _sys.argv[1:]
|
||||
for i, arg in enumerate(args):
|
||||
if arg == "--config" and i + 1 < len(args):
|
||||
return _Path(args[i + 1]).resolve()
|
||||
if arg.startswith("--config="):
|
||||
return _Path(arg.split("=", 1)[1]).resolve()
|
||||
return None
|
||||
|
||||
_config_arg = _parse_config_arg()
|
||||
if _config_arg is None:
|
||||
# Default for tests: sandboxed config_overrides.toml
|
||||
config_arg = _Path(f"tests/artifacts/_isolation_workspace_{_RUN_ID}/config_overrides.toml")
|
||||
else:
|
||||
config_arg = _config_arg
|
||||
|
||||
# Set override BEFORE any src/ import
|
||||
from src import paths # noqa: E402
|
||||
paths.set_config_override(config_arg)
|
||||
|
||||
# ALSO register with pytest so it doesn't warn about unknown flag:
|
||||
def pytest_addoption(parser):
|
||||
parser.addoption("--config", action="store", default=None,
|
||||
help="Manual Slop: override config.toml path for tests")
|
||||
```
|
||||
|
||||
**Test workspace contents** (auto-generated by `_setup_test_paths` helper in conftest):
|
||||
```
|
||||
tests/artifacts/_isolation_workspace_<RUN_ID>/
|
||||
├── config_overrides.toml # the AppController config (per user naming)
|
||||
├── credentials.toml # placeholder for SLOP_CREDENTIALS (env var stays)
|
||||
├── mcp_env.toml # placeholder for SLOP_MCP_ENV (env var stays)
|
||||
├── presets.toml # placeholder for SLOP_GLOBAL_PRESETS
|
||||
├── tool_presets.toml # placeholder for SLOP_GLOBAL_TOOL_PRESETS
|
||||
└── personas.toml # placeholder for SLOP_GLOBAL_PERSONAS
|
||||
```
|
||||
|
||||
**SAFETY:**
|
||||
- The new `get_config_path()` raises `KeyError`-like behavior if no override AND no default exists. This is intentional — better to fail fast than silently use a wrong path.
|
||||
- The desktop GUI (`sloppy.py` without `--config`) uses the default `<project_root>/config.toml`, which is unchanged behavior.
|
||||
- Tests ALWAYS use a path inside `./tests/` (either from `--config` or the auto-generated default), so the Layer 1 audit hook's allowlist catches any stray writes.
|
||||
- Conftest's sys.argv parse happens BEFORE pytest's own argparse (which is too late for src/ imports).
|
||||
- The `config_overrides.toml` naming is a convention; tests CAN pass `--config /some/other/path.toml` and it will work.
|
||||
|
||||
### FR3. Pytest tmp paths + `isolate_workspace` migration (Layer 2 — DEFAULT ON)
|
||||
|
||||
**WHERE:**
|
||||
1. `pyproject.toml` — add `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` to `[tool.pytest.ini_options]`.
|
||||
2. `tests/conftest.py:isolate_workspace` (lines 259-281) — replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`.
|
||||
3. `tests/conftest.py:pytest_configure` — defensive normalization: if `config._tmp_path_factory._basetemp` resolves outside `./tests/`, override to `tests/artifacts/_pytest_tmp`.
|
||||
4. `conductor/tech-stack.md` — dated note explaining the `--basetemp` choice.
|
||||
|
||||
**WHAT:**
|
||||
- All pytest `tmp_path` / `tmp_path_factory` fixtures create temp dirs under `<project_root>/tests/artifacts/_pytest_tmp/`.
|
||||
- The `isolate_workspace` autouse fixture's workspace lives under `<project_root>/tests/artifacts/_isolation_workspace_<RUN_ID>/`.
|
||||
- Both the `--basetemp` path AND the `isolate_workspace` path are inside `./tests/`, so the Layer 1 audit hook's allowlist can be a single rule: "anything under `./tests/` is allowed."
|
||||
|
||||
**HOW:**
|
||||
- `pyproject.toml`: standard `addopts` entry.
|
||||
- `isolate_workspace`: replace `tmp_path_factory.mktemp("isolated_workspace")` with `Path("tests/artifacts/_isolation_workspace") / _RUN_ID`. Add `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES` env vars pointing inside the isolation workspace. (Note: these are the OTHER `SLOP_*` env vars that the user is punting; they stay env-var-driven for now.)
|
||||
- `conftest.py:pytest_configure`: defensive check for `_tmp_path_factory._basetemp`.
|
||||
- `tech-stack.md`: dated note.
|
||||
|
||||
**SAFETY:** Update `.gitignore` to ensure `tests/artifacts/_pytest_tmp/` and `tests/artifacts/_isolation_workspace/` are covered (already covered by `tests/artifacts/` pattern).
|
||||
|
||||
### FR4. Extended static audit (Layer 4 — DEFAULT ON as CI gate)
|
||||
|
||||
**WHERE:** New file `scripts/audit_test_sandbox_violations.py` (extends `scripts/check_test_toml_paths.py`).
|
||||
|
||||
**WHAT:** Detect test source files that contain hardcoded write operations targeting paths outside `./tests/`. Patterns:
|
||||
- `Path("manual_slop.toml")`, `Path("config.toml")`, `Path("credentials.toml")`, `Path("presets.toml")`, `Path("personas.toml")`, `Path("tool_presets.toml")`, `Path("workspace_profiles.toml")`, `Path("manualslop_layout.ini")`, `Path("manual_slop_history.toml")`
|
||||
- `Path("project.toml")`, `Path("manual_slop_history.toml")` (top-level TOMLs)
|
||||
- `open("manual_slop.toml", "w")` and similar
|
||||
- `Path("C:/projects/...")` and `Path("C:\\projects\\...")` (project root references)
|
||||
- `Path("tests/artifacts/...")` literal (violates workspace_paths.md; should use a fixture instead)
|
||||
- `Path(__file__).parent.parent / "config.toml"` (and similar `..` traversal)
|
||||
- `tempfile.mkdtemp()`, `tempfile.mkstemp()` (without a `dir=` kwarg pointing to `./tests/`)
|
||||
|
||||
**HOW:** Mirror the existing `check_test_toml_paths.py` structure: list of compiled regexes + `find_violations(root_dir)` + `main()` with `--strict` CI mode.
|
||||
|
||||
**SAFETY:** The audit is INFORMATIONAL by default (exits 0). `--strict` mode (CI) exits 1 on any violation. This matches the precedent set by `audit_no_temp_writes.py` and `check_test_toml_paths.py`.
|
||||
|
||||
### FR5. OS-level sandbox wrapper (Layer 3 — OPT IN)
|
||||
|
||||
**WHERE:** New file `scripts/run_tests_sandboxed.ps1` (analogous to `scripts/tier2/run_tier2_sandboxed.ps1`).
|
||||
|
||||
**WHAT:** A PowerShell launcher that:
|
||||
1. Acquires a Windows restricted token (drops `SeDebugPrivilege`, `SeBackupPrivilege`, `SeRestorePrivilege`, `SeTakeOwnershipPrivilege`, etc.) — same pattern as `run_tier2_sandboxed.ps1`.
|
||||
2. Sets the current directory to the project root.
|
||||
3. Wraps the pytest process tree in a Job Object so it cannot escape.
|
||||
4. Invokes `uv run python -m pytest` (or `uv run python scripts/run_tests_batched.py`) under the restricted token with `--basetemp=tests/artifacts/_pytest_tmp` (Layer 2 + FR3 ensure tmp dirs are inside the sandbox).
|
||||
5. Forwards `--config tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml` so the test config is inside the sandbox.
|
||||
6. Reports the exit code.
|
||||
|
||||
**HOW:** Copy the structure of `scripts/tier2/run_tier2_sandboxed.ps1` (100 lines). Replace the OpenCode launch with a pytest launch. Keep the Job Object + restricted token machinery.
|
||||
|
||||
**SAFETY:** This is OPT-IN. Users who don't need OS-level enforcement continue to use `uv run pytest` or `uv run python scripts/run_tests_batched.py` directly and still get Layer 1 + Layer 2 + Layer 4 protection. The wrapper is for paranoid scenarios.
|
||||
|
||||
**Note on Windows ACLs:** The Tier 2 wrapper uses restricted-token; it does NOT use file ACLs. For tests, this is sufficient because (a) tests run as the same user, (b) the Python guard (Layer 1) is the primary defense, (c) restricted-token catches native code paths that bypass Python.
|
||||
|
||||
### FR6. Regression tests for the guard
|
||||
|
||||
**WHERE:** New file `tests/test_test_sandbox.py`.
|
||||
|
||||
**WHAT:** Tests that verify:
|
||||
1. `test_sandbox_blocks_writes_outside_tests_dir`: write to a hardcoded `Path("manual_slop.toml")` from a test → `RuntimeError` with the `TEST_SANDBOX_VIOLATION` prefix.
|
||||
2. `test_sandbox_allows_writes_inside_tests_dir`: write to `tmp_path / "foo.txt"` → succeeds.
|
||||
3. `test_sandbox_allows_writes_inside_tests_artifacts`: write to `Path("tests/artifacts/_pytest_tmp_xxx/foo.txt")` → succeeds.
|
||||
4. `test_sandbox_does_not_block_reads`: read from `Path("pyproject.toml")` → succeeds.
|
||||
5. `test_audit_test_sandbox_violations_flags_known_bad_pattern`: create a temp test file with `Path("manual_slop.toml").write_text(...)`, run the audit script as a subprocess, assert exit 1.
|
||||
6. `test_audit_test_sandbox_violations_passes_clean_test`: create a temp test file using only `tmp_path`, assert exit 0.
|
||||
7. `test_pyproject_toml_basetemp_is_under_tests`: parse `pyproject.toml`, assert `addopts` contains `--basetemp=tests/artifacts/_pytest_tmp`.
|
||||
8. `test_isolate_workspace_does_not_use_tmp_path_factory_for_infra`: parse `tests/conftest.py`, assert no `tmp_path_factory.mktemp` calls in `isolate_workspace`.
|
||||
9. `test_appcontroller_init_does_not_load_config`: parse `src/app_controller.py`, assert `AppController.__init__` does NOT call `init_state()` or `load_config()`. **Per the G7 audit: this guards the invariant that config reads happen AFTER fixtures apply.**
|
||||
10. `test_audit_flags_tempfile_mkdtemp_without_tests_dir`: create a test file with `tempfile.mkdtemp()`, run audit, assert exit 1. **Per user directive: tests should never need AppData temp.**
|
||||
11. `test_config_override_via_cli_flag`: invoke `python -c "..."` with `--config <path>` and verify `paths.get_config_path()` returns the override.
|
||||
12. `test_paths_get_config_path_no_env_fallback`: monkeypatch-delete `SLOP_CONFIG` env var, import `src.paths`, assert `get_config_path()` returns default (no env var lookup).
|
||||
13. `test_sloppy_py_parses_config_flag`: parse `sloppy.py` AST, assert `--config` argparse argument exists and triggers `paths.set_config_override()`.
|
||||
|
||||
**HOW:** Standard pytest. Use `subprocess.run` for the audit script invocations to test the CLI surface. Use `ast.parse` for static checks on `conftest.py`, `app_controller.py`, and `sloppy.py`.
|
||||
|
||||
**SAFETY:** The `test_sandbox_blocks_writes_outside_tests_dir` test will raise `RuntimeError` — pytest must catch it as a pass. Use `pytest.raises(RuntimeError, match="TEST_SANDBOX_VIOLATION")`.
|
||||
|
||||
### FR7. Documentation update
|
||||
|
||||
**WHERE:** New file `conductor/code_styleguides/test_sandbox.md`. Update `conductor/code_styleguides/workspace_paths.md`. Update `docs/guide_testing.md`.
|
||||
|
||||
**WHAT:** Document:
|
||||
- The `--config` CLI flag convention (replaces `SLOP_CONFIG` env var).
|
||||
- The `config_overrides.toml` naming convention for test workspace configs.
|
||||
- The 4-layer enforcement model (Python guard, conftest isolation, OS-level wrapper, static audit).
|
||||
- The `--basetemp` rule (why pytest tmp paths must live under `./tests/`).
|
||||
- The Layer 1 audit hook contract: writes outside `./tests/` raise `RuntimeError`.
|
||||
- The opt-in `scripts/run_tests_sandboxed.ps1` wrapper.
|
||||
- The audit script and CI gate.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
- **NFR1. Performance:** The audit hook adds < 5% overhead to pytest run time (measured on the existing 11-tier suite). Conftest fixtures are unchanged in scope; only env-var setup is added.
|
||||
- **NFR2. Maintainability:** No new `src/` files (per `AGENTS.md` File Size and Naming Convention rule). The Python guard lives in `tests/conftest.py` (test infrastructure). The audit script lives in `scripts/` (project infrastructure). The PowerShell wrapper lives in `scripts/`.
|
||||
- **NFR3. Cross-platform:** The Python guard (Layer 1) and the static audit (Layer 4) work on Windows, macOS, and Linux. The PowerShell wrapper (Layer 3) is Windows-only; on non-Windows it's a documented no-op (`Write-Host "OS-level sandbox requires Windows"` and exit 0).
|
||||
- **NFR4. Discoverability:** The audit script's `--help` explains what it checks, how to fix violations, and how to run in `--strict` mode. The `RuntimeError` raised by Layer 1 includes a "How to fix" line pointing at `conductor/code_styleguides/test_sandbox.md`.
|
||||
|
||||
## Architecture Reference
|
||||
|
||||
- **`conductor/code_styleguides/workspace_paths.md`** — existing rule: test workspaces live under `tests/artifacts/`. This track extends it to ALL test infrastructure (including pytest's `tmp_path`).
|
||||
- **`conductor/code_styleguides/feature_flags.md`** — Layer 1 + Layer 2 + Layer 4 are file-presence-on = enabled (matches the project's "delete to turn off" convention for cross-cutting concerns). Layer 3 is opt-in via the explicit PowerShell wrapper.
|
||||
- **`scripts/audit_no_temp_writes.py`** — pattern reference for the new `scripts/audit_test_sandbox_violations.py`.
|
||||
- **`scripts/tier2/run_tier2_sandboxed.ps1`** — pattern reference for the new `scripts/run_tests_sandboxed.ps1`.
|
||||
- **`docs/guide_testing.md`** (existing) — test infrastructure deep-dive. Add a new "Sandbox Hardening" section that summarizes Layers 1-4 and links to the styleguide.
|
||||
- **`conductor/tracks/workspace_path_finalize_20260609/`** — prior track that established the `tests/artifacts/` workspace pattern. This track extends it.
|
||||
|
||||
## Out of Scope
|
||||
|
||||
1. **Reading protection.** Tests still need to read the source tree (`src/`, `pyproject.toml`, etc.) for fixtures. Reads are intentionally NOT blocked. If a future track wants read isolation, it's a separate scope.
|
||||
2. **Network sandboxing.** Tests that hit the live Gemini/Anthropic/etc. APIs continue to do so. The user's data loss is filesystem, not network.
|
||||
3. **Migrating existing tests to the new patterns.** The audit (Layer 4) catches new violations; existing tests that already pass continue to pass. If the audit catches a currently-passing test, that's a bug to fix in the test (separate, in-session fixes).
|
||||
4. **Converting the OTHER `SLOP_*` env vars to CLI flags** (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`). Per user directive, this is the "mess" to address in follow-up tracks. This track only eliminates `SLOP_CONFIG`. The test workspace still uses the other env vars to redirect to per-run paths under `./tests/artifacts/`.
|
||||
5. **A cross-platform equivalent of `run_tests_sandboxed.ps1`.** macOS/Linux users get Layer 1 + Layer 2 + Layer 4. Adding a `run_tests_sandboxed.sh` would require `bwrap` or `unshare` setup; defer to a future track if needed.
|
||||
6. **Conftest fixture-level enforcement (e.g., `@pytest.fixture(sandbox_strict=True)` for tests that need full OS isolation).** The blanket autouse fixture is the v1. Per-fixture tuning is a v2 feature.
|
||||
|
||||
## Verification Criteria
|
||||
|
||||
For the track to be marked complete, ALL of the following must be true:
|
||||
|
||||
- [ ] **VC1.** `tests/test_test_sandbox.py` exists and all 13 tests pass.
|
||||
- [ ] **VC2.** `scripts/audit_test_sandbox_violations.py` runs in both modes:
|
||||
- Default (informational): exit 0, lists violations (or says "clean").
|
||||
- `--strict`: exit 1 on violation, exit 0 on clean.
|
||||
- [ ] **VC3.** `pyproject.toml` contains `addopts = "--basetemp=tests/artifacts/_pytest_tmp"` under `[tool.pytest.ini_options]`.
|
||||
- [ ] **VC4.** `tests/conftest.py:isolate_workspace` no longer calls `tmp_path_factory.mktemp` (per `workspace_paths.md`). All env-var redirects point to paths inside `./tests/artifacts/`.
|
||||
- [ ] **VC5.** `scripts/run_tests_sandboxed.ps1` exists, parses cleanly, and on Windows can be invoked (`-WhatIf` mode for dry-run).
|
||||
- [ ] **VC6.** `conductor/tech-stack.md` has a dated note explaining the `--basetemp` choice.
|
||||
- [ ] **VC7.** `conductor/code_styleguides/workspace_paths.md` (or new `test_sandbox.md`) documents the 3-layer model.
|
||||
- [ ] **VC8.** Full test suite: `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11` runs to completion; no regression in pass rate vs. the pre-track baseline (1288 passed + 4 xdist-skipped per `result_migration_small_files_20260617`).
|
||||
- [ ] **VC9.** No new `@pytest.mark.skip` markers added (per `conductor/workflow.md` Skip-Marker Policy + user directive).
|
||||
- [ ] **VC10.** End-of-track report at `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md` (per Tier 2 conventions).
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
| Risk | Likelihood | Mitigation |
|
||||
|---|---|---|
|
||||
| Layer 1 audit hook breaks a test that legitimately writes outside `./tests/` (e.g., a test that writes to a tempfile.mkdtemp default location) | medium | FR1 allowlist includes pytest's `--basetemp`; if a new path is needed, add it. The hook's `RuntimeError` includes the test name so the offending test is obvious. |
|
||||
| Layer 1 audit hook slows down the test suite | low | `sys.addaudithook` is a thin C-level callback; overhead measured in <2% per Python docs. |
|
||||
| Layer 4 audit flags a currently-passing test as a false positive | medium | The audit is INFORMATIONAL by default; `--strict` is opt-in for CI. If a real test is flagged, fix the test (don't suppress the audit). |
|
||||
| Layer 3 PowerShell wrapper breaks on a Windows version without the required privileges | low | Wrapper is opt-in; default invocation stays `uv run pytest`. Wrapper docs explain the privilege requirements. |
|
||||
| Existing tests that don't go through `isolate_workspace` still read real config files | high (known gap) | Reads are out of scope per the Out of Scope section. Layer 1 still blocks writes, which is the user's primary concern. |
|
||||
| `pytest_configure` setting `_tmp_path_factory._basetemp` uses a private API that changes between versions | medium | The `--basetemp` addopts is the primary mechanism. The `_basetemp` assignment is defensive only; if it breaks, addopts still works. |
|
||||
|
||||
## Execution Plan (high-level — see `plan.md` for worker-ready tasks)
|
||||
|
||||
- [ ] **Phase 1: Investigation + baseline** — capture current pass count, confirm `isolate_workspace` + audit script work as documented.
|
||||
- [ ] **Phase 2: Layer 4 (static audit) + tests** — write `audit_test_sandbox_violations.py`, write `test_test_sandbox.py` audit-tests (parts 5-8), commit.
|
||||
- [ ] **Phase 3: Layer 1 (Python guard) + tests** — implement `_enforce_test_sandbox` fixture, write guard-specific regression tests (parts 1-4), commit.
|
||||
- [ ] **Phase 4: Layer 2 (`isolate_workspace` migration + FR3 basetemp)** — refactor `isolate_workspace`, add `addopts` to `pyproject.toml`, update `tech-stack.md`, commit.
|
||||
- [ ] **Phase 5: Layer 3 (PowerShell wrapper)** — write `scripts/run_tests_sandboxed.ps1`, write a smoke test, commit.
|
||||
- [ ] **Phase 6: Documentation** — update `workspace_paths.md` (or write `test_sandbox.md`), update `docs/guide_testing.md`, commit.
|
||||
- [ ] **Phase 7: Full suite verification** — run all 11 tiers, verify no regression, commit.
|
||||
- [ ] **Phase 8: End-of-track report** — write `docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md`, commit.
|
||||
|
||||
## See Also
|
||||
|
||||
- `conductor/tracks.md:395` — prior "Test Consolidation & TOML Sandboxing" track (added `check_test_toml_paths.py`).
|
||||
- `conductor/archive/workspace_path_finalize_20260609/` — prior track that established the `tests/artifacts/` workspace pattern.
|
||||
- `conductor/tracks/tier2_autonomous_sandbox_20260616/` — meta-tooling track that this sandbox mirrors.
|
||||
- `conductor/code_styleguides/workspace_paths.md` — existing test-workspace rule.
|
||||
- `conductor/code_styleguides/feature_flags.md` — file-presence = enabled convention.
|
||||
- `conductor/workflow.md` Skip-Marker Policy + Skip-Marker review checklist.
|
||||
- `docs/guide_testing.md` — existing test infrastructure deep-dive (add a "Sandbox Hardening" section).
|
||||
- `scripts/audit_no_temp_writes.py` — pattern reference for the new audit script.
|
||||
- `scripts/tier2/run_tier2_sandboxed.ps1` — pattern reference for the new PowerShell wrapper.
|
||||
@@ -0,0 +1,85 @@
|
||||
# Track state for test_sandbox_hardening_20260619
|
||||
# Updated by Tier 2 Tech Lead as tasks complete
|
||||
|
||||
[meta]
|
||||
track_id = "test_sandbox_hardening_20260619"
|
||||
name = "Test Sandbox Hardening"
|
||||
status = "active"
|
||||
current_phase = 0
|
||||
last_updated = "2026-06-19"
|
||||
|
||||
[blocked_by]
|
||||
# Independent track. No blockers.
|
||||
|
||||
[blocks]
|
||||
# No followup tracks blocked on this one (deferred items listed in metadata.json).
|
||||
|
||||
[phases]
|
||||
phase_1 = { status = "pending", checkpointsha = "", name = "Investigation + baseline" }
|
||||
phase_2 = { status = "pending", checkpointsha = "", name = "FR4 static audit + tests" }
|
||||
phase_3 = { status = "pending", checkpointsha = "", name = "FR1 Python guard + tests" }
|
||||
phase_4 = { status = "pending", checkpointsha = "", name = "FR2 root-cause fix (--config replaces SLOP_CONFIG)" }
|
||||
phase_5 = { status = "pending", checkpointsha = "", name = "FR3 isolate_workspace + basetemp migration" }
|
||||
phase_6 = { status = "pending", checkpointsha = "", name = "FR5 PowerShell wrapper" }
|
||||
phase_7 = { status = "pending", checkpointsha = "", name = "FR7 documentation" }
|
||||
phase_8 = { status = "pending", checkpointsha = "", name = "Full suite verification" }
|
||||
phase_9 = { status = "pending", checkpointsha = "", name = "End-of-track report" }
|
||||
|
||||
[tasks]
|
||||
t1_1 = { status = "pending", commit_sha = "", description = "Capture baseline pass count via `uv run python scripts/run_tests_batched.py --tiers 1..11`. Record pass count + skip count + duration." }
|
||||
t1_2 = { status = "pending", commit_sha = "", description = "Confirm scripts/check_test_toml_paths.py runs cleanly (exit 0 in default mode)." }
|
||||
t1_3 = { status = "pending", commit_sha = "", description = "Audit src/ for all callers of get_config_path() to confirm FR2 will be transparent. List each caller." }
|
||||
|
||||
t2_1 = { status = "pending", commit_sha = "", description = "Write scripts/audit_test_sandbox_violations.py mirroring check_test_toml_paths.py structure. Patterns: manual_slop_history.toml, project.toml, manualslop_layout.ini, tempfile.{mkdtemp,mkstemp} without dir=, Path(__file__).parent.parent / 'config.toml'." }
|
||||
t2_2 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 5,6,10: audit flagging bad pattern, audit passes clean, audit flags tempfile.mkdtemp without tests dir." }
|
||||
t2_3 = { status = "pending", commit_sha = "", description = "Verify audit script with sample bad test files. Commit Phase 2." }
|
||||
|
||||
t3_1 = { status = "pending", commit_sha = "", description = "Implement _enforce_test_sandbox autouse fixture in tests/conftest.py: pytest_configure installs sys.addaudithook for open writes + os.makedirs + shutil.rmtree + tempfile.{mkdtemp,mkstemp}; pytest_unconfigure removes it." }
|
||||
t3_2 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 1-4: guard blocks writes outside ./tests, allows writes inside ./tests, allows writes inside ./tests/artifacts, doesn't block reads." }
|
||||
t3_3 = { status = "pending", commit_sha = "", description = "Manually verify guard fires on a sample bad write. Commit Phase 3." }
|
||||
|
||||
t4_1 = { status = "pending", commit_sha = "", description = "Refactor src/paths.py: remove os.environ.get('SLOP_CONFIG', ...) fallback from get_config_path(). Add module-level _CONFIG_OVERRIDE + set_config_override() function." }
|
||||
t4_2 = { status = "pending", commit_sha = "", description = "Remove diagnostic stderr.write line at src/models.py:193 ('[DEBUG] Saving config...')." }
|
||||
t4_3 = { status = "pending", commit_sha = "", description = "Add --config argparse argument to sloppy.py. Call paths.set_config_override(args.config) BEFORE any src/ import." }
|
||||
t4_4 = { status = "pending", commit_sha = "", description = "Update tests/conftest.py to parse sys.argv for --config at module body BEFORE any src/ import. Add pytest_addoption registration. Auto-default to tests/artifacts/_isolation_workspace_<RUN_ID>/config_overrides.toml." }
|
||||
t4_5 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 11,12,13: --config CLI flag works, no env var fallback, sloppy.py parses --config." }
|
||||
t4_6 = { status = "pending", commit_sha = "", description = "Commit Phase 4 (FR2 root-cause fix). This is the most important commit in the track." }
|
||||
|
||||
t5_1 = { status = "pending", commit_sha = "", description = "Refactor tests/conftest.py isolate_workspace: replace tmp_path_factory.mktemp with Path('tests/artifacts/_isolation_workspace') / _RUN_ID. Add SLOP_CREDENTIALS + SLOP_MCP_ENV env vars. Auto-generate placeholder TOML files (credentials.toml, mcp_env.toml, presets.toml, tool_presets.toml, personas.toml) in the isolation workspace." }
|
||||
t5_2 = { status = "pending", commit_sha = "", description = "Add `addopts = \"--basetemp=tests/artifacts/_pytest_tmp\"` to pyproject.toml [tool.pytest.ini_options]." }
|
||||
t5_3 = { status = "pending", commit_sha = "", description = "Add defensive pytest_configure check in conftest.py: if config._tmp_path_factory._basetemp resolves outside ./tests/, override." }
|
||||
t5_4 = { status = "pending", commit_sha = "", description = "Add dated note to conductor/tech-stack.md explaining --basetemp choice." }
|
||||
t5_5 = { status = "pending", commit_sha = "", description = "Write tests/test_test_sandbox.py tests 7,8,9: pyproject.toml has --basetemp, isolate_workspace no tmp_path_factory.mktemp, AppController.__init__ doesn't call init_state()." }
|
||||
t5_6 = { status = "pending", commit_sha = "", description = "Commit Phase 5 (FR3 isolation migration)." }
|
||||
|
||||
t6_1 = { status = "pending", commit_sha = "", description = "Write scripts/run_tests_sandboxed.ps1 based on run_tier2_sandboxed.ps1 structure: restricted token + Job Object + pytest invocation with --config + --basetemp." }
|
||||
t6_2 = { status = "pending", commit_sha = "", description = "Write smoke test: `pwsh -File scripts/run_tests_sandboxed.ps1 -WhatIf` exits 0." }
|
||||
t6_3 = { status = "pending", commit_sha = "", description = "Commit Phase 6 (FR5 PowerShell wrapper)." }
|
||||
|
||||
t7_1 = { status = "pending", commit_sha = "", description = "Create conductor/code_styleguides/test_sandbox.md documenting --config CLI flag, config_overrides.toml convention, 4-layer enforcement model." }
|
||||
t7_2 = { status = "pending", commit_sha = "", description = "Update conductor/code_styleguides/workspace_paths.md to mention the new SLOP_CONFIG → --config migration." }
|
||||
t7_3 = { status = "pending", commit_sha = "", description = "Add 'Sandbox Hardening' section to docs/guide_testing.md." }
|
||||
t7_4 = { status = "pending", commit_sha = "", description = "Commit Phase 7 (FR7 documentation)." }
|
||||
|
||||
t8_1 = { status = "pending", commit_sha = "", description = "Run `uv run python scripts/run_tests_batched.py --tiers 1,2,3,4,5,6,7,8,9,10,11`. Capture pass count + duration. Verify no regression vs. baseline (1288 passed + 4 xdist-skipped)." }
|
||||
t8_2 = { status = "pending", commit_sha = "", description = "If regression: report to user with diff and propose fix. If no regression: commit verification report." }
|
||||
|
||||
t9_1 = { status = "pending", commit_sha = "", description = "Write docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md following precedent set by TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md." }
|
||||
t9_2 = { status = "pending", commit_sha = "", description = "Update state.toml: status = 'completed', current_phase = 'complete'. Commit." }
|
||||
|
||||
[verification]
|
||||
phase_1_baseline_captured = false
|
||||
phase_4_root_cause_fix = false
|
||||
phase_5_layer_2_works = false
|
||||
phase_8_full_suite_no_regression = false
|
||||
phase_9_report_written = false
|
||||
|
||||
[user_directives_logged]
|
||||
hard_sandbox_required = "User wants hard sandbox similar to Tier 2; pytest/run_tests_batched.ps1 entirely banned from accessing files outside ./tests/"
|
||||
no_new_skip_markers = "Per conductor/workflow.md Skip-Marker Policy + user directive"
|
||||
sample_data_loss = "User has lost important sample data multiple times over the past month - primary motivation for this track"
|
||||
design_chosen = "C (Both): Python guard (default) + Windows restricted-token wrapper (opt-in). User confirmed on 2026-06-19."
|
||||
no_env_vars = "Per user 2026-06-19: NO ENV VARS for config path. --config CLI flag is the only override mechanism. Other SLOP_* env vars stay for now (user will fix in follow-up tracks)."
|
||||
config_overrides_naming = "Per user 2026-06-19: test workspace file is named 'config_overrides.toml' (the convention for test sandbox configs)."
|
||||
hard_fail = "Per user 2026-06-19: HARD FAIL on any sandbox violation. No warnings, no soft fails."
|
||||
no_appdata_temp = "Per user 2026-06-19: tests should never need AppData temp. tempfile.mkdtemp/mkstemp without dir= is a flag."
|
||||
Reference in New Issue
Block a user