# Tier 2 Autonomous Sandbox

## Why this exists

When you run Tier 2 in the main repo, every `edit` and every `bash` call prompts you for approval (`permission: ask`). For well-regularized tracks (TDD red/green with atomic per-task commits), this is noise. This track adds an **autonomous mode** in a sibling clone where Tier 2 runs unattended, with a 3-layer enforcement stack to keep it contained.

## One-time bootstrap

```powershell
cd C:\projects\manual_slop
pwsh -File scripts\tier2\setup_tier2_clone.ps1 -WhatIf   # dry run first
pwsh -File scripts\tier2\setup_tier2_clone.ps1           # actual bootstrap
```

The bootstrap:

1. Clones the main repo to `C:\projects\manual_slop_tier2\`
2. Sets `origin = C:\projects\manual_slop` (local path; no remote)
3. Copies the agent, slash command, and opencode.json templates to the clone
4. Installs the git hooks (`pre-push` refuses all pushes; `post-checkout` logs checkouts)
5. Creates a "Tier 2 (Sandboxed)" desktop shortcut

**As of 2026-06-18:** the bootstrap no longer creates any directory on AppData. Tier 2 state and failure reports live at `tests/artifacts/tier2_state//state.json` and `tests/artifacts/tier2_failures/_.md` (project-relative; inside the project tree under the already-gitignored `tests/artifacts/`). The user directive is "NEVER USE APPDATA", enforced by the OpenCode `*AppData\\*` bash deny rule.

## Per-track invocation

1. Double-click the "Tier 2 (Sandboxed)" desktop shortcut (or run `pwsh -File C:\projects\manual_slop\scripts\tier2\run_tier2_sandboxed.ps1` manually)
2. In the OpenCode session, type:

   ```
   /tier-2-auto-execute
   ```

   Examples:
   - `/tier-2-auto-execute result_migration_review_pass`
   - `/tier-2-auto-execute data_structure_strengthening_20260606 --resume`
   - `/tier-2-auto-execute rag_test_failures_20260615 --toast`
3. Tier 2 runs the track autonomously, commits per task, monitors failcount
4. On success: prints a summary
5. On give-up: writes a failure report and prints the path

## Review and merge

After Tier 2 finishes (success or give-up):

1. `cd C:\projects\manual_slop` (back to the main repo)
2. `git fetch C:/projects/manual_slop_tier2 tier2/`
3. Review the diff with Tier 1 (interactive)
4. On approval: `git merge --no-ff tier2/` in the main repo

## The 4 hard bans (enforced at 3 layers)

| Ban | Layer 1 (OpenCode) | Layer 2 (OS) | Layer 3 (git hook) |
|---|---|---|---|
| `git push*` (any push) | `permission.bash` deny rule | n/a | `pre-push` hook refuses all pushes |
| `git checkout*` (any form) | `permission.bash` deny rule | n/a | `post-checkout` hook logs the checkout |
| `git restore*` (any form) | `permission.bash` deny rule | n/a | n/a |
| `git reset*` (any form) | `permission.bash` deny rule | n/a | n/a |
| File access outside Tier 2 clone (AppData, Temp, Documents, etc. all denied) | `permission.read`/`write` path allowlist + `*AppData\\*` bash deny | Windows ACL | n/a |

## The failcount threshold

Tier 2 gives up if ANY of these hit:

- 3 consecutive red-phase failures (the test doesn't fail when it should)
- 3 consecutive green-phase failures (the implementation doesn't make the test pass)
- 30 minutes with no progress (no commit, no green test)

Override via `scripts/tier2/failcount.toml`.

## The failure report

Written to `tests/artifacts/tier2_failures/_.md` (project-relative; inside `tests/artifacts/` which is gitignored) with 7 sections:

1. Header (track, branch, started, stopped, duration, give-up signal)
2. Tasks completed
3. Current task (where it stopped)
4. Last 3 failures
5. Failcount state
6. Git state (`git log tier2/ ^origin/master`)
7. Recommendation (heuristic-based)

A `.STOPPED` flag file is created alongside the report. The main repo can check for it on next Tier 1 session start (an opt-in banner).

## Conventions (added 2026-06-17)

These are enforced by the Tier 2 agent prompt. The agent MUST follow them; they're not optional.

- **Test runner:** Tier 2 always uses `uv run python scripts/run_tests_batched.py`. Never `uv run pytest` directly. The batched runner provides tier-based filtering, parallelization (xdist), and a summary table that direct pytest doesn't.
- **Default branch:** this repo uses `master` (not `main`). When fetching or branching, use `origin/master`. Tier 2 may otherwise get confused by the missing `main` reference.
- **Line endings:** Tier 2 preserves existing line endings on edit. This repo has a mix of CRLF and LF; standardizing to repo-wide LF is a future track. For now, do not normalize.
- **Throw-away scripts:** Tier 2 writes its working scripts to `scripts/tier2/artifacts//`, NOT the base `scripts/tier2/` directory. The base directory is reserved for production code. Throw-away scripts are kept for archival but isolated in a track-specific subdir.
- **End-of-track report:** at the end of every track, Tier 2 writes `docs/reports/TRACK_COMPLETION_.md` (follow the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`) and updates `conductor/tracks//state.toml` to `status = "completed"`. The user reads this report to decide whether to merge.
- **Run-time expectation:** tracks are expected to take 1-4 hours. If the model reports it is running out of context, Tier 2 notes progress to disk and continues. The user expects autonomous runs to complete without manual "press continue" intervention.

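The line-endings convention above can be made concrete with a short sketch. This is not the Tier 2 agent's actual edit path; it is a minimal illustration that assumes UTF-8 files and assumes each individual file uses one dominant line ending. The helper names `detect_newline` and `edit_preserving_newlines` are hypothetical.

```python
from pathlib import Path


def detect_newline(raw: bytes) -> str:
    """Return the dominant line ending in raw bytes (CRLF or LF)."""
    crlf = raw.count(b"\r\n")
    bare_lf = raw.count(b"\n") - crlf  # LF bytes not preceded by CR
    return "\r\n" if crlf > bare_lf else "\n"


def edit_preserving_newlines(path: Path, old: str, new: str) -> None:
    """Replace `old` with `new` in a file, writing back its original line ending.

    Hypothetical helper: assumes UTF-8 content. A file that mixes CRLF and LF
    internally would be rewritten with whichever ending is dominant.
    """
    raw = path.read_bytes()
    newline = detect_newline(raw)
    # Work on LF-only text so the replacement logic never sees CR characters.
    text = raw.decode("utf-8").replace("\r\n", "\n")
    text = text.replace(old, new)
    # Restore the ending the file had before the edit.
    path.write_bytes(text.replace("\n", newline).encode("utf-8"))
```

Reading and writing bytes rather than text avoids Python's universal-newline translation, which would otherwise convert line endings silently on Windows.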
## Verify the sandbox (manual checklist)

After bootstrap, run these inside the Tier 2 sandboxed OpenCode session to verify the bans are enforced:

- [ ] Try `git restore tests/test_failcount.py`: should print "denied"
- [ ] Try `git push origin master`: should print "denied" (or the pre-push hook fires)
- [ ] Try `git checkout -- src/foo.py`: should print "denied"
- [ ] Try `git reset --hard HEAD~1`: should print "denied"
- [ ] Try to read `C:\Users\Ed\Documents\test.txt` (from a Python subprocess): should print "ACCESS_DENIED"

And verify allowed operations work:

- [ ] `git status`: works
- [ ] `git switch -c test-branch`: works
- [ ] Edit a file in the Tier 2 clone: works
- [ ] `git add && git commit -m "test"`: works

## Troubleshooting

- **"Tier 2 (Sandboxed) shortcut doesn't work"**: check that `pwsh.exe` is on the PATH (`where.exe pwsh`).
- **"Permission denied" on file access inside the sandbox**: the Windows ACL may be too restrictive. Re-run the bootstrap (`setup_tier2_clone.ps1` is idempotent).
- **"Failcount state not found"**: the `tests/artifacts/tier2_state//` dir may be missing. The failcount module creates it on first save; check that the Tier 2 clone's project root is correct.
- **"Pre-push hook not firing"**: check that `.git/hooks/pre-push` is executable. On Windows, Git Bash runs the hook; check `git config core.hooksPath` if you have a custom hooks dir.
- **"Tier 2 keeps giving up at 30 min"**: increase `no_progress_minutes` in `scripts/tier2/failcount.toml`.
- **"Tier 2 ran out of context"**: the model stopped mid-track. The user (interactive Tier 1) should `cd` to the Tier 2 clone, inspect `tests/artifacts/tier2_state//state.json` for the last completed task, and re-invoke with `/tier-2-auto-execute --resume` to continue. The state file persists across runs.
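
When tuning `failcount.toml`, it can help to see the three give-up conditions from the failcount threshold section as code. The sketch below is an illustration only, not the real failcount module: `no_progress_minutes` is the one key this document names, while `max_red_failures`, `max_green_failures`, the `FailState` fields, and the returned signal names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # Defaults mirror the documented limits. Only `no_progress_minutes`
    # is a key named in this document; the other two names are assumptions.
    max_red_failures: int = 3
    max_green_failures: int = 3
    no_progress_minutes: int = 30


@dataclass
class FailState:
    # Hypothetical counters; the real state.json layout is not shown here.
    consecutive_red: int = 0
    consecutive_green: int = 0
    minutes_since_progress: float = 0.0


def give_up_signal(state: FailState, limits: Thresholds) -> Optional[str]:
    """Return the first tripped give-up signal, or None to keep going."""
    if state.consecutive_red >= limits.max_red_failures:
        return "red_phase"
    if state.consecutive_green >= limits.max_green_failures:
        return "green_phase"
    if state.minutes_since_progress >= limits.no_progress_minutes:
        return "no_progress"
    return None


# Raising the no-progress limit keeps a slow track alive:
slow = FailState(minutes_since_progress=45)
print(give_up_signal(slow, Thresholds()))                        # no_progress
print(give_up_signal(slow, Thresholds(no_progress_minutes=60)))  # None
```

Any single condition is enough to stop the run, which matches the "ANY of these" wording; the counters are assumed to reset whenever a phase succeeds or a commit lands.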