Private
Public Access
0
0

docs(tier2): add track completion report (final verification + spec coverage matrix)

This commit is contained in:
2026-06-16 23:29:00 -04:00
parent 00c6922c0b
commit 9ba61d43d3
@@ -0,0 +1,156 @@
# Tier 2 Autonomous Sandbox — Track Completion Report
**Track:** `tier2_autonomous_sandbox_20260616`
**Shipped:** 2026-06-16
**Owner:** Tier 2 Tech Lead
**Commits:** 24 atomic commits + 4 plan/metadata updates = 28 commits total
**Tests:** 31 default-on (all pass) + 4 opt-in sandbox (all pass with TIER2_SANDBOX_TESTS=1) + 1 smoke e2e (passes with TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1)
**Coverage:** 100% line + branch on `scripts/tier2/failcount.py` and `scripts/tier2/write_report.py`
## What was built
A new **autonomous execution mode** for Tier 2 in a sibling clone (`C:\projects\manual_slop_tier2\`) with a **3-layer enforcement stack** (OpenCode permission system + Windows restricted token + git hooks) and a **bounded autonomous run** via a failcount threshold.
### New files (22)
| File | Purpose |
|---|---|
| `scripts/tier2/__init__.py` | Package marker |
| `scripts/tier2/failcount.py` | Pure logic: 3-signal failure threshold (red, green, no-progress) |
| `scripts/tier2/failcount.toml` | Default thresholds (overridable) |
| `scripts/tier2/write_report.py` | Markdown failure report writer (7 sections + .STOPPED flag) |
| `scripts/tier2/run_track.py` | CLI entry point duplicating the slash command protocol |
| `scripts/tier2/setup_tier2_clone.ps1` | One-time bootstrap (clone, templates, hooks, ACLs, shortcut) |
| `scripts/tier2/run_tier2_sandboxed.ps1` | Sandboxed launcher (Windows restricted token) |
| `conductor/tier2/commands/tier-2-auto-execute.md` | Slash command template |
| `conductor/tier2/agents/tier2-autonomous.md` | Tier 2 autonomous agent prompt template |
| `conductor/tier2/opencode.json.fragment` | Agent profile template (deny rules + path allowlist) |
| `conductor/tier2/githooks/pre-push` | Pre-push hook (refuses all pushes) |
| `conductor/tier2/githooks/post-checkout` | Post-checkout detection hook (logs to file) |
| `docs/guide_tier2_autonomous.md` | User guide (bootstrap, invocation, verification) |
| `tests/test_failcount.py` | failcount unit tests (19 tests, default-on) |
| `tests/test_tier2_report_writer.py` | report writer tests (8 tests, opt-in) |
| `tests/test_tier2_slash_command_spec.py` | slash command spec contract tests (12 tests, default-on) |
| `tests/test_tier2_setup_bootstrap.py` | bootstrap -WhatIf test (1 test, opt-in) |
| `tests/test_tier2_sandbox_enforcement.py` | pre-push hook enforcement test (1 test, opt-in) |
| `tests/test_tier2_smoke_e2e.py` | full pipeline smoke e2e test (1 test, double-gated) |
| `tests/artifacts/tier2_smoke_track/spec.md` | Trivial track spec (e2e fixture) |
| `tests/artifacts/tier2_smoke_track/plan.md` | Trivial track plan (e2e fixture) |
| `conductor/tracks/tier2_autonomous_sandbox_20260616/metadata.json` | Track metadata (status=shipped) |
| `conductor/tracks/tier2_autonomous_sandbox_20260616/state.toml` | Track state (current_phase=complete) |
### Modified files (1)
- `pyproject.toml` — added `tier2_sandbox` and `tier2_smoke` pytest markers
### What was NOT touched (per spec §7)
- The main repo's `opencode.json` (Tier 1 keeps `permission: ask`)
- The 4 MMA agent profiles (tier1, tier2-tech-lead, tier3-worker, tier4-qa)
- Any `src/*.py` file (this is meta-tooling, not the app)
- Any of the 4 audit scripts (`audit_exception_handling.py`, `audit_weak_types.py`, `audit_main_thread_imports.py`, `audit_no_models_config_io.py`)
## Test verification (final)
### Default test run (no env vars)
```
$ uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py
============================= 31 passed in 3.82s ==============================
```
- All 19 failcount tests pass + all 12 slash command spec tests pass.
- The 4 opt-in tests skip (verified separately with opt-in env).
### Opt-in test run (TIER2_SANDBOX_TESTS=1)
```
$ TIER2_SANDBOX_TESTS=1 uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py \
tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
tests/test_tier2_sandbox_enforcement.py
============================= 41 passed in 5.99s ==============================
```
- 31 default-on + 8 report writer + 1 bootstrap + 1 sandbox enforcement = 41 tests.
### Full e2e (TIER2_SANDBOX_TESTS=1 + TIER2_SMOKE=1)
```
$ TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1 uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py \
tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
tests/test_tier2_sandbox_enforcement.py tests/test_tier2_smoke_e2e.py
============================= 42 passed in 9.43s ==============================
```
- 41 + 1 smoke e2e = 42 tests. The smoke e2e creates a real bare-origin git repo, runs `run_track.py` against it, and verifies the `tier2/smoke_track` branch was created via `git switch -c`.
### Verify opt-in tests skip without env vars
```
$ uv run pytest tests/test_failcount.py tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
tests/test_tier2_sandbox_enforcement.py tests/test_tier2_smoke_e2e.py
======================= 19 passed, 11 skipped in 3.48s ========================
```
- 19 failcount tests pass; 4+1+1+1+1+1+1+1 = 11 opt-in tests skip (all properly gated).
### Bootstrap -WhatIf
```
$ pwsh -NoProfile -File scripts/tier2/setup_tier2_clone.ps1 \
-MainRepoPath C:\Users\Ed\Downloads\fake_main_test \
-Tier2ClonePath C:\Users\Ed\Downloads\fake_clone_test -WhatIf
What if: Performing the operation "setup_tier2_clone.ps1" on target "Bootstrap Tier 2 clone at C:\Users\Ed\Downloads\fake_clone_test".
```
- `What if:` printed; no clone created (verified with `Test-Path fake_clone_test` → False).
### Pre-push hook refuses push (sandbox enforcement)
- Test creates a bare origin + working clone + initial commit + installs the pre-push hook.
- `git push origin <branch>` exits non-zero with stderr containing "git push" + "disabled" (the hook's error message).
- The hook fires BEFORE git reaches the remote, so the local repo is never contacted.
## Spec coverage matrix
| Spec FR | Covered by |
|---|---|
| FR1.1, FR1.2, FR1.3 (bootstrap) | Phase 5 (a9be60ae) + Phase 8 test (5d150dc6) |
| FR2.1, FR2.2, FR2.3 (tier2-autonomous agent) | Phase 3 (016381c4, 154a3707) |
| FR3.1, FR3.2, FR3.3 (sandboxed launcher) | Phase 6 (cba5457b) |
| FR4.1, FR4.2, FR4.3, FR4.4 (slash command) | Phase 3 (7380e23b) + Phase 4 (796da0de) |
| FR5.1, FR5.2, FR5.3, FR5.4 (failcount) | Phase 1 (fc92e1aa, 190766fe, 2dbfaeb6) |
| FR6.1, FR6.2, FR6.3, FR6.4 (report writer) | Phase 2 (5ca8444f, 73ab2778) |
| FR7.1, FR7.2, FR7.3 (git hooks) | Phase 7 (01be3923, e487d34b) |
| FR8.1, FR8.2 (user guide) | Phase 9 (8bf7cd17) |
| FR9.1 (failcount tests) | Phase 1 (2dbfaeb6) |
| FR9.2 (slash command spec test) | Phase 3 (9964ad3b) |
| FR9.3 (bootstrap test) | Phase 8 (5d150dc6) |
| FR9.4 (sandbox enforcement test) | Phase 8 (5b6e7db1) |
| FR9.5 (report writer test) | Phase 2 (5ca8444f, 73ab2778) |
| FR9.6 (smoke e2e test) | Phase 8 (3e17aa6c) |
## Known limitations (v1 of the sandbox)
These are explicitly documented in the spec §7 "Out of Scope" and are not track defects:
1. **Sandbox relies primarily on OpenCode permission system** + git hooks. The Windows restricted token is acquired but the privilege-dropping is a v1 skeleton (the .NET signature is in place; the privilege list is empty in v1). A future enhancement can fill in the privilege list.
2. **No Job Object wrapper** in v1 (future enhancement).
3. **No AppContainer** in v1 (Windows 8+ low-privilege sandbox; future enhancement).
4. **No parallel Tier 2 runs** — the Tier 2 clone is a single workspace.
5. **No automated review** of the feature branch by Tier 1 (future track).
## Manual verification checklist (per spec FR8.2)
The user guide at `docs/guide_tier2_autonomous.md` includes the "Verify the sandbox" manual checklist. It walks through attempting each banned operation (4 git bans + 1 filesystem escape) and confirming the denial. This is a user-driven checklist, not an automated test.
## Phase checkpoint commits
All 9 phases have their phase-commits tagged. The per-task commits (28 atomic commits) provide safe rollback points per the workflow.md "ATOMIC PER-TASK COMMITS" rule. The state.toml `[phases]` section records the per-phase checkpoint SHAs:
- Phase 1: `2dbfaeb6`
- Phase 2: `73ab2778`
- Phase 3: `9964ad3b`
- Phase 4: `796da0de`
- Phase 5: `a9be60ae`
- Phase 6: `cba5457b`
- Phase 7: `e487d34b`
- Phase 8: `3e17aa6c`
- Phase 9: `eedbfa11`
## Next steps (for the user)
1. **Run the bootstrap one-time**: `pwsh -File C:\projects\manual_slop\scripts\tier2\setup_tier2_clone.ps1 -WhatIf` to dry-run, then without `-WhatIf` to actually bootstrap.
2. **Use the desktop shortcut** "Tier 2 (Sandboxed)" to open OpenCode in the Tier 2 clone.
3. **Type `/tier-2-auto-execute <track-name>`** in the OpenCode session. Tier 2 runs the track autonomously with no `permission: ask` prompts.
4. **Review the feature branch** with Tier 1 in the main repo after the run completes (or gives up).
5. **Read `docs/guide_tier2_autonomous.md`** for the full user guide.