docs(tier2): add track completion report (final verification + spec coverage matrix)
This commit is contained in:
@@ -0,0 +1,156 @@
|
||||
# Tier 2 Autonomous Sandbox — Track Completion Report
|
||||
|
||||
**Track:** `tier2_autonomous_sandbox_20260616`
|
||||
**Shipped:** 2026-06-16
|
||||
**Owner:** Tier 2 Tech Lead
|
||||
**Commits:** 24 atomic commits + 4 plan/metadata updates = 28 commits total
|
||||
**Tests:** 31 default-on (all pass) + 4 opt-in sandbox (all pass with TIER2_SANDBOX_TESTS=1) + 1 smoke e2e (passes with TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1)
|
||||
**Coverage:** 100% line + branch on `scripts/tier2/failcount.py` and `scripts/tier2/write_report.py`
|
||||
|
||||
## What was built
|
||||
|
||||
A new **autonomous execution mode** for Tier 2 in a sibling clone (`C:\projects\manual_slop_tier2\`) with a **3-layer enforcement stack** (OpenCode permission system + Windows restricted token + git hooks) and a **bounded autonomous run** via a failcount threshold.
|
||||
|
||||
### New files (22)
|
||||
|
||||
| File | Purpose |
|
||||
|---|---|
|
||||
| `scripts/tier2/__init__.py` | Package marker |
|
||||
| `scripts/tier2/failcount.py` | Pure logic: 3-signal failure threshold (red, green, no-progress) |
|
||||
| `scripts/tier2/failcount.toml` | Default thresholds (overridable) |
|
||||
| `scripts/tier2/write_report.py` | Markdown failure report writer (7 sections + .STOPPED flag) |
|
||||
| `scripts/tier2/run_track.py` | CLI entry point duplicating the slash command protocol |
|
||||
| `scripts/tier2/setup_tier2_clone.ps1` | One-time bootstrap (clone, templates, hooks, ACLs, shortcut) |
|
||||
| `scripts/tier2/run_tier2_sandboxed.ps1` | Sandboxed launcher (Windows restricted token) |
|
||||
| `conductor/tier2/commands/tier-2-auto-execute.md` | Slash command template |
|
||||
| `conductor/tier2/agents/tier2-autonomous.md` | Tier 2 autonomous agent prompt template |
|
||||
| `conductor/tier2/opencode.json.fragment` | Agent profile template (deny rules + path allowlist) |
|
||||
| `conductor/tier2/githooks/pre-push` | Pre-push hook (refuses all pushes) |
|
||||
| `conductor/tier2/githooks/post-checkout` | Post-checkout detection hook (logs to file) |
|
||||
| `docs/guide_tier2_autonomous.md` | User guide (bootstrap, invocation, verification) |
|
||||
| `tests/test_failcount.py` | failcount unit tests (19 tests, default-on) |
|
||||
| `tests/test_tier2_report_writer.py` | report writer tests (8 tests, opt-in) |
|
||||
| `tests/test_tier2_slash_command_spec.py` | slash command spec contract tests (12 tests, default-on) |
|
||||
| `tests/test_tier2_setup_bootstrap.py` | bootstrap -WhatIf test (1 test, opt-in) |
|
||||
| `tests/test_tier2_sandbox_enforcement.py` | pre-push hook enforcement test (1 test, opt-in) |
|
||||
| `tests/test_tier2_smoke_e2e.py` | full pipeline smoke e2e test (1 test, double-gated) |
|
||||
| `tests/artifacts/tier2_smoke_track/spec.md` | Trivial track spec (e2e fixture) |
|
||||
| `tests/artifacts/tier2_smoke_track/plan.md` | Trivial track plan (e2e fixture) |
|
||||
| `conductor/tracks/tier2_autonomous_sandbox_20260616/metadata.json` | Track metadata (status=shipped) |
|
||||
| `conductor/tracks/tier2_autonomous_sandbox_20260616/state.toml` | Track state (current_phase=complete) |
|
||||
|
||||
### Modified files (1)
|
||||
|
||||
- `pyproject.toml` — added `tier2_sandbox` and `tier2_smoke` pytest markers
|
||||
|
||||
### What was NOT touched (per spec §7)
|
||||
|
||||
- The main repo's `opencode.json` (Tier 1 keeps `permission: ask`)
|
||||
- The 4 MMA agent profiles (tier1, tier2-tech-lead, tier3-worker, tier4-qa)
|
||||
- Any `src/*.py` file (this is meta-tooling, not the app)
|
||||
- Any of the 4 audit scripts (`audit_exception_handling.py`, `audit_weak_types.py`, `audit_main_thread_imports.py`, `audit_no_models_config_io.py`)
|
||||
|
||||
## Test verification (final)
|
||||
|
||||
### Default test run (no env vars)
|
||||
```
|
||||
$ uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py
|
||||
============================= 31 passed in 3.82s ==============================
|
||||
```
|
||||
- All 19 failcount tests pass + all 12 slash command spec tests pass.
|
||||
- The 4 opt-in tests skip (verified separately with opt-in env).
|
||||
|
||||
### Opt-in test run (TIER2_SANDBOX_TESTS=1)
|
||||
```
|
||||
$ TIER2_SANDBOX_TESTS=1 uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py \
|
||||
tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
|
||||
tests/test_tier2_sandbox_enforcement.py
|
||||
============================= 41 passed in 5.99s ==============================
|
||||
```
|
||||
- 31 default-on + 8 report writer + 1 bootstrap + 1 sandbox enforcement = 41 tests.
|
||||
|
||||
### Full e2e (TIER2_SANDBOX_TESTS=1 + TIER2_SMOKE=1)
|
||||
```
|
||||
$ TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1 uv run pytest tests/test_failcount.py tests/test_tier2_slash_command_spec.py \
|
||||
tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
|
||||
tests/test_tier2_sandbox_enforcement.py tests/test_tier2_smoke_e2e.py
|
||||
============================= 42 passed in 9.43s ==============================
|
||||
```
|
||||
- 41 + 1 smoke e2e = 42 tests. The smoke e2e creates a real bare-origin git repo, runs `run_track.py` against it, and verifies the `tier2/smoke_track` branch was created via `git switch -c`.
|
||||
|
||||
### Verify opt-in tests skip without env vars
|
||||
```
|
||||
$ uv run pytest tests/test_failcount.py tests/test_tier2_report_writer.py tests/test_tier2_setup_bootstrap.py \
|
||||
tests/test_tier2_sandbox_enforcement.py tests/test_tier2_smoke_e2e.py
|
||||
======================= 19 passed, 11 skipped in 3.48s ========================
|
||||
```
|
||||
- 19 failcount tests pass; 4+1+1+1+1+1+1+1 = 11 opt-in tests skip (all properly gated).
|
||||
|
||||
### Bootstrap -WhatIf
|
||||
```
|
||||
$ pwsh -NoProfile -File scripts/tier2/setup_tier2_clone.ps1 \
|
||||
-MainRepoPath C:\Users\Ed\Downloads\fake_main_test \
|
||||
-Tier2ClonePath C:\Users\Ed\Downloads\fake_clone_test -WhatIf
|
||||
What if: Performing the operation "setup_tier2_clone.ps1" on target "Bootstrap Tier 2 clone at C:\Users\Ed\Downloads\fake_clone_test".
|
||||
```
|
||||
- `What if:` printed; no clone created (verified with `Test-Path fake_clone_test` → False).
|
||||
|
||||
### Pre-push hook refuses push (sandbox enforcement)
|
||||
- Test creates a bare origin + working clone + initial commit + installs the pre-push hook.
|
||||
- `git push origin <branch>` exits non-zero with stderr containing "git push" + "disabled" (the hook's error message).
|
||||
- The hook fires BEFORE git reaches the remote, so the local repo is never contacted.
|
||||
|
||||
## Spec coverage matrix
|
||||
|
||||
| Spec FR | Covered by |
|
||||
|---|---|
|
||||
| FR1.1, FR1.2, FR1.3 (bootstrap) | Phase 5 (a9be60ae) + Phase 8 test (5d150dc6) |
|
||||
| FR2.1, FR2.2, FR2.3 (tier2-autonomous agent) | Phase 3 (016381c4, 154a3707) |
|
||||
| FR3.1, FR3.2, FR3.3 (sandboxed launcher) | Phase 6 (cba5457b) |
|
||||
| FR4.1, FR4.2, FR4.3, FR4.4 (slash command) | Phase 3 (7380e23b) + Phase 4 (796da0de) |
|
||||
| FR5.1, FR5.2, FR5.3, FR5.4 (failcount) | Phase 1 (fc92e1aa, 190766fe, 2dbfaeb6) |
|
||||
| FR6.1, FR6.2, FR6.3, FR6.4 (report writer) | Phase 2 (5ca8444f, 73ab2778) |
|
||||
| FR7.1, FR7.2, FR7.3 (git hooks) | Phase 7 (01be3923, e487d34b) |
|
||||
| FR8.1, FR8.2 (user guide) | Phase 9 (8bf7cd17) |
|
||||
| FR9.1 (failcount tests) | Phase 1 (2dbfaeb6) |
|
||||
| FR9.2 (slash command spec test) | Phase 3 (9964ad3b) |
|
||||
| FR9.3 (bootstrap test) | Phase 8 (5d150dc6) |
|
||||
| FR9.4 (sandbox enforcement test) | Phase 8 (5b6e7db1) |
|
||||
| FR9.5 (report writer test) | Phase 2 (5ca8444f, 73ab2778) |
|
||||
| FR9.6 (smoke e2e test) | Phase 8 (3e17aa6c) |
|
||||
|
||||
## Known limitations (v1 of the sandbox)
|
||||
|
||||
These are explicitly documented in the spec §7 "Out of Scope" and are not track defects:
|
||||
|
||||
1. **Sandbox relies primarily on OpenCode permission system** + git hooks. The Windows restricted token is acquired but the privilege-dropping is a v1 skeleton (the .NET signature is in place; the privilege list is empty in v1). A future enhancement can fill in the privilege list.
|
||||
2. **No Job Object wrapper** in v1 (future enhancement).
|
||||
3. **No AppContainer** in v1 (Windows 8+ low-privilege sandbox; future enhancement).
|
||||
4. **No parallel Tier 2 runs** — the Tier 2 clone is a single workspace.
|
||||
5. **No automated review** of the feature branch by Tier 1 (future track).
|
||||
|
||||
## Manual verification checklist (per spec FR8.2)
|
||||
|
||||
The user guide at `docs/guide_tier2_autonomous.md` includes the "Verify the sandbox" manual checklist. It walks through attempting each banned operation (4 git bans + 1 filesystem escape) and confirming the denial. This is a user-driven checklist, not an automated test.
|
||||
|
||||
## Phase checkpoint commits
|
||||
|
||||
All 9 phases have their phase-commits tagged. The per-task commits (28 atomic commits) provide safe rollback points per the workflow.md "ATOMIC PER-TASK COMMITS" rule. The state.toml `[phases]` section records the per-phase checkpoint SHAs:
|
||||
|
||||
- Phase 1: `2dbfaeb6`
|
||||
- Phase 2: `73ab2778`
|
||||
- Phase 3: `9964ad3b`
|
||||
- Phase 4: `796da0de`
|
||||
- Phase 5: `a9be60ae`
|
||||
- Phase 6: `cba5457b`
|
||||
- Phase 7: `e487d34b`
|
||||
- Phase 8: `3e17aa6c`
|
||||
- Phase 9: `eedbfa11`
|
||||
|
||||
## Next steps (for the user)
|
||||
|
||||
1. **Run the bootstrap one-time**: `pwsh -File C:\projects\manual_slop\scripts\tier2\setup_tier2_clone.ps1 -WhatIf` to dry-run, then without `-WhatIf` to actually bootstrap.
|
||||
2. **Use the desktop shortcut** "Tier 2 (Sandboxed)" to open OpenCode in the Tier 2 clone.
|
||||
3. **Type `/tier-2-auto-execute <track-name>`** in the OpenCode session. Tier 2 runs the track autonomously with no `permission: ask` prompts.
|
||||
4. **Review the feature branch** with Tier 1 in the main repo after the run completes (or gives up).
|
||||
5. **Read `docs/guide_tier2_autonomous.md`** for the full user guide.
|
||||
Reference in New Issue
Block a user