conductor(spec): Tier 2 autonomous sandbox track spec

2026-06-16 18:31:48 -04:00
parent 88e44d1c0e
commit 024938bd46
1 changed files with 614 additions and 0 deletions
@@ -0,0 +1,614 @@
+# Track Specification: Tier 2 Autonomous Sandbox (unattended track execution with bounded blast radius)
+
+**Track ID:** `tier2_autonomous_sandbox_20260616`
+**Status:** Planned (spec pending user review)
+**Priority:** A (user-blocking; eliminates the manual `permission: ask` bottleneck for well-regularized tracks)
+**Owner:** Tier 2 Tech Lead (per `conductor/workflow.md`)
+**Type:** feature (meta-tooling — adds a new execution mode to the existing MMA workflow, not to the Manual Slop app itself)
+**Scope:** ~7 new files in main repo + 1 sibling clone at `C:\projects\manual_slop_tier2\` (one-time bootstrap)
+**Parent tracks:** `opencode_config_overhaul_20260310` (shipped; established the agent profile scaffolding this track extends)
+**Sibling tracks:** none (independent)
+
+> **Note on effort estimates:** per the Tier 1 rules (see
+> `conductor/workflow.md` §"Tier 1 Track Initialization Rules"), this
+> spec does NOT include day estimates. Effort is measured by **scope** (N
+> files, M sites) and **T-shirt size** (S/M/L/XL). The user / Tier 2
+> agent decides the actual pacing.
+
+---
+
+## 0. TL;DR
+
+This track adds an **unattended execution mode** for Tier 2: you open
+OpenCode in a sibling clone (`C:\projects\manual_slop_tier2\`), type
+`/tier-2-auto-execute <track-name>`, and Tier 2 runs the track
+autonomously — **no `permission: ask` prompts** — while a **3-layer
+defense-in-depth** enforcement stack prevents it from touching the
+filesystem outside its clone + an app-data temp dir, and from running
+destructive git operations (`git restore`, `git push*`, `git checkout`,
+`git reset`). If Tier 2 can't make progress (3 red-phase failures, 3
+green-phase failures, or 30 minutes with no commit/green), it stops
+early, writes a failure report, and notifies you. You review the
+feature branch with Tier 1 in the main repo, then merge.
+
+**T-shirt size: L** — 7 new files in main repo (mostly config +
+scripts + 1 small Python module), 6 new test files (§4.9), 1 PowerShell
+wrapper, 1 bootstrap script, 1 user guide. ~600 lines of new code.
+
+---
+
+## 1. Overview
+
+### 1.1 The State Before This Track (as of `88e44d1c`)
+
+The current OpenCode configuration has these properties:
+
+- **One repo, two modes via agent profile.** `opencode.json:11` sets
+  `default_agent: "tier2-tech-lead"`. Tier 1 and Tier 2 are
+  distinguished by which agent profile the user selects in the OpenCode
+  session, not by which directory they're in.
+- **Permission bottleneck on Tier 2.** `.opencode/agents/tier2-tech-lead.md:6-9`
+  sets `permission: { edit: "ask", bash: "ask", 'manual-slop_*': allow }`.
+  Every `edit` and every `bash` call from Tier 2 prompts the user for
+  approval. For well-regularized tracks (TDD red/green/refactor with
+  atomic per-task commits, e.g., the upcoming `result_migration_*`
+  tracks), this is **noise** — the user has already pre-approved the
+  track plan, and the per-task approval doesn't add safety, it just
+  adds 50+ clicks per track.
+- **No filesystem boundary enforcement.** Tier 2 has the same
+  filesystem access as the user. There is nothing preventing Tier 2 (or
+  a delegated Tier 3 worker) from reading `C:\Users\Ed\.aws\credentials`
+  or writing to a different project entirely.
+- **No git ban enforcement.** Nothing prevents Tier 2 from running
+  `git restore`, `git push origin`, `git checkout -- <file>`, or
+  `git reset --hard`. These are the four operations the user has
+  called out as "destructive to its progress or affects the origin
+  server" in the original ask.
+- **No failure threshold / give-up mechanism.** A stuck Tier 2 runs
+  until the user notices or the agent self-terminates. There is no
+  "3 red-phase attempts without progress → stop and write a report"
+  guardrail.
+- **One OpenCode session at a time.** The main repo's OpenCode session
+  is the only execution environment. Tier 2 cannot run in parallel with
+  Tier 1 review.
+
+### 1.2 The Goal
+
+Add a **second execution mode** for Tier 2 that is:
+
+- **Autonomous** — no `permission: ask` prompts for `edit` or `bash`
+- **Sandboxed** — file access is restricted to the Tier 2 clone + an
+  app-data temp dir, enforced at 3 independent layers (OpenCode
+  permission system, Windows restricted token + ACLs, git hooks)
+- **Bounded** — a one-shot run with a failure threshold; stuck runs
+  stop early and write a report
+- **Reviewable** — the run produces a feature branch in the clone;
+  the user fetches it back to main and reviews with Tier 1
+- **Opt-in for infrastructure tests** — the sandbox / bootstrap /
+  smoke tests are env-var-gated so the default `uv run pytest` run
+  stays app-focused and fast
+
+The main repo's existing configuration (the Tier 1 control plane) is
+**not modified**: `opencode.json` stays the same (Tier 1 still has
+`permission: ask`), and the existing MMA agents stay the same. The
+track only adds new files alongside them.
+
+### 1.3 What the User Experiences
+
+**One-time bootstrap (the user runs once):**
+```powershell
+cd C:\projects\manual_slop
+pwsh scripts/tier2/setup_tier2_clone.ps1
+```
+
+**Per-track invocation (the user's normal flow from now on):**
+1. `cd C:\projects\manual_slop_tier2`
+2. Open OpenCode in that directory (the "Tier 2 Sandboxed" desktop
+   shortcut the bootstrap created)
+3. In the OpenCode session, type:
+   ```
+   /tier-2-auto-execute result_migration_review_pass
+   ```
+4. Tier 2 fetches the spec, creates the `tier2/result_migration_review_pass`
+   branch, runs the plan, and commits per task
+5. On success: prints a summary. On give-up: writes a failure report
+   and prints its path.
+6. `cd C:\projects\manual_slop` (back to main)
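+7. `git fetch C:/projects/manual_slop_tier2 tier2/result_migration_review_pass:tier2/result_migration_review_pass`
+   (the `src:dst` refspec creates the local branch; a bare fetch would
+   only update `FETCH_HEAD`)
+8. Review the diff with Tier 1 (interactive)
+9. `git merge --no-ff tier2/result_migration_review_pass` into main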
+
+**No `permission: ask` prompts in step 4.** If a Tier 2 tool call
+attempts a banned operation, the OpenCode permission system denies it;
+if a delegated Tier 3 worker tries to escape via a Python subprocess,
+the Windows ACLs deny it; if a `git push` somehow slips through, the
+pre-push hook blocks it. **Three independent layers, all enforcing the
+same ban list.**
+
+---
+
+## 2. Current State Audit (as of `88e44d1c`)
+
+### 2.1 Already Implemented (DO NOT re-implement)
+
+- **OpenCode agent profile scaffolding** —
+  `.opencode/agents/tier{1,2,3,4}-*.md:1-200` and the
+  `opencode.json:1-50` config file. The `tier2-autonomous` agent
+  profile this track adds follows the same pattern.
+- **Slash command pattern** — `.opencode/commands/conductor-implement.md:1-100`
+  is the existing pattern for slash commands. The
+  `tier-2-auto-execute.md` command follows the same structure (front
+  matter `agent:` and `description:`, markdown body with protocol).
+- **Conductor track convention** — `conductor/tracks/<id>/{spec,plan}.md`
+  and `metadata.json` per `conductor/workflow.md` "State.toml
+  Template" + "Track Dependencies and Execution Order" sections. This
+  track's artifacts follow that pattern.
+- **Project-level test opt-in convention** — the `live_gui` fixture
+  in `tests/conftest.py` and the existing env-var-gated tests (e.g.,
+  the `RUN_LIVE_GUI=1` pattern in `tests/test_live_*.py`). The
+  `TIER2_SANDBOX_TESTS=1` opt-in gate for this track's sandbox tests
+  follows the same shape.
+- **PowerShell-based tooling** — `scripts/` already contains
+  PowerShell-adjacent Python scripts. The new wrapper is a pure
+  PowerShell script, consistent with `pywin32`-based operations on
+  Windows.
+- **`scripts/audit_*.py` pattern** — the 4 existing audit scripts
+  (`audit_exception_handling.py`, `audit_weak_types.py`,
+  `audit_main_thread_imports.py`, `audit_no_models_config_io.py`) are
+  the project's enforcement mechanism. This track does not introduce
+  a new audit (the failcount thresholds are TOML-config, not
+  statically checkable), but follows the `scripts/audit_<name>.py`
+  naming for any future addition.
+
+### 2.2 Gaps to Fill (This Track's Scope)
+
+**Gap 1: A second clone as the Tier 2 execution environment.**
+
+The main repo (`C:\projects\manual_slop\`) currently doubles as both
+the Tier 1 control plane and the Tier 2 execution environment. The
+fix is a sibling clone at `C:\projects\manual_slop_tier2\` with
+`origin` set to the main repo's local path (no network remote). The
+clone is where the feature branch lives; the user fetches the branch
+back into main for review.
+
+**Gap 2: A `tier2-autonomous` agent profile with deny rules.**
+
+The existing `tier2-tech-lead` agent has `permission: ask` for `edit`
+and `bash`. The fix is a new `tier2-autonomous` agent profile (in the
+Tier 2 clone's `opencode.json`) with:
+- `permission.edit: allow`
+- `permission.bash: { "*": "allow", "git push*": "deny",
+  "git checkout*": "deny", "git restore*": "deny", "git reset*": "deny" }`
+- `permission.read` / `permission.write` restricted to the Tier 2
+  clone + `C:\Users\Ed\AppData\Local\manual_slop\tier2\`
+
+**Gap 3: A sandboxed launcher (Windows restricted token + ACLs).**
+
+OpenCode's permission system only checks the tool calls that pass
+through OpenCode; it does not constrain what a spawned process does
+afterwards. A determined Tier 3 worker calling `os.system("...")`
+from a delegated Python script could in principle bypass it. The fix
+is a PowerShell wrapper that:
+- Acquires a Windows restricted token (drops `SeBackupPrivilege`,
+  `SeRestorePrivilege`, `SeTakeOwnershipPrivilege`, `SeDebugPrivilege`,
+  `SeLoadDriverPrivilege`)
+- Sets explicit ACLs on the Tier 2 clone + app-data temp dir (allow
+  the restricted token, deny everything else)
+- Wraps the process tree in a Job Object (no breakaway)
+- Launches OpenCode + the MCP server under the restricted token via
+  `CreateProcessWithTokenW`
+
+**Gap 4: A `tier-2-auto-execute` slash command.**
+
+The existing slash commands are conductor-style ("start
+implementation", "create track"). The new slash command takes a
+`<track-name>` argument, fetches the spec from `origin/main`, creates
+a `tier2/<track-name>` branch via `git switch -c` (NOT `git checkout`),
+runs the plan via Tier 2, monitors the failcount, and reports back.
+
+**Gap 5: A failure threshold + give-up mechanism (`failcount.py`).**
+
+The current Tier 2 has no built-in "I can't make progress" detection.
+A stuck agent burns tokens until the user notices. The fix is a pure
+Python module that tracks three orthogonal signals:
+- `red_phase_failures` (3 = give up)
+- `green_phase_failures` (3 = give up)
+- `no_progress_minutes` (30 = give up)
+
+Whichever signal hits its threshold first triggers give-up. The
+module is pure logic, fully unit-testable, with a TOML config for
+threshold overrides.
+
+**Gap 6: A failure report writer + flag file + notification.**
+
+When give-up fires, the system needs to:
+- Write a markdown report to
+  `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<utc-timestamp>.md`
+  with: header, tasks completed, current task state, last 3 failures,
+  failcount state, git log, recommendation
+- Create a `.STOPPED` flag file alongside the report
+- Print a clear "TRACK ABORTED" banner in the OpenCode session with
+  the report path
+- Optionally: Windows toast notification (opt-in via `--toast` flag)
+
+**Gap 7: Git hooks as defense-in-depth (Layer 3).**
+
+The OpenCode permission system is the primary enforcement for git bans.
+A pre-push hook (`pre-push` in the clone's `.git/hooks/`) is the
+backup that catches `git push origin*` even if the OpenCode deny rule
+is somehow misconfigured. A `post-checkout` hook logs any checkout of
+tracked files to a detection log.
+
+**Gap 8: A user guide for bootstrap + invocation + manual verification.**
+
+The user needs to know:
+- How to run the bootstrap once
+- How to invoke the slash command
+- What the failure report looks like
+- How to review and merge the feature branch
+- How to manually verify the sandbox blocks the banned operations
+
+---
+
+## 3. Goals
+
+- **Eliminate the `permission: ask` bottleneck** for well-regularized
+  tracks. The user clicks zero times during a normal Tier 2 run
+  (excluding the "did Tier 2 give up?" check at the end).
+- **Enforce the 4 hard git bans** (`git restore`, `git push*`,
+  `git checkout`, `git reset`) at 3 independent layers (OpenCode,
+  Windows OS, git hooks). A bypass of one layer is caught by another.
+- **Enforce the filesystem boundary** (Tier 2 clone + app-data temp
+  only) at 2 independent layers (OpenCode path allowlist, Windows
+  ACLs). Even a delegated Python subprocess can't read outside the
+  allowlist.
+- **Bound the blast radius** with a failure threshold. A stuck Tier 2
+  stops within ~30 minutes and writes a report, instead of running
+  indefinitely.
+- **Keep the default test run app-focused.** All sandbox/bootstrap/
+  smoke tests are env-var-gated; `uv run pytest` with no env vars
+  stays fast and never touches the Windows ACL subsystem.
+- **Keep Tier 1 unchanged.** The main repo's `opencode.json` is not
+  modified. Tier 1 retains its `permission: ask` workflow.
+
+## 4. Functional Requirements
+
+### 4.1 Bootstrap (one-time, user-driven)
+
+**FR1.1:** `scripts/tier2/setup_tier2_clone.ps1` (new) clones the
+main repo to `C:\projects\manual_slop_tier2\`, sets
+`origin = C:\projects\manual_slop`, copies the agent/command/
+opencode.json templates to the clone, installs the git hooks into
+the clone's `.git/hooks/`, creates the app-data temp dir
+`C:\Users\Ed\AppData\Local\manual_slop\tier2\` with restricted ACLs,
+and creates a "Tier 2 (Sandboxed)" desktop shortcut.
+
+**FR1.2:** The bootstrap is idempotent — re-running it does not
+destroy an existing clone's feature branches (it runs `git fetch origin`
+and refreshes the templates, but does not `git reset` the clone).
+
+**FR1.3:** The bootstrap supports a dry-run mode (`-WhatIf`) that
+shows what would happen without making changes. This mode is required
+for safety.
+
+### 4.2 The tier2-autonomous agent profile
+
+**FR2.1:** `.opencode/agents/tier2-autonomous.md` (template) in main
+repo; copied to Tier 2 clone during bootstrap. Defines the
+autonomous-mode agent with the deny rules in §2.2 Gap 2.
+
+**FR2.2:** The agent sets `temperature: 0.4` (matching the Tier 2
+Tech Lead). It uses `git switch -c <branch>` for new branches and
+`git switch <branch>` for switching — `git checkout` is banned
+project-wide.
+
+**FR2.3:** The agent prompt includes the failcount monitoring
+contract: "After each task commit, check
+`<app-data>/tier2/<track>/state.json` via the failcount module. If
+`should_give_up` returns true, write the failure report and stop."
+
+### 4.3 The sandboxed launcher
+
+**FR3.1:** `scripts/tier2/run_tier2_sandboxed.ps1` (new) is the
+entry point that opens OpenCode in the Tier 2 clone under a
+restricted token.
+
+**FR3.2:** The wrapper acquires a restricted token by calling the
+Win32 API `CreateRestrictedToken` (`advapi32.dll`) through .NET
+P/Invoke, sets ACLs on the Tier 2 clone + app-data dir to grant the
+restricted token read/write, wraps the process tree in a Job Object,
+and launches OpenCode + the MCP server under the restricted token via
+`CreateProcessWithTokenW`.
+
+**FR3.3:** The wrapper is the target of the "Tier 2 (Sandboxed)"
+desktop shortcut created during bootstrap. Right-click → Properties
+shows the command: `pwsh -File C:\projects\manual_slop\scripts\tier2\run_tier2_sandboxed.ps1`.
+
+### 4.4 The slash command
+
+**FR4.1:** `.opencode/commands/tier-2-auto-execute.md` (template) in
+main repo; copied to Tier 2 clone during bootstrap. Takes a
+required `<track-name>` argument.
+
+**FR4.2:** The slash command:
+1. Reads `conductor/tracks/<track-name>/spec.md` + `plan.md` from
+   `origin/main` (after a `git fetch origin main`)
+2. Creates a `tier2/<track-name>` branch via
+   `git switch -c tier2/<track-name> origin/main`
+3. Initializes the failcount state file at
+   `<app-data>/tier2/<track-name>/state.json`
+4. Delegates the plan to the tier2-autonomous agent
+5. After each task commit, checks failcount; on give-up, writes the
+   report and stops
+6. On success, prints a summary (branch name, N commits, M tasks)
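
The git portion of this protocol (steps 1 and 2) can be sketched as a small planning function, for example inside the `scripts/tier2/run_track.py` entry point. This is a minimal illustration, not the track's implementation: `plan_git_commands`, `BANNED_SUBCOMMANDS`, and the track-name validation rule are hypothetical names and choices introduced here.

```python
import re

# The four git subcommands the spec bans in the Tier 2 clone.
BANNED_SUBCOMMANDS = frozenset({"push", "checkout", "restore", "reset"})


def plan_git_commands(track_name: str) -> list[list[str]]:
    """Return the argv lists for the fetch and branch-creation steps, in order."""
    # Hypothetical validation rule: restrict track names to a conservative
    # character set so they can never be parsed as git options.
    if not re.fullmatch(r"[A-Za-z0-9_]+", track_name):
        raise ValueError(f"invalid track name: {track_name!r}")
    branch = f"tier2/{track_name}"
    commands = [
        ["git", "fetch", "origin", "main"],
        # `git switch -c`, never `git checkout -b`.
        ["git", "switch", "-c", branch, "origin/main"],
    ]
    # Self-check: the plan itself must never contain a banned subcommand.
    for argv in commands:
        if argv[1] in BANNED_SUBCOMMANDS:
            raise ValueError(f"banned git subcommand in plan: {argv[1]}")
    return commands


commands = plan_git_commands("result_migration_review_pass")
```

Each returned list could be passed to `subprocess.run(argv, check=True, cwd=<clone dir>)`. Returning the plan as data keeps it testable without a git repository.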
+
+**FR4.3:** The slash command's protocol is duplicated in a CLI
+entry point (`scripts/tier2/run_track.py`) so the smoke e2e test
+can invoke the same logic without spinning up an OpenCode session.
+
+**FR4.4:** The slash command supports `--resume` to continue a track
+that previously gave up, starting from the last completed task (the
+state is in the `state.json` file). Without `--resume`, the command
+refuses to resume and asks for explicit confirmation.
+
+### 4.5 The failcount module
+
+**FR5.1:** `scripts/tier2/failcount.py` (new) is a pure-Python module
+with no external deps. Exposes:
+- `class FailcountState` — the signal state dataclass
+- `class FailcountConfig` — threshold loader (from TOML or defaults)
+- `def should_give_up(state: FailcountState, config: FailcountConfig,
+  now: datetime) -> Result[bool, ErrorInfo]`
+- `def record_red_failure(state: FailcountState) -> FailcountState`
+- `def record_green_failure(state: FailcountState) -> FailcountState`
+- `def record_green_success(state: FailcountState,
+  now: datetime) -> FailcountState` (resets no_progress)
+- `def record_commit(state: FailcountState,
+  now: datetime) -> FailcountState` (resets no_progress)
+- `def to_dict(state) -> dict`, `def from_dict(d) -> FailcountState`
+- `def load_state(track_name: str) -> Result[FailcountState, ErrorInfo]`
+- `def save_state(track_name: str, state: FailcountState) -> Result[None, ErrorInfo]`
+
+**FR5.2:** Default thresholds (override via `failcount.toml`):
+- `red_phase_threshold: 3`
+- `green_phase_threshold: 3`
+- `no_progress_minutes: 30`
+
+**FR5.3:** `should_give_up` returns `True` (wrapped in the `Result`)
+if ANY signal hits its threshold. The `now` parameter is injectable
+for testing.
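
A minimal sketch of the "any signal triggers" rule with an injectable clock, using the FR5.2 defaults. Several things here are assumptions: `last_progress_at` is a hypothetical field name for the no-progress timer, `should_give_up` returns a plain `bool` rather than the project's `Result[bool, ErrorInfo]` (whose definition is not shown in this spec), and the formatting is conventional rather than the project styleguide.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FailcountConfig:
    # Defaults taken from FR5.2.
    red_phase_threshold: int = 3
    green_phase_threshold: int = 3
    no_progress_minutes: int = 30


@dataclass(frozen=True)
class FailcountState:
    # Hypothetical field: the time of the last commit or green result.
    last_progress_at: datetime
    red_phase_failures: int = 0
    green_phase_failures: int = 0


def record_red_failure(state: FailcountState) -> FailcountState:
    # Return a new state; the input is never mutated.
    return replace(state, red_phase_failures=state.red_phase_failures + 1)


def record_commit(state: FailcountState, now: datetime) -> FailcountState:
    # A commit restarts the no-progress timer and leaves the counters alone.
    return replace(state, last_progress_at=now)


def should_give_up(state: FailcountState, config: FailcountConfig, now: datetime) -> bool:
    # Simplification: plain bool instead of Result[bool, ErrorInfo].
    stalled = now - state.last_progress_at >= timedelta(minutes=config.no_progress_minutes)
    return (
        state.red_phase_failures >= config.red_phase_threshold
        or state.green_phase_failures >= config.green_phase_threshold
        or stalled
    )


start = datetime(2026, 6, 16, 12, 0, tzinfo=timezone.utc)
config = FailcountConfig()
state = FailcountState(last_progress_at=start)
for _ in range(2):
    state = record_red_failure(state)
at_two = should_give_up(state, config, start)    # below every threshold
state = record_red_failure(state)
at_three = should_give_up(state, config, start)  # red threshold reached
```

Because `now` is passed in, tests can advance the clock without sleeping or patching `datetime`.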
+
+**FR5.4:** `record_green_success` and `record_commit` reset the
+`no_progress_minutes` timer. They do NOT reset the red/green
+failure counters (those only reset on the next progress signal of
+the same type — e.g., a red failure is reset by a green test that
+eventually passes).
+
+### 4.6 The failure report writer
+
+**FR6.1:** `scripts/tier2/write_report.py` (new) takes a track name,
+branch name, state, and a list of `TaskResult` records, and writes
+the markdown report to
+`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<utc-timestamp>.md`.
+
+**FR6.2:** The report contains the 7 sections in order:
+1. Header (track, branch, started-at, stopped-at, duration, give-up signal)
+2. Tasks completed (list with task IDs, commit SHAs, summaries)
+3. Current task state (where it stopped: task ID, phase, worker output, test failure)
+4. Last 3 failures (truncated to 50 lines, full output in `..._full.log`)
+5. Failcount state at give-up
+6. Git state (`git log --oneline tier2/<track> ^origin/main`)
+7. Recommendation (heuristic-based: "track too complex", "spec needs clearer plan", "external dependency missing", "review carefully")
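
Section 4's 50-line cap can be sketched as a small helper. The spec does not say whether the head or the tail of the output is kept. This sketch assumes the tail, since test runners usually print the failure summary last. `truncate_failure` is a hypothetical name.

```python
def truncate_failure(output: str, limit: int = 50) -> tuple[str, bool]:
    """Return (excerpt, was_truncated) for one failure's captured output."""
    lines = output.splitlines()
    if len(lines) <= limit:
        # Short enough: the report can embed the output unchanged.
        return output, False
    # Assumption: keep the LAST `limit` lines; the full text would go to
    # the companion `..._full.log` file.
    return "\n".join(lines[-limit:]), True


long_output = "\n".join(f"line {i}" for i in range(1, 121))
excerpt, was_truncated = truncate_failure(long_output)
```

When `was_truncated` is true, the report writer would also write the untruncated output to the `..._full.log` file and reference it from the report.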
+
+**FR6.3:** A `.STOPPED` flag file is created at
+`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>.STOPPED`.
+
+**FR6.4:** The report writer returns the report path on success
+(via `Result[str, ErrorInfo]`).
+
+### 4.7 The git hooks (Layer 3)
+
+**FR7.1:** `conductor/tier2/githooks/pre-push` (template) is a
+shell/PowerShell script that refuses `git push` invocations to any
+remote. The script returns exit code 1 with the message
+"Tier 2 autonomous mode: `git push` is disabled. Push the branch
+manually from the main repo after review."
+
+**FR7.2:** `conductor/tier2/githooks/post-checkout` (template) is a
+detection-only hook that logs any checkout of tracked files to
+`C:\Users\Ed\AppData\Local\manual_slop\tier2\tier2_checkout_log.txt`
+with a timestamp, the commit hash, and the affected paths.
+
+**FR7.3:** The bootstrap script copies both hooks to the Tier 2
+clone's `.git/hooks/` and makes them executable: `chmod +x` on
+Linux/WSL, or an `icacls` read-and-execute grant on Windows (NTFS
+has no executable bit).
+
+### 4.8 The user guide
+
+**FR8.1:** `docs/guide_tier2_autonomous.md` (new) covers:
+- Why this exists (the `permission: ask` bottleneck)
+- One-time bootstrap procedure (with `-WhatIf` instructions)
+- Per-track invocation procedure
+- The slash command arguments (`<track-name>`, `--resume`, `--toast`)
+- The failure report layout (with screenshot/example)
+- How to review and merge the feature branch
+- The "Verify the sandbox" checklist (manual verification)
+- Troubleshooting (common errors: origin not set, hooks not
+  executable, failcount.toml missing)
+
+**FR8.2:** The guide includes a "Verify the sandbox" section that
+walks the user through attempting each banned operation manually
+and confirming the denial. This is the user-driven checklist from
+the design.
+
+### 4.9 The test suite (opt-in)
+
+**FR9.1:** `tests/test_failcount.py` (new) — **default-on**. Unit
+tests for the failure threshold module. The full test inventory:
+- `test_initial_state_zero`
+- `test_red_phase_failure_increments`
+- `test_green_success_resets_red_counter`
+- `test_green_phase_failure_increments`
+- `test_no_progress_advances`
+- `test_no_progress_resets_on_commit`
+- `test_no_progress_resets_on_green`
+- `test_threshold_fires_at_three`
+- `test_threshold_does_not_fire_at_two`
+- `test_multi_signal_independence`
+- `test_any_signal_triggers`
+- `test_state_persistence_round_trip`
+- `test_configurable_thresholds`
+
+Target: 100% line + branch coverage on `failcount.py`.
+
+**FR9.2:** `tests/test_tier2_slash_command_spec.py` (new) — **default-on**.
+Loads the slash command markdown, verifies its protocol contract
+(argument parsing, git commands, failcount check, report writing).
+
+**FR9.3:** `tests/test_tier2_setup_bootstrap.py` (new) — **opt-in**
+(`TIER2_SANDBOX_TESTS=1`). Runs `setup_tier2_clone.ps1` against a
+fixture workspace, verifies the side effects (clone exists, origin
+set, templates copied, hooks installed, app-data dir created with
+ACLs).
+
+**FR9.4:** `tests/test_tier2_sandbox_enforcement.py` (new) —
+**opt-in** (`TIER2_SANDBOX_TESTS=1`). The critical test: spawns the
+wrapper in a subprocess, inside the sandboxed context attempts
+each banned operation, verifies each is denied.
+
+**FR9.5:** `tests/test_tier2_report_writer.py` (new) — **opt-in**
+(`TIER2_SANDBOX_TESTS=1`). Invokes failcount until give-up,
+verifies the report file is created at the right path with the
+right 7 sections.
+
+**FR9.6:** `tests/test_tier2_smoke_e2e.py` (new) — **opt-in**
+(`TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1`). Runs the full pipeline
+against a fixture workspace: bootstrap → invoke the CLI entry
+point → verify the feature branch exists with 1 commit → verify
+the report file is NOT created (success path).
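
The two-level env-var gate used by FR9.3 through FR9.6 can be sketched as a pure function that returns a skip reason. `skip_reason` is a hypothetical helper; the real tests may instead follow the existing `RUN_LIVE_GUI=1` pattern directly.

```python
import os
from typing import Mapping, Optional


def skip_reason(env: Mapping[str, str], needs_smoke: bool = False) -> Optional[str]:
    """Return why a Tier 2 test should be skipped, or None if it may run."""
    if env.get("TIER2_SANDBOX_TESTS") != "1":
        return "set TIER2_SANDBOX_TESTS=1 to run Tier 2 sandbox tests"
    # The smoke e2e test needs the second gate as well (FR9.6).
    if needs_smoke and env.get("TIER2_SMOKE") != "1":
        return "set TIER2_SMOKE=1 to run the Tier 2 smoke test"
    return None


# In a test module this would typically feed pytest's skipif marker:
#   reason = skip_reason(os.environ, needs_smoke=True)
#   pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")
default_run = skip_reason({})
sandbox_only = skip_reason({"TIER2_SANDBOX_TESTS": "1"}, needs_smoke=True)
fully_enabled = skip_reason({"TIER2_SANDBOX_TESTS": "1", "TIER2_SMOKE": "1"}, needs_smoke=True)
```

Taking the environment as a parameter means the gate can be unit-tested in the default-on suite without touching `os.environ`.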
+
+## 5. Non-Functional Requirements
+
+**NFR1. Performance:** the failcount module adds <1ms per check.
+The slash command's protocol adds <500ms to a typical Tier 2 task
+(spec fetch + branch creation + state init).
+
+**NFR2. Reliability:** the failcount state is persisted after every
+commit. A killed run can be resumed, or explicitly refused, on the
+next invocation. The state file uses an atomic write (write to
+`state.json.tmp`, then `os.replace`) to survive crashes mid-write.
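
The write-then-replace pattern named in NFR2 can be sketched as follows. `save_state_atomic` is a hypothetical helper taking a plain dict. The spec's `save_state` takes a track name and returns a `Result`, which this sketch omits.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state_atomic(state_path: Path, state: dict) -> None:
    """Write `state` as JSON so readers never observe a half-written file."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
        # Flush Python and OS buffers before the rename makes the file visible.
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace overwrites the destination in one step on the same volume.
    os.replace(tmp_path, state_path)


with tempfile.TemporaryDirectory() as root:
    path = Path(root) / "tier2" / "demo_track" / "state.json"
    save_state_atomic(path, {"red_phase_failures": 1})
    save_state_atomic(path, {"red_phase_failures": 2})
    loaded = json.loads(path.read_text(encoding="utf-8"))
    leftover_tmp = path.with_name("state.json.tmp").exists()
```

The temporary file sits next to the target so the rename stays within one volume. A crash before `os.replace` leaves the previous `state.json` untouched.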
+
+**NFR3. Security:**
+- The 4 git bans are enforced at 3 independent layers (OpenCode
+  permission system, Windows OS-level via restricted token, git
+  hooks). A bypass of one layer is caught by another.
+- The filesystem boundary is enforced at 2 independent layers
+  (OpenCode path allowlist, Windows ACLs).
+- The Tier 2 process tree is wrapped in a Job Object that
+  prevents child process escape.
+
+**NFR4. Testability:**
+- The failcount module is pure logic, 100% unit-testable without
+  any infrastructure.
+- The slash command's protocol is duplicated in
+  `scripts/tier2/run_track.py` (CLI entry point) so the smoke e2e
+  test runs without an OpenCode session.
+- All sandbox / bootstrap / smoke tests are env-var-gated
+  (`TIER2_SANDBOX_TESTS=1`, `TIER2_SMOKE=1`).
+
+**NFR5. Auditability:** every Tier 2 run writes to
+`C:\Users\Ed\AppData\Local\manual_slop\tier2\<track>\state.json`
+and (on give-up) `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<timestamp>.md`.
+The user can inspect the state at any time.
+
+**NFR6. UX:** the user clicks zero times during a normal Tier 2
+run. The "did Tier 2 give up?" check is passive (an OpenCode
+banner, an optional Windows toast, and a flag file the user can
+check on next Tier 1 session start).
+
+**NFR7. Backward compatibility:** the main repo's `opencode.json`
+is not modified. Tier 1 retains its `permission: ask` workflow.
+The new agent profile (`tier2-autonomous`) is in the Tier 2 clone
+only. The new slash command is in the Tier 2 clone only.
+
+## 6. Architecture Reference
+
+**This track's design follows these existing patterns:**
+
+- **`docs/guide_architecture.md`** §"Threading model" — the
+  Tier 2 process tree runs in its own Job Object, isolated from
+  the user's main session.
+- **`docs/guide_mma.md`** §"Tier 2/3/4 lifecycles" — the Tier 2
+  Tech Lead's existing delegation patterns (Task tool to
+  `@tier3-worker`, `@tier4-qa`) are preserved in the autonomous
+  mode.
+- **`docs/guide_meta_boundary.md`** — this track is squarely in
+  the "Meta-Tooling" environment (it builds execution infrastructure
+  for the agents), not the "Application" environment. No changes
+  to `src/*.py`.
+- **`docs/guide_testing.md`** §"Authoring robust live_gui tests"
+  + the `live_gui` session-scoped pattern — the smoke e2e test
+  follows the same opt-in env-var-gated pattern.
+- **`conductor/code_styleguides/python.md`** — 1-space indentation,
+  CRLF line endings, no comments, strict type hints. All new Python
+  code in this track follows this styleguide.
+- **`conductor/code_styleguides/error_handling.md`** — the
+  failcount module uses `Result[T, ErrorInfo]` per the convention
+  (the 3 refactored baseline files use it; the convention is being
+  rolled out across the codebase per
+  `data_oriented_error_handling_20260606` + the upcoming
+  `result_migration_20260616` sub-tracks).
+
+**This track's NEW patterns (the contribution to the codebase):**
+
+- **Sibling clone as execution mode switch** — opening OpenCode in
+  a different directory IS the mode switch (no `mode:` flag in
+  `opencode.json`, no env var, just a directory).
+- **3-layer enforcement stack** — OpenCode permission system +
+  Windows restricted token + git hooks. Documented in
+  `docs/guide_tier2_autonomous.md` (this track's new guide).
+- **Bounded autonomous run with fail-loud** — the failcount module
+  is a general-purpose "I'm stuck" detector, applicable to any
+  future autonomous run (not just Tier 2). The pattern is
+  reusable for any sub-agent that has a contract to follow.
+
+## 7. Out of Scope
+
+- **No changes to the Manual Slop app (`src/*.py`).** This is
+  meta-tooling, not the app. The 4 audit scripts
+  (`audit_exception_handling.py`, `audit_weak_types.py`,
+  `audit_main_thread_imports.py`, `audit_no_models_config_io.py`)
+  are not modified.
+- **No changes to the main repo's `opencode.json` or MMA agent
+  profiles.** The new `tier2-autonomous` profile lives in the
+  Tier 2 clone only.
+- **No new top-level `src/<thing>.py` files.** Per the file-naming
+  convention (`AGENTS.md` §"File Size and Naming Convention"), the
+  new code is in `scripts/tier2/`, `conductor/tier2/`, and `tests/`
+  (all namespace-isolated by directory).
+- **No changes to existing tracks or in-flight work.** The
+  `result_migration_20260616` umbrella track, the
+  `data_oriented_error_handling_20260606` track, and the
+  `exception_handling_audit_20260616` track are not affected.
+- **No new audit script.** The failcount thresholds are TOML config,
+  not statically checkable. If a future track adds a checkable
+  convention (e.g., "all CLI entry points must use Result[T]"),
+  the new audit script should follow the
+  `scripts/audit_<name>.py` pattern from the existing 4.
+- **No WSL2 / Docker / Windows Sandbox variants.** The user
+  approved Approach 1 (OpenCode + Windows restricted token + git
+  hooks, all native Windows). WSL2 was considered and deferred;
+  the failure to run Dear PyGui/ImGui tests in WSL2 was the
+  deciding factor.
+- **No parallel Tier 2 runs.** The Tier 2 clone is a single
+  workspace. Two parallel Tier 2 runs would conflict on the
+  feature branch. If parallel runs become a need, that's a
+  follow-up track.
+- **No `git push` to non-origin remotes.** Even though the deny
+  rule is `git push*` (any push), the practical use case is
+  "Tier 2 doesn't push at all; the user pushes after review."
+  Adding a "push to a tier2-remote bare dir" workflow is a
+  follow-up if needed.
+- **No automated review of the feature branch.** Tier 1 reviewing
+  Tier 2's branch is a future track (out of scope here).
+
+---
+
+**Spec ends.** The implementation plan (`plan.md` + `metadata.json`)
+will be written by the `writing-plans` skill in the next phase, after
+the user reviews this spec.