diff --git a/conductor/tracks/tier2_autonomous_sandbox_20260616/spec.md b/conductor/tracks/tier2_autonomous_sandbox_20260616/spec.md new file mode 100644 index 00000000..aa54ccb9 --- /dev/null +++ b/conductor/tracks/tier2_autonomous_sandbox_20260616/spec.md @@ -0,0 +1,614 @@ +# Track Specification: Tier 2 Autonomous Sandbox (unattended track execution with bounded blast radius) + +**Track ID:** `tier2_autonomous_sandbox_20260616` +**Status:** Planned (spec pending user review) +**Priority:** A (user-blocking; eliminates the manual `permission: ask` bottleneck for well-regularized tracks) +**Owner:** Tier 2 Tech Lead (per `conductor/workflow.md`) +**Type:** feature (meta-tooling — adds a new execution mode to the existing MMA workflow, not to the Manual Slop app itself) +**Scope:** ~7 new files in main repo + 1 sibling clone at `C:\projects\manual_slop_tier2\` (one-time bootstrap) +**Parent tracks:** `opencode_config_overhaul_20260310` (shipped; established the agent profile scaffolding this track extends) +**Sibling tracks:** none (independent) + +> **Note on effort estimates:** per the Tier 1 rules (see +> `conductor/workflow.md` §"Tier 1 Track Initialization Rules"), this +> spec does NOT include day estimates. Effort is measured by **scope** (N +> files, M sites) and **T-shirt size** (S/M/L/XL). The user / Tier 2 +> agent decides the actual pacing. + +--- + +## 0. TL;DR + +This track adds an **unattended execution mode** for Tier 2: you open +OpenCode in a sibling clone (`C:\projects\manual_slop_tier2\`), type +`/tier-2-auto-execute `, and Tier 2 runs the track +autonomously — **no `permission: ask` prompts** — while a **3-layer +defense-in-depth** enforcement stack prevents it from touching the +filesystem outside its clone + an app-data temp dir, and from running +destructive git operations (`git restore`, `git push*`, `git checkout`, +`git reset`). 
If Tier 2 can't make progress (3 red-phase failures, 3 +green-phase failures, or 30 minutes with no commit/green), it stops +early, writes a failure report, and notifies you. You review the +feature branch with Tier 1 in the main repo, then merge. + +**T-shirt size: L** — 7 new files in main repo (mostly config + +scripts + 1 small Python module), 6 new test files, 1 PowerShell +wrapper, 1 bootstrap script, 1 user guide. ~600 lines of new code. + +--- + +## 1. Overview + +### 1.1 The State Before This Track (as of `88e44d1c`) + +The current OpenCode configuration has these properties: + +- **One repo, two modes via agent profile.** `opencode.json:11` sets + `default_agent: "tier2-tech-lead"`. Tier 1 and Tier 2 are + distinguished by which agent profile the user selects in the OpenCode + session, not by which directory they're in. +- **Permission bottleneck on Tier 2.** `.opencode/agents/tier2-tech-lead.md:6-9` + sets `permission: { edit: "ask", bash: "ask", 'manual-slop_*': allow }`. + Every `edit` and every `bash` call from Tier 2 prompts the user for + approval. For well-regularized tracks (TDD red/green/refactor with + atomic per-task commits, e.g., the upcoming `result_migration_*` + tracks), this is **noise** — the user has already pre-approved the + track plan, and the per-task approval doesn't add safety, it just + adds 50+ clicks per track. +- **No filesystem boundary enforcement.** Tier 2 has the same + filesystem access as the user. There is nothing preventing Tier 2 (or + a delegated Tier 3 worker) from reading `C:\Users\Ed\.aws\credentials` + or writing to a different project entirely. +- **No git ban enforcement.** Nothing prevents Tier 2 from running + `git restore`, `git push origin`, `git checkout -- `, or + `git reset --hard`. These are the four operations the user has + called out as "destructive to its progress or affects the origin + server" in the original ask. 
+- **No failure threshold / give-up mechanism.** A stuck Tier 2 runs + until the user notices or the agent self-terminates. There is no + "3 red-phase attempts without progress → stop and write a report" + guardrail. +- **One OpenCode session at a time.** The main repo's OpenCode session + is the only execution environment. Tier 2 cannot run in parallel with + Tier 1 review. + +### 1.2 The Goal + +Add a **second execution mode** for Tier 2 that is: + +- **Autonomous** — no `permission: ask` prompts for `edit` or `bash` +- **Sandboxed** — file access is restricted to the Tier 2 clone + an + app-data temp dir, enforced at 3 independent layers (OpenCode + permission system, Windows restricted token + ACLs, git hooks) +- **Bounded** — a one-shot run with a failure threshold; stuck runs + stop early and write a report +- **Reviewable** — the run produces a feature branch in the clone; + the user fetches it back to main and reviews with Tier 1 +- **Opt-in to the app's test suite** — the sandbox / bootstrap / smoke + tests are env-var-gated so the default `uv run pytest` run stays + app-focused and fast + +The main repo (the Tier 1 control plane) is **not modified** — +`opencode.json` stays the same (Tier 1 still has `permission: ask`), +and the existing MMA agents stay the same. + +### 1.3 What the User Experiences + +**One-time bootstrap (the user runs once):** +```powershell +cd C:\projects\manual_slop +pwsh scripts/tier2/setup_tier2_clone.ps1 +``` + +**Per-track invocation (the user's normal flow from now on):** +1. `cd C:\projects\manual_slop_tier2` +2. Open OpenCode in that directory (the "Tier 2 Sandboxed" desktop + shortcut the bootstrap created) +3. In the OpenCode session, type: + ``` + /tier-2-auto-execute result_migration_review_pass + ``` +4. Tier 2 fetches the spec, creates `tier2/result_migration_review_pass` + branch, runs the plan, commits per task +5. On success: prints a summary. On give-up: writes a failure report + and prints its path. +6. 
`cd C:\projects\manual_slop` (back to main) +7. `git fetch C:/projects/manual_slop_tier2 tier2/result_migration_review_pass` +8. Review the diff with Tier 1 (interactive) +9. `git merge --no-ff tier2/result_migration_review_pass` to main + +**No `permission: ask` prompts in step 4.** If a Tier 2 tool call +attempts a banned operation, the OpenCode permission system denies it; +if a delegated Tier 3 worker tries to escape via a Python subprocess, +the Windows ACLs deny it; if a `git push` somehow slips through, the +pre-push hook blocks it. **Three independent layers, all enforcing the +same ban list.** + +--- + +## 2. Current State Audit (as of `88e44d1c`) + +### 2.1 Already Implemented (DO NOT re-implement) + +- **OpenCode agent profile scaffolding** — + `.opencode/agents/tier{1,2,3,4}-*.md:1-200` and the + `opencode.json:1-50` config file. The `tier2-autonomous` agent + profile this track adds follows the same pattern. +- **Slash command pattern** — `.opencode/commands/conductor-implement.md:1-100` + is the existing pattern for slash commands. The + `tier-2-auto-execute.md` command follows the same structure (front + matter `agent:` and `description:`, markdown body with protocol). +- **Conductor track convention** — `conductor/tracks//{spec,plan}.md` + and `metadata.json` per `conductor/workflow.md` "State.toml + Template" + "Track Dependencies and Execution Order" sections. This + track's artifacts follow that pattern. +- **Project-level test opt-in convention** — the `live_gui` fixture + in `tests/conftest.py` and the existing env-var-gated tests (e.g., + the `RUN_LIVE_GUI=1` pattern in `tests/test_live_*.py`). The + `TIER2_SANDBOX_TESTS=1` opt-in gate for this track's sandbox tests + follows the same shape. +- **PowerShell-based tooling** — `scripts/` already contains + PowerShell-adjacent Python scripts. The new wrapper is a pure + PowerShell script, consistent with `pywin32`-based operations on + Windows. 
+- **`scripts/audit_*.py` pattern** — the 4 existing audit scripts + (`audit_exception_handling.py`, `audit_weak_types.py`, + `audit_main_thread_imports.py`, `audit_no_models_config_io.py`) are + the project's enforcement mechanism. This track does not introduce + a new audit (the failcount thresholds are TOML-config, not + statically checkable), but follows the `scripts/audit_.py` + naming for any future addition. + +### 2.2 Gaps to Fill (This Track's Scope) + +**Gap 1: A second clone as the Tier 2 execution environment.** + +The main repo (`C:\projects\manual_slop\`) currently doubles as both +the Tier 1 control plane and the Tier 2 execution environment. The +fix is a sibling clone at `C:\projects\manual_slop_tier2\` with +`origin` set to the main repo's local path (no remote). The clone is +where the feature branch lives; the user fetches the branch back into +main for review. + +**Gap 2: A `tier2-autonomous` agent profile with deny rules.** + +The existing `tier2-tech-lead` agent has `permission: ask` for `edit` +and `bash`. The fix is a new `tier2-autonomous` agent profile (in the +Tier 2 clone's `opencode.json`) with: +- `permission.edit: allow` +- `permission.bash: { "*": "allow", "git push*": "deny", + "git checkout*": "deny", "git restore*": "deny", "git reset*": "deny" }` +- `permission.read` / `permission.write` restricted to the Tier 2 + clone + `C:\Users\Ed\AppData\Local\manual_slop\tier2\` + +**Gap 3: A sandboxed launcher (Windows restricted token + ACLs).** + +OpenCode's permission system is process-level. A determined Tier 3 +worker calling `os.system("...")` from a delegated Python script +could in principle bypass OpenCode. 
The fix is a PowerShell wrapper +that: +- Acquires a Windows restricted token (drops `SeBackupPrivilege`, + `SeRestorePrivilege`, `SeTakeOwnershipPrivilege`, `SeDebugPrivilege`, + `SeLoadDriverPrivilege`) +- Sets explicit ACLs on the Tier 2 clone + app-data temp dir (allow + the restricted token, deny everything else) +- Wraps the process tree in a Job Object (no breakaway) +- Launches OpenCode + the MCP server under the restricted token via + `CreateProcessWithTokenW` + +**Gap 4: A `tier-2-auto-execute` slash command.** + +The existing slash commands are conductor-style ("start +implementation", "create track"). The new slash command takes a +`` argument, fetches the spec from `origin/main`, creates +a `tier2/` branch via `git switch -c` (NOT `git checkout`), +runs the plan via Tier 2, monitors the failcount, and reports back. + +**Gap 5: A failure threshold + give-up mechanism (`failcount.py`).** + +The current Tier 2 has no built-in "I can't make progress" detection. +A stuck agent burns tokens until the user notices. The fix is a pure +Python module that tracks three orthogonal signals: +- `red_phase_failures` (3 = give up) +- `green_phase_failures` (3 = give up) +- `no_progress_minutes` (30 = give up) + +Whichever signal hits its threshold first triggers give-up. The +module is pure logic, fully unit-testable, with a TOML config for +threshold overrides. 
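
The three give-up signals can be sketched as a small pure-Python model. This is a minimal illustration under stated assumptions, not the module itself: the `last_progress_at` field and the `>=` comparisons are assumptions, and the sketch returns a plain `bool` rather than the project's `Result[T, ErrorInfo]` convention. The defaults are the thresholds listed above (3, 3, 30).

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FailcountConfig:
    # Defaults mirror the thresholds in the spec.
    red_phase_threshold: int = 3
    green_phase_threshold: int = 3
    no_progress_minutes: int = 30


@dataclass(frozen=True)
class FailcountState:
    red_phase_failures: int
    green_phase_failures: int
    # Assumed field: timestamp of the most recent commit or green test.
    last_progress_at: datetime


def record_red_failure(state: FailcountState) -> FailcountState:
    """Return a new state with one more red-phase failure."""
    return replace(state, red_phase_failures=state.red_phase_failures + 1)


def record_commit(state: FailcountState, now: datetime) -> FailcountState:
    """A commit counts as progress, so the no-progress timer restarts."""
    return replace(state, last_progress_at=now)


def should_give_up(
    state: FailcountState, config: FailcountConfig, now: datetime
) -> bool:
    """True as soon as ANY of the three signals reaches its threshold."""
    stalled_for = now - state.last_progress_at
    return (
        state.red_phase_failures >= config.red_phase_threshold
        or state.green_phase_failures >= config.green_phase_threshold
        or stalled_for >= timedelta(minutes=config.no_progress_minutes)
    )
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic under test.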
+ +**Gap 6: A failure report writer + flag file + notification.** + +When give-up fires, the system needs to: +- Write a markdown report to + `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\_.md` + with: header, tasks completed, current task state, last 3 failures, + failcount state, git log, recommendation +- Create a `.STOPPED` flag file alongside the report +- Print a clear "TRACK ABORTED" banner in the OpenCode session with + the report path +- Optionally: Windows toast notification (opt-in via `--toast` flag) + +**Gap 7: Git hooks as defense-in-depth (Layer 3).** + +The OpenCode permission system is the primary enforcement for git bans. +A pre-push hook (`pre-push` in the clone's `.git/hooks/`) is the +backup that catches `git push origin*` even if the OpenCode deny rule +is somehow misconfigured. A `post-checkout` hook logs any checkout of +tracked files to a detection log. + +**Gap 8: A user guide for bootstrap + invocation + manual verification.** + +The user needs to know: +- How to run the bootstrap once +- How to invoke the slash command +- What the failure report looks like +- How to review and merge the feature branch +- How to manually verify the sandbox blocks the banned operations + +--- + +## 3. Goals + +- **Eliminate the `permission: ask` bottleneck** for well-regularized + tracks. The user clicks zero times during a normal Tier 2 run + (excluding the "did Tier 2 give up?" check at the end). +- **Enforce the 4 hard git bans** (`git restore`, `git push*`, + `git checkout`, `git reset`) at 3 independent layers (OpenCode, + Windows OS, git hooks). A bypass of one layer is caught by another. +- **Enforce the filesystem boundary** (Tier 2 clone + app-data temp + only) at 2 independent layers (OpenCode path allowlist, Windows + ACLs). Even a delegated Python subprocess can't read outside the + allowlist. +- **Bound the blast radius** with a failure threshold. 
A stuck Tier 2 + stops within ~30 minutes and writes a report, instead of running + indefinitely. +- **Keep the default test run app-focused.** All sandbox/bootstrap/ + smoke tests are env-var-gated; `uv run pytest` with no env vars + stays fast and never touches the Windows ACL subsystem. +- **Keep Tier 1 unchanged.** The main repo's `opencode.json` is not + modified. Tier 1 retains its `permission: ask` workflow. + +## 4. Functional Requirements + +### 4.1 Bootstrap (one-time, user-driven) + +**FR1.1:** `scripts/tier2/setup_tier2_clone.ps1` (new) clones the +main repo to `C:\projects\manual_slop_tier2\`, sets +`origin = C:\projects\manual_slop`, copies the agent/command/ +opencode.json templates to the clone, installs the git hooks into +the clone's `.git/hooks/`, creates the app-data temp dir +`C:\Users\Ed\AppData\Local\manual_slop\tier2\` with restricted ACLs, +and creates a "Tier 2 (Sandboxed)" desktop shortcut. + +**FR1.2:** The bootstrap is idempotent — re-running it does not +destroy an existing clone's feature branches (it `git fetch origin` +and pulls the latest templates, but does not `git reset` the clone). + +**FR1.3:** The bootstrap dry-run mode (`-WhatIf`) shows what would +happen without making changes. Required for safety. + +### 4.2 The tier2-autonomous agent profile + +**FR2.1:** `.opencode/agents/tier2-autonomous.md` (template) in main +repo; copied to Tier 2 clone during bootstrap. Defines the +autonomous-mode agent with the deny rules in §2.2 Gap 2. + +**FR2.2:** The agent's `temperature: 0.4` (matches Tier 2 Tech Lead). +The agent uses `git switch -c ` for new branches and +`git switch ` for switching — `git checkout` is banned +project-wide. + +**FR2.3:** The agent prompt includes the failcount monitoring +contract: "After each task commit, check +`/tier2//state.json` via the failcount module. If +`should_give_up` returns true, write the failure report and stop." 
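
The state file that the agent checks after each commit has to survive a killed run. One way to persist it is sketched below, under assumptions: the helper names and the `state_path` parameter are hypothetical, the state is treated as a plain dict, and the write goes through a `state.json.tmp` sibling followed by `os.replace`, so a reader should never observe a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomic(state_path: Path, state: dict) -> None:
    """Serialize to a temp sibling first, then swap it into place."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    # For "state.json" this yields "state.json.tmp" in the same directory.
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    tmp_path.write_text(json.dumps(state), encoding="utf-8")
    # os.replace overwrites an existing destination in a single rename.
    os.replace(tmp_path, state_path)


def read_state(state_path: Path) -> dict:
    """Load the persisted state back as a dict."""
    return json.loads(state_path.read_text(encoding="utf-8"))


def demo_round_trip() -> dict:
    """Write and re-read a state file in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        # "example_track" is a made-up track name for illustration only.
        path = Path(tmp) / "example_track" / "state.json"
        write_state_atomic(path, {"red_phase_failures": 1})
        return read_state(path)
```

Keeping the temp file in the same directory as the target matters: a rename across volumes is a copy, which loses the single-step swap.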
+ +### 4.3 The sandboxed launcher + +**FR3.1:** `scripts/tier2/run_tier2_sandboxed.ps1` (new) is the +entry point that opens OpenCode in the Tier 2 clone under a +restricted token. + +**FR3.2:** The wrapper acquires a restricted token via a .NET P/Invoke +call to the Win32 `CreateRestrictedToken` API, sets ACLs on the Tier 2 clone + app-data +dir to grant the restricted token read/write, wraps the process +tree in a Job Object, and launches OpenCode + the MCP server under +the restricted token via `CreateProcessWithTokenW`. + +**FR3.3:** The wrapper is the target of the "Tier 2 (Sandboxed)" +desktop shortcut created during bootstrap. Right-click → Properties +shows the command: `pwsh -File C:\projects\manual_slop\scripts\tier2\run_tier2_sandboxed.ps1`. + +### 4.4 The slash command + +**FR4.1:** `.opencode/commands/tier-2-auto-execute.md` (template) in +main repo; copied to Tier 2 clone during bootstrap. Takes a +required `` argument. + +**FR4.2:** The slash command: +1. Reads `conductor/tracks//spec.md` + `plan.md` from + the current branch (after a `git fetch origin main`) +2. Creates a `tier2/` branch via + `git switch -c tier2/ origin/main` +3. Initializes the failcount state file at + `/tier2//state.json` +4. Delegates the plan to the tier2-autonomous agent +5. After each task commit, checks failcount; on give-up, writes the + report and stops +6. On success, prints a summary (branch name, N commits, M tasks) + +**FR4.3:** The slash command's protocol is duplicated in a CLI +entry point (`scripts/tier2/run_track.py`) so the smoke e2e test +can invoke the same logic without spinning up an OpenCode session. + +**FR4.4:** The slash command supports `--resume` to continue a +track that previously gave up, starting from the last completed task +(the state is in the state.json file). Without `--resume`, the command +refuses to continue an existing run and asks for explicit confirmation. + +### 4.5 The failcount module + +**FR5.1:** `scripts/tier2/failcount.py` (new) is a pure-Python module +with no external deps. 
Exposes: +- `class FailcountState` — the signal state dataclass +- `class FailcountConfig` — threshold loader (from TOML or defaults) +- `def should_give_up(state: FailcountState, config: FailcountConfig, + now: datetime) -> Result[bool, ErrorInfo]` +- `def record_red_failure(state: FailcountState) -> FailcountState` +- `def record_green_failure(state: FailcountState) -> FailcountState` +- `def record_green_success(state: FailcountState, + now: datetime) -> FailcountState` (resets no_progress) +- `def record_commit(state: FailcountState, + now: datetime) -> FailcountState` (resets no_progress) +- `def to_dict(state) -> dict`, `def from_dict(d) -> FailcountState` +- `def load_state(track_name: str) -> Result[FailcountState, ErrorInfo]` +- `def save_state(track_name: str, state: FailcountState) -> Result[None, ErrorInfo]` + +**FR5.2:** Default thresholds (override via `failcount.toml`): +- `red_phase_threshold: 3` +- `green_phase_threshold: 3` +- `no_progress_minutes: 30` + +**FR5.3:** `should_give_up` returns `True` if ANY signal hits its +threshold. The `now` parameter is injectable for testing. + +**FR5.4:** `record_green_success` and `record_commit` reset the +`no_progress_minutes` timer. They do NOT reset the red/green +failure counters (those only reset on the next progress signal of +the same type — e.g., a red failure is reset by a green test that +eventually passes). + +### 4.6 The failure report writer + +**FR6.1:** `scripts/tier2/write_report.py` (new) takes a track name, +branch name, state, and a list of `TaskResult` records, and writes +the markdown report to +`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\_.md`. + +**FR6.2:** The report contains the 7 sections in order: +1. Header (track, branch, started-at, stopped-at, duration, give-up signal) +2. Tasks completed (list with task IDs, commit SHAs, summaries) +3. Current task state (where it stopped: task ID, phase, worker output, test failure) +4. 
Last 3 failures (truncated to 50 lines, full output in `..._full.log`) +5. Failcount state at give-up +6. Git state (`git log --oneline tier2/ ^origin/main`) +7. Recommendation (heuristic-based: "track too complex", "spec needs clearer plan", "external dependency missing", "review carefully") + +**FR6.3:** A `.STOPPED` flag file is created at +`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\.STOPPED`. + +**FR6.4:** The report writer returns the report path on success +(via `Result[str, ErrorInfo]`). + +### 4.7 The git hooks (Layer 3) + +**FR7.1:** `conductor/tier2/githooks/pre-push` (template) is a +shell/PowerShell script that refuses `git push` invocations to any +remote. The script returns exit code 1 with the message +"Tier 2 autonomous mode: `git push` is disabled. Push the branch +manually from the main repo after review." + +**FR7.2:** `conductor/tier2/githooks/post-checkout` (template) is a +detection-only hook that logs any checkout of tracked files to +`C:\Users\Ed\AppData\Local\manual_slop\tier2\tier2_checkout_log.txt` +with a timestamp, the commit hash, and the affected paths. + +**FR7.3:** The bootstrap script copies both hooks to the Tier 2 +clone's `.git/hooks/` and makes them runnable: `chmod +x` on +Linux/WSL, or on Windows (where NTFS has no executable bit) an +`icacls` grant of read/execute access on the hook files. 
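
The decision logic of the FR7.1 hook is small enough to sketch. The spec calls for a shell/PowerShell script; the version below is written in Python purely for illustration, and the function name is hypothetical. Git passes the remote name and URL as arguments to `pre-push` and aborts the push when the hook exits non-zero.

```python
import sys

DENY_MESSAGE = (
    "Tier 2 autonomous mode: `git push` is disabled. "
    "Push the branch manually from the main repo after review."
)


def pre_push_decision(argv: list[str]) -> tuple[int, str]:
    """Return (exit_code, message). Every remote is refused."""
    # argv[1] is the remote name when git invokes the hook.
    remote = argv[1] if len(argv) > 1 else "unknown"
    return 1, f"{DENY_MESSAGE} (remote: {remote})"


def main(argv: list[str]) -> int:
    """Print the denial to stderr and hand the exit code back to the caller."""
    code, message = pre_push_decision(argv)
    print(message, file=sys.stderr)
    return code

# A real hook script would finish with: sys.exit(main(sys.argv))
```

Because the hook never inspects the refs on stdin, it blocks every push regardless of branch or destination, which matches the "any remote" wording of FR7.1.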
+ +### 4.8 The user guide + +**FR8.1:** `docs/guide_tier2_autonomous.md` (new) covers: +- Why this exists (the `permission: ask` bottleneck) +- One-time bootstrap procedure (with `-WhatIf` instructions) +- Per-track invocation procedure +- The slash command arguments (``, `--resume`, `--toast`) +- The failure report layout (with screenshot/example) +- How to review and merge the feature branch +- The "Verify the sandbox" checklist (manual verification) +- Troubleshooting (common errors: origin not set, hooks not + executable, failcount.toml missing) + +**FR8.2:** The guide includes a "Verify the sandbox" section that +walks the user through attempting each banned operation manually +and confirming the denial. This is the user-driven checklist from +the design. + +### 4.9 The test suite (opt-in) + +**FR9.1:** `tests/test_failcount.py` (new) — **default-on**. Unit +tests for the failure threshold module. The full test inventory: +- `test_initial_state_zero` +- `test_red_phase_failure_increments` +- `test_green_success_resets_red_counter` +- `test_green_phase_failure_increments` +- `test_no_progress_advances` +- `test_no_progress_resets_on_commit` +- `test_no_progress_resets_on_green` +- `test_threshold_fires_at_three` +- `test_threshold_does_not_fire_at_two` +- `test_multi_signal_independence` +- `test_any_signal_triggers` +- `test_state_persistence_round_trip` +- `test_configurable_thresholds` + +Target: 100% line + branch coverage on `failcount.py`. + +**FR9.2:** `tests/test_tier2_slash_command_spec.py` (new) — **default-on**. +Loads the slash command markdown, verifies its protocol contract +(argument parsing, git commands, failcount check, report writing). + +**FR9.3:** `tests/test_tier2_setup_bootstrap.py` (new) — **opt-in** +(`TIER2_SANDBOX_TESTS=1`). Runs `setup_tier2_clone.ps1` against a +fixture workspace, verifies the side effects (clone exists, origin +set, templates copied, hooks installed, app-data dir created with +ACLs). 
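
The opt-in gate for these tests can be sketched as a small helper. This is an illustration under assumptions: the helper name is hypothetical, and treating only the exact value `1` as enabled is a choice made here, not something the spec states.

```python
import os
from typing import Mapping, Optional

SANDBOX_SKIP_REASON = "set TIER2_SANDBOX_TESTS=1 to run the sandbox tests"


def env_flag_enabled(
    name: str, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """True only when the variable is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(name) == "1"

# In an opt-in test module, the helper could drive a module-level skip:
#   pytestmark = pytest.mark.skipif(
#       not env_flag_enabled("TIER2_SANDBOX_TESTS"),
#       reason=SANDBOX_SKIP_REASON,
#   )
```

With a gate like this, a plain `uv run pytest` collects the sandbox modules but skips them, so the default run never touches the Windows ACL subsystem.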
+ +**FR9.4:** `tests/test_tier2_sandbox_enforcement.py` (new) — +**opt-in** (`TIER2_SANDBOX_TESTS=1`). The critical test: spawns the +wrapper in a subprocess, inside the sandboxed context attempts +each banned operation, verifies each is denied. + +**FR9.5:** `tests/test_tier2_report_writer.py` (new) — **opt-in** +(`TIER2_SANDBOX_TESTS=1`). Invokes failcount until give-up, +verifies the report file is created at the right path with the +right 7 sections. + +**FR9.6:** `tests/test_tier2_smoke_e2e.py` (new) — **opt-in** +(`TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1`). Runs the full pipeline +against a fixture workspace: bootstrap → invoke the CLI entry +point → verify the feature branch exists with 1 commit → verify +the report file is NOT created (success path). + +## 5. Non-Functional Requirements + +**NFR1. Performance:** the failcount module adds <1ms per check. +The slash command's protocol adds <500ms to a typical Tier 2 task +(spec fetch + branch creation + state init). + +**NFR2. Reliability:** the failcount state is persisted after every +commit. A killed run can be resumed (or refused to resume) on the +next invocation. The state file uses atomic write (write to +`state.json.tmp` + `os.replace`) to survive crashes mid-write. + +**NFR3. Security:** +- The 4 git bans are enforced at 3 independent layers (OpenCode + permission system, Windows OS-level via restricted token, git + hooks). A bypass of one layer is caught by another. +- The filesystem boundary is enforced at 2 independent layers + (OpenCode path allowlist, Windows ACLs). +- The Tier 2 process tree is wrapped in a Job Object that + prevents child process escape. + +**NFR4. Testability:** +- The failcount module is pure logic, 100% unit-testable without + any infrastructure. +- The slash command's protocol is duplicated in + `scripts/tier2/run_track.py` (CLI entry point) so the smoke e2e + test runs without an OpenCode session. 
+- All sandbox / bootstrap / smoke tests are env-var-gated + (`TIER2_SANDBOX_TESTS=1`, `TIER2_SMOKE=1`). + +**NFR5. Auditability:** every Tier 2 run writes to +`C:\Users\Ed\AppData\Local\manual_slop\tier2\\state.json` +and (on give-up) `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\_.md`. +The user can inspect the state at any time. + +**NFR6. UX:** the user clicks zero times during a normal Tier 2 +run. The "did Tier 2 give up?" check is passive (an OpenCode +banner, an optional Windows toast, and a flag file the user can +check on next Tier 1 session start). + +**NFR7. Backward compatibility:** the main repo's `opencode.json` +is not modified. Tier 1 retains its `permission: ask` workflow. +The new agent profile (`tier2-autonomous`) is in the Tier 2 clone +only. The new slash command is in the Tier 2 clone only. + +## 6. Architecture Reference + +**This track's design follows these existing patterns:** + +- **`docs/guide_architecture.md`** §"Threading model" — the + Tier 2 process tree runs in its own Job Object, isolated from + the user's main session. +- **`docs/guide_mma.md`** §"Tier 2/3/4 lifecycles" — the Tier 2 + Tech Lead's existing delegation patterns (Task tool to + `@tier3-worker`, `@tier4-qa`) are preserved in the autonomous + mode. +- **`docs/guide_meta_boundary.md`** — this track is squarely in + the "Meta-Tooling" environment (it builds execution infrastructure + for the agents), not the "Application" environment. No changes + to `src/*.py`. +- **`docs/guide_testing.md`** §"Authoring robust live_gui tests" + + the `live_gui` session-scoped pattern — the smoke e2e test + follows the same opt-in env-var-gated pattern. +- **`conductor/code_styleguides/python.md`** — 1-space indentation, + CRLF line endings, no comments, strict type hints. All new Python + code in this track follows this styleguide. 
+- **`conductor/code_styleguides/error_handling.md`** — the + failcount module uses `Result[T, ErrorInfo]` per the convention + (the 3 refactored baseline files use it; the convention is being + rolled out across the codebase per + `data_oriented_error_handling_20260606` + the upcoming + `result_migration_20260616` sub-tracks). + +**This track's NEW patterns (the contribution to the codebase):** + +- **Sibling clone as execution mode switch** — opening OpenCode in + a different directory IS the mode switch (no `mode:` flag in + `opencode.json`, no env var, just a directory). +- **3-layer enforcement stack** — OpenCode permission system + + Windows restricted token + git hooks. Documented in + `docs/guide_tier2_autonomous.md` (this track's new guide). +- **Bounded autonomous run with fail-loud** — the failcount module + is a general-purpose "I'm stuck" detector, applicable to any + future autonomous run (not just Tier 2). The pattern is + reusable for any sub-agent that has a contract to follow. + +## 7. Out of Scope + +- **No changes to the Manual Slop app (`src/*.py`).** This is + meta-tooling, not the app. The 4 audit scripts + (`audit_exception_handling.py`, `audit_weak_types.py`, + `audit_main_thread_imports.py`, `audit_no_models_config_io.py`) + are not modified. +- **No changes to the main repo's `opencode.json` or MMA agent + profiles.** The new `tier2-autonomous` profile lives in the + Tier 2 clone only. +- **No new top-level `src/.py` files.** Per the file-naming + convention (`AGENTS.md` §"File Size and Naming Convention"), the + new code is in `scripts/tier2/`, `conductor/tier2/`, and `tests/` + (all namespace-isolated by directory). +- **No changes to existing tracks or in-flight work.** The + `result_migration_20260616` umbrella track, the + `data_oriented_error_handling_20260606` track, and the + `exception_handling_audit_20260616` track are not affected. +- **No new audit script.** The failcount thresholds are TOML config, + not statically checkable. 
If a future track adds a checkable + convention (e.g., "all CLI entry points must use Result[T]"), + the new audit script should follow the + `scripts/audit_.py` pattern from the existing 4. +- **No WSL2 / Docker / Windows Sandbox variants.** The user + approved Approach 1 (OpenCode + Windows restricted token + git + hooks, all native Windows). WSL2 was considered and deferred; + the failure to run Dear PyGui/ImGui tests in WSL2 was the + deciding factor. +- **No parallel Tier 2 runs.** The Tier 2 clone is a single + workspace. Two parallel Tier 2 runs would conflict on the + feature branch. If parallel runs become a need, that's a + follow-up track. +- **No `git push` to non-origin remotes.** Even though the deny + rule is `git push*` (any push), the practical use case is + "Tier 2 doesn't push at all; the user pushes after review." + Adding a "push to a tier2-remote bare dir" workflow is a + follow-up if needed. +- **No automated review of the feature branch.** Tier 1 reviewing + Tier 2's branch is a future track (out of scope here). + +--- + +**Spec ends.** The implementation plan (`plan.md` + `metadata.json`) +will be written by the `writing-plans` skill in the next phase, after +the user reviews this spec.