conductor(spec): Tier 2 autonomous sandbox track spec

2026-06-16 18:31:48 -04:00
parent 88e44d1c0e
commit 024938bd46
@@ -0,0 +1,614 @@
# Track Specification: Tier 2 Autonomous Sandbox (unattended track execution with bounded blast radius)
**Track ID:** `tier2_autonomous_sandbox_20260616`
**Status:** Planned (spec pending user review)
**Priority:** A (user-blocking; eliminates the manual `permission: ask` bottleneck for well-regularized tracks)
**Owner:** Tier 2 Tech Lead (per `conductor/workflow.md`)
**Type:** feature (meta-tooling — adds a new execution mode to the existing MMA workflow, not to the Manual Slop app itself)
**Scope:** ~7 new files in main repo + 1 sibling clone at `C:\projects\manual_slop_tier2\` (one-time bootstrap)
**Parent tracks:** `opencode_config_overhaul_20260310` (shipped; established the agent profile scaffolding this track extends)
**Sibling tracks:** none (independent)
> **Note on effort estimates:** per the Tier 1 rules (see
> `conductor/workflow.md` §"Tier 1 Track Initialization Rules"), this
> spec does NOT include day estimates. Effort is measured by **scope** (N
> files, M sites) and **T-shirt size** (S/M/L/XL). The user / Tier 2
> agent decides the actual pacing.
---
## 0. TL;DR
This track adds an **unattended execution mode** for Tier 2: you open
OpenCode in a sibling clone (`C:\projects\manual_slop_tier2\`), type
`/tier-2-auto-execute <track-name>`, and Tier 2 runs the track
autonomously — **no `permission: ask` prompts** — while a **3-layer
defense-in-depth** enforcement stack prevents it from touching the
filesystem outside its clone + an app-data temp dir, and from running
destructive git operations (`git restore`, `git push*`, `git checkout`,
`git reset`). If Tier 2 can't make progress (3 red-phase failures, 3
green-phase failures, or 30 minutes with no commit/green), it stops
early, writes a failure report, and notifies you. You review the
feature branch with Tier 1 in the main repo, then merge.
**T-shirt size: L.** 7 new files in main repo (mostly config +
scripts + 1 small Python module), 6 new test files (see §4.9), 1 PowerShell
wrapper, 1 bootstrap script, 1 user guide. ~600 lines of new code.
---
## 1. Overview
### 1.1 The State Before This Track (as of `88e44d1c`)
The current OpenCode configuration has these properties:
- **One repo, two modes via agent profile.** `opencode.json:11` sets
`default_agent: "tier2-tech-lead"`. Tier 1 and Tier 2 are
distinguished by which agent profile the user selects in the OpenCode
session, not by which directory they're in.
- **Permission bottleneck on Tier 2.** `.opencode/agents/tier2-tech-lead.md:6-9`
sets `permission: { edit: "ask", bash: "ask", 'manual-slop_*': allow }`.
Every `edit` and every `bash` call from Tier 2 prompts the user for
approval. For well-regularized tracks (TDD red/green/refactor with
atomic per-task commits, e.g., the upcoming `result_migration_*`
tracks), this is **noise** — the user has already pre-approved the
track plan, and the per-task approval doesn't add safety, it just
adds 50+ clicks per track.
- **No filesystem boundary enforcement.** Tier 2 has the same
filesystem access as the user. There is nothing preventing Tier 2 (or
a delegated Tier 3 worker) from reading `C:\Users\Ed\.aws\credentials`
or writing to a different project entirely.
- **No git ban enforcement.** Nothing prevents Tier 2 from running
`git restore`, `git push origin`, `git checkout -- <file>`, or
`git reset --hard`. These are the four operations the user has
called out as "destructive to its progress or affects the origin
server" in the original ask.
- **No failure threshold / give-up mechanism.** A stuck Tier 2 runs
until the user notices or the agent self-terminates. There is no
"3 red-phase attempts without progress → stop and write a report"
guardrail.
- **One OpenCode session at a time.** The main repo's OpenCode session
is the only execution environment. Tier 2 cannot run in parallel with
Tier 1 review.
### 1.2 The Goal
Add a **second execution mode** for Tier 2 that is:
- **Autonomous** — no `permission: ask` prompts for `edit` or `bash`
- **Sandboxed** — file access is restricted to the Tier 2 clone + an
app-data temp dir, enforced at 3 independent layers (OpenCode
permission system, Windows restricted token + ACLs, git hooks)
- **Bounded** — a one-shot run with a failure threshold; stuck runs
stop early and write a report
- **Reviewable** — the run produces a feature branch in the clone;
the user fetches it back to main and reviews with Tier 1
- **Opt-in to the app's test suite** — the sandbox / bootstrap / smoke
tests are env-var-gated so the default `uv run pytest` run stays
app-focused and fast
The existing configuration of the main repo (the Tier 1 control plane)
is **not modified**: `opencode.json` stays the same (Tier 1 still has
`permission: ask`), and the existing MMA agents stay the same.
### 1.3 What the User Experiences
**One-time bootstrap (the user runs once):**
```powershell
cd C:\projects\manual_slop
pwsh scripts/tier2/setup_tier2_clone.ps1
```
**Per-track invocation (the user's normal flow from now on):**
1. `cd C:\projects\manual_slop_tier2`
2. Open OpenCode in that directory (the "Tier 2 Sandboxed" desktop
shortcut the bootstrap created)
3. In the OpenCode session, type:
```
/tier-2-auto-execute result_migration_review_pass
```
4. Tier 2 fetches the spec, creates `tier2/result_migration_review_pass`
branch, runs the plan, commits per task
5. On success: prints a summary. On give-up: writes a failure report
and prints its path.
6. `cd C:\projects\manual_slop` (back to main)
7. `git fetch C:/projects/manual_slop_tier2 tier2/result_migration_review_pass`
8. Review the diff with Tier 1 (interactive)
9. `git merge --no-ff tier2/result_migration_review_pass` to main
**No `permission: ask` prompts in step 4.** If a Tier 2 tool call
attempts a banned operation, the OpenCode permission system denies it;
if a delegated Tier 3 worker tries to escape via a Python subprocess,
the Windows ACLs deny it; if a `git push` somehow slips through, the
pre-push hook blocks it. **Three independent layers, all enforcing the
same ban list.**
---
## 2. Current State Audit (as of `88e44d1c`)
### 2.1 Already Implemented (DO NOT re-implement)
- **OpenCode agent profile scaffolding** —
`.opencode/agents/tier{1,2,3,4}-*.md:1-200` and the
`opencode.json:1-50` config file. The `tier2-autonomous` agent
profile this track adds follows the same pattern.
- **Slash command pattern** — `.opencode/commands/conductor-implement.md:1-100`
is the existing pattern for slash commands. The
`tier-2-auto-execute.md` command follows the same structure (front
matter `agent:` and `description:`, markdown body with protocol).
- **Conductor track convention** — `conductor/tracks/<id>/{spec,plan}.md`
and `metadata.json` per `conductor/workflow.md` "State.toml
Template" + "Track Dependencies and Execution Order" sections. This
track's artifacts follow that pattern.
- **Project-level test opt-in convention** — the `live_gui` fixture
in `tests/conftest.py` and the existing env-var-gated tests (e.g.,
the `RUN_LIVE_GUI=1` pattern in `tests/test_live_*.py`). The
`TIER2_SANDBOX_TESTS=1` opt-in gate for this track's sandbox tests
follows the same shape.
- **PowerShell-based tooling** — `scripts/` already contains
PowerShell-adjacent Python scripts. The new wrapper is a pure
PowerShell script, consistent with `pywin32`-based operations on
Windows.
- **`scripts/audit_*.py` pattern** — the 4 existing audit scripts
(`audit_exception_handling.py`, `audit_weak_types.py`,
`audit_main_thread_imports.py`, `audit_no_models_config_io.py`) are
the project's enforcement mechanism. This track does not introduce
a new audit (the failcount thresholds are TOML-config, not
statically checkable), but follows the `scripts/audit_<name>.py`
naming for any future addition.
### 2.2 Gaps to Fill (This Track's Scope)
**Gap 1: A second clone as the Tier 2 execution environment.**
The main repo (`C:\projects\manual_slop\`) currently doubles as both
the Tier 1 control plane and the Tier 2 execution environment. The
fix is a sibling clone at `C:\projects\manual_slop_tier2\` with
`origin` set to the main repo's local path (no remote). The clone is
where the feature branch lives; the user fetches the branch back into
main for review.
**Gap 2: A `tier2-autonomous` agent profile with deny rules.**
The existing `tier2-tech-lead` agent has `permission: ask` for `edit`
and `bash`. The fix is a new `tier2-autonomous` agent profile (in the
Tier 2 clone's `opencode.json`) with:
- `permission.edit: allow`
- `permission.bash: { "*": "allow", "git push*": "deny",
"git checkout*": "deny", "git restore*": "deny", "git reset*": "deny" }`
- `permission.read` / `permission.write` restricted to the Tier 2
clone + `C:\Users\Ed\AppData\Local\manual_slop\tier2\`
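To make the intent of the bash rule table concrete, here is a minimal sketch of how such a table could be evaluated. It assumes shell-style glob matching and a "deny overrides allow" policy; both are assumptions for illustration, not a description of OpenCode's actual matcher, and `decide` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The rule table from Gap 2, copied verbatim.
BASH_RULES = {
    "*": "allow",
    "git push*": "deny",
    "git checkout*": "deny",
    "git restore*": "deny",
    "git reset*": "deny",
}


def decide(command: str, rules: dict[str, str] = BASH_RULES) -> str:
    """Return "deny", "allow", or "ask" for a bash command line.

    Assumption: patterns are case-sensitive globs and any matching
    deny rule wins over any matching allow rule.
    """
    # Collapse runs of whitespace so "git   push" still matches "git push*".
    normalized = " ".join(command.split())
    actions = [
        action
        for pattern, action in rules.items()
        if fnmatchcase(normalized, pattern)
    ]
    if "deny" in actions:
        return "deny"
    return "allow" if "allow" in actions else "ask"
```

Under these assumptions `decide("git push origin main")` is denied, while `decide("git -C . push origin")` is allowed because the prefix pattern no longer matches. Prefix patterns alone are easy to sidestep, which is one reason the design adds the OS-level and git-hook layers.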
**Gap 3: A sandboxed launcher (Windows restricted token + ACLs).**
OpenCode's permission system only governs tool calls issued by the
OpenCode process itself. A determined Tier 3 worker calling
`os.system("...")` from a delegated Python script could in principle
bypass it. The fix is a PowerShell wrapper that:
- Acquires a Windows restricted token (drops `SeBackupPrivilege`,
`SeRestorePrivilege`, `SeTakeOwnershipPrivilege`, `SeDebugPrivilege`,
`SeLoadDriverPrivilege`)
- Sets explicit ACLs on the Tier 2 clone + app-data temp dir (allow
the restricted token, deny everything else)
- Wraps the process tree in a Job Object (no breakaway)
- Launches OpenCode + the MCP server under the restricted token via
`CreateProcessWithTokenW`
**Gap 4: A `tier-2-auto-execute` slash command.**
The existing slash commands are conductor-style ("start
implementation", "create track"). The new slash command takes a
`<track-name>` argument, fetches the spec from `origin/main`, creates
a `tier2/<track-name>` branch via `git switch -c` (NOT `git checkout`),
runs the plan via Tier 2, monitors the failcount, and reports back.
**Gap 5: A failure threshold + give-up mechanism (`failcount.py`).**
The current Tier 2 has no built-in "I can't make progress" detection.
A stuck agent burns tokens until the user notices. The fix is a pure
Python module that tracks three orthogonal signals:
- `red_phase_failures` (3 = give up)
- `green_phase_failures` (3 = give up)
- `no_progress_minutes` (30 = give up)
Whichever signal hits its threshold first triggers give-up. The
module is pure logic, fully unit-testable, with a TOML config for
threshold overrides.
**Gap 6: A failure report writer + flag file + notification.**
When give-up fires, the system needs to:
- Write a markdown report to
`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<utc-timestamp>.md`
with: header, tasks completed, current task state, last 3 failures,
failcount state, git log, recommendation
- Create a `.STOPPED` flag file alongside the report
- Print a clear "TRACK ABORTED" banner in the OpenCode session with
the report path
- Optionally: Windows toast notification (opt-in via `--toast` flag)
**Gap 7: Git hooks as defense-in-depth (Layer 3).**
The OpenCode permission system is the primary enforcement for git bans.
A pre-push hook (`pre-push` in the clone's `.git/hooks/`) is the
backup that catches `git push origin*` even if the OpenCode deny rule
is somehow misconfigured. A `post-checkout` hook logs any checkout of
tracked files to a detection log.
**Gap 8: A user guide for bootstrap + invocation + manual verification.**
The user needs to know:
- How to run the bootstrap once
- How to invoke the slash command
- What the failure report looks like
- How to review and merge the feature branch
- How to manually verify the sandbox blocks the banned operations
---
## 3. Goals
- **Eliminate the `permission: ask` bottleneck** for well-regularized
tracks. The user clicks zero times during a normal Tier 2 run
(excluding the "did Tier 2 give up?" check at the end).
- **Enforce the 4 hard git bans** (`git restore`, `git push*`,
`git checkout`, `git reset`) at 3 independent layers (OpenCode,
Windows OS, git hooks). A bypass of one layer is caught by another.
- **Enforce the filesystem boundary** (Tier 2 clone + app-data temp
only) at 2 independent layers (OpenCode path allowlist, Windows
ACLs). Even a delegated Python subprocess can't read outside the
allowlist.
- **Bound the blast radius** with a failure threshold. A stuck Tier 2
stops within ~30 minutes and writes a report, instead of running
indefinitely.
- **Keep the default test run app-focused.** All sandbox/bootstrap/
smoke tests are env-var-gated; `uv run pytest` with no env vars
stays fast and never touches the Windows ACL subsystem.
- **Keep Tier 1 unchanged.** The main repo's `opencode.json` is not
modified. Tier 1 retains its `permission: ask` workflow.
## 4. Functional Requirements
### 4.1 Bootstrap (one-time, user-driven)
**FR1.1:** `scripts/tier2/setup_tier2_clone.ps1` (new) clones the
main repo to `C:\projects\manual_slop_tier2\`, sets
`origin = C:\projects\manual_slop`, copies the agent/command/
opencode.json templates to the clone, installs the git hooks into
the clone's `.git/hooks/`, creates the app-data temp dir
`C:\Users\Ed\AppData\Local\manual_slop\tier2\` with restricted ACLs,
and creates a "Tier 2 (Sandboxed)" desktop shortcut.
**FR1.2:** The bootstrap is idempotent — re-running it does not
destroy an existing clone's feature branches (it `git fetch origin`
and pulls the latest templates, but does not `git reset` the clone).
**FR1.3:** The bootstrap dry-run mode (`-WhatIf`) shows what would
happen without making changes. Required for safety.
### 4.2 The tier2-autonomous agent profile
**FR2.1:** `.opencode/agents/tier2-autonomous.md` (template) in main
repo; copied to Tier 2 clone during bootstrap. Defines the
autonomous-mode agent with the deny rules in §2.2 Gap 2.
**FR2.2:** The agent's `temperature: 0.4` (matches Tier 2 Tech Lead).
The agent uses `git switch -c <branch>` for new branches and
`git switch <branch>` for switching — `git checkout` is banned
project-wide.
**FR2.3:** The agent prompt includes the failcount monitoring
contract: "After each task commit, check
`<app-data>/tier2/<track>/state.json` via the failcount module. If
`should_give_up` returns true, write the failure report and stop."
### 4.3 The sandboxed launcher
**FR3.1:** `scripts/tier2/run_tier2_sandboxed.ps1` (new) is the
entry point that opens OpenCode in the Tier 2 clone under a
restricted token.
**FR3.2:** The wrapper acquires a restricted token by calling the
Win32 `CreateRestrictedToken` API (via P/Invoke from PowerShell/.NET),
sets ACLs on the Tier 2 clone + app-data dir to grant the restricted
token read/write, wraps the process tree in a Job Object, and launches
OpenCode + the MCP server under the restricted token via
`CreateProcessWithTokenW`.
**FR3.3:** The wrapper is the target of the "Tier 2 (Sandboxed)"
desktop shortcut created during bootstrap. Right-click → Properties
shows the command: `pwsh -File C:\projects\manual_slop\scripts\tier2\run_tier2_sandboxed.ps1`.
### 4.4 The slash command
**FR4.1:** `.opencode/commands/tier-2-auto-execute.md` (template) in
main repo; copied to Tier 2 clone during bootstrap. Takes a
required `<track-name>` argument.
**FR4.2:** The slash command:
1. Reads `conductor/tracks/<track-name>/spec.md` + `plan.md` from
`origin/main` (after a `git fetch origin main`)
2. Creates a `tier2/<track-name>` branch via
`git switch -c tier2/<track-name> origin/main`
3. Initializes the failcount state file at
`<app-data>/tier2/<track-name>/state.json`
4. Delegates the plan to the tier2-autonomous agent
5. After each task commit, checks failcount; on give-up, writes the
report and stops
6. On success, prints a summary (branch name, N commits, M tasks)
**FR4.3:** The slash command's protocol is duplicated in a CLI
entry point (`scripts/tier2/run_track.py`) so the smoke e2e test
can invoke the same logic without spinning up an OpenCode session.
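As an illustration of how the shared protocol could be factored for both the slash command and the CLI entry point, the sketch below builds the git commands for steps 1 and 2 as argument lists and checks them against the ban list before anything runs. The helper names and the track-name character set are assumptions, not part of the spec.

```python
import re

# The four git subcommands banned in autonomous mode.
BANNED_SUBCOMMANDS = frozenset({"push", "checkout", "restore", "reset"})

# Assumed track-name charset (letters, digits, underscore, hyphen).
_TRACK_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def is_banned(argv: list[str]) -> bool:
    """True if argv is a direct invocation of a banned git subcommand."""
    return len(argv) > 1 and argv[0] == "git" and argv[1] in BANNED_SUBCOMMANDS


def branch_setup_commands(track_name: str) -> list[list[str]]:
    """Build the fetch + branch-creation commands for a track."""
    if not _TRACK_NAME.fullmatch(track_name):
        raise ValueError(f"invalid track name: {track_name!r}")
    commands = [
        ["git", "fetch", "origin", "main"],
        # `git switch -c`, never `git checkout -b`.
        ["git", "switch", "-c", f"tier2/{track_name}", "origin/main"],
    ]
    # Guard against a future edit that slips a banned command into the plan.
    for argv in commands:
        if is_banned(argv):
            raise RuntimeError(f"banned git command in plan: {argv}")
    return commands
```

Returning argument lists rather than shell strings keeps the plan inspectable in a unit test and avoids shell quoting issues when the commands are later passed to `subprocess.run`.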
**FR4.4:** The slash command supports `--resume` to continue a track
that previously gave up, starting from the last completed task (the
state is in the state.json file). Default behavior: refuse to resume
and ask for explicit confirmation.
### 4.5 The failcount module
**FR5.1:** `scripts/tier2/failcount.py` (new) is a pure-Python module
with no external deps. Exposes:
- `class FailcountState` — the signal state dataclass
- `class FailcountConfig` — threshold loader (from TOML or defaults)
- `def should_give_up(state: FailcountState, config: FailcountConfig,
now: datetime) -> Result[bool, ErrorInfo]`
- `def record_red_failure(state: FailcountState) -> FailcountState`
- `def record_green_failure(state: FailcountState) -> FailcountState`
- `def record_green_success(state: FailcountState,
now: datetime) -> FailcountState` (resets no_progress)
- `def record_commit(state: FailcountState,
now: datetime) -> FailcountState` (resets no_progress)
- `def to_dict(state) -> dict`, `def from_dict(d) -> FailcountState`
- `def load_state(track_name: str) -> Result[FailcountState, ErrorInfo]`
- `def save_state(track_name: str, state: FailcountState) -> Result[None, ErrorInfo]`
**FR5.2:** Default thresholds (override via `failcount.toml`):
- `red_phase_threshold: 3`
- `green_phase_threshold: 3`
- `no_progress_minutes: 30`
**FR5.3:** `should_give_up` returns `True` if ANY signal hits its
threshold. The `now` parameter is injectable for testing.
**FR5.4:** `record_green_success` and `record_commit` both reset the
`no_progress_minutes` timer. `record_commit` does NOT reset the
red/green failure counters. `record_green_success` additionally resets
the red-phase counter: a run of red failures is cleared by a green test
that eventually passes (see `test_green_success_resets_red_counter` in
FR9.1).
### 4.6 The failure report writer
**FR6.1:** `scripts/tier2/write_report.py` (new) takes a track name,
branch name, state, and a list of `TaskResult` records, and writes
the markdown report to
`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<utc-timestamp>.md`.
**FR6.2:** The report contains the 7 sections in order:
1. Header (track, branch, started-at, stopped-at, duration, give-up signal)
2. Tasks completed (list with task IDs, commit SHAs, summaries)
3. Current task state (where it stopped: task ID, phase, worker output, test failure)
4. Last 3 failures (truncated to 50 lines, full output in `..._full.log`)
5. Failcount state at give-up
6. Git state (`git log --oneline tier2/<track> ^origin/main`)
7. Recommendation (heuristic-based: "track too complex", "spec needs clearer plan", "external dependency missing", "review carefully")
**FR6.3:** A `.STOPPED` flag file is created at
`C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>.STOPPED`.
**FR6.4:** The report writer returns the report path on success
(via `Result[str, ErrorInfo]`).
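One possible shape for the writer is sketched below. The timestamp format, the report title line, the flag-file contents, and the `bodies` mapping are all assumptions made for illustration. The sketch returns a plain `Path` rather than the project's `Result[str, ErrorInfo]`.

```python
from datetime import datetime, timezone
from pathlib import Path

# The 7 sections from FR6.2, in order.
REPORT_SECTIONS = (
    "Header",
    "Tasks completed",
    "Current task state",
    "Last 3 failures",
    "Failcount state at give-up",
    "Git state",
    "Recommendation",
)


def report_paths(
    failures_dir: Path, track: str, stopped_at: datetime
) -> tuple[Path, Path]:
    """Return (report path, .STOPPED flag path) for a track."""
    # Assumed timestamp format; stopped_at must be timezone-aware.
    stamp = stopped_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return failures_dir / f"{track}_{stamp}.md", failures_dir / f"{track}.STOPPED"


def write_report(
    failures_dir: Path, track: str, stopped_at: datetime, bodies: dict[str, str]
) -> Path:
    """Write the markdown report and the flag file; return the report path."""
    report, flag = report_paths(failures_dir, track, stopped_at)
    failures_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"# Tier 2 failure report: {track}", ""]
    for number, title in enumerate(REPORT_SECTIONS, start=1):
        # Sections with no supplied body still appear, so the layout is stable.
        lines += [f"## {number}. {title}", "", bodies.get(title, "(none)"), ""]
    report.write_text("\n".join(lines), encoding="utf-8")
    # Assumption: the flag file stores the report path for easy lookup.
    flag.write_text(str(report), encoding="utf-8")
    return report
```

Keeping the section titles in one tuple lets a test like `tests/test_tier2_report_writer.py` assert the exact section order without duplicating the list.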
### 4.7 The git hooks (Layer 3)
**FR7.1:** `conductor/tier2/githooks/pre-push` (template) is a
shell/PowerShell script that refuses `git push` invocations to any
remote. The script returns exit code 1 with the message
"Tier 2 autonomous mode: `git push` is disabled. Push the branch
manually from the main repo after review."
**FR7.2:** `conductor/tier2/githooks/post-checkout` (template) is a
detection-only hook that logs any checkout of tracked files to
`C:\Users\Ed\AppData\Local\manual_slop\tier2\tier2_checkout_log.txt`
with a timestamp, the commit hash, and the affected paths.
**FR7.3:** The bootstrap script copies both hooks to the Tier 2
clone's `.git/hooks/` and makes them runnable: `chmod +x` on
Linux/WSL, or, on Windows (where NTFS has no POSIX executable bit),
granting read-and-execute rights via `icacls`.
### 4.8 The user guide
**FR8.1:** `docs/guide_tier2_autonomous.md` (new) covers:
- Why this exists (the `permission: ask` bottleneck)
- One-time bootstrap procedure (with `-WhatIf` instructions)
- Per-track invocation procedure
- The slash command arguments (`<track-name>`, `--resume`, `--toast`)
- The failure report layout (with screenshot/example)
- How to review and merge the feature branch
- The "Verify the sandbox" checklist (manual verification)
- Troubleshooting (common errors: origin not set, hooks not
executable, failcount.toml missing)
**FR8.2:** The guide includes a "Verify the sandbox" section that
walks the user through attempting each banned operation manually
and confirming the denial. This is the user-driven checklist from
the design.
### 4.9 The test suite (partly opt-in)
**FR9.1:** `tests/test_failcount.py` (new) — **default-on**. Unit
tests for the failure threshold module. The full test inventory:
- `test_initial_state_zero`
- `test_red_phase_failure_increments`
- `test_green_success_resets_red_counter`
- `test_green_phase_failure_increments`
- `test_no_progress_advances`
- `test_no_progress_resets_on_commit`
- `test_no_progress_resets_on_green`
- `test_threshold_fires_at_three`
- `test_threshold_does_not_fire_at_two`
- `test_multi_signal_independence`
- `test_any_signal_triggers`
- `test_state_persistence_round_trip`
- `test_configurable_thresholds`
Target: 100% line + branch coverage on `failcount.py`.
**FR9.2:** `tests/test_tier2_slash_command_spec.py` (new) — **default-on**.
Loads the slash command markdown, verifies its protocol contract
(argument parsing, git commands, failcount check, report writing).
**FR9.3:** `tests/test_tier2_setup_bootstrap.py` (new) — **opt-in**
(`TIER2_SANDBOX_TESTS=1`). Runs `setup_tier2_clone.ps1` against a
fixture workspace, verifies the side effects (clone exists, origin
set, templates copied, hooks installed, app-data dir created with
ACLs).
**FR9.4:** `tests/test_tier2_sandbox_enforcement.py` (new) —
**opt-in** (`TIER2_SANDBOX_TESTS=1`). The critical test: spawns the
wrapper in a subprocess, inside the sandboxed context attempts
each banned operation, verifies each is denied.
**FR9.5:** `tests/test_tier2_report_writer.py` (new) — **opt-in**
(`TIER2_SANDBOX_TESTS=1`). Invokes failcount until give-up,
verifies the report file is created at the right path with the
right 7 sections.
**FR9.6:** `tests/test_tier2_smoke_e2e.py` (new) — **opt-in**
(`TIER2_SANDBOX_TESTS=1 TIER2_SMOKE=1`). Runs the full pipeline
against a fixture workspace: bootstrap → invoke the CLI entry
point → verify the feature branch exists with 1 commit → verify
the report file is NOT created (success path).
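The opt-in gating described in FR9.3 to FR9.6 could be centralized in a small helper like the one below. It is a sketch: the helper name and the wording of the skip message are assumptions, and only the environment variable names come from this spec.

```python
import os
from typing import Mapping, Optional

# Gate sets used by the opt-in test modules.
SANDBOX = ("TIER2_SANDBOX_TESTS",)
SMOKE = ("TIER2_SANDBOX_TESTS", "TIER2_SMOKE")


def skip_reason(
    required: tuple[str, ...], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return a skip message if any required variable is not "1", else None."""
    missing = [name for name in required if env.get(name) != "1"]
    if missing:
        return "set " + " ".join(f"{name}=1" for name in missing) + " to run"
    return None
```

A test module would then declare something like `pytestmark = pytest.mark.skipif(skip_reason(SMOKE) is not None, reason=skip_reason(SMOKE) or "")` at module level, so a default `uv run pytest` skips the whole file and reports which variables enable it.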
## 5. Non-Functional Requirements
**NFR1. Performance:** the failcount module adds <1ms per check.
The slash command's protocol adds <500ms to a typical Tier 2 task
(spec fetch + branch creation + state init).
**NFR2. Reliability:** the failcount state is persisted after every
commit. A killed run can be resumed (or refused to resume) on the
next invocation. The state file uses atomic write (write to
`state.json.tmp` + `os.replace`) to survive crashes mid-write.
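The write-then-replace pattern named above can be sketched as follows. The function name and the JSON formatting are illustrative choices. The `fsync` call is an extra durability step that the spec does not require.

```python
import json
import os
from pathlib import Path


def save_state_atomic(path: Path, state: dict) -> None:
    """Persist state so readers see either the old file or the new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sibling temp file (state.json.tmp) on the same volume as the target.
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as handle:
        json.dump(state, handle, sort_keys=True)
        handle.flush()
        # Push the bytes to disk before the rename makes them visible.
        os.fsync(handle.fileno())
    # os.replace overwrites an existing destination on both POSIX and Windows.
    os.replace(tmp, path)
```

If the process dies before `os.replace`, the previous `state.json` is untouched and only a stale `state.json.tmp` remains, which the next successful save overwrites.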
**NFR3. Security:**
- The 4 git bans are enforced at 3 independent layers (OpenCode
permission system, Windows OS-level via restricted token, git
hooks). A bypass of one layer is caught by another.
- The filesystem boundary is enforced at 2 independent layers
(OpenCode path allowlist, Windows ACLs).
- The Tier 2 process tree is wrapped in a Job Object that
prevents child process escape.
**NFR4. Testability:**
- The failcount module is pure logic, 100% unit-testable without
any infrastructure.
- The slash command's protocol is duplicated in
`scripts/tier2/run_track.py` (CLI entry point) so the smoke e2e
test runs without an OpenCode session.
- All sandbox / bootstrap / smoke tests are env-var-gated
(`TIER2_SANDBOX_TESTS=1`, `TIER2_SMOKE=1`).
**NFR5. Auditability:** every Tier 2 run writes to
`C:\Users\Ed\AppData\Local\manual_slop\tier2\<track>\state.json`
and (on give-up) `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\<track>_<timestamp>.md`.
The user can inspect the state at any time.
**NFR6. UX:** the user clicks zero times during a normal Tier 2
run. The "did Tier 2 give up?" check is passive (an OpenCode
banner, an optional Windows toast, and a flag file the user can
check on next Tier 1 session start).
**NFR7. Backward compatibility:** the main repo's `opencode.json`
is not modified. Tier 1 retains its `permission: ask` workflow.
The new agent profile (`tier2-autonomous`) is in the Tier 2 clone
only. The new slash command is in the Tier 2 clone only.
## 6. Architecture Reference
**This track's design follows these existing patterns:**
- **`docs/guide_architecture.md`** §"Threading model" — the
Tier 2 process tree runs in its own Job Object, isolated from
the user's main session.
- **`docs/guide_mma.md`** §"Tier 2/3/4 lifecycles" — the Tier 2
Tech Lead's existing delegation patterns (Task tool to
`@tier3-worker`, `@tier4-qa`) are preserved in the autonomous
mode.
- **`docs/guide_meta_boundary.md`** — this track is squarely in
the "Meta-Tooling" environment (it builds execution infrastructure
for the agents), not the "Application" environment. No changes
to `src/*.py`.
- **`docs/guide_testing.md`** §"Authoring robust live_gui tests"
+ the `live_gui` session-scoped pattern — the smoke e2e test
follows the same opt-in env-var-gated pattern.
- **`conductor/code_styleguides/python.md`** — 1-space indentation,
CRLF line endings, no comments, strict type hints. All new Python
code in this track follows this styleguide.
- **`conductor/code_styleguides/error_handling.md`** — the
failcount module uses `Result[T, ErrorInfo]` per the convention
(the 3 refactored baseline files use it; the convention is being
rolled out across the codebase per
`data_oriented_error_handling_20260606` + the upcoming
`result_migration_20260616` sub-tracks).
**This track's NEW patterns (the contribution to the codebase):**
- **Sibling clone as execution mode switch** — opening OpenCode in
a different directory IS the mode switch (no `mode:` flag in
`opencode.json`, no env var, just a directory).
- **3-layer enforcement stack** — OpenCode permission system +
Windows restricted token + git hooks. Documented in
`docs/guide_tier2_autonomous.md` (this track's new guide).
- **Bounded autonomous run with fail-loud** — the failcount module
is a general-purpose "I'm stuck" detector, applicable to any
future autonomous run (not just Tier 2). The pattern is
reusable for any sub-agent that has a contract to follow.
## 7. Out of Scope
- **No changes to the Manual Slop app (`src/*.py`).** This is
meta-tooling, not the app. The 4 audit scripts
(`audit_exception_handling.py`, `audit_weak_types.py`,
`audit_main_thread_imports.py`, `audit_no_models_config_io.py`)
are not modified.
- **No changes to the main repo's `opencode.json` or MMA agent
profiles.** The new `tier2-autonomous` profile lives in the
Tier 2 clone only.
- **No new top-level `src/<thing>.py` files.** Per the file-naming
convention (`AGENTS.md` §"File Size and Naming Convention"), the
new code is in `scripts/tier2/`, `conductor/tier2/`, and `tests/`
(all namespace-isolated by directory).
- **No changes to existing tracks or in-flight work.** The
`result_migration_20260616` umbrella track, the
`data_oriented_error_handling_20260606` track, and the
`exception_handling_audit_20260616` track are not affected.
- **No new audit script.** The failcount thresholds are TOML config,
not statically checkable. If a future track adds a checkable
convention (e.g., "all CLI entry points must use Result[T]"),
the new audit script should follow the
`scripts/audit_<name>.py` pattern from the existing 4.
- **No WSL2 / Docker / Windows Sandbox variants.** The user
approved Approach 1 (OpenCode + Windows restricted token + git
hooks, all native Windows). WSL2 was considered and deferred;
the failure to run Dear PyGui/ImGui tests in WSL2 was the
deciding factor.
- **No parallel Tier 2 runs.** The Tier 2 clone is a single
workspace. Two parallel Tier 2 runs would conflict on the
feature branch. If parallel runs become a need, that's a
follow-up track.
- **No `git push` to non-origin remotes.** Even though the deny
rule is `git push*` (any push), the practical use case is
"Tier 2 doesn't push at all; the user pushes after review."
Adding a "push to a tier2-remote bare dir" workflow is a
follow-up if needed.
- **No automated review of the feature branch.** Tier 1 reviewing
Tier 2's branch is a future track (out of scope here).
---
**Spec ends.** The implementation plan (`plan.md` + `metadata.json`)
will be written by the `writing-plans` skill in the next phase, after
the user reviews this spec.