From 07a0e66a19dd47e8e21b141789242dfa7836fb8c Mon Sep 17 00:00:00 2001
From: Ed_
Date: Wed, 17 Jun 2026 02:13:29 -0400
Subject: [PATCH] docs(tier2): apply user feedback - 6 workflow conventions

User feedback from the first sandbox run (send_result_to_send_20260616,
2026-06-17) identified 6 conventions Tier 2 must follow. Update the
agent prompt template, slash command template, user guide, and workflow
doc:

1. Test runner: ALWAYS use 'uv run python scripts/run_tests_batched.py'
   (NOT 'uv run pytest'). The batched runner provides tier filtering,
   parallelization (xdist), and a summary table that direct pytest
   lacks.

2. Default branch: this repo uses 'master', not 'main'. The Tier 2
   slash command now does 'git fetch origin master' (was 'origin main').

3. Line endings: preserve existing. This repo has a mix of CRLF and LF;
   a repo-wide LF standardization is a future track.

4. Throw-away scripts: write to 'scripts/tier2/artifacts//', NOT the
   base 'scripts/tier2/' directory. The base is reserved for production
   code; throw-away scripts are kept for archival but isolated
   per-track.

5. End-of-track report: write 'docs/reports/TRACK_COMPLETION_.md' and
   update 'state.toml' to 'status=completed'. The user reads this to
   decide merge. Previously this was implicit; now it's explicit.

6. Run-time expectation: tracks are 1-4 hours. If context runs out,
   Tier 2 notes progress to disk and continues. The --resume flag picks
   up from the last completed task.

Also updated the user guide with a 'Conventions' section and a
troubleshooting entry for the resume flow. The verify-the-sandbox
checklist now uses 'origin master' instead of 'origin main'.
---
 conductor/tier2/agents/tier2-autonomous.md    |  9 +++++
 .../tier2/commands/tier-2-auto-execute.md     | 28 ++++++++++-----
 conductor/workflow.md                         | 34 +++++++++++++++++++
 docs/guide_tier2_autonomous.md                | 20 +++++++++--
 4 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/conductor/tier2/agents/tier2-autonomous.md b/conductor/tier2/agents/tier2-autonomous.md
index 1b541971..cd9f1417 100644
--- a/conductor/tier2/agents/tier2-autonomous.md
+++ b/conductor/tier2/agents/tier2-autonomous.md
@@ -35,6 +35,15 @@ You are running inside a Windows restricted token. The OpenCode permission syste
 - `git reset*` (any form) - do not reset state
 - File access outside the Tier 2 clone + `C:\Users\Ed\AppData\Local\manual_slop\tier2\` - the OS blocks it
 
+## Conventions (MUST follow - added 2026-06-17)
+
+- **Test runner:** ALWAYS use `uv run python scripts/run_tests_batched.py` for test runs. NEVER call `uv run pytest` directly. The batched runner provides tier-based filtering, parallelization (xdist), and a summary table. Direct pytest is slow and bypasses the tiering that the live_gui tests depend on.
+- **Default branch:** this repo uses `master` (not `main`). Always use `origin/master` in `git fetch` and as the base for new branches. Do not assume `main` exists.
+- **Line endings:** preserve existing line endings on edit. This repo has a mix of CRLF and LF (a repo-wide LF standardization is a future track). If the file is CRLF, keep it CRLF. If the file is LF, keep it LF. Do not add CRLF to LF files or strip CRLF from CRLF files.
+- **Throw-away scripts:** write them to `scripts/tier2/artifacts//`, NOT the base `scripts/tier2/` directory. The base directory is reserved for production code that ships with the sandbox (failcount.py, run_track.py, write_report.py, the .ps1 launchers). Throw-away scripts are kept for archival but live in a track-specific subdir so they don't pollute the base.
+- **End-of-track report:** after all tasks complete, you MUST write `docs/reports/TRACK_COMPLETION_.md` (follow the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`) and update `conductor/tracks//state.toml` to `status = "completed"`. This is the handoff document the user reads to decide merge.
+- **Run-time expectation:** tracks are expected to take 1-4 hours. If the model reports it is running out of context or steps, do not stop. Note progress to disk (the failcount state file) and continue. The user expects autonomous runs to complete without manual intervention.
+
 ## Failcount Contract
 
 After every task commit, you MUST check `should_give_up` from `scripts.tier2.failcount`. The state is persisted at `/tier2//state.json`. The thresholds are:
diff --git a/conductor/tier2/commands/tier-2-auto-execute.md b/conductor/tier2/commands/tier-2-auto-execute.md
index ebe4ccd9..c58df5a9 100644
--- a/conductor/tier2/commands/tier-2-auto-execute.md
+++ b/conductor/tier2/commands/tier-2-auto-execute.md
@@ -20,19 +20,29 @@ Optional flags: `--resume` (continue from last completed task), `--toast` (Windo
 
 ## Protocol
 
-1. `git fetch origin main`
-2. `git switch -c tier2/ origin/main` (NOT `git checkout` - it is banned)
+1. `git fetch origin master` (NOTE: this repo uses `master`, not `main`; added 2026-06-17)
+2. `git switch -c tier2/ origin/master` (NOT `git checkout` - it is banned)
 3. Initialize failcount state at `/tier2//state.json` (use `load_state` or fresh state)
 4. For each task in `plan.md`:
    a. Red: delegate test creation to @tier3-worker
-   b. Run tests; if pass unexpectedly, call `record_red_failure` and check `should_give_up`
-   c. Green: delegate implementation to @tier3-worker
-   d. Run tests; if fail, call `record_green_failure` and check `should_give_up`
-   e. On green: `record_commit` and `record_green_success` (resets counters)
-   f. Commit per task with `git add . && git commit -m "..."` and attach git note
-   g. Update `plan.md` with commit SHA
-5. After all tasks complete, print success summary.
+   b. Run tests via `uv run python scripts/run_tests_batched.py` (NEVER `uv run pytest` directly; the batched runner provides tier filtering, parallelization, and the summary table — added 2026-06-17)
+   c. If pass unexpectedly, call `record_red_failure` and check `should_give_up`
+   d. Green: delegate implementation to @tier3-worker
+   e. Run tests via `scripts/run_tests_batched.py`; if fail, call `record_green_failure` and check `should_give_up`
+   f. On green: `record_commit` and `record_green_success` (resets counters)
+   g. Commit per task with `git add && git commit -m "..."` and attach git note
+   h. Update `plan.md` with commit SHA
+5. After all tasks complete, write the end-of-track report (see step 7) and print success summary.
 6. On give-up: call `write_failure_report` from `scripts.tier2.write_report`, print "TRACK ABORTED, see report at ".
+7. **End-of-track report** (added 2026-06-17): on success, write `docs/reports/TRACK_COMPLETION_.md` following the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`. Update `conductor/tracks//state.toml` to `status = "completed"`. The user reads this report to decide merge.
+
+## Conventions (MUST follow - added 2026-06-17)
+
+- **Test runner:** use `uv run python scripts/run_tests_batched.py` (NOT `uv run pytest`)
+- **Default branch:** `master` (this repo never had `main`)
+- **Line endings:** preserve existing (CRLF stays CRLF, LF stays LF)
+- **Throw-away scripts:** write to `scripts/tier2/artifacts//`, NOT the base directory
+- **Run-time expectation:** tracks are 1-4 hours. If context runs out, note progress to disk and continue.
 
 ## Hard Bans (enforced by 3 layers)
 
diff --git a/conductor/workflow.md b/conductor/workflow.md
index 8d9ca336..5a55c4ac 100644
--- a/conductor/workflow.md
+++ b/conductor/workflow.md
@@ -401,6 +401,40 @@ To emulate the 4-Tier MMA Architecture within the standard Conductor extension w
 
 ---
 
+## Tier 2 Autonomous Sandbox (Added 2026-06-16, conventions 2026-06-17)
+
+The Tier 2 autonomous mode is the unattended execution mode for tracks. See `docs/guide_tier2_autonomous.md` for the full user guide. The conventions below are enforced by the Tier 2 agent prompt and slash command template (in `conductor/tier2/agents/tier2-autonomous.md` and `conductor/tier2/commands/tier-2-auto-execute.md`).
+
+### Conventions (MUST follow)
+
+1. **Test runner:** Tier 2 always uses `uv run python scripts/run_tests_batched.py`. NEVER `uv run pytest` directly. The batched runner provides tier-based filtering, parallelization (xdist), and a summary table that direct pytest does not.
+2. **Default branch:** this repo uses `master` (not `main`). When fetching or branching, use `origin/master`. Do not assume `main` exists.
+3. **Line endings:** preserve existing line endings on edit. This repo has a mix of CRLF and LF; repo-wide LF standardization is a future track. For now, do not normalize.
+4. **Throw-away scripts:** Tier 2 writes its working scripts to `scripts/tier2/artifacts//`, NOT the base `scripts/tier2/` directory. The base is reserved for production code (failcount.py, run_track.py, write_report.py, the .ps1 launchers). Throw-away scripts are kept for archival but isolated.
+5. **End-of-track report:** at the end of every track, Tier 2 writes `docs/reports/TRACK_COMPLETION_.md` (follow the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`) and updates `conductor/tracks//state.toml` to `status = "completed"`. The user reads this report to decide merge.
+6. **Run-time expectation:** tracks are 1-4 hours. If the model reports it is running out of context, Tier 2 notes progress to disk (the failcount state file) and continues. The user expects autonomous runs to complete without manual "press continue" intervention. The `--resume` flag picks up from the last completed task.
+
+### Hard bans (3-layer enforcement)
+
+| Ban | Layer 1: OpenCode | Layer 2: OS | Layer 3: git hook |
+|---|---|---|---|
+| `git push*` (any push) | `permission.bash` deny rule | n/a | `pre-push` hook refuses all pushes |
+| `git checkout*` (any form) | `permission.bash` deny rule | n/a | `post-checkout` hook logs the checkout |
+| `git restore*` (any form) | `permission.bash` deny rule | n/a | n/a |
+| `git reset*` (any form) | `permission.bash` deny rule | n/a | n/a |
+| File access outside Tier 2 clone + app-data dir | `permission.read`/`write` path allowlist | Windows restricted token + ACLs | n/a |
+
+### Review and merge workflow (user-side)
+
+After Tier 2 finishes a track (success or give-up):
+
+1. In the **main repo** (not the Tier 2 clone), run `pwsh -File scripts/tier2/fetch_tier2_branch.ps1 -TrackName ` to pull the branch into the main repo as `review/`.
+2. Review the diff with Tier 1 (interactive).
+3. On approval, `git merge --no-ff review/` (or whatever the user prefers).
+4. Push to origin yourself (the sandbox blocks Tier 2 from pushing).
+
+---
+
 ## Known Pitfalls (2026-06-05)
 
 ### HARD BAN: `git checkout -- `, `git restore`, `git reset` (Added 2026-06-10)
diff --git a/docs/guide_tier2_autonomous.md b/docs/guide_tier2_autonomous.md
index a053047c..32fd37db 100644
--- a/docs/guide_tier2_autonomous.md
+++ b/docs/guide_tier2_autonomous.md
@@ -75,19 +75,30 @@ Written to `C:\Users\Ed\AppData\Local\manual_slop\tier2_failures\_ ^origin/main`)
+6. Git state (`git log tier2/ ^origin/master`)
 7. Recommendation (heuristic-based)
 
 A `.STOPPED` flag file is created alongside the report. The main repo can
 check for it on next Tier 1 session start (an opt-in banner).
 
+## Conventions (added 2026-06-17)
+
+These are enforced by the Tier 2 agent prompt. The agent MUST follow them — they're not optional.
+
+- **Test runner:** Tier 2 always uses `uv run python scripts/run_tests_batched.py`. Never `uv run pytest` directly. The batched runner provides tier-based filtering, parallelization (xdist), and a summary table that direct pytest doesn't.
+- **Default branch:** this repo uses `master` (not `main`). When fetching or branching, use `origin/master`. Tier 2 may otherwise get confused by the missing `main` reference.
+- **Line endings:** Tier 2 preserves existing line endings on edit. This repo has a mix of CRLF and LF; standardizing to repo-wide LF is a future track. For now, do not normalize.
+- **Throw-away scripts:** Tier 2 writes its working scripts to `scripts/tier2/artifacts//`, NOT the base `scripts/tier2/` directory. The base directory is reserved for production code. Throw-away scripts are kept for archival but isolated in a track-specific subdir.
+- **End-of-track report:** at the end of every track, Tier 2 writes `docs/reports/TRACK_COMPLETION_.md` (follow the precedent set by `TRACK_COMPLETION_tier2_autonomous_sandbox_20260616.md`) and updates `conductor/tracks//state.toml` to `status = "completed"`. The user reads this report to decide merge.
+- **Run-time expectation:** tracks are expected to take 1-4 hours. If the model reports it is running out of context, Tier 2 notes progress to disk and continues. The user expects autonomous runs to complete without manual "press continue" intervention.
+
 ## Verify the sandbox (manual checklist)
 
 After bootstrap, run these inside the Tier 2 sandboxed OpenCode session to
 verify the bans are enforced:
 
 - [ ] Try `git restore tests/test_failcount.py` — should print "denied"
-- [ ] Try `git push origin main` — should print "denied" (or the pre-push hook fires)
+- [ ] Try `git push origin master` — should print "denied" (or the pre-push hook fires)
 - [ ] Try `git checkout -- src/foo.py` — should print "denied"
 - [ ] Try `git reset --hard HEAD~1` — should print "denied"
 - [ ] Try to read `C:\Users\Ed\Documents\test.txt` (from a Python subprocess) — should print "ACCESS_DENIED"
@@ -112,3 +123,8 @@ And verify allowed operations work:
   `git config core.hooksPath` if you have a custom hooks dir.
 - **"Tier 2 keeps giving up at 30 min"**: increase `no_progress_minutes` in
   `scripts/tier2/failcount.toml`.
+- **"Tier 2 ran out of context"**: the model stopped mid-track. The
+  user (interactive Tier 1) should `cd` to the Tier 2 clone, inspect
+  `/tier2//state.json` for the last completed task,
+  and re-invoke with `/tier-2-auto-execute --resume`
+  to continue. The state file persists across runs.
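
Reviewer note on the line-endings convention (CRLF stays CRLF, LF stays LF): a minimal sketch of one way an edit helper could honor it, assuming UTF-8 text files. The names `detect_newline` and `rewrite_preserving_newlines` are hypothetical and are not part of this patch or of `scripts/tier2/`. A file that mixes both endings internally is written back with its dominant ending, which is a simplification.

```python
from pathlib import Path
from typing import Callable


def detect_newline(data: bytes) -> bytes:
    """Return the dominant line ending in data: CRLF or LF (LF on a tie)."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    return b"\r\n" if crlf > bare_lf else b"\n"


def rewrite_preserving_newlines(path: Path, transform: Callable[[str], str]) -> None:
    """Apply transform to the file's text and write it back with the
    line ending the file already used (hypothetical helper)."""
    raw = path.read_bytes()
    newline = detect_newline(raw).decode("ascii")
    # Work on LF-only text so the transform never has to think about CRLF.
    text = raw.decode("utf-8").replace("\r\n", "\n")
    new_text = transform(text).replace("\r\n", "\n")
    # Read and write bytes so Python's newline translation never kicks in.
    path.write_bytes(new_text.replace("\n", newline).encode("utf-8"))
```

Usage would look like `rewrite_preserving_newlines(Path("docs/example.md"), str.upper)`, where the path is illustrative. Working in bytes on both ends avoids the platform-dependent newline translation that text-mode `open()` applies on Windows.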