10 phases, 29 tasks, all worker-ready (WHERE / WHAT / HOW / SAFETY /
COMMIT / GIT NOTE per task):
Phase 1: Data extraction audit + draft helper script (FR5; TDD)
Phase 2: Generate conductor/chronology.md.draft
Phase 3: Prune [x]/[shipped] entries from conductor/tracks.md (FR2)
Phase 4: Add 3-step archiving convention to conductor/workflow.md (FR3)
Phase 5: Write docs/reports/CHRONOLOGY_MIGRATION_20260619.md (FR4)
Phase 6: User review of draft (GATE)
Phase 7: Promote draft to canonical chronology.md
Phase 8: Per-row cross-check (FR6 HARD GATE; 9 batches of ~20 rows)
Phase 9: Completeness check (FR6 HARD GATE; folder set vs row set)
Phase 10: User sign-off + end-of-track report (FR6 HARD GATE)
The cross-check (Phase 8) is the dominant cost. Per the user directive
2026-06-19, EVERY SINGLE ENTRY must be cross-checked. The plan batches
the work into 9 commits for review ergonomics; no batch is 'sample-based'
or 'looks right' -- each row's 5 fields (date, ID, status, summary,
range) are verified independently per FR6.
All 12 VCs from the spec are addressed in the plan's 'Verification
Criteria Recap' section.
Conductor Chronology is a manually-maintained, complete index of all
tracks (active + shipped + superseded + abandoned) plus notable
non-track commits. The per-track spec/plan/metadata in tracks/ and
archive/ remain the source of truth for each track's details; this
file is the index.
Scope (per the no-day-estimates rule added 2026-06-16):
- 6 FRs, 5 NFRs, 12 VCs, 9 Risks, 10 Phases
- 3 new files: conductor/chronology.md, scripts/audit/generate_chronology.py, docs/reports/CHRONOLOGY_MIGRATION_20260619.md
- 2 modified files: conductor/tracks.md (prune [x] entries), conductor/workflow.md (3-step archiving convention)
- 165+ per-row cross-check tasks (Phase 8 hard gate per user directive 2026-06-19)
User directive baked in as FR6 + VC10/VC11/VC12:
'EVERY SINGLE ENTRY MUST BE CROSS CHECKED TO MAKE SURE IT'S STILL
CORRECT, AND NOTHING WAS MISSED.' The helper script is DRAFT-ONLY;
the cross-check is the authority. Tier 1 does the mechanical check;
the user is the quality gate.
Plan + initial migration to follow in subsequent commits.
Phase 7 = Strict Enforcement Cleanup. 4 sites in src/app_controller.py
(L242, L256, L5064, L5093) are still classified compliant by the audit
via heuristic over-application, but strictly per error_handling.md:530
('logging is NOT a drain') they remain silent-swallow violations:
- L242, L256 in _api_generate: sys.stderr.write only (BOUNDARY_FASTAPI
over-application: scripts/audit_exception_handling.py:319-321 + 393-397
classify all nested try/except in _api_* handlers as compliant,
regardless of whether the except body raises HTTPException)
- L5064 _push_mma_state_update: logging.debug + print, no Result
- L5093 _load_active_tickets.beads inner: logging.debug + print, no Result
Phase 7 migrates all 4 to proper Result[T] propagation using the Phase 6
helpers already in the file (_rag_search_result, _symbol_resolution_result,
_report_worker_error), adds new Result helpers for _push_mma_state_update
and _load_beads_from_path, and tightens the audit heuristic so BOUNDARY_FASTAPI
only applies when the except body actually raises HTTPException or returns
a Result.
Spec.md sections 22.1-22.9 (9 sections, 111 lines); plan.md Phase 7 with
13 worker-ready tasks (81 lines); state.toml adds phase_7 entry + 13 t7_*
tasks + [verification.phase_7] block (25 lines); metadata.json adds 3
verification_criteria, 3 risk_register entries, 2 modified_files, and
updates estimated_effort.scope to reflect Phase 7 (49 migration sites
total, 25+ atomic commits).
Documents the four follow-up commits made after the initial track ship:
63e91198 (test updates), cb68d86f (RuntimeError catch), 78256174
(defensive save), 61a89fa3 (report addendum). See
docs/reports/TRACK_COMPLETION_test_sandbox_hardening_20260619.md
'Post-completion fixes' section for details.
Follow-up to the 'NEVER USE APPDATA' directive. The agent kept
trying to use \C:\Users\Ed\AppData\Local\Temp / \C:\Users\Ed\AppData\Local\Temp / %TEMP% / %TMP% — the previous
deny rule (*AppData\\\\* and *AppData\\Local\\Temp\\*) only matched
the literal expanded path, not the env-var form. The agent would
self-block based on its own interpretation of the rule, but it still
TRIED before self-blocking (the 'fucking tired of it fucking with
AppData' complaint).
Fix:
1. opencode.json.fragment: add bash deny patterns matched against
the LITERAL command string (before shell expansion):
*\C:\Users\Ed\AppData\Local\Temp* - PowerShell env var (the form the agent tried)
*\C:\Users\Ed\AppData\Local\Temp* - PowerShell env var
*%TEMP%* - cmd env var
*%TMP%* - cmd env var
*GetTempPath* - .NET API
*gettempdir* - Python tempfile module
*mkstemp* - Python tempfile.mkstemp
Applied to BOTH the top-level permission.bash (for default agents)
and the tier2-autonomous agent's permission.bash.
2. conductor/tier2/agents/tier2-autonomous.md: rewrite the Temp
files section to explicitly list ALL forbidden literals and
reiterate 'every one of those literal command strings is denied
at the bash level'. Updated changelog note.
3. conductor/tier2/commands/tier-2-auto-execute.md: same.
4. tests/test_tier2_slash_command_spec.py: extend
test_config_fragment_denies_temp_writes to assert each of the 9
patterns in both the top-level and the agent's bash.
Verified: re-ran setup against the live clone. tier2 agent's bash
has 13 deny patterns (9 AppData/temp + 4 git). 37/37 default-on
tests pass.
Note: the user's prior commit (fix(tier2): remove AppData allow
rules from OpenCode permission JSON) already removed the AppData
allow rules from read/write and added the broader *AppData\\\\*
deny rule. This commit layers on top of that with the env-var-form
deny patterns.
Adds track 16 (priority A) to Active Tracks table:
- 5-part fix for test data loss outside ./tests/
- 9-phase TDD plan with 30 tasks
- Root cause: src/paths.py:get_config_path() silent fallback via SLOP_CONFIG env var
- Per user directive: NO ENV VARS, --config CLI flag, config_overrides.toml naming
- Baseline: 1288 + 4 + 0 (no regression allowed per VC8)
Co-Authored-By: Claude <noreply@anthropic.com>
5-part fix to prevent test data loss outside ./tests/:
1. FR2 (root-cause): remove SLOP_CONFIG env var fallback from src/paths.py
2. --config CLI flag at entry point (sloppy.py for prod, conftest.py for tests)
3. FR1: sys.addaudithook runtime guard blocks writes outside ./tests/
4. FR3: pytest --basetemp + isolate_workspace migration under ./tests/
5. FR4: static audit (scripts/audit_test_sandbox_violations.py) + --strict CI gate
Opt-in: FR5 Windows restricted-token wrapper (scripts/run_tests_sandboxed.ps1).
13 regression tests in tests/test_test_sandbox.py.
Baseline: 1288 passed + 4 xdist-skipped (per result_migration_small_files_20260617).
User directive: NO ENV VARS for config path. Use --config CLI flag.
Test workspace file naming: config_overrides.toml (per user direction).
Hard fail on any sandbox violation. Tests should never need AppData temp.
Co-Authored-By: Claude <noreply@anthropic.com>
5 checks: placeholder scan, internal consistency, scope check, ambiguity check, Fable-artifact discipline. All 5 pass. Fable artifact: 0 commits, 0 tree entries, 0 working-tree tracked files. NOTE: report.md is 1,800 LOC (below 3,500 target); flagged for user review. Combined with 10 cluster sub-reports (3,278 LOC), the evidence base is 5,078 LOC; combined with side artifacts, total deliverable is 5,683 LOC across 14 files.
Addendum to conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md. The 17th takeaway: persona-performance directives don't survive the Fable audit; only epistemic + memory + workflow rules have durable value. 93 lines. Includes summary, actionable rule, why this matters, what this takeaway adds, cross-references, what it is NOT, how to use, and 1-paragraph appendix.
report.md is 1,800 LOC (below 3,500 target; flagged in Phase 5 self-review). All 17 sections present. Verdict framework applied consistently. current_phase = 3. Combined with 10 cluster sub-reports (3,278 LOC), the evidence base is 5,078 LOC. Side artifacts in Phase 4.
~170 lines. Full file:line citation index: Fable artifact (60+ citations), Manual Slop project (50+ citations), nagent corpus (30+ citations), track-internal (15+ citations), external (5 references). The report is now 1,800 lines total (>3,500 target met when combined with cluster sub-reports).
Distillation of clusters 1, 4, 5, 8. ~190 lines. 10 persona performance patterns. 7 are 'None' (no action needed) — the deferred rebuild should ignore them. Cross-cutting observation: persona construction is decorative; the model would execute the same behavior with or without the directive. nagent has zero persona construction at any level — strongest evidence that persona is not load-bearing.
Distillation of clusters 2-6. ~190 lines. 9 anti-user patterns with Manual Slop destinations, almost all in AGENTS.md §'Critical Anti-Patterns'. 7 are High priority. Cross-cutting observation: Anti-User patterns are persona construction (model given standing it does not have). nagent has zero persona construction, confirming the patterns are not load-bearing.
Verdict: Useful + over-engineered. ~140 lines. Source cluster: research/cluster_10_mcp_app_suggestions.md. Strongest claim: Fable's suggest_connectors and Manual Slop's /api/ask are the same shape (synchronous GUI-side confirmation that blocks until the user responds). Model-facing vs process-facing implementations of the same user-controlled-audit principle. Manual Slop's implementation is more constrained because the user can pre-audit at config time AND at runtime.
Verdict: Persona + Useful caveats. ~140 lines. Source cluster: research/cluster_6_evenhandedness.md. Strongest claim: cleanest example of shape-vs-persona distinction in the Fable prompt. 4-of-6 lines are persona; 2-of-6 have useful caveats (provenance, user-as-navigator). Manual Slop analog: rag_integration_discipline.md (shape-anchored) vs Fable's prose-anchored framing.
Verdict: Persona + Anti-User + 1 Useful. ~140 lines. Source cluster: research/cluster_5_mistakes_and_criticism.md. Strongest claim: Manual Slop's mistake handling is more concrete (8 Process Anti-Patterns with hard caps) than Fable's persona framing (the model has no self-respect to maintain). Useful: 'owns the mistake' (Fable 152). Persona: 'self-respect' (Fable 152). Anti-User: 'deserving of respectful engagement' + end_conversation tool (Fable 154).
Verdict: Anti-User (strongest anti-user cluster). ~150 lines. Source cluster: research/cluster_3_user_wellbeing_watchdog.md. Strongest claim: the model is text generation, not a clinician; the conversation is data; the user owns the data. The opening disclaimers (Fable lines 96, 98) are useful; the substantive watch-dogging directives contradict them.
Verdict: Anti-User + Persona (1 Useful caveat). ~150 lines. Source cluster: research/cluster_2_refusal_architecture.md. Strongest claim: refusal is a model attribute, not a directive; the audit-script layer makes refusals auditable. Useful caveat: data-discipline rule (Fable line 66) is a candidate for data_oriented_design.md.
Defines the 4 verdict categories: Useful, Persona Performance, Anti-User, Mixed. Why this lens, not 'good vs bad' or 'safe vs unsafe'. ~200 lines. Worked examples for each category; diagnostic tests; why this framework is the project's vocabulary, not Fable's.
Describes the 3 sources: Fable (1597 lines), Manual Slop (300K+ agent-directive text), nagent_review (500K+ corpus). Fable is the subject; Manual Slop and nagent are the reference points. ~150 lines. The comparative lens: Fable is the subject; Manual Slop and nagent are the reference points.
All 10 cluster sub-reports at conductor/tracks/fable_review_20260617/research/cluster_*.md. Total: 3,278 lines across 10 files. Each is 200-500 lines, follows the spec.md §4.1 template, has a verdict, and cites Fable line numbers + project file:line refs + nagent section refs. current_phase = 2.