From c073e42a7a3bf7d52c78848f49a023ea667b49f3 Mon Sep 17 00:00:00 2001
From: Ed_
Date: Sat, 6 Jun 2026 21:22:40 -0400
Subject: [PATCH] docs(workflow,agents): add 7 process improvements from planning session

All additive; no breaking changes to existing content. Derived from gaps observed during the 2026-06-06 planning session (5 tracks spec'd + planned end-to-end).

**AGENTS.md (1 new section, 16 lines):**
- Compaction Recovery - explicit recovery path for a new agent picking up mid-track (read the digest, check state.toml, run audits, resume from next unchecked task). Cross-references the workflow-level 'Compaction Recovery' section.

**conductor/workflow.md (6 new sections, 145 lines):**
- Planning Session Workflow - documents the brainstorming -> spec -> plan flow used 5x this session; mandates spec approval before plan; notes the plan is the only artifact the implementer reads.
- Track Dependencies and Execution Order - verify the blocked_by chain in metadata.json before starting; topological sort gives the recommended execution order (recorded in PLANNING_DIGEST).
- State.toml Template - canonical structure (meta / blocked_by / blocks / phases / tasks / verification / track-specific) so future tracks have a consistent shape.
- Per-Task Decision Protocol - small decisions (cosmetic) decide yourself; large decisions (architectural) STOP and report; regressions STOP and report. The boundary is 'does this require a new spec or plan update?'.
- Documentation Refresh Protocol - after a track ships, identify affected guides (grep for renamed/moved symbols), update them, add new guides for new modules, add styleguides for new conventions. The 'post-tracks documentation' pattern is repeatable; tracks that only update code are incomplete.
- Audit Script Policy - whenever a track introduces a new convention that can be statically checked, add an audit script in scripts/ with --help / --json / strict modes.
  The audit + CI gate pair is the convention-enforcement mechanism; 3 existing audits (audit_main_thread_imports, audit_weak_types, check_test_toml_paths) are the precedent.

All sections reference existing project files (brainstorming skill, writing-plans skill, audit scripts, tracks.md, the 5 new tracks' spec.md files, PLANNING_DIGEST_20260606.md).

No code changes. Documentation only. ~160 lines total added.
---
 AGENTS.md             |  18 +++++-
 conductor/workflow.md | 145 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 51bb165c..549cb082 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -30,6 +30,20 @@ For understanding, using, and maintaining the tool, see `docs/Readme.md` and the
 - Do not read full files >50 lines without first using `py_get_skeleton` or `get_file_summary`
 - Do not modify the tech stack without updating `conductor/tech-stack.md` first
-- Do not skip TDD — write failing tests before implementation
-- Do not batch commits — commit per-task for atomic rollback
+- Do not skip TDD - write failing tests before implementation
+- Do not batch commits - commit per-task for atomic rollback
 - Do not add comments to source code; documentation lives in `/docs`
+
+## Compaction Recovery
+
+If you're a new agent picking up a session that was compacted (or a previous agent ran out of context), follow this recovery path:
+
+1. **Read the most recent `docs/reports/PLANNING_DIGEST_.md`** if one exists. It indexes the planning artifacts and explains the design decisions behind the active tracks.
+2. **For each in-flight track**, read `conductor/tracks//state.toml` to see `current_phase`; read `conductor/tracks//plan.md` for the task breakdown.
+3. **Check `git log --oneline -20`** to see what has been committed; the most recent commits in `conductor/tracks//` are the latest work.
+4. **Run the audit scripts** (`scripts/audit_main_thread_imports.py`, `scripts/audit_weak_types.py`) to see the current state of the codebase.
+5. **Resume from the next unchecked task** in `state.toml`. The per-task commit discipline means each commit is a safe rollback point.
+
+The track's `metadata.json` has a `verification_criteria` field - this is the definition of "done" for the track. If all the criteria are checked, the track is complete.
+
+For deeper recovery, see `conductor/workflow.md` "Compaction Recovery" (the same pattern, but workflow-level).
diff --git a/conductor/workflow.md b/conductor/workflow.md
index 6e98ad99..afcaf41d 100644
--- a/conductor/workflow.md
+++ b/conductor/workflow.md
@@ -444,3 +444,148 @@ In particular, watch for:
 **Prevention:** When reorganizing a class body, run the AST check above immediately after the edit. This catches the issue in <1 second vs. finding it via failing live_gui tests minutes later.
+---
+
+## Planning Session Workflow
+
+Some sessions are *planning-only* - the agent produces `spec.md` + `metadata.json` + `state.toml` + `plan.md` for a new track. NO code is written. The flow:
+
+1. **Explore** the project context. Use the `brainstorming` skill for the structured process (explore → clarify → propose → spec → review → plan).
+2. **Ask clarifying questions** (one at a time; multiple choice preferred) to nail down the design. The "what are you trying to achieve + what are the constraints" questions come first; the "what is the scope" question comes after.
+3. **Propose 2-3 approaches** with tradeoffs. Lead with the recommended one and explain why.
+4. **Write the spec** following the established template (Overview / Goals / Non-Goals / Architecture / Per-File Design / Migration / Risks / Out of Scope / See Also). The spec is the agent's *design intent* - it explains WHY, not just WHAT.
+5. **User reviews the spec**. Revise until approved. **The spec MUST be approved before the plan is written.** A plan for an unapproved spec is wasted effort.
+6. **Write the plan** following the `writing-plans` skill (2-5 minute steps; full code; TDD). The plan is the agent's *executable plan* - it shows exactly what code to write, one step at a time.
+7. **User reviews the plan**. Revise until approved.
+8. **Commit spec + plan** in separate commits (per-track: spec commit + plan commit; both with git notes summarizing the work). User invokes implementation in a different session.
+
+**The plan is the only artifact the implementing agent reads.** Specs are reference; plans are executable. Both are committed.
+
+**The agent (planning role) does not execute.** If a "while you're at it, can you also..." request arrives mid-session, redirect to a follow-up track; do NOT bundle unrelated work.
+
+**For the agent's own reference:** the `brainstorming` skill is the source of truth for steps 1-5. The `writing-plans` skill is the source of truth for step 6.
+
+---
+
+## Track Dependencies and Execution Order
+
+Tracks can depend on other tracks. The `blocked_by` field in each track's `metadata.json` lists the track IDs that must ship first. The field name in state.toml is `[blocked_by]` (a table of track_id = "merged" | "planned" | etc.).
+
+Before starting implementation of a track:
+
+1. **Verify all tracks in `blocked_by` are SHIPPED.** Check `conductor/tracks.md` for status (`[x]` = done), or read each blocked_by track's `state.toml` to confirm `current_phase` equals the last phase and the track's notes indicate completion.
+2. **If any blocker is NOT shipped:** report to the Tier 2 Tech Lead. Do not proceed.
+3. **If the post-state baseline assumptions in the spec (usually a §10 "Coordination with Pending Tracks" section) are not met:** STOP. The implementer must verify the baseline BEFORE starting Phase 1 of the track. The verification commands are in the spec.
+
+The recommended execution order is the topological sort of the `blocked_by` graph. This is usually recorded in the most recent `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Execution Order" or "Dependency Picture").
+
+---
+
+## State.toml Template
+
+Every track's `conductor/tracks//state.toml` should follow this structure (used as the agent's "where am I in this track" source of truth):
+
+```toml
+# Track state for
+# Updated by Tier 2 Tech Lead as tasks complete
+
+[meta]
+track_id = ""
+name = ""
+status = "active" # active | completed
+current_phase = 0 # 0 = pre-Phase 1; 1..N = in Phase N; "complete" if all phases done
+last_updated = ""
+
+[blocked_by]
+# Optional. List of track_id = "merged" | "planned" | etc.
+# When the implementation agent starts Phase 1, verify all listed tracks are merged.
+other_track_id = "merged"
+
+[blocks]
+# Optional. Tracks that depend on this one (populated from the spec's §12.1 "Follow-up Track" section).
+followup_track_id = "planned in "
+
+[phases]
+# One entry per phase. Update checkpointsha when the phase checkpoint commit is made.
+phase_1 = { status = "pending", checkpointsha = "", name = "" }
+phase_2 = { status = "pending", checkpointsha = "", name = "" }
+# ...
+
+[tasks]
+# Tasks within phases. Structure: t_ = { status, commit_sha, description }
+# status: "pending" | "in_progress" | "completed" | "cancelled"
+# The implementing agent marks "in_progress" when starting and "completed" with commit_sha when done.
+t1_1 = { status = "pending", commit_sha = "", description = "" }
+# ...
+
+[verification]
+# Filled as phases complete. The metadata.json's verification_criteria is the source of truth.
+phase___complete = false
+
+[]
+# Optional. Track-specific progress tracking (e.g., audit_count_progression, refactor_stats).
+# Add whatever is useful for THIS track.
+
+[public_api_migration_followup]
+# Optional. If the spec plans a follow-up, list it here so future planners can find it.
+```
+
+The `current_phase` field is the single source of truth for "where is this track." When the implementing agent advances, they update it.
+
+---
+
+## Per-Task Decision Protocol
+
+When the implementing agent encounters a decision not covered by the plan:
+
+1. **If the decision is purely cosmetic** (e.g., variable naming, comment placement, exact spacing): pick the option that matches the surrounding code style. Document the choice in the commit message.
+2. **If the decision affects the architecture** (e.g., the spec's data model doesn't fit the code; the plan's approach doesn't compile; an external library doesn't behave as expected): **STOP. Do not commit. Report to the Tier 2 Tech Lead.** The lead will either:
+   - Update the spec to match the new constraint
+   - Add a clarifying task to the plan
+   - Defer the work to a follow-up track
+3. **If the decision is a regression** (e.g., the plan's code works but introduces a known bug, or fails a test the plan didn't anticipate): **STOP and report.** Don't ship a known regression to save time. The lead will decide whether to fix forward or roll back.
+
+**The principle: small decisions, decide yourself. Large decisions, escalate.** The boundary is "does this decision require a new spec or plan update?"
+
+**Documentation:** if a decision was made that the spec or plan should reflect (even if it was a small decision), add a brief note in the commit message. The next agent (after compaction) reads commit messages to recover context.
+
+---
+
+## Documentation Refresh Protocol
+
+Architectural refactor tracks often change the *shape* of modules the existing docs describe. After a track ships, the affected guides may be partly out of date.
+
+**After each track ships, the implementing agent must:**
+
+1. **Identify affected guides.** Run `grep -l "" docs/guide_*.md` to find guides that reference renamed/moved symbols. Also check `docs/Readme.md` for the table of guides.
+2. **For each affected guide, update it to reflect the new module structure.** If the spec's §3 or §4 lists the new file structure, mirror that in the guide.
+3. **If the track introduced a NEW module**, add a new guide (or a new section to an existing guide). Per the project's `docs/Readme.md` structure, deep-dive guides are per-source-file (e.g., `guide_ai_client.md`, `guide_mcp_client.md`).
+4. **If the track introduced a NEW convention** (e.g., the `Result[T]` pattern, the `TypeAlias` convention, the sub-MCP architecture), add a styleguide in `conductor/code_styleguides/.md`. Update `conductor/product-guidelines.md` to reference it.
+5. **Commit the doc updates** as part of the track's final phase (or as a follow-up track if the scope is too large).
+
+**The "post-tracks documentation" pattern is repeatable.** A track that only updates code (not docs) is incomplete. The latest `docs/reports/PLANNING_DIGEST_*.md` (under "Recommended Future Tracks") often lists the documentation refresh as the next track.
+
+**Test for staleness:** before marking a track complete, run `git log --oneline -10 -- docs/ conductor/code_styleguides/` to confirm the docs were touched in the same window as the code. If only code was committed, the track is incomplete.
+
+---
+
+## Audit Script Policy
+
+Whenever a track introduces a new convention that can be statically checked, add an audit script in `scripts/`. The audit + CI gate pair is the convention-enforcement mechanism for this project. Conventions without audits will drift; audits without CI integration will be ignored.
+
+**Script conventions:**
+- Filename: `audit_.py` or `check_.py` (matching the existing 3 scripts)
+- Must have a `--help` that explains what it checks and how to fix violations
+- Should support a `--json` mode for CI integration (machine-readable output)
+- Should have a default informational mode (exits 0; prints human-readable report) AND a strict mode (exits 1 on regression; used as CI gate)
+- Should be runnable from the repo root
+
+**Existing audit scripts as precedent:**
+- `scripts/audit_main_thread_imports.py` - enforces the main-thread-purity invariant from the `startup_speedup_20260606` track
+- `scripts/audit_weak_types.py` - enforces the type-alias convention from the `data_structure_strengthening_20260606` track
+- `scripts/check_test_toml_paths.py` - enforces no real-TOML references in tests (predates the audit-script-policy, but follows the pattern)
+
+**CI integration:** when a new audit script is added, it should be added to whatever CI workflow exists (or a follow-up track should add the CI workflow if one doesn't exist). The strict mode of the audit is the gate.
+
+**The audit-script + styleguide pair:** every audit script's documented "what it checks" should map to a section in a `conductor/code_styleguides/` file. The styleguide says "this is the rule"; the audit says "your code violates this rule." The pair is complete when both exist.
+