feat(directives): scavenge sweep 3/5 (docs/reports/2026-06-08): 6 directives

Lifted: - test_instantiation_not_mock_away: tests must exercise actual instantiation, not mock away the constructor (caught the MiniMax 401 regression) - preserve_prior_versions_of_review_docs: when iterating on a review, preserve prior versions as separate files (v2/v2.1/v2.2/v2.3) - neutral_language_for_doc_drift: doc-drift fixes use 'predates/stale/ outdated', not 'fictional' (value judgment, not technical description) - preserve_before_compact_archive: at 80%+ context, write a comprehensive session-synthesis archive before compaction - user_corrections_log_in_state_toml: per-track state.toml has a user_corrections_log section for any reviewed-by-user track - surface_dirty_state_in_test_runner: when subprocess is dead/degraded, print a clear [BATCH-WARN] rather than silent timeout
2026-07-04 01:05:54 -04:00
parent a89d0cb30e
commit b193df100f
12 changed files with 60 additions and 0 deletions
@@ -0,0 +1,9 @@
+# neutral_language_for_doc_drift
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/docs_sync_test_era_20260610.md` §"Verbiage lesson (applied going forward)". The first 11 continuation commits used the word "fictional" in commit messages and twice in user-facing doc text. The user pushed back: "fictional" is a value judgment on the doc and its author, not a technical description. The technical reality is that the doc described an earlier architecture, the code refactored, and the doc was not updated. Going forward: "predates the refactor", "stale", "outdated", "no longer matches the source", "did not exist in the real dataclass". The word "fictional" is reserved for the narrow case where the doc explicitly says "X is implemented as Y" and the source shows "Y" was never written at all.
+**Source:** `docs/reports/2026-06-08/docs_sync_test_era_20260610.md` §"Verbiage lesson (applied going forward)"
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# Document-drift fixes must use neutral language (predates, stale, outdated), never value judgments like fictional
@@ -0,0 +1,9 @@
+# preserve_before_compact_archive
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/workflow_markdown_audit_20260608.md` §1.8 ("Document the preserve-before-compact archive pattern"). At 94% context (478,992 tokens) the user explicitly requested "the biggest in-depth report you can muster... I want to preserve as much information synthesized by it". The artifact produced was `docs/reports/session_synthesis_20260608.md` (579 lines, 40KB) — a session-level synthesis with 12 sections covering artifacts produced, source transcripts, commit chain, user-corrections log, track decisions, handoff to next session. The pattern is pre-compaction; the existing AGENTS.md Compaction Recovery is post-compaction.
+**Source:** `docs/reports/2026-06-08/workflow_markdown_audit_20260608.md` §1.8 + `docs/reports/2026-06-08/session_synthesis_20260608.md` (the first instance)
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# When context usage approaches exhaustion, write a comprehensive session-synthesis archive before compaction so the next session can re-anchor
@@ -0,0 +1,9 @@
+# preserve_prior_versions_of_review_docs
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/nagent_review_session_20260612.md` §"0.2 The 4 review files" + §"Round 2: the v2.1 user-revised". The session iterated v2 → v2.1 → v2.2 → v2.3 of a 434KB nagent review across 5 user-correction rounds. The convention that emerged: each version is a separate file (e.g., `nagent_review_v2.md`, `nagent_review_v2_1.md`, `nagent_review_v2_2.md`, `nagent_review_v2_3.md`) and prior versions are preserved rather than overwritten. The user explicitly required the non-destructive write pattern ("I had to interrupt there I wanted to clarify to make a v2.1 report. I want non-destructive writes I want to keep this v2 draft.").
+**Source:** `docs/reports/2026-06-08/nagent_review_session_20260612.md:97-110` + §"The 5 rounds at a glance" (Round 2 row)
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# When iterating on a review or assessment document, preserve prior versions as separate files instead of overwriting them
@@ -0,0 +1,9 @@
+# surface_dirty_state_in_test_runner
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/batch_resilience_plan_20260608.md` §"Real User Concern: Within-Session Subprocess Degradation" and the user's verbatim requirement: "I also don't want a batch to be too fragile where I can't restart the app and continue with the next test file if it fails. Just has to note that the new file didn't get to deal with a dirty state." When a prior test file's `sloppy.py` subprocess crashes (e.g., IM_ASSERT triggers `io_pool.shutdown`), the next test file's clicks hit a degraded pool and the test silently times out instead of reporting "subprocess degraded, this batch's tests may not start with a clean state." The fix is a clear printed warning (`[BATCH-WARN] Prior tier-3 batch left the live_gui subprocess (pid=N) dead`) so the user can decide whether to debug or skip the batch.
+**Source:** `docs/reports/2026-06-08/batch_resilience_plan_20260608.md` §2 + §"Solution C: Per-Batch Process Tracking"
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# When the test runner detects that a subprocess is dead or degraded, surface a clear warning rather than silently timing out
@@ -0,0 +1,9 @@
+# test_instantiation_not_mock_away
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/TEST_REGRESSION_ANALYSIS_MINIMAX_OPENAI_20260613.md` §"Why the Tests Missed the Regressions". The MiniMax / OpenAI-compatible shim regressions (401 from mismatched base URLs, NameError on missing module, type mismatches on the new Result wrapper) all hid behind `with patch("src.ai_client._ensure_minimax_client", return_value=MagicMock())`. Patching out the entire initialization function means the test never executes the actual code that constructs the client, looks up the base URL, or validates the API key. The fix per the report: invoke `_ensure_minimax_client()` directly and assert on the exact arguments to `openai.OpenAI()`.
+**Source:** `docs/reports/2026-06-08/TEST_REGRESSION_ANALYSIS_MINIMAX_OPENAI_20260613.md:14-25` + §"Test Hardening Steps Taken" items 1-2
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# Tests must exercise the actual instantiation code paths instead of mocking the constructors away
@@ -0,0 +1,9 @@
+# user_corrections_log_in_state_toml
+
+## v1
+
+**Why this iteration:** Lifted from `docs/reports/2026-06-08/workflow_markdown_audit_20260608.md` §1.4 ("Add the Per-Track User-Corrections Log pattern to State.toml Template"). The `nagent_review_20260608/state.toml` introduced a `[user_corrections_log]` section with 7 entries documenting 3 rounds of corrections the user pushed during review. The pattern: `2026-06-08_NN = "correction summary"` keyed by date and sequence number. Each entry is a self-contained one-sentence summary; the full context (what the user said, what the agent changed in response) lives in the report.md or comparison_table.md; the log is the index. Without this section the user-corrections are lost in the report.md and easy to miss when re-anchoring after compaction.
+**Source:** `docs/reports/2026-06-08/workflow_markdown_audit_20260608.md` §1.4 + `conductor/tracks/nagent_review_20260608/state.toml` (first instance)
+
+---
+**Lifted:** 2026-07-03 (scavenge sweep batch 3/5: docs/reports/2026-06-08)
@@ -0,0 +1 @@
+# Per-track state.toml must include a user_corrections_log section for any track where the user reviews drafts and provides corrections
				`@@ -0,0 +1 @@`
				`# Document-drift fixes must use neutral language (predates, stale, outdated), never value judgments like fictional`
				`@@ -0,0 +1 @@`
				`# When context usage approaches exhaustion, write a comprehensive session-synthesis archive before compaction so the next session can re-anchor`
				`@@ -0,0 +1 @@`
				`# When iterating on a review or assessment document, preserve prior versions as separate files instead of overwriting them`
				`@@ -0,0 +1 @@`
				`# When the test runner detects that a subprocess is dead or degraded, surface a clear warning rather than silently timing out`
				`@@ -0,0 +1 @@`
				`# Tests must exercise the actual instantiation code paths instead of mocking the constructors away`
				`@@ -0,0 +1 @@`
				`# Per-track state.toml must include a user_corrections_log section for any track where the user reviews drafts and provides corrections`