conductor(track): workspace_path_finalize_20260609 - per-run workspace under tests/artifacts/

2026-06-09 20:27:20 -04:00
parent fe240db410
commit c725270b99
3 changed files with 309 additions and 0 deletions
@@ -0,0 +1,37 @@
 {
  "track_id": "workspace_path_finalize_20260609",
  "name": "Workspace Path Finalize (2026-06-09) - the LAST track on this issue",
  "created_at": "2026-06-09",
  "status": "spec",
  "priority": "A",
  "blocked_by": [],
  "blocks": [],
  "inherits_from": [
    "conductor/tracks/test_infrastructure_hardening_20260609/"
  ],
  "supersedes": [],
  "domain": "Meta-Tooling (test infrastructure)",
  "scope_summary": "Small fixture change to move the live_gui workspace from %TEMP%/pytest-of-... back under tests/artifacts/, as a per-run folder live_gui_workspace_<timestamp>/ (gitignored, in the project tree, where the sims expect it). The Phase 3 tmp_path_factory refactor was a regression. The user explicitly called this out.",
  "estimated_effort": "30 minutes",
  "phases": 1,
  "verification_criteria": [
    "tests/conftest.py:465 (or equivalent) assigns temp_workspace = _RUN_WORKSPACE, i.e. Path('tests/artifacts/live_gui_workspace_<timestamp>')",
    "tests/test_workspace_path_finalize.py has 2 tests, both pass",
    "Full batch: tier-1 5/5, tier-2 5/5, tier-3 0 new failures",
    "The 4 sim tests in tests/test_extended_sims.py pass in batch"
  ],
  "out_of_scope": [
    "Refactoring simulation/sim_base.py",
    "Adding new audit scripts",
    "Updating docs",
    "Filing follow-up tracks",
    "Any 'while we're at it' refactors"
  ],
  "risks": [
    {
      "risk": "Small conftest edit corrupts the file (as happened in the previous attempt)",
      "mitigation": "Use manual-slop_set_file_slice; verify syntax with ast.parse after"
    }
  ],
  "tier_2_supervision_required_for": []
 }
@@ -0,0 +1,234 @@
 # Track Specification: Workspace Path Per-Run (2026-06-09)
 ## Overview
 Conftest creates `tests/artifacts/live_gui_workspace_<timestamp>/` once per pytest invocation. No env vars, no CLI args, no runner changes. The conftest is the source of truth for the workspace path.
 **Per-test pollution is intentional** — it exposes fragility, which is the whole point of the test infrastructure hardening track.
 **Per-run isolation** — each `uv run pytest` invocation gets a new timestamped folder, so state doesn't leak across runs.
 **Why this design:**
 - No env vars (anti-pattern, hidden global state)
 - No CLI args (conftest is the right place for test infrastructure)
 - No runner changes (`run_tests_batched.py` already works)
 - Path is in the project tree under `tests/artifacts/` (gitignored, inspectable, where the sims expect it)
 - `tests/artifacts/` is already gitignored — no repo pollution
 ## Current State Audit (as of fe240db4)
 ### Bug
 `tests/conftest.py:453-465`:
 ```python
 @pytest.fixture(scope="session")
 def live_gui(request, tmp_path_factory) -> Generator["_LiveGuiHandle", None, None]:
    ...
    temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")
 ```
 This puts the workspace at `C:\Users\<user>\AppData\Local\Temp\pytest-of-<user>\pytest-N\live_gui_workspace0`. That's:
 1. Not in the project tree (user can't find it)
 2. Per-pytest-invocation (re-rolled each run, which is fine), but with an opaque name
 3. Different location from what the sims in `simulation/sim_base.py` expect (`tests/artifacts/...`)
 ### The fix
 Replace `tmp_path_factory.mktemp("live_gui_workspace")` with a deterministic per-run folder under `tests/artifacts/`:
 ```python
 from datetime import datetime
 _run_id = datetime.now().strftime("%Y%m%d_%H%M%S")
 temp_workspace = Path(f"tests/artifacts/live_gui_workspace_{_run_id}")
 ```
 This:
 - Creates `tests/artifacts/live_gui_workspace_20260609_201530/` relative to the current working directory, which is assumed to be the project root
 - Each `uv run pytest` invocation gets a new folder (the timestamp has one-second granularity)
 - All 49 live_gui tests in that invocation share the workspace
 - The folder is in `tests/artifacts/` (already gitignored, see `git check-ignore tests/artifacts`)
 - The sims' `os.path.abspath("tests/artifacts/temp_*.toml")` resolves into the project tree, which matches the workspace location
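One detail the bullets above leave implicit: `tmp_path_factory.mktemp` creates the directory it returns, while constructing a `Path` does not. The sketch below shows one way to build and create the per-run folder. It is an illustration only; the helper name `make_run_workspace` and its `root` and `now` parameters are hypothetical, and it assumes nothing later in `live_gui` already creates the directory.

```python
from datetime import datetime
from pathlib import Path


def make_run_workspace(root: Path, now: datetime) -> Path:
    """Build and create a per-run workspace folder under `root`.

    `root` stands in for tests/artifacts/ and `now` for datetime.now();
    both are parameters here only so the sketch is easy to exercise.
    """
    run_id = now.strftime("%Y%m%d_%H%M%S")  # e.g. 20260609_201530
    workspace = root / f"live_gui_workspace_{run_id}"
    # Path(...) alone does not touch the filesystem, so create it explicitly.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

In conftest terms this would correspond to something like `make_run_workspace(Path("tests/artifacts"), datetime.now())`, evaluated once per pytest invocation.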
 ### What to KEEP from Phase 3
 - `tests/test_live_gui_workspace_fixture.py` — the test file that verifies the `live_gui_workspace` fixture
 - The 5 test files updated in `006bb114` to use the fixture instead of hardcoded paths
 - The `_LiveGuiHandle` class with `__iter__`/`__getitem__` backward compat
 - The `_check_live_gui_health` autouse fixture
 - The `clean_baseline` marker
 - The 3-task fix at `fe240db4` (MMA + RAG state reset)
 ### What to REVERT
 - `tests/conftest.py:465`: change `tmp_path_factory.mktemp("live_gui_workspace")` back to a path under `tests/artifacts/` (now timestamped per run)
 ### What to ADD
 - Module-level constants `_RUN_ID` and `_RUN_WORKSPACE` in conftest.py (computed once at import time)
 - The `live_gui_workspace` fixture already exists; just verify it returns the new path
 ## Goals
 1. **Goal A: Workspace at `tests/artifacts/live_gui_workspace_<timestamp>/`.** Conftest creates the folder, all live_gui tests share it for the duration of the run.
 2. **Goal B: Sim tests pass in full batch.** The 4 sims in `tests/test_extended_sims.py` pass in tier-3.
 3. **Goal C: Per-run isolation.** Each `uv run pytest` invocation gets a new folder. State from a prior run doesn't pollute.
 4. **Goal D: Inspectable from project tree.** The user can `ls tests/artifacts/live_gui_workspace_*/` to see what the GUI subprocess is working with.
 ### Non-Goals
 - ❌ Per-test isolation. The whole point is per-test pollution = exposed fragility.
 - ❌ Env vars. The user explicitly rejected them.
 - ❌ CLI args. Conftest is the right place.
 - ❌ Runner changes. `run_tests_batched.py` is fine as-is.
 - ❌ Refactoring `simulation/sim_base.py`. It already uses `tests/artifacts/` paths.
 - ❌ New audit scripts.
 - ❌ New tests beyond the 2 verification tests.
 - ❌ Doc updates.
 - ❌ Follow-up tracks.
 ## Functional Requirements
 ### FR1. Conftest creates per-run workspace
 **Where:** `tests/conftest.py:453-465`
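 **What:** Change the fixture signature and the workspace assignment, and add two module-level constants:
 ```python
 # BEFORE (lines 453-465)
 def live_gui(request, tmp_path_factory) -> Generator["_LiveGuiHandle", None, None]:
    ...
    temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")
 # AFTER
 # Module level, near the top of conftest.py (after the imports); computed once at import time.
 _RUN_ID = datetime.now().strftime("%Y%m%d_%H%M%S")
 _RUN_WORKSPACE = Path(f"tests/artifacts/live_gui_workspace_{_RUN_ID}")
 # Fixture; the existing @pytest.fixture(scope="session") decorator stays in place.
 def live_gui(request) -> Generator["_LiveGuiHandle", None, None]:
    ...
    temp_workspace = _RUN_WORKSPACE
 ```
 Add `from datetime import datetime` to the imports at the top of conftest.py.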
 ### FR2. `live_gui_workspace` fixture returns the new path
 **Where:** `tests/conftest.py:673-677` (the existing `live_gui_workspace` fixture)
 **What:** The fixture already exists and returns `handle.workspace`. The `handle.workspace` is set in `_LiveGuiHandle.__init__` from `temp_workspace`. So once FR1 is applied, the fixture returns the new path automatically.
 Verify with a new test:
 ```python
 def test_live_gui_workspace_is_under_tests_artifacts(live_gui_workspace):
    assert str(live_gui_workspace).replace("\\", "/").startswith("tests/artifacts/live_gui_workspace_")
 ```
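The `startswith` check above only holds while `handle.workspace` is a relative path. If the handle ever stores a resolved (absolute) path, a comparison on path components would still pass. A minimal sketch of that variant follows; the helper name `is_run_workspace` is hypothetical and not part of the spec.

```python
from pathlib import PurePath


def is_run_workspace(path: PurePath) -> bool:
    """True if `path` ends in tests/artifacts/live_gui_workspace_<something>.

    Works for relative and absolute paths, and for both separator styles,
    because it compares path components instead of the raw string.
    """
    parts = path.parts
    return (
        len(parts) >= 3
        and parts[-3:-1] == ("tests", "artifacts")
        and parts[-1].startswith("live_gui_workspace_")
    )
```

A test could then assert `is_run_workspace(Path(live_gui_workspace))` instead of normalising separators by hand.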
 ### FR3. Workspace is gitignored
 **Where:** `.gitignore` (already has `tests/artifacts/`)
 Verify with a new test:
 ```python
 def test_live_gui_workspace_is_gitignored(live_gui_workspace):
    import subprocess
    result = subprocess.run(
        ["git", "check-ignore", str(live_gui_workspace)],
        capture_output=True, text=True, cwd="."
    )
    assert result.returncode == 0, f"Workspace {live_gui_workspace} is not gitignored"
 ```
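The assertion above treats every nonzero exit status as "not gitignored". `git check-ignore` documents three outcomes: 0 when a path is ignored, 1 when none are, and 128 on a fatal error (for example, running outside a repository). A small sketch that keeps those cases apart is shown here; the helper name `interpret_check_ignore` is hypothetical.

```python
def interpret_check_ignore(returncode: int) -> bool:
    """Map a `git check-ignore` exit status to an 'is ignored' answer.

    0   -> at least one of the given paths is ignored
    1   -> none of the given paths are ignored
    128 -> git hit a fatal error, so the question was never answered
    """
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    raise RuntimeError(f"git check-ignore failed with exit status {returncode}")
```

With this, the test could assert `interpret_check_ignore(result.returncode)` and a broken git setup would surface as an error instead of a misleading "not gitignored" failure.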
 ## Non-Functional Requirements
 - **NFR1: Minimal diff.** Add `from datetime import datetime` and the two module-level constants. Change lines 453 (signature) and 465 (assignment).
 - **NFR2: No regressions.** Tier-1 and tier-2 batch results must match the `fe240db4` baseline.
 - **NFR3: 1 commit.** Atomic. Not batched.
 - **NFR4: 1-space indent, CRLF, type hints.** Per project conventions.
 ## Architecture Reference
 - **`tests/conftest.py:453-540`**: the `live_gui` session-scoped fixture. Only lines 453 and 465 change, plus the new import and the module-level constants.
 - **`tests/conftest.py:673-677`**: the `live_gui_workspace` fixture. No change needed; it returns `handle.workspace`, which is the new path.
 - **`scripts/run_tests_batched.py`**: no change.
 - **`simulation/sim_base.py:80-91`**: no change. `os.path.abspath("tests/artifacts/temp_*.toml")` resolves into the project tree, which works.
 - **`.gitignore`**: already has `tests/artifacts/`. No change.
 ## Out of Scope
 - Per-test isolation
 - Env vars
 - CLI args
 - Runner changes
 - Sim refactoring
 - New audit scripts
 - Doc updates
 - Follow-up tracks
 - Any "while we're at it" refactors
 ## Verification Criteria
 1. ✅ The `live_gui` fixture at `tests/conftest.py:453` no longer takes a `tmp_path_factory` parameter
 2. ✅ `tests/conftest.py:465` (or equivalent) reads `_RUN_WORKSPACE` (the timestamped path)
 3. ✅ `tests/artifacts/live_gui_workspace_<timestamp>/` exists after a pytest run
 4. ✅ 2 new verification tests pass
 5. ✅ Full batch: tier-1 5/5, tier-2 5/5, tier-3 0 new failures (or matches `fe240db4` baseline + the 4 sim tests now pass)
 6. ✅ The 4 sim tests in `tests/test_extended_sims.py` pass in batch
 7. ✅ 1 atomic commit
 ## Execution Plan
 This is a 1-commit, 6-step change. No phases. No agent handoffs.
 ### Step 1: Pre-edit checkpoint
 ```powershell
 cd C:\projects\manual_slop; git add .; git commit -m "wip: pre-workspace-path-finalize" --allow-empty
 ```
 ### Step 2: Apply the changes
 Use `manual-slop_set_file_slice` (the recommended surgical tool per `conductor/edit_workflow.md`):
 1. Add `from datetime import datetime` to the imports section of `tests/conftest.py`
 2. Add the module-level constants near the top of conftest.py (after imports):
   ```python
   _RUN_ID = datetime.now().strftime("%Y%m%d_%H%M%S")
   _RUN_WORKSPACE = Path(f"tests/artifacts/live_gui_workspace_{_RUN_ID}")
   ```
 3. Change `tests/conftest.py:453` from `def live_gui(request, tmp_path_factory)` to `def live_gui(request)`
 4. Change `tests/conftest.py:465` from `temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")` to `temp_workspace = _RUN_WORKSPACE`
 Verify syntax after each edit:
 ```powershell
 cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('tests/conftest.py', encoding='utf-8').read()); print('OK')"
 ```
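Note that `ast.parse` only catches syntax errors. It would not notice, for example, a leftover `tmp_path_factory` reference after the parameter is removed, which only fails when the fixture runs. For the syntax check itself, a sketch that also reports where parsing failed might look like the following; the helper name `syntax_error_in` and the default `filename` label are hypothetical.

```python
import ast
from typing import Optional


def syntax_error_in(source: str, filename: str = "<conftest>") -> Optional[str]:
    """Return 'file:line: message' for the first syntax error, or None if clean."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{exc.filename}:{exc.lineno}: {exc.msg}"
    return None
```

It could be fed `Path("tests/conftest.py").read_text(encoding="utf-8")`, assuming the file is UTF-8 encoded.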
 ### Step 3: Add 2 verification tests
 Create `tests/test_workspace_path_finalize.py` with the 2 tests in FR2 and FR3.
 ### Step 4: Run the 2 new tests
 ```powershell
 cd C:\projects\manual_slop; uv run pytest tests/test_workspace_path_finalize.py -v --timeout=30
 ```
 Expect: 2/2 pass.
 ### Step 5: Run the full batch
 ```powershell
 cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_finalize_batch_20260609.log" | Select-Object -Last 30
 ```
 Expect: tier-1 5/5, tier-2 5/5, tier-3 0 new failures (or 4 sim tests now pass + 1 RAG test now passes).
 ### Step 6: Commit
 ```powershell
 cd C:\projects\manual_slop; git add tests/conftest.py tests/test_workspace_path_finalize.py
 git commit -m "fix(test): per-run workspace under tests/artifacts/ (no env vars, no tmp_path)"
 $h = git log -1 --format='%H'
 git notes add -m "Replaces tmp_path_factory.mktemp with a per-run timestamped folder under tests/artifacts/. Each pytest invocation gets a new folder; all live_gui tests in that invocation share it (per-test pollution is intentional and exposes fragility, per the test_infrastructure_hardening_20260609 spec). Workspace is gitignored via tests/artifacts/. Sims in simulation/sim_base.py use os.path.abspath('tests/artifacts/...') which resolves correctly from the project root." $h
 ```
 ## Risk Assessment
 | Risk | Likelihood | Impact | Mitigation |
 |---|---|---|---|
 | 4-line edit corrupts conftest | Low | High | Use `manual-slop_set_file_slice`; verify syntax with `ast.parse` after each edit; pre-edit checkpoint |
 | `_RUN_ID` collides if two pytest invocations start in the same second | Very low | Low | Acceptable — second-precision is enough for human-driven runs; for CI, add a uuid suffix if needed (out of scope) |
 | Stale workspaces accumulate in `tests/artifacts/` | Medium | Low | They're gitignored; the user can `rm -rf tests/artifacts/live_gui_workspace_*` when needed; out of scope for this track |
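For reference only, since the table marks it out of scope for this track: the uuid suffix mentioned in the collision row could be sketched as below. The helper name `make_run_id` and the 8-character suffix length are assumptions, not part of the spec.

```python
import uuid
from datetime import datetime


def make_run_id(now: datetime) -> str:
    """Timestamp plus a short random suffix, so same-second runs differ."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # uuid4 is random; the first 8 hex characters are enough to separate
    # invocations that start within the same second.
    return f"{stamp}_{uuid.uuid4().hex[:8]}"
```

The folder name would still start with `live_gui_workspace_<timestamp>`, so the prefix check in FR2 would be unaffected.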
 ## See Also
 - **User feedback:** Per-test pollution is intentional. Per-run isolation is the goal. No env vars. No CLI args. Conftest is the source of truth.
 - **Pre-Phase 3 baseline:** `tests/conftest.py` had the workspace at `Path("tests/artifacts/live_gui_workspace")` (no timestamp). Sims worked.
 - **The phantom bug:** CWD drift was already fixed by `os.path.abspath` in `RAGEngine.index_file` (commit `eb8357ec`).
 - **The 3-task fix that mattered:** `fe240db4` (MMA + RAG state reset).
 - **What NOT to do:** `tmp_path_factory` (per-pytest-invocation, opaque, in %TEMP%). Env vars (hidden global state). CLI args (wrong abstraction layer).
@@ -0,0 +1,38 @@
 # Track state for workspace_path_finalize_20260609
 # Updated by executing agent as tasks complete
 [meta]
 track_id = "workspace_path_finalize_20260609"
 name = "Workspace Path Finalize (2026-06-09) - the LAST track on this issue"
 status = "active"
 current_phase = 1
 last_updated = "2026-06-09"
 [blocked_by]
 # No blockers; this is the final cleanup of the test_infrastructure_hardening track
 [blocks]
 # This track blocks nothing. It is the last track on this issue.
 [phases]
 phase_1 = { status = "in_progress", checkpointsha = "", name = "Apply conftest fix and verify" }
 [tasks]
 t1_1 = { status = "pending", commit_sha = "", description = "Pre-edit checkpoint" }
 t1_2 = { status = "pending", commit_sha = "", description = "Apply conftest.py change (import, constants, signature, assignment)" }
 t1_3 = { status = "pending", commit_sha = "", description = "Add 2 verification tests" }
 t1_4 = { status = "pending", commit_sha = "", description = "Run the 2 new tests" }
 t1_5 = { status = "pending", commit_sha = "", description = "Run the full batch" }
 t1_6 = { status = "pending", commit_sha = "", description = "Commit" }
 [verification]
 workspace_at_tests_artifacts = false
 new_tests_pass = false
 full_batch_passes = false
 sim_tests_pass_in_batch = false
 [baseline_capture]
 # Captured from the fe240db4 commit
 tier_1_status = "PASS (5/5 batches)"
 tier_2_status = "PASS (5/5 batches)"
 tier_3_status = "FAIL on test_extended_sims.py::test_context_sim_live (1 known flake from Phase 3 tmp_path_factory refactor)"