conductor(track): workspace_path_finalize_20260609 - per-run workspace under tests/artifacts/

2026-06-09 20:27:20 -04:00
parent fe240db410
commit c725270b99
3 changed files with 309 additions and 0 deletions
@@ -0,0 +1,37 @@
{
"track_id": "workspace_path_finalize_20260609",
"name": "Workspace Path Finalize (2026-06-09) - the LAST track on this issue",
"created_at": "2026-06-09",
"status": "spec",
"priority": "A",
"blocked_by": [],
"blocks": [],
"inherits_from": [
"conductor/tracks/test_infrastructure_hardening_20260609/"
],
"supersedes": [],
"domain": "Meta-Tooling (test infrastructure)",
"scope_summary": "Small fixture change to move the live_gui workspace from %TEMP%/pytest-of-... back into the project tree at tests/artifacts/live_gui_workspace_<timestamp>/ (gitignored, where the sims expect it). The Phase 3 tmp_path_factory refactor was a regression. The user explicitly called this out.",
"estimated_effort": "30 minutes",
"phases": 1,
"verification_criteria": [
"tests/conftest.py:465 reads _RUN_WORKSPACE (tests/artifacts/live_gui_workspace_<timestamp>)",
"tests/test_workspace_path_finalize.py has 2 tests, both pass",
"Full batch: tier-1 5/5, tier-2 5/5, tier-3 0 new failures",
"The 4 sim tests in tests/test_extended_sims.py pass in batch"
],
"out_of_scope": [
"Refactoring simulation/sim_base.py",
"Adding new audit scripts",
"Updating docs",
"Filing follow-up tracks",
"Any 'while we're at it' refactors"
],
"risks": [
{
"risk": "1-line edit corrupts conftest (as happened in the previous attempt)",
"mitigation": "Use manual-slop_set_file_slice; verify syntax with ast.parse after"
}
],
"tier_2_supervision_required_for": []
}
@@ -0,0 +1,234 @@
# Track Specification: Workspace Path Per-Run (2026-06-09)
## Overview
Conftest creates `tests/artifacts/live_gui_workspace_<timestamp>/` once per pytest invocation. No env vars, no CLI args, no runner changes. The conftest is the source of truth for the workspace path.
**Per-test pollution is intentional** — it exposes fragility, which is the whole point of the test infrastructure hardening track.
**Per-run isolation** — each `uv run pytest` invocation gets a new timestamped folder, so state doesn't leak across runs.
**Why this design:**
- No env vars (anti-pattern, hidden global state)
- No CLI args (conftest is the right place for test infrastructure)
- No runner changes (`run_tests_batched.py` already works)
- Path is in the project tree under `tests/artifacts/` (gitignored, inspectable, where the sims expect it)
- `tests/artifacts/` is already gitignored — no repo pollution
## Current State Audit (as of fe240db4)
### Bug
`tests/conftest.py:453-465`:
```python
@pytest.fixture(scope="session")
def live_gui(request, tmp_path_factory) -> Generator["_LiveGuiHandle", None, None]:
 ...
 temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")
```
This puts the workspace at `C:\Users\<user>\AppData\Local\Temp\pytest-of-<user>\pytest-N\live_gui_workspace0`. That's:
1. Not in the project tree (user can't find it)
2. Per-pytest-invocation (re-rolled each run, which is fine), but with an opaque name
3. Different location from what the sims in `simulation/sim_base.py` expect (`tests/artifacts/...`)
### The fix
Replace `tmp_path_factory.mktemp("live_gui_workspace")` with a deterministic per-run folder under `tests/artifacts/`:
```python
from datetime import datetime
_run_id = datetime.now().strftime("%Y%m%d_%H%M%S")
temp_workspace = Path(f"tests/artifacts/live_gui_workspace_{_run_id}")
```
This:
- Places the workspace at `tests/artifacts/live_gui_workspace_20260609_201530/`, relative to the current working directory (the project root)
- Each `uv run pytest` invocation gets a new folder (timestamp is per-second granularity)
- All 49 live_gui tests in that invocation share the workspace
- The folder is in `tests/artifacts/` (already gitignored, see `git check-ignore tests/artifacts`)
- The sims' `os.path.abspath("tests/artifacts/temp_*.toml")` resolves against the same project root, so the sim files and the workspace end up in the same tree
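A minimal sketch of the per-run path, assuming the fixture has to create the directory itself: unlike `tmp_path_factory.mktemp`, constructing a `Path` does not touch the filesystem. The helper name `make_run_workspace` and its `root` parameter are hypothetical, added only so the sketch can run outside conftest.py:

```python
from datetime import datetime
from pathlib import Path


def make_run_workspace(root: Path = Path("tests/artifacts")) -> Path:
 """Hypothetical helper: build and create a timestamped per-run workspace under root."""
 run_id = datetime.now().strftime("%Y%m%d_%H%M%S")  # second granularity, as in the spec
 workspace = root / f"live_gui_workspace_{run_id}"
 # Path(...) alone creates nothing on disk; mkdir makes the folder (and any parents).
 workspace.mkdir(parents=True, exist_ok=True)
 return workspace
```

If the existing fixture body already calls `mkdir` on `temp_workspace`, no extra call is needed; that part of conftest.py is not shown here.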
### What to KEEP from Phase 3
- `tests/test_live_gui_workspace_fixture.py` — the test file that verifies the `live_gui_workspace` fixture
- The 5 test files updated in `006bb114` to use the fixture instead of hardcoded paths
- The `_LiveGuiHandle` class with `__iter__`/`__getitem__` backward compat
- The `_check_live_gui_health` autouse fixture
- The `clean_baseline` marker
- The 3-task fix at `fe240db4` (MMA + RAG state reset)
### What to REVERT
- `tests/conftest.py:465`: change `tmp_path_factory.mktemp("live_gui_workspace")` back to a stable path under `tests/artifacts/`
### What to ADD
- `_RUN_ID` and `_RUN_WORKSPACE` module-level constants in conftest.py (computed once at import time)
- The `live_gui_workspace` fixture already exists; just verify it returns the new path
## Goals
1. **Goal A: Workspace at `tests/artifacts/live_gui_workspace_<timestamp>/`.** Conftest creates the folder, all live_gui tests share it for the duration of the run.
2. **Goal B: Sim tests pass in full batch.** `tests/test_extended_sims.py` 4 sims pass in tier-3.
3. **Goal C: Per-run isolation.** Each `uv run pytest` invocation gets a new folder. State from a prior run doesn't pollute.
4. **Goal D: Inspectable from project tree.** The user can `ls tests/artifacts/live_gui_workspace_*/` to see what the GUI subprocess is working with.
### Non-Goals
- ❌ Per-test isolation. The whole point is per-test pollution = exposed fragility.
- ❌ Env vars. The user explicitly rejected them.
- ❌ CLI args. Conftest is the right place.
- ❌ Runner changes. `run_tests_batched.py` is fine as-is.
- ❌ Refactoring `simulation/sim_base.py`. It already uses `tests/artifacts/` paths.
- ❌ New audit scripts.
- ❌ New tests beyond the 2 verification tests.
- ❌ Doc updates.
- ❌ Follow-up tracks.
## Functional Requirements
### FR1. Conftest creates per-run workspace
**Where:** `tests/conftest.py:453-465`
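**What:** Replace the `tmp_path_factory` workspace with a module-level per-run path. The signature on line 453 loses `tmp_path_factory` and the assignment on line 465 changes: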
```python
# BEFORE (line 453)
def live_gui(request, tmp_path_factory) -> Generator["_LiveGuiHandle", None, None]:
 ...
 temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")

# AFTER
_RUN_ID = datetime.now().strftime("%Y%m%d_%H%M%S")
_RUN_WORKSPACE = Path(f"tests/artifacts/live_gui_workspace_{_RUN_ID}")

def live_gui(request) -> Generator["_LiveGuiHandle", None, None]:
 ...
 temp_workspace = _RUN_WORKSPACE
```
Add `from datetime import datetime` to the imports at the top of conftest.py.
### FR2. `live_gui_workspace` fixture returns the new path
**Where:** `tests/conftest.py:673-677` (the existing `live_gui_workspace` fixture)
**What:** The fixture already exists and returns `handle.workspace`, which `_LiveGuiHandle.__init__` sets from `temp_workspace`. Once FR1 is applied, the fixture returns the new path automatically.
Verify with a new test:
```python
def test_live_gui_workspace_is_under_tests_artifacts(live_gui_workspace):
 assert str(live_gui_workspace).replace("\\", "/").startswith("tests/artifacts/live_gui_workspace_")
```
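The prefix check above holds only while `handle.workspace` stays a relative path. Should the fixture ever resolve it to an absolute path, a containment check is one alternative. This is a sketch only; `is_under_artifacts` and `project_root` are hypothetical names that do not exist in the repository:

```python
from pathlib import Path


def is_under_artifacts(workspace: Path, project_root: Path) -> bool:
 """Hypothetical helper: True if workspace is a live_gui_workspace_* folder directly under tests/artifacts."""
 artifacts = (project_root / "tests" / "artifacts").resolve()
 # Relative paths are interpreted against project_root; absolute ones pass through unchanged.
 resolved = (project_root / workspace).resolve()
 return resolved.parent == artifacts and resolved.name.startswith("live_gui_workspace_")
```

This accepts both `tests/artifacts/live_gui_workspace_<timestamp>` and its absolute form, at the cost of needing the project root as an input.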
### FR3. Workspace is gitignored
**Where:** `.gitignore` (already has `tests/artifacts/`)
Verify with a new test:
```python
def test_live_gui_workspace_is_gitignored(live_gui_workspace):
 import subprocess
 result = subprocess.run(
  ["git", "check-ignore", str(live_gui_workspace)],
  capture_output=True, text=True, cwd="."
 )
 assert result.returncode == 0, f"Workspace {live_gui_workspace} is not gitignored"
```
## Non-Functional Requirements
- **NFR1: Minimal diff.** Add `from datetime import datetime` and the two module-level constants, drop `tmp_path_factory` from the signature on line 453, and change line 465.
- **NFR2: No regressions.** Tier-1 and tier-2 batch results must match the `fe240db4` baseline.
- **NFR3: 1 commit.** Atomic. Not batched.
- **NFR4: 1-space indent, CRLF, type hints.** Per project conventions.
## Architecture Reference
- **`tests/conftest.py:453-540`** — the `live_gui` session-scoped fixture. Only lines 453 and 465 change, plus the import and the two module-level constants.
- **`tests/conftest.py:673-677`** — the `live_gui_workspace` fixture. No change needed; it returns `handle.workspace` which is the new path.
- **`scripts/run_tests_batched.py`** — no change.
- **`simulation/sim_base.py:80-91`** — no change. `os.path.abspath("tests/artifacts/temp_*.toml")` resolves to the project tree, which works.
- **`.gitignore`** — already has `tests/artifacts/`. No change.
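The `os.path.abspath` behaviour the sims rely on can be checked in isolation. The sketch below only illustrates that a relative path is anchored to the current working directory at call time; the `temp_example.toml` filename is a placeholder, not a file the sims are known to create:

```python
import os


def resolve_artifact(relative_path: str) -> str:
 """Illustrative only: abspath joins the path onto os.getcwd() and normalises it."""
 return os.path.abspath(relative_path)


# Run from the project root, this lands inside <project root>/tests/artifacts/.
resolved = resolve_artifact("tests/artifacts/temp_example.toml")
expected = os.path.normpath(
 os.path.join(os.getcwd(), "tests", "artifacts", "temp_example.toml")
)
```

If pytest is launched from another directory, both the relative workspace path and the sims' paths move with it, which is why the spec assumes the project root as the CWD.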
## Out of Scope
- Per-test isolation
- Env vars
- CLI args
- Runner changes
- Sim refactoring
- New audit scripts
- Doc updates
- Follow-up tracks
- Any "while we're at it" refactors
## Verification Criteria
1. ✅ `tests/conftest.py:453` no longer takes `tmp_path_factory` parameter
2. ✅ `tests/conftest.py:465` (or equivalent) reads `_RUN_WORKSPACE` (the timestamped path)
3. ✅ `tests/artifacts/live_gui_workspace_<timestamp>/` exists after a pytest run
4. ✅ 2 new verification tests pass
5. ✅ Full batch: tier-1 5/5, tier-2 5/5, tier-3 0 new failures (or matches `fe240db4` baseline + the 4 sim tests now pass)
6. ✅ The 4 sim tests in `tests/test_extended_sims.py` pass in batch
7. ✅ 1 atomic commit
## Execution Plan
This is a 1-commit, 6-step change. No phases. No agent handoffs.
### Step 1: Pre-edit checkpoint
```powershell
cd C:\projects\manual_slop; git add .; git commit -m "wip: pre-workspace-path-finalize" --allow-empty
```
### Step 2: Apply the changes
Use `manual-slop_set_file_slice` (the recommended surgical tool per `conductor/edit_workflow.md`):
1. Add `from datetime import datetime` to the imports section of `tests/conftest.py`
2. Add the module-level constants near the top of conftest.py (after imports):
```python
_RUN_ID = datetime.now().strftime("%Y%m%d_%H%M%S")
_RUN_WORKSPACE = Path(f"tests/artifacts/live_gui_workspace_{_RUN_ID}")
```
3. Change `tests/conftest.py:453` from `def live_gui(request, tmp_path_factory)` to `def live_gui(request)`
4. Change `tests/conftest.py:465` from `temp_workspace = tmp_path_factory.mktemp("live_gui_workspace")` to `temp_workspace = _RUN_WORKSPACE`
Verify syntax after each edit:
```powershell
cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('tests/conftest.py').read()); print('OK')"
```
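The same check as a small Python function, in case it is more convenient than the one-liner. `check_syntax` is a hypothetical name, and reading the file as UTF-8 is an assumption about how conftest.py is encoded:

```python
import ast
from pathlib import Path


def check_syntax(path: Path) -> bool:
 """Hypothetical helper: return True if the file parses as Python, False on a SyntaxError."""
 source = path.read_text(encoding="utf-8")
 try:
  ast.parse(source, filename=str(path))
 except SyntaxError as exc:
  # Report the location so a corrupted edit can be found quickly.
  print(f"{path}:{exc.lineno}: {exc.msg}")
  return False
 return True
```

Calling `check_syntax(Path("tests/conftest.py"))` from the project root after each edit mirrors the `ast.parse` step above.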
### Step 3: Add 2 verification tests
Create `tests/test_workspace_path_finalize.py` with the 2 tests in FR2 and FR3.
### Step 4: Run the 2 new tests
```powershell
cd C:\projects\manual_slop; uv run pytest tests/test_workspace_path_finalize.py -v --timeout=30
```
Expect: 2/2 pass.
### Step 5: Run the full batch
```powershell
cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_finalize_batch_20260609.log" | Select-Object -Last 30
```
Expect: tier-1 5/5, tier-2 5/5, tier-3 0 new failures (or 4 sim tests now pass + 1 RAG test now passes).
### Step 6: Commit
```powershell
# The batch log lives under the gitignored tests/artifacts/ directory, so it stays local and is not staged.
cd C:\projects\manual_slop; git add tests/conftest.py tests/test_workspace_path_finalize.py
git commit -m "fix(test): per-run workspace under tests/artifacts/ (no env vars, no tmp_path)"
$h = git log -1 --format='%H'
git notes add -m "Replaces tmp_path_factory.mktemp with a per-run timestamped folder under tests/artifacts/. Each pytest invocation gets a new folder; all live_gui tests in that invocation share it (per-test pollution is intentional and exposes fragility, per the test_infrastructure_hardening_20260609 spec). Workspace is gitignored via tests/artifacts/. Sims in simulation/sim_base.py use os.path.abspath('tests/artifacts/...') which resolves correctly from the project root." $h
```
## Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| 4-line edit corrupts conftest | Low | High | Use `manual-slop_set_file_slice`; verify syntax with `ast.parse` after each edit; pre-edit checkpoint |
| `_RUN_ID` collides if two pytest invocations start in the same second | Very low | Low | Acceptable — second-precision is enough for human-driven runs; for CI, add a uuid suffix if needed (out of scope) |
| Stale workspaces accumulate in `tests/artifacts/` | Medium | Low | They're gitignored; the user can `rm -rf tests/artifacts/live_gui_workspace_*` when needed; out of scope for this track |
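For the same-second collision risk, the uuid suffix mentioned in the table could look like the sketch below. It is out of scope for this track and shown only for reference; `make_run_id` is a hypothetical name:

```python
import uuid
from datetime import datetime


def make_run_id() -> str:
 """Hypothetical helper: timestamp plus a short random suffix to avoid same-second collisions."""
 stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
 suffix = uuid.uuid4().hex[:8]  # 8 hex characters of randomness
 return f"{stamp}_{suffix}"
```

The folder name would still start with `live_gui_workspace_`, so the prefix assertion in FR2 would be unaffected.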
## See Also
- **User feedback:** Per-test pollution is intentional. Per-run isolation is the goal. No env vars. No CLI args. Conftest is the source of truth.
- **Pre-Phase 3 baseline:** `tests/conftest.py` had the workspace at `Path("tests/artifacts/live_gui_workspace")` (no timestamp). Sims worked.
- **The phantom bug:** CWD drift was already fixed by `os.path.abspath` in `RAGEngine.index_file` (commit `eb8357ec`).
- **The 3-task fix that mattered:** `fe240db4` (MMA + RAG state reset).
- **What NOT to do:** `tmp_path_factory` (per-pytest-invocation, opaque, in %TEMP%). Env vars (hidden global state). CLI args (wrong abstraction layer).
@@ -0,0 +1,38 @@
# Track state for workspace_path_finalize_20260609
# Updated by executing agent as tasks complete
[meta]
track_id = "workspace_path_finalize_20260609"
name = "Workspace Path Finalize (2026-06-09) - the LAST track on this issue"
status = "active"
current_phase = 1
last_updated = "2026-06-09"
[blocked_by]
# No blockers; this is the final cleanup of the test_infrastructure_hardening track
[blocks]
# This track blocks nothing. It is the last track on this issue.
[phases]
phase_1 = { status = "in_progress", checkpointsha = "", name = "Apply 1-line fix and verify" }
[tasks]
t1_1 = { status = "pending", commit_sha = "", description = "Pre-edit checkpoint" }
t1_2 = { status = "pending", commit_sha = "", description = "Apply 1-line conftest.py change" }
t1_3 = { status = "pending", commit_sha = "", description = "Add 2 verification tests" }
t1_4 = { status = "pending", commit_sha = "", description = "Run the 2 new tests" }
t1_5 = { status = "pending", commit_sha = "", description = "Run the full batch" }
t1_6 = { status = "pending", commit_sha = "", description = "Commit" }
[verification]
workspace_at_tests_artifacts = false
new_tests_pass = false
full_batch_passes = false
sim_tests_pass_in_batch = false
[baseline_capture]
# Captured from the fe240db4 commit
tier_1_status = "PASS (5/5 batches)"
tier_2_status = "PASS (5/5 batches)"
tier_3_status = "FAIL on test_extended_sims.py::test_context_sim_live (1 known flake from Phase 3 tmp_path_factory refactor)"