conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility)

2026-06-10 10:12:09 -04:00
parent e788512d93
commit c729f8adaf
3 changed files with 157 additions and 20 deletions
@@ -552,11 +552,104 @@ Focus: The moment of truth. The 4 sim tests in `test_extended_sims.py` now pass,
  - `tests/test_context_presets_manager.py::test_app_controller_save_load`
  - `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager`
  - `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror`
- [x] 4 sim tests in `tests/test_extended_sims.py` pass (full sim run)
+- [x] 4 sim tests in `tests/test_extended_sims.py` pass (ISOLATED run; 4/4 in 222.08s)
 - [x] Targeted regression verification: 36/36 affected tests pass
- [ ] Tier-1 batch: 5/5 pass (full batch aborted; targeted verification substituted)
- [ ] Tier-2 batch: 5/5 pass (full batch aborted)
- [ ] Tier-3 batch: 0 new failures (full batch aborted; targeted verification substituted)
+- [x] Tier-1 batch: 5/5 pass (2026-06-10 batch run)
+- [x] Tier-2 batch: 5/5 pass (2026-06-10 batch run)
+- [ ] Tier-3 batch: 0 new failures (FAILED in 2026-06-10 batch run; see Phase 2 below)
+
+## Phase 2: Fix live_gui sim test fragility
+
+The Phase 1 verification (isolated sim test run) was misleading. The full batch run revealed a SEPARATE failure in `test_extended_sims.py::test_context_sim_live` — `KeyError: 'paths'` at `simulation/sim_context.py:44`. This is a live_gui shared-subprocess state issue, not a regression of the FR1+FR2 fix.
+
+### Task 7.1: Diagnose the root cause
+
+- [ ] **Step 7.1.1: Read the duplicated loop in sim_context.py**
+  ```powershell
+  cd C:\projects\manual_slop; uv run python -c "import ast; print(ast.unparse(ast.parse(open('simulation/sim_context.py').read())))" | Select-String "for f in all_py"
+  ```
+  Confirm that lines 32-37 and 41-47 contain duplicate logic: the second loop is meant to add MORE files, but the first loop has already added all of them.
+
+- [ ] **Step 7.1.2: Check what post_project does to empty/missing `paths`**
+  ```powershell
+  cd C:\projects\manual_slop; uv run python -c "
+from api_hook_client import ApiHookClient
+import json
+client = ApiHookClient()
+if not client.wait_for_server(timeout=5):
+    print('server not up; skip')
+else:
+    p = client.get_project()
+    print('project files before:', json.dumps(p.get('project', {}).get('files', {}), indent=2))
+  "
+  ```
+  Expected: in the live_gui subprocess, the project's `files` dict may not have a `paths` key after a fresh `setup()` (because the test setup at `simulation/sim_base.py:78-99` doesn't pre-populate `paths`).
+
+- [ ] **Step 7.1.3: Read sim_base.setup to understand initial state**
+  Use `manual-slop_get_file_slice` to read `simulation/sim_base.py:78-99`. Confirm `setup()` does NOT pre-populate `files['paths']` in the saved project.
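
The duplicate-loop check in Step 7.1.1 could also be done structurally rather than by text search. The following is a minimal sketch, assuming the two loops are syntactically identical; `find_duplicate_for_loops` and the `SAMPLE` source are hypothetical illustrations, not code from `simulation/sim_context.py`.

```python
import ast
from collections import defaultdict

def find_duplicate_for_loops(source):
    """Return groups of line numbers whose `for` loops have identical ASTs."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For):
            # ast.dump omits line/column info by default, so identical
            # loops at different positions produce the same key.
            groups[ast.dump(node)].append(node.lineno)
    return sorted(sorted(lines) for lines in groups.values() if len(lines) > 1)

# Hypothetical stand-in for the shape described in the plan:
# the same loop, followed by a save, repeated twice.
SAMPLE = '''
def run(proj, all_py):
    for f in all_py:
        if f not in proj["paths"]:
            proj["paths"].append(f)
    save(proj)
    for f in all_py:
        if f not in proj["paths"]:
            proj["paths"].append(f)
    save(proj)
'''

duplicates = find_duplicate_for_loops(SAMPLE)
print(duplicates)
```

To try it on the real file, pass `open('simulation/sim_context.py').read()` instead of `SAMPLE`. If the two loops differ in any detail, this check reports nothing and a manual read is still needed.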
+
+### Task 7.2: Apply the fix
+
+The fix is a 1-3 line change. Choose ONE of:
+
+**Option A: Make the test code defensive (test-only fix)**
+Modify `simulation/sim_context.py:44` to use `.setdefault('paths', [])`:
+```python
+for f in all_py:
+    if f not in proj['project']['files'].setdefault('paths', []):
+        proj['project']['files']['paths'].append(f)
+```
+Apply to BOTH loops (lines 33-35 and lines 43-45) for consistency.
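
As a minimal sketch of why Option A works, the snippet below replays the loop against a plain dict whose `files` entry has no `paths` key. The `proj` payload, the `all_py` list, and both helper functions are stand-ins invented for illustration; they are not the values or code that `sim_context.py` actually uses.

```python
# Hypothetical stand-ins: a project payload whose 'files' dict lacks 'paths'.
proj = {"project": {"files": {}}}
all_py = ["a.py", "b.py", "a.py"]

def add_paths_strict(proj, files):
    """Original pattern: raises KeyError when 'paths' is absent."""
    for f in files:
        if f not in proj["project"]["files"]["paths"]:
            proj["project"]["files"]["paths"].append(f)

def add_paths_defensive(proj, files):
    """Option A pattern: setdefault creates the list on first access."""
    for f in files:
        if f not in proj["project"]["files"].setdefault("paths", []):
            proj["project"]["files"]["paths"].append(f)

try:
    add_paths_strict(proj, all_py)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]  # name of the missing key

add_paths_defensive(proj, all_py)
print(strict_error)
print(proj["project"]["files"]["paths"])
```

The strict variant fails on the first membership test, while the defensive variant creates the list once and then deduplicates as before.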
+
+**Option B: Remove the redundant second loop (cleanup)**
+The second loop (lines 41-47) is identical to the first. Remove it. The first loop's `post_project` (line 37) already saves the project with all the files, so the second loop and its `post_project` call are unnecessary.
+
+**Recommended:** Option A is the minimal, defensive fix that addresses the test fragility without restructuring. Option B yields cleaner code but is a larger change.
+
+- [ ] **Step 7.2.1: Apply the chosen fix to simulation/sim_context.py**
+
+- [ ] **Step 7.2.2: Verify syntax**
+  ```powershell
+  cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('simulation/sim_context.py').read()); print('OK')"
+  ```
+
+- [ ] **Step 7.2.3: Verify import**
+  ```powershell
+  cd C:\projects\manual_slop; uv run python -c "from simulation.sim_context import ContextSimulation; print('import OK')"
+  ```
+
+### Task 7.3: Verify in batch
+
+- [ ] **Step 7.3.1: Run the 4 sim tests in isolation first (sanity)**
+  ```powershell
+  cd C:\projects\manual_slop; uv run pytest tests/test_extended_sims.py -v --timeout=300
+  ```
+  Expected: 4/4 pass in isolation.
+
+- [ ] **Step 7.3.2: Run the FULL batch to confirm (authoritative verification)**
+  ```powershell
+  cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log" | Select-Object -Last 50
+  ```
+  Expected: tier-1 5/5, tier-2 5/5, tier-3 0 failures.
+
+### Task 7.4: Final checkpoint
+
+- [ ] **Step 7.4.1: Commit the fix**
+  ```powershell
+  cd C:\projects\manual_slop; git add simulation/sim_context.py
+  git commit -m "fix(sim): make test_context_sim_live defensive against missing files['paths'] in batch"
+  $h = git log -1 --format='%H'
+  git notes add -m "..." $h
+  ```
+
+- [ ] **Step 7.4.2: Checkpoint commit with full batch log**
+  ```powershell
+  cd C:\projects\manual_slop; git add -f tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log
+  git commit -m "conductor(checkpoint): Phase 2 complete - sim test fragility fixed"
+  $h = git log -1 --format='%H'
+  git notes add -m "..." $h
+  ```

 ## Track Done

@@ -242,20 +242,51 @@ The accompanying comment claims `hasattr()` still returns False for these, which

 ## Verification Criteria

+### Phase 1 (COMPLETE — verified 2026-06-10)
+
 1. ✅ `src/app_controller.py:3409` pre-populates `mma_tier_usage` with the full default shape (model, provider, tool_preset, input, output for all 4 tiers).
 2. ✅ `src/app_controller.py:2639` uses `d.get("model")` (or equivalent) instead of `d["model"]`.
 3. ✅ `src/app_controller.py:__init__` contains `self.context_preset_manager = ContextPresetManager()` between the `_settable_fields` block and `self.perf_monitor = ...`.
 4. ✅ `src/app_controller.py:1266-1275` does NOT contain `"persona_manager"` in `_LAZY_MANAGER_DEFAULTS`. The misleading comment is fixed or removed.
 5. ✅ A new unit test in `tests/test_mma_tier_usage_reset_fix.py` verifies the post-reset flush doesn't crash.
 6. ✅ `tests/test_reset_session_clears_mma_and_rag.py` (3 tests) still pass.
-7. ✅ `tests/test_extended_sims.py::test_context_sim_live` passes in batch.
-8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
-9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
-10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
 11. ✅ `tests/test_context_presets_manager.py::test_app_controller_save_load` passes.
 12. ✅ `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager` passes.
 13. ✅ `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror` passes.
 14. ✅ Tier-1 batch: 5/5 pass.
 15. ✅ Tier-2 batch: 5/5 pass.
-16. ✅ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
 17. ✅ 4 atomic commits (one per FR).
+
+### Phase 2 (PENDING — to be completed)
+
+7. ❌ `tests/test_extended_sims.py::test_context_sim_live` passes in batch.
+8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
+9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
+10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
+16. ❌ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
+
+### Phase 2 Diagnosis (2026-06-10 full batch run)
+
+The Phase 1 FRs fixed the original `KeyError: 'model'` from `_flush_to_project`. However, the full batch run (not the isolated test run) revealed a SEPARATE failure in the same test:
+
+```
+FAILED tests/test_extended_sims.py::test_context_sim_live
+KeyError: 'paths'
+simulation\sim_context.py:44: KeyError
+```
+
+The traceback shows the SECOND loop in `simulation/sim_context.py:41-47` failing because `proj['project']['files']['paths']` is missing after the `post_project` round-trip. That loop duplicates the first one: the loop at lines 32-37 already adds all `.py` files to `paths`, and the second loop is supposed to add more, but by then the round-trip has stripped `paths`.
+
+**Differences from original failure (which FR1+FR2 fixed):**
+- Original (pre-fix): `KeyError: 'model'` from `_flush_to_project` at `src/app_controller.py:2639`
+- New (post-fix): `KeyError: 'paths'` from `simulation/sim_context.py:44` (in the test code, not production)
+
+**Root cause hypothesis:** The `post_project` hook strips empty/missing fields during the round-trip. In isolation, the first `post_project` succeeds and `paths` is preserved (probably because the first `proj` fetch already had a non-empty `paths` from prior session state). In batch, the live_gui subprocess state is different (different project setup path, prior tests' state has been cleared) and `paths` is empty/absent, so the re-fetch returns a project where `files['paths']` is missing entirely.
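
The hypothesis can be illustrated with a toy model. This is only a sketch under the assumption that the round-trip drops empty containers; `round_trip` is a hypothetical function written for this example and does not reproduce the real `post_project` hook.

```python
import copy

def round_trip(project):
    """Hypothetical model of a save/load that drops empty fields."""
    saved = copy.deepcopy(project)
    files = saved["project"]["files"]
    # Keep only entries that are not an empty list, empty dict, or None.
    saved["project"]["files"] = {
        k: v for k, v in files.items() if v not in ([], {}, None)
    }
    return saved

# Isolated run (assumed): 'paths' is already non-empty from prior state.
with_prior_state = {"project": {"files": {"paths": ["old.py"]}}}
# Batch run (assumed): fresh state, 'paths' present but empty.
fresh_state = {"project": {"files": {"paths": []}}}

isolated = round_trip(with_prior_state)
batch = round_trip(fresh_state)

print("paths" in isolated["project"]["files"])
print("paths" in batch["project"]["files"])
```

Under this model the key survives only when it was non-empty before the save, which matches the isolated-pass and batch-fail pattern described above.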
+
+**Verification path for Phase 2:**
+- Read the current `sim_context.py:run()` to understand the duplicated loop's intent
+- Either: (a) remove the redundant second loop, (b) make the test handle a missing `paths` key with `.setdefault('paths', [])`, or (c) fix `_flush_to_project` to preserve empty `paths` lists
+- Re-run the full batch to confirm all 4 sim tests pass
+- Update the verification log
+
+**Per AGENTS.md "Isolated-Pass Verification Fallacy":** the previous run that claimed "4/4 sim tests pass" was based on an isolated run. The full batch is the authoritative test. The track is NOT complete until Phase 2 verification passes.
@@ -4,8 +4,8 @@
 [meta]
 track_id = "mma_tier_usage_reset_fix_20260610"
 name = "Fix mma_tier_usage reset + 3 pre-existing controller bugs (2026-06-10)"
-status = "completed"
-current_phase = "complete"
+status = "in_progress"
+current_phase = 1
 last_updated = "2026-06-10"

 [blocked_by]
@@ -16,6 +16,7 @@ last_updated = "2026-06-10"

 [phases]
 phase_1 = { status = "completed", checkpointsha = "428aa189", name = "Apply 4 FRs in app_controller.py + 4 regression tests" }
+phase_2 = { status = "pending", checkpointsha = "", name = "Fix live_gui sim test fragility (test_context_sim_live KeyError: paths)" }

 [tasks]
 t1_1 = { status = "completed", commit_sha = "f5021360", description = "Pre-edit checkpoint" }
@@ -28,6 +29,10 @@ t1_7 = { status = "completed", commit_sha = "b96d709e", description = "Verify th
 t1_8 = { status = "completed", commit_sha = "b96d709e", description = "Run the 3 previously-failing tier-1 tests + 4 sim tests in test_extended_sims.py" }
 t1_9 = { status = "completed", commit_sha = "428aa189", description = "Run targeted regression tests (full batch aborted by user)" }
 t1_10 = { status = "completed", commit_sha = "428aa189", description = "Checkpoint commit" }
+t2_1 = { status = "pending", commit_sha = "", description = "Diagnose why proj['project']['files']['paths'] is missing after post_project round-trip in batched live_gui (works in isolated run)" }
+t2_2 = { status = "pending", commit_sha = "", description = "Apply fix (production or test, whichever is correct root cause)" }
+t2_3 = { status = "pending", commit_sha = "", description = "Verify all 4 sim tests pass in FULL batch (tier-3-live_gui)" }
+t2_4 = { status = "pending", commit_sha = "", description = "Final checkpoint with batch log" }

 [verification]
 mma_tier_usage_prepopulated = true
@@ -37,16 +42,17 @@ persona_manager_removed_from_lazy_defaults = true
 regression_tests_pass = true
 reset_clears_mma_tests_pass = true
 three_failing_tier1_tests_pass = true
-extended_sims_pass = true
-targeted_regression_passes = true
+extended_sims_pass_isolated = true
+extended_sims_pass_in_batch = false

 [baseline_capture]
-# Captured from the 2026-06-10 batch run
+# Captured from the 2026-06-10 batch runs
 tier_1_status_pre_fix = "FAIL (3 tests: test_app_controller_save_load, test_load_active_project_creates_persona_manager, test_load_context_preset_missing_raises_keyerror)"
 tier_2_status_pre_fix = "PASS (5/5 batches)"
-tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests)"
-tier_1_status_post_fix = "PASS (targeted verification: 20/20 in affected test files)"
-tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
+tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests) - KeyError: 'model' (the original FR1+FR2 bug)"
+tier_1_status_post_phase_1 = "PASS (5/5 tier-1 batches in 2026-06-10 batch run)"
+tier_2_status_post_phase_1 = "PASS (5/5 tier-2 batches in 2026-06-10 batch run)"
+tier_3_status_post_phase_1 = "FAIL on test_extended_sims.py::test_context_sim_live - KeyError: 'paths' (a NEW issue exposed after the original bug was fixed)"

 [notes]
 # Test fixture in tests/test_mma_tier_usage_reset_fix.py sets 4 UI flags
@@ -56,6 +62,13 @@ tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
 # refactor from the previous agent's WIP commit. A follow-up to clean up
 # _UI_FLAG_DEFAULTS is recommended.

-# The full scripts/run_tests_batched.py run was aborted by the user.
-# Targeted regression runs (36/36 across all affected files) cover all
-# known affected code paths.
+# CRITICAL: Phase 1 verification was based on ISOLATED sim test runs.
+# The full batch run (tier-3-live_gui) reveals a SEPARATE test fragility
+# issue: sim_context.py:41-47 (the SECOND redundant file-add loop)
+# fails in batch because proj['project']['files']['paths'] is missing
+# after the post_project round-trip. This is a live_gui shared-subprocess
+# state issue (different from the original KeyError: 'model' bug).
+#
+# Per the workflow's "Isolated-Pass Verification Fallacy" rule, the batch
+# failure is the authoritative result. The track is NOT complete; the
+# 4 sim tests must pass in batch (the verification criterion).