Private
Public Access
0
0

conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility)

This commit is contained in:
2026-06-10 10:12:09 -04:00
parent e788512d93
commit c729f8adaf
3 changed files with 157 additions and 20 deletions
@@ -552,11 +552,104 @@ Focus: The moment of truth. The 4 sim tests in `test_extended_sims.py` now pass,
- `tests/test_context_presets_manager.py::test_app_controller_save_load`
- `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager`
- `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror`
- [x] 4 sim tests in `tests/test_extended_sims.py` pass (full sim run)
- [x] 4 sim tests in `tests/test_extended_sims.py` pass (ISOLATED run; 4/4 in 222.08s)
- [x] Targeted regression verification: 36/36 affected tests pass
- [ ] Tier-1 batch: 5/5 pass (full batch aborted; targeted verification substituted)
- [ ] Tier-2 batch: 5/5 pass (full batch aborted)
- [ ] Tier-3 batch: 0 new failures (full batch aborted; targeted verification substituted)
- [x] Tier-1 batch: 5/5 pass (2026-06-10 batch run)
- [x] Tier-2 batch: 5/5 pass (2026-06-10 batch run)
- [ ] Tier-3 batch: 0 new failures (FAILED in 2026-06-10 batch run; see Phase 2 below)
## Phase 2: Fix live_gui sim test fragility
The Phase 1 verification (isolated sim test run) was misleading. The full batch run revealed a SEPARATE failure in `test_extended_sims.py::test_context_sim_live` — `KeyError: 'paths'` at `simulation/sim_context.py:44`. This is a live_gui shared-subprocess state issue, not a regression of the FR1+FR2 fix.
### Task 7.1: Diagnose the root cause
- [ ] **Step 7.1.1: Read the duplicated loop in sim_context.py**
```powershell
cd C:\projects\manual_slop; uv run python -c "import ast; print(ast.unparse(ast.parse(open('simulation/sim_context.py').read())))" | Select-String "for f in all_py"
```
Confirm lines 32-37 and 41-47 are duplicate logic. The second loop is supposed to add MORE files but the first loop already added all of them.
- [ ] **Step 7.1.2: Check what post_project does to empty/missing `paths`**
```powershell
cd C:\projects\manual_slop; uv run python -c "
from api_hook_client import ApiHookClient
import json
client = ApiHookClient()
import time
if not client.wait_for_server(timeout=5):
print('server not up; skip')
else:
p = client.get_project()
print('project files before:', json.dumps(p.get('project', {}).get('files', {}), indent=2))
"
```
Expected: in the live_gui subprocess, the project's `files` dict may not have a `paths` key after a fresh `setup()` (because the test setup at `simulation/sim_base.py:78-99` doesn't pre-populate `paths`).
- [ ] **Step 7.1.3: Read sim_base.setup to understand initial state**
Use `manual-slop_get_file_slice` to read `simulation/sim_base.py:78-99`. Confirm `setup()` does NOT pre-populate `files['paths']` in the saved project.
### Task 7.2: Apply the fix
The fix is a 1-3 line change. Choose ONE of:
**Option A: Make the test code defensive (test-only fix)**
Modify `simulation/sim_context.py:44` to use `.setdefault('paths', [])`:
```python
for f in all_py:
if f not in proj['project']['files'].setdefault('paths', []):
proj['project']['files']['paths'].append(f)
```
Apply to BOTH loops (lines 33-35 and lines 43-45) for consistency.
**Option B: Remove the redundant second loop (cleanup)**
The second loop (lines 41-47) is identical to the first. Remove it. The first loop's `post_project` (line 37) already saves the project with all the files. The second loop+post is unnecessary.
**Recommended:** Option A is the minimal, defensive fix that addresses the test fragility without restructuring. Option B is cleaner code but more change.
- [ ] **Step 7.2.1: Apply the chosen fix to simulation/sim_context.py**
- [ ] **Step 7.2.2: Verify syntax**
```powershell
cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('simulation/sim_context.py').read()); print('OK')"
```
- [ ] **Step 7.2.3: Verify import**
```powershell
cd C:\projects\manual_slop; uv run python -c "from simulation.sim_context import ContextSimulation; print('import OK')"
```
### Task 7.3: Verify in batch
- [ ] **Step 7.3.1: Run the 4 sim tests in isolation first (sanity)**
```powershell
cd C:\projects\manual_slop; uv run pytest tests/test_extended_sims.py -v --timeout=300
```
Expected: 4/4 pass in isolation.
- [ ] **Step 7.3.2: Run the FULL batch to confirm (authoritative verification)**
```powershell
cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log" | Select-Object -Last 50
```
Expected: tier-1 5/5, tier-2 5/5, tier-3 0 failures.
### Task 7.4: Final checkpoint
- [ ] **Step 7.4.1: Commit the fix**
```powershell
cd C:\projects\manual_slop; git add simulation/sim_context.py
git commit -m "fix(sim): make test_context_sim_live defensive against missing files['paths'] in batch"
$h = git log -1 --format='%H'
git notes add -m "..." $h
```
- [ ] **Step 7.4.2: Checkpoint commit with full batch log**
```powershell
cd C:\projects\manual_slop; git add -f tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log
git commit -m "conductor(checkpoint): Phase 2 complete - sim test fragility fixed"
$h = git log -1 --format='%H'
git notes add -m "..." $h
```
## Track Done
@@ -242,20 +242,51 @@ The accompanying comment claims `hasattr()` still returns False for these, which
## Verification Criteria
### Phase 1 (COMPLETE — verified 2026-06-10)
1.`src/app_controller.py:3409` pre-populates `mma_tier_usage` with the full default shape (model, provider, tool_preset, input, output for all 4 tiers).
2.`src/app_controller.py:2639` uses `d.get("model")` (or equivalent) instead of `d["model"]`.
3.`src/app_controller.py:__init__` contains `self.context_preset_manager = ContextPresetManager()` between the `_settable_fields` block and `self.perf_monitor = ...`.
4.`src/app_controller.py:1266-1275` does NOT contain `"persona_manager"` in `_LAZY_MANAGER_DEFAULTS`. The misleading comment is fixed or removed.
5. ✅ A new unit test in `tests/test_mma_tier_usage_reset_fix.py` verifies the post-reset flush doesn't crash.
6.`tests/test_reset_session_clears_mma_and_rag.py` (3 tests) still pass.
7.`tests/test_extended_sims.py::test_context_sim_live` passes in batch.
8.`tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
9.`tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
10.`tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
11.`tests/test_context_presets_manager.py::test_app_controller_save_load` passes.
12.`tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager` passes.
13.`tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror` passes.
14. ✅ Tier-1 batch: 5/5 pass.
15. ✅ Tier-2 batch: 5/5 pass.
16. ✅ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
17. ✅ 4 atomic commits (one per FR).
### Phase 2 (PENDING — to be completed)
7.`tests/test_extended_sims.py::test_context_sim_live` passes in batch.
8.`tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
9.`tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
10.`tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
16. ❌ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
### Phase 2 Diagnosis (2026-06-10 full batch run)
The Phase 1 FRs fixed the original `KeyError: 'model'` from `_flush_to_project`. However, the full batch run (not the isolated test run) revealed a SEPARATE failure in the same test:
```
FAILED tests/test_extended_sims.py::test_context_sim_live
KeyError: 'paths'
simulation\sim_context.py:44: KeyError
```
The traceback shows the SECOND loop in `simulation/sim_context.py:41-47` (a redundant copy of the first loop) failing because `proj['project']['files']['paths']` is missing after the `post_project` round-trip. This loop is duplicated logic (the first loop at lines 32-37 already adds all `.py` files to `paths`; the second loop is supposed to add more, but the round-trip strips `paths`).
**Differences from original failure (which FR1+FR2 fixed):**
- Original (pre-fix): `KeyError: 'model'` from `_flush_to_project` at `src/app_controller.py:2639`
- New (post-fix): `KeyError: 'paths'` from `simulation/sim_context.py:44` (in the test code, not production)
**Root cause hypothesis:** The `post_project` hook strips empty/missing fields during the round-trip. In isolation, the first `post_project` succeeds and `paths` is preserved (probably because the first `proj` fetch already had a non-empty `paths` from prior session state). In batch, the live_gui subprocess state is different (different project setup path, prior tests' state has been cleared) and `paths` is empty/absent, so the re-fetch returns a project where `files['paths']` is missing entirely.
**Verification path for Phase 2:**
- Read the current `sim_context.py:run()` to understand the duplicated loop's intent
- Either: (a) remove the redundant second loop, (b) make the test handle missing `paths` key with `.setdefault('paths', [])`, (c) fix `_flush_to_project` to preserve empty `paths` lists
- Re-run the full batch to confirm all 4 sim tests pass
- Update the verification log
**Per AGENTS.md "Isolated-Pass Verification Fallacy":** the previous run that claimed "4/4 sim tests pass" was based on an isolated run. The full batch is the authoritative test. The track is NOT complete until Phase 2 verification passes.
@@ -4,8 +4,8 @@
[meta]
track_id = "mma_tier_usage_reset_fix_20260610"
name = "Fix mma_tier_usage reset + 3 pre-existing controller bugs (2026-06-10)"
status = "completed"
current_phase = "complete"
status = "in_progress"
current_phase = 1
last_updated = "2026-06-10"
[blocked_by]
@@ -16,6 +16,7 @@ last_updated = "2026-06-10"
[phases]
phase_1 = { status = "completed", checkpointsha = "428aa189", name = "Apply 4 FRs in app_controller.py + 4 regression tests" }
phase_2 = { status = "pending", checkpointsha = "", name = "Fix live_gui sim test fragility (test_context_sim_live KeyError: paths)" }
[tasks]
t1_1 = { status = "completed", commit_sha = "f5021360", description = "Pre-edit checkpoint" }
@@ -28,6 +29,10 @@ t1_7 = { status = "completed", commit_sha = "b96d709e", description = "Verify th
t1_8 = { status = "completed", commit_sha = "b96d709e", description = "Run the 3 previously-failing tier-1 tests + 4 sim tests in test_extended_sims.py" }
t1_9 = { status = "completed", commit_sha = "428aa189", description = "Run targeted regression tests (full batch aborted by user)" }
t1_10 = { status = "completed", commit_sha = "428aa189", description = "Checkpoint commit" }
t2_1 = { status = "pending", commit_sha = "", description = "Diagnose why proj['project']['files']['paths'] is missing after post_project round-trip in batched live_gui (works in isolated run)" }
t2_2 = { status = "pending", commit_sha = "", description = "Apply fix (production or test, whichever is correct root cause)" }
t2_3 = { status = "pending", commit_sha = "", description = "Verify all 4 sim tests pass in FULL batch (tier-3-live_gui)" }
t2_4 = { status = "pending", commit_sha = "", description = "Final checkpoint with batch log" }
[verification]
mma_tier_usage_prepopulated = true
@@ -37,16 +42,17 @@ persona_manager_removed_from_lazy_defaults = true
regression_tests_pass = true
reset_clears_mma_tests_pass = true
three_failing_tier1_tests_pass = true
extended_sims_pass = true
targeted_regression_passes = true
extended_sims_pass_isolated = true
extended_sims_pass_in_batch = false
[baseline_capture]
# Captured from the 2026-06-10 batch run
# Captured from the 2026-06-10 batch runs
tier_1_status_pre_fix = "FAIL (3 tests: test_app_controller_save_load, test_load_active_project_creates_persona_manager, test_load_context_preset_missing_raises_keyerror)"
tier_2_status_pre_fix = "PASS (5/5 batches)"
tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests)"
tier_1_status_post_fix = "PASS (targeted verification: 20/20 in affected test files)"
tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests) - KeyError: 'model' (the original FR1+FR2 bug)"
tier_1_status_post_phase_1 = "PASS (5/5 tier-1 batches in 2026-06-10 batch run)"
tier_2_status_post_phase_1 = "PASS (5/5 tier-2 batches in 2026-06-10 batch run)"
tier_3_status_post_phase_1 = "FAIL on test_extended_sims.py::test_context_sim_live - KeyError: 'paths' (a NEW issue exposed after the original bug was fixed)"
[notes]
# Test fixture in tests/test_mma_tier_usage_reset_fix.py sets 4 UI flags
@@ -56,6 +62,13 @@ tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
# refactor from the previous agent's WIP commit. A follow-up to clean up
# _UI_FLAG_DEFAULTS is recommended.
# The full scripts/run_tests_batched.py run was aborted by the user.
# Targeted regression runs (36/36 across all affected files) cover all
# known affected code paths.
# CRITICAL: Phase 1 verification was based on ISOLATED sim test runs.
# The full batch run (tier-3-live_gui) reveals a SEPARATE test fragility
# issue: sim_context.py:41-47 (the SECOND redundant file-add loop)
# fails in batch because proj['project']['files']['paths'] is missing
# after the post_project round-trip. This is a live_gui shared-subprocess
# state issue (different from the original KeyError: 'model' bug).
#
# Per the workflow's "Isolated-Pass Verification Fallacy" rule, the batch
# failure is the authoritative result. The track is NOT complete; the
# 4 sim tests must pass in batch (the verification criterion).