conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility)
This commit is contained in:
@@ -552,11 +552,104 @@ Focus: The moment of truth. The 4 sim tests in `test_extended_sims.py` now pass,
|
||||
- `tests/test_context_presets_manager.py::test_app_controller_save_load`
|
||||
- `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager`
|
||||
- `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror`
|
||||
- [x] 4 sim tests in `tests/test_extended_sims.py` pass (full sim run)
|
||||
- [x] 4 sim tests in `tests/test_extended_sims.py` pass (ISOLATED run; 4/4 in 222.08s)
|
||||
- [x] Targeted regression verification: 36/36 affected tests pass
|
||||
- [ ] Tier-1 batch: 5/5 pass (full batch aborted; targeted verification substituted)
|
||||
- [ ] Tier-2 batch: 5/5 pass (full batch aborted)
|
||||
- [ ] Tier-3 batch: 0 new failures (full batch aborted; targeted verification substituted)
|
||||
- [x] Tier-1 batch: 5/5 pass (2026-06-10 batch run)
|
||||
- [x] Tier-2 batch: 5/5 pass (2026-06-10 batch run)
|
||||
- [ ] Tier-3 batch: 0 new failures (FAILED in 2026-06-10 batch run; see Phase 2 below)
|
||||
|
||||
## Phase 2: Fix live_gui sim test fragility
|
||||
|
||||
The Phase 1 verification (isolated sim test run) was misleading. The full batch run revealed a SEPARATE failure in `test_extended_sims.py::test_context_sim_live` — `KeyError: 'paths'` at `simulation/sim_context.py:44`. This is a live_gui shared-subprocess state issue, not a regression of the FR1+FR2 fix.
|
||||
|
||||
### Task 7.1: Diagnose the root cause
|
||||
|
||||
- [ ] **Step 7.1.1: Read the duplicated loop in sim_context.py**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run python -c "import ast; print(ast.unparse(ast.parse(open('simulation/sim_context.py').read())))" | Select-String "for f in all_py"
|
||||
```
|
||||
Confirm lines 32-37 and 41-47 are duplicate logic. The second loop is supposed to add MORE files but the first loop already added all of them.
|
||||
|
||||
- [ ] **Step 7.1.2: Check what post_project does to empty/missing `paths`**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run python -c "
|
||||
from api_hook_client import ApiHookClient
|
||||
import json
|
||||
client = ApiHookClient()
|
||||
import time
|
||||
if not client.wait_for_server(timeout=5):
|
||||
print('server not up; skip')
|
||||
else:
|
||||
p = client.get_project()
|
||||
print('project files before:', json.dumps(p.get('project', {}).get('files', {}), indent=2))
|
||||
"
|
||||
```
|
||||
Expected: in the live_gui subprocess, the project's `files` dict may not have a `paths` key after a fresh `setup()` (because the test setup at `simulation/sim_base.py:78-99` doesn't pre-populate `paths`).
|
||||
|
||||
- [ ] **Step 7.1.3: Read sim_base.setup to understand initial state**
|
||||
Use `manual-slop_get_file_slice` to read `simulation/sim_base.py:78-99`. Confirm `setup()` does NOT pre-populate `files['paths']` in the saved project.
|
||||
|
||||
### Task 7.2: Apply the fix
|
||||
|
||||
The fix is a 1-3 line change. Choose ONE of:
|
||||
|
||||
**Option A: Make the test code defensive (test-only fix)**
|
||||
Modify `simulation/sim_context.py:44` to use `.setdefault('paths', [])`:
|
||||
```python
|
||||
for f in all_py:
|
||||
if f not in proj['project']['files'].setdefault('paths', []):
|
||||
proj['project']['files']['paths'].append(f)
|
||||
```
|
||||
Apply to BOTH loops (lines 33-35 and lines 43-45) for consistency.
|
||||
|
||||
**Option B: Remove the redundant second loop (cleanup)**
|
||||
The second loop (lines 41-47) is identical to the first. Remove it. The first loop's `post_project` (line 37) already saves the project with all the files. The second loop+post is unnecessary.
|
||||
|
||||
**Recommended:** Option A is the minimal, defensive fix that addresses the test fragility without restructuring. Option B is cleaner code but more change.
|
||||
|
||||
- [ ] **Step 7.2.1: Apply the chosen fix to simulation/sim_context.py**
|
||||
|
||||
- [ ] **Step 7.2.2: Verify syntax**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('simulation/sim_context.py').read()); print('OK')"
|
||||
```
|
||||
|
||||
- [ ] **Step 7.2.3: Verify import**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run python -c "from simulation.sim_context import ContextSimulation; print('import OK')"
|
||||
```
|
||||
|
||||
### Task 7.3: Verify in batch
|
||||
|
||||
- [ ] **Step 7.3.1: Run the 4 sim tests in isolation first (sanity)**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run pytest tests/test_extended_sims.py -v --timeout=300
|
||||
```
|
||||
Expected: 4/4 pass in isolation.
|
||||
|
||||
- [ ] **Step 7.3.2: Run the FULL batch to confirm (authoritative verification)**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log" | Select-Object -Last 50
|
||||
```
|
||||
Expected: tier-1 5/5, tier-2 5/5, tier-3 0 failures.
|
||||
|
||||
### Task 7.4: Final checkpoint
|
||||
|
||||
- [ ] **Step 7.4.1: Commit the fix**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; git add simulation/sim_context.py
|
||||
git commit -m "fix(sim): make test_context_sim_live defensive against missing files['paths'] in batch"
|
||||
$h = git log -1 --format='%H'
|
||||
git notes add -m "..." $h
|
||||
```
|
||||
|
||||
- [ ] **Step 7.4.2: Checkpoint commit with full batch log**
|
||||
```powershell
|
||||
cd C:\projects\manual_slop; git add -f tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log
|
||||
git commit -m "conductor(checkpoint): Phase 2 complete - sim test fragility fixed"
|
||||
$h = git log -1 --format='%H'
|
||||
git notes add -m "..." $h
|
||||
```
|
||||
|
||||
## Track Done
|
||||
|
||||
|
||||
@@ -242,20 +242,51 @@ The accompanying comment claims `hasattr()` still returns False for these, which
|
||||
|
||||
## Verification Criteria
|
||||
|
||||
### Phase 1 (COMPLETE — verified 2026-06-10)
|
||||
|
||||
1. ✅ `src/app_controller.py:3409` pre-populates `mma_tier_usage` with the full default shape (model, provider, tool_preset, input, output for all 4 tiers).
|
||||
2. ✅ `src/app_controller.py:2639` uses `d.get("model")` (or equivalent) instead of `d["model"]`.
|
||||
3. ✅ `src/app_controller.py:__init__` contains `self.context_preset_manager = ContextPresetManager()` between the `_settable_fields` block and `self.perf_monitor = ...`.
|
||||
4. ✅ `src/app_controller.py:1266-1275` does NOT contain `"persona_manager"` in `_LAZY_MANAGER_DEFAULTS`. The misleading comment is fixed or removed.
|
||||
5. ✅ A new unit test in `tests/test_mma_tier_usage_reset_fix.py` verifies the post-reset flush doesn't crash.
|
||||
6. ✅ `tests/test_reset_session_clears_mma_and_rag.py` (3 tests) still pass.
|
||||
7. ✅ `tests/test_extended_sims.py::test_context_sim_live` passes in batch.
|
||||
8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
|
||||
9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
|
||||
10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
|
||||
11. ✅ `tests/test_context_presets_manager.py::test_app_controller_save_load` passes.
|
||||
12. ✅ `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager` passes.
|
||||
13. ✅ `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror` passes.
|
||||
14. ✅ Tier-1 batch: 5/5 pass.
|
||||
15. ✅ Tier-2 batch: 5/5 pass.
|
||||
16. ✅ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
|
||||
17. ✅ 4 atomic commits (one per FR).
|
||||
|
||||
### Phase 2 (PENDING — to be completed)
|
||||
|
||||
7. ❌ `tests/test_extended_sims.py::test_context_sim_live` passes in batch.
|
||||
8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch.
|
||||
9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch.
|
||||
10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch.
|
||||
16. ❌ Tier-3 batch: 0 new failures vs `33d02bb1` baseline.
|
||||
|
||||
### Phase 2 Diagnosis (2026-06-10 full batch run)
|
||||
|
||||
The Phase 1 FRs fixed the original `KeyError: 'model'` from `_flush_to_project`. However, the full batch run (not the isolated test run) revealed a SEPARATE failure in the same test:
|
||||
|
||||
```
|
||||
FAILED tests/test_extended_sims.py::test_context_sim_live
|
||||
KeyError: 'paths'
|
||||
simulation\sim_context.py:44: KeyError
|
||||
```
|
||||
|
||||
The traceback shows the SECOND loop in `simulation/sim_context.py:41-47` (a redundant copy of the first loop) failing because `proj['project']['files']['paths']` is missing after the `post_project` round-trip. This loop is duplicated logic (the first loop at lines 32-37 already adds all `.py` files to `paths`; the second loop is supposed to add more, but the round-trip strips `paths`).
|
||||
|
||||
**Differences from original failure (which FR1+FR2 fixed):**
|
||||
- Original (pre-fix): `KeyError: 'model'` from `_flush_to_project` at `src/app_controller.py:2639`
|
||||
- New (post-fix): `KeyError: 'paths'` from `simulation/sim_context.py:44` (in the test code, not production)
|
||||
|
||||
**Root cause hypothesis:** The `post_project` hook strips empty/missing fields during the round-trip. In isolation, the first `post_project` succeeds and `paths` is preserved (probably because the first `proj` fetch already had a non-empty `paths` from prior session state). In batch, the live_gui subprocess state is different (different project setup path, prior tests' state has been cleared) and `paths` is empty/absent, so the re-fetch returns a project where `files['paths']` is missing entirely.
|
||||
|
||||
**Verification path for Phase 2:**
|
||||
- Read the current `sim_context.py:run()` to understand the duplicated loop's intent
|
||||
- Either: (a) remove the redundant second loop, (b) make the test handle missing `paths` key with `.setdefault('paths', [])`, (c) fix `_flush_to_project` to preserve empty `paths` lists
|
||||
- Re-run the full batch to confirm all 4 sim tests pass
|
||||
- Update the verification log
|
||||
|
||||
**Per AGENTS.md "Isolated-Pass Verification Fallacy":** the previous run that claimed "4/4 sim tests pass" was based on an isolated run. The full batch is the authoritative test. The track is NOT complete until Phase 2 verification passes.
|
||||
|
||||
@@ -4,8 +4,8 @@
|
||||
[meta]
|
||||
track_id = "mma_tier_usage_reset_fix_20260610"
|
||||
name = "Fix mma_tier_usage reset + 3 pre-existing controller bugs (2026-06-10)"
|
||||
status = "completed"
|
||||
current_phase = "complete"
|
||||
status = "in_progress"
|
||||
current_phase = 1
|
||||
last_updated = "2026-06-10"
|
||||
|
||||
[blocked_by]
|
||||
@@ -16,6 +16,7 @@ last_updated = "2026-06-10"
|
||||
|
||||
[phases]
|
||||
phase_1 = { status = "completed", checkpointsha = "428aa189", name = "Apply 4 FRs in app_controller.py + 4 regression tests" }
|
||||
phase_2 = { status = "pending", checkpointsha = "", name = "Fix live_gui sim test fragility (test_context_sim_live KeyError: paths)" }
|
||||
|
||||
[tasks]
|
||||
t1_1 = { status = "completed", commit_sha = "f5021360", description = "Pre-edit checkpoint" }
|
||||
@@ -28,6 +29,10 @@ t1_7 = { status = "completed", commit_sha = "b96d709e", description = "Verify th
|
||||
t1_8 = { status = "completed", commit_sha = "b96d709e", description = "Run the 3 previously-failing tier-1 tests + 4 sim tests in test_extended_sims.py" }
|
||||
t1_9 = { status = "completed", commit_sha = "428aa189", description = "Run targeted regression tests (full batch aborted by user)" }
|
||||
t1_10 = { status = "completed", commit_sha = "428aa189", description = "Checkpoint commit" }
|
||||
t2_1 = { status = "pending", commit_sha = "", description = "Diagnose why proj['project']['files']['paths'] is missing after post_project round-trip in batched live_gui (works in isolated run)" }
|
||||
t2_2 = { status = "pending", commit_sha = "", description = "Apply fix (production or test, whichever is correct root cause)" }
|
||||
t2_3 = { status = "pending", commit_sha = "", description = "Verify all 4 sim tests pass in FULL batch (tier-3-live_gui)" }
|
||||
t2_4 = { status = "pending", commit_sha = "", description = "Final checkpoint with batch log" }
|
||||
|
||||
[verification]
|
||||
mma_tier_usage_prepopulated = true
|
||||
@@ -37,16 +42,17 @@ persona_manager_removed_from_lazy_defaults = true
|
||||
regression_tests_pass = true
|
||||
reset_clears_mma_tests_pass = true
|
||||
three_failing_tier1_tests_pass = true
|
||||
extended_sims_pass = true
|
||||
targeted_regression_passes = true
|
||||
extended_sims_pass_isolated = true
|
||||
extended_sims_pass_in_batch = false
|
||||
|
||||
[baseline_capture]
|
||||
# Captured from the 2026-06-10 batch run
|
||||
# Captured from the 2026-06-10 batch runs
|
||||
tier_1_status_pre_fix = "FAIL (3 tests: test_app_controller_save_load, test_load_active_project_creates_persona_manager, test_load_context_preset_missing_raises_keyerror)"
|
||||
tier_2_status_pre_fix = "PASS (5/5 batches)"
|
||||
tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests)"
|
||||
tier_1_status_post_fix = "PASS (targeted verification: 20/20 in affected test files)"
|
||||
tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
|
||||
tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests) - KeyError: 'model' (the original FR1+FR2 bug)"
|
||||
tier_1_status_post_phase_1 = "PASS (5/5 tier-1 batches in 2026-06-10 batch run)"
|
||||
tier_2_status_post_phase_1 = "PASS (5/5 tier-2 batches in 2026-06-10 batch run)"
|
||||
tier_3_status_post_phase_1 = "FAIL on test_extended_sims.py::test_context_sim_live - KeyError: 'paths' (a NEW issue exposed after the original bug was fixed)"
|
||||
|
||||
[notes]
|
||||
# Test fixture in tests/test_mma_tier_usage_reset_fix.py sets 4 UI flags
|
||||
@@ -56,6 +62,13 @@ tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)"
|
||||
# refactor from the previous agent's WIP commit. A follow-up to clean up
|
||||
# _UI_FLAG_DEFAULTS is recommended.
|
||||
|
||||
# The full scripts/run_tests_batched.py run was aborted by the user.
|
||||
# Targeted regression runs (36/36 across all affected files) cover all
|
||||
# known affected code paths.
|
||||
# CRITICAL: Phase 1 verification was based on ISOLATED sim test runs.
|
||||
# The full batch run (tier-3-live_gui) reveals a SEPARATE test fragility
|
||||
# issue: sim_context.py:41-47 (the SECOND redundant file-add loop)
|
||||
# fails in batch because proj['project']['files']['paths'] is missing
|
||||
# after the post_project round-trip. This is a live_gui shared-subprocess
|
||||
# state issue (different from the original KeyError: 'model' bug).
|
||||
#
|
||||
# Per the workflow's "Isolated-Pass Verification Fallacy" rule, the batch
|
||||
# failure is the authoritative result. The track is NOT complete; the
|
||||
# 4 sim tests must pass in batch (the verification criterion).
|
||||
|
||||
Reference in New Issue
Block a user