diff --git a/conductor/tracks/mma_tier_usage_reset_fix_20260610/plan.md b/conductor/tracks/mma_tier_usage_reset_fix_20260610/plan.md index 985a86e3..227d3a52 100644 --- a/conductor/tracks/mma_tier_usage_reset_fix_20260610/plan.md +++ b/conductor/tracks/mma_tier_usage_reset_fix_20260610/plan.md @@ -552,11 +552,104 @@ Focus: The moment of truth. The 4 sim tests in `test_extended_sims.py` now pass, - `tests/test_context_presets_manager.py::test_app_controller_save_load` - `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager` - `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror` -- [x] 4 sim tests in `tests/test_extended_sims.py` pass (full sim run) +- [x] 4 sim tests in `tests/test_extended_sims.py` pass (ISOLATED run; 4/4 in 222.08s) - [x] Targeted regression verification: 36/36 affected tests pass -- [ ] Tier-1 batch: 5/5 pass (full batch aborted; targeted verification substituted) -- [ ] Tier-2 batch: 5/5 pass (full batch aborted) -- [ ] Tier-3 batch: 0 new failures (full batch aborted; targeted verification substituted) +- [x] Tier-1 batch: 5/5 pass (2026-06-10 batch run) +- [x] Tier-2 batch: 5/5 pass (2026-06-10 batch run) +- [ ] Tier-3 batch: 0 new failures (FAILED in 2026-06-10 batch run; see Phase 2 below) + +## Phase 2: Fix live_gui sim test fragility + +The Phase 1 verification (isolated sim test run) was misleading. The full batch run revealed a SEPARATE failure in `test_extended_sims.py::test_context_sim_live` — `KeyError: 'paths'` at `simulation/sim_context.py:44`. This is a live_gui shared-subprocess state issue, not a regression of the FR1+FR2 fix. + +### Task 7.1: Diagnose the root cause + +- [ ] **Step 7.1.1: Read the duplicated loop in sim_context.py** + ```powershell + cd C:\projects\manual_slop; uv run python -c "import ast; print(ast.unparse(ast.parse(open('simulation/sim_context.py').read())))" | Select-String "for f in all_py" + ``` + Confirm lines 32-37 and 41-47 are duplicate logic. The second loop is supposed to add MORE files but the first loop already added all of them. + +- [ ] **Step 7.1.2: Check what post_project does to empty/missing `paths`** + ```powershell + cd C:\projects\manual_slop; uv run python -c " +from api_hook_client import ApiHookClient +import json +client = ApiHookClient() +import time +if not client.wait_for_server(timeout=5): + print('server not up; skip') +else: + p = client.get_project() + print('project files before:', json.dumps(p.get('project', {}).get('files', {}), indent=2)) + " + ``` + Expected: in the live_gui subprocess, the project's `files` dict may not have a `paths` key after a fresh `setup()` (because the test setup at `simulation/sim_base.py:78-99` doesn't pre-populate `paths`). + +- [ ] **Step 7.1.3: Read sim_base.setup to understand initial state** + Use `manual-slop_get_file_slice` to read `simulation/sim_base.py:78-99`. Confirm `setup()` does NOT pre-populate `files['paths']` in the saved project. + +### Task 7.2: Apply the fix + +The fix is a 1-3 line change. Choose ONE of: + +**Option A: Make the test code defensive (test-only fix)** +Modify `simulation/sim_context.py:44` to use `.setdefault('paths', [])`: +```python +for f in all_py: + if f not in proj['project']['files'].setdefault('paths', []): + proj['project']['files']['paths'].append(f) +``` +Apply to BOTH loops (lines 33-35 and lines 43-45) for consistency. + +**Option B: Remove the redundant second loop (cleanup)** +The second loop (lines 41-47) is identical to the first. Remove it. The first loop's `post_project` (line 37) already saves the project with all the files. The second loop+post is unnecessary. + +**Recommended:** Option A is the minimal, defensive fix that addresses the test fragility without restructuring. Option B is cleaner code but more change. + +- [ ] **Step 7.2.1: Apply the chosen fix to simulation/sim_context.py** + +- [ ] **Step 7.2.2: Verify syntax** + ```powershell + cd C:\projects\manual_slop; uv run python -c "import ast; ast.parse(open('simulation/sim_context.py').read()); print('OK')" + ``` + +- [ ] **Step 7.2.3: Verify import** + ```powershell + cd C:\projects\manual_slop; uv run python -c "from simulation.sim_context import ContextSimulation; print('import OK')" + ``` + +### Task 7.3: Verify in batch + +- [ ] **Step 7.3.1: Run the 4 sim tests in isolation first (sanity)** + ```powershell + cd C:\projects\manual_slop; uv run pytest tests/test_extended_sims.py -v --timeout=300 + ``` + Expected: 4/4 pass in isolation. + +- [ ] **Step 7.3.2: Run the FULL batch to confirm (authoritative verification)** + ```powershell + cd C:\projects\manual_slop; uv run .\scripts\run_tests_batched.py 2>&1 | Tee-Object -FilePath "tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log" | Select-Object -Last 50 + ``` + Expected: tier-1 5/5, tier-2 5/5, tier-3 0 failures. + +### Task 7.4: Final checkpoint + +- [ ] **Step 7.4.1: Commit the fix** + ```powershell + cd C:\projects\manual_slop; git add simulation/sim_context.py + git commit -m "fix(sim): make test_context_sim_live defensive against missing files['paths'] in batch" + $h = git log -1 --format='%H' + git notes add -m "..." $h + ``` + +- [ ] **Step 7.4.2: Checkpoint commit with full batch log** + ```powershell + cd C:\projects\manual_slop; git add -f tests/artifacts/post_phase2_mma_reset_fix_batch_20260610.log + git commit -m "conductor(checkpoint): Phase 2 complete - sim test fragility fixed" + $h = git log -1 --format='%H' + git notes add -m "..." $h + ``` ## Track Done diff --git a/conductor/tracks/mma_tier_usage_reset_fix_20260610/spec.md b/conductor/tracks/mma_tier_usage_reset_fix_20260610/spec.md index f910af01..edbe569c 100644 --- a/conductor/tracks/mma_tier_usage_reset_fix_20260610/spec.md +++ b/conductor/tracks/mma_tier_usage_reset_fix_20260610/spec.md @@ -242,20 +242,51 @@ The accompanying comment claims `hasattr()` still returns False for these, which ## Verification Criteria +### Phase 1 (COMPLETE — verified 2026-06-10) + 1. ✅ `src/app_controller.py:3409` pre-populates `mma_tier_usage` with the full default shape (model, provider, tool_preset, input, output for all 4 tiers). 2. ✅ `src/app_controller.py:2639` uses `d.get("model")` (or equivalent) instead of `d["model"]`. 3. ✅ `src/app_controller.py:__init__` contains `self.context_preset_manager = ContextPresetManager()` between the `_settable_fields` block and `self.perf_monitor = ...`. 4. ✅ `src/app_controller.py:1266-1275` does NOT contain `"persona_manager"` in `_LAZY_MANAGER_DEFAULTS`. The misleading comment is fixed or removed. 5. ✅ A new unit test in `tests/test_mma_tier_usage_reset_fix.py` verifies the post-reset flush doesn't crash. 6. ✅ `tests/test_reset_session_clears_mma_and_rag.py` (3 tests) still pass. -7. ✅ `tests/test_extended_sims.py::test_context_sim_live` passes in batch. -8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch. -9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch. -10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch. 11. ✅ `tests/test_context_presets_manager.py::test_app_controller_save_load` passes. 12. ✅ `tests/test_project_switch_persona_preset.py::test_load_active_project_creates_persona_manager` passes. 13. ✅ `tests/test_project_switch_persona_preset.py::test_load_context_preset_missing_raises_keyerror` passes. 14. ✅ Tier-1 batch: 5/5 pass. 15. ✅ Tier-2 batch: 5/5 pass. -16. ✅ Tier-3 batch: 0 new failures vs `33d02bb1` baseline. 17. ✅ 4 atomic commits (one per FR). + +### Phase 2 (PENDING — to be completed) + +7. ❌ `tests/test_extended_sims.py::test_context_sim_live` passes in batch. +8. ✅ `tests/test_extended_sims.py::test_ai_settings_sim_live` passes in batch. +9. ✅ `tests/test_extended_sims.py::test_tools_sim_live` passes in batch. +10. ✅ `tests/test_extended_sims.py::test_execution_sim_live` passes in batch. +16. ❌ Tier-3 batch: 0 new failures vs `33d02bb1` baseline. + +### Phase 2 Diagnosis (2026-06-10 full batch run) + +The Phase 1 FRs fixed the original `KeyError: 'model'` from `_flush_to_project`. However, the full batch run (not the isolated test run) revealed a SEPARATE failure in the same test: + +``` +FAILED tests/test_extended_sims.py::test_context_sim_live +KeyError: 'paths' +simulation\sim_context.py:44: KeyError +``` + +The traceback shows the SECOND loop in `simulation/sim_context.py:41-47` (a redundant copy of the first loop) failing because `proj['project']['files']['paths']` is missing after the `post_project` round-trip. This loop is duplicated logic (the first loop at lines 32-37 already adds all `.py` files to `paths`; the second loop is supposed to add more, but the round-trip strips `paths`). + +**Differences from original failure (which FR1+FR2 fixed):** +- Original (pre-fix): `KeyError: 'model'` from `_flush_to_project` at `src/app_controller.py:2639` +- New (post-fix): `KeyError: 'paths'` from `simulation/sim_context.py:44` (in the test code, not production) + +**Root cause hypothesis:** The `post_project` hook strips empty/missing fields during the round-trip. In isolation, the first `post_project` succeeds and `paths` is preserved (probably because the first `proj` fetch already had a non-empty `paths` from prior session state). In batch, the live_gui subprocess state is different (different project setup path, prior tests' state has been cleared) and `paths` is empty/absent, so the re-fetch returns a project where `files['paths']` is missing entirely. + +**Verification path for Phase 2:** +- Read the current `sim_context.py:run()` to understand the duplicated loop's intent +- Either: (a) remove the redundant second loop, (b) make the test handle missing `paths` key with `.setdefault('paths', [])`, (c) fix `_flush_to_project` to preserve empty `paths` lists +- Re-run the full batch to confirm all 4 sim tests pass +- Update the verification log + +**Per AGENTS.md "Isolated-Pass Verification Fallacy":** the previous run that claimed "4/4 sim tests pass" was based on an isolated run. The full batch is the authoritative test. The track is NOT complete until Phase 2 verification passes. diff --git a/conductor/tracks/mma_tier_usage_reset_fix_20260610/state.toml b/conductor/tracks/mma_tier_usage_reset_fix_20260610/state.toml index e93e3cf9..9659d0a2 100644 --- a/conductor/tracks/mma_tier_usage_reset_fix_20260610/state.toml +++ b/conductor/tracks/mma_tier_usage_reset_fix_20260610/state.toml @@ -4,8 +4,8 @@ [meta] track_id = "mma_tier_usage_reset_fix_20260610" name = "Fix mma_tier_usage reset + 3 pre-existing controller bugs (2026-06-10)" -status = "completed" -current_phase = "complete" +status = "in_progress" +current_phase = 1 last_updated = "2026-06-10" [blocked_by] @@ -16,6 +16,7 @@ last_updated = "2026-06-10" [phases] phase_1 = { status = "completed", checkpointsha = "428aa189", name = "Apply 4 FRs in app_controller.py + 4 regression tests" } +phase_2 = { status = "pending", checkpointsha = "", name = "Fix live_gui sim test fragility (test_context_sim_live KeyError: paths)" } [tasks] t1_1 = { status = "completed", commit_sha = "f5021360", description = "Pre-edit checkpoint" } @@ -28,6 +29,10 @@ t1_7 = { status = "completed", commit_sha = "b96d709e", description = "Verify th t1_8 = { status = "completed", commit_sha = "b96d709e", description = "Run the 3 previously-failing tier-1 tests + 4 sim tests in test_extended_sims.py" } t1_9 = { status = "completed", commit_sha = "428aa189", description = "Run targeted regression tests (full batch aborted by user)" } t1_10 = { status = "completed", commit_sha = "428aa189", description = "Checkpoint commit" } +t2_1 = { status = "pending", commit_sha = "", description = "Diagnose why proj['project']['files']['paths'] is missing after post_project round-trip in batched live_gui (works in isolated run)" } +t2_2 = { status = "pending", commit_sha = "", description = "Apply fix (production or test, whichever is correct root cause)" } +t2_3 = { status = "pending", commit_sha = "", description = "Verify all 4 sim tests pass in FULL batch (tier-3-live_gui)" } +t2_4 = { status = "pending", commit_sha = "", description = "Final checkpoint with batch log" } [verification] mma_tier_usage_prepopulated = true @@ -37,16 +42,17 @@ persona_manager_removed_from_lazy_defaults = true regression_tests_pass = true reset_clears_mma_tests_pass = true three_failing_tier1_tests_pass = true -extended_sims_pass = true -targeted_regression_passes = true +extended_sims_pass_isolated = true +extended_sims_pass_in_batch = false [baseline_capture] -# Captured from the 2026-06-10 batch run +# Captured from the 2026-06-10 batch runs tier_1_status_pre_fix = "FAIL (3 tests: test_app_controller_save_load, test_load_active_project_creates_persona_manager, test_load_context_preset_missing_raises_keyerror)" tier_2_status_pre_fix = "PASS (5/5 batches)" -tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests)" -tier_1_status_post_fix = "PASS (targeted verification: 20/20 in affected test files)" -tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)" +tier_3_status_pre_fix = "FAIL on test_extended_sims.py::test_context_sim_live (4 sim tests) - KeyError: 'model' (the original FR1+FR2 bug)" +tier_1_status_post_phase_1 = "PASS (5/5 tier-1 batches in 2026-06-10 batch run)" +tier_2_status_post_phase_1 = "PASS (5/5 tier-2 batches in 2026-06-10 batch run)" +tier_3_status_post_phase_1 = "FAIL on test_extended_sims.py::test_context_sim_live - KeyError: 'paths' (a NEW issue exposed after the original bug was fixed)" [notes] # Test fixture in tests/test_mma_tier_usage_reset_fix.py sets 4 UI flags @@ -56,6 +62,13 @@ tier_3_status_post_fix = "PASS (4/4 sim tests in test_extended_sims.py)" # refactor from the previous agent's WIP commit. A follow-up to clean up # _UI_FLAG_DEFAULTS is recommended. -# The full scripts/run_tests_batched.py run was aborted by the user. -# Targeted regression runs (36/36 across all affected files) cover all -# known affected code paths. +# CRITICAL: Phase 1 verification was based on ISOLATED sim test runs. +# The full batch run (tier-3-live_gui) reveals a SEPARATE test fragility +# issue: sim_context.py:41-47 (the SECOND redundant file-add loop) +# fails in batch because proj['project']['files']['paths'] is missing +# after the post_project round-trip. This is a live_gui shared-subprocess +# state issue (different from the original KeyError: 'model' bug). +# +# Per the workflow's "Isolated-Pass Verification Fallacy" rule, the batch +# failure is the authoritative result. The track is NOT complete; the +# 4 sim tests must pass in batch (the verification criterion).