conductor(tracks): add fix_test_failures_20260624 row (#31)

Added row #31 to the tracks.md registry for the fix_test_failures_20260624 test-fix track. Marks the track as SHIPPED 2026-06-24 with: - 4 phases, 4 tasks, 8 atomic commits - 14 originally-failing tests now pass - VC1-3,5,6 = true; VC4 = PARTIAL (6 pre-existing failures) - TRACK_COMPLETION at docs/reports/TRACK_COMPLETION_fix_test_failures_20260624.md Documents VC4 PARTIAL: 6 pre-existing failures (5 in test_openai_compatible.py from Phase 2 dataclass refactor; 1 known flake in test_execution_sim_live) predate this fix. All 6 verified to exist in origin/master HEAD. Recommended follow-up track to fix the 5 openai_compatible tests (1-line fixes per test: tool_calls[0].function.name instead of subscripting).
2026-06-24 11:34:48 -04:00
parent f776cc6bc6
commit 26a4975209
1 changed files with 1 additions and 0 deletions
@@ -70,6 +70,7 @@ Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked
 | 29b | A (research) | [C11 Reference (Pass 3 Sub-Track)](#track-c11-reference-pass-3-sub-track-2026-06-23) | spec ✓, plan ✓, metadata ✓, state ✓, README ✓, **in progress**; 4 cluster sub-reports + 1 main c11_convention.md + tracks.md update; PRIMARY sources = Pikuma duffle (9 headers) + forth bootslop attempt_1 (4 files) + forth references (2 files) + gte_hello (2 files); FALLBACK = raddebugger/src/base (5 headers); the C11 reference synthesizes the user's idiomatic C11 (byte-width types, underscore-suffixed type modifiers, hand-rolled DSL, memory ordering vocabulary, slice + arena, design-doc headers) with the raddbg fallback for patterns duffle doesn't cover (U64/S64, Vec2F32/Vec3S32, String8, force_inline, etc.); the per-language `<<` / `>>` rendering for C11 is included (much_less / much_greater / weakly_coupled with tolerance) | `video_analysis_deob_apply_20260621` (SHIPPED 2026-06-23) + `video_analysis_deob_lexicon_v2_20260623` (SHIPPED 2026-06-23) | (**NEW 2026-06-23**; Pass 3 sub-track; 6 atomic commits planned; hands off to Pass 3 as the primary C11 style guide; per user 2026-06-23: 'use the forth bootslop and pikuma then. Use raddbg's base for stuff missing. otherwise go for jai/odin.') |
 | 29c | A (research) | [Pass 3 — C11/Python Projection (the final phase)](#track-pass-3-c11python-projection-2026-06-23) | spec ✓, plan ✓, metadata ✓, state ✓, README ✓, TIER2_STARTER ✓, **spec DRAFT pending user review**; projects v2-deobfuscated outputs to C11 or Python code that conveys each video's content; 11 videos (10 C11 default + 2 Python + 1 synthesis); per-video deliverables: C11 (.c + .h) or Python (.py) + 3-4 markdown docs (translation, decoder, notes); 4 + 3 verification criteria met per the v2 lexicon; per-language `<<` / `>>` rendering (much_less / much_greater / weakly_coupled); encoding placeholder scheme (float / integer / Scalar / float64); code may or may not run (per user 2026-06-23); Tier 2 holds full context + 4 parallel Tier 3 sub-agents (per cluster) | `video_analysis_deob_apply_20260621` (SHIPPED) + `video_analysis_deob_lexicon_v2_20260623` (SHIPPED) + `video_analysis_deob_c11_reference_20260623` (SHIPPED) | (**NEW 2026-06-23**; **Pass 3 of 3**; the FINAL phase of the 3-pass research campaign; ~35-58 atomic commits planned; 11 videos × 3-5 deliverables = 33-55 files + 2 global reports; the user's 'ok awesome' (or similar) after the deliverables is the formal close of the 3-pass campaign) |
 | 30 | A (cleanup) | [Code Path Audit Polish (follow-up to code_path_audit_20260607)](#track-code-path-audit-polish-2026-06-22) | spec ✓, plan ✓, metadata ✓, state ✓, **SHIPPED 2026-06-24** by Tier 2 autonomous mode; 5 phases, 12 tasks, 22 atomic commits; 10/10 VCs pass; 127 tests (was 131; -6 deleted DSL/compute_result_coverage tests, +2 new SSDL behavioral tests); audit_weak_types --strict passes (104 <= 112 baseline); generate_type_registry --check passes (23 files in sync); 3 carry-over code smells removed (duplicate import json, dead DSL parser 148 lines + 4 tests, dead compute_result_coverage 30 lines + 2 tests); behavioral SSDL test locks down the headline 4.01e22 effective_codepaths math; spec_v2.md Revision History added; TRACK_COMPLETION at `docs/reports/TRACK_COMPLETION_code_path_audit_polish_20260622.md` | `code_path_audit_20260607` (parent; shipped 2026-06-22 with MVP pivot) | (**NEW 2026-06-22**; small surgical follow-up; **out of scope**: 4 pre-existing exception-handling violations NG1 + 7 pre-existing Optional[T] violations NG2 + 7-file split refactor NG3 + function-body imports NG4 + _resolve_aliases list[X] bug NG5 + frequency hardcoded NG6; **deferred to follow-up tracks**: deferred-convention-cleanup, deferred-7to1-refactor; investigation found spec WHERE for Task 1.1 was inaccurate — the actual regression was in src/openai_schemas.py and src/mcp_tool_specs.py, NOT in src/code_path_audit*.py files as the spec stated; fix applied to the actual locations with plan.md investigation note documenting the discrepancy) |
+| 31 | A (bugfix) | [Fix 14 Test Failures (post-polish merge)](#track-fix-14-test-failures-post-polish-merge-2026-06-24) | spec ✓, plan ✓, metadata ✓, state ✓, **SHIPPED 2026-06-24** by Tier 2 autonomous mode; 4 phases, 4 tasks, 8 atomic commits (3 task commits + 3 plan updates + state + TRACK_COMPLETION); 14 originally-failing tests now pass (12 NormalizedResponse dual-signature + 1 test_auto_whitelist + 3 palette tests); VC1=true, VC2=true, VC3=true, VC4=PARTIAL (6 pre-existing failures NOT in spec), VC5=true, VC6=true; TRACK_COMPLETION at `docs/reports/TRACK_COMPLETION_fix_test_failures_20260624.md` | `code_path_audit_polish_20260622` (parent; shipped 2026-06-24 and merged) | (**NEW 2026-06-24**; small surgical test-fix; 3 root causes: 1) NormalizedResponse __init__ signature mismatch (Phase 2 refactor left 12 tests using legacy flat kwargs; fix: added init=False + custom __init__ accepting both nested usage: UsageStats AND legacy usage_input_tokens=...); 2) test_auto_whitelist mutated a frozen Session via dict assignment (fix: use dataclasses.replace); 3) 3 palette tests depended on toggle + session-scoped fixture state (fix: force-close preamble that guarantees closed state via conditional toggle + poll); **VC4 PARTIAL**: 6 pre-existing failures remain (5 in tests/test_openai_compatible.py with `'ToolCall' object is not subscriptable` from Phase 2 dataclass refactor; 1 in tests/test_extended_sims.py::test_execution_sim_live which is a known flake); all 6 verified to exist in origin/master HEAD BEFORE this fix; **recommended follow-up track** to fix the 5 openai_compatible tests (1-line fixes per test: `tool_calls[0].function.name` instead of `tool_calls[0]["function"]["name"]`)) |

 **Note on numbering:** the legacy file used `0a`, `0b`, `0c`... and `0d`, `0e`, `0f`, `0g` for tracks created 2026-06-06+. This is the **git-blame sort order**, not a logical execution order. The new structure re-orders by dependency.