conductor(tracks): consolidate Phase 6+ chronology (3 recently completed + 4 in plan)

The Phase 6+ section had two duplicate '### Active' headers, which made the chronology confusing. The user (paraphrased): preserve the chronology of project progress, don't need full detail, follow the previous restructure's lightweight pattern. Changes: - Add '### Recently Completed (2026-06-06 to 2026-06-10)' subsection containing the 3 closed tracks (startup_speedup, test_batching_refactor, test_infrastructure_hardening) with lightweight entries: per-phase commit SHAs only, 1-line summary, link to spec/plan/state folder. Trimmed the verbose per-sub-track commentary that was in the old startup_speedup entry (the per-sub-track bullets for warmup, status indicator, audit violations, post-shipping fixes are in the archive's spec/plan, not the tracks.md). - Remove the duplicate '### Active' header. - Update section intro to reflect '3 recently completed, 4 in plan' (was '2 already completed, 3 in plan'). - test_infrastructure_hardening entry now has phase commit SHAs (5df22fa8, 67d0211e, 006bb114, b8fcd9d6, 33d5cac, 7b87bbf5, 84edb200, 719fe9a) instead of just the closing-report link. Chronology is now visible at a glance; per-track full detail is in the linked archive/ folder.
2026-06-10 20:42:00 -04:00
parent 3e0c7702ad
commit 994ded3598
1 changed files with 13 additions and 23 deletions
@@ -436,42 +436,32 @@ User review surfaced five outstanding UI issues, each previously attempted witho

 ## Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)

-*Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. Two already completed; three in plan state.*
+*Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. **As of 2026-06-10: 3 recently completed (startup_speedup, test_batching_refactor, test_infrastructure_hardening); 4 in plan state (qwen, error_handling, data_structure, mcp_arch).** The 4 in-plan tracks are now unblocked (the upstream test_infrastructure_hardening track is shipped).*

-### Active
+### Recently Completed (2026-06-06 to 2026-06-10)
+
+Lightweight chronology; full spec/plan/state per track is in the linked folder.

 #### Track: Sloppy.py Startup Speedup `[COMPLETE 2026-06-07]`
-*Link: [./tracks/startup_speedup_20260606/](./tracks/startup_speedup_20260606/), Spec: [./tracks/startup_speedup_20260606/spec.md](./tracks/startup_speedup_20260606/spec.md), Plan: [./tracks/startup_speedup_20260606/plan.md](./tracks/startup_speedup_20260606/plan.md)*
+*Link: [./tracks/startup_speedup_20260606/](./tracks/startup_speedup_20260606/) (full spec/plan/state in folder)*

-`[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5a-done: 78d3a1db] [phase-5b-done: 69d098ba] [phase-5c-done: 48c96499] [phase-5d-done: de6b85d2] [phase-5-done: 515a3029] [phase-6-partial-done: 85d18885] [sub-track-1-done: 253e1798] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693] [sub-track-3-done: 8fea8fe9] [sub-track-4-done: f3d071e0] [conftest-atexit-fix: 8957c9a5] [phase-9-shipped: 12cec6ae] [sub-track-2a-done: 01ddf9f1] [sub-track-2b-done: a41b31ed] [sub-track-2c-done: 372b0681] [sub-track-2d-done: 11a9c4f7] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385]`
+`[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5-done: 515a3029] [sub-track-1-done: 253e1798] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385] [conftest-atexit-fix: 8957c9a5] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693]`

-*Goal: Reduce sloppy.py startup time. Main Thread Purity Invariant. 9 phases, 57 tasks. 44 TDD tests added (all passing). 7 main thread purity tests enforce invariant for 6 refactored files.*
-*Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction / 1638ms saved). import src.gui_2 341ms (was 1770ms; 81% reduction / 1429ms saved). Total ~3067ms saved on the 2 big files. 62 audit violations remain (was 63 after Sub-track 2 partial; was 67 baseline) - all 6 refactored files contribute 0 new violations.*
-*Sub-track 1 (Phase 6 full completion) at 253e1798: 15 ad-hoc threading.Thread() call sites migrated to self.submit_io(...); ZERO new threading.Thread() in src/; only 5 domain-specific exempt sites remain (HookServer HTTP/WS, asyncio loop, WorkerPool, CPU monitor).*
-*Sub-track 3 (Hook API warmup endpoints) at 8fea8fe9: GET /api/warmup_status and GET /api/warmup_wait?timeout=N. 7 tests (5 unit + 2 live_gui). All pass.*
-*Sub-track 4 (GUI status indicator) at f3d071e0: render_warmup_status_indicator() + _on_warmup_complete_callback() + App._post_init registration. 6 tests (5 unit + 1 live_gui). All pass.*
-*Conftest atexit fix at 8957c9a5: registers a non-blocking pool shutdown via atexit. Fixes the run_tests_batched.py hang between batches (ThreadPoolExecutor.__del__ was blocking on shutdown(wait=True) for stuck warmup jobs).*
-*Sub-track 2 (audit violations) PARTIAL at ae3b433e: 1 of 63 violations fixed (tomli_w in src/models.py). 62 remain (pydantic in models.py; tree_sitter in file_cache.py; websockets/cost_tracker/session_logger in api_hooks.py; 48 in app_controller.py + gui_2.py; 4 in sloppy.py). These are large refactors (especially gui_2.py with 24 violations and app_controller.py with 24) that exceed the scope of a single sub-track; addressed as future work.*
-*3 post-shipping bugfix commits: 8c4791d0 (real bug: _ensure_gemini_client UnboundLocalError + test_discussion_compression deepseek mock adaptation); 88fc42bb (spec convention: 7 sites in src/ai_client.py use _require_warmed('google.genai') + .types parent lookup instead of leaf); 52ea2693 (conftest: use AppController.wait_for_warmup(timeout=60.0) instead of direct import google.genai — user-corrected jank workaround).*
-*Pre-existing test failures (unrelated, user will address): test_api_generate_blocked_while_stale (ui_global_preset_name AttributeError); test_rag_large_codebase_verification_sim (RAG retrieval).*
+*9 phases, 57 tasks. 44 TDD tests added. Main Thread Purity Invariant enforced via `scripts/audit_main_thread_imports.py` CI gate. Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction); import src.gui_2 341ms (was 1770ms; 81% reduction); total ~3067ms saved. 62 audit violations remain (large refactors deferred).*

 #### Track: Test Batching Refactor `[COMPLETE 2026-06-08] [archived]`
-*Link: [./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/](./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/), Spec: [./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/spec.md](./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/spec.md), Plan: [./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/plan.md](./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/plan.md)*
+*Link: [./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/](./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/)*

-`[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-2-skipped: no-CI] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]`
+`[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]`

-*Adaptations: (a) library modules moved from scripts/ to tests/ per user directive; (b) auto-inference uses AST scan (not regex) per user "FUCK REGEX" policy + prereq spec; (c) Phase 2 (CI shadow run) skipped: no CI infrastructure in repo; manual plan-vs-actual spot-check was the equivalent verification.*
-*Goal: Replace alphabetical 4-at-a-time batching in `scripts/run_tests_batched.py` with fixture-class-isolated tiers: 0 (opt-in: clean_install/docker, gated on env var + --include-opt-in flag), 1 (unit, grouped by subsystem batch_group, pytest-xdist), 2 (mock_app, grouped), 3 (live_gui, all in one pytest invocation to amortize 15s startup), H (headless), P (performance, last). Hybrid classification: auto-infer from filename + AST fixture scan, hand-curated `tests/test_categories.toml` overrides for cross-cutting and ambiguous files. Opt-in per-test order control via `[[files.X.test_order]]` sub-tables, gated on a conftest-loaded pytest plugin (no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library+dry-run, shadow run, switch default, cleanup.*
-*Goal: Reduce `sloppy.py` startup time by ~2000-2400ms. **Main Thread Purity Invariant**: main thread (entering `immapp.run()`) never imports a module heavier than `imgui_bundle` + lean `gui_2` skeleton. **No-prefetch rule**: heavy SDKs (`google.genai` 955ms, `anthropic` 430ms, `openai` 445ms, `fastapi` 470ms) are lazy-only — paid once on first use, on the asyncio thread, not in the background. **No-new-threads rule**: all background work goes through `AppController._io_pool` (4-thread `ThreadPoolExecutor`, named `controller-io-N`); zero new `threading.Thread(...)` calls in `src/`. **Enforcement**: static `scripts/audit_main_thread_imports.py` CI gate + runtime `tests/test_main_thread_purity.py` (`sys.addaudithook` test). 9 phases, 57 tasks. Target: `import src.ai_client` < 50ms (from ~1800ms), `import src.gui_2` < 500ms (from ~3000ms), `live_gui.wait_for_server(timeout=15)` no longer times out.*
-
-### Active
+*4 phases, fixture-class-isolated tiers (0-3 + H + P) replacing alphabetical 4-at-a-time batching. Hand-curated `tests/test_categories.toml` overrides for cross-cutting files. Phase 2 (CI shadow run) skipped (no CI in repo).*

 #### Track: Test Infrastructure Hardening (2026-06-09) `[COMPLETE 2026-06-10] [archived]`
-*Link: [./archive/test_infrastructure_hardening_20260609/](./archive/test_infrastructure_hardening_20260609/), Spec: [./archive/test_infrastructure_hardening_20260609/spec.md](./archive/test_infrastructure_hardening_20260609/spec.md), Plan: [./archive/test_infrastructure_hardening_20260609/plan.md](./archive/test_infrastructure_hardening_20260609/plan.md), Metadata: [./archive/test_infrastructure_hardening_20260609/metadata.json](./archive/test_infrastructure_hardening_20260609/metadata.json), State: [./archive/test_infrastructure_hardening_20260609/state.toml](./archive/test_infrastructure_hardening_20260609/state.toml)*
+*Link: [./archive/test_infrastructure_hardening_20260609/](./archive/test_infrastructure_hardening_20260609/)*

-*Closed by `docs_sync_test_era_20260610` on 2026-06-10. All 8 phases committed (last checkpoint 719fe9a). Final batch status: 314/314 tests green across all 11 tier batches (`docs/reports/test_infrastructure_hardening_batch_green_20260610.md`). Lineage: `workspace_path_finalize_20260609` (precursor, also archived) + `mma_tier_usage_reset_fix_20260610` + `rag_phase4_sync_fix_20260610` (both also archived).*
+`[track-created: 566cf08c] [phase-1-done: 5df22fa8] [phase-2-done: 67d0211e] [phase-3-done: 006bb114] [phase-4-done: b8fcd9d6] [phase-5-done: 33d5cac] [phase-6-done: 7b87bbf5] [phase-7-done: 84edb200] [phase-8-done: 719fe9a]`

-*Goal: **Kill the test regression nightmare** that has consumed 4+ days of Tier 2 work. Fix 3 root causes of test regression churn: (1) subprocess state pollution via autouse `_check_live_gui_health` respawn (FR1), (2) filesystem path hygiene via `tmp_path_factory` + `live_gui_workspace` fixture (FR2), (3) `_sync_rag_engine` io_pool race via token + dirty flag coalescing (FR3). Plus 2 related fixes: `set_value` hook routing for `ai_input` (FR4), and an opt-in `clean_baseline` marker (FR5). 8 phases, ~60 surgical tasks, 6.5 days. Produces `docs/reports/test_bed_health_20260609.md` as the green baseline for the 4 upcoming tracks. **Inherits from** `test_infra_hardening_foundation_20260608` + `batch_resilience_plan_20260608` + `rag_test_batch_failure_status_20260609_pm3` + `rag_work_final_20260609_pm`. **Supersedes** the placeholder tracks `fix_remaining_tests_20260513`, `test_harness_hardening_20260310`, `test_patch_fixes_20260513`, and `test_batching_post_refactor_polish_20260607` (whose work is now scoped in FR1+FR2+FR3). **Blocks** the 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) and code_path_audit_20260607. **Tier 2 supervision required for** Phases 1, 3, 4 (audit review, conftest refactor, io_pool race fix).*
+*8 phases, ~60 surgical tasks, 6.5 days. Fixes 3 root causes of test regression churn: FR1 subprocess health autouse, FR2 `live_gui_workspace` fixture (per-run timestamped under `tests/artifacts/`), FR3 `_sync_rag_engine` token+dirty coalescing. Plus FR4 `set_value` hook + FR5 `clean_baseline` marker. 314/314 tests green across all 11 tier batches. Closing report: `docs/reports/test_infrastructure_hardening_batch_green_20260610.md`. Lineage: `workspace_path_finalize_20260609` + `mma_tier_usage_reset_fix_20260610` + `rag_phase4_sync_fix_20260610` (all also archived).*

 ### In Plan (or Pending Spec)