ed fcb161fd2e conductor(tracks): add test_infrastructure_hardening_20260609 as foundation track + supersede 4 placeholder test tracks

2026-06-09 15:18:20 -04:00

Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in ./archive/<track_name>/ for completed tracks).

Structure:

Active Tracks (Current Queue): In-flight and unblocked work the implementer can pick up today.
Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).

Archive directories live at conductor/archive/<track_name>/. The ./archive/... links in this file are relative to this file's location at conductor/tracks.md and resolve correctly.

Active Tracks (Current Queue)

Tracks in the current queue. The Blocked By column lists what each track waits on; "(none)" means it is ready to start. Ordered by dependency (blocked-by first) and priority (A foundational → D forward-looking).

#	Priority	Track	Status	Blocked By
1	A	Test Infrastructure Hardening (2026-06-09)	spec ✓, plan ✓, ready to start	(none — foundation track; SUPERSEDES tracks 19, 20, 21, 22)
2	A	Qwen, Llama & Grok Vendor Integration + Capability Matrix	spec ✓, plan pending	test_infrastructure_hardening_20260609 (was: none)
3	A	Data-Oriented Error Handling (Fleury Pattern)	spec ✓, plan ✓, ready to start	startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, qwen_llama_grok
4	A	Data Structure Strengthening (Type Aliases + NamedTuples)	spec ✓, plan pending	test_infrastructure_hardening_20260609 (was: none)
5	A	MCP Architecture Refactor (Sub-MCP Extraction)	spec ✓, plan pending	test_infrastructure_hardening_20260609, data_oriented_error_handling, data_structure_strengthening
6	D	Public API Result Migration	placeholder; not yet specced	data_oriented_error_handling (deprecated `send()`)
7	—	UI Polish (Five Issues)	spec ✓, plan ✓, ready to start	(none — independent)
8	—	Bootstrap gencpp Python Bindings	spec TBD	(none — independent)
9	—	Tree-Sitter Lua MCP Tools	spec TBD	(none — independent)
10	—	GDScript Language Support Tools	spec TBD	(none — independent)
11	—	C# Language Support Tools	spec TBD	(none — independent)
12	—	OpenAI Provider Integration	spec TBD	(none — independent)
13	—	Zhipu AI (GLM) Provider Integration	spec TBD	(none — independent)
14	—	AI Provider Caching Optimization	spec TBD	(none — independent)
15	—	Manual UX Validation & Review	spec TBD	(none — independent)
15a	—	Manual UX Validation — ASCII-Sketch Workflow	spec ✓, plan ✓, ready to start	(none — independent; NEW 2026-06-08)
15b	—	Chunkification Optimization (Contingency)	spec ✓ (contingency), no plan	hard constraint surface (deferred)
16	—	GenCpp Dogfood Feedback Loop	spec TBD	(none — independent; oldest pending track)
17	—	Code Path Audit	spec TBD	test_infrastructure_hardening_20260609 (was: none)
18	—	GUI Architecture Refinement	(no spec.md)	(TBD)
18a	—	Context First Message Fix	spec TBD	(none — independent)
19	—	~~Fix Remaining Tests~~	~~SUPERSEDED by track 1~~	—
20	—	~~Test Harness Hardening~~	~~SUPERSEDED by track 1~~	—
21	—	~~Test Patch Fixes~~	~~SUPERSEDED by track 1~~	—
22	—	~~Test Batching Post-Refactor Polish~~	~~SUPERSEDED by track 1 (FR1 + FR2)~~	—
22a	—	Prior Session Test Harden (20260605)	superseded; no action needed	—

Note on numbering: the legacy file used 0a, 0b, 0c... and 0d, 0e, 0f, 0g for tracks created 2026-06-06+. This is the git-blame sort order, not a logical execution order. The new structure re-orders by dependency.

Phase 0: Infrastructure (Critical)

Initialized: 2026-02 (project foundation)

Completed

Track: Conductor Path Configuration Note: One-line entry; full details in ./tracks/conductor_path_configurable_20260306/ (still in tracks/; not yet archived).

Phase 1: Pre-Track Foundation (2026-02 - 2026-03)

No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.

Completed

Various one-off refactors; full details in conductor/archive/ by track name prefix.

Phase 2: Strict Execution Queue

Completed 2026-03-06

Completed

Track: Strict Execution Queue (Phase 2) See: ./archive/strict_execution_queue_completed_20260306/

Phase 3 - Phase 4: Foundational Tracks (March 2026)

Multiple sub-tracks under the initial feature-development push. All archived.

Archived

Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):

~~Track: Session Context Snapshots & Visibility~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/
~~Track: Discussion Takes & Timeline Branching~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/
Track: RAG Support Link: ./archive/rag_support_20260308/
Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/
Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/
Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/
Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/
Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/
Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/
Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/
Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/
Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/
Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/
Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/
Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/
Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/
Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/
Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/
Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/
Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/
Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/
Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/
Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/
Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/
Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/
Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/
Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/
Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/
Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/

Phase 5: Codebase Curation

Initialized: 2026-05-07

Completed (all archived)

Analysis & Structural Review

Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.
Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.
Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.
Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.
Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.
Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.
Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.
Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.
Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/
Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.
Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/
Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.
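The "Encapsulate AppController Status" goal above (properties with thread-safe setters) could look roughly like the sketch below. Only the property name ai_status comes from the track entry; the class name StatusHolder, the lock attribute, and the default value are assumptions made for illustration, not the project's code.

```python
import threading


class StatusHolder:
    """Hypothetical stand-in for the controller; only `ai_status` is named in the track."""

    def __init__(self) -> None:
        self._status_lock = threading.Lock()
        self._ai_status = "idle"  # assumed default value

    @property
    def ai_status(self) -> str:
        # Reads take the same lock as writes so a reader never sees a torn update.
        with self._status_lock:
            return self._ai_status

    @ai_status.setter
    def ai_status(self, value: str) -> None:
        with self._status_lock:
            self._ai_status = value
```

Callers keep plain attribute syntax (`holder.ai_status = "sending"`), so a change of this shape would not require touching existing call sites.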

Phase 6: Context Composition Redesign

Initialized: 2026-05-10

Completed (all archived)

Context Control & Workflow Enhancements

Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.
Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.
Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.
Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.
Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.
Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.
Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.
Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.
Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.
Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.
Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..." — the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update product guidelines or the python code_styleguidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions with these python files.
Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/
[~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix Preview button generating empty content, and Inspect/Slices buttons failing to open their respective editor panels. Status: in progress; track folder still in tracks/ (not yet archived).

Active

Track: GenCpp Dogfood Feedback Loop Link: ./tracks/gencpp_dogfood_feedback_20260510/ Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding. Status: oldest pending track (2026-05-10). Track folder still in tracks/.

Hot Reload Feature (2026-05-16)

Single-track feature, not part of a numbered Phase.

Archived

Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.

Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)

Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to archive/.

Archived

Track: Phase 7 Stabilization and Polishing (Regressions Fix) Link: ./archive/phase7_stabilization_and_polishing_20260601/
Track: Phase 7 Monolithic Stabilization (Final Cleanup) Link: ./archive/phase7_monolithic_stabilization_20260602/

Late May 2026 - Early June 2026: One-Off Fixes and Polish

One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.

Archived

Track: Robust Live Simulation Verification
Track: Fix GUI Crashes in Tool Preset Manager and Discussion Hub Link: ./archive/gui_crash_fixes_20260531/
Track: Fix keys_down AttributeError in ImGui IO Link: ./archive/fix_imgui_keys_down_20260601/
Track: Selectable Thinking Monologs Link: ./archive/selectable_thinking_monologs_20260601/
Track: Fix MiniMax history sequencing and truncation Link: ./archive/minimax_history_fix_20260601/
Track: Preserve context selection on discussion switch and add empty context warning Link: ./archive/context_preservation_and_warnings_20260601/
Track: Fix Text Viewer docking conflicts and Tool Call row click interactivity Link: ./archive/text_viewer_and_tool_call_fixes_20260601/
Track: UX Refinements for Context Composition and Discussion Entries Link: ./archive/context_composition_ux_20260601/
Track: Combine AST Inspector and Slices Editor into a unified Structural File Editor Link: ./archive/structural_file_editor_20260601/
Track: Add per-response token metrics and AI-assisted history compression Link: ./archive/discussion_metrics_and_compression_20260601/
Track: Fix Approve Modal sizing and inline full preview Link: ./archive/approve_modal_ux_20260601/
Track: Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette. Link: ./archive/command_palette_and_performance_20260602/ Goal: Async context preview offload (background thread, state lock) + Command Palette (32 commands, fuzzy search, Ctrl+Shift+P, Up/Down/Enter nav, 13 unit + 7 live_gui tests). Phases 1-3 complete.
Track: Comprehensive Documentation Refresh Link: ./archive/documentation_refresh_comprehensive_20260602/ Goal: Refresh stale documentation across docs/. Completed: ASCII file tree updates (docs/Readme.md + Readme.md 5→14 guides, 22→53 src modules), docs/guide_testing.md (new, comprehensive 251-file test suite reference), 7 per-source-file guides (guide_gui_2.md, guide_ai_client.md, guide_api_hooks.md, guide_mcp_client.md, guide_app_controller.md, guide_multi_agent_conductor.md, guide_models.md). All 14 guides cross-linked. Gap analysis: ./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md.

Sub-tracks (all checkpointed):
- Sub-Track 1: Docs Layer Refresh [checkpoint: 20225c8] — 18 per-file atomic commits. 15 guides (8 refreshed + 7 new), Subsystem Index (24 entries), 106 cross-links all resolve, symbol parity fixed (apply_nerv_theme -> apply_nerv).
- Sub-Track 2: Conductor Docs Refresh [checkpoint: ef4efab2] — 4 per-file atomic commits: product.md (14 guides, MiniMax, Command Palette), tech-stack.md (MiniMax, Gemini Embedding 001), workflow.md (2026-06-02 doc refresh, 45-tool count), index.md (active track links).
- Sub-Track 3: Agent Config Refresh [checkpoint: 87f668a6] — 3 per-file atomic commits: AGENTS.md (5.4K -> 0.7K thin pointer), CLAUDE.md (6.7K -> 0.2K deprecation stub), GEMINI.md (5 providers, sloppy.py entry, 12 key modules). Drift check: 0 issues in 9 mirrored skill files.
Track: Test Consolidation & TOML Sandboxing [checkpoint: cb91006c] Spec: ./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-test-consolidation.md Goal: Audit tests for real-TOML usage, migrate offenders to sandboxed patterns. Added scripts/check_test_toml_paths.py audit script (CI gate). Migrated test_mcp_client_whitelist_enforcement to tmp_path (was the only offender). Skipped redundant enforce_no_real_toml fixture — existing isolate_workspace autouse + audit script provide equivalent coverage.

Phase 8: UI Polish (2026-06-03)

Initialized: 2026-06-03

User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.

Active

Track: UI Polish (Five Issues) Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md Goal: Resolve five long-standing UI issues:
- Phase 1: GFM markdown table rendering (pre-processor into src/markdown_table.py, wire into MarkdownRenderer.render).
- Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
- Phase 3: Fix Refresh Registry button in Log Management. It currently instantiates LogRegistry without calling load_registry(), so the displayed table never reflects on-disk state (gui_2.py:1675).
- Phase 4: Add Vendor State tab to Operations Hub: at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
- Phase 5: Files & Media > Files directory-grouped tree (re-use aggregate.group_files_by_dir, mirror render_context_files_table collapsible-node style).
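The directory grouping that Phase 5 relies on can be pictured with the sketch below. It is not aggregate.group_files_by_dir, whose signature is not shown in this file; group_by_directory and its return shape are hypothetical and only show the idea of one collapsible node per directory.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by parent directory (hypothetical helper, not the project's API)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        # Files with no directory component are grouped under ".".
        groups[str(path.parent)].append(path.name)
    # Sort directories and file names so the rendered tree is stable between frames.
    return {directory: sorted(names) for directory, names in sorted(groups.items())}


tree = group_by_directory(["src/gui_2.py", "src/aggregate.py", "tests/test_a.py", "Readme.md"])
```

Each key would map to one collapsible tree node, with the sorted file names as its children.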

Recently Archived (post-Phase 8)

Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add opt-in pytest test (RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, verifies Hook API responds. Catches "works on my machine" failures. Added clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines, uses urllib.request from stdlib per tech-stack.md dependency minimalism rule - deviation from plan). Skipped by default. Marked with @pytest.mark.clean_install.
Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: First thing the clean install test caught. ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel _editor_lang_cache to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit b306f8f corrected test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).
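The runtime API detection described in the markdown_helper entry follows a common feature-detection shape, sketched below. The two API objects are stand-ins built with SimpleNamespace; language_factory is a made-up name, and none of this reproduces the real imgui-bundle surface or the project's _get_language_id helper.

```python
from types import SimpleNamespace


def get_language_id(text_editor_api, name: str):
    """Resolve a language object on either API shape by probing for attributes."""
    enum_cls = getattr(text_editor_api, "LanguageDefinitionId", None)
    if enum_cls is not None:
        # Older shape: languages are members of an enum-like class.
        return getattr(enum_cls, name)
    # Newer shape (assumed): languages come from a factory callable.
    return text_editor_api.language_factory(name)


# Stand-in APIs for illustration only.
old_api = SimpleNamespace(LanguageDefinitionId=SimpleNamespace(python="enum:python"))
new_api = SimpleNamespace(language_factory=lambda name: f"factory:{name}")
```

Probing for the attribute rather than comparing version strings keeps the shim working if a later release changes its version scheme but not its API.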
Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL() / C_VAL() so theme switches take effect at use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): DIR_COLORS dict stored C_VAL not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.
[~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes C-level (0xc0000005) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with _ini_capture_ready flag (defer-not-catch pattern): first call returns b"" and sets the flag, subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).
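The defer-not-catch pattern named in the regression-fixes entry can be sketched as follows. A native crash cannot be caught with try/except, so the first call is skipped instead. The IniCapture class and the injected native_save callable are hypothetical; only the _ini_capture_ready flag and the b"" first-call return come from the entry.

```python
from typing import Callable


class IniCapture:
    """Skips the unsafe native call once, then delegates on every later call."""

    def __init__(self, native_save: Callable[[], bytes]) -> None:
        self._native_save = native_save  # stand-in for the C-level save call
        self._ini_capture_ready = False

    def capture(self) -> bytes:
        if not self._ini_capture_ready:
            # First call: do not touch the native function, only arm the flag.
            self._ini_capture_ready = True
            return b""
        return self._native_save()
```

Callers need to treat the empty result as "not captured yet" rather than as a real, empty settings blob.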
Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Link: Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, profile never saved, load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encoding the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track — the test was more under-mocked than the spec assumed; fixing imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where 2-tuple expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.
Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:
- Sub-track 1: live_gui_state_sync_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md. REAL root cause was bad indentation in src/gui_2.py:607 (user fixed). The App class had _capture_workspace_profile being parsed as nested inside _apply_snapshot due to indentation. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by getattr/setattr at lines 478-487.
- Sub-track 2: prior_session_test_harden_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md. Test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
- Sub-track 3: wait_for_ready_test_pattern_20260605 - SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not enough) was wrong; the real cause was the indent. Polling can be a follow-up hardening pass if tests become flaky in CI.
- Sub-track 4: undo_redo_lifecycle_fix_20260605 - RESOLVED by Sub-track 1 indent fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
Net result: 4 originally-failing live_gui tests all pass. User can run the full batched suite to confirm.
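The indentation root cause in Sub-track 1 is easy to miss because the mis-indented file is still valid Python. The sketch below uses the standard ast module to list the methods that actually belong to a class. The two source strings are reduced illustrations, not excerpts from src/gui_2.py, and they use 4-space indentation for readability.

```python
import ast

GOOD = """
class App:
    def _apply_snapshot(self):
        pass

    def _capture_workspace_profile(self):
        pass
"""

# One extra indent level turns the second method into a local function.
BAD = """
class App:
    def _apply_snapshot(self):
        pass

        def _capture_workspace_profile(self):
            pass
"""


def class_methods(source: str, class_name: str) -> list[str]:
    """Return the names of functions defined directly in the class body."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return []
```

A check of this kind could back a regression test that asserts an expected method list for the class, so a stray indent fails fast instead of surfacing as unrelated live_gui failures.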

Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)

Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. Two already completed; three in plan state.

Active

Track: Sloppy.py Startup Speedup `[COMPLETE 2026-06-07]`

Link: ./tracks/startup_speedup_20260606/, Spec: ./tracks/startup_speedup_20260606/spec.md, Plan: ./tracks/startup_speedup_20260606/plan.md

[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5a-done: 78d3a1db] [phase-5b-done: 69d098ba] [phase-5c-done: 48c96499] [phase-5d-done: de6b85d2] [phase-5-done: 515a3029] [phase-6-partial-done: 85d18885] [sub-track-1-done: 253e1798] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693] [sub-track-3-done: 8fea8fe9] [sub-track-4-done: f3d071e0] [conftest-atexit-fix: 8957c9a5] [phase-9-shipped: 12cec6ae] [sub-track-2a-done: 01ddf9f1] [sub-track-2b-done: a41b31ed] [sub-track-2c-done: 372b0681] [sub-track-2d-done: 11a9c4f7] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385]

Goal: Reduce sloppy.py startup time. Main Thread Purity Invariant. 9 phases, 57 tasks. 44 TDD tests added (all passing). 7 main thread purity tests enforce invariant for 6 refactored files.

Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction / 1638ms saved). import src.gui_2 341ms (was 1770ms; 81% reduction / 1429ms saved). Total ~3067ms saved on the 2 big files. 62 audit violations remain (was 63 after Sub-track 2 partial; was 67 baseline) - all 6 refactored files contribute 0 new violations.

- Sub-track 1 (Phase 6 full completion) at 253e1798: 15 ad-hoc threading.Thread() call sites migrated to self.submit_io(...); ZERO new threading.Thread() in src/; only 5 domain-specific exempt sites remain (HookServer HTTP/WS, asyncio loop, WorkerPool, CPU monitor).
- Sub-track 3 (Hook API warmup endpoints) at 8fea8fe9: GET /api/warmup_status and GET /api/warmup_wait?timeout=N. 7 tests (5 unit + 2 live_gui). All pass.
- Sub-track 4 (GUI status indicator) at f3d071e0: render_warmup_status_indicator() + _on_warmup_complete_callback() + App._post_init registration. 6 tests (5 unit + 1 live_gui). All pass.
- Conftest atexit fix at 8957c9a5: registers a non-blocking pool shutdown via atexit. Fixes the run_tests_batched.py hang between batches (ThreadPoolExecutor.__del__ was blocking on shutdown(wait=True) for stuck warmup jobs).
- Sub-track 2 (audit violations) PARTIAL at ae3b433e: 1 of 63 violations fixed (tomli_w in src/models.py). 62 remain (pydantic in models.py; tree_sitter in file_cache.py; websockets/cost_tracker/session_logger in api_hooks.py; 48 in app_controller.py + gui_2.py; 4 in sloppy.py). These are large refactors (especially gui_2.py with 24 violations and app_controller.py with 24) that exceed the scope of a single sub-track; deferred to future work.
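The submit_io migration and the atexit registration mentioned above can be sketched together. The Controller class here is a hypothetical stand-in for AppController, the pool size and thread-name prefix are assumptions, and whether shutdown(wait=False) alone is enough for every stuck job depends on interpreter exit behavior. Treat it as the shape of the approach, not the shipped fix.

```python
import atexit
from concurrent.futures import Future, ThreadPoolExecutor


class Controller:
    """Hypothetical owner of a shared I/O pool; not the project's AppController."""

    def __init__(self) -> None:
        # Pool size and name prefix are assumptions for this sketch.
        self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")
        # Register a shutdown that does not wait for in-flight jobs.
        atexit.register(self._shutdown_pool)

    def submit_io(self, fn, *args, **kwargs) -> Future:
        """Single entry point for background work, replacing ad-hoc threading.Thread()."""
        return self._io_pool.submit(fn, *args, **kwargs)

    def _shutdown_pool(self) -> None:
        # wait=False returns immediately; cancel_futures drops jobs still queued (Python 3.9+).
        self._io_pool.shutdown(wait=False, cancel_futures=True)
```

A call site that previously did `threading.Thread(target=load, args=(path,)).start()` would become `controller.submit_io(load, path)`, which also gives the caller a Future to inspect for errors.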
3 post-shipping bugfix commits: 8c4791d0 (real bug: _ensure_gemini_client UnboundLocalError + test_discussion_compression deepseek mock adaptation); 88fc42bb (spec convention: 7 sites in src/ai_client.py use _require_warmed('google.genai') + .types parent lookup instead of leaf); 52ea2693 (conftest: use AppController.wait_for_warmup(timeout=60.0) instead of direct import google.genai — user-corrected jank workaround). Pre-existing test failures (unrelated, user will address): test_api_generate_blocked_while_stale (ui_global_preset_name AttributeError); test_rag_large_codebase_verification_sim (RAG retrieval).
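The "require the warmed parent, then look up the attribute" convention from commit 88fc42bb might look like the sketch below. The real _require_warmed signature and semantics are not shown in this file, so warm, require_warmed, and the _warmed registry are assumptions, with os and os.path standing in for google.genai and its .types attribute.

```python
import importlib
import sys

_warmed: set[str] = set()  # hypothetical registry of modules imported off the main thread


def warm(name: str) -> None:
    """Import a heavy module ahead of use (meant to run on a background thread)."""
    importlib.import_module(name)
    _warmed.add(name)


def require_warmed(name: str):
    """Return an already-imported module, or fail loudly instead of importing lazily."""
    if name not in _warmed:
        raise RuntimeError(f"{name} has not been warmed yet")
    return sys.modules[name]


warm("os")                            # stand-in for warming the parent package
path_mod = require_warmed("os").path  # parent lookup, then attribute access
```

Requiring only the parent keeps the registry small: one warmed entry covers every submodule that the parent already exposes as an attribute.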

Track: Test Batching Refactor `[COMPLETE 2026-06-08] [archived]`

Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/, Spec: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/spec.md, Plan: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/plan.md

[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-2-skipped: no-CI] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]

Adaptations:
- (a) library modules moved from scripts/ to tests/ per user directive;
- (b) auto-inference uses AST scan (not regex) per user "FUCK REGEX" policy + prereq spec;
- (c) Phase 2 (CI shadow run) skipped: no CI infrastructure in repo; manual plan-vs-actual spot-check was the equivalent verification.

Goal: Replace alphabetical 4-at-a-time batching in scripts/run_tests_batched.py with fixture-class-isolated tiers:
- 0 (opt-in: clean_install/docker, gated on env var + --include-opt-in flag)
- 1 (unit, grouped by subsystem batch_group, pytest-xdist)
- 2 (mock_app, grouped)
- 3 (live_gui, all in one pytest invocation to amortize 15s startup)
- H (headless)
- P (performance, last)

Hybrid classification: auto-infer from filename + AST fixture scan, hand-curated tests/test_categories.toml overrides for cross-cutting and ambiguous files. Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on a conftest-loaded pytest plugin (no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library+dry-run, shadow run, switch default, cleanup.

Startup Speedup spec summary (this text describes the Sloppy.py Startup Speedup track above, not the batching refactor): Reduce sloppy.py startup time by ~2000-2400ms. Main Thread Purity Invariant: main thread (entering immapp.run()) never imports a module heavier than imgui_bundle + lean gui_2 skeleton. No-prefetch rule: heavy SDKs (google.genai 955ms, anthropic 430ms, openai 445ms, fastapi 470ms) are lazy-only: paid once on first use, on the asyncio thread, not in the background. No-new-threads rule: all background work goes through AppController._io_pool (4-thread ThreadPoolExecutor, named controller-io-N); zero new threading.Thread(...) calls in src/. Enforcement: static scripts/audit_main_thread_imports.py CI gate + runtime tests/test_main_thread_purity.py (sys.addaudithook test). 9 phases, 57 tasks. Target: import src.ai_client < 50ms (from ~1800ms), import src.gui_2 < 500ms (from ~3000ms), live_gui.wait_for_server(timeout=15) no longer times out.
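The AST fixture scan used to assign tests to batching tiers can be illustrated with the sketch below. The fixture-to-tier mapping mirrors the tier list in this entry, but infer_tier, its precedence order, and the "any test function argument counts as a fixture" rule are assumptions, not the project's classifier.

```python
import ast

# Checked in order, so a file that uses both fixtures lands in the heavier tier (assumed rule).
FIXTURE_TIERS = {"live_gui": "3", "mock_app": "2"}


def infer_tier(source: str) -> str:
    """Infer a tier from the fixture names requested by test functions in a file."""
    requested: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name.startswith("test_"):
            # pytest fixtures are requested by parameter name.
            requested.update(arg.arg for arg in node.args.args)
    for fixture, tier in FIXTURE_TIERS.items():
        if fixture in requested:
            return tier
    return "1"  # default: plain unit test


unit_src = "def test_add():\n    assert 1 + 1 == 2\n"
gui_src = "def test_panel(live_gui, mock_app):\n    pass\n"
mock_src = "def test_ctrl(mock_app):\n    pass\n"
```

Working on the parsed tree rather than on text means a fixture name that appears only in a comment or string never changes a file's tier.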

Active

Track: Test Infrastructure Hardening (2026-06-09) `[track-created: 566cf08c]`

Link: ./tracks/test_infrastructure_hardening_20260609/, Spec: ./tracks/test_infrastructure_hardening_20260609/spec.md, Plan: ./tracks/test_infrastructure_hardening_20260609/plan.md, Metadata: ./tracks/test_infrastructure_hardening_20260609/metadata.json, State: ./tracks/test_infrastructure_hardening_20260609/state.toml

Goal: Kill the test regression nightmare that has consumed 4+ days of Tier 2 work. Fix 3 root causes of test regression churn: (1) subprocess state pollution via autouse _check_live_gui_health respawn (FR1), (2) filesystem path hygiene via tmp_path_factory + live_gui_workspace fixture (FR2), (3) _sync_rag_engine io_pool race via token + dirty flag coalescing (FR3). Plus 2 related fixes: set_value hook routing for ai_input (FR4), and an opt-in clean_baseline marker (FR5). 8 phases, ~60 surgical tasks, 6.5 days. Produces docs/reports/test_bed_health_20260609.md as the green baseline for the 4 upcoming tracks. Inherits from test_infra_hardening_foundation_20260608 + batch_resilience_plan_20260608 + rag_test_batch_failure_status_20260609_pm3 + rag_work_final_20260609_pm. Supersedes the placeholder tracks fix_remaining_tests_20260513, test_harness_hardening_20260310, test_patch_fixes_20260513, and test_batching_post_refactor_polish_20260607 (whose work is now scoped in FR1+FR2+FR3). Blocks the 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) and code_path_audit_20260607. Tier 2 supervision required for Phases 1, 3, 4 (audit review, conftest refactor, io_pool race fix).
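One possible reading of "token + dirty flag coalescing" (FR3) is sketched below; the actual fix for _sync_rag_engine may differ. The class name CoalescedTask and its methods are hypothetical. The idea shown: at most one sync runs at a time, requests that arrive mid-run only set a dirty flag, and the worker reruns once with the latest token instead of queueing one job per request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class CoalescedTask:
    """Run `work` on a pool, collapsing bursts of requests into one rerun."""

    def __init__(self, pool: ThreadPoolExecutor, work: Callable[[int], None]) -> None:
        self._pool = pool
        self._work = work
        self._lock = threading.Lock()
        self._running = False
        self._dirty = False
        self._token = 0  # increases on every request; identifies the newest one

    def request(self) -> None:
        with self._lock:
            self._token += 1
            if self._running:
                self._dirty = True  # the running worker will pick this up
                return
            self._running = True
        self._pool.submit(self._run)

    def _run(self) -> None:
        while True:
            with self._lock:
                token = self._token
                self._dirty = False
            self._work(token)
            with self._lock:
                if not self._dirty:
                    self._running = False
                    return
```

The flag check and the reset of _running happen under the same lock, so a request can never slip in between "no more work" and "worker stopped".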

In Plan (or Pending Spec)

Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix `[track-created: 7c1d597e]`

Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md (to be authored by writing-plans skill)

Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Now blocked by test_infrastructure_hardening_20260609 (was: none).
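A minimal sketch of what a per-(vendor, model) capability matrix could look like, assuming a frozen dataclass per entry. The seven field names follow the capability list above; the pairing of each capability with a UI element, the helper names, and the example_vendor entry are assumptions for illustration, not the contents of src/vendor_capabilities.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0  # assumed to be a token count; 0 means unknown
    cost_tracking: bool = False


# Placeholder entry only; real vendors and models are deliberately not listed.
MATRIX: dict[tuple[str, str], Capabilities] = {
    ("example_vendor", "example-model"): Capabilities(
        vision=True, streaming=True, context_window=8192
    ),
}


def capabilities_for(vendor: str, model: str) -> Capabilities:
    """Unknown (vendor, model) pairs fall back to 'nothing supported'."""
    return MATRIX.get((vendor, model), Capabilities())


def ui_enabled(vendor: str, model: str) -> dict[str, bool]:
    """Map capabilities to UI element states (pairing is an assumption)."""
    caps = capabilities_for(vendor, model)
    return {
        "screenshot_button": caps.vision,
        "tools_toggle": caps.tool_calling,
        "cache_panel": caps.caching,
        "stream_progress": caps.streaming,
        "fetch_models": caps.model_discovery,
        "token_budget": caps.context_window > 0,
        "cost_panel": caps.cost_tracking,
    }
```

With this shape the GUI asks one question per element and never branches on the vendor name, so adding a vendor means adding rows rather than code paths.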

Track: Data-Oriented Error Handling (Fleury Pattern) `[track-created: 494f68f9]`

Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md

Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>() → _send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive. Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet) — removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.
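To make the "data + side-channel errors list" shape concrete, here is a small sketch under stated assumptions. The names ErrorKind, ErrorInfo and Result come from the goal above; the enum members, the field names, the ok property and the read_text_result helper are hypothetical and may not match src/result_types.py.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    NOT_FOUND = auto()  # hypothetical members
    PROVIDER = auto()


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    data: T  # always a usable value, even on failure
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def read_text_result(path: Path) -> Result[str]:
    """Return file contents, or an empty string plus an error entry."""
    if not path.is_file():
        info = ErrorInfo(ErrorKind.NOT_FOUND, f"no such file: {path}")
        return Result("", [info])
    return Result(path.read_text(encoding="utf-8"))
```

The caller can keep working with result.data unconditionally and inspect result.errors only where it matters, which is the sense in which the error becomes just another case rather than a control-flow event.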

Track: Data Structure Strengthening (Type Aliases + NamedTuples) `[track-created: ed42a97a]`

Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)

Goal: Improve AI-readability by naming 430 currently-anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Honest about what's missing: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. Now blocked by test_infrastructure_hardening_20260609 (was: none).

Track: MCP Architecture Refactor (Sub-MCP Extraction) `[track-created: 2720a894]`

Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)

Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by test_infrastructure_hardening_20260609, data_oriented_error_handling_20260606 (for Result/ErrorInfo), and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.
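The registration list plus inverted-dict dispatch could be sketched as follows. ALL_SUB_MCPS is named in the goal above; the SubMCP dataclass, the tool names read_file and fetch_url, and the helper functions are hypothetical. To stay self-contained the sketch returns plain strings rather than the Result type the track calls for, and it omits the security layer.

```python
from dataclasses import dataclass
from typing import Callable

ToolFn = Callable[[dict], str]


@dataclass(frozen=True)
class SubMCP:
    name: str
    tools: dict[str, ToolFn]


# Hypothetical registrations; real sub-MCPs expose many more tools.
ALL_SUB_MCPS = [
    SubMCP("mcp_file_io", {"read_file": lambda args: f"read {args['path']}"}),
    SubMCP("mcp_web", {"fetch_url": lambda args: f"fetched {args['url']}"}),
]


def build_dispatch(sub_mcps: list[SubMCP]) -> dict[str, SubMCP]:
    """Invert 'sub-MCP -> tools' into 'tool name -> owning sub-MCP'."""
    dispatch: dict[str, SubMCP] = {}
    for sub in sub_mcps:
        for tool_name in sub.tools:
            if tool_name in dispatch:
                owner = dispatch[tool_name].name
                raise ValueError(f"{tool_name!r} registered by {owner} and {sub.name}")
            dispatch[tool_name] = sub
    return dispatch


def invoke(dispatch: dict[str, SubMCP], tool_name: str, args: dict) -> str:
    sub = dispatch.get(tool_name)
    if sub is None:
        return f"ERROR: unknown tool {tool_name!r}"
    return sub.tools[tool_name](args)
```

Building the inverted dict once at startup turns every tool call into a single lookup and surfaces duplicate tool names as a registration error instead of a silent override.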

Track: RAG Phase 4 Stress Test Fix `[x] — fixed 16412ad5`

Status: 2026-06-06. Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupt the collection, then search() fails with "Collection expecting embedding with dimension of 3072, got 384" and the AI request never reaches 'done' status, timing out the 50 * 0.5s = 25s poll loop. Fix: RAGEngine._init_vector_store now calls _validate_collection_dim which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.
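The validation step described above can be illustrated without ChromaDB. This sketch stands in a plain list of vectors for the collection; the function name and signature are hypothetical and do not reproduce RAGEngine._validate_collection_dim, which operates on a real collection object.

```python
import sys


def validate_collection_dim(
    vectors: list[list[float]], provider_dim: int
) -> list[list[float]]:
    """Return the collection to keep using.

    An empty collection is always compatible. Otherwise the first stored
    vector's length is compared with the current provider's output size,
    and a mismatch discards the old data in favour of a fresh collection.
    """
    if not vectors:
        return vectors
    existing_dim = len(vectors[0])
    if existing_dim != provider_dim:
        print(
            f"warning: collection has {existing_dim}-dim vectors but the "
            f"provider emits {provider_dim}-dim; recreating collection",
            file=sys.stderr,
        )
        return []  # stand-in for 'delete and recreate the collection'
    return vectors
```

Checking once at initialisation is cheaper than guarding every upsert, and it fails loudly at the point where the provider switch actually happened.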

Track: Prior Session Test Harden (20260605) `[superseded by live_gui_test_hardening_v2_20260605]`

Status: 2026-06-05. Surfaced during live_gui_fragility_fixes_20260605 execution. test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.

Backlog (Provider + Language + Investigation)

Track: Manual UX Validation — ASCII-Sketch Workflow (NEW 2026-06-08)

Link: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/, Spec: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md, Plan: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md

Goal: Promote the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at src/gui_2.py:3770 render_discussion_entry. The 23-op matrix A1-A7 in docs/guide_discussions.md is the source of truth; the SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines) informs the internal refactoring decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing.

Status: Active; Phase 1 (5 open questions to the user) is the current phase.

Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)

Link: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/, Spec: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md

Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add markdown-it-py OR switch to pickle/msgspec, NOT C11. The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track.

Status: Deferred. Promotes to active track when (if) the first hard constraint surfaces.

Track: Context First Message Fix

Link: ./tracks/context_first_message_fix_20260604/

Track: Fix Remaining Tests

Link: ./tracks/fix_remaining_tests_20260513/

Track: Test Harness Hardening

Link: ./tracks/test_harness_hardening_20260310/

Track: Test Patch Fixes

Link: ./tracks/test_patch_fixes_20260513/

Track: Test Batching Post-Refactor Polish

Link: ./tracks/test_batching_post_refactor_polish_20260607/

Track: Code Path Audit

Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill)

Goal: Build src/code_path_audit.py, a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user.

Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); pre-4-tracks code is too stale to ground optimization decisions.

Track: GUI Architecture Refinement

Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)

Follow-up (Planned, Not Yet Specced)

Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)

Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet. Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects src/app_controller.py:290 and :3559, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, src/conductor_tech_lead.py:68 (5 call sites across 4 production files in src/), and 50+ test files. The 4-caller enumeration + baseline counts are recorded in the parent track's spec §12.1.

Phase 9: Chore Tracks

Initialized: 2026-06-07

Completed (recently archived or in `tracks/`)

Track: Unused Scripts Cleanup [checkpoint: 46ce3cd]

Link: ./tracks/unused_scripts_cleanup_20260607/, Spec: ./tracks/unused_scripts_cleanup_20260607/spec.md, Plan: ./tracks/unused_scripts_cleanup_20260607/plan.md

Goal: Remove 30 confirmed-unused one-off scripts from scripts/ (56 → 26 files, 54% reduction). 5 atomic per-category commits; no new CI gate; follow-up unused_scripts_audit_20260607 recorded. All non-GUI test batches still pass; 2 audit scripts (main_thread_imports, weak_types) report no new violations.

Track: License & CVE Audit (Dependency Compliance) [checkpoint: a7ab994f]

Link: ./tracks/license_cve_audit_20260607/, Spec: ./tracks/license_cve_audit_20260607/spec.md, Plan: ./tracks/license_cve_audit_20260607/plan.md

Goal: Build scripts/audit_license_cve.py, a single audit script that checks third-party deps (pyproject.toml + uv.lock transitive) for license compliance + known CVEs + version-pinning + SPDX source-headers. Tilde-pin all deps, delete requirements.txt, regenerate uv.lock (gitignored per project policy), add --strict mode + baseline file (CI gate). Policy: ALLOW (permissive + weak copyleft + public domain), BLOCK (GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, unknown). Track is scope-limited to third-party deps; the project's own LICENSE and SPDX headers are explicitly OUT of scope (the user reserves all rights to the repo). 28 unit + integration tests passing; --strict mode wired as CI gate; baseline file committed at scripts/audit_license_cve.baseline.json. 4 atomic commits: audit script + initial report, tilde-pin + lock regen + delete requirements.txt, --strict + baseline, tracks.md update.
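The ALLOW/BLOCK policy above amounts to an allow-list with a default of BLOCK, which can be sketched as below. The SPDX identifiers are real, but which ones scripts/audit_license_cve.py actually recognises is an assumption, as are the names ALLOWED_SPDX and classify_license.

```python
from typing import Optional

# Illustrative allow-list (assumed): permissive, weak copyleft, public domain.
ALLOWED_SPDX = frozenset({
    "MIT", "Apache-2.0", "BSD-3-Clause",   # permissive
    "LGPL-3.0-only", "MPL-2.0",            # weak copyleft
    "Unlicense", "CC0-1.0",                # public domain
})


def classify_license(spdx_id: Optional[str]) -> str:
    """Anything not explicitly allowed is blocked, including unknowns."""
    if spdx_id is not None and spdx_id in ALLOWED_SPDX:
        return "ALLOW"
    return "BLOCK"
```

Defaulting to BLOCK means a dependency with missing or unrecognised license metadata fails the gate until someone reviews it, which matches the "unknown" entry in the BLOCK list.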

Notes

Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.

Status legend:

[ ] not started
[~] in progress
[x] completed (track may still be in tracks/ or may have been moved to archive/)
~~**...**~~ struck-through (renamed/replaced/superseded)

Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.

Editing this file: When you mark a track as [x] and move its folder to archive/, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase + dependency.
