Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in ../archive/<track_name>/ for completed tracks).

Structure:

  • Active Tracks (Current Queue): In-flight and queued work, with each track's blockers listed.
  • Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).

Archive directories live at ../archive/<track_name>/ (from this file's location at conductor/tracks.md); the ./archive/... links in this file are relative to that location and resolve correctly.


Active Tracks (Current Queue)

Tracks in the current queue, including those still waiting on blockers. Ordered by dependency (blocking tracks before the tracks they block) and priority (A foundational → D forward-looking).

| # | Priority | Track | Status | Blocked By |
|---|---|---|---|---|
| 1 | A | Test Infrastructure Hardening (2026-06-09) | spec ✓, plan ✓, ready to start | (none: foundation track; SUPERSEDES tracks 19 (Fix Remaining Tests), 20 (Test Harness Hardening), 21, and 22) |
| 2 | A | Qwen, Llama & Grok Vendor Integration + Capability Matrix | spec ✓, plan pending | test_infrastructure_hardening_20260609 (was: none) |
| 3 | A | Data-Oriented Error Handling (Fleury Pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, qwen_llama_grok |
| 4 | A | Data Structure Strengthening (Type Aliases + NamedTuples) | spec ✓, plan pending | test_infrastructure_hardening_20260609 (was: none) |
| 5 | A | MCP Architecture Refactor (Sub-MCP Extraction) | spec ✓, plan pending | test_infrastructure_hardening_20260609, data_oriented_error_handling, data_structure_strengthening |
| 6 | D | Public API Result Migration | placeholder; not yet specced | data_oriented_error_handling (deprecated send()) |
| 7 | | UI Polish (Five Issues) | spec ✓, plan ✓, ready to start | (none; independent) |
| 8 | | Bootstrap gencpp Python Bindings | spec TBD | (none; independent) |
| 9 | | Tree-Sitter Lua MCP Tools | spec TBD | (none; independent) |
| 10 | | GDScript Language Support Tools | spec TBD | (none; independent) |
| 11 | | C# Language Support Tools | spec TBD | (none; independent) |
| 12 | | OpenAI Provider Integration | spec TBD | (none; independent) |
| 13 | | Zhipu AI (GLM) Provider Integration | spec TBD | (none; independent) |
| 14 | | AI Provider Caching Optimization | spec TBD | (none; independent) |
| 15 | | Manual UX Validation & Review | spec TBD | (none; independent) |
| 15a | | Manual UX Validation: ASCII-Sketch Workflow | spec ✓, plan ✓, ready to start | (none; independent; NEW 2026-06-08) |
| 15b | | Chunkification Optimization (Contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
| 16 | | GenCpp Dogfood Feedback Loop | spec TBD | (none; independent; oldest pending track) |
| 17 | | Code Path Audit | spec TBD | test_infrastructure_hardening_20260609 (was: none) |
| 18 | | GUI Architecture Refinement | (no spec.md) | (TBD) |
| 19 | | Context First Message Fix | spec TBD | (none; independent) |
| 19 | | Fix Remaining Tests | SUPERSEDED by track 1 | |
| 20 | | Test Harness Hardening | SUPERSEDED by track 1 | |
| 21 | | Test Patch Fixes | SUPERSEDED by track 1 | |
| 22 | | Test Batching Post-Refactor Polish | SUPERSEDED by track 1 (FR1 + FR2) | |
| 20 | | Prior Session Test Harden (20260605) | superseded; no action needed | |

Note on numbering: the legacy file labeled tracks 0a, 0b, 0c, ..., and used 0d, 0e, 0f, 0g for tracks created on or after 2026-06-06. That labeling reflects git-blame sort order, not a logical execution order; the new structure reorders tracks by dependency.


Phase 0: Infrastructure (Critical)

Initialized: 2026-02 (project foundation)

Completed


Phase 1: Pre-Track Foundation (2026-02 - 2026-03)

No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.

Completed

  • Various one-off refactors; full details in conductor/archive/ by track name prefix.

Phase 2: Strict Execution Queue

Completed 2026-03-06

Completed


Phase 3 - Phase 4: Foundational Tracks (March 2026)

Multiple sub-tracks under the initial feature-development push. All archived.

Archived

Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):

  1. Track: Session Context Snapshots & Visibility (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/

  2. Track: Discussion Takes & Timeline Branching (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/

  3. Track: RAG Support Link: ./archive/rag_support_20260308/

  4. Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/

  5. Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/

  6. Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/

  7. Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/

  8. Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/

  9. Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/

  10. Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/

  11. Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/

  12. Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/

  13. Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/

  14. Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/

  15. Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/

  16. Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/

  17. Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/

  18. Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/

  19. Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/

  20. Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/

  21. Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/

  22. Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/

  23. Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/

  24. Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/

  25. Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/

  26. Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/

  27. Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/

  28. Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/

  29. Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/


Phase 5: Codebase Curation

Initialized: 2026-05-07

Completed (all archived)

Analysis & Structural Review

  1. Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.

  2. Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.

  3. Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.

  4. Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.

  5. Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.

  6. Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.

  7. Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.

  8. Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.

  9. Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/

  10. Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.

  11. Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/

  12. Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.
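Track 5's "properties with thread-safe setters" can be pictured roughly as follows. This is a minimal sketch: StatusHolder is a hypothetical stand-in for AppController, and guarding both fields with a single threading.Lock is an assumption, not the archived implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StatusHolder:
    """Hypothetical stand-in for AppController's status fields."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ai_status = "idle"
        self._mma_status = "idle"

    @property
    def ai_status(self) -> str:
        with self._lock:
            return self._ai_status

    @ai_status.setter
    def ai_status(self, value: str) -> None:
        # Worker threads and the GUI thread serialize on the same lock.
        with self._lock:
            self._ai_status = value

    @property
    def mma_status(self) -> str:
        with self._lock:
            return self._mma_status

    @mma_status.setter
    def mma_status(self, value: str) -> None:
        with self._lock:
            self._mma_status = value


holder = StatusHolder()
with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(8):
        pool.submit(setattr, holder, "ai_status", f"step-{i}")
holder.mma_status = "running"
print(holder.ai_status, holder.mma_status)  # e.g. "step-7 running"; the last write to land wins
```

A single attribute store is already atomic in CPython, so the lock mainly pays off once a setter does more than one step (for example, also recording a timestamp or notifying listeners).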


Phase 6: Context Composition Redesign

Initialized: 2026-05-10

Completed (all archived)

Context Control & Workflow Enhancements

  1. Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.

  2. Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.

  3. Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.

  4. Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.

  5. Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.

  6. Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.

  7. Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.

  8. Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.

  9. Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.

  10. Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.

  11. Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..."; the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py to learn more about how to maintain and write code for the Python codebase, and update the product guidelines or the Python code style guidelines based on what is discovered. May also require changes to mcp_tools for better structural awareness of annotations or other conventions in these Python files.

  12. Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/

Active

  1. Track: GenCpp Dogfood Feedback Loop Link: ./tracks/gencpp_dogfood_feedback_20260510/ Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding. Status: oldest pending track (2026-05-10); track folder still in tracks/.

  2. [~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix the Preview button generating empty content, and the Inspect/Slices buttons failing to open their editor panels. Status: in progress; track folder still in tracks/ (not yet archived).

Hot Reload Feature (2026-05-16)

Single-track feature, not part of a numbered Phase.

Archived

  1. Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.

Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)

Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to archive/.

Archived


Late May 2026 - Early June 2026: One-Off Fixes and Polish

One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.

Archived


Phase 8: UI Polish (2026-06-03)

Initialized: 2026-06-03

User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.

Active

  1. Track: UI Polish (Five Issues) Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md Goal: Resolve five long-standing UI issues:
    • Phase 1: GFM markdown table rendering (pre-processor into src/markdown_table.py, wire into MarkdownRenderer.render).
    • Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
    • Phase 3: Fix the Refresh Registry button in Log Management, which currently instantiates LogRegistry without calling load_registry(), so the displayed table never reflects on-disk state (gui_2.py:1675).
    • Phase 4: Add a Vendor State tab to Operations Hub: at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
    • Phase 5: Files & Media > Files directory-grouped tree (reuse aggregate.group_files_by_dir, mirror the render_context_files_table collapsible-node style).
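Phase 1's pre-processor idea (split GFM pipe tables out of the markdown before rendering) might look like the sketch below. split_markdown and Table are hypothetical names, not the contents of src/markdown_table.py. The parser is deliberately simplified (no escaped pipes, no pipes inside inline code, and a table ends at the first line without a pipe) and uses plain string methods rather than regex.

```python
from __future__ import annotations

from typing import NamedTuple


class Table(NamedTuple):
    header: list[str]
    aligns: list[str]  # "left", "right", "center" or "default" per column
    rows: list[list[str]]


def _split_row(line: str) -> list[str]:
    # Drop one optional leading and trailing pipe, then split on the rest.
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def _is_delimiter_cell(cell: str) -> bool:
    # ---, :---, ---: or :---: (hyphens with optional edge colons)
    core = cell[1:] if cell.startswith(":") else cell
    core = core[:-1] if core.endswith(":") else core
    return core != "" and set(core) == {"-"}


def _align(cell: str) -> str:
    left, right = cell.startswith(":"), cell.endswith(":")
    if left and right:
        return "center"
    if left:
        return "left"
    if right:
        return "right"
    return "default"


def _starts_table(lines: list[str], i: int) -> bool:
    # A header row followed by a delimiter row with the same column count.
    if i + 1 >= len(lines) or "|" not in lines[i]:
        return False
    header, delim = _split_row(lines[i]), _split_row(lines[i + 1])
    return len(header) == len(delim) and all(_is_delimiter_cell(c) for c in delim)


def split_markdown(text: str) -> list[str | Table]:
    lines = text.splitlines()
    out: list[str | Table] = []
    buf: list[str] = []
    i = 0
    while i < len(lines):
        if not _starts_table(lines, i):
            buf.append(lines[i])
            i += 1
            continue
        if buf:
            out.append("\n".join(buf))
            buf = []
        header = _split_row(lines[i])
        aligns = [_align(c) for c in _split_row(lines[i + 1])]
        rows: list[list[str]] = []
        i += 2
        while i < len(lines) and "|" in lines[i]:
            cells = _split_row(lines[i])
            # Pad short rows and truncate long rows to the header width.
            rows.append((cells + [""] * len(header))[: len(header)])
            i += 1
        out.append(Table(header, aligns, rows))
    if buf:
        out.append("\n".join(buf))
    return out
```

Each Table segment could then be handed to an imgui table call while the plain str segments keep going through the existing markdown path; that wiring is not shown.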

Recently Archived (post-Phase 8)

  • Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add an opt-in pytest test (enabled with RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, and verifies that the Hook API responds. Catches "works on my machine" failures. Added a clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines; uses urllib.request from the stdlib per the tech-stack.md dependency-minimalism rule, a deviation from the plan). Skipped by default; marked with @pytest.mark.clean_install.

  • Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: First thing the clean install test caught. ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel _editor_lang_cache to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit b306f8f corrected test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).

  • Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL() / C_VAL() so theme switches take effect at use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): DIR_COLORS dict stored C_VAL not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.

  • [~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes C-level (0xc0000005) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with _ini_capture_ready flag (defer-not-catch pattern): first call returns b"" and sets the flag, subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).

  • Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Link: Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, profile never saved, load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encoding the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track — the test was more under-mocked than the spec assumed; fixing imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where 2-tuple expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.

  • Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (3 in v1, plus 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks (3 active):
    • Sub-track 1: live_gui_state_sync_20260605 (Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md). The real root cause was bad indentation in src/gui_2.py:607 (user fixed): in the App class, _capture_workspace_profile was being parsed as nested inside _apply_snapshot. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) passed immediately. App/Controller state sync is already handled correctly by getattr/setattr at lines 478-487.
    • Sub-track 2: prior_session_test_harden_20260605 (Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md). Test refactored to call the narrow render_prior_session_view (50+ mocks -> 20; runtime 5.79s -> 0.08s). Commit 26e0ced4.
    • Sub-track 3: wait_for_ready_test_pattern_20260605: SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not long enough) was wrong; the real cause was the indentation. Polling can be a follow-up hardening pass if tests become flaky in CI.
    • Sub-track 4: undo_redo_lifecycle_fix_20260605: RESOLVED by the Sub-track 1 indentation fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
    Net result: all 4 originally failing live_gui tests pass. User can run the full batched suite to confirm.
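The defer-not-catch idea from the regression-fixes entry, combined with the str-sentinel correction from the fragility-fixes entry, reduces to the sketch below. IniCapture and the injected native_save callable are hypothetical; in the app the deferred call is imgui.save_ini_settings_to_memory(), which is deliberately not invoked here so the example runs without a GUI.

```python
from typing import Callable


class IniCapture:
    """Defer a native call that crashes (uncatchably) if made too early."""

    def __init__(self, native_save: Callable[[], str]) -> None:
        self._native_save = native_save
        self._ini_capture_ready = False

    def capture(self) -> str:
        if not self._ini_capture_ready:
            # First call: do NOT touch the native layer; a try/except could not
            # stop a C-level access violation anyway. Arm the flag instead.
            self._ini_capture_ready = True
            return ""  # str sentinel, matching WorkspaceProfile.ini_content: str
        return self._native_save()


calls = []


def fake_native_save() -> str:
    calls.append(1)
    return "[Window][Main]\nPos=0,0\n"


cap = IniCapture(fake_native_save)
first = cap.capture()   # "" and the native function is not called
second = cap.capture()  # the native function runs from now on
```

The point of the pattern is that avoidance happens before the dangerous call rather than in an exception handler after it, and the sentinel keeps the same type as the real return value so downstream serializers (tomli_w here) never see bytes.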


Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)

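Initialized: 2026-06-06; the current major sprint. Seven tracks are listed below: two completed (Sloppy.py Startup Speedup, Test Batching Refactor), one active (Test Infrastructure Hardening), and four in plan state (Qwen/Llama/Grok Vendor Integration, Data-Oriented Error Handling, Data Structure Strengthening, MCP Architecture Refactor).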

Completed

Track: Sloppy.py Startup Speedup [COMPLETE 2026-06-07]

Link: ./tracks/startup_speedup_20260606/, Spec: ./tracks/startup_speedup_20260606/spec.md, Plan: ./tracks/startup_speedup_20260606/plan.md

[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5a-done: 78d3a1db] [phase-5b-done: 69d098ba] [phase-5c-done: 48c96499] [phase-5d-done: de6b85d2] [phase-5-done: 515a3029] [phase-6-partial-done: 85d18885] [sub-track-1-done: 253e1798] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693] [sub-track-3-done: 8fea8fe9] [sub-track-4-done: f3d071e0] [conftest-atexit-fix: 8957c9a5] [phase-9-shipped: 12cec6ae] [sub-track-2a-done: 01ddf9f1] [sub-track-2b-done: a41b31ed] [sub-track-2c-done: 372b0681] [sub-track-2d-done: 11a9c4f7] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385]

Goal: Reduce sloppy.py startup time while enforcing the Main Thread Purity Invariant. 9 phases, 57 tasks. 44 TDD tests added (all passing). 7 main thread purity tests enforce the invariant for 6 refactored files.
  • Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction / 1638ms saved). import src.gui_2 341ms (was 1770ms; 81% reduction / 1429ms saved). Total ~3067ms saved on the 2 big files. 62 audit violations remain (was 63 before the Sub-track 2 partial fix; was 67 baseline); the 6 refactored files contribute 0 new violations.
  • Sub-track 1 (Phase 6 full completion) at 253e1798: 15 ad-hoc threading.Thread() call sites migrated to self.submit_io(...); ZERO new threading.Thread() in src/; only 5 domain-specific exempt sites remain (HookServer HTTP/WS, asyncio loop, WorkerPool, CPU monitor).
  • Sub-track 3 (Hook API warmup endpoints) at 8fea8fe9: GET /api/warmup_status and GET /api/warmup_wait?timeout=N. 7 tests (5 unit + 2 live_gui), all passing.
  • Sub-track 4 (GUI status indicator) at f3d071e0: render_warmup_status_indicator() + _on_warmup_complete_callback() + App._post_init registration. 6 tests (5 unit + 1 live_gui), all passing.
  • Conftest atexit fix at 8957c9a5: registers a non-blocking pool shutdown via atexit. Fixes the run_tests_batched.py hang between batches (ThreadPoolExecutor.__del__ was blocking on shutdown(wait=True) for stuck warmup jobs).
  • Sub-track 2 (audit violations) PARTIAL at ae3b433e: 1 of 63 violations fixed (tomli_w in src/models.py). 62 remain (pydantic in models.py; tree_sitter in file_cache.py; websockets/cost_tracker/session_logger in api_hooks.py; 48 in app_controller.py + gui_2.py; 4 in sloppy.py). These are large refactors (especially gui_2.py and app_controller.py, with 24 violations each) that exceed the scope of a single sub-track; addressed as future work.
3 post-shipping bugfix commits: 8c4791d0 (real bug: _ensure_gemini_client UnboundLocalError + test_discussion_compression deepseek mock adaptation); 88fc42bb (spec convention: 7 sites in src/ai_client.py use _require_warmed('google.genai') + .types parent lookup instead of leaf); 52ea2693 (conftest: use AppController.wait_for_warmup(timeout=60.0) instead of direct import google.genai — user-corrected jank workaround). Pre-existing test failures (unrelated, user will address): test_api_generate_blocked_while_stale (ui_global_preset_name AttributeError); test_rag_large_codebase_verification_sim (RAG retrieval).
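The no-new-threads rule and the conftest atexit fix can be sketched with the standard library alone. Controller, the exact submit_io signature, and the shutdown hook are assumptions modeled on the description above (a 4-thread ThreadPoolExecutor with a controller-io name prefix), not the project's code. Note that the stdlib names pool threads prefix_N (for example controller-io_0), so matching controller-io-N exactly would need a different naming scheme.

```python
import atexit
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class Controller:
    """Hypothetical stand-in for AppController's I/O pool plumbing."""

    def __init__(self) -> None:
        self._io_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="controller-io")
        atexit.register(self._shutdown_io_pool)

    def submit_io(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
        # Single entry point for background work instead of ad-hoc threading.Thread().
        return self._io_pool.submit(fn, *args, **kwargs)

    def _shutdown_io_pool(self) -> None:
        # Return immediately and cancel jobs that have not started yet
        # (cancel_futures requires Python 3.9+). Jobs already running are not
        # interrupted; Python's own exit handling may still wait for them.
        self._io_pool.shutdown(wait=False, cancel_futures=True)


ctrl = Controller()
fut = ctrl.submit_io(lambda: threading.current_thread().name)
print(fut.result(timeout=5))  # e.g. controller-io_0
```

Routing everything through one bounded pool makes the thread count auditable (a static check only has to look for stray threading.Thread calls) and gives shutdown a single place to act.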

Track: Test Batching Refactor [COMPLETE 2026-06-08] [archived]

Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/, Spec: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/spec.md, Plan: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/plan.md

[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-2-skipped: no-CI] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]

Goal: Replace alphabetical 4-at-a-time batching in scripts/run_tests_batched.py with fixture-class-isolated tiers:
  • 0: opt-in (clean_install/docker), gated on an env var + the --include-opt-in flag
  • 1: unit, grouped by subsystem batch_group, run with pytest-xdist
  • 2: mock_app, grouped
  • 3: live_gui, all in one pytest invocation to amortize the 15s startup
  • H: headless
  • P: performance, run last
Hybrid classification: auto-infer from filename + AST fixture scan, with hand-curated tests/test_categories.toml overrides for cross-cutting and ambiguous files. Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on a conftest-loaded pytest plugin (a no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library + dry-run, shadow run, switch default, cleanup.

Adaptations: (a) library modules moved from scripts/ to tests/ per user directive; (b) auto-inference uses an AST scan (not regex) per the user "FUCK REGEX" policy + prereq spec; (c) Phase 2 (CI shadow run) skipped: there is no CI infrastructure in the repo, so a manual plan-vs-actual spot-check was the equivalent verification.

Original Startup Speedup goal (this text belongs to the Sloppy.py Startup Speedup track above, not to Test Batching Refactor): Reduce sloppy.py startup time by ~2000-2400ms. Main Thread Purity Invariant: the main thread (entering immapp.run()) never imports a module heavier than imgui_bundle + the lean gui_2 skeleton. No-prefetch rule: heavy SDKs (google.genai 955ms, anthropic 430ms, openai 445ms, fastapi 470ms) are lazy-only, paid once on first use on the asyncio thread rather than in the background. No-new-threads rule: all background work goes through AppController._io_pool (4-thread ThreadPoolExecutor, named controller-io-N); zero new threading.Thread(...) calls in src/. Enforcement: static scripts/audit_main_thread_imports.py CI gate + runtime tests/test_main_thread_purity.py (sys.addaudithook test). 9 phases, 57 tasks. Target: import src.ai_client < 50ms (from ~1800ms), import src.gui_2 < 500ms (from ~3000ms), live_gui.wait_for_server(timeout=15) no longer times out.

Active

Track: Test Infrastructure Hardening (2026-06-09) [track-created: 566cf08c]

Link: ./tracks/test_infrastructure_hardening_20260609/, Spec: ./tracks/test_infrastructure_hardening_20260609/spec.md, Plan: ./tracks/test_infrastructure_hardening_20260609/plan.md, Metadata: ./tracks/test_infrastructure_hardening_20260609/metadata.json, State: ./tracks/test_infrastructure_hardening_20260609/state.toml

Goal: Kill the test regression nightmare that has consumed 4+ days of Tier 2 work. Fix 3 root causes of test regression churn: (1) subprocess state pollution via autouse _check_live_gui_health respawn (FR1), (2) filesystem path hygiene via tmp_path_factory + live_gui_workspace fixture (FR2), (3) _sync_rag_engine io_pool race via token + dirty flag coalescing (FR3). Plus 2 related fixes: set_value hook routing for ai_input (FR4), and an opt-in clean_baseline marker (FR5). 8 phases, ~60 surgical tasks, 6.5 days. Produces docs/reports/test_bed_health_20260609.md as the green baseline for the 4 upcoming tracks. Inherits from test_infra_hardening_foundation_20260608 + batch_resilience_plan_20260608 + rag_test_batch_failure_status_20260609_pm3 + rag_work_final_20260609_pm. Supersedes the placeholder tracks fix_remaining_tests_20260513, test_harness_hardening_20260310, test_patch_fixes_20260513, and test_batching_post_refactor_polish_20260607 (whose work is now scoped in FR1+FR2+FR3). Blocks the 4 upcoming tracks (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor) and code_path_audit_20260607. Tier 2 supervision required for Phases 1, 3, 4 (audit review, conftest refactor, io_pool race fix).
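FR3's "token + dirty flag coalescing" is not spelled out here, so the following is only one plausible shape, with hypothetical names (CoalescingSync, request, work, apply). Many rapid requests collapse into at most one queued rerun, a single drain loop runs on the pool at a time, and a result is applied only if its token is still the newest.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class CoalescingSync:
    def __init__(self, pool: ThreadPoolExecutor, work: Callable[[], Any],
                 apply: Callable[[Any], None]) -> None:
        self._pool = pool
        self._work = work
        self._apply = apply
        self._lock = threading.Lock()
        self._token = 0        # bumped on every request
        self._dirty = False    # a request arrived that no run has picked up yet
        self._running = False  # a drain loop is scheduled or executing

    def request(self) -> None:
        with self._lock:
            self._token += 1
            self._dirty = True
            if self._running:
                return  # the active drain loop will see the dirty flag
            self._running = True
        self._pool.submit(self._drain)

    def _drain(self) -> None:
        try:
            while True:
                with self._lock:
                    if not self._dirty:
                        self._running = False
                        return
                    self._dirty = False
                    token = self._token
                result = self._work()
                with self._lock:
                    stale = token != self._token
                if not stale:
                    self._apply(result)  # a newer request will rerun the work anyway
        except BaseException:
            with self._lock:
                self._running = False
            raise
```

The lock only guards the bookkeeping, never the work itself, so a slow sync cannot block the callers that are requesting new syncs.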

In Plan (or Pending Spec)

Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix [track-created: 7c1d597e]

Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md (to be authored by writing-plans skill)

Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. GUI reads the matrix to enable/disable 9 UI elements (including the screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, and cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Now blocked by test_infrastructure_hardening_20260609 (was: none).
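A per-(vendor, model) capability table with a wildcard fallback is one simple way to let the GUI gate widgets on data instead of per-vendor branches. The sketch below is hypothetical: the vendor/model keys are placeholders (no real vendor's capabilities are asserted), the (vendor, "*") fallback is an assumption, and the widget names only mirror the seven UI elements listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0  # tokens; 0 means unknown
    cost_tracking: bool = False


# Placeholder entries only; real values would come from each vendor's documentation.
CAPABILITY_MATRIX: dict[tuple[str, str], Capabilities] = {
    ("vendor_a", "*"): Capabilities(streaming=True, tool_calling=True, context_window=32_000),
    ("vendor_a", "vision-model"): Capabilities(vision=True, streaming=True, context_window=32_000),
}

NO_CAPABILITIES = Capabilities()


def capabilities_for(vendor: str, model: str) -> Capabilities:
    # Exact (vendor, model) entry first, then the vendor wildcard, then "nothing supported".
    exact = CAPABILITY_MATRIX.get((vendor, model))
    if exact is not None:
        return exact
    return CAPABILITY_MATRIX.get((vendor, "*"), NO_CAPABILITIES)


def enabled_widgets(caps: Capabilities) -> dict[str, bool]:
    return {
        "screenshot_button": caps.vision,
        "tools_toggle": caps.tool_calling,
        "cache_panel": caps.caching,
        "stream_progress": caps.streaming,
        "fetch_models": caps.model_discovery,
        "token_budget": caps.context_window > 0,
        "cost_panel": caps.cost_tracking,
    }
```

The GUI then asks one question per widget (enabled_widgets(...)[name]) and never branches on vendor names, so adding a vendor is a table edit rather than a UI change.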

Track: Data-Oriented Error Handling (Fleury Pattern) [track-created: 494f68f9]

Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md

Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>() → _send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive. Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet), which removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.
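The "errors are just cases" shape (a Result that always carries usable data plus a side-channel list of ErrorInfo records) can be sketched as below. The ErrorKind members, field names, and the read_text helper are illustrative assumptions; the spec's actual src/result_types.py, including its NilPath/NilRAGState sentinels, is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    NOT_FOUND = auto()
    PERMISSION_DENIED = auto()
    IO = auto()


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    data: T  # always usable, even on failure
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def read_text(path: str) -> Result[str]:
    # Failure is a case, not an exception: callers get "" plus a described error.
    try:
        with open(path, encoding="utf-8") as f:
            return Result(f.read())
    except FileNotFoundError as e:
        return Result("", [ErrorInfo(ErrorKind.NOT_FOUND, str(e))])
    except PermissionError as e:
        return Result("", [ErrorInfo(ErrorKind.PERMISSION_DENIED, str(e))])
    except OSError as e:
        return Result("", [ErrorInfo(ErrorKind.IO, str(e))])
```

Callers that only need the text can use result.data directly and get a harmless empty string on failure; callers that care inspect result.errors, and nothing propagates as an exception past the boundary.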

Track: Data Structure Strengthening (Type Aliases + NamedTuples) [track-created: ed42a97a]

Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)

Goal: Improve AI-readability by naming 430 currently anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Known gaps: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. Estimated 2-3 days of work, low risk. Now blocked by test_infrastructure_hardening_20260609 (was: none).

Track: MCP Architecture Refactor (Sub-MCP Extraction) [track-created: 2720a894]

Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)

Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by test_infrastructure_hardening_20260609, data_oriented_error_handling_20260606 (for Result/ErrorInfo), and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.
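A minimal sketch of the inverted-dict dispatch, assuming each sub-MCP exposes a name and a tool-name → function mapping. The SubMCP dataclass, build_dispatch(), invoke(), and the read_file/fetch_url tools are hypothetical stand-ins; the planned invoke() returns Result[str, ErrorInfo], simplified here to plain strings so the block runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

ToolFn = Callable[[dict[str, Any]], str]


@dataclass
class SubMCP:
    # Hypothetical minimal shape: a name plus the tools it owns.
    name: str
    tools: dict[str, ToolFn] = field(default_factory=dict)


def build_dispatch(sub_mcps: list[SubMCP]) -> dict[str, SubMCP]:
    """Invert {sub-MCP: tools} into {tool name: owning sub-MCP}, rejecting duplicate names."""
    dispatch: dict[str, SubMCP] = {}
    for sub in sub_mcps:
        for tool_name in sub.tools:
            owner = dispatch.get(tool_name)
            if owner is not None:
                raise ValueError(f"tool {tool_name!r} registered by {owner.name!r} and {sub.name!r}")
            dispatch[tool_name] = sub
    return dispatch


def invoke(dispatch: dict[str, SubMCP], tool_name: str, args: dict[str, Any]) -> str:
    # One dict lookup replaces a per-tool if/elif chain.
    sub = dispatch.get(tool_name)
    if sub is None:
        return f"ERROR: unknown tool {tool_name!r}"
    return sub.tools[tool_name](args)


# Hypothetical registrations standing in for two native sub-MCP modules.
ALL_SUB_MCPS = [
    SubMCP("file_io", {"read_file": lambda args: f"read {args['path']}"}),
    SubMCP("web", {"fetch_url": lambda args: f"fetched {args['url']}"}),
]
DISPATCH = build_dispatch(ALL_SUB_MCPS)
```

Building the dict once at startup also gives a single place to reject two sub-MCPs that claim the same tool name.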

Track: RAG Phase 4 Stress Test Fix [x] — fixed 16412ad5

Status: 2026-06-06. Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupt the collection, then search() fails with Collection expecting embedding with dimension of 3072, got 384 and the AI request never reaches 'done' status, timing out the 25s poll loop (50 polls × 0.5s). Fix: RAGEngine._init_vector_store now calls _validate_collection_dim, which inspects the first existing vector's dim, compares it to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.
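A library-free sketch of the mismatch check, assuming the first stored vector (or None for an empty collection) and the current provider's output dimension are already known, and that dropping and recreating the collection is passed in as a callback. The function names are hypothetical and only mirror the role of RAGEngine._validate_collection_dim; the real method talks to ChromaDB directly.

```python
from __future__ import annotations

import sys
from typing import Callable, Optional, Sequence


def collection_dim_mismatch(first_vector: Optional[Sequence[float]], provider_dim: int) -> bool:
    # An empty collection has no stored vectors yet, so it can accept any dimension.
    if first_vector is None:
        return False
    return len(first_vector) != provider_dim


def validate_collection_dim(
    first_vector: Optional[Sequence[float]],
    provider_dim: int,
    recreate: Callable[[], None],
) -> bool:
    """Recreate the collection on a dimension mismatch; return True if it was recreated."""
    if not collection_dim_mismatch(first_vector, provider_dim):
        return False
    print(
        f"warning: stored embeddings are {len(first_vector)}-dim but the provider emits "
        f"{provider_dim}-dim; recreating collection",
        file=sys.stderr,
    )
    recreate()
    return True
```

With the 3072-dim Gemini collection and the 384-dim local provider from this incident, the check fires before any upsert instead of letting search() fail later.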

Track: Prior Session Test Harden (20260605) [superseded by live_gui_test_hardening_v2_20260605]

Status: 2026-06-05. Surfaced during live_gui_fragility_fixes_20260605 execution. test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.

Backlog (Provider + Language + Investigation)

Track: Bootstrap gencpp Python Bindings

Link: ./tracks/gencpp_python_bindings_20260308/

Track: Tree-Sitter Lua MCP Tools

Link: ./tracks/tree_sitter_lua_mcp_tools_20260310/

Track: GDScript Language Support Tools

Link: ./tracks/gdscript_godot_script_language_support_tools_20260310/

Track: C# Language Support Tools

Link: ./tracks/csharp_language_support_tools_20260310/

Track: OpenAI Provider Integration

Link: ./tracks/openai_integration_20260308/

Track: Zhipu AI (GLM) Provider Integration

Link: ./tracks/zhipu_integration_20260308/

Track: AI Provider Caching Optimization

Link: ./tracks/caching_optimization_20260308/

Track: Manual UX Validation & Review

Link: ./tracks/manual_ux_validation_20260302/

Track: Manual UX Validation — ASCII-Sketch Workflow (NEW 2026-06-08)

Link: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/, Spec: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md, Plan: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md

Goal: Promote the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at src/gui_2.py:3770 render_discussion_entry. The 23-op matrix A1-A7 in docs/guide_discussions.md is the source of truth; the SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines) informs the internal refactoring decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing.

Status: Active; Phase 1 (5 open questions to the user) is the current phase.

Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)

Link: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/, Spec: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md

Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add markdown-it-py OR switch to pickle/msgspec, NOT C11. The shape when activated: a subprocess-launched C11 binary with a request/response blob wire format (NOT a stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track.

Status: Deferred. Promotes to an active track when (if) the first hard constraint surfaces.

Track: Context First Message Fix

Link: ./tracks/context_first_message_fix_20260604/

Track: Fix Remaining Tests

Link: ./tracks/fix_remaining_tests_20260513/

Track: Test Harness Hardening

Link: ./tracks/test_harness_hardening_20260310/

Track: Test Patch Fixes

Link: ./tracks/test_patch_fixes_20260513/

Track: Test Batching Post-Refactor Polish

Link: ./tracks/test_batching_post_refactor_polish_20260607/

Track: Code Path Audit

Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill)

Goal: Build src/code_path_audit.py, a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user. Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); code from before those 4 tracks is too stale to ground optimization decisions.

Track: GUI Architecture Refinement

Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)

Follow-up (Planned, Not Yet Specced)

Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)

Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet. Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects src/app_controller.py:290 and :3559, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, src/conductor_tech_lead.py:68 (5 production call sites across 4 modules in src/), and 50+ test files. The caller enumeration and baseline counts are recorded in the parent track's spec §12.1.
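A sketch of the migration shape, assuming the deprecated send() survives temporarily as a thin shim over send_result() that restores the old raise-on-error contract. The Result class here is reduced to a str payload with string errors, RuntimeError stands in for the removed ProviderError, and send_result() is a fake echo; none of this is the actual src/ai_client.py code.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass, field


@dataclass
class Result:
    # Reduced stand-in: a str payload plus string error messages.
    data: str
    errors: list[str] = field(default_factory=list)


def send_result(prompt: str) -> Result:
    # Hypothetical fake provider call: echoes the prompt, or reports an error case.
    if not prompt:
        return Result("", ["empty prompt"])
    return Result(f"echo: {prompt}")


def send(prompt: str) -> str:
    """Deprecated shim: delegates to send_result() and raises on errors like the old API."""
    warnings.warn("send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2)
    result = send_result(prompt)
    if result.errors:
        raise RuntimeError("; ".join(result.errors))  # stand-in for the removed ProviderError
    return result.data


def migrated_caller(prompt: str) -> str:
    # After migration: branch on the errors list instead of catching an exception.
    result = send_result(prompt)
    if result.errors:
        return f"[error] {result.errors[0]}"
    return result.data
```

Each listed call site would move from the send() try/except pattern to the migrated_caller() pattern, after which the shim can be deleted.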


Phase 9: Chore Tracks

Initialized: 2026-06-07

Completed (recently archived or in tracks/)

  • Track: Unused Scripts Cleanup [checkpoint: 46ce3cd] Link: ./tracks/unused_scripts_cleanup_20260607/, Spec: ./tracks/unused_scripts_cleanup_20260607/spec.md, Plan: ./tracks/unused_scripts_cleanup_20260607/plan.md Goal: Remove 30 confirmed-unused one-off scripts from scripts/ (56 → 26 files, 54% reduction). 5 atomic per-category commits; no new CI gate; follow-up unused_scripts_audit_20260607 recorded. All non-GUI test batches still pass; 2 audit scripts (main_thread_imports, weak_types) report no new violations.

  • Track: License & CVE Audit (Dependency Compliance) [checkpoint: a7ab994f] Link: ./tracks/license_cve_audit_20260607/, Spec: ./tracks/license_cve_audit_20260607/spec.md, Plan: ./tracks/license_cve_audit_20260607/plan.md Goal: Build scripts/audit_license_cve.py — single audit script that checks third-party deps (pyproject.toml + uv.lock transitive) for license compliance + known CVEs + version-pinning + SPDX source-headers. Tilde-pin all deps, delete requirements.txt, regenerate uv.lock (gitignored per project policy), add --strict mode + baseline file (CI gate). Policy: ALLOW (permissive + weak copyleft + public domain), BLOCK (GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, unknown). Track is scope-limited to third-party deps; the project's own LICENSE and SPDX headers are explicitly OUT of scope (the user reserves all rights to the repo). 28 unit + integration tests passing; --strict mode wired as CI gate; baseline file committed at scripts/audit_license_cve.baseline.json. 4 atomic commits: audit script + initial report, tilde-pin + lock regen + delete requirements.txt, --strict + baseline, tracks.md update.
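A sketch of the ALLOW/BLOCK policy as a default-deny classifier over SPDX identifiers, assuming each dependency's license has already been resolved to an SPDX id (or None when unknown). The allow list, the prefix list, and classify_license() are hypothetical and not taken from scripts/audit_license_cve.py. Two details worth noting: the SPDX id BSL-1.0 is the permissive Boost Software License, while the Business Source License is BUSL-1.1; and Commons Clause has no standard SPDX identifier, so it is matched as text.

```python
from __future__ import annotations

# Hypothetical allow list of SPDX ids (permissive, weak copyleft, public domain).
ALLOW = frozenset({
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC", "PSF-2.0",
    "BSL-1.0",  # Boost Software License (permissive), not the Business Source License
    "MPL-2.0", "LGPL-2.1-only", "LGPL-3.0-only",  # weak copyleft
    "Unlicense", "CC0-1.0",  # public domain
})
# LGPL ids start with "LGPL-", not "GPL-", so weak copyleft is not caught by this prefix.
BLOCK_PREFIXES = ("GPL-", "AGPL-", "SSPL-", "BUSL-", "Elastic-")


def classify_license(spdx_id: str | None) -> tuple[str, str]:
    """Return (verdict, reason); anything not positively allowed is blocked."""
    if not spdx_id:
        return ("BLOCK", "unknown license")
    if "commons clause" in spdx_id.lower().replace("-", " "):
        return ("BLOCK", f"{spdx_id}: Commons Clause")
    if spdx_id.startswith(BLOCK_PREFIXES):
        return ("BLOCK", f"{spdx_id}: on the block list")
    if spdx_id in ALLOW:
        return ("ALLOW", spdx_id)
    # Unrecognized ids fall through to BLOCK, matching the policy's "unknown" row.
    return ("BLOCK", f"{spdx_id}: not on the allow list (treated as unknown)")
```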


Notes

Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.

Status legend:

  • [ ] not started
  • [~] in progress
  • [x] completed (track may still be in tracks/ or may have been moved to archive/)
  • ~~**...**~~ struck-through (renamed/replaced/superseded)

Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.

Editing this file: When you mark a track as [x] and move its folder to archive/, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase + dependency.