Phase 6 t6.1 + t6.2 (no archive per user directive):
- docs/guide_ai_client.md: update Overview to mention 8 providers (was 5); add 'Shared OpenAI-Compatible Helper' section explaining src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest, send_openai_compatible, usage pattern); document the Qwen adapter and Llama multi-backend.
- docs/guide_models.md: update PROVIDERS list to 8 entries (was 5).
- conductor/tracks.md: update the Qwen track entry to reflect '50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up'; add detailed status note pointing to the follow-up track + audit report.
- docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report explaining why a follow-up is needed (7 categories of gaps; the Tech Lead's 'footnote for now' failure mode; the lessons learned).
- conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up track setup (spec.md, state.toml, metadata.json, TODO.md). 5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9, local-first + matrix v2, Anthropic/Gemini/DeepSeek migration.
Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed) are NOT applied per user directive: 'we can then doc this we're not archiving yet, if we have a follow up track I need this one to stay up because there is still alot todo'.
Project Tracks
This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder under conductor/tracks/ (or in conductor/archive/<track_name>/ for completed tracks).
Structure:
- Active Tracks (Current Queue): In-flight and unblocked work the implementer can pick up today.
- Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has up to three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).
Archive directories live at conductor/archive/<track_name>/. The ./archive/... links in this file are relative to this file's location (conductor/tracks.md), so they resolve to that directory.
Active Tracks (Current Queue)
Tracks that are unblocked and ready to start. Ordered by dependency (blocked-by first) and priority (A foundational → D forward-looking).
| # | Priority | Track | Status | Blocked By |
|---|---|---|---|---|
| 2 | A | Qwen, Llama & Grok Vendor Integration + Capability Matrix | spec ✓, plan ✓, 50/79 tasks done; Phase 6 in progress (docs); NOT archiving — has follow-up track | test_infrastructure_hardening_20260609 (merged) |
| 3 | A | Data-Oriented Error Handling (Fleury Pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609 (merged), qwen_llama_grok |
| 4 | A | Data Structure Strengthening (Type Aliases + NamedTuples) | spec ✓, plan pending | test_infrastructure_hardening_20260609 (merged) |
| 5 | A | MCP Architecture Refactor (Sub-MCP Extraction) | spec ✓, plan pending | test_infrastructure_hardening_20260609 (merged), data_oriented_error_handling, data_structure_strengthening |
| 6 | D | Public API Result Migration | placeholder; not yet specced | data_oriented_error_handling (deprecated send()) |
| 7 | — | UI Polish (Five Issues) | spec ✓, plan ✓, ready to start | (none — independent) |
| 8 | — | Bootstrap gencpp Python Bindings | spec TBD | (none — independent) |
| 9 | — | Tree-Sitter Lua MCP Tools | spec TBD | (none — independent) |
| 10 | — | GDScript Language Support Tools | spec TBD | (none — independent) |
| 11 | — | C# Language Support Tools | spec TBD | (none — independent) |
| 12 | — | OpenAI Provider Integration | spec TBD | (none — independent) |
| 13 | — | Zhipu AI (GLM) Provider Integration | spec TBD | (none — independent) |
| 14 | — | AI Provider Caching Optimization | spec TBD | (none — independent) |
| 15 | — | Manual UX Validation & Review | spec TBD | (none — independent) |
| 15a | — | Manual UX Validation — ASCII-Sketch Workflow | spec ✓, plan ✓, ready to start | (none — independent; NEW 2026-06-08) |
| 15b | — | Chunkification Optimization (Contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
| 16 | — | GenCpp Dogfood Feedback Loop | spec TBD | (none — independent; oldest pending track) |
| 17 | — | Code Path Audit | spec TBD | test_infrastructure_hardening_20260609 (merged) |
| 18 | — | GUI Architecture Refinement | (no spec.md) | (TBD) |
| 19 | — | Context First Message Fix | spec TBD | (none — independent) |
| 20 | — | Prior Session Test Harden (20260605) | superseded; no action needed | — |
Note on numbering: the legacy file used 0a, 0b, 0c... and 0d, 0e, 0f, 0g for tracks created 2026-06-06+. This is the git-blame sort order, not a logical execution order. The new structure re-orders by dependency.
Phase 0: Infrastructure (Critical)
Initialized: 2026-02 (project foundation)
Completed
- Track: Conductor Path Configuration
Note: One-line entry; full details in ./tracks/conductor_path_configurable_20260306/ (still in tracks/; not yet archived).
Phase 1: Pre-Track Foundation (2026-02 - 2026-03)
No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.
Completed
- Various one-off refactors; full details in conductor/archive/ by track name prefix.
Phase 2: Strict Execution Queue
Completed 2026-03-06
Completed
- Track: Strict Execution Queue (Phase 2) See: ./archive/strict_execution_queue_completed_20260306/
Phase 3 - Phase 4: Foundational Tracks (March 2026)
Multiple sub-tracks under the initial feature-development push. All archived.
Archived
Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):
- Track: Session Context Snapshots & Visibility (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/
- Track: Discussion Takes & Timeline Branching (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/
- Track: RAG Support Link: ./archive/rag_support_20260308/
- Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/
- Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/
- Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/
- Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/
- Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/
- Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/
- Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/
- Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/
- Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/
- Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/
- Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/
- Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/
- Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/
- Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/
- Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/
- Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/
- Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/
- Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/
- Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/
- Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/
- Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/
- Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/
- Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/
- Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/
- Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/
- Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/
Phase 5: Codebase Curation
Initialized: 2026-05-07
Completed (all archived)
Analysis & Structural Review
- Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.
- Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.
- Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.
- Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.
- Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.
- Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.
- Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.
- Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.
- Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/
- Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.
- Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/
- Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.
Phase 6: Context Composition Redesign
Initialized: 2026-05-10
Completed (all archived except the in-progress [~] entry)
Context Control & Workflow Enhancements
- Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.
- Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.
- Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.
- Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.
- Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.
- Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.
- Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.
- Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.
- Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.
- Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.
- Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..."; the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update the product guidelines or the Python code_style guidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions in these Python files.
- Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/
- [~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix the Preview button generating empty content, and the Inspect/Slices buttons failing to open their respective editor panels. Status: in progress; track folder still in tracks/ (not yet archived).
Active
- Track: GenCpp Dogfood Feedback Loop
Link: ./tracks/gencpp_dogfood_feedback_20260510/
Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding.
Status: oldest pending track (2026-05-10). Track folder still in tracks/.
Hot Reload Feature (2026-05-16)
Single-track feature, not part of a numbered Phase.
Archived
- Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.
Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)
Two tracks under the same "Phase 7" umbrella. Both completed and moved to archive/.
Archived
- Track: Phase 7 Stabilization and Polishing (Regressions Fix) Link: ./archive/phase7_stabilization_and_polishing_20260601/
- Track: Phase 7 Monolithic Stabilization (Final Cleanup) Link: ./archive/phase7_monolithic_stabilization_20260602/
Late May 2026 - Early June 2026: One-Off Fixes and Polish
One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.
Archived
- Track: Robust Live Simulation Verification
- Track: Fix GUI Crashes in Tool Preset Manager and Discussion Hub Link: ./archive/gui_crash_fixes_20260531/
- Track: Fix keys_down AttributeError in ImGui IO Link: ./archive/fix_imgui_keys_down_20260601/
- Track: Selectable Thinking Monologs Link: ./archive/selectable_thinking_monologs_20260601/
- Track: Fix MiniMax history sequencing and truncation Link: ./archive/minimax_history_fix_20260601/
- Track: Preserve context selection on discussion switch and add empty context warning Link: ./archive/context_preservation_and_warnings_20260601/
- Track: Fix Text Viewer docking conflicts and Tool Call row click interactivity Link: ./archive/text_viewer_and_tool_call_fixes_20260601/
- Track: UX Refinements for Context Composition and Discussion Entries Link: ./archive/context_composition_ux_20260601/
- Track: Combine AST Inspector and Slices Editor into a unified Structural File Editor Link: ./archive/structural_file_editor_20260601/
- Track: Add per-response token metrics and AI-assisted history compression Link: ./archive/discussion_metrics_and_compression_20260601/
- Track: Fix Approve Modal sizing and inline full preview Link: ./archive/approve_modal_ux_20260601/
- Track: Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette. Link: ./archive/command_palette_and_performance_20260602/ Goal: Async context preview offload (background thread, state lock) + Command Palette (32 commands, fuzzy search, Ctrl+Shift+P, Up/Down/Enter nav, 13 unit + 7 live_gui tests). Phases 1-3 complete.
- Track: Comprehensive Documentation Refresh Link: ./archive/documentation_refresh_comprehensive_20260602/ Goal: Refresh stale documentation across docs/. Completed: ASCII file tree updates (docs/Readme.md + Readme.md: 5→14 guides, 22→53 src modules), docs/guide_testing.md (new, comprehensive 251-file test suite reference), 7 per-source-file guides (guide_gui_2.md, guide_ai_client.md, guide_api_hooks.md, guide_mcp_client.md, guide_app_controller.md, guide_multi_agent_conductor.md, guide_models.md). All 14 guides cross-linked. Gap analysis: ./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md. Sub-tracks (all checkpointed):
  - Sub-Track 1: Docs Layer Refresh [checkpoint: 20225c8]: 18 per-file atomic commits. 15 guides (8 refreshed + 7 new), Subsystem Index (24 entries), 106 cross-links all resolve, symbol parity fixed (apply_nerv_theme -> apply_nerv).
  - Sub-Track 2: Conductor Docs Refresh [checkpoint: ef4efab2]: 4 per-file atomic commits: product.md (14 guides, MiniMax, Command Palette), tech-stack.md (MiniMax, Gemini Embedding 001), workflow.md (2026-06-02 doc refresh, 45-tool count), index.md (active track links).
  - Sub-Track 3: Agent Config Refresh [checkpoint: 87f668a6]: 3 per-file atomic commits: AGENTS.md (5.4K -> 0.7K thin pointer), CLAUDE.md (6.7K -> 0.2K deprecation stub), GEMINI.md (5 providers, sloppy.py entry, 12 key modules). Drift check: 0 issues in 9 mirrored skill files.
- Track: Test Consolidation & TOML Sandboxing [checkpoint: cb91006c] Spec: ./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-test-consolidation.md Goal: Audit tests for real-TOML usage and migrate offenders to sandboxed patterns. Added the scripts/check_test_toml_paths.py audit script (CI gate). Migrated test_mcp_client_whitelist_enforcement to tmp_path (it was the only offender). Skipped the redundant enforce_no_real_toml fixture: the existing isolate_workspace autouse fixture plus the audit script provide equivalent coverage.
Phase 8: UI Polish (2026-06-03)
Initialized: 2026-06-03
User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.
Active
- Track: UI Polish (Five Issues)
Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md
Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md
Goal: Resolve five long-standing UI issues:
  - Phase 1: GFM markdown table rendering (pre-processor in src/markdown_table.py, wired into MarkdownRenderer.render).
  - Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
  - Phase 3: Fix the Refresh Registry button in Log Management: it currently instantiates LogRegistry without calling load_registry(), so the displayed table never reflects on-disk state (gui_2.py:1675).
  - Phase 4: Add a Vendor State tab to Operations Hub: at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
  - Phase 5: Files & Media > Files directory-grouped tree (re-use aggregate.group_files_by_dir, mirror the render_context_files_table collapsible-node style).
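Phase 5 of UI Polish reuses aggregate.group_files_by_dir, whose actual signature is not shown in this file. A minimal sketch of the grouping idea follows, assuming plain string paths as input; group_paths_by_dir and render_tree are hypothetical names, and the text lines stand in for ImGui collapsible tree nodes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_paths_by_dir(paths: list[str]) -> dict[str, list[str]]:
 """Map each parent directory to the sorted file names it contains."""
 groups: dict[str, list[str]] = defaultdict(list)
 for raw in paths:
  p = PurePosixPath(raw)
  groups[str(p.parent)].append(p.name)
 # Sort directories and file names so the tree renders in a stable order.
 return {d: sorted(names) for d, names in sorted(groups.items())}


def render_tree(groups: dict[str, list[str]]) -> list[str]:
 """Flatten the groups into text lines: one header node per directory."""
 lines: list[str] = []
 for directory, names in groups.items():
  lines.append(f"[-] {directory}/")
  lines.extend(f"  {name}" for name in names)
 return lines
```

Sorting at both levels keeps the node order stable from frame to frame, so a directory node does not jump around as files are added.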
Recently Archived (post-Phase 8)
- Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add an opt-in pytest test (RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, and verifies the Hook API responds. Catches "works on my machine" failures. Added the clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines; uses urllib.request from the stdlib per the tech-stack.md dependency-minimalism rule, a deviation from the plan). Skipped by default. Marked with @pytest.mark.clean_install.
- Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: The first breakage caught by the clean install test. The ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced it with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added a parallel _editor_lang_cache to track the current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: the test passes in opt-in mode (1.92.801), the shim still works in the local 1.92.5 env, and follow-up commit b306f8f corrected the test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).
- Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL()/C_VAL() so theme switches take effect at the use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): the DIR_COLORS dict stored C_VAL, not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.
- [~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of the remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes at the C level (0xc0000005) when called in the first few render frames, because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready yet. The crash is uncatchable from Python. Fixed with an _ini_capture_ready flag (defer-not-catch pattern): the first call returns b"" and sets the flag; subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). A deferred follow-up track is needed for the ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).
- Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272, plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy the WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, so the profile was never saved and load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encodes the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track: the test was more under-mocked than the spec assumed, and fixing the imscope.window tuple return only revealed the next un-mocked dependency (imgui.begin returning bool where a 2-tuple was expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) was deferred to track end and not done due to scope focus.
- Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:
  - Sub-track 1: live_gui_state_sync_20260605 (Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md). The REAL root cause was bad indentation at src/gui_2.py:607 (user fixed): the App class's _capture_workspace_profile was being parsed as nested inside _apply_snapshot. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by getattr/setattr at lines 478-487.
  - Sub-track 2: prior_session_test_harden_20260605 (Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md). Test refactored to call the narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
  - Sub-track 3: wait_for_ready_test_pattern_20260605: SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not long enough) was wrong; the real cause was the indentation. Polling can be a follow-up hardening pass if tests become flaky in CI.
  - Sub-track 4: undo_redo_lifecycle_fix_20260605: RESOLVED by the Sub-track 1 indentation fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
  Net result: all 4 originally failing live_gui tests pass. The user can run the full batched suite to confirm.
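The defer-not-catch fix described in the regression-fixes entry can be sketched as follows. This is a hypothetical wrapper, not the code in src/gui_2.py: native_save stands in for imgui.save_ini_settings_to_memory(), and the sentinel is "" (str) rather than b"", reflecting the str/bytes contract recorded in the Live-GUI Fragility Fixes entry.

```python
from typing import Callable


class DeferredIniCapture:
 """Defer a crash-prone native call instead of wrapping it in try/except.

 A C-level access violation cannot be caught from Python, so the guard must
 prevent the call until the native state is ready (here: after the first call).
 """

 def __init__(self, native_save: Callable[[], str]) -> None:
  self._native_save = native_save  # hypothetical stand-in for the ImGui call
  self._ini_capture_ready = False

 def capture(self) -> str:
  if not self._ini_capture_ready:
   # First call happens in an early render frame: skip the native call.
   self._ini_capture_ready = True
   # Return str, not bytes: the profile field is typed str, and tomli_w
   # raises TypeError when asked to serialize bytes.
   return ""
  return self._native_save()
```

The key design point is that the guard decides whether to call at all; a try/except around the native call would never run, because the process dies before Python sees an exception.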
Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)
Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. As of 2026-06-10: 3 recently completed (startup_speedup, test_batching_refactor, test_infrastructure_hardening); 4 in plan state (qwen, error_handling, data_structure, mcp_arch). The 4 in-plan tracks are now unblocked (the upstream test_infrastructure_hardening track is shipped).
Recently Completed (2026-06-06 to 2026-06-10)
Lightweight chronology; full spec/plan/state per track is in the linked folder.
Track: Sloppy.py Startup Speedup [COMPLETE 2026-06-07]
Link: ./tracks/startup_speedup_20260606/ (full spec/plan/state in folder)
[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5-done: 515a3029] [sub-track-1-done: 253e1798] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385] [conftest-atexit-fix: 8957c9a5] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693]
9 phases, 57 tasks. 44 TDD tests added. Main Thread Purity Invariant enforced via scripts/audit_main_thread_imports.py CI gate. Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction); import src.gui_2 341ms (was 1770ms; 81% reduction); total ~3067ms saved. 62 audit violations remain (large refactors deferred).
Track: Test Batching Refactor [COMPLETE 2026-06-08] [archived]
Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/
[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]
4 phases, fixture-class-isolated tiers (0-3 + H + P) replacing alphabetical 4-at-a-time batching. Hand-curated tests/test_categories.toml overrides for cross-cutting files. Phase 2 (CI shadow run) skipped (no CI in repo).
Track: Test Infrastructure Hardening (2026-06-09) [COMPLETE 2026-06-10] [archived]
Link: ./archive/test_infrastructure_hardening_20260609/
[track-created: 566cf08c] [phase-1-done: 5df22fa8] [phase-2-done: 67d0211e] [phase-3-done: 006bb114] [phase-4-done: b8fcd9d6] [phase-5-done: 33d5cac] [phase-6-done: 7b87bbf5] [phase-7-done: 84edb200] [phase-8-done: 719fe9a]
8 phases, ~60 surgical tasks, 6.5 days. Fixes 3 root causes of test regression churn: FR1 subprocess health autouse, FR2 live_gui_workspace fixture (per-run timestamped under tests/artifacts/), FR3 _sync_rag_engine token+dirty coalescing. Plus FR4 set_value hook + FR5 clean_baseline marker. 314/314 tests green across all 11 tier batches. Closing report: docs/reports/test_infrastructure_hardening_batch_green_20260610.md. Lineage: workspace_path_finalize_20260609 + mma_tier_usage_reset_fix_20260610 + rag_phase4_sync_fix_20260610 (all also archived).
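FR3's "token+dirty coalescing" is only named above. One common way to implement that idea is sketched below, assuming these semantics: many change notifications collapse into a single sync, and a generation token tells the caller whether its result went stale while the sync ran. CoalescingSync and its methods are hypothetical and are not the _sync_rag_engine code.

```python
import threading
from typing import Callable


class CoalescingSync:
 """Collapse bursts of change notifications into one sync call.

 _dirty means "some change is pending"; _token counts change requests so a
 finished sync can tell whether newer changes arrived while it ran.
 """

 def __init__(self, do_sync: Callable[[], None]) -> None:
  self._do_sync = do_sync
  self._lock = threading.Lock()
  self._dirty = False
  self._token = 0
  self.sync_count = 0

 def mark_dirty(self) -> None:
  with self._lock:
   self._dirty = True
   self._token += 1

 def run_pending(self) -> bool:
  """Sync at most once; return True if the synced state is still current."""
  with self._lock:
   if not self._dirty:
    return True
   self._dirty = False
   token = self._token
  self._do_sync()  # the expensive work runs outside the lock
  self.sync_count += 1
  with self._lock:
   # A moved token means a newer request arrived; _dirty is set again,
   # so the next run_pending() call will pick it up.
   return token == self._token
```

Three mark_dirty() calls followed by one run_pending() produce a single sync, which is the coalescing the entry refers to.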
In Plan (or Pending Spec)
Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix [track-created: 7c1d597e]
Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md
Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. The GUI reads the matrix to enable/disable 9 UI elements (including the screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, and cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Status: 50/79 tasks done; Phase 6 (docs) in progress. Not being archived: the remaining work moves to the follow-up track conductor/tracks/qwen_llama_grok_followup_20260611/ (rationale in docs/reports/qwen_llama_grok_followup_audit_20260611.md). The former blocker, test_infrastructure_hardening_20260609, has merged.
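A minimal sketch of the matrix-driven gating described above, assuming a (vendor, model) key with a "*" vendor-wide default. The matrix rows are placeholders (vendor_a, vendor_b, vision_model), not real vendor data, and UI_REQUIREMENTS covers only the seven GUI elements the entry names; none of these names come from src/vendor_capabilities.py.

```python
from enum import Enum


class Capability(str, Enum):
 VISION = "vision"
 TOOL_CALLING = "tool_calling"
 CACHING = "caching"
 STREAMING = "streaming"
 MODEL_DISCOVERY = "model_discovery"
 CONTEXT_WINDOW = "context_window"
 COST_TRACKING = "cost_tracking"


C = Capability
# Placeholder rows, NOT real vendor data. Key: (vendor, model); "*" = vendor default.
MATRIX: dict[tuple[str, str], frozenset[Capability]] = {
 ("vendor_a", "*"): frozenset({C.TOOL_CALLING, C.STREAMING, C.MODEL_DISCOVERY}),
 ("vendor_a", "vision_model"): frozenset({C.VISION, C.TOOL_CALLING, C.STREAMING}),
 ("vendor_b", "*"): frozenset({C.STREAMING, C.CONTEXT_WINDOW}),
}

# Hypothetical mapping from GUI element to the capability it requires.
UI_REQUIREMENTS: dict[str, Capability] = {
 "screenshot_button": C.VISION,
 "tools_toggle": C.TOOL_CALLING,
 "cache_panel": C.CACHING,
 "stream_progress": C.STREAMING,
 "fetch_models": C.MODEL_DISCOVERY,
 "token_budget": C.CONTEXT_WINDOW,
 "cost_panel": C.COST_TRACKING,
}


def capabilities(vendor: str, model: str) -> frozenset[Capability]:
 """Exact (vendor, model) row first, then the vendor default, then nothing."""
 return MATRIX.get((vendor, model), MATRIX.get((vendor, "*"), frozenset()))


def ui_enabled(vendor: str, model: str) -> dict[str, bool]:
 """One data lookup replaces per-vendor if/else branches in the GUI."""
 caps = capabilities(vendor, model)
 return {name: cap in caps for name, cap in UI_REQUIREMENTS.items()}
```

Adding a vendor then means adding rows, not new GUI branches; an unknown vendor simply disables every element.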
Track: Data-Oriented Error Handling (Fleury Pattern) [track-created: 494f68f9]
Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md
Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>() → _send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks (the first three are complete; qwen_llama_grok is still in progress). 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.
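A minimal sketch of the "errors are just cases" shape described above, under assumed field and member names. The ErrorKind members, _NilPath's methods, and resolve_in_root are illustrative and not the planned src/result_types.py; the point is that the caller always gets usable data (possibly a nil sentinel) plus a side-channel errors list, so no `assert p is not None` is needed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
 NOT_FOUND = auto()
 PERMISSION_DENIED = auto()


@dataclass(frozen=True)
class ErrorInfo:
 kind: ErrorKind
 message: str


@dataclass
class Result(Generic[T]):
 data: T  # always usable, even on failure (may be a nil sentinel)
 errors: list[ErrorInfo] = field(default_factory=list)  # side channel

 @property
 def ok(self) -> bool:
  return not self.errors


class _NilPath:
 """Nil sentinel: answers path queries with harmless defaults instead of None."""

 def exists(self) -> bool:
  return False

 def read_text(self) -> str:
  return ""


NIL_PATH = _NilPath()


def resolve_in_root(raw: str, root: Path) -> Result[Path | _NilPath]:
 """Resolve raw under root; escaping the root is an error case, not an exception."""
 root = root.resolve()
 candidate = (root / raw).resolve()
 if not candidate.is_relative_to(root):
  return Result(NIL_PATH, [ErrorInfo(ErrorKind.PERMISSION_DENIED, f"{raw!r} escapes {root}")])
 return Result(candidate)
```

Requires Python 3.9+ for Path.is_relative_to.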
Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet) — removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.
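The "errors are just cases" convention in the goal above can be sketched as follows, assuming a Result that always carries data (possibly a nil sentinel) plus a side-channel list of errors. The ErrorKind members and the resolve() helper are hypothetical; the real types in src/result_types.py may differ in names and fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical members for illustration only.
    NOT_FOUND = auto()
    PERMISSION = auto()
    PROVIDER = auto()


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    # Data is always present; errors ride alongside it instead of being raised.
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


class _NilPath:
    # Safe-to-use stand-in for "no path": callers need no `is not None` asserts.
    def exists(self) -> bool:
        return False

    def read_text(self) -> str:
        return ""


NilPath = _NilPath()  # module-level singleton


def resolve(raw: str, allowed_root: Path) -> Result[Path | _NilPath]:
    # Replaces a (p, err) tuple: failure yields the nil sentinel plus an error case.
    p = (allowed_root / raw).resolve()
    if not p.is_relative_to(allowed_root.resolve()):
        return Result(NilPath, [ErrorInfo(ErrorKind.PERMISSION, f"{raw} escapes root")])
    return Result(p)
```

Because NilPath answers exists() and read_text() harmlessly, downstream code can keep running and inspect errors once at the end.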
Track: Data Structure Strengthening (Type Aliases + NamedTuples) [track-created: ed42a97a]
Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)
Goal: Improve AI-readability by naming the 430 currently anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; it found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: (1) aliases + 6-file replacement + audit baseline; (2) NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Known gaps: 23 lower-impact files remain unconverted, and the TypedDict/dataclass migration is deferred to a follow-up track. Estimated 2-3 days of work, low risk. Now blocked by test_infrastructure_hardening_20260609 (was: none).
Track: MCP Architecture Refactor (Sub-MCP Extraction) [track-created: 2720a894]
Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)
Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by test_infrastructure_hardening_20260609, data_oriented_error_handling_20260606 (for Result/ErrorInfo), and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.
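The inverted-dict dispatch described above can be sketched as follows: each sub-MCP declares its tool names, and the controller inverts that into a tool-to-sub-MCP table once at startup. The classes and tool names here are hypothetical stand-ins; the security layers are omitted, and invoke() returns plain strings rather than the planned Result[str, ErrorInfo].

```python
from __future__ import annotations

from typing import Protocol


class SubMCP(Protocol):
    name: str
    tools: tuple[str, ...]

    def invoke(self, tool: str, args: dict) -> str: ...


class FileIOMCP:
    # Hypothetical stand-in for mcp_file_io.py (only two tool names shown).
    name = "file_io"
    tools = ("read_file", "list_directory")

    def invoke(self, tool: str, args: dict) -> str:
        return f"{self.name}:{tool}({args.get('path', '')})"


class WebMCP:
    # Hypothetical stand-in for mcp_web.py.
    name = "web"
    tools = ("fetch_url", "web_search")

    def invoke(self, tool: str, args: dict) -> str:
        return f"{self.name}:{tool}"


ALL_SUB_MCPS: list[SubMCP] = [FileIOMCP(), WebMCP()]


class MCPController:
    def __init__(self, sub_mcps: list[SubMCP]) -> None:
        # Invert "sub-MCP -> tools" into "tool -> sub-MCP" once, at startup.
        self._dispatch: dict[str, SubMCP] = {}
        for mcp in sub_mcps:
            for tool in mcp.tools:
                if tool in self._dispatch:
                    raise ValueError(f"duplicate tool name: {tool}")
                self._dispatch[tool] = mcp

    def invoke(self, tool: str, args: dict) -> str:
        # O(1) lookup replaces a long if/elif chain over tool names.
        mcp = self._dispatch.get(tool)
        if mcp is None:
            return f"unknown tool: {tool}"
        return mcp.invoke(tool, args)
```

Rejecting duplicate tool names at construction time surfaces collisions between sub-MCPs before any request is served.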
Track: RAG Phase 4 Stress Test Fix [x] — fixed 16412ad5
Status: 2026-06-06. Surfaced during post-v2 verification. Resolved: a real bug, NOT a test flake. Root cause: a ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim), while the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupted the collection, then search() failed with "Collection expecting embedding with dimension of 3072, got 384", so the AI request never reached 'done' status and the 50 × 0.5s = 25s poll loop timed out. Fix: RAGEngine._init_vector_store now calls _validate_collection_dim, which inspects the dimension of the first existing vector, compares it to the current provider's output dimension, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.
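The dimension check described in the fix can be sketched as follows. FakeCollection is a hypothetical in-memory stand-in, not the ChromaDB API, and validate_collection_dim() only mirrors the described logic of _validate_collection_dim (compare the first stored vector's length with the provider's output size; recreate with a stderr warning on mismatch).

```python
import sys
from dataclasses import dataclass, field


@dataclass
class FakeCollection:
    # Hypothetical stand-in for a persistent vector collection.
    name: str
    vectors: list[list[float]] = field(default_factory=list)


def validate_collection_dim(coll: FakeCollection, provider_dim: int) -> FakeCollection:
    """Return a collection whose stored vectors match provider_dim."""
    if not coll.vectors:
        return coll  # empty collection: nothing to compare against
    stored_dim = len(coll.vectors[0])
    if stored_dim == provider_dim:
        return coll  # dimensions agree: keep existing data
    print(
        f"warning: collection {coll.name!r} has dim {stored_dim}, "
        f"provider emits {provider_dim}; recreating",
        file=sys.stderr,
    )
    return FakeCollection(coll.name)  # fresh, empty collection with the same name
```

Checking once at init time is cheap (one stored vector) and turns a later, confusing search() failure into an explicit, logged rebuild.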
Track: Prior Session Test Harden (20260605) [superseded by live_gui_test_hardening_v2_20260605]
Status: 2026-06-05. Surfaced during live_gui_fragility_fixes_20260605 execution: test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders turned out to be more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: the test was refactored to call the narrower render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
Backlog (Provider + Language + Investigation)
Track: Bootstrap gencpp Python Bindings
Link: ./tracks/gencpp_python_bindings_20260308/
Track: Tree-Sitter Lua MCP Tools
Link: ./tracks/tree_sitter_lua_mcp_tools_20260310/
Track: GDScript Language Support Tools
Link: ./tracks/gdscript_godot_script_language_support_tools_20260310/
Track: C# Language Support Tools
Link: ./tracks/csharp_language_support_tools_20260310/
Track: OpenAI Provider Integration
Link: ./tracks/openai_integration_20260308/
Track: Zhipu AI (GLM) Provider Integration
Link: ./tracks/zhipu_integration_20260308/
Track: AI Provider Caching Optimization
Link: ./tracks/caching_optimization_20260308/
Track: Manual UX Validation & Review
Link: ./tracks/manual_ux_validation_20260302/
Track: Manual UX Validation — ASCII-Sketch Workflow (NEW 2026-06-08)
Link: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/, Spec: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md, Plan: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md
Goal: Promote the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at src/gui_2.py:3770 render_discussion_entry. The 23-op matrix A1-A7 in docs/guide_discussions.md is the source of truth; the SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines) informs the internal refactoring decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing.
Status: Active; Phase 1 (5 open questions to the user) is the current phase.
Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)
Link: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/, Spec: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add markdown-it-py OR switch to pickle/msgspec — NOT C11. The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track.
Status: Deferred. Promotes to active track when (if) the first hard constraint surfaces.
Track: Context First Message Fix
Link: ./tracks/context_first_message_fix_20260604/
Track: Fix Remaining Tests
Link: ./tracks/fix_remaining_tests_20260513/
Track: Test Harness Hardening
Link: ./tracks/test_harness_hardening_20260310/
Track: Test Patch Fixes
Link: ./tracks/test_patch_fixes_20260513/
Track: Test Batching Post-Refactor Polish
Link: ./tracks/test_batching_post_refactor_polish_20260607/
Track: Code Path Audit
Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill)
Goal: Build src/code_path_audit.py — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user. Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); pre-4-tracks code is too stale to ground optimization decisions.
Track: GUI Architecture Refinement
Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)
Follow-up (Planned, Not Yet Specced)
Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)
Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet.
Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects src/app_controller.py:290 and :3559, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, and src/conductor_tech_lead.py:68 (5 production call sites across 4 modules in src/), plus 50+ test files. The caller enumeration and baseline counts are recorded in the parent track's spec §12.1.
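A minimal sketch of the shim-then-migrate pattern, assuming send() becomes a thin deprecated wrapper over send_result() until every caller is moved. The send_result() body, the minimal Result stand-in, and the use of RuntimeError to preserve a raise-on-failure contract are all assumptions. On Python 3.13+ the decorator form warnings.deprecated (PEP 702) is available; the sketch calls warnings.warn directly so it also runs on older versions.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Result:
    # Minimal stand-in for Result[str]; the real type lives in src/result_types.py.
    data: str
    errors: list[str] = field(default_factory=list)


def send_result(prompt: str) -> Result:
    # Hypothetical body; the real one dispatches to a _send_<vendor>_result().
    if not prompt:
        return Result("", ["empty prompt"])
    return Result(f"echo: {prompt}")


def send(prompt: str) -> str:
    """Deprecated shim, kept only until every caller uses send_result()."""
    warnings.warn("send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2)
    res = send_result(prompt)
    if res.errors:
        # Assumed legacy contract: failures raise for callers not yet migrated.
        raise RuntimeError("; ".join(res.errors))
    return res.data


def migrated_caller(prompt: str) -> str:
    # After migration: inspect errors as data instead of catching exceptions.
    res = send_result(prompt)
    return res.data if not res.errors else f"[error] {res.errors[0]}"
```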
Phase 9: Chore Tracks
Initialized: 2026-06-07
Completed (recently archived or in tracks/)
Track: Unused Scripts Cleanup
[checkpoint: 46ce3cd] Link: ./tracks/unused_scripts_cleanup_20260607/, Spec: ./tracks/unused_scripts_cleanup_20260607/spec.md, Plan: ./tracks/unused_scripts_cleanup_20260607/plan.md
Goal: Remove 30 confirmed-unused one-off scripts from scripts/ (56 → 26 files, 54% reduction). 5 atomic per-category commits; no new CI gate; follow-up unused_scripts_audit_20260607 recorded. All non-GUI test batches still pass; 2 audit scripts (main_thread_imports, weak_types) report no new violations.
Track: License & CVE Audit (Dependency Compliance)
[checkpoint: a7ab994f] Link: ./tracks/license_cve_audit_20260607/, Spec: ./tracks/license_cve_audit_20260607/spec.md, Plan: ./tracks/license_cve_audit_20260607/plan.md
Goal: Build scripts/audit_license_cve.py, a single audit script that checks third-party deps (pyproject.toml + uv.lock transitive) for license compliance, known CVEs, version pinning, and SPDX source headers. Tilde-pin all deps, delete requirements.txt, regenerate uv.lock (gitignored per project policy), and add --strict mode + a baseline file (CI gate). Policy: ALLOW (permissive + weak copyleft + public domain), BLOCK (GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, unknown). The track is scope-limited to third-party deps; the project's own LICENSE and SPDX headers are explicitly OUT of scope (the user reserves all rights to the repo). 28 unit + integration tests passing; --strict mode wired as a CI gate; baseline file committed at scripts/audit_license_cve.baseline.json. 4 atomic commits: audit script + initial report; tilde-pin + lock regen + delete requirements.txt; --strict + baseline; tracks.md update.
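A minimal sketch of the ALLOW/BLOCK policy plus the --strict baseline gate, keyed on SPDX license IDs. The allow list, the block prefixes, classify() and strict_check() are assumptions, not the internals of scripts/audit_license_cve.py. "BSL" is read here as the Business Source License (SPDX BUSL-1.1), not the permissive Boost license (BSL-1.0); Commons Clause has no standalone SPDX ID, so it falls through to the unknown-is-blocked rule.

```python
from __future__ import annotations

# Hypothetical policy tables; the real script's lists may differ.
BLOCK_PREFIXES = ("GPL-", "AGPL-", "SSPL-", "BUSL-", "Elastic-")
ALLOW = {
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC", "PSF-2.0",  # permissive
    "LGPL-2.1-only", "LGPL-3.0-only", "MPL-2.0",                           # weak copyleft
    "Unlicense", "CC0-1.0",                                                # public domain
}


def classify(spdx_id: str | None) -> str:
    if not spdx_id:
        return "BLOCK"  # missing license metadata counts as unknown
    if spdx_id.startswith(BLOCK_PREFIXES):
        return "BLOCK"  # explicitly disallowed families
    return "ALLOW" if spdx_id in ALLOW else "BLOCK"  # anything unrecognized is unknown


def strict_check(deps: dict[str, str | None], baseline: set[str]) -> list[str]:
    # Assumed --strict semantics: fail only on blocked packages absent from the baseline.
    blocked = {name for name, lic in deps.items() if classify(lic) == "BLOCK"}
    return sorted(blocked - baseline)
```

An empty list from strict_check() means the gate passes; any new entry names a package that needs review or a baseline update.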
Notes
Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.
Status legend:
[ ] not started
[~] in progress
[x] completed (the track may still be in tracks/ or may have been moved to archive/)
~~**...**~~ struck-through (renamed/replaced/superseded)
Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.
Editing this file: When you mark a track as [x] and move its folder to archive/, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase + dependency.