The user specified that the code_path_audit_20260607 track should run
AFTER the 4 foundational tracks complete (qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor). This commit formalizes that timing
and grounds the audit's analytical framing in the 5 sources loaded
into context on 2026-06-08.
3 surgical additions to the spec/plan, no task changes:
1. Post-4-tracks timing (new section in spec.md §"Timing", plus
a "Timing" callout in plan.md's opening):
- The 4 tracks will significantly reshape src/ai_client.py,
src/mcp_client.py, src/app_controller.py, and
src/type_aliases.py
- Running the audit on pre-refactor code would produce a
report that's stale on day 1
- The post-4-tracks timing ensures the audit grounds
optimization decisions for the *resulting* architecture
- Pre-flight check: verify all 4 tracks are [x] completed
in conductor/tracks.md before starting this track
2. Analytical framing (new section in spec.md §"Analytical Framing
(5-source lens)"):
- Maps each of the 5 sources (Fleury taxonomy + Fleury
combinatoric + Muratori Big OOPs + Reece Assuming + user's
chunk ideation) to specific audit-time heuristics
- 4 concrete heuristics: effective-codepath count,
entity-hierarchy fingerprint, assumed-too-much detector,
chunkification candidates
- The heuristics shape REPORT INTERPRETATION, not the
static cost model (which stays data-grounded in
EXPENSIVE_THRESHOLD + per-class weights)
3. See Also cross-references in spec.md (6 new entries):
- nagent_review Pitfalls #2 and #4 (provider history
globals + stateful singleton)
- wo84LFzx5nI Big OOPs transcript (full text, 4310
segments, 200KB; loaded 2026-06-08)
- i-h95QIGchY Assuming transcript (full text, 3719
segments, 162KB; loaded 2026-06-08)
- ed_chunk_data_structures_20260523.md (5-image archive
of user's chunk ideation, 19KB; saved 2026-06-08)
- computational_shapes_ssdl_digest_20260608.md (the SSDL
digest that synthesizes the 4-source computational-shapes
thinking; the audit's tree/mermaid outputs ARE
computational-shape visualizations)
4. tracks.md entry updated to include the spec/plan links and
a brief status note that the audit is post-4-tracks.
5. plan.md has a "Timing" callout at the top stating the 4
tracks must ship before the plan executes.
No code modified. The audit's tasks (Phases 1-6) are unchanged
in structure; the new sections only add analytical context
and timing constraints.
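The pre-flight check in item 1 could be sketched roughly as below. This is only an illustration: it assumes each completed track appears in conductor/tracks.md on a line that contains both its track name and an `[x]` marker, and the helper name `incomplete_tracks` is hypothetical.

```python
# Hypothetical pre-flight helper; the line format it expects is an assumption.
REQUIRED_TRACKS = (
    "qwen_llama_grok",
    "data_oriented_error_handling",
    "data_structure_strengthening",
    "mcp_architecture_refactor",
)


def incomplete_tracks(tracks_md: str, required=REQUIRED_TRACKS) -> list:
    """Return the required track names that have no '[x]' line in the text."""
    done = set()
    for line in tracks_md.splitlines():
        if "[x]" not in line:
            continue  # only lines carrying the completed marker count
        for name in required:
            if name in line:
                done.add(name)
    # Preserve the order of `required` so the report is stable.
    return [name for name in required if name not in done]
```

An empty result would mean the audit track is clear to start; any returned names are the tracks still outstanding.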
Project Tracks
This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in ../archive/<track_name>/ for completed tracks).
Structure:
- Active Tracks (Current Queue): In-flight and unblocked work the implementer can pick up today.
- Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).
Archive directories live at ../archive/<track_name>/ (from this file's location at conductor/tracks.md); the ./archive/... links in this file are relative to that location and resolve correctly.
Active Tracks (Current Queue)
Tracks that are unblocked and ready to start. Ordered by dependency (blocked-by first) and priority (A foundational → D forward-looking).
| # | Priority | Track | Status | Blocked By |
|---|---|---|---|---|
| 1 | A | Qwen, Llama & Grok Vendor Integration + Capability Matrix | spec ✓, plan pending | (none — foundation track) |
| 2 | A | Data-Oriented Error Handling (Fleury Pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, qwen_llama_grok (this is the upstream) |
| 3 | A | Data Structure Strengthening (Type Aliases + NamedTuples) | spec ✓, plan pending | (none — independent) |
| 4 | A | MCP Architecture Refactor (Sub-MCP Extraction) | spec ✓, plan pending | data_oriented_error_handling, data_structure_strengthening |
| 5 | D | Public API Result Migration | placeholder; not yet specced | data_oriented_error_handling (deprecated send()) |
| 6 | — | UI Polish (Five Issues) | spec ✓, plan ✓, ready to start | (none — independent) |
| 7 | — | Bootstrap gencpp Python Bindings | spec TBD | (none — independent) |
| 8 | — | Tree-Sitter Lua MCP Tools | spec TBD | (none — independent) |
| 9 | — | GDScript Language Support Tools | spec TBD | (none — independent) |
| 10 | — | C# Language Support Tools | spec TBD | (none — independent) |
| 11 | — | OpenAI Provider Integration | spec TBD | (none — independent) |
| 12 | — | Zhipu AI (GLM) Provider Integration | spec TBD | (none — independent) |
| 13 | — | AI Provider Caching Optimization | spec TBD | (none — independent) |
| 14 | — | Manual UX Validation & Review | spec TBD | (none — independent) |
| 15 | — | GenCpp Dogfood Feedback Loop | spec TBD | (none — independent; oldest pending track) |
| 16 | — | Code Path Audit | spec TBD | (none — investigation track) |
| 17 | — | GUI Architecture Refinement | (no spec.md) | (TBD) |
| 18 | — | Context First Message Fix | spec TBD | (none — independent) |
| 19 | — | Fix Remaining Tests | spec TBD | (none — independent) |
| 20 | — | Test Harness Hardening | spec TBD | (none — independent) |
| 21 | — | Test Patch Fixes | spec TBD | (none — independent) |
| 22 | — | Test Batching Post-Refactor Polish | spec TBD | test_batching_refactor (COMPLETE) |
| 23 | — | Prior Session Test Harden (20260605) | superseded; no action needed | — |
Note on numbering: the legacy file used 0a, 0b, 0c... and 0d, 0e, 0f, 0g for tracks created 2026-06-06+. This is the git-blame sort order, not a logical execution order. The new structure re-orders by dependency.
Phase 0: Infrastructure (Critical)
Initialized: 2026-02 (project foundation)
Completed
- Track: Conductor Path Configuration
Note: One-line entry; full details in ./tracks/conductor_path_configurable_20260306/ (still in tracks/; not yet archived).
Phase 1: Pre-Track Foundation (2026-02 - 2026-03)
No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.
Completed
- Various one-off refactors; full details in conductor/archive/ by track name prefix.
Phase 2: Strict Execution Queue
Completed 2026-03-06
Completed
- Track: Strict Execution Queue (Phase 2) See: ./archive/strict_execution_queue_completed_20260306/
Phase 3 - Phase 4: Foundational Tracks (March 2026)
Multiple sub-tracks under the initial feature-development push. All archived.
Archived
Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):
- Track: Session Context Snapshots & Visibility (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/
- Track: Discussion Takes & Timeline Branching (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/
- Track: RAG Support Link: ./archive/rag_support_20260308/
- Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/
- Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/
- Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/
- Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/
- Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/
- Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/
- Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/
- Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/
- Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/
- Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/
- Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/
- Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/
- Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/
- Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/
- Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/
- Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/
- Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/
- Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/
- Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/
- Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/
- Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/
- Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/
- Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/
- Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/
- Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/
- Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/
Phase 5: Codebase Curation
Initialized: 2026-05-07
Completed (all archived)
Analysis & Structural Review
- Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.
- Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.
- Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.
- Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.
- Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.
- Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.
- Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.
- Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.
- Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/
- Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.
- Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/
- Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.
Phase 6: Context Composition Redesign
Initialized: 2026-05-10
Completed (all archived)
Context Control & Workflow Enhancements
- Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.
- Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.
- Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.
- Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.
- Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.
- Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.
- Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.
- Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.
- Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.
- Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.
- Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..."; the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update product guidelines or the python code_styleguidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions with these python files.
- Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/
- [~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix Preview button generating empty content, and Inspect/Slices buttons failing to open their respective editor panels. Status: in progress; track folder still in tracks/ (not yet archived).
Active
- Track: GenCpp Dogfood Feedback Loop
Link: ./tracks/gencpp_dogfood_feedback_20260510/
Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding.
Status: oldest pending track (2026-05-10). Track folder still in tracks/.
Hot Reload Feature (2026-05-16)
Single-track feature, not part of a numbered Phase.
Archived
- Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.
Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)
Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to archive/.
Archived
- Track: Phase 7 Stabilization and Polishing (Regressions Fix) Link: ./archive/phase7_stabilization_and_polishing_20260601/
- Track: Phase 7 Monolithic Stabilization (Final Cleanup) Link: ./archive/phase7_monolithic_stabilization_20260602/
Late May 2026 - Early June 2026: One-Off Fixes and Polish
One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.
Archived
- Track: Robust Live Simulation Verification
- Track: Fix GUI Crashes in Tool Preset Manager and Discussion Hub Link: ./archive/gui_crash_fixes_20260531/
- Track: Fix keys_down AttributeError in ImGui IO Link: ./archive/fix_imgui_keys_down_20260601/
- Track: Selectable Thinking Monologs Link: ./archive/selectable_thinking_monologs_20260601/
- Track: Fix MiniMax history sequencing and truncation Link: ./archive/minimax_history_fix_20260601/
- Track: Preserve context selection on discussion switch and add empty context warning Link: ./archive/context_preservation_and_warnings_20260601/
- Track: Fix Text Viewer docking conflicts and Tool Call row click interactivity Link: ./archive/text_viewer_and_tool_call_fixes_20260601/
- Track: UX Refinements for Context Composition and Discussion Entries Link: ./archive/context_composition_ux_20260601/
- Track: Combine AST Inspector and Slices Editor into a unified Structural File Editor Link: ./archive/structural_file_editor_20260601/
- Track: Add per-response token metrics and AI-assisted history compression Link: ./archive/discussion_metrics_and_compression_20260601/
- Track: Fix Approve Modal sizing and inline full preview Link: ./archive/approve_modal_ux_20260601/
- Track: Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette. Link: ./archive/command_palette_and_performance_20260602/ Goal: Async context preview offload (background thread, state lock) + Command Palette (32 commands, fuzzy search, Ctrl+Shift+P, Up/Down/Enter nav, 13 unit + 7 live_gui tests). Phases 1-3 complete.
- Track: Comprehensive Documentation Refresh Link: ./archive/documentation_refresh_comprehensive_20260602/ Goal: Refresh stale documentation across docs/. Completed: ASCII file tree updates (docs/Readme.md + Readme.md 5→14 guides, 22→53 src modules), docs/guide_testing.md (new, comprehensive 251-file test suite reference), 7 per-source-file guides (guide_gui_2.md, guide_ai_client.md, guide_api_hooks.md, guide_mcp_client.md, guide_app_controller.md, guide_multi_agent_conductor.md, guide_models.md). All 14 guides cross-linked. Gap analysis: ./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md. Sub-tracks (all checkpointed):
  - Sub-Track 1: Docs Layer Refresh [checkpoint: 20225c8]: 18 per-file atomic commits. 15 guides (8 refreshed + 7 new), Subsystem Index (24 entries), 106 cross-links all resolve, symbol parity fixed (apply_nerv_theme -> apply_nerv).
  - Sub-Track 2: Conductor Docs Refresh [checkpoint: ef4efab2]: 4 per-file atomic commits: product.md (14 guides, MiniMax, Command Palette), tech-stack.md (MiniMax, Gemini Embedding 001), workflow.md (2026-06-02 doc refresh, 45-tool count), index.md (active track links).
  - Sub-Track 3: Agent Config Refresh [checkpoint: 87f668a6]: 3 per-file atomic commits: AGENTS.md (5.4K -> 0.7K thin pointer), CLAUDE.md (6.7K -> 0.2K deprecation stub), GEMINI.md (5 providers, sloppy.py entry, 12 key modules). Drift check: 0 issues in 9 mirrored skill files.
- Track: Test Consolidation & TOML Sandboxing [checkpoint: cb91006c] Spec: ./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-test-consolidation.md Goal: Audit tests for real-TOML usage, migrate offenders to sandboxed patterns. Added scripts/check_test_toml_paths.py audit script (CI gate). Migrated test_mcp_client_whitelist_enforcement to tmp_path (was the only offender). Skipped redundant enforce_no_real_toml fixture: the existing isolate_workspace autouse fixture + audit script provide equivalent coverage.
Phase 8: UI Polish (2026-06-03)
Initialized: 2026-06-03
User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.
Active
- Track: UI Polish (Five Issues)
Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md
Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md
Goal: Resolve five long-standing UI issues:
- Phase 1: GFM markdown table rendering (pre-processor into src/markdown_table.py, wire into MarkdownRenderer.render).
- Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
- Phase 3: Fix Refresh Registry button in Log Management: it currently instantiates LogRegistry without calling load_registry(), so the displayed table never reflects on-disk state (gui_2.py:1675).
- Phase 4: Add Vendor State tab to Operations Hub: at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
- Phase 5: Files & Media > Files directory-grouped tree (re-use aggregate.group_files_by_dir, mirror render_context_files_table collapsible-node style).
Recently Archived (post-Phase 8)
- Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add opt-in pytest test (RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, verifies Hook API responds. Catches "works on my machine" failures. Added clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines, uses urllib.request from stdlib per tech-stack.md dependency minimalism rule - deviation from plan). Skipped by default. Marked with @pytest.mark.clean_install.
- Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: First thing the clean install test caught. ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel _editor_lang_cache to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit b306f8f corrected test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).
- Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL()/C_VAL() so theme switches take effect at use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): DIR_COLORS dict stored C_VAL not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.
- [~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes C-level (0xc0000005) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with _ini_capture_ready flag (defer-not-catch pattern): first call returns b"" and sets the flag, subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).
- Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy the WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, so the profile was never saved and load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encoding the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track: the test was more under-mocked than the spec assumed; fixing the imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where a 2-tuple was expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.
- Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:
  - Sub-track 1: live_gui_state_sync_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md. REAL root cause was bad indentation in src/gui_2.py:607 (user fixed). The App class had _capture_workspace_profile being parsed as nested inside _apply_snapshot due to indentation. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by getattr/setattr at lines 478-487.
  - Sub-track 2: prior_session_test_harden_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md. Test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
  - Sub-track 3: wait_for_ready_test_pattern_20260605 - SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not enough) was wrong; the real cause was the indent. Polling can be a follow-up hardening pass if tests become flaky in CI.
  - Sub-track 4: undo_redo_lifecycle_fix_20260605 - RESOLVED by Sub-track 1 indent fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
  Net result: 4 originally-failing live_gui tests all pass. User can run the full batched suite to confirm.
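The defer-not-catch pattern named in the Test Regression Fixes entry above can be sketched as follows. This is a minimal illustration, not the code in src/gui_2.py: the class name `IniCapture` and the injected `capture_fn` are hypothetical stand-ins for the real ImGui call, and the `b""` sentinel mirrors the original fix (the follow-up track later changed it to `""`).

```python
class IniCapture:
    """Skip a crash-prone native call on its first invocation.

    A C-level crash cannot be caught with try/except, so the call is
    deferred until the native state has had a chance to initialize.
    """

    def __init__(self, capture_fn):
        self._capture_fn = capture_fn  # stand-in for the real native call
        self._ready = False            # plays the role of _ini_capture_ready

    def capture(self) -> bytes:
        if not self._ready:
            # First call: do not touch the native function, just arm the flag.
            self._ready = True
            return b""
        # Subsequent calls: the native state is assumed to be ready.
        return self._capture_fn()
```

The trade-off is that the first capture is always empty, so callers have to treat the sentinel as "no data yet" rather than as valid settings.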
Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)
Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. Two already completed; three in plan state.
Active
Track: Sloppy.py Startup Speedup [COMPLETE 2026-06-07]
Link: ./tracks/startup_speedup_20260606/, Spec: ./tracks/startup_speedup_20260606/spec.md, Plan: ./tracks/startup_speedup_20260606/plan.md
[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5a-done: 78d3a1db] [phase-5b-done: 69d098ba] [phase-5c-done: 48c96499] [phase-5d-done: de6b85d2] [phase-5-done: 515a3029] [phase-6-partial-done: 85d18885] [sub-track-1-done: 253e1798] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693] [sub-track-3-done: 8fea8fe9] [sub-track-4-done: f3d071e0] [conftest-atexit-fix: 8957c9a5] [phase-9-shipped: 12cec6ae] [sub-track-2a-done: 01ddf9f1] [sub-track-2b-done: a41b31ed] [sub-track-2c-done: 372b0681] [sub-track-2d-done: 11a9c4f7] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385]
Goal: Reduce sloppy.py startup time. Main Thread Purity Invariant. 9 phases, 57 tasks. 44 TDD tests added (all passing). 7 main thread purity tests enforce invariant for 6 refactored files.
Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction / 1638ms saved). import src.gui_2 341ms (was 1770ms; 81% reduction / 1429ms saved). Total ~3067ms saved on the 2 big files. 62 audit violations remain (was 63 after Sub-track 2 partial; was 67 baseline) - all 6 refactored files contribute 0 new violations.
Sub-track 1 (Phase 6 full completion) at 253e1798: 15 ad-hoc threading.Thread() call sites migrated to self.submit_io(...); ZERO new threading.Thread() in src/; only 5 domain-specific exempt sites remain (HookServer HTTP/WS, asyncio loop, WorkerPool, CPU monitor).
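A rough sketch of the submit_io idea, assuming it is a thin wrapper over a shared ThreadPoolExecutor. The class name `Controller` is a hypothetical stand-in for AppController, and the method body is an assumption; only the method name and the shared-pool intent come from this file.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Controller:
    """Hypothetical stand-in: routes background work through one shared pool."""

    def __init__(self, max_workers: int = 4):
        # thread_name_prefix makes pool threads easy to spot in stack dumps.
        self._io_pool = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix="controller-io"
        )

    def submit_io(self, fn, *args, **kwargs) -> Future:
        # Replaces threading.Thread(target=fn, args=args).start() at call sites.
        return self._io_pool.submit(fn, *args, **kwargs)

    def shutdown(self) -> None:
        self._io_pool.shutdown(wait=True)


controller = Controller()
worker_name = controller.submit_io(lambda: threading.current_thread().name).result(timeout=5)
controller.shutdown()
```

Compared with ad-hoc threads, a call site gets a Future back, so errors raised in the job surface on `.result()` instead of disappearing in a detached thread.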
Sub-track 3 (Hook API warmup endpoints) at 8fea8fe9: GET /api/warmup_status and GET /api/warmup_wait?timeout=N. 7 tests (5 unit + 2 live_gui). All pass.
Sub-track 4 (GUI status indicator) at f3d071e0: render_warmup_status_indicator() + _on_warmup_complete_callback() + App._post_init registration. 6 tests (5 unit + 1 live_gui). All pass.
Conftest atexit fix at 8957c9a5: registers a non-blocking pool shutdown via atexit. Fixes the run_tests_batched.py hang between batches (ThreadPoolExecutor.__del__ was blocking on shutdown(wait=True) for stuck warmup jobs).
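A non-blocking shutdown hook of this kind might look like the sketch below. The function names are hypothetical and the actual conftest code may differ; `cancel_futures` requires Python 3.9+.

```python
import atexit
from concurrent.futures import ThreadPoolExecutor


def shutdown_nonblocking(pool: ThreadPoolExecutor) -> None:
    # wait=False returns immediately instead of joining worker threads;
    # cancel_futures=True drops queued jobs that have not started yet.
    # Jobs that are already running are not interrupted.
    pool.shutdown(wait=False, cancel_futures=True)


def register_nonblocking_shutdown(pool: ThreadPoolExecutor) -> None:
    # atexit passes the extra positional argument through to the handler.
    atexit.register(shutdown_nonblocking, pool)
```

After the handler runs, further `submit` calls on the pool raise RuntimeError, which is usually the desired behavior at interpreter exit.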
Sub-track 2 (audit violations) PARTIAL at ae3b433e: 1 of 63 violations fixed (tomli_w in src/models.py). 62 remain (pydantic in models.py; tree_sitter in file_cache.py; websockets/cost_tracker/session_logger in api_hooks.py; 48 in app_controller.py + gui_2.py; 4 in sloppy.py). These are large refactors (especially gui_2.py with 24 violations and app_controller.py with 24) that exceed the scope of a single sub-track; addressed as future work.
3 post-shipping bugfix commits: 8c4791d0 (real bug: _ensure_gemini_client UnboundLocalError + test_discussion_compression deepseek mock adaptation); 88fc42bb (spec convention: 7 sites in src/ai_client.py use _require_warmed('google.genai') + .types parent lookup instead of leaf); 52ea2693 (conftest: use AppController.wait_for_warmup(timeout=60.0) instead of direct import google.genai — user-corrected jank workaround).
Pre-existing test failures (unrelated, user will address): test_api_generate_blocked_while_stale (ui_global_preset_name AttributeError); test_rag_large_codebase_verification_sim (RAG retrieval).
Track: Test Batching Refactor [COMPLETE 2026-06-08] [archived]
Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/, Spec: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/spec.md, Plan: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/plan.md
[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-2-skipped: no-CI] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]
Adaptations: (a) library modules moved from scripts/ to tests/ per user directive; (b) auto-inference uses AST scan (not regex) per user "FUCK REGEX" policy + prereq spec; (c) Phase 2 (CI shadow run) skipped: no CI infrastructure in repo; manual plan-vs-actual spot-check was the equivalent verification.
Goal: Replace alphabetical 4-at-a-time batching in scripts/run_tests_batched.py with fixture-class-isolated tiers: 0 (opt-in: clean_install/docker, gated on env var + --include-opt-in flag), 1 (unit, grouped by subsystem batch_group, pytest-xdist), 2 (mock_app, grouped), 3 (live_gui, all in one pytest invocation to amortize 15s startup), H (headless), P (performance, last). Hybrid classification: auto-infer from filename + AST fixture scan, hand-curated tests/test_categories.toml overrides for cross-cutting and ambiguous files. Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on a conftest-loaded pytest plugin (no-op without entries). Priority: B (process isolation) > A (subsystem diagnostic) > C (speed). 4 phases: library+dry-run, shadow run, switch default, cleanup.
Goal (startup_speedup track): Reduce sloppy.py startup time by ~2000-2400ms. Main Thread Purity Invariant: main thread (entering immapp.run()) never imports a module heavier than imgui_bundle + lean gui_2 skeleton. No-prefetch rule: heavy SDKs (google.genai 955ms, anthropic 430ms, openai 445ms, fastapi 470ms) are lazy-only: paid once on first use, on the asyncio thread, not in the background. No-new-threads rule: all background work goes through AppController._io_pool (4-thread ThreadPoolExecutor, named controller-io-N); zero new threading.Thread(...) calls in src/. Enforcement: static scripts/audit_main_thread_imports.py CI gate + runtime tests/test_main_thread_purity.py (sys.addaudithook test). 9 phases, 57 tasks. Target: import src.ai_client < 50ms (from ~1800ms), import src.gui_2 < 500ms (from ~3000ms), live_gui.wait_for_server(timeout=15) no longer times out.
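A runtime check in the spirit of the sys.addaudithook test could be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, the heavy-module list is copied from the goal text, and the real tests/test_main_thread_purity.py may work differently. Audit hooks cannot be removed once installed, so the sketch toggles recording with a module-level variable.

```python
import sys
import threading
from typing import Callable, Iterable, List, Optional

# Heavy SDKs named in the goal text.
HEAVY_MODULES = ("google.genai", "anthropic", "openai", "fastapi")

_recorded: List[str] = []
_watched_thread: Optional[threading.Thread] = None


def _audit_hook(event: str, args: tuple) -> None:
    # The "import" audit event carries the module name as its first argument.
    if event != "import" or _watched_thread is None:
        return
    if threading.current_thread() is _watched_thread:
        _recorded.append(args[0])


sys.addaudithook(_audit_hook)  # permanent for the life of the interpreter


def imports_on_thread(
    fn: Callable[[], None], thread: Optional[threading.Thread] = None
) -> List[str]:
    """Run fn and return modules freshly imported on the watched thread."""
    global _watched_thread
    _recorded.clear()
    _watched_thread = thread or threading.main_thread()
    try:
        fn()
    finally:
        _watched_thread = None
    return list(_recorded)


def heavy_violations(names: Iterable[str]) -> List[str]:
    """Return the recorded names that are heavy SDKs or their submodules."""
    return sorted(
        n for n in names
        if any(n == h or n.startswith(h + ".") for h in HEAVY_MODULES)
    )
```

The event only fires for modules that are not already in sys.modules, so a check like this has to run before anything else has pulled the heavy SDKs in.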
In Plan (or Pending Spec)
Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix [track-created: 7c1d597e]
Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md (to be authored by writing-plans skill)
Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive.
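One way to picture the capability matrix is a frozen record per (vendor, model) pair that the GUI turns into enable/disable flags. The sketch below is illustrative only: the vendor and model names are placeholders, the field types are guesses, and `capabilities_for` and `ui_flags` are hypothetical helpers rather than the contents of src/vendor_capabilities.py.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Capabilities:
    """The 7 v1 capabilities; defaults describe a minimal text-only model."""
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0  # tokens; 0 means unknown
    cost_tracking: bool = False


# Placeholder entries; real values would be declared per vendor and model.
MATRIX: Dict[Tuple[str, str], Capabilities] = {
    ("vendor_a", "model-x"): Capabilities(
        vision=True, streaming=True, context_window=32_000
    ),
    ("vendor_b", "model-y"): Capabilities(tool_calling=True, cost_tracking=True),
}


def capabilities_for(vendor: str, model: str) -> Capabilities:
    """Unknown pairs fall back to the all-disabled default."""
    return MATRIX.get((vendor, model), Capabilities())


def ui_flags(caps: Capabilities) -> Dict[str, bool]:
    """Map capabilities to UI elements instead of branching per vendor."""
    return {
        "screenshot_button": caps.vision,
        "tools_toggle": caps.tool_calling,
        "cache_panel": caps.caching,
        "stream_progress": caps.streaming,
        "fetch_models": caps.model_discovery,
        "token_budget": caps.context_window > 0,
        "cost_panel": caps.cost_tracking,
    }
```

Adding a vendor then means adding rows to the table; the GUI code that reads `ui_flags` does not change.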
Track: Data-Oriented Error Handling (Fleury Pattern) [track-created: 494f68f9]
Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md
Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>() → _send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.
Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet) — removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.
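The Result shape and the deprecated send() shim described above might be sketched as follows. The ErrorKind members, the field names on ErrorInfo, and the echo behaviour of `send_result` are all assumptions made for illustration; only the type names and the data-plus-errors layout come from the goal text.

```python
import warnings
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical members; the real enum is defined by the track.
    INVALID_INPUT = auto()
    NETWORK = auto()


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    """Data is always present; errors ride along in a side channel."""
    data: T
    errors: List[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def send_result(prompt: str) -> Result[str]:
    """Stand-in for the new public API: failure is a case, not an exception."""
    if not prompt.strip():
        # Return a usable nil value ("") plus a description of what went wrong.
        return Result("", [ErrorInfo(ErrorKind.INVALID_INPUT, "empty prompt")])
    return Result(f"echo: {prompt}")


def send(prompt: str) -> str:
    """Deprecated shim kept so existing callers continue to work."""
    warnings.warn(
        "send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2
    )
    return send_result(prompt).data
```

Because `data` is never None, a caller can keep using the value on the failure path and inspect `errors` only where it cares.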
Track: Data Structure Strengthening (Type Aliases + NamedTuples) [track-created: ed42a97a]
Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)
Goal: Improve AI-readability by naming 430 currently-anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Honest about what's missing: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk.
Track: MCP Architecture Refactor (Sub-MCP Extraction) [track-created: 2720a894]
Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)
Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by data_oriented_error_handling_20260606 (for Result/ErrorInfo) and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.
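The registration list and inverted-dict dispatch could be sketched like this. The tool names and handlers are made up, `SubMCP` is a hypothetical record type, and the sketch returns plain strings where the track specifies a Result; it only shows how a tool-name lookup table can be derived from the list of sub-MCPs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ToolFn = Callable[[dict], str]


@dataclass(frozen=True)
class SubMCP:
    """Hypothetical sub-MCP: a name plus the tools it owns."""
    name: str
    tools: Dict[str, ToolFn]


# Illustrative registration list with made-up tools.
ALL_SUB_MCPS: List[SubMCP] = [
    SubMCP("mcp_file_io", {"read_file": lambda a: f"read {a['path']}"}),
    SubMCP("mcp_web", {"web_search": lambda a: f"search {a['query']}"}),
]


def build_dispatch(sub_mcps: List[SubMCP]) -> Dict[str, SubMCP]:
    """Invert sub-MCP -> tools into tool -> sub-MCP for O(1) lookup."""
    table: Dict[str, SubMCP] = {}
    for sub in sub_mcps:
        for tool in sub.tools:
            if tool in table:
                raise ValueError(f"tool {tool!r} registered twice")
            table[tool] = sub
    return table


class MCPController:
    def __init__(self, sub_mcps: List[SubMCP]) -> None:
        self._dispatch = build_dispatch(sub_mcps)

    def owner_of(self, tool: str) -> str:
        return self._dispatch[tool].name

    def invoke(self, tool: str, args: dict) -> str:
        sub = self._dispatch.get(tool)
        if sub is None:
            return f"error: unknown tool {tool!r}"
        return sub.tools[tool](args)
```

Building the table once at construction also catches duplicate tool names across sub-MCPs before any call is dispatched.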
Track: RAG Phase 4 Stress Test Fix [x] — fixed 16412ad5
Status: 2026-06-06. Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupt the collection, then search() fails with "Collection expecting embedding with dimension of 3072, got 384", so the AI request never reaches 'done' status and the 50 × 0.5s = 25s poll loop times out. Fix: RAGEngine._init_vector_store now calls _validate_collection_dim which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.
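The dimension check in the fix can be illustrated without ChromaDB. The sketch below uses a hypothetical in-memory stand-in for a collection, so none of these names are the project's or ChromaDB's API; it only shows the compare-then-recreate logic.

```python
import sys
from typing import Dict, List, Optional


class InMemoryCollection:
    """Hypothetical stand-in for a persistent vector collection."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}

    def first_dim(self) -> Optional[int]:
        """Dimension of the first stored vector, or None when empty."""
        for vector in self.vectors.values():
            return len(vector)
        return None


def validate_collection_dim(
    collection: InMemoryCollection, provider_dim: int
) -> InMemoryCollection:
    """Keep the collection if dims match; otherwise recreate it empty."""
    existing = collection.first_dim()
    if existing is None or existing == provider_dim:
        return collection
    print(
        f"warning: collection has {existing}-dim vectors but the provider "
        f"emits {provider_dim}-dim; recreating collection",
        file=sys.stderr,
    )
    return InMemoryCollection()
```

Checking only the first stored vector is cheap and sufficient as long as every vector in a collection was written by the same provider.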
Track: Prior Session Test Harden (20260605) [superseded by live_gui_test_hardening_v2_20260605]
Status: 2026-05-05 — Surfaced during live_gui_fragility_fixes_20260605 execution. test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
Backlog (Provider + Language + Investigation)
Track: Bootstrap gencpp Python Bindings
Link: ./tracks/gencpp_python_bindings_20260308/
Track: Tree-Sitter Lua MCP Tools
Link: ./tracks/tree_sitter_lua_mcp_tools_20260310/
Track: GDScript Language Support Tools
Link: ./tracks/gdscript_godot_script_language_support_tools_20260310/
Track: C# Language Support Tools
Link: ./tracks/csharp_language_support_tools_20260310/
Track: OpenAI Provider Integration
Link: ./tracks/openai_integration_20260308/
Track: Zhipu AI (GLM) Provider Integration
Link: ./tracks/zhipu_integration_20260308/
Track: AI Provider Caching Optimization
Link: ./tracks/caching_optimization_20260308/
Track: Manual UX Validation & Review
Link: ./tracks/manual_ux_validation_20260302/
Track: Context First Message Fix
Link: ./tracks/context_first_message_fix_20260604/
Track: Fix Remaining Tests
Link: ./tracks/fix_remaining_tests_20260513/
Track: Test Harness Hardening
Link: ./tracks/test_harness_hardening_20260310/
Track: Test Patch Fixes
Link: ./tracks/test_patch_fixes_20260513/
Track: Test Batching Post-Refactor Polish
Link: ./tracks/test_batching_post_refactor_polish_20260607/
Track: Code Path Audit
Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill)
Goal: Build src/code_path_audit.py — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user. Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); pre-4-tracks code is too stale to ground optimization decisions.
Track: GUI Architecture Refinement
Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)
Follow-up (Planned, Not Yet Specced)
Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)
Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet.
Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects src/app_controller.py:290 and :3559, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, src/conductor_tech_lead.py:68 (5 call sites across 4 production files in src/), and ~50+ test files. The 4-caller enumeration + baseline counts are recorded in the parent track's spec §12.1.
Phase 9: Chore Tracks
Initialized: 2026-06-07
Completed (recently archived or in tracks/)
Track: Unused Scripts Cleanup [checkpoint: 46ce3cd]
Link: ./tracks/unused_scripts_cleanup_20260607/, Spec: ./tracks/unused_scripts_cleanup_20260607/spec.md, Plan: ./tracks/unused_scripts_cleanup_20260607/plan.md
Goal: Remove 30 confirmed-unused one-off scripts from scripts/ (56 → 26 files, 54% reduction). 5 atomic per-category commits; no new CI gate; follow-up unused_scripts_audit_20260607 recorded. All non-GUI test batches still pass; 2 audit scripts (main_thread_imports, weak_types) report no new violations.
Track: License & CVE Audit (Dependency Compliance) [checkpoint: a7ab994f]
Link: ./tracks/license_cve_audit_20260607/, Spec: ./tracks/license_cve_audit_20260607/spec.md, Plan: ./tracks/license_cve_audit_20260607/plan.md
Goal: Build scripts/audit_license_cve.py, a single audit script that checks third-party deps (pyproject.toml + uv.lock transitive) for license compliance + known CVEs + version-pinning + SPDX source-headers. Tilde-pin all deps, delete requirements.txt, regenerate uv.lock (gitignored per project policy), add --strict mode + baseline file (CI gate). Policy: ALLOW (permissive + weak copyleft + public domain), BLOCK (GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, unknown). Track is scope-limited to third-party deps; the project's own LICENSE and SPDX headers are explicitly OUT of scope (the user reserves all rights to the repo). 28 unit + integration tests passing; --strict mode wired as CI gate; baseline file committed at scripts/audit_license_cve.baseline.json. 4 atomic commits: audit script + initial report, tilde-pin + lock regen + delete requirements.txt, --strict + baseline, tracks.md update.
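The ALLOW/BLOCK policy can be read as an allow-list in which anything unrecognised is blocked. The sketch below is one possible reading, not the logic of scripts/audit_license_cve.py: the SPDX identifiers in the allow-list are examples chosen for illustration, and `classify_license` and `blocked_deps` are hypothetical names.

```python
from typing import Dict, List, Optional

# Example SPDX identifiers for the three allowed families (illustrative only).
ALLOWED_LICENSES = frozenset({
    "MIT", "BSD-3-Clause", "Apache-2.0",   # permissive
    "LGPL-3.0-only", "MPL-2.0",            # weak copyleft
    "Unlicense", "CC0-1.0",                # public domain
})


def classify_license(spdx_id: Optional[str]) -> str:
    """Return "ALLOW" for allow-listed licenses, "BLOCK" for everything else.

    Treating the allow-list as the only source of approval means that GPL,
    AGPL, SSPL and similar licenses, as well as missing or unknown license
    metadata, are all blocked without needing a separate deny-list.
    """
    if spdx_id is not None and spdx_id.strip() in ALLOWED_LICENSES:
        return "ALLOW"
    return "BLOCK"


def blocked_deps(licenses_by_dep: Dict[str, Optional[str]]) -> List[str]:
    """Names of dependencies whose license is not allowed, sorted."""
    return sorted(
        name for name, spdx_id in licenses_by_dep.items()
        if classify_license(spdx_id) == "BLOCK"
    )
```

In a --strict style gate, a non-empty `blocked_deps` result that is not already recorded in the baseline would be the condition that fails the build.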
Notes
Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.
Status legend:
[ ] not started
[~] in progress
[x] completed (track may still be in tracks/ or may have been moved to archive/)
~~**...**~~ struck-through (renamed/replaced/superseded)
Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.
Editing this file: When you mark a track as [x] and move its folder to archive/, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase + dependency.