manual_slop/conductor/tracks.md
T
ed 6dd41b3e6d conductor(plan): mark result_migration_baseline_cleanup_20260620 as active
TIER-2 READ conductor/code_styleguides/error_handling.md end-to-end before Phase 0.

Task 0.1 (Phase 0): update conductor/tracks.md row 32 from
'ready to start' to 'active 2026-06-20'.
2026-06-20 08:07:59 -04:00


Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in ../archive/<track_name>/ for completed tracks).

Structure:

  • Active Tracks (Current Queue): In-flight and unblocked work the implementer can pick up today.
  • Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).

Archive directories live at ../archive/<track_name>/ (from this file's location at conductor/tracks.md); the ./archive/... links in this file are relative to that location and resolve correctly.


Active Tracks (Current Queue)

Tracks that are unblocked and ready to start. Ordered by dependency (blocked-by first) and priority (A foundational → D forward-looking).

# Priority Track Status Blocked By
2 A Qwen, Llama & Grok Vendor Integration + Capability Matrix spec ✓, plan ✓, 50/79 tasks done; Phase 6 in progress (docs); NOT archiving — has follow-up track test_infrastructure_hardening_20260609 (merged)
3 A Data-Oriented Error Handling (Fleury Pattern) spec ✓, plan ✓, ready to start startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609 (merged), qwen_llama_grok
4 A Data Structure Strengthening (Type Aliases + NamedTuples) spec ✓, plan pending test_infrastructure_hardening_20260609 (merged)
5 A MCP Architecture Refactor (Sub-MCP Extraction) spec ✓, plan pending test_infrastructure_hardening_20260609 (merged), data_oriented_error_handling, data_structure_strengthening
6 D Public API Result Migration placeholder; not yet specced data_oriented_error_handling (deprecated send())
6a A Public API Migration + UI Polish Test Cleanup spec ✓, plan ✓, shipped 2026-06-15 (13 pre-existing failures fixed; 3 RAG failures deferred to rag_test_failures_20260615) (none — independent; NEW 2026-06-15; combined stability track)
6b A RAG Test Failures Fix spec ✓, plan ✓, shipped 2026-06-15 (3 RAG tests fixed; first fully green baseline 1288 + 4 + 0) (none — independent; NEW 2026-06-15; small bug-fix track)
6c B Exception Handling Audit (Convention Compliance + Doc Clarification) spec ✓, plan ✓, shipped 2026-06-16 (211 violations identified across 42 files; 5 doc gaps closed) (none — independent; NEW 2026-06-16; audit + doc track; identifies the migration target for data_structure_strengthening_20260606 and the user's send_resultsend rename)
6d A Result Migration (5 sub-tracks) umbrella spec ✓; sub-tracks 1+2 initialized (sub-track 1: result_migration_review_pass_20260617 shipped 2026-06-17; sub-track 2: result_migration_small_files_20260617 initialized; 3 remaining) exception_handling_audit_20260616; identifies the migration target
6d-1 A Result Migration Sub-Track 1: Review Pass spec ✓, plan ✓, metadata ✓, state ✓; shipped 2026-06-17 (43 sites classified: 23 compliant + 1 migration-target + 8 PATTERN_1/2 + 9 compliant + 1 audit-script-bug; 10 new heuristics added; 3 audit-script bugs documented) result_migration_20260616 (umbrella); exception_handling_audit_20260616 (shipped 2026-06-16)
6d-2 A Result Migration Sub-Track 2: Small Files + Audit-Script Bug Fixes spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-18 (Phase 10 REJECTED for sliming 21 sites via 5 laundering heuristics; Phase 11 REDOES the 21 sites: 5 full Result migrations in warmup.py + 2 helper extracts + 14 documented; Phase 12 = ACTUAL full Result[T] migration: 16 sites in api_hooks.py + 27 sites in 16 small files; Heuristic #19 REMOVED; visit_Try bug FIXED; Heuristic D ADDED; Drain Points section in styleguide; Phase 12 REJECTED for false test claim; Phase 13 = script crash fixed (UTF-8 reconfigure in run_tests_batched.py) + 3 failures investigated on parent commit (0 regressions) + 4 pre-existing Gemini 503 tests documented with @pytest.mark.skip + test_execution_sim_live switched from gemini_cli to gemini per user directive (STILL FAILS, reported for diff track); 11/11 tiers actually run; 9 PASS clean + 2 PASS with documented issues) result_migration_20260616 (umbrella); result_migration_review_pass_20260617 (shipped 2026-06-17)
6d-3 A Result Migration Sub-Track 3: App Controller spec ✓, plan ✓, metadata ✓, state ✓, active; migrates 45 sites in src/app_controller.py to Result[T] (32 INTERNAL_BROAD_CATCH + 8 INTERNAL_SILENT_SWALLOW + 4 INTERNAL_RETHROW + 1 INTERNAL_OPTIONAL_RETURN); 22 sites stay as-is (15 BOUNDARY_FASTAPI + 2 BOUNDARY_SDK + 4 INTERNAL_COMPLIANT + 1 INTERNAL_PROGRAMMER_RAISE). Phase 1 = fix the 2 known regressions (test_tool_presets_execution::test_tool_ask_approval + test_extended_sims::test_execution_sim_live) caused by the half-migrated session_logger.log_tool_call call site in _offload_entry_payload (lines 3715, 3721). 5-file-commit pattern from doeh_test_thinking_cleanup_20260615 (1 source + 1 test + 1 plan + 1 metadata + 1 state per task). 6 phases: (1) Setup + fix regressions; (2) 32 broad-catch → 4 bulk batches; (3) 8 silent-swallow → 2 batches with logging.debug per Heuristic #19; (4) 4 rethrow classified + 1 optional migrated; (5) Verify + audit + end-of-track report. result_migration_20260616 (umbrella); result_migration_small_files_20260617 (shipped 2026-06-18)
6d-4 A Result Migration Sub-Track 4: gui_2.py spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-20; migrated 42 sites in src/gui_2.py (25 INTERNAL_BROAD_CATCH + 13 INTERNAL_SILENT_SWALLOW + 2 INTERNAL_RETHROW + 2 UNCLEAR) to Result[T]; added 3 new drain-plane render functions + 1 new test file + 2 new audit heuristics (Phase 11 dunder raise + Phase 12 lazy-loading fallback). Audit: V=0, S=0, ?=0 for gui_2.py. 81 atomic commits across 13 phases; 114 tests pass; Tier 1+2 batched: 10/10 PASS; Tier 3: 1 known issue (FPS 28.46 vs 30 threshold; documented in TRACK_COMPLETION). Anti-sliming protocol: 13 phases cap each phase at <=10 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test. result_migration_app_controller_20260618 (sub-track 3, SHIPPED 2026-06-19 with Phase 7; data plane ready)
6d-5 A Result Migration Sub-Track 5: Baseline Cleanup spec ✓, plan ✓, metadata ✓, state ✓, active 2026-06-20; migrates 88 sites across 3 baseline files (src/mcp_client.py 46 + src/ai_client.py 33 + src/rag_engine.py 9) to make the convention reference 100% compliant. Same anti-sliming protocol as sub-track 4: 14 phases cap each phase at <=9 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test. result_migration_gui_2_20260619 (sub-track 4, SHIPPED 2026-06-20; first to ship without error correction per user)
6e A (meta-tooling) Tier 2 Autonomous Sandbox (unattended track execution) spec ✓, plan ✓, shipped 2026-06-16 (9 phases, 24 default-on tests + 4 opt-in tests + 1 smoke e2e) (none — independent; NEW 2026-06-16; meta-tooling; eliminates the permission: ask bottleneck for well-regularized tracks via a 3-layer enforcement stack: OpenCode permission system + Windows restricted token + git hooks)
6f A (meta-tooling) Tier 2 Sandbox File Leak Prevention (revert + 3-layer defense) spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-20; selectively reverted the 4 user-named files from offender commit 00e5a3f2 (.opencode/agents/tier2-autonomous.md, .opencode/commands/tier-2-auto-execute.md, opencode.json, mcp_paths.toml); added 3-layer defense: pre-commit hook at conductor/tier2/githooks/pre-commit (auto-unstages forbidden files at commit boundary; 12 tests), scripts/audit_tier2_leaks.py (working-tree audit with --strict CI gate; 13 tests), wired hook installation into scripts/tier2/setup_tier2_clone.ps1. 25 default-on + 4 opt-in tests pass; 4 atomic commits (fab2e55b + 81e1fd7b + f5d8ea04 + 8f54deda); user-driven response to a one-off incident (per user directive: tier-2 must NEVER commit those files again; NOT via gitignore). DEFERRED: CI wiring of audit --strict mode; rebase of stale tier-2 branches (tier2/result_migration_app_controller_phase6_20260619, tier2/test_sandbox_hardening_20260619) on origin/master@8f54deda to drop 00e5a3f2 (user action). (none — independent; NEW 2026-06-20; meta-tooling fix; selective revert of 4 of 9 changes in offender commit 00e5a3f2)
7 UI Polish (Five Issues) spec ✓, plan ✓, ready to start (Phases 1/4/5 shipped; Phases 2/3 code shipped but tests broken — fixed by track 6a) (none — independent)
7a B SQLite-Granularity Inline Docs for gui_2.py spec ✓, plan ✓, complete (none — independent)
7b B Continued SQLite-Granularity Inline Docs for gui_2.py spec ✓, plan ✓, complete (none — independent)
7c B SQLite-Granularity Inline Docs for ai_client.py spec ✓, plan ✓, ready to start (none — independent)
7d A Live GUI Test Infrastructure Fixes spec ✓, plan ✓, metadata ✓, state ✓, active; addresses 2 issues reported for diff tracks by result_migration_small_files_20260617 Phase 13: (1) test_execution_sim_live GUI subprocess (port 8999) crashes mid-test during script generation flow — same failure with both gemini_cli and gemini; NOT provider-specific; 90s timeout reached without AI text; (2) test_live_gui_workspace_exists xdist race — workspace cleanup timing under parallel xdist; passes in isolation. 4 phases: (1) Investigation + Issue 2 parent-commit verification; (2) Fix Issue 2 (TDD); (3) Fix Issue 1 (TDD + remove diagnostic logging); (4) Final verification (11/11 tiers PASS clean). result_migration_small_files_20260617 (shipped 2026-06-18 with the 2 issues reported for diff tracks)
16 A Test Sandbox Hardening spec ✓, plan ✓, metadata ✓, state ✓, ready to start; 5-part fix for test data loss outside ./tests/. Phase 1: investigation + baseline pass count + audit of get_config_path() callers. Phase 2: scripts/audit_test_sandbox_violations.py (FR4 static audit + --strict CI gate). Phase 3: _enforce_test_sandbox autouse fixture in conftest.py using sys.addaudithook (FR1 Python guard; hard fail on any write outside ./tests/). Phase 4: root-cause fix — remove SLOP_CONFIG env-var fallback from src/paths.py; add --config <path> CLI flag to sloppy.py + conftest.py; set_config_override(path) module-level API (FR2). Phase 5: isolate_workspace migration off tmp_path_factory.mktemp to tests/artifacts/_isolation_workspace_<RUN_ID>/; pyproject.toml --basetemp addopts; SLOP_CREDENTIALS/SLOP_MCP_ENV env vars added to non-live_gui tests; tech-stack.md dated note (FR3). Phase 6: scripts/run_tests_sandboxed.ps1 (FR5 Windows restricted-token wrapper, OPT-IN). Phase 7: conductor/code_styleguides/test_sandbox.md + updates to workspace_paths.md and guide_testing.md (FR7 docs). Phase 8: full 11-tier verification. Phase 9: end-of-track report. 13 regression tests in tests/test_test_sandbox.py. ~11 atomic commits. (none — independent; NEW 2026-06-19; test-infrastructure + root-cause fix; primary motivation: user has lost important sample data multiple times over the past month because tests wrote to top-level TOML files; NO ENV VARS for config path per user directive; the --config CLI flag is the only override mechanism; test workspace file naming: config_overrides.toml; hard fail on any sandbox violation; tests should never need AppData temp (tempfile.mkdtemp/mkstemp without dir= is flagged); baseline 1288 + 4 + 0; out of scope: converting the other SLOP_* env vars (SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES, SLOP_CREDENTIALS, SLOP_MCP_ENV, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR) to CLI flags — user considers this a separate "mess" to address in follow-up tracks; deferred: macOS/Linux OS-level wrapper, per-fixture sandbox strictness tuning, read-side isolation)
8 Bootstrap gencpp Python Bindings spec TBD (none — independent)
9 Tree-Sitter Lua MCP Tools spec TBD (none — independent)
10 GDScript Language Support Tools spec TBD (none — independent)
11 C# Language Support Tools spec TBD (none — independent)
12 OpenAI Provider Integration spec TBD (none — independent)
13 Zhipu AI (GLM) Provider Integration spec TBD (none — independent)
14 AI Provider Caching Optimization spec TBD (none — independent)
15 Manual UX Validation & Review spec TBD (none — independent)
15a Manual UX Validation — ASCII-Sketch Workflow spec ✓, plan ✓, ready to start (none — independent; NEW 2026-06-08)
15b Chunkification Optimization (Contingency) spec ✓ (contingency), no plan hard constraint surface (deferred)
16 GenCpp Dogfood Feedback Loop spec TBD (none — independent; oldest pending track)
17 Code Path Audit spec TBD test_infrastructure_hardening_20260609 (merged)
23 A (research) Intent-Based Scripting Languages Survey spec ✓, plan pending (none — independent; NEW 2026-06-12; non-impl research track, time-sensitive: report must complete before nagent v2.2)
24 A (bugfix) AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek) spec ✓, plan ✓, shipped 2026-06-15 (with 1 critical _api_generate regression + 2 deferred bugs — see doeh_test_thinking_cleanup_20260615) (none — independent; NEW 2026-06-14; user-blocking; 3 bugs from data_oriented_error_handling_20260606)
25 B (research) Fable System Prompt Review (Critical Analysis) spec ✓, plan pending (none — independent; NEW 2026-06-17; non-impl research track, informs the deferred nagent-rebuild; 10 cluster sub-reports + 17-section synthesis report >3500 LOC + 3 side artifacts; Fable artifact at docs/artifacts/Fable System Prompt.txt is local-only and NEVER committed)
18 GUI Architecture Refinement (no spec.md) (TBD)
19 Context First Message Fix spec TBD (none — independent)
19 Fix Remaining Tests SUPERSEDED by track 1
20 Test Harness Hardening SUPERSEDED by track 1
21 Test Patch Fixes SUPERSEDED by track 1
22 Test Batching Post-Refactor Polish SUPERSEDED by track 1 (FR1 + FR2)
20 Prior Session Test Harden (20260605) superseded; no action needed
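The Result Migration rows (6d through 6d-5) all convert internal broad-catch and silent-swallow sites into returned Result[T] values. The project's actual Result[T] type and its drain-point rules are defined in conductor/code_styleguides/error_handling.md, which is not reproduced here. The following is only a minimal sketch under the assumption of a frozen dataclass carrying either a value or an error string. The names `Result`, `ok`, `err`, `parse_port_legacy` and `parse_port` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Hypothetical Result[T]: `error is None` means `value` is valid."""
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def is_ok(self) -> bool:
        return self.error is None


def ok(value: T) -> Result[T]:
    return Result(value=value)


def err(message: str) -> Result[Any]:
    return Result(error=message)


# Before (INTERNAL_BROAD_CATCH style): every failure is swallowed, so the
# caller cannot tell "bad input" apart from "the default value".
def parse_port_legacy(text: str) -> int:
    try:
        return int(text)
    except Exception:
        return 8999


# After: the failure travels back as data and the caller decides how to drain it.
def parse_port(text: str) -> Result[int]:
    try:
        port = int(text)
    except ValueError as exc:  # narrow catch around the one call that can fail
        return err(f"invalid port {text!r}: {exc}")
    if not 0 < port < 65536:
        return err(f"port out of range: {port}")
    return ok(port)
```

A caller (for example a GUI render function acting as a drain point) would branch on `is_ok` and surface `error`; the sketch does not model the styleguide's drain-plane conventions.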

Note on numbering: the legacy file used 0a, 0b, 0c... and 0d, 0e, 0f, 0g for tracks created 2026-06-06+. This is the git-blame sort order, not a logical execution order. The new structure re-orders by dependency.


Phase 0: Infrastructure (Critical)

Initialized: 2026-02 (project foundation)

Completed


Phase 1: Pre-Track Foundation (2026-02 - 2026-03)

No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.

Completed

  • Various one-off refactors; full details in conductor/archive/ by track name prefix.

Phase 2: Strict Execution Queue

Completed 2026-03-06

Completed


Phase 3 - Phase 4: Foundational Tracks (March 2026)

Multiple sub-tracks under the initial feature-development push. All archived.

Archived

Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):

  1. Track: Session Context Snapshots & Visibility (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/

  2. Track: Discussion Takes & Timeline Branching (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/

  3. Track: RAG Support Link: ./archive/rag_support_20260308/

  4. Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/

  5. Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/

  6. Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/

  7. Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/

  8. Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/

  9. Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/

  10. Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/

  11. Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/

  12. Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/

  13. Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/

  14. Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/

  15. Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/

  16. Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/

  17. Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/

  18. Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/

  19. Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/

  20. Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/

  21. Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/

  22. Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/

  23. Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/

  24. Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/

  25. Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/

  26. Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/

  27. Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/

  28. Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/

  29. Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/


Phase 5: Codebase Curation

Initialized: 2026-05-07

Completed (all archived)

Analysis & Structural Review

  1. Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.

  2. Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.

  3. Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.

  4. Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.

  5. Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.

  6. Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.

  7. Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.

  8. Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.

  9. Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/

  10. Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.

  11. Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/

  12. Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.


Phase 6: Context Composition Redesign

Initialized: 2026-05-10

Completed (all archived)

Context Control & Workflow Enhancements

  1. Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.

  2. Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.

  3. Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.

  4. Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.

  5. Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.

  6. Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.

  7. Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.

  8. Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.

  9. Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.

  10. Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.

  11. Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..." — the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update product guidelines or the python code_styleguidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions with these python files.

  12. Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/

  13. [~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix Preview button generating empty content, and Inspect/Slices buttons failing to open their respective editor panels. Status: in progress; track folder still in tracks/ (not yet archived).

Active

  1. Track: GenCpp Dogfood Feedback Loop Link: ./tracks/gencpp_dogfood_feedback_20260510/ Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding. Status: oldest pending track (2026-05-10). Track folder still in tracks/.

Hot Reload Feature (2026-05-16)

Single-track feature, not part of a numbered Phase.

Archived

  1. Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.

Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)

Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to archive/.

Archived


Late May 2026 - Early June 2026: One-Off Fixes and Polish

One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.

Archived


Phase 8: UI Polish (2026-06-03)

Initialized: 2026-06-03

User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.

Active

  1. Track: UI Polish (Five Issues) Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md Goal: Resolve five long-standing UI issues:
    • Phase 1: GFM markdown table rendering (pre-processor into src/markdown_table.py, wire into MarkdownRenderer.render).
    • Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
    • Phase 3: Fix Refresh Registry button in Log Management — currently instantiates LogRegistry without calling load_registry() so the displayed table never reflects on-disk state (gui_2.py:1675).
    • Phase 4: Add Vendor State tab to Operations Hub — at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
    • Phase 5: Files & Media > Files directory-grouped tree (re-use aggregate.group_files_by_dir, mirror render_context_files_table collapsible-node style).
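Phase 3's Refresh Registry bug is a construct-without-load mistake. A freshly built registry object is empty until load_registry() reads the on-disk file, so a button that only constructs it always shows an empty table. The real LogRegistry's constructor signature and storage format are not described here. The following sketch uses a hypothetical stand-in class that assumes a JSON file and a `path` argument purely for illustration.

```python
import json
import os
from typing import Dict


class LogRegistry:
    """Hypothetical stand-in for the real LogRegistry (storage format assumed)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.sessions: Dict[str, dict] = {}  # stays empty until load_registry()

    def load_registry(self) -> None:
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                self.sessions = json.load(f)


def refresh_registry_buggy(path: str) -> LogRegistry:
    # What the button did: construct only, so the table never sees disk state.
    return LogRegistry(path)


def refresh_registry(path: str) -> LogRegistry:
    # Fix: construct and load, so the table reflects what is on disk.
    registry = LogRegistry(path)
    registry.load_registry()
    return registry
```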

Recently Archived (post-Phase 8)

  • Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add opt-in pytest test (RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, verifies Hook API responds. Catches "works on my machine" failures. Added clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines, uses urllib.request from stdlib per tech-stack.md dependency minimalism rule - deviation from plan). Skipped by default. Marked with @pytest.mark.clean_install.

  • Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: First thing the clean install test caught. ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel _editor_lang_cache to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit b306f8f corrected test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).

  • Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL() / C_VAL() so theme switches take effect at use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): DIR_COLORS dict stored C_VAL not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.

  • [~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes C-level (0xc0000005) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with _ini_capture_ready flag (defer-not-catch pattern): first call returns b"" and sets the flag, subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).

  • Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Link: Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, profile never saved, load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encoding the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track — the test was more under-mocked than the spec assumed; fixing imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where 2-tuple expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.

  • Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:
    • Sub-track 1: live_gui_state_sync_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md. REAL root cause was bad indentation in src/gui_2.py:607 (user fixed). The App class had _capture_workspace_profile being parsed as nested inside _apply_snapshot due to indentation. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by getattr/setattr at lines 478-487.
    • Sub-track 2: prior_session_test_harden_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md. Test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
    • Sub-track 3: wait_for_ready_test_pattern_20260605 - SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not enough) was wrong; the real cause was the indent. Polling can be a follow-up hardening pass if tests become flaky in CI.
    • Sub-track 4: undo_redo_lifecycle_fix_20260605 - RESOLVED by Sub-track 1 indent fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
    Net result: 4 originally-failing live_gui tests all pass. User can run the full batched suite to confirm.
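The defer-not-catch fix in the Test Regression Fixes entry avoids a crash that try/except cannot intercept. An access violation inside C code kills the process, so the guard skips the native call entirely on the first invocation instead of wrapping it. In the sketch below, `native_save` stands in for `imgui.save_ini_settings_to_memory()` so it runs without imgui. The class name `IniCapture` is hypothetical; `_ini_capture_ready` follows the flag named in that entry. The sketch returns an empty `str`, matching the `ini_content: str` contract that the later Live-GUI Fragility Fixes entry restored; the original d7487af4 version returned b"".

```python
from typing import Callable


class IniCapture:
    """Defer-not-catch guard (hypothetical name) around a crash-prone native call."""

    def __init__(self, native_save: Callable[[], str]) -> None:
        self._native_save = native_save
        self._ini_capture_ready = False  # flipped by the first capture() call

    def capture(self) -> str:
        if not self._ini_capture_ready:
            # First call: ImGui's Fonts/DisplaySize/Settings may not be ready, and a
            # crash in C code cannot be caught from Python, so skip the call.
            self._ini_capture_ready = True
            return ""
        return self._native_save()
```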
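The Multi-Theme TOML System entry's color-callable convention and its DIR_COLORS bug (fixed in 1469ecac) can be illustrated in a few lines. Storing the callable `C_VAL` in a table keeps the table theme-aware, but every use site must then call it before passing a color to imgui. In this sketch the palette dict, `apply_theme`, `dir_color` and the RGBA values are hypothetical stand-ins; the real palette is loaded from themes/<name>.toml.

```python
from typing import Callable, Dict, Tuple

RGBA = Tuple[float, float, float, float]

# Hypothetical active-theme state standing in for the loaded theme file.
_active_palette: Dict[str, RGBA] = {"value": (0.9, 0.9, 0.9, 1.0)}


def C_VAL() -> RGBA:
    """Color-callable: reads the palette at call time, so theme switches apply."""
    return _active_palette["value"]


def apply_theme(palette: Dict[str, RGBA]) -> None:
    _active_palette.update(palette)


# Storing the callable (C_VAL, not C_VAL()) keeps this table theme-aware...
DIR_COLORS: Dict[str, Callable[[], RGBA]] = {"src": C_VAL}


def dir_color(kind: str) -> RGBA:
    # ...so the use site must call it. Handing DIR_COLORS[kind] uncalled to
    # imgui.text_colored was the bug: imgui received a function, not a color.
    return DIR_COLORS[kind]()
```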


Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)

Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. As of 2026-06-10: 3 recently completed (startup_speedup, test_batching_refactor, test_infrastructure_hardening); 4 in plan state (qwen, error_handling, data_structure, mcp_arch). The 4 in-plan tracks are now unblocked (the upstream test_infrastructure_hardening track is shipped).

Recently Completed (2026-06-06 to 2026-06-10)

Lightweight chronology; full spec/plan/state per track is in the linked folder.

Track: Sloppy.py Startup Speedup [COMPLETE 2026-06-07]

Link: ./tracks/startup_speedup_20260606/ (full spec/plan/state in folder)

[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5-done: 515a3029] [sub-track-1-done: 253e1798] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385] [conftest-atexit-fix: 8957c9a5] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693]

9 phases, 57 tasks. 44 TDD tests added. Main Thread Purity Invariant enforced via scripts/audit_main_thread_imports.py CI gate. Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction); import src.gui_2 341ms (was 1770ms; 81% reduction); total ~3067ms saved. 62 audit violations remain (large refactors deferred).

Track: Tier 2 Sandbox File Leak Prevention [COMPLETE 2026-06-20]

Link: ./tracks/tier2_leak_prevention_20260620/, Report: ../../docs/reports/TRACK_COMPLETION_tier2_leak_prevention_20260620.md

[phase-1-revert: fab2e55b] [phase-2-hook: 81e1fd7b] [phase-3-audit: f5d8ea04] [phase-4-install: 8f54deda]

Selective revert of the 4 user-named files from offender commit 00e5a3f2 (.opencode/agents/tier2-autonomous.md, .opencode/commands/tier-2-auto-execute.md, opencode.json, mcp_paths.toml). 3-layer defense-in-depth added: pre-commit hook (auto-unstages forbidden files at commit boundary; 12 tests), working-tree audit script with --strict CI gate (13 tests), and hook installation via scripts/tier2/setup_tier2_clone.ps1. 25 default-on tests pass. Out of scope (per user explicit list): the 4 throwaway scripts in scripts/tier2/artifacts/.../*.py and the project_history.toml timestamp. DEFERRED: CI wiring of audit_tier2_leaks.py --strict; rebase of stale tier-2 branches (tier2/result_migration_app_controller_phase6_20260619, tier2/test_sandbox_hardening_20260619) on origin/master@8f54deda to drop 00e5a3f2 (user action).

Track: Test Batching Refactor [COMPLETE 2026-06-08] [archived]

Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/

[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]

4 phases, fixture-class-isolated tiers (0-3 + H + P) replacing alphabetical 4-at-a-time batching. Hand-curated tests/test_categories.toml overrides for cross-cutting files. Phase 2 (CI shadow run) skipped (no CI in repo).

Track: Test Infrastructure Hardening (2026-06-09) [COMPLETE 2026-06-10] [archived]

Link: ./archive/test_infrastructure_hardening_20260609/

[track-created: 566cf08c] [phase-1-done: 5df22fa8] [phase-2-done: 67d0211e] [phase-3-done: 006bb114] [phase-4-done: b8fcd9d6] [phase-5-done: 33d5cac] [phase-6-done: 7b87bbf5] [phase-7-done: 84edb200] [phase-8-done: 719fe9a]

8 phases, ~60 surgical tasks, 6.5 days. Fixes 3 root causes of test regression churn: FR1 subprocess health autouse, FR2 live_gui_workspace fixture (per-run timestamped under tests/artifacts/), FR3 _sync_rag_engine token+dirty coalescing. Plus FR4 set_value hook + FR5 clean_baseline marker. 314/314 tests green across all 11 tier batches. Closing report: docs/reports/test_infrastructure_hardening_batch_green_20260610.md. Lineage: workspace_path_finalize_20260609 + mma_tier_usage_reset_fix_20260610 + rag_phase4_sync_fix_20260610 (all also archived).

In Plan (or Pending Spec)

Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix [track-created: 7c1d597e]

Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md (to be authored by writing-plans skill)

Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Now blocked by test_infrastructure_hardening_20260609 (was: none).

Status (2026-06-11): Phases 1-5 done; Phase 6 (docs) in progress. NOT ARCHIVING — has a follow-up track. See ./tracks/qwen_llama_grok_followup_20260611/ for the 5-phase follow-up. Audit report: ../docs/reports/qwen_llama_grok_followup_audit_20260611.md. 50/79 tasks done. Known gaps: tool-call loop only on MiniMax; 1 of 9 UX adaptations shipped; PROVIDERS in models.py is sprawl; src/ai_client.py needs codepath consolidation; local models need first-class priority; 12 v2 matrix fields documented but not implemented; Anthropic/Gemini/DeepSeek still not on the matrix.

Track: Data-Oriented Error Handling (Fleury Pattern) [track-created: 494f68f9]

Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md

Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>()_send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive. Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet) — removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.

Status (2026-06-12): SHIPPED. Phases 1-5 complete on branch doeh-ai_client. Path C was used for src/mcp_client.py (additive *_result variants; the 30+ tool-function refactor deferred to follow-up). Full refactor was used for src/ai_client.py (ProviderError removed, 9 _send_*() renamed, send() marked @deprecated, send_result() public API added) and src/rag_engine.py (_init_vector_store_result, _validate_collection_dim_result, _get_state with NilRAGState). 28 new tests pass; 4 existing tests updated; 13 test regressions in test_llama_provider.py (3) + test_llama_ollama_native.py (4) + test_grok_provider.py (3) + test_minimax_provider.py (2) + test_live_gui_integration_v2.py (1) — all from the Phase 3 renames + ProviderError removal. Regressions are documented in state.toml [regressions_20260612] and are the intended work of public_api_migration_20260606. Archive status: directory remains in place (matches repo convention; archive is conceptual, not physical).

Track: Data Structure Strengthening (Type Aliases + NamedTuples) [track-created: ed42a97a]

Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)

Goal: Improve AI-readability by naming 430 currently-anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Honest about what's missing: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. Now blocked by test_infrastructure_hardening_20260609 (was: none).

Track: AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek) [track-created: 2026-06-14] [shipped: 2026-06-15]

Link: ./tracks/ai_loop_regressions_20260614/, Spec: ./tracks/ai_loop_regressions_20260614/spec.md, Plan: ./tracks/ai_loop_regressions_20260614/plan.md, Metadata: ./tracks/ai_loop_regressions_20260614/metadata.json, Report: ../../docs/reports/TRACK_COMPLETION_ai_loop_regressions_20260615.md

Status: 2026-06-15 — SHIPPED with 1 known production regression + 2 deferred bugs (both flagged for follow-up). 3 documented bugs (Bug #1 dead except ai_client.ProviderError, Bug #2 error → no discussion entry, Bug #3 MiniMax thinking mono) are fixed. 7 new regression tests pass; 2 pre-existing tests in test_live_gui_integration_v2.py were adapted (not skipped). 12 commits.

Goal: Diagnose and fix the user-blocking AI loop regressions for the 4 providers (MiniMax, Gemini, Gemini CLI, DeepSeek) most heavily touched by the data_oriented_error_handling_20260606 track (shipped 2026-06-12) and the subsequent ai client pass commit 5030bd84 (2026-06-13, 503-line src/ai_client.py refactor). 3 distinct bugs: Bug #1 (3 dead except ai_client.ProviderError clauses in src/app_controller.py:305, 313, 3692 — the class was removed in commit 64b787b8). Bug #2 (_handle_request_event calls the deprecated ai_client.send() which now returns "" on error; _on_comms_entry filters empty text). Bug #3 (_send_minimax doesn't wrap reasoning in <thinking> tags in returned text).

5 phases: Phase 1 (TDD red), Phase 2 (FR1 fix), Phase 3 (FR2 fix), Phase 4 (FR3 fix), Phase 5 (regression sweep + docs). 17 tasks, 12 atomic commits, ~1.5 days of Tier 2 work.

Deferred to follow-up tracks (per user direction 2026-06-14): (1) Gemini / Gemini CLI thinking-format compatibility (Bug #4) — see doeh_test_thinking_cleanup_20260615 Phase 3. (2) <think> (half-width) marker support in thinking_parser.py (Bug #5) — see doeh_test_thinking_cleanup_20260615 Phase 4.

blocks: public_api_migration_20260606 (this track migrates 3 broken sites; the public_api track picks up the remaining 5 production + 63 test call sites).

Track: Data-Oriented Error Handling Test & Thinking-Parser Cleanup [track-created: 2026-06-15]

Link: ./tracks/doeh_test_thinking_cleanup_20260615/, Spec: ./tracks/doeh_test_thinking_cleanup_20260615/spec.md, Plan: ./tracks/doeh_test_thinking_cleanup_20260615/plan.md, Metadata: ./tracks/doeh_test_thinking_cleanup_20260615/metadata.json

Status: 2026-06-15 — Active, ready for Tier 2 implementation. User-blocking cleanup track. 1 critical production regression + 10 pre-existing test mock bugs + 2 deferred bugs (from ai_loop_regressions_20260614) + 2 housekeeping items.

Goal: Consolidate the cleanup work that didn't fit in data_oriented_error_handling_20260606 (the parent refactor) and ai_loop_regressions_20260614 (the immediate fix track). 5 phases: Phase 1 (CRITICAL: fix _api_generate NameError regression introduced by ai_loop_regressions_20260614 commit 2b7b571a — the FR2 fix accidentally removed the context_to_send variable definition while preserving its usage at line 278), Phase 2 (fix 11 pre-existing test mock bugs: 3 in test_grok_provider, 3 in test_llama_provider, 4 in test_llama_ollama_native, 1 in test_ai_client_tool_loop_builder, 1 in test_headless_service), Phase 3 (Bug #4 deferred: Gemini / Gemini CLI thinking-format compatibility), Phase 4 (Bug #5 deferred: <think> half-width marker support in thinking_parser), Phase 5 (housekeeping: state.toml duplicate-key fix, tracks.md row 24 update, full suite sweep, doc updates). 16 tasks, ~15 atomic commits, 5-8 hours of Tier 2 work (0.5-1 day).

Out of scope (documented in spec.md §7 + §12): public_api_migration_20260606 (planned; the broader migration of 5 production + ~50 test call sites not touched here), live_gui_mock_injection_20260615 (recommended; infrastructure for proper e2e live_gui + AI client tests), test_rag_phase4_final_verify (separate RAG concern), UI Polish Five Issues track phases 2/3 (separate track).

Track: MCP Architecture Refactor (Sub-MCP Extraction) [track-created: 2720a894]

Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)

Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by test_infrastructure_hardening_20260609, data_oriented_error_handling_20260606 (for Result/ErrorInfo), and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.

Track: RAG Phase 4 Stress Test Fix [x] — fixed 16412ad5

Status: 2026-06-06 — Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupt the collection, then search() fails with Collection expecting embedding with dimension of 3072, got 384 and the AI request never reaches 'done' status, timing out the 500.5s = 25s poll loop. Fix: RAGEngine._init_vector_store now calls _validate_collection_dim which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*

Track: SQLite-Granularity Inline Docs for gui_2.py [COMPLETE: sqlite_docs_gui_2_20260612]

Link: ./tracks/sqlite_docs_gui_2_20260612/, Spec: ./tracks/sqlite_docs_gui_2_20260612/spec.md, Plan: ./tracks/sqlite_docs_gui_2_20260612/plan.md

Status: 2026-06-12 — COMPLETE. SQLite-style docstrings with embedded ASCII layouts and DAG context have been added to key modules representing App lifecycle, discussion panels, context panels, settings hubs, and diagnostics panels.

Goal: Add SQLite-granularity docstrings with embedded ASCII layouts and DAG relationships for src/gui_2.py panel-by-panel. Ensure zero functional regression. 5 phases: app lifecycle & setup, discussion panel, context panel, settings/hubs, and diagnostics/modals.

Track: Continued SQLite-Granularity Inline Docs for gui_2.py [COMPLETE: sqlite_docs_gui_2_continued_20260613]

Link: ./tracks/sqlite_docs_gui_2_continued_20260613/, Spec: ./tracks/sqlite_docs_gui_2_continued_20260613/spec.md, Plan: ./tracks/sqlite_docs_gui_2_continued_20260613/plan.md

Status: 2026-06-13 — COMPLETE. Completed the SQLite-style docstring initiative for preset managers, editors, persona selectors, and the command palette modal.

Goal: Document preset managers/editors, persona selectors/editors, provider panel, and command palette in src/gui_2.py and src/command_palette.py with embedded SSDL and ASCII layouts.

Track: SQLite-Granularity Inline Docs for ai_client.py [COMPLETE: ai_client_docs_20260613]

Link: ./tracks/ai_client_docs_20260613/, Spec: ./tracks/ai_client_docs_20260613/spec.md, Plan: ./tracks/ai_client_docs_20260613/plan.md

Status: 2026-06-13 — COMPLETE. Added SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in src/ai_client.py.

Goal: Add SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in src/ai_client.py.

Track: Intent-Based Scripting Languages Survey [COMPLETE: 213e4994]

Link: ./tracks/intent_dsl_survey_20260612/, Spec: ./tracks/intent_dsl_survey_20260612/spec.md, Plan: ./tracks/intent_dsl_survey_20260612/plan.md, Report: ./tracks/intent_dsl_survey_20260612/report_v1.2.md, v1.1: ./tracks/intent_dsl_survey_20260612/report_v1.1.md, v1.0: ./tracks/intent_dsl_survey_20260612/report.md, Review: ./tracks/intent_dsl_survey_20260612/reportreview.md

Status: 2026-06-12 — COMPLETE. Research-only track (non-impl). Final deliverable: report_v1.2.md (1343 lines, 168KB+, 7 sections + 9-subsection expanded Appendix). 4-tier vocab with 42 verbs (T1 math 12, T2 pipeline 12, T3 shell 10, T4 AI-fuzzing 8); 10 prior-art clusters (0: O'Donnell philosophical anchor; 1: Concatenative; 2: Array; 3: Intent-mapping; 4: Meta-Tooling DSLs; 5: SSDL; 6: Command Palette; 7: Result convention; 8: Metadesk Self-Describing Data + Tag Dispatch; 9: Verse Multi-Paradigm Calculi with Transactional Semantics); 14-primitive grammar from user's math pseudocode; 4 hardware anchor claims; 10 AI-agent properties tying to existing project architecture; 8 open questions for the follow-up interpreter prototype. Version history: v1.0 (418 lines) → v1.1 (1301 lines, +883): XML/JSON rejection citation fix, OCR-restored Lottes quote, softened Wasm streaming-parse inference, expanded Appendix A.1-A.9. → v1.2 (1343 lines): (1) Renamed arena { }tape { } (46 occurrences); (2) Mixed postfix/infix notation for math; (3) nagent attribution corrected (Jody Bruchon → Mike Acton); (4) Added Cluster 8 (Metadesk) and Cluster 9 (Verse) — survey now covers 10 clusters (sub-agents at research/cluster_8_metadesk.md and research/cluster_9_verse.md). Time-sensitive goal met: completed before nagent v2.2 hard boundary. Will be consumed by nagent v2.2 (Future-Track Candidate #4) and the future interpreter prototype (follow-up B track, separate). Appendix A.3/A.4 retain v1.1 form pending a sync pass; noted in v1.2 changelog at the top of the report.

Goal: Survey intent-based scripting languages as a design philosophy and propose a Meta-Tooling-facing intent DSL vocabulary. Research-only (non-impl): produces 1 markdown file at conductor/tracks/intent_dsl_survey_20260612/report.md. No new src/ code, no new tests, no pyproject.toml changes. The report is the foundation document for the user's nagent v2.2 (its "Future-Track Candidate #4: Intent-based DSL" section), the placeholder intent_dsl_for_meta_tooling_20260608_PLACEHOLDER (per mcp_architecture_refactor_20260606/spec.md §12.1 and nagent_review_20260608/metadata.json:28), and a future interpreter prototype (follow-up B track, separate). 7 sections: (1) the "intent-based" design philosophy (O'Donnell immediate-mode as the anchor); (2) prior art across 10 clusters (0: John O'Donnell IMGUI/MVC at johno.se/book/; 1: Forth family — Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy/Bob Armstrong; 2: Array — APL, K, BQN, Uiua; 3: Intent-mapping — Jofito/Jody, jq, nagent tag protocol [rejected as model], Wasm; 4: Meta-Tooling DSLs — mcp_dsl_20260606 placeholder, nagent's Bridge DSL, OpenAI/Anthropic tool-use; 5: SSDL shape primitives per computational_shapes_ssdl_digest_20260608.md; 6: Project's own Command Palette 33 commands; 7: Result[T] + ErrorInfo convention per data_oriented_error_handling_20260606); (3) the 14-primitive grammar formalized from the user's math pseudocode (determinate/minor/matrix-transpose snippets), with explicit ambiguity flags; (4) the 4-tier vocab (~40 verbs: T1 math ~10, T2 data pipeline ~12, T3 shell ~10, T4 AI-fuzzing tolerance ~8 — T4 is the novel contribution); (5) hardware mapping with 4 anchor claims (Onat/Lottes 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter; O'Donnell "widgets are method invocations"; Forth/CoSy concatenative syntax; APL/K array data); (6) AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain per guide_meta_boundary.md, runtime path through 
cli_tool_bridge.py, 3-layer security per guide_tools.md, 4 memory dimensions per nagent v2.1 §2.1, stable-to-volatile cache ordering, Result[T] envelope, Command Palette 33 commands, Hook API state fields, O'Donnell IEventTarget = sandbox verb, O'Donnell "reads are free" = cheap Tier 2 verbs); (7) ≥6 open questions for follow-up B (interpreter prototype) + connection block to intent_dsl_for_meta_tooling_20260608_PLACEHOLDER. 4 phases: source gathering + outline (checkpoint commit), write sections 1-3, write sections 4-7, self-review + user review + commit + register in tracks.md. Time-sensitive: report must complete before nagent v2.2 ships.*

Spec approved 2026-06-12 (commit b389f1be). 789 lines; modeled on data_oriented_error_handling_20260606/spec.md.

Track: Prior Session Test Harden (20260605) [superseded by live_gui_test_hardening_v2_20260605]

Status: 2026-05-05 — Surfaced during live_gui_fragility_fixes_20260605 execution. test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.

Backlog (Provider + Language + Investigation)

Track: Bootstrap gencpp Python Bindings

Link: ./tracks/gencpp_python_bindings_20260308/

Track: Tree-Sitter Lua MCP Tools

Link: ./tracks/tree_sitter_lua_mcp_tools_20260310/

Track: GDScript Language Support Tools

Link: ./tracks/gdscript_godot_script_language_support_tools_20260310/

Track: C# Language Support Tools

Link: ./tracks/csharp_language_support_tools_20260310/

Track: OpenAI Provider Integration

Link: ./tracks/openai_integration_20260308/

Track: Zhipu AI (GLM) Provider Integration

Link: ./tracks/zhipu_integration_20260308/

Track: AI Provider Caching Optimization

Link: ./tracks/caching_optimization_20260308/

Track: Manual UX Validation & Review

Link: ./tracks/manual_ux_validation_20260302/

Track: Manual UX Validation — ASCII-Sketch Workflow (NEW 2026-06-08)

Link: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/, Spec: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md, Plan: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md Goal: Promote the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at src/gui_2.py:3770 render_discussion_entry. The 23-op matrix A1-A7 in docs/guide_discussions.md is the source of truth; the SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines) informs the internal refactoring decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing. Status: Active; Phase 1 (5 open questions to the user) is the current phase.

Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)

Link: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/, Spec: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add markdown-it-py OR switch to pickle/msgspec — NOT C11. The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track. Status: Deferred. Promotes to active track when (if) the first hard constraint surfaces.

Track: Context First Message Fix

Link: ./tracks/context_first_message_fix_20260604/

Track: Fix Remaining Tests

Link: ./tracks/fix_remaining_tests_20260513/

Track: Test Harness Hardening

Link: ./tracks/test_harness_hardening_20260310/

Track: Test Patch Fixes

Link: ./tracks/test_patch_fixes_20260513/

Track: Test Batching Post-Refactor Polish

Link: ./tracks/test_batching_post_refactor_polish_20260607/

Track: Code Path Audit

Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill) Goal: Build src/code_path_audit.py — a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user. Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); pre-4-tracks code is too stale to ground optimization decisions.

Track: GUI Architecture Refinement

Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)

Follow-up (Planned, Not Yet Specced)

Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)

Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet. Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects 5 production call sites in src/ (src/app_controller.py:290 + :3692, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, src/conductor_tech_lead.py:68, plus src/mcp_client.py:2274 in the tool-result dispatch path) and 63 test files. The enumeration + baseline counts are recorded in the parent track's spec §12.1 and verified in this track's state.toml [baseline_post_qwen_track].

send_result(...) mirrors the send(...) signature (13+ parameters including 8 callbacks); see docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern) > Public API" for the call shape.

Track: Public API Migration + UI Polish Test Cleanup (combined stability track) [track-created: 2026-06-15]

Link: ./tracks/public_api_migration_and_ui_polish_20260615/, Spec: ./tracks/public_api_migration_and_ui_polish_20260615/spec.md, Plan: ./tracks/public_api_migration_and_ui_polish_20260615/plan.md, Metadata: ./tracks/public_api_migration_and_ui_polish_20260615/metadata.json

Status: 2026-06-15 — Active, ready for Tier 2 implementation. User-blocking stability track that finishes the cleanup work from data_oriented_error_handling_20260606 and doeh_test_thinking_cleanup_20260615 before the data structure track.

Goal: Two concerns, one track. (A) Public API Migration — remove the deprecated ai_client.send() legacy wrapper. Migrate 3 remaining production call sites (src/conductor_tech_lead.py:68, src/orchestrator_pm.py:86, src/multi_agent_conductor.py:591) + 12 test files to send_result(). Fix 4 of the 10 pre-existing test failures (2 Qwen + 2 symbol_parsing) as a side effect. (B) UI Polish Test Cleanup — fix 2 broken test assertions in test_discussion_truncate_layout.py and test_log_management_refresh.py (the production code was already fixed by user commits d0b06575 and df7bda6e; the tests use find() which locates the comment block instead of the actual code). Combined result: 6 of 10 pre-existing failures fixed (1280 + 6 = 1286 pass; 4 RAG failures deferred to next track).

7 phases: Phase 1 (3 production call sites migrated), Phase 2 (12 test files migrated to send_result()), Phase 3 (2 Qwen test fixes), Phase 4 (2 symbol_parsing test fixes), Phase 5 (2 UI Polish test fixes), Phase 6 (deprecation removed: send() function + filterwarnings + test_deprecation_warnings.py), Phase 7 (docs + housekeep). ~28 tasks, ~28 atomic commits, 2-3 days Tier 2 work.

Critical audit findings (2026-06-15): UI Polish phases 1, 4, 5 already SHIPPED (commits 79ac9210, 3a864076, 74e02485); phases 2, 3 code SHIPPED (user commits) but tests broken (this track fixes). The 3 remaining production send() call sites (not 5 as the parent spec claimed — 2 were already migrated by doeh_test_thinking_cleanup_20260615; mcp_client.py:2274 was a misidentification). 12 test files use send() (not 63 as the parent spec claimed — doeh_test_thinking_cleanup_20260615 already migrated 11).

blocks: data_structure_strengthening_20260606 (cleaner Result API usage makes the type-alias replacement easier) and mcp_architecture_refactor_20260606 (transitively).

Out of scope (documented in spec §7): 4 RAG test fixes (separate RAG subsystem track), the _send_<vendor>() -> _send_<vendor>_result() rename (not needed; tests work with current names), 23 lower-impact weak-type files (next major track: data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate infrastructure track).

Track: RAG Test Failures Fix (small bug-fix track) [track-created: 2026-06-15] [shipped: 2026-06-15]

Link: ./tracks/rag_test_failures_20260615/, Spec: ./tracks/rag_test_failures_20260615/spec.md, Plan: ./tracks/rag_test_failures_20260615/plan.md, Metadata: ./tracks/rag_test_failures_20260615/metadata.json

Status: 2026-06-15 — Shipped. 4 atomic commits. First fully green baseline since data_oriented_error_handling_20260606 shipped 2026-06-12 (1288 pass + 4 skip + 0 fail; was 1282 + 4 + 3 pre-track). All 11 batched test tiers pass.

Goal: Fix the 3 remaining pre-existing test failures (down from 4 as the parent track documented; test_rag_integration.py was inadvertently fixed by public_api_migration_and_ui_polish_20260615 Phase 2 follow-up commit 26e1b652). All 3 share the same root cause: the 'NoneType' object has no attribute 'get' error in src/rag_engine.py, surfaced via _rebuild_rag_index -> get_all_indexed_paths() (line 331: m.get('path') on None metadata) and _validate_collection_dim_result (line 150: if not embeddings raising ValueError on non-empty numpy arrays).
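The two guards described above can be sketched as follows. This is an illustration, not the code from commit 35581163. The function signatures are assumed (the real `get_all_indexed_paths()` is presumably a method on the engine), and `AmbiguousArray` is a hypothetical stand-in for a NumPy array so that the example runs without NumPy. A NumPy array with more than one element raises `ValueError` in a boolean context, which is why `if not embeddings` fails on non-empty arrays:

```python
from typing import Any, Iterable, Optional


def get_all_indexed_paths(metadatas: Iterable[Optional[dict]]) -> list[str]:
    """Collect the 'path' of every indexed entry, skipping None metadata.

    Calling m.get('path') on a None entry raises
    "'NoneType' object has no attribute 'get'".
    """
    paths = []
    for m in metadatas:
        if m is None:
            continue
        path = m.get("path")
        if path is not None:
            paths.append(path)
    return paths


def has_embeddings(embeddings: Any) -> bool:
    """Emptiness check that is safe for NumPy arrays.

    `if not embeddings` calls bool() on the array, which raises ValueError
    when the array has more than one element; len() does not.
    """
    return embeddings is not None and len(embeddings) > 0


class AmbiguousArray:
    """Dependency-free stand-in for numpy.ndarray's truthiness behavior."""

    def __init__(self, rows: int) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return self._rows

    def __bool__(self) -> bool:
        raise ValueError("The truth value of an array with more than one element is ambiguous.")
```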

3 tests fixed by this track:

  • tests/test_rag_phase4_final_verify.py::test_phase4_final_verify (failed at line 65 before the fix) - PASSES as of commit 35581163
  • tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim (failed at line 48 before the fix) - PASSES as of commit 35581163
  • tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim (was listed as failing in spec §1.1, but actually passed at track execution time; the chromadb init path was already protected by the new tests in test_rag_sync_none_error.py)

Implementation summary (4 atomic commits):

  • fix(rag): handle None metadata in get_all_indexed_paths and non-empty numpy in dim check (35581163) — the production fix
  • conductor(checkpoint): Phase 3 complete (6a0ac357) — empty checkpoint
  • docs(rag): add troubleshooting section for NoneType.get error (d89c5810) — guide_rag.md update
  • conductor(track): mark rag_test_failures_20260615 as completed (pending) — metadata + tracks.md

New test file: tests/test_rag_sync_none_error.py (3 tests, all pass):

  • test_dim_check_does_not_raise_on_non_empty_ndarray — guards against the if not embeddings numpy ValueError
  • test_get_all_indexed_paths_handles_none_metadata — guards against m.get('path') on None
  • test_get_all_indexed_paths_returns_paths_with_metadata — positive control that normal flow still works

5 phases: Phase 1 (investigation + reproducing test), Phase 2 (fix), Phase 3 (full + batched test verification), Phase 4 (docs update), Phase 5 (metadata + tracks.md). ~10 tasks, 4 atomic commits, ~30 min Tier 2 work (much faster than the 0.5-1 day estimate).

Critical audit findings (2026-06-15): The RAGConfig() default is correct (vector_store is not None; provider is 'mock' by default). The RAGEngine with a mock vector store constructs successfully (verified by direct instantiation). The error surfaces in the RAG sync worker at src/app_controller.py:1480. Most likely candidates for the None.get() call: src/rag_engine.py:149 (embeddings = res.get('embeddings') in _validate_collection_dim_result, which fails if res is None) or a subtle config field that becomes None. Diagnostic strategy: add traceback.format_exc() to the except clause, capture the full traceback, identify the exact call site, fix it surgically, then remove the diagnostic.
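The diagnostic strategy above is sketched below. It assumes the sync worker wraps each step in a broad `except`; the worker's real structure at src/app_controller.py:1480 is not shown, and `rag_sync_step` and the `rag_sync` logger name are hypothetical. On its own, `str(exc)` only reports the message, whereas `traceback.format_exc()` records the file, line, and function of every frame:

```python
import logging
import traceback
from typing import Callable

log = logging.getLogger("rag_sync")  # hypothetical logger name


def rag_sync_step(step: Callable[[], object]) -> bool:
    """Run one sync step; on failure log the full traceback, not just str(exc).

    Temporary diagnostic: remove the traceback logging once the call site is fixed.
    """
    try:
        step()
        return True
    except Exception:
        # format_exc() includes every frame (file, line, function name), so
        # the exact call site of the failing .get() shows up in the log.
        log.error("RAG sync step failed:\n%s", traceback.format_exc())
        return False
```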

blocks: data_structure_strengthening_20260606 (cleaner codebase makes type-alias replacement easier) and the user's stated send_result -> send mass rename.

Out of scope (deferred to separate tracks): the send_result -> send mass rename (user's stated manual refactor), 23 lower-impact weak-type files (data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate track), RAG test quality cleanup (poll loops, etc.; separate track).

Track: Tier 2 Autonomous Sandbox (unattended track execution with bounded blast radius) [track-created: 2026-06-16] [shipped: 2026-06-16]

Link: ./tracks/tier2_autonomous_sandbox_20260616/, Spec: ./tracks/tier2_autonomous_sandbox_20260616/spec.md, Plan: ./tracks/tier2_autonomous_sandbox_20260616/plan.md, Metadata: ./tracks/tier2_autonomous_sandbox_20260616/metadata.json, Guide: ../../docs/guide_tier2_autonomous.md

Status: 2026-06-16 — SHIPPED. 9 phases, 19 failcount tests (100% coverage), 8 report writer tests (100% coverage), 12 slash-command contract tests, 3 opt-in sandbox tests, 1 smoke e2e test (double-gated). Meta-tooling track — adds a sibling clone + 3-layer enforcement stack (OpenCode permissions + Windows restricted token + git hooks) for unattended Tier 2 execution. No permission: ask prompts during a normal run. 4 hard git bans enforced (git restore, git push*, git checkout, git reset); failcount threshold gives up after 3 red/green failures or 30 min no-progress, writes a markdown failure report with 7 sections + .STOPPED flag.
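The give-up rule can be sketched as below. The sketch assumes that "failures" means consecutive red results and that a green result resets the count. The real scripts/tier2/failcount.py, its state.json format and its TOML thresholds are not reproduced here, and `FailCounter` is hypothetical. The thresholds default to the 3 failures and 30 minutes stated above, and `now` can be injected so the rule can be tested without waiting:

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailCounter:
    """Hypothetical give-up rule: stop after `max_failures` consecutive red
    results, or after `stall_seconds` without a green result."""

    max_failures: int = 3
    stall_seconds: float = 30 * 60
    failures: int = 0
    last_progress: float = field(default_factory=time.monotonic)

    def record(self, passed: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if passed:
            self.failures = 0          # a green result resets the streak
            self.last_progress = now   # and counts as progress
        else:
            self.failures += 1

    def should_stop(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        too_many = self.failures >= self.max_failures
        stalled = now - self.last_progress >= self.stall_seconds
        return too_many or stalled
```

When `should_stop()` turns true, the runner is the point at which this track writes the 7-section failure report and the .STOPPED flag.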

Goal: Eliminate the permission: ask bottleneck for well-regularized tracks (TDD red/green with atomic per-task commits) by running Tier 2 unattended in a sibling clone at C:\projects\manual_slop_tier2\. Bounded blast radius via 3-layer enforcement; bounded run via failcount threshold; auditable via per-run state.json + (on give-up) markdown failure report.

Deliverables: 7 new files in main repo (scripts/tier2/{__init__.py, failcount.py, failcount.toml, write_report.py, run_track.py, setup_tier2_clone.ps1, run_tier2_sandboxed.ps1} + 3 templates in conductor/tier2/ + 2 git hooks in conductor/tier2/githooks/ + 1 user guide docs/guide_tier2_autonomous.md) + 5 new test files + 1 trivial smoke track fixture in tests/artifacts/. pyproject.toml gets 2 new pytest markers (tier2_sandbox, tier2_smoke). The main repo's opencode.json is UNTOUCHED — Tier 1 retains its permission: ask workflow.

Test inventory: 19 failcount unit tests (default-on; 100% coverage on scripts/tier2/failcount.py); 8 report writer tests (opt-in via TIER2_SANDBOX_TESTS=1; 100% coverage on scripts/tier2/write_report.py); 12 slash command spec contract tests (default-on); 1 bootstrap -WhatIf test (opt-in); 1 sandbox enforcement pre-push hook test (opt-in); 1 smoke e2e test (double-gated).
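Opt-in tests such as the report writer tests are gated on an environment variable. Below is a minimal sketch of that kind of gate. It assumes the value must be exactly `"1"`; the real check may accept other values, and `opt_in_enabled` is hypothetical:

```python
import os
from typing import Mapping, Optional


def opt_in_enabled(var: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when `var` is set to exactly "1" in `env` (default: os.environ)."""
    source = os.environ if env is None else env
    return source.get(var) == "1"


# Typical pytest wiring (kept as a comment so this block runs without pytest):
#   pytestmark = pytest.mark.skipif(
#       not opt_in_enabled("TIER2_SANDBOX_TESTS"),
#       reason="set TIER2_SANDBOX_TESTS=1 to run the Tier 2 sandbox tests",
#   )
```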

blocks: None (meta-tooling; no source code impact on the Manual Slop app).

Track: Rename send_result to send (sandbox test track) [track-created: 2026-06-16] [shipped: 2026-06-17]

Link: ./tracks/send_result_to_send_20260616/, Spec: ./tracks/send_result_to_send_20260616/spec.md, Plan: ./tracks/send_result_to_send_20260616/plan.md, Metadata: ./tracks/send_result_to_send_20260616/metadata.json

Status: 2026-06-17 - SHIPPED. 6 phases, 10 atomic rename commits + 12 plan/script commits (22 total). The FIRST end-to-end test of the tier2_autonomous_sandbox_20260616 sandbox. Refactor track (mechanical rename; no behavior change). Scope: 37 files modified (6 src/ + 27 tests/ + 3 docs + 1 metadata/state); 0 files added, 0 files deleted. Spec estimated 38 files; actual 37 (test_deprecation_warnings.py no longer exists in the repo).

Goal: Revert the 2026-06-15 public_api_migration rename (ai_client.send -> ai_client.send_result) back to ai_client.send. The migration was driven by the data-oriented error handling convention; the user wants the shorter name now that the Tier 2 autonomous sandbox can do the rename safely. Pure mechanical rename across 37 files + a surgical rewrite of one stale deprecation section in error_handling.md.

Deliverables: 0 new files, 0 deleted files. The 22 commits include 10 atomic rename commits (1 in src/ai_client.py + 1 batch in 5 other src/ + 5 per-file in top 5 tests + 1 batch in 22 remaining tests + 1 in 3 docs) and 12 plan/script commits (audit trail + helper scripts). The audit_tier2 subdirectory in scripts/tier2/ accumulates the rename + plan-update helper scripts as a record of the mechanical change pattern.
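At its core, a mechanical rename like this is a whole-identifier substitution. The version below is a hypothetical minimal one and is not one of the scripts in scripts/tier2/audit_tier2/. The `\b` anchors stop it from touching longer identifiers that merely contain `send_result`, because the underscore counts as a word character:

```python
import re

# Matches `send_result` only as a whole identifier: not `_send_result_cache`,
# not `send_result_async`.
_PATTERN = re.compile(r"\bsend_result\b")


def rename_send_result(text: str) -> tuple[str, int]:
    """Return (new_text, number_of_replacements)."""
    return _PATTERN.subn("send", text)
```

Because the substitution works on raw text, it also rewrites occurrences in comments and docstrings, which suits a rename that includes docs. The diff still needs a review before committing.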

Test inventory: 100/101 tests pass in the 26 files directly affected by the rename. 1 pre-existing failure (test_headless_service.py::test_generate_endpoint) unrelated to the rename - confirmed by running the same test against origin/master baseline where it also fails (missing credentials.toml). 7 broader suite failures are all pre-existing credentials.toml issues, also confirmed against origin/master.

blocks: None (independent refactor + sandbox test).

Track: Tier 2 Sandbox - Move State/Failures Off AppData [track-created: 2026-06-18] [shipped: 2026-06-18]

Link: ./tracks/tier2_no_appdata_20260618/, Spec: ./tracks/tier2_no_appdata_20260618/spec.md, Plan: ./tracks/tier2_no_appdata_20260618/plan.md, Metadata: ./tracks/tier2_no_appdata_20260618/metadata.json

Status: 2026-06-18 - SHIPPED. 6 phases, 16 atomic commits (no test commits; the test changes ride with the source changes since the tests assert the source contract). Configuration-only fix: no behavior change in product code. Scope: 11 source files modified (5 scripts/tier2/ + 2 conductor/tier2/* + 2 docs/* + 1 conductor/* + 1 .gitignore) + 2 test files modified + 1 new test added.

Goal: Per the user's 2026-06-18 'NEVER USE APPDATA' directive, move the Tier 2 failcount state and failure-report locations inside the Tier 2 clone (scripts/tier2/state//state.json and scripts/tier2/failures/_.md). Remove every AppData reference from the Tier 2 conventions, permissions, scripts, docs, and tests. After this track, the C:\Users\Ed\AppData\... tree is never referenced by the Tier 2 sandbox in any form.

Deliverables: 0 new files, 0 deleted files. The 16 commits include 4 source code changes (failcount.py + write_report.py + run_track.py + opencode.json.fragment), 2 prompt changes (agent + slash command), 2 bootstrap-script changes (setup + sandboxed launcher), 5 doc/test changes (guide + workflow + write_track_completion_report + slash_command_spec + no_temp_writes), 1 .gitignore, 1 write_track_completion_report output, and 1 last-minute example fix caught by the test. The track-isolated directories (scripts/tier2/state/ and scripts/tier2/failures/) are gitignored so they never pollute the source tree.
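The sketch below shows one way a script can keep these directories inside the clone and refuse anything under AppData. It assumes the clone root is passed in explicitly; `tier2_dirs` and `is_under_appdata` are hypothetical, and the per-run file names inside each directory are not reproduced here. Using `PureWindowsPath` lets the check run on any OS:

```python
from pathlib import PurePath, PureWindowsPath


def is_under_appdata(path: PurePath) -> bool:
    """True if any component of `path` is 'AppData' (case-insensitive)."""
    return any(part.lower() == "appdata" for part in path.parts)


def tier2_dirs(clone_root: PurePath) -> dict[str, PurePath]:
    """State and failure-report directories, always inside the clone."""
    base = clone_root / "scripts" / "tier2"
    dirs = {"state": base / "state", "failures": base / "failures"}
    for name, path in dirs.items():
        if is_under_appdata(path):
            raise ValueError(f"{name} directory must not live under AppData: {path}")
    return dirs


# The clone location named in the Tier 2 sandbox track.
CLONE_ROOT = PureWindowsPath(r"C:\projects\manual_slop_tier2")
```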

Test inventory: 37 default-on tests pass (test_failcount.py: 19; test_tier2_slash_command_spec.py: 14 + 1 new = 15; test_no_temp_writes.py: 1; the test_tier2_report_writer.py 8 tests are opt-in via TIER2_SANDBOX_TESTS=1 and pass when enabled). audit_no_temp_writes.py --strict exits 0. No regressions.

blocks: None. Followup: the user re-runs pwsh -File scripts/tier2/setup_tier2_clone.ps1 to re-bootstrap the live Tier 2 clone with the new conventions.

Track: Exception Handling Audit (Convention Compliance + Doc Clarification) [track-created: 2026-06-16]

Link: ./tracks/exception_handling_audit_20260616/, Spec: ./tracks/exception_handling_audit_20260616/spec.md, Plan: ./tracks/exception_handling_audit_20260616/plan.md, Metadata: ./tracks/exception_handling_audit_20260616/metadata.json, Report: ../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md

Status: 2026-06-16 - Active (not yet archived); all 5/5 phases (~12 tasks) complete. An AUDIT + DOC track (no production code change). The deliverables are the audit script, the report, and 3 doc/codestyle updates that close 5 gaps in the convention's documentation.

Goal: produce a static analyzer that classifies every try/except/finally/raise site in the codebase against the data-oriented error handling convention established by data_oriented_error_handling_20260606 (shipped 2026-06-12). The audit's value is in the report + the doc clarification, not in a refactor.

Deliverables:

  • scripts/audit_exception_handling.py — 792-line AST-based static analyzer; 10-category classification taxonomy (5 compliant + 3 violation + 1 suspicious + 1 unclear); --json, --top, --verbose, --strict, --include-tests modes; "delete to turn off" per feature_flags.md
  • conductor/code_styleguides/error_handling.md — 5 new sections (Boundary Types, The Broad-Except Distinction, Constructors Can Raise, Re-Raise Patterns, Audit Script) closing 5 gaps the audit revealed
  • docs/guide_app_controller.md — new "Exception Handling" section explaining the 13 FastAPI boundary sites + the 40 migration-target sites
  • conductor/product-guidelines.md — cross-reference to the audit script
  • docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md — 9-section report (370 lines) for the user to decide the next track

Headline numbers: 348 total sites across 65 files. 80 compliant (23%) + 25 suspicious (7%) + 211 violation (61%) + 32 unclear (9%). The 3 refactored baseline files (mcp_client, ai_client, rag_engine) have 112 sites / 77 violations (the convention reference; remaining violations are mostly broad-catches without ErrorInfo conversion). The 62 migration-target files have 236 sites / 134 violations (the work for future refactor tracks).

5 gaps the audit revealed + closed:

  • G1: FastAPI HTTPException in _api_* handlers not explicitly documented as a legitimate boundary (closed in styleguide + app_controller doc)
  • G2: The "broad except Exception" rule doesn't distinguish between "swallow" and "convert to ErrorInfo" (closed in styleguide)
  • G3: The "constructors can raise" rule is brief; needs elaboration (closed in styleguide)
  • G4: The "re-raise" pattern is not in the styleguide at all (closed in styleguide)
  • G5: The new audit script is not referenced from the styleguide (closed in styleguide + product-guidelines.md)

Critical audit findings (2026-06-16): The convention is applied to 3 of 65 src/ files (mcp_client.py, ai_client.py, rag_engine.py: the "baseline"). The remaining 62 files in src/ are in the "migration-target" state. The top 3 candidates by violation count: src/gui_2.py (37 violations, 260KB), src/app_controller.py (35 violations + 13 FastAPI boundary = 48 sites, 166KB), src/session_logger.py (8 violations, 16KB). The user decides which is the next refactor track.

blocks: app_controller_result_migration_20260616 (recommended next track; 22 migration-target sites in app_controller.py after excluding the 13 FastAPI boundary sites; 2-3 days Tier 2), gui_2_result_migration (37 violations; 2-3 days Tier 2), session_logger_result_migration (8 violations; 0.5 day Tier 2). Also unblocks the user's stated send_result -> send mass rename and the planned data_structure_strengthening_20260606 track.

Out of scope (deferred to separate tracks): the send_result -> send mass rename (user's stated manual refactor), 23 lower-impact weak-type files (data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate track), RAG test quality cleanup (poll loops; separate track), and, most importantly, any production code refactor (this track is informational; the user decides what to migrate).

Track: Result Migration (5 sub-tracks) [track-created: 2026-06-16]

Link: ./tracks/result_migration_20260616/, Spec: ./tracks/result_migration_20260616/spec.md, Plan: ./tracks/result_migration_20260616/plan.md, Metadata: ./tracks/result_migration_20260616/metadata.json, Audit: ../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md

Status: 2026-06-16 — Umbrella track; spec/plan/metadata planned. 2026-06-17 update: sub-track 1 (result_migration_review_pass_20260617) shipped; sub-track 2 (result_migration_small_files_20260617) initialized; 3 sub-tracks remaining. The umbrella specifies the sequence and scope of the 5 sub-tracks; each sub-track gets its own spec/plan/metadata when it starts.

Goal: Eliminate all 211 violations + 25 suspicious + 32 unclear = 268 "bad" sites across 42 files (per the exception_handling_audit_20260616 report). After all 5 sub-tracks ship, the data-oriented error handling convention is fully applied to all 65 src/ files, and the audit_exception_handling.py --strict mode can be wired into CI as a pre-commit gate.
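Wiring `--strict` into a gate relies only on its exit code. Below is a minimal sketch of a gate runner. The command list and `failing_gates` are hypothetical, and the real CI or pre-commit configuration is not part of this track:

```python
import subprocess
import sys
from typing import Sequence


def failing_gates(commands: Sequence[Sequence[str]]) -> list[list[str]]:
    """Run each command and return the ones that exited non-zero."""
    failed = []
    for cmd in commands:
        if subprocess.run(list(cmd), check=False).returncode != 0:
            failed.append(list(cmd))
    return failed


# Hypothetical gate list: the audit in strict mode, run from the repo root.
GATES = [
    [sys.executable, "scripts/audit_exception_handling.py", "--strict"],
]
```

A pre-commit hook would call `failing_gates(GATES)` and exit non-zero whenever the returned list is non-empty.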

5 sub-tracks (consistent result_migration_* prefix):

| # | Sub-track | Size | Scope | Why this position |
|---|---|---|---|---|
| 1 | result_migration_review_pass | S | 57 sites (32 UNCLEAR + 25 INTERNAL_RETHROW) across 15 files | First: human review + audit script heuristic updates inform all later sub-tracks |
| 2 | result_migration_small_files | L | 37 files (35 SMALL + 2 MEDIUM from --by-size); 72 V+S sites | Second: quick wins; doesn't depend on the orchestrator or GUI; can run in parallel with 3-4 |
| 3 | result_migration_app_controller | XL | 56 sites in src/app_controller.py (166KB; the 13 FastAPI boundary sites stay as-is). Phase 6 added 2026-06-18 to fix the 28 silent-swallow sites that Phase 3's logging.debug migration didn't actually migrate (audit gate: --strict exits 0) | Third: high coordination with Hook API + MMA + RAG; gates the GUI migration |
| 4 | result_migration_gui_2 | XL | 55 sites in src/gui_2.py (260KB; 14 ? includes the +1 site src/gui_2.py:1349 from the review pass) | Fourth: depends on 3 for clean API; the largest file |
| 5 | result_migration_baseline_cleanup | L | 112 sites in 3 refactored files (mcp_client.py, ai_client.py, rag_engine.py) | Fifth: closes the gaps in the convention reference; parent's Path C deferred work |

Total: 5 sub-tracks, 268 sites across 42 files, ~2100 lines changed.

NO day estimates (per the new Tier 1 rule added 2026-06-16). Effort is measured by scope (N files, M sites) only. The user / Tier 2 agent decides the actual pacing.

Sequence: 1 (review) -> 2 (small files) -> 3 (app_controller) -> 4 (gui_2) -> 5 (baseline cleanup). Tracks 2 + 5 can run in parallel; tracks 3 + 4 must be sequential (the GUI calls controller methods); track 1 has no prerequisites of its own.
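The standard library's `graphlib` can check the ordering constraints mechanically. The edges below are one reading of this section: every sub-track waits on the review pass (per the table's "inform all later sub-tracks"), and gui_2 waits on app_controller. Treat them as an assumption, not as a second source of truth:

```python
from graphlib import TopologicalSorter
from typing import Mapping

# Assumed dependency edges: node -> set of sub-tracks it waits on.
DEPENDS_ON = {
    "small_files": {"review_pass"},
    "app_controller": {"review_pass"},
    "gui_2": {"app_controller"},
    "baseline_cleanup": {"review_pass"},
}


def parallel_batches(graph: Mapping[str, set[str]]) -> list[list[str]]:
    """Group nodes into batches; everything in a batch may run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the edges contradict each other
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

The second batch lists the sub-tracks that may run in parallel once the review pass ships. gui_2 only becomes ready after app_controller.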

blocks: data_structure_strengthening_20260606 (parallel track; uses the cleaner Result API from this phase) and the user's stated send_result -> send mass rename.

Out of scope (deferred to separate tracks): the send_result -> send mass rename (user's stated manual refactor; post-this-phase), 23 lower-impact weak-type files (data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate track), RAG test quality cleanup (poll loops; separate track), and any audit script changes that belong in the review pass (sub-track 1); those are detailed in conductor/tracks/result_migration_20260616/plan.md.


Track: Live GUI Test Infrastructure Fixes (test_execution_sim_live crash + test_live_gui_workspace_exists race) [track-created: 2026-06-18] [shipped: 2026-06-18]

Link: ./tracks/live_gui_test_fixes_20260618/, Spec: ./tracks/live_gui_test_fixes_20260618/spec.md, Plan: ./tracks/live_gui_test_fixes_20260618/plan.md, Metadata: ./tracks/live_gui_test_fixes_20260618/metadata.json, Report: ../../docs/reports/TRACK_COMPLETION_live_gui_test_fixes_20260618.md

Status: 2026-06-18 - SHIPPED. 4 phases, 8 atomic commits (1 setup + 4 TDD/test/fix + 2 docs + 1 audit). Pre-conditions for sub-track 2's full closure. Scope: 2 issues fixed; 2 src files modified + 2 test files extended + 1 conftest modified + 2 docs + 2 audit logs. Test result: 11/11 tiers PASS clean (~825s total).

Goal: Fix the 2 documented test infrastructure issues that blocked sub-track 2 (result_migration_small_files_20260617) from full closure. The 2 issues were reported as "documented issues" by sub-track 2 Phase 13 (commit 30ca3265). Both are pre-existing (not regressions from the Result[T] migration).

The 2 fixes:

Issue 1: test_execution_sim_live GUI subprocess crash (tier-3-live_gui)

  • Symptom: GUI subprocess (port 8999) crashes mid-test with 0xC00000FD = STATUS_STACK_OVERFLOW
  • Root cause: imgui.set_window_focus("Response") was called directly during the response panel render, exhausting the GUI main thread's 1.94 MB stack on Windows
  • Fix: defer the focus call to the next frame's idle phase via a new _pending_focus_response flag (commits d02c6d56, 0f796d7d)
  • Same root cause as test_z_negative_flows.py (documented in docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md)
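The deferred-focus fix follows a common immediate-mode GUI pattern: record the request during render, then act on it in a later phase of the frame loop. The minimal sketch below assumes a frame loop with a distinct idle phase. The class and method names are hypothetical; only the `_pending_focus_response` flag name comes from the track:

```python
from typing import Callable


class ResponsePanel:
    """Deferred-focus sketch: render only records the request; the focus call
    runs later, outside the render call stack.

    `focus_window` stands in for the real GUI call (the track names
    imgui.set_window_focus).
    """

    def __init__(self, focus_window: Callable[[str], None]) -> None:
        self._focus_window = focus_window
        self._pending_focus_response = False

    def render(self, new_response: bool) -> None:
        # Inside the panel's render: never call focus_window() here.
        if new_response:
            self._pending_focus_response = True

    def idle_phase(self) -> None:
        # Next frame's idle phase: consume the flag exactly once.
        if self._pending_focus_response:
            self._pending_focus_response = False
            self._focus_window("Response")
```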

Issue 2: test_live_gui_workspace_exists xdist race (tier-1-unit-gui)

  • Symptom: xdist race where the owner worker's teardown removes the shared workspace path before a client worker's test can assert it exists
  • Root cause: live_gui_workspace fixture in tests/conftest.py:727 returned handle.workspace without ensuring the path existed
  • Fix: call workspace.mkdir(parents=True, exist_ok=True) before returning (commits 3fdb2592, bf6bc67b)
  • Pre-existing on parent commit 4ab7c732 (verified in tests/artifacts/PHASE14_PARENT_VERIFICATION.log)

Deliverables:

  • 1 setup commit (chore(scripts): relocate Tier 2 state paths to project-relative) - honors NEVER USE APPDATA directive; the failcount state and write_report failures directory now default to project-relative paths under tests/artifacts/
  • 2 TDD red + 2 TDD green commits (one pair per issue)
  • 1 audit commit (chore(audit): Phase 14.1 - verify Issue 2 on parent commit 4ab7c732)
  • 1 audit commit (chore(audit): Phase 4.1 - 11/11 test tiers PASS clean)
  • 2 docs commits (sub-track 2 reports updated with Phase 14 addendum)
  • 1 track artifact import commit (conductor(track): import live_gui_test_fixes_20260618 artifacts)

blocks: sub-track 2 of result_migration_20260616 (full closure requires the 2 issues fixed).

Out of scope (deferred to follow-up track): the 4 @pytest.mark.skip markers for Gemini 503 pre-existing failures (test_auto_aggregate_skip, test_view_mode_summary, test_view_mode_default_summary, test_view_mode_custom_empty_default_to_summary). To remove them, mock the Gemini API in summarize.summarise_file for tests.

Track: Test Sandbox Hardening (hard sandbox for tests; root-cause fix for test data loss) [track-created: 2026-06-19]

Link: ./tracks/test_sandbox_hardening_20260619/, Spec: ./tracks/test_sandbox_hardening_20260619/spec.md, Plan: ./tracks/test_sandbox_hardening_20260619/plan.md, Metadata: ./tracks/test_sandbox_hardening_20260619/metadata.json

Status: 2026-06-19 - SPEC + PLAN committed. Ready for Tier 2 implementation. 9 phases, 30 tasks, ~11 atomic commits.

Goal: Make any pytest or run_tests_batched.py invocation provably incapable of writing files outside ./tests/. Default-on Python guard + opt-in OS-level wrapper. Root-cause fix: eliminate the silent SLOP_CONFIG env-var fallback that lets tests accidentally touch the user's real manual_slop.toml and related top-level files.

The 5 enforcement layers:

  1. FR2 root-cause fix: src/paths.py:get_config_path() no longer silently falls back to <project_root>/config.toml when SLOP_CONFIG is unset. New API: paths.set_config_override(path). CLI flag --config <path> at the entry point (sloppy.py for production, conftest.py for tests).
  2. FR1 Python guard: sys.addaudithook autouse fixture blocks writes outside ./tests/ with RuntimeError("TEST_SANDBOX_VIOLATION: ..."). Hard fail; reads unaffected.
  3. FR3 isolation migration: isolate_workspace moved off tmp_path_factory.mktemp to tests/artifacts/_isolation_workspace_<RUN_ID>/. pyproject.toml adds addopts = "--basetemp=tests/artifacts/_pytest_tmp". All test infra paths now under ./tests/.
  4. FR4 static audit: scripts/audit_test_sandbox_violations.py flags hardcoded paths to top-level TOMLs + tempfile.mkdtemp/mkstemp without dir=. CI gate (--strict exits 1).
  5. FR5 OS-level wrapper: scripts/run_tests_sandboxed.ps1 (Windows restricted-token + Job Object; OPT-IN).
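The FR1 guard's policy can be sketched on top of the standard `open` audit event, whose arguments are `path, mode, flags` (`mode` is `None` when the event comes from `os.open`). This sketch rests on several assumptions and is not the project's fixture. `check_open`, `is_write_open` and `install_guard` are hypothetical. Other write paths, such as renames and deletions, raise different audit events that a real guard would also have to cover. Hooks added with `sys.addaudithook` cannot be removed, so this example installs one only when `install_guard` is called:

```python
import os
import sys
from pathlib import Path

# Flags that mean "this open may write"; O_RDONLY is 0, so plain reads never match.
_WRITE_FLAGS = os.O_WRONLY | os.O_RDWR | os.O_APPEND | os.O_CREAT | os.O_TRUNC


def is_write_open(mode, flags) -> bool:
    if isinstance(mode, str) and any(ch in mode for ch in "wax+"):
        return True
    return isinstance(flags, int) and bool(flags & _WRITE_FLAGS)


def check_open(event: str, args: tuple, sandbox: Path) -> None:
    """Audit-hook body: raise on write-mode opens outside `sandbox`."""
    if event != "open":
        return
    path, mode, flags = args
    if not isinstance(path, (str, bytes, os.PathLike)):
        return  # integer file descriptors: nothing to check
    if not is_write_open(mode, flags):
        return  # reads are unaffected
    target = Path(os.fsdecode(path)).resolve()
    if not target.is_relative_to(sandbox):
        raise RuntimeError(f"TEST_SANDBOX_VIOLATION: write to {target}")


def install_guard(sandbox: Path) -> None:
    """Install the hook for the rest of the process (it cannot be removed)."""
    root = sandbox.resolve()
    sys.addaudithook(lambda event, args: check_open(event, args, root))
```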

User directives (locked 2026-06-19):

  • NO ENV VARS for config path. --config CLI flag is the only override mechanism.
  • Test workspace file naming: config_overrides.toml (per user direction).
  • Hard fail on any sandbox violation (no warnings, no soft fails).
  • Tests should never need AppData temp.
  • Out of scope (deferred to follow-up tracks): converting the other 8 SLOP_* env vars (SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES, SLOP_CREDENTIALS, SLOP_MCP_ENV, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR); the user considers this the "mess" to address separately.
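Below is a sketch of one way to follow these directives with no hidden default: an explicit setter plus a `--config` flag, and no environment lookup anywhere. It assumes the entry point owns the default. Apart from `set_config_override`, `get_config_path` and `--config`, all names are hypothetical, and `"manual_slop.toml"` is a placeholder; the real resolution logic in src/paths.py is not shown here:

```python
import argparse
from pathlib import Path
from typing import Optional

_config_override: Optional[Path] = None


def set_config_override(path: Optional[Path]) -> None:
    """Set (or clear, with None) the config path for this process."""
    global _config_override
    _config_override = path


def get_config_path() -> Path:
    """No environment lookup and no implicit fallback: fail loudly instead."""
    if _config_override is None:
        raise RuntimeError("config path not set; pass --config or call set_config_override()")
    return _config_override


def main(argv: Optional[list[str]] = None) -> Path:
    # Entry-point sketch: the default is visible here, at the call site.
    # "manual_slop.toml" is a hypothetical placeholder default.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=Path, default=Path("manual_slop.toml"))
    args = parser.parse_args(argv)
    set_config_override(args.config)
    return get_config_path()
```

In tests, conftest.py would call `set_config_override(workspace / "config_overrides.toml")` before anything reads the config. That way, the config a test reads is always the one the fixture chose.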

Baseline (per result_migration_small_files_20260617 shipped 2026-06-18): 1288 passed + 4 xdist-skipped. VC8 requires no regression vs. this baseline.

Root causes of data loss (per Phase 1 audit):

  1. src/paths.py:get_config_path() at line 42 silently falls back to <project_root>/config.toml when SLOP_CONFIG is unset (the default for tests). This is the silent default that bites.
  2. tests/conftest.py:isolate_workspace at line 265 uses tmp_path_factory.mktemp which lives in %TEMP%\pytest-of-<user>\ on Windows — outside ./tests/.
  3. The FR1 Python guard is the runtime safety net; FR2 + FR3 are the proper fixes.

Deferred follow-up tracks (per metadata.json deferred_to_followup_tracks):

  • Convert the other 8 SLOP_* env vars to CLI flags (same pattern: paths.set_<thing>_override() + entry-point flag).
  • macOS/Linux OS-level sandbox wrapper (run_tests_sandboxed.sh using bwrap/unshare).
  • Per-fixture sandbox strictness tuning (@pytest.fixture(sandbox_strict=True)).
  • Read-side isolation (block reads of real config from tests).

Phase 9: Chore Tracks

Initialized: 2026-06-07

Completed (recently archived or in tracks/)


Active Research Tracks (2026-06+)

Tracks that produce a research deliverable (a markdown report) rather than application code. These are non-implementation tracks by design.

Active

  • Track: Fable System Prompt Review (Critical Analysis) [initialized: 058e2c93; shipped: 2026-06-18]
    Link: ./tracks/fable_review_20260617/, Spec: ./tracks/fable_review_20260617/spec.md, Metadata: ./tracks/fable_review_20260617/metadata.json, State: ./tracks/fable_review_20260617/state.toml
    Goal: Critical analysis of Anthropic's Claude Fable 5 system prompt (1585 lines, the public "Mythos" version), comparing it against Manual Slop's existing agent-directive corpus and Mike Acton's nagent patterns. 10 distributed cluster sub-reports (Tier 3 worker dispatches in parallel) feed a 17-section synthesis report (>3500 LOC) written by Tier 1 using a max-token-output strategy, plus 3 side artifacts (comparison_table.md, decisions.md for the deferred nagent-rebuild, nagent_takeaways_fable_20260617.md). Verdict framework: Useful / Persona Performance / Anti-User / Mixed.
    Hard rule (per user 2026-06-17): docs/artifacts/Fable System Prompt.txt is local-only and MUST NOT be committed; the report quotes line ranges (≤15 words per quote, Fable's own rule applied externally) but the file does not enter git. No day estimates. No T-shirt sizes. Informs the deferred nagent-rebuild (per user 2026-06-17: "I haven't entirely overhauled the agent's directives or workflow based on it yet, I'm deferring that till probably next week or two.").
    7 phases: (1) init + skeletons, (2) 10 parallel cluster dispatches, (3) 17 synthesis sections (Tier 1 max-token-output), (4) 3 side artifacts, (5) self-review, (6) user review, (7) final commit + register.
    SHIPPED 2026-06-18: 14 files, 5,683 LOC total (10 cluster sub-reports 3,278 LOC + synthesis report 1,800 LOC + 3 side artifacts 605 LOC). Verdict distribution: 47% Useful, 38% Persona, 15% Anti-User, 7% Mixed. 20 concrete recommendations in decisions.md (11 adoptions + 7 explicit rejections + 2 ignore). Fable-artifact discipline verified: 0 commits, 0 tracked files, 0 tree entries.
    Note: the synthesis report is 1,800 LOC, below the 3,500 LOC spec target; its content is complete, but the individual sections are less verbose than the spec called for. Track ready for archive (deferred per project convention).

Notes

Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.

Status legend:

  • [ ] not started
  • [~] in progress
  • [x] completed (track may still be in tracks/ or may have been moved to archive/)
  • ~~**...**~~ struck-through (renamed/replaced/superseded)

Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.

Editing this file: When you mark a track as [x] and move its folder to archive/, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase + dependency.