ed 6dd41b3e6d conductor(plan): mark result_migration_baseline_cleanup_20260620 as active

TIER-2 READ conductor/code_styleguides/error_handling.md end-to-end before Phase 0.

Task 0.1 (Phase 0): update conductor/tracks.md row 32 from
'ready to start' to 'active 2026-06-20'.

2026-06-20 08:07:59 -04:00


Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in ../archive/<track_name>/ for completed tracks).

Structure:

Active Tracks (Current Queue): In-flight and unblocked work the implementer can pick up today.
Phase 0 - 9 (Chronological): The full project history in chronological order. Each phase has three sub-sections: Active (work in progress), Completed (work shipped but track not yet archived), Archived (track folder moved to archive/).

Archive directories live at ../archive/<track_name>/ (from this file's location at conductor/tracks.md); the ./archive/... links in this file are relative to that location and resolve correctly.

Active Tracks (Current Queue)

Tracks that are unblocked and ready to start. Ordered by dependency (blocked-by first) and priority (A foundational → D forward-looking).

#	Priority	Track	Status	Blocked By
2	A	Qwen, Llama & Grok Vendor Integration + Capability Matrix	spec ✓, plan ✓, 50/79 tasks done; Phase 6 in progress (docs); NOT archiving — has follow-up track	test_infrastructure_hardening_20260609 (merged)
3	A	Data-Oriented Error Handling (Fleury Pattern)	spec ✓, plan ✓, ready to start	startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609 (merged), qwen_llama_grok
4	A	Data Structure Strengthening (Type Aliases + NamedTuples)	spec ✓, plan pending	test_infrastructure_hardening_20260609 (merged)
5	A	MCP Architecture Refactor (Sub-MCP Extraction)	spec ✓, plan pending	test_infrastructure_hardening_20260609 (merged), data_oriented_error_handling, data_structure_strengthening
6	D	Public API Result Migration	placeholder; not yet specced	data_oriented_error_handling (deprecated `send()`)
6a	A	Public API Migration + UI Polish Test Cleanup	spec ✓, plan ✓, shipped 2026-06-15 (13 pre-existing failures fixed; 3 RAG failures deferred to `rag_test_failures_20260615`)	(none — independent; NEW 2026-06-15; combined stability track)
6b	A	RAG Test Failures Fix	spec ✓, plan ✓, shipped 2026-06-15 (3 RAG tests fixed; first fully green baseline 1288 + 4 + 0)	(none — independent; NEW 2026-06-15; small bug-fix track)
6c	B	Exception Handling Audit (Convention Compliance + Doc Clarification)	spec ✓, plan ✓, shipped 2026-06-16 (211 violations identified across 42 files; 5 doc gaps closed)	(none — independent; NEW 2026-06-16; audit + doc track; identifies the migration target for `data_structure_strengthening_20260606` and the user's `send_result` → `send` rename)
6d	A	Result Migration (5 sub-tracks)	umbrella spec ✓; sub-tracks 1+2 initialized (sub-track 1: `result_migration_review_pass_20260617` shipped 2026-06-17; sub-track 2: `result_migration_small_files_20260617` initialized; 3 remaining)	`exception_handling_audit_20260616`; identifies the migration target
6d-1	A	Result Migration Sub-Track 1: Review Pass	spec ✓, plan ✓, metadata ✓, state ✓; shipped 2026-06-17 (43 sites classified: 23 compliant + 1 migration-target + 8 PATTERN_1/2 + 9 compliant + 1 audit-script-bug; 10 new heuristics added; 3 audit-script bugs documented)	`result_migration_20260616` (umbrella); `exception_handling_audit_20260616` (shipped 2026-06-16)
6d-2	A	Result Migration Sub-Track 2: Small Files + Audit-Script Bug Fixes	spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-18 (Phase 10 REJECTED for sliming 21 sites via 5 laundering heuristics; Phase 11 REDOES the 21 sites: 5 full Result migrations in warmup.py + 2 helper extracts + 14 documented; Phase 12 = ACTUAL full Result[T] migration: 16 sites in api_hooks.py + 27 sites in 16 small files; Heuristic #19 REMOVED; visit_Try bug FIXED; Heuristic D ADDED; Drain Points section in styleguide; Phase 12 REJECTED for false test claim; Phase 13 = script crash fixed (UTF-8 reconfigure in run_tests_batched.py) + 3 failures investigated on parent commit (0 regressions) + 4 pre-existing Gemini 503 tests documented with @pytest.mark.skip + test_execution_sim_live switched from gemini_cli to gemini per user directive (STILL FAILS, reported for diff track); 11/11 tiers actually run; 9 PASS clean + 2 PASS with documented issues)	`result_migration_20260616` (umbrella); `result_migration_review_pass_20260617` (shipped 2026-06-17)
6d-3	A	Result Migration Sub-Track 3: App Controller	spec ✓, plan ✓, metadata ✓, state ✓, active; migrates 45 sites in `src/app_controller.py` to `Result[T]` (32 INTERNAL_BROAD_CATCH + 8 INTERNAL_SILENT_SWALLOW + 4 INTERNAL_RETHROW + 1 INTERNAL_OPTIONAL_RETURN); 22 sites stay as-is (15 BOUNDARY_FASTAPI + 2 BOUNDARY_SDK + 4 INTERNAL_COMPLIANT + 1 INTERNAL_PROGRAMMER_RAISE). Phase 1 = fix the 2 known regressions (test_tool_presets_execution::test_tool_ask_approval + test_extended_sims::test_execution_sim_live) caused by the half-migrated `session_logger.log_tool_call` call site in `_offload_entry_payload` (lines 3715, 3721). 5-file-commit pattern from `doeh_test_thinking_cleanup_20260615` (1 source + 1 test + 1 plan + 1 metadata + 1 state per task). 6 phases: (1) Setup + fix regressions; (2) 32 broad-catch → 4 bulk batches; (3) 8 silent-swallow → 2 batches with logging.debug per Heuristic #19; (4) 4 rethrow classified + 1 optional migrated; (5) Verify + audit + end-of-track report.	`result_migration_20260616` (umbrella); `result_migration_small_files_20260617` (shipped 2026-06-18)
6d-4	A	Result Migration Sub-Track 4: gui_2.py	spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-20; migrated 42 sites in `src/gui_2.py` (25 INTERNAL_BROAD_CATCH + 13 INTERNAL_SILENT_SWALLOW + 2 INTERNAL_RETHROW + 2 UNCLEAR) to `Result[T]`; added 3 new drain-plane render functions + 1 new test file + 2 new audit heuristics (Phase 11 dunder raise + Phase 12 lazy-loading fallback). Audit: V=0, S=0, ?=0 for gui_2.py. 81 atomic commits across 13 phases; 114 tests pass; Tier 1+2 batched: 10/10 PASS; Tier 3: 1 known issue (FPS 28.46 vs 30 threshold; documented in TRACK_COMPLETION). Anti-sliming protocol: 13 phases cap each phase at <=10 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test.	`result_migration_app_controller_20260618` (sub-track 3, SHIPPED 2026-06-19 with Phase 7; data plane ready)
6d-5	A	Result Migration Sub-Track 5: Baseline Cleanup	spec ✓, plan ✓, metadata ✓, state ✓, active 2026-06-20; migrates 88 sites across 3 baseline files (`src/mcp_client.py` 46 + `src/ai_client.py` 33 + `src/rag_engine.py` 9) to make the convention reference 100% compliant. Same anti-sliming protocol as sub-track 4: 14 phases cap each phase at <=9 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test.	`result_migration_gui_2_20260619` (sub-track 4, SHIPPED 2026-06-20; first to ship without error correction per user)
6e	A (meta-tooling)	Tier 2 Autonomous Sandbox (unattended track execution)	spec ✓, plan ✓, shipped 2026-06-16 (9 phases, 24 default-on tests + 4 opt-in tests + 1 smoke e2e)	(none — independent; NEW 2026-06-16; meta-tooling; eliminates the `permission: ask` bottleneck for well-regularized tracks via a 3-layer enforcement stack: OpenCode permission system + Windows restricted token + git hooks)
6f	A (meta-tooling)	Tier 2 Sandbox File Leak Prevention (revert + 3-layer defense)	spec ✓, plan ✓, metadata ✓, state ✓, shipped 2026-06-20; selectively reverted the 4 user-named files from offender commit `00e5a3f2` (`.opencode/agents/tier2-autonomous.md`, `.opencode/commands/tier-2-auto-execute.md`, `opencode.json`, `mcp_paths.toml`); added 3-layer defense: pre-commit hook at `conductor/tier2/githooks/pre-commit` (auto-unstages forbidden files at commit boundary; 12 tests), `scripts/audit_tier2_leaks.py` (working-tree audit with `--strict` CI gate; 13 tests), wired hook installation into `scripts/tier2/setup_tier2_clone.ps1`. 25 default-on + 4 opt-in tests pass; 4 atomic commits (`fab2e55b` + `81e1fd7b` + `f5d8ea04` + `8f54deda`); user-driven response to a one-off incident (per user directive: tier-2 must NEVER commit those files again; NOT via gitignore). DEFERRED: CI wiring of audit `--strict` mode; rebase of stale tier-2 branches (`tier2/result_migration_app_controller_phase6_20260619`, `tier2/test_sandbox_hardening_20260619`) on `origin/master@8f54deda` to drop `00e5a3f2` (user action).	(none — independent; NEW 2026-06-20; meta-tooling fix; selective revert of 4 of 9 changes in offender commit `00e5a3f2`)
7	—	UI Polish (Five Issues)	spec ✓, plan ✓, ready to start (Phases 1/4/5 shipped; Phases 2/3 code shipped but tests broken — fixed by track 6a)	(none — independent)
7a	B	SQLite-Granularity Inline Docs for gui_2.py	spec ✓, plan ✓, complete	(none — independent)
7b	B	Continued SQLite-Granularity Inline Docs for gui_2.py	spec ✓, plan ✓, complete	(none — independent)
7c	B	SQLite-Granularity Inline Docs for ai_client.py	spec ✓, plan ✓, ready to start	(none — independent)
7d	A	Live GUI Test Infrastructure Fixes	spec ✓, plan ✓, metadata ✓, state ✓, active; addresses 2 issues reported for diff tracks by `result_migration_small_files_20260617` Phase 13: (1) `test_execution_sim_live` GUI subprocess (port 8999) crashes mid-test during script generation flow — same failure with both `gemini_cli` and `gemini`; NOT provider-specific; 90s timeout reached without AI text; (2) `test_live_gui_workspace_exists` xdist race — workspace cleanup timing under parallel xdist; passes in isolation. 4 phases: (1) Investigation + Issue 2 parent-commit verification; (2) Fix Issue 2 (TDD); (3) Fix Issue 1 (TDD + remove diagnostic logging); (4) Final verification (11/11 tiers PASS clean).	`result_migration_small_files_20260617` (shipped 2026-06-18 with the 2 issues reported for diff tracks)
16	A	Test Sandbox Hardening	spec ✓, plan ✓, metadata ✓, state ✓, ready to start; 5-part fix for test data loss outside `./tests/`. Phase 1: investigation + baseline pass count + audit of `get_config_path()` callers. Phase 2: `scripts/audit_test_sandbox_violations.py` (FR4 static audit + `--strict` CI gate). Phase 3: `_enforce_test_sandbox` autouse fixture in conftest.py using `sys.addaudithook` (FR1 Python guard; hard fail on any write outside `./tests/`). Phase 4: root-cause fix — remove `SLOP_CONFIG` env-var fallback from `src/paths.py`; add `--config <path>` CLI flag to sloppy.py + conftest.py; `set_config_override(path)` module-level API (FR2). Phase 5: `isolate_workspace` migration off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`; pyproject.toml `--basetemp` addopts; `SLOP_CREDENTIALS`/`SLOP_MCP_ENV` env vars added to non-live_gui tests; tech-stack.md dated note (FR3). Phase 6: `scripts/run_tests_sandboxed.ps1` (FR5 Windows restricted-token wrapper, OPT-IN). Phase 7: `conductor/code_styleguides/test_sandbox.md` + updates to workspace_paths.md and guide_testing.md (FR7 docs). Phase 8: full 11-tier verification. Phase 9: end-of-track report. 13 regression tests in `tests/test_test_sandbox.py`. ~11 atomic commits.	(none — independent; NEW 2026-06-19; test-infrastructure + root-cause fix; primary motivation: user has lost important sample data multiple times over the past month because tests wrote to top-level TOML files; NO ENV VARS for config path per user directive — `--config` CLI flag is the only override mechanism; test workspace file naming: `config_overrides.toml`; hard fail on any sandbox violation; tests should never need AppData temp (`tempfile.mkdtemp/mkstemp` without `dir=` is flagged); baseline 1288 + 4 + 0; out of scope: converting the other 8 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) to CLI flags — user considers this a separate "mess" to address in follow-up tracks; deferred: macOS/Linux OS-level wrapper, per-fixture sandbox strictness tuning, read-side isolation)
8	—	Bootstrap gencpp Python Bindings	spec TBD	(none — independent)
9	—	Tree-Sitter Lua MCP Tools	spec TBD	(none — independent)
10	—	GDScript Language Support Tools	spec TBD	(none — independent)
11	—	C# Language Support Tools	spec TBD	(none — independent)
12	—	OpenAI Provider Integration	spec TBD	(none — independent)
13	—	Zhipu AI (GLM) Provider Integration	spec TBD	(none — independent)
14	—	AI Provider Caching Optimization	spec TBD	(none — independent)
15	—	Manual UX Validation & Review	spec TBD	(none — independent)
15a	—	Manual UX Validation — ASCII-Sketch Workflow	spec ✓, plan ✓, ready to start	(none — independent; NEW 2026-06-08)
15b	—	Chunkification Optimization (Contingency)	spec ✓ (contingency), no plan	hard constraint surface (deferred)
16	—	GenCpp Dogfood Feedback Loop	spec TBD	(none — independent; oldest pending track)
17	—	Code Path Audit	spec TBD	test_infrastructure_hardening_20260609 (merged)
23	A (research)	Intent-Based Scripting Languages Survey	spec ✓, plan pending	(none — independent; NEW 2026-06-12; non-impl research track, time-sensitive: report must complete before nagent v2.2)
24	A (bugfix)	AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek)	spec ✓, plan ✓, shipped 2026-06-15 (with 1 critical `_api_generate` regression + 2 deferred bugs — see `doeh_test_thinking_cleanup_20260615`)	(none — independent; NEW 2026-06-14; user-blocking; 3 bugs from `data_oriented_error_handling_20260606`)
25	B (research)	Fable System Prompt Review (Critical Analysis)	spec ✓, plan pending	(none — independent; NEW 2026-06-17; non-impl research track, informs the deferred nagent-rebuild; 10 cluster sub-reports + 17-section synthesis report >3500 LOC + 3 side artifacts; Fable artifact at `docs/artifacts/Fable System Prompt.txt` is local-only and NEVER committed)
18	—	GUI Architecture Refinement	(no spec.md)	(TBD)
19	—	Context First Message Fix	spec TBD	(none — independent)
19	—	~~Fix Remaining Tests~~	~~SUPERSEDED by track 1~~	—
20	—	~~Test Harness Hardening~~	~~SUPERSEDED by track 1~~	—
21	—	~~Test Patch Fixes~~	~~SUPERSEDED by track 1~~	—
22	—	~~Test Batching Post-Refactor Polish~~	~~SUPERSEDED by track 1 (FR1 + FR2)~~	—
20	—	Prior Session Test Harden (20260605)	superseded; no action needed	—

Note on numbering: the legacy file used 0a, 0b, 0c... and 0d, 0e, 0f, 0g for tracks created on or after 2026-06-06. That numbering followed git-blame sort order, not a logical execution order; the new structure re-orders by dependency.
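
Sketch for track 16 (Test Sandbox Hardening), Phase 3: the row above describes a Python guard built on `sys.addaudithook` that hard-fails on any write outside `./tests/`. The following is a minimal illustration of that idea, not the track's implementation. `SANDBOX_ROOT`, `SandboxViolation`, `is_write_outside_sandbox`, and `install_sandbox_hook` are hypothetical names, and only the built-in `open()` path is covered.

```python
import sys
from pathlib import Path

# Assumption: the sandbox root is ./tests/ relative to the working directory.
SANDBOX_ROOT = Path("tests").resolve()

# Mode characters that imply a write when passed to the built-in open().
WRITE_MODE_CHARS = frozenset("wax+")


class SandboxViolation(RuntimeError):
    """Raised when a file is opened for writing outside the sandbox root."""


def is_write_outside_sandbox(path, mode, root=SANDBOX_ROOT):
    """Return True if (path, mode) is a write that lands outside root."""
    if not isinstance(path, (str, Path)) or not isinstance(mode, str):
        return False  # non-string paths or modes (e.g. file descriptors) are ignored in this sketch
    if not WRITE_MODE_CHARS.intersection(mode):
        return False  # read-only open
    resolved = Path(path).resolve()
    return resolved != root and root not in resolved.parents


def _audit_hook(event, args):
    # The "open" audit event carries (path, mode, flags).
    if event == "open" and is_write_outside_sandbox(args[0], args[1]):
        raise SandboxViolation(f"write outside {SANDBOX_ROOT}: {args[0]}")


def install_sandbox_hook():
    """Install the hook. Audit hooks cannot be removed once added."""
    sys.addaudithook(_audit_hook)
```

Because an audit hook stays installed for the life of the interpreter, a fixture built this way would likely install it once per session and toggle a flag per test rather than add and remove hooks.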

Phase 0: Infrastructure (Critical)

Initialized: 2026-02 (project foundation)

Completed

Track: Conductor Path Configuration Note: One-line entry; full details in ./tracks/conductor_path_configurable_20260306/ (still in tracks/; not yet archived).

Phase 1: Pre-Track Foundation (2026-02 - 2026-03)

No tracks were explicitly filed under Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.

Completed

Various one-off refactors; full details in conductor/archive/ by track name prefix.

Phase 2: Strict Execution Queue

Completed 2026-03-06

Completed

Track: Strict Execution Queue (Phase 2) See: ./archive/strict_execution_queue_completed_20260306/

Phase 3 - Phase 4: Foundational Tracks (March 2026)

Multiple sub-tracks under the initial feature-development push. All archived.

Archived

Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):

~~Track: Session Context Snapshots & Visibility~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/session_context_snapshots_20260311/
~~Track: Discussion Takes & Timeline Branching~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization) Link: ./archive/discussion_takes_branching_20260311/
Track: RAG Support Link: ./archive/rag_support_20260308/
Track: Agent Tool Preference & Bias Tuning Link: ./archive/tool_bias_tuning_20260308/
Track: Expanded Hook API & Headless Orchestration Link: ./archive/hook_api_expansion_20260308/
Track: Codebase Audit and Cleanup Link: ./archive/codebase_audit_20260308/
Track: Expanded Test Coverage and Stress Testing Link: ./archive/test_coverage_expansion_20260309/
Track: Beads Mode Integration Link: ./archive/beads_mode_20260309/
Track: Optimization pass for Data-Oriented Python heuristics Link: ./archive/data_oriented_optimization_20260312/
Track: Rich Thinking Trace Handling Link: ./archive/thinking_trace_handling_20260313/
Track: Smarter Aggregation with Sub-Agent Summarization Link: ./archive/aggregation_smarter_summaries_20260322/
Track: System Context Exposure Link: ./archive/system_context_exposure_20260322/
Track: Advanced Log Management and Session Restoration Link: ./archive/log_session_overhaul_20260308/
Track: UI Theme Overhaul & Style System Link: ./archive/ui_theme_overhaul_20260308/
Track: Selectable GUI Text & UX Improvements Link: ./archive/selectable_ui_text_20260308/
Track: Markdown Support & Syntax Highlighting Link: ./archive/markdown_highlighting_20260308/
Track: Custom Shader and Window Frame Support Link: ./archive/custom_shaders_20260309/
Track: UI/UX Improvements - Presets and AI Settings Link: ./archive/presets_ai_settings_ux_20260311/
Track: Discussion Hub Panel Reorganization Link: ./archive/discussion_hub_panel_reorganization_20260322/
Track: Undo/Redo History Support Link: ./archive/undo_redo_history_20260311/
Track: Advanced Text Viewer with Syntax Highlighting Link: ./archive/text_viewer_rich_rendering_20260313/
Track: Tree-Sitter C/C++ MCP Tools Link: ./archive/ts_cpp_tree_sitter_20260308/
Track: Saved System Prompt Presets Link: ./archive/saved_presets_20260308/
Track: Saved Tool Presets Link: ./archive/saved_tool_presets_20260308/
Track: External Text Editor Integration for Approvals Link: ./archive/external_editor_integration_20260308/
Track: Agent Personas: Unified Profiles & Tool Presets Link: ./archive/agent_personas_20260309/
Track: Advanced Workspace Docking & Layout Profiles Link: ./archive/workspace_profiles_20260310/
Track: Review investigation of codebase and expose/cull any hidden invisible prompting Link: ./archive/cull_hidden_prompts_20260502/
Track: Test Regression Verification Link: ./archive/test_regression_verification_20260307/

Phase 5: Codebase Curation

Initialized: 2026-05-07

Completed (all archived)

Analysis & Structural Review

Track: Comprehensive Path Mapping & Tooling Link: ./archive/ai_interaction_call_graph_20260507/ Goal: Automated and manual derivation of all major code paths and pipelines in the system.
Track: Controller State Mutation Matrix Link: ./archive/controller_state_mutation_matrix_20260507/ Goal: Comprehensive map of all methods that modify the AppController and App state.
Track: Source-Wide Redundancy Audit Link: ./archive/source_wide_redundancy_audit_20260507/ Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.
Track: Curate Provider Registries Link: ./archive/curate_provider_registries_20260507/ Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.
Track: Encapsulate AppController Status Link: ./archive/encapsulate_appcontroller_status_20260507/ Goal: Convert ai_status and mma_status to properties with thread-safe setters.
Track: Decouple GUI Log Loading Link: ./archive/decouple_gui_log_loading_20260507/ Goal: Move Tkinter directory selection out of AppController and into gui_2.py.
Track: Refactor Context Aggregation Pipeline Link: ./archive/refactor_context_aggregation_pipeline_20260507/ Goal: Modernize src/aggregate.py and consolidate legacy tier builders.
Track: Cull Unused Symbols Link: ./archive/cull_unused_symbols_20260507/ Goal: Safely remove the 27 dead symbols identified in the redundancy audit.
Track: Structural Dependency Mapping (SDM) Docstrings Link: ./archive/sdm_docstrings_20260509/
Track: AppController Curation & Structural Alignment Link: ./archive/app_controller_curation_20260513/ Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.
Track: Fix 45 failing test files across 12 batches Link: ./archive/fix_test_suite_failures_20260514/
Track: Fix Indentation 1-Space Convention Link: ./archive/fix_indentation_1space_20260516/ Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.
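
Sketch for the "Encapsulate AppController Status" track listed above: its goal is to expose `ai_status` and `mma_status` as properties with thread-safe setters. A minimal version of that pattern might look like the following. `StatusHolder` is a hypothetical stand-in for the controller, and the default values are assumptions.

```python
import threading


class StatusHolder:
    """Hypothetical stand-in for the controller's status fields."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ai_status = "idle"   # assumed default
        self._mma_status = "idle"  # assumed default

    @property
    def ai_status(self):
        with self._lock:
            return self._ai_status

    @ai_status.setter
    def ai_status(self, value):
        # Writers from any thread go through the same lock as readers.
        with self._lock:
            self._ai_status = value

    @property
    def mma_status(self):
        with self._lock:
            return self._mma_status

    @mma_status.setter
    def mma_status(self, value):
        with self._lock:
            self._mma_status = value
```

Callers keep plain attribute syntax (`holder.ai_status = "sending"`), so call sites do not change when a bare attribute becomes a locked property.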

Phase 6: Context Composition Redesign

Initialized: 2026-05-10

Completed (all archived)

Context Control & Workflow Enhancements

Track: Granular AST Control (Signatures vs. Definitions) Link: ./archive/granular_ast_control_20260510/ Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.
Track: Context Snapshotting per "Take" Link: ./archive/context_snapshotting_takes_20260510/ Goal: Snapshot and visually restore the Context Panel state when switching between Takes.
Track: Interactive Text Slice Highlighting Link: ./archive/interactive_text_slice_highlighting_20260510/ Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.
Track: Context Batch Operations UX Link: ./archive/context_batch_operations_ux_20260510/ Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.
Track: GenCpp Project Initialization Link: ./archive/gencpp_project_init_20260510/ Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.
Track: Interactive AST Tree Masking Link: ./archive/interactive_ast_tree_masking_20260510/ Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.
Track: Phase 6 Review and Regression Verification Link: ./archive/phase6_review_20260510/ Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.
Track: Context Composition Decoupling Link: ./archive/context_comp_decouple_20260510/ Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.
Track: Context Composition Slice Visualization Link: ./archive/context_comp_slices_20260510/ Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.
Track: GUI Refactor & Stabilization Link: ./archive/gui_refactor_stabilization_20260512/ Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.
Track: GUI 2 Large Cleanup (originally listed as "I started to do a large cleanup to ./src/gui_2.py..." — the long user message was the track description) Link: ./archive/gui_2_cleanup_20260513/ Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update product guidelines or the python code_styleguidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions with these python files.
Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap) Link: ./archive/python_structural_mcp_tools_20260513/
[~] Track: Context Preview & Slice Editor Fixes Link: ./tracks/context_preview_fixes_20260516/ Goal: Fix Preview button generating empty content, and Inspect/Slices buttons failing to open their respective editor panels. Status: in progress; track folder still in tracks/ (not yet archived).

Active

Track: GenCpp Dogfood Feedback Loop Link: ./tracks/gencpp_dogfood_feedback_20260510/ Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding. Status: oldest pending track (2026-05-10). Track folder still in tracks/.

Hot Reload Feature (2026-05-16)

Single-track feature, not part of a numbered Phase.

Archived

Track: Hot Reload Python Codebase (Phase 2) Link: ./archive/hot_reload_python_20260516/ Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.

Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)

Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to archive/.

Archived

Track: Phase 7 Stabilization and Polishing (Regressions Fix) Link: ./archive/phase7_stabilization_and_polishing_20260601/
Track: Phase 7 Monolithic Stabilization (Final Cleanup) Link: ./archive/phase7_monolithic_stabilization_20260602/

Late May 2026 - Early June 2026: One-Off Fixes and Polish

One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.

Archived

Track: Robust Live Simulation Verification
Track: Fix GUI Crashes in Tool Preset Manager and Discussion Hub Link: ./archive/gui_crash_fixes_20260531/
Track: Fix keys_down AttributeError in ImGui IO Link: ./archive/fix_imgui_keys_down_20260601/
Track: Selectable Thinking Monologs Link: ./archive/selectable_thinking_monologs_20260601/
Track: Fix MiniMax history sequencing and truncation Link: ./archive/minimax_history_fix_20260601/
Track: Preserve context selection on discussion switch and add empty context warning Link: ./archive/context_preservation_and_warnings_20260601/
Track: Fix Text Viewer docking conflicts and Tool Call row click interactivity Link: ./archive/text_viewer_and_tool_call_fixes_20260601/
Track: UX Refinements for Context Composition and Discussion Entries Link: ./archive/context_composition_ux_20260601/
Track: Combine AST Inspector and Slices Editor into a unified Structural File Editor Link: ./archive/structural_file_editor_20260601/
Track: Add per-response token metrics and AI-assisted history compression Link: ./archive/discussion_metrics_and_compression_20260601/
Track: Fix Approve Modal sizing and inline full preview Link: ./archive/approve_modal_ux_20260601/
Track: Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette. Link: ./archive/command_palette_and_performance_20260602/ Goal: Async context preview offload (background thread, state lock) + Command Palette (32 commands, fuzzy search, Ctrl+Shift+P, Up/Down/Enter nav, 13 unit + 7 live_gui tests). Phases 1-3 complete.
Track: Comprehensive Documentation Refresh Link: ./archive/documentation_refresh_comprehensive_20260602/ Goal: Refresh stale documentation across docs/. Completed: ASCII file tree updates (docs/Readme.md + Readme.md 5→14 guides, 22→53 src modules), docs/guide_testing.md (new, comprehensive 251-file test suite reference), 7 per-source-file guides (guide_gui_2.md, guide_ai_client.md, guide_api_hooks.md, guide_mcp_client.md, guide_app_controller.md, guide_multi_agent_conductor.md, guide_models.md). All 14 guides cross-linked. Gap analysis: ./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md.

Sub-tracks (all checkpointed):
- Sub-Track 1: Docs Layer Refresh [checkpoint: 20225c8] — 18 per-file atomic commits. 15 guides (8 refreshed + 7 new), Subsystem Index (24 entries), 106 cross-links all resolve, symbol parity fixed (apply_nerv_theme -> apply_nerv).
- Sub-Track 2: Conductor Docs Refresh [checkpoint: ef4efab2] — 4 per-file atomic commits: product.md (14 guides, MiniMax, Command Palette), tech-stack.md (MiniMax, Gemini Embedding 001), workflow.md (2026-06-02 doc refresh, 45-tool count), index.md (active track links).
- Sub-Track 3: Agent Config Refresh [checkpoint: 87f668a6] — 3 per-file atomic commits: AGENTS.md (5.4K -> 0.7K thin pointer), CLAUDE.md (6.7K -> 0.2K deprecation stub), GEMINI.md (5 providers, sloppy.py entry, 12 key modules). Drift check: 0 issues in 9 mirrored skill files.
Track: Test Consolidation & TOML Sandboxing [checkpoint: cb91006c] Spec: ./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-test-consolidation.md Goal: Audit tests for real-TOML usage, migrate offenders to sandboxed patterns. Added scripts/check_test_toml_paths.py audit script (CI gate). Migrated test_mcp_client_whitelist_enforcement to tmp_path (was the only offender). Skipped redundant enforce_no_real_toml fixture — existing isolate_workspace autouse + audit script provide equivalent coverage.

Phase 8: UI Polish (2026-06-03)

Initialized: 2026-06-03

User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.

Active

Track: UI Polish (Five Issues) Spec: ./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md Plan: ./../../docs/superpowers/plans/2026-06-03-ui-polish.md *Goal: Resolve five long-standing UI issues:
- Phase 1: GFM markdown table rendering (pre-processor into src/markdown_table.py, wire into MarkdownRenderer.render).
- Phase 2: Widen the Keep Pairs numeric input next to Truncate in the discussion panel (gui_2.py:3829, width 80 -> 140, switch to drag_int).
- Phase 3: Fix Refresh Registry button in Log Management — currently instantiates LogRegistry without calling load_registry() so the displayed table never reflects on-disk state (gui_2.py:1675).
- Phase 4: Add Vendor State tab to Operations Hub — at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new src/vendor_state.py aggregator + controller.vendor_quota field + ai_client wire-up).
- Phase 5: Files & Media > Files directory-grouped tree (re-use aggregate.group_files_by_dir, mirror render_context_files_table collapsible-node style).*
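
Sketch for Phase 3: the bug described there is that a fresh `LogRegistry` is constructed but `load_registry()` is never called, so the table shows an empty in-memory state. The class below is a hypothetical stand-in, not the real `LogRegistry`. The JSON file format, the `entries` attribute, and the `refresh_buggy` / `refresh_fixed` helpers are assumptions made only to show the shape of the problem.

```python
import json
import tempfile
from pathlib import Path


class LogRegistry:
    """Hypothetical stand-in: starts empty until load_registry() is called."""

    def __init__(self, path):
        self.path = Path(path)
        self.entries = {}

    def load_registry(self):
        # Read the on-disk state into memory (JSON is an assumption here).
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))


def refresh_buggy(path):
    # Constructs the registry but never loads it: entries stay empty.
    return LogRegistry(path)


def refresh_fixed(path):
    # Constructs the registry and loads the on-disk state before display.
    registry = LogRegistry(path)
    registry.load_registry()
    return registry
```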

Recently Archived (post-Phase 8)

Track: Clean Install Test [checkpoint: d14ae3b] Link: ./tracks/clean_install_test_20260603/, Spec: ./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md, Plan: ./../../docs/superpowers/plans/2026-06-02-clean-install-test.md Goal: Add opt-in pytest test (RUN_CLEAN_INSTALL_TEST=1) that clones the repo to tmp_path, runs uv sync, launches sloppy.py --enable-test-hooks, verifies Hook API responds. Catches "works on my machine" failures. Added clean_install marker to pyproject.toml. Created tests/test_clean_install.py (114 lines, uses urllib.request from stdlib per tech-stack.md dependency minimalism rule - deviation from plan). Skipped by default. Marked with @pytest.mark.clean_install.
Track: Fix markdown_helper.py for imgui-bundle >=1.92.801 [checkpoint: 7a34edf] Link: ./tracks/markdown_helper_language_api_compat_20260603/ Goal: First thing the clean install test caught. ed.TextEditor.LanguageDefinitionId enum was removed in imgui-bundle>=1.92.801. Replaced with version-compat shim helpers _get_language_id(name) and _set_editor_language(editor, lang_obj) that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel _editor_lang_cache to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit b306f8f corrected test URL /api/mma_status -> /api/gui/mma_status (actual endpoint per src/api_hooks.py:181).
Track: Multi-Theme TOML System (Multi-Themes Mod) [checkpoint: 38abf231] Link: ./tracks/multi_themes_20260604/, Plan: ./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md Goal: TOML-based theming: per-theme file layout (themes/<name>.toml global + <project>/project_themes.toml overrides), schema (syntax_palette + [colors] table of imgui.Col_ snake_case keys), public API (load_themes_from_disk, get_syntax_palette_for_theme, apply_syntax_palette), MarkdownRenderer calls apply_syntax_palette on init, color-callable convention (C_LBL() / C_VAL() so theme switches take effect at use site), upstream 4-syntax-palette limit documented in ./../../docs/guide_themes.md (new guide). 8 new theme files shipped. Theme-caused production bug fixed at src/gui_2.py:3705-3707 (commit 1469ecac): DIR_COLORS dict stored C_VAL not C_VAL(), so imgui.text_colored(d_col, ...) was being passed a function. Fixed by calling the function at the use site.
[~] Track: Test Regression Fixes (post multi-themes ship) [checkpoint: d7487af4] Link: ./tracks/regression_fixes_20260605/, Plan: ./../../docs/superpowers/plans/2026-06-05-regression-fixes.md Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (test_gui_progress C_LBL/C_VAL API change, 38abf231), pre-existing non-live_gui (test_gui_phase4 markdown_helper mocks, df43f158; test_view_presets persona_manager mock, 970f198c), GUI production bug (DIR_COLORS callable, 1469ecac), live_gui LogPruner busy loop (ac08ee87), RAG NoneType guard (c96bdb06). Root cause of remaining 10 live_gui failures identified (commit d7487af4): imgui.save_ini_settings_to_memory() at src/gui_2.py:601 crashes C-level (0xc0000005) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with _ini_capture_ready flag (defer-not-catch pattern): first call returns b"" and sets the flag, subsequent calls invoke the C function. Bisect anchors: 7df65dff (pre-existing failures start), 7ea52cbb (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).
Track: Live-GUI Fragility Fixes (post regression_fixes ship) [checkpoint: 1488e715] [superseded by live_gui_test_hardening_v2] Link: Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md, Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in _capture_workspace_profile (change ini=b"" to ini="" to satisfy WorkspaceProfile.ini_content: str contract that tomli_w enforces); the b"" sentinel was a regression from d7487af4 that caused save_workspace_profile to raise TypeError, profile never saved, load_workspace_profile became a no-op. 1 new unit test (tests/test_workspace_profile_serialization.py) encoding the str/bytes contract. test_prior_session_no_pop_imbalance is deferred to a separate follow-up track — the test was more under-mocked than the spec assumed; fixing imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where 2-tuple expected at line 4496). render_main_interface is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.
Track: Live-GUI Test Hardening v2 (post v1 ship) [complete: 26e0ced4] Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory ./archive/hot_reload_python_20260516/ is unrelated; this is a logical successor track with no folder of its own. Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:
  Sub-track 1: live_gui_state_sync_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md. REAL root cause was bad indentation in src/gui_2.py:607 (user fixed). The App class had _capture_workspace_profile being parsed as nested inside _apply_snapshot due to indentation. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by getattr/setattr at lines 478-487.
  Sub-track 2: prior_session_test_harden_20260605 - Spec: ./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md, Plan: ./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md. Test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.
  Sub-track 3: wait_for_ready_test_pattern_20260605 - SKIPPED. Tests already pass without polling. The flake hypothesis (time.sleep not enough) was wrong; the real cause was the indent. Polling can be a follow-up hardening pass if tests become flaky in CI.
  Sub-track 4: undo_redo_lifecycle_fix_20260605 - RESOLVED by Sub-track 1 indent fix. test_undo_redo_lifecycle now passes; no separate investigation needed.
  Net result: 4 originally-failing live_gui tests all pass. User can run the full batched suite to confirm.
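The defer-not-catch pattern recorded for the Test Regression Fixes track (the _ini_capture_ready flag) can be sketched roughly as follows. This is a minimal illustration, not the code in src/gui_2.py: the IniCapture class, the injected save_ini callable, and fake_save_ini are hypothetical stand-ins, and the sketch uses the str sentinel "" that the later Live-GUI Fragility Fixes track settled on.

```python
from typing import Callable


class IniCapture:
    """Defer-not-catch: skip the risky first call instead of trying to catch a native crash."""

    def __init__(self, save_ini: Callable[[], str]) -> None:
        # save_ini stands in for the native call (hypothetical wiring for this sketch).
        self._save_ini = save_ini
        self._ini_capture_ready = False

    def capture(self) -> str:
        if not self._ini_capture_ready:
            # First call: the backend may not be initialised yet, so do not
            # touch the native function. Return an empty str sentinel and arm the flag.
            self._ini_capture_ready = True
            return ""
        # Later calls: invoke the real function.
        return self._save_ini()


calls = []


def fake_save_ini() -> str:
    # Test double that records each invocation.
    calls.append(1)
    return "[Window][Debug]"


cap = IniCapture(fake_save_ini)
first = cap.capture()   # deferred: native function not called
second = cap.capture()  # native function called once
```

The point of the design is that a C-level access violation cannot be handled by try/except, so the only safe option is to not make the call until the state is known to be ready.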

Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)

Initialized: 2026-06-06 — the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. As of 2026-06-10: 3 recently completed (startup_speedup, test_batching_refactor, test_infrastructure_hardening); 4 in plan state (qwen, error_handling, data_structure, mcp_arch). The 4 in-plan tracks are now unblocked (the upstream test_infrastructure_hardening track is shipped).

Recently Completed (2026-06-06 to 2026-06-10)

Lightweight chronology; full spec/plan/state per track is in the linked folder.

Track: Sloppy.py Startup Speedup `[COMPLETE 2026-06-07]`

Link: ./tracks/startup_speedup_20260606/ (full spec/plan/state in folder)

[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5-done: 515a3029] [sub-track-1-done: 253e1798] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385] [conftest-atexit-fix: 8957c9a5] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693]

9 phases, 57 tasks. 44 TDD tests added. Main Thread Purity Invariant enforced via scripts/audit_main_thread_imports.py CI gate. Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction); import src.gui_2 341ms (was 1770ms; 81% reduction); total ~3067ms saved. 62 audit violations remain (large refactors deferred).

Track: Tier 2 Sandbox File Leak Prevention `[COMPLETE 2026-06-20]`

Link: ./tracks/tier2_leak_prevention_20260620/, Report: ../../docs/reports/TRACK_COMPLETION_tier2_leak_prevention_20260620.md

[phase-1-revert: fab2e55b] [phase-2-hook: 81e1fd7b] [phase-3-audit: f5d8ea04] [phase-4-install: 8f54deda]

Selective revert of the 4 user-named files from offender commit 00e5a3f2 (.opencode/agents/tier2-autonomous.md, .opencode/commands/tier-2-auto-execute.md, opencode.json, mcp_paths.toml). 3-layer defense-in-depth added: pre-commit hook (auto-unstages forbidden files at commit boundary; 12 tests), working-tree audit script with --strict CI gate (13 tests), and hook installation via scripts/tier2/setup_tier2_clone.ps1. 25 default-on tests pass. Out of scope (per user explicit list): the 4 throwaway scripts in scripts/tier2/artifacts/.../*.py and the project_history.toml timestamp. DEFERRED: CI wiring of audit_tier2_leaks.py --strict; rebase of stale tier-2 branches (tier2/result_migration_app_controller_phase6_20260619, tier2/test_sandbox_hardening_20260619) on origin/master@8f54deda to drop 00e5a3f2 (user action).
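The core decision of the pre-commit layer (which staged paths to unstage) might look like the sketch below. It assumes the forbidden set is exactly the 4 user-named files, which the entry above does not confirm; split_staged is a hypothetical helper name. A real hook would obtain the staged list from `git diff --cached --name-only` and unstage with `git restore --staged`; the sketch keeps only the pure filtering logic so it can be tested without a repository.

```python
from pathlib import PurePosixPath

# Assumption: the forbidden set equals the 4 files named in the revert.
FORBIDDEN = frozenset({
    ".opencode/agents/tier2-autonomous.md",
    ".opencode/commands/tier-2-auto-execute.md",
    "opencode.json",
    "mcp_paths.toml",
})


def split_staged(staged: list[str]) -> tuple[list[str], list[str]]:
    """Partition staged paths into (keep, unstage)."""
    keep: list[str] = []
    unstage: list[str] = []
    for raw in staged:
        # Normalise Windows separators and leading "./" so matching is stable.
        path = PurePosixPath(raw.replace("\\", "/")).as_posix()
        (unstage if path in FORBIDDEN else keep).append(path)
    return keep, unstage


keep, unstage = split_staged([
    "src/gui_2.py",
    "opencode.json",
    ".opencode\\agents\\tier2-autonomous.md",
])
```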

Track: Test Batching Refactor `[COMPLETE 2026-06-08] [archived]`

Link: ./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/

[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]

4 phases, fixture-class-isolated tiers (0-3 + H + P) replacing alphabetical 4-at-a-time batching. Hand-curated tests/test_categories.toml overrides for cross-cutting files. Phase 2 (CI shadow run) skipped (no CI in repo).

Track: Test Infrastructure Hardening (2026-06-09) `[COMPLETE 2026-06-10] [archived]`

Link: ./archive/test_infrastructure_hardening_20260609/

[track-created: 566cf08c] [phase-1-done: 5df22fa8] [phase-2-done: 67d0211e] [phase-3-done: 006bb114] [phase-4-done: b8fcd9d6] [phase-5-done: 33d5cac] [phase-6-done: 7b87bbf5] [phase-7-done: 84edb200] [phase-8-done: 719fe9a]

8 phases, ~60 surgical tasks, 6.5 days. Fixes 3 root causes of test regression churn: FR1 subprocess health autouse, FR2 live_gui_workspace fixture (per-run timestamped under tests/artifacts/), FR3 _sync_rag_engine token+dirty coalescing. Plus FR4 set_value hook + FR5 clean_baseline marker. 314/314 tests green across all 11 tier batches. Closing report: docs/reports/test_infrastructure_hardening_batch_green_20260610.md. Lineage: workspace_path_finalize_20260609 + mma_tier_usage_reset_fix_20260610 + rag_phase4_sync_fix_20260610 (all also archived).

In Plan (or Pending Spec)

Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix `[track-created: 7c1d597e]`

Link: ./tracks/qwen_llama_grok_integration_20260606/, Spec: ./tracks/qwen_llama_grok_integration_20260606/spec.md, Plan: ./tracks/qwen_llama_grok_integration_20260606/plan.md (to be authored by writing-plans skill)

Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a Vendor Capability Matrix (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in src/vendor_capabilities.py. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared send_openai_compatible() helper in src/openai_compatible.py that operates on a normalized request/response data structure; each _send_<vendor>() is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor _send_minimax() to use the helper (~250 lines → ~50). Out of scope (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Now blocked by test_infrastructure_hardening_20260609 (was: none).
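As a rough illustration of how a per-(vendor, model) capability matrix can replace per-vendor branches in the GUI, consider the sketch below. The 7 field names come from the goal above; everything else (the Capabilities dataclass, the MATRIX entries, the model names "model-a" and "model-b", and the helper functions) is hypothetical and not taken from src/vendor_capabilities.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # The 7 v1 capabilities named in the track goal.
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: int = 0
    cost_tracking: bool = False


# Hypothetical entries; real values would be declared per (vendor, model).
MATRIX: dict[tuple[str, str], Capabilities] = {
    ("grok", "model-a"): Capabilities(vision=True, tool_calling=True, streaming=True, context_window=131072),
    ("llama", "model-b"): Capabilities(streaming=True, model_discovery=True, context_window=8192),
}

# Unknown pairs fall back to "nothing supported" rather than raising.
DEFAULT = Capabilities()


def capabilities_for(vendor: str, model: str) -> Capabilities:
    return MATRIX.get((vendor, model), DEFAULT)


def screenshot_button_enabled(vendor: str, model: str) -> bool:
    # The GUI asks the matrix instead of branching on the vendor name.
    return capabilities_for(vendor, model).vision
```

With this shape, adding a vendor means adding rows to the table; the UI code that reads `vision` or `tool_calling` does not change.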

Status (2026-06-11): Phases 1-5 done; Phase 6 (docs) in progress. NOT ARCHIVING — has a follow-up track. See ./tracks/qwen_llama_grok_followup_20260611/ for the 5-phase follow-up. Audit report: ../docs/reports/qwen_llama_grok_followup_audit_20260611.md. 50/79 tasks done. Known gaps: tool-call loop only on MiniMax; 1 of 9 UX adaptations shipped; PROVIDERS in models.py is sprawl; src/ai_client.py needs codepath consolidation; local models need first-class priority; 12 v2 matrix fields documented but not implemented; Anthropic/Gemini/DeepSeek still not on the matrix.

Track: Data-Oriented Error Handling (Fleury Pattern) `[track-created: 494f68f9]`

Link: ./tracks/data_oriented_error_handling_20260606/, Spec: ./tracks/data_oriented_error_handling_20260606/spec.md, Plan: ./tracks/data_oriented_error_handling_20260606/plan.md

Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New src/result_types.py (ErrorKind enum, ErrorInfo dataclass, Result[T] with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new conductor/code_styleguides/error_handling.md canonical reference. Refactor src/mcp_client.py ((p, err) tuples → Result; 30+ assert p is not None → nil-sentinel paths), src/ai_client.py (ProviderError exception → ErrorInfo dataclass; _send_<vendor>() → _send_<vendor>_result() returning Result[str]; send() marked @deprecated; new send_result() public API), and src/rag_engine.py (RAGEngine methods → Result returns). Update conductor/product-guidelines.md + workflow.md + docs/guide_*.md so the convention is documented and future plans can incrementally migrate the remaining src/ files. Blocked by startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive. Follow-up: public_api_migration_20260606 (planned; not yet specced; no directory yet) — removes the deprecated ai_client.send() and migrates all callers. Detailed in the parent track's spec §12.1.
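A minimal sketch of the types named above (ErrorKind, ErrorInfo, Result[T] with data plus a side-channel errors list) is shown here to make the convention concrete. The enum members, the `ok` property, and read_text_result are assumptions made for illustration; the actual definitions live in src/result_types.py and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, TypeVar

T = TypeVar("T")


class ErrorKind(Enum):
    # Hypothetical members; the real enum is defined by the track.
    NOT_FOUND = "not_found"
    PROVIDER = "provider"


@dataclass(frozen=True)
class ErrorInfo:
    kind: ErrorKind
    message: str


@dataclass
class Result(Generic[T]):
    # data is always present (possibly a nil value); errors ride alongside it.
    data: T
    errors: list[ErrorInfo] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def read_text_result(files: dict[str, str], name: str) -> Result[str]:
    """Return a Result instead of raising: a missing file is just another case."""
    if name not in files:
        return Result(data="", errors=[ErrorInfo(ErrorKind.NOT_FOUND, f"no such file: {name}")])
    return Result(data=files[name])


good = read_text_result({"a.txt": "hello"}, "a.txt")
bad = read_text_result({"a.txt": "hello"}, "b.txt")
```

Callers can always use `data` without a None check and inspect `errors` only when they care, which is what lets the `assert p is not None` sites be replaced by nil-sentinel paths.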

Status (2026-06-12): SHIPPED. Phases 1-5 complete on branch doeh-ai_client. Path C was used for src/mcp_client.py (additive *_result variants; the 30+ tool-function refactor deferred to follow-up). Full refactor was used for src/ai_client.py (ProviderError removed, 9 _send_*() renamed, send() marked @deprecated, send_result() public API added) and src/rag_engine.py (_init_vector_store_result, _validate_collection_dim_result, _get_state with NilRAGState). 28 new tests pass; 4 existing tests updated; 13 test regressions in test_llama_provider.py (3) + test_llama_ollama_native.py (4) + test_grok_provider.py (3) + test_minimax_provider.py (2) + test_live_gui_integration_v2.py (1) — all from the Phase 3 renames + ProviderError removal. Regressions are documented in state.toml [regressions_20260612] and are the intended work of public_api_migration_20260606. Archive status: directory remains in place (matches repo convention; archive is conceptual, not physical).

Track: Data Structure Strengthening (Type Aliases + NamedTuples) `[track-created: ed42a97a]`

Link: ./tracks/data_structure_strengthening_20260606/, Spec: ./tracks/data_structure_strengthening_20260606/spec.md, Plan: ./tracks/data_structure_strengthening_20260606/plan.md (to be authored by writing-plans skill)

Goal: Improve AI-readability by naming 430 currently-anonymous dict[str, Any] / list[dict[...]] / Tuple[...] types. New src/type_aliases.py with 10 TypeAlias definitions (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback) and 1 NamedTuple (FileItemsDiff). Mechanical replacement of 345 weak sites across 6 high-traffic files: src/ai_client.py (139), src/app_controller.py (86), src/models.py (51), src/api_hook_client.py (32), src/project_manager.py (20), src/aggregate.py (17). Add --strict mode to the existing scripts/audit_weak_types.py (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate scripts/audit_weak_types.baseline.json with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. Data-grounded: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. Honest about what's missing: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. Now blocked by test_infrastructure_hardening_20260609 (was: none).

Track: AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek) `[track-created: 2026-06-14]` `[shipped: 2026-06-15]`

Link: ./tracks/ai_loop_regressions_20260614/, Spec: ./tracks/ai_loop_regressions_20260614/spec.md, Plan: ./tracks/ai_loop_regressions_20260614/plan.md, Metadata: ./tracks/ai_loop_regressions_20260614/metadata.json, Report: ../../docs/reports/TRACK_COMPLETION_ai_loop_regressions_20260615.md

Status: 2026-06-15 — SHIPPED with 1 known production regression + 2 deferred bugs (both flagged for follow-up). 3 documented bugs (Bug #1 dead except ai_client.ProviderError, Bug #2 error → no discussion entry, Bug #3 MiniMax thinking mono) are fixed. 7 new regression tests pass; 2 pre-existing tests in test_live_gui_integration_v2.py were adapted (not skipped). 12 commits.

Goal: Diagnose and fix the user-blocking AI loop regressions for the 4 providers (MiniMax, Gemini, Gemini CLI, DeepSeek) most heavily touched by the data_oriented_error_handling_20260606 track (shipped 2026-06-12) and the subsequent ai client pass commit 5030bd84 (2026-06-13, 503-line src/ai_client.py refactor). 3 distinct bugs: Bug #1 (3 dead except ai_client.ProviderError clauses in src/app_controller.py:305, 313, 3692 — the class was removed in commit 64b787b8). Bug #2 (_handle_request_event calls the deprecated ai_client.send() which now returns "" on error; _on_comms_entry filters empty text). Bug #3 (_send_minimax doesn't wrap reasoning in <thinking> tags in returned text).
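Bug #2 is easy to miss because each piece looks reasonable in isolation. The sketch below reproduces the interaction with hypothetical stand-ins (SendResult, the `fail` flag, the "[error]" prefix); it is one possible shape of the failure and of a fix, not the code in src/app_controller.py.

```python
from dataclasses import dataclass, field


@dataclass
class SendResult:
    # Hypothetical stand-in for Result[str]: payload plus side-channel errors.
    text: str
    errors: list[str] = field(default_factory=list)


def send_result(fail: bool) -> SendResult:
    if fail:
        return SendResult(text="", errors=["provider timeout"])
    return SendResult(text="hello")


def send(fail: bool) -> str:
    # Deprecated-wrapper behaviour as described above: an error collapses to "".
    return send_result(fail).text


def on_comms_entry(log: list[str], text: str) -> None:
    # Empty text is filtered, so an error that arrives as "" disappears here.
    if text:
        log.append(text)


def handle_request_legacy(log: list[str], fail: bool) -> None:
    on_comms_entry(log, send(fail))


def handle_request_fixed(log: list[str], fail: bool) -> None:
    result = send_result(fail)
    if result.errors:
        # Surface the error as a visible entry instead of dropping it.
        on_comms_entry(log, "[error] " + "; ".join(result.errors))
    else:
        on_comms_entry(log, result.text)


legacy_log: list[str] = []
handle_request_legacy(legacy_log, fail=True)

fixed_log: list[str] = []
handle_request_fixed(fixed_log, fail=True)
```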

5 phases: Phase 1 (TDD red), Phase 2 (FR1 fix), Phase 3 (FR2 fix), Phase 4 (FR3 fix), Phase 5 (regression sweep + docs). 17 tasks, 12 atomic commits, ~1.5 days of Tier 2 work.

Deferred to follow-up tracks (per user direction 2026-06-14): (1) Gemini / Gemini CLI thinking-format compatibility (Bug #4) — see doeh_test_thinking_cleanup_20260615 Phase 3. (2) <think> (half-width) marker support in thinking_parser.py (Bug #5) — see doeh_test_thinking_cleanup_20260615 Phase 4.

blocks: public_api_migration_20260606 (this track migrates 3 broken sites; the public_api track picks up the remaining 5 production + 63 test call sites).

Track: Data-Oriented Error Handling Test & Thinking-Parser Cleanup `[track-created: 2026-06-15]`

Link: ./tracks/doeh_test_thinking_cleanup_20260615/, Spec: ./tracks/doeh_test_thinking_cleanup_20260615/spec.md, Plan: ./tracks/doeh_test_thinking_cleanup_20260615/plan.md, Metadata: ./tracks/doeh_test_thinking_cleanup_20260615/metadata.json

Status: 2026-06-15 — Active, ready for Tier 2 implementation. User-blocking cleanup track. 1 critical production regression + 10 pre-existing test mock bugs + 2 deferred bugs (from ai_loop_regressions_20260614) + 2 housekeeping items.

Goal: Consolidate the cleanup work that didn't fit in data_oriented_error_handling_20260606 (the parent refactor) and ai_loop_regressions_20260614 (the immediate fix track). 5 phases: Phase 1 (CRITICAL: fix _api_generate NameError regression introduced by ai_loop_regressions_20260614 commit 2b7b571a — the FR2 fix accidentally removed the context_to_send variable definition while preserving its usage at line 278), Phase 2 (fix 11 pre-existing test mock bugs: 3 in test_grok_provider, 3 in test_llama_provider, 4 in test_llama_ollama_native, 1 in test_ai_client_tool_loop_builder, 1 in test_headless_service), Phase 3 (Bug #4 deferred: Gemini / Gemini CLI thinking-format compatibility), Phase 4 (Bug #5 deferred: <think> half-width marker support in thinking_parser), Phase 5 (housekeeping: state.toml duplicate-key fix, tracks.md row 24 update, full suite sweep, doc updates). 16 tasks, ~15 atomic commits, 5-8 hours of Tier 2 work (0.5-1 day).

Out of scope (documented in spec.md §7 + §12): public_api_migration_20260606 (planned; the broader migration of 5 production + ~50 test call sites not touched here), live_gui_mock_injection_20260615 (recommended; infrastructure for proper e2e live_gui + AI client tests), test_rag_phase4_final_verify (separate RAG concern), UI Polish Five Issues track phases 2/3 (separate track).

Track: MCP Architecture Refactor (Sub-MCP Extraction) `[track-created: 2720a894]`

Link: ./tracks/mcp_architecture_refactor_20260606/, Spec: ./tracks/mcp_architecture_refactor_20260606/spec.md, Plan: ./tracks/mcp_architecture_refactor_20260606/plan.md (to be authored by writing-plans skill)

Goal: Split the 2,205-line monolithic src/mcp_client.py (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention mcp_<type>.py for native MCPs: mcp_file_io.py (9 tools), mcp_python.py (14), mcp_c.py (5), mcp_cpp.py (5), mcp_web.py (2), mcp_analysis.py (2). The existing ExternalMCPManager is extracted to mcp_external.py (class name preserved). New MCPController class in src/mcp_client.py holds the 3-layer security model (extracted to src/mcp_client_security.py), the ALL_SUB_MCPS registration list, and the inverted-dict dispatch lookup. New src/mcp_client_legacy.py re-exports all 45+ old symbols for backward compat (the 4 existing test files + src/app_controller.py:61 continue to work). Each sub-MCP's invoke() returns Result[str, ErrorInfo] (Fleury pattern). Path parameters use the Metadata family aliases. Blocked by test_infrastructure_hardening_20260609, data_oriented_error_handling_20260606 (for Result/ErrorInfo), and data_structure_strengthening_20260606 (for Metadata aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. Out of scope (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls — deferred to mcp_dsl_20260606 follow-up. JSON-only for now.
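The "ALL_SUB_MCPS registration list plus inverted-dict dispatch lookup" idea can be sketched as below. The SubMCP class, the tool names, and the string return values are hypothetical simplifications; in the planned design each invoke() returns a Result rather than a bare str, and the security layers sit in front of dispatch.

```python
from typing import Callable


class SubMCP:
    """A sub-MCP owns a set of tools, keyed by tool name."""

    def __init__(self, name: str, tools: dict[str, Callable[..., str]]) -> None:
        self.name = name
        self.tools = tools

    def invoke(self, tool: str, **kwargs: str) -> str:
        return self.tools[tool](**kwargs)


# Registration list: sub-MCP -> tools (hypothetical tools for illustration).
ALL_SUB_MCPS = [
    SubMCP("mcp_file_io", {"read_file": lambda path: f"read {path}"}),
    SubMCP("mcp_web", {"fetch_url": lambda url: f"fetched {url}"}),
]


def build_dispatch(sub_mcps: list[SubMCP]) -> dict[str, SubMCP]:
    """Invert the registration: tool name -> owning sub-MCP."""
    table: dict[str, SubMCP] = {}
    for sub in sub_mcps:
        for tool in sub.tools:
            if tool in table:
                # Two sub-MCPs claiming the same tool is a registration bug.
                raise ValueError(f"duplicate tool {tool!r} in {sub.name} and {table[tool].name}")
            table[tool] = sub
    return table


DISPATCH = build_dispatch(ALL_SUB_MCPS)


def dispatch(tool: str, **kwargs: str) -> str:
    sub = DISPATCH.get(tool)
    if sub is None:
        return f"error: unknown tool {tool!r}"
    return sub.invoke(tool, **kwargs)
```

Building the inverted table once at startup makes each tool call a single dict lookup and catches duplicate registrations early.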

Track: RAG Phase 4 Stress Test Fix `[x] — fixed 16412ad5`

Status: 2026-06-06 - Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). index_file() upserts silently corrupt the collection, then search() fails with "Collection expecting embedding with dimension of 3072, got 384" and the AI request never reaches 'done' status, timing out the 50 × 0.5s = 25s poll loop. Fix: RAGEngine._init_vector_store now calls _validate_collection_dim which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: test_rag_collection_dim_mismatch_recreates_collection and test_rag_collection_dim_match_preserves_collection in tests/test_rag_engine.py. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.
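The dimension check described in the fix reduces to a small comparison. The sketch below models the collection as a plain list of vectors so it runs without ChromaDB; validate_collection_dim and its return shape are hypothetical and do not mirror the real RAGEngine method signature.

```python
import sys


def validate_collection_dim(
    existing_vectors: list[list[float]],
    provider_dim: int,
) -> tuple[list[list[float]], bool]:
    """Return (vectors_to_use, recreated).

    If the first stored vector's dimension differs from what the current
    embedding provider emits, discard the collection and start empty.
    """
    if existing_vectors and len(existing_vectors[0]) != provider_dim:
        print(
            f"warning: collection dim {len(existing_vectors[0])} != provider dim "
            f"{provider_dim}; recreating collection",
            file=sys.stderr,
        )
        return [], True
    # Empty collections and matching dimensions are left untouched.
    return existing_vectors, False


stale = [[0.0] * 3072]
fresh, recreated = validate_collection_dim(stale, provider_dim=384)
kept, kept_recreated = validate_collection_dim([[0.0] * 384], provider_dim=384)
```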

Track: SQLite-Granularity Inline Docs for gui_2.py `[COMPLETE: sqlite_docs_gui_2_20260612]`

Link: ./tracks/sqlite_docs_gui_2_20260612/, Spec: ./tracks/sqlite_docs_gui_2_20260612/spec.md, Plan: ./tracks/sqlite_docs_gui_2_20260612/plan.md

Status: 2026-06-12 — COMPLETE. SQLite-style docstrings with embedded ASCII layouts and DAG context have been added to key modules representing App lifecycle, discussion panels, context panels, settings hubs, and diagnostics panels.

Goal: Add SQLite-granularity docstrings with embedded ASCII layouts and DAG relationships for src/gui_2.py panel-by-panel. Ensure zero functional regression. 5 phases: app lifecycle & setup, discussion panel, context panel, settings/hubs, and diagnostics/modals.

Track: Continued SQLite-Granularity Inline Docs for gui_2.py `[COMPLETE: sqlite_docs_gui_2_continued_20260613]`

Link: ./tracks/sqlite_docs_gui_2_continued_20260613/, Spec: ./tracks/sqlite_docs_gui_2_continued_20260613/spec.md, Plan: ./tracks/sqlite_docs_gui_2_continued_20260613/plan.md

Status: 2026-06-13 — COMPLETE. Completed the SQLite-style docstring initiative for preset managers, editors, persona selectors, and the command palette modal.

Goal: Document preset managers/editors, persona selectors/editors, provider panel, and command palette in src/gui_2.py and src/command_palette.py with embedded SSDL and ASCII layouts.

Track: SQLite-Granularity Inline Docs for ai_client.py `[COMPLETE: ai_client_docs_20260613]`

Link: ./tracks/ai_client_docs_20260613/, Spec: ./tracks/ai_client_docs_20260613/spec.md, Plan: ./tracks/ai_client_docs_20260613/plan.md

Status: 2026-06-13 — COMPLETE. Added SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in src/ai_client.py.

Goal: Add SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in src/ai_client.py.

Track: Intent-Based Scripting Languages Survey `[COMPLETE: 213e4994]`

Link: ./tracks/intent_dsl_survey_20260612/, Spec: ./tracks/intent_dsl_survey_20260612/spec.md, Plan: ./tracks/intent_dsl_survey_20260612/plan.md, Report: ./tracks/intent_dsl_survey_20260612/report_v1.2.md, v1.1: ./tracks/intent_dsl_survey_20260612/report_v1.1.md, v1.0: ./tracks/intent_dsl_survey_20260612/report.md, Review: ./tracks/intent_dsl_survey_20260612/reportreview.md

Status: 2026-06-12 — COMPLETE. Research-only track (non-impl). Final deliverable: report_v1.2.md (1343 lines, 168KB+, 7 sections + 9-subsection expanded Appendix). 4-tier vocab with 42 verbs (T1 math 12, T2 pipeline 12, T3 shell 10, T4 AI-fuzzing 8); 10 prior-art clusters (0: O'Donnell philosophical anchor; 1: Concatenative; 2: Array; 3: Intent-mapping; 4: Meta-Tooling DSLs; 5: SSDL; 6: Command Palette; 7: Result convention; 8: Metadesk Self-Describing Data + Tag Dispatch; 9: Verse Multi-Paradigm Calculi with Transactional Semantics); 14-primitive grammar from user's math pseudocode; 4 hardware anchor claims; 10 AI-agent properties tying to existing project architecture; 8 open questions for the follow-up interpreter prototype. Version history: v1.0 (418 lines) → v1.1 (1301 lines, +883): XML/JSON rejection citation fix, OCR-restored Lottes quote, softened Wasm streaming-parse inference, expanded Appendix A.1-A.9. → v1.2 (1343 lines): (1) Renamed arena { } → tape { } (46 occurrences); (2) Mixed postfix/infix notation for math; (3) nagent attribution corrected (Jody Bruchon → Mike Acton); (4) Added Cluster 8 (Metadesk) and Cluster 9 (Verse) — survey now covers 10 clusters (sub-agents at research/cluster_8_metadesk.md and research/cluster_9_verse.md). Time-sensitive goal met: completed before nagent v2.2 hard boundary. Will be consumed by nagent v2.2 (Future-Track Candidate #4) and the future interpreter prototype (follow-up B track, separate). Appendix A.3/A.4 retain v1.1 form pending a sync pass; noted in v1.2 changelog at the top of the report.

Goal: Survey intent-based scripting languages as a design philosophy and propose a Meta-Tooling-facing intent DSL vocabulary. Research-only (non-impl): produces 1 markdown file at conductor/tracks/intent_dsl_survey_20260612/report.md. No new src/ code, no new tests, no pyproject.toml changes. The report is the foundation document for the user's nagent v2.2 (its "Future-Track Candidate #4: Intent-based DSL" section), the placeholder intent_dsl_for_meta_tooling_20260608_PLACEHOLDER (per mcp_architecture_refactor_20260606/spec.md §12.1 and nagent_review_20260608/metadata.json:28), and a future interpreter prototype (follow-up B track, separate).

7 sections: (1) the "intent-based" design philosophy (O'Donnell immediate-mode as the anchor); (2) prior art across 10 clusters (0: John O'Donnell IMGUI/MVC at johno.se/book/; 1: Forth family: Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy/Bob Armstrong; 2: Array: APL, K, BQN, Uiua; 3: Intent-mapping: Jofito/Jody, jq, nagent tag protocol [rejected as model], Wasm; 4: Meta-Tooling DSLs: mcp_dsl_20260606 placeholder, nagent's Bridge DSL, OpenAI/Anthropic tool-use; 5: SSDL shape primitives per computational_shapes_ssdl_digest_20260608.md; 6: Project's own Command Palette 33 commands; 7: Result[T] + ErrorInfo convention per data_oriented_error_handling_20260606); (3) the 14-primitive grammar formalized from the user's math pseudocode (determinant/minor/matrix-transpose snippets), with explicit ambiguity flags; (4) the 4-tier vocab (~40 verbs: T1 math ~10, T2 data pipeline ~12, T3 shell ~10, T4 AI-fuzzing tolerance ~8; T4 is the novel contribution); (5) hardware mapping with 4 anchor claims (Onat/Lottes 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter; O'Donnell "widgets are method invocations"; Forth/CoSy concatenative syntax; APL/K array data); (6) AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain per guide_meta_boundary.md, runtime path through cli_tool_bridge.py, 3-layer security per guide_tools.md, 4 memory dimensions per nagent v2.1 §2.1, stable-to-volatile cache ordering, Result[T] envelope, Command Palette 33 commands, Hook API state fields, O'Donnell IEventTarget = sandbox verb, O'Donnell "reads are free" = cheap Tier 2 verbs); (7) ≥6 open questions for follow-up B (interpreter prototype) + connection block to intent_dsl_for_meta_tooling_20260608_PLACEHOLDER.

4 phases: source gathering + outline (checkpoint commit), write sections 1-3, write sections 4-7, self-review + user review + commit + register in tracks.md. Time-sensitive: report must complete before nagent v2.2 ships.

Spec approved 2026-06-12 (commit b389f1be). 789 lines; modeled on data_oriented_error_handling_20260606/spec.md.

Track: Prior Session Test Harden (20260605) `[superseded by live_gui_test_hardening_v2_20260605]`

Status: 2026-06-05 - Surfaced during live_gui_fragility_fixes_20260605 execution. test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.

Backlog (Provider + Language + Investigation)

Track: Manual UX Validation — ASCII-Sketch Workflow (NEW 2026-06-08)

Link: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/, Spec: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md, Plan: ./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md

Goal: Promote the ASCII-sketch UX ideation workflow (docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at src/gui_2.py:3770 render_discussion_entry. The 23-op matrix A1-A7 in docs/guide_discussions.md is the source of truth; the SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md, 504 lines) informs the internal refactoring decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing.

Status: Active; Phase 1 (5 open questions to the user) is the current phase.

Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)

Link: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/, Spec: ./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md

Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per src/aggregate.py:380-454 (pure-Python string concat, zero third-party markdown deps in pyproject.toml:6-27) and src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add markdown-it-py OR switch to pickle/msgspec, NOT C11. The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track.

Status: Deferred. Promotes to active track when (if) the first hard constraint surfaces.

Track: Context First Message Fix

Link: ./tracks/context_first_message_fix_20260604/

Track: Fix Remaining Tests

Link: ./tracks/fix_remaining_tests_20260513/

Track: Test Harness Hardening

Link: ./tracks/test_harness_hardening_20260310/

Track: Test Patch Fixes

Link: ./tracks/test_patch_fixes_20260513/

Track: Test Batching Post-Refactor Polish

Link: ./tracks/test_batching_post_refactor_polish_20260607/

Track: Code Path Audit

Link: ./tracks/code_path_audit_20260607/, Spec: ./tracks/code_path_audit_20260607/spec.md, Plan: ./tracks/code_path_audit_20260607/plan.md (to be authored by writing-plans skill)

Goal: Build src/code_path_audit.py, a static-analysis tool that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: custom postfix .dsl data + markdown + Mermaid + prefix tree text under docs/reports/code_path_audit/<date>/. The follow-up pipeline_pruning_20260607 consumes the .dsl files; the markdown + tree are for human review. MMA worker spawn is cold per user. Timing (revised 2026-06-08): the audit must run after the 4 foundational tracks ship (qwen_llama_grok, data_oriented_error_handling, data_structure_strengthening, mcp_architecture_refactor); pre-4-tracks code is too stale to ground optimization decisions.

Track: GUI Architecture Refinement

Link: ./tracks/gui_architecture_refinement_20260512/ (no spec.md; needs scoping before planning)

Follow-up (Planned, Not Yet Specced)

Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)

Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet. Goal: Remove the deprecated ai_client.send() and migrate all callers to send_result(). Affects 5 production call sites in src/ (src/app_controller.py:290 + :3692, src/multi_agent_conductor.py:591, src/orchestrator_pm.py:86, src/conductor_tech_lead.py:68, plus src/mcp_client.py:2274 in the tool-result dispatch path) and 63 test files. The enumeration + baseline counts are recorded in the parent track's spec §12.1 and verified in this track's state.toml [baseline_post_qwen_track].

send_result(...) mirrors the send(...) signature (13+ parameters including 8 callbacks); see docs/guide_ai_client.md "Data-Oriented Error Handling (Fleury Pattern) > Public API" for the call shape.

Track: Public API Migration + UI Polish Test Cleanup (combined stability track) `[track-created: 2026-06-15]`

Link: ./tracks/public_api_migration_and_ui_polish_20260615/, Spec: ./tracks/public_api_migration_and_ui_polish_20260615/spec.md, Plan: ./tracks/public_api_migration_and_ui_polish_20260615/plan.md, Metadata: ./tracks/public_api_migration_and_ui_polish_20260615/metadata.json

Status: 2026-06-15 — Active, ready for Tier 2 implementation. User-blocking stability track that finishes the cleanup work from data_oriented_error_handling_20260606 and doeh_test_thinking_cleanup_20260615 before the data structure track.

Goal: Two concerns, one track. (A) Public API Migration — remove the deprecated ai_client.send() legacy wrapper. Migrate 3 remaining production call sites (src/conductor_tech_lead.py:68, src/orchestrator_pm.py:86, src/multi_agent_conductor.py:591) + 12 test files to send_result(). Fix 4 of the 10 pre-existing test failures (2 Qwen + 2 symbol_parsing) as a side effect. (B) UI Polish Test Cleanup — fix 2 broken test assertions in test_discussion_truncate_layout.py and test_log_management_refresh.py (the production code was already fixed by user commits d0b06575 and df7bda6e; the tests use find() which locates the comment block instead of the actual code). Combined result: 6 of 10 pre-existing failures fixed (1280 + 6 = 1286 pass; 4 RAG failures deferred to next track).
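
The find() pitfall in concern (B) can be sketched as follows. This is a minimal illustration, not the actual test code: the sample source, the needle `draw_panel`, and the helper name `find_code_line` are all hypothetical.

```python
def find_code_line(source: str, needle: str) -> int:
    """Return the 0-based index of the first non-comment line containing needle, or -1."""
    for index, line in enumerate(source.splitlines()):
        if line.lstrip().startswith("#"):
            continue  # whole-line comment: not real code
        # Drop a trailing inline comment (naive: ignores '#' inside string literals).
        code = line.split("#", 1)[0]
        if needle in code:
            return index
    return -1


# Hypothetical source: the comment mentions the call before the real code does.
SAMPLE = (
    "# draw_panel() used to be called before the header\n"
    "def render():\n"
    "    draw_panel()\n"
)

# A plain str.find() lands inside the comment on line 0 ...
naive_line = SAMPLE[: SAMPLE.find("draw_panel")].count("\n")
# ... while the comment-aware helper reports the actual call site on line 2.
code_line = find_code_line(SAMPLE, "draw_panel")
```

An assertion built on `naive_line` would inspect the comment block; one built on `code_line` inspects the code the test is meant to verify.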

7 phases: Phase 1 (3 production call sites migrated), Phase 2 (12 test files migrated to send_result()), Phase 3 (2 Qwen test fixes), Phase 4 (2 symbol_parsing test fixes), Phase 5 (2 UI Polish test fixes), Phase 6 (deprecation removed: send() function + filterwarnings + test_deprecation_warnings.py), Phase 7 (docs + housekeep). ~28 tasks, ~28 atomic commits, 2-3 days Tier 2 work.

Critical audit findings (2026-06-15): UI Polish phases 1, 4, 5 already SHIPPED (commits 79ac9210, 3a864076, 74e02485); phases 2, 3 code SHIPPED (user commits) but tests broken (this track fixes). Only 3 production send() call sites remain (not 5 as the parent spec claimed: 2 were already migrated by doeh_test_thinking_cleanup_20260615, and mcp_client.py:2274 was a misidentification). 12 test files use send() (not 63 as the parent spec claimed; doeh_test_thinking_cleanup_20260615 already migrated 11).

blocks: data_structure_strengthening_20260606 (cleaner Result API usage makes the type-alias replacement easier) and mcp_architecture_refactor_20260606 (transitively).

Out of scope (documented in spec §7): 4 RAG test fixes (separate RAG subsystem track), the _send_<vendor>() → _send_<vendor>_result() rename (not needed; tests work with current names), 23 lower-impact weak-type files (next major track: data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate infrastructure track).

Track: RAG Test Failures Fix (small bug-fix track) `[track-created: 2026-06-15]` `[shipped: 2026-06-15]`

Link: ./tracks/rag_test_failures_20260615/, Spec: ./tracks/rag_test_failures_20260615/spec.md, Plan: ./tracks/rag_test_failures_20260615/plan.md, Metadata: ./tracks/rag_test_failures_20260615/metadata.json

Status: 2026-06-15 — Shipped. 4 atomic commits. First fully green baseline since data_oriented_error_handling_20260606 shipped 2026-06-12 (1288 pass + 4 skip + 0 fail; was 1282 + 4 + 3 pre-track). All 11 batched test tiers pass.

Goal: Fix the 3 remaining pre-existing test failures (down from 4 as the parent track documented; test_rag_integration.py was inadvertently fixed by public_api_migration_and_ui_polish_20260615 Phase 2 follow-up commit 26e1b652). All 3 share the same root cause: 'NoneType' object has no attribute 'get' error in src/rag_engine.py, surfaced via _rebuild_rag_index → get_all_indexed_paths() (line 331: m.get('path') on None metadata) and _validate_collection_dim_result (line 150: if not embeddings raising ValueError on non-empty numpy arrays).

3 tests fixed by this track:

tests/test_rag_phase4_final_verify.py::test_phase4_final_verify (fails at line 65) — PASSES as of commit 35581163
tests/test_rag_phase4_stress.py::test_rag_large_codebase_verification_sim (fails at line 48) — PASSES as of commit 35581163
tests/test_rag_visual_sim.py::test_rag_full_lifecycle_sim (was listed as failing in spec §1.1, but actually passed at track execution time; the chromadb init path was already protected by the new tests in test_rag_sync_none_error.py)

Implementation summary (4 atomic commits):

fix(rag): handle None metadata in get_all_indexed_paths and non-empty numpy in dim check (35581163) — the production fix
conductor(checkpoint): Phase 3 complete (6a0ac357) — empty checkpoint
docs(rag): add troubleshooting section for NoneType.get error (d89c5810) — guide_rag.md update
conductor(track): mark rag_test_failures_20260615 as completed (pending) — metadata + tracks.md

New test file: tests/test_rag_sync_none_error.py (3 tests, all pass):

test_dim_check_does_not_raise_on_non_empty_ndarray — guards against the if not embeddings numpy ValueError
test_get_all_indexed_paths_handles_none_metadata — guards against m.get('path') on None
test_get_all_indexed_paths_returns_paths_with_metadata — positive control that normal flow still works
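
The two guards these tests protect can be sketched in isolation. This is an assumption-laden illustration, not the code in src/rag_engine.py: `indexed_paths`, `has_embeddings`, and the `AmbiguousArray` stand-in (which mimics NumPy's refusal to truth-test a multi-element array) are hypothetical names.

```python
from __future__ import annotations

from typing import Optional, Sequence


class AmbiguousArray:
    """Stand-in for a NumPy array: truth-testing it raises ValueError."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __bool__(self) -> bool:
        raise ValueError("truth value of an array is ambiguous")


def indexed_paths(metadatas: Sequence[Optional[dict]]) -> list[str]:
    """Collect 'path' entries, skipping records whose metadata is None."""
    paths = []
    for metadata in metadatas:
        if metadata is None:
            continue  # calling metadata.get('path') here would raise AttributeError
        path = metadata.get("path")
        if path is not None:
            paths.append(path)
    return paths


def has_embeddings(embeddings) -> bool:
    """Check for content by length rather than truthiness (`if not embeddings`)."""
    return embeddings is not None and len(embeddings) > 0
```

Checking `len(...)` avoids calling `__bool__` at all, which is why the same guard works for plain lists and for array-like objects.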

5 phases: Phase 1 (investigation + reproducing test), Phase 2 (fix), Phase 3 (full + batched test verification), Phase 4 (docs update), Phase 5 (metadata + tracks.md). ~10 tasks, 4 atomic commits, ~30 min Tier 2 work (much faster than the 0.5-1 day estimate).

Critical audit findings (2026-06-15): The RAGConfig() default is correct (vector_store is not None; provider is 'mock' by default). The RAGEngine with mock vector store constructs successfully (verified by direct instantiation). The error originates in the RAG sync worker at src/app_controller.py:1480. Most likely candidates for the .get call on a None value: src/rag_engine.py:149 (embeddings = res.get('embeddings') in _validate_collection_dim_result) or a subtle config field that becomes None. Diagnostic strategy: add traceback.format_exc() to the except clause, capture the full traceback, identify the exact call site, fix surgically, remove the diagnostic.
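
The diagnostic strategy can be sketched with the standard-library `traceback` module. The wrapper `run_with_diagnostics` and the failing task are hypothetical; only `traceback.format_exc()` is named in the track itself.

```python
import traceback


def run_with_diagnostics(task):
    """Run task(); on failure return (None, full traceback text) instead of a bare message."""
    try:
        return task(), None
    except Exception:
        # Temporary diagnostic: the traceback names the exact file, line, and function.
        return None, traceback.format_exc()


def failing_task():
    metadata = None  # simulates a record with no metadata
    return metadata.get("path")  # AttributeError: 'NoneType' object has no attribute 'get'


result, captured = run_with_diagnostics(failing_task)
```

Once `captured` has identified the call site, the wrapper is removed again, as the strategy describes.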

blocks: data_structure_strengthening_20260606 (cleaner codebase makes type-alias replacement easier) and the user's stated send_result → send mass rename.

Out of scope (deferred to separate tracks): the send_result → send mass rename (user's stated manual refactor), 23 lower-impact weak-type files (data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate track), RAG test quality cleanup (poll loops, etc.; separate track).

Track: Tier 2 Autonomous Sandbox (unattended track execution with bounded blast radius) `[track-created: 2026-06-16]` `[shipped: 2026-06-16]`

Link: ./tracks/tier2_autonomous_sandbox_20260616/, Spec: ./tracks/tier2_autonomous_sandbox_20260616/spec.md, Plan: ./tracks/tier2_autonomous_sandbox_20260616/plan.md, Metadata: ./tracks/tier2_autonomous_sandbox_20260616/metadata.json, Guide: ../../docs/guide_tier2_autonomous.md

Status: 2026-06-16 — SHIPPED. 9 phases, 19 failcount tests (100% coverage), 8 report writer tests (100% coverage), 12 slash-command contract tests, 3 opt-in sandbox tests, 1 smoke e2e test (double-gated). Meta-tooling track — adds a sibling clone + 3-layer enforcement stack (OpenCode permissions + Windows restricted token + git hooks) for unattended Tier 2 execution. No permission: ask prompts during a normal run. 4 hard git bans enforced (git restore, git push*, git checkout, git reset); failcount threshold gives up after 3 red/green failures or 30 min no-progress, writes a markdown failure report with 7 sections + .STOPPED flag.
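
The give-up rule (3 failures or 30 minutes without progress) can be sketched as a small state object. This is not scripts/tier2/failcount.py: the class shape, method names, and reason strings are assumptions, and the source does not say whether the real failure counter resets on success, so this sketch keeps a running total.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class FailCount:
    max_failures: int = 3                 # red/green failures tolerated before giving up
    max_idle_seconds: float = 30 * 60     # 30 min without progress
    failures: int = 0
    last_progress: float = field(default_factory=time.monotonic)

    def record_failure(self) -> None:
        self.failures += 1

    def record_progress(self, now: float | None = None) -> None:
        self.last_progress = time.monotonic() if now is None else now

    def should_stop(self, now: float | None = None) -> str | None:
        """Return a reason string when the run should give up, else None."""
        now = time.monotonic() if now is None else now
        if self.failures >= self.max_failures:
            return "failures"
        if now - self.last_progress >= self.max_idle_seconds:
            return "no-progress"
        return None
```

Passing `now` explicitly keeps the threshold logic testable without sleeping; a caller that gets a non-None reason would then write the failure report and the .STOPPED flag.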

Goal: Eliminate the permission: ask bottleneck for well-regularized tracks (TDD red/green with atomic per-task commits) by running Tier 2 unattended in a sibling clone at C:\projects\manual_slop_tier2\. Bounded blast radius via 3-layer enforcement; bounded run via failcount threshold; auditable via per-run state.json + (on give-up) markdown failure report.

Deliverables: in the main repo, 7 new files under scripts/tier2/ (__init__.py, failcount.py, failcount.toml, write_report.py, run_track.py, setup_tier2_clone.ps1, run_tier2_sandboxed.ps1) + 3 templates in conductor/tier2/ + 2 git hooks in conductor/tier2/githooks/ + 1 user guide (docs/guide_tier2_autonomous.md) + 5 new test files + 1 trivial smoke track fixture in tests/artifacts/. pyproject.toml gets 2 new pytest markers (tier2_sandbox, tier2_smoke). The main repo's opencode.json is UNTOUCHED; Tier 1 retains its permission: ask workflow.

Test inventory: 19 failcount unit tests (default-on; 100% coverage on scripts/tier2/failcount.py); 8 report writer tests (opt-in via TIER2_SANDBOX_TESTS=1; 100% coverage on scripts/tier2/write_report.py); 12 slash command spec contract tests (default-on); 1 bootstrap -WhatIf test (opt-in); 1 sandbox enforcement pre-push hook test (opt-in); 1 smoke e2e test (double-gated).

blocks: None (meta-tooling; no source code impact on the Manual Slop app).

Track: Rename send_result to send (sandbox test track) `[track-created: 2026-06-16]` `[shipped: 2026-06-17]`

Link: ./tracks/send_result_to_send_20260616/, Spec: ./tracks/send_result_to_send_20260616/spec.md, Plan: ./tracks/send_result_to_send_20260616/plan.md, Metadata: ./tracks/send_result_to_send_20260616/metadata.json

Status: 2026-06-17 - SHIPPED. 6 phases, 10 atomic rename commits + 12 plan/script commits (22 total). The FIRST end-to-end test of the tier2_autonomous_sandbox_20260616 sandbox. Refactor track (mechanical rename; no behavior change). Scope: 37 files modified (6 src/ + 27 tests/ + 3 docs + 1 metadata/state); 0 files added, 0 files deleted. Spec estimated 38 files; actual 37 (test_deprecation_warnings.py no longer exists in the repo).

Goal: Revert the 2026-06-15 public_api_migration rename (ai_client.send -> ai_client.send_result) back to ai_client.send. The migration was driven by the data-oriented error handling convention; the user wants the shorter name now that the Tier 2 autonomous sandbox can do the rename safely. Pure mechanical rename across 37 files + a surgical rewrite of one stale deprecation section in error_handling.md.
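
A mechanical rename of this kind can be sketched with a word-boundary regular expression. This is an illustration only, not one of the helper scripts the track recorded; the function name `rename_send_result` is hypothetical.

```python
import re

# \b keeps the match from firing inside longer identifiers such as
# _send_result_helper or resend_result, because '_' and letters are word characters.
_SEND_RESULT = re.compile(r"\bsend_result\b")


def rename_send_result(text: str) -> str:
    """Replace the standalone identifier send_result with send."""
    return _SEND_RESULT.sub("send", text)
```

Because the pattern is purely textual it also rewrites matches inside comments and string literals, which is usually what a repo-wide rename wants but is worth reviewing per commit.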

Deliverables: 0 new files, 0 deleted files. The 22 commits include 10 atomic rename commits (1 in src/ai_client.py + 1 batch in 5 other src/ + 5 per-file in top 5 tests + 1 batch in 22 remaining tests + 1 in 3 docs) and 12 plan/script commits (audit trail + helper scripts). The audit_tier2 subdirectory in scripts/tier2/ accumulates the rename + plan-update helper scripts as a record of the mechanical change pattern.

Test inventory: 100/101 tests pass in the 26 files directly affected by the rename. 1 pre-existing failure (test_headless_service.py::test_generate_endpoint) unrelated to the rename - confirmed by running the same test against origin/master baseline where it also fails (missing credentials.toml). 7 broader suite failures are all pre-existing credentials.toml issues, also confirmed against origin/master.

blocks: None (independent refactor + sandbox test).

Track: Tier 2 Sandbox - Move State/Failures Off AppData `[track-created: 2026-06-18]`

Link: ./tracks/tier2_no_appdata_20260618/, Spec: ./tracks/tier2_no_appdata_20260618/spec.md, Plan: ./tracks/tier2_no_appdata_20260618/plan.md, Metadata: ./tracks/tier2_no_appdata_20260618/metadata.json

Status: 2026-06-18 - SHIPPED. 6 phases, 16 atomic commits (no test commits; the test changes ride with the source changes since the tests assert the source contract). Configuration-only fix; no behavior change in product code. Scope: 11 source files modified (5 scripts/tier2/ + 2 conductor/tier2/* + 2 docs/* + 1 conductor/* + 1 .gitignore) + 2 test files modified + 1 new test added.

Goal: Per the user's 2026-06-18 'NEVER USE APPDATA' directive, move the Tier 2 failcount state and failure-report locations inside the Tier 2 clone (scripts/tier2/state//state.json and scripts/tier2/failures/_.md). Remove every AppData reference from the Tier 2 conventions, permissions, scripts, docs, and tests. After this track, the C:\Users\Ed\AppData\... tree is never referenced by the Tier 2 sandbox in any form.

Deliverables: 0 new files, 0 deleted files. The 16 commits include 4 source code changes (failcount.py + write_report.py + run_track.py + opencode.json.fragment), 2 prompt changes (agent + slash command), 2 bootstrap-script changes (setup + sandboxed launcher), 5 doc/test changes (guide + workflow + write_track_completion_report + slash_command_spec + no_temp_writes), 1 .gitignore, 1 write_track_completion_report output, and 1 last-minute example fix caught by the test. The track-isolated directories (scripts/tier2/state/ and scripts/tier2/failures/) are gitignored so they never pollute the source tree.

Test inventory: 37 default-on tests pass (test_failcount.py: 19; test_tier2_slash_command_spec.py: 14 + 1 new = 15; test_no_temp_writes.py: 1; the test_tier2_report_writer.py 8 tests are opt-in via TIER2_SANDBOX_TESTS=1 and pass when enabled). audit_no_temp_writes.py --strict exits 0. No regressions.

blocks: None. Followup: the user re-runs pwsh -File scripts/tier2/setup_tier2_clone.ps1 to re-bootstrap the live Tier 2 clone with the new conventions.

Track: Exception Handling Audit (Convention Compliance + Doc Clarification) `[track-created: 2026-06-16]`

Link: ./tracks/exception_handling_audit_20260616/, Spec: ./tracks/exception_handling_audit_20260616/spec.md, Plan: ./tracks/exception_handling_audit_20260616/plan.md, Metadata: ./tracks/exception_handling_audit_20260616/metadata.json, Report: ../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md

Status: 2026-06-16 — Active, completed (5/5 phases, ~12 tasks). An AUDIT + DOC track (no production code change). The deliverable is the audit script + the report + 3 doc/codestyle updates that close 5 gaps in the convention's documentation.

Goal: produce a static analyzer that classifies every try/except/finally/raise site in the codebase against the data-oriented error handling convention established by data_oriented_error_handling_20260606 (shipped 2026-06-12). The audit's value is in the report + the doc clarification, not in a refactor.

Deliverables:

scripts/audit_exception_handling.py — 792-line AST-based static analyzer; 10-category classification taxonomy (5 compliant + 3 violation + 1 suspicious + 1 unclear); --json, --top, --verbose, --strict, --include-tests modes; "delete to turn off" per feature_flags.md
conductor/code_styleguides/error_handling.md — 5 new sections (Boundary Types, The Broad-Except Distinction, Constructors Can Raise, Re-Raise Patterns, Audit Script) closing 5 gaps the audit revealed
docs/guide_app_controller.md — new "Exception Handling" section explaining the 13 FastAPI boundary sites + the 40 migration-target sites
conductor/product-guidelines.md — cross-reference to the audit script
docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md — 9-section report (370 lines) for the user to decide the next track

Headline numbers: 348 total sites across 65 files. 80 compliant (23%) + 25 suspicious (7%) + 211 violation (61%) + 32 unclear (9%). The 3 refactored baseline files (mcp_client, ai_client, rag_engine) have 112 sites / 77 violations (the convention reference; remaining violations are mostly broad-catches without ErrorInfo conversion). The 62 migration-target files have 236 sites / 134 violations (the work for future refactor tracks).

5 gaps the audit revealed + closed:

G1: FastAPI HTTPException in _api_* handlers not explicitly documented as a legitimate boundary (closed in styleguide + app_controller doc)
G2: The "broad except Exception" rule doesn't distinguish between "swallow" and "convert to ErrorInfo" (closed in styleguide)
G3: The "constructors can raise" rule is brief; needs elaboration (closed in styleguide)
G4: The "re-raise" pattern is not in the styleguide at all (closed in styleguide)
G5: The new audit script is not referenced from the styleguide (closed in styleguide + product-guidelines.md)
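
The distinction behind G2 can be sketched side by side. The `ErrorInfo` and `Result` shapes below are hypothetical (the field names are assumptions, not the project's definitions), as are the two example functions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
    kind: str       # hypothetical field: exception class name
    message: str    # hypothetical field: human-readable detail


@dataclass(frozen=True)
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[ErrorInfo] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def parse_port_swallow(text: str) -> Optional[int]:
    """Broad except that swallows: the caller cannot tell why parsing failed."""
    try:
        return int(text)
    except Exception:
        return None


def parse_port_result(text: str) -> Result[int]:
    """Broad except that converts: the failure travels to the caller as data."""
    try:
        return Result(value=int(text))
    except Exception as exc:
        return Result(error=ErrorInfo(kind=type(exc).__name__, message=str(exc)))
```

Both functions catch `Exception`, so a rule that only looks at the except clause cannot tell them apart; the difference is in what the handler body returns.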

Critical audit findings (2026-06-16): The convention is applied to 3 of 65 src/ files (mcp_client.py, ai_client.py, rag_engine.py; the "baseline"). The remaining 62 files in src/ are in the "migration-target" state. The top 3 candidates by violation count: src/gui_2.py (37 violations, 260KB), src/app_controller.py (35 violations + 13 FastAPI boundary = 48 sites, 166KB), src/session_logger.py (8 violations, 16KB). The user decides which is the next refactor track.

blocks: app_controller_result_migration_20260616 (recommended next track; 22 migration-target sites in app_controller.py after excluding the 13 FastAPI boundary sites; 2-3 days Tier 2), gui_2_result_migration (37 violations; 2-3 days Tier 2), session_logger_result_migration (8 violations; 0.5 day Tier 2). Also unblocks the user's stated send_result → send mass rename and the planned data_structure_strengthening_20260606 track.

Track: Result Migration (5 sub-tracks) `[track-created: 2026-06-16]`

Link: ./tracks/result_migration_20260616/, Spec: ./tracks/result_migration_20260616/spec.md, Plan: ./tracks/result_migration_20260616/plan.md, Metadata: ./tracks/result_migration_20260616/metadata.json, Audit: ../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md

Status: 2026-06-16 — Umbrella track; spec/plan/metadata planned. 2026-06-17 update: sub-track 1 (result_migration_review_pass_20260617) shipped; sub-track 2 (result_migration_small_files_20260617) initialized; 3 sub-tracks remaining. The umbrella specifies the sequence and scope of the 5 sub-tracks; each sub-track gets its own spec/plan/metadata when it starts.

Goal: Eliminate all 211 violations + 25 suspicious + 32 unclear = 268 "bad" sites across 42 files (per the exception_handling_audit_20260616 report). After all 5 sub-tracks ship, the data-oriented error handling convention is fully applied to all 65 src/ files, and the audit_exception_handling.py --strict mode can be wired into CI as a pre-commit gate.

5 sub-tracks (consistent result_migration_* prefix):

#	Sub-track	Size	Scope	Why this position
1	`result_migration_review_pass`	S	57 sites (32 UNCLEAR + 25 INTERNAL_RETHROW) across 15 files	First: human review + audit script heuristic updates inform all later sub-tracks
2	`result_migration_small_files`	L	37 files (35 SMALL + 2 MEDIUM from `--by-size`); 72 V+S sites	Second: quick wins; doesn't depend on the orchestrator or GUI; can run in parallel with 3-4
3	`result_migration_app_controller`	XL	56 sites in `src/app_controller.py` (166KB; 13 FastAPI boundary stay as-is) — Phase 6 added 2026-06-18 to fix the 28 silent-swallow sites that Phase 3's `logging.debug` migration didn't actually migrate (audit gate: `--strict` exits 0)	Third: high coordination with Hook API + MMA + RAG; gates the GUI migration
4	`result_migration_gui_2`	XL	55 sites in `src/gui_2.py` (260KB; 14 ? includes the +1 site `src/gui_2.py:1349` from the review pass)	Fourth: depends on 3 for clean API; the largest file
5	`result_migration_baseline_cleanup`	L	112 sites in 3 refactored files (mcp_client.py, ai_client.py, rag_engine.py)	Fifth: closes the gaps in the convention reference; parent's Path C deferred work

Total: 5 sub-tracks, 268 sites across 42 files, ~2100 lines changed.

NO day estimates (per the new Tier 1 rule added 2026-06-16). Effort is measured by scope (N files, M sites) only. The user / Tier 2 agent decides the actual pacing.

Sequence: 1 (review) -> 2 (small files) -> 3 (app_controller) -> 4 (gui_2) -> 5 (baseline cleanup). Tracks 2 + 5 can run in parallel; tracks 3 + 4 must be sequential (the GUI calls controller methods); track 1 is independent.

blocks: data_structure_strengthening_20260606 (parallel track; uses the cleaner Result API from this phase) and the user's stated send_result → send mass rename.

Out of scope (deferred to separate tracks): the send_result → send mass rename (user's stated manual refactor; post-this-phase), 23 lower-impact weak-type files (data_structure_strengthening_20260606), live_gui_mock_injection_20260615 infrastructure (separate track), RAG test quality cleanup (poll loops; separate track), and any audit script changes that belong in the review pass (sub-track 1) — those are detailed in conductor/tracks/result_migration_20260616/plan.md.

Track: Live GUI Test Infrastructure Fixes (test_execution_sim_live crash + test_live_gui_workspace_exists race) `[track-created: 2026-06-18]` `[shipped: 2026-06-18]`

Link: ./tracks/live_gui_test_fixes_20260618/, Spec: ./tracks/live_gui_test_fixes_20260618/spec.md, Plan: ./tracks/live_gui_test_fixes_20260618/plan.md, Metadata: ./tracks/live_gui_test_fixes_20260618/metadata.json, Report: ../../docs/reports/TRACK_COMPLETION_live_gui_test_fixes_20260618.md

Status: 2026-06-18 - SHIPPED. 4 phases, 8 atomic commits (1 setup + 4 TDD/test/fix + 2 docs + 1 audit). Pre-conditions for sub-track 2's full closure. Scope: 2 issues fixed; 2 src files modified + 2 test files extended + 1 conftest modified + 2 docs + 2 audit logs. Test result: 11/11 tiers PASS clean (~825s total).

Goal: Fix the 2 documented test infrastructure issues that blocked sub-track 2 (result_migration_small_files_20260617) from full closure. The 2 issues were reported as "documented issues" by sub-track 2 Phase 13 (commit 30ca3265). Both are pre-existing (not regressions from the Result[T] migration).

The 2 fixes:

Issue 1: test_execution_sim_live GUI subprocess crash (tier-3-live_gui)

Symptom: GUI subprocess (port 8999) crashes mid-test with 0xC00000FD = STATUS_STACK_OVERFLOW
Root cause: imgui.set_window_focus("Response") was called directly during the response panel render, exhausting the GUI main thread's 1.94 MB stack on Windows
Fix: defer the focus call to the next frame's idle phase via a new _pending_focus_response flag (commits d02c6d56, 0f796d7d)
Same root cause as test_z_negative_flows.py (documented in docs/reports/NEGATIVE_FLOWS_INVESTIGATION_20260617_REFINED.md)
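
The deferred-focus fix can be sketched without a GUI toolkit. Only the `_pending_focus_response` flag and the "Response" window name come from the track; the `ResponsePanel` class, the `on_idle` hook, and the injected `set_focus` callable are hypothetical stand-ins for the real render loop and imgui call.

```python
from typing import Callable


class ResponsePanel:
    def __init__(self, set_focus: Callable[[str], None]) -> None:
        self._set_focus = set_focus              # stand-in for the toolkit's focus call
        self._pending_focus_response = False

    def render(self, new_response: bool) -> None:
        """Called mid-frame: record the intent instead of focusing directly."""
        if new_response:
            self._pending_focus_response = True

    def on_idle(self) -> None:
        """Called in the next frame's idle phase: perform the focus once."""
        if self._pending_focus_response:
            self._pending_focus_response = False
            self._set_focus("Response")
```

Clearing the flag before calling `set_focus` ensures the request fires at most once per response even if the focus call itself triggers another render.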

Issue 2: test_live_gui_workspace_exists xdist race (tier-1-unit-gui)

Symptom: xdist race where the owner worker's teardown removes the shared workspace path before a client worker's test can assert it exists
Root cause: live_gui_workspace fixture in tests/conftest.py:727 returned handle.workspace without ensuring the path existed
Fix: call workspace.mkdir(parents=True, exist_ok=True) before returning (commits 3fdb2592, bf6bc67b)
Pre-existing on parent commit 4ab7c732 (verified in tests/artifacts/PHASE14_PARENT_VERIFICATION.log)
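
The Issue 2 fix relies on `Path.mkdir(parents=True, exist_ok=True)` being idempotent. A minimal sketch, with a hypothetical helper `ensure_workspace` and a throwaway temporary directory used only to keep the demo self-contained:

```python
import shutil
import tempfile
from pathlib import Path


def ensure_workspace(workspace: Path) -> Path:
    """Create the workspace (and parents) if missing; a no-op when it already exists."""
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace


def demo_race() -> bool:
    """Simulate another worker's teardown removing the path before it is used."""
    with tempfile.TemporaryDirectory() as root:
        workspace = Path(root) / "live_gui" / "workspace"
        ensure_workspace(workspace)
        shutil.rmtree(workspace)  # the owner worker's teardown
        return ensure_workspace(workspace).is_dir()
```

Because the call succeeds whether or not the directory exists, a fixture can invoke it on every access without coordinating between xdist workers.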

Deliverables:

1 setup commit (chore(scripts): relocate Tier 2 state paths to project-relative) - honors NEVER USE APPDATA directive; the failcount state and write_report failures directory now default to project-relative paths under tests/artifacts/
2 TDD red + 2 TDD green commits (one pair per issue)
1 audit commit (chore(audit): Phase 14.1 - verify Issue 2 on parent commit 4ab7c732)
1 audit commit (chore(audit): Phase 4.1 - 11/11 test tiers PASS clean)
2 docs commits (sub-track 2 reports updated with Phase 14 addendum)
1 track artifact import commit (conductor(track): import live_gui_test_fixes_20260618 artifacts)

blocks: sub-track 2 of result_migration_20260616 (full closure requires the 2 issues fixed).

Out of scope (deferred to follow-up track): the 4 @pytest.mark.skip markers for Gemini 503 pre-existing failures (test_auto_aggregate_skip, test_view_mode_summary, test_view_mode_default_summary, test_view_mode_custom_empty_default_to_summary). To remove them, mock the Gemini API in summarize.summarise_file for tests.

Track: Test Sandbox Hardening (hard sandbox for tests; root-cause fix for test data loss) `[track-created: 2026-06-19]`

Link: ./tracks/test_sandbox_hardening_20260619/, Spec: ./tracks/test_sandbox_hardening_20260619/spec.md, Plan: ./tracks/test_sandbox_hardening_20260619/plan.md, Metadata: ./tracks/test_sandbox_hardening_20260619/metadata.json

Status: 2026-06-19 - SPEC + PLAN committed. Ready for Tier 2 implementation. 9 phases, 30 tasks, ~11 atomic commits.

Goal: Make any pytest or run_tests_batched.py invocation provably incapable of writing files outside ./tests/. Default-on Python guard + opt-in OS-level wrapper. Root-cause fix: eliminate the silent SLOP_CONFIG env-var fallback that lets tests accidentally touch the user's real manual_slop.toml and related top-level files.

The 5 enforcement layers:

FR2 root-cause fix — src/paths.py:get_config_path() no longer falls back to <project_root>/config.toml via SLOP_CONFIG. New API: paths.set_config_override(path). CLI flag --config <path> at the entry point (sloppy.py for production, conftest.py for tests).
FR1 Python guard — sys.addaudithook autouse fixture blocks writes outside ./tests/ with RuntimeError("TEST_SANDBOX_VIOLATION: ..."). Hard fail; reads unaffected.
FR3 isolation migration — isolate_workspace moved off tmp_path_factory.mktemp to tests/artifacts/_isolation_workspace_<RUN_ID>/. pyproject.toml adds addopts = "--basetemp=tests/artifacts/_pytest_tmp". All test infra paths now under ./tests/.
FR4 static audit — scripts/audit_test_sandbox_violations.py flags hardcoded paths to top-level TOMLs + tempfile.mkdtemp/mkstemp without dir=. CI gate (--strict exits 1).
FR5 OS-level wrapper — scripts/run_tests_sandboxed.ps1 (Windows restricted-token + Job Object; OPT-IN).
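
The FR1 Python guard can be sketched with `sys.addaudithook` and the built-in "open" audit event (arguments: path, mode, flags). This is an illustration under stated assumptions, not the track's fixture: the helper names and the `enabled` switch are hypothetical, and writes made through `os.open` or file descriptors (where mode is not a string) are not covered here.

```python
import os
import sys
from pathlib import Path

_WRITE_CHARS = set("wax+")  # mode characters that permit writing


def is_write_outside(path, mode, root: Path) -> bool:
    """True when an open() call would write to a location outside root."""
    if not isinstance(path, (str, bytes, os.PathLike)) or not isinstance(mode, str):
        return False  # descriptor-based or os.open calls: out of scope for this sketch
    if not (_WRITE_CHARS & set(mode)):
        return False  # read-only opens are unaffected
    target = Path(os.fsdecode(path)).resolve()
    resolved_root = root.resolve()
    return target != resolved_root and resolved_root not in target.parents


def make_guard(root: Path, enabled: dict):
    """Build an audit hook; enabled['on'] is a switch because hooks cannot be removed."""
    def hook(event: str, args: tuple) -> None:
        if event == "open" and enabled["on"]:
            path, mode = args[0], args[1]
            if is_write_outside(path, mode, root):
                raise RuntimeError(f"TEST_SANDBOX_VIOLATION: {path}")
    return hook


def install_guard(root: Path) -> dict:
    """Install the guard for the rest of the process and return its on/off switch."""
    enabled = {"on": True}
    sys.addaudithook(make_guard(root, enabled))
    return enabled
```

Audit hooks stay registered for the life of the interpreter, so an autouse fixture built on this pattern would flip the switch off at teardown rather than try to unregister the hook.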

User directives (locked 2026-06-19):

NO ENV VARS for config path. --config CLI flag is the only override mechanism.
Test workspace file naming: config_overrides.toml (per user direction).
Hard fail on any sandbox violation (no warnings, no soft fails).
Tests should never need AppData temp.

Out of scope (deferred to follow-up tracks): converting the other 7 SLOP_* env vars (SLOP_GLOBAL_PRESETS, SLOP_GLOBAL_TOOL_PRESETS, SLOP_GLOBAL_PERSONAS, SLOP_GLOBAL_WORKSPACE_PROFILES, SLOP_CREDENTIALS, SLOP_MCP_ENV, SLOP_LOGS_DIR, SLOP_SCRIPTS_DIR); the user considers this the "mess" to address separately.

Baseline (per result_migration_small_files_20260617 shipped 2026-06-18): 1288 passed + 4 xdist-skipped. VC8 requires no regression vs. this baseline.

Root causes of data loss (per Phase 1 audit):

src/paths.py:get_config_path() at line 42 silently falls back to <project_root>/config.toml when SLOP_CONFIG is unset (the default for tests). This is the silent default that bites.
tests/conftest.py:isolate_workspace at line 265 uses tmp_path_factory.mktemp which lives in %TEMP%\pytest-of-<user>\ on Windows — outside ./tests/.
The Layer 1 Python guard is the runtime safety net; FR2 + FR3 are the proper fixes.

Deferred follow-up tracks (per metadata.json deferred_to_followup_tracks):

Convert the other 7 SLOP_* env vars to CLI flags (same pattern: paths.set_<thing>_override() + entry-point flag).
macOS/Linux OS-level sandbox wrapper (run_tests_sandboxed.sh using bwrap/unshare).
Per-fixture sandbox strictness tuning (@pytest.fixture(sandbox_strict=True)).
Read-side isolation (block reads of real config from tests).

Phase 9: Chore Tracks

Initialized: 2026-06-07

Completed (recently archived or in `tracks/`)

Track: Unused Scripts Cleanup [checkpoint: 46ce3cd] Link: ./tracks/unused_scripts_cleanup_20260607/, Spec: ./tracks/unused_scripts_cleanup_20260607/spec.md, Plan: ./tracks/unused_scripts_cleanup_20260607/plan.md Goal: Remove 30 confirmed-unused one-off scripts from scripts/ (56 → 26 files, 54% reduction). 5 atomic per-category commits; no new CI gate; follow-up unused_scripts_audit_20260607 recorded. All non-GUI test batches still pass; 2 audit scripts (main_thread_imports, weak_types) report no new violations.
Track: License & CVE Audit (Dependency Compliance) [checkpoint: a7ab994f] Link: ./tracks/license_cve_audit_20260607/, Spec: ./tracks/license_cve_audit_20260607/spec.md, Plan: ./tracks/license_cve_audit_20260607/plan.md Goal: Build scripts/audit_license_cve.py — single audit script that checks third-party deps (pyproject.toml + uv.lock transitive) for license compliance + known CVEs + version-pinning + SPDX source-headers. Tilde-pin all deps, delete requirements.txt, regenerate uv.lock (gitignored per project policy), add --strict mode + baseline file (CI gate). Policy: ALLOW (permissive + weak copyleft + public domain), BLOCK (GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, unknown). Track is scope-limited to third-party deps; the project's own LICENSE and SPDX headers are explicitly OUT of scope (the user reserves all rights to the repo). 28 unit + integration tests passing; --strict mode wired as CI gate; baseline file committed at scripts/audit_license_cve.baseline.json. 4 atomic commits: audit script + initial report, tilde-pin + lock regen + delete requirements.txt, --strict + baseline, tracks.md update.
Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix [COMPLETE 2026-06-11] [archived] Link: ./archive/qwen_llama_grok_integration_20260606/, Spec: ./archive/qwen_llama_grok_integration_20260606/spec.md, Plan: ./archive/qwen_llama_grok_integration_20260606/plan.md Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Vendor Capability Matrix (7 v1 + 12 v2 = 19 capabilities total) in src/vendor_capabilities.py. Shared send_openai_compatible() helper in src/openai_compatible.py. MiniMax refactored to use the helper. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. Follow-up track: qwen_llama_grok_followup_20260611 (also archived).
Track: Qwen/Llama/Grok Follow-Up (tool loop, PROVIDERS move, UX, local-first, matrix v2, old-vendor wiring) [COMPLETE 2026-06-11] [archived] Link: ./archive/qwen_llama_grok_followup_20260611/, Spec: ./archive/qwen_llama_grok_followup_20260611/spec.md, Plan: ./archive/qwen_llama_grok_followup_20260611/plan.md Goal: Close the gaps from the parent track. 6 phases: (1) run_with_tool_loop shared helper + apply to 4 vendors; (2) PROVIDERS move to src/ai_client.py (HARD RULE compliance) + 4 import sites; (3) UX adaptations 2-9; (4) local-first + matrix v2 expansion (12 new fields, native Ollama adapter, GUI "Local Model" badge, runtime local override); (5) Anthropic/Gemini/DeepSeek matrix entries + old-vendor matrix wiring (grok + minimax consult the v2 fields); (6) archive. Reports: ../docs/reports/qwen_llama_grok_followup_phase5_final_20260611.md, ../docs/reports/qwen_llama_grok_followup_session_end_20260611.md, ../docs/reports/qwen_llama_grok_followup_deferred_work_20260611.md, ../docs/reports/meta_llama_api_verification_20260611.md.

Active Research Tracks (2026-06+)

Tracks that produce a research deliverable (a markdown report) rather than Application code. These are non-impl by design.

Active

Track: Fable System Prompt Review (Critical Analysis) [initialized: 058e2c93; shipped: 2026-06-18]

Link: ./tracks/fable_review_20260617/, Spec: ./tracks/fable_review_20260617/spec.md, Metadata: ./tracks/fable_review_20260617/metadata.json, State: ./tracks/fable_review_20260617/state.toml

Goal: Critical analysis of Anthropic's Claude Fable 5 system prompt (1585 lines, the public "Mythos" version), comparing it against Manual Slop's existing agent-directive corpus and Mike Acton's nagent patterns. 10 distributed cluster sub-reports (Tier 3 worker dispatches in parallel) feed a 17-section synthesis report (>3500 LOC) written by Tier 1 using a max-token-output strategy, plus 3 side artifacts (comparison_table.md, decisions.md for the deferred nagent-rebuild, nagent_takeaways_fable_20260617.md). Verdict framework: Useful / Persona Performance / Anti-User / Mixed.

Hard rule (per user 2026-06-17): docs/artifacts/Fable System Prompt.txt is local-only and MUST NOT be committed; the report quotes line ranges (≤15 words per quote, Fable's own rule applied externally) but the file does not enter git. No day estimates. No T-shirt sizes.

Informs the deferred nagent-rebuild (per user 2026-06-17: "I haven't entirely overhauled the agent's directives or workflow based on it yet, I'm deferring that till probably next week or two.").

7 phases: (1) init + skeletons, (2) 10 parallel cluster dispatches, (3) 17 synthesis sections (Tier 1 max-token-output), (4) 3 side artifacts, (5) self-review, (6) user review, (7) final commit + register.

SHIPPED 2026-06-18: 14 files, 5,683 LOC total (10 cluster sub-reports 3,278 LOC + synthesis report 1,800 LOC + 3 side artifacts 605 LOC). Verdict distribution: 47% Useful, 38% Persona, 15% Anti-User, 7% Mixed. 20 concrete recommendations in decisions.md (11 adoptions + 7 explicit rejections + 2 ignore). Fable-artifact discipline verified: 0 commits, 0 tracked files, 0 tree entries.

Note: synthesis report is 1,800 LOC (below 3,500 spec target); content is complete but per-section verbosity is below spec target. Track ready for archive (deferred per project convention).

Notes

Archive link convention: ./archive/... paths in this file resolve to conductor/archive/... (this file is at conductor/tracks.md). The 71 archive links in this file are all valid as of 2026-06-08.
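The resolution rule above can be re-checked mechanically whenever archive folders move. The following is a minimal sketch, not an existing project script: the function name `find_broken_archive_links` and the regex are assumptions for illustration, and it only recognises links of the form `./archive/<folder>/` whose folder name is made of word characters and hyphens.

```python
import re
from pathlib import Path

# Hypothetical pattern: "./archive/<name>" with an optional trailing slash.
# The lookbehind skips "../archive/..." references, which resolve differently.
ARCHIVE_LINK = re.compile(r"(?<!\.)\./archive/[\w-]+/?")

def find_broken_archive_links(tracks_md: Path) -> list:
    """Return the ./archive/... links in tracks_md whose target does not exist."""
    base = tracks_md.parent  # links resolve relative to the file's own folder
    text = tracks_md.read_text(encoding="utf-8")
    broken = []
    for link in sorted(set(ARCHIVE_LINK.findall(text))):
        if not (base / link).exists():
            broken.append(link)
    return broken
```

Calling `find_broken_archive_links(Path("conductor/tracks.md"))` from the repository root would return an empty list when every link still resolves.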

Status legend:

[ ] not started
[~] in progress
[x] completed (track may still be in tracks/ or may have been moved to archive/)
~~**...**~~ struck-through (renamed/replaced/superseded)
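
For tooling that reads this file, the three checkbox markers in the legend map directly onto a lookup table. This is a hedged sketch only: `status_of` is a hypothetical helper, and it deliberately ignores struck-through entries and free-form tags such as `[COMPLETE 2026-06-07]`.

```python
import re
from typing import Optional

# Marker-to-meaning table taken from the status legend.
STATUS_MARKERS = {
    "[ ]": "not started",
    "[~]": "in progress",
    "[x]": "completed",
}

# Matches exactly one of the three checkbox markers.
_MARKER = re.compile(r"\[( |~|x)\]")

def status_of(line: str) -> Optional[str]:
    """Return the legend meaning of the first checkbox marker in line, if any."""
    match = _MARKER.search(line)
    return STATUS_MARKERS[match.group(0)] if match else None
```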

Naming convention: Each track's spec.md and plan.md (where present) follow the project's standard format: spec.md for design intent (the "why"), plan.md for executable tasks (the "how"). See conductor/tracks/data_oriented_error_handling_20260606/ for the canonical example.

Editing this file: When you mark a track as [x] and move its folder to archive/, also move its entry to the appropriate Archived sub-section. When you start a new track, create the folder under tracks/ first, then add the entry to the Active Tracks table at the top. The git-blame sort order (0a, 0b, 0c...) is no longer used; this file is now organized by phase and dependency.

