manual_slop/conductor/tracks.md
ed de1ffadd92 conductor(tracks): update code_path_audit_20260607 entry to reflect MVP pivot
Updated the Code Path Audit entry in the tracks.md registry to accurately
describe the MVP state after the code_path_audit_polish_20260622 follow-up:

REMOVED:
- '4 renderers (to_dsl_v2 flat-section, to_markdown 10-section, to_tree
  box-drawing, parse_dsl_v2 round-trip)' -> '2 renderers (to_markdown
  10-section, to_tree box-drawing)'
- '14-tagged-word v2 postfix DSL' claim (the DSL parser was deprecated)

ADDED:
- 'MVP output is a single AUDIT_REPORT.md (6797 lines, 311KB) + per-aggregate
  markdowns + summary.md as a TOC pointer'
- '127 tests passing after the polish follow-up (was 131 pre-polish; -4 DSL
  tests removed)' (was previously 131)
- Note about DSL deprecation referencing code_path_audit_polish_20260622

No other track entries were modified.
2026-06-24 10:07:01 -04:00

# Project Tracks

This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder (or in `../archive/<track_name>/` for completed tracks).

**Structure:**

- **Active Tracks (Current Queue):** In-flight and unblocked work the implementer can pick up today.
- **Phase 0 - 9 (Chronological):** The full project history in chronological order. Each phase has three sub-sections: **Active** (work in progress), **Completed** (work shipped but track not yet archived), **Archived** (track folder moved to `archive/`).

Archive directories live at `../archive/<track_name>/` (from this file's location at `conductor/tracks.md`); the `./archive/...` links in this file are relative to that location and resolve correctly.

---

## Active Tracks (Current Queue)

Tracks that are unblocked and ready to start. Ordered by **dependency** (blocked-by first) and **priority** (A foundational → D forward-looking).

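
The ordering rule above (blockers first, then priority A through D) can be sketched as a priority-aware topological sort. This is only an illustration under stated assumptions, not the project's tooling: the `Track` record, `order_tracks`, and `PRIORITY_RANK` are hypothetical names, and the short identifiers `mcp_architecture_refactor` and `public_api_result_migration` are made up for the example. Blockers that are no longer in the queue (for example, ones already merged) are assumed to impose no ordering.

```python
import heapq
from dataclasses import dataclass

# Hypothetical rank table; rows without a letter priority sort last.
PRIORITY_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}


@dataclass(frozen=True)
class Track:
    name: str
    priority: str = "-"
    blocked_by: tuple = ()


def order_tracks(tracks):
    """Return track names with blockers first, ties broken by priority then name."""
    by_name = {t.name: t for t in tracks}
    # Only blockers still present in the queue constrain the order.
    pending = {t.name: {b for b in t.blocked_by if b in by_name} for t in tracks}
    dependents = {t.name: [] for t in tracks}
    for t in tracks:
        for blocker in pending[t.name]:
            dependents[blocker].append(t.name)

    def sort_key(name):
        rank = PRIORITY_RANK.get(by_name[name].priority, len(PRIORITY_RANK))
        return (rank, name)

    ready = [sort_key(name) for name, blockers in pending.items() if not blockers]
    heapq.heapify(ready)
    ordered = []
    while ready:
        _, name = heapq.heappop(ready)
        ordered.append(name)
        for dep in dependents[name]:
            pending[dep].discard(name)
            if not pending[dep]:
                heapq.heappush(ready, sort_key(dep))
    if len(ordered) != len(tracks):
        raise ValueError("dependency cycle among tracks")
    return ordered


# Illustrative queue loosely modelled on the first rows of the table.
queue = [
    Track("public_api_result_migration", "D", ("data_oriented_error_handling",)),
    Track(
        "mcp_architecture_refactor",
        "A",
        ("test_infrastructure_hardening_20260609", "data_oriented_error_handling"),
    ),
    Track("data_oriented_error_handling", "A", ("qwen_llama_grok",)),
    Track("qwen_llama_grok", "A"),
]
ordered_names = order_tracks(queue)
print(ordered_names)
```

In this sketch a track becomes eligible only once every blocker still in the queue has been emitted, and the heap picks the highest-priority eligible track next, so a priority D track never jumps ahead of an unblocked priority A track.
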
| # | Priority | Track | Status | Blocked By | Notes |
|---|---|---|---|---|---|
| 2 | A | [Qwen, Llama & Grok Vendor Integration + Capability Matrix](#track-qwen-llama-grok-vendor-integration--capability-matrix) | spec ✓, plan ✓, 50/79 tasks done; **Phase 6 in progress (docs); NOT archiving — has follow-up track** | **test_infrastructure_hardening_20260609 (merged)** |
| 3 | A | [Data-Oriented Error Handling (Fleury Pattern)](#track-data-oriented-error-handling-fleury-pattern) | spec ✓, plan ✓, ready to start | startup_speedup, test_batching_refactor, **test_infrastructure_hardening_20260609 (merged)**, qwen_llama_grok |
| 4 | A | [MCP Architecture Refactor (Sub-MCP Extraction)](#track-mcp-architecture-refactor-sub-mcp-extraction) | spec ✓, plan pending | test_infrastructure_hardening_20260609 (merged), data_oriented_error_handling, data_structure_strengthening |
| 6 | D | [Public API Result Migration](#track-public-api-result-migration-followup) | placeholder; not yet specced | data_oriented_error_handling (deprecated `send()`) |
| 6a | A | [Public API Migration + UI Polish Test Cleanup](#track-public-api-migration--ui-polish-test-cleanup) | spec ✓, plan ✓, shipped 2026-06-15 (13 pre-existing failures fixed; 3 RAG failures deferred to `rag_test_failures_20260615`) | (none — independent; **NEW 2026-06-15**; combined stability track) |
| 6b | A | [RAG Test Failures Fix](#track-rag-test-failures-fix-new-2026-06-15) | spec ✓, plan ✓, shipped 2026-06-15 (3 RAG tests fixed; first fully green baseline 1288 + 4 + 0) | (none — independent; **NEW 2026-06-15**; small bug-fix track) |
| 6c | B | [Exception Handling Audit (Convention Compliance + Doc Clarification)](#track-exception-handling-audit-convention-compliance--doc-clarification) | spec ✓, plan ✓, shipped 2026-06-16 (211 violations identified across 42 files; 5 doc gaps closed) | (none — independent; **NEW 2026-06-16**; audit + doc track; identifies the migration target for `data_structure_strengthening_20260606` and the user's `send_result` → `send` rename) |
| 6d | A | [Result Migration (5 sub-tracks)](#track-result-migration-5-sub-tracks-new-2026-06-16) | umbrella spec ✓; sub-tracks 1+2 initialized (sub-track 1: `result_migration_review_pass_20260617` **shipped 2026-06-17**; sub-track 2: `result_migration_small_files_20260617` initialized; 3 remaining) | `exception_handling_audit_20260616`; identifies the migration target | (none — independent; **NEW 2026-06-16**; refactor phase; 5 sub-tracks eliminate the 268 "bad" sites per the audit; sub-tracks use the consistent `result_migration_*` prefix; **post-review pass 2026-06-17**: sub-track 4 gains 1 site `src/gui_2.py:1349`) |
| 6d-1 | A | [Result Migration Sub-Track 1: Review Pass](#track-result-migration-sub-track-1-review-pass-2026-06-17) | spec ✓, plan ✓, metadata ✓, state ✓; **shipped 2026-06-17** (43 sites classified: 23 compliant + 1 migration-target + 8 PATTERN_1/2 + 9 compliant + 1 audit-script-bug; 10 new heuristics added; 3 audit-script bugs documented) | `result_migration_20260616` (umbrella); `exception_handling_audit_20260616` (shipped 2026-06-16) | (**NEW 2026-06-17**; sub-track 1 of 5; 43 sites classified; no production code change; T-shirt S; per-site decisions feed sub-tracks 2-4; 3 audit-script bugs documented for sub-track 2 Phase 1) |
| 6d-2 | A | [Result Migration Sub-Track 2: Small Files + Audit-Script Bug Fixes](#track-result-migration-sub-track-2-small-files--audit-script-bug-fixes-2026-06-17) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-18** (Phase 10 REJECTED for sliming 21 sites via 5 laundering heuristics; Phase 11 REDOES the 21 sites: 5 full Result migrations in warmup.py + 2 helper extracts + 14 documented; Phase 12 = ACTUAL full Result[T] migration: 16 sites in api_hooks.py + 27 sites in 16 small files; Heuristic #19 REMOVED; visit_Try bug FIXED; Heuristic D ADDED; Drain Points section in styleguide; **Phase 12 REJECTED for false test claim**; **Phase 13 = script crash fixed (UTF-8 reconfigure in run_tests_batched.py) + 3 failures investigated on parent commit (0 regressions) + 4 pre-existing Gemini 503 tests documented with @pytest.mark.skip + test_execution_sim_live switched from gemini_cli to gemini per user directive (STILL FAILS, reported for diff track); 11/11 tiers actually run; 9 PASS clean + 2 PASS with documented issues**) | `result_migration_20260616` (umbrella); `result_migration_review_pass_20260617` (shipped 2026-06-17) | (**NEW 2026-06-17**; sub-track 2 of 5; 37 files (35 SMALL + 2 MEDIUM) with 76 sites; Phase 1 = 3 audit-script bugs fixed; Phases 3-8 = 49 sites migrated; Phase 10 = 26 SILENT_SWALLOW + 14 new UNCLEAR sites via full Result + 5 new heuristics; **Phase 10 REJECTED; Phase 11 = 5 full Result + 2 helper extracts + 14 documented; 5 laundering heuristics REVERTED; Heuristic A ADDED; Phase 12 = ACTUAL migration of all sites + styleguide Drain Points; Phase 13 = test count verification; 2 reported issues for diff tracks**) |
| 6d-3 | A | [Result Migration Sub-Track 3: App Controller](#track-result-migration-sub-track-3-app-controller-2026-06-18) | spec ✓, plan ✓, metadata ✓, state ✓, **active**; migrates 45 sites in `src/app_controller.py` to `Result[T]` (32 INTERNAL_BROAD_CATCH + 8 INTERNAL_SILENT_SWALLOW + 4 INTERNAL_RETHROW + 1 INTERNAL_OPTIONAL_RETURN); 22 sites stay as-is (15 BOUNDARY_FASTAPI + 2 BOUNDARY_SDK + 4 INTERNAL_COMPLIANT + 1 INTERNAL_PROGRAMMER_RAISE). **Phase 1 = fix the 2 known regressions** (test_tool_presets_execution::test_tool_ask_approval + test_extended_sims::test_execution_sim_live) caused by the half-migrated `session_logger.log_tool_call` call site in `_offload_entry_payload` (lines 3715, 3721). 5-file-commit pattern from `doeh_test_thinking_cleanup_20260615` (1 source + 1 test + 1 plan + 1 metadata + 1 state per task). 6 phases: (1) Setup + fix regressions; (2) 32 broad-catch → 4 bulk batches; (3) 8 silent-swallow → 2 batches with logging.debug per Heuristic #19; (4) 4 rethrow classified + 1 optional migrated; (5) Verify + audit + end-of-track report. | `result_migration_20260616` (umbrella); `result_migration_small_files_20260617` (shipped 2026-06-18) | (**NEW 2026-06-18**; sub-track 3 of 5; scope: 1 source file (src/app_controller.py) modified across 6 phases; 45 migration sites organized into 4 bulk batches + 3 single-site tasks; 1 new test file (test_app_controller_result.py) + 2 test files updated; 4 metadata/plan/state files; 1 end-of-track report; 18 atomic commits. **Scope larger than umbrella's T-shirt estimate** (45 migration + 22 stay = 67 total, not the estimated 22 + 34 = 56); the audit's per-category output is the source of truth, not the umbrella's T-shirt estimate) |
| 6d-4 | A | [Result Migration Sub-Track 4: gui_2.py](#track-result-migration-sub-track-4-gui_2py-20260619) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-20**; migrated 42 sites in `src/gui_2.py` (25 INTERNAL_BROAD_CATCH + 13 INTERNAL_SILENT_SWALLOW + 2 INTERNAL_RETHROW + 2 UNCLEAR) to `Result[T]`; added 3 new drain-plane render functions + 1 new test file + 2 new audit heuristics (Phase 11 dunder raise + Phase 12 lazy-loading fallback). **Audit: V=0, S=0, ?=0 for gui_2.py.** 81 atomic commits across 13 phases; 114 tests pass; Tier 1+2 batched: 10/10 PASS; Tier 3: 1 known issue (FPS 28.46 vs 30 threshold; documented in TRACK_COMPLETION). **Anti-sliming protocol: 13 phases cap each phase at <=10 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test.** | `result_migration_app_controller_20260618` (sub-track 3, SHIPPED 2026-06-19 with Phase 7; data plane ready) | (**NEW 2026-06-19**; sub-track 4 of 5; scope: 1 source file (src/gui_2.py) modified across 13 phases; 42 migration sites organized into 12 migration phases + 3 setup phases; 1 new test file (tests/test_gui_2_result.py) with 114 tests; 1 modified test file (tests/test_audit_heuristics.py) with 8 regression tests; 4 metadata/plan/state/spec files; 1 end-of-track report; 81 atomic commits. **Extra-long phase structure per user directive (2026-06-19) to prevent Tier 2 sliming.**) |
| 6d-5 | A | [Result Migration Sub-Track 5: Baseline Cleanup](#track-result-migration-baseline-cleanup-20260620) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-20**; migrated 88 sites across 3 baseline files (`src/mcp_client.py` 46 + `src/ai_client.py` 33 + `src/rag_engine.py` 9) to make the convention reference 100% compliant. **All 3 baseline files V=0** (strict audit gate passes for baseline). 122 unit tests pass (31 baseline + 16 audit heuristics + 13 tier4 + 62 tier2). 9/11 batched tiers pass (2 with pre-existing flaky failures). 1 regression caught + fixed (test_set_tool_preset_with_objects — `global` declaration lost in helper extraction). **Same anti-sliming protocol as sub-track 4: 14 phases cap each phase at <=9 sites with per-phase styleguide re-read + per-site audit pre/post check + per-phase invariant test.** 84 atomic commits across 14 phases. **Known limitations documented**: 9 Pattern 1/3 RETHROW sites remain (audit lacks heuristic; strict mode accepts); 4 pre-existing non-baseline INTERNAL_OPTIONAL_RETURN in external_editor/session_logger/project_manager (out of scope). | `result_migration_gui_2_20260619` (sub-track 4, SHIPPED 2026-06-20) | (**NEW 2026-06-20, SHIPPED 2026-06-20**; sub-track 5 of 5; scope: 3 source files (mcp_client.py + ai_client.py + rag_engine.py = 231KB / 5917 lines) modified across 14 phases; 88 migration sites organized into 12 migration phases + 3 setup phases; 1 new test file (tests/test_baseline_result.py) with 31 tests; 3 inventory docs (1 per file); 4 metadata/plan/state/spec files; 1 end-of-track report + 1 progress report + 1 TIER1_REVIEW report; 84 atomic commits. **Same anti-sliming template as sub-track 4 per user directive (2026-06-20); completes the 5-sub-track campaign — 100% Result[T] convention coverage across all 65 src/ files.**) |
| 6d-6 | A | [Result Migration: Cruft Removal (Wrapper Obliteration)](#track-result-migration-cruft-removal-wrapper-obliteration-20260620) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-20 with Phase 9 patch 2026-06-21**; obliterated 9 legacy `def _x(): return _x_result(...).data` wrappers across 4 files (mcp_client 1, ai_client 5, rag_engine 1, gui_2 2). **0 legacy wrappers remain in src/ (verified by scripts/audit_legacy_wrappers.py + 4 Phase 9 invariant tests).** 127/127 unit tests pass (31 baseline + 16 heuristic + 11 cruft + 64 tier2 + 5 thinking); 9/11 batched tiers PASS (2 with pre-existing flaky failures). **OBLITERATE principle per user directive (2026-06-20): no pass-throughs; no backward compat; in-site callers rewritten to use `_x_result(...).ok` directly; the dead code dies.** 9 phases: (0) Setup + styleguide re-read; (1) Fix 5 failing tests (synthesized baseline JSON from inventory docs; not 7 as spec claimed); (2) Final detailed audit (full legacy wrapper inventory; 9 found via revised audit script); (3-6) Per-file wrapper removal; (8) Audit gate + end-of-track report + campaign close-out; (9) **Phase 9 PATCH per Tier 1 (2026-06-21)** — verified the 3 missing wrappers were actually obliterated in Phases 5-6 (not at the time Tier 1 inspected the tier-2-clone at 8f6d044d); added 4 invariant tests; added CORRECTION NOTICE at top of TRACK_COMPLETION doc; updated campaign status report to true 100% complete. **Closes the 5-sub-track result_migration_20260616 campaign: 100% Result[T] convention coverage across all 65 src/ files.** 21+ atomic commits. End-of-track report: `docs/reports/TRACK_COMPLETION_result_migration_cruft_removal_20260620.md` (with CORRECTION NOTICE). | `result_migration_baseline_cleanup_20260620` (sub-track 5, SHIPPED 2026-06-20) | (**NEW 2026-06-20, SHIPPED 2026-06-20 + Phase 9 patch 2026-06-21**; campaign close-out track; 1 new test file (tests/test_cruft_removal.py with 18 tests) + 1 new audit script (scripts/audit_legacy_wrappers.py) + 1 inventory doc (tests/artifacts/PHASE2_WRAPPER_AUDIT.md) + 1 throw-away synth script; 14 source/test files modified; 1 end-of-track report; 1 campaign status report update; 25+ atomic commits. **Anti-sliming protocol: 9 phases cap each phase at 1-5 wrappers with per-phase styleguide re-read + per-wrapper audit pre/post check + per-wrapper invariant test.**) |
| 6e | A (meta-tooling) | [Tier 2 Autonomous Sandbox (unattended track execution)](#track-tier-2-autonomous-sandbox-new-2026-06-16) | spec ✓, plan ✓, **shipped 2026-06-16** (9 phases, 24 default-on tests + 4 opt-in tests + 1 smoke e2e) | (none — independent; **NEW 2026-06-16**; meta-tooling; eliminates the `permission: ask` bottleneck for well-regularized tracks via a 3-layer enforcement stack: OpenCode permission system + Windows restricted token + git hooks) |
| 6f | A (meta-tooling) | [Tier 2 Sandbox File Leak Prevention (revert + 3-layer defense)](#track-tier-2-sandbox-file-leak-prevention-new-2026-06-20) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-20**; selectively reverted the 4 user-named files from offender commit `00e5a3f2` (`.opencode/agents/tier2-autonomous.md`, `.opencode/commands/tier-2-auto-execute.md`, `opencode.json`, `mcp_paths.toml`); added 3-layer defense: pre-commit hook at `conductor/tier2/githooks/pre-commit` (auto-unstages forbidden files at commit boundary; 12 tests), `scripts/audit_tier2_leaks.py` (working-tree audit with `--strict` CI gate; 13 tests), wired hook installation into `scripts/tier2/setup_tier2_clone.ps1`. 25 default-on + 4 opt-in tests pass; 4 atomic commits (`fab2e55b` + `81e1fd7b` + `f5d8ea04` + `8f54deda`); user-driven response to a one-off incident (per user directive: tier-2 must NEVER commit those files again; **NOT via gitignore**). **DEFERRED**: CI wiring of audit `--strict` mode; rebase of stale tier-2 branches (`tier2/result_migration_app_controller_phase6_20260619`, `tier2/test_sandbox_hardening_20260619`) on `origin/master@8f54deda` to drop `00e5a3f2` (user action). | (none — independent; **NEW 2026-06-20**; meta-tooling fix; selective revert of 4 of 9 changes in offender commit `00e5a3f2`) |
| 7 | — | [UI Polish (Five Issues)](#track-ui-polish-five-issues) | spec ✓, plan ✓, ready to start (Phases 1/4/5 shipped; Phases 2/3 code shipped but tests broken — fixed by track 6a) | (none — independent) |
| 7a | B | [SQLite-Granularity Inline Docs for gui_2.py](#track-sqlite-granularity-inline-docs-for-gui_2py) | spec ✓, plan ✓, complete | (none — independent) |
| 7b | B | [Continued SQLite-Granularity Inline Docs for gui_2.py](#track-continued-sqlite-granularity-inline-docs-for-gui_2py) | spec ✓, plan ✓, complete | (none — independent) |
| 7c | B | [SQLite-Granularity Inline Docs for ai_client.py](#track-sqlite-granularity-inline-docs-for-ai_clientpy) | spec ✓, plan ✓, ready to start | (none — independent) |
| 7d | A | [Live GUI Test Infrastructure Fixes](#track-live-gui-test-infrastructure-fixes-new-2026-06-18) | spec ✓, plan ✓, metadata ✓, state ✓, **active**; addresses 2 issues reported for diff tracks by `result_migration_small_files_20260617` Phase 13: (1) `test_execution_sim_live` GUI subprocess (port 8999) crashes mid-test during script generation flow — same failure with both `gemini_cli` and `gemini`; NOT provider-specific; 90s timeout reached without AI text; (2) `test_live_gui_workspace_exists` xdist race — workspace cleanup timing under parallel xdist; passes in isolation. 4 phases: (1) Investigation + Issue 2 parent-commit verification; (2) Fix Issue 2 (TDD); (3) Fix Issue 1 (TDD + remove diagnostic logging); (4) Final verification (11/11 tiers PASS clean). | `result_migration_small_files_20260617` (shipped 2026-06-18 with the 2 issues reported for diff tracks) | (**NEW 2026-06-18**; test-infrastructure track; 2-3 files affected (test + src); TDD for each issue; 11-tier verification required; NO new `@pytest.mark.skip` markers per user directive; out of scope: the 4 Gemini 503 skip markers from sub-track 2 Phase 13 — deferred to a separate follow-up track that mocks the Gemini API in `summarize.summarise_file`) |
| 16 | A | [Test Sandbox Hardening](#track-test-sandbox-hardening-new-2026-06-19) | spec ✓, plan ✓, metadata ✓, state ✓, **ready to start**; 5-part fix for test data loss outside `./tests/`. Phase 1: investigation + baseline pass count + audit of `get_config_path()` callers. Phase 2: `scripts/audit_test_sandbox_violations.py` (FR4 static audit + `--strict` CI gate). Phase 3: `_enforce_test_sandbox` autouse fixture in conftest.py using `sys.addaudithook` (FR1 Python guard; hard fail on any write outside `./tests/`). Phase 4: root-cause fix — remove `SLOP_CONFIG` env-var fallback from `src/paths.py`; add `--config <path>` CLI flag to sloppy.py + conftest.py; `set_config_override(path)` module-level API (FR2). Phase 5: `isolate_workspace` migration off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`; pyproject.toml `--basetemp` addopts; `SLOP_CREDENTIALS`/`SLOP_MCP_ENV` env vars added to non-live_gui tests; tech-stack.md dated note (FR3). Phase 6: `scripts/run_tests_sandboxed.ps1` (FR5 Windows restricted-token wrapper, OPT-IN). Phase 7: `conductor/code_styleguides/test_sandbox.md` + updates to workspace_paths.md and guide_testing.md (FR7 docs). Phase 8: full 11-tier verification. Phase 9: end-of-track report. 13 regression tests in `tests/test_test_sandbox.py`. ~11 atomic commits. | (none — independent; **NEW 2026-06-19**; test-infrastructure + root-cause fix; primary motivation: user has lost important sample data multiple times over the past month because tests wrote to top-level TOML files; **NO ENV VARS for config path per user directive** — `--config` CLI flag is the only override mechanism; test workspace file naming: `config_overrides.toml`; hard fail on any sandbox violation; tests should never need AppData temp (`tempfile.mkdtemp/mkstemp` without `dir=` is flagged); baseline 1288 + 4 + 0; **out of scope**: converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`) to CLI flags — user considers this a separate "mess" to address in follow-up tracks; deferred: macOS/Linux OS-level wrapper, per-fixture sandbox strictness tuning, read-side isolation) |
| 8 | — | [Bootstrap gencpp Python Bindings](#track-bootstrap-gencpp-python-bindings) | spec TBD | (none — independent) |
| 9 | — | [Tree-Sitter Lua MCP Tools](#track-tree-sitter-lua-mcp-tools) | spec TBD | (none — independent) |
| 10 | — | [GDScript Language Support Tools](#track-gdscript-language-support-tools) | spec TBD | (none — independent) |
| 11 | — | [C# Language Support Tools](#track-c-language-support-tools) | spec TBD | (none — independent) |
| 12 | — | [OpenAI Provider Integration](#track-openai-provider-integration) | spec TBD | (none — independent) |
| 13 | — | [Zhipu AI (GLM) Provider Integration](#track-zhipu-ai-glm-provider-integration) | spec TBD | (none — independent) |
| 14 | — | [AI Provider Caching Optimization](#track-ai-provider-caching-optimization) | spec TBD | (none — independent) |
| 15 | — | [Manual UX Validation & Review](#track-manual-ux-validation--review) | spec TBD | (none — independent) |
| 15a | — | [Manual UX Validation — ASCII-Sketch Workflow](#track-manual-ux-validation--ascii-sketch-workflow-new-2026-06-08) | spec ✓, plan ✓, ready to start | (none — independent; NEW 2026-06-08) |
| 15b | — | [Chunkification Optimization (Contingency)](#track-chunkification-optimization-new-2026-06-08-contingency) | spec ✓ (contingency), no plan | hard constraint surface (deferred) |
| 16 | — | [GenCpp Dogfood Feedback Loop](#track-gencpp-dogfood-feedback-loop) | spec TBD | (none — independent; oldest pending track) |
| 17 | A | [Code Path Audit](#track-code-path-audit) | spec ✓ + plan ✓ (revised 2026-06-08 post-4-tracks; **pre-flight adjusted 2026-06-21** with 2 new actions + 5 micro-benchmarks + no-TypeError assertion per `docs/handoffs/PROMPT_FOR_TIER_1.md`) | test_infrastructure_hardening_20260609 (merged), any_type_componentization_20260621 (shipped 2026-06-21), phase2_4_5_call_site_completion_20260621 (BLOCKER for the broadcast() TypeError fix; unblocks audit instrumentation) |
| 23 | A (research) | [Intent-Based Scripting Languages Survey](#track-intent-based-scripting-languages-survey-new-2026-06-12) | spec ✓, plan pending | (none — independent; NEW 2026-06-12; **non-impl research track**, **time-sensitive: report must complete before nagent v2.2**) |
| 24 | A (bugfix) | [AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek)](#track-ai-loop-regressions-minimax-gemini-gemini-cli-deepseek-new-2026-06-14) | spec ✓, plan ✓, shipped 2026-06-15 (with 1 critical `_api_generate` regression + 2 deferred bugs — see `doeh_test_thinking_cleanup_20260615`) | (none — independent; **NEW 2026-06-14**; user-blocking; 3 bugs from `data_oriented_error_handling_20260606`) |
| 25 | B (research) | [Fable System Prompt Review (Critical Analysis)](#track-fable-system-prompt-review-critical-analysis-new-2026-06-17) | spec ✓, plan pending | (none — independent; **NEW 2026-06-17**; **non-impl research track**, **informs the deferred nagent-rebuild**; 10 cluster sub-reports + 17-section synthesis report >3500 LOC + 3 side artifacts; Fable artifact at `docs/artifacts/Fable System Prompt.txt` is local-only and **NEVER committed**) |
| 18 | — | [GUI Architecture Refinement](#track-gui-architecture-refinement) | (no spec.md) | (TBD) |
| 19 | — | [Context First Message Fix](#track-context-first-message-fix) | spec TBD | (none — independent) |
| ~~19~~ | — | ~~[Fix Remaining Tests](#track-fix-remaining-tests)~~ | ~~SUPERSEDED by track 1~~ | — |
| ~~20~~ | — | ~~[Test Harness Hardening](#track-test-harness-hardening)~~ | ~~SUPERSEDED by track 1~~ | — |
| ~~21~~ | — | ~~[Test Patch Fixes](#track-test-patch-fixes)~~ | ~~SUPERSEDED by track 1~~ | — |
| ~~22~~ | — | ~~[Test Batching Post-Refactor Polish](#track-test-batching-post-refactor-polish)~~ | ~~SUPERSEDED by track 1 (FR1 + FR2)~~ | — |
| 20 | — | [Prior Session Test Harden (20260605)](#track-prior-session-test-harden-20260605-superseded) | superseded; no action needed | — |
| 21 | A | [Conductor Chronology (chronology.md canonical index)](#track-conductor-chronology) | spec ✓, plan ✓, 10/10 phases implemented; Phase 10 (user sign-off) pending; end-of-track report at `docs/reports/TRACK_COMPLETION_chronology_20260619.md` | (none — independent; **NEW 2026-06-19**; canonical-track infrastructure; the `superpowers_review_20260619` track is `blocked_by` this one) |
| 22b | A (meta-tooling) | [Meta-Tooling Workflow Review — Past-Month LLM Behavior Analysis](#track-meta-tooling-workflow-review-past-month-llm-behavior-analysis) | spec ✓, plan ✓, metadata ✓, state ✓, **parked 2026-06-20** (current_phase=0); 11-phase plan; ≥4,000-LOC 4-part report; 13-15 atomic commits; Tier 1 anchor + 3 Tier 3 parallel sweeps | (none — independent; **NEW 2026-06-20**; sibling to nagent_review + fable_review + superpowers_review + intent_dsl_survey; produces workflow_improvements.md + implementation_sequencing.md as standalone inputs for a near-future "workflow improvements rebuild" track; research-only; no src/, tests/, AGENTS.md, conductor/*.md, .opencode/, or scripts/audit_*.py changes; **anti-sliming guard**: Phase 9 self-review + Phase 10 user review gate are literal hard gates per the chronology_20260619 handover) |
| 26 | A (research) | [Video Analysis Campaign (12 videos, 5 clusters, Pass 1 of 3)](#track-video-analysis-campaign-20260621) | spec ✓, plan ✓, **14 folders scaffolded (1 umbrella + 12 children + 1 synthesis); Pass 1 of 3 (information extraction); awaiting Phase 0 tooling prerequisites (yt-dlp, cv2, imagehash install in repo venv)**; 12 children in execution order: CS229 → math foundations → Platonic/geometric → biological → CS336 → applied capstone; per-video target: 1000-10000 LOC markdown deep-dive report | (none — independent; **NEW 2026-06-21**; multi-track research campaign; 12 videos across 5 clusters (E: Stanford >1hr; A: math foundations; B: Platonic AI; C: biological/cognitive; D: applied); multi-pass handoff to Pass 2 (de-obfuscation via user's math encoding — USER must rediscover notation before Pass 2 starts) + Pass 3 (projection to applied domain — USER must articulate "own caveats" before Pass 3 starts); **lossless preservation directive**: Pass 1 artifacts must NOT be over-summarized (data cascades to Pass 2/3); **2 E-cluster videos failed oEmbed 401** (yt-dlp may still work; verify in Phase 1); reusable tooling: 5 TDD scripts in `scripts/video_analysis/` (download_video, extract_transcript, extract_keyframes, ocr_frames, synthesize_report)) |
| 27 | A | [Phase 2/4/5 Call-Site Completion (post any_type_componentization)](#track-phase2-4-5-call-site-completion-20260621) | spec ✓, plan ✓, metadata ✓, state ✓, **SHIPPED 2026-06-21** with all 4 phases complete (6a broadcast fix + 6b ChatMessage + 6d UsageStats no-op + 6e Phase 3 cost analysis); 5 atomic commits on tier2 branch; broadcast() TypeError fixed; 20/20 provider tests pass; all 3 audits --strict pass; unblocks `code_path_audit_20260607`; report at `docs/reports/TRACK_COMPLETION_phase2_4_5_call_site_completion_20260621.md` | any_type_componentization_20260621 (parent; shipped 2026-06-21 with 48/89 sites + 1 runtime bug) | (NEW 2026-06-21; bugfix + refactor + test-infrastructure + Tier 2 cost analysis; **Phase 6a COMPLETE**: fixed 2 broadcast() callers in `src/app_controller.py:1849` + `src/events.py:115` (gui_2.py had no callers, verified by grep); added `tests/test_websocket_broadcast_regression.py` 4/4 pass; **Phase 6b COMPLETE**: migrated `_send_grok` + `_send_minimax` + `_send_llama` to `ChatMessage` API; 20/20 provider tests pass; **Phase 6d NO-OP**: `NormalizedResponse` already uses `UsageStats` throughout `openai_compatible.py`; **Phase 6e COMPLETE**: produced `docs/reports/PHASE3_TIER2_ANALYSIS.md` (253 lines; Tier 2 authoritative version); measured 104 history sites (vs Tier 1 estimate 112); discovered 3 hidden cross-references (_strip_private_keys, _extract_minimax_reasoning, _send_llama_native); refined cost estimates: anthropic 35-65us/turn (Tier 1 said 8-15), grok/qwen/llama ~400ns (Tier 1 said 2-8us); **deferred**: Phase 3 call-site migration (104 sites in ai_client.py) -> separate track post-audit; cross-phase coupling -> separate track; `audit_tier2_leaks.py` sandbox-pollution -> infra track; **does NOT merge `tier2/any_type_componentization_20260621` branch** per Tier 2 reconnaissance framing; **does NOT archive `conductor/tracks/phase2_4_5_call_site_completion_20260621/`** - user handles that) |
| 28 | A | [Any-Type Componentization (Promote dict[str, Any] to dataclass(frozen=True))](#track-any-type-componentization-promote-dictstr-any-to-dataclassfrozentrue) | spec ✓, plan ✓, metadata ✓, state ✓, **shipped 2026-06-21** with 48/89 fat-struct sites promoted (Phases 1, 2, 4, 5 complete); Phase 3 (`provider_state` call-site migration in `ai_client.py`) DEFERRED to a separate track; 1 runtime bug surfaced (`HookServer.broadcast()` callers in `app_controller.py` + `events.py`); not merged; reconnaissance for `code_path_audit_20260607`; tier2 branch at 24 commits | (none — independent; **NEW 2026-06-21**; refactor + ai-readability + type-safety; ships: 3 new modules (`src/mcp_tool_specs.py`, `src/openai_schemas.py`, `src/provider_state.py`); 2 new audit scripts (`scripts/audit_dataclass_coverage.py` + `--strict` mode); styleguide `conductor/code_styleguides/type_aliases.md` §12 "When to Promote TypeAlias to dataclass"; type-registry regenerated; 130+ tests pass; **input artifact**: `docs/reports/ANY_TYPE_AUDIT_20260621.md`; **handoff docs**: `docs/handoffs/PROMPT_FOR_TIER_1.md` + `HANDOFF_FOLLOWUP_TRACK_FROM_any_type_componentization.md` + `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md`) |
| 29 | A (research) | [Video Analysis De-obfuscation Campaign (Pass 2 of 3)](#track-video-analysis-deob-20260621) | spec ✓, plan ✓, **5 folders scaffolded (1 umbrella + 1 warmup + 3 phase children); Pass 2 of 3 (de-obfuscation)**; **awaits USER action item**: gather 3-10 samples of past de-obfuscation notes into `video_analysis_deob_warmup_20260621/samples/`; warmup produces `report.md` + `prompt_template.md`; lexicon child refines; pilot child validates on 2 videos (`cs229_building_llms` + `entropy_epiplexity`); apply child applies to 10 + synthesis; multi-layer deliverable per video: translation + replacement + decoder | (none — independent; **NEW 2026-06-21**; multi-track research campaign; **de-obfuscation philosophy**: constructive type theory + Wildberger-style finitism + boundedness for knowledge + cycles/iteration explicit + etymology-aware; 4 verification criteria (lossless, bounded, constructively typed, etymology-cited); supersedes Pass 1 spec §11.1; consumed by Pass 3 (projection to applied domain, future user-led); **load-bearing directive**: Pass 1 artifacts must remain lossless because Pass 2 de-obfuscation consumes them as raw input) |
| 29a | A (research) | [Lexicon v2 Patch (Pass 2 Phase 1.5)](#track-lexicon-v2-patch-pass-2-phase-15-2026-06-23) | spec ✓, plan ✓, metadata ✓, state ✓, README ✓, V2_CHANGELOG ✓, **spec DRAFT pending user review**; targeted corrective pass after Pass 2 SHIPPED; 5 source files updated + 1 changelog; 8 corrections (L1-L8) + 3 DEFERRED refinements (R1, R4, R6) + 4 template notations (TN1-TN4: B default, C++ opt-in, Odin opt-in, Jai opt-in) + 2 `<<` / `>>` placements (<<1, <<2) + 1 per-language rendering section (<<3); encoding default changed from `float64` to placeholder scheme (`float` general, `integer` general, `Scalar` linear/geo/tensor alg, `float64` resolved); 76 terms in v2 (was 72); v1 state preserved in git history; 33 deliverables + 2 reports NOT re-processed (intermediate artifacts; Pass 3 will use v2) | `video_analysis_deob_apply_20260621` (SHIPPED 2026-06-23, commit 8f2e8a69) | (**NEW 2026-06-23**; **post-apply corrective pass**; not exhaustive; 7 atomic commits planned; Pass 3 (the C11/Python projection) is the next user-led track and will use v2; the 5 DEFERRED gaps are deferred to lexicon v3) |
| 29b | A (research) | [C11 Reference (Pass 3 Sub-Track)](#track-c11-reference-pass-3-sub-track-2026-06-23) | spec ✓, plan ✓, metadata ✓, state ✓, README ✓, **in progress**; 4 cluster sub-reports + 1 main c11_convention.md + tracks.md update; PRIMARY sources = Pikuma duffle (9 headers) + forth bootslop attempt_1 (4 files) + forth references (2 files) + gte_hello (2 files); FALLBACK = raddebugger/src/base (5 headers); the C11 reference synthesizes the user's idiomatic C11 (byte-width types, underscore-suffixed type modifiers, hand-rolled DSL, memory ordering vocabulary, slice + arena, design-doc headers) with the raddbg fallback for patterns duffle doesn't cover (U64/S64, Vec2F32/Vec3S32, String8, force_inline, etc.); the per-language `<<` / `>>` rendering for C11 is included (much_less / much_greater / weakly_coupled with tolerance) | `video_analysis_deob_apply_20260621` (SHIPPED 2026-06-23) + `video_analysis_deob_lexicon_v2_20260623` (SHIPPED 2026-06-23) | (**NEW 2026-06-23**; Pass 3 sub-track; 6 atomic commits planned; hands off to Pass 3 as the primary C11 style guide; per user 2026-06-23: 'use the forth bootslop and pikuma then. Use raddbg's base for stuff missing. otherwise go for jai/odin.') |
| 29c | A (research) | [Pass 3 — C11/Python Projection (the final phase)](#track-pass-3-c11python-projection-2026-06-23) | spec ✓, plan ✓, metadata ✓, state ✓, README ✓, TIER2_STARTER ✓, **spec DRAFT pending user review**; projects v2-deobfuscated outputs to C11 or Python code that conveys each video's content; 11 videos (10 C11 default + 2 Python + 1 synthesis); per-video deliverables: C11 (.c + .h) or Python (.py) + 3-4 markdown docs (translation, decoder, notes); 4 + 3 verification criteria met per the v2 lexicon; per-language `<<` / `>>` rendering (much_less / much_greater / weakly_coupled); encoding placeholder scheme (float / integer / Scalar / float64); code may or may not run (per user 2026-06-23); Tier 2 holds full context + 4 parallel Tier 3 sub-agents (per cluster) | `video_analysis_deob_apply_20260621` (SHIPPED) + `video_analysis_deob_lexicon_v2_20260623` (SHIPPED) + `video_analysis_deob_c11_reference_20260623` (SHIPPED) | (**NEW 2026-06-23**; **Pass 3 of 3**; the FINAL phase of the 3-pass research campaign; ~35-58 atomic commits planned; 11 videos × 3-5 deliverables = 33-55 files + 2 global reports; the user's 'ok awesome' (or similar) after the deliverables is the formal close of the 3-pass campaign) |
**Note on numbering:** the legacy file used `0a`, `0b`, `0c`... and `0d`, `0e`, `0f`, `0g` for tracks created 2026-06-06+. This is the **git-blame sort order**, not a logical execution order. The new structure re-orders by dependency.
---
## Phase 0: Infrastructure (Critical)
*Initialized: 2026-02 (project foundation)*
### Completed
- [x] **Track: Conductor Path Configuration**
*Note: One-line entry; full details in [./tracks/conductor_path_configurable_20260306/](./tracks/conductor_path_configurable_20260306/) (still in `tracks/`; not yet archived).*
---
## Phase 1: Pre-Track Foundation (2026-02 - 2026-03)
*No tracks were added under explicit Phase 1; this section is reserved for the early architectural groundwork that preceded the formal track system.*
### Completed
- [x] Various one-off refactors; full details in `conductor/archive/` by track name prefix.
---
## Phase 2: Strict Execution Queue
*Completed 2026-03-06*
### Completed
- [x] **Track: Strict Execution Queue (Phase 2)**
*See: [./archive/strict_execution_queue_completed_20260306/](./archive/strict_execution_queue_completed_20260306/)*
---
## Phase 3 - Phase 4: Foundational Tracks (March 2026)
*Multiple sub-tracks under the initial feature-development push. All archived.*
### Archived
Tracks 1 - 29 of the original Phase 4 archive (preserved with original numbers for cross-reference continuity):
1. [x] ~~**Track: Session Context Snapshots & Visibility**~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization)
*Link: [./archive/session_context_snapshots_20260311/](./archive/session_context_snapshots_20260311/)*
2. [x] ~~**Track: Discussion Takes & Timeline Branching**~~ (Archived 2026-03-22 - Replaced by discussion_hub_panel_reorganization)
*Link: [./archive/discussion_takes_branching_20260311/](./archive/discussion_takes_branching_20260311/)*
3. [x] **Track: RAG Support**
*Link: [./archive/rag_support_20260308/](./archive/rag_support_20260308/)*
4. [x] **Track: Agent Tool Preference & Bias Tuning**
*Link: [./archive/tool_bias_tuning_20260308/](./archive/tool_bias_tuning_20260308/)*
5. [x] **Track: Expanded Hook API & Headless Orchestration**
*Link: [./archive/hook_api_expansion_20260308/](./archive/hook_api_expansion_20260308/)*
6. [x] **Track: Codebase Audit and Cleanup**
*Link: [./archive/codebase_audit_20260308/](./archive/codebase_audit_20260308/)*
7. [x] **Track: Expanded Test Coverage and Stress Testing**
*Link: [./archive/test_coverage_expansion_20260309/](./archive/test_coverage_expansion_20260309/)*
8. [x] **Track: Beads Mode Integration**
*Link: [./archive/beads_mode_20260309/](./archive/beads_mode_20260309/)*
9. [x] **Track: Optimization pass for Data-Oriented Python heuristics**
*Link: [./archive/data_oriented_optimization_20260312/](./archive/data_oriented_optimization_20260312/)*
10. [x] **Track: Rich Thinking Trace Handling**
*Link: [./archive/thinking_trace_handling_20260313/](./archive/thinking_trace_handling_20260313/)*
11. [x] **Track: Smarter Aggregation with Sub-Agent Summarization**
*Link: [./archive/aggregation_smarter_summaries_20260322/](./archive/aggregation_smarter_summaries_20260322/)*
12. [x] **Track: System Context Exposure**
*Link: [./archive/system_context_exposure_20260322/](./archive/system_context_exposure_20260322/)*
13. [x] **Track: Advanced Log Management and Session Restoration**
*Link: [./archive/log_session_overhaul_20260308/](./archive/log_session_overhaul_20260308/)*
14. [x] **Track: UI Theme Overhaul & Style System**
*Link: [./archive/ui_theme_overhaul_20260308/](./archive/ui_theme_overhaul_20260308/)*
15. [x] **Track: Selectable GUI Text & UX Improvements**
*Link: [./archive/selectable_ui_text_20260308/](./archive/selectable_ui_text_20260308/)*
16. [x] **Track: Markdown Support & Syntax Highlighting**
*Link: [./archive/markdown_highlighting_20260308/](./archive/markdown_highlighting_20260308/)*
17. [x] **Track: Custom Shader and Window Frame Support**
*Link: [./archive/custom_shaders_20260309/](./archive/custom_shaders_20260309/)*
18. [x] **Track: UI/UX Improvements - Presets and AI Settings**
*Link: [./archive/presets_ai_settings_ux_20260311/](./archive/presets_ai_settings_ux_20260311/)*
19. [x] **Track: Discussion Hub Panel Reorganization**
*Link: [./archive/discussion_hub_panel_reorganization_20260322/](./archive/discussion_hub_panel_reorganization_20260322/)*
20. [x] **Track: Undo/Redo History Support**
*Link: [./archive/undo_redo_history_20260311/](./archive/undo_redo_history_20260311/)*
21. [x] **Track: Advanced Text Viewer with Syntax Highlighting**
*Link: [./archive/text_viewer_rich_rendering_20260313/](./archive/text_viewer_rich_rendering_20260313/)*
22. [x] **Track: Tree-Sitter C/C++ MCP Tools**
*Link: [./archive/ts_cpp_tree_sitter_20260308/](./archive/ts_cpp_tree_sitter_20260308/)*
23. [x] **Track: Saved System Prompt Presets**
*Link: [./archive/saved_presets_20260308/](./archive/saved_presets_20260308/)*
24. [x] **Track: Saved Tool Presets**
*Link: [./archive/saved_tool_presets_20260308/](./archive/saved_tool_presets_20260308/)*
25. [x] **Track: External Text Editor Integration for Approvals**
*Link: [./archive/external_editor_integration_20260308/](./archive/external_editor_integration_20260308/)*
26. [x] **Track: Agent Personas: Unified Profiles & Tool Presets**
*Link: [./archive/agent_personas_20260309/](./archive/agent_personas_20260309/)*
27. [x] **Track: Advanced Workspace Docking & Layout Profiles**
*Link: [./archive/workspace_profiles_20260310/](./archive/workspace_profiles_20260310/)*
28. [x] **Track: Review investigation of codebase and expose/cull any hidden invisible prompting**
*Link: [./archive/cull_hidden_prompts_20260502/](./archive/cull_hidden_prompts_20260502/)*
29. [x] **Track: Test Regression Verification**
*Link: [./archive/test_regression_verification_20260307/](./archive/test_regression_verification_20260307/)*
---
## Phase 5: Codebase Curation
*Initialized: 2026-05-07*
### Completed (all archived)
#### Analysis & Structural Review
1. [x] **Track: Comprehensive Path Mapping & Tooling**
*Link: [./archive/ai_interaction_call_graph_20260507/](./archive/ai_interaction_call_graph_20260507/)*
*Goal: Automated and manual derivation of all major code paths and pipelines in the system.*
2. [x] **Track: Controller State Mutation Matrix**
*Link: [./archive/controller_state_mutation_matrix_20260507/](./archive/controller_state_mutation_matrix_20260507/)*
*Goal: Comprehensive map of all methods that modify the `AppController` and `App` state.*
3. [x] **Track: Source-Wide Redundancy Audit**
*Link: [./archive/source_wide_redundancy_audit_20260507/](./archive/source_wide_redundancy_audit_20260507/)*
*Goal: Deep file-by-file audit to identify unused methods, duplicate logic, and dead code.*
4. [x] **Track: Curate Provider Registries**
*Link: [./archive/curate_provider_registries_20260507/](./archive/curate_provider_registries_20260507/)*
*Goal: Move the PROVIDERS list to models.py and update all references to use this single source of truth.*
5. [x] **Track: Encapsulate AppController Status**
*Link: [./archive/encapsulate_appcontroller_status_20260507/](./archive/encapsulate_appcontroller_status_20260507/)*
*Goal: Convert ai_status and mma_status to properties with thread-safe setters.*
6. [x] **Track: Decouple GUI Log Loading**
*Link: [./archive/decouple_gui_log_loading_20260507/](./archive/decouple_gui_log_loading_20260507/)*
*Goal: Move Tkinter directory selection out of AppController and into gui_2.py.*
7. [x] **Track: Refactor Context Aggregation Pipeline**
*Link: [./archive/refactor_context_aggregation_pipeline_20260507/](./archive/refactor_context_aggregation_pipeline_20260507/)*
*Goal: Modernize src/aggregate.py and consolidate legacy tier builders.*
8. [x] **Track: Cull Unused Symbols**
*Link: [./archive/cull_unused_symbols_20260507/](./archive/cull_unused_symbols_20260507/)*
*Goal: Safely remove the 27 dead symbols identified in the redundancy audit.*
9. [x] **Track: Structural Dependency Mapping (SDM) Docstrings**
*Link: [./archive/sdm_docstrings_20260509/](./archive/sdm_docstrings_20260509/)*
10. [x] **Track: AppController Curation & Structural Alignment**
*Link: [./archive/app_controller_curation_20260513/](./archive/app_controller_curation_20260513/)*
*Goal: Curate src/app_controller.py to match gui_2.py organization and enforce Python style conventions.*
11. [x] **Track: Fix 45 failing test files across 12 batches**
*Link: [./archive/fix_test_suite_failures_20260514/](./archive/fix_test_suite_failures_20260514/)*
12. [x] **Track: Fix Indentation 1-Space Convention**
*Link: [./archive/fix_indentation_1space_20260516/](./archive/fix_indentation_1space_20260516/)*
*Goal: Standardize all Python files to 1-space indentation per AI-Optimized Python Style Guide. Audit and correct indentation in src/, tests/, scripts/, and conductor/ directories.*
---
## Phase 6: Context Composition Redesign
*Initialized: 2026-05-10*
### Completed (all archived)
#### Context Control & Workflow Enhancements
1. [x] **Track: Granular AST Control (Signatures vs. Definitions)**
*Link: [./archive/granular_ast_control_20260510/](./archive/granular_ast_control_20260510/)*
*Goal: Introduce 'AST Signatures' and 'AST Definitions' states in the Context Panel for C/C++ files.*
2. [x] **Track: Context Snapshotting per "Take"**
*Link: [./archive/context_snapshotting_takes_20260510/](./archive/context_snapshotting_takes_20260510/)*
*Goal: Snapshot and visually restore the Context Panel state when switching between Takes.*
3. [x] **Track: Interactive Text Slice Highlighting**
*Link: [./archive/interactive_text_slice_highlighting_20260510/](./archive/interactive_text_slice_highlighting_20260510/)*
*Goal: Allow highlighting text ranges to create fuzzy-anchored slices (Def, Sig, Hide) that survive file modifications.*
4. [x] **Track: Context Batch Operations UX**
*Link: [./archive/context_batch_operations_ux_20260510/](./archive/context_batch_operations_ux_20260510/)*
*Goal: Add multi-select and batch state modification capabilities to the Context Panel for rapid wrangling.*
5. [x] **Track: GenCpp Project Initialization**
*Link: [./archive/gencpp_project_init_20260510/](./archive/gencpp_project_init_20260510/)*
*Goal: Configure manual_slop.toml in the gencpp repo to isolate conductor tracks, logs, and history.*
6. [x] **Track: Interactive AST Tree Masking**
*Link: [./archive/interactive_ast_tree_masking_20260510/](./archive/interactive_ast_tree_masking_20260510/)*
*Goal: Inspect C/C++ ASTs in the GUI and mask individual classes/functions as Def, Sig, or Hide.*
7. [x] **Track: Phase 6 Review and Regression Verification**
*Link: [./archive/phase6_review_20260510/](./archive/phase6_review_20260510/)*
*Goal: Review Phase 6 implementation, perform full-suite batch regression testing, and expand test coverage for new context curation features.*
9. [x] **Track: Context Composition Decoupling**
*Link: [./archive/context_comp_decouple_20260510/](./archive/context_comp_decouple_20260510/)*
*Goal: Decouple Files & Media from Context Composition, add directory grouping, file stats, and view mode selection per file.*
10. [x] **Track: Context Composition Slice Visualization**
*Link: [./archive/context_comp_slices_20260510/](./archive/context_comp_slices_20260510/)*
*Goal: Enhance slice visualization with visual editor, annotation support (tags/comments), and view presets.*
11. [x] **Track: GUI Refactor & Stabilization**
*Link: [./archive/gui_refactor_stabilization_20260512/](./archive/gui_refactor_stabilization_20260512/)*
*Goal: Refactor gui_2.py to fix regressions and enforce better imgui scoping patterns.*
12. [x] **Track: GUI 2 Large Cleanup** (originally listed as "I started to do a large cleanup to ./src/gui_2.py..."; the long user message was the track description)
*Link: [./archive/gui_2_cleanup_20260513/](./archive/gui_2_cleanup_20260513/)*
*Goal: Study gui_2.py and derive more information on how to maintain and write code for the Python codebase. Update product guidelines or the python code_styleguidelines based on what is discovered. May also need changes to the mcp_tools for better structural awareness of annotations or other conventions with these python files.*
13. [x] **Track: Add Python structural MCP tools (py_remove_def, py_add_def, py_move_def, py_region_wrap)**
*Link: [./archive/python_structural_mcp_tools_20260513/](./archive/python_structural_mcp_tools_20260513/)*
14. [~] **Track: Context Preview & Slice Editor Fixes**
*Link: [./tracks/context_preview_fixes_20260516/](./tracks/context_preview_fixes_20260516/)*
*Goal: Fix Preview button generating empty content, and Inspect/Slices buttons failing to open their respective editor panels.*
*Status: in progress; track folder still in `tracks/` (not yet archived).*
### Active
8. [ ] **Track: GenCpp Dogfood Feedback Loop**
*Link: [./tracks/gencpp_dogfood_feedback_20260510/](./tracks/gencpp_dogfood_feedback_20260510/)*
*Goal: Verify Manual Slop can target gencpp at C:/projects/gencpp and establish a feedback mechanism for issues found during dogfooding.*
*Status: oldest pending track (2026-05-10). Track folder still in `tracks/`.*
---
## Hot Reload Feature (2026-05-16)
*Single-track feature, not part of a numbered Phase.*
### Archived
1. [x] **Track: Hot Reload Python Codebase (Phase 2)**
*Link: [./archive/hot_reload_python_20260516/](./archive/hot_reload_python_20260516/)*
*Goal: Implement selective, state-preserving hot-reload for src/gui_2.py with delegation pattern refactor, manual trigger via Ctrl+Alt+R and GUI button, and visual error tint feedback on failure.*
---
## Phase 7: Stabilization & Polishing (2026-05-13 to 2026-06-02)
*Two archival phases under the same "Phase 7" umbrella. Both completed; tracks moved to `archive/`.*
### Archived
- [x] **Track: Phase 7 Stabilization and Polishing (Regressions Fix)**
*Link: [./archive/phase7_stabilization_and_polishing_20260601/](./archive/phase7_stabilization_and_polishing_20260601/)*
- [x] **Track: Phase 7 Monolithic Stabilization (Final Cleanup)**
*Link: [./archive/phase7_monolithic_stabilization_20260602/](./archive/phase7_monolithic_stabilization_20260602/)*
---
## Late May 2026 - Early June 2026: One-Off Fixes and Polish
*One-off bug fixes and UX polish that landed in the days leading up to the major track work. All archived.*
### Archived
- [x] **Track: Robust Live Simulation Verification**
- [x] **Track: Fix GUI Crashes in Tool Preset Manager and Discussion Hub**
*Link: [./archive/gui_crash_fixes_20260531/](./archive/gui_crash_fixes_20260531/)*
- [x] **Track: Fix `keys_down` AttributeError in ImGui IO**
*Link: [./archive/fix_imgui_keys_down_20260601/](./archive/fix_imgui_keys_down_20260601/)*
- [x] **Track: Selectable Thinking Monologs**
*Link: [./archive/selectable_thinking_monologs_20260601/](./archive/selectable_thinking_monologs_20260601/)*
- [x] **Track: Fix MiniMax history sequencing and truncation**
*Link: [./archive/minimax_history_fix_20260601/](./archive/minimax_history_fix_20260601/)*
- [x] **Track: Preserve context selection on discussion switch and add empty context warning**
*Link: [./archive/context_preservation_and_warnings_20260601/](./archive/context_preservation_and_warnings_20260601/)*
- [x] **Track: Fix Text Viewer docking conflicts and Tool Call row click interactivity**
*Link: [./archive/text_viewer_and_tool_call_fixes_20260601/](./archive/text_viewer_and_tool_call_fixes_20260601/)*
- [x] **Track: UX Refinements for Context Composition and Discussion Entries**
*Link: [./archive/context_composition_ux_20260601/](./archive/context_composition_ux_20260601/)*
- [x] **Track: Combine AST Inspector and Slices Editor into a unified Structural File Editor**
*Link: [./archive/structural_file_editor_20260601/](./archive/structural_file_editor_20260601/)*
- [x] **Track: Add per-response token metrics and AI-assisted history compression**
*Link: [./archive/discussion_metrics_and_compression_20260601/](./archive/discussion_metrics_and_compression_20260601/)*
- [x] **Track: Fix Approve Modal sizing and inline full preview**
*Link: [./archive/approve_modal_ux_20260601/](./archive/approve_modal_ux_20260601/)*
- [x] **Track: Implement Async Context Preview to fix UI hangs and add an 'Everything' Command Palette.**
*Link: [./archive/command_palette_and_performance_20260602/](./archive/command_palette_and_performance_20260602/)*
*Goal: Async context preview offload (background thread, state lock) + Command Palette (32 commands, fuzzy search, Ctrl+Shift+P, Up/Down/Enter nav, 13 unit + 7 live_gui tests). Phases 1-3 complete.*
- [x] **Track: Comprehensive Documentation Refresh**
*Link: [./archive/documentation_refresh_comprehensive_20260602/](./archive/documentation_refresh_comprehensive_20260602/)*
*Goal: Refresh stale documentation across `docs/`. Completed: ASCII file tree updates (`docs/Readme.md` + `Readme.md` 5→14 guides, 22→53 src modules), `docs/guide_testing.md` (new, comprehensive 251-file test suite reference), 7 per-source-file guides (`guide_gui_2.md`, `guide_ai_client.md`, `guide_api_hooks.md`, `guide_mcp_client.md`, `guide_app_controller.md`, `guide_multi_agent_conductor.md`, `guide_models.md`). All 14 guides cross-linked. Gap analysis: [./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md](./archive/documentation_refresh_comprehensive_20260602/gap_analysis.md).*
Sub-tracks (all checkpointed):
- [x] **Sub-Track 1: Docs Layer Refresh** `[checkpoint: 20225c8]`: 18 per-file atomic commits. 15 guides (8 refreshed + 7 new), Subsystem Index (24 entries), 106 cross-links all resolve, symbol parity fixed (`apply_nerv_theme` -> `apply_nerv`).
- [x] **Sub-Track 2: Conductor Docs Refresh** `[checkpoint: ef4efab2]`: 4 per-file atomic commits: `product.md` (14 guides, MiniMax, Command Palette), `tech-stack.md` (MiniMax, Gemini Embedding 001), `workflow.md` (2026-06-02 doc refresh, 45-tool count), `index.md` (active track links).
- [x] **Sub-Track 3: Agent Config Refresh** `[checkpoint: 87f668a6]`: 3 per-file atomic commits: `AGENTS.md` (5.4K -> 0.7K thin pointer), `CLAUDE.md` (6.7K -> 0.2K deprecation stub), `GEMINI.md` (5 providers, sloppy.py entry, 12 key modules). Drift check: 0 issues in 9 mirrored skill files.
- [x] **Track: Test Consolidation & TOML Sandboxing** `[checkpoint: cb91006c]`
*Spec: [./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md](./../../docs/superpowers/specs/2026-06-02-test-consolidation-design.md), Plan: [./../../docs/superpowers/plans/2026-06-02-test-consolidation.md](./../../docs/superpowers/plans/2026-06-02-test-consolidation.md)*
*Goal: Audit tests for real-TOML usage, migrate offenders to sandboxed patterns. Added `scripts/check_test_toml_paths.py` audit script (CI gate). Migrated `test_mcp_client_whitelist_enforcement` to `tmp_path` (was the only offender). Skipped the redundant `enforce_no_real_toml` fixture: the existing `isolate_workspace` autouse fixture + audit script provide equivalent coverage.*
---
## Phase 8: UI Polish (2026-06-03)
*Initialized: 2026-06-03*
User review surfaced five outstanding UI issues, each previously attempted without success. This track addresses them as five independent phases with their own TDD cycles and atomic commits.
### Active
1. [ ] **Track: UI Polish (Five Issues)**
*Spec: [./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md](./../../docs/superpowers/specs/2026-06-03-ui-polish-design.md)*
*Plan: [./../../docs/superpowers/plans/2026-06-03-ui-polish.md](./../../docs/superpowers/plans/2026-06-03-ui-polish.md)*
*Goal: Resolve five long-standing UI issues:
- Phase 1: GFM markdown table rendering (pre-processor into `src/markdown_table.py`, wire into `MarkdownRenderer.render`).
- Phase 2: Widen the `Keep Pairs` numeric input next to `Truncate` in the discussion panel (`gui_2.py:3829`, width 80 -> 140, switch to `drag_int`).
- Phase 3: Fix `Refresh Registry` button in Log Management: it currently instantiates `LogRegistry` without calling `load_registry()`, so the displayed table never reflects on-disk state (`gui_2.py:1675`).
- Phase 4: Add `Vendor State` tab to Operations Hub ΓÇö at-a-glance provider/model, context-window utilization, cache hit rate, last error class, vendor quota (new `src/vendor_state.py` aggregator + `controller.vendor_quota` field + `ai_client` wire-up).
- Phase 5: Files & Media > Files directory-grouped tree (re-use `aggregate.group_files_by_dir`, mirror `render_context_files_table` collapsible-node style).*
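
The Phase 3 bug above (constructing without loading) can be illustrated with a minimal sketch. Only the names `LogRegistry` and `load_registry()` come from the entry; the constructor argument, the `sessions` attribute, the JSON file format, and the two `refresh_*` helpers are hypothetical stand-ins, and the real code at `gui_2.py:1675` may look different.

```python
import json
import tempfile
from pathlib import Path


class LogRegistry:
    """Hypothetical stand-in: holds session entries read from a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.sessions: dict[str, dict] = {}  # stays empty until load_registry() runs

    def load_registry(self) -> None:
        """Read the on-disk state into memory."""
        if self.path.exists():
            self.sessions = json.loads(self.path.read_text(encoding="utf-8"))


def refresh_buggy(path: Path) -> LogRegistry:
    # Bug pattern: a fresh instance is created but never loaded,
    # so any table rendered from it is always empty.
    return LogRegistry(path)


def refresh_fixed(path: Path) -> LogRegistry:
    # Fix pattern: load immediately after constructing.
    registry = LogRegistry(path)
    registry.load_registry()
    return registry


def demo() -> tuple[int, int]:
    """Return (entries seen by buggy refresh, entries seen by fixed refresh)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "registry.json"
        path.write_text(json.dumps({"session_a": {"size": 1}}), encoding="utf-8")
        return len(refresh_buggy(path).sessions), len(refresh_fixed(path).sessions)
```

With one entry on disk, `demo()` shows the buggy refresh seeing zero entries and the fixed refresh seeing one.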
### Recently Archived (post-Phase 8)
- [x] **Track: Clean Install Test** `[checkpoint: d14ae3b]`
*Link: [./tracks/clean_install_test_20260603/](./tracks/clean_install_test_20260603/), Spec: [./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md](./../../docs/superpowers/specs/2026-06-02-clean-install-test-design.md), Plan: [./../../docs/superpowers/plans/2026-06-02-clean-install-test.md](./../../docs/superpowers/plans/2026-06-02-clean-install-test.md)*
*Goal: Add opt-in pytest test (`RUN_CLEAN_INSTALL_TEST=1`) that clones the repo to tmp_path, runs `uv sync`, launches `sloppy.py --enable-test-hooks`, verifies Hook API responds. Catches "works on my machine" failures. Added `clean_install` marker to `pyproject.toml`. Created `tests/test_clean_install.py` (114 lines, uses `urllib.request` from stdlib per tech-stack.md dependency minimalism rule - deviation from plan). Skipped by default. Marked with `@pytest.mark.clean_install`.*
- [x] **Track: Fix markdown_helper.py for imgui-bundle >=1.92.801** `[checkpoint: 7a34edf]`
*Link: [./tracks/markdown_helper_language_api_compat_20260603/](./tracks/markdown_helper_language_api_compat_20260603/)*
*Goal: First thing the clean install test caught. `ed.TextEditor.LanguageDefinitionId` enum was removed in `imgui-bundle>=1.92.801`. Replaced with version-compat shim helpers `_get_language_id(name)` and `_set_editor_language(editor, lang_obj)` that detect the API at runtime (1.92.5 enum vs 1.92.801+ factory). Also added parallel `_editor_lang_cache` to track current language tag per editor (robust to API name differences like "C++" vs "cpp"). Verified: test passes in opt-in mode (1.92.801), shim still works in local 1.92.5 env, follow-up commit `b306f8f` corrected test URL `/api/mma_status` -> `/api/gui/mma_status` (actual endpoint per `src/api_hooks.py:181`).*
- [x] **Track: Multi-Theme TOML System (Multi-Themes Mod)** `[checkpoint: 38abf231]`
*Link: [./tracks/multi_themes_20260604/](./tracks/multi_themes_20260604/), Plan: [./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md](./../../docs/superpowers/plans/2026-06-04-theme-syntax-modularization.md)*
*Goal: TOML-based theming: per-theme file layout (`themes/<name>.toml` global + `<project>/project_themes.toml` overrides), schema (`syntax_palette` + `[colors]` table of `imgui.Col_` snake_case keys), public API (`load_themes_from_disk`, `get_syntax_palette_for_theme`, `apply_syntax_palette`), `MarkdownRenderer` calls `apply_syntax_palette` on init, color-callable convention (`C_LBL()` / `C_VAL()` so theme switches take effect at use site), upstream 4-syntax-palette limit documented in [./../../docs/guide_themes.md](./../../docs/guide_themes.md) (new guide). 8 new theme files shipped. Theme-caused production bug fixed at `src/gui_2.py:3705-3707` (commit `1469ecac`): `DIR_COLORS` dict stored `C_VAL` not `C_VAL()`, so `imgui.text_colored(d_col, ...)` was being passed a function. Fixed by calling the function at the use site.*
- [~] **Track: Test Regression Fixes (post multi-themes ship)** `[checkpoint: d7487af4]`
*Link: [./tracks/regression_fixes_20260605/](./tracks/regression_fixes_20260605/), Plan: [./../../docs/superpowers/plans/2026-06-05-regression-fixes.md](./../../docs/superpowers/plans/2026-06-05-regression-fixes.md)*
*Goal: Resolve 21 failing tests surfaced after the multi-themes ship. 11 of 21 fixed across 10 atomic commits: theme regression (`test_gui_progress` C_LBL/C_VAL API change, `38abf231`), pre-existing non-live_gui (`test_gui_phase4` markdown_helper mocks, `df43f158`; `test_view_presets` persona_manager mock, `970f198c`), GUI production bug (`DIR_COLORS` callable, `1469ecac`), live_gui `LogPruner` busy loop (`ac08ee87`), RAG NoneType guard (`c96bdb06`). **Root cause of remaining 10 live_gui failures identified (commit `d7487af4`)**: `imgui.save_ini_settings_to_memory()` at `src/gui_2.py:601` crashes C-level (`0xc0000005`) when called in the first few render frames because ImGui's internal state (Fonts, DisplaySize, Settings) isn't ready. Crash is uncatchable from Python. Fixed with `_ini_capture_ready` flag (defer-not-catch pattern): first call returns `b""` and sets the flag, subsequent calls invoke the C function. Bisect anchors: `7df65dff` (pre-existing failures start), `7ea52cbb` (theme-caused failures start). Deferred follow-up track needed for ~5 remaining live_gui tests (MMA engine state transitions, RAG status timing, one test needing substantial render path mocks).*
- [x] **Track: Live-GUI Fragility Fixes (post regression_fixes ship)** `[checkpoint: 1488e715]` [superseded by live_gui_test_hardening_v2]
*Plan: [./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md](./../../docs/superpowers/plans/2026-06-05-live-gui-fragility-fixes.md), Spec: [./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md](./../../docs/superpowers/specs/2026-06-05-live-gui-fragility-fixes-design.md)*
*Goal: Resolve the 3 remaining live_gui failures (269/272 → 271/272 plus 1 new regression unit test). 1-line src fix in `_capture_workspace_profile` (change `ini=b""` to `ini=""` to satisfy `WorkspaceProfile.ini_content: str` contract that `tomli_w` enforces); the `b""` sentinel was a regression from `d7487af4` that caused `save_workspace_profile` to raise `TypeError`, profile never saved, `load_workspace_profile` became a no-op. 1 new unit test (`tests/test_workspace_profile_serialization.py`) encoding the str/bytes contract. `test_prior_session_no_pop_imbalance` is **deferred to a separate follow-up track** — the test was more under-mocked than the spec assumed; fixing imscope.window tuple-return only revealed the next un-mocked dependency (imgui.begin returning bool where 2-tuple expected at line 4496). `render_main_interface` is a kitchen-sink function requiring 50+ mocks; a follow-up track will either add the missing mocks or refactor the test to exercise a narrow prior-session render path. Change 4 (doc hardening of defer-not-catch sections) deferred to track end; not done due to scope focus.*
- [x] **Track: Live-GUI Test Hardening v2 (post v1 ship)** `[complete: 26e0ced4]`
*Note: No standalone track directory was created; the v2 work was completed as commit 26e0ced4 within the live_gui_fragility_fixes_20260605 lineage. The "v1" track directory [./archive/hot_reload_python_20260516/](./archive/hot_reload_python_20260516/) is unrelated; this is a logical successor track with no folder of its own.*
*Goal: Resolve the 4 remaining live_gui failures (was 3 in v1; 1 new regression). v1 fixed the str/bytes sentinel bug but exposed a deeper issue. Decomposed into 4 sub-tracks, 3 active:*
*Sub-track 1: live_gui_state_sync_20260605 - Spec: [./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync-design.md](./../../docs/superpowers/specs/2026-06-05-live-gui-state-sync.md), Plan: [./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md](./../../docs/superpowers/plans/2026-06-05-live-gui-state-sync.md). **REAL root cause was bad indentation in src/gui_2.py:607** (user fixed). The App class had _capture_workspace_profile being parsed as nested inside _apply_snapshot due to indentation. Once fixed, 3 tests (test_auto_switch_sim, test_workspace_profiles_restoration, test_undo_redo_lifecycle) immediately passed. App/Controller state sync is already correctly handled by __getattr__/__setattr__ at lines 478-487.*
*Sub-track 2: prior_session_test_harden_20260605 - Spec: [./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden-design.md](./../../docs/superpowers/specs/2026-06-05-prior-session-test-harden.md), Plan: [./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md](./../../docs/superpowers/plans/2026-06-05-prior-session-test-harden.md). Test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.*
*Sub-track 3: wait_for_ready_test_pattern_20260605 - **SKIPPED**. Tests already pass without polling. The flake hypothesis (time.sleep not enough) was wrong; the real cause was the indent. Polling can be a follow-up hardening pass if tests become flaky in CI.*
*Sub-track 4: undo_redo_lifecycle_fix_20260605 - **RESOLVED by Sub-track 1 indent fix**. test_undo_redo_lifecycle now passes; no separate investigation needed.*
*Net result: 4 originally-failing live_gui tests all pass. User can run the full batched suite to confirm.*
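
The defer-not-catch pattern from the regression-fixes entry, combined with the `str` sentinel from the fragility-fixes entry, can be sketched as follows. Only the `_ini_capture_ready` flag name and the empty-string sentinel come from the entries above; the `IniCapture` class, the `capture_fn` parameter, and the fake native function are hypothetical, and the real code in `src/gui_2.py` may be structured differently.

```python
from typing import Callable


class IniCapture:
    """Defers a crash-prone native call until after the first request."""

    def __init__(self, capture_fn: Callable[[], str]) -> None:
        self._capture_fn = capture_fn     # stand-in for the real ImGui call
        self._ini_capture_ready = False   # flag name taken from the entry above

    def capture(self) -> str:
        if not self._ini_capture_ready:
            # First call: do not touch the native function at all, because a
            # C-level crash cannot be caught with try/except.
            # Return an empty str (not bytes) so a str-typed field stays valid.
            self._ini_capture_ready = True
            return ""
        return self._capture_fn()


def demo() -> list[str]:
    calls: list[str] = []

    def fake_native_capture() -> str:
        calls.append("called")
        return "[Window][Main]"

    cap = IniCapture(fake_native_capture)
    first = cap.capture()    # deferred: native function not invoked
    second = cap.capture()   # flag is set: native function invoked once
    return [first, second, str(len(calls))]
```

The design choice is to skip the dangerous call rather than wrap it, since a native access violation terminates the process before any Python exception handler runs.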
---
## Phase 6+ (Active Sprint): Performance, Vendor Coverage, Error Handling, MCP Refactor (2026-06-06+)
*Initialized: 2026-06-06. This is the current major sprint. Four foundational tracks launched in this sprint, plus one follow-up. **As of 2026-06-10: 3 recently completed (startup_speedup, test_batching_refactor, test_infrastructure_hardening); 4 in plan state (qwen, error_handling, data_structure, mcp_arch).** The 4 in-plan tracks are now unblocked (the upstream test_infrastructure_hardening track is shipped).*
### Recently Completed (2026-06-06 to 2026-06-10)
Lightweight chronology; full spec/plan/state per track is in the linked folder.
#### Track: Sloppy.py Startup Speedup `[COMPLETE 2026-06-07]`
*Link: [./tracks/startup_speedup_20260606/](./tracks/startup_speedup_20260606/) (full spec/plan/state in folder)*
`[track-created: cd4fb045] [phase-1-2-done: f9a01258] [phase-3-done: 51c054ec] [phase-4-done: 3849d304] [phase-5-done: 515a3029] [sub-track-1-done: 253e1798] [sub-track-2e+f-done: 2e3a6385] [audit-CLEAN: 2e3a6385] [conftest-atexit-fix: 8957c9a5] [post-shipping-fix-1: 8c4791d0] [post-shipping-fix-2: 88fc42bb] [post-shipping-fix-3: 52ea2693]`
*9 phases, 57 tasks. 44 TDD tests added. Main Thread Purity Invariant enforced via `scripts/audit_main_thread_imports.py` CI gate. Final measured: import src.ai_client 161ms (was 1800ms; 91% reduction); import src.gui_2 341ms (was 1770ms; 81% reduction); total ~3067ms saved. 62 audit violations remain (large refactors deferred).*
#### Track: Tier 2 Sandbox File Leak Prevention `[COMPLETE 2026-06-20]`
*Link: [./tracks/tier2_leak_prevention_20260620/](./tracks/tier2_leak_prevention_20260620/), Report: [../../docs/reports/TRACK_COMPLETION_tier2_leak_prevention_20260620.md](../../docs/reports/TRACK_COMPLETION_tier2_leak_prevention_20260620.md)*
`[phase-1-revert: fab2e55b] [phase-2-hook: 81e1fd7b] [phase-3-audit: f5d8ea04] [phase-4-install: 8f54deda]`
*Selective revert of the 4 user-named files from offender commit `00e5a3f2` (`.opencode/agents/tier2-autonomous.md`, `.opencode/commands/tier-2-auto-execute.md`, `opencode.json`, `mcp_paths.toml`). 3-layer defense-in-depth added: pre-commit hook (auto-unstages forbidden files at commit boundary; 12 tests), working-tree audit script with `--strict` CI gate (13 tests), and hook installation via `scripts/tier2/setup_tier2_clone.ps1`. 25 default-on tests pass. **Out of scope** (per user explicit list): the 4 throwaway scripts in `scripts/tier2/artifacts/.../*.py` and the `project_history.toml` timestamp. **DEFERRED**: CI wiring of `audit_tier2_leaks.py --strict`; rebase of stale tier-2 branches (`tier2/result_migration_app_controller_phase6_20260619`, `tier2/test_sandbox_hardening_20260619`) on `origin/master@8f54deda` to drop `00e5a3f2` (user action).*
#### Track: Test Batching Refactor `[COMPLETE 2026-06-08] [archived]`
*Link: [./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/](./tracks/archive_completed_tracks_20260603/test_batching_refactor_20260606/)*
`[track-created: b7a97374] [COMPLETE 2026-06-08] [phase-1-done: 57285d04] [phase-3-done: 5252b6d7] [phase-4-done: 50bd894f] [archived: 50bd894f]`
*4 phases, fixture-class-isolated tiers (0-3 + H + P) replacing alphabetical 4-at-a-time batching. Hand-curated `tests/test_categories.toml` overrides for cross-cutting files. Phase 2 (CI shadow run) skipped (no CI in repo).*
#### Track: Test Infrastructure Hardening (2026-06-09) `[COMPLETE 2026-06-10] [archived]`
*Link: [./archive/test_infrastructure_hardening_20260609/](./archive/test_infrastructure_hardening_20260609/)*
`[track-created: 566cf08c] [phase-1-done: 5df22fa8] [phase-2-done: 67d0211e] [phase-3-done: 006bb114] [phase-4-done: b8fcd9d6] [phase-5-done: 33d5cac] [phase-6-done: 7b87bbf5] [phase-7-done: 84edb200] [phase-8-done: 719fe9a]`
*8 phases, ~60 surgical tasks, 6.5 days. Fixes 3 root causes of test regression churn: FR1 subprocess health autouse, FR2 `live_gui_workspace` fixture (per-run timestamped under `tests/artifacts/`), FR3 `_sync_rag_engine` token+dirty coalescing. Plus FR4 `set_value` hook + FR5 `clean_baseline` marker. 314/314 tests green across all 11 tier batches. Closing report: `docs/reports/test_infrastructure_hardening_batch_green_20260610.md`. Lineage: `workspace_path_finalize_20260609` + `mma_tier_usage_reset_fix_20260610` + `rag_phase4_sync_fix_20260610` (all also archived).*
### In Plan (or Pending Spec)
#### Track: Qwen, Llama & Grok Vendor Integration + Capability Matrix `[track-created: 7c1d597e]`
*Link: [./tracks/qwen_llama_grok_integration_20260606/](./tracks/qwen_llama_grok_integration_20260606/), Spec: [./tracks/qwen_llama_grok_integration_20260606/spec.md](./tracks/qwen_llama_grok_integration_20260606/spec.md), Plan: [./tracks/qwen_llama_grok_integration_20260606/plan.md](./tracks/qwen_llama_grok_integration_20260606/plan.md) (to be authored by writing-plans skill)*
*Goal: Add first-class support for Qwen (DashScope native SDK), Llama (Ollama local + OpenRouter cloud + custom URL), and Grok (xAI OpenAI-compatible). Introduce a **Vendor Capability Matrix** (7 v1 capabilities: vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking; audio and server-side code_execution deferred) declared per-(vendor, model) in `src/vendor_capabilities.py`. GUI reads the matrix to enable/disable 9 UI elements (screenshot button, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) instead of hard-coding per-vendor branches. Extract a shared `send_openai_compatible()` helper in `src/openai_compatible.py` that operates on a normalized request/response data structure; each `_send_<vendor>()` is a thin boundary adapter (data-oriented design per Fleury/Acton/Lottes). Refactor `_send_minimax()` to use the helper (~250 lines → ~50). **Out of scope** (separate follow-up track): Anthropic/Gemini/DeepSeek migration to the matrix. 6 phases: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive. **Now blocked by** test_infrastructure_hardening_20260609 (was: none).*
*Status (2026-06-11): Phases 1-5 done; Phase 6 (docs) in progress. **NOT ARCHIVING** ΓÇö has a follow-up track. See [./tracks/qwen_llama_grok_followup_20260611/](./tracks/qwen_llama_grok_followup_20260611/) for the 5-phase follow-up. Audit report: [../docs/reports/qwen_llama_grok_followup_audit_20260611.md](../docs/reports/qwen_llama_grok_followup_audit_20260611.md). 50/79 tasks done. Known gaps: tool-call loop only on MiniMax; 1 of 9 UX adaptations shipped; PROVIDERS in models.py is sprawl; src/ai_client.py needs codepath consolidation; local models need first-class priority; 12 v2 matrix fields documented but not implemented; Anthropic/Gemini/DeepSeek still not on the matrix.*
#### Track: Data-Oriented Error Handling (Fleury Pattern) `[track-created: 494f68f9]`
*Link: [./tracks/data_oriented_error_handling_20260606/](./tracks/data_oriented_error_handling_20260606/), Spec: [./tracks/data_oriented_error_handling_20260606/spec.md](./tracks/data_oriented_error_handling_20260606/spec.md), Plan: [./tracks/data_oriented_error_handling_20260606/plan.md](./tracks/data_oriented_error_handling_20260606/plan.md)*
*Goal: Introduce Ryan Fleury's "errors are just cases" framework as a project convention. New `src/result_types.py` (ErrorKind enum, ErrorInfo dataclass, `Result[T]` with data + side-channel errors list, NilPath + NilRAGState sentinel singletons) and new `conductor/code_styleguides/error_handling.md` canonical reference. Refactor `src/mcp_client.py` ((p, err) tuples → Result; 30+ `assert p is not None` → nil-sentinel paths), `src/ai_client.py` (ProviderError exception → ErrorInfo dataclass; `_send_<vendor>()` → `_send_<vendor>_result()` returning `Result[str]`; `send()` marked `@deprecated`; new `send_result()` public API), and `src/rag_engine.py` (RAGEngine methods → Result returns). Update `conductor/product-guidelines.md` + `workflow.md` + `docs/guide_*.md` so the convention is documented and future plans can incrementally migrate the remaining `src/` files. **Blocked by** startup_speedup, test_batching_refactor, test_infrastructure_hardening_20260609, and qwen_llama_grok tracks. 5 phases: foundation+styleguide, mcp_client refactor, ai_client refactor (highest risk; ProviderError removal), rag_engine refactor, deprecation+docs+archive.*
*Follow-up: **`public_api_migration_20260606`** (planned; not yet specced; no directory yet) — removes the deprecated `ai_client.send()` and migrates all callers. Detailed in the parent track's spec §12.1.*
*Status (2026-06-12): **SHIPPED.** Phases 1-5 complete on branch `doeh-ai_client`. Path C was used for `src/mcp_client.py` (additive `*_result` variants; the 30+ tool-function refactor deferred to follow-up). Full refactor was used for `src/ai_client.py` (ProviderError removed, 9 `_send_*()` renamed, `send()` marked `@deprecated`, `send_result()` public API added) and `src/rag_engine.py` (`_init_vector_store_result`, `_validate_collection_dim_result`, `_get_state` with `NilRAGState`). 28 new tests pass; 4 existing tests updated; 13 test regressions in test_llama_provider.py (3) + test_llama_ollama_native.py (4) + test_grok_provider.py (3) + test_minimax_provider.py (2) + test_live_gui_integration_v2.py (1) ΓÇö all from the Phase 3 renames + ProviderError removal. Regressions are documented in `state.toml` `[regressions_20260612]` and are the intended work of `public_api_migration_20260606`. Archive status: directory remains in place (matches repo convention; `archive` is conceptual, not physical).*
#### Track: Data Structure Strengthening (Type Aliases + NamedTuples) `[track-created: ed42a97a]` `[shipped: 2026-06-21]`
*Link: [./tracks/data_structure_strengthening_20260606/](./tracks/data_structure_strengthening_20260606/), Spec: [./tracks/data_structure_strengthening_20260606/spec.md](./tracks/data_structure_strengthening_20260606/spec.md), Plan: [./tracks/data_structure_strengthening_20260606/plan.md](./tracks/data_structure_strengthening_20260606/plan.md) (to be authored by writing-plans skill)*
*Goal: Improve AI-readability by naming 430 currently-anonymous `dict[str, Any]` / `list[dict[...]]` / `Tuple[...]` types. New `src/type_aliases.py` with 10 `TypeAlias` definitions (`Metadata`, `CommsLogEntry`, `CommsLog`, `HistoryMessage`, `History`, `FileItem`, `FileItems`, `ToolDefinition`, `ToolCall`, `CommsLogCallback`) and 1 `NamedTuple` (`FileItemsDiff`). Mechanical replacement of 345 weak sites across 6 high-traffic files: `src/ai_client.py` (139), `src/app_controller.py` (86), `src/models.py` (51), `src/api_hook_client.py` (32), `src/project_manager.py` (20), `src/aggregate.py` (17). Add `--strict` mode to the existing `scripts/audit_weak_types.py` (committed in 84fd9ac9; found the 430 sites) so it becomes a permanent CI gate that fails when new weak types are introduced. Generate `scripts/audit_weak_types.baseline.json` with the post-refactor count. 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. **Data-grounded**: the audit script is the source of truth; the count drops from 430 to ~60 (86% reduction) in the 6 high-traffic files. **Honest about what's missing**: 23 lower-impact files remain; TypedDict/dataclass migration is deferred to a follow-up track. 2-3 days work, 1-2 phases, low risk. **Now blocked by** test_infrastructure_hardening_20260609 (was: none).*
#### Track: AI Loop Regressions (MiniMax, Gemini, Gemini CLI, DeepSeek) `[track-created: 2026-06-14]` `[shipped: 2026-06-15]`
*Link: [./tracks/ai_loop_regressions_20260614/](./tracks/ai_loop_regressions_20260614/), Spec: [./tracks/ai_loop_regressions_20260614/spec.md](./tracks/ai_loop_regressions_20260614/spec.md), Plan: [./tracks/ai_loop_regressions_20260614/plan.md](./tracks/ai_loop_regressions_20260614/plan.md), Metadata: [./tracks/ai_loop_regressions_20260614/metadata.json](./tracks/ai_loop_regressions_20260614/metadata.json), Report: [../../docs/reports/TRACK_COMPLETION_ai_loop_regressions_20260615.md](../../docs/reports/TRACK_COMPLETION_ai_loop_regressions_20260615.md)*
*Status: 2026-06-15 — **SHIPPED with 1 known production regression + 2 deferred bugs** (both flagged for follow-up). 3 documented bugs (Bug #1 dead `except ai_client.ProviderError`, Bug #2 error → no discussion entry, Bug #3 MiniMax thinking mono) are fixed. 7 new regression tests pass; 2 pre-existing tests in `test_live_gui_integration_v2.py` were adapted (not skipped). 12 commits.*
*Goal: Diagnose and fix the user-blocking AI loop regressions for the 4 providers (MiniMax, Gemini, Gemini CLI, DeepSeek) most heavily touched by the `data_oriented_error_handling_20260606` track (shipped 2026-06-12) and the subsequent `ai client pass` commit `5030bd84` (2026-06-13, 503-line `src/ai_client.py` refactor). 3 distinct bugs: **Bug #1** (3 dead `except ai_client.ProviderError` clauses in `src/app_controller.py:305, 313, 3692` ΓÇö the class was removed in commit `64b787b8`). **Bug #2** (`_handle_request_event` calls the deprecated `ai_client.send()` which now returns `""` on error; `_on_comms_entry` filters empty text). **Bug #3** (`_send_minimax` doesn't wrap reasoning in `<thinking>` tags in returned text).*
*5 phases: Phase 1 (TDD red), Phase 2 (FR1 fix), Phase 3 (FR2 fix), Phase 4 (FR3 fix), Phase 5 (regression sweep + docs). 17 tasks, 12 atomic commits, ~1.5 days of Tier 2 work.*
*Deferred to follow-up tracks (per user direction 2026-06-14): (1) Gemini / Gemini CLI thinking-format compatibility (Bug #4) ΓÇö see `doeh_test_thinking_cleanup_20260615` Phase 3. (2) `<think>` (half-width) marker support in `thinking_parser.py` (Bug #5) ΓÇö see `doeh_test_thinking_cleanup_20260615` Phase 4.*
*`blocks: public_api_migration_20260606` (this track migrates 3 broken sites; the public_api track picks up the remaining 5 production + 63 test call sites).*
#### Track: Data-Oriented Error Handling Test & Thinking-Parser Cleanup `[track-created: 2026-06-15]`
*Link: [./tracks/doeh_test_thinking_cleanup_20260615/](./tracks/doeh_test_thinking_cleanup_20260615/), Spec: [./tracks/doeh_test_thinking_cleanup_20260615/spec.md](./tracks/doeh_test_thinking_cleanup_20260615/spec.md), Plan: [./tracks/doeh_test_thinking_cleanup_20260615/plan.md](./tracks/doeh_test_thinking_cleanup_20260615/plan.md), Metadata: [./tracks/doeh_test_thinking_cleanup_20260615/metadata.json](./tracks/doeh_test_thinking_cleanup_20260615/metadata.json)*
*Status: 2026-06-15 ΓÇö Active, ready for Tier 2 implementation. User-blocking cleanup track. 1 critical production regression + 10 pre-existing test mock bugs + 2 deferred bugs (from `ai_loop_regressions_20260614`) + 2 housekeeping items.*
*Goal: Consolidate the cleanup work that didn't fit in `data_oriented_error_handling_20260606` (the parent refactor) and `ai_loop_regressions_20260614` (the immediate fix track). 5 phases: Phase 1 (CRITICAL: fix `_api_generate` `NameError` regression introduced by `ai_loop_regressions_20260614` commit `2b7b571a` ΓÇö the FR2 fix accidentally removed the `context_to_send` variable definition while preserving its usage at line 278), Phase 2 (fix 11 pre-existing test mock bugs: 3 in test_grok_provider, 3 in test_llama_provider, 4 in test_llama_ollama_native, 1 in test_ai_client_tool_loop_builder, 1 in test_headless_service), Phase 3 (Bug #4 deferred: Gemini / Gemini CLI thinking-format compatibility), Phase 4 (Bug #5 deferred: `<think>` half-width marker support in thinking_parser), Phase 5 (housekeeping: state.toml duplicate-key fix, tracks.md row 24 update, full suite sweep, doc updates). 16 tasks, ~15 atomic commits, 5-8 hours of Tier 2 work (0.5-1 day).*
*Out of scope (documented in spec.md §7 + §12): `public_api_migration_20260606` (planned; the broader migration of 5 production + ~50 test call sites not touched here), `live_gui_mock_injection_20260615` (recommended; infrastructure for proper e2e live_gui + AI client tests), `test_rag_phase4_final_verify` (separate RAG concern), UI Polish Five Issues track phases 2/3 (separate track).*
#### Track: MCP Architecture Refactor (Sub-MCP Extraction) `[track-created: 2720a894]`
*Link: [./tracks/mcp_architecture_refactor_20260606/](./tracks/mcp_architecture_refactor_20260606/), Spec: [./tracks/mcp_architecture_refactor_20260606/spec.md](./tracks/mcp_architecture_refactor_20260606/spec.md), Plan: [./tracks/mcp_architecture_refactor_20260606/plan.md](./tracks/mcp_architecture_refactor_20260606/plan.md) (to be authored by writing-plans skill)*
*Goal: Split the 2,205-line monolithic `src/mcp_client.py` (45 module-level functions) into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Naming convention `mcp_<type>.py` for native MCPs: `mcp_file_io.py` (9 tools), `mcp_python.py` (14), `mcp_c.py` (5), `mcp_cpp.py` (5), `mcp_web.py` (2), `mcp_analysis.py` (2). The existing `ExternalMCPManager` is extracted to `mcp_external.py` (class name preserved). New `MCPController` class in `src/mcp_client.py` holds the 3-layer security model (extracted to `src/mcp_client_security.py`), the `ALL_SUB_MCPS` registration list, and the inverted-dict dispatch lookup. New `src/mcp_client_legacy.py` re-exports all 45+ old symbols for backward compat (the 4 existing test files + `src/app_controller.py:61` continue to work). Each sub-MCP's `invoke()` returns `Result[str, ErrorInfo]` (Fleury pattern). Path parameters use the `Metadata` family aliases. **Blocked by** test_infrastructure_hardening_20260609, `data_oriented_error_handling_20260606` (for `Result`/`ErrorInfo`), and `data_structure_strengthening_20260606` (for `Metadata` aliases). 7 phases: foundation (security + controller), move-to-legacy, extract File I/O, extract Python, extract C/C++/Web/Analysis, extract External, dispatch update + docs + archive. **Out of scope** (per user): a per-MCP DSL (APL/K/Cosy-inspired) for compact tool calls ΓÇö deferred to `mcp_dsl_20260606` follow-up. JSON-only for now.*
#### Track: RAG Phase 4 Stress Test Fix `[x] ΓÇö fixed 16412ad5`
*Status: 2026-06-06 ΓÇö Surfaced during post-v2 verification. Resolved: real bug, NOT a test flake. Root cause: ChromaDB collection dimension mismatch across test runs. The persistent on-disk collection (`tests/artifacts/live_gui_workspace/.slop_cache/chroma_test_stress/`) was created by a previous run with Gemini embeddings (3072-dim); the current run uses local SentenceTransformers (384-dim). `index_file()` upserts silently corrupt the collection, then `search()` fails with `Collection expecting embedding with dimension of 3072, got 384` and the AI request never reaches 'done' status, timing out the 50*0.5s = 25s poll loop. Fix: `RAGEngine._init_vector_store` now calls `_validate_collection_dim` which inspects the first existing vector's dim, compares to the current provider's output, and recreates the collection on mismatch (with a stderr warning). Regression tests added: `test_rag_collection_dim_mismatch_recreates_collection` and `test_rag_collection_dim_match_preserves_collection` in `tests/test_rag_engine.py`. This also fixes a real user-facing bug: switching embedding providers in the GUI previously caused silent corruption. Commit 16412ad5.*
#### Track: SQLite-Granularity Inline Docs for gui_2.py `[COMPLETE: sqlite_docs_gui_2_20260612]`
*Link: [./tracks/sqlite_docs_gui_2_20260612/](./tracks/sqlite_docs_gui_2_20260612/), Spec: [./tracks/sqlite_docs_gui_2_20260612/spec.md](./tracks/sqlite_docs_gui_2_20260612/spec.md), Plan: [./tracks/sqlite_docs_gui_2_20260612/plan.md](./tracks/sqlite_docs_gui_2_20260612/plan.md)*
*Status: 2026-06-12 ΓÇö COMPLETE. SQLite-style docstrings with embedded ASCII layouts and DAG context have been added to key modules representing App lifecycle, discussion panels, context panels, settings hubs, and diagnostics panels.*
*Goal: Add SQLite-granularity docstrings with embedded ASCII layouts and DAG relationships for `src/gui_2.py` panel-by-panel. Ensure zero functional regression. 5 phases: app lifecycle & setup, discussion panel, context panel, settings/hubs, and diagnostics/modals.*
#### Track: Continued SQLite-Granularity Inline Docs for gui_2.py `[COMPLETE: sqlite_docs_gui_2_continued_20260613]`
*Link: [./tracks/sqlite_docs_gui_2_continued_20260613/](./tracks/sqlite_docs_gui_2_continued_20260613/), Spec: [./tracks/sqlite_docs_gui_2_continued_20260613/spec.md](./tracks/sqlite_docs_gui_2_continued_20260613/spec.md), Plan: [./tracks/sqlite_docs_gui_2_continued_20260613/plan.md](./tracks/sqlite_docs_gui_2_continued_20260613/plan.md)*
*Status: 2026-06-13 ΓÇö COMPLETE. Completed the SQLite-style docstring initiative for preset managers, editors, persona selectors, and the command palette modal.*
*Goal: Document preset managers/editors, persona selectors/editors, provider panel, and command palette in `src/gui_2.py` and `src/command_palette.py` with embedded SSDL and ASCII layouts.*
#### Track: SQLite-Granularity Inline Docs for ai_client.py `[COMPLETE: ai_client_docs_20260613]`
*Link: [./tracks/ai_client_docs_20260613/](./tracks/ai_client_docs_20260613/), Spec: [./tracks/ai_client_docs_20260613/spec.md](./tracks/ai_client_docs_20260613/spec.md), Plan: [./tracks/ai_client_docs_20260613/plan.md](./tracks/ai_client_docs_20260613/plan.md)*
*Status: 2026-06-13 ΓÇö COMPLETE. Added SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in src/ai_client.py.*
*Goal: Add SQLite-granularity docstrings with SSDL traces, parameters, functional scopes, and thread boundaries for the primary entry points, providers, and helper functions in `src/ai_client.py`.*
#### Track: Intent-Based Scripting Languages Survey `[COMPLETE: 213e4994]`
*Link: [./tracks/intent_dsl_survey_20260612/](./tracks/intent_dsl_survey_20260612/), Spec: [./tracks/intent_dsl_survey_20260612/spec.md](./tracks/intent_dsl_survey_20260612/spec.md), Plan: [./tracks/intent_dsl_survey_20260612/plan.md](./tracks/intent_dsl_survey_20260612/plan.md), Report: [./tracks/intent_dsl_survey_20260612/report_v1.2.md](./tracks/intent_dsl_survey_20260612/report_v1.2.md), v1.1: [./tracks/intent_dsl_survey_20260612/report_v1.1.md](./tracks/intent_dsl_survey_20260612/report_v1.1.md), v1.0: [./tracks/intent_dsl_survey_20260612/report.md](./tracks/intent_dsl_survey_20260612/report.md), Review: [./tracks/intent_dsl_survey_20260612/reportreview.md](./tracks/intent_dsl_survey_20260612/reportreview.md)*
*Status: 2026-06-12 — COMPLETE. Research-only track (non-impl). Final deliverable: `report_v1.2.md` (1343 lines, 168KB+, 7 sections + 9-subsection expanded Appendix). 4-tier vocab with 42 verbs (T1 math 12, T2 pipeline 12, T3 shell 10, T4 AI-fuzzing 8); **10 prior-art clusters** (0: O'Donnell philosophical anchor; 1: Concatenative; 2: Array; 3: Intent-mapping; 4: Meta-Tooling DSLs; 5: SSDL; 6: Command Palette; 7: Result convention; 8: Metadesk Self-Describing Data + Tag Dispatch; 9: Verse Multi-Paradigm Calculi with Transactional Semantics); 14-primitive grammar from user's math pseudocode; 4 hardware anchor claims; 10 AI-agent properties tying to existing project architecture; 8 open questions for the follow-up interpreter prototype. Version history: v1.0 (418 lines) → v1.1 (1301 lines, +883): XML/JSON rejection citation fix, OCR-restored Lottes quote, softened Wasm streaming-parse inference, expanded Appendix A.1-A.9. → **v1.2** (1343 lines): (1) Renamed `arena { }` → `tape { }` (46 occurrences); (2) **Mixed postfix/infix notation** for math; (3) nagent attribution corrected (Jody Bruchon → Mike Acton); (4) **Added Cluster 8 (Metadesk) and Cluster 9 (Verse)** — survey now covers 10 clusters (sub-agents at `research/cluster_8_metadesk.md` and `research/cluster_9_verse.md`). Time-sensitive goal met: completed before nagent v2.2 hard boundary. Will be consumed by nagent v2.2 (Future-Track Candidate #4) and the future interpreter prototype (follow-up B track, separate). Appendix A.3/A.4 retain v1.1 form pending a sync pass; noted in v1.2 changelog at the top of the report.*
*Goal: Survey intent-based scripting languages as a design philosophy and propose a Meta-Tooling-facing intent DSL vocabulary. **Research-only** (non-impl): produces 1 markdown file at `conductor/tracks/intent_dsl_survey_20260612/report.md`. No new `src/` code, no new tests, no `pyproject.toml` changes. The report is the *foundation document* for the user's nagent v2.2 (its "Future-Track Candidate #4: Intent-based DSL" section), the placeholder `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER` (per `mcp_architecture_refactor_20260606/spec.md` §12.1 and `nagent_review_20260608/metadata.json:28`), and a future interpreter prototype (follow-up B track, separate). 7 sections: (1) the "intent-based" design philosophy (O'Donnell immediate-mode as the anchor); (2) prior art across **10 clusters** (0: John O'Donnell IMGUI/MVC at johno.se/book/*; 1: Forth family — Forth, ColorForth, KYRA/Onat, x68/Lottes, Joy, CoSy/Bob Armstrong; 2: Array — APL, K, BQN, Uiua; 3: Intent-mapping — Jofito/Jody, jq, nagent tag protocol [rejected as model], Wasm; 4: Meta-Tooling DSLs — `mcp_dsl_20260606` placeholder, nagent's Bridge DSL, OpenAI/Anthropic tool-use; 5: SSDL shape primitives per `computational_shapes_ssdl_digest_20260608.md`; 6: Project's own Command Palette 33 commands; 7: `Result[T]` + `ErrorInfo` convention per `data_oriented_error_handling_20260606`); (3) the 14-primitive grammar formalized from the user's math pseudocode (`determinate`/`minor`/`matrix-transpose` snippets), with explicit ambiguity flags; (4) the 4-tier vocab (~40 verbs: T1 math ~10, T2 data pipeline ~12, T3 shell ~10, T4 AI-fuzzing tolerance ~8 — T4 is the novel contribution); (5) hardware mapping with 4 anchor claims (Onat/Lottes 2-register stack + magenta pipe + basic blocks + lambdas + preemptive scatter; O'Donnell "widgets are method invocations"; Forth/CoSy concatenative syntax; APL/K array data); (6) AI-agent properties (10 claims tying to existing project architecture: Meta-Tooling domain per 
`guide_meta_boundary.md`, runtime path through `cli_tool_bridge.py`, 3-layer security per `guide_tools.md`, 4 memory dimensions per nagent v2.1 §2.1, stable-to-volatile cache ordering, `Result[T]` envelope, Command Palette 33 commands, Hook API state fields, O'Donnell IEventTarget = `sandbox` verb, O'Donnell "reads are free" = cheap Tier 2 verbs); (7) ≥6 open questions for follow-up B (interpreter prototype) + connection block to `intent_dsl_for_meta_tooling_20260608_PLACEHOLDER`. 4 phases: source gathering + outline (checkpoint commit), write sections 1-3, write sections 4-7, self-review + user review + commit + register in tracks.md. **Time-sensitive**: report must complete before nagent v2.2 ships.*
*Spec approved 2026-06-12 (commit `b389f1be`). 789 lines; modeled on `data_oriented_error_handling_20260606/spec.md`.*
#### Track: Prior Session Test Harden (20260605) `[superseded by live_gui_test_hardening_v2_20260605]`
*Status: 2026-05-05 ΓÇö Surfaced during live_gui_fragility_fixes_20260605 execution. `test_prior_session_no_pop_imbalance::test_no_extraneous_pop_when_prior_session_renders` is more under-mocked than expected. Completed as part of live_gui_test_hardening_v2_20260605: test refactored to call narrow render_prior_session_view (50+ mocks -> 20, runtime 5.79s -> 0.08s). Commit 26e0ced4.*
### Backlog (Provider + Language + Investigation)
#### Track: Bootstrap gencpp Python Bindings
*Link: [./tracks/gencpp_python_bindings_20260308/](./tracks/gencpp_python_bindings_20260308/)*
#### Track: Tree-Sitter Lua MCP Tools
*Link: [./tracks/tree_sitter_lua_mcp_tools_20260310/](./tracks/tree_sitter_lua_mcp_tools_20260310/)*
#### Track: GDScript Language Support Tools
*Link: [./tracks/gdscript_godot_script_language_support_tools_20260310/](./tracks/gdscript_godot_script_language_support_tools_20260310/)*
#### Track: C# Language Support Tools
*Link: [./tracks/csharp_language_support_tools_20260310/](./tracks/csharp_language_support_tools_20260310/)*
#### Track: OpenAI Provider Integration
*Link: [./tracks/openai_integration_20260308/](./tracks/openai_integration_20260308/)*
#### Track: Zhipu AI (GLM) Provider Integration
*Link: [./tracks/zhipu_integration_20260308/](./tracks/zhipu_integration_20260308/)*
#### Track: AI Provider Caching Optimization
*Link: [./tracks/caching_optimization_20260308/](./tracks/caching_optimization_20260308/)*
#### Track: Manual UX Validation & Review
*Link: [./tracks/manual_ux_validation_20260302/](./tracks/manual_ux_validation_20260302/)*
#### Track: Manual UX Validation ΓÇö ASCII-Sketch Workflow (NEW 2026-06-08)
*Link: [./tracks/manual_ux_validation_20260608_PLACEHOLDER/](./tracks/manual_ux_validation_20260608_PLACEHOLDER/), Spec: [./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md](./tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md), Plan: [./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md](./tracks/manual_ux_validation_20260608_PLACEHOLDER/plan.md)*
*Goal: Promote the ASCII-sketch UX ideation workflow (`docs/reports/ascii_sketch_ux_workflow_20260608.md`, 340 lines) to a real track. Resolves 5 open questions (vocabulary preference, comparison policy, storage location, tooling, frequency), then executes the workflow on the first target: the per-entry rendering of the Discussion Hub at `src/gui_2.py:3770 render_discussion_entry`. The 23-op matrix A1-A7 in `docs/guide_discussions.md` is the source of truth; the SSDL digest (`docs/reports/computational_shapes_ssdl_digest_20260608.md`, 504 lines) informs the *internal refactoring* decisions. Complements the broader 20260302 track. 4 phases, 21 tasks, TDD-style for Phase 3. User-confirmed worth doing.*
*Status: Active; Phase 1 (5 open questions to the user) is the current phase.*
#### Track: Chunkification Optimization (NEW 2026-06-08, CONTINGENCY)
*Link: [./tracks/chunkification_optimization_20260608_PLACEHOLDER/](./tracks/chunkification_optimization_20260608_PLACEHOLDER/), Spec: [./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md](./tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md)*
*Goal: Contingency document only. Activates ONLY when a hard constraint surfaces that no existing Python package can solve AND the target is hot enough to justify the C11 build cost. Per user (verbatim): "only worth it if I reach a hard constraint that I cannot solve with an existing python package." The 2 cited candidates (markdown parsing into aggregate markdown, context snapshot processing) are NOT currently bottlenecks per `src/aggregate.py:380-454` (pure-Python string concat, zero third-party markdown deps in `pyproject.toml:6-27`) and `src/history.py:1-141` (bounded ~500KB at 100-snapshot capacity, debounced). First fix if they become bottlenecks: add `markdown-it-py` OR switch to `pickle`/`msgspec` — NOT C11. The shape when activated: subprocess-launch C11 binary with request/response blob wire format (NOT stateful C extension). The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 + "Xar-style chunked arrays" recommendation in §5.2 pre-support this track.*
*Status: Deferred. Promotes to active track when (if) the first hard constraint surfaces.*
#### Track: Context First Message Fix
*Link: [./tracks/context_first_message_fix_20260604/](./tracks/context_first_message_fix_20260604/)*
#### Track: Fix Remaining Tests
*Link: [./tracks/fix_remaining_tests_20260513/](./tracks/fix_remaining_tests_20260513/)*
#### Track: Test Harness Hardening
*Link: [./tracks/test_harness_hardening_20260310/](./tracks/test_harness_hardening_20260310/)*
#### Track: Test Patch Fixes
*Link: [./tracks/test_patch_fixes_20260513/](./tracks/test_patch_fixes_20260513/)*
#### Track: Test Batching Post-Refactor Polish
*Link: [./tracks/test_batching_post_refactor_polish_20260607/](./tracks/test_batching_post_refactor_polish_20260607/)*
#### Track: Code Path Audit
*Link: [./tracks/code_path_audit_20260607/](./tracks/code_path_audit_20260607/), Spec: [./tracks/code_path_audit_20260607/spec_v2.md](./tracks/code_path_audit_20260607/spec_v2.md), Plan: [./tracks/code_path_audit_20260607/plan_v2.md](./tracks/code_path_audit_20260607/plan_v2.md), Report: [../../docs/reports/TRACK_COMPLETION_code_path_audit_20260622.md](../../docs/reports/TRACK_COMPLETION_code_path_audit_20260622.md)*
*Goal: **v2 SHIPPED 2026-06-22 (commit `a99e3e6e`)** — Build `src/code_path_audit.py` — a data-oriented static-analysis tool that audits the 13 data aggregates (10 in-scope + 3 candidate placeholders for any_type_componentization_20260621) in `src/`. 4 static analyzers (PCG via 3 AST passes, MemoryDim classifier, APD with 5 access patterns + 25% dominance, CFE with 7 frequencies + entry-point detection), 2 renderers (`to_markdown` 10-section, `to_tree` box-drawing), 11 public functions (5 deterministic + 5 returning `Result[T]` per `error_handling.md` hard rule + 1 CLI). Cross-validates the 2 foundational tracks (`data_structure_strengthening_20260606` + `data_oriented_error_handling_20260606`) via the 6-input cross-audit integration. 4-direction decomposition cost (componentize/unify/hold/insufficient_data). **MVP output is a single `AUDIT_REPORT.md` (6797 lines, 311KB) + per-aggregate markdowns + `summary.md` as a TOC pointer.** 127 tests passing after the polish follow-up (was 131 pre-polish; -4 DSL tests removed). All 4 audit scripts pass (with 2 known issues documented in the completion report, 5 follow-up tracks recorded). **Note: the v2 DSL format (`to_dsl_v2` + `parse_dsl_v2`) was deprecated in 2026-06-24 by code_path_audit_polish_20260622 — DSL files are no longer produced; `run_audit()` writes `.md` files only.***
*v1 preserved unchanged as `spec.md` + `plan.md`. The v2 re-scope replaced "per-action" framing with "per-data-aggregate" framing (the user's directive 2026-06-22).*
#### Track: Phase 2/4/5 Call-Site Completion (post any_type_componentization) `[track-created: 2026-06-21]`
*Link: [./tracks/phase2_4_5_call_site_completion_20260621/](./tracks/phase2_4_5_call_site_completion_20260621/), Spec: [./tracks/phase2_4_5_call_site_completion_20260621/spec.md](./tracks/phase2_4_5_call_site_completion_20260621/spec.md), Plan: [./tracks/phase2_4_5_call_site_completion_20260621/plan.md](./tracks/phase2_4_5_call_site_completion_20260621/plan.md), Metadata: [./tracks/phase2_4_5_call_site_completion_20260621/metadata.json](./tracks/phase2_4_5_call_site_completion_20260621/metadata.json), State: [./tracks/phase2_4_5_call_site_completion_20260621/state.toml](./tracks/phase2_4_5_call_site_completion_20260621/state.toml)*
*Status: 2026-06-21 - Active, Tier 1 decision pending Tier 2 implementation. **SHRUNK scope** per `PROMPT_FOR_TIER_1.md` Decision 1 (Phase 6a + 6b + 6d only; defer Phase 3 to its own track post-audit).*
*Goal: Three-phase focused track that **(a) fixes the `HookServer.broadcast()` runtime bug** introduced by `any_type_componentization_20260621` Phase 5 (the Phase 5 commit `e9fa69dd` changed `broadcast(channel, payload)` → `broadcast(message: WebSocketMessage)` but did not update internal callers in `src/app_controller.py`, `src/events.py`, `src/gui_2.py`); **(b) completes the `_send_grok` / `_send_minimax` / `_send_llama` Phase 2 migration** (the 3 OpenAI-compatible senders were deferred in t2_6 and still construct `OpenAICompatibleRequest(messages=[{"role": ..., "content": ...}])` instead of `messages=[ChatMessage(...)]`); **(c) updates those 3 senders' `NormalizedResponse` construction** to use the Phase 2 `UsageStats` dataclass. **Adds `tests/test_websocket_broadcast_regression.py` with a "no-TypeError-errors-on-any-thread" assertion that `code_path_audit_20260607` will reuse**.*
*Scope (per Tier 1's shrink decision):*
- *Phase 6a (~7 commits): Fix `HookServer.broadcast()` callers in `src/app_controller.py:_run_pending_tasks_once_result` + `src/events.py` + `src/gui_2.py:_process_pending_gui_tasks`. Replace `broadcast(channel, payload)` with `broadcast(WebSocketMessage(channel=, payload=))`. Add regression test.*
- *Phase 6b (~5 commits): Migrate `_send_grok` (L2532) + `_send_minimax` (L2616) + `_send_llama` (L2856) to construct `OpenAICompatibleRequest(messages=[ChatMessage(...)], ...)`. Update provider tests.*
- *Phase 6d (~4 commits): Update those 3 senders' `NormalizedResponse` construction to use `usage=UsageStats(input_tokens=..., output_tokens=..., cache_read_tokens=..., cache_creation_tokens=...)` instead of 4 separate int fields.*
- *Total: ~16 atomic commits, ~3 hours Tier 2 work.*
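The Phase 6a and 6d shape changes listed above can be sketched as follows. This is a minimal illustration, not the project's code: `HookServerSketch` is a hypothetical stand-in for `HookServer`, and the dataclass definitions are assumed from the field names given in the bullets (`channel`, `payload`, and the four token counters); the defaults are an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class WebSocketMessage:
    # Assumed shape: only `channel` and `payload` are named in the track entry.
    channel: str
    payload: dict[str, Any]


@dataclass(frozen=True)
class UsageStats:
    # Field names come from the Phase 6d bullet; the zero defaults are an assumption.
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0


class HookServerSketch:
    """Hypothetical stand-in for HookServer; records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[WebSocketMessage] = []

    def broadcast(self, message: WebSocketMessage) -> None:
        self.sent.append(message)


server = HookServerSketch()

# New call shape: a single message object.
server.broadcast(WebSocketMessage(channel="events", payload={"kind": "tick"}))

# Old call shape: two positional arguments no longer match the signature.
try:
    server.broadcast("events", {"kind": "tick"})  # type: ignore[call-arg]
    old_shape_error = None
except TypeError as exc:
    old_shape_error = exc

# One grouped value replaces four separate int fields.
usage = UsageStats(input_tokens=120, output_tokens=45)
```

The stale two-argument call fails with `TypeError` at call time, which is the class of runtime bug Phase 6a removes.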
*Deferred (out of scope, per Tier 1's decision):*
- *Phase 3 (`provider_state.ProviderHistory` call-site migration in `src/ai_client.py`): 112 sites across 6 senders (`_send_anthropic` 25, `_send_deepseek` 20, `_send_minimax` 21, `_send_qwen` 12, `_send_grok` 13, `_send_llama` 21). Qualitative cost estimate: ~+1-2ms per session; +8-15μs per `_send_anthropic` turn. Full analysis: `docs/reports/PHASE3_HYPOTHETICAL_PROMOTION.md`. The audit will quantify this before the Phase 3 track runs.*
- *Cross-phase coupling: `OpenAICompatibleRequest.tools: list[dict[str, Any]]` → `list[ToolSpec]`. Deferred to a separate track.*
- *`audit_tier2_leaks.py` sandbox-pollution fixes (3 failures): `--allowlist` for `mcp_paths.toml`, `opencode.json`, `.opencode/*`. Infrastructure track.*
- *Pre-existing `test_gui2_custom_callback_hook_works` flake. Separate investigation.*
*`blocks: code_path_audit_20260607` (the broadcast() TypeError contaminates the audit's per-action profiling; this track unblocks the audit). `blocked_by: any_type_componentization_20260621` (parent track; shipped 2026-06-21; the tier2 branch is NOT merged).*
*Does NOT merge `tier2/any_type_componentization_20260621` branch per Tier 2's reconnaissance framing in `HANDOFF_CODE_PATH_AUDIT_FROM_any_type_componentization.md` ("Use as input for the audit, not as a merge candidate"). The branch stays at 24 commits as the audit's reconnaissance warm-up.*
*Regression protocol (the lesson from `any_type_componentization_20260621`'s 10 test failures): after each Phase, run `uv run python scripts/run_tests_batched.py --tier tier-1-unit-core` FULLY (no stop-on-failure). After all phases complete, run all 11 tiers FULLY. The "no-TypeError" assertion is the canonical regression test.*
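One way a "no TypeError on any thread" assertion could be built is with `threading.excepthook`, which receives exceptions that escape worker threads. The sketch below is an assumption about the approach, not the contents of `tests/test_websocket_broadcast_regression.py`; `run_and_collect_thread_errors`, `good_caller`, and `stale_caller` are hypothetical names.

```python
import threading


def run_and_collect_thread_errors(target) -> list[BaseException]:
    """Run `target` in a worker thread and return the exceptions that escaped it."""
    errors: list[BaseException] = []
    previous_hook = threading.excepthook

    def recording_hook(args: threading.ExceptHookArgs) -> None:
        # Record instead of printing the traceback to stderr.
        errors.append(args.exc_value)

    threading.excepthook = recording_hook
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        # Always restore the previous hook so other tests are unaffected.
        threading.excepthook = previous_hook
    return errors


def good_caller() -> None:
    # Calls a one-parameter function with one argument.
    (lambda message: None)({"channel": "events"})


def stale_caller() -> None:
    # Simulates a caller that still uses the old two-argument shape.
    (lambda message: None)("events", {"kind": "tick"})


good_errors = run_and_collect_thread_errors(good_caller)
stale_errors = run_and_collect_thread_errors(stale_caller)
```

A regression test along these lines would assert that no collected error is a `TypeError` after exercising the real background paths.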
#### Track: GUI Architecture Refinement
*Link: [./tracks/gui_architecture_refinement_20260512/](./tracks/gui_architecture_refinement_20260512/) (no spec.md; needs scoping before planning)*
### Follow-up (Planned, Not Yet Specced)
#### Track: Public API Result Migration (follow-up to data_oriented_error_handling_20260606)
*Plan to be authored when data_oriented_error_handling_20260606 is complete; not started yet.*
*Goal: Remove the deprecated `ai_client.send()` and migrate all callers to `send_result()`. Affects 5 production call sites in `src/` (`src/app_controller.py:290` + `:3692`, `src/multi_agent_conductor.py:591`, `src/orchestrator_pm.py:86`, `src/conductor_tech_lead.py:68`, plus `src/mcp_client.py:2274` in the tool-result dispatch path) and 63 test files. The enumeration + baseline counts are recorded in the parent track's spec §12.1 and verified in this track's `state.toml` `[baseline_post_qwen_track]`.*
*`send_result(...)` mirrors the `send(...)` signature (13+ parameters including 8 callbacks); see `docs/guide_ai_client.md` "Data-Oriented Error Handling (Fleury Pattern) > Public API" for the call shape.*
#### Track: Public API Migration + UI Polish Test Cleanup (combined stability track) `[track-created: 2026-06-15]`
*Link: [./tracks/public_api_migration_and_ui_polish_20260615/](./tracks/public_api_migration_and_ui_polish_20260615/), Spec: [./tracks/public_api_migration_and_ui_polish_20260615/spec.md](./tracks/public_api_migration_and_ui_polish_20260615/spec.md), Plan: [./tracks/public_api_migration_and_ui_polish_20260615/plan.md](./tracks/public_api_migration_and_ui_polish_20260615/plan.md), Metadata: [./tracks/public_api_migration_and_ui_polish_20260615/metadata.json](./tracks/public_api_migration_and_ui_polish_20260615/metadata.json)*
*Status: 2026-06-15 - Active, ready for Tier 2 implementation. User-blocking stability track that finishes the cleanup work from `data_oriented_error_handling_20260606` and `doeh_test_thinking_cleanup_20260615` before the data structure track.*
*Goal: Two concerns, one track. **(A) Public API Migration**: remove the deprecated `ai_client.send()` legacy wrapper. Migrate 3 remaining production call sites (`src/conductor_tech_lead.py:68`, `src/orchestrator_pm.py:86`, `src/multi_agent_conductor.py:591`) + 12 test files to `send_result()`. Fix 4 of the 10 pre-existing test failures (2 Qwen + 2 symbol_parsing) as a side effect. **(B) UI Polish Test Cleanup**: fix 2 broken test assertions in `test_discussion_truncate_layout.py` and `test_log_management_refresh.py` (the production code was already fixed by user commits `d0b06575` and `df7bda6e`; the tests use `find()` which locates the comment block instead of the actual code). **Combined result**: 6 of 10 pre-existing failures fixed (1280 + 6 = 1286 pass; 4 RAG failures deferred to next track).*
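The `find()` pitfall mentioned in concern (B) can be illustrated with a small, self-contained example. The source snippet and the names `truncate_all` and `code_only` are hypothetical; the actual test files may differ.

```python
# Hypothetical source text: the removed call survives only inside a comment.
SOURCE = """\
# NOTE: the old code called truncate_all() here.
def on_click():
    truncate_selected()
"""


def code_only(text: str) -> str:
    """Drop full-line comments so a search only sees executable lines."""
    return "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )


# str.find returns the first match, which here sits inside the comment.
naive_hit = SOURCE.find("truncate_all()")

# Searching the comment-free text finds nothing, which is the correct answer.
robust_hit = code_only(SOURCE).find("truncate_all()")
```

A test that asserts on the naive result can pass or fail based on what a comment says rather than on what the code does.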
*7 phases: Phase 1 (3 production call sites migrated), Phase 2 (12 test files migrated to send_result()), Phase 3 (2 Qwen test fixes), Phase 4 (2 symbol_parsing test fixes), Phase 5 (2 UI Polish test fixes), Phase 6 (deprecation removed: send() function + filterwarnings + test_deprecation_warnings.py), Phase 7 (docs + housekeep). ~28 tasks, ~28 atomic commits, 2-3 days Tier 2 work.*
*Critical audit findings (2026-06-15): UI Polish phases 1, 4, 5 already SHIPPED (commits `79ac9210`, `3a864076`, `74e02485`); phases 2, 3 code SHIPPED (user commits) but tests broken (this track fixes). Only 3 production send() call sites remain (not 5 as the parent spec claimed; 2 were already migrated by `doeh_test_thinking_cleanup_20260615`, and `mcp_client.py:2274` was a misidentification). 12 test files use `send()` (not 63 as the parent spec claimed; `doeh_test_thinking_cleanup_20260615` already migrated 11).*
*`blocks: data_structure_strengthening_20260606` (cleaner Result API usage makes the type-alias replacement easier) and `mcp_architecture_refactor_20260606` (transitively).*
*Out of scope (documented in spec §7): 4 RAG test fixes (separate RAG subsystem track), the `_send_<vendor>()` → `_send_<vendor>_result()` rename (not needed; tests work with current names), 23 lower-impact weak-type files (next major track: `data_structure_strengthening_20260606`), `live_gui_mock_injection_20260615` infrastructure (separate infrastructure track).*
`blocks:` None (independent refactor + sandbox test).
#### Track: Tier 2 Sandbox - Move State/Failures Off AppData `[track-created: 2026-06-18]`
*Link: [./tracks/tier2_no_appdata_20260618/](./tracks/tier2_no_appdata_20260618/), Spec: [./tracks/tier2_no_appdata_20260618/spec.md](./tracks/tier2_no_appdata_20260618/spec.md), Plan: [./tracks/tier2_no_appdata_20260618/plan.md](./tracks/tier2_no_appdata_20260618/plan.md), Metadata: [./tracks/tier2_no_appdata_20260618/metadata.json](./tracks/tier2_no_appdata_20260618/metadata.json)*
*Status: 2026-06-18 - SHIPPED. 6 phases, 16 atomic commits (no test commits; the test changes ride with the source changes since the tests assert the source contract). Configuration-only fix; no behavior change in product code. Scope: 11 source files modified (5 scripts/tier2/* + 2 conductor/tier2/* + 2 docs/* + 1 conductor/* + 1 .gitignore) + 2 test files modified + 1 new test added.*
*Goal: Per the user's 2026-06-18 'NEVER USE APPDATA' directive, move the Tier 2 failcount state and failure-report locations inside the Tier 2 clone (scripts/tier2/state/<track>/state.json and scripts/tier2/failures/<track>_<ts>.md). Remove every AppData reference from the Tier 2 conventions, permissions, scripts, docs, and tests. After this track, the C:\\Users\\Ed\\AppData\\... tree is never referenced by the Tier 2 sandbox in any form.*
*Deliverables: 0 new files, 0 deleted files. The 16 commits include 4 source code changes (failcount.py + write_report.py + run_track.py + opencode.json.fragment), 2 prompt changes (agent + slash command), 2 bootstrap-script changes (setup + sandboxed launcher), 5 doc/test changes (guide + workflow + write_track_completion_report + slash_command_spec + no_temp_writes), 1 .gitignore, 1 write_track_completion_report output, and 1 last-minute example fix caught by the test. The track-isolated directories (scripts/tier2/state/ and scripts/tier2/failures/) are gitignored so they never pollute the source tree.*
*Test inventory: 37 default-on tests pass (test_failcount.py: 19; test_tier2_slash_command_spec.py: 14 + 1 new = 15; test_no_temp_writes.py: 1; the test_tier2_report_writer.py 8 tests are opt-in via TIER2_SANDBOX_TESTS=1 and pass when enabled). audit_no_temp_writes.py --strict exits 0. No regressions.*
`blocks:` None. Followup: the user re-runs `pwsh -File scripts/tier2/setup_tier2_clone.ps1` to re-bootstrap the live Tier 2 clone with the new conventions.
#### Track: Exception Handling Audit (Convention Compliance + Doc Clarification) `[track-created: 2026-06-16]`
*Link: [./tracks/exception_handling_audit_20260616/](./tracks/exception_handling_audit_20260616/), Spec: [./tracks/exception_handling_audit_20260616/spec.md](./tracks/exception_handling_audit_20260616/spec.md), Plan: [./tracks/exception_handling_audit_20260616/plan.md](./tracks/exception_handling_audit_20260616/plan.md), Metadata: [./tracks/exception_handling_audit_20260616/metadata.json](./tracks/exception_handling_audit_20260616/metadata.json), Report: [../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md](../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md)*
*Status: 2026-06-16 - Active, completed (5/5 phases, ~12 tasks). An AUDIT + DOC track (no production code change). The deliverable is the audit script + the report + 3 doc/codestyle updates that close 5 gaps in the convention's documentation.*
*Goal: produce a static analyzer that classifies every `try/except/finally/raise` site in the codebase against the data-oriented error handling convention established by `data_oriented_error_handling_20260606` (shipped 2026-06-12). The audit's value is in the report + the doc clarification, not in a refactor.*
*Deliverables:*
- *`scripts/audit_exception_handling.py`: 792-line AST-based static analyzer; 10-category classification taxonomy (5 compliant + 3 violation + 1 suspicious + 1 unclear); `--json`, `--top`, `--verbose`, `--strict`, `--include-tests` modes; "delete to turn off" per `feature_flags.md`*
- *`conductor/code_styleguides/error_handling.md`: 5 new sections (Boundary Types, The Broad-Except Distinction, Constructors Can Raise, Re-Raise Patterns, Audit Script) closing 5 gaps the audit revealed*
- *`docs/guide_app_controller.md`: new "Exception Handling" section explaining the 13 FastAPI boundary sites + the 40 migration-target sites*
- *`conductor/product-guidelines.md`: cross-reference to the audit script*
- *`docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md`: 9-section report (370 lines) for the user to decide the next track*
*Headline numbers: 348 total sites across 65 files. 80 compliant (23%) + 25 suspicious (7%) + 211 violation (61%) + 32 unclear (9%). The 3 refactored baseline files (mcp_client, ai_client, rag_engine) have 112 sites / 77 violations (the convention reference; remaining violations are mostly broad-catches without ErrorInfo conversion). The 62 migration-target files have 236 sites / 134 violations (the work for future refactor tracks).*
*5 gaps the audit revealed + closed:*
- *G1: FastAPI `HTTPException` in `_api_*` handlers not explicitly documented as a legitimate boundary (closed in styleguide + app_controller doc)*
- *G2: The "broad except Exception" rule doesn't distinguish between "swallow" and "convert to ErrorInfo" (closed in styleguide)*
- *G3: The "constructors can raise" rule is brief; needs elaboration (closed in styleguide)*
- *G4: The "re-raise" pattern is not in the styleguide at all (closed in styleguide)*
- *G5: The new audit script is not referenced from the styleguide (closed in styleguide + product-guidelines.md)*
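The swallow-versus-convert distinction from G2 lends itself to a small AST sketch. This is not `scripts/audit_exception_handling.py`; the two labels (`swallow`, `handled`) are a simplification of its 10-category taxonomy, and `classify_broad_handlers` and the sample functions are hypothetical.

```python
import ast

SAMPLE = '''
def swallow():
    try:
        risky()
    except Exception:
        pass

def convert():
    try:
        risky()
    except Exception as exc:
        return ErrorInfo(message=str(exc))
'''


def classify_broad_handlers(source: str) -> dict[int, str]:
    """Map the line number of each `except Exception` handler to a label."""
    labels: dict[int, str] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # Only bare-name `Exception` counts as broad in this sketch.
        is_broad = isinstance(node.type, ast.Name) and node.type.id == "Exception"
        if not is_broad:
            continue
        # A body made only of `pass` statements discards the error.
        only_pass = all(isinstance(stmt, ast.Pass) for stmt in node.body)
        labels[node.lineno] = "swallow" if only_pass else "handled"
    return labels


labels = classify_broad_handlers(SAMPLE)
```

The source is only parsed, never executed, so the undefined `risky` and `ErrorInfo` names in the sample are harmless.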
*Critical audit findings (2026-06-16): The convention is applied to 3 of 65 src/ files (mcp_client.py, ai_client.py, rag_engine.py; the "baseline"). The remaining ~10 files in src/ are in the "migration-target" state. The top 3 candidates by violation count: `src/gui_2.py` (37 violations, 260KB), `src/app_controller.py` (35 violations + 13 FastAPI boundary = 48 sites, 166KB), `src/session_logger.py` (8 violations, 16KB). The user decides which is the next refactor track.*
*`blocks: app_controller_result_migration_20260616` (recommended next track; 22 migration-target sites in app_controller.py after excluding the 13 FastAPI boundary sites; 2-3 days Tier 2), `gui_2_result_migration` (37 violations; 2-3 days Tier 2), `session_logger_result_migration` (8 violations; 0.5 day Tier 2). Also unblocks the user's stated `send_result` → `send` mass rename and the planned `data_structure_strengthening_20260606` track.*
*Out of scope (deferred to separate tracks): the `send_result` → `send` mass rename (user's stated manual refactor), 23 lower-impact weak-type files (`data_structure_strengthening_20260606`), `live_gui_mock_injection_20260615` infrastructure (separate track), RAG test quality cleanup (poll loops; separate track), and — most importantly — **any production code refactor** (this track is informational; the user decides what to migrate).*
#### Track: Result Migration (5 sub-tracks) `[track-created: 2026-06-16]`
*Link: [./tracks/result_migration_20260616/](./tracks/result_migration_20260616/), Spec: [./tracks/result_migration_20260616/spec.md](./tracks/result_migration_20260616/spec.md), Plan: [./tracks/result_migration_20260616/plan.md](./tracks/result_migration_20260616/plan.md), Metadata: [./tracks/result_migration_20260616/metadata.json](./tracks/result_migration_20260616/metadata.json), Audit: [../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md](../../docs/reports/EXCEPTION_HANDLING_AUDIT_20260616.md)*
*Status: 2026-06-16 ΓÇö Umbrella track; spec/plan/metadata planned. **2026-06-17 update**: sub-track 1 (`result_migration_review_pass_20260617`) shipped; sub-track 2 (`result_migration_small_files_20260617`) initialized; 3 sub-tracks remaining. The umbrella specifies the sequence and scope of the 5 sub-tracks; each sub-track gets its own spec/plan/metadata when it starts.*
*Goal: Eliminate all 211 violations + 25 suspicious + 32 unclear = **268 "bad" sites** across 42 files (per the `exception_handling_audit_20260616` report). After all 5 sub-tracks ship, the data-oriented error handling convention is fully applied to all 65 `src/` files, and the `audit_exception_handling.py --strict` mode can be wired into CI as a pre-commit gate.*
*5 sub-tracks (consistent `result_migration_*` prefix):*
| # | Sub-track | Size | Scope | Why this position |
|---|---|---|---|---|
| 1 | `result_migration_review_pass` | S | 57 sites (32 UNCLEAR + 25 INTERNAL_RETHROW) across 15 files | First: human review + audit script heuristic updates inform all later sub-tracks |
| 2 | `result_migration_small_files` | L | 37 files (35 SMALL + 2 MEDIUM from `--by-size`); 72 V+S sites | Second: quick wins; doesn't depend on the orchestrator or GUI; can run in parallel with 3-4 |
| 3 | `result_migration_app_controller` | XL | 56 sites in `src/app_controller.py` (166KB; 13 FastAPI boundary stay as-is). **Phase 6 added 2026-06-18** to fix the 28 silent-swallow sites that Phase 3's `logging.debug` migration didn't actually migrate (audit gate: `--strict` exits 0) | Third: high coordination with Hook API + MMA + RAG; gates the GUI migration |
| 4 | `result_migration_gui_2` | XL | **55 sites** in `src/gui_2.py` (260KB; 14 ? includes the +1 site `src/gui_2.py:1349` from the review pass) | Fourth: depends on 3 for clean API; the largest file |
| 5 | `result_migration_baseline_cleanup` | L | 112 sites in 3 refactored files (mcp_client.py, ai_client.py, rag_engine.py) | Fifth: closes the gaps in the convention reference; parent's Path C deferred work |
*Total: 5 sub-tracks, 268 sites across 42 files, ~2100 lines changed.*
*NO day estimates (per the new Tier 1 rule added 2026-06-16). Effort is measured by scope (N files, M sites) only. The user / Tier 2 agent decides the actual pacing.*
*Sequence: 1 (review) -> 2 (small files) -> 3 (app_controller) -> 4 (gui_2) -> 5 (baseline cleanup). Tracks 2 + 5 can run in parallel; tracks 3 + 4 must be sequential (the GUI calls controller methods); track 1 is independent.*
*`blocks: data_structure_strengthening_20260606` (parallel track; uses the cleaner Result API from this phase) and the user's stated `send_result` → `send` mass rename.*
*Out of scope (deferred to separate tracks): the `send_result` → `send` mass rename (user's stated manual refactor; post-this-phase), 23 lower-impact weak-type files (`data_structure_strengthening_20260606`), `live_gui_mock_injection_20260615` infrastructure (separate track), RAG test quality cleanup (poll loops; separate track), and **any audit script changes that belong in the review pass (sub-track 1)** — those are detailed in `conductor/tracks/result_migration_20260616/plan.md`.*
---
#### Track: Test Sandbox Hardening (hard sandbox for tests; root-cause fix for test data loss) `[track-created: 2026-06-19]`
*Link: [./tracks/test_sandbox_hardening_20260619/](./tracks/test_sandbox_hardening_20260619/), Spec: [./tracks/test_sandbox_hardening_20260619/spec.md](./tracks/test_sandbox_hardening_20260619/spec.md), Plan: [./tracks/test_sandbox_hardening_20260619/plan.md](./tracks/test_sandbox_hardening_20260619/plan.md), Metadata: [./tracks/test_sandbox_hardening_20260619/metadata.json](./tracks/test_sandbox_hardening_20260619/metadata.json)*
*Status: 2026-06-19 - SPEC + PLAN committed. Ready for Tier 2 implementation. 9 phases, 30 tasks, ~11 atomic commits.*
*Goal: Make any `pytest` or `run_tests_batched.py` invocation provably incapable of writing files outside `./tests/`. Default-on Python guard + opt-in OS-level wrapper. Root-cause fix: eliminate the silent `SLOP_CONFIG` env-var fallback that lets tests accidentally touch the user's real `manual_slop.toml` and related top-level files.*
*The 5 enforcement layers:*
1. **FR2 root-cause fix**: `src/paths.py:get_config_path()` no longer falls back to `<project_root>/config.toml` via `SLOP_CONFIG`. New API: `paths.set_config_override(path)`. CLI flag `--config <path>` at the entry point (sloppy.py for production, conftest.py for tests).
2. **FR1 Python guard**: `sys.addaudithook` autouse fixture blocks writes outside `./tests/` with `RuntimeError("TEST_SANDBOX_VIOLATION: ...")`. Hard fail; reads unaffected.
3. **FR3 isolation migration**: `isolate_workspace` moved off `tmp_path_factory.mktemp` to `tests/artifacts/_isolation_workspace_<RUN_ID>/`. pyproject.toml adds `addopts = "--basetemp=tests/artifacts/_pytest_tmp"`. All test infra paths now under `./tests/`.
4. **FR4 static audit**: `scripts/audit_test_sandbox_violations.py` flags hardcoded paths to top-level TOMLs + `tempfile.mkdtemp/mkstemp` without `dir=`. CI gate (`--strict` exits 1).
5. **FR5 OS-level wrapper**: `scripts/run_tests_sandboxed.ps1` (Windows restricted-token + Job Object; OPT-IN).
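The FR1 Python guard in the list above can be sketched as an audit-hook check. This is a minimal illustration under stated assumptions, not the fixture in `tests/conftest.py`: `check_open_event` and `install_guard` are hypothetical names, only the `open` audit event is inspected, and `os.open`-style calls that pass no mode string are ignored.

```python
import sys
from pathlib import Path

# Mode characters that imply a write: write, append, exclusive-create, update.
WRITE_MODE_CHARS = set("wax+")


def check_open_event(event: str, args: tuple, allowed_root: Path) -> None:
    """Raise if an `open` audit event would write outside `allowed_root`."""
    if event != "open":
        return
    path, mode = args[0], args[1]
    if not isinstance(path, (str, Path)) or not isinstance(mode, str):
        return  # file descriptors and mode-less opens are out of scope here
    if not WRITE_MODE_CHARS & set(mode):
        return  # read-only opens are unaffected
    resolved = Path(path).resolve()
    if not resolved.is_relative_to(allowed_root.resolve()):
        raise RuntimeError(f"TEST_SANDBOX_VIOLATION: write to {resolved}")


def install_guard(allowed_root: Path) -> None:
    """Install the check process-wide. Audit hooks cannot be removed once added."""
    sys.addaudithook(lambda event, args: check_open_event(event, args, allowed_root))
```

Keeping the check in a plain function separate from `install_guard` lets it be unit-tested without permanently installing a hook in the test process.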
*User directives (locked 2026-06-19):*
- NO ENV VARS for config path. `--config` CLI flag is the only override mechanism.
- Test workspace file naming: `config_overrides.toml` (per user direction).
- Hard fail on any sandbox violation (no warnings, no soft fails).
- Tests should never need AppData temp.
- Out of scope (deferred to follow-up tracks): converting the other 7 `SLOP_*` env vars (`SLOP_GLOBAL_PRESETS`, `SLOP_GLOBAL_TOOL_PRESETS`, `SLOP_GLOBAL_PERSONAS`, `SLOP_GLOBAL_WORKSPACE_PROFILES`, `SLOP_CREDENTIALS`, `SLOP_MCP_ENV`, `SLOP_LOGS_DIR`, `SLOP_SCRIPTS_DIR`); the user considers this the "mess" to address separately.
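The "no env vars, `--config` only" directive above could look roughly like this at an entry point. The function names `set_config_override` and `get_config_path` come from the FR2 description; everything else is an assumption, including what happens when no override is set (this sketch raises, but the track entry does not specify the real behavior) and the helper `parse_entry_point_args`.

```python
import argparse
from pathlib import Path
from typing import Optional

# Module-level override; deliberately never populated from os.environ.
_config_override: Optional[Path] = None


def set_config_override(path: Path) -> None:
    """Record the config path chosen explicitly by the entry point."""
    global _config_override
    _config_override = Path(path)


def get_config_path() -> Path:
    # Assumption: fail loudly rather than fall back to a project-root default.
    if _config_override is None:
        raise RuntimeError("config path not set; pass --config <path>")
    return _config_override


def parse_entry_point_args(argv: list[str]) -> argparse.Namespace:
    """Hypothetical entry-point parser exposing only the --config flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=Path, default=None)
    return parser.parse_args(argv)


args = parse_entry_point_args(["--config", "tests/artifacts/config_overrides.toml"])
if args.config is not None:
    set_config_override(args.config)
```

Because the override is set only by an explicit call, a test run that never passes `--config` cannot silently resolve to the user's real configuration file.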
*Baseline (per `result_migration_small_files_20260617` shipped 2026-06-18): 1288 passed + 4 xdist-skipped. VC8 requires no regression vs. this baseline.*
*Root causes of data loss (per Phase 1 audit):*
1. `src/paths.py:get_config_path()` at line 42 silently falls back to `<project_root>/config.toml` when `SLOP_CONFIG` is unset (the default for tests). This is the silent default that bites.
2. `tests/conftest.py:isolate_workspace` at line 265 uses `tmp_path_factory.mktemp` which lives in `%TEMP%\pytest-of-<user>\` on Windows, outside `./tests/`.
3. The Layer 1 Python guard is the runtime safety net; FR2 + FR3 are the proper fixes.
*Deferred follow-up tracks (per metadata.json `deferred_to_followup_tracks`):*
- Convert the other 7 `SLOP_*` env vars to CLI flags (same pattern: `paths.set_<thing>_override()` + entry-point flag).
- macOS/Linux OS-level sandbox wrapper (`run_tests_sandboxed.sh` using `bwrap`/`unshare`).
- Per-fixture sandbox strictness tuning (`@pytest.fixture(sandbox_strict=True)`).
- Read-side isolation (block reads of real config from tests).
## Phase 9: Chore Tracks
*Completed chore tracks are in [`chronology.md`](./chronology.md).*
---
## Active Research Tracks (2026-06+)
Tracks that produce a research deliverable (a markdown report) rather than Application code. These are non-impl by design.
*Shipped research tracks are in [`chronology.md`](./chronology.md); active tracks are listed in the [Active Tracks (Current Queue)](#active-tracks-current-queue) table at the top of this file.*
### Track: Video Analysis Campaign (2026-06-21)
**Pass 1 of 3** in a long-running research campaign to penetrate the AI field. The user framed the broader effort:
- **Pass 1 (THIS track):** Information extraction + distillation. 12 curated YouTube videos → transcripts, keyframes, OCR, deep-dive reports.
- **Pass 2 (FUTURE, user-led):** De-obfuscation via user's custom math encoding notation (USER must rediscover the encoding before starting; related: `intent_dsl_survey_20260612`).
- **Pass 3 (FUTURE, user-led):** Projection to user's applied domain (handmade/data-oriented/GPGPU — Timothy Lottes, Onat Türkçüoğlu, Jebrim — + user's own caveats).
**Scope (14 folders):**
- **Umbrella:** [`tracks/video_analysis_campaign_20260621/`](./tracks/video_analysis_campaign_20260621/) - spec ✓, plan ✓, metadata ✓, state ✓, README ✓
- **12 child tracks:** [`video_analysis_<slug>_20260621/`](./tracks/) - one per video, lightweight spec.md scaffolded; full `plan.md` + `metadata.json` + `state.toml` added during execution by Tier 2
- **1 synthesis track:** [`tracks/video_analysis_synthesis_20260621/`](./tracks/video_analysis_synthesis_20260621/) - blocked_by all 12 children; produces `per_video_summary.md` + cross-cutting `report.md`
**12 videos (5 clusters, execution order):**
- **E (Stanford >1hr):** CS229 - Building LLMs; CS336 - Language Modeling from Scratch, Spring 2026, Lecture 3: Architectures
- **A (math/info-theoretic foundations):** Probability Theory is an Extension of Logic; From Entropy to Epiplexity (Wilson & Finzi); Learning Dynamics from Statistics (Giorgini)
- **B (Platonic/geometric AI):** Towards a Platonic Intelligence (Kumar); Free Lunches (Levin)
- **C (biological/cognitive/generic):** Interesting Behavior by Generic Systems (Fields); Most Counterintuitive Way to Build a Brain; Cognition Emerges from Neural Dynamics (Miller); A Multiscale Logic of Collective Intelligence (Hoffman & Prakash)
- **D (applied):** Creikey - DL/CV for Game Developers (BSC 2025)
**Per-child deliverables:** `artifacts/transcript.json` (timestamped segments, lossless JSON) + `artifacts/frames/*.jpg` (50-500 deduplicated) + `artifacts/ocr.md` (full per-frame OCR) + `report.md` (**1000-10000 LOC markdown per user directive**) + `summary.md` (200-400 words).
**Reusable tooling (5 scripts, TDD in `scripts/video_analysis/`):** `download_video.py` (yt-dlp subprocess), `extract_transcript.py` (youtube-transcript-api), `extract_keyframes.py` (ffmpeg scene detect + cv2 + imagehash), `ocr_frames.py` (winsdk or tesseract), `synthesize_report.py` (orchestrator).
**Phase 0 tooling prerequisites (BLOCKERS, verified 2026-06-21):** `yt-dlp`, `opencv-python`, `imagehash`, `pillow` are NOT installed in this repo's venv. OCR backend decision pending (winsdk preferred, tesseract fallback).
**Risk register highlights:** R5 (2 E-cluster videos failed oEmbed 401; yt-dlp may still work), R7 (Pass 1 over-summarization loses signal for Pass 2), R8 (Tier 2 capacity for 12+ child tracks).
**See also:** [umbrella spec](./tracks/video_analysis_campaign_20260621/spec.md) for full design; [umbrella metadata](./tracks/video_analysis_campaign_20260621/metadata.json) for scope + verification criteria.
---
## Notes
**Archive link convention:** `./archive/...` paths in this file resolve to `conductor/archive/...` (this file is at `conductor/tracks.md`). The 71 archive links in this file are all valid as of 2026-06-08.
**Status legend:**
- `[ ]` not started
- `[~]` in progress
- `[x]` completed (track may still be in `tracks/` or may have been moved to `archive/`)
- `~~**...**~~` struck-through (renamed/replaced/superseded)
**Naming convention:** Each track's `spec.md` and `plan.md` (where present) follow the project's standard format: `spec.md` for design intent (the "why"), `plan.md` for executable tasks (the "how"). See `conductor/tracks/data_oriented_error_handling_20260606/` for the canonical example.
**Editing this file:** When you mark a track as `[x]` and move its folder to `archive/`, also move it to the appropriate Archived sub-section. When you start a new track, create the folder under `tracks/` first, then add the entry to the Active Tracks table at the top. The git-blame sort order (`0a`, `0b`, `0c`...) is no longer used; this file is now organized by phase + dependency.
**Archiving a track (3 steps):** When a track ships and its folder moves from `conductor/tracks/<id>/` to `conductor/archive/<id>/`, complete all 3 steps in order:
1. Move the folder: `git mv conductor/tracks/<id> conductor/archive/<id>` (preserves history as a rename).
2. Remove the `[x]` entry from this file (`conductor/tracks.md`). Update any related status badges (e.g., dependency links in the Active Tracks table or other sections).
3. Add a row to [`conductor/chronology.md`](./chronology.md) with the init SHA (first commit on the track's folder), the end SHA (the archive-move commit), the date, the track ID, the status, and a one-sentence summary. Chronology.md is the canonical index of all tracks (active, shipped, superseded, abandoned); this file is the active task list.
The 3-step convention is documented here because this is where the existing "Editing this file" section already lives. The spec/plan referenced `conductor/workflow.md` "Notes > Editing this file" but that section doesn't exist; the actual location is `conductor/tracks.md`.
---
## Archived (Closed 2026-06-23)
### 3-Pass Video Analysis Research Campaign (CLOSED 2026-06-23)
**Status:** CLOSED. The 22 video_analysis tracks are archived at `conductor/archive/analysis/`. The campaign closeout report is at `docs/reports/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`.
**The 22 archived tracks:**
| # | Track | Date | Status |
|---|---|---|---|
| 1 | `video_analysis_campaign_20260621` (Pass 1 umbrella) | 2026-06-21 | SHIPPED |
| 2 | `video_analysis_cs229_building_llms_20260621` | 2026-06-22 | SHIPPED |
| 3 | `video_analysis_probability_logic_20260621` | 2026-06-22 | SHIPPED |
| 4 | `video_analysis_entropy_epiplexity_20260621` | 2026-06-22 | SHIPPED |
| 5 | `video_analysis_score_dynamics_giorgini_20260621` | 2026-06-22 | SHIPPED |
| 6 | `video_analysis_platonic_intelligence_kumar_20260621` | 2026-06-22 | SHIPPED |
| 7 | `video_analysis_free_lunches_levin_20260621` | 2026-06-22 | SHIPPED |
| 8 | `video_analysis_generic_systems_fields_20260621` | 2026-06-22 | SHIPPED |
| 9 | `video_analysis_brain_counterintuitive_20260621` | 2026-06-22 | SHIPPED |
| 10 | `video_analysis_neural_dynamics_miller_20260621` | 2026-06-22 | SHIPPED |
| 11 | `video_analysis_multiscale_hoffman_20260621` | 2026-06-22 | SHIPPED |
| 12 | `video_analysis_cs336_architectures_20260621` | 2026-06-22 | SHIPPED |
| 13 | `video_analysis_creikey_dl_cv_20260621` | 2026-06-22 | SHIPPED |
| 14 | `video_analysis_synthesis_20260621` | 2026-06-22 | SHIPPED |
| 15 | `video_analysis_deob_20260621` (Pass 2 umbrella) | 2026-06-23 | SHIPPED |
| 16 | `video_analysis_deob_warmup_20260621` | 2026-06-23 | SHIPPED |
| 17 | `video_analysis_deob_lexicon_20260621` | 2026-06-23 | SHIPPED |
| 18 | `video_analysis_deob_pilot_20260621` | 2026-06-23 | SHIPPED |
| 19 | `video_analysis_deob_apply_20260621` | 2026-06-23 | SHIPPED |
| 20 | `video_analysis_deob_lexicon_v2_20260623` | 2026-06-23 | SHIPPED |
| 21 | `video_analysis_deob_c11_reference_20260623` | 2026-06-23 | SHIPPED |
| 22 | `video_analysis_deob_pass3_20260623` | 2026-06-23 | SHIPPED |
**Campaign summary:**
- **Pass 1:** 12 deep-dive reports + 1 synthesis (~14,000 LOC)
- **Pass 2:** 33 markdown deliverables (~14,413 LOC)
- **v2 patch:** 8 corrections + 3 refinements + 4 template notations + 2 `<<` / `>>` placements
- **C11 reference:** 4 cluster sub-reports + 1 main reference (~1,300 LOC)
- **Pass 3:** 44 per-video deliverables (C11 .c or Python .py) + 2 global reports
- **Total:** ~35,704 LOC of new content across ~75 atomic commits
**Final report:** [`docs/reports/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md`](../docs/reports/CAMPAIGN_CLOSE_OUT_video_analysis_20260621.md)