Design for removing 30 confirmed-unused one-off scripts from
scripts/. Net effect: scripts/ shrinks from 56 -> 26 files
(54% reduction). All deletions are hard deletes via 5 atomic
per-category commits; git log is the restore path.
26 KEEPS documented by category (CI gates, MMA, MCP, test runner,
ImGui linter, audit/scaffolding, tool-call bridge, Docker, borderline
utility). 30 DELETES grouped by category: one-shot indent fixers
(10), one-shot transform scripts (6), superseded entropy audits (4),
one-shot migrators/repros (6), tool-call aliases and legacy tool
discovery (4).
No new CI gate added. Follow-up unused_scripts_audit_20260607
recorded in the spec. Plan (writing-plans) will produce 5 phases
(one per category).
Phase 9 was shipped at 12cec6ae and the 9-phase core plan is done, but the [COMPLETE 2026-06-07] tag was applied prematurely. Sub-track 2 (audit violations) remains partial at ae3b433e with 61 violations remaining: pydantic in models.py (1), tree_sitter in file_cache.py (4), api_hooks.py (4), sloppy.py (5), app_controller.py (23), gui_2.py (24). Reopening the track to finish sub-track 2 in 6 per-file sub-tracks (2A-2F).
Captures the 5 patterns that burned the most time in the
startup_speedup_20260606 sub-track 4 work:
1. ALWAYS use manual-slop_edit_file, not custom scripts
(custom scripts fail silently on indent/EOL/whitespace drift)
2. The decorator-orphan pitfall
(inserting before 'def foo' leaves @property decorating YOUR new method)
3. ast.parse() is not enough
(semantic errors aren't caught; import + instantiate + call after every edit)
4. The git restore trap
(don't run git status/restore while a user is mid-conversation)
5. Small verified edits beat big scripts
(edit_workflow says 3-10 lines; if you write 200 lines of script, wrong tool)
Also adds 2 new anti-patterns to the Critical list in AGENTS.md and
3 new sections to conductor/edit_workflow.md (decorator-orphan,
ast.parse-not-enough, set_file_slice-is-literal).
All additive; no breaking changes to existing content. Derived from gaps
observed during the 2026-06-06 planning session (5 tracks spec'd +
planned end-to-end).
**AGENTS.md (1 new section, 16 lines):**
- Compaction Recovery - explicit recovery path for a new agent
picking up mid-track (read the digest, check state.toml, run audits,
resume from next unchecked task). Cross-references the
workflow-level 'Compaction Recovery' section.
**conductor/workflow.md (6 new sections, 145 lines):**
- Planning Session Workflow - documents the brainstorming -> spec ->
plan flow used 5x this session; mandates spec approval before plan;
notes the plan is the only artifact the implementer reads.
- Track Dependencies and Execution Order - verify the blocked_by
chain in metadata.json before starting; topological sort gives the
recommended execution order (recorded in PLANNING_DIGEST).
- State.toml Template - canonical structure (meta / blocked_by /
blocks / phases / tasks / verification / track-specific) so future
tracks have a consistent shape.
- Per-Task Decision Protocol - small decisions (cosmetic) decide
yourself; large decisions (architectural) STOP and report; regressions
STOP and report. The boundary is 'does this require a new spec or
plan update?'.
- Documentation Refresh Protocol - after a track ships, identify
affected guides (grep for renamed/moved symbols), update them, add
new guides for new modules, add styleguides for new conventions.
The 'post-tracks documentation' pattern is repeatable; tracks that
only update code are incomplete.
- Audit Script Policy - whenever a track introduces a new convention
that can be statically checked, add an audit script in scripts/
with --help / --json / strict modes. The audit + CI gate pair is
the convention-enforcement mechanism; 3 existing audits
(audit_main_thread_imports, audit_weak_types, check_test_toml_paths)
are the precedent.
All sections reference existing project files (brainstorming skill,
writing-plans skill, audit scripts, tracks.md, the existing 5 new
tracks' spec.md files, PLANNING_DIGEST_20260606.md).
No code changes. Documentation only. ~160 lines total added.
~25 tasks across 7 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.5): Foundation. 3-layer security module (8 unit tests
returning Result[Path]); SubMCP Protocol + MCPController class (6 unit
tests). Controller added ALONGSIDE the existing 45 functions in
mcp_client.py (no removal yet).
- Phase 2 (2.1-2.4): Backward compat. git mv mcp_client.py to
mcp_client_legacy.py; create new mcp_client.py as a slim shim
re-exporting 45+ old symbols. 12 legacy shim tests verify the surface.
The 4 existing test files + src/app_controller.py:61 still work.
- Phase 3 (3.1-3.4): FileIOMCP extracted (9 tools, 10 unit tests).
- Phase 4 (4.1-4.4): PythonMCP extracted (14 tools, 14 unit tests).
- Phase 5 (5.1-5.5): CMCP, CppMCP, WebMCP, AnalysisMCP extracted
(4 sub-MCPs, 18 unit tests; pattern mirrors Phase 3/4).
- Phase 6 (6.1-6.3): ExternalMCP extracted from mcp_client_legacy.
Class name preserved (ExternalMCPManager).
- Phase 7 (7.1-7.5): Update dispatch() in the legacy shim to use the
new controller (inverted-dict O(1) lookup); update docs; manual
smoke test; archive the track.
Each sub-MCP follows the same template (class with name / description
/ tools / invoke; security check for path-taking tools; Result wrapping
in invoke(); delegation to legacy functions for the actual implementation).
The sub-MCPs are thin adapters in v1; a future track can move the
implementations into the sub-MCP files directly.
Self-review at the end maps every spec section to a task (no gaps),
confirms zero placeholders, and verifies type/method-name consistency
across phases (SubMCP Protocol, MCPController class, Result[str,
ErrorInfo], _resolve_and_check all defined in Phase 1; used
consistently across Phases 3-6).
Track + metadata + state + tracks.md registration for the 2,205-line
mcp_client.py split into a slim controller + 6 native sub-MCPs + 1
external sub-MCP.
Key design decisions (per user feedback):
- Naming convention: mcp_<type>.py for native MCPs (mcp_file_io.py,
mcp_python.py, mcp_c.py, mcp_cpp.py, mcp_web.py, mcp_analysis.py).
- ExternalMCPManager class name preserved (moves to mcp_external.py).
- Sub-MCP shape: class with name / description / tools / invoke().
- MCPController: holds ALL_SUB_MCPS list, inverted-dict tool lookup,
3-layer security (extracted to mcp_client_security.py), schema
aggregation.
- Each invoke() returns Result[str, ErrorInfo] (from
data_oriented_error_handling_20260606).
- Backward compat: mcp_client_legacy.py re-exports all 45+ old
symbols; the 4 existing test files + src/app_controller.py:61
direct call continue to work.
DSL future (per user notes on APL/K/Cosy): NOT in this track.
Documented in spec §12.1 as the mcp_dsl_20260606 follow-up.
Sub-MCP architecture is the natural unit to pair with a DSL emitter.
7 phases. ~22 task slots. New tests: 9 (one per sub-MCP + controller +
security + legacy). Modified tests: 4 (existing mcp_* tests must
pass unchanged).
Blocked by: data_oriented_error_handling_20260606, data_structure_strengthening_20260606.
Blocks: mcp_dsl_20260606 (future DSL track).
~22 tasks across 2 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.12): Foundation. type_aliases.py (10 TypeAliases + 1
NamedTuple) with 8 unit tests. Mechanical replacement of 345 weak
sites in 6 files (ai_client 139, app_controller 86, models 51,
api_hook_client 32, project_manager 20, aggregate 17). Each file
has a per-substitution table for the mechanical replacement. Audit
script gains --strict mode + baseline file (CI gate). 4 audit tests.
- Phase 2 (2.1-2.10): FileItemsDiff NamedTuple integrated.
generate_type_registry.py (AST-based; 3 modes: default, --check,
--diff). Initial registry generated in docs/type_registry/ (8+ .md
files). 6 generator tests. Type aliases styleguide + product-guidelines
updates. Manual smoke test. Track archived.
The type registry generator uses --check mode for CI: it regenerates to
a temp dir and diffs against the committed registry; exit 1 if drift.
The agent's track-completion workflow is: regenerate -> review diff ->
commit. CI enforces --check on every PR.
Self-review at the end maps every spec section to a task (no gaps),
confirms zero placeholders, and verifies type/method-name consistency
across phases (all 10 aliases + FileItemsDiff defined in Task 1.2; used
consistently in Tasks 1.3-1.8 and Phase 2).
Per user feedback (2026-06-06): instead of a follow-up 'TypedDict
Migration' track, add a NEW deliverable: an auto-generated type registry
in docs/type_registry/ that captures the field information in docs form.
New files:
- scripts/generate_type_registry.py (NEW): AST-based tool that reads
src/ and writes per-source-file .md files with the fields of every
@dataclass, NamedTuple, TypeAlias, TypedDict. Has --check (CI mode,
exits 1 if registry would change) and --diff (dry run) modes.
- docs/type_registry/ (NEW, generated): index.md + per-source-file
references (type_aliases.md, ai_client.md, models.md, etc.).
- tests/test_generate_type_registry.py (NEW): verify the generator.
Architecture updates:
- Section 3.6 (NEW): Type Registry architecture with example output.
- Section 3.7 (NEW): Why per-source-file docs (locality of reference).
- Section 1.1 (NEW): 'Why docs over TypedDict' analysis (3 reasons:
lower upfront cost, better fit for AI workflow, auto-maintained).
- Goals table: registry added as a C (innovation) goal.
- Module layout: docs/type_registry/ and scripts/generate_type_registry.py
added to the new files list.
- Migration: Phase 2 now includes the registry generator + initial docs.
- Out of scope: TypedDict migration REMOVED; 'auto-typing the field
shape' added with the docs as the chosen approach.
- See Also: TypedDict follow-up REPLACED with 'Registry Maintenance &
CI Integration' (smaller scope, just wires the generator into CI).
The 'cost we eat' is the LLM reading 200-500 lines of markdown per
query. This is bounded and proportional to actual information need.
The upfront cost of designing TypedDict schemas for every type is
unbounded. Tradeoffs favor the docs approach for v1; TypedDict can
come later as a future track if desired.
Track + metadata + state + tracks.md registration for the type-aliases
refactor that follows the audit_weak_types.py findings (430 weak sites
across 29 of 61 files; 86% concentrated in 6 high-traffic files).
Key design decisions (per user approval):
- 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry,
CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition,
ToolCall, CommsLogCallback).
- 1 NamedTuple (FileItemsDiff) for the _reread_file_items return.
- Mechanical replacement of 345 weak sites across 6 files (NOT 430; the
remaining 85 are in 23 lower-impact files deferred to future tracks).
- scripts/audit_weak_types.py gains a --strict mode and a baseline file
(scripts/audit_weak_types.baseline.json) so the count is enforced.
- 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples
+ docs + archive.
- Honest about what's missing: TypedDict / @dataclass migration is a
follow-up track (typed_dict_migration_20260606), not this one.
- Coexistence with the data_oriented_error_handling_20260606 track's
Result[T] / ErrorInfo: the aliases are value-level (data types), Result
is control-level (wrapper). They compose (Result[FileItems] is valid).
No conflict.
Audit baseline:
- Pre-track: 430 weak sites, 0 strong patterns
- Target after Phase 1: ~60 weak sites (only the 23 lower-impact files)
- Top 4 unique type strings account for 86% of findings (4-6 aliases
eliminate the bulk of the noise).
Not blocked by anything; can be executed independently of the other
pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
Track + metadata + state + tracks.md registration for the Fleury-pattern
error handling refactor.
Key design decisions (per user approval):
- Option A for _send_<vendor>() handling: rename to _send_<vendor>_result()
and change return type to Result[str] (contained to internal callers).
- send() is marked @typing_extensions.deprecated; send_result() is the new
public API.
- ProviderError exception is FULLY REPLACED by ErrorInfo dataclass
(a value, not an exception).
- 5 phases: foundation, mcp_client, ai_client, rag_engine, deprecation+archive.
- Post-tracks baseline check (Phase 1 Task 1.1) verifies the 3 pending
tracks have merged before proceeding.
- 9 Open Questions, 7 Risks, 5 verification criteria, follow-up track
public_api_migration_20260606 planned in spec §12.1.
Blocked by: startup_speedup_20260606, test_batching_refactor_20260606,
qwen_llama_grok_integration_20260606. Blocks: public_api_migration_20260606.
This track executes after startup_speedup, test_batching_refactor, and
qwen_llama_grok_integration land. Section 10 documents the expected
post-tracks codebase state and answers 6 critical coordination questions:
- Q1: Existing _send_<vendor>() functions (returning str) are renamed
to _send_<vendor>_result() and changed to return Result[str] (Option A:
clean rename, contained to internal callers).
- Q2: send_openai_compatible in src/openai_compatible.py STAYS as-is
(it raises at the SDK boundary; correct per Fleury). The new
_send_<vendor>_result() functions catch and convert to ErrorInfo.
- Q3: Deprecation warning on send() will produce Python warnings in
tests; filterwarnings in conftest.py silences them during transition.
- Q4: The except ProviderError clauses in src/ai_client.py become
dead code after the refactor and are removed in Phase 3.
- Q5: ProviderError is FULLY REPLACED by ErrorInfo (a value, not an
exception). ProviderError removed entirely; ErrorInfo is the new
error type.
- Q6: ProviderError.ui_message() moves to ErrorInfo.ui_message().
Phase 1 also adds a baseline verification task to confirm the 3 pending
tracks have merged before proceeding.
Also renumbered Out of Scope (11) and See Also (12) sections to
preserve monotonic section numbers.
The capability matrix v1 has no 'audio' field (audio_input is deferred to v2).
Qwen-Audio's vision flag was incorrectly marked true. Changed to false and
clarified that v1 uses Qwen-Audio as text-only; audio attachment UI is
hidden via the absent audio capability check.
Phase 2 of startup_speedup_20260606 is done.
Tasks:
T2.1 (Red) tests/test_io_pool.py 1354679e 4 tests
T2.2 (Green) src/io_pool.py 1354679e make_io_pool() factory
T2.3 (Red) tests/test_warmup.py 1354679e 10 tests
T2.4 (Green) src/warmup.py 1354679e WarmupManager
T2.5 (Wire) AppController integration 922c5ad9 io_pool + warmup in __init__ + 5 public delegation methods
T2.6 (Plan) this commit
What now exists:
- make_io_pool() returns a 4-worker ThreadPoolExecutor named 'controller-io-N'
- WarmupManager class with submit/status/is_done/wait/on_complete/reset
- AppController creates self._io_pool + self._warmup early in __init__
- Warmup is submitted immediately (jobs run concurrent with the rest of init)
- Public API: controller.warmup_status(), controller.is_warmup_done(),
controller.wait_for_warmup(timeout), controller.on_warmup_complete(cb)
- controller._compute_warmup_list() returns 9 always + 2 conditional (fastapi)
- shutdown() now also shuts down the io_pool
Currently the warmup is a no-op for modules already imported at the top
of app_controller.py (fastapi, requests). Phase 3 will remove those
top-level imports; the warmup infrastructure will then start doing
real work.
18/18 tests passing (4 io_pool + 10 warmup + 4 test_app_controller_*).
Next: Phase 3 (remove top-level SDK imports from src/ai_client.py).
Expected to fix ~3 audit violations (google.genai, anthropic, openai).
16 tasks across 4 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.16): Library + dry-run. 20 unit tests across categorizer,
batcher, plugin. New run_tests_batched.py has --plan/--audit only.
- Phase 2 (2.1-2.3): Shadow run via CI. Compare new vs old plan output.
- Phase 3 (3.1-3.4): Switch default. Full CLI with --tiers, --durations.
Old script becomes .legacy. Update docs/guide_testing.md.
- Phase 4 (4.1-4.6): Populate registry, gitignore durations, delete
legacy, archive track.
1-space indentation per project style guide. No placeholders. All
test code is concrete.
Default --audit exits non-zero on hard errors only. --strict adds the
'multiple subsystems = probably cross-cutting' heuristic from Section 9
as a CI gate. Two modes, one flag.
Three-tier batching refactor: replace alphabetical 4-at-a-time batching with
fixture-class-isolated tiers (0 opt-in, 1 unit/xdist, 2 mock_app, 3 live_gui
in one session, H headless, P performance).
Hybrid classification: auto-infer from filename + AST fixture scan; hand-curated
tests/test_categories.toml overrides for cross-cutting and ambiguous files.
Opt-in per-test order control via [[files.X.test_order]] sub-tables, gated on
a conftest-loaded pytest plugin (no-op without entries).
Priority order: B (process isolation) > A (subsystem diagnostic) > C (speed).
Architectural shift driven by user clarification: lazy-loading on first
use causes user-perceptible lag when the user-triggered action (e.g.
provider switch) propagates to a controller method that triggers the
first import. The fix is to pre-import heavy modules on a bg thread
at startup and have functions access them via _require_warmed().
Old design (rejected):
- from google import genai inside _send_gemini (lazy on first call)
- First user action that triggers this pays the cost; UI feels laggy
New design (this commit):
- Top-level heavy imports REMOVED from main-thread-reachable files
- AppController.__init__ submits warmup jobs to _io_pool (4 threads,
named 'controller-io-N')
- Each warmup worker imports its module and updates a thread-safe
warmup_status dict
- Functions access modules via _require_warmed(name), which assumes
the module is in sys.modules (warmed at startup)
- When all jobs complete, _warmup_done_event is set and registered
on_warmup_complete callbacks fire
- GUI shows status indicator + toast when warmup completes
- Hook API exposes /api/warmup_status and /api/warmup_wait
- Tests can call controller.wait_for_warmup() before exercising
warmup-dependent functionality
Phase 2 now bundles job pool + warmup (T2.3+T2.4 add warmup tests +
implementation). Phases 3-5 do 'remove top-level imports' instead of
'lazy-load'. Phase 7 is the notification surface (Hook API + GUI).
Definition of Done includes warmup-completion criteria, the
'no function-body imports' check, and an end-to-end 'provider switch
is INSTANT' smoke test.
No code changes; this is a planning update only.
Fulfills the existing backlog entry at conductor/tracks.md:152
(2026-06-05 root-cause analysis of live_gui wait_for_server timeouts).
Main Thread Purity Invariant: the main thread (entering immapp.run())
must never import a module heavier than imgui_bundle and the lean
gui_2 skeleton. Enforced by:
- static gate: scripts/audit_main_thread_imports.py (CI)
- runtime hook: tests/test_main_thread_purity.py (sys.addaudithook)
Threading constraint: no new threading.Thread(...) calls in src/.
All background work goes through AppController._io_pool
(ThreadPoolExecutor, max_workers=4, thread_name_prefix='controller-io').
9 phases, 57 tasks: audit+baseline, job pool, lazy-load SDKs, lazy-load
FastAPI, lazy-load feature-gated GUI, migrate ad-hoc threads, runtime
enforcement, hook API + diagnostics, verify+checkpoint.
Expected savings: ~2000-2400ms off main-thread import cost.
Target: import src.ai_client < 50ms (from ~1800ms), live_gui fixtures
no longer time out at wait_for_server(timeout=15).