manual_slop

Author	SHA1	Message	Date
conductor-tier2	9cc51ca9af	conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways Reference/analysis track. Produces 0 code changes. Artifacts (conductor/tracks/nagent_review_20260608/): - spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing - report.md (571 lines) - 14-section deep-dive; primary deliverable - comparison_table.md (79 lines) - flat side-by-side reference - decisions.md (286 lines) - 10 future-track candidates with priority matrix - nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded in code (file:line refs into nagent source and Manual Slop source) - metadata.json (132 lines) - structured metadata + verification criteria - state.toml (113 lines) - per-task tracking + user-corrections log (7 entries) 14 nagent principles covered in report.md (durable work, text-in/text-out, editable state, visible protocol, the loop, per-file memory, repo history, neighborhoods, sub-conversations, controlled writes, large files, tool discovery, framework differences, build your own). 6 pitfalls (revised from 8 after user-corrections): 1. No structured output protocol in Application AI (opaque function calling) 2. Provider-specific history in process globals (ai_client._anthropic_history + _deepseek_history + _minimax_history) 3. RAG is not 'history as data' (fuzzy, not auditable) 4. AI client is a stateful singleton (2,685-line ai_client.py) 5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want) 6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py) User-corrections applied (3 rounds, 7 total corrections recorded): - Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7 per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix - Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed conversation log; complementary, not equivalent) - Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for 1:1 discussions' (user wants this) - RAG: opt-in, not gap; user wants pre-staging via sub-conversation - Personas: config bundling (can opt out via AI settings) - Tool discovery: deferred (user has 'intent based DSL' idea but 'no where near that ideation yet') 10 actionable takeaways (separate from the 6 pitfalls - those are diagnosis, these are prescription): 1. State visibility (UI inspector for in-process state) 2. Readable conversation log (text-greppable, not just JSON-L) 3. Sub-agents for 1:1 (HIGH priority - user-flagged) 4. File-identity over file-path (st_dev:st_ino rename-safe) 5. One loop shape visible in diagnostics 6. Visible retry on protocol failure 7. Meta-Tooling DSL (intent-based, deferred) 8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606) 9. Single source of truth for disc_entries + provider history 10. Sub-agent return type constraint (bake into candidate #1 spec) Domain classification: every recommendation tagged Application / Meta-Tooling / Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling domain; Manual Slop's Application AI is a different kind of thing. No code modified by this track (reference/analysis only). All 7 files parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve. Track is 'active' awaiting human review; future-track candidates live in decisions.md and nagent_takeaways_20260608.md.	2026-06-08 18:44:35 -04:00
ed	9afc93bce2	fix(app_controller): clear project-switch state in _handle_reset_session When a prior test in the tier-3-live_gui batch leaves a _do_project_switch background thread running, the next test's btn_project_new_automated click sees _project_switch_in_progress=True (from the prior thread) and queues the new path via _project_switch_pending_path. The queued switch is never actually submitted to the io_pool, so is_project_stale() stays True and AI ops (_handle_generate_send) bail with 'project switch in progress; AI ops disabled'. Fix: _handle_reset_session now also clears _project_switch_in_progress, _project_switch_pending_path, and _project_switch_error (under the existing _project_switch_lock). This way, even if the prior background thread is still running, the controller reports an idle state and the new switch can be submitted normally. Also: - src/api_hook_client.py: reverted wait_for_project_switch to require in_progress=False (was relaxed to return on queued path, which misled the caller into thinking the switch was done) - tests/test_handle_reset_session_clears_project.py: new test test_handle_reset_session_clears_project_switch_state asserts is_project_stale() returns False after reset - tests/test_api_hook_client_wait_for_project_switch.py: updated test_wait_for_project_switch_does_not_return_on_queued (in_progress + matching path should keep waiting, not return early) - tests/test_live_workflow.py: added pre-wait for any in-flight switch before doing btn_reset (so the test waits up to 60s for the prior switch to complete if needed) - conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with the deeper hang analysis and recommended fix Known follow-up: test_full_live_workflow still hangs in tier-3 batch even with this fix, because the new _do_project_switch itself is hung in the io_pool (likely saturation from prior sims' AI discussion turn workers). Deeper investigation required.	2026-06-08 15:19:30 -04:00
ed	5087ee988d	chore: move TODO_test_full_live_workflow.md to conductor/todos/ Following the conductor convention of organizing track-related artifacts under conductor/. The TODO tracks the test_full_live_workflow race condition fix and its follow-up items (Tasks 3, 7 still pending; known batch hang documented). Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.	2026-06-08 14:05:40 -04:00
ed	4548726a2b	conductor(tracks): restructure - chronological by phase + status groupings + active queue table	2026-06-08 12:26:56 -04:00
ed	c531cebe03	conductor(plan): review pass — fix cross-references, add NOT_READY + with_errors + Lottes/Valigo, split §3.4 into 8 sub-tasks	2026-06-08 09:38:27 -04:00
ed	64823493c0	conductor(closeout): ship test_batching_refactor_20260606 with CLOSEOUT.md and follow-up recommendation	2026-06-08 08:36:22 -04:00
ed	fb6b4bd3eb	conductor(tracks): mark test_batching_refactor_20260606 as completed	2026-06-08 01:18:20 -04:00
ed	50bd894f8d	conductor(archive): ship test_batching_refactor_20260606 to archive	2026-06-08 01:16:58 -04:00
ed	796eec0058	conductor(plan): mark Phases 2,3 complete in test_batching_refactor_20260606	2026-06-08 01:09:02 -04:00
ed	7610c9c1dc	conductor(plan): mark Phase 1 complete in test_batching_refactor_20260606	2026-06-08 00:53:59 -04:00
ed	2b56ab3c5c	conductor(track): initialize test_batching_post_refactor_polish_20260607 spec/plan/state	2026-06-08 00:27:32 -04:00
ed	7bcb5a8c07	refactor(config): Route all config I/O through AppController Eliminates 22 call sites that bypassed the AppController state owner and read/wrote config.toml directly. AppController is now the single source of truth for self.config; gui_2.py, commands.py, etc. go through controller.save_config() / controller.load_config(). Production changes: - src/models.py: rename load_config -> _load_config_from_disk, save_config -> _save_config_to_disk (private I/O primitives) - src/app_controller.py: add public load_config()/save_config() methods that own the state. Update 3 internal call sites and 3 ConductorEngine call sites to pass max_workers from self.config - src/multi_agent_conductor.py: ConductorEngine.__init__ now takes max_workers as a parameter (caller responsibility, not I/O primitive) - src/external_editor.py: get_default_launcher() takes config as a parameter; gui_2.py:1311,4776 pass app.config - src/gui_2.py: 17 sites of models.save_config(X.config) replaced with X.save_config() (delegates via __getattr__ to controller) - src/commands.py: save_all() uses app.save_config() Test changes (route through controller, not I/O primitive): - tests/conftest.py: mock_app and app_instance fixtures now patch AppController.load_config/save_config instead of models I/O primitives - 18 other test files: patches renamed from models._save_config_to_disk to AppController.save_config (and same for load_config) - tests/test_app_controller_mcp.py: use SLOP_CONFIG env var instead of patching removed CONFIG_PATH module constant - tests/test_parallel_execution.py: pass max_workers=2 explicitly to ConductorEngine (caller no longer reads config) - tests/test_gui_paths.py: add save_config=MagicMock() to MockApp; assert on controller method, not I/O primitive - tests/test_models_no_top_level_tomli_w.py: still calls private _save_config_to_disk directly (the only allowed exception; tests the lazy-load behavior of the primitive itself) New files: - scripts/audit_no_models_config_io.py: enforces the rule (--strict, --json modes; AST-based docstring detection to avoid false positives) - conductor/code_styleguides/config_state_owner.md: documents the rule Verification: - 67 targeted tests pass - scripts/audit_no_models_config_io.py --strict returns 0 This is the architectural cleanup that surfaced during the audit_architectural_cheats_20260607 review. Closes the smoking-gun CONFIG_PATH module constant (already done in `0c7ebf22`) AND the free-function models.load_config/save_config smell. [conductor(checkpoint): config-iO-refactor-20260607]	2026-06-07 19:54:17 -04:00
ed	c9c5535889	docs(workflow): add Skip-Marker Policy section Per 2026-06-07 user feedback during test_suite cleanup: "if the intent is to annotate a known failure, fine. But that known failure must be addressed with priority." New section between "Per-Task Decision Protocol" and "Documentation Refresh Protocol" makes the policy explicit: - Skip markers are DOCUMENTATION, not avoidance - They're useful for opt-in integration tests, unimplemented features, or feature-flag-gated code - They're NOT useful for pre-existing failures, "I don't understand this" issues, or racy tests the agent doesn't want to debug - When adding a marker, MUST document the underlying issue AND what the fix would be - When the fix is in-session reachable, FIX IT INSTEAD of skipping — limited context is not an excuse Includes a 4-question review checklist before adding a skip. References the existing AGENTS.md "Use skip markers as excuse to AVOID" rule so the two policies don't drift.	2026-06-07 16:57:54 -04:00
ed	0db5ec3eef	conductor(tracks): mark License CVE Audit track as complete Phase 4 verification complete: 4 atomic commits landed, 28 unit + integration tests passing, the audit script runs end-to-end against the post-cleanup repo, --strict mode + baseline file wired in as the CI gate. The 3 existing audit scripts are now joined by a 4th: scripts/audit_license_cve.py. Scope: third-party deps only. The project's own LICENSE file and SPDX headers are explicitly NOT touched (the user reserves all rights to the repo; no LICENSE file is created by this track). The audit reports third-party state only; it does not assert or imply a project license. Commits: `a8ae11d3` - chore(audit): add license_cve audit script + initial report `20fa3558` - chore(deps): tilde-pin all deps; delete requirements.txt `a7ab994f` - chore(audit): add --strict mode + baseline file (CI gate) (this) - conductor(tracks): mark track complete	2026-06-07 15:28:25 -04:00
ed	a8ae11d3a8	chore(audit): add license_cve audit script + initial report scripts/audit_license_cve.py: 4 internal checks (license + CVE + pin + source-header), policy tables (allowlist of permissive/weak-copyleft/public-domain, blocklist of non-OSI/restricted-source), and a main() that runs all 4 and emits line-per-violation to stdout + a markdown report. Tests (26 unit + integration) cover license classifier (16 variants across MIT, BSD, Apache, LGPL, MPL, CC0, WTFPL, GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, Anti-996, Hippocratic, unknown), pin check (3), source-header check (3), license check via importlib.metadata (1), CVE check via subprocess pip-audit (2), and a smoke test of the main loop (1). No new pip deps in the project: pure stdlib (importlib.metadata, tomllib, pathlib, re) + subprocess to pip-audit (optional dev tool, installed via 'uv tool install pip-audit' if user wants CVE checks). Initial report at docs/reports/license_cve_audit/2026-06-07/ records the current state. The Phase 2 commit will apply the fixes (tilde-pin, delete requirements.txt); the Phase 3 commit will add --strict mode + baseline file for CI.	2026-06-07 15:07:46 -04:00
ed	8af3af5c34	fix(app_controller): correctly construct TrackState with Ticket (not TicketState) The _push_mma_state_update method (added in `8216d494`) used models.TicketState for the persisted tasks list, but: - src.models has no TicketState class; only Ticket - TrackState.tasks is annotated as List[Ticket] So my code raised AttributeError on every call, which my try/except caught and silently printed. Tests that depended on save_track_state being called (test_push_mma_state_update) failed because the call was skipped. Also fixed: - TrackState field name: it's 'tasks' (not 'tickets') per the src.models dataclass annotation. My code was using 'tickets=' which created a TypeError on construction. - Removed the [DEBUG ...] print statements added during the investigation; they were only for diagnosing the silent AttributeError. - Kept the try/except so a real exception is still logged to stderr (visible via -s flag) without breaking the test. Result: 11/11 tests in test_gui_phase4 + test_ticket_queue now pass: - test_push_mma_state_update - test_ticket_priority_default/custom/to_dict/from_dict - TestBulkOperations::test_bulk_execute/skip/block (3) - TestReorder::test_reorder_ticket_valid/invalid (2)	2026-06-07 14:32:29 -04:00
ed	61b5572e2b	chore(audit): spec license_cve_audit track (compliance + CVE + pinning) Builds scripts/audit_license_cve.py: single audit script that checks third-party deps (pyproject.toml + uv.lock transitive tree) for: (1) license compliance against the project's policy, (2) known CVEs (via pip-audit subprocess), (3) version-pinning, and (4) source-file SPDX license headers in src/ and scripts/. LICENSE POLICY (encoded in the script) Allowlist (permissive or weak copyleft or public domain): - Permissive: MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib, Python-2.0, 0BSD, PSF-2.0 - Weak copyleft (Python import-safe): LGPL 2.1/3.0, MPL-2.0 - Public domain: CC0, WTFPL Blocklist (non-OSI / restricted-source): - GPL (any version), AGPL (any version) - SSPL (MongoDB 2018) - broad service-provider trigger - BSL / BUSL - delayed open source; competitive-use restriction - Commons Clause - 'cannot sell the software' addendum - Elastic License v2 - 'cannot offer as managed service' - Unknown / unparseable / missing metadata (catches packaging bugs and custom licenses) The two lists are explicit. Default rule: unknown = violation (never auto-pass). The script's --help references the policy table for transparency. Specific per-license additions go in scripts/audit_license_cve.py directly; no spec change needed. TRACK SCOPE In scope: third-party deps (direct + transitive), source-file SPDX headers, vendored libraries (defensive), version pinning. Out of scope: the project's own LICENSE file, project's own SPDX/Copyright headers, recommendations on project license. The user reserves all rights to the repo; no LICENSE file is created by the track. The audit reports third-party state only. OUTPUT FORMAT (sanitized: no JSON in user-facing output) - Stdout: line-per-violation, parseable by eye and by grep - Markdown report in docs/reports/license_cve_audit/2026-06-07/ - Baseline file: JSON (matches existing audit_weak_types convention; internal state for --strict mode only) CI GATE --strict mode + scripts/audit_license_cve.baseline.json. Fails CI on any new violation OR any new CVE. Mirrors the 3 existing audit scripts (audit_main_thread_imports, audit_weak_types, check_test_toml_paths). COMMITS PLANNED 1. chore(audit): add license_cve audit script + initial report 2. chore(deps): tilde-pin all deps; delete requirements.txt 3. chore(audit): add --strict mode + baseline file (CI gate) 4. conductor(tracks): mark License CVE Audit track complete NO NEW PIP DEPENDENCIES IN PROJECT Pure stdlib (importlib.metadata, tomllib, pathlib, re) + subprocess to pip-audit (an optional dev tool, installed via 'uv tool install pip-audit' if user wants CVE checks).	2026-06-07 14:26:22 -04:00
ed	ad13007352	chore(audit): switch output format from JSON to custom postfix DSL Per user direction ('make a custom DSL ideal for recording the call-graph or other metrics', 'I want a post-fix heiarchy', 'JSON is ill-performant'): replaced JSON serializer with a custom postfix (RPN) DSL tailored to the audit's record shapes. THE CUSTOM DSL - Postfix (operands before operator); no brackets, braces, commas, or colons. - Length-prefixed lists: N items followed by 'list' word. - Tagged records: each 'word' is a constructor with a known arity (action=3, fn=3, call=1, mut=3, exp-op=5, pair=2, int=1). - Whitespace-tokenized; bare atoms unquoted; double quotes only when whitespace/special chars present. - nil for null; backslash for line comments; true/false for bool. - Trivial parser (~30 lines): _tokenize_dsl splits on whitespace and respects quotes + comments; parse_dsl walks tokens and evaluates tagged words against a known arity table (DSL_WORD_ARITY). - Round-trips: to_dsl(profile) -> parse_dsl(to_dsl(profile)) yields the same in-memory structure. DELIVERABLES (updated spec + plan) - src/code_path_audit.py: to_dsl, dump_dsl, parse_dsl, _tokenize_dsl, to_tree (prefix-tree text renderer), to_markdown, to_mermaid. - Output: .dsl files (machine) + .tree (human prefix view) + .md (summary tables) + .mmd (Mermaid diagrams). - No new pip dependencies; pure stdlib. WHAT STAYED - The 7 cost classes (file_io, network, ast_parse, json_io, pickle, deep_copy, loop_amplified) and 5 mutation kinds are unchanged. The json_io cost class is for JSON file I/O the audit detects, not the output format. - 36 tests total (15 + 8 + 10 + 3 across the 4 implementation phases).	2026-06-07 12:17:56 -04:00
ed	803f87137b	chore(audit): plan code path audit track (6 phases, 30 tests) 6 phases, one per commit: Phase 1: data structures (CallGraph, ExpensiveOp, StateMutation) - 15 unit tests Phase 2: trace_action + ActionProfile + cost model + AST walking - 8 tests (synthetic + integration on real src/) Phase 3: JSON / markdown / Mermaid output - 4 tests Phase 4: MCP tool + CLI surface - 3 tests Phase 5: run audit on 3 actions; commit report Phase 6: tracks.md update TDD pattern: each task has synthetic-data unit test, then real implementation, then integration with real src/, then commit. The state.toml scaffold is created in Phase 0 Step 0.1 and advanced after each phase. 3 actions in scope (MMA is cold per user): - ai_message_lifecycle (5 entry points) - discussion_save_load (4 entry points) - gui_startup (3 entry points) Two follow-up tracks recorded but NOT in this track: - pipeline_runtime_profiling_20260607 - pipeline_pruning_20260607 No new pip dependencies; pure stdlib (ast, json, pathlib, dataclasses). Read-only on src/; new files are the tool, the tests, and the report under docs/reports/code_path_audit/2026-06-07/.	2026-06-07 11:37:40 -04:00
ed	c82207b191	conductor(plan): mark phase 6 complete [`9647b8d`]	2026-06-07 11:31:43 -04:00
ed	9647b8d228	conductor(tracks): mark Unused Scripts Cleanup track as complete Phase 6 verification complete: 5 atomic per-category commits landed, non-GUI test suite passes, 2 audit scripts (main_thread_imports, weak_types) report no new violations, ImGui linter reports the 3 pre-existing src/gui_2.py findings (src/ untouched by this track; informational mode exit 0). scripts/ shrinks from 56 to 26 files (54% reduction).	2026-06-07 11:30:29 -04:00
ed	f069a8b27b	chore(audit): spec code path audit track Design for a data-oriented static-analysis tool (src/code_path_audit.py) that audits the 3 major actions (AI message lifecycle, discussion save/load, GUI startup) for expensive operations, redundant calls, and pipelining candidates. Output: JSON data files + markdown summaries + Mermaid per-action call graphs in docs/reports/code_path_audit/. 61 src/ files, 27,447 total lines. Call graph is non-trivial; per-action traversal is what makes analysis tractable. Cost model: 7 cost classes (file_io, network, ast_parse, json_io, pickle, deep_copy, loop_amplified) with heuristic weights; EXPENSIVE_THRESHOLD = 40,000 module constant. 5 state mutation kinds (attr_write, container_mutate, file_write, ipc_emit, global_write). The 3 action entry points are per-action defined (see Per-Action Design table). MMA worker spawn is OUT of scope per user (cold until 1:1 discussion UX is dogfooded). Two follow-up tracks recorded but NOT in this track: - pipeline_runtime_profiling_20260607: calibrate the heuristic cost model with real measurements; catch C-extension cost, decorator dispatch, JIT effects that static analysis can't resolve. - pipeline_pruning_20260607: implement the high-priority optimization candidates surfaced by this track's report. 6 atomic commits planned: data structures; trace_action + ActionProfile + cost model; output (JSON/MD/Mermaid); MCP + CLI; run audit + commit report; tracks.md update.	2026-06-07 11:30:06 -04:00
ed	ca781543ea	conductor(plan): mark sub-track 2 (audit violations) COMPLETE [`2e3a6385`] All 6 sub-tracks (2A-2F) complete. Audit script: 0 violations (was 67 baseline / 61 before sub-track 2). Track is now FULLY COMPLETE (was previously [~] due to sub-track 2 partial). 79 tests added/passing across sub-tracks 2A-2F. Updated sub_tracks table in state.toml with per-sub-track completion details. Pre-existing test failures (4 unrelated) documented in test_failure_notes.	2026-06-07 11:01:24 -04:00
ed	adfd75a6d4	conductor(plan): mark phase 5 complete [`46ce3cd`]	2026-06-07 10:49:34 -04:00
ed	f5fc99f91f	conductor(plan): mark phase 4 complete [`0022dd8`]	2026-06-07 10:45:33 -04:00
ed	811e7203c1	conductor(plan): mark phase 3 complete [`bd20fee`]	2026-06-07 10:43:52 -04:00
ed	41e970e0e2	conductor(plan): mark phase 2 complete [`dfbde95`]	2026-06-07 10:40:46 -04:00
ed	62214e3cae	conductor(plan): mark phase 1 complete [`3d412ba`]	2026-06-07 10:38:52 -04:00
ed	eae5b0a22b	chore(scripts): plan unused scripts cleanup track (5 phases) 5 phases, one per deletion category from the spec: Phase 1: Remove one-shot indent fixers (10 files) Phase 2: Remove one-shot transform scripts (6 files) Phase 3: Remove superseded entropy and code-stat audits (4 files) Phase 4: Remove one-shot migrators and repros (6 files) Phase 5: Remove tool-call aliases and legacy tool discovery (4 files) Phase 6: Final verification + tracks.md update Each phase = one git rm + one commit + one git note + one state.toml update. Phase 0 adds the state.toml scaffold. Phase 6 runs the full test suite in 4-at-a-time batches per workflow.md Phase Completion protocol, re-runs the 2 active audit scripts (main_thread_imports, weak_types) for regression check, and commits the tracks.md update. TDD pattern adapted for deletion: pre-deletion baseline (Phase 0) + per-phase git rm + post-deletion test suite pass (Phase 6). No new code, no new tests, no new CI gate.	2026-06-07 10:26:49 -04:00
ed	87098a2ec3	chore(scripts): spec unused scripts cleanup track Design for removing 30 confirmed-unused one-off scripts from scripts/. Net effect: scripts/ shrinks from 56 -> 26 files (54% reduction). All deletions are hard deletes via 5 atomic per-category commits; git log is the restore path. 26 KEEPS documented by category (CI gates, MMA, MCP, test runner, ImGui linter, audit/scaffolding, tool-call bridge, Docker, borderline utility). 30 DELETES grouped by category: one-shot indent fixers (10), one-shot transform scripts (6), superseded entropy audits (4), one-shot migrators/repros (6), tool-call aliases and legacy tool discovery (4). No new CI gate added. Follow-up unused_scripts_audit_20260607 recorded in the spec. Plan (writing-plans) will produce 5 phases (one per category).	2026-06-07 10:19:20 -04:00
ed	02239bc38f	conductor(plan): mark sub-track 2A (pydantic in models.py) complete [`01ddf9f1`] Resuming sub-track 2 (audit violations) per user direction. Sub-track 2A cleared 1 of 61 violations (pydantic in src/models.py via PEP 562 __getattr__ + pydantic.create_model). 60 remain across file_cache (4), api_hooks (4), sloppy (5), app_controller (23), gui_2 (24). Next: 2B (tree_sitter in file_cache.py).	2026-06-07 10:03:48 -04:00
ed	a88c748d77	conductor(tracks): un-mark startup_speedup as complete; sub-track 2 still pending Phase 9 was shipped at `12cec6ae` and the 9-phase core plan is done, but the [COMPLETE 2026-06-07] tag was applied prematurely. Sub-track 2 (audit violations) remains partial at `ae3b433e` with 61 violations remaining: pydantic in models.py (1), tree_sitter in file_cache.py (4), api_hooks.py (4), sloppy.py (5), app_controller.py (23), gui_2.py (24). Reopening the track to finish sub-track 2 in 6 per-file sub-tracks (2A-2F).	2026-06-07 09:36:08 -04:00
ed	820cdab15a	docs(agents,edit_workflow): capture session-learned anti-patterns (2026-06-07) Captures the 5 patterns that burned the most time in the startup_speedup_20260606 sub-track 4 work: 1. ALWAYS use manual-slop_edit_file, not custom scripts (custom scripts fail silently on indent/EOL/whitespace drift) 2. The decorator-orphan pitfall (inserting before 'def foo' leaves @property decorating YOUR new method) 3. ast.parse() is not enough (semantic errors aren't caught; import + instantiate + call after every edit) 4. The git restore trap (don't run git status/restore while a user is mid-conversation) 5. Small verified edits beat big scripts (edit_workflow says 3-10 lines; if you write 200 lines of script, wrong tool) Also adds 2 new anti-patterns to the Critical list in AGENTS.md and 3 new sections to conductor/edit_workflow.md (decorator-orphan, ast.parse-not-enough, set_file_slice-is-literal).	2026-06-06 22:52:02 -04:00
ed	f09cd4a733	conductor: doc final sync for sub-tracks 2 (partial), 3, 4 + conftest fix	2026-06-06 21:45:27 -04:00
ed	c073e42a7a	docs(workflow,agents): add 7 process improvements from planning session All additive; no breaking changes to existing content. Derived from gaps observed during the 2026-06-06 planning session (5 tracks spec'd + planned end-to-end). AGENTS.md (1 new section, 16 lines): - Compaction Recovery - explicit recovery path for a new agent picking up mid-track (read the digest, check state.toml, run audits, resume from next unchecked task). Cross-references the workflow-level 'Compaction Recovery' section. conductor/workflow.md (6 new sections, 145 lines): - Planning Session Workflow - documents the brainstorming -> spec -> plan flow used 5x this session; mandates spec approval before plan; notes the plan is the only artifact the implementer reads. - Track Dependencies and Execution Order - verify the blocked_by chain in metadata.json before starting; topological sort gives the recommended execution order (recorded in PLANNING_DIGEST). - State.toml Template - canonical structure (meta / blocked_by / blocks / phases / tasks / verification / track-specific) so future tracks have a consistent shape. - Per-Task Decision Protocol - small decisions (cosmetic) decide yourself; large decisions (architectural) STOP and report; regressions STOP and report. The boundary is 'does this require a new spec or plan update?'. - Documentation Refresh Protocol - after a track ships, identify affected guides (grep for renamed/moved symbols), update them, add new guides for new modules, add styleguides for new conventions. The 'post-tracks documentation' pattern is repeatable; tracks that only update code are incomplete. - Audit Script Policy - whenever a track introduces a new convention that can be statically checked, add an audit script in scripts/ with --help / --json / strict modes. The audit + CI gate pair is the convention-enforcement mechanism; 3 existing audits (audit_main_thread_imports, audit_weak_types, check_test_toml_paths) are the precedent. All sections reference existing project files (brainstorming skill, writing-plans skill, audit scripts, tracks.md, the existing 5 new tracks' spec.md files, PLANNING_DIGEST_20260606.md). No code changes. Documentation only. ~160 lines total added.	2026-06-06 21:22:40 -04:00
ed	530a29f0d2	conductor(tracks): fix sub-track count in startup_speedup row (4 → 3; sub-track 1 is done)	2026-06-06 20:51:25 -04:00
ed	bb2ac6c9c0	conductor: finalize startup_speedup_20260606 docs (sub-track 1 + 3 post-shipping fixes)	2026-06-06 20:45:58 -04:00
ed	cf01870b35	conductor(plan): write 7-phase implementation plan for mcp_architecture_refactor_20260606 ~25 tasks across 7 phases, each with explicit Red-Green-Refactor TDD steps: - Phase 1 (1.1-1.5): Foundation. 3-layer security module (8 unit tests returning Result[Path]); SubMCP Protocol + MCPController class (6 unit tests). Controller added ALONGSIDE the existing 45 functions in mcp_client.py (no removal yet). - Phase 2 (2.1-2.4): Backward compat. git mv mcp_client.py to mcp_client_legacy.py; create new mcp_client.py as a slim shim re-exporting 45+ old symbols. 12 legacy shim tests verify the surface. The 4 existing test files + src/app_controller.py:61 still work. - Phase 3 (3.1-3.4): FileIOMCP extracted (9 tools, 10 unit tests). - Phase 4 (4.1-4.4): PythonMCP extracted (14 tools, 14 unit tests). - Phase 5 (5.1-5.5): CMCP, CppMCP, WebMCP, AnalysisMCP extracted (4 sub-MCPs, 18 unit tests; pattern mirrors Phase 3/4). - Phase 6 (6.1-6.3): ExternalMCP extracted from mcp_client_legacy. Class name preserved (ExternalMCPManager). - Phase 7 (7.1-7.5): Update dispatch() in the legacy shim to use the new controller (inverted-dict O(1) lookup); update docs; manual smoke test; archive the track. Each sub-MCP follows the same template (class with name / description / tools / invoke; security check for path-taking tools; Result wrapping in invoke(); delegation to legacy functions for the actual implementation). The sub-MCPs are thin adapters in v1; a future track can move the implementations into the sub-MCP files directly. Self-review at the end maps every spec section to a task (no gaps), confirms zero placeholders, and verifies type/method-name consistency across phases (SubMCP Protocol, MCPController class, Result[str, ErrorInfo], _resolve_and_check all defined in Phase 1; used consistently across Phases 3-6).	2026-06-06 20:43:48 -04:00
ed	dd137df750	conductor(tracks): backfill mcp_architecture_refactor SHA in registry	2026-06-06 20:34:35 -04:00
ed	2720a8940c	conductor(track): Initialize mcp_architecture_refactor_20260606 Track + metadata + state + tracks.md registration for the 2,205-line mcp_client.py split into a slim controller + 6 native sub-MCPs + 1 external sub-MCP. Key design decisions (per user feedback): - Naming convention: mcp_<type>.py for native MCPs (mcp_file_io.py, mcp_python.py, mcp_c.py, mcp_cpp.py, mcp_web.py, mcp_analysis.py). - ExternalMCPManager class name preserved (moves to mcp_external.py). - Sub-MCP shape: class with name / description / tools / invoke(). - MCPController: holds ALL_SUB_MCPS list, inverted-dict tool lookup, 3-layer security (extracted to mcp_client_security.py), schema aggregation. - Each invoke() returns Result[str, ErrorInfo] (from data_oriented_error_handling_20260606). - Backward compat: mcp_client_legacy.py re-exports all 45+ old symbols; the 4 existing test files + src/app_controller.py:61 direct call continue to work. DSL future (per user notes on APL/K/Cosy): NOT in this track. Documented in spec §12.1 as the mcp_dsl_20260606 follow-up. Sub-MCP architecture is the natural unit to pair with a DSL emitter. 7 phases. ~22 task slots. New tests: 9 (one per sub-MCP + controller + security + legacy). Modified tests: 4 (existing mcp_* tests must pass unchanged). Blocked by: data_oriented_error_handling_20260606, data_structure_strengthening_20260606. Blocks: mcp_dsl_20260606 (future DSL track).	2026-06-06 20:34:00 -04:00
ed	9147578155	conductor(plan): write 2-phase implementation plan for data_structure_strengthening_20260606 ~22 tasks across 2 phases, each with explicit Red-Green-Refactor TDD steps: - Phase 1 (1.1-1.12): Foundation. type_aliases.py (10 TypeAliases + 1 NamedTuple) with 8 unit tests. Mechanical replacement of 345 weak sites in 6 files (ai_client 139, app_controller 86, models 51, api_hook_client 32, project_manager 20, aggregate 17). Each file has a per-substitution table for the mechanical replacement. Audit script gains --strict mode + baseline file (CI gate). 4 audit tests. - Phase 2 (2.1-2.10): FileItemsDiff NamedTuple integrated. generate_type_registry.py (AST-based; 3 modes: default, --check, --diff). Initial registry generated in docs/type_registry/ (8+ .md files). 6 generator tests. Type aliases styleguide + product-guidelines updates. Manual smoke test. Track archived. The type registry generator uses --check mode for CI: it regenerates to a temp dir and diffs against the committed registry; exit 1 if drift. The agent's track-completion workflow is: regenerate -> review diff -> commit. CI enforces --check on every PR. Self-review at the end maps every spec section to a task (no gaps), confirms zero placeholders, and verifies type/method-name consistency across phases (all 10 aliases + FileItemsDiff defined in Task 1.2; used consistently in Tasks 1.3-1.8 and Phase 2).	2026-06-06 18:15:15 -04:00
ed	95d1b08142	conductor(plan): Final track summary - 9 phases, 50 tests, 3066ms saved	2026-06-06 18:08:59 -04:00
ed	432c789524	conductor(spec): add registry-drift risk to §9	2026-06-06 18:07:48 -04:00
ed	aba35f9f4a	conductor(spec): Add type registry to data_structure_strengthening track Per user feedback (2026-06-06): instead of a follow-up 'TypedDict Migration' track, add a NEW deliverable: an auto-generated type registry in docs/type_registry/ that captures the field information in docs form. New files: - scripts/generate_type_registry.py (NEW): AST-based tool that reads src/ and writes per-source-file .md files with the fields of every @dataclass, NamedTuple, TypeAlias, TypedDict. Has --check (CI mode, exits 1 if registry would change) and --diff (dry run) modes. - docs/type_registry/ (NEW, generated): index.md + per-source-file references (type_aliases.md, ai_client.md, models.md, etc.). - tests/test_generate_type_registry.py (NEW): verify the generator. Architecture updates: - Section 3.6 (NEW): Type Registry architecture with example output. - Section 3.7 (NEW): Why per-source-file docs (locality of reference). - Section 1.1 (NEW): 'Why docs over TypedDict' analysis (3 reasons: lower upfront cost, better fit for AI workflow, auto-maintained). - Goals table: registry added as a C (innovation) goal. - Module layout: docs/type_registry/ and scripts/generate_type_registry.py added to the new files list. - Migration: Phase 2 now includes the registry generator + initial docs. - Out of scope: TypedDict migration REMOVED; 'auto-typing the field shape' added with the docs as the chosen approach. - See Also: TypedDict follow-up REPLACED with 'Registry Maintenance & CI Integration' (smaller scope, just wires the generator into CI). The 'cost we eat' is the LLM reading 200-500 lines of markdown per query. This is bounded and proportional to actual information need. The upfront cost of designing TypedDict schemas for every type is unbounded. Tradeoffs favor the docs approach for v1; TypedDict can come later as a future track if desired.	2026-06-06 18:06:34 -04:00
ed	4e6a86a84c	conductor(tracks): backfill data_structure_strengthening_20260606 SHA in registry	2026-06-06 17:51:33 -04:00
ed	ed42a97a9b	conductor(track): Initialize data_structure_strengthening_20260606 Track + metadata + state + tracks.md registration for the type-aliases refactor that follows the audit_weak_types.py findings (430 weak sites across 29 of 61 files; 86% concentrated in 6 high-traffic files). Key design decisions (per user approval): - 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry, CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition, ToolCall, CommsLogCallback). - 1 NamedTuple (FileItemsDiff) for the _reread_file_items return. - Mechanical replacement of 345 weak sites across 6 files (NOT 430; the remaining 85 are in 23 lower-impact files deferred to future tracks). - scripts/audit_weak_types.py gains a --strict mode and a baseline file (scripts/audit_weak_types.baseline.json) so the count is enforced. - 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples + docs + archive. - Honest about what's missing: TypedDict / @dataclass migration is a follow-up track (typed_dict_migration_20260606), not this one. - Coexistence with the data_oriented_error_handling_20260606 track's Result[T] / ErrorInfo: the aliases are value-level (data types), Result is control-level (wrapper). They compose (Result[FileItems] is valid). No conflict. Audit baseline: - Pre-track: 430 weak sites, 0 strong patterns - Target after Phase 1: ~60 weak sites (only the 23 lower-impact files) - Top 4 unique type strings account for 86% of findings (4-6 aliases eliminate the bulk of the noise). Not blocked by anything; can be executed independently of the other pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).	2026-06-06 17:49:22 -04:00
ed	b91962e458	conductor(plan): Mark Phase 5D complete - gui_2 lazy proxy + dead import removal	2026-06-06 17:19:14 -04:00
ed	f7b11f7f1c	conductor(plan): write 5-phase implementation plan for data_oriented_error_handling_20260606 ~25 tasks across 5 phases, each with explicit Red-Green-Refactor TDD steps: - Phase 1 (1.1-1.9): Foundation. Post-tracks baseline verification, typing_extensions dep, src/result_types.py (10 unit tests), conductor/code_styleguides/error_handling.md canonical reference, product-guidelines.md + workflow.md updates. - Phase 2 (2.1-2.7): mcp_client.py refactor. _resolve_and_check returns Result[Path]; all 9 tool functions return Result[str]; 30+ 'assert p is not None' chain removed; tool dispatch updated; existing tests migrated to .data/.errors pattern. - Phase 3 (3.1-3.8): ai_client.py refactor (HIGHEST RISK). _classify_<vendor>_error() returns ErrorInfo (not raise ProviderError); _send_<vendor>() renamed to _send_<vendor>_result() returning Result[str] (8 vendors); ProviderError class REMOVED; new public send_result() API; send() marked @deprecated (rewired to call send_result() and unwrap). - Phase 4 (4.1-4.5): rag_engine.py refactor. _init_vector_store, _validate_collection_dim return Result; NilRAGState used; broad except Exception becomes ErrorInfo entries. - Phase 5 (5.1-5.7): Deprecation wiring (filterwarnings in conftest.py to silence send() warning in existing tests), docs updates (guide_ai_client + guide_mcp_client), follow-up track public_api_migration_20260606 placeholder in tracks.md, manual smoke test, archive the track. Coordination with the 3 pending tracks (startup_speedup, test_batching_refactor, qwen_llama_grok_integration) addressed throughout. Phase 1 Task 1.1 verifies the baseline before any refactor begins. Post-tracks state considerations from spec §10 fully integrated into the task breakdown. 1-space indentation per project style guide. No placeholders. All test code is concrete. Self-review at end confirms full spec coverage (every section of spec.md mapped to a task).	2026-06-06 17:06:30 -04:00
ed	32edad0a4b	conductor(plan): Mark Phase 5A-5C complete (commands, theme_2, markdown_helper lazy imports)	2026-06-06 17:01:05 -04:00
ed	cbc3b075a0	conductor(track): Initialize data_oriented_error_handling_20260606 Track + metadata + state + tracks.md registration for the Fleury-pattern error handling refactor. Key design decisions (per user approval): - Option A for _send_<vendor>() handling: rename to _send_<vendor>_result() and change return type to Result[str] (contained to internal callers). - send() is marked @typing_extensions.deprecated; send_result() is the new public API. - ProviderError exception is FULLY REPLACED by ErrorInfo dataclass (a value, not an exception). - 5 phases: foundation, mcp_client, ai_client, rag_engine, deprecation+archive. - Post-tracks baseline check (Phase 1 Task 1.1) verifies the 3 pending tracks have merged before proceeding. - 9 Open Questions, 7 Risks, 5 verification criteria, follow-up track public_api_migration_20260606 planned in spec §12.1. Blocked by: startup_speedup_20260606, test_batching_refactor_20260606, qwen_llama_grok_integration_20260606. Blocks: public_api_migration_20260606.	2026-06-06 16:58:22 -04:00
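Note on the data_oriented_error_handling_20260606 entries above: a minimal sketch of the "error as a value" shape those messages describe (ErrorInfo dataclass instead of a ProviderError exception, a Result with .data / .errors, and a deprecated send() that calls send_result() and unwraps). The field names on ErrorInfo, the `ok` property, the echo behaviour, and the RuntimeError raised on unwrap are assumptions for illustration, not taken from src/result_types.py. The commits mention @typing_extensions.deprecated; this sketch uses the stdlib warnings module instead so it stays dependency-free. Indentation is one space, following the project style the log mentions.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ErrorInfo:
 # Hypothetical fields; the real ErrorInfo layout is not shown in the log.
 kind: str
 message: str


@dataclass
class Result(Generic[T]):
 # Carries either a value or a list of errors, never an exception.
 data: T | None = None
 errors: list[ErrorInfo] = field(default_factory=list)

 @property
 def ok(self) -> bool:
  return not self.errors


def send_result(prompt: str) -> Result[str]:
 # Stand-in for a provider call: failures come back as ErrorInfo values.
 if not prompt.strip():
  return Result(errors=[ErrorInfo("empty_prompt", "prompt must not be blank")])
 return Result(data=f"echo: {prompt}")


def send(prompt: str) -> str:
 # Legacy entry point: warn, delegate to send_result(), then unwrap.
 warnings.warn("send() is deprecated; use send_result()", DeprecationWarning, stacklevel=2)
 result = send_result(prompt)
 if result.errors:
  raise RuntimeError(result.errors[0].message)
 return result.data
```

Callers on the new path check `result.errors` (or `result.ok`) instead of wrapping the call in try/except; only the deprecated wrapper turns an error value back into an exception.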


1367 Commits
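
Note on the mcp_architecture_refactor_20260606 entries above: a rough sketch of the sub-MCP shape (class with name / description / tools / invoke()) and the controller's inverted-dict tool lookup that replaces the if/elif dispatch chain. EchoMCP, its tool names, the simplified non-generic Result, and the duplicate-name check are hypothetical and exist only for this illustration; the 3-layer security check and schema aggregation mentioned in the commits are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class ErrorInfo:
 # Hypothetical fields, for illustration only.
 kind: str
 message: str


@dataclass
class Result:
 # Simplified stand-in for the Result[str, ErrorInfo] named in the log.
 data: str | None = None
 errors: list[ErrorInfo] = field(default_factory=list)


class SubMCP(Protocol):
 # Structural interface: any class with these members qualifies.
 name: str
 description: str
 tools: list[str]

 def invoke(self, tool: str, args: dict) -> Result: ...


class EchoMCP:
 # Hypothetical sub-MCP; the real ones wrap file, Python, C, C++ and web tools.
 name = "echo"
 description = "Example sub-MCP used only in this sketch."
 tools = ["echo_upper", "echo_lower"]

 def invoke(self, tool: str, args: dict) -> Result:
  text = str(args.get("text", ""))
  if tool == "echo_upper":
   return Result(data=text.upper())
  if tool == "echo_lower":
   return Result(data=text.lower())
  return Result(errors=[ErrorInfo("unknown_tool", tool)])


class MCPController:
 def __init__(self, sub_mcps: list[SubMCP]) -> None:
  # Inverted dict: tool name -> owning sub-MCP, built once at construction.
  self._by_tool: dict[str, SubMCP] = {}
  for sub in sub_mcps:
   for tool in sub.tools:
    if tool in self._by_tool:
     raise ValueError(f"duplicate tool name: {tool}")
    self._by_tool[tool] = sub

 def dispatch(self, tool: str, args: dict) -> Result:
  # Single dict lookup instead of walking an if/elif chain.
  sub = self._by_tool.get(tool)
  if sub is None:
   return Result(errors=[ErrorInfo("unknown_tool", f"no sub-MCP provides {tool!r}")])
  return sub.invoke(tool, args)
```

Usage: `MCPController([EchoMCP()]).dispatch("echo_upper", {"text": "hi"})` returns a Result whose data is "HI"; an unregistered tool name returns a Result with an "unknown_tool" entry in errors rather than raising.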