Private
Public Access
0
0
Commit Graph

1038 Commits

Author SHA1 Message Date
conductor-tier2 8a597d1832 conductor(track-update): mcp_architecture_refactor - list_tool_schemas + security-as-contract
4 surgical additions to the spec, no task changes:

1. list_tool_schemas on the SubMCP Protocol: Added the method
   to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6
   (hard-coded tool discovery) and takeaway #5 (self-describing
   tools), each sub-MCP advertises its own capabilities via
   list_tool_schemas() rather than relying on a central registry.
   This is the equivalent of nagent's collect_bin_tool_descriptions
   per sub-MCP. The MCPController.get_tool_schemas() becomes a
   simple aggregator.

2. Security model is the contract: Added a new Important note
   to §3.3 (The 3-Layer Security Model). The 3 layers
   (Allowlist Construction -> Path Validation -> Resolution
   Gate, per docs/guide_mcp_client.md) are not just refactored
   - they are the CONTRACT between MCPController and the
   sub-MCPs. Sub-MCPs receive a pre-validated Path and trust
   it. They do NOT re-validate. The refactor is structural,
   not security-changing.

3. Docs touchpoint in Phase 7: Added the docs touchpoint to
   Phase 7 per the docs Refresh Protocol. The update to
   docs/guide_mcp_client.md should add a Sub-MCP Architecture
   section, link the list_tool_schemas pattern to 3-Layer
   Security Model, and cross-link the 3 new guides from
   the 2026-06-08 docs refresh.

4. See Also cross-references: Added 8 new entries to §12.2:
   - docs/guide_context_aggregation.md (FileItem consumer)
   - docs/guide_state_lifecycle.md (App state delegation)
   - docs/guide_discussions.md (23-operation matrix)
   - conductor/tracks/qwen_llama_grok_integration_20260606/
     (Result return type coordination)
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md
   - (2 specific data_oriented_error_handling and
     data_structure_strengthening cross-refs)

No plan.md changes.
2026-06-08 20:59:27 -04:00
conductor-tier2 1fb0d79c0d conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split
4 surgical additions to the spec, no task changes:

1. ProviderHistoryMessage: Added a new alias to §3.1 (The
   Aliases). Per nagent_review Pitfall #4 (provider history
   divergence), the UI/curation layer (HistoryMessage, edited
   via disc_entries[i].content) and the SDK layer
   (ProviderHistoryMessage, the bytes actually replayed to the
   LLM) are *distinct*. Conflating them via a single alias
   perpetuates the bug. The new alias is documented as a
   separate concept with its own use sites (_anthropic_history,
   _deepseek_history, _minimax_history, _grok_history,
   _llama_history). The follow-up public_api_migration_20260606
   track is the natural moment to unify the two layers; this
   spec just makes the distinction explicit.

2. FileItem alias points to the existing models.FileItem
   dataclass, not Metadata. Per docs/guide_context_aggregation.md
   (added 2026-06-08), FileItem is a 9-field dataclass
   (path, auto_aggregate, force_full, view_mode, selected,
   ast_signatures, ast_definitions, ast_mask, custom_slices,
   injected_at) with a __post_init__ normalizer. Aliasing it to
   dict[str, Any] would lose the type safety. The 9 other
   aliases remain dict aliases for round-trip compatibility.

3. gui_2.py and mcp_client.py as follow-up: Added a Note
   (dated 2026-06-08) to the Out of Scope section. The 23
   lower-impact files (deferred) are dominated by gui_2.py
   (26+ weak sites per guide_state_lifecycle.md) and
   mcp_client.py (will be touched heavily by the parallel
   mcp_architecture_refactor_20260606). The deferral is correct
   but the follow-up should explicitly call out these two
   files as the next targets, rather than implying they're
   handled.

4. See Also cross-references: Added 7 new entries to §12.2:
   - docs/guide_models.md (FileItem dataclass source)
   - docs/guide_context_aggregation.md (FileItems consumer)
   - docs/guide_discussions.md (HistoryMessage shape)
   - docs/guide_state_lifecycle.md (state delegation)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes.
2026-06-08 20:50:50 -04:00
conductor-tier2 0471440c68 conductor(track-update): data_oriented_error_handling - nagent_review + docs refresh
3 surgical additions to the spec, no task changes:

1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to
   the ErrorKind enum. Per nagent_review Pitfall #4 (provider
   history divergence: user edits disc_entries[i].content via
   the discussion UI but ai_client._<provider>_history still
   replays the original). The new kind makes the divergence
   *detectable* and *reportable* so the follow-up
   public_api_migration_20260606 track can collapse the two
   history layers. The Result pattern from this track is the
   natural carrier for the signal.

2. State-delegation regression tests: Added mandatory
   regression tests to the testing strategy in §6 for the
   ai_client refactor (highest-risk phase). The new tests
   exercise:
   - app.temperature = 0.5 round-trips through App.__getattr__/
     __setattr__ delegation (per gui_2.py:666-675)
   - controller.disc_entries[i].content is reflected in the
     next send_result()'s messages parameter
   - The 3 per-provider history locks serialize correctly under
     concurrent send_result() calls
   The reason this is mandatory: per guide_state_lifecycle.md
   (added 2026-06-08), the App.__getattr__/__setattr__ pattern
   means a partial refactor manifests as silent AttributeError
   deep in test code, not at the refactor commit boundary.

3. See Also cross-references: Added 6 new entries to §12.3:
   - docs/guide_ai_client.md (per-provider history globals)
   - docs/guide_mcp_client.md (3-layer security model)
   - docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern)
   - docs/guide_discussions.md (23-operation matrix)
   - docs/guide_context_aggregation.md (build_discussion_section)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:41:00 -04:00
conductor-tier2 77ae2ec7a8 conductor(track-update): qwen_llama_grok - spec notes for nagent_review + docs refresh
4 surgical additions to the spec, no task changes:

1. Result return type: Added a coordination note in §3.1 (Data-
   Oriented Design) explaining that the shared send_openai_compatible
   helper should return Result[NormalizedResponse, ErrorInfo] from
   day 1, not NormalizedResponse + ProviderError raise. This is so
   the downstream data_oriented_error_handling_20260606 track is
   a small mechanical pass over new code, not a second migration.
   References nagent_review Pitfall #4 (provider history divergence)
   and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case.

2. Declarative read, not behavioral dispatch: Added clarification
   to §6 (UX Adaptation) that the capability matrix is a *read* of
   declarative data, not a new dispatch layer. Per nagent_review
   Pitfall #1 (opaque function calling in the Application is the
   correct choice; nagent-style protocol is for Meta-Tooling),
   UI elements are visible/enabled/disabled/hidden but the
   *behavior* they invoke is unchanged. Three concrete examples
   added: screenshot button, cost panel, cache panel.

3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout)
   that src/models.py:79-86 PROVIDERS is the existing single
   source of truth for the (vendor, model) enumeration. The
   capability registry reads from this constant rather than
   introducing a parallel list. Cross-references
   docs/guide_models.md.

4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to
   note that docs/guide_ai_client.md needs the new providers +
   the shared helper documented, and that
   docs/guide_context_aggregation.md (added 2026-06-08) is the
   reference for the aggregate.py pipeline that all new providers
   use.

5. See Also cross-references: Added 3 new entries to §13.2:
   - docs/guide_context_aggregation.md (the new pipeline guide)
   - conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15)
   - conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md
     (§1, §2, §9)

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:35:52 -04:00
conductor-tier2 9cc51ca9af conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways
Reference/analysis track. Produces 0 code changes.

Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded
  in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)

14 nagent principles covered in report.md (durable work, text-in/text-out,
editable state, visible protocol, the loop, per-file memory, repo history,
neighborhoods, sub-conversations, controlled writes, large files, tool
discovery, framework differences, build your own).

6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history
   + _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)

User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7
  per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN
  CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed
  conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for
  1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where
  near that ideation yet')

10 actionable takeaways (separate from the 6 pitfalls - those are
diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)

Domain classification: every recommendation tagged Application / Meta-Tooling
/ Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling
domain; Manual Slop's Application AI is a different kind of thing.

No code modified by this track (reference/analysis only). All 7 files
parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve.
Track is 'active' awaiting human review; future-track candidates live in
decisions.md and nagent_takeaways_20260608.md.
2026-06-08 18:44:35 -04:00
ed c531cebe03 conductor(plan): review pass — fix cross-references, add NOT_READY + with_errors + Lottes/Valigo, split §3.4 into 8 sub-tasks 2026-06-08 09:38:27 -04:00
ed 64823493c0 conductor(closeout): ship test_batching_refactor_20260606 with CLOSEOUT.md and follow-up recommendation 2026-06-08 08:36:22 -04:00
ed 50bd894f8d conductor(archive): ship test_batching_refactor_20260606 to archive 2026-06-08 01:16:58 -04:00
ed 796eec0058 conductor(plan): mark Phases 2,3 complete in test_batching_refactor_20260606 2026-06-08 01:09:02 -04:00
ed 7610c9c1dc conductor(plan): mark Phase 1 complete in test_batching_refactor_20260606 2026-06-08 00:53:59 -04:00
ed 2b56ab3c5c conductor(track): initialize test_batching_post_refactor_polish_20260607 spec/plan/state 2026-06-08 00:27:32 -04:00
ed 0db5ec3eef conductor(tracks): mark License CVE Audit track as complete
Phase 4 verification complete: 4 atomic commits landed, 28
unit + integration tests passing, the audit script runs
end-to-end against the post-cleanup repo, --strict mode
+ baseline file wired in as the CI gate. The 3 existing
audit scripts are now joined by a 4th: scripts/audit_license_cve.py.

Scope: third-party deps only. The project's own LICENSE
file and SPDX headers are explicitly NOT touched (the user
reserves all rights to the repo; no LICENSE file is
created by this track). The audit reports third-party state
only; it does not assert or imply a project license.

Commits:
  a8ae11d3 - chore(audit): add license_cve audit script + initial report
  20fa3558 - chore(deps): tilde-pin all deps; delete requirements.txt
  a7ab994f - chore(audit): add --strict mode + baseline file (CI gate)
  (this)   - conductor(tracks): mark track complete
2026-06-07 15:28:25 -04:00
ed a8ae11d3a8 chore(audit): add license_cve audit script + initial report
scripts/audit_license_cve.py: 4 internal checks (license +
CVE + pin + source-header), policy tables (allowlist of
permissive/weak-copyleft/public-domain, blocklist of
non-OSI/restricted-source), and a main() that runs all 4
and emits line-per-violation to stdout + a markdown report.

Tests (26 unit + integration) cover license classifier (16
variants across MIT, BSD, Apache, LGPL, MPL, CC0, WTFPL,
GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, Anti-996,
Hippocratic, unknown), pin check (3), source-header check
(3), license check via importlib.metadata (1), CVE check
via subprocess pip-audit (2), and a smoke test of the main
loop (1).

No new pip deps in the project: pure stdlib
(importlib.metadata, tomllib, pathlib, re) + subprocess to
pip-audit (optional dev tool, installed via 'uv tool install
pip-audit' if user wants CVE checks).

Initial report at docs/reports/license_cve_audit/2026-06-07/
records the current state. The Phase 2 commit will apply
the fixes (tilde-pin, delete requirements.txt); the Phase 3
commit will add --strict mode + baseline file for CI.
2026-06-07 15:07:46 -04:00
ed 8af3af5c34 fix(app_controller): correctly construct TrackState with Ticket (not TicketState)
The _push_mma_state_update method (added in 8216d494) used
models.TicketState for the persisted tasks list, but:
  - src.models has no TicketState class; only Ticket
  - TrackState.tasks is annotated as List[Ticket]

So my code raised AttributeError on every call, which my
try/except caught and silently printed. Tests that depended
on save_track_state being called (test_push_mma_state_update)
failed because the call was skipped.

Also fixed:
  - TrackState field name: it's 'tasks' (not 'tickets') per the
    src.models dataclass annotation. My code was using 'tickets='
    which created a TypeError on construction.
  - Removed the [DEBUG ...] print statements added during the
    investigation; they were only for diagnosing the silent
    AttributeError.
  - Kept the try/except so a real exception is still logged to
    stderr (visible via -s flag) without breaking the test.

Result: 11/11 tests in test_gui_phase4 + test_ticket_queue now
pass:
  - test_push_mma_state_update
  - test_ticket_priority_default/custom/to_dict/from_dict
  - TestBulkOperations::test_bulk_execute/skip/block (3)
  - TestReorder::test_reorder_ticket_valid/invalid (2)
2026-06-07 14:32:29 -04:00
ed 61b5572e2b chore(audit): spec license_cve_audit track (compliance + CVE + pinning)
Builds scripts/audit_license_cve.py: single audit script that
checks third-party deps (pyproject.toml + uv.lock transitive
tree) for: (1) license compliance against the project's policy,
(2) known CVEs (via pip-audit subprocess), (3) version-pinning,
and (4) source-file SPDX license headers in src/ and scripts/.

LICENSE POLICY (encoded in the script)
Allowlist (permissive or weak copyleft or public domain):
- Permissive: MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib,
  Python-2.0, 0BSD, PSF-2.0
- Weak copyleft (Python import-safe): LGPL 2.1/3.0, MPL-2.0
- Public domain: CC0, WTFPL

Blocklist (non-OSI / restricted-source):
- GPL (any version), AGPL (any version)
- SSPL (MongoDB 2018) - broad service-provider trigger
- BSL / BUSL - delayed open source; competitive-use restriction
- Commons Clause - 'cannot sell the software' addendum
- Elastic License v2 - 'cannot offer as managed service'
- Unknown / unparseable / missing metadata (catches packaging
  bugs and custom licenses)

The two lists are explicit. Default rule: unknown = violation
(never auto-pass). The script's --help references the policy
table for transparency. Specific per-license additions go in
scripts/audit_license_cve.py directly; no spec change needed.

TRACK SCOPE
In scope: third-party deps (direct + transitive), source-file
SPDX headers, vendored libraries (defensive), version pinning.
Out of scope: the project's own LICENSE file, project's own
SPDX/Copyright headers, recommendations on project license.
The user reserves all rights to the repo; no LICENSE file is
created by the track. The audit reports third-party state only.

OUTPUT FORMAT (sanitized: no JSON in user-facing output)
- Stdout: line-per-violation, parseable by eye and by grep
- Markdown report in docs/reports/license_cve_audit/2026-06-07/
- Baseline file: JSON (matches existing audit_weak_types
  convention; internal state for --strict mode only)

CI GATE
--strict mode + scripts/audit_license_cve.baseline.json. Fails
CI on any new violation OR any new CVE. Mirrors the 3 existing
audit scripts (audit_main_thread_imports, audit_weak_types,
check_test_toml_paths).

COMMITS PLANNED
1. chore(audit): add license_cve audit script + initial report
2. chore(deps): tilde-pin all deps; delete requirements.txt
3. chore(audit): add --strict mode + baseline file (CI gate)
4. conductor(tracks): mark License CVE Audit track complete

NO NEW PIP DEPENDENCIES IN PROJECT
Pure stdlib (importlib.metadata, tomllib, pathlib, re) +
subprocess to pip-audit (an optional dev tool, installed via
'uv tool install pip-audit' if user wants CVE checks).
2026-06-07 14:26:22 -04:00
ed ad13007352 chore(audit): switch output format from JSON to custom postfix DSL
Per user direction ('make a custom DSL ideal for recording the
call-graph or other metrics', 'I want a post-fix heiarchy', 'JSON
is ill-performant'): replaced JSON serializer with a custom
postfix (RPN) DSL tailored to the audit's record shapes.

THE CUSTOM DSL
- Postfix (operands before operator); no brackets, braces,
  commas, or colons.
- Length-prefixed lists: N items followed by 'list' word.
- Tagged records: each 'word' is a constructor with a known
  arity (action=3, fn=3, call=1, mut=3, exp-op=5, pair=2, int=1).
- Whitespace-tokenized; bare atoms unquoted; double quotes
  only when whitespace/special chars present.
- nil for null; backslash for line comments; true/false for bool.
- Trivial parser (~30 lines): _tokenize_dsl splits on
  whitespace and respects quotes + comments; parse_dsl
  walks tokens and evaluates tagged words against a known
  arity table (DSL_WORD_ARITY).
- Round-trips: to_dsl(profile) -> parse_dsl(to_dsl(profile))
  yields the same in-memory structure.

DELIVERABLES (updated spec + plan)
- src/code_path_audit.py: to_dsl, dump_dsl, parse_dsl,
  _tokenize_dsl, to_tree (prefix-tree text renderer),
  to_markdown, to_mermaid.
- Output: .dsl files (machine) + .tree (human prefix view) +
  .md (summary tables) + .mmd (Mermaid diagrams).
- No new pip dependencies; pure stdlib.

WHAT STAYED
- The 7 cost classes (file_io, network, ast_parse, json_io,
  pickle, deep_copy, loop_amplified) and 5 mutation kinds
  are unchanged. The json_io cost class is for JSON file
  I/O the audit detects, not the output format.
- 36 tests total (15 + 8 + 10 + 3 across the 4 implementation
  phases).
2026-06-07 12:17:56 -04:00
ed 803f87137b chore(audit): plan code path audit track (6 phases, 30 tests)
6 phases, one per commit:
Phase 1: data structures (CallGraph, ExpensiveOp, StateMutation)
  - 15 unit tests
Phase 2: trace_action + ActionProfile + cost model + AST walking
  - 8 tests (synthetic + integration on real src/)
Phase 3: JSON / markdown / Mermaid output
  - 4 tests
Phase 4: MCP tool + CLI surface
  - 3 tests
Phase 5: run audit on 3 actions; commit report
Phase 6: tracks.md update

TDD pattern: each task has synthetic-data unit test, then
real implementation, then integration with real src/, then
commit. The state.toml scaffold is created in Phase 0 Step 0.1
and advanced after each phase.

3 actions in scope (MMA is cold per user):
- ai_message_lifecycle (5 entry points)
- discussion_save_load (4 entry points)
- gui_startup (3 entry points)

Two follow-up tracks recorded but NOT in this track:
- pipeline_runtime_profiling_20260607
- pipeline_pruning_20260607

No new pip dependencies; pure stdlib (ast, json, pathlib,
dataclasses). Read-only on src/; new files are the tool, the
tests, and the report under docs/reports/code_path_audit/2026-06-07/.
2026-06-07 11:37:40 -04:00
ed c82207b191 conductor(plan): mark phase 6 complete [9647b8d] 2026-06-07 11:31:43 -04:00
ed f069a8b27b chore(audit): spec code path audit track
Design for a data-oriented static-analysis tool
(src/code_path_audit.py) that audits the 3 major actions (AI
message lifecycle, discussion save/load, GUI startup) for
expensive operations, redundant calls, and pipelining
candidates. Output: JSON data files + markdown summaries +
Mermaid per-action call graphs in docs/reports/code_path_audit/.

61 src/ files, 27,447 total lines. Call graph is non-trivial;
per-action traversal is what makes analysis tractable.

Cost model: 7 cost classes (file_io, network, ast_parse,
json_io, pickle, deep_copy, loop_amplified) with heuristic
weights; EXPENSIVE_THRESHOLD = 40,000 module constant. 5
state mutation kinds (attr_write, container_mutate, file_write,
ipc_emit, global_write).

The 3 action entry points are per-action defined (see Per-Action
Design table). MMA worker spawn is OUT of scope per user (cold
until 1:1 discussion UX is dogfooded).

Two follow-up tracks recorded but NOT in this track:
- pipeline_runtime_profiling_20260607: calibrate the heuristic
  cost model with real measurements; catch C-extension cost,
  decorator dispatch, JIT effects that static analysis can't
  resolve.
- pipeline_pruning_20260607: implement the high-priority
  optimization candidates surfaced by this track's report.

6 atomic commits planned: data structures; trace_action +
ActionProfile + cost model; output (JSON/MD/Mermaid); MCP +
CLI; run audit + commit report; tracks.md update.
2026-06-07 11:30:06 -04:00
ed ca781543ea conductor(plan): mark sub-track 2 (audit violations) COMPLETE [2e3a6385]
All 6 sub-tracks (2A-2F) complete. Audit script: 0 violations (was 67 baseline / 61 before sub-track 2). Track is now FULLY COMPLETE (was previously [~] due to sub-track 2 partial). 79 tests added/passing across sub-tracks 2A-2F. Updated sub_tracks table in state.toml with per-sub-track completion details. Pre-existing test failures (4 unrelated) documented in test_failure_notes.
2026-06-07 11:01:24 -04:00
ed adfd75a6d4 conductor(plan): mark phase 5 complete [46ce3cd] 2026-06-07 10:49:34 -04:00
ed f5fc99f91f conductor(plan): mark phase 4 complete [0022dd8] 2026-06-07 10:45:33 -04:00
ed 811e7203c1 conductor(plan): mark phase 3 complete [bd20fee] 2026-06-07 10:43:52 -04:00
ed 41e970e0e2 conductor(plan): mark phase 2 complete [dfbde95] 2026-06-07 10:40:46 -04:00
ed 62214e3cae conductor(plan): mark phase 1 complete [3d412ba] 2026-06-07 10:38:52 -04:00
ed eae5b0a22b chore(scripts): plan unused scripts cleanup track (5 phases)
5 phases, one per deletion category from the spec:

Phase 1: Remove one-shot indent fixers (10 files)
Phase 2: Remove one-shot transform scripts (6 files)
Phase 3: Remove superseded entropy and code-stat audits (4 files)
Phase 4: Remove one-shot migrators and repros (6 files)
Phase 5: Remove tool-call aliases and legacy tool discovery (4 files)
Phase 6: Final verification + tracks.md update

Each phase = one git rm + one commit + one git note + one
state.toml update. Phase 0 adds the state.toml scaffold. Phase 6
runs the full test suite in 4-at-a-time batches per workflow.md
Phase Completion protocol, re-runs the 2 active audit scripts
(main_thread_imports, weak_types) for regression check, and
commits the tracks.md update.

TDD pattern adapted for deletion: pre-deletion baseline (Phase 0)
+ per-phase git rm + post-deletion test suite pass (Phase 6).
No new code, no new tests, no new CI gate.
2026-06-07 10:26:49 -04:00
ed 87098a2ec3 chore(scripts): spec unused scripts cleanup track
Design for removing 30 confirmed-unused one-off scripts from
scripts/. Net effect: scripts/ shrinks from 56 -> 26 files
(54% reduction). All deletions are hard deletes via 5 atomic
per-category commits; git log is the restore path.

26 KEEPS documented by category (CI gates, MMA, MCP, test runner,
ImGui linter, audit/scaffolding, tool-call bridge, Docker, borderline
utility). 30 DELETES grouped by category: one-shot indent fixers
(10), one-shot transform scripts (6), superseded entropy audits (4),
one-shot migrators/repros (6), tool-call aliases and legacy tool
discovery (4).

No new CI gate added. Follow-up unused_scripts_audit_20260607
recorded in the spec. Plan (writing-plans) will produce 5 phases
(one per category).
2026-06-07 10:19:20 -04:00
ed 02239bc38f conductor(plan): mark sub-track 2A (pydantic in models.py) complete [01ddf9f1]
Resuming sub-track 2 (audit violations) per user direction. Sub-track 2A cleared 1 of 61 violations (pydantic in src/models.py via PEP 562 __getattr__ + pydantic.create_model). 60 remain across file_cache (4), api_hooks (4), sloppy (5), app_controller (23), gui_2 (24). Next: 2B (tree_sitter in file_cache.py).
2026-06-07 10:03:48 -04:00
ed f09cd4a733 conductor: doc final sync for sub-tracks 2 (partial), 3, 4 + conftest fix 2026-06-06 21:45:27 -04:00
ed bb2ac6c9c0 conductor: finalize startup_speedup_20260606 docs (sub-track 1 + 3 post-shipping fixes) 2026-06-06 20:45:58 -04:00
ed cf01870b35 conductor(plan): write 7-phase implementation plan for mcp_architecture_refactor_20260606
~25 tasks across 7 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.5): Foundation. 3-layer security module (8 unit tests
  returning Result[Path]); SubMCP Protocol + MCPController class (6 unit
  tests). Controller added ALONGSIDE the existing 45 functions in
  mcp_client.py (no removal yet).
- Phase 2 (2.1-2.4): Backward compat. git mv mcp_client.py to
  mcp_client_legacy.py; create new mcp_client.py as a slim shim
  re-exporting 45+ old symbols. 12 legacy shim tests verify the surface.
  The 4 existing test files + src/app_controller.py:61 still work.
- Phase 3 (3.1-3.4): FileIOMCP extracted (9 tools, 10 unit tests).
- Phase 4 (4.1-4.4): PythonMCP extracted (14 tools, 14 unit tests).
- Phase 5 (5.1-5.5): CMCP, CppMCP, WebMCP, AnalysisMCP extracted
  (4 sub-MCPs, 18 unit tests; pattern mirrors Phase 3/4).
- Phase 6 (6.1-6.3): ExternalMCP extracted from mcp_client_legacy.
  Class name preserved (ExternalMCPManager).
- Phase 7 (7.1-7.5): Update dispatch() in the legacy shim to use the
  new controller (inverted-dict O(1) lookup); update docs; manual
  smoke test; archive the track.

Each sub-MCP follows the same template (class with name / description
/ tools / invoke; security check for path-taking tools; Result wrapping
in invoke(); delegation to legacy functions for the actual implementation).
The sub-MCPs are thin adapters in v1; a future track can move the
implementations into the sub-MCP files directly.

Self-review at the end maps every spec section to a task (no gaps),
confirms zero placeholders, and verifies type/method-name consistency
across phases (SubMCP Protocol, MCPController class, Result[str,
ErrorInfo], _resolve_and_check all defined in Phase 1; used
consistently across Phases 3-6).
2026-06-06 20:43:48 -04:00
ed 2720a8940c conductor(track): Initialize mcp_architecture_refactor_20260606
Track + metadata + state + tracks.md registration for the 2,205-line
mcp_client.py split into a slim controller + 6 native sub-MCPs + 1
external sub-MCP.

Key design decisions (per user feedback):
- Naming convention: mcp_<type>.py for native MCPs (mcp_file_io.py,
  mcp_python.py, mcp_c.py, mcp_cpp.py, mcp_web.py, mcp_analysis.py).
- ExternalMCPManager class name preserved (moves to mcp_external.py).
- Sub-MCP shape: class with name / description / tools / invoke().
- MCPController: holds ALL_SUB_MCPS list, inverted-dict tool lookup,
  3-layer security (extracted to mcp_client_security.py), schema
  aggregation.
- Each invoke() returns Result[str, ErrorInfo] (from
  data_oriented_error_handling_20260606).
- Backward compat: mcp_client_legacy.py re-exports all 45+ old
  symbols; the 4 existing test files + src/app_controller.py:61
  direct call continue to work.

DSL future (per user notes on APL/K/Cosy): NOT in this track.
Documented in spec §12.1 as the mcp_dsl_20260606 follow-up.
Sub-MCP architecture is the natural unit to pair with a DSL emitter.

7 phases. ~22 task slots. New tests: 9 (one per sub-MCP + controller +
security + legacy). Modified tests: 4 (existing mcp_* tests must
pass unchanged).

Blocked by: data_oriented_error_handling_20260606, data_structure_strengthening_20260606.
Blocks: mcp_dsl_20260606 (future DSL track).
2026-06-06 20:34:00 -04:00
ed 9147578155 conductor(plan): write 2-phase implementation plan for data_structure_strengthening_20260606
~22 tasks across 2 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.12): Foundation. type_aliases.py (10 TypeAliases + 1
  NamedTuple) with 8 unit tests. Mechanical replacement of 345 weak
  sites in 6 files (ai_client 139, app_controller 86, models 51,
  api_hook_client 32, project_manager 20, aggregate 17). Each file
  has a per-substitution table for the mechanical replacement. Audit
  script gains --strict mode + baseline file (CI gate). 4 audit tests.
- Phase 2 (2.1-2.10): FileItemsDiff NamedTuple integrated.
  generate_type_registry.py (AST-based; 3 modes: default, --check,
  --diff). Initial registry generated in docs/type_registry/ (8+ .md
  files). 6 generator tests. Type aliases styleguide + product-guidelines
  updates. Manual smoke test. Track archived.

The type registry generator uses --check mode for CI: it regenerates to
a temp dir and diffs against the committed registry; exit 1 if drift.
The agent's track-completion workflow is: regenerate -> review diff ->
commit. CI enforces --check on every PR.

Self-review at the end maps every spec section to a task (no gaps),
confirms zero placeholders, and verifies type/method-name consistency
across phases (all 10 aliases + FileItemsDiff defined in Task 1.2; used
consistently in Tasks 1.3-1.8 and Phase 2).
2026-06-06 18:15:15 -04:00
ed 95d1b08142 conductor(plan): Final track summary - 9 phases, 50 tests, 3066ms saved 2026-06-06 18:08:59 -04:00
ed 432c789524 conductor(spec): add registry-drift risk to §9 2026-06-06 18:07:48 -04:00
ed aba35f9f4a conductor(spec): Add type registry to data_structure_strengthening track
Per user feedback (2026-06-06): instead of a follow-up 'TypedDict
Migration' track, add a NEW deliverable: an auto-generated type registry
in docs/type_registry/ that captures the field information in docs form.

New files:
- scripts/generate_type_registry.py (NEW): AST-based tool that reads
  src/ and writes per-source-file .md files with the fields of every
  @dataclass, NamedTuple, TypeAlias, TypedDict. Has --check (CI mode,
  exits 1 if registry would change) and --diff (dry run) modes.
- docs/type_registry/ (NEW, generated): index.md + per-source-file
  references (type_aliases.md, ai_client.md, models.md, etc.).
- tests/test_generate_type_registry.py (NEW): verify the generator.

Architecture updates:
- Section 3.6 (NEW): Type Registry architecture with example output.
- Section 3.7 (NEW): Why per-source-file docs (locality of reference).
- Section 1.1 (NEW): 'Why docs over TypedDict' analysis (3 reasons:
  lower upfront cost, better fit for AI workflow, auto-maintained).
- Goals table: registry added as a C (innovation) goal.
- Module layout: docs/type_registry/ and scripts/generate_type_registry.py
  added to the new files list.
- Migration: Phase 2 now includes the registry generator + initial docs.
- Out of scope: TypedDict migration REMOVED; 'auto-typing the field
  shape' added with the docs as the chosen approach.
- See Also: TypedDict follow-up REPLACED with 'Registry Maintenance &
  CI Integration' (smaller scope, just wires the generator into CI).

The 'cost we eat' is the LLM reading 200-500 lines of markdown per
query. This is bounded and proportional to actual information need.
The upfront cost of designing TypedDict schemas for every type is
unbounded. Tradeoffs favor the docs approach for v1; TypedDict can
come later as a future track if desired.
2026-06-06 18:06:34 -04:00
ed ed42a97a9b conductor(track): Initialize data_structure_strengthening_20260606
Track + metadata + state + tracks.md registration for the type-aliases
refactor that follows the audit_weak_types.py findings (430 weak sites
across 29 of 61 files; 86% concentrated in 6 high-traffic files).

Key design decisions (per user approval):
- 10 TypeAlias definitions in src/type_aliases.py (Metadata, CommsLogEntry,
  CommsLog, HistoryMessage, History, FileItem, FileItems, ToolDefinition,
  ToolCall, CommsLogCallback).
- 1 NamedTuple (FileItemsDiff) for the _reread_file_items return.
- Mechanical replacement of 345 weak sites across 6 files (NOT 430; the
  remaining 85 are in 23 lower-impact files deferred to future tracks).
- scripts/audit_weak_types.py gains a --strict mode and a baseline file
  (scripts/audit_weak_types.baseline.json) so the count is enforced.
- 2 phases: aliases + 6-file replacement + audit baseline; NamedTuples
  + docs + archive.
- Honest about what's missing: TypedDict / @dataclass migration is a
  follow-up track (typed_dict_migration_20260606), not this one.
- Coexistence with the data_oriented_error_handling_20260606 track's
  Result[T] / ErrorInfo: the aliases are value-level (data types), Result
  is control-level (wrapper). They compose (Result[FileItems] is valid).
  No conflict.

Audit baseline:
- Pre-track: 430 weak sites, 0 strong patterns
- Target after Phase 1: ~60 weak sites (only the 23 lower-impact files)
- Top 4 unique type strings account for 86% of findings (4-6 aliases
  eliminate the bulk of the noise).

Not blocked by anything; can be executed independently of the other
pending tracks. Blocks typed_dict_migration_20260606 (the future Phase 2).
2026-06-06 17:49:22 -04:00
ed b91962e458 conductor(plan): Mark Phase 5D complete - gui_2 lazy proxy + dead import removal 2026-06-06 17:19:14 -04:00
ed f7b11f7f1c conductor(plan): write 5-phase implementation plan for data_oriented_error_handling_20260606
~25 tasks across 5 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.9): Foundation. Post-tracks baseline verification, typing_extensions
  dep, src/result_types.py (10 unit tests), conductor/code_styleguides/error_handling.md
  canonical reference, product-guidelines.md + workflow.md updates.
- Phase 2 (2.1-2.7): mcp_client.py refactor. _resolve_and_check returns Result[Path];
  all 9 tool functions return Result[str]; 30+ 'assert p is not None' chain removed;
  tool dispatch updated; existing tests migrated to .data/.errors pattern.
- Phase 3 (3.1-3.8): ai_client.py refactor (HIGHEST RISK). _classify_<vendor>_error()
  returns ErrorInfo (not raise ProviderError); _send_<vendor>() renamed to
  _send_<vendor>_result() returning Result[str] (8 vendors); ProviderError class
  REMOVED; new public send_result() API; send() marked @deprecated (rewired to
  call send_result() and unwrap).
- Phase 4 (4.1-4.5): rag_engine.py refactor. _init_vector_store, _validate_collection_dim
  return Result; NilRAGState used; broad except Exception becomes ErrorInfo entries.
- Phase 5 (5.1-5.7): Deprecation wiring (filterwarnings in conftest.py to silence
  send() warning in existing tests), docs updates (guide_ai_client + guide_mcp_client),
  follow-up track public_api_migration_20260606 placeholder in tracks.md, manual
  smoke test, archive the track.

Coordination with the 3 pending tracks (startup_speedup, test_batching_refactor,
qwen_llama_grok_integration) addressed throughout. Phase 1 Task 1.1 verifies the
baseline before any refactor begins. Post-tracks state considerations from spec
§10 fully integrated into the task breakdown.

1-space indentation per project style guide. No placeholders. All test code
is concrete. Self-review at end confirms full spec coverage (every section
of spec.md mapped to a task).
2026-06-06 17:06:30 -04:00
ed 32edad0a4b conductor(plan): Mark Phase 5A-5C complete (commands, theme_2, markdown_helper lazy imports) 2026-06-06 17:01:05 -04:00
ed cbc3b075a0 conductor(track): Initialize data_oriented_error_handling_20260606
Track + metadata + state + tracks.md registration for the Fleury-pattern
error handling refactor.

Key design decisions (per user approval):
- Option A for _send_<vendor>() handling: rename to _send_<vendor>_result()
  and change return type to Result[str] (contained to internal callers).
- send() is marked @typing_extensions.deprecated; send_result() is the new
  public API.
- ProviderError exception is FULLY REPLACED by ErrorInfo dataclass
  (a value, not an exception).
- 5 phases: foundation, mcp_client, ai_client, rag_engine, deprecation+archive.
- Post-tracks baseline check (Phase 1 Task 1.1) verifies the 3 pending
  tracks have merged before proceeding.
- 9 Open Questions, 7 Risks, 5 verification criteria, follow-up track
  public_api_migration_20260606 planned in spec §12.1.

Blocked by: startup_speedup_20260606, test_batching_refactor_20260606,
qwen_llama_grok_integration_20260606. Blocks: public_api_migration_20260606.
2026-06-06 16:58:22 -04:00
ed 494f68f9d9 conductor(spec): Add 'Coordination with Pending Tracks' section (§10)
This track executes after startup_speedup, test_batching_refactor, and
qwen_llama_grok_integration land. Section 10 documents the expected
post-tracks codebase state and answers 6 critical coordination questions:

- Q1: Existing _send_<vendor>() functions (returning str) are renamed
  to _send_<vendor>_result() and changed to return Result[str] (Option A:
  clean rename, contained to internal callers).
- Q2: send_openai_compatible in src/openai_compatible.py STAYS as-is
  (it raises at the SDK boundary; correct per Fleury). The new
  _send_<vendor>_result() functions catch and convert to ErrorInfo.
- Q3: Deprecation warning on send() will produce Python warnings in
  tests; filterwarnings in conftest.py silences them during transition.
- Q4: The except ProviderError clauses in src/ai_client.py become
  dead code after the refactor and are removed in Phase 3.
- Q5: ProviderError is FULLY REPLACED by ErrorInfo (a value, not an
  exception). ProviderError removed entirely; ErrorInfo is the new
  error type.
- Q6: ProviderError.ui_message() moves to ErrorInfo.ui_message().

Phase 1 also adds a baseline verification task to confirm the 3 pending
tracks have merged before proceeding.

Also renumbered Out of Scope (11) and See Also (12) sections to
preserve monotonic section numbers.
2026-06-06 16:54:25 -04:00
ed 16291234ff conductor(plan): Record Phase 4 checkpoint SHA 883682c1 2026-06-06 16:37:27 -04:00
ed a0ff1bde91 conductor(plan): Mark Phase 4 complete - app_controller fastapi import removal + _require_warmed lift 2026-06-06 16:36:20 -04:00
ed 7fb13fbf4b conductor(plan): Record Phase 3 checkpoint SHA + mark T3.6 complete 2026-06-06 16:13:35 -04:00
ed 8905c26bff conductor(plan): Mark Phase 3 complete - ai_client SDK import removal done 2026-06-06 16:11:14 -04:00
ed 9eed60238a conductor(plan): mark T3.1 RED done; T3.2 holding for MCP fix (16780ec6) 2026-06-06 15:16:02 -04:00
ed b17cbbdeca conductor(plan): write 6-phase implementation plan for qwen_llama_grok_integration_20260606
~30 tasks across 6 phases, each with explicit Red-Green-Refactor TDD steps:
- Phase 1 (1.1-1.8): Capability matrix framework (src/vendor_capabilities.py)
  + shared OpenAI-compatible helper (src/openai_compatible.py). 13 unit tests.
- Phase 2 (2.1-2.8): Qwen via DashScope native SDK. 5 unit tests.
- Phase 3 (3.1-3.7): Grok (xAI) + Llama (Ollama + OpenRouter + custom URL)
  via shared helper. 8 unit tests.
- Phase 4 (4.1-4.3): MiniMax refactor (_send_minimax from ~250 -> ~50 lines).
  Safety net: existing tests/test_minimax_provider.py.
- Phase 5 (5.1-5.5): 9 capability-driven UX adaptations in src/gui_2.py.
  Manual smoke test for all 3 new vendors.
- Phase 6 (6.1-6.4): Update docs/guide_ai_client.md + guide_models.md.
  Archive the track.

Data-oriented design: shared helper is the algorithm on normalized data;
_send_<vendor>() entry points are thin boundary adapters.

1-space indentation per project style guide. No placeholders. All test
code is concrete. Self-review at end confirms spec coverage (every
section of spec.md mapped to a task).
2026-06-06 15:06:30 -04:00
ed 97daaff29b conductor(spec): Fix Qwen-Audio matrix entry consistency (vision=false, audio deferred)
The capability matrix v1 has no 'audio' field (audio_input is deferred to v2).
Qwen-Audio's vision flag was incorrectly marked true. Changed to false and
clarified that v1 uses Qwen-Audio as text-only; audio attachment UI is
hidden via the absent audio capability check.
2026-06-06 14:58:03 -04:00
ed 7c1d597ef1 conductor(track): Initialize qwen_llama_grok_integration_20260606 spec
Three new vendors + capability matrix framework + MiniMax refactor:

**Capability matrix v1 (7 features):** vision, tool_calling, caching, streaming,
model_discovery, context_window, cost_tracking. Audio and server-side code
execution deferred to a follow-up track.

**Qwen via DashScope native SDK:** Qwen-Turbo, Qwen-Plus, Qwen-Max, Qwen-Long
(1M context), Qwen-VL-Plus/Max (vision), Qwen-Audio. Native API chosen over
OpenAI-compatible mode to unlock Qwen-Audio, Qwen-Long custom chunking, and
Qwen-VL-Max enhanced vision.

**Llama (OpenAI-compatible, multi-backend):** Ollama (local, free), OpenRouter
(cloud aggregator covering Together/Groq/Fireworks), custom URL escape hatch.
Models: Llama 3.1 8B/70B/405B, 3.2 1B/3B, 3.2 11B/90B Vision, 3.3 70B.

**Grok via xAI (OpenAI-compatible):** Grok-2, Grok-2-Vision, Grok-Beta.

**Shared OpenAI-compatible helper** in src/openai_compatible.py processes a
normalized request/response data structure; each _send_<vendor>() is a thin
adapter at the boundary (data-oriented design per Fleury/Acton/Lottes).

**MiniMax refactor:** ~250 lines reduced to ~50 by using the shared helper.
Existing test_minimax_provider.py is the safety net.

**UX adaptation:** 9 UI elements (screenshot, tools toggle, cache panel, stream
progress, fetch models, token budget, cost panel) read from the matrix instead
of hard-coding per-vendor branches.

**Out of scope (deferred):** Anthropic/Gemini/DeepSeek migration to the matrix
(separate track), audio input, server-side code execution, PDF input, batch API,
fine-tuning.

6 phases planned: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX
adaptation, docs+archive.
2026-06-06 14:56:00 -04:00