Private
Public Access
0
0
Commit Graph

1394 Commits

Author SHA1 Message Date
ed d2ff6ffcf9 conductor(plan): Phase 7 complete - test_bed_health report 2026-06-09 16:59:16 -04:00
ed 3ed52be4bf conductor(plan): Phase 6 complete - clean_baseline marker 2026-06-09 16:42:48 -04:00
ed afc8600800 conductor(plan): Phase 5 complete - set_value hook verified 2026-06-09 16:35:18 -04:00
ed 6764c9e12f conductor(plan): Phase 4 complete - coalesce _sync_rag_engine 2026-06-09 16:27:15 -04:00
ed 45b4497a66 conductor(plan): Phase 3 complete - tmp_path_factory + live_gui_workspace fixture 2026-06-09 16:15:50 -04:00
ed 05ddb45236 conductor(plan): Phase 2 complete - FR1 handle + autouse fixture 2026-06-09 15:43:38 -04:00
ed 30c04860c7 conductor(plan): Phase 1 audit complete - ready for user review 2026-06-09 15:30:31 -04:00
ed 5df22fa8d5 conductor(audit): trace set_value('ai_input') flow to find routing bug 2026-06-09 15:29:27 -04:00
ed 5e13fa9ba7 conductor(audit): document _sync_rag_engine race in controller 2026-06-09 15:29:17 -04:00
ed aebbd66836 conductor(audit): document hardcoded workspace paths in test suite 2026-06-09 15:29:06 -04:00
ed d1c6c6c327 conductor(audit): catalog live_gui test cross-file state dependencies 2026-06-09 15:28:56 -04:00
ed fcb161fd2e conductor(tracks): add test_infrastructure_hardening_20260609 as foundation track + supersede 4 placeholder test tracks 2026-06-09 15:18:20 -04:00
ed 566cf08cb8 conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare 2026-06-09 15:15:26 -04:00
conductor-tier2 ac0c0cbe73 docs(styleguide): add No-Diagnostic-Noise rule to AI-Agent Conventions
One addition to conductor/code_styleguides/python.md §8
"AI-Agent Specific Conventions":

- **No diagnostic noise in production code (Added
  2026-06-09).** `sys.stderr.write(f"[XYZ_DIAG] ...") lines
  in src/*.py are technical debt. The right place for
  one-time investigation output is tests/artifacts/<test>.diag.log
  (a log file) or a standalone /tmp/diag_<name>.py script.
  If you must instrument production code, the diag lines
  are part of the same atomic commit as the fix.

- **Test files ARE allowed to be diagnostic.** The rule
  applies to src/*.py only; tests/test_*.py may use
  print(..., file=sys.stderr) freely.

Markdown only. No code modified.
2026-06-09 14:03:18 -04:00
conductor-tier2 631c40c9c4 docs(workflow): add Process Anti-Patterns section + Isolated-Pass rule
Two additions to conductor/workflow.md §"Known Pitfalls":

1. **Isolated-Pass Verification Fallacy (Added 2026-06-09)** —
   the rule that a test passing in isolation but failing in
   batch is FAILING. The only verification that matters for
   live_gui tests is the batch run. This is the flip side of
   the existing "Live_gui Test Fragility (Authoring-Side)"
   rule. Cross-references that rule.

2. **Process Anti-Patterns (Added 2026-06-09)** — 8-rule
   summary list, with cross-reference to AGENTS.md for the
   full ruleset. The 8 patterns are: Deduction Loop,
   Report-Instead-of-Fix, Scope-Creep Track-Doc,
   Inherited-Cruft, Diagnostic Noise in Production, Premature
   Surrender, Verbose Commit Message, Isolated-Pass
   Verification Fallacy.

Markdown only. No code modified. Cross-references
AGENTS.md (the load-bearing agent doc) for the full text
of each pattern.
2026-06-09 14:03:00 -04:00
conductor-tier2 d7dc1e3b90 docs(edit-workflow): fix set_file_slice rule + add contract-change check
Three surgical fixes to conductor/edit_workflow.md:

1. **§2 "Verify Before Editing"** — removed the leftover
   `git checkout -- src/gui_2.py` instruction. The user's
   commit `4eba059e unfuck edit workflow` removed most of
   the git checkout nuke instructions but missed §2. The
   revised §2 now says: read the contract (function signature,
   yield shape, return type) before editing, and DO NOT use
   `git checkout` to revert. Ask the user.

2. **§3 "Reading Before Editing"** — added the line-number
   offset check. `set_file_slice` uses 1-indexed inclusive
   `start_line`/`end_line`; off-by-one is a common silent
   failure. The rule is now: confirm the exact line range
   with `get_file_slice` first.

3. **§8 "set_file_slice IS Valid for Multi-Line Content
   (Revised 2026-06-09)"** — replaced the wrong rule
   ("Do not use set_file_slice for multi-line content") with
   the correct rule: set_file_slice IS valid for 3-10 line
   surgical edits, with a tool-selection guide (which tool
   for which job), a mandatory contract-change check
   (search for callers of the symbol being changed; update
   all callers in the same atomic commit if the public
   interface changes), and a mandatory whitespace-and-EOL
   rule (preserve line ending, indentation, and line count).

4. **§9 "No Diagnostic Noise in Production Code
   (Added 2026-06-09)"** — new section. Diag stderr goes
   to log files or /tmp scripts, NOT src/*.py. If you must
   add diag lines to production code, they are part of the
   same atomic commit as the fix — they do not live
   uncommitted in the working tree.

5. **"If set_file_slice produces wrong indentation"** —
   new handler in the Step-by-Step Workflow. Tells the
   agent: you wrote the wrong indent; the tool did what
   you asked; re-read the file with get_file_slice; do
   NOT use git checkout to revert.

These are the rule corrections the user demanded after
the Tier-2's bad set_file_slice + git nuke + diag-noise
behavior. Markdown only. No code modified.
2026-06-09 14:02:41 -04:00
ed 4eba059e89 unfuck edit workflow. 2026-06-09 13:48:17 -04:00
ed b801b11c3b conductor(todo): mark task 9 (test deps in dev + conftest gate) as shipped 2026-06-09 10:39:29 -04:00
conductor-tier2 999fdea467 docs(c11-interop): cross-reference SSDL digest in See Also
The SSDL digest (docs/reports/computational_shapes_ssdl_digest_20260608.md,
504 lines, 30KB) is the theoretical foundation for the chunkification
pattern. Per the digest's Technique 5 "Assume-away (Xar)" in §2.2
and the "Xar-style chunked arrays" recommendation in §5.2, the
chunkification track is a *direct application* of the SSDL's
"assume as much as possible" lens (§4).

This commit adds the SSDL digest to the See Also of the v1+v2
C11-Python interop assessment (front-matter Cross-references line).
The same cross-reference is also being added to:
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
  (in a new §6.1 "SSDL alignment" subsection)
- conductor/tracks/manual_ux_validation_20260608_PLACEHOLDER/spec.md
  (in §5 Architectural Reference + §6 See Also + a new §2.6
  "SSDL cross-reference" section that distinguishes GUI ASCII
  vocabulary from SSDL vocabulary)

No code modified. Cross-reference only.

Also: small update to conductor/tracks.md to add the 2 new
tracks (manual_ux_validation_20260608_PLACEHOLDER as Active;
chunkification_optimization_20260608_PLACEHOLDER as Backlog/Contingency).
2026-06-08 23:42:21 -04:00
conductor-tier2 5b3c11a0f3 conductor(track): manual_ux_validation_20260608_PLACEHOLDER - ASCII-sketch workflow + first-target redesign
The user said (verbatim): "On number 1. I love the idea and definitely
see poitental." This commit creates a full track that promotes the
ASCII-sketch UX ideation workflow
(docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to
a real track with a concrete first target.

The track complements (does not replace) the existing
manual_ux_validation_20260302 track (which is a general UX review
track; this 2026-06-08 track is *focused* on the ASCII-sketch
workflow specifically).

Files (5 total, ~52KB, 12,000+ words):
- spec.md (186 lines, 9 sections) - track design, 5 open
  questions, first target analysis, SSDL cross-reference
- plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with
  WHERE/WHAT/HOW/SAFETY annotations
- metadata.json (~120 lines) - structured metadata, 5 open
  questions with defaults, 5 SSDL principles available
- state.toml (~95 lines) - per-task tracking + phase status
- index.md (~50 lines) - track context + related docs

Key design decisions captured:

1. Two distinct vocabularies are conflated at first glance:
   - GUI ASCII (the workflow) for panel sketches
   - SSDL (computational shapes digest) for internal code sketches
   Spec §2.6 makes the distinction explicit; both are useful for
   this track (GUI ASCII for Phase 2 design; SSDL for Phase 3
   internal refactoring documentation).

2. The 5 open questions from the workflow report (Q1 vocabulary,
   Q2 comparison policy, Q3 storage location, Q4 tooling,
   Q5 frequency) are documented with sensible defaults in
   spec.md §2.1-2.5 and metadata.json. The user can override
   any of them; defaults pre-stage the work.

3. First target is src/gui_2.py:3770 render_discussion_entry
   (Discussion Hub per-entry panel). Rationale:
   - Most-edited surface (every AI/user message)
   - User has strong opinions (per nagent_review_20260608 3 rounds
     of corrections)
   - 23-op matrix A1-A7 is the source of truth
   - ImGui layout maps cleanly to ASCII
   - SSDL defusing techniques can guide the internal refactoring

4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first
   target (1-3 ASCII rounds), 3=implement per design contract
   (TDD with 7 test files for A1-A7 operations),
   4=document the pattern + propose 5-7 next targets.

Cross-references added throughout:
- docs/reports/computational_shapes_ssdl_digest_20260608.md
  (the SSDL digest, with explicit "this is a different vocabulary
  for a different purpose" note in spec §2.6)
- docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow)
- docs/guide_discussions.md (the 23-op matrix A1-A7)
- conductor/tracks/nagent_review_20260608/ (the source of the
  user's editable-discussion corrections)
- conductor/tracks/manual_ux_validation_20260302/ (complementary
  general UX review track)
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/
  (the contingency track; referenced in spec §2.6 SSDL cross-ref)

No code modified. Track is active; Phase 1 (5 user-questions) is
the current phase. User-confirmed worth doing in the prior turn.
2026-06-08 23:41:43 -04:00
conductor-tier2 816e9f2f5c conductor(track): chunkification_optimization_20260608_PLACEHOLDER - 1-page contingency document
The user's third correction this session changed the framing
from "build a stateful C extension" to "wait for a hard constraint,
then build a request/response blob pipeline." This commit creates
a 1-page contingency document (no plan.md, no implementation)
that captures:

- The threshold: "only worth it under a hard constraint that
  no existing Python package can solve"
- The shape when activated: subprocess-launch C11 binary with
  request/response blob wire format (NOT stateful CPython C
  extension)
- The 2 cited candidates (markdown parsing into aggregate markdown,
  context snapshot processing) are NOT currently bottlenecks per
  src/aggregate.py:380-454 (pure-Python string concat, zero
  third-party markdown deps in pyproject.toml:6-27) and
  src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity,
  debounced)
- The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 +
  "Xar-style chunked arrays" recommendation in §5.2 pre-support
  this track

Files (4 total, 227+ lines of contingency document):
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md

Cross-references added:
- docs/reports/computational_shapes_ssdl_digest_20260608.md (the
  SSDL digest is the theoretical foundation; explicitly cited in
  the spec's §6.1 "SSDL alignment" and in metadata.json external)
- docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2
  assessment; explicitly cited in spec's §6 See Also)

No code modified. Track does NOT appear in the active queue
of conductor/tracks.md; appears in the Backlog / Contingency
section as a reference, not a commitment.

Activation criteria (per metadata.json):
1. Profiling shows a real bottleneck in a target code path
2. The bottleneck cannot be solved with existing Python packages
3. The user explicitly approves activation

Without all 3, this track stays deferred. Default action is don't.
2026-06-08 23:40:27 -04:00
conductor-tier2 a9333bbb59 conductor(track-update): code_path_audit_20260607 - post-4-tracks timing + 5-source framing
The user specified that the code_path_audit_20260607 track should run
AFTER the 4 foundational tracks complete (qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor). This commit formalizes that timing
and grounds the audit's analytical framing in the 5 sources loaded
into context on 2026-06-08.

3 surgical additions to the spec/plan, no task changes:

1. Post-4-tracks timing (new section in spec.md §"Timing", plus
   a "Timing" callout in plan.md's opening):
   - The 4 tracks will significantly reshape src/ai_client.py,
     src/mcp_client.py, src/app_controller.py, and
     src/type_aliases.py
   - Running the audit on pre-refactor code would produce a
     report that's stale on day 1
   - The post-4-tracks timing ensures the audit grounds
     optimization decisions for the *resulting* architecture
   - Pre-flight check: verify all 4 tracks are [x] completed
     in conductor/tracks.md before starting this track

2. Analytical framing (new section in spec.md §"Analytical Framing
   (5-source lens)"):
   - Maps each of the 5 sources (Fleury taxonomy + Fleury
     combinatoric + Muratori Big OOPs + Reece Assuming + user's
     chunk ideation) to specific audit-time heuristics
   - 4 concrete heuristics: effective-codepath count,
     entity-hierarchy fingerprint, assumed-too-much detector,
     chunkification candidates
   - The heuristics shape REPORT INTERPRETATION, not the
     static cost model (which stays data-grounded in
     EXPENSIVE_THRESHOLD + per-class weights)

3. See Also cross-references in spec.md (6 new entries):
   - nagent_review Pitfalls #2 and #4 (provider history
     globals + stateful singleton)
   - wo84LFzx5nI Big OOPs transcript (full text, 4310
     segments, 200KB; loaded 2026-06-08)
   - i-h95QIGchY Assuming transcript (full text, 3719
     segments, 162KB; loaded 2026-06-08)
   - ed_chunk_data_structures_20260523.md (5-image archive
     of user's chunk ideation, 19KB; saved 2026-06-08)
   - computational_shapes_ssdl_digest_20260608.md (the SSDL
     digest that synthesizes the 4-source computational-shapes
     thinking; the audit's tree/mermaid outputs ARE
     computational-shape visualizations)

4. tracks.md entry updated to include the spec/plan links and
   a brief status note that the audit is post-4-tracks.

5. plan.md has a "Timing" callout at the top stating the 4
   tracks must ship before the plan executes.

No code modified. The audit's tasks (Phases 1-6) are unchanged
in structure; the new sections only add analytical context
and timing constraints.
2026-06-08 22:05:54 -04:00
ed 51ecace464 test(live_workflow): pre-flight health check fails fast on dirty state
PR3 of the test_full_live_workflow_imgui_assert fix sequence.

When a prior live_gui test in the same session crashes the GUI (e.g.
via an ImGui IM_ASSERT from cumulative panel state), the controller's
_io_pool gets shut down. The next test starts in a degraded state
but only discovers this 120s later when its project switch times
out with a confusing 'cannot schedule new futures after shutdown'
error.

This commit adds a /api/gui_health pre-flight check at the start of
test_full_live_workflow. If the GUI is degraded, the test fails
fast (within 1s) with a clear, actionable message that includes:
- The exact RuntimeError that caused the degradation
- The full traceback of the last ImGui scope mismatch
- A note that the new test cannot proceed with a dirty state

Per user feedback 2026-06-08: 'I don't want a batch to be too fragile
where I can't restart the app and continue with the next test file
if it fails. Just has to note that the new file didn't get to deal
with a dirty state.'

Also includes the planning documents written earlier in this session:
- TODO_test_full_live_workflow_v2.md (task list)
- test_full_live_workflow_imgui_assert_20260608.md (root cause report)
- test_full_live_workflow_propagation_digest_20260608.md (solutions digest)
- batch_resilience_plan_20260608.md (batch resilience plan)

Verification:
- test_full_live_workflow in isolation: 13.45s PASS (health=True, no degrade)
- 4 sims + test_full_live_workflow in batch: 76.46s (1 FAIL fast, 4 sims PASS)
  - Without PR3 fix: 200s FAIL with confusing 120s timeout
  - With PR3 fix: 76s FAIL with clear 'GUI is degraded' message
- The fast-fail is observable, not silent (per user's 'wrap might be
  worth it if that properly lets us handle the assert')
2026-06-08 21:17:54 -04:00
conductor-tier2 8a597d1832 conductor(track-update): mcp_architecture_refactor - list_tool_schemas + security-as-contract
4 surgical additions to the spec, no task changes:

1. list_tool_schemas on the SubMCP Protocol: Added the method
   to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6
   (hard-coded tool discovery) and takeaway #5 (self-describing
   tools), each sub-MCP advertises its own capabilities via
   list_tool_schemas() rather than relying on a central registry.
   This is the equivalent of nagent's collect_bin_tool_descriptions
   per sub-MCP. The MCPController.get_tool_schemas() becomes a
   simple aggregator.

2. Security model is the contract: Added a new Important note
   to §3.3 (The 3-Layer Security Model). The 3 layers
   (Allowlist Construction -> Path Validation -> Resolution
   Gate, per docs/guide_mcp_client.md) are not just refactored
   - they are the CONTRACT between MCPController and the
   sub-MCPs. Sub-MCPs receive a pre-validated Path and trust
   it. They do NOT re-validate. The refactor is structural,
   not security-changing.

3. Docs touchpoint in Phase 7: Added the docs touchpoint to
   Phase 7 per the docs Refresh Protocol. The update to
   docs/guide_mcp_client.md should add a Sub-MCP Architecture
   section, link the list_tool_schemas pattern to 3-Layer
   Security Model, and cross-link the 3 new guides from
   the 2026-06-08 docs refresh.

4. See Also cross-references: Added 8 new entries to §12.2:
   - docs/guide_context_aggregation.md (FileItem consumer)
   - docs/guide_state_lifecycle.md (App state delegation)
   - docs/guide_discussions.md (23-operation matrix)
   - conductor/tracks/qwen_llama_grok_integration_20260606/
     (Result return type coordination)
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md
   - (2 specific data_oriented_error_handling and
     data_structure_strengthening cross-refs)

No plan.md changes.
2026-06-08 20:59:27 -04:00
conductor-tier2 1fb0d79c0d conductor(track-update): data_structure_strengthening - HistoryMessage vs ProviderHistoryMessage split
4 surgical additions to the spec, no task changes:

1. ProviderHistoryMessage: Added a new alias to §3.1 (The
   Aliases). Per nagent_review Pitfall #4 (provider history
   divergence), the UI/curation layer (HistoryMessage, edited
   via disc_entries[i].content) and the SDK layer
   (ProviderHistoryMessage, the bytes actually replayed to the
   LLM) are *distinct*. Conflating them via a single alias
   perpetuates the bug. The new alias is documented as a
   separate concept with its own use sites (_anthropic_history,
   _deepseek_history, _minimax_history, _grok_history,
   _llama_history). The follow-up public_api_migration_20260606
   track is the natural moment to unify the two layers; this
   spec just makes the distinction explicit.

2. FileItem alias points to the existing models.FileItem
   dataclass, not Metadata. Per docs/guide_context_aggregation.md
   (added 2026-06-08), FileItem is a 9-field dataclass
   (path, auto_aggregate, force_full, view_mode, selected,
   ast_signatures, ast_definitions, ast_mask, custom_slices,
   injected_at) with a __post_init__ normalizer. Aliasing it to
   dict[str, Any] would lose the type safety. The 9 other
   aliases remain dict aliases for round-trip compatibility.

3. gui_2.py and mcp_client.py as follow-up: Added a Note
   (dated 2026-06-08) to the Out of Scope section. The 23
   lower-impact files (deferred) are dominated by gui_2.py
   (26+ weak sites per guide_state_lifecycle.md) and
   mcp_client.py (will be touched heavily by the parallel
   mcp_architecture_refactor_20260606). The deferral is correct
   but the follow-up should explicitly call out these two
   files as the next targets, rather than implying they're
   handled.

4. See Also cross-references: Added 7 new entries to §12.2:
   - docs/guide_models.md (FileItem dataclass source)
   - docs/guide_context_aggregation.md (FileItems consumer)
   - docs/guide_discussions.md (HistoryMessage shape)
   - docs/guide_state_lifecycle.md (state delegation)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes.
2026-06-08 20:50:50 -04:00
conductor-tier2 0471440c68 conductor(track-update): data_oriented_error_handling - nagent_review + docs refresh
3 surgical additions to the spec, no task changes:

1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to
   the ErrorKind enum. Per nagent_review Pitfall #4 (provider
   history divergence: user edits disc_entries[i].content via
   the discussion UI but ai_client._<provider>_history still
   replays the original). The new kind makes the divergence
   *detectable* and *reportable* so the follow-up
   public_api_migration_20260606 track can collapse the two
   history layers. The Result pattern from this track is the
   natural carrier for the signal.

2. State-delegation regression tests: Added mandatory
   regression tests to the testing strategy in §6 for the
   ai_client refactor (highest-risk phase). The new tests
   exercise:
   - app.temperature = 0.5 round-trips through App.__getattr__/
     __setattr__ delegation (per gui_2.py:666-675)
   - controller.disc_entries[i].content is reflected in the
     next send_result()'s messages parameter
   - The 3 per-provider history locks serialize correctly under
     concurrent send_result() calls
   The reason this is mandatory: per guide_state_lifecycle.md
   (added 2026-06-08), the App.__getattr__/__setattr__ pattern
   means a partial refactor manifests as silent AttributeError
   deep in test code, not at the refactor commit boundary.

3. See Also cross-references: Added 6 new entries to §12.3:
   - docs/guide_ai_client.md (per-provider history globals)
   - docs/guide_mcp_client.md (3-layer security model)
   - docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern)
   - docs/guide_discussions.md (23-operation matrix)
   - docs/guide_context_aggregation.md (build_discussion_section)
   - conductor/tracks/mcp_architecture_refactor_20260606/
   - conductor/tracks/nagent_review_20260608/{report,takeaways}.md

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:41:00 -04:00
conductor-tier2 77ae2ec7a8 conductor(track-update): qwen_llama_grok - spec notes for nagent_review + docs refresh
4 surgical additions to the spec, no task changes:

1. Result return type: Added a coordination note in §3.1 (Data-
   Oriented Design) explaining that the shared send_openai_compatible
   helper should return Result[NormalizedResponse, ErrorInfo] from
   day 1, not NormalizedResponse + ProviderError raise. This is so
   the downstream data_oriented_error_handling_20260606 track is
   a small mechanical pass over new code, not a second migration.
   References nagent_review Pitfall #4 (provider history divergence)
   and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case.

2. Declarative read, not behavioral dispatch: Added clarification
   to §6 (UX Adaptation) that the capability matrix is a *read* of
   declarative data, not a new dispatch layer. Per nagent_review
   Pitfall #1 (opaque function calling in the Application is the
   correct choice; nagent-style protocol is for Meta-Tooling),
   UI elements are visible/enabled/disabled/hidden but the
   *behavior* they invoke is unchanged. Three concrete examples
   added: screenshot button, cost panel, cache panel.

3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout)
   that src/models.py:79-86 PROVIDERS is the existing single
   source of truth for the (vendor, model) enumeration. The
   capability registry reads from this constant rather than
   introducing a parallel list. Cross-references
   docs/guide_models.md.

4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to
   note that docs/guide_ai_client.md needs the new providers +
   the shared helper documented, and that
   docs/guide_context_aggregation.md (added 2026-06-08) is the
   reference for the aggregate.py pipeline that all new providers
   use.

5. See Also cross-references: Added 3 new entries to §13.2:
   - docs/guide_context_aggregation.md (the new pipeline guide)
   - conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15)
   - conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md
     (§1, §2, §9)

No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
2026-06-08 20:35:52 -04:00
conductor-tier2 9cc51ca9af conductor(track): nagent review - deep-dive + 6 pitfalls + 10 actionable takeaways
Reference/analysis track. Produces 0 code changes.

Artifacts (conductor/tracks/nagent_review_20260608/):
- spec.md (240 lines) - track wrapper with Application/Meta-Tooling framing
- report.md (571 lines) - 14-section deep-dive; primary deliverable
- comparison_table.md (79 lines) - flat side-by-side reference
- decisions.md (286 lines) - 10 future-track candidates with priority matrix
- nagent_takeaways_20260608.md (363 lines) - 10 actionable patterns grounded
  in code (file:line refs into nagent source and Manual Slop source)
- metadata.json (132 lines) - structured metadata + verification criteria
- state.toml (113 lines) - per-task tracking + user-corrections log (7 entries)

14 nagent principles covered in report.md (durable work, text-in/text-out,
editable state, visible protocol, the loop, per-file memory, repo history,
neighborhoods, sub-conversations, controlled writes, large files, tool
discovery, framework differences, build your own).

6 pitfalls (revised from 8 after user-corrections):
1. No structured output protocol in Application AI (opaque function calling)
2. Provider-specific history in process globals (ai_client._anthropic_history
   + _deepseek_history + _minimax_history)
3. RAG is not 'history as data' (fuzzy, not auditable)
4. AI client is a stateful singleton (2,685-line ai_client.py)
5. No non-MMA disposable sub-conversations (1:1 gap; user-flagged want)
6. Hard-coded tool discovery (45-tool if/elif in mcp_client.py)

User-corrections applied (3 rounds, 7 total corrections recorded):
- Editable discussions: PARTIAL -> PARITY (DIFFERENT FOCUS) with full A1-A7
  per-entry + B1-B11 discussion-level + C1-C5 undo/redo operation matrix
- Per-file memory: DOMAIN MISMATCH -> MANUAL SLOP IS STRONGER IN
  CURATION DIMENSION (FileItem + ContextPreset vs nagent's inode-keyed
  conversation log; complementary, not equivalent)
- Sub-conversations: MMA has it; 1:1 does not -> 'PARITY for MMA; GAP for
  1:1 discussions' (user wants this)
- RAG: opt-in, not gap; user wants pre-staging via sub-conversation
- Personas: config bundling (can opt out via AI settings)
- Tool discovery: deferred (user has 'intent based DSL' idea but 'no where
  near that ideation yet')

10 actionable takeaways (separate from the 6 pitfalls - those are
diagnosis, these are prescription):
1. State visibility (UI inspector for in-process state)
2. Readable conversation log (text-greppable, not just JSON-L)
3. Sub-agents for 1:1 (HIGH priority - user-flagged)
4. File-identity over file-path (st_dev:st_ino rename-safe)
5. One loop shape visible in diagnostics
6. Visible retry on protocol failure
7. Meta-Tooling DSL (intent-based, deferred)
8. Self-describing tools (subsumed by mcp_architecture_refactor_20260606)
9. Single source of truth for disc_entries + provider history
10. Sub-agent return type constraint (bake into candidate #1 spec)

Domain classification: every recommendation tagged Application / Meta-Tooling
/ Both per docs/guide_meta_boundary.md. nagent lives in the Meta-Tooling
domain; Manual Slop's Application AI is a different kind of thing.

No code modified by this track (reference/analysis only). All 7 files
parse cleanly (JSON, TOML, Markdown). All internal cross-links resolve.
Track is 'active' awaiting human review; future-track candidates live in
decisions.md and nagent_takeaways_20260608.md.
2026-06-08 18:44:35 -04:00
ed 9afc93bce2 fix(app_controller): clear project-switch state in _handle_reset_session
When a prior test in the tier-3-live_gui batch leaves a _do_project_switch
background thread running, the next test's btn_project_new_automated click
sees _project_switch_in_progress=True (from the prior thread) and queues
the new path via _project_switch_pending_path. The queued switch is never
actually submitted to the io_pool, so is_project_stale() stays True and
AI ops (_handle_generate_send) bail with 'project switch in progress;
AI ops disabled'.

Fix: _handle_reset_session now also clears _project_switch_in_progress,
_project_switch_pending_path, and _project_switch_error (under the
existing _project_switch_lock). This way, even if the prior background
thread is still running, the controller reports an idle state and the
new switch can be submitted normally.

Also:
- src/api_hook_client.py: reverted wait_for_project_switch to require
  in_progress=False (was relaxed to return on queued path, which misled
  the caller into thinking the switch was done)
- tests/test_handle_reset_session_clears_project.py: new test
  test_handle_reset_session_clears_project_switch_state asserts
  is_project_stale() returns False after reset
- tests/test_api_hook_client_wait_for_project_switch.py: updated
  test_wait_for_project_switch_does_not_return_on_queued (in_progress
  + matching path should keep waiting, not return early)
- tests/test_live_workflow.py: added pre-wait for any in-flight switch
  before doing btn_reset (so the test waits up to 60s for the prior
  switch to complete if needed)
- conductor/todos/TODO_test_full_live_workflow.md: updated Task 4 with
  the deeper hang analysis and recommended fix

Known follow-up: test_full_live_workflow still hangs in tier-3 batch
even with this fix, because the new _do_project_switch itself is hung
in the io_pool (likely saturation from prior sims' AI discussion turn
workers). Deeper investigation required.
2026-06-08 15:19:30 -04:00
ed 5087ee988d chore: move TODO_test_full_live_workflow.md to conductor/todos/
Following the conductor convention of organizing track-related
artifacts under conductor/. The TODO tracks the test_full_live_workflow
race condition fix and its follow-up items (Tasks 3, 7 still pending;
known batch hang documented).

Tasks 1, 2 (with regression fix), 4, 5, 6 are SHIPPED in prior commits.
2026-06-08 14:05:40 -04:00
ed 4548726a2b conductor(tracks): restructure - chronological by phase + status groupings + active queue table 2026-06-08 12:26:56 -04:00
ed c531cebe03 conductor(plan): review pass — fix cross-references, add NOT_READY + with_errors + Lottes/Valigo, split §3.4 into 8 sub-tasks 2026-06-08 09:38:27 -04:00
ed 64823493c0 conductor(closeout): ship test_batching_refactor_20260606 with CLOSEOUT.md and follow-up recommendation 2026-06-08 08:36:22 -04:00
ed fb6b4bd3eb conductor(tracks): mark test_batching_refactor_20260606 as completed 2026-06-08 01:18:20 -04:00
ed 50bd894f8d conductor(archive): ship test_batching_refactor_20260606 to archive 2026-06-08 01:16:58 -04:00
ed 796eec0058 conductor(plan): mark Phases 2,3 complete in test_batching_refactor_20260606 2026-06-08 01:09:02 -04:00
ed 7610c9c1dc conductor(plan): mark Phase 1 complete in test_batching_refactor_20260606 2026-06-08 00:53:59 -04:00
ed 2b56ab3c5c conductor(track): initialize test_batching_post_refactor_polish_20260607 spec/plan/state 2026-06-08 00:27:32 -04:00
ed 7bcb5a8c07 refactor(config): Route all config I/O through AppController
Eliminates 22 call sites that bypassed the AppController state owner
and read/wrote config.toml directly. AppController is now the single
source of truth for self.config; gui_2.py, commands.py, etc. go
through controller.save_config() / controller.load_config().

Production changes:
- src/models.py: rename load_config -> _load_config_from_disk,
  save_config -> _save_config_to_disk (private I/O primitives)
- src/app_controller.py: add public load_config()/save_config() methods
  that own the state. Update 3 internal call sites and 3 ConductorEngine
  call sites to pass max_workers from self.config
- src/multi_agent_conductor.py: ConductorEngine.__init__ now takes
  max_workers as a parameter (caller responsibility, not I/O primitive)
- src/external_editor.py: get_default_launcher() takes config as a
  parameter; gui_2.py:1311,4776 pass app.config
- src/gui_2.py: 17 sites of models.save_config(X.config) replaced with
  X.save_config() (delegates via __getattr__ to controller)
- src/commands.py: save_all() uses app.save_config()

Test changes (route through controller, not I/O primitive):
- tests/conftest.py: mock_app and app_instance fixtures now patch
  AppController.load_config/save_config instead of models I/O primitives
- 18 other test files: patches renamed from models._save_config_to_disk
  to AppController.save_config (and same for load_config)
- tests/test_app_controller_mcp.py: use SLOP_CONFIG env var instead of
  patching removed CONFIG_PATH module constant
- tests/test_parallel_execution.py: pass max_workers=2 explicitly to
  ConductorEngine (caller no longer reads config)
- tests/test_gui_paths.py: add save_config=MagicMock() to MockApp;
  assert on controller method, not I/O primitive
- tests/test_models_no_top_level_tomli_w.py: still calls private
  _save_config_to_disk directly (the only allowed exception; tests
  the lazy-load behavior of the primitive itself)

New files:
- scripts/audit_no_models_config_io.py: enforces the rule (--strict,
  --json modes; AST-based docstring detection to avoid false positives)
- conductor/code_styleguides/config_state_owner.md: documents the rule

Verification:
- 67 targeted tests pass
- scripts/audit_no_models_config_io.py --strict returns 0

This is the architectural cleanup that surfaced during the
audit_architectural_cheats_20260607 review. Closes the smoke-gun
CONFIG_PATH module constant (already done in 0c7ebf22) AND the
free-function models.load_config/save_config smell.

[conductor(checkpoint): config-iO-refactor-20260607]
2026-06-07 19:54:17 -04:00
ed c9c5535889 docs(workflow): add Skip-Marker Policy section
Per 2026-06-07 user feedback during test_suite cleanup:
"if the intent is to annotate a known failure, fine. But that
known failure must be addressed with priority."

New section between "Per-Task Decision Protocol" and
"Documentation Refresh Protocol" makes the policy explicit:

- Skip markers are DOCUMENTATION, not avoidance
- They're useful for opt-in integration tests, unimplemented
  features, or feature-flag-gated code
- They're NOT useful for pre-existing failures, "I don't
  understand this" issues, or racy tests the agent doesn't want
  to debug
- When adding a marker, MUST document the underlying issue AND
  what the fix would be
- When the fix is in-session reachable, FIX IT INSTEAD of
  skipping — limited context is not an excuse

Includes a 4-question review checklist before adding a skip.
References the existing AGENTS.md "Use skip markers as excuse to
AVOID" rule so the two policies don't drift.
2026-06-07 16:57:54 -04:00
ed 0db5ec3eef conductor(tracks): mark License CVE Audit track as complete
Phase 4 verification complete: 4 atomic commits landed, 28
unit + integration tests passing, the audit script runs
end-to-end against the post-cleanup repo, --strict mode
+ baseline file wired in as the CI gate. The 3 existing
audit scripts are now joined by a 4th: scripts/audit_license_cve.py.

Scope: third-party deps only. The project's own LICENSE
file and SPDX headers are explicitly NOT touched (the user
reserves all rights to the repo; no LICENSE file is
created by this track). The audit reports third-party state
only; it does not assert or imply a project license.

Commits:
  a8ae11d3 - chore(audit): add license_cve audit script + initial report
  20fa3558 - chore(deps): tilde-pin all deps; delete requirements.txt
  a7ab994f - chore(audit): add --strict mode + baseline file (CI gate)
  (this)   - conductor(tracks): mark track complete
2026-06-07 15:28:25 -04:00
ed a8ae11d3a8 chore(audit): add license_cve audit script + initial report
scripts/audit_license_cve.py: 4 internal checks (license +
CVE + pin + source-header), policy tables (allowlist of
permissive/weak-copyleft/public-domain, blocklist of
non-OSI/restricted-source), and a main() that runs all 4
and emits line-per-violation to stdout + a markdown report.

Tests (26 unit + integration) cover license classifier (16
variants across MIT, BSD, Apache, LGPL, MPL, CC0, WTFPL,
GPL, AGPL, SSPL, BSL, Commons Clause, Elastic, Anti-996,
Hippocratic, unknown), pin check (3), source-header check
(3), license check via importlib.metadata (1), CVE check
via subprocess pip-audit (2), and a smoke test of the main
loop (1).

No new pip deps in the project: pure stdlib
(importlib.metadata, tomllib, pathlib, re) + subprocess to
pip-audit (optional dev tool, installed via 'uv tool install
pip-audit' if user wants CVE checks).

Initial report at docs/reports/license_cve_audit/2026-06-07/
records the current state. The Phase 2 commit will apply
the fixes (tilde-pin, delete requirements.txt); the Phase 3
commit will add --strict mode + baseline file for CI.
2026-06-07 15:07:46 -04:00
ed 8af3af5c34 fix(app_controller): correctly construct TrackState with Ticket (not TicketState)
The _push_mma_state_update method (added in 8216d494) used
models.TicketState for the persisted tasks list, but:
  - src.models has no TicketState class; only Ticket
  - TrackState.tasks is annotated as List[Ticket]

So my code raised AttributeError on every call, which my
try/except caught and silently printed. Tests that depended
on save_track_state being called (test_push_mma_state_update)
failed because the call was skipped.

Also fixed:
  - TrackState field name: it's 'tasks' (not 'tickets') per the
    src.models dataclass annotation. My code was using 'tickets='
    which created a TypeError on construction.
  - Removed the [DEBUG ...] print statements added during the
    investigation; they were only for diagnosing the silent
    AttributeError.
  - Kept the try/except so a real exception is still logged to
    stderr (visible via -s flag) without breaking the test.

Result: 11/11 tests in test_gui_phase4 + test_ticket_queue now
pass:
  - test_push_mma_state_update
  - test_ticket_priority_default/custom/to_dict/from_dict
  - TestBulkOperations::test_bulk_execute/skip/block (3)
  - TestReorder::test_reorder_ticket_valid/invalid (2)
2026-06-07 14:32:29 -04:00
ed 61b5572e2b chore(audit): spec license_cve_audit track (compliance + CVE + pinning)
Builds scripts/audit_license_cve.py: single audit script that
checks third-party deps (pyproject.toml + uv.lock transitive
tree) for: (1) license compliance against the project's policy,
(2) known CVEs (via pip-audit subprocess), (3) version-pinning,
and (4) source-file SPDX license headers in src/ and scripts/.

LICENSE POLICY (encoded in the script)
Allowlist (permissive or weak copyleft or public domain):
- Permissive: MIT, BSD, Apache 2.0, ISC, Unlicense, Zlib,
  Python-2.0, 0BSD, PSF-2.0
- Weak copyleft (Python import-safe): LGPL 2.1/3.0, MPL-2.0
- Public domain: CC0, WTFPL

Blocklist (non-OSI / restricted-source):
- GPL (any version), AGPL (any version)
- SSPL (MongoDB 2018) - broad service-provider trigger
- BSL / BUSL - delayed open source; competitive-use restriction
- Commons Clause - 'cannot sell the software' addendum
- Elastic License v2 - 'cannot offer as managed service'
- Unknown / unparseable / missing metadata (catches packaging
  bugs and custom licenses)

The two lists are explicit. Default rule: unknown = violation
(never auto-pass). The script's --help references the policy
table for transparency. Specific per-license additions go in
scripts/audit_license_cve.py directly; no spec change needed.

TRACK SCOPE
In scope: third-party deps (direct + transitive), source-file
SPDX headers, vendored libraries (defensive), version pinning.
Out of scope: the project's own LICENSE file, project's own
SPDX/Copyright headers, recommendations on project license.
The user reserves all rights to the repo; no LICENSE file is
created by the track. The audit reports third-party state only.

OUTPUT FORMAT (sanitized: no JSON in user-facing output)
- Stdout: line-per-violation, parseable by eye and by grep
- Markdown report in docs/reports/license_cve_audit/2026-06-07/
- Baseline file: JSON (matches existing audit_weak_types
  convention; internal state for --strict mode only)

CI GATE
--strict mode + scripts/audit_license_cve.baseline.json. Fails
CI on any new violation OR any new CVE. Mirrors the 3 existing
audit scripts (audit_main_thread_imports, audit_weak_types,
check_test_toml_paths).

COMMITS PLANNED
1. chore(audit): add license_cve audit script + initial report
2. chore(deps): tilde-pin all deps; delete requirements.txt
3. chore(audit): add --strict mode + baseline file (CI gate)
4. conductor(tracks): mark License CVE Audit track complete

NO NEW PIP DEPENDENCIES IN PROJECT
Pure stdlib (importlib.metadata, tomllib, pathlib, re) +
subprocess to pip-audit (an optional dev tool, installed via
'uv tool install pip-audit' if user wants CVE checks).
2026-06-07 14:26:22 -04:00
ed ad13007352 chore(audit): switch output format from JSON to custom postfix DSL
Per user direction ('make a custom DSL ideal for recording the
call-graph or other metrics', 'I want a post-fix heiarchy', 'JSON
is ill-performant'): replaced JSON serializer with a custom
postfix (RPN) DSL tailored to the audit's record shapes.

THE CUSTOM DSL
- Postfix (operands before operator); no brackets, braces,
  commas, or colons.
- Length-prefixed lists: N items followed by 'list' word.
- Tagged records: each 'word' is a constructor with a known
  arity (action=3, fn=3, call=1, mut=3, exp-op=5, pair=2, int=1).
- Whitespace-tokenized; bare atoms unquoted; double quotes
  only when whitespace/special chars present.
- nil for null; backslash for line comments; true/false for bool.
- Trivial parser (~30 lines): _tokenize_dsl splits on
  whitespace and respects quotes + comments; parse_dsl
  walks tokens and evaluates tagged words against a known
  arity table (DSL_WORD_ARITY).
- Round-trips: to_dsl(profile) -> parse_dsl(to_dsl(profile))
  yields the same in-memory structure.

DELIVERABLES (updated spec + plan)
- src/code_path_audit.py: to_dsl, dump_dsl, parse_dsl,
  _tokenize_dsl, to_tree (prefix-tree text renderer),
  to_markdown, to_mermaid.
- Output: .dsl files (machine) + .tree (human prefix view) +
  .md (summary tables) + .mmd (Mermaid diagrams).
- No new pip dependencies; pure stdlib.

WHAT STAYED
- The 7 cost classes (file_io, network, ast_parse, json_io,
  pickle, deep_copy, loop_amplified) and 5 mutation kinds
  are unchanged. The json_io cost class is for JSON file
  I/O the audit detects, not the output format.
- 36 tests total (15 + 8 + 10 + 3 across the 4 implementation
  phases).
2026-06-07 12:17:56 -04:00
ed 803f87137b chore(audit): plan code path audit track (6 phases, 30 tests)
6 phases, one per commit:
Phase 1: data structures (CallGraph, ExpensiveOp, StateMutation)
  - 15 unit tests
Phase 2: trace_action + ActionProfile + cost model + AST walking
  - 8 tests (synthetic + integration on real src/)
Phase 3: JSON / markdown / Mermaid output
  - 4 tests
Phase 4: MCP tool + CLI surface
  - 3 tests
Phase 5: run audit on 3 actions; commit report
Phase 6: tracks.md update

TDD pattern: each task has synthetic-data unit test, then
real implementation, then integration with real src/, then
commit. The state.toml scaffold is created in Phase 0 Step 0.1
and advanced after each phase.

3 actions in scope (MMA is cold per user):
- ai_message_lifecycle (5 entry points)
- discussion_save_load (4 entry points)
- gui_startup (3 entry points)

Two follow-up tracks recorded but NOT in this track:
- pipeline_runtime_profiling_20260607
- pipeline_pruning_20260607

No new pip dependencies; pure stdlib (ast, json, pathlib,
dataclasses). Read-only on src/; new files are the tool, the
tests, and the report under docs/reports/code_path_audit/2026-06-07/.
2026-06-07 11:37:40 -04:00
ed c82207b191 conductor(plan): mark phase 6 complete [9647b8d] 2026-06-07 11:31:43 -04:00
ed 9647b8d228 conductor(tracks): mark Unused Scripts Cleanup track as complete
Phase 6 verification complete: 5 atomic per-category commits landed,
non-GUI test suite passes, 2 audit scripts (main_thread_imports,
weak_types) report no new violations, ImGui linter reports the
3 pre-existing src/gui_2.py findings (src/ untouched by this
track; informational mode exit 0). scripts/ shrinks from 56 to
26 files (54% reduction).
2026-06-07 11:30:29 -04:00
ed f069a8b27b chore(audit): spec code path audit track
Design for a data-oriented static-analysis tool
(src/code_path_audit.py) that audits the 3 major actions (AI
message lifecycle, discussion save/load, GUI startup) for
expensive operations, redundant calls, and pipelining
candidates. Output: JSON data files + markdown summaries +
Mermaid per-action call graphs in docs/reports/code_path_audit/.

61 src/ files, 27,447 total lines. Call graph is non-trivial;
per-action traversal is what makes analysis tractable.

Cost model: 7 cost classes (file_io, network, ast_parse,
json_io, pickle, deep_copy, loop_amplified) with heuristic
weights; EXPENSIVE_THRESHOLD = 40,000 module constant. 5
state mutation kinds (attr_write, container_mutate, file_write,
ipc_emit, global_write).

The 3 action entry points are per-action defined (see Per-Action
Design table). MMA worker spawn is OUT of scope per user (cold
until 1:1 discussion UX is dogfooded).

Two follow-up tracks recorded but NOT in this track:
- pipeline_runtime_profiling_20260607: calibrate the heuristic
  cost model with real measurements; catch C-extension cost,
  decorator dispatch, JIT effects that static analysis can't
  resolve.
- pipeline_pruning_20260607: implement the high-priority
  optimization candidates surfaced by this track's report.

6 atomic commits planned: data structures; trace_action +
ActionProfile + cost model; output (JSON/MD/Mermaid); MCP +
CLI; run audit + commit report; tracks.md update.
2026-06-07 11:30:06 -04:00
ed ca781543ea conductor(plan): mark sub-track 2 (audit violations) COMPLETE [2e3a6385]
All 6 sub-tracks (2A-2F) complete. Audit script: 0 violations (was 67 baseline / 61 before sub-track 2). Track is now FULLY COMPLETE (was previously [~] due to sub-track 2 partial). 79 tests added/passing across sub-tracks 2A-2F. Updated sub_tracks table in state.toml with per-sub-track completion details. Pre-existing test failures (4 unrelated) documented in test_failure_notes.
2026-06-07 11:01:24 -04:00