Private
Public Access
0
0
Commit Graph

1088 Commits

Author SHA1 Message Date
ed 15b3b33081 docs(spec): footnote tool-loop lift follow-up in §13.1.B (in case context expires)
As of end of Phase 4, only _send_minimax has a working tool-call loop.
Phase 3 (Grok, Llama) and Phase 2 (Qwen) entry points are single-shot;
they call send_openai_compatible once and return without executing
tool_calls. If the user notices 'tool execution doesn't work for
Qwen/Grok/Llama' after Phase 5 ships, the fix is to lift the tool
loop into a shared run_with_tool_loop() helper that wraps
send_openai_compatible. The 4 existing vendors (_send_anthropic /
_send_gemini / _send_gemini_cli / _send_deepseek) already have the
same inline duplication, so the lift would also help those.

This is a follow-up track, not in scope for qwen_llama_grok_integration_20260606.
2026-06-11 09:04:54 -04:00
ed ccdfaefd52 conductor(plan): mark Phase 4 fully complete (fix phase_4 SHA, t4_4 status, verification flags, minimax_refactor_stats, openai_compatible_models flag) 2026-06-11 08:57:35 -04:00
ed fadb4c329b conductor(plan): mark Phase 4 complete in qwen_llama_grok_integration_20260606 2026-06-11 02:25:36 -04:00
ed 94fe10089e conductor(plan): mark t3.18 + phase_3 complete; advance to phase 4 2026-06-11 02:06:13 -04:00
ed 9be228f620 conductor(plan): fix duplicates in Phase 3 state; advance t3.18 (checkpoint) 2026-06-11 02:05:07 -04:00
ed 07bac1c6a7 conductor(plan): mark t3.3-t3.7 + t3.14-t3.17 complete (t3.4/t3.15 cancelled: no template) 2026-06-11 02:04:09 -04:00
ed 8e3543d875 docs(spec): revise 'best API per vendor' after Grok consultation
Grok's own recommendation (consulted 2026-06-11):

  'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) |
   Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No
   meaningful unique native surface lost by using the compatible
   endpoint.'

This REVERSES the earlier 'xAI native' correction. The OpenAI-
compatible approach for Grok is the canonical full-featured path;
the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1
+ send_openai_compatible helper) is correct as-is.

Updates to the spec:

1. §3.1.1: replaced the 'use xAI native' decision with the confirmed
   per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's
   own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-
   Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2),
   Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native
   (follow-up), Anthropic=Native (follow-up). Also added Grok's
   recommended v2 matrix field expansion: audio, video, grounding,
   computer_use, local, reasoning/extended_thinking, web_search,
   x_search, code_execution, file_search, mcp_support, structured_output.

2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to
   'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The
   implementation does NOT need a native refactor; the OpenAI SDK
   at https://api.x.ai/v1 is the canonical approach. Removed the
   earlier 'caching: true' entry from the registry (since the
   OpenAI-compat shim doesn't expose prompt_cache_key) and the
   'no persistent client' state struct (back to the OpenAI SDK
   pattern).

3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs
   (Ollama native + Meta Llama API)' and removed the Grok native
   refactor item (Grok says OpenAI-compat is fine). Kept the Ollama
   native + Meta Llama API items + matrix expansion. Clarified that
   Grok tests do NOT need rewriting; only Llama tests get 2 more
   (native Ollama, Meta Llama API).

Net effect: the Phase 3 work that just shipped (Grok+Llama Green
using OpenAI-compat shim) is CORRECT as-is. The implementation
matches Grok's actual recommendation. No code rollback needed.
2026-06-11 02:01:08 -04:00
ed 06716252f1 docs(spec): add 'best API per vendor' principle; mark xAI native as target; document follow-ups
Three additions to the spec, per the user's architectural correction
in this session:

1. NEW section 3.1.1: 'Architectural principle: Use the best API per
   vendor' — explains why the OpenAI-compatible shim loses vendor-
   specific features (xAI: prompt_cache_key, reasoning_effort, server-
   side tools, cost_in_usd_ticks; Ollama: think param, images array,
   thinking field, structured outputs) and states the principle:
   'use each vendor's native SDK or REST API when one exists, falling
   back to OpenAI-compatible only when no native option exists.'

   Also notes that the capability matrix IS the aggregate tracker;
   future native features go into the matrix, and the GUI filters
   based on it (no per-vendor UI branches).

2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was
   'OpenAI-Compatible'. Now specifies two native endpoints
   (/v1/chat/completions and /v1/responses), the native features that
   matter, the updated capability registry (caching=true for Grok
   via prompt_cache_key), and a 'Phase 3 placeholder behavior' note
   that this track's Phase 3 ships the OpenAI-compatible Grok as a
   placeholder. The native refactor is deferred to follow-up B.

3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs
   (post-OpenAI-compatible-placeholder)' which documents:
   - Grok → xAI native REST
   - Llama (Ollama) → native /api/chat
   - Llama (Meta Llama API) → new 4th backend (deferred pending
     verification of Meta's API spec; llama.developer.meta.com/docs/overview
     returned 400 on fetch this session)
   - Capability matrix expansion (web_search, x_search, code_execution,
     file_search, mcp_support, reasoning_effort, structured_output)
   - Test rewrites (mock requests.post instead of chat.completions.create)

This is a docs-only commit; no code changes. The Phase 3 Green work
continues with the OpenAI-compatible approach as planned in the
existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track
B handles the native refactor when prioritized.
2026-06-11 01:49:36 -04:00
ed 891c008f0c conductor(plan): mark t3.1-t3.2 + t3.8-t3.13 complete; advance to t3.3+t3.14 (Green) 2026-06-11 01:42:13 -04:00
ed 4204116c66 conductor(plan): mark t2.11 completed (Phase 2 checkpoint) 2026-06-11 01:36:44 -04:00
ed 4d70dcc7ce conductor(plan): mark t2.11 + phase_2 complete; advance to phase 3 2026-06-11 01:35:22 -04:00
ed 45d316a0bd conductor(plan): mark t2.6-t2.10 complete (t2.7 cancelled: no template); advance to t2.11 2026-06-11 01:34:25 -04:00
ed 3940eb36ac conductor(plan): mark t2.1-t2.5 complete; advance to t2.6 (Green) 2026-06-11 00:53:58 -04:00
ed d5373e8f94 conductor(plan): mark t1.12 + phase_1 complete; advance to phase 2 2026-06-11 00:48:14 -04:00
ed 67782198b6 conductor(plan): mark t1.11 (dashscope dep) complete; advance to t1.12 2026-06-11 00:46:18 -04:00
ed f07e616c38 conductor(plan): mark t1.5-t1.10 complete; advance to t1.11 2026-06-11 00:41:11 -04:00
ed 6f11e7da14 conductor(plan): mark t1.1-t1.4 complete; advance to phase 1 in_progress 2026-06-11 00:31:57 -04:00
ed 49ac008a87 docs: replace 2 'fictional' usages with neutral phrasing (predates the refactor / was stale) 2026-06-10 23:34:33 -04:00
ed e1287a4cf4 conductor(plan): prior_session_sepia_20260610 spec + design + metadata
New track for prior-session sepia tint:
- 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount)
- per-palette state dict mirroring _brightness/_contrast/_gamma
- apply_prior_tint helper (float-only math per user requirement)
- 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps)
- Theme Settings panel slider with persistence

Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5
API only exposes 4-value PaletteId enum, no per-instance struct).
See spec §1.1.1 and design doc 'Honest constraint' section.
2026-06-10 23:00:29 -04:00
ed 2b0e17ef0c conductor(track): add docs_sync_test_era_20260610 plan.md and spec.md
These were authored at track start but missed by the final-state
commit. They are the brief 1-2 page design intent and executable
plan for the docs sync track. The closing report at
docs/reports/docs_sync_test_era_20260610.md summarizes the actual
17-commit execution.
2026-06-10 20:25:32 -04:00
ed da240577f9 conductor(track): close docs_sync_test_era_20260610
- state.toml: status active->completed, all 25 tasks marked complete
  with commit SHAs, all 4 phases checkpointed
- metadata.json: status active->shipped, 17-commit list, all 9
  verification criteria flipped to DONE
2026-06-10 20:24:31 -04:00
ed 5d2624526b conductor(archive): move 4 test-hell lineage tracks to archive/
- workspace_path_finalize_20260609 -> archive/ (precursor track)
- test_infrastructure_hardening_20260609 -> archive/ (main 8-phase track)
- mma_tier_usage_reset_fix_20260610 -> archive/ (4 controller bug fixes)
- rag_phase4_sync_fix_20260610 -> archive/ (RAG dim-mismatch + rag_config reset)

The archive/ directory already existed (71+ archived tracks from
earlier phases). The 4 tracks' state.toml + metadata.json were already
closed in the prior commit. This just relocates the folders to match
the convention referenced in tracks.md.
2026-06-10 20:12:50 -04:00
ed 1ea38ad16b conductor(track): close 4 test-hell lineage tracks (state + metadata)
- test_infrastructure_hardening_20260609: status active->completed,
  last_updated 2026-06-09->2026-06-10, t7_*/t8_* tasks marked complete
  with commit SHAs (84edb200, 719fe9a, cb525519)
- mma_tier_usage_reset_fix_20260610: status spec->shipped
- rag_phase4_sync_fix_20260610: status spec->shipped
- workspace_path_finalize_20260609: status active->completed,
  current_phase 1->complete, all tasks marked complete
  (c725270b, 93ec2809), verification flags flipped to true
2026-06-10 20:09:01 -04:00
ed 2c924fe6df test(infra): poll-for-event race fixes + watchdog timeout bump + spec update 2026-06-10 15:14:35 -04:00
ed 80697e221a conductor(checkpoint): RAG phase 4 sync fix + test assertion fix - track complete 2026-06-10 13:55:06 -04:00
ed 2ad0d6a3f0 conductor(plan): Update RAG sync fix track state - sync works, retrieval assertion is separate 2026-06-10 13:29:18 -04:00
ed 989b2e6835 conductor(plan): New track for RAG phase 4 sync fix 2026-06-10 12:45:56 -04:00
ed 1772fa8fc2 conductor(checkpoint): Final Phase 2 complete - FR1+FR2 re-applied, sim test passes in batch 2026-06-10 12:13:16 -04:00
ed 14a329c1a9 conductor(plan): Adjust track after catastrophic git checkout - FR1+FR2 reverted, FR3+FR4 were no-ops 2026-06-10 11:45:56 -04:00
ed c729f8adaf conductor(plan): Update spec/plan for Phase 2 (live_gui sim test fragility) 2026-06-10 10:12:09 -04:00
ed e788512d93 conductor(plan): Mark mma_tier_usage_reset_fix_20260610 as complete 2026-06-10 09:59:26 -04:00
ed d304af5d22 sigh 2026-06-10 08:34:46 -04:00
ed 39c97cb365 conductor(track): workspace_path_finalize_20260609 - plan with 3 phases, 4-step execution 2026-06-09 20:29:55 -04:00
ed c725270b99 conductor(track): workspace_path_finalize_20260609 - per-run workspace under tests/artifacts/ 2026-06-09 20:27:20 -04:00
ed 5656957622 conductor(plan): Phase 8 complete - docs + audit extended 2026-06-09 17:05:35 -04:00
ed d2ff6ffcf9 conductor(plan): Phase 7 complete - test_bed_health report 2026-06-09 16:59:16 -04:00
ed 3ed52be4bf conductor(plan): Phase 6 complete - clean_baseline marker 2026-06-09 16:42:48 -04:00
ed afc8600800 conductor(plan): Phase 5 complete - set_value hook verified 2026-06-09 16:35:18 -04:00
ed 6764c9e12f conductor(plan): Phase 4 complete - coalesce _sync_rag_engine 2026-06-09 16:27:15 -04:00
ed 45b4497a66 conductor(plan): Phase 3 complete - tmp_path_factory + live_gui_workspace fixture 2026-06-09 16:15:50 -04:00
ed 05ddb45236 conductor(plan): Phase 2 complete - FR1 handle + autouse fixture 2026-06-09 15:43:38 -04:00
ed 30c04860c7 conductor(plan): Phase 1 audit complete - ready for user review 2026-06-09 15:30:31 -04:00
ed 5df22fa8d5 conductor(audit): trace set_value('ai_input') flow to find routing bug 2026-06-09 15:29:27 -04:00
ed 5e13fa9ba7 conductor(audit): document _sync_rag_engine race in controller 2026-06-09 15:29:17 -04:00
ed aebbd66836 conductor(audit): document hardcoded workspace paths in test suite 2026-06-09 15:29:06 -04:00
ed d1c6c6c327 conductor(audit): catalog live_gui test cross-file state dependencies 2026-06-09 15:28:56 -04:00
ed 566cf08cb8 conductor(track): test_infrastructure_hardening_20260609 - spec to kill the test regression nightmare 2026-06-09 15:15:26 -04:00
conductor-tier2 5b3c11a0f3 conductor(track): manual_ux_validation_20260608_PLACEHOLDER - ASCII-sketch workflow + first-target redesign
The user said (verbatim): "On number 1. I love the idea and definitely
see poitental." This commit creates a full track that promotes the
ASCII-sketch UX ideation workflow
(docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to
a real track with a concrete first target.

The track complements (does not replace) the existing
manual_ux_validation_20260302 track (which is a general UX review
track; this 2026-06-08 track is *focused* on the ASCII-sketch
workflow specifically).

Files (5 total, ~52KB, 12,000+ words):
- spec.md (186 lines, 9 sections) - track design, 5 open
  questions, first target analysis, SSDL cross-reference
- plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with
  WHERE/WHAT/HOW/SAFETY annotations
- metadata.json (~120 lines) - structured metadata, 5 open
  questions with defaults, 5 SSDL principles available
- state.toml (~95 lines) - per-task tracking + phase status
- index.md (~50 lines) - track context + related docs

Key design decisions captured:

1. Two distinct vocabularies are conflated at first glance:
   - GUI ASCII (the workflow) for panel sketches
   - SSDL (computational shapes digest) for internal code sketches
   Spec §2.6 makes the distinction explicit; both are useful for
   this track (GUI ASCII for Phase 2 design; SSDL for Phase 3
   internal refactoring documentation).

2. The 5 open questions from the workflow report (Q1 vocabulary,
   Q2 comparison policy, Q3 storage location, Q4 tooling,
   Q5 frequency) are documented with sensible defaults in
   spec.md §2.1-2.5 and metadata.json. The user can override
   any of them; defaults pre-stage the work.

3. First target is src/gui_2.py:3770 render_discussion_entry
   (Discussion Hub per-entry panel). Rationale:
   - Most-edited surface (every AI/user message)
   - User has strong opinions (per nagent_review_20260608 3 rounds
     of corrections)
   - 23-op matrix A1-A7 is the source of truth
   - ImGui layout maps cleanly to ASCII
   - SSDL defusing techniques can guide the internal refactoring

4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first
   target (1-3 ASCII rounds), 3=implement per design contract
   (TDD with 7 test files for A1-A7 operations),
   4=document the pattern + propose 5-7 next targets.

Cross-references added throughout:
- docs/reports/computational_shapes_ssdl_digest_20260608.md
  (the SSDL digest, with explicit "this is a different vocabulary
  for a different purpose" note in spec §2.6)
- docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow)
- docs/guide_discussions.md (the 23-op matrix A1-A7)
- conductor/tracks/nagent_review_20260608/ (the source of the
  user's editable-discussion corrections)
- conductor/tracks/manual_ux_validation_20260302/ (complementary
  general UX review track)
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/
  (the contingency track; referenced in spec §2.6 SSDL cross-ref)

No code modified. Track is active; Phase 1 (5 user-questions) is
the current phase. User-confirmed worth doing in the prior turn.
2026-06-08 23:41:43 -04:00
conductor-tier2 816e9f2f5c conductor(track): chunkification_optimization_20260608_PLACEHOLDER - 1-page contingency document
The user's third correction this session changed the framing
from "build a stateful C extension" to "wait for a hard constraint,
then build a request/response blob pipeline." This commit creates
a 1-page contingency document (no plan.md, no implementation)
that captures:

- The threshold: "only worth it under a hard constraint that
  no existing Python package can solve"
- The shape when activated: subprocess-launch C11 binary with
  request/response blob wire format (NOT stateful CPython C
  extension)
- The 2 cited candidates (markdown parsing into aggregate markdown,
  context snapshot processing) are NOT currently bottlenecks per
  src/aggregate.py:380-454 (pure-Python string concat, zero
  third-party markdown deps in pyproject.toml:6-27) and
  src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity,
  debounced)
- The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 +
  "Xar-style chunked arrays" recommendation in §5.2 pre-support
  this track

Files (4 total, 227+ lines of contingency document):
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md

Cross-references added:
- docs/reports/computational_shapes_ssdl_digest_20260608.md (the
  SSDL digest is the theoretical foundation; explicitly cited in
  the spec's §6.1 "SSDL alignment" and in metadata.json external)
- docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2
  assessment; explicitly cited in spec's §6 See Also)

No code modified. Track does NOT appear in the active queue
of conductor/tracks.md; appears in the Backlog / Contingency
section as a reference, not a commitment.

Activation criteria (per metadata.json):
1. Profiling shows a real bottleneck in a target code path
2. The bottleneck cannot be solved with existing Python packages
3. The user explicitly approves activation

Without all 3, this track stays deferred. Default action is don't.
2026-06-08 23:40:27 -04:00
conductor-tier2 a9333bbb59 conductor(track-update): code_path_audit_20260607 - post-4-tracks timing + 5-source framing
The user specified that the code_path_audit_20260607 track should run
AFTER the 4 foundational tracks complete (qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor). This commit formalizes that timing
and grounds the audit's analytical framing in the 5 sources loaded
into context on 2026-06-08.

3 surgical additions to the spec/plan, no task changes:

1. Post-4-tracks timing (new section in spec.md §"Timing", plus
   a "Timing" callout in plan.md's opening):
   - The 4 tracks will significantly reshape src/ai_client.py,
     src/mcp_client.py, src/app_controller.py, and
     src/type_aliases.py
   - Running the audit on pre-refactor code would produce a
     report that's stale on day 1
   - The post-4-tracks timing ensures the audit grounds
     optimization decisions for the *resulting* architecture
   - Pre-flight check: verify all 4 tracks are [x] completed
     in conductor/tracks.md before starting this track

2. Analytical framing (new section in spec.md §"Analytical Framing
   (5-source lens)"):
   - Maps each of the 5 sources (Fleury taxonomy + Fleury
     combinatoric + Muratori Big OOPs + Reece Assuming + user's
     chunk ideation) to specific audit-time heuristics
   - 4 concrete heuristics: effective-codepath count,
     entity-hierarchy fingerprint, assumed-too-much detector,
     chunkification candidates
   - The heuristics shape REPORT INTERPRETATION, not the
     static cost model (which stays data-grounded in
     EXPENSIVE_THRESHOLD + per-class weights)

3. See Also cross-references in spec.md (6 new entries):
   - nagent_review Pitfalls #2 and #4 (provider history
     globals + stateful singleton)
   - wo84LFzx5nI Big OOPs transcript (full text, 4310
     segments, 200KB; loaded 2026-06-08)
   - i-h95QIGchY Assuming transcript (full text, 3719
     segments, 162KB; loaded 2026-06-08)
   - ed_chunk_data_structures_20260523.md (5-image archive
     of user's chunk ideation, 19KB; saved 2026-06-08)
   - computational_shapes_ssdl_digest_20260608.md (the SSDL
     digest that synthesizes the 4-source computational-shapes
     thinking; the audit's tree/mermaid outputs ARE
     computational-shape visualizations)

4. tracks.md entry updated to include the spec/plan links and
   a brief status note that the audit is post-4-tracks.

5. plan.md has a "Timing" callout at the top stating the 4
   tracks must ship before the plan executes.

No code modified. The audit's tasks (Phases 1-6) are unchanged
in structure; the new sections only add analytical context
and timing constraints.
2026-06-08 22:05:54 -04:00