Grok's own recommendation (consulted 2026-06-11):
'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) |
Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No
meaningful unique native surface lost by using the compatible
endpoint.'
This REVERSES the earlier 'xAI native' correction. The OpenAI-
compatible approach for Grok is the canonical full-featured path;
the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1
+ send_openai_compatible helper) is correct as-is.
Updates to the spec:
1. §3.1.1: replaced the 'use xAI native' decision with the confirmed
per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's
own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-
Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2),
Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native
(follow-up), Anthropic=Native (follow-up). Also added Grok's
recommended v2 matrix field expansion: audio, video, grounding,
computer_use, local, reasoning/extended_thinking, web_search,
x_search, code_execution, file_search, mcp_support, structured_output.
2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to
'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The
implementation does NOT need a native refactor; the OpenAI SDK
at https://api.x.ai/v1 is the canonical approach. Removed the
earlier 'caching: true' entry from the registry (since the
OpenAI-compat shim doesn't expose prompt_cache_key) and the
'no persistent client' state struct (back to the OpenAI SDK
pattern).
3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs
(Ollama native + Meta Llama API)' and removed the Grok native
refactor item (Grok says OpenAI-compat is fine). Kept the Ollama
native + Meta Llama API items + matrix expansion. Clarified that
Grok tests do NOT need rewriting; only Llama tests get 2 more
(native Ollama, Meta Llama API).
Net effect: the Phase 3 work that just shipped (Grok+Llama Green
using OpenAI-compat shim) is CORRECT as-is. The implementation
matches Grok's actual recommendation. No code rollback needed.
Three additions to the spec, per the user's architectural correction
in this session:
1. NEW section 3.1.1: 'Architectural principle: Use the best API per
vendor' — explains why the OpenAI-compatible shim loses vendor-
specific features (xAI: prompt_cache_key, reasoning_effort, server-
side tools, cost_in_usd_ticks; Ollama: think param, images array,
thinking field, structured outputs) and states the principle:
'use each vendor's native SDK or REST API when one exists, falling
back to OpenAI-compatible only when no native option exists.'
Also notes that the capability matrix IS the aggregate tracker;
future native features go into the matrix, and the GUI filters
based on it (no per-vendor UI branches).
2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was
'OpenAI-Compatible'. Now specifies two native endpoints
(/v1/chat/completions and /v1/responses), the native features that
matter, the updated capability registry (caching=true for Grok
via prompt_cache_key), and a 'Phase 3 placeholder behavior' note
that this track's Phase 3 ships the OpenAI-compatible Grok as a
placeholder. The native refactor is deferred to follow-up B.
3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs
(post-OpenAI-compatible-placeholder)' which documents:
- Grok → xAI native REST
- Llama (Ollama) → native /api/chat
- Llama (Meta Llama API) → new 4th backend (deferred pending
verification of Meta's API spec; llama.developer.meta.com/docs/overview
returned 400 on fetch this session)
- Capability matrix expansion (web_search, x_search, code_execution,
file_search, mcp_support, reasoning_effort, structured_output)
- Test rewrites (mock requests.post instead of chat.completions.create)
This is a docs-only commit; no code changes. The Phase 3 Green work
continues with the OpenAI-compatible approach as planned in the
existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track
B handles the native refactor when prioritized.
New track for prior-session sepia tint:
- 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount)
- per-palette state dict mirroring _brightness/_contrast/_gamma
- apply_prior_tint helper (float-only math per user requirement)
- 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps)
- Theme Settings panel slider with persistence
Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5
API only exposes 4-value PaletteId enum, no per-instance struct).
See spec §1.1.1 and design doc 'Honest constraint' section.
These were authored at track start but missed by the final-state
commit. They are the brief 1-2 page design intent and executable
plan for the docs sync track. The closing report at
docs/reports/docs_sync_test_era_20260610.md summarizes the actual
17-commit execution.
- state.toml: status active->completed, all 25 tasks marked complete
with commit SHAs, all 4 phases checkpointed
- metadata.json: status active->shipped, 17-commit list, all 9
verification criteria flipped to DONE
The user said (verbatim): "On number 1. I love the idea and definitely
see poitental." This commit creates a full track that promotes the
ASCII-sketch UX ideation workflow
(docs/reports/ascii_sketch_ux_workflow_20260608.md, 340 lines) to
a real track with a concrete first target.
The track complements (does not replace) the existing
manual_ux_validation_20260302 track (which is a general UX review
track; this 2026-06-08 track is *focused* on the ASCII-sketch
workflow specifically).
Files (5 total, ~52KB, 12,000+ words):
- spec.md (186 lines, 9 sections) - track design, 5 open
questions, first target analysis, SSDL cross-reference
- plan.md (~280 lines, 4 phases, 21 tasks) - TDD-style with
WHERE/WHAT/HOW/SAFETY annotations
- metadata.json (~120 lines) - structured metadata, 5 open
questions with defaults, 5 SSDL principles available
- state.toml (~95 lines) - per-task tracking + phase status
- index.md (~50 lines) - track context + related docs
Key design decisions captured:
1. Two distinct vocabularies are conflated at first glance:
- GUI ASCII (the workflow) for panel sketches
- SSDL (computational shapes digest) for internal code sketches
Spec §2.6 makes the distinction explicit; both are useful for
this track (GUI ASCII for Phase 2 design; SSDL for Phase 3
internal refactoring documentation).
2. The 5 open questions from the workflow report (Q1 vocabulary,
Q2 comparison policy, Q3 storage location, Q4 tooling,
Q5 frequency) are documented with sensible defaults in
spec.md §2.1-2.5 and metadata.json. The user can override
any of them; defaults pre-stage the work.
3. First target is src/gui_2.py:3770 render_discussion_entry
(Discussion Hub per-entry panel). Rationale:
- Most-edited surface (every AI/user message)
- User has strong opinions (per nagent_review_20260608 3 rounds
of corrections)
- 23-op matrix A1-A7 is the source of truth
- ImGui layout maps cleanly to ASCII
- SSDL defusing techniques can guide the internal refactoring
4. 4 phases: 1=resolve 5 questions, 2=execute workflow on first
target (1-3 ASCII rounds), 3=implement per design contract
(TDD with 7 test files for A1-A7 operations),
4=document the pattern + propose 5-7 next targets.
Cross-references added throughout:
- docs/reports/computational_shapes_ssdl_digest_20260608.md
(the SSDL digest, with explicit "this is a different vocabulary
for a different purpose" note in spec §2.6)
- docs/reports/ascii_sketch_ux_workflow_20260608.md (the workflow)
- docs/guide_discussions.md (the 23-op matrix A1-A7)
- conductor/tracks/nagent_review_20260608/ (the source of the
user's editable-discussion corrections)
- conductor/tracks/manual_ux_validation_20260302/ (complementary
general UX review track)
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/
(the contingency track; referenced in spec §2.6 SSDL cross-ref)
No code modified. Track is active; Phase 1 (5 user-questions) is
the current phase. User-confirmed worth doing in the prior turn.
The user's third correction this session changed the framing
from "build a stateful C extension" to "wait for a hard constraint,
then build a request/response blob pipeline." This commit creates
a 1-page contingency document (no plan.md, no implementation)
that captures:
- The threshold: "only worth it under a hard constraint that
no existing Python package can solve"
- The shape when activated: subprocess-launch C11 binary with
request/response blob wire format (NOT stateful CPython C
extension)
- The 2 cited candidates (markdown parsing into aggregate markdown,
context snapshot processing) are NOT currently bottlenecks per
src/aggregate.py:380-454 (pure-Python string concat, zero
third-party markdown deps in pyproject.toml:6-27) and
src/history.py:1-141 (bounded ~500KB at 100-snapshot capacity,
debounced)
- The SSDL digest's Technique 5 "Assume-away (Xar)" in §2.2 +
"Xar-style chunked arrays" recommendation in §5.2 pre-support
this track
Files (4 total, 227+ lines of contingency document):
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/spec.md
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/metadata.json
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/state.toml
- conductor/tracks/chunkification_optimization_20260608_PLACEHOLDER/index.md
Cross-references added:
- docs/reports/computational_shapes_ssdl_digest_20260608.md (the
SSDL digest is the theoretical foundation; explicitly cited in
the spec's §6.1 "SSDL alignment" and in metadata.json external)
- docs/reports/c11_python_interop_assessment_20260608.md (the v1+v2
assessment; explicitly cited in spec's §6 See Also)
No code modified. Track does NOT appear in the active queue
of conductor/tracks.md; appears in the Backlog / Contingency
section as a reference, not a commitment.
Activation criteria (per metadata.json):
1. Profiling shows a real bottleneck in a target code path
2. The bottleneck cannot be solved with existing Python packages
3. The user explicitly approves activation
Without all 3, this track stays deferred. Default action is don't.
The user specified that the code_path_audit_20260607 track should run
AFTER the 4 foundational tracks complete (qwen_llama_grok,
data_oriented_error_handling, data_structure_strengthening,
mcp_architecture_refactor). This commit formalizes that timing
and grounds the audit's analytical framing in the 5 sources loaded
into context on 2026-06-08.
3 surgical additions to the spec/plan, no task changes:
1. Post-4-tracks timing (new section in spec.md §"Timing", plus
a "Timing" callout in plan.md's opening):
- The 4 tracks will significantly reshape src/ai_client.py,
src/mcp_client.py, src/app_controller.py, and
src/type_aliases.py
- Running the audit on pre-refactor code would produce a
report that's stale on day 1
- The post-4-tracks timing ensures the audit grounds
optimization decisions for the *resulting* architecture
- Pre-flight check: verify all 4 tracks are [x] completed
in conductor/tracks.md before starting this track
2. Analytical framing (new section in spec.md §"Analytical Framing
(5-source lens)"):
- Maps each of the 5 sources (Fleury taxonomy + Fleury
combinatoric + Muratori Big OOPs + Reece Assuming + user's
chunk ideation) to specific audit-time heuristics
- 4 concrete heuristics: effective-codepath count,
entity-hierarchy fingerprint, assumed-too-much detector,
chunkification candidates
- The heuristics shape REPORT INTERPRETATION, not the
static cost model (which stays data-grounded in
EXPENSIVE_THRESHOLD + per-class weights)
3. See Also cross-references in spec.md (6 new entries):
- nagent_review Pitfalls #2 and #4 (provider history
globals + stateful singleton)
- wo84LFzx5nI Big OOPs transcript (full text, 4310
segments, 200KB; loaded 2026-06-08)
- i-h95QIGchY Assuming transcript (full text, 3719
segments, 162KB; loaded 2026-06-08)
- ed_chunk_data_structures_20260523.md (5-image archive
of user's chunk ideation, 19KB; saved 2026-06-08)
- computational_shapes_ssdl_digest_20260608.md (the SSDL
digest that synthesizes the 4-source computational-shapes
thinking; the audit's tree/mermaid outputs ARE
computational-shape visualizations)
4. tracks.md entry updated to include the spec/plan links and
a brief status note that the audit is post-4-tracks.
5. plan.md has a "Timing" callout at the top stating the 4
tracks must ship before the plan executes.
No code modified. The audit's tasks (Phases 1-6) are unchanged
in structure; the new sections only add analytical context
and timing constraints.
4 surgical additions to the spec, no task changes:
1. list_tool_schemas on the SubMCP Protocol: Added the method
to §3.1 (The SubMCP Protocol). Per nagent_review Pitfall #6
(hard-coded tool discovery) and takeaway #5 (self-describing
tools), each sub-MCP advertises its own capabilities via
list_tool_schemas() rather than relying on a central registry.
This is the equivalent of nagent's collect_bin_tool_descriptions
per sub-MCP. The MCPController.get_tool_schemas() becomes a
simple aggregator.
2. Security model is the contract: Added a new Important note
to §3.3 (The 3-Layer Security Model). The 3 layers
(Allowlist Construction -> Path Validation -> Resolution
Gate, per docs/guide_mcp_client.md) are not just refactored
- they are the CONTRACT between MCPController and the
sub-MCPs. Sub-MCPs receive a pre-validated Path and trust
it. They do NOT re-validate. The refactor is structural,
not security-changing.
3. Docs touchpoint in Phase 7: Added the docs touchpoint to
Phase 7 per the docs Refresh Protocol. The update to
docs/guide_mcp_client.md should add a Sub-MCP Architecture
section, link the list_tool_schemas pattern to 3-Layer
Security Model, and cross-link the 3 new guides from
the 2026-06-08 docs refresh.
4. See Also cross-references: Added 8 new entries to §12.2:
- docs/guide_context_aggregation.md (FileItem consumer)
- docs/guide_state_lifecycle.md (App state delegation)
- docs/guide_discussions.md (23-operation matrix)
- conductor/tracks/qwen_llama_grok_integration_20260606/
(Result return type coordination)
- conductor/tracks/nagent_review_20260608/{report,takeaways}.md
- (2 specific data_oriented_error_handling and
data_structure_strengthening cross-refs)
No plan.md changes.
4 surgical additions to the spec, no task changes:
1. ProviderHistoryMessage: Added a new alias to §3.1 (The
Aliases). Per nagent_review Pitfall #4 (provider history
divergence), the UI/curation layer (HistoryMessage, edited
via disc_entries[i].content) and the SDK layer
(ProviderHistoryMessage, the bytes actually replayed to the
LLM) are *distinct*. Conflating them via a single alias
perpetuates the bug. The new alias is documented as a
separate concept with its own use sites (_anthropic_history,
_deepseek_history, _minimax_history, _grok_history,
_llama_history). The follow-up public_api_migration_20260606
track is the natural moment to unify the two layers; this
spec just makes the distinction explicit.
2. FileItem alias points to the existing models.FileItem
dataclass, not Metadata. Per docs/guide_context_aggregation.md
(added 2026-06-08), FileItem is a 9-field dataclass
(path, auto_aggregate, force_full, view_mode, selected,
ast_signatures, ast_definitions, ast_mask, custom_slices,
injected_at) with a __post_init__ normalizer. Aliasing it to
dict[str, Any] would lose the type safety. The 9 other
aliases remain dict aliases for round-trip compatibility.
3. gui_2.py and mcp_client.py as follow-up: Added a Note
(dated 2026-06-08) to the Out of Scope section. The 23
lower-impact files (deferred) are dominated by gui_2.py
(26+ weak sites per guide_state_lifecycle.md) and
mcp_client.py (will be touched heavily by the parallel
mcp_architecture_refactor_20260606). The deferral is correct
but the follow-up should explicitly call out these two
files as the next targets, rather than implying they're
handled.
4. See Also cross-references: Added 7 new entries to §12.2:
- docs/guide_models.md (FileItem dataclass source)
- docs/guide_context_aggregation.md (FileItems consumer)
- docs/guide_discussions.md (HistoryMessage shape)
- docs/guide_state_lifecycle.md (state delegation)
- conductor/tracks/mcp_architecture_refactor_20260606/
- conductor/tracks/nagent_review_20260608/{report,takeaways}.md
No plan.md changes.
3 surgical additions to the spec, no task changes:
1. New ErrorKind: Added PROVIDER_HISTORY_DIVERGED_FROM_UI to
the ErrorKind enum. Per nagent_review Pitfall #4 (provider
history divergence: user edits disc_entries[i].content via
the discussion UI but ai_client._<provider>_history still
replays the original). The new kind makes the divergence
*detectable* and *reportable* so the follow-up
public_api_migration_20260606 track can collapse the two
history layers. The Result pattern from this track is the
natural carrier for the signal.
2. State-delegation regression tests: Added mandatory
regression tests to the testing strategy in §6 for the
ai_client refactor (highest-risk phase). The new tests
exercise:
- app.temperature = 0.5 round-trips through App.__getattr__/
__setattr__ delegation (per gui_2.py:666-675)
- controller.disc_entries[i].content is reflected in the
next send_result()'s messages parameter
- The 3 per-provider history locks serialize correctly under
concurrent send_result() calls
The reason this is mandatory: per guide_state_lifecycle.md
(added 2026-06-08), the App.__getattr__/__setattr__ pattern
means a partial refactor manifests as silent AttributeError
deep in test code, not at the refactor commit boundary.
3. See Also cross-references: Added 6 new entries to §12.3:
- docs/guide_ai_client.md (per-provider history globals)
- docs/guide_mcp_client.md (3-layer security model)
- docs/guide_state_lifecycle.md (3 per-thread + 7-lock pattern)
- docs/guide_discussions.md (23-operation matrix)
- docs/guide_context_aggregation.md (build_discussion_section)
- conductor/tracks/mcp_architecture_refactor_20260606/
- conductor/tracks/nagent_review_20260608/{report,takeaways}.md
No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.
4 surgical additions to the spec, no task changes:
1. Result return type: Added a coordination note in §3.1 (Data-
Oriented Design) explaining that the shared send_openai_compatible
helper should return Result[NormalizedResponse, ErrorInfo] from
day 1, not NormalizedResponse + ProviderError raise. This is so
the downstream data_oriented_error_handling_20260606 track is
a small mechanical pass over new code, not a second migration.
References nagent_review Pitfall #4 (provider history divergence)
and the ErrorKind.PROVIDER_HISTORY_DIVERGED_FROM_UI use case.
2. Declarative read, not behavioral dispatch: Added clarification
to §6 (UX Adaptation) that the capability matrix is a *read* of
declarative data, not a new dispatch layer. Per nagent_review
Pitfall #1 (opaque function calling in the Application is the
correct choice; nagent-style protocol is for Meta-Tooling),
UI elements are visible/enabled/disabled/hidden but the
*behavior* they invoke is unchanged. Three concrete examples
added: screenshot button, cost panel, cache panel.
3. PROVIDERS source of truth: Added a NOTE in §3.2 (Module Layout)
that src/models.py:79-86 PROVIDERS is the existing single
source of truth for the (vendor, model) enumeration. The
capability registry reads from this constant rather than
introducing a parallel list. Cross-references
docs/guide_models.md.
4. Docs touchpoint: Expanded Phase 6 (Docs + Archive) in §9 to
note that docs/guide_ai_client.md needs the new providers +
the shared helper documented, and that
docs/guide_context_aggregation.md (added 2026-06-08) is the
reference for the aggregate.py pipeline that all new providers
use.
5. See Also cross-references: Added 3 new entries to §13.2:
- docs/guide_context_aggregation.md (the new pipeline guide)
- conductor/tracks/nagent_review_20260608/report.md (§1, §5, §15)
- conductor/tracks/nagent_review_20260608/nagent_takeaways_20260608.md
(§1, §2, §9)
No plan.md changes. Plan tasks are task-level and will flow from
the spec changes when the track is re-planned.