The user explicitly stated 2026-06-11: 'I need a naming convention
enforce for separate files you keep introducing that are technically
part of a system or parent module.' Per AGENTS.md 'File Size and
Naming Convention' HARD RULE: new src/<thing>.py files may only be
created on the user's explicit request. All AI-client code lives
IN src/ai_client.py.
Sweep through all follow-up track files to remove the stale
references to the no-longer-planned new src/ files:
- TODO.md: t1.4 'Implement helper in src/tool_loop.py' -> '...in
src/ai_client.py'
- plan.md: 5 stale references updated (Task 4.3 title, Step 1
'Files:', Step 5 'git add', Phase 4 git note, the function
summary in Phase 1 verification)
- plan.md: 'src/llama_ollama_native.py' removed (ollama_chat and
_send_llama_native both in src/ai_client.py)
- spec.md: Phase Plan section T1.2 and T4.2/T4.3 updated to
reference src/ai_client.py
- state.toml: t1.4, t4_2, t4_3 descriptions updated
- metadata.json: new_files list shrunk (3 new src/ files removed);
verification_criteria updated to reference src/ai_client.py
functions; follow_up_audit_report reference updated to point to
the actual file (docs/reports/qwen_llama_grok_followup_audit_20260611.md)
Spec additions from the same turn (not in the previous plan version):
- Naming Convention section explicitly references AGENTS.md HARD
RULE; 'If you find yourself about to create one, ASK FIRST'
- 'Non-Goals' section now lists 8 explicit non-goals (vs the
previous 4) including history management lift, reasoning
extraction lift, error classification lift
- 'Deferred Work' section documents 3 separate follow-up tracks
(namespace_cleanup_20260611, ai_client_codepath_consolidation_20260611,
mcp_architecture_refactor_20260606 [already specced])
- 'Open Questions' has 1 RESOLVED (PROVIDERS location) and 2 still
open (Meta URL verification; local model UI mode)
- 'Goals' table: 'local-backend' field added separately from
'cost_tracking' (per user feedback: distinct concept)
- 'B.1 Local-First' section: native Ollama DEFAULT for localhost
(not fallback), Meta Llama API prerequisite (verify URL first)
- 'B.2 Matrix Expansion' section: full list of 12 v2 fields + UI
adaptations for each
This is docs-only. The plan is now complete and aligned with the
HARD RULE. The next agent can pick up at Phase 1, Task 1.1 and
execute straight through.
The user called out the LLM training data bias: 'small files are
good, large files are bad.' This is wrong for production codebases.
Unreal has 15K+ line files; OS kernels, game engines, compilers all
routinely have 10K+ line files. File size is a non-issue. Cognitive
load is managed via naming, regions, and navigation tools (the
manual-slop MCP) — NOT via file splitting.
Updates:
1. AGENTS.md (master agent guidance):
- Added 'File Size and Naming Convention' section
- Added the hard rule: 'New namespaced src/<thing>.py files may
only be created on the user's explicit request. If you find
yourself about to create one, ASK FIRST.'
- Defaults: helpers and sub-systems go in the parent module
2. conductor/workflow.md (Guiding Principles):
- Removed 'Do NOT perform large file writes directamente' from
principle 7 (it was a delegating rule, but 'large file writes'
carried the propaganda)
- Added principle 8: 'File Naming Convention (HARD RULE)' that
references AGENTS.md
- Re-phrased principle 9 (Research-First) to clarify it's about
navigation efficiency, not file size
3. conductor/code_styleguides/python.md:
- Removed the 'extremely large files that violate the Anti-OOP
rule by necessity' framing
- Added the new rule about new src/<thing>.py files
4. .opencode/agents/tier3-worker.md and .opencode/agents/tier4-qa.md:
- Re-phrased 'Do NOT read full large files' to 'Use skeleton
tools to navigate any file regardless of size. File size is
not a concern; the right tools are.'
- Added the new rule about not creating new src/<thing>.py
files unless user explicitly requests it
5. conductor/tracks/qwen_llama_grok_followup_20260611/plan.md:
- Updated the 'Naming Convention' section to reference the new
'user explicit request' rule
This is docs-only. No code changes. The rule is now codified:
agents must ASK FIRST before creating new top-level src/ files.
The follow-up track had a spec but no plan. The plan is the executable
artifact — it specifies file:line refs, exact code to type, TDD steps,
and per-file atomic commits. Without the plan, the next agent cannot
implement from the spec alone.
Plan structure (5 phases, ~40 tasks):
- Phase 1: Tool loop lift (5 Red tests + helper + apply to 8 vendors +
audit script)
- Phase 2: PROVIDERS move (decide location + move + update 4 import
sites + audit script)
- Phase 3: UX adaptations 2-9 (8 separate applications of the pattern
established in parent Phase 5)
- Phase 4: Local-first + matrix v2 (12 new fields + native Ollama
adapter + Meta Llama API + Local Model GUI badge)
- Phase 5: Anthropic / Gemini / DeepSeek migration (matrix entries
for the 3 remaining providers + docs update)
Each task has:
- WHERE: exact file and (where applicable) line range
- WHAT: the specific change
- HOW: TDD step ordering (Red then Green)
- SAFETY: thread-safety, dependency-ordering, and project-invariant
constraints
The plan models the parent track's plan structure (2177 lines,
2-5 minute steps, per-file atomic commits).
Adds a status line to the qwen_llama_grok_integration_20260606 entry
in conductor/tracks.md noting that:
- Phases 1-5 are done; Phase 6 (docs) is in progress
- The track is NOT being archived (per user directive)
- A 5-phase follow-up track exists at
conductor/tracks/qwen_llama_grok_followup_20260611/
- An audit report is at docs/reports/qwen_llama_grok_followup_audit_20260611.md
- 50/79 tasks done; the remaining gaps are documented
Phase 6 t6.1 + t6.2 (no archive per user directive):
- docs/guide_ai_client.md: update Overview to mention 8 providers (was 5);
add 'Shared OpenAI-Compatible Helper' section explaining
src/openai_compatible.py (NormalizedResponse, OpenAICompatibleRequest,
send_openai_compatible, usage pattern); document the Qwen adapter
and Llama multi-backend.
- docs/guide_models.md: update PROVIDERS list to 8 entries (was 5).
- conductor/tracks.md: update the Qwen track entry to reflect
'50/79 tasks done; Phase 6 in progress; NOT archiving - has follow-up';
add detailed status note pointing to the follow-up track + audit
report.
- docs/reports/qwen_llama_grok_followup_audit_20260611.md: NEW report
explaining why a follow-up is needed (7 categories of gaps; the
Tech Lead's 'footnote for now' failure mode; the lessons learned).
- conductor/tracks/qwen_llama_grok_followup_20260611/: NEW follow-up
track setup (spec.md, state.toml, metadata.json, TODO.md).
5 phases: tool loop lift, PROVIDERS move, UX adaptations 2-9,
local-first + matrix v2, Anthropic/Gemini/DeepSeek migration.
Phase 6 t6.3 (git mv to archive) and t6.4 (mark Recently Completed)
are NOT applied per user directive: 'we can then doc this we're not
archiving yet, if we have a follow up track I need this one to stay
up because there is still alot todo'.
After the end of Phase 5, only adaptation 1 of 9 from spec §6 was
applied (Screenshot button iff vision, render_files_and_media:3030).
The pattern is established; the remaining 8 are mechanical
applications of the same pattern at their respective render sites.
The follow-up track applies the wrapping at:
- tools toggle (tool_calling)
- cache panel (caching)
- stream progress (streaming)
- fetch models button (model_discovery)
- token budget max (context_window)
- cost panel (3 cost_tracking states: estimate / 'Free (local)' / '-')
The _get_active_capabilities() helper (t5.1) is already in place.
As of end of Phase 4, only _send_minimax has a working tool-call loop.
Phase 3 (Grok, Llama) and Phase 2 (Qwen) entry points are single-shot;
they call send_openai_compatible once and return without executing
tool_calls. If the user notices 'tool execution doesn't work for
Qwen/Grok/Llama' after Phase 5 ships, the fix is to lift the tool
loop into a shared run_with_tool_loop() helper that wraps
send_openai_compatible. The 4 existing vendors (_send_anthropic /
_send_gemini / _send_gemini_cli / _send_deepseek) already have the
same inline duplication, so the lift would also help those.
This is a follow-up track, not in scope for qwen_llama_grok_integration_20260606.
Grok's own recommendation (consulted 2026-06-11):
'xAI (Grok) | xAI official OpenAI-compatible (https://api.x.ai/v1) |
Fully compatible and clean. Supports Grok-2 + Grok-2-Vision. No
meaningful unique native surface lost by using the compatible
endpoint.'
This REVERSES the earlier 'xAI native' correction. The OpenAI-
compatible approach for Grok is the canonical full-featured path;
the implementation in Phase 3 (OpenAI SDK with base_url=https://api.x.ai/v1
+ send_openai_compatible helper) is correct as-is.
Updates to the spec:
1. §3.1.1: replaced the 'use xAI native' decision with the confirmed
per-vendor table. Qwen=Native, Grok=OpenAI-Compatible (per Grok's
own confirmation), MiniMax=OpenAI-Compatible, DeepSeek=OpenAI-
Compatible, Ollama=OpenAI-Compatible-in-v1 (native in v2),
Meta Llama API=Native (new 4th backend, follow-up), Gemini=Native
(follow-up), Anthropic=Native (follow-up). Also added Grok's
recommended v2 matrix field expansion: audio, video, grounding,
computer_use, local, reasoning/extended_thinking, web_search,
x_search, code_execution, file_search, mcp_support, structured_output.
2. §4.3: reverted from 'Grok via xAI (Native REST API)' back to
'Grok via xAI (OpenAI-Compatible) - confirmed 2026-06-11'. The
implementation does NOT need a native refactor; the OpenAI SDK
at https://api.x.ai/v1 is the canonical approach. Removed the
earlier 'caching: true' entry from the registry (since the
OpenAI-compat shim doesn't expose prompt_cache_key) and the
'no persistent client' state struct (back to the OpenAI SDK
pattern).
3. §13.1.B: renamed from 'Native Vendor APIs' to 'Llama Native APIs
(Ollama native + Meta Llama API)' and removed the Grok native
refactor item (Grok says OpenAI-compat is fine). Kept the Ollama
native + Meta Llama API items + matrix expansion. Clarified that
Grok tests do NOT need rewriting; only Llama tests get 2 more
(native Ollama, Meta Llama API).
Net effect: the Phase 3 work that just shipped (Grok+Llama Green
using OpenAI-compat shim) is CORRECT as-is. The implementation
matches Grok's actual recommendation. No code rollback needed.
Three additions to the spec, per the user's architectural correction
in this session:
1. NEW section 3.1.1: 'Architectural principle: Use the best API per
vendor' — explains why the OpenAI-compatible shim loses vendor-
specific features (xAI: prompt_cache_key, reasoning_effort, server-
side tools, cost_in_usd_ticks; Ollama: think param, images array,
thinking field, structured outputs) and states the principle:
'use each vendor's native SDK or REST API when one exists, falling
back to OpenAI-compatible only when no native option exists.'
Also notes that the capability matrix IS the aggregate tracker;
future native features go into the matrix, and the GUI filters
based on it (no per-vendor UI branches).
2. UPDATED section 4.3 (Grok): 'Grok via xAI (Native REST API)' — was
'OpenAI-Compatible'. Now specifies two native endpoints
(/v1/chat/completions and /v1/responses), the native features that
matter, the updated capability registry (caching=true for Grok
via prompt_cache_key), and a 'Phase 3 placeholder behavior' note
that this track's Phase 3 ships the OpenAI-compatible Grok as a
placeholder. The native refactor is deferred to follow-up B.
3. UPDATED section 13.1: added follow-up track B 'Native Vendor APIs
(post-OpenAI-compatible-placeholder)' which documents:
- Grok → xAI native REST
- Llama (Ollama) → native /api/chat
- Llama (Meta Llama API) → new 4th backend (deferred pending
verification of Meta's API spec; llama.developer.meta.com/docs/overview
returned 400 on fetch this session)
- Capability matrix expansion (web_search, x_search, code_execution,
file_search, mcp_support, reasoning_effort, structured_output)
- Test rewrites (mock requests.post instead of chat.completions.create)
This is a docs-only commit; no code changes. The Phase 3 Green work
continues with the OpenAI-compatible approach as planned in the
existing Red tests (t3.3 Grok + t3.14 Llama), and the follow-up track
B handles the native refactor when prioritized.
New track for prior-session sepia tint:
- 3 new theme slots (prior_session_bg, prior_session_tint, prior_session_amount)
- per-palette state dict mirroring _brightness/_contrast/_gamma
- apply_prior_tint helper (float-only math per user requirement)
- 6 prior-session render sites wrapped (2 bubble_vendor swaps + 4 tint wraps)
- Theme Settings panel slider with persistence
Code-block tonemap fix is OUT OF SCOPE (upstream imgui_bundle 1.92.5
API only exposes 4-value PaletteId enum, no per-instance struct).
See spec §1.1.1 and design doc 'Honest constraint' section.
The Phase 6+ section had two duplicate '### Active' headers, which
made the chronology confusing. The user (paraphrased): preserve the
chronology of project progress, don't need full detail, follow the
previous restructure's lightweight pattern.
Changes:
- Add '### Recently Completed (2026-06-06 to 2026-06-10)' subsection
containing the 3 closed tracks (startup_speedup, test_batching_refactor,
test_infrastructure_hardening) with lightweight entries: per-phase
commit SHAs only, 1-line summary, link to spec/plan/state folder.
Trimmed the verbose per-sub-track commentary that was in the old
startup_speedup entry (the per-sub-track bullets for warmup, status
indicator, audit violations, post-shipping fixes are in the
archive's spec/plan, not the tracks.md).
- Remove the duplicate '### Active' header.
- Update section intro to reflect '3 recently completed, 4 in plan'
(was '2 already completed, 3 in plan').
- test_infrastructure_hardening entry now has phase commit SHAs
(5df22fa8, 67d0211e, 006bb114, b8fcd9d6, 33d5cac, 7b87bbf5,
84edb200, 719fe9a) instead of just the closing-report link.
Chronology is now visible at a glance; per-track full detail is
in the linked archive/ folder.
These were authored at track start but missed by the final-state
commit. They are the brief 1-2 page design intent and executable
plan for the docs sync track. The closing report at
docs/reports/docs_sync_test_era_20260610.md summarizes the actual
17-commit execution.
- state.toml: status active->completed, all 25 tasks marked complete
with commit SHAs, all 4 phases checkpointed
- metadata.json: status active->shipped, 17-commit list, all 9
verification criteria flipped to DONE
- Structural Testing Contract (mirrors workflow.md)
- Isolated-Pass Verification Fallacy (Lesson 1, with link to the
test_infrastructure_hardening_batch_green_20260610 incident report
that motivated the rule)
- Audit Scripts as CI Gates (4 scripts: check_test_toml_paths,
audit_main_thread_imports, audit_weak_types, audit_no_models_config_io)
- Skip Markers Are Documentation, Not Avoidance (workflow.md policy)
Known Pitfalls (new subsection):
- HARD BAN: git checkout -- <file>, git restore, git reset
(per AGENTS.md Critical Anti-Patterns; destroyed user in-progress
edits twice on 2026-06-07; concrete 2026-06-10 incident:
mma_tier_usage_reset_fix regression)
Live_gui Test Fragility (2 new subsections):
- Anti-pattern: push_event + time.sleep(N) + assert is a race.
Fix: poll-until-state-visible with bounded retries. 5+ tests
affected in 2026-06-10 batch-green wave.
- Async setters need poll-for-state. mma_state_update and rag_*
setters dispatch to _pending_gui_tasks queue; the setter returns
before the GUI render loop processes the task. Assert immediately
= race. Fix: poll via get_value with bounded retry.
Lesson 5 from the 4-day test-hell saga. The chroma cache lives at
tests/artifacts/.slop_cache/chroma_<collection>/, NOT at the per-run
live_gui_workspace_<timestamp>/ subdir. The trailing-slash bug in
Path(active_project_path).parent places the cache one level higher
than expected.
RAG tests must pre-clean the cache to avoid persistent state from
prior batched runs. Documents the cleanup pattern (shutil.rmtree with
ignore_errors=True), the auto-recovery mechanism (_validate_collection_dim),
and 3 anti-patterns (assuming per-run, not cleaning, asserting on
first chunk in batched context).
New entry at the top of the Recently Shipped list, linking to the
archive/ folder. Includes:
- 314/314 green across all 11 tier batches
- FR1-FR5 summary
- 3 lineage tracks also archived
- The 4 unblocked tracks
- Link to the closing batch-green report
- Remove row 1 from Active Tracks table
- Update rows 2-5, 17: test_infrastructure_hardening_20260609 -> '(merged)'
- Mark test_infrastructure_hardening as [COMPLETE 2026-06-10] [archived]
- Update link to use archive/ instead of tracks/
- Add closing note: 314/314 tests green, lineage tracks also archived