7c1d597ef1
Three new vendors + capability matrix framework + MiniMax refactor:

**Capability matrix v1 (7 features):** vision, tool_calling, caching, streaming, model_discovery, context_window, cost_tracking. Audio and server-side code execution deferred to a follow-up track.

**Qwen via DashScope native SDK:** Qwen-Turbo, Qwen-Plus, Qwen-Max, Qwen-Long (1M context), Qwen-VL-Plus/Max (vision), Qwen-Audio. Native API chosen over OpenAI-compatible mode to unlock Qwen-Audio, Qwen-Long custom chunking, and Qwen-VL-Max enhanced vision.

**Llama (OpenAI-compatible, multi-backend):** Ollama (local, free), OpenRouter (cloud aggregator covering Together/Groq/Fireworks), custom URL escape hatch. Models: Llama 3.1 8B/70B/405B, 3.2 1B/3B, 3.2 11B/90B Vision, 3.3 70B.

**Grok via xAI (OpenAI-compatible):** Grok-2, Grok-2-Vision, Grok-Beta.

**Shared OpenAI-compatible helper** in src/openai_compatible.py processes a normalized request/response data structure; each _send_<vendor>() is a thin adapter at the boundary (data-oriented design per Fleury/Acton/Lottes).

**MiniMax refactor:** ~250 lines reduced to ~50 by using the shared helper. Existing test_minimax_provider.py is the safety net.

**UX adaptation:** 9 UI elements (screenshot, tools toggle, cache panel, stream progress, fetch models, token budget, cost panel) read from the matrix instead of hard-coding per-vendor branches.

**Out of scope (deferred):** Anthropic/Gemini/DeepSeek migration to the matrix (separate track), audio input, server-side code execution, PDF input, batch API, fine-tuning.

6 phases planned: matrix+helper, Qwen, Grok+Llama, MiniMax refactor, UX adaptation, docs+archive.
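A minimal sketch of how the capability matrix and the matrix-driven UI might look. The seven field names come from the v1 feature list above; everything else is an assumption for illustration: the `Capabilities` class, the `CAPABILITY_MATRIX` dict, `get_capabilities()`, `visible_ui_elements()`, the lowercase vendor/model keys, and the one-to-one mapping from feature to UI element (inferred from the order in which both lists are written). The flag values are placeholders, except the Qwen-Long 1M context window, which is stated above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Capabilities:
    """One matrix row: the seven v1 features for a single vendor/model pair."""
    vision: bool = False
    tool_calling: bool = False
    caching: bool = False
    streaming: bool = False
    model_discovery: bool = False
    context_window: Optional[int] = None  # in tokens; None means unknown
    cost_tracking: bool = False


# Hypothetical entries. Flag values are placeholders, not verified vendor facts.
CAPABILITY_MATRIX: Dict[Tuple[str, str], Capabilities] = {
    ("qwen", "qwen-long"): Capabilities(streaming=True, context_window=1_000_000),
    ("grok", "grok-2-vision"): Capabilities(vision=True, streaming=True),
}

# Assumed pairing of matrix feature -> UI element, in declaration order.
UI_ELEMENT_FOR_FEATURE: Dict[str, str] = {
    "vision": "screenshot",
    "tool_calling": "tools toggle",
    "caching": "cache panel",
    "streaming": "stream progress",
    "model_discovery": "fetch models",
    "context_window": "token budget",
    "cost_tracking": "cost panel",
}


def get_capabilities(vendor: str, model: str) -> Capabilities:
    """Look up a row; unknown pairs fall back to an all-off default."""
    return CAPABILITY_MATRIX.get((vendor, model), Capabilities())


def visible_ui_elements(caps: Capabilities) -> List[str]:
    """Return the UI elements to show, with no per-vendor branching."""
    return [
        ui_name
        for feature, ui_name in UI_ELEMENT_FOR_FEATURE.items()
        if getattr(caps, feature)  # False and None both hide the element
    ]
```

With this shape, the UI layer asks one question (`visible_ui_elements(get_capabilities(vendor, model))`) and adding a vendor means adding rows rather than editing UI code.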