Kwas
Posted October 12, 2025

https://qwen.ai/blog?from=research.latest-advancements-list&id=4074cca80393150c248e508aa62983f9cb7d27cd&

We believe that Context Length Scaling and Total Parameter Scaling are two major trends in the future of large models. To further improve training and inference efficiency under long-context and large-parameter settings, we design a brand-new model architecture called Qwen3-Next. Compared to the MoE structure of Qwen3, Qwen3-Next introduces several key improvements: a hybrid attention mechanism, a highly sparse Mixture-of-Experts (MoE) structure, training-stability-friendly optimizations, and a multi-token prediction mechanism for faster inference.

Supposed to be a pretty solid model with thinking and instruct options. Going to test this out soon using Ollama on my setup; need to know how many r's are still in strawberry.
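In case anyone else wants to run the same test, here's a minimal sketch of querying a local Ollama server over its REST API from Python. Note the model tag "qwen3-next" is just a placeholder guess; swap in whatever tag actually shows up in the Ollama library once the model is published there.

# Minimal sketch: ask a locally running Ollama server the strawberry question.
# Assumes Ollama is listening on its default port (11434) and that a
# Qwen3-Next build has been pulled under the tag "qwen3-next" (hypothetical).
import json
import urllib.request

payload = json.dumps({
    "model": "qwen3-next",   # placeholder tag, adjust to the real one
    "prompt": "How many r's are in the word strawberry?",
    "stream": False,          # return one complete JSON response, not a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])       # the model's answer text

The same check works straight from the CLI with ollama run once the tag exists; the script is just handier if you want to loop it or log the answers.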