mixture-of-experts
Mistral releases a lean, open model built for formal math proofs (News)
Leanstral 1.5 is a free, open model specialized in writing machine-checked mathematical proofs, built on a design that keeps only a small fraction of its parameters active at a time.
A 35-billion-parameter agent that punches like a trillion-parameter model (News)
Shanghai AI Lab argues you can reach giant-model performance on long tasks not by adding parameters, but by training on much longer chains of real work.
Mixture of Experts: The Committee Inside a Giant Model (Lesson)
Why many of the biggest AI models are not one big brain but a large team of specialists, only a few of whom wake up for any given word. That trick lets a model be huge and fast at the same time.
A small but elegant idea: putting 'experts' inside the attention layer (News)
Grouped Query Experts brings the mixture-of-experts trick into attention, activating only half of a model's query heads per token while matching the full version's performance, at least at small scale.
A classic efficiency trick just moved into a new part of the model (News)
For years, the committee-of-specialists design that keeps big models fast lived in one kind of layer of the network, the feed-forward block. A clean new result shows it works in the attention layer too, halving some of the work with no apparent loss in quality.