Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has released Mellum2, an open 12B Mixture-of-Experts model optimized for low-latency text and code tasks. It delivers competitive performance with over 2x faster inference compared to similar-sized models.
- Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code.
- The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference.
- Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments.
- It is released under the Apache 2.0 license.
- Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference.
- Download the model on Hugging Face: https://huggingface.co/collections/JetBrains/mellum-2
- For architecture details, training setup, benchmarks, and evaluation methodology, read the full technical report: https://arxiv.org/pdf/2605.31268
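
To make the "12B total, 2.5B active" figures concrete, here is a minimal sketch of the arithmetic, plus a toy top-k router of the kind Mixture-of-Experts layers commonly use. Only the two parameter counts come from the announcement. The expert count, the value of `k`, the router scores, and the helper names `active_fraction` and `top_k_route` are hypothetical and chosen for illustration; Mellum2's actual routing scheme is described in the technical report.

```python
import math

TOTAL_PARAMS_B = 12.0   # total parameters (billions), from the announcement
ACTIVE_PARAMS_B = 2.5   # parameters activated per token (billions), from the announcement


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that participate in one token's forward pass."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Toy router: keep the k highest-scoring experts and softmax-normalize their weights.

    This is a generic illustration, not Mellum2's documented routing.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {fraction:.1%}")

# Hypothetical router scores for 4 experts; only the top 2 are used for this token.
weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(weights)
```

Roughly a fifth of the parameters are used per token under these figures. Per-token compute tends to scale with the active parameter count rather than the total, although measured speedups also depend on hardware, batching, and the serving stack.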
Source: Hugging Face Blog