DEV-TOOLS | June 1, 2026 | 1 min read

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has released Mellum2, an open 12B Mixture-of-Experts model optimized for low-latency text and code tasks. It delivers competitive performance with over 2x faster inference compared to similar-sized models.

Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code.
The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference. Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments.
It is released under the Apache 2.0 license.
Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference.
Download the model on Hugging Face: https://huggingface.co/collections/JetBrains/mellum-2
For architecture details, training setup, benchmarks, and evaluation methodology, read the full technical report: https://arxiv.org/pdf/2605.31268
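
To make the "2.5B active out of 12B" figure concrete, here is a minimal Python sketch of the general idea behind sparse Mixture-of-Experts routing. It is an illustration only, not Mellum2's actual implementation: the number of experts, the value of k, and the router scores below are hypothetical, and the helper names (`route_top_k`, `active_fraction`) are invented for this example. The real routing scheme is described in the technical report.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with weights renormalized
    over the selected experts only. This is generic top-k routing,
    assumed here for illustration.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


# Parameter counts taken from the announcement.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
print(f"Active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")

# Hypothetical router scores for one token across four experts.
print(route_top_k([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count (roughly a fifth of the total here) rather than the full 12B.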

Source: Hugging Face Blog
