MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

Researchers have introduced MER-R1, a reinforcement learning framework for multimodal emotion recognition (MER). By combining 'fast thinking' (direct, intuitive answers) with 'slow thinking' (deliberative reasoning), MER-R1 achieves state-of-the-art performance on the MER-UniBench and MME-Emotion benchmarks.

Abstract: We find that explicit reasoning does not necessarily translate into better multimodal emotion recognition (MER) accuracy, even though it makes predictions more interpretable. Specifically, for reasoning-based MLLMs, fast thinking by triggering direct answers often outperforms slow thinking after deliberative reasoning. Our empirical analyses show that fast thinking improves recall with broader and more confident predictions, whereas slow thinking favors precision through conservative filtering of incorrect categories. Building on these insights, we propose MER-R1, a reinforcement learning framework that turns slow-fast complementarity into explicit optimization. Dual-objective disentanglement separates recall and precision into two optimization signals, allowing them to be jointly optimized rather than traded off against each other. Slow-fast confidence calibration further aligns the final slow-thinking answer with fast-thinking intuition, strengthening correct emotions while suppressing incorrect ones. In this way, MER-R1 unifies the recall-oriented intuition of fast thinking with the precision-oriented selectivity of slow thinking. We further provide theoretical justification for this synergy, showing that it mitigates variance-induced interference during optimization. Extensive experiments on MER-UniBench and MME-Emotion show that MER-R1 achieves state-of-the-art performance and makes reasoning genuinely benefit emotion recognition.
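The recall/precision contrast described in the abstract can be made concrete with a small sketch. This is not the paper's implementation: it assumes each prediction and each ground truth is a set of emotion labels, and the function name `recall_precision` and the example labels are hypothetical, chosen only for illustration. It shows how a broad "fast" guess and a conservative "slow" guess score on the two metrics when they are kept as separate signals.

```python
from typing import Set, Tuple


def recall_precision(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) for one sample's set of emotion labels.

    Hypothetical helper: the two values are returned separately so they
    could serve as two distinct signals instead of one merged score.
    """
    hits = len(predicted & gold)  # labels that are both predicted and correct
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


# Illustrative labels only; not taken from any benchmark.
gold = {"happy", "surprised"}
fast = {"happy", "surprised", "excited", "calm"}  # broad, recall-oriented guess
slow = {"happy"}                                  # conservative, precision-oriented guess

fast_scores = recall_precision(fast, gold)
slow_scores = recall_precision(slow, gold)

print("fast (recall, precision):", fast_scores)
print("slow (recall, precision):", slow_scores)
```

In this toy case the broad guess recovers every correct label but includes wrong ones, while the conservative guess is fully correct but misses a label, which is the complementarity the abstract attributes to fast and slow thinking.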

Source: arXiv cs.AI Recent

More in this category

agentic-systems

When Does Personality Composition Matter for Multi-Agent LLM Teams?

A new study investigates how prompting personality traits in LLMs affects multi-agent team performance, revealing that the impact of personality depends heavily on the specific task structure.

agentic-systems

Odyssey: Constructing Verifiable Local Truth-Preserving Foundation Models

Researchers introduce Odyssey, a categorical framework designed to construct verifiable, local truth-preserving foundation models. By leveraging advanced mathematical concepts like sheaf theory and Kan extensions, Odyssey ensures AI models maintain factual consistency and logical integrity across diverse domains.

agentic-systems

JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications

JD.com has introduced Oxygen AIIC, an industrial-scale platform leveraging LLMs and VLMs to optimize the management and understanding of billions of products. This solution significantly improves user experience, reduces operational costs, and enhances search and recommendation efficiency across the e-commerce platform.

agentic-systems

Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System

Researchers have developed DD-Elo, a new chess rating system based on the drift-diffusion model from cognitive neuroscience. By analyzing move-by-move data rather than just match outcomes, DD-Elo updates player ratings much faster and more accurately than the traditional Elo system.

agentic-systems

What We are Missing in Multimodal LLM Evaluation?

While multimodal large language models (MLLMs) are advancing rapidly, current evaluation benchmarks fail to keep pace. This research highlights critical gaps in assessing how these models truly integrate cross-modal information.

agentic-systems

Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

Researchers have formalized 'Instruction Bleed' (Compositional Behavioral Leakage), a recurring failure mode in prompt-composed agentic systems where editing one prompt module silently shifts the behavior of others due to lack of architectural isolation in Transformer self-attention.
