AGENTIC-SYSTEMS | March 20, 2026 | 1 min read

InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model

InfoMamba is an attention-free hybrid architecture that pairs a Mamba-style selective state-space stream with a concept bottleneck linear filtering layer, which replaces token-level self-attention as the global interface. The authors report that it outperforms strong Transformer and SSM baselines while maintaining near-linear scaling.

Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a consistency boundary analysis that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies structural gaps that remain. Motivated by this analysis, we propose InfoMamba, an attention-free hybrid architecture. InfoMamba replaces token-level self-attention with a concept bottleneck linear filtering layer that serves as a minimal-bandwidth global interface and integrates it with a selective recurrent stream through information-maximizing fusion (IMF). IMF dynamically injects global context into the SSM dynamics and encourages complementary information usage through a mutual-information-inspired objective. Extensive experiments on classification, dense prediction, and non-vision tasks show that InfoMamba consistently outperforms strong Transformer and SSM baselines, achieving competitive accuracy-efficiency trade-offs while maintaining near-linear scaling.
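The abstract does not give the layer definitions, so the following is only a rough sketch of the two ingredients it names: a diagonal state-space recurrence (one pass over the sequence, with memory that decays geometrically) and a small bottleneck of K linear "slots" that summarizes all tokens without pairwise attention. Every name here (diagonal_ssm_scan, bottleneck_global_context, fuse, write, read) is hypothetical, the additive fusion is a stand-in rather than the paper's IMF, and the mean-pooled context is non-causal. It is meant to illustrate why both paths cost time linear in sequence length, not to reproduce InfoMamba.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def diagonal_ssm_scan(xs: List[Vector], decay: Vector, gain: Vector) -> List[Vector]:
    """Causal diagonal recurrence: h_t = decay * h_{t-1} + gain * x_t (elementwise)."""
    state = [0.0] * len(decay)
    outputs = []
    for x in xs:  # a single pass over N tokens: O(N * D)
        state = [a * h + b * xi for a, h, b, xi in zip(decay, state, gain, x)]
        outputs.append(state)
    return outputs


def bottleneck_global_context(xs: List[Vector], write: List[Vector],
                              read: List[Vector]) -> Vector:
    """Summarize all tokens through K linear slots (assumed form, no softmax).

    write and read are K x D matrices. Cost is O(N * K * D), with K fixed.
    """
    n = len(xs)
    dim = len(xs[0])
    # Each slot is a linear projection of the tokens, averaged over the sequence.
    slots = [sum(dot(w, x) for x in xs) / n for w in write]
    # Project the K slot values back to one D-dimensional context vector.
    return [sum(slots[k] * read[k][d] for k in range(len(slots))) for d in range(dim)]


def fuse(local: List[Vector], context: Vector) -> List[Vector]:
    """Placeholder fusion: add the shared context to every position."""
    return [[y + c for y, c in zip(row, context)] for row in local]


# An impulse at t=0 shows the geometric ("short-memory") decay of the recurrence.
tokens = [[1.0], [0.0], [0.0]]
local = diagonal_ssm_scan(tokens, decay=[0.5], gain=[1.0])
print(local)  # [[1.0], [0.5], [0.25]]

context = bottleneck_global_context(tokens, write=[[1.0]], read=[[2.0]])
fused = fuse(local, context)
```

In this toy setup the recurrent path forgets the impulse at rate 0.5 per step, while the bottleneck path hands every position the same sequence-wide summary, which is the kind of synchronous global signal the abstract says diagonal SSMs struggle to capture on their own.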

Source: arXiv cs.AI Recent
