NOW LET US – AI RAG SaaS Studio TP.HCM
AGENTIC-SYSTEMS · 1 min read

SafeGene: Reusable Adapters for Transferable Safety Alignment


Fine-tuning open-weight LLMs can degrade their safety alignment, even when the training data is not intentionally harmful, and leave them more vulnerable to malicious prompts. SafeGene addresses this with a reusable safety-adapter module that restores safety across different downstream tasks while maintaining task performance.


Abstract: Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful. This creates a recurring safety recovery problem as target models are repeatedly updated with new task data or user interactions. We propose SafeGene, a reusable safety-adapter module designed for cross-task reuse within each architecture-compatible model family. Rather than treating safety recovery as a model-specific repair step, SafeGene treats safety capability as an independent, reusable adapter representation decoupled from task-specific updates. This representation is obtained from aligned–degraded model discrepancies, refined into task-transferable safety vectors through data-aware layer selection, and expressed in each downstream task-adapted model via few-shot layer-wise coefficient recalibration. Experiments across multiple model families, downstream tasks, and safety judges show that SafeGene-enhanced models reduce harmful response rates while maintaining downstream performance, outperforming representative safe adaptation methods in safety–utility trade-off.
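The abstract describes three stages: take the difference between an aligned model and its safety-degraded counterpart, keep only the layers where that difference matters, then rescale it per layer for each new task model. The sketch below illustrates that pipeline under loudly stated assumptions and is not the authors' implementation. It assumes a model is a dict of per-layer weight matrices and takes the safety vector to be the plain weight difference (task-vector style). It substitutes a hypothetical score (how strongly each layer's vector changes outputs on calibration activations) for the paper's data-aware layer selection. For recalibration, it uses a greedy grid search over per-layer coefficients against a user-supplied few-shot objective. All function names are hypothetical.

```python
import numpy as np
from typing import Callable, Dict, List, Sequence

# Assumption: a "model" is a dict of layer name -> weight matrix of shape (out, in).
Weights = Dict[str, np.ndarray]


def safety_vectors(aligned: Weights, degraded: Weights) -> Weights:
    """Per-layer discrepancy between the aligned model and its safety-degraded fine-tune."""
    return {name: aligned[name] - degraded[name] for name in aligned}


def select_layers(deltas: Weights, calib_acts: Dict[str, np.ndarray], k: int) -> List[str]:
    """Hypothetical data-aware selection: keep the k layers whose safety vector changes
    the layer output most on calibration activations X of shape (n, in)."""
    scores = {
        name: float(np.linalg.norm(calib_acts[name] @ deltas[name].T))
        for name in deltas
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def apply_adapter(task_model: Weights, deltas: Weights, coeffs: Dict[str, float]) -> Weights:
    """Add alpha_l * delta_l to each selected layer; all other layers are copied unchanged."""
    patched = {name: w.copy() for name, w in task_model.items()}
    for name, alpha in coeffs.items():
        patched[name] = patched[name] + alpha * deltas[name]
    return patched


def recalibrate(
    task_model: Weights,
    deltas: Weights,
    layers: Sequence[str],
    objective: Callable[[Weights], float],
    grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Dict[str, float]:
    """Greedy per-layer coefficient search on a few-shot objective (lower is better)."""
    coeffs = {name: 1.0 for name in layers}
    for name in layers:
        best_alpha, best_loss = coeffs[name], float("inf")
        for alpha in grid:
            loss = objective(apply_adapter(task_model, deltas, {**coeffs, name: alpha}))
            if loss < best_loss:
                best_alpha, best_loss = alpha, loss
        coeffs[name] = best_alpha
    return coeffs


# Toy usage on random matrices (illustrative only).
rng = np.random.default_rng(0)
aligned = {f"layer{i}": rng.normal(size=(4, 3)) for i in range(3)}
degraded = {n: w + rng.normal(scale=0.1, size=w.shape) for n, w in aligned.items()}
deltas = safety_vectors(aligned, degraded)
acts = {n: rng.normal(size=(8, 3)) for n in aligned}
layers = select_layers(deltas, acts, k=2)
task_model = {n: w + rng.normal(scale=0.05, size=w.shape) for n, w in degraded.items()}


def objective(model: Weights) -> float:
    # Stand-in for a few-shot harmfulness/utility score: squared distance to the aligned weights.
    return sum(float(np.sum((model[n] - aligned[n]) ** 2)) for n in model)


coeffs = recalibrate(task_model, deltas, layers, objective)
safe_model = apply_adapter(task_model, deltas, coeffs)
```

In a real setting the objective would combine a safety judge's harmfulness score and a task metric on a handful of examples; the toy distance objective is used only so the example runs without a model.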

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

More in this category

agentic-systems

DiBS: Diffusion-Informed Branch Selection

Researchers have introduced DiBS, an approach that uses diffusion models to guide symbolic solvers on complex Sudoku puzzles. On challenging instances, the method substantially reduces search cost and the number of backtracks while preserving strict correctness guarantees.

agentic-systems

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Researchers introduce Brick-Composer, a learning framework that equips multimodal large language models (MLLMs) with spatial reasoning and visual grounding capabilities for brick assembly, significantly improving their construction accuracy.

agentic-systems

Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison

A new study compared three state-of-the-art LLMs (GPT-4o, Claude Sonnet, and Llama 3.1) against ten medical specialists in summarizing clinical literature. While expert-written summaries remain preferred, the study reveals that distinguishing between human- and AI-generated medical content is becoming increasingly difficult.

agentic-systems

Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution

A new study on convergence dynamics in LLM-driven program evolution reveals that large language models tend to cycle back to previously seen code structures rather than exploring novel solutions. This 'mutation without variation' phenomenon poses a significant challenge for using AI in autonomous software development.

agentic-systems

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

Researchers have introduced LeanMarathon, a multi-agent harness designed to address the challenges of long-horizon mathematical autoformalization. Using an evolving blueprint and a two-stage orchestrator, the system autonomously formalized complex theorems.

agentic-systems

Residual Modeling for High-Fidelity Learned Compression of Scientific Data

Researchers have proposed a new residual modeling approach to optimize machine learning-based scientific data compression. This solution addresses the block-level accuracy degradation issue, outperforming traditional methods in high-fidelity scenarios.
