AGENTIC-SYSTEMS | March 20, 2026 | 1 min read | 15 views

Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction

Researchers developed the MultiTraitsss framework to generate 'Dark models' that simulate harmful AI behaviors, enabling the study and prevention of negative psychological outcomes in human-AI interactions.

Abstract: Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and even user harm. As LLMs serve as sources of guidance, emotional support, and even informal therapy, these risks are poised to escalate. However, studying the mechanisms underlying harmful human-AI interactions presents significant methodological challenges: organic harmful interactions typically develop over sustained engagement, requiring extensive conversational context that is difficult to simulate in controlled settings. To address this gap, we developed a Multi-Trait Subspace Steering (MultiTraitsss) framework that leverages established crisis-associated traits and a novel subspace steering framework to generate Dark models that exhibit cumulative harmful behavioral patterns. Single-turn and multi-turn evaluations show that our Dark models consistently produce harmful interactions and outcomes. Using our Dark models, we propose protective measures to reduce harmful outcomes in human-AI interactions.
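The abstract does not say how MultiTraitsss builds or applies its subspace, so the following is only a generic illustration of steering along several trait directions at once, not the authors' method. It assumes each trait is represented by a direction vector in a model's hidden space, mixes those directions with hypothetical weights, and shifts a hidden state by a fixed strength `alpha`. The function names, the toy 4-dimensional vectors, the weights, and `alpha` are all made up for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(dot(w, w))
        if n > tol:  # skip directions already covered by the basis
            basis.append([wi / n for wi in w])
    return basis


def project(vec, basis):
    """Orthogonal projection of `vec` onto the span of an orthonormal basis."""
    out = [0.0] * len(vec)
    for b in basis:
        c = dot(vec, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def steer(hidden, trait_vectors, weights, alpha):
    """Shift `hidden` by `alpha` along the normalized weighted mix of trait directions."""
    mix = [
        sum(w * v[i] for w, v in zip(weights, trait_vectors))
        for i in range(len(hidden))
    ]
    norm = math.sqrt(dot(mix, mix))
    if norm == 0.0:  # traits cancel out: nothing to steer toward
        return list(hidden)
    return [h + alpha * m / norm for h, m in zip(hidden, mix)]


# Toy data (hypothetical): two trait directions in a 4-dimensional hidden space.
trait_vectors = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
hidden = [0.5, -0.2, 0.3, 0.7]

steered = steer(hidden, trait_vectors, weights=[0.5, 0.5], alpha=2.0)

# The shift lies entirely inside the trait subspace, so the part of it
# outside that subspace should be (numerically) zero.
basis = orthonormal_basis(trait_vectors)
delta = [s - h for s, h in zip(steered, hidden)]
residual = [d - p for d, p in zip(delta, project(delta, basis))]
```

In this toy setup the shift has length `alpha` and only changes coordinates spanned by the trait directions; the remaining coordinates of `hidden` are untouched. In a real model, a shift like this would typically be applied to intermediate activations at inference time, but which layers, traits, and strengths the paper uses is not stated in the abstract.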

Source: arXiv cs.AI Recent

More in this category

agentic-systems

Probabilistic Concept-Aware Steering for Trustworthy LLM Inference

Researchers have introduced Probabilistic Concept-Aware Steering (PCS), an inference-time intervention framework for LLMs. PCS provides fine-grained, safety-oriented semantic steering while preserving original task competence.

agentic-systems

MUX: Continuous Reasoning via Multiplexed Tokens

Researchers introduced MUX, a novel latent reasoning method that compresses discrete text-based reasoning steps into continuous multiplexed tokens. By enabling lossless superposition, MUX significantly boosts LLM reasoning efficiency and speed across complex problem-solving tasks.

agentic-systems

ProbSPARQL: Querying Knowledge Graphs with Multi-dimensional, Uncertain Numeric Data

Researchers have introduced ProbSPARQL, an upward-compatible SPARQL extension designed to query multi-dimensional and uncertain numeric measurement data within Knowledge Graphs, providing significant query performance gains for complex industrial applications.

agentic-systems

Position: AI/ML Deepfake Research is Misaligned with AI-Generated Non-Consensual Intimate Imagery (AIG-NCII)

Current AI/ML research on deepfakes focuses primarily on epistemic harms like fake news and scams, leaving a dangerous gap in addressing AI-generated non-consensual intimate imagery (AIG-NCII) and its dignity harms to victims.

agentic-systems

Cross-Dialect Generalization Without Retraining: Benchmarks and Evaluation of Schema-Derived Constrained Decoding for MLIR

Researchers introduced schema-derived constrained decoding for MLIR, enabling a 1.7B small language model to match or beat 15B-34B LLMs without retraining, while generating code 8x-25x faster.

agentic-systems

Beyond Accuracy and Cost: Latency-Aware LLM Query Routing for Dynamic Workloads

Researchers introduced a novel latency-aware LLM query router that jointly optimizes generation latency, accuracy, and cost. The framework achieves up to a 40% improvement in accuracy-cost utility while maintaining low latency under dynamic workloads.