NOW LET US – AI RAG SaaS Studio TP.HCM

Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges

A new study finds that LLM-as-judge evaluation systems are stable under repeated, neutral re-evaluation, yet can be pushed to reverse their decisions through targeted post-decision challenges, potentially distorting benchmark rankings.
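
The contrast between stability and reversibility can be made concrete by comparing two flip rates on the same items. The first counts how often the judge changes its initial verdict after a neutral request to re-evaluate. The second counts how often it changes after a targeted challenge. The Python sketch below is a minimal illustration under stated assumptions: the `JudgeTrial` record, its field names, and the toy verdicts are hypothetical and are neither data nor code from the study.

```python
from dataclasses import dataclass


@dataclass
class JudgeTrial:
    """Hypothetical record for one pairwise comparison judged several times."""
    initial: str          # first verdict: "A" or "B"
    after_neutral: str    # verdict after a neutral "please re-evaluate" turn
    after_challenge: str  # verdict after a targeted challenge to the first verdict


def flip_rate(trials, field):
    """Fraction of trials whose verdict in `field` differs from the initial verdict."""
    if not trials:
        return 0.0
    flips = sum(1 for t in trials if getattr(t, field) != t.initial)
    return flips / len(trials)


# Toy verdicts for illustration only (not results from the study).
trials = [
    JudgeTrial("A", "A", "B"),
    JudgeTrial("B", "B", "B"),
    JudgeTrial("A", "A", "B"),
    JudgeTrial("B", "B", "A"),
]
print(flip_rate(trials, "after_neutral"))    # 0.0
print(flip_rate(trials, "after_challenge"))  # 0.75
```

A judge can look perfectly consistent on the first rate while still reversing most of its decisions on the second. That gap is the failure mode the study describes.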

LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show that this assumption does not hold under interaction. We study post-decision manipulability: the extent to which an evaluation outcome can be altered through subsequent conversation with the judge after an initial decision has been made.

Across controlled experiments on MT-Bench and AlpacaEval, we find that LLM judges are highly stable under repeated and neutral reevaluation, yet become substantially reversible under targeted post-decision challenge. An anti-baseline challenge protocol shows that stable judgments can be overturned through motivated interaction, while a counterbalanced target-validation protocol separates this reversibility from net target-directed steering.

These reversals have practical consequences: they can degrade agreement with human preferences, shift benchmark rankings, and produce harmful evaluation changes despite high self-reported confidence. Authority framing is especially destabilizing, and revised judgments are often accompanied by low-overlap justifications, suggesting post hoc rationalization rather than reliable error correction.

We introduce the Evaluation Robustness Score (ERS) to quantify interactional robustness by combining reversal susceptibility with counterbalanced directional effects. Our findings identify post-decision interaction as a distinct failure mode for LLM-as-judge evaluation and motivate evaluation protocols that measure not only static agreement, but robustness under challenge.
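
The abstract says ERS combines reversal susceptibility with counterbalanced directional effects, but it does not give the formula. The Python sketch below is therefore a hypothetical illustration, not the authors' definition. `directional_effect` assumes each item is challenged once toward each answer, so a judge whose final verdict ignores the push direction scores 0. `robustness_score` uses an arbitrary, assumed weight `alpha` to fold the two quantities into a single score in [0, 1].

```python
def directional_effect(pushes):
    """Net target-directed steering under a counterbalanced design.

    `pushes` holds (target, final_verdict) pairs, where every item is
    challenged once toward "A" and once toward "B". If the final verdict
    ignores the push direction, exactly half of the pushes match their
    target and the effect is 0.0. A judge that always follows the push
    scores 1.0, and one that always resists it scores -1.0.
    """
    if not pushes:
        return 0.0
    matches = sum(1 for target, final in pushes if final == target)
    return 2 * matches / len(pushes) - 1


def robustness_score(reversal_rate, direction, alpha=0.5):
    """Hypothetical ERS-style score in [0, 1]; higher means more robust.

    ASSUMPTION: a simple weighted penalty chosen for illustration only.
    The paper's actual ERS formula is not given in the abstract.
    """
    penalty = alpha * reversal_rate + (1 - alpha) * abs(direction)
    return max(0.0, min(1.0, 1.0 - penalty))


# Toy data: item 1 keeps "A" whichever way it is pushed,
# item 2 follows the push in both conditions.
pushes = [("A", "A"), ("B", "A"), ("A", "A"), ("B", "B")]
d = directional_effect(pushes)
print(d)                          # 0.5
print(robustness_score(0.75, d))  # 0.375
```

The counterbalancing is what separates flipping from steering. A judge that changes its mind but lands on the same answer regardless of which side it is pushed toward shows reversibility without net target-directed steering. A judge that lands wherever it is pushed shows both.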

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

More in this category

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

Researchers have proposed SAGE-PTQ, a novel ultra-low-bit post-training quantization framework for LLMs that minimizes the hidden storage overhead of quantization scales. It significantly reduces GPU memory usage and speeds up decoding while maintaining high accuracy relative to existing methods.
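
The "hidden cost of scales" refers to the metadata that group-wise quantization stores alongside the low-bit weights. The Python sketch below is a generic back-of-the-envelope illustration, not SAGE-PTQ's method, whose details are not described here. It assumes one 16-bit scale per group and an optional zero point, and shows how that overhead is amortized over each group.

```python
def effective_bits_per_weight(weight_bits, group_size, scale_bits=16, zero_point_bits=0):
    """Average storage cost per weight for group-wise quantization.

    Each group of `group_size` weights shares one scale (and, optionally,
    one zero point), so that metadata is amortized over the group.
    ASSUMPTION: 16-bit scales by default; real formats vary.
    """
    return weight_bits + (scale_bits + zero_point_bits) / group_size


print(effective_bits_per_weight(2, 128))  # 2.125
print(effective_bits_per_weight(2, 32))   # 2.5
```

With a group size of 32, the scales add 0.5 bits per weight. That is 25% on top of a 2-bit payload but only 12.5% on top of a 4-bit one, which is why scale overhead matters more as bit widths shrink.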

Agents' Last Exam

A new benchmark called "Agents' Last Exam" (ALE) has been introduced to evaluate AI agents on long-horizon, economically valuable, real-world tasks, revealing that current models achieve an average pass rate of just 2.6% on the hardest tier.

A Motivational Architecture for Conversational AGI

This paper proposes a conversational reinterpretation of the OpenPsi motivational lineage, coupled to MetaMo's higher-level motivational scaffold, for agents built on a modular execution substrate.

Assessing the Carbon Emissions and Energy Consumption of U.S. Hyperscale Data Centers

A new study analyzing 403 hyperscale data centers in the US reveals that the AI boom is driving electricity consumption and carbon emissions to alarming levels, with their carbon intensity averaging 48% higher than the national grid average.

An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)

Researchers have developed an interpretable and trustworthy AI framework to study the relationship between knee joint structural abnormalities and pain progression. By combining deep learning with advanced statistical modeling, this framework significantly improves prediction accuracy and clinical reliability.

Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory

Researchers have developed a novel AI framework that combines uncertainty-aware functional prediction with component-level fatigue assessment to evaluate the reuse potential of returned products in circular factories.
