A Definition of Good Explanations and the Challenges Explaining LLM Outputs

Defining what constitutes a "good explanation" for AI decisions is a long-standing challenge. This paper proposes a new definition based on counterfactual explanations and user beliefs, highlighting why explaining LLM outputs remains particularly difficult.

Abstract: How to define a good explanation is a long-standing philosophical debate which has found recent renewed interest in the context of AI outputs. Explainability is crucial for AI adoption in many contexts, but in order to produce good explanations of AI systems, we must first have an understanding of what good explanations are. In this paper we propose a definition inspired by the notion of counterfactual explanations; however, we argue that one must also take into account the interlocutor's prior beliefs in each fact that could be offered in an explanation. We explore the ramifications of this definition for AI explainability and, in particular, why it is difficult to produce good explanations for LLM outputs.

Source: arXiv cs.AI Recent
