Beyond expert users: agents should help users construct preferences, not just elicit them

Current AI agents assume users have well-formed preferences, but a new study argues they should instead help users construct those preferences by supplying domain knowledge. On CoShop, a new interactive benchmark, none of the five frontier models evaluated exceeds 56% accuracy, mainly because the interaction does little to expand what users know about what they want.



Abstract: Agents typically assume an expert user -- one with well-formed preferences about what they want -- and default to clarifying questions whenever the task is underspecified. We argue this assumption is unrealistic. Users often lack the domain knowledge to have completely specified preferences; if asked about their preference on some feature, the user may be unable to answer without the agent helping the user to learn some domain knowledge needed to form a preference for that feature, e.g., via examples or explanations. To formalize these principles, we draw on the Search-Experience-Credence framework from Information Economics to introduce CoPref, a model of how users construct preferences based on agent dialog actions. We then study these ideas concretely in agentic recommender systems, proposing CoShop, an interactive benchmark. In CoShop, an agent converses with and makes recommendations for a CoPref user. The agent's performance depends on whether it can help the user gain the knowledge needed to specify the task well. Evaluating five frontier models, we find that no agent exceeds 56% accuracy on CoShop despite five turns of interaction. Failures stem not from agents' ability to find items, but from how little the interaction expands what users know about what they want.
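
To make the central idea concrete, here is a minimal Python sketch of a simulated user who can state a preference for a feature only after the agent has explained that feature. It is a toy illustration under stated assumptions, not the paper's CoPref model: the class `ToyUser`, the action names `"ask"` and `"explain"`, and the example features are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyUser:
    """Hypothetical simulated user (illustration only, not CoPref itself)."""

    # Preferences the user would hold once they understand each feature.
    latent: dict
    # Features the user already understands well enough to answer about.
    known: set = field(default_factory=set)

    def respond(self, action: str, feature: str) -> Optional[str]:
        """Return a stated preference, or None if the user cannot answer."""
        if action == "explain":
            # Assumption: a single explanation is enough to teach the feature.
            self.known.add(feature)
            return None
        if action == "ask":
            # A clarifying question only works for features the user understands.
            return self.latent[feature] if feature in self.known else None
        raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    user = ToyUser(
        latent={"price": "under $100", "switch_type": "linear"},
        known={"price"},
    )
    print(user.respond("ask", "price"))        # user can answer directly
    print(user.respond("ask", "switch_type"))  # user cannot answer yet
    user.respond("explain", "switch_type")     # agent supplies domain knowledge
    print(user.respond("ask", "switch_type"))  # user can now answer
```

In this sketch, an agent that only asks clarifying questions never learns the `switch_type` preference, while an agent that explains first does. That is the gap between eliciting and constructing preferences that the abstract describes.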

Source: arXiv cs.AI Recent
