What Drives Interactive Improvement from Feedback?

A new study reveals that multi-turn improvements in LLMs are often driven by repeated attempts rather than feedback quality, highlighting that the student model's ability to act on feedback is the primary bottleneck.

Computer Science > Artificial Intelligence

Title:What Drives Interactive Improvement from Feedback?

View PDF HTML (experimental)Abstract:We study when natural-language feedback produces improvement beyond the gains obtainable from repeated attempts alone. In multi-turn language agent setting, higher final accuracy can reflect useful feedback, but it can also arise from resampling, format correction, or additional test-time computation. To separate these effects, we introduce a controlled student-teacher protocol across Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1, evaluating thirteen open-weight models in both student and teacher roles. We compare external feedback, self-feedback, and unguided self-refinement, while varying interaction history, task difficulty, and teacher access to privileged task information. Across settings, we find that multi-turn improvement is often not evidence of feedback use: self-generated feedback adds little beyond unguided self-refinement, whereas the strongest external teachers produce substantially larger feedback-specific gains, suggesting that useful feedback must provide guidance beyond generic retry. Dense student-teacher interaction matrices further show that interactive gains are driven more by the student's ability to use feedback than by the teacher's identity, although teacher choice remains important for a fixed student. These results suggest that feedback-based agents should be evaluated against repeated-attempt baselines, and that ability to act on feedback, not merely feedback availability, is a central bottleneck for interactive improvement. We release our controlled student-teacher evaluation framework at this https URL.

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Source: arXiv cs.AI Recent

More in this category

agentic-systems

MultiUAV-Plat: An LLM-Oriented Platform, Benchmark and Framework for Multi-UAV Collaborative Task Planning

Researchers have introduced MultiUAV-Plat, a breakthrough simulation and benchmarking platform for LLM-based multi-UAV collaborative task planning, alongside the Agent4Drone framework which significantly improves task success rates.

agentic-systems

AgRefactor: Self-Evolving Agentic Workflow for HLS Compatibility and Performance

Researchers have introduced AgRefactor, an LLM-based multi-agent workflow that automates the refactoring of software code into HLS-compatible programs. Featuring a self-evolving memory system and tool integration, AgRefactor outperforms existing solutions and paves the way for automated chip design.

agentic-systems

Contrastive Reflection for Iterative Prompt Optimization

Researchers have introduced Contrastive Reflection, an iterative prompt-optimization framework for agentic information retrieval workflows. By comparing failed and successful execution traces, the method improves exact-match accuracy on HotpotQA from 51.4% to 60.4%.

agentic-systems

Neuro-Bayesian-Symbolic Residual Attention Shallow Network: Explainable Deep Learning for Cybersecurity Risk Assessment

Researchers have introduced NBS-RASN, a hybrid shallow network architecture that brings explainability to cybersecurity risk assessment in open-source ecosystems, proving that shallow networks with deep reasoning can outperform opaque deep models.

agentic-systems

DDIAgents: Mechanism-Conditioned Context Flow for Drug-Drug Interaction Prediction

Researchers have introduced DDIAgents, a novel multi-agent framework that improves drug-drug interaction (DDI) prediction through dynamic knowledge orchestration. By adapting context flow to specific interaction mechanisms, DDIAgents outperforms existing baselines and enhances interpretability in AI4Science.

agentic-systems

Beyond Compilation: Evaluating Faithful Natural-Language-to-Lean Statement Formalization

A new study reveals that successfully compiling AI-generated Lean statements does not guarantee semantic accuracy, exposing a significant gap and proposing a more rigorous evaluation framework.

EXPLORE TOPICS