NOW LET US – AI RAG SaaS Studio TP.HCM

Contrastive Reflection for Iterative Prompt Optimization

Researchers have introduced Contrastive Reflection, an iterative prompt-optimization framework for agentic information retrieval workflows. By contrasting failed execution traces with nearby successful ones, a single repair raises held-out exact-match accuracy on a HotpotQA retrieval-augmented QA setup from 51.4% to 60.4%, a gain of 9.0 percentage points.

Abstract: LLM agents are becoming central to information retrieval: they issue retrieval queries, synthesize answers, and increasingly serve as judges for IR evaluation. Improving the prompts that control these agents is an optimization problem, but in applied IR settings it often looks less like blind search and more like debugging. Engineers need to know which behavior failed, which nearby behavior still worked, what distinguishes the two, and whether a prompt edit improves held-out quality without introducing regressions.

We present Contrastive Reflection, an iterative prompt-optimization framework for agentic IR workflows. The framework starts from a task-centric quality definition: QA agents expose retrieval or reasoning traces, and grading agents expose dimension-level scores and rationales. These structured traces are used to identify error-anchored behavioral slices, add nearby successful examples from the same region, and ask a Teacher LLM to propose a targeted prompt edit. Candidate edits are accepted only when validation performance improves, optionally subject to regression checks. We instantiate the framework with a tree-based slice selector, but the contribution is the contrastive reflection loop rather than the tree itself.
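The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Trace`, `slice_key`, `select_contrastive_batch`, `accept_edit`, and `optimize` are hypothetical names, the slice selector is a simple "most failures" heuristic standing in for the tree-based selector, and the agent and Teacher LLM are replaced by toy functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Trace:
    example_id: str
    slice_key: str  # hypothetical region label, e.g. a leaf of a tree-based selector
    correct: bool


def select_contrastive_batch(
    traces: Sequence[Trace], k: int = 3
) -> Tuple[List[Trace], List[Trace]]:
    """Return up to k failures from the slice with the most failures,
    plus up to k successes from that same slice."""
    failures: Dict[str, List[Trace]] = {}
    for t in traces:
        if not t.correct:
            failures.setdefault(t.slice_key, []).append(t)
    if not failures:
        return [], []
    worst = max(failures, key=lambda s: len(failures[s]))
    wins = [t for t in traces if t.correct and t.slice_key == worst]
    return failures[worst][:k], wins[:k]


def accept_edit(
    base: Dict[str, bool], cand: Dict[str, bool], max_regressions: Optional[int] = None
) -> bool:
    """Accept only if validation accuracy strictly improves; optionally cap
    how many previously correct examples the edit breaks.
    Both dicts cover the same ids, so comparing counts compares accuracy."""
    if sum(cand.values()) <= sum(base.values()):
        return False
    regressions = sum(1 for i, ok in base.items() if ok and not cand[i])
    return max_regressions is None or regressions <= max_regressions


def optimize(
    prompt: str,
    run_agent: Callable[[str, str], Trace],
    teacher: Callable[[str, List[Trace], List[Trace]], str],
    train_ids: Sequence[str],
    val_ids: Sequence[str],
    rounds: int = 3,
    max_regressions: Optional[int] = None,
) -> str:
    val_base = {i: run_agent(prompt, i).correct for i in val_ids}
    for _ in range(rounds):
        traces = [run_agent(prompt, i) for i in train_ids]
        fails, wins = select_contrastive_batch(traces)
        if not fails:
            break  # nothing left to repair on the training split
        candidate = teacher(prompt, fails, wins)  # targeted prompt edit
        val_cand = {i: run_agent(candidate, i).correct for i in val_ids}
        if accept_edit(val_base, val_cand, max_regressions):
            prompt, val_base = candidate, val_cand
    return prompt


# Toy stand-ins (assumptions for illustration only).
SLICES = {"q1": "multi_hop", "q2": "multi_hop", "q3": "single_hop",
          "q4": "multi_hop", "q5": "single_hop"}


def toy_agent(prompt: str, example_id: str) -> Trace:
    s = SLICES[example_id]
    ok = s == "single_hop" or example_id == "q2" or "step by step" in prompt
    return Trace(example_id, s, ok)


def toy_teacher(prompt: str, fails: List[Trace], wins: List[Trace]) -> str:
    return prompt + " Resolve each hop step by step."


final_prompt = optimize("Answer the question.", toy_agent, toy_teacher,
                        train_ids=["q1", "q2", "q3"], val_ids=["q4", "q5"])
```

In a real setup, `run_agent` would execute the QA or grading agent and record its trace, and `teacher` would call an LLM with the failed and successful traces side by side. The acceptance gate is the part that keeps the loop validation-driven: an edit that does not improve held-out accuracy is simply discarded.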

On a public HotpotQA retrieval-augmented QA setup, one tree-selected contrastive repair improves held-out exact-match accuracy from 51.4% to 60.4%. Failure-only and random-evidence variants improve less and break more previously correct examples. A light instruction-only comparison places the method near modern prompt optimizers: MIPROv2 reaches 59.4% and GEPA 57.0%. The result is an interpretable optimization loop for IR agents, aimed at making prompt repair more inspectable and validation-driven.
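To make the two quantities in this comparison concrete (exact-match accuracy, and how many previously correct examples an edit breaks), here is a small sketch. It assumes the common SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the paper's exact scoring script is not given here, and `normalize`, `exact_match`, and `compare_runs` are hypothetical helper names with made-up toy data.

```python
import re
import string
from typing import Dict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)


def compare_runs(
    before: Dict[str, str], after: Dict[str, str], gold: Dict[str, str]
) -> Dict[str, float]:
    """Accuracy before and after a prompt edit, plus fixed and broken counts."""
    ok_before = {i: exact_match(before[i], g) for i, g in gold.items()}
    ok_after = {i: exact_match(after[i], g) for i, g in gold.items()}
    n = len(gold)
    return {
        "acc_before": sum(ok_before.values()) / n,
        "acc_after": sum(ok_after.values()) / n,
        "fixed": sum(1 for i in gold if not ok_before[i] and ok_after[i]),
        "broken": sum(1 for i in gold if ok_before[i] and not ok_after[i]),
    }


# Toy data, invented for illustration only.
gold = {"q1": "Paris", "q2": "1969", "q3": "Marie Curie"}
before = {"q1": "paris", "q2": "1968", "q3": "Curie"}
after = {"q1": "Lyon", "q2": "1969", "q3": "Marie Curie."}
report = compare_runs(before, after, gold)
```

Reporting `broken` alongside accuracy matters because two edits can reach the same net accuracy while differing in how many previously correct answers they disturb, which is the distinction the failure-only and random-evidence variants are compared on.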

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent
