When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

Researchers have proposed a training-free, self-evolving framework in which an LLM agent automatically creates and refines query-rewriting rules to improve BM25-based legal case retrieval.



Abstract: Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studies show that BM25 continues to serve as a strong baseline in this domain. This motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training. The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedback. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2. Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a high-capacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that the LLM's ability to leverage previous experimental results and its intrinsic knowledge of rule elimination play critical roles in refining the rule set via self-evolution.
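The abstract describes the framework only at a high level. The sketch below illustrates the pieces such an agent would operate on: BM25 ranking, query rewriting by applying an ordered list of rules, an automatic evaluation function over rule combinations, the greedy rule-selection baseline the paper compares against, and a simple ablation-style elimination step. It is a minimal sketch under stated assumptions, not the authors' implementation: the corpus is a toy English one (the paper uses the Chinese LeCaRD-v2 benchmark), the rules are hand-written stand-ins for LLM-generated ones, mean reciprocal rank is a stand-in metric, and all names (`BM25`, `mean_reciprocal_rank`, `greedy_select`, `prune`, `SYNONYMS`) are hypothetical. The LLM agent itself, which plans experiments and decides what to eliminate using past results and its own knowledge, is not reproduced.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

Rule = Callable[[List[str]], List[str]]  # hypothetical rule interface: tokens in, tokens out


class BM25:
    """Okapi BM25 over pre-tokenized documents (k1=1.2, b=0.75 are common defaults)."""

    def __init__(self, docs: Sequence[List[str]], k1: float = 1.2, b: float = 0.75):
        self.tfs = [Counter(d) for d in docs]
        self.lens = [len(d) for d in docs]
        self.avgdl = sum(self.lens) / len(docs)
        self.k1, self.b = k1, b
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Lucene-style IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: List[str], i: int) -> float:
        tf, norm = self.tfs[i], 1 - self.b + self.b * self.lens[i] / self.avgdl
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + self.k1 * norm)
            for t in query
            if t in tf
        )

    def rank(self, query: List[str]) -> List[int]:
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


def mean_reciprocal_rank(bm25: BM25, queries: Sequence[List[str]],
                         relevant: Sequence[Set[int]], rules: Sequence[Rule]) -> float:
    """Stand-in metric: MRR of the first relevant case after applying rules in order."""
    total = 0.0
    for query, rel in zip(queries, relevant):
        for rule in rules:  # rewrite the query with each rule in turn
            query = rule(query)
        ranking = bm25.rank(query)
        first = next((pos for pos, d in enumerate(ranking, 1) if d in rel), None)
        total += 1.0 / first if first else 0.0
    return total / len(queries)


def greedy_select(evaluate: Callable[[List[str]], float], names: Sequence[str]) -> List[str]:
    """Non-evolutionary baseline: repeatedly add the single rule that most improves the score."""
    chosen: List[str] = []
    best = evaluate(chosen)
    while True:
        scored = [(evaluate(chosen + [n]), n) for n in names if n not in chosen]
        if not scored:
            break
        score, name = max(scored)
        if score <= best:  # stop when no remaining rule strictly helps
            break
        chosen.append(name)
        best = score
    return chosen


def prune(evaluate: Callable[[List[str]], float], chosen: List[str]) -> List[str]:
    """Feedback-based elimination: drop any rule whose removal does not lower the score."""
    kept = list(chosen)
    for name in list(kept):
        without = [n for n in kept if n != name]
        if evaluate(without) >= evaluate(kept):
            kept = without
    return kept


# Toy usage with a hypothetical corpus and two hand-written rules.
docs = [
    ["theft", "property", "sentence"],
    ["property", "dispute", "property", "civil"],
    ["fraud", "contract", "loss"],
    ["contract", "contract", "dispute"],
]
queries = [["stealing", "property"], ["scam", "contract"]]
relevant = [{0}, {2}]
SYNONYMS = {"stealing": "theft", "scam": "fraud"}  # hypothetical lay term -> legal term map
rules: Dict[str, Rule] = {
    "synonyms": lambda q: [SYNONYMS.get(t, t) for t in q],
    "lowercase": lambda q: [t.lower() for t in q],  # a no-op on this already-lowercase data
}
bm25 = BM25(docs)


def evaluate(names: List[str]) -> float:
    """Automatic evaluation environment: score any combination of named rules."""
    return mean_reciprocal_rank(bm25, queries, relevant, [rules[n] for n in names])


print(greedy_select(evaluate, list(rules)))         # ['synonyms']
print(prune(evaluate, ["synonyms", "lowercase"]))  # ['synonyms']
```

On this toy data, the lay terms "stealing" and "scam" never occur in the corpus, so without rewriting each query's relevant case ranks second. The synonym rule restores the lexical alignment the abstract emphasizes, while the no-op rule is never added by the greedy baseline and is dropped by the elimination step. Here the two fixed procedures agree; the paper's claim is that an LLM agent planning combination experiments and drawing on earlier results outperforms such non-evolutionary baselines, particularly with a high-capacity core LLM.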


Source: arXiv cs.AI Recent

More in this category

agentic-systems

SpeechDx: A Multi-Task Benchmark for Clinical Speech AI

Researchers have introduced SpeechDx, a large-scale benchmark for clinical speech AI spanning 12 datasets and 27 tasks. This tool aims to address fragmentation in medical AI research by standardizing the evaluation of speech-based disease diagnosis.

agentic-systems

Surrogate Assisted Pedestrian Protection Design via a Foundation Model Orchestrated Workflow

Researchers have developed the first foundation model-orchestrated workflow for crash safety design, reducing evaluation times from hours of conventional CAE simulations to mere seconds.

agentic-systems

SkillChain-Gym: A Benchmark for Reskilling-Aware Production-Inventory Control under Disruptions

SkillChain-Gym is a new benchmark for production-inventory control that integrates workforce reskilling and training dynamics under disruptions. It evaluates how training policies compete with production capacity and helps decision-makers balance operational efficiency and workforce resilience.

agentic-systems

Unlocking UK house-building with AI-accelerated planning

The UK government is partnering with Google DeepMind to develop a Gemini-powered AI prototype that aims to halve the processing time for homeowner planning applications, supporting the nation's goal to build 1.5 million new homes.

agentic-systems

Feature Attribution in Directed Acyclic Graphs Using Edge Intervention

Researchers propose DAG-SHAP, a novel feature attribution method based on edge intervention in directed acyclic graphs, overcoming the limitations of traditional node-centric approaches to improve AI explainability.

agentic-systems

Metric Match: A Subset Selection Approach to Evaluating LLM Judge Reliability

Researchers have developed Metric Match, a novel method to estimate the reliability of LLM judges using limited human annotations. By selecting an optimal subset of samples, it reduces annotation needs by 32.5% and significantly cuts down evaluation costs.
