AGENTIC-SYSTEMS · April 3, 2026

Execution-Verified Reinforcement Learning for Optimization Modeling

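Researchers introduce EVOM, a framework that uses mathematical programming solvers as interactive verifiers to automate optimization modeling with LLMs. Because the reinforcement learning signal comes only from execution outcomes, the approach removes the need for costly process-level supervision and supports cross-solver generalization by switching the verification environment rather than rebuilding solver-specific datasets.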
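
To make the "solver as verifier" idea concrete, here is a minimal Python sketch of an outcome-only reward. It assumes, hypothetically, that the generated program prints its optimal objective value as the last line of stdout and that a reference objective is known for each problem. A subprocess with a timeout stands in for the sandboxed harness; a real sandbox would also restrict filesystem and network access. The function name `execution_reward`, the binary reward, and the tolerance are illustrative choices, not EVOM's published reward design.

```python
import os
import subprocess
import sys
import tempfile


def execution_reward(code: str, expected_objective: float,
                     rel_tol: float = 1e-4, timeout_s: float = 30.0) -> float:
    """Hypothetical outcome-only reward: run generated code, compare its objective.

    Returns 1.0 if the printed objective matches the reference within a
    relative tolerance, else 0.0 (crash, timeout, bad output, or wrong value).
    """
    # Write the model-generated program to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # Stand-in for a sandbox: separate process, captured output, time limit.
        proc = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.remove(path)

    if proc.returncode != 0:
        return 0.0  # execution error
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return 0.0  # nothing printed
    try:
        # Assumed convention: the last stdout line is the objective value.
        value = float(lines[-1])
    except ValueError:
        return 0.0  # unparseable output
    scale = max(1.0, abs(expected_objective))
    return 1.0 if abs(value - expected_objective) <= rel_tol * scale else 0.0
```

Because the reward depends only on the execution outcome, targeting a different solver changes which code the model is asked to write and which environment runs it, not how intermediate modeling steps are labeled.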

Abstract: Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver API. Inspired by reinforcement learning with verifiable rewards, we propose Execution-Verified Optimization Modeling (EVOM), an execution-verified learning framework that treats a mathematical programming solver as a deterministic, interactive verifier. Given a natural-language problem and a target solver, EVOM generates solver-specific code, executes it in a sandboxed harness, and converts execution outcomes into scalar rewards, optimized with GRPO and DAPO in a closed-loop generate-execute-feedback-update process. This outcome-only formulation removes the need for process-level supervision, and enables cross-solver generalization by switching the verification environment rather than reconstructing solver-specific datasets. Experiments on NL4OPT, MAMO, IndustryOR, and OptiBench across Gurobi, OR-Tools, and COPT show that EVOM matches or outperforms process-supervised SFT, supports zero-shot solver transfer, and achieves effective low-cost solver adaptation by continuing training under the target solver backend.
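
The abstract names GRPO and DAPO as the optimizers. The sketch below illustrates, under stated assumptions, three pieces of that update: the group-relative advantage GRPO computes across several sampled programs for the same problem, DAPO-style dynamic sampling that skips groups whose rewards are all identical (their advantages are zero and carry no gradient), and a clipped surrogate with decoupled lower and upper clip bounds in the spirit of DAPO's "clip-higher". The function names, the use of the population standard deviation, and the clip values 0.2 and 0.28 are illustrative assumptions; EVOM's exact hyperparameters are not given in the abstract.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary (assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_signal(rewards):
    """DAPO-style dynamic sampling: keep only groups with mixed outcomes."""
    return len(set(rewards)) > 1


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped objective with decoupled (asymmetric) clip bounds."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic choice between the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def build_batch(groups):
    """Map {problem_id: [reward per sampled program]} to training entries.

    Returns (problem_id, sample_index, advantage) tuples, skipping groups
    in which every sample was correct or every sample failed.
    """
    batch = []
    for pid, rewards in groups.items():
        if not has_signal(rewards):
            continue
        for i, adv in enumerate(group_advantages(rewards)):
            batch.append((pid, i, adv))
    return batch


if __name__ == "__main__":
    # Hypothetical rewards from an execution verifier for two problems.
    groups = {"p1": [1.0, 0.0, 0.0, 1.0], "p2": [1.0, 1.0, 1.0, 1.0]}
    for pid, i, adv in build_batch(groups):
        print(pid, i, round(adv, 3))
```

With these inputs, problem `p2` is dropped because every sample succeeded, and `p1`'s correct programs receive positive advantages while failing ones receive negative advantages. In a full trainer the clipped term would be computed per token from policy log-probability ratios and averaged (DAPO uses a token-level loss); that machinery is omitted here.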

Source: arXiv cs.AI Recent