Execution-Verified Reinforcement Learning for Optimization Modeling

ArXiv CS.AI [Submitted on 1 Apr 2026] April 2, 2026

Abstract: Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver API. Inspired by reinforcement learning with verifiable rewards, we propose Execution-Verified Optimization Modeling (EVOM), an execution-verified learning framework that treats a mathematical programming solver as a deterministic, interactive verifier. Given a natural-language problem and a target solver, EVOM generates solver-specific code, executes it in a sandboxed harness, and converts execution outcomes into scalar rewards, optimized with GRPO and DAPO in a closed-loop generate-execute-feedback-update process. This outcome-only formulation removes the need for process-level supervision and enables cross-solver generalization by switching the verification environment rather than reconstructing solver-specific datasets. Experiments on NL4OPT, MAMO, IndustryOR, and OptiBench across Gurobi, OR-Tools, and COPT show that EVOM matches or outperforms process-supervised SFT, supports zero-shot solver transfer, and achieves effective low-cost solver adaptation by continuing training under the target solver backend.
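The generate-execute-feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the reward tiers (0.0, 0.1, 1.0), the `OBJECTIVE=` output convention, and the function names `execution_reward` and `group_relative_advantages` are all assumptions made for illustration. A plain subprocess stands in for the sandboxed harness (it is not a real sandbox), and the candidate programs are ordinary Python snippets rather than Gurobi, OR-Tools, or COPT code, so the example runs without a solver installed. The second function shows the group-relative reward standardization used by GRPO-style methods.

```python
import math
import os
import subprocess
import sys
import tempfile


def execution_reward(code, reference_objective, timeout_s=10.0, rel_tol=1e-4):
    """Run candidate code in a separate process and score the outcome.

    Hypothetical reward tiers (an assumption, not taken from the paper):
      0.0  crash, timeout, or no parsable objective value
      0.1  ran and reported an objective that differs from the reference
      1.0  reported objective matches the reference within tolerance
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "candidate.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(code)
        try:
            # A subprocess with a timeout is only a stand-in for a sandbox.
            proc = subprocess.run(
                [sys.executable, path],
                capture_output=True, text=True,
                timeout=timeout_s, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    if proc.returncode != 0:
        return 0.0
    objective = None
    for line in proc.stdout.splitlines():
        # Assumed convention: the program prints "OBJECTIVE=<float>".
        if line.startswith("OBJECTIVE="):
            try:
                objective = float(line.split("=", 1)[1])
            except ValueError:
                return 0.0
    if objective is None:
        return 0.0
    if math.isclose(objective, reference_objective,
                    rel_tol=rel_tol, abs_tol=1e-6):
        return 1.0
    return 0.1


def group_relative_advantages(rewards):
    """Standardize rewards within one group of samples for the same prompt."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    if std == 0.0:
        # All samples scored the same, so the group carries no signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # Four stand-in "generations" for one problem whose optimum is 42.
    candidates = [
        'print("OBJECTIVE=42.0")',
        'print("OBJECTIVE=40.0")',
        'raise RuntimeError("bad model")',
        'print("no objective here")',
    ]
    rewards = [execution_reward(c, 42.0) for c in candidates]
    print(rewards, group_relative_advantages(rewards))
```

Under this sketch, switching the target solver would change only what runs inside `execution_reward`, while the reward and advantage logic stay the same. This mirrors the abstract's point that the verification environment can be swapped without rebuilding solver-specific datasets.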

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2604.00442 [cs.AI]

(or arXiv:2604.00442v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2604.00442

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Runda Guan
[v1] Wed, 1 Apr 2026 03:39:11 UTC (668 KB)

Original source

ArXiv CS.AI

https://arxiv.org/abs/2604.00442
