AI News Hub · Eigenvector

Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning

arXiv cs.RO · [Submitted on 2 Apr 2026] · April 3, 2026 · 1 min read

arXiv:2604.01860v1 Announce Type: new


Abstract: Expressive generative models have advanced robotic manipulation by capturing complex, multi-modal action distributions over temporally extended trajectories. However, fine-tuning these policies via RL remains challenging due to instability and sample inefficiency. We introduce Posterior Optimization with Clipped Objective (POCO), a principled RL framework that formulates policy improvement as a posterior inference problem tailored for temporal action chunks. Through an Expectation-Maximization procedure, POCO distills a reward-weighted implicit posterior into the policy without likelihood estimation. Furthermore, POCO adopts an offline-to-online paradigm that anchors online exploration to pre-trained priors, and its model-agnostic design scales to fine-tune large VLA models without architectural modifications. Evaluations across 7 simulation benchmarks and 4 contact-rich real-world tasks demonstrate that POCO prevents catastrophic policy collapse, outperforms SOTA baselines, and achieves a 96.7% success rate on real-world tasks. Videos are available at our project website this https URL.
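The abstract describes two key ingredients: an E-step that converts rewards into posterior weights over action chunks, and an M-step that distills that reward-weighted posterior into the policy under a clipped update. A minimal sketch of that pattern is below; all names (`beta`, `eps`, the chunk-level log-probability interface) are illustrative assumptions, not POCO's actual formulation or API, and the clipping shown is the standard PPO-style ratio clip rather than the paper's exact objective.

```python
import math

def reward_weights(advantages, beta=1.0):
    """E-step (sketch): map chunk-level advantages A_i to normalized
    posterior weights w_i proportional to exp(A_i / beta), the usual
    reward-weighted (AWR-style) posterior; beta is a temperature."""
    exps = [math.exp(a / beta) for a in advantages]
    z = sum(exps)
    return [e / z for e in exps]

def clipped_distillation_loss(log_prob_new, log_prob_old, weights, eps=0.2):
    """M-step (sketch): distill the reward-weighted posterior into the
    policy, clipping the probability ratio PPO-style for stability."""
    loss = 0.0
    for lp_new, lp_old, w in zip(log_prob_new, log_prob_old, weights):
        ratio = math.exp(lp_new - lp_old)          # pi_new / pi_old on this chunk
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        loss -= min(ratio * w, clipped * w)        # pessimistic clipped objective
    return loss
```

Note that the weights play the role the advantage plays in PPO: chunks the posterior favors pull the policy toward them, while the clip bounds how far any single update can move the ratio, which is one plausible mechanism behind the "prevents catastrophic policy collapse" claim.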

Subjects: Robotics (cs.RO)

Cite as: arXiv:2604.01860 [cs.RO]

(or arXiv:2604.01860v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2604.01860

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yuhui Chen [v1] Thu, 2 Apr 2026 10:15:47 UTC (11,831 KB)
