Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning

arXiv, April 3, 2026

Vikram Krishnamurthy, Luke Snow
arXiv:2604.01345v1 (Announce Type: new)

Abstract: Inverse reinforcement learning (IRL) recovers the loss function of a forward learner from its observed responses. Adaptive IRL aims to reconstruct the loss function of a forward learner by passively observing its gradients as it performs reinforcement learning (RL). This paper proposes a novel passive Langevin-based algorithm that achieves adaptive IRL. The key difficulty in adaptive IRL is that the required gradients in the passive algorithm are counterfactual, that is, they are conditioned on events of probability zero under the forward learner's trajectory. Therefore, naive Monte Carlo estimators are prohibitively inefficient, and kernel smoothing, though common, suffers from slow convergence. We overcome this by employing Malliavin calculus to efficiently estimate the required counterfactual gradients. We reformulate the counterfactual conditioning as a ratio of unconditioned expectations involving Malliavin quantities, thus recovering standard estimation rates. We derive the necessary Malliavin derivatives and their adjoint Skorohod integral formulations for a general Langevin structure, and provide a concrete algorithmic approach that exploits these for counterfactual gradient estimation.
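
The abstract does not spell out the algorithm, so what follows is only a toy sketch of its key step, rewriting a probability-zero conditional expectation as a ratio of unconditioned expectations, under illustrative assumptions that are not taken from the paper. A scalar iterate θ0 ~ N(0, 1) takes one unadjusted Langevin step θ1 = θ0 − η∇L(θ0) + √(2η/β)·Z on a hypothetical quadratic loss L(θ) = θ²/2, and the target is E[g(θ0) | θ1 = x]. Integrating by parts in the Gaussian noise Z (a one-dimensional stand-in for a Malliavin derivative and its adjoint Skorohod integral) gives E[g(θ0) | θ1 = x] = E[g(θ0)·1{θ1 > x}·Z] / E[1{θ1 > x}·Z]. The constants ETA and BETA and all function names are hypothetical; a kernel-smoothing baseline and a closed form are included for comparison.

```python
import numpy as np

# Toy model (illustrative assumption, not the paper's setting): the forward
# learner's iterate theta0 ~ N(0, 1) takes one unadjusted Langevin step on the
# hypothetical quadratic loss L(t) = t**2 / 2, whose gradient is t.
ETA, BETA = 0.1, 1.0
SIGMA = np.sqrt(2.0 * ETA / BETA)  # noise scale of the Langevin step


def sample_langevin_step(n, seed=0):
    """Draw (theta0, z, theta1) with theta1 = theta0 - ETA * grad L(theta0) + SIGMA * z."""
    rng = np.random.default_rng(seed)
    theta0 = rng.standard_normal(n)
    z = rng.standard_normal(n)
    theta1 = theta0 - ETA * theta0 + SIGMA * z
    return theta0, z, theta1


def malliavin_conditional_mean(g, x, theta0, z, theta1):
    """Estimate E[g(theta0) | theta1 = x] as a ratio of unconditioned means.

    Gaussian integration by parts in z (a 1-D stand-in for the Malliavin
    derivative and Skorohod integral), using that g(theta0) does not depend on z:
        E[g(theta0) * delta(theta1 - x)] = E[g(theta0) * 1{theta1 > x} * z] / SIGMA.
    The 1 / SIGMA factor appears in numerator and denominator and cancels.
    """
    weight = (theta1 > x) * z
    return np.mean(g(theta0) * weight) / np.mean(weight)


def kernel_conditional_mean(g, x, theta0, theta1, h):
    """Nadaraya-Watson baseline with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((theta1 - x) / h) ** 2)
    return np.sum(w * g(theta0)) / np.sum(w)


def exact_conditional_mean(x):
    """Closed form for this jointly Gaussian toy model (linear regression of theta0 on theta1)."""
    a = 1.0 - ETA
    return a * x / (a**2 + SIGMA**2)


theta0, z, theta1 = sample_langevin_step(1_000_000)
x = 0.5
print("exact:     ", exact_conditional_mean(x))
print("malliavin: ", malliavin_conditional_mean(lambda t: t, x, theta0, z, theta1))
print("kernel:    ", kernel_conditional_mean(lambda t: t, x, theta0, theta1, h=0.05))
print("naive Monte Carlo hits on {theta1 == x}:", np.count_nonzero(theta1 == x))
```

In this toy case both estimators should land near the closed-form value 0.45/1.01 ≈ 0.446, while counting exact hits on {θ1 = x} returns zero, which is why naive Monte Carlo conditioning fails. The design difference is that the kernel estimator depends on a bandwidth h and carries an O(h²) smoothing bias, which forces h to shrink with the sample size and slows its convergence; the ratio estimator has no bandwidth and converges at the ordinary Monte Carlo rate. In the paper's setting the weight would instead come from Malliavin derivatives of a general Langevin structure, not from this scalar identity.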

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2604.01345 [cs.LG]

(or arXiv:2604.01345v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.01345

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Luke Snow
[v1] Wed, 1 Apr 2026 19:56:02 UTC (1,131 KB)