Defend: Automated Rebuttals for Peer Review with Minimal Author Guidance
arXiv:2603.27360v1 Announce Type: new. Authors: Jyotsana Khatri, Manasi Patwardhan
Abstract: Rebuttal generation is a critical component of the peer review process for scientific papers, enabling authors to clarify misunderstandings, correct factual inaccuracies, and guide reviewers toward a more accurate evaluation. We observe that Large Language Models (LLMs) often struggle to perform targeted refutation and maintain accurate factual grounding when used directly for rebuttal generation, highlighting the need for structured reasoning and author intervention. To address this, we introduce DEFEND, an LLM-based tool designed to explicitly execute the underlying reasoning process of automated rebuttal generation while keeping the author in the loop. Rather than writing rebuttals from scratch, the author only needs to drive the reasoning process with minimal intervention, leading to an efficient approach with less effort and lower cognitive load. We compare DEFEND against three other paradigms: (i) direct rebuttal generation using an LLM (DRG), (ii) segment-wise rebuttal generation using an LLM (SWRG), and (iii) a sequential approach (SA) of segment-wise rebuttal generation without author intervention. To enable fine-grained evaluation, we extend the ReviewCritique dataset with review segmentation, deficiency and error-type annotations, rebuttal-action labels, and a mapping to gold rebuttal segments. Experimental results and a user study demonstrate that directly using LLMs performs poorly in factual correctness and targeted refutation. Segment-wise generation and the automated sequential approach with the author in the loop substantially improve factual correctness and strength of refutation.
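To make the annotation layers and the segment-wise paradigms above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name, field names, label strings, the stub drafter, and the rule of skipping non-deficient segments are all assumptions made for illustration. The abstract only states that the dataset carries segmentation, deficiency, error-type, and rebuttal-action annotations plus a mapping to gold rebuttal segments, and that the author can optionally intervene.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReviewSegment:
    """One annotated span of a review (all field names are hypothetical)."""
    text: str
    deficient: bool                       # is there a flaw worth rebutting?
    error_type: Optional[str]             # label set is assumed, not from the paper
    rebuttal_action: Optional[str]        # label set is assumed, not from the paper
    gold_rebuttal: Optional[str] = None   # mapped gold rebuttal segment, if any


# Hypothetical hook types: a drafter stands in for an LLM call, and an
# author hook lets a human confirm or correct the draft for one segment.
Drafter = Callable[[ReviewSegment], str]
AuthorHook = Callable[[ReviewSegment, str], str]


def segment_wise_rebuttal(
    segments: List[ReviewSegment],
    draft: Drafter,
    author: Optional[AuthorHook] = None,
) -> List[str]:
    """Draft one reply per deficient segment, optionally author-reviewed."""
    replies = []
    for seg in segments:
        if not seg.deficient:
            continue  # assumption: segments without a deficiency need no rebuttal
        reply = draft(seg)
        if author is not None:
            reply = author(seg, reply)  # author-in-the-loop correction step
        replies.append(reply)
    return replies


def stub_drafter(seg: ReviewSegment) -> str:
    """Placeholder for an LLM call; only formats the segment."""
    return f"[{seg.rebuttal_action}] Response to: {seg.text}"


if __name__ == "__main__":
    segs = [
        ReviewSegment("The paper lacks baselines.", True, "factual_error", "refute"),
        ReviewSegment("The writing is clear.", False, None, None),
    ]
    print(segment_wise_rebuttal(segs, stub_drafter))
```

Calling the function without `author` corresponds loosely to a fully automated segment-wise run, while passing a hook models the author steering each step. How DEFEND actually structures its reasoning steps is not specified in the abstract.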
Subjects:
Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.27360 [cs.AI]
(or arXiv:2603.27360v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.27360
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Jyotsana Khatri [v1] Sat, 28 Mar 2026 18:12:31 UTC (958 KB)