TAPS: Task Aware Proposal Distributions for Speculative Sampling

HuggingFace Papers · March 27, 2026 · 2 min read

The effectiveness of speculative decoding depends on how well the draft model's training data matches the downstream task. Specialized drafters also perform better when combined at inference time through confidence-based routing than when merged by simple checkpoint averaging.

Published on Mar 27

Authors:

Abstract

Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft training distribution. We study this question with lightweight HASS and EAGLE-2 drafters trained on MathInstruct, ShareGPT, and mixed-data variants, evaluated on MT-Bench, GSM8K, MATH-500, and SVAMP. Measured by acceptance length, task-specific training yields clear specialization: MathInstruct-trained drafts are strongest on reasoning benchmarks, while ShareGPT-trained drafts are strongest on MT-Bench. Mixed-data training improves robustness, but larger mixtures do not dominate across decoding temperatures. We also study how to combine specialized drafters at inference time. Naive checkpoint averaging performs poorly, whereas confidence-based routing improves over single-domain drafts and merged-tree verification yields the highest acceptance length overall for both backbones. Finally, confidence is a more useful routing signal than entropy: rejected tokens tend to have higher entropy, but confidence produces much clearer benchmark-level routing decisions. These results show that speculative decoding quality depends not only on draft architecture, but also on the match between draft training data and downstream workload, and that specialized drafters are better combined at inference time than in weight space.
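The routing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy distributions, and the greedy-match definition of acceptance length are all assumptions made for illustration. Confidence is taken here to be the top-1 probability of a drafter's next-token distribution, entropy is Shannon entropy in nats, and acceptance length is simplified to the number of leading draft tokens that agree with the target model's greedy choices.

```python
import math


def confidence(probs):
    """Top-1 probability of a next-token distribution (assumed definition)."""
    return max(probs)


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_by_confidence(drafter_probs):
    """Pick the drafter whose next-token distribution has the highest top-1 probability.

    drafter_probs maps a hypothetical drafter name to its probability list.
    """
    return max(drafter_probs, key=lambda name: confidence(drafter_probs[name]))


def acceptance_length(draft_tokens, target_tokens):
    """Count leading draft tokens that match the target's tokens.

    This is a greedy-decoding simplification; sampling at nonzero
    temperature would use a probabilistic accept/reject test instead.
    """
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted


# Toy next-token distributions for two hypothetical specialized drafters.
drafters = {
    "math": [0.85, 0.10, 0.05],
    "chat": [0.40, 0.35, 0.25],
}
chosen = route_by_confidence(drafters)  # selects the more confident drafter

# The third draft token disagrees with the target, so two tokens are accepted.
accepted = acceptance_length([5, 7, 9, 2], [5, 7, 1, 2])
```

In this toy case the more confident drafter also has the lower entropy, so both signals agree. The abstract's finding is that, in the authors' experiments, confidence gave clearer benchmark-level routing decisions than entropy.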


Original source

HuggingFace Papers

https://huggingface.co/papers/2603.27027
