Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models
arXiv:2603.25750v1 Announce Type: cross
Authors: Kyudan Jung, Jihwan Kim, Soyoon Kim, Jeongoon Kim, Jaegul Choo, Cheonbok Park
Abstract: As the paradigm of AI shifts from text-based LLMs to Speech Language Models (SLMs), there is a growing demand for full-duplex systems capable of real-time, natural human-computer interaction. However, the development of such models is constrained by the scarcity of high-quality, multi-speaker conversational data, as existing large-scale resources are predominantly single-speaker or limited in volume. Addressing the complex dynamics of natural dialogue, such as overlapping and back-channeling, remains a challenge, with standard processing pipelines suffering from diarization errors and ASR hallucinations. To bridge this gap, we present a robust and scalable open-source data processing pipeline designed for full-duplex models.
Comments: 34 pages, 7 figures, 11 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2603.25750 [cs.SD]
(or arXiv:2603.25750v1 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2603.25750
arXiv-issued DOI via DataCite
Submission history
From: Kyudan Jung [view email] [v1] Fri, 20 Mar 2026 09:10:43 UTC (3,412 KB)