Tracking the emergence of linguistic structure in self-supervised models learning from speech
Hey there, little explorer! Imagine you have a super-smart robot friend who wants to learn how to talk, just like you!
This science story is about how these robot friends, called "AI models," learn from listening to people speak. It's like they're listening to lots and lots of Dutch stories and songs.
The scientists wanted to know: When does the robot start understanding different things? Like, when does it learn the sounds? When does it learn whole words? And when does it learn to make sentences?
They found out that the robot learns different things at different times, like building blocks! First it learns the tiny sounds, then it puts them together to make words, and then it learns how words fit together to make sentences. It's like watching a baby learn to talk, but super fast! Isn't that cool?
Abstract: Self-supervised speech models learn effective representations of spoken language, which have been shown to reflect various aspects of linguistic structure. But when does such structure emerge in model training? We study the encoding of a wide range of linguistic structures, across layers and intermediate checkpoints of six Wav2Vec2 and HuBERT models trained on spoken Dutch. We find that different levels of linguistic structure show notably distinct layerwise patterns as well as learning trajectories, which can partially be explained by differences in their degree of abstraction from the acoustic signal and the timescale at which information from the input is integrated. Moreover, we find that the level at which pre-training objectives are defined strongly affects both the layerwise organization and the learning trajectories of linguistic structures, with greater parallelism induced by higher-order prediction tasks (i.e. iteratively refined pseudo-labels).
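To make the layerwise analysis concrete, below is a minimal probing sketch: extract hidden states from every layer of a pretrained Wav2Vec2 model and fit a simple linear probe per layer. This is an illustrative assumption of what such a setup can look like, not the paper's method; the model name ("facebook/wav2vec2-base"), the random audio, and the placeholder labels are all stand-ins, and the paper's Dutch-trained checkpoints and linguistic targets are not reproduced here.

```python
# Hedged sketch of per-layer probing with a pretrained Wav2Vec2 model.
# Everything marked "placeholder" is an assumption for illustration only.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_NAME = "facebook/wav2vec2-base"  # placeholder; the paper uses Dutch-trained models
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_NAME)
model = Wav2Vec2Model.from_pretrained(MODEL_NAME)
model.eval()

# Placeholder audio: 1 second of noise standing in for 16 kHz speech.
waveform = np.random.randn(16000).astype(np.float32)
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors, each (batch, frames, dim);
# index 0 is the convolutional feature encoder output, the rest are transformer layers.
for layer_idx, layer in enumerate(outputs.hidden_states):
    frames = layer.squeeze(0).numpy()                    # (frames, dim)
    labels = np.random.randint(0, 5, len(frames))        # placeholder frame labels
    probe = LogisticRegression(max_iter=1000).fit(frames, labels)
    print(f"layer {layer_idx}: train acc = {probe.score(frames, labels):.2f}")
```

With real frame-aligned labels (e.g. phone identities) and held-out evaluation data, comparing probe accuracy across layers, and across training checkpoints, is one standard way to chart where and when a given level of linguistic structure becomes linearly decodable.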
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2604.02043 [cs.CL] (or arXiv:2604.02043v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.02043
Submission history: From Marianne De Heer Kloots, [v1] Thu, 2 Apr 2026 13:48:22 UTC (7,082 KB)