Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression

arXiv cs.CL

by Ruoling Qi, Yirui Liu, Xuaner Wu, Xiangyu Wang, Ming Li, Chen Chen, Jian Chen, Yin Chen, Qizhen Weng

April 4, 2026

arXiv:2604.01609v1 Announce Type: new

Abstract: The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and the dynamic Key-Value cache. SVD-based compression provides a hardware-friendly solution to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimality, practical efficiency, and numerical stability. Swift-SVD incrementally aggregates the covariance of output activations given a batch of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets demonstrate that Swift-SVD outperforms state-of-the-art baselines, achieving optimal compression accuracy while delivering 3-70x speedups in end-to-end compression time. Our code will be released upon acceptance.
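The abstract describes the pipeline only at a high level and the authors' code is not yet released, so the following is a minimal NumPy sketch of the general idea, not the Swift-SVD implementation. It assumes a single linear layer with outputs Y = XW, accumulates the output covariance Y^T Y batch by batch, runs one eigendecomposition, and projects W onto the top eigenvectors; by the Eckart-Young theorem this gives the best rank-r approximation of the layer's outputs on the calibration data. The function names, the shapes, and the entropy-based definition of effective rank are all assumptions chosen for illustration.

```python
import numpy as np


def accumulate_output_covariance(weight, input_batches):
    """Sum Y^T Y over calibration batches, where Y = X @ weight.

    weight: (d_in, d_out) array. input_batches: iterable of (n_i, d_in) arrays.
    Only a (d_out, d_out) matrix is kept in memory, whatever the number of batches.
    """
    d_out = weight.shape[1]
    cov = np.zeros((d_out, d_out), dtype=np.float64)
    for x in input_batches:
        y = x.astype(np.float64) @ weight
        cov += y.T @ y
    return cov


def low_rank_factors(weight, cov, rank):
    """Return factors (a, b) with a @ b approximating weight on the calibration data.

    Uses a single symmetric eigendecomposition of the accumulated covariance.
    Also returns the eigenvalues sorted in descending order.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back ascending
    top = eigvecs[:, ::-1][:, :rank]         # (d_out, rank), leading directions
    a = weight @ top                         # (d_in, rank)
    b = top.T                                # (rank, d_out)
    return a, b, eigvals[::-1]


def effective_rank(eigvals):
    """Entropy-based effective rank of the output singular values.

    Assumption: exp of the Shannon entropy of the normalized singular values;
    the abstract does not say which definition the paper uses.
    """
    s = np.sqrt(np.clip(eigvals, 0.0, None))  # singular values of Y
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32))                       # hypothetical layer weight
    batches = [rng.standard_normal((128, 64)) for _ in range(4)]
    cov = accumulate_output_covariance(w, batches)
    a, b, eigvals = low_rank_factors(w, cov, rank=8)
    x = np.vstack(batches)
    err = np.linalg.norm(x @ w - x @ a @ b) ** 2
    print("squared reconstruction error:", err)
    print("sum of discarded eigenvalues:", eigvals[8:].sum())
    print("effective rank of outputs:", effective_rank(eigvals))
```

In this sketch the squared output error equals the sum of the discarded eigenvalues, which is one way to check the closed-form optimality claim numerically. Storing `a` and `b` in place of `weight` saves memory only when `rank * (d_in + d_out)` is smaller than `d_in * d_out`.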

Comments: Under Review

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.01609 [cs.CL]

(or arXiv:2604.01609v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.01609

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jian Chen

[v1] Thu, 2 Apr 2026 04:40:50 UTC (613 KB)

Original source

arXiv cs.CL

https://arxiv.org/abs/2604.01609
