Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression
Abstract: The deployment of Large Language Models (LLMs) is constrained by the memory and bandwidth demands of their static weights and dynamic Key-Value (KV) cache. SVD-based compression offers a hardware-friendly way to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimality, practical efficiency, and numerical stability. Swift-SVD incrementally aggregates the covariance of output activations over successive batches of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We use effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets show that Swift-SVD outperforms state-of-the-art baselines, achieving optimal compression accuracy while delivering 3-70× speedups in end-to-end compression time. Our code will be released upon acceptance.
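The closed-form step described in the abstract can be illustrated with a short NumPy sketch. It assumes the per-layer objective is min ‖WX − W′X‖_F subject to rank(W′) ≤ k, where W (d_out × d_in) is a linear layer and X stacks calibration inputs as columns. The abstract does not give the exact formulation, normalization, or numerical-stability measures, so this is one standard realization under that assumption, not the authors' implementation. Under this objective, the Eckart-Young theorem says the best rank-k approximation of the output Y = WX is U_k U_kᵀ Y, where U_k holds the top-k eigenvectors of C = YYᵀ. Since U_k U_kᵀ Y = (U_k U_kᵀ W) X, the factorized weight W′ = U_k (U_kᵀ W) attains that optimum. Because C is a sum over batches, it can be accumulated incrementally and decomposed once at the end. The names `OutputCovarianceAccumulator`, `update`, `compress`, and `effective_rank` are hypothetical, and `effective_rank` uses the entropy-based definition of Roy and Vetterli, which may differ from the paper's.

```python
import numpy as np


class OutputCovarianceAccumulator:
    """Hypothetical helper: accumulates C = sum_b (W X_b)(W X_b)^T.

    Assumes W has shape (d_out, d_in) and each calibration batch X_b has
    shape (d_in, n_b). Only the (d_out, d_out) matrix C is stored.
    """

    def __init__(self, weight: np.ndarray) -> None:
        self.weight = np.asarray(weight, dtype=np.float64)
        d_out = self.weight.shape[0]
        self.cov = np.zeros((d_out, d_out), dtype=np.float64)

    def update(self, x_batch: np.ndarray) -> None:
        # Output activations for this batch: Y_b = W X_b, shape (d_out, n_b).
        y = self.weight @ x_batch
        # Add this batch's contribution; Y_b itself is then discarded.
        self.cov += y @ y.T

    def compress(self, rank: int):
        """Single eigendecomposition after aggregation -> factors A, B."""
        if not 0 < rank <= self.cov.shape[0]:
            raise ValueError("rank must be in [1, d_out]")
        # eigh handles the symmetric C and returns eigenvalues in ascending order.
        eigvals, eigvecs = np.linalg.eigh(self.cov)
        order = np.argsort(eigvals)[::-1]        # descending order
        u_k = eigvecs[:, order[:rank]]            # (d_out, rank)
        a = u_k                                   # left factor
        b = u_k.T @ self.weight                   # right factor, (rank, d_in)
        return a, b, eigvals[order]


def effective_rank(eigvals: np.ndarray) -> float:
    """Roy-Vetterli effective rank of Y, given the eigenvalues of C = Y Y^T."""
    # Singular values of Y are square roots of the eigenvalues of Y Y^T.
    s = np.sqrt(np.clip(eigvals, 0.0, None))
    p = s / s.sum()
    p = p[p > 0]                                  # 0 * log 0 is treated as 0
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 128))
    batches = [rng.standard_normal((128, 32)) for _ in range(4)]

    acc = OutputCovarianceAccumulator(W)
    for xb in batches:
        acc.update(xb)
    A, B, spectrum = acc.compress(rank=16)
    print(A.shape, B.shape)  # (64, 16) (16, 128)

    # Squared reconstruction error equals the sum of the discarded eigenvalues.
    X = np.concatenate(batches, axis=1)
    err = np.linalg.norm(W @ X - A @ B @ X) ** 2
    print(np.isclose(err, spectrum[16:].sum()))  # True
    print(effective_rank(spectrum))
```

Because only the d_out × d_out matrix C is kept, memory does not grow with the number of calibration tokens. The discarded eigenvalues of C also give the per-layer squared reconstruction error directly, which is one way a local loss could enter a rank-allocation rule; how Swift-SVD weighs it against end-to-end layer importance is not reproduced here.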
Comments: Under Review
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2604.01609 [cs.CL]
(or arXiv:2604.01609v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2604.01609
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Jian Chen
[v1] Thu, 2 Apr 2026 04:40:50 UTC (613 KB)