
Model Merging via Data-Free Covariance Estimation

arXiv, April 3, 2026

arXiv:2604.01329v1 (Announce Type: new)

Authors: Marawan Gamal Abdel Hameed, Derek Tam, Pascal Jr Tikeng Notsawo, Colin Raffel, Guillaume Rabusseau


Abstract: Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual model's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when performing merging. In contrast, many of the heuristically motivated methods do not require auxiliary data, making them practically advantageous. In this work, we revisit the interference minimization framework and show that, under certain conditions, covariance matrices can be estimated directly from difference matrices, eliminating the need for data while also reducing computational costs. We validate our approach across vision and language benchmarks on models ranging from 86M parameters to 7B parameters, outperforming previous data-free state-of-the-art merging methods.
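To make the layer-wise formulation concrete, here is a minimal sketch for a single linear layer. It assumes the least-squares interference objective used by earlier merging methods such as RegMean: choose merged weights W that minimize Σ_i ||X_i W − X_i W_i||_F², where W_i is the i-th fine-tuned weight matrix and X_i holds that layer's inputs for task i. Setting the gradient to zero gives W = (Σ_i C_i)^(-1) Σ_i C_i W_i with C_i = X_iᵀ X_i, which is exactly why per-layer covariance matrices are needed. The function names, the (d_in, d_out) weight layout (y = x W), and the small ridge term are assumptions of this sketch. `difference_covariance` is a hypothetical data-free stand-in built from the difference matrix W_i − W_0; it is not the estimator derived in the paper, which only claims its estimate holds under certain conditions.

```python
import numpy as np


def merge_layer(weights, covariances, ridge=1e-6):
    """Merge one linear layer by minimizing summed output interference.

    Solves min_W sum_i ||X_i W - X_i W_i||_F^2 via its normal equations:
        (sum_i C_i) W = sum_i C_i W_i,   with C_i = X_i^T X_i.
    weights:     list of (d_in, d_out) arrays, one per fine-tuned model.
    covariances: list of (d_in, d_in) arrays, one per model.
    ridge:       small diagonal term (an assumption) so the system stays solvable.
    """
    d_in = weights[0].shape[0]
    lhs = sum(covariances) + ridge * np.eye(d_in)
    rhs = sum(c @ w for c, w in zip(covariances, weights))
    return np.linalg.solve(lhs, rhs)


def data_covariance(x):
    """Data-dependent C_i = X_i^T X_i from a (n_samples, d_in) input batch."""
    return x.T @ x


def difference_covariance(w_finetuned, w_base):
    """HYPOTHETICAL data-free proxy: Gram matrix of the difference matrix.

    Illustrative only; this is NOT the estimator from the paper.
    """
    delta = w_finetuned - w_base
    return delta @ delta.T


# Toy usage: three "fine-tuned" copies of a random base layer.
rng = np.random.default_rng(0)
d_in, d_out, n_tasks = 8, 4, 3
base = rng.normal(size=(d_in, d_out))
finetuned = [base + 0.1 * rng.normal(size=(d_in, d_out)) for _ in range(n_tasks)]
inputs = [rng.normal(size=(32, d_in)) for _ in range(n_tasks)]

merged_with_data = merge_layer(finetuned, [data_covariance(x) for x in inputs])
merged_data_free = merge_layer(
    finetuned, [difference_covariance(w, base) for w in finetuned]
)
print(merged_with_data.shape, merged_data_free.shape)
```

Because the solver only consumes the C_i, a data-free method can reuse it unchanged by swapping in a different covariance estimate. When all C_i are equal, the solution reduces to plain weight averaging, which is one way to see simple averaging as a special case of this objective.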

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2604.01329 [cs.LG]

(or arXiv:2604.01329v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.01329

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Marawan Gamal Abdel Hameed [v1] Wed, 1 Apr 2026 19:16:31 UTC (273 KB)
