Research Papers research paper arxiv machine-learning deep-learning

Model Merging via Data-Free Covariance Estimation

arXivApril 3, 202610 min read0 views

arXiv:2604.01329v1 Announce Type: new Abstract: Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when perform — Marawan Gamal Abdel Hameed, Derek Tam, Pascal Jr Tikeng Notsawo, Colin Raffel, Guillaume Rabusseau

View PDF

Abstract:Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when performing merging. In contrast, many of the heuristically-motivated methods do not require auxiliary data, making them practically advantageous. In this work, we revisit the interference minimization framework and show that, under certain conditions, covariance matrices can be estimated directly from difference matrices, eliminating the need for data while also reducing computational costs. We validate our approach across vision and language benchmarks on models ranging from 86M parameters to 7B parameters, outperforming previous data-free state-of-the-art merging methods

Subjects:

Machine Learning (cs.LG)

Cite as: arXiv:2604.01329 [cs.LG]

(or arXiv:2604.01329v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2604.01329

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Marawan Gamal Abdel Hameed [view email] [v1] Wed, 1 Apr 2026 19:16:31 UTC (273 KB)

Original source

arXiv

https://arxiv.org/abs/2604.01329

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Research PapersLive

Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing

arXiv:2604.02223v1 Announce Type: new Abstract: This paper studies the empirical behaviour of the p-AVL tree, a probabilistic variant of the AVL tree in which each imbalance is repaired with probability $p$. This gives an exact continuous interpolation from $p = 0$, which recovers the BST endpoint, to $p = 1$, which recovers the standard AVL tree. Across random-order insertion experiments, we track rotations per node, total imbalance events, average depth, average height, and a global imbalance statistic $\sigma$. The main empirical result is that even small nonzero p already causes a strong structural change. The goal here is empirical rather than fully theoretical: to document the behaviour of the p-AVL family clearly and identify the main patterns.

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures

arXiv:2604.01829v1 Announce Type: new Abstract: A fault-tolerant distance labeling scheme assigns a label to each vertex and edge of an undirected weighted graph $G$ with $n$ vertices so that, for any edge set $F$ of size $|F| \leq f$, one can approximate the distance between $p$ and $q$ in $G \setminus F$ by reading only the labels of $F \cup \{p,q\}$. For any $k$, we present a deterministic polynomial-time scheme with $O(k^{4})$ approximation and $\tilde{O}(f^{4}n^{1/k})$ label size. This is the first scheme to achieve a constant approximation while handling any number of edge faults $f$, resolving the open problem posed by Dory and Parter [DP21]. All previous schemes provided only a linear-in-$f$ approximation [DP21, LPS25]. Our labeling scheme directly improves the state of the art in

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees

arXiv:2604.01726v1 Announce Type: new Abstract: Given a sequence of adversarial point insertions and point deletions, is it possible to simultaneously optimize the approximation ratio, update time, and recourse for a $k$-clustering problem? If so, can this be achieved with worst-case guarantees against an adaptive adversary? These questions have garnered significant attention in recent years. Prior works by Bhattacharya, Costa, Garg, Lattanzi, and Parotsidis [FOCS '24] and by Bhattacharya, Costa, and Farokhnejad [STOC '25] have taken significant steps toward this direction for the $k$-median clustering problem and its generalization, the $(k, z)$-clustering problem. In this paper, we study the $k$-center clustering problem, which is one of the most classical and well-studied $k$-clustering

arXiv cs.DS

2mabout 1 hour ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 216 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersLive

Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures

arXiv cs.DS

1mabout 1 hour ago

Research PapersLive

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees

arXiv cs.DS

2mabout 1 hour ago

Research PapersLive

Single-Pass Streaming CSPs via Two-Tier Sampling

arXiv:2604.01575v1 Announce Type: new Abstract: We study the maximum constraint satisfaction problem, Max-CSP, in the streaming setting. Given $n$ variables, the constraints arrive sequentially in an arbitrary order, with each constraint involving only a small subset of the variables. The objective is to approximate the maximum fraction of constraints that can be satisfied by an optimal assignment in a single pass. The problem admits a trivial near-optimal solution with $O(n)$ space, so the major open problem in the literature has been the best approximation achievable when limiting the space to $o(n)$. The answer to the question above depends heavily on the CSP instance at hand. The integrality gap $\alpha$ of an LP relaxation, known as the BasicLP, plays a central role. In particular, a

arXiv cs.DS

2mabout 1 hour ago