Model Merging via Data-Free Covariance Estimation
arXiv:2604.01329v1 Announce Type: new Abstract: Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when perform — Marawan Gamal Abdel Hameed, Derek Tam, Pascal Jr Tikeng Notsawo, Colin Raffel, Guillaume Rabusseau
View PDF
Abstract:Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when performing merging. In contrast, many of the heuristically-motivated methods do not require auxiliary data, making them practically advantageous. In this work, we revisit the interference minimization framework and show that, under certain conditions, covariance matrices can be estimated directly from difference matrices, eliminating the need for data while also reducing computational costs. We validate our approach across vision and language benchmarks on models ranging from 86M parameters to 7B parameters, outperforming previous data-free state-of-the-art merging methods
Subjects:
Machine Learning (cs.LG)
Cite as: arXiv:2604.01329 [cs.LG]
(or arXiv:2604.01329v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2604.01329
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Marawan Gamal Abdel Hameed [view email] [v1] Wed, 1 Apr 2026 19:16:31 UTC (273 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing
arXiv:2604.02223v1 Announce Type: new Abstract: This paper studies the empirical behaviour of the p-AVL tree, a probabilistic variant of the AVL tree in which each imbalance is repaired with probability $p$. This gives an exact continuous interpolation from $p = 0$, which recovers the BST endpoint, to $p = 1$, which recovers the standard AVL tree. Across random-order insertion experiments, we track rotations per node, total imbalance events, average depth, average height, and a global imbalance statistic $\sigma$. The main empirical result is that even small nonzero p already causes a strong structural change. The goal here is empirical rather than fully theoretical: to document the behaviour of the p-AVL family clearly and identify the main patterns.

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures
arXiv:2604.01829v1 Announce Type: new Abstract: A fault-tolerant distance labeling scheme assigns a label to each vertex and edge of an undirected weighted graph $G$ with $n$ vertices so that, for any edge set $F$ of size $|F| \leq f$, one can approximate the distance between $p$ and $q$ in $G \setminus F$ by reading only the labels of $F \cup \{p,q\}$. For any $k$, we present a deterministic polynomial-time scheme with $O(k^{4})$ approximation and $\tilde{O}(f^{4}n^{1/k})$ label size. This is the first scheme to achieve a constant approximation while handling any number of edge faults $f$, resolving the open problem posed by Dory and Parter [DP21]. All previous schemes provided only a linear-in-$f$ approximation [DP21, LPS25]. Our labeling scheme directly improves the state of the art in

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees
arXiv:2604.01726v1 Announce Type: new Abstract: Given a sequence of adversarial point insertions and point deletions, is it possible to simultaneously optimize the approximation ratio, update time, and recourse for a $k$-clustering problem? If so, can this be achieved with worst-case guarantees against an adaptive adversary? These questions have garnered significant attention in recent years. Prior works by Bhattacharya, Costa, Garg, Lattanzi, and Parotsidis [FOCS '24] and by Bhattacharya, Costa, and Farokhnejad [STOC '25] have taken significant steps toward this direction for the $k$-median clustering problem and its generalization, the $(k, z)$-clustering problem. In this paper, we study the $k$-center clustering problem, which is one of the most classical and well-studied $k$-clustering
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Probabilistic AVL Trees (p-AVL): Relaxing Deterministic Balancing
arXiv:2604.02223v1 Announce Type: new Abstract: This paper studies the empirical behaviour of the p-AVL tree, a probabilistic variant of the AVL tree in which each imbalance is repaired with probability $p$. This gives an exact continuous interpolation from $p = 0$, which recovers the BST endpoint, to $p = 1$, which recovers the standard AVL tree. Across random-order insertion experiments, we track rotations per node, total imbalance events, average depth, average height, and a global imbalance statistic $\sigma$. The main empirical result is that even small nonzero p already causes a strong structural change. The goal here is empirical rather than fully theoretical: to document the behaviour of the p-AVL family clearly and identify the main patterns.

A Constant-Approximation Distance Labeling Scheme under Polynomially Many Edge Failures
arXiv:2604.01829v1 Announce Type: new Abstract: A fault-tolerant distance labeling scheme assigns a label to each vertex and edge of an undirected weighted graph $G$ with $n$ vertices so that, for any edge set $F$ of size $|F| \leq f$, one can approximate the distance between $p$ and $q$ in $G \setminus F$ by reading only the labels of $F \cup \{p,q\}$. For any $k$, we present a deterministic polynomial-time scheme with $O(k^{4})$ approximation and $\tilde{O}(f^{4}n^{1/k})$ label size. This is the first scheme to achieve a constant approximation while handling any number of edge faults $f$, resolving the open problem posed by Dory and Parter [DP21]. All previous schemes provided only a linear-in-$f$ approximation [DP21, LPS25]. Our labeling scheme directly improves the state of the art in

Adaptive Fully Dynamic $k$-Center Clustering with (Near-)Optimal Worst-Case Guarantees
arXiv:2604.01726v1 Announce Type: new Abstract: Given a sequence of adversarial point insertions and point deletions, is it possible to simultaneously optimize the approximation ratio, update time, and recourse for a $k$-clustering problem? If so, can this be achieved with worst-case guarantees against an adaptive adversary? These questions have garnered significant attention in recent years. Prior works by Bhattacharya, Costa, Garg, Lattanzi, and Parotsidis [FOCS '24] and by Bhattacharya, Costa, and Farokhnejad [STOC '25] have taken significant steps toward this direction for the $k$-median clustering problem and its generalization, the $(k, z)$-clustering problem. In this paper, we study the $k$-center clustering problem, which is one of the most classical and well-studied $k$-clustering

Single-Pass Streaming CSPs via Two-Tier Sampling
arXiv:2604.01575v1 Announce Type: new Abstract: We study the maximum constraint satisfaction problem, Max-CSP, in the streaming setting. Given $n$ variables, the constraints arrive sequentially in an arbitrary order, with each constraint involving only a small subset of the variables. The objective is to approximate the maximum fraction of constraints that can be satisfied by an optimal assignment in a single pass. The problem admits a trivial near-optimal solution with $O(n)$ space, so the major open problem in the literature has been the best approximation achievable when limiting the space to $o(n)$. The answer to the question above depends heavily on the CSP instance at hand. The integrality gap $\alpha$ of an LP relaxation, known as the BasicLP, plays a central role. In particular, a


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!